Around 80% of the data generated today is unannotated, unstructured text, which makes it difficult for AI applications to use effectively. Manual annotation by domain experts offers high precision and captures domain-specific knowledge, but it is expensive, slow, and does not scale. This motivates a hybrid approach that combines Natural Language Processing (NLP) techniques with domain expertise to annotate and classify text data more efficiently. The proposed approach is organized as a pipeline of sub-tasks whose goal is to create meaningful datasets that are classified according to predefined features.
The first step in this pipeline, and the focus of this thesis, is to support the domain expert in defining the classes with the help of keyword extraction techniques. Here, the domain expert conceptualizes the desired classes by assigning relevant tags or writing class descriptions. This domain-specific knowledge can then be injected into state-of-the-art keyword extraction methods, helping the expert identify related class-specific keywords and, where needed, refine the scope of the class. The objective is a more efficient and accurate keyword extraction approach that is tailored to the specific needs of the domain expert.
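As a rough illustration of how expert knowledge might be injected into keyword extraction, the sketch below ranks candidate phrases by TF-IDF and multiplies the score of any phrase that shares a word with the expert's class tags. It is a deliberately simple stand-in, not the method used in the thesis: the function `extract_keywords`, the `boost` factor, and the small stopword list are all hypothetical. Embedding-based extractors (KeyBERT, for instance, accepts seed keywords) would replace the TF-IDF scoring with semantic similarity, but the idea of biasing the ranking toward the expert's input is the same.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real system would use a full list).
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "from", "in",
    "is", "of", "on", "or", "that", "the", "this", "to", "with",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidates(tokens: list[str], max_n: int = 2) -> list[str]:
    """Return all n-grams (n <= max_n) that contain no stopword."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOPWORDS for t in gram):
                out.append(" ".join(gram))
    return out


def extract_keywords(docs: list[str], seed_tags: list[str],
                     top_n: int = 5, boost: float = 2.0) -> list[str]:
    """Rank candidate phrases by TF-IDF, boosted by overlap with expert tags.

    seed_tags: tags the domain expert assigned to the class (hypothetical input).
    boost: multiplier applied once per seed word a phrase contains (hypothetical).
    """
    doc_cands = [candidates(tokenize(d)) for d in docs]
    # Document frequency: in how many documents each phrase occurs.
    df = Counter()
    for cands in doc_cands:
        df.update(set(cands))
    n_docs = len(docs)
    seed_words = {w for tag in seed_tags for w in tokenize(tag)}

    scores = Counter()
    for cands in doc_cands:
        tf = Counter(cands)
        for phrase, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[phrase])) + 1.0  # smoothed IDF
            score = (count / len(cands)) * idf
            # Inject expert knowledge: reward phrases that share words with the tags.
            overlap = len(set(phrase.split()) & seed_words)
            score *= boost ** overlap
            # Keep each phrase's best score across the corpus.
            scores[phrase] = max(scores[phrase], score)
    return [phrase for phrase, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    corpus = [
        "Battery cells degrade when the charging current is too high.",
        "The battery management system monitors cell temperature and charging.",
        "Customer reviews praise the display and the camera.",
    ]
    print(extract_keywords(corpus, seed_tags=["battery", "charging"], top_n=3))
    print(extract_keywords(corpus, seed_tags=[], top_n=3))
```

With the tags "battery" and "charging", the top three phrases in this toy corpus are all battery- or charging-related; with an empty tag list, the short and unrelated third document dominates the ranking, which shows why the expert's input changes the result.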
The results of this study could contribute to the development of domain-specific datasets for AI applications, particularly for small and medium-sized companies with limited resources. The modified approach will be evaluated by domain experts, who assess the comprehensiveness of the resulting keyword sets.
How can domain experts be supported in defining classes that characterize large text corpora, particularly in creating keywords and keyphrases?
How can domain experts evaluate the modified keyword extraction approach to validate the representativeness of the resulting keyword sets?
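One possible way to turn expert judgments into numbers is sketched below, under assumptions that go beyond the source: each expert marks every extracted keyword as relevant or not, a majority vote decides relevance, and the experts additionally list keywords they consider essential for the class. Precision then measures how many extracted keywords are relevant, and coverage measures how much of the expert reference list the extractor recovered, a rough proxy for comprehensiveness. The names `evaluate_keyword_set` and `KeywordEvaluation` and the input formats are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeywordEvaluation:
    precision: float  # share of extracted keywords judged relevant by a majority
    coverage: float   # share of expert reference keywords the extractor also found


def evaluate_keyword_set(extracted: list[str],
                         judgments: dict[str, list[bool]],
                         reference: list[str]) -> KeywordEvaluation:
    """Score an extracted keyword set against expert feedback.

    extracted: keywords produced by the extractor.
    judgments: keyword -> one vote per expert (True = relevant to the class);
               hypothetical input format.
    reference: keywords the experts consider essential for the class.
    Keywords are compared as exact strings.
    """
    if not extracted:
        # Precision is undefined for an empty set; report 0.0 by convention.
        return KeywordEvaluation(precision=0.0, coverage=0.0)

    relevant = 0
    for kw in extracted:
        votes = judgments.get(kw, [])
        # Strict majority vote; a keyword nobody rated counts as not relevant.
        if votes and sum(votes) > len(votes) / 2:
            relevant += 1

    ref = set(reference)
    coverage = len(set(extracted) & ref) / len(ref) if ref else 0.0
    return KeywordEvaluation(precision=relevant / len(extracted), coverage=coverage)


if __name__ == "__main__":
    extracted = ["battery management", "battery cells",
                 "charging current", "customer reviews"]
    judgments = {
        "battery management": [True, True, False],
        "battery cells": [True, True, True],
        "charging current": [True, False, True],
        "customer reviews": [False, False, True],
    }
    reference = ["battery cells", "charging current", "state of charge"]
    print(evaluate_keyword_set(extracted, judgments, reference))
```

In the sample, three of the four extracted keywords win a majority (precision 0.75), and two of the three reference keywords were found (coverage of about 0.67). A low coverage value points the expert to missing aspects of the class, which is where refining the class definition would start.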
Name | Size | Last Modification |
---|---|---|
230626 Weixin Yan Kickoff.pptx | 2.23 MB | 29.01.2024 |
230915 Weixin Yan Thesis.pdf | 5.61 MB | 29.01.2024 |
231002 Weixin Yan Final Presentation.pptx | 3.73 MB | 29.01.2024 |