Master's Thesis Alexandra Seibicke

Last modified Feb 15

Title

Extracting Semantically Meaningful Context Windows around Class-Specific Keywords

 

Abstract

Automatic text generation, recognition, and translation have become increasingly important. With the rise of chatbots like ChatGPT, the use of artificial intelligence for natural language has increased drastically. However, before artificial intelligence can be used for natural language processing, numerous texts have to be pre-processed. Doing so requires annotating texts and extracting keywords, which must be placed in meaningful contexts. A word can carry different meanings depending on its context: for example, “bank” could be a financial institution, a shore, or even a verb. To determine the meaning of a word, a context window must be set, which is the span of text around the word that is used to identify its meaning. The arbitrary choice of how long a context window should be may cause ambiguity, which makes extracting meaningful and useful context windows a significant challenge. This thesis addresses the task of finding such context windows.
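To illustrate the notion of a context window, the following is a minimal sketch that assumes a fixed-size, token-based window; it is not the method developed in the thesis. The function name `context_window`, the `size` parameter, and the simple regex tokenizer are hypothetical choices made only for this example.

```python
import re


def context_window(text, keyword, size=3):
    """Return, for each occurrence of `keyword`, the tokens within
    `size` positions to its left and right (hypothetical helper)."""
    # Naive tokenization: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    windows = []
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - size)  # clamp at the start of the text
            windows.append(tokens[start:i + size + 1])
    return windows


sentence = "She sat on the bank of the river and watched the water"
narrow = context_window(sentence, "bank", size=2)
wide = context_window(sentence, "bank", size=3)
```

In this example the narrow window contains only function words around "bank", while the wider one also reaches "river", the word that actually indicates the intended sense. This is the kind of sensitivity to window length that makes a fixed, arbitrary size problematic.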

For this purpose, we extract keywords from sentences and implement various approaches to determine meaningful context windows. We develop pre- and post-processing steps and evaluate our results. These results can then be used further by natural language processing pipelines.

We investigate the current state-of-the-art approaches for Word Sense Disambiguation and address the question of how these approaches can be combined with clustering techniques for class-based context filtering. Further, we study which methods can be leveraged to trim meaningful context windows from text chunks containing keywords. In addition, we investigate which evaluation approaches are most appropriate to assess the effectiveness of the filtering step and the cohesiveness of the window extraction.

 

Research Questions

1.) What are the current state-of-the-art approaches for Word Sense Disambiguation?

2.) How can these approaches be combined with clustering techniques for class-based context filtering?

3.) What methods can be leveraged to trim meaningful context windows from text chunks containing keywords?

4.) Which evaluation approaches are most appropriate to assess the effectiveness of the filtering step and the cohesiveness of the window extraction?
