Legal research is an important task for lawyers. Word Sense Disambiguation (WSD) has been shown to have an important impact on general information retrieval (IR), because ambiguity in natural language can degrade the performance of text-based IR systems. On the one hand, precision can be improved if only documents containing the word sense relevant to a search query are retrieved. On the other hand, users can benefit from being shown the different senses of their search query terms directly. Transformer-based models such as BERT and its derivatives currently dominate existing benchmarks because they capture context-sensitive information, and they thereby intrinsically encode information that is useful for WSD. How transformer-based WSD models perform on German legal text corpora has not yet been investigated. The main goal of this thesis is to evaluate, both qualitatively and quantitatively, the performance of such models on German court rulings.
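The precision argument above can be illustrated with a small worked example. The following is a minimal sketch, not the method used in the thesis: the `Doc` class, the sense labels, and the toy result list for the ambiguous German query term "Bank" (financial institution or bench) are all hypothetical. It assumes that some WSD model has already assigned a predicted sense to each retrieved document.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    gold_sense: str       # hypothetical ground-truth sense of "Bank" in the document
    predicted_sense: str  # hypothetical sense label assigned by a WSD model


def precision(retrieved, relevant_sense):
    """Fraction of retrieved documents whose gold sense matches the query sense."""
    if not retrieved:
        return 0.0
    hits = sum(1 for d in retrieved if d.gold_sense == relevant_sense)
    return hits / len(retrieved)


def filter_by_sense(retrieved, wanted_sense):
    """Keep only documents whose predicted sense matches the wanted sense."""
    return [d for d in retrieved if d.predicted_sense == wanted_sense]


# Toy keyword-search results for the query term "Bank" (illustrative data only).
results = [
    Doc("d1", "financial_institution", "financial_institution"),
    Doc("d2", "bench", "bench"),
    Doc("d3", "financial_institution", "financial_institution"),
    Doc("d4", "bench", "financial_institution"),  # simulated WSD error
    Doc("d5", "bench", "bench"),
]

wanted = "financial_institution"
filtered = filter_by_sense(results, wanted)

precision_before = precision(results, wanted)   # 2 relevant out of 5 retrieved
precision_after = precision(filtered, wanted)   # 2 relevant out of 3 retrieved

print(f"precision without sense filter: {precision_before:.2f}")
print(f"precision with sense filter:    {precision_after:.2f}")
```

In this toy setting the filter raises precision from 2/5 to 2/3. The remaining false positive comes from the simulated WSD error, which shows that the gain depends on the quality of the sense predictions.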
Research Questions
What possibilities are there to compensate for the lack of German sense-annotated (legal) data?
How do WSD algorithms perform on German (legal) text?
How do legal experts judge the usefulness of our word sense filter?
Name | Type | Size | Last Modification | Last Editor |
---|---|---|---|---|
210111 Probst Word Sense Disambiguation in Information Retrieval KickOff.pdf | PDF | 2,93 MB | 10.01.2021 | |
210715 Probst Master Thesis.pdf | PDF | 4,24 MB | 15.07.2021 | |
210726 Probst Word Sense Disambiguation in Information Retrieval Final Presentation.pdf | PDF | 3,99 MB | 26.07.2021 | |