Master's Thesis Andreas Probst

Last modified Jul 28, 2021

Improving German Legal Information Retrieval with Contextual Word Embeddings for Word Sense Disambiguation

 

Motivation

Legal research is a core task for lawyers. Word Sense Disambiguation (WSD) has been shown to have a significant impact on general information retrieval (IR), because ambiguity in natural language can degrade the performance of text-based IR systems. On the one hand, precision can be improved if only documents that contain the query term in the relevant sense are retrieved. On the other hand, users may benefit from being shown the different senses of their search query terms directly. Current transformer-based models such as BERT and its derivatives dominate existing benchmarks because they capture context-sensitive information, and they are therefore inherently able to encode word sense information. How transformer-based WSD models perform on German legal text corpora has not yet been investigated. The main goal of this thesis is to evaluate the performance of such models on German court rulings, both qualitatively and quantitatively.

 

Research Questions

  • What algorithms already exist to automatically classify word senses (knowledge-based, unsupervised, and supervised)?
  • What possibilities are there to compensate for the lack of sense-annotated German (legal) data?
  • How do WSD algorithms perform on German (legal) text?
  • How do legal experts judge the usefulness of our word sense filter?

 

 
