
Master's Thesis by Markus Müller

Last modified Nov 13, 2018

Label Propagation for Tax Law Thesaurus Extension

Abstract

With the rise of digitalization, information retrieval has to cope with ever-growing amounts of digitized content. Legal content providers invest heavily in building domain-specific ontologies such as thesauri in order to retrieve significantly more relevant documents. Label propagation is a family of graph-based semi-supervised machine learning algorithms; since 2002, many label propagation methods have been developed, e.g., to identify groups of similar nodes in graphs. In this thesis, we test the suitability of label propagation methods for extending a thesaurus from the tax law domain. The graph on which label propagation operates is a similarity graph constructed from word embeddings. We cover the process end to end and conduct several parameter studies to understand the impact of certain hyperparameters on overall performance. The results are then evaluated in manual studies and compared with a baseline approach.
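The two steps named in the abstract, building a similarity graph from word embeddings and propagating labels over it, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a symmetric k-nearest-neighbour graph weighted by cosine similarity and the classic iterative scheme in which seed terms (existing thesaurus entries) stay clamped while every other term repeatedly takes the weighted average of its neighbours' label scores. The toy 2-D embeddings, the concept labels `INCOME_TAX` and `VAT`, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(embeddings, k):
    """Symmetric k-nearest-neighbour graph; edge weight = cosine similarity (non-positive dropped)."""
    words = list(embeddings)
    graph = {w: {} for w in words}
    for w in words:
        sims = sorted(
            ((cosine(embeddings[w], embeddings[o]), o) for o in words if o != w),
            reverse=True,
        )
        for sim, o in sims[:k]:
            if sim > 0:
                graph[w][o] = sim
                graph[o][w] = sim  # symmetrise the neighbourhood relation
    return graph


def propagate(graph, seeds, labels, max_iter=100, tol=1e-6):
    """Iterative label propagation: seeds stay clamped, all other nodes take the
    weight-normalised average of their neighbours' label scores."""
    scores = {w: [1.0 if seeds.get(w) == lab else 0.0 for lab in labels] for w in graph}
    for _ in range(max_iter):
        new = {}
        for w, nbrs in graph.items():
            total = sum(nbrs.values())
            if w in seeds or total == 0:
                new[w] = scores[w]  # clamped seed or isolated node keeps its scores
                continue
            new[w] = [
                sum(wt * scores[n][i] for n, wt in nbrs.items()) / total
                for i in range(len(labels))
            ]
        delta = max(abs(a - b) for w in graph for a, b in zip(new[w], scores[w]))
        scores = new
        if delta < tol:
            break
    return scores


def assign(scores, labels):
    """Highest-scoring label per term, or None if no label reached the term."""
    result = {}
    for w, s in scores.items():
        best = max(range(len(labels)), key=lambda i: s[i])
        result[w] = labels[best] if s[best] > 0 else None
    return result


# Hypothetical toy embeddings; real ones would come from a model trained on legal text.
embeddings = {
    "income_tax": (1.0, 0.1), "wage_tax": (0.9, 0.2), "salary": (0.8, 0.3),
    "vat": (0.1, 1.0), "sales_tax": (0.2, 0.9), "invoice": (0.3, 0.8),
}
labels = ["INCOME_TAX", "VAT"]
seeds = {"income_tax": "INCOME_TAX", "vat": "VAT"}  # terms already in the thesaurus

graph = knn_graph(embeddings, k=2)
result = assign(propagate(graph, seeds, labels), labels)
print(result)
# {'income_tax': 'INCOME_TAX', 'wage_tax': 'INCOME_TAX', 'salary': 'INCOME_TAX',
#  'vat': 'VAT', 'sales_tax': 'VAT', 'invoice': 'VAT'}
```

Unlabeled terms that end up with a label become candidate additions to that thesaurus concept. In this sketch, the neighbourhood size `k` and the similarity threshold are the kind of graph-construction hyperparameters that change which terms can receive a label at all.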

This thesis is carried out in cooperation with Prof. Dr. Günnemann, who holds the Professorship of Data Mining and Analytics at the Chair for Database Systems (Datenbanksysteme) at TUM.

Keywords: Thesaurus Extension, Legal Tech, Information Retrieval, Label Propagation, Word Embeddings, Data Science, Machine Learning

Code Repository

GitHub: sebischair/ThesaurusLabelPropagation
