Master's Thesis by Markus Müller

Last modified Nov 13, 2018

legal tech, thesaurus, thesaurus extension, data science, word2vec, information retrieval, label propagation, tax law, machine learning

Label Propagation for Tax Law Thesaurus Extension

Abstract

With the rise of digitalization, information retrieval has to cope with increasing amounts of digitized content. Legal content providers invest heavily in building domain-specific ontologies such as thesauri in order to retrieve a significantly larger number of relevant documents. Label propagation is a family of graph-based semi-supervised machine learning algorithms; since 2002, many label propagation methods have been developed, for example to identify groups of similar nodes in graphs. In this thesis, we test the suitability of label propagation methods for extending a thesaurus from the tax law domain. The graph on which label propagation operates is a similarity graph constructed from word embeddings. We cover the process from end to end and conduct several parameter studies to understand the impact of certain hyperparameters on the overall performance. The results are then evaluated in manual studies and compared with a baseline approach.

This thesis is carried out in cooperation with Prof. Dr. Günnemann, who holds the Professorship of Data Mining and Analytics at the Chair for Database Systems at TUM.

Keywords: Thesaurus Extension, Legal Tech, Information Retrieval, Label Propagation, Word Embeddings, Data Science, Machine Learning
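
To make the idea in the abstract concrete, the following is a minimal sketch of label propagation on a similarity graph built from word embeddings. It is not taken from the thesis or its repository: the toy 2-D vectors, the function names `knn_similarity_graph` and `propagate_labels`, the choice of a cosine k-nearest-neighbour graph, and the clamped iterative update are all assumptions made for illustration. In a thesaurus setting, each node would be a term, each class a thesaurus concept, and the seed labels the terms already assigned to a concept.

```python
import numpy as np


def knn_similarity_graph(emb, k):
    """Build a symmetric k-nearest-neighbour graph weighted by cosine similarity."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)          # no self-loops
    sim = np.clip(sim, 0.0, None)       # drop negative similarities
    weights = np.zeros_like(sim)
    nearest = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    weights[rows, nearest] = sim[rows, nearest]
    return np.maximum(weights, weights.T)  # keep an edge if either side chose it


def propagate_labels(weights, labels, n_classes, n_iter=100):
    """Spread seed labels along graph edges; -1 in `labels` marks unlabeled nodes."""
    seed = labels >= 0
    onehot = np.zeros((len(labels), n_classes))
    onehot[seed, labels[seed]] = 1.0
    degree = weights.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0           # avoid division by zero for isolated nodes
    transition = weights / degree       # row-normalised transition matrix
    scores = onehot.copy()
    for _ in range(n_iter):
        scores = transition @ scores
        scores[seed] = onehot[seed]     # clamp the known labels after every step
    return scores.argmax(axis=1)


# Hypothetical toy embeddings: two loose clusters of three "words" each.
embeddings = np.array([
    [1.0, 0.1], [0.9, 0.2], [0.8, 0.3],
    [0.1, 1.0], [0.2, 0.9], [0.3, 0.8],
])
# One seed label per cluster; the remaining words are unlabeled.
seed_labels = np.array([0, -1, -1, 1, -1, -1])

graph = knn_similarity_graph(embeddings, k=2)
predicted = propagate_labels(graph, seed_labels, n_classes=2)
print(predicted)
```

In this toy setup each unlabeled word ends up with the label of the seed in its own cluster. The neighbourhood size `k` and the number of iterations are examples of the kind of hyperparameters whose influence a parameter study would examine.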

Code Repository

GitHub: sebischair/ThesaurusLabelPropagation

Files and Subpages

Name	Type	Size	Last Modification	Last Editor
180604 Mueller Label Propagation Thesaurus Extension MA Kick-off.pdf	File	835 KB	04.06.2018	Markus Müller
180604 Mueller Label Propagation Thesaurus Extension MA Kick-off.pptx	File	9,76 MB	04.06.2018	Markus Müller
181107 Mueller Label Propagation Thesaurus Extension MA Thesis.pdf	File	4,68 MB	07.11.2018	Markus Müller
181109 Mueller Label Propagation Thesaurus Extension MA Final.pdf	File	8,98 MB	09.11.2018	Markus Müller
181109 Mueller Label Propagation Thesaurus Extension MA Final.pptx	File	8,81 MB	09.11.2018	Markus Müller