With the rise of digitalization, information retrieval has to cope with increasing amounts of digitized content. Legal content providers invest heavily in building domain-specific ontologies such as thesauri, which help retrieve a significantly larger number of relevant documents. Label propagation is a family of graph-based semi-supervised machine learning algorithms. Since 2002, many label propagation methods have been developed, e.g. to identify groups of similar nodes in graphs. In this thesis, we test the suitability of label propagation methods for extending a thesaurus from the tax law domain. The graph on which label propagation operates is a similarity graph constructed from word embeddings. We cover the process from end to end and conduct several parameter studies to understand the impact of certain hyperparameters on the overall performance. The results are then evaluated in manual studies and compared with a baseline approach.
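To make the setup in the abstract concrete, the following is a minimal sketch of label propagation on a similarity graph built from word embeddings. It is not the thesis implementation: the terms, the two-dimensional vectors, the concept labels, the choice of a k-nearest-neighbour cosine graph, and the basic iterative scheme with clamped seed labels are all assumptions made for illustration.

```python
import numpy as np


def knn_similarity_graph(embeddings, k):
    """Build a symmetric k-nearest-neighbour graph weighted by cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbour
    n = len(sim)
    weights = np.zeros((n, n))
    for i in range(n):
        neighbours = np.argsort(sim[i])[-k:]  # indices of the k most similar nodes
        weights[i, neighbours] = np.clip(sim[i, neighbours], 0.0, None)
    return np.maximum(weights, weights.T)  # keep an edge if either end selected it


def propagate_labels(weights, seed_labels, n_classes, n_iter=100):
    """Iterative label propagation; seed_labels uses -1 for unlabelled nodes."""
    seeds = seed_labels >= 0
    scores = np.zeros((len(seed_labels), n_classes))
    scores[seeds, seed_labels[seeds]] = 1.0
    clamp = scores[seeds].copy()
    degree = weights.sum(axis=1, keepdims=True)
    transition = weights / np.where(degree > 0, degree, 1.0)  # row-normalise
    for _ in range(n_iter):
        scores = transition @ scores  # each node averages its neighbours' scores
        scores[seeds] = clamp         # seed nodes keep their known labels
    return scores.argmax(axis=1), scores


# Hypothetical toy data: six terms with made-up 2-d "embeddings".
terms = ["tax", "levy", "duty", "court", "tribunal", "judge"]
embeddings = np.array([
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],
])
# One seed term per (hypothetical) thesaurus concept; -1 means unlabelled.
seed_labels = np.array([0, -1, -1, 1, -1, -1])

graph = knn_similarity_graph(embeddings, k=2)
predicted, scores = propagate_labels(graph, seed_labels, n_classes=2)
for term, label in zip(terms, predicted):
    print(term, "-> concept", label)
```

In this toy graph the unlabelled terms inherit the concept of the seed term in their own neighbourhood. Real embeddings have far more dimensions, and the graph construction and propagation variant are exactly the kind of design choices a parameter study would vary.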
This thesis is carried out in cooperation with Prof. Dr. Günnemann, who holds the Professorship of Data Mining and Analytics at the chair for Datenbanksysteme at TUM.
|Name||Type||Size||Last Modification||Last Editor|
|180604 Mueller Label Propagation Thesaurus Extension MA Kick-off.pdf||PDF||835 KB||04.06.2018||Markus Müller|
|180604 Mueller Label Propagation Thesaurus Extension MA Kick-off.pptx||PPTX||9,76 MB||04.06.2018||Markus Müller|
|181107 Mueller Label Propagation Thesaurus Extension MA Thesis.pdf||PDF||4,68 MB||07.11.2018||Markus Müller|
|181109 Mueller Label Propagation Thesaurus Extension MA Final.pdf||PDF||8,98 MB||09.11.2018||Markus Müller|
|181109 Mueller Label Propagation Thesaurus Extension MA Final.pptx||PPTX||8,81 MB||09.11.2018||Markus Müller|