Extending Thesauri Using Word Embeddings and the Intersection Method

Last modified Jul 4, 2018

rechtsinformatik legal informatics inproceedings german laws publication

Abstract

In many legal domains, the amount of available and relevant literature is continuously growing. Legal content providers face the challenge of providing their customers with relevant and comprehensive content for search queries on large corpora. However, documents written in natural language contain many synonyms and semantically related concepts. Legal content providers usually maintain thesauri to discover more relevant documents in their search engines. Maintaining a high-quality thesaurus is an expensive, difficult, and manual task. Word embedding technology has recently gained a lot of attention for building thesauri from large corpora. We report our experiences on the feasibility of extending thesauri based on a large corpus of German tax law, with a focus on synonym relations. Using a simple yet powerful new approach, called the intersection method, we can significantly improve and facilitate the extension of thesauri.

Files and Subpages

Name	Type	Size	Last Modification	Last Editor
La17a.pdf	File	1,01 MB	04.07.2018