The number of legal documents is growing rapidly, and most of them are available in digital form. These documents are often the basis of a legal expert's daily work. Legal experts frequently need to find specific documents, but they cannot survey the vast amount of data without computer support. The starting point of such a search is often a document that has already been found.
Therefore, this thesis provides a prototypical implementation of a relatedness search for legal documents. First, different approaches to identifying similarity between texts are discussed by reviewing related work. These approaches use various techniques from natural language processing and machine learning, and they differ in complexity and in their underlying models.
To reduce the implementation effort, a concept for the architecture of the similarity search is developed. The main goal of this architecture is to make adding further similarity methods as easy as possible. Approaches for persistence and visualization are also described in this thesis.
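One common way to make similarity methods pluggable is a shared interface plus a registry, so that a new method only needs one subclass and one registration call. The Python sketch below illustrates that idea; it is not the thesis's actual architecture. `SimilarityMethod`, `register`, `search` and the toy word-overlap (Jaccard) method are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class SimilarityMethod(ABC):
    """Hypothetical common interface that every similarity method implements."""

    name: str = "abstract"

    @abstractmethod
    def score(self, source: str, candidate: str) -> float:
        """Return a relatedness score; higher means more related."""


class JaccardSimilarity(SimilarityMethod):
    """Toy method (illustrative only): overlap of lower-cased word sets."""

    name = "jaccard"

    def score(self, source: str, candidate: str) -> float:
        a, b = set(source.lower().split()), set(candidate.lower().split())
        union = a | b
        return len(a & b) / len(union) if union else 0.0


# Hypothetical registry: adding a method means one subclass plus one register() call.
METHODS: dict[str, SimilarityMethod] = {}


def register(method: SimilarityMethod) -> None:
    METHODS[method.name] = method


def search(method_name: str, source: str, corpus: dict[str, str],
           k: int = 3) -> list[tuple[str, float]]:
    """Rank every corpus document against the source with the chosen method."""
    method = METHODS[method_name]
    scored = [(doc_id, method.score(source, text)) for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most related first
    return scored[:k]


register(JaccardSimilarity())

if __name__ == "__main__":
    corpus = {
        "a": "tenancy law notice period",
        "b": "notice period for tenancy termination",
        "c": "tax law for companies",
    }
    print(search("jaccard", "tenancy notice period", corpus, k=2))
```

Because `search` depends only on the abstract interface, a more elaborate method (for example one backed by a trained model or an external search engine) could be added without touching the search or presentation code.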
Three of the techniques discussed in the literature review are implemented as prototypes. Additionally, the similarity search provided by Elasticsearch is integrated as an additional point of reference. The concepts described for architecture, persistence and visualization are realized as well.
Once the implementation is complete, the results returned by the different similarity methods are assessed. This is done via a questionnaire in which participants rank the search results according to their relevance to a source document. Based on these relevance rankings, the performance of the relatedness methods is compared.
Finally, the findings of this thesis are summarized, and starting points for further research in this field are briefly outlined.