Nowadays, many sectors face the challenge of digitalization, and the legal domain is no exception. The rise of legal technology is reflected in the increasing number of digitized legal documents, in particular legal contracts. Once captured, these documents are in many cases available only as unstructured data and are thus barely processable by computer systems. However, the semantic knowledge within such a document is highly relevant to the reader. Furthermore, different contracts often use diverse wording and include a lot of superfluous information. All these factors hamper the utilization of digitized legal contracts.
This work addresses this business need by implementing a software component that enables the semantic analysis and structuring of legal contracts. To this end, common Natural Language Processing (NLP) tasks such as Named Entity Recognition (NER) and Named Entity Disambiguation (NED) are incorporated into an Apache UIMA pipeline. The study builds on the existing functionality of Lexia, a collaborative legal data science environment, and the software component developed in this thesis is integrated into Lexia.
A new approach to NER tailored to template-based legal contracts, called templated NER, is implemented in this study. This method is then enhanced by so-called templated NED. The evaluation of the developed system on German legal data demonstrates the applicability of such approaches. Templated NER achieved an overall F1 measure of 0.92, while implementations based on GermaNER and DBpedia Spotlight achieved only 0.8 and 0.87, respectively.
Keywords: Natural Language Processing, Named Entity Recognition, Named Entity Disambiguation, Named Entity Linking, Legal Text Analysis, UIMA, GermaNER, DBpedia Spotlight