LEXIA - A Data Science Environment for Semantic Analysis of German Legal Texts

The analysis of legal data with information technology, in particular text and data mining algorithms, has attracted growing interest in the field of legal informatics. Legal science and practice consist of data-, knowledge-, and time-intensive tasks, which have always been a focus of legal informatics. Recent developments in computer science have opened up new possibilities for process and analysis support. This paper contributes a data science environment that is particularly suited to legal texts, e.g., documents from legislation and the judiciary, but also contracts and patents. The environment consists of a reference architecture and a specific data model. Furthermore, it integrates an easily adaptable and extensible text mining engine that allows components to be reused. The baseline architecture for the text mining engine is Apache UIMA. The environment enables users to collaboratively specify linguistic and semantic structures. For this purpose, it uses an existing rule-based scripting language, namely Apache Ruta (Rule-based Text Annotation). This paper shows how the system can be used to unveil legal definitions in the German Civil Code (BGB), not only by finding them but also by determining which legal term is defined and how. Used in a collaborative context, this functionality makes it possible to structure unstructured information, i.e., text, and enables data scientists and legal experts to investigate and explore laws and judgments more efficiently using tailored software support.
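To make the idea of rule-based definition extraction concrete, the following is a minimal sketch in Python. It is not the Apache Ruta rule set used in the environment: a plain regular expression stands in for an annotation rule, and the names `Definition`, `COPULA`, and `find_definitions` are hypothetical. The sample sentences are modeled on BGB-style wording and are illustrative only, not authoritative quotations. The sketch assumes one simple surface pattern, "<Term> [im Sinne des Gesetzes] ist/sind <definiens>.", and reports which term is defined, by what phrase, and through which pattern variant.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Definition:
    term: str       # the legal term being defined
    definiens: str  # the phrase that defines the term
    kind: str       # which (hypothetical) pattern variant matched


# Hypothetical surface pattern, standing in for a rule-based annotation rule:
#   "<Term> [im Sinne des/dieses Gesetzes] ist/sind <definiens>."
COPULA = re.compile(
    r"^(?P<term>[A-ZÄÖÜ][a-zäöüß]+)"
    r"(?P<scope> im Sinne (?:des|dieses) Gesetzes)?"
    r" (?:ist|sind) (?P<definiens>.+?)\.?$"
)


def find_definitions(sentences: List[str]) -> List[Definition]:
    """Return one Definition per sentence that matches the copula pattern."""
    found = []
    for sentence in sentences:
        match = COPULA.match(sentence.strip())
        if match is None:
            continue
        kind = "scoped_copula" if match.group("scope") else "copula"
        found.append(
            Definition(match.group("term"), match.group("definiens"), kind)
        )
    return found


# Illustrative sentences modeled on BGB-style wording (not verbatim quotations).
SAMPLE = [
    "Verbraucher ist jede natürliche Person, die ein Rechtsgeschäft zu privaten Zwecken abschließt.",
    "Sachen im Sinne des Gesetzes sind nur körperliche Gegenstände.",
    "Der Vertrag kommt durch Angebot und Annahme zustande.",
]

if __name__ == "__main__":
    for definition in find_definitions(SAMPLE):
        print(f"{definition.term} [{definition.kind}]: {definition.definiens}")
```

A single surface pattern like this over-matches (any sentence of the form "X ist Y" is accepted) and misses other definition styles, such as parenthetical definitions. That is one reason a collaborative, rule-based environment in which legal experts refine the rules is attractive.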

