Modeling, Semantic Analysis, and Generation of Legal Contracts

Last modified Feb 8


For more than 5000 years, written language has been the most important medium for documenting and transferring knowledge. Texts are particularly essential for large organisations, because the persistence and traceability of written text matter greatly in a business environment.

However, digitized documents usually exist in an unstructured format, which limits how computers can process them. In many cases, digitalisation consists only of optical character recognition, which yields a digital full text that supports nothing beyond a simple full-text search. The cause is the plain textual representation of the document, which carries no machine-readable semantics. For example, the text of a bank's company policies contains no links to the minimum requirements for risk management (MaRisk), and an application cannot recognize the tenant and the landlord in a rental agreement. To enrich text with such semantic knowledge, different Natural Language Processing (NLP) procedures have to be applied.
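To illustrate what enriching text with semantic knowledge can mean for the rental agreement example, here is a minimal Python sketch. It is not the project's method: the phrasing pattern, the sample sentence, and the names RoleAnnotation and annotate_roles are assumptions made up for this illustration, and a real pipeline would need far more robust NLP than a single regular expression.

```python
import re
from dataclasses import dataclass


@dataclass
class RoleAnnotation:
    role: str   # "landlord" or "tenant"
    name: str   # surface text of the contract party
    start: int  # character offset where the name begins
    end: int    # character offset just past the name


# Hypothetical phrasing: <Name> (hereinafter "the Landlord")
# Real contracts vary widely; this pattern only covers the sample below.
ROLE_PATTERN = re.compile(
    r'(?P<name>[A-Z][\w.]*(?: [A-Z][\w.]*)*) '
    r'\(hereinafter "the (?P<role>Landlord|Tenant)"\)'
)


def annotate_roles(text):
    """Return one annotation per party whose role is stated in the text."""
    return [
        RoleAnnotation(
            role=match.group("role").lower(),
            name=match.group("name"),
            start=match.start("name"),
            end=match.end("name"),
        )
        for match in ROLE_PATTERN.finditer(text)
    ]


sample = (
    'This agreement is made between Anna Schmidt (hereinafter "the Landlord") '
    'and Max Weber (hereinafter "the Tenant").'
)

for annotation in annotate_roles(sample):
    print(annotation)
```

The annotations keep character offsets, so the roles stay linked to the original text instead of replacing it. That link is what a plain full-text representation lacks.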

Another variant of digitalisation is electronic document creation. In the simplest case, a document is created in unstructured form by means of a standard word processor, which again results in unstructured storage of the text. Such a process is also very time- and knowledge-intensive: the author either needs considerable experience or has to invest substantial effort in research. Hence, there is great potential for improvement not only in the semantic analysis of textual documents but also in computer-aided document creation.

Nowadays, many sectors face the challenge of digitalisation, and the legal domain is no exception. The increasing number of digitized documents is just one indicator. In this context, contracts play an important role, and the two aforementioned problems apply to them in particular. Therefore, the scope of this project is the design, development, and prototypical implementation of a system that semi-automatically creates generic contract models. These models are then to be used in a contract creation tool.


Focus and Goals

The MOSEGA project aims to create a platform consisting of two tools. The analysis tool allows a domain expert to create a generic contract model with computer support. The underlying process is semi-automatic, because a fully automatic approach is not feasible with the current state of the art. Furthermore, the process has to work on input that carries no annotations. The document creation tool can be used by any user to create the respective documents: it draws on the generic contract model and guides the user through the creation process.
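The interplay between a generic contract model and the document creation tool can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MOSEGA data structure: the classes Clause and ContractModel, the generate function, and the sample rental clauses are all hypothetical. The model holds clause templates with placeholders, and the creation step asks for exactly the values the selected clauses need.

```python
from dataclasses import dataclass, field
from string import Formatter


@dataclass
class Clause:
    title: str
    template: str          # clause text with {placeholders}
    optional: bool = False


@dataclass
class ContractModel:
    name: str
    clauses: list = field(default_factory=list)

    def selected_clauses(self, include_optional=()):
        """Mandatory clauses plus the optional ones chosen by title."""
        return [
            c for c in self.clauses
            if not c.optional or c.title in include_optional
        ]

    def fields(self, include_optional=()):
        """Placeholder names in order of first appearance."""
        seen = []
        for clause in self.selected_clauses(include_optional):
            for _, name, _, _ in Formatter().parse(clause.template):
                if name and name not in seen:
                    seen.append(name)
        return seen


def generate(model, answers, include_optional=()):
    """Fill the selected clauses with the user's answers."""
    missing = [f for f in model.fields(include_optional) if f not in answers]
    if missing:
        raise ValueError(f"missing answers for: {', '.join(missing)}")
    clauses = model.selected_clauses(include_optional)
    return "\n\n".join(
        f"{number}. {clause.title}\n{clause.template.format(**answers)}"
        for number, clause in enumerate(clauses, start=1)
    )


# Hypothetical model, as a domain expert might produce it with the analysis tool.
rental = ContractModel("Rental agreement", [
    Clause("Parties", "This agreement is made between {landlord} (Landlord) "
                      "and {tenant} (Tenant)."),
    Clause("Rent", "The Tenant pays a monthly rent of {rent} EUR."),
    Clause("Pets", "The Tenant may keep {pets}.", optional=True),
])

answers = {"landlord": "Anna Schmidt", "tenant": "Max Weber", "rent": "950"}
print(generate(rental, answers))
```

A guided creation tool could iterate over rental.fields() to prompt the user one question at a time. In this sketch the answers are passed as a dictionary to keep the example non-interactive.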


Research Questions

The main research questions within this project address:

  • How can a semi-automated process for the analysis of terms of service be implemented in order to create generic contract models? At which process stages is the interaction of a domain expert required, and where is automatic analysis suitable?
  • Which NLP approaches are necessary to conduct such an analysis?
  • How should an NLP pipeline be designed in order to extract semantic structures from contracts?
  • What is a suitable software architecture for a contract analysis platform that can semi-automatically create generic contract models?
  • To what extent is a metamodel-based data structure able to properly depict a contract and to improve the contract creation process?
  • Is computer-aided contract creation beneficial for the creator compared to manual creation?

Contributions (in reverse chronological order)

[Gl19a] Glaser, I.; Landthaler, J.; Matthes, F.: Supporting the Legal Reasoning Process by Classification of Judgments Applying Active Machine Learning, IRIS: Internationales Rechtsinformatik Symposium, Salzburg, Austria, 2019
[Gl18b] Glaser, I.; Scepankova, E.; Matthes, F.: Classifying Semantic Types of Legal Sentences: Portability of Machine Learning Models, Jurix: International Conference on Legal Knowledge and Information Systems, Groningen, Netherlands, 2018
[Gl18a] Glaser, I.; Waltl, B.; Matthes, F.: Named Entity Recognition, Extraction, and Linking in German Legal Contracts, IRIS: Internationales Rechtsinformatik Symposium, Salzburg, Austria, 2018


