For more than 5000 years, written language has been the most important medium for documenting and transferring knowledge. Texts are particularly essential for large organisations, since the persistence and traceability of written text matter greatly in a business environment.
However, digitized documents usually exist in an unstructured format, which limits how computers can process them. In many cases, digitalisation consists only of optical character recognition. The result is a digital full text that permits nothing deeper than a simple full-text search, because a plain textual representation carries no semantics that a computer can use. For example, the textual representation of a bank's company policies does not carry any links to the minimum requirements for risk management (MaRisk), and an application cannot recognize the tenant and the landlord in a rental agreement. To enrich text with such semantic knowledge, different Natural Language Processing (NLP) procedures have to be applied.
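The gap between full-text search and semantic annotation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample sentence, the function names `full_text_search` and `annotate_roles`, and the toy regular expression are hypothetical and are not the project's method. A real NLP pipeline would be far more robust than a single pattern.

```python
import re

# Hypothetical sample sentence, invented for this illustration.
TEXT = ("This rental agreement is made between Anna Schmidt (Landlord) "
        "and Ben Maier (Tenant).")


def full_text_search(text: str, term: str) -> bool:
    """Plain full-text search: reports only whether the term occurs."""
    return term.lower() in text.lower()


# Toy pattern (assumption): a capitalised name followed by its role in parentheses.
ROLE_PATTERN = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) \((Landlord|Tenant)\)")


def annotate_roles(text: str) -> dict:
    """Map each contract role to the party name found next to it."""
    return {role.lower(): name for name, role in ROLE_PATTERN.findall(text)}


# Full-text search finds the word but not who the tenant is.
print(full_text_search(TEXT, "tenant"))
# The annotation step attaches a role to each party.
print(annotate_roles(TEXT))
```

The search only answers whether a word is present, whereas the annotated result links each role to a concrete party, which is the kind of semantic knowledge the paragraph above refers to.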
Another variant of digitalisation is electronic document creation. In the simplest method, a document is created in unstructured form by means of standard word-processing applications, which again results in unstructured storage of the text. Such a process is also very time- and knowledge-intensive: the creator either needs considerable experience or has to invest a great deal of effort in research. Hence, there is great potential for improvement not only in the semantic analysis of textual documents but also in computer-aided document creation.
Nowadays, many sectors face the challenge of digitalisation, and the legal domain is no exception. The increasing number of digitized documents is just one indicator. In this context, contracts play an important role, and the two aforementioned problems apply to them in particular. Therefore, the scope of this project is the design, development and prototypical implementation of a system for the semi-automatic creation of generic contract models. These models are to be used in a contract creation tool.
The MOSEGA project aims to create a platform consisting of two tools. The analysis tool allows a domain expert to create a generic contract model with computer assistance. The underlying process is semi-automatic, since a fully automatic approach is not possible with the current state of the art. Furthermore, the tool is required to work on input without any annotations. The document creation tool can be used by any user to create the respective documents. To do so, the tool uses the generic contract model and guides the user through the creation process.
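One way to picture how a generic contract model could drive a guided creation process is the following Python sketch. It is not the MOSEGA data model: the classes `Slot`, `Clause` and `ContractModel`, the functions `open_questions` and `render`, and the sample rental model are all hypothetical names chosen for illustration. The sketch assumes that a model is a list of clause templates with named placeholders, and that guidance means asking the user for every placeholder that is still unanswered.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Slot:
    name: str      # placeholder name used in the clause template
    question: str  # prompt shown to the user during creation


@dataclass
class Clause:
    template: str  # clause text with $placeholders
    slots: list = field(default_factory=list)


@dataclass
class ContractModel:
    title: str
    clauses: list = field(default_factory=list)


def open_questions(model: ContractModel, answers: dict) -> list:
    """Return the questions for all slots the user has not answered yet."""
    return [slot.question
            for clause in model.clauses
            for slot in clause.slots
            if slot.name not in answers]


def render(model: ContractModel, answers: dict) -> str:
    """Fill every clause template once all slots have an answer."""
    missing = open_questions(model, answers)
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    lines = [model.title]
    lines += [Template(c.template).substitute(answers) for c in model.clauses]
    return "\n".join(lines)


# Hypothetical model a domain expert might produce with the analysis tool.
RENTAL = ContractModel("Rental Agreement", [
    Clause("The landlord, $landlord, lets the flat to the tenant, $tenant.",
           [Slot("landlord", "Who is the landlord?"),
            Slot("tenant", "Who is the tenant?")]),
    Clause("The monthly rent is $rent EUR.",
           [Slot("rent", "What is the monthly rent in EUR?")]),
])

answers = {"landlord": "Anna Schmidt", "tenant": "Ben Maier", "rent": "900"}
print(render(RENTAL, answers))
```

Keeping the questions inside the model means the creation tool itself needs no knowledge of a particular contract type. It only walks the model and asks whatever is still open.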
The main research questions within this project address: