Classifying Semantic Types of Legal Sentences - Portability of Machine Learning Models

Last modified Oct 15, 2018


Legal contract analysis is an important research area. Classifying clauses or sentences enables valuable insights, such as the extraction of rights and obligations. However, contract datasets are quite rare, particularly in the German language.

Therefore, this paper examines the portability of machine learning (ML) models across different document types. We trained different ML classifiers on the tenancy law of the German Civil Code (BGB) and then applied the resulting models to a set of rental agreements. The performance of our models on the contract set varies: some models perform significantly worse, while certain settings prove portable. Additionally, we trained and evaluated the same classifiers on a dataset consisting solely of contracts to obtain a reference performance. We show that the performance of ML models may depend on the document type used for training, while certain setups result in portable models.
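To make the evaluation setup concrete, the following Python sketch contrasts a model trained on statute sentences with a reference model trained on contract sentences, both evaluated on the same held-out contract sentences. It illustrates the mechanics under stated assumptions and is not the paper's implementation: the multinomial Naive Bayes classifier, the bag-of-words tokenization, the two labels `duty` and `permission`, and every sentence below are hypothetical stand-ins (the paper works with German texts).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercased bag-of-words tokens (a deliberately simple assumption)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing, a hypothetical stand-in classifier."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus the smoothed log likelihood of each known token;
            # tokens never seen during training are skipped.
            score = math.log(self.class_counts[label] / n_docs)
            denom = self.total_words[label] + len(self.vocab)
            for token in tokenize(sentence):
                if token in self.vocab:
                    score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train(data):
    """Fit a classifier on (sentence, label) pairs."""
    return NaiveBayes().fit([s for s, _ in data], [y for _, y in data])


def accuracy(model, data):
    """Share of (sentence, label) pairs the model labels correctly."""
    return sum(model.predict(s) == y for s, y in data) / len(data)


# Hypothetical toy sentences and labels; NOT the BGB or rental-agreement data.
LAW_TRAIN = [
    ("The landlord shall maintain the premises.", "duty"),
    ("The tenant shall pay the agreed rent.", "duty"),
    ("The tenant may terminate the lease.", "permission"),
    ("The landlord may demand a deposit.", "permission"),
]
CONTRACT_TRAIN = [
    ("The tenant is obliged to pay the rent.", "duty"),
    ("The tenant is obliged to clean the stairs.", "duty"),
    ("The tenant is allowed to keep pets.", "permission"),
    ("The tenant is allowed to sublet.", "permission"),
]
CONTRACT_TEST = [
    ("The tenant is obliged to repair minor damage.", "duty"),
    ("The tenant is allowed to install an antenna.", "permission"),
]

law_model = train(LAW_TRAIN)            # trained on statute text (source document type)
contract_model = train(CONTRACT_TRAIN)  # reference model trained on contracts
cross_acc = accuracy(law_model, CONTRACT_TEST)
ref_acc = accuracy(contract_model, CONTRACT_TEST)
print(f"law -> contracts: {cross_acc:.2f}, contracts -> contracts: {ref_acc:.2f}")
```

On these toy sentences the script prints `law -> contracts: 0.50, contracts -> contracts: 1.00`: the statute-trained model has never seen the contract phrasing "is allowed to" and falls back on shared words such as "tenant". The numbers only show how a transferred model can be compared against an in-domain reference and say nothing about the paper's actual results.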

Keywords: legal sentence classification, portability of machine learning models, natural language processing, text mining

Files and Subpages

Name	Type	Size	Last Modification	Last Editor
Gl18b.pdf	File	97 KB	18.12.2018