Master's Thesis Johannes Muhr

Last modified Jul 11, 2017

Design, Prototypical Implementation, and Evaluation of an Active Machine Learning Service in the Context of Legal Text Classification

Abstract

 

Today, large quantities of legal texts are produced, stored digitally, and later retrieved for work, to the extent that manually classifying these documents and manually processing their content has become infeasible. This study addresses this business need by implementing a microservice (LexML) for legal document and norm classification that applies the concept of active machine learning. Following an evaluation of possible solutions for (legal) text classification and (active) machine learning in the existing literature, LexML was implemented using Apache Spark MLlib as the machine learning framework. Within the scope of this study, the existing functionality of the legal data-science environment Lexia was utilized. Various classifiers and query strategies were implemented and evaluated using German legal data. Overall, active learning strategies outperform traditional machine learning in terms of both the speed of learning and maximum accuracy. The results of the document and norm classification experiments vary greatly: for document classification, Naïve Bayes and Multi-Layer Perceptron outperform Logistic Regression, whereas for norm classification Logistic Regression is clearly superior to the other two.
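To illustrate what a query strategy does in active learning, the following minimal sketch ranks unlabelled documents by the entropy of their predicted class probabilities and picks the most uncertain ones for manual labelling. It is not taken from LexML: entropy-based uncertainty sampling is only one common strategy, chosen here as an assumption, and the function names and probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, batch_size=1):
    """Return indices of the unlabelled items the model is least certain about."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:batch_size]


# Hypothetical class probabilities predicted for four unlabelled documents.
pool_probs = [
    [0.95, 0.03, 0.02],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],  # fairly confident
    [0.34, 0.33, 0.33],  # almost uniform, most uncertain
]

# Documents to hand to a human annotator in the next round.
queries = select_queries(pool_probs, batch_size=2)
print(queries)
```

In a full active learning loop, the selected documents would be labelled by an expert, added to the training set, and the classifier retrained before the next selection round.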

 
