
Bachelor's Thesis Oliver Knapp

Last modified Jun 3, 2020

Metadata Extraction of German Legal Judgments

Legal judgments contain a great deal of information, which can be roughly divided into two categories: semantic information, contained in the context and conclusions of a judgment, and metadata, i.e. general properties of the document's content such as the publication date or the reference number. Neither the semantic information nor the metadata is provided in a digitally structured data format. In the era of automatic electronic data processing, such a structured representation is a significant advantage for workflows involving legal judgments, so some publishers already provide judgments that have been processed for machine readability. However, this processing is usually done manually, and the results differ in format from publisher to publisher.

This thesis examines the technical possibilities and difficulties of automating a defined subset of this processing for German legal judgments. It presents and benchmarks an implementation of a program that extracts a defined subset of information from legal judgments provided as plain text. The targeted pieces of information all belong to the metadata category. Both rule-based approaches and trained models are applied.
Furthermore, judgments are classified into the categories 'civil law' and 'criminal law' so that they can be segmented afterwards. This distinction is necessary because civil-law and criminal-law judgments differ in how they are segmented. Machine learning is used for segmentation and for the classification of sentences. The manually processed judgments mentioned above serve both as training data for machine learning and for measuring the performance of the implemented program.
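The abstract does not name the model used to classify judgments as civil or criminal law. Purely as an illustration, the sketch below swaps in a multinomial naive Bayes classifier over bag-of-words tokens, written in plain Python. The class `NaiveBayes`, the labels `"civil"` and `"criminal"`, and the four toy training sentences are hypothetical and are not taken from the thesis; in practice the manually processed judgments would replace the toy data, and the same scheme could in principle assign segment labels to individual sentences.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-cased word tokens; \w also matches German umlauts in Python 3.
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (illustrative sketch only)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.labels = sorted(set(labels))
        self.log_priors = {}
        self.counts = {}
        self.totals = {}
        vocab = set()
        for label in self.labels:
            label_docs = [d for d, y in zip(docs, labels) if y == label]
            # Log prior: share of training documents carrying this label.
            self.log_priors[label] = math.log(len(label_docs) / len(docs))
            counts = Counter(tok for d in label_docs for tok in tokenize(d))
            self.counts[label] = counts
            self.totals[label] = sum(counts.values())
            vocab.update(counts)
        self.vocab_size = len(vocab)
        return self

    def log_score(self, doc: str, label: str) -> float:
        # log P(label) + sum over tokens of log P(token | label), smoothed with alpha.
        denom = self.totals[label] + self.alpha * self.vocab_size
        score = self.log_priors[label]
        for tok in tokenize(doc):
            score += math.log((self.counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.labels, key=lambda label: self.log_score(doc, label))


# Hypothetical toy training data; real training data would be the
# manually processed judgments.
train = [
    ("Der Kläger verlangt vom Beklagten Schadensersatz aus dem Kaufvertrag.", "civil"),
    ("Die Klage wird abgewiesen. Der Kläger trägt die Kosten des Rechtsstreits.", "civil"),
    ("Der Angeklagte wird wegen Diebstahls zu einer Freiheitsstrafe verurteilt.", "criminal"),
    ("Die Revision des Angeklagten gegen das Urteil des Landgerichts wird verworfen.", "criminal"),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("Die Klage des Klägers wird abgewiesen."))  # prints "civil"
```

Laplace smoothing (`alpha`) keeps a token that never occurred with a label from driving that label's score to zero, and summing log probabilities avoids numeric underflow on long judgments.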

The results show that the rule-based extraction of metadata from the header of a judgment performs well, as do the classification of paragraphs and the segmentation.
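To make the rule-based part more concrete, here is a minimal sketch of header extraction with regular expressions, assuming a header in the style of a German federal court judgment. The function `extract_header_metadata`, the simplified reference-number (Aktenzeichen) pattern of senate, register sign and running number/two-digit year (e.g. `VI ZR 123/19`), and the two supported date formats are illustrative assumptions, not the rules implemented in the thesis; real headers vary considerably between courts.

```python
import re
from datetime import date

# German month names, needed for dates such as "12. Mai 2020".
MONTHS = {
    "Januar": 1, "Februar": 2, "März": 3, "April": 4, "Mai": 5, "Juni": 6,
    "Juli": 7, "August": 8, "September": 9, "Oktober": 10, "November": 11,
    "Dezember": 12,
}

# Hypothetical, simplified reference-number pattern: senate (Roman or Arabic
# numeral), register sign (e.g. "ZR", "StR"), running number / two-digit year.
REFERENCE_NUMBER = re.compile(r"\b(?:[IVX]+|\d+)\s+[A-Z][A-Za-z]{0,3}\s+\d+/\d{2}\b")
# Long date form, e.g. "12. Mai 2020".
DATE_LONG = re.compile(r"\b(\d{1,2})\.\s*(" + "|".join(MONTHS) + r")\s+(\d{4})\b")
# Numeric date form, e.g. "03.06.2020".
DATE_NUMERIC = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def extract_header_metadata(header: str) -> dict:
    """Return the first reference number and the first date found in a header."""
    meta = {"reference_number": None, "date": None}

    match = REFERENCE_NUMBER.search(header)
    if match:
        meta["reference_number"] = match.group(0)

    # Prefer the long form; fall back to the numeric form.
    match = DATE_LONG.search(header)
    if match:
        day, month_name, year = match.groups()
        meta["date"] = date(int(year), MONTHS[month_name], int(day))
    else:
        match = DATE_NUMERIC.search(header)
        if match:
            day, month, year = match.groups()
            meta["date"] = date(int(year), int(month), int(day))
    return meta


# Hypothetical example header.
header = """BUNDESGERICHTSHOF
IM NAMEN DES VOLKES
URTEIL
VI ZR 123/19
Verkündet am:
12. Mai 2020"""

print(extract_header_metadata(header))
```

Restricting the search to the header keeps such patterns from matching citations of other judgments in the body of the text, which would otherwise produce false positives.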

Files and Subpages

Name | Size | Last modified
Knapp Bachelor_Thesis.pdf | 1.17 MB | 03.06.2020
Knapp Final_Slides(1).pptx | 3.25 MB | 03.06.2020
Knapp Kick-off_Slides.pptx | 1.17 MB | 03.06.2020