Legal judgments contain a large amount of information. This information can be roughly divided into two categories: semantic information, which lies in the context and conclusions of a judgment, and metadata, meaning general properties of the document such as the publishing date or the reference number. Neither semantic information nor metadata is provided in a digitally structured data format. In the era of automatic electronic data processing, a structured presentation would be a significant advantage for workflows involving legal judgments, so some publishers provide judgments that have already been processed for machine-readability. This processing, however, is usually done manually, and the resulting formats differ from publisher to publisher.
This thesis examines the technical possibilities and difficulties of automating a defined subset of this processing for German legal judgments. It presents and benchmarks an implementation of a program that extracts a defined subset of information from legal judgments provided as plain text. The targeted pieces of information belong to the metadata category. Both rule-based approaches and trained models are applied.
Furthermore, judgments are classified as either 'civil law' or 'criminal law' so that they can be segmented afterwards. This distinction is necessary because civil-law and criminal-law judgments are structured differently and therefore have to be segmented differently. Machine learning is used for the segmentation and for the classification of sentences. The manually processed judgments mentioned above serve as training data for the machine learning models and as reference data for measuring the performance of the implemented program.
The results show that the rule-based extraction of metadata from the header of a judgment, the classification of paragraphs, and the segmentation all perform well.
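As an illustration of what rule-based metadata extraction from a judgment header can look like, the following minimal sketch pulls a date and a reference number out of the first lines of a plain-text judgment. It is not the implementation from the thesis: the function name `extract_header_metadata`, the two regular expressions, the `header_lines` parameter, and the sample text are all assumptions made for this example, and real headers would need considerably more patterns.

```python
import re
from datetime import date

# Hypothetical patterns for illustration only; they are not taken from the thesis.
# Numeric German date such as "03.06.2020".
DATE_RE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")
# Reference number such as "VI ZR 123/19" or "2 StR 45/18":
# senate (Roman or Arabic numeral), register code, running number / two-digit year.
REF_RE = re.compile(r"\b(?:[IVXL]+|\d+)\s+[A-Z][A-Za-z]{1,4}\s+\d+/\d{2}\b")


def extract_header_metadata(text, header_lines=15):
    """Return date and reference number found in the first lines of a judgment."""
    # Assumption: the metadata of interest appears near the top of the document.
    header = "\n".join(text.splitlines()[:header_lines])
    meta = {"date": None, "reference_number": None}

    match = DATE_RE.search(header)
    if match:
        day, month, year = map(int, match.groups())
        try:
            meta["date"] = date(year, month, day)
        except ValueError:
            # Looks like a date but is not a valid calendar day; leave it unset.
            pass

    match = REF_RE.search(header)
    if match:
        meta["reference_number"] = match.group(0)

    return meta


# Made-up sample header, not a real judgment.
sample = "BUNDESGERICHTSHOF\nURTEIL\nVI ZR 123/19\nVerkündet am: 03.06.2020\n"
result = extract_header_metadata(sample)
```

Restricting the search to the first few lines keeps the rules from matching dates and reference numbers of other decisions cited in the body of the judgment, at the cost of missing metadata that appears further down.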
Name | Type | Size | Last Modification | Last Editor |
---|---|---|---|---|
Knapp Bachelor_Thesis.pdf | | 1,17 MB | 03.06.2020 | |
Knapp Final_Slides(1).pptx | | 3,25 MB | 03.06.2020 | |
Knapp Kick-off_Slides.pptx | | 1,17 MB | 03.06.2020 | |