Customizable Anonymization of German Legal Court Rulings using Domain-specific Named Entity Recognition
In the legal domain, published court decisions are a vital resource for legal researchers and developers of data-driven legal software. However, original court rulings usually contain sensitive information, so their publication depends heavily on the underlying anonymization process. Legal documents are mainly anonymized manually by trained employees, a process generally considered inefficient and error-prone. Additionally, previous research has shown that generalized automated anonymization fails to adapt to the vastly different anonymization standards of individual courts. Interviews with employees of different courts suggest that judges prefer customizable anonymization solutions that can be flexibly adapted to case-specific requirements.
In this work, we propose and evaluate a customizable approach to automatically anonymizing court decisions using predefined configurations. The approach uses a trained machine learning model to detect domain-specific named entities in the text paragraphs of original court rulings and masks sensitive entity types according to the predefined rules. The detected entity types were specifically chosen for this anonymization task and may be extended in future work.
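The configuration-driven masking step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Entity` type, the `anonymize` function, and the entity labels `PER` and `LOC` are hypothetical, and the entity spans are hard-coded in place of the trained model's predictions. A configuration maps each entity type to a replacement string; types missing from the configuration are left unmasked, so different courts could apply different configurations to the same model output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset where the entity begins (inclusive)
    end: int    # character offset where the entity ends (exclusive)
    label: str  # entity type; hypothetical stand-in for the NER model's output


def anonymize(text: str, entities: list[Entity], config: dict[str, str]) -> str:
    """Replace every entity whose type appears in `config` with its mask.

    Entity types absent from `config` are kept verbatim.
    """
    parts = []
    cursor = 0  # end of the last masked span
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.start < cursor:
            continue  # skip spans that overlap an already-masked span
        replacement = config.get(ent.label)
        if replacement is None:
            continue  # this court's configuration keeps the entity type
        parts.append(text[cursor:ent.start])
        parts.append(replacement)
        cursor = ent.end
    parts.append(text[cursor:])
    return "".join(parts)


# Usage with hard-coded spans in place of model predictions (hypothetical).
text = "Der Kläger Max Mustermann wohnt in Berlin."
per = text.index("Max Mustermann")
loc = text.index("Berlin")
entities = [
    Entity(per, per + len("Max Mustermann"), "PER"),
    Entity(loc, loc + len("Berlin"), "LOC"),
]

# Two hypothetical court configurations applied to the same detections.
strict = {"PER": "[PERSON]", "LOC": "[ORT]"}
lenient = {"PER": "[PERSON]"}
print(anonymize(text, entities, strict))   # Der Kläger [PERSON] wohnt in [ORT].
print(anonymize(text, entities, lenient))  # Der Kläger [PERSON] wohnt in Berlin.
```

Keeping detection and masking separate means the model runs once per document, while the masking rules remain a small, court-specific configuration.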