Progressive digitalization has simplified the transformation, administration, evaluation, and transmission of data, which makes data privacy increasingly relevant. New risks and possibilities for abuse arise, and with them new challenges for the protection of personal data. To meet these changes, the regulations that are binding for companies have to be adapted to the changing conditions.
As a result, the General Data Protection Regulation (GDPR) was adopted on 27 April 2016. It regulates data protection and privacy for all companies operating in the European Union. Private companies and public bodies had two years to implement these rules. In case of a violation, fines of up to 20 million euros or 4% of the worldwide annual turnover of the previous financial year, whichever is greater, can be imposed. The size of these fines underlines the importance of the regulation.
The goal of this master's thesis is to implement a tool for the automatic analysis of privacy policies with regard to their coverage of the data subject rights granted by the GDPR. Supervised machine learning approaches seem well suited to this kind of problem. The problem can be framed either as a sentence-by-sentence classification task or as a sequence labelling task. An annotated corpus of privacy policies will be provided by the chair and can be used for training, testing, and evaluation. The corpus includes annotations at both the sentence level and the token level. Hence, the thesis should investigate both approaches and compare them against each other. A suitable UI for applying the different approaches and visualizing the results is also part of the thesis. The prototype shall be evaluated quantitatively.
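To make the two framings comparable, token-level predictions can be collapsed to sentence-level labels and both approaches scored with the same metric. The following is only a minimal sketch under stated assumptions: the BIO tagging scheme, the label name `RIGHT_TO_ERASURE`, and the helper names `sentence_labels` and `precision_recall_f1` are hypothetical and are not taken from the corpus or its annotation guidelines.

```python
from typing import List, Set, Tuple


def sentence_labels(tags: List[str]) -> Set[str]:
    """Collapse token-level BIO tags (assumed scheme) into sentence-level labels."""
    # "B-X" and "I-X" both contribute label "X"; "O" marks tokens outside any span.
    return {tag.split("-", 1)[1] for tag in tags if tag != "O"}


def precision_recall_f1(
    gold: List[Set[str]], pred: List[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-sentence label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Sequence labelling view: one tag per token (hypothetical example sentence).
tokens = ["You", "may", "ask", "us", "to", "delete", "your", "data", "."]
tags = [
    "O", "O", "O", "O", "O",
    "B-RIGHT_TO_ERASURE", "I-RIGHT_TO_ERASURE", "I-RIGHT_TO_ERASURE",
    "O",
]

# Sentence classification view of the same sentence: a set of labels.
labels = sentence_labels(tags)
```

Scoring both approaches at the sentence level is one possible design choice for the comparison; a span-level or token-level metric would additionally reward the sequence labeller for locating the relevant phrase within a sentence.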