
Computer-aided Analysis of Privacy Policies

Last modified Oct 11

Progressive digitalization has made it far easier to transform, administer, evaluate, and transmit data, and data privacy has become increasingly relevant as a result. New risks and possibilities for abuse arise, and with them new challenges for the protection of personal data. To meet these changes, the regulations that are binding on companies have to be adapted to the changing conditions.

To this end, the General Data Protection Regulation (GDPR) was adopted on 27 April 2016. It regulates data protection and privacy for all companies operating in the European Union. Private companies and public bodies had two years, until 25 May 2018, to implement these rules. In case of violation, fines of up to 20 million euros or 4% of the worldwide annual turnover of the preceding financial year, whichever is greater, can be imposed, which underlines the importance of this regulation.

Chapter 3 of the GDPR deals with the rights of the data subject. A privacy policy must address each of these rights: (1, Article 15) Right to access, (2, Article 16) Right to rectification, (3, Article 17) Right to erasure, (4, Article 18) Right to restrict processing, (5, Article 20) Right to data portability, (6, Article 21) Right to object, (7, Article 22) Right not to be subject to automated decisions. It is crucial for companies to ensure that these rights are covered by their own privacy policies. On the other hand, end consumers also have an interest in seeing whether an existing privacy policy implements the GDPR properly.

The goal of this master's thesis is to implement a tool for the automatic analysis of privacy policies with regard to their coverage of these data subject rights. Supervised machine learning approaches seem well suited to this kind of problem. The problem at hand can be seen either as a classification task on a sentence-by-sentence basis or as a sequence labelling task. An annotated corpus of privacy policies will be provided by the chair and can be used for training, testing, and evaluation. The corpus includes annotations at both the sentence level and the token level. Hence, the thesis should investigate both approaches and compare them against each other. A proper UI for using the different approaches and visualizing their results is part of the thesis, too. The prototype shall be evaluated quantitatively.
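To illustrate how the two problem formulations relate, here is a minimal Python sketch. It does not reflect the actual corpus format, which is not described here: the BIO tagging scheme, the label names in `RIGHTS`, the `AnnotatedSentence` class, and the example sentence are all assumptions made for illustration. The sketch shows how token-level tags (the sequence labelling view) can be collapsed into sentence-level labels (the classification view), and how either view yields a coverage check over the seven rights.

```python
from dataclasses import dataclass

# Hypothetical label set: one label per data subject right (Articles 15-22).
RIGHTS = [
    "ACCESS", "RECTIFICATION", "ERASURE", "RESTRICTION",
    "PORTABILITY", "OBJECT", "AUTOMATED_DECISION",
]


@dataclass
class AnnotatedSentence:
    tokens: list[str]
    tags: list[str]  # assumed BIO scheme: "B-<RIGHT>", "I-<RIGHT>" or "O"


def extract_spans(sentence):
    """Sequence labelling view: (right, start, end) spans, end exclusive."""
    spans = []
    start, current = None, None
    # A trailing "O" acts as a sentinel that closes a span at sentence end.
    for i, tag in enumerate(sentence.tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == current
        if current is not None and not continues:
            spans.append((current, start, i))
            start, current = None, None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
    return spans


def sentence_labels(sentence):
    """Classification view: the set of rights a sentence mentions."""
    return {right for right, _, _ in extract_spans(sentence)}


def missing_rights(policy):
    """Rights from RIGHTS that no sentence of the policy covers."""
    covered = set()
    for sentence in policy:
        covered |= sentence_labels(sentence)
    return [right for right in RIGHTS if right not in covered]


# Invented example sentence, not taken from the corpus.
example = AnnotatedSentence(
    tokens=["You", "may", "request", "deletion", "of", "your", "data", "."],
    tags=["O", "O", "B-ERASURE", "I-ERASURE", "O", "O", "O", "O"],
)
```

In this sketch, `extract_spans(example)` locates the phrase "request deletion" as an erasure span, while `sentence_labels(example)` only records that the sentence concerns erasure. A sentence classifier would predict the latter directly, and a sequence labeller would predict the tags from which both are derived.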


The following prior knowledge is necessary:
- Experience in software development with Java and Python
- Experience in Machine Learning
- Either experience in NLP or the willingness to dedicate time to familiarizing yourself with NLP

If you are interested, please contact Ingo Glaser with your application, including a letter of motivation, a current CV, and a transcript of records.
