Abstract
In Natural Language Processing (NLP), the advancement and refinement of Large Language Models (LLMs) rely heavily on the quality and quantity of textual data. However, users, as the primary source of such data, face inherent privacy risks when the text they share contains sensitive information. Drawing a parallel between Differential Privacy (DP) mechanisms employed in database systems and their application to text underscores a shared emphasis on privacy preservation. Yet the question arises: how can we identify and assess DP rewriting mechanisms with respect to their efficacy in privatizing textual content?
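To make the parallel to database systems concrete, the classic DP building block there is the Laplace mechanism: noise calibrated to a query's sensitivity and the privacy budget epsilon is added to a numeric result. A minimal sketch (the function name is illustrative, not from any specific library):

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    """Release a counting-query result under epsilon-DP via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: a noisy count; smaller epsilon means more noise and stronger privacy.
rng = np.random.default_rng(0)
noisy = laplace_count(100, epsilon=0.5, rng=rng)
```

Text privatization mechanisms transfer this idea from numeric query results to words, sentences, or whole documents.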
To address this question, a systematic literature review will be conducted to identify classic and metric DP mechanisms, which will then be categorized according to their granularity: word-, sentence-, or document-level DP. Once a representative sample of these mechanisms has been identified, their privacy budgets (epsilon) must be aligned across granularities to ensure comparability. The next step of the research methodology involves implementing the selected mechanisms and aggregating them into a repository. Finally, the chosen mechanisms will be evaluated by users on a set of textual samples across a range of epsilon values.
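As an illustration of the word-level granularity, a common metric-DP approach (in the style of the mechanism of Feyisetan et al.) perturbs a word's embedding with noise whose density is proportional to exp(-epsilon * ||z||) and then outputs the vocabulary word nearest to the noisy point. A minimal sketch, assuming toy two-dimensional embeddings (the vocabulary and function names here are illustrative):

```python
import numpy as np

def metric_dp_noise(dim, epsilon, rng):
    """Sample noise with density proportional to exp(-epsilon * ||z||).

    Standard construction: a uniformly random direction scaled by a
    magnitude drawn from Gamma(shape=dim, scale=1/epsilon).
    """
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return direction * magnitude

def privatize_word(word, vocab_embeddings, epsilon, rng):
    """Replace a word by the vocabulary word nearest to its noised embedding."""
    vec = vocab_embeddings[word]
    noisy = vec + metric_dp_noise(len(vec), epsilon, rng)
    return min(vocab_embeddings,
               key=lambda w: np.linalg.norm(vocab_embeddings[w] - noisy))

# Toy vocabulary (illustrative embeddings, not from a trained model).
toy_vocab = {
    "cat": np.array([0.0, 0.0]),
    "dog": np.array([1.0, 0.0]),
    "car": np.array([5.0, 5.0]),
}
replacement = privatize_word("cat", toy_vocab, epsilon=0.5,
                             rng=np.random.default_rng(0))
```

At large epsilon the noise is small and words tend to map to themselves; at small epsilon replacements become more random, which is exactly the privacy/utility trade-off the planned user study would probe.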
Research Questions
1. How can one systematically structure currently available implementations of differential privacy text privatization mechanisms?
2. How can a representative sample of the identified mechanisms be implemented and evaluated in a user study?
3. What insights can be gained about the factors influencing user perception of differentially private text privatization?
Expected Outcome