
Master's Thesis: Furkan Yilmaz


Assessing the Resilience of Word-Level Differential Privacy Mechanisms: An Adversarial Approach


Abstract

Natural language processing (NLP) techniques are now firmly embedded in Big Data pipelines, where they handle a wide range of text-based tasks. Using these methods, however, may involve processing private or sensitive data, and privacy in natural language processing has consequently received growing attention. A prominent and extensively studied strategy is Differential Privacy (DP), and much research has focused on how to integrate this privacy-enhancing tool into NLP models. Nevertheless, one aspect that requires further study is the assessment of word-level DP mechanisms against adversarial NLP models. This thesis therefore investigates privacy unveiling, with the objective of recovering the original text inputs from the privatized, obfuscated outputs generated by word-level DP mechanisms. Our study focuses on identifying the prerequisites an attacker must satisfy to draw correct inferences from privatized data, which involves examining the training methodologies and architectural features essential to a successful adversary model. We also address suitable metrics for comprehensively assessing the success of these adversary models. Our research thus contributes to the field by developing a benchmark for evaluating and comparing the resilience of word-level DP mechanisms against an attacker's attempts to infer the original text inputs.
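To make the threat model concrete, the sketch below illustrates one widely studied family of word-level DP mechanisms: a word's embedding is perturbed with noise calibrated to a privacy parameter epsilon, and the noisy vector is decoded back to the nearest vocabulary word (in the spirit of metric-DP mechanisms such as Madlib by Feyisetan et al.). This is an illustrative assumption, not necessarily the specific mechanisms evaluated in the thesis; the toy embeddings and the function name privatize_word are invented for the example.

    import numpy as np

    # Toy embedding table; a real mechanism would use pretrained word
    # vectors (e.g., GloVe). The values here are purely illustrative.
    EMBEDDINGS = {
        "good":  np.array([0.9, 0.1]),
        "great": np.array([0.8, 0.2]),
        "bad":   np.array([0.1, 0.9]),
    }

    def privatize_word(word, epsilon, rng=None):
        """Perturb a word's embedding with noise whose density decays as
        exp(-epsilon * ||z||), then decode to the nearest vocabulary word."""
        rng = rng or np.random.default_rng()
        vec = EMBEDDINGS[word]
        d = vec.shape[0]
        # Multivariate Laplace-style noise: a uniformly random direction
        # scaled by a Gamma(d, 1/epsilon)-distributed magnitude.
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)
        noisy = vec + rng.gamma(shape=d, scale=1.0 / epsilon) * direction
        # Nearest-neighbour decoding back into the vocabulary.
        return min(EMBEDDINGS, key=lambda w: np.linalg.norm(noisy - EMBEDDINGS[w]))

    # Each word of a sentence is privatized independently; smaller epsilon
    # means more noise, so the output word deviates more often.
    print([privatize_word(w, epsilon=5.0) for w in ["good", "bad"]])

An adversary model of the kind studied in this thesis would observe only the privatized output words and attempt to reconstruct the original inputs.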


Research Questions

  1. What prerequisites must be satisfied for an attacker to infer the original plaintext from privatized, obfuscated text outputs?
  2. What training methodologies and architectural features are essential for a model to be able to retrieve the original plaintext?
  3. Which methods are suitable for assessing how well these models recover the original plaintext? (A minimal sketch of one candidate metric follows this list.)
  4. How can a benchmark be devised to evaluate and compare the resilience of alternative word-level differential privacy mechanisms for privatizing textual inputs against an attacker's attempts to infer the original inputs?
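Regarding question 3, one natural baseline metric is token-level reconstruction accuracy: the fraction of positions at which the adversary's guess matches the original token. The minimal sketch below assumes aligned token sequences; the function name is invented for illustration, and the thesis may employ additional or different metrics.

    def reconstruction_accuracy(original_tokens, reconstructed_tokens):
        """Token-level top-1 accuracy of an adversary's reconstruction."""
        if len(original_tokens) != len(reconstructed_tokens):
            raise ValueError("token sequences must be aligned")
        matches = sum(o == r for o, r in zip(original_tokens, reconstructed_tokens))
        return matches / len(original_tokens)

    # Example: the adversary recovers two of three tokens -> ~0.67 accuracy.
    print(reconstruction_accuracy(["the", "film", "was"], ["the", "movie", "was"]))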
