
Bachelor's Thesis Natalia Milanova

Last modified Nov 1, 2023

Investigating the State-of-the-Art of Differential Privacy in Natural Language Processing

Natural Language Processing (NLP) has become an essential tool for a wide range of applications, including chatbots and sentiment analysis. Its rapid growth, however, has also raised privacy concerns, because the technology often relies on large datasets that contain sensitive user data. To address these concerns, researchers have explored differential privacy, a technique that limits how much the result of a data analysis can reveal about any single individual while still providing valuable aggregate insights. Today, it is used by researchers, companies and even governments \parencite{kapelke}. Applying differential privacy in NLP nevertheless comes with its own set of challenges, such as handling textual data and adding noise while preserving the utility and semantics of the data \parencite{Thaine}.
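To make "adding noise while preserving utility" concrete, here is a minimal sketch of the Laplace mechanism, the textbook way to achieve ε-differential privacy for a numeric query. It is an illustration under stated assumptions, not the method studied in this thesis. The toy message list, the keyword predicate and the helper names `laplace_noise` and `private_count` are hypothetical. The query is a simple count whose sensitivity is 1. Smaller values of ε give stronger privacy but noisier, less useful answers.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with rate 1/scale
    is Laplace-distributed with location 0 and the given scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    records: Iterable[str],
    predicate: Callable[[str], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Release the number of matching records with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale sensitivity/epsilon yields epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy corpus: how many messages mention a doctor?
    messages = [
        "my appointment with the doctor is tomorrow",
        "great weather today",
        "the doctor changed my medication",
        "meeting moved to friday",
    ]
    rng = random.Random(0)
    for eps in (0.1, 1.0, 10.0):
        noisy = private_count(messages, lambda m: "doctor" in m, eps, rng)
        print(f"epsilon={eps}: noisy count = {noisy:.2f} (true count = 2)")
```

Running the script prints one noisy count per ε. With smaller ε, the released counts tend to deviate further from the true count of 2, which is the privacy/utility trade-off referred to above.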

Since most academic papers focus on developing differentially private algorithms, this paper instead examines how differential privacy can appropriately be applied when it is mapped to NLP use cases. It also considers the challenges and benefits that arise from combining the two.

To this end, this paper builds on a literature review of differential privacy and NLP. It analyzes the properties of differential privacy that are particularly valuable for its practical application and lays a foundation for how these properties are applied in NLP. For this purpose, it adopts the approach of Gallersdörfer and Matthes \parencite{gallersdoerfer}, originally proposed for assessing blockchain use cases, and examines the characteristics of differential privacy from two perspectives. The first covers the features that are mandatory from a technical point of view. The second covers the additional properties that make this privacy-preserving technology a suitable choice.

Interviews are then conducted to extend this knowledge with insights into differential privacy in NLP use cases. Our main findings describe the characteristics of differential privacy that define its practical application from these two perspectives. They also explain how these characteristics are applied in NLP use cases and identify the current challenges and success factors of implementing differential privacy in NLP. Finally, the paper outlines the limitations of this study and presents opportunities for future research that builds on these findings.

In summary, this Bachelor’s thesis addresses the following research questions:

  • RQ1: What are the properties of differential privacy that define its practical application settings?
  • RQ2: How can these properties be mapped appropriately to natural language processing (NLP) use cases?
  • RQ3: What are the barriers to adopting differential privacy in NLP, and what success factors have been observed?

 

References

[1] C. Kapelke, "Using differential privacy to harness big data and preserve privacy," Brookings, Aug. 11, 2020. [Online]. Available: https://www.brookings.edu/techstream/using-differential-privacy-to-harness-big-data-and-preserve-privacy/

[2] P. Thaine, "Differentially Private Natural Language Processing," Medium, Jan. 28, 2019. [Online]. Available: https://medium.com/privacy-preserving-natural-language-processing/differentially-private-natural-language-processing-4f18912c5de0

[3] U. Gallersdörfer and F. Matthes, "Towards Valid Use Cases: Requirements and Supporting Characteristics of Proper Blockchain Applications," in 2020 Seventh International Conference on Software Defined Systems (SDS), Apr. 2020, pp. 202–207.


Files and Subpages

  • 230703 Milanova BT Kick-off.pptx (1,37 MB, last modified 06.11.2023)
  • 231015 Milanova Bachelor Thesis.pdf (1,02 MB, last modified 06.11.2023)
  • 231023 Milanova BT Final_Presentation_DPNLP.pptx (2,12 MB, last modified 06.11.2023)