Studying the Privacy-Utility Trade-off of Word Embedding Perturbations with a Focus on Sensitivity Analysis and Vector Mapping
Text data is a primary medium for communicating information and forms the basis of many natural language processing (NLP) tasks. A successful approach to using text data in NLP tasks is to represent individual words as word embedding vectors. However, such vector representations can pose privacy risks, as they may reveal information about the text or its author. Differential Privacy (DP) is a useful countermeasure to mitigate these risks. One class of DP methods perturbs the vector representations by adding noise, thereby privatizing the word embeddings. While adding more noise yields stronger privacy guarantees, it can also distort the semantics encoded in the original word embedding and reduce the utility of the vectors for downstream NLP tasks. Finding a suitable trade-off between privacy and utility is therefore crucial for creating effective private word embeddings. The goal of this thesis is to explore the privacy-utility trade-off in the context of embedding vector perturbation methods. The focus lies on investigating the impact of two key factors on privacy and utility: i) different approaches for estimating sensitivity and ii) mapping noisy word embeddings to similar embedding vectors associated with real words. A better understanding of their impact will help adjust the trade-off between privacy and utility for downstream NLP tasks. The following research questions have been defined to guide the achievement of this goal:
Name | Type | Size | Last Modification | Last Editor
---|---|---|---|---
230807 MT_AlishaRiecker_KickOffPresentation_final.pdf | pdf | 1,39 MB | 15.12.2023 | |
230807 MT_AlishaRiecker_KickOffPresentation_final.pptx | pptx | 1,93 MB | 20.12.2023 | |
231215 MasterThesis_AlishaRiecker.pdf | pdf | 781 KB | 15.12.2023 | |
231218 MT_AlishaRiecker_FinalPresentation.pptx | pptx | 1,94 MB | 20.12.2023 | |
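The perturbation-and-mapping pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the thesis's actual method: it assumes a toy two-dimensional vocabulary, per-coordinate Laplace noise calibrated by a given sensitivity and privacy budget epsilon, and Euclidean nearest-neighbor search for the mapping step; all names (`perturb`, `map_to_nearest`, the example words) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary of 2-D word embeddings.
vocab = {
    "cat": np.array([0.9, 0.1]),
    "dog": np.array([0.8, 0.2]),
    "car": np.array([0.1, 0.9]),
}

def perturb(vec, sensitivity, epsilon, rng):
    """Privatize an embedding by adding Laplace noise with scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return vec + rng.laplace(0.0, scale, size=vec.shape)

def map_to_nearest(noisy_vec, vocab):
    """Map a noisy vector back to the real word whose embedding is closest."""
    return min(vocab, key=lambda w: np.linalg.norm(vocab[w] - noisy_vec))

noisy = perturb(vocab["cat"], sensitivity=1.0, epsilon=5.0, rng=rng)
word = map_to_nearest(noisy, vocab)
```

The sketch makes the two factors under study visible: a larger sensitivity estimate (or smaller epsilon) increases the noise scale and degrades utility, while the mapping step trades some of the noise-induced deniability for outputs that remain real, usable words.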