Master Thesis Bhawna Saini

Last modified Jul 19, 2021

Design and Implementation of a Data Utility Analysis Tool to Optimize the Application of De-Identification Techniques

Abstract

The ongoing technological shift and the mass adoption of digital technologies have led to the generation of enormous amounts of data. This data, which often contains personal information, is highly valuable to data-driven organizations for developing personalized products and services for their customers. However, such data may only be collected and processed in accordance with privacy regulations around the world, which creates a trade-off between ensuring user privacy and using the data to its full potential. Organizations therefore turn to data de-identification, a privacy-enhancing process that aims to protect user privacy while preserving as much data utility as possible.

Although de-identification enhances privacy, applying de-identification techniques directly reduces data utility because of the resulting information loss. Data utility metrics quantify this information loss by measuring the change in data utility, and thereby help to understand the effects of de-identification techniques on a dataset. In this thesis, we propose that, to de-identify data effectively and optimally, the data utility analysis process should be combined with the data de-identification process. Doing so allows a better de-identification strategy to be chosen for the dataset.

The thesis tests this claim in the context of an automotive enterprise, with the aim of enhancing its existing de-identification process. We develop a data utility analysis tool that allows the user to de-identify data and then assess its utility through various utility metrics. To implement this tool, we explore various de-identification techniques and utility metrics. Additionally, we conduct interviews with privacy experts at the industry partner to understand the de-identification process and to derive the technical requirements that such a tool must meet in a large enterprise. Finally, we evaluate the effectiveness of the tool by testing it on a real automotive dataset and with the privacy experts. Based on their feedback, we discuss potential use cases of such a tool, future enhancements, and its limitations.
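As a rough illustration of what a data utility metric measures, the sketch below computes the Normalized Certainty Penalty (NCP), a commonly used information-loss measure for generalized numeric attributes, for an age column generalized into 10-year ranges. NCP is chosen here only as an example; it is not necessarily one of the metrics implemented in the thesis tool, and the function names `ncp_numeric` and `generalize_age` are hypothetical. Under this metric, 0.0 means no information loss and 1.0 means every value was generalized to the attribute's full observed range.

```python
from statistics import mean


def generalize_age(age, width=10):
    """Hypothetical helper: generalize an age to a fixed-width range, e.g. 27 -> (20, 29)."""
    low = (age // width) * width
    return (low, low + width - 1)


def ncp_numeric(original, generalized):
    """Average Normalized Certainty Penalty (NCP) of one numeric attribute.

    original:    raw values, used only to determine the attribute's domain.
    generalized: one (low, high) interval per record after de-identification.
    Intervals are clipped to the observed domain, so the result lies in [0, 1].
    """
    dmin, dmax = min(original), max(original)
    domain = dmax - dmin
    if domain == 0:
        # All raw values are identical, so generalization loses no information.
        return 0.0
    # Width of each clipped interval relative to the width of the whole domain.
    penalties = [
        max(0, min(high, dmax) - max(low, dmin)) / domain
        for low, high in generalized
    ]
    return mean(penalties)


if __name__ == "__main__":
    ages = [23, 27, 35, 41, 58]
    generalized = [generalize_age(a) for a in ages]
    print(generalized)  # [(20, 29), (20, 29), (30, 39), (40, 49), (50, 59)]
    print(round(ncp_numeric(ages, generalized), 3))  # 0.217
```

With these sample ages, the clipped ranges are 6, 6, 9, 9 and 8 years wide over a 35-year domain, giving an NCP of 38/175 (about 0.217). A coarser generalization such as `width=20` yields a higher penalty, which reflects the privacy/utility trade-off described in the abstract.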

Research Questions

RQ1: What is the state of the art in data utility metrics and data de-identification tools?

RQ2: What could the implementation of an enterprise-level data utility analysis tool look like?

RQ3: Given the feedback from user testing, in what ways could the tool be improved?

Files and Subpages

Name	Type	Size	Last Modification
Bhawna Saini Kickoff.pdf	File	594 KB	22.09.2021
Master Thesis Bhawna Saini.pdf	File	4.30 MB	22.09.2021
SEBIS_Final.pdf	File	1.30 MB	12.07.2021