The current technological shift and the mass adoption of technology have led to the generation of enormous amounts of data. This data, which often contains personal information, is highly valuable to data-driven organizations for developing personalized products and services for their customers. However, the data may only be collected and processed in accordance with privacy regulations around the world, which results in a trade-off between ensuring user privacy and utilizing the data to its full potential. Therefore, organizations resort to data de-identification, a privacy-enhancing process that maintains user privacy while preserving data utility.
Even though data de-identification enhances privacy, applying de-identification techniques directly affects data utility because of the resulting information loss. Data utility metrics provide an overview of this information loss by measuring the change in data utility, thus helping to understand the effects of de-identification techniques on a dataset. In this thesis, we propose that, to de-identify data effectively and optimally, the data utility analysis process should be combined with the data de-identification process. Doing so would lead to the adoption of a better de-identification strategy for the dataset.
The thesis tests this claim in the context of an automotive enterprise with the aim of enhancing its existing de-identification process. We develop a data utility analysis tool that allows the user to de-identify the data and then assess its utility through various utility metrics. To implement this tool, various de-identification techniques and utility metrics are explored. Additionally, interviews are conducted with privacy experts at the industry partner to understand the de-identification process and to derive the technical requirements for this tool with regard to a large enterprise. Finally, we evaluate the effectiveness of the tool by testing it with a real automotive dataset as well as with the privacy experts. Based on the feedback, we discuss the potential use cases of such a tool, future enhancements, and its limitations.
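The idea of de-identifying data and then measuring the resulting change in utility can be illustrated with a minimal sketch. It is not the tool developed in the thesis: the sample ages, the helper names `generalize_age` and `discernibility`, and the choice of generalization as the technique and the discernibility metric as the utility measure are all assumptions made for illustration.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Replace an exact age with a half-open interval [low, low + width)."""
    low = (age // width) * width
    return (low, low + width)

def discernibility(values):
    """Discernibility metric: sum of squared equivalence-class sizes.

    Records sharing the same value form one equivalence class. Larger
    classes mean records are harder to tell apart, so a higher score
    indicates more information loss.
    """
    return sum(count ** 2 for count in Counter(values).values())

# Hypothetical sample data, not taken from the thesis dataset.
ages = [23, 27, 31, 35, 38, 42]
generalized = [generalize_age(a) for a in ages]

dm_before = discernibility(ages)         # every age is unique
dm_after = discernibility(generalized)   # ages grouped into decades

print(f"before: {dm_before}, after: {dm_after}")
```

Comparing the score before and after generalization gives one number for the utility cost of a given de-identification choice, for example a bin width of 10 versus 20 years.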
Research Questions
RQ1: What is the state of the art in data utility metrics and data de-identification tools?
RQ2: What could the implementation of an enterprise-level data utility analysis tool look like?
RQ3: Given the feedback from user testing, in what ways could the tool be improved?
Name | Type | Size | Last Modification | Last Editor |
---|---|---|---|---|
Bhawna Saini Kickoff.pdf | | 594 KB | 22.09.2021 | |
Master Thesis Bhawna Saini.pdf | | 4,30 MB | 22.09.2021 | |
SEBIS_Final.pdf | | 1,30 MB | 12.07.2021 | |