The need to protect sensitive information in digital text data has led to the development of many automated systems that “privatize” text, that is, remove or mask personal details. Several approaches exist that achieve this goal, but comparing their effectiveness is challenging. To address this, an existing benchmark has been implemented to evaluate text privatization systems. This benchmark is built as an aggregate of different evaluation modules, each assessing one specific aspect of the privatization process.
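To make this modular structure concrete, the following minimal sketch shows how such an aggregate of evaluation modules could be organized; the class, method, and metric names (EvaluationModule, PrivacyModule, unchanged_rate) are illustrative assumptions, not the actual benchmark code.

```python
from abc import ABC, abstractmethod


class EvaluationModule(ABC):
    """One module assessing a single aspect of the privatization process."""

    name: str

    @abstractmethod
    def evaluate(self, original_texts: list[str], privatized_texts: list[str]) -> dict[str, float]:
        """Return the metric scores computed by this module."""


class PrivacyModule(EvaluationModule):
    """Placeholder privacy check: counts texts left completely unchanged."""

    name = "privacy"

    def evaluate(self, original_texts, privatized_texts):
        unchanged = sum(o == p for o, p in zip(original_texts, privatized_texts))
        return {"unchanged_rate": unchanged / max(len(original_texts), 1)}


class Benchmark:
    """Aggregates independent evaluation modules into one benchmark run."""

    def __init__(self, modules: list[EvaluationModule]):
        self.modules = modules

    def run(self, original_texts, privatized_texts):
        # Each module contributes its own scores under its own name.
        return {m.name: m.evaluate(original_texts, privatized_texts) for m in self.modules}
```

In this design, adding a new evaluation aspect (e.g. utility or runtime performance) only requires implementing another module, not changing the aggregator.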
Although the benchmark provides a comprehensive evaluation by aggregating these modules, there is still room for improvement: the modular evaluation needs refinement, versioning is needed to track improvements over time, and efficiency can be increased. This work explores how to fine-tune the benchmark so that it delivers more reliable evaluations across various text privatization methods.
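One possible way to introduce such versioning, again only a sketch with assumed names (BenchmarkResult, BENCHMARK_VERSION), is to stamp every result with the benchmark version that produced it, so scores from different benchmark revisions remain distinguishable and comparable.

```python
import json
import time
from dataclasses import dataclass, field, asdict

# Assumed version identifier of the benchmark itself, bumped whenever a module changes.
BENCHMARK_VERSION = "1.1.0"


@dataclass
class BenchmarkResult:
    """Scores of one benchmark run plus the metadata needed to reproduce and compare them."""

    system_name: str          # the text privatization method under test
    scores: dict              # module name -> metric scores
    benchmark_version: str = BENCHMARK_VERSION
    timestamp: float = field(default_factory=time.time)

    def save(self, path: str) -> None:
        # Persist the result so runs from different benchmark revisions can be compared explicitly.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(asdict(self), fh, indent=2)
```

Recording results this way prevents scores produced by different benchmark revisions from being silently mixed when methods are ranked.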
There is a growing demand for clear, standardized, and efficient evaluation tools in data privacy. An improved benchmark not only allows for better comparison between different anonymization techniques but also supports developers and researchers in refining their methods to meet regulatory standards and practical requirements.
The objective of this thesis is to enhance the existing benchmark by refining its modules, improving versioning and efficiency, and ensuring it meets the operational requirements for evaluation. The system will serve as a practical tool for assessing text privatization methods in terms of privacy, utility, and performance.
RQ2: How can the platform be designed and engineered to fulfill key software quality attributes such as performance, maintainability, extensibility, and fairness in the benchmarking process?