Abstract
Semantic Analysis in Natural Language Processing describes the process of understanding natural language in the way humans communicate, based on its meaning and context. It tries to interpret unstructured texts by analyzing lexical items and their relations. One of the types of Semantic Analysis is Semantic Similarity, which is the task of determining how similar two sentences are in terms of what they mean. This is done by scoring the relationship between text fragments or documents using a defined similarity metric.
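To make the idea of scoring two text fragments with a defined similarity metric concrete, the following minimal sketch computes cosine similarity over plain term-frequency vectors. This is only a lexical baseline chosen for illustration, not one of the models evaluated in the thesis; the function names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts.

    Illustrative baseline only: it compares word overlap, not meaning.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the tokens that both texts share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty text is treated as similar to nothing.
    return dot / (norm_a * norm_b)
```

A score near 1 indicates almost identical wording and 0 indicates no shared tokens. A semantic model would replace the term-frequency vectors with learned embeddings while keeping the same scoring step.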
In DevOps, multiple security tools produce large numbers of findings about software vulnerabilities or software quality. These tools usually perform either static code analysis, a practice that identifies where and how the code fails to comply with coding rules, or dynamic software analysis, a practice that manipulates a running program in order to discover security vulnerabilities.
Often, these DevOps tools have similar or overlapping use cases, resulting in many duplicate security findings that are cumbersome for security experts to review and analyze manually. These findings consist of natural language text and are often accompanied by extra information such as a severity, a reference, and/or a unique ID. Each tool delivers its findings in a separate report in a parsable format (JSON, XML, etc.).
In this Master's thesis, we intend to explore and compare the use of different semantic similarity techniques to deduplicate the security findings in DevOps security tool reports. To do so, we plan to test numerous state-of-the-art semantic similarity models on our security findings data, first clustering semantically similar findings together and then identifying duplicate findings within each cluster.
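The two-step idea of clustering similar findings and then looking for duplicates within each cluster can be sketched as follows. This is a simplified illustration under stated assumptions: the `jaccard` word-overlap score stands in for a real semantic similarity model, the `threshold` value is arbitrary, and single-link grouping via union-find is just one possible clustering strategy. None of these choices is taken from the thesis.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-overlap score; a stand-in for a semantic similarity model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cluster_findings(findings: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Group finding indices whose pairwise score reaches the threshold.

    Uses union-find, so clusters are connected components (single-link).
    """
    parent = list(range(len(findings)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(findings)), 2):
        if jaccard(findings[i], findings[j]) >= threshold:
            parent[find(j)] = find(i)  # Merge the two clusters.

    clusters: dict[int, list[int]] = {}
    for i in range(len(findings)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


# Hypothetical findings, as two different tools might report them.
findings = [
    "SQL injection in login form",
    "SQL injection in the login form",
    "Outdated OpenSSL version detected",
]
clusters = cluster_findings(findings)
```

Each cluster with more than one member is a candidate set of duplicates that a security expert, or a stricter second-pass comparison, could then confirm.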
According to our initial research, there are four types of semantic similarity methods: knowledge-based, corpus-based, deep neural network-based, and hybrid. We intend to try methods of different types and assess them both quantitatively and qualitatively, with the help of security experts, to see how well the implemented techniques perform on our security report data.
Research Questions
Sources
Name | Type | Size | Last Modification | Last Editor |
---|---|---|---|---|
Gulraiz - Kick-off Presentation.pdf | | 1,70 MB | 15.02.2023 | |
Master-Thesis-Abdullah-Gulraiz.pdf | | 2,95 MB | 15.02.2023 | |
Master-thesis-final-presentation-Abdullah-Gulraiz.pdf | | 2,62 MB | 15.02.2023 | |