
Master's Thesis Benjamin Storz

Last modified Jan 15, 2016

Semantic Annotation of Legal Concepts in Terms and Conditions Using the Analytical Components of IBM Watson Explorer



According to experts at Gartner, Forrester and IDC, around 80 percent of all data is unstructured. Such information is often contained in Word documents, spreadsheets, PowerPoint files, audio, video, sensor and log data, or external data such as social media feeds. Without information extraction, content classification or linguistic analysis, the relevant features contained in unstructured data cannot be used. In the insurance industry, one essential use case is the claim management process, a manually intensive task with little IT-based support. Recent developments in natural language processing, semantic technologies and next-generation text analytics already show potential to overcome these limitations. Among other technologies, IBM Watson Explorer provides a text analytics platform that extracts relevant information from unstructured data and supports decisions by providing analytical and semantic insights. The objective of this thesis is to explore the capabilities, limitations and benefits of the text analytics technologies of IBM Watson Explorer in a concrete claim handling scenario for legal protection insurance. The outcome will help to understand the potential of the selected approach and technology, and to outline a roadmap for extending the results in the area of cognitive computing.

Problem statement

Clerks responsible for the claim handling process must decide whether a claim is covered by the policyholder's insurance product. To answer this question, a clerk needs various facts contained in the claim documents, additional information from the claimant's master data, and the knowledge represented by the terms and conditions of the claimant's insurance product. The first problem is to understand which information in a claim is relevant to a clerk and how it can be extracted. The second problem is to find a way to represent the situational knowledge (claims), the domain knowledge (contracts) and the decision logic that the clerk needs to follow. The text analytics platform will exploit this representation to provide evidence for each open question, helping the clerk make a decision on a more precise information basis. The thesis will focus on a specific use case in the field of legal protection insurance. The research uses 1500 anonymized claims, 50 claims that are annotated with the knowledge of a clerk, and the required master data.
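
The separation between situational knowledge, domain knowledge and decision logic described above could be sketched roughly as follows. This is only an illustrative sketch and not the representation used in the thesis or any IBM Watson Explorer API: the class names (`Claim`, `Clause`, `Evidence`), the function `collect_evidence`, and all fact names and values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """Situational knowledge: facts extracted from one claim (hypothetical shape)."""
    claim_id: str
    facts: dict[str, str]


@dataclass
class Clause:
    """Domain knowledge: one coverage condition taken from the terms and conditions."""
    clause_id: str
    fact_name: str
    accepted_values: set[str]


@dataclass
class Evidence:
    """Result of checking one clause against one claim."""
    clause_id: str
    fact_name: str
    observed: Optional[str]
    satisfied: Optional[bool]  # None means the fact is missing from the claim


def collect_evidence(claim: Claim, clauses: list[Clause]) -> list[Evidence]:
    """Decision logic: check each clause and record the evidence for the clerk."""
    evidence = []
    for clause in clauses:
        observed = claim.facts.get(clause.fact_name)
        if observed is None:
            satisfied = None  # open question: the clerk has to look this up
        else:
            satisfied = observed in clause.accepted_values
        evidence.append(Evidence(clause.clause_id, clause.fact_name, observed, satisfied))
    return evidence


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    claim = Claim("C-1", {"legal_area": "traffic"})
    clauses = [
        Clause("K-1", "legal_area", {"traffic", "tenancy"}),
        Clause("K-2", "waiting_period_elapsed", {"yes"}),
    ]
    for item in collect_evidence(claim, clauses):
        print(item)
```

In this sketch the function does not return a final verdict. It lists, per clause, what was found in the claim and whether it matches, so that missing facts stay visible as open questions and the decision remains with the clerk.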

