Master's Thesis Abraham Duplaa

Last modified Nov 4, 2019

Research is a time-consuming, dull, and exhausting process that requires large amounts of reading for small amounts of vital information. Especially in the area of public policy and government, researchers spend countless hours reading dry documents published by public institutions, often with little success in finding relevant information. Crucial information can even be skipped or missed because the work is so tedious. Existing computational linguistics models can extract important information from text. However, which information counts as relevant and crucial is subjective and depends on each person's research. For instance, a researcher interested in the chancellor of Germany will value different information in a given text than a researcher interested in the current state of the German economy.

The goal of this master's thesis is to develop and evaluate a computational linguistics model capable of performing extractive summarization based on user-specified entities, for the domain of public institutions. Recent progress in computational linguistics models has mostly focused on extracting sentences that are important in a somewhat objective sense. Such a summary conveys the general idea of a text, but it does not provide the key sentences related to a specific entity. Furthermore, most advances in extractive summarization have been developed and optimized for English rather than German.
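To make the task concrete, the following is a minimal sketch of entity-focused extractive selection. It is not the model proposed in this thesis: it is a simple keyword baseline, and the function names `split_sentences` and `entity_summary`, the naive regex sentence splitter, and the mention-count scoring are all assumptions made for illustration only.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def entity_summary(text, entities, max_sentences=3):
    """Return up to max_sentences sentences that mention any entity, in document order.

    Scoring is a plain case-insensitive count of entity mentions per sentence,
    a hypothetical stand-in for a learned relevance model.
    """
    terms = [e.lower() for e in entities]
    scored = []
    for idx, sentence in enumerate(split_sentences(text)):
        lowered = sentence.lower()
        score = sum(lowered.count(t) for t in terms)
        if score > 0:
            scored.append((score, idx, sentence))
    # Keep the highest-scoring sentences, then restore their original order.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]


text = (
    "Die Bundeskanzlerin besuchte Paris. "
    "Die Wirtschaft wuchs um ein Prozent. "
    "Die Bundeskanzlerin sprach über die Wirtschaft."
)
chancellor_view = entity_summary(text, ["Bundeskanzlerin"])
economy_view = entity_summary(text, ["Wirtschaft"])
```

The same three-sentence text yields different summaries for the two entities, which is the behaviour the thesis aims for. A real system would need proper German sentence segmentation and would have to handle inflected forms, compounds, and coreference, none of which this baseline attempts.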

The textual data used in this thesis consists of documents released by public institutions of the German government. This includes federal sources such as the Federal Ministry of Justice and Consumer Protection (Das Bundesministerium der Justiz und für Verbraucherschutz), the Federal Ministry of Transport and Digital Infrastructure (Das Bundesministerium für Verkehr und digitale Infrastruktur), and the other federal ministries. Texts from regional public institutions will also be analyzed.

If time allows, and assuming the proposed model can robustly extract vital information relevant to user-provided entities, further research will be conducted on an abstractive summarization step that takes the extractive summary as input (Subramanian, 2019). Since the extractive summary depends on the user-provided entities, the resulting abstractive summary will likewise be specific to those entities.
