Meta Model based Natural Language Generation (MONTAGUE)

Last modified Mar 29, 2020

The increasing digitization of businesses and the fourth industrial revolution are generating an unprecedented amount of data in organizations. To make informed decisions, humans as well as machines need to be able to understand this data. Because of both its sheer volume and its complexity, new technologies are needed to process the data and prepare it for human readers.

Cokely et al. (2012) have shown that people struggle to understand even small amounts of complex data if it is presented only numerically. And although a picture is said to be worth a thousand words, several studies, e.g. Gkatzia et al. (2016) and Braun et al. (2015), have shown that a textual representation is often preferable and can lead to better decisions. However, writing textual reports manually is very time-consuming and therefore expensive.

Approaches

One of the main challenges in developing Natural Language Generation (NLG) systems is bringing together domain experts and developers. A lot of domain-specific knowledge is necessary to create the right textual representation of data, but in classical NLG systems, programming skills are needed to encode this knowledge.

One of the main advantages of meta-model based systems is that they allow users without programming skills to change the model of a system. Hybrid Wikis (Matthes et al., 2011), which combine structured, meta-model based data with unstructured textual representations of data, could therefore provide an ideal foundation for NLG systems that empower domain experts to create textual representations of data without any programming skills.

The aim of the MONTAGUE (Meta Model based Natural Language Generation) project is therefore to explore different approaches for generating natural language from meta-model based data representations.

Templates

The simplest, though not very powerful, approach is to let end users create templates for text generation based on their data model, as is common in word processing software for creating form letters (mail merge). While this method is easy to implement and easy to use, it can only produce very static texts without much variety.
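
To make the template idea concrete, the following is a minimal sketch using Python's standard `string.Template` class. The entity, its attribute names, and the `render` helper are hypothetical and chosen only for illustration; they are not part of the MONTAGUE project.

```python
from string import Template

# Hypothetical instance of an "Employee" entity type from a
# meta-model based data store (attribute names are assumptions).
employee = {"name": "Alice", "department": "Sales", "revenue": 120000}

# A template as an end user might write it; each $placeholder
# refers to an attribute of the entity type.
template = Template(
    "$name works in the $department department "
    "and generated a revenue of $revenue EUR."
)


def render(template, entity):
    """Fill the placeholders of a template with the entity's attribute values."""
    return template.substitute(entity)


text = render(template, employee)
print(text)
```

Every instance rendered with this template yields the same sentence structure, which illustrates why purely template-based output shows little variety.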

Rule-based NLG

A more sophisticated approach is to extend the meta-model to store the additional information needed to create rules for NLG, e.g. how certain entity types are verbalized. This would take up the concept of classical NLG systems but make the involvement of software developers unnecessary.
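
As an illustration, the sketch below extends a small meta-model with verbalization information and applies two simple rules to it. All class names, fields, and rules (`EntityType`, `AttributeDef`, `verbalize`, `describe_count`) are assumptions made for this example and do not describe an existing implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDef:
    name: str
    verb_phrase: str  # how the attribute is verbalized, e.g. "is led by"


@dataclass
class EntityType:
    name: str
    singular: str  # noun used for a single instance
    plural: str    # noun used for several instances
    attributes: list = field(default_factory=list)


def verbalize(entity_type, instance):
    """Describe one instance using the verbalization stored in the meta-model."""
    sentences = []
    subject = f"The {entity_type.singular} {instance['name']}"
    for attr in entity_type.attributes:
        value = instance.get(attr.name)
        if value is None:
            continue  # rule: skip attributes without a value
        sentences.append(f"{subject} {attr.verb_phrase} {value}.")
        subject = "It"  # rule: refer back with a pronoun after the first sentence
    return " ".join(sentences)


def describe_count(entity_type, count):
    """Choose the verb and the singular or plural noun depending on the count."""
    verb = "is" if count == 1 else "are"
    noun = entity_type.singular if count == 1 else entity_type.plural
    return f"There {verb} {count} {noun}."


project_type = EntityType(
    name="Project",
    singular="project",
    plural="projects",
    attributes=[
        AttributeDef("lead", "is led by"),
        AttributeDef("budget", "has a budget of"),
    ],
)
project = {"name": "Apollo", "lead": "Alice", "budget": "50,000 EUR"}

print(verbalize(project_type, project))
print(describe_count(project_type, 3))
```

Because the verbalization lives in the model rather than in the code, a domain expert could change the wording by editing `verb_phrase`, `singular`, or `plural` without touching the generation logic.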

Stochastic NLG

The most powerful approach would be an NLG pipeline that is fully automated with machine learning. Such a system would take a set of structured data and the corresponding textual representations as input, derive rules for content selection, document planning, sentence planning, and linguistic realization from them, and then automatically generate textual representations of new structured input data based on these rules.
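
A complete learned pipeline is beyond the scope of a short example. As a deliberately simplified stand-in, the sketch below induces sentence templates from aligned data and text pairs by replacing attribute values with slots (delexicalization) and then reuses the most frequent template for new data. It covers only a crude form of linguistic realization, not content selection or document planning, and the function names and training data are invented for illustration.

```python
from collections import Counter


def delexicalize(record, text):
    """Replace attribute values in the text with slot markers such as {city}."""
    # Replace longer values first so that short values do not split longer ones.
    for key, value in sorted(record.items(), key=lambda kv: -len(str(kv[1]))):
        text = text.replace(str(value), "{" + key + "}")
    return text


def train(pairs):
    """Count how often each induced template occurs in the training pairs."""
    return Counter(delexicalize(record, text) for record, text in pairs)


def generate(model, record):
    """Fill the most frequent template whose slots are all present in the record."""
    for template, _count in model.most_common():
        try:
            return template.format(**record)
        except KeyError:
            continue  # template needs an attribute the record does not have
    raise ValueError("no learned template fits this record")


# Invented training data: structured records with corresponding texts.
pairs = [
    ({"city": "Munich", "temp": 21}, "It is 21 degrees in Munich."),
    ({"city": "Berlin", "temp": 18}, "It is 18 degrees in Berlin."),
    ({"city": "Hamburg", "temp": 15}, "Hamburg reports 15 degrees."),
]

model = train(pairs)
print(generate(model, {"city": "Cologne", "temp": 19}))
```

The sketch assumes that attribute values appear verbatim in the training texts and that the texts contain no literal braces. A realistic system would need proper data-to-text alignment and a statistical or neural model for each pipeline stage.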

References

Braun, Daniel, Ehud Reiter, and Advaith Siddharthan. "Creating Textual Driver Feedback from Telemetric Data." ENLG 2015 (2015): 156.

Cokely, Edward T., et al. "Measuring risk literacy: The Berlin numeracy test." Judgment and Decision Making 7.1 (2012): 25.

Gkatzia, Dimitra, Oliver Lemon, and Verena Rieser. "Natural Language Generation enhances human decision-making with uncertain information." The 54th Annual Meeting of the Association for Computational Linguistics. 2016.

Matthes, Florian, Christian Neubert, and Alexander Steinhoff. "Hybrid Wikis: Empowering Users to Collaboratively Structure Information." ICSOFT (1) 11 (2011): 250-259.