
Master's Thesis Karim Arabi

Last modified Aug 11, 2023

Automated Refinement of an Ontology of NLP Research Concepts

Problem Statement

A 2018 journal article stated that “early studies discovered an exponential growth in the volume of scientific literature … a trend that continues with an average doubling period of 15 years” [1].

As the body of available research studies and documents keeps expanding, discovering relevant papers has become challenging. Because the rate of scientific publication increases over time, a simple search can miss many publications on relevant topics when they use different terminology.

Motivation

Researchers would benefit from a better way to sort through and find relevant research topics than traditional keyword search. A domain-specific ontology would address this issue by enabling search based on a semantic understanding of the requested topic. Researchers could also use the ontology to explore concepts directly related to the queried topic and to discover new avenues for research.

Objective

We aim to build on the work of a previous master’s thesis and expand what it has defined. That thesis provides a solid foundation for keyphrase extraction and filtering from a corpus of research documents about Natural Language Processing (NLP). It also presents concept clustering and hierarchical relationships between the resulting concepts. This thesis continues to improve on the ideas presented there, including the concept merging idea, to build a deeper semi-automated relationship model for NLP and to deepen the hierarchical relationships into an ontology.

Research Questions

How do we deepen the hierarchical relationships between research concepts?

  • How can manual refinement be used to improve top-level navigation for users?

  • How can the existing concepts and relations be enhanced through automated refinement approaches?

  • How can we transition from a taxonomy to an ontology with more complex relations?
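To make the last question concrete, the following is a minimal sketch of the difference between a taxonomy and an ontology, not the method used in this thesis. It assumes a taxonomy can be stored as child-to-parent edges and an ontology as typed (subject, relation, object) triples. The concept names, the relation names (`is_a`, `evaluated_by`, `uses`), and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical taxonomy: child concept -> parent concept (is-a edges only).
# Assumes each concept has at most one parent.
taxonomy = {
    "neural machine translation": "machine translation",
    "machine translation": "natural language processing",
    "BLEU": "evaluation metric",
}


def taxonomy_to_triples(parent_of):
    """Express each child -> parent edge as a (subject, relation, object) triple."""
    return {(child, "is_a", parent) for child, parent in parent_of.items()}


# Hypothetical non-hierarchical relations that a pure taxonomy cannot express.
extra_relations = {
    ("machine translation", "evaluated_by", "BLEU"),
    ("neural machine translation", "uses", "attention mechanism"),
}

# The ontology keeps the hierarchy and adds the typed relations on top of it.
ontology = taxonomy_to_triples(taxonomy) | extra_relations


def related(triples, concept):
    """Group the outgoing relations of a concept by relation type."""
    grouped = defaultdict(set)
    for subject, relation, obj in triples:
        if subject == concept:
            grouped[relation].add(obj)
    return dict(grouped)


def ancestors(triples, concept):
    """Follow is_a edges upward and return the broader concepts, nearest first."""
    parents = {s: o for s, r, o in triples if r == "is_a"}
    chain = []
    while concept in parents:
        concept = parents[concept]
        chain.append(concept)
    return chain
```

In this sketch, `related(ontology, "machine translation")` returns both the broader concept and the metric linked through `evaluated_by`, while `ancestors` recovers the original hierarchy from the `is_a` triples alone. Keeping the hierarchy as one relation type among several means the existing taxonomy does not have to be restructured when new relation types are added.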

 

References:

1. L. Bornmann, R. Haunschild, and R. Mutz. Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases. Mar. 2018.
