
Master's Thesis Tobias Geilen

Last modified Apr 23

Advised in cooperation with Sebastian Sartor.

MOTIVATION

Artificial intelligence (AI) has become a primary focus of both academic institutions and industry research laboratories. Between 2018 and 2022, the number of AI-related publications rose significantly, from approximately 145,000 to over 240,000 [1]. The development of foundation models, used, for example, in large language models (LLMs) such as ChatGPT, has substantially enhanced AI’s capabilities and broadened its range of applications. New AI research is published each week across diverse domains, including robotics, medicine, and even Earth observation [6]. Given the rapid expansion of AI research and the overwhelming volume of publications, systematically identifying trends and comparing findings across domains has become increasingly difficult. Comprehensive summaries are therefore crucial for guiding future AI advancements and improving the evaluation of model performance.

AI technologies now enable the efficient summarization of vast numbers of research papers within a comparably short time. Moreover, AI can be leveraged within the scientific process to facilitate large-scale data collection, aggregation, and analysis, thereby enhancing research efficiency and knowledge dissemination [5].

Prior studies have analyzed the history and development of AI [2], examined its opportunities and risks [3], and extensively surveyed AI advancements to continuously abstract and summarize findings [9], [10], [11]. These surveys offer a broad overview of AI adoption in specific application domains, structuring findings and recent developments in the rapidly changing field of AI and foundation models [6], [7]. Additionally, some meta-reviews compare and provide common benchmark tests and evaluation methods for specific AI technologies across different domains [8].

The existing surveys are often highly specific in their scope – either focusing on a certain application domain or a certain model type. Therefore, systematically comparing the adoption and performance of different model types across various application domains is currently methodologically complex due to the fragmented nature of existing studies.

Furthermore, most existing surveys rely on manual reviews of a limited number of publications (typically a few dozen to a few hundred), restricting their ability to capture broader trends in AI adoption. One of the most extensive summaries is provided by the Epoch AI research group, whose flagship dataset currently contains information on over 900 notable models. Nevertheless, their methodology also centers on manual paper review and information retrieval [12].

This study aims to bridge these gaps by leveraging AI to conduct a large-scale, systematic literature review of Foundation Model development across various application domains.

The objective is to create a comprehensive overview of models and their characteristics, similar to the Epoch AI dataset, by developing a software tool that automates the manual paper review and utilizes LLMs for the information retrieval process to extend the currently existing overviews:

1. Build a fully automated system to identify papers introducing FMs.

2. Are NLP tools able to extract relevant objective parameters from previously identified papers? What is the quality of the results from the automated solution compared to the Epoch AI dataset?

3. What is the status quo in FM development for the specific field of robotics, and how do the models compare in their characteristics to other fields?

4. Are there statistically significant differences between the robotics domain and other domains (such as language)?

 

This research is relevant both theoretically and practically. Theoretically, it will provide insights into how well large language models can support and automate the scientific process. Practically, the findings will be valuable for AI practitioners and organizations to understand how Foundation Models are developed and adjusted for a specific application domain. 

RESEARCH METHODOLOGY

This study will be conducted in five phases, preceded by an analysis of existing tools, combining quantitative analysis and AI-powered tools to provide a comprehensive and systematic review of AI research, with a specific focus on foundation models.

  1. Analysis of Existing AI Research Tools: Prior to starting the research phases and the development of a custom AI-based research support solution, a broad market analysis will be conducted to understand the current capabilities of existing tools and to leverage them in the following phases.
  2. Definition of Key Values to Extract: The first research phase will involve defining the key data points to be extracted from the selected research papers. These will include characteristics such as model types, the number of parameters, benchmark results, and application domains (e.g., healthcare, robotics, finance) and follow closely the structure of the Epoch AI dataset. Establishing these values at the outset ensures that the data collected is relevant for understanding key characteristics of foundational models and also comparable to the Epoch AI dataset to benchmark the quality of the automated information retrieval.
  3. Finding and Comparing Data Sources: The second phase will focus on identifying and comparing data sources. The Epoch AI dataset provides arXiv publications as sources for many of the included models. Additionally, further publications on Foundation Models with a focus on robotics will be gathered. In the best-case scenario, APIs such as those of arXiv or Hugging Face can be used to systematically access publications and model information.
  4. Building the Extraction Tool: In the third phase, an LLM-based tool will be developed to extract the values defined in the first phase from the selected publications. The tool will be designed to automatically extract key data points, such as model characteristics (e.g., type, parameters) and application domains, without overfitting to the provided publications, so that it can also be applied to different publications and formats. The Epoch AI dataset will serve as a benchmark to validate the output quality.
  5. Analysis of Foundation Models for Robotics: Once the developed tool provides high-quality output for the initial dataset, it will be used to analyze publications on Foundation Models for the specific application domain of robotics. The output data can then serve as an addition to the Epoch AI dataset, which is mainly focused on large language models.
  6. Analysis and Visualization: In the final phase, the extracted data will be analyzed to identify trends and patterns in the adoption and performance of foundational models in the field of robotics and compared to existing data from other application domains. Statistical methods will be used to assess how characteristics such as model type and size influence performance and to track trends over time in the analyzed foundational models. The findings will be visualized through charts and graphs to provide a clear, comprehensive overview of the role of foundational models.
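The systematic access to publications described in phase 2 could, for instance, use the arXiv API, which returns results as an Atom feed from http://export.arxiv.org/api/query. The sketch below builds such a query URL and parses entry titles and abstracts; the feed content is a canned sample so the snippet runs offline, and the function names are illustrative.

```python
# Sketch: retrieving candidate FM papers via the arXiv API (Atom feed).
# A canned response is parsed here so the example runs without network access.
import urllib.parse
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def build_arxiv_query(search_terms, max_results=100):
    """Build an arXiv API query URL for the given search terms."""
    params = {
        "search_query": " AND ".join(f"all:{t}" for t in search_terms),
        "start": 0,
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)

def parse_entries(atom_xml):
    """Extract (title, summary) pairs from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    return [
        (e.findtext(f"{ATOM_NS}title"), e.findtext(f"{ATOM_NS}summary"))
        for e in root.iter(f"{ATOM_NS}entry")
    ]

# Canned sample feed standing in for a real API response.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example Robotics Foundation Model</title>
    <summary>A foundation model for robotic control.</summary>
  </entry>
</feed>"""

url = build_arxiv_query(["foundation model", "robotics"])
entries = parse_entries(SAMPLE_FEED)
```

In practice, pagination (`start`) and rate limits would need handling when harvesting papers at scale.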
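For the extraction tool in phase 3, one minimal safeguard is validating the LLM's structured output before it is compared against the Epoch AI benchmark. The field names and the JSON response below are illustrative assumptions, not the actual Epoch AI schema or a real model's output.

```python
# Sketch: validating an LLM's structured extraction output against a
# required schema. All field names and values are illustrative assumptions.
import json

REQUIRED_FIELDS = {"model_name": str, "parameters": int, "domain": str}

def validate_extraction(raw_json):
    """Parse an LLM response and check that required fields exist with the right types."""
    record = json.loads(raw_json)
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], ftype):
            raise ValueError(f"bad type for {field}")
    return record

# Hypothetical LLM response for one extracted paper.
llm_response = '{"model_name": "ExampleBot-1", "parameters": 7000000000, "domain": "robotics"}'
record = validate_extraction(llm_response)
```

Rejecting malformed records early keeps extraction errors from silently contaminating the dataset used for the quality comparison.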
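For the statistical comparison in the final phase, one distribution-free option is a permutation test on a model characteristic across domains. The sketch below compares log10 parameter counts between two domains; all numbers are made up for illustration and do not reflect real data.

```python
# Sketch: permutation test for the difference in median log10(parameters)
# between two application domains. Data below is purely illustrative.
import random
import statistics

def perm_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in medians."""
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:len(a)]) - statistics.median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

robotics = [8.7, 9.1, 9.5, 8.9, 9.0]    # log10(parameters), illustrative
language = [10.8, 11.2, 11.5, 10.9, 11.0]
p_value = perm_test(robotics, language)
```

A permutation test makes no normality assumption, which suits the small, skewed samples that per-domain model counts are likely to produce.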
