SAP and TUM collaborate in strategic areas where applied research can have a positive impact on business and people. The portfolio of joint activities covers a broad set of areas, as SAP's solutions are used in many different contexts. "Enterprise AI" is a crucial part of that portfolio, under which several research projects are being carried out. There are currently two ongoing projects and one completed project.
Overview
Retrieval-Augmented Generation (RAG) pipelines are widely used to enhance the performance of large language models by combining retrieval mechanisms with generative modeling. However, optimizing these pipelines is complicated and time-consuming, as it involves carefully selecting and tuning components such as retrievers, filters, and generators. Bayesian Optimization (BO) is a probabilistic method designed to efficiently identify optimal solutions for complex optimization problems. This research will investigate the application of BO within RAG pipelines to automate and accelerate the optimization process. Specifically, we aim to explore how BO can be effectively utilized for both discrete module selection and continuous hyperparameter tuning within a unified framework. The research will also evaluate how robust Bayesian Optimization is under different conditions, including variations in data quality and domain characteristics. The main goal of this research is to find the optimal configurations for RAG pipelines while reducing the optimization time.
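The idea of handling discrete module selection and continuous hyperparameter tuning in one search can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and not part of the project: the retriever names, the single continuous parameter called temperature, and the synthetic toy_score function that stands in for an expensive pipeline evaluation. The surrogate is a small Gaussian process with a squared-exponential kernel over a one-hot encoding of the module choice, and the next configuration is chosen by Expected Improvement over randomly sampled candidates. A real study would likely use an established optimization library and a real evaluation metric.

```python
import math
import random

# Hypothetical search space: one discrete module choice, one continuous knob.
RETRIEVERS = ["bm25", "dense", "hybrid"]

def toy_score(retriever, temperature):
    """Synthetic stand-in for an expensive RAG evaluation run (not real data)."""
    best_temp = {"bm25": 0.2, "dense": 0.5, "hybrid": 0.7}[retriever]
    bonus = {"bm25": 0.0, "dense": 0.1, "hybrid": 0.2}[retriever]
    return 0.6 + bonus - (temperature - best_temp) ** 2

def encode(config):
    """One-hot encode the retriever and append the continuous value."""
    retriever, temperature = config
    return [1.0 if retriever == r else 0.0 for r in RETRIEVERS] + [temperature]

def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two encoded configurations."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(X, y, noise=1e-4):
    """Fit a zero-mean GP to the centred scores; return what prediction needs."""
    mean = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, [v - mean for v in y]))
    return X, L, alpha, mean

def gp_predict(model, x_new):
    """Posterior mean and standard deviation at one encoded configuration."""
    X, L, alpha, mean = model
    k = [rbf(a, x_new) for a in X]
    mu = mean + sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Expected Improvement acquisition for a maximisation problem."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def optimize(n_init=4, n_iter=12, n_candidates=200, seed=0):
    """Random initial design, then pick the EI-maximising candidate each round."""
    rng = random.Random(seed)
    sample = lambda: (rng.choice(RETRIEVERS), rng.random())
    configs = [sample() for _ in range(n_init)]
    scores = [toy_score(*c) for c in configs]
    for _ in range(n_iter):
        model = gp_fit([encode(c) for c in configs], scores)
        best = max(scores)
        candidates = [sample() for _ in range(n_candidates)]
        nxt = max(candidates, key=lambda c: expected_improvement(
            *gp_predict(model, encode(c)), best))
        configs.append(nxt)
        scores.append(toy_score(*nxt))
    i = max(range(len(scores)), key=scores.__getitem__)
    return configs[i], scores[i], scores

if __name__ == "__main__":
    best_config, best_score, _ = optimize()
    print(best_config, round(best_score, 3))
```

In this sketch each call to toy_score represents one full pipeline evaluation, so the total budget is n_init + n_iter evaluations. The one-hot encoding is only one simple way to place discrete and continuous choices in the same kernel; other encodings or kernels for mixed spaces may be better suited in practice.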
Joule is SAP's business digital assistant, available across platforms such as smart watches, tablets, and mobile phones, and it currently supports text interaction. But how can a voice assistant understand business-specific vocabulary and meaning? Given the large choice of available models, it is essential to evaluate and benchmark speech-to-text (STT), text-to-speech (TTS), and voice cloning models from different vendors on both performance and cost. Existing benchmarks for comparing these models are lacking. LLMs should be tested on their ability to comprehend domain-specific data, which is not available due to privacy concerns. Customers have different business processes and data, and hence different vocabulary and acronyms. SAP does not train models on customer data and does not have direct access to much of that data. The goal is to extend Joule with voice support based on state-of-the-art models, which would bring great value to SAP's customers.
TBD
During the initial phases, this project explored semi-supervised learning frameworks with applications to text generation using deep learning models. The project addressed several research questions, such as the development of an approach for automated labeling of data generated via chatbot interactions, the integration of user feedback to enhance the chatbot's learning, and the improvement of chatbot responses when only a limited dataset is available for training. For the initial cycle of the project, the focus was on an HR chatbot use case in which a semi-supervised learning framework was used to generate responses to user utterances. A human-in-the-loop was also embedded into the semi-supervised learning framework to correct the generated responses before they were appended to the training data.
During the second year of the project, we worked with domain experts at SAP SE to develop an HR support chatbot as an efficient and effective tool for addressing employee inquiries. We placed a human-in-the-loop at several stages of the development cycle, such as dataset collection, prompt optimization, and evaluation of generated output. By enhancing the response quality of the LLM-driven chatbot and exploring alternative retrieval methods, we created an efficient, scalable, and flexible tool that helps HR professionals address employee inquiries. Our experiments and evaluation indicate that GPT-4 outperforms the other models and can overcome inconsistencies in data through internal reasoning capabilities. Additionally, based on expert analysis, we infer that reference-free evaluation metrics such as G-Eval and Prometheus demonstrate reliability closely aligned with that of human evaluation.
Would a semi-supervised learning (SSL) framework combined with a human-in-the-loop generate high-quality labelled data for model training?
Can direct inference yield adequate results without the need for fine-tuning, and what prompt-tuning techniques can be used to improve the quality of the responses?
This project is part of the SAP @ TUM Collaboration Lab and hence fosters a close research partnership with SAP Intelligent Enterprise Solutions and Artificial Intelligence Center of Excellence.