
Master's Thesis Gentrit Fazlija

Last modified Jun 3

Toward Optimizing a Retrieval Augmented Generation Pipeline using Large Language Model

 

Introduction & Motivation

Hello! Welcome to my project page :)

I'm currently working on my master's thesis. As a student of Mathematics in Data Science, I'm deeply interested in data and in how to get the most value out of it. I have been hooked on NLP ever since I was first introduced to it. My thesis focuses on an information retrieval model that aims to help both current and prospective students understand the different study programs that TUM offers.

To do this, I am leveraging the reasoning capabilities of Large Language Models to extract up-to-date information about the study programs at TUM. The goal is to build a model pipeline that answers a variety of questions one might have about these programs.

Join me on this journey either by checking back on this page around mid-February or connecting with me on LinkedIn.

 

Research Questions

Q1: Would a multi-query formulation system improve the performance of the pipeline?

Q2: Would optimization approaches, such as an ensemble retriever in combination with child-parent chunking, improve the performance of the passage retriever?

Q3: How much does few-shot prompting help compared to zero-shot prompting?

Q4: How does the performance change when using a free open-source model compared to a paid closed-source model? How can open-source models be optimized?
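
To make the ensemble retriever in Q2 a bit more concrete, here is a minimal sketch of one common way to merge the results of several retrievers: reciprocal rank fusion. This is only an illustration and not necessarily the method used in the thesis. The function name, the constant k=60, and the document IDs are all assumptions made up for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in.
    The value k=60 is a commonly used default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword retriever and a dense retriever.
keyword_hits = ["informatics_msc", "data_science_msc", "mathematics_bsc"]
dense_hits = ["data_science_msc", "mathematics_bsc", "informatics_msc"]

fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
print(fused)
```

A document that ranks reasonably well in both lists can end up ahead of one that ranks first in only one list, which is the intuition behind combining retrievers.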

 

References

tba
