In today's world, organizations, societies, and institutions rely heavily on natural language for communication. Vast amounts of unstructured information are stored in text documents, posing a significant challenge for machines to swiftly query relevant content and extract structured data.
Natural language processing (NLP), text mining, and natural language generation collectively encompass machine-driven solutions for text analysis, indexing, and creation. Over the past decades, a wide array of methods and solutions have emerged, propelled by the diverse challenges and rapid advancements in technologies like machine learning. This evolution highlights the vast potential of these tools.
This seminar is designed to explore the core technological components of NLP and their real-world applications. Participants will independently research a scientific topic using existing literature, then present and discuss their findings in presentations and seminar papers, fostering an enriching learning experience.
Weekly Sessions
Session | Date | Topic
---|---|---
-1 | 10.02.2025, 11:00 - 12:00 | Preliminary Meeting
1 | 25.04.2025, 10 am - 12 pm | 1) Introduction 2) Word Embeddings: From Bag-of-words to Transformers 3) From N-grams to Large Language Models
2 | 02.05.2025, 10 am - 12 pm | 
3 | 09.05.2025, 10 am - 12 pm | 1) Large Language Models: Building Blocks 2) From Binary to Extreme – An Overview of Text Classification Methods and their Challenges
4 | 16.05.2025, 10 am - 12 pm | 1) Information Retrieval: (Domain-Specific) Improvement Strategies 2) (Generative) Information Extraction (including NER)
5 | 23.05.2025, 10 am - 12 pm | 1) Prompt-Engineering vs. PEFT Approaches 2) Domain Adaptations: In-context Learning, RAG and Fine-tuning
6 | 30.05.2025, 10 am - 12 pm | 1) Natural Language Processing and Computer-Human Interaction 2) Knowledge Graphs for Dialogue State Tracking
7 | 06.06.2025, 10 am - 12 pm | 1) Text Summarization: Approaches, Challenges and Evaluation 2) Question Answering Systems: Challenges and Approaches
8 | 13.06.2025, 10 am - 12 pm | 1) Agent-based Systems using LLMs 2) Task-based, Social Conversational Agents & Dialogue Management (Dialogue State Tracking & Policy)
9 | 20.06.2025, 10 am - 12 pm | 
10 | 27.06.2025, 10 am - 12 pm | 1) Model Hallucination 2) Common-sense & Logical Reasoning with LLMs
11 | 04.07.2025, 10 am - 12 pm | 1) Multimodal LLMs: SOTA Models, Techniques and Benchmarks/Usability (and Future Developments) 2) Explainability in Natural Language Processing
12 | 11.07.2025, 10 am - 12 pm | 1) Tiny LLMs: Quantization, Pruning and Distillation 2) Large Language Model Alignment
13 | 18.07.2025, 10 am - 12 pm | 1) Differential Privacy in Natural Language Processing 2) Adversarial Attacks on (Large) Language Models and their Mitigations
14 | 25.07.2025, 10 am - 12 pm | 1) Ethical, Societal, and Legal Aspects of Large Language Models 2) Explainable Fact-Checking with LLMs
Seminar Topics:
Foundations of NLP:
Word Embeddings: From Bag-of-words to Transformers
Machines cannot process raw text, only numbers; representing text numerically is therefore a prerequisite for any NLP task. This topic gives an overview of historical approaches to word embeddings, including simple word counting, embedding words as vectors in high-dimensional vector spaces, and modern transformer embeddings.
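As a minimal sketch of the counting end of this spectrum, the snippet below builds bag-of-words vectors using only the Python standard library; the toy sentences and vocabulary are invented for illustration.

```python
from collections import Counter

# Toy corpus (invented for illustration).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Shared vocabulary: one vector dimension per unique word.
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(text: str) -> list[int]:
    """Represent text as raw word counts over the vocabulary."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]

for doc in docs:
    print(doc, "->", bag_of_words(doc))
```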
From N-grams to Large Language Models
This topic provides an overview of NLP models over the years, from early classification models built with the Naive Bayes approach and Hidden Markov Models, through Recurrent Neural Networks, to Transformer-based Language Models.
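To make the starting point of this journey concrete, here is a minimal sketch of a bigram language model estimated purely by counting; the toy corpus is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy training corpus (invented for illustration).
corpus = "the cat sat . the cat ran . the dog sat .".split()

# Count bigrams: how often each word follows each context word.
bigrams = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    bigrams[prev][curr] += 1

def next_word_probs(prev: str) -> dict[str, float]:
    """Maximum-likelihood estimate of P(next | prev) from bigram counts."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))  # e.g. {'cat': 0.66..., 'dog': 0.33...}
```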
Large Language Models: Building Blocks
This topic covers the distinction between Language Models and Large Language Models, their building blocks and training strategies, and different parameter scales of the same model.
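One building block shared by all Transformer-based LLMs is scaled dot-product attention. The numpy sketch below implements a single attention head; the shapes and random inputs are illustrative only.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

# Toy example: 4 tokens, head dimension 8 (random values for illustration).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```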
Techniques in NLP:
From Binary to Extreme – An Overview of Text Classification Methods and their Challenges
Text classification is a fundamental task in Natural Language Processing, yet there are many ways to approach it. This topic explores the spectrum of text classification methods and discusses existing challenges and limitations.
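As a minimal example of the classic end of this spectrum, the sketch below trains a Naive Bayes baseline with scikit-learn; the four labelled sentences are invented toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled data (invented for illustration).
texts = ["great movie", "terrible film", "wonderful acting", "awful plot"]
labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words features + Naive Bayes: a classic baseline classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["wonderful movie"]))  # likely ['pos'] on this toy data
```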
Natural Language Processing and Computer-Human Interaction
TBD
Information Retrieval: (Domain-Specific) Improvement Strategies
This topic covers principles of dense and sparse Information Retrieval (IR). Since IR is the foundation of approaches such as RAG systems, it is important to ensure decent retrieval quality, especially in specialized domains such as medicine. This topic therefore investigates techniques for making IR more domain-aware.
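For a concrete feel of the sparse side, the sketch below ranks a toy document collection against a query with TF-IDF and cosine similarity using scikit-learn; the documents and query are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document collection (invented for illustration).
docs = [
    "aspirin is used to treat pain and fever",
    "transformers dominate modern NLP benchmarks",
    "ibuprofen reduces inflammation and fever",
]

# Sparse retrieval: TF-IDF vectors ranked by cosine similarity.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)
query_vector = vectorizer.transform(["medicine for fever"])
scores = cosine_similarity(query_vector, doc_vectors)[0]

for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```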
(Generative) Information Extraction (including NER)
Information Extraction, especially Named Entity Recognition, is a commonly used technique for structuring textual data. This topic explores traditional techniques as well as generative approaches.
Large Language Models:
Domain Adaptations: In-context Learning, RAG and Fine-tuning
Language Models are often trained on general-purpose data and do not perform well in specialized domains. Additionally, they have a knowledge cut-off depending on when they were trained. This topic covers techniques for injecting new knowledge into LLMs, either by updating their weights (fine-tuning) or through external techniques such as in-context learning and Retrieval-Augmented Generation.
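A minimal sketch of the retrieval-augmented route, assuming simple word-overlap retrieval as a stand-in for a real retriever; the knowledge base is invented, and sending the resulting prompt to an LLM is deliberately left out.

```python
def retrieve(query: str, documents: list[str]) -> str:
    """Naive retrieval: return the document with the most word overlap."""
    query_words = set(query.lower().split())
    return max(documents, key=lambda d: len(query_words & set(d.lower().split())))

def rag_prompt(query: str, documents: list[str]) -> str:
    """Inject retrieved evidence into the prompt instead of updating weights."""
    context = retrieve(query, documents)
    return f"Answer using only this context.\nContext: {context}\nQuestion: {query}"

# Toy knowledge base (invented); in practice the prompt would be sent
# to an LLM via whatever API is available.
kb = ["The seminar paper deadline is in July.", "Laplace smoothing avoids zero counts."]
print(rag_prompt("When is the paper deadline?", kb))
```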
Prompt-Engineering vs. PEFT Approaches
Since training or fine-tuning LLMs is expensive due to their hardware requirements, several approaches have been developed to reduce the resources needed to adapt LLMs to specific tasks. This topic focuses on various parameter-efficient fine-tuning (PEFT) and prompt-engineering techniques with respect to different use cases.
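To illustrate the PEFT side, the numpy sketch below mimics the structure of LoRA: the pretrained weight stays frozen while only a low-rank update is trained. The sizes and initialization are toy choices, not a faithful reimplementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                       # hidden size and LoRA rank (r << d)
W = rng.normal(size=(d, d))         # frozen pretrained weight (toy stand-in)
A = rng.normal(size=(r, d)) * 0.01  # trainable low-rank factors: only
B = np.zeros((d, r))                # 2*d*r parameters instead of d*d

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: frozen weight plus low-rank update B @ A."""
    return x @ (W + B @ A).T

x = rng.normal(size=(1, d))
print(lora_forward(x).shape)        # (1, 512)
print(f"trainable params: {A.size + B.size} vs full: {W.size}")
```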
Text Summarization: Approaches, Challenges and Evaluation
Automatic text summarization condenses enormous amounts of text into concise summaries. This topic covers various summarization techniques, types of summaries, and the challenges involved in evaluating them.
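As a minimal sketch of the extractive family of approaches, the snippet below scores sentences by word frequency and keeps the top one; the input document is invented, and real systems are considerably more sophisticated.

```python
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 1) -> str:
    """Frequency-based extractive summarization: score each sentence by
    the corpus frequency of its words, then keep the top-scoring ones."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freqs = Counter(text.lower().replace(".", " ").split())
    ranked = sorted(sentences,
                    key=lambda s: sum(freqs[w] for w in s.lower().split()),
                    reverse=True)
    return ". ".join(ranked[:n_sentences]) + "."

doc = ("Transformers changed NLP. Transformers rely on attention. "
       "Attention scales well. Bananas are yellow.")
print(extractive_summary(doc))  # "Transformers rely on attention."
```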
Question Answering Systems: Challenges and Approaches
One of the oldest tasks in NLP is question answering, i.e., building systems that can provide an answer to a posed user question. This includes multiple-choice, short-span, and long-form QA. The topic will investigate historical and recent approaches to QA systems and the challenges that come with integrating evidence, evaluating the answers, and more.
Model Hallucination
Generative language models tend to produce coherent text that sometimes contains hallucinations – text that is factually inconsistent or contradicts established knowledge. This topic will investigate the reasons hallucinations emerge, their characteristics in different NLP tasks, and methods to detect and mitigate them.
Common-sense & Logical Reasoning with LLMs
LLMs mainly work with unstructured textual input. Their reasoning capabilities can be improved by introducing logical predicates, thinking about problems step-by-step, or other techniques that improve their common-sense reasoning skills. The topic will investigate recent advancements in improving the reasoning process in LLMs.
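One widely used technique of this kind is chain-of-thought prompting (Wei et al., 2022). The sketch below only constructs such a prompt; the worked example inside it is invented, and sending the prompt to a model is left to whatever API is available.

```python
def chain_of_thought_prompt(question: str) -> str:
    """Few-shot chain-of-thought: demonstrate step-by-step reasoning so
    the model imitates it before giving a final answer."""
    demo = (
        "Q: Anna has 3 apples and buys 2 more. How many apples does she have?\n"
        "A: She starts with 3 apples. Buying 2 more gives 3 + 2 = 5. The answer is 5.\n"
    )
    return demo + f"Q: {question}\nA: Let's think step by step."

print(chain_of_thought_prompt("A train travels 60 km in 1 hour. How far in 3 hours?"))
```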
Multimodal LLMs: SOTA Models, Techniques and Benchmarks/Usability (and Future Developments)
Recent LLMs already perform quite well on numerous text-only tasks. However, many real-world use cases involve, for instance, text, tables, charts, and images together. Under this topic we will look into techniques, evaluation, and the usability of multimodal models.
Explainability in Natural Language Processing
This topic explores the explainability aspect of NLP, covering various methods and approaches to interpret model outputs and decisions. It spans explainability techniques for both pre-trained and large language models while also highlighting the challenges and limitations of these methods.
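One simple, model-agnostic technique in this space is occlusion: delete a token and observe how much the prediction changes. The sketch below applies it to an invented toy sentiment scorer standing in for a real model.

```python
def occlusion_importance(tokens: list[str], score_fn) -> dict[str, float]:
    """Perturbation-based explanation: a token's importance is how much
    the model's score drops when that token is removed."""
    base = score_fn(tokens)
    return {tok: base - score_fn(tokens[:i] + tokens[i + 1:])
            for i, tok in enumerate(tokens)}

# Toy "sentiment model" (invented stand-in): fraction of positive words.
positive = {"great", "wonderful"}
score = lambda toks: sum(t in positive for t in toks) / max(len(toks), 1)

print(occlusion_importance("a great and wonderful movie".split(), score))
```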
Explainable Fact-Checking with LLMs
This topic covers research on the ability of LLMs to verify claims and generate high-quality explanations and justifications for their assessments. It covers methods, datasets, and evaluation metrics, as well as various approaches for generating justifications and evaluating explanation quality in claim verification.
Tiny LLMs: Quantization, Pruning and Distillation
Modern LLMs are often scaled to datacenter size to achieve their results, but many applications require them to run on edge devices. This topic covers techniques to reduce model size while preserving as much quality as possible. You will explore quantization for compressing general models, and pruning and distillation for task-specific size reduction.
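As a minimal sketch of one of these techniques, the snippet below performs symmetric post-training int8 quantization of a weight matrix in numpy; real schemes (per-channel scales, activation handling) are more involved.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric post-training quantization: map floats to int8 with one
    per-tensor scale, shrinking weights from 32 bits to 8 bits."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```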
Agent-based Systems using LLMs
Explore agentic architectures and frameworks for LLMs, and separate hype from actual potential. Such systems aim to change how we interact with computers, enable computers to take actions, and push AI technology further.
Conversational AI:
Task-based, Social Conversational Agents & Dialogue Management (Dialogue State Tracking & Policy)
With new capabilities stemming from LLMs, dialogue systems need improved mechanisms for state tracking that cannot always rely purely on context windows. This topic will explore how conversations can be engineered to be empathetic and natural while also achieving the user's goals. As the lines between chitchat and task-oriented dialogue often blur, mechanisms for dialogue management must handle both.
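A minimal sketch of slot-based dialogue state tracking, using invented regex rules and slot names in place of a learned tracker:

```python
import re

def update_state(state: dict, user_utterance: str) -> dict:
    """Minimal rule-based dialogue state tracking: fill task slots from
    the latest turn instead of relying on the raw context window."""
    text = user_utterance.lower()
    if m := re.search(r"table for (\d+)", text):
        state["party_size"] = int(m.group(1))
    for cuisine in ("italian", "thai", "indian"):
        if cuisine in text:
            state["cuisine"] = cuisine
    return state

state: dict = {}
for turn in ["I'd like a table for 4", "Somewhere Italian, please"]:
    state = update_state(state, turn)
print(state)  # {'party_size': 4, 'cuisine': 'italian'}
```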
Knowledge Graphs for Dialogue State Tracking
TBD
Privacy & Security in Natural Language Processing:
Ethical, Societal, and Legal Aspects of Large Language Models
Despite the rapid advances in AI and NLP powered by the impressive capabilities of LLMs, a number of ethical, societal, and legal concerns regarding the proliferation of such tools have come to light. This topic will systematically explore and introduce these concerns, bringing to light important considerations in modern NLP.
Differential Privacy in Natural Language Processing
Among privacy-preserving Natural Language Processing methods, the field of Differential Privacy (DP) in NLP has gained considerable traction in the research community, bringing about numerous innovative technical solutions. Despite this, a number of challenges and open research directions remain. This topic will dive into DP in NLP, with a focus on providing a comprehensive yet approachable introduction to the field.
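As a minimal sketch of the core DP idea, the snippet below answers a count query with the Laplace mechanism; the "documents" are an invented boolean list, and epsilon is the privacy budget.

```python
import numpy as np

def dp_count(values: list[bool], epsilon: float) -> float:
    """Laplace mechanism: a count query has sensitivity 1 (one person can
    change it by at most 1), so noise ~ Laplace(1/epsilon) yields epsilon-DP."""
    true_count = sum(values)
    noise = np.random.default_rng().laplace(scale=1.0 / epsilon)
    return true_count + noise

# Toy example: how many documents mention a sensitive term (invented data).
mentions = [True, False, True, True, False]
print(dp_count(mentions, epsilon=1.0))  # true count 3, plus calibrated noise
```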
Adversarial Attacks on (Large) Language Models and their Mitigations
The rapid growth of LLM usage has sparked countless productive and useful applications. Unfortunately, not all interactions with modern models are well-intentioned, and researchers have uncovered many vulnerabilities in LLMs, along with adversarial attacks that exploit them. This topic will provide a cursory overview of such attacks, as well as survey existing and proposed mitigation strategies for defending against malicious adversaries.
Large Language Model Alignment
The notion of alignment has become extremely important in modern LLMs, yet discussions surrounding the topic are often divisive or ill-defined. This topic will provide clarity on LLM alignment: what it means, what the predominant strategies and current thinking are, and which research directions are most exciting moving forward.
- Prerequisites
- Presentations
- Project/Demo (optional, 0.3 grade bonus)
- Seminar Paper
- Peer Review
Deliverable | Deadline | Format
---|---|---
Final presentation slides | Before your talk | PowerPoint, Keynote, or PDF
Code for the project | After your talk | .zip
Seminar paper for peer review | 28.07.24 | PDF based on provided LaTeX template
Peer review | 04.08.24 | .txt file
Revised seminar paper | 11.08.24 | PDF based on provided LaTeX template
Name | Type | Size | Last Modification | Last Editor
---|---|---|---|---
250210_Afzal_NLP_Seminar_Preliminary_slides.pdf | PDF | 770 KB | 10.02.2025 | Anum Afzal