
Natural Language Processing - Methods and Applications

Last modified by Anum Afzal Apr 1

In today's world, organizations, societies, and institutions rely heavily on natural language for communication. Vast amounts of unstructured information are stored in text documents, posing a significant challenge for machines to swiftly query relevant content and extract structured data. 

Natural language processing (NLP), text mining, and natural language generation collectively encompass machine-driven solutions for text analysis, indexing, and creation. Over the past decades, a wide array of methods and solutions have emerged, propelled by the diverse challenges and rapid advancements in technologies like machine learning. This evolution highlights the vast potential of these tools.

This seminar is designed to explore the core technological components of NLP and their real-world applications. Participants will independently research a scientific topic using existing literature, then present and discuss their findings through presentations and seminar papers, fostering an enriching learning experience.

Weekly Sessions

Session | Date | Time | Topic
-1 | 10.02.2025 | 11:00 - 12:00 | Preliminary Meeting
1 | 25.04.2025 | 10 am - 12 pm | 1) Introduction; 2) Word Embeddings: From Bag-of-words to Transformers; 3) From N-grams to Large Language Models
2 | 02.05.2025 | 10 am - 12 pm |
3 | 09.05.2025 | 10 am - 12 pm | 1) Large Language Models: Building Blocks; 2) From Binary to Extreme – An Overview of Text Classification Methods and their Challenges
4 | 16.05.2025 | 10 am - 12 pm | 1) Information Retrieval: (Domain-Specific) Improvement Strategies; 2) (Generative) Information Extraction (including NER)
5 | 23.05.2025 | 10 am - 12 pm | 1) Prompt-Engineering vs. PEFT Approaches; 2) Domain Adaptations: In-context Learning, RAG and Fine-tuning
6 | 30.05.2025 | 10 am - 12 pm | 1) Natural Language Processing and Computer-Human Interaction; 2) Knowledge Graphs for Dialogue State Tracking
7 | 06.06.2025 | 10 am - 12 pm | 1) Text Summarization: Approaches, Challenges and Evaluation; 2) Question Answering Systems: Challenges and Approaches
8 | 13.06.2025 | 10 am - 12 pm | 1) Agent-based Systems using LLMs; 2) Task-based, Social Conversational Agents & Dialogue Management (Dialogue State Tracking & Policy)
9 | 20.06.2025 | 10 am - 12 pm |
10 | 27.06.2025 | 10 am - 12 pm | 1) Model Hallucination; 2) Common-sense & Logical Reasoning with LLMs
11 | 04.07.2025 | 10 am - 12 pm | 1) Multimodal LLMs: SOTA Models, Techniques and Benchmarks/Usability (and Future Developments); 2) Explainability in Natural Language Processing
12 | 11.07.2025 | 10 am - 12 pm | 1) Tiny LLMs: Quantization, Pruning and Distillation; 2) Large Language Model Alignment
13 | 18.07.2025 | 10 am - 12 pm | 1) Differential Privacy in Natural Language Processing; 2) Adversarial Attacks on (Large) Language Models and their Mitigations
14 | 25.07.2025 | 10 am - 12 pm | 1) Ethical, Societal, and Legal Aspects of Large Language Models; 2) Explainable Fact-Checking with LLMs


Seminar Topics:

Foundations of NLP: 

  • Word Embeddings: From Bag-of-words to Transformers

    Machines cannot process raw text directly; they operate on numbers, so any NLP task first requires representing text numerically. This topic gives an overview of historical approaches to word embeddings, including simple word counting, embedding words as vectors in high-dimensional vector spaces, and modern transformer embeddings.
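    As a concrete illustration, the simplest of these representations, bag-of-words, can be sketched in a few lines (a toy example; function and variable names are our own):

```python
# Toy bag-of-words: each document becomes a vector of word counts
# over a shared, sorted vocabulary.
def bag_of_words(docs):
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1
        vectors.append(v)
    return vocab, vectors

vocab, vecs = bag_of_words(["the cat sat", "the cat ate the fish"])
# vocab: ['ate', 'cat', 'fish', 'sat', 'the']
```

    Dense embeddings replace such sparse count vectors with learned, low-dimensional representations.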

  • From N-grams to Large Language Models

    This topic provides an overview of NLP models over the years, from early classification models built with the Naive Bayes approach and Hidden Markov Models, through Recurrent Neural Networks, to Transformer-based Language Models.
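    The n-gram end of this spectrum is easy to make concrete: a bigram model estimates the probability of the next word from counts of adjacent word pairs (a minimal maximum-likelihood sketch; names are illustrative):

```python
from collections import Counter

def bigram_probs(corpus):
    tokens = corpus.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)
    return {pair: c / unigrams[pair[0]] for pair, c in bigrams.items()}

probs = bigram_probs("the cat sat on the mat")
# probs[("the", "cat")] == 0.5, since "the" occurs twice
```

    Neural language models replace these count tables with learned parameters, but the underlying task of predicting the next token is the same.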

  • Large Language Models: Building Blocks

    This topic covers the distinction between Language Models and Large Language Models, their building blocks and training strategies, and different parameter scales of the same model.

 

Techniques in NLP: 

  • From Binary to Extreme – An Overview of Text Classification Methods and their Challenges

    Text classification is a fundamental task in Natural Language Processing, yet there are many ways to approach it. This topic explores the spectrum of text classification methods and discusses existing challenges and limitations.
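    One classical point on that spectrum is a multinomial Naive Bayes classifier, sketched here from scratch with add-one smoothing (documents, labels, and names are invented for illustration):

```python
import math
from collections import Counter, defaultdict

def train(docs, labels):
    # Count words per class and collect the shared vocabulary.
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for doc, y in zip(docs, labels):
        words = doc.lower().split()
        word_counts[y].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(doc, word_counts, class_counts, vocab):
    best, best_lp = None, float("-inf")
    total_docs = sum(class_counts.values())
    for y in class_counts:
        lp = math.log(class_counts[y] / total_docs)   # class prior
        denom = sum(word_counts[y].values()) + len(vocab)
        for w in doc.lower().split():
            lp += math.log((word_counts[y][w] + 1) / denom)  # add-one smoothing
        if lp > best_lp:
            best, best_lp = y, lp
    return best

model = train(["great movie", "loved it", "terrible film", "hated it"],
              ["pos", "pos", "neg", "neg"])
label = predict("loved movie", *model)  # -> "pos"
```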

  • Natural Language Processing and Computer-Human Interaction

    TBD
  • Information Retrieval: (Domain-Specific) Improvement Strategies 

    This topic covers principles of dense and sparse Information Retrieval (IR). Since IR is the foundation of approaches such as RAG systems, it is important to ensure good retrieval quality, especially in specialized domains such as medicine. This topic therefore investigates techniques to make IR more domain-aware.
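    Sparse retrieval can be illustrated with a bare-bones TF-IDF ranker (a simplified relative of BM25; the corpus and names are invented):

```python
import math
from collections import Counter

def rank(query, docs):
    # Score each document by the TF-IDF weight of the query terms it contains.
    N = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(N / df[w]) for w in df}
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        s = sum(tf[w] * idf.get(w, 0.0) for w in query.lower().split())
        scores.append((s, i))
    return [i for s, i in sorted(scores, reverse=True)]

order = rank("heart attack symptoms",
             ["symptoms of a heart attack",
              "stock market crash",
              "healthy heart diet"])
# best match first: document 0
```

    Dense retrieval instead compares learned query and passage embeddings; domain adaptation then means tuning those embeddings (or the term weights) on in-domain data.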

  • (Generative) Information Extraction (including NER)

    Information Extraction, especially Named Entity Recognition, is a commonly used technique for structuring textual data. This topic explores traditional techniques as well as generative approaches.
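    For contrast with generative approaches, the traditional end can be caricatured with rule-based extraction (toy regular expressions, not a production NER system):

```python
import re

# Capitalized-sequence heuristic for candidate named entities, plus a
# simple pattern for dd.mm.yyyy dates (both purely illustrative).
ENTITY = re.compile(r"\b(?:[A-Z][a-z]+ )*[A-Z][a-z]+\b")
DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")

def extract(text):
    return {"entities": ENTITY.findall(text), "dates": DATE.findall(text)}

out = extract("Angela Merkel visited Munich on 12.05.2025.")
# entities: ['Angela Merkel', 'Munich']; dates: ['12.05.2025']
```

    A generative formulation would instead prompt an LLM to output the entities directly, trading hand-written rules for model calls.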

Large Language Models: 

  • Domain Adaptations: In-context Learning, RAG and Fine-tuning

    Language Models are often trained on general-purpose data and do not perform well in specialized domains. Additionally, they have a knowledge cut-off depending on when they were trained. This topic covers techniques for injecting new data into LLMs, either by updating weights (fine-tuning) or through external techniques like In-context Learning and Retrieval Augmented Generation.
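    The retrieval-augmented route can be sketched end to end with a toy word-overlap retriever and a prompt template (passages and names are invented; a real system would use a dense retriever and an actual LLM call):

```python
def retrieve(query, passages, k=2):
    # Rank passages by how many query words they share (toy retriever).
    q = set(query.lower().split())
    scored = sorted(passages,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, passages, k=2):
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

prompt = build_prompt("When was the seminar founded?",
                      ["The seminar was founded in 2019.",
                       "Lunch is served at noon.",
                       "It covers NLP methods."],
                      k=1)
```

    Fine-tuning instead bakes the new knowledge into the weights; in-context learning places examples in the prompt without any retrieval step.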

  • Prompt-Engineering vs. PEFT Approaches

    Since training or fine-tuning LLMs is expensive due to their hardware requirements, several approaches have been developed to decrease the resources needed to adapt LLMs to specific tasks. This topic focuses on various parameter-efficient fine-tuning (PEFT) and prompt-engineering techniques with respect to different use cases.
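    The resource argument behind PEFT is easy to quantify. For a single d x d weight matrix, LoRA with rank r trains two low-rank factors B (d x r) and A (r x d) instead of the full matrix (the numbers below are illustrative):

```python
def trainable_params(d, r=None):
    # Full fine-tuning updates d*d weights; LoRA trains only 2*d*r.
    return d * d if r is None else 2 * d * r

full = trainable_params(4096)        # full fine-tuning of one layer
lora = trainable_params(4096, r=8)   # LoRA with rank 8
reduction = full / lora              # 256x fewer trainable parameters
```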

  • Text Summarization: Approaches, Challenges and Evaluation

    Automatic Text Summarization condenses enormous amounts of text into concise summaries. This topic covers various text summarization techniques, types of summarization, and the challenges involved in evaluating summaries.
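    The extractive family of approaches can be illustrated with a frequency heuristic that keeps the highest-scoring sentences (a toy sketch, far from a neural summarizer):

```python
from collections import Counter

def summarize(text, k=1):
    # Score each sentence by the average corpus frequency of its words.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(w for s in sentences for w in s.lower().split())
    def score(s):
        words = s.lower().split()
        return sum(freq[w] for w in words) / len(words)
    return sorted(sentences, key=score, reverse=True)[:k]

top = summarize("NLP models process text. They generate text. Birds sing.", k=1)
```

    Abstractive summarization generates new sentences instead of selecting existing ones, which is where most evaluation difficulties arise.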

  • Question Answering Systems: Challenges and Approaches

    One of the oldest tasks in NLP is question answering, i.e., building systems that can provide an answer to a posed user question. This includes multiple-choice QA, short-span QA, or long-form QA. The topic will investigate historical and recent approaches to QA systems and the challenges that come with integrating evidence, evaluating the answers, and more. 

  • Model Hallucination

    Generative language models tend to produce coherent text that sometimes contains hallucinations: text that is factually inconsistent or contradicts established knowledge. This topic investigates why hallucinations emerge, their characteristics in different NLP tasks, and methods to detect and mitigate them.
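    A very crude detection idea, checking how many content words of a generated sentence are absent from the source, can be sketched as follows (a toy heuristic only; real detectors use entailment models or fact-checking pipelines):

```python
def unsupported_ratio(source, generated):
    # Fraction of longer (content-ish) generated words missing from the source.
    src = set(source.lower().split())
    gen = [w for w in generated.lower().split() if len(w) > 3]
    if not gen:
        return 0.0
    return sum(1 for w in gen if w not in src) / len(gen)

score = unsupported_ratio(
    "The seminar covers NLP methods and applications.",
    "The seminar covers quantum gravity research.")
# high score -> many unsupported claims
```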

  • Common-sense & Logical Reasoning with LLMs 

    LLMs mainly work with unstructured textual input. Their reasoning capabilities can be improved by introducing logical predicates, thinking about problems step-by-step, or other techniques that improve their common-sense reasoning skills. The topic will investigate recent advancements in improving the reasoning process in LLMs. 

  • Multimodal LLMs: SOTA Models, Techniques and Benchmarks/Usability (and Future Developments)

    Recent LLMs already perform well on numerous text-only tasks. However, many real-world use cases involve, for instance, text, tables, charts, and images. Under this topic we will look into techniques, evaluation, and the usability of multimodal models.

  • Explainability in Natural Language Processing

    This topic explores the explainability aspect of NLP, covering various methods and approaches to interpret model outputs and decisions. It spans explainability techniques for both pre-trained and large language models while also highlighting the challenges and limitations of these methods.

  • Explainable Fact-Checking with LLMs

    This topic covers research on the ability of LLMs to verify claims and generate high-quality explanations and justifications for their assessments. It covers methods, datasets, and evaluation metrics, as well as various approaches for generating justifications and evaluating explanation quality in claim verification. 

  • Tiny LLMs: Quantization, Pruning and Distillation  

    Modern LLMs are often scaled to datacenter sizes to achieve their results, but many applications require them to run on edge devices. This topic covers techniques to reduce model size while keeping as much quality as possible. You will explore quantization for compressing general models, and pruning and distillation for task-specific size reduction.
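    Quantization, the first of these techniques, can be illustrated with symmetric 8-bit rounding of a small weight vector (a toy sketch; real schemes use per-channel scales, calibration, and integer kernels):

```python
def quantize(xs):
    # One shared scale maps the largest magnitude to the int8 limit 127.
    scale = max(abs(x) for x in xs) / 127
    q = [round(x / scale) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

q, scale = quantize([0.1, -0.4, 0.25, 1.0])
approx = dequantize(q, scale)  # close to the inputs, up to rounding error
```

    Pruning removes weights outright and distillation trains a smaller student model, so the three techniques are complementary.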

  • Agent-based Systems using LLMs

    Explore agentic architectures and frameworks for LLMs, and separate hype from actual potential. Such systems aim to change how we interact with computers, enable computers to take actions, and push AI technology further.

Conversational AI: 

  • Task-based, Social Conversational Agents & Dialogue Management (Dialogue State Tracking & Policy) 

    With new capabilities stemming from LLMs, dialogue systems need improved mechanisms for state tracking that cannot always rely purely on context windows. This topic explores how conversations can be engineered to be empathetic and natural while also achieving the goals of the user. As the lines between chitchat and task-oriented dialogue often blur, mechanisms for handling both within a single system are also of interest.
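    A frame-based tracker, the classical baseline such work builds on, can be sketched as a slot-filling dictionary updated on each user turn (toy slots and patterns, invented for illustration):

```python
import re

# Hypothetical restaurant-search domain with two slots.
SLOT_PATTERNS = {
    "cuisine": re.compile(r"\b(italian|chinese|indian)\b"),
    "area": re.compile(r"\b(north|south|centre)\b"),
}

def update_state(state, utterance):
    # Fill or overwrite any slot mentioned in this turn.
    text = utterance.lower()
    for slot, pattern in SLOT_PATTERNS.items():
        m = pattern.search(text)
        if m:
            state[slot] = m.group(1)
    return state

state = {}
update_state(state, "I want Italian food")
update_state(state, "somewhere in the centre")
# state: {'cuisine': 'italian', 'area': 'centre'}
```

    LLM-based trackers replace the hand-written patterns with model predictions, but the slot-value state they must maintain looks much the same.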

  • Knowledge Graphs for Dialogue State Tracking

    TBD

Privacy & Security in Natural Language Processing: 

  • Ethical, Societal, and Legal Aspects of Large Language Models

    Despite the impressive capabilities of LLMs powering the rapidly advancing fields of AI and NLP, a number of ethical, societal, and legal concerns regarding the proliferation of such tools have come to light. This topic systematically explores and introduces these concerns, highlighting important considerations in modern NLP.

  • Differential Privacy in Natural Language Processing  

    Among privacy-preserving Natural Language Processing methods, the field of Differential Privacy (DP) in NLP has gained considerable traction in the research community, bringing about numerous innovative technical solutions. Despite this, there remains a number of challenges and open research directions. This topic will dive into DP in NLP, with a focus on providing a comprehensive yet approachable introduction to the field. 
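    The flavor of DP guarantees can be conveyed with randomized response, one of the oldest eps-differentially-private mechanisms (a toy simulation; DP in NLP typically perturbs embeddings or training gradients instead):

```python
import math
import random

def randomize(bit, eps):
    # Keep the true bit with probability e^eps / (1 + e^eps), else flip it.
    keep = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < keep else 1 - bit

def estimate(responses, eps):
    # Debias the noisy mean to recover the true proportion in expectation.
    keep = math.exp(eps) / (1 + math.exp(eps))
    mean = sum(responses) / len(responses)
    return (mean - (1 - keep)) / (2 * keep - 1)

random.seed(0)
bits = [1] * 700 + [0] * 300                    # true proportion: 0.7
noisy = [randomize(b, eps=1.0) for b in bits]   # each response is eps-DP
est = estimate(noisy, eps=1.0)                  # close to 0.7 on average
```

    No single noisy response reveals the true bit with confidence, yet the aggregate statistic remains usable; that tension is the core of DP.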

  • Adversarial Attacks on (Large) Language Models and their Mitigations

    The extremely rapid growth of LLM usage has sparked countless productive and useful applications. Unfortunately, not all interactions with modern models are well-intentioned, and research has uncovered many vulnerabilities in LLMs, along with adversarial attacks that exploit them. This topic provides an overview of such attacks and surveys existing and proposed mitigation strategies to defend against malicious adversaries.

  • Large Language Model Alignment

    The notion of alignment has become extremely important in modern LLMs, yet discussions surrounding the topic are often divisive or not well-defined. This topic will provide clarity on LLM alignment – what it means, what the predominant strategies and current thinking are, and which research directions are most exciting moving forward.

 

 

Deliverables

Prerequisites

  • Regular attendance (not more than one missed session)
  • Active Participation during sessions

Presentations

  • 45 min total:
    • 30 min presentation
    • 15 min discussion

Project/Demo

  • Optional; 0.3 grade bonus

Seminar Paper

  • 8 pages
  • LaTeX template provided on Moodle

Peer Review

  • 2 reviews of other seminar papers


Submissions

Deliverable | Deadline | Format
Final presentation slides | Before your talk | PowerPoint, Keynote or PDF
Code for the project | After your talk | .zip
Seminar paper for peer review | 28.07.24 | PDF based on provided LaTeX template
Peer review | 04.08.24 | txt file
Revised seminar paper | 11.08.24 | PDF based on provided LaTeX template

Files and Subpages

Name Type Size Last Modification Last Editor
250210_Afzal_NLP_Seminar_Preliminary_slides.pdf 770 KB 10.02.2025 Anum Afzal