Advanced Natural Language Processing
M.Sc course, University of Debrecen, Department of Data Science and Visualization, 2025
This course delves into advanced concepts of Natural Language Processing (NLP) and Machine Learning (ML) with a strong focus on modern deep learning techniques. It covers foundational topics such as tokenization, text representation, and pipelines, as well as cutting-edge research in large language models (LLMs), transformers, and their applications. The course emphasizes both theoretical understanding and practical implementation, preparing students to tackle real-world NLP challenges, including security, privacy, and human-centered design. During the semester, students will also have the opportunity to test and train these architectures on real data using cloud-based services (Google Colab).
---
Email address of the Teacher
Consultation
- Monday / Hétfő - 14:00 - 15:30 - IK 107 (in the classroom / teremben)
- Monday / Hétfő - 15:30 - 16:00 - IK I128 (in the office / irodában)
- Monday / Hétfő - 16:00 - 17:30 - IK 204 (in the classroom / teremben)
- Monday / Hétfő - 17:30 - 18:00 - IK I128 (in the office / irodában)
- Monday / Hétfő - 18:00 - 19:30 - IK 204 (in the classroom / teremben)
- Tuesday / Kedd - 14:00 - 15:30 - IK 132 (in the classroom / teremben)
- Tuesday / Kedd - 15:30 - 16:00 - IK I128 (in the office / irodában)
- Tuesday / Kedd - 16:00 - 17:30 - IK TEOKJ II. em. 109 (in the classroom / teremben)
- Tuesday / Kedd - 17:30 - 18:00 - IK I128 (in the office / irodában)
- Tuesday / Kedd - 18:00 - 19:30 - IK TEOKJ II. em. 106 (in the classroom / teremben)
Attendance sheet
Attendance sheet status
Requirements
Project Overview
The goal of this assignment is to design, implement, and evaluate a Multi-Agent System (MAS) using Large Language Models (LLMs). Instead of relying on a single “monolithic” chat prompt, you must decompose a complex problem into specific sub-tasks handled by at least two distinct agents.
A key part of this project is understanding the transition from “Prompt Engineering” to “Agentic Workflows.” You will demonstrate how specialized roles, even when powered by the same local model, can produce superior results compared to a single-agent baseline.
Core Requirements
- Attendance sheet: Fewer absences than allowed. Active participation in classes.
- Create a working application, solve a real problem, and present it as a video using the solutions and models learned in class.
- The video must be 5-10 minutes long.
- In the video, each creator must present their own contribution (3-8 minutes).
- The application must be shown in action at the end of the video (1-2 minutes).
- Video size must be 50 MB or less.
- The group members (students) must submit a single, clean Jupyter notebook (.ipynb) file, with all output cells cleared, containing all the project code in Python.
- Local Execution: All agents must run locally using Ollama.
- Multi-Agent Design: Use at least two agents with unique system instructions (e.g., a Researcher and a Technical Writer, or a Coder and a Reviewer).
- Comparative Analysis: You must compare the multi-agent output against a “Baseline” (a single prompt sent to a standard chat agent).
- Organizing into teams (2-4 people) or working individually.
- Team members do not necessarily receive a uniform grade; each member is graded in proportion to the work they completed on the project.
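The requirements above can be sketched in code. The following is a minimal, illustrative sketch of a two-agent Writer + Editor workflow against a locally running Ollama server; the `Agent` class, `writer_editor_pipeline` helper, and the model name `llama3.2` are this sketch's own assumptions, not part of the assignment spec. The chat backend is pluggable, so the orchestration logic can be tested without a server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def ollama_chat(model, messages):
    """Send a non-streaming chat request to a locally running Ollama server."""
    payload = json.dumps({"model": model, "messages": messages, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

class Agent:
    """One role in the workflow: a distinct system prompt plus a chat backend."""
    def __init__(self, system_prompt, chat_fn=ollama_chat, model="llama3.2"):
        self.system_prompt = system_prompt
        self.chat_fn = chat_fn
        self.model = model

    def run(self, user_message):
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_message},
        ]
        return self.chat_fn(self.model, messages)

def writer_editor_pipeline(task, chat_fn=ollama_chat):
    """Writer drafts an answer, then Editor revises it; returns both outputs
    so the draft can be compared against the single-agent baseline."""
    writer = Agent("You are a technical writer. Draft a concise answer.", chat_fn)
    editor = Agent("You are a strict editor. Improve the draft you receive.", chat_fn)
    draft = writer.run(task)
    final = editor.run(f"Task: {task}\n\nDraft to improve:\n{draft}")
    return draft, final
```

For the required comparative analysis, the `draft` (or a plain single-prompt call) can serve as the baseline, while `final` is the multi-agent output.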
Evaluation
Project type 1
Build an LLM-based multi-agent system based on your own idea.
| Grade | Requirements |
|---|---|
| 5 (Excellent) | Outstanding original idea. Robust orchestration (e.g., iterative feedback loops between agents). Deep theoretical understanding of parameters (temperature, context window) and clear proof that the multi-agent approach significantly outperformed the baseline. |
| 4 (Good) | Complex interaction logic. Agents provide critical feedback to one another. Detailed comparison with the baseline using specific metrics (e.g., accuracy, code quality, or hallucination reduction). |
| 3 (Fair) | Clearly defined roles with distinct System Prompts. The notebook is well-documented, and the comparison between the “single chat” and “multi-agent” results is present. |
| 2 (Pass) | Basic implementation of two agents (e.g., Writer + Editor). The code runs on Ollama, but agent differentiation is minimal. The video explains the “how” but struggles with the “why” regarding agentic workflows. |
| 1 (Fail) | The system does not run locally, uses only one agent, lacks the required video/notebook components, or was submitted after the deadline. |
Project type 2
Create (Train/Fine-tune) an efficient LLM for your specific task.
| Task | Grade 3 | Grade 4 | Grade 5 |
|---|---|---|---|
| Data | You need data for your project. | Exploratory data analysis (visualization) | |
| Preprocess | The code must include preprocessing steps (curate the data). | Unit tests, visualization | |
| Model | Train an LLM on the data. | Use at least 3 models. | |
| Evaluate | Evaluate your model on your test data. | Complex benchmarking | |
| Application | Package your model into an application. Present your pipeline and use the technical terms/keywords correctly. | Create a conclusion. Explain how your model works. Present the limitations and opportunities for further development. | |
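The Preprocess and Evaluate rows above can be illustrated with a short sketch. The `curate` and `accuracy` helpers below are hypothetical names chosen for this example, not required interfaces; they show the kind of data curation and test-set evaluation the table asks for.

```python
def curate(texts):
    """Basic curation: normalize whitespace, drop empty strings and
    case-insensitive exact duplicates, preserving first occurrences."""
    seen, cleaned = set(), []
    for t in texts:
        t = " ".join(t.split())          # collapse runs of whitespace
        if t and t.lower() not in seen:  # skip empties and duplicates
            seen.add(t.lower())
            cleaned.append(t)
    return cleaned

def accuracy(predictions, references):
    """Exact-match accuracy on a held-out test set."""
    if len(predictions) != len(references):
        raise ValueError("prediction/reference length mismatch")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

In a real project these steps would sit in the notebook between data loading and model training, with the evaluation run on a split the model never saw during fine-tuning.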
Submission
- Submission deadline: 2026.05.24
- Submission form
Lecture
- I. Tokenization
- I. Text representation I.
- I. Text representation II.
- II. Large language models I. fancy-rnn
- II. Large language models I. CNN-TreeRNN
- III. Large language models II. Basic
- III. Large language models II. Transformer
- IV. Pretrain - podcast : hu / en
- IV. Question Answering - podcast hu
- V. Prompting RLHF - podcast hu / en
- VI. Life After DPO
- VI. Training
- VII. Efficient Adaptation
- IX. Evaluation
- IX. Evaluation - Example: Arcprize
- X. Human-Centered NLP
Labor
Basics
Transformers
- IV Transformers - (online)
- V. GPT - (online)
- VI. BERT
Efficient
LLM-based AI application
- IX. Local AI
Submitted
Useful Links
Recommended Literatures and Courses
- Jurafsky, Daniel, and James H. Martin. “Speech and Language Processing” (draft). Chapter A: Hidden Markov Models (draft of September 11, 2018).
- Eisenstein, Jacob. “Introduction to Natural Language Processing.” MIT Press, 2019.
- Goldberg, Yoav. “A Primer on Neural Network Models for Natural Language Processing.” Journal of Artificial Intelligence Research 57 (2016): 345-420.
- Chollet, François. “Deep Learning with Python.”
- Hugging Face NLP Course
- MIT Introduction to Deep Learning
- Visual Guide to Transformer Neural Networks - (Episode 1)
- Visual Guide to Transformer Neural Networks - (Episode 2)
- Visual Guide to Transformer Neural Networks - (Episode 3)
- Stanford CS 224N / Ling 280 — Natural Language Processing
Useful Publications
[2] Improving Language Understanding by Generative Pre-Training
[3] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
[4] Efficient Estimation of Word Representations in Vector Space
