Advanced Natural Language Processing

M.Sc course, University of Debrecen, Department of Data Science and Visualization, 2025

This course delves into advanced concepts of Natural Language Processing (NLP) and Machine Learning (ML) with a strong focus on modern deep learning techniques. It covers foundational topics such as tokenization, text representation, and pipelines, as well as cutting-edge research in large language models (LLMs), transformers, and their applications. The course emphasizes both theoretical understanding and practical implementation, preparing students to tackle real-world NLP challenges, including security, privacy, and human-centered design. During the semester, students will also have the opportunity to test and train these architectures on real data using cloud-based services (Google Collab).

======

Email address of the Teacher

Consultation

  • Monday / Hétfő - 14:00 - 15:30 - IK 107 (in the classroom / teremben)
  • Monday / Hétfő - 15:30 - 16:00 - IK I128 (in the office / irodában)
  • Monday / Hétfő - 16:00 - 17:30 - IK 204 (in the classroom / teremben)
  • Monday / Hétfő - 17:30 - 18:00 - IK I128 (in the office / irodában)
  • Monday / Hétfő - 18:00 - 19:30 - IK 204 (in the classroom / teremben)

  • Tuesday / Kedd - 14:00 - 15:30 - IK 132 (in the classroom / teremben)
  • Tuesday / Kedd - 15:30 - 16:00 - IK I128 (in the office / irodában)
  • Tuesday / Kedd - 16:00 - 17:30 - IK TEOKJ II. em. 109 (in the classroom / teremben)
  • Tuesday / Kedd - 17:30 - 18:00 - IK I128 (in the office / irodában)
  • Tuesday / Kedd - 18:00 - 19:30 - IK TEOKJ II. em. 106 (in the classroom / teremben)

Attendance sheet

Attendance sheet status

Requirements

Project Overview

The goal of this assignment is to design, implement, and evaluate a Multi-Agent System (MAS) using Large Language Models (LLMs). Instead of relying on a single “monolithic” chat prompt, you must decompose a complex problem into specific sub-tasks handled by at least two distinct agents.

A key part of this project is understanding the transition from “Prompt Engineering” to “Agentic Workflows.” You will demonstrate how specialized roles, even when powered by the same local model, can produce superior results compared to a single-agent baseline.

Core Requirements

  • Attendance sheet: Fewer absences than allowed. Active participation in classes.
  • Create a working application, solve a real problem, and present it as a video using the solutions and models learned in class.
    • Maximum length of video is 5-10 minutes.
    • In the video, each creator must present their own contribution. (for 3-8 minutes)
    • The application must be shown in action at the end of the video. (for 1-2 minutes)
    • Video size must be 50 MB or less.
    • The group members (students) have to send one and clear (without executed blocks) Jupyter notebook (ipynb) file with Python code that contains all the project code.
    • Local Execution: All agents must run locally using Ollama.
    • Multi-Agent Design: Use at least two agents with unique system instructions (e.g., a Researcher and a Technical Writer, or a Coder and a Reviewer).
    • Comparative Analysis: You must compare the multi-agent output against a “Baseline” (a single prompt sent to a standard chat agent).
  • Organizing into teams (2-4 people) or working individually.
  • It is not certain that the team members receive a uniform grade, but they get grades proportionate to the task they have completed in the project.

Evaluation

Project type 1

Build an LLM-based multi-agent based on your own idea.

GradeRequirements
5 (Excellent)Outstanding original idea. Robust orchestration (e.g., iterative feedback loops between agents). Deep theoretical understanding of parameters (temperature, context window) and clear proof that the multi-agent approach significantly outperformed the baseline.
4 (Good)Complex interaction logic. Agents provide critical feedback to one another. Detailed comparison with the baseline using specific metrics (e.g., accuracy, code quality, or hallucination reduction).
3 (Fair)Practitioner Clearly defined roles with distinct System Prompts. The notebook is well-documented and the comparison between the “single chat” and “multi-agent” results is present.
2 (Pass)Basic implementation of two agents (e.g., Writer + Editor). The code runs on Ollama, but agent differentiation is minimal. The video explains the “how” but struggles with the “why” regarding agentic workflows.
1 (Fail)The system does not run locally, uses only one agent, or lacks the required video/notebook components. The submission was after the deadline.

Project type 2

Create (Train/Fine-tune) an efficient LLM for your specific task.

TaskGrade 3Grade 4Grade 5
DataYou need data your project.Exploratory data analysis (Visualization) 
PreprocessThe code should include preprocessing steps. (Cureate the data)UnitTest, Visualization 
ModelTrain a LLM model on the data. Use more models, at least 3
EvaluateEvaluate your model on your test data.Complex Benchmarking 
ApplicationPackage your model into an application. Present the your pipeline and use the technical term / keywords correctly. Create a conclusion. Explain how your model works. Present the limitations and opportunities for further development.

Submission

Lecture

Labor

Basics

Transformers

Efficient

LLM based AI application

Submitted

  1. Jurafsky, Daniel, and James H. Martin. “Speech and language processing (draft).” Chapter A: Hidden Markov Models (Draft of September 11, 2018). Retrieved March 19 (2018): 2019.
  2. Eisenstein, Jacob. “Introduction to natural language processing.” MIT press, 2019.
  3. Goldberg, Yoav. “A primer on neural network models for natural language processing.” Journal of Artificial Intelligence Research 57 (2016): 345-420.
  4. Francois Chollet. “Deep Learning with Python”
  5. Hugging Face NLP Course
  6. MIT Introduction to Deep Learning
  7. Visual Guide to Transformer Neural Networks - (Episode 1)
  8. Visual Guide to Transformer Neural Networks - (Episode 2)
  9. Visual Guide to Transformer Neural Networks - (Episode 3)
  10. Stanford CS 224N / Ling 280 — Natural Language Processing

Usefull Publications

[1] Attention Is All You Need

[2] Improving Language Understanding by Generative Pre-Training

[3] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[4] Efficient Estimation of Word Representations in Vector Space

[5] Global Vectors for Node Representations