Advanced Natural Language Processing

M.Sc course, University of Debrecen, Department of Data Science and Visualization, 2025

This course delves into advanced concepts of Natural Language Processing (NLP) and Machine Learning (ML) with a strong focus on modern deep learning techniques. It covers foundational topics such as tokenization, text representation, and pipelines, as well as cutting-edge research in large language models (LLMs), transformers, and their applications. The course emphasizes both theoretical understanding and practical implementation, preparing students to tackle real-world NLP challenges, including security, privacy, and human-centered design. During the semester, students will also have the opportunity to train and test these architectures on real data using a cloud-based service (Google Colab).

======

Email address of the Teacher

Consultation

  • Monday - 14:00 - 15:30 - IK 107 (in the classroom)
  • Monday - 15:30 - 16:00 - IK I128 (in the office)
  • Monday - 16:00 - 17:30 - IK 204 (in the classroom)
  • Monday - 17:30 - 18:00 - IK I128 (in the office)
  • Monday - 18:00 - 19:30 - IK 204 (in the classroom)

  • Tuesday - 14:00 - 15:30 - IK 132 (in the classroom)
  • Tuesday - 15:30 - 16:00 - IK I128 (in the office)
  • Tuesday - 16:00 - 17:30 - IK TEOKJ II. em. 109 (in the classroom)
  • Tuesday - 17:30 - 18:00 - IK I128 (in the office)
  • Tuesday - 18:00 - 19:30 - IK TEOKJ II. em. 106 (in the classroom)

Attendance sheet

Attendance sheet status

Requirements

  • Attendance: stay within the permitted number of absences and participate actively in classes.
  • Create a working application, solve a real problem, and present it as a video using the solutions and models learned in class.
    • It must be uploaded to GitHub and shared.
    • The video must be 5-10 minutes long.
    • In the video, each creator must present their own contribution (3-8 minutes in total).
    • The application must be shown in action at the end of the video. (for 1-2 minutes)
    • Video size must be 50 MB or less.
    • The group members (students) must submit a single Jupyter notebook (.ipynb) file containing all of the project's Python code.
  • Students may work in teams of 2-4 people or individually.
  • If the creators use a service based on a generative language model to complete the task, they must attach the prompt log to the completed project as additional material.
  • Team members do not necessarily receive the same grade; each member is graded in proportion to the work they completed in the project.
  • Submission deadline: 2025.05.24
  • Submission form

Lecture

Lab

Basics

Submitted

  1. Jurafsky, Daniel, and James H. Martin. “Speech and language processing (draft).” Chapter A: Hidden Markov Models (Draft of September 11, 2018). Retrieved March 19 (2018): 2019.
  2. Eisenstein, Jacob. “Introduction to natural language processing.” MIT press, 2019.
  3. Goldberg, Yoav. “A primer on neural network models for natural language processing.” Journal of Artificial Intelligence Research 57 (2016): 345-420.
  4. Chollet, François. “Deep Learning with Python.”
  5. Hugging Face NLP Course
  6. MIT Introduction to Deep Learning
  7. Visual Guide to Transformer Neural Networks - (Episode 1)
  8. Visual Guide to Transformer Neural Networks - (Episode 2)
  9. Visual Guide to Transformer Neural Networks - (Episode 3)
  10. Stanford CS 224N / Ling 280 — Natural Language Processing

Useful Publications

[1] Attention Is All You Need

[2] Improving Language Understanding by Generative Pre-Training

[3] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[4] Efficient Estimation of Word Representations in Vector Space

[5] Global Vectors for Node Representations