Introduction to Natural Language Text Processing

B.Sc course, University of Debrecen, Department of Data Science and Visualization, 2024

Within the framework of the subject, students learn the basics of natural language processing (NLP) and gain practical experience by solving various tasks. Main topics: logistic regression, the naive Bayes model, PCA, n-gram models, Word2Vec, and classical and recurrent neural networks. Students also gain insight into current, modern neural architectures, and during the semester they have the opportunity to test and train these architectures on real data using cloud-based services (Google Colab).
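
As a small, illustrative taste of the kind of task covered (not part of the official course material), the sketch below trains a toy bag-of-words naive Bayes text classifier with scikit-learn; the example sentences, labels, and the choice of scikit-learn are assumptions made only for this illustration.

```python
# Illustrative sketch only: a toy bag-of-words naive Bayes sentiment classifier.
# The sentences, labels, and the use of scikit-learn are assumptions for this example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "I loved this movie, it was fantastic",
    "Terrible film, a complete waste of time",
    "What a great and touching story",
    "Boring plot and awful acting",
]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words features followed by a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["such a great film"]))  # expected: ['positive']
```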

======

Email address of the Teacher

Requirements

  • Attendance: no more absences than the permitted limit, and active participation in classes.
  • Create a working application that solves a real problem using the solutions and models learned in class, and present it in a video.
    • The project must be uploaded to GitHub and shared.
    • The video must be 5-10 minutes long.
    • In the video, each creator must present their own contribution (3-8 minutes).
    • The application must be shown in action at the end of the video (1-2 minutes).
  • Work in teams (2-4 people) or individually.
  • If the creator(s) use a service based on a generative language model to complete the task, they must attach the prompt log to the completed project as supplementary material.
  • Team members do not necessarily receive the same grade; each member is graded in proportion to the part of the project they completed.
  • Submission deadline: 2024.12.20.
  • Submission form
    • Mandatory fields: Neptun code, Video link, Source link.
    • For teams, enter the Neptun codes as a comma-separated list.
    • The Source link contains the source code.
    • The Video link contains the video.
    • If the two links are the same, enter the same link in both places.
    • If there are no exceptional obstacles, please allow (choose ‘yes’ on the form) your submitted work to be shared with the students of the following semesters within the framework of the subject.

Labs

Recommended literature

  1. Jurafsky, Daniel, and James H. Martin. “Speech and Language Processing” (3rd ed. draft), Chapter A: Hidden Markov Models. Draft of September 11, 2018.
  2. Eisenstein, Jacob. “Introduction to Natural Language Processing.” MIT Press, 2019.
  3. Goldberg, Yoav. “A Primer on Neural Network Models for Natural Language Processing.” Journal of Artificial Intelligence Research 57 (2016): 345-420.
  4. Chollet, François. “Deep Learning with Python.” Manning, 2017.
  5. Hugging Face NLP Course
  6. MIT Introduction to Deep Learning