Introduction to Natural Language Text Processing

B.Sc course, University of Debrecen, Department of Data Science and Visualization, 2024

Within the framework of the subject, students learn the basics of natural language processing (NLP) and gain practical experience by solving various tasks. Main topics: logistic regression, the naive Bayes model, PCA, n-gram models, Word2Vec, and classical and recurrent neural networks. Students also gain insight into current, modern neural architectures, and during the semester they have the opportunity to test and train these architectures on real data using cloud-based services (Google Colab).
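
As a small, illustrative taste of the kind of task covered (not part of the official course material), the sketch below trains a toy bag-of-words naive Bayes text classifier with scikit-learn; the example sentences, labels, and the choice of scikit-learn are assumptions made only for this illustration.

```python
# Illustrative sketch only: a toy bag-of-words naive Bayes sentiment classifier.
# The sentences, labels, and the use of scikit-learn are assumptions for this example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "I loved this movie, it was fantastic",
    "Terrible film, a complete waste of time",
    "What a great and touching story",
    "Boring plot and awful acting",
]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words features followed by a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["such a great film"]))  # expected: ['positive']
```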

======

Email address of the Teacher

Requirements

  • Attendance: no more absences than the permitted limit, and active participation in classes.
  • Create a working application that solves a real problem using the solutions and models learned in class, and present it in a video.
    • The project must be uploaded to GitHub and shared.
    • The video must be 5-10 minutes long.
    • In the video, each creator must present their own contribution (3-8 minutes).
    • The application must be shown in action at the end of the video (1-2 minutes).
  • Work in teams (2-4 people) or individually.
  • If the creator(s) use a service based on a generative language model to complete the task, they must attach the prompt log to the completed project as supplementary material.
  • Team members do not necessarily receive the same grade; each member is graded in proportion to the part of the project they completed.
  • Submission deadline: 2024.12.20.
  • Submission form
    • Mandatory fields: Neptun code, Video link, Source link.
    • For teams, enter the Neptun codes as a comma-separated list.
    • The Source link contains the source code.
    • The Video link contains the video.
    • If the two links are the same, enter the same link in both places.
    • If there are no exceptional obstacles, please allow (choose ‘yes’ on the form) your submitted work to be shared with the students of the following semesters within the framework of the subject.

Labs

Recommended literature

  1. Jurafsky, Daniel, and James H. Martin. “Speech and Language Processing” (3rd ed. draft), Chapter A: Hidden Markov Models. Draft of September 11, 2018.
  2. Eisenstein, Jacob. “Introduction to Natural Language Processing.” MIT Press, 2019.
  3. Goldberg, Yoav. “A Primer on Neural Network Models for Natural Language Processing.” Journal of Artificial Intelligence Research 57 (2016): 345-420.
  4. Chollet, François. “Deep Learning with Python.” Manning, 2017.
  5. Hugging Face NLP Course
  6. MIT Introduction to Deep Learning