BERT
M.Sc course, University of Debrecen, Department of Data Science and Visualization, 2025
This Colab notebook demonstrates how to fine-tune a DistilBERT model for sentiment classification on the Stanford IMDB dataset. Here’s a breakdown of the steps involved (each step is sketched in code after the list):
- Installation: Installs the required libraries: transformers, datasets, evaluate, accelerate, peft, and bitsandbytes.
- Dataset Loading: Loads the IMDB dataset using the datasets library.
- Data Preparation: Tokenizes the dataset with AutoTokenizer from the transformers library and uses DataCollatorWithPadding so each batch is padded dynamically to its longest sequence.
- Evaluation Metric: Loads the accuracy metric using evaluate.
- Model Loading: Loads a pre-trained DistilBERT model with AutoModelForSequenceClassification, which attaches a freshly initialized two-label classification head.
- Training: Fine-tunes the model using the Trainer from the transformers library.
- LoRA (Low-Rank Adaptation): Wraps the model with low-rank adapter matrices via the peft library, so that only a small fraction of the parameters needs to be trained.
- Training with LoRA: Fine-tunes the model again, this time with LoRA applied, so that only the adapter weights (and the classification head) are updated.
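
A minimal sketch of the installation and dataset-loading steps. The pip line mirrors the installation bullet above; exact package versions are the notebook's own:

```python
# In Colab: !pip install transformers datasets evaluate accelerate peft bitsandbytes
from datasets import load_dataset

# The IMDB dataset ships with 25k labeled train and 25k labeled test reviews
# (label 0 = negative, 1 = positive), plus an unlabeled "unsupervised" split.
imdb = load_dataset("imdb")
print(imdb["train"][0]["label"], imdb["train"][0]["text"][:80])
```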
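Tokenization and collation might look like the following; the distilbert-base-uncased checkpoint name is an assumption, since the summary only says DistilBERT:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

# Checkpoint name is an assumption; any DistilBERT checkpoint works the same way.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate to the model's 512-token limit; padding is deferred to the
    # collator, which pads each batch only to its longest member.
    return tokenizer(batch["text"], truncation=True)

tokenized = imdb.map(tokenize, batched=True)
collator = DataCollatorWithPadding(tokenizer=tokenizer)
```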
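The accuracy metric plugs into the Trainer through a compute_metrics callback; a typical version (not necessarily the notebook's exact code) looks like this:

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels); accuracy needs hard predictions.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```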
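Model loading, again assuming the distilbert-base-uncased checkpoint; the label maps are illustrative conveniences:

```python
from transformers import AutoModelForSequenceClassification

# num_labels=2 attaches a fresh binary classification head on top of the
# pre-trained encoder; its weights are randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
    label2id={"negative": 0, "positive": 1},
)
```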
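A sketch of the Trainer setup; all hyperparameters here are placeholders rather than the notebook's actual values:

```python
from transformers import Trainer, TrainingArguments

# Illustrative hyperparameters, not the notebook's.
args = TrainingArguments(
    output_dir="distilbert-imdb",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    eval_strategy="epoch",  # named evaluation_strategy in older transformers
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
    compute_metrics=compute_metrics,
)
trainer.train()
```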
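Finally, applying LoRA with peft and retraining. The values of r, lora_alpha, and lora_dropout are common defaults, not confirmed values from the notebook; target_modules names DistilBERT's attention projections:

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor applied to the update
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],  # DistilBERT query/value projections
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # prints the trainable fraction

# Reuse the same Trainer setup; gradients now flow only to the adapters
# and the classification head.
trainer = Trainer(
    model=peft_model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
    compute_metrics=compute_metrics,
)
trainer.train()
```

Because only the low-rank adapter matrices and the classification head receive gradients, LoRA cuts optimizer state and memory use substantially, which is what makes it practical on Colab-class GPUs.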
In essence, the notebook showcases a standard workflow for sentiment classification using the Hugging Face Transformers library and incorporates LoRA for parameter-efficient fine-tuning.