BERT
M.Sc course, University of Debrecen, Department of Data Science and Visualization, 2025
This Colab notebook demonstrates how to fine-tune a DistilBERT model for sentiment classification on the Stanford IMDB dataset. Here’s a breakdown of the steps involved (each step is sketched in code after the list):
- Installation: Installs the required libraries: transformers, datasets, evaluate, accelerate, peft, and bitsandbytes.
- Dataset Loading: Loads the IMDB dataset using the datasets library.
- Data Preparation: Tokenizes the dataset with AutoTokenizer from the transformers library and uses DataCollatorWithPadding so each batch is padded dynamically to its longest sequence.
- Evaluation Metric: Loads the accuracy metric using evaluate.
- Model Loading: Loads a pre-trained DistilBERT model with AutoModelForSequenceClassification, which attaches a freshly initialized two-label classification head.
- Training: Fine-tunes the model using the Trainer from the transformers library.
- LoRA (Low-Rank Adaptation): Wraps the model with low-rank adapter matrices via the peft library, so that only a small fraction of the parameters needs to be trained.
- Training with LoRA: Fine-tunes the model again, this time with LoRA applied, so that only the adapter weights (and the classification head) are updated.
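
A minimal sketch of the installation and dataset-loading steps. The pip line mirrors the installation bullet above; exact package versions are the notebook's own:

```python
# In Colab: !pip install transformers datasets evaluate accelerate peft bitsandbytes
from datasets import load_dataset

# The IMDB dataset ships with 25k labeled train and 25k labeled test reviews
# (label 0 = negative, 1 = positive), plus an unlabeled "unsupervised" split.
imdb = load_dataset("imdb")
print(imdb["train"][0]["label"], imdb["train"][0]["text"][:80])
```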
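Tokenization and collation might look like the following; the distilbert-base-uncased checkpoint name is an assumption, since the summary only says DistilBERT:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

# Checkpoint name is an assumption; any DistilBERT checkpoint works the same way.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate to the model's 512-token limit; padding is deferred to the
    # collator, which pads each batch only to its longest member.
    return tokenizer(batch["text"], truncation=True)

tokenized = imdb.map(tokenize, batched=True)
collator = DataCollatorWithPadding(tokenizer=tokenizer)
```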
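The accuracy metric plugs into the Trainer through a compute_metrics callback; a typical version (not necessarily the notebook's exact code) looks like this:

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels); accuracy needs hard predictions.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```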
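Model loading, again assuming the distilbert-base-uncased checkpoint; the label maps are illustrative conveniences:

```python
from transformers import AutoModelForSequenceClassification

# num_labels=2 attaches a fresh binary classification head on top of the
# pre-trained encoder; its weights are randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
    label2id={"negative": 0, "positive": 1},
)
```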
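A sketch of the Trainer setup; all hyperparameters here are placeholders rather than the notebook's actual values:

```python
from transformers import Trainer, TrainingArguments

# Illustrative hyperparameters, not the notebook's.
args = TrainingArguments(
    output_dir="distilbert-imdb",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    eval_strategy="epoch",  # named evaluation_strategy in older transformers
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
    compute_metrics=compute_metrics,
)
trainer.train()
```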
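Finally, applying LoRA with peft and retraining. The values of r, lora_alpha, and lora_dropout are common defaults, not confirmed values from the notebook; target_modules names DistilBERT's attention projections:

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor applied to the update
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],  # DistilBERT query/value projections
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # prints the trainable fraction

# Reuse the same Trainer setup; gradients now flow only to the adapters
# and the classification head.
trainer = Trainer(
    model=peft_model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
    compute_metrics=compute_metrics,
)
trainer.train()
```

Because only the low-rank adapter matrices and the classification head receive gradients, LoRA cuts optimizer state and memory use substantially, which is what makes it practical on Colab-class GPUs.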
In essence, the notebook showcases a standard workflow for sentiment classification using the Hugging Face Transformers library and incorporates LoRA for parameter-efficient fine-tuning.