Transfer Learning Essentials

Fine-tuning BERT for Sentiment Analysis

Fine-tuning BERT for sentiment analysis involves adapting a pre-trained BERT model to your specific dataset by updating its weights during training. The process starts by adding a classification head—a simple feedforward layer—on top of BERT's pooled output. This head is responsible for mapping the high-dimensional representations from BERT to the desired number of sentiment classes, such as positive, negative, or neutral. After adding the classification head, you adjust key hyperparameters like the learning rate, batch size, number of epochs, and optimizer settings to suit the size and nature of your dataset. Typically, a small learning rate is chosen to avoid overwriting the valuable knowledge BERT has already acquired during pre-training.

import torch
import torch.nn as nn
from transformers import BertModel

This imports PyTorch and the base BERT model from Hugging Face Transformers. torch.nn is used to define new layers on top of BERT for fine-tuning.

class BertForSentimentAnalysis(nn.Module):
    def __init__(self, num_labels=2):
        super(BertForSentimentAnalysis, self).__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.3)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

This defines a custom model that reuses BERT as the feature extractor.

  • num_labels=2 - binary sentiment classification (positive/negative);
  • BertModel.from_pretrained loads BERT with pre-trained weights;
  • Dropout(0.3) helps reduce overfitting by randomly zeroing 30% of the activations during training;
  • Linear(hidden_size, num_labels) maps BERT's pooled embedding to class logits.

The class definition continues with the forward pass:

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            return_dict=True
        )
        pooled_output = outputs.pooler_output
        dropped = self.dropout(pooled_output)
        logits = self.classifier(dropped)
        return logits

The forward method defines how data flows through the model.

  • input_ids and attention_mask are tensors produced by a tokenizer (see the tokenization sketch after this list);
  • pooled_output is the final hidden state of the [CLS] token passed through BERT's pooler layer, acting as a summary of the whole sequence;
  • after dropout regularization, the linear layer produces the final logits (raw class scores).
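
As a reference, the sketch below shows how such tensors can be produced; it assumes the bert-base-uncased tokenizer and a couple of made-up example sentences.

from transformers import BertTokenizer

# Load the tokenizer that matches the pre-trained checkpoint
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical example sentences, for illustration only
texts = ["I loved this movie!", "The plot was dull and predictable."]

# Pad and truncate so every sequence in the batch has the same length
encoding = tokenizer(
    texts,
    padding=True,
    truncation=True,
    max_length=128,
    return_tensors="pt"
)

input_ids = encoding["input_ids"]            # token id matrix, shape (batch, seq_len)
attention_mask = encoding["attention_mask"]  # 1 for real tokens, 0 for padding
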
# Example of preparing the model for fine-tuning
model = BertForSentimentAnalysis(num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

AdamW is a variant of the Adam optimizer with weight decay, recommended for transformer fine-tuning. A learning rate of 2e-5 is standard for adapting BERT to small downstream tasks such as sentiment analysis.
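
To make the setup concrete, here is a minimal training-loop sketch. It assumes a PyTorch DataLoader called train_loader that yields batches containing input_ids, attention_mask, and labels tensors, and it applies cross-entropy loss to the logits returned by the model.

loss_fn = nn.CrossEntropyLoss()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

model.train()
for epoch in range(3):  # a few epochs are usually enough when fine-tuning
    for batch in train_loader:  # assumed DataLoader of tokenized batches
        optimizer.zero_grad()
        logits = model(
            batch["input_ids"].to(device),
            batch["attention_mask"].to(device)
        )
        loss = loss_fn(logits, batch["labels"].to(device))
        loss.backward()
        optimizer.step()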

Note

When fine-tuning BERT, using a small learning rate (such as 2e-5 or 3e-5) is critical. Large learning rates can quickly destroy the pre-trained weights, causing the model to forget what it has learned and resulting in poor performance.
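
A common companion to a small learning rate is a warmup schedule that ramps the learning rate up over the first steps and then decays it linearly. The sketch below uses the get_linear_schedule_with_warmup helper from transformers; the epoch count and warmup fraction are placeholder values tied to the training loop above.

from transformers import get_linear_schedule_with_warmup

num_training_steps = len(train_loader) * 3        # assumed: 3 epochs over train_loader
num_warmup_steps = int(0.1 * num_training_steps)  # warm up during the first 10% of steps

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps
)

# Call scheduler.step() right after optimizer.step() inside the training loop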

To avoid overfitting when fine-tuning transformer models like BERT, you should use techniques such as dropout, early stopping, and data augmentation. Monitoring validation loss and using regularization strategies help ensure that your model generalizes well to unseen data. It is also helpful to limit the number of training epochs and to use smaller batch sizes when working with limited data.
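
Early stopping, for instance, needs only a few lines of bookkeeping. The sketch below assumes a hypothetical evaluate(model, val_loader) helper that returns the average validation loss, and stops training once that loss has not improved for patience consecutive epochs.

best_val_loss = float("inf")
patience, patience_counter = 2, 0

for epoch in range(10):
    # ... run one training epoch as shown earlier ...
    val_loss = evaluate(model, val_loader)  # assumed helper returning validation loss

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0
        torch.save(model.state_dict(), "best_model.pt")  # keep the best checkpoint
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Stopping early after epoch {epoch + 1}")
            break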
