Fine-tuning BERT for Sentiment Analysis
Fine-tuning BERT for sentiment analysis involves adapting a pre-trained BERT model to your specific dataset by updating its weights during training. The process starts by adding a classification head—a simple feedforward layer—on top of BERT's pooled output. This head is responsible for mapping the high-dimensional representations from BERT to the desired number of sentiment classes, such as positive, negative, or neutral. After adding the classification head, you adjust key hyperparameters like the learning rate, batch size, number of epochs, and optimizer settings to suit the size and nature of your dataset. Typically, a small learning rate is chosen to avoid overwriting the valuable knowledge BERT has already acquired during pre-training.
import torch
import torch.nn as nn
from transformers import BertModel
This imports PyTorch and the base BERT model from Hugging Face Transformers.
torch.nn is used to define new layers on top of BERT for fine-tuning.
class BertForSentimentAnalysis(nn.Module):
    def __init__(self, num_labels=2):
        super(BertForSentimentAnalysis, self).__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.3)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)
This defines a custom model that reuses BERT as the feature extractor.
- num_labels=2 sets up binary sentiment classification (positive/negative);
- BertModel.from_pretrained loads BERT with pre-trained weights;
- Dropout(0.3) helps reduce overfitting by randomly disabling 30% of neurons;
- Linear(hidden_size, num_labels) maps BERT's pooled embedding to class logits.
    def forward(self, input_ids, attention_mask):
        outputs = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            return_dict=True
        )
        pooled_output = outputs.pooler_output
        dropped = self.dropout(pooled_output)
        logits = self.classifier(dropped)
        return logits
The forward method defines how data flows through the model.
- input_ids and attention_mask are outputs from a tokenizer (see the tokenizer sketch after this list);
- pooled_output is derived from the final hidden state of the [CLS] token (passed through a linear layer and tanh activation), summarizing the sentence meaning;
- after dropout regularization, the linear layer produces the final logits (raw class scores).
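Because the tokenizer step is not shown in the lesson's code, here is a minimal sketch of how input_ids and attention_mask are typically produced with the matching tokenizer; the example sentences and the max_length value are illustrative assumptions, not part of the original code.

from transformers import BertTokenizer

# Load the tokenizer that matches the pre-trained checkpoint used above
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

texts = ["The movie was fantastic!", "I did not enjoy this at all."]  # illustrative sentences
encoded = tokenizer(
    texts,
    padding=True,          # pad shorter sentences to the longest in the batch
    truncation=True,       # cut off inputs longer than max_length
    max_length=128,        # assumed maximum sequence length
    return_tensors="pt"    # return PyTorch tensors
)

print(encoded["input_ids"].shape)       # (2, sequence_length): token IDs per sentence
print(encoded["attention_mask"].shape)  # same shape; 1 for real tokens, 0 for padding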
# Example of preparing the model for fine-tuning
model = BertForSentimentAnalysis(num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
AdamW is a variant of the Adam optimizer with weight decay, recommended for transformer fine-tuning.
A learning rate of 2e-5 is standard for adapting BERT to small downstream tasks such as sentiment analysis.
When fine-tuning BERT, using a small learning rate (such as 2e-5 or 3e-5) is critical. Large learning rates can quickly destroy the pre-trained weights, causing the model to forget what it has learned and resulting in poor performance.
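To make this concrete, below is a minimal sketch of one fine-tuning epoch that reuses the model and optimizer defined above; train_loader is an assumed PyTorch DataLoader whose batches contain input_ids, attention_mask, and labels tensors, and cross-entropy loss is a typical choice rather than one prescribed by the lesson.

# Minimal training-loop sketch (one epoch); `train_loader` is assumed to
# yield dicts with "input_ids", "attention_mask", and "labels" tensors.
loss_fn = nn.CrossEntropyLoss()

model.train()
for batch in train_loader:
    optimizer.zero_grad()                                        # reset gradients from the previous step
    logits = model(batch["input_ids"], batch["attention_mask"])  # forward pass
    loss = loss_fn(logits, batch["labels"])                      # compare logits with true labels
    loss.backward()                                              # backpropagate
    optimizer.step()                                             # update weights with the small learning rate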
To avoid overfitting when fine-tuning transformer models like BERT, you should use techniques such as dropout, early stopping, and data augmentation. Monitoring validation loss and using regularization strategies help ensure that your model generalizes well to unseen data. It is also helpful to limit the number of training epochs and to use smaller batch sizes when working with limited data.
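One common way to apply the early-stopping and validation-monitoring advice is sketched below; the evaluation helper, val_loader, the patience of 2 epochs, and the cap of 10 epochs are illustrative assumptions rather than part of the lesson.

def evaluate_loss(model, loader, loss_fn):
    # Average loss over a validation DataLoader (assumed to exist as `val_loader`)
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for batch in loader:
            logits = model(batch["input_ids"], batch["attention_mask"])
            total += loss_fn(logits, batch["labels"]).item()
            count += 1
    return total / max(count, 1)

best_val_loss = float("inf")
patience, bad_epochs = 2, 0          # assumed: stop after 2 epochs without improvement

for epoch in range(10):              # assumed cap; fine-tuning rarely needs many epochs
    # ... run one training epoch as in the loop sketched above ...
    val_loss = evaluate_loss(model, val_loader, loss_fn)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        bad_epochs = 0
        torch.save(model.state_dict(), "best_model.pt")  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                    # early stopping: validation loss stopped improving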