Fine-tuning and Adapting LLMs

What Is LoRA?


Full fine-tuning updates every parameter in the model. For a 7B parameter model, that means storing and computing gradients for 7 billion values — expensive in memory, time, and storage. Low-Rank Adaptation (LoRA) makes fine-tuning tractable by updating only a tiny fraction of additional parameters while keeping the original weights frozen.

The Core Idea

For each weight matrix $W$ in the model (typically the attention projections), LoRA introduces two small trainable matrices $A$ and $B$ such that:

$W' = W + BA$

where $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{d \times r}$, with rank $r \ll d$. The original $W$ is frozen; only $A$ and $B$ are updated during training.

At initialization, $B$ is set to zero so that $BA = 0$: the adapter has no effect at the start of fine-tuning. As training progresses, the adapter learns the task-specific update direction.
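As a quick sanity check, a minimal sketch (using assumed toy dimensions d = 512 and r = 4) confirms that a zero-initialized B leaves the adapted layer identical to the frozen base at the start of training:

```python
# Sketch with assumed dimensions: at init, B = 0, so the adapter adds nothing.
import torch

d, r = 512, 4
W = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d) * 0.01   # small random init
B = torch.zeros(d, r)          # zero init

x = torch.randn(3, d)
base = x @ W.T                 # output of the frozen layer
adapted = x @ (W + B @ A).T    # output with the LoRA update applied
print(torch.allclose(base, adapted))  # True, because BA = 0 at initialization
```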

Why Low Rank Works

The hypothesis behind LoRA is that the weight updates needed for fine-tuning lie in a low-dimensional subspace of the full parameter space. Instead of updating the full $d \times d$ matrix, you approximate the update with two small matrices whose product is low-rank. In practice, $r = 4$ to $r = 16$ is often sufficient.
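To make the low-rank structure concrete, a short sketch (with assumed toy dimensions) shows that the update $BA$ can never exceed rank $r$, regardless of what $A$ and $B$ learn during training:

```python
# Sketch with assumed toy dimensions: the product BA has rank at most r.
import torch

d, r = 64, 4
A = torch.randn(r, d)
B = torch.randn(d, r)
delta = B @ A                   # d x d update matrix, but built from rank-r factors

print(delta.shape)                             # torch.Size([64, 64])
print(torch.linalg.matrix_rank(delta).item())  # 4 (almost surely, for random A and B)
```

Even though delta occupies a full d x d grid, it only ever spans an r-dimensional subspace, which is exactly the constraint LoRA exploits.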

What This Means in Practice

```python
# A linear layer with LoRA applied manually
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features),
            requires_grad=False
        )  # Frozen base weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        base = x @ self.weight.T
        lora = x @ self.lora_A.T @ self.lora_B.T
        return base + lora

layer = LoRALinear(in_features=512, out_features=512, rank=4)
x = torch.rand(2, 10, 512)
print(layer(x).shape)  # Expected: torch.Size([2, 10, 512])
```

Run this locally and count the trainable parameters — rank × in + out × rank vs. in × out for the full matrix. With rank=4 and d=512, you train 4096 parameters instead of 262144.
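The count can be verified directly with a short sketch, assuming the same shapes as the layer above:

```python
# Count trainable vs. frozen parameters for the assumed shapes
# (in_features = out_features = 512, rank = 4).
import torch
import torch.nn as nn

in_features, out_features, rank = 512, 512, 4

weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
lora_B = nn.Parameter(torch.zeros(out_features, rank))

trainable = lora_A.numel() + lora_B.numel()   # rank*in + out*rank
full = weight.numel()                         # in*out

print(trainable)  # 4096
print(full)       # 262144
print(f"{100 * trainable / full:.2f}% of the full matrix")  # 1.56% of the full matrix
```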
