Fine-tuning and Adapting LLMs

What Is LoRA?


Full fine-tuning updates every parameter in the model. For a 7B parameter model, that means storing and computing gradients for 7 billion values — expensive in memory, time, and storage. Low-Rank Adaptation (LoRA) makes fine-tuning tractable by updating only a tiny fraction of additional parameters while keeping the original weights frozen.

The Core Idea

For each weight matrix W in the model (typically the attention projections), LoRA introduces two small trainable matrices A and B such that:

W' = W + BA

where A \in \mathbb{R}^{r \times d} and B \in \mathbb{R}^{d \times r}, with rank r \ll d. The original W is frozen. Only A and B are updated during training.

At initialization, B is set to zero so that BA = 0: the adapter has no effect at the start of fine-tuning. As training progresses, the adapter learns the task-specific update direction.
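A quick sanity check of the zero-initialization property, using small illustrative sizes (d = 16, r = 4 are arbitrary choices here):

```python
import torch

d, r = 16, 4                    # illustrative sizes, not model-specific
W = torch.randn(d, d)           # frozen base weight
A = torch.randn(r, d) * 0.01    # small random init
B = torch.zeros(d, r)           # zero init

W_prime = W + B @ A             # adapted weight
print(torch.equal(W_prime, W))  # True: the adapter contributes nothing at init
```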

Why Low Rank Works

The hypothesis behind LoRA is that the weight updates needed for fine-tuning lie in a low-dimensional subspace of the full parameter space. Instead of updating the full d \times d matrix, you approximate the update with two small matrices whose product is low-rank. In practice, r = 4 to r = 16 is often sufficient.
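To see the savings concretely, here is a back-of-the-envelope comparison. The hidden size of 4096 is an assumption, chosen as a typical value for 7B-class models:

```python
d = 4096                  # assumed hidden size, illustrative of a 7B-class model
full = d * d              # parameters in one full d x d weight matrix
for r in (4, 8, 16):
    lora = r * d + d * r  # A is r x d, B is d x r
    print(f"rank {r}: {lora} trainable params ({lora / full:.2%} of full)")
```

Even at rank 16, the adapter is well under 1% of the size of the matrix it modifies.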

What This Means in Practice

```python
# A linear layer with LoRA applied manually
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        # Frozen base weight: gradients are never computed for it
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False
        )
        # Trainable adapter: A starts small and random, B starts at zero
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        base = x @ self.weight.T
        lora = x @ self.lora_A.T @ self.lora_B.T
        return base + lora

layer = LoRALinear(in_features=512, out_features=512, rank=4)
x = torch.rand(2, 10, 512)
print(layer(x).shape)  # Expected: torch.Size([2, 10, 512])
```

Run this locally and count the trainable parameters: rank × in_features + out_features × rank for the adapter, versus in_features × out_features for the full matrix. With rank=4 and d=512, you train 4,096 parameters instead of 262,144.
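One way to do that count, sketched with a plain nn.Linear as the frozen base plus two standalone adapter parameters, matching the shapes used above:

```python
import torch
import torch.nn as nn

# Frozen base weight (512 x 512) plus the two adapter matrices at rank 4
base = nn.Linear(512, 512, bias=False)
base.weight.requires_grad = False
lora_A = nn.Parameter(torch.randn(4, 512) * 0.01)
lora_B = nn.Parameter(torch.zeros(512, 4))

trainable = lora_A.numel() + lora_B.numel()
frozen = base.weight.numel()
print(trainable, frozen)  # 4096 262144
```

The same pattern generalizes: in real training loops you would pass only the parameters with requires_grad=True to the optimizer.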
