Fine-tuning and Adapting LLMs

What Is LoRA?


Full fine-tuning updates every parameter in the model. For a 7B parameter model, that means storing and computing gradients for 7 billion values — expensive in memory, time, and storage. Low-Rank Adaptation (LoRA) makes fine-tuning tractable by updating only a tiny fraction of additional parameters while keeping the original weights frozen.

The Core Idea

For each weight matrix W in the model (typically the attention projections), LoRA introduces two small trainable matrices A and B such that:

W' = W + BA

where A \in \mathbb{R}^{r \times d} and B \in \mathbb{R}^{d \times r}, with rank r \ll d. The original W is frozen. Only A and B are updated during training.

At initialization, B is set to zero so that BA = 0 – the adapter has no effect at the start of fine-tuning. As training progresses, the adapter learns the task-specific update direction.
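A quick sketch of this initialization property (the dimensions d = 8 and r = 2 here are arbitrary, chosen just for illustration): with B zero-initialized, the product BA is the zero matrix, so the adapted weight equals the frozen base weight exactly.

```python
import torch

d, r = 8, 2
W = torch.randn(d, d)         # frozen base weight
A = torch.randn(r, d) * 0.01  # small random init
B = torch.zeros(d, r)         # zero init

W_prime = W + B @ A
print(torch.equal(W_prime, W))  # True: the adapter is a no-op at init
```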

Why Low Rank Works

The hypothesis behind LoRA is that the weight updates needed for fine-tuning lie in a low-dimensional subspace of the full parameter space. Instead of updating the full d \times d matrix, you approximate the update with two small matrices whose product is low-rank. In practice, r = 4 to r = 16 is often sufficient.
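To see why the factorization enforces this constraint, note that a product of an (d × r) and an (r × d) matrix can never have rank greater than r, however large d is. A minimal sketch (the sizes d = 64, r = 4 are illustrative):

```python
import torch

d, r = 64, 4
B = torch.randn(d, r)
A = torch.randn(r, d)
delta = B @ A  # the d x d update LoRA can express

# The rank is bounded by r; for random B and A it equals r.
print(torch.linalg.matrix_rank(delta).item())
```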

What This Means in Practice

# A linear layer with LoRA applied manually
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features),
            requires_grad=False,
        )  # Frozen base weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        base = x @ self.weight.T
        lora = x @ self.lora_A.T @ self.lora_B.T
        return base + lora

layer = LoRALinear(in_features=512, out_features=512, rank=4)
x = torch.rand(2, 10, 512)
print(layer(x).shape)  # Expected: torch.Size([2, 10, 512])

Run this locally and count the trainable parameters — rank × in + out × rank vs. in × out for the full matrix. With rank=4 and d=512, you train 4096 parameters instead of 262144.
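The count from the paragraph above works out as follows (pure arithmetic, using the d = 512, rank = 4 values from the example):

```python
d, r = 512, 4
full = d * d          # full fine-tuning of one d x d matrix
lora = r * d + d * r  # A (r x d) plus B (d x r)
print(full, lora, full // lora)  # 262144 4096 64
```

A 64x reduction in trainable parameters for a single layer, and the savings compound across every adapted matrix in the model.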



Section 1. Chapter 4
