Fine-tuning and Adapting LLMs

What Is LoRA?


Full fine-tuning updates every parameter in the model. For a 7B parameter model, that means storing and computing gradients for 7 billion values — expensive in memory, time, and storage. Low-Rank Adaptation (LoRA) makes fine-tuning tractable by updating only a tiny fraction of additional parameters while keeping the original weights frozen.

The Core Idea

For each weight matrix W in the model (typically the attention projections), LoRA introduces two small trainable matrices A and B such that:

W' = W + BA

where A ∈ ℝ^(r×d) and B ∈ ℝ^(d×r), with rank r ≪ d. The original W is frozen. Only A and B are updated during training.

At initialization, B is set to zero so that BA = 0, meaning the adapter has no effect at the start of fine-tuning. As training progresses, the adapter learns the task-specific update direction.
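The zero initialization can be checked directly. The NumPy sketch below (an illustration, not the lesson's PyTorch code) builds W, A, and B with the initialization described above and confirms the adapted layer matches the base layer exactly at step zero:

```python
# Minimal sketch: with B = 0, the adapter W' = W + BA is a no-op.
import numpy as np

d, r = 8, 2
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen base weight
A = rng.standard_normal((r, d)) * 0.01  # small random init
B = np.zeros((d, r))                    # zero init, so BA = 0

x = rng.standard_normal(d)
base_out = W @ x
adapted_out = (W + B @ A) @ x           # W' = W + BA

print(np.allclose(base_out, adapted_out))  # True
```

Because only B needs to be zero, A can keep a nonzero random init and gradients still flow to both matrices from the first step.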

Why Low Rank Works

The hypothesis behind LoRA is that the weight updates needed for fine-tuning lie in a low-dimensional subspace of the full parameter space. Instead of updating the full d × d matrix, you approximate the update with two small matrices whose product is low-rank. In practice, r = 4 to r = 16 is often sufficient.
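The rank constraint is a property of the matrix product itself: a d × r matrix times an r × d matrix can never have rank above r. A quick NumPy check (illustrative, using the same d = 512 and r = 4 as the example below):

```python
# The update BA is a full-size d x d matrix, but its rank is capped at r.
import numpy as np

d, r = 512, 4
rng = np.random.default_rng(42)
A = rng.standard_normal((r, d))
B = rng.standard_normal((d, r))

update = B @ A                          # shape (512, 512)
print(np.linalg.matrix_rank(update))    # 4
```

So LoRA searches only within the rank-r slice of possible weight updates, which is exactly where the low-rank hypothesis says the useful updates live.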

What This Means in Practice

```python
# A linear layer with LoRA applied manually
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features),
            requires_grad=False
        )  # Frozen base weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        base = x @ self.weight.T
        lora = x @ self.lora_A.T @ self.lora_B.T
        return base + lora

layer = LoRALinear(in_features=512, out_features=512, rank=4)
x = torch.rand(2, 10, 512)
print(layer(x).shape)  # Expected: torch.Size([2, 10, 512])
```

Run this locally and count the trainable parameters: rank × in + out × rank for the adapter vs. in × out for the full matrix. With rank=4 and d=512, you train 4 × 512 + 512 × 4 = 4096 parameters instead of 512 × 512 = 262144.
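The arithmetic works out as follows (plain Python, using the d = 512, rank = 4 values from the example above):

```python
# Trainable parameter count: LoRA adapter vs. the full weight matrix.
d, r = 512, 4

lora_params = r * d + d * r   # lora_A (r x d) plus lora_B (d x r)
full_params = d * d           # the full d x d weight matrix

print(lora_params)                 # 4096
print(full_params)                 # 262144
print(lora_params / full_params)   # 0.015625, i.e. about 1.6%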



Section 1. Chapter 4
