Fine-tuning and Adapting LLMs

Implementing LoRA with the PEFT Library

The peft library wraps any Hugging Face model with LoRA adapters in a few lines of code. You specify which layers to adapt and at what rank; peft injects the trainable low-rank matrices and freezes everything else.

Applying LoRA to a Pretrained Model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "bigscience/bloom-560m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                              # Rank of the low-rank matrices
    lora_alpha=32,                    # Scaling factor for adapter outputs
    target_modules=["query_key_value"],  # Attention modules to adapt
    lora_dropout=0.05,                # Dropout on adapter layers
    bias="none",                      # No additional bias parameters
    task_type="CAUSAL_LM"
)

lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()

print_trainable_parameters() reports how many parameters are trainable versus frozen. For a 560M-parameter model with r=8, expect trainable parameters to be well under 1% of the total.

Run this locally to see the parameter breakdown and confirm that only the LoRA adapter weights are marked as trainable.
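The breakdown itself is simple arithmetic: trainable parameters divided by total parameters. The sketch below mimics that counting logic on a made-up parameter list (the names and sizes are hypothetical, chosen only to illustrate why the trainable fraction is tiny, since the frozen embedding and base weights dominate the count):

```python
def count_trainable(params):
    """params: iterable of (name, numel, requires_grad) tuples."""
    trainable = sum(n for _, n, grad in params if grad)
    total = sum(n for _, n, _ in params)
    return trainable, total, 100.0 * trainable / total

# Hypothetical parameter breakdown for a single adapted layer plus a
# frozen embedding table (sizes are illustrative, not read from a checkpoint):
params = [
    ("word_embeddings.weight",            256_000_000, False),
    ("h.0.query_key_value.weight",          3_145_728, False),
    ("h.0.query_key_value.lora_A.weight",       8_192, True),   # r x d_in
    ("h.0.query_key_value.lora_B.weight",      24_576, True),   # d_out x r
]

trainable, total, pct = count_trainable(params)
print(f"trainable: {trainable:,} || all params: {total:,} || trainable%: {pct:.4f}")
```

Only the two low-rank factors count as trainable, so the percentage collapses even though the base layer itself is large.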

Key Configuration Parameters

r sets the rank of the adapter matrices. Higher rank means more capacity but more memory. Start with r=8 and increase only if the model underfits.

lora_alpha scales the adapter output: peft multiplies the adapter contribution by lora_alpha / r, which acts roughly like a learning rate multiplier for the adapters. A common heuristic is to set lora_alpha = 2 × r.

target_modules controls which layers receive adapters. Targeting only attention projections is the most common approach; adding MLP layers increases capacity at higher cost.

lora_dropout applies dropout to adapter outputs during training, reducing overfitting on small datasets.
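The effect of r on parameter count is easy to estimate: a LoRA adapter on a d_in → d_out weight matrix adds r × (d_in + d_out) parameters. The back-of-the-envelope sketch below uses rough figures for bigscience/bloom-560m (hidden size 1024, fused QKV projection 1024 → 3072, 24 transformer blocks); these dimensions are assumptions for illustration, not values read from the checkpoint:

```python
def lora_param_count(d_in, d_out, r):
    # A LoRA adapter factors the weight update as B @ A, with
    # A of shape (r, d_in) and B of shape (d_out, r).
    return r * (d_in + d_out)

# Assumed bloom-560m dimensions: fused QKV projection 1024 -> 3072,
# repeated across 24 transformer blocks.
per_layer = lora_param_count(1024, 3072, r=8)
total = per_layer * 24
print(per_layer, total)                              # 32768 786432
print(f"{100 * total / 560_000_000:.3f}% of 560M")   # 0.140% of 560M
```

Under these assumptions, adapting only the QKV projections at r=8 trains well under 1% of the model, consistent with what print_trainable_parameters() reports.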

Training with the Wrapped Model

The lora_model is a standard PyTorch nn.Module, so you can train it with the same loop as any other model:

from torch.optim import AdamW

optimizer = AdamW(lora_model.parameters(), lr=2e-4)

lora_model.train()
inputs = tokenizer("Fine-tuning with LoRA is efficient.", return_tensors="pt")
outputs = lora_model(**inputs, labels=inputs["input_ids"])

outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

print(f"Loss: {outputs.loss.item():.4f}")
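To see why this loop only updates the adapters, it helps to strip LoRA down to its essentials. The toy module below is a from-scratch sketch, not the peft internals: a frozen linear layer plus trainable low-rank factors A and B, scaled by alpha / r. After one backward pass, only the adapter factors carry gradients:

```python
import torch
from torch import nn

class TinyLoRALinear(nn.Module):
    """Illustrative LoRA linear layer: frozen base weight + trainable B @ A."""
    def __init__(self, d_in, d_out, r, alpha):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))  # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

layer = TinyLoRALinear(16, 16, r=4, alpha=8)
loss = layer(torch.randn(2, 16)).sum()
loss.backward()

# Only the adapter factors receive gradients; the base weights stay frozen.
for name, p in layer.named_parameters():
    print(name, p.grad is not None)
```

The same check works on the real lora_model: iterating named_parameters() after backward() should show gradients only on parameters whose names contain "lora".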