# Task

Fine-tune a small open-source LLM for a customer support scenario using LoRA or QLoRA. Complete all steps locally.

1. **Choose a base model**: pick a small pretrained model compatible with PEFT, such as `bigscience/bloom-560m` or `facebook/opt-125m`;
2. **Prepare an instruction dataset**: write at least 10 prompt-response pairs (ideally 20 or more) covering realistic customer support interactions. Format them as JSONL, one JSON object per line, with `instruction` and `response` fields;
3. **Apply LoRA or QLoRA**: use the `peft` library to add adapters. Choose QLoRA if your GPU has less than 8 GB of VRAM;
4. **Fine-tune**: implement a training loop using the SFT techniques from this course. Use gradient accumulation and a cosine schedule with warmup;
5. **Generate responses**: after training, run both the base model and your fine-tuned model on several prompts, including examples from your dataset and new unseen queries;
6. **Evaluate**: compare outputs side by side. Score them on helpfulness, accuracy, and tone. Note where the fine-tuned model improves and where it still falls short.
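Two pieces of the steps above need no ML library and can be sketched in plain Python: the JSONL dataset from step 2 and the warmup-plus-cosine learning-rate curve from step 4. The following is a minimal sketch, not a reference solution. The example pairs, the file name `support.jsonl`, and the function names `write_jsonl`, `load_jsonl`, and `cosine_lr_with_warmup` are all hypothetical, chosen here for illustration.

```python
import json
import math
import tempfile
from pathlib import Path

# Hypothetical example pairs; replace them with your own support interactions.
EXAMPLES = [
    {"instruction": "How do I reset my password?",
     "response": "Open Settings, choose Security, then select Reset password."},
    {"instruction": "Can I get a refund for a duplicate charge?",
     "response": "Yes. Please share the order number and we will refund it."},
]


def write_jsonl(records, path):
    """Write one JSON object per line, which is the JSONL convention."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read the file back and check both required fields on every record."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            for field in ("instruction", "response"):
                value = record.get(field)
                if not isinstance(value, str) or not value.strip():
                    raise ValueError(f"line {line_no}: missing or empty '{field}'")
            records.append(record)
    return records


def cosine_lr_with_warmup(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup from 0 to peak_lr, then cosine decay back to 0.

    `step` counts optimizer updates, not micro-batches: with gradient
    accumulation over k micro-batches, it advances once every k batches.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Round-trip the dataset through a temporary file to validate the format.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "support.jsonl"
    write_jsonl(EXAMPLES, path)
    loaded = load_jsonl(path)

print(len(loaded), "records loaded")
```

The hand-written schedule is only meant to make the shape of the curve explicit. In an actual training loop, a library scheduler such as `get_cosine_schedule_with_warmup` from `transformers` would typically take its place, stepped once per optimizer update.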

After completing the steps, reflect on the following:

- How did the model's responses change after fine-tuning?
- What effect did dataset size have on output quality?
- What would you do differently with more data or compute: more epochs, a higher rank, or a larger base model?


Challenge: Fine-tune a Customer Support LLM
