Fine-Tune Llama 3: A Step-by-Step Guide Using Llama-Factory

Ready to fine-tune Llama 3 but overwhelmed by the complexity? This guide simplifies the process using Llama-Factory, a powerful tool designed for efficient and accessible LLM customization. Whether you're a seasoned AI expert or just starting, you'll learn how to tailor Llama 3 for your specific needs.

What is Llama-Factory and Why Use It for Fine-Tuning?

Llama-Factory is a user-friendly framework that streamlines the fine-tuning of over 100 large language models (LLMs), including Llama 3, Mistral, and Falcon. It simplifies the process, making it more accessible and cost-effective.

Here's why Llama-Factory is a game-changer:

Accessibility: Fine-tune LLMs without extensive coding knowledge.
Efficiency: Utilizes advanced algorithms like LoRA and GaLore to reduce GPU usage.
Flexibility: Supports multiple datasets, advanced features, and monitoring tools.
Integration: Seamlessly integrates with TensorBoard, VanDB, MLflow, Gradio, and CLI.

Prerequisites for Fine-Tuning Llama 3

Before diving in, ensure you have the following:

Basic understanding of Generative Pretrained Transformers (GPTs).
A sufficiently powerful NVIDIA GPU (H100 recommended for optimal performance).
A DigitalOcean account (optional, but recommended for cloud GPU access).

Understanding Model Fine-Tuning: Adapting LLMs for Specific Tasks

Model fine-tuning is the process of adjusting the parameters of a pre-trained model to enhance its performance on a specific task or dataset. This involves training the model with new data, modifying weights and biases, to minimize loss and maximize accuracy. Think of it as giving your LLM specialized training.

The benefits of fine-tuning:

Improved Accuracy: Tailor the model to your specific domain for better results.
Reduced Resources: Avoid training a model from scratch, saving time and computational power.
Safety and Control: Mitigate harmful or toxic outputs by fine-tuning with safety measures.

Setting Up Your Environment: Cloning the Repository and Installing Dependencies

Let's begin by setting up your environment. This involves cloning the Llama-Factory repository and installing the required libraries.

Clone the Llama-Factory repository:

!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
%ls

Install necessary packages:
```
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers==0.0.25
!pip install .[bitsandbytes]
!pip install 'urllib3<2'
```
This installs Unsloth for efficient fine-tuning, xformers for optimized performance and bitsandbytes for quantization which speeds up training by reducing memory usage.

Verify GPU availability:

import torch
try:
    assert torch.cuda.is_available() is True
except AssertionError:
    print("Your GPU is not setup!")

Fine-Tuning Llama 3 with Llama-Factory: A Step-by-Step Guide

With the environment set up, let's walk through the process of fine-tuning Llama 3 using Llama-Factory.

Prepare your dataset. You can use the sample dataset provided or create your own.

import json

%cd /notebooks/LLaMA-Factory
MODEL_NAME = "Llama-3"

with open("/notebooks/LLaMA-Factory/data/identity.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)

for sample in dataset:
    sample["output"] = sample["output"].replace("MODEL_NAME", MODEL_NAME).replace("AUTHOR", "LLaMA Factory")

with open("/notebooks/LLaMA-Factory/data/identity.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=2, ensure_ascii=False)

Launch the Gradio web app:
```
%cd /notebooks/LLaMA-Factory
!GRADIO_SHARE=1 llamafactory-cli webui
```
This generates a public link to access the Llama-Factory GUI. LLaMA Board is a user-friendly tool that helps you adjust and improve Language Model (LLM) performance without needing to know how to code.
Configure the training process using the GUI or command line:
- Select the Llama 3 model.
- Choose an adapter configuration (LoRa, QLoRa, freeze, or full).
- Specify training options (supervised fine-tuning, DPU, or PPU).
- Select your dataset.
- Adjust hyperparameters (epochs, batch size, learning rate, etc.).
- Start the training process. You can use the web GUI or CLI and follow the steps.

args = dict(
 stage="sft", # Specifies the stage of training. Here, it's set to "sft" for supervised fine-tuning
 do_train=True,
 model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit", # use bnb-4bit-quantized Llama-3-8B-Instruct model
 dataset="identity,alpaca_gpt4_en", # use the alpaca and identity datasets
 template="llama3", # use llama3 for prompt template
 finetuning_type="lora", # use the LoRA adapters which saves up memory
 lora_target="all", # attach LoRA adapters to all linear layers
 output_dir="llama3_lora", # path to save LoRA adapters
 per_device_train_batch_size=2, # specify the batch size
 gradient_accumulation_steps=4, # the gradient accumulation steps
 lr_scheduler_type="cosine", # use the learning rate as cosine learning rate scheduler
 logging_steps=10, # log every 10 steps
 warmup_ratio=0.1, # use warmup scheduler
 save_steps=1000, # save checkpoint every 1000 steps
 learning_rate=5e-5, # the learning rate
 num_train_epochs=3.0, # the epochs of training
 max_samples=500, # use 500 examples in each dataset
 max_grad_norm=1.0, # clip gradient norm to 1.0
 quantization_bit=4, # use 4-bit QLoRA
 loraplus_lr_ratio=16.0, # use LoRA+ with lambda=16.0
 use_unsloth=True, # use UnslothAI's LoRA optimization for 2x faster training
 fp16=True, # use float16 mixed precision training
)

json.dump(args, open("train_llama3.json", "w", encoding="utf-8"), indent=2)

Next, open a terminal and run the below command,

!llamafactory-cli train train_llama3.json

This will start the training process. Fine-tuning LLMs can be this easy.

Inferencing With Your Fine-Tuned Model

Inference refers to using a fine-tuned machine learning model to make predictions on new, unseen data. Once the model training is completed, we can use the model to infer from. Let us try doing that and check how the model works.

args = dict(
 model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit", # Specifies the name or path of the pre-trained model to be used for inference. In this case, it's set to "unsloth/llama-3-8b-Instruct-bnb-4bit".
 #adapter_name_or_path="llama3_lora", # load the saved LoRA adapters
 finetuning_type="lora", # Specifies the type of fine-tuning. Here, it's set to "lora" for LoRA adapters.
 template="llama3", # Specifies the prompt template to be used for inference. Here, it's set to "llama3"
 quantization_bit=4, # Specifies the number of bits for quantization. In this case, it's set to 4
 use_unsloth=True, # use UnslothAI's LoRA optimization for 2x faster generation
)

json.dump(args, open("infer_llama3.json", "w", encoding="utf-8"), indent=2)

Next, run the below code using your terminal.

!llamafactory-cli chat infer_llama3.json

Conclusion: Empowering LLMs for a Better Future

Llama-Factory democratizes the process of fine-tuning large language models, making it accessible to a wider audience. By providing a user-friendly interface and efficient training techniques, it empowers developers to create custom LLMs tailored for specific tasks and societal benefits.

Key takeaways:

Llama-Factory simplifies fine-tuning for over 100 LLMs.
It offers a user-friendly interface and advanced training techniques.
Fine-tuning LLMs allows for greater accuracy, safety, and control.
Experiment with Llama-Factory and contribute to the open-source community.

Remember to adhere to the model's license when using Llama-Factory to prevent misuse and promote responsible AI development.

Additional Resources

Ready to start exploring the possibilities of fine-tuning Llama 3 and other LLMs? With Llama-Factory, the power to customize AI is now at your fingertips.