Fine-Tune Llama 3: A Step-by-Step Guide Using Llama-Factory
Ready to fine-tune Llama 3 but overwhelmed by the complexity? This guide simplifies the process using Llama-Factory, a powerful tool designed for efficient and accessible LLM customization. Whether you're a seasoned AI expert or just starting, you'll learn how to tailor Llama 3 for your specific needs.
What is Llama-Factory and Why Use It for Fine-Tuning?
Llama-Factory is a user-friendly framework that streamlines the fine-tuning of over 100 large language models (LLMs), including Llama 3, Mistral, and Falcon. It simplifies the process, making it more accessible and cost-effective.
Here's why Llama-Factory is a game-changer:
- Accessibility: Fine-tune LLMs without extensive coding knowledge.
- Efficiency: Utilizes advanced algorithms like LoRA and GaLore to reduce GPU usage.
- Flexibility: Supports multiple datasets, advanced features, and monitoring tools.
- Integration: Seamlessly integrates with TensorBoard, VanDB, MLflow, Gradio, and CLI.
Prerequisites for Fine-Tuning Llama 3
Before diving in, ensure you have the following:
- Basic understanding of Generative Pretrained Transformers (GPTs).
- A sufficiently powerful NVIDIA GPU (H100 recommended for optimal performance).
- A DigitalOcean account (optional, but recommended for cloud GPU access).
Understanding Model Fine-Tuning: Adapting LLMs for Specific Tasks
Model fine-tuning is the process of adjusting the parameters of a pre-trained model to enhance its performance on a specific task or dataset. This involves training the model with new data, modifying weights and biases, to minimize loss and maximize accuracy. Think of it as giving your LLM specialized training.
The benefits of fine-tuning:
- Improved Accuracy: Tailor the model to your specific domain for better results.
- Reduced Resources: Avoid training a model from scratch, saving time and computational power.
- Safety and Control: Mitigate harmful or toxic outputs by fine-tuning with safety measures.
Setting Up Your Environment: Cloning the Repository and Installing Dependencies
Let's begin by setting up your environment. This involves cloning the Llama-Factory repository and installing the required libraries.
-
Clone the Llama-Factory repository:
-
Install necessary packages:
This installs Unsloth for efficient fine-tuning, xformers for optimized performance and bitsandbytes for quantization which speeds up training by reducing memory usage.
-
Verify GPU availability:
Fine-Tuning Llama 3 with Llama-Factory: A Step-by-Step Guide
With the environment set up, let's walk through the process of fine-tuning Llama 3 using Llama-Factory.
-
Prepare your dataset. You can use the sample dataset provided or create your own.
-
Launch the Gradio web app:
This generates a public link to access the Llama-Factory GUI. LLaMA Board is a user-friendly tool that helps you adjust and improve Language Model (LLM) performance without needing to know how to code.
-
Configure the training process using the GUI or command line:
- Select the Llama 3 model.
- Choose an adapter configuration (LoRa, QLoRa, freeze, or full).
- Specify training options (supervised fine-tuning, DPU, or PPU).
- Select your dataset.
- Adjust hyperparameters (epochs, batch size, learning rate, etc.).
- Start the training process. You can use the web GUI or CLI and follow the steps.
args = dict(
stage="sft", # Specifies the stage of training. Here, it's set to "sft" for supervised fine-tuning
do_train=True,
model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit", # use bnb-4bit-quantized Llama-3-8B-Instruct model
dataset="identity,alpaca_gpt4_en", # use the alpaca and identity datasets
template="llama3", # use llama3 for prompt template
finetuning_type="lora", # use the LoRA adapters which saves up memory
lora_target="all", # attach LoRA adapters to all linear layers
output_dir="llama3_lora", # path to save LoRA adapters
per_device_train_batch_size=2, # specify the batch size
gradient_accumulation_steps=4, # the gradient accumulation steps
lr_scheduler_type="cosine", # use the learning rate as cosine learning rate scheduler
logging_steps=10, # log every 10 steps
warmup_ratio=0.1, # use warmup scheduler
save_steps=1000, # save checkpoint every 1000 steps
learning_rate=5e-5, # the learning rate
num_train_epochs=3.0, # the epochs of training
max_samples=500, # use 500 examples in each dataset
max_grad_norm=1.0, # clip gradient norm to 1.0
quantization_bit=4, # use 4-bit QLoRA
loraplus_lr_ratio=16.0, # use LoRA+ with lambda=16.0
use_unsloth=True, # use UnslothAI's LoRA optimization for 2x faster training
fp16=True, # use float16 mixed precision training
)
json.dump(args, open("train_llama3.json", "w", encoding="utf-8"), indent=2)
Next, open a terminal and run the below command,
!llamafactory-cli train train_llama3.json
This will start the training process. Fine-tuning LLMs can be this easy.
Inferencing With Your Fine-Tuned Model
Inference refers to using a fine-tuned machine learning model to make predictions on new, unseen data. Once the model training is completed, we can use the model to infer from. Let us try doing that and check how the model works.
args = dict(
model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit", # Specifies the name or path of the pre-trained model to be used for inference. In this case, it's set to "unsloth/llama-3-8b-Instruct-bnb-4bit".
#adapter_name_or_path="llama3_lora", # load the saved LoRA adapters
finetuning_type="lora", # Specifies the type of fine-tuning. Here, it's set to "lora" for LoRA adapters.
template="llama3", # Specifies the prompt template to be used for inference. Here, it's set to "llama3"
quantization_bit=4, # Specifies the number of bits for quantization. In this case, it's set to 4
use_unsloth=True, # use UnslothAI's LoRA optimization for 2x faster generation
)
json.dump(args, open("infer_llama3.json", "w", encoding="utf-8"), indent=2)
Next, run the below code using your terminal.
!llamafactory-cli chat infer_llama3.json
Conclusion: Empowering LLMs for a Better Future
Llama-Factory democratizes the process of fine-tuning large language models, making it accessible to a wider audience. By providing a user-friendly interface and efficient training techniques, it empowers developers to create custom LLMs tailored for specific tasks and societal benefits.
Key takeaways:
- Llama-Factory simplifies fine-tuning for over 100 LLMs.
- It offers a user-friendly interface and advanced training techniques.
- Fine-tuning LLMs allows for greater accuracy, safety, and control.
- Experiment with Llama-Factory and contribute to the open-source community.
Remember to adhere to the model's license when using Llama-Factory to prevent misuse and promote responsible AI development.
Additional Resources
Ready to start exploring the possibilities of fine-tuning Llama 3 and other LLMs? With Llama-Factory, the power to customize AI is now at your fingertips.