Fine-Tune Llama 3: A Step-by-Step Guide Using Llama-Factory and DigitalOcean
Want to customize Llama 3 for specific tasks? This in-depth guide shows you how to fine-tune Llama 3 using Llama-Factory on a cloud GPU from a provider like DigitalOcean. We'll cover everything from prerequisites to inference, making the process accessible even if you're not an AI expert. Learn how to train Llama 3 so you can build powerful, customized language models with ease.
Why Fine-Tune Llama 3? Unleash the Power of Customization
Fine-tuning adapts a pre-trained model to a specific task or dataset, boosting its performance and accuracy.
- Improve Accuracy: Tailor Llama 3 to excel in a niche area.
- Reduce Errors: Minimize irrelevant or incorrect outputs.
- Save Resources: Fine-tuning is faster and cheaper than training from scratch.
By using tools like Llama-Factory, you can efficiently train Llama 3 to generate responses specific to your needs, whether for customer service, content creation, or research.
Introducing Llama-Factory: Your Secret Weapon for Efficient Model Training
Llama-Factory simplifies fine-tuning models like Llama 3, Mistral, and Falcon, making the process accessible to a wider audience. It provides a user-friendly interface and supports advanced training algorithms: with Llama-Factory, you can run supervised fine-tuning (SFT), DPO, ORPO, and PPO.
- User-Friendly Interface: Fine-tune any of the 100+ supported models without extensive coding knowledge.
- Cost-Effective: Reduce GPU usage with LoRA and GaLore configurations.
- Monitoring Tools: Integrate TensorBoard, Weights & Biases (wandb), and MLflow for real-time insights.
- Hugging Face Integration: Leverage LLaMA Board on Hugging Face for easy fine-tuning.
Prerequisites: What You'll Need to Get Started with Training Llama Models
Before diving into fine-tuning, ensure you have the following:
- Basic Understanding of LLMs: Familiarity with generative pre-trained transformers (GPTs) is recommended.
- Sufficient GPU Power: An NVIDIA RTX A4000, H100, or similar GPU is essential for efficient training. Consider DigitalOcean's GPU offerings for scalable and cost-effective compute.
Step-by-Step Guide: Fine-Tuning Llama 3 with Llama-Factory
Let's walk through the process of fine-tuning Llama 3 locally, leveraging Llama-Factory's capabilities.
Step 1: Clone the Repository and Install Dependencies
Start by cloning the Llama-Factory repository and installing the necessary libraries, including Unsloth for efficient training and xformers for optimized performance:
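A minimal sketch of the setup, assuming a recent LLaMA-Factory layout; the exact install extras and the Unsloth/xformers versions vary by release and CUDA/PyTorch combination, so check the repository's README before pinning:

```bash
# Clone the repository and move into it
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

# Install LLaMA-Factory in editable mode with its core extras
pip install -e ".[torch,metrics]"

# Optional speedups: Unsloth for efficient training, xformers for optimized attention
pip install unsloth xformers
```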
Step 2: Verify GPU Availability
Confirm that your GPU is correctly set up and accessible using the following commands:
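Two quick checks cover both layers: nvidia-smi confirms the driver sees the card, and a one-line PyTorch call confirms CUDA is usable from Python:

```bash
# Driver-level check: lists detected GPUs, memory, and utilization
nvidia-smi

# Framework-level check: should print "CUDA available: True"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```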
Step 3: Import and Prepare the Dataset
Import the dataset provided in the repository or create your own custom dataset. In this example, we're using the identity.json dataset that ships with the repository:
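The identity.json file lives in the repository's data/ directory and contains {{name}} and {{author}} placeholders you can swap for your own branding; the values in the sed command below are just example stand-ins:

```bash
# Peek at the first few records of the bundled dataset
head -n 15 data/identity.json

# Optionally personalize the assistant's identity (example values)
sed -i 's/{{name}}/MyAssistant/g; s/{{author}}/MyTeam/g' data/identity.json
```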
Step 4: Launch the Gradio Web App
Generate the Gradio web app link for Llama-Factory to access the GUI:
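In recent releases the web UI launches via the llamafactory-cli entry point; setting GRADIO_SHARE=1 asks Gradio to generate a public share link, which is handy when the GPU lives on a remote cloud machine:

```bash
# Launch the LLaMA Board web UI and request a shareable Gradio link
GRADIO_SHARE=1 llamafactory-cli webui
```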
Step 5: Configure Training Parameters in the GUI
Use the Gradio interface to configure your training parameters:
- Model Selection: Choose Llama 3 (8B).
- Adapter Configuration: Select LoRA or another adapter.
- Training Options: Opt for supervised fine-tuning (SFT).
- Dataset Selection: Use the provided dataset or upload your own.
- Hyperparameter Configuration: Adjust epochs, batch size, and learning rate.
- Start Training: Click "Start" to begin the fine-tuning process.
Step 6: Start Training via CLI (Alternative)
Alternatively, initiate training using CLI commands. This method provides more control over the training process:
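The CLI reads its settings from a YAML file. Here is a minimal LoRA SFT config, sketched after the examples/train_lora/llama3_lora_sft.yaml template that ships with the repository; adjust the hyperparameters to your hardware and dataset:

```bash
# Write a minimal LoRA SFT config; key names follow LLaMA-Factory's bundled examples
cat <<'EOF' > llama3_lora_sft.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: identity
template: llama3
cutoff_len: 1024
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
logging_steps: 10
bf16: true
EOF
```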
Start the training with this command:
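With the config in place, training is a one-liner using the documented CLI entry point:

```bash
# Kick off supervised fine-tuning with the config above
llamafactory-cli train llama3_lora_sft.yaml
```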
Step 7: Inference with the Fine-Tuned Model
Once the training is complete, use the fine-tuned model for inference. Configure the model and adapter settings:
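For inference, LLaMA-Factory again takes a small YAML that points at the base model and the freshly trained LoRA adapter; this sketch mirrors the examples/inference configs in the repository and assumes the output_dir used above:

```bash
# Point the chat CLI at the base model plus the fine-tuned adapter
cat <<'EOF' > llama3_lora_chat.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
EOF
```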
Run the inference using the CLI:
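This opens an interactive chat session with the fine-tuned model directly in your terminal:

```bash
# Start an interactive chat with the fine-tuned model
llamafactory-cli chat llama3_lora_chat.yaml
```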
LLaMA Board: A User-Friendly Interface for Llama-Factory
LLaMA Board's user-friendly dashboard simplifies the process of adjusting and improving language model performance without requiring coding expertise.
- Easy Customization: Modify model learning via a web interface.
- Progress Monitoring: Track model performance with real-time updates and graphs.
- Flexible Testing: Evaluate model understanding through comparisons and direct interaction.
- Multilingual Support: Works in multiple languages, making it accessible to a global audience.
The Future of LLMs: Empowering Developers with Fine-Tuning
Llama-Factory is democratizing AI development, enabling more people to customize and fine-tune powerful language models like Llama 3. This fosters innovation and allows developers to create LLMs tailored to specific needs, driving advancements across various industries. By learning how to fine-tune Llama 3, you're opening doors to creating custom AI solutions that can benefit society.
Conclusion: Embrace the Power of Fine-Tuning
Fine-tuning is essential for adapting large language models to specific tasks, and Llama-Factory makes this process easier than ever. By following this guide, you can efficiently fine-tune Llama 3, unlocking its full potential and creating custom AI solutions. Remember to adhere to the model's license and explore different parameters to achieve optimal results. Take the plunge and see how Llama-Factory can revolutionize your approach to LLMs today!