Fine-Tune Llama 3: A Beginner's Guide with Llama-Factory on DigitalOcean
Ready to fine-tune cutting-edge language models without a Ph.D. in AI? This guide will walk you through using Llama-Factory to fine-tune Llama 3 on a cloud GPU from a provider like DigitalOcean. We'll break down the process, explain the benefits, and show you how to get started, even if you're not a machine learning expert.
What is Llama-Factory? The User-Friendly Key to LLM Fine-Tuning
Llama-Factory simplifies the complex world of large language model (LLM) fine-tuning. It's a tool designed to make fine-tuning accessible and efficient, even for those new to the field. With Llama-Factory, you can fine-tune over 100 different models, including Llama, Mistral, and Falcon. The best part? You don't need to be a coding expert to get started!
Why Fine-Tune Llama 3? Unleash Custom Performance
Fine-tuning takes a pre-trained model and adjusts it to perform even better on specific tasks or with specific datasets. This allows the model to learn nuances and patterns it wouldn't otherwise be exposed to. By fine-tuning, you tailor the model's knowledge to your unique needs, improving accuracy and efficiency.
Think of it like this: a general-purpose chef is good at many dishes, but a pastry chef excels at desserts. Fine-tuning turns a general LLM into a specialized expert for your use case.
Key Benefits of Using Llama-Factory for Llama 3 Fine-Tuning
- Accessibility: Fine-tune Llama 3 without extensive coding knowledge.
- Cost-Effective: Llama-Factory optimizes resource usage, saving you money.
- Model Variety: Supports various models like Llama, Mistral, and Falcon.
- Advanced Techniques: Integrates LoRA and GaLore to reduce GPU memory usage.
- Monitoring Tools: Supports TensorBoard, Wandb, and MLflow for tracking training progress.
- Faster Inference: Offers Gradio and CLI interfaces for efficient model deployment.
LLaMA Board: Your No-Code Fine-Tuning Dashboard
LLaMA Board is a user-friendly interface within Llama-Factory that lets you adjust and improve your LLM's performance without coding. Think of it as a control panel for your model, allowing you to tweak settings and monitor progress easily.
LLaMA Board Key Features:
- Easy Customization: Adjust settings through a web interface.
- Progress Monitoring: Track model performance with real-time updates and graphs.
- Flexible Testing: Compare model outputs to known answers or test with custom prompts.
- Multilingual Support: Works in English, Russian, and Chinese, with room for more languages.
Step-by-Step: Fine-Tuning Llama 3 with Llama-Factory
Prerequisites:
You'll need a DigitalOcean account and a GPU with sufficient VRAM (an NVIDIA A4000 or better is recommended).
Let's dive into fine-tuning Llama 3 using Llama-Factory.
- Clone the Repository: Download Llama-Factory from GitHub.
!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
%ls
- Install Dependencies: Install Unsloth, xformers, and bitsandbytes for efficient fine-tuning.
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers==0.0.25
!pip install .[bitsandbytes]
!pip install 'urllib3<2'
- Verify GPU: Check your GPU specifications to ensure compatibility.
!nvidia-smi
- Check CUDA: Confirm CUDA is available and configured correctly.
import torch

try:
    assert torch.cuda.is_available() is True
except AssertionError:
    print("Your GPU is not setup!")
- Import Dataset: Use the provided dataset or create your own custom dataset.
import json

%cd /notebooks/LLaMA-Factory

MODEL_NAME = "Llama-3"

with open("/notebooks/LLaMA-Factory/data/identity.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)

for sample in dataset:
    sample["output"] = sample["output"].replace("MODEL_NAME", MODEL_NAME).replace("AUTHOR", "LLaMA Factory")

with open("/notebooks/LLaMA-Factory/data/identity.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=2, ensure_ascii=False)
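If you want to fine-tune on your own data instead, LLaMA-Factory expects each dataset to be registered in data/dataset_info.json. The snippet below is a minimal sketch of that registration for a hypothetical Alpaca-style file named my_dataset.json; both the dataset key and the file name are placeholders to replace with your own.
import json

# Hypothetical example: register a custom Alpaca-style dataset with LLaMA-Factory.
# "my_dataset" and "my_dataset.json" are placeholder names; adjust them to your data.
with open("/notebooks/LLaMA-Factory/data/dataset_info.json", "r", encoding="utf-8") as f:
    dataset_info = json.load(f)

dataset_info["my_dataset"] = {
    "file_name": "my_dataset.json",  # place this file in the data/ directory
    "columns": {
        "prompt": "instruction",     # column holding the instruction text
        "query": "input",            # optional additional input
        "response": "output",        # the expected answer
    },
}

with open("/notebooks/LLaMA-Factory/data/dataset_info.json", "w", encoding="utf-8") as f:
    json.dump(dataset_info, f, indent=2, ensure_ascii=False)
Once registered, the dataset can be referenced by its key in the training configuration, for example dataset="identity,my_dataset".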
- Generate Gradio Web App Link: Launch the Llama-Factory interface in your browser.
# generates the web app link
%cd /notebooks/LLaMA-Factory
!GRADIO_SHARE=1 llamafactory-cli webui
- Configure Training: Use the Gradio interface to select your model, adapter, training options, and hyperparameters, then start training.
- CLI Training (Alternative): Use CLI commands instead of the web interface for more control over the training run.
args = dict(
    stage="sft",                     # stage of training: "sft" for supervised fine-tuning
    do_train=True,
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # use the bnb-4-bit-quantized Llama-3-8B-Instruct model
    dataset="identity,alpaca_gpt4_en",  # use the identity and alpaca datasets
    template="llama3",               # use the llama3 prompt template
    finetuning_type="lora",          # use LoRA adapters to save memory
    lora_target="all",               # attach LoRA adapters to all linear layers
    output_dir="llama3_lora",        # path to save the LoRA adapters
    per_device_train_batch_size=2,   # batch size per device
    gradient_accumulation_steps=4,   # gradient accumulation steps
    lr_scheduler_type="cosine",      # use a cosine learning rate scheduler
    logging_steps=10,                # log every 10 steps
    warmup_ratio=0.1,                # use a warmup scheduler
    save_steps=1000,                 # save a checkpoint every 1000 steps
    learning_rate=5e-5,              # the learning rate
    num_train_epochs=3.0,            # number of training epochs
    max_samples=500,                 # use 500 examples from each dataset
    max_grad_norm=1.0,               # clip the gradient norm to 1.0
    quantization_bit=4,              # use 4-bit QLoRA
    loraplus_lr_ratio=16.0,          # use LoRA+ with lambda=16.0
    use_unsloth=True,                # use UnslothAI's LoRA optimization for 2x faster training
    fp16=True,                       # use float16 mixed-precision training
)
json.dump(args, open("train_llama3.json", "w", encoding="utf-8"), indent=2)
Execute the command:
!llamafactory-cli train train_llama3.json
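To monitor the run outside the web UI, you can read the per-step log LLaMA-Factory writes to the output directory. This is a minimal sketch assuming recent versions emit a trainer_log.jsonl file in llama3_lora; if yours does not, rely on TensorBoard, Wandb, or MLflow as noted above.
import json

# Minimal sketch: print the loss curve from LLaMA-Factory's per-step training log.
# Assumes the trainer wrote llama3_lora/trainer_log.jsonl (the output_dir set in train_llama3.json).
with open("llama3_lora/trainer_log.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        if "loss" in record:
            print(f"step {record.get('current_steps', '?')}: loss={record['loss']:.4f}")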
- Inference: Once training is complete, use the model for inference. Define the model parameters.
args = dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # pre-trained model to use for inference
    # adapter_name_or_path="llama3_lora",  # uncomment to load the saved LoRA adapters
    finetuning_type="lora",          # fine-tuning type: LoRA adapters
    template="llama3",               # prompt template to use for inference
    quantization_bit=4,              # number of bits for quantization
    use_unsloth=True,                # use UnslothAI's LoRA optimization for 2x faster generation
)
json.dump(args, open("infer_llama3.json", "w", encoding="utf-8"), indent=2)
Run the chat interface:
!llamafactory-cli chat infer_llama3.json
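Optionally, you can merge the trained LoRA adapters back into a base model so the result can be served without the adapter files. The configuration below is a hedged sketch built around llamafactory-cli's export subcommand; merging generally requires an unquantized base, so it assumes the full-precision unsloth/llama-3-8b-Instruct checkpoint and uses a placeholder llama3_merged output directory.
import json

# Hedged sketch: merge the saved LoRA adapters into an unquantized base model
# and write the merged weights to a local directory.
args = dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct",  # assumed unquantized base checkpoint
    adapter_name_or_path="llama3_lora",                # the adapters saved during training
    template="llama3",                                 # same prompt template used for training
    finetuning_type="lora",
    export_dir="llama3_merged",                        # placeholder output directory
)
json.dump(args, open("merge_llama3.json", "w", encoding="utf-8"), indent=2)
Execute the merge:
!llamafactory-cli export merge_llama3.json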
Llama-Factory: Democratizing LLM Fine-Tuning for All
Llama-Factory breaks down the barriers to LLM fine-tuning, empowering developers of all skill levels to customize and optimize models like Llama 3. Its user-friendly interface, combined with powerful optimization techniques, makes fine-tuning accessible, cost-effective, and efficient. Explore Llama-Factory and discover the potential of personalized LLMs. Remember to always adhere to the model's license agreements.