Fine-Tune Llama 3: A Beginner's Guide with Llama-Factory on DigitalOcean
Ready to fine-tune cutting-edge language models without a Ph.D. in AI? This guide will walk you through using Llama-Factory to fine-tune Llama 3 on a cloud GPU from a provider like DigitalOcean. We'll break down the process, explain the benefits, and show you how to get started, even if you're not a machine learning expert.
What is Llama-Factory? The User-Friendly Key to LLM Fine-Tuning
Llama-Factory simplifies the complex world of large language model (LLM) fine-tuning. It's a tool designed to make fine-tuning accessible and efficient, even for those new to the field. With Llama-Factory, you can fine-tune over 100 different models, including Llama, Mistral, and Falcon. The best part? You don't need to be a coding expert to get started!
Why Fine-Tune Llama 3? Unleash Custom Performance
Fine-tuning takes a pre-trained model and adjusts it to perform even better on specific tasks or with specific datasets. This allows the model to learn nuances and patterns it wouldn't otherwise be exposed to. By fine-tuning, you tailor the model's knowledge to your unique needs, improving accuracy and efficiency.
Think of it like this: a general-purpose chef is good at many dishes, but a pastry chef excels at desserts. Fine-tuning turns a general LLM into a specialized expert for your use case.
Key Benefits of Using Llama-Factory for Llama 3 Fine-Tuning
- Accessibility: Fine-tune Llama 3 without extensive coding knowledge.
- Cost-Effective: Llama-Factory optimizes resource usage, saving you money.
- Model Variety: Supports various models like Llama, Mistral, and Falcon.
- Advanced Techniques: Integrates LoRA and GaLore to reduce GPU memory usage.
- Monitoring Tools: Supports TensorBoard, Wandb, and MLflow for tracking training progress.
- Faster Inference: Offers Gradio and CLI interfaces for efficient model deployment.
LLaMA Board: Your No-Code Fine-Tuning Dashboard
LLaMA Board is a user-friendly interface within Llama-Factory that lets you adjust and improve your LLM's performance without coding. Think of it as a control panel for your model, allowing you to tweak settings and monitor progress easily.
LLaMA Board Key Features:
- Easy Customization: Adjust settings through a web interface.
- Progress Monitoring: Track model performance with real-time updates and graphs.
- Flexible Testing: Compare model outputs to known answers or test with custom prompts.
- Multilingual Support: Works in English, Russian, and Chinese, with room for more languages.
Step-by-Step: Fine-Tuning Llama 3 with Llama-Factory
Prerequisites:
You'll need a DigitalOcean account and a GPU with sufficient VRAM (an NVIDIA A4000 or better is recommended).
Let's dive into fine-tuning Llama 3 using Llama-Factory.
- Clone the Repository: Download Llama-Factory from GitHub.
!git clone https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
%ls
- Install Dependencies: Install Unsloth, xformers, and bitsandbytes for efficient fine-tuning.
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers==0.0.25
!pip install .[bitsandbytes]
!pip install 'urllib3<2'
- Verify GPU: Check your GPU specifications to ensure compatibility.
!nvidia-smi
- Check CUDA: Confirm CUDA is available and configured correctly.
import torch

try:
    assert torch.cuda.is_available() is True
except AssertionError:
    print("Your GPU is not setup!")
- Import Dataset: Use the provided dataset or create your own custom dataset.
import json

%cd /notebooks/LLaMA-Factory

MODEL_NAME = "Llama-3"

with open("/notebooks/LLaMA-Factory/data/identity.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)

for sample in dataset:
    sample["output"] = sample["output"].replace("MODEL_NAME", MODEL_NAME).replace("AUTHOR", "LLaMA Factory")

with open("/notebooks/LLaMA-Factory/data/identity.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=2, ensure_ascii=False)
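If you want to fine-tune on your own data instead, LLaMA-Factory expects each dataset to be registered in data/dataset_info.json. The snippet below is a minimal sketch of that registration for a hypothetical Alpaca-style file named my_dataset.json; both the dataset key and the file name are placeholders to replace with your own.
import json

# Hypothetical example: register a custom Alpaca-style dataset with LLaMA-Factory.
# "my_dataset" and "my_dataset.json" are placeholder names; adjust them to your data.
with open("/notebooks/LLaMA-Factory/data/dataset_info.json", "r", encoding="utf-8") as f:
    dataset_info = json.load(f)

dataset_info["my_dataset"] = {
    "file_name": "my_dataset.json",  # place this file in the data/ directory
    "columns": {
        "prompt": "instruction",     # column holding the instruction text
        "query": "input",            # optional additional input
        "response": "output",        # the expected answer
    },
}

with open("/notebooks/LLaMA-Factory/data/dataset_info.json", "w", encoding="utf-8") as f:
    json.dump(dataset_info, f, indent=2, ensure_ascii=False)
Once registered, the dataset can be referenced by its key in the training configuration, for example dataset="identity,my_dataset".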
- Generate Gradio Web App Link: Launch the Llama-Factory interface in your browser.
# generates the web app link
%cd /notebooks/LLaMA-Factory
!GRADIO_SHARE=1 llamafactory-cli webui
- Configure Training: Use the Gradio interface to select your model, adapter, training options, and hyperparameters, then start training.
- CLI Training (Alternative): Use CLI commands instead of the web interface for more control over the training run.
args = dict(
    stage="sft",                     # stage of training: "sft" for supervised fine-tuning
    do_train=True,
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # use the bnb-4-bit-quantized Llama-3-8B-Instruct model
    dataset="identity,alpaca_gpt4_en",  # use the identity and alpaca datasets
    template="llama3",               # use the llama3 prompt template
    finetuning_type="lora",          # use LoRA adapters to save memory
    lora_target="all",               # attach LoRA adapters to all linear layers
    output_dir="llama3_lora",        # path to save the LoRA adapters
    per_device_train_batch_size=2,   # batch size per device
    gradient_accumulation_steps=4,   # gradient accumulation steps
    lr_scheduler_type="cosine",      # use a cosine learning rate scheduler
    logging_steps=10,                # log every 10 steps
    warmup_ratio=0.1,                # use a warmup scheduler
    save_steps=1000,                 # save a checkpoint every 1000 steps
    learning_rate=5e-5,              # the learning rate
    num_train_epochs=3.0,            # number of training epochs
    max_samples=500,                 # use 500 examples from each dataset
    max_grad_norm=1.0,               # clip the gradient norm to 1.0
    quantization_bit=4,              # use 4-bit QLoRA
    loraplus_lr_ratio=16.0,          # use LoRA+ with lambda=16.0
    use_unsloth=True,                # use UnslothAI's LoRA optimization for 2x faster training
    fp16=True,                       # use float16 mixed-precision training
)
json.dump(args, open("train_llama3.json", "w", encoding="utf-8"), indent=2)
Execute the command:
!llamafactory-cli train train_llama3.json
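To monitor the run outside the web UI, you can read the per-step log LLaMA-Factory writes to the output directory. This is a minimal sketch assuming recent versions emit a trainer_log.jsonl file in llama3_lora; if yours does not, rely on TensorBoard, Wandb, or MLflow as noted above.
import json

# Minimal sketch: print the loss curve from LLaMA-Factory's per-step training log.
# Assumes the trainer wrote llama3_lora/trainer_log.jsonl (the output_dir set in train_llama3.json).
with open("llama3_lora/trainer_log.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        if "loss" in record:
            print(f"step {record.get('current_steps', '?')}: loss={record['loss']:.4f}")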
- Inference: Once training is complete, use the model for inference. Define the model parameters.
args = dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct-bnb-4bit",  # pre-trained model to use for inference
    # adapter_name_or_path="llama3_lora",  # uncomment to load the saved LoRA adapters
    finetuning_type="lora",          # fine-tuning type: LoRA adapters
    template="llama3",               # prompt template to use for inference
    quantization_bit=4,              # number of bits for quantization
    use_unsloth=True,                # use UnslothAI's LoRA optimization for 2x faster generation
)
json.dump(args, open("infer_llama3.json", "w", encoding="utf-8"), indent=2)
Run the chat interface:
!llamafactory-cli chat infer_llama3.json
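Optionally, you can merge the trained LoRA adapters back into a base model so the result can be served without the adapter files. The configuration below is a hedged sketch built around llamafactory-cli's export subcommand; merging generally requires an unquantized base, so it assumes the full-precision unsloth/llama-3-8b-Instruct checkpoint and uses a placeholder llama3_merged output directory.
import json

# Hedged sketch: merge the saved LoRA adapters into an unquantized base model
# and write the merged weights to a local directory.
args = dict(
    model_name_or_path="unsloth/llama-3-8b-Instruct",  # assumed unquantized base checkpoint
    adapter_name_or_path="llama3_lora",                # the adapters saved during training
    template="llama3",                                 # same prompt template used for training
    finetuning_type="lora",
    export_dir="llama3_merged",                        # placeholder output directory
)
json.dump(args, open("merge_llama3.json", "w", encoding="utf-8"), indent=2)
Execute the merge:
!llamafactory-cli export merge_llama3.json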
Llama-Factory: Democratizing LLM Fine-Tuning for All
Llama-Factory breaks down the barriers to LLM fine-tuning, empowering developers of all skill levels to customize and optimize models like Llama 3. Its user-friendly interface, combined with powerful optimization techniques, makes fine-tuning accessible, cost-effective, and efficient. Explore Llama-Factory and discover the potential of personalized LLMs. Remember to always adhere to the model's license agreements.