Squeeze Every Drop: Distill GPT-4o Knowledge into GPT-4o-mini for Wine Expertise
OpenAI's model distillation offers a game-changing approach: transfer the smarts of powerful (but pricey) models like GPT-4o into smaller, faster ones. Imagine GPT-4o-mini, but with the wine-guessing abilities of its big brother. This guide explores how to achieve this, cutting costs and latency without sacrificing accuracy. We'll focus on a wine classification problem, using enums for structured outputs to boost performance even further.
The Magic of Model Distillation: Why It Matters
Model distillation is like cloning the knowledge of a large model into a smaller one.
- Cheaper Inferences: Run smaller models on less expensive hardware.
- Lower Latency: Get answers faster with smaller models.
- Specialized Expertise: Fine-tune for specific tasks, like wine tasting.
Dependencies
```python
!pip install openai tiktoken numpy pandas tqdm --quiet
```
```python
import concurrent.futures
import json

import numpy as np
import pandas as pd
import tiktoken
from openai import OpenAI
from tqdm import tqdm

client = OpenAI()
```
French Wine Focus: Unearthing a Dataset
For this experiment, we'll leverage a dataset of roughly 130,000 wine reviews available on Kaggle.
- Dataset Source: https://www.kaggle.com/datasets/zynicide/wine-reviews
- Narrowed Scope: We'll focus specifically on French wines to streamline the process, focusing on a subset of 500 to keep testing efficient.
- The Goal: To predict the grape variety based on review details like description, region, and winery.
Crafting the Perfect Prompt: Setting the Stage for Success
A well-structured prompt is key for accurate predictions and for successfully distilling knowledge into a smaller model.
Let's filter out the grape varieties that have fewer than 5 occurrences in reviews.
Let's proceed with a subset of 500 random rows from this dataset.
```python
df = pd.read_csv('data/winemag/winemag-data-130k-v2.csv')
df_france = df[df['country'] == 'France']

# Filter out varieties with fewer than 5 reviews. Even though we'd like to
# identify those too, they're outliers we don't want to optimize for: they
# would make our enum list too long, and they'd add noise for the rest of
# the dataset, eventually reducing our accuracy.
variety_counts = df_france['variety'].value_counts()
varieties_less_than_five_list = variety_counts[variety_counts < 5].index.tolist()
df_france = df_france[~df_france['variety'].isin(varieties_less_than_five_list)]

df_france_subset = df_france.sample(n=500)
df_france_subset.head()
```
```python
# Retrieve all grape varieties to include them in the prompt and in our
# structured outputs enum list.
varieties = np.array(df_france['variety'].unique()).astype('str')
varieties
```
Consider this when creating your prompts for your models:
- Detailed Context: Provide winery, region, description, reviewer, and points.
- Variety List: Explicitly list possible grape varieties for the model, improving focus.
- Concise Instructions: Request a single-word answer (the grape variety), ensuring consistency.
Prompt Example
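Here's a minimal sketch of how such a prompt could be assembled. The function name `generate_prompt` and the exact wording are illustrative; the column names (`winery`, `region_1`, `province`, `points`, `taster_name`, `description`) come from the Kaggle CSV.

```python
# Illustrative prompt builder -- the wording is a sketch, not a canonical
# prompt; column names match the Kaggle winemag CSV.
def generate_prompt(row, varieties):
    # Spell out every candidate variety so the model answers on-distribution.
    variety_list = ", ".join(varieties)
    return (
        "You are a sommelier. Based on the wine review below, identify the grape variety.\n"
        f"Answer with exactly one variety from this list: {variety_list}.\n\n"
        f"Winery: {row['winery']}\n"
        f"Region: {row['region_1']}, {row['province']}\n"
        f"Points: {row['points']}\n"
        f"Reviewer: {row['taster_name']}\n"
        f"Description: {row['description']}"
    )
```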
Cost-Effective Completion Calls: Token Estimates
Before launching a large number of API calls, estimate token usage and cost with tiktoken so there are no billing surprises.
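As a rough sketch, count tokens with the `o200k_base` encoding used by the GPT-4o family, reusing the illustrative `generate_prompt` from above. The per-million-token rate below is a placeholder; check the current pricing page for your model.

```python
# Rough input-token estimate before firing off 500 calls.
# gpt-4o and gpt-4o-mini both use the o200k_base encoding.
encoding = tiktoken.get_encoding("o200k_base")

total_tokens = sum(
    len(encoding.encode(generate_prompt(row, varieties)))
    for _, row in df_france_subset.iterrows()
)
print(f"Estimated input tokens: {total_tokens:,}")

# Placeholder rate -- look up the current price before trusting this number.
usd_per_million_input_tokens = 2.50
print(f"Estimated input cost: ${total_tokens / 1e6 * usd_per_million_input_tokens:.4f}")
```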
Structured Wine Wisdom: Enums for Accurate Output
Using structured outputs ensures the model answers predictably and accurately.
- Enumerated List: Define a JSON schema with an `enum` listing all grape varieties (see the sketch below).
- Deterministic Answers: Force the model to select from that list, avoiding irrelevant responses.
- Performance Boost: Eliminate the need for further post-processing - an immediate accuracy win!
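Here's what such a schema can look like in the Chat Completions structured-outputs format; the schema name and the `variety` field name are our choices, not fixed by the API.

```python
# Constrain the answer to one of the known varieties via a strict JSON schema.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "grape_variety",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "variety": {"type": "string", "enum": varieties.tolist()}
            },
            "required": ["variety"],
            "additionalProperties": False,
        },
    },
}
```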
Store Completions with Metadata
To distill a model, it's essential to store all completions: the stored completions are what the smaller model is fine-tuned against. Storing with metadata also simplifies filtering for distillation and future evaluations.
- Store Parameter: Set `store=True` in `client.chat.completions.create`.
- Metadata Tag: Add a descriptive metadata tag like `"distillation": "wine-distillation"` (both are shown in the sketch below).
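A minimal sketch of such a call, reusing the `response_format` defined above; the helper name `call_model` is ours.

```python
def call_model(model, prompt):
    # store=True persists the completion on the OpenAI platform; the metadata
    # tag lets us filter for exactly these completions later.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format=response_format,
        store=True,
        metadata={"distillation": "wine-distillation"},
    )
    return json.loads(response.choices[0].message.content)["variety"]
```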
Parallel Processing for Speed: Concurrent Futures to the Rescue
Parallelizing API calls with `concurrent.futures` dramatically reduces total processing time.
- `ThreadPoolExecutor`: Use this to run completions concurrently (shown in the sketch after this list).
- Progress Bar: Wrap the loop in `tqdm` for a visual display of progress.
- Error Handling: Include `try...except` blocks to gracefully handle any API errors.
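A sketch that ties the pieces together, assuming the `generate_prompt` and `call_model` helpers above; the worker count is arbitrary, so tune it to your rate limits.

```python
def process_dataframe(df, model, max_workers=10):
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one completion per row, keyed by the row index.
        futures = {
            executor.submit(call_model, model, generate_prompt(row, varieties)): index
            for index, row in df.iterrows()
        }
        for future in tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
            index = futures[future]
            try:
                results[index] = future.result()
            except Exception as e:
                # One failed call shouldn't sink the whole batch.
                print(f"Row {index} failed: {e}")
                results[index] = None
    return results
```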
How to Ensure Accuracy in Results
Now that the code runs end to end, sanity-check its output on a handful of rows before spending tokens on the full distillation run.
Once those spot checks pass, process the whole dataset: run both gpt-4o and gpt-4o-mini against the French wine subset using the parallelized helper above.
Compare the Two Models
Compare completion results between the 'teacher' model (GPT-4o) and the 'student' model (GPT-4o-mini). This is an important step: it establishes the accuracy gap that distillation should close.
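A simple way to score both models against the labeled `variety` column. The prediction column names below are hypothetical; substitute whatever you stored each model's answers under.

```python
# Fraction of predictions that exactly match the labeled grape variety.
def accuracy(df, prediction_column):
    return (df[prediction_column] == df["variety"]).mean()

# Hypothetical column names -- replace with the ones you wrote predictions into.
df_france_subset["gpt-4o_prediction"] = pd.Series(process_dataframe(df_france_subset, "gpt-4o"))
df_france_subset["gpt-4o-mini_prediction"] = pd.Series(process_dataframe(df_france_subset, "gpt-4o-mini"))

for column in ["gpt-4o_prediction", "gpt-4o-mini_prediction"]:
    print(f"{column}: {accuracy(df_france_subset, column):.2%}")
```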
Distilling Knowledge: The OpenAI Platform Steps
Upload the processed dataset with stored gpt-4o completions to the OpenAI platform and begin distillation.
- Stored Completions: Navigate to the OpenAI Stored completions page.
- Filter: Select `gpt-4o` as the model and `"distillation: wine-distillation"` as the metadata.
- Initiate Distillation: Click "Distill" in the top right corner.
- Choose Base Model: Select `gpt-4o-mini` and configure the fine-tuning parameters.
- Track Progress: Retrieve the fine-tuning job ID and monitor progress, as sketched below.
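Once the job exists, you can poll it from the API as well as from the UI. The job ID below is a placeholder; copy yours from the fine-tuning page.

```python
# Placeholder job ID -- replace with the one shown on the platform.
job = client.fine_tuning.jobs.retrieve("ftjob-...")
print(job.status)            # e.g. "running", then "succeeded"
print(job.fine_tuned_model)  # name of the distilled model once training finishes
```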