Master AI Model Chaining: Structured JSON Outputs with OpenAI's o1 and gpt-4o-mini
Struggling to get consistent, structured data from OpenAI's powerful o1 reasoning models? While the initial o1 releases excel at complex tasks, they lack native structured output support, making JSON parsing unreliable. This article dives into a clever workaround leveraging chained calls to unlock type-safe JSON outputs for streamlined workflows.
Why You Need Structured Outputs from AI Models
- Type Safety: Ensure data integrity with predictable, defined formats.
- Simplified Prompting: Reduce prompt complexity, focusing on core instructions instead of JSON formatting.
- Code Reusability: Integrate object schemas seamlessly into existing systems.
- Efficient Workflows: Automate data ingestion and processing with ease.
Method 1: Prompting o1-preview to Get JSON Returned
The initial approach involves explicitly prompting o1-preview to return a JSON response.
- Fetch Data: Retrieve relevant content from a source, for example, a Wikipedia page about major companies.
- Craft a Detailed Prompt: Instruct the model to analyze the data and provide insights in the specified JSON format.
- Process the JSON: Parse the response, and handle potential errors manually.
While this approach can yield decent results, it has its drawbacks:
- Manual JSON parsing and error handling are required.
- Refusals are not surfaced as a structured field, so they have to be detected and handled by hand.
Let's illustrate with a Python example.
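What follows is a minimal sketch, assuming the openai Python SDK (v1.x); page_text and the prompt wording are illustrative placeholders rather than production code:

```python
import json

from openai import OpenAI

client = OpenAI()

# Placeholder for the content fetched earlier (e.g., a Wikipedia page about major companies).
page_text = "..."

prompt = f"""
Analyze the following text about major companies and respond with ONLY valid JSON
(no code fences, no commentary) shaped like:
{{"companies": [{{"name": "...", "industry": "...", "founded_year": 0}}]}}

Text:
{page_text}
"""

# o1-preview takes everything as a single user message; the JSON contract lives
# entirely in the prompt because response_format is not supported by this model.
response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": prompt}],
)

raw = response.choices[0].message.content

# Manual parsing: the model may still wrap the JSON in code fences or add stray
# text, so errors have to be caught and cleaned up by hand.
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    # Fall back to extracting the outermost JSON object.
    start, end = raw.find("{"), raw.rfind("}")
    data = json.loads(raw[start : end + 1])

print(data["companies"][:3])
```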
Method 2: Unleash Structured Outputs with Chained Calls (o1-preview + gpt-4o-mini)
Here’s how to achieve reliable structured outputs by chaining o1-preview with gpt-4o-mini:
- Define Your Data Schema: Use Pydantic to create a clear data model (e.g., CompanyData, CompaniesData).
- First Call (o1-preview): Task o1-preview with the analysis, requesting that the output contain the specific fields you need.
- Second Call (gpt-4o-mini): Feed the o1-preview response to gpt-4o-mini and instruct it to format the data according to your defined schema using the response_format parameter.
- Enjoy Type-Safe Results: Access parsed, structured data directly, as shown in the sketch after this list.
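Here is a minimal sketch of the chained approach, assuming the openai Python SDK with the structured-output parse helper (client.beta.chat.completions.parse) and Pydantic v2; the field names, prompts, and page_text placeholder are illustrative assumptions, not fixed requirements:

```python
from typing import List

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()


# Step 1: define the data schema with Pydantic.
class CompanyData(BaseModel):
    name: str
    industry: str
    founded_year: int
    key_insight: str


class CompaniesData(BaseModel):
    companies: List[CompanyData]


# Placeholder for the content fetched earlier.
page_text = "..."

# Step 2: first call. Let o1-preview do the heavy reasoning in plain text,
# asking only that the fields we care about appear somewhere in its answer.
analysis = client.chat.completions.create(
    model="o1-preview",
    messages=[{
        "role": "user",
        "content": (
            "Analyze the companies described below. For each one, include its "
            "name, industry, founding year, and one key insight.\n\n" + page_text
        ),
    }],
)
analysis_text = analysis.choices[0].message.content

# Step 3: second call. gpt-4o-mini reformats the free-text analysis into the
# schema via the response_format parameter.
formatted = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Format the given analysis into the provided structure."},
        {"role": "user", "content": analysis_text},
    ],
    response_format=CompaniesData,
)

message = formatted.choices[0].message
if message.refusal:
    # Refusals are surfaced explicitly as a dedicated field here.
    print("Model refused:", message.refusal)
else:
    companies: CompaniesData = message.parsed  # validated, type-safe object
    for company in companies.companies:
        print(company.name, company.founded_year)
```

Keeping reasoning and formatting in separate calls also means the schema can evolve without touching the analysis prompt, and the inexpensive formatting model keeps the second call cheap.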
Benefits of Chained Calls
- Reliable Type Safety: Guarantees data conforms to your schema.
- Simplified Workflows: Streamlines data processing and integration.
- Reduced Prompt Complexity: Focus o1-preview on analysis and gpt-4o-mini on formatting.
- Reusability: Leverages pre-defined schemas across your codebase.
Tips for Maximizing JSON Output Accuracy with OpenAI Models:
- Be explicit and detailed in your prompt: tell the model exactly what you need, including every formatting requirement.
- Test your prompts iteratively and refine them based on the outputs you observe.
- Define your types and schemas up front so the formatting instructions stay consistent across calls.
- Carefully specify data types (string, integer, boolean, etc.) to minimize parsing errors; the sketch below shows one way to capture these in a schema.
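For example, explicit types and field descriptions can live directly in the schema rather than in the prompt; the fields below are hypothetical, shown only to illustrate the idea:

```python
from pydantic import BaseModel, Field


class CompanyData(BaseModel):
    name: str = Field(description="Official company name")
    founded_year: int = Field(description="Year the company was founded")
    is_public: bool = Field(description="Whether the company is publicly traded")
    employee_count: int = Field(description="Approximate number of employees")
```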
Conclusion
While o1 models lack native structured output support, this two-step method effectively bridges the gap using gpt-4o-mini. This approach delivers reliable, type-safe JSON outputs, significantly enhancing the utility of these powerful models in automated workflows. Embrace chained calls to simplify your code, ensure data integrity, and unlock the full potential of OpenAI's advanced AI capabilities.