Unlock the Power of Your Data: A Developer's Guide to Using GPT Actions with AWS Redshift
Are you ready to bridge the gap between natural language and your Redshift data warehouse? This guide empowers developers to create GPT Actions that seamlessly connect to Redshift, unlocking data analysis and insights through the power of conversational AI. Imagine citizen data users asking questions directly through ChatGPT and getting answers derived from your Redshift database. This article will show you how.
Why Connect ChatGPT to Redshift? Unleash the Benefits
- Democratize Data Access: Enable citizen data users to ask basic questions, eliminating the need for specialized SQL knowledge.
- Accelerate Data Analysis: Empower data scientists to connect to Redshift tables and perform in-depth data analysis using ChatGPT's capabilities.
- Gain Real-Time Insights: Improve visibility into your data and quickly identify potential anomalies through simple conversational queries.
- Streamline Data Interaction: Leverage ChatGPT's natural language processing (NLP) to interact with complex Redshift data.
Prerequisites: Gearing Up for Success
Before diving in, ensure you have the following in place:
- Redshift Access: An active Redshift environment.
- AWS Permissions: Rights to deploy AWS Lambda functions within the same Virtual Private Cloud (VPC) as your Redshift instance.
- AWS CLI Authentication: Your AWS Command Line Interface properly authenticated.
- Essential Tools: Installation of the AWS CLI, AWS SAM CLI, Python, and `yq`.
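Once everything is installed, a quick sanity check confirms each tool is available on your PATH (version numbers will vary):

```bash
# Confirm each prerequisite is installed and reachable
aws --version
sam --version
python3 --version
yq --version
```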
Building the Middleware: The Key to Redshift and GPT Integration
This solution uses an AWS Lambda function as middleware, performing actions exclusively within the AWS ecosystem. The middleware function will execute SQL queries on your Redshift database, wait for the results, and return the data as a file digestible by GPT.
Step-by-Step Guide to Deploying the Redshift Middleware Function:
- Install Required Libraries: Refer to AWS documentation for detailed AWS CLI and AWS SAM CLI installation procedures.
- Create the AWS Function: Follow the AWS Middleware Action cookbook. The steps here cover only the requirements specific to Redshift integration.
- Clone the Repository: Obtain the necessary code by cloning the provided GitHub repository, which holds the core Python code for interacting with your Redshift instance (see the clone commands after this list).
- Customize the Code: Modify the code to match your specific needs.
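If you haven't cloned the repository yet, the commands look like the sketch below; the repository URL is a placeholder, so substitute the one referenced above:

```bash
# Clone the middleware repository (URL is a placeholder) and enter it
git clone <repository-url>
cd <repository-directory>
```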
Code Deep Dive: Understanding the Python Script
Here's a breakdown of the key components within the `lambda_handler` function of the Python script:
- Environment Variables: Securely retrieves Redshift credentials from environment variables.
- SQL Execution: Executes the SQL query received from the GPT Action.
- Data Conversion: Converts the query results to CSV so the Action can serve the data effectively.
- Base64 Encoding: Encodes the CSV file content using Base64.
- Response Formatting: Structures the response in a format compatible with OpenAI's file retrieval mechanism.
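To make the flow concrete, here is a minimal sketch of such a handler. It assumes the function uses the Redshift Data API against Redshift Serverless, and the environment variable names (`WORKGROUP_NAME`, `DATABASE_NAME`) and request field (`query`) are illustrative; the actual repository code may differ in details such as pagination of large result sets.

```python
import base64
import csv
import io
import json
import os
import time

import boto3


def lambda_handler(event, context):
    # Redshift connection details come from environment variables (set via env.yaml)
    workgroup = os.environ["WORKGROUP_NAME"]
    database = os.environ["DATABASE_NAME"]

    # The GPT Action sends the SQL query in the request body
    sql = json.loads(event["body"])["query"]

    client = boto3.client("redshift-data")

    # Execute the SQL statement against the Redshift Serverless workgroup
    statement = client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql
    )

    # Poll until the statement reaches a terminal state
    while True:
        desc = client.describe_statement(Id=statement["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(1)
    if desc["Status"] != "FINISHED":
        return {
            "statusCode": 500,
            "body": json.dumps({"error": desc.get("Error", "query failed")}),
        }

    # Convert the result set to CSV
    result = client.get_statement_result(Id=statement["Id"])
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col["name"] for col in result["ColumnMetadata"]])
    for record in result["Records"]:
        # Each field is a dict like {"stringValue": ...}; take its single value
        writer.writerow([list(field.values())[0] for field in record])

    # Base64-encode the CSV and wrap it in OpenAI's file-retrieval response format
    encoded = base64.b64encode(buf.getvalue().encode("utf-8")).decode("utf-8")
    return {
        "statusCode": 200,
        "body": json.dumps({
            "openaiFileResponse": [
                {"name": "result.csv", "mime_type": "text/csv", "content": encoded}
            ]
        }),
    }
```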
Connecting to Your Redshift VPC: Network Configuration
To allow your Lambda function to communicate with Redshift, you first need to identify the network used by Redshift: its VPC, subnets, and security group. You can find these in the AWS console under Amazon Redshift Serverless > Workgroup configuration > your_workgroup > Data access, or through the CLI:
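For example, with Redshift Serverless the workgroup's subnets and security groups can be retrieved like this (substitute your own workgroup name):

```bash
# Retrieve the subnet and security group IDs attached to the workgroup
aws redshift-serverless get-workgroup \
  --workgroup-name <your_workgroup> \
  --query 'workgroup.{subnetIds: subnetIds, securityGroupIds: securityGroupIds}'
```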
Setting Up the AWS Function: Configuring the Environment
- Prepare the `env.yaml` File: Create a copy of the `env.sample.yaml` file and rename it to `env.yaml`.
- Populate the `env.yaml` File: Fill this file with the connection details obtained in the previous step.
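A populated `env.yaml` might look like the sketch below. The key names here are illustrative; use whichever keys `env.sample.yaml` actually defines:

```yaml
# Illustrative values only -- replace with your own network and database details
WORKGROUP_NAME: your_workgroup
DATABASE_NAME: your_database
SUBNET_IDS: subnet-0abc1234,subnet-0def5678
SECURITY_GROUP_IDS: sg-0123456789abcdef0
```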
Deploying the AWS Function using AWS SAM:
Use the following commands to deploy your function, replacing placeholders with your specific values:
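A typical deployment sequence looks like the following; `sam deploy --guided` will prompt you for the stack name, region, and any template parameters, which is where the values from `env.yaml` come in:

```bash
# Build the function and its dependencies
sam build
# Deploy interactively; answer the prompts with your stack name, region, and parameters
sam deploy --guided
```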
Configuring Your Custom GPT: Enabling Conversational Data Access
Providing the Custom GPT Instructions: Directing the Conversation
Instruct your GPT with the following, tailoring it to be an expert at writing and executing Redshift SQL queries:
- Context: Emphasize the GPT's expertise in crafting Redshift SQL queries.
- Initial Query: Always have the GPT start by querying the `INFORMATION_SCHEMA.COLUMNS` table to gather the database schema.
- SQL Conversion: Direct the GPT to convert user questions into SQL statements, confirming each query works before execution.
- Data Presentation: Limit the number of rows displayed to the user for quick analysis.
- Important Note: Emphasize the importance of using existing table names and attributes without guessing.
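Put together, the instructions might read like the following sketch, which you should adapt to your own schema and conventions:

```text
You are an expert at writing and executing Redshift SQL queries.
Start every conversation by querying INFORMATION_SCHEMA.COLUMNS to learn the database schema.
Convert the user's question into a SQL statement and confirm it is valid before executing it.
When presenting results, show only a limited number of rows for quick analysis.
Only use table names and attributes that exist in the schema; never guess.
```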
Defining the OpenAPI Schema: Connecting the Dots
Integrate the following OpenAPI schema into your Custom GPT's Actions panel, ensuring you replace the placeholder URL with your deployed function's endpoint:
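A minimal schema along these lines is shown below. The path, operation ID, and request shape are assumptions that must match your deployed function; this sketch assumes the function accepts a JSON body with a `query` field:

```yaml
openapi: 3.1.0
info:
  title: Redshift SQL Executor
  description: Executes a SQL query against Redshift and returns the results as a CSV file.
  version: 1.0.0
servers:
  - url: https://your-function-endpoint.example.com  # replace with your function's endpoint
paths:
  /execute_sql:
    post:
      operationId: executeSql
      summary: Execute a Redshift SQL query and return the results as a file
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - query
              properties:
                query:
                  type: string
                  description: The Redshift SQL query to execute
      responses:
        "200":
          description: Query results returned as a base64-encoded CSV file
```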
Conclusion: Empowering Data Interaction through Conversation
Congratulations! You've created a secure and authenticated GPT Action that connects to Redshift, giving users accessible, conversational data analysis capabilities. This integration empowers users of all technical levels to extract insights from your data. In particular, this article focused on using the GPT Actions library from the OpenAI Cookbook with AWS Redshift, enabling integration with Redshift Serverless.