AI Engine: Building OpenAI GPT Prompt

We need to build an AI prompt in the build_prompt() method.

We will send this to the OpenAI API, so we must carefully choose how to formulate the prompt.

Our prompt will consist of four parts:

Basic instructions
Guidelines
Context
Additional rules

In this lesson, we will build the following application part:

Prompt Instructions

Let's start by giving basic instructions:

You are a salesman in a car dealership. Information about all the cars in our stock
is provided in the Context section of the question. Your main job is
to help a customer end up with a car model that would fit his needs.
 
You should ask for more information about what type of car the person wants,
but you should not exceed the information given in the context.
 
Given an input question, first create a syntactically correct MySQL query to run,
then look at the results of the query and return the answer.
 
Use the following format:

You might be wondering what the hell this is, right? Well, this is a way to make sure our AI agent is... Impersonating someone!

Seriously, this is how we have to think about it. We are giving it a role and a set of instructions to follow. This is how we can make sure it doesn't go off the rails and start doing something we don't want it to do.

Yes, prompt engineering is a separate big topic outside of this tutorial. Still, it improves the result you can get from any LLM model.

Prompt Guidelines

Ok, let's continue adding more information to the prompt with guidelines:

You are a salesman in a car dealership. Information about all the cars in our stock
is provided in the Context section of the question. Your main job is
to help a customer end up with a car model that would fit his needs.
 
You should ask for more information about what type of car the person wants,
but you should not exceed the information given in the context.
 
Given an input question, first create a syntactically correct MySQL query to run,
then look at the results of the query and return the answer.
 
Use the following format:
 
---
 
Guidelines
 
Question: "User question here"
SQLQuery: "SQL Query used to generate the result (if applicable)"
SQLResult: "Result of the SQLQuery (if applicable)"
Answer: "Final answer here (You fill this in with the SQL query only)"
 
---

These guidelines act as a template for how our communication will work. We explicitly tell the AI what format he will receive and what format we want back. In this case:

Question - The user's question
SQLQuery - The SQL query we used to generate the SQLResult, and it's only sent in the second request to API
SQLResult - The result of the SQLQuery, and it's only sent in the second request to API
Answer - The final answer, which is the SQL query only for the first request and the final answer for the second request

Those are some pretty strict guidelines, right? But again, we want to protect our users from AI going off the limits (like it did on a DPD website).

Prompt Context

Last on our list is adding the context. This is where our AI will receive the information about the tables and columns it can use:

You are a salesman in a car dealership. Information about all the cars in our stock
is provided in the Context section of the question. Your main job is
to help a customer end up with a car model that would fit his needs.
 
You should ask for more information about what type of car the person wants,
but you should not exceed the information given in the context.
 
Given an input question, first create a syntactically correct MySQL query to run,
then look at the results of the query and return the answer.
 
Use the following format:
 
---
 
Guidelines
 
Question: "User question here"
SQLQuery: "SQL Query used to generate the result (if applicable)"
SQLResult: "Result of the SQLQuery (if applicable)"
Answer: "Final answer here (You fill this in with the SQL query only)"
 
---
 
Context
 
Only use the following tables and columns:
 
{table_prompt}
{history}
 
Question: "{question}"
{query_prompt}
 
---
 
{rules}

In this context, we will add a few things:

table_prompt - A list of our DB tables and their columns
history - The chat history between our customer and AI

And lastly, we add rules to form 2 discrete answers based on what we sent. Here's what that looks like:

1st request - We expect just a raw SQL query and nothing else to be returned
2nd request - We expect the final answer to be returned that answers the user's question

Now, let's build this into our ai_engine.py:

ai_engine.py

from openai import OpenAI
 
def build_prompt(question, query=None, query_output=None, chat_messages=None):
    tables = get_table_names()
 
    table_prompt = ""
    for table in tables:
        table_prompt += f"{table} has columns: " + ', '.join([f"{column['name']} ({column['type']})" for column in tables[table]]) + "\n"
 
    history = ""
    if chat_messages and chat_messages != "":
        history = "\n" + chat_messages
 
    query_prompt = ""
 
    if query:
        query_prompt += "SQLQuery: " + query + "\n"
        if query_output:
            query_prompt += "SQLResult: " + query_output + "\n"
        rules = "Answer: "
    else:
        rules = ("(Your answer HERE must be a syntactically correct MySQL query with no extra information or "
                 "quotes.Omit SQLQuery: from your answer)")
 
    return f"""
    You are a salesman in a car dealership. Information about all the cars in our stock is provided in the Context section of the question. Your main job is to help a customer end up with a car model that would fit his needs. You should ask for more information about what type of car the person wants, but you should not exceed the information given in the context.
 
    Given an input question, first create a syntactically correct MySQL query to run, then look at the results of the query and return the answer.
Use the following format:
 
---
 
Guidelines
 
Question: "User question here"
SQLQuery: "SQL Query used to generate the result (if applicable)"
SQLResult: "Result of the SQLQuery (if applicable)"
Answer: "Final answer here (You fill this in with the SQL query only)"
 
---
 
Context
 
Only use the following tables and columns:
 
{table_prompt}
{history}
 
Question: "{question}"
{query_prompt}
 
---
 
{rules}
"""
 
# ...

Our prompt requires us to load database tables for our prompt like this:

addon_categories has columns: id (bigint), title (varchar), created_at (timestamp), updated_at (timestamp)
car_addons has columns: id (bigint), car_id (bigint), addon_category_id (bigint), title (varchar), created_at (timestamp), updated_at (timestamp)
cars has columns: id (bigint), manufacturer (varchar), model (varchar), year (varchar), type (varchar), drivetrain (varchar), created_at (timestamp), updated_at (timestamp)
failed_jobs has columns: id (bigint), uuid (varchar), connection (text), queue (text), payload (longtext), exception (longtext), failed_at (timestamp)
migrations has columns: id (int), migration (varchar), batch (int)
password_reset_tokens has columns: email (varchar), token (varchar), created_at (timestamp)
personal_access_tokens has columns: id (bigint), tokenable_type (varchar), tokenable_id (bigint), name (varchar), token (varchar), abilities (text), last_used_at (timestamp), expires_at (timestamp), created_at (timestamp), updated_at (timestamp)
users has columns: id (bigint), name (varchar), email (varchar), email_verified_at (timestamp), password (varchar), remember_token (varchar), created_at (timestamp), updated_at (timestamp)

To get the table names and columns, we have to add the get_table_names() function to our ai_engine.py:

ai_engine.py

import mysql.connector
import os
 
# ...
 
def connect_database():
    connector = mysql.connector.connect(
        host=os.getenv('DB_HOST'),
        user=os.getenv('DB_USER'),
        password=os.getenv('DB_PASSWORD'),
        database=os.getenv('DB_DATABASE')
    )
    return connector
 
 
def get_table_names():
    connector = connect_database()
    cars_query = connector.cursor()
 
    cars_query.execute("""
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM information_schema.columns
WHERE table_schema = DATABASE()
ORDER BY table_name, ordinal_position
""")
    tables = cars_query.fetchall()
 
    connector.close()
 
    table_data = {}
    for table in tables:
        table_data[table[0]] = table_data.get(table[0], [])
        table_data[table[0]].append({
            "name": table[1],
            "type": table[2]
        })
 
    return table_data
 
# ...

For now, we are building the prompt with our instructions and guidelines. Next on our list is to send this prompt to the OpenAI API and receive the response.