This starter takes all the .mdx files in the pages directory and processes them to use as custom context within OpenAI Text Completion prompts.
Deploy this starter to Vercel. The Supabase integration will automatically set the required environment variables and configure your Database Schema. All you have to do is set your OPENAI_KEY and you're ready to go!
Building your own custom ChatGPT involves four steps:
- [👷 Build time] Pre-process the knowledge base (your .mdxfiles in yourpagesfolder).
- [👷 Build time] Store embeddings in Postgres with pgvector.
- [🏃 Runtime] Perform vector similarity search to find the content that's relevant to the question.
- [🏃 Runtime] Inject content into OpenAI GPT-3 text completion prompt and stream response to the client.
Step 1. and 2. happen at build time, e.g. when Vercel builds your Next.js app. During this time the generate-embeddings script is being executed which performs the following tasks:
sequenceDiagram
    participant Vercel
    participant DB (pgvector)
    participant OpenAI (API)
    loop 1. Pre-process the knowledge base
        Vercel->>Vercel: Chunk .mdx pages into sections
        loop 2. Create & store embeddings
            Vercel->>OpenAI (API): create embedding for page section
            OpenAI (API)->>Vercel: embedding vector(1536)
            Vercel->>DB (pgvector): store embedding for page section
        end
    end
    In addition to storing the embeddings, this script generates a checksum for each of your .mdx files and stores this in another database table to make sure the embeddings are only regenerated when the file has changed.
Step 3. and 4. happen at runtime, anytime the user submits a question. When this happens, the following sequence of tasks is performed:
sequenceDiagram
    participant Client
    participant Edge Function
    participant DB (pgvector)
    participant OpenAI (API)
    Client->>Edge Function: { query: lorem ispum }
    critical 3. Perform vector similarity search
        Edge Function->>OpenAI (API): create embedding for query
        OpenAI (API)->>Edge Function: embedding vector(1536)
        Edge Function->>DB (pgvector): vector similarity search
        DB (pgvector)->>Edge Function: relevant docs content
    end
    critical 4. Inject content into prompt
        Edge Function->>OpenAI (API): completion request prompt: query + relevant docs content
        OpenAI (API)-->>Client: text/event-stream: completions response
    end
    The relevant files for this are the SearchDialog (Client) component and the vector-search (Edge Function).
The initialization of the database, including the setup of the pgvector extension is stored in the supabase/migrations folder which is automatically applied to your local Postgres instance when running supabase start.
- cp .env.example .env
- Set your OPENAI_KEYin the newly created.envfile.
- Set NEXT_PUBLIC_SUPABASE_ANON_KEYandSUPABASE_SERVICE_ROLE_KEYrun:
Note: You have to run supabase to retrieve the keys.
Make sure you have Docker installed and running locally. Then run
supabase startTo retrieve NEXT_PUBLIC_SUPABASE_ANON_KEY and SUPABASE_SERVICE_ROLE_KEY run:
supabase statusIn a new terminal window, run
pnpm dev- By default your documentation will need to be in .mdxformat. This can be done by renaming existing (or compatible) markdown.mdfile.
- Run pnpm run embeddingsto regenerate embeddings.Note: Make sure supabase is running. To check, run supabase status. If is not running runsupabase start.
- Run pnpm devagain to refresh NextJS localhost:3000 rendered page.
Deploy this starter to Vercel. The Supabase integration will automatically set the required environment variables and configure your Database Schema. All you have to do is set your OPENAI_KEY and you're ready to go!
- Read the blogpost on how we built ChatGPT for the Supabase Docs.
- [Docs] pgvector: Embeddings and vector similarity
- Watch Greg's "How I built this" video on the Rabbit Hole Syndrome YouTube Channel:
