What is RAG and Why Does It Matter
Standard GPT-4o doesn't know about your company's internal docs, your product catalog, or your customer data. RAG fixes this: you store your content as vector embeddings in a database, and at query time, you find the most relevant chunks and send them as context to the LLM.
The result: an AI assistant that answers questions about your specific data, accurately and with citations.
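The mechanics behind "most relevant chunks" are just vector math: text is embedded as a list of numbers, and relevance is the cosine similarity between the query vector and each stored document vector. A minimal sketch in plain Python (the toy 3-dimensional vectors are made up for illustration; real embeddings from OpenAI have 1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=2):
    """Rank (text, vector) pairs by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    return [text for _, text in sorted(scored, reverse=True)[:k]]

# Toy "embeddings" for three documents.
docs = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("api reference", [0.0, 0.1, 0.9]),
]
print(top_k([0.8, 0.2, 0.1], docs, k=1))  # closest to "refund policy"
```

This ranking is exactly what the pgvector index in Step 1 computes at scale, so you never run it in application code; the sketch only shows why nearby vectors mean related text.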
Step 1: Enable pgvector in Supabase
Supabase comes with pgvector built in. In the Supabase SQL editor, run:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
Then create your documents table:

```sql
CREATE TABLE documents (
  id bigint primary key generated always as identity,
  content text,
  embedding vector(1536),
  metadata jsonb,
  created_at timestamptz default now()
);
```
Create an index for fast similarity search:

```sql
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```

Note that ivfflat builds its clusters from the rows already in the table, so create this index after loading your data; on a small table, Postgres can scan every row quickly without one.
Aside from the search function in Step 3, this is all the SQL you need to write.
Step 2: Generate Embeddings with Xano
In Xano, create an endpoint that accepts a text string, calls the OpenAI Embeddings API (text-embedding-3-small), and stores the result in your Supabase documents table.
The function stack: (1) Get text input, (2) Call OpenAI API POST /v1/embeddings, (3) Extract the embedding array from the response, (4) Insert into Supabase via the Supabase API connector.
Run this for every document, article, or FAQ entry you want your AI to know about. Split long documents into chunks of a few hundred words first: retrieval works chunk by chunk, and a single embedding of an entire manual is too diluted to match specific questions.
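In plain Python, the whole endpoint looks roughly like this. (Xano builds these steps visually rather than in code; the `urllib` calls and the key parameters here are a sketch based on OpenAI's and Supabase's public REST APIs, not Xano's actual implementation.)

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"

def build_embedding_request(text):
    """Request body for OpenAI's embeddings endpoint."""
    return {"model": "text-embedding-3-small", "input": text}

def embed_and_store(text, openai_key, supabase_url, supabase_key):
    """Embed `text` with OpenAI, then insert the row via Supabase's REST API."""
    # Step 2: call OpenAI's embeddings endpoint.
    req = urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=json.dumps(build_embedding_request(text)).encode(),
        headers={"Authorization": f"Bearer {openai_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Step 3: extract the embedding array from the response.
        embedding = json.load(resp)["data"][0]["embedding"]

    # Step 4: insert into the documents table through Supabase's REST API.
    row = {"content": text, "embedding": embedding}
    ins = urllib.request.Request(
        f"{supabase_url}/rest/v1/documents",
        data=json.dumps(row).encode(),
        headers={"apikey": supabase_key,
                 "Authorization": f"Bearer {supabase_key}",
                 "Content-Type": "application/json"},
    )
    urllib.request.urlopen(ins)
```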
Step 3: Semantic Search in Xano
Create a second endpoint that: (1) Accepts a user query string, (2) Generates an embedding for the query (same OpenAI call), (3) Runs a vector similarity search in Supabase.
The similarity search lives in a Postgres function that Xano can call through Supabase's RPC interface:

```sql
CREATE FUNCTION match_documents(query_embedding vector(1536), match_count int DEFAULT 5)
RETURNS TABLE (content text, similarity float)
LANGUAGE sql STABLE
AS $$
  SELECT content, 1 - (embedding <=> query_embedding) AS similarity
  FROM documents
  ORDER BY embedding <=> query_embedding
  LIMIT match_count;
$$;
```
This returns the 5 most semantically relevant chunks to the user's question.
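Sketched in Python, step (3) is one HTTP call, assuming the similarity query is wrapped in a Postgres function named `match_documents` exposed over Supabase's `/rest/v1/rpc` endpoint (both names are assumptions; the query embedding comes from the same OpenAI call as Step 2):

```python
import json
import urllib.request

def rpc_payload(query_embedding, match_count=5):
    """RPC body: key names must match the SQL function's parameter names."""
    return {"query_embedding": query_embedding, "match_count": match_count}

def search(query_embedding, supabase_url, supabase_key):
    """Call the match_documents function via Supabase's RPC endpoint."""
    req = urllib.request.Request(
        f"{supabase_url}/rest/v1/rpc/match_documents",
        data=json.dumps(rpc_payload(query_embedding)).encode(),
        headers={"apikey": supabase_key,
                 "Authorization": f"Bearer {supabase_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Rows come back as [{"content": ..., "similarity": ...}, ...]
        return json.load(resp)
```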
Step 4: Generate the Answer with GPT-4o
With the retrieved chunks, call OpenAI's Chat Completions API. The system prompt:
"You are a helpful assistant. Answer the user's question using ONLY the context provided below. If the answer is not in the context, say so.
Context: [INSERT RETRIEVED CHUNKS HERE]"
This grounds the LLM's response in your actual data, which sharply reduces hallucinations.
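As a sketch, the grounding step boils down to string formatting plus one API call (`build_messages` and `answer` are illustrative names invented here; the request and response shapes follow OpenAI's Chat Completions API):

```python
import json
import urllib.request

SYSTEM_TEMPLATE = (
    "You are a helpful assistant. Answer the user's question using ONLY the "
    "context provided below. If the answer is not in the context, say so.\n\n"
    "Context:\n{context}"
)

def build_messages(chunks, question):
    """Fill the retrieved chunks into the system prompt."""
    context = "\n---\n".join(chunks)
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(context=context)},
        {"role": "user", "content": question},
    ]

def answer(chunks, question, openai_key):
    """Send the grounded prompt to GPT-4o and return the reply text."""
    body = {"model": "gpt-4o", "messages": build_messages(chunks, question)}
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {openai_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Separating the chunks with `---` keeps document boundaries visible to the model, which helps it cite the right source.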
Step 5: Build the Chat UI in WeWeb
In WeWeb, create a chat interface with a message list and input field. On send: call your Xano search endpoint, then stream the GPT response using Xano's streaming support or a direct OpenAI call from WeWeb's custom code.
For production, add: message history (store in Supabase), source citations (show which documents were retrieved), and a feedback mechanism (thumbs up/down to improve retrieval quality).