The Architecture
A production AI chatbot has four components:
1. **Chat UI**: Input field, message history display, loading state, error handling
2. **Conversation state**: An array of messages (role + content) stored in frontend state
3. **Backend proxy**: A Supabase Edge Function or Xano endpoint that calls OpenAI
4. **System prompt**: The instruction set that defines your chatbot's persona, knowledge, and constraints
The conversation state is the most important concept. OpenAI's API is stateless — every request must include the full conversation history. Your frontend maintains this history and sends it with each message.
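Concretely, the history is just an ordered array that grows turn by turn — a minimal sketch (the example content is illustrative):

```typescript
// A conversation is an ordered array of role + content pairs.
// Because the API is stateless, every request resends the whole array.
type ChatMessage = { role: "user" | "assistant"; content: string }

const messages: ChatMessage[] = []

// Turn 1: the user asks a question...
messages.push({ role: "user", content: "How do I invite a teammate?" })
// ...and the model's reply is appended once it arrives
messages.push({ role: "assistant", content: "Go to Settings, then Team, then Invite." })

// Turn 2: this request now carries BOTH earlier messages plus the new one —
// that is the only reason the model "remembers" the conversation
messages.push({ role: "user", content: "What about billing?" })
```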
Building the Chat UI in WeWeb
In WeWeb, create a page-level variable `messages` (array, default empty). Add two components:
**Message list**: A Repeating Group bound to `messages`. Each item has a conditional style — user messages right-aligned with a primary colour background, assistant messages left-aligned with a neutral background. Bind the text to `item.content`.
**Input area**: A text input bound to a `userInput` variable, plus a "Send" button. On button click: (1) append `{role: "user", content: userInput}` to `messages`, (2) clear `userInput`, (3) call the API action, (4) append the response as `{role: "assistant", content: response}`.
Add a loading spinner that shows while the API call is in progress.
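The four-step send workflow (plus the loading flag) can be expressed as plain logic. This is a sketch of what the WeWeb workflow does, not WeWeb code; `callApi` is a hypothetical placeholder for whatever API action you configure:

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string }

// Mirrors the button-click workflow: append the user message, clear the
// input, call the backend, append the assistant reply. `callApi` stands in
// for the configured API action (an assumed placeholder).
async function sendMessage(
  state: { messages: ChatMessage[]; userInput: string; loading: boolean },
  callApi: (messages: ChatMessage[]) => Promise<string>
): Promise<void> {
  const text = state.userInput.trim()
  if (!text) return                                    // ignore empty sends

  state.messages.push({ role: "user", content: text }) // (1) append user message
  state.userInput = ""                                 // (2) clear the input
  state.loading = true                                 // show the spinner
  try {
    const reply = await callApi(state.messages)        // (3) call the API action
    state.messages.push({ role: "assistant", content: reply }) // (4) append reply
  } finally {
    state.loading = false                              // hide the spinner either way
  }
}
```

The `finally` block matters: if the API call throws, the spinner still goes away and the UI can surface an error instead of hanging.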
The Supabase Edge Function
Your Edge Function receives the message array and system prompt, calls OpenAI, and returns the response:
```typescript
import { serve } from "https://deno.land/std@0.168.0/http/server.ts"
import OpenAI from "https://esm.sh/openai@4"

// The API key lives in the function's environment — it never reaches the frontend
const openai = new OpenAI({ apiKey: Deno.env.get("OPENAI_API_KEY") })

serve(async (req) => {
  const { messages, systemPrompt } = await req.json()

  // Prepend the system prompt, then replay the full conversation history
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: systemPrompt },
      ...messages
    ],
    max_tokens: 500,
    temperature: 0.7
  })

  return new Response(
    JSON.stringify({ content: completion.choices[0].message.content }),
    { headers: { "Content-Type": "application/json" } }
  )
})
```
The `systemPrompt` can be passed from the frontend (useful for multi-persona apps) or hardcoded in the function (more secure).
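From the frontend, calling the function is a single POST. A hedged sketch — the project URL and anon key are placeholders, and `fetchFn` is injectable purely so the wrapper can be exercised without a live network:

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string }

// Calls the Edge Function with the full message history. This variant assumes
// the system prompt is hardcoded server-side, so only `messages` is sent.
async function askChatbot(
  messages: ChatMessage[],
  fetchFn: typeof fetch = fetch
): Promise<string> {
  const res = await fetchFn("https://YOUR-PROJECT.supabase.co/functions/v1/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer YOUR_SUPABASE_ANON_KEY",
    },
    body: JSON.stringify({ messages }),
  })
  if (!res.ok) throw new Error(`Chat function failed: ${res.status}`)
  const { content } = await res.json()
  return content
}
```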
Writing an Effective System Prompt
The system prompt determines everything about your chatbot's behaviour. A good production system prompt includes:
- **Role**: "You are a customer support agent for Acme SaaS, a project management tool."
- **Knowledge**: "You help users with: creating projects, inviting team members, setting up integrations, and billing questions."
- **Constraints**: "Only answer questions about Acme SaaS. For unrelated questions, politely redirect. Never discuss competitor products. Never make up features that don't exist."
- **Format**: "Keep responses under 100 words. Use bullet points for steps. Always end support answers with: 'Let me know if this helps!'"
- **Escalation**: "If the user expresses frustration or mentions a billing error, say: 'I'll connect you with our team' and trigger the escalation flow."
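Keeping the five parts as separate fields and assembling them at runtime makes each one easy to edit and test in isolation. A minimal sketch using the Acme SaaS wording from the examples above:

```typescript
// Assembles the five sections into one system prompt, separated by blank lines
function buildSystemPrompt(parts: {
  role: string
  knowledge: string
  constraints: string
  format: string
  escalation: string
}): string {
  return [parts.role, parts.knowledge, parts.constraints, parts.format, parts.escalation].join("\n\n")
}

const systemPrompt = buildSystemPrompt({
  role: "You are a customer support agent for Acme SaaS, a project management tool.",
  knowledge:
    "You help users with: creating projects, inviting team members, setting up integrations, and billing questions.",
  constraints:
    "Only answer questions about Acme SaaS. For unrelated questions, politely redirect. Never discuss competitor products. Never make up features that don't exist.",
  format:
    "Keep responses under 100 words. Use bullet points for steps. Always end support answers with: 'Let me know if this helps!'",
  escalation:
    "If the user expresses frustration or mentions a billing error, say: 'I'll connect you with our team' and trigger the escalation flow.",
})
```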
Adding Persistent Context
The basic chatbot forgets everything when the page refreshes. To make it smarter:
**User context injection**: When the chatbot session starts, fetch the user's account data (plan, usage, recent activity) and append it to the system prompt: "The user's current plan is Pro. Their last activity was 3 days ago. They have 2 active projects."
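Injection is just string concatenation onto the base prompt. The shape of the `ctx` object below is an assumption for illustration — use whatever account fields your backend exposes:

```typescript
// Appends per-user account context to the base system prompt at session start.
// The `ctx` fields (plan, activity, projects) are illustrative assumptions.
function injectUserContext(
  basePrompt: string,
  ctx: { plan: string; daysSinceActivity: number; activeProjects: number }
): string {
  return (
    basePrompt +
    `\n\nUser context: the user's current plan is ${ctx.plan}. ` +
    `Their last activity was ${ctx.daysSinceActivity} days ago. ` +
    `They have ${ctx.activeProjects} active projects.`
  )
}
```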
**Conversation persistence**: Save messages to a Supabase table (chatbot_sessions) with user_id and session_id. On page load, fetch the last N messages and pre-populate the messages array.
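A sketch of the persistence layer, assuming supabase-js and a `chatbot_sessions` table with `role`, `content`, and `created_at` columns (column names beyond `user_id`/`session_id` are assumptions):

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string }

// Maps a chat message to a chatbot_sessions row. user_id and session_id come
// from the design above; the remaining columns are assumed.
function toRow(userId: string, sessionId: string, msg: ChatMessage) {
  return { user_id: userId, session_id: sessionId, role: msg.role, content: msg.content }
}

// Sketch of the supabase-js calls (requires a configured client):
//
//   await supabase.from("chatbot_sessions").insert(toRow(userId, sessionId, msg))
//
//   const { data } = await supabase
//     .from("chatbot_sessions")
//     .select("role, content")
//     .eq("session_id", sessionId)
//     .order("created_at", { ascending: false })
//     .limit(20)
//   const restored = (data ?? []).reverse()  // oldest first for display
```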
**Knowledge base**: For product documentation, store articles in Supabase with embeddings (using pgvector). Before calling GPT-4o, run a similarity search and inject the most relevant articles into the system prompt. This is called RAG (Retrieval Augmented Generation) and dramatically improves answer accuracy.
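Once the similarity search returns the top articles, folding them into the system prompt is straightforward. A sketch — the retrieval call in the comment assumes a Supabase RPC wrapping a pgvector query, and `match_articles` is a hypothetical function name:

```typescript
type Article = { title: string; body: string }

// Appends retrieved documentation to the system prompt so the model answers
// from your docs instead of guessing
function withRetrievedContext(basePrompt: string, articles: Article[]): string {
  if (articles.length === 0) return basePrompt
  const context = articles.map((a) => `## ${a.title}\n${a.body}`).join("\n\n")
  return `${basePrompt}\n\nAnswer using this documentation when relevant:\n\n${context}`
}

// Retrieval sketch (assumed RPC name, embedding computed beforehand):
//   const { data: articles } = await supabase.rpc("match_articles", {
//     query_embedding: embedding,
//     match_count: 3,
//   })
```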
Costs and Performance in Production
For a SaaS with 500 active users each sending 10 messages/day:
- Average message: 50 tokens input + 100 tokens output
- GPT-4o pricing: $2.50/M input tokens, $10/M output tokens
- Daily volume: 500 × 10 = 5,000 messages → 250,000 input tokens + 500,000 output tokens
- Daily cost: (0.25M × $2.50) + (0.5M × $10) ≈ $5.63/day, or roughly $169/month

Treat this as a floor: because the API is stateless and each request resends the conversation history, real input token counts grow with conversation length.
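The arithmetic is easy to check in a few lines, using the per-million-token prices listed above:

```typescript
// Estimates daily spend from message volume and per-million-token prices
function dailyCostUSD(
  users: number,
  msgsPerUser: number,
  inTokensPerMsg: number,
  outTokensPerMsg: number,
  inPricePerM: number,
  outPricePerM: number
): number {
  const msgs = users * msgsPerUser
  const inCost = ((msgs * inTokensPerMsg) / 1_000_000) * inPricePerM
  const outCost = ((msgs * outTokensPerMsg) / 1_000_000) * outPricePerM
  return inCost + outCost
}

// 500 users × 10 msgs = 5,000 messages/day:
// 250k input tokens ($0.625) + 500k output tokens ($5.00)
const daily = dailyCostUSD(500, 10, 50, 100, 2.5, 10) // ≈ $5.63/day
```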
To manage costs: implement a session token budget (stop adding history messages when the conversation exceeds 2,000 tokens, start summarising old messages). Use GPT-4o mini for simple queries ($0.15/M input) and reserve GPT-4o for complex ones.
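A minimal sketch of the token budget, dropping the oldest messages first. The 4-characters-per-token ratio is a rough heuristic for illustration; a production app would use a real tokenizer (e.g. tiktoken) and summarise the dropped messages rather than discard them:

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string }

// Rough token estimate: ~4 characters per token (heuristic, not exact)
function estimateTokens(msg: ChatMessage): number {
  return Math.ceil(msg.content.length / 4)
}

// Keeps the most recent messages that fit within the budget,
// walking backwards from the newest and preserving original order
function trimToBudget(messages: ChatMessage[], budget = 2000): ChatMessage[] {
  const kept: ChatMessage[] = []
  let total = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    const t = estimateTokens(messages[i])
    if (total + t > budget) break
    kept.unshift(messages[i])
    total += t
  }
  return kept
}
```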
Response time: GPT-4o returns in 1–3 seconds. Add a typing indicator to set expectations. For sub-second UX, implement streaming.