A minimal end-to-end Retrieval-Augmented Generation (RAG) system built with Next.js, Supabase pgvector, and Google Gemini. Users can ask natural-language questions about NovaTech's recruiting FAQ and receive grounded, cited answers.
User Query
│
▼
┌─────────────────────────┐
│ Query Expansion │ gemini-2.5-flash → translate intent to English
│ (gemini-2.5-flash) │ + 3 alternative English phrasings
└────────────┬────────────┘
│ 4 English queries (1 translation + 3 expansions; original-language query dropped)
▼
┌─────────────────────────┐
│ Embedding │ gemini-embedding-2-preview (MRL → 768 dims)
│ (per query) │
└────────────┬────────────┘
│ 4 × vector(768)
▼
┌─────────────────────────┐
│ Vector Search │ Supabase match_documents RPC
│ (Supabase pgvector) │ HNSW cosine index, threshold 0.5, top-5 per query
└────────────┬────────────┘
│ up to 20 raw results
▼
┌─────────────────────────┐
│ Deduplication │ Keep highest-similarity result per document ID
│ & Re-ranking │ Sort desc → top 5
└────────────┬────────────┘
│ ≤5 source documents
▼
┌─────────────────────────┐
│ Answer Generation │ gemini-2.5-flash with system prompt + context
│ (gemini-2.5-flash) │ + conversation history
└────────────┬────────────┘
│
▼
Answer + Sources + Expanded Queries
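The five stages above can be sketched as a single orchestration function. This is an illustrative sketch, not the actual route handler: stage implementations are injected as parameters, and all type and parameter names are hypothetical.

```typescript
type Doc = { id: string; content: string; similarity: number };

// Hypothetical end-to-end pipeline: expansion → embedding → search → dedup/re-rank → generation.
async function answerQuery(
  question: string,
  expand: (q: string) => Promise<string[]>,             // Stage 1: 4 English queries
  embed: (q: string) => Promise<number[]>,              // Stage 2: one vector per query
  search: (v: number[]) => Promise<Doc[]>,              // Stage 3: top-5 hits per vector
  generate: (q: string, ctx: Doc[]) => Promise<string>, // Stage 5: grounded answer
): Promise<{ answer: string; sources: Doc[] }> {
  const queries = await expand(question);
  const vectors = await Promise.all(queries.map(embed));
  const raw = (await Promise.all(vectors.map(search))).flat(); // up to 20 raw results

  // Stage 4: keep the highest-similarity hit per document ID, sort desc, top 5.
  const best = new Map<string, Doc>();
  for (const d of raw) {
    const prev = best.get(d.id);
    if (!prev || d.similarity > prev.similarity) best.set(d.id, d);
  }
  const sources = [...best.values()]
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 5);

  const answer = await generate(question, sources);
  return { answer, sources };
}
```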
- Node.js 18+ (via fnm or nvm)
- A Supabase project with pgvector enabled
- A Google AI Studio API key
1. Install dependencies
npm install
2. Configure environment variables
cp .env.example .env.local
Edit `.env.local` with your credentials:
NEXT_PUBLIC_SUPABASE_URL=https://your-project-ref.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key-here
GOOGLE_AI_API_KEY=your-google-ai-api-key-here
3. Apply the database schema
Open the Supabase SQL Editor and run the contents of supabase/schema.sql. This creates:
- `documents` table with a `vector(768)` embedding column
- HNSW cosine index for fast similarity search
- `match_documents` RPC function
Verify with:
select * from documents limit 1;
select routine_name from information_schema.routines where routine_name = 'match_documents';
4. Ingest the knowledge base
Start the dev server, then trigger ingestion:
npm run dev
# In another terminal:
curl -X POST http://localhost:3000/api/ingest
# Expected: {"success":true,"count":15}
Verify in Supabase: `select count(*) from documents;` → 15
After ingestion, run ANALYZE documents; in the SQL Editor so the query planner has accurate statistics.
5. Start chatting
Open http://localhost:3000 and ask questions about NovaTech's hiring process, benefits, culture, or tech stack.
├── app/
│ ├── api/
│ │ ├── ingest/route.ts # POST — embeds knowledge base into Supabase
│ │ └── chat/route.ts # POST — query expansion + retrieval + generation
│ ├── globals.css # TailwindCSS v4 imports
│ ├── layout.tsx # Root layout
│ └── page.tsx # Chat UI (client component)
├── data/
│ └── knowledge-base.json # 15 NovaTech FAQ entries (static source of truth)
├── lib/
│ ├── genai.ts # GoogleGenAI client factory
│ ├── supabase.ts # SupabaseClient factory
│ └── types.ts # Shared TypeScript interfaces
├── supabase/
│ └── schema.sql # pgvector schema — apply manually in Supabase
└── .env.example # Environment variable template
gemini-embedding-2-preview supports Matryoshka Representation Learning (MRL), allowing output dimensionality to be reduced from 3072 to 768 via outputDimensionality: 768. For a 15-document corpus this cuts storage and index size to a quarter with negligible quality loss — cosine similarity is well-preserved at 768 dims for FAQ-style content.
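Conceptually, MRL truncation keeps the leading dimensions of the full vector and re-normalizes. The API does this server-side when outputDimensionality is set; the sketch below only illustrates the idea (the helper name is hypothetical):

```typescript
// Illustrative Matryoshka-style truncation: keep the first k dimensions
// and re-normalize so cosine similarity stays well-defined at unit length.
function truncateEmbedding(vec: number[], k: number): number[] {
  const head = vec.slice(0, k);
  const norm = Math.hypot(...head);
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

With MRL-trained models, similarity rankings computed on the truncated vectors stay close to those of the full 3072-dim vectors, which is what makes the 768-dim column a safe trade for FAQ-style content.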
HNSW (Hierarchical Navigable Small World) was chosen over IVFFlat because it requires no pre-built centroid lists and performs well on empty or small tables. IVFFlat requires lists to be tuned relative to row count and needs warm-up (ANALYZE) before queries are efficient. HNSW is the better default for a prototype that starts empty. For corpora >100k rows, IVFFlat becomes more memory-efficient.
A single query can miss relevant documents if phrasing doesn't align with how content was written. Expanding to 4 phrasings (original + 3 LLM alternatives) and deduplicating by best similarity improves recall by approximately 30–50% on paraphrased queries against small corpora, at the cost of 3 extra embedding calls per request.
Parsing PDFs or arbitrary uploads adds significant complexity (chunking strategies, format handling) that is out of scope for a 2–4 hour prototype. A static knowledge-base.json provides a clean, inspectable source of truth. The architecture is nevertheless deliberately future-proofed: because gemini-embedding-2-preview is a natively multimodal embedding model, the same pipeline could in principle ingest raw PDFs, employee handbooks, or architecture diagrams, mapping text and visual documents into the same unified 768-dimensional vector space without a separate, brittle OCR extraction step.
Supabase JS .upsert() only accepts real column names in onConflict, not JSON path expressions. An expression-based unique index on (metadata->>'id') cannot be referenced. The solution is a dedicated source_id text unique column, allowing clean onConflict: 'source_id' upserts. This also makes the unique constraint explicit and queryable.
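A sketch of the resulting upsert shape (the row-building helper, the `KbEntry` type, and the exact column names besides `source_id` are assumptions for illustration):

```typescript
type KbEntry = { id: string; question: string; answer: string };

// Build rows keyed by a dedicated source_id column — a real unique column
// that .upsert() can reference in onConflict, unlike metadata->>'id'.
function toUpsertRows(entries: KbEntry[], embeddings: number[][]) {
  return entries.map((e, i) => ({
    source_id: e.id,                      // explicit unique constraint target
    content: `${e.question}\n${e.answer}`,
    metadata: { id: e.id },               // JSON metadata kept alongside
    embedding: embeddings[i],             // vector(768)
  }));
}

// Usage sketch with supabase-js:
// await supabase.from('documents').upsert(rows, { onConflict: 'source_id' });
```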
Both embedding calls use task-type hints: taskType: 'RETRIEVAL_DOCUMENT' during ingestion and taskType: 'RETRIEVAL_QUERY' at query time. These hints allow the model to optimize the embedding geometry for asymmetric retrieval, improving match quality between stored documents and incoming queries.
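The asymmetric split can be captured in one small helper (a sketch; the helper name is hypothetical, while the taskType values and outputDimensionality key follow the description above):

```typescript
// Select the embedding config for the two sides of asymmetric retrieval:
// documents at ingestion time, queries at request time.
function embedConfig(side: 'document' | 'query') {
  return {
    taskType: side === 'document' ? 'RETRIEVAL_DOCUMENT' : 'RETRIEVAL_QUERY',
    outputDimensionality: 768, // MRL truncation, matches the vector(768) column
  };
}
```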
The knowledge base is authored in English, but the assistant responds in German. Rather than relying purely on the multilingual nature of the embeddings, the query expansion stage (Stage 2) explicitly translates the user's intent into English and generates 3 English alternative phrasings before any embedding is computed. This ensures the entire retrieval path is English-to-English, maximising cosine similarity precision against the English vector database. The translation boundary is contained within Stage 2; the answer generation stage then synthesises the retrieved English context into a German response for the user.
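A minimal sketch of what the Stage-2 instruction could look like (the actual prompt in the repo may differ; this is purely illustrative):

```typescript
// Hypothetical Stage-2 expansion prompt: translate first, then paraphrase,
// so the entire retrieval path stays English-to-English.
function expansionPrompt(userQuery: string): string {
  return [
    'Translate the user question into English.',
    'Then write 3 alternative English phrasings that preserve the intent.',
    'Return exactly 4 lines: the translation first, then the 3 alternatives.',
    `User question: ${userQuery}`,
  ].join('\n');
}
```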
The Supabase anon key is used in server-side API routes (not exposed to the browser). RLS is not enforced on the documents table in this prototype since the data is public FAQ content. In production, either a service role key (server-only, never in NEXT_PUBLIC_*) or row-level security policies should gate write access to the documents table.
- No streaming: Answers are returned as a single JSON payload. Streaming via `ReadableStream` would improve perceived latency for longer answers.
- No persistent chat history: Conversation history is stored in React state only and is lost on page refresh.
- Unauthenticated ingest endpoint: `/api/ingest` has no auth guard. In production this should require an API key or admin session.
- Sequential embedding calls: The ingest route embeds entries one-by-one to avoid rate limits. A batched approach with retry/backoff would be faster at scale.
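The batched approach with retry/backoff mentioned above could look like the following sketch (embedBatch is an injected stand-in for the real embedding call; all names and defaults are hypothetical):

```typescript
// Hypothetical batched embedding with exponential backoff on failures
// (e.g. 429 rate limits). embedBatch maps a batch of texts to vectors.
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 10,
  maxRetries = 3,
): Promise<number[][]> {
  const out: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    for (let attempt = 0; ; attempt++) {
      try {
        out.push(...(await embedBatch(batch)));
        break; // batch succeeded, move to the next one
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        // exponential backoff: 500ms, 1s, 2s, ...
        await new Promise((r) => setTimeout(r, 2 ** attempt * 500));
      }
    }
  }
  return out;
}
```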