A minimal end-to-end Retrieval-Augmented Generation (RAG) system built with Next.js, Supabase pgvector, and Google Gemini. Users can ask natural-language questions about NovaTech's recruiting FAQ and receive grounded, cited answers.
User Query
│
▼
┌─────────────────────────┐
│ Query Expansion │ gemini-2.5-flash → translate intent to English
│ (gemini-2.5-flash) │ + 3 alternative English phrasings
└────────────┬────────────┘
│ 4 English queries (1 translation + 3 expansions; original-language query dropped)
▼
┌─────────────────────────┐
│ Embedding │ gemini-embedding-2-preview (MRL → 768 dims)
│ (per query) │
└────────────┬────────────┘
│ 4 × vector(768)
▼
┌─────────────────────────┐
│ Vector Search │ Supabase match_documents RPC
│ (Supabase pgvector) │ HNSW cosine index, threshold 0.5, top-5 per query
└────────────┬────────────┘
│ up to 20 raw results
▼
┌─────────────────────────┐
│ Deduplication │ Keep highest-similarity result per document ID
│ & Re-ranking │ Sort desc → top 5
└────────────┬────────────┘
│ ≤5 source documents
▼
┌─────────────────────────┐
│ Answer Generation │ gemini-2.5-flash with system prompt + context
│ (gemini-2.5-flash) │ + conversation history
└────────────┬────────────┘
│
▼
Answer + Sources + Expanded Queries
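The five stages above can be sketched as a single orchestration function. This is an illustrative sketch, not the actual route handler: stage implementations are injected as parameters, and all type and parameter names are hypothetical.

```typescript
type Doc = { id: string; content: string; similarity: number };

// Hypothetical end-to-end pipeline: expansion → embedding → search → dedup/re-rank → generation.
async function answerQuery(
  question: string,
  expand: (q: string) => Promise<string[]>,             // Stage 1: 4 English queries
  embed: (q: string) => Promise<number[]>,              // Stage 2: one vector per query
  search: (v: number[]) => Promise<Doc[]>,              // Stage 3: top-5 hits per vector
  generate: (q: string, ctx: Doc[]) => Promise<string>, // Stage 5: grounded answer
): Promise<{ answer: string; sources: Doc[] }> {
  const queries = await expand(question);
  const vectors = await Promise.all(queries.map(embed));
  const raw = (await Promise.all(vectors.map(search))).flat(); // up to 20 raw results

  // Stage 4: keep the highest-similarity hit per document ID, sort desc, top 5.
  const best = new Map<string, Doc>();
  for (const d of raw) {
    const prev = best.get(d.id);
    if (!prev || d.similarity > prev.similarity) best.set(d.id, d);
  }
  const sources = [...best.values()]
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 5);

  const answer = await generate(question, sources);
  return { answer, sources };
}
```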
- Node.js 18+ (via fnm or nvm)
- A Supabase project with pgvector enabled
- A Google AI Studio API key
1. Install dependencies
npm install
2. Configure environment variables
cp .env.example .env.local
Edit `.env.local` with your credentials:
NEXT_PUBLIC_SUPABASE_URL=https://your-project-ref.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key-here
GOOGLE_AI_API_KEY=your-google-ai-api-key-here
3. Apply the database schema
Open the Supabase SQL Editor and run the contents of supabase/schema.sql. This creates:
- `documents` table with a `vector(768)` embedding column
- HNSW cosine index for fast similarity search
- `match_documents` RPC function
Verify with:
select * from documents limit 1;
select routine_name from information_schema.routines where routine_name = 'match_documents';
4. Ingest the knowledge base
Start the dev server, then trigger ingestion:
npm run dev
# In another terminal:
curl -X POST http://localhost:3000/api/ingest
# Expected: {"success":true,"count":15}
Verify in Supabase: `select count(*) from documents;` → 15
After ingestion, run ANALYZE documents; in the SQL Editor so the query planner has accurate statistics.
5. Start chatting
Open http://localhost:3000 and ask questions about NovaTech's hiring process, benefits, culture, or tech stack.
├── app/
│ ├── api/
│ │ ├── ingest/route.ts # POST — embeds knowledge base into Supabase
│ │ └── chat/route.ts # POST — query expansion + retrieval + generation
│ ├── globals.css # TailwindCSS v4 imports
│ ├── layout.tsx # Root layout
│ └── page.tsx # Chat UI (client component)
├── data/
│ └── knowledge-base.json # 15 NovaTech FAQ entries (static source of truth)
├── lib/
│ ├── genai.ts # GoogleGenAI client factory
│ ├── supabase.ts # SupabaseClient factory
│ └── types.ts # Shared TypeScript interfaces
├── supabase/
│ └── schema.sql # pgvector schema — apply manually in Supabase
└── .env.example # Environment variable template
gemini-embedding-2-preview supports Matryoshka Representation Learning (MRL), allowing output dimensionality to be reduced from 3072 to 768 via outputDimensionality: 768. For a 15-document corpus this cuts storage and index size to a quarter with negligible quality loss — cosine similarity is well-preserved at 768 dims for FAQ-style content.
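Conceptually, MRL truncation keeps the leading dimensions of the full vector and re-normalizes. The API does this server-side when outputDimensionality is set; the sketch below only illustrates the idea (the helper name is hypothetical):

```typescript
// Illustrative Matryoshka-style truncation: keep the first k dimensions
// and re-normalize so cosine similarity stays well-defined at unit length.
function truncateEmbedding(vec: number[], k: number): number[] {
  const head = vec.slice(0, k);
  const norm = Math.hypot(...head);
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

With MRL-trained models, similarity rankings computed on the truncated vectors stay close to those of the full 3072-dim vectors, which is what makes the 768-dim column a safe trade for FAQ-style content.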
HNSW (Hierarchical Navigable Small World) was chosen over IVFFlat because it requires no pre-built centroid lists and performs well on empty or small tables. IVFFlat requires lists to be tuned relative to row count and needs warm-up (ANALYZE) before queries are efficient. HNSW is the better default for a prototype that starts empty. For corpora >100k rows, IVFFlat becomes more memory-efficient.
A single query can miss relevant documents if phrasing doesn't align with how content was written. Expanding to 4 phrasings (original + 3 LLM alternatives) and deduplicating by best similarity improves recall by approximately 30–50% on paraphrased queries against small corpora, at the cost of 3 extra embedding calls per request.
Parsing PDFs or arbitrary uploads adds significant complexity (chunking strategies, format handling) that is out of scope for a 2–4 hour prototype. A static knowledge-base.json provides a clean, inspectable source of truth. The architecture is nevertheless deliberately future-proofed: because gemini-embedding-2-preview is a natively multimodal embedding model, the same pipeline could in principle ingest raw PDFs, employee handbooks, or architecture diagrams, mapping text and visual documents into the same unified 768-dimensional vector space without a separate, brittle OCR extraction step.
Supabase JS .upsert() only accepts real column names in onConflict, not JSON path expressions. An expression-based unique index on (metadata->>'id') cannot be referenced. The solution is a dedicated source_id text unique column, allowing clean onConflict: 'source_id' upserts. This also makes the unique constraint explicit and queryable.
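A sketch of the resulting upsert shape (the row-building helper, the `KbEntry` type, and the exact column names besides `source_id` are assumptions for illustration):

```typescript
type KbEntry = { id: string; question: string; answer: string };

// Build rows keyed by a dedicated source_id column — a real unique column
// that .upsert() can reference in onConflict, unlike metadata->>'id'.
function toUpsertRows(entries: KbEntry[], embeddings: number[][]) {
  return entries.map((e, i) => ({
    source_id: e.id,                      // explicit unique constraint target
    content: `${e.question}\n${e.answer}`,
    metadata: { id: e.id },               // JSON metadata kept alongside
    embedding: embeddings[i],             // vector(768)
  }));
}

// Usage sketch with supabase-js:
// await supabase.from('documents').upsert(rows, { onConflict: 'source_id' });
```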
Both embedding calls use task-type hints: taskType: 'RETRIEVAL_DOCUMENT' during ingestion and taskType: 'RETRIEVAL_QUERY' at query time. These hints allow the model to optimize the embedding geometry for asymmetric retrieval, improving match quality between stored documents and incoming queries.
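The asymmetric split can be captured in one small helper (a sketch; the helper name is hypothetical, while the taskType values and outputDimensionality key follow the description above):

```typescript
// Select the embedding config for the two sides of asymmetric retrieval:
// documents at ingestion time, queries at request time.
function embedConfig(side: 'document' | 'query') {
  return {
    taskType: side === 'document' ? 'RETRIEVAL_DOCUMENT' : 'RETRIEVAL_QUERY',
    outputDimensionality: 768, // MRL truncation, matches the vector(768) column
  };
}
```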
The knowledge base is authored in English, but the assistant responds in German. Rather than relying purely on the multilingual nature of the embeddings, the query expansion stage (Stage 2) explicitly translates the user's intent into English and generates 3 English alternative phrasings before any embedding is computed. This ensures the entire retrieval path is English-to-English, maximising cosine similarity precision against the English vector database. The translation boundary is contained within Stage 2; the answer generation stage then synthesises the retrieved English context into a German response for the user.
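A minimal sketch of what the Stage-2 instruction could look like (the actual prompt in the repo may differ; this is purely illustrative):

```typescript
// Hypothetical Stage-2 expansion prompt: translate first, then paraphrase,
// so the entire retrieval path stays English-to-English.
function expansionPrompt(userQuery: string): string {
  return [
    'Translate the user question into English.',
    'Then write 3 alternative English phrasings that preserve the intent.',
    'Return exactly 4 lines: the translation first, then the 3 alternatives.',
    `User question: ${userQuery}`,
  ].join('\n');
}
```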
The Supabase anon key is used in server-side API routes (not exposed to the browser). RLS is not enforced on the documents table in this prototype since the data is public FAQ content. In production, either a service role key (server-only, never in NEXT_PUBLIC_*) or row-level security policies should gate write access to the documents table.
- No streaming: Answers are returned as a single JSON payload. Streaming via `ReadableStream` would improve perceived latency for longer answers.
- No persistent chat history: Conversation history is stored in React state only and is lost on page refresh.
- Unauthenticated ingest endpoint: `/api/ingest` has no auth guard. In production this should require an API key or admin session.
- Sequential embedding calls: The ingest route embeds entries one-by-one to avoid rate limits. A batched approach with retry/backoff would be faster at scale.
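The batched approach with retry/backoff mentioned above could look like the following sketch (embedBatch is an injected stand-in for the real embedding call; all names and defaults are hypothetical):

```typescript
// Hypothetical batched embedding with exponential backoff on failures
// (e.g. 429 rate limits). embedBatch maps a batch of texts to vectors.
async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 10,
  maxRetries = 3,
): Promise<number[][]> {
  const out: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    for (let attempt = 0; ; attempt++) {
      try {
        out.push(...(await embedBatch(batch)));
        break; // batch succeeded, move to the next one
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        // exponential backoff: 500ms, 1s, 2s, ...
        await new Promise((r) => setTimeout(r, 2 ** attempt * 500));
      }
    }
  }
  return out;
}
```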