CommiAI/llm-wiki-poc

LLM Wiki POC

Multimodal RAG agent using Gemini embeddings, ChromaDB, and BAML.

Reference idea

Based on Karpathy's LLM wiki sketch.

Prerequisites

  • Docker

Run

cp .env.example .env
# edit .env: set GEMINI_API_KEY (required), BOUNDARY_API_KEY (optional, for tracing)

docker compose up --build

The app is served at http://localhost:8000. Ingested files persist to ./data; the Chroma index persists to a named Docker volume.

User guide

  1. Open http://localhost:8000
  2. Upload documents in the Ingest tab (multimodal: PDF, text, markdown, image, audio, video)
  3. Chat with the agent in the Threads tab — it'll search, read, write, and edit wiki pages grounded in your uploads

Architecture

System overview

[Diagram: system overview]

Ingestion pipeline — how uploaded files are chunked, embedded, and stored per user.

[Diagram: ingestion pipeline]
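The chunk, embed, and store-per-user flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `chunk_text`, `fake_embed`, and `UserStore` are hypothetical names, the hash-based embedder stands in for Gemini embeddings, and the in-memory dictionary stands in for ChromaDB. The chunk size and overlap are arbitrary assumptions.

```python
import hashlib
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    index: int
    text: str
    embedding: list


def chunk_text(text, size=200, overlap=40):
    """Split text into fixed-size character windows that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def fake_embed(text, dims=8):
    """Hypothetical stand-in for an embedding model: a deterministic unit vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    raw = [b - 127.5 for b in digest[:dims]]  # never zero, so the norm is > 0
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


class UserStore:
    """Hypothetical in-memory stand-in for a vector store, partitioned by user."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def ingest(self, user_id, doc_id, text):
        pieces = chunk_text(text)
        for i, piece in enumerate(pieces):
            self._by_user[user_id].append(Chunk(doc_id, i, piece, fake_embed(piece)))
        return len(pieces)

    def chunks_for(self, user_id):
        # Only this user's partition is ever returned.
        return list(self._by_user.get(user_id, []))
```

Keying every write and read by `user_id` is what keeps one user's uploads out of another user's search results; a real vector store would typically do the same with a per-user collection or a metadata filter.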

Agent tools — how the agent loop decides between semantic_search, read, write, and edit, and how each call is sandboxed to the user.

[Diagram: agent tools]
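One way to picture per-user sandboxing of the file tools is a path check that every call passes through before it touches disk. The sketch below is an assumption about how such a check could look, not the code in this repository: `resolve_in_sandbox`, `SandboxError`, `dispatch`, and the one-directory-per-user layout are all hypothetical, and `semantic_search` is left out because it depends on the vector store.

```python
from pathlib import Path


class SandboxError(Exception):
    """Raised when a tool call tries to leave the caller's directory."""


def resolve_in_sandbox(root, user_id, relative):
    """Map a tool-supplied path to a real path inside root/<user_id>."""
    base = Path(root).resolve()
    user_root = (base / user_id).resolve()
    if user_root.parent != base:
        raise SandboxError(f"invalid user id: {user_id!r}")
    target = (user_root / relative).resolve()
    # Rejects "../" traversal and absolute paths alike (Python 3.9+).
    if not target.is_relative_to(user_root):
        raise SandboxError(f"path escapes sandbox: {relative!r}")
    return target


def tool_write(root, user_id, path, content):
    target = resolve_in_sandbox(root, user_id, path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def tool_read(root, user_id, path):
    return resolve_in_sandbox(root, user_id, path).read_text(encoding="utf-8")


def tool_edit(root, user_id, path, old, new):
    target = resolve_in_sandbox(root, user_id, path)
    text = target.read_text(encoding="utf-8")
    if text.count(old) != 1:
        raise ValueError("edit needs exactly one match of the old text")
    target.write_text(text.replace(old, new), encoding="utf-8")
    return f"edited {path}"


TOOLS = {"read": tool_read, "write": tool_write, "edit": tool_edit}


def dispatch(root, user_id, name, **args):
    """Route a tool call chosen by the agent; user_id comes from the session, not the model."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](root, user_id, **args)
```

The design choice worth noting is that `user_id` is supplied by the server-side session rather than by the model's tool arguments, so a prompt cannot talk the agent into reading another user's pages.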

System prompt

The main system prompt lives in baml_src/agent.baml.

Demo assets

Test assets for trying out the ingest pipeline can be found in docs/assets/. They include a PDF, markdown file, image, audio clip, and video clip.

Future enhancements

  • Multimodal input in chat: let users paste images, audio, and video directly into a turn, not just during ingest.
  • Conversation-level evals: build an eval harness that scores full threads (not just single turns), so that changes to the agent prompt can be measured rather than judged by feel.
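As a rough illustration of what a conversation-level eval could look like, the sketch below scores whole threads against a set of named checks and reports a pass rate. Nothing here exists in the repository: `Turn`, `score_thread`, `pass_rate`, and the two example checks are hypothetical, and real checks would likely be richer (for example, verifying that an answer cites an uploaded document).

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "agent"
    text: str


def no_empty_agent_turns(thread):
    """Check over the full thread: the agent never replies with blank text."""
    return all(t.text.strip() for t in thread if t.role == "agent")


def ends_with_agent_turn(thread):
    """Check over the full thread: the conversation does not end unanswered."""
    return bool(thread) and thread[-1].role == "agent"


def score_thread(thread, checks):
    """Run every named check against one complete thread."""
    return {name: check(thread) for name, check in checks.items()}


def pass_rate(threads, checks):
    """Fraction of threads that pass all checks; 0.0 for an empty set."""
    results = [all(score_thread(t, checks).values()) for t in threads]
    return sum(results) / len(results) if results else 0.0


CHECKS = {
    "no_empty_agent_turns": no_empty_agent_turns,
    "ends_with_agent_turn": ends_with_agent_turn,
}
```

Comparing `pass_rate` across two prompt versions on the same recorded threads would give a single number to iterate against, which is the kind of signal single-turn checks cannot provide.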
