RASS (Retrieval-Augmented Semantic Search)

CoRAG is a multi-service RAG platform for document ingestion, retrieval, and grounded chat. The current codebase is centered around three backend services plus a React frontend:

mcp-server: auth, API gateway, chat persistence, document registry, knowledge bases, workspaces, admin APIs, and MCP transport
rass-engine-service: retrieval pipeline and answer generation
embedding-service: async ingestion, chunking, embedding, indexing, and provenance callbacks
frontend: React chat UI, usually run separately in local development

This README describes the code as it exists now. Older phase-oriented writeups in the repo are historical context, not the primary source of truth.

Current Architecture

flowchart TD
    Browser["Browser / React frontend"]
    MCP["mcp-server :8080"]
    Engine["rass-engine-service :8000"]
    Embed["embedding-service :8001"]
    OS["OpenSearch :9200"]
    Redis["Redis :6379"]
    PG["Postgres :5432"]

    Browser -->|REST + SSE| MCP
    MCP -->|stream proxy| Engine
    MCP -->|upload proxy| Embed
    MCP -->|Prisma| PG
    Engine -->|hybrid search| OS
    Engine -->|parent fetch| Embed
    Embed -->|BullMQ| Redis
    Embed -->|parent chunks| Redis
    Embed -->|child chunks| OS

Service Responsibilities

`mcp-server`

Issues short-lived JWTs and rotating refresh-token cookies
Accepts Authorization: Bearer <jwt> and Authorization: ApiKey <raw_key>
Stores users, chats, messages, documents, provenance, feedback, annotations, entities, relations, workspaces, API keys, refresh tokens, audit logs, and shared-chat metadata in Postgres via Prisma
Proxies uploads to embedding-service
Proxies streaming queries to rass-engine-service
Exposes REST routes for chats, documents, knowledge bases, organizations, workspaces, admin/compliance, feedback, annotations, knowledge graph extraction, and MCP transport

`rass-engine-service`

Exposes POST /ask and POST /stream-ask
Uses the current staged retrieval pipeline:
- HydeQueryExpansionStage
- EmbedQueryStage
- HybridSearchStage
- ParentFetchStage
- DeduplicateStage
- RerankStage
- FeedbackBoostStage
- TopKSelectStage
Streams OpenAI-style SSE chunks:
- context metadata first
- token deltas next
- citations metadata last
- [DONE] sentinel at the end

`embedding-service`

Accepts uploads and immediately enqueues BullMQ ingestion jobs
Worker flow is parse -> chunk -> embed -> index -> provenance callback
Stores parent chunks in Redis and child chunks in OpenSearch
Supports OCR fallback for images and scanned PDFs when VISION_ENABLED: true
Exposes job status and Bull Board in non-production

`frontend`

React 19 + MUI single-page app
Uses in-memory JWT state plus an HTTP-only refresh-token cookie
Streams assistant responses into the chat UI
Includes document management, context panel, guided tour, and shared-chat route code

Runtime Ports

The root docker-compose.yml currently exposes:

Service	Host Port	Notes
`mcp-server`	`8080`	Main API surface
`rass-engine-service`	`8000`	Internal query engine
`embedding-service`	`8001`	Internal ingestion service
`opensearch`	`9200`	Search cluster
`redis`	`6379`	Queue + docstore
`db`	`5432`	Postgres
`jaeger`	`16686`	Trace UI
`prometheus`	`9090`	Metrics
`grafana`	`3001`	Dashboards
`loki`	`3100`	Log backend
`ollama`	`11434`	Optional local models

Important: the root compose stack does not currently run the standard React frontend. For local UI work, run frontend separately. The demo compose attempts to include a frontend service, but that flow is not the canonical local-development path.

Data Placement

Postgres: application state and governance data
OpenSearch: searchable child chunks and hybrid retrieval
Redis: BullMQ queues and parent chunk storage

Core Prisma models include:

User, Chat, Message, ChatDocument
Document, DocumentProvenance
KnowledgeBase, KBMember
Organization, OrgMember, Workspace, WorkspaceMember
ApiKey, RefreshToken, AuditLog
RetrievalFeedback, Entity, Relation, Annotation, SharedChat

Core Flows

Auth

POST /api/auth/login returns a short-lived JWT and sets an HTTP-only refresh-token cookie.
The frontend stores the JWT in memory only.
On reload, the frontend calls POST /api/auth/refresh with withCredentials: true.
POST /api/auth/logout invalidates the refresh token and clears the cookie.

Upload and Ingestion

Client uploads a file to POST /api/embed-upload on mcp-server.
mcp-server creates a Document row with status QUEUED.
The file is forwarded to embedding-service.
embedding-service writes the file to disk, enqueues a BullMQ job, and returns 202 Accepted.
The worker parses, chunks, embeds, indexes, writes provenance, and calls back into mcp-server internal routes.
Clients poll GET /api/ingest/status/:jobId.

Query and Streaming

Client sends POST /api/stream-ask.
mcp-server forwards the request and user identity to rass-engine-service.
The retrieval pipeline runs and selects source documents.
The engine streams:
- retrieved context metadata
- answer tokens
- structured citations
- [DONE]

Local Development

Prerequisites

Docker with Compose support
Node.js 18+
A root .env file with secrets
shared_rass_network Docker network created once:

docker network create shared_rass_network

Configure

config.yml at the repo root is the shared non-secret config for all backend services.

Put secrets in .env, for example:

OPENAI_API_KEY=...
GEMINI_API_KEY=...
JWT_SECRET=...
REFRESH_TOKEN_SECRET=...
DATABASE_URL=postgresql://rass_user:rass_password@db:5432/rass_db

Start the backend stack

./scripts/start.sh

That runs:

docker-compose up -d --build

The mcp-server container runs npx prisma migrate deploy on boot.

Run the frontend locally

cd frontend
npm install
npm start

The frontend talks to http://localhost:8080/api.

Stop the backend stack

./scripts/stop.sh

Commands and Scripts

Top-level scripts

scripts/start.sh: starts the root backend stack
scripts/stop.sh: stops the root backend stack
scripts/demo.sh: starts the demo compose flow
scripts/ollama-pull-models.sh: pulls local Ollama models

Service commands

embedding-service: npm start, npm test
rass-engine-service: npm start, npm test
mcp-server: npm test, npm run validate:api
frontend: npm start, npm test, npm build

Observability

The current stack includes:

OpenTelemetry in all three Node services
Prometheus metrics at /metrics
Jaeger trace collection
Grafana dashboards
Loki + Promtail log aggregation
Product-level audit logging in AuditLog

Useful URLs in local development:

API health: http://localhost:8080/api/health
Swagger UI: http://localhost:8080/api/docs
Bull Board: http://localhost:8001/admin/queues
Jaeger: http://localhost:16686
Prometheus: http://localhost:9090
Grafana: http://localhost:3001

Known Limitations

These are current code realities and should be treated as such:

The root compose stack does not include the normal frontend service.
Per-KB and per-workspace ingestion exists, but the active HybridSearchStage still searches the configured default OpenSearch index rather than dynamically switching to a KB/workspace index.
The shared-chat backend route currently selects role and content fields even though the Prisma Message model uses sender and text. Treat shared chats as incomplete.
The repo contains two distinct "knowledge graph" concepts:
- an older document-similarity graph route in knowledgeBases.js
- the newer entity/relation graph in knowledgeGraph.js
Some frontend components still assume legacy token handling. The canonical auth model is in-memory JWT + refresh cookie.
Document deletion is a soft-delete plus best-effort search cleanup. Redis parent chunks are not comprehensively purged in the normal delete flow.
POST /api/chat/completions is intended for OpenAI-compatible integrations and forwards unscoped requests to the engine.

Where To Read Next

Step-by-step setup and usage: docs/getting-started.md
Current architecture walkthrough: docs/PLANNER_AND_DIAGRAMS.md
Deployment details: DEPLOYMENT.md
User-facing behavior: docs/user-guide.md
Streaming protocol: docs/api/streaming.md
Service-specific notes:
- mcp-server/README.md
- rass-engine-service/README.md
- embedding-service/README.md
- frontend/README.md

Name		Name	Last commit message	Last commit date
Latest commit History 265 Commits
.github/workflows		.github/workflows
.serena		.serena
benchmarks		benchmarks
demo		demo
docs		docs
embedding-service		embedding-service
evaluation		evaluation
frontend		frontend
mcp-server		mcp-server
monitoring		monitoring
rass-engine-service		rass-engine-service
scripts		scripts
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
DEPLOYMENT.md		DEPLOYMENT.md
README.md		README.md
WORKLOG.md		WORKLOG.md
config.example.yml		config.example.yml
config.yml		config.yml
docker-compose.yml		docker-compose.yml
mcp-schemas.json		mcp-schemas.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RASS (Retrieval-Augmented Semantic Search)

Current Architecture

Service Responsibilities

`mcp-server`

`rass-engine-service`

`embedding-service`

`frontend`

Runtime Ports

Data Placement

Core Flows

Auth

Upload and Ingestion

Query and Streaming

Local Development

Prerequisites

Configure

Start the backend stack

Run the frontend locally

Stop the backend stack

Commands and Scripts

Top-level scripts

Service commands

Observability

Known Limitations

Where To Read Next

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RASS (Retrieval-Augmented Semantic Search)

Current Architecture

Service Responsibilities

mcp-server

rass-engine-service

embedding-service

frontend

Runtime Ports

Data Placement

Core Flows

Auth

Upload and Ingestion

Query and Streaming

Local Development

Prerequisites

Configure

Start the backend stack

Run the frontend locally

Stop the backend stack

Commands and Scripts

Top-level scripts

Service commands

Observability

Known Limitations

Where To Read Next

About

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`mcp-server`

`rass-engine-service`

`embedding-service`

`frontend`

Packages