Northwoods processes handwritten intake documents into structured, confidence-scored fields that route through human-in-the-loop review before acceptance. It was built for the Banyan Software CTO assignment.
Live instance: northwoods.muness.com
This section is for you. It maps the assignment requirements to where they live in the codebase.
| Role | Email | Password | What you'll see |
|---|---|---|---|
| Intake Worker | worker@sunrise.example | password | Upload dashboard — select template, attach PDF, watch status |
| Reviewer | reviewer@sunrise.example | password | Review queue — confidence indicators, field corrections, similar cases, finalize |
Tenant-b (Lakewood) credentials also exist (worker@lakewood.example, reviewer@lakewood.example) — use them to verify tenant isolation.
The developer scaffold (preset login buttons, raw API explorer) is at northwoods.muness.com/#dev. It is not linked from the main UI.
| Document | What it covers |
|---|---|
| Architecture Rationale | System diagram, component responsibilities, extraction model, RAG strategy, tenancy model, trade-offs |
| ADR 001: Postgres hybrid retrieval | Why Postgres for vector search instead of a dedicated vector DB |
| ADR 002: Temporal for workflows | Considered and deferred — worker polling used instead |
| ADR 003: MinIO for storage | S3-compatible object store for documents |
| ADR 004: Shared tenancy with RLS | Row-level security as defense-in-depth for tenant isolation |
| ADR 005: Consensus extraction pipeline | Multi-provider extraction with append-only attempt history |
| Document | What it covers |
|---|---|
| AI Development Tooling | What tools were used, how, and what remained human-owned |
| Self-Assessment | What's complete, what's missing, 5 concrete prompts with outputs |
The agentic development pipeline is in .claude/agents/dev-pipeline.md and .claude/agents/dev-pipeline-oversight.md. Each GitHub issue was worked end-to-end by an agent: branch, implement, /review, /dissent, CodeRabbit, merge. The human role was scoping, reviewing output, catching drift, and making architectural calls.
Log in as a reviewer and navigate to /#rag-report to see the RAG pipeline results: expected vs. actual similar case retrieval for known narrative arcs across 40 fictional people. This page runs live queries — it is not canned data.
- Confidence drives behavior. Low-confidence fields route to review. High-confidence fields can auto-accept. The review UI shows why the system is uncertain (per-provider breakdown).
- Tenant isolation is provable. RLS is enabled on all data tables, and scripts/ci/check-rls-compliance.py verifies it. Integration tests confirm that cross-tenant queries return empty results.
- Audit trail is append-only. extraction_completed → field_corrected × N → finalized, with each event recording actor, timestamp, correlation ID, and before/after values.
- The corpus tells a story. 40 people across two tenants, two visual form styles, longitudinal narrative arcs. Frequent flyers (P019 Raymond Castillo, P039 Gloria Navarro) show progression from crisis to stability across months. The RAG report page shows whether the system found these relationships.
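The "confidence drives behavior" rule can be illustrated with a small routing function. This is a minimal sketch, not the project's implementation: the 0.90 threshold, the provider names, the `FieldExtraction` type, and the choice to let the least confident provider decide are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the real thresholds are not stated in this README.
AUTO_ACCEPT_THRESHOLD = 0.90


@dataclass(frozen=True)
class FieldExtraction:
    """One extracted field with the confidence each provider reported."""
    name: str
    provider_confidence: dict[str, float]  # keys are hypothetical provider names

    @property
    def confidence(self) -> float:
        # Assumption: the least confident provider decides the field's score.
        return min(self.provider_confidence.values())


def route(field: FieldExtraction, threshold: float = AUTO_ACCEPT_THRESHOLD) -> str:
    """Return 'auto_accept' at or above the threshold, otherwise 'needs_review'."""
    return "auto_accept" if field.confidence >= threshold else "needs_review"


def explain(field: FieldExtraction) -> list[tuple[str, float]]:
    """Per-provider breakdown, least confident first, for display to a reviewer."""
    return sorted(field.provider_confidence.items(), key=lambda item: item[1])
```

With this sketch, a field where one provider reports 0.97 and another 0.62 is routed to review, and `explain` lists the 0.62 provider first so the reviewer can see where the disagreement lies.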
- Docker and Docker Compose
- .NET SDK 10
- Node 22+ and pnpm
docker compose up -d
This starts Postgres (with pgvector), MinIO, the API, the extraction worker, and the web frontend. Schema and seed data load automatically.
| Service | URL |
|---|---|
| Web UI | http://localhost:5173 |
| API | http://localhost:5100 |
| OpenAPI spec | http://localhost:5100/openapi/v1.json |
| MinIO console | http://localhost:9001 (northwoods/northwoods) |
| Health check | http://localhost:5100/healthz |
# All backend tests
dotnet test src/Northwoods.slnx
# Frontend type check + lint
pnpm --filter web check
# Playwright e2e (requires running stack + pnpm dev)
pnpm --filter web test:e2e
# RAG pipeline test (requires running stack + OPENAI_API_KEY)
dotnet test tests/Northwoods.Api.IntegrationTests/ --filter Category=RAG
The extraction worker defaults to UseOpenAiVision=true. Without an API key the worker will throw a startup exception and documents will not be processed.
Set OPENAI_API_KEY before starting:
export OPENAI_API_KEY=sk-...
docker compose up -d
Or create a .env file at the repo root (docker compose picks this up automatically):
cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...
To run without extraction (UI/API development only), disable the provider explicitly:
# Add to .env:
Extraction__UseOpenAiVision=false
In this mode uploads and the review UI work, but no documents will be extracted.
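The two supported configurations above (key present, or provider explicitly disabled) can be checked before starting the stack. The following is a hedged sketch of such a preflight check. The function name is hypothetical, it is not part of the repository, and it assumes that any value other than a case-insensitive "false" leaves the provider enabled.

```python
import os
from typing import Mapping, Optional


def extraction_config_error(env: Mapping[str, str]) -> Optional[str]:
    """Return a description of the misconfiguration, or None if the settings are consistent."""
    # Documented default: UseOpenAiVision is true unless overridden.
    # Assumption: only a case-insensitive "false" disables the provider.
    flag = env.get("Extraction__UseOpenAiVision", "true").strip().lower()
    use_vision = flag != "false"
    has_key = bool(env.get("OPENAI_API_KEY", "").strip())
    if use_vision and not has_key:
        return (
            "UseOpenAiVision is enabled but OPENAI_API_KEY is not set; "
            "set the key or add Extraction__UseOpenAiVision=false to .env"
        )
    return None


if __name__ == "__main__":
    problem = extraction_config_error(os.environ)
    print(problem or "Extraction configuration looks consistent.")
```

Run it in the same shell that will run docker compose so that it sees the same exported variables. It does not read the .env file itself.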
Intake Worker → [Upload PDF] → API → MinIO (store) + Postgres (metadata)
↓
Extraction Worker polls
↓
OCR/AI extraction → confidence scoring
↓
review_ready → Reviewer sees queue
↓
Reviewer: correct fields, finalize
↓
Embeddings regenerated → case_profiles updated
↓
RAG: similar cases surfaced for next review
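One way to read the flow above is as a small state machine over a document's status. The sketch below is illustrative only: review_ready is the only status name that appears in this README, and the other names (uploaded, extracting, failed, finalized) are hypothetical stand-ins for whatever the schema actually uses.

```python
# Hypothetical status names; only "review_ready" is taken from this README.
TRANSITIONS: dict[str, set[str]] = {
    "uploaded": {"extracting"},                # worker poll picks the document up
    "extracting": {"review_ready", "failed"},  # extraction and confidence scoring
    "failed": {"extracting"},                  # e.g. via POST /intakes/{id}/retry
    "review_ready": {"finalized"},             # reviewer corrects fields and finalizes
    "finalized": set(),                        # terminal in this sketch
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


def happy_path() -> list[str]:
    """Walk a document from upload to finalization, recording each status."""
    status = "uploaded"
    seen = [status]
    for nxt in ("extracting", "review_ready", "finalized"):
        status = advance(status, nxt)
        seen.append(status)
    return seen
```

Keeping the allowed moves in one table makes it easy to reject shortcuts such as finalizing a document that was never extracted.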
| Component | Technology | Responsibility |
|---|---|---|
| API | .NET 10 minimal API | Auth, upload, review, search, audit, tenant scoping |
| Extraction Worker | .NET BackgroundService | Polling, staged provider extraction, confidence gating, append-only attempts |
| Frontend | React + TypeScript + Tailwind | Role-based dashboards, confidence visualization, similar case panel |
| Database | Postgres 18 + pgvector + pg_trgm | System of record, RLS, hybrid retrieval, FTS |
| Object Storage | MinIO (S3-compatible) | Document blobs |
- 2 tenants: Sunrise (tenant-a) and Lakewood (tenant-b)
- 6 users in total: worker + reviewer + admin per tenant (password: password)
- 4 templates per tenant: General Assistance, Housing Stability, Behavioral Health, SOAP Progress Note
- 8 seeded documents for RAG demo (P017, P019, P037, P039 across both tenants), loaded by DatabaseInitializer.SeedCorpusAsync on every API startup
- Corpus generators in scripts/corpus/ (regenerate with python3 scripts/corpus/generate_seed_sql.py)
Vector similarity search requires embeddings generated via the OpenAI API. On a fresh database:
- With OPENAI_API_KEY set: The extraction worker generates embeddings for seeded case profiles automatically. All three retrieval strategies (FTS, trigram, vector cosine similarity) work.
- Without OPENAI_API_KEY: FTS and trigram search still work, but vector similarity results will be absent. The similar-cases panel will show fewer or no results for queries that depend on semantic matching.
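To make the degraded mode concrete, here is a sketch of merging ranked lists from several retrieval strategies. This README does not say how Northwoods combines FTS, trigram, and vector results, so the sketch uses reciprocal rank fusion, a common generic technique, purely as an assumption. The input rankings are made up for illustration and reuse person keys from the seed corpus.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of case IDs; each list adds 1 / (k + rank) per item."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, case_id in enumerate(ranking, start=1):
            scores[case_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by case ID for determinism.
    return sorted(scores, key=lambda case_id: (-scores[case_id], case_id))


# Made-up rankings, not real query output.
fts = ["P019", "P039"]
trigram = ["P019", "P017"]
vector = ["P039", "P019", "P037"]

with_embeddings = reciprocal_rank_fusion([fts, trigram, vector])
without_embeddings = reciprocal_rank_fusion([fts, trigram])
```

In this toy input, P037 is found only by the vector ranking, so it drops out of `without_embeddings` entirely. That is the "fewer or no results" effect described above.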
To regenerate embeddings after resetting the database, run scripts/reset_demo.py which re-uploads seed documents and triggers the extraction/embedding pipeline.
Deployment is release-driven. Create and push a v* tag (for example v0.3.3) and GitHub Actions will build images and publish a GitHub Release for that tag.
It then updates all three Render services (api, worker, web) and triggers deploys.
Blueprint: render.yaml. Custom domain via Cloudflare CNAME.
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /auth/login | None | Authenticate, receive JWT |
| GET | /templates | JWT | List tenant-scoped templates |
| GET | /templates/{id}/blank | JWT | Download printable blank template |
| POST | /templates | JWT (Admin) | Create a new template |
| PUT | /templates/{id} | JWT (Admin) | Update an existing template |
| DELETE | /templates/{id} | JWT (Admin) | Archive a template |
| POST | /templates/{id}/blank-pdf | JWT (Admin) | Upload blank PDF for a template |
| POST | /intakes | JWT (Worker) | Upload intake document |
| POST | /intakes/{id}/retry | JWT (Reviewer) | Retry a failed intake |
| GET | /intakes/{id} | JWT | Check processing status |
| GET | /documents | JWT | List tenant-scoped documents |
| GET/HEAD | /documents/{id}/source | JWT | Retrieve or check source document |
| GET | /review-queue | JWT (Reviewer) | Documents awaiting review |
| GET | /reviews/{id} | JWT (Reviewer) | Fields, confidence, similar cases, audit trail |
| POST | /reviews/{id}/finalize | JWT (Reviewer) | Finalize with corrections |
| GET | /search?q= | JWT | Full-text search |
| GET | /cases/{personKey} | JWT | Case aggregate across documents |
| DELETE | /admin/documents | JWT (Admin) | Wipe all tenant documents |
| POST | /admin/reprocess | JWT (Admin) | Reprocess all tenant documents |
| GET | /healthz | None | Service health |
| GET | /metrics | JWT | Tenant-scoped counters |
Full OpenAPI spec: http://localhost:5100/openapi/v1.json
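For scripting against a local stack, the endpoint table translates into plain HTTP requests. The sketch below only builds the requests and does not send them. The login body shape (email and password keys) and the Bearer authorization scheme are assumptions, not confirmed by this README; check the OpenAPI spec for the actual request and response schemas.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5100"  # local API address from the services table


def login_request(email: str, password: str) -> urllib.request.Request:
    """Build POST /auth/login. The JSON field names are assumed, not documented here."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def review_queue_request(token: str) -> urllib.request.Request:
    """Build GET /review-queue, passing the JWT as a Bearer token (assumed scheme)."""
    return urllib.request.Request(
        f"{BASE_URL}/review-queue",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Passing either request to `urllib.request.urlopen` sends it to the running stack. The name of the field that carries the JWT in the login response is not stated here, so read it from the spec rather than guessing.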