Skip to content

Forward-port main's security & bug fixes onto refactor/hexagonal#568

Merged
Ahmath-Gadji merged 75 commits into
refactor/hexagonalfrom
forward-port/main-to-hexagonal
Jun 26, 2026
Merged

Forward-port main's security & bug fixes onto refactor/hexagonal#568
Ahmath-Gadji merged 75 commits into
refactor/hexagonalfrom
forward-port/main-to-hexagonal

Conversation

@Ahmath-Gadji

@Ahmath-Gadji Ahmath-Gadji commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

What

Forward-ports the changes that landed on main after the refactor branch point onto the new hexagonal layout (core/ services/ api/ di/ + infra/). A literal git rebase was the wrong tool (it would replay the refactor's ~465 commits onto main); instead each main commit is re-expressed against the new file structure, skipping anything already fixed and obviating anything the new architecture no longer needs.

Of the 68 non-merge commits on main since the merge-base, this branch lands 59 commits: every source commit is ported, obviated (with analysis), already-present, or (one) deliberately deferred. Each commit is 1:1 with its main source and carries a Forward-ported from <hash> trailer. Full per-commit accounting (including new-vs-old file mapping and skip reasons) is in FORWARD_PORT_LOG.md.

By area

  • Infra / deploy hardening — Ray dashboard → localhost everywhere (+ external RAY_ADDRESS), non-root OpenShift Dockerfiles, no prod bind-mount / --reload, MinIO/DB/AUTH default removal, Helm DB-password→Secret + default-deny NetworkPolicy, seccomp drop, metrics lockdown, image-tag pins, Ansible hardening, version → 1.1.13.
  • Auth / OIDCALLOW_NO_AUTH gate on the no-token admin bypass, update_user mass-assignment whitelist, external_user_id→NULL, no session-token in UI file URLs, back-channel-logout exp/jti + replay cache, clock-skew/nbf, logout CSRF (Fetch-Metadata), OIDC-session revocation on token regen.
  • RAG / retrieval / loadersMilvus filter-injection guards (paren-wrap + identifier allowlist), llm_override credential/SSRF strip, control-token neutralizer, surrounding-chunk partition scoping (cross-tenant leak), token-limit + n/best_of bounds, stack-trace leak, partition-scoped source-download authz (the open /static mount was still live), EML attachment-fan-out cap, web/SVG SSRF hardening, cap-partitions-per-user, copy-endpoint validation, streaming finish_reason fix, non-empty-answer prompt rule.
  • Dependencies — Starlette/FastAPI CVE-2025-54121 pin (lockfile regenerated) + a path-tiered rate-limit middleware (limits).
  • Docs — Chainlit-on-CHAINLIT_PORT note, OIDC/SSO quick-start frontmatter + pointer fixes.

Obviated (verified, with analysis — not a live problem on this branch)

  • 199424bf empty-stream 502 → non-streaming uses materialized chat()/generate() + InferenceError(502); no __anext__ path.
  • 70a2db36 doc-loader page bug → CustomDocLoader removed by the parser-shim cleanup.
  • 63a857af image-URL SSRF → the refactor captions extracted bytes; a document's remote image URL is never fetched nor forwarded to the VLM.

Several ports involved real judgment where the refactor differed (output-escaping vs. input-allowlist for Milvus, repo-abstraction vs. atomic-session for the partition cap, the source-download rekeyed to chunk-id, and not re-loosening the refactor's stricter CHAINLIT_AUTH_SECRET policy). See the log for rationale.

Not included — one item, by deliberate choice

4bbefd41 compose non-root rework (init-perms bootstrap container + per-service user: mappings). It's all-or-nothing, runtime-only (no CI/unit feedback — compose isn't exercised in tests), depends on fragile path-mapping onto infra/compose/, and is the least security-critical item (the images already build non-root via the Dockerfiles ported here). Committing it on faith risks a silently broken stack, so it's left for a follow-up that can docker compose up to verify. Target files + notes are in FORWARD_PORT_LOG.md.

Verification

  • tests/unit/1355 passed. (One failure, test_seed_defaults_preserves_endpoint_api_keys, is an environmental artifact of the local nested-worktree .env, not a code defect — passes in a non-nested checkout / CI.)
  • Integration repos vs. a real Postgres — 82 passed; the lone failure (create_legacy_user signature) is a pre-existing base-branch bug untouched by this PR.
  • Live ASGI checks (real HTTP via TestClient) — rate-limit 429/per-tier budgets, logout CSRF 403/302, download authz 403/404.
  • App boots cleanly under the new deps; middleware stack correct (… Auth → RateLimit → route), download_source route present, open /static mount removed.
  • ruff check + ruff format --check clean; layer-import guard OK.

⚠️ The base refactor/hexagonal advanced after this branch was cut, so a rebase / conflict check may be needed before merge.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added request rate limiting for authentication and chat traffic.
    • Added secure source-document downloads and improved source-link URL generation.
  • Bug Fixes
    • Fixed streaming so terminal SSE chunks don’t prematurely end client handling.
    • Enforced stricter validation for partitions, file IDs, and chat token limits across modes.
  • Security
    • Restricted Ray dashboard/monitoring exposure to localhost; added default-deny ingress NetworkPolicy.
    • Removed unsafe credential defaults; require secrets via environment variables.
    • Added OIDC logout CSRF protection, SSRF blocking, prompt-injection defenses, and logout replay prevention.
  • Documentation
    • Updated OIDC/SSO and Ray/environment guidance (including Ray Serve/port behavior).

The Ray dashboard (8265) was published on all host interfaces. It has no
authentication and its job-submission API allows arbitrary code execution
on the cluster, so exposing it to the network is a remote code execution
vector. Bind the published port to 127.0.0.1 so it is reachable only from
the host.

Forward-ported from c1079d7 (main) into the infra/compose/ layout.
The Ray worker image ran as root. Add a dedicated uid/gid 10001 user that
owns /app (where the app writes the venv, data, logs and model weights) and
switch to it with USER, so a container escape does not land as root on the
host. HOME is set to /app before the uv install so uv's managed Python and
cache land under the user-owned tree rather than /root.

Forward-ported from 8914dfb (main) into infra/docker/ray.Dockerfile.
…user

The API image ran as root. Add a fixed uid (APP_UID=10001, primary group 0)
for plain Docker/Kubernetes and make every runtime-writable path group-owned
by GID 0 and group-writable so the arbitrary UID OpenShift assigns can start
the app: strip group-write from the whole tree, then grant it back only on the
venv, egg-info, $HOME, data, db, logs, the HF model cache and uv's cache.
Chainlit's ./.files and ./.chainlit dirs are pre-created group-writable so the
app does not crash with PermissionError. UV_*/HOME/USER/LOGNAME env vars keep
uv's Python and cache on root-owned paths and let getpass.getuser() resolve
without an /etc/passwd entry.

Forward-ported from 8914dfb, 6860341, 26a70db and c0847d8 (main) into
infra/docker/api.Dockerfile. The later OpenShift/USER-LOGNAME/chainlit commits
fully replaced 8914dfb's initial app-image non-root approach, so they are
consolidated into this single commit against the new layout (which copies
scripts/ instead of prompts/).
The production compose bind-mounted ./openrag over the image and the
entrypoint always ran uvicorn with --reload (a dev feature, also incompatible
with multiple workers). Both let host-side changes override the running code.

- Comment out the ../../openrag dev bind-mount (uncomment for local dev).
- Gate --reload behind UVICORN_RELOAD=true (default off).

Forward-ported from 645128d (main) into infra/compose/docker-compose.yaml and
infra/scripts/entrypoint.sh (uvicorn/Ray entrypoints target api.main here).
The uvicorn deployment path fed API_NUM_WORKERS into `uvicorn --workers N`,
but the app calls ray.init() at import time, so each extra worker starts its
own isolated Ray cluster with duplicate named actors (Indexer, Vectordb,
TaskStateManager), fragmenting task state and vector-DB access.

- entrypoint.sh: always run a single uvicorn worker; warn if API_NUM_WORKERS
  is set to a non-1 value, pointing operators to Ray Serve.
- charts: drop the dead API_NUM_WORKERS: "8" (the chart runs Ray Serve, which
  takes the api.main branch and never reads it).
- .env.example / docs: remove the knob and document Ray Serve
  (ENABLE_RAY_SERVE + RAY_SERVE_NUM_REPLICAS) as the HTTP scaling path.

Closes #500

Forward-ported from 0e5687b (main) into the infra/ layout.
.env.example removed the API_NUM_WORKERS knob, but its two hand-maintained
mirrors under docs/assets/ (env_example.env, env_linux_gpu.env) still
advertised it with the old, now-incorrect description. Apply the same comment
as .env.example pointing to the Ray Serve scaling path.

Forward-ported from 319bc5c (main).
…inned uv installer) (#488)

- ansible.cfg: enable host_key_checking and switch ssh_args to
  StrictHostKeyChecking=accept-new (drop UserKnownHostsFile=/dev/null +
  StrictHostKeyChecking=no, which disabled MITM protection).
- openrag.yml: write the deployed .env as 0600 (was 0644) and pin the uv
  installer to a specific version under `set -euo pipefail` / bash.

Forward-ported from 2af1d76 (main) into infra/ansible/.
The Ray dashboard and co-hosted Jobs API are unauthenticated
(CVE-2023-48022 "ShadowRay"), so exposing them on a routable interface
is an unauthenticated-RCE vector. Bind to 127.0.0.1 by default
everywhere it is configured, overridable via RAY_DASHBOARD_HOST:

- openrag/api/main.py + openrag/api/mcp/server.py: ray.init reads
  RAY_DASHBOARD_HOST (default 127.0.0.1). The refactor has two embedded
  ray.init call sites (the API lifespan and the MCP server) where main had
  one, so both are hardened.
- infra/quick_start/docker-compose.yaml: publish 127.0.0.1:8265 only
- infra/cluster.yaml: --dashboard-host ${RAY_DASHBOARD_HOST:-127.0.0.1}
- docs/assets/compose_ollama_cpu.yaml: localhost-only dashboard port

The KubeRay pod keeps 0.0.0.0 because the dashboard Service requires in-pod
reachability; it is instead isolated via the NetworkPolicy added separately (N12).

Forward-ported from 34dc3a2 (main).
When RAY_ADDRESS is set, attach to an existing Ray cluster instead of
starting an embedded one (so no local dashboard is started — the head
node owns it). The embedded branch keeps binding the unauthenticated
dashboard to 127.0.0.1 by default (CVE-2023-48022), overridable via
RAY_DASHBOARD_HOST. Applied to both refactor ray.init sites (API lifespan
and the standalone MCP server).

Also document RAY_ADDRESS and RAY_DASHBOARD_HOST in the env examples
and the env-vars / Ray-cluster deployment docs.

Forward-ported from aa015bd (main).
…ault (H3)

Hardcoded minioadmin:minioadmin in the Milvus stacks meant any foothold on
the internal network yielded full object-store access. Require operators to
supply MINIO_ACCESS_KEY / MINIO_SECRET_KEY (no default — compose fails fast
if unset) in both the compose milvus stack and quick_start/vdb/milvus.yaml,
and pass the same credentials to the Milvus container (MINIO_ACCESS_KEY_ID /
MINIO_SECRET_ACCESS_KEY) so it no longer relies on the minioadmin default.
Document the new required vars in .env.example.

Forward-ported from 0515f70 (main): vdb/milvus.yaml -> infra/compose/milvus/
milvus.yaml; quick_start/vdb -> infra/quick_start/vdb; .env -> infra/compose.
Shipped defaults (root_password, POSTGRES_PASSWORD=root, AUTH_TOKEN=OpenRAG,
minioadmin) are usable credentials on any exposed deployment.

- docker-compose.yaml / quick_start: require ${POSTGRES_PASSWORD:?} (fail fast)
- conf/config.yaml: drop the "root_password" default; password must come from
  the POSTGRES_PASSWORD env var
- docs/assets/compose_ollama_cpu.yaml: require AUTH_TOKEN, POSTGRES_PASSWORD
  and MinIO credentials via env instead of the weak literals
- document POSTGRES_PASSWORD in .env.example

The Helm chart's Postgres password is addressed in the N7 commit (moved to a
Secret) alongside the rest of the chart hardening.

Forward-ported from 0164b82 (main) into the infra/ layout.
The Milvus services disabled syscall filtering entirely (security_opt:
seccomp:unconfined), widening container-escape surface. Remove the override so
the default seccomp profile applies. The refactor split the compose Milvus into
milvus.yaml + milvus.named-volumes.yaml, so the override is dropped from both
plus quick_start/vdb/milvus.yaml.

Forward-ported from b8002ef (main) into the infra/ layout.
…, M2)

The Postgres password was rendered into the world-readable ConfigMap
(env.config) and defaulted to root_password. Move POSTGRES_PASSWORD to
env.secrets (rendered into the Opaque Secret, which the deployment already
mounts via secretRef) and replace the default with an obvious placeholder
that must be overridden at install time.

Forward-ported from 24efaa6 (main) into infra/charts/openrag-stack/values.yaml.
…k8s)

The chart shipped no NetworkPolicy, so any pod in the cluster could reach the
unauthenticated Ray dashboard (8265), GCS (6379), Ray client (10001), Postgres
and Milvus. Add a default-deny-ingress NetworkPolicy (podSelector: {}) that
allows only intra-namespace traffic plus the configurable public HTTP ports
(8080, 3000). Gated by networkPolicy.enabled (default true). This is the
in-cluster isolation referenced by the C3 fix for the KubeRay dashboard.

Forward-ported from c8f2d47 (main) into infra/charts/openrag-stack/.
The metrics compose exposed Prometheus and exporters on 0.0.0.0 and enabled
Prometheus' unauthenticated lifecycle endpoints.

- Prometheus: bind 127.0.0.1:9090 and drop --web.enable-lifecycle
  (unauthenticated /-/reload and /-/quit).
- node-exporter (host pid + rootfs) and nvidia-gpu-exporter: bind to
  127.0.0.1. Prometheus scrapes them over the compose network by service
  name, so localhost binding doesn't affect scraping.

Forward-ported from 5caec50 (main): openrag_metrics/docker-compose.yaml ->
infra/compose/monitoring.docker-compose.yaml.
Tracking :latest makes deploys non-reproducible and silently pulls unreviewed
images. Pin the OpenRAG-owned chart images to the release version and the
metrics stack to specific stable versions:

- charts: openrag, openrag-ray, indexer-ui -> 1.1.13 (this branch's release;
  main pinned 1.1.11)
- metrics: prometheus v2.54.1, grafana 11.2.2, node-exporter v1.8.2

Operator-supplied model-serving images (vLLM engines, infinity reranker) carry
a comment to pin to a specific release/digest before production.

NOTE: the compose / quick_start openrag-image pins from the original commit are
NOT ported — main reverted those to :latest in 64c3e72 (already the refactor
state), so only the chart + metrics pins remain in main's final state.

Forward-ported from d7fc313 (main): openrag_metrics -> infra/compose/
monitoring.docker-compose.yaml; charts -> infra/charts.
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR updates authentication, partition handling, rate limiting, Ray startup, documentation, compose and Helm defaults, and several security-related service behaviors. It also adds a forward-port log entry.

Changes

API and Service Security

Layer / File(s) Summary
Partition validation and limits
openrag/core/indexing/validators.py, openrag/api/dependencies/auth.py, openrag/api/routers/admin/partitions.py, openrag/services/orchestrators/partition_service.py, tests/unit/api/dependencies/test_auth.py, tests/unit/core/indexing/test_validators.py, tests/unit/services/orchestrators/test_partition_service.py, tests/unit/api/routers/admin/test_phase14_partition_routes.py, tests/unit/services/persistence/test_partition_repo.py
file_id and partition inputs use the shared allowlist, missing partitions return 404 outside editor flows, and partition creation enforces MAX_PARTITIONS_PER_USER for non-admins.
Middleware and startup
openrag/api/middleware/*, openrag/api/main.py, openrag/api/mcp/server.py, tests/unit/api/middleware/test_*.py, tests/unit/api/mcp/test_server.py
Token-mode dev bypass now requires ALLOW_NO_AUTH, rate limiting keys by user or IP with tiered budgets, and Ray startup branches on RAY_ADDRESS and RAY_DASHBOARD_HOST.
Logout, downloads, and source links
openrag/api/main.py, openrag/api/routers/auth/oidc.py, openrag/api/routers/user/download.py, openrag/api/routers/user/source_links.py, openrag/app_front.py, tests/unit/api/routers/user/test_*.py, tests/unit/test_auth_router.py
Logout blocks cross-site non-navigation requests, source URLs move to authorized chunk-id downloads, and browser source links stop appending the token query parameter in OIDC mode.
Admin responses and schema limits
openrag/api/routers/admin/indexing.py, openrag/api/routers/user/chat.py, openrag/api/routers/user/search.py, openrag/api/schemas/user/chat.py, tests/unit/api/schemas/test_api_schema_imports.py
Admin file-save and task-error responses hide internal details, search and chat requests validate identifiers and token sizes, and the chat schema bounds n/best_of.

Service-Layer Security and Content Processing

Layer / File(s) Summary
OIDC sessions and user updates
openrag/services/auth/oidc_client.py, openrag/services/orchestrators/auth_service.py, openrag/services/orchestrators/user_service.py, openrag/core/models/user.py, tests/unit/services/orchestrators/test_auth_service.py, tests/unit/services/orchestrators/test_user_service.py
OIDC claim validation adds clock-skew leeway, replayed logout tokens are rejected, token rotation revokes sessions, and user updates ignore unknown fields.
Prompt, retrieval, and fetch safety
openrag/core/utils/text.py, openrag/core/utils/source_filtering.py, openrag/core/prompts/chat_prompt_builder.py, openrag/services/inference/vllm_client.py, openrag/core/indexing/parsers/*, openrag/services/storage/*, openrag/services/websearch/content_fetcher.py, tests/unit/core/utils/test_*.py, tests/unit/core/indexing/parsers/test_eml_parser.py, tests/unit/services/inference/test_vllm_client.py, tests/unit/services/storage/test_*.py, tests/unit/services/websearch/test_content_fetcher.py
Prompt-control tokens are neutralized before prompt assembly, streamed tail chunks clear finish_reason, vLLM overrides keep only model, attachment fan-out is capped, Milvus filters are parenthesized, surrounding-chunk lookup stays partition-scoped, and web requests are guarded against private IP targets.
User and partition services
openrag/services/orchestrators/partition_service.py, openrag/services/orchestrators/user_service.py, tests/unit/services/orchestrators/test_partition_service.py, tests/unit/services/orchestrators/test_user_service.py
Partition creation applies the owned-partition cap from the router, and user updates only persist allowlisted fields.

Documentation and Deployment

Layer / File(s) Summary
Docs and env references
CLAUDE.md, README.md, docs/assets/env_*.env, docs/content/docs/documentation/*, docs/content/docs/getting_started/usage.mdx
The OIDC and Ray docs point to the new guide locations and document Ray Serve scaling, external Ray attachment, dashboard binding, and Chainlit port behavior.
Compose examples
docs/assets/compose_*.yaml, infra/compose/.env.example, tests/unit/infra/test_compose_storage.py, tests/integration/api/api_run/docker-compose.yaml
Compose example files switch Ray dashboard bindings to localhost, require credentials from environment variables, and update the compose-related tests.
Cluster and chart defaults
conf/config.yaml, infra/ansible/ansible.cfg, infra/charts/openrag-stack/*, infra/cluster.yaml, pyproject.toml
Cluster and Helm defaults enable the network policy, pin images, change PostgreSQL and Ray defaults, and update host-key checking plus package bounds.
Containers and bootstrap
infra/ansible/playbooks/openrag.yml, infra/compose/*.yaml, infra/docker/*.Dockerfile, infra/quick_start/*, infra/scripts/entrypoint.sh
Playbook, Dockerfile, compose, quick-start, and entrypoint changes enforce non-root runtimes, localhost-only dashboard and metrics bindings, required secrets, pinned installs, and single-worker Uvicorn startup.

Forward-port Record

Layer / File(s) Summary
Forward-port log
FORWARD_PORT_LOG.md
The forward-port log adds the dated mainrefactor/hexagonal entry, lists the remaining TODO items, and marks the legacy critical section as superseded.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • linagora/openrag#257: Both PRs modify openrag/core/utils/source_filtering.py’s SSE tail-chunk handling.
  • linagora/openrag#393: Both PRs change Milvus filter-expression construction in openrag/services/storage/milvus_store.py.
  • linagora/openrag#396: Both PRs touch the OIDC auth orchestration layer in openrag/services/orchestrators/auth_service.py.
  • linagora/openrag#398: Both PRs update OIDC claim validation in openrag/services/auth/oidc_client.py.
  • linagora/openrag#429: Both PRs harden ContentFetcher against unsafe outbound targets.
  • linagora/openrag#430: Both PRs modify openrag/api/dependencies/auth.py partition access-control behavior.
  • linagora/openrag#433: Both PRs modify openrag/api/mcp/server.py auth/startup behavior.
  • linagora/openrag#462: Both PRs clear the tail chunk finish_reason in streaming output.
  • linagora/openrag#466: Both PRs harden Milvus filter construction with parentheses.
  • linagora/openrag#467: Both PRs restrict llm_override handling to model only.
  • linagora/openrag#470: Both PRs implement allowlist-based file_id/partition validation.
  • linagora/openrag#478: Both PRs add rate limiting, SSRF guarding, and prompt-control token neutralization.
  • linagora/openrag#491: Both PRs update ansible.cfg and playbooks/openrag.yml for SSH host-key checking and pinned uv installation.
  • linagora/openrag#501: Both PRs remove the API_NUM_WORKERS knob in favor of Ray Serve scaling.
  • linagora/openrag#503: Both PRs touch the pyproject.toml version field and dependency bounds.
  • linagora/openrag#509: Both PRs modify docs/content/docs/getting_started/usage.mdx for Ray Serve and Chainlit port behavior.
  • linagora/openrag#515: Both PRs modify openrag/api/routers/admin/partitions.py endpoint signatures around file_id/limit.
  • linagora/openrag#526: Both PRs branch Ray startup on RAY_ADDRESS and RAY_DASHBOARD_HOST.
  • linagora/openrag#537: Both PRs are tied to OIDC logout flow changes.
  • linagora/openrag#547: Both PRs update openrag/prompts/templates/sys_prompt_tmpl.txt to require a non-empty assistant response body.
  • linagora/openrag#555: This PR adds AuthService.revoke_user_oidc_sessions(user_id), which overlaps with the OIDC-session revocation path in that PR.

Suggested reviewers

  • hedhoud
  • EnjoyBacon7

Poem

A bunny hops through Ray-lit halls,
With localhost snugly guarding walls.
It nibbles tokens, trims the sneaks,
And keeps the streams neat for many weeks.
🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the PR’s main purpose: forward-porting security and bug fixes from main onto refactor/hexagonal.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch forward-port/main-to-hexagonal

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

…bypass

When AUTH_MODE=token and AUTH_TOKEN was unset, the middleware silently
treated every request as admin user 1 — a fail-open that turns a single
missing env var into a world-open admin API. Gate this dev bypass behind
an explicit ALLOW_NO_AUTH=true opt-in (with a loud warning); without it a
missing AUTH_TOKEN no longer fails open and unauthenticated requests are
denied.

Tests added under tests/unit/api/middleware/test_bypass_config.py (the
refactor has no equivalent of the legacy components/auth/test_middleware.py
dev-bypass suite) using the existing dispatch harness.

Forward-ported from 7bc4669 (main): openrag/components/auth/middleware.py ->
openrag/api/middleware/auth.py (resolves admin via the injected auth service).
…ent)

UserUpdate used extra="allow" and the update payload flowed into the repo's
broad column setter. A caller could set columns never meant to be writable via
the update — e.g. token (overwrite a victim's auth token to a known value),
file_count (bypass quota), or id.

- core/models/user.py: switch UserCreate/UserUpdate to extra="ignore" so
  unknown fields are dropped at parse.
- services/orchestrators/user_service.update_user: whitelist writable profile
  fields (display_name, external_user_id, email, is_admin, file_quota) before
  calling the repo. The repo's update_user still accepts token/file_count for
  internal callers (token rotation, quota), so the boundary guard lives in the
  service rather than tightening the shared repo method.

Forward-ported from 202433d (main). Partial on arrival: the refactor already
had a column whitelist in the repo, but it included token/file_count and the
DTOs still allowed extra fields; this closes both.
…ex collision (#121)

An empty/whitespace external_user_id written to the unique index collides
(Postgres permits many NULLs but only one empty string). Add a UserBase
field_validator that coerces "" / whitespace to None.

The refactor already normalized this on the create path in the repo
(user_repo.create_user), but not on update; placing the validator on the
shared UserBase DTO covers both UserCreate and UserUpdate.

Forward-ported from edd2c7c (main): openrag/models/user.py ->
openrag/core/models/user.py.
Source file links embedded the credential as ?token=<api_key>, leaking it to
browser history, proxy logs and Referer headers. In OIDC mode the browser
already sends the openrag_session cookie on same-origin file fetches (the auth
middleware checks the cookie first for /static), so the token query param is
redundant — drop it there. Token mode keeps it (no cookie exists).

Forward-ported from 229503b (main): openrag/app_front.py.

Note: the sibling H7 commit 47b8cd3 (fail-fast on missing CHAINLIT_AUTH_SECRET)
was NOT ported — the refactor already fail-fasts (stricter, #380/b63a9825) and
guards it with a regression test; 47b8cd3 only re-adds an ALLOW_NO_AUTH-gated
default-secret fallback, a loosening the refactor deliberately rejected.
The N8 fix (don't bind-mount source in prod) comments out the
../../openrag:/app/openrag dev mount, so the default compose no longer lists
it. Assert it is absent rather than present.

Follow-up to the forward-port of 645128d (N8).
- uv.lock: bump the openrag package version pin to match pyproject (1.1.13).
- user_service.py: ruff-format the _UPDATABLE_USER_FIELDS constant onto one line.
Record ported commits (source hash → new-layout target), skipped commits
(already present / superseded / deliberately not loosened), and the remaining
TODO queue (OIDC crypto, RAG/retrieval batch, deps, deferred compose non-root
+ docs) so the effort is auditable and resumable.
…out (M9)

OIDC back-channel logout tokens are short-lived single-use tokens, but the
verifier defaulted a missing exp to "now + 1" (so an absent exp never expired)
and ignored jti entirely.

- verify_logout_token now requires exp (and rejects expired tokens) and
  requires jti, both per the OIDC back-channel logout spec; exp is surfaced on
  LogoutTokenClaims.
- AuthService.handle_backchannel_logout records consumed jti -> exp in a pruned
  in-process cache and rejects replays (defence-in-depth; logout is idempotent
  so this complements rather than replaces the DB revocation). The refactor
  routes the handler through the service, so the cache lives there (not the
  router as on main).

Forward-ported from 97c624e (main): openrag/components/auth/oidc_client.py ->
services/auth/oidc_client.py; routers/auth.py replay cache -> AuthService.
The test follow-up cdb3edc (exp on the logout test token) is subsumed: the
refactor's service tests use a fake client, and the added replay test carries exp.
…cation (crypto)

ID-token (and logout-token) exp was checked with zero tolerance and nbf was
not validated at all. Add a 60s clock-skew leeway to the exp checks and honour
nbf (reject tokens whose not-before is in the future beyond the leeway).

Forward-ported from 2b34a0d (main): services/auth/oidc_client.py.
GET /auth/logout is state-changing (revokes the session) and cookie-authed,
so a forged request (e.g. <img src=".../auth/logout">) could force-logout a
user. It must stay a GET to support the OIDC RP-initiated logout redirect, so
add a Fetch-Metadata guard that rejects cross-site non-navigation requests
(the silent CSRF vector) while allowing genuine top-level navigations and
same-origin calls.

Forward-ported from c2fde13 (main): routers/auth.py -> api/routers/auth/oidc.py.
…486)

Rotating a user's API token left their active OIDC browser sessions valid, so
a rotation didn't fully cut off access. After regenerate_token, revoke the
user's OIDC sessions.

The refactor already had the other half of the original commit: ensure_admin_user
no longer rotates the stored token on startup when AUTH_TOKEN is unset, and the
OIDC session repo already exposes revoke_by_user. This wires that revocation into
the token-regen flow via a thin AuthService.revoke_user_oidc_sessions, called by
UserService.regenerate_token (which already composes AuthService).

Forward-ported from 9a73200 (main): components/indexer/vectordb/utils.py ->
services/orchestrators/{auth_service,user_service}.py.
@hedhoud

hedhoud commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

The empty-partition fix still has one hole.

The new guard checks the dependency value for partitions == ["all"], but that dependency reads only the raw query string. When the client omits partitions, the route still applies its default partitions=["all"], while the guard sees an empty list and returns successfully. For a user with no memberships, the route then expands all to an empty partition list and the retrieval path can still behave like an unscoped search.

So this needs to fail closed for the omitted-parameter case too, not only for an explicit ?partitions=all. A regression test should cover a regular user with no memberships calling the multi-partition search route without a partitions query parameter.

@hedhoud

hedhoud commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

The auth-failure limiter is closer, but it still runs too late for invalid tokens.

For a bad bearer token, the middleware first asks the auth service to resolve the token, and only after that lookup fails does it increment/check the failure limiter. Once the IP is over limit, later bad-token requests still perform the same token lookup before returning 429.

That means the route is protected, but the auth lookup path is not. If the goal is to bound brute-force or noisy invalid-token traffic, the limiter needs a pre-check before the session/user-token lookup, then record the failure after the lookup only when it was actually invalid.

@hedhoud

hedhoud commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

One smaller docs mismatch remains.

The API docs still describe llm_override as accepting a custom endpoint and API key, and the example shows base_url plus api_key. The implementation now intentionally ignores those fields and only allows the model name to be overridden.

That mismatch will lead users to think they can route a RAG request to their own LLM endpoint when the server will keep using the configured backend. The docs should describe the new behavior and update the example accordingly.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
openrag/services/persistence/partition_repo.py (1)

50-78: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Don't treat a race-lost create as a successful create.

This flow can return an existing row when another request wins the same-name create first. PartitionService.create_partition() then continues into update_partition(...), so the losing request can overwrite preset/config fields on a partition it did not create instead of getting PARTITION_EXISTS. The advisory lock here is keyed only by user_id, so concurrent creates from a different user/admin/no-auth path still reach this branch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/services/persistence/partition_repo.py` around lines 50 - 78, The
create flow in partition_repo.py is treating a race-lost lookup as a successful
creation, which lets PartitionService.create_partition() proceed into
update_partition() for a partition it did not win. Update the create path around
the existing fetchrow/insert logic so that if the initial SELECT finds an
existing partition, the method raises the same PARTITION_EXISTS error instead of
returning that row; only the INSERT branch should represent a successful create.
Also review the advisory lock keyed by user_id so it does not mask cross-user
concurrent creates from reaching this branch.
🧹 Nitpick comments (1)
openrag/api/routers/admin/partitions.py (1)

286-293: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Bind context on the quota-config failure log.

This new error path logs an unstructured message, so the bad setting and caller context are harder to trace in production.

♻️ Proposed fix
     try:
         max_owned = max_partitions_for_user(user)
     except ConfigError as exc:
-        logger.error("Invalid MAX_PARTITIONS_PER_USER value")
+        logger.bind(
+            config_key="MAX_PARTITIONS_PER_USER",
+            user_id=user_id,
+        ).error("Invalid MAX_PARTITIONS_PER_USER value")
         raise HTTPException(
             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
             detail=exc.message,
         ) from exc

As per coding guidelines, "Use Loguru structured logging via get_logger() and bind contextual fields instead of emitting unstructured logs."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/api/routers/admin/partitions.py` around lines 286 - 293, The
quota-config failure path in the partition admin handler logs an unstructured
message; update the ConfigError branch in the partitions router to use Loguru
via get_logger() and bind contextual fields instead of logger.error. Include the
invalid MAX_PARTITIONS_PER_USER value and relevant caller context in the bound
log entry so the failure can be traced consistently while still raising the
existing HTTPException with exc.message.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/content/docs/documentation/user_auth.md`:
- Around line 13-15: Update the auth documentation bullets in user_auth.md so
the fail-closed behavior is explicitly tied to token mode only: in the section
describing AUTH_TOKEN, AUTH_MODE, and ALLOW_NO_AUTH, clarify that missing
AUTH_TOKEN causes OpenRAG to fail closed only when AuthMiddleware is running
with AUTH_MODE=token, while OIDC mode does not use AUTH_TOKEN. Keep the wording
aligned with the existing auth symbols AUTH_TOKEN, AUTH_MODE, and ALLOW_NO_AUTH,
and make the same clarification in the duplicated section referenced by the
review.

In `@openrag/api/middleware/auth.py`:
- Around line 127-128: The warning in the auth rate-limit path logs
request-derived fields directly and bypasses the repo’s structured logging
pattern. Update the auth middleware to use get_logger() with bound contextual
fields instead of passing raw path/identity kwargs, and normalize/sanitize
request.url.path and identity before binding them in the Auth failure rate limit
exceeded log. Keep the logging call in the same failure branch so the new bound
logger is used consistently with the rest of the module.
- Around line 108-111: The AuthMiddleware initialization currently parses the
rate limit unconditionally, so a malformed RATE_LIMIT_AUTH_FAILURE or
RATE_LIMIT_AUTH can raise ValueError even when RATE_LIMIT_ENABLED is false.
Update AuthMiddleware.__init__ to only call parse() and assign _limit when
self.enabled is true, otherwise set _limit to None. Then update
response_if_limited to safely handle a None _limit case by skipping limiter
logic and returning the normal response path when the limiter is disabled.

In `@openrag/services/orchestrators/indexing_service.py`:
- Around line 147-152: The ownership cap in indexing is being recomputed from a
partial user dict, which drops the caller’s admin status and can make
auto-created partitions hit the limit. Update the `IndexingService.add_file()`
flow and the `create_partition` call path so `is_admin` is preserved from the
caller (or `max_owned` is passed explicitly) instead of rebuilding quota state
from `user`; make sure `MCPService.index_url()`’s user payload still results in
admin-cap behavior for super-admin uploads.

In `@openrag/services/orchestrators/mcp_service.py`:
- Around line 186-187: The auto-create quota path in _ensure_partition_exists()
is rebuilding cap_user with is_admin=False for authenticated callers, which
downgrades admins and applies the wrong partition cap. Use the admin flag
already resolved by _resolve_principal() when constructing the user passed to
max_partitions_for_user(), or compute max_owned before calling
self._partitions.create_partition() so MCP matches the behavior in
openrag/api/routers/admin/partitions.py.

---

Outside diff comments:
In `@openrag/services/persistence/partition_repo.py`:
- Around line 50-78: The create flow in partition_repo.py is treating a
race-lost lookup as a successful creation, which lets
PartitionService.create_partition() proceed into update_partition() for a
partition it did not win. Update the create path around the existing
fetchrow/insert logic so that if the initial SELECT finds an existing partition,
the method raises the same PARTITION_EXISTS error instead of returning that row;
only the INSERT branch should represent a successful create. Also review the
advisory lock keyed by user_id so it does not mask cross-user concurrent creates
from reaching this branch.

---

Nitpick comments:
In `@openrag/api/routers/admin/partitions.py`:
- Around line 286-293: The quota-config failure path in the partition admin
handler logs an unstructured message; update the ConfigError branch in the
partitions router to use Loguru via get_logger() and bind contextual fields
instead of logger.error. Include the invalid MAX_PARTITIONS_PER_USER value and
relevant caller context in the bound log entry so the failure can be traced
consistently while still raising the existing HTTPException with exc.message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 24329baa-b831-4726-9fe6-7a43bbd1dae1

📥 Commits

Reviewing files that changed from the base of the PR and between 21bf3ec and e827d64.

📒 Files selected for processing (26)
  • docs/assets/compose_ollama_cpu.yaml
  • docs/content/docs/documentation/API.mdx
  • docs/content/docs/documentation/env_vars.md
  • docs/content/docs/documentation/user_auth.md
  • infra/compose/.env.example
  • openrag/api/dependencies/auth.py
  • openrag/api/dependencies/llm.py
  • openrag/api/mcp/server.py
  • openrag/api/middleware/auth.py
  • openrag/api/routers/admin/partitions.py
  • openrag/core/ports/partition_repo.py
  • openrag/core/utils/partition_limits.py
  • openrag/services/orchestrators/indexing_service.py
  • openrag/services/orchestrators/mcp_service.py
  • openrag/services/orchestrators/partition_service.py
  • openrag/services/persistence/partition_repo.py
  • tests/unit/api/dependencies/test_auth.py
  • tests/unit/api/dependencies/test_llm.py
  • tests/unit/api/mcp/test_server.py
  • tests/unit/api/middleware/test_bypass_config.py
  • tests/unit/infra/test_compose_storage.py
  • tests/unit/services/orchestrators/test_indexing_service.py
  • tests/unit/services/orchestrators/test_mcp_service.py
  • tests/unit/services/orchestrators/test_partition_preset_resolution.py
  • tests/unit/services/orchestrators/test_partition_service.py
  • tests/unit/services/persistence/test_partition_repo.py
✅ Files skipped from review due to trivial changes (2)
  • docs/content/docs/documentation/API.mdx
  • tests/unit/services/orchestrators/test_indexing_service.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • infra/compose/.env.example
  • tests/unit/infra/test_compose_storage.py

Comment thread docs/content/docs/documentation/user_auth.md
Comment thread openrag/api/middleware/auth.py Outdated
Comment thread openrag/api/middleware/auth.py Outdated
Comment thread openrag/services/orchestrators/indexing_service.py Outdated
Comment thread openrag/services/orchestrators/mcp_service.py Outdated
@hedhoud

hedhoud commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Concurrent partition auto-creation still needs one guard.

When two first uploads target the same missing partition at the same time, both requests can pass the missing-partition check and then race into partition creation. The loser can now fail with a partition-already-exists error, even though the partition is available and the upload could safely continue.

This path should treat that specific conflict as a successful create by refreshing the partition cache and continuing. Otherwise a normal concurrent upload can become a user-visible indexing failure.

@hedhoud

hedhoud commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

The named-volume Milvus profile now asks for a different MinIO credential pair than the main compose example documents.

The default .env guidance uses MINIO_ACCESS_KEY and MINIO_SECRET_KEY, but the named-volume profile requires MINIO_ROOT_USER and MINIO_ROOT_PASSWORD. A user can follow the documented compose setup, switch to the named-volume profile, and fail during compose interpolation before the stack even starts.

The profile should either reuse the documented variables or the named-volume env example needs to clearly define the extra required values.

@hedhoud

hedhoud commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Back-channel logout should not require exp on the logout token.

Some OIDC providers send spec-compliant back-channel logout tokens without exp. With the current validation, those logout requests are rejected as invalid, so the provider thinks it notified OpenRAG but the local session remains active.

jti is the important replay-prevention claim here. If exp is absent, the replay cache can use a conservative default TTL instead of rejecting the logout outright.

…cess

The partition auto-create race guard added for #568 re-verifies editor
membership after a concurrent create lost the race. It raised a builtin
`PermissionError`, which the global exception handlers don't map (only
`OpenRAGError` subclasses get their declared status) — so the catch-all
turned a legitimate authorization denial into an HTTP 500.

Raise `AuthError` (status 403) instead, so a cross-user create race that
the loser isn't a member of returns a clean Forbidden. Same-user
concurrent uploads are unaffected (they own the partition and continue).

Test asserts the raised error carries `status_code == 403`.
@Ahmath-Gadji

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

In Ray Serve mode the API and Chainlit run on separate ports (the API via
`serve.run`, Chainlit via its own `uvicorn.run` on `chainlit_port`). Source
previews rewrite their file download links to the browser origin — which in
that topology is the Chainlit host, not the API host. But the standalone
Chainlit app (`chainlit_api.app`) only mounted the Chainlit UI, so the
authorized `/static/{extract_id}` download route was absent there and every
PDF/image/audio source preview 404'd.

Mount `download_router` on the standalone Chainlit app so the rewritten
links resolve. The route reuses the same `AuthMiddleware` (cookie in OIDC
mode — cookies aren't port-scoped, so the session cookie set by the API is
sent to the Chainlit port too — and `?token=` in token mode) and resolves
its services from the container the middleware initializes. The mounted
(`/chainlit`) deployment is unaffected: the UI shares the API origin, which
already serves this route.

Adds a regression test asserting the standalone app exposes the route.
@Ahmath-Gadji

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@hedhoud hedhoud left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the detached Ray actor compatibility concern against the current PR head (18050a2). I don’t think that P1 applies to this version of the PR.

The current dispatcher still routes indexing through the existing pool method, so a leftover detached IndexerPool from a previous rollout would not hit the described submit/process_file mismatch. Also, the worker pool files mentioned in that concern are not part of the current PR diff against refactor/hexagonal.

I also reran the worker-focused tests around dispatcher/pool/bootstrap behavior, and they pass locally. So from this specific angle, I don’t see a live-upgrade blocker in the current head.

hedhoud and others added 5 commits June 26, 2026 10:46
…p priv

Builds on the APP_UID build arg: matching the host UID isn't enough on its
own, because Docker creates a missing bind-mount source (data/, logs/, the
HF cache) as root:root, which the non-root container still can't write —
so a fresh deploy fails with "[Errno 13] Permission denied" on upload,
logging, or model download.

Make it work with zero host-side steps:
- entrypoint.sh: when started as root, chgrp 0 + chmod g+w the writable
  bind mounts (mirroring the image's own chmod g=u), then drop to the
  non-root app user via setpriv before launching. A non-root start
  (OpenShift's arbitrary UID in GID 0) skips the fix and runs directly.
- docker-compose.yaml: run the openrag service as user "0:0" so the
  entrypoint can fix perms before dropping privileges.
- Dockerfile: ENV APP_UID so the entrypoint knows which UID to drop to;
  USER stays non-root for standalone/OpenShift builds.
…get-or-create

Two failures surfaced with ENABLE_RAY_SERVE=true:

1. @serve.ingress(app) cloudpickles the FastAPI app to ship it to replica
   processes. loguru's file sink isn't picklable (an open file handle, and with
   enqueue=True a multiprocessing.SimpleQueue), and the app graph (lifespan,
   exception handlers) captures the module-global logger by value — so binding
   crashed with "SimpleQueue objects should only be shared between processes
   through inheritance". Strip the logger's sinks in the driver before binding
   (a handler-less logger pickles fine) and restore them right after; replica
   processes re-add their own sinks via the module-level get_logger() that runs
   on import, matching loguru's per-process handler model.

2. With multiple replicas, each runs worker bootstrap in its lifespan and they
   raced on get_or_create_actor: get_actor() returned not-found, then create
   collided with a concurrent replica that created it first, crashing startup
   with "Failed to look up actor 'llmSemaphore'". Use Ray's atomic
   get_if_exists=True so concurrent callers all attach to a single actor.
In AUTH_MODE=oidc, /docs, /redoc and /openapi.json no longer serve the full
route + schema surface to anonymous callers: an unauthenticated browser is
302-redirected to /auth/login, while a valid session (or a users.token bearer)
still renders them. Token mode is unchanged — docs stay public there, since a
browser has no login flow to bounce through and the legacy bypass contract must
hold; health/version probes and API routes are unaffected.

The gated set is AuthBypassConfig.oidc_gated_paths (default the three docs
paths); override it to () to keep docs public under OIDC.
In Ray Serve mode the standalone Chainlit app runs in its own uvicorn
process, separate from the API replicas that own the initialized
ServiceContainer. It lazily built a container but never called
initialize(), so its asyncpg pool was never opened.

In token mode the /chainlit bypass never touches the DB, but in OIDC
mode the AuthMiddleware does a session lookup for /chainlit/ HTML loads
when a session cookie is present — hitting the un-opened pool and
raising "ConnectionManager.initialize() has not been called" (500 on
every page load once the user holds an openrag_session cookie).

Give the standalone app its own lifespan that builds and initializes
the container (best-effort, degrading to 503 like api.main), and add a
regression test for the cookie-present OIDC path.
@Ahmath-Gadji Ahmath-Gadji merged commit d1f7841 into refactor/hexagonal Jun 26, 2026
6 checks passed
@Ahmath-Gadji Ahmath-Gadji deleted the forward-port/main-to-hexagonal branch June 26, 2026 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants