Forward-port main's security & bug fixes onto refactor/hexagonal#568
Conversation
The Ray dashboard (8265) was published on all host interfaces. It has no authentication and its job-submission API allows arbitrary code execution on the cluster, so exposing it to the network is a remote code execution vector. Bind the published port to 127.0.0.1 so it is reachable only from the host. Forward-ported from c1079d7 (main) into the infra/compose/ layout.
The Ray worker image ran as root. Add a dedicated uid/gid 10001 user that owns /app (where the app writes the venv, data, logs and model weights) and switch to it with USER, so a container escape does not land as root on the host. HOME is set to /app before the uv install so uv's managed Python and cache land under the user-owned tree rather than /root. Forward-ported from 8914dfb (main) into infra/docker/ray.Dockerfile.
…user The API image ran as root. Add a fixed uid (APP_UID=10001, primary group 0) for plain Docker/Kubernetes and make every runtime-writable path group-owned by GID 0 and group-writable so the arbitrary UID OpenShift assigns can start the app: strip group-write from the whole tree, then grant it back only on the venv, egg-info, $HOME, data, db, logs, the HF model cache and uv's cache. Chainlit's ./.files and ./.chainlit dirs are pre-created group-writable so the app does not crash with PermissionError. UV_*/HOME/USER/LOGNAME env vars keep uv's Python and cache on root-owned paths and let getpass.getuser() resolve without an /etc/passwd entry. Forward-ported from 8914dfb, 6860341, 26a70db and c0847d8 (main) into infra/docker/api.Dockerfile. The later OpenShift/USER-LOGNAME/chainlit commits fully replaced 8914dfb's initial app-image non-root approach, so they are consolidated into this single commit against the new layout (which copies scripts/ instead of prompts/).
The production compose bind-mounted ./openrag over the image and the entrypoint always ran uvicorn with --reload (a dev feature, also incompatible with multiple workers). Both let host-side changes override the running code. - Comment out the ../../openrag dev bind-mount (uncomment for local dev). - Gate --reload behind UVICORN_RELOAD=true (default off). Forward-ported from 645128d (main) into infra/compose/docker-compose.yaml and infra/scripts/entrypoint.sh (uvicorn/Ray entrypoints target api.main here).
The uvicorn deployment path fed API_NUM_WORKERS into `uvicorn --workers N`, but the app calls ray.init() at import time, so each extra worker starts its own isolated Ray cluster with duplicate named actors (Indexer, Vectordb, TaskStateManager), fragmenting task state and vector-DB access. - entrypoint.sh: always run a single uvicorn worker; warn if API_NUM_WORKERS is set to a non-1 value, pointing operators to Ray Serve. - charts: drop the dead API_NUM_WORKERS: "8" (the chart runs Ray Serve, which takes the api.main branch and never reads it). - .env.example / docs: remove the knob and document Ray Serve (ENABLE_RAY_SERVE + RAY_SERVE_NUM_REPLICAS) as the HTTP scaling path. Closes #500 Forward-ported from 0e5687b (main) into the infra/ layout.
.env.example removed the API_NUM_WORKERS knob, but its two hand-maintained mirrors under docs/assets/ (env_example.env, env_linux_gpu.env) still advertised it with the old, now-incorrect description. Apply the same comment as .env.example pointing to the Ray Serve scaling path. Forward-ported from 319bc5c (main).
Forward-ported from c558dd1 (main).
Forward-ported from d69be1d (main).
…inned uv installer) (#488) - ansible.cfg: enable host_key_checking and switch ssh_args to StrictHostKeyChecking=accept-new (drop UserKnownHostsFile=/dev/null + StrictHostKeyChecking=no, which disabled MITM protection). - openrag.yml: write the deployed .env as 0600 (was 0644) and pin the uv installer to a specific version under `set -euo pipefail` / bash. Forward-ported from 2af1d76 (main) into infra/ansible/.
The Ray dashboard and co-hosted Jobs API are unauthenticated (CVE-2023-48022 "ShadowRay"), so exposing them on a routable interface is an unauthenticated-RCE vector. Bind to 127.0.0.1 by default everywhere it is configured, overridable via RAY_DASHBOARD_HOST: - openrag/api/main.py + openrag/api/mcp/server.py: ray.init reads RAY_DASHBOARD_HOST (default 127.0.0.1). The refactor has two embedded ray.init call sites (the API lifespan and the MCP server) where main had one, so both are hardened. - infra/quick_start/docker-compose.yaml: publish 127.0.0.1:8265 only - infra/cluster.yaml: --dashboard-host ${RAY_DASHBOARD_HOST:-127.0.0.1} - docs/assets/compose_ollama_cpu.yaml: localhost-only dashboard port The KubeRay pod keeps 0.0.0.0 because the dashboard Service requires in-pod reachability; it is instead isolated via the NetworkPolicy added separately (N12). Forward-ported from 34dc3a2 (main).
When RAY_ADDRESS is set, attach to an existing Ray cluster instead of starting an embedded one (so no local dashboard is started — the head node owns it). The embedded branch keeps binding the unauthenticated dashboard to 127.0.0.1 by default (CVE-2023-48022), overridable via RAY_DASHBOARD_HOST. Applied to both refactor ray.init sites (API lifespan and the standalone MCP server). Also document RAY_ADDRESS and RAY_DASHBOARD_HOST in the env examples and the env-vars / Ray-cluster deployment docs. Forward-ported from aa015bd (main).
…ault (H3) Hardcoded minioadmin:minioadmin in the Milvus stacks meant any foothold on the internal network yielded full object-store access. Require operators to supply MINIO_ACCESS_KEY / MINIO_SECRET_KEY (no default — compose fails fast if unset) in both the compose milvus stack and quick_start/vdb/milvus.yaml, and pass the same credentials to the Milvus container (MINIO_ACCESS_KEY_ID / MINIO_SECRET_ACCESS_KEY) so it no longer relies on the minioadmin default. Document the new required vars in .env.example. Forward-ported from 0515f70 (main): vdb/milvus.yaml -> infra/compose/milvus/ milvus.yaml; quick_start/vdb -> infra/quick_start/vdb; .env -> infra/compose.
Shipped defaults (root_password, POSTGRES_PASSWORD=root, AUTH_TOKEN=OpenRAG,
minioadmin) are usable credentials on any exposed deployment.
- docker-compose.yaml / quick_start: require ${POSTGRES_PASSWORD:?} (fail fast)
- conf/config.yaml: drop the "root_password" default; password must come from
the POSTGRES_PASSWORD env var
- docs/assets/compose_ollama_cpu.yaml: require AUTH_TOKEN, POSTGRES_PASSWORD
and MinIO credentials via env instead of the weak literals
- document POSTGRES_PASSWORD in .env.example
The Helm chart's Postgres password is addressed in the N7 commit (moved to a
Secret) alongside the rest of the chart hardening.
Forward-ported from 0164b82 (main) into the infra/ layout.
The Milvus services disabled syscall filtering entirely (security_opt: seccomp:unconfined), widening container-escape surface. Remove the override so the default seccomp profile applies. The refactor split the compose Milvus into milvus.yaml + milvus.named-volumes.yaml, so the override is dropped from both plus quick_start/vdb/milvus.yaml. Forward-ported from b8002ef (main) into the infra/ layout.
…, M2) The Postgres password was rendered into the world-readable ConfigMap (env.config) and defaulted to root_password. Move POSTGRES_PASSWORD to env.secrets (rendered into the Opaque Secret, which the deployment already mounts via secretRef) and replace the default with an obvious placeholder that must be overridden at install time. Forward-ported from 24efaa6 (main) into infra/charts/openrag-stack/values.yaml.
…k8s)
The chart shipped no NetworkPolicy, so any pod in the cluster could reach the
unauthenticated Ray dashboard (8265), GCS (6379), Ray client (10001), Postgres
and Milvus. Add a default-deny-ingress NetworkPolicy (podSelector: {}) that
allows only intra-namespace traffic plus the configurable public HTTP ports
(8080, 3000). Gated by networkPolicy.enabled (default true). This is the
in-cluster isolation referenced by the C3 fix for the KubeRay dashboard.
Forward-ported from c8f2d47 (main) into infra/charts/openrag-stack/.
The metrics compose exposed Prometheus and exporters on 0.0.0.0 and enabled Prometheus' unauthenticated lifecycle endpoints. - Prometheus: bind 127.0.0.1:9090 and drop --web.enable-lifecycle (unauthenticated /-/reload and /-/quit). - node-exporter (host pid + rootfs) and nvidia-gpu-exporter: bind to 127.0.0.1. Prometheus scrapes them over the compose network by service name, so localhost binding doesn't affect scraping. Forward-ported from 5caec50 (main): openrag_metrics/docker-compose.yaml -> infra/compose/monitoring.docker-compose.yaml.
Tracking :latest makes deploys non-reproducible and silently pulls unreviewed images. Pin the OpenRAG-owned chart images to the release version and the metrics stack to specific stable versions: - charts: openrag, openrag-ray, indexer-ui -> 1.1.13 (this branch's release; main pinned 1.1.11) - metrics: prometheus v2.54.1, grafana 11.2.2, node-exporter v1.8.2 Operator-supplied model-serving images (vLLM engines, infinity reranker) carry a comment to pin to a specific release/digest before production. NOTE: the compose / quick_start openrag-image pins from the original commit are NOT ported — main reverted those to :latest in 64c3e72 (already the refactor state), so only the chart + metrics pins remain in main's final state. Forward-ported from d7fc313 (main): openrag_metrics -> infra/compose/ monitoring.docker-compose.yaml; charts -> infra/charts.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR updates authentication, partition handling, rate limiting, Ray startup, documentation, compose and Helm defaults, and several security-related service behaviors. It also adds a forward-port log entry. ChangesAPI and Service Security
Service-Layer Security and Content Processing
Documentation and Deployment
Forward-port Record
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…bypass When AUTH_MODE=token and AUTH_TOKEN was unset, the middleware silently treated every request as admin user 1 — a fail-open that turns a single missing env var into a world-open admin API. Gate this dev bypass behind an explicit ALLOW_NO_AUTH=true opt-in (with a loud warning); without it a missing AUTH_TOKEN no longer fails open and unauthenticated requests are denied. Tests added under tests/unit/api/middleware/test_bypass_config.py (the refactor has no equivalent of the legacy components/auth/test_middleware.py dev-bypass suite) using the existing dispatch harness. Forward-ported from 7bc4669 (main): openrag/components/auth/middleware.py -> openrag/api/middleware/auth.py (resolves admin via the injected auth service).
…ent) UserUpdate used extra="allow" and the update payload flowed into the repo's broad column setter. A caller could set columns never meant to be writable via the update — e.g. token (overwrite a victim's auth token to a known value), file_count (bypass quota), or id. - core/models/user.py: switch UserCreate/UserUpdate to extra="ignore" so unknown fields are dropped at parse. - services/orchestrators/user_service.update_user: whitelist writable profile fields (display_name, external_user_id, email, is_admin, file_quota) before calling the repo. The repo's update_user still accepts token/file_count for internal callers (token rotation, quota), so the boundary guard lives in the service rather than tightening the shared repo method. Forward-ported from 202433d (main). Partial on arrival: the refactor already had a column whitelist in the repo, but it included token/file_count and the DTOs still allowed extra fields; this closes both.
…ex collision (#121) An empty/whitespace external_user_id written to the unique index collides (Postgres permits many NULLs but only one empty string). Add a UserBase field_validator that coerces "" / whitespace to None. The refactor already normalized this on the create path in the repo (user_repo.create_user), but not on update; placing the validator on the shared UserBase DTO covers both UserCreate and UserUpdate. Forward-ported from edd2c7c (main): openrag/models/user.py -> openrag/core/models/user.py.
Source file links embedded the credential as ?token=<api_key>, leaking it to browser history, proxy logs and Referer headers. In OIDC mode the browser already sends the openrag_session cookie on same-origin file fetches (the auth middleware checks the cookie first for /static), so the token query param is redundant — drop it there. Token mode keeps it (no cookie exists). Forward-ported from 229503b (main): openrag/app_front.py. Note: the sibling H7 commit 47b8cd3 (fail-fast on missing CHAINLIT_AUTH_SECRET) was NOT ported — the refactor already fail-fasts (stricter, #380/b63a9825) and guards it with a regression test; 47b8cd3 only re-adds an ALLOW_NO_AUTH-gated default-secret fallback, a loosening the refactor deliberately rejected.
The N8 fix (don't bind-mount source in prod) comments out the ../../openrag:/app/openrag dev mount, so the default compose no longer lists it. Assert it is absent rather than present. Follow-up to the forward-port of 645128d (N8).
- uv.lock: bump the openrag package version pin to match pyproject (1.1.13). - user_service.py: ruff-format the _UPDATABLE_USER_FIELDS constant onto one line.
Record ported commits (source hash → new-layout target), skipped commits (already present / superseded / deliberately not loosened), and the remaining TODO queue (OIDC crypto, RAG/retrieval batch, deps, deferred compose non-root + docs) so the effort is auditable and resumable.
…out (M9) OIDC back-channel logout tokens are short-lived single-use tokens, but the verifier defaulted a missing exp to "now + 1" (so an absent exp never expired) and ignored jti entirely. - verify_logout_token now requires exp (and rejects expired tokens) and requires jti, both per the OIDC back-channel logout spec; exp is surfaced on LogoutTokenClaims. - AuthService.handle_backchannel_logout records consumed jti -> exp in a pruned in-process cache and rejects replays (defence-in-depth; logout is idempotent so this complements rather than replaces the DB revocation). The refactor routes the handler through the service, so the cache lives there (not the router as on main). Forward-ported from 97c624e (main): openrag/components/auth/oidc_client.py -> services/auth/oidc_client.py; routers/auth.py replay cache -> AuthService. The test follow-up cdb3edc (exp on the logout test token) is subsumed: the refactor's service tests use a fake client, and the added replay test carries exp.
…cation (crypto) ID-token (and logout-token) exp was checked with zero tolerance and nbf was not validated at all. Add a 60s clock-skew leeway to the exp checks and honour nbf (reject tokens whose not-before is in the future beyond the leeway). Forward-ported from 2b34a0d (main): services/auth/oidc_client.py.
GET /auth/logout is state-changing (revokes the session) and cookie-authed, so a forged request (e.g. <img src=".../auth/logout">) could force-logout a user. It must stay a GET to support the OIDC RP-initiated logout redirect, so add a Fetch-Metadata guard that rejects cross-site non-navigation requests (the silent CSRF vector) while allowing genuine top-level navigations and same-origin calls. Forward-ported from c2fde13 (main): routers/auth.py -> api/routers/auth/oidc.py.
…486) Rotating a user's API token left their active OIDC browser sessions valid, so a rotation didn't fully cut off access. After regenerate_token, revoke the user's OIDC sessions. The refactor already had the other half of the original commit: ensure_admin_user no longer rotates the stored token on startup when AUTH_TOKEN is unset, and the OIDC session repo already exposes revoke_by_user. This wires that revocation into the token-regen flow via a thin AuthService.revoke_user_oidc_sessions, called by UserService.regenerate_token (which already composes AuthService). Forward-ported from 9a73200 (main): components/indexer/vectordb/utils.py -> services/orchestrators/{auth_service,user_service}.py.
|
The empty-partition fix still has one hole. The new guard checks the dependency value for So this needs to fail closed for the omitted-parameter case too, not only for an explicit |
|
The auth-failure limiter is closer, but it still runs too late for invalid tokens. For a bad bearer token, the middleware first asks the auth service to resolve the token, and only after that lookup fails does it increment/check the failure limiter. Once the IP is over limit, later bad-token requests still perform the same token lookup before returning 429. That means the route is protected, but the auth lookup path is not. If the goal is to bound brute-force or noisy invalid-token traffic, the limiter needs a pre-check before the session/user-token lookup, then record the failure after the lookup only when it was actually invalid. |
|
One smaller docs mismatch remains. The API docs still describe That mismatch will lead users to think they can route a RAG request to their own LLM endpoint when the server will keep using the configured backend. The docs should describe the new behavior and update the example accordingly. |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openrag/services/persistence/partition_repo.py (1)
50-78: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy liftDon't treat a race-lost create as a successful create.
This flow can return an existing row when another request wins the same-name create first.
PartitionService.create_partition()then continues intoupdate_partition(...), so the losing request can overwrite preset/config fields on a partition it did not create instead of gettingPARTITION_EXISTS. The advisory lock here is keyed only byuser_id, so concurrent creates from a different user/admin/no-auth path still reach this branch.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openrag/services/persistence/partition_repo.py` around lines 50 - 78, The create flow in partition_repo.py is treating a race-lost lookup as a successful creation, which lets PartitionService.create_partition() proceed into update_partition() for a partition it did not win. Update the create path around the existing fetchrow/insert logic so that if the initial SELECT finds an existing partition, the method raises the same PARTITION_EXISTS error instead of returning that row; only the INSERT branch should represent a successful create. Also review the advisory lock keyed by user_id so it does not mask cross-user concurrent creates from reaching this branch.
🧹 Nitpick comments (1)
openrag/api/routers/admin/partitions.py (1)
286-293: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winBind context on the quota-config failure log.
This new error path logs an unstructured message, so the bad setting and caller context are harder to trace in production.
♻️ Proposed fix
try: max_owned = max_partitions_for_user(user) except ConfigError as exc: - logger.error("Invalid MAX_PARTITIONS_PER_USER value") + logger.bind( + config_key="MAX_PARTITIONS_PER_USER", + user_id=user_id, + ).error("Invalid MAX_PARTITIONS_PER_USER value") raise HTTPException( status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=exc.message, ) from excAs per coding guidelines, "Use Loguru structured logging via
get_logger()and bind contextual fields instead of emitting unstructured logs."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openrag/api/routers/admin/partitions.py` around lines 286 - 293, The quota-config failure path in the partition admin handler logs an unstructured message; update the ConfigError branch in the partitions router to use Loguru via get_logger() and bind contextual fields instead of logger.error. Include the invalid MAX_PARTITIONS_PER_USER value and relevant caller context in the bound log entry so the failure can be traced consistently while still raising the existing HTTPException with exc.message.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/content/docs/documentation/user_auth.md`:
- Around line 13-15: Update the auth documentation bullets in user_auth.md so
the fail-closed behavior is explicitly tied to token mode only: in the section
describing AUTH_TOKEN, AUTH_MODE, and ALLOW_NO_AUTH, clarify that missing
AUTH_TOKEN causes OpenRAG to fail closed only when AuthMiddleware is running
with AUTH_MODE=token, while OIDC mode does not use AUTH_TOKEN. Keep the wording
aligned with the existing auth symbols AUTH_TOKEN, AUTH_MODE, and ALLOW_NO_AUTH,
and make the same clarification in the duplicated section referenced by the
review.
In `@openrag/api/middleware/auth.py`:
- Around line 127-128: The warning in the auth rate-limit path logs
request-derived fields directly and bypasses the repo’s structured logging
pattern. Update the auth middleware to use get_logger() with bound contextual
fields instead of passing raw path/identity kwargs, and normalize/sanitize
request.url.path and identity before binding them in the Auth failure rate limit
exceeded log. Keep the logging call in the same failure branch so the new bound
logger is used consistently with the rest of the module.
- Around line 108-111: The AuthMiddleware initialization currently parses the
rate limit unconditionally, so a malformed RATE_LIMIT_AUTH_FAILURE or
RATE_LIMIT_AUTH can raise ValueError even when RATE_LIMIT_ENABLED is false.
Update AuthMiddleware.__init__ to only call parse() and assign _limit when
self.enabled is true, otherwise set _limit to None. Then update
response_if_limited to safely handle a None _limit case by skipping limiter
logic and returning the normal response path when the limiter is disabled.
In `@openrag/services/orchestrators/indexing_service.py`:
- Around line 147-152: The ownership cap in indexing is being recomputed from a
partial user dict, which drops the caller’s admin status and can make
auto-created partitions hit the limit. Update the `IndexingService.add_file()`
flow and the `create_partition` call path so `is_admin` is preserved from the
caller (or `max_owned` is passed explicitly) instead of rebuilding quota state
from `user`; make sure `MCPService.index_url()`’s user payload still results in
admin-cap behavior for super-admin uploads.
In `@openrag/services/orchestrators/mcp_service.py`:
- Around line 186-187: The auto-create quota path in _ensure_partition_exists()
is rebuilding cap_user with is_admin=False for authenticated callers, which
downgrades admins and applies the wrong partition cap. Use the admin flag
already resolved by _resolve_principal() when constructing the user passed to
max_partitions_for_user(), or compute max_owned before calling
self._partitions.create_partition() so MCP matches the behavior in
openrag/api/routers/admin/partitions.py.
---
Outside diff comments:
In `@openrag/services/persistence/partition_repo.py`:
- Around line 50-78: The create flow in partition_repo.py is treating a
race-lost lookup as a successful creation, which lets
PartitionService.create_partition() proceed into update_partition() for a
partition it did not win. Update the create path around the existing
fetchrow/insert logic so that if the initial SELECT finds an existing partition,
the method raises the same PARTITION_EXISTS error instead of returning that row;
only the INSERT branch should represent a successful create. Also review the
advisory lock keyed by user_id so it does not mask cross-user concurrent creates
from reaching this branch.
---
Nitpick comments:
In `@openrag/api/routers/admin/partitions.py`:
- Around line 286-293: The quota-config failure path in the partition admin
handler logs an unstructured message; update the ConfigError branch in the
partitions router to use Loguru via get_logger() and bind contextual fields
instead of logger.error. Include the invalid MAX_PARTITIONS_PER_USER value and
relevant caller context in the bound log entry so the failure can be traced
consistently while still raising the existing HTTPException with exc.message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 24329baa-b831-4726-9fe6-7a43bbd1dae1
📒 Files selected for processing (26)
docs/assets/compose_ollama_cpu.yamldocs/content/docs/documentation/API.mdxdocs/content/docs/documentation/env_vars.mddocs/content/docs/documentation/user_auth.mdinfra/compose/.env.exampleopenrag/api/dependencies/auth.pyopenrag/api/dependencies/llm.pyopenrag/api/mcp/server.pyopenrag/api/middleware/auth.pyopenrag/api/routers/admin/partitions.pyopenrag/core/ports/partition_repo.pyopenrag/core/utils/partition_limits.pyopenrag/services/orchestrators/indexing_service.pyopenrag/services/orchestrators/mcp_service.pyopenrag/services/orchestrators/partition_service.pyopenrag/services/persistence/partition_repo.pytests/unit/api/dependencies/test_auth.pytests/unit/api/dependencies/test_llm.pytests/unit/api/mcp/test_server.pytests/unit/api/middleware/test_bypass_config.pytests/unit/infra/test_compose_storage.pytests/unit/services/orchestrators/test_indexing_service.pytests/unit/services/orchestrators/test_mcp_service.pytests/unit/services/orchestrators/test_partition_preset_resolution.pytests/unit/services/orchestrators/test_partition_service.pytests/unit/services/persistence/test_partition_repo.py
✅ Files skipped from review due to trivial changes (2)
- docs/content/docs/documentation/API.mdx
- tests/unit/services/orchestrators/test_indexing_service.py
🚧 Files skipped from review as they are similar to previous changes (2)
- infra/compose/.env.example
- tests/unit/infra/test_compose_storage.py
|
Concurrent partition auto-creation still needs one guard. When two first uploads target the same missing partition at the same time, both requests can pass the missing-partition check and then race into partition creation. The loser can now fail with a partition-already-exists error, even though the partition is available and the upload could safely continue. This path should treat that specific conflict as a successful create by refreshing the partition cache and continuing. Otherwise a normal concurrent upload can become a user-visible indexing failure. |
|
The named-volume Milvus profile now asks for a different MinIO credential pair than the main compose example documents. The default The profile should either reuse the documented variables or the named-volume env example needs to clearly define the extra required values. |
|
Back-channel logout should not require Some OIDC providers send spec-compliant back-channel logout tokens without
|
…cess The partition auto-create race guard added for #568 re-verifies editor membership after a concurrent create lost the race. It raised a builtin `PermissionError`, which the global exception handlers don't map (only `OpenRAGError` subclasses get their declared status) — so the catch-all turned a legitimate authorization denial into an HTTP 500. Raise `AuthError` (status 403) instead, so a cross-user create race that the loser isn't a member of returns a clean Forbidden. Same-user concurrent uploads are unaffected (they own the partition and continue). Test asserts the raised error carries `status_code == 403`.
|
@coderabbitai review |
✅ Action performedReview finished.
|
In Ray Serve mode the API and Chainlit run on separate ports (the API via
`serve.run`, Chainlit via its own `uvicorn.run` on `chainlit_port`). Source
previews rewrite their file download links to the browser origin — which in
that topology is the Chainlit host, not the API host. But the standalone
Chainlit app (`chainlit_api.app`) only mounted the Chainlit UI, so the
authorized `/static/{extract_id}` download route was absent there and every
PDF/image/audio source preview 404'd.
Mount `download_router` on the standalone Chainlit app so the rewritten
links resolve. The route reuses the same `AuthMiddleware` (cookie in OIDC
mode — cookies aren't port-scoped, so the session cookie set by the API is
sent to the Chainlit port too — and `?token=` in token mode) and resolves
its services from the container the middleware initializes. The mounted
(`/chainlit`) deployment is unaffected: the UI shares the API origin, which
already serves this route.
Adds a regression test asserting the standalone app exposes the route.
|
@coderabbitai review |
✅ Action performedReview finished.
|
hedhoud
left a comment
There was a problem hiding this comment.
I checked the detached Ray actor compatibility concern against the current PR head (18050a2). I don’t think that P1 applies to this version of the PR.
The current dispatcher still routes indexing through the existing pool method, so a leftover detached IndexerPool from a previous rollout would not hit the described submit/process_file mismatch. Also, the worker pool files mentioned in that concern are not part of the current PR diff against refactor/hexagonal.
I also reran the worker-focused tests around dispatcher/pool/bootstrap behavior, and they pass locally. So from this specific angle, I don’t see a live-upgrade blocker in the current head.
…p priv Builds on the APP_UID build arg: matching the host UID isn't enough on its own, because Docker creates a missing bind-mount source (data/, logs/, the HF cache) as root:root, which the non-root container still can't write — so a fresh deploy fails with "[Errno 13] Permission denied" on upload, logging, or model download. Make it work with zero host-side steps: - entrypoint.sh: when started as root, chgrp 0 + chmod g+w the writable bind mounts (mirroring the image's own chmod g=u), then drop to the non-root app user via setpriv before launching. A non-root start (OpenShift's arbitrary UID in GID 0) skips the fix and runs directly. - docker-compose.yaml: run the openrag service as user "0:0" so the entrypoint can fix perms before dropping privileges. - Dockerfile: ENV APP_UID so the entrypoint knows which UID to drop to; USER stays non-root for standalone/OpenShift builds.
…get-or-create Two failures surfaced with ENABLE_RAY_SERVE=true: 1. @serve.ingress(app) cloudpickles the FastAPI app to ship it to replica processes. loguru's file sink isn't picklable (an open file handle, and with enqueue=True a multiprocessing.SimpleQueue), and the app graph (lifespan, exception handlers) captures the module-global logger by value — so binding crashed with "SimpleQueue objects should only be shared between processes through inheritance". Strip the logger's sinks in the driver before binding (a handler-less logger pickles fine) and restore them right after; replica processes re-add their own sinks via the module-level get_logger() that runs on import, matching loguru's per-process handler model. 2. With multiple replicas, each runs worker bootstrap in its lifespan and they raced on get_or_create_actor: get_actor() returned not-found, then create collided with a concurrent replica that created it first, crashing startup with "Failed to look up actor 'llmSemaphore'". Use Ray's atomic get_if_exists=True so concurrent callers all attach to a single actor.
In AUTH_MODE=oidc, /docs, /redoc and /openapi.json no longer serve the full route + schema surface to anonymous callers: an unauthenticated browser is 302-redirected to /auth/login, while a valid session (or a users.token bearer) still renders them. Token mode is unchanged — docs stay public there, since a browser has no login flow to bounce through and the legacy bypass contract must hold; health/version probes and API routes are unaffected. The gated set is AuthBypassConfig.oidc_gated_paths (default the three docs paths); override it to () to keep docs public under OIDC.
In Ray Serve mode the standalone Chainlit app runs in its own uvicorn process, separate from the API replicas that own the initialized ServiceContainer. It lazily built a container but never called initialize(), so its asyncpg pool was never opened. In token mode the /chainlit bypass never touches the DB, but in OIDC mode the AuthMiddleware does a session lookup for /chainlit/ HTML loads when a session cookie is present — hitting the un-opened pool and raising "ConnectionManager.initialize() has not been called" (500 on every page load once the user holds an openrag_session cookie). Give the standalone app its own lifespan that builds and initializes the container (best-effort, degrading to 503 like api.main), and add a regression test for the cookie-present OIDC path.
What
Forward-ports the changes that landed on
mainafter the refactor branch point onto the new hexagonal layout (core/ services/ api/ di/+infra/). A literalgit rebasewas the wrong tool (it would replay the refactor's ~465 commits onto main); instead eachmaincommit is re-expressed against the new file structure, skipping anything already fixed and obviating anything the new architecture no longer needs.Of the 68 non-merge commits on
mainsince the merge-base, this branch lands 59 commits: every source commit is ported, obviated (with analysis), already-present, or (one) deliberately deferred. Each commit is 1:1 with itsmainsource and carries aForward-ported from <hash>trailer. Full per-commit accounting (including new-vs-old file mapping and skip reasons) is inFORWARD_PORT_LOG.md.By area
RAY_ADDRESS), non-root OpenShift Dockerfiles, no prod bind-mount /--reload, MinIO/DB/AUTH default removal, Helm DB-password→Secret + default-deny NetworkPolicy, seccomp drop, metrics lockdown, image-tag pins, Ansible hardening, version → 1.1.13.ALLOW_NO_AUTHgate on the no-token admin bypass,update_usermass-assignment whitelist,external_user_id→NULL, no session-token in UI file URLs, back-channel-logout exp/jti + replay cache, clock-skew/nbf, logout CSRF (Fetch-Metadata), OIDC-session revocation on token regen.llm_overridecredential/SSRF strip, control-token neutralizer, surrounding-chunk partition scoping (cross-tenant leak), token-limit +n/best_ofbounds, stack-trace leak, partition-scoped source-download authz (the open/staticmount was still live), EML attachment-fan-out cap, web/SVG SSRF hardening, cap-partitions-per-user, copy-endpoint validation, streamingfinish_reasonfix, non-empty-answer prompt rule.limits).CHAINLIT_PORTnote, OIDC/SSO quick-start frontmatter + pointer fixes.Obviated (verified, with analysis — not a live problem on this branch)
199424bfempty-stream 502 → non-streaming uses materializedchat()/generate()+InferenceError(502); no__anext__path.70a2db36doc-loader page bug →CustomDocLoaderremoved by the parser-shim cleanup.63a857afimage-URL SSRF → the refactor captions extracted bytes; a document's remote image URL is never fetched nor forwarded to the VLM.Several ports involved real judgment where the refactor differed (output-escaping vs. input-allowlist for Milvus, repo-abstraction vs. atomic-session for the partition cap, the source-download rekeyed to chunk-id, and not re-loosening the refactor's stricter
CHAINLIT_AUTH_SECRETpolicy). See the log for rationale.Not included — one item, by deliberate choice
4bbefd41compose non-root rework (init-perms bootstrap container + per-serviceuser:mappings). It's all-or-nothing, runtime-only (no CI/unit feedback — compose isn't exercised in tests), depends on fragile path-mapping ontoinfra/compose/, and is the least security-critical item (the images already build non-root via the Dockerfiles ported here). Committing it on faith risks a silently broken stack, so it's left for a follow-up that candocker compose upto verify. Target files + notes are inFORWARD_PORT_LOG.md.Verification
tests/unit/— 1355 passed. (One failure,test_seed_defaults_preserves_endpoint_api_keys, is an environmental artifact of the local nested-worktree.env, not a code defect — passes in a non-nested checkout / CI.)create_legacy_usersignature) is a pre-existing base-branch bug untouched by this PR.429/per-tier budgets, logout CSRF403/302, download authz403/404.… Auth → RateLimit → route),download_sourceroute present, open/staticmount removed.ruff check+ruff format --checkclean; layer-import guard OK.🤖 Generated with Claude Code
Summary by CodeRabbit