refactor(ray): drop obsolete indexer env vars, make Milvus timeout configurable#566
Conversation
…nfigurable
The hexagonal indexer refactor left several Ray/indexer config knobs with no
remaining consumers. Remove them, restore a real multi-actor indexer pool, and
expose the previously-hardcoded Milvus client timeout as config.
Removed env vars (concern re-homed or dead):
- RAY_NUM_GPUS — wired to no actor on either branch
- RAY_MAX_TASK_RETRIES — retries now per-parser (retry_with_backoff)
- INDEXER_SERIALIZE_TIMEOUT — superseded by per-parser MARKER/DOCLING timeouts
- VECTORDB_TIMEOUT — replaced by VDB_TIMEOUT
- RAY_SEMAPHORE_CONCURRENCY — real throttles are LLM_SEMAPHORE / VLM_SEMAPHORE
- INDEXER_{DEFAULT,UPDATE,SEARCH,DELETE,SERIALIZE,CHUNK,INSERT}_CONCURRENCY
— concurrency groups are inert now that the indexer exposes a single method
Added:
- VDB_TIMEOUT (vectordb.timeout, default 120s) replaces the hardcoded
MilvusVectorStore client timeout (was 60s), on both sync and async clients
Pool:
- RAY_POOL_SIZE now spawns pool_size IndexerWorkerActor instances with
least-loaded dispatch (ObjectRef preserved for cancellation + task state);
pool_size and max_tasks_per_worker move under ray.indexer with ge=1
validation. Env var names are unchanged.
Also drops redundant *_API_KEY os.getenv double-reads in model-endpoint
seeding — the loader already maps those vars into Settings.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR adds ChangesConfiguration, docs, and runtime wiring
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
hedhoud
left a comment
There was a problem hiding this comment.
I found two things I’d like to address before merging.
First, the new pool balancing looks local to each API process. That means with multiple API_NUM_WORKERS, each process keeps its own view of in-flight work. During a burst, several API workers can all think the same indexer actor is the least loaded and send work there, while the rest of the pool stays underused. Since the PR’s main goal is to restore real multi-actor indexing capacity, I think the dispatch state should either live in a shared Ray-side component, or use a distribution strategy that does not depend on per-process counters.
Second, the env var documentation still looks inconsistent after the cleanup. RAY_MAX_TASKS_PER_WORKER appears with two different defaults, and RAY_SEMAPHORE_CONCURRENCY is still documented even though this PR removes it from config loading. Because this PR is intentionally removing obsolete knobs, keeping stale docs will make migration harder for deployments.
The targeted tests pass locally and CI is green, so this looks close. I’d just like these two points handled or explicitly clarified before merge.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/content/docs/documentation/env_vars.md (1)
379-400: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winKeep the Ray defaults consistent across both tables.
RAY_MAX_TASKS_PER_WORKERis documented as8in the General Ray Settings table and50in the Indexer Configuration table. Please pick one source of truth or remove the duplicate row, otherwise readers get conflicting defaults for the same env var.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/content/docs/documentation/env_vars.md` around lines 379 - 400, The documentation has conflicting defaults for the same env var, specifically RAY_MAX_TASKS_PER_WORKER in the Ray settings table and the Indexer Configuration table. Make the value consistent by choosing one source of truth and updating the duplicate row in env_vars.md, or remove the redundant entry so the README only documents a single default for RAY_MAX_TASKS_PER_WORKER.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/content/docs/documentation/deploy_ray_cluster.md`:
- Around line 56-57: Update the sizing guidance in the deployment docs so it
consistently refers to indexing concurrency via RAY_POOL_SIZE and
RAY_MAX_TASKS_PER_WORKER, and remove or reword the leftover GPU memory caution
so it no longer suggests subtracting GPU memory from the pool size. Keep the
guidance aligned with the Ray cluster sizing section by adjusting the
surrounding prose in the deploy_ray_cluster document rather than introducing
mixed CPU/GPU budgeting rules.
In `@openrag/core/config/infrastructure.py`:
- Around line 23-24: The VectorDBConfig timeout field currently allows zero or
negative values, so tighten validation at the model level by adding a
positive-only Field constraint to timeout in VectorDBConfig. Update the timeout
declaration in infrastructure.py so config loading fails immediately for
non-positive values instead of deferring the error until Milvus client usage.
In `@openrag/services/workers/indexer_pool.py`:
- Around line 252-255: The worker selection logic in IndexerPool currently
increments _inflight before calling process_file.remote, but if that remote
submission fails the counter is never rolled back. Update the submission path in
the method that builds ref/task to wrap
self._workers[idx].process_file.remote(**kwargs) in a try/except, and on failure
decrement self._inflight[idx] before re-raising so load balancing stays
accurate.
- Around line 275-282: The IndexerWorkerActor creation path is silently reusing
detached actors via get_if_exists=True, so updated max_concurrency values are
ignored for existing actors. In the worker pool setup that builds the workers
list, change the IndexerWorkerActor.options/.remote flow to detect when an actor
already exists with a stale concurrency setting and explicitly shut it down
before recreating it, or otherwise ensure a fresh actor is created when
max_concurrency changes. Keep the fix centered around the IndexerWorkerActor and
the worker initialization logic in the pool.
---
Outside diff comments:
In `@docs/content/docs/documentation/env_vars.md`:
- Around line 379-400: The documentation has conflicting defaults for the same
env var, specifically RAY_MAX_TASKS_PER_WORKER in the Ray settings table and the
Indexer Configuration table. Make the value consistent by choosing one source of
truth and updating the duplicate row in env_vars.md, or remove the redundant
entry so the README only documents a single default for
RAY_MAX_TASKS_PER_WORKER.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ef98b9ea-a967-40b9-a0dc-394ec1e0e092
📒 Files selected for processing (16)
conf/config.yamldocs/content/docs/documentation/deploy_ray_cluster.mddocs/content/docs/documentation/env_vars.mdinfra/charts/openrag-stack/values.yamlinfra/compose/.env.ollamaopenrag/core/config/infrastructure.pyopenrag/core/config/loader.pyopenrag/services/orchestrators/model_endpoint_service.pyopenrag/services/storage/milvus_store.pyopenrag/services/workers/dispatcher.pyopenrag/services/workers/indexer_pool.pyopenrag/services/workers/task_state.pytests/unit/core/config/test_ray_config.pytests/unit/core/config/test_vectordb_config.pytests/unit/services/workers/test_dispatcher.pytests/unit/services/workers/test_indexer_pool.py
💤 Files with no reviewable changes (2)
- infra/compose/.env.ollama
- infra/charts/openrag-stack/values.yaml
|
Small clarification on my earlier pool-balancing comment: the concern is not specifically I agree that Right now the Example:
If several uploads arrive at the same time, each API replica can independently decide that So the concern is about local dispatch state across multiple API replicas. Removing |
The VectorDBConfig.timeout field accepted 0/negative values, deferring the failure to Milvus client usage. Constrain it with gt=0 so config loading fails immediately.
IndexerPool.submit incremented _inflight before calling process_file.remote. If that submission raised (unserializable args, dead actor), the count stayed elevated and skewed least-loaded dispatch. Decrement and re-raise on failure.
RAY_MAX_TASKS_PER_WORKER was documented with two defaults (8 and 50); the code default is 50, so correct the General Ray Settings row and drop the duplicate Indexer Configuration table. Also remove the stale RAY_SEMAPHORE_CONCURRENCY section — this var no longer exists after the cleanup.
The GPU-memory caution still told readers to subtract GPU memory from the pool size, but indexer workers no longer reserve GPU after the RAY_NUM_GPUS removal. Reword it to route GPU budgeting through the parser (MARKER_NUM_GPUS) and model-server settings, keeping pool sizing about indexing throughput.
The client-side IndexerPool kept per-replica in-flight counters, so multiple API processes / Ray Serve replicas each balanced load against their own view of the shared workers and could pile onto one worker during bursts. Make IndexerPool a single detached named actor (shared cluster-wide via get_if_exists) that owns the worker fleet and the in-flight counts, so least-loaded dispatch is global. submit returns the worker's ObjectRef wrapped in a one-element list so ray.cancel still targets the real task (a bare returned ref can be auto-dereferenced and block the dispatch).
|
Thanks @hedhoud — both points are addressed: 1. Pool balancing state was local per replica. You're right. I moved the dispatch state off the API side into a single detached, named On 2. Env-var docs. Fixed in 75b639a: |
Resolved conflicts: - indexer_pool.py: keep the new dispatcher-actor structure (IndexerWorkerActor + shared IndexerPool dispatcher) and fold in hexagonal's named embedder/VLM endpoint support (vlm_factory, dict-based required-model-name resolution, _global_embedder/_global_vlm fallbacks). - model_endpoint_service.py: keep hexagonal's batch_size/timeout seed fields together with the single-read api_key cleanup (no os.getenv double-read).
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit/services/workers/test_indexer_pool.py`:
- Around line 831-839: The async test leaves `_release` tasks pending because
`pool.submit(...)` schedules work that awaits the fake futures, so the event
loop can be torn down with unfinished tasks. In the test around `pool.submit`
and `ref0`, make sure the fake futures are completed and then await the release
tasks before the test exits, so all scheduled cleanup finishes cleanly. Apply
the same cleanup pattern in the other affected submit assertions as well, using
the existing `workers`, `pool.submit`, and future references already captured in
the test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 422e639a-b1f7-4548-bc8f-991a260fa7b9
📒 Files selected for processing (10)
docs/content/docs/documentation/deploy_ray_cluster.mddocs/content/docs/documentation/env_vars.mdinfra/charts/openrag-stack/values.yamlopenrag/core/config/infrastructure.pyopenrag/services/orchestrators/model_endpoint_service.pyopenrag/services/workers/dispatcher.pyopenrag/services/workers/indexer_pool.pytests/unit/core/config/test_vectordb_config.pytests/unit/services/workers/test_dispatcher.pytests/unit/services/workers/test_indexer_pool.py
✅ Files skipped from review due to trivial changes (1)
- tests/unit/services/workers/test_dispatcher.py
🚧 Files skipped from review as they are similar to previous changes (4)
- infra/charts/openrag-stack/values.yaml
- openrag/core/config/infrastructure.py
- openrag/services/orchestrators/model_endpoint_service.py
- docs/content/docs/documentation/deploy_ray_cluster.md
|
One live-upgrade issue here is the same kind of Ray detached-actor compatibility concern we avoided in PR #568. In #568, the detached This should use a new detached actor name for the new dispatcher shape, or add a small compatibility/migration guard before reusing the existing actor. |
Summary
The hexagonal indexer refactor left several Ray/indexer config knobs with no remaining consumers. This PR removes the dead vars, restores a real multi-actor indexer pool, and makes the Milvus client timeout configurable.
Removed env vars
Each removed var's responsibility either moved elsewhere or never existed in this branch:
RAY_NUM_GPUSMARKER_NUM_GPUS/DOCLING_NUM_GPUS.RAY_MAX_TASK_RETRIESretry_with_backoff.INDEXER_SERIALIZE_TIMEOUTMARKER_TIMEOUT,DOCLING_TIMEOUT, …).VECTORDB_TIMEOUTVDB_TIMEOUT(see below).RAY_SEMAPHORE_CONCURRENCYLLM_SEMAPHORE/VLM_SEMAPHORE(SemaphoreConfig).INDEXER_DEFAULT_CONCURRENCY,INDEXER_UPDATE_CONCURRENCY,INDEXER_SEARCH_CONCURRENCY,INDEXER_DELETE_CONCURRENCY,INDEXER_SERIALIZE_CONCURRENCY,INDEXER_CHUNK_CONCURRENCY,INDEXER_INSERT_CONCURRENCYconcurrency_groupsonly partition concurrent calls across distinct actor methods. Post-refactor the indexer exposes a single method, so the groups are inert; contention is now handled by physical worker separation.Added
VDB_TIMEOUT(vectordb.timeout, default 120s) — replaces the hardcoded 60sMilvusVectorStoreclient timeout, applied to both the sync and async Milvus clients.Pool restored
RAY_POOL_SIZEnow spawnspool_sizeIndexerWorkerActorinstances with least-loaded dispatch (theObjectRefis preserved, soray.canceland task-state tracking keep working).pool_sizeandmax_tasks_per_workermoved underray.indexerwithge=1validation. Env var names are unchanged (RAY_POOL_SIZE,RAY_MAX_TASKS_PER_WORKER) — only the internal config path moved.Also (minor)
Drops redundant
*_API_KEYos.getenvdouble-reads in model-endpoint seeding. The config loader already mapsAPI_KEY/EMBEDDER_API_KEY/VLM_API_KEY/RERANKER_API_KEYontoSettings, soseed_defaultsreads each once vias.<type>.api_key. This also makes seeding deterministic when a local.envis loaded into the process.Migration
Deployments that set any removed var should drop it. The only rename is
VECTORDB_TIMEOUT(default 30s) →VDB_TIMEOUT(default 120s).Testing
uv run python -m pytest tests/unit/→ 1340 passeduv run ruff check→ cleantests/unit/core/config/test_ray_config.py,tests/unit/core/config/test_vectordb_config.pySummary by CodeRabbit