fix(embedder): plumb max_model_len into model-endpoint embedders (default 2047) by Ahmath-Gadji · Pull Request #587 · linagora/openrag

Ahmath-Gadji · 2026-06-26T15:53:40Z

Problem

The indexer (and API) embedders are built from the DB-backed model-endpoint config, whose extra carries only implementation + api_key. Because max_model_len is dropped, VLLMEmbedder is constructed with max_model_len=None. That disables the truncate_prompt_tokens guard, so pooling models such as Qwen3-Embedding then return HTTP 400 or hang on inputs at the context boundary (vllm-project/vllm#29496).

Setting MAX_MODEL_LEN did not help: that env var only feeds the static embedder settings, not the named model-endpoint embedders the indexer/API actually use. The indexer also has its own factory (_build_embedder_factory), separate from the API container's make_component_factory, so each path dropped the value independently.

Observed in the wild: a litellm-proxied endpoint returned "ContextWindowExceededError: This model's maximum context length is 2048 tokens. However, you requested 1000003 tokens." Sending truncate_prompt_tokens fixes it.
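The truncate_prompt_tokens guard can be sketched as a small helper that turns max_model_len into an extra request field. This is a minimal sketch under stated assumptions: `truncation_params` is a hypothetical helper, not the VLLMEmbedder API; the request body is a plain dict; "Qwen3-Embedding" is a placeholder model name; and the `max_model_len - 1` rule is only inferred from the end-to-end check in the Tests section (extra.max_model_len=1000 gave truncate_prompt_tokens=999).

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def truncation_params(max_model_len: Optional[int]) -> dict:
    """Build extra request fields that cap prompt length (hypothetical helper).

    Returns an empty dict when max_model_len is unset, which reproduces the bug:
    no truncate_prompt_tokens is sent, so over-long inputs reach the server as-is.
    """
    if max_model_len is None:
        # Guard disabled: the server sees the full input and may reject it or hang.
        logger.warning("max_model_len is unset; embedder inputs will not be truncated")
        return {}
    if max_model_len < 2:
        raise ValueError("max_model_len must be at least 2")
    # Assumption: keep one token of headroom below the limit, matching the
    # observed max_model_len=1000 -> truncate_prompt_tokens=999.
    return {"truncate_prompt_tokens": max_model_len - 1}


# Usage: merge the guard into the JSON body of an embeddings request.
payload = {"model": "Qwen3-Embedding", "input": ["some long text"]}  # placeholder values
payload.update(truncation_params(2047))
```

With the guard in place, the server truncates over-long inputs instead of rejecting them; with max_model_len=None the helper sends nothing extra, which is exactly the failure mode described above.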

Changes

- Backfill max_model_len / embed_concurrency from the static embedder settings in both factories: di/container.py (_embedder_extra_kwargs) and services/workers/indexer_pool.py (_build_embedder_factory). An explicit per-endpoint extra value still wins, so the value stays presettable per endpoint.
- Default max_model_len 8192 → 2047 (one below the common 2048 boundary) so a fresh deployment is safe out of the box.
- Minimal logging: one DEBUG line at embedder construction (showing max_model_len), a WARNING when max_model_len is unset, and a WARNING when a batched embed fails (with the number of completed batches). No per-batch or INFO noise.
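The precedence rule in the first bullet (endpoint extra first, static settings fill the gaps) can be sketched with a stdlib dataclass standing in for the settings object. `EmbedderSettings`, `backfill_embedder_kwargs`, and the rule that an explicit None counts as missing are assumptions for illustration; the PR's real helpers are `_embedder_extra_kwargs` and `_build_embedder_factory`, whose signatures are not shown here. The positive-value check mirrors the "positive constraint" described for `EmbedderConfig.max_model_len`, though the real config may enforce it differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class EmbedderSettings:
    """Hypothetical stand-in for the static `embedder` settings."""

    max_model_len: int = 2047  # one below the common 2048-token boundary
    embed_concurrency: Optional[int] = None  # real default not stated in the PR

    def __post_init__(self) -> None:
        if self.max_model_len <= 0:
            raise ValueError("max_model_len must be positive")


BACKFILLED_KEYS = ("max_model_len", "embed_concurrency")


def backfill_embedder_kwargs(
    extra: Dict[str, Any], settings: EmbedderSettings
) -> Dict[str, Any]:
    """Merge endpoint `extra` with static settings; explicit `extra` values win."""
    kwargs = dict(extra)  # copy so the stored endpoint config is not mutated
    for key in BACKFILLED_KEYS:
        default = getattr(settings, key)
        # Assumption: an absent key or an explicit None both count as "missing".
        if kwargs.get(key) is None and default is not None:
            kwargs[key] = default
    return kwargs


settings = EmbedderSettings(embed_concurrency=4)  # 4 is an arbitrary example value

# Endpoint config as stored in the DB: only implementation + api_key (placeholder values).
plain = backfill_embedder_kwargs({"implementation": "vllm", "api_key": "placeholder"}, settings)

# Endpoint that presets its own limit: the explicit value survives the backfill.
preset = backfill_embedder_kwargs({"max_model_len": 1000}, settings)
```

Because both factories would call the same merge, the API container and the indexer can no longer drop the value independently.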

Tests

- test_container.py: the factory backfills both kwargs; new test_embedder_extra_overrides_backfilled_max_model_len checks that an explicit extra value wins.
- test_indexer_pool.py: new test_embedder_factory_backfills_max_model_len_from_settings covers both backfill and explicit-wins.
- Verified end-to-end: a partition pointed at an embedder endpoint with extra.max_model_len=1000 indexes with truncate_prompt_tokens=999.

Base branch: refactor/hexagonal.

Summary by CodeRabbit

Bug Fixes
- Updated embedder defaults so missing endpoint settings now inherit sensible values automatically.
- Improved handling of long-context embedder requests to reduce hangs and invalid request errors.
- Added clearer logging when embedding runs fail, including how many batch tasks completed.
Tests
- Expanded unit coverage for embedder configuration backfilling and per-endpoint overrides.

…ault 2047)

The DB-backed model-endpoint config only carries endpoint/model_name/batch_size/timeout/extra, so both the API container factory and the indexer's own factory built embedders with max_model_len=None. That disables the `truncate_prompt_tokens` guard, and pooling models (e.g. Qwen3-Embedding) then return 400 or hang on context-boundary inputs (vllm-project/vllm#29496), even when MAX_MODEL_LEN is set, because that only feeds the static embedder config.

- Backfill max_model_len/embed_concurrency from the static `embedder` settings in both factories (di/container.py and services/workers/indexer_pool.py); an explicit per-endpoint `extra` value still wins.
- Default max_model_len 8192 -> 2047 (one below the common 2048 boundary) so a fresh deployment is safe out of the box.
- Add a minimal DEBUG embedder-construction log, plus a WARNING when max_model_len is unset or when a batched embed fails.

coderabbitai · 2026-06-26T15:53:52Z

📝 Walkthrough

Embedder configuration now defaults max_model_len to 2047 with validation, the factory wiring backfills missing embedder kwargs from global settings, and VLLMEmbedder logs startup parameters plus batch-failure context. Tests cover the backfill behavior and override precedence.

Changes

Embedder defaults and backfill

| Layer | File(s) | Summary |
|---|---|---|
| Config default and validation | `openrag/core/config/endpoints.py`, `conf/config.yaml` | `EmbedderConfig.max_model_len` now defaults to `2047` with a positive constraint, and the config default and comment are updated to match. |
| Factory kwargs backfill | `openrag/di/container.py`, `openrag/services/workers/indexer_pool.py`, `tests/unit/di/test_container.py`, `tests/unit/services/workers/test_indexer_pool.py` | Missing `max_model_len` and `embed_concurrency` are filled from `settings.embedder`/`cfg.embedder` before embedder creation, and tests cover the default and override cases. |
| Embedder logging and failure context | `openrag/services/inference/vllm_client.py` | `VLLMEmbedder` now logs its startup parameters, warns when `max_model_len` is unset, and records completed batch counts before re-raising batch exceptions. |
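The batch-failure logging in the last row can be sketched with asyncio: a semaphore caps in-flight batches at `embed_concurrency`, and a single WARNING reports how many batches had completed before the exception is re-raised. This is a sketch under assumptions: `embed_in_batches`, the `embed_batch` callable, and the toy `fake_embed` are hypothetical names, not the `vllm_client.py` API.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Sequence

logger = logging.getLogger(__name__)

# Hypothetical signature: one call embeds one batch of texts.
EmbedBatch = Callable[[Sequence[str]], Awaitable[List[List[float]]]]


async def embed_in_batches(
    texts: Sequence[str],
    embed_batch: EmbedBatch,
    batch_size: int = 32,
    embed_concurrency: int = 4,
) -> List[List[float]]:
    """Embed `texts` in batches with at most `embed_concurrency` requests in flight."""
    batches = [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]
    semaphore = asyncio.Semaphore(embed_concurrency)
    completed = 0

    async def run(batch: Sequence[str]) -> List[List[float]]:
        nonlocal completed
        async with semaphore:
            vectors = await embed_batch(batch)
        completed += 1
        return vectors

    try:
        per_batch = await asyncio.gather(*(run(b) for b in batches))
    except Exception:
        # One WARNING with progress context and no per-batch noise; the caller
        # still receives the original exception.
        logger.warning(
            "batched embed failed after %d/%d batches completed",
            completed, len(batches),
        )
        raise
    # gather preserves input order, so flattening keeps one vector per text.
    return [vec for vectors in per_batch for vec in vectors]


async def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    """Toy embedder: a 1-d vector holding each text's length."""
    return [[float(len(t))] for t in batch]


vectors = asyncio.run(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
# vectors == [[1.0], [2.0], [3.0]]
```

One design caveat: `asyncio.gather` does not cancel the batches still running when one fails, so the completed count reflects the moment the first failure surfaced; `asyncio.TaskGroup` (Python 3.11+) would cancel the remaining batches instead.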

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

linagora/openrag#465: Shares the same embedder wiring and batching control path, including embed_concurrency propagation and related VLLMEmbedder behavior.
linagora/openrag#495: Also changes openrag/services/inference/vllm_client.py around max_model_len and context-boundary truncation handling.
linagora/openrag#564: Touches openrag/services/workers/indexer_pool.py embedder factory construction and endpoint resolution/caching.

Suggested labels

fix, bug

Poem

🐰 I nibbled the context edge with care,
And trimmed the tokens floating there,
The factory shared its carrots just right,
While logs twinkled softly through the night.
Hop hop—batch storms now leave a trail of light.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 53.85%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly captures the main change: backfilling max_model_len into model-endpoint embedders and updating the default. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

🧹 Nitpick comments (1)

tests/unit/di/test_container.py (1)

571-578: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Cover embed_concurrency override precedence too.

Line 571 only locks down the explicit-override path for max_model_len. Both factories backfill two keys, so a regression where endpoint extra["embed_concurrency"] stops winning would still pass these new tests.

Suggested test extension

```diff
     def test_embedder_extra_overrides_backfilled_max_model_len(self):
         """A per-endpoint ``extra.max_model_len`` wins over the settings default."""
         settings = _settings_with_named_models()
         settings.models.embedder["embed-a"].extra["max_model_len"] = 4096
+        settings.models.embedder["embed-a"].extra["embed_concurrency"] = 9
         c = ServiceContainer(settings)
 
-        assert c.embedder_factory("embed-a").kwargs["max_model_len"] == 4096
+        embedder = c.embedder_factory("embed-a")
+        assert embedder.kwargs["max_model_len"] == 4096
+        assert embedder.kwargs["embed_concurrency"] == 9
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/di/test_container.py` around lines 571 - 578, The embedder factory
test currently only verifies that per-endpoint extra.max_model_len overrides the
default, but it does not cover the same precedence for embed_concurrency. Extend
test_embedder_extra_overrides_backfilled_max_model_len in test_container.py to
also assert that
ServiceContainer.embedder_factory("embed-a").kwargs["embed_concurrency"] honors
settings.models.embedder["embed-a"].extra["embed_concurrency"] over any
backfilled default, matching the precedence already exercised for max_model_len.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 37ad303f-ee81-47b0-8def-7f4ff09616c6

📥 Commits

Reviewing files that changed from the base of the PR and between bfdc829 and d02b2ee.

📒 Files selected for processing (7)

conf/config.yaml
openrag/core/config/endpoints.py
openrag/di/container.py
openrag/services/inference/vllm_client.py
openrag/services/workers/indexer_pool.py
tests/unit/di/test_container.py
tests/unit/services/workers/test_indexer_pool.py

hedhoud · 2026-06-26T18:48:34Z

@codex review

chatgpt-codex-connector · 2026-06-26T18:54:19Z

Codex Review: Didn't find any major issues. You're on a roll.

Reviewed commit: d02b2ee3fc


coderabbitai bot added the bug (Something isn't working) and fix (Fix issue) labels on Jun 26, 2026.

Ahmath-Gadji wants to merge 1 commit into refactor/hexagonal from fix/embedder-max-model-len.
