feat: add configurable pipeline preloading support by juburr · Pull Request #115 · docling-project/docling-jobkit

juburr · 2026-03-31T03:44:02Z

Add a `preload_formats` field to `DoclingConverterManagerConfig` that accepts a list of `InputFormat` names (e.g. `["pdf", "audio"]`) whose ML pipelines should be eagerly initialized at startup. This keeps models resident in GPU/CPU memory and eliminates cold-start latency on the first request for each configured format.

The new `preload_additional_formats()` method on `DoclingConverterManager` is called from all startup paths:

- `LocalOrchestrator.warm_up_caches()`: validates the config and pre-warms the orchestrator's converter manager before the readiness gate opens, regardless of `shared_models`.
- `AsyncLocalWorker.loop()`: pre-warms each non-shared worker's own converter manager via `asyncio.to_thread` to avoid blocking the event loop during heavy model loading.
- `CustomRQWorker.__init__()`: pre-warms the worker-local converter manager that persists across all jobs in the process.
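The `asyncio.to_thread` approach used for the local worker can be sketched in isolation. This is a minimal illustration, not the code from this PR: `preload_models`, `heartbeat`, and `worker_startup` are hypothetical names, and the sleep stands in for real model initialization.

```python
import asyncio
import time


def preload_models(formats: list[str]) -> list[str]:
    """Hypothetical stand-in for a blocking, CPU/GPU-heavy model preload."""
    time.sleep(0.05)  # simulate slow model initialization
    return [f"{fmt}:ready" for fmt in formats]


async def heartbeat(ticks: list[int], stop: asyncio.Event) -> None:
    """Records ticks to show the event loop keeps running during the preload."""
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0.005)


async def worker_startup(formats: list[str]) -> tuple[list[str], int]:
    ticks: list[int] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    # Run the blocking preload in a worker thread so the loop stays responsive.
    loaded = await asyncio.to_thread(preload_models, formats)
    stop.set()
    await beat
    return loaded, len(ticks)
```

Calling `asyncio.run(worker_startup(["pdf", "audio"]))` returns the loaded markers together with a tick count; a non-zero count indicates that other coroutines were scheduled while the preload ran. Calling `preload_models` directly inside the coroutine would instead stall every other task until it finished.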

Configured formats are treated as required: unknown format names or initialization failures raise at startup rather than silently degrading at request time.
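The fail-fast behavior for unknown format names could look roughly like the sketch below. It is an assumption-laden illustration: the `InputFormat` enum here is a small stand-in defined locally (the real docling enum has more members), and `resolve_preload_formats` is a hypothetical helper, not a function from docling-jobkit.

```python
from enum import Enum


class InputFormat(str, Enum):
    """Local stand-in enum; the real docling InputFormat has more members."""
    PDF = "pdf"
    AUDIO = "audio"
    DOCX = "docx"


def resolve_preload_formats(names: list[str]) -> list[InputFormat]:
    """Map configured names to enum members, failing fast on unknown names."""
    resolved: list[InputFormat] = []
    unknown: list[str] = []
    for name in names:
        try:
            resolved.append(InputFormat(name.strip().lower()))
        except ValueError:
            unknown.append(name)
    if unknown:
        # Raise once with every bad name so the operator can fix them together.
        valid = ", ".join(f.value for f in InputFormat)
        raise ValueError(f"Unknown preload format(s) {unknown}; valid: {valid}")
    return resolved
```

Running this check during startup means a typo such as `"audoi"` stops the service before it reports ready, rather than surfacing as a slow or failed first request.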

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-31T03:44:13Z

✅ DCO Check Passed

Thanks @juburr, all your commits are properly signed off. 🎉

mergify · 2026-03-31T03:44:38Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

Add a `DOCLING_SERVE_PRELOAD_PIPELINES` setting that controls which ML pipelines are eagerly initialized at startup, keeping models resident in GPU/CPU memory between requests. Defaults to `["pdf"]` (matching current behavior). Setting it to `["pdf", "audio"]` pre-loads the Whisper ASR model alongside the standard PDF pipeline, eliminating ~5s of cold-start latency on the first audio transcription request.

The setting accepts JSON arrays or comma-separated strings, supports YAML config files, and is gated on `LOAD_MODELS_AT_BOOT=true`. The normalized list is passed as `preload_formats` to the `DoclingConverterManagerConfig` in both the local orchestrator factory and the RQ worker startup path.

Includes documentation in `docs/configuration.md` with a topology behavior table covering local shared, local non-shared, and RQ deployment modes.

Depends on docling-jobkit `preload_formats` support (docling-project/docling-jobkit#115).

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
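One way to normalize a value that may arrive as either a JSON array or a comma-separated string is sketched below. The function name `parse_preload_pipelines` and the choices to lowercase and de-duplicate are assumptions for illustration; the actual docling-serve settings code may do this differently.

```python
import json


def parse_preload_pipelines(raw: str) -> list[str]:
    """Normalize a JSON array or comma-separated string into lowercase names."""
    text = raw.strip()
    if text.startswith("["):
        items = json.loads(text)  # raises a ValueError subclass on malformed JSON
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"Expected a JSON array of strings, got: {raw!r}")
    else:
        items = text.split(",")
    # Drop empty entries, normalize case, and de-duplicate while keeping order.
    seen: dict[str, None] = {}
    for item in items:
        name = item.strip().lower()
        if name:
            seen.setdefault(name, None)
    return list(seen)
```

With this sketch, `'["pdf", "audio"]'` and `"pdf, audio"` both normalize to the same list, so downstream code only ever sees one shape.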

- preload additional formats with the same default options used by live requests so warmed converters hit the correct cache keys
- move worker-side preload off the asyncio event loop to prevent local startup from blocking during model initialization
- validate preload formats in non-shared mode without warming an extra unused orchestrator-side model copy
- reapply OMP_NUM_THREADS on worker threads before preload and conversion so PyTorch thread settings do not drift and degrade steady-state ASR performance
- update local orchestrator tests for the non-shared validation path

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>
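The cache-key point in the first bullet can be illustrated with a toy cache. This is only a sketch of the general idea: `PipelineOptions` and `ConverterCache` are hypothetical names, and the real converter manager's cache key is not shown in this PR text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineOptions:
    """Hypothetical options object; frozen so instances are hashable."""
    do_ocr: bool = True
    language: str = "en"


class ConverterCache:
    """Toy cache keyed on (format, options); counts expensive loads."""

    def __init__(self) -> None:
        self._cache: dict[tuple[str, PipelineOptions], object] = {}
        self.loads = 0

    def get(self, fmt: str, options: PipelineOptions) -> object:
        key = (fmt, options)
        if key not in self._cache:
            self.loads += 1  # stands in for an expensive model load
            self._cache[key] = object()
        return self._cache[key]
```

If the preload calls `get("audio", PipelineOptions())` and a live request later uses the same default options, the second call is a cache hit and `loads` stays at 1. Preloading with options that differ from the request defaults would create a separate entry, leaving the first real request cold despite the warm-up.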

juburr mentioned this pull request Mar 31, 2026

feat: add PRELOAD_PIPELINES setting for format-level model pre-warming docling-project/docling-serve#556

Draft

juburr marked this pull request as draft March 31, 2026 13:41

juburr wants to merge 2 commits into docling-project:main from juburr:main
