
feat: add configurable pipeline preloading support#115

Draft
juburr wants to merge 2 commits into docling-project:main from juburr:main
juburr commented Mar 31, 2026

Add a `preload_formats` field to `DoclingConverterManagerConfig` that accepts a list of `InputFormat` names (e.g. `["pdf", "audio"]`) whose ML pipelines should be eagerly initialized at startup. This keeps models resident in GPU/CPU memory and eliminates cold-start latency on the first request for each configured format.

The new `preload_additional_formats()` method on `DoclingConverterManager` is called from all startup paths:

  • `LocalOrchestrator.warm_up_caches()`: validates the config and pre-warms the orchestrator's converter manager before the readiness gate opens, regardless of `shared_models`.
  • `AsyncLocalWorker.loop()`: pre-warms each non-shared worker's own converter manager via `asyncio.to_thread` to avoid blocking the event loop during heavy model loading.
  • `CustomRQWorker.__init__()`: pre-warms the worker-local converter manager that persists across all jobs in the process.
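The `asyncio.to_thread` path above can be illustrated with a minimal sketch (Python 3.9+). `FakeConverterManager` and `heartbeat` are hypothetical stand-ins; only the method name `preload_additional_formats` comes from this PR, and its real signature and behavior are not reproduced here. The blocking preload runs in a worker thread while a concurrent coroutine keeps ticking, which would not happen if the preload were called directly on the event loop.

```python
import asyncio
import time
from typing import List


class FakeConverterManager:
    """Hypothetical stand-in for DoclingConverterManager; models only the preload call."""

    def __init__(self, preload_formats: List[str]) -> None:
        self.preload_formats = preload_formats
        self.loaded: List[str] = []

    def preload_additional_formats(self) -> None:
        # Simulate slow, blocking model initialization (e.g. loading weights).
        for fmt in self.preload_formats:
            time.sleep(0.05)
            self.loaded.append(fmt)


async def heartbeat(ticks: List[int]) -> None:
    # Can only make progress while the event loop is free to schedule it.
    for i in range(5):
        ticks.append(i)
        await asyncio.sleep(0.01)


async def worker_startup(manager: FakeConverterManager) -> int:
    ticks: List[int] = []
    hb = asyncio.create_task(heartbeat(ticks))
    # Offload the blocking preload to a thread so the loop stays responsive.
    await asyncio.to_thread(manager.preload_additional_formats)
    ticks_during_preload = len(ticks)
    await hb
    return ticks_during_preload
```

Running `asyncio.run(worker_startup(FakeConverterManager(["pdf", "audio"])))` returns a positive tick count, showing the heartbeat coroutine was scheduled while the simulated models were loading.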

Configured formats are treated as required: unknown format names or initialization failures raise at startup rather than silently degrading at request time.
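As a sketch of the fail-fast behavior for unknown names, assuming the configured strings map onto `InputFormat` enum values: the `InputFormat` below is a hypothetical three-member stand-in rather than docling's real enum, and the whitespace stripping, lowercasing, and de-duplication are assumptions, not confirmed behavior of this PR.

```python
from enum import Enum
from typing import List


class InputFormat(str, Enum):
    """Hypothetical subset of an InputFormat enum, for illustration only."""

    PDF = "pdf"
    DOCX = "docx"
    AUDIO = "audio"


def resolve_preload_formats(names: List[str]) -> List[InputFormat]:
    """Map configured names to InputFormat members, raising on any unknown name."""
    resolved: List[InputFormat] = []
    unknown: List[str] = []
    for name in names:
        try:
            fmt = InputFormat(name.strip().lower())
        except ValueError:
            unknown.append(name)
            continue
        if fmt not in resolved:  # drop duplicates, keep first-seen order
            resolved.append(fmt)
    if unknown:
        valid = ", ".join(f.value for f in InputFormat)
        # Fail at startup instead of degrading silently at request time.
        raise ValueError(f"Unknown preload format(s) {unknown}; valid values: {valid}")
    return resolved
```

Collecting every unknown name before raising lets a single startup failure report all typos in the config at once.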

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot commented Mar 31, 2026

DCO Check Passed

Thanks @juburr, all your commits are properly signed off. 🎉

mergify bot commented Mar 31, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • `title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:`

juburr added a commit to juburr/docling-serve that referenced this pull request Mar 31, 2026
Add a `DOCLING_SERVE_PRELOAD_PIPELINES` setting that controls which ML
pipelines are eagerly initialized at startup, keeping models resident
in GPU/CPU memory between requests.  Defaults to `["pdf"]` (matching
current behavior).  Setting it to `["pdf", "audio"]` pre-loads the
Whisper ASR model alongside the standard PDF pipeline, eliminating
~5s of cold-start latency on the first audio transcription request.

The setting accepts JSON arrays or comma-separated strings, supports
YAML config files, and is gated on `LOAD_MODELS_AT_BOOT=true`.  The
normalized list is passed as `preload_formats` to the
`DoclingConverterManagerConfig` in both the local orchestrator factory
and the RQ worker startup path.

Includes documentation in `docs/configuration.md` with a topology
behavior table covering local shared, local non-shared, and RQ
deployment modes.

Depends on docling-jobkit `preload_formats` support
(docling-project/docling-jobkit#115).

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
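The docling-serve commit above says the setting accepts either a JSON array or a comma-separated string and only applies when `LOAD_MODELS_AT_BOOT=true`. A standalone sketch of that normalization might look like the following; `parse_preload_pipelines` and `effective_preload_formats` are hypothetical helpers (docling-serve presumably handles this in its settings model), and YAML loading is out of scope here.

```python
import json
from typing import List, Optional, Sequence


def parse_preload_pipelines(raw: Optional[str], default: Sequence[str] = ("pdf",)) -> List[str]:
    """Normalize a JSON-array or comma-separated setting into a de-duplicated list."""
    if raw is None or not raw.strip():
        return list(default)  # unset: keep the current ["pdf"] behavior
    text = raw.strip()
    if text.startswith("["):
        value = json.loads(text)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError("expected a JSON array of strings")
        items = value
    else:
        items = text.split(",")
    normalized: List[str] = []
    for item in items:
        name = item.strip().lower()
        if name and name not in normalized:  # skip blanks and duplicates
            normalized.append(name)
    return normalized


def effective_preload_formats(raw: Optional[str], load_models_at_boot: bool) -> List[str]:
    # Preloading only applies when models are loaded at boot at all.
    return parse_preload_pipelines(raw) if load_models_at_boot else []
```

Both `'["pdf", "audio"]'` and `"pdf, audio"` normalize to `["pdf", "audio"]`, and an unset value falls back to the `["pdf"]` default.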
juburr marked this pull request as draft March 31, 2026 13:41
- preload additional formats with the same default options used by live
  requests so warmed converters hit the correct cache keys
- move worker-side preload off the asyncio event loop to prevent local
  startup from blocking during model initialization
- validate preload formats in non-shared mode without warming an extra
  unused orchestrator-side model copy
- reapply OMP_NUM_THREADS on worker threads before preload and
  conversion so PyTorch thread settings do not drift and degrade
  steady-state ASR performance
- update local orchestrator tests for the non-shared validation path

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>
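The first bullet of this commit concerns cache keys: a converter warmed with options that differ from the defaults used by live requests lands under a different key, so the first live request still pays the full load cost. A minimal sketch of that effect, assuming converters are cached by a hash of (format, options); `PipelineOptions`, `cache_key`, and `ConverterCache` are hypothetical and do not reflect docling-jobkit's actual cache implementation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PipelineOptions:
    """Hypothetical, simplified options object; real pipeline options are richer."""

    do_ocr: bool = True
    do_table_structure: bool = True
    languages: Tuple[str, ...] = ("en",)


def cache_key(fmt: str, options: PipelineOptions) -> str:
    # Hash a canonical JSON encoding so equal options always map to the same key.
    payload = json.dumps({"format": fmt, "options": asdict(options)}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class ConverterCache:
    def __init__(self) -> None:
        self._converters: Dict[str, object] = {}
        self.builds = 0

    def get_or_create(self, fmt: str, options: PipelineOptions) -> object:
        key = cache_key(fmt, options)
        if key not in self._converters:
            self.builds += 1  # the expensive model load would happen here
            self._converters[key] = object()
        return self._converters[key]
```

Preloading with `PipelineOptions()` and then serving a live request with `PipelineOptions()` reuses one converter; changing any field (for example `do_ocr=False`) triggers a second build.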
