feat: add configurable pipeline preloading support #115

Draft

juburr wants to merge 2 commits into docling-project:main from
Conversation
Add a `preload_formats` field to `DoclingConverterManagerConfig` that accepts a list of `InputFormat` names (e.g. `["pdf", "audio"]`) whose ML pipelines should be eagerly initialized at startup. This keeps models resident in GPU/CPU memory and eliminates cold-start latency on the first request for each configured format.

The new `preload_additional_formats()` method on `DoclingConverterManager` is called from all startup paths:

- `LocalOrchestrator.warm_up_caches()`: validates the config and pre-warms the orchestrator's converter manager before the readiness gate opens, regardless of `shared_models`.
- `AsyncLocalWorker.loop()`: pre-warms each non-shared worker's own converter manager via `asyncio.to_thread` to avoid blocking the event loop during heavy model loading.
- `CustomRQWorker.__init__()`: pre-warms the worker-local converter manager that persists across all jobs in the process.

Configured formats are treated as required: unknown format names or initialization failures raise at startup rather than silently degrading at request time.

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
✅ DCO Check Passed. Thanks @juburr, all your commits are properly signed off. 🎉
Contributor
Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit: this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
juburr added a commit to juburr/docling-serve that referenced this pull request on Mar 31, 2026
Add a `DOCLING_SERVE_PRELOAD_PIPELINES` setting that controls which ML pipelines are eagerly initialized at startup, keeping models resident in GPU/CPU memory between requests. Defaults to `["pdf"]` (matching current behavior). Setting it to `["pdf", "audio"]` pre-loads the Whisper ASR model alongside the standard PDF pipeline, eliminating ~5s of cold-start latency on the first audio transcription request.

The setting accepts JSON arrays or comma-separated strings, supports YAML config files, and is gated on `LOAD_MODELS_AT_BOOT=true`. The normalized list is passed as `preload_formats` to the `DoclingConverterManagerConfig` in both the local orchestrator factory and the RQ worker startup path.

Includes documentation in `docs/configuration.md` with a topology behavior table covering local shared, local non-shared, and RQ deployment modes.

Depends on docling-jobkit `preload_formats` support (docling-project/docling-jobkit#115).

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- preload additional formats with the same default options used by live requests so warmed converters hit the correct cache keys
- move worker-side preload off the asyncio event loop to prevent local startup from blocking during model initialization
- validate preload formats in non-shared mode without warming an extra unused orchestrator-side model copy
- reapply OMP_NUM_THREADS on worker threads before preload and conversion so PyTorch thread settings do not drift and degrade steady-state ASR performance
- update local orchestrator tests for the non-shared validation path

Signed-off-by: Justin Burr <juburr@users.noreply.github.com>