Indexer parse stage has no timeout: a slow/wedged parse can hang indexing #571

### Summary
The indexing pipeline runs the parse stage with **no timeout**, so a slow or wedged parse can stall indexing indefinitely.

### Root cause
`services/workers/indexer_pool.py` calls `build_indexing_pipeline(...)` **without `timeouts=`**, so `PipelineTimeouts.parse` defaults to `None` and `parse_stage(..., timeout=None)` never bounds the parse.

- `marker` is self-protected: its pool has its own `MARKER_TIMEOUT` plus retry, so it survives a bad PDF.
- `pymupdf` and `docling` have no parse timeout. `pymupdf` additionally serializes all work on a single shared worker thread (`_PYMUPDF_EXECUTOR`, `max_workers=1`), so one wedged parse blocks **every** subsequent pymupdf parse and the whole backend appears frozen.
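The head-of-line blocking described above can be reproduced in isolation. This is a minimal sketch, not the project's code: `executor` stands in for `_PYMUPDF_EXECUTOR`, and `slow_parse`, `fast_parse`, and `demo` are hypothetical names used only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Stand-in for a shared single-worker executor (assumption: mirrors max_workers=1).
executor = ThreadPoolExecutor(max_workers=1)


def slow_parse(seconds: float) -> str:
    """Hypothetical parse that takes a long time (simulated with sleep)."""
    time.sleep(seconds)
    return "slow done"


def fast_parse() -> str:
    """Hypothetical parse that would finish immediately if it could start."""
    return "fast done"


def demo() -> tuple:
    # The slow job occupies the only worker thread.
    slow = executor.submit(slow_parse, 0.5)
    # The fast job is queued behind it and cannot start yet.
    fast = executor.submit(fast_parse)
    try:
        fast.result(timeout=0.1)
        blocked = False
    except FutureTimeout:
        # The fast job is stuck purely because of the job ahead of it.
        blocked = True
    # Once the slow job returns, the queued job finally runs.
    return blocked, slow.result(), fast.result()
```

With an unbounded `slow_parse`, the queued job would never start, which matches the "appears frozen" symptom.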

### Suggested fix
Wire a parse-stage timeout so a pathological PDF fails *that file* (and is reported) instead of stalling the pipeline. Consider per-backend defaults.
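One possible shape for the fix, as a sketch under stated assumptions: `DEFAULT_PARSE_TIMEOUTS`, `ParseOutcome`, and `parse_with_timeout` are hypothetical names, the timeout values are placeholders, and none of this reflects the real signatures of `build_indexing_pipeline` or `PipelineTimeouts`. It only illustrates failing a single file with a reportable error instead of waiting forever.

```python
import time
from concurrent.futures import Executor, ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical per-backend defaults in seconds; the numbers are placeholders.
# None means "no outer timeout" (e.g. a backend that already protects itself).
DEFAULT_PARSE_TIMEOUTS = {"pymupdf": 60.0, "docling": 300.0, "marker": None}


@dataclass
class ParseOutcome:
    path: str
    ok: bool
    result: Optional[str] = None
    error: Optional[str] = None


def parse_with_timeout(
    parse_fn: Callable[[str], str],
    path: str,
    timeout: Optional[float],
    executor: Executor,
) -> ParseOutcome:
    """Run parse_fn(path) on the executor and stop waiting after `timeout` seconds."""
    future = executor.submit(parse_fn, path)
    try:
        return ParseOutcome(path, True, result=future.result(timeout=timeout))
    except FutureTimeout:
        # cancel() only helps if the job has not started; a running thread keeps going.
        future.cancel()
        return ParseOutcome(path, False, error=f"parse timed out after {timeout}s")
    except Exception as exc:  # report the failure for this file only
        return ParseOutcome(path, False, error=repr(exc))


def wedged_parse(path: str) -> str:
    """Simulated pathological file: takes longer than the caller will wait."""
    time.sleep(0.3)
    return f"parsed {path}"


def healthy_parse(path: str) -> str:
    return f"parsed {path}"
```

Note that `future.result(timeout=...)` only bounds how long the caller waits; Python cannot forcibly stop a running thread. On a single-worker executor the wedged call would still occupy the worker until it returns, so a complete fix may also need process isolation or more than one worker for that backend.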

Area: backend (`refactor/hexagonal`).

