
fix(paged-attn): resolve scheduler queue loop deadlock under memory pressure#2043

Open
glaziermag wants to merge 1 commit into EricLBuehler:master from glaziermag:fix-paged-attention-deadlock-1470

Conversation


@glaziermag glaziermag commented Mar 31, 2026

Description

This PR addresses a deadlock issue in PagedAttentionScheduler where the inference engine could hang during preemption under heavy KV cache memory pressure (resolves #1470).

Previously, when a sequence was preempted in the schedule method via _preempt(), it was pushed to the front of the self.waiting queue. Because the main starvation loop iterated over the queue non-destructively using self.waiting.front(), the newly preempted sequence was immediately targeted for re-evaluation. This caused an infinite loop of preemption and failed allocations.

Changes

  • Refactored the waiting sequence queue iteration in scheduler.rs to proactively pop_front() the sequence before checking its allocation status.
  • If the scheduling loop must break early (due to max_num_seqs caps or unresolvable starvation conditions requiring a later retry), the sequence is safely re-inserted via self.waiting.push_front(seq).

Testing

  • Reproduced the deadlock condition under a heavy simulated workload (30 concurrent requests) on a g2-standard-32 instance.
  • Verified that the patched logic properly filters preempted sequences to the next pipeline tick, maintaining stable throughput without triggering the infinite WARN log cycle.

Blast Radius Verification

To verify the structural integrity of the scheduler state, we ran a concurrent barrage of large Qwen2.5-3B completions on a strictly bounded 4096-token KV cache environment specifically designed to exhaust the BlockPool.

Before (Unpatched):
The engine immediately hit the KV cache bottleneck and permanently hung at 0.00 T/s, thrashing the preemption loop as every allocation attempt returned AllocStatus::Later:

2026-03-30T23:52:01.962646Z  WARN mistralrs_core::paged_attention::scheduler: Sequence 5 with length of 351 tokens still exceeds KV cache size even after evicting another sequence.
2026-03-30T23:52:01.962654Z  WARN mistralrs_core::paged_attention::scheduler: Sequence 5 with length of 351 tokens still exceeds KV cache size even after evicting another sequence.
... <Infinitely spammed leading to 42MB deadlocked dump> ...

After (Patched):
Under identical aiohttp saturation tests, the preempted sequence was correctly deferred to the waiting queue and the scheduler advanced, letting independent completions run until block space was freed. The load finished successfully, with per-interval throughput climbing from roughly 1.3K to 1.8K T/s:

2026-03-31T00:05:18.231240Z  INFO mistralrs_core::engine::logger: Throughput (T/s) 1328.40, Prefix cache hitrate 0.00%, 4 running, 26 waiting
2026-03-31T00:05:23.231414Z  INFO mistralrs_core::engine::logger: Throughput (T/s) 1558.40, Prefix cache hitrate 0.00%, 15 running, 15 waiting
2026-03-31T00:05:38.231924Z  INFO mistralrs_core::engine::logger: Throughput (T/s) 1848.60, Prefix cache hitrate 0.00%, 15 running, 15 waiting
... <Gracefully finishes execution> ...

@glaziermag glaziermag marked this pull request as ready for review March 31, 2026 00:27

Development

Successfully merging this pull request may close these issues.

Caching Logic Appears to Cause Model Hangs