Skip to content

Metal: fix CPU readback race under concurrent command submission#3595

Merged
EricLBuehler merged 2 commits into
mainfrom
metal-concurrent-readback-fix
Jun 10, 2026
Merged

Metal: fix CPU readback race under concurrent command submission#3595
EricLBuehler merged 2 commits into
mainfrom
metal-concurrent-readback-fix

Conversation

@EricLBuehler

Copy link
Copy Markdown
Member

CPU readbacks (MetalStorage::to_cpu, QMetalStorage::{data, dequantize}) encoded a blit into the shared rotating command buffer and then called wait_until_completed, which commits the current buffer and waits on the last in-flight one. With concurrent submissions from other threads, the in-flight list can be taken by another thread's flush_and_wait between the blit encode and the wait, causing the reader to return before its blit has executed and to read stale or unwritten destination memory.

This PR adds Commands::flush_and_wait_current, which commits the current command buffer while holding the state lock and waits on that specific buffer. Command queues execute buffers in commit order, so completion of this buffer also covers anything committed before it by other threads. Completed buffers are drained from the in-flight list to keep it bounded under readback-heavy workloads.

CPU readbacks (MetalStorage::to_cpu, QMetalStorage::{data, dequantize})
encoded a blit into the shared rotating command buffer and then called
wait_until_completed, which commits the current buffer and waits on the
last in-flight one. With concurrent submissions from other threads, the
in-flight list can be taken by another thread's flush_and_wait between
the blit encode and the wait, causing the reader to return before its
blit has executed and to read stale or unwritten destination memory.

Fix: add Commands::flush_and_wait_current, which commits the current
command buffer while holding the state lock and waits on that specific
buffer. Command queues execute buffers in commit order, so completion
of this buffer also covers anything committed before it by other
threads. Completed buffers are drained from the in-flight list to keep
it bounded under readback-heavy workloads; errored buffers are kept for
reporting. The three readback sites now use it.

The new concurrent readback tests fail on every thread within a few
iterations against the previous behavior.

@ivarflakstad ivarflakstad left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread candle-metal-kernels/src/metal/commands.rs Outdated
@EricLBuehler EricLBuehler merged commit 65ecb58 into main Jun 10, 2026
12 checks passed
@EricLBuehler EricLBuehler deleted the metal-concurrent-readback-fix branch June 10, 2026 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants