[AutoDiff] Autodiff 7: Fix header size in AdStackAllocaStmt to match u64 runtime layout #534
LGTM — correct one-line fix aligning the alloca size with the u64 runtime header layout.
Extended reasoning...
Overview
Single-line change in AdStackAllocaStmt::size_in_bytes() in quadrants/ir/statements.h. The fix corrects the header slot size from sizeof(int32) (4 bytes) to sizeof(int64) (8 bytes) to match the u64 layout used by stack_init/stack_push/stack_top_primal in the runtime.
Security risks
None. This is a memory-layout correctness fix inside the autodiff stack allocator. It makes allocations larger, not smaller, so it eliminates a potential out-of-bounds write rather than introducing one.
Level of scrutiny
Low. The change is mechanical and isolated: one sizeof operand corrected to match a documented runtime invariant. The PR description confirms all other adstack consumers already use sizeof(u64), so this brings the last inconsistent site into alignment. No logic, algorithm, or API surface changes.
Other factors
No existing reviewer comments, no outstanding concerns, and the bug hunting system found no issues. The added comment clearly documents the rationale so future readers understand the constraint.
Checklist:
=> ok to merge
Fix header size in `AdStackAllocaStmt` to match `u64` runtime layout

TL;DR

```diff
 std::size_t size_in_bytes() const {
-  return sizeof(int32) + entry_size_in_bytes() * max_size;
+  // Header is a `u64` (see `stack_init`/`stack_push`/`stack_top_primal` in runtime.cpp), so use
+  // `sizeof(int64)` - not `sizeof(int32)` - to size the LLVM alloca matching the runtime layout.
+  return sizeof(int64) + entry_size_in_bytes() * max_size;
 }
```

Why
The runtime code reads and writes the header as a `u64`. `codegen_llvm.cpp`'s `visit(AdStackAllocaStmt)` creates an `alloca` of `stmt->size_in_bytes()` bytes at 8-byte alignment. With the old `sizeof(int32)` in the size computation, the alloca was `4 + max_size * entry_bytes` bytes, but the runtime's `stack_init` writes 8 bytes, overwriting the start of what should be the first primal slot, or adjacent stack memory if `max_size == 0`. At 8-byte alignment the allocator may round up to the next 8-byte boundary, which happens to cover for small `max_size`; at larger `max_size` it does not, and subsequent push/pop indexing then treats the primal area as 4 bytes offset from where it actually starts.

The regression is extremely narrow: every push / pop / load-top site in the runtime agrees with the `sizeof(u64)` header width; only the alloca-sizing helper was stale. Fixing it is a 1-line change plus a comment.

Why no dedicated test
A dedicated Python test for this specific bug would need to observe the 4-byte mis-sizing, which only manifests as silent corruption of neighboring stack memory or mis-indexed primal reads. Both are deterministic in principle but highly sensitive to `max_size`, alignment, and the allocator's rounding behaviour at the LLVM frame-layout level. The downstream tests in this stack (every adstack grad test in Autodiff 1 / Autodiff 6 / Autodiff 8, and the more intensive ones in Autodiff 10+) all exercise the alloca correctness indirectly: a mis-sized header would mis-index every primal slot and the gradient comparison against PyTorch / the analytical expected value would fail.

Stack

Autodiff 7 of 13. First commit of the "LLVM adstack safety" triplet split. Based on #491 (regression tests). Followed by #535 (runtime overflow).
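
For reference, the 4-byte shortfall described under "Why" can be checked with plain arithmetic. The following is a minimal sketch, not code from this PR: the helper names (`old_size_in_bytes`, `new_size_in_bytes`, `runtime_footprint`) and the example entry size of 8 bytes are hypothetical, and the model assumes only what the description states, namely a `u64` header followed by `max_size` entries.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the adstack buffer: a u64 header followed by
// `max_size` entries of `entry_bytes` each. All names are illustrative.

// Size the old helper computed: 4-byte header plus the entries.
constexpr std::size_t old_size_in_bytes(std::size_t entry_bytes, std::size_t max_size) {
  return sizeof(std::int32_t) + entry_bytes * max_size;
}

// Size the fixed helper computes: 8-byte header plus the entries.
constexpr std::size_t new_size_in_bytes(std::size_t entry_bytes, std::size_t max_size) {
  return sizeof(std::int64_t) + entry_bytes * max_size;
}

// Bytes the runtime layout needs, assuming it treats the header as a u64.
constexpr std::size_t runtime_footprint(std::size_t entry_bytes, std::size_t max_size) {
  return sizeof(std::uint64_t) + entry_bytes * max_size;
}

// With an assumed 8-byte entry and 16 slots, the old size is 4 bytes short.
static_assert(runtime_footprint(8, 16) - old_size_in_bytes(8, 16) == 4,
              "old helper under-sizes the buffer by 4 bytes");

// The fixed helper matches the runtime footprint exactly.
static_assert(new_size_in_bytes(8, 16) == runtime_footprint(8, 16),
              "fixed helper matches the runtime layout");

// With max_size == 0 the old buffer is 4 bytes, smaller than the 8-byte header.
static_assert(old_size_in_bytes(8, 0) < sizeof(std::uint64_t),
              "old helper cannot even hold the header when max_size == 0");
```

The shortfall is a constant 4 bytes regardless of `entry_bytes` and `max_size`; whether it is observable in practice depends on how the frame layout pads the alloca, which this sketch does not model.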