error log | 日志或报错信息 | ログ
Symptom
In x86 inference, contour/mask results are occasionally lost.
Investigation shows that all outputs from a convolution layer suddenly become NaN, and every subsequent layer also outputs NaN, resulting in a completely invalid final output.
Observed characteristics:
Input tensor contains no NaN
Weight tensor contains no NaN
Bias tensor contains no NaN
BT workspace tile already contains NaN before GEMM computation
Issue occurs intermittently and is difficult to reproduce deterministically
More likely to appear when PoolAllocator aggressively reuses workspace buffers
Root Cause Analysis
The issue appears to be caused by reuse of uninitialized memory in the SGEMM workspace.
Allocation chain:
SGEMM (im2col + GEMM) allocates temporary buffers (BT, topT_tileX) through opt.workspace_allocator (PoolAllocator).
fastMalloc() actually allocates:
size + NCNN_MALLOC_OVERREAD
where:
NCNN_MALLOC_OVERREAD = 64
PoolAllocator only clears the first size bytes when recycling memory:
memset(ptr, 0, size);
The extra 64-byte OVERREAD region is never cleared.
SIMD GEMM kernels perform 512-bit (64-byte) vector loads and may read across tile boundaries into the OVERREAD area.
For edge tiles (max_jj < TILE_N or max_kk < TILE_K), im2col data is only partially filled.
When workspace memory is reused, stale values (including NaN) can remain in the uncleared OVERREAD area and become visible to GEMM vector loads.
NaN propagates through matrix multiplication and contaminates all subsequent outputs.
As a result:
PoolAllocator reuse + uncleared OVERREAD region + partial edge tile filling + SIMD overread = intermittent NaN generation
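The chain above can be illustrated with a small standalone sketch. It does not use ncnn or AVX512 intrinsics; `kLanes`, `make_recycled_buffer` and `dot_full_vectors` are hypothetical names chosen for illustration, and the scalar loop only imitates a kernel that always consumes whole 16-float vectors. It assumes IEEE 754 arithmetic without fast-math options, under which 0 * NaN is NaN.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical stand-in for the lane count of a 512-bit float vector.
static const std::size_t kLanes = 16;

// Simulates a recycled buffer: `valid` floats of payload followed by one
// vector's worth of trailing slack that still holds stale NaN.
std::vector<float> make_recycled_buffer(std::size_t valid)
{
    std::vector<float> buf(valid + kLanes, std::numeric_limits<float>::quiet_NaN());
    // Only the payload is cleared, mirroring memset(ptr, 0, size).
    std::memset(buf.data(), 0, valid * sizeof(float));
    return buf;
}

// Dot product that rounds the length up to a whole number of vectors,
// imitating a kernel that loads full lanes on a partially filled tile.
// Both inputs must hold at least valid + kLanes floats.
float dot_full_vectors(const float* a, const float* b, std::size_t valid)
{
    const std::size_t padded = (valid + kLanes - 1) / kLanes * kLanes;
    float sum = 0.f;
    for (std::size_t i = 0; i < padded; i++)
        sum += a[i] * b[i]; // 0 * NaN is NaN, so one stale lane poisons the sum
    return sum;
}
```

Calling `dot_full_vectors` on a buffer from `make_recycled_buffer(20)` against an all-zero operand of 36 floats yields NaN, even though every payload element and every weight is zero. This is why clean input, weight and bias tensors do not rule out a NaN output.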
context | 编译/运行环境 | バックグラウンド
ncnn: x86 backend
Convolution implementation: SGEMM (im2col + GEMM)
Workspace allocator: PoolAllocator
SIMD path: AVX512-capable GEMM kernels
OS: Linux / Windows (potentially platform independent)
Compiler: GCC / Clang / MSVC
Issue observed during inference only
how to reproduce | 复现步骤 | 再現方法
Build ncnn with x86 SGEMM convolution enabled.
Run repeated inference using a model that frequently exercises SGEMM edge tiles.
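Use the same PoolAllocator instance as the workspace allocator across runs, so that workspace buffers are recycled rather than freshly allocated.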
After sufficient allocator reuse cycles, outputs may occasionally become entirely NaN.
Instrumentation around SGEMM shows NaN already exists inside BT before GEMM computation.
Example checks:
// input
assert(!has_nan(bottom_blob));
// weights
assert(!has_nan(weight_data));
// bias
assert(!has_nan(bias_data));
// BT tile
check_nan(BT);
Result:
bottom_blob : clean
weight_data : clean
bias_data : clean
BT : contains NaN
output : all NaN
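The `has_nan` / `check_nan` helpers used in the checks are not shown in this report. A minimal sketch of what such helpers might look like on a raw float buffer follows; the names, signatures and the `first_nan_index` variant are assumptions for illustration, not ncnn APIs. For an ncnn::Mat, the pointer and element count would have to come from the blob itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: index of the first NaN among `count` floats,
// or `count` if none is found.
std::size_t first_nan_index(const float* data, std::size_t count)
{
    assert(data != nullptr);
    for (std::size_t i = 0; i < count; i++)
    {
        if (std::isnan(data[i]))
            return i;
    }
    return count;
}

// Hypothetical helper: true if any of the first `count` floats is NaN.
bool has_nan(const float* data, std::size_t count)
{
    return first_nan_index(data, count) != count;
}
```

Reporting the index rather than only a boolean helps distinguish a NaN inside the filled part of a tile from one in the unfilled tail, which is the case this report is about.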
more | 其他 | その他
Proposed Fix 1: Clear full allocated region in PoolAllocator
fastMalloc() allocates:
size + NCNN_MALLOC_OVERREAD
but allocator only clears:
size
Suggested change in src/allocator.cpp:
memset(ptr, 0, size + NCNN_MALLOC_OVERREAD);
Apply to both:
budget recycle path
new allocation path
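The idea behind Fix 1 can be sketched outside ncnn as follows. This is not the actual src/allocator.cpp code: `kOverread`, `alloc_cleared` and `recycle_cleared` are hypothetical names, plain `malloc` stands in for the aligned allocation, and the sketch only shows that both paths clear the payload and the trailing slack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Assumed to mirror NCNN_MALLOC_OVERREAD = 64 as stated in this report.
static const std::size_t kOverread = 64;

// Hypothetical new-allocation path: reserves size + kOverread bytes
// and zeroes all of them, not just the first `size` bytes.
void* alloc_cleared(std::size_t size)
{
    void* ptr = std::malloc(size + kOverread);
    if (ptr)
        std::memset(ptr, 0, size + kOverread);
    return ptr;
}

// Hypothetical recycle path: re-clears a reused block, including the
// trailing slack that vector loads may touch.
void recycle_cleared(void* ptr, std::size_t size)
{
    assert(ptr != nullptr);
    std::memset(ptr, 0, size + kOverread);
}
```

Clearing 64 extra bytes per allocation is a small cost next to the payload clear, although whether clearing on every recycle is acceptable for performance is a separate question for the maintainers.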
Proposed Fix 2: Explicitly zero SGEMM workspace tiles
In:
src/layer/x86/convolution_im2col_gemm.h
after creating BT:
memset(BT.data, 0, BT.total() * BT.elemsize);
after creating topT_tileX:
memset(topT_tileX.data, 0, topT_tileX.total() * topT_tileX.elemsize);
Verification
After applying both modifications:
No NaN observed in BT workspace
No contour disappearance observed
Long-duration inference remains stable
Issue has not reappeared during stress testing
Expected Behavior
Workspace memory should not expose stale data from previous allocations, including the OVERREAD region reserved for SIMD loads.
Actual Behavior
The OVERREAD region may contain stale NaN values from previous allocations, which can be consumed by SGEMM vector loads and propagate through inference.