
SGEMM (im2col + GEMM) convolution path may occasionally output NaN due to uninitialized workspace memory reuse #6766

Description

@suncan0207

error log
Symptom

In x86 inference, contour/mask results are occasionally lost.

Investigation shows that all outputs from a convolution layer suddenly become NaN, and every subsequent layer also outputs NaN, resulting in a completely invalid final output.

Observed characteristics:

Input tensor contains no NaN
Weight tensor contains no NaN
Bias tensor contains no NaN
BT workspace tile already contains NaN before GEMM computation
Issue occurs intermittently and is difficult to reproduce deterministically
More likely to appear when PoolAllocator aggressively reuses workspace buffers
Root Cause Analysis

The issue appears to be caused by reuse of uninitialized memory in the SGEMM workspace.

Allocation chain:

1. SGEMM (im2col + GEMM) allocates temporary buffers (BT, topT_tileX) through opt.workspace_allocator (PoolAllocator).
2. fastMalloc() actually allocates size + NCNN_MALLOC_OVERREAD bytes, where NCNN_MALLOC_OVERREAD = 64.
3. PoolAllocator only clears the first size bytes when recycling memory:
   memset(ptr, 0, size);
4. The extra 64-byte OVERREAD region is never cleared.
5. SIMD GEMM kernels perform 512-bit (64-byte) vector loads and may read across tile boundaries into the OVERREAD area.
6. For edge tiles (max_jj < TILE_N or max_kk < TILE_K), im2col data is only partially filled.
7. When workspace memory is reused, stale values (including NaN) can remain in the uncleared OVERREAD area and become visible to GEMM vector loads.
8. NaN propagates through matrix multiplication and contaminates all subsequent outputs.

As a result:

PoolAllocator reuse
+
uncleared OVERREAD region
+
partial edge tile filling
+
SIMD overread
=
intermittent NaN generation
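
The interaction above can be reproduced in isolation. The following is a minimal sketch, not ncnn code: kOverread, kLanes, wide_load_sees_nan and simulate are hypothetical names, and a scalar loop stands in for the 512-bit load so that it runs on any CPU. It assumes a buffer of 20 floats (an edge tile that is not a multiple of 16) that previously held NaN everywhere.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical stand-ins for illustration only.
static const size_t kOverread = 64; // mirrors NCNN_MALLOC_OVERREAD = 64
static const size_t kLanes = 16;    // floats covered by one 512-bit load

// Models one 16-float vector load starting at p[offset]:
// returns true if any lane it touches holds NaN.
static bool wide_load_sees_nan(const float* p, size_t offset)
{
    for (size_t i = 0; i < kLanes; i++)
    {
        if (std::isnan(p[offset + i]))
            return true;
    }
    return false;
}

// Recycles a block that a previous user left full of NaN.
// clear_overread == false clears only the requested bytes (reported behavior);
// clear_overread == true also clears the trailing over-read region.
static bool simulate(bool clear_overread)
{
    const size_t n = 20;                          // requested floats
    const size_t pad = kOverread / sizeof(float); // 16 floats of over-read space
    std::vector<float> block(n + pad, std::numeric_limits<float>::quiet_NaN());

    const size_t cleared = clear_overread ? (n + pad) : n;
    std::memset(block.data(), 0, cleared * sizeof(float));

    // im2col writes only the valid elements of the edge tile.
    for (size_t i = 0; i < n; i++)
        block[i] = 1.0f;

    // The second load covers floats [16, 32): 4 valid values plus
    // 12 floats that belong to the over-read region.
    return wide_load_sees_nan(block.data(), 16);
}
```

Under these assumptions, simulate(false) reports that the load touches NaN, while simulate(true) does not.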
context
ncnn: x86 backend
Convolution implementation: SGEMM (im2col + GEMM)
Workspace allocator: PoolAllocator
SIMD path: AVX512-capable GEMM kernels
OS: Linux / Windows (potentially platform independent)
Compiler: GCC / Clang / MSVC
Issue observed during inference only
how to reproduce
Build ncnn with x86 SGEMM convolution enabled.
Run repeated inference using a model that frequently exercises SGEMM edge tiles.
Reuse PoolAllocator as workspace allocator.
After sufficient allocator reuse cycles, outputs may occasionally become entirely NaN.
Instrumentation around SGEMM shows NaN already exists inside BT before GEMM computation.

Example checks:

// input
assert(!has_nan(bottom_blob));

// weights
assert(!has_nan(weight_data));

// bias
assert(!has_nan(bias_data));

// BT tile
check_nan(BT);

Result:

bottom_blob : clean
weight_data : clean
bias_data : clean
BT : contains NaN
output : all NaN
more
Proposed Fix 1: Clear full allocated region in PoolAllocator

fastMalloc() allocates:

size + NCNN_MALLOC_OVERREAD

but allocator only clears:

size

Suggested change in src/allocator.cpp:

memset(ptr, 0, size + NCNN_MALLOC_OVERREAD);

Apply to both:

budget recycle path
new allocation path
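
To illustrate what clearing both paths means, here is a toy pool written for this report. It is a sketch under assumptions, not the code in src/allocator.cpp: ToyPool, alloc, release and kOverread are hypothetical names, and the real PoolAllocator has locking, alignment and size-matching logic that is omitted here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <utility>
#include <vector>

static const size_t kOverread = 64; // mirrors NCNN_MALLOC_OVERREAD = 64

// Hypothetical toy pool; each block is malloc'ed as capacity + kOverread bytes.
class ToyPool
{
public:
    ~ToyPool()
    {
        for (auto& b : free_) std::free(b.second);
        for (auto& b : used_) std::free(b.second);
    }

    void* alloc(size_t size)
    {
        // Recycle path: reuse the first free block that is large enough.
        for (size_t i = 0; i < free_.size(); i++)
        {
            if (free_[i].first >= size)
            {
                std::pair<size_t, void*> b = free_[i];
                free_.erase(free_.begin() + i);
                used_.push_back(b);
                // Clear the requested bytes AND the over-read region.
                std::memset(b.second, 0, size + kOverread);
                return b.second;
            }
        }

        // New allocation path: same clearing rule.
        void* ptr = std::malloc(size + kOverread);
        if (!ptr) return nullptr;
        std::memset(ptr, 0, size + kOverread);
        used_.push_back(std::make_pair(size, ptr));
        return ptr;
    }

    void release(void* ptr)
    {
        for (size_t i = 0; i < used_.size(); i++)
        {
            if (used_[i].second == ptr)
            {
                free_.push_back(used_[i]); // keep for reuse, contents untouched
                used_.erase(used_.begin() + i);
                return;
            }
        }
    }

private:
    std::vector<std::pair<size_t, void*> > free_; // (capacity, pointer)
    std::vector<std::pair<size_t, void*> > used_;
};
```

The design point is that the memset length matches what a consumer is allowed to read (size + kOverread), not what it asked for (size).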
Proposed Fix 2: Explicitly zero SGEMM workspace tiles

In:

src/layer/x86/convolution_im2col_gemm.h

after creating BT:

memset(BT.data, 0, BT.total() * BT.elemsize);

after creating topT_tileX:

memset(topT_tileX.data, 0, topT_tileX.total() * topT_tileX.elemsize);
Verification

After applying both modifications:

No NaN observed in BT workspace
No contour disappearance observed
Long-duration inference remains stable
Issue has not reappeared during stress testing
Expected Behavior

Workspace memory should not expose stale data from previous allocations, including the OVERREAD region reserved for SIMD loads.

Actual Behavior

The OVERREAD region may contain stale NaN values from previous allocations, which can be consumed by SGEMM vector loads and propagate through inference.
