| SoC | driver | works out of the box | subgroupMemoryBarrier hack | disable subgroup ops |
| --- | --- | --- | --- | --- |
| 855p | system driver 512.502.0 | no | works | works |
| 855p | mesa turnip 26.1.2 | no | no | works |
| 8elite | system driver 512.800.64 | yes | / | / |
| 8elite | mesa turnip 26.1.2 | yes | / | / |
| 8elitegen5 | system driver 512.842.19 | no | no | works |
| 8elitegen5 | mesa turnip 26.1.2 | yes | / | / |
reproduce steps
1. Compile the ncnn example ppocrv5 with the patch from #6780 ("query VK_KHR_shader_subgroup_extended_types features, sanitize fp16 subgroup path") applied.
2. Prepare the mobile model files from https://github.com/nihui/ncnn-android-ppocrv5/tree/master/app/src/main/assets
3. Run `./ppocrv5 test.jpg`
expected output
H 0.98702 at 517.01 1078.97 33.54 x 742.46 @ 89.44 = 执行标准:GB/T26701-2011、Q/MINISO341-2024
On some chips the detected box and score are nearly identical, but the recognized text is garbage:
H 0.98622 at 517.02 1078.96 33.55 x 742.48 @ 89.44 = 砘毽斜荤敏天麔籗昈拧雗酯∖戍壼呓瘕|剺庂剜谭≇蠖親∓桅🐏ʎ古懻叁/M蝚盏忏偁パ₍偎刃蕗饍眃杄燬羵🌅负従嶙♋轨鶺垕Ã览蘁汰鍉⏝欵鬼鴀颟
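Because the box coordinates and score barely differ between the good and bad runs, only the text after ` = ` tells them apart. When repeating the test across several SoC and driver combinations, a small check like the following may help. It is a minimal sketch: `recognized_text` and `matches_expected` are hypothetical helpers, not part of ncnn, and it assumes each output line has the `... = text` shape shown above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of ncnn): return the recognized text that
// follows the first " = " separator in one ppocrv5 output line.
// Returns an empty string when the separator is missing.
std::string recognized_text(const std::string& line)
{
    const std::string sep = " = ";
    const std::string::size_type pos = line.find(sep);
    if (pos == std::string::npos)
        return std::string();
    return line.substr(pos + sep.size());
}

// Returns true when the line carries exactly the expected text.
// The comparison is byte-wise, so both strings must use the same encoding
// (UTF-8 is assumed here).
bool matches_expected(const std::string& line, const std::string& expected)
{
    assert(!expected.empty());
    return recognized_text(line) == expected;
}
```

Feeding it the two lines above with the expected string `执行标准:GB/T26701-2011、Q/MINISO341-2024` should accept the first and reject the second.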
subgroupMemoryBarrier hack
Edit src/layer/vulkan/shader/gemm_sg.comp to add an extra subgroupMemoryBarrier() before and after each block of subgroup shuffles, then recompile:
for (int z = 0; z < UNROLL_SG_K; z++)
{
    subgroupMemoryBarrier();
    afpvec4 aa = subgroupShuffle(a, smi * UNROLL_SG_K + z);
    afpvec4 bb = subgroupShuffle(b, sni * UNROLL_SG_K + z);
    subgroupMemoryBarrier();
    sum0 += aa.r * bb;
    sum1 += aa.g * bb;
    sum2 += aa.b * bb;
    sum3 += aa.a * bb;
}
and
for (int z = 0; z < UNROLL_SG_K && k + z < psc(K); z++)
{
    subgroupMemoryBarrier();
    afpvec4 aa = subgroupShuffle(a, smi * UNROLL_SG_K + z);
    afpvec4 bb = subgroupShuffle(b, sni * UNROLL_SG_K + z);
    subgroupMemoryBarrier();
    sum0 += aa.r * bb;
    sum1 += aa.g * bb;
    sum2 += aa.b * bb;
    sum3 += aa.a * bb;
}
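For comparison, here is a rough CPU model of what each invocation is expected to accumulate in these loops. It is only a sketch under stated assumptions: it uses scalars instead of `afpvec4`, the names `shuffle_read` and `lane_sum` are made up for illustration, and the way `smi` and `sni` map to lanes is not shown in this issue, so they are plain parameters here. The only property relied on is that `subgroupShuffle(value, id)` returns the value held by invocation `id`.

```cpp
#include <cassert>
#include <vector>

// CPU stand-in for subgroupShuffle(value, id): read the value that lane `id`
// holds. lane_values[i] is the per-lane register content of lane i.
float shuffle_read(const std::vector<float>& lane_values, int id)
{
    assert(id >= 0 && id < static_cast<int>(lane_values.size()));
    return lane_values[id];
}

// Expected partial sum for one lane. `unroll` plays the role of UNROLL_SG_K;
// smi and sni are that lane's tile indices (assumed, see the lead-in).
float lane_sum(const std::vector<float>& a, const std::vector<float>& b,
               int smi, int sni, int unroll)
{
    float sum = 0.f;
    for (int z = 0; z < unroll; z++)
    {
        const float aa = shuffle_read(a, smi * unroll + z);
        const float bb = shuffle_read(b, sni * unroll + z);
        sum += aa * bb;
    }
    return sum;
}
```

With four lanes holding `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}` and `unroll = 2`, a lane with `smi = 0, sni = 1` reads `a[0], a[1]` and `b[2], b[3]` and accumulates `1*30 + 2*40 = 110`. Output like the garbage text above would be consistent with some shuffles returning values other than these, though this issue does not establish the root cause.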
disable subgroup ops
Edit examples/ppocrv5.cpp to set the option before loading the rec model:
ppocrv5_rec.opt.use_subgroup_ops = false;
ppocrv5_rec.load_param("PP_OCRv5_mobile_rec.ncnn.param");
ppocrv5_rec.load_model("PP_OCRv5_mobile_rec.ncnn.bin");
Known ineffective workarounds/hacks
Disabling fp16 packed/storage/arithmetic while leaving use_subgroup_ops = true does not fix the garbage output.