KWS:
The new Thresholding node (input quantization) comes before the flatten (realized implicitly by the DWC), so its maximum PE is 1, leading to an initiation interval of 490 cycles, which becomes the new bottleneck:
"Thresholding_rtl_0": 490,
"MVAU_hls_0": 392,
"MVAU_hls_1": 256,
"MVAU_hls_2": 256,
"MVAU_hls_3": 384,
"LabelSelect_hls_0": 13
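The bottleneck claim above can be checked directly from the reported per-node intervals. A minimal sketch (the dict literal just mirrors the numbers pasted above; it is not produced by any FINN API):

```python
# Per-node initiation intervals (cycles) for the KWS model, copied from above.
intervals = {
    "Thresholding_rtl_0": 490,
    "MVAU_hls_0": 392,
    "MVAU_hls_1": 256,
    "MVAU_hls_2": 256,
    "MVAU_hls_3": 384,
    "LabelSelect_hls_0": 13,
}

# The slowest node sets the steady-state throughput of the whole pipeline.
bottleneck = max(intervals, key=intervals.get)
print(bottleneck, intervals[bottleneck])  # → Thresholding_rtl_0 490
```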
bnn-pynq TFC:
The Reshape (a no-op) performs the flattening of the 28x28 input. Its PE is limited to 28, so its lowest achievable interval is 28 cycles. With the current folding this is not a problem (the bottleneck is 64 cycles), but I can imagine cases where it could be.
"Reshape_rtl_0": 28,
"Thresholding_rtl_0": 28,
"MVAU_hls_0": 64,
"MVAU_hls_1": 64,
"MVAU_hls_2": 64,
"MVAU_hls_3": 8,
"LabelSelect_hls_0": 11
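The "lowest possible interval" figure follows from the usual folding arithmetic: with each PE handling one element per cycle, a node needs ceil(elements / PE) cycles per inference. A hedged sketch (the helper name `min_interval` is mine, not a FINN function):

```python
import math

def min_interval(num_elems: int, pe: int) -> int:
    """Lowest achievable initiation interval in cycles, assuming each PE
    consumes one element per cycle: ceil(num_elems / PE)."""
    return math.ceil(num_elems / pe)

# Reshape flattening the 28x28 input, with PE capped at 28:
print(min_interval(28 * 28, 28))  # → 28
```

So even at its maximum PE of 28, the Reshape cannot go below 28 cycles, which is why it would cap throughput if the MVAUs were folded below that.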