Summary:
Implement DecomposeMaxPool1dPass to enable MaxPool1D support on the ARM backend
by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy.
## Implementation Strategy
### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy
operations:
1. unsqueeze_copy(dim=2): (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p], [d]→[1,d]
3. squeeze_copy(dims=[2]): (N, C, 1, L_out) → (N, C, L_out) - remove height dimension
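
The three steps above can be sketched as a dependency-free reference model. This is not the pass itself: it works on nested Python lists rather than graph nodes, and every function name here (`adapt_params`, `pool_out_len`, `max_pool1d_ref`, `max_pool2d_ref`, `max_pool1d_decomposed`) is hypothetical. It assumes floor-mode output sizes and implicit negative-infinity padding, as in PyTorch's max pooling.

```python
NEG_INF = float("-inf")


def adapt_params(k, s, p, d):
    """Map 1D pooling params to 2D ones for a dummy height dimension of size 1."""
    return [1, k], [1, s], [0, p], [1, d]


def pool_out_len(length, k, s, p, d):
    """Floor-mode output length of a pooling window along one dimension."""
    return (length + 2 * p - d * (k - 1) - 1) // s + 1


def max_pool1d_ref(x, k, s, p, d):
    """Reference max_pool1d on an (N, C, L) nested list."""
    out = []
    for sample in x:
        out_sample = []
        for row in sample:
            n = len(row)
            l_out = pool_out_len(n, k, s, p, d)
            pooled = []
            for j in range(l_out):
                # Positions outside [0, n) are padding and are skipped.
                cols = (j * s - p + b * d for b in range(k))
                pooled.append(max((row[c] for c in cols if 0 <= c < n), default=NEG_INF))
            out_sample.append(pooled)
        out.append(out_sample)
    return out


def max_pool2d_ref(x, kernel, stride, padding, dilation):
    """Reference max_pool2d on an (N, C, H, W) nested list."""
    (kh, kw), (sh, sw), (ph, pw), (dh, dw) = kernel, stride, padding, dilation
    out = []
    for sample in x:
        out_sample = []
        for plane in sample:
            h, w = len(plane), len(plane[0])
            h_out = pool_out_len(h, kh, sh, ph, dh)
            w_out = pool_out_len(w, kw, sw, pw, dw)
            out_plane = []
            for i in range(h_out):
                out_row = []
                for j in range(w_out):
                    best = NEG_INF
                    for a in range(kh):
                        for b in range(kw):
                            r = i * sh - ph + a * dh
                            c = j * sw - pw + b * dw
                            if 0 <= r < h and 0 <= c < w:
                                best = max(best, plane[r][c])
                    out_row.append(best)
                out_plane.append(out_row)
            out_sample.append(out_plane)
        out.append(out_sample)
    return out


def max_pool1d_decomposed(x, k, s, p, d):
    """unsqueeze(dim=2) -> max_pool2d -> squeeze(dim=2), mirroring the decomposition."""
    x4 = [[[row] for row in sample] for sample in x]          # (N, C, L) -> (N, C, 1, L)
    y4 = max_pool2d_ref(x4, *adapt_params(k, s, p, d))        # (N, C, 1, L_out)
    return [[plane[0] for plane in sample] for sample in y4]  # (N, C, L_out)


x = [[[1, 5, 2, 8, 3, 7, 4]]]  # N=1, C=1, L=7
direct = max_pool1d_ref(x, 3, 2, 1, 1)
decomposed = max_pool1d_decomposed(x, 3, 2, 1, 1)
```

With kernel 3, stride 2, padding 1 and dilation 1, both paths give the same `(1, 1, 4)` result, because the added height dimension has size 1 and its kernel, stride, padding and dilation entries (1, 1, 0, 1) leave it unchanged.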
### Why This Approach is Optimal
1. **unsqueeze_copy and squeeze_copy map to TOSA RESHAPE** which is zero-cost in Vela:
- Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
- Bypassed entirely when the conditions are met (NPU-produced input, single consumer)
- Tensor equivalence enables memory aliasing (same address)
2. **TFA (transform-for-annotation) Pipeline Placement (before quantization)**:
- unsqueeze_copy.default is in _one_to_one_shared_input_qspec
- squeeze_copy.dims is added to _one_to_one_shared_input_qspec
- max_pool2d is in _one_to_one_shared_input_or_input_act_qspec
- All get proper SharedQuantizationSpec from the annotator automatically
3. **Quantization Handling**:
- Clear qparams on intermediate unsqueeze_copy and squeeze_copy ops (let annotator fill them)
- Preserve original meta on max_pool2d for proper tracing
- MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)
### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed
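
As an illustration of the U55 limits listed above, a minimal sketch of a parameter check on the adapted 2D values follows. The helper name `fits_u55_limits` and the constants are hypothetical and simply restate the limits from this summary (stride ≤ 3, kernel ≤ 256 per dimension). It is not the backend's actual support check and does not model U85 or dilation handling.

```python
# Limits as stated in this summary; the names are illustrative only.
U55_MAX_STRIDE = 3
U55_MAX_KERNEL = 256


def fits_u55_limits(kernel_2d, stride_2d):
    """Return True if the 2D kernel and stride are within the stated U55 limits."""
    kernel_ok = all(1 <= k <= U55_MAX_KERNEL for k in kernel_2d)
    stride_ok = all(1 <= s <= U55_MAX_STRIDE for s in stride_2d)
    return kernel_ok and stride_ok


# A 1D pool with k=3, s=2 becomes kernel [1, 3], stride [1, 2] after decomposition.
ok = fits_u55_limits([1, 3], [1, 2])
too_wide_stride = fits_u55_limits([1, 3], [1, 4])
```

Because the decomposition always sets the height entries to 1, only the original 1D kernel and stride values can push a pool outside these limits.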
Reviewed By: 3l1
Differential Revision: D91760459