Add MaxPool1D decomposition pass support (#17022)#17022
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17022
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 2 New Failures, 2 Unrelated FailuresAs of commit ad1f626 with merge base 10c8958 ( NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
There was a problem hiding this comment.
Pull request overview
Implements MaxPool1D support for the Arm backend by introducing a decomposition pass that rewrites max_pool1d into view_copy → max_pool2d → view_copy, and adds test coverage for the new behavior.
Changes:
- Add
DecomposeMaxPool1dPassand register it in the Arm pass pipeline (TFA/annotation stage). - Extend Arm quantization annotation patterns to treat
squeeze_copy.dimsas a shared-input-qspec op. - Add Arm backend tests covering MaxPool1D across TOSA FP/INT and Ethos-U55/U85 INT pipelines.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| backends/arm/test/targets.bzl | Registers the new MaxPool1D test file in the Arm test suite. |
| backends/arm/test/ops/test_max_pool1d.py | Adds pipeline-based tests validating MaxPool1D behavior and delegation. |
| backends/arm/quantizer/quantization_annotator.py | Adds squeeze_copy.dims to shared-qspec ops for quant annotation consistency. |
| backends/arm/_passes/decompose_max_pool1d_pass.py | Introduces the MaxPool1D decomposition pass (view/unsqueeze → max_pool2d → view/squeeze). |
| backends/arm/_passes/arm_pass_manager.py | Wires the new pass into the Arm pass manager (incl. annotation pipeline). |
| backends/arm/_passes/init.py | Exposes the new pass from the passes package. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @common.parametrize("test_data", test_data_suite) | ||
| def test_max_pool1d_tosa_FP(test_data: torch.Tensor): | ||
| """Test max_pool1d with TOSA FP pipeline.""" | ||
| test_data, model_params = test_data() | ||
| pipeline = TosaPipelineFP[input_t1]( |
There was a problem hiding this comment.
The test_data parameter is a callable from test_data_suite/test_data_suite_u55 (you immediately call test_data()), but it's annotated as torch.Tensor. This makes the test signature misleading for static type checking and IDEs; consider annotating it as Callable[[], tuple[torch.Tensor, list[int]]] (or dropping the annotation) to match actual usage.
oscarandersson8218
left a comment
There was a problem hiding this comment.
Awesome patch @Ninja91! Looks good on a high level. Adding a few comments on top of Copilot's.
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Differential Revision: D91760459
3a1a61e to
8ad827c
Compare
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Differential Revision: D91760459
8ad827c to
46190fb
Compare
|
See the errors below that seems suspicions, test to rerun Arm tests to see if it's just flakey FAILED backends/arm/test/ops/test_matmul.py::test_matmul_XXXX |
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Differential Revision: D91760459
46190fb to
3c783db
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @common.parametrize("test_data", test_data_suite) | ||
| @pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False) | ||
| def test_max_pool1d_u55_INT(test_data: Callable): | ||
| """Test max_pool1d on Ethos-U55 (quantized).""" | ||
| test_data, model_params = test_data() | ||
| pipeline = EthosU55PipelineINT[input_t1]( | ||
| MaxPool1d(*model_params), | ||
| (test_data,), | ||
| aten_op, | ||
| exir_ops=[], | ||
| ) | ||
| pipeline.run() | ||
|
|
||
|
|
||
| @common.parametrize("test_data", test_data_suite) | ||
| @pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False) | ||
| def test_max_pool1d_u85_INT(test_data: Callable): | ||
| """Test max_pool1d on Ethos-U85 (quantized).""" | ||
| test_data, model_params = test_data() | ||
| pipeline = EthosU85PipelineINT[input_t1]( | ||
| MaxPool1d(*model_params), | ||
| (test_data,), | ||
| aten_op, | ||
| exir_ops=[], | ||
| ) | ||
| pipeline.run() | ||
|
|
There was a problem hiding this comment.
The U55/U85 INT tests use a broad @pytest.mark.xfail; once MaxPool1D is supported, these should follow the repo’s existing pattern and be gated with common.XfailIfNoCorstone300 / common.XfailIfNoCorstone320 instead (so CI environments without FVP don’t hard-fail).
| test_data_suite = { | ||
| # (test_name, test_data, [kernel_size, stride, padding]) | ||
| "simple": lambda: (torch.rand(1, 16, 50), [4, 2, 0]), | ||
| "with_padding": lambda: (torch.rand(1, 16, 50), [3, 2, 1]), | ||
| "stride_1": lambda: (torch.rand(1, 8, 32), [3, 1, 0]), | ||
| "larger_kernel": lambda: (torch.rand(1, 4, 64), [8, 4, 0]), | ||
| "multi_batch": lambda: (torch.rand(4, 16, 50), [4, 2, 0]), | ||
| } |
There was a problem hiding this comment.
The comment describing test_data_suite’s tuple shape is inaccurate: the values returned are (tensor, [kernel_size, stride, padding]) and do not include a test_name element. Please update the comment to match the actual structure to avoid confusion when adding new cases.
|
PR seem to introduce this errors :( |
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
3c783db to
e245c96
Compare
db258e9 to
0a5f90c
Compare
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
Summary: Pull Request resolved: pytorch#17022 Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
0a5f90c to
821fd14
Compare
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
821fd14 to
a56c81d
Compare
Summary: Pull Request resolved: pytorch#17022 Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
a56c81d to
18cedc6
Compare
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
18cedc6 to
448ee04
Compare
Summary: Pull Request resolved: pytorch#17022 Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
448ee04 to
8824b12
Compare
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
8824b12 to
7ffeafd
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def call_operator(self, op, args, kwargs, meta): | ||
| if op != torch.ops.aten.max_pool1d.default or not self.allowed_to_transform( | ||
| meta | ||
| ): | ||
| return super().call_operator(op, args, kwargs, meta) |
There was a problem hiding this comment.
DecomposeMaxPool1dPass currently only matches torch.ops.aten.max_pool1d.default. In the backend lowering pipeline (transform_to_backend_pipeline), graphs are typically in the EXIR edge dialect, so this pass may never fire for non-quantized compilation unless it also handles the edge-dialect overload (and emits the corresponding edge ops). Consider supporting both aten and edge overloads (similar to other passes that accept both).
| x_4d = super().call_operator( | ||
| torch.ops.aten.unsqueeze_copy.default, | ||
| (x, 2), | ||
| {}, | ||
| meta, | ||
| updated=True, | ||
| ) |
There was a problem hiding this comment.
When inserting the unsqueeze_copy node, the pass reuses the original node metadata unchanged. In the main Arm pipeline this pass can run post Q/DQ folding, so carrying over input_qparams/output_qparams from the original max_pool1d node can incorrectly mark the new view-like node as already-quantized. Other passes that insert view ops clear qparams on these intermediates (e.g., Conv1dUnsqueezePass). Consider copying meta and clearing input/output qparams for the inserted unsqueeze node.
| output = super().call_operator( | ||
| torch.ops.aten.squeeze_copy.dims, | ||
| (pooled, [2]), | ||
| {}, | ||
| meta, | ||
| updated=True, | ||
| ) |
There was a problem hiding this comment.
Same as the inserted unsqueeze_copy: the inserted squeeze_copy node currently reuses the original node metadata unchanged. If this pass runs after quantization/QDQ folding, preserving qparams here can lead to incorrect quant metadata on a view-like op. Consider using a copied meta with cleared input_qparams/output_qparams for this intermediate reshape as well.
| DecomposeMaxPool2dPass(), | ||
| DecomposeMaxPool1dPass(), |
There was a problem hiding this comment.
In the TOSA backend pipeline, DecomposeMaxPool1dPass is currently scheduled after DecomposeMaxPool2dPass. Since this pass can introduce a max_pool2d op (potentially with dilation > 1), running it after the 2D decomposition pass prevents the newly created max_pool2d from being further decomposed/normalized. Consider moving DecomposeMaxPool1dPass before DecomposeMaxPool2dPass, or declaring DecomposeMaxPool2dPass in _passes_required_after so ordering is enforced.
| DecomposeMaxPool2dPass(), | |
| DecomposeMaxPool1dPass(), | |
| DecomposeMaxPool1dPass(), | |
| DecomposeMaxPool2dPass(), |
| class DecomposeMaxPool1dPass(ArmPass): | ||
| """Decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy | ||
| operations. | ||
|
|
||
| This pass runs in transform_for_annotation (TFA) pipeline before quantization, | ||
| ensuring proper quantization annotation for the decomposed ops. | ||
|
|
||
| Transformation: | ||
| max_pool1d(x, kernel, stride, padding, dilation, ceil_mode) | ||
| → unsqueeze_copy(x, dim=2) # (N,C,L) → (N,C,1,L) | ||
| → max_pool2d(..., [1,k], [1,s], [0,p], [1,d], ceil_mode) | ||
| → squeeze_copy(..., dims=[2]) # (N,C,1,L') → (N,C,L') | ||
| """ |
There was a problem hiding this comment.
There are existing max_pool1d backend tests currently marked xfail (e.g., backends/arm/test/ops/test_max_pool1d.py). Since this PR adds explicit MaxPool1D support, it should also update/enable those tests (and adjust any expected edge op name if the lowering changes from the previous max_pool2d_with_indices pattern).
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
|
@Ninja91 stale? |
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy operations: 1. unsqueeze_copy(dim=2): (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p], [d]→[1,d] 3. squeeze_copy(dims=[2]): (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **unsqueeze_copy and squeeze_copy map to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - unsqueeze_copy.default is in _one_to_one_shared_input_qspec - squeeze_copy.dims is added to _one_to_one_shared_input_qspec - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec - All get proper SharedQuantizationSpec from the annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate unsqueeze_copy and squeeze_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy operations: 1. unsqueeze_copy(dim=2): (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p], [d]→[1,d] 3. squeeze_copy(dims=[2]): (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **unsqueeze_copy and squeeze_copy map to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - unsqueeze_copy.default is in _one_to_one_shared_input_qspec - squeeze_copy.dims is added to _one_to_one_shared_input_qspec - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec - All get proper SharedQuantizationSpec from the annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate unsqueeze_copy and squeeze_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459
| def call_operator(self, op, args, kwargs, meta): | ||
| if op != torch.ops.aten.max_pool1d.default or not self.allowed_to_transform( | ||
| meta | ||
| ): | ||
| return super().call_operator(op, args, kwargs, meta) | ||
|
|
||
| # Extract and normalize arguments | ||
| x = args[0] | ||
| kernel_size = _normalize_to_list(args[1]) | ||
| stride = _normalize_to_list( | ||
| args[2] if len(args) > 2 else None, | ||
| default=kernel_size, # stride defaults to kernel_size | ||
| ) | ||
| padding = _normalize_to_list(args[3] if len(args) > 3 else 0) | ||
| dilation = _normalize_to_list(args[4] if len(args) > 4 else 1) | ||
| ceil_mode = args[5] if len(args) > 5 else False | ||
|
|
||
| # Step 1: Unsqueeze input from 3D to 4D at dim=2 | ||
| # (N, C, L) → (N, C, 1, L) | ||
| unsqueeze_meta = meta.copy() | ||
| unsqueeze_meta.data["input_qparams"] = {} | ||
| unsqueeze_meta.data["output_qparams"] = {} | ||
| x_4d = super().call_operator( | ||
| torch.ops.aten.unsqueeze_copy.default, | ||
| (x, 2), | ||
| {}, | ||
| unsqueeze_meta, | ||
| updated=True, | ||
| ) | ||
|
|
||
| # Step 2: Call max_pool2d with 2D parameters | ||
| # kernel: [k] → [1, k], stride: [s] → [1, s] | ||
| # padding: [p] → [0, p], dilation: [d] → [1, d] | ||
| pooled = super().call_operator( | ||
| torch.ops.aten.max_pool2d.default, | ||
| ( | ||
| x_4d, | ||
| [1] + kernel_size, | ||
| [1] + stride, | ||
| [0] + padding, | ||
| [1] + dilation, | ||
| ceil_mode, | ||
| ), | ||
| {}, | ||
| meta, | ||
| updated=True, | ||
| ) |
| UnsqueezeBeforeRepeatPass(), | ||
| DecomposeCumsumPass(exported_program), | ||
| DecomposeAsStridedCopyPass(), | ||
| DecomposeMaxPool1dPass(), |
There was a problem hiding this comment.
Agree with this one. We shouldn't see any max_pool1d:s at this stage, the pass won't match any exir-targets and will emit torch.ops rather than exir_ops.
|
|
||
|
|
||
| @common.parametrize("test_data", test_data_suite_all) | ||
| @pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False) |
There was a problem hiding this comment.
I would have expected U55 and U85 tests to pass now as all the independent operators of the decomposition are supported.
| UnsqueezeBeforeRepeatPass(), | ||
| DecomposeCumsumPass(exported_program), | ||
| DecomposeAsStridedCopyPass(), | ||
| DecomposeMaxPool1dPass(), |
There was a problem hiding this comment.
Agree with this one. We shouldn't see any max_pool1d:s at this stage, the pass won't match any exir-targets and will emit torch.ops rather than exir_ops.
Summary:
Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy.
Implementation Strategy
Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy
operations:
Why This Approach is Optimal
unsqueeze_copy and squeeze_copy map to TOSA RESHAPE which is zero-cost in Vela:
TFA Pipeline Placement (before quantization):
Quantization Handling:
TOSA/Vela Constraints Validated
Reviewed By: 3l1
Differential Revision: D91760459