Add MaxPool1D decomposition pass support (#17022) by Ninja91 · Pull Request #17022 · pytorch/executorch

Ninja91 · 2026-01-30T00:44:53Z

Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy.

Implementation Strategy

Decomposition Approach (Optimal for TOSA/Vela)

The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy
operations:

unsqueeze_copy(dim=2): (N, C, L) → (N, C, 1, L) - add height dimension
max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p], [d]→[1,d]
squeeze_copy(dims=[2]): (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

Why This Approach is Optimal

unsqueeze_copy and squeeze_copy map to TOSA RESHAPE which is zero-cost in Vela:
- Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
- Bypassed entirely when conditions met (NPU-produced, single consumer)
- Tensor equivalence enables memory aliasing (same address)
TFA Pipeline Placement (before quantization):
- unsqueeze_copy.default is in _one_to_one_shared_input_qspec
- squeeze_copy.dims is added to _one_to_one_shared_input_qspec
- max_pool2d is in _one_to_one_shared_input_or_input_act_qspec
- All get proper SharedQuantizationSpec from the annotator automatically
Quantization Handling:
- Clear qparams on intermediate unsqueeze_copy and squeeze_copy ops (let annotator fill them)
- Preserve original meta on max_pool2d for proper tracing
- MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

TOSA/Vela Constraints Validated

U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
U85: Extended stride support via accumulator save/restore
Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459

pytorch-bot · 2026-01-30T00:44:56Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17022

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Run pull request jobs on OSDC runners in shadow mode

❌ 2 New Failures, 2 Unrelated Failures

As of commit ad1f626 with merge base 10c8958 ():

NEW FAILURES - The following jobs have failed:

pull / test-qnn-wheel-packages-linux (3.10) / linux-job (gh)
RuntimeError: Command docker exec -t a053f9ef18bfdb54245dd1da81dd098b9beb9a8434350063edd4f47090d96914 /exec failed with exit code 1
pull / unittest-arm-backend-with-no-deps (test_pytest_ops_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t ee286d62f96d16cb23c7530401beafe335a71d0815e2d34651f457bd92dea4d0 /exec failed with exit code 1

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)
trunk / test-models-linux-aarch64 (qwen2_5_1_5b, portable, linux.arm64.2xlarge) / linux-job (gh) (matched linux rule in flaky-rules.json)
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-01-30T00:45:11Z

@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91760459.

github-actions · 2026-01-30T00:45:55Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

Implements MaxPool1D support for the Arm backend by introducing a decomposition pass that rewrites max_pool1d into view_copy → max_pool2d → view_copy, and adds test coverage for the new behavior.

Changes:

Add DecomposeMaxPool1dPass and register it in the Arm pass pipeline (TFA/annotation stage).
Extend Arm quantization annotation patterns to treat squeeze_copy.dims as a shared-input-qspec op.
Add Arm backend tests covering MaxPool1D across TOSA FP/INT and Ethos-U55/U85 INT pipelines.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
backends/arm/test/targets.bzl	Registers the new MaxPool1D test file in the Arm test suite.
backends/arm/test/ops/test_max_pool1d.py	Adds pipeline-based tests validating MaxPool1D behavior and delegation.
backends/arm/quantizer/quantization_annotator.py	Adds `squeeze_copy.dims` to shared-qspec ops for quant annotation consistency.
backends/arm/_passes/decompose_max_pool1d_pass.py	Introduces the MaxPool1D decomposition pass (view/unsqueeze → max_pool2d → view/squeeze).
backends/arm/_passes/arm_pass_manager.py	Wires the new pass into the Arm pass manager (incl. annotation pipeline).
backends/arm/_passes/init.py	Exposes the new pass from the passes package.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-30T00:50:54Z

+@common.parametrize("test_data", test_data_suite)
+def test_max_pool1d_tosa_FP(test_data: torch.Tensor):
+    """Test max_pool1d with TOSA FP pipeline."""
+    test_data, model_params = test_data()
+    pipeline = TosaPipelineFP[input_t1](


The test_data parameter is a callable from test_data_suite/test_data_suite_u55 (you immediately call test_data()), but it's annotated as torch.Tensor. This makes the test signature misleading for static type checking and IDEs; consider annotating it as Callable[[], tuple[torch.Tensor, list[int]]] (or dropping the annotation) to match actual usage.

oscarandersson8218

Awesome patch @Ninja91! Looks good on a high level. Adding a few comments on top of Copilot's.

Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Differential Revision: D91760459

zingo · 2026-02-10T08:41:26Z

See the errors below that seems suspicions, test to rerun Arm tests to see if it's just flakey

FAILED backends/arm/test/ops/test_matmul.py::test_matmul_XXXX

Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Differential Revision: D91760459

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-10T22:07:21Z

+@common.parametrize("test_data", test_data_suite)
+@pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False)
+def test_max_pool1d_u55_INT(test_data: Callable):
+    """Test max_pool1d on Ethos-U55 (quantized)."""
+    test_data, model_params = test_data()
+    pipeline = EthosU55PipelineINT[input_t1](
+        MaxPool1d(*model_params),
+        (test_data,),
+        aten_op,
+        exir_ops=[],
+    )
+    pipeline.run()
+
+
+@common.parametrize("test_data", test_data_suite)
+@pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False)
+def test_max_pool1d_u85_INT(test_data: Callable):
+    """Test max_pool1d on Ethos-U85 (quantized)."""
+    test_data, model_params = test_data()
+    pipeline = EthosU85PipelineINT[input_t1](
+        MaxPool1d(*model_params),
+        (test_data,),
+        aten_op,
+        exir_ops=[],
+    )
+    pipeline.run()
+


The U55/U85 INT tests use a broad @pytest.mark.xfail; once MaxPool1D is supported, these should follow the repo’s existing pattern and be gated with common.XfailIfNoCorstone300 / common.XfailIfNoCorstone320 instead (so CI environments without FVP don’t hard-fail).

Copilot · 2026-02-10T22:07:21Z

+test_data_suite = {
+    # (test_name, test_data, [kernel_size, stride, padding])
+    "simple": lambda: (torch.rand(1, 16, 50), [4, 2, 0]),
+    "with_padding": lambda: (torch.rand(1, 16, 50), [3, 2, 1]),
+    "stride_1": lambda: (torch.rand(1, 8, 32), [3, 1, 0]),
+    "larger_kernel": lambda: (torch.rand(1, 4, 64), [8, 4, 0]),
+    "multi_batch": lambda: (torch.rand(4, 16, 50), [4, 2, 0]),
+}


The comment describing test_data_suite’s tuple shape is inaccurate: the values returned are (tensor, [kernel_size, stride, padding]) and do not include a test_name element. Please update the comment to match the actual structure to avoid confusion when adding new cases.

zingo · 2026-02-11T08:40:20Z

PR seem to introduce this errors :(
FAILED backends/arm/test/ops/test_matmul.py::test_matmul_u55_INT[double_input_randn_rand_1d_2d]
FAILED backends/arm/test/ops/test_matmul.py::test_matmul_u85_INT_a16w8[double_input_randn_rand_1d_2d]
FAILED backends/arm/test/ops/test_matmul.py::test_matmul_u85_INT[double_input_randn_rand_1d_2d]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[simple]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[with_padding]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[stride_1]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[larger_kernel]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[multi_batch]

Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459

Summary: Pull Request resolved: pytorch#17022 Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459

Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459

Summary: Pull Request resolved: pytorch#17022 Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459

Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459

Summary: Pull Request resolved: pytorch#17022 Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459

Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d to view_copy → max_pool2d → view_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via view_copy operations: 1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p] 3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - view_copy is in _one_to_one_shared_input_qspec (line 407) - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455) - Both get proper SharedQuantizationSpec from annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate view_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-03T21:03:30Z

+    def call_operator(self, op, args, kwargs, meta):
+        if op != torch.ops.aten.max_pool1d.default or not self.allowed_to_transform(
+            meta
+        ):
+            return super().call_operator(op, args, kwargs, meta)


DecomposeMaxPool1dPass currently only matches torch.ops.aten.max_pool1d.default. In the backend lowering pipeline (transform_to_backend_pipeline), graphs are typically in the EXIR edge dialect, so this pass may never fire for non-quantized compilation unless it also handles the edge-dialect overload (and emits the corresponding edge ops). Consider supporting both aten and edge overloads (similar to other passes that accept both).

Copilot · 2026-03-03T21:03:31Z

+        x_4d = super().call_operator(
+            torch.ops.aten.unsqueeze_copy.default,
+            (x, 2),
+            {},
+            meta,
+            updated=True,
+        )


When inserting the unsqueeze_copy node, the pass reuses the original node metadata unchanged. In the main Arm pipeline this pass can run post Q/DQ folding, so carrying over input_qparams/output_qparams from the original max_pool1d node can incorrectly mark the new view-like node as already-quantized. Other passes that insert view ops clear qparams on these intermediates (e.g., Conv1dUnsqueezePass). Consider copying meta and clearing input/output qparams for the inserted unsqueeze node.

Copilot · 2026-03-03T21:03:31Z

+        output = super().call_operator(
+            torch.ops.aten.squeeze_copy.dims,
+            (pooled, [2]),
+            {},
+            meta,
+            updated=True,
+        )


Same as the inserted unsqueeze_copy: the inserted squeeze_copy node currently reuses the original node metadata unchanged. If this pass runs after quantization/QDQ folding, preserving qparams here can lead to incorrect quant metadata on a view-like op. Consider using a copied meta with cleared input_qparams/output_qparams for this intermediate reshape as well.

Copilot · 2026-03-03T21:03:31Z

                DecomposeMaxPool2dPass(),
+                DecomposeMaxPool1dPass(),


In the TOSA backend pipeline, DecomposeMaxPool1dPass is currently scheduled after DecomposeMaxPool2dPass. Since this pass can introduce a max_pool2d op (potentially with dilation > 1), running it after the 2D decomposition pass prevents the newly created max_pool2d from being further decomposed/normalized. Consider moving DecomposeMaxPool1dPass before DecomposeMaxPool2dPass, or declaring DecomposeMaxPool2dPass in _passes_required_after so ordering is enforced.

Suggested change

DecomposeMaxPool2dPass(),

DecomposeMaxPool1dPass(),

DecomposeMaxPool1dPass(),

DecomposeMaxPool2dPass(),

Copilot · 2026-03-03T21:03:31Z

+class DecomposeMaxPool1dPass(ArmPass):
+    """Decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy
+    operations.
+
+    This pass runs in transform_for_annotation (TFA) pipeline before quantization,
+    ensuring proper quantization annotation for the decomposed ops.
+
+    Transformation:
+        max_pool1d(x, kernel, stride, padding, dilation, ceil_mode)
+            → unsqueeze_copy(x, dim=2)           # (N,C,L) → (N,C,1,L)
+            → max_pool2d(..., [1,k], [1,s], [0,p], [1,d], ceil_mode)
+            → squeeze_copy(..., dims=[2])        # (N,C,1,L') → (N,C,L')
+    """


There are existing max_pool1d backend tests currently marked xfail (e.g., backends/arm/test/ops/test_max_pool1d.py). Since this PR adds explicit MaxPool1D support, it should also update/enable those tests (and adjust any expected edge op name if the lowering changes from the previous max_pool2d_with_indices pattern).

github-actions · 2026-05-03T01:17:02Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

digantdesai · 2026-05-11T16:26:31Z

@Ninja91 stale?

Summary: Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy. ## Implementation Strategy ### Decomposition Approach (Optimal for TOSA/Vela) The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy operations: 1. unsqueeze_copy(dim=2): (N, C, L) → (N, C, 1, L) - add height dimension 2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p], [d]→[1,d] 3. squeeze_copy(dims=[2]): (N, C, 1, L_out) → (N, C, L_out) - remove height dimension ### Why This Approach is Optimal 1. **unsqueeze_copy and squeeze_copy map to TOSA RESHAPE** which is zero-cost in Vela: - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity) - Bypassed entirely when conditions met (NPU-produced, single consumer) - Tensor equivalence enables memory aliasing (same address) 2. **TFA Pipeline Placement (before quantization)**: - unsqueeze_copy.default is in _one_to_one_shared_input_qspec - squeeze_copy.dims is added to _one_to_one_shared_input_qspec - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec - All get proper SharedQuantizationSpec from the annotator automatically 3. **Quantization Handling**: - Clear qparams on intermediate unsqueeze_copy and squeeze_copy ops (let annotator fill them) - Preserve original meta on max_pool2d for proper tracing - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D) ### TOSA/Vela Constraints Validated - U55: Stride ≤3 ✓, Kernel ≤256x256 ✓ - U85: Extended stride support via accumulator save/restore - Dilation: Handled by separate DecomposeMaxPool2dPass if needed Reviewed By: 3l1 Differential Revision: D91760459

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

+    def call_operator(self, op, args, kwargs, meta):
+        if op != torch.ops.aten.max_pool1d.default or not self.allowed_to_transform(
+            meta
+        ):
+            return super().call_operator(op, args, kwargs, meta)
+
+        # Extract and normalize arguments
+        x = args[0]
+        kernel_size = _normalize_to_list(args[1])
+        stride = _normalize_to_list(
+            args[2] if len(args) > 2 else None,
+            default=kernel_size,  # stride defaults to kernel_size
+        )
+        padding = _normalize_to_list(args[3] if len(args) > 3 else 0)
+        dilation = _normalize_to_list(args[4] if len(args) > 4 else 1)
+        ceil_mode = args[5] if len(args) > 5 else False
+
+        # Step 1: Unsqueeze input from 3D to 4D at dim=2
+        # (N, C, L) → (N, C, 1, L)
+        unsqueeze_meta = meta.copy()
+        unsqueeze_meta.data["input_qparams"] = {}
+        unsqueeze_meta.data["output_qparams"] = {}
+        x_4d = super().call_operator(
+            torch.ops.aten.unsqueeze_copy.default,
+            (x, 2),
+            {},
+            unsqueeze_meta,
+            updated=True,
+        )
+
+        # Step 2: Call max_pool2d with 2D parameters
+        # kernel: [k] → [1, k], stride: [s] → [1, s]
+        # padding: [p] → [0, p], dilation: [d] → [1, d]
+        pooled = super().call_operator(
+            torch.ops.aten.max_pool2d.default,
+            (
+                x_4d,
+                [1] + kernel_size,
+                [1] + stride,
+                [0] + padding,
+                [1] + dilation,
+                ceil_mode,
+            ),
+            {},
+            meta,
+            updated=True,
+        )


oscarandersson8218 · 2026-05-20T05:54:16Z

                UnsqueezeBeforeRepeatPass(),
                DecomposeCumsumPass(exported_program),
                DecomposeAsStridedCopyPass(),
+                DecomposeMaxPool1dPass(),


Agree with this one. We shouldn't see any max_pool1d:s at this stage, the pass won't match any exir-targets and will emit torch.ops rather than exir_ops.

oscarandersson8218 · 2026-05-20T05:49:53Z



 @common.parametrize("test_data", test_data_suite_all)
-@pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False)


I would have expected U55 and U85 tests to pass now as all the independent operators of the decomposition are supported.

oscarandersson8218 · 2026-05-20T05:54:16Z

                UnsqueezeBeforeRepeatPass(),
                DecomposeCumsumPass(exported_program),
                DecomposeAsStridedCopyPass(),
+                DecomposeMaxPool1dPass(),


Agree with this one. We shouldn't see any max_pool1d:s at this stage, the pass won't match any exir-targets and will emit torch.ops rather than exir_ops.

Copilot AI review requested due to automatic review settings January 30, 2026 00:44

Ninja91 requested a review from digantdesai as a code owner January 30, 2026 00:44

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 30, 2026

Copilot started reviewing on behalf of Ninja91 January 30, 2026 00:45 View session

meta-codesync Bot added fb-exported meta-exported labels Jan 30, 2026

Copilot AI reviewed Jan 30, 2026

View reviewed changes

Ninja91 self-assigned this Jan 30, 2026

Ninja91 requested review from SaoirseARM and zingo January 30, 2026 02:25

zingo added the partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm label Jan 30, 2026

zingo changed the title ~~Add MaxPool1D decomposition pass support~~ Arm backend: Add MaxPool1D decomposition pass support Jan 30, 2026

oscarandersson8218 requested changes Jan 30, 2026

View reviewed changes

zingo added the ciflow/trunk label Feb 4, 2026

Ninja91 force-pushed the export-D91760459 branch from 3a1a61e to 8ad827c Compare February 10, 2026 06:39

Ninja91 force-pushed the export-D91760459 branch from 8ad827c to 46190fb Compare February 10, 2026 06:43

Ninja91 force-pushed the export-D91760459 branch from 46190fb to 3c783db Compare February 10, 2026 21:59

Copilot AI review requested due to automatic review settings February 10, 2026 21:59

Copilot started reviewing on behalf of Ninja91 February 10, 2026 22:00 View session

Copilot AI reviewed Feb 10, 2026

View reviewed changes

Ninja91 force-pushed the export-D91760459 branch from 3c783db to e245c96 Compare February 11, 2026 20:16

Ninja91 force-pushed the export-D91760459 branch from db258e9 to 0a5f90c Compare February 20, 2026 23:58

Ninja91 force-pushed the export-D91760459 branch from 0a5f90c to 821fd14 Compare February 21, 2026 00:02

Ninja91 force-pushed the export-D91760459 branch from 821fd14 to a56c81d Compare February 21, 2026 01:01

Ninja91 force-pushed the export-D91760459 branch from a56c81d to 18cedc6 Compare February 21, 2026 01:04

Copilot AI review requested due to automatic review settings March 3, 2026 20:55

Ninja91 force-pushed the export-D91760459 branch from 18cedc6 to 448ee04 Compare March 3, 2026 20:55

Copilot started reviewing on behalf of Ninja91 March 3, 2026 20:56 View session

Ninja91 force-pushed the export-D91760459 branch from 448ee04 to 8824b12 Compare March 3, 2026 20:59

Ninja91 force-pushed the export-D91760459 branch from 8824b12 to 7ffeafd Compare March 3, 2026 21:02

Copilot AI reviewed Mar 3, 2026

View reviewed changes

github-actions Bot added the Stale PRs inactive for over 60 days label May 3, 2026

meta-codesync Bot changed the title ~~Arm backend: Add MaxPool1D decomposition pass support~~ Add MaxPool1D decomposition pass support (#17022) May 19, 2026

Ninja91 force-pushed the export-D91760459 branch from 7ffeafd to 53843a9 Compare May 19, 2026 21:00

github-actions Bot added the module: arm Issues related to arm backend label May 19, 2026

Copilot AI review requested due to automatic review settings May 20, 2026 01:35

Ninja91 force-pushed the export-D91760459 branch from 53843a9 to ad1f626 Compare May 20, 2026 01:35

Copilot started reviewing on behalf of Ninja91 May 20, 2026 01:36 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

oscarandersson8218 requested changes May 20, 2026

View reviewed changes



		@common.parametrize("test_data", test_data_suite_all)
		@pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False)

Conversation

Ninja91 commented Jan 30, 2026 • edited by meta-codesync Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Implementation Strategy

Decomposition Approach (Optimal for TOSA/Vela)

Why This Approach is Optimal

TOSA/Vela Constraints Validated

Uh oh!

pytorch-bot Bot commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17022

❗ 1 Active SEVs

❌ 2 New Failures, 2 Unrelated Failures

Uh oh!

meta-codesync Bot commented Jan 30, 2026

Uh oh!

github-actions Bot commented Jan 30, 2026

This PR needs a release notes: label

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

oscarandersson8218 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zingo commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

zingo commented Feb 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 3, 2026

Uh oh!

digantdesai commented May 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Ninja91 commented Jan 30, 2026 •

edited by meta-codesync Bot

Loading

pytorch-bot Bot commented Jan 30, 2026 •

edited

Loading

This PR needs a `release notes:` label

zingo commented Feb 10, 2026 •

edited

Loading