Skip to content

Add MaxPool1D decomposition pass support (#17022)#17022

Open
Ninja91 wants to merge 1 commit into
pytorch:mainfrom
Ninja91:export-D91760459
Open

Add MaxPool1D decomposition pass support (#17022)#17022
Ninja91 wants to merge 1 commit into
pytorch:mainfrom
Ninja91:export-D91760459

Conversation

@Ninja91

@Ninja91 Ninja91 commented Jan 30, 2026

Copy link
Copy Markdown
Contributor

Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy.

Implementation Strategy

Decomposition Approach (Optimal for TOSA/Vela)

The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy
operations:

  1. unsqueeze_copy(dim=2): (N, C, L) → (N, C, 1, L) - add height dimension
  2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p], [d]→[1,d]
  3. squeeze_copy(dims=[2]): (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

Why This Approach is Optimal

  1. unsqueeze_copy and squeeze_copy map to TOSA RESHAPE which is zero-cost in Vela:

    • Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
    • Bypassed entirely when conditions met (NPU-produced, single consumer)
    • Tensor equivalence enables memory aliasing (same address)
  2. TFA Pipeline Placement (before quantization):

    • unsqueeze_copy.default is in _one_to_one_shared_input_qspec
    • squeeze_copy.dims is added to _one_to_one_shared_input_qspec
    • max_pool2d is in _one_to_one_shared_input_or_input_act_qspec
    • All get proper SharedQuantizationSpec from the annotator automatically
  3. Quantization Handling:

    • Clear qparams on intermediate unsqueeze_copy and squeeze_copy ops (let annotator fill them)
    • Preserve original meta on max_pool2d for proper tracing
    • MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

TOSA/Vela Constraints Validated

  • U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
  • U85: Extended stride support via accumulator save/restore
  • Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459

Copilot AI review requested due to automatic review settings January 30, 2026 00:44
@Ninja91 Ninja91 requested a review from digantdesai as a code owner January 30, 2026 00:44
@pytorch-bot

pytorch-bot Bot commented Jan 30, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17022

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 2 New Failures, 2 Unrelated Failures

As of commit ad1f626 with merge base 10c8958 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 30, 2026
@meta-codesync

meta-codesync Bot commented Jan 30, 2026

Copy link
Copy Markdown
Contributor

@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D91760459.

@github-actions

Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements MaxPool1D support for the Arm backend by introducing a decomposition pass that rewrites max_pool1d into view_copy → max_pool2d → view_copy, and adds test coverage for the new behavior.

Changes:

  • Add DecomposeMaxPool1dPass and register it in the Arm pass pipeline (TFA/annotation stage).
  • Extend Arm quantization annotation patterns to treat squeeze_copy.dims as a shared-input-qspec op.
  • Add Arm backend tests covering MaxPool1D across TOSA FP/INT and Ethos-U55/U85 INT pipelines.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
backends/arm/test/targets.bzl Registers the new MaxPool1D test file in the Arm test suite.
backends/arm/test/ops/test_max_pool1d.py Adds pipeline-based tests validating MaxPool1D behavior and delegation.
backends/arm/quantizer/quantization_annotator.py Adds squeeze_copy.dims to shared-qspec ops for quant annotation consistency.
backends/arm/_passes/decompose_max_pool1d_pass.py Introduces the MaxPool1D decomposition pass (view/unsqueeze → max_pool2d → view/squeeze).
backends/arm/_passes/arm_pass_manager.py Wires the new pass into the Arm pass manager (incl. annotation pipeline).
backends/arm/_passes/init.py Exposes the new pass from the passes package.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backends/arm/_passes/decompose_max_pool1d_pass.py Outdated
Comment thread backends/arm/_passes/decompose_max_pool1d_pass.py Outdated
Comment on lines +82 to +86
@common.parametrize("test_data", test_data_suite)
def test_max_pool1d_tosa_FP(test_data: torch.Tensor):
"""Test max_pool1d with TOSA FP pipeline."""
test_data, model_params = test_data()
pipeline = TosaPipelineFP[input_t1](

Copilot AI Jan 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test_data parameter is a callable from test_data_suite/test_data_suite_u55 (you immediately call test_data()), but it's annotated as torch.Tensor. This makes the test signature misleading for static type checking and IDEs; consider annotating it as Callable[[], tuple[torch.Tensor, list[int]]] (or dropping the annotation) to match actual usage.

Copilot uses AI. Check for mistakes.
@Ninja91 Ninja91 self-assigned this Jan 30, 2026
@Ninja91 Ninja91 requested review from SaoirseARM and zingo January 30, 2026 02:25
@zingo zingo added the partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm label Jan 30, 2026
@zingo zingo changed the title Add MaxPool1D decomposition pass support Arm backend: Add MaxPool1D decomposition pass support Jan 30, 2026

@oscarandersson8218 oscarandersson8218 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome patch @Ninja91! Looks good on a high level. Adding a few comments on top of Copilot's.

Comment thread backends/arm/_passes/decompose_max_pool1d_pass.py Outdated
Comment thread backends/arm/_passes/decompose_max_pool1d_pass.py Outdated
Comment thread backends/arm/quantizer/quantization_annotator.py
Comment thread backends/arm/test/ops/test_max_pool1d.py
Comment thread backends/arm/_passes/decompose_max_pool1d_pass.py Outdated
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Feb 10, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Differential Revision: D91760459
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Feb 10, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Differential Revision: D91760459
@zingo

zingo commented Feb 10, 2026

Copy link
Copy Markdown
Collaborator

See the errors below that seems suspicions, test to rerun Arm tests to see if it's just flakey

FAILED backends/arm/test/ops/test_matmul.py::test_matmul_XXXX

Ninja91 added a commit to Ninja91/executorch that referenced this pull request Feb 10, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Differential Revision: D91760459
Copilot AI review requested due to automatic review settings February 10, 2026 21:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backends/arm/test/ops/test_max_pool1d.py Outdated
Comment on lines +97 to +123
@common.parametrize("test_data", test_data_suite)
@pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False)
def test_max_pool1d_u55_INT(test_data: Callable):
"""Test max_pool1d on Ethos-U55 (quantized)."""
test_data, model_params = test_data()
pipeline = EthosU55PipelineINT[input_t1](
MaxPool1d(*model_params),
(test_data,),
aten_op,
exir_ops=[],
)
pipeline.run()


@common.parametrize("test_data", test_data_suite)
@pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False)
def test_max_pool1d_u85_INT(test_data: Callable):
"""Test max_pool1d on Ethos-U85 (quantized)."""
test_data, model_params = test_data()
pipeline = EthosU85PipelineINT[input_t1](
MaxPool1d(*model_params),
(test_data,),
aten_op,
exir_ops=[],
)
pipeline.run()

Copilot AI Feb 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The U55/U85 INT tests use a broad @pytest.mark.xfail; once MaxPool1D is supported, these should follow the repo’s existing pattern and be gated with common.XfailIfNoCorstone300 / common.XfailIfNoCorstone320 instead (so CI environments without FVP don’t hard-fail).

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +61
test_data_suite = {
# (test_name, test_data, [kernel_size, stride, padding])
"simple": lambda: (torch.rand(1, 16, 50), [4, 2, 0]),
"with_padding": lambda: (torch.rand(1, 16, 50), [3, 2, 1]),
"stride_1": lambda: (torch.rand(1, 8, 32), [3, 1, 0]),
"larger_kernel": lambda: (torch.rand(1, 4, 64), [8, 4, 0]),
"multi_batch": lambda: (torch.rand(4, 16, 50), [4, 2, 0]),
}

Copilot AI Feb 10, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment describing test_data_suite’s tuple shape is inaccurate: the values returned are (tensor, [kernel_size, stride, padding]) and do not include a test_name element. Please update the comment to match the actual structure to avoid confusion when adding new cases.

Copilot uses AI. Check for mistakes.
@zingo

zingo commented Feb 11, 2026

Copy link
Copy Markdown
Collaborator

PR seem to introduce this errors :(
FAILED backends/arm/test/ops/test_matmul.py::test_matmul_u55_INT[double_input_randn_rand_1d_2d]
FAILED backends/arm/test/ops/test_matmul.py::test_matmul_u85_INT_a16w8[double_input_randn_rand_1d_2d]
FAILED backends/arm/test/ops/test_matmul.py::test_matmul_u85_INT[double_input_randn_rand_1d_2d]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[simple]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[with_padding]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[stride_1]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[larger_kernel]
FAILED backends/arm/test/ops/test_max_pool1d.py::test_max_pool1d_vgf_quant[multi_batch]

Ninja91 added a commit to Ninja91/executorch that referenced this pull request Feb 11, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Feb 20, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Feb 21, 2026
Summary:
Pull Request resolved: pytorch#17022

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Feb 21, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Feb 21, 2026
Summary:
Pull Request resolved: pytorch#17022

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Mar 3, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
Copilot AI review requested due to automatic review settings March 3, 2026 20:55
@Ninja91 Ninja91 force-pushed the export-D91760459 branch from 18cedc6 to 448ee04 Compare March 3, 2026 20:55
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Mar 3, 2026
Summary:
Pull Request resolved: pytorch#17022

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
@Ninja91 Ninja91 force-pushed the export-D91760459 branch from 448ee04 to 8824b12 Compare March 3, 2026 20:59
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Mar 3, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d to view_copy → max_pool2d → view_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via view_copy operations:
1. view_copy: (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p]
3. view_copy: (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **view_copy maps to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - view_copy is in _one_to_one_shared_input_qspec (line 407)
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec (line 455)
   - Both get proper SharedQuantizationSpec from annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate view_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
@Ninja91 Ninja91 force-pushed the export-D91760459 branch from 8824b12 to 7ffeafd Compare March 3, 2026 21:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +45 to +49
def call_operator(self, op, args, kwargs, meta):
if op != torch.ops.aten.max_pool1d.default or not self.allowed_to_transform(
meta
):
return super().call_operator(op, args, kwargs, meta)

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DecomposeMaxPool1dPass currently only matches torch.ops.aten.max_pool1d.default. In the backend lowering pipeline (transform_to_backend_pipeline), graphs are typically in the EXIR edge dialect, so this pass may never fire for non-quantized compilation unless it also handles the edge-dialect overload (and emits the corresponding edge ops). Consider supporting both aten and edge overloads (similar to other passes that accept both).

Copilot uses AI. Check for mistakes.
Comment on lines +64 to +70
x_4d = super().call_operator(
torch.ops.aten.unsqueeze_copy.default,
(x, 2),
{},
meta,
updated=True,
)

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When inserting the unsqueeze_copy node, the pass reuses the original node metadata unchanged. In the main Arm pipeline this pass can run post Q/DQ folding, so carrying over input_qparams/output_qparams from the original max_pool1d node can incorrectly mark the new view-like node as already-quantized. Other passes that insert view ops clear qparams on these intermediates (e.g., Conv1dUnsqueezePass). Consider copying meta and clearing input/output qparams for the inserted unsqueeze node.

Copilot uses AI. Check for mistakes.
Comment on lines +92 to +98
output = super().call_operator(
torch.ops.aten.squeeze_copy.dims,
(pooled, [2]),
{},
meta,
updated=True,
)

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the inserted unsqueeze_copy: the inserted squeeze_copy node currently reuses the original node metadata unchanged. If this pass runs after quantization/QDQ folding, preserving qparams here can lead to incorrect quant metadata on a view-like op. Consider using a copied meta with cleared input_qparams/output_qparams for this intermediate reshape as well.

Copilot uses AI. Check for mistakes.
Comment on lines +349 to +350
DecomposeMaxPool2dPass(),
DecomposeMaxPool1dPass(),

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the TOSA backend pipeline, DecomposeMaxPool1dPass is currently scheduled after DecomposeMaxPool2dPass. Since this pass can introduce a max_pool2d op (potentially with dilation > 1), running it after the 2D decomposition pass prevents the newly created max_pool2d from being further decomposed/normalized. Consider moving DecomposeMaxPool1dPass before DecomposeMaxPool2dPass, or declaring DecomposeMaxPool2dPass in _passes_required_after so ordering is enforced.

Suggested change
DecomposeMaxPool2dPass(),
DecomposeMaxPool1dPass(),
DecomposeMaxPool1dPass(),
DecomposeMaxPool2dPass(),

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +41
class DecomposeMaxPool1dPass(ArmPass):
"""Decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy
operations.

This pass runs in transform_for_annotation (TFA) pipeline before quantization,
ensuring proper quantization annotation for the decomposed ops.

Transformation:
max_pool1d(x, kernel, stride, padding, dilation, ceil_mode)
→ unsqueeze_copy(x, dim=2) # (N,C,L) → (N,C,1,L)
→ max_pool2d(..., [1,k], [1,s], [0,p], [1,d], ceil_mode)
→ squeeze_copy(..., dims=[2]) # (N,C,1,L') → (N,C,L')
"""

Copilot AI Mar 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing max_pool1d backend tests currently marked xfail (e.g., backends/arm/test/ops/test_max_pool1d.py). Since this PR adds explicit MaxPool1D support, it should also update/enable those tests (and adjust any expected edge op name if the lowering changes from the previous max_pool2d_with_indices pattern).

Copilot uses AI. Check for mistakes.
@github-actions

github-actions Bot commented May 3, 2026

Copy link
Copy Markdown

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot added the Stale PRs inactive for over 60 days label May 3, 2026
@digantdesai

Copy link
Copy Markdown
Contributor

@Ninja91 stale?

@meta-codesync meta-codesync Bot changed the title Arm backend: Add MaxPool1D decomposition pass support Add MaxPool1D decomposition pass support (#17022) May 19, 2026
@Ninja91 Ninja91 force-pushed the export-D91760459 branch from 7ffeafd to 53843a9 Compare May 19, 2026 21:00
Ninja91 added a commit to Ninja91/executorch that referenced this pull request May 19, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy
operations:
1. unsqueeze_copy(dim=2): (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p], [d]→[1,d]
3. squeeze_copy(dims=[2]): (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **unsqueeze_copy and squeeze_copy map to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - unsqueeze_copy.default is in _one_to_one_shared_input_qspec
   - squeeze_copy.dims is added to _one_to_one_shared_input_qspec
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec
   - All get proper SharedQuantizationSpec from the annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate unsqueeze_copy and squeeze_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
@github-actions github-actions Bot added the module: arm Issues related to arm backend label May 19, 2026
Summary:

Implement DecomposeMaxPool1dPass to enable MaxPool1D support on ARM backend
by decomposing max_pool1d into unsqueeze_copy → max_pool2d → squeeze_copy.

## Implementation Strategy

### Decomposition Approach (Optimal for TOSA/Vela)
The pass decomposes max_pool1d into max_pool2d via unsqueeze_copy/squeeze_copy
operations:
1. unsqueeze_copy(dim=2): (N, C, L) → (N, C, 1, L) - add height dimension
2. max_pool2d: with adapted params [k]→[1,k], [s]→[1,s], [p]→[0,p], [d]→[1,d]
3. squeeze_copy(dims=[2]): (N, C, 1, L_out) → (N, C, L_out) - remove height dimension

### Why This Approach is Optimal

1. **unsqueeze_copy and squeeze_copy map to TOSA RESHAPE** which is zero-cost in Vela:
   - Classified as memory_only_ops (Reshape, Squeeze, ExpandDims, Identity)
   - Bypassed entirely when conditions met (NPU-produced, single consumer)
   - Tensor equivalence enables memory aliasing (same address)

2. **TFA Pipeline Placement (before quantization)**:
   - unsqueeze_copy.default is in _one_to_one_shared_input_qspec
   - squeeze_copy.dims is added to _one_to_one_shared_input_qspec
   - max_pool2d is in _one_to_one_shared_input_or_input_act_qspec
   - All get proper SharedQuantizationSpec from the annotator automatically

3. **Quantization Handling**:
   - Clear qparams on intermediate unsqueeze_copy and squeeze_copy ops (let annotator fill them)
   - Preserve original meta on max_pool2d for proper tracing
   - MAX_POOL2D doesn't need zero-point handling (unlike AVG_POOL2D)

### TOSA/Vela Constraints Validated
- U55: Stride ≤3 ✓, Kernel ≤256x256 ✓
- U85: Extended stride support via accumulator save/restore
- Dilation: Handled by separate DecomposeMaxPool2dPass if needed

Reviewed By: 3l1

Differential Revision: D91760459
Copilot AI review requested due to automatic review settings May 20, 2026 01:35
@Ninja91 Ninja91 force-pushed the export-D91760459 branch from 53843a9 to ad1f626 Compare May 20, 2026 01:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment on lines +45 to +91
def call_operator(self, op, args, kwargs, meta):
if op != torch.ops.aten.max_pool1d.default or not self.allowed_to_transform(
meta
):
return super().call_operator(op, args, kwargs, meta)

# Extract and normalize arguments
x = args[0]
kernel_size = _normalize_to_list(args[1])
stride = _normalize_to_list(
args[2] if len(args) > 2 else None,
default=kernel_size, # stride defaults to kernel_size
)
padding = _normalize_to_list(args[3] if len(args) > 3 else 0)
dilation = _normalize_to_list(args[4] if len(args) > 4 else 1)
ceil_mode = args[5] if len(args) > 5 else False

# Step 1: Unsqueeze input from 3D to 4D at dim=2
# (N, C, L) → (N, C, 1, L)
unsqueeze_meta = meta.copy()
unsqueeze_meta.data["input_qparams"] = {}
unsqueeze_meta.data["output_qparams"] = {}
x_4d = super().call_operator(
torch.ops.aten.unsqueeze_copy.default,
(x, 2),
{},
unsqueeze_meta,
updated=True,
)

# Step 2: Call max_pool2d with 2D parameters
# kernel: [k] → [1, k], stride: [s] → [1, s]
# padding: [p] → [0, p], dilation: [d] → [1, d]
pooled = super().call_operator(
torch.ops.aten.max_pool2d.default,
(
x_4d,
[1] + kernel_size,
[1] + stride,
[0] + padding,
[1] + dilation,
ceil_mode,
),
{},
meta,
updated=True,
)
UnsqueezeBeforeRepeatPass(),
DecomposeCumsumPass(exported_program),
DecomposeAsStridedCopyPass(),
DecomposeMaxPool1dPass(),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with this one. We shouldn't see any max_pool1d:s at this stage, the pass won't match any exir-targets and will emit torch.ops rather than exir_ops.



@common.parametrize("test_data", test_data_suite_all)
@pytest.mark.xfail(reason="MaxPool1D not yet supported", strict=False)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected U55 and U85 tests to pass now as all the independent operators of the decomposition are supported.

UnsqueezeBeforeRepeatPass(),
DecomposeCumsumPass(exported_program),
DecomposeAsStridedCopyPass(),
DecomposeMaxPool1dPass(),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with this one. We shouldn't see any max_pool1d:s at this stage, the pass won't match any exir-targets and will emit torch.ops rather than exir_ops.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported module: arm Issues related to arm backend partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm Stale PRs inactive for over 60 days

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants