[202412 backport] Add GCU apply-patch performance smoke test (sonic-mgmt#24092) #1221

Merged: vaibhavhd merged 1 commit into 202412 from rimunagala/202412-mgmt-gcu-perf-smoke-test on Jun 5, 2026
@rimunagala

[202412 backport] Add GCU apply-patch performance smoke test

Backport of upstream sonic-net/sonic-mgmt#24092 (merged 2026-05-04, commit e8cf611e) to Azure/sonic-mgmt.msft:202412.

Why

Adds a dedicated nightly smoke test that exercises config apply-patch on representative payload sizes and asserts that the runtime stays below platform-specific budgets. This is the regression net for the GCU sort-step performance work landed via sonic-utilities.msft#352 (backports of sonic-utilities#4310/#4476/#4478/#4554). Without this test, a future sort-step regression on 202412 would only surface as customer-visible apply-patch slowness.

What

Single new file: tests/generic_config_updater/test_apply_patch_perf.py (+684 lines, no other files touched). Imports the existing helpers from tests/common/gu_utils.py (generate_tmpfile, delete_tmpfile, create_checkpoint, delete_checkpoint, rollback_or_reload, expect_op_success, format_json_patch_for_multiasic) - all already present on 202412.

Cherry-pick provenance

Validation

  • Pre-cherry-pick import check: all 7 names imported from tests.common.gu_utils are present on Azure/sonic-mgmt.msft:202412.
  • Post-cherry-pick: zero conflict markers, file size matches upstream (25129 bytes), commit shape 1 file changed, 684 insertions(+).

Backport ordering

Soft prerequisite: sibling backport PR #1220 (cherry-pick of sonic-mgmt#23848) lands the effective apply_patch() timeout in the shared GCU helper. Without #1220 a real sort-step regression caught by this smoke test could hang the test runner instead of failing cleanly with a TimeoutError. Recommend merging #1220 first.
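
The difference between hanging the runner and failing cleanly can be illustrated with a minimal sketch. It is not the shared GCU helper from #1220: the function name `run_with_timeout` is hypothetical, and it uses a local `subprocess` call where the real helper would go through the DUT connection. It only shows the general pattern of bounding a command's runtime and surfacing an overrun as a `TimeoutError`.

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Run a command, raising TimeoutError if it exceeds timeout_s seconds.

    Hypothetical illustration only; the real apply_patch() helper runs the
    command on the DUT rather than through a local subprocess.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired on overrun.
        return subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired as exc:
        # Convert to the built-in exception so callers fail with a clear error
        # instead of waiting indefinitely on a stuck command.
        raise TimeoutError(f"command exceeded {timeout_s}s: {argv}") from exc
```

With a bound like this in place, a sort-step regression that makes the command run far past its budget shows up as a test failure rather than a stalled job.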

## Description

Adds a performance smoke test for `config apply-patch` that catches GCU
sort-step regressions across all topologies.

### Test Suite (6 tests)

| Test | Operation | What it exercises |
|------|-----------|-------------------|
| `test_perf_acl_port_removal` | N REMOVE moves | Leaf-list decomposition: the O(N²) hot path |
| `test_perf_acl_table_add` | 1 ADD move | Single table creation baseline |
| `test_perf_acl_rules_add` | 10 ADD moves | Multiple rule additions |
| `test_perf_multi_operation` | 20 mixed moves | Port removal + table remove + rule adds combined |
| `test_perf_ntp_server` | 3 ADD moves | Scalar config changes |
| `test_perf_port_mtu_replace` | 8 REPLACE moves | Per-port property changes (multi-ASIC aware) |

### Budget Formula

```
budget = measured_overhead + num_moves × 2 loads/move × loaddata_time × 5x safety
```

- **Dynamic calibration**: A 1-move NTP patch runs during fixture setup
to measure real per-device overhead (CLI startup, YANG init, ConfigDB
write, SSH). No hardcoded constants.
- **`loaddata_time`**: Measured by timing `SonicYang.loadData()` on the
DUT at test start.
- **5x safety multiplier**: Accommodates variance across platforms
without masking regressions.
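
As a worked illustration of the formula, here is a minimal sketch. It assumes standard operator precedence, so the 5x multiplier scales only the per-move term and not the measured overhead; the test file may apply it differently. The function and parameter names are hypothetical and are not taken from `test_apply_patch_perf.py`.

```python
def apply_patch_budget(measured_overhead_s, num_moves, loaddata_time_s,
                       loads_per_move=2, safety_multiplier=5):
    """Return a time budget in seconds for one apply-patch run.

    Assumption: the safety multiplier applies to the per-move cost only,
    following the formula as written with normal operator precedence.
    """
    per_move_cost_s = loads_per_move * loaddata_time_s
    return measured_overhead_s + num_moves * per_move_cost_s * safety_multiplier


# Example: 3 s calibrated overhead, 10 moves, loadData() measured at 0.5 s.
# 3 + 10 * 2 * 0.5 * 5 = 53 seconds.
budget_s = apply_patch_budget(3.0, 10, 0.5)
```

Because both the overhead and `loaddata_time` are measured on the device, the same expression yields a tight budget on a small platform and a looser one on a large one.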

### Key Design Decisions

- **Topology `any`**: Runs on t0, t1-lag, t1-lag-vpp, multi-asic-t1, and
t2
- **Multi-ASIC support**: PORT MTU test discovers ports from frontend
ASIC namespace and uses `is_asic_specific=True`
- **Scales to available ports**: Uses `min(desired, available)` instead
of skipping (minimum 2 ports)
- **Times only `config apply-patch` CLI**: File copy and Ansible
overhead excluded from measurement
- **Excludes LAG member ports**: Avoids false failures from
PortChannel-bound ports
- **Loganalyzer integration**: Function-scoped fixture catches
unexpected syslog errors per test
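
The port-selection rules above (exclude LAG members, scale to what is available, require at least 2 ports) can be sketched as follows. This is an illustration under assumptions, not the code from the test file: `select_test_ports` and its parameters are hypothetical names, and returning an empty list to signal "too few ports" is a choice made for this sketch.

```python
def select_test_ports(all_ports, lag_members, desired, minimum=2):
    """Pick up to `desired` ports that are not PortChannel members.

    Returns an empty list when fewer than `minimum` ports qualify, so the
    caller can decide to skip. Input order is preserved.
    """
    excluded = set(lag_members)
    candidates = [port for port in all_ports if port not in excluded]
    # Scale down to what the DUT offers instead of skipping outright.
    count = min(desired, len(candidates))
    if count < minimum:
        return []
    return candidates[:count]


ports = select_test_ports(
    ["Ethernet0", "Ethernet4", "Ethernet8", "Ethernet12"],
    lag_members=["Ethernet4"],
    desired=8,
)
# ports == ["Ethernet0", "Ethernet8", "Ethernet12"]
```

Scaling with `min(desired, available)` keeps the test meaningful on small topologies, where a hard port-count requirement would turn it into a permanent skip.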

### Background

GCU `apply-patch` has an O(N²) performance issue in the sort step: each
candidate move calls `SonicYang.loadData()` multiple times, and libyang
v1 leafref resolution scales quadratically with config size. On 512-port
platforms, simple patches took **hours**. This was addressed by:

- **sonic-net/sonic-utilities#4476** (merged): Content-hash cache for
`loadData` + REPLACE validation reorder
- **sonic-net/sonic-utilities#4478** (merged):
`BulkLeafListMoveGenerator` — batches N leaf-list REMOVEs into 1 REPLACE
move

This test provides CI-runnable coverage that catches future sort-step
regressions before they reach customers.
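
To make the content-hash cache idea concrete, here is a minimal sketch of the general technique. It is not the implementation from sonic-utilities#4476: the class name, the choice of SHA-256 over canonical JSON, and the `loader` callable standing in for an expensive `loadData`-style call are all assumptions made for illustration.

```python
import hashlib
import json


class ContentHashCache:
    """Memoize an expensive loader by a hash of the config content.

    Hypothetical illustration; `loader` stands in for a costly call such as
    loading a config into a YANG data tree.
    """

    def __init__(self, loader):
        self._loader = loader
        self._results = {}
        self.misses = 0

    def load(self, config):
        # Canonical JSON (sorted keys) so equal configs hash identically
        # regardless of dict insertion order.
        payload = json.dumps(config, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._loader(config)
        return self._results[key]
```

When the sort step evaluates many candidate moves that lead to the same intermediate config, a cache of this shape pays the load cost once per distinct config rather than once per candidate.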

### CI Results

All checks passing on build 1104058:
- ✅ t0, t1-lag, t1-lag-vpp, multi-asic-t1, t2
- ✅ Static analysis, DCO, EasyCLA, CodeQL, Semgrep

---------

Signed-off-by: vaibhavhd <vaibhav.dixit@microsoft.com>
Co-authored-by: rookie-who <rookie-who@users.noreply.github.com>
Signed-off-by: rimunagala <rimunagala@microsoft.com>
@vaibhavhd vaibhavhd merged commit e020046 into 202412 Jun 5, 2026
3 checks passed