[code sync] Merge code from sonic-net/sonic-mgmt:202511 to 202603#1239
Merged
lizhijianrd merged 22 commits on Jun 18, 2026
…c duts - bgp_update_timer (#25324) <!-- Please make sure you've read and understood our contributing guidelines; https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md Please provide following information to help code review process a bit easier: --> ### Description of PR <!-- - Please include a summary of the change and which issue is fixed. - Please also include relevant motivation and context. Where should reviewer start? background context? - List any dependencies that are required for this change. --> Summary: Fixes # (issue) - This PR fixes the following "_vtysh_" command execution on multi-asic duts, which throws the following error for '_test_bgp_update_timer_' tests. _`AnsibleModule::shell Result => {"failed": true, "changed": true, "stdout": "", "stderr": "Usage: /usr/bin/vtysh -n [0 to 1] [OPTION]... ", "rc": 1, "cmd": "vtysh -c 'show ip bgp neighbors 20.0.0.1 received-routes' | grep '10.10.100.0/27'",`_ ```shell L0071 DEBUG | /data/tests/common/devices/multi_asic.py::_run_on_asics#144: [ixr-x3b-16] AnsibleModule::shell, args=["vtysh -c 'show ip bgp neighbors 20.0.0.1 received-routes' | grep '10.10.100.0/27'"], kwargs={"module_ignore_errors": true} L0108 DEBUG | /data/tests/common/devices/multi_asic.py::_run_on_asics#144: [ixr-x3b-16] AnsibleModule::shell Result => {"failed": true, "changed": true, "stdout": "", "stderr": "Usage: /usr/bin/vtysh -n [0 to 1] [OPTION]... 
", "rc": 1, "cmd": "vtysh -c 'show ip bgp neighbors 20.0.0.1 received-routes' | grep '10.10.100.0/27'", "start": "2025-09-12 23:30:58.043409", "end": "2025-09-12 23:30:58.055180", "delta": "0:00:00.011771", "msg": "non-zero return code", "invocation": {"module_args": {"_raw_params": "vtysh -c 'show ip bgp neighbors 20.0.0.1 received-routes' | grep '10.10.100.0/27'", "_uses_shell": true, "warn": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}, "stdout_lines": [], "stderr_lines": ["Usage: /usr/bin/vtysh -n [0 to 1] [OPTION]... "], "_ansible_no_log": null} ``` ### Type of change <!-- - Fill x for your type of change. - e.g. - [x] Bug fix --> - [ ] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [x] Test case improvement ### Back port request - [ ] 202205 - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [x] 202505 ### Approach #### What is the motivation for this PR? - On multi-asic duts, '_vtysh_' commands for debugging are not executed and throws the following error, `AnsibleModule::shell Result => {"failed": true, "changed": true, "stdout": "", "stderr": "Usage: /usr/bin/vtysh -n [0 to 1] [OPTION]... ", "rc": 1, "cmd": "vtysh -c 'show ip bgp neighbors 20.0.0.1 received-routes' | grep '10.10.100.0/27'", ` #### How did you do it? - Fetch asichost and call '_run_vtysh_' commands to get the received-routes detail for both single-asic and multi-asic duts. #### How did you verify/test it? - Ran 'test_bgp_update_timer' tests on T2 duts and made sure tests are passing. #### Any platform specific information? #### Supported testbed topology if it's a new test case? ### Documentation <!-- (If it's a new feature, new test case) Did you update documentation/Wiki relevant to your implementation? Link to the wiki page? 
--> <img width="1789" height="230" alt="image" src="https://github.com/user-attachments/assets/398cf047-c7f4-4b6c-a414-99569cb5c100" /> Signed-off-by: sanrajen <sanjai.rajendran@nokia.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Sanjai Rajendran <114024719+sanjair-git@users.noreply.github.com>
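The usage error quoted in the bgp_update_timer fix above (`Usage: /usr/bin/vtysh -n [0 to 1] [OPTION]...`) shows that a multi-asic DUT expects an ASIC index via `-n`. A minimal sketch of that difference, assuming a hypothetical helper `build_vtysh_cmd`; it is not the `run_vtysh` implementation the PR calls, only an illustration of the command shape:

```python
from typing import Optional


def build_vtysh_cmd(show_cmd: str, asic_index: Optional[int] = None) -> str:
    """Hypothetical helper: build a vtysh invocation for one ASIC.

    asic_index=None models a single-asic DUT, where plain `vtysh -c` works.
    On a multi-asic DUT the ASIC is selected with `-n <index>`, which is
    what the usage message in the failing log asks for.
    """
    namespace = "" if asic_index is None else f"-n {asic_index} "
    return f'vtysh {namespace}-c "{show_cmd}"'


cmd = "show ip bgp neighbors 20.0.0.1 received-routes"
print(build_vtysh_cmd(cmd))     # single-asic form
print(build_vtysh_cmd(cmd, 0))  # multi-asic form, ASIC 0
```

Routing the call through the per-ASIC host object, as the PR does, keeps this branching out of the test body.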
### Description of PR Summary: Fixes bugs sonic-net/sonic-mgmt#20840 (comment) and sonic-net/sonic-mgmt#21071 (HashTest). ### Type of change - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202205 - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [x] 202505 ### Approach #### What is the motivation for this PR? Fix bugs sonic-net/sonic-mgmt#20840 (comment) and sonic-net/sonic-mgmt#21071 by overriding send_and_verify_packets in class IPinIPHashTest so that verification of packet arrival succeeds. #### How did you do it? Fixed the usage of HashTest.check_within_expected_range. Fixed the IPinIP hash test by overriding send_and_verify_packets in class IPinIPHashTest and using no timeout for the packet arrival verification. #### How did you verify/test it? Ran fib_test::test_hash locally. Conducted a full overnight regression test. #### Any platform specific information? No #### Supported testbed topology if it's a new test case? N/A ### Documentation Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: gshemesh2 <gshemesh@nvidia.com>
…_t2_single_node_min.yml" (#25188) Reverts sonic-net/sonic-mgmt#25057 This PR is being reverted temporarily because the infrastructure changes required for the BGP Confederation topology have not yet been merged into the 202511 branch. The following prerequisite PRs must be included in the 202511 branch to ensure that deploy-mg functions correctly with the confederation topology. These PRs also contain several test fixes that are required once the confederation changes are integrated into the branch: sonic-net/sonic-mgmt#22417 sonic-net/sonic-mgmt#24416 sonic-net/sonic-mgmt#24415 sonic-net/sonic-mgmt#24157 sonic-net/sonic-mgmt#23725 sonic-net/sonic-mgmt#25190 Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>
Both testWarmreboot and testFastreboot in sflow/test_sflow.py::TestReboot are affected by a post-reboot race condition in which the test reads interface ifindex values from sysfs before the kernel has re-created the network interfaces. Two instances have been observed across different DUTs, hwskus, reboot types, and interfaces. ### Description of PR Summary: Fixes #25185 ### Type of change - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 - [ ] 202512 - [x] 202605 ### Approach #### What is the motivation for this PR? Make sflow reboot testing more robust and able to handle DUT timing variances. #### How did you do it? Introduced a new utility that verifies that kernel interfaces exist, for use in a wait_until. #### How did you verify/test it? Verified that the test passes on the previously failing topology. #### Any platform specific information? N/A #### Supported testbed topology if it's a new test case? N/A ### Documentation N/A Signed-off-by: Will Rideout <wrideout@arista.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: wrideout-arista <wrideout@arista.com>
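The polling approach described in the sflow fix above can be sketched as follows. Both `kernel_interfaces_exist` and `wait_for` are hypothetical stand-ins (for the new utility and for sonic-mgmt's `wait_until`, respectively), and this version reads a local sysfs tree rather than going through a DUT host object:

```python
import os
import time
from typing import Callable, Iterable


def kernel_interfaces_exist(ifnames: Iterable[str],
                            sysfs_root: str = "/sys/class/net") -> bool:
    """Return True only when every interface has an ifindex file in sysfs."""
    return all(
        os.path.isfile(os.path.join(sysfs_root, name, "ifindex"))
        for name in ifnames
    )


def wait_for(timeout: float, interval: float,
             condition: Callable[[], bool]) -> bool:
    """Poll condition() until it is true or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would then gate the ifindex read on something like `wait_for(60, 2, lambda: kernel_interfaces_exist(["Ethernet0", "Ethernet4"]))`; the interface names, timeout, and interval here are illustrative.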
… (#25340) <!-- Please make sure you've read and understood our contributing guidelines; https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md Please provide following information to help code review process a bit easier: --> ### Description of PR <!-- - Please include a summary of the change and which issue is fixed. - Please also include relevant motivation and context. Where should reviewer start? background context? - List any dependencies that are required for this change. --> This PR tells loganalyzer to ignore syslog errors matching `.*ERR bgp#fpmsyncd:.*onRouteMsg: Invalid VRF name.*` that are sometimes seen during VxLAN tests. Summary: Microsoft ADO ID: 36841610 Ignoring `Invalid VRF name` errors in VxLAN tests ### Type of change <!-- - Fill x for your type of change. - e.g. - [x] Bug fix --> - [ ] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [x] Test case improvement ### Back port request - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 - [ ] 202512 - [x] 202605 ### Approach #### What is the motivation for this PR? There is a known issue that after deleting the VNET during the teardown of `fixture_setUp` fixture in VxLAN tests, a race condition might happen that will cause a log similar to the following to be added to syslog: ``` ERR bgp#fpmsyncd: :- onRouteMsg: Invalid VRF name (ifindex 1158) ``` Please see this issue for more details: sonic-net/sonic-buildimage#12259 The SWSS fix is available, but it is not merged yet: sonic-net/sonic-swss#4499 This error does not indicate any functional issues and is safe to ignore. However, the loganalyzer catches and reports this error (even though the test itself passed). Therefore, this PR tells loganalyzer to ignore this error pattern. #### How did you do it? For all VxLAN tests, added an error pattern to the loganalyzer's `ignore_regex` list. #### How did you verify/test it? 
I verified that after this change, the following test passed without any loganalyzer errors: ``` vxlan/test_vnet_bgp_route_precedence.py::Test_VNET_BGP_route_Precedence::test_vnet_route_bgp_removal_before_ep[v6_in_v4-BFD-initially_down] ``` **Note:** I also used the following command on the DUT during the test to add the error log to syslog: ``` logger -p ERR -t "bgp#fpmsyncd" ":- onRouteMsg: Invalid VRF name (ifindex 1158)" ``` #### Any platform specific information? N/A #### Supported testbed topology if it's a new test case? N/A ### Documentation <!-- (If it's a new feature, new test case) Did you update documentation/Wiki relevant to your implementation? Link to the wiki page? --> N/A Signed-off-by: Mahdi Ramezani <mramezani@microsoft.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: mramezani95 <mramezani@microsoft.com>
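As a quick check of the ignore pattern from the VxLAN loganalyzer change above, the sketch below applies it to a line shaped like the one the `logger` command produces. The timestamp and hostname prefix are made up, and using `re.match` is an assumption about how the pattern is evaluated, not a statement about loganalyzer internals:

```python
import re

# Pattern quoted in the PR description.
IGNORE_PATTERN = r".*ERR bgp#fpmsyncd:.*onRouteMsg: Invalid VRF name.*"

# Sample syslog lines; the timestamp/hostname prefix is illustrative only.
sample = ("2026 Jan  1 00:00:00.000000 dut ERR bgp#fpmsyncd: "
          ":- onRouteMsg: Invalid VRF name (ifindex 1158)")
unrelated = ("2026 Jan  1 00:00:00.000000 dut ERR bgp#fpmsyncd: "
             ":- onRouteMsg: some other failure")

print(bool(re.match(IGNORE_PATTERN, sample)))     # True
print(bool(re.match(IGNORE_PATTERN, unrelated)))  # False
```

The pattern is anchored on both the process tag and the message text, so other `fpmsyncd` errors are still reported.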
…(#25325) - Add frontend_asic_index_with_portchannel fixture to select ASIC with portchannels - Update conftest to dynamically determine DUT hostname fixture - Filter out backend/internal portchannels in test_portchannel_interface - Use enum_rand_one_per_hwsku_frontend_hostname for consistent frontend selection <!-- Please make sure you've read and understood our contributing guidelines; https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md Please provide following information to help code review process a bit easier: --> ### Description of PR <!-- - Please include a summary of the change and which issue is fixed. - Please also include relevant motivation and context. Where should reviewer start? background context? - List any dependencies that are required for this change. --> Summary: Fixes # ([issue](sonic-net/sonic-mgmt#21639)) ### Type of change <!-- - Fill x for your type of change. - e.g. - [x] Bug fix --> - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202205 - [ ] 202305 - [ ] 202311 - [x] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 ### Approach #### What is the motivation for this PR? sonic-net/sonic-mgmt#21639 #### How did you do it? Enable portchannel tests to run correctly on chassis linecards and multi-ASIC devices by targeting only ASICs with portchannel configuration. 
Changes: New Fixture: frontend_asic_index_with_portchannel (duthost_utils.py) Selects a frontend ASIC that has external portchannels configured Returns ASIC index for multi-ASIC/chassis devices, None for single-ASIC Ensures tests only run on ASICs with actual portchannel configuration Updated PortChannel Tests (test_portchannel_interface.py) Modified fixtures (cfg_facts, rand_portchannel_name, portchannel_table) to use the ASIC selector Filters out backend/internal portchannels, targets only external ones All test functions now operate on the correct ASIC namespace Log Analyzer Updates (conftest.py) Added expected error patterns for portchannel operations Impact: Fixes portchannel tests on chassis linecards and multi-ASIC platforms by ensuring tests run only on ASICs with portchannel configuration, preventing failures on ASICs without portchannels. #### How did you verify/test it? Sonic-mgmt test on T2 #### Any platform specific information? #### Supported testbed topology if it's a new test case? ### Documentation <!-- (If it's a new feature, new test case) Did you update documentation/Wiki relevant to your implementation? Link to the wiki page? 
--> ``` ---------------------------- live log sessionstart ----------------------------- 10/12/2025 01:38:32 conftest.pytest_sessionstart L0528 ERROR | /data/tests/build-gnmi-stubs.sh failed with exit code 128 /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated "class": algorithms.Blowfish, /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated "class": algorithms.Blowfish, /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated "class": algorithms.Blowfish, /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated "class": algorithms.Blowfish, /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated "class": algorithms.Blowfish, ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.4.0, pluggy-1.5.0 ansible: 2.13.13 rootdir: /data/tests configfile: pytest.ini plugins: stress-1.0.1, metadata-3.1.1, forked-1.6.0, html-4.1.1, repeat-0.9.3, xdist-1.28.0, allure-pytest-2.8.22, ansible-4.0.0 collected 4 items generic_config_updater/test_portchannel_interface.py::test_portchannel_interface_tc1_suite[sfd-lt2-lc0] PASSED [ 25%] generic_config_updater/test_portchannel_interface.py::test_portchannel_interface_tc2_attributes[sfd-lt2-lc0] PASSED [ 50%] generic_config_updater/test_portchannel_interface.py::test_portchannel_interface_tc1_suite[sfd-lt2-lc2] PASSED [ 75%] generic_config_updater/test_portchannel_interface.py::test_portchannel_interface_tc2_attributes[sfd-lt2-lc2] PASSED [100%]DEBUG:tests.conftest:[log_custom_msg] item: <Function test_portchannel_interface_tc2_attributes[sfd-lt2-lc2]> INFO:root:Can not get Allure report URL. 
Please check logs =============================== warnings summary =============================== ../../usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236 /usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated "class": algorithms.Blowfish, -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ==================================== PASSES ==================================== ______________ test_portchannel_interface_tc1_suite[sfd-lt2-lc0] _______________ ____________ test_portchannel_interface_tc2_attributes[sfd-lt2-lc0] ____________ ______________ test_portchannel_interface_tc1_suite[sfd-lt2-lc2] _______________ ____________ test_portchannel_interface_tc2_attributes[sfd-lt2-lc2] ____________ - generated xml file: /run_logs/gcu_test_1/generic_config_updater/test_portchannel_interface_2025-12-10-01-36-20.xml - =========================== short test summary info ============================ PASSED generic_config_updater/test_portchannel_interface.py::test_portchannel_interface_tc1_suite[sfd-lt2-lc0] PASSED generic_config_updater/test_portchannel_interface.py::test_portchannel_interface_tc2_attributes[sfd-lt2-lc0] PASSED generic_config_updater/test_portchannel_interface.py::test_portchannel_interface_tc1_suite[sfd-lt2-lc2] PASSED generic_config_updater/test_portchannel_interface.py::test_portchannel_interface_tc2_attributes[sfd-lt2-lc2] ================== 4 passed, 1 warning in 1333.56s (0:22:13) =================== ``` Signed-off-by: Anand Mehra (anamehra) <anamehra@cisco.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: anamehra <54692434+anamehra@users.noreply.github.com>
…ed to run from host (#25321) ### Description of PR Corrects the updateFeatureState function call in the stopServices teardown path so that it runs from the host (PR #18884). Summary: Fixes # (issue) Failure in the multi-dut qos test with error: TypeError: SonicAsic.command() got an unexpected keyword argument 'module_ignore_errors' ### Type of change - [ ] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [x] Test case improvement ### Back port request - [ ] 202205 - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 ### Approach #### What is the motivation for this PR? Failure in the multi-dut qos test with error: TypeError: SonicAsic.command() got an unexpected keyword argument 'module_ignore_errors' #### How did you do it? The updateFeatureState function takes the Sonic_host as its argument, so the call was corrected to pass it. #### How did you verify/test it? Executed the qos tests and verified the results. #### Any platform specific information? #### Supported testbed topology if it's a new test case? ### Documentation Signed-off-by: ansrajpu <anshu.rajput@nokia.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Anshu <113939367+ansrajpu-git@users.noreply.github.com>
…re-enabling legacy KEX and host key algorithms (#25374) ### Description of PR After upgrading `docker-sonic-mgmt` to a version that ships paramiko 5.x, console SSH connections to older console servers fail during KEX negotiation with `Incompatible ssh peer (no acceptable kex algorithm)`. **Root cause:** paramiko 5.x removed the following legacy algorithms from its default preferred lists for security hardening: - KEX: `diffie-hellman-group14-sha1`, `diffie-hellman-group-exchange-sha1` - Host keys: `ssh-rsa` Many console servers in testbed environments only support these legacy algorithms and cannot be upgraded. This PR re-enables the legacy algorithms specifically for console connections by reconstructing the KEX classes from their existing SHA256 counterparts and re-registering them in paramiko's Transport handler maps. All changes are scoped to `BaseConsoleConn` and wrapped in try/except for forward compatibility. Summary: Fix console SSH connections broken by paramiko 5.x removal of legacy KEX and host key algorithms. ### Type of change - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 - [ ] 202512 - [ ] 202605 ### Approach #### What is the motivation for this PR? Console connections to older SSH servers (e.g. Cisco SSH-2.0-Cisco-1.25) fail after the `docker-sonic-mgmt` container upgraded to paramiko 5.x. These console servers only support `diffie-hellman-group14-sha1` for KEX and `ssh-rsa` for host keys, which paramiko 5.x no longer offers by default. This breaks any test that relies on console access, including reboot tests that collect console logs during DUT reboot. #### How did you do it? In `BaseConsoleConn.__init__()` (before the connection is established): 1. 
Reconstruct `diffie-hellman-group14-sha1` and `diffie-hellman-group-exchange-sha1` KEX classes by subclassing the existing SHA256 variants and overriding the hash algorithm 2. Register these classes in `paramiko.Transport._preferred_kex` and `paramiko.Transport._kex_info` 3. Re-add `ssh-rsa` to `paramiko.Transport._preferred_keys`, `_key_info`, and `RSAKey.HASHES` 4. Wrap everything in try/except so it gracefully skips if paramiko internals change in future versions #### How did you verify/test it? - Verified console SSH connections succeed to Cisco console servers (SSH-2.0-Cisco-1.25) that only support legacy algorithms - Confirmed no impact on connections to console servers that support modern algorithms (SHA256 KEX remains preferred and is tried first) - Tested with both paramiko 4.x (no-op, algorithms already present) and paramiko 5.x (algorithms re-added) #### Any platform specific information? N/A: this is a test infrastructure fix affecting console connections. No DUT platform dependency. Only affects testbeds with older console servers that require legacy SSH algorithms. #### Supported testbed topology if it's a new test case? N/A: bug fix, not a new test case. ### Documentation N/A Signed-off-by: Yatish Koul <yatishkoul@microsoft.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Yatish <yatishkoul@microsoft.com>
<!-- Please make sure you've read and understood our contributing guidelines; https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md Please provide following information to help code review process a bit easier: --> ### Description of PR <!-- - Please include a summary of the change and which issue is fixed. - Please also include relevant motivation and context. Where should reviewer start? background context? - List any dependencies that are required for this change. --> Update qos params for topo-ft2-64 ### Back port request - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 - [ ] 202512 - [ ] 202605 ### Approach #### What is the motivation for this PR? Update qos params for this topo after sai 14 upgrade and buffer changes #### How did you do it? Script generated #### How did you verify/test it? Verified by manual qos testing Signed-off-by: Ryan Garofano <rgarofano@arista.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Ryan Garofano <rgarofano@arista.com>
<!-- Please make sure you've read and understood our contributing guidelines; https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md Please provide following information to help code review process a bit easier: --> ### Description of PR <!-- - Please include a summary of the change and which issue is fixed. - Please also include relevant motivation and context. Where should reviewer start? background context? - List any dependencies that are required for this change. --> Update qos params for topo-lt2-p32o64 ### Back port request - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 - [ ] 202512 - [ ] 202605 ### Approach #### What is the motivation for this PR? Update qos params for this topo after sai 14 upgrade and buffer changes #### How did you do it? Script generated #### How did you verify/test it? Verified by manual qos testing Signed-off-by: Ryan Garofano <rgarofano@arista.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Ryan Garofano <rgarofano@arista.com>
…DASH table write order (#25376) ### Description of PR Summary: Fixes # (issue) ### Type of change - [ ] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [ ] 202511 - [ ] 202512 - [ ] 202605 ### Approach #### What is the motivation for this PR? A recent orchagent change removed the retry mechanism for out-of-order DASH configurations (sonic-net/sonic-swss#4566). Existing sonic-mgmt DASH tests have been programming DASH configs in the wrong order and relying on that retry mechanism to program each config once all of its prerequisite DASH configs exist. Since orchagent no longer retries, sonic-mgmt tests must now send configs in the correct order. #### How did you do it? Add a new utility in tests/common/dash_utils.py that sorts all DASH configs for a given test module and programs them in the correct order: - `DashPhase` enum (GROUP_1..GROUP_6) names the phases in dependency order. - `DASH_TABLE_PHASE` registry maps each known DASH table name to its phase. New DASH tables only need a single entry here. 
- `apply_dash_configs(localhost, duthost, ptfhost, dpu_index, *dicts, set_db=True, ...)` accepts any number of config dicts keyed by `DASH_<TABLE>_TABLE:<key>`, groups them by phase, and applies them in ascending phase order on setup or descending order on teardown (`set_db=False`). The underlying `apply_messages` is lazy-imported so the helper works in both tests/dash/ and tests/ha/ contexts. The following test fixtures are migrated: - tests/dash/test_fnic.py - tests/dash/test_dash_privatelink.py - tests/dash/test_dash_metering.py - tests/dash/test_plnsg.py - tests/dash/test_config_churn.py (setup fixture only; churn test bodies retain raw apply_messages for sairedis-tracking precision) Each fixture's previous hand-rolled bucketing collapses to a single `apply_dash_configs(...)` call. Platform-conditional bundles (Pensando / Bluefield) are spread via `*list` patterns. A small fix to test_config_churn churn-body cleanup ordering and an ENI pl_sip_encoding update for the new VNI are included; these were necessary follow-ups when re-running the churn tests with the new fixture batching. Unit tests: tests/common/unit_tests/fixtures/unit_test_dash_utils.py covers key parsing, bucketing, ordering, same-phase merging, conflict warnings, unknown-table fallback, reverse-order delete, lazy-import default apply_fn, and an end-to-end positive assertion of the expected batches for a representative fixture. Run with: python3 -m pytest --noconftest \ tests/common/unit_tests/fixtures/unit_test_dash_utils.py -v #### How did you verify/test it? Run the migrated tests on a smartswitch testbed and verify that they pass. #### Any platform specific information? #### Supported testbed topology if it's a new test case? ### Documentation <!-- (If it's a new feature, new test case) Did you update documentation/Wiki relevant to your implementation? Link to the wiki page? 
--> Signed-off-by: Lawrence Lee <lawlee@microsoft.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Lawrence Lee <lawlee@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
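The phase bucketing described in the DASH write-order change above might look roughly like the sketch below. `DashPhase` and `DASH_TABLE_PHASE` are names from the PR description, but everything else is an assumption: only three phases are modelled instead of six, the table-to-phase assignments are illustrative, `order_dash_configs` is a hypothetical function that only returns batches (it does not call `apply_messages`), and the unknown-table fallback mentioned in the PR is not implemented:

```python
from collections import defaultdict
from enum import IntEnum
from typing import Dict, List


class DashPhase(IntEnum):
    # The PR describes GROUP_1..GROUP_6; three are enough for a sketch.
    GROUP_1 = 1
    GROUP_2 = 2
    GROUP_3 = 3


# Illustrative mapping only; the real registry and its assignments may differ.
DASH_TABLE_PHASE = {
    "DASH_APPLIANCE_TABLE": DashPhase.GROUP_1,
    "DASH_VNET_TABLE": DashPhase.GROUP_2,
    "DASH_ENI_TABLE": DashPhase.GROUP_3,
}


def order_dash_configs(*dicts: Dict[str, dict],
                       set_db: bool = True) -> List[Dict[str, dict]]:
    """Merge config dicts keyed by `DASH_<TABLE>_TABLE:<key>` into phase batches."""
    buckets = defaultdict(dict)
    for config in dicts:
        for key, value in config.items():
            table = key.split(":", 1)[0]  # table name precedes the first colon
            buckets[DASH_TABLE_PHASE[table]][key] = value
    # Ascending phase order on setup, descending on teardown (set_db=False).
    return [buckets[phase] for phase in sorted(buckets, reverse=not set_db)]
```

Calling `order_dash_configs(eni_cfg, vnet_cfg, appliance_cfg)` would yield the appliance batch first and the ENI batch last regardless of argument order, and `set_db=False` reverses that for teardown.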
…ry of interface Port… (#25397) Fix PortChannel999 IP removal failure due to T1 golden config static route <!-- Please make sure you've read and understood our contributing guidelines; https://github.com/sonic-net/SONiC/blob/gh-pages/CONTRIBUTING.md Please provide following information to help code review process a bit easier: --> ### Description of PR <!-- - Please include a summary of the change and which issue is fixed. - Please also include relevant motivation and context. Where should reviewer start? background context? - List any dependencies that are required for this change. --> Issuing config interface shutdown PortChannel999 drops the LAG link operationally. This causes: The BGP session over PortChannel999 tears down FRR withdraws PortChannel999 from the recursive nexthop resolution of 10.2.0.1/32 show ip route vrf all static no longer lists PortChannel999 The sonic-utilities pre-flight check passes cleanly config interface ip remove PortChannel999 10.0.0.0/31 succeeds A time.sleep(5) after admin-down is used to allow FRR time to complete BGP session teardown and routing table convergence before the IP removal is attempted. Summary: Fixes # 40146 ### Type of change <!-- - Fill x for your type of change. - e.g. - [x] Bug fix --> - [x] Bug fix - [ ] Testbed and Framework(new/improvement) - [ ] New Test case - [ ] Skipped for non-supported platforms - [ ] Test case improvement ### Back port request - [ ] 202205 - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 - [x] 202511 ### Approach #### What is the motivation for this PR? When running test_po_update and test_po_update_io_no_loss on T1 hardware topologies, the test teardown consistently fails with: Error: Cannot remove the last IP entry of interface PortChannel999. A static ip route is still bound to the RIF. This causes PortChannel999 to be left behind on the DUT after the test, which then causes the next run to fail with a cascading error: Error: PortChannel999 already exists! 
The T1 testbed golden config (ansible/golden_config_db/smartswitch_t1.json) installs a persistent static route:
"STATIC_ROUTE": { "default|10.2.0.1/32": { "nexthop": "18.0.202.1", "ifname": "" } }
Because ifname is empty, FRR resolves the nexthop 18.0.202.1 recursively via whichever PortChannel currently holds the 10.0.0.0/31 subnet. During the test, PortChannel999 is assigned that subnet. While it remains operationally up, FRR lists PortChannel999 as a resolved path in show ip route vrf all static:
S> 10.2.0.1/32 [1/0] via 18.0.202.1 (recursive)
  * via 10.0.0.1, PortChannel999
  * via 10.0.0.5, PortChannel105
  ...
sonic-utilities config/main.py checks that output before deleting the last IP on any interface. Since PortChannel999 appears as a resolved nexthop, the pre-flight check raises the error and the config interface ip remove command is rejected.
admin@MtFuji-dut:~$ sonic-db-cli CONFIG_DB keys "STATIC_ROUTE*"
STATIC_ROUTE|default|10.2.0.1/32
admin@MtFuji-dut:~$ sonic-db-cli CONFIG_DB hgetall "STATIC_ROUTE|default|10.2.0.1/32"
{'blackhole': 'false', 'distance': '0', 'ifname': '', 'nexthop': '18.0.202.1', 'nexthop-vrf': 'default'}
admin@MtFuji-dut:~$ show ip route vrf all static
Codes: K - kernel route, C - connected, L - local, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR, f - OpenFabric, t - Table-Direct, > - selected route, * - FIB route, q - queued, r - rejected, b - backup t - trapped, o - offload failure
IPv4 unicast VRF default:
S> 10.2.0.1/32 [1/0] via 18.0.202.1 (recursive), weight 1, 05:43:36
  * via 10.0.0.1, PortChannel999, weight 1, 05:43:36
  * via 10.0.0.5, PortChannel105, weight 1, 05:43:36
  * via 10.0.0.9, PortChannel108, weight 1, 05:43:36
  * via 10.0.0.13, PortChannel111, weight 1, 05:43:36
admin@MtFuji-dut:~$ show ip route 10.2.0.1/32
Routing entry for 10.2.0.1/32 Known via "static", distance 1, metric 0, best Last update 05:44:26 ago 18.0.202.1 (recursive) 10.0.0.1, via PortChannel999 10.0.0.5, via PortChannel105 10.0.0.9, via PortChannel108 10.0.0.13, via PortChannel111
admin@MtFuji-dut:~$ show ip route 18.0.202.1
Routing entry for 0.0.0.0/0 Known via "bgp", distance 20, metric 0, best Last update 05:46:26 ago 10.0.0.1, via PortChannel999 10.0.0.5, via PortChannel105 10.0.0.9, via PortChannel108 10.0.0.13, via PortChannel111
#### How did you do it? A shutdown_interface (admin-down) step is inserted at the very top of each finally block, immediately before the existing IP removal. This preserves the original cleanup sequence (IP → members → delete PortChannel) unchanged.
duthost.shutdown_interface(PortChannel999) ← NEW: admin-down the LAG
time.sleep(5) ← NEW: wait for FRR BGP convergence
config interface ip remove PortChannel999 ← original step (unchanged)
time.sleep(5)
config portchannel member del ← original step (unchanged)
_wait_until_pc_members_removed
config portchannel del PortChannel999 ← original step (unchanged)
#### How did you verify/test it? 
Run the test script test_po_update.py in the T1 smartswitch having golden config (Cisco-8102-28FH-DPU-O) #### Any platform specific information? T1 Smartswitch - Cisco-8102-28FH-DPU-O #### Supported testbed topology if it's a new test case? N/A ### Documentation <!-- (If it's a new feature, new test case) Did you update documentation/Wiki relevant to your implementation? Link to the wiki page? --> Signed-off-by: Karthik H <klkh1993@gmail.com> Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Karthik H <klkh1993@gmail.com>
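The teardown order described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: `teardown_portchannel` and its `run` callable are hypothetical stand-ins for whatever executes a CLI command on the DUT, and the wait for member removal (`_wait_until_pc_members_removed` in the original sequence) is left out for brevity.

```python
import time
from typing import Callable, Iterable


def teardown_portchannel(run: Callable[[str], None],
                         pc_name: str,
                         pc_ip: str,
                         members: Iterable[str],
                         settle_seconds: float = 5.0) -> None:
    """Remove a test PortChannel, admin-down first so FRR drops it as a nexthop.

    `run` is a hypothetical callable that executes one CLI command on the DUT.
    """
    # New step: admin-down the LAG so the BGP session over it tears down.
    run(f"config interface shutdown {pc_name}")
    # Give FRR time to withdraw the interface from recursive nexthop resolution.
    time.sleep(settle_seconds)
    # Original cleanup order, unchanged: IP -> members -> PortChannel.
    run(f"config interface ip remove {pc_name} {pc_ip}")
    time.sleep(settle_seconds)
    for member in members:
        run(f"config portchannel member del {pc_name} {member}")
    run(f"config portchannel del {pc_name}")
```

Passing `list.append` as `run` records the commands instead of executing them, which makes the ordering easy to inspect without a testbed.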
… family instead of erroring (#25398)
### Description of PR
Summary:
`route/test_duplicate_route.py::test_duplicate_routes` raises
`IndexError: Cannot choose from an empty sequence` on testbeds where the
Vlan (or Loopback) interface is configured with only a single IP address
family.
The `setup_routes` fixture is parametrized over `ip_versions = [4, 6]`
and `interface_types = ['Loopback', 'Vlan']`, but it only verified that
the interface had at least one IP across **both** families combined:
```python
pytest_require((len(intf_ips['ipv4']) + len(intf_ips['ipv6'])) > 0, "No IP configured on any Vlan")
...
else:
    prefixes.append(str(random.choice(intf_ips['ipv6'])).split("/")[0])
```
It then unconditionally called `random.choice()` on the family selected
by `ip_versions`. On testbeds that configure only one family on the
interface — e.g. a SmartSwitch where `Vlan55` is IPv4-only
(`20.0.200.254/24`) — the `ip_versions=6` + `interface_types='Vlan'`
combination calls `random.choice([])` and errors out.
Fixes # (no GitHub issue filed yet)
### Type of change
- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [ ] Test case improvement
### Back port request
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
- [ ] 202512
- [ ] 202605
### Approach
#### What is the motivation for this PR?
Make the test robust on any testbed where an interface only has a single
address family, instead of crashing with an unhandled `IndexError`. This
was observed on SmartSwitch testbeds, where `Vlan55` is provisioned
IPv4-only — see
[`smartswitch_vlan_config`](https://github.com/sonic-net/sonic-mgmt/blob/995861eb21720d63da5a4916e9fbeb51fa1f7e12/ansible/module_utils/smartswitch_utils.py#L35-L50).
The fix itself is generic.
#### How did you do it?
Check the **specific** address family under test (`ipv4` or `ipv6`)
before selecting a prefix. When the requested family is not configured
on the interface, the case is skipped via `pytest_require` instead of
erroring. This also removes the duplicated `if ip_versions == 4 / else`
selection logic.
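The per-family check described above might look roughly like this. It is a sketch under assumptions: `pick_prefix` is a hypothetical helper, not the fixture code from the PR, and it returns `None` where the real fixture skips the case via `pytest_require`.

```python
import random
from typing import Dict, List, Optional


def pick_prefix(intf_ips: Dict[str, List[str]], ip_version: int) -> Optional[str]:
    """Return one address (without mask) for the requested family, or None."""
    family = "ipv4" if ip_version == 4 else "ipv6"
    candidates = intf_ips.get(family, [])
    if not candidates:
        # The requested family is not configured on this interface; the
        # caller should skip the parametrized case instead of erroring.
        return None
    # Safe now: random.choice() is never called on an empty sequence.
    return str(random.choice(candidates)).split("/")[0]
```

With an IPv4-only interface such as `{"ipv4": ["20.0.200.254/24"], "ipv6": []}`, the IPv6 request yields `None` rather than raising `IndexError`.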
#### How did you verify/test it?
- Verified the failing combination (`ip_versions=6`,
`interface_types='Vlan'`) on a SmartSwitch testbed (Vlan55 IPv4-only)
now skips cleanly instead of raising `IndexError`.
- Verified the supported combinations (v4-Vlan, v4/v6-Loopback) still
run as before.
#### Any platform specific information?
Observed on SmartSwitch (`t1-28-lag`, light mode) where `Vlan55` is
configured IPv4-only. The change is platform-agnostic.
#### Supported testbed topology if it's a new test case?
N/A — existing test case.
Signed-off-by: mdubrovs <mdubrovs@cisco.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Mike Dubrovsky <mdubrovs@cisco.com>
…cking base name containers (#25416)
### Description of PR
On multi-asic, base-name containers (`bgp`, `swss`, `syncd`, ...) do not run and show as exited, because systemd masks them in favour of per-asic instances (`bgp@0`, `bgp@1`, ...).
The test case `platform_tests/test_reload_config.py::test_reload_configuration_checks` uses the helper function `check_docker_status`, which iterates over every container listed by `docker ps -a` and requires all of them to be running, including the base-name containers. On multi-asic it therefore always returned False, and the test case failed on timeout.
Fixed by using the `critical_services_fully_started` function to make sure critical services are up and running.
### Type of change
- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [x] Test case improvement
### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
### Approach
#### What is the motivation for this PR?
The test case `platform_tests/test_reload_config.py::test_reload_configuration_checks` fails while waiting for `check_docker_status` to return `true`. That helper checks whether the `bgp`, `swss`, `syncd`, ... docker containers are up. On multi-asic these base-name containers never run, so the helper cannot return true within the 300-second timeout.
#### How did you do it?
Fixed by using the `critical_services_fully_started` function to make sure critical services are up and running.
#### How did you verify/test it?
The test case passes on both multi-asic and single-asic systems.
#### Any platform specific information?
#### Supported testbed topology if it's a new test case?
### Documentation
Signed-off-by: setu <setu@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Setu Patel <171176331+arista-setu@users.noreply.github.com>
… to wait until routes withdrawn (#25415)
Previously the fixture could yield a base number of withdrawn routes much too early. I replaced the generic sleep with a `wait_until` that does not progress until the number of routes is stable after withdrawing them.
### Description of PR
Summary:
Fixes # (issue)
### Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [ ] Test case improvement
### Back port request
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
- [ ] 202512
- [ ] 202605
### Approach
#### What is the motivation for this PR?
Fix `stress/test_stress_routes.py` on large T2 topos.
#### How did you do it?
I replaced the generic sleep with a `wait_until` that does not progress until the number of routes is stable after withdrawing them.
#### How did you verify/test it?
I reran the test multiple times. Before the change it was flaky; now it passes every time.
#### Any platform specific information?
#### Supported testbed topology if it's a new test case?
### Documentation
Signed-off-by: Peter <peterbailey@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Peter Bailey <peterbailey@arista.com>
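One way to express "do not progress until the route count is stable" is sketched below. `wait_until_stable` and its `stable_polls` parameter are hypothetical names chosen for illustration; the PR itself uses the framework's `wait_until`, and its exact stability criterion is not shown in the commit message.

```python
import time
from typing import Callable


def wait_until_stable(get_count: Callable[[], int], timeout: float,
                      interval: float, stable_polls: int = 3) -> bool:
    """Return True once get_count() gives the same value stable_polls times in a row.

    Returns False if that does not happen before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    last = get_count()
    streak = 1  # number of consecutive identical readings so far
    while streak < stable_polls:
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
        current = get_count()
        # Extend the streak on an unchanged reading, otherwise start over.
        streak = streak + 1 if current == last else 1
        last = current
    return True
```

Compared with a fixed sleep, this adapts to how long the withdrawal actually takes on a given topology, at the cost of a few extra polls once the count has settled.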
… topologies with a mix of LAG and non-LAG upstream ports (#25409)
### Description of PR
For topologies like t1-isolated-d448u15-lag, most upstream (T2) connections are individual ports rather than PortChannels. The ACL table port binding logic adds downstream individual ports as well as PortChannels, but skips upstream individual ports because the logic for adding PortChannels and the logic for adding upstream ports are mutually exclusive.
As a result, ingress ACL rules are not applied on non-PortChannel upstream ports, so packets that should be dropped are forwarded, producing "Received packet that we expected not to receive" assertion errors on uplink->downlink ACL tests.
Fix by also adding upstream ports that are not members of any PortChannel to the ACL table, even when PortChannels for other ports are present.
Summary:
Fixes # (issue)
### Type of change
- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [ ] Test case improvement
### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
### Approach
#### What is the motivation for this PR?
To fix the AssertionError "Received packet that we expected not to receive..." caused by non-PortChannel ports that have no ACL bound, which makes the uplink->downlink ACL tests fail.
#### How did you do it?
Add upstream ports that are not members of any PortChannel to the ACL table, even when PortChannels are present for other ports.
#### How did you verify/test it?
Ran the acl package on a t1-isolated-d448u15-lag setup and confirmed that the tests no longer fail with this AssertionError.
Signed-off-by: Christopher Croy <ccroy@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Chris <156943338+ccroy-arista@users.noreply.github.com>
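The upstream side of the port selection described in this commit message can be sketched as below. `acl_bind_ports` and its argument shapes are hypothetical, chosen only to illustrate the rule; the actual test code and its data structures are not shown here, and downstream ports are left out.

```python
from typing import Dict, List


def acl_bind_ports(upstream_ports: List[str],
                   portchannels: Dict[str, List[str]]) -> List[str]:
    """Return PortChannels plus every upstream port that is not a LAG member.

    `portchannels` maps a PortChannel name to its member port names.
    """
    lag_members = {m for members in portchannels.values() for m in members}
    # Standalone upstream ports are kept even when PortChannels exist,
    # instead of treating the two sources as mutually exclusive.
    standalone = [p for p in upstream_ports if p not in lag_members]
    return sorted(portchannels) + standalone
```

For example, with upstream ports `Ethernet0`, `Ethernet4`, `Ethernet8` and a single `PortChannel101` containing `Ethernet0`, the result binds `PortChannel101`, `Ethernet4`, and `Ethernet8`.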
Tests in `tests/qos/test_qos_sai.py` fail on a chassis that has both a multi-asic and a single-asic linecard. This commit fixes the issue and provides a general code improvement. The issue is very similar to the one solved by sonic-net/sonic-mgmt#23999, but affects a different fixture.
### Description of PR
Summary:
Fixes #24256
### Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [ ] Test case improvement
### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
### Approach
#### What is the motivation for this PR?
#### How did you do it?
Updated the code to loop over all DUT asics and to use the counterpoll helper instead.
#### How did you verify/test it?
Reran `tests/qos/test_qos_sai.py`.
#### Any platform specific information?
#### Supported testbed topology if it's a new test case?
### Documentation
Signed-off-by: Peter <peterbailey@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Peter Bailey <peterbailey@arista.com>
…UT2 64p topos (#25414)
### Description of PR
On 64p UT2 topos we've seen that the `ip_proto` variant of `test_fib.py`
can fail due to poor traffic distribution across the 32-port ECMP group.
This isn't surprising: `ip_proto` is an 8-bit field, so it has only 256
unique values, which isn't enough entropy to expect a good distribution
across a 32-port ECMP group:
```
16:22:48.727 root : INFO : hash_key=**ip-proto**, hit count map: {57: 512, 12: 256, 3: 320, 63: 256, 0: 256, 59: 320, 54: 416, 58: 96, 15: 320, 56: 192, 53: 320, 4: 192, 5: 384, 9: 288, 52: 128, 49: 256, 55: 256, 7: 160, 50: 480, 8: 224, 51: 192, 13: 288, 10: 256, 2: 96, 11: 224, 14: 128, 1: 320, 62: 192, 60: 256, 48: 64, 61: 224, 6: 128}
16:22:48.727 root : INFO : type port(s) exp_cnt act_cnt diff(%)
16:22:48.727 root : INFO : ECMP [0] 250 256 2.4%
16:22:48.727 root : INFO : ECMP [1] 250 320 28.000000000000004%
16:22:48.727 root : INFO : ECMP [2] 250 96 -61.6%
16:22:48.727 root : INFO : ECMP [3] 250 320 28.000000000000004%
16:22:48.727 root : INFO : ECMP [4] 250 192 -23.200000000000003%
16:22:48.727 root : INFO : ECMP [5] 250 384 53.6%
16:22:48.727 root : INFO : ECMP [6] 250 128 -48.8%
16:22:48.727 root : INFO : ECMP [7] 250 160 -36.0%
16:22:48.727 root : INFO : ECMP [8] 250 224 -10.4%
16:22:48.727 root : INFO : ECMP [9] 250 288 15.2%
16:22:48.727 root : INFO : ECMP [10] 250 256 2.4%
16:22:48.727 root : INFO : ECMP [11] 250 224 -10.4%
16:22:48.727 root : INFO : ECMP [12] 250 256 2.4%
16:22:48.727 root : INFO : ECMP [13] 250 288 15.2%
16:22:48.727 root : INFO : ECMP [14] 250 128 -48.8%
16:22:48.727 root : INFO : ECMP [15] 250 320 28.000000000000004%
16:22:48.727 root : INFO : ECMP [48] 250 64 -74.4%
16:22:48.727 root : INFO : ECMP [49] 250 256 2.4%
16:22:48.727 root : INFO : ECMP [50] 250 480 **92.0%**
16:22:48.728 root : INFO : ECMP [51] 250 192 -23.200000000000003%
16:22:48.728 root : INFO : ECMP [52] 250 128 -48.8%
16:22:48.728 root : INFO : ECMP [53] 250 320 28.000000000000004%
16:22:48.728 root : INFO : ECMP [54] 250 416 66.4%
16:22:48.728 root : INFO : ECMP [55] 250 256 2.4%
16:22:48.728 root : INFO : ECMP [56] 250 192 -23.200000000000003%
16:22:48.728 root : INFO : ECMP [57] 250 512 **104.80000000000001%**
16:22:48.728 root : INFO : ECMP [58] 250 96 -61.6%
16:22:48.728 root : INFO : ECMP [59] 250 320 28.000000000000004%
16:22:48.728 root : INFO : ECMP [60] 250 256 2.4%
16:22:48.728 root : INFO : ECMP [61] 250 224 -10.4%
16:22:48.728 root : INFO : ECMP [62] 250 192 -23.200000000000003%
16:22:48.728 root : INFO : ECMP [63] 250 256 2.4%
```
This PR skips the `ip_proto` variant specifically on these UT2 64p topos
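The entropy argument can be made concrete with a small calculation. This is a sketch under a stated assumption: it models an ideal hash that places each of the 256 `ip_proto` values on one of the 32 ports independently and uniformly, which is not necessarily how the hardware hash behaves. `relative_spread` is a hypothetical helper name.

```python
import math
from typing import Tuple


def relative_spread(num_values: int, num_ports: int) -> Tuple[float, float]:
    """Mean values per port and relative standard deviation under an ideal hash.

    The number of values landing on one port is Binomial(num_values, 1/num_ports).
    """
    p = 1.0 / num_ports
    mean = num_values * p
    std = math.sqrt(num_values * p * (1.0 - p))
    return mean, std / mean


mean, rel = relative_spread(256, 32)
print(f"values per port: {mean:.1f}, relative std dev: {rel:.1%}")
```

Under this model each port receives only 8 distinct values on average, with a relative standard deviation of roughly 35%, so large per-port deviations like those in the log above are plausible even for a well-behaved hash.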
### Type of change
- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [ ] Test case improvement
### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
### Approach
#### What is the motivation for this PR?
The `ip_proto` variant doesn't have enough entropy to pass reliably on
UT2 64p topos so skip it.
#### How did you do it?
Remove the `ip_proto` from the list of `hash_keys` for topos
`t2_single_node_max_64p` and `t2_single_node_max_64p_v2`
#### How did you verify/test it?
Ran all testlets in `fib/test_fib.py` and saw they all passed on our UT2
64p topo
#### Any platform specific information?
N/A
#### Supported testbed topology if it's a new test case?
N/A
### Documentation
N/A
Signed-off-by: Nathan Wolfe <nwolfe@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: arista-nwolfe <94405414+arista-nwolfe@users.noreply.github.com>
…art by waiting for neighbor re-learn (#25413)
### Description of PR
After swss restart on single/multi-ASIC devices, the DUT's LLDP entry on the neighbor may have aged out during the restart window. The `check_lldp_neighbor` test then fails because it immediately queries the neighbor's LLDP table via SNMP before the neighbor has re-learned the DUT's LLDP information.
This fix adds a `wait_until(30, 5, 0)` retry loop that polls the neighbor's LLDP table until the expected interface entry reappears, before proceeding with the LLDP fact assertions. A small helper `_neighbor_has_lldp_entry` is extracted to support the polling.
Summary:
Fixes # (issue)
### Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [x] Test case improvement
### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
### Approach
#### What is the motivation for this PR?
The LLDP neighbor verification test (`check_lldp_neighbor`) is flaky after swss restart. During the restart window, the DUT stops sending LLDP PDUs, causing the neighbor's LLDP entry for the DUT to age out (default LLDP hold time is ~120s, but rapid restarts can exceed this). When the test immediately queries the neighbor via SNMP, the entry may not yet be re-learned, leading to a KeyError or assertion failure.
#### How did you do it?
- Extracted a helper function `_neighbor_has_lldp_entry()` that queries the neighbor's LLDP table via SNMP and checks if the expected interface is present.
- Refactored the eos/sonic branching to set `neighbor_interface` and `snmp_community` variables up front, removing duplicated `lldp_facts` calls.
- Added a `wait_until` call that polls every 5 seconds for up to 30 seconds, asserting the neighbor has re-learned the DUT's LLDP entry before proceeding with the existing LLDP fact assertions.
#### How did you verify/test it?
Ran `tests/lldp/test_lldp.py` on both single-ASIC and multi-ASIC DUTs. Confirmed that the test now waits for the neighbor to re-learn LLDP info and passes reliably instead of failing intermittently.
#### Any platform specific information?
#### Supported testbed topology if it's a new test case?
### Documentation
Signed-off-by: setu <setu@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Setu Patel <171176331+arista-setu@users.noreply.github.com>
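The polling pattern described in this commit message can be sketched as follows. Both functions are local, hypothetical stand-ins: `poll_until` only mimics the `(timeout, interval, delay, condition, ...)` argument order of the `wait_until` call quoted above, and `neighbor_has_lldp_entry` takes a `fetch_table` callable in place of the real SNMP query.

```python
import time
from typing import Any, Callable, Dict


def poll_until(timeout: float, interval: float, delay: float,
               condition: Callable[..., bool], *args: Any) -> bool:
    """Retry condition(*args) until it returns True or `timeout` seconds pass."""
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def neighbor_has_lldp_entry(fetch_table: Callable[[], Dict[str, dict]],
                            interface: str) -> bool:
    """True once the neighbor's LLDP table contains the expected interface."""
    return interface in fetch_table()
```

A call such as `poll_until(30, 5, 0, neighbor_has_lldp_entry, fetch_table, "Ethernet1")` would then gate the LLDP fact assertions on the entry being present, where `fetch_table` and the interface name are placeholders.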
…F (#25412)
The static data in this file must match the aliases and ignore statements in sensors.conf for the corresponding platform. This is an Arista platform-specific change.
### Description of PR
Summary:
Fixes # (issue)
### Type of change
- [x] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [ ] Test case improvement
### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
### Approach
#### What is the motivation for this PR?
The sensor names here must match the aliases used in sensors.conf.
#### How did you do it?
Modified `sku-sensors-data.yml` to reflect the sensor aliases.
#### How did you verify/test it?
Passed `platform_tests/test_sensors.py::test_sensors` on this platform.
#### Any platform specific information?
Applies to Arista 7280R4-32QF-32DF-F and 7280R4K-32QF-32DF-F models.
#### Supported testbed topology if it's a new test case?
NA
### Documentation
NA
Signed-off-by: arista-hpandya <hpandya@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: HP <hpandya@arista.com>
…e_max_64p.yml (#25410)
### Description of PR
Added BGP confederation configuration to `topo_t2_single_node_max_64p.yml`. This is a follow-up to PR #23527, which updated the DUT role, nhipv4, and nhipv6 values. The confed changes were deferred from that PR to keep the review scope manageable.
Summary:
- Updated `dut_asn` from `65100` to `66000` and added `dut_confed_asn: 65100`, `dut_confed_peers: 65200` in `configuration_properties`
- Added `peer_in_bgp_confed: true` to all 32 core VMs (VM01T3–VM32T3)
- Updated all 32 leaf VMs (VM01LT2–VM32LT2): changed `asn` to `65300`, added `confed_asn: 65100` and `confed_peers: 66000`, changed BGP peer from `65100` to `66000`

Fixes # (issue)
### Type of change
- [ ] Bug fix
- [x] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [ ] Test case improvement
### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
- [x] 202511
### Approach
#### What is the motivation for this PR?
To enable BGP confederation support in the 64-port T2 single node topology. This is required for testing with UT2s and LT2s as part of BGP confederation, which needs the additional confed fields in the topology file.
#### How did you do it?
By modifying the topology file to add confederation-related BGP attributes (`dut_confed_asn`, `dut_confed_peers`, `peer_in_bgp_confed`, `confed_asn`, `confed_peers`) and updating ASN/peer values accordingly.
#### How did you verify/test it?
By running add-topo.
#### Any platform specific information?
#### Supported testbed topology if it's a new test case?
### Documentation
Signed-off-by: yatishkoul <yatishkoul@gmail.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: Yatish <yatishkoul@microsoft.com>