snappi: Assert DUT peer ports are operationally up before PFCwd basic test #25331
Open
ediwibowo-msft wants to merge 1 commit into
… test Signed-off-by: Edi Wibowo <ediwibowo@microsoft.com>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Collaborator
This PR has backport request for branch(es): 202511.
Powered by SONiC BuildBot
Summary:
Assert that the DUT-side ports for both the ingress and egress DUTs are operationally up before running the PFC watchdog basic test. Without this guard, `run_pfcwd_basic_test` proceeds even when one of the DUT ports is still down.
The new check uses `wait_until(30, 2, 0, host.is_interface_status_up, port)`, so a transiently down link gets up to 30s to recover before the test fails with a clear, actionable message naming the offending host and port.
Fixes #25330
Type of change
Back port request
Approach
What is the motivation for this PR?
`test_pfcwd_basic_*_lossless_prio` runs were intermittently failing on multi-DUT topologies with `Loss rate of Data Flow 1 (0.0) should be in [0.7, 1]`. Inspection of the run logs showed the PFCwd drop and storm-detected counters were identical before and after the traffic run, i.e. PFCwd never engaged because at least one peer port wasn't actually up when traffic started. The current helper has no precondition check on link state, so the failure surfaces as a confusing TGEN loss-rate mismatch instead of a clear "port not up" error.
How did you do it?
In `tests/snappi_tests/pfcwd/files/pfcwd_basic_helper.py::run_pfcwd_basic_test`, immediately after `start_pfcwd` is invoked on both DUTs and before initial PFCwd stats are sampled, iterate over both `(duthost, peer_port)` pairs and assert each is operationally up.
How did you verify/test it?
`test_pfcwd_basic_single_lossless_prio` and `test_pfcwd_basic_multi_lossless_prio`. Tests pass with both DUT ports up.
`config interface shutdown` of one of the peer ports before launching the test; the test now fails fast (within ~30s) with `Port EthernetX on <host> is not operationally up` instead of running minutes of traffic and failing with a misleading loss-rate assertion.
Any platform specific information?
None. The check uses generic SONiC CLI status (`is_interface_status_up`) and applies to all platforms running this snappi PFCwd basic helper. Multi-ASIC and single-ASIC paths are both covered.
Supported testbed topology if it's a new test case?
N/A
Documentation
N/A — no behavioral or interface change requiring documentation updates.
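For reviewers who want to see the shape of the guard described under "How did you do it?", here is a minimal, self-contained sketch. It is not the code in this PR. `wait_until` below is a simplified local stand-in that mirrors the `(timeout, interval, delay, condition, *args)` call shape quoted in the summary, `FakeHost` is a hypothetical stub for a DUT host object, and `assert_peer_ports_up` is an illustrative name. The real helper would presumably use the repository's own `wait_until` and assertion utilities.

```python
import time


def wait_until(timeout, interval, delay, condition, *args, **kwargs):
    """Simplified stand-in: poll `condition` until it is true or `timeout` expires."""
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args, **kwargs):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeHost:
    """Hypothetical DUT stub exposing only what the check needs."""

    def __init__(self, hostname, up_ports):
        self.hostname = hostname
        self.up_ports = set(up_ports)

    def is_interface_status_up(self, port):
        return port in self.up_ports


def assert_peer_ports_up(pairs, timeout=30, interval=2):
    """Fail with a host/port-specific message if any peer port stays down."""
    for host, port in pairs:
        is_up = wait_until(timeout, interval, 0, host.is_interface_status_up, port)
        assert is_up, "Port {} on {} is not operationally up".format(
            port, host.hostname
        )


if __name__ == "__main__":
    ingress = FakeHost("dut1", ["Ethernet0"])
    egress = FakeHost("dut2", [])  # peer port is down on this host
    try:
        # Short timeout so the demonstration finishes quickly.
        assert_peer_ports_up(
            [(ingress, "Ethernet0"), (egress, "Ethernet4")],
            timeout=0.2,
            interval=0.05,
        )
    except AssertionError as err:
        print(err)
```

Polling rather than checking once is what gives a flapping link the chance to recover. The per-pair assertion message is what turns the earlier loss-rate mismatch into a "port not up" failure that names the host and port.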