Fix AnsibleHostBase._run: normalize 'failed' key for ansible-core >= 2.21 #25482

Open
xwjiang-ms wants to merge 1 commit into sonic-net:master from xwjiang-ms:ai-fix-base-ansible-failed-key
@xwjiang-ms xwjiang-ms commented Jun 19, 2026

Contributor

Description of PR

Summary:
Fix KeyError: 'failed' in AnsibleHostBase._run caused by ansible-core >= 2.21 changing how task results are post-processed.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Test case improvement

Back port request

  • 202311
  • 202405
  • 202411
  • 202505
  • 202511
  • 202605

Approach

What is the motivation for this PR?

The existing _IGNORE hack in base.py prevents ansible from stripping 'failed' from task results. This hack no longer works in ansible-core >= 2.21, which changed post-processing so that 'failed' is absent from successful command results.

Multiple test modules hit KeyError: 'failed' when they access rc['failed'] after a successful self.shell() or self.command() call, observed when running with the latest sonic-mgmt docker:

| Module | Location | Code |
|---|---|---|
| tacacs | common/helpers/tacacs/tacacs_helper.py:366 | `if nss_config_attribute['failed']:` |
| dualtor_io | common/dualtor/dual_tor_io.py:591 | `if not output['failed']:` |
| bgp | bgp/route_checker.py:121 | `if res['failed'] and cmd_backup != "":` |
| macsec | macsec/test_dataplane.py:117 | `...["failed"]` |
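
The failure mode can be reproduced without Ansible. The sketch below assumes, purely for illustration, that a successful command result behaves like a plain dict with no 'failed' key; the variable names and values are hypothetical and not taken from sonic-mgmt.

```python
# Hypothetical shape of a successful result under ansible-core >= 2.21:
# the 'failed' key is absent rather than set to False.
rc = {"rc": 0, "stdout": "ok"}

try:
    failed = rc['failed']        # direct indexing, as in the call sites listed above
except KeyError as exc:
    failed = None
    missing_key = exc.args[0]    # name of the key that was not found

# A per-call-site workaround would be rc.get('failed', False),
# but that means touching every caller individually.
tolerant = rc.get('failed', False)
```

Direct indexing raises, while `.get()` with a default does not; the latter would have to be applied at each call site.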

How did you do it?

Normalize hostname_res in _run() to always include 'failed' (based on hostname_res.is_failed) before returning. This is a single-point fix that covers all call sites without requiring per-file changes.

```python
if 'failed' not in hostname_res:
    hostname_res['failed'] = hostname_res.is_failed
```

This approach is backward-compatible — it has no effect when ansible-core already includes 'failed' in the result.
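
A minimal, self-contained sketch of the normalization and its backward compatibility. `FakeResult` is a hypothetical stand-in for the per-host result object (a dict that also exposes `is_failed`), and `normalize_failed` is an illustrative name, not a function in sonic-mgmt; how the real `is_failed` is computed is an assumption here.

```python
class FakeResult(dict):
    """Hypothetical stand-in for a per-host module result."""

    @property
    def is_failed(self):
        # Assumption for this sketch: an explicit 'failed' flag or a
        # non-zero return code counts as failure.
        return bool(self.get("failed", False)) or self.get("rc", 0) != 0


def normalize_failed(hostname_res):
    """Guarantee that 'failed' is present, mirroring the check added in _run()."""
    if 'failed' not in hostname_res:
        hostname_res['failed'] = hostname_res.is_failed
    return hostname_res


# Newer behaviour: 'failed' missing from the result, so it is filled in.
ok = normalize_failed(FakeResult(rc=0, stdout="ok"))
bad = normalize_failed(FakeResult(rc=1, stderr="boom"))

# Older behaviour: 'failed' already present, so the value is left untouched.
already = normalize_failed(FakeResult(rc=0, failed=True))
```

Because the key is only written when missing, a result that already carries 'failed' passes through unchanged.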

How did you verify/test it?

  • Pre-commit checks passed locally (flake8, trim trailing whitespace, check python ast, etc.)
  • Observed failures in ADO build 1142845 are all caused by this root issue
  • The fix aligns with the original intent of the _IGNORE hack: ensuring callers always have access to rc['failed']

Any platform specific information?

N/A — this is a framework-level fix affecting all platforms.

Supported testbed topology if it's a new test case?

N/A

Documentation

N/A

…2.21

The _IGNORE hack that prevented ansible from stripping 'failed' from
task results no longer works in ansible-core >= 2.21, which changed
post-processing so that 'failed' is absent from successful results.

Multiple test modules raise KeyError: 'failed' when they access
rc['failed'] after a successful self.shell()/self.command() call:
- common/helpers/tacacs/tacacs_helper.py
- common/dualtor/dual_tor_io.py
- bgp/route_checker.py
- macsec/test_dataplane.py

Normalize hostname_res in _run() to always include 'failed' based on
is_failed before returning. Single-point fix covering all call sites.

Signed-off-by: xwjiang-ms <xiaweijiang@microsoft.com>
@mssonicbld

Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
