Zebra resilience configuration #22330
Add a `zebra nexthop-group resilience buckets [0-255] idle-timer <seconds> unbalanced-timer <seconds>` command to zebra. When zebra receives a nexthop group from an upper-level protocol that does not have its own resilience, use the defined values. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a test that shows that this is working at some level. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
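To make the command syntax from the first commit message concrete, here is a minimal sketch that builds the configuration line from its three parameters. The helper name `resilience_command` is hypothetical and not part of FRR or the topotest library. The 0-255 bucket range is taken from the commit message; rejecting negative timers is an assumption, since the PR text does not state the timer ranges.

```python
def resilience_command(buckets: int, idle_timer: int, unbalanced_timer: int) -> str:
    """Build the zebra config line described in the commit message.

    Hypothetical helper for illustration only.
    """
    # Bucket range comes from the `[0-255]` in the commit message.
    if not 0 <= buckets <= 255:
        raise ValueError("buckets must be in the range 0-255")
    # Assumption: timers are non-negative seconds; the PR text gives no bounds.
    if idle_timer < 0 or unbalanced_timer < 0:
        raise ValueError("timers must be non-negative")
    return (
        f"zebra nexthop-group resilience buckets {buckets} "
        f"idle-timer {idle_timer} unbalanced-timer {unbalanced_timer}"
    )


def no_resilience_command() -> str:
    """The negated form shown in the flowchart, which clears the global values."""
    return "no zebra nexthop-group resilience"
```

A test could pass the resulting string to whatever helper it already uses to push configuration into zebra.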
Greptile Summary
This PR adds a new global
Confidence Score: 4/5
The dataplane and northbound changes are straightforward and well-scoped; the only concrete gap is a missing
The core logic (injecting resilience into the NHG hash key before lookup) is correct because every caller of
tests/topotests/zebra_nhg_resilience/ needs an empty
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["CLI: zebra nexthop-group resilience\nbuckets N idle-timer T1 unbalanced-timer T2"] --> B["NB apply_finish callback\nzebra_nexthop_group_resilience_apply_finish"]
B --> C["zebra_nhg_set_resilience(buckets, idle, unbalanced)\nstores in zrouter.nhg_resilience"]
D["Route added / NHG lookup\nzebra_nhe_find(lookup, nhg_depends, ...)"] --> E{from_dplane\nor proto-owned\nor buckets==0?}
E -- "yes (skip)" --> G["hash_lookup / create NHE\nwithout resilience"]
E -- "no" --> F{multipath?\nnhg_depends OR\nnexthop->next}
F -- "singleton" --> G
F -- "multipath" --> H["lookup->nhg.nhgr = zrouter.nhg_resilience"]
H --> I["hash_lookup / create NHE\nwith resilience"]
J["CLI: no zebra nexthop-group resilience"] --> K["NB destroy NB_EV_APPLY\nzebra_nexthop_group_resilience_destroy"]
K --> L["zebra_nhg_set_resilience(0,0,0)\nzrouter.nhg_resilience.buckets = 0"]
L --> M["Future new groups:\nresilience NOT applied\nExisting groups: unchanged"]
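The decision path in the flowchart can be modelled in a few lines. This is a sketch in Python, not the C code from the PR: the `Resilience` and `NhgLookup` types and the `apply_global_resilience` function are hypothetical names chosen for illustration. It assumes that `buckets==0` in the flowchart refers to the global setting being unconfigured, and it adds a check, based on the commit message rather than the flowchart, that a group carrying its own resilience keeps it.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Resilience:
    """Stand-in for the values stored in zrouter.nhg_resilience."""
    buckets: int = 0
    idle_timer: int = 0
    unbalanced_timer: int = 0


@dataclass(frozen=True)
class NhgLookup:
    """Hypothetical model of the lookup entry passed to the NHG find step."""
    nexthop_count: int
    has_depends: bool = False
    from_dplane: bool = False
    proto_owned: bool = False
    resilience: Resilience = field(default_factory=Resilience)


def apply_global_resilience(lookup: NhgLookup, global_cfg: Resilience) -> NhgLookup:
    """Return the lookup entry to hash, with global resilience injected if eligible."""
    # Skip branch: dataplane-sourced, protocol-owned, or nothing configured globally.
    if lookup.from_dplane or lookup.proto_owned or global_cfg.buckets == 0:
        return lookup
    # Assumption from the commit message: a group with its own resilience keeps it.
    if lookup.resilience.buckets != 0:
        return lookup
    # Singletons are left alone; only multipath groups get the global values.
    is_multipath = lookup.has_depends or lookup.nexthop_count > 1
    if not is_multipath:
        return lookup
    return replace(lookup, resilience=global_cfg)
```

Because the values are copied into the entry before the hash lookup, groups created after `no zebra nexthop-group resilience` take the skip branch, while groups created earlier keep the values they were hashed with, which matches the last node of the flowchart.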
### Issue 1 of 1
tests/topotests/zebra_nhg_resilience/test_zebra_nhg_resilience.py:1-10
**Missing `__init__.py` in test directory**
Every other `zebra_*` topotest directory ships an `__init__.py` (e.g. `zebra_netlink/`, `zebra_opaque/`, `zebra_reserved_ranges/`, etc.). Without it, pytest running in default `prepend` import mode may fail to collect this test or, if two identically-named test files ever appear in different directories, produce an `ImportError` for a naming collision. The file is simply empty (`\n`), but its absence is inconsistent with the rest of the test suite and will cause problems in some CI environments.
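The fix for this issue amounts to creating one empty file. A minimal sketch, assuming it is run from the repository root; the helper name `ensure_package_marker` is hypothetical, and the directory path in the usage note is the one quoted in the issue above.

```python
from pathlib import Path


def ensure_package_marker(test_dir: str) -> Path:
    """Create an empty __init__.py in test_dir if it does not already exist."""
    marker = Path(test_dir) / "__init__.py"
    # touch() creates an empty file; exist_ok=True leaves an existing file untouched.
    marker.touch(exist_ok=True)
    return marker
```

For this PR the call would be `ensure_package_marker("tests/topotests/zebra_nhg_resilience")`, after which the new file still has to be added to the commit.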
Reviews (1): Last reviewed commit: "tests: Add a test for the new zebra next..."
#!/usr/bin/env python
# SPDX-License-Identifier: ISC

#
# Copyright (c) 2026 by Nvidia Corporation
# Donald Sharp
#

"""
Test that 'zebra nexthop-group resilience ...' causes every zebra-created
Allow zebra to control nexthop group resilience.