Skip to content

Zebra resilience configuration#22330

Open
donaldsharp wants to merge 2 commits into
FRRouting:masterfrom
donaldsharp:zebra_resilience
Open

Zebra resilience configuration#22330
donaldsharp wants to merge 2 commits into
FRRouting:masterfrom
donaldsharp:zebra_resilience

Conversation

@donaldsharp

Copy link
Copy Markdown
Member

Allow zebra to control nexhtop group resilience.

Add a `zebra nexthop-group reslience buckets [0-255] idle-timer <seconds> unbalanced-timer <seconds>`
to zebra.  When zebra receives a nexthop group from an upper level protocol that does not have
it's own reslience use the defined values.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a test that shows that this is working at some level.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
@frrbot frrbot Bot added tests Topotests, make check, etc zebra labels Jun 12, 2026
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds a new global zebra nexthop-group resilience command that causes zebra to automatically apply resilient hashing (buckets, idle-timer, unbalanced-timer) to every zebra-owned multipath nexthop group it creates, without overriding per-group resilience already configured by upper-level protocols.

  • Wires the new command through the YANG model (frr-zebra.yang presence container), northbound callbacks (zebra_nb_config.c), and CLI (zebra_cli.c); the global nhg_resilience field is stored in zebra_router and applied inside zebra_nhe_find before the hash lookup so resilience becomes part of the hash key for new groups.
  • Adds a topotest that verifies multipath groups inherit the config, singleton groups are left alone, and removing the config stops new groups from being made resilient while leaving existing ones unchanged; the test directory is missing its __init__.py.

Confidence Score: 4/5

The dataplane and northbound changes are straightforward and well-scoped; the only concrete gap is a missing __init__.py in the new test directory.

The core logic — injecting resilience into the NHG hash key before lookup — is correct because every caller of zebra_nhe_find passes either a stack-local struct or an explicitly copied NHE, so the in-place field write is safe. The YANG model, NB callbacks, and CLI all follow established FRR patterns. The one concrete defect is the missing __init__.py in tests/topotests/zebra_nhg_resilience/, which every other zebra_* topotest directory includes and whose absence can prevent test discovery in default pytest import mode.

tests/topotests/zebra_nhg_resilience/ — needs an empty __init__.py to match the convention of all other zebra topotest directories.

Important Files Changed

Filename Overview
tests/topotests/zebra_nhg_resilience/test_zebra_nhg_resilience.py New topotest covers multipath/singleton/removal scenarios but is missing the required __init__.py present in every other zebra_* test directory.
zebra/zebra_nhg.c Adds global resilience injection into zebra_nhe_find; the lookup struct is always a local copy at all call sites, so the in-place modification is safe.
zebra/zebra_nb_config.c NB callbacks correctly delegate all work to apply_finish; destroy properly gates on NB_EV_APPLY before zeroing global resilience state.
zebra/zebra_cli.c DEFPY command and cli_show function look correct; no form cleanly destroys the presence container without needing the parameter arguments.
yang/frr-zebra.yang New presence container resilience under nexthop-group with correctly typed mandatory leaves; types match struct nhg_resilience fields.
zebra/zebra_nb.c NB registration table correctly wires up the four new XPaths with their callbacks.
zebra/zebra_router.h Adds nhg_resilience field to zebra_router; zero-initialized by default, which correctly disables the feature on startup.
zebra/zebra_nhg.h Adds public declaration for zebra_nhg_set_resilience.
zebra/zebra_nb.h Header declarations for the six new NB callback functions.
doc/user/zebra.rst New CLI documentation entry describes the command and its scoping behaviour accurately.
tests/topotests/zebra_nhg_resilience/r1/frr.conf Router config sets up a multipath route (10.0.0.0/24 via two nexthops) and a singleton route (10.1.1.0/24) to exercise both test paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CLI: zebra nexthop-group resilience\nbuckets N idle-timer T1 unbalanced-timer T2"] --> B["NB apply_finish callback\nzebra_nexthop_group_resilience_apply_finish"]
    B --> C["zebra_nhg_set_resilience(buckets, idle, unbalanced)\nstores in zrouter.nhg_resilience"]

    D["Route added / NHG lookup\nzebra_nhe_find(lookup, nhg_depends, ...)"] --> E{from_dplane\nor proto-owned\nor buckets==0?}
    E -- "yes (skip)" --> G["hash_lookup / create NHE\nwithout resilience"]
    E -- "no" --> F{multipath?\nnhg_depends OR\nnexthop->next}
    F -- "singleton" --> G
    F -- "multipath" --> H["lookup->nhg.nhgr = zrouter.nhg_resilience"]
    H --> I["hash_lookup / create NHE\nwith resilience"]

    J["CLI: no zebra nexthop-group resilience"] --> K["NB destroy NB_EV_APPLY\nzebra_nexthop_group_resilience_destroy"]
    K --> L["zebra_nhg_set_resilience(0,0,0)\nzrouter.nhg_resilience.buckets = 0"]
    L --> M["Future new groups:\nresilience NOT applied\nExisting groups: unchanged"]
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
tests/topotests/zebra_nhg_resilience/test_zebra_nhg_resilience.py:1-10
**Missing `__init__.py` in test directory**

Every other `zebra_*` topotest directory ships an `__init__.py` (e.g. `zebra_netlink/`, `zebra_opaque/`, `zebra_reserved_ranges/`, etc.). Without it, pytest running in default `prepend` import mode may fail to collect this test or, if two identically-named test files ever appear in different directories, produce an `ImportError` for a naming collision. The file is simply empty (`\n`), but its absence is inconsistent with the rest of the test suite and will cause problems in some CI environments.

Reviews (1): Last reviewed commit: "tests: Add a test for the new zebra next..." | Re-trigger Greptile

Comment on lines +1 to +10
#!/usr/bin/env python
# SPDX-License-Identifier: ISC

#
# Copyright (c) 2026 by Nvidia Corporation
# Donald Sharp
#

"""
Test that 'zebra nexthop-group resilience ...' causes every zebra-created

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing __init__.py in test directory

Every other zebra_* topotest directory ships an __init__.py (e.g. zebra_netlink/, zebra_opaque/, zebra_reserved_ranges/, etc.). Without it, pytest running in default prepend import mode may fail to collect this test or, if two identically-named test files ever appear in different directories, produce an ImportError for a naming collision. The file is simply empty (\n), but its absence is inconsistent with the rest of the test suite and will cause problems in some CI environments.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/topotests/zebra_nhg_resilience/test_zebra_nhg_resilience.py
Line: 1-10

Comment:
**Missing `__init__.py` in test directory**

Every other `zebra_*` topotest directory ships an `__init__.py` (e.g. `zebra_netlink/`, `zebra_opaque/`, `zebra_reserved_ranges/`, etc.). Without it, pytest running in default `prepend` import mode may fail to collect this test or, if two identically-named test files ever appear in different directories, produce an `ImportError` for a naming collision. The file is simply empty (`\n`), but its absence is inconsistent with the rest of the test suite and will cause problems in some CI environments.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

master size/L tests Topotests, make check, etc zebra

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant