bgpd: don't advertise LLGR stale routes to non-LLGR peers#22297
bgpd: don't advertise LLGR stale routes to non-LLGR peers#22297inder-nexthop wants to merge 1 commit into
Conversation
Greptile SummaryThis PR fixes a long-standing RFC 9494 compliance bug where FRR's LLGR helper was advertising
Confidence Score: 5/5The change is safe to merge: both hunks are surgical, well-motivated by RFC 9494, and covered by a new topotest that exercises the exact before/after behavior. The bgp_route.c fix is a single-clause removal with the correct logic clearly documented in the comment directly above it. The bgp_updgrp.h addition ensures update-group separation on the same axis the advertise check uses, closing the subgroup-merging escape hatch. The topotest covers both the positive case (r3 still receives the stale route) and the negative case (r4 gets a withdrawal), and the r4 config correctly disables GR to guarantee LLGR is not negotiated. No correctness concerns were found. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant r1 as r1 (route originator)
participant r2 as r2 (LLGR helper)
participant r3 as r3 (LLGR-capable peer)
participant r4 as r4 (non-LLGR peer)
Note over r1,r4: Initial convergence
r1->>r2: BGP OPEN (GR+LLGR capability)
r2->>r3: BGP OPEN (GR+LLGR capability)
r2->>r4: BGP OPEN (no GR/LLGR from r4)
r1->>r2: UPDATE 172.16.1.1/32
r2->>r3: UPDATE 172.16.1.1/32
r2->>r4: UPDATE 172.16.1.1/32
Note over r1,r4: r1 bgpd killed - r2 enters LLGR helper mode
r1--xr2: session down
Note over r2: GR restart-time=0 expires immediately
Note over r2: Tags 172.16.1.1/32 with LLGR_STALE community
Note over r2: PEER_CAP_LLGR_RCV set for r3, unset for r4
Note over r2: r3 gets own update-group (LLGR_RCV=1)
Note over r2: r4 gets own update-group (LLGR_RCV=0)
r2->>r3: UPDATE 172.16.1.1/32 [LLGR_STALE]
r2->>r4: WITHDRAW 172.16.1.1/32
Reviews (1): Last reviewed commit: "bgpd: don't advertise LLGR stale routes ..." | Re-trigger Greptile |
When a BGP peer goes down, an LLGR helper keeps the routes it learned from that peer, tags them with the LLGR_STALE community, and holds them as a last resort. RFC 9494 (4.3) says a stale route like that shouldn't be advertised to any neighbor that hasn't advertised the LLGR capability to us - a speaker that doesn't implement LLGR has no notion of staleness and would treat the route as a normal, live path. The advertise check only withheld a stale route when LLGR had been negotiated in neither direction (!RCV && !ADV), but since we advertise the LLGR capability to every peer, ADV is effectively always set and the route went out anyway. The capability that matters here is the neighbor's, so key the decision off whether they advertised it to us (RCV) - which is what the comment sitting right above that check already says we do. That alone isn't enough: the advertise decision is made once per update-subgroup, and an LLGR peer and a non-LLGR peer with the same outbound policy land in the same subgroup, so there's no way to send the route to one but not the other. Add PEER_CAP_LLGR_RCV to PEER_UPDGRP_CAP_FLAGS so the received capability feeds both the update-group hash key and the cmp function - keying off the same axis the advertise check uses - so a non-LLGR peer gets its own update-group and is never merged back in on a hash collision. The new bgp_llgr_no_capability topotest covers it: kill the peer that advertised the route, then confirm r2 keeps feeding the stale route to its LLGR-capable peer r3 while withdrawing it from the non-LLGR peer r4. Worth noting in the config - FRR advertises LLGR whenever it advertises graceful restart, and the default helper mode counts too, so r4 has to set "bgp graceful-restart-disable" to actually be a non-LLGR peer. Signed-off-by: Inder Pooni <inder@nexthop.ai>
9cd7d59 to
804e57f
Compare
When a BGP peer goes down, an LLGR helper keeps the routes it learned from that peer, tags them with the LLGR_STALE community, and holds them as a last resort. RFC 9494 (4.3) says a stale route like that shouldn't be advertised to any neighbor that hasn't advertised the LLGR capability to us - a speaker that doesn't implement LLGR has no notion of staleness and would treat the route as a normal, live path.
The advertise check only withheld a stale route when LLGR had been negotiated in neither direction (!RCV && !ADV), but since we advertise the LLGR capability to every peer, ADV is effectively always set and the route went out anyway. The capability that matters here is the neighbor's, so key the decision off whether they advertised it to us (RCV) - which is what the comment sitting right above that check already says we do.
That alone isn't enough: the advertise decision is made once per update-subgroup, and an LLGR peer and a non-LLGR peer with the same outbound policy land in the same subgroup, so there's no way to send the route to one but not the other. Add PEER_CAP_LLGR_RCV to PEER_UPDGRP_CAP_FLAGS so the received capability feeds both the update-group hash key and the cmp function - keying off the same axis the advertise check uses - so a non-LLGR peer gets its own update-group and is never merged back in on a hash collision.
The new bgp_llgr_no_capability topotest covers it: kill the peer that advertised the route, then confirm r2 keeps feeding the stale route to its LLGR-capable peer r3 while withdrawing it from the non-LLGR peer r4. Worth noting in the config - FRR advertises LLGR whenever it advertises graceful restart, and the default helper mode counts too, so r4 has to set "bgp graceful-restart-disable" to actually be a non-LLGR peer.