fix(bypass): clean up stale iptables rules on daemon restart#1621
fix(bypass): clean up stale iptables rules on daemon restart#1621imshubham22apr-gif wants to merge 1 commit intokmesh-net:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue where stale iptables bypass rules could persist in pod network namespaces after the Kmesh daemon restarts. It introduces a mechanism to ensure that existing bypass rules are properly flushed for non-bypassing pods during reconciliation, preventing the accumulation of outdated network configurations. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Old rules linger on, Daemon restarts, fresh new start, Clean slate, network flows. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request addresses an issue where stale iptables rules were not cleaned up on daemon restart for non-bypassing pods. The change introduces logic in the AddFunc handler to delete these rules upon pod addition, which covers the restart scenario. The PR also includes a fix for a formatting typo in error messages and adds a new test case to verify the cleanup logic. My review identifies one area for improvement: the error returned from deleteIptables is currently ignored. I've suggested logging this error to improve observability and ensure that cleanup failures don't go unnoticed. Otherwise, the changes look good and effectively address the issue.
| nspath, _ := ns.GetPodNSpath(pod) | ||
| if nspath != "" { | ||
| // during daemon restart, reconcile to ensure stale bypass rules are cleared | ||
| _ = deleteIptables(nspath) |
There was a problem hiding this comment.
The error returned by deleteIptables is ignored. This is inconsistent with other parts of the controller where errors from addIptables and deleteIptables are logged. If deleteIptables fails, it will fail silently, and stale iptables rules might not be cleaned up as intended, which undermines the goal of this change. The error should be logged to provide visibility into potential failures.
if err := deleteIptables(nspath); err != nil {
log.Errorf("failed to delete stale iptables rules for pod %s/%s: %v", pod.GetNamespace(), pod.GetName(), err)
}There was a problem hiding this comment.
Pull request overview
This PR addresses stale bypass iptables rules persisting across Kmesh daemon restarts by adding a reconciliation cleanup step for non-bypassing pods and extending unit coverage to validate the behavior.
Changes:
- Add best-effort iptables cleanup for sidecar-injected pods that do not have the bypass label during pod Add events (to reconcile on restart).
- Fix a minor formatting typo in iptables execution error messages.
- Add a new unit test case asserting cleanup is invoked when a sidecar pod is created without the bypass label.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/controller/bypass/bypass_controller.go | Adds reconciliation cleanup on AddFunc for non-bypass pods; fixes iptables error message formatting. |
| pkg/controller/bypass/bypass_test.go | Adds a test case ensuring delete iptables is invoked for sidecar pods created without bypass label. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for _, args := range iptArgs { | ||
| if err := utils.Execute("iptables", args); err != nil { | ||
| err = fmt.Errorf("failed to exec command: iptables %v\", err: %v", args, err) | ||
| err = fmt.Errorf("failed to exec command: iptables %v, err: %v", args, err) | ||
| log.Error(err) | ||
| return err |
There was a problem hiding this comment.
deleteIptables treats any iptables -D failure as an error and logs it internally. With the new AddFunc reconciliation, this is likely to emit an error for pods where the RETURN rules were never present (iptables prints to stderr when a rule is missing), causing noisy logs on restart. Make deletion idempotent by ignoring “rule not found” errors (or check with -C first), and consider letting the caller decide whether to log to avoid double/log-spam behavior.
| Namespace: namespaceName, | ||
| Labels: map[string]string{}, // no bypass label | ||
| Annotations: map[string]string{ | ||
| "sidecar.istio.io/status": "placeholder", |
There was a problem hiding this comment.
This test uses a hard-coded "sidecar.istio.io/status" annotation key, while the rest of the test (and PodHasSidecar) uses annotation.SidecarStatus.Name. Using the constant here avoids drift/typos if Istio’s key changes and keeps the test aligned with the production check.
| "sidecar.istio.io/status": "placeholder", | |
| annotation.SidecarStatus.Name: "placeholder", |
| if !shouldBypass(pod) { | ||
| // TODO: add delete iptables in case we missed skip bypass during kmesh restart | ||
| nspath, _ := ns.GetPodNSpath(pod) | ||
| if nspath != "" { | ||
| // during daemon restart, reconcile to ensure stale bypass rules are cleared | ||
| _ = deleteIptables(nspath) |
There was a problem hiding this comment.
The new reconciliation path runs deleteIptables for every non-bypass Pod Add event (not just during informer startup). That means every newly created sidecar-injected pod will trigger netns lookup + iptables deletes even though there can’t be “stale” Kmesh rules yet, which can add avoidable overhead. Consider limiting this cleanup to the initial informer sync (startup reconciliation) or a one-time post-sync pass over existing pods, and handle/log GetPodNSpath/deleteIptables errors explicitly rather than dropping them.
Purpose
While auditing the bypass controller, we found that Kmesh doesn't flush existing bypass iptables rules when the daemon restarts. This can lead to stale rules accumulating in pod network namespaces if the bypass state changed while Kmesh was down.
Changes
Fixes #1592