Skip to content

fix(bypass): clean up stale iptables rules on daemon restart#1621

Open
imshubham22apr-gif wants to merge 1 commit intokmesh-net:mainfrom
imshubham22apr-gif:fix/bypass-restart-cleanup
Open

fix(bypass): clean up stale iptables rules on daemon restart#1621
imshubham22apr-gif wants to merge 1 commit intokmesh-net:mainfrom
imshubham22apr-gif:fix/bypass-restart-cleanup

Conversation

@imshubham22apr-gif
Copy link
Copy Markdown

Purpose

While auditing the bypass controller, we found that Kmesh doesn't flush existing bypass iptables rules when the daemon restarts. This can lead to stale rules accumulating in pod network namespaces if the bypass state changed while Kmesh was down.

Changes

  • Implement iptables cleanup for non-bypassing pods in AddFunc to ensure reconciliation on restart.
  • Fix minor formatting typo in iptables execution error messages.
  • Add test case to verify cleanup on pod addition.

Fixes #1592

Copilot AI review requested due to automatic review settings March 21, 2026 05:55
@kmesh-bot kmesh-bot requested review from kwb0523 and nlgwcy March 21, 2026 05:55
@kmesh-bot
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nlgwcy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where stale iptables bypass rules could persist in pod network namespaces after the Kmesh daemon restarts. It introduces a mechanism to ensure that existing bypass rules are properly flushed for non-bypassing pods during reconciliation, preventing the accumulation of outdated network configurations.

Highlights

  • Stale Rule Cleanup: Implemented logic to clean up stale iptables rules for non-bypassing pods during reconciliation, specifically when a pod is added.
  • Error Message Fix: Corrected a minor formatting typo in iptables execution error messages.
  • Test Coverage: Added a new test case to verify the iptables cleanup behavior when a pod without a bypass label is created.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.


Old rules linger on, Daemon restarts, fresh new start, Clean slate, network flows.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an issue where stale iptables rules were not cleaned up on daemon restart for non-bypassing pods. The change introduces logic in the AddFunc handler to delete these rules upon pod addition, which covers the restart scenario. The PR also includes a fix for a formatting typo in error messages and adds a new test case to verify the cleanup logic. My review identifies one area for improvement: the error returned from deleteIptables is currently ignored. I've suggested logging this error to improve observability and ensure that cleanup failures don't go unnoticed. Otherwise, the changes look good and effectively address the issue.

nspath, _ := ns.GetPodNSpath(pod)
if nspath != "" {
// during daemon restart, reconcile to ensure stale bypass rules are cleared
_ = deleteIptables(nspath)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The error returned by deleteIptables is ignored. This is inconsistent with other parts of the controller where errors from addIptables and deleteIptables are logged. If deleteIptables fails, it will fail silently, and stale iptables rules might not be cleaned up as intended, which undermines the goal of this change. The error should be logged to provide visibility into potential failures.

				if err := deleteIptables(nspath); err != nil {
					log.Errorf("failed to delete stale iptables rules for pod %s/%s: %v", pod.GetNamespace(), pod.GetName(), err)
				}

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses stale bypass iptables rules persisting across Kmesh daemon restarts by adding a reconciliation cleanup step for non-bypassing pods and extending unit coverage to validate the behavior.

Changes:

  • Add best-effort iptables cleanup for sidecar-injected pods that do not have the bypass label during pod Add events (to reconcile on restart).
  • Fix a minor formatting typo in iptables execution error messages.
  • Add a new unit test case asserting cleanup is invoked when a sidecar pod is created without the bypass label.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
pkg/controller/bypass/bypass_controller.go Adds reconciliation cleanup on AddFunc for non-bypass pods; fixes iptables error message formatting.
pkg/controller/bypass/bypass_test.go Adds a test case ensuring delete iptables is invoked for sidecar pods created without bypass label.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 183 to 187
for _, args := range iptArgs {
if err := utils.Execute("iptables", args); err != nil {
err = fmt.Errorf("failed to exec command: iptables %v\", err: %v", args, err)
err = fmt.Errorf("failed to exec command: iptables %v, err: %v", args, err)
log.Error(err)
return err
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleteIptables treats any iptables -D failure as an error and logs it internally. With the new AddFunc reconciliation, this is likely to emit an error for pods where the RETURN rules were never present (iptables prints to stderr when a rule is missing), causing noisy logs on restart. Make deletion idempotent by ignoring “rule not found” errors (or check with -C first), and consider letting the caller decide whether to log to avoid double/log-spam behavior.

Copilot uses AI. Check for mistakes.
Namespace: namespaceName,
Labels: map[string]string{}, // no bypass label
Annotations: map[string]string{
"sidecar.istio.io/status": "placeholder",
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses a hard-coded "sidecar.istio.io/status" annotation key, while the rest of the test (and PodHasSidecar) uses annotation.SidecarStatus.Name. Using the constant here avoids drift/typos if Istio’s key changes and keeps the test aligned with the production check.

Suggested change
"sidecar.istio.io/status": "placeholder",
annotation.SidecarStatus.Name: "placeholder",

Copilot uses AI. Check for mistakes.
Comment on lines 67 to +71
if !shouldBypass(pod) {
// TODO: add delete iptables in case we missed skip bypass during kmesh restart
nspath, _ := ns.GetPodNSpath(pod)
if nspath != "" {
// during daemon restart, reconcile to ensure stale bypass rules are cleared
_ = deleteIptables(nspath)
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new reconciliation path runs deleteIptables for every non-bypass Pod Add event (not just during informer startup). That means every newly created sidecar-injected pod will trigger netns lookup + iptables deletes even though there can’t be “stale” Kmesh rules yet, which can add avoidable overhead. Consider limiting this cleanup to the initial informer sync (startup reconciliation) or a one-time post-sync pass over existing pods, and handle/log GetPodNSpath/deleteIptables errors explicitly rather than dropping them.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pkg/controller/bypass: Missing iptables cleanup during Kmesh daemon restart

3 participants