Skip to content

fix(ebpf): prevent file descriptor leak in resource retrieval#1632

Open
shivansh-gohem wants to merge 2 commits intokmesh-net:mainfrom
shivansh-gohem:fix-ebpf-fd-leak
Open

fix(ebpf): prevent file descriptor leak in resource retrieval#1632
shivansh-gohem wants to merge 2 commits intokmesh-net:mainfrom
shivansh-gohem:fix-ebpf-fd-leak

Conversation

@shivansh-gohem
Copy link
Copy Markdown
Contributor

Fixes #1618.

What this PR does :
The GetProgramByName and GetMapByName functions in pkg/utils/ebpf.go were iterating over system BPF objects but failing to close the file descriptors generated by ebpf.NewProgramFromID and ebpf.NewMapFromID on non-matching iterations. In environments with many loaded BPF objects, this loop quickly exhausts the open file descriptor limit (EMFILE).

This PR fixes the issue by:

  1. Adding explicit Close() calls on error paths and before continuing to the next iteration.
  2. Avoiding defer inside the for loop to ensure FDs are released immediately per-iteration.
  3. Adding a regression test (TestGetProgramByName_FDLeak) that aggressively searches for a non-existent program to verify FDs remain stable during full system iterations.

Testing:
Locally verified using the new regression test. Before the fix, the loop leaked over 120 FDs. After the fix, FDs remain perfectly stable:

=== RUN   TestGetProgramByName_FDLeak
    ebpf_test.go:21: Initial open File Descriptors: 7
    ebpf_test.go:31: Final open File Descriptors: 7
    ebpf_test.go:37: No leak detected.
--- PASS: TestGetProgramByName_FDLeak (0.13s)

Copilot AI review requested due to automatic review settings March 28, 2026 12:36
@kmesh-bot
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical file descriptor leak occurring during the retrieval of eBPF programs and maps by name. By ensuring that file descriptors are properly closed after each iteration—even when a match is not found—the system avoids hitting the EMFILE limit in environments with numerous loaded BPF objects.

Highlights

  • Resource Management: Added explicit Close() calls to eBPF program and map file descriptors within iteration loops to prevent resource exhaustion.
  • Regression Testing: Introduced a new test case, TestGetProgramByName_FDLeak, to monitor and verify file descriptor stability during system-wide BPF object iteration.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.


The loop would run and open wide, / With file descriptors left inside. / A leak so deep, the system groaned, / Now closed and safe, the fix is owned.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes file descriptor leaks in GetProgramByName and GetMapByName by ensuring BPF resources are closed during iteration and error handling. It also adds a regression test for GetProgramByName. The review feedback recommends improving error handling by using the %w verb for error wrapping and removing trailing newlines from error messages.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a file descriptor leak when iterating over system eBPF programs/maps by ensuring per-iteration handles are closed, and adds a regression test intended to detect FD growth when searching for a non-existent program.

Changes:

  • Close ebpf.Program / ebpf.Map handles on non-matching iterations and on Info() error paths.
  • Add a regression test that counts process FDs via /proc/<pid>/fd while repeatedly calling GetProgramByName for a missing program.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
pkg/utils/ebpf.go Ensures Program/Map objects are closed on non-match and certain error paths during iteration.
pkg/utils/ebpf_test.go Adds an FD-counting regression test to catch descriptor leaks while scanning system eBPF objects.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +12
package utils

import (
"fmt"
"os"
"testing"
)

// Helper function to count open file descriptors for the current process
func countOpenFDs() int {
pid := os.Getpid()
fds, err := os.ReadDir(fmt.Sprintf("/proc/%d/fd", pid))
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test relies on Linux-specific /proc/<pid>/fd. The repo already uses Linux build constraints for OS-specific code (e.g., pkg/utils/kernel_version.go:1), so this test should also be guarded (build tag //go:build linux or runtime check + t.Skip) to avoid failing on non-Linux builders.

Copilot uses AI. Check for mistakes.
@kmesh-bot kmesh-bot added size/M and removed size/L labels Mar 28, 2026
Copilot AI review requested due to automatic review settings March 28, 2026 18:12
@kmesh-bot kmesh-bot added size/L and removed size/M labels Mar 28, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 76 to 79
if targetMapInfo, err = targetMap.Info(); err != nil {
err = fmt.Errorf("failed to get new map info from fd:%v, err is %v\n", targetMap, err)
return nil, err
targetMap.Close()
return nil, fmt.Errorf("failed to get new map info from fd:%v: %w", targetMap, err)
}
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: the message refers to "fd:%v" but passes targetMap (a *ebpf.Map handle), not the numeric FD. Consider logging targetMap.FD() and/or mapID so the error output is accurate and actionable.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 29, 2026 13:07
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 36 to 38
if progID, err = ebpf.ProgramGetNextID(progID); err != nil {
err = fmt.Errorf("failed to get system next program id, err is %v\n", err)
return nil, err
return nil, fmt.Errorf("failed to get system next program id: %w", err)
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the target program name isn't found, ebpf.ProgramGetNextID will eventually return the end-of-iteration error (typically ENOENT). Returning it as "failed to get system next program id" makes the not-found case look like an unexpected failure. Consider detecting the end-of-iteration error and returning a clearer "program not found" error (still wrapping the original error so callers can errors.Is on it).

Copilot uses AI. Check for mistakes.
Comment on lines 68 to 70
if mapID, err = ebpf.MapGetNextID(mapID); err != nil {
err = fmt.Errorf("failed to get system next map id, err is %v\n", err)
return nil, err
return nil, fmt.Errorf("failed to get system next map id: %w", err)
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the target map name isn't found, ebpf.MapGetNextID will eventually return the end-of-iteration error (typically ENOENT). Returning it as "failed to get system next map id" makes the not-found case look like an unexpected failure. Consider detecting the end-of-iteration error and returning a clearer "map not found" error (still wrapping the original error so callers can errors.Is on it).

Copilot uses AI. Check for mistakes.
Comment on lines 76 to +78
if targetMapInfo, err = targetMap.Info(); err != nil {
err = fmt.Errorf("failed to get new map info from fd:%v, err is %v\n", targetMap, err)
return nil, err
targetMap.Close()
return nil, fmt.Errorf("failed to get map info for id:%v: %w", mapID, err)
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Close() calls ignore the returned error. Since this file is linted with errcheck, it’s safer to either handle the Close error (e.g., return/join it on the Info() error path) or explicitly assign it to _ to document that it’s intentionally ignored.

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +52
// Searching for a non-existent program forces it to iterate through ALL
// loaded BPF programs, which would previously leak an FD for every single one.
for i := 0; i < 10; i++ {
_, err := GetProgramByName("non_existent_fake_prog_12345")
if err == nil {
t.Fatal("expected error for non-existent program, got nil")
}
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, these tests pass for any non-nil error (including EPERM/unsupported-kernel cases) and may not actually exercise the full iteration path that previously leaked FDs. To make this a stronger regression test, consider creating a minimal eBPF program/map in the test setup (similar to pkg/controller/manage/manage_controller_test.go:newTextXdpProg and pkg/auth/rbac_test.go:genRingbuf) and then asserting the call fails specifically with the end-of-iteration/not-found error; otherwise Skip.

Copilot uses AI. Check for mistakes.
if targetProgInfo, err = targetProg.Info(); err != nil {
err = fmt.Errorf("failed to get new program info from fd:%v, err is %v\n", targetProg, err)
return nil, err
_ = targetProg.Close()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no error when the shutdown is performed?

@shivansh-gohem
Copy link
Copy Markdown
Contributor Author

shivansh-gohem commented Apr 3, 2026

@LiZhenCheng9527 Good catch, thank you for the review! Yes, Close() does return an error. The _ = assignment is intentional — it explicitly acknowledges the error to satisfy the errcheck linter (which is enabled in .golangci.yaml) while doing best-effort cleanup.

On this error path, Info() has already failed, so we are returning that primary error. The underlying Close() is just syscall.Close(fd), which can only realistically fail with EBADF — not something we can recover from. This pattern is consistent with the rest of the codebase (e.g., client.go, rbac.go, ads_controller.go)!!!!

Signed-off-by: Shivansh Sahu <sahushivansh142@gmail.com>
Signed-off-by: Shivansh Sahu <sahushivansh142@gmail.com>
Copilot AI review requested due to automatic review settings April 8, 2026 17:40
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 42.85714% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.74%. Comparing base (e78a0a6) to head (6b19390).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
pkg/utils/ebpf.go 42.85% 8 Missing ⚠️

❌ Your patch check has failed because the patch coverage (42.85%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Files with missing lines Coverage Δ
pkg/utils/ebpf.go 61.90% <42.85%> (+61.90%) ⬆️

... and 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa3035c...6b19390. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] File Descriptor Leak in BPF Resource Retrieval (GetProgramByName / GetMapByName)

4 participants