fix(ebpf): prevent file descriptor leak in resource retrieval#1632
fix(ebpf): prevent file descriptor leak in resource retrieval#1632shivansh-gohem wants to merge 2 commits intokmesh-net:mainfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical file descriptor leak occurring during the retrieval of eBPF programs and maps by name. By ensuring that file descriptors are properly closed after each iteration—even when a match is not found—the system avoids hitting the EMFILE limit in environments with numerous loaded BPF objects. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. The loop would run and open wide, / With file descriptors left inside. / A leak so deep, the system groaned, / Now closed and safe, the fix is owned. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request fixes file descriptor leaks in GetProgramByName and GetMapByName by ensuring BPF resources are closed during iteration and error handling. It also adds a regression test for GetProgramByName. The review feedback recommends improving error handling by using the %w verb for error wrapping and removing trailing newlines from error messages.
There was a problem hiding this comment.
Pull request overview
Fixes a file descriptor leak when iterating over system eBPF programs/maps by ensuring per-iteration handles are closed, and adds a regression test intended to detect FD growth when searching for a non-existent program.
Changes:
- Close
ebpf.Program/ebpf.Maphandles on non-matching iterations and onInfo()error paths. - Add a regression test that counts process FDs via
/proc/<pid>/fdwhile repeatedly callingGetProgramByNamefor a missing program.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| pkg/utils/ebpf.go | Ensures Program/Map objects are closed on non-match and certain error paths during iteration. |
| pkg/utils/ebpf_test.go | Adds an FD-counting regression test to catch descriptor leaks while scanning system eBPF objects. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| package utils | ||
|
|
||
| import ( | ||
| "fmt" | ||
| "os" | ||
| "testing" | ||
| ) | ||
|
|
||
| // Helper function to count open file descriptors for the current process | ||
| func countOpenFDs() int { | ||
| pid := os.Getpid() | ||
| fds, err := os.ReadDir(fmt.Sprintf("/proc/%d/fd", pid)) |
There was a problem hiding this comment.
This test relies on Linux-specific /proc/<pid>/fd. The repo already uses Linux build constraints for OS-specific code (e.g., pkg/utils/kernel_version.go:1), so this test should also be guarded (build tag //go:build linux or runtime check + t.Skip) to avoid failing on non-Linux builders.
d038dff to
2935630
Compare
2935630 to
4e3af85
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if targetMapInfo, err = targetMap.Info(); err != nil { | ||
| err = fmt.Errorf("failed to get new map info from fd:%v, err is %v\n", targetMap, err) | ||
| return nil, err | ||
| targetMap.Close() | ||
| return nil, fmt.Errorf("failed to get new map info from fd:%v: %w", targetMap, err) | ||
| } |
There was a problem hiding this comment.
Same as above: the message refers to "fd:%v" but passes targetMap (a *ebpf.Map handle), not the numeric FD. Consider logging targetMap.FD() and/or mapID so the error output is accurate and actionable.
4e3af85 to
c08fa4e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if progID, err = ebpf.ProgramGetNextID(progID); err != nil { | ||
| err = fmt.Errorf("failed to get system next program id, err is %v\n", err) | ||
| return nil, err | ||
| return nil, fmt.Errorf("failed to get system next program id: %w", err) | ||
| } |
There was a problem hiding this comment.
When the target program name isn't found, ebpf.ProgramGetNextID will eventually return the end-of-iteration error (typically ENOENT). Returning it as "failed to get system next program id" makes the not-found case look like an unexpected failure. Consider detecting the end-of-iteration error and returning a clearer "program not found" error (still wrapping the original error so callers can errors.Is on it).
| if mapID, err = ebpf.MapGetNextID(mapID); err != nil { | ||
| err = fmt.Errorf("failed to get system next map id, err is %v\n", err) | ||
| return nil, err | ||
| return nil, fmt.Errorf("failed to get system next map id: %w", err) | ||
| } |
There was a problem hiding this comment.
When the target map name isn't found, ebpf.MapGetNextID will eventually return the end-of-iteration error (typically ENOENT). Returning it as "failed to get system next map id" makes the not-found case look like an unexpected failure. Consider detecting the end-of-iteration error and returning a clearer "map not found" error (still wrapping the original error so callers can errors.Is on it).
| if targetMapInfo, err = targetMap.Info(); err != nil { | ||
| err = fmt.Errorf("failed to get new map info from fd:%v, err is %v\n", targetMap, err) | ||
| return nil, err | ||
| targetMap.Close() | ||
| return nil, fmt.Errorf("failed to get map info for id:%v: %w", mapID, err) |
There was a problem hiding this comment.
These Close() calls ignore the returned error. Since this file is linted with errcheck, it’s safer to either handle the Close error (e.g., return/join it on the Info() error path) or explicitly assign it to _ to document that it’s intentionally ignored.
| // Searching for a non-existent program forces it to iterate through ALL | ||
| // loaded BPF programs, which would previously leak an FD for every single one. | ||
| for i := 0; i < 10; i++ { | ||
| _, err := GetProgramByName("non_existent_fake_prog_12345") | ||
| if err == nil { | ||
| t.Fatal("expected error for non-existent program, got nil") | ||
| } | ||
| } |
There was a problem hiding this comment.
As written, these tests pass for any non-nil error (including EPERM/unsupported-kernel cases) and may not actually exercise the full iteration path that previously leaked FDs. To make this a stronger regression test, consider creating a minimal eBPF program/map in the test setup (similar to pkg/controller/manage/manage_controller_test.go:newTextXdpProg and pkg/auth/rbac_test.go:genRingbuf) and then asserting the call fails specifically with the end-of-iteration/not-found error; otherwise Skip.
0d2248b to
950b140
Compare
| if targetProgInfo, err = targetProg.Info(); err != nil { | ||
| err = fmt.Errorf("failed to get new program info from fd:%v, err is %v\n", targetProg, err) | ||
| return nil, err | ||
| _ = targetProg.Close() |
There was a problem hiding this comment.
Is there no error when the shutdown is performed?
|
@LiZhenCheng9527 Good catch, thank you for the review! Yes, Close() does return an error. The _ = assignment is intentional — it explicitly acknowledges the error to satisfy the errcheck linter (which is enabled in .golangci.yaml) while doing best-effort cleanup. On this error path, Info() has already failed, so we are returning that primary error. The underlying Close() is just syscall.Close(fd), which can only realistically fail with EBADF — not something we can recover from. This pattern is consistent with the rest of the codebase (e.g., client.go, rbac.go, ads_controller.go)!!!! |
Signed-off-by: Shivansh Sahu <sahushivansh142@gmail.com>
Signed-off-by: Shivansh Sahu <sahushivansh142@gmail.com>
950b140 to
6b19390
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (42.85%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
... and 2 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Fixes #1618.
What this PR does :
The
GetProgramByNameandGetMapByNamefunctions inpkg/utils/ebpf.gowere iterating over system BPF objects but failing to close the file descriptors generated byebpf.NewProgramFromIDandebpf.NewMapFromIDon non-matching iterations. In environments with many loaded BPF objects, this loop quickly exhausts the open file descriptor limit (EMFILE).This PR fixes the issue by:
Close()calls on error paths and before continuing to the next iteration.deferinside theforloop to ensure FDs are released immediately per-iteration.TestGetProgramByName_FDLeak) that aggressively searches for a non-existent program to verify FDs remain stable during full system iterations.Testing:
Locally verified using the new regression test. Before the fix, the loop leaked over 120 FDs. After the fix, FDs remain perfectly stable: