fix(ebpf): prevent file descriptor leak in resource retrieval by shivansh-gohem · Pull Request #1632 · kmesh-net/kmesh

shivansh-gohem · 2026-03-28T12:36:39Z

What this PR does:
The GetProgramByName and GetMapByName functions in pkg/utils/ebpf.go iterate over the BPF objects loaded on the system but do not close the file descriptors opened by ebpf.NewProgramFromID and ebpf.NewMapFromID for objects that do not match. Each lookup therefore leaks one FD per non-matching object. In environments with many loaded BPF objects, repeated lookups can exhaust the open file descriptor limit (EMFILE).

This PR fixes the issue by:

Adding explicit Close() calls on error paths and before continuing to the next iteration.
Avoiding defer inside the for loop, so each FD is released at the end of its own iteration rather than when the function returns.
Adding a regression test (TestGetProgramByName_FDLeak) that repeatedly searches for a non-existent program, which forces a full iteration over all loaded BPF programs, and checks that the FD count stays stable.

Testing:
Locally verified using the new regression test. Before the fix, the loop leaked over 120 FDs. After the fix, the FD count is unchanged:

=== RUN   TestGetProgramByName_FDLeak
    ebpf_test.go:21: Initial open File Descriptors: 7
    ebpf_test.go:31: Final open File Descriptors: 7
    ebpf_test.go:37: No leak detected.
--- PASS: TestGetProgramByName_FDLeak (0.13s)

kmesh-bot · 2026-03-28T12:36:46Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

Needs approval from an approver in each of these files:

pkg/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gemini-code-assist · 2026-03-28T12:36:47Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical file descriptor leak in the retrieval of eBPF programs and maps by name. The file descriptor of each non-matching object is now closed before moving on to the next one, so the lookup no longer hits the EMFILE limit in environments with numerous loaded BPF objects.

Highlights

Resource Management: Added explicit Close() calls to eBPF program and map file descriptors within iteration loops to prevent resource exhaustion.
Regression Testing: Introduced a new test case, TestGetProgramByName_FDLeak, to monitor and verify file descriptor stability during system-wide BPF object iteration.

The loop would run and open wide, / With file descriptors left inside. / A leak so deep, the system groaned, / Now closed and safe, the fix is owned.

gemini-code-assist

Code Review

This pull request fixes file descriptor leaks in GetProgramByName and GetMapByName by ensuring BPF resources are closed during iteration and error handling. It also adds a regression test for GetProgramByName. The review feedback recommends improving error handling by using the %w verb for error wrapping and removing trailing newlines from error messages.

pkg/utils/ebpf.go

Copilot

Pull request overview

Fixes a file descriptor leak when iterating over system eBPF programs/maps by ensuring per-iteration handles are closed, and adds a regression test intended to detect FD growth when searching for a non-existent program.

Changes:

Close ebpf.Program / ebpf.Map handles on non-matching iterations and on Info() error paths.
Add a regression test that counts process FDs via /proc/<pid>/fd while repeatedly calling GetProgramByName for a missing program.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
pkg/utils/ebpf.go	Ensures `Program`/`Map` objects are closed on non-match and certain error paths during iteration.
pkg/utils/ebpf_test.go	Adds an FD-counting regression test to catch descriptor leaks while scanning system eBPF objects.

pkg/utils/ebpf.go

pkg/utils/ebpf_test.go

Copilot · 2026-03-28T12:40:16Z

pkg/utils/ebpf_test.go

+package utils
+
+import (
+	"fmt"
+	"os"
+	"testing"
+)
+
+// Helper function to count open file descriptors for the current process
+func countOpenFDs() int {
+	pid := os.Getpid()
+	fds, err := os.ReadDir(fmt.Sprintf("/proc/%d/fd", pid))


This test relies on Linux-specific /proc/<pid>/fd. The repo already uses Linux build constraints for OS-specific code (e.g., pkg/utils/kernel_version.go:1), so this test should also be guarded (build tag //go:build linux or runtime check + t.Skip) to avoid failing on non-Linux builders.

pkg/utils/ebpf_test.go

pkg/utils/ebpf.go

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

pkg/utils/ebpf_test.go

pkg/utils/ebpf.go

Copilot · 2026-03-28T18:14:56Z

pkg/utils/ebpf.go

 		if targetMapInfo, err = targetMap.Info(); err != nil {
-			err = fmt.Errorf("failed to get new map info from fd:%v, err is %v\n", targetMap, err)
-			return nil, err
+			targetMap.Close()
+			return nil, fmt.Errorf("failed to get new map info from fd:%v: %w", targetMap, err)
 		}


Same as above: the message refers to "fd:%v" but passes targetMap (a *ebpf.Map handle), not the numeric FD. Consider logging targetMap.FD() and/or mapID so the error output is accurate and actionable.

pkg/utils/ebpf_test.go

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Copilot · 2026-03-29T13:12:37Z

pkg/utils/ebpf.go

 		if progID, err = ebpf.ProgramGetNextID(progID); err != nil {
-			err = fmt.Errorf("failed to get system next program id, err is %v\n", err)
-			return nil, err
+			return nil, fmt.Errorf("failed to get system next program id: %w", err)
 		}


When the target program name isn't found, ebpf.ProgramGetNextID will eventually return the end-of-iteration error (typically ENOENT). Returning it as "failed to get system next program id" makes the not-found case look like an unexpected failure. Consider detecting the end-of-iteration error and returning a clearer "program not found" error (still wrapping the original error so callers can errors.Is on it).

Copilot · 2026-03-29T13:12:38Z

pkg/utils/ebpf.go

 		if mapID, err = ebpf.MapGetNextID(mapID); err != nil {
-			err = fmt.Errorf("failed to get system next map id, err is %v\n", err)
-			return nil, err
+			return nil, fmt.Errorf("failed to get system next map id: %w", err)
 		}


When the target map name isn't found, ebpf.MapGetNextID will eventually return the end-of-iteration error (typically ENOENT). Returning it as "failed to get system next map id" makes the not-found case look like an unexpected failure. Consider detecting the end-of-iteration error and returning a clearer "map not found" error (still wrapping the original error so callers can errors.Is on it).

Copilot · 2026-03-29T13:12:38Z

pkg/utils/ebpf.go

 		if targetMapInfo, err = targetMap.Info(); err != nil {
-			err = fmt.Errorf("failed to get new map info from fd:%v, err is %v\n", targetMap, err)
-			return nil, err
+			targetMap.Close()
+			return nil, fmt.Errorf("failed to get map info for id:%v: %w", mapID, err)


These Close() calls ignore the returned error. Since this file is linted with errcheck, it’s safer to either handle the Close error (e.g., return/join it on the Info() error path) or explicitly assign it to _ to document that it’s intentionally ignored.

Copilot · 2026-03-29T13:12:38Z

pkg/utils/ebpf_test.go

+	// Searching for a non-existent program forces it to iterate through ALL
+	// loaded BPF programs, which would previously leak an FD for every single one.
+	for i := 0; i < 10; i++ {
+		_, err := GetProgramByName("non_existent_fake_prog_12345")
+		if err == nil {
+			t.Fatal("expected error for non-existent program, got nil")
+		}
+	}


As written, these tests pass for any non-nil error (including EPERM/unsupported-kernel cases) and may not actually exercise the full iteration path that previously leaked FDs. To make this a stronger regression test, consider creating a minimal eBPF program/map in the test setup (similar to pkg/controller/manage/manage_controller_test.go:newTextXdpProg and pkg/auth/rbac_test.go:genRingbuf) and then asserting the call fails specifically with the end-of-iteration/not-found error; otherwise Skip.

LiZhenCheng9527 · 2026-04-01T07:58:23Z

pkg/utils/ebpf.go

 		if targetProgInfo, err = targetProg.Info(); err != nil {
-			err = fmt.Errorf("failed to get new program info from fd:%v, err is %v\n", targetProg, err)
-			return nil, err
+			_ = targetProg.Close()


Can Close() return an error here?

shivansh-gohem · 2026-04-03T12:15:29Z

@LiZhenCheng9527 Good catch, thank you for the review! Yes, Close() does return an error. The _ = assignment is intentional — it explicitly acknowledges the error to satisfy the errcheck linter (which is enabled in .golangci.yaml) while doing best-effort cleanup.

On this error path, Info() has already failed, so we return that primary error. The underlying Close() is just syscall.Close(fd), which can realistically only fail with EBADF, and that is not something we can recover from. This pattern is consistent with the rest of the codebase (e.g., client.go, rbac.go, ads_controller.go).

Signed-off-by: Shivansh Sahu <sahushivansh142@gmail.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

codecov · 2026-04-08T18:02:44Z

Codecov Report

❌ Patch coverage is 42.85714% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.74%. Comparing base (e78a0a6) to head (6b19390).
⚠️ Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/utils/ebpf.go	42.85%	8 Missing ⚠️

❌ Your patch check has failed because the patch coverage (42.85%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Files with missing lines	Coverage Δ
pkg/utils/ebpf.go	`61.90% <42.85%> (+61.90%)`	⬆️

... and 2 files with indirect coverage changes

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa3035c...6b19390.

Copilot AI review requested due to automatic review settings March 28, 2026 12:36

kmesh-bot requested review from LiZhenCheng9527 and Okabe-Rintarou-0 March 28, 2026 12:36

kmesh-bot added the size/L label Mar 28, 2026

Copilot started reviewing on behalf of shivansh-gohem March 28, 2026 12:37

gemini-code-assist bot reviewed Mar 28, 2026

pkg/utils/ebpf.go (outdated, resolved)

Copilot AI reviewed Mar 28, 2026

shivansh-gohem force-pushed the fix-ebpf-fd-leak branch from d038dff to 2935630 March 28, 2026 17:37

kmesh-bot added size/M and removed size/L labels Mar 28, 2026

Copilot AI review requested due to automatic review settings March 28, 2026 18:12

shivansh-gohem force-pushed the fix-ebpf-fd-leak branch from 2935630 to 4e3af85 March 28, 2026 18:12

kmesh-bot added size/L and removed size/M labels Mar 28, 2026

Copilot started reviewing on behalf of shivansh-gohem March 28, 2026 18:12

Copilot AI reviewed Mar 28, 2026

shivansh-gohem force-pushed the fix-ebpf-fd-leak branch from 4e3af85 to c08fa4e March 28, 2026 18:26

Copilot AI review requested due to automatic review settings March 29, 2026 13:07

Copilot started reviewing on behalf of shivansh-gohem March 29, 2026 13:08

Copilot AI reviewed Mar 29, 2026

shivansh-gohem force-pushed the fix-ebpf-fd-leak branch from 0d2248b to 950b140 March 29, 2026 13:42

LiZhenCheng9527 reviewed Apr 1, 2026

shivansh-gohem added 2 commits April 8, 2026 17:40

fix(ebpf): prevent file descriptor leak in resource retrieval

2081233

Signed-off-by: Shivansh Sahu <sahushivansh142@gmail.com>

chore: run make gen to fix CI formatting rules

6b19390

Signed-off-by: Shivansh Sahu <sahushivansh142@gmail.com>

Copilot AI review requested due to automatic review settings April 8, 2026 17:40

shivansh-gohem force-pushed the fix-ebpf-fd-leak branch from 950b140 to 6b19390 April 8, 2026 17:40

Copilot started reviewing on behalf of shivansh-gohem April 8, 2026 17:40

Copilot AI reviewed Apr 8, 2026

4 participants
