Skip to content

NM-279: fix random listen port on Windows#1266

Draft
abhishek9686 wants to merge 2 commits intorelease-v1.5.0from
NM-279-v2
Draft

NM-279: fix random listen port on Windows#1266
abhishek9686 wants to merge 2 commits intorelease-v1.5.0from
NM-279-v2

Conversation

@abhishek9686
Copy link
Copy Markdown
Member

On Windows, the WireGuard-nt adapter from a previous run can keep the UDP port bound, causing GetFreePort to see it as busy and assign a random port. Additionally, concurrent wgctrl access and transient driver handle contention lead to ConfigureDevice failures.

  • Query the adapter's actual listen port before running GetFreePort so the port owned by our own adapter is not replaced
  • Add retry logic with backoff to apply() on Windows for transient 'file being used by another process' errors
  • Protect GetPeersFromDevice, GetPeer, and UpdatePeer with wgMutex to prevent concurrent device access
  • Add a delay between Close and Create in resetInterfaceFunc on Windows to let the kernel driver fully release the device handle

Describe your changes

Provide Issue ticket number if applicable/not in title

Provide link to Netmaker PR if required

Provide testing steps

Checklist before requesting a review

  • My changes affect only 10 files or less.
  • I have performed a self-review of my code and tested it.
  • If it is a new feature, I have added thorough tests, my code is <= 1450 lines.
  • If it is a bugfix, my code is <= 200 lines.
  • My functions are <= 80 lines.
  • I have had my code reviewed by a peer.
  • My unit tests pass locally.
  • Netclient & Netmaker are awesome.

…s the port

On Windows the WireGuard-nt adapter from a previous run can keep the UDP
port bound, causing GetFreePort to see it as busy and assign a random
port. Additionally, concurrent wgctrl access and transient driver handle
contention lead to ConfigureDevice failures.

- Query the adapter's actual listen port before running GetFreePort so
  the port owned by our own adapter is not replaced
- Add retry logic with backoff to apply() on Windows for transient
  'file being used by another process' errors
- Protect GetPeersFromDevice, GetPeer, and UpdatePeer with wgMutex to
  prevent concurrent device access
- Add a delay between Close and Create in resetInterfaceFunc on Windows
  to let the kernel driver fully release the device handle
@tenki-reviewer
Copy link
Copy Markdown
Contributor

tenki-reviewer bot commented Mar 18, 2026

Tenki Code Review - Complete

Files Reviewed: 5
Findings: 1

By Severity:

  • 🟡 Low: 1

This PR correctly addresses a Windows-specific race condition where the WireGuard adapter's port falsely appears busy, adding mutex protection to previously unguarded functions and a retry loop for transient device-busy errors. One low-severity issue was found: the retry loop's fallback error message at the end of apply() is unreachable dead code due to an off-by-one in the loop condition.

Files Reviewed (5 files)
functions/daemon.go
functions/mqhandlers.go
functions/uninstall.go
wireguard/types.go
wireguard/wireguard.go

Copy link
Copy Markdown
Contributor

@tenki-reviewer tenki-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This PR (NM-279) fixes a Windows-specific bug where the netclient would unnecessarily change its listen port on restart because the existing WireGuard adapter's port appeared busy to net.ListenUDP. The fix is well-structured and targets the correct layers.

Changes Reviewed

  • wireguard/wireguard.go — New apply() retry loop with isDeviceBusyError() helper and new GetDeviceListenPort() function.
  • wireguard/types.goNCIface.UpdatePeer now holds wgMutex before calling apply(), consistent with other public methods like SetPeers and Configure.
  • wireguard/wireguard.goGetPeersFromDevice and GetPeer also gain wgMutex guards, improving thread safety.
  • functions/daemon.go — Port assignment now checks GetDeviceListenPort() first; skips the UDP bind test when the adapter already owns the desired port.
  • functions/mqhandlers.go / functions/uninstall.go — A 500 ms sleep on Windows between Close() and NewNCIface() to allow the kernel to release the adapter handle.

Finding

Unreachable fallback error in apply() retry loop (wireguard/wireguard.go)

The loop condition is attempt <= maxRetries (where maxRetries = 3), giving 4 iterations (0–3). The inner retry guards check attempt < maxRetries, so on the final iteration (attempt == 3) a device-busy error skips retrying and falls to return err at line 126. Every iteration therefore ends with either continue, return nil, or return err — the loop body never exits without returning. The return fmt.Errorf("ConfigureDevice failed after %d retries", maxRetries) at line 128 is unreachable dead code. While functionally harmless (the actual error is still returned), the intent appears to be that line 128 provides the definitive exhausted-retries message, and the current structure defeats that.

Positive Notes

  • The string-matching approach in isDeviceBusyError is pragmatic for Windows error messages and handles the known variants.
  • The portOwnedByDevice check in daemon.go cleanly isolates the Windows-specific logic without affecting the non-static-port code path on other platforms.
  • Adding wgMutex to NCIface.UpdatePeer, GetPeersFromDevice, and GetPeer improves consistency across the locking model.

On the final retry attempt, busy errors were returned directly instead
of falling through to the exhausted-retries message. Restructure so
busy errors always continue the loop and the post-loop error with the
retry count is the definitive exhaustion path.
@abhishek9686 abhishek9686 marked this pull request as draft March 31, 2026 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant