Skip to content

[Control Plane] PresenceManager.Start can spawn duplicate sweep goroutines #431

@santoshkumarradha

Description

@santoshkumarradha

Summary

PresenceManager.Start has no guard against duplicate calls, so calling it more than once spawns multiple background sweep goroutines that race on the same lease map and fire the expiration callback multiple times per node.

Context

presence_manager.go:60 implements Start() as a two-liner that unconditionally launches go pm.loop(). Stop() is protected by sync.Once, but Start() is not. In test suites that restart the manager between subtests, or in any reload path that calls Start again without calling Stop first, two goroutines will concurrently iterate and mutate pm.leases. Both will call expireCallback for the same nodes at the same time, potentially triggering double-deregistration or duplicate notifications.

Scope

In Scope

  • Guard Start() so it can safely be called multiple times — use a sync.Once or a started atomic bool protected by the existing mutex.
  • Ensure a second call to Start is either a no-op or returns an error rather than spawning a second goroutine.
  • Match the guard semantics to Stop so Start → Stop → Start works correctly (the sync.Once pattern prevents restart; prefer an atomic bool if restart support is needed).

Out of Scope

  • Changing the lease sweep logic or expiration timing.
  • Adding restart semantics if they don't exist today — a no-op second Start is sufficient.

Files

  • control-plane/internal/services/presence_manager.go:60 — add concurrency guard to Start()
  • control-plane/internal/services/presence_manager_test.go — test: calling Start twice does not create two sweep goroutines (verify via goroutine count or mock callback call count)

Acceptance Criteria

  • Calling Start() twice does not spawn a second goroutine
  • expireCallback is called at most once per expired node lease per sweep cycle
  • Existing presence manager tests continue to pass
  • Tests pass (go test ./control-plane/...)
  • Linting passes (make lint)

Notes for Contributors

Severity: LOW

If Start → Stop → Start (restart) is a requirement, use an atomic.Bool (Go 1.19+) for started and reset it in Stop. If restart is not needed, a sync.Once is simpler and prevents misuse. Check whether any tests in the suite call Start multiple times — fix those callers too.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:control-planeControl plane server functionalitybugSomething isn't working

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions