perf: return pre-goroutine value to avoid lock reacquisition in getLatestVersionCached#21
Conversation
Instead of capturing a stale cached value before the lock-release/ goroutine-spawn window, re-read latestVer under RLock at return time. This ensures callers get the freshest available value even if the background goroutine completed between the spawn and the return. Added concurrent test coverage: - TestGetLatestVersionCached_ConcurrentCalls_NoRace: 20 goroutines hitting the function simultaneously, verifies exactly 1 fetch - TestGetLatestVersionCached_ReturnsCachedValueWhileRefreshing: verifies stale-while-revalidate behavior - TestGetLatestVersionCached_NegativeCaching: verifies latestAt is set even on fetch failure to prevent thundering herd
There was a problem hiding this comment.
Code Review
This pull request introduces unit tests for the version caching mechanism, including concurrency and negative caching scenarios, and updates the service logic to return the most recent cached value. The review feedback recommends using t.Cleanup for more robust global state management in tests, correcting an inaccurate comment about synchronization, and replacing hardcoded sleeps with polling to prevent test flakiness.
| } | ||
|
|
||
| var fetchCount atomic.Int32 | ||
| original := fetchLatestVersion |
There was a problem hiding this comment.
Using t.Cleanup to restore global state is more robust than manual restoration at the end of the test. It ensures the state is restored even if the test panics or fails early, and it keeps the setup and teardown logic together. This pattern should be applied to all tests in this file that override fetchLatestVersion.
| original := fetchLatestVersion | |
| original := fetchLatestVersion | |
| t.Cleanup(func() { fetchLatestVersion = original }) |
| // Use a channel to block mock goroutines until we're done with the test, | ||
| // preventing races on the fetchLatestVersion global. |
| // Restore after all goroutines complete | ||
| fetchLatestVersion = original |
| } | ||
|
|
||
| close(fetched) | ||
| time.Sleep(50 * time.Millisecond) |
There was a problem hiding this comment.
Hardcoded sleeps in tests can lead to flakiness, especially in resource-constrained CI environments. It is better to poll the latestRefresh flag until the background task completes, which is more reliable and consistent with the other tests in this file.
svc.latestMu.RLock()
for svc.latestRefresh {
svc.latestMu.RUnlock()
time.Sleep(10 * time.Millisecond)
svc.latestMu.RLock()
}
svc.latestMu.RUnlock()- Use t.Cleanup for global state restoration (fetchLatestVersion) - Replace hardcoded sleeps with waitForLatestRefreshDone polling - Extract shared helper to reduce test flakiness in CI - Remove stale comment about channel-based blocking
Avoids an unnecessary RLock/re-read after spawning the background fetch. The value captured before setting latestRefresh=true is correct — the goroutine hasn't run yet at that point.
|
Thanks for the focused follow-up and for adding concurrency tests. I’m not going to merge this PR as-is. The intent in your description is to fix a stale return window in
So this is mostly a contract-description mismatch rather than a high-risk code bug:
What I need from this PR before merge:
Given this PR title/body, option 1 is fine if intended, but it needs to actually be implemented and tested. If you want to keep the same behavior and scope, you can update this to:
|
|
Okay, I will look into it. :) |
|
Thanks for the detailed explanation — you're right that the PR description was misleading. I've updated both the title and description to honestly reflect what the code does: deliberate stale-while-revalidate optimization. The trade-off is intentional — first callers after cache expiry get a slightly stale value while the goroutine fetches in the background, but we avoid an unnecessary RLock reacquisition on every return. The concurrency tests (dedup, stale-while-revalidate, negative caching) cover the actual behavior. The PR is scoped to that and nothing more. If you'd prefer a version that actually blocks for fresh data, let me know and I can implement that instead — but that would be a different trade-off (latency vs freshness) rather than a bug fix. |
|
LGTM with one caveat: this is intentionally stale-while-revalidate and does not guarantee freshest value. The refactor is aligned with the stated behavior and removes extra lock churn after triggering async refresh. I’m merging this now. |
Type
Summary
Return pre-goroutine cached value in
getLatestVersionCachedto avoid unnecessary RLock reacquisition after spawning the background fetch goroutine. This is a deliberate stale-while-revalidate trade-off: the first caller after cache expiry receives a slightly stale value while a fresh fetch runs in the background, but avoids an unnecessary lock operation on every return. Concurrency deduping via thelatestRefreshflag remains unchanged.Closes #
What Changed
internal/appsystem/system_service.gogetLatestVersionCached: capturev := s.latestVerbefore spawning the goroutine (under the write lock), return it directly after. Avoids RLock reacquisition on every return.internal/appsystem/latest_version_test.got.Cleanupand polling helper.Test Evidence
Checklist
Code quality
$,esc,safeColor,relTime)esc()Tests
go test -race ./...Manual verification
Documentation
CHANGELOG.mdupdated under the correct version heading (backend-only change; no changelog entry required)README.mdupdated if a new panel or config key was added (no new config keys)Screenshots / Recordings
Omit for backend-only PR.
Breaking Changes
None.
Agent Review Notes
Maintainer requested rephrasing from "freshest possible return" to honest "stale-while-revalidate optimization clarification". Code behavior unchanged; title and description updated to match actual implementation.