perf: return pre-goroutine value to avoid lock reacquisition in getLatestVersionCached #21

Merged
mudrii merged 3 commits into mudrii:main from SweetSophia:fix/concurrent-version-fetch-v2
Apr 13, 2026

Conversation

SweetSophia (Contributor) commented Apr 11, 2026

Type

  • feat — new feature or panel
  • fix — bug fix
  • perf — performance improvement (no behaviour change)
  • test — tests only (no production code change)
  • docs — documentation only
  • refactor — internal restructure (no behaviour change)
  • chore — tooling, CI, config

Summary

Return pre-goroutine cached value in getLatestVersionCached to avoid unnecessary RLock reacquisition after spawning the background fetch goroutine. This is a deliberate stale-while-revalidate trade-off: the first caller after cache expiry receives a slightly stale value while a fresh fetch runs in the background, but avoids an unnecessary lock operation on every return. Concurrency deduping via the latestRefresh flag remains unchanged.

Closes #

What Changed

| File | What changed |
| --- | --- |
| internal/appsystem/system_service.go | In getLatestVersionCached: capture v := s.latestVer before spawning the goroutine (under the write lock), return it directly after. Avoids RLock reacquisition on every return. |
| internal/appsystem/latest_version_test.go | New test file: 4 concurrent tests (dedup, stale-while-revalidate, negative caching, failure negative caching) with t.Cleanup and polling helper. |

Test Evidence

ok  	github.com/mudrii/openclaw-dashboard/internal/appsystem	1.136s

Checklist

Code quality

  • No new globals outside the 7 module objects + 4 utilities ($, esc, safeColor, relTime)
  • Every dynamic value inserted into the DOM goes through esc()
  • No hardcoded hex colors — CSS variables only
  • No new frontend dependencies
  • No new Go module dependencies

Tests

  • All existing tests pass: go test -race ./...
  • New behaviour has at least one test

Manual verification

  • Tested in at least one dark theme and one light theme
  • Tested on desktop and mobile viewport (< 768px)

Documentation

  • CHANGELOG.md updated under the correct version heading (backend-only change; no changelog entry required)
  • README.md updated if a new panel or config key was added (no new config keys)

Screenshots / Recordings

Omit for backend-only PR.

Breaking Changes

None.

Agent Review Notes

Maintainer requested rephrasing from "freshest possible return" to honest "stale-while-revalidate optimization clarification". Code behavior unchanged; title and description updated to match actual implementation.

Instead of capturing a stale cached value before the lock-release/
goroutine-spawn window, re-read latestVer under RLock at return time.
This ensures callers get the freshest available value even if the
background goroutine completed between the spawn and the return.

Added concurrent test coverage:
- TestGetLatestVersionCached_ConcurrentCalls_NoRace: 20 goroutines
  hitting the function simultaneously, verifies exactly 1 fetch
- TestGetLatestVersionCached_ReturnsCachedValueWhileRefreshing:
  verifies stale-while-revalidate behavior
- TestGetLatestVersionCached_NegativeCaching: verifies latestAt is
  set even on fetch failure to prevent thundering herd
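
The dedup and negative-caching properties listed above can be checked with a reduced model like the one below. It is a sketch under assumptions, not the test file from this PR: the cache type, its get method, the bg WaitGroup, the fetch signature and the hammer helper are all hypothetical, and a plain Mutex replaces the RWMutex for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// cache is a reduced, hypothetical model of the version cache.
type cache struct {
	mu            sync.Mutex
	latestVer     string
	latestAt      time.Time
	latestRefresh bool
	ttl           time.Duration
	fetch         func() (string, error)
	bg            sync.WaitGroup // tracks background fetches (sketch only)
}

func (c *cache) get() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	v := c.latestVer
	if c.latestRefresh || time.Since(c.latestAt) < c.ttl {
		return v // refresh in flight or entry still fresh: no new fetch
	}
	c.latestRefresh = true
	c.bg.Add(1)
	go func() {
		defer c.bg.Done()
		nv, err := c.fetch()
		c.mu.Lock()
		defer c.mu.Unlock()
		if err == nil {
			c.latestVer = nv
		}
		c.latestAt = time.Now() // set even on failure: negative caching
		c.latestRefresh = false
	}()
	return v
}

// hammer runs n concurrent calls against a cache whose fetch always fails,
// waits for the background fetch, calls once more, and returns the number
// of fetches observed plus the final cached value.
func hammer(n int) (int32, string) {
	var fetchCount atomic.Int32
	c := &cache{
		latestVer: "v1.0.0",
		ttl:       time.Hour,
		fetch: func() (string, error) {
			fetchCount.Add(1)
			return "", errors.New("network down")
		},
	}
	var callers sync.WaitGroup
	for i := 0; i < n; i++ {
		callers.Add(1)
		go func() {
			defer callers.Done()
			c.get()
		}()
	}
	callers.Wait()
	c.bg.Wait()
	v := c.get() // within the TTL of the failed attempt: must not refetch
	return fetchCount.Load(), v
}

func main() {
	count, v := hammer(20)
	fmt.Println(count, v)
}
```

With 20 callers and a failing fetch, the model performs a single fetch and keeps serving v1.0.0, because latestAt is stamped on failure and later callers see the entry as fresh.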

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces unit tests for the version caching mechanism, including concurrency and negative caching scenarios, and updates the service logic to return the most recent cached value. The review feedback recommends using t.Cleanup for more robust global state management in tests, correcting an inaccurate comment about synchronization, and replacing hardcoded sleeps with polling to prevent test flakiness.

}

var fetchCount atomic.Int32
original := fetchLatestVersion

medium

Using t.Cleanup to restore global state is more robust than manual restoration at the end of the test. It ensures the state is restored even if the test panics or fails early, and it keeps the setup and teardown logic together. This pattern should be applied to all tests in this file that override fetchLatestVersion.

Suggested change
- original := fetchLatestVersion
+ original := fetchLatestVersion
+ t.Cleanup(func() { fetchLatestVersion = original })

Comment on lines +25 to +26
// Use a channel to block mock goroutines until we're done with the test,
// preventing races on the fetchLatestVersion global.

medium

This comment appears to be inaccurate or copied from another test. It mentions using a channel to block mock goroutines, but the implementation uses an atomic.Int32 and does not involve any channels for synchronization.

Comment on lines +87 to +88
// Restore after all goroutines complete
fetchLatestVersion = original

medium

Manual restoration of global state is no longer necessary if t.Cleanup is used at the beginning of the test.

}

close(fetched)
time.Sleep(50 * time.Millisecond)

medium

Hardcoded sleeps in tests can lead to flakiness, especially in resource-constrained CI environments. It is better to poll the latestRefresh flag until the background task completes, which is more reliable and consistent with the other tests in this file.

	svc.latestMu.RLock()
	for svc.latestRefresh {
		svc.latestMu.RUnlock()
		time.Sleep(10 * time.Millisecond)
		svc.latestMu.RLock()
	}
	svc.latestMu.RUnlock()

- Use t.Cleanup for global state restoration (fetchLatestVersion)
- Replace hardcoded sleeps with waitForLatestRefreshDone polling
- Extract shared helper to reduce test flakiness in CI
- Remove stale comment about channel-based blocking

Avoids an unnecessary RLock/re-read after spawning the background
fetch. The value captured before setting latestRefresh=true is
correct — the goroutine hasn't run yet at that point.

mudrii (Owner) commented Apr 11, 2026

Thanks for the focused follow-up and for adding concurrency tests.

I’m not going to merge this PR as-is.

The intent in your description is to fix a stale return window in getLatestVersionCached(), but the implementation still preserves that window.

  • In internal/appsystem/system_service.go, we now capture v := s.latestVer just before setting latestRefresh = true, spawn the goroutine, and return v.
  • The goroutine can still complete between spawn and return, so this can still return an old value.
  • That means the behavior is still stale-while-refresh, just with a different local variable name than before.

So this is mostly a contract-description mismatch rather than a high-risk code bug:

  • The tests are useful.
  • Single-flight behavior and negative caching coverage are good.
  • But they do not prove the freshness claim from the PR summary.

What I need from this PR before merge:

  1. Either make the code match the “freshest possible return” claim (with a concrete, deterministic strategy), or
  2. remove that claim and treat this as a stale-while-refresh optimization clarification.

Given this PR title/body, option 1 is fine if intended, but it needs to actually be implemented and tested.

If you want to keep the same behavior and scope, you can update this to:

  • rephrase the fix to: “keeps stale-while-refresh semantics and avoids early lock-window capture,”
  • and keep the current implementation focused on consistency and concurrency dedupe.

SweetSophia (Contributor, Author) commented

Okay, I will look into it. :)

SweetSophia changed the title from "fix: re-read latestVer under RLock in getLatestVersionCached" to "perf: return pre-goroutine value to avoid lock reacquisition in getLatestVersionCached" on Apr 11, 2026

SweetSophia (Contributor, Author) commented

Thanks for the detailed explanation — you're right that the PR description was misleading.

I've updated both the title and description to honestly reflect what the code does: deliberate stale-while-revalidate optimization. The trade-off is intentional — first callers after cache expiry get a slightly stale value while the goroutine fetches in the background, but we avoid an unnecessary RLock reacquisition on every return.

The concurrency tests (dedup, stale-while-revalidate, negative caching) cover the actual behavior. The PR is scoped to that and nothing more.

If you'd prefer a version that actually blocks for fresh data, let me know and I can implement that instead — but that would be a different trade-off (latency vs freshness) rather than a bug fix.

mudrii (Owner) commented Apr 13, 2026

LGTM with one caveat: this is intentionally stale-while-revalidate and does not guarantee freshest value. The refactor is aligned with the stated behavior and removes extra lock churn after triggering async refresh. I’m merging this now.

mudrii merged commit 7952df9 into mudrii:main on Apr 13, 2026
1 of 4 checks passed