fix(auth): deduplicate concurrent IsAuthenticated() balloon notifications #1183
…() calls

Concurrent callers of IsAuthenticated() with the same token each called the auth provider independently, and each sent a balloon notification on a transient failure (e.g. a 400 from /oauth2/token), resulting in duplicate popups in the IDE (observed 3x).

With singleflight, only one API call is in flight per concurrent wave for a given token. All other callers share its result, so only one balloon notification is sent no matter how many goroutines check at the same time.

Adds CheckAuthDelay to FakeAuthenticationProvider to allow concurrency testing, and a regression test that verifies exactly one notification is sent for 3 concurrent callers on a transient auth failure.
✅ Snyk checks have passed. No issues have been found so far.
- Extract token once at start of isAuthenticated(): eliminates three redundant config.GetToken(conf) calls (cache check, empty-token guard, singleflight key) in favour of a single read
- Make type assertion on singleflight result explicit with a panic on failure so a future type mismatch surfaces immediately as a programming error rather than silently returning unauthenticated
- Apply CheckAuthDelay in both success and failure branches of FakeAuthenticationProvider.GetCheckAuthenticationFunction so the delay works for both positive and negative test scenarios
- Add AuthCallCount (atomic int32) to FakeAuthenticationProvider to let tests assert how many times the check function was invoked
- Add TestIsAuthenticated_ConcurrentCallsOnlyInvokeProviderOnce to cover the success path: three concurrent callers on a cache miss must invoke the provider exactly once
- Add load-bearing comment explaining why the 50ms CheckAuthDelay is required for concurrency overlap in both concurrent tests
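A minimal sketch of the two test-support ideas in this commit: a fake provider whose delay is applied before branching (so success and failure paths are both slowed) with an atomic call counter, and an explicit type assertion that panics on a mismatch. `fakeAuthProvider`, `checkAuth`, and `asBool` are hypothetical names; the real FakeAuthenticationProvider API is not shown in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fakeAuthProvider is a hypothetical test double: a configurable delay plus an
// atomic counter of how often the check ran.
type fakeAuthProvider struct {
	IsAuthenticated bool
	CheckAuthDelay  time.Duration
	AuthCallCount   atomic.Int32
}

// checkAuth counts the call and sleeps before branching, so the delay applies
// to both the success and the failure outcome.
func (f *fakeAuthProvider) checkAuth() (string, error) {
	f.AuthCallCount.Add(1)
	time.Sleep(f.CheckAuthDelay)
	if !f.IsAuthenticated {
		return "", errors.New("authentication failed")
	}
	return "user@example.com", nil
}

// asBool asserts the type of a shared, untyped result. A mismatch is a
// programming error, so it panics instead of quietly returning false.
func asBool(v any) bool {
	b, ok := v.(bool)
	if !ok {
		panic(fmt.Sprintf("unexpected result type %T", v))
	}
	return b
}

func main() {
	f := &fakeAuthProvider{IsAuthenticated: true, CheckAuthDelay: 50 * time.Millisecond}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = f.checkAuth()
		}()
	}
	wg.Wait()
	// With no deduplication in front of the fake, every caller reaches it.
	fmt.Println("calls:", f.AuthCallCount.Load(), "authenticated:", asBool(f.IsAuthenticated))
}
```

In a real concurrency test the counter would sit behind the deduplicated service and be expected to read 1; the delay is what makes the three callers overlap long enough for that assertion to be meaningful.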
/describe
/review
PR Description updated to latest commit (6f74b49)
/describe
/review
PR Description updated to latest commit (cbb4d11)
On CI runners (2-4 vCPUs), the CLI semaphore size was calculated as max(1, NumCPU-4) = 1, which serialized all CLI calls. With ~40 CLI invocations across the smoke tests (working directory + reference branch scans), this consistently pushed the application/server package past its 60-minute timeout on Windows. When the CI environment variable is set, use NumCPU directly (no reservation). The 4-CPU reservation is kept for local development, where CPUs should be left free for the IDE.
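The sizing rule above, sketched as a pure function plus a buffered channel used as a counting semaphore. `cliConcurrency` is a hypothetical name, and treating any non-empty `CI` variable as "running on CI" is an assumption about how the check is done.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// cliConcurrency returns how many CLI invocations may run at once.
// Assumption: onCI is derived from a non-empty CI environment variable.
func cliConcurrency(numCPU int, onCI bool) int {
	if onCI {
		return numCPU // CI: no reservation, use every CPU
	}
	if n := numCPU - 4; n > 1 { // local: leave four CPUs for the IDE
		return n
	}
	return 1 // never drop below one concurrent CLI call
}

func main() {
	// runtime.NumCPU already returns an int, so no conversion is needed.
	n := cliConcurrency(runtime.NumCPU(), os.Getenv("CI") != "")
	sem := make(chan struct{}, n) // buffered channel as a counting semaphore
	sem <- struct{}{}             // acquire a slot before invoking the CLI
	// ... run the CLI here ...
	<-sem // release the slot
	fmt.Println("max concurrent CLI calls:", n)
}
```

On a 4-vCPU runner the local rule yields 1 while the CI rule yields 4, which is the difference between ~40 serialized CLI calls and up to four in parallel.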
/describe
/review
PR Description updated to latest commit (cbb4d11)
/describe
/review
On main, all integration/smoke test helpers called
WithBinarySearchPaths([]string{}) on the Config struct, preventing the
env-defaults goroutine from walking directories like C:\Program Files
when searching for mvn.exe or java.exe.
After the IDE-1786 Config-struct removal, this safeguard was lost.
prepareTestHelper (used by SmokeTestWithEngine and IntegTestWithEngine)
now pre-seeds an empty SettingBinarySearchPaths into the GAF
configuration before passing it to InitEngine, so the goroutine that
calls DetermineJavaHome/MavenDefaults never performs an exhaustive
directory walk.
On Windows CI this walk took ~10 minutes per test setup (hundreds of
thousands of files under C:\Program Files), causing the application/server
package to hit the 60-minute go test timeout. This restores the behavior
from main and brings Windows CI back in line.
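The fix relies on "explicitly set to empty" meaning something different from "not set". A minimal sketch of that distinction, using a plain map as a hypothetical stand-in for the GAF configuration; `settingBinarySearchPaths` and `searchPaths` are illustrative names, not the real setting key or API.

```go
package main

import "fmt"

// settingBinarySearchPaths is a hypothetical key; the real GAF setting name and
// configuration API are not shown in this PR.
const settingBinarySearchPaths = "binary-search-paths"

// searchPaths uses a pre-seeded value whenever the key is present, even if it
// is empty, and only falls back to the expensive default search when unset.
func searchPaths(conf map[string][]string, defaults func() []string) []string {
	if paths, ok := conf[settingBinarySearchPaths]; ok {
		return paths // pre-seeded (possibly empty): never walk the defaults
	}
	return defaults()
}

func main() {
	walked := 0
	walkProgramFiles := func() []string {
		walked++ // stands in for an exhaustive directory walk
		return []string{`C:\Program Files\Java\bin`}
	}

	seeded := map[string][]string{settingBinarySearchPaths: {}}
	n := len(searchPaths(seeded, walkProgramFiles))
	fmt.Println(n, walked) // 0 0

	n = len(searchPaths(map[string][]string{}, walkProgramFiles))
	fmt.Println(n, walked) // 1 1
}
```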
/describe
/review
PR Description updated to latest commit (c357491)
ScanProgress.Listen previously blocked indefinitely on its select waiting for cancel or done signals. If a scan ended without calling SetDone or CancelScan (e.g. due to an error early return), the listener goroutine would leak and keep the WaitGroup in DelegatingConcurrentScanner from completing, causing subsequent tests or scans to hang until timeout. Adding ctx.Done() as a third select case ensures the goroutine always exits when the scan's context is canceled, which happens at the latest when the scan function returns via defer cancel().
runtime.NumCPU() already returns int; the conversion was redundant.
/describe
/review
PR Description updated to latest commit (bd4188b)
…stHelper

prepareTestHelper passes a pre-configured engine to InitEngine to avoid binary search path walks in CI. However, InitEngine only calls InitWorkflows and engine.Init when engine==nil, so integration and smoke tests were running without registered workflows. This caused:
- Test_connectivityCheckCommand_Execute_integration to fail because the connectivity check workflow was never registered
- Test_textDocumentInlineValues_InlineValues_IntegTest to fail because the CLI installer workflow was unavailable
- Test_SmokeInstanceTest to fail for the same reason

Also set EXECUTION_MODE_KEY and PersistInStorage on preConf to match what InitEngine does for nil engines.
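The pitfall above is conditional initialization: full setup only runs on the path where the function constructs the object itself. A minimal sketch with hypothetical types (`engine`, `initEngine`, `initWorkflows`, and the workflow names are illustrative, not the real GAF API); the remedy shown is one possible approach, not necessarily the one this commit takes.

```go
package main

import "fmt"

// engine is a hypothetical stand-in with just enough state to show the bug.
type engine struct {
	settings  map[string]string
	workflows map[string]bool
}

func newEngine() *engine {
	return &engine{settings: map[string]string{}, workflows: map[string]bool{}}
}

// initWorkflows registers the workflows tests depend on (names are illustrative).
func initWorkflows(e *engine) {
	e.workflows["connectivity-check"] = true
	e.workflows["cli-installer"] = true
}

// initEngine mirrors the described behavior: it only fully initializes the
// engine when the caller passes nil.
func initEngine(e *engine) *engine {
	if e == nil {
		e = newEngine()
		initWorkflows(e)
	}
	// A caller-supplied engine is returned as-is, without workflows.
	return e
}

func main() {
	pre := newEngine()
	pre.settings["execution-mode"] = "test" // pre-seeded configuration
	e := initEngine(pre)
	fmt.Println(e.workflows["connectivity-check"]) // false: nothing registered

	// One possible remedy: the helper also registers workflows on its engine.
	initWorkflows(e)
	fmt.Println(e.workflows["connectivity-check"]) // true
}
```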
/describe
/review
PR Description updated to latest commit (bd4188b)
Previously, ScanProgress.SetDone() was only called on the happy path in scanInternal, leaving isDone=false when legacyScan or ostestScan returned an error. The next scan on the same folder would call CancelScan() on the stale ScanProgress, hitting the 5-second channel timeout. Convert the explicit SetDone() call to a deferred function, matching the IAC scanner's existing pattern. SetDone() is safe to call repeatedly or after cancellation, so deferring it guarantees cleanup on all exit paths. Also adds a gomock-generated Executor mock and a regression test (Test_ScanError_ScanProgressIsMarkedDone) that verifies IsDone()==true after a scan error.
/describe
PR Description updated to latest commit (01ce9fa)
…tx.Done()

When Listen exits via ctx.Done(), the unbuffered done channel has no active reader. Any subsequent call to SetDone() (e.g. from a deferred cleanup in scanInternal or iac.Scan) would block for the full 5-second timeout before returning, effectively hanging the scan goroutine on every context-canceled completion.

Changing done to a buffered channel (capacity 1) ensures SetDone() can always complete the send immediately: if Listen is still active, it reads the value normally; if Listen has already exited, the value sits harmlessly in the buffer until the ScanProgress is garbage-collected.

Adds TestScanProgress_SetDone_DoesNotBlockAfterContextCanceled to reproduce the hang and confirm the fix, plus additional unit tests for the normal SetDone, CancelScan, and idempotent SetDone paths.
/describe
PR Description updated to latest commit (392361d)
PR Reviewer Guide 🔍
User description
Description
Fixes a bug where the IDE showed duplicate authentication error balloon notifications on a single transient token-refresh failure.
Root cause:
IsAuthenticated() uses sync.RWMutex (read lock), allowing multiple goroutines to enter the slow path concurrently. Each goroutine independently called the auth provider API and sent a window/showMessage notification. Three concurrent callers (DelegatingConcurrentScanner.Scan(), the DidChangeWorkspaceFolders handler, and the auth Initializer) each triggered the popup independently.
Changes in this PR:
Auth: deduplicate concurrent IsAuthenticated() calls
- …singleflight.Group to AuthenticationServiceImpl so concurrent callers with the same token share one in-flight API call and receive the same result; only one notification is ever sent per concurrent wave.
- …notifDedup) using sync.Mutex + timestamp: identical messages are suppressed for 30 s; different messages are shown immediately.
- …atomic.Int64) and reset on credential change.
- …shouldCauseLogout/isTransientNetworkError helpers to avoid spurious logouts on DNS/TCP/context-cancellation errors.
- invalid_grant wrapped in url.Error now correctly triggers logout.

Logging: non-blocking LSP write channel
- …lspWriter channel capacity from 1,000,000 to 10,000 and eliminated the bootstrap allocation.
- WriteLevel now uses a select/default so it never blocks the caller when the channel is full; messages fall back to stderr instead of freezing the main goroutine.
- …lsp_logger_test.go covering level filtering and the non-blocking fallback.

Secrets: silent skip when feature flag disabled
- …SnykSecretsEnabled feature flag is off, Scan() now returns a nil error and empty issues instead of an error, preventing a spurious balloon notification ("feature flag not found").

CI/test fixes
- …macos-latest-large to resolve OOM failures.
- …build.yaml runner name.
- …TestUnifiedTestApiSmokeTest and substituteDepGraphFlow on macOS/Windows, where they consistently OOM/exit-1.
- …verifyQuickFixForIssue in smoke tests to distinguish transient RPC errors (skip) from correctness violations (fail fast), eliminating a 2-minute flake.
- …maxIntegTestDuration from 45 m to 15 m.
- …featureflag.Override for test injection.

Checklist
- …make generate)
- …make lint-fix)

PR Type
Bug fix, Enhancement
Description
Deduplicate concurrent authentication status checks and balloon notifications.
Improve logging reliability and prevent blocking.
Enhance test stability and error handling.
Adjust feature flag behavior for disabled secrets scanning.
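The non-blocking logging change relies on Go's `select` with a `default` case: if the bounded channel has no free slot, the send is skipped instead of waiting. A minimal sketch; `lspWriter` here is a hypothetical reduction (the real writer has a consumer goroutine forwarding entries to the LSP client, omitted so the channel fills deterministically).

```go
package main

import (
	"fmt"
	"os"
)

// lspWriter is a hypothetical reduction of the logger: a bounded channel that
// a separate goroutine would drain and forward to the LSP client.
type lspWriter struct {
	ch chan string
}

// WriteLevel never blocks: if the channel is full, the entry goes to stderr.
func (w *lspWriter) WriteLevel(msg string) (queued bool) {
	select {
	case w.ch <- msg:
		return true
	default:
		fmt.Fprintln(os.Stderr, "lsp log channel full, falling back to stderr:", msg)
		return false
	}
}

func main() {
	w := &lspWriter{ch: make(chan string, 2)} // the PR uses 10,000; 2 keeps the demo small
	for i := 0; i < 3; i++ {
		fmt.Println(w.WriteLevel(fmt.Sprintf("message %d", i)))
	}
	// stdout: true, true, false (the third entry is written to stderr instead)
}
```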
Diagram Walkthrough
flowchart LR
    A[Auth Service] --> B(Singleflight Group);
    A --> C(Notification Deduplicator);
    D[Logging] --> E(Non-blocking LSP Write);
    F[Testing] --> G(Error Handling);
    F --> H(Platform Specific Tests);
    I[Secrets Scanner] --> J(Feature Flag Check);
    B --> K{Auth API Call};
    C --> L{IDE Notification};
    E --> M{Stderr Fallback};
    G --> N(Test Stability);
    H --> O(Platform Skip);
    J --> P{Silent Skip};

File Walkthrough
1 file
- Add comment to binary path finding logic

1 file
- Use stderr for bootstrap logger

15 files
- Pin experimental risk score feature flags for CI
- Improve error handling in test callbacks and code action verification
- Reduce integration test duration
- Add platform checks for depgraph CLI tests
- Add tests for concurrent auth calls and error handling
- Add delay and call count to fake auth provider for testing
- Generate mock for CLI executor
- Add override method to fake feature flag service
- Add test for feature flag override functionality
- Test scan progress marking on error
- Update secrets scan test for disabled feature flag
- Add tests for LSP logger non-blocking behavior
- Add tests for scan progress lifecycle management
- Add NotOnMacOS helper function
- Pre-configure engine for faster test initialization

4 files
- Deduplicate concurrent IsAuthenticated calls and notifications
- Adjust CLI concurrency limit for CI environments
- Implement feature flag override mechanism
- Make LSP logger non-blocking and reduce channel capacity

4 files
- Ensure IAC scan goroutine exits on context done
- Ensure OSS scan goroutine exits and marks done on error
- Silently skip secrets scan when feature flag is disabled
- Ensure scan goroutines exit and handle done channel correctly

2 files
- Update go.mod with new dependencies
- Update go.sum with new dependencies

1 file