chore: Send instrumentation data on kill #6679

Draft
PeterSchafer wants to merge 1 commit into main from chore/CLI-1416_terminated

@PeterSchafer PeterSchafer commented Mar 23, 2026

Pull Request Submission Checklist

  • Follows CONTRIBUTING guidelines
  • Commit messages are release-note ready, emphasizing what was changed, not how.
  • Includes detailed description of changes
  • Contains risk assessment (Low | Medium | High)
  • Highlights breaking API changes (if applicable)
  • Links to automated tests covering new functionality
  • Includes manual testing instructions (if necessary)
  • Updates relevant GitBook documentation (PR link: ___)
  • Includes product update to be communicated in the next stable release notes

What does this PR do?

Ensures instrumentation data is sent when the CLI is terminated by a signal (SIGINT/SIGTERM). Previously, if a user killed a running CLI command (e.g., via Ctrl+C), instrumentation data would be lost. This change:

  1. Refactors teardown logic into a dedicated tearDown() function that handles analytics, instrumentation, cleanup, and logging
  2. Adds signal handling (behind PREVIEW_FEATURES_ENABLED flag) that intercepts SIGINT/SIGTERM and runs teardown before exiting
  3. Uses sync.Once to ensure teardown runs exactly once, whether triggered by normal completion or signal
  4. Extracts instrumentation helpers to a new instrumentation.go file for better code organization
  5. Updates error-catalog dependency to include NewTerminatedBySignalError for proper error categorization
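
The interplay of points 2 and 3 can be illustrated with a minimal sketch. It is not the PR's code: `onceTeardown`, the `reason` strings, and the exit code 130 are hypothetical names and values chosen for illustration. It only shows how `sync.Once` lets the signal path and the normal path share one teardown that runs a single time.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// onceTeardown (hypothetical helper) wraps a teardown function so that it
// runs at most once, no matter how many callers race to invoke it.
type onceTeardown struct {
	once     sync.Once
	exitCode int
	run      func(reason string) int
}

// Do runs the wrapped teardown on the first call and returns its exit code.
// Later or concurrent calls wait for the first one and return the same code.
func (t *onceTeardown) Do(reason string) int {
	t.once.Do(func() { t.exitCode = t.run(reason) })
	return t.exitCode
}

func main() {
	td := &onceTeardown{run: func(reason string) int {
		// Stand-in for analytics, instrumentation, cleanup, and logging.
		fmt.Println("teardown:", reason)
		if reason == "signal" {
			return 130 // assumed exit code, for illustration only
		}
		return 0
	}}

	signalChan := make(chan os.Signal, 1)
	signal.Notify(signalChan, os.Interrupt, syscall.SIGTERM)

	// Signal path: run teardown, then exit.
	go func() {
		<-signalChan
		os.Exit(td.Do("signal"))
	}()

	// ... the actual command would run here ...

	// Normal path: stop intercepting signals, then run the same teardown.
	signal.Stop(signalChan)
	os.Exit(td.Do("normal"))
}
```

Whichever path reaches `Do` first decides the exit code; the other path observes the same result instead of sending the data a second time.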

Where should the reviewer start?

Start with cliv2/cmd/cliv2/main.go:487-523 for the new tearDown() function, then review the signal handling setup at lines 635-656.

snyk/go-application-framework#578

How should this be manually tested?

With a developer build:

  1. Run a long-running command (e.g., snyk test on a large project)
  2. Press Ctrl+C to send SIGINT
  3. Verify in debug logs (--debug) that instrumentation is sent before exit

What's the product update that needs to be communicated to CLI users?

N/A - This is an internal improvement to telemetry reliability. No user-facing behavior changes.

Risk assessment (Low | Medium | High)?

Medium - Changes core CLI exit flow, but:

  • Signal handling is gated behind PREVIEW_FEATURES_ENABLED flag
  • Uses sync.Once to prevent double-execution
  • Signal handler is stopped before normal teardown to prevent race conditions

Any background context you want to provide?

When users terminate CLI commands prematurely, we lose visibility into those interactions. This change improves our ability to understand CLI usage patterns and identify issues with long-running commands.

@PeterSchafer PeterSchafer requested review from a team as code owners March 23, 2026 17:36

snyk-io bot commented Mar 23, 2026

Snyk checks have passed. No issues have been found so far.

Scan Engine           Critical  High  Medium  Low  Total (0)
Open Source Security  0         0     0       0    0 issues
Licenses              0         0     0       0    0 issues
Code Security         0         0     0       0    0 issues


@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 0381db7 to 511765e Compare March 24, 2026 10:18
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from d94d307 to 402304a Compare March 24, 2026 11:04
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch 2 times, most recently from eb91fb0 to dccddef Compare March 24, 2026 11:15
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from dccddef to 5984ccb Compare March 24, 2026 11:18
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 5984ccb to f9e02d6 Compare March 24, 2026 11:42
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from f9e02d6 to 1d4c73a Compare March 24, 2026 18:24
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 1d4c73a to 8d03e26 Compare March 24, 2026 19:25
@PeterSchafer PeterSchafer enabled auto-merge March 24, 2026 19:25
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 8d03e26 to efc5edd Compare March 24, 2026 19:34
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from efc5edd to 91dc563 Compare March 25, 2026 07:24
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 91dc563 to 34f431c Compare March 30, 2026 12:13

errorListCopy := append([]error{}, errorList...)
errorListMutex.Unlock()

finalExitCode = tearDown(ctx, signalError, errorListCopy, startTime, ua, cliAnalytics, networkAccess)
Contributor

question: if teardown hangs (e.g. when making the analytics API call), can we still SIGINT the process? Or does teardownOnce prevent this? Basically, is it possible to end up in a hanging state?

Contributor Author

As mentioned before, I'm working on a solution.

cliAnalytics.AddError(tempError)
}
}
if globalConfiguration.GetBool(configuration.PREVIEW_FEATURES_ENABLED) {
Contributor

question(0): what's the reason for putting this behind preview features?
question(1): what's the condition for removing this?

Contributor Author

Q1: This could easily have side effects, so I want to roll it out carefully. Limiting it to preview is a simple, straightforward way to collect feedback from internal and preview usage in general.
Q2: That is a good question. I would say once we think it creates value and the concerns are mitigated.


j-luong commented Mar 30, 2026

suggestion: is it possible to write a test to validate this behaviour?

@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 34f431c to 63602e0 Compare March 30, 2026 17:11
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 63602e0 to bbac494 Compare March 31, 2026 10:18
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 61855c0 to a55a4e3 Compare March 31, 2026 11:57
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch 2 times, most recently from 8439915 to a6b802f Compare March 31, 2026 16:42
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from a6b802f to 856f008 Compare April 1, 2026 16:21
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 856f008 to 0a239d7 Compare April 2, 2026 15:56

chore: ensure to kill CLI processes
@PeterSchafer PeterSchafer force-pushed the chore/CLI-1416_terminated branch from 0a239d7 to ad74501 Compare April 2, 2026 19:42
@snyk-pr-review-bot

PR Reviewer Guide 🔍

🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Instrumentation Suppression on Signal 🟠 [major]

The tearDown function initializes teardownCtx using context.WithTimeout(ctx, teardownTimeout). Since ctx is the main context cancelled by the signal handler (line 655) before tearDown is called (line 664), the resulting teardownCtx is also immediately cancelled. As a result, sendAnalytics and sendInstrumentation will fail to send any data because their underlying HTTP requests or engine invocations will exit immediately with a context canceled error. To fix this, the teardown phase should use a fresh context (e.g., context.WithoutCancel(ctx) or context.Background()) for IO operations.

teardownCtx, cancel := context.WithTimeout(ctx, teardownTimeout)
defer cancel()
Goroutine Leak 🟡 [minor]

The signal handling goroutine (lines 650-669) blocks on <-signalChan. On normal CLI exit, signal.Stop(signalChan) is called (line 710), which stops future signals but does not close the channel. The goroutine will remain blocked indefinitely. While minor for a short-lived CLI, the channel should be closed or the goroutine should select on the main context to exit gracefully.

sig := <-signalChan
📚 Repository Context Analyzed

This review considered 31 relevant code sections from 13 files (average relevance: 0.91)

Comment on lines +491 to +493
func tearDown(ctx context.Context, err error, errorList []error, startTime time.Time, ua networking.UserAgentInfo, cliAnalytics analytics.Analytics, networkAccess networking.NetworkAccess) int {
// Create a context with timeout for teardown operations to ensure we don't hang indefinitely
teardownCtx, cancel := context.WithTimeout(ctx, teardownTimeout)

@danskmt danskmt Apr 3, 2026

The context ctx here may already be cancelled, since it is derived from the parent context. Testing locally, I saw this happen:

[Image]

I suggest using a fresh context for teardown:

teardownCtx, cancel := context.WithTimeout(context.Background(), teardownTimeout)

Also, instead of the context parameter we can pass ctxCancel, updating the function signature:
func tearDown(ctx context.Context, ... -> func tearDown(ctxCancel context.CancelFunc, ...

In the calls to tearDown (L664 and L716), we pass ctxCancel:

finalExitCode = tearDown(ctxCancel, signalError, errorListCopy, startTime, ua, cliAnalytics, networkAccess)
...
finalExitCode = tearDown(ctxCancel, err, errorListCopy, startTime, ua, cliAnalytics, networkAccess)

And after the sendInstrumentation call (line 516) we can terminate child processes:

sendInstrumentation(teardownCtx, globalEngine, cliAnalytics.GetInstrumentation(), globalLogger)
ctxCancel()

i.e. instead of calling ctxCancel in line 655, we do it here after sendInstrumentation, wdyt?

Contributor Author

Thanks @danskmt! Good catches! I marked the PR as draft because it is far from ready. There are plenty of local changes as well, so I marked it as ready for review prematurely. Sorry for that!

Contributor

No worries! Let me know when you want me to review it again.

@PeterSchafer PeterSchafer marked this pull request as draft April 7, 2026 08:53
auto-merge was automatically disabled April 7, 2026 08:53

Pull request was converted to draft
