fix: error-output sanitization + process-lifecycle hardening (#111)#118
Merged
Conversation
Four findings from the 2026-05-31 audit (3×P2 + 1×P3), all in the error/process
lifecycle layer:
1. sanitizeError() helper — streaming error paths sent raw claude error_message /
stderr to the client, leaking home-dir / credential-file paths that the
non-streaming path already redacted. Factored the path-strip regex into one
helper and applied it at all 9 client-facing jsonResponse/sendSSE error emits;
de-duped the 3 pre-existing inline .replace() sites. Operator-log calls
(logEvent/trackError) and admin-gated endpoints left raw by design.
2. res.on("close") SIGKILL escalation — a client disconnect sent only SIGTERM; a
SIGTERM-resistant child held its concurrency slot until the request timeout
(narrow #37 on the hottest exit path). Now escalates to SIGKILL 5s after SIGTERM,
cleared on proc exit. Per review: gated on the child still being alive
(exitCode===null && signalCode===null) so the normal-success close no longer
fires a spurious SIGTERM or leaks the 5s timer; killTimer.unref() so a genuine
disconnect timer never delays graceful shutdown.
3. Per-key quota TOCTOU — documented as best-effort/eventually-consistent (inline
comment + README note): concurrent requests at the boundary can overshoot by up
to MAX_CONCURRENT and cache hits are uncounted. Chose documentation over an
in-flight counter to avoid a decrement-on-all-paths liability (the #37 class) on
a low-blast-radius internal family rate-limiter — not a payment boundary.
4. [P3] overallTimer cleared on semantic completion — the request timer was cleared
only on proc exit, so a streamed response that res.end()'d before the child
exited could record a spurious post-success timeout. New clearOverallTimer()
(clears the timer ONLY, never touches the `cleaned` slot-accounting flag — no
slot leak) is called in the streaming stop-success path; cleanup() on exit still
clears it idempotently and decrements the slot.
ALIGNMENT.md: error-shaping / process-lifecycle / rate-limit documentation forward
no Anthropic operation, so a cli.js citation is N/A under Rule 2. No blacklisted
tokens or port literals introduced; alignment.yml passes.
Independent fresh-context reviewer (opus): APPROVE WITH MINOR (Iron Rule 10) — the
two critical concerns (clearOverallTimer slot-leak class, SIGKILL double-kill) were
verified clean; MINOR #1 (kill-timer leak on success path) folded in; MINOR #2
(pre-existing regex over-redaction of ratios/URLs) left as out-of-scope.
Closes #111.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the four error-output / process-lifecycle findings from the 2026-05-31 audit (#111, 3×P2 + 1×P3).
sanitizeError()helper — streaming error paths sent raw claudeerror_message/stderr(home-dir/credential paths) to clients while the non-streaming path redacted them. One helper now applied at all 9 client-facingjsonResponse/sendSSEerror emits; 3 pre-existing inline.replace()sites de-duped. Operator logs + admin-gated endpoints left raw by design.res.on("close")SIGKILL escalation — client disconnect sent only SIGTERM; a SIGTERM-resistant child held its slot until the request timeout (narrow ops: Mac OCP recurring daily hang — 500 concurrency (8/8) on v3.10.0 #37 on the hottest exit path). Now escalates to SIGKILL 5s after SIGTERM, cleared on exit. Per review, gated on the child still being alive (exitCode===null && signalCode===null) so the normal-success close no longer fires a spurious SIGTERM or leaks the timer;killTimer.unref().MAX_CONCURRENT, cache hits uncounted. Chose documentation over an in-flight counter to avoid a decrement-on-all-paths liability (the ops: Mac OCP recurring daily hang — 500 concurrency (8/8) on v3.10.0 #37 class) on a low-blast-radius internal family limiter.overallTimercleared on semantic completion — newclearOverallTimer()(clears the timer ONLY — never touches thecleanedslot flag, so no slot leak) called in the streaming stop-success path;cleanup()on exit still clears + decrements.ALIGNMENT.md (server.mjs hard requirements)
alignment.ymlpasses.clearOverallTimerdoes not reintroduce the ops: Mac OCP recurring daily hang — 500 concurrency (8/8) on v3.10.0 #37 slot-leak class (only clears timer, never setscleaned; slot still decremented via the untouchedproc.once("exit", cleanup)), and no SIGKILL double-kill hazard — plus full sanitizeError coverage, quota-doc honesty, ALIGNMENT N/A. Verdict APPROVE WITH MINOR; MINOR #1 (success-path kill-timer leak) folded in; MINOR #2 (pre-existing path-regex over-redaction of ratios/URLs) left out-of-scope.npm test→ 152 passed, 0 failed.Closes #111.
🤖 Generated with Claude Code