Scope
internal/logger carries broader quality problems beyond the (now-fixed) exit
lag. This issue tracks the tiered cleanup / rewrite plan below.
Tier 0 — the CLI exit lag — is fixed and shipped in #388. Logger.Close
is now caller-owned (Close(ctx); the hardcoded 5s context.Background() is
gone, the caller's ctx is the flush deadline), plus review follow-ups that
removed the context.Canceled/DeadlineExceeded suppression so Close
returns the true provider.Shutdown outcome, and made clawker-cp surface
its close error to stderr without poisoning the teardown-integrity exit code.
The collector-down Close timeout path now has regression coverage
(TestClose_CanceledContext_ReturnsPromptly,
TestClose_LiveContext_BoundedByExportTimeout). Tiers 1–4 below remain.
Architectural intent that still governs the rewrite: enabling telemetry must
never make the CLI hang or crash when the collector is down, and this package is
imported on the CP boot path, where panic/os.Exit footguns matter given
the "CP crash is a security incident" invariant.
internal/logger package review — cleanup plan (tiered)
A package-isolated Go best-practices review (5 lenses), grounded against pinned
deps (otel/sdk/log@v0.20.0, otelslog@v0.19.0, rs/zerolog@v1.35.1) and stdlib
log/slog. Tracking the findings here as a sequenced cleanup.
Tier 1 — API ergonomics
- Fallible construction + fallible accessor force the
if log, err := f.Logger(); err == nil { ... } guard at ~34 call sites
(and silently drop the log line on the error path). Make the logger
infallible: the accessor returns a bare, never-nil *Logger (degrade to
Nop()/stderr internally); keep the error only at the single factory wiring
site.
*zerolog.Event leaks through the public API (Debug()/Info()/… return it)
→ the wrapper isn't a real boundary, the backend isn't swappable, and it
necessitates the Zerolog() escape hatch.
Compress bool zero-value lie: doc says default true, but the zero value
is false and there's no coercer (unlike the numeric rotation fields).
Tier 2 — OTEL↔zerolog bridge correctness (otel_writer.go)
The bridge serializes zerolog → JSON bytes, json.Unmarshals, and re-emits as a
log.Record. Anti-pattern — no official bridge does this (otelslog/otelzap
read typed fields directly → typed log.KeyValue). The round-trip causes real
data corruption:
- int64↔float64 type flipping; >2^53 precision loss on IDs/byte-counts
(everything decodes to float64, then is heuristically guessed back to int).
- nested objects/arrays re-stringified instead of
log.MapValue/SliceValue.
[]byte → base64 string instead of KindBytes.
- timestamps truncated to whole seconds (zerolog default
TimeFieldFormat = RFC3339; the package never overrides it).
zerolog.TraceLevel dropped → SeverityUndefined.
CLAUDE.md's "all structured fields preserved" is inaccurate.
Grounded nuance (deepwiki on rs/zerolog): zerolog — unlike slog/zap — does
not expose typed fields to a Hook or LevelWriter (fields are mid-flight
JSON in e.buf; substance of rs/zerolog#493). A LevelWriter delivers the
level directly (structurally fixes the severity mapping), but full type
fidelity requires capturing typed fields at clawker's own *Logger boundary
before zerolog serializes — i.e. this rewrite depends on the Tier 1 API
change (stop passing through the raw *zerolog.Event).
Tier 3 — safety / hygiene (matters on the CP boot path)
otel.SetErrorHandler(...) is a process-global mutation inside the
constructor (last-writer-wins, test pollution).
Fatal() on a shared library logger → os.Exit footgun; per the CP-crash
invariant an unintended exit strands eBPF. (3 external callers.)
With(...) panics on odd-count / non-string key — a log call must never
crash a (CP) goroutine; slog degrades with !BADKEY.
Tier 4 — nits + test gaps
interface{} → any (the file is the lone holdout); dead/reserved mTLS
file-path triple (YAGNI); the sync.Pool in the writer buys little;
thread-safety doc overstated (post-Close emits are silently dropped).
- Test gaps: no concurrency test under
-race despite the thread-safety
claim; OTEL fallback asserted only as "didn't error"; low-value accessor tests
(Options_Custom, LogFilePath, Compress).
Proposed sequencing
Recommendation: B now, C as an immediate focused follow-up. TDD, adding the
missing -race concurrency test and the typed-field bridge fidelity tests.
Related (out of scope here)
A separate, smaller command-exit concern: the changelog notifier has no HTTP
cache/TTL and fetches raw.githubusercontent.com on every interactive run (the
update check is 24h-TTL-gated). Tracked separately — not part of this logger issue.
Scope
internal/loggercarries broader quality problems beyond the (now-fixed) exitlag. This issue tracks the tiered cleanup / rewrite plan below.
Architectural intent that still governs the rewrite: enabling telemetry must
never make the CLI hang or crash when the collector is down, and this package is
imported on the CP boot path, where
panic/os.Exitfootguns matter giventhe "CP crash is a security incident" invariant.
internal/loggerpackage review — cleanup plan (tiered)A package-isolated Go best-practices review (5 lenses), grounded against pinned
deps (
otel/sdk/log@v0.20.0,otelslog@v0.19.0,rs/zerolog@v1.35.1) and stdliblog/slog. Tracking the findings here as a sequenced cleanup.Tier 1 — API ergonomics
if log, err := f.Logger(); err == nil { ... }guard at ~34 call sites(and silently drop the log line on the error path). Make the logger
infallible: the accessor returns a bare, never-nil
*Logger(degrade toNop()/stderr internally); keep theerroronly at the single factory wiringsite.
*zerolog.Eventleaks through the public API (Debug()/Info()/…return it)→ the wrapper isn't a real boundary, the backend isn't swappable, and it
necessitates the
Zerolog()escape hatch.Compress boolzero-value lie: doc says defaulttrue, but the zero valueis
falseand there's no coercer (unlike the numeric rotation fields).Tier 2 — OTEL↔zerolog bridge correctness (
otel_writer.go)The bridge serializes zerolog → JSON bytes,
json.Unmarshals, and re-emits as alog.Record. Anti-pattern — no official bridge does this (otelslog/otelzapread typed fields directly → typed
log.KeyValue). The round-trip causes realdata corruption:
(everything decodes to
float64, then is heuristically guessed back to int).log.MapValue/SliceValue.[]byte→ base64 string instead ofKindBytes.TimeFieldFormat = RFC3339; the package never overrides it).zerolog.TraceLeveldropped →SeverityUndefined.CLAUDE.md's "all structured fields preserved" is inaccurate.
Grounded nuance (deepwiki on
rs/zerolog): zerolog — unlike slog/zap — doesnot expose typed fields to a
HookorLevelWriter(fields are mid-flightJSON in
e.buf; substance of rs/zerolog#493). ALevelWriterdelivers thelevel directly (structurally fixes the severity mapping), but full type
fidelity requires capturing typed fields at clawker's own
*Loggerboundarybefore zerolog serializes — i.e. this rewrite depends on the Tier 1 API
change (stop passing through the raw
*zerolog.Event).Tier 3 — safety / hygiene (matters on the CP boot path)
otel.SetErrorHandler(...)is a process-global mutation inside theconstructor (last-writer-wins, test pollution).
Fatal()on a shared library logger →os.Exitfootgun; per the CP-crashinvariant an unintended exit strands eBPF. (3 external callers.)
With(...)panics on odd-count / non-string key — a log call must nevercrash a (CP) goroutine;
slogdegrades with!BADKEY.Tier 4 — nits + test gaps
interface{}→any(the file is the lone holdout); dead/reserved mTLSfile-path triple (YAGNI); the
sync.Poolin the writer buys little;thread-safety doc overstated (post-
Closeemits are silently dropped).-racedespite the thread-safetyclaim; OTEL fallback asserted only as "didn't error"; low-value accessor tests
(
Options_Custom,LogFilePath,Compress).Proposed sequencing
Recommendation: B now, C as an immediate focused follow-up. TDD, adding the
missing
-raceconcurrency test and the typed-field bridge fidelity tests.Related (out of scope here)
A separate, smaller command-exit concern: the changelog notifier has no HTTP
cache/TTL and fetches
raw.githubusercontent.comon every interactive run (theupdate check is 24h-TTL-gated). Tracked separately — not part of this logger issue.