Summary
engine.companion_insights stores one row per user (PK user_id) and InsightRepo::merge does a JSONB shallow-merge + UPSERT (crates/eros-engine-store/src/insight.rs:84-112). The row reflects only the latest state — there is no record of how the JSONB evolved over time.
Anything downstream that wants to observe the timeline (dedup logic, drift analysis, audit trails, etc.) currently has no observation point: by the time a consumer reads, intermediate states are gone.
Proposal
Append-only snapshot table plus a periodic sweeper that mirrors pipeline::dreaming::sweeper in shape, but does no LLM call and no transformation.
Schema
CREATE TABLE engine.companion_insights_snapshot (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id        UUID NOT NULL,
    insights       JSONB NOT NULL,
    training_level DOUBLE PRECISION NOT NULL,
    captured_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_companion_insights_snapshot_user_time
    ON engine.companion_insights_snapshot (user_id, captured_at DESC);
Sweeper
Background tokio::spawn'd task:
- Tick: SNAPSHOT_TICK_SECS (default 300), SNAPSHOT_DISABLED=1 to disable
- Each tick: for every user whose companion_insights.updated_at > MAX(captured_at) for that user, or who has no snapshot yet (MAX is NULL there, so a bare comparison would skip them), insert one row carrying the current insights + training_level
- Optional cheap optimisation: skip the insert if the JSONB equals the latest existing snapshot. The UPSERT in InsightRepo::merge bumps updated_at even when the merged JSONB didn't actually change, so polling alone would produce duplicate rows
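The per-tick selection rule above can be sketched as a pure function, kept separate from any database or tokio code so it is easy to test. This is an illustration under assumptions, not the proposed implementation: CurrentRow, LatestSnapshot and rows_to_snapshot are hypothetical names, user ids are u64 rather than UUIDs, timestamps are epoch seconds, and JSONB equality is approximated by comparing serialized strings. It also assumes a user with no snapshot yet should always be captured.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for one `companion_insights` row.
#[derive(Debug, Clone, PartialEq)]
pub struct CurrentRow {
    pub user_id: u64,        // UUID in the real schema
    pub insights: String,    // serialized JSONB (assumption: canonical form)
    pub training_level: f64,
    pub updated_at: i64,     // epoch seconds
}

/// Hypothetical stand-in for the newest snapshot of one user.
#[derive(Debug, Clone, PartialEq)]
pub struct LatestSnapshot {
    pub insights: String,
    pub captured_at: i64,
}

/// Returns the rows a single tick would append to the snapshot table.
///
/// A row is selected when the user has no snapshot yet, or when it was
/// updated after the latest snapshot AND its insights differ from it.
/// Only `insights` is compared, matching the optimisation as described.
pub fn rows_to_snapshot<'a>(
    current: &'a [CurrentRow],
    latest: &HashMap<u64, LatestSnapshot>,
) -> Vec<&'a CurrentRow> {
    current
        .iter()
        .filter(|row| match latest.get(&row.user_id) {
            None => true,
            Some(snap) => {
                row.updated_at > snap.captured_at && row.insights != snap.insights
            }
        })
        .collect()
}
```

In a real sweeper the same rule would more likely live in a single INSERT ... SELECT statement; the function form is only meant to pin down the intended behaviour for the "bumped but unchanged" case.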
Explicitly not in scope:
- No LLM calls
- No dedup / merge / classification logic
- No writes to companion_insights or human_insights
This table is pure write-through history capture. Whatever policy a consumer wants (collapsing near-duplicates, semantic merge, retention) is theirs to build on top.
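As an illustration of the kind of policy a consumer could layer on top, here is a minimal sketch that collapses runs of identical snapshots in one user's timeline. The Snapshot type and collapse_unchanged function are hypothetical, timestamps are epoch seconds, and insights are compared as serialized strings; none of this is part of the proposed table or sweeper.

```rust
/// Hypothetical consumer-side view of one snapshot row.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub captured_at: i64,  // epoch seconds
    pub insights: String,  // serialized JSONB
}

/// Collapses consecutive snapshots with identical `insights`, keeping the
/// earliest of each run. Expects the timeline sorted oldest first.
pub fn collapse_unchanged(mut timeline: Vec<Snapshot>) -> Vec<Snapshot> {
    // `dedup_by` passes (later, earlier) and removes `later` when true.
    timeline.dedup_by(|later, earlier| later.insights == earlier.insights);
    timeline
}
```

Only adjacent duplicates are collapsed, so a value that reappears after a different one still shows up as a separate point on the timeline.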
Why a separate table
companion_insights is the "freshest merged JSONB the prompt-builder should read" view, and that role wants single-row UPSERT semantics. Turning it into a stream would either bloat the prompt read path or force a more expensive query. Splitting reads (companion_insights) from history (companion_insights_snapshot) keeps each table single-purpose.
Open questions
- Polling vs UPSERT trigger. Polling matches the dreaming-lite cadence story and is simpler; a trigger has zero lag but adds write-path complexity. Polling preferred unless there's a known sub-minute-granularity need.
- Retention. My instinct is to leave TTL to operators rather than ship a built-in cleaner. Open to either.
- Initial cadence. 300s mirrors dreaming-lite; happy to take input on a saner default.
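For concreteness, the two knobs named in the Sweeper section could be read roughly as follows. This is a sketch under assumptions: SweeperConfig, sweeper_config and sweeper_config_from_env are hypothetical names, and falling back to the default on an unparsable or zero value is one possible choice rather than settled behaviour.

```rust
use std::time::Duration;

/// Default tick from the proposal (300 seconds).
pub const DEFAULT_TICK_SECS: u64 = 300;

/// Hypothetical resolved sweeper configuration.
#[derive(Debug, Clone, PartialEq)]
pub enum SweeperConfig {
    Disabled,
    Enabled { tick: Duration },
}

/// Resolves the config from raw values. Taking strings instead of reading
/// the environment directly keeps the function testable.
pub fn sweeper_config(tick_secs: Option<&str>, disabled: Option<&str>) -> SweeperConfig {
    // Only the literal "1" disables the sweeper, as in SNAPSHOT_DISABLED=1.
    if disabled.map(str::trim) == Some("1") {
        return SweeperConfig::Disabled;
    }
    // Assumption: invalid or zero values fall back to the default.
    let secs = tick_secs
        .and_then(|raw| raw.trim().parse::<u64>().ok())
        .filter(|&s| s > 0)
        .unwrap_or(DEFAULT_TICK_SECS);
    SweeperConfig::Enabled { tick: Duration::from_secs(secs) }
}

/// Reads SNAPSHOT_TICK_SECS and SNAPSHOT_DISABLED from the process env.
pub fn sweeper_config_from_env() -> SweeperConfig {
    let tick = std::env::var("SNAPSHOT_TICK_SECS").ok();
    let disabled = std::env::var("SNAPSHOT_DISABLED").ok();
    sweeper_config(tick.as_deref(), disabled.as_deref())
}
```

Changing the default later would then be a one-line edit to DEFAULT_TICK_SECS.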
Happy to put up a PR if the approach lands.