Skip to content

feat: companion_insights snapshot table for time-series history #61

@enriquephl

Description

@enriquephl

Summary

engine.companion_insights stores one row per user (PK user_id) and InsightRepo::merge does a JSONB shallow-merge + UPSERT (crates/eros-engine-store/src/insight.rs:84-112). The row reflects only the latest state — there is no record of how the JSONB evolved over time.

Anything downstream that wants to observe the timeline (dedup logic, drift analysis, audit trails, etc.) currently has no observation point: by the time a consumer reads, intermediate states are gone.

Proposal

Append-only snapshot table plus a periodic sweeper that mirrors pipeline::dreaming::sweeper in shape, but does no LLM call and no transformation.

Schema

CREATE TABLE engine.companion_insights_snapshot (
    id              UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID        NOT NULL,
    insights        JSONB       NOT NULL,
    training_level  DOUBLE PRECISION NOT NULL,
    captured_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_companion_insights_snapshot_user_time
    ON engine.companion_insights_snapshot (user_id, captured_at DESC);

Sweeper

Background tokio::spawn'd task:

  • Tick: SNAPSHOT_TICK_SECS (default 300), SNAPSHOT_DISABLED=1 to disable
  • Each tick: for every user whose companion_insights.updated_at > MAX(captured_at) (for that user), insert one row carrying the current insights + training_level
  • Optional cheap optimisation: skip the insert if the JSONB equals the latest existing snapshot — the UPSERT in InsightRepo::merge bumps updated_at even when the merged JSONB didn't actually change, so polling alone would produce duplicate rows

Explicitly not in scope:

  • No LLM calls
  • No dedup / merge / classification logic
  • No writes to companion_insights or human_insights

This table is pure write-through history capture. Whatever policy a consumer wants (collapsing near-duplicates, semantic merge, retention) is theirs to build on top.

Why a separate table

companion_insights is the "freshest merged JSONB the prompt-builder should read" view, and that role wants single-row UPSERT semantics. Turning it into a stream would either bloat the prompt read path or force a more expensive query. Splitting reads (companion_insights) from history (companion_insights_snapshot) keeps each table single-purpose.

Open questions

  1. Polling vs UPSERT trigger. Polling matches the dreaming-lite cadence story and is simpler; a trigger has zero lag but adds write-path complexity. Polling preferred unless there's a known sub-minute-granularity need.
  2. Retention. My instinct is to leave TTL to operators rather than ship a built-in cleaner. Open to either.
  3. Initial cadence. 300s mirrors dreaming-lite; happy to take input on a saner default.

Happy to put up a PR if the approach lands.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions