chore: sync main with dev (PRs #48-#56 promoted) #58
Merged
dev is the integration branch for new work; it carries 0.4.21-dev and
promotes to 0.4.21 (suffix dropped) on stable release.
release-docker.yml now picks the moving tag by version suffix: -dev cuts
push :{version}-dev + :latest-dev (never :latest); stable cuts push
:{version} + :latest. One workflow, suffix-driven.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
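The suffix rule above can be sketched as a small Rust function. It only illustrates the tag-selection logic: the real rule lives in release-docker.yml, `docker_tags` is a hypothetical name, and the sketch assumes the version string arrives with its `-dev` suffix intact (e.g. `0.4.21-dev`).

```rust
/// Hypothetical helper mirroring the suffix-driven tag rule:
/// "-dev" versions push `{version}-dev` + `latest-dev` (never `latest`),
/// stable versions push `{version}` + `latest`.
fn docker_tags(version: &str) -> Vec<String> {
    match version.strip_suffix("-dev") {
        // Dev cut: the moving tag is latest-dev.
        Some(base) => vec![format!("{base}-dev"), "latest-dev".to_string()],
        // Stable cut: suffix already dropped, the moving tag is latest.
        None => vec![version.to_string(), "latest".to_string()],
    }
}
```

Keying on the suffix instead of on the branch keeps a single workflow for both channels, and a `-dev` build has no code path that moves `:latest`.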
…elds (#44) Opt-in chat-reply output filter (output_filter + [tasks.chat_output_filter]: model/fallback/retry_depth/filter_prompt/trigger/timing) + new SSE final fields (filtered, prompt_injected, tier, retries_chat, retries_filter). Codex P2s addressed (gated-traits trigger, task-level filter token docs).
…tes (#45) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Optional tips_amount_usd on POST /comp/chat/{id}/message/stream: companion always replies (never ghosted) with an amount-aware, tip_personality-flavored prompt fragment. Empty content allowed for standalone tips (persisted as a "(打赏 $N)" marker, i.e. "(tip $N)"); PDE rule-0 guard forces Reply with Neutral/Tsundere baseline style; free-form tip_personality injected verbatim. No affinity special-casing, no new endpoint, no migration. Spec: docs/superpowers/specs/2026-05-26-tips-stream-reply-design.md
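A minimal sketch of the two tip rules above, assuming a hypothetical `Decision` enum for the PDE outcome and hypothetical helpers `persisted_user_content` and `apply_rule_zero`. The source only states that a tip forces Reply and that tip-only empty content is stored as a "(打赏 $N)" marker; how empty content without a tip is handled is an assumption here.

```rust
/// Hypothetical PDE outcome; the real type is not shown in the PR.
#[derive(Debug, PartialEq)]
enum Decision {
    Reply,
    Ghost,
}

/// Content persisted for the user row: the typed text, or a
/// "(打赏 $N)" ("tip $N") marker when a tip arrives with empty content.
fn persisted_user_content(content: &str, tips_amount_usd: Option<f64>) -> Option<String> {
    match (content.trim().is_empty(), tips_amount_usd) {
        (false, _) => Some(content.to_string()),
        (true, Some(amount)) => Some(format!("(打赏 ${amount})")),
        // Assumption: empty content without a tip is rejected upstream.
        (true, None) => None,
    }
}

/// Rule-0 guard: a tipped turn is never ghosted.
fn apply_rule_zero(tips_amount_usd: Option<f64>, proposed: Decision) -> Decision {
    if tips_amount_usd.is_some() {
        Decision::Reply
    } else {
        proposed
    }
}
```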
Stylized scheme: 0.4.20/0.4.21 read as 0.4.2 / 0.4.2-1, so the next track after the 0.4.2x line is 0.4.3 (→ 0.4.3-dev), not 0.4.22. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
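The stylized reading can be checked with a short sketch. It assumes, as an interpretation rather than a documented rule, that a two-digit patch `2x` reads as track `2` with revision `x` (revision 0 dropped), and it returns `None` for single-digit patches, which the scheme above does not cover. `stylized` and `next_track_dev` are hypothetical helpers.

```rust
/// Parses "major.minor.patch" into numbers; None if malformed.
fn parse_semver(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Assumed reading: patch "2x" -> track 2, revision x (revision 0 omitted).
fn stylized(version: &str) -> Option<String> {
    let (major, minor, patch) = parse_semver(version)?;
    if patch < 10 {
        return None; // single-digit patches are outside the assumed scheme
    }
    let (track, rev) = (patch / 10, patch % 10);
    Some(if rev == 0 {
        format!("{major}.{minor}.{track}")
    } else {
        format!("{major}.{minor}.{track}-{rev}")
    })
}

/// The next track bumps the tens digit of the patch, then marks it -dev.
fn next_track_dev(version: &str) -> Option<String> {
    let (major, minor, patch) = parse_semver(version)?;
    if patch < 10 {
        return None;
    }
    Some(format!("{major}.{minor}.{}-dev", patch / 10 + 1))
}
```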
The release-docker workflow builds linux/amd64 only (arm64 + qemu were dropped as of v0.4.20); README still claimed multi-arch. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…_traits metadata (#52)
* docs(spec): tip role (gift_user) + chat-reply filter audit columns
Design for issue #51 plus persisting the chat-reply output filter's pre-rewrite text and metadata. Bundles into one chat_messages migration:
- metadata JSONB — tip rows carry {tips_amount_usd: X}; BFF history exposes the structured amount, role flips to gift_user.
- pre_filter_content / filter_model / filter_triggers / f_client_msg_id / f_generation_id — written only on filtered-success assistant rows.
Supersedes 2026-05-25-chat-output-filter-design.md §2.6 (in-memory-only original). No DTO surface for the filter audit columns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(store): migration 0019 — tip metadata + filter audit columns
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(store): upsert_user_message_idempotent takes role + metadata
* feat(store): FilterAudit struct + assistant insert binds 5 audit columns
* refactor(store): FilterAudit.f_generation_id is Option<String>
Allows the SQL NULL to propagate when OpenRouter's filter response omits generation_id. Avoids an .unwrap_or_default() at the Task 7 call site that would have stored "" for a legitimately-missing value.
* feat(llm): should_filter returns Option<TriggerHits> with hit detail
- Add TriggerHits { random, models, traits } + RandomHit { p, draw } types (skip_serializing_if = Option::is_none so stored JSONB only includes fired predicates)
- Change should_filter(…, random_pass: bool) -> bool to should_filter(…, random_draw: Option<f64>) -> Option<TriggerHits>
- Change turn_level_pass(random_pass: bool, …) signature to turn_level_pass(random_draw: Option<f64>, …)
- Absent-predicate trait hits recorded as empty vec (nothing to enumerate when predicate fires on non-presence)
- 5 new tests + should_filter_predicate_combinations updated for new API
- eros-engine-server pipeline/stream.rs still calls old bool API; that call site is intentionally deferred to Task 7
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(llm): guard random-misuse fallback + empty TriggerHits JSON shape
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(stream): tip path persists role=gift_user + tips_amount_usd metadata
* test(stream): pass role + metadata to upsert + filter_audit: None to AssistantInsert
* feat(stream): filtered-success branch writes FilterAudit (5 columns)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(stream): filter_triggers serialize uses .expect + document MutexGuard drop
* feat(bff): expose tips_amount_usd on history rows
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(spec): note 2026-05-26 supersedes 2026-05-25 §2.6 in-memory-only claim
* chore: cargo fmt + regen openapi
* feat(stream): record kept prompt_traits in chat_messages.metadata on every assistant row
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(pipeline): widen compute_signals role filter to include gift_user (codex P2)
Tip turns persisting as gift_user (PR #52 / spec §3.1) were no longer counted by compute_signals_for_session, skewing message_count and hours_since_last_message signals. Widen both queries to role IN ('user','gift_user'), same pattern as the upsert dedup widening.
Two sqlx::test cases added in pipeline::tests:
- signals_count_includes_gift_user_rows: seeds 1 user + 1 gift_user row, asserts message_count == 2
- signals_count_user_only_rows: baseline regression for pure user rows
* feat(metadata): record user tier-at-time on chat_messages + lock BFF surface to tips_amount_usd
- companion_stream + drive_chat_burst now include {"tier": "<x>"} in chat_messages.metadata when the request carried a tier. Reason: tier table only has the user's CURRENT tier; the row should record what tier they had at message time.
- BFF history negative test confirms only tips_amount_usd is surfaced; tier / prompt_traits / raw metadata all stay audit-only.
- Spec §3.4 / §3.5 amended.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(pipeline): narrow signals query to tip-flagged gift_user rows only (codex P2 v2)
Previous fix (5f5c09b) widened too far — it counted all gift_user rows including legacy in-app-gift rows written by routes/companion.rs:827 via append_message. Those rows lack tip metadata and never counted as user activity pre-PR.
Narrow to: role = 'user' OR (role = 'gift_user' AND metadata ? 'tips_amount_usd') so only the new tip-replacing path counts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
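A std-only sketch of the new `should_filter` shape described in #52. `FilterTrigger` and its fields are hypothetical stand-ins for the `[tasks.chat_output_filter]` trigger config, the `serde` `skip_serializing_if` attributes are left out to keep the example dependency-free, and the commit's rule for absent-predicate trait hits (recorded as an empty vec) is not modelled.

```rust
#[derive(Debug, Clone, PartialEq)]
struct RandomHit {
    p: f64,
    draw: f64,
}

/// Only fired predicates are Some, matching the sparse JSONB intent.
#[derive(Debug, Default, PartialEq)]
struct TriggerHits {
    random: Option<RandomHit>,
    models: Option<Vec<String>>,
    traits: Option<Vec<String>>,
}

/// Hypothetical trigger config; field names are illustrative.
struct FilterTrigger {
    random_p: Option<f64>,
    models: Vec<String>,
    traits: Vec<String>,
}

fn should_filter(
    trig: &FilterTrigger,
    model: &str,
    prompt_traits: &[String],
    random_draw: Option<f64>,
) -> Option<TriggerHits> {
    let mut hits = TriggerHits::default();
    // Random predicate fires only when both a probability and a draw exist;
    // a configured p with no draw is treated as "not fired".
    if let (Some(p), Some(draw)) = (trig.random_p, random_draw) {
        if draw < p {
            hits.random = Some(RandomHit { p, draw });
        }
    }
    if trig.models.iter().any(|m| m == model) {
        hits.models = Some(vec![model.to_string()]);
    }
    // Record which of the request's traits matched the configured list.
    let matched: Vec<String> = prompt_traits
        .iter()
        .filter(|t| trig.traits.iter().any(|x| x == *t))
        .cloned()
        .collect();
    if !matched.is_empty() {
        hits.traits = Some(matched);
    }
    // No predicate fired: no filter pass for this turn.
    if hits == TriggerHits::default() {
        None
    } else {
        Some(hits)
    }
}
```

Returning `Option<TriggerHits>` instead of `bool` lets the same value both gate the filter and be stored as the `filter_triggers` audit column.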
…depth + model recommendations (#53)
* feat(v0.5.0): reasoning on filter + configurable chat retry_depth
Item 1: Add reasoning: Option<ReasoningConfig> to ResolvedOutputFilter. resolve_output_filter() now reads it from [tasks.chat_output_filter] (task-level only, no per-tier override). run_output_filter() in stream.rs forwards it to the ChatRequest instead of relying on ..Default::default().
Item 2: Add retry_depth: u32 to ResolvedModel. resolve() computes it via tier > task > default 2 and truncates fallback_model in place before returning. Removes MAX_STREAM_FALLBACK_DEPTH=3 constant and the .take(MAX_STREAM_FALLBACK_DEPTH) call from drive_chat_burst — the chain is now [primary] + the already-capped fallback_model. Default of 2 gives the same 3-entry chain as before. Tier-overridable.
Six new unit tests cover both items.
* docs(model-config): rewrite chat_output_filter model recommendations
gpt-5.4-nano primary (fast, stable). gemini-3.1-flash / zlm-4.7-flash fallbacks (real error responses -> fail-open works). Warn against gpt-4.1-nano (200-with-refusal masks failure) and haiku-4.5 (strict output alignment refuses to filter).
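The tier > task > default precedence and the in-place truncation from Item 2 can be sketched as follows. `resolve_retry_depth` and `build_chain` are hypothetical names for illustration, not the crate's `resolve()` or `drive_chat_burst`.

```rust
/// tier > task > default 2, as described for ResolvedModel.retry_depth.
fn resolve_retry_depth(tier: Option<u32>, task: Option<u32>) -> u32 {
    tier.or(task).unwrap_or(2)
}

/// [primary] + fallbacks truncated to retry_depth entries.
fn build_chain(primary: &str, mut fallback: Vec<String>, retry_depth: u32) -> Vec<String> {
    // Truncate in place, the same effect as the removed .take() cap.
    fallback.truncate(retry_depth as usize);
    let mut chain = Vec::with_capacity(1 + fallback.len());
    chain.push(primary.to_string());
    chain.extend(fallback);
    chain
}
```

With the default depth of 2 and a longer fallback list, the chain is `[primary, fb1, fb2]`: the same three entries the old `MAX_STREAM_FALLBACK_DEPTH = 3` cap produced, but now tier-overridable.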
…in exhaustion (#54)
* feat(store): error_handling_config kv table + 10-phrase seed (codex-generated)
Add migration 0020 creating engine.error_handling_config (kind TEXT PK, payload JSONB) and seeding 10 casual pseudo-ghost phrases for the chat-stream failure fallback path.
Add ErrorHandlingRepo::pick_chat_stream_fallback_phrase() helper with rand::seq::SliceRandom-based random selection. Returns None on missing row, empty array, or DB error so callers can fall back to the raw Error frame as a last resort.
Three migration-level tests: seed count (exactly 10), picker round-trip against seed, picker returns None when kind deleted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(stream): pseudo-ghost fallback on chain exhaustion
When the chat-stream fallback chain exhausts, pick a configured phrase from engine.error_handling_config and emit Meta + Delta + Done frames as if the LLM returned a brief reply, instead of an Error frame. The assistant row is persisted with metadata.fallback_reason='stream_failure' for audit. Falls back to the original Error frame as a last resort if the config lookup fails.
outcome.retries_chat is set to chain.len() so the Final frame correctly reflects all retries exhausted. Both live mode and filtered mode chain-exhaustion paths are covered.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(spec): error fallback config design
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: Cargo.lock update for rand 0.8 in eros-engine-store
* fix(store): supabase RLS + revoke lockdown on error_handling_config (codex P2)
Mirrors the 0013/0015 pattern: conditional REVOKE ALL from anon/authenticated, ENABLE ROW LEVEL SECURITY. Defense-in-depth for Supabase deployments that expose the engine schema via PostgREST.
Also strips trailing whitespace from the spec doc.
* fix(stream): persist pseudo-ghost row with model: None for replay idempotency (codex P2)
Live stream emits Meta with model: None on the pseudo-ghost path. Persisting model: Some("__fallback_phrase__") meant replay_stream would feed that sentinel through display_override and surface a different meta.model than the original stream — a violation of the idempotent replay contract.
Drop the sentinel; metadata.fallback_reason carries the audit signal.
* fix(stream): pseudo-ghost retries_chat semantics + continues_from link (codex P2)
Two findings from the second codex pass:
1. retries_chat over-reported: chain.len() includes the primary attempt; the field is documented as fallback retries consumed (0 when primary served). Fix both call sites to chain.len() - 1, and pass the same corrected value through to the metadata audit field.
2. continues_from was always None on the pseudo-ghost frame + persisted row. In live mode, the previous truncated bubble is already persisted and visible to the client; the pseudo-ghost should link to it so the replay path stitches the burst into one logical turn. Filtered mode leaves it None — that path never persists intermediate truncations.
* fix(stream): replace produced list with pseudo-ghost on exhaustion (codex P2)
When live mode exhausts the chain, outcome.produced still held the failed truncated attempts. Post-process (memory / affinity / insight extraction) would then run on those partial garbage outputs instead of the safe fallback phrase the user actually saw — and the old Error path bypassed post-process entirely, so this was a behavioral regression introduced by the pseudo-ghost path.
Fix: helper now returns the produced message alongside the frames; call sites clear outcome.produced and push only the pseudo-ghost before yielding success frames. Filtered mode never populated produced, so clear() is a no-op there.
* fix(stream): replay omits meta.model when persisted row.model is None (codex P2)
Live stream emits Meta with model: None on the pseudo-ghost path. Previous replay_stream code did display(row.model.as_deref().unwrap_or_default()) which under model_name_display_override = true / fixed-string / map.default configurations would produce a non-None meta.model on replay, breaking wire-identical idempotent retry.
Fix: only call display(...) when row.model is Some; otherwise emit None to mirror the live emission. Existing replay tests still pass; the display-override test continues to assert the Some(model) path correctly.
* docs(spec): document inherited Final-frame replay divergence (codex P2 ack)
Codex flagged that replay_stream emits Final with retries_chat=0, tier=None, prompt_injected=None on the pseudo-ghost path. That's the same divergence 2026-05-25-chat-output-filter-design.md §2.8 explicitly accepted for every completed turn — none of these Final-frame fields are persisted, so replay reconstructs them from current state rather than the original wire shape.
The pseudo-ghost row DOES persist these values in metadata (audit-only). A future PR can extend replay_stream to read metadata.retries_chat / metadata.tier / metadata.prompt_traits if wire-identical Final replay becomes a requirement. Not in scope for this PR.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
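The exhaustion handling in #54 reduces to a small decision, sketched below under two assumptions: the phrase list has already been loaded (`None` standing in for a missing row or DB error), and the `rand`-based pick is replaced by an injected index so the sketch needs no external crate. `Exhausted`, `on_chain_exhausted` and `replace_produced` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Exhausted {
    /// Emit Meta (model: None) + Delta(text) + Done, and persist the row
    /// with metadata.fallback_reason = "stream_failure".
    PseudoGhost { text: String, retries_chat: usize },
    /// Last resort when no phrase is available: the original Error frame.
    ErrorFrame,
}

/// Picks a phrase by index instead of rand::seq::SliceRandom
/// (assumption to stay std-only). An empty list yields None.
fn pick_fallback_phrase(phrases: &[String], draw: usize) -> Option<&str> {
    if phrases.is_empty() {
        return None;
    }
    Some(phrases[draw % phrases.len()].as_str())
}

/// `phrases` is None when the config row is missing or the lookup failed.
fn on_chain_exhausted(phrases: Option<&[String]>, chain_len: usize, draw: usize) -> Exhausted {
    match phrases.and_then(|p| pick_fallback_phrase(p, draw)) {
        Some(text) => Exhausted::PseudoGhost {
            text: text.to_string(),
            // retries_chat counts fallback retries only, so the primary
            // attempt is excluded: chain.len() - 1.
            retries_chat: chain_len.saturating_sub(1),
        },
        None => Exhausted::ErrorFrame,
    }
}

/// Post-process must see only the phrase the user saw, not truncated attempts.
fn replace_produced(produced: &mut Vec<String>, ghost: &str) {
    produced.clear();
    produced.push(ghost.to_string());
}
```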
* feat(llm): surface finish_reason on non-streaming ChatResponse
Lets callers gate on content_filter (Gemini-style mid-response safety
truncation, also used by OpenAI). Wire-level WireChoice gains the
field; ChatResponse exposes it as Option<String>. Default None for
existing constructors.
* feat(filter): defensive output validity gate + per-model chain walk
run_output_filter no longer trusts any HTTP 200 from the filter LLM.
After each per-model response, run filter_output_invalidity:
- refusal pattern in leading 120 chars (curated list, zh + en)
- response < 80 chars (short-and-refusal-verb OR plain too-short)
- finish_reason = content_filter (Gemini/OpenAI safety blocking)
On any of these, log the rejection and walk to the next model in the
chain. If the whole chain exhausts, return None as before (fail-open:
emit and persist the original reply). retries_filter index reflects
the model that passed validity, not just one that responded 200.
* docs(spec): chat_output_filter output validity gate design
* fix(filter): validity gate matches refusal patterns case-insensitively (codex P2)
Codex caught: original case-sensitive contains check missed common
English refusal variants like 'i'm sorry, but i can't ...' (lowercase
i) or 'as an ai ...' (lowercase a) — both real-world model outputs.
The 200-char-plus apology would slip past the gate and be persisted
as the filtered rewrite, which is exactly what this feature is meant
to prevent.
Fix: lowercase the head (and the short-text body) before contains.
Pattern table moved to lowercase form. char::to_lowercase is
Unicode-aware; CJK code points are unchanged, so Chinese patterns
still match exactly. Added a regression test covering lower / mixed /
upper case English apology shapes.
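A sketch of the validity gate with the case-insensitive fix applied. The pattern list is illustrative (the real curated zh + en list is not reproduced here), the "short-and-refusal-verb OR plain too-short" rule is simplified to a single 80-char threshold, and the function body is an assumption; only the name `filter_output_invalidity` and the reason strings come from the commits.

```rust
const HEAD_CHARS: usize = 120;
const MIN_CHARS: usize = 80;

/// Illustrative patterns, stored lowercase; CJK entries are unaffected by
/// lowercasing, so they still match exactly.
const REFUSAL_PATTERNS: &[&str] = &["i'm sorry", "i can't", "as an ai", "作为一个ai", "我无法"];

/// Returns Some(reason) when the filter output must be rejected.
fn filter_output_invalidity(text: &str, finish_reason: Option<&str>) -> Option<&'static str> {
    if finish_reason == Some("content_filter") {
        return Some("content_filter");
    }
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return Some("empty");
    }
    // Lowercase only the leading 120 chars before matching refusal patterns.
    let head = trimmed.chars().take(HEAD_CHARS).collect::<String>().to_lowercase();
    if REFUSAL_PATTERNS.iter().any(|p| head.contains(*p)) {
        return Some("refusal_pattern");
    }
    if trimmed.chars().count() < MIN_CHARS {
        return Some("too_short");
    }
    None
}
```

Counting with `chars()` rather than `len()` keeps the thresholds in characters, so a Chinese reply is not judged by its UTF-8 byte length.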
* feat(filter): record fail-open audit in chat_messages.metadata
When the validity gate rejects every model in the filter chain (or all
models error/timeout), the engine falls open and emits the original
reply — but now also writes a fail-open audit bag into metadata so ops
can count fail-open rate per period and see which models are refusing.
New metadata keys (only present when filter was triggered AND every
model failed):
filter_outcome = "fail_open"
f_client_msg_id = engine-generated ULID for this logical call
filter_attempts[] = [{model, reason}] per chain attempt
Reasons: refusal_pattern / too_short / content_filter / empty / error /
timeout. Trigger-not-fired and filter-not-configured rows stay
metadata-clean (filter_outcome absent), so ops can SELECT * FROM
chat_messages WHERE metadata->>'filter_outcome' = 'fail_open' to find
exactly the failure cases.
run_output_filter now returns Result<RunFilterOutcome, FilterFailOpen>
carrying the per-attempt audit log on the Err side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
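The per-model chain walk and its fail-open audit can be sketched generically. `call` stands in for the per-model LLM request and `validate` for the validity gate; both are hypothetical closures, and the struct fields are guesses at the shapes of `RunFilterOutcome` and `FilterFailOpen` rather than their real definitions.

```rust
#[derive(Debug, Clone, PartialEq)]
struct FilterAttempt {
    model: String,
    reason: String, // e.g. refusal_pattern / too_short / error / timeout
}

#[derive(Debug, PartialEq)]
struct RunFilterOutcome {
    text: String,
    model: String,
    retries_filter: usize, // index of the model that passed validity
}

#[derive(Debug, PartialEq)]
struct FilterFailOpen {
    attempts: Vec<FilterAttempt>,
}

fn run_output_filter<F, V>(
    chain: &[&str],
    mut call: F,
    validate: V,
) -> Result<RunFilterOutcome, FilterFailOpen>
where
    // Ok((text, finish_reason)) or Err(reason) such as "error" / "timeout".
    F: FnMut(&str) -> Result<(String, Option<String>), String>,
    // Some(reason) rejects the output; None accepts it.
    V: Fn(&str, Option<&str>) -> Option<String>,
{
    let mut attempts = Vec::new();
    for (idx, &model) in chain.iter().enumerate() {
        let reason = match call(model) {
            Ok((text, finish_reason)) => {
                let verdict = validate(&text, finish_reason.as_deref());
                match verdict {
                    None => {
                        return Ok(RunFilterOutcome { text, model: model.to_string(), retries_filter: idx });
                    }
                    Some(reason) => reason,
                }
            }
            Err(reason) => reason,
        };
        // Rejected or failed: record it and walk to the next model.
        attempts.push(FilterAttempt { model: model.to_string(), reason });
    }
    // Whole chain exhausted: caller emits the original reply (fail-open).
    Err(FilterFailOpen { attempts })
}
```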
* docs(spec): prompt enhancements + scope persistence design
Adds memory_scope/affinity_scope to chat_messages.metadata (pre-validation on user
rows, resolved on assistant rows) plus a raw prompt_traits audit on the user side
to surface frontend/backend allow-list mismatches.
Rewrites prompt.rs section headers to ASCII brackets (lighter on tokens, easier
to skim), adds a [recent_conversation] block carrying the prior three turn pairs,
and revises the iron rules: new positive-frame ⓪ in English plus a Japanese rewrite
of ③ that lists the actual pronoun and filler-word inventory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(store): recent_turn_pairs for short-term memory injection
Returns up to N (user_or_gift_user, assistant) content pairs from a session,
filtered by truncated=false and capped at a cutoff timestamp. Used by the
chat pipeline to render [recent_conversation] in the system prompt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
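An in-memory sketch of the pairing that `recent_turn_pairs` is described as doing. The real method queries Postgres; here the rows are assumed sorted by `sent_at`, a user or gift_user row pairs with the next non-truncated assistant row, and the cutoff is exclusive. `Row` and this function body are hypothetical.

```rust
/// Hypothetical stand-in for a chat_messages row.
#[derive(Debug)]
struct Row {
    role: &'static str,
    content: String,
    truncated: bool,
    sent_at: u64,
}

/// Up to `n` most recent (user_or_gift_user, assistant) pairs before `cutoff`,
/// oldest first. Rows are assumed sorted by sent_at ascending.
fn recent_turn_pairs(rows: &[Row], cutoff: u64, n: usize) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    let mut pending_user: Option<&Row> = None;
    for row in rows.iter().filter(|r| !r.truncated && r.sent_at < cutoff) {
        match row.role {
            "user" | "gift_user" => pending_user = Some(row),
            "assistant" => {
                if let Some(u) = pending_user.take() {
                    pairs.push((u.content.clone(), row.content.clone()));
                }
            }
            _ => {}
        }
    }
    // Keep only the newest n pairs, preserving chronological order.
    let skip = pairs.len().saturating_sub(n);
    pairs.split_off(skip)
}
```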
* refactor(prompt): rename 16 section headers to ASCII brackets
【...】 → [...]. Same section order, same line breaks, same conditional
blocks — string substitution only. Saves a small amount of tokens per
turn and makes the prompt code easier to skim for non-CN readers. Cache
prefix boundary unchanged; per-persona stable-prefix tests still pass.
Also updates the one cross-module test assertion in pipeline/stream.rs
that pinned the old 【刚收到的打赏】 ("just-received tip") literal, so the full server crate
stays green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: sweep stale 【...】 rustdoc references after header rename
Three doc-comments still named prompt sections by their old Chinese
literal — types.rs PromptTrait, handlers.rs hydrate_user_profile_bullets,
and routes/companion.rs PromptTraitDto. Doc-only fixup, no behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fixup(prompt): finish header rename — openapi snapshot + public docs + test strings
After the 16-header rename:
- regenerate openapi.json so PromptTraitDto.text description reflects
[additional_guidance]; without this the openapi-snapshot CI check fails
- update docs/prompt-traits.{md,zh.md} which still described the layout
using 【附加指引】 / 【擅长话题】 / 【今日情境】 (additional guidance / strong topics / today's context) as live header names
- update 3 stale test comment / panic-message strings in prompt.rs that
still narrated the old Chinese labels
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fixup(openapi): regen snapshot cleanly (drop stray cargo log lines)
Prior fixup ran `cargo run > openapi.json` without --quiet/2>/dev/null,
which leaked 3 cargo build lines into the snapshot and broke JSON parse.
This regen uses --quiet so only the printer's JSON reaches the file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(prompt): [recent_conversation] short-term memory block
build_prompt now takes recent_turns: &[(user, assistant)]. When non-empty,
renders a [recent_conversation] block between [now] and [iron_rules]
showing the last N turn pairs in 用户:X / {name}:Y form (用户 = "user"). Empty slice
omits the entire block including its header.
Call sites pass &[] for now; population wires up in a follow-up task.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fixup(prompt): empty recent_turns emits "\n" to preserve pre-PR blank line
Byte-stable empty-path: "[now]\n{tc}\n\n---" stays one blank line whether
[recent_conversation] renders or not. Single-char change, no behavior shift.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
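One way to get the byte-stable empty path, sketched with hypothetical helpers: the block renderer returns a lone `\n` when there are no turns, so the surrounding template still yields exactly `[now]\n{tc}\n\n---`. The non-empty layout here is an assumption, not the layout `build_prompt` actually emits.

```rust
/// Renders the [recent_conversation] block; a lone "\n" when empty so the
/// template keeps its pre-PR single blank line.
fn render_recent_conversation(turns: &[(String, String)], name: &str) -> String {
    if turns.is_empty() {
        return "\n".to_string();
    }
    let mut out = String::from("\n[recent_conversation]\n");
    for (user, reply) in turns {
        out.push_str(&format!("用户:{user}\n{name}:{reply}\n"));
    }
    out.push('\n');
    out
}

/// Hypothetical slice of the system prompt around the block.
fn now_section(tc: &str, recent: &str) -> String {
    format!("[now]\n{tc}\n{recent}---")
}
```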
* feat(prompt): iron rule ⓪ (positive identity) + Japanese rewrite of ③
⓪ "You are a real person ..." prepended for positive-frame identity
reinforcement (it's easier to say what to BE than what NOT to say).
③ rewritten in Japanese — Japanese has a richer pronoun inventory and
filler-word inventory than Chinese, so the rule lists the specific
alternates and quantifies "consecutive several" as "three or more
sentences". Reviewed via codex pass before commit.
Other iron rules ① ② ④ ⑤ ⑥ ⑦ ⑧ untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(stream): persist raw memory_scope / affinity_scope / prompt_traits
User and gift_user rows now carry memory_scope_raw / affinity_scope_raw /
prompt_traits_raw in chat_messages.metadata when the request supplied
them. These hold the pre-validation, pre-resolve frontend payload, so
operators can diff against the post-resolve values on the matching
assistant row (Task 6) to spot allow-list misconfiguration or field-shape
drift between frontend and backend.
Keys are omitted when the source request field is None — JSONB stays sparse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(stream): persist resolved memory_scope / affinity_scope on assistant rows
build_metadata and the pseudo-ghost fallback now write memory_scope
(snake_case enum string) and affinity_scope (6-bool record) into
chat_messages.metadata for every assistant row. Pairs with the _raw
values written on the matching user/gift_user row to enable a single
metadata->>'...' diff that surfaces frontend/backend allow-list or
shape mismatches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(handlers): inject [recent_conversation] into per-turn system prompt
handlers.rs now fetches the prior 3 (user|gift_user, assistant) pairs via
ChatRepo::recent_turn_pairs at each build_prompt call site (chat + gift)
and threads them in. Cutoff = Utc::now() so the current-turn user row is
excluded from its own [recent_conversation] block.
Fetch failures degrade to empty (no short-term memory) with a warn-level
log — prompt assembly is non-fatal.
Completes the short-term memory layer: the system prompt now carries
long-term facts ([user_profile]), mid-term memories ([shared_memories]),
and the literal last three exchanges ([recent_conversation]).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* style: cargo fmt
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handlers): cutoff [recent_conversation] at current user row's sent_at
Codex P2 on PR #56: Utc::now() cutoff is racy under concurrent streams on
the same session — a later already-completed turn could leak into the
current turn's [recent_conversation] block.
Adds ChatRepo::recent_turn_pairs_before_message which subqueries the
current msg's sent_at as the cutoff. handlers.rs threads user_message_id
through build_reply_request / build_gift_request to use it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
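The race behind the Codex P2 and its fix can be shown with a toy model. `Msg`, `cutoff_for` and `visible_before` are hypothetical; in the real code the cutoff comes from a subquery on the current message's `sent_at` inside `ChatRepo::recent_turn_pairs_before_message`.

```rust
/// Hypothetical message row: id and send time (seconds).
struct Msg {
    id: u32,
    sent_at: u64,
}

/// Cutoff taken from the current user message, not the wall clock.
fn cutoff_for(messages: &[Msg], user_message_id: u32) -> Option<u64> {
    messages.iter().find(|m| m.id == user_message_id).map(|m| m.sent_at)
}

/// Ids of messages strictly before the cutoff.
fn visible_before(messages: &[Msg], cutoff: u64) -> Vec<u32> {
    messages.iter().filter(|m| m.sent_at < cutoff).map(|m| m.id).collect()
}
```

In the usage scenario below, turn A's user row (id 10, t=10) is still generating when a concurrent turn B (ids 20 and 21, t=11..12) completes on the same session. A wall-clock cutoff at t=13 lets turn B leak into A's prompt; the row-based cutoff does not.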
* chore: silence clippy::too_many_arguments on build_gift_request
The codex P2 fix added user_message_id to build_gift_request, pushing it
to 8 args (over clippy's default 7). build_reply_request stayed at 7 so
needs no allow. Pure attribute addition; no behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Absorb main's v0.4.21 squash node (4ebe078) without bringing in any of its content — main's logical content is already in dev's later commits. This fixes the ancestry-divergence conflict so the subsequent dev → main promote PR is mergeable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 26, 2026
Merged
Summary
Promotes everything that landed on dev since v0.4.21 into main. No version bump; that lands in a separate follow-up `chore(release): v0.5.0` PR.
This branch was created by `git merge -s ours origin/main` from dev, which absorbs main's prior v0.4.21 squash node (4ebe078) without bringing any of its content in. main's logical content was already present in dev's later commits. The merge is a no-op for content; it only repairs ancestry so this PR is mergeable.
What's included (PRs #48-#56)
Migrations
Two new: 0019_chat_tip_marker_and_filter_audit.sql, 0020_error_handling_config.sql. Downstream operators must re-migrate after the release tag.
Test plan
🤖 Generated with Claude Code