Commit b0709b8

enriquephl and claude authored
feat: v0.1.3 — fallback model array + OPENROUTER_USAGE_HIDDEN_KEYS (#22)
* feat(llm): plumb fallback_model as Vec<String> with FallbackSpec untagged enum

* feat(llm): walk full fallback_model chain on retry, filter empty candidates

* feat(server): add OPENROUTER_USAGE_HIDDEN_KEYS env-driven usage filter config

* feat(api): strip configured usage keys from sync /message response

* chore(examples): switch chat_companion to fallback array form

  The "fallback will be a list in a future iteration" TODO from PR #20 is no longer accurate — v0.1.3 implements the array. Switch the chat_companion task to the new array form to exercise the new code path in the example, leave the other tasks on the single-string form so the legacy compat path stays covered. Update the surrounding comment to describe the dual-shape contract + explicit-empty opt-out.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: document OPENROUTER_USAGE_HIDDEN_KEYS env

  EN + ZH README Configuration tables gain a row for the new deployer-set env var; .env.example gains a commented example explaining why a deployer might want to hide cost from a downstream-facing response without losing operator tracing.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(llm-audit): document OPENROUTER_USAGE_HIDDEN_KEYS

  Add a "Hiding fields from the response" section under the Outbound usage echo block (EN + ZH). Explains the env var semantics, scope (sync /message only), tracing-unaffected guarantee, the top-level-only behaviour (list a parent key to suppress its entire subtree), and the pass-through default.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump workspace to 0.1.3 + regen OpenAPI snapshot

  Ships the fallback-model array support (TaskConfig.fallback can be a single string or an ordered array; OpenRouterClient::execute walks the chain sequentially) plus the OPENROUTER_USAGE_HIDDEN_KEYS deployer-set env var that strips named keys from the sync /message response's usage echo while leaving tracing intact. crates.io republish + GHCR build + fly deploy follow this commit's merge.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(examples): extend fallback array to insight + memory extraction

  User-driven follow-up to T5: insight_extraction now fans out to two fallback models (anthropic/claude-haiku-4.5 then deepseek/deepseek-v4-flash) instead of one. memory_extraction primary swapped to x-ai/grok-4.1-fast with the same two-entry fallback chain. Exercises the new chain-walker on more code paths in production traffic.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 88d725b commit b0709b8

16 files changed

Lines changed: 682 additions & 72 deletions

.env.example

Lines changed: 7 additions & 0 deletions
@@ -14,6 +14,13 @@ OPENROUTER_API_KEY=sk-or-...
 # OPENROUTER_APP_REFERER=https://eros.example
 # OPENROUTER_APP_TITLE=Eros
 
+# OpenRouter response-side usage filter (optional). Comma-separated list
+# of top-level keys to strip from the sync /message response's `usage`
+# object before it leaves the engine. Tracing is unaffected. Hide your
+# wholesale cost from downstream customers without losing operator
+# observability.
+# OPENROUTER_USAGE_HIDDEN_KEYS=cost,cost_details
+
 # Supabase project — same as eros-gateway.
 # Setting SUPABASE_URL is enough on modern projects: the engine derives the
 # JWKS URL (${SUPABASE_URL}/auth/v1/.well-known/jwks.json) and validates
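
The comment above describes a comma-separated list of top-level keys that are stripped from the `usage` object. The server code that does this is not part of this excerpt, so the following is only a minimal sketch of that behaviour. The function names `parse_hidden_keys` and `strip_usage_keys` are hypothetical, whitespace trimming and the dropping of empty entries are assumptions, and a `BTreeMap<String, String>` stands in for the real JSON object.

```rust
use std::collections::{BTreeMap, HashSet};

/// Parse a value such as "cost, cost_details" into a set of key names.
/// Assumption: whitespace around entries is trimmed and empty entries
/// are ignored; the engine's actual parser may behave differently.
pub fn parse_hidden_keys(raw: &str) -> HashSet<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|key| !key.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Remove every top-level entry whose key is listed in `hidden`.
/// Nested keys are never inspected, so listing a parent key hides its
/// whole subtree. The map is a hypothetical stand-in for the `usage`
/// JSON object.
pub fn strip_usage_keys(usage: &mut BTreeMap<String, String>, hidden: &HashSet<String>) {
    usage.retain(|key, _| !hidden.contains(key));
}
```

With `OPENROUTER_USAGE_HIDDEN_KEYS=cost,cost_details`, a `usage` map holding `cost` and `total_tokens` would keep only `total_tokens` after `strip_usage_keys`. An unset or empty variable yields an empty set, which leaves the map untouched and matches the pass-through default described in the commit message.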

Cargo.lock

Lines changed: 4 additions & 4 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ resolver = "2"
 members = ["crates/*"]
 
 [workspace.package]
-version = "0.1.2"
+version = "0.1.3"
 edition = "2021"
 license = "AGPL-3.0-only"
 repository = "https://github.com/etherfunlab/eros-engine"

README.md

Lines changed: 1 addition & 0 deletions
@@ -214,6 +214,7 @@ The `AuthValidator` trait is pluggable if you use a different identity provider.
 | `OPENROUTER_API_KEY` | yes | Chat completions, routed by `examples/model_config.toml` unless overridden. |
 | `OPENROUTER_APP_REFERER` | no | When set, sent as `HTTP-Referer` on every outbound OpenRouter call. Shows up on OpenRouter's app analytics dashboard. |
 | `OPENROUTER_APP_TITLE` | no | When set, sent as `X-Title`. Display name in OpenRouter app analytics. Pairs with `OPENROUTER_APP_REFERER`; both optional. |
+| `OPENROUTER_USAGE_HIDDEN_KEYS` | no | Comma-separated list of top-level keys to strip from the sync `/message` response's `usage` object. Useful for hiding wholesale `cost` / `cost_details` from downstream customers. Server-side tracing is unaffected. |
 | `VOYAGE_API_KEY` | yes | Embeddings. Empty keys fail server boot. |
 | `SUPABASE_URL` | no | Supabase project URL. Kept in `.env.example` for client/deploy conventions; the server does not read it today. |
 | `SUPABASE_JWT_SECRET` | yes | JWT signing secret for default auth. |

README.zh.md

Lines changed: 1 addition & 0 deletions
@@ -174,6 +174,7 @@ Server listens on `0.0.0.0:8080` by default. Scalar API docs at `/docs`, OpenAPI JSON
 | `OPENROUTER_API_KEY` || Chat completions; routed by `examples/model_config.toml` by default. |
 | `OPENROUTER_APP_REFERER` || When set, every outbound OpenRouter call carries `HTTP-Referer`. Shows up on OpenRouter's app analytics dashboard. |
 | `OPENROUTER_APP_TITLE` || When set, sent as `X-Title`. Display name in OpenRouter app analytics. Pairs with `OPENROUTER_APP_REFERER`; both are optional. |
+| `OPENROUTER_USAGE_HIDDEN_KEYS` || Comma-separated list of top-level keys to strip from the `usage` object of the sync `/message` response. Commonly used to hide wholesale `cost` / `cost_details` so they do not leak to downstream customers. Server-side tracing is unaffected. |
 | `VOYAGE_API_KEY` || Embeddings. The server refuses to start with an empty key. |
 | `SUPABASE_URL` || Supabase project URL. Kept in `.env.example` for client / deploy conventions; the server does not currently read it. |
 | `SUPABASE_JWT_SECRET` || JWT signing secret used by default auth. |

crates/eros-engine-llm/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ keywords = ["companion", "openrouter", "voyage", "embeddings", "llm"]
 categories = ["api-bindings"]
 
 [dependencies]
-eros-engine-core = { path = "../eros-engine-core", version = "0.1.2" }
+eros-engine-core = { path = "../eros-engine-core", version = "0.1.3" }
 serde = { workspace = true }
 serde_json = { workspace = true }
 reqwest = { workspace = true }

crates/eros-engine-llm/src/model_config.rs

Lines changed: 184 additions & 13 deletions
@@ -11,6 +11,26 @@ const FALLBACK_MODEL: &str = "x-ai/grok-4-mini";
 const FALLBACK_TEMPERATURE: f64 = 0.5;
 const FALLBACK_MAX_TOKENS: u32 = 200;
 
+/// Per-task fallback shape — accepts either a single model id (legacy)
+/// or an ordered array. Normalised to `Vec<String>` via `into_vec()`
+/// in the resolver; empty entries are filtered out.
+#[derive(Debug, Clone, PartialEq, Eq, Deserialize)]
+#[serde(untagged)]
+pub enum FallbackSpec {
+    Single(String),
+    Multiple(Vec<String>),
+}
+
+impl FallbackSpec {
+    pub fn into_vec(self) -> Vec<String> {
+        match self {
+            FallbackSpec::Single(s) if s.is_empty() => Vec::new(),
+            FallbackSpec::Single(s) => vec![s],
+            FallbackSpec::Multiple(v) => v.into_iter().filter(|s| !s.is_empty()).collect(),
+        }
+    }
+}
+
 #[derive(Debug, Clone, Default, Deserialize)]
 pub struct DefaultConfig {
     #[serde(default)]
@@ -30,9 +50,11 @@ pub struct TaskConfig {
     pub max_tokens: Option<u32>,
     #[serde(default)]
     pub description: String,
-    /// Secondary model identifier used if the primary fails.
+    /// Secondary model(s) tried in order on primary failure. Accepts a
+    /// single string (legacy) or an array. Empty (`""` or `[]`) is an
+    /// explicit opt-out and suppresses `defaults.fallback_model`.
     #[serde(default)]
-    pub fallback: Option<String>,
+    pub fallback: Option<FallbackSpec>,
     /// Embedding-only: vector dimensions.
     #[serde(default)]
     pub dimensions: Option<u32>,
@@ -47,10 +69,15 @@ pub struct ModelConfig {
 }
 
 /// Resolved model parameters for an LLM call.
+///
+/// `fallback_model` is intentionally singular-named even though it's a
+/// `Vec<String>`: semantically only ONE fallback is ever used per call
+/// (the chain is tried sequentially, first success wins). Plural naming
+/// would mislead readers into thinking the candidates run in parallel.
 #[derive(Debug, Clone)]
 pub struct ResolvedModel {
     pub model: String,
-    pub fallback_model: Option<String>,
+    pub fallback_model: Vec<String>,
     pub temperature: f64,
     pub max_tokens: u32,
 }
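
The doc comment on `ResolvedModel` says the chain is tried sequentially and the first success wins. The chain walker itself (`OpenRouterClient::execute`, per the commit message) is not shown in this excerpt, so the sketch below only illustrates that rule in isolation. The helper name `call_with_fallbacks`, its closure parameter, and the choice to return all collected errors are assumptions made for illustration, not the engine's API.

```rust
/// Try `primary` first, then each entry of `fallbacks` in order.
/// Returns the first successful result together with the model id that
/// produced it. If every candidate fails, the errors are returned in
/// attempt order. Hypothetical helper; not the engine's real signature.
pub fn call_with_fallbacks<T, E>(
    primary: &str,
    fallbacks: &[String],
    mut call: impl FnMut(&str) -> Result<T, E>,
) -> Result<(String, T), Vec<E>> {
    let mut errors = Vec::new();
    // Candidates are visited strictly one at a time, never in parallel.
    let candidates = std::iter::once(primary).chain(fallbacks.iter().map(String::as_str));
    for model in candidates {
        match call(model) {
            // First success wins: later candidates are never called.
            Ok(value) => return Ok((model.to_owned(), value)),
            Err(err) => errors.push(err),
        }
    }
    Err(errors)
}
```

An empty `fallbacks` slice degrades to a single attempt on the primary, which is what a task with an explicit empty `fallback` would get under this sketch.
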
@@ -87,9 +114,13 @@ impl ModelConfig {
             .or_else(|| self.defaults.fallback_model.clone())
             .unwrap_or_else(|| FALLBACK_MODEL.to_string());
 
-        let fallback_model = task_cfg
-            .and_then(|t| t.fallback.clone())
-            .or_else(|| self.defaults.fallback_model.clone());
+        // Precedence: explicit per-task `fallback` (even when empty)
+        // wins over defaults. Only an absent per-task value inherits
+        // the singleton from `[defaults] fallback_model`.
+        let fallback_model: Vec<String> = match task_cfg.and_then(|t| t.fallback.as_ref()) {
+            Some(spec) => spec.clone().into_vec(),
+            None => self.defaults.fallback_model.iter().cloned().collect(),
+        };
 
         let temperature = task_cfg
             .and_then(|t| t.temperature)
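
The precedence rule in the hunk above (an explicit per-task value wins even when it is empty, and only an absent value inherits the default) can be tried without serde or TOML. The sketch below mirrors the shape of `FallbackSpec` with a stand-in enum; `Fallback` and `resolve_fallbacks` are hypothetical names, and the `Option` arguments stand in for the parsed task and `[defaults]` sections.

```rust
/// Serde-free stand-in for the per-task `fallback` value: either a
/// single model id or an ordered list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Fallback {
    Single(String),
    Multiple(Vec<String>),
}

impl Fallback {
    /// Normalise to a list, dropping empty entries.
    pub fn into_vec(self) -> Vec<String> {
        match self {
            Fallback::Single(s) if s.is_empty() => Vec::new(),
            Fallback::Single(s) => vec![s],
            Fallback::Multiple(v) => v.into_iter().filter(|s| !s.is_empty()).collect(),
        }
    }
}

/// `task` is `None` when the task does not set `fallback` at all;
/// `default` plays the role of `[defaults] fallback_model`.
pub fn resolve_fallbacks(task: Option<&Fallback>, default: Option<&str>) -> Vec<String> {
    match task {
        // Present, even if it normalises to an empty list: defaults are ignored.
        Some(spec) => spec.clone().into_vec(),
        // Absent: inherit the default as a one-element list, if there is one.
        None => default.into_iter().map(str::to_owned).collect(),
    }
}
```

Under this sketch, `resolve_fallbacks(None, Some("default-fb"))` yields a one-element list, while passing `Some(&Fallback::Multiple(vec![]))` or `Some(&Fallback::Single(String::new()))` with the same default yields an empty list. That is the explicit opt-out the new tests in this commit assert.
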
@@ -175,6 +206,11 @@ max_tokens = 600
         let cfg = ModelConfig::from_toml_str(SAMPLE).unwrap();
         let r = cfg.resolve("nonexistent_task", None);
         assert_eq!(r.model, "x-ai/grok-4-mini");
+        assert_eq!(
+            r.fallback_model,
+            vec!["x-ai/grok-4-mini".to_string()],
+            "unknown task must inherit defaults.fallback_model as a singleton"
+        );
         assert_eq!(r.temperature, 0.5);
         assert_eq!(r.max_tokens, 200);
     }
@@ -259,8 +295,8 @@ description = "reserved — Voyage hard-codes its own model"
         let chat = cfg.tasks.get("chat_companion").unwrap();
         assert_eq!(chat.model, "x-ai/grok-4-fast");
         assert_eq!(
-            chat.fallback.as_deref(),
-            Some("deepseek/deepseek-chat-v3.2")
+            chat.fallback.clone().expect("fallback present").into_vec(),
+            vec!["deepseek/deepseek-chat-v3.2".to_string()]
         );
         assert_eq!(chat.temperature, Some(0.85));
         assert_eq!(chat.max_tokens, Some(600));
@@ -270,16 +306,20 @@ description = "reserved — Voyage hard-codes its own model"
         let insight = cfg.tasks.get("insight_extraction").unwrap();
         assert_eq!(insight.model, "x-ai/grok-4-mini");
         assert_eq!(
-            insight.fallback.as_deref(),
-            Some("deepseek/deepseek-chat-v3.2")
+            insight
+                .fallback
+                .clone()
+                .expect("fallback present")
+                .into_vec(),
+            vec!["deepseek/deepseek-chat-v3.2".to_string()]
         );
         assert_eq!(insight.temperature, Some(0.3));
         assert_eq!(insight.max_tokens, Some(400));
 
         // pde_decision — reserved, partial fields.
         let pde = cfg.tasks.get("pde_decision").unwrap();
         assert_eq!(pde.model, "x-ai/grok-4-mini");
-        assert_eq!(pde.fallback, None);
+        assert!(pde.fallback.is_none());
         assert_eq!(pde.temperature, Some(0.5));
 
         // embedding — reserved, with `dimensions` set.
@@ -291,8 +331,8 @@ description = "reserved — Voyage hard-codes its own model"
         let r = cfg.resolve("chat_companion", None);
         assert_eq!(r.model, "x-ai/grok-4-fast");
         assert_eq!(
-            r.fallback_model.as_deref(),
-            Some("deepseek/deepseek-chat-v3.2")
+            r.fallback_model,
+            vec!["deepseek/deepseek-chat-v3.2".to_string()]
         );
         assert_eq!(r.temperature, 0.85);
         assert_eq!(r.max_tokens, 600);
@@ -304,4 +344,135 @@ description = "reserved — Voyage hard-codes its own model"
         assert_eq!(r.temperature, 0.85);
         assert_eq!(r.max_tokens, 600);
     }
+
+    #[test]
+    fn fallback_spec_deserializes_from_string() {
+        let toml = r#"
+[tasks.chat_companion]
+model = "x"
+fallback = "y"
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let t = cfg.tasks.get("chat_companion").unwrap();
+        let v = t.fallback.clone().expect("fallback present").into_vec();
+        assert_eq!(v, vec!["y".to_string()]);
+    }
+
+    #[test]
+    fn fallback_spec_deserializes_from_array() {
+        let toml = r#"
+[tasks.chat_companion]
+model = "x"
+fallback = ["a", "b"]
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let t = cfg.tasks.get("chat_companion").unwrap();
+        let v = t.fallback.clone().expect("fallback present").into_vec();
+        assert_eq!(v, vec!["a".to_string(), "b".to_string()]);
+    }
+
+    #[test]
+    fn fallback_spec_skips_empty_entries() {
+        let toml = r#"
+[tasks.chat_companion]
+model = "x"
+fallback = ["", "a", ""]
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let t = cfg.tasks.get("chat_companion").unwrap();
+        let v = t.fallback.clone().expect("fallback present").into_vec();
+        assert_eq!(v, vec!["a".to_string()]);
+    }
+
+    #[test]
+    fn fallback_spec_empty_string_collapses_to_empty_vec() {
+        let toml = r#"
+[tasks.chat_companion]
+model = "x"
+fallback = ""
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let t = cfg.tasks.get("chat_companion").unwrap();
+        let v = t.fallback.clone().expect("fallback present").into_vec();
+        assert!(v.is_empty());
+    }
+
+    #[test]
+    fn resolve_returns_empty_fallback_when_no_task_fallback_no_defaults() {
+        let toml = r#"
+[tasks.chat_companion]
+model = "x"
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let r = cfg.resolve("chat_companion", None);
+        assert_eq!(r.model, "x");
+        assert!(r.fallback_model.is_empty());
+    }
+
+    #[test]
+    fn resolve_returns_defaults_fallback_when_task_has_none() {
+        let toml = r#"
+[defaults]
+fallback_model = "default-fb"
+
+[tasks.chat_companion]
+model = "x"
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let r = cfg.resolve("chat_companion", None);
+        assert_eq!(r.fallback_model, vec!["default-fb".to_string()]);
+    }
+
+    #[test]
+    fn resolve_task_array_overrides_defaults() {
+        let toml = r#"
+[defaults]
+fallback_model = "default-fb"
+
+[tasks.chat_companion]
+model = "x"
+fallback = ["a", "b"]
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let r = cfg.resolve("chat_companion", None);
+        assert_eq!(r.fallback_model, vec!["a".to_string(), "b".to_string()]);
+    }
+
+    #[test]
+    fn resolve_empty_array_suppresses_defaults() {
+        let toml = r#"
+[defaults]
+fallback_model = "default-fb"
+
+[tasks.chat_companion]
+model = "x"
+fallback = []
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let r = cfg.resolve("chat_companion", None);
+        assert!(
+            r.fallback_model.is_empty(),
+            "explicit empty array must suppress defaults; got {:?}",
+            r.fallback_model
+        );
+    }
+
+    #[test]
+    fn resolve_empty_string_suppresses_defaults() {
+        let toml = r#"
+[defaults]
+fallback_model = "default-fb"
+
+[tasks.chat_companion]
+model = "x"
+fallback = ""
+"#;
+        let cfg = ModelConfig::from_toml_str(toml).expect("parse ok");
+        let r = cfg.resolve("chat_companion", None);
+        assert!(
+            r.fallback_model.is_empty(),
+            "explicit empty string must suppress defaults; got {:?}",
+            r.fallback_model
+        );
+    }
 }
