Pull request overview
This PR moves LLM configuration and execution concerns “up” into LMI by introducing typed, opinionated config models (ModelSpec, LLMConfig), shifting retry/fallback behavior into LMI (with clearer error semantics), and updating LDP agents/modules/tests to consume the new config shape.
Changes:
- Replace dict-shaped `llm_model` configuration with typed `llm_config` (`LLMConfig`/`ModelSpec`) across agents, graph modules, and tests.
- Introduce LMI-owned retry/fallback policy and new error surfaces (e.g., `AllModelsExhaustedError`, `ModelRefusalError`) with targeted unit tests.
- Update VCR cassettes to reflect new default request parameters (notably `max_tokens=4096`) and updated client/library versions.
Reviewed changes
Copilot reviewed 52 out of 54 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_rollouts.py | Updates fallback behavior assertions to expect AllModelsExhaustedError and uses the legacy-coerced llm_config shape. |
| tests/test_optimizer.py | Migrates optimizer tests to LLMConfig.coerce() and llm_config= construction. |
| tests/test_ops.py | Switches op tests to LLMConfig inputs and updates gradient expectations when config becomes a scalar leaf. |
| tests/test_modules.py | Updates module configs/instantiation to use llm_config defaults from ReActAgent. |
| tests/test_envs.py | Updates agent construction to llm_config and error expectations to AllModelsExhaustedError. |
| tests/test_agents.py | Migrates agents to llm_config, updates serialization assertions and gradient assertions. |
| tests/cassettes/TestSimpleAgent.test_dummyenv[gpt-4o-mini-2024-07-18].yaml | VCR update for new request shape (adds max_tokens) and updated client headers. |
| tests/cassettes/TestSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens, tool schema changes). |
| tests/cassettes/TestSimpleAgent.test_agent_grad[gpt-4o-mini-2024-07-18].yaml | VCR update for new request shape (adds max_tokens) and updated client headers. |
| tests/cassettes/TestSimpleAgent.test_agent_grad[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens, tool schema changes). |
| tests/cassettes/TestReActAgent.test_react_dummyenv[True-gpt-4-turbo].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestReActAgent.test_react_dummyenv[True-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_react_dummyenv[False-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_agent_grad[True-gpt-4-turbo].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestReActAgent.test_agent_grad[True-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_agent_grad[False-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestNoToolsSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens). |
| tests/cassettes/TestMemoryAgent.test_agent_grad.yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestAgentState.test_no_state_mutation[agent1].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestAgentState.test_no_state_mutation[agent0].yaml | VCR update for new request shape (adds max_tokens). |
| src/ldp/graph/modules/thought.py | Switches module to accept LLMConfig instead of dict config. |
| src/ldp/graph/modules/reflect.py | Replaces dict llm_model field with LLMConfigField and updates wiring. |
| src/ldp/graph/modules/react.py | Uses LLMConfig.with_extra_params() to set stop sequences without mutating dicts. |
| src/ldp/graph/modules/llm_call.py | Updates parsed-call module to use LLMConfig and typed ConfigOp. |
| src/ldp/graph/common_ops.py | Changes LLMCallOp.forward() signature to accept LLMConfig and constructs LiteLLMModel via llm_config. |
| src/ldp/agent/tree_of_thoughts_agent.py | Migrates agent config field to LLMConfigField and updates call sites. |
| src/ldp/agent/simple_agent.py | Migrates agent config field to LLMConfigField and updates internal ConfigOp. |
| src/ldp/agent/react_agent.py | Migrates agent config field to LLMConfigField and updates module construction. |
| packages/lmi/tests/test_retry.py | Adds unit tests for retry/fallback classification and backoff bounds. |
| packages/lmi/tests/test_llms.py | Updates tests for new fallback error semantics, dispatch behavior, and Responses integration toggled per-model. |
| packages/lmi/tests/test_litellm_patches.py | Removes tests for the removed provider-400 retry patch. |
| packages/lmi/tests/test_dispatch.py | Adds end-to-end tests for LMI dispatch + retry/fallback loop (mocking litellm.acompletion). |
| packages/lmi/tests/test_cost_tracking.py | Simplifies tests by removing Router bypass paths and aligning to new config behavior. |
| packages/lmi/tests/test_config.py | Adds comprehensive tests for ModelSpec, legacy config translation, and LLMConfig.coerce / LLMConfigField. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_basic_call.yaml | Updates Responses API VCR cassette. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_multi_turn_stateful.yaml | Updates Responses API multi-turn VCR cassette. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_responses_api_off_ignores_response_id.yaml | Updates VCR cassette for non-Responses model behavior. |
| packages/lmi/tests/cassettes/TestLiteLLMModel.test_max_token_truncation.yaml | Adds/updates VCR cassette for truncation behavior with max_tokens. |
| packages/lmi/tests/cassettes/TestLiteLLMModel.test_cost_call_single.yaml | Adds/updates VCR cassette for streaming cost tracking request shape. |
| packages/lmi/src/lmi/retry.py | Introduces centralized retry/fallback policy and jittered exponential backoff. |
| packages/lmi/src/lmi/litellm_patches.py | Removes the provider-400 retry patch and renumbers patch docs accordingly. |
| packages/lmi/src/lmi/exceptions.py | Adds ModelRefusalError and AllModelsExhaustedError. |
| packages/lmi/src/lmi/constants.py | Removes env-flag toggle for Responses API and keeps core constants. |
| packages/lmi/src/lmi/config.py | Adds typed ModelSpec/LLMConfig and legacy/dict coercion utilities. |
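The retry policy introduced in `packages/lmi/src/lmi/retry.py` is described as jittered exponential backoff. A minimal full-jitter sketch of that idea (the constants and function name here are illustrative assumptions, not LMI's actual implementation):

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: a uniform draw from
    [0, min(cap, base * 2**attempt)], so concurrent retries desynchronize
    while the upper bound grows exponentially up to `cap`."""
    return random.uniform(0.0, min(cap, base * 2**attempt))
```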
```
Applies: Gemini default safety settings; `temperature` / `max_tokens`
defaults; and silent drop of `logprobs` / `top_logprobs` for non-OpenAI
providers (which don't support them). Explicit values in `overrides`
always win over the defaults.
```
ModelSpec.from_name() docstring says it will “silently drop” logprobs / top_logprobs for non-OpenAI providers, but the implementation raises a ValueError instead. Please align the docstring with the actual behavior (or change the behavior) so callers know whether to expect coercion or an exception.
Suggested change:

```diff
-Applies: Gemini default safety settings; `temperature` / `max_tokens`
-defaults; and silent drop of `logprobs` / `top_logprobs` for non-OpenAI
-providers (which don't support them). Explicit values in `overrides`
-always win over the defaults.
+Applies: Gemini default safety settings and `temperature` /
+`max_tokens` defaults. `logprobs` and `top_logprobs` are treated as
+OpenAI-only parameters: passing them for non-OpenAI providers raises
+`ValueError`. Explicit values in `overrides` always win over the
+defaults.
```
```python
params_by_name: dict[str, dict[str, Any]] = {
    m["model_name"]: dict(m.get("litellm_params", {})) for m in model_list
}

primary_name = model_list[0]["model_name"]
ordered: list[str] = [primary_name, *fallback_map.get(primary_name, [])]
for name in params_by_name:
    if name not in ordered:
        ordered.append(name)

router_kwargs = legacy.get("router_kwargs") or {}
default_timeout = router_kwargs.get("timeout", 60.0)
default_retries = router_kwargs.get("num_retries", 3)

return cls(
    models=[
        _spec_from_legacy_params(
            params_by_name.get(name, {}),
            default_timeout=default_timeout,
            default_retries=default_retries,
        )
        for name in ordered
    ]
```
LLMConfig.from_legacy_dict() can pass an empty dict into _spec_from_legacy_params() when fallbacks references a model_name that isn’t present in model_list, which will then fail with a KeyError on params['model']. Consider validating that all referenced fallback model names exist (and raising a clear ValueError listing the missing names) before constructing the ordered chain.
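A hedged sketch of the suggested validation; the helper name and signature are hypothetical, while `params_by_name` and `fallback_map` mirror the excerpt above:

```python
def validate_fallback_names(
    params_by_name: dict[str, dict],
    fallback_map: dict[str, list[str]],
) -> None:
    """Fail fast with a clear ValueError listing the missing names, instead
    of a KeyError later when building specs for the ordered chain."""
    missing = sorted({
        name
        for fallbacks in fallback_map.values()
        for name in fallbacks
        if name not in params_by_name
    })
    if missing:
        raise ValueError(
            f"fallbacks reference model names missing from model_list: {missing}"
        )
```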
```python
# Per-call retry kwargs that LiteLLM honors via its own internal retry loop. LMI
# owns retries through `_run_with_fallbacks` + `ModelSpec.max_retries`, so these
# must never reach `litellm.acompletion`/`litellm.aresponses` regardless of how
# they ended up in `ModelSpec.extra_params`.
```
Suggested change:

```diff
-# they ended up in `ModelSpec.extra_params`.
+# they ended up in our `ModelSpec.extra_params`.
```
Felt we should clarify that `ModelSpec` is our thing.
```python
name: str = Field(
    description=(
        "LiteLLM model string, e.g. 'gpt-4o-mini' or 'claude-3-5-sonnet-20241022'."
```
But don't LiteLLM model strings often start with a provider prefix, like `openai/gpt-4o-mini`? Maybe clarify this.
```python
spec_field_overrides: dict[str, Any] = {}
extra_overrides: dict[str, Any] = {}
for key, value in overrides.items():
    if key in cls.model_fields:
        spec_field_overrides[key] = value
    elif key in _OPENAI_ONLY_PARAMS and not is_openai:
        raise ValueError(
            f"{key!r} is only supported on OpenAI models; got {name!r}."
        )
    else:
        extra_overrides[key] = value

spec_field_overrides.setdefault("name", name)
merged_extra = (
    extra | extra_overrides | spec_field_overrides.pop("extra_params", {})
)
return cls(extra_params=merged_extra, **spec_field_overrides)
```
Should we do this validation in a field validator or `model_validator`, so it applies to all `ModelSpec` construction, not just this classmethod?
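A hedged sketch of what that could look like: moving the OpenAI-only parameter check into a pydantic `model_validator(mode="after")`, so it runs whenever a `ModelSpec` is constructed. The field set and provider check here are simplified stand-ins, not the real definitions:

```python
from typing import Any

from pydantic import BaseModel, Field, model_validator

_OPENAI_ONLY_PARAMS = {"logprobs", "top_logprobs"}


def _is_openai_provider(name: str) -> bool:
    # Stand-in for the real provider check (or litellm.get_llm_provider).
    return name.startswith(("gpt-", "o1-", "o3-", "openai/"))


class ModelSpec(BaseModel):
    name: str
    extra_params: dict[str, Any] = Field(default_factory=dict)

    @model_validator(mode="after")
    def _reject_openai_only_params(self) -> "ModelSpec":
        if not _is_openai_provider(self.name):
            bad = sorted(_OPENAI_ONLY_PARAMS & self.extra_params.keys())
            if bad:
                raise ValueError(
                    f"{bad} are only supported on OpenAI models; got {self.name!r}."
                )
        return self
```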
```python
default_timeout = router_kwargs.get("timeout", 60.0)
default_retries = router_kwargs.get("num_retries", 3)
```
Any chance we can avoid injecting the `60.0` and `3` defaults here? The less injection we have, the better.
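One hedged way to do that (names here are illustrative): forward only the keys the legacy config actually set, so `ModelSpec`'s own field defaults apply otherwise instead of values injected at this call site.

```python
from typing import Any


def explicit_router_defaults(router_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Build kwargs only for keys the legacy config actually set; omitted
    keys fall through to ModelSpec's own field defaults downstream."""
    mapping = {"timeout": "default_timeout", "num_retries": "default_retries"}
    return {
        new_key: router_kwargs[old_key]
        for old_key, new_key in mapping.items()
        if old_key in router_kwargs
    }
```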
```python
for key, value in overrides.items():
    if key in cls.model_fields:
        spec_field_overrides[key] = value
    elif key in _OPENAI_ONLY_PARAMS and not is_openai:
```
Gemini supports these too; can you make these parameters work for Gemini as well?
```python
(`temperature`, `max_tokens`, `n`, `logprobs`, `top_logprobs`,
`safety_settings`, ...) which are merged into `extra_params`.
"""
is_openai = _is_openai_provider(name)
```
Consider using LiteLLM for this instead of a DIY helper; it can detect the provider via `litellm.get_llm_provider`.
```python
})


def _spec_from_legacy_params(
```
Can we make this a classmethod of `ModelSpec`? Then we can use `Self` as the return type.
```python
import os
from sys import version_info


USE_RESPONSES_API = os.environ.get("USE_RESPONSES_API", "").lower() in {"1", "true"}
```
```diff
 "fhaviary>=0.14.0", # For multi-image support
 "limits[async-redis]>=4.8", # Specify 'async-redis' since that's what we use. Lower pin for RedisBridge.key_prefix.
-"litellm>=1.81.10,<=1.82.6", # Lower pin for MAX_CALLBACKS refactor from https://github.com/BerriAI/litellm/pull/20781, upper pin for supply chain attack
+"litellm>=1.83.10", # Lower pin for MAX_CALLBACKS refactor from https://github.com/BerriAI/litellm/pull/20781, upper pin for supply chain attack
```
I think rebase atop `main`, as this has been changed a bit.
Bringing some functionality up from LiteLLM into LMI:

- Typed config models (`ModelSpec`, `LLMConfig`), which get translated into litellm config dicts just in time
- `ModelSpec.responses_api` lets us toggle on the Responses backend selectively. This means we can swap between the backends in a fallback cascade

Notably, this drops all dependence on `litellm.Router` except for embeddings, which we can likely handle in a similar way. It also gets rid of a few things that annoyed me:

- Config passed as `model_list` or top-level kwargs
- Config handling spread across the `SimpleAgent`/`LLMCallOp`/`LiteLLMModel`/`litellm` levels. Now, the first 3 all operate on `LLMConfig`.

Some proof that this is backwards compatible: no cassette churn besides new/dropped tests.
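A minimal sketch of the per-model backend toggle described above. `ModelSpec.responses_api` is named in the PR; everything else here (the dataclass shapes, other field names) is an illustrative assumption:

```python
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    name: str
    responses_api: bool = False  # per-model backend toggle, as in the PR


@dataclass
class LLMConfig:
    models: list[ModelSpec] = field(default_factory=list)


# A fallback cascade that swaps backends: the primary model uses the
# Responses backend, while the fallback uses plain chat completions.
config = LLMConfig(models=[
    ModelSpec(name="gpt-5", responses_api=True),
    ModelSpec(name="gpt-4o-mini", responses_api=False),
])
```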