Pull request overview
This PR moves LLM configuration and execution concerns “up” into LMI by introducing typed, opinionated config models (ModelSpec, LLMConfig), shifting retry/fallback behavior into LMI (with clearer error semantics), and updating LDP agents/modules/tests to consume the new config shape.
Changes:
- Replace dict-shaped `llm_model` configuration with typed `llm_config` (`LLMConfig`/`ModelSpec`) across agents, graph modules, and tests.
- Introduce LMI-owned retry/fallback policy and new error surfaces (e.g., `AllModelsExhaustedError`, `ModelRefusalError`) with targeted unit tests.
- Update VCR cassettes to reflect new default request parameters (notably `max_tokens=4096`) and updated client/library versions.
Reviewed changes
Copilot reviewed 52 out of 54 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_rollouts.py | Updates fallback behavior assertions to expect AllModelsExhaustedError and uses the legacy-coerced llm_config shape. |
| tests/test_optimizer.py | Migrates optimizer tests to LLMConfig.coerce() and llm_config= construction. |
| tests/test_ops.py | Switches op tests to LLMConfig inputs and updates gradient expectations when config becomes a scalar leaf. |
| tests/test_modules.py | Updates module configs/instantiation to use llm_config defaults from ReActAgent. |
| tests/test_envs.py | Updates agent construction to llm_config and error expectations to AllModelsExhaustedError. |
| tests/test_agents.py | Migrates agents to llm_config, updates serialization assertions and gradient assertions. |
| tests/cassettes/TestSimpleAgent.test_dummyenv[gpt-4o-mini-2024-07-18].yaml | VCR update for new request shape (adds max_tokens) and updated client headers. |
| tests/cassettes/TestSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens, tool schema changes). |
| tests/cassettes/TestSimpleAgent.test_agent_grad[gpt-4o-mini-2024-07-18].yaml | VCR update for new request shape (adds max_tokens) and updated client headers. |
| tests/cassettes/TestSimpleAgent.test_agent_grad[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens, tool schema changes). |
| tests/cassettes/TestReActAgent.test_react_dummyenv[True-gpt-4-turbo].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestReActAgent.test_react_dummyenv[True-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_react_dummyenv[False-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_agent_grad[True-gpt-4-turbo].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestReActAgent.test_agent_grad[True-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_agent_grad[False-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestNoToolsSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens). |
| tests/cassettes/TestMemoryAgent.test_agent_grad.yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestAgentState.test_no_state_mutation[agent1].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestAgentState.test_no_state_mutation[agent0].yaml | VCR update for new request shape (adds max_tokens). |
| src/ldp/graph/modules/thought.py | Switches module to accept LLMConfig instead of dict config. |
| src/ldp/graph/modules/reflect.py | Replaces dict llm_model field with LLMConfigField and updates wiring. |
| src/ldp/graph/modules/react.py | Uses LLMConfig.with_extra_params() to set stop sequences without mutating dicts. |
| src/ldp/graph/modules/llm_call.py | Updates parsed-call module to use LLMConfig and typed ConfigOp. |
| src/ldp/graph/common_ops.py | Changes LLMCallOp.forward() signature to accept LLMConfig and constructs LiteLLMModel via llm_config. |
| src/ldp/agent/tree_of_thoughts_agent.py | Migrates agent config field to LLMConfigField and updates call sites. |
| src/ldp/agent/simple_agent.py | Migrates agent config field to LLMConfigField and updates internal ConfigOp. |
| src/ldp/agent/react_agent.py | Migrates agent config field to LLMConfigField and updates module construction. |
| packages/lmi/tests/test_retry.py | Adds unit tests for retry/fallback classification and backoff bounds. |
| packages/lmi/tests/test_llms.py | Updates tests for new fallback error semantics, dispatch behavior, and Responses integration toggled per-model. |
| packages/lmi/tests/test_litellm_patches.py | Removes tests for the removed provider-400 retry patch. |
| packages/lmi/tests/test_dispatch.py | Adds end-to-end tests for LMI dispatch + retry/fallback loop (mocking litellm.acompletion). |
| packages/lmi/tests/test_cost_tracking.py | Simplifies tests by removing Router bypass paths and aligning to new config behavior. |
| packages/lmi/tests/test_config.py | Adds comprehensive tests for ModelSpec, legacy config translation, and LLMConfig.coerce / LLMConfigField. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_basic_call.yaml | Updates Responses API VCR cassette. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_multi_turn_stateful.yaml | Updates Responses API multi-turn VCR cassette. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_responses_api_off_ignores_response_id.yaml | Updates VCR cassette for non-Responses model behavior. |
| packages/lmi/tests/cassettes/TestLiteLLMModel.test_max_token_truncation.yaml | Adds/updates VCR cassette for truncation behavior with max_tokens. |
| packages/lmi/tests/cassettes/TestLiteLLMModel.test_cost_call_single.yaml | Adds/updates VCR cassette for streaming cost tracking request shape. |
| packages/lmi/src/lmi/retry.py | Introduces centralized retry/fallback policy and jittered exponential backoff. |
| packages/lmi/src/lmi/litellm_patches.py | Removes the provider-400 retry patch and renumbers patch docs accordingly. |
| packages/lmi/src/lmi/exceptions.py | Adds ModelRefusalError and AllModelsExhaustedError. |
| packages/lmi/src/lmi/constants.py | Removes env-flag toggle for Responses API and keeps core constants. |
| packages/lmi/src/lmi/config.py | Adds typed ModelSpec/LLMConfig and legacy/dict coercion utilities. |
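The retry policy introduced in `packages/lmi/src/lmi/retry.py` is described as jittered exponential backoff. A minimal full-jitter sketch of that idea (the constants and function name here are illustrative assumptions, not LMI's actual implementation):

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: a uniform draw from
    [0, min(cap, base * 2**attempt)], so concurrent retries desynchronize
    while the upper bound grows exponentially up to `cap`."""
    return random.uniform(0.0, min(cap, base * 2**attempt))
```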
```
Applies: Gemini default safety settings; `temperature` / `max_tokens`
defaults; and silent drop of `logprobs` / `top_logprobs` for non-OpenAI
providers (which don't support them). Explicit values in `overrides`
always win over the defaults.
```
ModelSpec.from_name() docstring says it will “silently drop” logprobs / top_logprobs for non-OpenAI providers, but the implementation raises a ValueError instead. Please align the docstring with the actual behavior (or change the behavior) so callers know whether to expect coercion or an exception.
Suggested change:

```diff
-Applies: Gemini default safety settings; `temperature` / `max_tokens`
-defaults; and silent drop of `logprobs` / `top_logprobs` for non-OpenAI
-providers (which don't support them). Explicit values in `overrides`
-always win over the defaults.
+Applies: Gemini default safety settings and `temperature` /
+`max_tokens` defaults. `logprobs` and `top_logprobs` are treated as
+OpenAI-only parameters: passing them for non-OpenAI providers raises
+`ValueError`. Explicit values in `overrides` always win over the
+defaults.
```
```python
params_by_name: dict[str, dict[str, Any]] = {
    m["model_name"]: dict(m.get("litellm_params", {})) for m in model_list
}

primary_name = model_list[0]["model_name"]
ordered: list[str] = [primary_name, *fallback_map.get(primary_name, [])]
for name in params_by_name:
    if name not in ordered:
        ordered.append(name)

router_kwargs = legacy.get("router_kwargs") or {}
default_timeout = router_kwargs.get("timeout", 60.0)
default_retries = router_kwargs.get("num_retries", 3)

return cls(
    models=[
        _spec_from_legacy_params(
            params_by_name.get(name, {}),
            default_timeout=default_timeout,
            default_retries=default_retries,
        )
        for name in ordered
    ]
```
LLMConfig.from_legacy_dict() can pass an empty dict into _spec_from_legacy_params() when fallbacks references a model_name that isn’t present in model_list, which will then fail with a KeyError on params['model']. Consider validating that all referenced fallback model names exist (and raising a clear ValueError listing the missing names) before constructing the ordered chain.
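A hedged sketch of the suggested validation; the helper name and signature are hypothetical, while `params_by_name` and `fallback_map` mirror the excerpt above:

```python
def validate_fallback_names(
    params_by_name: dict[str, dict],
    fallback_map: dict[str, list[str]],
) -> None:
    """Fail fast with a clear ValueError listing the missing names, instead
    of a KeyError later when building specs for the ordered chain."""
    missing = sorted({
        name
        for fallbacks in fallback_map.values()
        for name in fallbacks
        if name not in params_by_name
    })
    if missing:
        raise ValueError(
            f"fallbacks reference model names missing from model_list: {missing}"
        )
```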
```python
# Per-call retry kwargs that LiteLLM honors via its own internal retry loop. LMI
# owns retries through `_run_with_fallbacks` + `ModelSpec.max_retries`, so these
# must never reach `litellm.acompletion`/`litellm.aresponses` regardless of how
# they ended up in `ModelSpec.extra_params`.
```
Suggested change:

```diff
-# they ended up in `ModelSpec.extra_params`.
+# they ended up in our `ModelSpec.extra_params`.
```
Felt we should clarify that `ModelSpec` is our thing.
```python
name: str = Field(
    description=(
        "LiteLLM model string, e.g. 'gpt-4o-mini' or 'claude-3-5-sonnet-20241022'."
```
But don't LiteLLM model strings often start with a provider prefix, like `openai/gpt-4o-mini`? Maybe clarify this.
```python
spec_field_overrides: dict[str, Any] = {}
extra_overrides: dict[str, Any] = {}
for key, value in overrides.items():
    if key in cls.model_fields:
        spec_field_overrides[key] = value
    elif key in _OPENAI_ONLY_PARAMS and not is_openai:
        raise ValueError(
            f"{key!r} is only supported on OpenAI models; got {name!r}."
        )
    else:
        extra_overrides[key] = value

spec_field_overrides.setdefault("name", name)
merged_extra = (
    extra | extra_overrides | spec_field_overrides.pop("extra_params", {})
)
return cls(extra_params=merged_extra, **spec_field_overrides)
```
Should we do this validation in a field validator or `model_validator`, so it applies to all `ModelSpec` construction, not just this classmethod?
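A hedged sketch of what that could look like: moving the OpenAI-only parameter check into a pydantic `model_validator(mode="after")`, so it runs whenever a `ModelSpec` is constructed. The field set and provider check here are simplified stand-ins, not the real definitions:

```python
from typing import Any

from pydantic import BaseModel, Field, model_validator

_OPENAI_ONLY_PARAMS = {"logprobs", "top_logprobs"}


def _is_openai_provider(name: str) -> bool:
    # Stand-in for the real provider check (or litellm.get_llm_provider).
    return name.startswith(("gpt-", "o1-", "o3-", "openai/"))


class ModelSpec(BaseModel):
    name: str
    extra_params: dict[str, Any] = Field(default_factory=dict)

    @model_validator(mode="after")
    def _reject_openai_only_params(self) -> "ModelSpec":
        if not _is_openai_provider(self.name):
            bad = sorted(_OPENAI_ONLY_PARAMS & self.extra_params.keys())
            if bad:
                raise ValueError(
                    f"{bad} are only supported on OpenAI models; got {self.name!r}."
                )
        return self
```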
```python
default_timeout = router_kwargs.get("timeout", 60.0)
default_retries = router_kwargs.get("num_retries", 3)
```
Any chance we can avoid injecting the `60.0` and `3` defaults here? The less injection we have, the better.
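One hedged way to do that (names here are illustrative): forward only the keys the legacy config actually set, so `ModelSpec`'s own field defaults apply otherwise instead of values injected at this call site.

```python
from typing import Any


def explicit_router_defaults(router_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Build kwargs only for keys the legacy config actually set; omitted
    keys fall through to ModelSpec's own field defaults downstream."""
    mapping = {"timeout": "default_timeout", "num_retries": "default_retries"}
    return {
        new_key: router_kwargs[old_key]
        for old_key, new_key in mapping.items()
        if old_key in router_kwargs
    }
```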
```python
for key, value in overrides.items():
    if key in cls.model_fields:
        spec_field_overrides[key] = value
    elif key in _OPENAI_ONLY_PARAMS and not is_openai:
```
Gemini supports these too; can you make these parameters work for Gemini as well?
```python
(`temperature`, `max_tokens`, `n`, `logprobs`, `top_logprobs`,
`safety_settings`, ...) which are merged into `extra_params`.
"""
is_openai = _is_openai_provider(name)
```
Consider using LiteLLM for this instead of a DIY helper; it can detect the provider via `litellm.get_llm_provider`.
```python
})


def _spec_from_legacy_params(
```
Can we make this a classmethod of `ModelSpec`? Then we can use `Self` as the return type.
```python
import os
from sys import version_info


USE_RESPONSES_API = os.environ.get("USE_RESPONSES_API", "").lower() in {"1", "true"}
```
```diff
 "fhaviary>=0.14.0", # For multi-image support
 "limits[async-redis]>=4.8", # Specify 'async-redis' since that's what we use. Lower pin for RedisBridge.key_prefix.
-"litellm>=1.81.10,<=1.82.6", # Lower pin for MAX_CALLBACKS refactor from https://github.com/BerriAI/litellm/pull/20781, upper pin for supply chain attack
+"litellm>=1.83.10", # Lower pin for MAX_CALLBACKS refactor from https://github.com/BerriAI/litellm/pull/20781, upper pin for supply chain attack
```
I think rebase atop `main`, as this has been changed a bit.
Bringing some functionality up from LiteLLM into LMI:

- Typed config models (`ModelSpec`, `LLMConfig`), which get translated into litellm config dicts just in time
- `ModelSpec.responses_api` lets us toggle on the Responses backend selectively. This means we can swap between the backends in a fallback cascade

Notably, this drops all dependence on `litellm.Router` except for embeddings, which we can likely handle in a similar way. It also gets rid of a few things that annoyed me:

- Config passed as `model_list` or top-level kwargs
- Config handling spread across the `SimpleAgent`/`LLMCallOp`/`LiteLLMModel`/`litellm` levels. Now, the first 3 all operate on `LLMConfig`.

Some proof that this is backwards compatible: no cassette churn besides new/dropped tests.
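A minimal sketch of the per-model backend toggle described above. `ModelSpec.responses_api` is named in the PR; everything else here (the dataclass shapes, other field names) is an illustrative assumption:

```python
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    name: str
    responses_api: bool = False  # per-model backend toggle, as in the PR


@dataclass
class LLMConfig:
    models: list[ModelSpec] = field(default_factory=list)


# A fallback cascade that swaps backends: the primary model uses the
# Responses backend, while the fallback uses plain chat completions.
config = LLMConfig(models=[
    ModelSpec(name="gpt-5", responses_api=True),
    ModelSpec(name="gpt-4o-mini", responses_api=False),
])
```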