Add Foundry Local provider #330

Open
justinchuby wants to merge 9 commits into andrewyng:main from justinchuby:foundry-local-explicit-alias-resolution

@justinchuby

Adds support for Microsoft Foundry Local, an on-device runtime that exposes an OpenAI-compatible API.

FoundryLocalProvider (named after the product) subclasses OpenaiProvider (same pattern as ollama/lmstudio) and is selected with the foundry_local: model prefix. A Foundry_localProvider alias is kept so the provider factory (which derives the class name from "foundry_local".capitalize()) resolves the key. The provider operates in one of two modes:

  • Managed (default): uses the Foundry Local Python SDK to start the service on demand, download/load the model, and resolve a model alias to its concrete id. Bootstrap is deferred to the first request, since the endpoint uses a dynamic port and the alias is only known at call time. Both SDK generations are supported via runtime detection: the current foundry-local-sdk 1.x (imported as foundry_local_sdk, using the Configuration/catalog/start_web_service singleton API) and the legacy 0.x package (imported as foundry_local, using FoundryLocalManager(alias)).
  • Explicit endpoint: when api_url/base_url or FOUNDRY_LOCAL_API_URL is set, points the OpenAI SDK directly at a running endpoint (no SDK required); the host is normalized to /v1 idempotently. A friendly model alias is resolved to the concrete served id by querying the endpoint's /v1/models (exact id → parent alias → alias-boundary prefix), falling back to the original string when the lookup fails or there is no confident match — so the same foundry_local:<alias> works in both modes.
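
The resolution order described for explicit-endpoint mode (exact id, then parent alias, then alias-boundary prefix, then fallback) can be sketched roughly as follows. This is an illustration under assumptions, not the provider's actual code: `resolve_alias` is a hypothetical helper, the `alias` key on a model entry and the boundary characters `-` and `:` are guesses, and more than one candidate is treated here as "no confident match". The PR's tests mention an ambiguous-alias error, so the real provider may handle that case differently.

```python
from typing import Iterable, Mapping, Optional

# Assumed separators between a friendly alias and its variant suffix.
BOUNDARY_CHARS = "-:"


def resolve_alias(requested: str, models: Optional[Iterable[Mapping[str, str]]]) -> str:
    """Map a friendly alias to a served model id, or return `requested` unchanged."""
    if not models:  # lookup failed, or the endpoint listed nothing
        return requested
    entries = [m for m in models if "id" in m]
    ids = [m["id"] for m in entries]

    # 1. Exact id: the caller already passed a concrete served id.
    if requested in ids:
        return requested

    # 2. Parent alias: the entry names the alias it belongs to ("alias" key is hypothetical).
    by_alias = [m["id"] for m in entries if m.get("alias") == requested]
    if len(by_alias) == 1:
        return by_alias[0]

    # 3. Alias-boundary prefix: the id starts with the alias and the next
    #    character is a separator, so "qwen2.5-0" does not match "qwen2.5-0.5b-...".
    by_prefix = [
        i for i in ids
        if len(i) > len(requested)
        and i.startswith(requested)
        and i[len(requested)] in BOUNDARY_CHARS
    ]
    if len(by_prefix) == 1:
        return by_prefix[0]

    # No confident match: pass the original string through.
    return requested


served = [{"id": "qwen2.5-0.5b-instruct-generic-cpu:4"}]
print(resolve_alias("qwen2.5-0.5b", served))  # qwen2.5-0.5b-instruct-generic-cpu:4
```

Falling back to the original string keeps a concrete id usable even when the endpoint cannot be queried.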

Validation

Tested end-to-end on Windows against a real Foundry Local install (0.8.119) with foundry-local-sdk 1.2.3:

  • Managed mode (foundry_local:qwen2.5-0.5b): auto-started the service, downloaded + loaded the model, resolved the alias to qwen2.5-0.5b-instruct-generic-gpu:4, returned a valid completion (finish_reason=stop).
  • Explicit-endpoint mode: friendly alias qwen2.5-0.5b resolved against the running endpoint to qwen2.5-0.5b-instruct-generic-cpu:4 and returned a completion.
  • Tool calling: returned a structured tool_calls response (finish_reason=tool_calls). Note: tool calling through ai.Client currently requires the mcp extra (pip install 'aisuite[mcp]') due to an unrelated upstream bug where is_mcp_config is gated behind the MCP import.
  • Unit tests: 14 passed.

Changes

  • aisuite/providers/foundry_local_provider.py — new provider (FoundryLocalProvider, with a Foundry_localProvider alias for factory compatibility); explicit-endpoint alias resolution.
  • pyproject.toml: the foundry-local extra installs openai and foundry-local-sdk (^1.2.3). The SDK dependency is declared with a python = ">=3.11" marker because the SDK requires Python ≥3.11 while aisuite supports ^3.10; it is also added to the all extra. The SDK pulls in native ONNX Runtime/CUDA wheels, so users on Python 3.10 (or who need a hardware-specific variant) can still install it directly with pip install foundry-local-sdk.
  • tests/providers/test_foundry_local_provider.py — explicit-endpoint mode (incl. alias resolution by prefix/parent, concrete-id passthrough, ambiguous-alias error, unreachable-endpoint fallback), both managed SDK backends (1.x and 0.x), alias→id resolution, multi-alias loading, missing-SDK error.
  • guides/foundry_local.md (+ index link in guides/README.md) — usage for both modes and the aisuite[mcp] note for tool calling.
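
To show why the Foundry_localProvider alias is needed, here is a minimal sketch of the name derivation. It assumes the factory builds the class name as the capitalized provider key plus a `Provider` suffix, which is consistent with the names in this PR (`OpenaiProvider`, `Foundry_localProvider`). `provider_class_name` is a hypothetical helper and the class below is a stand-in, not aisuite's actual factory code.

```python
def provider_class_name(provider_key: str) -> str:
    """Derive a class name as key.capitalize() + "Provider" (assumed factory rule)."""
    # str.capitalize() upper-cases the first character and lower-cases the rest,
    # so the underscore stays and "local" is not capitalized.
    return f"{provider_key.capitalize()}Provider"


class FoundryLocalProvider:  # stand-in for the real provider class
    pass


# Alias under the name the factory would look up.
Foundry_localProvider = FoundryLocalProvider

print(provider_class_name("foundry_local"))  # Foundry_localProvider
```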

Usage

import aisuite as ai

client = ai.Client()
response = client.chat.completions.create(
    model="foundry_local:phi-3.5-mini",   # alias resolved + service started automatically
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
)
print(response.choices[0].message.content)
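
Managed mode, as used above, depends on which SDK generation is installed. A rough sketch of the runtime detection follows, using only the two module names given in the description (`foundry_local_sdk` for 1.x, `foundry_local` for 0.x). `detect_sdk_generation` and its injectable `is_installed` argument are hypothetical names for illustration; the provider's real detection logic may differ.

```python
import importlib.util
from typing import Callable, Optional


def _is_installed(module_name: str) -> bool:
    """Return True when a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def detect_sdk_generation(
    is_installed: Callable[[str], bool] = _is_installed,
) -> Optional[str]:
    """Prefer the current 1.x package, fall back to legacy 0.x, else None."""
    if is_installed("foundry_local_sdk"):
        return "1.x"
    if is_installed("foundry_local"):
        return "0.x"
    return None  # caller can raise a missing-SDK error with install hints


generation = detect_sdk_generation()
print(f"Foundry Local SDK generation: {generation}")
```

Checking with `find_spec` rather than importing keeps the probe cheap, which fits the deferred bootstrap described earlier.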

Explicit endpoint (SDK not required; alias resolved against the running endpoint):

client = ai.Client(provider_configs={"foundry_local": {"api_url": "http://localhost:5273"}})
client.chat.completions.create(
    model="foundry_local:phi-3.5-mini",
    messages=[{"role": "user", "content": "Hi"}],
)
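
The endpoint above is given without a /v1 suffix. One way the idempotent normalization could work is sketched below; `normalize_base_url` is a hypothetical name, and the provider may handle more cases (for example query strings) than this sketch does.

```python
def normalize_base_url(url: str) -> str:
    """Ensure the URL ends with exactly one "/v1", however often it is applied."""
    trimmed = url.strip().rstrip("/")
    if trimmed.endswith("/v1"):
        return trimmed
    return trimmed + "/v1"


print(normalize_base_url("http://localhost:5273"))      # http://localhost:5273/v1
print(normalize_base_url("http://localhost:5273/v1/"))  # http://localhost:5273/v1
```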

Copilot AI and others added 9 commits June 17, 2026 16:06
In explicit-endpoint mode the model string was passed through unchanged, so
users had to pass the concrete served id (e.g.
qwen2.5-0.5b-instruct-generic-cpu:4) discovered from the endpoint, while
managed mode accepted a friendly alias. Resolve the alias against the
endpoint's /v1/models (exact id, then parent alias, then alias-boundary
prefix), falling back to the original string when the lookup fails or no
confident match is found. This lets the same foundry_local:<alias> work in
both modes. Also document the aisuite[mcp] requirement for tool calling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds a first-class `foundry_local` provider to the coworker platform,
mirroring the existing on-device Ollama integration:

- registry: `foundry_local` descriptor (keyless, optional user-supplied
  `base_url` since the Foundry Local port is dynamic) built on the shared
  OpenAI-compatible client; generalize `_normalize_ollama_url` into
  `_normalize_openai_compatible_url`.
- capabilities: treat foundry_local like ollama (tools on, single tool call,
  no vision, streaming).
- manager: `_foundry_models` lists models live from the endpoint's
  `/v1/models` for the settings model picker.

Verified end-to-end against a running Foundry Local service (qwen2.5-0.5b):
provider builds, routes, completes, and reports capabilities; model listing
returns the served id. Full platform suite (332 passed, 1 skipped).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
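
As an illustration of the live model listing, the sketch below extracts served ids from a `/v1/models` response body. It assumes the endpoint returns the OpenAI list-models shape (`{"object": "list", "data": [{"id": ...}]}`). `model_ids_from_payload` is a hypothetical helper, not the `_foundry_models` function from the commit, and the HTTP call is left out to keep the sketch self-contained.

```python
from typing import Any, List


def model_ids_from_payload(payload: Any) -> List[str]:
    """Return the served model ids from a parsed /v1/models JSON body."""
    if not isinstance(payload, dict):
        return []  # unexpected body: show an empty picker rather than fail
    data = payload.get("data", [])
    if not isinstance(data, list):
        return []
    return [
        entry["id"]
        for entry in data
        if isinstance(entry, dict) and isinstance(entry.get("id"), str)
    ]


body = {"object": "list", "data": [{"id": "qwen2.5-0.5b-instruct-generic-cpu:4"}]}
print(model_ids_from_payload(body))  # ['qwen2.5-0.5b-instruct-generic-cpu:4']
```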

The model-config UI hard-coded Ollama as the only local provider: the
'Local models' pane targeted `ollama` and the 'API models' pane listed
every provider except `ollama`. A keyless provider like `foundry_local`
therefore fell into the API pane, which only renders an `api_key` field —
so its `base_url` couldn't be set at all.

Generalize both the onboarding and Manage-models panes to treat any keyless
provider (`needs_key == false`) as local:

- The Local pane now has a runtime selector (shown when more than one local
  runtime exists) and renders the `base_url` field (label/placeholder/help)
  from the selected provider's descriptor, so Ollama and Foundry Local each
  get correct copy.
- The API pane lists `needs_key` providers instead of 'everything but
  ollama'.

No backend changes; the panes are now driven by provider metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
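
The metadata-driven split of the two panes can be sketched as below. Only the `needs_key` flag comes from the commit message; the descriptor dictionaries, the function names, and the choice to treat a missing flag as "needs a key" are assumptions for illustration, and the actual UI code is not shown in this PR page.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical descriptors keyed by provider name.
PROVIDERS: Dict[str, Dict[str, bool]] = {
    "openai": {"needs_key": True},
    "ollama": {"needs_key": False},
    "foundry_local": {"needs_key": False},
}


def split_panes(providers: Mapping[str, Mapping[str, bool]]) -> Tuple[List[str], List[str]]:
    """Return (local, api): keyless providers are local, the rest need an api_key."""
    local = [name for name, desc in providers.items() if not desc.get("needs_key", True)]
    api = [name for name, desc in providers.items() if desc.get("needs_key", True)]
    return local, api


def show_runtime_selector(local: List[str]) -> bool:
    """The selector is only shown when more than one local runtime exists."""
    return len(local) > 1


local, api = split_panes(PROVIDERS)
print(local)  # ['ollama', 'foundry_local']
print(api)    # ['openai']
```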