Skip to content

Add async support across providers, client, tools, and the agent Runner#296

Open
rohitprasad15 wants to merge 5 commits into
feat/ollama-openai-compatfrom
feat/async-core
Open

Add async support across providers, client, tools, and the agent Runner#296
rohitprasad15 wants to merge 5 commits into
feat/ollama-openai-compatfrom
feat/async-core

Conversation

@rohitprasad15

Copy link
Copy Markdown
Collaborator

Summary

Makes the core path genuinely async — provider → client → tools → Runner — so agent workloads (parallel tool calls, concurrent subagents, many simultaneous runs) can actually overlap. Design: async-core with thin sync wrappers and a single tool loop, so there's one source of truth rather than two divergent code paths.

Stacked on #295 (base branch is feat/ollama-openai-compat). Review/merge #295 first; this PR's diff will collapse to just the async commits once #295 lands.

Commits

  1. Async provider contractProvider.achat_completions_create with a thread-executor default (asyncio.to_thread), so every provider is awaitable immediately; OpenaiProvider overrides it natively via openai.AsyncOpenAI (Ollama/LM Studio inherit it).
  2. Async tool executionTools.aexecute_tool awaits async def tools and runs sync tools off the loop; per-call validation/policy/events/message-building are factored into shared helpers so sync and async stay in lockstep.
  3. Async client coreCompletions.acreate + an async _atool_runner; provider resolution, runner setup, model-event emission, and finalization are shared with the sync path (no duplicated tool loop).
  4. Async Runnerrun / continue_run await acreate; run_sync / continue_sync become thin wrappers (with a nest_asyncio fallback when invoked inside a running loop). A use_async_client flag keeps the sync wrappers driving the provider's sync create, so existing callers and tests are unaffected. Concurrent Runner.run calls are now gatherable.

Scope / follow-ups

  • First-pass native async covers the OpenAI-compatible providers; all others run correctly via the thread-executor default and can get native async incrementally.
  • Subagent tools (agent_tool) run via the thread executor in the async path, and a turn's tool calls execute sequentially. Parallel tool-call fan-out within a turn is a clean follow-up.

Testing

Full suite shows zero new failures vs. main (pre-existing failures are optional-dep collection errors + the known prerelease/trace tests). New async tests cover the provider contract, async tool execution, acreate, async Runner.run parity, gather concurrency, and run_sync inside a running loop.

@rohitprasad15 rohitprasad15 force-pushed the feat/ollama-openai-compat branch from c213b18 to 6f71017 Compare June 10, 2026 22:10
Introduce achat_completions_create on Provider as the async completion
entrypoint. The base implementation offloads each provider's synchronous
chat_completions_create to a worker thread via asyncio.to_thread, so every
provider is awaitable immediately. OpenaiProvider overrides it with a native
openai.AsyncOpenAI client for true non-blocking I/O; the Ollama and LM Studio
subclasses inherit that override for free.
Add Tools.aexecute_tool, which awaits async def tool callables and runs
blocking sync tools in a worker thread via asyncio.to_thread. The per-call
work (argument validation, policy evaluation, trace events, and tool-result
message building) is factored into shared helpers, so the sync execute_tool
and the new async path stay in lockstep with no duplicated tool-loop logic.
Add Completions.acreate and an async _atool_runner that await the provider's
achat_completions_create and the async tool execution. Provider resolution,
tool-runner setup, model-event emission, and response finalization are factored
into shared helpers so the sync create/_tool_runner and the async paths stay in
lockstep instead of duplicating the tool loop. MCP client cleanup remains
synchronous via ExitStack.
Invert Runner so the async path is the core: run/continue_run await the
client's acreate, and run_sync/continue_sync become thin wrappers that drive
the coroutine to completion (falling back to nest_asyncio when called from
within a running loop). The shared _run_impl/_continue_impl take a
use_async_client flag so the sync wrappers still drive the provider's
synchronous create — keeping existing callers and tests that patch
client.chat.completions.create working unchanged. Concurrent Runner.run calls
can now be awaited and gathered.
The async client replicated the manual tool-calling regression fixed in
#266: acreate() pops `tools` from kwargs and never handed them to the
provider when max_turns is absent. Both create() and acreate() now share
a _provider_ready_tools() helper (schema dicts pass through, callables
become OpenAI-format specs) and an async regression test mirrors the
sync one.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant