Multi-provider LLM client for Rust with streaming support.
Unified API for Anthropic Claude, OpenAI, z.ai, and local LLMs with built-in token tracking, state persistence, and automatic model handoff.
Part of the Limit ecosystem.
Building AI applications shouldn't require learning different APIs for each LLM provider. limit-llm provides a single, consistent interface that works across Anthropic Claude, OpenAI GPT, z.ai GLM, and local models—so you can switch providers without rewriting code.
- Multi-provider support: Anthropic Claude, OpenAI GPT, z.ai GLM, and local LLMs (Ollama, LM Studio, vLLM)
- Streaming responses: Async streaming with `futures::Stream` for real-time output
- Token tracking: SQLite-based usage tracking with cost estimation
- State persistence: Serialize/restore conversation state with bincode
- Model handoff: Automatic fallback between providers on failure
- Tool calling: Full function/tool support for all compatible providers
- Thinking mode: Extended reasoning support (Claude, z.ai)
- Type-safe: Full Rust type system with serde integration
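
The idea of a single interface across providers can be illustrated with a small, standard-library-only sketch. The names `ChatProvider`, `EchoProvider`, `ShoutProvider`, and `ask` are hypothetical and exist only for this illustration; they are not the crate's `LlmProvider` trait or any of its clients, and the real API is asynchronous and streaming.

```rust
// Hypothetical sketch: none of these names are exported by limit-llm.

/// A minimal provider-agnostic interface: every backend turns a prompt into text.
trait ChatProvider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in backend that returns the prompt unchanged.
struct EchoProvider;

/// Stand-in backend that returns the prompt in upper case.
struct ShoutProvider;

impl ChatProvider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(prompt.to_string())
    }
}

impl ChatProvider for ShoutProvider {
    fn name(&self) -> &str {
        "shout"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(prompt.to_uppercase())
    }
}

/// Calling code depends only on the trait, so swapping backends changes
/// the value passed in, not this function.
fn ask(provider: &dyn ChatProvider, prompt: &str) -> Result<String, String> {
    let reply = provider.complete(prompt)?;
    Ok(format!("[{}] {}", provider.name(), reply))
}
```

Calling `ask(&EchoProvider, "hi")` and `ask(&ShoutProvider, "hi")` goes through the same code path; only the provider value differs.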
Add to your Cargo.toml:
[dependencies]
limit-llm = "0.0.27"

Requirements: Rust 1.70+, tokio runtime
use limit_llm::{AnthropicClient, Message, Role, LlmProvider};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = AnthropicClient::new(
        std::env::var("ANTHROPIC_API_KEY")?,
        None, // default base URL
        60,   // timeout in seconds
        "claude-sonnet-4-6-20260217",
        4096, // max tokens
    );

    let messages = vec![
        Message {
            role: Role::User,
            content: Some("Hello, Claude!".to_string()),
            tool_calls: None,
            tool_call_id: None,
        }
    ];

    // Stream the response
    let mut stream = client.send(messages, vec![]).await?;
    while let Some(chunk) = stream.next().await {
        match chunk {
            Ok(limit_llm::ProviderResponseChunk::ContentDelta(text)) => print!("{}", text),
            Ok(limit_llm::ProviderResponseChunk::Done(usage)) => {
                println!("\nTokens: {} in, {} out", usage.input_tokens, usage.output_tokens);
            }
            Err(e) => eprintln!("Error: {}", e),
            _ => {}
        }
    }

    Ok(())
}

use limit_llm::{Config, ProviderFactory, LlmProvider};
// Load from ~/.limit/config.toml
let config = Config::load()?;
// Create provider from config
let provider = ProviderFactory::from_config(&config)?;
// Use the provider
let stream = provider.send(vec![], vec![]).await?;

| Provider | Client | Streaming | Tools | Thinking |
|---|---|---|---|---|
| Anthropic Claude | `AnthropicClient` | ✓ | ✓ | ✓ |
| OpenAI GPT | `OpenAiProvider` | ✓ | ✓ | — |
| z.ai GLM | `ZaiProvider` | ✓ | ✓ | ✓ |
| Local/Ollama | `LocalProvider` | ✓ | — | — |
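
When a request needs tools or thinking mode, the support matrix above determines which providers qualify. A minimal sketch of that check follows, assuming the table is encoded by hand. The `Provider` enum, `Capabilities` struct, and both functions are hypothetical names for illustration and are not part of the limit-llm API.

```rust
// Illustrative only: these types mirror the provider table and are not
// exported by limit-llm.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Anthropic,
    OpenAi,
    Zai,
    Local,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Capabilities {
    streaming: bool,
    tools: bool,
    thinking: bool,
}

/// Hand-written copy of the support matrix from the provider table.
fn capabilities(p: Provider) -> Capabilities {
    match p {
        Provider::Anthropic => Capabilities { streaming: true, tools: true, thinking: true },
        Provider::OpenAi => Capabilities { streaming: true, tools: true, thinking: false },
        Provider::Zai => Capabilities { streaming: true, tools: true, thinking: true },
        Provider::Local => Capabilities { streaming: true, tools: false, thinking: false },
    }
}

/// Returns the providers able to serve a request with the given needs,
/// in the same order as the table.
fn candidates(need_tools: bool, need_thinking: bool) -> Vec<Provider> {
    [Provider::Anthropic, Provider::OpenAi, Provider::Zai, Provider::Local]
        .iter()
        .copied()
        .filter(|&p| {
            let c = capabilities(p);
            (!need_tools || c.tools) && (!need_thinking || c.thinking)
        })
        .collect()
}
```

For example, `candidates(true, true)` keeps only the providers whose row has a check mark in both the Tools and Thinking columns.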
# ~/.limit/config.toml
provider = "anthropic"
[providers.anthropic]
model = "claude-sonnet-4-6-20260217"
max_tokens = 4096
timeout = 60

| Variable | Provider |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic Claude |
| `OPENAI_API_KEY` | OpenAI |
| `ZAI_API_KEY` | z.ai |
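
The variable table above can be turned into a small lookup that fails early with a readable message when a key is missing. This is a sketch using only `std::env`. The helpers `api_key_var` and `resolve_api_key` are hypothetical, and the provider strings `"openai"` and `"zai"` are assumed by analogy with `provider = "anthropic"` in the config example; the crate may name them differently.

```rust
use std::env;

/// Maps a provider name to the environment variable holding its API key.
/// The "openai" and "zai" spellings are assumptions; unknown names
/// (for example a local provider) yield `None`.
fn api_key_var(provider: &str) -> Option<&'static str> {
    match provider {
        "anthropic" => Some("ANTHROPIC_API_KEY"),
        "openai" => Some("OPENAI_API_KEY"),
        "zai" => Some("ZAI_API_KEY"),
        _ => None,
    }
}

/// Reads the API key for `provider` from the environment, reporting which
/// variable is missing instead of a generic lookup error.
fn resolve_api_key(provider: &str) -> Result<String, String> {
    let var = api_key_var(provider)
        .ok_or_else(|| format!("no API key variable known for provider '{}'", provider))?;
    env::var(var).map_err(|_| format!("{} is not set", var))
}
```

Resolving the key once at startup surfaces a missing variable before the first request is sent.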
use limit_llm::{Tool, ToolFunction, Message, Role, AnthropicClient, LlmProvider};
use serde_json::json;

let tools = vec![Tool {
    tool_type: "function".to_string(),
    function: ToolFunction {
        name: "get_weather".to_string(),
        description: "Get current weather for a location".to_string(),
        parameters: json!({
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }),
    },
}];

let messages = vec![Message {
    role: Role::User,
    content: Some("What's the weather in Tokyo?".to_string()),
    tool_calls: None,
    tool_call_id: None,
}];

let client = AnthropicClient::from_env()?;
let stream = client.send(messages, tools).await?;

use limit_llm::TrackingDb;
let tracking = TrackingDb::new("~/.limit/tracking.db")?;
// Record usage (automatically done by clients)
tracking.record_usage("claude-sonnet-4-6-20260217", 100, 50)?;
// Get statistics
let stats = tracking.get_stats()?;
println!("Total cost: ${:.4}", stats.total_cost);

use limit_llm::{StatePersistence, Message};
let persistence = StatePersistence::new("~/.limit/state/")?;
// Save conversation
persistence.save("session-123", &messages)?;
// Restore later
let restored = persistence.load::<Vec<Message>>("session-123")?;

Automatic fallback between providers:
use limit_llm::ModelHandoff;
let handoff = ModelHandoff::new()
    .with_primary("claude-sonnet-4-6-20260217")
    .with_fallback("gpt-5.4")
    .with_fallback("glm-5");
// Automatically falls back if primary fails
let response = handoff.complete(messages).await?;

| Type | Description |
|---|---|
| `Message` | Chat message with role, content, and tool calls |
| `Role` | `User`, `Assistant`, `System`, or `Tool` |
| `Tool` / `ToolCall` | Function calling definitions |
| `Usage` | Token counting for prompt/completion |
| `Response` | Complete response with content and metadata |
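
The automatic fallback shown in the `ModelHandoff` example can be pictured as trying an ordered list of models until one succeeds. The sketch below is a synchronous, standard-library-only approximation of that idea. `Call`, `complete_with_fallback`, `always_fails`, and `echoes` are hypothetical names, and this is not how `ModelHandoff` is implemented.

```rust
// Hypothetical sketch of ordered fallback; not the ModelHandoff implementation.

/// A stand-in for "send this prompt to one model".
type Call = fn(&str) -> Result<String, String>;

/// Tries each `(model, call)` pair in order and returns the first success
/// together with the model that produced it. If every attempt fails, the
/// collected error messages are returned instead.
fn complete_with_fallback(
    attempts: &[(&str, Call)],
    prompt: &str,
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for (model, call) in attempts {
        match call(prompt) {
            Ok(text) => return Ok((model.to_string(), text)),
            Err(e) => errors.push(format!("{}: {}", model, e)),
        }
    }
    Err(errors)
}

/// Mock model that is always unavailable.
fn always_fails(_prompt: &str) -> Result<String, String> {
    Err("unavailable".to_string())
}

/// Mock model that echoes the prompt back.
fn echoes(prompt: &str) -> Result<String, String> {
    Ok(format!("echo: {}", prompt))
}
```

With `always_fails` as the primary and `echoes` as the fallback, the result names the fallback model; returning all errors when nothing succeeds keeps the reason for each failed attempt visible to the caller.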
See docs.rs/limit-llm for full API documentation.
# Run examples
cargo run --example basic
cargo run --example streaming
cargo run --example tool_calling

MIT © Mário Idival