Smart Model Selection is a routing system that automatically selects an LLM based on the type of task being performed, not just text length or cost. Different LLMs excel at different tasks: Claude for code, o1 for reasoning, Gemini for long documents, and so on.
Routing each task to the model that performs best for that type of work can cut costs substantially (43-82% in the scenarios analyzed later in this document) while improving output quality.
- The Problem
- How It Works
- Task Types
- Quick Start
- Configuration
- Task Detection Methods
- Real-World Examples
- API Reference
- Analytics & Monitoring
- Cost Savings Analysis
- Best Practices
- Advanced Usage
- Troubleshooting
Most applications use a single model for all tasks:
// Everything goes to GPT-4o
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "gpt-4o", // $2.50 input / $10.00 output per 1M tokens
    messages: [{ role: "user", content: prompt }]
  })
});

Problems with this approach:
- Massive Cost Waste
  - Using GPT-4o for simple greetings: 16x more expensive than needed
  - Using GPT-4o for code: Claude produces better results
  - Using GPT-4o for math: o1 models are specifically designed for reasoning
- Suboptimal Quality
  - GPT-4o for code generation: Good, but Claude is better
  - GPT-4o for long documents: 128K context limit, Gemini has 2M
  - GPT-4o for Chinese: Decent, but Kimi is optimized for it
- Reliability Issues
  - Long documents fail due to context limits
  - Complex reasoning tasks get incorrect answers
  - No automatic optimization
Route each request to the model that excels at that specific task type:
- 🎨 Code Generation → Claude (best code quality)
- 🧮 Math/Reasoning → o1/o1-mini (designed for reasoning)
- 📚 Long Documents → Gemini (2M token context)
- 💬 Simple Chat → GPT-4o-mini (cost-effective)
- ✍️ Creative Writing → GPT-4o (best creativity)
- 🌏 Chinese Language → Kimi (optimized for Chinese)
Result: Lower costs + Better quality + Higher reliability
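The mapping above amounts to a lookup table from task type to model. Here is a minimal sketch of that idea. The task-type keys, the model identifiers, and the names `ROUTES` and `pickModel` are illustrative assumptions, not part of TokenFirewall's API.

```javascript
// Hypothetical routing table; keys and model names are illustrative.
const ROUTES = {
  code_generation: "claude-3-5-sonnet-20241022",
  math_reasoning: "o1-mini",
  document_analysis: "gemini-2.5-pro",
  simple_chat: "gpt-4o-mini",
  creative_writing: "gpt-4o",
  chinese_language: "moonshot-v1-32k",
};

// Return the model for a task type, or a cheap default when the type is unknown.
function pickModel(taskType, defaultModel = "gpt-4o-mini") {
  const known = Object.prototype.hasOwnProperty.call(ROUTES, taskType);
  return known ? ROUTES[taskType] : defaultModel;
}

console.log(pickModel("code_generation")); // claude-3-5-sonnet-20241022
console.log(pickModel("unknown_task"));    // gpt-4o-mini
```

Keeping the table as plain data makes it easy to review which model each task type maps to and to change a single entry without touching routing logic.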
┌─────────────────────────────────────────────────────────────┐
│ User Request Arrives │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Task Classification Engine │
│ • Keyword Analysis │
│ • Pattern Matching (Regex) │
│ • Semantic Analysis │
│ • Context Analysis (conversation history) │
│ • Language Detection │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Task Type Identified │
│ Example: "code_generation" (95% confidence) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Model Selection │
│ Task: code_generation → Model: claude-3-5-sonnet │
│ Reason: "Claude excels at code generation" │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Request Routing │
│ • Route to selected model │
│ • Log decision for analytics │
│ • Track cost and performance │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Response & Learning │
│ • Return response to user │
│ • Track actual cost vs estimated │
│ • Update confidence scores │
│ • Improve future classifications │
└─────────────────────────────────────────────────────────────┘
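The flow in the diagram can be read as three small stages chained together: classify, select, log. The sketch below is a simplified illustration with toy stages passed in as functions. `routeRequest` and its parameters are hypothetical names, and the real classification engine is assumed to be considerably richer.

```javascript
// Each stage is passed in as a plain function so the flow is easy to follow.
function routeRequest(prompt, { classify, selectModel, log }) {
  const detection = classify(prompt);     // e.g. { taskType, confidence }
  const model = selectModel(detection);   // task type -> model id
  const decision = { ...detection, model };
  log(decision);                          // record the decision for analytics
  return decision;
}

// Toy stages, for illustration only.
const decisions = [];
const decision = routeRequest("Write a function to reverse a string", {
  classify: (p) =>
    /write.*function/i.test(p)
      ? { taskType: "code_generation", confidence: 0.95 }
      : { taskType: "simple_chat", confidence: 0.5 },
  selectModel: (d) =>
    d.taskType === "code_generation" ? "claude-3-5-sonnet" : "gpt-4o-mini",
  log: (d) => decisions.push(d),
});

console.log(decision.model); // claude-3-5-sonnet
```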
Best Model: Claude 3.5 Sonnet
Why: Claude consistently produces higher quality, more maintainable code
Keywords: write code, create function, implement, build, develop, program
Patterns: /write.*code/i, /create.*function/i, /implement.*class/i
Cost: $3.00 input / $15.00 output per 1M tokens
Example Prompts:
- "Write a Python function to validate email addresses"
- "Create a React component for user authentication"
- "Implement a binary search tree in JavaScript"
Best Model: Claude 3.5 Sonnet
Why: Excellent at analyzing code structure and suggesting improvements
Keywords: review code, find bugs, optimize, refactor, improve code, debug
Patterns: /review.*code/i, /find.*bug/i, /refactor/i, /optimize/i
Cost: $3.00 input / $15.00 output per 1M tokens
Example Prompts:
- "Review this code and suggest improvements: [code]"
- "Find potential bugs in this function"
- "Refactor this code to be more efficient"
Best Model: o1-mini
Why: Designed specifically for mathematical reasoning with chain-of-thought
Keywords: calculate, solve, compute, equation, formula, math
Patterns: /solve.*equation/i, /calculate/i, /mathematical/i
Cost: $3.00 input / $12.00 output per 1M tokens
Example Prompts:
- "Solve this equation: 2x² + 5x - 3 = 0"
- "Calculate the compound interest for $10,000 at 5% over 10 years"
- "Find the derivative of x³ + 2x² - 5x + 7"
Best Model: o1
Why: Advanced reasoning model with extended thinking time
Keywords: analyze, reason, logic, deduce, infer, prove, derive
Patterns: /step.*by.*step/i, /reasoning/i, /logical.*analysis/i
Cost: $15.00 input / $60.00 output per 1M tokens
Example Prompts:
- "Analyze this business problem and provide a logical solution"
- "Prove this mathematical theorem step by step"
- "Deduce the root cause of this system failure"
Best Model: Gemini 2.5 Pro
Why: 2M token context window handles very long documents
Keywords: summarize document, analyze document, extract from, review document
Patterns: /summarize.*document/i, /analyze.*pdf/i, /extract.*information/i
Cost: $1.25 input / $10.00 output per 1M tokens
Context Threshold: Automatically selected when input > 50,000 tokens
Example Prompts:
- "Summarize this 100-page legal document"
- "Extract key findings from this research paper"
- "Analyze this contract and highlight important clauses"
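The 50,000-token threshold above can be approximated without a tokenizer. This sketch assumes a rough heuristic of about four characters per token for English text, which is an assumption and not how the library necessarily counts tokens. `estimateTokens` and `selectForLength` are hypothetical helper names.

```javascript
const LONG_CONTEXT_THRESHOLD = 50000; // tokens, per the threshold stated above

// Rough heuristic: about 4 characters per token for English text (an assumption;
// a real tokenizer would give exact counts).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Pick the long-context model only when the estimate exceeds the threshold.
function selectForLength(text, defaultModel = "gpt-4o-mini") {
  return estimateTokens(text) > LONG_CONTEXT_THRESHOLD ? "gemini-2.5-pro" : defaultModel;
}

console.log(selectForLength("Short question"));   // gpt-4o-mini
console.log(selectForLength("x".repeat(400000))); // gemini-2.5-pro (about 100,000 tokens)
```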
Best Model: GPT-4o
Why: Best for creative, engaging, human-like content
Keywords: write story, create content, blog post, article, creative
Patterns: /write.*story/i, /creative.*writing/i, /blog.*post/i
Cost: $2.50 input / $10.00 output per 1M tokens
Example Prompts:
- "Write a short story about a time traveler"
- "Create a blog post about AI trends in 2026"
- "Write engaging product descriptions for an e-commerce site"
Best Model: GPT-4o-mini
Why: Translation is a simple task, so a cost-effective model is sufficient
Keywords: translate, translation, convert to
Patterns: /translate.*to/i, /translation/i
Cost: $0.15 input / $0.60 output per 1M tokens
Example Prompts:
- "Translate this text to Spanish"
- "Convert this document from English to French"
- "Translate: Hello, how are you?"
Best Model: GPT-4o-mini
Why: Fast, cost-effective for basic interactions
Keywords: hello, hi, how are you, thanks, thank you, help
Patterns: /^(hi|hello|hey)/i, /how.*are.*you/i, /thank/i
Cost: $0.15 input / $0.60 output per 1M tokens
Example Prompts:
- "Hi, how are you?"
- "Thanks for your help!"
- "Can you help me with something?"
Best Model: GPT-4o-mini
Why: Structured tasks work well with cost-effective models
Keywords: extract, parse, get data from, scrape, pull data
Patterns: /extract.*from/i, /parse.*json/i, /get.*data/i
Cost: $0.15 input / $0.60 output per 1M tokens
Example Prompts:
- "Extract all email addresses from this text"
- "Parse this JSON and get the user names"
- "Pull all dates from this document"
Best Model: Kimi (Moonshot v1-32k)
Why: Optimized specifically for Chinese language understanding
Keywords: 中文, 汉语, 普通话, Chinese
Patterns: /[\u4e00-\u9fa5]/ (Chinese characters)
Cost: $0.50 input / $0.50 output per 1M tokens
Example Prompts:
- "请帮我写一个排序算法" (Write a sorting algorithm)
- "翻译这段文字" (Translate this text)
- "分析这份中文文档" (Analyze this Chinese document)
Best Model: GPT-4o-mini
Why: Fast, accurate for straightforward factual questions
Keywords: what is, who is, when did, where is, how many
Patterns: /^(what|who|when|where|how|why)/i
Cost: $0.15 input / $0.60 output per 1M tokens
Example Prompts:
- "What is the capital of France?"
- "Who invented the telephone?"
- "When did World War II end?"
Best Model: Claude 3.5 Sonnet
Why: Excellent at technical writing and documentation
Keywords: document, documentation, API docs, technical writing
Patterns: /write.*documentation/i, /create.*docs/i
Cost: $3.00 input / $15.00 output per 1M tokens
Example Prompts:
- "Write API documentation for this function"
- "Create technical documentation for this system"
- "Document this codebase"
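Several of the patterns listed above overlap. For example, "How are you?" matches both the greeting pattern and the question pattern. One way to resolve this is to check higher-priority tasks first. The sketch below assumes illustrative priorities and a hypothetical `question_answering` type name; neither is taken from the library's defaults.

```javascript
// Illustrative definitions; the priorities are assumptions chosen for the example.
const TASKS = [
  { type: "question_answering", priority: 3, patterns: [/^(what|who|when|where|how|why)/i] },
  { type: "simple_chat", priority: 5, patterns: [/^(hi|hello|hey)/i, /how.*are.*you/i, /thank/i] },
  { type: "code_generation", priority: 10, patterns: [/write.*code/i, /create.*function/i, /implement.*class/i] },
];

// Check higher-priority tasks first and return the first one whose pattern matches.
function firstMatch(prompt) {
  const ordered = [...TASKS].sort((a, b) => b.priority - a.priority);
  const hit = ordered.find((t) => t.patterns.some((re) => re.test(prompt)));
  return hit ? hit.type : null;
}

console.log(firstMatch("How are you today?"));       // simple_chat
console.log(firstMatch("How many moons has Mars?")); // question_answering
```

Without the priority ordering, the first prompt would be classified by whichever definition happened to come first in the list.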
const {
createModelRouter,
patchGlobalFetch,
registerApiKeys
} = require("tokenfirewall");
// Step 1: Register API keys for all providers
registerApiKeys({
openai: process.env.OPENAI_API_KEY,
anthropic: process.env.ANTHROPIC_API_KEY,
gemini: process.env.GEMINI_API_KEY,
grok: process.env.GROK_API_KEY,
kimi: process.env.KIMI_API_KEY
});
// Step 2: Enable smart model selection
createModelRouter({
strategy: "smart", // Use task-type based routing
enableCrossProvider: true // Enable cross-provider fallback
});
// Step 3: Patch global fetch
patchGlobalFetch();
// Step 4: Use any LLM API - routing is automatic!
const response = await fetch("https://api.openai.com/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
"Content-Type": "application/json"
},
body: JSON.stringify({
model: "gpt-4o", // This will be automatically replaced with optimal model
messages: [
{ role: "user", content: "Write a Python function to sort an array" }
]
})
});
// Behind the scenes:
// 1. Detects task type: "code_generation"
// 2. Selects optimal model: "claude-3-5-sonnet-20241022"
// 3. Routes to Anthropic API
// 4. Returns response in OpenAI format
// 5. Logs decision for analytics

const {
createBudgetGuard,
createModelRouter,
patchGlobalFetch
} = require("tokenfirewall");
// Set budget limit
createBudgetGuard({
monthlyLimit: 100, // $100 USD
mode: "block"
});
// Enable smart routing
createModelRouter({
strategy: "smart"
});
patchGlobalFetch();
// Now you have:
// ✅ Automatic task-based routing
// ✅ Budget protection
// ✅ Cost optimization
// ✅ Quality improvement

createModelRouter({
strategy: "smart",
// Optional: Customize task classifications
taskClassification: {
"code_generation": {
model: "claude-3-5-sonnet-20241022",
reason: "Claude excels at code generation",
keywords: ["write code", "create function", "implement"],
patterns: [/write.*code/i, /create.*function/i],
priority: 10 // Higher priority = checked first
},
"math_reasoning": {
model: "o1-mini",
reason: "o1 designed for reasoning",
keywords: ["calculate", "solve", "equation"],
patterns: [/solve.*equation/i, /calculate/i],
priority: 9
},
// Add your custom task types
"legal_analysis": {
model: "gpt-4o",
reason: "Complex legal reasoning",
keywords: ["legal", "contract", "clause"],
patterns: [/legal.*analysis/i],
priority: 8
}
},
// Optional: Override specific models
modelOverrides: {
"code_generation": "gpt-4o", // Use GPT-4o instead of Claude
"math_reasoning": "o1" // Use o1 instead of o1-mini
},
// Optional: Confidence threshold (0-1)
confidenceThreshold: 0.7, // Only route if confidence > 70%
// Optional: Fallback model if no task detected
defaultModel: "gpt-4o-mini",
// Optional: Enable cross-provider fallback
enableCrossProvider: true,
// Optional: Max retries
maxRetries: 2,
// Optional: Enable analytics
enableAnalytics: true,
// Optional: Custom task detector function
customDetector: async (prompt, context) => {
// Your custom logic here
if (prompt.includes("urgent")) {
return {
taskType: "urgent_request",
model: "gpt-4o",
confidence: 1.0
};
}
return null; // Fall back to default detection
}
});

const config = {
strategy: "smart",
taskClassification: {
"code_generation": {
model: process.env.NODE_ENV === "production"
? "claude-3-5-sonnet-20241022" // Best quality for production
: "gpt-4o-mini", // Cheaper for development
reason: "Environment-based selection"
}
}
};
createModelRouter(config);

Simple string matching for common task indicators:
// Prompt: "Write a Python function to sort arrays"
// Keywords detected: ["write", "function"]
// Match: code_generation
// Confidence: 85%

Pros: Fast, simple, reliable
Cons: Can miss context, may have false positives
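Keyword matching can be sketched as counting case-insensitive substring hits per task. The keyword lists below are copied from the task descriptions in this document. The scoring rule (most hits wins, ties go to the first task) and the name `matchKeywords` are assumptions for illustration, not the library's documented behavior.

```javascript
// Keyword lists copied from the task descriptions in this document.
const KEYWORDS = {
  code_generation: ["write code", "create function", "implement", "build", "develop", "program"],
  translation: ["translate", "translation", "convert to"],
  math_reasoning: ["calculate", "solve", "compute", "equation", "formula", "math"],
};

// Count case-insensitive substring hits per task and return the best-scoring task.
function matchKeywords(prompt) {
  const text = prompt.toLowerCase();
  let best = { taskType: null, hits: 0 };
  for (const [taskType, words] of Object.entries(KEYWORDS)) {
    const hits = words.filter((w) => text.includes(w)).length;
    if (hits > best.hits) best = { taskType, hits };
  }
  return best;
}

console.log(matchKeywords("Solve this equation: 2x + 5 = 15"));
// { taskType: 'math_reasoning', hits: 2 }
```

Substring matching is also where the false positives come from: "Translate this program description" scores one hit for both `code_generation` ("program") and `translation` ("translate"), and this sketch returns `code_generation` because it is checked first.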
Advanced pattern detection using regular expressions:
// Prompt: "Can you help me solve this equation: 2x + 5 = 15"
// Pattern matched: /solve.*equation/i
// Match: math_reasoning
// Confidence: 95%

Pros: More accurate, handles variations
Cons: Requires careful pattern design
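To illustrate why pattern design needs care, the sketch below compares the loose `/optimize/i` pattern from the code review task with a tighter variant. The tighter pattern is a hypothetical example, not one shipped with the library. It requires a programming noun to follow the verb.

```javascript
// Pattern from the code review task above, plus a hypothetical tighter variant.
const loose = /optimize/i;
const tighter = /\b(optimize|refactor)\b.*\b(code|function|query|loop)\b/i;

const prompts = [
  "Optimize this function for speed",
  "How do I optimize my morning routine?",
];

for (const p of prompts) {
  console.log(p, "| loose:", loose.test(p), "| tighter:", tighter.test(p));
}
// The loose pattern matches both prompts; the tighter one matches only the first.
```

The trade-off is recall: the tighter pattern would miss "Optimize this" followed by a pasted snippet with no noun, so patterns like this are best tested against real traffic.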
Analyzes the meaning and intent of the prompt:
// Prompt: "I need help fixing this bug in my code"
// Semantic analysis:
// - Intent: debugging
// - Domain: programming
// - Action: fix/repair
// Match: code_review
// Confidence: 90%

Pros: Understands context and intent
Cons: More computationally expensive
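This document does not say how semantic analysis is implemented. As a crude stand-in, the sketch below scores a prompt by token-overlap (Jaccard) similarity against example prompts for each task. A production system would more likely use embeddings or a small classifier. The example prompts come from the task descriptions above, and `jaccard` and `nearestTask` are hypothetical names.

```javascript
// Example prompts per task, taken from the task descriptions in this document.
const EXAMPLES = {
  code_review: ["Find potential bugs in this function", "Refactor this code to be more efficient"],
  translation: ["Translate this text to Spanish", "Convert this document from English to French"],
};

// Lowercased set of alphabetic words in a string.
const tokens = (s) => new Set(s.toLowerCase().match(/[a-z]+/g) || []);

// Jaccard similarity: |A ∩ B| / |A ∪ B|
function jaccard(a, b) {
  const A = tokens(a);
  const B = tokens(b);
  const shared = [...A].filter((t) => B.has(t)).length;
  const union = A.size + B.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Score each task by its best-matching example prompt.
function nearestTask(prompt) {
  let best = { taskType: null, score: 0 };
  for (const [taskType, examples] of Object.entries(EXAMPLES)) {
    const score = Math.max(...examples.map((e) => jaccard(prompt, e)));
    if (score > best.score) best = { taskType, score };
  }
  return best;
}

console.log(nearestTask("I need help fixing this bug in my code").taskType); // code_review
```

Word overlap treats "bug" and "bugs" as unrelated, which is exactly the kind of gap that embedding-based similarity is meant to close, at a higher compute cost.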
Considers conversation history and context:
// Previous messages:
// User: "I'm building a React application"
// Assistant: "Great! What features do you need?"
// Current: "Add a login form"
// Context analysis:
// - Previous context: React development
// - Current request: Add feature
// - Inferred task: code_generation
// Match: code_generation
// Confidence: 92%

Pros: Highly accurate with context
Cons: Requires conversation history
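The context step can be sketched as: if the current message alone is ambiguous, look for domain cues in the last few turns. The cue lists and the function name `classifyWithContext` below are assumptions made for this example, not the library's actual rules.

```javascript
// Hypothetical cue lists: words that suggest a programming conversation,
// and verbs that suggest a "build something" request.
const CODE_CUES = /\b(react|python|javascript|function|component|api|bug)\b/i;
const ACTION_CUES = /^(add|create|build|fix|implement)\b/i;

// Infer a task from the current message plus the last few turns of history.
function classifyWithContext(current, history = []) {
  const recent = history.slice(-4).map((m) => m.content).join(" ");
  if (CODE_CUES.test(current)) return "code_generation";
  if (ACTION_CUES.test(current) && CODE_CUES.test(recent)) return "code_generation";
  return "unknown";
}

const history = [
  { role: "user", content: "I'm building a React application" },
  { role: "assistant", content: "Great! What features do you need?" },
];

console.log(classifyWithContext("Add a login form", history)); // code_generation
console.log(classifyWithContext("Add a login form"));          // unknown
```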
Automatically detects non-English languages:
// Prompt: "请帮我写一个排序算法"
// Language detected: Chinese (zh-CN)
// Characters: [\u4e00-\u9fa5]
// Match: chinese_language
// Confidence: 99%

Pros: Perfect for multilingual apps
Cons: Limited to language-specific tasks
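A single Chinese character inside an otherwise English prompt should probably not trigger Chinese routing. One option is to look at the share of Chinese characters. This sketch uses the same character range as the pattern listed for the Chinese language task. The 0.3 cutoff and the helper names are assumptions, not documented defaults.

```javascript
// Same character range as the pattern listed for the Chinese language task.
const CJK = /[\u4e00-\u9fa5]/g;

// Fraction of non-whitespace characters that are Chinese characters.
function chineseRatio(text) {
  const chars = text.replace(/\s/g, "");
  if (chars.length === 0) return 0;
  return (chars.match(CJK) || []).length / chars.length;
}

// The 0.3 cutoff is an assumption for this sketch, not a documented default.
function isChinese(text, cutoff = 0.3) {
  return chineseRatio(text) >= cutoff;
}

console.log(isChinese("请帮我写一个排序算法"));                 // true
console.log(isChinese("Translate 你好 into English, please")); // false
```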
Combines multiple detection methods for highest accuracy:
// Prompt: "Write a Python function to solve quadratic equations"
// Method 1 - Keywords: ["write", "function"] → code_generation (70%)
// Method 2 - Pattern: /solve.*equation/i → math_reasoning (80%)
// Method 3 - Semantic: Programming + Math → hybrid (85%)
// Fusion Result:
// Primary: code_generation (60% weight)
// Secondary: math_reasoning (40% weight)
// Selected: code_generation (Claude)
// Fallback: math_reasoning (o1-mini)
// Final Confidence: 88%

Pros: Highest accuracy, handles edge cases
Cons: Most complex, slightly slower
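One simple way to fuse several detectors is a weighted vote: each method contributes weight times confidence to its task, and tasks are ranked by total. The per-method weights and the `fuse` function below are assumptions for illustration. The sketch does not attempt to reproduce the exact percentages shown above.

```javascript
// Assumed per-method weights; tune these for your own traffic.
const METHOD_WEIGHTS = { keywords: 0.5, pattern: 0.3, semantic: 0.2 };

// Sum weight * confidence per task type, then rank tasks by total score.
function fuse(votes) {
  const totals = {};
  for (const { method, taskType, confidence } of votes) {
    totals[taskType] = (totals[taskType] || 0) + METHOD_WEIGHTS[method] * confidence;
  }
  const ranked = Object.entries(totals).sort((a, b) => b[1] - a[1]);
  if (ranked.length === 0) return { primary: null, fallback: null };
  return { primary: ranked[0][0], fallback: ranked[1] ? ranked[1][0] : null };
}

const result = fuse([
  { method: "keywords", taskType: "code_generation", confidence: 0.7 },
  { method: "pattern", taskType: "math_reasoning", confidence: 0.8 },
  { method: "semantic", taskType: "code_generation", confidence: 0.85 },
]);

console.log(result); // { primary: 'code_generation', fallback: 'math_reasoning' }
```

Here `code_generation` totals 0.5 × 0.7 + 0.2 × 0.85 = 0.52 and `math_reasoning` totals 0.3 × 0.8 = 0.24, so the second-ranked task becomes the fallback.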
// User Request
const prompt = "Write a Python function to validate email addresses using regex";
// Task Detection
// ✓ Keywords: ["write", "function", "python"] → code_generation
// ✓ Pattern: /write.*function/i → code_generation
// ✓ Confidence: 95%
// Model Selection
// Selected: claude-3-5-sonnet-20241022
// Reason: "Claude excels at code generation"
// Cost: $3.00 input / $15.00 output per 1M tokens
// Result
// ✅ High-quality, well-documented code
// ✅ Proper error handling
// ✅ Best practices followed
// ✅ Cost: ~$0.003 for this request
// vs if we used gpt-4o-mini:
// ⚠️ Cost: $0.0002 (cheaper)
// ❌ Lower code quality
// ❌ Less robust error handling

// User Request
const prompt = "Solve this calculus problem: Find the derivative of f(x) = x³ + 2x² - 5x + 7";
// Task Detection
// ✓ Keywords: ["solve", "derivative", "calculus"] → math_reasoning
// ✓ Pattern: /solve.*calculus/i → math_reasoning
// ✓ Confidence: 98%
// Model Selection
// Selected: o1-mini
// Reason: "o1 models designed for mathematical reasoning"
// Cost: $3.00 input / $12.00 output per 1M tokens
// Result
// ✅ Step-by-step solution
// ✅ Chain-of-thought reasoning
// ✅ Correct answer: f'(x) = 3x² + 4x - 5
// ✅ Cost: ~$0.006 for this request
// vs if we used gpt-4o-mini:
// ❌ May make calculation errors
// ❌ No step-by-step reasoning
// ❌ Less reliable for complex math

// User Request
const prompt = "Summarize this 100-page legal contract: [150,000 tokens of text]";
// Task Detection
// ✓ Keywords: ["summarize", "document", "legal"] → document_analysis
// ✓ Context size: 150,000 tokens
// ✓ Confidence: 92%
// Model Selection
// Selected: gemini-2.5-pro
// Reason: "Gemini has 2M token context window"
// Cost: $1.25 input / $10.00 output per 1M tokens
// Result
// ✅ Processes entire document (no chunking needed)
// ✅ Comprehensive summary
// ✅ Identifies key clauses
// ✅ Cost: ~$0.30 for this request
// vs if we used gpt-4o:
// ❌ Context limit: 128K tokens
// ❌ Would fail or need chunking
// ❌ Incomplete analysis
// ❌ Higher cost per token

// User Request
const prompt = "Hi, how are you today?";
// Task Detection
// ✓ Keywords: ["hi", "how are you"] → simple_chat
// ✓ Pattern: /^(hi|hello)/i → simple_chat
// ✓ Confidence: 99%
// Model Selection
// Selected: gpt-4o-mini
// Reason: "Cost-effective for simple conversation"
// Cost: $0.15 input / $0.60 output per 1M tokens
// Result
// ✅ Perfect for simple greeting
// ✅ Fast response
// ✅ Cost: ~$0.00005 for this request
// vs if we used gpt-4o:
// ⚠️ Cost: $0.0008 (16x more expensive!)
// ⚠️ Overkill for simple greeting
// ⚠️ No quality benefit

// User Request
const prompt = "Write a Python function to solve quadratic equations and explain the math";
// Task Detection
// ✓ Keywords: ["write", "function", "solve", "equation"] → hybrid
// ✓ Primary: code_generation (60% confidence)
// ✓ Secondary: math_reasoning (40% confidence)
// Model Selection
// Selected: claude-3-5-sonnet-20241022
// Reason: "Primary task is code generation"
// Fallback: o1-mini (if Claude fails)
// Result
// ✅ Well-structured Python function
// ✅ Mathematical explanation included
// ✅ Best of both worlds
// ✅ Cost: ~$0.004 for this request

Creates and configures the smart model router.
Parameters:
interface SmartRouterOptions {
strategy: "smart";
taskClassification?: TaskClassificationConfig;
modelOverrides?: Record<string, string>;
confidenceThreshold?: number;
defaultModel?: string;
enableCrossProvider?: boolean;
maxRetries?: number;
enableAnalytics?: boolean;
customDetector?: (prompt: string, context: any) => Promise<TaskDetection | null>;
}

Example:
createModelRouter({
strategy: "smart",
confidenceThreshold: 0.75,
defaultModel: "gpt-4o-mini",
enableAnalytics: true
});

Manually classify a task type.
Parameters:
- prompt (string): The user's prompt
- context (object, optional): Additional context (conversation history, metadata)
Returns:
interface TaskClassification {
taskType: string;
confidence: number;
selectedModel: string;
reason: string;
alternatives: Array<{
model: string;
confidence: number;
}>;
}

Example:
const classification = await classifyTask(
"Write a Python function to sort arrays",
{ conversationHistory: [...] }
);
console.log(classification);
// {
// taskType: "code_generation",
// confidence: 0.95,
// selectedModel: "claude-3-5-sonnet-20241022",
// reason: "Claude excels at code generation",
// alternatives: [
// { model: "gpt-4o", confidence: 0.75 },
// { model: "gpt-4o-mini", confidence: 0.50 }
// ]
// }

Manually override the task type for the next request.
Parameters:
- taskType (string): The task type to use
Example:
// Force code generation model
overrideTaskType("code_generation");
// Next request will use Claude regardless of content
const response = await fetch(url, { ... });

Get analytics about task classification and model usage.
Parameters:
interface AnalyticsOptions {
startDate?: string;
endDate?: string;
groupBy?: "day" | "week" | "month" | "task" | "model";
}

Returns:
interface TaskAnalytics {
totalRequests: number;
taskDistribution: Record<string, number>;
modelUsage: Record<string, number>;
costSavings: number;
averageCostPerRequest: number;
accuracyRate: number;
topTasks: Array<{ task: string; count: number; percentage: number }>;
}

Example:
const analytics = await getTaskAnalytics({
startDate: "2026-05-01",
endDate: "2026-05-27",
groupBy: "task"
});
console.log(analytics);
// {
// totalRequests: 100000,
// taskDistribution: {
// code_generation: 45000,
// simple_chat: 30000,
// math_reasoning: 15000,
// document_analysis: 10000
// },
// modelUsage: {
// "claude-3-5-sonnet": 45000,
// "gpt-4o-mini": 30000,
// "o1-mini": 15000,
// "gemini-2.5-pro": 10000
// },
// costSavings: 1250.00,
// averageCostPerRequest: 0.00035,
// accuracyRate: 0.94
// }

Force a specific task type for a single request:
const response = await fetch(url, {
headers: {
"X-TokenFirewall-Task-Type": "code_generation"
}
});

Disable smart routing for a single request:
const response = await fetch(url, {
headers: {
"X-TokenFirewall-Smart-Routing": "false"
}
});

Add custom tags for analytics:
const response = await fetch(url, {
headers: {
"X-TokenFirewall-Tags": JSON.stringify({
feature: "chat",
team: "product",
priority: "high"
})
}
});

// Enable real-time logging
createModelRouter({
strategy: "smart",
enableAnalytics: true,
onTaskDetected: (detection) => {
console.log(`Task: ${detection.taskType}`);
console.log(`Model: ${detection.selectedModel}`);
console.log(`Confidence: ${detection.confidence}`);
},
onModelSelected: (selection) => {
console.log(`Routing to: ${selection.model}`);
console.log(`Reason: ${selection.reason}`);
}
});

Track key metrics for optimization:
const metrics = await getSmartRoutingMetrics();
console.log(metrics);
// {
// last24Hours: {
// totalRequests: 5000,
// costSavings: 45.50,
// averageConfidence: 0.89,
// taskBreakdown: {
// code_generation: 2250,
// simple_chat: 1500,
// math_reasoning: 750,
// document_analysis: 500
// }
// },
// last7Days: {
// totalRequests: 35000,
// costSavings: 318.50,
// topModels: [
// { model: "claude-3-5-sonnet", usage: 15750 },
// { model: "gpt-4o-mini", usage: 10500 },
// { model: "o1-mini", usage: 5250 }
// ]
// },
// last30Days: {
// totalRequests: 150000,
// costSavings: 1365.00,
// savingsPercentage: 62.3
// }
// }

Export data for external analysis:
const fs = require("fs");

const data = await exportTaskAnalytics({
  format: "csv", // or "json", "xlsx"
  startDate: "2026-05-01",
  endDate: "2026-05-27"
});
// Save to file
fs.writeFileSync("task-analytics.csv", data);

Profile:
- 100,000 requests/month
- Mix of code, chat, and documentation tasks
Without Smart Selection:
| Task Type | Requests | Model | Cost per Request | Total Cost |
|---|---|---|---|---|
| All tasks | 100,000 | GPT-4o | $0.009 | $900.00 |
With Smart Selection:
| Task Type | Requests | Model | Cost per Request | Total Cost |
|---|---|---|---|---|
| Code Generation | 40,000 | Claude | $0.003 | $120.00 |
| Simple Chat | 30,000 | GPT-4o-mini | $0.0003 | $9.00 |
| Math/Reasoning | 15,000 | o1-mini | $0.006 | $90.00 |
| Document Analysis | 10,000 | Gemini | $0.0125 | $125.00 |
| Translation | 5,000 | GPT-4o-mini | $0.0003 | $1.50 |
| TOTAL | 100,000 | Mixed | $0.00346 | $345.50 |
💰 Monthly Savings: $554.50 (62% reduction)
📈 Quality: Improved (right model for each task)
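The totals in this scenario can be checked with a few lines of arithmetic. The sketch below recomputes the routed cost, the savings, and the percentage from the table rows. The variable names are illustrative.

```javascript
// Rows from the "With Smart Selection" table above.
const rows = [
  { task: "Code Generation", requests: 40000, costPerRequest: 0.003 },
  { task: "Simple Chat", requests: 30000, costPerRequest: 0.0003 },
  { task: "Math/Reasoning", requests: 15000, costPerRequest: 0.006 },
  { task: "Document Analysis", requests: 10000, costPerRequest: 0.0125 },
  { task: "Translation", requests: 5000, costPerRequest: 0.0003 },
];

// Baseline: every request goes to GPT-4o at $0.009 per request.
const baseline = 100000 * 0.009;
const routed = rows.reduce((sum, r) => sum + r.requests * r.costPerRequest, 0);
const savings = baseline - routed;

console.log(routed.toFixed(2));                             // 345.50
console.log(savings.toFixed(2));                            // 554.50
console.log(((savings / baseline) * 100).toFixed(1) + "%"); // 61.6%
```

The same calculation applies to the other scenarios by swapping in their rows and baseline cost per request.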
Profile:
- 50,000 requests/month
- Mostly simple questions, some complex issues
Without Smart Selection:
| Task Type | Requests | Model | Cost per Request | Total Cost |
|---|---|---|---|---|
| All tasks | 50,000 | GPT-4o | $0.003 | $150.00 |
With Smart Selection:
| Task Type | Requests | Model | Cost per Request | Total Cost |
|---|---|---|---|---|
| Simple FAQ | 35,000 | GPT-4o-mini | $0.0002 | $7.00 |
| Medium Complexity | 10,000 | GPT-4o-mini | $0.0005 | $5.00 |
| Complex Issues | 5,000 | GPT-4o | $0.003 | $15.00 |
| TOTAL | 50,000 | Mixed | $0.00054 | $27.00 |
💰 Monthly Savings: $123.00 (82% reduction)
📈 Response Time: Faster (smaller models generally respond more quickly)
Profile:
- 20,000 requests/month
- Creative writing, code generation, translations
Without Smart Selection:
| Task Type | Requests | Model | Cost per Request | Total Cost |
|---|---|---|---|---|
| All tasks | 20,000 | GPT-4o | $0.015 | $300.00 |
With Smart Selection:
| Task Type | Requests | Model | Cost per Request | Total Cost |
|---|---|---|---|---|
| Creative Writing | 8,000 | GPT-4o | $0.015 | $120.00 |
| Code Generation | 6,000 | Claude | $0.008 | $48.00 |
| Translation | 4,000 | GPT-4o-mini | $0.0005 | $2.00 |
| Simple Edits | 2,000 | GPT-4o-mini | $0.0003 | $0.60 |
| TOTAL | 20,000 | Mixed | $0.00853 | $170.60 |
💰 Monthly Savings: $129.40 (43% reduction)
📈 Quality: Same or better (specialized models)
Begin with built-in task classifications:
createModelRouter({
strategy: "smart"
// Use defaults first
});

Monitor for 1-2 weeks, then customize based on your specific needs.
Set appropriate confidence levels:
createModelRouter({
strategy: "smart",
confidenceThreshold: 0.75, // Only route if 75%+ confident
defaultModel: "gpt-4o-mini" // Fallback for low confidence
});

Recommended thresholds:
- Production: 0.75-0.85 (higher accuracy)
- Development: 0.60-0.70 (more experimentation)
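The effect of the threshold can be sketched as a simple gate: use the detected model when confidence meets the threshold, otherwise fall back to the default. `applyThreshold` is a hypothetical helper. Whether the library treats a confidence exactly equal to the threshold as passing is an assumption here.

```javascript
// Use the detected model only when confidence meets the threshold;
// otherwise fall back to the default model.
function applyThreshold(detection, options = {}) {
  const { confidenceThreshold = 0.75, defaultModel = "gpt-4o-mini" } = options;
  if (detection && detection.confidence >= confidenceThreshold) {
    return { model: detection.model, usedFallback: false };
  }
  return { model: defaultModel, usedFallback: true };
}

const confident = { taskType: "code_generation", model: "claude-3-5-sonnet-20241022", confidence: 0.95 };
const unsure = { taskType: "math_reasoning", model: "o1-mini", confidence: 0.62 };

console.log(applyThreshold(confident).model);                            // claude-3-5-sonnet-20241022
console.log(applyThreshold(unsure).model);                               // gpt-4o-mini
console.log(applyThreshold(unsure, { confidenceThreshold: 0.6 }).model); // o1-mini
```

A higher threshold sends more traffic to the cheap default, which is safer for production. A lower one exercises the specialized routes more often, which suits experimentation.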
Regularly review analytics:
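// Weekly review: ISO dates (YYYY-MM-DD) for today and seven days ago
const today = new Date().toISOString().slice(0, 10);
const lastWeek = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).toISOString().slice(0, 10);
const analytics = await getTaskAnalytics({
  startDate: lastWeek,
  endDate: today
});
// Check accuracy
if (analytics.accuracyRate < 0.85) {
  console.warn("Low accuracy - review task classifications");
}
// Check cost savings
console.log(`Saved: $${analytics.costSavings.toFixed(2)}`);

Tag requests for better analytics: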
await fetch(url, {
headers: {
"X-TokenFirewall-Tags": JSON.stringify({
feature: "chat",
team: "product",
customer: "acme-corp",
priority: "high"
})
}
});
// Later, analyze by tag
const costs = await getCostsByTag("customer", "acme-corp");

Always use budget guards:
createBudgetGuard({
monthlyLimit: 500,
mode: "block"
});
createModelRouter({
strategy: "smart"
});
// Now you have:
// ✅ Cost optimization (smart routing)
// ✅ Cost protection (budget guard)

Use environment-based configuration:
const config = {
strategy: "smart",
confidenceThreshold: process.env.NODE_ENV === "production" ? 0.80 : 0.60,
enableAnalytics: true
};
createModelRouter(config);

Provide fallbacks for uncertain tasks:
createModelRouter({
strategy: "smart",
defaultModel: "gpt-4o-mini", // Safe, cheap default
customDetector: async (prompt) => {
// Handle special cases
if (prompt.includes("URGENT")) {
return {
taskType: "urgent",
model: "gpt-4o",
confidence: 1.0
};
}
return null; // Use default detection
}
});

Keep track of custom task types:
// tasks.config.js
module.exports = {
taskClassification: {
"legal_analysis": {
model: "gpt-4o",
reason: "Complex legal reasoning required",
keywords: ["legal", "contract", "clause"],
// Document why this exists
notes: "Added for legal team - requires high accuracy"
}
}
};

Handle prompts with multiple task types:
// Note: detectMultipleTasks, selectBestForHybrid, and averageConfidence
// are helpers that are not defined in this snippet.
createModelRouter({
  strategy: "smart",
  multiTaskHandling: "primary", // or "hybrid", "split"
  customDetector: async (prompt) => {
    const tasks = await detectMultipleTasks(prompt);
    if (tasks.length > 1) {
      // Primary task approach: use the highest-confidence task
      return tasks[0];
      // Hybrid approach (alternative): replace the return above with
      // return {
      //   taskType: "hybrid",
      //   model: selectBestForHybrid(tasks),
      //   confidence: averageConfidence(tasks)
      // };
    }
    return null; // Fall back to default detection
  }
});

Use conversation history for better detection:
const conversationHistory = [
{ role: "user", content: "I'm building a React app" },
{ role: "assistant", content: "Great! What features do you need?" }
];
const response = await fetch(url, {
headers: {
"X-TokenFirewall-Context": JSON.stringify(conversationHistory)
},
body: JSON.stringify({
messages: [
...conversationHistory,
{ role: "user", content: "Add authentication" }
]
})
});
// Detection will use context:
// Previous: React app development
// Current: Add authentication
// Inferred: code_generation → Claude

Test different models for the same task:
createModelRouter({
strategy: "smart",
abTesting: {
enabled: true,
tasks: {
"code_generation": {
variants: [
{ model: "claude-3-5-sonnet", weight: 0.7 },
{ model: "gpt-4o", weight: 0.3 }
]
}
}
}
});
// 70% of code generation → Claude
// 30% of code generation → GPT-4o
// Track which performs better

Override based on cost thresholds:
createModelRouter({
strategy: "smart",
costOverrides: {
maxCostPerRequest: 0.01, // $0.01 max
fallbackModel: "gpt-4o-mini"
},
customDetector: async (prompt) => {
const detection = await defaultDetection(prompt);
const estimatedCost = estimateCost(detection.model, prompt);
if (estimatedCost > 0.01) {
return {
taskType: detection.taskType,
model: "gpt-4o-mini", // Downgrade to cheaper
confidence: detection.confidence,
reason: "Cost threshold exceeded"
};
}
return detection;
}
});

Route based on quality requirements:
await fetch(url, {
headers: {
"X-TokenFirewall-Quality": "high" // or "medium", "low"
}
});
// High quality → Use premium models
// Medium quality → Use balanced models
// Low quality → Use cheap models

Symptom: Many requests falling back to default model
Solutions:
- Add more keywords to task definitions
- Add more regex patterns
- Lower confidence threshold
- Add custom detector for your specific use case
createModelRouter({
strategy: "smart",
confidenceThreshold: 0.65, // Lower threshold
taskClassification: {
"your_task": {
keywords: ["more", "keywords", "here"],
patterns: [/more.*patterns/i]
}
}
});

Symptom: Task classified incorrectly
Solutions:
- Check keyword conflicts between tasks
- Add more specific patterns
- Use manual override for specific requests
- Review analytics to find patterns
// Manual override
await fetch(url, {
headers: {
"X-TokenFirewall-Task-Type": "correct_task_type"
}
});

Symptom: Costs not decreasing as expected
Solutions:
- Review task distribution in analytics
- Check if expensive models are being overused
- Adjust task classifications
- Add cost-based overrides
const analytics = await getTaskAnalytics();
console.log(analytics.modelUsage);
// Check if expensive models dominate

Symptom: Requests taking longer than expected
Solutions:
- Rule out detection overhead first: task detection adds under 10 ms per request
- Check if using expensive models unnecessarily
- Consider caching task classifications
- Use faster models for time-sensitive tasks
createModelRouter({
strategy: "smart",
cacheDetections: true, // Cache for 5 minutes
taskClassification: {
"time_sensitive": {
model: "gpt-4o-mini", // Fastest model
keywords: ["urgent", "quick", "fast"]
}
}
});

Smart Model Selection transforms how you use LLMs by:
✅ Reducing costs through intelligent routing (43-82% in the scenarios analyzed above)
✅ Improving quality by using specialized models
✅ Increasing reliability with appropriate model selection
✅ Providing insights through comprehensive analytics
✅ Requiring no changes to existing API calls once the router is enabled
- Install TokenFirewall: npm install tokenfirewall@latest
- Enable Smart Routing: createModelRouter({ strategy: "smart" });
- Monitor Results: const analytics = await getTaskAnalytics();
- Optimize & Iterate
  - Review analytics weekly
  - Adjust task classifications
  - Add custom tasks as needed
- GitHub: https://github.com/Ruthwik000/tokenfirewall
- npm: https://www.npmjs.com/package/tokenfirewall
- Documentation: See README.md
- Issues: https://github.com/Ruthwik000/tokenfirewall/issues
- Examples: See examples/ directory
Built with ❤️ for the AI developer community.
TokenFirewall - Smart Model Selection for Production LLM Applications