
LLM Router

Created by ypollak2
Route text, image, video, and audio tasks to 20+ AI providers — automatically picking the best model for the job based on your budget and active profile.
Overview

llm-router

Route every AI call to the cheapest model that can handle it. 41 tools · 20+ providers · Claude Code, VS Code, Cursor, Codex, and more.

PyPI Tests Downloads Python MCP License Stars

Average savings: 60–80% vs running everything on Claude Opus.

# One command to start saving
uvx claude-code-llm-router install

# Or: guided 5-minute setup
uvx claude-code-llm-router quickstart
| Host | One-line install |
|---|---|
| Claude Code | llm-router install |
| VS Code | llm-router install --host vscode |
| Cursor | llm-router install --host cursor |
| Codex CLI | llm-router install --host codex |

LLM Router is an MCP server and hook set that intercepts prompts and routes them to the cheapest model that can handle the task.

It is built for a common failure mode in AI coding tools: using your best model for everything. In Claude Code, that burns quota on simple explanations, file lookups, small edits, and repetitive prompts. In other MCP clients, it means paying premium-model prices for work that never needed them.

The goal is simple: keep cheap work on cheap or free models, keep hard work on Claude or other premium models, and remove the need to micromanage model selection. Works in Claude Code, Cursor, VS Code, Codex, Windsurf, Zed, claw-code, and Agno.


Why

Most sessions contain a lot of low-value turns: quick questions, repo lookups, boilerplate edits, and small follow-ups. Those are exactly the prompts that quietly burn through premium models.

LLM Router offloads that work first, then escalates when the task actually needs more capability.

  • Cheap work stays cheap.
  • Hard work still gets the best model.
  • Your workflow stays the same.

It does not try to replace Claude or force weak models onto hard tasks. It removes the waste around them.


Quick Start

pipx install claude-code-llm-router && llm-router install

llm-router install registers the MCP server and installs hooks so prompt routing starts automatically.

If you use Claude Code Pro/Max, you can start with zero API keys. Otherwise add GEMINI_API_KEY for a cheap free-tier fallback.

GEMINI_API_KEY=AIza...              # optional free-tier fallback
LLM_ROUTER_CLAUDE_SUBSCRIPTION=true

How It Works

  1. Intercept the prompt before your default premium model sees it.
  2. Classify the task and its complexity.
  3. Try the cheapest capable route first.
  4. Escalate or fall back when the task needs more capability.

Under the hood, every prompt goes through a UserPromptSubmit hook before your top-tier model sees it:

0. Context inherit      instant, free    "yes/ok/go ahead" reuse prior turn's route
1. Heuristic scoring    instant, free    high-confidence patterns route immediately
2. Ollama local LLM     free, ~1s        catches what heuristics miss
3. Cheap API            ~$0.0001         Gemini Flash / GPT-4o-mini fallback
| Prompt | Classified as | Routed to |
|---|---|---|
| "What does os.path.join do?" | query/simple | Gemini Flash ($0.000001) |
| "Fix the bug in auth.py" | code/moderate | Haiku / Sonnet |
| "Design the full auth system" | code/complex | Sonnet / Opus |
| "Research latest AI funding" | research | Perplexity Sonar Pro |
| "Generate a hero image" | image | Flux Pro via fal.ai |

Free-first chain (subscription mode): Ollama → Codex (free via OpenAI sub) → paid API
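The tiered pipeline above can be pictured as a cascade in which each stage either returns a confident route or defers to the next, more expensive stage. The sketch below is illustrative only — the function names, patterns, and thresholds are assumptions, not llm-router's actual internals:

```python
# Illustrative sketch of the routing cascade described above.
# Stage names, keywords, and thresholds are assumptions, not the real API.
from typing import Optional

CONTINUATIONS = {"yes", "ok", "go ahead", "continue"}

def route(prompt: str, prior_route: Optional[str] = None) -> str:
    text = prompt.strip().lower()

    # Stage 0: short acknowledgements inherit the previous turn's route.
    if prior_route and text in CONTINUATIONS:
        return prior_route

    # Stage 1: instant heuristics for high-confidence patterns.
    if text.endswith("?") and len(text.split()) < 12:
        return "gemini-flash"          # simple query -> cheapest model
    if any(kw in text for kw in ("design", "architecture")):
        return "claude-sonnet"         # clearly complex -> premium chain

    # Stage 2 (local Ollama classifier, free, ~1s) and
    # Stage 3 (cheap API classifier, ~$0.0001) would go here.
    return "haiku"                     # default moderate route

print(route("What does os.path.join do?"))  # -> gemini-flash
print(route("ok", prior_route="haiku"))     # -> haiku
```

The point of the ordering is that each stage is strictly cheaper and faster than the one after it, so most prompts never reach a paid classifier.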


MCP Tools

41 tools across 6 categories:

Smart Routing

| Tool | What it does |
|---|---|
| llm_route | Auto-classify prompt → route to best model |
| llm_auto | Route + server-side savings tracking — designed for hook-less hosts (Codex CLI, Claude Desktop, Copilot) |
| llm_classify | Classify complexity + recommend model |
| llm_select_agent | Pick agent CLI (claude_code / codex) + model for a session |
| llm_stream | Stream LLM response for long-running tasks |

Text & Code

| Tool | What it does |
|---|---|
| llm_query | General questions — routed to cheapest capable model |
| llm_research | Web-grounded answers via Perplexity Sonar |
| llm_generate | Creative writing, summaries, brainstorming |
| llm_analyze | Deep reasoning — analysis, debugging, design review |
| llm_code | Code generation, refactoring, algorithms |
| llm_edit | Route edit reasoning to cheap model → returns {file, old, new} patch pairs |
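The {file, old, new} patch pairs returned by llm_edit can be applied with a few lines of plain Python. The applier below is a hypothetical illustration of consuming that format — it is not part of llm-router:

```python
# Hypothetical applier for {file, old, new} patch pairs like those
# returned by llm_edit / llm_fs_edit_many. Not part of llm-router itself.
from pathlib import Path

def apply_patches(patches: list[dict], dry_run: bool = True) -> list[str]:
    """Apply each patch by replacing the first occurrence of 'old' with 'new'."""
    applied = []
    for p in patches:
        path = Path(p["file"])
        text = path.read_text()
        if p["old"] not in text:
            raise ValueError(f"{path}: 'old' text not found; refusing to patch")
        if not dry_run:
            path.write_text(text.replace(p["old"], p["new"], 1))
        applied.append(str(path))
    return applied
```

Keeping dry_run=True by default mirrors the tool tables below, where destructive operations preview their changes first.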

Filesystem

| Tool | What it does |
|---|---|
| llm_fs_find | Describe files to find → cheap model returns glob/grep commands |
| llm_fs_rename | Describe a rename → returns mv/git mv commands (dry_run by default) |
| llm_fs_edit_many | Bulk edits across files → returns all patch pairs |

Media

| Tool | What it does |
|---|---|
| llm_image | Image generation — Flux, DALL-E, Gemini Imagen |
| llm_video | Video generation — Runway, Kling, Veo 2 |
| llm_audio | TTS/voice — ElevenLabs, OpenAI |

Orchestration

| Tool | What it does |
|---|---|
| llm_orchestrate | Multi-step pipeline across multiple models |
| llm_pipeline_templates | List available pipeline templates |

Monitoring & Admin

| Tool | What it does |
|---|---|
| llm_usage | Unified dashboard — Claude sub, Codex, APIs, savings |
| llm_savings | Cross-session savings breakdown by period, host, and task type |
| llm_check_usage | Live Claude subscription usage (session %, weekly %) |
| llm_health | Provider availability + circuit breaker status |
| llm_providers | List all configured providers and models |
| llm_set_profile | Switch profile: budget / balanced / premium |
| llm_setup | Interactive provider wizard — add keys, validate, install hooks |
| llm_quality_report | Routing accuracy, savings metrics, classifier stats |
| llm_rate | Rate last response 👍/👎 — logged for quality tracking |
| llm_codex | Route task to local Codex desktop agent (free) |
| llm_save_session | Persist session summary for cross-session context |
| llm_cache_stats | Cache hit rate, entries, evictions |
| llm_cache_clear | Clear classification cache |
| llm_refresh_claude_usage | Force-refresh subscription data via OAuth |
| llm_update_usage | Feed usage data from claude.ai into the router |
| llm_track_usage | Report Claude Code token usage for budget tracking |
| llm_dashboard | Open web dashboard at localhost:7337 |
| llm_team_report | Team-wide routing savings report |
| llm_team_push | Push local savings data to shared team store |
| llm_policy | Show active org/repo routing policy + last 10 policy decisions |
| llm_digest | Savings digest with spend-spike detection; push to Slack/Discord webhook |
| llm_benchmark | Per-task-type routing accuracy from llm_rate feedback |

Routing Profiles

Three profiles — switch anytime with llm_set_profile:

| Profile | Use case | Chain |
|---|---|---|
| budget | Dev, drafts, exploration | Ollama → Haiku → Gemini Flash |
| balanced | Production work (default) | Codex → Sonnet → GPT-4o |
| premium | Critical tasks, max quality | Codex → Opus → o3 |

Profile is overridden by complexity: simple prompts always use the budget chain, complex ones escalate to premium, regardless of the active profile setting.
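The override can be sketched as a small selection function. Chain contents come from the table above; the function itself is an assumption about the logic, not llm-router's real implementation:

```python
# Hypothetical sketch: complexity overrides the active profile.
# Chain contents follow the profiles table; the logic is an assumption.
CHAINS = {
    "budget":   ["ollama", "haiku", "gemini-flash"],
    "balanced": ["codex", "sonnet", "gpt-4o"],
    "premium":  ["codex", "opus", "o3"],
}

def chain_for(complexity: str, active_profile: str) -> list[str]:
    if complexity == "simple":
        return CHAINS["budget"]    # cheap work stays cheap, even on premium
    if complexity == "complex":
        return CHAINS["premium"]   # hard work escalates, even on budget
    return CHAINS[active_profile]  # moderate tasks follow the profile

print(chain_for("simple", "premium"))   # -> ['ollama', 'haiku', 'gemini-flash']
```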


Providers

| Provider | Models | Free tier | Best for |
|---|---|---|---|
| Ollama | Any local model | Yes (forever) | Privacy, zero cost, offline |
| Google Gemini | 2.5 Flash, 2.5 Pro | Yes (1M tokens/day) | Generation, long context |
| Groq | Llama 3.3, Mixtral | Yes | Ultra-fast inference |
| OpenAI | GPT-4o, o3, DALL-E | No | Code, reasoning, images |
| Perplexity | Sonar, Sonar Pro | No | Research, current events |
| Anthropic | Haiku, Sonnet, Opus | No | Writing, analysis, safety |
| DeepSeek | V3, Reasoner | Limited | Cost-effective reasoning |
| Mistral | Large, Small | Limited | Multilingual |
| fal.ai | Flux, Kling, Veo | No | Images, video, audio |
| ElevenLabs | Voice models | Limited | High-quality TTS |
| Runway | Gen-3 | No | Professional video |

Full setup guides: docs/PROVIDERS.md


Works With

Claude Code

Auto-installed by llm-router install. Hooks intercept every prompt — you never need to call tools manually unless you want explicit control.

pipx install claude-code-llm-router && llm-router install

Live status bar shows routing stats before every prompt and in the persistent bottom statusline:

📊  CC 13%s · 24%w  │  sub:0 · free:305 · paid:27  │  $1.59 saved (35%)

claw-code

Add to ~/.claw-code/mcp.json:

{
  "mcpServers": {
    "llm-router": { "command": "llm-router", "args": [] }
  }
}

Every API call in claw-code is paid — the free-first chain (Ollama → Codex → Gemini Flash) saves more here than in Claude Code.

Cursor / Windsurf / Zed

Add to your IDE's MCP config:

{
  "mcpServers": {
    "llm-router": { "command": "llm-router", "args": [] }
  }
}

Agno (multi-agent)

Two integration modes:

Option 1 — RouteredModel (v2.0+): use llm-router as a first-class Agno model. Every agent call is automatically routed to the cheapest capable provider.

pip install "claude-code-llm-router[agno]"

from agno.agent import Agent
from llm_router.integrations.agno import RouteredModel, RouteredTeam

# Single agent — routes each call intelligently
coder = Agent(
    model=RouteredModel(task_type="code", profile="balanced"),
    instructions="You are a coding assistant.",
)
coder.print_response("Write a Python quicksort.")

# Multi-agent team with shared $20/month budget cap
# Automatically downshifts to 'budget' profile at 80% spend
team = RouteredTeam(
    members=[coder, researcher],
    monthly_budget_usd=20.0,
    downshift_at=0.80,
)
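The downshift behaviour amounts to a simple check on cumulative spend. This is a sketch of the idea under stated assumptions — the real RouteredTeam logic may differ:

```python
# Sketch of the budget downshift: once team spend crosses the threshold,
# members drop to the 'budget' profile. Names here are illustrative.
def effective_profile(requested: str, spent_usd: float,
                      budget_usd: float, downshift_at: float) -> str:
    if budget_usd > 0 and spent_usd >= downshift_at * budget_usd:
        return "budget"        # protect the remaining budget
    return requested

print(effective_profile("balanced", 16.0, 20.0, 0.80))  # -> budget
print(effective_profile("balanced", 10.0, 20.0, 0.80))  # -> balanced
```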

Option 2 — MCP tools: use llm-router's 41 tools in any Agno agent:

from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.tools.mcp import MCPTools

agent = Agent(
    model=Claude(id="claude-sonnet-4-6"),
    tools=[MCPTools(command="llm-router")],
    instructions="Use llm_research for web searches, llm_code for coding tasks.",
)

Supported Hosts

| Host | Install command | Writes files | Hook support |
|---|---|---|---|
| Claude Code | llm-router install | ✅ | Full auto-route |
| Codex CLI | llm-router install --host codex | ✅ | PostToolUse |
| OpenCode | llm-router install --host opencode | ✅ | PostToolUse |
| Gemini CLI | llm-router install --host gemini-cli | ✅ | Extension hook |
| GitHub Copilot CLI | llm-router install --host copilot-cli | ✅ | — |
| OpenClaw | llm-router install --host openclaw | ✅ | — |
| Trae IDE | llm-router install --host trae | ✅ | — |
| Factory Droid | llm-router install --host factory | ✅ manifest | — (Claude Code compat) |
| VS Code (MCP native) | llm-router install --host vscode | ✅ | — |
| Cursor IDE | llm-router install --host cursor | ✅ | — |
| Claude Desktop | llm-router install --host desktop | snippet | — |
| GitHub Copilot (VS Code) | llm-router install --host copilot | snippet | — |

All installs are idempotent — run any command twice safely.

Codex CLI

llm-router install --host codex

Writes ~/.codex/config.yaml, ~/.codex/hooks.json (PostToolUse), and ~/.codex/instructions.md.

codex plugin install llm-router   # or via Codex marketplace

OpenCode

llm-router install --host opencode

Writes ~/.config/opencode/config.json (MCP block), PostToolUse hook, and routing rules.

Gemini CLI

llm-router install --host gemini-cli

Writes ~/.gemini/settings.json, creates the llm-router extension with gemini-extension.json + hooks.json, and appends routing rules.

GitHub Copilot CLI

llm-router install --host copilot-cli

Writes ~/.config/gh/copilot/mcp.json and routing rules.

OpenClaw

llm-router install --host openclaw

Writes ~/.openclaw/mcp.json and routing rules.

Trae IDE

llm-router install --host trae

Writes the platform-appropriate Trae config (~/Library/Application Support/Trae/mcp.json on macOS) and a .rules file in the current directory.

Factory Droid

Factory Droid natively supports Claude Code plugin format (.claude-plugin/) — no extra setup needed:

factory plugin install ypollak2/llm-router
# or via Factory marketplace search: llm-router

The dedicated .factory-plugin/ manifest is included for Factory marketplace discovery.

Claude Desktop

llm-router install --host desktop

Prints the snippet for claude_desktop_config.json. No hooks in Desktop — use llm_auto for savings tracking.

GitHub Copilot (VS Code)

llm-router install --host copilot

Prints the snippet for .vscode/mcp.json and a copilot-instructions.md template.

All at once

llm-router install --host all   # installs/prints all hosts

Docker / CI

RUN pip install claude-code-llm-router && llm-router install --headless
# Pass keys at runtime: docker run -e GEMINI_API_KEY=... your-image

Configuration

# API keys — at least one required
GEMINI_API_KEY=AIza...              # free tier at aistudio.google.com
OPENAI_API_KEY=sk-proj-...
PERPLEXITY_API_KEY=pplx-...
ANTHROPIC_API_KEY=sk-ant-...        # skip if using Claude Code subscription
DEEPSEEK_API_KEY=...
GROQ_API_KEY=gsk_...
FAL_KEY=...                         # images, video, audio via fal.ai
ELEVENLABS_API_KEY=...

# Router
LLM_ROUTER_PROFILE=balanced         # budget | balanced | premium
LLM_ROUTER_MONTHLY_BUDGET=0         # USD, 0 = unlimited
LLM_ROUTER_CLAUDE_SUBSCRIPTION=false  # true = Claude Code Pro/Max
LLM_ROUTER_ENFORCE=enforce          # shadow | suggest | enforce (default: enforce)

# Ollama (local models)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_BUDGET_MODELS=gemma4:latest,qwen3.5:latest

# Spend limits
LLM_ROUTER_DAILY_SPEND_LIMIT=5.00   # USD, 0 = disabled

Enforcement Modes

Choose how strict routing should be. The easiest way is llm-router onboard, which lets you pick a mode interactively.

| Mode | Behaviour | Best for |
|---|---|---|
| shadow | Observe routing decisions, never block | Safest first install |
| suggest | Show route hints, allow direct answers | Low-friction adoption |
| enforce | Block route violations until the route is followed | Maximum savings |

LLM_ROUTER_ENFORCE=hard is a strict compatibility alias for enforce. The legacy soft and off values are also still accepted for direct CLI or env-based control.
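The three modes differ only in how a detected route violation is handled. A minimal sketch, assuming a gate function and return values that are illustrative rather than llm-router's internals:

```python
# Illustrative sketch of the three enforcement modes.
# The gate function and its return values are assumptions.
def gate(route_violated: bool, mode: str) -> str:
    if mode == "shadow":
        return "allow"                 # observe and log only, never block
    if mode == "suggest":
        return "allow+hint" if route_violated else "allow"
    if mode in ("enforce", "hard"):    # 'hard' is the compatibility alias
        return "block" if route_violated else "allow"
    raise ValueError(f"unknown mode: {mode}")

print(gate(True, "shadow"))    # -> allow
print(gate(True, "enforce"))   # -> block
```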

Repo-level config (.llm-router.yml)

Commit a routing policy alongside your code — no env vars required:

profile: balanced
enforce: suggest          # shadow | suggest | enforce
block_providers:
  - openai                # never use OpenAI in this repo

routing:
  code:
    model: ollama/qwen3.5:latest   # always use local model for code tasks
  research:
    provider: perplexity           # always use Perplexity for research

daily_caps:
  _total: 2.00            # global $2/day cap
  code: 0.50              # code tasks capped at $0.50/day

User-level overrides live in ~/.llm-router/routing.yaml (same schema). Repo config wins.
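That precedence rule ("repo config wins") is an ordinary layered-config merge. The deep_merge helper below is an illustration of the idea, not llm-router's actual loader; the key names follow the example above:

```python
# Sketch of layered config resolution: repo-level .llm-router.yml
# overlays user-level ~/.llm-router/routing.yaml. The helper is an
# illustration, not llm-router's real implementation.
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base`; override wins per key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

user_cfg = {"profile": "premium", "routing": {"code": {"model": "opus"}}}
repo_cfg = {"profile": "balanced", "enforce": "suggest"}

effective = deep_merge(user_cfg, repo_cfg)
print(effective["profile"])   # -> balanced (repo config wins)
print(effective["routing"])   # user setting survives where repo is silent
```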

Full reference: .env.example


Budget Control

LLM_ROUTER_MONTHLY_BUDGET=50   # raises BudgetExceededError when exceeded
llm_usage("month")
→ Calls: 142 | Tokens: 320k | Cost: $3.42 | Budget: 6.8% of $50

The router tracks spend in SQLite across all providers and blocks calls when the monthly cap is reached.
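The SQLite-backed cap check can be sketched in a few lines. The schema, table name, and BudgetExceededError placement here are assumptions for illustration:

```python
# Minimal sketch of cross-provider spend tracking in SQLite, mirroring
# the budget check described above. Schema and names are assumptions.
import sqlite3

class BudgetExceededError(RuntimeError):
    pass

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE spend (provider TEXT, month TEXT, usd REAL)")

def record_call(provider: str, month: str, usd: float, cap: float) -> None:
    total = db.execute(
        "SELECT COALESCE(SUM(usd), 0) FROM spend WHERE month = ?", (month,)
    ).fetchone()[0]
    if total + usd > cap:
        raise BudgetExceededError(f"monthly cap ${cap} reached")
    db.execute("INSERT INTO spend VALUES (?, ?, ?)", (provider, month, usd))

record_call("gemini", "2026-04", 3.42, cap=50.0)   # fine: under budget
try:
    record_call("openai", "2026-04", 60.0, cap=50.0)
except BudgetExceededError as e:
    print(e)   # blocked: would exceed the $50 monthly cap
```

Summing before inserting means a blocked call leaves no row behind, so the tracker never records spend that was refused.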


Dashboard

llm-router dashboard   # opens localhost:7337

Live view of routing decisions, cost trends, model distribution, and subscription pressure. Auto-refreshes every 30s.


Session Summary

At session end the router prints a breakdown:

  Free models  305 calls  ·  $0.52 saved  (Ollama / Codex)
  External       27 calls  ·  $0.006       (Gemini Flash, GPT-4o)
  💡 Saved ~$0.53 this session

Share your savings:

llm-router share   # copies savings card to clipboard + opens tweet

Roadmap

Positioning: Claude Code's cost autopilot. Stop paying Opus prices for Haiku work.

Phase 1 — Trust & Proof (Apr–Jun 2026)

| Version | Headline | Status |
|---|---|---|
| v1.3–v2.0 | Foundation, dashboard, enforcement, Agno adapter | ✅ Done |
| v2.1 | Route Simulator — llm-router test "<prompt>" dry-run + llm_savings dashboard | ✅ Done |
| v2.2 | Explainable Routing — LLM_ROUTER_EXPLAIN=1, "why not Opus?", per-decision reasoning | ✅ Done |
| v2.3 | Zero-Friction Activation — onboarding wizard, shadow/suggest/enforce modes, yearly savings projection | ✅ Done |

Phase 2 — Smarter Routing (Jun–Aug 2026)

| Version | Headline | Status |
|---|---|---|
| v2.4 | Repo-Aware YAML Config — .llm-router.yml committed with the codebase, block_providers, model pins | ✅ Done |
| v2.5 | Context-Aware Routing — "yes/ok/go ahead" inherits prior turn's route, zero classifier latency | ✅ Done |
| v2.6 | Latency + Personalized Routing — p95 latency scoring, per-user acceptance signals | ✅ Done |

Phase 3 — Team Infrastructure (Sep–Nov 2026)

| Version | Headline | Status |
|---|---|---|
| v3.0 | Team Dashboard — shared savings across the whole team | ✅ Done |
| v3.1 | Multi-Host + Cross-Session Savings — llm_auto, Codex/Desktop/Copilot adapters, persistent savings across sessions, 30-day projection | ✅ Done |
| v3.2 | Policy Engine — org/project/user routing policy, spend caps, audit log | ✅ Done |
| v3.3 | Slack Digests + Codex Plugin — weekly savings digest, spend-spike alerts, Codex marketplace plugin | ✅ Done |

Phase 4 — Category Leadership (Jan–Apr 2027)

| Version | Headline | Status |
|---|---|---|
| v3.4 | Agent-Context Routing — subscription-first chain reordering when Codex or Claude Code is active | ✅ Done |
| v3.5 | Multi-Agent CLI Compatibility — OpenCode, Gemini CLI, Copilot CLI, OpenClaw, Factory Droid, Trae | ✅ Done |
| v4.0 | VS Code + Cursor GA — cross-editor routing, shared config and analytics | 📅 Apr 2027 |

Full details: ROADMAP.md


Development

uv sync --extra dev
uv run pytest tests/ -q --ignore=tests/test_integration.py
uv run ruff check src/ tests/

See CLAUDE.md for architecture and module layout.


Contributing

See CONTRIBUTING.md. Key areas: new provider integrations, routing intelligence, MCP client testing.


License

MIT


Built with LiteLLM and MCP
