Sponsored by Deepsite.site

Budget Governor

Created By
mightbesaad18 days ago
Budget & cost control for AI agents: hard per-agent spend caps, rate limits, idempotency, and human-in-the-loop approval — enforced before each LLM call, not after the invoice. One hosted MCP endpoint (no proxy or self-hosting), settled via x402 (USDC on Base).
Overview

Gvnr

Tests Release License: MIT MCP Registry gvnr MCP server smithery badge

Spend caps, rate coordination, idempotency, and a human-in-the-loop gate for AI agents — enforced before the call, not after the invoice. One MCP endpoint, settled via x402 (USDC on Base). No proxy, no self-hosting, no infrastructure to deploy.

Listed on the Official MCP Registry as dev.gvnr/gvnr.


The problem

Agents cost 10–12× more than estimated in production. System prompts, retry loops, and tool calls multiply fast — a runaway agent can generate a $47,000 bill in 11 days. The usual fix, self-hosting a gateway like LiteLLM, means running infrastructure most developers won't set up.

Gvnr is the hosted alternative: an external authority your agent checks before it spends. Your agent asks "am I clear to make this call?" and acts on the answer.


How it works

Gvnr is not an LLM proxy — your tokens never pass through it, and you pay your model provider directly. Gvnr governs the decision to spend, in two independent meters:

1. The governance-operation quota — what you buy from Gvnr. Your account holds a balance of governance operations (operations_remaining). One budget_clear burns one op. You top this up with USDC; it's decoupled from your LLM spend. get_balance reports it.

2. The spend envelope — a USD cap you set, per agent. set_envelope(agent_id, limit_usd, window) gives an agent a daily or per-session USD ceiling. Gvnr tracks estimated spend against it and denies once it's exceeded — this is the runaway guardrail. No dollars move; it's an accounting limit you control.

A budget_clear is approved only when both hold: the account has ops left in the quota, and the agent is under its envelope. The loop:

  1. Your agent calls budget_clear(agent_id, model, estimated_tokens) before each LLM request.
  2. Gvnr checks the op quota and the agent's envelope, and returns { approved: true, ... } or { approved: false, reason }.
  3. If denied, your agent skips (or escalates via request_approval).
  4. After the LLM responds, call reconcile(...) with the real token counts so the envelope tracks actual cost, not the estimate.

Quick start

1. Provision an account

curl -X POST https://gvnr.dev/v1/account
# { "api_key": "bg_...", "account_id": "...", "operations_remaining": 25 }

New accounts include 25 free trial ops — enough to run the full loop (set an envelope, budget_clear, reconcile) before you fund anything. Your api_key is the credential for every call; account_id is just an internal reference for support. Account creation is credential-only by design (agent-native) — email is optional, set separately via POST /v1/account/notification-email, and used only for human approval notices.

2. Top up your governance-op quota

Pay-as-you-go: 1,000 ops per $1, any amount (try the whole rail for $1), USDC on Base. Open the pay page, name your amount, and pass your API key:

https://gvnr.dev/pay?usd=1&api_key=bg_YOUR_KEY

Send USDC to the address shown and paste your tx hash — ops are credited proportional to the amount received, after on-chain verification. Programmatic clients can submit the hash directly:

curl -X POST \
  -H "Authorization: Bearer bg_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"tx_hash":"0x..."}' \
  https://gvnr.dev/v1/account/topup-verify
# { "operations_remaining": 1000, "credited_ops": 1000, "credited_usd": 1 }

Or settle in one round-trip with an x402 client (Base MCP, AgentKit, x402-fetch) — name your own amount:

POST https://gvnr.dev/v1/account/topup?usd=5      # → 402 challenge → pay → 5,000 ops credited

3. Set a spend envelope for your agent

curl -X PUT \
  -H "Authorization: Bearer bg_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","limit_usd":5,"window":"daily"}' \
  https://gvnr.dev/v1/budget/envelope
# { "success": true, "agent_id": "my-agent", "limit_usd": 5, "window": "daily" }

4. (optional) Set a rate envelope per (agent, provider, model)

curl -X PUT \
  -H "Authorization: Bearer bg_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","provider":"anthropic","model":"claude-sonnet-4-6","requests_per_minute":30}' \
  https://gvnr.dev/v1/rate/envelope

5. Before each LLM request: budget_clear, then rate_check

curl -X POST \
  -H "Authorization: Bearer bg_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","model":"claude-sonnet-4-6","estimated_tokens":2000}' \
  https://gvnr.dev/v1/budget/clear
# { "approved": true, "remaining_usd": 4.994, "operations_remaining": 18999 }

curl -X POST \
  -H "Authorization: Bearer bg_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","provider":"anthropic","model":"claude-sonnet-4-6"}' \
  https://gvnr.dev/v1/rate/check
# { "allowed": true, "requests_remaining_this_minute": 29 }

6. (optional) Dedupe retries with idempotency_check

curl -X POST \
  -H "Authorization: Bearer bg_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"key":"job-abc-123","ttl_seconds":3600}' \
  https://gvnr.dev/v1/idempotency/check
# First call:  { "is_first_call": true,  "ttl_remaining_seconds": 3600 }
# Replay:      { "is_first_call": false, "ttl_remaining_seconds": 3598 }

7. After the LLM responds, reconcile against actual usage

curl -X POST \
  -H "Authorization: Bearer bg_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","actual_input_tokens":1800,"actual_output_tokens":2400}' \
  https://gvnr.dev/v1/budget/reconcile
# { "ok": true, "drift_usd": 0.003, "remaining_usd": 4.991, "operations_remaining": 18999 }

reconcile adjusts the spend envelope by the drift between your estimate and actual cost (the op quota is untouched). You don't pass the model again — reconcile reuses the one from your prior budget_clear. Anthropic, OpenAI, and Gemini all return usage fields with real token counts — pass those in to keep the envelope honest.


Human-in-the-loop approvals

When an agent hits a denial or a sensitive action, pause for a human instead of failing:

# 1. Open a request — returns an approval_id and a mobile-friendly approval_url
curl -X POST -H "Authorization: Bearer bg_YOUR_KEY" -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","action_summary":"Spend $40 on a research run","ttl_seconds":3600}' \
  https://gvnr.dev/v1/approval/request
# { "approval_id": "...", "approval_url": "https://gvnr.dev/approve/...", "expires_at": ... }

# 2. The human opens approval_url and taps approve / deny (emailed if notification-email is set).

# 3. Your agent polls until the decision lands:
curl -H "Authorization: Bearer bg_YOUR_KEY" \
  https://gvnr.dev/v1/approval/check/APPROVAL_ID
# { "decision": "pending" }  →  "approved" | "denied" | "timeout"

The agent proceeds on approved, skips on denied, and handles timeout (no decision before expires_at) however it likes.


MCP setup

Add to Claude Desktop or any MCP-compatible client:

https://gvnr.dev/mcp?api_key=bg_YOUR_KEY

Claude Code

claude mcp add gvnr --transport http \
  "https://gvnr.dev/mcp?api_key=bg_YOUR_KEY"

MCP tools

ToolDescription
budget_clear(agent_id, model, estimated_tokens)Check clearance against the op quota + spend envelope; burns one op
set_envelope(agent_id, limit_usd, window?)Create or update an agent's USD spend cap
get_balance()Remaining governance-operation quota (operations_remaining)
reconcile(agent_id, actual_input_tokens, actual_output_tokens)Apply estimate-vs-actual drift to the spend envelope after the LLM responds
set_rate_envelope(agent_id, provider, model, requests_per_minute)Allocate a per-(agent, provider, model) rate share
rate_check(agent_id, provider, model)Approve or deny against the rate envelope; returns retry_after_ms on denial
idempotency_check(key, ttl_seconds?)Dedupe retries on a caller-supplied key; returns is_first_call
request_approval(agent_id, action_summary, ttl_seconds?)Open a human-in-the-loop approval request; returns an approval_id
check_approval(approval_id)Poll an approval: pending / approved / denied / timeout

REST API

All endpoints except POST /v1/account require Authorization: Bearer bg_YOUR_KEY.

TypeScript users: generate full types from the live OpenAPI spec — npx openapi-typescript@latest https://gvnr.dev/openapi.json -o types/gvnr.d.ts. See TYPESCRIPT.md for the integration pattern (typed fetch, x402 topups, discriminated response unions).

Account & top-up

MethodPathDescription
POST/v1/accountProvision account — returns api_key
GET/v1/account/balanceRemaining governance-op quota (operations_remaining)
GET/v1/packs/:pack/infoPublic — preset details, USDC address, raw amount
POST/v1/account/topup-verify/:packSubmit tx hash → verify on-chain → credit (proportional to amount received)
POST/v1/account/topup?usd=<amount>x402-gated pay-as-you-go top-up — name your amount (min $1, max $100)
POST/v1/account/topup/:packx402-gated preset top-up (machine clients)

Budget

MethodPathDescription
POST/v1/budget/clearClearance call — approve or deny
POST/v1/budget/reconcileApply estimate-vs-actual drift to the envelope
PUT/v1/budget/envelopeCreate or update agent spend cap
GET/v1/budget/envelope/:agent_idRead envelope state
DELETE/v1/budget/envelope/:agent_idDelete an agent's envelope

Rate · Idempotency · Approval

MethodPathDescription
PUT/v1/rate/envelopeSet a per-(agent, provider, model) RPM share
POST/v1/rate/checkRuntime rate check — allowed flag, retry_after_ms on denial
POST/v1/idempotency/checkDedupe on a caller-supplied key — is_first_call
POST/v1/approval/requestOpen a human approval request — returns approval_id
GET/v1/approval/check/:approval_idPoll decision: pending / approved / denied / timeout

Clearance response

{ "approved": true, "remaining_usd": 4.994, "operations_remaining": 18999 }
{ "approved": false, "remaining_usd": 0, "reason": "envelope_exceeded" }

Denial reasons: no_credits (op quota exhausted) · no_envelope (agent has no envelope) · envelope_exceeded

Status codes

A denial is not an HTTP error — it's a 200 with a flag, so check the body, not the status:

SituationHTTPBody
budget_clear / rate_check allow or deny200{ approved/allowed: true | false, ... }
Top-up requires payment402x402 challenge (in the payment-required header; see Billing)
Missing / invalid API key401{ error }
Per-IP account-creation throttle429{ error: "rate_limited", retry_after_ms }
Bad request body400{ error, hint }

Billing — governance ops, not your tokens

Gvnr charges for governance operations, not LLM usage. Your model tokens are billed by your provider; Gvnr never sees them.

Top-ups are pay-as-you-go at 1,000 ops/$1 in USDC on Base mainnet — name any amount on /pay and ops are credited proportionally after on-chain verification. No minimum, no subscription; the amounts below are just one-tap presets:

AmountGovernance opsLink
$1 (trial)1,000/pay?usd=1
$1919,000/pay?usd=19
$3939,000/pay?usd=39
$7979,000/pay?usd=79

Works with Base MCP, AgentKit, and any x402 client. The 402 challenge follows x402 v2 — payment requirements (network, USDC asset, amount, payTo) are returned in the payment-required response header (the body is empty), so an x402 client settles it automatically; you only hand-parse it if you're rolling your own.


Envelope windows

  • daily — resets at UTC midnight each day
  • session — never resets (use for one-shot tasks; caller-managed)

Supported models

Model pricing is a static lookup on the hot path — no external calls. The estimate deducted from the envelope is rate(model) × estimated_tokens ÷ 1,000,000 (output rate for chat models, input rate for embeddings), reconciled to actual afterward. These are the per-million-token rates (USD):

ModelInputOutput
claude-opus-4-8 / 4-7 / 4-6$5$25
claude-sonnet-4-6$3$15
claude-haiku-4-5$1$5
gpt-4o$2.50$10
gpt-4o-mini$0.15$0.60
gpt-4-turbo$10$30
text-embedding-3-small / -large$0.02 / $0.13input-only
gemini-embedding-001 / -2$0.15 / $0.20input-only

Embedding / input-only models are billed on input tokens — pass input tokens as estimated_tokens to budget_clear. Unknown models fall back to a conservative $15 / $75 default.


Network

Runs on Base mainnet (X402_NETWORK=eip155:8453), settling real USDC. There's no minimum to try it: top up as little as $1 (1,000 governance ops) to exercise the full live rail end-to-end — no testnet needed.


License

MIT — see LICENSE.

The canonical hosted service is https://gvnr.dev. Self-hosted instances are unaffiliated.


Acknowledgments

gvnr was designed and built by mightbesaad in close partnership with Claude Code (Anthropic) — chiefly Claude Opus 4.8. From the substrate architecture and the billing model to the code in this repository, it was a genuine collaboration. Thank you.

Server Config

{
  "mcpServers": {
    "gvnr": {
      "url": "https://gvnr.dev/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_GVNR_API_KEY"
      }
    }
  }
}
Recommend Servers
TraeBuild with Free GPT-4.1 & Claude 3.7. Fully MCP-Ready.
Baidu Map百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。
Howtocook Mcp基于Anduin2017 / HowToCook (程序员在家做饭指南)的mcp server,帮你推荐菜谱、规划膳食,解决“今天吃什么“的世纪难题; Based on Anduin2017/HowToCook (Programmer's Guide to Cooking at Home), MCP Server helps you recommend recipes, plan meals, and solve the century old problem of "what to eat today"
Tavily Mcp
MCP AdvisorMCP Advisor & Installation - Use the right MCP server for your needs
WindsurfThe new purpose-built IDE to harness magic
RedisA Model Context Protocol server that provides access to Redis databases. This server enables LLMs to interact with Redis key-value stores through a set of standardized tools.
Zhipu Web SearchZhipu Web Search MCP Server is a search engine specifically designed for large models. It integrates four search engines, allowing users to flexibly compare and switch between them. Building upon the web crawling and ranking capabilities of traditional search engines, it enhances intent recognition capabilities, returning results more suitable for large model processing (such as webpage titles, URLs, summaries, site names, site icons, etc.). This helps AI applications achieve "dynamic knowledge acquisition" and "precise scenario adaptation" capabilities.
Serper MCP ServerA Serper MCP Server
Jina AI MCP ToolsA Model Context Protocol (MCP) server that integrates with Jina AI Search Foundation APIs.
Visual Studio Code - Open Source ("Code - OSS")Visual Studio Code
BlenderBlenderMCP connects Blender to Claude AI through the Model Context Protocol (MCP), allowing Claude to directly interact with and control Blender. This integration enables prompt assisted 3D modeling, scene creation, and manipulation.
AiimagemultistyleA Model Context Protocol (MCP) server for image generation and manipulation using fal.ai's Stable Diffusion model.
ChatWiseThe second fastest AI chatbot™
Playwright McpPlaywright MCP server
CursorThe AI Code Editor
Amap Maps高德地图官方 MCP Server
MiniMax MCPOfficial MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
Y GuiA web-based graphical interface for AI chat interactions with support for multiple AI models and MCP (Model Context Protocol) servers.
EdgeOne Pages MCPAn MCP service designed for deploying HTML content to EdgeOne Pages and obtaining an accessible public URL.
DeepChatYour AI Partner on Desktop