
Sifter - Turn a folder of documents into typed records you can query

Created By
Bruno Fortunato - sifter-ai, 11 days ago
Sifter extracts structured, typed records from your documents (PDFs, scans, contracts, invoices) using a natural-language field spec, then lets an agent query and aggregate them: exact counts, sums, and filters, with citations back to the source page. Unlike RAG, it answers collection-wide questions, not just "find the passage."
Content

Sifter MCP Server

Turn a folder of documents into a database your agent can query.

RAG is great at finding a passage. It can't answer the questions people actually ask about a pile of documents — "how many invoices are unpaid", "total billed to this client this year", "which contracts expire in the next 90 days". Those are aggregations over the whole collection, and top-k retrieval only ever sees a handful of docs.

Sifter takes a different path: it extracts every document into a typed record (you describe the fields in plain language, the schema is inferred), then exposes them over MCP so your agent can query and aggregate them — exact counts, sums, filters, group-bys — with every field cited back to its source page. Not a paragraph. A figure.
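The gap between a retrieved sample and the full collection can be illustrated with a minimal sketch. The `InvoiceRecord` type and the sample data are hypothetical, and slicing the first three records is only a stand-in for top-k retrieval; none of this is Sifter's actual API.

```python
from dataclasses import dataclass


@dataclass
class InvoiceRecord:
    # Hypothetical typed record; field names are assumptions for illustration.
    client: str
    total: float
    paid: bool


# Made-up records, as if one had been extracted from each of six invoices.
records = [
    InvoiceRecord("Acme", 1200.0, False),
    InvoiceRecord("Acme", 800.0, True),
    InvoiceRecord("Globex", 450.0, False),
    InvoiceRecord("Globex", 300.0, False),
    InvoiceRecord("Initech", 990.0, True),
    InvoiceRecord("Initech", 150.0, False),
]


def count_unpaid(rows):
    """Count the records whose invoice has not been paid."""
    return sum(1 for r in rows if not r.paid)


# Stand-in for top-k retrieval: only a handful of documents are ever seen.
top_k = records[:3]

print(count_unpaid(top_k))    # 2, an undercount from the sample
print(count_unpaid(records))  # 4, the exact count over every record
```

The count over the sample is wrong no matter how relevant the three retrieved documents are, because the question is about the whole set.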

What the agent can do

  • Create a sift — define an extraction in natural language (e.g. "from invoices: client, date, total — skip anything that isn't an invoice").
  • Upload documents — PDFs, scans, contracts, receipts, images.
  • List & filter records — typed fields, real filters.
  • Aggregate — counts, sums, group-bys over all records, not a sample.
  • Get citations — trace any value back to its source document, page, and bounding box.
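One way to picture a cited value is as a field paired with its provenance. The sketch below is an assumption about shape only: the `Citation` and `CitedValue` names, the `(x0, y0, x1, y1)` bounding-box layout, and the sample file name are all hypothetical and not taken from Sifter's response format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Citation:
    # Hypothetical provenance fields: source document, page, bounding box.
    document: str
    page: int
    bbox: Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)


@dataclass(frozen=True)
class CitedValue:
    field: str
    value: object
    citation: Citation


def describe(cv: CitedValue) -> str:
    """Render a value together with where it was read from."""
    c = cv.citation
    return f"{cv.field}={cv.value!r} (source: {c.document}, page {c.page}, bbox {c.bbox})"


total = CitedValue(
    "total", 1200.0, Citation("invoice-001.pdf", 2, (72.0, 540.0, 180.0, 556.0))
)
print(describe(total))
```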

Connect

Remote (hosted, zero install — Starter+)

{
  "mcpServers": {
    "sifter": {
      "url": "https://api.sifter.run/mcp",
      "headers": { "Authorization": "Bearer sk-..." }
    }
  }
}

Get an API key at sifter.run → API Keys. The remote endpoint is a Starter+ feature; free-plan keys receive 402 on tool calls.
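If you prefer not to paste the key into a config file by hand, the remote entry can be generated. This is a minimal sketch: the `remote_config` helper is hypothetical, and reading the key from a `SIFTER_API_KEY` environment variable is an assumption borrowed from the local setup, not a documented requirement of the remote endpoint.

```python
import json
import os


def remote_config(api_key: str) -> dict:
    """Build the remote MCP client entry shown above for a given key."""
    return {
        "mcpServers": {
            "sifter": {
                "url": "https://api.sifter.run/mcp",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


# Falls back to the same placeholder used in this page when the variable is unset.
key = os.environ.get("SIFTER_API_KEY", "sk-...")
print(json.dumps(remote_config(key), indent=2))
```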

Local (self-host, free, MIT — bring your own model)

{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-..." }
    }
  }
}

Run the open-source engine with docker compose up -d and point the server at your instance. Local models work — the LLM is only the extractor, so nothing has to leave your machine.

Try it

  • "How much have we invoiced per client this year, highest first?"
  • "What's the total unpaid across all invoices?"
  • "Which contracts expire in the next 90 days?"

Each runs as a real query over every record and returns an exact answer, traceable to the source.
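To make the shape of these three questions concrete, here is a rough sketch of each as a computation over typed records. The `Invoice` and `Contract` types, the sample data, and the fixed "today" are all hypothetical; this runs in plain Python and does not show how Sifter executes queries.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Invoice:
    client: str
    issued: date
    total: float
    paid: bool


@dataclass
class Contract:
    name: str
    expires: date


def billed_per_client(invoices, year):
    """Sum invoice totals per client for one year, highest first."""
    totals = defaultdict(float)
    for inv in invoices:
        if inv.issued.year == year:
            totals[inv.client] += inv.total
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def total_unpaid(invoices):
    """Sum the totals of every unpaid invoice."""
    return sum(inv.total for inv in invoices if not inv.paid)


def expiring_within(contracts, today, days=90):
    """Names of contracts expiring between today and today + days, inclusive."""
    cutoff = today + timedelta(days=days)
    return [c.name for c in contracts if today <= c.expires <= cutoff]


# Made-up sample data for illustration.
invoices = [
    Invoice("Acme", date(2024, 1, 15), 1200.0, False),
    Invoice("Acme", date(2024, 3, 2), 800.0, True),
    Invoice("Globex", date(2024, 2, 10), 450.0, False),
    Invoice("Globex", date(2023, 12, 20), 300.0, False),
]
contracts = [
    Contract("hosting", date(2024, 5, 1)),
    Contract("support", date(2024, 12, 31)),
    Contract("lease", date(2024, 1, 1)),
]
today = date(2024, 3, 15)  # fixed so the example is reproducible

print(billed_per_client(invoices, 2024))  # [('Acme', 2000.0), ('Globex', 450.0)]
print(total_unpaid(invoices))             # 1950.0
print(expiring_within(contracts, today))  # ['hosting']
```

Each function reads every record, which is what makes the answers exact rather than estimates from a sample.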

Tags: document-extraction · structured-data · rag · pdf · ocr · data · agents · invoices · self-hosted

Server Config

{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": [
        "sifter-mcp",
        "--base-url",
        "https://api.sifter.run/api"
      ],
      "env": {
        "SIFTER_API_KEY": "sk-..."
      }
    }
  }
}