
AI-First Scraper

Created by yubinkim444, a month ago
Ad-free web scraping and search exposed as three MCP tools: fetch_page, fetch_pages_batch, and search_web. Works with Claude Desktop, Cursor, and Cline.
Overview

ai-first-scraper-mcp

Plug Claude Desktop, Cursor, or Cline straight into an ad-free web scraper + search engine. Three tools, one line of config.



What it does

Adds three tools to any MCP-compatible agent:

| Tool | What it does |
| --- | --- |
| fetch_page | Fetch one URL → return clean Markdown (HTML or PDF). |
| fetch_pages_batch | Fetch up to 25 URLs in parallel → return Markdown for each. |
| search_web | Run a web search and return the top-k result pages already converted to Markdown. |

No more "the model called curl and then tried to parse 80kB of ad HTML." Your agent receives clean Markdown ready to reason about.
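As a rough illustration of what an agent sends when it uses these tools, the sketch below builds MCP `tools/call` requests (JSON-RPC 2.0) for fetch_pages_batch and splits long URL lists to respect the 25-URL limit from the table. The argument key `urls` and the helper names are assumptions for illustration; the server's real input schema is whatever it reports via `tools/list`.

```python
from itertools import count

BATCH_LIMIT = 25  # from the tool table: fetch_pages_batch takes up to 25 URLs

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def batch_requests(urls, limit=BATCH_LIMIT):
    """Split a URL list into fetch_pages_batch calls of at most `limit` URLs."""
    return [
        # "urls" is an assumed argument name, not confirmed by the README.
        tool_call("fetch_pages_batch", {"urls": urls[i:i + limit]})
        for i in range(0, len(urls), limit)
    ]


if __name__ == "__main__":
    pages = [f"https://example.com/page/{n}" for n in range(60)]
    for request in batch_requests(pages):
        print(request["id"], len(request["params"]["arguments"]["urls"]))
```

In practice the MCP client builds these messages for you; the sketch is only meant to show why a list longer than 25 URLs needs more than one call.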

Backed by the ai-first-scraper and ai-first-search APIs.


Install

Fastest — uvx (no install, runs from PyPI on demand)

Add this to claude_desktop_config.json, cline_mcp_settings.json, or ~/.cursor/mcp.json (JSON does not allow comments, so keep the file name out of the file itself):
{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"]
    }
  }
}

Restart your client (Claude Desktop / Cursor / Cline). The three tools above will appear automatically.
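If the config file already lists other MCP servers, pasting the snippet over it would drop them. A minimal sketch of a merge helper follows, assuming the file is plain JSON with a top-level `mcpServers` object as in the snippet above; the function name `add_server` is hypothetical and not part of this package.

```python
import json
from pathlib import Path

SERVER_NAME = "ai-first-scraper"
SERVER_ENTRY = {"command": "uvx", "args": ["ai-first-scraper-mcp"]}


def add_server(config_path):
    """Add the server entry to an MCP config file, keeping existing entries."""
    path = Path(config_path)
    config = {}
    # Start from the existing file when it exists and is not empty.
    if path.exists() and path.read_text(encoding="utf-8").strip():
        config = json.loads(path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[SERVER_NAME] = dict(SERVER_ENTRY)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Example target: the Cursor config path listed in this README.
    print(add_server(Path.home() / ".cursor" / "mcp.json"))
```

Back up the file first if it was edited by hand, since the helper rewrites it with its own indentation.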

Alternative — pip install

pip install ai-first-scraper-mcp

Then point your MCP config at the installed command:

{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "ai-first-scraper-mcp"
    }
  }
}

Where the config file lives

| Client | Config path |
| --- | --- |
| Claude Desktop (macOS) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Desktop (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
| Cursor | ~/.cursor/mcp.json |
| Cline (VS Code, macOS) | ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json |
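For scripting, the paths above can be resolved in code. The sketch below covers only the combinations listed here and raises for anything else; the client keys (`claude-desktop`, `cursor`, `cline`) and the function name are made up for illustration.

```python
import os
import sys


def config_path(client, platform=sys.platform, home=None, appdata=None):
    """Return the MCP config path for a client, per the table in this README."""
    home = home or os.path.expanduser("~")
    if client == "claude-desktop":
        if platform == "darwin":
            return f"{home}/Library/Application Support/Claude/claude_desktop_config.json"
        if platform == "win32":
            appdata = appdata or os.environ["APPDATA"]
            return appdata + "\\Claude\\claude_desktop_config.json"
    elif client == "cursor":
        return f"{home}/.cursor/mcp.json"
    elif client == "cline" and platform == "darwin":
        return (
            f"{home}/Library/Application Support/Code/User/globalStorage/"
            "saoudrizwan.claude-dev/settings/cline_mcp_settings.json"
        )
    # Anything not listed in the README is left to the user to look up.
    raise ValueError(f"no known config path for {client!r} on {platform!r}")


if __name__ == "__main__":
    print(config_path("cursor"))
```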

Point at your own backend (optional)

By default this server calls the public ai-first-scraper.onrender.com and ai-first-search.onrender.com instances. To self-host, set these environment variables in your MCP config:

{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"],
      "env": {
        "SCRAPER_URL": "https://your-scraper.example.com",
        "SEARCH_URL":  "https://your-search.example.com",
        "AFS_TIMEOUT": "60"
      }
    }
  }
}
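Environment variables are always strings, which is why the timeout above is written as "60" rather than 60. A small sketch of a helper that builds this entry, checks that both URLs are absolute http(s) URLs, and coerces the timeout to a string follows. The helper name is hypothetical, and the README does not say how the server itself validates these values.

```python
import json
from urllib.parse import urlparse


def self_hosted_entry(scraper_url, search_url, timeout=None):
    """Build the mcpServers entry for a self-hosted backend."""
    for url in (scraper_url, search_url):
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    env = {"SCRAPER_URL": scraper_url, "SEARCH_URL": search_url}
    if timeout is not None:
        env["AFS_TIMEOUT"] = str(timeout)  # env values are written as strings
    return {"command": "uvx", "args": ["ai-first-scraper-mcp"], "env": env}


if __name__ == "__main__":
    entry = self_hosted_entry(
        "https://your-scraper.example.com",
        "https://your-search.example.com",
        timeout=60,
    )
    print(json.dumps({"mcpServers": {"ai-first-scraper": entry}}, indent=2))
```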

Verify it works

Open your MCP client and ask the agent:

"Use the search_web tool to find the top 3 recent articles about MCP and summarize them in 5 bullets each."

You should see the agent call search_web, get back Markdown for each result, and produce the summary without ever touching raw HTML.


Companion projects

  • ai-first-scraper — the per-URL Markdown cleaner this MCP server fans out to.
  • ai-first-search — search → scrape → markdown pipeline.
  • mcp-rec — record & replay any MCP server's traffic for tests and bug reports.
  • llm-cache-proxy — local cache for OpenAI/Anthropic API calls.
  • promptlocker — lockfile for prompts.
  • context-diff — see what blew up your Claude Code context window.
  • agentwatch — overlay for browser AI agents.

Develop locally

git clone https://github.com/yubinkim444/ai-first-scraper-mcp.git
cd ai-first-scraper-mcp

uv sync                    # or: pip install -e .
ai-first-scraper-mcp       # speaks MCP over stdio

To test against a local client, point its MCP config at the same command.
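For a quick check without a full client, the sketch below starts a server over stdio, performs the MCP handshake (`initialize`, then the `notifications/initialized` notification), and asks for `tools/list`. It assumes the server uses the standard newline-delimited JSON-RPC stdio transport; the protocol version string and the client name are illustrative choices, and `list_tools` is not part of this package.

```python
import json
import subprocess


def list_tools(command):
    """Start an MCP server over stdio and return the tool names it reports."""
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as proc:

        def send(message):
            # The stdio transport carries one JSON-RPC message per line.
            proc.stdin.write(json.dumps(message) + "\n")
            proc.stdin.flush()

        def receive(request_id):
            for line in proc.stdout:
                try:
                    message = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip anything that is not a JSON message
                if isinstance(message, dict) and message.get("id") == request_id:
                    return message
            raise RuntimeError("server closed stdout before replying")

        try:
            send({
                "jsonrpc": "2.0", "id": 1, "method": "initialize",
                "params": {
                    "protocolVersion": "2024-11-05",
                    "capabilities": {},
                    "clientInfo": {"name": "smoke-test", "version": "0.1"},
                },
            })
            receive(1)
            send({"jsonrpc": "2.0", "method": "notifications/initialized"})
            send({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
            reply = receive(2)
            return [tool["name"] for tool in reply["result"]["tools"]]
        finally:
            proc.kill()


if __name__ == "__main__":
    print(list_tools(["ai-first-scraper-mcp"]))
```

Run from the project environment, the output should include fetch_page, fetch_pages_batch, and search_web if the server starts correctly.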


License

MIT © yubinkim444
