
Scrapling MCP Server Guide

The Scrapling MCP Server is a new feature that brings Scrapling's powerful Web Scraping capabilities directly to your favorite AI chatbot or AI agent. This integration allows you to scrape websites, extract data, and bypass anti-bot protections conversationally through Claude's AI interface or any other chatbot that supports MCP.

Features

The Scrapling MCP Server provides six powerful tools for web scraping:

🚀 Basic HTTP Scraping

  • get: Fast HTTP requests with browser fingerprint impersonation, generating real browser headers matching the TLS version, HTTP/3, and more!
  • bulk_get: An async version of the above tool that allows scraping of multiple URLs at the same time!

🌐 Dynamic Content Scraping

  • fetch: Rapidly fetch dynamic content with a Chromium/Chrome browser, with complete control over the request/browser, stealth mode, and more!
  • bulk_fetch: An async version of the above tool that allows scraping of multiple URLs in different browser tabs at the same time!

🔒 Stealth Scraping

  • stealthy_fetch: Uses our modified version of the Camoufox browser to bypass Cloudflare Turnstile and other anti-bot systems, with complete control over the request/browser!
  • bulk_stealthy_fetch: An async version of the above tool that allows stealth scraping of multiple URLs in different browser tabs at the same time!
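
Under the hood, these six tools are thin wrappers over Scrapling's Python fetchers. Here is a minimal sketch of the underlying API, based on Scrapling's documented fetcher classes (exact keyword arguments vary between versions, so verify against the docs for the version you install):

# `get`/`bulk_get` map to plain HTTP requests with fingerprint impersonation;
# `stealthy_fetch`/`bulk_stealthy_fetch` map to the modified Camoufox browser.
from scrapling.fetchers import Fetcher, StealthyFetcher

page = Fetcher.get("https://example.com")
print(page.status)

page = StealthyFetcher.fetch("https://example.com", headless=True, network_idle=True)
print(page.status)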

Key Capabilities

  • Smart Content Extraction: Convert web pages/elements to Markdown or HTML, or extract a clean version of the text content
  • CSS Selector Support: Use the Scrapling engine to target specific elements with precision before handing the content to the AI
  • Anti-Bot Bypass: Handle Cloudflare Turnstile and other protections
  • Proxy Support: Use proxies for anonymity and geo-targeting
  • Browser Impersonation: Mimic real browsers with TLS fingerprinting, real browser headers that match the impersonated version, and more
  • Parallel Processing: Scrape multiple URLs concurrently for efficiency (see the sketch after this list)
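
The bulk tools run their requests concurrently. A hedged sketch of the same idea using Scrapling's async fetcher (AsyncFetcher is named in Scrapling's docs; treat the exact signatures as approximate):

import asyncio
from scrapling.fetchers import AsyncFetcher

# Fire all requests at once, the way the bulk_* tools do
async def bulk_get(urls):
    return await asyncio.gather(*(AsyncFetcher.get(u) for u in urls))

pages = asyncio.run(bulk_get(["https://example.com", "https://example.org"]))
print([p.status for p in pages])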

But why use Scrapling MCP Server instead of other available tools?

Aside from its stealth capabilities and ability to bypass Cloudflare Turnstile, Scrapling's server is the only one that allows you to pass a CSS selector in the prompt to extract specific elements before handing the content to the AI.

The way other servers work is that they extract the content, then pass all of it to the AI to extract the fields you want. This makes the AI consume many unnecessary tokens on irrelevant content. Scrapling solves this problem by letting you pass a CSS selector to narrow down the content before it reaches the AI, which makes the whole process much faster and more efficient.

If you don't know how to write/use CSS selectors, don't worry. You can tell the AI in the prompt to write selectors to match possible fields for you and watch it try different combinations until it finds the right one, as we will show in the examples section.
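
To make the token savings concrete, here is an illustrative sketch of the selector-narrowing idea in plain Scrapling code (the URL and selector are hypothetical placeholders):

from scrapling.fetchers import Fetcher

page = Fetcher.get("https://shop.example.com")
titles = page.css(".product-title::text")   # keep only the matching elements' text
payload = "\n".join(titles)                 # a few short lines instead of the whole page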

Installation

Install Scrapling with MCP support, then double-check that the browser dependencies are installed.

# Install Scrapling with MCP server dependencies
pip install "scrapling[ai]"

# Install browser dependencies
scrapling install

Setting up the MCP Server

Here we will explain how to add Scrapling MCP Server to Claude Desktop and Claude Code, but the same logic applies to any other chatbot that supports MCP:

Claude Desktop

  1. Open Claude Desktop
  2. Click the hamburger menu (☰) at the top left → Settings → Developer → Edit Config
  3. Add the Scrapling MCP server configuration:
"ScraplingServer": {
  "command": "scrapling",
  "args": [
    "mcp"
  ]
}

If that's the first MCP server you're adding, set the content of the file to this:

{
  "mcpServers": {
    "ScraplingServer": {
      "command": "scrapling",
      "args": [
        "mcp"
      ]
    }
  }
}

As per the official article, this action creates a new configuration file if one doesn't exist, or opens your existing configuration. The file is located at:

  1. macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  2. Windows: %APPDATA%\Claude\claude_desktop_config.json

To ensure it's working, it's best to use the full path to the scrapling executable. Open the terminal and execute the following command:

  1. macOS: which scrapling
  2. Windows: where scrapling

For me, on my Mac, it returned /Users/<MyUsername>/.venv/bin/scrapling, so the config I used in the end is:

{
  "mcpServers": {
    "ScraplingServer": {
      "command": "/Users/<MyUsername>/.venv/bin/scrapling",
      "args": [
        "mcp"
      ]
    }
  }
}

The same logic applies to Cursor, Windsurf, and other MCP-capable clients.
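
If you'd rather resolve the executable path programmatically (for example, when scripting the setup), the Python standard library offers a cross-platform equivalent of which/where:

import shutil

# Prints the absolute path to use in the "command" field,
# or None if scrapling isn't on the current PATH.
print(shutil.which("scrapling"))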

Claude Code

Here it's much simpler: if you have Claude Code installed, open the terminal and execute the following command:

claude mcp add ScraplingServer "/Users/<MyUsername>/.venv/bin/scrapling" mcp

Same as above, to get Scrapling's executable path, open the terminal and execute the following command:

  1. macOS: which scrapling
  2. Windows: where scrapling

See Anthropic's main article on adding MCP servers to Claude Code for further details.

Then, after you've added the server, you need to completely quit and restart the app you used above. In Claude Desktop, you should see an MCP server indicator (🔧) in the bottom-right corner of the chat input or see ScraplingServer in the Search and tools dropdown in the chat input box.

Examples

Now we will show you some examples of prompts we used while testing the MCP server, but you are probably more creative and better at prompt engineering than we are :)

We will gradually go from simple prompts to more complex ones. We will use Claude Desktop for the examples, but the same logic applies to the rest, of course.

  1. Basic Web Scraping

    Extract the main content from a webpage as Markdown:

    Scrape the main content from https://example.com and convert it to markdown format.
    

    Claude will use the get tool to fetch the page and return clean, readable content. If the request fails, it will keep retrying every second for three attempts, unless you instruct it otherwise. If it fails to retrieve content for any reason, such as protection or a dynamic website, it will automatically try the other tools. If Claude doesn't do that automatically for some reason, you can add that instruction to the prompt.

    A more optimized version of the same prompt would be:

    Use regular requests to scrape the main content from https://example.com and convert it to markdown format.
    

    This tells Claude which tool to use, so it doesn't have to guess. Sometimes it will use normal requests on its own, and at other times it will assume browsers are better suited for the website for no apparent reason. As a general rule of thumb, always tell Claude which tool to use if you want to save time and money and get consistent results.

  2. Targeted Data Extraction

    Extract specific elements using CSS selectors:

    Get all product titles from https://shop.example.com using the CSS selector '.product-title'. If the request fails, retry up to 5 times every 10 seconds.
    

    The server will extract only the elements matching your selector and return them as a structured list. Notice I told it to retry up to five times every ten seconds in case the website has connection issues, but the default setting should be fine for most cases.

  3. E-commerce Data Collection

    Here's a slightly more complex prompt:

    Extract product information from these e-commerce URLs using bulk browser fetches:
    - https://shop1.com/product-a
    - https://shop2.com/product-b  
    - https://shop3.com/product-c
    
    Get the product names, prices, and descriptions from each page.
    

    Claude will use bulk_fetch to scrape all URLs concurrently, then analyze the extracted data.

  4. More advanced workflow

    Let's say I want to get all the action games currently on the first page of the PlayStation Store. I can use the following prompt:

    Extract the URLs of all games in this page, then do a bulk request to them and return a list of all action games: https://store.playstation.com/en-us/pages/browse
    

    Note that I instructed it to use a bulk request for all the collected URLs. If I hadn't mentioned it, it sometimes works as intended, and other times makes a separate request to each URL, which takes significantly longer. This prompt takes approximately one minute to complete.

    However, because I wasn't specific enough, it actually used the stealthy_fetch here and the bulk_stealthy_fetch in the second step, which unnecessarily consumed a large number of tokens. A better prompt would be:

    Use normal requests to extract the URLs of all games in this page, then do a bulk request to them and return a list of all action games: https://store.playstation.com/en-us/pages/browse
    

    And if you know how to write CSS selectors, you can instruct Claude to apply them to the elements you want, and it will complete the task almost immediately.

    Use normal requests to extract the URLs of all games on the page below, then perform a bulk request to them and return a list of all action games.
    The selector for games in the first page is `[href*="/concept/"]` and the selector for the genre in the second request is `[data-qa="gameInfo#releaseInformation#genre-value"]`
    
    URL: https://store.playstation.com/en-us/pages/browse
    
  5. Get data from a website with Cloudflare protection

    If you think the website you are targeting has Cloudflare protection, you should tell Claude instead of letting it discover that on its own.

    What's the price of this product? Be cautious, as it utilizes Cloudflare's Turnstile protection. Make the browser visible while you work.
    
    https://ao.com/product/oo101uk-ninja-woodfire-outdoor-pizza-oven-brown-99357-685.aspx
    
  6. Long workflow

    You can, for example, use a prompt like this:

    Extract all the product URLs in the following category, then return the prices and the details of the first three products.
    
    https://www.arnotts.ie/furniture/bedroom/bed-frames/
    

    But a better prompt would be:

    Go to the following category URL and extract all product URLs using the CSS selector "a". Then, fetch the first 3 product pages in parallel and extract each product’s price and details.
    
    Keep the output in markdown format to reduce irrelevant content.
    
    Category URL:
    https://www.arnotts.ie/furniture/bedroom/bed-frames/
    

And so on, you get the idea. Your creativity is the key here.
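
For comparison, here is roughly what the workflow from example 4 looks like as direct Scrapling code, reusing the two selectors from that prompt. This is only a sketch: the store page is dynamic, so a browser-based fetch is used, and the relative-URL handling is an assumption.

from scrapling.fetchers import StealthyFetcher

listing = StealthyFetcher.fetch("https://store.playstation.com/en-us/pages/browse")
hrefs = [el.attrib["href"] for el in listing.css('[href*="/concept/"]')]
game_urls = ["https://store.playstation.com" + h if h.startswith("/") else h for h in hrefs]

action_games = []
for url in game_urls:
    page = StealthyFetcher.fetch(url)
    genre = page.css_first('[data-qa="gameInfo#releaseInformation#genre-value"]::text')
    if genre and "action" in genre.lower():
        action_games.append(url)
print(action_games)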

Best Practices

Here is some technical advice for you.

1. Choose the Right Tool

  • get: Fast, simple websites
  • fetch: Sites with JavaScript/dynamic content
  • stealthy_fetch: Protected sites, Cloudflare, anti-bot systems

2. Optimize Performance

  • Use bulk tools for multiple URLs
  • Disable unnecessary resources
  • Set appropriate timeouts
  • Use CSS selectors for targeted extraction

3. Handle Dynamic Content

  • Use network_idle for SPAs
  • Set wait_selector for specific elements
  • Increase timeout for slow-loading sites (see the sketch after this list)
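
As a reference point, those options map onto Scrapling's own browser fetcher roughly like this (keyword names follow Scrapling's docs; verify against your installed version):

from scrapling.fetchers import StealthyFetcher

page = StealthyFetcher.fetch(
    "https://spa.example.com",   # hypothetical single-page app
    network_idle=True,           # wait for network activity to settle
    wait_selector=".results",    # block until this element appears
    timeout=60000,               # give slow pages up to 60 seconds (ms)
)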

4. Data Quality

  • Use main_content_only=true to avoid navigation/ads
  • Choose an appropriate extraction_type for your use case

⚠️ Important Guidelines:

  • Check robots.txt: Visit https://website.com/robots.txt to see scraping rules
  • Respect rate limits: Don't overwhelm servers with requests
  • Terms of Service: Read and comply with website terms
  • Copyright: Respect intellectual property rights
  • Privacy: Be mindful of personal data protection laws
  • Commercial use: Ensure you have permission for business purposes

Built with ❤️ by the Scrapling team. Happy scraping!
