PDF Reader MCP Server (@shtse8/pdf-reader-mcp)

Created by shtse8, 10 months ago
An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

Empower your AI agents (like Cline/Claude) with the ability to read and extract information from PDF files within your project, using a single, flexible tool.

This Node.js server implements the Model Context Protocol (MCP) to provide a consolidated read_pdf tool for interacting with PDF documents (local or URL) located within a defined project root directory.


⭐ Why Use This Server?

  • 🛡️ Secure Project Root Focus:
    • All local file operations are strictly confined to the project root directory (determined by the server's launch context), preventing unauthorized access.
    • Uses relative paths for local files. Important: The server determines its project root from its own Current Working Directory (cwd) at launch. The process starting the server (e.g., your MCP host) must set the cwd to your intended project directory.
  • 🌐 URL Support: Can directly process PDFs from public URLs.
  • ⚡ Efficient PDF Processing:
    • Leverages the pdf-parse library for extracting text, metadata, and page information.
  • 🔧 Flexible & Consolidated Tool:
    • A single read_pdf tool handles various extraction needs via parameters, simplifying agent interaction.
  • 🚀 Easy Integration: Get started quickly using npx with minimal configuration.
  • 🐳 Containerized Option: Also available as a Docker image for consistent deployment environments.
  • ✅ Robust Validation: Uses Zod schemas to validate all incoming tool arguments.
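
The project-root confinement described above can be sketched as follows. This is a minimal illustration, not the server's actual code: resolveInsideRoot is a hypothetical helper, and it assumes the root is taken from process.cwd() as the README describes. It does not follow symlinks, which a hardened implementation would likely also need to check.

```typescript
import * as path from "node:path";

/**
 * Resolve a user-supplied relative path against a project root and reject
 * anything that would land outside that root.
 * Hypothetical helper for illustration only.
 */
function resolveInsideRoot(projectRoot: string, userPath: string): string {
  if (path.isAbsolute(userPath)) {
    throw new Error(`Absolute paths are not allowed: ${userPath}`);
  }
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, userPath);
  // path.relative returns ".." or a "../"-prefixed string when `resolved`
  // sits above `root` (and an absolute path on a different Windows drive).
  const rel = path.relative(root, resolved);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes the project root: ${userPath}`);
  }
  return resolved;
}

// The root comes from the current working directory, mirroring the
// launch-context behavior described in the README.
const projectRoot = process.cwd();
console.log(resolveInsideRoot(projectRoot, "docs/report.pdf"));
```

A call such as resolveInsideRoot(projectRoot, "../secret.pdf") throws instead of returning a path, which is the behavior the "confined to the project root" guarantee implies.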

The simplest way to run the server is via npx, configured in your MCP host's settings file (e.g., mcp_settings.json).

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": ["@shtse8/pdf-reader-mcp"],
      "name": "PDF Reader (npx)"
    }
  }
}

(Alternative) Using bunx:

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "bunx",
      "args": ["@shtse8/pdf-reader-mcp"],
      "name": "PDF Reader (bunx)"
    }
  }
}

Important: Ensure your MCP Host launches the command with the cwd set to your project's root directory for local file access.


✨ The read_pdf Tool

This server provides a single, powerful tool: read_pdf.

  • Description: Reads content, metadata, or page count from a PDF file (local or URL), controlled by parameters.
  • Input: An object containing:
    • sources (array): Required. An array of source objects. Each object must contain either path (string, relative path to local PDF) or url (string, URL of PDF). Each source object can optionally include:
      • pages (string | number[], optional): Extract text only from specific pages (1-based) or ranges (e.g., [1, 3, 5] or '1,3-5,7') for this specific source. If provided, the global include_full_text flag is ignored for this source.
    • include_full_text (boolean, optional, default false): Include the full text content for each PDF. Ignored if pages is provided.
    • include_metadata (boolean, optional, default true): Include metadata (info and metadata objects) for each PDF.
    • include_page_count (boolean, optional, default true): Include the total number of pages (num_pages) for each PDF.
  • Output: An object containing a results array. Each element corresponds to a source in the input sources array. Processing continues even if some sources fail. Each result object has the following structure:
    • source (string): The original path or URL provided for identification.
    • success (boolean): Indicates if processing this specific source was successful.
    • error (string, optional): Provides an error message if success is false for this source.
    • data (object, optional): Contains the extracted data if success is true for this source:
      • full_text (string, optional)
      • page_texts (array, optional): Array of { page: number, text: string }.
      • missing_pages (array, optional)
      • info (object, optional)
      • metadata (object, optional)
      • num_pages (number, optional)
      • warnings (array, optional): Non-critical warnings for this source (e.g., requested page out of bounds).
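
The pages parameter accepts either an array of page numbers or a range string. As a rough sketch of how such a specification might be expanded into individual 1-based page numbers, consider the hypothetical parsePages helper below. The server's own parsing and error rules may differ; the rejection of reversed ranges here is an assumption.

```typescript
/**
 * Expand a page specification such as "1,3-5,7" or [1, 3, 5] into a sorted
 * list of unique 1-based page numbers.
 * Hypothetical helper; not taken from the server's source.
 */
function parsePages(spec: string | number[]): number[] {
  const pages = new Set<number>();
  const parts = typeof spec === "string" ? spec.split(",") : spec.map(String);
  for (const raw of parts) {
    const part = raw.trim();
    // Accept either a single page ("7") or an inclusive range ("3-5").
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!match) throw new Error(`Invalid page token: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    for (let p = start; p <= end; p++) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(parsePages("1,3-5,7")); // pages 1, 3, 4, 5, 7
console.log(parsePages([5, 1, 3])); // pages 1, 3, 5
```
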

Example requests:

  1. Get metadata and page count for multiple files:

    {
      "sources": [
        { "path": "report.pdf" },
        { "url": "http://example.com/another.pdf" },
        { "path": "nonexistent.pdf" }
      ]
    }
    

    (Example Output: { "results": [ { "source": "report.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 10 } }, { "source": "http://example.com/another.pdf", "success": true, "data": { "info": {...}, "metadata": {...}, "num_pages": 5 } }, { "source": "nonexistent.pdf", "success": false, "error": "File not found..." } ] })

  2. Get full text for one file:

    {
      "sources": [{ "url": "http://example.com/document.pdf" }],
      "include_full_text": true,
      "include_metadata": false,
      "include_page_count": false
    }
    

    (Example Output: { "results": [ { "source": "http://example.com/document.pdf", "success": true, "data": { "full_text": "..." } } ] })

  3. Get text from different pages for different files:

    {
      "sources": [
        { "path": "manual.pdf", "pages": "1-2" },
        { "url": "http://example.com/report.pdf", "pages": [5] }
      ],
      "include_metadata": false,
      "include_page_count": false
    }
    

    (Example Output: { "results": [ { "source": "manual.pdf", "success": true, "data": { "page_texts": [...] } }, { "source": "http://example.com/report.pdf", "success": true, "data": { "page_texts": [...] } } ] })
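
Because processing continues when individual sources fail, a caller has to check success on every result. The sketch below types the output according to the field list in the tool description and splits results into successes and failures. The interface names and partitionResults are hypothetical, and unknown is used wherever the README does not spell out a structure.

```typescript
// Shapes follow the documented output fields; element types that the
// README leaves unspecified are typed as `unknown`.
interface PdfResultData {
  full_text?: string;
  page_texts?: { page: number; text: string }[];
  missing_pages?: unknown[];
  info?: Record<string, unknown>;
  metadata?: Record<string, unknown>;
  num_pages?: number;
  warnings?: unknown[];
}

interface PdfResult {
  source: string;
  success: boolean;
  error?: string;
  data?: PdfResultData;
}

interface ReadPdfOutput {
  results: PdfResult[];
}

/** Split results into successes and failures (hypothetical helper). */
function partitionResults(output: ReadPdfOutput) {
  const succeeded = output.results.filter((r) => r.success);
  const failed = output.results.filter((r) => !r.success);
  return { succeeded, failed };
}

// Sample data modeled on the first example request; values are illustrative.
const sample: ReadPdfOutput = {
  results: [
    { source: "report.pdf", success: true, data: { num_pages: 10 } },
    { source: "nonexistent.pdf", success: false, error: "File not found..." },
  ],
};

const { succeeded, failed } = partitionResults(sample);
for (const r of succeeded) {
  console.log(`${r.source}: ${r.data?.num_pages ?? "?"} pages`);
}
for (const r of failed) {
  console.log(`${r.source} failed: ${r.error ?? "unknown error"}`);
}
```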


🐳 Alternative Usage: Docker

Configure your MCP Host to run the Docker container, mounting your project directory to /app.

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-v",
        "/path/to/your/project:/app",
        "shtse8/pdf-reader-mcp:latest"
      ],
      "name": "PDF Reader (Docker)"
    }
  }
}

Note on Volume Mount Path: Instead of hardcoding /path/to/your/project, you can often use shell variables to automatically use the current working directory:

  • Linux/macOS: -v "$PWD:/app"
  • Windows Cmd: -v "%CD%:/app"
  • Windows PowerShell: -v "${PWD}:/app"
  • VS Code Tasks/Launch: You might be able to use ${workspaceFolder} if supported by your MCP host integration.
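
Whether shell variables such as $PWD are expanded depends on how your MCP host launches the command. If the host does not expand them, one option is to generate the configuration with an absolute path filled in. The sketch below does that; dockerArgs and dockerServerConfig are hypothetical helpers, and the output simply mirrors the Docker configuration shown above.

```typescript
/** Build the `docker run` argument list for a given host project directory. */
function dockerArgs(projectDir: string): string[] {
  return [
    "run",
    "-i",
    "--rm",
    "-v",
    `${projectDir}:/app`, // mount the project directory at /app
    "shtse8/pdf-reader-mcp:latest",
  ];
}

/** Produce the MCP host entry for the Docker variant (hypothetical helper). */
function dockerServerConfig(projectDir: string) {
  return {
    mcpServers: {
      "pdf-reader-mcp": {
        command: "docker",
        args: dockerArgs(projectDir),
        name: "PDF Reader (Docker)",
      },
    },
  };
}

// Print a config that mounts the directory this script is run from.
console.log(JSON.stringify(dockerServerConfig(process.cwd()), null, 2));
```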

🛠️ Other Usage Options

Local Build (For Development)

  1. Clone: git clone https://github.com/shtse8/pdf-reader-mcp.git
  2. Install: cd pdf-reader-mcp && npm install
  3. Build: npm run build
  4. Configure MCP Host:
    {
      "mcpServers": {
        "pdf-reader-mcp": {
          "command": "node",
          "args": ["/path/to/cloned/repo/pdf-reader-mcp/build/index.js"],
          "name": "PDF Reader (Local Build)"
        }
      }
    }
    

💻 Development

  1. Clone, npm install, npm run build.
  2. npm run watch for auto-recompile.

🚢 Publishing (via GitHub Actions)

Uses GitHub Actions (.github/workflows/publish.yml) to publish to npm and Docker Hub on pushes to main. Requires NPM_TOKEN, DOCKERHUB_USERNAME, DOCKERHUB_TOKEN secrets.


🙌 Contributing

Contributions welcome! Open an issue or PR.
