Gemini OCR MCP

Created by WindoC, a year ago

This project provides an OCR (Optical Character Recognition) service through a FastMCP server backed by the Google Gemini API. It extracts text from images supplied either as a file path or as a base64-encoded string.

Gemini OCR MCP Server

Objective

Extract the text from an image, for example a CAPTCHA, and convert it to plain text, e.g., fbVk.

Features

File-based OCR: Extract text directly from an image file on your local system.
Base64 OCR: Extract text from a base64-encoded image string.
Easy to Use: Exposes OCR functionality as simple tools in an MCP server.
Powered by Gemini: Utilizes Google's advanced Gemini models for high-accuracy text recognition.

Prerequisites

Python 3.8 or higher
A Google Gemini API Key. You can obtain one from Google AI Studio.
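
A quick way to confirm these prerequisites before starting the server is a small preflight check. The sketch below is not part of the project; `check_prerequisites` is a hypothetical helper, and it assumes the API key is supplied through the `GEMINI_API_KEY` environment variable, as in the configuration examples in this README.

```python
import os
import sys


def check_prerequisites(env=None, min_version=(3, 8)):
    """Return a list of problems; an empty list means the basics are in place.

    Hypothetical helper: it only checks the Python version and whether
    GEMINI_API_KEY is set. It does not validate the key with Google.
    """
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        problems.append("Python " + wanted + " or higher is required")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


for problem in check_prerequisites():
    print("Prerequisite missing:", problem)
```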

Setup and Installation

Clone the repository:

```
git clone https://github.com/WindoC/gemini-ocr-mcp
cd gemini-ocr-mcp
```

Install uv, which creates and manages the project's virtual environment:

```
# Install uv standalone if needed

## On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh

## On Windows.
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Install the required dependencies:
```
uv sync
```

MCP Configuration Example

If you are running this as a server for a parent MCP application, you can configure it in your main MCP config.json.

Windows Example:

```
{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "x:\\path\\to\\your\\project\\gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
```

Linux/macOS Example:

```
{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
```

Note: Remember to replace the placeholder paths with the absolute path to your project directory.
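
One way to avoid path mistakes is to generate the entry instead of typing it by hand. The following is a minimal sketch, not something shipped with the project: `build_mcp_config` is a hypothetical helper that resolves the project directory to an absolute path and fills in the same keys as the examples above. On Windows, `json.dumps` escapes the backslashes for you.

```python
import json
from pathlib import Path


def build_mcp_config(project_dir, api_key,
                     model="gemini-2.5-flash-preview-05-20"):
    """Build the mcpServers entry with an absolute project path.

    Hypothetical helper; the keys and values mirror the examples in this README.
    """
    project_path = Path(project_dir).expanduser().resolve()
    return {
        "mcpServers": {
            "gemini-ocr-mcp": {
                "command": "uv",
                "args": [
                    "--directory",
                    str(project_path),
                    "run",
                    "gemini-ocr-mcp.py",
                ],
                "env": {
                    "GEMINI_MODEL": model,
                    "GEMINI_API_KEY": api_key,
                },
            }
        }
    }


# Run from inside the cloned repository, then paste the output into config.json.
config = build_mcp_config(".", "YOUR_GEMINI_API_KEY")
print(json.dumps(config, indent=2))
```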

Tools Provided

`ocr_image_file`

Performs OCR on a local image file.

Parameter: image_file (string): The absolute or relative path to the image file.
Returns: (string) The extracted text from the image.
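
For illustration, this is roughly what a tool invocation looks like on the wire. MCP clients normally build this message for you, so the sketch below only shows the shape of a `tools/call` request as defined by the Model Context Protocol. `build_tool_call` and the image path are hypothetical.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 "tools/call" request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical image path; an absolute path avoids ambiguity about the
# server's working directory.
request = build_tool_call("ocr_image_file", {"image_file": "/path/to/captcha.png"})
print(json.dumps(request, indent=2))
```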

`ocr_image_base64`

Performs OCR on a base64-encoded image.

Parameter: base64_image (string): The base64-encoded string of the image.
Returns: (string) The extracted text from the image.
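
To produce a value for `base64_image`, the image bytes can be encoded with Python's standard library. This is a minimal sketch with hypothetical helper names. It assumes the tool expects the plain base64 string rather than a `data:` URI, which this README does not specify.

```python
import base64
from pathlib import Path


def encode_image_bytes(data):
    """Encode raw image bytes as a base64 ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_image_file(image_path):
    """Read an image file and return its contents as a base64 string."""
    return encode_image_bytes(Path(image_path).read_bytes())


# The first bytes of a PNG file, used here only as sample input.
sample = encode_image_bytes(b"\x89PNG\r\n\x1a\n")
print(sample)
```

The string returned by `encode_image_file` can then be passed as the `base64_image` argument.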
