Sponsored by Deepsite.site

Peekaboo MCP – lightning-fast macOS screenshots for AI agents

Created By
steipete6 months ago
## What Peekaboo Can Do Peekaboo provides three main tools that give AI agents visual capabilities: - **`image`** - Capture screenshots of screens or specific applications - **`analyze`** - Ask AI questions about captured images using vision models - **`list`** - Enumerate available screens and windows for targeted captures Each tool is designed to be powerful and flexible. The most powerful feature is visual question answering - agents can ask questions about screenshots like "What do you see in this window?" or "Is the submit button visible?" and get accurate answers. This saves context space since asking specific questions is much more efficient than returning raw image data. Peekaboo supports both cloud and local vision models, letting you choose between accuracy and privacy.
Overview

My MCP Ecosystem

Peekaboo is part of a growing collection of MCP servers I'm building:

Each serves a specific purpose in building autonomous AI workflows.

Technical Architecture

Peekaboo combines TypeScript and Swift for the best of both worlds. TypeScript provides excellent MCP support and easy distribution via npm, while Swift enables direct access to Apple's ScreenCaptureKit for capturing windows without focus changes.

My initial AppleScript prototype had a fatal flaw: it required focus changes to capture windows. The Swift rewrite uses ScreenCaptureKit to access the window manager directly - no focus changes, no user disruption.

The system uses a Swift CLI that communicates with a Node.js MCP server, supporting both local models and cloud providers with automatic fallback. Built with Swift 6 and the new Swift Testing framework (now that I have experience with it!), Peekaboo delivers fast, non-intrusive screenshot capture with intelligent window matching.

For detailed testing instructions using the MCP Inspector, see the Peekaboo README.

The Vision: Autonomous Agent Debugging

Peekaboo is like one puzzle piece in a larger set of MCPs I'm building to help agents stay in the loop. The goal is simple: if an agent can answer questions by itself, you don't have to intervene and it can simply continue and debug itself. This is the holy grail for building applications with CI - you want to do everything so the agent can loop and work until what you want is done.

When your build fails, when your UI doesn't look right, when something breaks - instead of stopping and asking you "what do you see?", the agent can take a screenshot, analyze it, and continue fixing the problem autonomously. That's the power of giving agents their eyes.

👻 Peekaboo MCP is available now - ⭐ the repo if this saves you a debug session!

Server Config

{
  "mcpServers": {
    "peekaboo": {
      "command": "npx",
      "args": [
        "-y",
        "@steipete/peekaboo-mcp"
      ],
      "env": {
        "PEEKABOO_AI_PROVIDERS": "ollama/llava:latest"
      }
    }
  }
}
Recommend Servers
TraeBuild with Free GPT-4.1 & Claude 3.7. Fully MCP-Ready.
WindsurfThe new purpose-built IDE to harness magic
Context7Context7 MCP Server -- Up-to-date code documentation for LLMs and AI code editors
ChatWiseThe second fastest AI chatbot™
CursorThe AI Code Editor
AiimagemultistyleA Model Context Protocol (MCP) server for image generation and manipulation using fal.ai's Stable Diffusion model.
TimeA Model Context Protocol server that provides time and timezone conversion capabilities. This server enables LLMs to get current time information and perform timezone conversions using IANA timezone names, with automatic system timezone detection.
MiniMax MCPOfficial MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
Jina AI MCP ToolsA Model Context Protocol (MCP) server that integrates with Jina AI Search Foundation APIs.
EdgeOne Pages MCPAn MCP service designed for deploying HTML content to EdgeOne Pages and obtaining an accessible public URL.
DeepChatYour AI Partner on Desktop
Baidu Map百度地图核心API现已全面兼容MCP协议,是国内首家兼容MCP协议的地图服务商。
Serper MCP ServerA Serper MCP Server
Amap Maps高德地图官方 MCP Server
Visual Studio Code - Open Source ("Code - OSS")Visual Studio Code
Tavily Mcp
BlenderBlenderMCP connects Blender to Claude AI through the Model Context Protocol (MCP), allowing Claude to directly interact with and control Blender. This integration enables prompt assisted 3D modeling, scene creation, and manipulation.
Playwright McpPlaywright MCP server
Zhipu Web SearchZhipu Web Search MCP Server is a search engine specifically designed for large models. It integrates four search engines, allowing users to flexibly compare and switch between them. Building upon the web crawling and ranking capabilities of traditional search engines, it enhances intent recognition capabilities, returning results more suitable for large model processing (such as webpage titles, URLs, summaries, site names, site icons, etc.). This helps AI applications achieve "dynamic knowledge acquisition" and "precise scenario adaptation" capabilities.
Howtocook Mcp基于Anduin2017 / HowToCook (程序员在家做饭指南)的mcp server,帮你推荐菜谱、规划膳食,解决“今天吃什么“的世纪难题; Based on Anduin2017/HowToCook (Programmer's Guide to Cooking at Home), MCP Server helps you recommend recipes, plan meals, and solve the century old problem of "what to eat today"
MCP AdvisorMCP Advisor & Installation - Use the right MCP server for your needs