Created by TickTockBent, 18 days ago

Charlotte

The Web, Readable.

Charlotte is an MCP server that renders web pages into structured, agent-readable representations using headless Chromium. It exposes the browser's semantic understanding — accessibility tree, layout geometry, interactive elements — to AI agents via Model Context Protocol tools, enabling navigation, observation, and interaction without vision models or brittle selectors.

Why Charlotte?

Most browser MCP servers dump the entire accessibility tree on every call — a flat text blob that can exceed a million characters on content-heavy pages. Agents pay for all of it whether they need it or not.

Charlotte takes a different approach. It decomposes each page into a typed, structured representation — landmarks, headings, interactive elements, forms, content summaries — and lets agents control how much they receive with three detail levels. When an agent navigates to a new page, it gets a compact orientation (336 characters for Hacker News) instead of the full element dump (61,000+ characters). When it needs specifics, it asks for them.

Benchmarks

Charlotte v0.4.0 vs Playwright MCP, measured by characters returned per tool call on real websites:

Navigation (first contact with a page):

Site                    Charlotte navigate   Playwright browser_navigate
example.com             612                  817
Wikipedia (AI article)  7,667                1,040,636
Hacker News             336                  61,230
GitHub repo             3,185                80,297

Charlotte's navigate returns minimal detail by default — landmarks, headings, and interactive element counts grouped by page region. Enough to orient, not enough to overwhelm. On Wikipedia, that's 135x smaller than Playwright's response.

Tool definition overhead (invisible cost per API call):

Profile           Tools   Def. tokens/call   Savings vs full
full              40      7,187              (baseline)
browse (default)  22      3,727              48%
core              7       1,677              77%

Tool definitions are sent on every API round-trip. With the default browse profile, Charlotte carries 48% less definition overhead than loading all 40 tools. Over a 20-call browsing session, that's 38.6% fewer total tokens. See the profile benchmark report for full results.
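
The savings percentages in the table above follow directly from the per-call definition token counts; a quick sanity check (plain arithmetic, no Charlotte code involved):

```javascript
// Recompute the "Savings vs full" column from the definition-token counts above.
const fullTokens = 7187; // all 40 tool definitions

function savingsVsFull(profileTokens) {
  return Math.round((1 - profileTokens / fullTokens) * 100);
}

console.log(savingsVsFull(3727)); // browse (22 tools) → 48
console.log(savingsVsFull(1677)); // core (7 tools) → 77
```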

The workflow difference: Playwright agents receive 61K+ characters every time they look at Hacker News, whether they're reading headlines or looking for a login button. Charlotte agents get 336 characters on arrival, call find({ type: "link", text: "login" }) to get exactly what they need, and never pay for the rest.

How It Works

Charlotte maintains a persistent headless Chromium session and acts as a translation layer between the visual web and the agent's text-native reasoning. Every page is decomposed into a structured representation:

┌─────────────┐     MCP Protocol     ┌──────────────────┐
│   AI Agent  │<────────────────────>│    Charlotte     │
└─────────────┘                      │                  │
                                     │  ┌────────────┐  │
                                     │  │  Renderer  │  │
                                     │  │  Pipeline  │  │
                                     │  └─────┬──────┘  │
                                     │        │         │
                                     │  ┌─────▼──────┐  │
                                     │  │  Headless  │  │
                                     │  │  Chromium  │  │
                                     │  └────────────┘  │
                                     └──────────────────┘

Agents receive landmarks, headings, interactive elements with typed metadata, bounding boxes, form structures, and content summaries — all derived from what the browser already knows about every page.

Features

Navigation: navigate, back, forward, reload

Observation: observe (3 detail levels), find (spatial + semantic search), screenshot, diff (structural comparison against snapshots)

Interaction: click, type, select, toggle, submit, scroll, hover, drag, key, wait_for (async condition polling), dialog (accept/dismiss JS dialogs)

Monitoring: console (all severity levels, filtering, timestamps), requests (full HTTP history, method/status/resource type filtering)

Session Management: tabs, tab_open, tab_switch, tab_close, viewport (device presets), network (throttling, URL blocking), set_cookies, get_cookies, clear_cookies, set_headers, configure

Development Mode: dev_serve (static server + file watching with auto-reload), dev_inject (CSS/JS injection), dev_audit (a11y, performance, SEO, contrast, broken links)

Utilities: evaluate (arbitrary JS execution in page context)

Tool Profiles

Charlotte ships 40 tools, but most workflows only need a subset. Startup profiles control which tools load into the agent's context, reducing definition overhead by up to 77%.

charlotte --profile browse    # 22 tools (default) — navigate, observe, interact, tabs
charlotte --profile core      # 7 tools — navigate, observe, find, click, type, submit
charlotte --profile full      # 40 tools — everything
charlotte --profile interact  # 27 tools — full interaction + dialog + evaluate
charlotte --profile develop   # 30 tools — interact + dev_serve, dev_inject, dev_audit
charlotte --profile audit     # 13 tools — navigation + observation + dev_audit + viewport

Agents can activate more tools mid-session without restarting:

charlotte:tools enable dev_mode    → activates dev_serve, dev_audit, dev_inject
charlotte:tools disable dev_mode   → deactivates them
charlotte:tools list               → see what's loaded

Quick Start

Prerequisites

  • Node.js >= 22
  • npm

Installation

Charlotte is listed on the MCP Registry as io.github.TickTockBent/charlotte and published on npm as @ticktockbent/charlotte:

npm install -g @ticktockbent/charlotte

Docker images are available on Docker Hub and GitHub Container Registry:

# Alpine (default, smaller)
docker pull ticktockbent/charlotte:alpine

# Debian (if you need glibc compatibility)
docker pull ticktockbent/charlotte:debian

# Or from GHCR
docker pull ghcr.io/ticktockbent/charlotte:latest

Or install from source:

git clone https://github.com/ticktockbent/charlotte.git
cd charlotte
npm install
npm run build

Run

Charlotte communicates over stdio using the MCP protocol:

# If installed globally (default browse profile)
charlotte

# With a specific profile
charlotte --profile core

# If installed from source
npm start

MCP Client Configuration

Add Charlotte to your MCP client configuration. For Claude Code, create .mcp.json in your project root:

{
  "mcpServers": {
    "charlotte": {
      "type": "stdio",
      "command": "npx",
      "args": ["@ticktockbent/charlotte"],
      "env": {}
    }
  }
}

For Claude Desktop, add to claude_desktop_config.json:

{
  "mcpServers": {
    "charlotte": {
      "command": "npx",
      "args": ["@ticktockbent/charlotte"]
    }
  }
}

See docs/mcp-setup.md for the full setup guide, including development mode, generic MCP clients, verification steps, and troubleshooting.

Usage Examples

Once connected, an agent can use Charlotte's tools:

Browse a website

navigate({ url: "https://example.com" })
// → 612 chars: landmarks, headings, interactive element counts

find({ type: "link", text: "More information" })
// → just the matching element with its ID

click({ element_id: "lnk-a3f1" })

Fill out a form

navigate({ url: "https://httpbin.org/forms/post" })
find({ type: "text_input" })
type({ element_id: "inp-c7e2", text: "hello@example.com" })
select({ element_id: "sel-e8a3", value: "option-2" })
submit({ form_id: "frm-b1d4" })

Local development feedback loop

dev_serve({ path: "./my-site", watch: true })
observe({ detail: "full" })
dev_audit({ checks: ["a11y", "contrast"] })
dev_inject({ css: "body { font-size: 18px; }" })

Page Representation

Charlotte returns structured representations with three detail levels that let agents control how much context they consume:

Minimal (default for navigate)

Landmarks, headings, and interactive element counts grouped by page region. Designed for orientation — "what's on this page?" — without listing every element.

{
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "viewport": { "width": 1280, "height": 720 },
  "structure": {
    "headings": [{ "level": 1, "text": "Hacker News", "id": "h-a1b2" }]
  },
  "interactive_summary": {
    "total": 93,
    "by_landmark": {
      "(page root)": { "link": 91, "text_input": 1, "button": 1 }
    }
  }
}

Summary (default for observe)

Full interactive element list with typed metadata, form structures, and content summaries.

{
  "url": "https://example.com/dashboard",
  "title": "Dashboard",
  "viewport": { "width": 1280, "height": 720 },
  "structure": {
    "landmarks": [
      { "id": "rgn-b2c1", "role": "banner", "label": "Site header", "bounds": { "x": 0, "y": 0, "w": 1280, "h": 64 } },
      { "id": "rgn-d4e5", "role": "main", "label": "Content", "bounds": { "x": 240, "y": 64, "w": 1040, "h": 656 } }
    ],
    "headings": [{ "level": 1, "text": "Dashboard", "id": "h-1a2b" }],
    "content_summary": "main: 2 headings, 5 links, 1 form"
  },
  "interactive": [
    {
      "id": "btn-a3f1",
      "type": "button",
      "label": "Create Project",
      "bounds": { "x": 960, "y": 80, "w": 160, "h": 40 },
      "state": {}
    }
  ],
  "forms": []
}

Full

Everything in summary, plus all visible text content on the page.

Detail Levels

Level     Tokens       Use case
minimal   ~50-200      Orientation after navigation. What regions exist? How many interactive elements?
summary   ~500-5000    Working with the page. Full element list, form structures, content summaries.
full      variable     Reading page content. All visible text included.

Navigation tools default to minimal. The observe tool defaults to summary. Both accept an optional detail parameter to override.
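
In the README's own call style, overriding a default looks like this (the detail parameter is passed alongside a tool's existing arguments):

```
// Cheap re-orientation of the current page, without the full element dump
observe({ detail: "minimal" })

// Richer first contact when the agent already knows it will read the page
navigate({ url: "https://example.com", detail: "summary" })
```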

Element IDs

Element IDs are stable across minor DOM mutations. They're generated by hashing a composite key of element type, ARIA role, accessible name, and DOM path signature:

btn-a3f1  (button)    inp-c7e2  (text input)
lnk-d4b9  (link)      sel-e8a3  (select)
chk-f1a2  (checkbox)  frm-b1d4  (form)
rgn-e0d2  (landmark)  hdg-0f40  (heading)

IDs survive unrelated DOM changes and element reordering within the same container. When an agent navigates at minimal detail (no individual element IDs), it uses find to locate elements by text, type, or spatial proximity — the returned elements include IDs ready for interaction.

Development

# Run in watch mode
npm run dev

# Run all tests
npm test

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration

# Type check
npx tsc --noEmit

Project Structure

src/
  browser/          # Puppeteer lifecycle, tab management, CDP sessions
  renderer/         # Accessibility tree extraction, layout, content, element IDs
  state/            # Snapshot store, structural differ
  tools/            # MCP tool definitions (navigation, observation, interaction, session, dev-mode)
  dev/              # Static server, file watcher, auditor
  types/            # TypeScript interfaces
  utils/            # Logger, hash, wait utilities
tests/
  unit/             # Fast tests with mocks
  integration/      # Full Puppeteer tests against fixture HTML
  fixtures/pages/   # Test HTML files

Architecture

The Renderer Pipeline is the core — it calls extractors in order and assembles a PageRepresentation:

  1. Accessibility tree extraction (CDP Accessibility.getFullAXTree)
  2. Layout extraction (CDP DOM.getBoxModel)
  3. Landmark, heading, interactive element, and content extraction
  4. Element ID generation (hash-based, stable across re-renders)

All tools go through renderActivePage() which handles snapshots, reload events, dialog detection, and response formatting.
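
As a rough sketch of that ordering (the stage functions and reduce-based assembly are illustrative, not Charlotte's real renderer internals):

```javascript
// Each stage contributes one slice of the assembled PageRepresentation, in the
// order listed above: accessibility tree → layout → semantics → element IDs.
const stages = [
  (page, rep) => ({ ...rep, axNodes: page.axNodes }),     // 1. Accessibility.getFullAXTree
  (page, rep) => ({ ...rep, boxes: page.boxes }),          // 2. DOM.getBoxModel
  (page, rep) => ({ ...rep, landmarks: page.landmarks }),  // 3. landmarks, headings, content
  (page, rep) => ({ ...rep, ids: page.landmarks.map((l, i) => `rgn-${i}`) }), // 4. stable IDs
];

function assemble(page) {
  return stages.reduce((rep, stage) => stage(page, rep), {});
}

const rep = assemble({ axNodes: 42, boxes: 42, landmarks: ["banner", "main"] });
console.log(rep.ids); // ["rgn-0", "rgn-1"]
```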

Sandbox

Charlotte includes a test website in tests/sandbox/ that exercises all 40 tools without touching the public internet. Serve it locally with:

dev_serve({ path: "tests/sandbox" })

Four pages cover navigation, forms, interactive elements, delayed content, scroll containers, and more. See docs/sandbox.md for the full page reference and a tool-by-tool exercise checklist.

Known Issues

Tool naming convention — Charlotte uses : as a namespace separator in tool names (e.g., charlotte:navigate, charlotte:observe). MCP SDK v1.26.0+ logs validation warnings for this character, as the emerging SEP standard restricts tool names to [A-Za-z0-9_.-]. This does not affect functionality — all tools work correctly — but produces stderr warnings on server startup. A future release will bring tool names into compliance with the SEP standard.
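
The restriction is easy to check directly against the character class named above:

```javascript
// SEP-style tool-name pattern: letters, digits, underscore, dot, hyphen only.
const SEP_NAME = /^[A-Za-z0-9_.-]+$/;

console.log(SEP_NAME.test("charlotte:navigate")); // false, ":" is outside the class
console.log(SEP_NAME.test("charlotte.navigate")); // true
```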

Iframe content not captured — Charlotte reads the main frame's accessibility tree only. Content inside iframes (same-origin or cross-origin) is not included in the page representation. See the Roadmap for planned iframe support.

Shadow DOM — Open shadow DOM works transparently. Chromium's accessibility tree pierces open shadow boundaries, so web components (e.g., GitHub's <relative-time>, <tool-tip>) render their content into Charlotte's representation without special handling. Closed shadow roots are opaque to the accessibility tree and will not be captured.

No file upload support — Charlotte identifies file_input elements in the page representation but provides no tool to set file paths on them. Workflows that require file uploads cannot be completed.

Roadmap

Interaction Gaps

File Upload — Add a charlotte:upload tool to set file paths on file_input elements via Puppeteer's elementHandle.uploadFile(). Charlotte already identifies file inputs but cannot act on them.

Batch Form Fill — Add a charlotte:fill_form tool that accepts an array of {element_id, value} pairs and fills an entire form in a single tool call, reducing N sequential type/select/toggle calls to one.

Slow Typing — Add a slowly or character_delay parameter to charlotte:type for character-by-character input. Required for sites with key-by-key event handlers (autocomplete, search-as-you-type, input validation).
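
If these three proposals land as described, the calls might look like the following. None of these tools or parameters exist yet; the shapes and the element IDs are purely illustrative:

```
upload({ element_id: "inp-f4b2", path: "./report.pdf" })

fill_form({ fields: [
  { element_id: "inp-c7e2", value: "hello@example.com" },
  { element_id: "sel-e8a3", value: "option-2" }
] })

type({ element_id: "inp-c7e2", text: "query", character_delay: 50 })
```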

Session & Configuration

Connect to Existing Browser — Add a --cdp-endpoint CLI argument so Charlotte can attach to an already-running browser via puppeteer.connect() instead of always launching a new instance. Enables working with logged-in sessions and browser extensions.

Persistent Init Scripts — Add a --init-script CLI argument to inject JavaScript on every page load via page.evaluateOnNewDocument(). Charlotte's dev_inject currently applies CSS/JS once and does not persist across navigations.

Configuration File — Support a --config CLI argument to load settings from a JSON file, simplifying repeatable setups and CI/CD integration.

File Output — Add an optional filename parameter to screenshot, observe, and future monitoring tools so large responses can be written to disk instead of returned inline, reducing token consumption.

Full Device Emulation — Extend charlotte:viewport to accept named devices (e.g., "iPhone 15") and configure user agent, touch support, and device pixel ratio via CDP, not just viewport dimensions.

Feature Roadmap

Screenshot Artifacts — Save screenshots as persistent file artifacts rather than only returning inline data, enabling agents to reference and manage captured images across sessions.

Video Recording — Record interactions as video, capturing the full sequence of agent-driven navigation and manipulation for debugging, documentation, and review.

ARM64 Docker Images — Add linux/arm64 platform support to the Docker publish workflow for native performance on Apple Silicon Macs and ARM servers.

Iframe Content Extraction — Traverse child frames via CDP to include iframe content in the page representation. Currently, Charlotte only reads the main frame's accessibility tree; same-origin and cross-origin iframe content is invisible.

See docs/playwright-mcp-gap-analysis.md for the full gap analysis against Playwright MCP, including lower-priority items (vision tools, testing/verification, tracing, transport, security) and areas where Charlotte has advantages.

Full Specification

See docs/CHARLOTTE_SPEC.md for the complete specification including all tool parameters, the page representation format, element identity strategy, and architecture details.

License

MIT

Contributing

See CONTRIBUTING.md for guidelines.

Part of a growing suite of literary-named MCP servers. See more at github.com/TickTockBent.
