flatten-mcp

Created by shayaShav, 8 days ago

An MCP server that flattens Claude Code sessions — keeping every prompt and event verbatim while reclaiming context tokens, so you resume the exact same raw conversation at a lower token count instead of compacting it into a lossy summary. It moves bulky tool output (large file reads, command logs, base64 screenshots) into a sidecar file, leaving a tiny retrievable reference in its place. Crash-safe, idempotent, and fully reversible. Real example from the README: a 317,236-token session flattened to 182,287 tokens.

Tags: mcp, model-context-protocol

flatten-mcp

Resume the exact same conversation at a lower token cost — without compacting it into a lossy summary.

flatten-mcp is a Model Context Protocol server for Claude Code. It shrinks a session's token footprint by moving bulky tool output (large file reads, command logs, base64 screenshots) out of the conversation and into a sidecar file — leaving a tiny, retrievable reference in its place. Your prompts and the chronological flow of the session are preserved verbatim — those lines are never rewritten. You resume the same raw conversation; it just costs less to carry.

See how 317,236 tokens turned into 182,287:

https://github.com/user-attachments/assets/4672b3cd-f78f-4146-97ba-e0077b655381

Why flatten instead of compact?

The standard answer to a full context window is compaction: the model reads the whole conversation and rewrites it into a shorter summary. That summary is lossy by construction — an interpretation of your history, and interpretations drift, smooth over the awkward parts, and quietly drop the detail you didn't know you'd need. But the history is exactly what's worth keeping verbatim: the words you typed at 2 a.m., the precise order of events, the dead ends and the decisions. A fuzzy, half-formed prompt carries more raw truth about your intent than any tidy paragraph written about it after the fact — and preserving it untouched is the foundation of trust in a coding agent.

Flattening is the opposite move. It changes nothing about what was said. In most sessions the model reads a lot — large files, long logs, multiple sources — and keeps every byte of it in context, even though it has nearly always already written down the conclusion in plain prose: the one line that mattered in a 2 MB log, the finding distilled from five files, the running tally of open tasks. The raw source has done its job. Flattening lifts those already-summarized blocks out and swaps each for a lightweight reference ID — so starting cold from a flattened session is usually smooth sailing, and on the rare occasion the raw bytes are needed, they're one retrieve_flattened call away.

What sits in the context window:

   USER         "fix the crash"
   ASSISTANT    reading the logs…
   TOOL_RESULT  ▓▓▓ 2 MB log dump ▓▓▓        ← bulk; already summarized in prose below
   ASSISTANT    "the OOM is at line 88,402 — the fix is …"

After flatten — same words, only the bulk set aside:

   USER         "fix the crash"
   ASSISTANT    reading the logs…
   TOOL_RESULT  [FLATTENED id=… → sidecar]   ← one marker; fetch the full dump on demand
   ASSISTANT    "the OOM is at line 88,402 — the fix is …"
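
The before/after picture above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: the `Entry` shape, the shortened marker text, and the character-based `minSize` threshold are all hypothetical simplifications, not the server's actual session schema or marker protocol (see docs/ARCHITECTURE.md for those).

```typescript
// Hypothetical, simplified entry shape; the real session JSONL is richer.
type Entry =
  | { kind: "user" | "assistant"; text: string }
  | { kind: "tool_result"; id: string; content: string };

interface SidecarRecord {
  id: string;
  content: string;
}

// Shortened stand-in for the real marker format (assumption).
const MARKER_PREFIX = "[FLATTENED id=";

function flattenEntries(
  entries: Entry[],
  minSize: number
): { entries: Entry[]; sidecar: SidecarRecord[] } {
  const sidecar: SidecarRecord[] = [];
  const rewritten = entries.map((entry): Entry => {
    // Prompts and assistant prose pass through untouched.
    if (entry.kind !== "tool_result") return entry;
    // Already flattened on an earlier run: skip it.
    if (entry.content.startsWith(MARKER_PREFIX)) return entry;
    // Small results are not worth moving.
    if (entry.content.length < minSize) return entry;
    // Keep the original verbatim in the sidecar, leave a short marker behind.
    sidecar.push({ id: entry.id, content: entry.content });
    const marker = `${MARKER_PREFIX}${entry.id} | text ${entry.content.length} chars]`;
    return { ...entry, content: marker };
  });
  return { entries: rewritten, sidecar };
}
```

Running the function a second time over its own output extracts nothing new, which is the property that makes re-flattening harmless.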

What you'll actually save

Token reduction depends entirely on what the session did:

Read-heavy sessions (lots of large files, logs, or screenshots in context) — expect reductions up to ~50%.
Prose-heavy sessions (little external data ingested) — savings are negligible. There's simply not much bulk to move.
It varies a lot — often a pleasant surprise, and once in a while a touch underwhelming.
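
For a sense of scale, the example quoted earlier (317,236 tokens flattened to 182,287) works out as follows. The helper name `reduction` is just an illustrative choice.

```typescript
// Compute absolute and relative savings for a before/after token count.
function reduction(before: number, after: number): { saved: number; percent: number } {
  const saved = before - after;
  return { saved, percent: (saved / before) * 100 };
}

const example = reduction(317236, 182287);
console.log(`${example.saved} tokens saved (${example.percent.toFixed(1)}%)`);
// prints "134949 tokens saved (42.5%)"
```

That is a little under the ~50% upper end quoted above for read-heavy sessions.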

When to reach for it. A common point is around 200k tokens. For critical sessions where you want the model at its sharpest and most context-aware, flattening around 250k–300k is where the most dramatic reductions tend to show up.

Time it sensibly: just as you wouldn't compact midway through a large reading task, avoid flattening while the model is still working through material it has just read. That said, nothing is ever lost, so flattening everything and then cherry-picking the few blocks you still need is a perfectly legitimate strategy.

Quick start

Requires Node.js ≥ 18 and Claude Code.

One command — installs from npm and registers it user-wide:

claude mcp add flatten -s user -- npx -y flatten-mcp@latest

Or register it manually (in ~/.claude.json, or your project's .mcp.json):

{
  "mcpServers": {
    "flatten": {
      "command": "npx",
      "args": ["-y", "flatten-mcp@latest"]
    }
  }
}

Recommended — install the /flatten slash command:

curl -fsSL https://raw.githubusercontent.com/shayaShav/flatten-mcp/main/commands/flatten.md -o ~/.claude/commands/flatten.md

From source (for development)

git clone https://github.com/shayaShav/flatten-mcp.git
cd flatten-mcp
npm install      # builds automatically via the "prepare" script
cp commands/flatten.md ~/.claude/commands/   # optional: installs the /flatten command

Then register the local build (in ~/.claude.json, or your project's .mcp.json):

{
  "mcpServers": {
    "flatten": {
      "command": "node",
      "args": ["/absolute/path/to/flatten-mcp/dist/index.js"]
    }
  }
}

Configuration

By default the server operates on the project the CLI runs in (its current working directory). Pass project_dir explicitly on any call to target a different project.

| Env var | Required | Purpose |
| --- | --- | --- |
| `ANTHROPIC_API_KEY` | no | If set, token savings are counted exactly via Anthropic's free `count_tokens` endpoint instead of estimated locally. |
| `FLATTEN_COUNT_MODEL` | no | Model id used for the exact token count (default: `claude-haiku-4-5-20251001`). |
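
The choice between an exact count and a local estimate could look roughly like the sketch below. The function names and the four-characters-per-token fallback are assumptions for illustration, not the server's actual code; the URL and headers are those of Anthropic's public token-counting endpoint.

```typescript
type Env = Record<string, string | undefined>;

const DEFAULT_COUNT_MODEL = "claude-haiku-4-5-20251001";

// Local fallback. The ~4 characters per token ratio is a rough assumption.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface CountRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP request for an exact count, or return null when no API key
// is configured and the caller should fall back to estimateTokens().
function buildCountRequest(text: string, env: Env): CountRequest | null {
  const apiKey = env["ANTHROPIC_API_KEY"];
  if (!apiKey) return null;
  const model = env["FLATTEN_COUNT_MODEL"] || DEFAULT_COUNT_MODEL;
  return {
    url: "https://api.anthropic.com/v1/messages/count_tokens",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: text }] }),
  };
}
```

A caller would POST the returned request (for example with `fetch`) and read `input_tokens` from the JSON response, using the local estimate whenever the request is null or fails.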

Usage

CAUTION

Always exit the session you want to flatten with Ctrl-C, then flatten it from a different window. Rewriting a live session's file out from under Claude Code corrupts its in-memory state and bricks the session.

1. Exit the session you want to flatten with Ctrl-C. This is mandatory: a 10-second live-write guard refuses to touch a recently-modified session unless you force it, but exiting is the safe path.
2. In a new Claude Code window, type /flatten latest or /flatten <session-id>, or ask:

   "Flatten the latest session." · or · "Flatten session <session-id>."

   /flatten latest (or bare /flatten) flattens the larger of the two most recent sessions; the smaller, seconds-old one is almost always the window doing the flattening itself, and the session worth flattening is the big one. It never forces past the live-write guard.
3. Resume your original session and send a prompt. When Claude starts outputting text, you'll see the token count drop.
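
The "latest" rule and the live-write guard described in the steps above can be sketched as two small functions. The `SessionInfo` shape is hypothetical, and measuring "larger" by file size in bytes is an assumption; only the 10-second window and the larger-of-the-two-newest rule come from the description above.

```typescript
// Hypothetical summary of a session file on disk.
interface SessionInfo {
  id: string;
  mtimeMs: number; // last-modified time in milliseconds
  sizeBytes: number;
}

const LIVE_WRITE_WINDOW_MS = 10000;

// Of the two most recently modified sessions, pick the larger one.
function pickLatest(sessions: SessionInfo[]): SessionInfo | undefined {
  const newestTwo = [...sessions].sort((a, b) => b.mtimeMs - a.mtimeMs).slice(0, 2);
  return newestTwo.sort((a, b) => b.sizeBytes - a.sizeBytes)[0];
}

// True when the session was modified too recently to touch safely.
// Passing force = true bypasses the guard.
function blockedByLiveWriteGuard(session: SessionInfo, nowMs: number, force = false): boolean {
  return !force && nowMs - session.mtimeMs < LIVE_WRITE_WINDOW_MS;
}
```

In this sketch the fresh, tiny session belonging to the flattening window loses to the big session you just exited, and the guard only trips if that big session was written within the last ten seconds.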

To preview without touching anything, ask for a dry run first. To undo, ask to unflatten the session — every original block is restored to its exact original value.

TIP

Flattening needs no model intelligence — park a second window on a fast, inexpensive model (/model haiku) as a dedicated flattening station and just type /flatten latest.

Tools

| Tool | What it does |
| --- | --- |
| `flatten_session` | Move bulky tool results into a sidecar, leaving `[FLATTENED …]` markers. Crash-safe and reversible. Supports `dry_run`, `min_size`, `force`, and `include_tool_use_result`. |
| `retrieve_flattened` | Fetch one original block back by its id; returns the original text, or re-renders a flattened screenshot as a real image. |
| `unflatten_session` | Reverse a flatten completely: re-inline every block from the sidecar, restoring each flattened result to its exact original value. |
| `prune_flatten_artifacts` | Reclaim disk by deleting leftover `.bak` / `.tmp` files (and, opt-in, sidecars). Defaults to a safe dry run. |
| `list_sessions` | List a project's sessions with branch, message count, size, and first prompt. |
| `search_sessions` | Keyword / branch / date search across past sessions; scans prose, tool I/O, and flatten sidecars so nothing goes dark after flattening. |
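
In Claude Code you would normally just ask for these in plain language, but on the wire each one is a standard MCP `tools/call` request. The sketch below builds such a request for a dry-run flatten. The option names come from the table above; their value types (booleans, a numeric `min_size`) and the example threshold are assumptions.

```typescript
// Option names are from the tool description; the value types are assumed.
interface FlattenArgs {
  dry_run?: boolean;
  min_size?: number;
  force?: boolean;
  include_tool_use_result?: boolean;
}

// Build a JSON-RPC 2.0 request in the shape MCP uses for tool invocation.
function toolCall<A extends object>(id: number, name: string, args: A) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name, arguments: args },
  };
}

const flattenArgs: FlattenArgs = { dry_run: true, min_size: 4096 };
const preview = toolCall(1, "flatten_session", flattenArgs);
console.log(JSON.stringify(preview, null, 2));
```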

When a session is flattened, the model sees compact markers like this in place of the original output:

[FLATTENED id=toolu_01AbC… tool=Read file_path=/src/server.ts | text 48213B/612L | session=2f9c… | retrieve_flattened(id,session) for raw content]

Everything the model needs to fetch the original — the id and the session — is right there in the marker.
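
To make that concrete, here is a small sketch of pulling the two values out of a marker. The regular expressions are inferred from the single example marker above, so treat them as an assumption about the format rather than the official parser; the ids in the usage example are made up.

```typescript
interface MarkerRef {
  id: string;
  session: string;
}

// Extract the id and session from a [FLATTENED …] marker, or return null
// if the text is not a marker or either field is missing.
function parseMarker(marker: string): MarkerRef | null {
  if (!marker.startsWith("[FLATTENED ")) return null;
  const id = /\bid=(\S+)/.exec(marker)?.[1];
  const session = /\bsession=(\S+)/.exec(marker)?.[1];
  return id && session ? { id, session } : null;
}

// Hypothetical ids, for illustration only.
const ref = parseMarker(
  "[FLATTENED id=toolu_EXAMPLE tool=Read file_path=/src/server.ts | text 48213B/612L | session=sess_EXAMPLE | retrieve_flattened(id,session) for raw content]"
);
console.log(ref);
```

The resulting pair is exactly what a `retrieve_flattened` call needs.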

How it works

Sidecar, not deletion. Each extracted block is written verbatim to <session>.flat.jsonl next to the session. The original session file is backed up once to <session>.jsonl.bak before the first rewrite.
Crash-safe. Originals are persisted to the sidecar before they're removed from the session, and the session is rewritten via an atomic temp-file-and-rename, so an interrupted run can never leave a half-written, irreplaceable session file.
Idempotent. Re-running flatten skips already-flattened blocks and never double-writes a sidecar entry.
Lossless & reversible. Text and base64 images are stored exactly as they appeared, so unflatten_session restores each flattened block to its exact original value (byte-identical for Claude Code's canonical JSON). Your prompts and untouched lines were never altered to begin with.
Disk vs. context tokens. Claude Code stores each tool result twice on disk (once in the API message, once in a toolUseResult mirror) and only one copy is ever sent to the model. flatten reports both diskBytesSaved (affects --resume parse speed) and contextTokensSaved out of contextTokensTotal (the number that actually matters for the context window and compaction) — they differ a lot, and the tool is explicit about which is which.
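
The ordering that makes this crash-safe (sidecar first, backup once, then temp file and rename) can be sketched with Node's `fs` module. This is a minimal illustration, not the server's implementation: the sidecar record shape and the `.tmp` file name are assumptions, and a production version would likely also fsync before renaming.

```typescript
import * as fs from "node:fs";

// Hypothetical sidecar record; the real schema is in docs/ARCHITECTURE.md.
interface Extracted {
  id: string;
  content: string;
}

function persistFlatten(sessionPath: string, rewrittenLines: string[], extracted: Extracted[]): void {
  const sidecarPath = sessionPath.replace(/\.jsonl$/, ".flat.jsonl");
  const backupPath = `${sessionPath}.bak`;
  const tmpPath = `${sessionPath}.tmp`; // assumed name

  // 1. Persist originals to the sidecar first, skipping ids already stored.
  const known = new Set<string>();
  if (fs.existsSync(sidecarPath)) {
    for (const line of fs.readFileSync(sidecarPath, "utf8").split("\n")) {
      if (line.trim()) known.add((JSON.parse(line) as Extracted).id);
    }
  }
  for (const block of extracted) {
    if (known.has(block.id)) continue;
    fs.appendFileSync(sidecarPath, JSON.stringify(block) + "\n");
    known.add(block.id);
  }

  // 2. Back up the untouched session exactly once.
  if (!fs.existsSync(backupPath)) fs.copyFileSync(sessionPath, backupPath);

  // 3. Write the new session to a temp file, then swap it in with a rename.
  fs.writeFileSync(tmpPath, rewrittenLines.join("\n") + "\n");
  fs.renameSync(tmpPath, sessionPath);
}
```

If the process dies before step 3 completes, the session file is still the old one and every extracted block is already safe in the sidecar. A second run adds no duplicate sidecar entries and leaves the first backup alone.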

See docs/ARCHITECTURE.md for the session JSONL format, the sidecar schema, and the marker protocol.

Compatibility & roadmap

Claude Code only, for now. flatten-mcp reads Claude Code's session store at ~/.claude/projects/<encoded-project-dir>/*.jsonl. It has been tested against Claude Code exclusively; the paths and the JSONL schema are specific to it, so the server will not work with other agents or LLM CLIs as-is.
Planned — a pluggable session backend. Porting to other agents means abstracting the storage location and the on-disk message format behind a small adapter. Contributions welcome.
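
One possible shape for such an adapter is sketched below, purely as a discussion aid. Nothing here exists in the project: the `SessionBackend` interface, its method names, and the in-memory implementation are all hypothetical.

```typescript
// Hypothetical adapter: hides where sessions live and how they are stored.
interface SessionBackend {
  listSessionIds(projectDir: string): string[];
  readLines(projectDir: string, sessionId: string): string[];
  writeLines(projectDir: string, sessionId: string, lines: string[]): void;
}

// Toy backend that keeps everything in memory, useful for tests.
class InMemoryBackend implements SessionBackend {
  private store = new Map<string, string[]>();

  private key(projectDir: string, sessionId: string): string {
    return `${projectDir}::${sessionId}`;
  }

  listSessionIds(projectDir: string): string[] {
    const prefix = `${projectDir}::`;
    return Array.from(this.store.keys())
      .filter((k) => k.startsWith(prefix))
      .map((k) => k.slice(prefix.length));
  }

  readLines(projectDir: string, sessionId: string): string[] {
    return this.store.get(this.key(projectDir, sessionId)) ?? [];
  }

  writeLines(projectDir: string, sessionId: string, lines: string[]): void {
    this.store.set(this.key(projectDir, sessionId), [...lines]);
  }
}
```

A Claude Code backend would implement the same three methods against the JSONL files on disk, and the flattening logic would depend only on the interface.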

Contributing

Issues and PRs are welcome. To develop locally:

npm install
npm run dev        # tsc --watch
npm run build      # one-off compile to dist/

License
