
RLM solves Claude Code's biggest limitation: memory loss. Every time you /compact or hit the context limit, your decisions, insights, and conversation history disappear. RLM fixes this with an MCP server that auto-saves before context loss, lets you store key decisions as searchable insights, and keeps your full conversation history in persistent chunks. Features include BM25 ranked search, fuzzy matching (typo-tolerant), multi-project organization, and smart retention that auto-archives old chunks. Inspired by MIT CSAIL's Recursive Language Models paper. Install in 3 lines, zero configuration needed.
Overview

RLM - Infinite Memory for Claude Code

Your Claude Code sessions forget everything after /compact. RLM fixes that.

License: MIT · Python 3.10+ · MCP Server · CI · codecov · PyPI version

Français | English | 日本語


The Problem

Claude Code has a context window limit. When it fills up:

  • /compact wipes your conversation history
  • Previous decisions, insights, and context are lost
  • You repeat yourself. Claude makes the same mistakes. Productivity drops.

The Solution

RLM is an MCP server that gives Claude Code persistent memory across sessions:

You: "Remember that the client prefers 500ml bottles"
     → Saved. Forever. Across all sessions.

You: "What did we decide about the API architecture?"
     → Claude searches its memory and finds the answer.

3 lines to install. 14 tools. Zero configuration.


Quick Install

pip install mcp-rlm-server[all]

Via Git

git clone https://github.com/EncrEor/rlm-claude.git
cd rlm-claude
./install.sh

Restart Claude Code. Done.

Requirements: Python 3.10+, Claude Code CLI

Upgrading from v0.9.0 or earlier

v0.9.1 moved the source code from mcp_server/ to src/mcp_server/ (PyPA best practice). A compatibility symlink is included so existing installations keep working, but we recommend re-running the installer:

cd rlm-claude
git pull
./install.sh          # reconfigures the MCP server path

Your data (~/.claude/rlm/) is untouched. Only the server path is updated.


How It Works

                    ┌─────────────────────────┐
                    │     Claude Code CLI      │
                    └────────────┬────────────┘
                    ┌────────────▼────────────┐
                    │    RLM MCP Server        │
                    │    (14 tools)            │
                    └────────────┬────────────┘
              ┌──────────────────┼──────────────────┐
              │                  │                   │
    ┌─────────▼────────┐ ┌──────▼──────┐ ┌──────────▼─────────┐
    │    Insights       │ │   Chunks    │ │    Retention        │
    │ (key decisions,   │ │ (full conv  │ │ (auto-archive,      │
    │  facts, prefs)    │ │  history)   │ │  restore, purge)    │
    └──────────────────┘ └─────────────┘ └────────────────────┘

Auto-Save Before Context Loss

RLM hooks into Claude Code's /compact event. Before your context is wiped, RLM automatically saves a snapshot. No action needed.
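
The actual implementation lives in hooks/pre_compact_chunk.py. As a rough illustration of the pattern, a minimal PreCompact hook can read the hook payload from stdin and copy the transcript to persistent storage, as in the sketch below; the payload field name, file naming, and storage path here are assumptions, not the real hook's schema.

#!/usr/bin/env python3
# Hypothetical sketch of a PreCompact hook: read the hook payload from stdin,
# grab the transcript that is about to be compacted, and persist a snapshot.
# The "transcript_path" field, file naming, and storage layout are assumptions,
# not the exact schema used by hooks/pre_compact_chunk.py.
import json
import sys
import time
from pathlib import Path

STORE = Path.home() / ".claude" / "rlm" / "context" / "chunks"

def main() -> None:
    payload = json.load(sys.stdin)                 # hook input arrives as JSON on stdin
    transcript = Path(payload.get("transcript_path", ""))
    if not transcript.is_file():
        return                                     # nothing to snapshot
    STORE.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d_%H%M%S")
    snapshot = STORE / f"{stamp}_precompact.md"
    snapshot.write_text(transcript.read_text(encoding="utf-8"), encoding="utf-8")

if __name__ == "__main__":
    main()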

Two Memory Systems

| System   | What it stores                    | How to use                            |
|----------|-----------------------------------|---------------------------------------|
| Insights | Key decisions, facts, preferences | rlm_remember() / rlm_recall()         |
| Chunks   | Full conversation segments        | rlm_chunk() / rlm_peek() / rlm_grep() |

Features

Memory & Insights

  • rlm_remember - Save decisions, facts, preferences with categories and importance levels
  • rlm_recall - Search insights by keyword (multi-word tokenized), category, or importance
  • rlm_forget - Remove an insight
  • rlm_status - System overview (insight count, chunk stats, access metrics)

Conversation History

  • rlm_chunk - Save conversation segments to persistent storage
  • rlm_peek - Read a chunk (full or partial by line range)
  • rlm_grep - Regex search across all chunks, with optional fuzzy matching for typo tolerance (see the sketch after this list)
  • rlm_search - Hybrid search: BM25 + semantic cosine similarity (FR/EN, accent-normalized, chunks + insights)
  • rlm_list_chunks - List all chunks with metadata
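
The fuzzy matching used by rlm_grep is internal to the server; the sketch below only illustrates the general idea of typo-tolerant matching using Python's standard difflib, with the similarity threshold and helper name chosen purely for illustration.

# Illustrative sketch of typo-tolerant matching, not RLM's actual implementation.
# difflib.SequenceMatcher gives a 0..1 similarity ratio between two strings.
from difflib import SequenceMatcher

def fuzzy_match(query: str, text: str, threshold: float = 0.8) -> bool:
    """Return True if any word in `text` is close enough to `query`."""
    q = query.lower()
    return any(
        SequenceMatcher(None, q, word.lower()).ratio() >= threshold
        for word in text.split()
    )

# A typo or FR spelling such as "authentification" still matches "authentication"
print(fuzzy_match("authentification", "JWT authentication middleware"))  # True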

Multi-Project Organization

  • rlm_sessions - Browse sessions by project or domain
  • rlm_domains - List available domains for categorization
  • Auto-detection of project from git or working directory (see the sketch after this list)
  • Cross-project filtering on all search tools
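
RLM's exact detection logic is not documented here; the common approach (use the git repository name when inside a repo, otherwise the working directory name) looks roughly like the sketch below, where the function name is illustrative.

# Illustrative sketch of project auto-detection (git repo name first, cwd as
# fallback). RLM's actual detection logic may differ.
import subprocess
from pathlib import Path

def detect_project() -> str:
    try:
        top = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        return Path(top).name           # repository directory name
    except (subprocess.CalledProcessError, FileNotFoundError):
        return Path.cwd().name          # not a git repo: fall back to cwd

print(detect_project())                 # e.g. "rlm-claude"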

Smart Retention

  • rlm_retention_preview - Preview what would be archived (dry-run)
  • rlm_retention_run - Archive old unused chunks, purge ancient ones
  • rlm_restore - Bring back archived chunks
  • 3-zone lifecycle: Active → Archive (.gz) → Purge
  • Immunity system: critical tags, frequent access, and keywords protect chunks (a simplified decision sketch follows this list)
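
The real thresholds and immunity rules live in retention.py; the sketch below is only a hypothetical illustration of how a 3-zone decision with immunity might look, with every cut-off, tag name, and field name assumed for the example.

# Hypothetical sketch of the 3-zone retention decision (Active -> Archive -> Purge).
# The day thresholds, the "critical" tag, and the access-count immunity are
# illustrative assumptions, not RLM's real configuration.
from dataclasses import dataclass, field

ARCHIVE_AFTER_DAYS = 30     # assumed cut-offs
PURGE_AFTER_DAYS = 180

@dataclass
class Chunk:
    chunk_id: str
    age_days: int
    access_count: int = 0
    tags: list[str] = field(default_factory=list)

def is_immune(chunk: Chunk) -> bool:
    # Immunity: critical tag or frequently accessed chunks are never touched.
    return "critical" in chunk.tags or chunk.access_count >= 5

def retention_action(chunk: Chunk) -> str:
    if is_immune(chunk):
        return "keep"
    if chunk.age_days >= PURGE_AFTER_DAYS:
        return "purge"
    if chunk.age_days >= ARCHIVE_AFTER_DAYS:
        return "archive"    # compressed to .gz, restorable via rlm_restore
    return "keep"

print(retention_action(Chunk("2026-01-18_MyProject_001", age_days=45)))  # archive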

Auto-Chunking (Hooks)

  • PreCompact hook: Automatic snapshot before /compact or auto-compact
  • PostToolUse hook: Stats tracking after chunk operations
  • User-driven philosophy: you decide when to chunk, the system saves before loss

Semantic Search (optional)

  • Hybrid BM25 + cosine - Combines keyword matching with vector similarity for better relevance (see the fusion sketch after this list)
  • Auto-embedding - New chunks are automatically embedded at creation time
  • Two providers - Model2Vec (fast, 256d) or FastEmbed (accurate, 384d)
  • Graceful degradation - Falls back to pure BM25 when semantic deps are not installed
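
The precise fusion formula used by rlm_search is not spelled out above; a generic weighted fusion of normalized BM25 and cosine scores, with the 50/50 weighting and min-max normalization as assumptions, could look like this:

# Generic sketch of hybrid score fusion (BM25 + cosine similarity). The weights
# and min-max normalization are assumptions; RLM's search.py may fuse differently.
def minmax(scores: dict[str, float]) -> dict[str, float]:
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(bm25: dict[str, float], cosine: dict[str, float],
         alpha: float = 0.5) -> list[tuple[str, float]]:
    b, c = minmax(bm25), minmax(cosine)
    docs = set(b) | set(c)
    fused = {d: alpha * b.get(d, 0.0) + (1 - alpha) * c.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# A keyword-only hit (chunk_A) and a semantic-only hit (chunk_C) both survive fusion.
print(fuse({"chunk_A": 7.2, "chunk_B": 1.1}, {"chunk_B": 0.83, "chunk_C": 0.61}))

A fusion of this shape is what lets keyword-only and semantic-only hits both surface in the final ranking, which is the sense in which the hybrid approach compensates for each provider's weakness.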

Provider comparison (benchmark on 108 chunks)

|                  | Model2Vec (default)      | FastEmbed                             |
|------------------|--------------------------|---------------------------------------|
| Model            | potion-multilingual-128M | paraphrase-multilingual-MiniLM-L12-v2 |
| Dimensions       | 256                      | 384                                   |
| Embed 108 chunks | 0.06s                    | 1.30s                                 |
| Search latency   | 0.1ms/query              | 1.5ms/query                           |
| Memory           | 0.1 MB                   | 0.3 MB                                |
| Disk (model)     | ~35 MB                   | ~230 MB                               |
| Semantic quality | Good (keyword-biased)    | Better (true semantic)                |
| Speed            | 21x faster               | Baseline                              |

Top-5 result overlap between providers: ~1.6/5 (different results in 7/8 queries). FastEmbed captures more semantic meaning while Model2Vec leans toward keyword similarity. The hybrid BM25 + cosine fusion compensates for both weaknesses.

Recommendation: Start with Model2Vec (default). Switch to FastEmbed only if you need better semantic accuracy and can afford the slower startup.

# Model2Vec (default) — fast, ~35 MB
pip install mcp-rlm-server[semantic]

# FastEmbed — more accurate, ~230 MB, slower
pip install mcp-rlm-server[semantic-fastembed]
export RLM_EMBEDDING_PROVIDER=fastembed

# Compare both providers on your data
python3 scripts/benchmark_providers.py

# Backfill existing chunks (run once after install)
python3 scripts/backfill_embeddings.py

Sub-Agent Skills

  • /rlm-analyze - Analyze a single chunk with an isolated sub-agent
  • /rlm-parallel - Analyze multiple chunks in parallel (Map-Reduce pattern from MIT RLM paper)
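
Conceptually, /rlm-parallel applies the Map-Reduce pattern from the RLM paper: each chunk is analyzed by an isolated sub-agent (map), then the partial findings are merged (reduce). The sketch below shows only that control flow; analyze_chunk is a hypothetical stand-in for a sub-agent call, not an RLM API.

# Conceptual sketch of the Map-Reduce analysis pattern behind /rlm-parallel.
# `analyze_chunk` is a hypothetical stand-in for an isolated sub-agent call.
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(chunk_id: str, question: str) -> str:
    # In RLM this would dispatch an isolated sub-agent over one chunk.
    return f"[{chunk_id}] findings about: {question}"

def parallel_analyze(chunk_ids: list[str], question: str) -> str:
    with ThreadPoolExecutor() as pool:            # map: one analysis per chunk
        partials = list(pool.map(lambda c: analyze_chunk(c, question), chunk_ids))
    return "\n".join(partials)                    # reduce: merge the partial answers

print(parallel_analyze(["2026-01-18_MyProject_001", "2026-01-18_MyProject_002"],
                       "What did we decide about the API?"))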

Comparison

| Feature                          | Raw Context | Letta/MemGPT     | RLM         |
|----------------------------------|-------------|------------------|-------------|
| Persistent memory                | No          | Yes              | Yes         |
| Works with Claude Code           | N/A         | No (own runtime) | Native MCP  |
| Auto-save before compact         | No          | N/A              | Yes (hooks) |
| Search (regex + BM25 + semantic) | No          | Basic            | Yes         |
| Fuzzy search (typo-tolerant)     | No          | No               | Yes         |
| Multi-project support            | No          | No               | Yes         |
| Smart retention (archive/purge)  | No          | Basic            | Yes         |
| Sub-agent analysis               | No          | No               | Yes         |
| Zero config install              | N/A         | Complex          | 3 lines     |
| FR/EN support                    | N/A         | EN only          | Both        |
| Cost                             | Free        | Self-hosted      | Free        |

Usage Examples

Save and recall insights

# Save a key decision
rlm_remember("Backend is the source of truth for all data",
             category="decision", importance="high",
             tags="architecture,backend")

# Find it later
rlm_recall(query="source of truth")
rlm_recall(category="decision")

Manage conversation history

# Save important discussion
rlm_chunk("Discussion about API redesign... [long content]",
          summary="API v2 architecture decisions",
          tags="api,architecture")

# Search across all history
rlm_search("API architecture decisions")      # BM25 ranked
rlm_grep("authentication", fuzzy=True)         # Typo-tolerant

# Read a specific chunk
rlm_peek("2026-01-18_MyProject_001")

Multi-project organization

# Filter by project
rlm_search("deployment issues", project="MyApp")
rlm_grep("database", project="MyApp", domain="infra")

# Browse sessions
rlm_sessions(project="MyApp")

Project Structure

rlm-claude/
├── src/mcp_server/
│   ├── server.py              # MCP server (14 tools)
│   └── tools/
│       ├── memory.py          # Insights (remember/recall/forget)
│       ├── navigation.py      # Chunks (chunk/peek/grep/list)
│       ├── search.py          # BM25 search engine
│       ├── tokenizer_fr.py    # FR/EN tokenization
│       ├── sessions.py        # Multi-session management
│       ├── retention.py       # Archive/restore/purge lifecycle
│       ├── embeddings.py      # Embedding providers (Model2Vec, FastEmbed)
│       ├── vecstore.py        # Vector store (.npz) for semantic search
│       └── fileutil.py        # Safe I/O (atomic writes, path validation, locking)
├── hooks/                     # Claude Code hooks
│   ├── pre_compact_chunk.py   # Auto-save before /compact (PreCompact hook)
│   └── reset_chunk_counter.py # Stats reset after chunk (PostToolUse hook)
├── templates/
│   ├── hooks_settings.json    # Hook config template
│   ├── CLAUDE_RLM_SNIPPET.md  # CLAUDE.md instructions
│   └── skills/                # Sub-agent skills
├── context/                   # Storage (created at install, git-ignored)
│   ├── session_memory.json    # Insights
│   ├── index.json             # Chunk index
│   ├── chunks/                # Conversation history
│   ├── archive/               # Compressed archives (.gz)
│   ├── embeddings.npz         # Semantic vectors (Phase 8)
│   └── sessions.json          # Session index
├── install.sh                 # One-command installer
└── README.md
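
The on-disk schema of these files is not documented above. Purely as a hypothetical illustration, one entry in context/session_memory.json might carry fields along these lines, inferred from rlm_remember's parameters (every field name below is an assumption):

# Hypothetical shape of a single insight entry in context/session_memory.json.
# Field names are assumptions inferred from rlm_remember's parameters, not the
# actual schema used by RLM.
example_insight = {
    "text": "Backend is the source of truth for all data",
    "category": "decision",
    "importance": "high",
    "tags": ["architecture", "backend"],
    "created_at": "2026-01-18T10:42:00Z",
}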

Configuration

Hook Configuration

The installer automatically configures hooks in ~/.claude/settings.json:

{
  "hooks": {
    "PreCompact": [
      {
        "matcher": "manual",
        "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/pre_compact_chunk.py" }]
      },
      {
        "matcher": "auto",
        "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/pre_compact_chunk.py" }]
      }
    ],
    "PostToolUse": [{
      "matcher": "mcp__rlm-server__rlm_chunk",
      "hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/reset_chunk_counter.py" }]
    }]
  }
}

Custom Domains

Organize chunks by topic with custom domains:

{
  "domains": {
    "my_project": {
      "description": "Domains for my project",
      "list": ["feature", "bugfix", "infra", "docs"]
    }
  }
}

Edit context/domains.json after installation.


Manual Installation

If you prefer to install manually:

pip install -e ".[all]"
claude mcp add rlm-server -- python3 -m mcp_server
mkdir -p ~/.claude/rlm/hooks
cp hooks/*.py ~/.claude/rlm/hooks/
chmod +x ~/.claude/rlm/hooks/*.py
mkdir -p ~/.claude/skills/rlm-analyze ~/.claude/skills/rlm-parallel
cp templates/skills/rlm-analyze/skill.md ~/.claude/skills/rlm-analyze/
cp templates/skills/rlm-parallel/skill.md ~/.claude/skills/rlm-parallel/

Then configure hooks in ~/.claude/settings.json (see above).

Uninstall

./uninstall.sh              # Interactive (choose to keep or delete data)
./uninstall.sh --keep-data  # Remove RLM config, keep your chunks/insights
./uninstall.sh --all        # Remove everything
./uninstall.sh --dry-run    # Preview what would be removed

Security

RLM includes built-in protections for safe operation:

  • Path traversal prevention - Chunk IDs are validated against a strict allowlist ([a-zA-Z0-9_.-&]), and resolved paths are verified to stay within the storage directory
  • Atomic writes - All JSON and chunk files are written using write-to-temp-then-rename, preventing corruption from interrupted writes or crashes
  • File locking - Concurrent read-modify-write operations on shared indexes use fcntl.flock exclusive locks
  • Content size limits - Chunks are limited to 2 MB, and gzip decompression (archive restore) is capped at 10 MB to prevent resource exhaustion
  • SHA-256 hashing - Content deduplication uses SHA-256 (not MD5)

All I/O safety primitives are centralized in mcp_server/tools/fileutil.py.
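
As a rough illustration of two of these primitives, the sketch below shows the generic write-to-temp-then-rename and fcntl.flock patterns; the function names are illustrative and do not mirror fileutil.py's actual API.

# Generic sketch of atomic writes and exclusive locking; function names are
# illustrative and do not mirror fileutil.py's actual API.
import fcntl
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: dict) -> None:
    # Write to a temp file in the same directory, then rename: readers never
    # see a half-written file, even if the process crashes mid-write.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)           # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise

def locked_update(index_path: Path, update) -> None:
    # Exclusive flock around a read-modify-write of a shared index file.
    with open(index_path, "r+", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        data = json.load(f)
        update(data)
        f.seek(0)
        json.dump(data, f, ensure_ascii=False, indent=2)
        f.truncate()
        fcntl.flock(f, fcntl.LOCK_UN)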


Troubleshooting

"MCP server not found"

claude mcp list                    # Check servers
claude mcp remove rlm-server       # Remove if exists
claude mcp add rlm-server -- python3 -m mcp_server

"Hooks not working"

cat ~/.claude/settings.json | grep -A 10 "PreCompact"  # Verify hooks config
ls ~/.claude/rlm/hooks/                                  # Check installed hooks

Roadmap

  • Phase 1: Memory tools (remember/recall/forget/status)
  • Phase 2: Navigation tools (chunk/peek/grep/list)
  • Phase 3: Auto-chunking + sub-agent skills
  • Phase 4: Production (auto-summary, dedup, access tracking)
  • Phase 5: Advanced (BM25 search, fuzzy grep, multi-sessions, retention)
  • Phase 6: Production-ready (tests, CI/CD, PyPI)
  • Phase 7: MAGMA-inspired (temporal filtering, entity extraction)
  • Phase 8: Hybrid semantic search (BM25 + cosine, Model2Vec)

Inspired By

Research Papers

  • RLM Paper (MIT CSAIL) - Zhang et al., Dec 2025 - "Recursive Language Models" — foundational architecture (chunk/peek/grep, sub-agent analysis)
  • MAGMA (arXiv:2601.03236) - Jan 2026 - "Memory-Augmented Generation with Memory Agents" — temporal filtering, entity extraction (Phase 7)

Libraries & Tools

  • Model2Vec - Static word embeddings for fast semantic search (Phase 8)
  • BM25S - Fast BM25 implementation in pure Python (Phase 5)
  • FastEmbed - ONNX-based embeddings, optional provider (Phase 8)
  • Letta/MemGPT - AI agent memory framework — early inspiration

Standards & Platform

  • Model Context Protocol (MCP) - the open protocol the RLM server implements
  • Claude Code - the CLI that RLM extends with persistent memory

Authors

  • Ahmed MAKNI (@EncrEor)
  • Claude Opus 4.5 (joint R&D)

License

MIT License - see LICENSE

Server Config

{
  "mcpServers": {
    "rlm-server": {
      "command": "python3",
      "args": [
        "/path/to/rlm-claude/mcp_server/server.py"
      ]
    }
  }
}