
Joern MCP

Created by Lekssays, 2 months ago

🕷️ joern-mcp

A Model Context Protocol (MCP) server that provides AI assistants with static code analysis capabilities using Joern's Code Property Graph (CPG) technology.

Features

  • Multi-Language Support: Java, C/C++, JavaScript, Python, Go, Kotlin, C#, PHP, Ruby, Swift, plus binaries (via Ghidra) and JVM bytecode (via Jimple)
  • Docker Isolation: Each analysis session runs in its own isolated Docker container
  • GitHub Integration: Analyze repositories directly from GitHub URLs
  • Session-Based: Persistent CPG sessions with automatic cleanup
  • Redis-Backed: Fast caching and session management
  • Async Queries: Non-blocking CPG generation and query execution

Quick Start

Prerequisites

  • Python 3.8+
  • Docker
  • Redis
  • Git

Installation

  1. Clone and install dependencies:
git clone https://github.com/Lekssays/joern-mcp.git
cd joern-mcp
pip install -r requirements.txt
  2. Setup (builds Joern image and starts Redis):
./setup.sh
  3. Configure (optional):
cp config.example.yaml config.yaml
# Edit config.yaml as needed
  4. Run the server:
python main.py
# Server will be available at http://localhost:4242
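
Before running the steps above, it can help to confirm that the prerequisites are present. The following is a minimal sketch, not part of the project: check_prerequisites is a hypothetical helper that only checks the Python version and looks for docker and git on PATH. It assumes Redis does not need a local binary because setup.sh starts it for you.

```python
import shutil
import sys

# Commands the Quick Start invokes directly. Redis is left out on the
# assumption that setup.sh starts it, so no local binary is required.
REQUIRED_COMMANDS = ("docker", "git")


def check_prerequisites(commands=REQUIRED_COMMANDS, min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required")
    for command in commands:
        # shutil.which returns None when the command is not on PATH.
        if shutil.which(command) is None:
            problems.append(f"'{command}' not found on PATH")
    return problems


for problem in check_prerequisites():
    print(problem)
```

An empty output means the checks passed; it does not verify that the Docker daemon is actually running (see Troubleshooting for that).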

Integration with GitHub Copilot

The server uses Streamable HTTP transport for network accessibility and supports multiple concurrent clients.

Add to your VS Code settings.json:

{
  "github.copilot.advanced": {
    "mcp": {
      "servers": {
        "joern-mcp": {
          "url": "http://localhost:4242/mcp"
        }
      }
    }
  }
}

Make sure the server is running before using it with Copilot:

python main.py

Available Tools

Core Tools

  • create_cpg_session: Initialize analysis session from local path or GitHub URL
  • run_cpgql_query: Execute synchronous CPGQL queries with JSON output
  • run_cpgql_query_async: Execute asynchronous queries with status tracking
  • get_query_status: Check status of asynchronously running queries
  • get_query_result: Retrieve results from completed queries
  • cleanup_queries: Clean up old completed query results
  • get_session_status: Check session state and metadata
  • list_sessions: View active sessions with filtering
  • close_session: Clean up session resources
  • cleanup_all_sessions: Clean up multiple sessions and containers
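
The three async tools are meant to be used together: start a query, poll its status, then fetch the result. The sketch below shows one way a client might do that. Everything beyond the tool names is an assumption: call_tool is a hypothetical callable supplied by your MCP client, and the query_id field and the "completed" / "failed" status strings are illustrative guesses, so check the actual tool responses before relying on them.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_query_and_wait(
    call_tool: ToolCaller,
    session_id: str,
    query: str,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> Dict[str, Any]:
    """Start an async CPGQL query and block until it finishes or times out."""
    started = call_tool(
        "run_cpgql_query_async", {"session_id": session_id, "query": query}
    )
    query_id = started["query_id"]  # assumed response field
    deadline = time.monotonic() + timeout

    while True:
        status = call_tool("get_query_status", {"query_id": query_id})["status"]
        if status == "completed":  # assumed status value
            return call_tool("get_query_result", {"query_id": query_id})
        if status == "failed":  # assumed status value
            raise RuntimeError(f"query {query_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {query_id} still {status!r} after {timeout}s")
        time.sleep(poll_interval)
```

Passing the caller in as a parameter keeps the loop independent of any particular MCP client library and makes it easy to exercise with a stub.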

Code Browsing Tools

  • get_codebase_summary: Get high-level overview of codebase (file count, method count, language)
  • list_files: List all source files with optional regex filtering
  • list_methods: Discover all methods/functions with filtering by name, file, or external status
  • get_method_source: Retrieve actual source code for specific methods
  • list_calls: Find function call relationships and dependencies
  • get_call_graph: Build call graphs (outgoing callees or incoming callers) with configurable depth
  • list_parameters: Get detailed parameter information for methods
  • find_literals: Search for hardcoded values (strings, numbers, API keys, etc.)
  • get_code_snippet: Retrieve code snippets from files with line range
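
To make "incoming callers with configurable depth" concrete, here is a toy illustration of what get_call_graph is described as computing. It is not the server's implementation: callers_within_depth is a hypothetical function, and the call graph is a hand-written dictionary rather than a CPG.

```python
from collections import deque
from typing import Dict, List, Set


def callers_within_depth(
    calls: Dict[str, List[str]], target: str, depth: int
) -> Dict[str, int]:
    """Map each method that reaches `target` within `depth` call edges
    to its distance from `target`. `calls` maps caller -> callees."""
    # Invert the edges so we can walk from a callee back to its callers.
    callers_of: Dict[str, Set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers_of.setdefault(callee, set()).add(caller)

    found: Dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:
        method, distance = queue.popleft()
        if distance == depth:
            continue  # do not expand beyond the requested depth
        for caller in sorted(callers_of.get(method, ())):
            if caller != target and caller not in found:
                found[caller] = distance + 1
                queue.append((caller, distance + 1))
    return found


calls = {
    "cron_job": ["main"],
    "main": ["handle_request"],
    "handle_request": ["load_user", "execute_query"],
    "load_user": ["execute_query"],
}
print(callers_within_depth(calls, "execute_query", depth=2))
# {'handle_request': 1, 'load_user': 1, 'main': 2}
```

cron_job is three edges away from execute_query, so it is excluded at depth 2.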

Security Analysis Tools

  • find_taint_sources: Locate likely external input points (taint sources)
  • find_taint_sinks: Locate dangerous sinks where tainted data could cause vulnerabilities
  • find_taint_flows: Find dataflow paths from sources to sinks using Joern dataflow primitives
  • find_argument_flows: Find flows where the exact same expression is passed to both source and sink calls
  • check_method_reachability: Check if one method can reach another through the call graph
  • list_taint_paths: List detailed taint flow paths from sources to sinks
  • get_program_slice: Build a program slice from a specific line or call
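
For readers who want to see what a source-to-sink query can look like in CPGQL, the sketch below builds one from lists of function names using Joern's reachableByFlows step. How find_taint_flows builds its query internally is not documented here, so treat this as an assumption about the general shape; build_taint_flow_query is a hypothetical helper whose output could be passed to run_cpgql_query.

```python
import json
import re
from typing import Iterable


def _name_regex(names: Iterable[str]) -> str:
    """Join function names into a regex alternation, escaping metacharacters."""
    escaped = [re.escape(name) for name in names]
    if not escaped:
        raise ValueError("at least one function name is required")
    return "|".join(escaped)


def build_taint_flow_query(
    source_names: Iterable[str], sink_names: Iterable[str]
) -> str:
    """Build a CPGQL query for flows from source calls into sink arguments."""
    # json.dumps yields a double-quoted literal with backslash escapes,
    # which is also a valid string literal in the Scala-based query language.
    source = f"cpg.call.name({json.dumps(_name_regex(source_names))})"
    sink = f"cpg.call.name({json.dumps(_name_regex(sink_names))}).argument"
    return f"{sink}.reachableByFlows({source}).l"


print(build_taint_flow_query(["getenv", "fgets"], ["system", "sprintf"]))
# cpg.call.name("system|sprintf").argument.reachableByFlows(cpg.call.name("getenv|fgets")).l
```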

Example Usage

# Create session from GitHub
{
  "tool": "create_cpg_session",
  "arguments": {
    "source_type": "github",
    "source_path": "https://github.com/user/repo",
    "language": "java"
  }
}

# Get codebase overview
{
  "tool": "get_codebase_summary",
  "arguments": {
    "session_id": "abc-123-def"
  }
}

# List all methods in the codebase
{
  "tool": "list_methods",
  "arguments": {
    "session_id": "abc-123-def",
    "include_external": false,
    "limit": 50
  }
}

# Get source code for a specific method
{
  "tool": "get_method_source",
  "arguments": {
    "session_id": "abc-123-def",
    "method_name": "authenticate"
  }
}

# Find what methods call a specific function
{
  "tool": "get_call_graph",
  "arguments": {
    "session_id": "abc-123-def",
    "method_name": "execute_query",
    "depth": 2,
    "direction": "incoming"
  }
}

# Search for hardcoded secrets
{
  "tool": "find_literals",
  "arguments": {
    "session_id": "abc-123-def",
    "pattern": "(?i).*(password|secret|api_key).*",
    "limit": 20
  }
}

# Get code snippet from a file
{
  "tool": "get_code_snippet",
  "arguments": {
    "session_id": "abc-123-def",
    "filename": "src/main.c",
    "start_line": 10,
    "end_line": 25
  }
}

# Run custom CPGQL query
{
  "tool": "run_cpgql_query",
  "arguments": {
    "session_id": "abc-123-def",
    "query": "cpg.method.name.l"
  }
}

# Find potential security vulnerabilities
{
  "tool": "find_taint_sources",
  "arguments": {
    "session_id": "abc-123-def",
    "language": "c"
  }
}

# Check for data flows from sources to sinks
{
  "tool": "find_taint_flows",
  "arguments": {
    "session_id": "abc-123-def",
    "source_patterns": ["getenv", "fgets"],
    "sink_patterns": ["system", "sprintf"]
  }
}

# Find argument flows between function calls
{
  "tool": "find_argument_flows",
  "arguments": {
    "session_id": "abc-123-def",
    "source_name": "validate_input",
    "sink_name": "process_data",
    "arg_index": 0
  }
}

# Get detailed taint paths
{
  "tool": "list_taint_paths",
  "arguments": {
    "session_id": "abc-123-def",
    "source_pattern": "getenv",
    "sink_pattern": "system",
    "max_paths": 5
  }
}

# Build program slice for security analysis
{
  "tool": "get_program_slice",
  "arguments": {
    "session_id": "abc-123-def",
    "filename": "main.c",
    "line_number": 42,
    "call_name": "memcpy"
  }
}
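
The examples above use a shorthand with "tool" and "arguments" keys. On the wire, MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call. The sketch below converts the shorthand into that envelope; to_tools_call is a hypothetical helper, and a real client also has to perform the MCP initialization handshake and handle the HTTP transport, which is omitted here.

```python
import itertools
import json
from typing import Any, Dict

# Monotonic request ids; JSON-RPC only requires them to be unique per session.
_request_ids = itertools.count(1)


def to_tools_call(example: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a {"tool": ..., "arguments": ...} example in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": example["tool"],
            "arguments": example["arguments"],
        },
    }


request = to_tools_call(
    {"tool": "get_codebase_summary", "arguments": {"session_id": "abc-123-def"}}
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library builds this envelope for you; the sketch is only meant to show how the shorthand maps onto the protocol.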

Security Analysis Capabilities

The security analysis tools support three kinds of analysis:

Taint Analysis:

  • Source identification: find_taint_sources locates external input points
  • Sink identification: find_taint_sinks finds dangerous operations
  • Flow analysis: find_taint_flows traces data from sources to sinks
  • Argument flow analysis: find_argument_flows finds exact expression reuse between calls
  • Path enumeration: list_taint_paths provides detailed propagation chains

Program Slicing:

  • Backward slicing: get_program_slice shows all code affecting a specific operation
  • Data dependencies: Variable assignments and data flow tracking
  • Control dependencies: Conditional statements affecting execution

Reachability Analysis:

  • Method connectivity: check_method_reachability verifies call graph connections
  • Impact analysis: Understand potential execution paths

Configuration

Key settings in config.yaml:

server:
  host: 0.0.0.0
  port: 4242
  log_level: INFO

redis:
  host: localhost
  port: 6379

sessions:
  ttl: 3600                # Session timeout (seconds)
  max_concurrent: 50       # Max concurrent sessions

cpg:
  generation_timeout: 600  # CPG generation timeout (seconds)
  supported_languages: [java, c, cpp, javascript, python, go, kotlin, csharp, ghidra, jimple, php, ruby, swift]

Environment variables override config file settings (e.g., MCP_HOST, REDIS_HOST, SESSION_TTL).
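
As an illustration of how such overrides typically behave, the sketch below layers environment variables on top of a loaded config dictionary. Only the three variable names come from the text above; the mapping of each variable to a config section and key, and the apply_env_overrides helper itself, are assumptions rather than the project's actual code.

```python
import os
from typing import Any, Dict, Mapping

# Assumed mapping: environment variable -> (section, key, type converter).
ENV_OVERRIDES = {
    "MCP_HOST": ("server", "host", str),
    "REDIS_HOST": ("redis", "host", str),
    "SESSION_TTL": ("sessions", "ttl", int),
}


def apply_env_overrides(
    config: Mapping[str, Mapping[str, Any]],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of `config` with any set environment variables applied."""
    merged = {section: dict(values) for section, values in config.items()}
    for variable, (section, key, convert) in ENV_OVERRIDES.items():
        if variable in environ:
            # Environment values are strings, so convert to the config's type.
            merged.setdefault(section, {})[key] = convert(environ[variable])
    return merged


file_config = {
    "server": {"host": "0.0.0.0", "port": 4242},
    "sessions": {"ttl": 3600},
}
print(apply_env_overrides(file_config, {"SESSION_TTL": "120"}))
# {'server': {'host': '0.0.0.0', 'port': 4242}, 'sessions': {'ttl': 120}}
```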

Example CPGQL Queries

Find all methods:

cpg.method.name.l

Find hardcoded secrets:

cpg.literal.code("(?i).*(password|secret|api_key).*").l

Find SQL injection risks:

cpg.call.name(".*execute.*").where(_.argument.isLiteral.code(".*SELECT.*")).l

Find complex methods:

cpg.method.filter(_.cyclomaticComplexity > 10).l

Architecture

  • FastMCP Server: Built on FastMCP 2.12.4 framework with Streamable HTTP transport
  • HTTP Transport: Network-accessible API supporting multiple concurrent clients
  • Docker Containers: One isolated Joern container per session
  • Redis: Session state and query result caching
  • Async Processing: Non-blocking CPG generation
  • CPG Caching: Reuse CPGs for identical source/language combinations
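
One simple way to implement "reuse CPGs for identical source/language combinations" is to derive a deterministic cache key from the two values. The sketch below is an assumption about how that could work, not the project's code: cpg_cache_key, the "cpg:" prefix, and the normalization rules are all hypothetical, and a real implementation might also need to account for the commit or file contents.

```python
import hashlib


def cpg_cache_key(source: str, language: str) -> str:
    """Derive a stable cache key from a source location and a language."""
    # Normalize trivially different spellings of the same inputs.
    normalized_source = source.strip().rstrip("/")
    normalized_language = language.strip().lower()
    payload = f"{normalized_language}\n{normalized_source}".encode("utf-8")
    return "cpg:" + hashlib.sha256(payload).hexdigest()[:16]


same_a = cpg_cache_key("https://github.com/user/repo", "java")
same_b = cpg_cache_key("https://github.com/user/repo/", "Java")
other = cpg_cache_key("https://github.com/user/repo", "kotlin")
print(same_a == same_b, same_a == other)
# True False
```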

Development

Project Structure

joern-mcp/
├── src/
│   ├── services/       # Session, Docker, Git, CPG, Query services
│   ├── tools/          # MCP tool definitions
│   ├── utils/          # Redis, logging, validators
│   └── models.py       # Data models
├── playground/         # Test codebases and CPGs
├── main.py            # Server entry point
├── config.yaml        # Configuration
└── requirements.txt   # Dependencies

Running Tests

# Install dev dependencies
pip install -r requirements.txt

# Run tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

Code Quality

# Format
black src/ tests/
isort src/ tests/

# Lint
flake8 src/ tests/
mypy src/

Troubleshooting

Setup issues:

# Re-run setup to rebuild and restart services
./setup.sh

Docker issues:

# Verify Docker is running
docker ps

# Check Joern image
docker images | grep joern

# Check Redis container
docker ps | grep joern-redis

Redis connection issues:

# Test Redis connection
docker exec joern-redis redis-cli ping

# Check Redis logs
docker logs joern-redis

# Restart Redis
docker restart joern-redis

Server connectivity:

# Test server is running
curl http://localhost:4242/health

# Check server logs for errors
python main.py   # runs in the foreground; watch its output for errors

Loading large projects (raise Joern's memory limits in config.yaml):

joern:
  binary_path: ${JOERN_BINARY_PATH:joern}
  memory_limit: ${JOERN_MEMORY_LIMIT:16g}
  java_opts: ${JOERN_JAVA_OPTS:-Xmx16G -Xms8G -XX:+UseG1GC -Dfile.encoding=UTF-8}
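
The values above use a ${NAME:default} placeholder form. The sketch below shows one plausible way such placeholders could be resolved, assuming the text after the first colon is the default; the project's actual loader may differ. Under that reading, the default for JOERN_JAVA_OPTS begins with -Xmx16G, and the later colon in -XX:+UseG1GC is part of the default. expand_placeholders is a hypothetical helper.

```python
import re
from typing import Mapping

# ${NAME:default} where the default runs up to the closing brace.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def expand_placeholders(value: str, environ: Mapping[str, str]) -> str:
    """Replace each ${NAME:default} with environ[NAME], or the default if unset."""

    def replace(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        return environ.get(name, default)

    return _PLACEHOLDER.sub(replace, value)


print(expand_placeholders("${JOERN_MEMORY_LIMIT:16g}", {}))
# 16g
print(expand_placeholders("${JOERN_MEMORY_LIMIT:16g}", {"JOERN_MEMORY_LIMIT": "32g"}))
# 32g
```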

Debug logging:

export MCP_LOG_LEVEL=DEBUG
python main.py

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

  • Getting started with development setup
  • Code style and quality guidelines
  • Testing requirements and best practices
  • Submitting changes through pull requests
  • Reporting issues and feature requests
  • Documentation standards

Quick start for contributors:

git clone https://github.com/YOUR_USERNAME/joern-mcp.git
cd joern-mcp
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./setup.sh

# Create feature branch
git checkout -b feature/your-feature

# Make changes and run tests
pytest && black . && flake8

# Submit pull request

See CONTRIBUTING.md for detailed guidelines.

Acknowledgments


Built with ❤️ in Doha 🇶🇦

Server Config

{
  "mcpServers": {
    "joern-mcp": {
      "url": "http://localhost:4242/mcp"
    }
  }
}