🕷️ joern-mcp
A Model Context Protocol (MCP) server that provides AI assistants with static code analysis capabilities using Joern's Code Property Graph (CPG) technology.
Features
- Multi-Language Support: Java, C/C++, JavaScript, Python, Go, Kotlin, C#, PHP, Ruby, Swift, plus binaries via Ghidra and JVM bytecode via Jimple
- Docker Isolation: Each analysis session runs in its own isolated Joern container
- GitHub Integration: Analyze repositories directly from GitHub URLs
- Session-Based: Persistent CPG sessions with automatic cleanup
- Redis-Backed: Fast caching and session management
- Async Queries: Non-blocking CPG generation and query execution
Quick Start
Prerequisites
- Python 3.8+
- Docker
- Redis
- Git
Installation
- Clone and install dependencies:
git clone https://github.com/Lekssays/joern-mcp.git
cd joern-mcp
pip install -r requirements.txt
- Setup (builds Joern image and starts Redis):
./setup.sh
- Configure (optional):
cp config.example.yaml config.yaml
# Edit config.yaml as needed
- Run the server:
python main.py
# Server will be available at http://localhost:4242
Integration with GitHub Copilot
The server uses Streamable HTTP transport for network accessibility and supports multiple concurrent clients.
Add to your VS Code settings.json:
{
  "github.copilot.advanced": {
    "mcp": {
      "servers": {
        "joern-mcp": {
          "url": "http://localhost:4242/mcp"
        }
      }
    }
  }
}
Make sure the server is running before using it with Copilot:
python main.py
Available Tools
Core Tools
- create_cpg_session: Initialize analysis session from local path or GitHub URL
- run_cpgql_query: Execute synchronous CPGQL queries with JSON output
- run_cpgql_query_async: Execute asynchronous queries with status tracking
- get_query_status: Check status of asynchronously running queries
- get_query_result: Retrieve results from completed queries
- cleanup_queries: Clean up old completed query results
- get_session_status: Check session state and metadata
- list_sessions: View active sessions with filtering
- close_session: Clean up session resources
- cleanup_all_sessions: Clean up multiple sessions and containers
Code Browsing Tools
- get_codebase_summary: Get high-level overview of codebase (file count, method count, language)
- list_files: List all source files with optional regex filtering
- list_methods: Discover all methods/functions with filtering by name, file, or external status
- get_method_source: Retrieve actual source code for specific methods
- list_calls: Find function call relationships and dependencies
- get_call_graph: Build call graphs (outgoing callees or incoming callers) with configurable depth
- list_parameters: Get detailed parameter information for methods
- find_literals: Search for hardcoded values (strings, numbers, API keys, etc.)
- get_code_snippet: Retrieve code snippets from files with line range
Security Analysis Tools
- find_taint_sources: Locate likely external input points (taint sources)
- find_taint_sinks: Locate dangerous sinks where tainted data could cause vulnerabilities
- find_taint_flows: Find dataflow paths from sources to sinks using Joern dataflow primitives
- find_argument_flows: Find flows where the exact same expression is passed to both source and sink calls
- check_method_reachability: Check if one method can reach another through the call graph
- list_taint_paths: List detailed taint flow paths from sources to sinks
- get_program_slice: Build a program slice from a specific line or call
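The asynchronous core tools are meant to be used together: start a query, poll its status, then fetch the result. The sketch below shows that loop in Python. It is illustrative only: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and the `query_id` field and the `"completed"` / `"failed"` status strings are assumptions about the response shape that this README does not confirm.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: takes a tool name and its arguments, returns the
# tool's JSON response as a dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_query_and_wait(
    call_tool: ToolCaller,
    session_id: str,
    query: str,
    poll_interval: float = 1.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Start an async CPGQL query and poll until it finishes."""
    started = call_tool(
        "run_cpgql_query_async", {"session_id": session_id, "query": query}
    )
    query_id = started["query_id"]  # assumed response field

    for _ in range(max_polls):
        status = call_tool("get_query_status", {"query_id": query_id})["status"]
        if status == "completed":  # assumed status value
            return call_tool("get_query_result", {"query_id": query_id})
        if status == "failed":  # assumed status value
            raise RuntimeError(f"query {query_id} failed")
        time.sleep(poll_interval)

    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

Passing the client in as a plain callable keeps the loop independent of any particular MCP client library.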
Example Usage
# Create session from GitHub
{
  "tool": "create_cpg_session",
  "arguments": {
    "source_type": "github",
    "source_path": "https://github.com/user/repo",
    "language": "java"
  }
}
# Get codebase overview
{
  "tool": "get_codebase_summary",
  "arguments": {
    "session_id": "abc-123-def"
  }
}
# List all methods in the codebase
{
  "tool": "list_methods",
  "arguments": {
    "session_id": "abc-123-def",
    "include_external": false,
    "limit": 50
  }
}
# Get source code for a specific method
{
  "tool": "get_method_source",
  "arguments": {
    "session_id": "abc-123-def",
    "method_name": "authenticate"
  }
}
# Find what methods call a specific function
{
  "tool": "get_call_graph",
  "arguments": {
    "session_id": "abc-123-def",
    "method_name": "execute_query",
    "depth": 2,
    "direction": "incoming"
  }
}
# Search for hardcoded secrets
{
  "tool": "find_literals",
  "arguments": {
    "session_id": "abc-123-def",
    "pattern": "(?i).*(password|secret|api_key).*",
    "limit": 20
  }
}
# Get code snippet from a file
{
  "tool": "get_code_snippet",
  "arguments": {
    "session_id": "abc-123-def",
    "filename": "src/main.c",
    "start_line": 10,
    "end_line": 25
  }
}
# Run custom CPGQL query
{
  "tool": "run_cpgql_query",
  "arguments": {
    "session_id": "abc-123-def",
    "query": "cpg.method.name.l"
  }
}
# Find potential security vulnerabilities
{
  "tool": "find_taint_sources",
  "arguments": {
    "session_id": "abc-123-def",
    "language": "c"
  }
}
# Check for data flows from sources to sinks
{
  "tool": "find_taint_flows",
  "arguments": {
    "session_id": "abc-123-def",
    "source_patterns": ["getenv", "fgets"],
    "sink_patterns": ["system", "sprintf"]
  }
}
# Find argument flows between function calls
{
  "tool": "find_argument_flows",
  "arguments": {
    "session_id": "abc-123-def",
    "source_name": "validate_input",
    "sink_name": "process_data",
    "arg_index": 0
  }
}
# Get detailed taint paths
{
  "tool": "list_taint_paths",
  "arguments": {
    "session_id": "abc-123-def",
    "source_pattern": "getenv",
    "sink_pattern": "system",
    "max_paths": 5
  }
}
# Build program slice for security analysis
{
  "tool": "get_program_slice",
  "arguments": {
    "session_id": "abc-123-def",
    "filename": "main.c",
    "line_number": 42,
    "call_name": "memcpy"
  }
}
Security Analysis Capabilities
The security analysis tools cover three kinds of analysis:
Taint Analysis:
- Source identification: find_taint_sources locates external input points
- Sink identification: find_taint_sinks finds dangerous operations
- Flow analysis: find_taint_flows traces data from sources to sinks
- Argument flow analysis: find_argument_flows finds exact expression reuse between calls
- Path enumeration: list_taint_paths provides detailed propagation chains
Program Slicing:
- Backward slicing: get_program_slice shows all code affecting a specific operation
- Data dependencies: Variable assignments and data flow tracking
- Control dependencies: Conditional statements affecting execution
Reachability Analysis:
- Method connectivity: check_method_reachability verifies call graph connections
- Impact analysis: Understand potential execution paths
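Conceptually, a reachability check is a graph search over call edges. The following sketch runs a breadth-first search over a hand-written call graph. It illustrates the idea only and is not the server's implementation, which queries the CPG through Joern; the `call_edges` mapping and the method names in the usage note are made up for the example.

```python
from collections import deque
from typing import Dict, Iterable


def can_reach(call_edges: Dict[str, Iterable[str]], source: str, target: str) -> bool:
    """Return True if `target` is reachable from `source` via call edges."""
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        # Visit each callee once so recursive or cyclic call chains terminate.
        for callee in call_edges.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False
```

For instance, with edges `{"main": ["run"], "run": ["execute_query"]}`, `can_reach(edges, "main", "execute_query")` is true while the reverse direction is false, which is the kind of answer a reachability query gives.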
Configuration
Key settings in config.yaml:
server:
  host: 0.0.0.0
  port: 4242
  log_level: INFO
redis:
  host: localhost
  port: 6379
sessions:
  ttl: 3600                # Session timeout (seconds)
  max_concurrent: 50       # Max concurrent sessions
cpg:
  generation_timeout: 600  # CPG generation timeout (seconds)
  supported_languages: [java, c, cpp, javascript, python, go, kotlin, csharp, ghidra, jimple, php, ruby, swift]
Environment variables override config file settings (e.g., MCP_HOST, REDIS_HOST, SESSION_TTL).
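As a rough illustration of that precedence, the sketch below layers environment variables over a parsed config dictionary. The three variable names come from the sentence above; which config key each one maps to, and the `apply_env_overrides` helper itself, are assumptions made for the example rather than the server's actual loader.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Assumed mapping: environment variable -> (config section, key, type).
ENV_OVERRIDES = {
    "MCP_HOST": ("server", "host", str),
    "REDIS_HOST": ("redis", "host", str),
    "SESSION_TTL": ("sessions", "ttl", int),
}


def apply_env_overrides(
    config: Dict[str, Dict[str, Any]],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of `config` with any set environment variables applied."""
    env = os.environ if env is None else env
    # Copy each section so the caller's config is left untouched.
    merged = {section: dict(values) for section, values in config.items()}
    for name, (section, key, cast) in ENV_OVERRIDES.items():
        if name in env:
            merged.setdefault(section, {})[key] = cast(env[name])
    return merged
```

Under these assumptions, starting the server with `SESSION_TTL=120` would shorten the session timeout without editing config.yaml.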
Example CPGQL Queries
Find all methods:
cpg.method.name.l
Find hardcoded secrets:
cpg.literal.code("(?i).*(password|secret|api_key).*").l
Find SQL injection risks:
cpg.call.name(".*execute.*").where(_.argument.isLiteral.code(".*SELECT.*")).l
Find complex methods:
cpg.method.filter(_.cyclomaticComplexity > 10).l
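The hardcoded-secrets pattern above can be tried outside Joern before running it against a CPG. The sketch below applies the same regular expression with Python's `re` module, assuming Joern's string filters match the whole value (which is why the pattern is wrapped in `.*`). The sample literals are invented, and Python's regex dialect is close to Java's but not identical.

```python
import re

# Same pattern as the CPGQL example; (?i) makes it case-insensitive.
SECRET_PATTERN = re.compile(r"(?i).*(password|secret|api_key).*")


def looks_like_secret(literal_code: str) -> bool:
    """Whole-string match, mirroring the assumed Joern filter semantics."""
    return SECRET_PATTERN.fullmatch(literal_code) is not None


if __name__ == "__main__":
    for sample in ['"DB_PASSWORD=hunter2"', '"Api_Key"', '"hello world"']:
        print(sample, looks_like_secret(sample))
```

Because `.` does not match newlines by default, multi-line literals would not match this pattern as written.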
Architecture
- FastMCP Server: Built on FastMCP 2.12.4 framework with Streamable HTTP transport
- HTTP Transport: Network-accessible API supporting multiple concurrent clients
- Docker Containers: One isolated Joern container per session
- Redis: Session state and query result caching
- Async Processing: Non-blocking CPG generation
- CPG Caching: Reuse CPGs for identical source/language combinations
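One way to picture the CPG caching item is a cache key derived from the source and language, so that two sessions over the same input resolve to the same stored CPG. The sketch below is a guess at such a scheme, not the project's code: the `cpg_cache_key` helper, the `cpg:` prefix, and the normalization rules are all hypothetical.

```python
import hashlib


def cpg_cache_key(source_type: str, source_path: str, language: str) -> str:
    """Derive a stable key from the inputs that determine a CPG."""
    normalized = "\n".join(
        [
            source_type.strip().lower(),
            source_path.strip().rstrip("/"),  # treat trailing slashes as equal
            language.strip().lower(),
        ]
    )
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return "cpg:" + digest[:16]
```

A key like this could be stored in Redis alongside the path of the generated CPG. Note that it ignores repository revisions, so a real scheme would likely need the commit hash as well.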
Development
Project Structure
joern-mcp/
├── src/
│ ├── services/ # Session, Docker, Git, CPG, Query services
│ ├── tools/ # MCP tool definitions
│ ├── utils/ # Redis, logging, validators
│ └── models.py # Data models
├── playground/ # Test codebases and CPGs
├── main.py # Server entry point
├── config.yaml # Configuration
└── requirements.txt # Dependencies
Running Tests
# Install dev dependencies
pip install -r requirements.txt
# Run tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
Code Quality
# Format
black src/ tests/
isort src/ tests/
# Lint
flake8 src/ tests/
mypy src/
Troubleshooting
Setup issues:
# Re-run setup to rebuild and restart services
./setup.sh
Docker issues:
# Verify Docker is running
docker ps
# Check Joern image
docker images | grep joern
# Check Redis container
docker ps | grep joern-redis
Redis connection issues:
# Test Redis connection
docker exec joern-redis redis-cli ping
# Check Redis logs
docker logs joern-redis
# Restart Redis
docker restart joern-redis
Server connectivity:
# Test server is running
curl http://localhost:4242/health
# Check server logs for errors
python main.py
Loading large projects:
joern:
  binary_path: ${JOERN_BINARY_PATH:joern}
  memory_limit: ${JOERN_MEMORY_LIMIT:16g}
  java_opts: ${JOERN_JAVA_OPTS:-Xmx16G -Xms8G -XX:+UseG1GC -Dfile.encoding=UTF-8}
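The values above appear to use a `${NAME:default}` placeholder form, where the environment variable wins if it is set and the text after the first colon is the fallback. The sketch below shows how such placeholders could be expanded. The `expand_placeholders` helper is hypothetical, and the reading of the syntax is an assumption: under it, the `java_opts` default starts with `-Xmx16G`, whereas a shell-style `:-` reading would drop that leading dash.

```python
import os
import re
from typing import Mapping, Optional

# ${NAME:default}: NAME is an identifier, default is everything up to "}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def expand_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME:default} with env[NAME], or the default if unset."""
    env = os.environ if env is None else env
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)
```

With this interpretation, exporting `JOERN_MEMORY_LIMIT=32g` before starting the server would raise the limit without touching the file.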
Debug logging:
export MCP_LOG_LEVEL=DEBUG
python main.py
Contributing
We welcome contributions! Please see CONTRIBUTING.md for:
- Getting started with development setup
- Code style and quality guidelines
- Testing requirements and best practices
- Submitting changes through pull requests
- Reporting issues and feature requests
- Documentation standards
Quick start for contributors:
git clone https://github.com/YOUR_USERNAME/joern-mcp.git
cd joern-mcp
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./setup.sh
# Create feature branch
git checkout -b feature/your-feature
# Make changes and run tests
pytest && black . && flake8
# Submit pull request
See CONTRIBUTING.md for detailed guidelines.
Acknowledgments
- Joern - Static analysis platform
- FastMCP - MCP framework
- Model Context Protocol - MCP specification
Built with ❤️ in Doha 🇶🇦
Server Config
{
  "mcpServers": {
    "joern-mcp": {
      "url": "http://localhost:4242/mcp"
    }
  }
}