
Code-Index-MCP (Local-first Code Indexer)

Created by ViperJuice, 7 months ago
Code indexing MCP server to provide context to coding agents.

Modular, extensible local-first code indexer designed to enhance Claude Code and other LLMs with deep code understanding capabilities. Built on the Model Context Protocol (MCP) for seamless integration with AI assistants.

🎯 Key Features

  • 🚀 Local-First Architecture: All indexing happens locally for speed and privacy
  • 🔌 Plugin-Based Design: Easily extensible with language-specific plugins
  • 🔍 Multi-Language Support: Python, C/C++, JavaScript, Dart, HTML/CSS
  • ⚡ Real-Time Updates: File system monitoring for instant index updates
  • 🧠 Semantic Search: AI-powered code search with Voyage AI embeddings
  • 📊 Rich Code Intelligence: Symbol resolution, type inference, dependency tracking

🏗️ Architecture

Code-Index-MCP follows a modular, plugin-based architecture designed for extensibility and performance:

System Layers

  1. 🌐 System Context (Level 1)

    • Developer interacts with Claude Code or other LLMs
    • MCP protocol provides standardized tool interface
    • Local-first processing with optional cloud features
    • Performance targets (p95): <100ms symbol lookup, <500ms search
  2. 📦 Container Architecture (Level 2)

    ┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
    │   API Gateway   │────▶│  Dispatcher  │────▶│   Plugins   │
    │   (FastAPI)     │     │              │     │ (Language)  │
    └─────────────────┘     └──────────────┘     └─────────────┘
           │                        │                     │
           ▼                        ▼                     ▼
    ┌─────────────────┐     ┌──────────────┐     ┌─────────────┐
    │  Local Index    │     │ File Watcher │     │  Embedding  │
    │  (SQLite+FTS5)  │     │  (Watchdog)  │     │   Service   │
    └─────────────────┘     └──────────────┘     └─────────────┘
    
  3. 🔧 Component Details (Level 3)

    • Gateway Controller: RESTful API endpoints
    • Dispatcher Core: Plugin routing and lifecycle
    • Plugin Base: Standard interface for all plugins
    • Language Plugins: Specialized parsers and analyzers
    • Index Manager: SQLite with FTS5 for fast searches
    • Watcher Service: Real-time file monitoring
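
To make the Index Manager's role concrete, here is a minimal sketch of full-text search over symbols using SQLite's FTS5 extension from Python's standard library. The table name `symbols`, its columns, and the helper names `build_index` and `search` are assumptions for illustration, not the project's actual schema, and the sketch assumes the local SQLite build ships with FTS5 enabled.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table and load (path, symbol, body) rows.

    The table and column names are hypothetical, chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(path, symbol, body)")
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", rows)
    return conn


def search(conn, query, limit=10):
    """Return (path, symbol) pairs matching an FTS5 query, best match first."""
    cursor = conn.execute(
        "SELECT path, symbol FROM symbols "
        "WHERE symbols MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    index = build_index([
        ("a.py", "parse_file", "def parse_file(path): return open(path).read()"),
        ("b.py", "render", "def render(template): return template"),
    ])
    print(search(index, "parse"))
```

The default FTS5 tokenizer splits on non-alphanumeric characters, so `parse_file` is indexed as the tokens `parse` and `file`. Raw user input should be validated before it is passed to `MATCH`, because FTS5 query strings have their own syntax.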

🛠️ Language Support

Currently Supported Languages

| Language | Parser | Features | Status |
| --- | --- | --- | --- |
| Python | Tree-sitter + Jedi | Type inference, import resolution, docstrings | ✅ Fully Implemented |
| C | Tree-sitter | Preprocessor, headers, symbols | ✅ Fully Implemented |
| C++ | Tree-sitter | Templates, namespaces, classes | ✅ Fully Implemented |
| JavaScript/TypeScript | Tree-sitter | ES6+, modules, async/await, TypeScript support | ✅ Fully Implemented |
| Dart | Regex-based | Classes, functions, variables | ✅ Fully Implemented |
| HTML/CSS | Tree-sitter | Selectors, media queries, custom properties | ✅ Fully Implemented |

Implementation Status: 95% Complete - All 6 language plugins operational with comprehensive testing framework.

Planned Languages

  • Rust, Go, Ruby, Swift, Kotlin, Java, TypeScript (dedicated plugin; the table above lists current TypeScript support under the JavaScript plugin)

🚀 Quickstart

Prerequisites

  • Python 3.8+
  • Git
  • Docker (optional, for architecture diagrams)

Installation

  1. Clone the repository

    git clone https://github.com/yourusername/Code-Index-MCP.git
    cd Code-Index-MCP
    
  2. Install dependencies

    # Create virtual environment
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
    # Install requirements
    pip install -r requirements.txt
    
  3. Start the server

    # Start the MCP server
    uvicorn mcp_server.gateway:app --reload --host 0.0.0.0 --port 8000
    
  4. Test the API

    # Check server status
    curl http://localhost:8000/status
    
    # Search for code
    # (GET with URL-encoded query parameters, as in the API Reference)
    curl -G http://localhost:8000/search \
      --data-urlencode "query=def parse"
    

🔧 Configuration

Create a .env file for configuration:

# Optional: Voyage AI for semantic search
VOYAGE_AI_API_KEY=your_api_key_here

# Server settings
MCP_SERVER_HOST=0.0.0.0
MCP_SERVER_PORT=8000
MCP_LOG_LEVEL=INFO

# Workspace settings
MCP_WORKSPACE_ROOT=.
MCP_MAX_FILE_SIZE=10485760  # 10MB
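
As a rough illustration of how a server might consume these variables, the sketch below reads them from a mapping (by default `os.environ`) into a typed settings object. The `Settings` class, the `load_settings` helper, and the fallback defaults (copied from the example values above) are assumptions for this sketch, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of the variables shown in the .env example."""
    host: str
    port: int
    log_level: str
    workspace_root: str
    max_file_size: int
    voyage_api_key: Optional[str]


def _as_int(raw: str) -> int:
    # Tolerate an inline comment such as "10485760  # 10MB".
    return int(raw.split("#", 1)[0].strip())


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from env, falling back to the example values."""
    env = os.environ if env is None else env
    return Settings(
        host=env.get("MCP_SERVER_HOST", "0.0.0.0"),
        port=_as_int(env.get("MCP_SERVER_PORT", "8000")),
        log_level=env.get("MCP_LOG_LEVEL", "INFO"),
        workspace_root=env.get("MCP_WORKSPACE_ROOT", "."),
        max_file_size=_as_int(env.get("MCP_MAX_FILE_SIZE", "10485760")),
        voyage_api_key=env.get("VOYAGE_AI_API_KEY") or None,
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing an explicit mapping instead of relying on `os.environ` keeps the loader easy to test and leaves the optional Voyage AI key as `None` when semantic search is not configured.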

💻 Development

Creating a New Language Plugin

  1. Create plugin structure

    mkdir -p mcp_server/plugins/my_language_plugin
    cd mcp_server/plugins/my_language_plugin
    touch __init__.py plugin.py
    
  2. Implement the plugin interface

    from typing import Dict, List

    from mcp_server.plugin_base import PluginBase

    class MyLanguagePlugin(PluginBase):
        def __init__(self):
            # Name of the Tree-sitter grammar this plugin uses
            self.tree_sitter_language = "my_language"

        def index(self, file_path: str) -> Dict:
            # Parse and index the file
            raise NotImplementedError

        def getDefinition(self, symbol: str, context: Dict) -> Dict:
            # Find symbol definition
            raise NotImplementedError

        def getReferences(self, symbol: str, context: Dict) -> List[Dict]:
            # Find symbol references
            raise NotImplementedError
    
  3. Register the plugin

    # In dispatcher.py
    from .plugins.my_language_plugin import MyLanguagePlugin
    
    self.plugins['my_language'] = MyLanguagePlugin()
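
The registration step above can be pictured with a small, self-contained dispatcher sketch. It keeps a `plugins` dictionary keyed by language, as in the snippet above, and adds a file-extension lookup so that each file is routed to the right plugin. The `Dispatcher` and `EchoPlugin` classes and the `register` and `plugin_for` methods are hypothetical names for illustration; the project's real dispatcher may route differently.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


class EchoPlugin:
    """Hypothetical stand-in for a language plugin; it only reports what it saw."""

    def __init__(self, language: str) -> None:
        self.language = language

    def index(self, file_path: str) -> Dict:
        return {"language": self.language, "path": file_path}


class Dispatcher:
    """Routes files to plugins by extension (an assumed routing rule)."""

    def __init__(self) -> None:
        self.plugins: Dict[str, EchoPlugin] = {}
        self._language_by_extension: Dict[str, str] = {}

    def register(self, language: str, plugin: EchoPlugin,
                 extensions: Iterable[str]) -> None:
        self.plugins[language] = plugin
        for extension in extensions:
            self._language_by_extension[extension.lower()] = language

    def plugin_for(self, file_path: str) -> Optional[EchoPlugin]:
        language = self._language_by_extension.get(Path(file_path).suffix.lower())
        return self.plugins.get(language) if language else None

    def index(self, file_path: str) -> Optional[Dict]:
        plugin = self.plugin_for(file_path)
        # Files with no registered plugin are skipped rather than raising.
        return plugin.index(file_path) if plugin else None


if __name__ == "__main__":
    dispatcher = Dispatcher()
    dispatcher.register("python", EchoPlugin("python"), [".py"])
    print(dispatcher.index("src/app.py"))
```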
    

Running Tests

# Run all tests
pytest

# Run specific test
pytest test_python_plugin.py

# Run with coverage
pytest --cov=mcp_server --cov-report=html

Architecture Visualization

# View C4 architecture diagrams
docker run --rm -p 8080:8080 \
  -v "$(pwd)/architecture":/usr/local/structurizr \
  structurizr/lite

# Open http://localhost:8080 in your browser

📚 API Reference

Core Endpoints

GET /symbol

Get symbol definition

GET /symbol?symbol_name=parseFile&file_path=/path/to/file.py

Query parameters:

  • symbol_name (required): Name of the symbol to find
  • file_path (optional): Specific file to search in

GET /search

Search for code patterns

GET /search?query=async+def.*parse&file_extensions=.py,.js

Query parameters:

  • query (required): Search pattern (regex supported)
  • file_extensions (optional): Comma-separated list of extensions

Response Format

All API responses follow a consistent JSON structure:

Success Response:

{
  "status": "success",
  "data": { ... },
  "timestamp": "2024-01-01T00:00:00Z"
}

Error Response:

{
  "status": "error",
  "error": "Error message",
  "code": "ERROR_CODE",
  "timestamp": "2024-01-01T00:00:00Z"
}
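
A small sketch of how a client or server might build and consume this envelope follows. The helper names (`success`, `error`, `unwrap`), the `ApiError` exception, and the `SYMBOL_NOT_FOUND` code are hypothetical; only the field names and the timestamp shape come from the examples above.

```python
from datetime import datetime, timezone
from typing import Any, Dict


class ApiError(Exception):
    """Raised by unwrap() when the envelope reports an error."""

    def __init__(self, message: str, code: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def _timestamp() -> str:
    # UTC in the same shape as the examples, e.g. 2024-01-01T00:00:00Z
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def success(data: Any) -> Dict[str, Any]:
    return {"status": "success", "data": data, "timestamp": _timestamp()}


def error(message: str, code: str) -> Dict[str, Any]:
    return {"status": "error", "error": message, "code": code,
            "timestamp": _timestamp()}


def unwrap(response: Dict[str, Any]) -> Any:
    """Return the payload of a success envelope or raise ApiError."""
    if response.get("status") == "success":
        return response.get("data")
    raise ApiError(response.get("error", "unknown error"),
                   response.get("code", "UNKNOWN"))


if __name__ == "__main__":
    print(unwrap(success({"symbol": "parseFile"})))
    print(error("Symbol not found", "SYMBOL_NOT_FOUND"))
```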

🚢 Deployment

Docker Deployment Options

The project includes multiple Docker configurations for different environments:

Development (Default):

# Uses docker-compose.yml + Dockerfile
docker-compose up -d
# - SQLite database
# - Uvicorn development server  
# - Volume mounts for code changes
# - Debug logging enabled

Production:

# Uses docker-compose.production.yml + Dockerfile.production
docker-compose -f docker-compose.production.yml up -d
# - PostgreSQL database
# - Gunicorn + Uvicorn workers
# - Multi-stage optimized builds
# - Security hardening (non-root user)
# - Production logging

Enhanced Development:

# Uses both compose files with development overrides
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d
# - Development base + enhanced debugging
# - Source code volume mounting
# - Read-write code access

Container Restart Behavior

Important: By default, docker-compose restart uses the DEVELOPMENT configuration:

  • docker-compose restart → Uses docker-compose.yml (Development)
  • docker-compose -f docker-compose.production.yml restart → Uses Production

Production Deployment

For production environments, we provide:

  1. Multi-stage Docker builds with security hardening
  2. PostgreSQL database with async support
  3. Redis caching for performance optimization
  4. Qdrant vector database for semantic search
  5. Prometheus + Grafana monitoring stack
  6. Kubernetes manifests in k8s/ directory
  7. nginx reverse proxy configuration

See our Deployment Guide for detailed instructions including:

  • Kubernetes deployment configurations
  • Auto-scaling setup
  • Database optimization
  • Security best practices
  • Monitoring and observability

System Requirements

  • Minimum: 2GB RAM, 2 CPU cores, 10GB storage
  • Recommended: 8GB RAM, 4 CPU cores, 50GB SSD storage
  • Large codebases: 16GB+ RAM, 8+ CPU cores, 100GB+ SSD storage

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Process

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests (aim for 90%+ coverage)
  5. Update documentation
  6. Submit a pull request

Code Style

  • Follow PEP 8 for Python code
  • Use type hints for all functions
  • Write descriptive docstrings
  • Keep functions small and focused

📈 Performance

Benchmarks

| Operation | Performance Target | Current Status |
| --- | --- | --- |
| Symbol Lookup | <100ms (p95) | ✅ Implemented, pending benchmark results |
| Code Search | <500ms (p95) | ✅ Implemented, pending benchmark results |
| File Indexing | 10K files/min | ✅ Implemented, pending benchmark results |
| Memory Usage | <2GB for 100K files | ✅ Implemented, pending benchmark results |

Note: All core functionality is implemented (95% complete). Performance benchmarking framework exists but results need to be published.
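
Until official results are published, a p95 target such as "<100ms" can be checked locally with a few lines of Python. This is a generic measurement sketch, not the project's benchmarking framework; the function names are hypothetical and the percentile uses the nearest-rank definition.

```python
import time
from typing import Callable, List, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def measure_ms(operation: Callable[[], object], runs: int = 100) -> List[float]:
    """Time repeated calls to operation and return latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def meets_target(samples: Sequence[float], target_ms: float) -> bool:
    """True when the p95 latency is strictly below the target."""
    return p95(samples) < target_ms


if __name__ == "__main__":
    latencies = measure_ms(lambda: sum(range(10_000)))
    print(f"p95 = {p95(latencies):.3f} ms, under 100 ms: {meets_target(latencies, 100.0)}")
```

In practice `operation` would wrap a real symbol lookup or search request against a representative index.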

Optimization Tips

Performance optimization features are implemented and available:

  1. Enable caching: Redis caching is implemented and configurable via environment variables
  2. Adjust batch size: Configurable via INDEXING_BATCH_SIZE environment variable
  3. Use SSD storage: Improves indexing speed significantly
  4. Limit file size: Configurable via INDEXING_MAX_FILE_SIZE environment variable
  5. Parallel processing: Multi-worker indexing configurable via INDEXING_MAX_WORKERS
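
The batch-size and worker settings above might combine as in the sketch below, which splits the file list into batches and indexes each batch on a thread pool. The variable names come from the list above; the fallback defaults (100 and 4), the `index_all` helper, and the choice of threads are assumptions for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_all(paths: Sequence[str], index_file: Callable[[str], T],
              env: Optional[Mapping[str, str]] = None) -> List[T]:
    """Index paths batch by batch, preserving input order in the results."""
    env = os.environ if env is None else env
    batch_size = int(env.get("INDEXING_BATCH_SIZE", "100"))   # assumed default
    max_workers = int(env.get("INDEXING_MAX_WORKERS", "4"))   # assumed default
    results: List[T] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batched(paths, batch_size):
            # pool.map returns results in the same order as the batch.
            results.extend(pool.map(index_file, batch))
    return results


if __name__ == "__main__":
    settings = {"INDEXING_BATCH_SIZE": "2", "INDEXING_MAX_WORKERS": "2"}
    print(index_all(["a.py", "b.py", "c.py"], str.upper, env=settings))
```

Threads suit I/O-bound indexing; CPU-bound parsing may benefit more from processes, which is a design decision this sketch does not settle.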

🔒 Security

  • Local-first: All processing happens locally by default
  • Path validation: Prevents directory traversal attacks
  • Input sanitization: All queries are sanitized
  • Secret detection: Automatic redaction of detected secrets
  • Plugin isolation: Plugins run in restricted environments
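
As one way to picture the path validation bullet, the sketch below resolves a requested path against the workspace root and rejects anything that escapes it. The function name `resolve_in_workspace` is hypothetical and this is not the project's actual check; it relies only on `os.path.realpath` and `os.path.commonpath` from the standard library.

```python
import os


def resolve_in_workspace(workspace_root: str, requested: str) -> str:
    """Return the absolute path for requested, or raise ValueError if it
    would fall outside workspace_root (for example via '..' or symlinks)."""
    root = os.path.realpath(workspace_root)
    candidate = os.path.realpath(os.path.join(root, requested))
    # commonpath compares whole path components, so "/work2" is not
    # mistaken for a child of "/work".
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"path escapes workspace: {requested!r}")
    return candidate


if __name__ == "__main__":
    root = os.path.join(os.getcwd(), "workspace")
    print(resolve_in_workspace(root, "src/app.py"))
    try:
        resolve_in_workspace(root, "../secrets.txt")
    except ValueError as exc:
        print("rejected:", exc)
```

Absolute inputs are handled too: `os.path.join` discards the root when `requested` is absolute, and the containment check then rejects it unless it already lies inside the workspace.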

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ for the developer community
