Mcp Monitoring

Created By
reemshai10, 6 months ago
A sophisticated Model Context Protocol (MCP) server that provides intelligent monitoring and observability integration. This server enables natural language interactions with Prometheus, AlertManager, and Grafana through chat-style commands, advanced query processing, and comprehensive monitoring automation.

🌟 Overview

This MCP server transforms how you interact with monitoring infrastructure by providing:

  • Natural Language Processing: Ask monitoring questions in plain English
  • Intelligent Query Translation: Automatically converts questions to PromQL queries
  • Historical Alert Analysis: Count failures, outages, and incidents over time
  • Multi-Source Integration: Seamlessly works with Prometheus, AlertManager, and Grafana
  • Automated Incident Detection: Smart pattern recognition for service failures

✨ Key Features

🧠 Natural Language Query Engine

  • Smart Intent Recognition: Understands monitoring questions like "How many times did service X fail?"
  • Automatic Time Range Parsing: Handles phrases like "last 2 weeks", "yesterday", "past month"
  • Service Name Detection: Recognizes services like opengrok, jenkins, grafana, prometheus
  • Alert Pattern Matching: Identifies automation failures, service outages, and critical incidents
  • Context-Aware Responses: Provides detailed breakdowns with incident counts and durations

🔍 Prometheus Integration

  • Advanced PromQL Generation: Automatically creates complex queries based on natural language
  • Historical Data Analysis: Analyzes alert trends and service availability over time
  • Metric Discovery: Browse and search available metrics with intelligent filtering
  • Range Query Optimization: Smart step sizing for different time ranges
  • Alert History Tracking: Tracks firing periods and incident detection

🚨 AlertManager Integration

  • Real-time Alert Monitoring: Query active, pending, and resolved alerts
  • Smart Alert Filtering: Filter by service, severity, alertname, or custom labels
  • Alert Fingerprinting: Track unique alert instances and their lifecycle
  • Incident Correlation: Group related alerts and calculate total impact

📊 Grafana Integration (Optional)

  • Dashboard Discovery: Find dashboards related to specific services
  • Dynamic Dashboard Links: Generate direct links to relevant monitoring views
  • Service Context Mapping: Connect services to their monitoring dashboards
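The dashboard discovery and link generation described for the Grafana integration could look roughly like the sketch below. It assumes the dashboards have already been fetched from Grafana's search endpoint (`GET /api/search`), whose results carry `uid`, `title`, `url`, and `tags` fields. The `dashboardLinks` helper and its matching rule (substring match on title or tags) are hypothetical and not taken from this server's source.

```typescript
// Subset of the fields Grafana returns for each dashboard search result.
interface DashboardHit {
  uid: string;
  title: string;
  url: string; // path relative to the Grafana base URL, e.g. "/d/<uid>/<slug>"
  tags: string[];
}

// Hypothetical helper: keep dashboards whose title or tags mention the
// service, and turn each one into an absolute link.
function dashboardLinks(
  baseUrl: string,
  hits: DashboardHit[],
  service: string,
): string[] {
  const needle = service.toLowerCase();
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return hits
    .filter(
      (hit) =>
        hit.title.toLowerCase().indexOf(needle) !== -1 ||
        hit.tags.some((tag) => tag.toLowerCase().indexOf(needle) !== -1),
    )
    .map((hit) => `${base}${hit.url}`);
}
```

A substring match keeps the sketch short; a real mapping from services to dashboards would likely need explicit tags or a lookup table.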
💬 Natural Language Examples

Service Failure Analysis

Q: "How many times did prevent-opengrok automation fail in the last 2 weeks?"
A: 46 failures, totaling 2 days and 3 hours of downtime

Q: "Show me jenkins outages yesterday"
A: Detailed breakdown of jenkins service interruptions

Q: "Count critical alerts for grafana service this month"
A: Historical analysis with incident timeline

Service Availability Queries

Q: "How many times was prometheus down last week?"
A: Service downtime incidents with duration analysis

Q: "Show cleanup-zuultmp disk usage alerts"  
A: Disk space warnings and critical alerts breakdown

Q: "What automation failures happened in the past 7 days?"
A: Comprehensive automation failure report
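Answers such as a failure count plus total downtime can be derived from a Prometheus range query over the synthetic `ALERTS` series, which only has samples while an alert is pending or firing. The sketch below is one possible approach and not necessarily how this server does it: it treats every contiguous run of samples as one incident. The function name `countIncidents` and the gap rule are assumptions.

```typescript
interface IncidentSummary {
  count: number;        // number of separate firing periods
  totalSeconds: number; // approximate time spent firing
}

// timestamps: Unix seconds at which the alert series had a sample.
// stepSeconds: the step used for the range query.
function countIncidents(
  timestamps: number[],
  stepSeconds: number,
): IncidentSummary {
  if (timestamps.length === 0) {
    return { count: 0, totalSeconds: 0 };
  }
  const sorted = [...timestamps].sort((a, b) => a - b);
  let count = 1;
  let totalSeconds = stepSeconds; // the first sample covers one step
  for (let i = 1; i < sorted.length; i++) {
    const gap = sorted[i] - sorted[i - 1];
    if (gap > stepSeconds) {
      // The series disappeared in between, so a new incident starts here.
      count += 1;
      totalSeconds += stepSeconds;
    } else {
      totalSeconds += gap;
    }
  }
  return { count, totalSeconds };
}
```

For example, `countIncidents([0, 60, 120, 600, 660], 60)` returns `{ count: 2, totalSeconds: 300 }`. The duration is only as precise as the query step, so short incidents between two samples are missed.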

🔧 Integration Examples

VS Code MCP Configuration

{
  "servers": {
    "monitoring-mcp": {
      "command": "node",
      "args": [
        "/Users/MCP/mcp-monitoring/dist/index.js"
      ],
      "env": {
        "PROMETHEUS_URL": "${input:prometheus_base_url}",
        "ALERTMANAGER_URL": "${input:alertmanager_base_url}",
        "GRAFANA_URL": "${input:grafana_base_url}",
        "GRAFANA_API_KEY": "${input:grafana_api_key}"
      }
    }
  }
}

For the Grafana token, ask your Grafana administrator to create a service user and provide its token.
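On the server side, the four environment variables from the configuration could be read and validated once at startup. The following is a minimal sketch under stated assumptions: `MonitoringConfig` and `loadConfig` are hypothetical names, and treating `ALERTMANAGER_URL` as required is a guess. Only Grafana is described as optional in this document.

```typescript
interface MonitoringConfig {
  prometheusUrl: string;
  alertmanagerUrl: string;
  grafanaUrl?: string;
  grafanaApiKey?: string;
}

// Takes the environment as a plain record so it can be tested without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): MonitoringConfig {
  const requireUrl = (name: string): string => {
    const value = env[name];
    if (!value || !/^https?:\/\//.test(value)) {
      throw new Error(`${name} must be set to an http(s) URL`);
    }
    return value.replace(/\/+$/, ""); // normalize: no trailing slash
  };

  const config: MonitoringConfig = {
    prometheusUrl: requireUrl("PROMETHEUS_URL"),
    alertmanagerUrl: requireUrl("ALERTMANAGER_URL"), // assumption: required
  };

  // Grafana is optional: enable it only when both URL and key are present.
  if (env["GRAFANA_URL"] && env["GRAFANA_API_KEY"]) {
    config.grafanaUrl = requireUrl("GRAFANA_URL");
    config.grafanaApiKey = env["GRAFANA_API_KEY"];
  }
  return config;
}
```

In a Node.js entry point this would typically be called as `loadConfig(process.env)`, so a missing or malformed URL fails fast with a clear message instead of surfacing later as a connection error.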

🎯 Use Cases

DevOps Teams

  • Incident Response: Quickly assess service health and failure patterns
  • Postmortem Analysis: Historical incident data for root cause analysis
  • Capacity Planning: Trend analysis and resource utilization monitoring
  • Alert Fatigue Management: Identify noisy alerts and optimization opportunities

SRE Teams

  • SLI/SLO Monitoring: Service availability and performance tracking
  • Error Budget Analysis: Calculate error rates and availability metrics
  • Automated Reporting: Generate incident reports and availability summaries
  • Proactive Monitoring: Identify patterns before they become critical issues

Development Teams

  • Deployment Monitoring: Track deployment success/failure rates
  • Performance Regression Detection: Compare metrics across releases
  • Integration Testing: Monitor test environment stability
  • Feature Flag Impact: Assess performance impact of feature rollouts

🧩 Architecture

Smart Query Processing Pipeline

  1. Intent Recognition: Parse natural language to understand query type
  2. Service Detection: Identify target services and components
  3. Time Range Extraction: Parse temporal expressions into date ranges
  4. PromQL Generation: Create optimized queries based on intent
  5. Data Analysis: Process results and calculate meaningful metrics
  6. Response Formatting: Present data in human-readable format
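Step 3 of the pipeline, time range extraction, can be illustrated with a small parser for the phrases quoted in this document ("last 2 weeks", "yesterday", "past month"). This is a sketch, not the server's actual parser: `parseTimeRange` is a hypothetical name, a month is approximated as 30 days, "yesterday" is interpreted in UTC, and phrases such as "this month" are not handled.

```typescript
interface TimeRange {
  start: Date;
  end: Date;
}

const UNIT_SECONDS: Record<string, number> = {
  hour: 3600,
  day: 86400,
  week: 7 * 86400,
  month: 30 * 86400, // approximation
};

// Returns null when no supported time expression is found.
function parseTimeRange(text: string, now: Date = new Date()): TimeRange | null {
  const lower = text.toLowerCase();

  if (lower.indexOf("yesterday") !== -1) {
    // The previous calendar day, using UTC day boundaries.
    const startOfToday = Date.UTC(
      now.getUTCFullYear(),
      now.getUTCMonth(),
      now.getUTCDate(),
    );
    return {
      start: new Date(startOfToday - 86400 * 1000),
      end: new Date(startOfToday),
    };
  }

  // Matches "last 2 weeks", "past 7 days", "last week", "past month", ...
  const match = lower.match(/(?:last|past)\s+(\d+)?\s*(hour|day|week|month)s?/);
  if (!match) {
    return null;
  }
  const amount = match[1] ? parseInt(match[1], 10) : 1;
  const seconds = amount * UNIT_SECONDS[match[2]];
  return { start: new Date(now.getTime() - seconds * 1000), end: now };
}
```

Passing `now` explicitly keeps the function deterministic, which makes the natural language query tests mentioned under Contributing easier to write.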

Supported Query Types

  • current_alerts: Active/firing alerts right now
  • historical_alerts: Past incidents and failure counts
  • service_availability: Uptime/downtime analysis
  • dashboard_discovery: Find relevant monitoring dashboards
  • metrics: General metric queries and analysis
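The five query types above map naturally onto a TypeScript union, with intent recognition choosing one of them. The keyword rules below are purely illustrative assumptions (the real server may use a very different classifier). Rules are checked in order and `metrics` is the fallback.

```typescript
type QueryType =
  | "current_alerts"
  | "historical_alerts"
  | "service_availability"
  | "dashboard_discovery"
  | "metrics";

// Hypothetical keyword rules, checked from most to least specific.
const RULES: Array<[QueryType, RegExp]> = [
  ["dashboard_discovery", /\bdashboards?\b/],
  ["service_availability", /\b(down|downtime|uptime|availability|outages?)\b/],
  ["historical_alerts", /\b(how many|count|fail|failed|failures?|incidents?|last|past|yesterday)\b/],
  ["current_alerts", /\b(alerts?|firing|active)\b/],
];

function classifyQuery(text: string): QueryType {
  const lower = text.toLowerCase();
  for (const [type, pattern] of RULES) {
    if (pattern.test(lower)) {
      return type;
    }
  }
  return "metrics"; // general metric query when nothing else matches
}
```

Ordering matters here: "How many times was prometheus down last week?" contains both availability and history keywords, and the earlier rule classifies it as `service_availability`.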

📈 Performance Features

  • Intelligent Query Optimization: Automatic step sizing for different time ranges
  • Result Caching: Avoid redundant API calls for recent queries
  • Timeout Handling: Graceful handling of slow monitoring APIs
  • Batch Processing: Efficient handling of multi-service queries
  • Memory Management: Optimized for long-running server deployment
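Automatic step sizing matters because Prometheus rejects range queries that would return more than 11,000 points per series, and long ranges at a fine step are slow anyway. One way to pick a step is sketched below: aim for a target number of points and round up to a "nice" interval. The function name, the target of 300 points, and the list of intervals are assumptions, not values taken from this server.

```typescript
// Candidate steps in seconds: 15s, 30s, 1m, 5m, 15m, 1h, 6h, 1d.
const NICE_STEPS = [15, 30, 60, 300, 900, 3600, 21600, 86400];

// Returns a step (in seconds) that yields at most `targetPoints` samples
// per series over the given range.
function chooseStep(rangeSeconds: number, targetPoints = 300): number {
  const raw = Math.ceil(rangeSeconds / targetPoints);
  for (const step of NICE_STEPS) {
    if (step >= raw) {
      return step;
    }
  }
  return raw; // range is so long that even a 1d step exceeds the target
}
```

With these defaults a one-hour range gets a 15-second step, one day gets 5 minutes, and two weeks gets 6 hours. A coarser step reduces load but also hides incidents shorter than the step.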

🔒 Security & Best Practices

Authentication

  • Secure API token storage for Grafana integration
  • Support for basic auth with Prometheus/AlertManager
  • Environment variable configuration for sensitive data

Network Security

  • HTTPS-only connections to monitoring services
  • Configurable timeout and retry policies
  • Certificate validation for secure connections

Access Control

  • Read-only operations by design
  • No data modification capabilities
  • Audit logging for all monitoring queries

🐛 Troubleshooting

Common Issues

# Connection errors
Error: connect ECONNREFUSED
Solution: Check PROMETHEUS_URL and network connectivity

# Authentication failures  
Error: 401 Unauthorized
Solution: Verify API tokens and authentication credentials

# Query timeouts
Error: timeout of 30000ms exceeded
Solution: Reduce query complexity or time range

# No data returned
Warning: No matching metrics found
Solution: Check service names and time range validity

Debug Mode

# Enable verbose logging
DEBUG=monitoring-mcp node dist/index.js

# Check configuration
node -e "console.log(process.env.PROMETHEUS_URL)"

🚀 Advanced Usage

Custom Service Detection

The server automatically recognizes these services:

  • cleanup-zuultmp, opengrok, jenkins
  • grafana, prometheus, alertmanager
  • gerrit, nginx, mysql, redis, elasticsearch
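Detecting these names in a question can be done with a simple word-boundary scan over the known list. The sketch below is an illustration under assumptions: `detectServices` is a hypothetical function, and the boundary rule (any non-alphanumeric character) is chosen so that "prevent-opengrok" still matches `opengrok` while "mysqldump" does not match `mysql`.

```typescript
const KNOWN_SERVICES = [
  "cleanup-zuultmp", "opengrok", "jenkins",
  "grafana", "prometheus", "alertmanager",
  "gerrit", "nginx", "mysql", "redis", "elasticsearch",
];

// Returns every known service mentioned in the text, in list order.
function detectServices(
  text: string,
  known: string[] = KNOWN_SERVICES,
): string[] {
  const lower = text.toLowerCase();
  return known.filter((name) => {
    // Escape regex metacharacters in the service name.
    const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Require a non-alphanumeric character (or string edge) on both sides.
    const pattern = new RegExp(`(^|[^a-z0-9])${escaped}($|[^a-z0-9])`);
    return pattern.test(lower);
  });
}
```

Passing a different `known` array is one way custom services could be added without changing the matching logic.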

Advanced Natural Language Patterns

"How many times did [service] fail in the last [time period]?"
"Show me [severity] alerts for [service] [time range]"
"Count [alert name] incidents in [time period]"
"When was [service] down last [time period]?"
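Templates in this bracket notation can be compiled into regular expressions that capture each `[slot]`. The sketch below shows the idea. `compileTemplate` and `matchTemplate` are hypothetical helpers, and matching is case-insensitive and anchored to the whole question, which is stricter than a production parser would probably want.

```typescript
interface CompiledPattern {
  regex: RegExp;
  slots: string[]; // slot names in capture-group order
}

function compileTemplate(template: string): CompiledPattern {
  const slots: string[] = [];
  // Splitting on "[name]" leaves literal text at even indexes
  // and slot names at odd indexes.
  const parts = template.split(/\[([^\]]+)\]/);
  const source = parts
    .map((part, index) => {
      if (index % 2 === 1) {
        slots.push(part);
        return "(.+?)"; // lazy capture for the slot value
      }
      return part
        .replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // escape literal text
        .replace(/\s+/g, "\\s+"); // tolerate extra whitespace
    })
    .join("");
  return { regex: new RegExp(`^${source}$`, "i"), slots };
}

// Returns slot values keyed by slot name, or null when the text does not fit.
function matchTemplate(
  pattern: CompiledPattern,
  text: string,
): Record<string, string> | null {
  const match = pattern.regex.exec(text.trim());
  if (!match) {
    return null;
  }
  const values: Record<string, string> = {};
  pattern.slots.forEach((name, i) => {
    values[name] = match[i + 1];
  });
  return values;
}
```

For instance, compiling the first template and matching "How many times did jenkins fail in the last 2 weeks?" yields `service` = "jenkins" and `time period` = "2 weeks". The extracted values would then go through service detection and time range parsing.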

🤝 Contributing

Contributions welcome! Please ensure:

  • TypeScript compilation passes (npm run build)
  • Natural language query tests pass
  • Documentation is updated for new features
  • Error handling is comprehensive

Built with ❤️ for DevOps and SRE teams who want smarter monitoring interactions

Server Config

{
  "mcpServers": {
    "monitoring-mcp": {
      "command": "node",
      "args": [
        "/Users/MCP/mcp-monitoring/dist/index.js"
      ],
      "env": {
        "PROMETHEUS_URL": "${input:prometheus_base_url}",
        "ALERTMANAGER_URL": "${input:alertmanager_base_url}",
        "GRAFANA_URL": "${input:grafana_base_url}",
        "GRAFANA_API_KEY": "${input:grafana_api_key}"
      }
    }
  }
}