A Model Context Protocol (MCP) server for monitoring and observability. It lets you query Prometheus, AlertManager, and Grafana in natural language through chat-style commands, translating questions into PromQL and automating common monitoring tasks.
## 🌟 Overview
This MCP server transforms how you interact with monitoring infrastructure by providing:
- **Natural Language Processing**: Ask monitoring questions in plain English
- **Intelligent Query Translation**: Automatically converts questions to PromQL queries
- **Historical Alert Analysis**: Count failures, outages, and incidents over time
- **Multi-Source Integration**: Seamlessly works with Prometheus, AlertManager, and Grafana
- **Automated Incident Detection**: Smart pattern recognition for service failures
## ✨ Key Features
### 🧠 **Natural Language Query Engine**
- **Smart Intent Recognition**: Understands monitoring questions like "How many times did service X fail?"
- **Automatic Time Range Parsing**: Handles phrases like "last 2 weeks", "yesterday", "past month"
- **Service Name Detection**: Recognizes services like opengrok, jenkins, grafana, prometheus
- **Alert Pattern Matching**: Identifies automation failures, service outages, and critical incidents
- **Context-Aware Responses**: Provides detailed breakdowns with incident counts and durations
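
To make the time range parsing and service detection above concrete, here is a minimal sketch. It is not the server's actual implementation: the names `parse_time_range`, `detect_service`, and `KNOWN_SERVICES` are hypothetical, a month is approximated as 30 days, and "yesterday" is simplified to a 24-hour lookback.

```python
import re
from datetime import timedelta

# Assumption: a month is approximated as 30 days.
_UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
}

# Matches phrases such as "last 2 weeks" or "past month" (count is optional).
_RANGE_RE = re.compile(
    r"\b(?:last|past)\s+(?:(\d+)\s+)?(minute|hour|day|week|month)s?\b",
    re.IGNORECASE,
)

# Hypothetical service list, taken from the examples in this README.
KNOWN_SERVICES = ("opengrok", "jenkins", "grafana", "prometheus")


def parse_time_range(question, default=timedelta(hours=24)):
    """Return the lookback window implied by a question, or a default."""
    match = _RANGE_RE.search(question)
    if match:
        count = int(match.group(1) or 1)
        unit = match.group(2).lower()
        return timedelta(seconds=count * _UNIT_SECONDS[unit])
    if re.search(r"\byesterday\b", question, re.IGNORECASE):
        # Simplification: treat "yesterday" as a 24-hour lookback.
        return timedelta(days=1)
    return default


def detect_service(question):
    """Return the first known service mentioned in the question, if any."""
    lowered = question.lower()
    for name in KNOWN_SERVICES:
        if name in lowered:
            return name
    return None
```

For example, `parse_time_range("How many times did jenkins fail in the last 2 weeks?")` yields a 14-day `timedelta`, and `detect_service` on the same question returns `"jenkins"`.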
### 🔍 **Prometheus Integration**
- **Advanced PromQL Generation**: Automatically creates complex queries based on natural language
- **Historical Data Analysis**: Analyzes alert trends and service availability over time
- **Metric Discovery**: Browse and search available metrics with intelligent filtering
- **Range Query Optimization**: Smart step sizing for different time ranges
- **Alert History Tracking**: Records alert firing periods and detects incidents from them
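
As an illustration of step sizing and firing-period counting, the sketch below assumes alert history is read from Prometheus's built-in `ALERTS{alertstate="firing"}` series via a range query. The function names, the 1000-point target, and the 15-second minimum step are assumptions, not the server's actual values.

```python
import math


def choose_step(range_seconds, max_points=1000, min_step=15):
    """Pick a step (in seconds) so the range yields at most max_points samples.

    Assumption: 1000 points is a comfortable target; Prometheus itself
    rejects range queries that exceed 11,000 points per series.
    """
    return max(min_step, math.ceil(range_seconds / max_points))


def firing_alert_query(alertname):
    """Build a PromQL selector for the firing state of one alert."""
    return f'ALERTS{{alertname="{alertname}", alertstate="firing"}}'


def count_firing_periods(timestamps, step):
    """Count contiguous firing periods in sorted sample timestamps.

    ALERTS only has samples while the alert is active, so a gap larger
    than one step between samples is treated as the start of a new incident.
    """
    periods = 0
    previous = None
    for ts in timestamps:
        if previous is None or ts - previous > step:
            periods += 1
        previous = ts
    return periods
```

With a two-week window (1,209,600 seconds), `choose_step` returns 1210 seconds. Samples at `[0, 60, 120, 600, 660]` with a 60-second step count as two firing periods, because the gap between 120 and 600 exceeds the step.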
### 🚨 **AlertManager Integration**
- **Real-time Alert Monitoring**: Query active, pending, and resolved alerts
- **Smart Alert Filtering**: Filter by service, severity, alertname, or custom labels
- **Alert Fingerprinting**: Track unique alert instances and their lifecycle
- **Incident Correlation**: Group related alerts and calculate total impact
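
The filtering and correlation described above could look roughly like the sketch below. It assumes alerts shaped like the objects returned by AlertManager's `/api/v2/alerts` endpoint (with `labels` and `fingerprint` fields); the helper names and the sample alerts are hypothetical.

```python
from collections import defaultdict


def filter_alerts(alerts, **label_matchers):
    """Keep alerts whose labels equal every given matcher (exact match only)."""
    return [
        alert
        for alert in alerts
        if all(
            alert.get("labels", {}).get(key) == value
            for key, value in label_matchers.items()
        )
    ]


def group_by_label(alerts, label):
    """Group alert fingerprints by the value of one label."""
    groups = defaultdict(list)
    for alert in alerts:
        value = alert.get("labels", {}).get(label, "unknown")
        groups[value].append(alert["fingerprint"])
    return dict(groups)


# Hypothetical sample data for illustration only.
sample_alerts = [
    {"fingerprint": "a1", "labels": {"alertname": "ServiceDown", "service": "jenkins", "severity": "critical"}},
    {"fingerprint": "b2", "labels": {"alertname": "HighLatency", "service": "jenkins", "severity": "warning"}},
    {"fingerprint": "c3", "labels": {"alertname": "ServiceDown", "service": "opengrok", "severity": "critical"}},
]

critical = filter_alerts(sample_alerts, severity="critical")
by_service = group_by_label(sample_alerts, "service")
```

Here `critical` contains the alerts with fingerprints `a1` and `c3`, and `by_service` maps `jenkins` to `["a1", "b2"]` and `opengrok` to `["c3"]`. Exact-match filtering keeps the sketch short; regex or negative matchers would need extra handling.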
### 📊 **Grafana Integration** (Optional)
- **Dashboard Discovery**: Find dashboards related to specific services
- **Dynamic Dashboard Links**: Generate direct links to relevant monitoring views
- **Service Context Mapping**: Connect services to their monitoring dashboards
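
Dashboard discovery and link generation can be sketched against Grafana's HTTP search API (`/api/search`), whose results include a relative `url` field for each dashboard. The helper names, host, and sample search hit below are hypothetical, and the sketch assumes Grafana is served from the root of `base_url`.

```python
from urllib.parse import urlencode


def dashboard_search_url(base_url, service):
    """Build a Grafana search URL for dashboards matching a service name."""
    query = urlencode({"query": service, "type": "dash-db"})
    return f"{base_url.rstrip('/')}/api/search?{query}"


def dashboard_link(base_url, search_hit):
    """Turn one search result into an absolute dashboard link.

    Assumption: the hit's "url" field is a path relative to base_url.
    """
    return base_url.rstrip("/") + search_hit["url"]


# Hypothetical search result for illustration only.
hit = {"uid": "abc123", "title": "Jenkins Overview", "url": "/d/abc123/jenkins-overview"}

search_url = dashboard_search_url("https://grafana.example.com/", "jenkins")
link = dashboard_link("https://grafana.example.com/", hit)
```

The request itself would also need authentication (for example, an API token in an `Authorization` header), which is omitted here.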