
K8s MCP Server

An MCP server for comprehensive access to one or more Kubernetes clusters, with AI-friendly port-forward management that supports headless services.

Implementation Complete - All core features have been implemented.

Features

  • kubectl-Inspired Output Formats: Multiple output formats (table, wide, json, yaml, custom-columns, jsonpath) for efficient token usage
  • Resource Kind Discovery: Keyword-based queries to find Custom Resource Definitions and built-in resource kinds
  • URL-based Configuration: Share port-forward configs via GitHub Gists or local files
  • Headless Service Support: Direct pod targeting via label selectors
  • AI-Optimized: Clean service/target structure for intelligent fuzzy matching
  • Multi-format Support: YAML, JSON, TOML configuration files
  • kubectl-like Behavior: Selects first ready pod, same as kubectl port-forward
  • Session Management: In-memory port-forward tracking with list and teardown capabilities
  • Parameterless Discovery: Easy cluster, alias, and service discovery for AI agents

Quick Start

1. Install Dependencies

pip install -e .

2. Set Environment Variables

# Required: Kubernetes cluster configurations
export KUBECONFIG_URLS="dev=file:///path/to/dev.yaml prod=file:///path/to/prod.yaml"

# Optional: Cluster aliases
export KUBECONFIG_ALIASES="development=dev production=prod"

# Required: Port-forward configuration URL
export PORT_FORWARD_CONFIG_URL="file://$(pwd)/example-port-forward-config.yaml"
# Or use a GitHub Gist:
# export PORT_FORWARD_CONFIG_URL="https://gist.githubusercontent.com/user/abc123/raw/port-forward.yaml"
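
The config loader treats KUBECONFIG_URLS and KUBECONFIG_ALIASES as space-separated name=value pairs. As a rough sketch of how these variables might be parsed (helper names are illustrative, not the server's actual API):

import os

def parse_pairs(raw: str) -> dict[str, str]:
    """Parse space-separated name=value pairs, e.g. 'dev=file:///dev.yaml prod=file:///prod.yaml'."""
    pairs = {}
    for token in raw.split():
        name, sep, value = token.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"Malformed entry: {token!r}")
        pairs[name] = value
    return pairs

kubeconfig_urls = parse_pairs(os.environ["KUBECONFIG_URLS"])
aliases = parse_pairs(os.environ.get("KUBECONFIG_ALIASES", ""))
# Resolve an alias first, then the kubeconfig URL: "development" -> "dev" -> file:///path/to/dev.yaml
cluster = aliases.get("development", "development")
print(kubeconfig_urls[cluster])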

3. Run the Server

python main.py

Usage Examples

AI Agent Interactions

User: "Setup port-forward to the FRET Druid in development"
AI Agent:
1. Calls discover_port_forward_services() to understand available services
2. Maps "FRET Druid in development" → service="druid", target="fret-dev"  
3. Calls connect_to_service_targets("druid", ["fret-dev"])
4. Returns: "✅ Connected to fret-dev Druid: http://localhost:8082 → pod/druid-router-0"

Direct Tool Calls

  • connect_to_service_targets("druid", ["fret-dev"]) - Connect to specific target
  • connect_to_service_targets("druid", []) - Connect to all Druid environments
  • list_resources("prod", "pods", labels={"app": "nginx"}) - List pods with labels
  • get_resource("dev", "deployment", "my-app", "default") - Get specific resource
  • discover_resource_kinds("dev", ["druid", "ingestion"]) - Find resource kinds by keywords
  • list_resources("dev", "pods", output_format="custom-columns", custom_columns="NAME:.metadata.name,IP:.status.podIP") - Custom output
  • list_port_forwards() - Show active port-forward sessions
  • teardown_port_forward("forward-id") - Stop a port-forward session
  • Supports kubectl naming variants: "pods", "pod", "po" all work
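
These tool calls are made by whichever MCP client hosts the server. As a hedged illustration of driving them programmatically with the MCP Python SDK (the launch command follows the Quick Start above; the argument names are assumptions, since the exact tool schemas are defined by the server):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server over stdio, as an MCP host such as Claude Desktop would.
    server = StdioServerParameters(command="python", args=["main.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # List pods labelled app=nginx in the "prod" cluster.
            result = await session.call_tool(
                "list_resources",
                arguments={"cluster": "prod", "kind": "pods", "labels": {"app": "nginx"}},
            )
            print(result.content)

asyncio.run(main())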

Output Format Examples

Token-Efficient Resource Listing

# Default table format (kubectl-like)
list_resources(cluster="dev", kind="pods", namespace="fret-dev")
# Returns clean table: NAME | READY | STATUS | RESTARTS | AGE

Custom Columns for Specific Data

# Extract specific fields with JSONPath
list_resources(
    cluster="dev", 
    kind="services",
    output_format="custom-columns", 
    custom_columns="NAME:.metadata.name,TYPE:.spec.type,PORTS:.spec.ports[*].port"
)
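
A custom_columns spec is a comma-separated list of HEADER:JSONPATH pairs, as in kubectl. The sketch below shows one way such a spec could be split and evaluated; the jsonpath-ng library is used here only for illustration, while the server ships its own JSONPath helper in utils/jsonpath.py.

from jsonpath_ng.ext import parse  # pip install jsonpath-ng

def split_custom_columns(spec: str) -> list[tuple[str, str]]:
    """Split 'NAME:.metadata.name,TYPE:.spec.type' into (header, jsonpath) pairs."""
    columns = []
    for part in spec.split(","):
        header, _, path = part.partition(":")
        columns.append((header.strip(), "$" + path.strip()))
    return columns

def render_row(obj: dict, columns: list[tuple[str, str]]) -> list[str]:
    """Evaluate each column's JSONPath against one resource object."""
    row = []
    for _, path in columns:
        values = [match.value for match in parse(path).find(obj)]
        row.append(",".join(str(v) for v in values) or "<none>")
    return row

columns = split_custom_columns("NAME:.metadata.name,TYPE:.spec.type")
service = {"metadata": {"name": "web"}, "spec": {"type": "ClusterIP"}}
print(render_row(service, columns))  # ['web', 'ClusterIP']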

Resource Kind Discovery

# Find resource kinds by keywords
discover_resource_kinds(
    cluster="dev",
    keywords=["druid", "ingestion"],
    match_mode="OR"
)
# Returns: DruidIngestion resource kind info with all naming variants

Resource Naming Conventions

The K8s MCP server follows kubectl naming conventions:

Resource Kind Support

  • Plural forms: pods, services, deployments
  • Singular forms: pod, service, deployment
  • Abbreviations: po, svc, deploy, sts (for StatefulSet)
  • Case-insensitive: Pod, POD, and pod all work (see the normalization sketch below)
  • discover_* tools find available resource kinds/types
  • search_* and list_* tools find specific instances
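
One plausible way to implement this normalization is to index every accepted spelling of a kind under a single canonical name, roughly as sketched below (the table and helper are illustrative, not the server's internals):

# Map every accepted spelling to a canonical kind, mirroring kubectl's behavior.
KIND_VARIANTS = {
    "Pod": ["pods", "pod", "po"],
    "Service": ["services", "service", "svc"],
    "Deployment": ["deployments", "deployment", "deploy"],
    "StatefulSet": ["statefulsets", "statefulset", "sts"],
}

_LOOKUP = {
    variant.lower(): canonical
    for canonical, variants in KIND_VARIANTS.items()
    for variant in variants + [canonical]
}

def normalize_kind(name: str) -> str:
    """Resolve any kubectl-style variant ('Pod', 'POD', 'po', ...) to its canonical kind."""
    try:
        return _LOOKUP[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown resource kind: {name!r}") from None

assert normalize_kind("POD") == normalize_kind("po") == "Pod"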

Example Workflow

# 1. Discover what kinds are available
discover_resource_kinds(cluster="dev", keywords=["druid"])
# Returns: DruidIngestion kind with variants ["druidingestions", "druidingestion"]

# 2. List instances using any variant
list_resources(cluster="dev", kind="druidingestions")
# or
list_resources(cluster="dev", kind="DruidIngestion")

Configuration

Port-Forward Config Format

druid:
  fret-dev:
    k8s_cluster: "dev-us-east"
    namespace: "druid"
    labels:
      app.kubernetes.io/component: "router"
    local_port: 8082
    remote_port: 8888

grafana:
  dev:
    k8s_cluster: "dev-us-east"
    namespace: "monitoring"
    labels:
      app.kubernetes.io/name: "grafana"
    local_port: 3000
    remote_port: 3000
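
Each service maps named targets to a cluster, namespace, label selector, and a local/remote port pair. A minimal sketch of loading such a file and resolving one target (field names follow the example above; the loader shown is illustrative, not the server's implementation):

import yaml  # pip install pyyaml

def load_port_forward_config(path: str) -> dict:
    with open(path) as fh:
        return yaml.safe_load(fh)

def resolve_target(config: dict, service: str, target: str) -> dict:
    """Look up e.g. service='druid', target='fret-dev' and return its connection settings."""
    try:
        entry = config[service][target]
    except KeyError:
        raise ValueError(f"No target {target!r} configured for service {service!r}") from None
    selector = ",".join(f"{k}={v}" for k, v in entry["labels"].items())
    return {
        "cluster": entry["k8s_cluster"],
        "namespace": entry["namespace"],
        "selector": selector,  # used to pick the first ready pod, kubectl-style
        "local_port": entry["local_port"],
        "remote_port": entry["remote_port"],
    }

config = load_port_forward_config("example-port-forward-config.yaml")
print(resolve_target(config, "druid", "fret-dev"))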

Common Workflows

These workflow examples demonstrate how AI agents can combine multiple tools to accomplish complex Kubernetes tasks:

1. Cluster Overview Workflow

User: "Show me the status of all my clusters"
AI Agent Flow:
1. discover_clusters() → Get available clusters ["dev", "staging", "prod"]
2. For each cluster:
   - cluster_health() → Check API server connectivity
   - list_resources("cluster", "nodes") → Get node count and status
   - list_resources("cluster", "namespaces") → Get namespace overview
3. Summarize: "3 clusters online, 12 total nodes, 8 namespaces"

2. Application Debugging Workflow

User: "My app isn't working in staging"
AI Agent Flow:
1. list_resources("staging", "pods", labels={"app": "user-app"}) → Find app pods
2. For each pod:
   - describe_resource("staging", "pod", pod_name) → Get detailed status
   - get_logs("staging", pod_name, lines=50) → Check recent logs
   - list_resources("staging", "events", field_selector=f"involvedObject.name={pod_name}") → Get events
3. Analyze patterns: "Found 2 pods crash-looping due to missing config map 'app-config'"
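
Expressed as client-side code, step 2 of this flow might look roughly like the sketch below (tool names follow the flow above; the argument names are assumptions, and the pod names would come from the initial list_resources() call):

from mcp import ClientSession

async def debug_app(session: ClientSession, cluster: str, pod_names: list[str]) -> list[tuple]:
    """Describe each pod, pull recent logs, and collect related events (workflow step 2)."""
    findings = []
    for pod in pod_names:  # in practice, parsed from the initial list_resources() output
        desc = await session.call_tool(
            "describe_resource", arguments={"cluster": cluster, "kind": "pod", "name": pod})
        logs = await session.call_tool(
            "get_logs", arguments={"cluster": cluster, "pod": pod, "lines": 50})
        events = await session.call_tool(
            "list_resources",
            arguments={"cluster": cluster, "kind": "events",
                       "field_selector": f"involvedObject.name={pod}"})
        findings.append((pod, desc.content, logs.content, events.content))
    return findings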

3. Safe Deployment Update Workflow

User: "Update nginx deployment to version 1.21"
AI Agent Flow:
1. get_resource("prod", "deployment", "nginx") → Get current deployment
2. Modify image version in manifest
3. Present proposed changes: "Will update image from nginx:1.20 to nginx:1.21"
4. User confirms: "Apply it"
5. update_resource("prod", "deployment", "nginx", new_manifest) → Apply update
6. Monitor rollout status with deployment_status()
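
The step worth spelling out is editing the manifest before asking for approval. A minimal sketch, operating on the deployment as a plain dict such as get_resource() might return (the helper and the single-container assumption are illustrative):

import copy

def bump_image(deployment: dict, new_image: str) -> tuple[dict, str]:
    """Return an updated manifest plus a one-line change summary to show the user."""
    updated = copy.deepcopy(deployment)
    container = updated["spec"]["template"]["spec"]["containers"][0]
    summary = f"Will update image from {container['image']} to {new_image}"
    container["image"] = new_image
    return updated, summary

# The manifest would normally come from get_resource("prod", "deployment", "nginx").
manifest = {"spec": {"template": {"spec": {"containers": [{"name": "nginx", "image": "nginx:1.20"}]}}}}
new_manifest, summary = bump_image(manifest, "nginx:1.21")
print(summary)  # -> Will update image from nginx:1.20 to nginx:1.21
# After the user confirms, pass new_manifest to update_resource("prod", "deployment", "nginx", ...).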

4. Resource Cleanup Workflow

User: "Clean up all test resources from last week"
AI Agent Flow:
1. list_resources("dev", "all", labels={"purpose": "test", "created": "2024-01-15"}) → Find test resources
2. Group by type: "Found 5 pods, 3 services, 2 configmaps"
3. For each resource:
   - Show resource details and age
4. Present deletion plan with resource details
5. User confirms: "Yes, delete them"
6. Execute deletions with proper cascade handling

5. Enhanced Port-Forward Workflow

User: "Setup port-forward to the FRET Druid in development"
AI Agent Flow:
1. discover_port_forward_services() → Get available services/targets
2. AI reasoning: "FRET Druid in development" → service="druid", target="fret-dev"
3. connect_to_service_targets("druid", ["fret-dev"]) → Setup to specific target
4. Return: "✅ Connected to fret-dev Druid: http://localhost:8082 → pod/druid-router-0"

6. Troubleshooting Performance Workflow

User: "Why is my app slow in production?"
AI Agent Flow:
1. list_resources("prod", "pods", labels={"app": "slow-app"}) → Find app pods
2. For each pod:
   - describe_resource() → Check resource limits/requests
   - get_logs(since="1h") → Check for performance-related logs
   - execute_in_pod(["top", "-b", "-n", "1"]) → Get resource usage
3. list_resources("prod", "nodes") → Check node capacity
4. query_by_labels("prod", "hpa", "app=slow-app") → Check auto-scaling
5. Analyze: "App is CPU-throttled, increase CPU requests from 100m to 250m"

7. Security Audit Workflow

User: "Audit RBAC permissions for the finance team"
AI Agent Flow:
1. list_resources("prod", "rolebindings") → Get all role bindings
2. list_resources("prod", "clusterrolebindings") → Get cluster-wide bindings
3. Filter by subjects containing "finance"
4. For each binding:
   - get_resource("prod", "role", role_name) → Get role permissions
5. Summarize: "Finance team has read access to secrets in finance namespace"

8. Disaster Recovery Workflow

User: "Restore the database from backup"
AI Agent Flow:
1. list_resources("prod", "pods", labels={"app": "database"}) → Find DB pods
2. scale_resource("prod", "deployment", "database", replicas=0) → Scale down
3. Wait for pods to terminate
4. apply_manifest("prod", restore_job_manifest) → Apply restore job
5. Monitor job status until completion
6. scale_resource("prod", "deployment", "database", replicas=1) → Scale back up
7. Verify database connectivity
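
Sketched as client-side calls, the scale-down / restore / scale-up sequence might look like this (tool and argument names follow the flow above and are assumptions; a real agent would poll pod termination and job completion between the calls):

from mcp import ClientSession

async def restore_database(session: ClientSession, restore_job_manifest: str) -> None:
    """Scale the database down, run the restore job, then scale back up (workflow 8)."""
    await session.call_tool("scale_resource", arguments={
        "cluster": "prod", "kind": "deployment", "name": "database", "replicas": 0})
    # ... wait for the pods to terminate ...
    await session.call_tool("apply_manifest", arguments={
        "cluster": "prod", "manifest": restore_job_manifest})
    # ... monitor the restore job until completion ...
    await session.call_tool("scale_resource", arguments={
        "cluster": "prod", "kind": "deployment", "name": "database", "replicas": 1})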

Best Practices for AI Agents

  1. Always Preview Changes: Use diff/dry-run before modifications
  2. Provide Context: Include relevant information when reporting issues
  3. Batch Operations: Group related operations for efficiency
  4. Handle Pagination: Be aware of large result sets
  5. Respect Cluster Boundaries: Never assume an operation spans clusters; target each cluster explicitly

Tool Safety Classifications

All MCP tools are classified by their safety level for AI agent operations:

[READ-ONLY] Tools - Safe for autonomous use

These tools only read data and never modify cluster or local state:

Resource Operations:

  • get_resource() - Get specific resource details
  • get_resource_yaml() - Get resource as YAML
  • list_resources() - List resources with formatting options
  • list_resources_yaml() - List resources as YAML
  • query_by_labels() - Advanced label-based queries
  • query_by_labels_yaml() - Label queries as YAML

Monitoring & Analysis:

  • troubleshoot_pod() - Comprehensive pod debugging
  • troubleshoot_pod_yaml() - Pod debugging as YAML
  • cluster_health() - Overall cluster health assessment
  • deployment_status() - Deployment rollout analysis
  • deployment_status_yaml() - Deployment analysis as YAML
  • resource_usage() - Resource consumption analysis
  • get_logs() - Retrieve pod logs
  • list_port_forwards() - List active port-forward sessions

Discovery:

  • discover_resource_kinds() - Find resource types by keywords
  • discover_clusters() - List available clusters
  • discover_cluster_aliases() - List cluster aliases
  • discover_port_forward_services() - List port-forward configurations

[MODIFIES STATE] Tools - Require user approval

These tools modify cluster state or local network configuration:

Resource Management:

  • create_resource() - Create new Kubernetes resources
  • update_resource() - Update existing resources
  • patch_resource() - Partially update resources
  • scale_resource() - Change replica counts
  • apply_manifest() - Apply YAML manifests

Network Operations:

  • connect_to_service_targets() - Connect to configured service environments
  • setup_port_forward() - Generic port-forward setup
  • teardown_port_forward() - Close port-forward sessions

[DANGEROUS] Tools - Require explicit user confirmation

These tools can cause data loss or security risks:

  • delete_resource() - Permanently delete resources (cannot be undone)
  • execute_in_pod() - Run arbitrary commands in pods (potential security risk)

AI Agent Safety Guidelines

  1. Autonomous Operations: AI agents can freely use [READ-ONLY] tools
  2. State Changes: [MODIFIES STATE] tools should require user approval
  3. High-Risk Operations: [DANGEROUS] tools must have explicit user confirmation
  4. Error Handling: All tools return consistent error formats for safe parsing
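
One way a client or agent framework could enforce these guidelines is a simple gate keyed on the classifications above. A hedged sketch (the classification table is abridged from the lists above; the prompt mechanism is illustrative):

from enum import Enum

class Safety(Enum):
    READ_ONLY = "read-only"
    MODIFIES_STATE = "modifies state"
    DANGEROUS = "dangerous"

# Abridged table derived from the classification lists above.
TOOL_SAFETY = {
    "list_resources": Safety.READ_ONLY,
    "get_logs": Safety.READ_ONLY,
    "scale_resource": Safety.MODIFIES_STATE,
    "apply_manifest": Safety.MODIFIES_STATE,
    "delete_resource": Safety.DANGEROUS,
    "execute_in_pod": Safety.DANGEROUS,
}

def approved(tool: str, ask_user) -> bool:
    """Gate a tool call: read-only runs freely, everything else needs the user's consent."""
    level = TOOL_SAFETY.get(tool, Safety.DANGEROUS)  # unknown tools get the strictest gate
    if level is Safety.READ_ONLY:
        return True
    prompt = f"Tool {tool!r} is classified as [{level.value.upper()}]. Proceed?"
    return ask_user(prompt)

print(approved("list_resources", ask_user=lambda prompt: False))   # True: read-only, no prompt
print(approved("delete_resource", ask_user=lambda prompt: False))  # False: user declined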

Architecture

k8s-mcp/
├── main.py                 # MCP server entry point with tool definitions
├── config/                 # Configuration management  
│   ├── loader.py          # URL-based config loading
│   ├── manager.py         # Cluster and port-forward session management
│   └── config.py          # Environment validation and configuration
├── tools/                  # Modular tool implementations
│   ├── resources.py       # Common resource operation patterns
│   ├── formatting.py      # All resource formatting (table, YAML, JSON)
│   ├── columns.py         # Default column definitions for resources
│   ├── discovery.py       # Resource kind discovery and CRD enumeration
│   └── port_forward.py    # Port-forward specific operations
└── utils/                  # Shared utilities
    ├── helpers.py         # Common helper functions
    └── jsonpath.py        # JSONPath processing for custom columns

Code Organization Principles

  • Single Responsibility: Each module has a clear, focused purpose
  • DRY Compliance: No duplicate implementations across files
  • Separation of Concerns: Client management, formatting, and business logic are separated
  • Modular Design: Tools can be imported and reused independently

Server Config

{
  "mcpServers": {
    "k8s": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/k8s-mcp",
        "run",
        "k8s-mcp"
      ],
      "env": {
        "KUBECONFIG_ALIASES": "development=dev production=prod",
        "KUBECONFIG_URLS": "dev=file:///path/to/development/k3s/kubeconfig.yaml prod=file:///path/to/production/k3s/kubeconfig.yaml",
        "PORT_FORWARD_CONFIG_URL": "file:///path/to/k8s-mcp/port-forward-config.yaml"
      }
    }
  }
}