K8s MCP Server
Enhanced Kubernetes access with AI-friendly port-forward management supporting headless services.
✅ Implementation Complete - All core features have been implemented.
Features
- kubectl-Inspired Output Formats: Multiple output formats (table, wide, json, yaml, custom-columns, jsonpath) for efficient token usage
- Resource Kind Discovery: Natural language queries to find Custom Resource Definitions and builtin resources by keywords
- URL-based Configuration: Share port-forward configs via GitHub Gists or local files
- Headless Service Support: Direct pod targeting via label selectors
- AI-Optimized: Clean service/target structure for intelligent fuzzy matching
- Multi-format Support: YAML, JSON, TOML configuration files
- kubectl-like Behavior: Selects the first ready pod, same as kubectl port-forward
- Session Management: In-memory port-forward tracking with list and teardown capabilities
- Parameterless Discovery: Easy cluster, alias, and service discovery for AI agents
Quick Start
1. Install Dependencies
pip install -e .
2. Set Environment Variables
# Required: Kubernetes cluster configurations
export KUBECONFIG_URLS="dev=file:///path/to/dev.yaml prod=file:///path/to/prod.yaml"
# Optional: Cluster aliases
export KUBECONFIG_ALIASES="development=dev production=prod"
# Required: Port-forward configuration URL
export PORT_FORWARD_CONFIG_URL="file://$(pwd)/example-port-forward-config.yaml"
# Or use a GitHub Gist:
# export PORT_FORWARD_CONFIG_URL="https://gist.githubusercontent.com/user/abc123/raw/port-forward.yaml"
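
The environment variables above use a space-separated list of name=value pairs. As a minimal sketch of how such a value could be read (the helper names parse_pairs and resolve_cluster are hypothetical and not taken from this project's code), assuming paths contain no spaces:

```python
def parse_pairs(value: str) -> dict[str, str]:
    """Parse 'name=value name2=value2' into a dict, splitting on whitespace."""
    pairs = {}
    for item in value.split():
        # Split at the first '=' only, so URLs containing '=' stay intact.
        name, sep, target = item.partition("=")
        if not sep or not name or not target:
            raise ValueError(f"expected name=value, got {item!r}")
        pairs[name] = target
    return pairs


urls = parse_pairs("dev=file:///path/to/dev.yaml prod=file:///path/to/prod.yaml")
aliases = parse_pairs("development=dev production=prod")


def resolve_cluster(name: str) -> str:
    """Map an alias to its cluster name, then check the cluster is configured."""
    cluster = aliases.get(name, name)
    if cluster not in urls:
        raise KeyError(f"unknown cluster: {name}")
    return cluster
```

With these values, resolve_cluster("production") resolves to the "prod" entry, and an unconfigured name raises KeyError rather than silently falling back.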
3. Run the Server
python main.py
Usage Examples
AI Agent Interactions
User: "Setup port-forward to the FRET Druid in development"
AI Agent:
1. Calls discover_port_forward_services() to understand available services
2. Maps "FRET Druid in development" → service="druid", target="fret-dev"
3. Calls connect_to_service_targets("druid", ["fret-dev"])
4. Returns: "✅ Connected to fret-dev Druid: http://localhost:8082 → pod/druid-router-0"
Direct Tool Calls
- connect_to_service_targets("druid", ["fret-dev"]) - Connect to a specific target
- connect_to_service_targets("druid", []) - Connect to all Druid environments
- list_resources("prod", "pods", labels={"app": "nginx"}) - List pods with labels
- get_resource("dev", "deployment", "my-app", "default") - Get a specific resource
- discover_resource_kinds("dev", ["druid", "ingestion"]) - Find resource kinds by keywords
- list_resources("dev", "pods", output_format="custom-columns", custom_columns="NAME:.metadata.name,IP:.status.podIP") - Custom output
- list_port_forwards() - Show active port-forward sessions
- teardown_port_forward("forward-id") - Stop a port-forward session
- Supports kubectl naming variants: "pods", "pod", "po" all work
Output Format Examples
Token-Efficient Resource Listing
# Default table format (kubectl-like)
list_resources(cluster="dev", kind="pods", namespace="fret-dev")
# Returns clean table: NAME | READY | STATUS | RESTARTS | AGE
Custom Columns for Specific Data
# Extract specific fields with JSONPath
list_resources(
    cluster="dev",
    kind="services",
    output_format="custom-columns",
    custom_columns="NAME:.metadata.name,TYPE:.spec.type,PORTS:.spec.ports[*].port"
)
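
To illustrate how a custom_columns spec like the one above could be turned into a table row, here is a rough sketch. It is not this server's implementation: the function names are hypothetical, and the path evaluator only handles dotted keys and a trailing [*], so keys that themselves contain dots (such as app.kubernetes.io/name) are out of scope.

```python
def parse_columns(spec: str) -> list[tuple[str, str]]:
    """Split 'NAME:.path,OTHER:.path' into (header, path) pairs."""
    columns = []
    for part in spec.split(","):
        header, _, path = part.partition(":")
        columns.append((header.strip(), path.strip()))
    return columns


def evaluate(obj, path: str) -> list:
    """Walk a dotted path; a '[*]' suffix fans out over every list element."""
    current = [obj]
    for segment in path.strip(".").split("."):
        fan_out = segment.endswith("[*]")
        key = segment[:-3] if fan_out else segment
        found = []
        for item in current:
            if isinstance(item, dict) and key in item:
                value = item[key]
                if fan_out and isinstance(value, list):
                    found.extend(value)
                else:
                    found.append(value)
        current = found
    return current


def render_row(resource: dict, spec: str) -> list[str]:
    """Render one table row; missing fields show as '<none>', as kubectl does."""
    row = []
    for _, path in parse_columns(spec):
        values = evaluate(resource, path)
        row.append(",".join(str(v) for v in values) if values else "<none>")
    return row
```

Returning only the requested fields is what keeps this format token-efficient compared with dumping the full JSON or YAML object.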
Resource Kind Discovery
# Find resource kinds by keywords
discover_resource_kinds(
    cluster="dev",
    keywords=["druid", "ingestion"],
    match_mode="OR"
)
# Returns: DruidIngestion resource kind info with all naming variants
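
One plausible reading of keyword matching with a match_mode is a case-insensitive substring test across all of a kind's naming variants. The sketch below assumes that reading; the matches function is hypothetical, and the "AND" mode is an assumption, since only "OR" appears in the example above.

```python
def matches(kind_names: list[str], keywords: list[str], match_mode: str = "OR") -> bool:
    """Case-insensitive substring match of keywords against a kind's names."""
    haystack = " ".join(kind_names).lower()
    hits = [keyword.lower() in haystack for keyword in keywords]
    if match_mode == "OR":
        return any(hits)  # at least one keyword appears
    if match_mode == "AND":
        return all(hits)  # every keyword appears (assumed mode)
    raise ValueError(f"unsupported match_mode: {match_mode!r}")


names = ["DruidIngestion", "druidingestions", "druidingestion"]
print(matches(names, ["druid", "ingestion"], "OR"))
```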
Resource Naming Conventions
The K8s MCP server follows kubectl naming conventions:
Resource Kind Support
- Plural forms: pods, services, deployments
- Singular forms: pod, service, deployment
- Abbreviations: po, svc, deploy, sts (for StatefulSet)
- Case insensitive: Pod, POD, pod all work
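
The naming rules above amount to a lookup from any variant to one canonical name. A minimal sketch, assuming the plural form is the canonical one and covering only the four kinds mentioned here (the table and the normalize_kind helper are illustrative, not the server's own code):

```python
# Canonical plural -> accepted singular and short forms (illustrative subset).
ALIASES = {
    "pods": ("pod", "po"),
    "services": ("service", "svc"),
    "deployments": ("deployment", "deploy"),
    "statefulsets": ("statefulset", "sts"),
}

# Build a flat variant -> canonical lookup once.
LOOKUP = {plural: plural for plural in ALIASES}
for plural, variants in ALIASES.items():
    for variant in variants:
        LOOKUP[variant] = plural


def normalize_kind(kind: str) -> str:
    """Return the canonical plural for any supported spelling, ignoring case."""
    key = kind.strip().lower()
    if key not in LOOKUP:
        raise ValueError(f"unknown resource kind: {kind!r}")
    return LOOKUP[key]
```

A real server would more likely build this table from the cluster's API discovery data, so that custom resources get the same treatment as builtin kinds.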
Discovery vs Search
- discover_* tools find available resource kinds/types
- search_* and list_* tools find specific instances
Example Workflow
# 1. Discover what kinds are available
discover_resource_kinds(cluster="dev", keywords=["druid"])
# Returns: DruidIngestion kind with variants ["druidingestions", "druidingestion"]
# 2. List instances using any variant
list_resources(cluster="dev", kind="druidingestions")
# or
list_resources(cluster="dev", kind="DruidIngestion")
Configuration
Port-Forward Config Format
druid:
  fret-dev:
    k8s_cluster: "dev-us-east"
    namespace: "druid"
    labels:
      app.kubernetes.io/component: "router"
    local_port: 8082
    remote_port: 8888
grafana:
  dev:
    k8s_cluster: "dev-us-east"
    namespace: "monitoring"
    labels:
      app.kubernetes.io/name: "grafana"
    local_port: 3000
    remote_port: 3000
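
The config nests targets under services, with the pod label selector and ports on each target. As a sketch of how that structure could be flattened once parsed (shown here as a plain dict so the example needs no YAML library; ForwardTarget and flatten are hypothetical names):

```python
from dataclasses import dataclass


@dataclass
class ForwardTarget:
    service: str
    target: str
    k8s_cluster: str
    namespace: str
    labels: dict
    local_port: int
    remote_port: int

    @property
    def label_selector(self) -> str:
        """Render labels as a kubectl-style selector, e.g. 'a=b,c=d'."""
        return ",".join(f"{k}={v}" for k, v in sorted(self.labels.items()))


def flatten(config: dict) -> list[ForwardTarget]:
    """Turn {service: {target: fields}} into a flat list of targets."""
    return [
        ForwardTarget(service=service, target=target, **fields)
        for service, targets in config.items()
        for target, fields in targets.items()
    ]


config = {
    "druid": {"fret-dev": {
        "k8s_cluster": "dev-us-east", "namespace": "druid",
        "labels": {"app.kubernetes.io/component": "router"},
        "local_port": 8082, "remote_port": 8888}},
    "grafana": {"dev": {
        "k8s_cluster": "dev-us-east", "namespace": "monitoring",
        "labels": {"app.kubernetes.io/name": "grafana"},
        "local_port": 3000, "remote_port": 3000}},
}
targets = flatten(config)
```

Because each entry is built with keyword arguments, a missing or misspelled field in the config fails loudly with a TypeError at load time.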
Common Workflows
These workflow examples demonstrate how AI agents can combine multiple tools to accomplish complex Kubernetes tasks:
1. Cluster Overview Workflow
User: "Show me the status of all my clusters"
AI Agent Flow:
1. discover_clusters() → Get available clusters ["dev", "staging", "prod"]
2. For each cluster:
   - cluster_health() → Check API server connectivity
   - list_resources("cluster", "nodes") → Get node count and status
   - list_resources("cluster", "namespaces") → Get namespace overview
3. Summarize: "3 clusters online, 12 total nodes, 8 namespaces"
2. Application Debugging Workflow
User: "My app isn't working in staging"
AI Agent Flow:
1. list_resources("staging", "pods", labels={"app": "user-app"}) → Find app pods
2. For each pod:
   - describe_resource("staging", "pod", pod_name) → Get detailed status
   - get_logs("staging", pod_name, lines=50) → Check recent logs
   - list_resources("staging", "events", field_selector=f"involvedObject.name={pod_name}") → Get events
3. Analyze patterns: "Found 2 pods crash-looping due to missing config map 'app-config'"
3. Safe Deployment Update Workflow
User: "Update nginx deployment to version 1.21"
AI Agent Flow:
1. get_resource("prod", "deployment", "nginx") → Get current deployment
2. Modify image version in manifest
3. Present proposed changes: "Will update image from nginx:1.20 to nginx:1.21"
4. User confirms: "Apply it"
5. update_resource("prod", "deployment", "nginx", new_manifest) → Apply update
6. Monitor rollout status with deployment_status()
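
Step 3 of this flow, presenting the proposed change before applying it, can be sketched as a comparison of container images between the current and modified manifests. The helper names below are hypothetical, and the manifests are assumed to be Deployment-shaped dicts:

```python
import copy


def container_images(manifest: dict) -> dict[str, str]:
    """Map container name to image for a Deployment-shaped manifest."""
    containers = manifest["spec"]["template"]["spec"]["containers"]
    return {c["name"]: c["image"] for c in containers}


def image_changes(current: dict, proposed: dict) -> list[str]:
    """Describe each container whose image differs between two manifests."""
    before, after = container_images(current), container_images(proposed)
    return [
        f"{name}: {before[name]} -> {after[name]}"
        for name in before
        if name in after and before[name] != after[name]
    ]


current = {"spec": {"template": {"spec": {"containers": [
    {"name": "nginx", "image": "nginx:1.20"}]}}}}
# Work on a deep copy so the fetched manifest stays untouched for comparison.
proposed = copy.deepcopy(current)
proposed["spec"]["template"]["spec"]["containers"][0]["image"] = "nginx:1.21"
changes = image_changes(current, proposed)
```

An empty result is a useful signal too: it means the requested version is already deployed and no update call is needed.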
4. Resource Cleanup Workflow
User: "Clean up all test resources from last week"
AI Agent Flow:
1. list_resources("dev", "all", labels={"purpose": "test", "created": "2024-01-15"}) → Find test resources
2. Group by type: "Found 5 pods, 3 services, 2 configmaps"
3. For each resource:
   - Show resource details and age
4. Present deletion plan with resource details
5. User confirms: "Yes, delete them"
6. Execute deletions with proper cascade handling
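
The grouping in step 2 is a simple count per resource kind. A small sketch, assuming each listed resource is a dict with a kind field and using a naive "add an s" plural (which would be wrong for kinds such as Ingress); summarize is a hypothetical helper:

```python
from collections import Counter


def summarize(resources: list[dict]) -> str:
    """Build a 'Found N pods, M services' style summary from resource dicts."""
    counts = Counter(resource["kind"].lower() for resource in resources)
    if not counts:
        return "Found no matching resources"
    parts = [
        f"{count} {kind}{'s' if count != 1 else ''}"
        # most_common() orders by count, largest first
        for kind, count in counts.most_common()
    ]
    return "Found " + ", ".join(parts)


found = [{"kind": "Pod"}, {"kind": "Pod"}, {"kind": "ConfigMap"}]
print(summarize(found))
```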
5. Enhanced Port-Forward Workflow
User: "Setup port-forward to the FRET Druid in development"
AI Agent Flow:
1. discover_port_forward_services() → Get available services/targets
2. AI reasoning: "FRET Druid in development" → service="druid", target="fret-dev"
3. connect_to_service_targets("druid", ["fret-dev"]) → Setup to specific target
4. Return: "✅ Connected to fret-dev Druid: http://localhost:8082 → pod/druid-router-0"
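
Behind step 3, the server resolves the target's label selector to a single pod by taking the first ready one. A sketch of that selection over pod objects as plain dicts, assuming "ready" means phase Running plus a Ready condition with status "True" (the function names are hypothetical):

```python
def is_ready(pod: dict) -> bool:
    """True when the pod is Running and its Ready condition is 'True'."""
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    return any(
        condition.get("type") == "Ready" and condition.get("status") == "True"
        for condition in status.get("conditions", [])
    )


def first_ready_pod(pods: list[dict]) -> str:
    """Return the name of the first ready pod, in the order given."""
    for pod in pods:
        if is_ready(pod):
            return pod["metadata"]["name"]
    raise LookupError("no ready pod matches the label selector")
```

Raising when nothing is ready lets the agent report a clear failure instead of forwarding to a pod that cannot accept connections.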
6. Troubleshooting Performance Workflow
User: "Why is my app slow in production?"
AI Agent Flow:
1. list_resources("prod", "pods", labels={"app": "slow-app"}) → Find app pods
2. For each pod:
   - describe_resource() → Check resource limits/requests
   - get_logs(since="1h") → Check for performance-related logs
   - execute_in_pod(["top", "-b", "-n", "1"]) → Get resource usage
3. list_resources("prod", "nodes") → Check node capacity
4. query_by_labels("prod", "hpa", "app=slow-app") → Check auto-scaling
5. Analyze: "App is CPU-throttled, increase CPU requests from 100m to 250m"
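
Comparing usage against requests and limits in step 5 requires putting CPU quantities on one scale. Kubernetes writes CPU either in millicores ("100m") or in cores ("0.5", "2"). The sketch below converts both to millicores; the function names are illustrative and not part of this server:

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ('250m', '1', '0.5') to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole or fractional cores.
    return round(float(quantity) * 1000)


def cpu_utilization(usage: str, limit: str) -> float:
    """Fraction of the limit currently in use (1.0 means at the limit)."""
    return cpu_millicores(usage) / cpu_millicores(limit)
```

A pod using "95m" against a "100m" limit sits at 95% of its limit, which is the kind of signal that would support a throttling diagnosis.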
7. Security Audit Workflow
User: "Audit RBAC permissions for the finance team"
AI Agent Flow:
1. list_resources("prod", "rolebindings") → Get all role bindings
2. list_resources("prod", "clusterrolebindings") → Get cluster-wide bindings
3. Filter by subjects containing "finance"
4. For each binding:
   - get_resource("prod", "role", role_name) → Get role permissions
5. Summarize: "Finance team has read access to secrets in finance namespace"
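
Step 3, filtering bindings by subject, might look like the following sketch. It assumes each binding is a dict in the standard RoleBinding shape (metadata, subjects, roleRef) and that a case-insensitive substring match on the subject name is good enough; bindings_for is a hypothetical helper.

```python
def bindings_for(bindings: list[dict], needle: str) -> list[tuple[str, str, str]]:
    """Return (binding name, role kind, role name) for matching subjects."""
    needle = needle.lower()
    results = []
    for binding in bindings:
        # 'subjects' can be absent or null on a binding, so default to empty.
        subjects = binding.get("subjects") or []
        if any(needle in subject.get("name", "").lower() for subject in subjects):
            ref = binding["roleRef"]
            results.append((binding["metadata"]["name"], ref["kind"], ref["name"]))
    return results


bindings = [
    {"metadata": {"name": "finance-readers"},
     "subjects": [{"kind": "Group", "name": "finance-team"}],
     "roleRef": {"kind": "Role", "name": "secret-reader"}},
    {"metadata": {"name": "ops-admins"},
     "subjects": [{"kind": "Group", "name": "ops"}],
     "roleRef": {"kind": "ClusterRole", "name": "admin"}},
    {"metadata": {"name": "orphan"}, "subjects": None,
     "roleRef": {"kind": "Role", "name": "unused"}},
]
finance = bindings_for(bindings, "finance")
```

Each returned role name can then be fed to the get_resource call in step 4 to read the actual permissions.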
8. Disaster Recovery Workflow
User: "Restore the database from backup"
AI Agent Flow:
1. list_resources("prod", "pods", labels={"app": "database"}) → Find DB pods
2. scale_resource("prod", "deployment", "database", replicas=0) → Scale down
3. Wait for pods to terminate
4. apply_manifest("prod", restore_job_manifest) → Apply restore job
5. Monitor job status until completion
6. scale_resource("prod", "deployment", "database", replicas=1) → Scale back up
7. Verify database connectivity
Best Practices for AI Agents
- Always Preview Changes: Use diff/dry-run before modifications
- Provide Context: Include relevant information when reporting issues
- Batch Operations: Group related operations for efficiency
- Handle Pagination: Be aware of large result sets
- Respect Cluster Boundaries: Never assume operations across clusters
Tool Safety Classifications
All MCP tools are classified by their safety level for AI agent operations:
[READ-ONLY] Tools - Safe for autonomous use
These tools only read data and never modify cluster or local state:
Resource Operations:
- get_resource() - Get specific resource details
- get_resource_yaml() - Get resource as YAML
- list_resources() - List resources with formatting options
- list_resources_yaml() - List resources as YAML
- query_by_labels() - Advanced label-based queries
- query_by_labels_yaml() - Label queries as YAML
Monitoring & Analysis:
- troubleshoot_pod() - Comprehensive pod debugging
- troubleshoot_pod_yaml() - Pod debugging as YAML
- cluster_health() - Overall cluster health assessment
- deployment_status() - Deployment rollout analysis
- deployment_status_yaml() - Deployment analysis as YAML
- resource_usage() - Resource consumption analysis
- get_logs() - Retrieve pod logs
- list_port_forwards() - List active port-forward sessions
Discovery:
- discover_resource_kinds() - Find resource types by keywords
- discover_clusters() - List available clusters
- discover_cluster_aliases() - List cluster aliases
- discover_port_forward_services() - List port-forward configurations
[MODIFIES STATE] Tools - Require user approval
These tools modify cluster state or local network configuration:
Resource Management:
- create_resource() - Create new Kubernetes resources
- update_resource() - Update existing resources
- patch_resource() - Partially update resources
- scale_resource() - Change replica counts
- apply_manifest() - Apply YAML manifests
Network Operations:
- connect_to_service_targets() - Connect to configured service environments
- setup_port_forward() - Generic port-forward setup
- teardown_port_forward() - Close port-forward sessions
[DANGEROUS] Tools - Require explicit user confirmation
These tools can cause data loss or security risks:
- delete_resource() - Permanently delete resources (cannot be undone)
- execute_in_pod() - Run arbitrary commands in pods (potential security risk)
AI Agent Safety Guidelines
- Autonomous Operations: AI agents can freely use [READ-ONLY] tools
- State Changes: [MODIFIES STATE] tools should require user approval
- High-Risk Operations: [DANGEROUS] tools must have explicit user confirmation
- Error Handling: All tools return consistent error formats for safe parsing
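
On the agent side, these guidelines can be enforced with a small gate in front of every tool call. The sketch below is one possible shape, not part of this server: the table lists only a few of the tools above, may_run is a hypothetical helper, and treating unknown tools as dangerous is an assumption made for safety.

```python
from enum import Enum


class Safety(Enum):
    READ_ONLY = "READ-ONLY"
    MODIFIES_STATE = "MODIFIES STATE"
    DANGEROUS = "DANGEROUS"


# Illustrative subset of the classifications listed above.
CLASSIFICATION = {
    "list_resources": Safety.READ_ONLY,
    "get_logs": Safety.READ_ONLY,
    "update_resource": Safety.MODIFIES_STATE,
    "teardown_port_forward": Safety.MODIFIES_STATE,
    "delete_resource": Safety.DANGEROUS,
    "execute_in_pod": Safety.DANGEROUS,
}


def may_run(tool: str, approved: bool = False, confirmed: bool = False) -> bool:
    """Decide whether a tool call may proceed given the user's responses."""
    # Assumption: anything unclassified is handled as the highest risk level.
    level = CLASSIFICATION.get(tool, Safety.DANGEROUS)
    if level is Safety.READ_ONLY:
        return True
    if level is Safety.MODIFIES_STATE:
        return approved or confirmed
    return confirmed  # DANGEROUS needs explicit confirmation
```

Keeping approval and confirmation as separate flags mirrors the distinction between the [MODIFIES STATE] and [DANGEROUS] levels: a general "go ahead" is enough for the former but not for the latter.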
Architecture
k8s-mcp/
├── main.py # MCP server entry point with tool definitions
├── config/ # Configuration management
│ ├── loader.py # URL-based config loading
│ ├── manager.py # Cluster and port-forward session management
│ └── config.py # Environment validation and configuration
├── tools/ # Modular tool implementations
│ ├── resources.py # Common resource operation patterns
│ ├── formatting.py # All resource formatting (table, YAML, JSON)
│ ├── columns.py # Default column definitions for resources
│ ├── discovery.py # Resource kind discovery and CRD enumeration
│ └── port_forward.py # Port-forward specific operations
└── utils/ # Shared utilities
    ├── helpers.py # Common helper functions
    └── jsonpath.py # JSONPath processing for custom columns
Code Organization Principles
- Single Responsibility: Each module has a clear, focused purpose
- DRY Compliance: No duplicate implementations across files
- Separation of Concerns: Client management, formatting, and business logic are separated
- Modular Design: Tools can be imported and reused independently
Server Config
{
  "mcpServers": {
    "k8s": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/k8s-mcp",
        "run",
        "k8s-mcp"
      ],
      "env": {
        "KUBECONFIG_ALIASES": "development=dev production=prod",
        "KUBECONFIG_URLS": "dev=file:///path/to/development/k3s/kubeconfig.yaml prod=file:///path/to/production/k3s/kubeconfig.yaml",
        "PORT_FORWARD_CONFIG_URL": "file:///path/to/k8s-mcp/port-forward-config.yaml"
      }
    }
  }
}