AzPolicyMCP

Created by Jitha-afk, 9 months ago

Simple MCP server to help create Azure Policies for any resource type.

Overview

This document outlines the requirements for an Azure Policy Model Context Protocol (MCP) server. The server's primary goal is to let Large Language Models (LLMs) help users generate, validate, and deploy Azure custom policies. LLMs can produce incorrect or non-compliant Azure Policy JSON; the server addresses this by providing tools to fetch relevant built-in policies as examples, validate generated policies against the official schema, and manage policy assignments through the Azure REST API. It also aims to help the LLM select the appropriate policy effect based on user intent (audit/deny vs. remediation).

The target user is an LLM application (such as a chatbot or code assistant) that needs to work with Azure Policy definitions and assignments. The value lies in a standardized, reliable interface through which LLMs can create, validate, and deploy accurate, compliant Azure policies in response to user requests.

Core Features

get_builtin_policies Tool (Implemented):
- What it does: Fetches the top-level categories of Azure built-in policies from the Azure/azure-policy GitHub repository. Can optionally filter categories by name.
- Why it's important: Allows the LLM to discover the available policy categories (e.g., 'Storage', 'Compute', 'Network').
- How it works: Uses the GitHub API to list directories within the built-in-policies/policyDefinitions path. Returns a list of category names and their corresponding paths.
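
A minimal sketch of this lookup, assuming the GitHub repository contents endpoint and using the standard library's urllib in place of requests so the example stays self-contained. The names fetch_contents and select_categories are hypothetical and are not taken from server.py.

```python
import json
import urllib.request
from urllib.parse import quote

# Repository and path named in this document; the helper names are illustrative.
GITHUB_API = "https://api.github.com/repos/Azure/azure-policy/contents/"
CATEGORIES_PATH = "built-in-policies/policyDefinitions"


def fetch_contents(path, timeout=10):
    """Return the parsed GitHub contents listing for a repository path."""
    request = urllib.request.Request(
        GITHUB_API + quote(path),  # category names may contain spaces
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "azpolicymcp-sketch",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def select_categories(entries, query=None):
    """Keep directory entries, optionally filtered by a case-insensitive substring."""
    needle = (query or "").lower()
    return [
        {"name": entry["name"], "path": entry["path"]}
        for entry in entries
        if entry.get("type") == "dir" and needle in entry["name"].lower()
    ]
```

Under these assumptions, select_categories(fetch_contents(CATEGORIES_PATH), "stor") would return only the categories whose names contain "stor". Keeping the filter separate from the network call makes it easy to unit test without touching GitHub.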
get_policies_in_category Tool (Implemented):
- What it does: Fetches the individual policy definition files (names, paths, download URLs) within a specified category path obtained from get_builtin_policies. Can optionally filter policies by filename.
- Why it's important: Enables the LLM to drill down into a specific area and find relevant policy examples.
- How it works: Uses the GitHub API to list files within the provided category path. Returns a list of JSON file names, their paths, and direct download URLs.
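
The file listing could be reduced to the fields this tool returns roughly as follows. This is a sketch that assumes entries is the already parsed list from GitHub's repository contents endpoint, where each item carries name, path, type, and download_url; select_policy_files is a hypothetical helper name.

```python
def select_policy_files(entries, query=None):
    """Keep JSON files from a contents listing, optionally filtered by filename."""
    needle = (query or "").lower()
    return [
        {
            "name": entry["name"],
            "path": entry["path"],
            "download_url": entry["download_url"],
        }
        for entry in entries
        if entry.get("type") == "file"
        and entry["name"].lower().endswith(".json")
        and needle in entry["name"].lower()
    ]
```

Subdirectories and non-JSON files are dropped, so the LLM only sees entries it can pass on to get_policy_content.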
get_policy_content Tool (Implemented):
- What it does: Fetches the raw JSON content of a specific policy definition using its direct download URL obtained from get_policies_in_category.
- Why it's important: Provides the actual policy JSON, which the LLM can use as a concrete example or template.
- How it works: Performs an HTTP GET request to the provided GitHub raw content URL. Returns the policy JSON as a string.
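
One way the fetch might look, sketched with the standard library rather than requests. The host allow-list is an added assumption, not something this document specifies; it is included because the tool accepts an arbitrary URL from the LLM. The helper name is_allowed_download_url is hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: download URLs from the GitHub contents API point at this host.
ALLOWED_HOSTS = {"raw.githubusercontent.com"}


def is_allowed_download_url(url):
    """Accept only HTTPS URLs on the expected raw-content host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


def get_policy_content(download_url, timeout=10):
    """Fetch a policy definition and return it as a JSON string."""
    if not is_allowed_download_url(download_url):
        return f"Error: refusing to fetch from {download_url!r}"
    with urllib.request.urlopen(download_url, timeout=timeout) as response:
        # utf-8-sig tolerates an optional byte order mark at the start of the file.
        text = response.read().decode("utf-8-sig")
    json.loads(text)  # raises ValueError if the body is not JSON
    return text
```

Returning an error string instead of raising keeps the failure visible to the LLM as tool output; whether the real tool reports errors this way is not stated here.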
verify_policy_structure Tool (Temporarily Disabled):
- What it does: Validates a given JSON string against the official Azure Policy definition schema.
- Why it's important: Ensures that any custom policy generated by the LLM adheres to the correct syntax and structure required by Azure before being presented to the end-user, preventing deployment errors.
- How it works (Intended): Takes a policy definition (as a JSON string) as input. Parses the JSON and validates it against a locally stored copy of the official Azure Policy JSON schema (schemas/policyDefinition.json) using the jsonschema library. Returns a success message or a list of validation errors. (Currently commented out in server.py because of runtime errors and validation logic issues.)
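
While the jsonschema-based tool is disabled, a simpler structural pre-check could serve as a stopgap. The sketch below is not the intended schema validation: it only confirms that the JSON parses and that policyRule contains if and then.effect. The function name check_policy_shape is hypothetical.

```python
import json


def check_policy_shape(policy_json):
    """Return a list of problems; an empty list means the basic shape looks right."""
    try:
        doc = json.loads(policy_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    # Accept both a full definition ({"properties": {...}}) and a bare properties object.
    props = doc.get("properties", doc)
    if not isinstance(props, dict):
        return ["properties must be an object"]
    rule = props.get("policyRule")
    if not isinstance(rule, dict):
        return ["missing policyRule object"]

    problems = []
    if "if" not in rule:
        problems.append("policyRule is missing 'if'")
    then = rule.get("then")
    if not isinstance(then, dict):
        problems.append("policyRule is missing 'then' object")
    elif "effect" not in then:
        problems.append("policyRule.then is missing 'effect'")
    return problems
```

A check like this catches the most common structural slips but says nothing about field names, aliases, or effect-specific details, so it would complement the schema validation rather than replace it.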
deploy_policy_assignment Tool (New Requirement):
- What it does: Creates or updates an Azure Policy Assignment using a provided policy definition (JSON or ID) and assignment parameters (scope, name, etc.).
- Why it's important: Enables the LLM to complete the workflow by deploying the generated and validated policy into the target Azure environment.
- How it works: Takes the policy definition details (potentially the JSON content or the ID of an existing definition), assignment scope (e.g., subscription, resource group), assignment name, display name, description, parameters, and potentially identity details (for remediation tasks) as input. Constructs and executes the appropriate Azure REST API call: a PUT to the {scope}/providers/Microsoft.Authorization/policyAssignments/{policyAssignmentName} endpoint. Requires Azure authentication credentials configured for the server environment.
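
The request construction might be sketched as below. It assumes the policy definition already exists and is referenced by its ID, since an assignment points at a definition through policyDefinitionId. The api-version value is an assumption to confirm against the current REST reference, and build_assignment_request is a hypothetical helper that only builds the URL and body; sending the PUT with a bearer token is left out.

```python
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2022-06-01"  # assumption: check the REST reference for the current version


def build_assignment_request(scope, name, policy_definition_id, display_name=None,
                             description=None, parameters=None, identity_location=None):
    """Return (url, body) for a policy assignment create-or-update call."""
    url = (
        f"{ARM_ENDPOINT}/{scope.strip('/')}/providers/Microsoft.Authorization"
        f"/policyAssignments/{name}?api-version={API_VERSION}"
    )
    properties = {"policyDefinitionId": policy_definition_id}
    if display_name:
        properties["displayName"] = display_name
    if description:
        properties["description"] = description
    if parameters:
        # Assignment parameters are wrapped as {"paramName": {"value": ...}}.
        properties["parameters"] = {key: {"value": value} for key, value in parameters.items()}
    body = {"properties": properties}
    if identity_location:
        # Remediation effects need a managed identity, which also requires a location.
        body["identity"] = {"type": "SystemAssigned"}
        body["location"] = identity_location
    return url, body
```

Separating request construction from the HTTP call means the scope and parameter handling can be tested without Azure credentials.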
Intent Identification Support (New Requirement):
- What it is: Functionality or guidance to help the LLM determine whether the user's goal is to audit/prevent non-compliant resources (using effects like Audit, Deny) or to remediate existing ones (using effects like DeployIfNotExists, Modify).
- Why it's important: Selecting the correct policy effect is critical for achieving the user's desired outcome and correctly structuring the policyRule.
- How it works: This might involve:
  - Enhancing create_policy_prompt (the ninth feature, described below) so that it explicitly asks the LLM to clarify intent with the user.
  - Potentially adding a tool to analyze a draft policy definition and suggest appropriate effects based on keywords or structure.
  - Providing guidance (e.g., in a rule/resource) on common use cases for different effects.
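
The guidance on effects could start as a small lookup like the one sketched here. The intent labels ("audit", "block", "remediate") and the helper name suggest_effects are hypothetical; the effect names are the ones listed above.

```python
# Mapping from a clarified user intent to the candidate policy effects.
INTENT_EFFECTS = {
    "audit": ["Audit"],
    "block": ["Deny"],
    "remediate": ["DeployIfNotExists", "Modify"],
}


def suggest_effects(intent):
    """Return candidate effects for an intent label, or raise if it is unknown."""
    key = intent.strip().lower()
    if key not in INTENT_EFFECTS:
        known = ", ".join(sorted(INTENT_EFFECTS))
        raise ValueError(f"unknown intent {intent!r}; expected one of: {known}")
    return list(INTENT_EFFECTS[key])
```

Raising on an unknown label, rather than guessing, pushes the LLM back to the user for clarification, which matches the goal of this feature.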
(Optional Enhancement) azure_policy_schema Resource:
- What it does: Exposes the official Azure Policy JSON schema via an MCP resource URI (e.g., schema://azurepolicy).
- Why it's important: Allows the LLM client to fetch the schema definition directly, enabling it to better understand the target structure before attempting generation.
- How it works: This is already implemented in the current version.
(Optional Enhancement) azure_resource_types Resource:
- What it does: Provides a list of common Azure resource provider namespaces and types via an MCP resource URI (e.g., types://azure/resources).
- Why it's important: Helps the LLM use the correct identifiers for resource types within policy rules.
(Optional Enhancement) create_policy_prompt Prompt:
- What it does: Defines a reusable MCP prompt template to guide the LLM in the policy creation process, including intent clarification.
- Why it's important: Standardizes the workflow for the LLM, prompting it to use the available tools and resources effectively (e.g., "Generate a policy for {resource_type} to enforce {requirement}. Clarify if this should audit/block new resources or remediate existing ones. Use the policy fetching tools for examples and verify_policy_structure before finalizing. Use deploy_policy_assignment to deploy.").
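
As a plain function, the template quoted above could look like the following sketch. Registering it with the server (for example through FastMCP's prompt decorator) is omitted so the example runs without the MCP SDK installed.

```python
def create_policy_prompt(resource_type, requirement):
    """Build the policy-creation prompt from the template in this document."""
    return (
        f"Generate a policy for {resource_type} to enforce {requirement}. "
        "Clarify if this should audit/block new resources or remediate existing ones. "
        "Use the policy fetching tools for examples and verify_policy_structure "
        "before finalizing. Use deploy_policy_assignment to deploy."
    )
```

For example, create_policy_prompt("App Services", "HTTPS-only traffic") yields a prompt that starts with "Generate a policy for App Services to enforce HTTPS-only traffic."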

User Experience

The primary "user" of this MCP server is the LLM client application. The interaction flow is as follows:

1. End-user requests a custom Azure policy from the LLM application (e.g., "Create a policy to enforce HTTPS on App Services").
2. LLM application interacts with the Azure Policy MCP Server.
3. LLM calls get_builtin_policies (optionally with a query) to find relevant policy categories.
4. LLM calls get_policies_in_category for a chosen category path (optionally with a query) to find relevant policy files.
5. LLM calls get_policy_content using the download URL for one or more policies to get examples.
6. LLM may call read_resource on schema://azurepolicy (if implemented) to understand the required structure.
7. LLM interacts with the user (potentially guided by create_policy_prompt) to clarify intent (audit/block vs. remediate) and determine the appropriate policy effect.
8. LLM generates the custom policy JSON based on the user request, clarified intent, and retrieved examples/schema.
9. LLM should call verify_policy_structure (once enabled) with the generated JSON.
10. If valid, the LLM confirms the deployment scope and parameters with the user.
11. LLM calls deploy_policy_assignment with the validated policy definition and assignment details.
12. LLM presents the outcome (success or failure with details) of the deployment to the end-user.
13. If validation (step 9) or deployment (step 11) fails, the LLM uses the error feedback to correct the policy/parameters and re-validates/re-attempts deployment.
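
The feedback loop at the end of this flow (validate, deploy, and retry with the error message) can be sketched from the client's side as below. The three callables are hypothetical stand-ins for the LLM's generation step and the two MCP tool calls, and the user confirmation before deployment is left out for brevity.

```python
def validate_then_deploy(generate, validate, deploy, max_attempts=3):
    """Regenerate the policy with error feedback until it validates and deploys."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        policy = generate(feedback)      # LLM drafts or corrects the policy
        problems = validate(policy)      # stand-in for verify_policy_structure
        if problems:
            feedback = f"validation failed: {problems}"
            continue
        ok, detail = deploy(policy)      # stand-in for deploy_policy_assignment
        if ok:
            return {"status": "deployed", "attempts": attempt, "detail": detail}
        feedback = f"deployment failed: {detail}"
    return {"status": "failed", "attempts": max_attempts, "detail": feedback}
```

Capping the number of attempts keeps a persistently failing policy from looping forever and gives the LLM a final error to present to the end-user.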

Technical Architecture

Core Framework: The MCP Python SDK (the mcp package), using FastMCP.
Server Components:
- server.py: Main FastMCP application definition, registers tools (get_builtin_policies, get_policies_in_category, get_policy_content, deploy_policy_assignment (new), and the disabled verify_policy_structure). Loads schema for validation.
- schemas/policyDefinition.json: Locally stored copy of the official Azure Policy JSON schema file (used by the disabled verify_policy_structure).
- (New) Module/logic for handling Azure authentication (e.g., using azure-identity library with environment variables for credentials like Service Principal). Secure credential handling is crucial.
Data Models:
- Input/Output for MCP tools (defined by function signatures and type hints in server.py).
- Azure Policy JSON structure (handled as dict/string).
- Azure Policy Schema JSON structure (loaded by jsonschema in server.py).
APIs and Integrations:
- MCP protocol interface exposed by FastMCP.
- GitHub API via requests (used by policy fetching tools).
- (New) Azure REST API via requests/httpx (used by deploy_policy_assignment).
Key Libraries: mcp[cli], requests, jsonschema, (New) azure-identity (recommended for auth).
Infrastructure: Python 3.x environment. Requires secure configuration of Azure credentials (e.g., environment variables, managed identity if hosted in Azure).
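
If a Service Principal is configured through environment variables, a startup check along these lines could surface missing settings early. The three variable names are the ones azure-identity's environment-based credential reads for a client secret; the helper name missing_credential_vars is hypothetical, and a Managed Identity deployment would not need this check.

```python
import os

# Environment variables read by azure-identity for a Service Principal with a secret.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_credential_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

The check only reports variable names, never their values, so its output is safe to log.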

Development Roadmap

Phase 1: MVP (Core Functionality)
- Set up basic FastMCP server project structure (server.py).
- Obtain and store the official Azure Policy JSON schema in schemas/.
- Implement and fix the verify_policy_structure tool using jsonschema and the stored schema. (Currently blocked/disabled)
- Implement get_builtin_policies, get_policies_in_category, get_policy_content using the GitHub API (requests).
- Basic unit tests for all tools.
- Basic README.md explaining setup and usage.
Phase 2: Enhancements & Reliability
- Implement the azure_policy_schema MCP Resource.
- Implement the azure_resource_types MCP Resource (requires curating this list).
- Implement the create_policy_prompt MCP Prompt, including guidance for intent clarification.
- Implement a mechanism to check for/update the schema file.
- Improve error handling and logging within tools (timeouts added, further improvements possible).
- Expand test coverage.
Phase 3: Deployment & Advanced Features (New)
- Implement robust Azure authentication mechanism (e.g., using azure-identity with Service Principal or Managed Identity).
- Implement the deploy_policy_assignment tool, handling different scopes and parameters.
- Add logic/tooling to support intent identification (audit/deny vs. remediate) if simple prompting is insufficient.
- Implement support for assigning policies that require Managed Identities (for DeployIfNotExists/Modify).
- Further enhance test coverage, including integration tests for deployment (requires careful setup).

Logical Dependency Chain

1. Establish a base Python project with the MCP Python SDK.
2. Acquire and integrate the Azure Policy JSON schema (schemas/policyDefinition.json).
3. Implement GitHub API data access tools (get_builtin_policies, get_policies_in_category, get_policy_content).
4. Implement and fix verify_policy_structure (depends on schema). (Blocked)
5. Implement optional Resources.
6. Implement optional Prompts (including intent clarification).
7. Implement Azure Authentication.
8. Implement deploy_policy_assignment (depends on validation and auth).
9. Implement reliability features (schema updates).

Focus was initially on getting policy retrieval working via the GitHub API. The next focus is resolving the validation issue, followed by implementing deployment capabilities.

Current Status & Next Steps

Status: The MCP server is running. The tools for discovering policy categories (get_builtin_policies), listing policies within categories (get_policies_in_category), and fetching policy content (get_policy_content) using the GitHub API are implemented and functional. Basic error handling and timeouts are included. The verify_policy_structure tool is implemented but currently commented out due to runtime errors/validation logic issues. The schema file (schemas/policyDefinition.json) is present.
Next Action Plan:
1. Debug and Fix verify_policy_structure: Uncomment the tool in server.py and diagnose the jsonschema validation errors. Ensure it correctly validates policy JSON against the schemas/policyDefinition.json file.
2. Add Basic README: Create a README.md explaining how to set up the environment, run the server, and use the available tools.
3. Unit Tests: Begin adding unit tests, starting with the currently functional policy fetching tools.

Risks and Mitigations

Risk: verify_policy_structure tool proves difficult to fix or requires significant schema adjustments.
- Mitigation: Deep dive into jsonschema documentation and the specific validation errors. Compare the downloaded schema against Azure documentation examples. If necessary, seek simpler validation approaches initially.
(New) Risk: Handling Azure authentication securely and reliably is complex.
- Mitigation: Use standard libraries like azure-identity. Follow best practices for credential management (environment variables, Key Vault, Managed Identity). Clearly document setup requirements.
(New) Risk: The Azure Policy Assignments REST API has nuances (scopes, parameters, identity management for remediation).
- Mitigation: Start with simpler assignment scenarios (e.g., Audit policies at Resource Group scope). Incrementally add support for parameters and different scopes. Refer extensively to the Azure REST API documentation. Implement thorough error handling for API responses.
(New) Risk: LLM might misinterpret user intent regarding policy effect (audit/deny vs. remediation).
- Mitigation: Design clear prompts (create_policy_prompt). Provide examples. Log interactions to identify common LLM mistakes. Consider adding explicit checks or a dedicated analysis tool if prompting alone is insufficient.
(New) Risk: Deploying incorrect policies or assignments can have negative impacts on the Azure environment.
- Mitigation: Emphasize the importance of the verify_policy_structure step. Implement checks within deploy_policy_assignment for required parameters. Encourage user confirmation before deployment. Advise testing in non-production environments.
Risk: GitHub API rate limits or downtime affecting policy fetching tools.
- Mitigation: The current implementation uses unauthenticated requests, which GitHub limits to 60 requests per hour per IP address. Document this limitation. Consider adding optional GitHub token support later, since authenticated requests get a much higher limit (5,000 per hour with a personal access token). Implement sensible retry logic if transient errors become common. Cache results briefly if needed.
Risk: Keeping the locally stored schema (schemas/policyDefinition.json) up-to-date requires effort.
- Mitigation: Add a periodic check/manual update process to the backlog (Phase 2). Document the source and date of the current schema.
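
For the GitHub rate-limit risk above, optional token support and retry logic might be sketched as follows. Both helper names are hypothetical. The retry wrapper catches OSError, which covers urllib's URLError and socket timeouts; an exhausted rate limit will not clear within a few seconds, so this helps only with transient failures.

```python
import time


def github_headers(token=None):
    """Build request headers, adding authentication only when a token is supplied."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "azpolicymcp-sketch",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on OSError with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Passing sleep as a parameter lets tests replace it with a recorder, so the backoff schedule can be checked without real delays.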