AzPolicyMCP
Simple MCP server to help create Azure Policies for any resource type.
Overview
This document outlines the requirements for an Azure Policy Model Context Protocol (MCP) server. The server's primary goal is to let Large Language Models (LLMs) help users generate, validate, and deploy Azure custom policies. LLMs can produce incorrect or non-compliant Azure Policy JSON; the server addresses this by providing tools to fetch relevant built-in policies as examples, validate the structure of generated policies against the official schema, and manage policy assignments via the Azure REST API. It also helps the LLM select the appropriate policy effect based on user intent (audit/deny vs. remediation).

The target user is an LLM application (such as a chatbot or code assistant) that needs to work with Azure Policy definitions and assignments. The value lies in a standardized, reliable interface through which LLMs can create, validate, and deploy accurate, compliant Azure policies based on user requests.
Core Features
- `get_builtin_policies` Tool (Implemented):
  - What it does: Fetches the top-level categories of Azure built-in policies from the `Azure/azure-policy` GitHub repository. Can optionally filter categories by name.
  - Why it's important: Allows the LLM to discover the available policy categories (e.g., 'Storage', 'Compute', 'Network').
  - How it works: Uses the GitHub API to list directories within the `built-in-policies/policyDefinitions` path. Returns a list of category names and their corresponding paths.
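The directory listing described for `get_builtin_policies` could look roughly like the sketch below. It assumes the public GitHub "contents" endpoint for the `Azure/azure-policy` repository and uses only the standard library (rather than `requests`) so that it runs without extra dependencies. The function names and the case-insensitive substring filter are illustrative assumptions, not taken from `server.py`.

```python
import json
import urllib.request

# Assumed endpoint: GitHub "contents" API for the path named in this document.
CATEGORIES_URL = (
    "https://api.github.com/repos/Azure/azure-policy/contents/"
    "built-in-policies/policyDefinitions"
)


def extract_categories(entries, query=None):
    """Keep directory entries and return their names and paths.

    `entries` is the JSON array returned by the contents API; each item
    carries at least "name", "path" and "type".
    """
    categories = [
        {"name": e["name"], "path": e["path"]}
        for e in entries
        if e.get("type") == "dir"
    ]
    if query:
        needle = query.lower()
        categories = [c for c in categories if needle in c["name"].lower()]
    return categories


def fetch_categories(query=None, timeout=10):
    """Fetch the listing over HTTP (unauthenticated) and filter it."""
    request = urllib.request.Request(
        CATEGORIES_URL, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        entries = json.load(response)
    return extract_categories(entries, query)
```

Keeping the filtering in a pure function (`extract_categories`) separate from the HTTP call makes the logic easy to unit test with a canned API response.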
- `get_policies_in_category` Tool (Implemented):
  - What it does: Fetches the individual policy definition files (names, paths, download URLs) within a specified category path obtained from `get_builtin_policies`. Can optionally filter policies by filename.
  - Why it's important: Enables the LLM to drill down into a specific area and find relevant policy examples.
  - How it works: Uses the GitHub API to list files within the provided category path. Returns a list of JSON file names, their paths, and direct download URLs.
- `get_policy_content` Tool (Implemented):
  - What it does: Fetches the raw JSON content of a specific policy definition using its direct download URL obtained from `get_policies_in_category`.
  - Why it's important: Provides the actual policy JSON, which the LLM can use as a concrete example or template.
  - How it works: Performs an HTTP GET request to the provided GitHub raw content URL. Returns the policy JSON as a string.
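The two file-level steps (listing the JSON files of a category, then downloading one of them) can be sketched as follows. This is a minimal illustration using the standard library, not the actual tool code. The function names are hypothetical, and the early JSON parse in `fetch_policy_content` is an added safeguard that the document does not mention.

```python
import json
import urllib.request


def extract_policy_files(entries, query=None):
    """Return name, path and download URL for each JSON file in a listing.

    `entries` is the array returned by the GitHub contents API for a
    category path.
    """
    files = [
        {"name": e["name"], "path": e["path"], "download_url": e["download_url"]}
        for e in entries
        if e.get("type") == "file" and e["name"].lower().endswith(".json")
    ]
    if query:
        needle = query.lower()
        files = [f for f in files if needle in f["name"].lower()]
    return files


def fetch_policy_content(download_url, timeout=10):
    """GET the raw policy file and return it as a string.

    "utf-8-sig" drops a leading byte-order mark if one is present. The text
    is parsed once so that a non-JSON response fails early.
    """
    with urllib.request.urlopen(download_url, timeout=timeout) as response:
        text = response.read().decode("utf-8-sig")
    json.loads(text)  # raises ValueError if the body is not valid JSON
    return text
```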
- `verify_policy_structure` Tool (Temporarily Disabled):
  - What it does: Validates a given JSON string against the official Azure Policy definition schema.
  - Why it's important: Ensures that any custom policy generated by the LLM adheres to the correct syntax and structure required by Azure before being presented to the end-user, preventing deployment errors.
  - How it works (Intended): Takes a policy definition (as a JSON string) as input. Parses the JSON and validates it against a locally stored copy of the official Azure Policy JSON schema (`schemas/policyDefinition.json`) using the `jsonschema` library. Returns a success message or validation errors. (Currently commented out in `server.py` due to issues.)
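While the `jsonschema`-based tool is disabled, a dependency-free structural pre-check could catch the most obvious mistakes. The sketch below is not the schema validation described above and does not replace it. It only checks that the JSON parses and that `policyRule` has an `if` block and a `then.effect`. The function name and the list of known effects are assumptions made for illustration.

```python
import json

# Lower-cased effect names. Assumption: this list is not exhaustive and may
# lag behind what Azure currently accepts.
KNOWN_EFFECTS = {
    "audit", "auditifnotexists", "deny", "denyaction", "deployifnotexists",
    "modify", "append", "manual", "disabled",
}


def precheck_policy(policy_json):
    """Return a list of problems; an empty list means the pre-check passed."""
    try:
        policy = json.loads(policy_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(policy, dict):
        return ["top level must be a JSON object"]

    # Accept both a full definition and a bare "properties" body.
    props = policy.get("properties", policy)
    if not isinstance(props, dict):
        return ['"properties" must be an object']
    rule = props.get("policyRule")
    if not isinstance(rule, dict):
        return ['missing "policyRule" object']

    problems = []
    if "if" not in rule:
        problems.append('policyRule is missing "if"')
    then = rule.get("then")
    if not isinstance(then, dict) or "effect" not in then:
        problems.append('policyRule is missing "then.effect"')
    else:
        effect = then["effect"]
        # Expressions such as "[parameters('effect')]" cannot be checked here.
        is_expression = isinstance(effect, str) and effect.startswith("[")
        if not is_expression and str(effect).lower() not in KNOWN_EFFECTS:
            problems.append(f"unknown effect: {effect!r}")
    return problems
```

Returning a list of messages rather than raising gives the LLM concrete feedback it can use to correct the draft and try again.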
- `deploy_policy_assignment` Tool (New Requirement):
  - What it does: Creates or updates an Azure Policy Assignment using a provided policy definition (JSON or ID) and assignment parameters (scope, name, etc.).
  - Why it's important: Enables the LLM to complete the workflow by deploying the generated and validated policy into the target Azure environment.
  - How it works: Takes the policy definition details (potentially the JSON content or an ID of an existing definition), assignment scope (e.g., subscription, resource group), assignment name, display name, description, parameters, and potentially identity details (for remediation tasks) as input. Constructs and executes the appropriate Azure REST API call (PUT) to the `providers/Microsoft.Authorization/policyAssignments` endpoint. Requires Azure authentication credentials configured for the server environment.
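The request that `deploy_policy_assignment` would need to construct can be sketched as below. This is a minimal illustration that covers only the case where an existing policy definition ID is supplied. The function names, the chosen `api-version`, and the use of a system-assigned identity are assumptions. Obtaining the bearer token (for example through `azure-identity`) is left out.

```python
import json
import urllib.parse
import urllib.request

ARM_ENDPOINT = "https://management.azure.com"
# Assumption: one api-version of the policyAssignments API; adjust as needed.
API_VERSION = "2022-06-01"


def build_assignment_request(scope, name, policy_definition_id,
                             display_name=None, description=None,
                             parameters=None, identity_location=None):
    """Return (url, body) for a PUT that creates or updates an assignment."""
    scope = "/" + scope.strip("/")
    url = (
        f"{ARM_ENDPOINT}{scope}/providers/Microsoft.Authorization/"
        f"policyAssignments/{urllib.parse.quote(name, safe='')}"
        f"?api-version={API_VERSION}"
    )
    properties = {"policyDefinitionId": policy_definition_id}
    if display_name:
        properties["displayName"] = display_name
    if description:
        properties["description"] = description
    if parameters:
        # Assignment parameters are wrapped as {"<name>": {"value": ...}}.
        properties["parameters"] = {k: {"value": v} for k, v in parameters.items()}
    body = {"properties": properties}
    if identity_location:
        # Remediation effects need a managed identity, which needs a location.
        body["identity"] = {"type": "SystemAssigned"}
        body["location"] = identity_location
    return url, body


def send_assignment(url, body, bearer_token, timeout=30):
    """Execute the PUT and return the parsed response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending lets the scope, name, and parameter handling be tested without touching a real subscription.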
- Intent Identification Support (New Requirement):
  - What it is: Functionality or guidance to help the LLM determine whether the user's goal is to audit/prevent non-compliant resources (using effects like `Audit`, `Deny`) or to remediate existing ones (using effects like `DeployIfNotExists`, `Modify`).
  - Why it's important: Selecting the correct policy effect is critical for achieving the user's desired outcome and correctly structuring the `policyRule`.
  - How it works: This might involve:
    - Enhancing the `create_policy_prompt` (Feature 9) to explicitly ask the LLM to clarify intent with the user.
    - Potentially adding a tool to analyze a draft policy definition and suggest appropriate effects based on keywords or structure.
    - Providing guidance (e.g., in a rule/resource) on common use cases for different effects.
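One possible shape for the keyword-based suggestion mentioned above is sketched here. The keyword lists, the function names, and the fallback of returning no suggestion (so that the LLM asks the user) are all hypothetical. The mapping from intent to effects follows the audit/deny vs. remediation split described in this section.

```python
import re

# Hypothetical keyword sets, checked in this order.
INTENT_KEYWORDS = {
    "remediate": {"remediate", "remediation", "fix", "repair", "automatically"},
    "deny": {"deny", "block", "prevent", "disallow"},
    "audit": {"audit", "report", "monitor", "flag"},
}

EFFECTS_BY_INTENT = {
    "audit": ["Audit", "AuditIfNotExists"],
    "deny": ["Deny"],
    "remediate": ["DeployIfNotExists", "Modify"],
}


def classify_intent(request_text):
    """Return "audit", "deny", "remediate", or None when nothing matches."""
    words = set(re.findall(r"[a-z]+", request_text.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None


def suggest_effects(request_text):
    """Return (intent, candidate effects); an empty list means "ask the user"."""
    intent = classify_intent(request_text)
    return intent, EFFECTS_BY_INTENT.get(intent, [])
```

A request such as "Enforce HTTPS on App Services" matches none of the keywords, which is the case where clarifying with the user matters most.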
- (Optional Enhancement) `azure_policy_schema` Resource:
  - What it does: Exposes the official Azure Policy JSON schema via an MCP resource URI (e.g., `schema://azurepolicy`).
  - Why it's important: Allows the LLM client to fetch the schema definition directly, enabling it to better understand the target structure before attempting generation.
  - How it works: This is already implemented in the current version.
- (Optional Enhancement) `azure_resource_types` Resource:
  - What it does: Provides a list of common Azure resource provider namespaces and types via an MCP resource URI (e.g., `types://azure/resources`).
  - Why it's important: Helps the LLM use the correct identifiers for resource types within policy rules.
- (Optional Enhancement) `create_policy_prompt` Prompt:
  - What it does: Defines a reusable MCP prompt template to guide the LLM in the policy creation process, including intent clarification.
  - Why it's important: Standardizes the workflow for the LLM, prompting it to use the available tools and resources effectively (e.g., "Generate a policy for {resource_type} to enforce {requirement}. Clarify if this should audit/block new resources or remediate existing ones. Use the policy fetching tools for examples and `verify_policy_structure` before finalizing. Use `deploy_policy_assignment` to deploy.").
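The example wording above translates directly into a small template function. The sketch below shows only the string construction. In a `FastMCP` server such a function would presumably be registered as a prompt through the SDK's prompt decorator, which is omitted here so the snippet runs without the SDK installed.

```python
# Template text taken from the example wording in this document.
PROMPT_TEMPLATE = (
    "Generate a policy for {resource_type} to enforce {requirement}. "
    "Clarify if this should audit/block new resources or remediate existing "
    "ones. Use the policy fetching tools for examples and "
    "verify_policy_structure before finalizing. "
    "Use deploy_policy_assignment to deploy."
)


def create_policy_prompt(resource_type: str, requirement: str) -> str:
    """Fill the template with the resource type and the rule to enforce."""
    return PROMPT_TEMPLATE.format(
        resource_type=resource_type, requirement=requirement
    )


if __name__ == "__main__":
    print(create_policy_prompt("Microsoft.Web/sites", "HTTPS-only traffic"))
```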
User Experience
The primary "user" of this MCP server is the LLM client application. The interaction flow is as follows:
1. End-user requests a custom Azure policy from the LLM application (e.g., "Create a policy to enforce HTTPS on App Services").
2. LLM application interacts with the Azure Policy MCP Server.
3. LLM calls `get_builtin_policies` (optionally with a query) to find relevant policy categories.
4. LLM calls `get_policies_in_category` for a chosen category path (optionally with a query) to find relevant policy files.
5. LLM calls `get_policy_content` using the download URL for one or more policies to get examples.
6. LLM may call `read_resource` on `schema://azurepolicy` (if implemented) to understand the required structure.
7. LLM interacts with the user (potentially guided by `create_policy_prompt`) to clarify intent (audit/block vs. remediate) and determine the appropriate policy effect.
8. LLM generates the custom policy JSON based on the user request, clarified intent, and retrieved examples/schema.
9. LLM should call `verify_policy_structure` (once enabled) with the generated JSON.
10. If valid, the LLM confirms the deployment scope and parameters with the user.
11. LLM calls `deploy_policy_assignment` with the validated policy definition and assignment details.
12. LLM presents the outcome (success or failure with details) of the deployment to the end-user.
13. If validation (step 9) or deployment (step 11) fails, the LLM uses the error feedback to correct the policy/parameters and re-validates/re-attempts deployment.
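The generate, validate, deploy, and retry portion of this flow can be sketched as a small loop. The three callables stand in for the LLM's generation step and the `verify_policy_structure` and `deploy_policy_assignment` tools. Their signatures, the function name, and the attempt limit are assumptions made for illustration. User confirmation before deployment is not modelled.

```python
def run_policy_workflow(generate, validate, deploy, max_attempts=3):
    """Drive generation, validation and deployment with error feedback.

    generate(feedback) -> policy JSON string (feedback is None on first try)
    validate(policy)   -> list of error strings (empty when valid)
    deploy(policy)     -> (ok, detail) tuple
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        policy = generate(feedback)
        errors = validate(policy)
        if errors:
            # Feed validation errors back into the next generation attempt.
            feedback = "validation failed: " + "; ".join(errors)
            continue
        ok, detail = deploy(policy)
        if ok:
            return {"status": "deployed", "attempts": attempt, "detail": detail}
        # Feed the deployment error back as well.
        feedback = "deployment failed: " + detail
    return {"status": "failed", "attempts": max_attempts, "detail": feedback}
```

Bounding the number of attempts keeps a policy that cannot be repaired from looping indefinitely; the final feedback string is what would be shown to the end-user.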
Technical Architecture
- Core Framework: Python `mcp-sdk` using `FastMCP`.
- Server Components:
  - `server.py`: Main FastMCP application definition; registers tools (`get_builtin_policies`, `get_policies_in_category`, `get_policy_content`, `deploy_policy_assignment` (new), and the disabled `verify_policy_structure`). Loads schema for validation.
  - `schemas/policyDefinition.json`: Locally stored copy of the official Azure Policy JSON schema file (used by the disabled `verify_policy_structure`).
  - (New) Module/logic for handling Azure authentication (e.g., using the `azure-identity` library with environment variables for credentials such as a Service Principal). Secure credential handling is crucial.
- Data Models:
  - Input/Output for MCP tools (defined by function signatures and type hints in `server.py`).
  - Azure Policy JSON structure (handled as dict/string).
  - Azure Policy Schema JSON structure (loaded by `jsonschema` in `server.py`).
- APIs and Integrations:
  - MCP protocol interface exposed by `FastMCP`.
  - GitHub API via `requests` (used by policy fetching tools).
  - (New) Azure REST API via `requests`/`httpx` (used by `deploy_policy_assignment`).
- Key Libraries: `mcp[cli]`, `requests`, `jsonschema`, (New) `azure-identity` (recommended for auth).
- Infrastructure: Python 3.x environment. Requires secure configuration of Azure credentials (e.g., environment variables, managed identity if hosted in Azure).
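For the environment-variable approach to credentials, a startup check along these lines could report configuration problems early without logging secrets. The variable names are the ones `azure-identity` reads for a service principal with a client secret. The function names are hypothetical, and a managed-identity deployment would not need these variables at all.

```python
import os

# Read by azure-identity's EnvironmentCredential for a service principal
# that authenticates with a client secret.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_credential_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def describe_credentials(env=None):
    """Return a log-safe summary that never includes the secret itself."""
    env = os.environ if env is None else env
    missing = missing_credential_vars(env)
    if missing:
        return "missing: " + ", ".join(missing)
    return (
        f"service principal {env['AZURE_CLIENT_ID']} "
        f"in tenant {env['AZURE_TENANT_ID']}"
    )


if __name__ == "__main__":
    print(describe_credentials())
```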
Development Roadmap
- Phase 1: MVP (Core Functionality)
  - Set up basic `FastMCP` server project structure (`server.py`).
  - Obtain and store the official Azure Policy JSON schema in `schemas/`.
  - Implement and fix the `verify_policy_structure` tool using `jsonschema` and the stored schema. (Currently blocked/disabled)
  - Implement `get_builtin_policies`, `get_policies_in_category`, `get_policy_content` using the GitHub API (`requests`).
  - Basic unit tests for all tools.
  - Basic `README.md` explaining setup and usage.
- Phase 2: Enhancements & Reliability
  - Implement the `azure_policy_schema` MCP Resource.
  - Implement the `azure_resource_types` MCP Resource (requires curating this list).
  - Implement the `create_policy_prompt` MCP Prompt, including guidance for intent clarification.
  - Implement a mechanism to check for/update the schema file.
  - Improve error handling and logging within tools (timeouts added, further improvements possible).
  - Expand test coverage.
- Phase 3: Deployment & Advanced Features (New)
  - Implement robust Azure authentication mechanism (e.g., using `azure-identity` with Service Principal or Managed Identity).
  - Implement the `deploy_policy_assignment` tool, handling different scopes and parameters.
  - Add logic/tooling to support intent identification (audit/deny vs. remediate) if simple prompting is insufficient.
  - Implement support for assigning policies that require Managed Identities (for `DeployIfNotExists`/`Modify`).
  - Further enhance test coverage, including integration tests for deployment (requires careful setup).
Logical Dependency Chain
1. Establish base Python project with `mcp-sdk`.
2. Acquire and integrate the Azure Policy JSON schema (`schemas/policyDefinition.json`).
3. Implement GitHub API data access tools (`get_builtin_policies`, `get_policies_in_category`, `get_policy_content`).
4. Implement and fix `verify_policy_structure` (depends on schema). (Blocked)
5. Implement optional Resources.
6. Implement optional Prompts (including intent clarification).
7. Implement Azure Authentication.
8. Implement `deploy_policy_assignment` (depends on validation and auth).
9. Implement reliability features (schema updates).
Focus was initially on getting policy retrieval working via the GitHub API. The next focus is resolving the validation issue, followed by implementing deployment capabilities.
Current Status & Next Steps
- Status: The MCP server is running. The tools for discovering policy categories (`get_builtin_policies`), listing policies within categories (`get_policies_in_category`), and fetching policy content (`get_policy_content`) using the GitHub API are implemented and functional. Basic error handling and timeouts are included. The `verify_policy_structure` tool is implemented but currently commented out due to runtime errors/validation logic issues. The schema file (`schemas/policyDefinition.json`) is present.
- Next Action Plan:
  - Debug and Fix `verify_policy_structure`: Uncomment the tool in `server.py` and diagnose the `jsonschema` validation errors. Ensure it correctly validates policy JSON against the `schemas/policyDefinition.json` file.
  - Add Basic README: Create a `README.md` explaining how to set up the environment, run the server, and use the available tools.
  - Unit Tests: Begin adding unit tests, starting with the currently functional policy fetching tools.
Risks and Mitigations
- Risk: `verify_policy_structure` tool proves difficult to fix or requires significant schema adjustments.
  - Mitigation: Deep dive into `jsonschema` documentation and the specific validation errors. Compare the downloaded schema against Azure documentation examples. If necessary, seek simpler validation approaches initially.
- (New) Risk: Handling Azure authentication securely and reliably is complex.
  - Mitigation: Use standard libraries like `azure-identity`. Follow best practices for credential management (environment variables, Key Vault, Managed Identity). Clearly document setup requirements.
- (New) Risk: The Azure Policy Assignments REST API has nuances (scopes, parameters, identity management for remediation).
  - Mitigation: Start with simpler assignment scenarios (e.g., Audit policies at Resource Group scope). Incrementally add support for parameters and different scopes. Refer extensively to the Azure REST API documentation. Implement thorough error handling for API responses.
- (New) Risk: LLM might misinterpret user intent regarding policy effect (audit/deny vs. remediation).
  - Mitigation: Design clear prompts (`create_policy_prompt`). Provide examples. Log interactions to identify common LLM mistakes. Consider adding explicit checks or a dedicated analysis tool if prompting alone is insufficient.
- (New) Risk: Deploying incorrect policies or assignments can have negative impacts on the Azure environment.
  - Mitigation: Emphasize the importance of the `verify_policy_structure` step. Implement checks within `deploy_policy_assignment` for required parameters. Encourage user confirmation before deployment. Advise testing in non-production environments.
- Risk: GitHub API rate limits or downtime affecting policy fetching tools.
  - Mitigation: Current implementation uses unauthenticated requests (potential for lower rate limits). Document this limitation. Consider adding optional GitHub token support for authenticated requests (higher limits) later. Implement sensible retry logic if transient errors become common. Cache results briefly if needed.
- Risk: Keeping the locally stored schema (`schemas/policyDefinition.json`) up-to-date requires effort.
  - Mitigation: Add a periodic check/manual update process to the backlog (Phase 2). Document the source and date of the current schema.
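For the GitHub rate-limit risk listed above, optional token support and retry logic could be sketched as follows. The helper names, the retry count, and the exponential backoff schedule are assumptions. Network errors from both `urllib` and `requests` derive from `OSError`, which is why that is the exception caught here.

```python
import time


def github_headers(token=None):
    """Build request headers; a token switches to authenticated requests."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the delay after each failure.

    The last failure is re-raised so the caller still sees the real error.
    `sleep` is injectable so tests do not have to wait.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A tool would wrap its HTTP call, for example `with_retries(lambda: fetch_listing(url))`, where `fetch_listing` is whatever function performs the request.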
Appendix
- Azure Policy Documentation
- Azure Policy GitHub Repository
- (New) Azure Policy Assignments REST API
- MCP Python SDK Documentation (Referenced from user-provided `README.md`)
- JSON Schema Specification
- Requests Library
- jsonschema Library
- (New) azure-identity Library