Maintenance Configurations
Overview
Azure Maintenance Configurations provide automated patch management for VMs through scheduled maintenance windows. This implementation uses dynamic scope assignments to automatically target VMs based on filters (tags, resource groups, locations), eliminating the need to explicitly list individual VMs.
The system integrates Event Grid, Automation Runbooks, and Azure Resource Graph to orchestrate pre- and post-maintenance workflows, including automated VM power state management. VM discovery within runbooks is performed dynamically at runtime using Azure Resource Graph queries based on maintenance correlation IDs.
Module Structure
The patch automation system consists of several integrated modules:
- maintenance_configuration - Defines maintenance schedules, patching rules, and recurrence patterns
- maintenance_assignment_dynamic_scope - Automatically targets VMs using tag-based filters (no explicit VM lists required)
- eventgrid_system_topic - Creates Event Grid topics that emit pre/post maintenance events
- eventgrid_event_subscription - Routes maintenance events to automation webhooks with configurable event types
- automation_account - Provides system-assigned managed identity for secure runbook execution
- automation_runbook - PowerShell scripts using Azure Resource Graph for dynamic VM discovery and power management
- automation_webhook - HTTP endpoints that receive Event Grid events and trigger runbooks
Note: This automation requires a custom RBAC role with specific permissions for VM power management and tagging.
Architecture Overview
The patch automation workflow operates as follows:
- Maintenance Configuration - Defines when patching occurs (schedule, duration, recurrence, patch classifications)
- Dynamic Scope Assignments - Azure automatically discovers target VMs at runtime using tag filters (e.g., PatchGroup = "1")
- Event Grid System Topics - Emit events when maintenance windows start and complete
- Event Grid Event Subscriptions - Route PreMaintenanceEvent and PostMaintenanceEvent to webhooks
- Automation Webhooks - Receive events with maintenance correlation IDs and trigger runbooks
- Automation Runbooks - Execute PowerShell scripts under the automation account's managed identity. The scripts:
  - Parse Event Grid payloads to extract correlation IDs
  - Query Azure Resource Graph to dynamically find VMs in the maintenance window
  - Start or stop the matching VMs and set or remove the AutomationPoweredOn tag
  - Require a custom RBAC role with VM power management and tagging permissions
Usage
1. Configure Patch Group Tagging on VMs
First, enable patch group tagging on your Windows VMs:
windows_vms = {
hsw = {
names = ["azwu2nhsw001", "azwu2nhsw002"]
patch_group_tagging = true # Enable patch group tagging
patch_group_tag = "PatchGroup" # Tag key (default: "PatchGroup")
# ... other VM configuration
}
}
This will automatically tag VMs with:
- PatchGroup = "1" for VMs ending in odd numbers (001, 003, etc.)
- PatchGroup = "2" for VMs ending in even numbers (002, 004, etc.)
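The odd/even rule above can be sketched in a few lines of Python for checking which group a given VM name would land in. This is an illustration only: it assumes the module derives the group from the trailing digits of the VM name, and `patch_group_for` is a hypothetical helper, not part of the module.

```python
import re


def patch_group_for(vm_name: str) -> str:
    """Return "1" for an odd trailing number and "2" for an even one."""
    match = re.search(r"(\d+)$", vm_name)
    if match is None:
        raise ValueError(f"VM name has no trailing number: {vm_name!r}")
    return "1" if int(match.group(1)) % 2 == 1 else "2"


if __name__ == "__main__":
    for name in ["azwu2nhsw001", "azwu2nhsw002"]:
        print(name, "->", f'PatchGroup = "{patch_group_for(name)}"')
```

Running a VM inventory through a check like this before applying can confirm that the two groups are roughly balanced.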
2. Create Maintenance Configurations
Define maintenance schedules in your terraform.tfvars:
maintenance_configurations = {
patch_tuesday = {
resource_group = "hsw"
window = {
start_date_time = "2026-02-10 02:00" # Second Tuesday of the month
duration = "03:00" # 3 hour window
recur_every = "1Month Second Tuesday" # Monthly recurrence on Second Tuesday
}
install_patches = {
reboot = "IfRequired"
windows = [{
classifications_to_include = ["Critical", "Security", "UpdateRollup", "ServicePack"]
}]
}
}
patch_thursday = {
resource_group = "hsw"
window = {
start_date_time = "2026-02-12 02:00" # Second Thursday of the month
duration = "03:00"
recur_every = "1Month Second Thursday"
}
install_patches = {
reboot = "IfRequired"
windows = [{
classifications_to_include = ["Critical", "Security", "UpdateRollup"]
}]
}
}
}
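Because start_date_time and recur_every are set independently, it can be worth checking that the start date actually falls on the weekday the recurrence names. The following Python sketch does that for the "1Month Second Tuesday" shape used here; `weekday_ordinal` and `matches_recurrence` are hypothetical helpers, and other recurrence shapes Azure accepts (daily, weekly, "Last" weekday, day-of-month) are not handled.

```python
from datetime import datetime

ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_ordinal(start_date_time: str) -> tuple:
    """Return (ordinal, weekday) for a 'YYYY-MM-DD HH:MM' timestamp."""
    moment = datetime.strptime(start_date_time, "%Y-%m-%d %H:%M")
    # Days 1-7 are the first occurrence of their weekday, 8-14 the second, etc.
    ordinal = ORDINALS[(moment.day - 1) // 7]
    return ordinal, WEEKDAYS[moment.weekday()]


def matches_recurrence(start_date_time: str, recur_every: str) -> bool:
    """Check the start date against e.g. '1Month Second Tuesday'."""
    _interval, ordinal, weekday = recur_every.split()
    return weekday_ordinal(start_date_time) == (ordinal, weekday)


if __name__ == "__main__":
    print(weekday_ordinal("2026-02-10 02:00"))
    print(matches_recurrence("2026-02-10 02:00", "1Month Second Tuesday"))
```

A mismatch does not necessarily break the schedule, but it makes the first run harder to predict, so flagging it early is cheap insurance.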
3. Create Dynamic Scope Assignments
Create dynamic assignments that automatically target VMs based on filters:
dynamic_scope_assignments = {
patch_group_1 = {
maintenance_configuration = "patch_tuesday"
tag_filters = [
{
tag = "PatchGroup"
values = ["1"]
},
{
tag = "environment"
values = ["dev"]
}
]
}
patch_group_2 = {
maintenance_configuration = "patch_thursday"
tag_filters = [
{
tag = "PatchGroup"
values = ["2"]
},
{
tag = "environment"
values = ["dev"]
}
]
}
}
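Azure dynamic scopes can combine several tag filters with either an "All" or an "Any" operator, and the result differs noticeably when two filters are listed as above. Which operator this module applies is not stated here, so the Python sketch below simply simulates both; `vm_matches` is a hypothetical helper and the comparison is a simplified exact match.

```python
def vm_matches(vm_tags, tag_filters, operator="All"):
    """Return True when the VM's tags satisfy the filters under the operator."""
    results = [vm_tags.get(f["tag"]) in f["values"] for f in tag_filters]
    if operator == "All":
        return all(results)
    if operator == "Any":
        return any(results)
    raise ValueError(f"unknown operator: {operator!r}")


# Same shape as the patch_group_1 tag_filters in the example above.
FILTERS = [
    {"tag": "PatchGroup", "values": ["1"]},
    {"tag": "environment", "values": ["dev"]},
]

if __name__ == "__main__":
    prod_vm = {"PatchGroup": "1", "environment": "prod"}
    print("All:", vm_matches(prod_vm, FILTERS, "All"))
    print("Any:", vm_matches(prod_vm, FILTERS, "Any"))
```

Under "Any", a production VM tagged PatchGroup = "1" would be picked up by the dev assignment, so it is worth confirming the operator in effect before relying on the environment filter.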
4. Create Automation Account
Create an automation account with a system-assigned managed identity:
automation_account = {
maintenance = {
resource_group = "hsw"
sku_name = "Basic"
identity = {
type = "SystemAssigned"
}
tags = {}
}
}
5. Create Automation Runbooks
Define PowerShell runbooks that respond to maintenance events:
automation_runbook = {
pre_maintenance_power_on = {
automation_account = "maintenance"
runbook_type = "PowerShell"
description = "Pre-maintenance task - Power on VMs"
content = <<-EOT
param(
[object]$WebhookData
)
# Parse correlation ID from webhook input
if ($WebhookData -is [string]) {
# Direct string input for manual runs/testing
$CorrelationId = $WebhookData
} else {
# Event Grid webhook - RequestBody is a JSON string
$events = $WebhookData.RequestBody | ConvertFrom-Json
$CorrelationId = $events[0].data.CorrelationId
}
Write-Output "Starting pre-maintenance power on for correlation ID: $CorrelationId"
# Connect to Azure and get access token
Connect-AzAccount -Identity | Out-Null
$resource = "https://management.azure.com/"
$tokenAuthUri = $env:IDENTITY_ENDPOINT + "?resource=$resource&api-version=2019-08-01"
$tokenResponse = Invoke-RestMethod -Headers @{"X-IDENTITY-HEADER" = $env:IDENTITY_HEADER } -Method GET -Uri $tokenAuthUri
$accessToken = $tokenResponse.access_token
# Query for powered off VMs in this maintenance run
$Query = @"
maintenanceresources
| where properties.correlationId =~ '$CorrelationId'
| where type =~ 'microsoft.maintenance/applyupdates'
| extend targetResourceId=tostring(properties.resourceId)
| extend targetResourceIdLower=tolower(targetResourceId)
| join kind=inner (
resources
| where type =~ 'microsoft.compute/virtualmachines'
| extend powerState = tostring(properties.extended.instanceView.powerState.code)
| extend idLower = tolower(id)
) on `$left.targetResourceIdLower == `$right.idLower
| where powerState !~ 'PowerState/running'
| extend vmName = tostring(split(targetResourceId, '/')[-1])
| extend vmResourceGroup = tostring(split(targetResourceId, '/')[4])
| project id=targetResourceId, name=vmName, resourceGroup=vmResourceGroup
"@
$argUrl = "https://management.azure.com/providers/Microsoft.ResourceGraph/resources?api-version=2021-03-01"
$body = @{ query = $Query } | ConvertTo-Json
$response = Invoke-RestMethod -Method POST -Uri $argUrl -Headers @{ Authorization = "Bearer $accessToken" } -ContentType "application/json" -Body $body
# Retrieve VMs from response
$VMs = @($response.data)
if (-not $VMs) {
Write-Output "No powered off VMs found for this maintenance run"
exit
}
Write-Output "Found $($VMs.Count) powered off VMs to start"
$Timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
# Start all VMs
foreach ($VM in $VMs) {
Write-Output "Starting VM: $($VM.name) in $($VM.resourceGroup)"
Start-AzVM -ResourceGroupName $VM.resourceGroup -Name $VM.name -NoWait
}
# Tag all VMs to track they were powered on by automation
foreach ($VM in $VMs) {
Update-AzTag -ResourceId $VM.id -Tag @{AutomationPoweredOn = $Timestamp} -Operation Merge
Write-Output "Tagged $($VM.name) with AutomationPoweredOn=$Timestamp"
}
Write-Output "Pre-maintenance power on completed"
EOT
}
post_maintenance_power_off = {
automation_account = "maintenance"
runbook_type = "PowerShell"
description = "Post-maintenance task - Power off VMs that were powered on by automation"
content = <<-EOT
param(
[object]$WebhookData
)
# Parse correlation ID from webhook input
if ($WebhookData -is [string]) {
# Direct string input for manual runs/testing
$CorrelationId = $WebhookData
} else {
# Event Grid webhook - RequestBody is a JSON string
$events = $WebhookData.RequestBody | ConvertFrom-Json
$CorrelationId = $events[0].data.CorrelationId
}
Write-Output "Starting post-maintenance power off for correlation ID: $CorrelationId"
# Connect to Azure and get access token
Connect-AzAccount -Identity | Out-Null
$resource = "https://management.azure.com/"
$tokenAuthUri = $env:IDENTITY_ENDPOINT + "?resource=$resource&api-version=2019-08-01"
$tokenResponse = Invoke-RestMethod -Headers @{"X-IDENTITY-HEADER" = $env:IDENTITY_HEADER } -Method GET -Uri $tokenAuthUri
$accessToken = $tokenResponse.access_token
# Query for running VMs with AutomationPoweredOn tag in this maintenance run
$Query = @"
maintenanceresources
| where properties.correlationId =~ '$CorrelationId'
| where type =~ 'microsoft.maintenance/applyupdates'
| extend targetResourceId=tostring(properties.resourceId)
| extend targetResourceIdLower=tolower(targetResourceId)
| join kind=inner (
resources
| where type =~ 'microsoft.compute/virtualmachines'
| extend powerState = tostring(properties.extended.instanceView.powerState.code)
| extend automationPoweredOn = tostring(tags.AutomationPoweredOn)
| extend idLower = tolower(id)
) on `$left.targetResourceIdLower == `$right.idLower
| where powerState =~ 'PowerState/running'
| where isnotempty(automationPoweredOn)
| extend vmName = tostring(split(targetResourceId, '/')[-1])
| extend vmResourceGroup = tostring(split(targetResourceId, '/')[4])
| project id=targetResourceId, name=vmName, resourceGroup=vmResourceGroup
"@
$argUrl = "https://management.azure.com/providers/Microsoft.ResourceGraph/resources?api-version=2021-03-01"
$body = @{ query = $Query } | ConvertTo-Json
$response = Invoke-RestMethod -Method POST -Uri $argUrl -Headers @{ Authorization = "Bearer $accessToken" } -ContentType "application/json" -Body $body
# Retrieve VMs from response
$VMs = @($response.data)
if (-not $VMs) {
Write-Output "No running VMs with AutomationPoweredOn tag found for this maintenance run"
exit
}
Write-Output "Found $($VMs.Count) VMs to power off"
# Stop all VMs
foreach ($VM in $VMs) {
Write-Output "Stopping VM: $($VM.name) in $($VM.resourceGroup)"
Stop-AzVM -ResourceGroupName $VM.resourceGroup -Name $VM.name -Force -NoWait
}
# Remove AutomationPoweredOn tag from all VMs
foreach ($VM in $VMs) {
Update-AzTag -ResourceId $VM.id -Tag @{AutomationPoweredOn = ""} -Operation Delete
Write-Output "Removed AutomationPoweredOn tag from $($VM.name)"
}
Write-Output "Post-maintenance power off completed"
EOT
}
}
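Both runbooks begin by extracting the correlation ID from either a plain string (manual runs) or an Event Grid webhook payload. The same branching is sketched below in Python so the parsing can be exercised offline. The sample payload is an assumption that mirrors only the fields the runbooks read (RequestBody, data.CorrelationId), and the correlation ID value is a placeholder.

```python
import json


def extract_correlation_id(webhook_data):
    """Mirror the runbook logic: accept a raw string or a webhook payload."""
    if isinstance(webhook_data, str):
        # Manual run or test: the correlation ID is passed directly.
        return webhook_data
    # Event Grid webhook: RequestBody is a JSON string holding a list of events.
    events = json.loads(webhook_data["RequestBody"])
    if isinstance(events, dict):
        events = [events]
    return events[0]["data"]["CorrelationId"]


if __name__ == "__main__":
    sample = {
        "RequestBody": json.dumps([
            {
                "eventType": "Microsoft.Maintenance.PreMaintenanceEvent",
                "data": {"CorrelationId": "example-correlation-id"},
            }
        ])
    }
    print(extract_correlation_id(sample))
    print(extract_correlation_id("example-correlation-id"))
```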
6. Create Automation Webhooks
Create webhooks that will be called by Event Grid:
automation_webhook = {
pre_maintenance_webhook = {
automation_account = "maintenance"
runbook = "pre_maintenance_power_on"
expiry_time = "2036-01-25T00:00:00Z"
enabled = true
}
post_maintenance_webhook = {
automation_account = "maintenance"
runbook = "post_maintenance_power_off"
expiry_time = "2036-01-25T00:00:00Z"
enabled = true
}
}
7. Create Event Grid System Topic
Create an Event Grid system topic for the maintenance configuration:
eventgrid_system_topic = {
maintenance_events = {
resource_group = "hsw"
maintenance_configuration = "patch_tuesday" # References your maintenance configuration
topic_type = "Microsoft.Maintenance.MaintenanceConfigurations"
}
}
8. Create Event Grid Event Subscriptions
Subscribe to maintenance events and route them to automation webhooks:
eventgrid_event_subscription = {
pre_maintenance_sub = {
system_topic = "maintenance_events"
event_delivery_schema = "EventGridSchema"
included_event_types = ["Microsoft.Maintenance.PreMaintenanceEvent"]
webhook_endpoint = {
automation_webhook = "pre_maintenance_webhook"
}
}
post_maintenance_sub = {
system_topic = "maintenance_events"
event_delivery_schema = "EventGridSchema"
included_event_types = ["Microsoft.Maintenance.PostMaintenanceEvent"]
webhook_endpoint = {
automation_webhook = "post_maintenance_webhook"
}
}
}
9. Create Custom RBAC Role and Assign to Automation Account
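The automation account's managed identity requires specific permissions to manage VMs. Create a custom Azure role with the following permissions, then assign it to the automation account's system-assigned managed identity at a scope that covers the target VMs: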
Required Permissions:
{
"actions": [
"Microsoft.Compute/virtualMachines/start/action",
"Microsoft.Compute/virtualMachines/powerOff/action",
"Microsoft.Compute/virtualMachines/deallocate/action",
"Microsoft.Compute/virtualMachines/read",
"Microsoft.Resources/tags/write",
"Microsoft.Resources/tags/delete",
"Microsoft.ResourceGraph/resources/read",
"Microsoft.Maintenance/applyUpdates/read",
"Microsoft.Resources/subscriptions/resourceGroups/read"
]
}
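The permission list above is only the actions fragment of a role definition. As a rough sketch, the Python below wraps it in the JSON shape that the Azure CLI accepts for custom roles (Name, IsCustom, Description, Actions, NotActions, AssignableScopes). The role name, description, and subscription ID are placeholders chosen for illustration.

```python
import json

ACTIONS = [
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Compute/virtualMachines/deallocate/action",
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Resources/tags/write",
    "Microsoft.Resources/tags/delete",
    "Microsoft.ResourceGraph/resources/read",
    "Microsoft.Maintenance/applyUpdates/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
]


def build_role_definition(subscription_id, role_name="VM Maintenance Power Operator"):
    """Build a custom role definition document (role name is a placeholder)."""
    return {
        "Name": role_name,
        "IsCustom": True,
        "Description": "Start/stop VMs and manage tags for patch automation",
        "Actions": ACTIONS,
        "NotActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


if __name__ == "__main__":
    # Placeholder subscription ID; replace with your own.
    role = build_role_definition("00000000-0000-0000-0000-000000000000")
    print(json.dumps(role, indent=2))
```

The printed JSON could be saved to a file and passed to `az role definition create --role-definition @role.json`, after which the role is assigned to the automation account's identity.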
Event Flow Diagram
Scheduled Maintenance Window Approaches
↓
Pre-Maintenance Event Emitted by Event Grid (40 minutes prior to scheduled maintenance)
↓
Pre-Maintenance Webhook Triggered
↓
Pre-Maintenance Runbook Executes
↓
Azure Performs Maintenance (Patching)
↓
Post-Maintenance Event Emitted by Event Grid (15 minutes after maintenance completes)
↓
Post-Maintenance Webhook Triggered
↓
Post-Maintenance Runbook Executes
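To make the offsets in the diagram concrete, the Python sketch below computes when each event would be expected for a given window. It assumes the 40-minute and 15-minute figures shown above hold exactly, which real deliveries may not, and the completion time is a made-up input because it depends on how long patching takes.

```python
from datetime import datetime, timedelta

PRE_EVENT_LEAD = timedelta(minutes=40)    # from the diagram above
POST_EVENT_DELAY = timedelta(minutes=15)  # from the diagram above
FMT = "%Y-%m-%d %H:%M"


def event_timeline(window_start: str, maintenance_finished: str) -> dict:
    """Return the expected pre-event, window start, and post-event times."""
    start = datetime.strptime(window_start, FMT)
    finished = datetime.strptime(maintenance_finished, FMT)
    return {
        "pre_event": start - PRE_EVENT_LEAD,
        "window_start": start,
        "post_event": finished + POST_EVENT_DELAY,
    }


if __name__ == "__main__":
    timeline = event_timeline("2026-02-10 02:00", "2026-02-10 03:30")
    for label, moment in timeline.items():
        print(f"{label:>12}: {moment.strftime(FMT)}")
```

The gap between the pre-event and the window start is the budget the power-on runbook has to bring VMs up before patching begins.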
Variable Reference
maintenance_configurations
| Field | Type | Description | Default |
|---|---|---|---|
| resource_group | string | Resource group for the configuration | Required |
| scope | string | Maintenance scope | "InGuestPatch" |
| visibility | string | Configuration visibility | "Custom" |
| window.start_date_time | string | Start date/time in YYYY-MM-DD hh:mm format (e.g., "2026-02-10 02:00") | Required |
| window.duration | string | Maintenance window duration (hh:mm) | "02:00" |
| window.time_zone | string | Time zone for the schedule | Defaults to global timezone |
| window.recur_every | string | Recurrence pattern | Required |
| tags | map(string) | Resource tags | {} |
dynamic_scope_assignments
| Field | Type | Description | Default |
|---|---|---|---|
| maintenance_configuration | string | Maintenance configuration key | Required |
| resource_types | list(string) | Resource types to target | ["microsoft.compute/virtualmachines"] |
| resource_groups | list(string) | Resource groups to include | null (all) |
| locations | list(string) | Azure regions to include | Defaults to global location |
| tag_filters | list(object) | Tag-based filtering | null |
| tag_filters[].tag | string | Tag name to filter by | Required |
| tag_filters[].values | list(string) | Tag values to match | Required |
automation_account
| Field | Type | Description | Default |
|---|---|---|---|
| resource_group | string | Resource group key | Required |
| sku_name | string | SKU name | "Basic" |
| identity.type | string | Managed identity type | "SystemAssigned" |
| tags | map(string) | Resource tags | {} |
automation_runbook
| Field | Type | Description | Default |
|---|---|---|---|
| automation_account | string | Automation account key | Required |
| runbook_type | string | Runbook type | "PowerShell", "PowerShell72", "Python3", etc. |
| log_progress | bool | Enable progress logging | true |
| log_verbose | bool | Enable verbose logging | true |
| description | string | Runbook description | null |
| content | string | Inline PowerShell script content | null |
| publish_content_link | object | External script source | null |
| tags | map(string) | Resource tags | {} |
automation_webhook
| Field | Type | Description | Default |
|---|---|---|---|
| automation_account | string | Automation account key | Required |
| runbook | string | Runbook key | Required |
| expiry_time | string | Webhook expiry (ISO 8601) | "2036-01-25T00:00:00Z" |
| enabled | bool | Enable webhook | true |
| parameters | map(string) | Static parameters to pass to runbook | null |
eventgrid_system_topic
| Field | Type | Description | Default |
|---|---|---|---|
| resource_group | string | Resource group key | Required |
| maintenance_configuration | string | Maintenance configuration key | null |
| source_resource_id | string | Direct Azure resource ID | null |
| topic_type | string | Event Grid topic type | "Microsoft.Maintenance.MaintenanceConfigurations" |
| tags | map(string) | Resource tags | {} |
Note: Use either maintenance_configuration (module reference) or source_resource_id (direct ARM ID).
eventgrid_event_subscription
| Field | Type | Description | Default |
|---|---|---|---|
| system_topic | string | Event Grid system topic key | Required |
| event_delivery_schema | string | Event schema format | "EventGridSchema" |
| included_event_types | list(string) | Event types to subscribe to | null (all) |
| webhook_endpoint.automation_webhook | string | Automation webhook key | null |
| webhook_endpoint.url | string | Direct webhook URL | null |
| labels | list(string) | Subscription labels | null |
Common Event Types:
- Microsoft.Maintenance.PreMaintenanceEvent - Triggered before maintenance starts
- Microsoft.Maintenance.PostMaintenanceEvent - Triggered after maintenance completes
Azure Resource Graph Queries
The automation runbooks use Azure Resource Graph to find the VMs affected by a maintenance event. The core query pattern is shown here:
Find VMs in Maintenance Window
maintenanceresources
| where type == 'microsoft.maintenance/applyupdates'
| where properties.correlationId == '<CORRELATION_ID>'
| project resourceId = tolower(tostring(properties.resourceId))
| join kind=inner (
resources
| where type == 'microsoft.compute/virtualmachines'
| project resourceId = tolower(id), vmName = name, resourceGroup
) on resourceId
| project vmName, resourceGroup
This query:
- Queries the maintenanceresources table for apply updates matching the correlation ID
- Joins with the resources table to get VM details
- Returns VM names and resource groups
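The join only works if both sides compare resource IDs in the same casing, which is why the runbook queries lowercase both IDs before joining. The Python sketch below reproduces that inner join on in-memory sample rows to show the effect. The rows are invented for illustration and carry only the fields the query touches.

```python
def vms_in_maintenance(apply_updates, vms, correlation_id):
    """Inner-join apply-update rows to VM rows on the lowercased resource ID."""
    vms_by_id = {vm["id"].lower(): vm for vm in vms}
    matched = []
    for update in apply_updates:
        props = update["properties"]
        if props["correlationId"].lower() != correlation_id.lower():
            continue
        vm = vms_by_id.get(props["resourceId"].lower())
        if vm is not None:
            matched.append((vm["name"], vm["resourceGroup"]))
    return matched


if __name__ == "__main__":
    base = "/subscriptions/0000/resourceGroups/hsw-rg/providers"
    apply_updates = [{
        "properties": {
            "correlationId": "run-001",
            # Casing differs from the VM row on purpose.
            "resourceId": f"{base}/Microsoft.Compute/virtualMachines/AZWU2NHSW001",
        }
    }]
    vms = [
        {"id": f"{base.lower()}/microsoft.compute/virtualmachines/azwu2nhsw001",
         "name": "azwu2nhsw001", "resourceGroup": "hsw-rg"},
        {"id": f"{base.lower()}/microsoft.compute/virtualmachines/azwu2nhsw002",
         "name": "azwu2nhsw002", "resourceGroup": "hsw-rg"},
    ]
    print(vms_in_maintenance(apply_updates, vms, "run-001"))
```

Dropping either `.lower()` call on the resource IDs makes the sample return an empty list, which is the failure mode to watch for when adapting the query.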
Naming Convention
Resources are named using the standard pattern:
- Maintenance Configuration: {prefix}{key}{suffix} (e.g., prod-patch_tuesday-eastus2-mc)
- Maintenance Assignment: {prefix}{key}{suffix} (e.g., prod-patch_group_1-eastus2-ma)
- Automation Account: {prefix}{key}{suffix} (e.g., prod-maintenance-eastus2-aa)
- Automation Runbook: {prefix}{key}{suffix} (e.g., prod-pre_maintenance-eastus2-runbook)
- Automation Webhook: {prefix}{key}{suffix} (e.g., prod-pre_maintenance-eastus2-webhook)
- Event Grid System Topic: {prefix}{key}{suffix} (e.g., prod-maintenance_events-eastus2-egst)
- Event Grid Event Subscription: {prefix}{key}{suffix} (e.g., prod-pre_maintenance_sub-eastus2-eges)
Add to your name_prefixes and name_suffixes:
name_prefixes = {
maintenance_configuration = "prod-"
maintenance_assignment = "prod-"
automation_account = "prod-"
automation_runbook = "prod-"
automation_webhook = "prod-"
eventgrid_system_topic = "prod-"
eventgrid_event_subscription = "prod-"
}
name_suffixes = {
maintenance_configuration = "-eastus2-mc"
maintenance_assignment = "-eastus2-ma"
automation_account = "-eastus2-aa"
automation_runbook = "-eastus2-runbook"
automation_webhook = "-eastus2-webhook"
eventgrid_system_topic = "-eastus2-egst"
eventgrid_event_subscription = "-eastus2-eges"
}
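The {prefix}{key}{suffix} pattern can be previewed before applying with a short Python sketch like the one below. `resource_name` is a hypothetical helper, and falling back to an empty string when a resource type has no prefix or suffix entry is an assumption about the module's behavior.

```python
NAME_PREFIXES = {
    "maintenance_configuration": "prod-",
    "automation_account": "prod-",
    "eventgrid_system_topic": "prod-",
}

NAME_SUFFIXES = {
    "maintenance_configuration": "-eastus2-mc",
    "automation_account": "-eastus2-aa",
    "eventgrid_system_topic": "-eastus2-egst",
}


def resource_name(resource_type, key, prefixes=NAME_PREFIXES, suffixes=NAME_SUFFIXES):
    """Compose {prefix}{key}{suffix} for one resource type."""
    prefix = prefixes.get(resource_type, "")
    suffix = suffixes.get(resource_type, "")
    return f"{prefix}{key}{suffix}"


if __name__ == "__main__":
    print(resource_name("maintenance_configuration", "patch_tuesday"))
    print(resource_name("automation_account", "maintenance"))
    print(resource_name("eventgrid_system_topic", "maintenance_events"))
```

Previewing names this way helps catch results that exceed Azure's length or character limits for a given resource type.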