Monitor Alerts

Overview

Log alerts are produced by two cooperating modules: monitor_action_group defines who gets notified, and monitor_scheduled_query_rules_alert defines what to alert on by running a KQL query against a Log Analytics Workspace on a schedule. The alert module is built on azurerm_monitor_scheduled_query_rules_alert_v2, the current generation of Log Analytics alert rules.

Metric alerts are a separate kind of alert produced by the monitor_metric_alert module. Where a log alert runs a KQL query against a Log Analytics Workspace, a metric alert evaluates a platform metric emitted directly by the target resource — no workspace and no diagnostic logs required. Use a metric alert when the signal is a first-class Azure metric, such as file share used capacity, which is what drives the WBS file share sizing alert below.

Module Structure

Module	Azure Resource	Purpose
`monitor_action_group`	`azurerm_monitor_action_group`	Notification target — email, SMS, or webhook receivers grouped under one name
`monitor_scheduled_query_rules_alert`	`azurerm_monitor_scheduled_query_rules_alert_v2`	KQL-based log alert evaluated on a schedule against a Log Analytics Workspace
`monitor_metric_alert`	`azurerm_monitor_metric_alert`	Platform-metric alert evaluated directly against a resource — e.g. file share used capacity
`log_analytics_workspace`	`azurerm_log_analytics_workspace`	Workspace the alert queries — referenced by key, see Log Analytics Workspace

Architecture

Action Groups are defined once per environment and reused across alerts. Each group has a short name (used in SMS) and one or more receivers.
Log Analytics Workspaces receive diagnostic logs from monitored resources (Key Vault, Recovery Vault, VMs via VM Insights, etc.) — see Monitor Diagnostic Setting.
Scheduled Query Rules run KQL queries against a workspace on a schedule. When the aggregated query result breaches the threshold in at least the configured number of failing periods, the rule fires and notifies its action groups.
Metric Alerts (monitor_metric_alert) evaluate a platform metric on a storage account (optionally a storage sub-service such as fileServices) and notify the same action groups. They need no workspace and no diagnostic-log routing.

Usage

1. Define At Least One Action Group

At least one action group is needed for notifications. action_groups defaults to an empty list, so an alert without an action group still evaluates but has nowhere to notify.

monitor_action_group = {
  ops-critical = {
    resource_group = "shared"
    short_name     = "opscrit"
    email_receivers = {
      oncall = { email_address = "oncall@example.com" } # placeholder address
    }
  }
  ops-warning = {
    resource_group = "shared"
    short_name     = "opswarn"
    email_receivers = {
      platform = { email_address = "platform@example.com" } # placeholder address
    }
  }
}

2. Define Log Alerts

monitor_scheduled_query_rules_alert accepts arbitrary KQL alerts evaluated on a schedule against a Log Analytics Workspace.

monitor_scheduled_query_rules_alert = {
  epic-failed-logon = {
    resource_group       = "shared"
    scope_workspace      = "shared"
    severity             = 2
    evaluation_frequency = "PT5M"
    window_duration      = "PT15M"
    description          = "More than 10 failed Epic logons per computer in any 5-minute bin of the last 15 minutes"
    criteria = {
      primary = {
        query                   = <<-KQL
          SecurityEvent
          | where EventID == 4625
          | where AccountType == "User"
          | summarize FailedLogons = count() by Computer, bin(TimeGenerated, 5m)
          | where FailedLogons > 10
        KQL
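        # The query already filters down to offending rows,
        # so the rule fires when it returns any row at all.
        operator                = "GreaterThan"
        threshold               = 0
        time_aggregation_method = "Count"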
        failing_periods = {
          minimum_failing_periods_to_trigger_alert = 1
          number_of_evaluation_periods             = 1
        }
      }
    }
    action_groups = ["ops-critical"]
  }
}

3. Define Metric Alerts (Optional)

monitor_metric_alert raises an alert on a platform metric emitted by a storage account. Set storage_service to target a storage sub-service — fileServices scopes the alert to the file service so FileCapacity and the FileShare dimension are available. The example below fires when the ansible-files share exceeds 8 GiB of used capacity, the signal for issue #128 (grow the WBS file share before it fills).

monitor_metric_alert = {
  fileshare-capacity-high = {
    resource_group        = "shared"
    scope_storage_account = "diagsaphdev"
    storage_service       = "fileServices"
    description           = "File share used capacity above 8 GiB - increase the WBS share quota before it fills"
    severity              = 2
    frequency             = "PT15M"
    window_size           = "PT1H"
    criteria = {
      capacity = {
        metric_namespace = "Microsoft.Storage/storageAccounts/fileServices"
        metric_name      = "FileCapacity"
        aggregation      = "Average"
        operator         = "GreaterThan"
        threshold        = 8589934592
        dimensions = {
          share = {
            name     = "FileShare"
            operator = "Include"
            values   = ["ansible-files"]
          }
        }
      }
    }
    action_groups = ["ops-warning"]
  }
}

Note: FileCapacity is reported in bytes, so the threshold above (8589934592) is 8 GiB. The FileShare dimension narrows the alert to a single share; omit the dimension to alert on the file service total instead.
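
The byte value can be checked rather than typed by hand. The sketch below is an illustration only and not part of the module: the local names gib and fileshare_threshold_bytes are hypothetical, and it is meant for a scratch configuration or terraform console. Because tfvars files accept only literal values, the computed number still has to be pasted into threshold.

```hcl
# Scratch configuration for checking the threshold; not part of the module.
locals {
  gib                       = pow(1024, 3)  # bytes in one GiB (1073741824)
  fileshare_threshold_bytes = 8 * local.gib # 8 GiB expressed in bytes
}

output "fileshare_threshold_bytes" {
  value = local.fileshare_threshold_bytes # 8589934592
}
```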

Variable Reference

`monitor_action_group`

Map of action group entries. Each key becomes the action group name (with prefix/suffix applied unless name is set).

Field	Type	Description	Default
`name`	string	Override the action group name	`{prefix}{key}{suffix}`
`resource_group`	string	`rgs` key for the containing resource group	Required
`short_name`	string	1–12 character identifier used in SMS messages	Required
`enabled`	bool	Whether receivers are notified when alerts fire	`true`
`email_receivers`	map	Email receivers — see below	`{}`
`sms_receivers`	map	SMS receivers — see below	`{}`
`webhook_receivers`	map	Webhook receivers — see below	`{}`
`tags`	map(string)	Additional tags merged with `default_tags`	`{}`

`email_receivers`

Field	Type	Description	Default
`email_address`	string	Email address to notify	Required
`use_common_alert_schema`	bool	Use the Common Alert Schema payload format	`true`

`sms_receivers`

Field	Type	Description	Default
`country_code`	string	Numeric country code without `+` (e.g. `"1"`)	Required
`phone_number`	string	Subscriber number, digits only	Required

`webhook_receivers`

Field	Type	Description	Default
`service_uri`	string	HTTPS endpoint that receives the webhook POST	Required
`use_common_alert_schema`	bool	Use the Common Alert Schema payload format	`true`
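
The usage section only shows email receivers, so the following is a minimal sketch of an action group that combines the SMS and webhook fields from the tables above. The key ops-pager, the receiver keys, the phone number, and the webhook URL are all hypothetical placeholders.

```hcl
monitor_action_group = {
  ops-pager = {
    resource_group = "shared"
    short_name     = "opspager" # 8 characters, within the 1-12 limit

    sms_receivers = {
      # Placeholder number: country code without "+", digits only.
      oncall-phone = { country_code = "1", phone_number = "5555550100" }
    }

    webhook_receivers = {
      # Placeholder endpoint; must be an HTTPS URL that accepts POSTs.
      chatops = {
        service_uri             = "https://example.com/hooks/azure-alerts"
        use_common_alert_schema = true
      }
    }
  }
}
```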

`monitor_scheduled_query_rules_alert`

Map of alert rule entries. Each key becomes the alert rule name (with prefix/suffix applied unless name is set).

Field	Type	Description	Default
`name`	string	Override the alert rule name	`{prefix}{key}{suffix}`
`display_name`	string	Human-readable display name shown in the Azure portal	`null`
`resource_group`	string	`rgs` key for the containing resource group	Required
`location`	string	Azure region for the alert rule	Inherits root `location`
`scope_workspace`	string	`log_analytics_workspace` key the KQL query runs against	Required
`severity`	number	`0`–`4`, `0` is most severe	Required
`evaluation_frequency`	string	ISO 8601 duration — how often the query runs. One of `PT1M`, `PT5M`, `PT10M`, `PT15M`, `PT30M`, `PT45M`, `PT1H`, `PT2H`, `PT3H`, `PT4H`, `PT5H`, `PT6H`, `P1D`	Required
`window_duration`	string	ISO 8601 duration — the time window each evaluation covers (bin size)	Required
`description`	string	Free-text description shown in the portal and alert payload	`null`
`enabled`	bool	Whether the alert rule is active	`true`
`auto_mitigation_enabled`	bool	Auto-resolve the alert when the condition clears	`true`
`action_groups`	list(string)	List of `monitor_action_group` keys to notify	`[]`
`criteria`	map	One or more criteria blocks — see below	Required
`tags`	map(string)	Additional tags merged with `default_tags`	`{}`

`criteria`

A single criteria block is enough for most alerts. Multiple criteria entries are AND-ed together by Azure Monitor.

Field	Type	Description	Default
`query`	string	KQL query to evaluate against the workspace	Required
`operator`	string	One of `Equal`, `GreaterThan`, `GreaterThanOrEqual`, `LessThan`, `LessThanOrEqual`	Required
`threshold`	number	Value the result is compared against	Required
`time_aggregation_method`	string	One of `Average`, `Count`, `Maximum`, `Minimum`, `Total`	Required
`metric_measure_column`	string	Column holding the metric value. Required when `time_aggregation_method` is not `Count`.	`null`
`resource_id_column`	string	Column holding a resource ID — enables per-resource alert routing	`null`
`dimensions`	map	Filter the result set by dimension values — see below	`{}`
`failing_periods`	object	How many of the most recent evaluation periods must fail before the alert fires (the failing periods need not be consecutive); see below	`null`

`dimensions`

Field	Type	Description	Default
`name`	string	Dimension column name from the query result	Required
`operator`	string	`Include` or `Exclude`	Required
`values`	list(string)	Dimension values to include or exclude. Use `["*"]` for all.	Required

`failing_periods`

Field	Type	Description	Default
`minimum_failing_periods_to_trigger_alert`	number	Number of failing periods required to fire (1–6). Must not exceed `number_of_evaluation_periods`.	Required
`number_of_evaluation_periods`	number	Total number of look-back periods to consider (1–6)	Required

Note: window_duration * number_of_evaluation_periods cannot exceed 48 hours.
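
As an illustration of the criteria fields above working together, here is a sketch of a rule that aggregates a numeric column instead of counting rows. It assumes VM Insights is populating the InsightsMetrics table (with its Namespace, Name, Val, and Computer columns) in the shared workspace. The key vm-cpu-high and the 90 percent threshold are hypothetical. The look-back here is 4 periods of PT15M, which is 60 minutes and well inside the 48-hour limit.

```hcl
monitor_scheduled_query_rules_alert = {
  vm-cpu-high = {
    resource_group       = "shared"
    scope_workspace      = "shared"
    severity             = 2
    evaluation_frequency = "PT5M"
    window_duration      = "PT15M"
    description          = "Average CPU above 90% on a VM in 3 of the last 4 windows"
    criteria = {
      cpu = {
        # Return raw samples; the rule itself averages the Val column.
        query = <<-KQL
          InsightsMetrics
          | where Namespace == "Processor" and Name == "UtilizationPercentage"
        KQL
        operator                = "GreaterThan"
        threshold               = 90
        time_aggregation_method = "Average"
        metric_measure_column   = "Val" # required because the method is not Count
        dimensions = {
          # Evaluate each computer separately.
          computer = { name = "Computer", operator = "Include", values = ["*"] }
        }
        failing_periods = {
          minimum_failing_periods_to_trigger_alert = 3 # 3 failures ...
          number_of_evaluation_periods             = 4 # ... out of the last 4 windows
        }
      }
    }
    action_groups = ["ops-warning"]
  }
}
```

Requiring 3 of 4 periods trades a slower first notification for fewer alerts on short CPU spikes.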

`monitor_metric_alert`

Map of metric alert entries. Each key becomes the alert name (with prefix/suffix applied unless name is set).

Field	Type	Description	Default
`name`	string	Override the alert name	`{prefix}{key}{suffix}`
`resource_group`	string	`rgs` key for the resource group that holds the alert	Required
`scope_storage_account`	string	`storage_accounts` key the alert is scoped to	Required
`storage_service`	string	Storage sub-service to scope to (e.g. `fileServices`, `blobServices`). When set, the scope becomes `{storage account id}/{storage_service}/default`; when null the scope is the storage account itself.	`null`
`description`	string	Free-text description shown in the portal and alert payload	`null`
`severity`	number	`0`–`4`, `0` is most severe	`3`
`frequency`	string	ISO 8601 duration — how often the metric is evaluated. One of `PT1M`, `PT5M`, `PT15M`, `PT30M`, `PT1H`	`PT1M`
`window_size`	string	ISO 8601 duration — the window each evaluation aggregates over. Must be greater than `frequency`. One of `PT1M`, `PT5M`, `PT15M`, `PT30M`, `PT1H`, `PT6H`, `PT12H`, `P1D`	`PT5M`
`enabled`	bool	Whether the alert is active	`true`
`auto_mitigate`	bool	Auto-resolve the alert when the condition clears	`true`
`criteria`	map	One or more static criteria blocks — see below	Required
`action_groups`	list(string)	`monitor_action_group` keys to notify	`[]`
`tags`	map(string)	Additional tags merged with `default_tags`	`{}`
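
The storage_service rule above can be pictured with a small sketch. This is not the module's actual code, which is not shown here. The local names and the placeholder subscription ID and resource group are assumptions for illustration.

```hcl
locals {
  # Hypothetical storage account ID with a placeholder subscription and group.
  storage_account_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Storage/storageAccounts/diagsaphdev"
  storage_service    = "fileServices" # set to null to scope to the account itself

  # Append "/{storage_service}/default" only when a sub-service is requested.
  metric_alert_scope = (
    local.storage_service == null
    ? local.storage_account_id
    : "${local.storage_account_id}/${local.storage_service}/default"
  )
}

output "metric_alert_scope" {
  value = local.metric_alert_scope # ends in ".../storageAccounts/diagsaphdev/fileServices/default"
}
```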

`criteria`

A single criteria block is enough for most alerts. Multiple criteria entries are AND-ed together by Azure Monitor.

Field	Type	Description	Default
`metric_namespace`	string	Metric namespace to monitor (e.g. `Microsoft.Storage/storageAccounts/fileServices`)	Required
`metric_name`	string	Metric name to monitor (e.g. `FileCapacity`)	Required
`aggregation`	string	One of `Average`, `Count`, `Minimum`, `Maximum`, `Total`	Required
`operator`	string	One of `Equals`, `GreaterThan`, `GreaterThanOrEqual`, `LessThan`, `LessThanOrEqual`	Required
`threshold`	number	Value that activates the alert	Required
`skip_metric_validation`	bool	Skip metric validation — needed only for custom metrics not yet emitted	`false`
`dimensions`	map	Filter the metric by dimension values — see below	`{}`

`dimensions`

Field	Type	Description	Default
`name`	string	Dimension name (e.g. `FileShare`)	Required
`operator`	string	`Include`, `Exclude`, or `StartsWith`	Required
`values`	list(string)	Dimension values to match. Use `["*"]` for all.	Required

Naming Convention

Names are composed from the prefix/suffix maps unless name is set on the entry. Both maps must include entries for monitor_action_group, monitor_scheduled_query_rules_alert, and monitor_metric_alert.

name_prefixes = {
  monitor_action_group                = "prod-"
  monitor_scheduled_query_rules_alert = "prod-"
  monitor_metric_alert                = "prod-"
}

name_suffixes = {
  monitor_action_group                = "-eastus2-ag"
  monitor_scheduled_query_rules_alert = "-eastus2-sqr"
  monitor_metric_alert                = "-eastus2-mal"
}

For an action group keyed ops-critical, the resolved name is prod-ops-critical-eastus2-ag. A log alert keyed epic-failed-logon becomes prod-epic-failed-logon-eastus2-sqr. A metric alert keyed fileshare-capacity-high becomes prod-fileshare-capacity-high-eastus2-mal.
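
The naming rule can be expressed as a short sketch. How the module implements it internally is not shown in this document, so the local names, the legacy entry, and its custom name are hypothetical. Only the prefix, suffix, and ops-critical key come from the example above.

```hcl
locals {
  name_prefixes = { monitor_action_group = "prod-" }
  name_suffixes = { monitor_action_group = "-eastus2-ag" }

  monitor_action_group = {
    ops-critical = { name = null }             # no override: composed name
    legacy       = { name = "custom-ag-name" } # hypothetical override: used as-is
  }

  # coalesce returns the first non-null argument, so an explicit name wins.
  action_group_names = {
    for key, entry in local.monitor_action_group :
    key => coalesce(
      entry.name,
      "${local.name_prefixes.monitor_action_group}${key}${local.name_suffixes.monitor_action_group}"
    )
  }
}

output "action_group_names" {
  # ops-critical => "prod-ops-critical-eastus2-ag", legacy => "custom-ag-name"
  value = local.action_group_names
}
```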

Cross-Module Dependencies

monitor_scheduled_query_rules_alert references three other modules by key:

resource_group → key into rgs
scope_workspace → key into log_analytics_workspace
action_groups → list of keys into monitor_action_group

monitor_metric_alert references three other modules by key:

resource_group → key into rgs
scope_storage_account → key into storage_accounts
action_groups → list of keys into monitor_action_group

Each module resolves these keys to resource IDs internally, so never put raw resource IDs into the tfvars.
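
Key resolution of this kind usually amounts to a map lookup. The sketch below is an assumption about the general pattern, not the module's actual code: the maps of IDs stand in for outputs of the other modules, and every name, subscription ID, and resource group in it is a placeholder.

```hcl
locals {
  # Hypothetical stand-ins for IDs exported by the other modules, keyed as in tfvars.
  workspace_ids = {
    shared = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.OperationalInsights/workspaces/example-law"
  }
  action_group_ids = {
    ops-critical = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Insights/actionGroups/prod-ops-critical-eastus2-ag"
  }

  # The keys an alert entry would carry in tfvars.
  alert = {
    scope_workspace = "shared"
    action_groups   = ["ops-critical"]
  }

  # Each key is swapped for the matching resource ID; an unknown key fails at plan time.
  resolved_scopes           = [local.workspace_ids[local.alert.scope_workspace]]
  resolved_action_group_ids = [for key in local.alert.action_groups : local.action_group_ids[key]]
}
```

Keeping keys rather than IDs in tfvars means a typo surfaces as a lookup error during planning instead of a rejected resource ID at apply time.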