Monitor Alerts

Overview

Log alerts are produced by two cooperating modules: monitor_action_group defines who gets notified, and monitor_scheduled_query_rules_alert defines what to alert on by running a KQL query against a Log Analytics Workspace on a schedule. The alert module is built on azurerm_monitor_scheduled_query_rules_alert_v2, the current generation of Log Analytics alert rules.

Metric alerts are a separate kind of alert produced by the monitor_metric_alert module. Where a log alert runs a KQL query against a Log Analytics Workspace, a metric alert evaluates a platform metric emitted directly by the target resource — no workspace and no diagnostic logs required. Use a metric alert when the signal is a first-class Azure metric, such as file share used capacity, which is what drives the WBS file share sizing alert below.

Module Structure

| Module | Azure Resource | Purpose |
| --- | --- | --- |
| monitor_action_group | azurerm_monitor_action_group | Notification target: email, SMS, or webhook receivers grouped under one name |
| monitor_scheduled_query_rules_alert | azurerm_monitor_scheduled_query_rules_alert_v2 | KQL-based log alert evaluated on a schedule against a Log Analytics Workspace |
| monitor_metric_alert | azurerm_monitor_metric_alert | Platform-metric alert evaluated directly against a resource, e.g. file share used capacity |
| log_analytics_workspace | azurerm_log_analytics_workspace | Workspace the alert queries; referenced by key, see Log Analytics Workspace |

Architecture

  1. Action Groups are defined once per environment and reused across alerts. Each group has a short name (used in SMS) and one or more receivers.
  2. Log Analytics Workspaces receive diagnostic logs from monitored resources (Key Vault, Recovery Vault, VMs via VM Insights, etc.) — see Monitor Diagnostic Setting.
  3. Scheduled Query Rules run KQL queries against a workspace on a schedule. When the query returns rows meeting the threshold for the configured number of failing periods, the rule fires and notifies its action groups.
  4. Metric Alerts (monitor_metric_alert) evaluate a platform metric on a storage account (optionally a storage sub-service such as fileServices) and notify the same action groups. They need no workspace and no diagnostic-log routing.

Usage

1. Define At Least One Action Group

Define at least one action group. An alert that references no action group still fires, but it notifies no one.

monitor_action_group = {
  ops-critical = {
    resource_group = "shared"
    short_name     = "opscrit"
    email_receivers = {
      oncall = { email_address = "[email protected]" }
    }
  }
  ops-warning = {
    resource_group = "shared"
    short_name     = "opswarn"
    email_receivers = {
      platform = { email_address = "[email protected]" }
    }
  }
}

2. Define Log Alerts

monitor_scheduled_query_rules_alert accepts arbitrary KQL alerts evaluated on a schedule against a Log Analytics Workspace.

monitor_scheduled_query_rules_alert = {
  epic-failed-logon = {
    resource_group       = "shared"
    scope_workspace      = "shared"
    severity             = 2
    evaluation_frequency = "PT5M"
    window_duration      = "PT15M"
    description          = "More than 10 failed Epic logons on one computer within a 5-minute bin"
    criteria = {
      primary = {
        query                   = <<-KQL
          SecurityEvent
          | where EventID == 4625
          | where AccountType == "User"
          | summarize FailedLogons = count() by Computer, bin(TimeGenerated, 5m)
          | where FailedLogons > 10
        KQL
        operator                = "GreaterThan"
        threshold               = 0
        time_aggregation_method = "Count"
        failing_periods = {
          minimum_failing_periods_to_trigger_alert = 1
          number_of_evaluation_periods             = 1
        }
      }
    }
    action_groups = ["ops-critical"]
  }
}

3. Define Metric Alerts (Optional)

monitor_metric_alert raises an alert on a platform metric emitted by a storage account. Set storage_service to target a storage sub-service — fileServices scopes the alert to the file service so FileCapacity and the FileShare dimension are available. The example below fires when the ansible-files share exceeds 8 GiB of used capacity, the signal for issue #128 (grow the WBS file share before it fills).

monitor_metric_alert = {
  fileshare-capacity-high = {
    resource_group        = "shared"
    scope_storage_account = "diagsaphdev"
    storage_service       = "fileServices"
    description           = "File share used capacity above 8 GiB - increase the WBS share quota before it fills"
    severity              = 2
    frequency             = "PT15M"
    window_size           = "PT1H"
    criteria = {
      capacity = {
        metric_namespace = "Microsoft.Storage/storageAccounts/fileServices"
        metric_name      = "FileCapacity"
        aggregation      = "Average"
        operator         = "GreaterThan"
        threshold        = 8589934592
        dimensions = {
          share = {
            name     = "FileShare"
            operator = "Include"
            values   = ["ansible-files"]
          }
        }
      }
    }
    action_groups = ["ops-warning"]
  }
}

Note: FileCapacity is reported in bytes, so the threshold above (8589934592) is 8 GiB. The FileShare dimension narrows the alert to a single share; omit the dimension to alert on the file service total instead.
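As a sketch of the second case in the note, the entry below leaves out the dimensions block so the criterion evaluates the file service total rather than one share. The key fileservice-capacity-high, the severity, and the 100 GiB threshold are hypothetical values chosen for illustration; the other fields reuse the example above.

```hcl
monitor_metric_alert = {
  # Hypothetical entry: alerts on used capacity summed across all shares
  fileservice-capacity-high = {
    resource_group        = "shared"
    scope_storage_account = "diagsaphdev"
    storage_service       = "fileServices"
    description           = "File service used capacity above 100 GiB"
    severity              = 3
    frequency             = "PT15M"
    window_size           = "PT1H"
    criteria = {
      capacity = {
        metric_namespace = "Microsoft.Storage/storageAccounts/fileServices"
        metric_name      = "FileCapacity"
        aggregation      = "Average"
        operator         = "GreaterThan"
        # 100 GiB in bytes: 100 * 1024 * 1024 * 1024 = 107374182400
        threshold        = 107374182400
        # No dimensions block, so the metric is evaluated for the whole file service
      }
    }
    action_groups = ["ops-warning"]
  }
}
```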

Variable Reference

monitor_action_group

Map of action group entries. Each key becomes the action group name (with prefix/suffix applied unless name is set).

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| name | string | Override the action group name | {prefix}{key}{suffix} |
| resource_group | string | rgs key for the containing resource group | Required |
| short_name | string | 1–12 character identifier used in SMS messages | Required |
| enabled | bool | Whether receivers are notified when alerts fire | true |
| email_receivers | map | Email receivers (see below) | {} |
| sms_receivers | map | SMS receivers (see below) | {} |
| webhook_receivers | map | Webhook receivers (see below) | {} |
| tags | map(string) | Additional tags merged with default_tags | {} |

email_receivers

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| email_address | string | Email address to notify | Required |
| use_common_alert_schema | bool | Use the Common Alert Schema payload format | true |

sms_receivers

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| country_code | string | Numeric country code without + (e.g. "1") | Required |
| phone_number | string | Subscriber number, digits only | Required |

webhook_receivers

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| service_uri | string | HTTPS endpoint that receives the webhook POST | Required |
| use_common_alert_schema | bool | Use the Common Alert Schema payload format | true |
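The three receiver maps can be combined in one action group. The following is a minimal sketch using only the fields listed above; the receiver keys, email address, phone number, and webhook URL are placeholders, not values from any real environment.

```hcl
monitor_action_group = {
  ops-critical = {
    resource_group = "shared"
    short_name     = "opscrit" # 1-12 characters, shown in SMS messages

    email_receivers = {
      oncall = {
        email_address           = "oncall@example.com" # placeholder address
        use_common_alert_schema = true
      }
    }

    sms_receivers = {
      oncall-phone = {
        country_code = "1"          # no leading +
        phone_number = "5555550100" # placeholder, digits only
      }
    }

    webhook_receivers = {
      pager = {
        service_uri             = "https://hooks.example.com/azure-alerts" # placeholder endpoint
        use_common_alert_schema = true
      }
    }
  }
}
```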

monitor_scheduled_query_rules_alert

Map of alert rule entries. Each key becomes the alert rule name (with prefix/suffix applied unless name is set).

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| name | string | Override the alert rule name | {prefix}{key}{suffix} |
| display_name | string | Human-readable display name shown in the Azure portal | null |
| resource_group | string | rgs key for the containing resource group | Required |
| location | string | Azure region for the alert rule | Inherits root location |
| scope_workspace | string | log_analytics_workspace key the KQL query runs against | Required |
| severity | number | 0–4, where 0 is most severe | Required |
| evaluation_frequency | string | ISO 8601 duration: how often the query runs. One of PT1M, PT5M, PT10M, PT15M, PT30M, PT45M, PT1H, PT2H, PT3H, PT4H, PT5H, PT6H, P1D | Required |
| window_duration | string | ISO 8601 duration: the time window each evaluation covers (bin size) | Required |
| description | string | Free-text description shown in the portal and alert payload | null |
| enabled | bool | Whether the alert rule is active | true |
| auto_mitigation_enabled | bool | Auto-resolve the alert when the condition clears | true |
| action_groups | list(string) | List of monitor_action_group keys to notify | [] |
| criteria | map | One or more criteria blocks (see below) | Required |
| tags | map(string) | Additional tags merged with default_tags | {} |

criteria

A single criteria block is enough for most alerts. Multiple criteria entries are AND-ed together by Azure Monitor.

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| query | string | KQL query to evaluate against the workspace | Required |
| operator | string | One of Equal, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual | Required |
| threshold | number | Value the result is compared against | Required |
| time_aggregation_method | string | One of Average, Count, Maximum, Minimum, Total | Required |
| metric_measure_column | string | Column holding the metric value. Required when time_aggregation_method is not Count. | null |
| resource_id_column | string | Column holding a resource ID; enables per-resource alert routing | null |
| dimensions | map | Filter the result set by dimension values (see below) | {} |
| failing_periods | object | How many of the most recent evaluation periods must fail before the alert fires (see below) | null |

dimensions

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| name | string | Dimension column name from the query result | Required |
| operator | string | Include or Exclude | Required |
| values | list(string) | Dimension values to include or exclude. Use ["*"] for all. | Required |

failing_periods

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| minimum_failing_periods_to_trigger_alert | number | Number of failing periods required to fire (1–6) | Required |
| number_of_evaluation_periods | number | Total number of look-back periods to consider (1–6) | Required |

Note: window_duration * number_of_evaluation_periods cannot exceed 48 hours.
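To show how metric_measure_column, dimensions, and failing_periods fit together, here is a hedged sketch of a measurement-style log alert. It assumes the workspace collects the Perf table with the "% Processor Time" counter, which this document does not confirm; the key vm-cpu-high, the criteria key, and the 90% threshold are hypothetical. The look-back is 3 periods of PT5M, or 15 minutes, well inside the 48-hour limit.

```hcl
monitor_scheduled_query_rules_alert = {
  # Hypothetical entry: average CPU per computer, 2 failing periods out of the last 3
  vm-cpu-high = {
    resource_group       = "shared"
    scope_workspace      = "shared"
    severity             = 3
    evaluation_frequency = "PT5M"
    window_duration      = "PT5M"
    description          = "Average CPU above 90% in 2 of the last 3 five-minute windows"
    criteria = {
      cpu = {
        # Assumes the Perf table is populated in this workspace
        query                   = <<-KQL
          Perf
          | where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
          | summarize AvgCpu = avg(CounterValue) by Computer, _ResourceId
        KQL
        operator                = "GreaterThan"
        threshold               = 90
        time_aggregation_method = "Average"
        metric_measure_column   = "AvgCpu"      # required because the method is not Count
        resource_id_column      = "_ResourceId" # routes the alert per resource
        dimensions = {
          computer = {
            name     = "Computer"
            operator = "Include"
            values   = ["*"] # evaluate every computer separately
          }
        }
        failing_periods = {
          minimum_failing_periods_to_trigger_alert = 2
          number_of_evaluation_periods             = 3
        }
      }
    }
    action_groups = ["ops-warning"]
  }
}
```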

monitor_metric_alert

Map of metric alert entries. Each key becomes the alert name (with prefix/suffix applied unless name is set).

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| name | string | Override the alert name | {prefix}{key}{suffix} |
| resource_group | string | rgs key for the resource group that holds the alert | Required |
| scope_storage_account | string | storage_accounts key the alert is scoped to | Required |
| storage_service | string | Storage sub-service to scope to (e.g. fileServices, blobServices). When set, the scope becomes {storage account id}/{storage_service}/default; when null the scope is the storage account itself. | null |
| description | string | Free-text description shown in the portal and alert payload | null |
| severity | number | 0–4, where 0 is most severe | 3 |
| frequency | string | ISO 8601 duration: how often the metric is evaluated. One of PT1M, PT5M, PT15M, PT30M, PT1H | PT1M |
| window_size | string | ISO 8601 duration: the window each evaluation aggregates over. Must be greater than frequency. One of PT1M, PT5M, PT15M, PT30M, PT1H, PT6H, PT12H, P1D | PT5M |
| enabled | bool | Whether the alert is active | true |
| auto_mitigate | bool | Auto-resolve the alert when the condition clears | true |
| criteria | map | One or more static criteria blocks (see below) | Required |
| action_groups | list(string) | monitor_action_group keys to notify | [] |
| tags | map(string) | Additional tags merged with default_tags | {} |
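The scope rule described for storage_service can be expressed as a single conditional. The sketch below is illustrative only and is not taken from the module source: the variable names storage_account_id and storage_service and the local metric_alert_scope are assumptions, and the real module resolves the ID from the storage_accounts key rather than taking it as input.

```hcl
# Hypothetical inputs for illustration
variable "storage_account_id" {
  type = string
}

variable "storage_service" {
  type    = string
  default = null
}

locals {
  # null             -> scope is the storage account itself
  # e.g. fileServices -> scope is {storage account id}/fileServices/default
  metric_alert_scope = (
    var.storage_service == null
    ? var.storage_account_id
    : "${var.storage_account_id}/${var.storage_service}/default"
  )
}

output "metric_alert_scope" {
  value = local.metric_alert_scope
}
```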

criteria

A single criteria block is enough for most alerts. Multiple criteria entries are AND-ed together by Azure Monitor.

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| metric_namespace | string | Metric namespace to monitor (e.g. Microsoft.Storage/storageAccounts/fileServices) | Required |
| metric_name | string | Metric name to monitor (e.g. FileCapacity) | Required |
| aggregation | string | One of Average, Count, Minimum, Maximum, Total | Required |
| operator | string | One of Equals, GreaterThan, GreaterThanOrEqual, LessThan, LessThanOrEqual | Required |
| threshold | number | Value that activates the alert | Required |
| skip_metric_validation | bool | Skip metric validation; needed only for custom metrics not yet emitted | false |
| dimensions | map | Filter the metric by dimension values (see below) | {} |

dimensions

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| name | string | Dimension name (e.g. FileShare) | Required |
| operator | string | Include, Exclude, or StartsWith | Required |
| values | list(string) | Dimension values to match. Use ["*"] for all. | Required |

Naming Convention

Names are composed from the prefix/suffix maps unless name is set on the entry. Both maps must include entries for monitor_action_group, monitor_scheduled_query_rules_alert, and monitor_metric_alert.

name_prefixes = {
  monitor_action_group                = "prod-"
  monitor_scheduled_query_rules_alert = "prod-"
  monitor_metric_alert                = "prod-"
}

name_suffixes = {
  monitor_action_group                = "-eastus2-ag"
  monitor_scheduled_query_rules_alert = "-eastus2-sqr"
  monitor_metric_alert                = "-eastus2-mal"
}

For an action group keyed ops-critical, the resolved name is prod-ops-critical-eastus2-ag. A log alert keyed epic-failed-logon becomes prod-epic-failed-logon-eastus2-sqr. A metric alert keyed fileshare-capacity-high becomes prod-fileshare-capacity-high-eastus2-mal.
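Setting name on an entry bypasses the prefix/suffix composition for that entry only. The sketch below contrasts the two cases under the prefix and suffix maps shown above; the key legacy-oncall, the name ag-legacy-oncall, and the email addresses are hypothetical placeholders.

```hcl
monitor_action_group = {
  # No name set: resolves to prod-ops-critical-eastus2-ag
  ops-critical = {
    resource_group = "shared"
    short_name     = "opscrit"
    email_receivers = {
      oncall = { email_address = "oncall@example.com" } # placeholder address
    }
  }

  # name set: the override is used instead of {prefix}{key}{suffix}
  legacy-oncall = {
    name           = "ag-legacy-oncall" # hypothetical pre-existing name
    resource_group = "shared"
    short_name     = "legacy"
    email_receivers = {
      legacy = { email_address = "legacy@example.com" } # placeholder address
    }
  }
}
```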

Cross-Module Dependencies

monitor_scheduled_query_rules_alert references three other modules by key:

  • resource_group → key into rgs
  • scope_workspace → key into log_analytics_workspace
  • action_groups → list of keys into monitor_action_group

monitor_metric_alert references three other modules by key:

  • resource_group → key into rgs
  • scope_storage_account → key into storage_accounts
  • action_groups → list of keys into monitor_action_group

The module resolves these to resource IDs internally — never put raw IDs into the tfvars.
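The outline below sketches how the keys line up across tfvars maps, using the keys from the examples in this document. The bodies of rgs, log_analytics_workspace, and storage_accounts are deliberately left empty because their fields are documented elsewhere; real entries need the settings those modules require, and the alert entries are shown with only their cross-reference fields.

```hcl
rgs = {
  shared = {
    # resource group settings omitted
  }
}

log_analytics_workspace = {
  shared = {
    # workspace settings omitted
  }
}

storage_accounts = {
  diagsaphdev = {
    # storage account settings omitted
  }
}

monitor_scheduled_query_rules_alert = {
  epic-failed-logon = {
    resource_group  = "shared"         # key into rgs
    scope_workspace = "shared"         # key into log_analytics_workspace
    action_groups   = ["ops-critical"] # keys into monitor_action_group
    # remaining required fields omitted
  }
}

monitor_metric_alert = {
  fileshare-capacity-high = {
    resource_group        = "shared"        # key into rgs
    scope_storage_account = "diagsaphdev"   # key into storage_accounts
    action_groups         = ["ops-warning"] # keys into monitor_action_group
    # remaining required fields omitted
  }
}
```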