Skip to main content
Article
pagerduty-aiaiopsincident-managementalert-correlationautomationsrenoise-reductiondevops

Automate Incident Triage with PagerDuty AI

Use PagerDuty AI to group related alerts, reduce notification noise, and automate diagnostics. This guide shows how to configure alert grouping and send rich event data that the AI uses to triage incidents and suggest actions.

intermediate30 min5 steps
The play
  1. Enable Intelligent Alert Grouping
    First, enable the core AI feature for noise reduction. In your PagerDuty account, navigate to a Service's configuration page. Go to the 'Settings' tab and find the 'Reduce Noise' section. Select 'Intelligent' for the 'Alert Grouping' option. This allows PagerDuty AI to analyze and group incoming alerts based on content and timing.
  2. Structure Rich Event Payloads
    PagerDuty AI thrives on data. To empower it, send detailed events using the Events API v2. The most important part is the `custom_details` object in your JSON payload. Include key context like `runbook_url`, `affected_component`, or `error_log` to fuel AI-driven triage and diagnostics.
  3. Trigger and Correlate Test Events
    Simulate an event storm to see PagerDuty AI in action. Run the starter script multiple times in quick succession. Slightly vary the `summary` (e.g., 'High CPU on web-prod-02', 'High CPU on web-prod-03') but keep the `source` and `custom_details` similar. This mimics a real-world scenario where multiple nodes fail.
  4. Review AI-Generated Incident Insights
    Open the incident created in your PagerDuty dashboard. In the incident timeline, you will see that PagerDuty AI has grouped your multiple alerts into one incident. Check the 'Related Alerts' tab to see the grouped events. The AI may also add notes about similar past incidents or generate a human-readable status update.
  5. Configure an Automated Diagnostic Action
    Go beyond grouping by automating responses. Navigate to 'Automation' > 'Automation Actions'. Create a new action that runs a script (e.g., via AWS Systems Manager or a webhook) to gather diagnostics. You can then configure rules to have PagerDuty AI automatically run this action when specific incidents occur, attaching the output directly to the incident.
Starter code
import os
import sys
import json
import requests

# Get routing key from environment variable
ROUTING_KEY = os.getenv("PAGERDUTY_ROUTING_KEY")
if not ROUTING_KEY:
    print("Error: PAGERDUTY_ROUTING_KEY environment variable not set.")
    sys.exit(1)

# Get summary from command-line argument or use a default
summary = sys.argv[1] if len(sys.argv) > 1 else "Default Test Event from AI Action Pack"

def send_pagerduty_event(summary_text):
    """Sends a trigger event to the PagerDuty Events API v2."""
    url = "https://events.pagerduty.com/v2/enqueue"
    
    payload = {
        "routing_key": ROUTING_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": summary_text,
            "source": "monitoring-agent.corp.example.com",
            "severity": "critical",
            "component": "webapp-cluster",
            "group": "prod-web-servers",
            "custom_details": {
                "runbook_url": "http://wiki.example.com/cpu-utilization",
                "affected_service": "user-database",
                "cpu_load_percent": 92
            }
        }
    }
    
    headers = {
        "Content-Type": "application/json"
    }
    
    try:
        response = requests.post(url, data=json.dumps(payload), headers=headers, timeout=10)
        response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)
        print(f"Successfully sent event to PagerDuty. Status: {response.status_code}")
        print(f"Response: {response.json()}")
    except requests.exceptions.RequestException as e:
        print(f"Error sending event to PagerDuty: {e}")

if __name__ == "__main__":
    # To run: 
    # 1. Save this as send_pagerduty_event.py
    # 2. Get a routing key from a PagerDuty service integration (Events API V2)
    # 3. export PAGERDUTY_ROUTING_KEY='your_key_here'
    # 4. python send_pagerduty_event.py "Your custom incident summary"
    send_pagerduty_event(summary)
Automate Incident Triage with PagerDuty AI — Action Pack