$120 tested Claude codes · real before/after data · Full tier $15 one-timebuy --sheet=15 →
$Free 40-page Claude guide — setup, 120 prompt codes, MCP servers, AI agents. download --free →
clskills.sh — terminal v2.4 — 2,347 skills indexed● online
[CL]Skills_
Monitoring & Loggingintermediate

Alert Rules

Share

Configure alerting rules and notifications

Works with OpenClaude

You are a monitoring engineer. The user wants to configure alerting rules and notifications to trigger actions when metrics cross thresholds.

What to check first

  • Review your monitoring stack (Prometheus, Grafana, AlertManager, or cloud provider native solution)
  • Run curl http://localhost:9090/api/v1/alerts to see current active alerts in Prometheus
  • Verify notification channels are configured (email, Slack, PagerDuty, webhooks)

Steps

  1. Define alert conditions using PromQL expressions — write queries that return non-zero values when the alert should fire (e.g., up{job="api"} == 0 for down services)
  2. Create alert rules in prometheus.yml or a separate rules file with alert_rules.yml, specifying alert, expr, for, and annotations
  3. Set the for duration to prevent flapping — typically 5m means the condition must persist 5 minutes before firing
  4. Add labels to categorize alerts by severity (critical, warning) and team ownership
  5. Configure annotations with summary and description templates using {{ $labels.instance }} to pass context
  6. Point AlertManager at your rules file and configure global.resolve_timeout (default 5m)
  7. Define notification routes in AlertManager's alertmanager.yml with receiver, group_by, and group_wait settings
  8. Test rule syntax with promtool check rules /path/to/rules.yml before deploying

Code

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'alert_rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'

---
# alert_rules.yml
groups:
  - name: application_alerts
    interval: 15s
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
          team: backend
        annotations:
          summary: "High error rate on {{ $labels.instance }}"
          description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}"

      - alert: ServiceDown
        expr: up{job="api"} == 0
        for: 1m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "{{ $labels.instance }} is down"
          description: "Service {{ $labels.job }} on {{ $labels.instance }} has been unreachable for 1 minute"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_Mem

Note: this example was truncated in the source. See the GitHub repo for the latest full version.

Common Pitfalls

  • Treating this skill as a one-shot solution — most workflows need iteration and verification
  • Skipping the verification steps — you don't know it worked until you measure
  • Applying this skill without understanding the underlying problem — read the related docs first

When NOT to Use This Skill

  • When a simpler manual approach would take less than 10 minutes
  • On critical production systems without testing in staging first
  • When you don't have permission or authorization to make these changes

How to Verify It Worked

  • Run the verification steps documented above
  • Compare the output against your expected baseline
  • Check logs for any warnings or errors — silent failures are the worst kind

Production Considerations

  • Test in staging before deploying to production
  • Have a rollback plan — every change should be reversible
  • Monitor the affected systems for at least 24 hours after the change

Quick Info

Difficultyintermediate
Version1.0.0
AuthorClaude Skills Hub
monitoringalertsnotifications

Install command:

curl -o ~/.claude/skills/alert-rules.md https://claude-skills-hub.vercel.app/skills/monitoring/alert-rules.md

Related Monitoring & Logging Skills

Other Claude Code skills in the same category — free to download.

Want a Monitoring & Logging skill personalized to YOUR project?

This is a generic skill that works for everyone. Our AI can generate one tailored to your exact tech stack, naming conventions, folder structure, and coding patterns — with 3x more detail.