Summary  
This chapter explains how to detect predefined conditions on metrics (such as thresholds, anomalies, missed heartbeats, or composite rules) and automatically send notifications through various channels.

General domain of usage  
System monitoring and incident response

Alerting and notifications are essential features in DevOps that help you stay informed about the health and performance of your systems. **Alerting** means automatically detecting when something goes wrong or unusual in your environment, such as a server going down or a spike in error rates. **Notifications** are the messages you receive—by email, chat, or other channels—when an alert is triggered.

These tools are important for proactive system monitoring because they allow you to spot problems before they impact users or cause major outages. By setting up alerts and notifications, you can:

- Detect issues early, such as performance drops or service interruptions;
- Respond quickly to incidents, minimizing downtime and user impact;
- Prioritize which problems need immediate attention;
- Improve overall system reliability and user satisfaction.

Effective alerting and notification systems help your team act fast, resolve problems efficiently, and maintain trust in your services.

### Common Types of Alerts

When setting up observability in DevOps, you will encounter several main types of alerts:

- **Threshold alerts**: Triggered when a metric crosses a set value, such as CPU usage above 80%; 
- **Anomaly alerts**: Triggered when a metric behaves unusually compared to its normal pattern, like a sudden spike in error rates;
- **Heartbeat alerts**: Triggered when a system or service fails to send a regular signal, indicating potential downtime;
- **Composite alerts**: Triggered by a combination of conditions, such as high memory usage and slow response times at the same time.
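Each of these alert types boils down to a condition evaluated against metric data. The sketch below expresses them as simple Python predicates, assuming metrics arrive as plain numbers and timestamps as seconds. The function names, the default limits (90% memory, 500 ms latency), and the z-score method used for anomaly detection are illustrative assumptions, not the API of any particular monitoring tool. Production anomaly detection usually also accounts for trends and daily or weekly seasonality, which this sketch ignores.

```python
from statistics import mean, stdev
from typing import Sequence

# Illustrative alert predicates. Names, default limits, and the z-score
# anomaly method are assumptions for this sketch, not a specific tool's API.


def threshold_alert(value: float, limit: float) -> bool:
    """Threshold alert: fire when the metric rises above a fixed limit."""
    return value > limit


def anomaly_alert(history: Sequence[float], value: float, z_limit: float = 3.0) -> bool:
    """Anomaly alert: fire when the value lies more than `z_limit` standard
    deviations from the mean of recent history (a simple z-score test)."""
    if len(history) < 2:
        return False  # too little data to estimate the normal pattern
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # flat history: any change is unusual
    return abs(value - mu) / sigma > z_limit


def heartbeat_alert(last_seen: float, now: float, timeout: float) -> bool:
    """Heartbeat alert: fire when no signal has arrived for more than `timeout` seconds."""
    return now - last_seen > timeout


def composite_alert(memory_pct: float, latency_ms: float,
                    memory_limit: float = 90.0, latency_limit: float = 500.0) -> bool:
    """Composite alert: fire only when high memory usage AND slow responses coincide."""
    return threshold_alert(memory_pct, memory_limit) and threshold_alert(latency_ms, latency_limit)


if __name__ == "__main__":
    print(threshold_alert(85.0, 80.0))                                # True: CPU 85% > 80%
    print(anomaly_alert([1.0, 2.0, 1.5, 1.8, 1.2], 9.0))              # True: sudden spike
    print(heartbeat_alert(last_seen=100.0, now=200.0, timeout=60.0))  # True: 100 s of silence
    print(composite_alert(memory_pct=95.0, latency_ms=300.0))         # False: latency is fine
```

The composite rule reuses the threshold predicate, which keeps each condition testable on its own while still requiring both to hold before anyone is notified.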

### Notification Channels

Once an alert is triggered, you need to notify the right people. Common notification channels include:

- **Email**: Sends alerts to an inbox for tracking and escalation;
- **SMS**: Sends urgent alerts directly to a mobile device;
- **Chat platforms**: Sends alerts to tools like Slack or Microsoft Teams for quick team response;
- **Incident management tools**: Integrates with platforms like PagerDuty or Opsgenie for automated incident handling.
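Routing an alert to these channels often comes down to a mapping from severity to channels plus one sender function per channel. The following is a minimal sketch using only the Python standard library. The alert dictionary shape, severity names, routing table, email addresses, SMTP host, and webhook URL are all hypothetical placeholders. SMS and incident-management integrations such as PagerDuty or Opsgenie would be additional sender functions written against those services' own APIs; they are not shown here.

```python
import json
import smtplib
import urllib.request
from email.message import EmailMessage
from typing import Callable, Dict, List

# Hypothetical alert shape: {"name": ..., "severity": ..., "message": ...}
Alert = Dict[str, str]
Sender = Callable[[Alert], None]

# Hypothetical routing table: which channels each severity is sent to.
ROUTES: Dict[str, List[str]] = {
    "critical": ["sms", "chat", "email"],
    "warning": ["chat"],
    "info": ["email"],
}


def send_email(alert: Alert, smtp_host: str = "localhost",
               to_addr: str = "oncall@example.com") -> None:
    """Email channel: deliver the alert through an SMTP server (placeholder host and addresses)."""
    msg = EmailMessage()
    msg["Subject"] = f"[{alert['severity'].upper()}] {alert['name']}"
    msg["From"] = "alerts@example.com"
    msg["To"] = to_addr
    msg.set_content(alert["message"])
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def send_chat_webhook(alert: Alert,
                      url: str = "https://hooks.example.com/placeholder") -> None:
    """Chat channel: POST JSON to an incoming-webhook URL (placeholder URL).
    Slack incoming webhooks, for example, accept a {"text": ...} body."""
    body = json.dumps({"text": f"{alert['name']}: {alert['message']}"}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10):
        pass  # urlopen raises HTTPError (an OSError) on error status codes


def dispatch(alert: Alert, senders: Dict[str, Sender],
             routes: Dict[str, List[str]] = ROUTES) -> List[str]:
    """Send the alert to every channel routed for its severity and return the
    channels that succeeded, so failed deliveries can be retried or escalated."""
    delivered: List[str] = []
    for channel in routes.get(alert["severity"], ["email"]):  # unknown severity: email
        sender = senders.get(channel)
        if sender is None:
            continue  # channel not configured in this environment
        try:
            sender(alert)
            delivered.append(channel)
        except OSError:
            # smtplib and urllib network errors both subclass OSError;
            # a real system would log this and retry.
            continue
    return delivered


if __name__ == "__main__":
    # Record calls instead of sending, so the routing can be checked offline.
    sent: List[Alert] = []
    fake_senders: Dict[str, Sender] = {"chat": sent.append, "email": sent.append}
    alert = {"name": "HighCPU", "severity": "warning", "message": "CPU at 92%"}
    print(dispatch(alert, fake_senders))  # ['chat']
```

Passing the senders in as a dictionary keeps `dispatch` easy to test: in production you would pass something like `{"email": send_email, "chat": send_chat_webhook}`, while in tests you can substitute functions that only record the calls.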

### Best Practices for Setting Thresholds

To reduce noise and ensure important issues are addressed, follow these best practices:

- Set thresholds based on historical data, not just default values;
- Adjust thresholds to minimize false positives and avoid alert fatigue, where so many alerts fire that responders start ignoring them;
- Use different thresholds for different times (such as business hours vs. off-hours);
- Regularly review and tune thresholds as systems and workloads change;
- Always test alerts to confirm they trigger as expected and reach the right people.
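Several of these practices can be sketched in a few lines of Python. The example below assumes you have a list of historical samples for one metric. The 95th-percentile cut, the 10% headroom, the Monday-to-Friday 09:00-18:00 business-hours window, and the three-consecutive-samples rule are hypothetical starting values to tune, not recommendations from any specific tool. Requiring several consecutive breaches is one common way to reduce false positives; Prometheus alerting rules, for instance, offer a `for` duration with a similar purpose.

```python
from datetime import datetime
from statistics import quantiles
from typing import Sequence


def threshold_from_history(history: Sequence[float], pct: int = 95,
                           headroom: float = 1.1) -> float:
    """Base the threshold on past data: the pct-th percentile of history,
    scaled by a headroom factor. Both defaults are assumptions to tune.
    Python versions before 3.13 need at least two samples here."""
    cut_points = quantiles(history, n=100)  # 99 cut points: p1 .. p99
    return cut_points[pct - 1] * headroom


def threshold_for(ts: datetime, business: float, off_hours: float) -> float:
    """Pick a threshold by time: business hours are assumed to be
    Monday-Friday, 09:00-17:59 local time."""
    is_business = ts.weekday() < 5 and 9 <= ts.hour < 18
    return business if is_business else off_hours


class SustainedBreach:
    """Fire only after `required` consecutive samples exceed the threshold,
    so a single spike does not page anyone (fewer false positives)."""

    def __init__(self, required: int = 3) -> None:
        self.required = required
        self.streak = 0  # consecutive samples above the threshold so far

    def update(self, value: float, threshold: float) -> bool:
        self.streak = self.streak + 1 if value > threshold else 0
        return self.streak >= self.required


if __name__ == "__main__":
    # Test the rule by replaying a recorded series and checking when it fires.
    rule = SustainedBreach(required=3)
    samples = [90, 95, 70, 85, 88, 91]
    print([rule.update(v, 80.0) for v in samples])
    # [False, False, False, False, False, True]
```

Replaying recorded samples like this is a cheap way to confirm a rule fires when expected before relying on it; checking that the notification actually reaches the right people still requires an end-to-end test through the real channels.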
