Alerting with Prometheus and AlertManager
How to set up Prometheus AlertManager and get a complete alerting pipeline running.
Objectives and Goals
- Write and Deploy Prometheus Alert Rules
- Configure Prometheus to send Alerts to AlertManager
- Set up AlertManager to receive Prometheus Alerts
- Send a Slack message on Alert
Prerequisite
- Prometheus is already set up and running
Alert Rules
To begin writing and deploying alerts, you'll need to modify your Prometheus config file. It's usually located at /etc/prometheus/prometheus.yml.
If there's no rule_files key at the root of the config, add it. It should look something like:
rule_files:
  - "alert.rules"
This tells Prometheus that there's a rules file located at alert.rules, a path relative to the location of the current config file.
Rules are instructions to Prometheus and come in two flavours: alerting and recording. An alerting rule tells Prometheus to raise an alert if certain conditions are met, while a recording rule precomputes or processes time series as they come in, for example to rewrite labels into a new time series.
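For instance, a recording rule in the same legacy (Prometheus 1.x) rules syntax used throughout this guide could look like the sketch below; http_requests_total stands in for one of your own metrics:

# Recording rule: precompute the per-job 5-minute request rate
# into a new time series named job:http_requests:rate5m
job:http_requests:rate5m = sum by (job) (rate(http_requests_total[5m]))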
With your alert.rules file in place, you can now start writing ALERT expressions that monitor a Prometheus metric and its value.
# Alert Rules
ALERT AppCrash
  IF firehost_value_metric_bbs_crashed_actual_lr_ps > 10
  FOR 15s
  LABELS { severity="critical" }
  ANNOTATIONS {
    summary = "Number of consecutive crashes in the past 30 seconds",
    description = "The number of consecutive crashes in the past 30 seconds is at: {{$value}}. This is dangerous and must be rectified immediately"
  }
The FOR clause indicates how long the condition must hold before the alert fires. This is important if you want to prevent false positives: a longer duration gives a transient failure time to resolve itself before a message is sent out.
Now, let's reload Prometheus by sending a SIGHUP to the process.
$ ps auwx | grep prometheus
$ kill -1 <prometheus_pid>
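If you'd rather not hunt for the PID by hand, the same reload can be done in one line (assuming a single prometheus process is running on the machine):

$ kill -HUP $(pgrep prometheus)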
When you access Prometheus' web UI (http://<your prometheus host>:9090/alerts by default), you should see your new alert rule listed.
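You can also check from the command line: any alert that is currently pending or firing shows up as the synthetic ALERTS time series, so you can query for it (assuming Prometheus is listening on localhost:9090):

$ curl 'http://localhost:9090/api/v1/query?query=ALERTS'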
Set up AlertManager
When an alert is triggered, Prometheus sends a payload to the URLs listed under its /api/v1/alertmanagers endpoint.
It might be possible to build your own AlertManager, but in the meantime, let's use the one provided by the Prometheus project.
Download the binaries from the Prometheus downloads page (https://prometheus.io/download/).
Download it onto the VM where you want to run it, untar it, and you should see two files: simple.yml and alertmanager.
Let's keep it simple for now and get AlertManager hooked up.
Run the binary on a server that is internet accessible. To start off, I would place it on the same server as Prometheus.
$ ./alertmanager -config.file=simple.yml
Remember to note the port it's listening on (9093 by default).
Now, in your Prometheus startup script, re-run Prometheus with an -alertmanager.url flag pointing at it: -alertmanager.url=http://<server with alertmanager>:<port>
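As a rough sketch, assuming AlertManager is running on the same host at its default port 9093, the Prometheus startup command would look something like:

$ ./prometheus -config.file=/etc/prometheus/prometheus.yml \
    -alertmanager.url=http://localhost:9093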
If everything is OK, when you curl Prometheus' alertmanagers endpoint, you should see the new URL under activeAlertmanagers.
$ curl http://localhost:9090/api/v1/alertmanagers
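The response should look roughly like the following (the exact fields vary slightly between Prometheus versions):

{
  "status": "success",
  "data": {
    "activeAlertmanagers": [
      { "url": "http://localhost:9093/api/v1/alerts" }
    ]
  }
}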
Configure AlertManager to send a Slack Alert
Moving back to AlertManager: right now, if an alert is triggered, you'll probably be unaware of it. Let's hook it into something more visible, like a Slack channel.
The first thing to do is to get your Slack Incoming Webhook URL ready. It should look something like https://hooks.slack.com/services/T1Ydfdsfab/B31uy2389/Edfsdfdsfsdf.
Add/modify the following fields in the simple.yml file of your AlertManager.
global:
  ...
  slack_api_url: "https://hooks.slack.com/services/sdfsfdsf..."

route:
  receiver: slack-alert # replace this field

receivers:
- name: "slack-alert" # you can name this anything you want
  slack_configs:
  - channel: "#channel"
    username: "The Watchmen"
    text: "{{ .CommonAnnotations.description }}"
    send_resolved: true
Do a SIGHUP (kill -1 <pid>) on the AlertManager process.
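Rather than waiting for a real crash, you can fire a test alert straight at AlertManager's API (assuming it's listening on localhost:9093) to confirm the Slack hookup works end to end; TestAlert is just a made-up alert name:

$ curl -XPOST http://localhost:9093/api/v1/alerts \
    -d '[{"labels": {"alertname": "TestAlert", "severity": "critical"}}]'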
And you’re done!
When the alert fires, you should see a message from "The Watchmen" show up in your Slack channel.