Alerting with Prometheus and AlertManager

How to set up Prometheus AlertManager and wire up a complete alerting pipeline.

Objectives and Goals

  • Write and deploy Prometheus alert rules
  • Configure Prometheus to send alerts to AlertManager
  • Set up AlertManager to receive Prometheus alerts
  • Send a Slack message when an alert fires

Prerequisite

  • Prometheus is already set up and running

Alert Rules

To begin writing and deploying alerts, you’ll need to modify your Prometheus config file. It’s usually located at /etc/prometheus/prometheus.yml.

If there’s no rule_files key at the root of the config, add it. It should look something like:

rule_files:
  - "alert.rules"

This tells Prometheus that there’s a rules file located at alert.rules, resolved relative to the location of the current config file.
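In context, the relevant part of a prometheus.yml might look roughly like this (the scrape config shown is only an illustrative placeholder, not part of your actual setup):

```yaml
# prometheus.yml (illustrative sketch -- your scrape configs will differ)
global:
  scrape_interval: 15s

rule_files:
  - "alert.rules"   # resolved relative to this config file

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```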


Rules are instructions to Prometheus and come in two kinds: alerting and recording. Alerting rules tell Prometheus to raise an alert when certain conditions are met, while recording rules precompute or process time series as they come in, for example to rewrite labels into a new time series.
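As a sketch, a recording rule in the same (pre-2.0) rule syntax looks like the following; the metric names here are hypothetical:

```
# Precompute the per-job 5-minute request rate under a new series name
job:http_requests:rate5m = sum(rate(http_requests_total[5m])) by (job)
```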

With your alert.rules file in place, you can start writing alerting expressions to monitor a Prometheus metric and its value.

# Alert Rules

ALERT AppCrash
  IF firehost_value_metric_bbs_crashed_actual_lr_ps > 10
  FOR 15s
  LABELS { severity="critical" }
  ANNOTATIONS {
    summary = "Number of consecutive crashes in the past 30 seconds",
    description = "The number of consecutive crashes in the past 30 seconds is at: {{$value}}. This is dangerous and should be rectified immediately.",
  }

The FOR clause indicates how long the condition must hold before the alert fires. This is important if you want to prevent false positives: a longer duration gives a transient failure time to resolve itself before a message goes out.
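For instance, a noisier signal might warrant a longer hold. This variant of the rule above (same metric, hypothetical alert name) waits five minutes before firing:

```
ALERT AppCrashSustained
  IF firehost_value_metric_bbs_crashed_actual_lr_ps > 10
  FOR 5m
  LABELS { severity="warning" }
```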


Now, reload Prometheus by sending a SIGHUP to the process.

$ ps auwx | grep prometheus
$ kill -1 <prometheus_pid>

When you access http://<prometheus host>:<port>, you should see your alert appear in the “Alerts” tab. If you don’t, check that your rules file and Prometheus configuration are not malformed.
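If your Prometheus release ships with promtool, it can validate both files before you reload; the subcommand names below match the 1.x-era tooling and may differ in newer releases:

```shell
# Validate the main config and the rules file before reloading
$ promtool check-config /etc/prometheus/prometheus.yml
$ promtool check-rules /etc/prometheus/alert.rules
```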

Setup AlertManager

When an alert is triggered, Prometheus sends a payload to each of the AlertManager URLs it has been configured with; these are listed at Prometheus' /api/v1/alertmanagers endpoint.

You could build your own AlertManager, but for now, let’s use the one provided by Prometheus.

Download the AlertManager binaries from the Prometheus downloads page.

Download it onto the VM you want to run it on, un-tar it, and you should see two files: simple.yml and alertmanager.

Let’s keep it simple for now and get alertmanager hooked up.

Run the binary on a server that is accessible from the internet. To start off, I would place it on the same server as Prometheus.

$ ./alertmanager -config.file=simple.yml

Remember to note the port it’s listening on (9093 by default).

Now, in your Prometheus startup script, re-run Prometheus with -alertmanager.url=http://<alertmanager host>:<port>
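Put together, the restart might look roughly like this; the flag spelling matches the 1.x binary, and the paths and host are placeholders:

```shell
$ ./prometheus \
    -config.file=/etc/prometheus/prometheus.yml \
    -alertmanager.url=http://localhost:9093
```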

If everything is ok, when you curl Prometheus' alertmanagers endpoint, you should see a new URL under activeAlertmanagers.

curl http://localhost:9090/api/v1/alertmanagers
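A healthy response looks roughly like this (the exact shape may vary by version; the address is whatever you passed via -alertmanager.url):

```json
{
  "status": "success",
  "data": {
    "activeAlertmanagers": [
      { "url": "http://localhost:9093/api/v1/alerts" }
    ]
  }
}
```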

Configure AlertManager to send a Slack Alert

Moving back to AlertManager: right now, if an alert is triggered, you’ll probably be unaware of it. Let’s hook it into something more visible, like a Slack channel.

First thing to do is to get your Incoming Webhooks url ready. It should look something like https://hooks.slack.com/services/T1Ydfdsfab/B31uy2389/Edfsdfdsfsdf.
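You can sanity-check the webhook itself before wiring it into AlertManager; Slack’s incoming webhooks accept a simple JSON payload (the URL below is a placeholder):

```shell
$ curl -X POST -H 'Content-Type: application/json' \
    -d '{"text": "webhook test from AlertManager setup"}' \
    https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXX
```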

Add/modify the following fields in AlertManager’s simple.yml file.

global:
  ...
  slack_api_url: "https://hooks.slack.com/services/sdfsfdsf..."

route:
  receiver: slack-alert # replace this field

receivers:
- name: "slack-alert" # you can name this anything you want
  slack_configs:
  - channel: "#channel"
    username: "The Watchmen"
    text: "{{ .CommonAnnotations.description }}"
    send_resolved: true

Do a SIGHUP (kill -1 <pid>) on the AlertManager process.
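To exercise the Slack route without waiting for a real crash, you can push a synthetic alert straight at AlertManager’s v1 API. This bypasses Prometheus entirely; the alert name is made up and the port assumes the 9093 default:

```shell
$ curl -X POST http://localhost:9093/api/v1/alerts -d '[
  {
    "labels": { "alertname": "PipelineTest", "severity": "critical" },
    "annotations": { "description": "Synthetic alert to test the Slack route" }
  }
]'
```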

And you’re done!

You should see something like:

Alertmanager screenshot
