We have been using Prometheus as the monitoring solution for all of our Kubernetes clusters for a while. While it provides us with much valuable insight into our services, it still has some flaws. In particular, Prometheus lacks an option to override alert rules.
There’s practically no easy way to design alerting rules that can be altered or disabled for specific cases, for example when you need to disable certain alerts for particular deployments or change the thresholds for different namespaces, clusters, etc.
Let’s see how we were able to overcome this obstacle.
Our Monitoring Setup
We primarily use Prometheus and Grafana as the monitoring solution for our cloud infrastructure in AWS, together with Alertmanager for dispatching alerts to Slack and VictorOps.
The setup is custom-tailored to our needs: it monitors all of our Kubernetes clusters, along with AWS CloudWatch metrics and other generic EC2 instances.
Alerting
Our alerting rules mostly revolve around our Kubernetes clusters. They alert us about various situations regarding the health of the whole cluster and its individual components. They are designed to be as generic as possible, so we can reuse them for other clusters when we need to. And that’s where the problem lies.
The problem…
Prometheus, despite how good a monitoring solution it is, does not provide a way to override alerting rules. Imagine you have an alert that monitors CPU utilization of each node in your cluster and dispatches a warning alert if it stays over 90% for some amount of time.
The alert would look like this:
alert: K8SNodeCPUUtilization
expr: (node:node_container_cpu_usage_seconds:irate1m / node:node_allocatable_cpu_cores:sum) * 100 > 90
for: 5m
labels:
  group: system
  severity: high
annotations:
  identifier: '{{ $labels.kubernetes_cluster }}/{{ $labels.instance_name }}'
  msg: Node CPU utilization by containers has been over 90% for more than 5m.
  reason: Node CPU utilization by containers is high. Recalculate CPU requests and limits.
  value: '{{ printf "%.2f" $value }}%'
Now, what if there’s a special case where you need to alter this alert for a set of nodes? Let’s say we want to dispatch this alert at much longer intervals for our dev cluster. We also want to lower its severity to warning.
This is where it gets tricky…
We can add another alert with mostly the same values and add a label matcher on kubernetes_cluster, to select just the nodes of our dev cluster.
The alert would look like this:
alert: K8SNodeCPUUtilizationDev
expr: (node:node_container_cpu_usage_seconds:irate1m{kubernetes_cluster="kube-dev"} / node:node_allocatable_cpu_cores:sum{kubernetes_cluster="kube-dev"}) * 100 > 90
for: 1h
labels:
  group: system
  severity: warning
annotations:
  identifier: '{{ $labels.kubernetes_cluster }}/{{ $labels.instance_name }}'
  msg: Node CPU utilization by containers has been over 90% for more than 1h.
  reason: Node CPU utilization by containers is high. Recalculate CPU requests and limits.
  value: '{{ printf "%.2f" $value }}%'
Great, now we have an alert that will notify us much less frequently than the original one. But we also need to alter the original one, since it still matches all of the nodes that are being monitored, including the dev cluster.
alert: K8SNodeCPUUtilization
expr: (node:node_container_cpu_usage_seconds:irate1m{kubernetes_cluster!="kube-dev"} / node:node_allocatable_cpu_cores:sum{kubernetes_cluster!="kube-dev"}) * 100 > 90
for: 5m
labels:
  group: system
  severity: high
annotations:
  identifier: '{{ $labels.kubernetes_cluster }}/{{ $labels.instance_name }}'
  msg: Node CPU utilization by containers has been over 90% for more than 5m.
  reason: Node CPU utilization by containers is high. Recalculate CPU requests and limits.
  value: '{{ printf "%.2f" $value }}%'
Now it works as expected. We get the notification after 5 minutes for every cluster, except dev.
Imagine you want to do this for multiple rules, at multiple levels… You have a specific namespace that needs custom rules, a deployment in that namespace that also needs custom rules… It can get messy rather quickly.
How we’ve solved it
We’ve approached this problem from several directions and decided to write a custom pre-processor that would automatically generate a set of overridden Prometheus alerting rules based on a simple set of attributes.
We wanted to keep this process as simple as possible, so we introduced just two new keys into the standard Prometheus config for alerting rules.
- alert: K8SNodeCPUUtilizationDev
  override: ["K8SNodeCPUUtilization"]
  enabled: true
  expr: K8SNodeCPUUtilization{kubernetes_cluster="kube-dev"}
  for: 1h
  labels:
    severity: warning
  annotations:
    identifier: '{{ $labels.kubernetes_cluster }}/{{ $labels.instance_name }}'
    msg: Node CPU utilization by containers has been over 90% for more than 1h.
    reason: Node CPU utilization by containers is high. Recalculate CPU requests and limits.
    value: '{{ printf "%.2f" $value }}%'
As you can see in the example above, we added a list attribute called override and a bool attribute called enabled. The rest of the structure is the standard way to define an alert in Prometheus. Using override, you can specify the set of rules you want to override. In addition, you can use the enabled attribute to disable the rule for a subset of resources altogether.
Let me explain the logic behind it.
The pre-processing mechanism
We narrowed the process down to simple string operations on the alerting rule structures.
The program loops over all of the alerting rules until it finds an override rule. Then it finds all the matching rules we want to override and applies new filters to them. The result consists of two separate rules: the generic one, with a new expression that negates the override expression, and a new rule based on the overriding rule.
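For illustration, the K8SNodeCPUUtilizationDev override above would expand into roughly the same pair of rules we wrote by hand in the previous section (abridged here):

# The generic rule, with the override matcher negated
- alert: K8SNodeCPUUtilization
  expr: (node:node_container_cpu_usage_seconds:irate1m{kubernetes_cluster!="kube-dev"} / node:node_allocatable_cpu_cores:sum{kubernetes_cluster!="kube-dev"}) * 100 > 90
  for: 5m

# The generated rule, derived from the overriding rule
- alert: K8SNodeCPUUtilizationDev
  expr: (node:node_container_cpu_usage_seconds:irate1m{kubernetes_cluster="kube-dev"} / node:node_allocatable_cpu_cores:sum{kubernetes_cluster="kube-dev"}) * 100 > 90
  for: 1h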
Disabling rules for a subset of resources
In certain cases, you don’t want to be alerted for a subset of resources at all. You are able to silence rules through Alertmanager, but that is only a temporary solution. It also doesn’t affect the list of rules displayed in Prometheus; they will still appear as firing.
This is when you can use the enabled attribute. If it is set to false, no new rule is created. Instead, only the generic rule(s) are altered.
- alert: DisableKubeDev
  override: ["K8S.*"]
  enabled: false
  expr: '{kubernetes_cluster="kube-dev"}'
In the example above, we’ve disabled all of the rules starting with K8S for our dev cluster. This means that the expressions of all matching rules were altered by adding a kubernetes_cluster!="kube-dev" matcher.
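To illustrate, after pre-processing, the generic K8SNodeCPUUtilization rule from the beginning of the article would end up with an expression along these lines, and no dev-specific rule would be generated:

  expr: (node:node_container_cpu_usage_seconds:irate1m{kubernetes_cluster!="kube-dev"} / node:node_allocatable_cpu_cores:sum{kubernetes_cluster!="kube-dev"}) * 100 > 90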
Limitations
Currently, the program does not allow you to override overriding rules. We also advise you to create recording rules for more complicated expressions and use them as the alert expression. This way, the evaluation and alteration of expressions is more accurate. It will also give your alerting rules a cleaner look.
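For example, the node CPU ratio used throughout this post could be captured in a recording rule; the record name below is just illustrative:

# Hypothetical recording rule that pre-computes the CPU utilization percentage
- record: node:container_cpu_utilization:percent
  expr: (node:node_container_cpu_usage_seconds:irate1m / node:node_allocatable_cpu_cores:sum) * 100

The alert’s expr then becomes a simple threshold comparison, node:container_cpu_utilization:percent > 90, which is much easier to match and alter.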
Where can I get it
The tool is available on our GitHub page. If you have any questions, suggestions, or feature requests, you can create an issue on GitHub or contact us directly through our website.
Conclusion
The tool we’ve created enables you to create overriding rules in Prometheus. This allows you to maintain a set of generic rules for all of your clusters and alter or disable them for specific subsets of resources, keeping the codebase simple and clean.