Checkly: Alerting, webhook, and Prometheus updates

We just shipped three updates based on customer feedback.

Alerting when a check fails to schedule

We saw some cases where an API check failed to run because there was an issue with the code executed in the setup script. However, we did not alert on this. Even more, we did register this error in the editor, so the experience was not as expected.

With this update, any check that fails to schedule will trigger an alert on all configured alerting channels with an explicit scheduling error. Here as an example from Slack.

Degraded recovery exposed on webhooks

Webhooks did not discriminate between a check recovering from a hard failure and a degradation because we did not expose that alert type. Now we do.

Alert types now are “ALERT_FAIL”, “ALERT_RECOVERY”, “ALERT_DEGRADED” and “ALERT_DEGRADED_RECOVERY”

Degraded status on dedicated Prometheus gauge

We exposed the degradation status of a check as boolean tag on our Prometheus endpoint.

checkly_check_status{check_name="API Check trigger",check_type="api",muted="false",activated="true",degraded="false" tags="api-checks,triggers,public"} 1

This was useful up to a degree but did not allow detailed inspection, filtering and aggregation of this status in tools like Grafana. Together with customer Nomanini we decided to treat "degraded" as its own distinct status. 1 indicates non-degraded, 0 indicates degraded.

For example:

# HELP checkly_check_degraded_status The degraded status of the last check. 1 is not-degraded, 0 is degraded
# TYPE checkly_check_degraded_status gauge
checkly_check_degraded_status{check_name="API Check trigger",check_type="api",muted="false",activated="true",degraded="false" tags="api-checks,triggers,public"} 1

That's it for now!

Changelog: Alerting, webhook, and Prometheus updates

Alerting when a check fails to schedule

Degraded recovery exposed on webhooks

Degraded status on dedicated Prometheus gauge

Related Articles

Mastering Prometheus Exporters: Techniques and Best Practices

The Advent of Monitoring, Day 3: Easy Monitoring for Self-Hosted Projects with Checkly

The Advent of Monitoring, Day 10: Better Observability Into Your Local Clickhouse Instance With Grafana and Prometheus

Changelog: customize your webhook with method, headers and more

Changelog: Extended Handlebars template functions & Groups API

Detect

Communicate

Resolve

Resources

Community

Alerting when a check fails to schedule

Degraded recovery exposed on webhooks

Degraded status on dedicated Prometheus gauge

Related Articles

Mastering Prometheus Exporters: Techniques and Best Practices

The Advent of Monitoring, Day 3: Easy Monitoring for Self-Hosted Projects with Checkly

The Advent of Monitoring, Day 10: Better Observability Into Your Local Clickhouse Instance With Grafana and Prometheus

Changelog: customize your webhook with method, headers and more

Changelog: Extended Handlebars template functions & Groups API