Create a status page for your production service in 5 minutes

“When are we going to tell users about this?” By the time your incident response team asks that question, it’s already too late. During an outage, manually communicating about downtime with your user base has three main drawbacks:

  • It’s slow, notifying long after automated systems have noticed the problem, and likely long after users have noticed.
  • Downtime with no automated status information means users will create message traffic to your support and sales teams.
  • Users trying to use a service that’s currently down may cause further issues for your service as users refresh, reload, and retry their requests at high frequency.

Instead, it’s better to create a status page that automatically shares the status of all your services in a format that users can easily understand. You’ll build trust with your users as you proactively share service status, lessening the perceived impact of incidents. As a bonus, SLA calculations are made in public with an automated status page, which makes any conversation at the end of a billing cycle that much easier.
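To make the SLA point concrete, here is a minimal sketch of the arithmetic, assuming uptime is counted per check run and assuming a 99.9% target. Both are illustrative assumptions; your own SLA may be defined by elapsed time rather than by runs.

```python
def uptime_percent(results):
    """Return the share of passing check runs as a percentage."""
    if not results:
        raise ValueError("need at least one check result")
    passed = sum(1 for ok in results if ok)
    return 100.0 * passed / len(results)


# Assumption: one check every 5 minutes for 30 days = 30 * 24 * 12 = 8640 runs,
# of which 9 failed.
runs = [True] * 8631 + [False] * 9

monthly_uptime = uptime_percent(runs)

# Assumption: a hypothetical 99.9% monthly SLA target.
SLA_TARGET = 99.9
meets_sla = monthly_uptime >= SLA_TARGET
```

With these numbers, nine failed runs out of 8640 is already enough to land just under a 99.9% target, which is the kind of conversation a public status history makes simpler.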

This article will show you how to create an automated status page for your service in 5 or so minutes with Checkly.

Step 1: Start Monitoring Your Site with Checkly

If you’re already a Checkly user, skip this step! To get started, we’ll need to proactively monitor our site with Checkly. Checkly’s automated system will visit our site and run scripted tests against our service. This process is called synthetic monitoring (in contrast with real user monitoring, which watches real users and tries to infer site status from their interactions), and it produces consistent results with clear signals when something isn’t working as expected. Monitoring with Checkly consists of three steps (by the way, as I wrote this article I noticed it contained several bulleted and numbered lists, often a telltale sign of LLM output. I assure you this is a human-written article, which I will prove by misspelling a word: blurple):

  • Set up API monitoring - either by entering a config in our web UI or by importing Terraform or OpenAPI templates, to send simple requests to any of your services’ endpoints and report on the status
  • Go beyond looking for 200 OK status - With the power of Playwright, you can automate complex user flows as tests that you can run on a cadence with Checkly
  • Set up a cadence, retry logic, and notifications - While traditional end-to-end testing is run just once against your service (usually before a deployment), with Checkly you’ll want your tests to run all the time. Configuration options let you determine how often tests will run, how they’ll retry when they fail, and how you want to be notified. Of course Checkly has tons of integrations to make sure you know when something isn’t working.
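The retry idea in the last step can be sketched in a few lines. This is not Checkly’s implementation or configuration format, just an illustration of what “retry before alerting” means; the function name `run_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable


def run_with_retries(
    check: Callable[[], bool],
    max_retries: int = 2,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `check`; if it fails, retry up to `max_retries` more times.

    Returns True as soon as one attempt passes, False if every attempt fails.
    """
    for attempt in range(max_retries + 1):
        if check():
            return True
        if attempt < max_retries:
            # Linear backoff: wait a little longer before each retry.
            sleep(backoff_seconds * (attempt + 1))
    return False


# A flaky check that fails twice and then passes (simulated, no network).
attempts = iter([False, False, True])
flaky_result = run_with_retries(lambda: next(attempts), max_retries=2, sleep=lambda s: None)

# A check that is really down fails even after retrying.
down_result = run_with_retries(lambda: False, max_retries=1, sleep=lambda s: None)
```

Retrying before alerting means a single network blip doesn’t page anyone, while a real outage still fails every attempt and triggers a notification.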

With Checkly set up you (and your team) should know about downtime before your users do.

Our Checkly dashboard gives great status information to our own team; the next step will translate this automated status information for our users.

This is a great improvement to your team’s responsiveness and to how likely you are to meet your SLA. While these notifications will get the team working on outages faster, we’d now like to automatically communicate that status to users.

Step 2: Create a Status Page

Our goal is to create a page that shares the status of multiple services in logical groups.

Visitors to our status page can see both that we’re working right now, and when failures have occurred in the recent past.

But it won’t do much good to just show a red marker when things are down. Instead, we need a feed of incidents that shows what happened and the current status.

Incidents can be automatically opened and resolved by Checkly synthetic monitors. They’re also how our team can update users manually, by creating and updating incidents.

Now that we have a goal clear in mind, let’s create our first status page. Start by defining a service. Users don’t want to see ‘team millhouse: react view idempotency test failed’ on the status page, so we’ll need ‘services’ to group and translate your checks into something users can readily interpret. On the left-hand toolbar in your Checkly account, go to ‘services’ and create a few logical service names.

Users visiting your status pages won’t see the names of checks, only the service names that the checks are linked to.
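Conceptually, a service is a user-facing label that several checks roll up into. The sketch below illustrates that grouping under some assumptions: the service names, the two login check names, and the ‘operational’/‘degraded’ labels are all hypothetical, and this is not how Checkly stores or computes anything internally.

```python
# Hypothetical mapping from internal check names to user-facing services.
CHECK_TO_SERVICE = {
    "team millhouse: react view idempotency test": "Dashboard",
    "login api returns 200": "Login",
    "login api latency": "Login",
}


def service_status(check_results):
    """Roll check results (check name -> passed?) up into per-service status.

    A service is 'degraded' if any of its checks failed, else 'operational'.
    """
    statuses = {}
    for check_name, passed in check_results.items():
        service = CHECK_TO_SERVICE[check_name]
        if not passed:
            statuses[service] = "degraded"
        else:
            # Only mark operational if no earlier check already degraded it.
            statuses.setdefault(service, "operational")
    return statuses


latest_results = {
    "team millhouse: react view idempotency test": False,
    "login api returns 200": True,
    "login api latency": True,
}
public_view = service_status(latest_results)
```

Here `public_view` only mentions “Dashboard” and “Login”, so the internal check name never reaches the status page.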

Once services are created, go into a check’s settings and click ‘Incident Automation’ to create incidents automatically when this check fails.

The incident settings control the incident information that will be created when this check starts failing. Note the ‘notifications’ check box: if it’s unchecked, users who have subscribed to notifications from your status page won’t be notified when this check fails.

Remember as a general rule: a check that continues to fail doesn’t create a new incident, and the same is true for notifications. Checks generate updates only when they either start failing or recover.
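This rule is easiest to see as a small state machine: updates fire on transitions, not on every failing run. The following is a simplified sketch of that idea, assuming the check starts out healthy; the event labels are illustrative and this is not Checkly’s actual code.

```python
def transition_events(results):
    """Return (run_index, event) pairs for state changes only.

    `results` is a sequence of booleans, True meaning the check run passed.
    """
    events = []
    previous = True  # Assumption: the check was passing before the first run.
    for index, passed in enumerate(results):
        if previous and not passed:
            events.append((index, "incident opened"))
        elif not previous and passed:
            events.append((index, "incident resolved"))
        # A run that matches the previous state produces no event.
        previous = passed
    return events


# One pass, three consecutive failures, then recovery.
history = [True, False, False, False, True, True]
events = transition_events(history)
```

Three consecutive failures produce a single ‘incident opened’ event at run 1 and a single ‘incident resolved’ event at run 4, so subscribers aren’t notified again for every failing run in between.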

Now that checks are grouped into services, it’s time to create our status page.

After you create and name your page, you can add the services you want to display. You can organize services into ‘cards’ for clearer visuals on your page. You can also set a custom logo, favicon, and link to your main page.

Step 3: Share Incident Updates With Your Users

When your checks fail, they’ll update the status of your services, and automatically create an incident at the severity you set under incident automation.

All the updates you see here on the status page are automatic.

Once an incident is created, go to Status Pages>[Select a Status Page]>Incidents to update the incident.

You can update the status of the incident and add a message. By default your updates will be tagged as occurring at the moment you hit ‘Publish’, but you can use a custom date to backfill incidents that occurred in the past. This is especially useful if you’re adding an update about a resolution a few minutes late. Note that you probably want to uncheck ‘Notifications’ for a backdated update.
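To see why a custom date matters, here is a rough model of an incident timeline. The `IncidentUpdate` class, the `publish_update` helper, and the status labels other than ‘resolved’ are hypothetical names chosen for illustration, not part of Checkly’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class IncidentUpdate:
    status: str
    message: str
    occurred_at: datetime
    notify: bool


def publish_update(
    timeline: List[IncidentUpdate],
    status: str,
    message: str,
    now: datetime,
    custom_date: Optional[datetime] = None,
    notify: bool = True,
) -> IncidentUpdate:
    """Add an update, dated `custom_date` if given, otherwise `now`."""
    update = IncidentUpdate(status, message, custom_date or now, notify)
    timeline.append(update)
    # Keep the feed chronological by when things happened, not when published.
    timeline.sort(key=lambda u: u.occurred_at)
    return update


def at(hour: int, minute: int) -> datetime:
    """Build a UTC timestamp on an arbitrary example day."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


timeline: List[IncidentUpdate] = []
publish_update(timeline, "investigating", "Login is failing.", now=at(12, 0))
publish_update(timeline, "resolved", "Login is back.", now=at(12, 30))
# Backfill at 12:35 an update that really happened at 12:10, without notifying.
publish_update(timeline, "identified", "Bad deploy found.", now=at(12, 35),
               custom_date=at(12, 10), notify=False)

statuses_in_order = [u.status for u in timeline]
```

The backfilled update lands in the middle of the feed where it belongs, and because `notify` is off, subscribers don’t get a confusing late message.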

If the check recovers, the incident on your status page will shift to ‘resolved’ automatically.

Conclusions: Keep Your Users Updated with Status Pages

In the early days of GitHub, the most requested feature was some way to hide your commit history. Even if the code you were sharing looked great, it felt embarrassing to share the false starts and mistakes that your team faced on the way. Sharing your failures can feel risky, but opening the channels of communication with your users results in higher customer satisfaction and a better process.

If you’ve already taken the time to proactively monitor your sites with Checkly, sharing the status of your most important services is an easy step that reduces load on your support service, shares service status more widely across your organization, and gives your users a heads up when things are fixed.

See Status Pages in Action

Check out the video tutorial on status pages here:
