Incident Assessment & Severity Guide for Engineering Teams (+ Cheat Sheet)

Incident Assessment & Severity

Not every alert is an incident—and not every incident is equally urgent.

That’s where incident assessment and severity classification come in. Without clear definitions, teams get stuck in limbo:

  • Should we wake someone up?
  • Should we inform customers?
  • Should we prepare a support strategy?
  • Is this critical or just annoying?

The goal of incident assessment is to evaluate the scope and impact of a problem, determine its urgency, and trigger the appropriate response. Done right, this step aligns engineering, support, and leadership around a shared understanding of what matters—and what to do next.

Let’s break down what effective assessment looks like and how to build your own severity classification system.

What Is Incident Assessment?

Incident assessment is the process of determining whether an observed issue qualifies as an incident—and if so, how serious it is.

To assess an incident, you typically ask:

  • What’s broken?
  • Who is impacted?
  • Is there a workaround?
  • How fast do we need to act?

The outcome of this process is a severity level that maps to your internal response playbook: who gets paged, how quickly you communicate, and how visible the incident becomes across the company.

Why Severity Levels Matter

Clear severity definitions help your team:

  • Act faster under pressure
  • Escalate the right issues
  • Prevent over-alerting or under-reacting
  • Set communication expectations internally and externally

They also create psychological safety. When engineers know exactly what qualifies as a SEV1, they don’t waste time debating—they act.

Severity Levels: Example Framework

Here’s a simple, 3-tier severity model you can adopt or adapt:

| Severity | Impact | Example Incident | Expected Action |
| --- | --- | --- | --- |
| SEV1 | Critical / Total Outage | Full production outage, major security breach, data loss | All-hands on deck. Wake people up. 24/7 response. Execs informed. |
| SEV2 | High / Partial Outage | 10% of users can’t log in, degraded performance, partial failure | Escalate to on-call immediately. Frequent updates. Prioritized fix. |
| SEV3 | Moderate / Minor Bug | Broken styling, slow dashboard load, minor UX issue | Fix during business hours. Log the issue. May not require updates. |
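
One way to make the "Expected Action" column usable by tooling is to encode it as data. The sketch below is only an illustration: the `ResponsePlan` fields and the `response_for` helper are hypothetical names, and any flag the table does not state explicitly (for example, that execs are not informed for SEV2) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponsePlan:
    page_on_call: bool         # escalate to on-call right away
    all_hands: bool            # pull in everyone, around the clock
    inform_execs: bool         # assumption: only SEV1 mentions execs
    business_hours_only: bool  # fix can wait for normal working hours


# Flags paraphrase the "Expected Action" column of the table above.
PLAYBOOK = {
    "SEV1": ResponsePlan(page_on_call=True, all_hands=True,
                         inform_execs=True, business_hours_only=False),
    "SEV2": ResponsePlan(page_on_call=True, all_hands=False,
                         inform_execs=False, business_hours_only=False),
    "SEV3": ResponsePlan(page_on_call=False, all_hands=False,
                         inform_execs=False, business_hours_only=True),
}


def response_for(severity: str) -> ResponsePlan:
    """Look up the response plan for a severity label such as 'SEV2'."""
    return PLAYBOOK[severity.strip().upper()]
```

Keeping the playbook in one structure means the answer to "should we wake someone up?" is a lookup rather than a debate.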

A Score-Based System for Classifying Severity

You can use a point-based scoring system that rates each incident on five dimensions. This adds structure and makes classification less subjective:

| Dimension | Low (1 pt) | Medium (2 pts) | High (3 pts) |
| --- | --- | --- | --- |
| User Impact | <5% affected | 5–25% affected | >25% or all users affected |
| Functionality | Cosmetic / minor bug | Partial functionality loss | Core feature broken, no workaround |
| Business Impact | No SLA/revenue/legal risk | Mild SLA concern or revenue impact | Revenue loss, SLA breach, or legal exposure |
| Urgency | Can wait for a sprint | Fix in a day or two | Requires immediate attention |
| Workaround | Easy workaround exists | Workaround is possible but painful | No workaround available |

Add up the points across all five dimensions, then map the total to a severity level:

| Total Score | Severity Level |
| --- | --- |
| 5–7 | SEV3 (Low) |
| 8–11 | SEV2 (Medium) |
| 12–15 | SEV1 (High) |
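
The two tables above amount to a sum followed by a band lookup. A minimal sketch, assuming each dimension is rated exactly once; the names `Level`, `DIMENSIONS`, `total_score`, and `severity_for` are illustrative and not part of any existing tool:

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# The five dimensions from the scoring table.
DIMENSIONS = ("user_impact", "functionality", "business_impact",
              "urgency", "workaround")


def total_score(ratings: dict) -> int:
    """Sum the points for all five dimensions; every one must be rated."""
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"unrated dimensions: {missing}")
    return sum(int(ratings[d]) for d in DIMENSIONS)


def severity_for(score: int) -> str:
    """Map a total score (5 to 15) to a severity level."""
    if not 5 <= score <= 15:
        raise ValueError(f"score out of range: {score}")
    if score >= 12:
        return "SEV1"
    if score >= 8:
        return "SEV2"
    return "SEV3"


# Every dimension rated Low gives the minimum score of 5.
all_low = {d: Level.LOW for d in DIMENSIONS}
print(severity_for(total_score(all_low)))  # SEV3
```

Rejecting an incomplete rating, rather than silently treating it as zero, keeps the total inside the 5 to 15 range that the mapping table covers.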

Example: Users on an unusual browser cannot check out

Let’s say our business is a review site with an ecommerce store. Users on Microsoft Edge can’t check out due to an incompatibility with our payment provider implementation.

  • User Impact: Low (1). Fewer than 5% of our users are on Microsoft Edge.
  • Functionality: High (3). Users are blocked at the final checkout step and are more likely to abandon their cart than to switch browsers.
  • Business: High (3). This will cost revenue.
  • Urgency: Medium (2). By our estimate, the fix only requires dependency updates and can be done in a day or two.
  • Workaround: Medium (2). Switching browsers works, but we definitely don’t want to add a ‘please switch browsers’ message to our site.

    Score: 1 + 3 + 3 + 2 + 2 = 11 → SEV2

Downloadable Incident Severity Cheat Sheet

If you want to adopt or adapt this process, you can make a copy of our own incident severity cheat sheet.

Creating Your Own Severity Rules

Every organization operates differently, and what counts as a critical incident for one team may be a routine alert for another. Here’s how to build a severity scoring system that reflects your team’s priorities, customer expectations, and business context.

1. Pick Dimensions That Matter to You

Start by identifying the dimensions of impact that are most relevant to your systems and stakeholders. Common ones include user impact, feature impact, business risk, urgency, and workaround availability—but you might also include compliance violations, customer tier affected, or data integrity if those are key concerns. The goal is to capture the real-world consequences of an incident in a way that reflects your product and risk model.

2. Agree on What “High”, “Medium”, and “Low” Mean

Without clear definitions, severity scoring quickly becomes subjective. What one engineer sees as a “minor issue” might be considered “urgent” by someone in customer success. To avoid this, document clear criteria for each level of impact. For example, define “High User Impact” as “more than 25% of users affected” or “High Business Impact” as “any outage that causes revenue loss or legal risk.” These definitions become your reference point for consistent triage.
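
Written criteria are easiest to apply consistently when they are precise enough to express as a rule. As a sketch, here are the User Impact thresholds from the scoring table as a function; `user_impact_level` is a hypothetical helper, and treating exactly 5% and exactly 25% as Medium follows the table's "5–25%" band.

```python
def user_impact_level(affected_pct: float) -> str:
    """Classify User Impact from the percentage of users affected."""
    if not 0 <= affected_pct <= 100:
        raise ValueError(f"percentage out of range: {affected_pct}")
    if affected_pct > 25:
        return "High"    # more than 25% of users, up to all users
    if affected_pct >= 5:
        return "Medium"  # 5% to 25%, inclusive
    return "Low"         # fewer than 5%


print(user_impact_level(10))  # Medium
```

If a criterion cannot be written down this plainly, that is usually a sign the definition still leaves room for argument during an incident.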

3. Add Automation Where Possible

Manual severity scoring can slow things down and introduce inconsistencies, especially during high-pressure incidents. Automate as much of the process as you can. A shared Google Sheet or Notion template can help teams select impact levels via dropdowns, with scores and severity levels calculated automatically. For more mature teams, connect this logic directly into your alerting pipeline or incident management tool so severity is auto-assigned on alert creation.
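
As a rough illustration of auto-assigning severity on alert creation, the sketch below takes an alert whose ratings arrive as dropdown labels and annotates it with a score and severity. The alert shape (a dict with a `ratings` key) and the function name `auto_assign` are assumptions made for this example, not the format of any particular alerting or incident management tool.

```python
# Points per dropdown label, as in the scoring table.
POINTS = {"low": 1, "medium": 2, "high": 3}
DIMENSIONS = ("user_impact", "functionality", "business_impact",
              "urgency", "workaround")
# (minimum total, severity), checked from the highest band down.
BANDS = ((12, "SEV1"), (8, "SEV2"), (5, "SEV3"))


def auto_assign(alert: dict) -> dict:
    """Return a copy of the alert with 'score' and 'severity' added."""
    ratings = alert.get("ratings", {})
    try:
        score = sum(POINTS[ratings[d].strip().lower()] for d in DIMENSIONS)
    except KeyError as exc:
        raise ValueError(f"missing or unknown rating: {exc}") from exc
    severity = next(name for floor, name in BANDS if score >= floor)
    return {**alert, "score": score, "severity": severity}


# Hypothetical alert payload, for illustration only.
alert = {
    "title": "Login errors elevated",
    "ratings": {
        "user_impact": "Medium",
        "functionality": "Medium",
        "business_impact": "Medium",
        "urgency": "High",
        "workaround": "Low",
    },
}
print(auto_assign(alert)["severity"])  # 2 + 2 + 2 + 3 + 1 = 10 -> SEV2
```

Failing loudly on a missing or misspelled rating is a deliberate choice here: a silently defaulted dimension would produce a severity that looks authoritative but is not.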

4. Train Your Incident Responders

Severity models only work if everyone is aligned on how to use them. Run training sessions or tabletop exercises with your incident response team. Use past incidents to test the model: how would responders score each one now, and does the resulting severity feel right in hindsight? Over time, this helps calibrate judgment, improves consistency across teams, and creates shared understanding between engineering, support, and leadership.
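
Replaying past incidents can be partly mechanized. The sketch below compares the severity the model assigns with the severity the team judges correct in hindsight and lists the disagreements for discussion. Everything in it is illustrative: the incident records are made up, and `replay` and `classify` are hypothetical names.

```python
from typing import Callable, Iterable


def replay(incidents: Iterable[dict],
           classify: Callable[[dict], str]) -> list:
    """Return incidents where the model and hindsight severity differ."""
    disagreements = []
    for incident in incidents:
        modelled = classify(incident["ratings"])
        if modelled != incident["hindsight_severity"]:
            disagreements.append({
                "id": incident["id"],
                "model": modelled,
                "hindsight": incident["hindsight_severity"],
            })
    return disagreements


def classify(ratings: dict) -> str:
    """Sum the 1-3 point ratings and apply the score bands."""
    total = sum(ratings.values())
    return "SEV1" if total >= 12 else "SEV2" if total >= 8 else "SEV3"


# Made-up records for illustration; real ones come from your postmortems.
past = [
    {"id": "example-1", "hindsight_severity": "SEV1",
     "ratings": {"user_impact": 3, "functionality": 3,
                 "business_impact": 3, "urgency": 3, "workaround": 3}},
    {"id": "example-2", "hindsight_severity": "SEV2",
     "ratings": {"user_impact": 1, "functionality": 1,
                 "business_impact": 2, "urgency": 1, "workaround": 1}},
]
print(replay(past, classify))
# [{'id': 'example-2', 'model': 'SEV3', 'hindsight': 'SEV2'}]
```

A disagreement is a prompt for conversation, not proof the model is wrong: it may mean a threshold needs adjusting, or that the hindsight judgment was shaped by something the dimensions do not capture.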

Final Words

Clear severity rules turn gut-feel decisions into structured, confident responses. With a shared scoring model, your team can triage faster, communicate better, and stay focused when it matters most.


Last updated on May 23, 2025.