Incident Assessment & Severity
Not every alert is an incident, and not every incident is equally urgent. That’s where incident assessment and severity classification come in. Without clear definitions, teams get stuck in limbo:
- Should we wake someone up?
- Should we inform customers?
- Should we prepare a support strategy?
- Is this critical or just annoying?
What Is Incident Assessment?
Incident assessment is the process of determining whether an observed issue qualifies as an incident and, if so, how serious it is. To assess an incident, you typically ask:
- What’s broken?
- Who is impacted?
- Is there a workaround?
- How fast do we need to act?
Why Severity Levels Matter
Clear severity definitions help your team:
- Act faster under pressure
- Escalate the right issues
- Prevent over-alerting or under-reacting
- Set communication expectations internally and externally
Severity Levels: Example Framework
Here’s a simple three-tier severity model you can adopt or adapt:
Severity | Impact | Example Incident | Expected Action |
---|---|---|---|
SEV1 | Critical / Total Outage | Full production outage, major security breach, data loss | All-hands on deck. Wake people up. 24/7 response. Execs informed. |
SEV2 | High / Partial Outage | 10% of users can’t log in, degraded performance, partial failure | Escalate to on-call immediately. Frequent updates. Prioritized fix. |
SEV3 | Moderate / Minor Bug | Broken styling, slow dashboard load, minor UX issue | Fix during business hours. Log the issue. May not require updates. |
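If severity levels feed into paging or alerting tooling, it can help to encode the table as data so that "Should we wake someone up?" has exactly one answer per level. The sketch below is a minimal, hypothetical encoding: the `ResponsePolicy` fields, the `should_page` helper, and the update-cadence strings are illustrative assumptions, and marking SEV2 as not informing executives is one reading of the table rather than something it states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponsePolicy:
    """How the team is expected to respond at a given severity level."""
    page_immediately: bool     # wake someone up / escalate to on-call right away
    business_hours_only: bool  # the fix can wait for normal working hours
    inform_execs: bool         # executives are kept in the loop
    updates: str               # expected status-update cadence (illustrative wording)


# Hypothetical encoding of the three-tier table; field names and wording are assumptions.
SEVERITY_POLICIES: dict[str, ResponsePolicy] = {
    "SEV1": ResponsePolicy(page_immediately=True, business_hours_only=False,
                           inform_execs=True, updates="continuous, 24/7"),
    "SEV2": ResponsePolicy(page_immediately=True, business_hours_only=False,
                           inform_execs=False, updates="frequent"),
    "SEV3": ResponsePolicy(page_immediately=False, business_hours_only=True,
                           inform_execs=False, updates="optional"),
}


def should_page(severity: str) -> bool:
    """Answer 'Should we wake someone up?' for a given severity level."""
    return SEVERITY_POLICIES[severity].page_immediately
```

Keeping the policy as plain data means the table and the tooling cannot silently drift apart: changing what SEV2 means is a one-line edit in one place.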
A Score-Based System for Classifying Severity
You can use a points-based scoring system that evaluates incidents across five equally weighted dimensions. This adds structure and reduces subjective decisions:
Dimension | Low (1 pt) | Medium (2 pts) | High (3 pts) |
---|---|---|---|
User Impact | <5% affected | 5–25% affected | >25% or all users affected |
Functionality | Cosmetic / minor bug | Partial functionality loss | Core feature broken, no workaround |
Business Impact | No SLA/revenue/legal risk | Mild SLA concern or revenue impact | Revenue loss, SLA breach, or legal exposure |
Urgency | Can wait for a sprint | Fix in a day or two | Requires immediate attention |
Workaround | Easy workaround exists | Workaround is possible but painful | No workaround available |
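Add up the points from all five dimensions (for a total between 5 and 15), then map the total to a severity level:
Total Score | Severity Level |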
---|---|
5–7 | SEV3 (Moderate) |
8–11 | SEV2 (High) |
12–15 | SEV1 (Critical) |
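The scoring procedure is simple enough to automate, for example in an incident-intake form. This is a minimal sketch, assuming each dimension is rated Low, Medium, or High as in the rubric; the `Level` enum, the `DIMENSIONS` identifiers, and the `classify` function are hypothetical names chosen for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Points per dimension, matching the Low / Medium / High columns."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# The five scoring dimensions, in table order (identifier names are illustrative).
DIMENSIONS = ("user_impact", "functionality", "business_impact", "urgency", "workaround")


def classify(scores: dict[str, Level]) -> tuple[int, str]:
    """Sum the five dimension scores and map the total to a severity level.

    Thresholds follow the total-score table: 12-15 -> SEV1, 8-11 -> SEV2, 5-7 -> SEV3.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        # Refuse partial assessments: a skipped dimension would silently lower the total.
        raise ValueError(f"missing dimensions: {missing}")
    total = sum(int(scores[d]) for d in DIMENSIONS)
    if total >= 12:
        severity = "SEV1"
    elif total >= 8:
        severity = "SEV2"
    else:
        severity = "SEV3"
    return total, severity


# A full outage of a core feature with no workaround: every dimension scores High.
print(classify({d: Level.HIGH for d in DIMENSIONS}))  # prints (15, 'SEV1')
```

Rejecting incomplete score sets is a deliberate choice: with five required dimensions, the minimum possible total is 5, so a missing rating would otherwise push an incident toward a lower severity than it deserves.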
Example: Users on an unusual browser cannot check out
Let’s say our business is a review site with an ecommerce store. Users on Microsoft Edge can’t check out due to an incompatibility with our payment provider implementation.
- User Impact: Low (1). Fewer than 5% of all our users are on Microsoft Edge.
- Functionality: High (3). Users are blocked at the final checkout step and, rather than switching browsers, are likely to abandon their carts.
- Business Impact: High (3). Abandoned checkouts cost revenue.
- Urgency: Medium (2). By our estimate, the fix only requires updating dependencies and can ship in a day or two.
- Workaround: Medium (2). Asking users to switch browsers would work, but we definitely don’t want to add a ‘please switch browsers’ message to our site.
Score: 1 + 3 + 3 + 2 + 2 = 11 → SEV2
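The arithmetic for this example can be checked mechanically. This is a minimal sketch that hard-codes the example's five scores and the thresholds from the total-score table; the dictionary keys and the `severity_for` helper are illustrative names, not part of any incident-management tool.

```python
# Scores from the Microsoft Edge checkout example (1 = Low, 2 = Medium, 3 = High).
edge_checkout = {
    "user_impact": 1,      # fewer than 5% of users are on Edge
    "functionality": 3,    # checkout is blocked for those users
    "business_impact": 3,  # abandoned checkouts cost revenue
    "urgency": 2,          # fix expected within a day or two
    "workaround": 2,       # a workaround exists but is painful
}


def severity_for(total: int) -> str:
    """Map a 5-15 total to a severity level using the total-score table."""
    if not 5 <= total <= 15:
        raise ValueError(f"total {total} is outside the 5-15 range")
    if total >= 12:
        return "SEV1"
    if total >= 8:
        return "SEV2"
    return "SEV3"


total = sum(edge_checkout.values())
print(total, severity_for(total))  # prints "11 SEV2"
```

Since 1 + 3 + 3 + 2 + 2 = 11 sits at the top of the 8–11 band, this incident is one step away from SEV1: if any single dimension were rated one level higher (for example, Workaround as High), the total would reach 12.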