Severity Levels
The first step in any incident response process is to determine what actually constitutes an incident. Incidents can then be classified by severity, usually by using "SEV" definitions, where lower-numbered severities are more urgent. Operational issues can be classified at one of these severity levels. In general, you can take riskier actions to resolve a higher-severity issue. Anything more severe than a SEV-3 (that is, a SEV-2 or SEV-1) is automatically considered a "major incident" and gets a more intensive response than a normal incident.
Always Assume The Worst
If you are unsure which level an incident is (e.g., you are not sure whether it is a SEV-2 or a SEV-1), treat it as the higher one. The middle of an incident is not the time to discuss or litigate severities. Assume the highest, and review the classification during the postmortem.
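The "assume the worst" rule is simple to state in code. Because lower numbers are more urgent, picking the higher severity among the candidates means picking the lowest number. The sketch below is illustrative only. The `Severity` enum and the `assume_worst` helper are hypothetical names and are not part of any PagerDuty tooling.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical encoding of the SEV scale: a lower number is more urgent."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5


def assume_worst(candidates):
    """Return the most severe (lowest-numbered) of the candidate levels.

    If responders cannot decide between, say, SEV-2 and SEV-1, this picks
    SEV-1. The classification can then be revisited in the postmortem.
    """
    if not candidates:
        raise ValueError("at least one candidate severity is required")
    return min(candidates)


print(assume_worst([Severity.SEV2, Severity.SEV1]).name)  # SEV1
```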
Can a SEV-3 be a major incident?
All SEV-2s (and SEV-1s) are major incidents, but not all major incidents need to be SEV-2s. If you require a coordinated response, even for a lower-severity issue, trigger our incident response process. The Incident Commander can then determine whether a full incident response is necessary.
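The rule for when to trigger the incident response process can be sketched as follows. Any SEV-2 or SEV-1 triggers it automatically. A lower-severity issue can also trigger it when it needs a coordinated response. The `needs_coordination` flag and the function names are hypothetical. In practice, the call is made by people, and the Incident Commander has the final say on whether full response is needed.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical encoding of the SEV scale: a lower number is more urgent."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5


def is_major_incident(sev: Severity) -> bool:
    """SEV-2 and SEV-1 are always major incidents."""
    return sev <= Severity.SEV2


def should_trigger_incident_response(sev: Severity, needs_coordination: bool = False) -> bool:
    """Trigger for any major incident, or for a lower-severity issue that
    still needs a coordinated response (hypothetical flag)."""
    return is_major_incident(sev) or needs_coordination


print(should_trigger_incident_response(Severity.SEV3))                           # False
print(should_trigger_incident_response(Severity.SEV3, needs_coordination=True))  # True
```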
| Severity | Description | Typical Response |
|---|---|---|
| SEV-1 | Critical issue that warrants public notification and liaison with executive teams. | Major incident response. |
| SEV-2 | Critical system issue actively impacting many customers' ability to use the product. | Major incident response. |
| *Anything above this line is considered a "Major Incident". Our incident response process should be triggered for any major incidents.* | | |
| SEV-3 | Stability or minor customer-impacting issues that require immediate attention from service owners. | High-Urgency page to service team. |
| SEV-4 | Minor issues requiring action, but not affecting customer ability to use the product. | Low-Urgency page to service team. |
| SEV-5 | Cosmetic issues or bugs, not affecting customer ability to use the product. | JIRA ticket. |
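If you keep severity definitions in tooling, the "Typical Response" column in the table above becomes a lookup. The sketch below is hypothetical. `TYPICAL_RESPONSE` and `typical_response` are illustrative names, not a PagerDuty API. The final assertion is a cheap guard that every level has a defined response.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical encoding of the SEV scale: a lower number is more urgent."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5


# Typical response per level, copied from the severity table.
TYPICAL_RESPONSE = {
    Severity.SEV1: "Major incident response",
    Severity.SEV2: "Major incident response",
    Severity.SEV3: "High-Urgency page to service team",
    Severity.SEV4: "Low-Urgency page to service team",
    Severity.SEV5: "JIRA ticket",
}

# Fail fast if a severity level is ever added without a defined response.
assert set(TYPICAL_RESPONSE) == set(Severity)


def typical_response(sev: Severity) -> str:
    return TYPICAL_RESPONSE[sev]


print(typical_response(Severity.SEV3))  # High-Urgency page to service team
```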
Be Specific
These severity descriptions have been changed from PagerDuty's internal definitions to be more generic. For your own documentation, you are encouraged to make your definitions very specific, usually by referring to the percentage of users or accounts affected. In general, you will want your severity definitions to be metric-driven.
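As a sketch of what a metric-driven definition might look like, the function below maps the percentage of accounts unable to use the product to a severity. The thresholds (25% for SEV-1, 5% for SEV-2, any measurable impact for SEV-3) are hypothetical placeholders, not PagerDuty's numbers, so substitute your own. SEV-5 (cosmetic issues) is deliberately left to human judgment here.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical encoding of the SEV scale: a lower number is more urgent."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5


# HYPOTHETICAL thresholds for illustration only; replace with your own metrics.
# Each entry is (minimum % of accounts affected, severity), most severe first.
THRESHOLDS = [
    (25.0, Severity.SEV1),
    (5.0, Severity.SEV2),
]


def classify(pct_accounts_affected: float) -> Severity:
    """Map the share of accounts unable to use the product to a severity."""
    if not 0.0 <= pct_accounts_affected <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    for minimum, sev in THRESHOLDS:
        if pct_accounts_affected >= minimum:
            return sev
    if pct_accounts_affected > 0.0:
        # Some customer impact, but below the major-incident thresholds.
        return Severity.SEV3
    # No customer impact: action is still required, but at low urgency.
    return Severity.SEV4


print(classify(12.0).name)  # SEV2
```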