Subject Matter Expert

If you are on-call for any team at PagerDuty, you may be paged for a major incident and will be expected to respond as a subject matter expert (SME) for your service. This page details everything you need to know in order to be prepared for that responsibility. If you are interested in becoming an Incident Commander, take a look at the Incident Commander Training page.

On-Call Expectations#

If you are on-call for your team, there are certain expectations of you as that on-call. This applies to both the primary and secondary on-calls. Getting paged about a SEV-3 or SEV-4 in your system comes with different expectations than getting paged with a major SEV-2.

Before Going On-Call#

  1. Be prepared, by having already familiarized yourself with our incident response policies and procedures. In particular,
    1. Different Roles for Incidents - You will be acting as a "Resolver" or "SME". But you should familiarize yourself with the other roles and what they will be doing.
    2. Incident Call Etiquette - How to behave during an incident call.
    3. During an Incident - What to do during an incident. You are specifically interested in the "Resolver" steps, but you should familiarize yourself with the entire document.
    4. Glossary - Familiarize yourself with the terminology that may be used during the call.
  2. Make sure you have set up your alerting methods, and that PagerDuty can bypass your "Do Not Disturb" settings.
  3. Check you can join the incident call. You may need to install a browser plugin. You don't want to be doing that the first time you get paged.
  4. Be aware of your upcoming on-call time and arrange swaps around travel, vacations, appointments, etc.
  5. If you are an Incident Commander, make sure you are not on-call for your team at the same time as being on-call as Incident Commander.

During On-Call Period#

  1. Have your laptop and Internet with you at all times during your on-call period (office, home, a MiFi, a phone with a tethering plan, etc).
  2. If you have important appointments, you need to get someone else on your team to cover that time slot in advance.
  3. When you receive an alert for a major incident, you are expected to join the incident call and Slack as quickly as possible (within minutes).
    1. You will be asked questions or given actions by the Incident Commander. Answer questions concisely, and follow all actions given (even if you disagree with them).

Response Mobilization#

When an incident occurs, you must be mobilized or assigned to become part of the incident response. In other words, until you are mobilized to the incident via a page or being directly asked by someone else on the incident, you remain in your everyday role. After being mobilized, your first task is to check in and receive an assignment. While it's tempting to see an incident happening and want to jump in and help, when resources show up that have not been requested, the management of the incident can be compromised.

"Never Hesitate to Escalate"#

If you're not sure about something, it is perfectly acceptable to bring in other SMEs from your team that you believe know a given system better than you. Don't let your ego keep you from bringing in additional help. Our motto is "Never hesitate to escalate", you will never be looked down upon for escalating something because you didn't know how to handle it.

Blameless#

There will be incidents. Some will be caused by you, some will be caused by others... some will just happen. Our entire incident response process is completely blameless. Blaming people is counter productive and just distracts from the problem at hand. No matter how an incident started, they all need to get solved as quickly as possible.

Wartime vs Peacetime#

Behavior during a major incident is very different to any other alert you may have received in the past. We call a major incident "wartime", and make a distinction between that and normal everyday operations ("peacetime").

Peacetime#

The organizational structure is generally based on seniority. The more senior members of a team will lead discussions, and managers or team leads will have the final say. Decisions are made after careful consideration of all options, and to minimize potential risk to customers.

Wartime#

Wartime is different, and you will notice on our major incident calls that there's a different organizational structure.