What is Incident Management & Why is It Important?

Imagine that your system is under attack and your customers cannot access it because of the resulting service disruption. What do you do next, and how do you respond? This is where incident management comes into play. An effective incident management process and incident response plan help return your system to normal operations. SOC 2 auditors will evaluate the controls you have in place to address incidents when they occur.

What is the Purpose of Incident Management?

The purpose of incident management is to restore the services that a service organization provides to its user entities to normal operations as quickly as possible after an event, minimizing the event’s impact on the service organization’s achievement of its service commitments and system requirements.

How Do You Classify a Major (or Critical) Incident?

A major or critical incident may be defined as a service disruption or reduction in quality of service of such magnitude that it interferes with the service organization’s ability to meet its service commitments and system requirements to a broad range of users. A major or critical incident requires an emergency response to restore functionality to its intended state.

  • Events are bound to occur. No service organization is immune from unexpected events that must be addressed promptly to return to normal operations.
  • Events can lead to the loss of, or disruption to, operations, services, or functions, and result in a service organization’s failure to achieve its service commitments or system requirements.
  • Events may arise from actual or attempted unauthorized access that impairs, or could impair, the availability, integrity, or confidentiality of information or systems; from unauthorized disclosure, theft, corruption, or destruction of information; or from damage to systems.
  • Events may include everything from a security breach to a denial-of-service attack.

Incident management includes the activities that identify, document, analyze, and address events and prevent them from recurring.

What Are the Benefits of Incident Management?

The benefit of an incident management program is that it provides a methodology for handling incidents through every stage of the response, so that normal operations return quickly and the severity of the service disruption is limited. If an event is not managed appropriately and in a timely manner, it may escalate into a bigger problem, crisis, or disaster.

Ineffective incident management may result in a greater loss of or longer disruption to business operations or services, adversely impacting information security, information systems, employees, customers, or other critical business functions of the service organization.

 

What is an Incident Management Process?

An incident management process encompasses the actions from identification of an incident through restoration of normal operations, thereby limiting the severity and duration of the disruption. Implementing a repeatable process to manage incidents assists a service organization in achieving its service commitments and system requirements. A service organization’s resilience in responding to incidents depends on how prepared it is for unexpected events and on how well organized its response plan is for keeping service outage, loss, or damage to a minimum.

Why is Effective Incident Management Important?

An effective incident management process is important because it provides a methodology to respond to incidents as they occur thereby substantially lowering the time needed to return to normal operations.

Preventing an event from occurring is always the best type of control to implement. However, quickly detecting an event when it does occur, and addressing it effectively and efficiently, is also necessary because it is difficult to assess every potential risk scenario in an ever-changing information technology landscape.

What are the 4 Main Stages of an Incident?

The 4 main stages of an incident management process are 1) detection, 2) containment, 3) resolution, and 4) post-mortem review. Beyond these stages, a strong incident management process should also account for the following items:

  • Preparation for an incident
  • Identification of an incident
  • Containment of the incident
  • Documenting and tracking the incident in a ticketing system to record actions taken
  • Classifying and prioritizing the incident based upon impact and urgency
  • Assigning the incident to appropriate incident response team members
  • Diagnosing and responding to the incident to eradicate its impact
  • Resolving the incident and restoring operations
  • Closing the incident
  • Analyzing post-recovery lessons learned
  • Incorporating lessons learned back into the incident response plan

An incident can be reported through various internal and external channels, such as email, phone, application or infrastructure monitoring alerts, or a customer support team member. The severity of an incident varies with its impact: a service disruption affecting all customers is more severe than one affecting a small number of users intermittently. An incident should be classified by its impact (such as the number of users affected or the severity) and by its urgency (such as a high or low priority for returning to normal functionality).

 

Documenting the incident in an incident management tool or ticketing system helps to track the incident from initial identification through to resolution and provides a means for monitoring the status of the incident at any time throughout its life cycle. Being able to separately identify incidents within the ticketing system allows for data analysis and trending of incidents that aids in preventing future recurrence.
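To make the idea of tracking an incident through its life cycle concrete, here is a minimal sketch of a ticket record that logs every status change. The status names, the allowed transitions, and the `IncidentTicket` class are all hypothetical and chosen for illustration; a real ticketing system will have its own workflow.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    IDENTIFIED = "identified"
    CONTAINED = "contained"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Assumed transitions for this sketch; a real process may allow other paths.
ALLOWED = {
    Status.IDENTIFIED: {Status.CONTAINED, Status.RESOLVED},
    Status.CONTAINED: {Status.RESOLVED},
    Status.RESOLVED: {Status.CLOSED},
    Status.CLOSED: set(),
}


@dataclass
class IncidentTicket:
    ticket_id: str
    summary: str
    status: Status = Status.IDENTIFIED
    history: list = field(default_factory=list)

    def advance(self, new_status: Status, note: str) -> None:
        """Move the ticket to a new status and record who-did-what notes."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        # Each entry: (timestamp, old status, new status, action taken).
        self.history.append(
            (datetime.now(timezone.utc), self.status, new_status, note)
        )
        self.status = new_status


ticket = IncidentTicket("INC-001", "Customers cannot log in")
ticket.advance(Status.CONTAINED, "Blocked the offending traffic source")
ticket.advance(Status.RESOLVED, "Restarted the authentication service")
```

Because every transition is appended to `history`, the status of the incident can be checked at any point, and the recorded entries can later feed trend analysis across incidents.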

Having a standardized approach to classifying incidents based upon the severity of the adverse impact on the business and the urgency needed to resolve the incident will help to focus limited resources on significant incidents requiring immediate attention. Incidents that may harm the service organization’s ability to meet service level agreements should take precedence over incidents having a lower impact on the service organization’s service commitments and system requirements.
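A classification scheme based on impact and urgency can be sketched as a small lookup. The three levels, the additive scoring, and the P1 to P4 labels below are assumptions made for illustration only; each service organization defines its own matrix and thresholds.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify(impact: Level, urgency: Level) -> str:
    """Map impact and urgency to a priority label (hypothetical thresholds)."""
    score = impact + urgency  # ranges from 2 (low/low) to 6 (high/high)
    if score >= 6:
        return "P1"
    if score == 5:
        return "P2"
    if score == 4:
        return "P3"
    return "P4"


# Example queue of (summary, impact, urgency); the entries are made up.
queue = [
    ("Intermittent errors for a few users", Level.LOW, Level.MEDIUM),
    ("Full outage for all customers", Level.HIGH, Level.HIGH),
    ("Slow reports for one region", Level.MEDIUM, Level.MEDIUM),
]

# Sort so that the highest-priority incidents are handled first.
triaged = sorted(queue, key=lambda item: classify(item[1], item[2]))
```

Sorting by the priority label puts the full outage at the front of the queue, which reflects the goal of focusing limited resources on the incidents that most threaten service commitments.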

Assigning the incident to appropriate incident response team members gets the incident quickly in front of individuals who have the role and responsibility to address the incident efficiently and effectively. Escalating the incident further up the chain of command depends upon the facts and circumstances of the incident’s impact and urgency.

Responding to an incident entails thoroughly investigating the incident and diagnosing the problem so that it may be adequately contained and resolved to ensure prompt restoration of normal operations. This step may include a root cause analysis of the problem so that the problem may not only be fixed for the current incident but also to prevent future similar incidents from occurring.

What is the Difference Between Resolution & Recovery of an Incident?

The difference between recovery and resolution of an incident is that recovery occurs when service is restored to normal operations, perhaps through a temporary workaround, whereas resolution occurs when the root cause of the event has been analyzed and a permanent fix has been deployed (e.g., a patch or a bug fix).
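One way to keep the two milestones separate in practice is to record a timestamp for each. The sketch below is an illustration under assumed names (`IncidentTimeline`, `recovered_at`, `resolved_at`); it is not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentTimeline:
    detected_at: datetime
    recovered_at: Optional[datetime] = None  # service restored, possibly via workaround
    resolved_at: Optional[datetime] = None   # permanent fix for the root cause deployed

    def time_to_recovery(self) -> Optional[timedelta]:
        """Elapsed time until service was restored, or None if still down."""
        if self.recovered_at is None:
            return None
        return self.recovered_at - self.detected_at

    def time_to_resolution(self) -> Optional[timedelta]:
        """Elapsed time until the permanent fix, or None if not yet deployed."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.detected_at


# Made-up example: a workaround restores service in 90 minutes,
# while the permanent fix ships two days after detection.
timeline = IncidentTimeline(
    detected_at=datetime(2024, 1, 1, 9, 0),
    recovered_at=datetime(2024, 1, 1, 10, 30),
)
timeline.resolved_at = datetime(2024, 1, 3, 9, 0)
```

Tracking both durations makes it visible when an incident has been recovered but is still waiting on a permanent fix.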

The incident ticket may be closed upon completion of all actions to respond to the incident and restore service back to normal operations. In some cases, depending upon the results of the root cause analysis, a longer-term fix may be warranted that is implemented further down the road to resolve the underlying problem in order to prevent future recurrence.

Analyzing the lessons learned after recovery is a good way to identify where the process needs adjustment, which areas require improvement, and what training is needed. Post-mortems are an effective way to learn from the incident and apply those lessons to continuously improve the process over time.

 

What are Considerations for an Incident Response Plan?

A well-documented incident response plan that is communicated to appropriate personnel helps guide actions needed when an incident such as a security breach, cyberattack, or server downtime occurs.

An incident response plan should include the following:

  • Identification of incident response team members
  • Defined roles and responsibilities for each incident response team member
  • Incident classification matrix based upon impact/urgency
  • Protocols for reporting and communicating to internal/external parties as appropriate
  • Strategies for responding to various types of incidents
  • Consideration of relevant critical system components across the entity that could impair operations
  • Periodic table-top test of the plan or simulation of an event
  • Continuous refinement of the incident response plan based upon live events, test results, and/or lessons learned

It is important that the incident response plan is communicated and readily accessible to appropriate personnel so that roles and responsibilities are known and understood. Internal and external communication protocols should be outlined in advance of an incident so that required notifications are made on time. Incident communication includes informing impacted internal and external users that the system is experiencing degraded performance or an outage. Documenting response strategies for specific risk scenarios, based upon their classification, can jump-start the response when such an incident occurs and may help to minimize outage duration, loss, or damage.

 

Conscientious service organizations continually evaluate, test, and refine their incident response plan. They adjust their approach to incident management by performing post-mortems and incorporating lessons learned, so that they become more responsive and adept at resolving incidents as they occur, thereby minimizing potential loss, damage, or disruption of service. Rapid recovery to normal operations equates to lower downtime, cost and productivity savings, and higher customer confidence.

Executing a well-thought-out incident management process along with an incident response plan is a value-added differentiator in the competitive service organization environment.

 

Where Do SOC 2 Auditors Focus When Auditing Incident Management?

The trust services common criteria and related areas of focus are applicable to all service organizations undergoing a SOC 2 examination. This includes the evaluation of security incidents to determine the impact on the service organization’s service commitments and system requirements and the action needed to prevent or remediate adverse impacts. It also includes the response to a security incident, in which a defined incident response plan is executed to understand, contain, remediate, and communicate the incident as appropriate. Additionally, the service organization must identify, develop, and implement activities to recover from security incidents.

Auditors will focus on the controls in place providing reasonable assurance that incidents are identified, tracked, investigated, and resolved in a timely manner. Controls that in aggregate help to meet the required criteria may include but are not limited to the following:

  • An established Incident Management Policy & Procedure including an Incident Response Plan that has been communicated to appropriate personnel.
  • Clearly defined incident management roles and responsibilities.
  • Documented incident classification protocols and priority based upon the severity of the impact and urgency.
  • Utilization of a tracking mechanism (e.g., a ticketing system) to document and track the status of incidents from identification to resolution and closure.
  • Established incident reporting and communication protocols and procedures.
  • Established procedures to contain the incident and to execute corrective actions for remediation of the incident.
  • Documented remediation plans and corrective actions.
  • Root cause analysis to prevent future recurrence of the incident.
  • Timely resolution of incidents and restoration to normal operations.
  • Reviewing and updating of the Incident Response Plan at least annually based upon lessons learned.
  • Testing of the Incident Response Plan at least annually if no security events occur during the year and revision of the plan based upon test results and lessons learned.

Summary

Incidents happen. Service organizations need to make the best use of their limited time and resources to address these inevitable events by considering trends, patterns, and underlying problems or root causes. Utilizing an incident management tool and addressing incidents in a methodical manner will help reduce chaos.

Incident management is an important tool in any service organization’s arsenal: it helps the organization achieve its service commitments and system requirements, maintain normal operations, and retain satisfied customers.

Please contact us at Linford & Company if you would like more information regarding SOC reports. Our team of IT audit professionals completes Type I and Type II SOC 1 audit reports (formerly SAS 70 / SSAE 16) and SOC 2 audit reports on behalf of service organizations located around the world. We are available to answer questions you may have regarding SOC compliance requirements and your auditing needs.

This article was originally published on 2/20/2019 and was updated on 3/22/2022.