Disaster Recovery (DR) Testing: Getting the Most From Your Tabletop Exercise

Many organizations do a tabletop test each year of their Incident Response (IR) or Business Continuity/Disaster Recovery (BC/DR) plan to evaluate its effectiveness and make sure it’s current. While tabletop is generally the weakest form of testing and has some significant limitations, there are some things that can be done to make it a better test of your actual IR or DR response capabilities and the thoroughness of your plans.

Consider Doing a More Thorough Type of Test

Certain frameworks (FedRAMP, HIPAA, etc.) either discourage or simply don’t accept a tabletop test as a sufficient test. Alternate types of testing that can be more thorough include simulation, parallel, and full-interruption testing. NIST Special Publication 800-84 provides a great overview. What it truly comes down to, of course, is the needs of the business, but the middle of an outage or an incident is a terrible time to find a flaw in your plan. If you determine that a tabletop exercise is sufficient, make sure to get the most out of it by considering the following tips.

Be Specific in Your ‘Tabletop’ Actions

I once observed a tabletop exercise that included the following exchange:

Tabletop Leader:  “Okay, next item is ‘notify law enforcement,’ and it says it’s the CISO’s task.”

CISO: “Okay, I notify law enforcement.”

Tabletop Leader:  “Okay, check.  Next task is….”

I actually interrupted the test at this point, despite the fact that I was only supposed to be an observer, with some questions about these actions. The CISO lives in Oregon, the company is headquartered in San Jose, and the data is in Virginia in AWS US-East. Whom did you notify? 911? In some locations, I’m not sure local law enforcement has any idea how to take a report on cybercrime. Depending on the incident, the right agency might be the FBI, the Department of Justice, the Secret Service, or even the United States Postal Service.

The point here is that “I notify law enforcement” didn’t test anything.

“I go to the URL in the Incident Response Plan for the FBI’s Internet Crime Complaint Center and make a report. I follow up with the local FBI office at the phone number in the plan as well via a telephone call. I put the complaint confirmation number in the incident tracking ticket along with the time and date of my communications.” This action provides much more coverage for testing this significant step.
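One way to hold each tabletop step to that level of detail is to write it down in a fixed shape and flag whatever is left blank. The sketch below is only an illustration: the `TabletopAction` fields and the `missing_details` helper are hypothetical names chosen for this example, not part of any standard or tool.

```python
from dataclasses import dataclass, fields


@dataclass
class TabletopAction:
    """One step a participant talks through (hypothetical schema)."""
    owner: str = ""              # who performs the step
    contact_target: str = ""     # exactly who is contacted
    contact_method: str = ""     # URL or phone number, as listed in the plan
    evidence_recorded: str = ""  # what is written into the incident ticket


def missing_details(action: TabletopAction) -> list[str]:
    """Return the names of fields the participant left blank."""
    return [f.name for f in fields(action) if not getattr(action, f.name).strip()]


# "Okay, I notify law enforcement."
vague = TabletopAction(owner="CISO")

# The more specific version of the same step.
specific = TabletopAction(
    owner="CISO",
    contact_target="FBI Internet Crime Complaint Center, then the local FBI office",
    contact_method="URL and phone number listed in the Incident Response Plan",
    evidence_recorded="Complaint confirmation number, date and time of each contact",
)

print(missing_details(vague))
print(missing_details(specific))
```

A facilitator could ask a follow-up question for every field name that comes back from `missing_details` before moving on to the next task.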

 

Try to Consider Everyone’s Different Situation

When discussing a Disaster Recovery Plan with clients, I tend to ask “What if you came to work tomorrow and the building was gone?” I usually get the “We’d just work from home, like we did during COVID” answer. When I ask about laptops, I get told “Everyone takes their laptop home.” At this point I encourage them to walk through the building after 5:00 pm, and when they do they discover that finance, sales, and HR usually don’t take their laptops home.

“Fine, we’ll get them a laptop.” That’s a great idea, but it’s not a plan. As with “I notify law enforcement,” the plan should spell out WHO is going to get them a laptop, who is going to install endpoint management and the corporate VPN on it, where the laptop will be purchased, and who is going to get it to the end user. Sometimes I hear “We’ll just tell them to buy a laptop,” which assumes that the end user has the finances to just purchase a thousand-dollar item. Surely most professionals can absorb that cost, but assuming they can is dangerous.

Obviously, testing “buying a laptop” isn’t necessary, nor is it efficient, but “I run to the store down the street from me and buy 14 Windows laptops and deliver them to Ted’s house for imaging” is a better tabletop action than “We get new laptops for finance and give them to the team.”

Think of the Business as a Whole

Often, IT or Security/Compliance gets the task of creating the Incident Response or Disaster Recovery plans. This can lead to a narrow view of the tasks required. Many of the plans I’ve seen, especially for disaster recovery, end when the systems that serve customers are available again and the after-actions have been completed, but don’t even consider back office or supporting functions.

Sure, IT can work from home for long days to get systems back up, but how long will they work without getting paid? How long can the company function without Accounts Payable paying service providers? Are you willing to lose the customers that are in the pipeline because salespeople are unable to respond quickly? These may not be ‘day 1’ recovery items, but they’re essential in their own way. When tabletop testing, be sure to remember that these functions are affected as well.

At one point, when I was testing an internal communications failure scenario and the usual method of instant messaging was out, a manager asserted he would “call” his employee. I asked him to show me that he had the employee’s phone number, and he didn’t. His assertion that he would “get it from HR” didn’t consider that communication with HR was unavailable as well. Without a phone number for “someone in HR,” obtaining the employee’s number wasn’t as easy as it seemed. A good tabletop test needs to consider the impact of the scenario beyond just the servers and databases.
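The phone number question can be checked before the exercise rather than discovered during it. The following is a minimal sketch, assuming you can export who reports to whom and which numbers each manager actually holds offline; the `unreachable_reports` function, the data layout, and the names in the sample data are all made up for illustration.

```python
def unreachable_reports(roster, offline_contacts):
    """Find employees whose manager holds no offline phone number for them.

    roster: manager name -> list of direct reports (assumed export format)
    offline_contacts: manager name -> {employee name: phone number}
    """
    gaps = {}
    for manager, reports in roster.items():
        known = offline_contacts.get(manager, {})
        missing = [name for name in reports if not known.get(name)]
        if missing:
            gaps[manager] = missing
    return gaps


# Hypothetical sample data.
roster = {
    "Dana": ["Avery", "Blake"],
    "Ted": ["Casey"],
}
offline_contacts = {
    "Dana": {"Avery": "555-0100"},  # no number on file for Blake
    "Ted": {"Casey": "555-0101"},
}

print(unreachable_reports(roster, offline_contacts))
```

Any manager who shows up in the result would be relying on “get it from HR” in a real outage.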

 

Don’t Assume Everyone is at Work Every Day

If I’m testing Disaster Recovery or Incident Response on a year-after-year basis, occasionally I like to take out a key person involved in the testing process. I actually bring a Hawaiian shirt, a sun hat, and a fruity drink in a coconut, and have someone key to the process put them on. At that point, they’re merely an observer and can’t provide information or perform steps. The reality is that disasters can happen when your key person is in Hawaii, and you still have to be able to handle whatever happens without them. That’s how you ensure you’re testing the PLAN and not the PEOPLE.

Sidelining the key person who knows the ins and outs of the architecture and every field of every database gives you not only a better test of the plan, but a better understanding of where information silos are. This can help identify which employees should be discouraged from planning concurrent absences. It also gives rising stars in the organization a chance to showcase their skills and knowledge.
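The same idea can be run on paper ahead of time: list who is able to perform each plan step, remove the person “in Hawaii,” and see which steps nobody left in the room can do. This is a rough sketch under that assumption; `blocked_steps`, the step names, and the people are hypothetical.

```python
def blocked_steps(step_owners, absent):
    """Return plan steps that no remaining person can perform.

    step_owners: step description -> list of people able to perform it
    absent: people sidelined for this run of the exercise
    """
    absent = set(absent)
    return [
        step
        for step, people in step_owners.items()
        if not (set(people) - absent)
    ]


# Hypothetical plan steps and the people who know how to do them.
step_owners = {
    "Restore customer database from backup": ["Priya"],
    "Fail over DNS to the secondary region": ["Priya", "Marcus"],
    "Notify customers of the outage": ["Jo"],
}

print(blocked_steps(step_owners, absent=["Priya"]))
print(blocked_steps(step_owners, absent=["Priya", "Marcus"]))
```

Steps that appear when a single person is removed point to the information silos described above, and steps that appear only when two people are removed suggest which absences should not overlap.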

Choose a Test That is Likely & Different From Last Year’s

I’ve noticed organizations have a tendency to test very similar, small-scale scenarios year after year. While “Someone leaves their laptop in their car and it gets stolen” isn’t a terrible test, it’s a small-scale issue with significantly limited impact. Testing that scenario year after year is an indication you’re not taking the test, or its usefulness, seriously. The same may apply to “Someone clicks on malware” or “Our region in our cloud provider is unavailable.”

Find a scenario that seems possible but has broader implications. “We discover an account that should have been disabled has been exfiltrating data for 6 months” is a far better test of the Incident Response Plan than “Someone in marketing let their kids use their laptop and they got a keylogger.”

Summary – Tabletop Tests CAN be Valuable

If you’ve taken a long, hard look at your business needs, correlated them with your risk profile, and determined a tabletop exercise is an adequate test of your Incident Response or Disaster Recovery capabilities, then, as an auditor, I’m not about to tell you that’s not the case. As part of an audit firm that avoids ‘check the box’ audits, it is important to me that you’re getting the most value out of the exercise that you can. I hope these tips can make your next tabletop a better test and a more valuable use of people’s time.

Need more information? Need your security posture, including your Disaster Recovery and Business Continuity plans, to be assessed more thoroughly? Linford & Co. can help you identify a compliance framework that can give you a solid assessment of how secure your business is and how prepared you are for security challenges. Contact us today to learn about our audit process and variety of audit services, which range from penetration testing to SOC 2 audits, HITRUST certification, and more.