Disaster Recovery Testing: Communication, Communication, Communication

I have a challenge for you that can tell you immediately whether your company has a good disaster recovery plan. First, find out from your network administrator where the cable connecting your business to the telephone company plugs in. Then, bring your voice communications manager or technician to that location and make a sudden move, pretending to unplug the cable.

If the voice manager or technician screams and lurches towards you, then you have no plan. If the technician dares you to go ahead, your company has just passed the test.

It is amazing how companies will spend hundreds of dollars backing up data, identifying a hot site and making sure the lights will stay on with a backup generator, yet disregard some key questions: What are customers going to hear or experience when the company suffers a disaster? And how will you let all 500 employees know that they need to stay home because the facility has no electrical power?

The number one thing you must have during a disaster is communication. Recovering from a disaster is a very fluid operation. You never know what the disaster will be, what will be affected and what actions you will need to take. It is imperative that you are able to communicate, both to direct the response to the disaster and to give your customers confidence that everything is under control.

Businesses have tried to address some of these concerns with clustered or redundant PBXs, and by working with TELCO to reroute calls when prompted. A solution is only good if tested, and that doesn’t necessarily mean just sending the calls to another location. Have a plan, publish the execution time internally and pull the plug. Did it work as planned? If not, that is why you test!
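One way to answer "Did it work as planned?" is to write down what the plan says should happen before the drill and record what was actually observed. The sketch below is only an illustration of that idea: the `DrillCheck` class, the `failed_checks` helper and the sample observations are all hypothetical, not part of any particular product or plan.

```python
from dataclasses import dataclass


@dataclass
class DrillCheck:
    description: str  # what the plan says should happen
    passed: bool      # what was actually observed during the drill


def failed_checks(checks):
    """Return the description of every check that did not go as planned."""
    return [check.description for check in checks if not check.passed]


# Hypothetical observations recorded while the plug was pulled.
drill = [
    DrillCheck("Inbound customer calls reroute to the backup location", True),
    DrillCheck("Customers hear the planned announcement", False),
    DrillCheck("Employees receive the stay-home notification", True),
]

for item in failed_checks(drill):
    print("Fix before the next drill:", item)
```

Anything the loop prints is a kink found during a test rather than during a real disaster.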

If your testing was done right, the next time you step into the wiring closet the technician will pull the TELCO plug out before you even ask, and will do it with the same confidence you will have during the next disaster when the big boss asks you if everything is working properly. You’ll score plenty of bonus points when you tell your boss to go ahead and call the customer number.

By: Steve O'Neal

Disaster Recovery Testing: How Do You Know It Works?

While presenting to various groups around the country, I am always asked, “What is the best recovery solution out there?” My answer has always been the same: What your solution is matters less than whether you have TESTED it. Not just sent a file or two, or restored a subset of data, but actually tested the solution under conditions as close to real-world scenarios as possible.

Thousands of dollars are spent every year to provide disaster recovery and/or business continuity, and yet businesses large and small barely test the solution to make sure it will really work. Until you test realistically, the kinks in your solution will stay hidden until the moment you really need things to work: after the disaster has occurred.

During Katrina, a large hospital in New Orleans scrambled to send armed security guards back into the calamity to retrieve backup tapes, only to discover that data had never been properly written to the tapes. No one knew how long the backup process had been broken, and the staff had to start almost from scratch, retrieving patient information from paper files.
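A broken backup like that one is exactly what a routine restore test is meant to catch. As a minimal sketch, assuming the backup can be restored to a scratch directory and the original files are still available for comparison, the following compares SHA-256 hashes of every source file against its restored copy. The function names `sha256_of` and `verify_restore` and the directory layout are illustrative assumptions, not a feature of any specific backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> dict:
    """Compare every file under source_dir with its restored copy.

    Returns the relative paths that are missing from the restore and
    the ones whose contents differ from the source.
    """
    missing, mismatched = [], []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            missing.append(rel.as_posix())
        elif sha256_of(src) != sha256_of(restored):
            mismatched.append(rel.as_posix())
    return {"missing": missing, "mismatched": mismatched}
```

An empty result for both lists means the restored copy matches the source file for file. It says nothing about whether the applications that depend on the data will start, so it complements a full recovery test rather than replacing one.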

Another company had redundant generators that performed exactly as designed when the town flooded and the utility power dropped. The backup generators and UPS delivered uninterruptible power to the data center servers without hesitation, but the room started getting warmer and warmer. Only then did the company realize that it had neglected to include the power connections for the air conditioning on the backup generator circuits. The data center overheated and had to be shut down in order to protect the equipment.

Testing has a side benefit of making better technicians, managers, and employees. Most technicians have never built the system they maintain from scratch. In a test they can try new things, review their current day-to-day processes, and in many cases apply new knowledge to daily activities. Lab environments are an amazing learning tool because you can start over again and again without losing revenue to server downtime. Think about it: you are providing a lab for the IT team to recreate the entire IT business, so learning is unavoidable.

Think your plan is good? Test it. Find the kinks, find out which processes aren’t working the way you thought they were, and make sure your team knows what to do when the kinks do appear. Imagine if your local fire department never tested the fire hoses. Would you feel good about them keeping your house from burning down?
By: Steve O'Neal