Step 1: Risk Analysis
The first step in drafting a disaster recovery plan (DRP) is a thorough risk analysis of your computer systems. List every risk that threatens system uptime and evaluate how likely each one is in your particular IT shop. Anything that can cause a system outage is a threat, from relatively common man-made threats like virus attacks and accidental data deletions to rarer natural threats like floods and fires. Then prioritize the threats using a simple system: rate each one as low, medium, or high in two categories, probability and impact.
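The two-category ranking can be sketched in a few lines of Python. This is only an illustration: the numeric weights, the multiplication of probability by impact, and the ratings given to each threat are assumptions, not part of the method described above.

```python
# Assumed numeric weights for the low/medium/high ratings.
LEVELS = {"low": 1, "medium": 2, "high": 3}

def score(probability, impact):
    """Combine the two ratings into one number (assumed: simple product)."""
    return LEVELS[probability] * LEVELS[impact]

def prioritize(threats):
    """Sort (name, probability, impact) tuples from highest to lowest score."""
    return sorted(threats, key=lambda t: score(t[1], t[2]), reverse=True)

# Hypothetical ratings for the threats named in the text.
threats = [
    ("flood", "low", "high"),
    ("virus attack", "high", "medium"),
    ("fire", "low", "high"),
    ("accidental data deletion", "medium", "medium"),
]

for name, probability, impact in prioritize(threats):
    print(f"{name}: probability={probability}, impact={impact}, "
          f"score={score(probability, impact)}")
```

Threats at the top of the sorted list are the ones to address first; ties keep the order in which they were listed.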
Step 2: Establish the Budget
Once you've identified your risks, ask what you can do to mitigate them and how much it will cost. Can you detect a threat before it hits? How do you reduce the likelihood of it occurring? How do you minimize its impact on the business? For example, our small California Internet company could install an emergency power supply to mitigate its power outage threat and back up all its data daily to tapes stored at a remote site in case of an earthquake. The more preventive measures you establish up front, the better. Emerson says, "Dollars spent in prevention are worth more than dollars spent in recovery."
Step 3: Develop the Plan
The feedback from the business units will begin to shape your DRP procedures. If, for example, they determine that the company must be up within 48 hours of an incident to stay viable, you can estimate how long the recovery plan takes to execute and confirm that it gets the business back up within that window. Emerson suggests having the recovery systems tested, configured, and retested 24 hours before launching them. He says the setup takes anywhere from 40 hours to days to complete.
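As a rough sketch of that calculation, the snippet below adds up estimated step durations and compares the total against the 48-hour window. The step names and hour estimates are hypothetical, and the steps are assumed to run one after another rather than in parallel.

```python
def recovery_budget(step_hours, window_hours):
    """Sum the estimated hours and report whether they fit the window."""
    total = sum(step_hours.values())
    return {
        "total_hours": total,
        "slack_hours": window_hours - total,
        "fits": total <= window_hours,
    }

# Hypothetical recovery steps with estimated durations in hours.
steps = {
    "provision replacement servers": 12,
    "restore databases from backup": 20,
    "reconfigure network links": 8,
    "run verification checklist": 6,
}

budget = recovery_budget(steps, window_hours=48)
print(budget)
```

A negative slack value means the plan as estimated cannot meet the window, so either the procedure or the business requirement has to change.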
The recovery procedure should be written as a detailed plan, or "script." Establish a Recovery Team from among the IT staff and assign specific recovery duties to each member. The way your team conducts its recovery will probably be no different from its regular production procedures: the chain of command likely won't change, and neither will the aspects of the network for which each member is responsible.
Define how to deal with the loss of each part of the network (databases, servers, bridges/routers, communications links, etc.), and specify who arranges for repairs or reconstruction and how the data recovery process occurs. The script should also outline priorities for the recovery: What needs to be recovered first? What is the communication procedure for the initial responders? To complement the script, create a checklist or test procedure to verify that everything is back to normal once repairs and data recovery have taken place.
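One way to picture the script and its checklist is as a list of steps, each with a priority, an owner, and a verification check. The sketch below is hypothetical throughout: the component names, owners, and placeholder checks are assumptions, and real checks would test the actual systems.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RecoveryStep:
    priority: int                # 1 = recover first
    component: str               # part of the network being restored
    owner: str                   # Recovery Team member responsible
    verify: Callable[[], bool]   # returns True when the component is healthy

def run_checklist(steps):
    """Run each step's check in priority order and collect the results."""
    ordered = sorted(steps, key=lambda s: s.priority)
    return [(s.component, s.owner, s.verify()) for s in ordered]

# Placeholder checks stand in for real tests such as a ping or a test query.
script = [
    RecoveryStep(2, "database server", "DBA", lambda: True),
    RecoveryStep(1, "communications links", "network admin", lambda: True),
    RecoveryStep(3, "file server", "sysadmin", lambda: False),
]

for component, owner, ok in run_checklist(script):
    status = "OK" if ok else "NOT RESTORED"
    print(f"{component} ({owner}): {status}")
```

Keeping the owner next to each step makes it clear who follows up when a check fails.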
Step 4: Test, Test, Test
Once your DRP is set, test it frequently. Eventually you'll need to perform a component-level restoration of your largest databases to get a realistic assessment of your recovery procedure, but a periodic walk-through of the procedure with the Recovery Team will ensure that everyone knows their role. Regularly test the systems you're going to use in recovery to validate that all the pieces work. Always record your test results and update the DRP to address any shortcomings.
As your business environment changes, so should your DRP. Reexamine the plan every year at a high level: Do you still need every part of the plan? Do you need to add to it? Will the budget need to be adjusted to accommodate changes to the plan? As applications, hardware, and software are added to your network, they must be brought into the plan. New employees must be trained on recovery procedures. New threats to the business seem to pop up every week, and a sound DRP takes all of them into account.