Example Disaster Recovery Plan
- Introduction
- Disaster Threat Analysis
- Organizational Responsibilities
- Disaster Avoidance
- Disaster Preparation
- Disaster Recovery
- Appendices
A disaster is any incident or event that results in a major
(multi-day) interruption of operations at one or more of the
contact or data centers. For disruptions in service that affect
only a portion of systems or operations at any one location, a
subset of the full recovery procedures will likely be used to
restore normal operations. A catastrophic disaster, however,
would render the centers incapable of conducting critical
functions for an extended period of time. The impact of such a
disruption would require that notification and periodic updates
be circulated throughout the system, until normal operations were
restored. The appropriate authorities, depending on the nature of
the disaster (fire, flood, etc.), would also have to be
contacted. Personnel at each center, organized into emergency
management teams, would coordinate the initial response to the
disaster, assess the damage, and determine the extent to which
all or part of the disaster recovery plan should be deployed.
Designated team members would have responsibility for maintaining
the necessary sequence of notifications to senior management, to
users, to public emergency personnel, and to utility contractors,
as appropriate and as the need arises.
This disaster recovery plan will be invoked if one of the
following disasters occurs:
- Limited Disaster
A limited disaster is characterized by limited or
isolated damage to a part of a contact or data center that
has disabled or will disable it, partially or
completely, for a period of 24 hours.
- Moderate Disaster
A moderate disaster is characterized by severe damage to
the entire contact or data center, thereby temporarily
prohibiting the performance of all user support or operations
tasks. It requires either temporary allocation of the
workload to other existing sites or else temporary transfer
to a hot-backup site until the facility can be repaired.
However, no cold-backup site is required because of the limited
time required to return the affected site to full
operation.
- Catastrophic Disaster
A catastrophic disaster is characterized by complete
destruction of a contact or data center. Because the center is
a total loss and needs to be completely rebuilt or replaced, it
requires either temporary allocation of the workload to other
existing sites or else temporary transfer to either a hot- or
cold-backup site.
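The three disaster levels above can be read as a small decision table.
The following Python sketch is illustrative only and is not part of the
plan: the names DamageScope, Response, and classify are hypothetical,
and the empty option list for a limited disaster is an assumption (the
plan names no interim site for that level).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class DisasterLevel(Enum):
    LIMITED = "limited"
    MODERATE = "moderate"
    CATASTROPHIC = "catastrophic"


class DamageScope(Enum):
    """Hypothetical damage categories, named after the wording of the definitions."""
    PARTIAL = "limited or isolated damage to part of a center"
    ENTIRE_REPAIRABLE = "severe damage to the entire center, repairable"
    DESTROYED = "complete destruction of the center"


@dataclass(frozen=True)
class Response:
    level: DisasterLevel
    interim_options: Tuple[str, ...]  # where the workload may go in the meantime


def classify(scope: DamageScope) -> Response:
    """Map an assessed damage scope to a disaster level and its interim options."""
    if scope is DamageScope.PARTIAL:
        # Assumption: no interim site is listed because the plan names none.
        return Response(DisasterLevel.LIMITED, ())
    if scope is DamageScope.ENTIRE_REPAIRABLE:
        return Response(
            DisasterLevel.MODERATE,
            ("other existing sites", "hot-backup site"),
        )
    return Response(
        DisasterLevel.CATASTROPHIC,
        ("other existing sites", "hot-backup site", "cold-backup site"),
    )
```

A damage assessment that reports DamageScope.DESTROYED would therefore
yield the catastrophic level, the only one for which a cold-backup site
is among the options.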
- Earthquake.
- Fire.
- Flood.
- Major storms such as tornados and hurricanes.
- Loss of electrical power (e.g., power brownouts and
blackouts).
- Loss of cooling.
- Loss of network connectivity.
- Loss of telephone service.
- Hardware component failure.
- Failure of physical security.
- Loss of required staffing (e.g., evacuation, strike, or
sick-out).
- Sabotage.
- Bomb threat.
- Hacker attacks.
- Disaster Recovery Team:
- Organization and Membership
- Pre-disaster Responsibilities
- Post-disaster Responsibilities
- Headquarters
- Equipment (two-way radios, hard hats, rain gear,
bullhorns, badges).
- Environments Team
- Operations Team
- User Support Team
- Air Conditioning
- Fire Detection and Suppression Equipment
- Security Systems
- Emergency Lighting
- Uninterruptible Power Supply
- Hardware Redundancy
To properly prepare for the occurrence of disasters, the
following steps will be taken:
- Review and maintain the Disaster Recovery Plan.
- Inventory Maintenance
Maintain a current inventory and status of all equipment,
software, and data in the contact and data centers that may be
damaged or lost during a disaster.
- Offsite Backup Storage:
- Regularly back up data at the off-site storage
facilities.
- Regularly back up disaster recovery materials (e.g.,
hardware and software inventories) at the off-site storage
facilities.
- Regularly rotate the backup media as scheduled at the
off-site storage facilities.
- Backup Centers Maintenance
- Establish and Stock a Disaster Control Center
Establish the center in a safe location and stock it with a
chalkboard for general information, a radio for up-to-date news,
a telephone for external communications, and a handheld
portable radio.
- Disaster Recovery Plan Communication and
Training
Ensure that all personnel are aware of:
- The existence of the disaster recovery plan.
- Their responsibilities in case of a disaster.
- The appropriate emergency and evacuation procedures.
- Where the exits are.
- Disaster Recovery Plan Testing
- Center Inspections
Regularly inspect the contact and data centers:
- Air Conditioning
Ensure that the air conditioning systems are
functioning properly and that proper temperatures are
maintained.
- Fire Detection and Suppression Equipment
Ensure that the fire detection and suppression systems
are functioning properly.
- Security Systems
Ensure that the security warning systems and emergency
lighting systems are functioning properly.
- Uninterruptible Power Supply (UPS)
Ensure that the UPS systems are functioning
properly.
- Hardware Redundancy
- Building Inspections
Ensure that the buildings are up to code.
- Maintaining a current status of equipment in the contact
and data centers.
- Ensuring that the user community is aware of appropriate
disaster recovery procedures and any potential problems and
consequences that could affect their operations.
- Ensuring that the operations procedure manual is kept
current.
- Emergency and Evacuation Procedures
- Damage Assessment
- Disaster Communication
- Disaster Control Center
- Contact Center Recovery Strategy
- Data Center Recovery Strategy:
- Degraded Operations
- Facilities Recovery
- Hardware (Server and Network) Recovery
- Operating System Recovery
- Application Recovery
- Communications Recovery
- Backup Approach:
- Hot and Cold Backup
- Alternate Backup Data Centers
- Offsite Backup Data Storage
- Backup Activation
Once the back-up contact and/or data sites are functioning on
a full production schedule, the priority becomes the return to the
permanent centers. Initial assessments of damage would be refined,
and reconstruction plans developed. If major facilities/site damage
had been incurred, the full reconstruction plans would extend
well beyond the operations staff. However, once the schedule
for facilities reconstruction was known, at least approximately,
plans could be made for permanent replacement equipment. Unless
arrangements had been made to continue the long-term lease (or
purchase) of the temporary replacement equipment, this
undertaking would entail issuing a competitive solicitation
for the replacement hardware. Award and delivery would have to be
timed to coincide with the availability of the reconstructed centers.
With the permanent centers restored, operations are transferred
from the temporary facility by following the same sequence of
steps as was used to set up the back-up site. The
re-establishment of normal operations should proceed under far
less duress than the establishment of emergency operations, and
the logs kept during disaster recovery should help highlight and
resolve any problems that arose during
earlier system transfers.
When a disaster occurs and time and safety permit, the local
management team will:
- Announce Evacuation
Make an announcement to either evacuate the building or
to move to a safer location in the building.
- Provide First Aid
Provide first aid to any injured personnel.
- Evacuate Injured
Evacuate any injured personnel to a safe location for
transport to hospital.
- Obtain Emergency Assistance
Call for emergency assistance (e.g., ambulance, fire) as
appropriate.
- Perform Initial Assessment
Perform an initial quick assessment of the nature,
extent, and impact of the disaster.
- Notify Emergency Response Team
Notify the emergency response team leader, or if
unavailable, the secondary backup team contact(s).
- Notify Disaster Recovery Team
Notify the disaster recovery team leader, or if
unavailable, the secondary backup team contact(s).
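The two notification steps above share one rule: try the team leader
first, then fall back to the secondary contacts. A minimal Python
sketch of that rule follows. It is an illustration, not part of the
plan: the function name notify_with_fallback and the try_contact
callback (which stands in for a phone call, page, or radio message)
are hypothetical.

```python
from typing import Callable, Optional, Sequence


def notify_with_fallback(
    leader: str,
    backups: Sequence[str],
    try_contact: Callable[[str], bool],
) -> Optional[str]:
    """Try the team leader first, then each backup contact in order.

    try_contact is a placeholder for the real contact method and must
    return True when the person was reached. Returns the name of the
    first person reached, or None if nobody could be reached.
    """
    for person in (leader, *backups):
        if try_contact(person):
            return person
    return None
```

Returning None rather than raising keeps the caller in charge of what
happens when nobody answers, for example escalating to senior
management.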
When a disaster occurs and time and safety permit, the
security personnel will collaborate with the
management team to:
- Provide First Aid
Provide first aid to any injured personnel.
- Evacuate Injured
Evacuate any injured personnel to a safe location for
transport to hospital.
- Limit Damages
Perform damage-limiting measures (e.g., covering all
computers and valuable equipment with plastic garbage
bags).
- Verify Evacuation
Verify that the evacuation was successful and that the
facilities are unoccupied.
- Secure all doors.
- Provide emergency supplies of flashlights, batteries,
communications devices, to personnel as directed.
When a disaster occurs and time and safety permit, the
operations team will:
- Perform an orderly shut down of the computers if they have
not automatically powered down.
If possible, the primary (or secondary) disaster recovery
coordinator will:
- Perform a more detailed assessment of the nature, extent,
and impact of the disaster.
- Decide on an immediate course of action:
- Determine if additional equipment and supplies are
needed.
- Determine if recovery is feasible at the affected center
or if the alternate back-up center must be mobilized.
- Notify senior management about the disaster.
- Obtain approval for the expenditure of funds to bring in
any required equipment, supplies, and personnel.
- Notify the engineering response team leader.
- Notify the operations team leader.
- Notify the support response team leader with the
information necessary to provide an initial notification and
status report to users and to user support agents.
The emergency response team will:
- Notify Hardware Vendors
If there is a need for the immediate delivery of hardware
components to return the center(s) to operation (even if in a
degraded mode), contact the local marketing and/or field
service representatives of the equipment replacement vendors
to:
- Formally declare a disaster (and thereby initiate any
disaster recovery contract).
- Notify them that a priority should be placed on supplying
additional equipment and/or replacing damaged equipment.
- Notify them of the situation (e.g., hardware required,
delivery location).
- Notify them of the anticipated required schedule for
equipment replacement.
- Request delivery of required equipment to either the
affected or alternate centers (as circumstances
dictate).
- Notify Off-Site Storage Providers
Contact the off-site data storage provider to obtain
backup data tapes (as needed).
- Confer with Operations
Confer with operations team(s) to schedule retrieval of
the backup data tapes and associated documentation.
- Confer with Engineering
Confer with environments team(s) to coordinate site
readiness for the installation of the replacement equipment,
the rerouting of telecommunications links, etc.
The emergency response team will:
- Update Senior Management
Provide senior management with an updated assessment of
the nature, extent, and impact of the disaster including an
estimated schedule for full recovery.
- Arrange Emergency Funding
Obtain authorization for emergency funding, if required
to cover travel or any other extra expenses necessary to deal
with the situation.
- Notify Software Vendors
Contact the software vendors' support personnel to notify
them of the:
- Disaster situation.
- Anticipated interim operations requirements.
- Immediate help that is needed to begin procedures to
restore systems software.
- Imminent need for emergency software keys, as
appropriate, upon identification of serial numbers of
replacement equipment.
- Rush Order Supplies
Rush order any supplies, forms, or media that may be
needed.
- The operations response teams will be proceeding with
retrieval/recovery of back-up tapes.
- The engineering response team(s) will be coordinating
restoration at the affected or alternate site, as appropriate.
Both teams will initiate their portions of recovery logs.
If replacement equipment is not yet available, the disaster
recovery coordinator, in concert with the operations and engineering
team captains, will:
- Initiate an alternate production schedule to share the
resources of the remaining site to support operational
requirements for both sites.
- Test and verify communications capabilities.
The disaster recovery team will:
- Provide updates to senior management.
- Issue notification of alternate/interim processing
schedules.
- Perform a detailed assessment of the nature, extent, and
impact of the disaster (e.g., to identify safety hazards; to
identify equipment status as working, destroyed, or
salvageable).
- Prioritize repairs.
The disaster recovery team will:
- Collaborate with the facilities organization to restore
original site(s).
- Collaborate with the procurement organization to purchase
permanent replacement equipment.
For each affected contact and data center and upon delivery of
replacement equipment, the environments team will:
- Install, configure, and test the permanent replacement
equipment (workstations, servers, network cables, network
connectivity devices, etc.).
- Re-route and test the communications to the restored
center.
For each affected contact and data center, the operations team will:
- Transport the back-up tapes to the restored site.
- Install the operating systems, database management systems,
and other system software on the replacement hardware.
- Install the applications software.
- Install the data from the backup tapes.
- Test and verify that all systems are operational.
- Monitor the restored operations to verify continuity, data
integrity, etc.
- Resume all normal operations.
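The operations team steps above are strictly ordered: each one builds
on the previous one, and the plan calls for recovery logs to be kept.
The following Python sketch shows one way to run such a sequence,
stopping at the first failure and logging every step. It is a sketch
under stated assumptions, not the plan's procedure: run_restore, the
Step type, and the log line format are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable, List, Tuple

# A step is a label plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_restore(steps: List[Step], log: List[str]) -> bool:
    """Run the steps in order and record each outcome in the log.

    Stops at the first failed step so that later steps (for example,
    restoring data) never run on top of a broken earlier one.
    Returns True only if every step succeeded.
    """
    for name, action in steps:
        ok = action()
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        log.append(f"{stamp} {name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False
    return True
```

Each action would wrap a real check, such as confirming that the
operating system boots before the applications software is installed.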
The disaster recovery team will:
- Announce the restoration and re-scheduling of operations
from the restored site.
- Provide updates to senior management.
- Complete the disaster recovery logs by documenting the
restoration of normal operations.
- Re-assess the status of equipment (necessity of bidding
permanent replacement equipment while continuing the El Camino
lease, etc.).
- Re-assess any other physical/facilities requirements
before considering restoration complete.
- Confirm the status of hardware/software with
vendors/service providers.
- If it is judged advisable, check with third-party vendors
to see if a faster delivery schedule can be obtained.
- Order any additional electrical cables needed from
suppliers.
- The following additional major tasks must be followed in
use of the alternate site:
- Notify officials that an alternate site will be needed for
an alternate facility.
- Coordinate moving of equipment and support personnel into
the alternate site with appropriate personnel.
- Bring the recovery materials from the off-site storage to
the alternate site.
- As soon as the hardware is up to specifications to run the
operating system, load software and run necessary tests.
- Determine the priorities of the client software that need
to be available and load these packages in order. These
priorities often are a factor of the time of the month and
semester when the disaster occurs.
- Prepare backup materials and return these to the off-site
storage area.
- Set up operations in the alternate site.
- Coordinate client activities to ensure the most critical
jobs are being supported as needed.
- As production begins, ensure that periodic backup
procedures are being followed and materials are being placed in
off-site storage periodically.
- Work out plans to ensure all critical support will be
phased in.
- Keep administration and clients informed of the status,
progress, and problems.
- Coordinate the longer range plans with the administration,
the alternate site officials, and staff for time of continuing
support and ultimately restoring the Systems & Operations
section.
- Assemble and verify availability of all necessary hardware,
software, and resources at the back-up site
- Install and test systems and applications software
- Arrange for and test/verify full recovery of communications
capabilities
- Determine starting point for recovered operations:
- establish latest back-up files to be restored
- establish priority sequence for restoring most critical
applications
- revise production schedules
- alert the user community to status and potential gaps in
data and/or changes in procedures (i.e., need to re-enter
lost data or resubmit requests/reports, etc.)
- Restore operations and begin processing (with most critical
applications, first)
- Monitor and verify restoration is complete and data
integrity and continuity have been re-established
- Resume full processing schedule
- Notify hardware maintenance providers of disaster condition
and disposition of affected equipment
- Retrieve most recent back-up tapes and transport them to
alternate location
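The step of establishing a priority sequence for restoring the most
critical applications amounts to sorting by priority. A minimal Python
sketch follows. It is illustrative only: the Application type, the
convention that 1 means most critical, and the function name
restore_order are assumptions, not part of the plan.

```python
from typing import List, NamedTuple


class Application(NamedTuple):
    name: str
    priority: int  # assumed scale: 1 = most critical


def restore_order(apps: List[Application]) -> List[str]:
    """Return application names with the most critical first.

    sorted() is stable, so applications with equal priority keep the
    order in which they were listed.
    """
    return [app.name for app in sorted(apps, key=lambda a: a.priority)]
```

Because the priorities can depend on when the disaster occurs, the
priority values would be reviewed at recovery time rather than fixed
in advance.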
- Official Agencies (Fire, FBI, Police, Hazardous Materials,
County Civil Defense, County Emergency Management Office,
National Weather Service, Building Inspectors)
- Business Internal Staff
- Vendors (Facilities, Equipment, Software, Telephone, Power,
Support)
- Disaster Recovery Team
- Environments Recovery Team
- Operations Recovery Team
- User Support Recovery Team
This plan is based on the following assumptions:
- This plan will be invoked on the occurrence of a disaster
affecting a contact or data center.
- Once a disaster which is covered by this plan has been
declared, the plan, duties, and responsibilities will remain in
effect until the disaster is resolved and services are
restored.
- Funds will be made available to procure necessary
replacement facilities and hardware.
- An inventory exists of all hardware and software that may
need to be replaced.
- Replacement hardware is available.
- Alternate facilities for contact and data centers are
available.
- Data is regularly backed up and preserved at a remote
location.
LETTER FROM MANAGEMENT ENDORSING PROGRAM
MISSION STATEMENT
CONTENTS:
SECTION I - GENERAL POLICIES
1.1 Introduction
1.1.1 Background
1.1.2 Scope and Objectives
1.2 Plan General
1.3 Identification of Key Personnel
1.4 Initial Response and Recovery Actions
1.5 Responsibilities
1.5.1 General Responsibilities
1.5.2 Specific Responsibilities
1.6 Recovery/Restoration Activities
1.6.1 ACD Damage Assessment Activities
1.6.2 ACD Recovery Activities
1.6.3 ACD Salvage Activities
1.6.4 ACD Restoration Activities
1.6.5 Supporting Checklists
1.7 Plan Documentation
1.8 Plan Distribution
1.8.1 Distribution List
1.8.2 Distribution Procedure
1.9 Plan Testing
1.10 Plan Maintenance
1.11 Disaster Scenarios
1.11.1 Types of Disasters
1.11.2 Where Disasters Occur
1.12 Critical Call Center Assets
1.13 Emergency Declaration Guidelines
1.13.1 Five Basic Levels of a Disaster
1.13.2 Decision Criteria
1.14 Recovery and Restoration Time Frames
1.14.1 1-6 Hours After Being Notified
1.14.2 6-12 Hours After Being Notified
1.14.3 12-24 Hours After Being Notified
1.14.4 24+ Hours After Being Notified
1.15 Plan Format
1.16 Budgeting/Funding
SECTION II - CONTINGENCY AND RECOVERY PLANS
2.1 Baseline Plan Organization and Structure
2.2 Introduction
2.3 Pre-Planning Activities
2.4 Plan Distribution
2.5 Security and Disaster Prevention
2.6 Disaster Preparedness/Security
2.7 Disaster Recovery Action Plans
2.8 Training Activities
2.9 Plan Documentation
2.10 Plan Implementation
2.11 Plan Testing
2.12 Plan Maintenance
2.13 Plan Training
2.14 Summary of Activities - Disaster Response
2.15 Summary of Activities - Disaster Recovery
2.16 Summary of Activities - Disaster Restoration
2.17 Detailed Activities - Disaster Response
2.18 Detailed Activities - Disaster Recovery
2.19 Detailed Activities - Disaster Restoration
2.20 ACD Recovery Considerations - General
2.21 Risk Analysis - External Risks
2.22 Risk Analysis - Internal Risks
2.23 Risk Analysis - Security
2.24 ACD Hardware Asset Recovery Activities
2.25 Site Recovery Plan
SECTION III - PLAN MAINTENANCE, TESTING, MISCELLANEOUS
3.1 Plan Maintenance
3.2 Plan Testing
3.3 Plan Documentation
3.4 Plan Distribution
3.5 Training
3.6 Service Prioritization