IT Disaster Recovery for Business Continuity

IT disaster recovery consists of developing step-by-step procedures for full recovery, disaster avoidance and business continuity.

When people think about DR, they usually think about backup, yet backup is only one piece of the business continuity and disaster recovery (BC-DR) puzzle and, on its own, is insufficient to keep business operations running in the event of a disaster.

Backup is not disaster recovery (DR), for the following reasons:

  • Failure of backup software: the backup software itself can fail, so a usable copy may not exist when it is needed.
  • Service levels: backups typically run twice per day, which means the recovery time objective (RTO) will be significantly higher and the recovery point objective (RPO) could reach about 12 hours of data loss, which is not acceptable for critical applications in a DR context.
  • Reverse replication: in the event of an outage, once an application has been made available on the target site, you must extend that application's protection to include the new data being created. A backup solution cannot start taking backups and ship them back to the production site, whereas a DR solution keeps the application protected by replicating back to the source site.
  • Application impact: backups usually run at night because copying an application and its data loads the server's CPU and significantly reduces end-user productivity.
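The ~12-hour figure follows directly from the backup interval: with evenly spaced backups, a failure just before the next run loses everything written since the previous one. A minimal Python sketch, assuming evenly spaced backups and ignoring how long a backup takes to complete (the function names are illustrative only):

```python
from datetime import timedelta

def worst_case_rpo(backups_per_day: int) -> timedelta:
    """Worst-case data loss for evenly spaced periodic backups.

    A failure just before the next backup loses everything written since
    the previous one, so the worst case equals the backup interval.
    """
    if backups_per_day <= 0:
        raise ValueError("backups_per_day must be positive")
    return timedelta(days=1) / backups_per_day

def meets_rpo(backups_per_day: int, target_rpo: timedelta) -> bool:
    """True if the schedule's worst-case data loss fits within the target RPO."""
    return worst_case_rpo(backups_per_day) <= target_rpo

print(worst_case_rpo(2))                    # 12:00:00
print(meets_rpo(2, timedelta(minutes=15)))  # False
```

Note that this bounds only data loss (RPO); the time to restore from the backup (RTO) comes on top of it.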

Every institution, large or small, should have both a backup mechanism and a disaster recovery solution in place; they are complementary pieces of the same puzzle.

Mitigation Measures for Some IT Hazards

DOWNTIME
Possible risks:
  • Hardware
  • Software
Mitigation measures:
  • Redundancy
  • Maintenance and upgrade of software

NETWORK
Possible risks:
  • Unreliable network
  • Loss of connectivity
  • Traffic
  • Misconfiguration
Mitigation measures:
  • Design and monitor the network for maximum reliability
  • Physical protection, redundancy or diverse paths
  • Network segmentation
  • Installation of firewalls to ensure security
  • Load balancing (intelligent redirection to the backup site)
  • Use automation to deploy changes, and test all configurations in a lab environment before applying them to production devices

DATA AND APPLICATION
Possible risks:
  • File corruption
  • Application downtime
  • Malicious software
Mitigation measures:
  • Data backup
  • Mirroring of applications, load balancing and replication
  • Security management and installation of antivirus software

EQUIPMENT FAILURES
Possible risks:
  • Server failure
  • Server overload
  • Other hardware
  • Old equipment
Mitigation measures:
  • Redundant disks, backups, SAN / NAS
  • Load balancing, monitoring, virtualization
  • Regular maintenance
  • Planning for upgrades and replacing out-of-date equipment

POWER
Possible risks:
  • Power outage
  • Equipment failure
Mitigation measures:
  • Redundancy and backup power supply (UPS and generators)
  • Regular monitoring and preventive maintenance

ATTACKS
Possible risks:
  • DDoS
  • Viruses
  • Hackers
  • Other attacks
Mitigation measures:
  • Managed security services / anti-DDoS
  • Installation of antivirus software
  • Firewalls and other security features
  • Access control system

HUMAN ERROR
Possible risks:
  • File deletion
  • Unskilled staff
  • Fire
Mitigation measures:
  • Regular backup
  • Access management
  • Training / staff certification requirements
  • Fire detection system, fire extinguishers and fire hydrants

Factors Influencing Successful IT Disaster Recovery

A. INFRASTRUCTURE

Infrastructure is a fundamental aspect that shapes and defines the outcome; its condition and state should be well known in terms of network connectivity, quality, performance, processing capability and scalability.

Considerations at the infrastructure layer:

  • Before any hosting or connectivity is set up, the required infrastructure, including the additional hardware and software needed specifically for recovery and replication, should be well defined; avoid single-purpose infrastructure.
  • The same infrastructure on both sites (primary and alternative site).
  • Availability of maintenance facilities.

B. RTO AND RPO MEASUREMENT

RTO and RPO targets should be based on a business impact analysis (BIA) that classifies systems and assets in a BIA matrix by criticality and priority level.

For critical systems, RTO and RPO should be minimized, ideally to zero.
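A BIA matrix can be turned into concrete targets mechanically. The sketch below is hypothetical: the 1 to 3 rating scale, the rule of taking the stricter of the two ratings, and the tier targets are illustrative assumptions, not values from this guideline; each organization must derive its own from its BIA.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical (RTO, RPO) targets per tier; real values come from the organization's BIA.
TIER_TARGETS = {
    1: (timedelta(0), timedelta(0)),               # critical: aim for zero
    2: (timedelta(hours=4), timedelta(hours=1)),
    3: (timedelta(hours=24), timedelta(hours=12)),
}

@dataclass
class Asset:
    name: str
    criticality: int  # assumed scale: 1 = high, 3 = low
    priority: int     # assumed scale: 1 = high, 3 = low

def tier(asset: Asset) -> int:
    """Use the stricter (lower-numbered) of the two BIA ratings."""
    for rating in (asset.criticality, asset.priority):
        if rating not in TIER_TARGETS:
            raise ValueError(f"{asset.name}: rating {rating} outside 1-3")
    return min(asset.criticality, asset.priority)

def targets(asset: Asset) -> tuple[timedelta, timedelta]:
    """Return the (RTO, RPO) targets for an asset's tier."""
    return TIER_TARGETS[tier(asset)]

# Hypothetical assets, for illustration only.
for asset in (Asset("payments", 1, 2), Asset("intranet", 3, 3)):
    rto, rpo = targets(asset)
    print(asset.name, tier(asset), rto, rpo)
```

Taking the stricter rating is one conservative choice; a weighted score across the matrix would work equally well as long as the rule is documented.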

C. REDUNDANCY AND BACKUP

Backup and redundancy are both infrastructure and data protection methods, but neither can replace the other, and both should be applied at every layer.

Redundancy is a data and system protection method that acts as a real-time failure prevention measure.

Backup does not provide real-time protection, but restoring from it protects against greater loss.

Data and system backups should be performed regularly and kept offsite.
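Both requirements, regularity and offsite storage, can be checked together by confirming that the newest offsite copy is younger than an allowed age. A minimal sketch, assuming a hypothetical `BackupRecord` inventory exported from whatever backup tool is in use:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class BackupRecord:
    taken_at: datetime  # when the backup was taken (timezone-aware)
    offsite: bool       # True once a copy is stored offsite

def offsite_backup_current(records: list[BackupRecord], max_age: timedelta,
                           now: datetime) -> bool:
    """True if the newest offsite backup is no older than max_age."""
    offsite_times = [r.taken_at for r in records if r.offsite]
    if not offsite_times:
        return False  # no offsite copy at all
    return now - max(offsite_times) <= max_age

now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
records = [
    BackupRecord(datetime(2024, 6, 30, 0, 0, tzinfo=timezone.utc), offsite=False),
    BackupRecord(datetime(2024, 6, 29, 0, 0, tzinfo=timezone.utc), offsite=True),
]
# False: the recent backup is local only, and the offsite copy is 36 hours old.
print(offsite_backup_current(records, timedelta(hours=24), now))
```

The example shows why the offsite condition matters: a fresh local backup does not help if the whole site is lost.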

D. HIGH AVAILABILITY (HA)

HA is a form of disaster avoidance: the capability to switch automatically to an alternative site without any downtime.

HA is achieved by applying:

  • Clustering (mirroring of critical applications)
  • Replication of clusters
  • Load balancing in the network, which improves HA by running multiple servers simultaneously in primary and secondary order
  • Redundancy, implemented fairly and sufficiently at every layer (network, storage, etc.)
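The primary/secondary load-balancing arrangement can be sketched as follows: traffic is spread round-robin across healthy primary servers and falls back to secondary servers only when no primary is healthy. The `Server` and `PriorityBalancer` names are hypothetical, and the `healthy` flag stands in for a real health check or monitoring probe.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Server:
    name: str
    priority: int         # assumed convention: 0 = primary, 1 = secondary, ...
    healthy: bool = True  # in practice set by a health check

class PriorityBalancer:
    """Round-robin across healthy servers in the best available priority group."""

    def __init__(self, servers: list[Server]):
        self.servers = servers
        self._counter = count()

    def pick(self) -> Server:
        healthy = [s for s in self.servers if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy server available")
        best = min(s.priority for s in healthy)
        group = [s for s in healthy if s.priority == best]
        return group[next(self._counter) % len(group)]

pool = [Server("app-a", 0), Server("app-b", 0), Server("app-dr", 1)]
lb = PriorityBalancer(pool)
print([lb.pick().name for _ in range(4)])  # ['app-a', 'app-b', 'app-a', 'app-b']
pool[0].healthy = pool[1].healthy = False  # both primaries fail
print(lb.pick().name)                      # app-dr
```

Because the secondary receives no traffic while a primary is healthy, failover here costs only the time it takes the health check to mark the primaries down.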

E. LEVELS OF DISASTER RECOVERY SITES

Figure: Levels of disaster recovery sites

F. REPLICATION SOLUTION

Replication for disaster recovery (DR) is no longer a "nice to have" technology but a necessary part of every disaster recovery solution.

Replication Mode

Figure: Replication modes
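The replication-mode figure is not reproduced here. Replication is commonly run in one of two modes, synchronous and asynchronous; assuming those are the modes the figure compares, the toy model below illustrates the trade-off. Synchronous replication acknowledges a write only after the replica has it, so no acknowledged write is lost if the primary fails (an RPO of zero, at the cost of write latency). Asynchronous replication acknowledges immediately and ships writes later, so anything still queued is lost. The class is illustrative only, not a real replication API.

```python
from collections import deque

class ReplicatedStore:
    """Toy primary/replica pair contrasting synchronous and asynchronous replication."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary: list[str] = []
        self.replica: list[str] = []
        self.pending: deque = deque()  # writes not yet shipped (async mode only)

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            # Acknowledge only after the replica also holds the write.
            self.replica.append(record)
        else:
            # Acknowledge immediately; the write is shipped to the replica later.
            self.pending.append(record)

    def ship(self) -> None:
        """Send queued writes to the replica (run periodically in async mode)."""
        while self.pending:
            self.replica.append(self.pending.popleft())

    def lost_on_primary_failure(self) -> list[str]:
        """Writes that would be lost if the primary site failed right now."""
        return list(self.pending)

sync_store = ReplicatedStore(synchronous=True)
async_store = ReplicatedStore(synchronous=False)
for store in (sync_store, async_store):
    store.write("order-1")
    store.write("order-2")
print(sync_store.lost_on_primary_failure())   # []
print(async_store.lost_on_primary_failure())  # ['order-1', 'order-2']
```

In asynchronous mode the effective RPO is the time between `ship()` runs, which is why critical systems that need an RPO near zero typically use synchronous replication over short distances.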

G. VIRTUALIZATION

Virtualization is a software technique in which a single physical resource appears as multiple logical resources, which reduces data center complexity and improves restoration. With this solution you have fewer machines to manage. A server, including its operating system, applications and patches, is encapsulated into a single virtual server whose hardware is virtual and completely separated from the actual physical hardware of the host server. This separation and encapsulation enable redundancy and restoration, because a virtual server can be restored on another host if necessary.
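Restoring virtual servers on another host can be sketched as a placement problem. The sketch assumes VM disks live on shared storage (for example SAN / NAS) so that only host RAM limits where a VM can restart; the `Host` model and the largest-VM-first, most-free-host rule are illustrative assumptions, not how any particular hypervisor decides.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    capacity_gb: int
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> RAM in GB

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.vms.values())

def restart_elsewhere(failed: Host, survivors: list[Host]) -> dict[str, str]:
    """Re-place each VM from a failed host onto the surviving host with the most free RAM.

    Returns a mapping of VM name -> new host name; raises if a VM fits nowhere.
    """
    if not survivors:
        raise RuntimeError("no surviving hosts")
    placement = {}
    # Place the largest VMs first so they are less likely to be left without room.
    for vm, ram in sorted(failed.vms.items(), key=lambda kv: kv[1], reverse=True):
        target = max(survivors, key=Host.free_gb)
        if target.free_gb() < ram:
            raise RuntimeError(f"no host has {ram} GB free for {vm}")
        target.vms[vm] = ram
        placement[vm] = target.name
    failed.vms.clear()
    return placement

h1 = Host("host-1", 64, {"web": 8, "db": 32})
h2 = Host("host-2", 64, {"app": 16})
h3 = Host("host-3", 64, {"mail": 40})
print(restart_elsewhere(h1, [h2, h3]))  # {'db': 'host-2', 'web': 'host-3'}
```

Checking placement in advance like this also shows whether the remaining hosts have enough spare capacity to absorb a host failure at all.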

H. SECURITY SYSTEM

Physical and cyber security systems should be established.

Refer to the Directive on cyber security for network and Information.

IT Disaster Recovery Strategies

Figure 4: IT disaster recovery strategies

IT disaster recovery strategies encompass recovery solutions at different layers.

Disaster Recovery Phases

The main phases for responding to a disaster are:

  • Deficiency/damage notification
  • Analysis and evaluation
  • Response to and control of the disaster (crisis management)
  • Site rehabilitation and return of business operations to their normal level
  • Documentation and plan activation/update

To ensure the long-term viability and effectiveness of the Business Continuity Plan, the organization should regularly conduct and document a business continuity testing and training program:

  • Conduct a plan review at least quarterly.
  • Conduct continuity awareness briefings or orientations for the entire workforce.
  • Train personnel on all reconstitution plans, procedures and recovery processes.
  • Test and validate equipment monthly to ensure internal and external interoperability, and test the viability of communications, alert and notification systems.
  • Test primary and backup infrastructure systems and services at the primary and secondary recovery sites.
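The testing cadence above can be tracked automatically. A minimal sketch, assuming a quarter is approximated as 91 days and a month as 30 days, and that the dates of the last plan review and equipment test are recorded somewhere (the activity names are hypothetical keys):

```python
from datetime import date, timedelta

# Assumed approximations: a quarter ~ 91 days, a month ~ 30 days.
MAX_INTERVAL = {
    "plan review": timedelta(days=91),
    "equipment test": timedelta(days=30),
}

def overdue(last_done: dict[str, date], today: date) -> list[str]:
    """Return activities never performed or last performed longer ago than allowed."""
    result = []
    for activity, interval in MAX_INTERVAL.items():
        done = last_done.get(activity)
        if done is None or today - done > interval:
            result.append(activity)
    return result

today = date(2024, 6, 30)
log = {"plan review": date(2024, 1, 1), "equipment test": date(2024, 6, 1)}
print(overdue(log, today))  # ['plan review']
```

Awareness briefings and training sessions can be added to `MAX_INTERVAL` in the same way once the organization fixes their frequency.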