IT Disaster Recovery for Business Continuity

IT disaster recovery consists of developing step-by-step procedures for full recovery, disaster avoidance and business continuity.

Many people equate DR with backup, but backup is only one piece of the BC-DR puzzle and is not sufficient on its own to keep business operations running in the event of a disaster.

Backup is not disaster recovery (DR), for the following reasons:

 

 Failure of backup software 

Service levels: backups typically run twice per day, which means that the RTO will be significantly higher and the RPO could amount to roughly 12 hours of data loss, which is not acceptable for critical applications in a DR context.

Reverse replication: in the event of an outage, once an application has been made available on the target site, its protection must be extended to cover the new data being created there. A backup solution cannot start taking backups and shipping them back to the production site, whereas a DR solution keeps the application protected by replicating back to the source site.

Application impact: backups run at night because copying an application and its data loads the server's CPU and significantly affects end-user productivity.
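The service-level point above can be checked with simple arithmetic. The following is a minimal sketch, assuming backups are evenly spaced over 24 hours; the function names are illustrative and not part of any backup product.

```python
from datetime import timedelta


def worst_case_rpo(backups_per_day: int) -> timedelta:
    """Worst-case data loss window when backups are evenly spaced."""
    if backups_per_day <= 0:
        raise ValueError("backups_per_day must be positive")
    return timedelta(hours=24) / backups_per_day


def meets_rpo(backups_per_day: int, rpo_target: timedelta) -> bool:
    """True if the backup schedule alone can satisfy the RPO target."""
    return worst_case_rpo(backups_per_day) <= rpo_target


# Two backups per day leave up to 12 hours of data at risk.
print(worst_case_rpo(2))                      # 12:00:00
print(meets_rpo(2, timedelta(minutes=15)))    # False
```

A schedule that fails this check for a critical application is a sign that replication, not a denser backup schedule, is needed.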

 

Every institution, large or small, should have both a backup mechanism and a disaster recovery solution in place; they are complementary pieces of the same puzzle.

Mitigation Measures for Some IT Hazards

DOWNTIME

Possible risk:
• Hardware
• Software

Mitigation measure:
• Redundancy
• Maintenance and upgrade of software

NETWORK

Possible risk:
• Unreliable network
• Loss of connectivity
• Traffic
• Misconfiguration

Mitigation measure:
• Design and monitor the network for maximum reliability
• Physical protection, redundancy or diverse paths
• Network segmentation
• Installation of firewalls to ensure security
• Load balancing (intelligent direction to backup site)
• Use automation to deploy changes; test all configurations in a lab environment before making changes on your production devices

DATA AND APPLICATION

Possible risk:
• File corruption
• Application downtime
• Malicious software

Mitigation measure:
• Data backup
• Mirroring of application, load balancing and replication
• Security management and installation of antivirus

EQUIPMENT FAILURES

Possible risk:
• Server failure
• Server overload
• Other hardware
• Old equipment

Mitigation measure:
• Redundant disks, backups, SAN/NAS
• Load balancer/monitoring/virtualization
• Regular maintenance
• Planning for upgrades and replacing out-of-date equipment

POWER

Possible risk:
• Power outage
• Equipment failure

Mitigation measure:
• Redundancy and backup power supply (UPS and generators)
• Monitoring and performing preventative maintenance regularly

ATTACKS

Possible risk:
• DDoS
• Viruses
• Hackers
• Other attacks

Mitigation measure:
• Managed security services/anti-DDoS
• Installation of antivirus
• Firewall and other security features
• Access control system

HUMAN ERROR

Possible risk:
• File deletion
• Unskilled people
• Fire

Mitigation measure:
• Regular backup
• Access management
• Training/staff certification requirements
• Fire detection system, fire extinguisher and fire hydrant


Factors Influencing a Successful IT Disaster Recovery

 A. INFRASTRUCTURE 

Infrastructure is a fundamental factor that affects and determines the outcome; its condition should be well known in terms of network connectivity, quality, performance, processing capability and scalability.

Considerations at the infrastructure layer:

 

Before any hosting or connectivity is set up, the required infrastructure, including the additional hardware and software needed specifically for recovery and replication, should be well defined, and single-purpose infrastructure should be avoided.

The same infrastructure on both sites (primary and alternative)

 Availability of maintenance facilities 

 

 B. RTO AND RPO MEASUREMENT 

RTO and RPO measurement should be based on a completed business impact analysis (BIA) that contains a classification of systems/assets and a BIA matrix (criticality and priority level).

For critical systems, RTO and RPO should be as close to zero as possible.
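One way to picture a BIA matrix driving RTO and RPO targets is sketched below. The criticality levels, the target values for the non-critical levels and the system names are all hypothetical; real values must come from the organization's own BIA. Only the near-zero target for critical systems is taken from the text.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical (RTO, RPO) targets per criticality level.
# Only the zero target for "critical" reflects the guidance above.
TARGETS = {
    "critical": (timedelta(0), timedelta(0)),
    "high": (timedelta(hours=4), timedelta(hours=1)),
    "low": (timedelta(hours=48), timedelta(hours=24)),
}


@dataclass(frozen=True)
class System:
    name: str
    criticality: str  # must be a key of TARGETS
    priority: int     # 1 = recover first


def recovery_plan(systems):
    """Return (name, rto, rpo) tuples sorted by recovery priority."""
    ordered = sorted(systems, key=lambda s: s.priority)
    return [(s.name, *TARGETS[s.criticality]) for s in ordered]


plan = recovery_plan([
    System("intranet", "low", 3),
    System("payments", "critical", 1),
    System("email", "high", 2),
])
for name, rto, rpo in plan:
    print(f"{name}: RTO={rto}, RPO={rpo}")
```

Keeping the targets in one table makes it easy to review them against the BIA whenever the classification changes.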

 C. REDUNDANCY AND BACKUP 

Backup and redundancy are both infrastructure and data protection methods, but neither can replace the other, and both should be applied at every layer.

Redundancy is a data and system protection method that acts as a real-time failure prevention measure.

Backup does not provide real-time protection, but restoring from it protects against greater loss.

Data and system backups should be taken regularly and kept offsite.

D. HIGH AVAILABILITY (HA)

HA is a disaster avoidance measure: the capability to switch automatically to an alternative site without any downtime.

 HA is achieved by applying: 

 

 Clustering (mirroring of critical applications) 

 Replication of clusters 

Load balancing in the network, which improves HA by arranging multiple servers to run simultaneously in primary and secondary order.

Redundancy should be implemented sufficiently at every layer (network, storage, etc.).
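The primary and secondary ordering mentioned for load balancing can be illustrated with a small selection routine. This is a sketch only: the `Server` type, its `healthy` flag (standing in for the result of a health check) and the server names are assumptions made for illustration, not the behavior of any particular load balancer.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Server:
    name: str
    order: int     # 1 = primary, 2 = secondary, ...
    healthy: bool  # assumed to come from a periodic health check


def pick_active(servers: Sequence[Server]) -> Optional[Server]:
    """Return the healthy server with the lowest order, or None if all are down."""
    candidates = [s for s in servers if s.healthy]
    return min(candidates, key=lambda s: s.order, default=None)


pool = [
    Server("primary-site", 1, healthy=False),
    Server("secondary-site", 2, healthy=True),
]
active = pick_active(pool)
print(active.name if active else "no healthy server")  # secondary-site
```

Because the choice is recomputed from current health, traffic returns to the primary as soon as it reports healthy again.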

 

 E. LEVEL OF DISASTER RECOVERY SITES 

 

 F. REPLICATION SOLUTION 

 Replication for disaster recovery (DR) is no longer a “nice to have” technology, but a necessary part of every disaster recovery solution. 

 Replication Mode 

 

G. VIRTUALIZATION

Virtualization is a software technique in which a single physical resource appears as multiple logical resources, which reduces data center complexity and improves restoration. With this solution there are fewer machines to manage. A server, including its operating system, applications and patches, is encapsulated into a single virtual server whose hardware is virtual and completely separated from the physical hardware of the host server. This separation and encapsulation allow redundancy and restoration, because a virtual server can be restored on another host if necessary.

H. SECURITY SYSTEM

Physical and cyber security systems should be established.

 Refer to Directive on cyber security for network and Information   

IT Disaster Recovery Strategies

 

 Figure 4: IT disaster recovery strategies 

IT disaster recovery strategies encompass recovery solutions at different layers.

 Disaster Recovery Phases 

 The main phases for responding to a disaster are: 

 

 Deficiency/damage notification 

 Analysis and evaluation 

 Response and control of disaster (crisis management). 

Site rehabilitation and return of the business to its normal operating level.

 Documentation / Plan activation/update.  
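The phases listed above can be modeled as an ordered sequence so that a response log or checklist always knows which phase comes next. The sketch below assumes the phases run strictly in the listed order, which the text implies but does not state; the enum and function names are illustrative.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    NOTIFICATION = 1    # deficiency/damage notification
    ANALYSIS = 2        # analysis and evaluation
    RESPONSE = 3        # response and control of disaster
    REHABILITATION = 4  # site rehabilitation, return to normal
    DOCUMENTATION = 5   # documentation / plan activation/update


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the last one."""
    if current is Phase.DOCUMENTATION:
        return None
    return Phase(current + 1)


phase: Optional[Phase] = Phase.NOTIFICATION
while phase is not None:
    print(phase.name)
    phase = next_phase(phase)
```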

 

To ensure the long-term viability and effectiveness of the Business Continuity Plan, the organization should regularly maintain, conduct and document a business continuity testing and training program:

 

Conduct a plan review at least quarterly

Conduct continuity awareness briefings or orientations for the entire workforce

Train personnel on all reconstitution plans and procedures and on the recovery process

Test and validate equipment monthly to ensure internal and external interoperability; test the viability of communications, alerts and notification systems

 Test primary and backup infrastructure systems and services at primary and secondary recovery sites
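The quarterly and monthly cadences in the list above lend themselves to a simple overdue check. The sketch below approximates a quarter as 91 days and a month as 31 days, which is an assumption; the activity names and the `overdue` function are illustrative.

```python
from datetime import date, timedelta

# Intervals from the list above, approximated in days (an assumption).
INTERVALS = {
    "plan_review": timedelta(days=91),     # at least quarterly
    "equipment_test": timedelta(days=31),  # monthly
}


def overdue(last_done: dict, today: date) -> list:
    """Return the names of activities whose interval has elapsed, sorted."""
    return sorted(
        name
        for name, interval in INTERVALS.items()
        if today - last_done[name] > interval
    )


last_done = {
    "plan_review": date(2024, 1, 1),
    "equipment_test": date(2024, 3, 1),
}
print(overdue(last_done, date(2024, 3, 15)))  # []
print(overdue(last_done, date(2024, 4, 15)))  # ['equipment_test', 'plan_review']
```

Recording the date each activity was last completed also provides the documentation trail that the testing and training program calls for.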