Disaster Recovery Experts Speak Out (continued) 6
THE DISCUSSION (continued)
Redundancy
thread by Dee (07-Nov-01 7:24 PM GMT)
What's the opinion on having redundant equipment or fail-over systems or mirrored data storage at another site (not the primary data center) and building a disaster recovery plan around that and possibly using a hot site or some other recovery strategy if the primary data center is destroyed or rendered unavailable?
Re: Redundancy
by Rick Weaver (07-Nov-01 9:30 PM GMT)
There are several offerings that support 'remote site mirroring', which is what you are suggesting. There are pros and cons to each solution, and a few caveats to be weighed when making a decision. Basically, there are five cost components to remote site mirroring: the Hardware (redundant storage), the Software (to drive the mirroring), the Network, Facilities (to house the remote mirror), and Operations (to handle the care and feeding of the mirror). At time of disaster, you can then execute a 'Disaster Restart' (as opposed to a 'Disaster Recovery'), and theoretically you are back in business. Other considerations of remote site mirroring include the distance to the remote mirror (which will impact Network cost and Production Performance), the Integrity of the data at the remote site after restart, and the amount of time it takes to cable the remote disk to a processor. To address the 'distance sensitivity' issue, most vendors offer an 'asynchronous' form of mirroring, so there is some data loss with these types of solutions (although it can be measured in minutes or hours instead of days).
A lot of the businesses impacted by the World Trade Center tragedy were in fact using some form of remote mirroring, and had minimal data loss or disruption due to the loss of those buildings. The bigger issue has turned out to be work area and office space for the displaced employees.
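To put rough numbers on the distance trade-off described above, here is a minimal sketch. It assumes signals travel through fiber at about 200,000 km/s and counts propagation delay only; the function names, the write rate, and the replication lag are hypothetical inputs, not figures from any vendor.

```python
# Rough estimate of what distance costs a mirrored configuration.
# Assumption: light in fiber covers about 200,000 km per second,
# so every kilometer adds roughly 5 microseconds one way.
FIBER_KM_PER_SECOND = 200_000.0


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Minimum latency added to each synchronous write, in milliseconds.

    Counts propagation delay only; switches, protocol overhead and the
    remote disk itself would add more.
    """
    one_way_seconds = distance_km / FIBER_KM_PER_SECOND
    return 2 * one_way_seconds * round_trips * 1000.0


def async_exposure_mb(write_rate_mb_per_s: float, lag_seconds: float) -> float:
    """Data not yet at the remote site if disaster strikes at a given lag."""
    return write_rate_mb_per_s * lag_seconds


if __name__ == "__main__":
    for km in (10, 100, 1000, 4000):
        penalty = sync_write_penalty_ms(km)
        print(f"{km:>5} km: at least +{penalty:.2f} ms per synchronous write")
    # Hypothetical workload: 5 MB/s of writes, mirror running 300 s behind.
    print(f"Async exposure: {async_exposure_mb(5, 300):.0f} MB")
```

The point of the sketch is only the shape of the trade-off: synchronous mirroring pays a per-write penalty that grows with distance, while asynchronous mirroring avoids that penalty and instead leaves a window of unreplicated data.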
I know we're not supposed to just post our web site, and I don't want to get in trouble with Esther etc., but you might want to check http://www.bmc.com/products/whitepapers.cfm and look at 'Remote Recovery - Advanced Technology Solutions for OS/390'. There are analogous solutions for Distributed, but I haven't written that paper yet. Any questions feel free to chat back, or e-mail me at [email protected].
Regards, Rick Weaver, Product Manager, Recovery and Storage Management, BMC Software
Re: Redundancy
by Boris Geller (08-Nov-01 0:07 AM GMT)
Actually, integrating HA failover with network-based data mirroring can be the "best of both worlds".
FYI - SteelEye just announced our Disaster Recovery Solution that does precisely that for Windows and Linux-based servers. Visit www.steeleye.com for additional info.
Regards, Boris.
Re: Redundancy
by Martin Garvey (08-Nov-01 1:00 AM GMT)
For true business continuity, it's a must. Online mirroring can be cost-prohibitive. But I just wrote a story on a very well-known company with a history of downtime that now has hot backup with between 5 minutes and a half hour. Brokerage houses lose millions per minute, but few companies are like that, so the IP network can work for most companies' geographic dispersion plans. With one location in New York, have another in Texas. The number one danger is the mighty weather. The key is to pick two or more places that won't ever be hit by the same damaging storm. You still have to start with the right OS, and check analyst reports on those availability numbers. But from there, business continuity is becoming more affordable. And far-off locations are a must.
Re: Redundancy
by Jason Buffington (08-Nov-01 3:32 AM GMT)
I think there is no doubt that a separate data facility, using replication that is as close to real time as possible, is critical to any Disaster Recovery plan. A couple of thoughts:
1) Recognize that a D/R plan does not end with the data or the servers. If the plan does not include the people, there will be no one sitting at the consoles of those newly shipped PCs and redundant servers. So, one key to successful Disaster Recovery is that the redundant facility must be far enough away that it is not susceptible to the same outage (e.g., power grid failure, hurricane, flood) but close enough that the key people can get there if necessary. This will also help with routine D/R testing. For example (Dee), I might not want to go from Novi to Farmington Hills, but South Bend to Indianapolis would be viable.
2) When you do a value justification for any D/R initiative, remember the two simple metrics that apply (RTO and RPO). RTO (Recovery Time Objective) is how long until you are up again. RPO (Recovery Point Objective) is how current the data is once you are up. D/R budgets should really be based on how much money you can afford to lose, based on those metrics. For example, if D/R is strictly restoring from tape and the outage is at 4PM, what is the $$$ of lost productivity for 7 business hours? (Hint: Gartner estimated $39/hour/person.)
3) Acknowledge that D/R plans without testing are no better than doing backups without ever trying to restore. Don't be surprised if you have "blank tapes". So, look for solutions that allow access to the replicated or redundant data WITHOUT invoking an outage, so that the D/R solution can be tested easily.
4) Lastly, know the difference between Disaster Recovery (keeping the data survivable) and High Availability (keeping the users productive), so that user and management expectations are set correctly. For example, "failover" or high availability across the country is probably non-viable for most user applications, although it is a great D/R scenario.
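The arithmetic behind point 2 can be sketched as follows. The $39 per person per hour and the 7 business hours come from the post; the headcount of 500 is a made-up input, and treating the RPO gap as hours of work to be redone is a simplifying assumption rather than anything stated above.

```python
# Back-of-the-envelope lost-productivity estimate for a D/R value justification.
# $39/hour/person is the figure quoted in the post; everything else is an input.
COST_PER_PERSON_HOUR = 39.0


def lost_productivity(headcount: int,
                      rto_hours: float,
                      rpo_hours: float = 0.0,
                      hourly_cost: float = COST_PER_PERSON_HOUR) -> float:
    """Dollar cost of an outage.

    rto_hours: business hours until systems are back up.
    rpo_hours: hours of work that must be redone because the restored data
               is that far behind (an assumption of this sketch).
    """
    return headcount * hourly_cost * (rto_hours + rpo_hours)


if __name__ == "__main__":
    # Hypothetical company of 500 people, tape restore taking 7 business hours.
    print(f"${lost_productivity(500, rto_hours=7):,.0f}")
```

Comparing that figure for a tape-only plan against the same figure for a replicated site gives a first-cut ceiling on what the replication is worth per incident.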
GOOD LUCK
Re: Redundancy
by Wayne Lam (09-Nov-01 1:04 AM GMT)
Excellent idea, and I can cite one of FalconStor's case studies: Bell Micros has a primary data center in Montgomery, AL and a backup/DR center in San Jose, CA, connected by three T1 lines. Both sites are identical and are kept synchronized using FalconStor's IPStor product. This is in fact the industry's first known "coast-to-coast IP SAN" (running since August 2001), despite claims of "first" by others.
Please email me if you want a copy of this case study, or simply go to FalconStor's web site.
Re: Redundancy
by Rick Weaver (09-Nov-01 4:01 PM GMT)
One very important thing to remember with any of the hardware- or software-based replication solutions for Disaster Recovery: well over 80% of all recoveries are due to logic or user errors that corrupt the data. A replication facility will not prevent or correct such an error. You have to be able to recover/restore the data from a good backup and apply logs (if a database) to recover to a point of consistency, or programmatically 'correct' the data (assuming you can do that and ensure data integrity).
It's also prudent to have a 'Plan B' in case the replication facility fails (and there are a lot of components involved).
Rick
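The difference between a mirror and a backup-plus-logs recovery can be shown with a toy model. This is only an illustration: the key/value "database", the LogRecord type, and restore_to_point are hypothetical and do not correspond to any product's recovery interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    timestamp: int  # seconds on some common clock
    key: str
    value: str


def restore_to_point(backup: Dict[str, str],
                     log: List[LogRecord],
                     stop_before: int) -> Dict[str, str]:
    """Copy the backup, then replay log records older than stop_before."""
    state = dict(backup)  # never modify the backup itself
    for record in sorted(log, key=lambda r: r.timestamp):
        if record.timestamp >= stop_before:
            break
        state[record.key] = record.value
    return state


backup = {"balance": "100"}
log = [
    LogRecord(10, "balance", "120"),
    LogRecord(20, "balance", "150"),
    LogRecord(30, "balance", "GARBAGE"),  # the user or logic error
]

# A mirror applies every write, including the bad one.
mirror = restore_to_point(backup, log, stop_before=10**9)
# Recovery stops just before the error, at the last consistent point.
recovered = restore_to_point(backup, log, stop_before=30)

if __name__ == "__main__":
    print("mirror:   ", mirror)
    print("recovered:", recovered)
```

The mirror ends up holding the corrupted value because it faithfully copies every write, while the restore from backup with log replay can stop at a point of consistency before the error.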
Re: Redundancy
by KATHRYN (09-Nov-01 8:17 AM GMT)
Whom does one hire to establish perfect backup for a region's complete infrastructure, including police and hospitals? My client wants only the best and has ordered a SONET OC-12 channel.