Master the Diving Catch: Storage Recovery Challenges & Prospects
Jon William Toigo
Independent Consultant and Author
Toigo Productions
Introduction and Welcome
• “Masters of the Diving Catch”
• Topics for Discussion
– Fundamentals of Fielding the Ball
– How Free Agency is Wrecking the Game
– The Importance of Spring Training
• Q&A
Then…
• Early DR planning and storage recovery
– Comparatively simple, “secretary-friendly”
– Bolt-on to existing applications and host platforms
– 1-for-1 replacement of mainframe and DASD
– 24- to 72-hour recovery timeframe
…and Now
• DR in the Internet Era
– 7x24x365
– Lots of players: heterogeneous storage platforms supporting heterogeneous client-server hosting configurations
– Faster, harder, and…
Fundamentals for Fielding the Ball
• Data is an irreplaceable asset
• Goal: Avoid preventable data disasters…
– Corruption of asset (security, virus protection, backups)
– Interrupted availability (fault tolerance, meshed links, effective monitoring and management)
• While minimizing the impact of events that simply can’t be prevented.
All Disaster Recovery Strategies Consist of…
- EITHER -
• Redundancy
– Duplicating assets on a “1-for-1” or consolidated basis
– Deploying redundant assets at a sufficiently distant location to avoid regional disaster events
- OR -
• Replacement
– Fielding new assets within recovery timeframe requirements
With Data, Redundancy is the Only Option…
• Restoring data from damaged media (time consuming…)
• Re-building data from original source materials (difficult or impossible…)
• Recovering storage requires copies of current or near-current data, a suitable data hosting platform, and a pre-planned strategy
Characteristics of Many Storage Environments Raise Challenges
• Data growth is poorly managed in most shops (lack of tools, lack of time, lack of open management standards, lack of strategic planning)
• Knee-jerk acquisitions of popular storage products lead to platform heterogeneity
Free Agency is Killing the Game… (Ask any Storage Vendor)
Why Effective Storage Recovery Requires a Diving Catch…
• 1-for-1 replacement is increasingly costly
• Consolidation strategies are more complex
– Cross-platform data re-hosting takes time
– Lack of software tools for data re-hosting requires use of tape as the medium
– Large (and growing) data volumes call the efficacy of tape into question in some settings
Cross-Platform Data Re-Hosting
[Figure: 1-for-1 storage platform restoral ($$$) from Production Environment to Recovery Environment]
Ideal Approach
[Figure: re-host data on a consolidated, minimum equipment configuration (e.g., large array or SAN) when moving from Production Environment to Recovery Environment]
Reality
• Software tools for mirroring (e.g., EMC SRDF) only work with same-type arrays
What about 3rd Party Volume Managers?
• Limited platform support
• Vicissitudes of Host-Based Mirroring
• Warranty hassles
The Continuing Need for Tape
• Tape provides the only reliable means for data re-hosting to arrays or zoned fabrics
[Figure: tape carries data from Production Environment to Recovery Environment, where it is restored to an array or, via an FC switch (FC SW), to a zoned fabric]
Ironies…
• “Tape is dead, and SANs have killed it.” (?!?)
– Tape accounted for nearly 75% of SAN deployments through 2000, and sharing tape continues to be a leading SAN deployment motivator
• “Tape is too slow for high-volume, mission-critical data. You need mirroring.” (?!?)
– Ignores tape automation, increasing capacity and speed of tape drives, and multi-stream capabilities
– Ignores nature of most databases: 80% static, 20% active
– Ignores cost of mirroring
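A back-of-the-envelope sketch of the 80/20 point: if only the active share of a database needs mirroring and the static share can be protected by tape, the mirrored capacity shrinks to that active fraction. The function name `split_protection`, the 10 TB example size, and treating the slide's 20% figure as a fixed default are illustrative assumptions; actual active/static ratios should be measured per database.

```python
def split_protection(total_tb: float, active_fraction: float = 0.20) -> tuple:
    """Split capacity into a mirrored (active) share and a tape-protected (static) share.

    Hypothetical helper. active_fraction defaults to the slide's
    80% static / 20% active rule of thumb; measure real workloads instead.
    Returns (mirrored_tb, tape_tb).
    """
    if total_tb < 0:
        raise ValueError("total_tb must be non-negative")
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    mirrored_tb = total_tb * active_fraction
    tape_tb = total_tb - mirrored_tb
    return mirrored_tb, tape_tb


if __name__ == "__main__":
    # Hypothetical 10 TB database: mirror the active part, back up the rest to tape.
    mirror, tape = split_protection(10.0)
    print(f"mirror {mirror:.1f} TB, tape {tape:.1f} TB")
```

Under these assumptions, mirroring only the active data means buying mirror capacity for a fifth of the database rather than all of it.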
Who’s the fairest one of all?
• LAN/SAN: Symmetrical mirror (local, short distance, low latency)
• MAN/WAN: Asymmetrical mirror (remote, 2nd process, data not synchronized)
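The difference between the two mirror types can be sketched as a toy model, assuming a symmetrical mirror acknowledges a write only after both copies are updated (synchronous) and an asymmetrical mirror acknowledges after the local write and ships changes later through a second process (asynchronous). The `Mirror` class and the latency figures are hypothetical; they illustrate the trade-off only and model no vendor's product.

```python
from collections import deque


class Mirror:
    """Toy two-site mirror (hypothetical, for illustration only).

    synchronous=True  -> symmetrical mirror: the application waits for the
                         local write plus the link round trip.
    synchronous=False -> asymmetrical mirror: the application waits only for
                         the local write; the remote copy catches up later.
    """

    def __init__(self, synchronous: bool, local_ms: float, round_trip_ms: float):
        self.synchronous = synchronous
        self.local_ms = local_ms
        self.round_trip_ms = round_trip_ms
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet shipped to the remote site

    def write(self, key: str, value: str) -> float:
        """Apply a write and return the latency (ms) the application sees."""
        self.primary[key] = value
        if self.synchronous:
            self.secondary[key] = value
            return self.local_ms + self.round_trip_ms
        self.pending.append((key, value))
        return self.local_ms

    def drain(self) -> None:
        """Ship queued writes to the remote copy (the 'second process')."""
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value

    def in_sync(self) -> bool:
        return self.primary == self.secondary


if __name__ == "__main__":
    local = Mirror(synchronous=True, local_ms=0.5, round_trip_ms=2.0)
    remote = Mirror(synchronous=False, local_ms=0.5, round_trip_ms=40.0)
    print("symmetrical write latency:", local.write("rec1", "v1"), "ms")
    print("asymmetrical write latency:", remote.write("rec1", "v1"), "ms")
    print("remote copy in sync before drain:", remote.in_sync())
    remote.drain()
    print("remote copy in sync after drain:", remote.in_sync())
```

Until `drain()` runs, the asymmetrical mirror's remote copy lags the primary; whatever is still queued is the data a site loss would cost.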
Pros and Cons of Mirroring
• PRO
– Fast recovery of data access
– Less vulnerability to outage
– Demonstrated track record
– Adjustable to recovery requirements
– New technologies (e.g., Wavelength Division Multiplexing) reducing cost of WAN/MAN interconnect
• CON
– Another process to monitor
– Vendor lock-in because of platform-specific mirroring software
– Three-tier configuration required to avoid latency in production applications
– High-cost solution suited only to most extreme data recovery requirements
Pros and Cons of Tape
• PRO
– Media price lower than disk
– Well-designed strategies can be optimized for time-to-data
– Improving media management capabilities
– Low-latency solutions can be designed
– Multi-streaming makes multi-TB restores feasible
• CON
– Tape subject to wear and exposed to damage in transit (tape vaulting a potential solution)
– Potential conflicts with virtualization engines
– Disk prices are falling, capacities growing
– On-line data is “better” than near-line or off-line data
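Why multi-streaming matters comes down to simple arithmetic: restore time is data volume divided by aggregate throughput, and aggregate throughput is capped by the slowest point in the restore path. A minimal sketch, assuming throughput scales linearly across streams and ignoring tape mount and positioning time; the function name `restore_hours` and all throughput figures are hypothetical.

```python
from typing import Optional


def restore_hours(data_tb: float, streams: int, stream_mb_s: float,
                  path_cap_mb_s: Optional[float] = None) -> float:
    """Estimate wall-clock hours to restore data_tb terabytes from tape.

    Hypothetical estimator. Assumes (optimistically) linear scaling with the
    number of parallel streams, decimal units (1 TB = 1,000,000 MB), and no
    mount, positioning, or catalog lookup time. path_cap_mb_s models the
    slowest point downstream (switch, host bus, virtualization engine).
    """
    if data_tb < 0 or streams < 1 or stream_mb_s <= 0:
        raise ValueError("need data_tb >= 0, streams >= 1, stream_mb_s > 0")
    if path_cap_mb_s is not None and path_cap_mb_s <= 0:
        raise ValueError("path_cap_mb_s must be positive")
    throughput_mb_s = streams * stream_mb_s
    if path_cap_mb_s is not None:
        # Adding streams cannot push data faster than the bottleneck allows.
        throughput_mb_s = min(throughput_mb_s, path_cap_mb_s)
    seconds = data_tb * 1_000_000 / throughput_mb_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical figures: 2 TB restored at 30 MB/s per tape stream.
    for n in (1, 4, 8):
        print(f"{n} stream(s): {restore_hours(2.0, n, 30.0):.1f} h")
    # Same 8 streams behind a hypothetical 40 MB/s bottleneck.
    print(f"8 streams, 40 MB/s cap: {restore_hours(2.0, 8, 30.0, path_cap_mb_s=40.0):.1f} h")
```

The cap parameter is the design point: extra streams only help until some other component in the path becomes the choke point.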
Facts are Facts…
• Most IT architects are embracing HSM
– Enabled by evolving infrastructure view of storage
– Leveraging cheaper disk platforms (IDE/ATA drives), tape, or optical for “near-line” configurations
– “Content Networking”
• Problems with fault tolerance through load balancing in storage fabrics remain: no silver bullets
• Server-free backup remains a holy grail
– Where does the metadata go?
– Lack of granularity in “bare metal” backups
The Rise of Near-Line
[Figure: LAN/SAN-attached storage tiers]
• SCSI/FC disk array for on-line/active storage
• IDE/ATA platforms or tape (or optical) for near-line/static storage
• Select hosting platform based on data characteristics and cost criteria
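The selection rule on this slide can be sketched as a simple placement policy: active data stays on the on-line tier, and static data goes to the cheapest near-line tier whose time-to-data is still acceptable. The `Tier` fields, the 30-day activity window, and the cost and latency figures are hypothetical placeholders for a shop's own measurements, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    relative_cost: float   # hypothetical cost units per GB; used only for ranking
    time_to_data_s: float  # hypothetical typical time to first byte, in seconds


# Hypothetical tiers following the slide; replace with measured figures.
ONLINE = Tier("SCSI/FC disk array", relative_cost=10.0, time_to_data_s=0.01)
NEAR_LINE = [
    Tier("IDE/ATA disk", relative_cost=3.0, time_to_data_s=0.02),
    Tier("Tape (or optical) library", relative_cost=1.0, time_to_data_s=120.0),
]


def select_tier(days_since_access: int, max_time_to_data_s: float,
                active_window_days: int = 30) -> Tier:
    """Pick a hosting tier from data characteristics and cost (hypothetical policy)."""
    # Active data (touched within the window) stays on on-line storage.
    if days_since_access <= active_window_days:
        return ONLINE
    # Static data: cheapest near-line tier that still meets the access requirement.
    eligible = [t for t in NEAR_LINE if t.time_to_data_s <= max_time_to_data_s]
    if not eligible:
        # Static but latency-sensitive data has nowhere cheaper to go.
        return ONLINE
    return min(eligible, key=lambda t: t.relative_cost)
```

With these placeholder figures, a file untouched for 200 days that must be readable within a second lands on IDE/ATA disk; relax the requirement to an hour and it moves to tape.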
The Trouble with Load Balancing
[Figure: User → Load Balancers → Storage Servers → Fabric Switch → Data Storage, zoned as Zone A (User Files), Zone B (E-mail), and Zone C (Database)]
Big Issue: Potential Choke Point in Tape-based Data Restore
[Figure: Storage Servers and Data Storage behind a “Virtualization” Engine presenting “Virtual Volumes”]
• Backups (reads) from “virtual volumes” to tape: OK
• Restores (writes) from tape to “virtual volumes”: choked by “virtualization” software
Tape and “Virtualization”
• Old software RAID write penalty…again
• 100+ hours to restore 1 TB of files to a virtualized environment
• Concordance of backup & restore software and “virtualization engine” (LUN aggregation software) must be tested and verified
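Two pieces of arithmetic help size this problem. First, 100 hours per terabyte implies an effective restore rate below 3 MB/s. Second, assuming the virtualization layer behaves like software parity RAID (RAID-5), every small host write costs four back-end I/Os (read old data, read old parity, write new data, write new parity), while mirroring (RAID-1) costs two. The function names and the 1,000 IOPS figure are hypothetical.

```python
# Back-end I/Os per small host write (standard RAID rule of thumb).
WRITE_PENALTY = {"RAID-1": 2, "RAID-5": 4}


def implied_mb_s(data_tb: float, hours: float) -> float:
    """Effective throughput implied by restoring data_tb terabytes in `hours` (decimal units)."""
    return data_tb * 1_000_000 / (hours * 3600)


def effective_write_iops(backend_iops: float, raid_level: str = "RAID-5") -> float:
    """Host-visible small-write IOPS after the RAID write penalty."""
    return backend_iops / WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    print(f"1 TB in 100 h = {implied_mb_s(1.0, 100.0):.2f} MB/s")
    # Hypothetical array able to sustain 1,000 back-end IOPS.
    for level in WRITE_PENALTY:
        print(f"{level}: {effective_write_iops(1000.0, level):.0f} write IOPS")
```

Because a restore is almost entirely writes, a layer that pays this penalty in software hurts restores far more than backups, which only read.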
Get to Spring Training Camp
• DR landscape complicated by burgeoning data and new technologies
“When you come to a fork in the road, take it.”*
• Proactive strategies required
“You can’t think and hit at the same time.”*
• Need for frequent and thorough testing underscored
“You can observe a lot by watching.”*
* Storage Recovery a la Yogi Berra
The Game goes on
• True storage networks coming soon to a theater near you…
• DataCore and FalconStor pioneering platform-agnostic data re-hosting…
• Work on standards-based management continues…
– De facto (EMC)
– Open (CIMOM)
Tips for Staying in the Game
• Make recoverability a key criterion/consideration when selecting components, designing applications, architecting infrastructure, etc.
• Filter through the market hype by becoming knowledgeable about technology and its limitations
– SearchStorage.com
– Drplanning.org
• Join and attend a DR user group: learn from peers
And remember Yogi Berra’s greatest line:
“Baseball (like storage recovery) is 90% mental. The other half is physical.”
And look for Disaster Recovery Planning 3/e
and The Holy Grail of Networked Storage Management
Coming from Prentice Hall PTR in Summer 2002
For further information…