storage over distance network design (cisco - 2003)
TRANSCRIPT
Copyright © 2003, Cisco Systems, Inc. All rights reserved. Printed in USA.OPT-2052 8224_06_2003_X2
Storage over Distance: Network Design
Session OPT-2052
Agenda
• Why SAN Extension? Business Continuity, RPO/RTO Overview
• Database I/O Overview
• Backup and Archive Overview
• Replication and Mirroring Background
• Host-Based Mirroring
• Storage-Based Replication
• SAN Extension Alternatives
• Optical SAN Extension: Fiber, CWDM, DWDM
• FCIP SAN Extension
• Fabric Configuration and Flow Control for Extended SANs
• High Availability
• QoS
• Wrap Up
Changing Requirements and Attitudes
• The US Federal Reserve (FED), Securities and Exchange Commission (SEC), and Office of the Comptroller of the Currency (OCC) issued an interagency paper* on April 7, 2003; it identified:
–Three business continuity objectives for financial firms
Rapid recovery of operations
–Four sound practices to ensure resilience of the U.S. financial system
Maintain sufficient geographically dispersed resources to meet recovery and resumption objectives
• Regulations expected to spread to other industries
–Higher insurance premiums for companies without DR processes and plans
*FED Docket No. R-1128; OCC [Docket No. 03-05]; SEC [Release No. 34-7638; File No. S7-32-02]
Disaster Recovery and Business Continuity Options
1. Offsite Tape Backup: Send tapes offsite by truck daily
2. Electronic Vaulting: Use WAN instead of truck
3. Remote Disk Replication: Continuous updates, zero or near zero data loss
4. Cold Site: Take tapes to another site
5. Duplicated Hot Site: Ready to take over from primary site
Impact of a Disaster
• Disasters are characterized by their impact
Local, metro, regional, global
Fire, flood, earthquake, attack
• Is the backup site within the threat radius?
[Figure: Primary Data Center, Secondary Data Center, and DR Site shown against threat radii. Local: 1-2 km; Metro: < 50 km; Regional: < 400 km; Global]
Recovery Time
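[Figure: timeline. Disaster strikes at time t1; systems are recovered and operational at time t2; the interval between them is the recovery time. Scale: secs, mins, hours, days, weeks, with extended cluster, manual migration, and tape restore placed along it. Cost ($$$) increases as recovery time shortens.]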
Recovery Point
• How current or fresh is the data after recovery?
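[Figure: timeline. Recovery point at time t0; disaster strikes at time t1; recovery time follows. Scale: secs, mins, hours, days, with synchronous replication, asynchronous replication, periodic replication, and tape backup placed along it. Cost ($$$) increases as the recovery point gets more recent.]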
Example RPOs and RTOs
Application         Recovery Point Objective (RPO)   Recovery Time Objective (RTO)
Print Server        Weeks                            Days
Web Server          Weeks                            Seconds
HR Database         Hours                            Hours
Customer Database   Seconds                          Seconds
Determining Enterprise Policy
• Recovery Point
What is the cost and impact of data loss?
How much data loss is tolerable in the event of disaster or failure?
• Recovery Time
What’s the maximum tolerable outage?
When must operations resume after a disaster?
Establishing these criteria will provide measurable targets in preparing the BC/DR plans, and designing the underlying Data Center, Application, Storage, and Network Infrastructure
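One way to turn the policy into something checkable is a small lookup from RPO to a candidate protection technique. The sketch below is illustrative only: the mapping is an assumption read off the Recovery Point scale (secs, mins, hours, days), the "weeks" entry is added for completeness, and the names `TECHNIQUE_BY_RPO` and `technique_for` are hypothetical.

```python
# Assumed mapping from RPO granularity to a candidate technique.
# Read off the Recovery Point scale; not vendor guidance.
TECHNIQUE_BY_RPO = {
    "seconds": "synchronous replication",
    "minutes": "asynchronous replication",
    "hours": "periodic replication",
    "days": "tape backup",
    "weeks": "tape backup",  # assumption: beyond the scale shown
}


def technique_for(rpo: str) -> str:
    """Return the candidate technique for a given RPO granularity."""
    return TECHNIQUE_BY_RPO[rpo.lower()]


# Example RPOs taken from the "Example RPOs and RTOs" table.
applications = {
    "Customer Database": "Seconds",
    "HR Database": "Hours",
    "Web Server": "Weeks",
    "Print Server": "Weeks",
}

plan = {app: technique_for(rpo) for app, rpo in applications.items()}
for app, technique in plan.items():
    print(f"{app}: {technique}")
```

A real policy would also weigh RTO and cost per application; this only shows the shape of the decision.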
Database I/O
• Databases try to minimize disk I/O
Disk is slow—defer writes until necessary
Cache as much as possible
All committed and uncommitted changes written sequentially to a “Redo Log”
Datafile contains database tables—can be out of step
Changes batched up
• An example using Oracle…
(Microsoft SQL Server is similar)
Databases and Replication
• “Typically” only Redo Logs replicated to remote site
• Archived Redo Logs copied when Redo log switches
• Point in Time (PiT) copies of Datafiles and Control Files copied periodically (e.g. nightly)
[Figure: Primary Site and Secondary Site. Redo Logs (cycling) and Archived Redo Logs are replicated; Point in Time copies of Datafiles and Control Files are copied periodically.]
Recovering from Localized Failures
• Database restored to state at time of failure (time t1) by:
1. Restoring Control Files and Datafiles from last Hot Backup (time t0)
2. Sequentially replaying changes from subsequent Redo Logs (archived and online), i.e. the changes made between time t0 and t1
Failure or disaster occurs at time t1
• Media Failure (e.g. disk)
• Human Error (datafile deletion)
• Database Corruption
[Figure: timeline. Hot Backup of Datafiles and Control Files taken at time t0; Archived Redo Logs followed by Online Redo Logs cover the interval up to time t1.]
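The two recovery steps can be sketched as a restore followed by an ordered replay. This is a toy model, not Oracle's recovery logic: `RedoRecord`, its fields, and `recover` are hypothetical names, and the sketch ignores the rollback of uncommitted changes that a real database performs after replay.

```python
from dataclasses import dataclass


@dataclass
class RedoRecord:
    seq: int    # position in the redo stream (hypothetical field)
    key: str    # row or block identifier
    value: str  # new contents


def recover(hot_backup: dict, backup_seq: int, redo_logs: list) -> dict:
    """Restore the backup taken at t0, then replay later redo records in order."""
    datafile = dict(hot_backup)  # step 1: restore from the hot backup
    for rec in sorted(redo_logs, key=lambda r: r.seq):
        if rec.seq > backup_seq:  # step 2: replay only changes made after t0
            datafile[rec.key] = rec.value
    return datafile


backup = {"acct:1": "100", "acct:2": "50"}  # state at time t0 (sequence 10)
logs = [
    RedoRecord(12, "acct:2", "75"),
    RedoRecord(9, "acct:1", "90"),   # older than the backup, so skipped
    RedoRecord(11, "acct:1", "120"),
    RedoRecord(13, "acct:3", "10"),
]
restored = recover(backup, backup_seq=10, redo_logs=logs)
print(restored)
```

The point of the sketch is that the redo stream, not the datafile copy, determines how current the recovered database is.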
What Is Backup?
• It’s “data protection”
To enable resumption of service after disaster or failure (incl. data corruption)
To enable data movement
Compliance with regulatory data retention requirements
• Don’t forget Restore time
How long to restore from (full/incremental) backup?
Short Backup => Long Restore
Long Backup => Short Restore
Backup ≠ Replication/Mirroring
• Data is consistent across all disks
• Replication and Mirroring do not protect against
Data corruption
Accidental deletion
• Backups provide something to revert to
[Figure: a Mirrored Volume (RAID1) at one site replicated to a Mirrored Volume (RAID1) at the other site]
Types of Backups
• Physical
Disk image/raw disk partition
Fast, but copies everything
e.g. EMC Timefinder BCV (Business Continuance Volume)
• Logical
Knows structure and content of data, so can back up specific entities
Slower in raw speed, but can be faster than physical by only backing up changes (incremental)
Database Backup Types
• Online—logical backup
Backup while DB running (for 24x7 operation) e.g. Oracle RMAN utility
Backup specific DB tablespaces
But,…DB performance degraded during backup process
• Offline—logical backup
Backup while DB quiescent (no transactions)
• Raw Device—physical backup
Backup of disk partition
Replication Objectives
• Get the data to a recovery site
Maximize the “currency”
How much can you afford to lose? (RPO)
• Enable rapid restoration
Usable format at the other site
Replication and Mirroring Alternatives
• Disk Replication
Transparent to host
Managed by disk subsystem
e.g. EMC SRDF, HP DRM, HDS Truecopy
Or 3rd party volume replicator
e.g. Veritas Volume Replicator
• Host-Based Mirroring
Host volume manager duplicates writes
e.g. Windows LDM, Veritas Volume Manager
Host-Based Mirroring—A Look Inside
• Volume Management Software aggregates Virtual Disks and presents them as a single volume
• File system is mirrored identically on each virtual disk
• Writes are duplicated for each mirror
Write is only complete when all acknowledgements returned
• Reads from either disk
Round-robin or select a preferred disk
[Figure: Host Server stack. App1, App2, App3 over File System, Volume Manager, and Disk Driver; two HBAs lead to Virtual Disk 1 and Virtual Disk 2, which form Volume xyz. Duplicated writes, single read over the Preferred Path.]
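The write and read rules above can be sketched with an in-memory model. This is a minimal illustration, not a volume manager: `Disk` and `MirroredVolume` are hypothetical classes, and a real implementation issues the mirror writes in parallel and handles failures.

```python
class Disk:
    """Toy virtual disk that stores blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.reads = 0

    def write(self, lba, data):
        self.blocks[lba] = data
        return True  # acknowledgement

    def read(self, lba):
        self.reads += 1
        return self.blocks[lba]


class MirroredVolume:
    """Duplicates every write to all mirrors; reads from one preferred mirror."""

    def __init__(self, mirrors, preferred=0):
        self.mirrors = mirrors
        self.preferred = preferred

    def write(self, lba, data):
        # The write completes only when every mirror has acknowledged it.
        acks = [mirror.write(lba, data) for mirror in self.mirrors]
        return all(acks)

    def read(self, lba):
        # Preferred-disk policy: never touch the other mirror for reads.
        return self.mirrors[self.preferred].read(lba)


local, remote = Disk("local"), Disk("remote")
volume = MirroredVolume([local, remote], preferred=0)
ok = volume.write(12345, b"payload")
data = volume.read(12345)
print(ok, data, local.reads, remote.reads)
```

Because the write waits on every mirror, its completion time is set by the slowest (typically the most distant) mirror, while reads stay local.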
Why Host-Based Mirroring?
• Simple Method of achieving RAID 1 protection
• Can exploit heterogeneous disks
e.g. Split Mirror across Compaq MA8000 and a JBOD
• Can separate the mirrors over distance
Locate second mirror at alternate site for disaster tolerance
Synchronous nature of mirroring will limit distance due to latency
Extension alternatives:
FCIP—extend Fibre Channel SAN to remote site
iSCSI—locate iSCSI target at remote site
Host-Based Mirroring: Remote iSCSI
• e.g. Supplement an existing Fibre Channel HBA attached host server
• Employ iSCSI over existing inter-site IP network
iSCSI driver on Host Server
Volume Manager on Host Server, e.g. Windows LDM or Veritas Volume Manager
• HA through redundant links, routers and clustered Storage Routers
• QoS: LLQ to minimize queuing latency on WAN link
[Figure: host at the Primary Site with a Local Mirror on FC; Remote Mirror on FC at the Disaster Recovery Site behind an SN5428 Storage Router or MDS9000 with IPS-8, reached by iSCSI over the IP Network]
Host-Based Mirroring: Local and Remote iSCSI
• iSCSI used for local and remote mirrors
• Each Mirror appears to iSCSI as a separate target
Can use single or multiple NICs
HA Cluster at each site to enhance availability
Volume Manager multipathing to further enhance availability
• HA through multiple links, routers, switches, and dual Storage Routers at each site
[Figure: Primary Site and Disaster Recovery Site, each with FC storage behind an SN5428 or MDS9000 with IPS-8, joined by iSCSI over the IP Network]
Host-Based Mirroring: FCIP (7200 PA)
• Fibre Channel SAN extended between sites with FCIP tunnel
• Storage visible from either site (subject to zoning)
Volumes defined and managed as per normal FC rules
• HA through dual FCIP tunnel over redundant links and routers
[Figure: FC SAN at each site attached to a 7200/7400 with FCIP PA; FCIP tunnels between the routers across the IP Network]
I/O Traffic with Remote Mirrors
• Select Local Mirror as “Preferred”—don’t use “Round-Robin!”
Host will read only from local “preferred” mirror
• Read performance is independent of distance
• Write replicated to all mirrors
• Write I/O does not complete until writes acknowledged from all mirrors
• Write performance degrades with distance/latency
• Write may consist of 1 or 2 round trips depending upon use of iSCSI R2T (Ready to Transfer)
[Figure: host at the Primary Site with a Local Mirror on FC and a Remote Mirror on FC at the Disaster Recovery Site, reached by iSCSI over the IP Network. Replicated writes go to both mirrors; write acknowledgements return SCSI Status=good; reads come from the local preferred mirror.]
Application of Host-Based Mirroring
• Requirement for “no data loss” Recovery Point and low Recovery Time in case of site failure or regional disaster
• Heterogeneous Disk Arrays located at each site or vendor Array based replication deemed too expensive
Host-Based Mirroring: Best Practices
• Sites ideally located <100km (60mi) apart (but balance this with the “threat radius”)
Shorter distance/latency will enable better performance, but sites too close may be affected by the same failure/disaster
One or two round trips per write (iSCSI R2T required?) => 10 or 20µs/km additional latency
• Application has low write ratio
SQL Server and Oracle OLTP apps are typically 80% read and 20% write, but very sensitive to latency
• Inter-site network is:
Highly available: redundant links
Provisioned sufficiently for application traffic (including occasional resynchronization)
QoS policies applied and enforced if over shared links
[Figure: Site 1 and Site 2, each with FC storage, joined across the IP Network; Distance?]
Basic Operation
• Two arrays located on extended Fibre Channel Fabric
• Read only from local array
• Write I/Os replicated to remote array
Replication managed by software in storage arrays
Host server is unaware of replication
Implementations are proprietary
SRDF, Truecopy, DRM,…
[Figure: Host Server and Local Storage Array connected across the Fibre Channel Fabric to the Remote Storage Array. Writes replicated to remote target array synchronously or asynchronously.]
Modes of Operation
• Synchronous—all data written to controller cache of initiator and target arrays before I/O is complete and acknowledged to host
• Asynchronous—write acknowledged after write to initiator cache; write is replicated to target controller asynchronously
Synchronous Replication: I/O Detail
[Figure: sequence diagram between Host Server, Controller in Local Storage Array (Initiator), and Controller in Remote Storage Array (Target). Example: HP DRM]
Host Server to local controller: Write, LUN=5, LBA=12345, DL=8kB; Transfer Ready; FCP Data (2kB frames); last frame of sequence (E_S=1)
Local controller to remote controller (two round trips): Write, LUN=5, LBA=12345, DL=8kB; Transfer Ready; FCP Data (2kB frames); SCSI Status=good
Local controller to Host Server: SCSI Status=good
Response from local controller (initiator) returned after remote (target) responds
Replication to Target does not begin until last FCP Data frame in sequence received from host
I/O Service Time covers the host write plus both inter-array round trips
Asynchronous Replication: I/O Detail
[Figure: sequence diagram between Host Server, Controller in Local Storage Array (Initiator), and Controller in Remote Storage Array (Target). Example: HP DRM]
Host Server to local controller: Write, LUN=5, LBA=12345, DL=8kB; Transfer Ready; FCP Data (2kB frames); last frame of sequence (E_S=1)
Local controller to Host Server: SCSI Status=good
Local controller to remote controller (two round trips): Write, LUN=5, LBA=12345, DL=8kB; Transfer Ready; FCP Data (2kB frames); SCSI Status=good
Response from local controller (initiator) returned independently of replication
I/O Service Time covers only the exchange between host and local controller
Other Implementations
• EMC SRDF over Fibre Channel operation is “similar” to HP DRM, but…
Uses proprietary commands (with proprietary CDB)
Replicates slightly more than original I/O (e.g. 4kB write → 4kB to 8kB replication)
• 2x Round Trips between source and destination arrays per write I/O
2 x 2 x 5µs/km = 20µs/km additional latency
EMC SRDF = Symmetrix Remote Data Facility
HP DRM = Data Replication Manager
Latency and Synchronous Replication
• Two Round Trips between source and destination arrays per write I/O
2 x 2 x 5µs/km = 20µs/km additional latency
e.g. at 50km → additional 1000µs (1ms) I/O Service time (write) with Synchronous replication
Implementation dependent (2 RTT for SRDF, DRM)
[Figure: 50km (30mi) link; each one-way traversal takes 250µs, four traversals per write]
Speed of Light
c = 3 x 10^8 m/s (vacuum) ≈ 3.3µs/km
Speed through fiber ≈ 2/3 c ≈ 5µs/km
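The arithmetic on this slide can be written as a small helper. It is a sketch that counts propagation delay only, using the 5µs/km fiber figure from the slide; the function name and its `round_trips` parameter are hypothetical, and switch, array, and protocol processing time are ignored.

```python
FIBER_DELAY_US_PER_KM = 5.0  # one-way propagation delay in fiber, from the slide


def added_write_latency_us(distance_km: float, round_trips: int = 2) -> float:
    """Extra write service time caused by propagation delay alone."""
    one_way_us = distance_km * FIBER_DELAY_US_PER_KM
    return round_trips * 2 * one_way_us  # each round trip is two traversals


for km in (10, 50, 100):
    print(f"{km} km: {added_write_latency_us(km):.0f} us added per write")
```

With one round trip per write instead of two (`round_trips=1`), the same distance costs half as much, which is why the number of round trips in the replication protocol matters as much as the distance.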
Effect of Latency on Sync Replication
• Only Write I/Os are affected
Increased “Service (Response) Time”
• OLTP apps typically 80%R / 20%W
(But disk I/O is not necessarily 80/20!)
What data is actually replicated?
Redo Logs? Yes; Datafiles? Maybe
Databases are very sensitive to latency
Locks on Tables, Rows, etc.
• Tolerable latency is up to the application
Case-by-case basis
[Figure: OLTP app. 80% read? 20% write?]
Asynchronous Replication Considerations
• Maximizes Application I/O Rate
No waiting on response from remote site
• But: Potential Data Loss in a disaster
“In-flight” and queued I/Os not yet at remote site
Replication “Lag” is typically configurable (e.g. OUTSTANDING_IO in DRM)
Asynchronous Replication Considerations (2)
• If the configured “lag” limit is reached, replication changes to Synchronous Mode to clear the backlog (e.g. SRDF and DRM)
Instantly raises Write I/O response time
Occurs if: Write I/O Rate > SAN Extension capacity
[Figure: Write I/O feeds a Replication Queue, which drains into the SAN Extension Network]
SAN Extension Network Capacity must be dimensioned to handle Write I/O rate
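The fallback behaviour can be illustrated with a toy queue model. It is a simplification under stated assumptions: time is divided into equal intervals, rates are in writes per interval, and while in synchronous mode the hosts are held back so the queue only drains. The function `simulate` and its parameters are hypothetical, not taken from SRDF or DRM.

```python
def simulate(write_rates, link_capacity, lag_limit):
    """Return (mode, backlog) per interval for an asynchronous replication queue.

    write_rates   -- writes offered by hosts in each interval
    link_capacity -- writes the SAN extension can carry per interval
    lag_limit     -- queued writes at which replication falls back to sync
    """
    backlog = 0
    mode = "async"
    history = []
    for offered in write_rates:
        if mode == "async":
            backlog = max(0, backlog + offered - link_capacity)
            if backlog >= lag_limit:
                mode = "sync"  # lag limit reached
        else:
            # Assumed behaviour: hosts wait, so the queue only drains.
            backlog = max(0, backlog - link_capacity)
            if backlog == 0:
                mode = "async"
        history.append((mode, backlog))
    return history


burst = simulate([50, 150, 150, 150, 50, 50], link_capacity=100, lag_limit=100)
steady = simulate([80, 90, 100, 70], link_capacity=100, lag_limit=100)
print([mode for mode, _ in burst])
print([mode for mode, _ in steady])
```

The steady run never leaves asynchronous mode because the offered write rate never exceeds link capacity, which is the dimensioning rule stated above.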
Asynchronous Replication—Network Capacity
• Typical HA Environment for SAN Extension
Each link must be able to handle full replication load
Configure each for <40% peak load (80% in failover)
Bottom link takes over replication load from top
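The 40%/80% guideline can be checked with simple arithmetic. The sketch below assumes two equal links that share the replication load evenly in normal operation; the function names and the Mbps figures in the example are hypothetical.

```python
def link_utilisation(peak_write_mbps, link_mbps, links=2):
    """Return (per-link utilisation in normal operation, utilisation in failover)."""
    normal = peak_write_mbps / (links * link_mbps)
    failover = peak_write_mbps / ((links - 1) * link_mbps)
    return normal, failover


def meets_guideline(peak_write_mbps, link_mbps, max_normal=0.40):
    """True if each link stays under the normal-operation ceiling."""
    normal, _ = link_utilisation(peak_write_mbps, link_mbps)
    return normal < max_normal


normal, failover = link_utilisation(peak_write_mbps=700, link_mbps=1000)
print(f"normal: {normal:.0%} per link, failover: {failover:.0%}")
print(meets_guideline(700, 1000), meets_guideline(900, 1000))
```

Keeping each link under 40% means the surviving link runs under 80% after a failure, leaving headroom for bursts and resynchronization.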
Write Ordering
• Replication must preserve the order of writes
Sync is OK—dependent writes not initiated until prior writes completed
Async: must use timestamps or sequence numbers
• “Consistency Group”—maintains write ordering over several volumes
e.g. Database spread over several volumes
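Sequence-number ordering can be sketched as a receiver that holds back any write that arrives early. This is an illustration under assumptions: `RemoteApplier` is a hypothetical class, and one sequence counter is shared by all volumes in the consistency group, so ordering holds across volumes as well as within them.

```python
import heapq


class RemoteApplier:
    """Applies replicated writes strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 1
        self.pending = []  # min-heap of (seq, volume, lba, data)
        self.applied = []  # order in which writes reached the remote volumes

    def receive(self, seq, volume, lba, data):
        heapq.heappush(self.pending, (seq, volume, lba, data))
        # Apply every write that is now contiguous with what was applied before.
        while self.pending and self.pending[0][0] == self.next_seq:
            self.applied.append(heapq.heappop(self.pending))
            self.next_seq += 1


applier = RemoteApplier()
applier.receive(2, "data_vol", 500, b"row update")  # arrives early, held back
applier.receive(1, "redo_vol", 100, b"redo entry")  # unblocks sequence 2
applier.receive(4, "data_vol", 501, b"row update")  # gap at 3, held back
applied_seqs = [entry[0] for entry in applier.applied]
print(applied_seqs, len(applier.pending))
```

Holding back sequence 4 until 3 arrives is what keeps the remote copy at a state the primary actually passed through.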
SAN Extension
• SAN Extension means extending a Fibre Channel Fabric over distance
Campus
Metro
Regional
Global
SAN Extension Alternatives
• Dark Fiber—(1 or 2 Gbps per port)
• CWDM (1 or 2 Gbps per port)
• DWDM (1 or 2 Gbps per port)
• FCIP (variable up to 1 Gbps per port)
Dark Fiber
• Single 1 or 2 Gbps link per fiber pair
SW (850nm) 300m over 62.5/125µm Multimode
SW (850nm) 500m over 50/125µm Multimode
LW (1310nm) 10km over 9/125µm Single Mode
• “Client Protection”: ULP (SAN or Application) responsible for failover protection
[Figure: FC SANs at two sites joined by dark fiber over Diverse Paths for High Availability; <10km for LW Single Mode]
CWDM—Coarse Wave Division Multiplexing
• 8-channel WDM at 20nm spacing (cf DWDM at <1nm spacing)
1470, 1490, 1510, 1530, 1550, 1570, 1590, 1610nm
• Special “Colored” SFPs (or GBICs) used in FC Switches
• Muxing done in CWDM OADM (Optical Add/Drop Multiplexer)
Passive (unpowered) device: just mirrors and prisms
• 30dB power budget (36dB typical) on SM fiber
~90km point-to-point or ~40km ring
• “Typically” not EDFA (Erbium Doped Fiber Amplifier) amplifiable
Only two wavelengths around 1550nm fit within EDFA range
[Figure: two Muxes, each combining the eight wavelengths from 1470nm to 1610nm]
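The channel plan and the EDFA remark can be reproduced in a few lines. The start wavelength, spacing, and channel count come from the slide; the EDFA window of roughly 1530 to 1565nm (nominal C-band) is an assumption that the session does not state, and the constant names are hypothetical.

```python
CWDM_START_NM = 1470
CWDM_SPACING_NM = 20
CWDM_CHANNELS = 8

# Assumption: nominal C-band EDFA gain window, not given in the session.
EDFA_BAND_NM = (1530, 1565)

wavelengths = [CWDM_START_NM + i * CWDM_SPACING_NM for i in range(CWDM_CHANNELS)]
amplifiable = [w for w in wavelengths if EDFA_BAND_NM[0] <= w <= EDFA_BAND_NM[1]]

print("CWDM channels (nm):", wavelengths)
print("Inside assumed EDFA window (nm):", amplifiable)
```

Under that assumed window only the 1530nm and 1550nm channels qualify, which matches the "only two wavelengths" statement on the slide.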
2-Site CWDM Storage Network
• HA Resilience against fiber cut: “client” protection
4-member Portchannel: 2 x 2 diverse paths
Portchannel appears as single logical link
E_Port or TE_Port for carriage of VSANs
Load balance by Src/Dst (or Src/Dst/OXid)
Fiber cut will halve capacity from 8Gbps to 4Gbps but not alter Fabric topology (no FSPF change)
• MUX-8 would double capacity or leave spare for GigE channels
[Figure: two MDS9000 switches with 2Gbps CWDM SFPs, connected through MUX-4 units by a Portchannel of 4 x 2Gbps over two diverse paths, one fiber pair per path]
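Flow-based load balancing over Portchannel members can be sketched with a hash over the Fibre Channel source and destination IDs. The CRC32 hash here is a stand-in chosen for illustration; the actual hardware hash is not described in this session, and `pick_member`, `capacity_gbps`, and the member names are hypothetical.

```python
import zlib

MEMBER_GBPS = 2  # each Portchannel member is a 2Gbps link, as on the slide


def pick_member(src_id, dst_id, members, ox_id=None):
    """Choose a member link for a flow (Src/Dst) or an exchange (Src/Dst/OXid)."""
    key = f"{src_id}:{dst_id}" if ox_id is None else f"{src_id}:{dst_id}:{ox_id}"
    return members[zlib.crc32(key.encode()) % len(members)]


def capacity_gbps(members):
    return MEMBER_GBPS * len(members)


members = ["path-A-1", "path-A-2", "path-B-1", "path-B-2"]
first = pick_member("0x010203", "0x020304", members)
again = pick_member("0x010203", "0x020304", members)

# Fiber cut on path B: the logical link survives with half the members.
survivors = [m for m in members if m.startswith("path-A")]
after_cut = pick_member("0x010203", "0x020304", survivors)

print(first, again, after_cut)
print(capacity_gbps(members), "Gbps ->", capacity_gbps(survivors), "Gbps")
```

With Src/Dst hashing every frame of a given source and destination pair stays on one member, which preserves ordering across exchanges; adding the OXid spreads individual exchanges across members instead.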
DWDM—Dense Wave Division Multiplexing
• Higher Density than CWDM
32 lambdas or channels in narrow band around 1550nm at 100GHz spacing (<1nm)
EDFA amplifiable → longer distances
Carriage of 1 or 2 Gbps FC, FICON, GigE, 10GigE, ESCON, IBM GDPS
• Data Center to Data Center
• Protection Options: Client, Splitter, or Linecard
DWDM Protection Alternatives for Storage
Optical Splitter Protection
• Single transponder required
• Protects against Fiber Breaks
• Failover causes Loss of Light (and Fabric Change if only link)
Linecard or Y-cable Protection
• Dual transponders required
More expensive than Splitter-based protection
• Transmits over both circuits, but only one accepted
[Figure: Working Lambda and Protected Lambda shown for each scheme, with an Optical Splitter in one and a Y-cable in the other]
DWDM HA Storage Network Topology
• Client Protection Recommended—Fabric and Application responsible for failover recovery
• Portchannel provides resilience
Portchannel members follow diverse paths
Single fiber cut will not affect Fabric (no RSCNs, etc.)
Use “Src/Dst” hash for load balancing (rather than “Src/Dst/Oxid” per Exchange) for each extended VSAN
[Figure: two MDS9000 switches with FC attachments, joined by a 2 x 2Gbps Portchannel]
FCIP—Fibre Channel over IP
• FCIP is a draft from the IETF IP Storage WG for linking Fibre Channel SANs over IP
Point-to-Point Tunnel between FCIP Link End-points
Appears as one logical FC Fabric with single FSPF routing domain
• FCIP implemented on:
MDS9000 IPS-8 (IP Services Card): E_Port or B_Port
PA-FC-1G Port Adapter for 7200 and 7400: B_Port only
[Figure: two FC SANs joined by an FCIP Tunnel]
E_Port and B_Port Comparison
E_Port FCIP SAN Extension
• Fabric parms set between adjacent switches (inc FCIP)
• FCIP link emulates E_Port link (VE_Port)
[Figure: E_Port links on either side of a VE_Port pair across the IP Network. Labels: Exchange Link Parameters, Exchange FCIP-Link Parameters, Exchange Fabric Parameters, ESC]
B_Port FCIP SAN Extension
• Fabric and switch parms bridged through FCIP
[Figure: E_Port switches attached to B_Ports on a 7200 w/ PA-FC-1G at each site, with VB_Ports and FCIP across the IP Network. Labels: Exchange Link Parameters, Exchange FCIP-Link Parameters, Exchange Fabric Parameters, ESC (Exchange Switch Capabilities) if required]
FCIP Entity and Link Endpoint—E_Port
• The FCIP interface represents both the VE_Port and the FCIP Link
• An FCIP Link is defined as one or more TCP sessions (the IPS-8 card supports 1 or 2 TCP connections per FCIP Link)
Control Traffic (Class F FC Frames)
Data Traffic (Class 3 FC Frames)
• FCIP Link Endpoint (LEP) terminates FCIP Links
• FCIP Data Engine: one per TCP connection
Normally two for IPS-8 (configurable)
FCIP listens on TCP Port 3225
[Figure: Entity 1 on a Gigabit Ethernet Interface (IP Address = 192.168.1.10). A VE_Port sits above the FCIP_LEP (Link Endpoint), which holds two Data Engines (DE). Class F control traffic and Class 3 data traffic travel over the FCIP Link on TCP ports, with well-known port 3225.]
FCIP Entity and Link Endpoint—B_Port
• The FCIP interface represents both the B_Port and the FCIP Link
• One or Two TCP Sessions:
Control Traffic (Class F FC Frames)
Data Traffic (Class 3 FC Frames)
One TCP session only for PA-FC-1G Port Adapter (Control + Data traffic combined)
• FCIP Link Endpoint (LEP) terminates FCIP Links
[Figure: Entity 1 on a Gigabit Ethernet Interface (IP Address = 192.168.1.10). B_Access sits above the FCIP_LEP (Link Endpoint), which holds two Data Engines (DE). Class F control traffic and Class 3 data traffic travel over the FCIP Link on TCP ports, with well-known port 3225.]
7200 Configuration for FCIP
• Can’t just use any slot!
• Port Adapter and I/O Controller configuration has a huge impact on performance (PCI Bus configuration)
Use GigE on I/O Controller of NPE-400 (rather than GigE PA)
Put FCIP PA (PA-FC-1G) in even slot (right hand side: #2, 4, 6)
Put WAN PAs (OC-3, DS-3,…) and SA-VAM (if used) in odd slots
[Figure: 7204 and 7206 chassis slot layout (I/O controller, PA slots 1 to 6), showing PA-FC-1G using PA-POS-OC3MM and SA-VAM with NPE-400, and GE on the I/O controller]
Two-Site HA FCIP Design with 7200 PA
• Two Fabrics (top and bottom) extended over distance
• High Availability provided through Application level failover through dual fabrics
e.g. Replication (SRDF, DRM, etc.…)
• Cannot trunk VSANs (TE_Port) with 7200 PA
[Figure: two sites with FC fabrics, joined by FCIP tunnels across the IP Network]
Compression and Encryption
• SA-VAM (VPN Acceleration Module) available for compression and IPSec 3DES encryption on 7200 VXR routers
• Throughput constrained to ~100 Mb/sec
• Use IPPCP LZS: compression typically 2:1 (data stream dependent)
• The choice of encryption method, and whether compression is on or off, has minimal bearing on performance
[Diagram: the two-site dual-fabric FCIP design, with IPSec 3DES encryption and IPPCP compression applied to the FCIP links across the IP network.]
Design Options with IPS-8 FCIP
• Portchannel, VSANs, and Trunking features allow a number of design alternatives
• Multiple FCIP links can be Portchannelled to appear as a single logical link
Use Srcid/Destid load balancing for VSANs using Channel Group
[Diagram: two FCIP links between a pair of switches. Entity 12 (10.12.1.3 to 10.12.1.4, FCIP112) and Entity 13 (10.13.1.3 to 10.13.1.4, FCIP113) each terminate as an E_Port; the two links are bundled into a Portchannel that appears as a single E_Port/TE_Port at each end.]
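To illustrate what source/destination ID load balancing over a Portchannel means in practice, the sketch below maps each (source ID, destination ID) pair to one member link. The XOR-and-modulo hash and the function name `pick_link` are assumptions made purely for illustration; they are not the algorithm the MDS9000 actually uses. The point is only that every frame of a given source/destination pair stays on one FCIP link, while different pairs can spread across the links.

```python
def pick_link(src_id: int, dst_id: int, links: list[str]) -> str:
    """Choose a Portchannel member link for a flow (hypothetical hash)."""
    if not links:
        raise ValueError("Portchannel has no member links")
    # FC addresses (FC_IDs) are 24-bit values; XOR mixes source and destination.
    flow_hash = (src_id ^ dst_id) & 0xFFFFFF
    return links[flow_hash % len(links)]


members = ["fcip112", "fcip113"]  # link names borrowed from the diagram
# The same source talking to two different targets may use different links.
print(pick_link(0x010203, 0x0A0B0C, members))
print(pick_link(0x010203, 0x0A0B0D, members))
```

Because the choice depends only on the two IDs, repeated calls for the same pair always return the same link.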
Two-Site HA FCIP Design with IPS-8
• Two Fabrics (top and bottom) extended over distance
• Each Fabric protected from Network failures—link, Cat6k switch, WAN
Application (e.g. disk replication) protected from FC switch failure by dual fabrics
• Portchannel prevents state changes upon single link failures (FSPF, Domain, FIB)
[Diagram: FC fabrics at each site attach to MDS9000 switches, which are joined by a 2x FCIP Portchannel across the IP network.]
Two-Site HA FCIP Design with IPS-8—Multiple VSANs
• Individual SANs connected as VSANs to MDS9000
• VSANs trunked over Portchanneled FCIP
VSANs can be scaled independently of FCIP and WAN links
[Diagram: an MDS9000 at each site carrying multiple VSANs over a 2x FCIP Portchannel with TE ports (trunking VE_Port) across the IP network.]
FCIP Packets and MTU Sizes
• FC frames can be up to 2148 bytes
• MTU for IP packet on Ethernet is 1500 bytes
7200 FCIP PA chops full-size FC frame roughly in half
IPS-8 will fill up to MTU
• IPS-8 will accept jumbo frames
No need to chop up and reassemble FC frames over FCIP link
Fractionally lower latency with jumbo frames (a few µs)
• FC frames are always reassembled
[Diagram: a 2084-byte FC frame (2048-byte payload) carried over FCIP across the IP network. PA-FC-1G FCIP PA for 7200/7400, MTU 1500: two Ethernet frames of 1174 B and 1054 B. MDS9000 IPS-8, MTU 1500: Ethernet frames of 1518 B and 734 B; MTU 3000: a single 2182 B Ethernet frame.]
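The IPS-8 frame sizes quoted for this slide (1518 B and 734 B at MTU 1500, 2182 B at MTU 3000) can be reproduced with a little arithmetic. The sketch below is a minimal model, not a description of the product's implementation. It assumes a 28-byte FCIP encapsulation header, a 20-byte IP header, a 32-byte TCP header (20 bytes plus 12 bytes of options, an assumption chosen because it matches the slide's figures), and 18 bytes of Ethernet header and FCS. It also treats one FC frame in isolation, whereas a real TCP stream could pack the next frame into the leftover space.

```python
FCIP_HEADER = 28   # FC frame encapsulation header (bytes)
IP_HEADER = 20     # IPv4 header without options
TCP_HEADER = 32    # assumption: 20-byte header + 12 bytes of TCP options
ETH_OVERHEAD = 18  # 14-byte Ethernet header + 4-byte FCS


def ethernet_frames(fc_frame_bytes: int, mtu: int) -> list[int]:
    """Return the Ethernet frame sizes needed to carry one FC frame over FCIP."""
    segment_max = mtu - IP_HEADER - TCP_HEADER   # TCP payload per IP packet
    if segment_max <= 0:
        raise ValueError("MTU too small for the assumed headers")
    remaining = fc_frame_bytes + FCIP_HEADER     # bytes placed on the TCP stream
    frames = []
    while remaining > 0:
        chunk = min(remaining, segment_max)      # fill each packet up to the MTU
        frames.append(chunk + TCP_HEADER + IP_HEADER + ETH_OVERHEAD)
        remaining -= chunk
    return frames


for mtu in (1500, 3000):
    print(mtu, ethernet_frames(2084, mtu))
```

Under these assumptions the model yields 1518 B and 734 B frames at MTU 1500 and a single 2182 B frame at MTU 3000, in line with the slide.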
Why Fabric Switches Are Required between Arrays
• Storage Arrays typically have limited BB_Credits—hence limited in distance capability
• Fabric switches have larger BB_Credit pools (up to 255 per port on MDS9000)—can keep the long distance pipe full
[Diagram: arrays typically offer 2-8 BB_Credits, which limits a direct array-to-array link over long distance. With a Fibre Channel SAN switch at each end of the long-distance link (up to 255 BB_Credits on MDS9000), the arrays keep their 2-8 credits locally while the switches keep the long link full.]
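A rough feel for why 2-8 credits limit distance: a sender may have at most one frame in flight per BB_Credit and gets each credit back only after a round trip. The sketch below estimates the distance at which a given credit count can still keep the link busy. It assumes 1 Gbps Fibre Channel (1.0625 Gbaud with 8b/10b encoding), full-size 2148-byte frames, and the 5 µs/km fiber latency used elsewhere in this deck. It ignores processing delays, so treat the results as optimistic upper bounds.

```python
LINE_RATE_BAUD = 1.0625e9      # 1 Gbps Fibre Channel signalling rate
BITS_PER_BYTE_ON_WIRE = 10     # 8b/10b encoding: 10 line bits per data byte
FIBER_DELAY_S_PER_KM = 5e-6    # one-way propagation delay in fiber


def max_distance_km(bb_credits: int, frame_bytes: int = 2148) -> float:
    """Longest link that bb_credits can keep full with back-to-back frames."""
    frame_time = frame_bytes * BITS_PER_BYTE_ON_WIRE / LINE_RATE_BAUD
    # Credits must cover one round trip: credits * frame_time >= 2 * d * delay
    return bb_credits * frame_time / (2 * FIBER_DELAY_S_PER_KM)


for credits in (2, 8, 255):
    print(f"{credits:3d} credits -> about {max_distance_km(credits):.0f} km")
```

With these assumptions an 8-credit array port covers on the order of 16 km, while 255 credits cover roughly 500 km.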
Flow Control for FCIP Links
• TCP performs flow control for FCIP tunnel
End-to-end “stream” oriented using sliding window—TCP Window size (irrespective of number of packets)
• Storage Traffic is typically Class 3 FC frames (connectionless)
Flow Control through per hop Buffer to Buffer Credit scheme (BB_Credits)
One BB_Credit used per frame (irrespective of size)
• BB_Credits do not apply to the FCIP link—even though it tunnels FC frames through VE_Port or B_Port
[Diagram: BB_Credits govern each hop inside the Fibre Channel SAN at either site; TCP windowing governs the FCIP tunnel between the two FCIP endpoints across the IP network.]
FCIP—Flow Control Design Considerations
• FC frames can expire if many FC sources feed into a slow or congested FCIP WAN link
FC frames will expire if buffered for >500 ms
Keep FC receive buffers small (e.g. set “fcrxbbcredit” on MDS9000 to “2” if using T3 or T1 links)
Remember that TCP slow start and congestion mechanisms will throttle throughput
Congestion window monitor and slow start threshold are configurable on MDS9000
[Diagram: multiple sources at 1 or 2 Gbps feed the Fibre Channel receive buffers (= receive BB credits + extra). The TCP send buffer (= max window size) drains them over a 1 Gbps interface toward the WAN link (e.g. 45 Mbps). Maximum “drain” rate is determined by TCP window size and link bandwidth. Frames must not sit in FC receive buffers > 500 ms (or they get dropped). Router buffers should be larger than the window size to cope with bursts.]
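The 500 ms limit translates directly into a cap on how much data may be queued ahead of a slow WAN link. The sketch below is a back-of-the-envelope model under simplifying assumptions: the link drains at its full nominal rate (TCP slow start and congestion control are ignored, so real drain times will be longer), and the T1, T3 and OC-3 figures are nominal rates. The function names are illustrative.

```python
HOLD_TIME_S = 0.5  # frames buffered longer than this are dropped


def drain_time_s(buffered_bytes: int, link_bps: float) -> float:
    """Time for the last buffered byte to leave over a link of link_bps."""
    return buffered_bytes * 8 / link_bps


def max_buffered_bytes(link_bps: float, hold_time_s: float = HOLD_TIME_S) -> int:
    """Largest backlog that can still drain within the hold time."""
    return int(link_bps * hold_time_s / 8)


links = {"T1": 1.544e6, "T3": 45e6, "OC-3": 155e6}  # nominal rates in bit/s
for name, bps in links.items():
    print(f"{name}: at most {max_buffered_bytes(bps)} bytes queued")

# Example: 16 full-size (2148-byte) frames waiting ahead of a T1
print(f"{drain_time_s(16 * 2148, 1.544e6) * 1000:.0f} ms to drain")
```

This is why small receive buffers are recommended on T1 or T3 links: the slower the link, the less backlog can clear before frames start to expire.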
Determining the Optimum TCP Window Size
• Set the TCP MWS (Max Window Size) to “keep the pipe full”
Calculated by: RTT (Round Trip Time) * path bandwidth
e.g. 10 ms RTT (5 ms each way) * 155 Mbps (OC-3) ≈ 1.5 Mbits ≈ 192 kB
• Do not over-dimension the TCP MWS value
But under-dimensioning will throttle throughput
• Take multiple link speeds into account (e.g. GigE connecting to OC-3)
Path bandwidth = lowest speed link
• 7200 FCIP PA window size ranges up to 512 kB
• MDS9000 IPS-8 window size ranges up to 32 MB
[Diagram: a pipe whose length is the Round Trip Time (RTT, e.g. 10 ms) and whose width is the path bandwidth (e.g. 155 Mbps). To keep the pipe full: path bandwidth x RTT, e.g. 155 Mbps x 10 ms ≈ 192 kBytes.]
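The window calculation on this slide is a bandwidth-delay product taken over the slowest link in the path. A minimal sketch follows; the function name and the use of integer milliseconds and bits per second are illustrative choices, not anything prescribed by the slide.

```python
def tcp_window_bytes(rtt_ms: int, link_bps: list[int]) -> int:
    """Bandwidth-delay product in bytes for a path made of several links."""
    path_bps = min(link_bps)                 # path bandwidth = slowest link
    bits_in_flight = path_bps * rtt_ms // 1000
    return -(-bits_in_flight // 8)           # round up to whole bytes


# GigE feeding an OC-3 with a 10 ms round trip, as in the slide's example
window = tcp_window_bytes(10, [1_000_000_000, 155_000_000])
print(window, "bytes")
```

For the slide's example this gives 193,750 bytes, which the slide rounds to about 192 kB. The result should then be checked against the platform limit (512 kB on the 7200 FCIP PA, 32 MB on the IPS-8).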
Flow Control for Optical Links
• BB_Credits at every hop including Optical Link (CWDM/DWDM)
Only a low number (~5) required in local SANs
Push back to source rather than buffering at intermediate points
Larger number required for long links due to latency (5 µs/km)
Enough to “keep the pipe full”, but at what frame size?
[Diagram: BB_Credits apply on every hop: within the Fibre Channel SAN at each site and across the long-distance optical link between them.]
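The “at what frame size?” question matters because one BB_Credit is consumed per frame regardless of its size, so smaller frames need more credits to cover the same distance. A minimal sketch, assuming 1 Gbps Fibre Channel (1.0625 Gbaud, 8b/10b encoding) and 5 µs/km fiber latency; the function name and the sample frame sizes are illustrative, and processing delays are ignored.

```python
import math


def credits_needed(distance_km: float, frame_bytes: int,
                   line_rate_baud: float = 1.0625e9) -> int:
    """BB_Credits required to keep a link of distance_km full."""
    round_trip = 2 * distance_km * 5e-6             # seconds, at 5 us/km
    frame_time = frame_bytes * 10 / line_rate_baud  # 8b/10b: 10 bits per byte
    return math.ceil(round_trip / frame_time)


for size in (2148, 1024, 512):
    print(f"{size}-byte frames over 100 km: {credits_needed(100, size)} credits")
```

Under these assumptions, 100 km needs about 50 credits with full-size frames but about 208 with 512-byte frames, which approaches the 255 credits per port quoted for the MDS9000.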
Fabric Timeout Parameters
• E_D_TOV: Error Detect Timeout
How long an N_Port will wait for an action to occur
Depends on transmission time through fabric + processing delays
• R_A_TOV: Resource Allocation Timeout
Maximum time a frame can still be valid within a fabric
• Switch hold time = 500 ms
How long a frame can be held (buffered) within a switch
• For FCIP extended SANs:
Set E_D_TOV to 10 seconds and R_A_TOV to 20 seconds
FCIP HA Alternatives
[Diagram: four alternative designs shown side by side, each connecting FC fabrics at two sites.]
Alternative 1:
• Separate physical paths and infrastructure
• Protection against any single failure
• Single failure will kill one path
• Client responsible for recovery
Alternative 2:
• Protection against IP network failure
• Each SAN portchannelled over separate IP paths
• Single failure in IP network will not fail the path; it only halves the available bandwidth
Alternative 3:
• Protection against IP network failure and GigE port/card failure on FC switch
• Each SAN portchannelled over separate IP paths
• All links TE ports
• Single failure in IP network will not fail the path; it only halves the available bandwidth
Alternative 4:
• Protection against any IP network failure and single FC switch failure: two paths always open
• Each SAN portchannelled over separate IP paths
• All links TE ports
• Single failure in IP network will not fail the path; it only halves the available bandwidth
Failover Considerations
• What happens to “in-flight” traffic upon failure?
It’s typically lost!
Application is responsible for recovery
Abort sequence, resend frame, etc.
• If not sending at that instant, then all ok!
[Diagram: two FC fabrics, each extended over FCIP across its own IP network, with a failure shown in one IP WAN.]
Wrap Up
• Understand Enterprise RPO and RTO Policy
How does that equate to technology requirements?
• Many transport alternatives for meeting those requirements
Optical: DWDM and CWDM
FCIP
Questions?