TRANSCRIPT
CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it
Experience with NetApp at CERN IT/DB
Giacomo Tenaglia, on behalf of Eric Grancher and Ruben Gaspar Aparicio
Outline
• NAS-based usage at CERN
• Key features
• Future plans
Experience with NetApp at CERN IT/DB - 2
Storage for Oracle at CERN
• 1982: Oracle at CERN (PDP-11, mainframe, VAX VMS, Solaris SPARC 32- and 64-bit)
• 1996: Solaris SPARC with OPS, then RAC
• 2000: Linux x86, single node, DAS
• 2005: Linux x86_64 / RAC / SAN
  – Experiment databases and part of WLCG stayed on SAN until 2012
• 2006: Linux x86_64 / RAC / NFS (IBM/NetApp)
• 2012: all production primary Oracle databases (*) on NFS
(*) apart from ALICE and LHCb online
Network topology
• All 10 Gb/s Ethernet
• Same network for storage and cluster interconnect
[Diagram: filer1-filer4 and serverA-serverE connected through two private Ethernet switches (Private 1, Private 2) carrying both CRS and storage traffic; internal HA-pair interconnect between the filers; a separate Ethernet switch for the public network]
Domains: space/filers
Domain    Total size (TB)   Used for backup (TB)   # of filers
des-nas   47.4              62.6                   10
shosts    204                                      4
gen3      97                                       4
rac10     59                                       6
rac11     59                                       6
castor    154                                      18
acc       281                                      8
db disk                     1000                   2
TOTAL     901.4             1062.6                 58
Typical setup
Impact of storage architecture on Oracle stability at CERN
Key features
• Flash cache
• RAID-DP
• Snapshots
• Compression
Flash cache
• Helps increase random IOPS from disks
  – Very good for OLTP-like workloads
• Cache contents are not wiped when servers reboot
• For databases, decide which volumes to cache:
  fas3240> priority on
  fas3240> priority set volume volname cache=[reuse|keep]
• 512 GB modules, 1 per controller
IOPS and Flash cache
Key features
• Flash cache
• RAID-DP
• Snapshots
• Compression
Disk and redundancy (1/2)
• Disks get larger and larger while their speed stays roughly constant → performance issue
• The bit error rate stays constant (10^-14 to 10^-16), so availability becomes an increasing issue
• With x the disk size in bits and α the bit error rate, the probability of hitting at least one unrecoverable bit error while reading a group of n disks is approximately 1 − (1 − α)^(n·x)
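This approximation can be checked numerically; below is a small Python sketch (the helper name is mine, not from the slides) that reproduces a couple of the values in the comparison table that follows.

```python
import math

def loss_probability(alpha: float, disk_bytes: float, n_disks: int) -> float:
    """P(at least one unrecoverable bit error) when reading n_disks full disks,
    i.e. 1 - (1 - alpha)**(n*x) with x the disk size in bits."""
    bits = disk_bytes * 8 * n_disks
    # (1 - alpha)**bits loses precision for tiny alpha; use log1p/expm1 instead.
    return -math.expm1(bits * math.log1p(-alpha))

# 1 TB SATA desktop disk (BER 1e-14): reading 1 disk (mirror rebuild)
# versus a full 5-disk group (RAID 5 rebuild).
print(f"RAID 1 : {loss_probability(1e-14, 1e12, 1):.2e}")  # ~7.7e-02
print(f"RAID 5 : {loss_probability(1e-14, 1e12, 5):.2e}")  # ~3.3e-01
```

These match the 7.68E-02 and 3.29E-01 entries for 1 TB SATA desktop disks, which is why large RAID 5 groups on high-capacity SATA disks are risky.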
Disks, redundancy comparison (2/2)
Data loss probability for different disk types and group sizes (columns: 5, 14, 28 disks per group):

1 TB SATA desktop, bit error rate 10^-14
  RAID 1          7.68E-02
  RAID 5 (n+1)    3.29E-01   6.73E-01   8.93E-01
  ~RAID 6 (n+2)   1.60E-14   1.46E-13   6.05E-13
  ~triple mirror  8.00E-16   8.00E-16   8.00E-16

1 TB SATA enterprise, bit error rate 10^-15
  RAID 1          7.96E-03
  RAID 5 (n+1)    3.92E-02   1.06E-01   2.01E-01
  ~RAID 6 (n+2)   1.60E-16   1.46E-15   6.05E-15
  ~triple mirror  8.00E-18   8.00E-18   8.00E-18

450 GB FC, bit error rate 10^-16
  RAID 1          4.00E-04
  RAID 5 (n+1)    2.00E-03   5.58E-03   1.11E-02
  ~RAID 6 (n+2)   7.20E-19   6.55E-18   2.72E-17
  ~triple mirror  3.60E-20   3.60E-20   3.60E-20

10 TB SATA enterprise, bit error rate 10^-15
  RAID 1          7.68E-02
  RAID 5 (n+1)    3.29E-01   6.73E-01   8.93E-01
  ~RAID 6 (n+2)   1.60E-15   1.46E-14   6.05E-14
  ~triple mirror  8.00E-17   8.00E-17   8.00E-17
Key features
• Flash cache
• RAID-DP
• Snapshots
• Compression
Snapshots
• T0: take snapshot 1
• T1: file changed
• T2: take snapshot 2
Snapshots for backups
• With data growth, restoring databases in a reasonable amount of time becomes impossible with "traditional" backup/restore techniques
• Example: 100 TB database, 10 GbE network, 4 tape drives
  – Tape drive restore performance ~120 MB/s
  – Restore takes ≈ 100 TB / (4 × 120 MB/s) ≈ 58 hours (and it can be much longer)
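The ~58-hour figure is just throughput arithmetic; a quick sketch (helper name mine):

```python
# Back-of-the-envelope restore-time estimate from the slide's numbers:
# 100 TB of data, 4 tape drives restoring at ~120 MB/s each.

def restore_hours(data_tb: float, n_drives: int, mb_per_s: float) -> float:
    total_mb = data_tb * 1_000_000            # 1 TB = 10^6 MB (decimal units)
    return total_mb / (n_drives * mb_per_s) / 3600

print(f"~{restore_hours(100, 4, 120):.0f} hours")  # ~58 hours, as quoted
```

In practice contention, tape mounts and recovery steps make it longer, which is the motivation for snapshot-based backups.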
Snapshots and Real Application Testing
[Diagram: the production workload (insert, update, delete, PL/SQL) is captured on the original 10.2 database; a clone is created, upgraded to 11.2, and the captured workload is replayed against it. With SnapRestore®, the clone can be reverted to the snapshot and the workload replayed multiple times.]
Key features
• Flash cache
• RAID-DP
• Snapshots
• Compression
NetApp compression factor
                                 Uncompressed GB   Compressed GB   Compression ratio
One day of AISDB prod redo log   281.3             100.7           2.8
Recent one-day ACCLOG datafile   118.1             49.4            2.4
CMSR full backup                 997.3             297.7           3.4
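The ratio column is simply the uncompressed size divided by the compressed size; an illustrative check against the figures above:

```python
# Verify that the reported compression ratios are uncompressed/compressed (GB).
measurements = [
    ("AISDB prod redo log", 281.3, 100.7),
    ("ACCLOG datafile",     118.1,  49.4),
    ("CMSR full backup",    997.3, 297.7),
]
for name, uncompressed_gb, compressed_gb in measurements:
    print(f"{name}: {uncompressed_gb / compressed_gb:.1f}x")  # 2.8x, 2.4x, 3.4x
```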
Compression: backup on disk
• RMAN file backup → 1 tape copy + disk buffer
• Disk buffer: raw ~1700 TiB (576 × 3 TB disks), usable ~1000 TiB, to hold ~2 PiB of uncompressed data
Future: OnTap Cluster Mode
• Non-disruptive upgrades/operations: the "immortal cluster"
• Interesting new features
  – Internal DNS load balancing
  – Export policies: fine-grained access control for NFS exports
  – Encryption and compression at the storage level
  – NFS 4.1 implementation, parallel NFS (pNFS)
• Scale-out architecture: up to 24 nodes (512 theoretical)
• Seamless data moves for capacity/performance rebalancing or hardware replacement
Architecture view – Ontap cluster mode
Possible implementation
Logical components
pNFS
• NFS 4.1 standard (client caching, Kerberos, ACLs)
  – Coming with Ontap 8.1RC2
  – Not natively supported by Oracle yet; client support in RHEL 6.2
• Control protocol: provides synchronization between the data servers and the metadata server (MDS)
• pNFS runs between client and MDS: the client asks the MDS where the information is stored
• Storage access protocols: file-based, block-based and object-based
Summary
• Good reliability
  – Six years of operations with minimal downtime
• Good flexibility
  – Same setup for different uses and workloads
• Scales to our needs
Q&A
Thanks!