Experiences with High-Bandwidth Networks
Streaming Exa-scale Data over 100Gbps Networks
Mehmet Balman
Scientific Data Management Group, Computational Research Division
Lawrence Berkeley National Laboratory
ESG (Earth System Grid)
• Over 2,700 sites
• 25,000 users
• IPCC Fifth Assessment Report (AR5): 2 PB
• IPCC Fourth Assessment Report (AR4): 35 TB
Applications’ Perspective
• Increasing the bandwidth is not sufficient by itself; we need careful evaluation of high-bandwidth networks from the applications’ perspective.
• Data distribution for climate science
• How can scientific data movement and analysis between geographically disparate supercomputing facilities benefit from high-bandwidth networks?
Climate Data over 100Gbps
• Data volume in climate applications is increasing exponentially.
• An important challenge in managing ever-increasing data sizes in climate science is the large variance in file sizes.
• Climate simulation data consists of a mix of relatively small and large files with an irregular file size distribution in each dataset.
• Many small files
Keep the data channel full
[Figure: FTP vs. RPC message exchange. FTP: the client requests a file and the server sends it, repeated for every file. RPC: the client requests data and the server sends data.]
• Concurrent transfers
• Parallel streams
Lots-of-small-files problem! File-centric tools?
• A 100Gbps pipe is not necessarily faster than a 10Gbps pipe over the same distance: latency is still a problem
[Figure: 10Gbps pipe vs. 100Gbps pipe; the client requests a whole dataset and the server sends data.]
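The point of this slide can be made concrete with a back-of-the-envelope latency model. The sketch below is a simplification under assumed numbers (a 50 ms round-trip time and 10,000 files of 1 MB each, neither taken from the slides): a file-centric exchange pays one round trip per file, while a single dataset request pays one round trip and then streams bytes back-to-back. The function names are hypothetical.

```python
def per_file_time(sizes_bytes, rtt_s, bw_bps):
    """File-centric exchange: one request/response round trip per file,
    followed by that file's bytes on the wire."""
    return sum(rtt_s + 8 * s / bw_bps for s in sizes_bytes)


def pipelined_time(sizes_bytes, rtt_s, bw_bps):
    """Dataset request: one round trip, then all bytes sent back-to-back."""
    return rtt_s + 8 * sum(sizes_bytes) / bw_bps


def throughput_gbps(sizes_bytes, seconds):
    """Effective throughput in Gbps for moving all files in `seconds`."""
    return 8 * sum(sizes_bytes) / seconds / 1e9


if __name__ == "__main__":
    sizes = [1_000_000] * 10_000  # 10,000 files of 1 MB each (assumed)
    rtt = 0.050                   # 50 ms round-trip time (assumed)
    for bw in (10e9, 100e9):
        t_file = per_file_time(sizes, rtt, bw)
        t_pipe = pipelined_time(sizes, rtt, bw)
        print(f"{bw / 1e9:.0f}Gbps pipe: per-file {throughput_gbps(sizes, t_file):.2f} Gbps, "
              f"pipelined {throughput_gbps(sizes, t_pipe):.2f} Gbps")
```

Under these assumptions the per-file exchange stays near 0.16 Gbps on both the 10Gbps and the 100Gbps pipe, while the dataset request reaches roughly 9.9 and 94 Gbps respectively. The model deliberately ignores TCP slow start, concurrency, and disk I/O.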
Framework for the Memory-mapped Network Channel
Memory caches are logically mapped between client and server.
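One way to picture a logically mapped cache is to tag every block with enough metadata for the receiver to place it, so blocks can arrive out of order and over any stream. The header layout below (block id, file id, offset, payload length) and the names `pack_block`, `unpack_block`, and `ReceiverCache` are hypothetical illustrations, not MemzNet's actual wire format.

```python
import struct

# Hypothetical header: block id, file id, byte offset in the file, payload length.
# "!" = network byte order, no padding; Q = 8-byte and I = 4-byte unsigned ints.
HEADER = struct.Struct("!QIQI")


def pack_block(block_id, file_id, offset, payload):
    """Sender side: prefix a payload with its placement metadata."""
    return HEADER.pack(block_id, file_id, offset, len(payload)) + payload


def unpack_block(buf):
    """Receiver side: recover the metadata and the payload bytes."""
    block_id, file_id, offset, length = HEADER.unpack_from(buf, 0)
    payload = buf[HEADER.size:HEADER.size + length]
    return block_id, file_id, offset, payload


class ReceiverCache:
    """Receiver-side cache that places each block by (file id, offset),
    so blocks may arrive in any order and over any stream."""

    def __init__(self):
        self.files = {}  # file id -> bytearray being reassembled

    def place(self, buf):
        _, file_id, offset, payload = unpack_block(buf)
        data = self.files.setdefault(file_id, bytearray())
        end = offset + len(payload)
        if len(data) < end:
            data.extend(bytes(end - len(data)))  # zero-fill up to the new end
        data[offset:end] = payload


if __name__ == "__main__":
    data = b"climate-data-file!"
    blocks = [pack_block(i, 7, off, data[off:off + 4])
              for i, off in enumerate(range(0, len(data), 4))]
    cache = ReceiverCache()
    for b in reversed(blocks):  # deliver the blocks out of order
        cache.place(b)
    print(bytes(cache.files[7]))  # prints b'climate-data-file!'
```

Because each block is self-describing, the receiver never needs to know which connection delivered it, which is what lets the sender spread blocks over many streams.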
Moving climate files efficiently
Advantages of MemzNet
• Decoupling I/O and network operations
  • front-end (I/O processing)
  • back-end (networking layer)
• Not limited by the characteristics of the file sizes: an on-the-fly tar approach bundles and sends many files together
• Dynamic data channel management: the parallelism level of both network communication and I/O read/write operations can be increased or decreased without closing and reopening the data channel connection (as is done in regular FTP variants).
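The three advantages above can be illustrated together in a small producer/consumer sketch, assuming threads stand in for I/O and network workers. A front-end generator bundles many files into fixed-size blocks (tar-like, so small files share blocks), a bounded queue decouples it from the back-end, and the back-end worker pool can grow or shrink while the queue, standing in for the open data channel, stays in place. `BLOCK_SIZE`, `bundle`, `SenderPool`, and `transfer` are hypothetical names; the 8-byte block size is only for readability.

```python
import queue
import threading

BLOCK_SIZE = 8  # bytes per block; tiny so the example is easy to follow (assumed)


def bundle(files, block_size=BLOCK_SIZE):
    """Front-end (I/O side): pack (name, data) files into fixed-size blocks.

    Each block is a list of (name, offset, chunk) pieces, so several small
    files can share one block and a large file can span many blocks.
    Zero-length files produce no pieces in this sketch.
    """
    block, used = [], 0
    for name, data in files:
        pos = 0
        while pos < len(data):
            take = min(block_size - used, len(data) - pos)
            block.append((name, pos, data[pos:pos + take]))
            used += take
            pos += take
            if used == block_size:
                yield block
                block, used = [], 0
    if block:
        yield block  # final, partially filled block


class SenderPool:
    """Back-end (network side): worker threads drain the block queue.

    Workers can be added or retired while the queue stays in place, standing
    in for changing parallelism without reopening the data channel.
    """

    def __init__(self, blocks, send):
        self.blocks = blocks
        self.send = send
        self.workers = []  # list of (stop_event, thread)

    def _run(self, stop):
        while not stop.is_set():
            try:
                block = self.blocks.get(timeout=0.05)
            except queue.Empty:
                continue
            try:
                self.send(block)
            finally:
                self.blocks.task_done()

    def grow(self, n=1):
        for _ in range(n):
            stop = threading.Event()
            t = threading.Thread(target=self._run, args=(stop,), daemon=True)
            self.workers.append((stop, t))
            t.start()

    def shrink(self, n=1):
        for _ in range(min(n, len(self.workers))):
            stop, t = self.workers.pop()
            stop.set()
            t.join()


def transfer(files, send, workers=1, grow_to=3, grow_after_blocks=2):
    """Run front-end and back-end concurrently, raising the worker count
    part-way through the transfer (an illustrative policy)."""
    blocks = queue.Queue(maxsize=4)  # bounded: front-end waits if back-end lags
    pool = SenderPool(blocks, send)
    pool.grow(workers)
    for i, block in enumerate(bundle(files)):
        if i == grow_after_blocks and grow_to > len(pool.workers):
            pool.grow(grow_to - len(pool.workers))
        blocks.put(block)
    blocks.join()                   # every block has been passed to send()
    pool.shrink(len(pool.workers))  # retire all workers
```

For example, bundling a 3-byte, a 20-byte, and a 5-byte file yields four 8-byte-or-smaller blocks, the first of which carries pieces of two different files; the receiver can reassemble each file from the (name, offset) tags regardless of which worker sent which block.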
ANI 100Gbps testbed
[Figure: ANI Middleware Testbed diagram (updated December 11, 2011). NERSC: nersc-diskpt-1, nersc-diskpt-2, nersc-diskpt-3, nersc-app, nersc-C2940 switch, nersc-asw1, site router nersc-mr2. ANL: anl-mempt-1, anl-mempt-2, anl-mempt-3, anl-app, anl-C2940 switch, anl-asw1, ANL site router. The two ANI 100G routers are joined by the ANI 100G network; link labels include 4x10GE (MM), 10GE (MM), 10G, 1 GE, and 100G, with host interfaces eth0 and eth2-5 and connections to ESnet.]
Host NICs:
• nersc-diskpt-1: two 2x10G Myricom, one 4x10G HotLava
• nersc-diskpt-2: one 2x10G Myricom, one 2x10G Chelsio, one 6x10G HotLava
• nersc-diskpt-3: one 2x10G Myricom, one 2x10G Mellanox, one 6x10G HotLava
• anl-mempt-1: two 2x10G Myricom
• anl-mempt-2: two 2x10G Myricom
• anl-mempt-3: one 2x10G Myricom, one 2x10G Mellanox
Note: The ANI 100G routers and 100G wave are available until summer 2012; testbed resources after that are subject to funding availability.
SC11 100Gbps demo
Disadvantage of many TCP Streams
(a) Total throughput vs. the number of concurrent memory-to-memory transfers; (b) interface traffic in packets per second (blue) and bytes per second over a single NIC with different numbers of concurrent transfers. Three hosts, each with 4 available NICs, and a total of 10 10Gbps NIC pairs were used to saturate the 100Gbps pipe in the ANI Testbed. Ten data movement jobs, each corresponding to a NIC pair, were started simultaneously at source and destination. Each peak represents a different test; 1, 2, 4, 8, 16, 32, or 64 concurrent streams per job were initiated for 5-minute intervals (e.g., at concurrency level 4 there are 40 streams in total).
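The stream counts in this experiment follow directly from the setup: one job per NIC pair and c streams per job. The helper below reproduces that arithmetic and the ideal per-stream share of a 10Gbps NIC, assuming streams split a NIC's capacity evenly. That even split is an idealization; the measurements are what show how real streams behave. The helper names are hypothetical.

```python
NIC_PAIRS = 10                     # 10Gbps NIC pairs used in the test
NIC_GBPS = 10                      # capacity of one NIC
LEVELS = (1, 2, 4, 8, 16, 32, 64)  # concurrent streams per job


def total_streams(concurrency, nic_pairs=NIC_PAIRS):
    """One data movement job per NIC pair, each running `concurrency` streams."""
    return nic_pairs * concurrency


def fair_share_gbps(concurrency, nic_gbps=NIC_GBPS):
    """Ideal per-stream rate if streams on one NIC shared it evenly (assumption)."""
    return nic_gbps / concurrency


if __name__ == "__main__":
    print(f"aggregate capacity: {NIC_PAIRS * NIC_GBPS} Gbps")
    for c in LEVELS:
        print(f"concurrency {c:2d}: {total_streams(c):3d} streams, "
              f"{fair_share_gbps(c):.3f} Gbps per stream")
```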
ANI testbed 100Gbps (10x10G NICs, three hosts): interrupts/CPU vs. the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs, 5-minute intervals]; TCP buffer size is 50 MB.
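The 50 MB TCP buffer can be related to the bandwidth-delay product: to keep a single TCP stream at rate B over a path with round-trip time RTT, the window (and hence the socket buffer) must hold about B × RTT / 8 bytes. The testbed's RTT is not given on these slides; the 40 ms below is an assumed value, chosen because 10 Gbps × 40 ms is exactly 50 MB (taking MB as 10^6 bytes, also an assumption). The function names are hypothetical.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bytes in flight needed to sustain rate_bps over a path with RTT rtt_s."""
    return rate_bps * rtt_s / 8


def max_rate_bps(buffer_bytes, rtt_s):
    """Upper bound on one stream's rate when its window is capped at buffer_bytes."""
    return 8 * buffer_bytes / rtt_s


if __name__ == "__main__":
    buf = 50e6    # 50 MB TCP buffer, as in the experiments (decimal MB assumed)
    rtt = 0.040   # 40 ms round-trip time (assumed, not from the slides)
    print(f"BDP of 10 Gbps x {rtt * 1e3:.0f} ms: {bdp_bytes(10e9, rtt) / 1e6:.1f} MB")
    print(f"50 MB buffer caps one stream at {max_rate_bps(buf, rtt) / 1e9:.1f} Gbps")
```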
Effects of many streams
MemzNet’s Performance
TCP buffer size is set to 50MB
[Figure: MemzNet vs. GridFTP performance; panels: SC11 demo, ANI Testbed.]
MemzNet’s Architecture for data streaming
Acknowledgements
Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, Brian L. Tierney, Peter Nugent, Zarija Lukic, Patrick Dorn, Evangelos Chaniotakis, John Christman, Chin Guok, Chris Tracy, Lauren Rotman, Jason Lee, Shane Canon, Tina Declerck, Cary Whitney, Ed Holohan, Adam Scovel, Linda Winkler, Jason Hill, Doug Fuller, Susan Hicks, Hank Childs, Mark Howison, Aaron Thomas, John Dugan, Gopal Vaswani