TRANSCRIPT
1
What's Next for the Net? - Grid Computing
Internet2 Member Meeting, Sept 21, 2005
Debbie [email protected]
2
Global Grid – Networking
Debbie Montano – Director, R&E Alliances, Force10 Networks
Force10 Networks – GigE / 10 GigE switch/routers
Will our networks be able to provide the high-speed access that Grid users will need and demand?
Grid - Sharing Resources:
– Computing Cycles
– Software
– Databases / Storage
– Network Bandwidth…!
3
Global Grid – Vision to Reality
Themes…
Networks WILL keep up (or catch up) with the needs of Grids
Flexible use of Bandwidth will become integral to Grids
Ethernet is key
4
Networks will support Grids
If Grids are the driving applications, the network will be there
The need is recognized for:
– robust networks
– increased bandwidth
– new network infrastructure
…to support vast amounts of data and grid collaborations
Example: SC2005 supercomputing & high performance networking conference:
– Over 55 x 10 Gbps of WAN bandwidth (roughly 550 Gbps in aggregate) is converging on Seattle
– Approx 40 x 10 GigE of bandwidth for the Bandwidth Challenge
5
TeraGrid – NSF investment
NSF is investing $150M – on top of the initial >$100M investment – to ensure access to and use of this Grid resource!
Most TeraGrid nodes use Force10 switch/routers for access to users
Credits – Graphics: N.R. Fuller, National Science Foundation. Bottom images (left to right): (1) A. Silvestri, AMANDA Project, University of California, Irvine; (2) B. Minsker, University of Illinois, Urbana-Champaign, using an MT3DMS model developed at the Army Corps of Engineers and modified by C. Zheng, University of Alabama; (3) M. Wheeler, University of Texas, Austin; J. Saltz, Ohio State University; M. Parashar, Rutgers University; (4) P. Coveney, University College London / Pittsburgh Supercomputing Center; (5) A. Chourasia, Visualization Services, San Diego Supercomputer Center and The Southern California Earthquake Center Community Modeling Environment
6
Top 500: Customer Segment
Segment      2004     2005
Industry     55.0%    52.8%
Research     22.0%    22.2%
Academic     16.0%    18.6%
Classified    3.0%     3.4%
Vendor        3.8%     2.8%
Others        0.2%     0.2%
In the top 500 supercomputers, more than half of the clusters are owned by Industry
That type of investment will drive efficient use and the necessary supporting infrastructure
About 41% of clusters are in research & academic environments.
The days of exclusive ownership and control are being replaced by sharing across disciplines, across university systems, research labs, states and even around the world
7
CERN – International Resource
CERN – International Resource; International Collaboration
Scientific partners around the world
Investing in networking:
– Announced Monday, 9/19/2005: CERN will deploy the TeraScale E-Series family of switch/routers as the foundation of its new 2.4 Terabit per second (Tbps) high performance grid computing farm
– The TeraScale E-Series will connect more than 8,000 processors and storage devices
– Also provides the first intercontinental 10 Gigabit Ethernet WAN links in a production network
8
State & Regional Investment
Networking Investment at all Layers
Regional Optical Networks (RONs) are Growing:
– States and universities are investing in their own fiber and optical infrastructure to ensure affordable growth and abundant bandwidth
– Southern Light Rail
– I-Light Indiana
– LEARN – Texas
– Louisiana Optical Networking Initiative (LONI)
Additional GigaPOP Layer 2/3 Services
Costs are continuing to go down:
– Ethernet port costs, for example, continuing to drop
– Densities for GigE and 10 GigE continuing to improve
– Lower cost technologies being used more
9
Flexibility of Bandwidth
Lots of Bandwidth but “smart” use
High Speed links dedicated to specific grids versus shared flexible use of bandwidth
Network links as a resource on the grid itself, to be shared, managed and allocated as needed
Need flexible layers above the “dedicated lambdas”
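The bullets above about treating network links as a grid resource can be made concrete with a small sketch. This is a minimal illustration only, assuming a toy model in which a broker reserves and releases bandwidth on named links for grid jobs; the Link and BandwidthBroker classes, the link name and the capacities are invented for the example.

from dataclasses import dataclass, field

@dataclass
class Link:
    name: str
    capacity_gbps: float
    allocated_gbps: float = 0.0

    def free(self) -> float:
        # Headroom still available on this link
        return self.capacity_gbps - self.allocated_gbps

@dataclass
class BandwidthBroker:
    links: dict = field(default_factory=dict)

    def add_link(self, link: Link) -> None:
        self.links[link.name] = link

    def reserve(self, name: str, gbps: float) -> bool:
        # Grant a grid job's bandwidth request only if the link has headroom
        link = self.links[name]
        if link.free() >= gbps:
            link.allocated_gbps += gbps
            return True
        return False

    def release(self, name: str, gbps: float) -> None:
        # Return the bandwidth to the shared pool when the job finishes
        link = self.links[name]
        link.allocated_gbps = max(0.0, link.allocated_gbps - gbps)

if __name__ == "__main__":
    broker = BandwidthBroker()
    broker.add_link(Link("chicago-seattle-lambda1", capacity_gbps=10.0))
    print(broker.reserve("chicago-seattle-lambda1", 6.0))  # True: 10 Gbps free
    print(broker.reserve("chicago-seattle-lambda1", 6.0))  # False: only 4 Gbps left

A real broker would add scheduling over time and signalling down to the optical layer, but the reserve/release pattern is the point: the lambda is shared rather than dedicated.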
10
New Architectures: HOPI
[Diagram: HOPI node – a packet layer (Abilene Network, Abilene core router, Force10 E600 Switch/Router, 10 GigE backbone) over an optical layer (NLR 10 GigE lambda, NLR optical terminals, optical cross connect), with control, measurement, support and out-of-band (OOB) connections to a Regional Optical Network (RON) and GigaPOPs]
11
Ethernet is Key
Local Area Network (LAN)
Metropolitan Area Network (MAN):
– Metro Ethernet
– Ethernet Aggregation
Wide Area Network (WAN):
– Carriers moving to Ethernet and IP services
– WAN PHY (Physical Interface) playing a role
All the way down to CPU-to-CPU communication in supercomputers:
– Ethernet adoption is continuing to grow
12
What Drives Grid / Cluster Topology? Four Networking Requirements
[Diagram: a 5000-node Linux “compute” cluster and its four networks – (1) the interconnect (node-to-node communication), (2) the management network, (3) I/O to storage (SAN over 2 Gigabit Fiber / FiberConnect, ~700 Mbytes/sec to ~15 TByte storage), and (4) I/O to users via the campus backbone or WAN (user directory and applications)]
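One way to restate the diagram's four requirements is as a plain data structure describing the network planes of a hypothetical cluster. This is a sketch only; the technologies and figures echo the diagram's labels and are assumptions, not a specification.

# Four network planes of a hypothetical ~5000-node cluster (illustrative only)
CLUSTER_NETWORKS = {
    "interconnect": {    # 1: node-to-node (IPC) traffic
        "technology": "GigE / 10 GigE",
        "nodes": 5000,
    },
    "management": {      # 2: out-of-band control and health monitoring
        "technology": "Ethernet",
    },
    "storage_io": {      # 3: I/O to storage (~700 Mbytes/sec to ~15 TByte arrays)
        "technology": "SAN / 2 Gigabit Fiber",
    },
    "user_io": {         # 4: I/O to users via campus backbone or WAN
        "technology": "10 GigE uplink",
    },
}

if __name__ == "__main__":
    for plane, attrs in CLUSTER_NETWORKS.items():
        print(plane, attrs)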
13
Grids / Clusters
System Interconnects:
– Node-to-node: Inter-processor Communication (IPC)
– Management Network
– I/O to users, outside world (campus, LAN, WAN)
– Storage, servers & storage subsystems
IPC Interconnect Technology – GigE now #1:
– Top 500 Supercomputers
– Ethernet Rapid Growth
– Favored in Clusters
Other System Interconnection:
– Major reliance on Ethernet
Type          2004     2005
Ethernet      35.2%    42.4%
Myrinet       38.6%    28.2%
SP Switch      9.2%     9.0%
NUMAlink       3.4%     4.2%
Crossbar       4.6%     4.2%
Proprietary      –      3.4%
Infiniband     2.2%     3.2%
Quadrics       4.0%     2.6%
Other          2.8%     2.8%
14
Interconnects – Ethernet NICs
Speedup methods:
– Stateless offload (performance improvement without breaking the I/O stack; compatible with off-the-shelf OS TCP/IP)
– TOE – TCP Offload Engine
– OS bypass / eliminate context switching
– RDMA / remote DMA / eliminate payload copying (see the sketch below)
– iWARP / combination of TOE, OS Bypass, and RDMA
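To make the payload-copy-elimination idea in the list above concrete at the host level, here is a minimal sketch using os.sendfile(), which keeps file data in the kernel instead of copying it through user-space buffers. This is not TOE, RDMA or iWARP themselves (those live in the NIC and its drivers); it only illustrates the copy-avoidance principle on a stock Linux host, and the file path and port are hypothetical.

import os
import socket

def serve_file_zero_copy(path: str, port: int = 9000) -> None:
    # Send one file to the first client using the kernel's zero-copy path
    with socket.create_server(("", port)) as srv:
        conn, _addr = srv.accept()
        with conn, open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            sent = 0
            while sent < size:
                # sendfile() avoids the read()/send() copy through user buffers
                sent += os.sendfile(conn.fileno(), f.fileno(), sent, size - sent)

if __name__ == "__main__":
    serve_file_zero_copy("/tmp/payload.bin")  # hypothetical test file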
Hot 10 GbE NIC vendors:
15
Management I/O – What Makes Sense?
Management network is ALWAYS required:
– Out-of-band, in-band, control & management
– CPU & memory utilization per node, system temperature, cooling (a minimal polling sketch follows below)
Management has to touch each node – device density is important, helping to simplify topology
If the cluster is in trouble, management network is needed to fix it – must be reliable!
With Ethernet, Management is FREE
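As a minimal sketch of what "management has to touch each node" looks like in practice, the example below polls per-node CPU load and free memory over the management network. The hostnames, the use of SSH, and the /proc fields read here are illustrative assumptions, not part of the original slides.

import subprocess

NODES = ["node001", "node002"]  # hypothetical management-network hostnames

def poll_node(host: str) -> str:
    # Gather a one-line load/memory summary over the management network
    cmd = ["ssh", host, "cat /proc/loadavg /proc/meminfo"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    loadavg = out.splitlines()[0].split()[0]  # 1-minute load average
    memfree = next(l for l in out.splitlines() if l.startswith("MemFree"))
    return f"{host}: load={loadavg}, {memfree.strip()}"

if __name__ == "__main__":
    for node in NODES:
        print(poll_node(node))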
16
User Gateway – What Makes Sense?
Ethernet is ALWAYS the user gateway:
– Dominant installed base & knowledge base
– End systems are connected via Ethernet
Ethernet advantages:
– No distance limitation – 5 microseconds per mile
– 7 Gbps over 20 km (541 GB of data in 10 min.)
– Data center or cluster core switch/router extends directly into the LAN
– Fewer devices, simplifying topology
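The two figures in the advantages list can be sanity-checked with a few lines of arithmetic, using the slide's own rule of thumb of about 5 microseconds per mile. Note that 7 Gbps sustained for 10 minutes works out to roughly 525 GB, so the quoted 541 GB corresponds to a sustained rate closer to 7.2 Gbps.

US_PER_MILE = 5.0   # the slide's rule of thumb for propagation delay in fiber
KM_PER_MILE = 1.609

def propagation_delay_us(distance_km: float) -> float:
    # One-way propagation delay at ~5 microseconds per mile
    return (distance_km / KM_PER_MILE) * US_PER_MILE

def gbytes_transferred(rate_gbps: float, seconds: float) -> float:
    # Gigabits per second sustained for a duration, converted to gigabytes
    return rate_gbps * seconds / 8.0

if __name__ == "__main__":
    print(f"20 km one-way delay: {propagation_delay_us(20):.0f} us")   # ~62 us
    print(f"10 min at 7 Gbps: {gbytes_transferred(7, 600):.0f} GB")    # ~525 GB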
17
An Example of Long Distance Sharing: NSF / DoE TeraGrid
[Diagram: an extensible backplane network linking LA and Chicago hubs over 30–40 Gb/s links to compute-intensive sites (256 and 814 nodes), a data-intensive site (128 nodes), visualization (112 nodes) and data collection/analysis (55 nodes); data sets stored at one site are moved to another for computing]
18
Role of Ethernet – Benefits
Industry Standard (IEEE)
Ubiquitous (Everywhere) and proven Technology
Standard Communication Technology when the Cluster Talks to the Rest of the World (Grid)
Does Not Suffer From Distance Limitations
Scales to 1000’s and even 10,000’s of nodes
Allows for a Single Fabric Design – Easy to Configure, Manage, and Administer for Cluster Environments (Competing Fabrics require cumbersome multi-chassis solutions & COMPLEX mapping)
53% yr/yr reduction in price per bit over 15 yrs (ref: Gartner)
Almost All Shipping Servers Include one or more 1000Base-T NICs w/ TOE
19
Global Grid – Vision to Reality
Themes…
Networks WILL keep up (or catch up) with the needs of Grids
Flexible use of Bandwidth will become integral to Grids
Ethernet is key
20
Thank You
www.force10networks.com