1
ESnet4: Enabling Next Generation Distributed Experimental and Computational Science for the US DOE’s Office of Science
Summer, 2006
William E. Johnston
ESnet Department Head and Senior Scientist
Lawrence Berkeley National Laboratory
www.es.net
2
DOE Office of Science and ESnet
• “The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, … providing more than 40 percent of total funding … for the Nation’s research programs in high-energy physics, nuclear physics, and fusion energy sciences.” (http://www.science.doe.gov)
• The large-scale science that is the mission of the Office of Science is dependent on networks for
– Sharing of massive amounts of data
– Supporting thousands of collaborators world-wide
– Distributed data processing
– Distributed simulation, visualization, and computational steering
– Distributed data management
• ESnet’s mission is to enable those aspects of science that depend on networking and on certain types of large-scale collaboration
ESnet Layer 2 Architecture Provides Global High-Speed Internet Connectivity for DOE Facilities and Collaborators (Spring 2006)

[Map: ESnet IP core (packet over SONET optical ring) and ESnet Science Data Network (SDN) core, showing core hubs, MAN rings, DOE Lab and other end user sites, commercial and R&E peering points (MAE-E, PAIX-PA, Equinix, Starlight/Chicago NAP, PNWGPoP/PacificWave, MAX GPoP), high-speed peering points with Internet2/Abilene, and international peerings: CERN via USLHCnet (CERN+DOE funded), GÉANT (France, Germany, Italy, UK, etc.), CA*net4 (Canada), SINet (Japan), GLORIAD (Russia, China), Kreonet2 (Korea), AARNet (Australia), TANet2 (Taiwan), SingAREN, and Russia (BINP)]

Link types: international (high speed); 10 Gb/s SDN core; 10 Gb/s IP core; 2.5 Gb/s IP core; MAN rings (≥ 10 Gb/s); OC12 ATM (622 Mb/s); OC12 / GigEthernet; OC3 (155 Mb/s); 45 Mb/s and less

Sites (42 end user sites): Office of Science sponsored (22), NNSA sponsored (12), joint sponsored (3), other sponsored (NSF LIGO, NOAA), laboratory sponsored (6)
Challenges Facing ESnet - A Changing Science Environment is the Key Driver of ESnet
• Large-scale collaborative science – big facilities, massive data, thousands of collaborators – is now a key element of the Office of Science (“SC”)
• SC science community is almost equally split between Labs and universities, and SC facilities have users worldwide
• Very large international (non-US) facilities (e.g. LHC, ITER) and international collaborators are now also a key element of SC science
• Distributed systems for data analysis, simulations, instrument operation, etc., are becoming common
5
Changing Science Environment → New Demands on Network
• Increased capacity
– Needed to accommodate a large and steadily increasing amount of data that must traverse the network
• High network reliability
– For interconnecting components of distributed large-scale science
• High-speed, highly reliable connectivity between Labs and US and international R&E institutions
– To support the inherently collaborative, global nature of large-scale science
• New network services to provide bandwidth guarantees
– For data transfer deadlines for remote data analysis, real-time interaction with instruments, coupled computational simulations, etc.
6
Future Network Planning by Requirements
• There are many stakeholders for ESnet
– SC supported scientists
– DOE national facilities
– SC programs
– Lab operations
– Lab general population
– Lab networking organizations
– Other R&E networking organizations
– non-DOE R&E institutions
• Requirements are determined by
– data characteristics of instruments and facilities
– examining the future process of science
– observing traffic patterns
7
Requirements from Instruments and Facilities
• Typical DOE large-scale facilities: Tevatron (FNAL), RHIC (BNL), SNS (ORNL), ALS (LBNL); supercomputer centers: NERSC, NLCF (ORNL), Blue Gene (ANL)
• This is the ‘hardware infrastructure’ of DOE science – types of requirements can be summarized as follows
– Bandwidth: quantity of data produced, requirements for timely movement
– Connectivity: geographic reach – location of instruments, facilities, and users plus network infrastructure involved (e.g. ESnet, Abilene, GEANT)
– Services: guaranteed bandwidth, traffic isolation, etc.; IP multicast
• Data rates and volumes from facilities and instruments – bandwidth, connectivity, services
– Large supercomputer centers (NERSC, NLCF)
– Large-scale science instruments (e.g. LHC, RHIC)
– Other computational and data resources (clusters, data archives, etc.)
• Some instruments have special characteristics that must be addressed (e.g. Fusion) – bandwidth, services
• Next generation of experiments and facilities, and upgrades to existing facilities – bandwidth, connectivity, services
– Addition of facilities increases bandwidth requirements
– Existing facilities generate more data as they are upgraded
– Reach of collaboration expands over time
– New capabilities require advanced services
8
Requirements from Case Studies
• Advanced Scientific Computing Research (ASCR)
– NERSC (LBNL) (supercomputer center)
– NLCF (ORNL) (supercomputer center)
– ALCF (ANL) (supercomputer center)
• Basic Energy Sciences
– Advanced Light Source
• Macromolecular Crystallography
– Chemistry/Combustion
– Spallation Neutron Source
• Biological and Environmental Research
– Bioinformatics/Genomics
– Climate Science
• Fusion Energy Sciences
– Magnetic Fusion Energy/ITER
• High Energy Physics
– LHC
• Nuclear Physics
– RHIC
• There is a high level of correlation between network requirements for large and small scale science – the only difference is bandwidth
– Meeting the requirements of the large-scale stakeholders will cover the smaller ones, provided the required services set is the same
Science Network Requirements Aggregation Summary
(for each science driver: end-to-end reliability; connectivity; end-to-end bandwidth today; end-to-end bandwidth in 5 years; traffic characteristics; network services)

• Advanced Light Source
– End-to-end reliability: -
– Connectivity: DOE sites, US universities, industry
– Bandwidth today: 1 TB/day (300 Mbps)
– Bandwidth in 5 years: 5 TB/day (1.5 Gbps)
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, PKI / Grid

• Bioinformatics
– End-to-end reliability: -
– Connectivity: DOE sites, US universities
– Bandwidth today: 625 Mbps (12.5 Gbps in two years)
– Bandwidth in 5 years: 250 Gbps
– Traffic characteristics: bulk data, remote control, point-to-multipoint
– Network services: guaranteed bandwidth, high-speed multicast

• Chemistry / Combustion
– End-to-end reliability: -
– Connectivity: DOE sites, US universities, industry
– Bandwidth today: -
– Bandwidth in 5 years: 10s of gigabits per second
– Traffic characteristics: bulk data
– Network services: guaranteed bandwidth, PKI / Grid

• Climate Science
– End-to-end reliability: -
– Connectivity: DOE sites, US universities, international
– Bandwidth today: -
– Bandwidth in 5 years: 5 PB per year (5 Gbps)
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, PKI / Grid

• High Energy Physics (LHC)
– End-to-end reliability: 99.95+% (less than 4 hrs/year)
– Connectivity: US Tier1 (DOE), US Tier2 (universities), international (Europe, Canada)
– Bandwidth today: 10 Gbps
– Bandwidth in 5 years: 60 to 80 Gbps (30-40 Gbps per US Tier1)
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, traffic isolation, PKI / Grid
Science Network Requirements Aggregation Summary (continued)

• Magnetic Fusion Energy
– End-to-end reliability: 99.999% (impossible without full redundancy)
– Connectivity: DOE sites, US universities, industry
– Bandwidth today: 200+ Mbps
– Bandwidth in 5 years: 1 Gbps
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, guaranteed QoS, deadline scheduling

• NERSC and ALCF
– End-to-end reliability: -
– Connectivity: DOE sites, US universities, international, other ASCR supercomputers
– Bandwidth today: 10 Gbps
– Bandwidth in 5 years: 20 to 40 Gbps
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, guaranteed QoS, deadline scheduling, PKI / Grid

• NLCF
– End-to-end reliability: -
– Connectivity: DOE sites, US universities, industry, international
– Bandwidth today: backbone bandwidth parity
– Bandwidth in 5 years: backbone bandwidth parity
– Traffic characteristics: bulk data

• Nuclear Physics (RHIC)
– End-to-end reliability: -
– Connectivity: DOE sites, US universities, international
– Bandwidth today: 12 Gbps
– Bandwidth in 5 years: 70 Gbps
– Traffic characteristics: bulk data
– Network services: guaranteed bandwidth, PKI / Grid

• Spallation Neutron Source
– End-to-end reliability: high (24x7 operation)
– Connectivity: DOE sites
– Bandwidth today: 640 Mbps
– Bandwidth in 5 years: 2 Gbps
– Traffic characteristics: bulk data
11
Requirements from Observing the Network
ESnet is currently transporting 600 to 700 terabytes/month, and this volume is increasing exponentially (approximately 10x every 46 months)
[Chart: ESnet Monthly Accepted Traffic, February 1990 – December 2005, in terabytes/month]
12
Footprint of SC Collaborators - Top 100 Traffic Generators
Universities and research institutes that are the top 100 ESnet users
• The top 100 data flows generate 30% of all ESnet traffic (ESnet handles about 3×10^9 flows/mo.)
• 91 of the top 100 flows are from the Labs to other institutions (shown) (CY2005 data)
13
Who Generates ESnet Traffic? ESnet Inter-Sector Traffic Summary, Nov 05

[Chart: ESnet inter-sector traffic flows among DOE sites, commercial peering points, R&E (mostly universities), and international (almost entirely R&E) sites, shown as percentages of total ingress or egress traffic; traffic coming into ESnet = green, traffic leaving ESnet = blue, traffic between ESnet sites also shown; DOE collaborator traffic, including data, dominates]

Traffic notes
• more than 90% of all traffic is Office of Science
• less than 15% is inter-Lab
• DOE is a net supplier of data because DOE facilities are used by universities and commercial entities, as well as by DOE researchers
Network Observation – Large-Scale Science Flows by Site (among other things these observations set the network footprint requirements)

[Chart: ESnet Top 20 Host-to-Host Flows by Site, Sept. 2004 to Sept. 2005, in TB/year; the top flows include BNL – RIKEN (JP), SLAC – IN2P3 (FR), SLAC – INFN Bologna (IT), SLAC – INFN Padova (IT), SLAC – Rutherford Lab (UK), FNAL – IN2P3 (FR), FNAL – UBC (CA), LIGO – CalTech, and ESnet BA MAN testing; flow categories: instrument – university, Nuclear Physics (RHIC), High Energy Physics, and test traffic]
15
Dominance of Science Traffic (1)
• Top 100 flows are increasing as a percentage of total traffic volume
• 99% to 100% of top 100 flows are science data (100% starting mid-2005)
• A small number of large-scale science users account for a significant and growing fraction of total traffic volume
[Charts: monthly volumes of the top 100 flows at 1/05, 7/05, and 1/06, each on a 2 TB/month scale]
16
Requirements from Observing the Network
• In 4 years, we can expect a 10x increase in traffic over current levels without the addition of production LHC traffic (see the projection sketch below)
– Nominal average load on busiest backbone links is ~1.5 Gbps today
– In 4 years that figure will be ~15 Gbps if current trends continue
• Measurements of this kind are science-agnostic
– It doesn’t matter who the users are, the traffic load is increasing exponentially
• Bandwidth trends drive requirement for a new network architecture
– Current architecture not scalable in a cost-efficient way
17
Requirements from Traffic Flow Characteristics
• Most of ESnet science traffic has a source or sink outside of ESnet
– Drives requirement for high-bandwidth peering
– Reliability and bandwidth requirements demand that peering be redundant
– Multiple 10 Gbps peerings today, must be able to add more flexibly and cost-effectively
• Bandwidth and service guarantees must traverse R&E peerings
– Collaboration with other R&E networks on a common framework is critical
– Seamless fabric
• Large-scale science is becoming the dominant user of the network
– Satisfying the demands of large-scale science traffic into the future will require a purpose-built, scalable architecture
– Traffic patterns are different than commodity Internet
• Since large-scale science traffic will be the dominant user, the network should be architected to serve large-scale science
18
Virtual Circuit Characteristics
• Traffic isolation and traffic engineering
– Provides for high-performance, non-standard transport mechanisms that cannot co-exist with commodity TCP-based transport
– Enables the engineering of explicit paths to meet specific requirements
• e.g. bypass congested links, using lower bandwidth, lower latency paths
• Guaranteed bandwidth [Quality of Service (QoS)]
– Addresses deadline scheduling
• Where fixed amounts of data have to reach sites on a fixed schedule, so that the processing does not fall far enough behind that it could never catch up – very important for experiment data analysis (a worked example follows this list)
• Reduces cost of handling high bandwidth data flows
– Highly capable routers are not necessary when every packet goes to the same place
– Use lower cost (factor of 5x) switches to route the packets
• End-to-end connections are required between Labs and collaborator institutions
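To make the deadline-scheduling requirement concrete, here is a minimal sketch of the underlying arithmetic; the function name, the decimal units, and the 5 TB / 6 hour example are illustrative assumptions, not parameters of any ESnet service.

```python
# Deadline scheduling in a nutshell: given a dataset size and a hard deadline,
# compute the minimum guaranteed bandwidth the circuit must provide so the
# transfer finishes on time.

def min_guaranteed_bandwidth_gbps(data_terabytes: float, hours_until_deadline: float) -> float:
    bits = data_terabytes * 1e12 * 8        # TB -> bits (decimal units, no protocol overhead)
    seconds = hours_until_deadline * 3600.0
    return bits / seconds / 1e9             # bits/s -> Gb/s

# Example: 5 TB of experiment data must reach a remote analysis site within 6 hours,
# which requires a guaranteed circuit of at least ~1.9 Gb/s.
print(f"{min_guaranteed_bandwidth_gbps(5, 6):.2f} Gb/s")
```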
19
OSCARS: Guaranteed Bandwidth VC Service For SC Science
• ESnet On-demand Secure Circuits and Advance Reservation System (OSCARS)
• To ensure compatibility, the design and implementation is done in collaboration with the other major science R&E networks and end sites
– Internet2: Bandwidth Reservation for User Work (BRUW)
• Development of common code base
– GÉANT: Bandwidth on Demand (GN2-JRA3), Performance and Allocated Capacity for End-users (SA3-PACE) and Advance Multi-domain Provisioning System (AMPS); extends to NRENs
– BNL: TeraPaths – A QoS Enabled Collaborative Data Sharing Infrastructure for Peta-scale Computing Research
– GA: Network Quality of Service for Magnetic Fusion Research
– SLAC: Internet End-to-end Performance Monitoring (IEPM)
– USN: Experimental Ultra-Scale Network Testbed for Large-Scale Science
• In its current phase this effort is being funded as a research project by the Office of Science Mathematical, Information, and Computational Sciences (MICS) Network R&D Program
• A prototype service has been deployed as a proof of concept (an illustrative sketch of a circuit reservation follows below)
– To date more than 20 accounts have been created for beta users, collaborators, and developers
– More than 100 reservation requests have been processed
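For illustration only, a guaranteed-bandwidth reservation of this kind can be thought of as a small, structured request. The sketch below is a hypothetical data structure, not the OSCARS interface; the field names, example sites, and overlap check are assumptions.

```python
# Hypothetical sketch of what a guaranteed-bandwidth circuit reservation carries.
# This is NOT the OSCARS API, only an illustration of the concept.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CircuitReservation:
    src: str                # source site or end host
    dst: str                # destination site or end host
    bandwidth_gbps: float   # guaranteed bandwidth for the virtual circuit
    start: datetime         # reservation start time
    end: datetime           # reservation end time

    def overlaps(self, other: "CircuitReservation") -> bool:
        """True if two reservations overlap in time (useful when checking link capacity)."""
        return self.start < other.end and other.start < self.end

# Example: reserve a 2 Gb/s circuit between two sites for a 6-hour transfer window.
start = datetime(2006, 7, 1, 12, 0)
req = CircuitReservation("site-a.example", "site-b.example", 2.0, start, start + timedelta(hours=6))
```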
20
ESnet’s Place in U.S. and International Science
• ESnet, Internet2/Abilene, and National Lambda Rail (NLR) provide most of the nation’s transit networking for basic science
– Abilene provides national transit networking for most of the US universities by interconnecting the regional networks (mostly via the GigaPoPs)
– ESnet provides national transit networking and ISP service for the DOE Labs
– NLR provides various science-specific and network R&D circuits
• GÉANT plays a role in Europe similar to Abilene and ESnet in the US – it interconnects the European National Research and Education Networks, to which the European R&E sites connect
• GÉANT currently carries essentially all ESnet traffic to Europe
– LHC use of LHCnet to CERN is still ramping up
21
Federated Trust Services
• Remote, multi-institutional identity authentication is critical for distributed, collaborative science in order to permit sharing computing and data resources, and other Grid services
• The science community needs PKI to formalize the existing web of trust within science collaborations and to extend that trust into cyber space (see the example below)
• The function, form, and policy of the ESnet trust services are driven entirely by the requirements of the science community and by direct input from the science community
• Managing cross-site trust agreements among many organizations is crucial for authentication in collaborative environments
– ESnet assists in negotiating and managing the cross-site, cross-organization, and international trust relationships to provide policies that are tailored to collaborative science
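As a concrete example of the kind of PKI operation such trust services enable, the sketch below verifies that a user certificate was signed by a given CA, using the Python `cryptography` package; the file names are placeholders, the example assumes an RSA-signed certificate, and it checks only the signature, not expiry, revocation, or policy.

```python
# Verify that a user certificate was issued (signed) by a trusted CA.
# Assumes RSA signatures; checks the signature only, not expiry/revocation/policy.
from cryptography import x509
from cryptography.hazmat.primitives.asymmetric import padding

with open("trusted-ca.pem", "rb") as f:      # placeholder CA certificate file
    ca_cert = x509.load_pem_x509_certificate(f.read())
with open("user-cert.pem", "rb") as f:       # placeholder user certificate file
    user_cert = x509.load_pem_x509_certificate(f.read())

ca_cert.public_key().verify(
    user_cert.signature,
    user_cert.tbs_certificate_bytes,
    padding.PKCS1v15(),
    user_cert.signature_hash_algorithm,
)
print("user certificate was signed by the CA:", user_cert.subject.rfc4514_string())
```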
22
DOEGrids CA (one of several CAs) Usage Statistics
• User Certificates: 3248
• Host & Service Certificates: 6187
• Total No. of Requests: 11918
• Total No. of Active Certificates: 4123
• Total No. of Expired Certificates: 5333
• Total No. of Certificates Issued: 9456
• ESnet SSL Server CA Certificates: 74
• FusionGRID CA certificates: 119
* Report as of Jun 19, 2006

[Chart: number of certificates or requests (0 to 12,000) over time since the production service began in June 2003; series: User Certificates, Service Certificates, Expired (+revoked) Certificates, Total Certificates Issued, Total Cert Requests]
23
Technical Approach for Next Generation ESnet
• ESnet has developed a technical approach designed to meet all known requirements
– Significantly increased capacity
– New architecture for
• Increased reliability
• Support of traffic engineering (separate IP and circuit oriented traffic)
– New circuit services for
• Dynamic bandwidth reservation
• Traffic isolation
• Traffic engineering
– Substantially increased US and international R&E network connectivity and reliability
• All of this is informed by SC program and community input
24
ESnet4 Architecture and Configuration

[Map: planned ESnet4 core showing IP core hubs, SDN hubs, possible hubs, and primary DOE Labs; nodes include Seattle, Boise, Sunnyvale, LA, San Diego, Denver, Albuquerque, Houston, Kansas City, Tulsa, Chicago, Cleveland, Atlanta, Jacksonville, Washington DC, New York, and Boston; international connections to Canada (CANARIE), Europe (GÉANT), CERN (30 Gbps), Asia-Pacific, Australia, South America (AMPATH), and GLORIAD (Russia and China); high-speed cross connects with Internet2/Abilene]

Link types: Production IP core (10 Gbps); Science Data Network (SDN) core (20-30-40 Gbps); MANs (20-60 Gbps) or backbone loops for site access; international connections

• Core networks: 40-50 Gbps in 2009, 160-400 Gbps in 2011-2012
• Fiber path is ~14,000 miles / 24,000 km