
Page 1: DOE Office of Science and ESnet

1

ESnet4: Enabling Next Generation

Distributed Experimental and Computational Science for the US DOE’s Office of Science

Summer, 2006

William E. Johnston

ESnet Department Head and Senior Scientist

Lawrence Berkeley National Laboratory

www.es.net

Page 2: DOE Office of Science and ESnet

2

DOE Office of Science and ESnet

• “The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, … providing more than 40 percent of total funding … for the Nation’s research programs in high-energy physics, nuclear physics, and fusion energy sciences.” (http://www.science.doe.gov)

• The large-scale science that is the mission of the Office of Science is dependent on networks for

– Sharing of massive amounts of data

– Supporting thousands of collaborators world-wide

– Distributed data processing

– Distributed simulation, visualization, and computational steering

– Distributed data management

• ESnet’s mission is to enable those aspects of science that depend on networking and on certain types of large-scale collaboration

Page 3: DOE Office of Science and ESnet

[Map: ESnet Layer 2 Architecture Provides Global High-Speed Internet Connectivity for DOE Facilities and Collaborators (spring, 2006)

• 42 end user sites: Office of Science sponsored (22), NNSA sponsored (12), joint sponsored (3), laboratory sponsored (6), other sponsored (NSF LIGO, NOAA)
• ESnet IP core: packet-over-SONET optical ring and hubs (SEA, SNV, ELP, ALB, CHI, NYC, DC, ATL), plus a separate ESnet Science Data Network (SDN) core
• Link legend: international (high speed), 10 Gb/s SDN core, 10 Gb/s IP core, 2.5 Gb/s IP core, MAN rings (≥ 10 Gb/s), OC12 ATM (622 Mb/s), OC12 / GigEthernet, OC3 (155 Mb/s), 45 Mb/s and less
• Commercial and R&E peering points: MAE-E, PAIX-PA, Equinix, Starlight/Chicago NAP, MAX GPoP, PNWG PoP/PacificWave, and high-speed peering points with Internet2/Abilene
• International connectivity: CERN (USLHCnet, CERN+DOE funded), GÉANT (France, Germany, Italy, UK, etc.), France, Japan (SINet), Russia (BINP), Canada (CA*net4), GLORIAD (Russia, China), Korea (KREONET2), Australia (AARNet), Taiwan (TANet2, ASCC), SingAREN, MREN, Netherlands, StarTap
• Sites shown include TWC, SNLL, Yucca Mt., Bechtel-NV, PNNL, LIGO, INEEL, LANL, SNLA, AlliedSignal, Pantex, ARM, KCP, NOAA, OSTI, ORAU, SRS, JLab, PPPL, Lab DC offices, MIT, ANL, BNL, FNAL, Ames, NREL, LLNL, GA, DOE-ALB, OSC, GTN (NNSA), JGI, LBNL, SLAC, NERSC, SDSC, ORNL, and NASA Ames]

Page 4: DOE Office of Science and ESnet

Challenges Facing ESnet – A Changing Science Environment is the Key Driver of ESnet

• Large-scale collaborative science – big facilities, massive data, thousands of collaborators – is now a key element of the Office of Science (“SC”)

• SC science community is almost equally split between Labs and universities, and SC facilities have users worldwide

• Very large international (non-US) facilities (e.g. LHC, ITER) and international collaborators are now also a key element of SC science

• Distributed systems for data analysis, simulations, instrument operation, etc., are becoming common

Page 5: DOE Office of Science and ESnet

5

Changing Science Environment ⇒ New Demands on Network

• Increased capacity
– Needed to accommodate a large and steadily increasing amount of data that must traverse the network

• High network reliability
– For interconnecting components of distributed large-scale science

• High-speed, highly reliable connectivity between Labs and US and international R&E institutions
– To support the inherently collaborative, global nature of large-scale science

• New network services to provide bandwidth guarantees
– For data transfer deadlines for remote data analysis, real-time interaction with instruments, coupled computational simulations, etc.

Page 6: DOE Office of Science and ESnet

6

Future Network Planning by Requirements

• There are many stakeholders for ESnet

– SC supported scientists

– DOE national facilities

– SC programs

– Lab operations

– Lab general population

– Lab networking organizations

– Other R&E networking organizations

– non-DOE R&E institutions

• Requirements are determined by

– data characteristics of instruments and facilities

– examining the future process of science

– observing traffic patterns

Page 7: DOE Office of Science and ESnet

7

Requirements from Instruments and Facilities

• Typical DOE large-scale facilities: Tevatron (FNAL), RHIC (BNL), SNS (ORNL), ALS (LBNL), and supercomputer centers: NERSC, NLCF (ORNL), Blue Gene (ANL)

• This is the ‘hardware infrastructure’ of DOE science – types of requirements can be summarized as follows

– Bandwidth: Quantity of data produced, requirements for timely movement
– Connectivity: Geographic reach – location of instruments, facilities, and users plus network infrastructure involved (e.g. ESnet, Abilene, GEANT)
– Services: Guaranteed bandwidth, traffic isolation, etc.; IP multicast

• Data rates and volumes from facilities and instruments – bandwidth, connectivity, services
– Large supercomputer centers (NERSC, NLCF)
– Large-scale science instruments (e.g. LHC, RHIC)
– Other computational and data resources (clusters, data archives, etc.)

• Some instruments have special characteristics that must be addressed (e.g. Fusion) – bandwidth, services

• Next generation of experiments and facilities, and upgrades to existing facilities – bandwidth, connectivity, services
– Addition of facilities increases bandwidth requirements
– Existing facilities generate more data as they are upgraded
– Reach of collaboration expands over time
– New capabilities require advanced services

Page 8: DOE Office of Science and ESnet

8

Requirements from Case Studies

• Advanced Scientific Computing Research (ASCR)
– NERSC (LBNL) (supercomputer center)
– NLCF (ORNL) (supercomputer center)
– ALCF (ANL) (supercomputer center)

• Basic Energy Sciences
– Advanced Light Source
• Macromolecular Crystallography
– Chemistry/Combustion
– Spallation Neutron Source

• Biological and Environmental Research
– Bioinformatics/Genomics
– Climate Science

• Fusion Energy Sciences
– Magnetic Fusion Energy/ITER

• High Energy Physics
– LHC

• Nuclear Physics
– RHIC

• There is a high level of correlation between network requirements for large and small scale science – the only difference is bandwidth

– Meeting the requirements of the large-scale stakeholders will cover the smaller ones, provided the required services set is the same

Page 9: DOE Office of Science and ESnet

Science Network Requirements Aggregation Summary
(columns: science drivers – science areas / facilities; end-to-end reliability; connectivity; end-to-end bandwidth today; end-to-end bandwidth in 5 years; traffic characteristics; network services)

Advanced Light Source
– Reliability: -
– Connectivity: DOE sites, US universities, industry
– Bandwidth today: 1 TB/day, 300 Mbps
– Bandwidth in 5 years: 5 TB/day, 1.5 Gbps
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, PKI / Grid

Bioinformatics
– Reliability: -
– Connectivity: DOE sites, US universities
– Bandwidth today: 625 Mbps, 12.5 Gbps in two years
– Bandwidth in 5 years: 250 Gbps
– Traffic characteristics: bulk data, remote control, point-to-multipoint
– Network services: guaranteed bandwidth, high-speed multicast

Chemistry / Combustion
– Reliability: -
– Connectivity: DOE sites, US universities, industry
– Bandwidth today: -
– Bandwidth in 5 years: 10s of gigabits per second
– Traffic characteristics: bulk data
– Network services: guaranteed bandwidth, PKI / Grid

Climate Science
– Reliability: -
– Connectivity: DOE sites, US universities, international
– Bandwidth today: -
– Bandwidth in 5 years: 5 PB per year, 5 Gbps
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, PKI / Grid

High Energy Physics (LHC)
– Reliability: 99.95+% (less than 4 hrs/year)
– Connectivity: US Tier1 (DOE), US Tier2 (universities), international (Europe, Canada)
– Bandwidth today: 10 Gbps
– Bandwidth in 5 years: 60 to 80 Gbps (30-40 Gbps per US Tier1)
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, traffic isolation, PKI / Grid
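To make the bandwidth figures above easier to interpret, here is a minimal back-of-the-envelope sketch (function and variable names are illustrative) that converts a data volume into the sustained rate needed to move it; the rates stated in the table sit above these averages to allow for bursts and headroom.

```python
def sustained_rate_gbps(total_bytes, seconds):
    """Average rate in Gbps needed to move total_bytes within the given time."""
    return total_bytes * 8 / seconds / 1e9

DAY = 86_400        # seconds per day
YEAR = 365 * DAY    # seconds per year

# ALS today: 1 TB/day is ~0.09 Gbps sustained (table allows 300 Mbps)
print(round(sustained_rate_gbps(1e12, DAY), 3))
# ALS in 5 years: 5 TB/day is ~0.46 Gbps sustained (table allows 1.5 Gbps)
print(round(sustained_rate_gbps(5e12, DAY), 3))
# Climate in 5 years: 5 PB/year is ~1.3 Gbps sustained (table allows 5 Gbps)
print(round(sustained_rate_gbps(5e15, YEAR), 3))
```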

Page 10: DOE Office of Science and ESnet

Science Network Requirements Aggregation Summary (continued)

Magnetic Fusion Energy
– Reliability: 99.999% (impossible without full redundancy)
– Connectivity: DOE sites, US universities, industry
– Bandwidth today: 200+ Mbps
– Bandwidth in 5 years: 1 Gbps
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, guaranteed QoS, deadline scheduling

NERSC and ALCF
– Reliability: -
– Connectivity: DOE sites, US universities, international, other ASCR supercomputers
– Bandwidth today: 10 Gbps
– Bandwidth in 5 years: 20 to 40 Gbps
– Traffic characteristics: bulk data, remote control
– Network services: guaranteed bandwidth, guaranteed QoS, deadline scheduling, PKI / Grid

NLCF
– Reliability: -
– Connectivity: DOE sites, US universities, industry, international
– Bandwidth today: backbone bandwidth parity
– Bandwidth in 5 years: backbone bandwidth parity
– Traffic characteristics: bulk data

Nuclear Physics (RHIC)
– Reliability: -
– Connectivity: DOE sites, US universities, international
– Bandwidth today: 12 Gbps
– Bandwidth in 5 years: 70 Gbps
– Traffic characteristics: bulk data
– Network services: guaranteed bandwidth, PKI / Grid

Spallation Neutron Source
– Reliability: high (24x7 operation)
– Connectivity: DOE sites
– Bandwidth today: 640 Mbps
– Bandwidth in 5 years: 2 Gbps
– Traffic characteristics: bulk data

Page 11: DOE Office of Science and ESnet

11

Requirements from Observing the Network

ESnet is currently transporting 600 to 700 terabytes/month, and this volume is increasing exponentially (approximately 10x every 46 months).

[Chart: ESnet Monthly Accepted Traffic, February 1990 – December 2005; y-axis: terabytes/month]
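The growth figure quoted above can be turned into a rough projection. A minimal sketch, assuming the exponential trend simply continues (the function name and the 650 TB/month starting point are illustrative, taken from the 600-700 TB/month range above):

```python
import math

def projected_tb_per_month(current_tb, months_ahead,
                           growth_factor=10.0, growth_period_months=46):
    """Extrapolate monthly traffic volume assuming steady exponential growth."""
    return current_tb * growth_factor ** (months_ahead / growth_period_months)

# ~650 TB/month today projected 48 months out: roughly 7,000 TB/month (~10x)
print(round(projected_tb_per_month(650, 48)))

# Implied doubling time of the traffic volume, in months (~14)
print(round(46 * math.log(2) / math.log(10), 1))
```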

Page 12: DOE Office of Science and ESnet

12

Footprint of SC Collaborators - Top 100 Traffic Generators

Universities and research institutes that are the top 100 ESnet users
• The top 100 data flows generate 30% of all ESnet traffic (ESnet handles about 3×10^9 flows/mo.)
• 91 of the top 100 flows are from the Labs to other institutions (shown) (CY2005 data)

Page 13: DOE Office of Science and ESnet

13

Who Generates ESnet Traffic? ESnet Inter-Sector Traffic Summary, Nov 05

[Figure: inter-sector traffic diagram. Traffic coming into ESnet = green; traffic leaving ESnet = blue; traffic between ESnet sites; % = share of total ingress or egress traffic. Sectors shown: DOE sites, commercial, R&E (mostly universities), international (almost entirely R&E sites), and peering points; DOE collaborator traffic, inc. data, dominates the flows.]

Traffic notes
• more than 90% of all traffic is Office of Science
• less than 15% is inter-Lab

DOE is a net supplier of data because DOE facilities are used by universities and commercial entities, as well as by DOE researchers

Page 14: DOE Office of Science and ESnet

Network Observation – Large-Scale Science Flows by Site
(Among other things these observations set the network footprint requirements)

[Chart: ESnet Top 20 Host-to-Host Flows by Site, Sept. 2004 to Sept. 2005; y-axis: TB/year. Flow categories: instrument – university, Nuclear Physics (RHIC), High Energy Physics, and test traffic. Site pairs in the top 20 include LIGO – CalTech, BNL – RIKEN (JP), SLAC – IN2P3 (FR), SLAC – INFN Bologna (IT), SLAC – INFN Padova (IT), SLAC – Rutherford Lab (UK), FNAL – IN2P3 (FR), FNAL – UBC (CA), INFN Padova (IT) – SLAC, and ESnet BAMAN testing.]

Page 15: DOE Office of Science and ESnet

15

Dominance of Science Traffic (1)

• Top 100 flows are increasing as a percentage of total traffic volume

• 99% to 100% of top 100 flows are science data (100% starting mid-2005)

• A small number of large-scale science users account for a significant and growing fraction of total traffic volume

[Chart: volumes of the largest flows over time; 2 TB/month reference level; time points 1/05, 7/05, 1/06]

Page 16: DOE Office of Science and ESnet

16

Requirements from Observing the Network

• In 4 years, we can expect a 10x increase in traffic over current levels without the addition of production LHC traffic
– Nominal average load on busiest backbone links is ~1.5 Gbps today
– In 4 years that figure will be ~15 Gbps if current trends continue

• Measurements of this kind are science-agnostic
– It doesn’t matter who the users are, the traffic load is increasing exponentially

• Bandwidth trends drive requirement for a new network architecture
– Current architecture not scalable in a cost-efficient way

Page 17: DOE Office of Science and ESnet

17

Requirements from Traffic Flow Characteristics

• Most of ESnet science traffic has a source or sink outside of ESnet
– Drives requirement for high-bandwidth peering
– Reliability and bandwidth requirements demand that peering be redundant
– Multiple 10 Gbps peerings today, must be able to add more flexibly and cost-effectively

• Bandwidth and service guarantees must traverse R&E peerings
– Collaboration with other R&E networks on a common framework is critical
– Seamless fabric

• Large-scale science is becoming the dominant user of the network
– Satisfying the demands of large-scale science traffic into the future will require a purpose-built, scalable architecture
– Traffic patterns are different than commodity Internet

• Since large-scale science traffic will be the dominant user, the network should be architected to serve large-scale science

Page 18: DOE Office of Science and ESnet

18

Virtual Circuit Characteristics

• Traffic isolation and traffic engineering
– Provides for high-performance, non-standard transport mechanisms that cannot co-exist with commodity TCP-based transport
– Enables the engineering of explicit paths to meet specific requirements
• e.g. bypass congested links, using lower bandwidth, lower latency paths

• Guaranteed bandwidth [Quality of Service (QoS)]
– Addresses deadline scheduling
• Where fixed amounts of data have to reach sites on a fixed schedule, so that the processing does not fall far enough behind that it could never catch up – very important for experiment data analysis (see the sketch after this list)

• Reduces cost of handling high bandwidth data flows
– Highly capable routers are not necessary when every packet goes to the same place
– Use lower-cost (factor of 5x) switches to route the packets

• End-to-end connections are required between Labs and collaborator institutions
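As an illustration of the deadline-scheduling case above, a minimal sketch of the guaranteed rate a reservation would need so that a fixed amount of data lands before its deadline; the data size, deadline, and efficiency factor are hypothetical examples, not values from this presentation.

```python
def required_gbps(data_tb, hours_to_deadline, efficiency=0.8):
    """Guaranteed rate in Gbps needed to deliver data_tb before the deadline,
    derated by an assumed end-to-end transfer efficiency."""
    bits = data_tb * 1e12 * 8
    seconds = hours_to_deadline * 3600
    return bits / seconds / 1e9 / efficiency

# e.g. 30 TB of experiment data that must reach the analysis site within 12 hours
print(round(required_gbps(30, 12), 2))   # ~6.94 Gbps guaranteed
```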

Page 19: DOE Office of Science and ESnet

19

OSCARS: Guaranteed Bandwidth VC Service For SC Science

• ESnet On-Demand Secure Circuits and Advance Reservation System (OSCARS)

• To ensure compatibility, the design and implementation is done in collaboration with the other major science R&E networks and end sites
– Internet2: Bandwidth Reservation for User Work (BRUW)
• Development of common code base
– GEANT: Bandwidth on Demand (GN2-JRA3), Performance and Allocated Capacity for End-users (SA3-PACE), and Advance Multi-domain Provisioning System (AMPS); extends to NRENs
– BNL: TeraPaths – A QoS Enabled Collaborative Data Sharing Infrastructure for Peta-scale Computing Research
– GA: Network Quality of Service for Magnetic Fusion Research
– SLAC: Internet End-to-end Performance Monitoring (IEPM)
– USN: Experimental Ultra-Scale Network Testbed for Large-Scale Science

• In its current phase this effort is being funded as a research project by the Office of Science Mathematical, Information, and Computational Sciences (MICS) Network R&D Program

• A prototype service has been deployed as a proof of concept
– To date more than 20 accounts have been created for beta users, collaborators, and developers
– More than 100 reservation requests have been processed
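To make the idea of a reservation concrete, here is a purely hypothetical sketch of the information a guaranteed-bandwidth virtual circuit request carries; this is not the actual OSCARS interface, and every field and host name below is invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CircuitReservation:
    """Hypothetical record of a guaranteed-bandwidth virtual circuit request."""
    source: str           # ingress site or host
    destination: str      # egress site or host
    bandwidth_mbps: int   # guaranteed rate for the reservation window
    start: datetime       # window start
    end: datetime         # window end
    description: str = ""

# Illustrative request: a 2 Gbps circuit for an overnight bulk transfer
request = CircuitReservation(
    source="lab-storage.example",
    destination="university-cluster.example",
    bandwidth_mbps=2000,
    start=datetime(2006, 7, 10, 20, 0),
    end=datetime(2006, 7, 11, 6, 0),
    description="Overnight bulk data transfer (example)",
)
print(request)
```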

Page 20: DOE Office of Science and ESnet

20

ESnet’s Place in U. S. and International Science

• ESnet, Internet2/Abilene, and National Lambda Rail (NLR) provide most of the nation’s transit networking for basic science
– Abilene provides national transit networking for most of the US universities by interconnecting the regional networks (mostly via the GigaPoPs)
– ESnet provides national transit networking and ISP service for the DOE Labs
– NLR provides various science-specific and network R&D circuits

• GÉANT plays a role in Europe similar to Abilene and ESnet in the US – it interconnects the European National Research and Education Networks, to which the European R&E sites connect

• GÉANT currently carries essentially all ESnet traffic to Europe
– LHC use of LHCnet to CERN is still ramping up

Page 21: DOE Office of Science and ESnet

21

Federated Trust Services

• Remote, multi-institutional, identity authentication is critical for distributed, collaborative science in order to permit sharing computing and data resources, and other Grid services

• The science community needs PKI to formalize the existing web of trust within science collaborations and to extend that trust into cyber space

• The function, form, and policy of the ESnet trust services are driven entirely by the requirements of the science community and by direct input from the science community

• Managing cross site trust agreements among many organizations is crucial for authentication in collaborative environments
– ESnet assists in negotiating and managing the cross-site, cross-organization, and international trust relationships to provide policies that are tailored to collaborative science

Page 22: DOE Office of Science and ESnet

22

DOEGrids CA (one of several CAs) Usage Statistics

User certificates: 3248 | Total no. of active certificates: 4123
Host & service certificates: 6187 | Total no. of expired certificates: 5333
Total no. of requests: 11918 | Total no. of certificates issued: 9456
ESnet SSL Server CA certificates: 74
FusionGRID CA certificates: 119
* Report as of Jun 19, 2006

Production service began in June 2003

[Chart: number of certificates or requests over time – user certificates, service certificates, expired (+revoked) certificates, total certificates issued, total certificate requests]
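As a small aside on how a Grid user or service might inspect one of these certificates locally, a minimal sketch using the Python "cryptography" package; the file path is a placeholder and this is not part of any ESnet or DOEGrids tooling.

```python
from datetime import datetime
from cryptography import x509

# Load a PEM-encoded certificate (path is a placeholder)
with open("usercert.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

print("Subject:", cert.subject.rfc4514_string())
print("Issuer: ", cert.issuer.rfc4514_string())
print("Valid:  ", cert.not_valid_before, "to", cert.not_valid_after)
print("Expired:", cert.not_valid_after < datetime.utcnow())
```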

Page 23: DOE Office of Science and ESnet

23

Technical Approach for Next Generation ESnet

• ESnet has developed a technical approach designed to meet all known requirements

– Significantly increased capacity

– New architecture for
• Increased reliability
• Support of traffic engineering (separate IP and circuit oriented traffic)

– New circuit services for
• Dynamic bandwidth reservation

• Traffic isolation

• Traffic engineering

– Substantially increased US and international R&E network connectivity and reliability

• All of this is informed by SC program and community input

Page 24: DOE Office of Science and ESnet

24

ESnet4 Architecture and Configuration

[Map: planned ESnet4 topology. Legend: IP core hubs, SDN hubs, primary DOE Labs, possible hubs; Production IP core (10 Gbps); SDN core (20-30-40 Gbps); MANs (20-60 Gbps) or backbone loops for site access; international connections; high-speed cross connects with Internet2/Abilene. Separate Science Data Network core and IP core. Hubs and cities shown include Seattle, Boise, Sunnyvale, LA, San Diego, Denver, Albuquerque, Houston, Kansas City, Tulsa, Chicago, Cleveland, Boston, New York, Washington DC, Atlanta, and Jacksonville. International connections: CERN (30 Gbps), Europe (GEANT), Canada (CANARIE), Asia-Pacific, Australia, South America (AMPATH), GLORIAD (Russia and China).]

• Core networks: 40-50 Gbps in 2009, 160-400 Gbps in 2011-2012
• Fiber path is ~14,000 miles / 24,000 km