SC|05 Bandwidth Challenge - ESCC Meeting, 9th February 2006 - Yee-Ting Li, Stanford Linear Accelerator Center


Page 1: SC|05 Bandwidth Challenge
ESCC Meeting, 9th February 2006
Yee-Ting Li, Stanford Linear Accelerator Center

Page 2: (no recoverable text on this slide)

Page 3: LHC Network Requirements
[Figure: the LHC tiered computing model. Labels include: Experiment and Online System (~PByte/sec); CERN Center, Tier 0+1 (PBs of disk, tape robot), fed at ~150-1500 MBytes/sec; Tier 1 centers (FNAL, IN2P3, INFN, RAL) on ~10 Gbps links; Tier 2 centers at ~1-10 Gbps; institutes (Tier 3) and workstations (Tier 4) at 1 to 10 Gbps; physics data cache; 10-40 Gbps inter-center links. CERN/Outside resource ratio ~1:2; Tier0:(Tier1):(Tier2) ~1:1:1. Tens of petabytes by 2007-08; an exabyte ~5-7 years later.]

Page 4: Overview
Bandwidth Challenge: 'The Bandwidth Challenge highlights the best and brightest in new techniques for creating and utilizing vast rivers of data that can be carried across advanced networks.'
The task: transfer as much data as possible using real applications over a 2-hour window.
We did: Distributed TeraByte Particle Physics Data Sample Analysis - 'Demonstrated high speed transfers of particle physics data between host labs and collaborating institutes in the USA and worldwide. Using state of the art WAN infrastructure and Grid Web Services based on the LHC Tiered Architecture, they showed real-time particle event analysis requiring transfers of Terabyte-scale datasets.'

Page 5: Overview
In detail, during the bandwidth challenge (2 hours):
- 131 Gbps measured by the SCInet BWC team on 17 of our waves (15-minute average)
- 95.37 TB of data transferred (3.8 DVDs per second); see the arithmetic check after this list
- 90-150 Gbps throughout, with a peak of 150.7 Gbps
On the day of the challenge:
- Transferred ~475 TB 'practising' (waves were shared, still tuning applications and hardware)
- Peak one-way USN utilisation observed on a single link was 9.1 Gbps (Caltech) and 8.4 Gbps (SLAC)
- Also wrote to StorCloud: SLAC wrote 3.2 TB in 1,649 files during the BWC; Caltech sustained 6 GB/sec with 20 nodes
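A quick arithmetic check of these figures, assuming decimal units (1 TB = 10^12 bytes) and 4.7 GB per single-layer DVD; the exact DVD-per-second number depends on which conventions are assumed:

```python
# Rough check of the Bandwidth Challenge figures quoted above.
# Assumes decimal units (1 TB = 1e12 bytes) and a 4.7 GB single-layer DVD;
# the slide does not state which conventions were used.

total_bytes = 95.37e12        # 95.37 TB transferred
window_s = 2 * 3600           # 2-hour challenge window

avg_gbps = total_bytes * 8 / window_s / 1e9
print(f"Average rate over the window: {avg_gbps:.0f} Gbps")   # ~106 Gbps, within the 90-150 range

measured_gbps = 131           # SCInet 15-minute average on 17 waves
dvd_bytes = 4.7e9
dvds_per_s = measured_gbps * 1e9 / 8 / dvd_bytes
print(f"DVD equivalents at {measured_gbps} Gbps: {dvds_per_s:.1f} per second")  # ~3.5
```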

Page 6: Participants
Caltech/HEP/CACR/NetLab: Harvey Newman, Julian Bunn (contact), Dan Nae, Sylvain Ravot, Conrad Steenberg, Yang Xia, Michael Thomas
SLAC/IEPM: Les Cottrell, Gary Buhrmaster, Yee-Ting Li, Connie Logg
FNAL: Matt Crawford, Don Petravick, Vyto Grigaliunas, Dan Yocum
University of Michigan: Shawn McKee, Andy Adamson, Roy Hockett, Bob Ball, Richard French, Dean Hildebrand, Erik Hofer, David Lee, Ali Lotia, Ted Hanss, Scott Gerstenberger
U Florida: Paul Avery, Dimitri Bourilkov
University of Manchester: Richard Hughes-Jones
CERN, Switzerland: David Foster
KAIST, Korea: Yusung Kim
Kyungpook University, Korea: Kihwan Kwon
UERJ, Brazil: Alberto Santoro
UNESP, Brazil: Sergio Novaes
USP, Brazil: Luis Fernandez Lopez
GLORIAD, USA: Greg Cole, Natasha Bulashova

Page 7: Networking Overview
We had 22 10 Gbit/s waves to the Caltech and SLAC/FNAL booths. Of these:
- 15 waves went to the Caltech booth: from Florida (1), Korea/GLORIAD (1), Brazil (1 x 2.5 Gbit/s), Caltech (2), LA (2), UCSD (1), CERN (2), U Michigan (3) and FNAL (2).
- 7 x 10 Gbit/s waves went to the SLAC/FNAL booth: 2 from SLAC, 1 from the UK, and 4 from FNAL.
The waves were provided by Abilene, Canarie, Cisco (5), ESnet (3), GLORIAD (1), HOPI (1), Michigan Light Rail (MiLR), National Lambda Rail (NLR), TeraGrid (3) and UltraScienceNet (4).

Page 8: Network Overview
[Figure: network diagram.]

Page 9: Hardware (SLAC only)
At SLAC:
- 14 x 1.8 GHz Sun v20z (dual Opteron)
- 2 x Sun 3500 disk trays (2 TB of storage)
- 12 x Chelsio T110 10 Gb NICs (LR)
- 2 x Neterion/S2io Xframe I NICs (SR)
- Dedicated Cisco 6509 with 4 x 4x10Gb blades
At SC|05:
- 14 x 2.6 GHz Sun v20z (dual Opteron)
- 10 QLogic HBAs for StorCloud access
- 50 TB of storage at SC|05 provided by 3PAR (shared with Caltech)
- 12 x Neterion/S2io Xframe I NICs (SR)
- 2 x Chelsio T110 NICs (LR)
- Shared Cisco 6509 with 6 x 4x10Gb blades

Page 10: Hardware at SC|05
[Figure: the hardware deployed at the SC|05 booth.]

Page 11: Software
BBCP ('BaBar File Copy'):
- Uses ssh for authentication
- Multiple-stream capable; features 'rate synchronisation' to reduce byte retransmissions
- Sustained over 9 Gbps on a single session (a hedged usage sketch follows after this list)
XrootD:
- Library for transparent file access (standard Unix file functions)
- Designed primarily for LAN access (transaction-based protocol)
- Managed over 35 Gbit/s (in two directions) on 2 x 10 Gbps waves
- Transferred 18 TB in 257,913 files
dCache:
- 20 Gbps of production and test cluster traffic
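For illustration, a multi-stream bbcp copy can be driven from a script along the lines below. The host name, file paths and the 16-stream / 8 MB window choices are placeholders, not values from the challenge, and supported flags vary between bbcp builds, so check the bbcp usage output on your installation:

```python
# Minimal sketch: drive a multi-stream bbcp copy from Python.
# Host names, paths, and the -s/-w values are illustrative assumptions;
# consult your bbcp build's usage output for the options it supports.
import subprocess

src = "/data/babar/run42.root"                 # hypothetical source file
dst = "user@remote.example.org:/scratch/"      # hypothetical destination

cmd = [
    "bbcp",
    "-s", "16",      # number of parallel TCP streams
    "-w", "8M",      # per-stream TCP window size
    "-P", "5",       # progress report every 5 seconds
    src, dst,
]
subprocess.run(cmd, check=True)
```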

Page 12: BWC Aggregate Bandwidth
[Figure: plot of aggregate Bandwidth Challenge throughput, with last year's (SC|04) level marked for comparison.]

Page 13: Cumulative Data Transferred
[Figure: plot of cumulative data transferred, with the Bandwidth Challenge period marked.]

Page 14: Component Traffic
[Figure: plot of traffic broken down by component.]

Page 15: SLAC-FermiLab-UK Bandwidth Contributions
[Figure: plot of traffic in to and out from the booth, broken down by link: SLAC-ESnet, SLAC-ESnet-USN, FermiLab-HOPI, FNAL-UltraLight and UKLight.]

Page 16: SLAC Cluster Contributions
[Figure: plot of traffic in to and out from the booth over the ESnet routed path and the ESnet SDN layer 2 path via USN, with the Bandwidth Challenge period marked.]

Page 17: SLAC/FNAL Booth Aggregate
[Figure: plot of aggregate throughput (Mbps) at the SLAC/FNAL booth, broken down by wave.]

Page 18: Problems...
Managerial/PR:
- The initial request for loan hardware took place 6 months in advance!
- Lots and lots of paperwork to keep account of all the loan equipment
Logistical:
- Set up and tore down a pseudo-production network and servers in the space of a week!
- Testing could not begin until the waves were alight; most waves were lit the day before the challenge!
- Shipping so much hardware is not cheap!
- Setting up monitoring

Page 19: Problems...
Tried to configure hardware and software prior to the show.
Hardware:
- NICs: we had 3 bad Chelsios (bad memory); Xframe IIs did not work in UKLight's Boston machines
- Hard disks: 3 dead 10K disks (had to ship in spares)
- 1 x 4-port 10Gb blade DOA
- MTU mismatch between domains (see the jumbo-frame check after this list)
- A router blade died during stress testing the day before the BWC!
- Cables! Cables! Cables!
Software:
- Used golden disks for duplication (still takes 30 minutes per disk to replicate!)
- Linux kernels: initially used 2.6.14, but found severe performance problems compared to 2.6.12
- (New) router firmware caused crashes under heavy load; unfortunately this was only discovered just before the BWC, so we had to manually restart the affected ports during the BWC
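One way such MTU mismatches can be caught early is a do-not-fragment ping sized to a jumbo frame. A minimal sketch, assuming Linux iputils ping and a placeholder far-end host; 8972 bytes of ICMP payload corresponds to a 9000-byte frame after the 20-byte IP and 8-byte ICMP headers:

```python
# Minimal sketch: check whether 9000-byte jumbo frames survive the path by
# sending a do-not-fragment ping. The target host is a placeholder; the
# -M/-s/-c flags are those of Linux iputils ping.
import subprocess

target = "remote.example.org"       # hypothetical far-end host
payload = 9000 - 20 - 8             # jumbo MTU minus IP and ICMP headers

result = subprocess.run(
    ["ping", "-M", "do", "-c", "3", "-s", str(payload), target],
    capture_output=True, text=True,
)
if result.returncode == 0:
    print("Jumbo frames pass end to end")
else:
    print("Path MTU is smaller than 9000 bytes (or host unreachable):")
    print(result.stdout or result.stderr)
```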

Page 20: Problems
- Most transfers were from memory to memory (ramdisk etc.), with local caching of (small) files in memory.
- Reading from and writing to disk will be the next bottleneck to overcome (see the sketch below).
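To make the memory-versus-disk point concrete, a minimal timing sketch (with an assumed test path and file size) compares a first sequential read, which is disk-bound once the cache is cold, with a repeat read served from the page cache, which is effectively what ramdisk-backed or cached transfers measure:

```python
# Minimal sketch: compare disk-bound vs cached sequential read speed.
# The path and size are illustrative; run on a filesystem with enough space.
import os, time

path = "/tmp/bwc_testfile"          # hypothetical test file
size = 2 * 1024**3                  # 2 GiB

# Create the test file once (sequential write).
with open(path, "wb") as f:
    block = os.urandom(4 * 1024**2)
    for _ in range(size // len(block)):
        f.write(block)

def timed_read(p):
    start = time.time()
    with open(p, "rb") as f:
        while f.read(64 * 1024**2):
            pass
    return size / (time.time() - start) / 1e6     # MB/s

# A truly cold first pass needs the page cache dropped first
# (`echo 3 > /proc/sys/vm/drop_caches`, root only); otherwise the just-written
# file may already be partly cached.
print(f"first read : {timed_read(path):.0f} MB/s")
# The repeat pass is almost certainly served from the page cache (memory speed).
print(f"second read: {timed_read(path):.0f} MB/s")
os.remove(path)
```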

Page 21: Conclusion
- Previewed the IT challenges of the next generation of data-intensive science applications (high-energy physics, astronomy, etc.): petabyte-scale datasets and tens of national and transoceanic links at 10 Gbps (and up)
- Sustained 100+ Gbps of aggregate data transport for hours; we reached a petabyte/day transport rate for real physics data (a quick check follows this list)
- Learned to gauge the difficulty of the global networks and transport systems required for the LHC mission
- Set up, shook down and successfully ran the systems in under a week
- Understood and optimized the configurations of various components (network interfaces, routers/switches, OS, TCP kernels, applications) for high performance over the wide area network
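A quick check of the petabyte-per-day figure, assuming sustained aggregate rates in the range reported earlier and decimal units:

```python
# Sanity check: a sustained 100+ Gbps aggregate corresponds to roughly 1 PB/day.
for gbps in (100, 131):
    bytes_per_day = gbps * 1e9 / 8 * 86400
    print(f"{gbps} Gbps sustained ~ {bytes_per_day / 1e15:.2f} PB/day")
# 100 Gbps ~ 1.08 PB/day; 131 Gbps ~ 1.41 PB/day
```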

Page 22: Conclusion
Products from this exercise:
- An optimized Linux kernel (2.6.12 + NFSv4 + FAST and other TCP stacks) for data transport, after 7 full kernel-build cycles in 4 days (a buffer-sizing sketch follows after this list)
- A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions
- Extensions of Xrootd, an optimized low-latency file access application for clusters, across the wide area
- Understanding of the limits of 10 Gbps-capable systems under stress
- How to effectively utilize 10GE- and 1GE-connected systems to drive 10-gigabit wavelengths in both directions
- Use of production and test clusters at FNAL reaching more than 20 Gbps of network throughput
Significant efforts remain from the perspective of high-energy physics:
- Management, integration and optimization of network resources
- End-to-end capabilities able to utilize these network resources, including applications and I/O devices (disk and storage systems)
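Part of that kernel and TCP tuning is sizing socket buffers to the bandwidth-delay product of each wave. A minimal sketch, assuming a single 10 Gbps wave and an ~80 ms round-trip time (both illustrative, not the team's measured values), computes the BDP and reads the current Linux buffer limits:

```python
# Minimal sketch: bandwidth-delay product for one 10 Gbps wave and the Linux
# sysctls usually raised to match it. The RTT and values are illustrative only.

link_bps = 10e9          # one 10 Gbps wavelength
rtt_s = 0.080            # assumed ~80 ms round-trip time (e.g. a transcontinental path)

bdp_bytes = link_bps / 8 * rtt_s
print(f"Bandwidth-delay product: {bdp_bytes / 1e6:.0f} MB")   # ~100 MB

# TCP buffers need to be at least the BDP to keep the pipe full. On Linux the
# relevant knobs (set via sysctl, root required) include, for example:
#   net.core.rmem_max / net.core.wmem_max   -> maximum socket buffer sizes
#   net.ipv4.tcp_rmem / net.ipv4.tcp_wmem   -> min/default/max TCP buffers
#   net.ipv4.tcp_congestion_control         -> e.g. a high-speed TCP stack
for knob in ("net.core.rmem_max", "net.core.wmem_max"):
    path = "/proc/sys/" + knob.replace(".", "/")
    try:
        with open(path) as f:
            print(f"{knob} is currently {f.read().strip()} bytes")
    except OSError:
        print(f"{knob}: not readable on this system")
```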

Page 23: Press and PR
- 11/8/05 - 'Brit Boffins aim to Beat LAN speed record', vnunet.com
- SC|05 Bandwidth Challenge, SLAC Interaction Point
- 'Top Researchers, Projects in High Performance Computing Honored at SC/05 ...', Business Wire (press release), San Francisco, CA, USA
- 11/18/05 - Official winner announcement
- 11/18/05 - SC|05 Bandwidth Challenge slide presentation
- 11/23/05 - Bandwidth Challenge results, Slashdot
- 12/6/05 - Caltech press release
- 12/6/05 - 'Neterion Enables High Energy Physics Team to Beat World Record Speed at SC05 Conference', CCN Matthews News Distribution Experts
- 'High energy physics team captures network prize at SC|05', from SLAC and EurekAlert!
- 12/7/05 - 'High Energy Physics Team Smashes Network Record', Science Grid This Week
- 'Congratulations to our Research Partners for a New Bandwidth Record at SuperComputing 2005', from Neterion

Page 24: (no recoverable text on this slide)

Page 25: SLAC/UK Contribution
[Figure: plot of traffic in to and out from the booth over the ESnet routed path, the ESnet/USN layer 2 path and UKLight.]

Page 26: SLAC/ESnet Contribution
[Figure: plot of per-host and aggregate throughput (Mbps).]

Page 27: FermiLab Contribution
[Figure: plot of FermiLab traffic over HOPI, USN and UltraLight.]