Page 1

HPC at CERN and the Grid

Fabrizio Gagliardi
CERN, Information Technology Division
ORAP/Speedup, October 2000
[email protected]

Page 2

Online system
- multi-level trigger
- filter out background
- reduce data volume

Page 3

Event Filter & Reconstruction (figures are for one experiment)

[Diagram: data from detector - event builder; switch; high speed network; computer farm; tape and disk servers. Data labels: raw data, summary data]

input: 5-100 GB/sec
capacity: 50K SI95 (~4K 1999 PCs)
recording rate: 100 MB/sec (Alice – 1 GB/sec)
+ 1-1.25 PetaByte/year
+ 1-500 TB/year
20,000 Redwood cartridges every year (+ copy)
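As a rough cross-check of the figures above, the sketch below converts a recording rate into a yearly data volume and a cartridge count. Two inputs are assumptions that do not appear on the slide: about 10^7 seconds of data taking per year, and 50 GB per Redwood cartridge. Decimal units (1 PB = 10^15 bytes) are assumed as well.

```python
# Assumptions (not from the slide): live time per year and cartridge capacity.
LIVE_SECONDS_PER_YEAR = 10**7   # assumed seconds of data taking per year
CARTRIDGE_GB = 50               # assumed capacity of one Redwood cartridge

MB = 10**6
GB = 10**9
PB = 10**15


def yearly_volume_bytes(rate_mb_per_s, live_seconds=LIVE_SECONDS_PER_YEAR):
    """Bytes recorded in one year at a constant recording rate."""
    return rate_mb_per_s * MB * live_seconds


def cartridges_needed(volume_bytes, cartridge_gb=CARTRIDGE_GB):
    """Number of cartridges needed to hold the volume (rounded up)."""
    capacity = cartridge_gb * GB
    return -(-volume_bytes // capacity)  # integer ceiling division


if __name__ == "__main__":
    volume = yearly_volume_bytes(100)          # 100 MB/sec from the slide
    print(volume / PB, "PB/year")
    print(cartridges_needed(volume), "cartridges/year")
```

Under these assumptions, 100 MB/sec works out to 1 PB/year and 20,000 cartridges, which is consistent with the numbers quoted on the slide.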

Page 4

Offline Data and Computation for Physics Analysis

[Diagram: detector; event filter (selection & reconstruction); raw data; event reconstruction; event simulation; processed data; event summary data; batch physics analysis; analysis objects (extracted by physics topic); interactive physics analysis]

Page 5

Estimated CPU Capacity at CERN

[Chart: capacity in K SI95 (0 to 2,500) by year, 1998 to 2006. Series: Non-LHC, LHC, technology-price curve (40% annual price improvement). Annotation: ~10K SI95, 1200 processors]

Capacity that can be purchased for the value of the equipment present in 2000
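The technology-price curve can be sketched as a simple compound-growth calculation. The slide does not say exactly how "40% annual price improvement" is defined, so the code below treats it as an assumption: unit price falls by 40% each year, which means a fixed budget buys 1/(1 - 0.4) times more capacity each year. The 10K SI95 baseline for 2000 is taken from the chart annotation; the function names are illustrative only.

```python
def yearly_factor_from_price_drop(price_drop):
    """Capacity multiplier per year if unit price falls by `price_drop` (0..1) annually."""
    if not 0 <= price_drop < 1:
        raise ValueError("price_drop must be in [0, 1)")
    return 1.0 / (1.0 - price_drop)


def capacity_for_fixed_budget(base_capacity, base_year, year, yearly_factor):
    """Capacity a constant budget buys in `year`, given what it bought in `base_year`."""
    return base_capacity * yearly_factor ** (year - base_year)


if __name__ == "__main__":
    factor = yearly_factor_from_price_drop(0.40)  # assumed reading of the slide
    for year in range(2000, 2007):
        k_si95 = capacity_for_fixed_budget(10.0, 2000, year, factor)
        print(year, round(k_si95, 1), "K SI95")
```

With this reading, the 2000 budget would buy roughly 214K SI95 in 2006. If the 40% is instead read as capacity per unit cost growing by 40% a year (a factor of 1.4), the same budget would buy about 75K SI95.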

Page 6

Estimated Disk Capacity at CERN

[Chart: capacity in TeraBytes (0 to 1800) by year, 1998 to 2006. Series: Non-LHC, LHC, technology-price curve (40% annual price improvement)]

Page 7

Long Term Tape Storage Estimates

[Chart: TeraBytes (0 to 14'000) by year, 1995 to 2006. Series: Current Experiments, COMPASS, LHC]

Page 8

HPC or HTC

High Throughput Computing
- mass of modest problems
- throughput rather than performance
- resilience rather than ultimate reliability

Can exploit inexpensive mass market components
But we need to marry these with inexpensive highly scalable management tools

Much in common with data mining, Internet computing facilities, ...
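The "mass of modest problems" and "resilience rather than ultimate reliability" points can be illustrated with a minimal sketch: many independent jobs run in a worker pool, and a failed job is simply retried rather than stopping the batch. This is not how any CERN system was implemented; `flaky_reconstruct` is a hypothetical stand-in for an event-processing job, and the thread pool stands in for a farm of PCs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(job, event_id, max_attempts=3):
    """Run one independent job; retry on failure instead of aborting the batch."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job(event_id, attempt)
        except RuntimeError as err:
            last_error = err  # remember the failure and try again
    raise last_error


def process_batch(job, event_ids, workers=4, max_attempts=3):
    """Process independent events in parallel and collect results per event."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            eid: pool.submit(run_with_retries, job, eid, max_attempts)
            for eid in event_ids
        }
    # Leaving the `with` block waits for all jobs; result() re-raises
    # the error of any job that failed on every attempt.
    return {eid: fut.result() for eid, fut in futures.items()}


def flaky_reconstruct(event_id, attempt):
    """Hypothetical job: every third event fails on its first attempt."""
    if event_id % 3 == 0 and attempt == 1:
        raise RuntimeError(f"transient failure on event {event_id}")
    return event_id * 2
```

Because the jobs do not depend on each other, total throughput scales with the number of workers, and an unreliable worker costs only a retry.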

Page 9

History-1

1960s through 1980s
- The largest scientific mainframes (Control Data, Cray, IBM, Siemens/Fujitsu)
- Time-sharing interactive services on IBM & DEC-VMS
- Scientific workstations from 1982 (Apollo) for development, final analysis

1988 -- On-line computing farms (Falcon) - joint project with Digital (microVax and Vaxstations)
1989 -- First batch services on RISC - joint project with HP (Apollo DN10.000)
1990 -- Central Simulation Facility (CSF) - 4 X mainframe capacity
1991 -- SHIFT - data intensive applications, distributed model
1993 -- First central interactive service on RISC

Page 10

History-2

1994 -- 128 processor QSW (Meiko/QSW) CS2 and 72 processor IBM SP-2
1996 -- Last mainframe de-commissioned
1997 -- First batch services on PCs
1998 -- NA48 records 70 TeraBytes of data in one year

Page 11

LHC Computing Fabric: can we scale up the current commodity-component based approach?

Page 12

Generic computing farm

[Diagram: network servers; tape servers; disk servers; application servers]

CERN/IT/PDP - les.robertson 10-98-12

Page 13

Standard components

Computing & Storage Fabric built up from commodity components
- Simple PCs
- Inexpensive network-attached disk
- Standard network interface (whatever Ethernet happens to be in 2006)

with a minimum of high(er)-end components
- LAN backbone
- WAN connection

Page 14

HEP's not special, just more cost conscious

Computing & Storage Fabric built up from commodity components
- Simple PCs
- Inexpensive network-attached disk
- Standard network interface

with a minimum of high(er)-end components
- LAN backbone
- WAN connection

Page 15

Limited role of high end equipment

Computing & Storage Fabric built up from commodity components
- Simple PCs
- Inexpensive network-attached disk
- Standard network interface (whatever Ethernet happens to be in 2006)

with a minimum of high(er)-end components
- LAN backbone
- WAN connection

Page 16

Not everything has been commoditised yet

Page 17

CMS: 1800 physicists, 150 institutes, 32 countries

World Wide Collaboration ⇒ distributed computing & storage capacity

Page 18

Regional Computing Centres

Exploit established computing expertise & infrastructure
- In national labs, universities

Reduce dependence on links to CERN
- full ESD available nearby - through a fat, fast, reliable network link

Tap funding sources not otherwise available to HEP

Page 19

Regional Centres - a Multi-Tier Model

[Diagram levels:
 CERN – Tier 0
 Tier 1: FNAL, RAL, IN2P3
 Tier 2: Lab a, Uni b, Lab c, Uni n
 Department: α, β, γ
 Desktop
 Link labels: 622 Mbps, 2.5 Gbps, 622 Mbps, 155 Mbps, 155 Mbps]

MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
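To get a feel for what the link speeds in the diagram mean for moving data between tiers, the sketch below estimates the time needed to transfer a dataset over each of them. The 1 TB dataset size is a hypothetical example, and the calculation assumes decimal units and a link that is fully available to the transfer (the `efficiency` parameter lets that assumption be relaxed).

```python
def transfer_seconds(size_bytes, link_mbps, efficiency=1.0):
    """Seconds to move `size_bytes` over a link of `link_mbps` megabits per second."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_mbps * 1e6 * efficiency)


# Link speeds as labelled in the diagram, in Mbps.
LINKS_MBPS = {"2.5 Gbps": 2500, "622 Mbps": 622, "155 Mbps": 155}

if __name__ == "__main__":
    one_terabyte = 10**12  # hypothetical dataset size
    for label, mbps in LINKS_MBPS.items():
        hours = transfer_seconds(one_terabyte, mbps) / 3600
        print(f"{label}: {hours:.1f} h per TB")
```

Even under these ideal assumptions, a terabyte takes hours on the slower links, which is the motivation for keeping copies of the data close to where it is analysed.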

Page 20

More realistically - a Grid Topology

[Diagram nodes:
 CERN – Tier 0
 Tier 1: FNAL, RAL, IN2P3
 Tier 2: Lab a, Uni b, Lab c, Uni n
 Department: α, β, γ
 Desktop
 DHL
 Link labels: 622 Mbps, 2.5 Gbps, 622 Mbps, 155 Mbps, 155 Mbps]

Page 21

Summary - the basic problem

Scalability
- Thousands of processors, thousands of disks, PetaBytes of data, Terabits/second of I/O bandwidth, ...

Wide-area distribution
- WANs are and will be 1% of LANs
- Distribute, replicate, cache, synchronise the data
- Multiple ownership, policies, ...
- integration of this amorphous collection of Regional Centres
- With some attempt at optimisation

Adaptability
- We shall only know how analysis is done once the data arrives
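The "distribute, replicate, cache" idea can be sketched as a small replica catalogue: a dataset may exist at several sites, a request is served from the best-connected replica, and the result is cached locally so the slow wide-area link is used only once. The class, its method names, the dataset name and the bandwidth figures assigned to each site are all hypothetical; this is an illustration, not the design of any actual Grid middleware.

```python
class ReplicaCatalog:
    """Toy catalogue of dataset replicas, seen from one local site."""

    def __init__(self, link_mbps):
        self.link_mbps = dict(link_mbps)  # remote site -> bandwidth to it (Mbps)
        self.replicas = {}                # dataset -> set of sites holding a copy
        self.local_cache = set()          # datasets already copied locally

    def register(self, dataset, site):
        """Record that `site` holds a replica of `dataset`."""
        self.replicas.setdefault(dataset, set()).add(site)

    def fetch(self, dataset):
        """Return where the data is read from; cache it locally afterwards."""
        if dataset in self.local_cache:
            return "local"
        sites = self.replicas.get(dataset)
        if not sites:
            raise KeyError(f"no replica registered for {dataset}")
        # Pick the replica behind the fastest link; sort first for a stable choice.
        best = max(sorted(sites), key=lambda s: self.link_mbps.get(s, 0))
        self.local_cache.add(dataset)
        return best


if __name__ == "__main__":
    catalog = ReplicaCatalog({"CERN": 155, "RAL": 622})  # assumed bandwidths
    catalog.register("esd-sample", "CERN")
    catalog.register("esd-sample", "RAL")
    print(catalog.fetch("esd-sample"))  # first access goes over the WAN
    print(catalog.fetch("esd-sample"))  # second access is served locally
```

The sketch leaves out synchronisation entirely: keeping replicas consistent when the data changes is the harder part of the problem listed above.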

Page 22

Are Grids a solution?

Change of orientation of US Meta-computing activity
- From inter-connected super-computers ... towards a more general concept of a computational Grid (The Grid – Ian Foster, Carl Kesselman)

Has initiated a flurry of activity in HEP
- US – Particle Physics Data Grid (PPDG)
- Grid technology evaluation project in INFN
- UK proposal for funding for a prototype grid
- GriPhyN – data grid proposal just approved by NSF
- NASA Information Power Grid

DataGrid initiative launched

Page 23

The GRID metaphor

- Unlimited ubiquitous distributed computing
- Transparent access to multipetabyte distributed data bases
- Easy to plug in
- Hidden complexity of the infrastructure
- Analogy with the electrical power GRID

Page 24

The Grid from a Services View

[Layered diagram, top to bottom]

Applications
- Chemistry, Biology, Cosmology, High Energy Physics, Environment

Application Toolkits
- Distributed Computing Toolkit
- Data-Intensive Applications Toolkit
- Collaborative Applications Toolkit
- Remote Visualization Applications Toolkit
- Problem Solving Applications Toolkit
- Remote Instrumentation Applications Toolkit

Grid Services (Middleware)
- Resource-independent and application-independent services
- E.g., authentication, authorization, resource location, resource allocation, events, accounting, remote data access, information, policy, fault detection

Grid Fabric (Resources)
- Resource-specific implementations of basic services
- E.g., transport protocols, name servers, differentiated services, CPU schedulers, public key infrastructure, site accounting, directory service, OS bypass

Page 25

Five Emerging Models of Networked Computing From The Grid

- Distributed Computing: synchronous processing
- High-Throughput Computing: asynchronous processing
- On-Demand Computing: dynamic resources
- Data-Intensive Computing: databases
- Collaborative Computing: scientists

Ian Foster and Carl Kesselman, editors, “The Grid: Blueprint for a New Computing Infrastructure,” Morgan Kaufmann, 1999, http://www.mkp.com/grids

Page 26

R&D required

Local fabric
- Management of giant computing fabrics
  - auto-installation, configuration management, resilience, self-healing
- Mass storage management
  - multi-PetaByte data storage, “real-time” data recording requirement, active tape layer – 1,000s of users

Wide-area - building on an existing framework & RN (e.g. Globus, Geant and high performance network R&D)
- workload management
  - no central status
  - local access policies
- data management
  - caching, replication, synchronisation
  - object database model
- application monitoring
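The two constraints listed under workload management, "no central status" and "local access policies", can be illustrated with a minimal sketch in which the broker keeps no table of site state: it asks each site in turn, and each site decides for itself whether to accept the job. The `Site` class, the `submit` function and the group names are hypothetical and are not taken from Globus or any other toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A computing site that owns its own state and access policy."""
    name: str
    free_cpus: int
    allowed_groups: set = field(default_factory=set)

    def offer(self, group, cpus):
        """Local decision: apply the site's own policy and capacity check."""
        return group in self.allowed_groups and self.free_cpus >= cpus

    def run(self, cpus):
        """Reserve the CPUs for an accepted job."""
        self.free_cpus -= cpus


def submit(sites, group, cpus):
    """Ask each site in turn; the broker itself stores no site status."""
    for site in sites:
        if site.offer(group, cpus):
            site.run(cpus)
            return site.name
    return None  # no site is willing or able to take the job


if __name__ == "__main__":
    sites = [
        Site("Lab a", free_cpus=4, allowed_groups={"cms"}),
        Site("Uni b", free_cpus=16, allowed_groups={"atlas", "cms"}),
    ]
    print(submit(sites, "atlas", 8))  # only Uni b admits this group
    print(submit(sites, "cms", 4))    # Lab a has room and allows cms
```

A real system would also need queuing, optimisation across sites and failure handling, which is where the R&D effort listed above comes in.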

Page 27

HEP Data Grid Initiative

European level coordination of national initiatives & projects

Principal goals:
- Middleware for fabric & Grid management
- Large scale testbed - major fraction of one LHC experiment
- Production quality HEP demonstrations
  - “mock data”, simulation analysis, current experiments
- Other science demonstrations

Three year phased developments & demos

Complementary to other GRID projects
- EuroGrid: Uniform access to parallel supercomputing resources

Synergy being developed (GRID Forum, Industry and Research Forum)

Page 28

Participants

- Main partners: CERN, INFN(I), CNRS(F), PPARC(UK), NIKHEF(NL), ESA-Earth Observation
- Other sciences: KNMI(NL), Biology, Medicine
- Industrial participation: CS SI/F, DataMat/I, IBM/UK
- Associated partners: Czech Republic, Finland, Germany, Hungary, Spain, Sweden (mostly computer scientists)
- Formal collaboration with USA established
- Industry and Research Project Forum with representatives from:
  - Denmark, Greece, Israel, Japan, Norway, Poland, Portugal, Russia, Switzerland

Page 29

Status

- Prototype work already started at CERN and in most of the collaborating institutes (Globus initial installation and tests)
- Proposal to the EU positively reviewed at the end of July, 9.8 M Euros (covering 1/3 of total investment), 3-year contract being negotiated now
- Expect start of the project, January next year

Page 30

Conclusions

- The Grid is a useful metaphor to describe an appropriate computing model for LHC and future HEP computing
- Middleware, APIs and interface general enough to accommodate many different models for science, industry and commerce
- Still important R&D to be done
- Perfect field for multidisciplinary collaboration (computer science, physics and other sciences)
- If successful could develop next generation Internet computing
- Major funding agencies prepared to fund large testbeds in USA, EU and Japan
- Excellent opportunity for HEP computing to deploy a sustainable HPC model