Computing Strategy Victoria White, Associate Lab Director for Computing and CIO Fermilab PAC June 24, 2011


Computing Strategy

Victoria White, Associate Lab Director for Computing and CIO
Fermilab PAC
June 24, 2011

The Experiments you approve

• Depend heavily (at all stages from inception to publication and beyond) on Computing:

   - Facilities (power, cooling, space)
   - Data storage and distribution
   - Compute servers
   - Grid services
   - Databases
   - High performance networks
   - Software frameworks for simulation, processing, analysis
   - Tools such as GEANT, ROOT, Pythia, GENIE
   - General tools to support collaboration, documentation, code management, etc.

Computing Strategy - Fermilab PAC 6/24/2011


Our job in the Computing Sector

• Is to enable science and to optimize the support (human and technological) of the scientific programs of the lab (including the Experiment program)

   - Within funding and resource constraints
   - In the face of growing demands
   - To meet emerging needs
   - To deal with rapidly changing technology

• We also have to provide computing to support the lab’s operations and provide all the standard services that an organization needs (and often expects 24x7)


Computing Division -> Computing Sector

Service Management

• Business Relationship Management (BSM)
• ITIL Process Owners
• Continuous Service Improvement Program
• ISO 20K Certification

Office of the CIO

• Enterprise Architecture (EA) & Configuration Management
• Computer Security
• Governance and Portfolio Management
• Project Management Office
• Financial Management

Scientific Computing strategy

• Provide computing, software tools and expertise to all parts of the Fermilab scientific program including theory simulations (Lattice QCD and Cosmology), and accelerator modeling

• Work closely with each scientific program – as collaborators (where a scientist from computing is involved) and as valued customers.

• Create a coherent Scientific Computing program from the many parts and many funding sources – encouraging sharing of facilities, common approaches and re-use of software wherever possible


EXPERIMENT COMPUTING STRATEGIES


CMS Tier 1 at Fermilab

• The CMS Tier-1 facility at Fermilab and the experienced team who operate it enable CMS to reprocess data quickly and to distribute the data reliably to the user community around the world.

Fermilab also operates:

• LHC Physics Center (LPC)
• Remote Operations Center
• U.S. CMS Analysis Facility

CMS Offline and Computing

• Fermilab is a hub for CMS Offline and Computing

   - Ian Fisk is the CMS Computing Coordinator
   - Liz Sexton-Kennedy is Deputy Offline Coordinator
   - Patricia McBride is Deputy Computing Coordinator
   - Leadership roles in many areas in CMS Offline and Computing: Frameworks, Simulations, Data Quality Monitoring, Workload Management and Data Management, Data Operations, Integration and User Support.

• Fermilab Remote Operations Center allows US physicists to participate in monitoring shifts for CMS.


Computing Strategy for CMS

• Continue to evolve the CMS Tier 1 center at Fermilab - to meet US obligations to CMS and provide the highest level of availability and functionality for the $

• Continue to ensure that the LHC Physics Center and the US CMS physics community is well supported by the Tier 3 (LPC CAF) at Fermilab

• Plan for evolution of the computing, software and data access models as the experiment matures – requires R&D and development

   - Ever higher bandwidth networks
   - Data on demand
   - Frameworks for multi-core


Any Data, Anywhere, Any time: Early Demonstrator

• ROOT I/O and Xrootd demonstrator: an example of evolving requirements and technology


Run II Computing Strategy

• Production processing and Monte Carlo production capability after the end of data taking
   - Reprocessing efforts in 2011/2012 aimed at the Higgs
   - Monte Carlo production at the current rate through mid-2013

• Analysis computing capability for at least 5 years, but diminishing after end of 2012
   - Push for 2012 conferences for many results – no large drop in computing requirements through this period

• Continued support for up to 5 years for
   - Code management and science software infrastructure
   - Data handling for production (+MC) and Analysis Operations

• Curation of the data: > 10 years with possibly some support for continuing analyses

Tevatron – looking ahead

CDF and D0 expect the publication rate to remain stable for several years.

Analysis activity:
   - Expect > 100 (students + postdocs) actively doing analysis in each experiment through 2012.
   - Expect this number to be much smaller in 2015 though data analysis will still be on-going.

[Figures: D0 publications each year; CDF publications each year]

“Data Preservation” for Tevatron data

• Data will be stored and migrated to new tape technologies for ~ 10 years

   - Eventually 16 PB of data will seem modest

• If we want to maintain the ability to reprocess and do analysis on the data there is a lot of work to be done to keep the entire environment viable
   - Code, access to databases, libraries, I/O routines, Operating Systems, documentation…

• If there is a goal to provide “open data” that scientists outside of CDF and Dzero could use there is even more work to do.

• 4th Data Preservation Workshop was held at Fermilab in May

• Not just a Tevatron issue

Intensity Frontier program needs

• Many experiments in many different phases of development/operations.
   • MINOS
   • MiniBooNE
   • SciBooNE
   • MINERvA
   • NOvA
   • MicroBooNE
   • ArgoNeuT
   • Mu2e
   • g-2
   • LBNE
   • Project X era expts

[Figure: CPU (cores) and Disk (TB) charts, with a 1 PB marker]

Intensity Frontier strategies

• NuComp forum to encourage planning and common approaches where possible

• A shared analysis facility where we can quickly and flexibly allocate computing to experiments

• Continue to work to “grid enable” the simulation and processing software

   - Good success with MINOS, MINERvA and Mu2e

• All experiments use shared storage services – for data and local disk – so we can allocate resources when needed

• Hired two associate scientists in the past year and reassigned another scientist.


Budget/resource allocation for 2012 +

• There is always upward pressure for computing
   - more disk and more cpu lead to faster results and greater flexibility
   - more help with software & operations is always requested

• Within a fixed budget each experiment can usually optimize between tape drives, tapes, disk, cpu, servers assuming basic shared services are provided.

• With so many experiments in so many different stages we intend to convene a “Scientific Computing Portfolio Management Team” to examine the needs/computing models of the different Fermilab based experiments and help in allocating the finite dollars to optimize scientific output.

Cosmic Frontier experiments

• Continue to curate data for SDSS

• Support data and processing for Auger, CDMS and COUPP

• Will maintain an archive copy of the DES data and provide modest analysis facilities for Fermilab DES scientists.
   - Data management is an NCSA (NSF) responsibility
   - We have the capability to provide computing should this become necessary

• DES uses Open Science Grid resources opportunistically

• Future initiatives still in the planning stages

[Images: SDSS and DES]

DES Analysis Computing at Fermilab

• Fermilab plans to host a copy of the DES Science Archive. This consists of two pieces
   - A copy of the Science database
   - A copy of the relevant image data on disk and tape

• This copy serves a number of different roles
   - Acts as a backup for the primary NCSA archive, enabling collaboration access to the data when the primary is unavailable
   - Handles queries by the collaboration, thus supplementing the resources at NCSA
   - Enables the Fermilab scientists to effectively exploit the DES data for science analysis

• To support the science analysis of the Fermilab Scientists, DES will need a modest amount of computing (of order 24 nodes). This is similar to what was supported for the SDSS project.

LSST

• Fermilab recently joined LSST

• Fermilab expertise in data management, software frameworks, overall computing from SDSS and from the entire program means we could contribute effectively

• Currently negotiating small roles in
   - Data Acquisition (where it touches data management)
   - Science Analysis (where it touches data management)

SOFTWARE IN COLLABORATION


Software Tools and frameworks: our strategy

• Develop and maintain core expertise and tools, aiming to support the entire lifecycle of scientific programs

   - Focus on areas of general applicability with long term support requirements
   - Work in partnership with individual programs to create scientific applications
   - Participate in projects and collaborations that aim to develop scientific computational infrastructure

• Provide support of concept development to scientific programs in pre-project phase
   - Enabled by core expertise and tools

• Reuse expertise and best-of-class tools from partnerships with individual projects and make them available to other projects

Framework Applications

Success: specific application (RunII) leads to community tool and continuing requests for framework applications from new projects

Success: high-quality implementations (most recently, CMS framework)

[Diagram labels: RunII Offline infrastructure, Framework, LQCD software, LAr, NOvA, CMS, Mu2e, MiniBooNE]

“CMS framework in excellent shape and well validated*”


*CMS offline coordinators, Dec 2010


Detector Simulation

• GEANT activity: members of G4 collaboration since 2007, toolkit capability development.

• Work in critical areas defined by G4 external reviews

• Simulation development & support activity: provide expertise and support to Fermilab projects and users.

• Applications in high-priority areas for the Fermilab program. Shifting from LHC/CMS main focus to Intensity Frontier

• Toolkit evolution: in collaboration with other institutions (SLAC, CERN,…)

• Optimize performance of existing toolkit

• Enhance capabilities and improve infrastructure

Analysis suites for the community: ROOT

• ROOT is the standard HEP analysis toolkit, used for RunII, LHC, and Intensity Frontier

Fermilab is a founding member of the ROOT project

• Support deployment and operation of ROOT applications by Fermilab users and projects

• Development emphasis, in collaboration with CERN, to optimize I/O (essential for LHC) and thread safety (driven by technology evolution and LHC needs)


Software – collaborative efforts

• ComPASS – Accelerator Modeling Tools project
• Lattice QCD project and USQCD Collaboration
• Open Science Grid – many aspects and some sub-projects such as Grid security, workload management
• Grid and Data Management tools
• Advanced Wide Area Network projects
• dCache collaboration
• Enstore collaboration
• Scientific Linux (with CERN)
• GEANT core development/validation (with GEANT4 collaboration)
• ROOT development & support (with CERN)
• Cosmological Computing
• Data Preservation initiative (global HEP)

SHARING STRATEGIES


Why Sharing Strategies are needed

• Cost
• Coherent technical approaches and architectures
• Support over the entire lifecycle of an experiment/project

Experiment/Project Lifecycle and funding

[Diagram: funding sources across the experiment/project lifecycle]

Early Period (R&D, Simulations, LOI, Proposals): Shared services

Mature phase (Construction, Operations, Analysis): Shared services; Expt or Project specific

Final data-taking and beyond (Final analysis, Data preservation and access): Shared services

Diagram key: Project specific; Shared services

Sharing via the Grid – FermiGrid

[Diagram: FermiGrid architecture]

External grids: TeraGrid, WLCG, NDGF, Open Science Grid

User Login & Job Submission

FermiGrid Site Gateway

FermiGrid Infrastructure Services

FermiGrid Monitoring/Accounting Services

FermiGrid Authentication/Authorization Services

Clusters: GRID Farm (3284 slots), CMS (7485 slots), CDF (5600 slots), D0 (6916 slots)
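To give a feel for how the shared pool divides up, the slot counts shown for the four FermiGrid clusters can be added and turned into shares. This is only an illustrative sketch: the pairing of each count with a cluster name is read off the flattened diagram text and is an assumption, and the variable and function names are made up for this example.

```python
# Job-slot counts as they appear on the FermiGrid slide (June 2011).
# The pairing of count to cluster is inferred from the diagram text.
slots = {"GRID Farm": 3284, "CMS": 7485, "CDF": 5600, "D0": 6916}

# Total number of job slots reachable through FermiGrid.
total = sum(slots.values())


def share(name):
    """Fraction of all FermiGrid job slots that belong to one cluster."""
    return slots[name] / total


for name, count in slots.items():
    print(f"{name:10s} {count:6d} slots  {share(name):6.1%}")
print(f"{'Total':10s} {total:6d} slots")
```

On these numbers the four clusters together offer a little over 23,000 job slots, with no single cluster holding more than about a third of them.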

Open Science Grid (OSG)

• The Open Science Grid (OSG) advances science through open distributed computing. The OSG is a multi-disciplinary partnership to federate local, regional, community and national cyberinfrastructures to meet the needs of research and academic communities at all scales.

• Total of 95 sites; ½ million jobs a day, 1 million CPU hours/day; 1 million files transferred/day.

• It is cost effective, it promotes collaboration, it is working!

The US contribution and partnership with the LHC Computing Grid is provided through OSG for CMS and ATLAS
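The daily OSG totals quoted on this slide can be turned into rough averages to convey the scale. The sketch below is back-of-the-envelope arithmetic only: it assumes the quoted figures are exact daily totals and that load is spread evenly over jobs and sites, which real usage certainly is not. All names are illustrative.

```python
# Daily OSG figures as quoted on the slide.
SITES = 95
JOBS_PER_DAY = 500_000
CPU_HOURS_PER_DAY = 1_000_000

# Average CPU time consumed by one job.
cpu_hours_per_job = CPU_HOURS_PER_DAY / JOBS_PER_DAY

# Number of cores that would have to run around the clock
# to deliver that many CPU hours in a 24-hour day.
busy_cores = CPU_HOURS_PER_DAY / 24

# The same figure averaged over all sites (assumes an even spread).
busy_cores_per_site = busy_cores / SITES

print(f"CPU hours per job:        {cpu_hours_per_job:.1f}")
print(f"Continuously busy cores:  {busy_cores:,.0f}")
print(f"Busy cores per site:      {busy_cores_per_site:,.0f}")
```

Under those assumptions the average job uses about two CPU hours, and the grid as a whole keeps the equivalent of roughly forty thousand cores busy all day.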


FNAL CPU – core count for science


Data Storage at Fermilab - Tape

[Chart: Petabytes on tape at end of fiscal year, FY07 to FY10; vertical axis 0 to 30; series: Other experiments, CMS, D0, CDF]

Data on tape - total

[Chart: total data on tape; legend includes Other Experiments]

FermiCloud: Virtualization likely a key component for long term analysis

• The FermiCloud project is a private cloud facility built to provide a production facility for cloud services

• A private cloud—on-site access only for registered Fermilab users

Can be evolved into a hybrid cloud with connections to Magellan, Amazon or other cloud provider in the future.

• Much of the “data intensive” computing cannot use commercial Cloud computing

• Not cost effective today for permanent use – only for overflow or unexpected needs for Simulation.


COMPUTING FOR THEORY AND SIMULATION SCIENCE


High Performance (parallel) Computing is needed for

• Lattice Gauge Theory calculations (LQCD)
• Accelerator modeling tools and simulations
• Computational Cosmology:
   - Dark energy, matter
   - Cosmic gas
   - Galaxies

Simulations connect fundamentals with observables

Strategies for Simulation Science Computing

• Lattice QCD is the poster child
   - Coherent inclusive US QCD collaboration
      • Paul MacKenzie, Fermilab leads. This allocates HPC resources.
   - LQCD Computing Project (HEP and NP funding)
      • Bill Boroski, Fermilab is the Project Manager
   - SciDAC II project to develop the software infrastructure

• Accelerator modeling
   - Multi-institutional tools project COMPASS – Panagiotis Spentzouris, Fermilab is the PI
   - Also accelerator project specific modeling efforts

• Computational Cosmology
   - Computational Cosmology Collaboration (C3) for mid-range computing for astrophysics and cosmology
   - Taskforce – Fermilab, ANL, U of Chicago – to develop strategy

CORE COMPUTING & INFRASTRUCTURE


Core Computing – a strong base

• Scientific Computing relies on Core Computing services and Computing Facility infrastructure

   - Core Networking and network services
   - Computer rooms, power and cooling
   - Email, videoconferencing, web servers
   - Document databases, Indico, calendaring
   - Service desk
   - Monitoring and alerts
   - Logistics
   - Desktop support (Windows and Mac)
   - Printer support
   - Computer Security
   - … and more

• All of the above is provided through overheads


Computer Rooms

• The home of all the scientific computing hardware is the computer rooms.

   - They provide power, space and cooling for all the systems.
   - CD’s computer rooms are a critical component of the successful delivery of scientific computing.

[Photos: Feynman Computing Center (FCC), Grid Computing Center (GCC), Lattice Computing Center (LCC)]

Fermilab Computing Facilities

• Lattice Computing Center (LCC)
   - High Performance Computing (HPC)
   - Accelerator Simulation, Cosmology nodes
   - No UPS

• Feynman Computing Center (FCC)
   - High availability services – e.g. core network, email, etc.
   - Tape Robotic Storage (3 × 10000-slot libraries)
   - UPS & Standby Power Generation
   - ARRA project: upgrade cooling and add HA computing room - completed

• Grid Computing Center (GCC)
   - High Density Computational Computing
   - CMS, RUNII, Grid Farm batch worker nodes
   - Lattice HPC nodes
   - Tape Robotic Storage (4 × 10000-slot libraries)
   - UPS & taps for portable generators

EPA Energy Star award 2010

Facilities: more than just space power and cooling – continuous planning

ARRA funded new high availability computer room in Feynman Computing Center

Many CMS disks are now in here

Reliable high speed networking is key


Conclusion

• We have a coherent and evolving scientific computing program that emphasizes sharing of resources, re-use of code and tools, and requirements planning.

• Embedded scientists with deep involvement are also a key strategy for success.

• Fermilab takes on leadership roles in computing in many areas.

• We support projects and experiments at all stages of their lifecycle – but if we want to truly preserve access to Tevatron data long term much more work is needed.
