TRANSCRIPT
Open Science Grid: Linking Universities and Laboratories in National Cyberinfrastructure
Paul Avery, University of Florida ([email protected])
Physics Colloquium, RIT (Rochester, NY)
May 23, 2007
www.opensciencegrid.org
Cyberinfrastructure and Grids
Grid: geographically distributed computing resources configured for coordinated use
Fabric: physical resources & networks providing raw capability
Ownership: resources controlled by owners and shared with others
Middleware: software tying it all together: tools, services, etc.
Enhancing collaboration via transparent resource sharing
Example: the US-CMS “Virtual Organization”
Motivation: Data Intensive Science
21st century scientific discovery: computationally & data intensive; theory + experiment + simulation; internationally distributed resources and collaborations
Dominant factor: data growth (1 petabyte = 1000 terabytes)
  2000: ~0.5 petabyte
  2007: ~10 petabytes
  2013: ~100 petabytes
  2020: ~1000 petabytes
Powerful cyberinfrastructure needed:
  Computation: massive, distributed CPU
  Data storage & access: large-scale, distributed storage
  Data movement: international optical networks
  Data sharing: global collaborations (100s – 1000s)
  Software: managing all of the above
How to collect, manage, access, and interpret this quantity of data?
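These figures sketch a roughly exponential trend. As a quick sanity check, here is a minimal sketch (Python, using only the four order-of-magnitude estimates quoted above) of the implied growth rate:

```python
import math

# Rough growth-rate check based only on the estimates quoted above.
data_pb = {2000: 0.5, 2007: 10, 2013: 100, 2020: 1000}  # petabytes

first, last = min(data_pb), max(data_pb)
# Compound annual growth factor between the first and last estimates.
factor = (data_pb[last] / data_pb[first]) ** (1.0 / (last - first))
doubling_years = math.log(2) / math.log(factor)
print(f"~{factor:.2f}x per year (volume doubles every ~{doubling_years:.1f} years)")
# Prints roughly: ~1.46x per year (volume doubles every ~1.8 years)
```

A sustained ~1.5x-per-year growth is what makes the items below (distributed storage, international networks, data-management software) unavoidable.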
Open Science Grid: July 20, 2005
Consortium of many organizations (multiple disciplines)
Production grid cyberinfrastructure
80+ sites, 25,000+ CPUs: US, UK, Brazil, Taiwan
The Open Science Grid Consortium
[Diagram: the Open Science Grid at the hub, connected to:]
  U.S. grid projects
  LHC experiments
  Laboratory centers
  Education communities
  Science projects & communities
  Technologists (network, HPC, …)
  Computer science
  University facilities
  Multi-disciplinary facilities
  Regional and campus grids
Open Science Grid Basics
Who: computational scientists, IT specialists, physicists, biologists, etc.
What: shared computing and storage resources; high-speed production and research networks; a meeting place for research groups, software experts, and IT providers
Vision: maintain and operate a premier distributed computing facility; provide education and training opportunities in its use; expand reach & capacity to meet stakeholder needs; dynamically integrate new resources and applications
Members and partners:
  Members: HPC facilities, campus, laboratory & regional grids
  Partners: interoperation with TeraGrid, EGEE, NorduGrid, etc.
Crucial Ingredients in Building OSG
Science “push”: ATLAS, CMS, LIGO, SDSS
  1999: foresaw overwhelming need for distributed cyberinfrastructure
Early funding: the “Trillium” consortium
  PPDG: $12M (DOE) (1999 – 2006)
  GriPhyN: $12M (NSF) (2000 – 2006)
  iVDGL: $14M (NSF) (2001 – 2007)
  Supplements + new funded projects
Social networks: ~150 people with many overlaps; universities, labs, SDSC, foreign partners
Coordination: pooling resources, developing broad goals
  Common middleware: Virtual Data Toolkit (VDT)
  Multiple grid deployments/testbeds using VDT
  Unified entity when collaborating internationally
  Historically, a strong driver for funding agency collaboration
OSG History in Context
[Timeline, 1999 – 2009]
PPDG (DOE), GriPhyN (NSF), iVDGL (NSF) → Trillium → Grid3 → OSG (DOE+NSF)
European Grid + Worldwide LHC Computing Grid
Campus, regional grids
LHC construction, preparation → LHC ops
LIGO preparation → LIGO operation
Principal Science Drivers
High energy and nuclear physics: several petabytes (2005); 100s of petabytes (LHC, 2007)
LIGO (gravity wave search): 0.5 – several petabytes (2002)
Digital astronomy: 10s of terabytes (2001); 10s of petabytes (2009)
Other sciences coming forward: bioinformatics (10s of petabytes), nanoscience, environmental, chemistry, applied mathematics, materials science?
[Chart axes: data growth vs. community growth, 2001 – 2009]
OSG Virtual Organizations
ATLAS (HEP/LHC): HEP experiment at CERN
CDF (HEP): HEP experiment at Fermilab
CMS (HEP/LHC): HEP experiment at CERN
DES (digital astronomy): Dark Energy Survey
DOSAR (regional grid): regional grid in Southwest US
DZero (HEP): HEP experiment at Fermilab
ENGAGE (engagement effort): a place for new communities
FermiLab (lab grid): HEP laboratory grid
fMRI (fMRI): functional MRI
GADU (bio): bioinformatics effort at Argonne
Geant4 (software): simulation project
GLOW (campus grid): campus grid at U of Wisconsin, Madison
GRASE (regional grid): regional grid in Upstate NY
OSG Virtual Organizations (2)
GridChem (chemistry): quantum chemistry grid
GPN: Great Plains Network (www.greatplains.net)
GROW (campus grid): campus grid at U of Iowa
I2U2 (EOT): E/O consortium
LIGO (gravity waves): gravitational wave experiment
Mariachi (cosmic rays): ultra-high energy cosmic rays
nanoHUB (nanotech): nanotechnology grid at Purdue
NWICG (regional grid): Northwest Indiana regional grid
NYSGRID (NY State grid): www.nysgrid.org
OSGEDU (EOT): OSG education/outreach
SBGRID (structural biology): structural biology at Harvard
SDSS (digital astronomy): Sloan Digital Sky Survey
STAR (nuclear physics): nuclear physics experiment at Brookhaven
UFGrid (campus grid): campus grid at U of Florida
Partners: Federating with OSG
Campus and regional:
  Grid Laboratory of Wisconsin (GLOW)
  Grid Operations Center at Indiana University (GOC)
  Grid Research and Education Group at Iowa (GROW)
  Northwest Indiana Computational Grid (NWICG)
  New York State Grid (NYSGrid) (in progress)
  Texas Internet Grid for Research and Education (TIGRE)
  nanoHUB (Purdue)
  LONI (Louisiana)
National:
  Data Intensive Science University Network (DISUN)
  TeraGrid
International:
  Worldwide LHC Computing Grid Collaboration (WLCG)
  Enabling Grids for E-SciencE (EGEE)
  TWGrid (from Academia Sinica Grid Computing)
  Nordic Data Grid Facility (NorduGrid)
  Australian Partnerships for Advanced Computing (APAC)
Defining the Scale of OSG: Experiments at the Large Hadron Collider
LHC @ CERN: 27 km tunnel in Switzerland & France
Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM
Physics goals: search for origin of mass, new fundamental forces, supersymmetry, other new particles (2007 – ?)
CMS: “Compact” Muon Solenoid
Inconsequential humans
Collision Complexity: CPU + Storage
All charged tracks with pt > 2 GeV (+30 minimum bias events)
Reconstructed tracks with pt > 25 GeV
10^9 collisions/sec; selectivity: 1 in 10^13
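Taking those two numbers at face value gives a feel for the selection problem; a rough order-of-magnitude reading (arithmetic only, using just the rates quoted above):

\[ 10^{9}\ \mathrm{collisions/s} \times 10^{-13} = 10^{-4}\ \mathrm{selected\ events/s} \approx 10\ \mathrm{per\ day} \]

In other words, the interesting physics has to be dug out of an enormous background, which is what drives the CPU and storage demands on the next slide.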
LHC Data and CPU Requirements (ATLAS, CMS, LHCb)
Storage: raw recording rate 0.2 – 1.5 GB/s; large Monte Carlo data samples; 100 PB by ~2013; 1000 PB later in the decade?
Processing: PetaOps (> 300,000 3 GHz PCs)
Users: 100s of institutes, 1000s of researchers
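To connect the recording rate to the storage totals, a rough estimate (assuming a sustained ~1 GB/s; real accelerator duty cycles and the addition of simulated and derived data change the details):

\[ 1\ \mathrm{GB/s} \times 3.15\times10^{7}\ \mathrm{s/yr} \approx 30\ \mathrm{PB/yr} \]

so a few years of running, multiplied across experiments and data copies, plausibly reaches the ~100 PB scale quoted for ~2013.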
OSG and LHC Global Grid: the CMS experiment’s tiered data model
Tier 0: CERN Computer Center, fed by the online system at 200 – 1500 MB/s
Tier 1: national centers (FermiLab, Korea, Russia, UK)
Tier 2: university centers (Caltech, UCSD, U Florida, Iowa, Maryland)
Tier 3 / Tier 4: institute clusters, physics caches, and PCs (e.g. FIU)
Inter-tier network links of 2.5 – 40 Gb/s (>10 Gb/s on the major paths); OSG provides the US portion
5000 physicists, 60 countries
10s of petabytes/yr by 2009; CERN / outside = 10 – 20%
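A rough transfer-time estimate shows why multi-10 Gb/s links matter at these volumes (idealized arithmetic, assuming a fully utilized 10 Gb/s path with no protocol overhead):

\[ \frac{1\ \mathrm{PB} \times 8\ \mathrm{bits/byte}}{10\ \mathrm{Gb/s}} = 8\times10^{5}\ \mathrm{s} \approx 9\ \mathrm{days\ per\ petabyte} \]

so moving tens of petabytes a year between tiers is feasible, but only just, on links of this class.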
LHC Global Collaborations (ATLAS, CMS)
2000 – 3000 physicists per experiment; USA is 20 – 31% of total
LIGO: Search for Gravity Waves
LIGO Grid: 6 US sites, 3 EU sites (UK & Germany: Cardiff, Birmingham, AEI/Golm)
* LHO, LLO: LIGO observatory sites
* LSC: LIGO Scientific Collaboration
Sloan Digital Sky Survey: Mapping the Sky
Bioinformatics: GADU / GNARE (Genome Analysis Research Environment)
GADU, using the Grid (TeraGrid, OSG, DOE SG): applications are executed on the Grid as workflows and results are stored in an integrated database.
GADU performs:
  Acquisition: acquire genome data from a variety of publicly available databases and store it temporarily on the file system
  Analysis: run different publicly available tools and in-house tools on the Grid, using the acquired data and data from the integrated database
  Storage: store the parsed data acquired from public databases and the parsed results of the tools and workflows used during analysis
Public databases: genomic databases available on the web, e.g. NCBI, PIR, KEGG, EMP, InterPro, etc.
Integrated database: includes parsed sequence data and annotation data from public web sources, plus results of the different analysis tools (BLAST, Blocks, TMHMM, …); data flow is bidirectional
Applications (web interfaces) based on the integrated database:
  PUMA2: evolutionary analysis of metabolism
  Chisel: protein function analysis tool
  TARGET: targets for structural analysis of proteins
  PATHOS: pathogenic DB for bio-defense research
  Phyloblocks: evolutionary analysis of protein families
Services to other groups: SEED (data acquisition), Shewanella Consortium (genome analysis), others
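The acquire → analyze → store cycle above is essentially a data-driven workflow. Below is a minimal sketch of that pattern, assuming hypothetical helper names, paths, and genome identifiers; it is illustrative only, not GADU's actual code:

```python
# Illustrative sketch of the acquire -> analyze -> store pattern described
# above. All names, paths, and identifiers here are placeholders invented
# for this example, not GADU's real interfaces.
from dataclasses import dataclass

PUBLIC_SOURCES = ["NCBI", "PIR", "KEGG"]        # databases named on the slide
ANALYSIS_TOOLS = ["blast", "blocks", "tmhmm"]   # tools named on the slide

@dataclass
class Result:
    genome_id: str
    tool: str
    output_path: str

def acquire(genome_id: str, source: str) -> str:
    """Acquisition: fetch genome data and stage it temporarily on the file system."""
    staged = f"/scratch/{source}/{genome_id}.fasta"
    # ... download from the public database and write to `staged` ...
    return staged

def analyze(genome_id: str, staged_path: str) -> list:
    """Analysis: run each tool against the staged data as a grid workflow job."""
    results = []
    for tool in ANALYSIS_TOOLS:
        out = f"/results/{genome_id}.{tool}.out"
        # ... submit a grid job running `tool` on `staged_path`, writing `out` ...
        results.append(Result(genome_id, tool, out))
    return results

def store(results: list) -> None:
    """Storage: parse tool output and load it into the integrated database."""
    for r in results:
        pass  # ... parse r.output_path and insert records keyed by r.genome_id ...

for gid in ["genome_001", "genome_002"]:        # hypothetical identifiers
    store(analyze(gid, acquire(gid, PUBLIC_SOURCES[0])))
```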
Bioinformatics (cont)
Shewanella oneidensis genome
Nanoscience Simulations: nanoHUB.org
Collaboration; courses, tutorials; online simulation; seminars; learning modules
Real users and real usage: >10,100 users; 1,881 simulation users; >53,000 simulations
OSG Engagement Effort
Purpose: bring non-physics applications to OSG; led by RENCI (UNC + NC State + Duke)
Specific targeted opportunities: develop relationships; direct assistance with the technical details of connecting to OSG
Feedback and new requirements for OSG infrastructure (to facilitate inclusion of new communities): more & better documentation, more automation
OSG and the Virtual Data Toolkit
VDT: a collection of software
  Grid software (Condor, Globus, VOMS, dCache, GUMS, Gratia, …)
  Virtual Data System
  Utilities
VDT: the basis for the OSG software stack
  Goal is easy installation with automatic configuration
  Now widely used in other projects
  Has a growing support infrastructure
Why Have the VDT?
Everyone could download the software from the providers, but the VDT:
  Figures out dependencies between software
  Works with providers for bug fixes
  Automatically configures & packages software
  Tests everything on 15 platforms (and growing):
    Debian 3.1
    Fedora Core 3
    Fedora Core 4 (x86, x86-64)
    RedHat Enterprise Linux 3 AS (x86, x86-64, ia64)
    RedHat Enterprise Linux 4 AS (x64, x86-64)
    ROCKS Linux 3.3
    Scientific Linux Fermi 3
    Scientific Linux Fermi 4 (x86, x86-64, ia64)
    SUSE Linux 9 (IA-64)
VDT Growth Over 5 Years (1.6.1 now)
[Chart: number of major VDT components vs. time, Jan 2002 – Jan 2007, rising from a handful to ~45 across the VDT 1.1.x, 1.2.x, 1.3.x, 1.4.0, 1.5.x, and 1.6.x release series]
  VDT 1.0: Globus 2.0b, Condor-G 6.3.1
  VDT 1.1.8: adopted by LCG
  VDT 1.1.11: Grid2003
  VDT 1.2.0
  VDT 1.3.0
  VDT 1.3.6: for OSG 0.2
  VDT 1.3.9: for OSG 0.4
  VDT 1.6.1: for OSG 0.6.0
  More dev releases; software both added and removed over time
vdt.cs.wisc.edu
Collaboration with Internet2 (www.internet2.edu)
Collaboration with National Lambda Rail (www.nlr.net)
Optical, multi-wavelength, community-owned or leased “dark fiber” (10 GbE) networks for R&E
Spawning state-wide and regional networks (FLR, SURA, LONI, …)
Bulletin: NLR-Internet2 merger announcement
UltraLight: Integrating Advanced Networking in Applications
10 Gb/s+ network: Caltech, UF, FIU, UM, MIT; SLAC, FNAL; international partners; Level(3), Cisco, NLR
Funded by NSF
http://www.ultralight.org
REDDnet: National Networked Storage
NSF-funded project (Vanderbilt)
8 initial sites (Brazil?)
Multiple disciplines: satellite imagery, HEP, Terascale Supernova Initiative, structural biology, bioinformatics
Storage: 500 TB disk, 200 TB tape
OSG Jobs Snapshot: 6 Months (Sep – Mar)
5000 simultaneous jobs from multiple VOs
OSG Jobs Per Site: 6 Months (Sep – Mar)
5000 simultaneous jobs at multiple sites
Completed Jobs/Week on OSG (Sep – Mar)
400K: CMS “Data Challenge”
# Jobs Per VO
New accounting system (Gratia)
Massive 2007 Data Reprocessing by the D0 Experiment @ Fermilab
Processed via SAM, OSG, and LCG: ~400M events total, ~250M on OSG
CDF Discovery of Bs Oscillations
$B_s \leftrightarrow \bar{B}_s$ mixing:
\[ |B_s(t)\rangle = e^{-t/2\tau_s}\left[\cos(\Delta m_s t/2)\,|B_s\rangle + i\,\sin(\Delta m_s t/2)\,|\bar{B}_s\rangle\right] \]
Oscillation frequency $f = \Delta m_s / 2\pi \approx 2.8\ \mathrm{THz}$
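For context, the 2.8 THz figure follows from the measured mixing frequency $\Delta m_s \approx 17.77\ \mathrm{ps^{-1}}$ published by CDF (that value is taken from the CDF measurement, not from this slide; units with $\hbar = 1$):

\[ f = \frac{\Delta m_s}{2\pi} \approx \frac{17.77\ \mathrm{ps^{-1}}}{2\pi} \approx 2.8\times10^{12}\ \mathrm{Hz} = 2.8\ \mathrm{THz} \]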
Communications: International Science Grid This Week (SGTW, now iSGTW)
From April 2005; diverse audience; >1000 subscribers
www.isgtw.org
OSG News: Monthly Newsletter
18 issues by Apr. 2007
www.opensciencegrid.org/osgnews
Grid Summer Schools
Summer 2004, 2005, 2006: 1 week @ South Padre Island, Texas
  Lectures plus hands-on exercises for ~40 students
  Students of differing backgrounds (physics + CS), minorities
Reaching a wider audience:
  Lectures, exercises, video, on the web
  More tutorials, 3 – 4/year
  Students, postdocs, scientists
  Agency-specific tutorials
Project Challenges
Technical constraints:
  Commercial tools fall far short, require (too much) invention
  Integration of advanced CI, e.g. networks
Financial constraints (slide):
  Fragmented & short-term funding injections (recent $30M/5 years)
  Fragmentation of individual efforts
Distributed coordination and management:
  Tighter organization within member projects compared to OSG
  Coordination of schedules & milestones
  Many phone/video meetings, travel
  Knowledge dispersed, few people have a broad overview
Funding & Milestones: 1999 – 2007
[Timeline, 2000 – 2007]
Grid & networking projects: PPDG ($9.5M), GriPhyN ($12M), iVDGL ($14M), UltraLight ($2M), CHEPREO ($4M), DISUN ($10M), OSG ($30M, NSF + DOE)
Milestones: first US-LHC grid testbeds, VDT 1.0, Grid3 start, VDT 1.3, OSG start, LHC start
Large experiments: LHC construction and startup, LIGO Grid
Education, outreach, training: Grid Summer Schools 2004, 2005, 2006; Digital Divide Workshops ’04, ’05, ’06; grid communications
Challenges from Diversity and Growth
Management of an increasingly diverse enterprise:
  Sci/Eng projects, organizations, disciplines as distinct cultures
  Accommodating new member communities (expectations?)
Interoperation with other grids:
  TeraGrid
  International partners (EGEE, NorduGrid, etc.)
  Multiple campus and regional grids
Education, outreach and training:
  Training for researchers, students… but also project PIs, program officers
Operating a rapidly growing cyberinfrastructure:
  25K → 100K CPUs, 4 → 10 PB disk
  Management of and access to rapidly increasing data stores (slide)
  Monitoring, accounting, achieving high utilization
  Scalability of support model (slide)
Rapid Cyberinfrastructure Growth: LHC
[Chart: required LHC computing capacity (MSI2000), 2007 – 2010, growing from roughly 50 to 350, broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by tier (CERN, Tier-1, Tier-2)]
2008: ~140,000 PCs
Meeting LHC service challenges & milestones
Participating in worldwide simulation productions
OSG Operations
Distributed model: scalability! VOs, sites, providers
Rigorous problem tracking & routing
Security, provisioning, monitoring, reporting
Partners with EGEE operations
Five Year Project Timeline & Milestones (2006 – 2011)
Project start (2006); end of Phase I; end of Phase II (2011)
LHC: contribute to the Worldwide LHC Computing Grid; LHC simulations; LHC event data distribution and analysis; support 1000 users; 20 PB data archive
LIGO: contribute to LIGO workflow and data analysis; LIGO data run SC5; Advanced LIGO; LIGO Data Grid dependent on OSG
Other stakeholders: STAR, CDF, D0, astrophysics; D0 reprocessing and simulations; CDF simulation and analysis; STAR data distribution and jobs (10K jobs per day)
Additional science communities: +1 community at successive milestones
Facility security: risk assessment, audits, incident response, management, operations, technical controls (plan v1, first audit, then annual risk assessments and audits)
Facility operations and metrics: increase robustness and scale; operational metrics defined and validated each year
Interoperate and federate with campus and regional grids
VDT and OSG software releases: major release every 6 months, minor updates as needed (VDT 1.4.0, 1.4.1, 1.4.2, …; OSG 0.6.0, 0.8.0, 1.0, 2.0, 3.0, …); VDT incremental updates
Extended capabilities and increased scalability/performance for jobs and data to meet stakeholder needs:
  dCache with role-based authorization; SRM/dCache extensions
  Accounting; auditing
  VDS with SRM; common software distribution with TeraGrid; EGEE using VDT 1.4.x
  Transparent data and job movement with TeraGrid; transparent data management with EGEE
  Federated monitoring and information services; integrated network management
  Data analysis (batch and interactive); workflow
  “Just in time” workload management; VO services infrastructure
  Improved workflow and resource selection
  Work with SciDAC-2 CEDS and security with Open Science
Extra Slides
VDT Release Process (Subway Map)
Gather requirements → Build software → Test → Validation test bed → ITB Release Candidate → VDT Release → Integration test bed → OSG Release
Time: Day 0 → Day N
From Alain Roy
VDT Challenges
How should we smoothly update a production service?
  In-place vs. on-the-side
  Preserve old configuration while making big changes
  Still takes hours to fully install and set up from scratch
How do we support more platforms?
  A struggle to keep up with the onslaught of Linux distributions (Fedora Core 3, 4, 6; RHEL 3, 4; BCCD; …)
  AIX? Mac OS X? Solaris?
How can we accommodate native packaging formats?
  RPM, Deb