GriPhyN Management
Mike Wilde, University of Chicago, Argonne
Paul Avery, University of Florida
GriPhyN NSF Project Review, 29-30 January 2003
Chicago

29 Jan 2003, Mike Wilde, University of Chicago

GriPhyN Management
• Management
– Paul Avery (Florida) co-Director
– Ian Foster (Chicago) co-Director
– Mike Wilde (Argonne) Project Coordinator
– Rick Cavanaugh (Florida) Deputy Coordinator

GriPhyN Management (organization chart)
• External Advisory Committee
• Physics Experiments
• Project Directors: Paul Avery, Ian Foster
• Internet2, DOE Science, NSF PACIs
• EDG, LCG, Other Grid Projects
• iVDGL: Rob Gardner
• Project Coordination: Mike Wilde, Rick Cavanaugh
• Outreach/Education: Manuela Campanelli
• Industrial Connections: Ian Foster / Paul Avery
• Architecture: Carl Kesselman
• VDT Development (Coord.: M. Livny)
– Requirements, Definition & Scheduling (Miron Livny)
– Integration, Testing, Documentation, Support (Alain Roy)
– Globus Project & NMI Integration (Carl Kesselman)
• CS Research (Coord.: I. Foster)
– Virtual Data (Mike Wilde)
– Request Planning & Scheduling (Ewa Deelman)
– Execution Management (Miron Livny)
– Measurement, Monitoring & Prediction (Valerie Taylor)
• Applications (Coord.: R. Cavanaugh)
– ATLAS (Rob Gardner)
– CMS (Rick Cavanaugh)
– LIGO (Albert Lazzarini)
– SDSS (Alexander Szalay)
• Inter-Project Coordination (R. Pordes)
– HICB (Larry Price)
– HIJTB (Carl Kesselman)
– PPDG (Ruth Pordes)
– TeraGrid, NMI, etc. (TBD)
– International (EDG, etc.) (Ruth Pordes)

External Advisory Committee
• Members
– Fran Berman (SDSC Director)
– Dan Reed (NCSA Director)
– Joel Butler (former head, FNAL Computing Division)
– Jim Gray (Microsoft)
– Bill Johnston (LBNL, DOE Science Grid)
– Fabrizio Gagliardi (CERN, EDG Director)
– David Williams (former head, CERN IT)
– Paul Messina (former CACR Director)
– Roscoe Giles (Boston U, NPACI-EOT)
• Met with us 3 times: 4/2001, 1/2002, 1/2003
– Extremely useful guidance on project scope & goals

GriPhyN Project Challenges
• We balance and coordinate
– CS research with “goals, milestones & deliverables”
– GriPhyN schedule/priorities/risks with those of the 4 experiments
– General tools developed by GriPhyN with specific tools developed by 4 experiments
– Data Grid design, architecture & deliverables with those of other Grid projects
• Appropriate balance requires
– Tight management, close coordination, trust
• We have (so far) met these challenges
– But requires constant attention, good will

Meetings in 2000-2001
• GriPhyN/iVDGL meetings
– Oct. 2000: All-hands (Chicago)
– Dec. 2000: Architecture (Chicago)
– Apr. 2001: All-hands, EAC (USC/ISI)
– Aug. 2001: Planning (Chicago)
– Oct. 2001: All-hands, iVDGL (USC/ISI)
• Numerous smaller meetings
– CS-experiment
– CS research
– Liaisons with PPDG and EU DataGrid
– US-CMS and US-ATLAS computing reviews
– Experiment meetings at CERN

Meetings in 2002
• GriPhyN/iVDGL meetings
– Jan. 2002: EAC, Planning, iVDGL (Florida)
– Mar. 2002: Outreach Workshop (Brownsville)
– Apr. 2002: All-hands (Argonne)
– Jul. 2002: Reliability Workshop (ISI)
– Oct. 2002: Provenance Workshop (Argonne)
– Dec. 2002: Troubleshooting Workshop (Chicago)
– Dec. 2002: All-hands technical (ISI + Caltech)
– Jan. 2003: EAC (SDSC)
• Numerous other 2002 meetings
– iVDGL facilities workshop (BNL)
– Grid activities at CMS, ATLAS meetings
– Several computing reviews for US-CMS, US-ATLAS
– Demos at IST2002, SC2002
– Meetings with LCG (LHC Computing Grid) project
– HEP coordination meetings (HICB)

Planning Goals
• Clarify our vision and direction
– Know how to make a difference in science & computing
• Map that vision to each application
– Create concrete realizations of our vision
• Organize as cooperative subteams with specific missions and defined points of interaction
• Coordinate our research programs
• Shape toolkit to meet challenge-problem needs
• “Stop, Look, and Listen” to each experiment’s need
– Excite the customer with our vision
– Balance the promotion of our ideas with a solid understanding of the size and nature of the problems

Project Approach
[Diagram. Axes: time, Process. Labels: CS Research; VDT Development; Application Analysis; Infrastructure Development and Deployment; Challenge Problem Identification; Challenge Problem Solution Development; Challenge Problem Solution Integration; Infrastructure Deployment; IS Deployment.]

Project Activities
• Research
– Experiment Analysis
> Use cases, statistics, distributions, data flow patterns, tools, data types, HIPO
– Vision Refinement
– Attacking the “hard problems”
> Virtual data identification and manipulation
> Advanced resource allocation and execution planning
> Scaling this up to Petascale
– Architectural Refinement
• Toolkit Development
• Integration
– Identify and Address Challenge Problems
– Testbed construction
• Support
• Evaluation

Research Milestone Highlights
• Y1:
– Execution framework
– Virtual data prototypes
• Y2:
– Virtual data catalog w/ glue language
– Integ w/ scalable replica catalog service
– Initial resource usage policy language
• Y3:
– Advanced planning, fault recovery
– Intelligent catalog
– Advanced policy languages
• Y4:
– Knowledge management and location
• Y5:
– Transparency and usability
– Scalability and manageability

Research Leadership Centers
• Virtual Data:
– Chicago (VDC, VDL, KR), ISI (Schema)
– Wisconsin (NeST), SDSC (MCAT, SRB)
• Request Planning
– ISI (algorithms), Chicago (policy), Berkeley (query optimization)
• Request Execution
– Wisconsin
• Fault Tolerance
– SDSC
• Monitoring
– Northwestern
• User interface
– Indiana

Project Status Overview
• Year 1 research fruitful
– Virtual data, planning, execution, integration: demonstrated at SC2001
• Research efforts launched
– 80% focused, 20% exploratory
• VDT effort staffed and launched
– Yearly major release; VDT1 close; VDT2 planned; VDT3-5 envisioned
• Year 2 experiment integrations: high-level plans done; detailed planning underway
• Long-term vision refined and unified

Milestones: Architecture
• Early 2002:
– Specify interfaces for new GriPhyN functional modules
> Request Planner
> Virtual Data Catalog service
> Monitoring service
– Define how we will connect and integrate our solutions, e.g.:
> Virtual data language
> Multiple-catalog integration
> DAGMan graphs
> Policy language
> CAS interaction for policy lookup and enforcement
• Year-end 2002: phased migration to a web-services based architecture

Status: Virtual Data
• Virtual Data
– First version of a catalog structure built
– Integration language “VDL” developed
– Detailed transformation model designed
• Replica location service at Chicago & ISI
– Highly scalable and fault tolerant
– Soft-state distributed architecture
• NeST at UW
– Storage appliance for the Grid
– Treats data transfer as a job step
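
The catalog and transformation model above can be illustrated with a minimal sketch. This is not the GriPhyN Virtual Data Catalog schema or VDL syntax. It only assumes that a catalog records, for each logical file, the transformation and inputs that derive it, so a request can be met from an existing replica or by re-deriving the data. All names here (`Derivation`, `VirtualDataCatalog`, `plan`, the file and transformation names) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record: how one logical file is produced."""
    output: str
    transformation: str
    inputs: tuple = ()


@dataclass
class VirtualDataCatalog:
    """Toy in-memory catalog; a stand-in, not the GriPhyN VDC."""
    derivations: dict = field(default_factory=dict)  # output name -> Derivation
    replicas: set = field(default_factory=set)       # logical files that already exist

    def add(self, derivation):
        self.derivations[derivation.output] = derivation

    def plan(self, target):
        """Return the derivations to run, in order, to materialize `target`."""
        steps, seen = [], set()

        def visit(name):
            if name in self.replicas or name in seen:
                return  # already materialized, or already planned
            seen.add(name)
            d = self.derivations.get(name)
            if d is None:
                raise KeyError(f"no replica or derivation for {name!r}")
            for dep in d.inputs:
                visit(dep)
            steps.append(d)

        visit(target)
        return steps


vdc = VirtualDataCatalog(replicas={"raw.dat"})
vdc.add(Derivation("reco.dat", "reconstruct", ("raw.dat",)))
vdc.add(Derivation("tags.dat", "make_tags", ("reco.dat",)))

print([d.transformation for d in vdc.plan("tags.dat")])
```

Once an intermediate file is registered as a replica, the same request plans fewer steps, which is the point of tracking derivations rather than only files.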

Milestones: Virtual Data
• Year 2:
– Local Virtual Data Catalog Structures (relational)
– Catalog manipulation language (VDL)
– Linkage to application metadata
• Year 3: Handling multi-modal virtual data
– Distributed virtual data catalogs (based on RLS)
– Advanced transformation signatures
– Flat, objects, OODBs, relational
– Cross-modal dependency tracking
• Year 4: Knowledge representation
– Ontologies; data generation paradigms
– Fuzzy dependencies and data equivalence
• Year 5: Finalize Scalability and Manageability

Status: Planning and Execution
• Planning and Execution
– Major strides in execution environment made with Condor, Condor-G, and DAGMan
– DAGs evolving as pervasive job specification model with the virtual data grid
– Large-scale CMS production demonstrated on 3-site wide-area multi-organization grid
– LIGO demonstrated full GriPhyN integration
– Sophisticated policy language for grid-wide resource sharing under design at Chicago
– Knowledge representation research underway at Chicago
– Research in ClassAds explored in Globus context
• Master/worker fault tolerance at UCSD
– Design proposed to extend fault tolerance of Condor masters
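
The DAG-based job specification mentioned above can be illustrated with a small sketch. It is not DAGMan's input format or API. It only assumes that a workflow is a set of jobs with dependencies and that a job may start once every job it depends on has finished. The job names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "generate": set(),
    "simulate": {"generate"},
    "reconstruct": {"simulate"},
    "analyze": {"reconstruct"},
    "archive": {"reconstruct"},
}


def run_in_waves(dag):
    """Group jobs into waves; every job in a wave has all its parents finished."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # a real executor would mark jobs done as they finish
    return waves


for number, wave in enumerate(run_in_waves(workflow), start=1):
    print(number, wave)
```

Jobs in the same wave have no dependency on each other, so an executor could dispatch them to different grid sites in parallel.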

Milestones: Request Planning
• Year 2:
– Prototype planner as a grid service module
– Initial CAS and Policy Language Integration
– Refinement of DAG language with data flow info
• Year 3:
– Policy enhancements: dynamic replanning (based on Grid monitoring), cost alternatives and optimizations
• Year 4:
– Global planning with policy constraints
• Year 5:
– Incremental global planning
– Algorithms evaluated, tuned w/ large-scale simulations

Milestones: Request Execution
• Year 2:
– Request Planning and Execution
> Striving for ever greater resource leverage while increasing both power AND transparency
> Fault tolerance: keeping it all running!
– Initial CAS and Policy Language Integration
– Refinement of DAG language with data flow info
– Resource utilization monitoring to drive planner
• Year 3:
– Resource co-allocation with recovery
– Fault tolerant execution engines
• Year 4:
– Execution adapts to grid resource availability changes
• Year 5:
– Simulation-based algorithm eval and tuning

Status: Supporting Research
• Joint PPDG-GriPhyN Monitoring group
– Meeting regularly
– Use-case development underway
• Research into monitoring, measurement, profiling, and performance prediction
– Underway at NU and ANL
• GRIPE facility for Grid-wide user and host certificate and login management
• GRAPPA portal for end-user science access

Status: Experiments
• ATLAS
– 8-site testgrid in place
– Data and metadata management prototypes evolving
– Ambitious Year-2 plan well refined; will use numerous GriPhyN deliverables
• CMS
– Working prototypes of production and distributed analysis, both with virtual data
– Year-2 plan (simulation production) underway
• LIGO
– Working prototypes of full VDG demonstrated
– Year-2 plan well refined and development underway
• SDSS
– Year-2 plan well refined
– Challenge problem development underway
– Close collaboration with Chicago on VDC

Year 2 Plan: ATLAS
• ATLAS-GriPhyN Challenge Problem I
– ATLAS DC0: 10M events, O(1000) CPUs
– Integration of VDT to provide uniform distributed data access
– Use of GRAPPA portal, possibly over DAGMan
– Demo: ATLAS SW Week, March 2002

Year 2 Plan: ATLAS
• ATLAS-GriPhyN Challenge Problem II
– Virtualization of pipelines to deliver analysis data products: reconstructions and metadata tags
– Full chain production and analysis of event data
– Prototyping of typical physicist analysis sessions
– Graphical monitoring display of event throughput throughout the Grid
– Live update display of distributed histogram population from Athena
– Virtual data re-materialization from Athena
– GRAPPA job submission and monitoring

Year 2 Plan: SDSS
• Challenge Problem 1: Balanced resources
– Cluster Galaxy Cataloging
– Exercises virtual data derivation tracking
• Challenge Problem 2: Compute Intensive
– Spatial Correlation Functions and Power Spectra
– Provides a research base for scientific knowledge search-engine problems
• Challenge Problem 3: Storage Intensive
– Weak Lensing
– Provides challenging testbed for advanced request planning algorithms

Integration of GriPhyN and iVDGL
• Tight integration with GriPhyN
– Testbeds
– VDT support
– Outreach
– Common External Advisory Committee

iVDGL Management & Coordination
[Organization chart. Regions: U.S. Piece, International Piece. Boxes: US Project Directors; US External Advisory Committee; US Project Steering Group; Project Coordination Group; Outreach Team; Core Software Team; Facilities Team; Operations Team; Applications Team; GLUE Interoperability Team; Collaborating Grid Projects (TeraGrid, EDG, Asia, DataTAG, BTEV, LCG?, Bio, ALICE, Geo, D0, PDC, CMS HI, ?); GriPhyN (Mike Wilde).]

Global Context: Data Grid Projects
• U.S. Infrastructure Projects
– GriPhyN (NSF)
– iVDGL (NSF)
– Particle Physics Data Grid (DOE)
– TeraGrid (NSF)
– DOE Science Grid (DOE)
• EU, Asia major projects
– European Data Grid (EDG) (EU, EC)
– EDG related national Projects (UK, Italy, France, …)
– CrossGrid (EU, EC)
– DataTAG (EU, EC)
– LHC Computing Grid (LCG) (CERN)
– Japanese Project
– Korea project

Coordination with US Efforts
• Trillium = GriPhyN + iVDGL + PPDG
• NMI & VDT
• Networking initiatives
– HENP working group within Internet2
– Working closely with National Light Rail
• New proposals

International Coordination
• EU DataGrid & DataTAG
• HICB: HEP Inter-Grid Coordination Board
– HICB-JTB: Joint Technical Board
– GLUE
• Participation in LHC Computing Grid (LCG)
• International networks
– Standing Committee on Inter-regional Connectivity
– Digital Divide projects, IEEAF