U.S. ATLAS Grid Testbed Status and Plans

Page 1: U.S. ATLAS Grid Testbed Status and Plans

U.S. ATLAS Grid Testbed Status and Plans

Kaushik De

University of Texas at Arlington

DoE/NSF Mid-term Review

NSF Headquarters, June 2002

Page 2: U.S. ATLAS Grid Testbed Status and Plans

June 20, 2002 - Kaushik De - DoE/NSF Review

Outline

Testbed Phase 2 launched: UTA Workshop

http://heppc1.uta.edu/atlas/workshop_april_2002/index.html

New focus on rapid software deployment and grid-based data production, leading to demonstrations at Supercomputing 2002

Kaushik De coordinating U.S. Testbed and SC2002 planning since mid-April 2002

This talk is based on new & evolving plans:

Testbed status

Software distribution

Application toolkit

MC production plans

Monitoring

Grid tools

Integration

SC2002 demos

Page 3: U.S. ATLAS Grid Testbed Status and Plans

Testbed Goals

Demonstrate success of grid computing model for High Energy Physics:

in data production

in data access

in data analysis

Develop, deploy and test grid middleware and applications:

integrate middleware with applications

simplify deployment - robust, rapid & scalable

inter-operate with other testbeds & grid organizations (iVDGL, DataTag…)

provide single point-of-service for grid users

Evolve into a fully functioning, scalable, distributed, tiered grid

Page 4: U.S. ATLAS Grid Testbed Status and Plans

Testbed Website

http://heppc1.uta.edu/atlas/grid-testbed/index.htm

Page 5: U.S. ATLAS Grid Testbed Status and Plans

Grid Testbed Sites

U.S. ATLAS testbed launched February 2001

Sites shown on map:

Lawrence Berkeley National Laboratory

Brookhaven National Laboratory

Indiana University

Boston University

Argonne National Laboratory

U Michigan

University of Texas at Arlington

Oklahoma University

Page 6: U.S. ATLAS Grid Testbed Status and Plans

Testbed Fabric

8 production gatekeepers - ANL, BNL, LBNL, BU, IU, UM, OU, UTA

http://heppc1.uta.edu/atlas/grid-testbed/testbed-sites.htm

Large clusters at BNL, LBNL, IU, UTA, BU:

BNL: RCF, LBNL: PDSF, IU/BU: prototype Tier 2

UTA awarded NSF MRI for acquisition of D0 & ATLAS grid facility ($950k+$400k) - Thanks!

+ Multiple R&D gatekeepers:

gremlin@bnl - iVDGL GIIS

heppc5@uta - ATLAS hierarchical GIIS

atlas10/14@anl - EDG testing

heppc6@uta+gremlin@bnl - glue schema

heppc17/19@uta - GRAT development

few sites - Grappa portal

bnl - VO server

few sites - iVDGL testbed

Page 7: U.S. ATLAS Grid Testbed Status and Plans

Software Distribution

Jason Smith, Kaushik De, Saul Youssef, Wensheng Deng, Shava Smallen

Goals:

Easy installation by System Administrators

Uniform software versions

Pacman perfect for this task

First stage deployment: done - May, 2002

Pacman, Globus 2.0b, cernlib

GRAT application/production package

Second stage deployment:

Magda, Grappa - June, 2002

Tools for distributed production

Third stage:

VDT 1.1.1, Chimera, … - July/August, 2002

Page 8: U.S. ATLAS Grid Testbed Status and Plans

Available Packages

Page 9: U.S. ATLAS Grid Testbed Status and Plans

Applications Team

Horst Severini, Kaushik De, Dan Engh, Wensheng Deng, Ed May

Goal: enable physicists to use the testbed without worrying about underlying middleware or ATLAS software

Athena-Atlfast for grid testbed:

Tool 1: runs on any Globus-enabled node (requires transfer of ~17MB executable package)

Tool 2: runs on grid site where executable package has been preinstalled

Tool 3: runs on afs enabled sites (the latest version of software is built and used)

GRid Applications Toolkit: GRAT

Above plus grid tools - ver 0.1 released 4/12/02

tested successfully on 17 U.S. ATLAS gatekeepers, CMS gatekeeper, D0 gatekeeper, EDG CE node (RH 6.x and RH 7.x), ...

Version 0.3 of GRAT released May 8, 2002

Next, add Magda+ & merge with Grappa
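The three Athena-Atlfast tools differ only in what they require of the target site, so the choice between them can be sketched as a small decision function. This is an illustrative sketch, not GRAT code: the `Site` class, its field names, and the preference order (AFS build first, then preinstalled package, then package transfer) are assumptions made for the example.

```python
from dataclasses import dataclass

EXECUTABLE_PACKAGE_MB = 17  # approximate package size quoted on the slide


@dataclass
class Site:
    """Hypothetical description of a testbed site (not a GRAT data structure)."""
    name: str
    globus_enabled: bool
    has_afs: bool = False
    package_preinstalled: bool = False


def choose_tool(site: Site) -> str:
    """Pick one of the three Athena-Atlfast tools for a site.

    Assumed preference order: avoid the package transfer whenever possible.
    """
    if not site.globus_enabled:
        # Assumption: all three tools are submitted through Globus.
        raise ValueError(f"{site.name}: site is not Globus-enabled")
    if site.has_afs:
        return "Tool 3"  # latest software is built and used from AFS
    if site.package_preinstalled:
        return "Tool 2"  # executable package already on the site
    return "Tool 1"      # ship the ~17MB executable package with the job


def transfer_mb(tool: str) -> int:
    """Data shipped per job for the executable; only Tool 1 transfers it."""
    return EXECUTABLE_PACKAGE_MB if tool == "Tool 1" else 0
```

For example, `choose_tool(Site("example", globus_enabled=True))` falls through to Tool 1, the only option that works on a bare Globus node.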

Page 10: U.S. ATLAS Grid Testbed Status and Plans

GRAT v 0.3

Script based toolkit. Merging now with Grappa visual GUI tool (see Gardner talk)

Page 11: U.S. ATLAS Grid Testbed Status and Plans

Testbed Production

Goals:

Demonstrate distributed ATLAS data production, access and analysis using grid middleware and tools developed by the testbed group

Plans:

Atlfast production to test middleware and tools, and produce physics data for summer students, based on athena-atlfast, using VDT+Magda+Chimera and both GRAT and Grappa

2 weeks to regenerate data, once a month

deploy new tools and middleware each cycle

move away from farm paradigm to grid model

very aggressive schedule - people limited!

DC1 production to test fabric capabilities and produce and access data, using old Fortran code atlsim, atrig and atrecon (see previous talks)

not repeatable - hard to actively test grid software

increase U.S. participation - involve grid testbed

Page 12: U.S. ATLAS Grid Testbed Status and Plans

Atlfast Production

Application: Athena-atlfast

Current version 3.0.1. Next release will be 3.2.0 (official DC1 release)

Middleware: VDT+Magda+Chimera

Interface: GRAT, Grappa

Sites: 8 ATLAS testbed sites, 2 CMS testbed sites, 2 D0 MC farms, EDG sites? TeraGrid sites?

June, 2002: Phase Alpha

Demonstrate software deployment and simple production system - done

Page 13: U.S. ATLAS Grid Testbed Status and Plans

Summer Schedule

July 1-15: Phase 0, 10^7 events

Globus 2.0 beta, Athena 3.0.1, Grappa, common disk model, Magda, 5 physics processes, BNL VO manager, minimal job scheduler, GridView monitoring

August 5-19: Phase 1, 10^8 events

VDT 1.1.1, Hierarchical GIIS server, Athena-atlfast 3.2.0, Grappa, Magda - data & replica management with metadata catalogue, 10 physics processes, static MDS based job scheduler, new visualization

September 2-16: Phase 2, 10^9 events, 1 TB storage, 40k files

Athena-atlfast 3.2.0 instrumented, 20 physics processes, upgraded BNL VO manager, dynamic job scheduler, fancy monitoring

Need some planning of analysis tools
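The Phase 2 targets imply some per-event and per-file averages that are useful when sizing storage elements. The short calculation below derives them from the quoted figures only; it assumes decimal units (1 TB = 10^12 bytes) and that the 1 TB and 40k files refer to the full 10^9-event sample, neither of which the slide states explicitly.

```python
# Event targets per phase, as quoted in the summer schedule
PHASE_EVENTS = {"Phase 0": 10**7, "Phase 1": 10**8, "Phase 2": 10**9}

PHASE2_STORAGE_BYTES = 10**12  # 1 TB, assuming decimal units
PHASE2_FILES = 40_000          # "40k files"


def phase2_averages() -> dict:
    """Average sizes implied by the Phase 2 targets."""
    events = PHASE_EVENTS["Phase 2"]
    return {
        "bytes_per_event": PHASE2_STORAGE_BYTES / events,
        "events_per_file": events / PHASE2_FILES,
        "megabytes_per_file": PHASE2_STORAGE_BYTES / PHASE2_FILES / 10**6,
    }


if __name__ == "__main__":
    for name, value in phase2_averages().items():
        print(f"{name}: {value:g}")
```

Under these assumptions the averages come out to about 1 kB per event, 25,000 events per file, and 25 MB per file, with the event count growing tenfold at each phase.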

Page 14: U.S. ATLAS Grid Testbed Status and Plans

Atlfast Production Architecture

[Architecture diagram. Components shown: User; Grappa Portal or GRAT script; Resource Broker; Boxed Athena-Atlfast; JobOptions (Higgs, SUSY, QCD, Top, W/Z); Compute Sites; Magda; VDC; MDS; Globus; Storage Elements]

Page 15: U.S. ATLAS Grid Testbed Status and Plans

Monitoring Team

Dantong Yu, Patrick McGuigan, Craig Tull, Kaushik De, Shawn McKee, Dan Engh, Jason Smith

Monitoring is critically important in distributed Grid computing:

check system health, debug problems

discover resources using static data

job scheduling and resource allocation decisions using dynamic data from MDS and other monitors

Testbed monitoring priorities:

Discover site configuration

Discover software installation

Application monitoring

Grid status/operations monitoring

Also need:

Well defined data for job scheduling

Visualization
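The split between static data (for resource discovery) and dynamic data (for scheduling decisions) can be illustrated with a toy site selector. This is a minimal sketch under stated assumptions: the record fields (`software`, `free_cpus`), the ranking rule, and the numbers in the usage example are invented for illustration, and no MDS query is performed.

```python
from typing import Dict, List, Optional


def pick_site(sites: List[Dict], required_software: str) -> Optional[str]:
    """Choose a site for one job.

    Static data (installed software) filters the candidates;
    dynamic data (currently free CPUs) ranks them.
    """
    candidates = [
        s for s in sites
        if required_software in s["software"] and s["free_cpus"] > 0
    ]
    if not candidates:
        return None  # nothing suitable right now; caller may retry later
    return max(candidates, key=lambda s: s["free_cpus"])["name"]


if __name__ == "__main__":
    # Made-up snapshot; a real scheduler would fill this from monitoring data.
    snapshot = [
        {"name": "UTA", "software": {"athena-atlfast"}, "free_cpus": 4},
        {"name": "BNL", "software": {"athena-atlfast"}, "free_cpus": 12},
        {"name": "OU", "software": set(), "free_cpus": 20},
    ]
    print(pick_site(snapshot, "athena-atlfast"))
```

A selector like this is only as good as its inputs, which is why well defined data for job scheduling appears in the list of needs above.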

Page 16: U.S. ATLAS Grid Testbed Status and Plans

Monitoring - Back End

Publishing MDS information:

Glue schema - BNL & UTA

Pippy - Pacman information service provider

BNL ACAS schema

Hierarchical GIIS server

Non-MDS back ends:

iPerf, Netlogger, Prophesy, Ganglia

Archiving:

MySQL

GridView, BNL ACAS

RRD Network

Work needed:

What to store?

Replication of archived information

Good progress on back end!

Page 17: U.S. ATLAS Grid Testbed Status and Plans

Monitoring - Front End

MDS based:

GridView, Gridsearcher

Converting TeraGrid and other toolkits

Non-MDS:

Cricket, Ganglia

Work needed:

Urgent for SC2002! Graphs, maps, drill-down…

New visualization team: Dantong Yu (evaluation of existing tools), Patrick McGuigan (Java CoG, Python), Jason Smith (PHP)

Page 18: U.S. ATLAS Grid Testbed Status and Plans

GridView 2.2

Simple visualization tool using Globus Toolkit

First native Globus application for ATLAS grid (March 2001)

Collects information using Globus tools. Archival information is stored in MySQL server on a different machine. Data published through web server on a third machine.

http://heppc1.uta.edu/atlas/grid-status/index.html
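The collect, archive, publish split described for GridView can be sketched in a few lines. This is not GridView's code: SQLite from the Python standard library stands in for the MySQL server, the table layout and function names are hypothetical, and the collection step (done with Globus tools in GridView) is replaced by explicit calls to `archive_status`.

```python
import sqlite3
import time


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open the archive; SQLite is a stand-in for the MySQL server."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS site_status "
        "(ts REAL, site TEXT, gatekeeper_up INTEGER)"
    )
    return conn


def archive_status(conn, site, up, ts=None):
    """Store one status sample (the 'archival information' step)."""
    stamp = time.time() if ts is None else ts
    conn.execute(
        "INSERT INTO site_status VALUES (?, ?, ?)", (stamp, site, int(up))
    )
    conn.commit()


def latest_status(conn) -> dict:
    """Return the most recent sample per site."""
    rows = conn.execute(
        "SELECT s.site, s.gatekeeper_up FROM site_status s "
        "JOIN (SELECT site, MAX(ts) AS ts FROM site_status GROUP BY site) m "
        "ON s.site = m.site AND s.ts = m.ts"
    ).fetchall()
    return {site: bool(up) for site, up in rows}


def render_text(status: dict) -> str:
    """Format the latest status for publication (the web-server step)."""
    return "\n".join(
        f"{site}: {'UP' if up else 'DOWN'}" for site, up in sorted(status.items())
    )
```

Keeping the three steps behind separate functions mirrors the three-machine layout on the slide: the collector, the archive, and the publisher only share the stored samples.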

Page 19: U.S. ATLAS Grid Testbed Status and Plans

Testbed Tools

Many tools developed by the U.S. ATLAS testbed group during past year

GridView - simple tool to monitor status of testbed - Kaushik De, Patrick McGuigan

Gripe - unified user accounts - Rob Gardner

Magda - MAnager for Grid DAta - Torre Wenaus, Wensheng Deng (see Gardner & Wenaus talks)

Pacman - package management and distribution tool - Saul Youssef

Being widely used or adopted by iVDGL VDT, Ganga, and others (see Gardner talk)

Grappa - web portal using active notebook technology - Shava Smallen (see Gardner talk)

GRAT - GRid Application Toolkit

Gridsearcher - MDS browser - Jennifer Schopf

GridExpert - Knowledge Database - Mark Sosebee

VO Toolkit - Site AA - Rich Baker (see Baker talk)

......

Page 20: U.S. ATLAS Grid Testbed Status and Plans

Integration!!

Coordination with other grid efforts and software developers - very difficult task!

Project centric:

GriPhyN/iVDGL - Rob Gardner

PPDG - Torre Wenaus

EDG - Ed May, Jerry Gieraltowski

ATLAS/LHCb - Rich Baker

ATLAS/CMS - Kaushik De

ATLAS/D0 - Jae Yu

Fabric/Middleware centric:

AFS software installations - Alex Undrus, Shane Canon, Iwona Sakrejda

Networking - Shawn McKee, Rob Gardner

Virtual and Real Data Management - Wensheng Deng, Sasha Vaniachin, Pavel Nevski, David Malon, Rob Gardner, Dan Engh, Mike Wilde, Yong Zhao, Shava Smallen

Security/Site AA/VO - Rich Baker, Dantong Yu

Page 21: U.S. ATLAS Grid Testbed Status and Plans

SC2002 Plans

SC2002 in Maryland, mid-November

Testbed Production demo (BNL) - Kaushik De

Monitor/interact with grid production

ATLAS/CMS demo (FNAL/SLAC) - Kaushik De

preliminary discussions with CMS

may become iVDGL demo (see Gardner talk)

ATLAS GRAT already running at CMS sites

GridView is monitoring two CMS sites

Application monitoring (LBNL) - Craig Tull

Athena + Netlogger + Prophesy

Virtual data demo (ANL/UC/IU) - Rob Gardner

Common areas:

Brochure - Rob Gardner

Posters - Craig Tull

Common script - Jennifer Schopf

Page 22: U.S. ATLAS Grid Testbed Status and Plans

Testbed Production Demo. (in BNL booth)

ATLAS physics story

ATLAS computing story

Visualize production:

Monitor site status

static - glue, pippy

dynamic - jobs, cpu usage

Monitor data status

magda - visual?

VDC (same as IU booth)

Monitor applications

Athena instrumented (same as LBNL booth)

Event display?

First version at LBNL US Computing meeting July 29-31

Page 23: U.S. ATLAS Grid Testbed Status and Plans

SC2002 Demo

ATLAS-CMS Demo. Architecture

[Architecture diagram. Boxes: ATLAS-CMS User Job; Scheduling Policy; Production Jobs; ATLAS-CMS Testbed; Visualization (status, physics). Technology labels: MOP, GRAT, Grappa; Condor, Python?; Globus, Condor-G?; MDS, Ganglia, Paw/Root; ??]

Page 24: U.S. ATLAS Grid Testbed Status and Plans

Summary

Testbed -> SC2002:

Recently refocused testbed activities and plans

Important grid-based production milestone this summer to test middleware using light-weight layered approach to software deployment

Testbed production should naturally lead to Supercomputing 2002 demos

Exploring various integration and cooperation issues - no need to reinvent the wheel

The testbed can provide a lot of resources, hardware and people, when fully grid-enabled

In summary - hardware not limiting problem yet! Middleware coming along. Need serious work on integration and deployment and testing. Shortage of people critical here - lab and university base funding shortages are the limiting factors!!