STAR Grid Activities, OSG and Beyond


Page 1: STAR Grid Activities, OSG and Beyond

STAR Grid Activities, OSG and Beyond

D. Olson (a) for the STAR Collaboration

The STAR Grid Team: W. Betts (b), L. Didenko (b), T. Freeman (c), P. Jakl (b), L. Hajdu (b), E. Hjort (a), K. Keahey (c), J. Lauret (b), D. Olson (a), A. Rose (a), I. Sakrejda (a), A. Sim (a)

(a) LBNL, (b) BNL, (c) ANL

Page 2: STAR Grid Activities,  OSG and Beyond

Olson, STAR Grid Activities, ISGC 2008, 9 Apr 2008

Abstract

We present the ongoing grid efforts of the STAR experiment within the Open Science Grid (OSG) and beyond, as well as the integration of resources in Europe, Asia and South America. STAR is a founding member of the OSG Consortium and operates several functioning resources on OSG: its main facilities at BNL/RCF and LBNL/NERSC, as well as university sites at Wayne State and Birmingham. Additional resources are in the process of connecting to OSG. Numerous distributed resources used by STAR collaborators employ grid or grid-inspired technologies; common examples are the use of grid job-submission tools through SUMS, the STAR standard workload service, and the use of data handling and transfer tools across grids. To maximize the heterogeneity of usable resources while minimizing in-house platform-support effort, we are thoroughly investigating the dynamic deployment of a reliable data-analysis framework, via a STAR-validated software stack inside Xen virtual machines, leveraging advanced VM technologies and research from the CEDPS project.

Page 3: STAR Grid Activities,  OSG and Beyond


Contents

• Background/History
• Open Science Grid Deployments and Usage
• Other Distributed Computing Usage
• Asian Activities
• Workload Scheduling (SUMS)
• Virtualization & Cloud Computing
• Conclusion

Page 4: STAR Grid Activities,  OSG and Beyond


Background/History

• STAR has been participating in U.S. grid activities since the early days of the Particle Physics Data Grid (1999) and is a founding member of the Open Science Grid.

• Starting with involvement of LBNL and BNL, activities now also include collaborators at Wayne State, MIT, Univ. Chicago, Birmingham, São Paulo, Prague and ANL.

• Additionally:
– SUN Grid, 2007
– MIT Xgrid, 2006+
– Xen, Amazon EC2, 2007+

Page 5: STAR Grid Activities,  OSG and Beyond

[Map of STAR grid sites: PDSF/Berkeley Lab, Brookhaven National Lab, Fermilab, University of Birmingham, Wayne State University]

STAR Grid: 90% of STAR's grid resources are part of the Open Science Grid

Page 6: STAR Grid Activities,  OSG and Beyond

[Map of additional resources: Amazon.com EC2, MIT Xgrid, SunGrid, NPI (Czech Republic)]

Interoperability / outreach: virtualization, VDT extension, SRM / DPM / EGEE

STAR is also reaching out to other grid resources & projects

Page 7: STAR Grid Activities,  OSG and Beyond


Resources used by STAR

6 main dedicated sites (STAR software fully installed):
• BNL Tier0
• NERSC/PDSF Tier1
• WSU (Wayne State University) Tier2
• BHAM (Birmingham, England) Tier2
• UIC (University of Illinois, Chicago) Tier2

Incoming:
• Prague Tier2

Other resources:
• FermiGrid: not STAR-dedicated; simulation production at the 10% level
• SunGrid: commercial (free for STAR); event generation at the 1-2% level
• MIT Xgrid cluster: mainly analysis; working on a Globus gatekeeper for Mac OS X
• Amazon.com EC2 (Elastic Compute Cloud): event generation for now; an exercise in Xen-based virtualization at the 1-2% level

Page 8: STAR Grid Activities,  OSG and Beyond


BeStMan SRM (Berkeley Storage Manager)

• SRM interface with caching for data transfer
• Used for bulk data transfer as well as asynchronous data placement in job workflows
• Expect to deploy a BeStMan-Xrootd interface

http://datagrid.lbl.gov/bestman/
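The asynchronous-placement idea above can be sketched as follows: request the next file while the current one is being analysed, so staging latency is hidden behind computation. This is only an illustrative sketch; `stage_file` and `process` are hypothetical stand-ins for a real SRM cache request and the actual analysis step.

```python
from concurrent.futures import ThreadPoolExecutor

def stage_file(lfn):
    # Hypothetical stand-in for an SRM cache / bring-online request.
    return "/cache/" + lfn

def process(path):
    # Hypothetical stand-in for the analysis step on a staged file.
    return "processed:" + path

def run_workflow(lfns):
    """Overlap staging with processing: while one file is being
    analysed, the staging request for the next is already in flight."""
    if not lfns:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as stager:
        future = stager.submit(stage_file, lfns[0])
        for nxt in lfns[1:] + [None]:
            path = future.result()                       # wait for current file
            if nxt is not None:
                future = stager.submit(stage_file, nxt)  # prefetch the next one
            results.append(process(path))
    return results
```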

Page 9: STAR Grid Activities,  OSG and Beyond


OSG Usage

[Chart: Usage - Process Hours / Week]

Page 10: STAR Grid Activities,  OSG and Beyond


Proof of Principle: Initial Successes and Benefits from OSG

• Year 1 OSG milestone for STAR:
– Migration of 80% or more of the simulation production to OSG-based operation
• Simulation production: 97% efficiency achieved
– Exceeds expectations (we targeted a satisfactory success level between 75% and 85%)
• Sites used are not necessarily STAR-dedicated (FermiGrid)
– Especially: STAR received help from Fermi resources and the FNAL team in June 2007
• several thousand CPU hours loaned on emergency request
• as small as it seems, this help made the difference
– This resource loan worked and is an important proof of principle of the OSG benefit

[Charts: efficiency of job execution via the OSG infrastructure, before and after resubmission]
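The effect of resubmission on overall efficiency can be illustrated with simple arithmetic, under the simplifying assumption that submission attempts are independent; the 80% figure below is illustrative, not a measured STAR rate.

```python
def cumulative_success(p_single, attempts):
    """Fraction of jobs succeeding within `attempts` tries, assuming
    independent attempts with per-try success probability p_single."""
    return 1.0 - (1.0 - p_single) ** attempts

# An 80% per-submission success rate reaches 96% after one resubmission:
print(round(cumulative_success(0.80, 2), 2))  # 0.96
```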

Page 11: STAR Grid Activities,  OSG and Beyond


Other grid/distributed activities

• Xgrid at MIT (Adam Kocoloski, Michael Miller, Levente Hajdu)
– Mac OS X, 50 desktops
– Scavenging spare cycles
– Doing STAR data analysis via SUMS, so the same UI as for local analysis
– Xgrid/Globus job manager in test

• Prague, EGEE Tier2 site (Michal Zerola, Pavel Jakl)
– High-performance data transfer using multiple srmcp to DPM in Prague (next slide)

• SUN Grid
– Production of STAR Geant simulations on SUN utility computing resources

Page 12: STAR Grid Activities,  OSG and Beyond


Data transfer to Prague: parallel srmcp to a DPM storage element, 700 Mbps with 20 threads
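The multi-threaded transfer pattern above can be sketched as below: many independent srmcp invocations run concurrently so their aggregate throughput fills the link. The `srmcp` function here is a hypothetical stand-in for the real client; in practice each call would shell out to the srmcp binary.

```python
from concurrent.futures import ThreadPoolExecutor

def srmcp(src, dst):
    # Hypothetical stand-in for invoking the real srmcp client
    # (e.g. via subprocess); here it just records the transfer.
    return (src, dst)

def parallel_transfer(files, dest_prefix, threads=20):
    """Run independent srmcp transfers concurrently; aggregate
    throughput grows with thread count until the link saturates."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(srmcp, f, dest_prefix + f.rsplit("/", 1)[-1])
                   for f in files]
        return [fut.result() for fut in futures]
```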

Page 13: STAR Grid Activities,  OSG and Beyond


STAR Asian institutions

• China
– IHEP, Beijing (2)
– Institute of Modern Physics, Lanzhou (6)
– USTC, Beijing (14)
– Shanghai Institute of Applied Physics (11)
– Tsinghua University (9)
– Institute of Particle Physics, Wuhan (12)

• India
– Institute of Physics, Bhubaneswar (4)
– Indian Institute of Technology, Mumbai (5)
– University of Jammu (15)
– Panjab University (5)
– University of Rajasthan (3)
– Variable Energy Cyclotron Centre, Kolkata (14)

• Korea
– Pusan National University (4)
– KISTI (in progress as CS collaborator)

Page 14: STAR Grid Activities,  OSG and Beyond


Asian Activities

• Many collaborators in Asia
• Planning for a Tier2-like facility at PNU
• Discussions with KISTI about a possible Tier1-like facility for the Asia region
• Eager to see how we can better interface/integrate with our Asian collaborators on computational aspects

Page 15: STAR Grid Activities,  OSG and Beyond


GLORIAD

• 10 Gb/s all the way through to NY

• Would allow immediate transfer of the full dataset
• Would allow, in later years, transfer of ½ of the dataset
– Possibly more, depending on GLORIAD expansion
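Back-of-the-envelope transfer times at such bandwidths can be worked out as below; the 100 TB dataset size and 70% sustained-rate figure are illustrative assumptions, not STAR's actual numbers.

```python
def transfer_days(dataset_tb, link_gbps, efficiency=0.7):
    """Days needed to move a dataset over a link, assuming a
    sustained fraction `efficiency` of the nominal bandwidth."""
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400.0

# An illustrative 100 TB dataset over a 10 Gb/s link at 70% sustained rate:
print(round(transfer_days(100, 10), 1))  # 1.3
```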

Page 16: STAR Grid Activities,  OSG and Beyond


SUMS

• STAR Unified Meta Scheduler
• A single user interface and framework for submitting to all STAR resources, local and grid flavors
• Optimizes resource utilization

25K jobs/day

Page 17: STAR Grid Activities,  OSG and Beyond


Why Xen? Virtualization?

• SIMULATION (EVENT GENERATION) IS EASY …
– We can all do it …

• BEYOND THAT, the reality:
– Complex experimental application codes
• Developed over more than 10 years by more than 100 scientists; comprise ~2 M lines of C++ and Fortran code
– Require complex, customized environments
• Rely on the right combination of compiler versions and available libraries
• Dynamically load external libraries depending on the task to be performed
– Environment validation
• To ensure reproducibility and result uniformity across environments
• Regression tests cannot be done on all OS flavors due to simple manpower considerations
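The environment-validation step above amounts to checking that a site provides exactly the compiler and library versions the validated stack expects. A minimal sketch, with illustrative version numbers rather than the actual STAR requirements:

```python
def validate_environment(required, found):
    """Compare required compiler/library versions against what a site
    provides; return the list of mismatches (name, wanted, found)."""
    return [(name, want, found.get(name))
            for name, want in sorted(required.items())
            if found.get(name) != want]

# Illustrative version numbers, not the actual STAR requirements.
required = {"gcc": "3.4.6", "root": "5.12.00"}
site     = {"gcc": "3.4.6", "root": "5.14.00"}
print(validate_environment(required, site))  # [('root', '5.12.00', '5.14.00')]
```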

Page 18: STAR Grid Activities,  OSG and Beyond


Why Xen? Virtualization?

• Solution? Use virtual machines (Xen)
– Bring your environment with you
– Fast to deploy; enables short-term leasing
– Excellent enforcement and performance isolation
– Very good security isolation
– Minimizes the experiment team's efforts

• Activity and development effort leveraged through the CEDPS SciDAC partner project

Page 19: STAR Grid Activities,  OSG and Beyond


Deploying OSG Cluster as Workspaces

[Diagram: a pool of nodes managed by the Virtual Workspace Service (VWS); an OSG CE image serves as gatekeeper, and worker-node images carry the application environment]

• The cluster manager can deploy a gatekeeper and worker nodes in ~30 min.
• Application workload is submitted to the cluster as to any other OSG CE.
• The cluster can be retired after the workload finishes, freeing resources for other applications.
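The deploy/run/retire lifecycle above can be mirrored in a toy state machine; the real deployment is done by the workspace service, so the class below is purely an illustrative model, not any actual API.

```python
class WorkspaceCluster:
    """Toy model of the deploy -> run -> retire lifecycle of a virtual
    OSG cluster. The real work is done by the Virtual Workspace Service;
    this only mirrors the state machine described on the slide."""

    def __init__(self, workers):
        self.workers = workers
        self.nodes = []
        self.state = "defined"

    def deploy(self):
        # Boot one gatekeeper image plus N worker-node images.
        self.nodes = ["gatekeeper"] + ["worker-%d" % i for i in range(self.workers)]
        self.state = "running"

    def retire(self):
        # Tear everything down, freeing resources for other applications.
        self.nodes = []
        self.state = "retired"

cluster = WorkspaceCluster(3)
cluster.deploy()
print(cluster.state, len(cluster.nodes))  # running 4
cluster.retire()
```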

Page 20: STAR Grid Activities,  OSG and Beyond


Virtual Machine activities

• "Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers."

• Work so far:
– Xen image with OSG 0.6.0 CE on SL 4.4
– Xen image with OSG 0.6.0 WN on SL 4.4
– Use Globus Workspaces to deploy the gatekeeper and worker nodes on EC2
– Can launch a 100-node cluster in ~30 min.
– Have run Hijing event-generator simulations on EC2
– Have prepared a Xen image with the full STAR software environment on SL 4.4, currently being validated

• Next steps:
– Run event reconstruction of simulations on EC2 and the Teraport cloud

Page 21: STAR Grid Activities,  OSG and Beyond


[Plot: accelerated display of workflow job states across NERSC/PDSF, Amazon.com EC2 and WSU; Y = job number, X = job state]

Page 22: STAR Grid Activities,  OSG and Beyond


VM image build/maintenance

• We are working with rPath, Inc. in an SBIR project to use rBuilder to efficiently build and maintain OS and application images.

• From the inventors of RPM: rBuilder
– http://www.rpath.com/rbuilder
– "rBuilder is the first and only development tool that simplifies and automates the creation of software appliances and virtual appliances. rBuilder combines powerful features with innovative packaging techniques to yield a repeatable appliance creation process."

Page 23: STAR Grid Activities,  OSG and Beyond


Near term plans

• We MUST prepare for real-data production on OSG
– And take ANY shortcut necessary to accomplish it BY 2009
• Onset of DAQ1000, an order of magnitude higher data-acquisition rate than today, will require additional resources for real-data processing
• Virtualization appears to us as one development that helps to easily deploy and run a 2-million-line framework (software) for data mining
– UCM job tracking (SBIR with Tech-X) is maturing
• Essential to engage discussion on integration: we MUST monitor our application

• We have to consolidate our sites
– More resources are available in STAR but not fully used (BHAM and UIC, for example)
• We will ramp up infrastructure support to achieve this
• We hope to leverage OSG efforts in the US (UIC, for example)
– We have efforts integrating Mac OS X resources from MIT
• Initial work was uniquely started in STAR
• Is there a path forward? Depends on priorities …

Page 24: STAR Grid Activities,  OSG and Beyond


Longer term needs

• Requirements driven by demanding data processing
– https://twiki.grid.iu.edu/twiki/bin/view/UserGroup/VOApplicationsRequirements#STAR
– We will need to efficiently share resources
• Concerned about what happens when the LHC has ramped up data taking
• Will there be any cycles left to be had?

• Additionally:
– STAR is expanding its pool of sites
• Interest in sites possibly shared via EGEE-OSG interoperability (especially China)
• Hoping for help from OSG to understand policy as well as technology issues
– We believe virtualization is "a" path forward to:
• Simple deployment of experimental software
• Allowing the experiment's software developers to concentrate on science, with minimal OS-version support
• Globus workload management needed

Page 25: STAR Grid Activities,  OSG and Beyond


Conclusion

• STAR grid usage is expanding geographically and functionally.

• Upgrades at STAR and RHIC are driving a significant increase in computational needs beginning next year, which means we MUST push more workload onto the grid.

• The emergence (and convergence?) of VMs, cloud computing and grids makes for a very powerful paradigm for scientific computing.

• We want (and need) greater involvement with our Asia-Pacific colleagues, enabled by new trans-Pacific networks.