National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)


Page 1: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

National Grid Cyberinfrastructure

Open Science Grid (OSG) and TeraGrid (TG)

Page 2: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Introduction

• What we’ve already learned so far
  – What grids are, why we want them, and who is using them (Intro)
  – Grid Authentication and Authorization
  – Harnessing CPU cycles with Condor
  – Data Management and the Grid

• In this lecture
  – Fabric-level infrastructure: grid building blocks
  – National grid efforts in the US
    • Open Science Grid
    • TeraGrid

Page 3: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Grid Resources in the US

OSG

• Research participation
  – Majority from physics: Tevatron, LHC, STAR, LIGO; used by 10 other (small) research groups
  – 90 members, 30 VOs
  – Contributors: 5 DOE labs (BNL, Fermilab, NERSC, ORNL, SLAC), 65 universities, and 5 partner campus/regional grids

• Accessible resources
  – 43,000+ cores
  – 6 petabytes of disk cache
  – 10 petabytes of tape storage
  – 14 internetwork partnerships

• Usage
  – 15,000 CPU wall-clock days/day
  – 1 petabyte of data distributed/month
  – 100,000 application jobs/day
  – 20% of cycles delivered through resource sharing (opportunistic use)

TeraGrid

• Research participation
  – Support for Science Gateways
  – Over 100 scientific data collections (discipline-specific databases)
  – Contributors: 11 supercomputing centers (Indiana, LONI, NCAR, NCSA, NICS, ORNL, PSC, Purdue, SDSC, TACC and UC/ANL)

• Computational resources
  – > 1 petaflop of computing capability
  – 30 petabytes of storage (disk and tape)
  – Dedicated high-performance internet connections (10G)
  – 750 TFLOPS (161K cores) in parallel computing systems, and growing

Page 4: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

OSG vs. TG

Computational resources
  OSG: 43K cores across 80 institutions
  TG: 161K cores across 11 institutions and 22 systems

Storage
  OSG: a shared file system is not mandatory, so applications need to be aware of this and stage their own data (see the sketch after this table)
  TG: a shared file system (NFS, PVFS, GPFS, Lustre) on each system, and even a WAN GPFS mounted across most systems

Accessibility
  OSG: private IP space for compute nodes; no interactive sessions; supports Condor throughout; supports GT2 (and a few GT4); the firewall is locked down
  TG: more compute nodes on public IP space; support for interactive sessions on login and compute nodes; supports GT2 and GT4 for remote access, and mostly PBS/SGE and some Condor for local access; 10K ports open in the TG firewall on login and compute nodes
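Because a shared file system is optional on OSG, jobs are typically written to carry their data with them. Below is a minimal sketch of that pattern using Condor's file-transfer mechanism (Condor was covered earlier in this series); the script and file names are hypothetical.

```python
#!/usr/bin/env python
# Minimal sketch: a Condor submit description that stages its own input and
# output, so the job works even where no shared file system is provided.
# "analyze.sh" and "input.dat" are made-up names.
import subprocess

submit_description = """universe              = vanilla
executable            = analyze.sh
arguments             = input.dat
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files  = input.dat
output                = job.out
error                 = job.err
log                   = job.log
queue
"""

with open("analyze.sub", "w") as f:
    f.write(submit_description)

# Hand the job to the local Condor scheduler (assumes condor_submit is on PATH).
subprocess.check_call(["condor_submit", "analyze.sub"])
```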

Page 5: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Layout of Typical Grid Site

[Diagram: Computing fabric + grid middleware (Globus, Condor, ...) + grid-level services => a grid site. Site components: Compute Element, Storage Element, User Interface, authorization server, monitoring element. Grid-wide pieces: monitoring clients and services, data management services, and grid operations, connecting the site to The Grid.]

Page 6: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Grid Monitoring & Information Services

Page 7: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

To efficiently use a Grid, you must locate and monitor its resources.

• Check the availability of different grid sites
• Discover different grid services
• Check the status of jobs
• Make better scheduling decisions with information maintained on the "health" of sites (as sketched below)
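As a toy illustration of the last point, the sketch below feeds monitored site "health" into a scheduling decision. The site records and thresholds are invented; a real setup would obtain them from the grid information and monitoring services described on the following slides.

```python
# Toy sketch: pick a site using monitored health information before submitting
# work. The site records below are hypothetical stand-ins for what a grid
# information/monitoring service would report.

sites = [
    {"name": "SITE_A", "services_ok": True,  "free_cpus": 220, "free_disk_gb": 900},
    {"name": "SITE_B", "services_ok": False, "free_cpus": 800, "free_disk_gb": 5000},
    {"name": "SITE_C", "services_ok": True,  "free_cpus": 40,  "free_disk_gb": 120},
]

def pick_site(sites, need_cpus, need_disk_gb):
    """Return the healthy site with the most free CPUs that meets the request."""
    usable = [s for s in sites
              if s["services_ok"]
              and s["free_cpus"] >= need_cpus
              and s["free_disk_gb"] >= need_disk_gb]
    return max(usable, key=lambda s: s["free_cpus"]) if usable else None

best = pick_site(sites, need_cpus=100, need_disk_gb=500)
print("submit to:", best["name"] if best else "no suitable site")
```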

Page 8: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Open Science Grid Overview

Page 9: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

The Open Science Grid Consortium brings:

• grid service providers:
  – middleware developers
  – cluster, network and storage administrators
  – local-grid communities

• grid consumers:
  – global collaborations
  – single researchers
  – campus communities
  – under-served science domains

into a cooperative infrastructure to share and sustain a common heterogeneous distributed facility in the US and beyond.

Page 10: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)
Page 11: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

OSG Snapshot

• 96 resources across production & integration infrastructures
• 20 Virtual Organizations + 6 operations VOs; includes 25% non-physics
• ~20,000 CPUs (from 30 to 4,000)
• ~6 PB of tape
• ~4 PB of shared disk

Snapshot of jobs on OSG, sustained through OSG submissions:
• 3,000-4,000 simultaneous jobs
• ~10K jobs/day
• ~50K CPU-hours/day
• Peak test loads of 15K jobs a day
• Using production & research networks

Page 12: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

OSG sits in the middle of an environment of a grid-of-grids, from local to global infrastructures: inter-operating and co-operating campus, regional, community, national, and international grids, with virtual organizations doing research & education.

Page 13: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Overlaid by virtual computational environments serving single researchers up to large groups, from local to worldwide.

Page 14: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

OSG Grid Monitoring

Page 15: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Open Science Grid

Page 16: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Virtual Organization Resource Selector (VORS): http://vors.grid.iu.edu/

• Custom web interface to a grid scanner that checks services and resources on:
  – each Compute Element
  – each Storage Element

• Very handy for checking:
  – paths of installed tools on worker nodes
  – location & amount of disk space for planning a workflow
  – troubleshooting when an error occurs
  (a worker-node sanity-check sketch follows)
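The same checks VORS reports can be repeated from inside a job wrapper before a workflow starts. A small sketch, assuming Python is available on the worker node; the tool name (blastall) and the 10 GB scratch threshold are only examples.

```python
# Sanity checks a job wrapper might run on a worker node: is the tool we need
# actually installed, and is there enough scratch space for the workflow?
# "blastall" and the 10 GB threshold are hypothetical.
import shutil

tool = shutil.which("blastall")                 # path of the installed tool, or None
free_gb = shutil.disk_usage(".").free / 1e9     # free space in the job's work dir

if tool is None or free_gb < 10:
    raise SystemExit("worker node not usable (tool=%s, %.1f GB free)" % (tool, free_gb))

print("found %s, %.0f GB free - proceeding" % (tool, free_gb))
```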

Page 17: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

VORS entry for OSG_LIGO_PSU

Gatekeeper: grid3.aset.psu.edu

(Source: Quick Start Guide to the OSG, OSG Consortium Meeting, March 2007)

Page 18: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Gratia, the job accounting system: http://gratia-osg.fnal.gov:8880/gratia-reporting/


Page 19: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

How do you join the OSG? A software perspective

(based on Alain Roy’s presentation)

Page 20: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Joining OSG

• Assumption:– You have a campus grid

• Question: – What changes do you need to make to join OSG?

Page 21: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Your Campus Grid

• Assuming that you have a cluster with a batch system:
  – Condor
  – Sun Grid Engine
  – PBS/Torque
  – LSF

Page 22: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Administrative Work

• You need a security contact
  – who will respond to security concerns

• You need to register your site

• You should have a web page about your site
  – this will be published, so people can learn about your site

Page 23: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Big Picture

• Compute Element (CE)
  – OSG jobs are submitted to the CE, which hands them to the batch system (see the sketch after this list)
  – also hosts information services and lots of support software

• Shared file system
  – OSG requires a couple of directories to be mounted on all worker nodes

• Storage Element (SE)
  – how you manage the storage at your site
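For the first bullet, the sketch below shows a job reaching a CE the pre-WS GRAM way. It assumes a valid grid proxy already exists (grid-proxy-init) and that the Globus client tools are on the PATH; the gatekeeper is the one shown on the VORS slide earlier, and the jobmanager name is illustrative.

```python
# Hand a trivial job to a Compute Element: the GT2 gatekeeper accepts it and
# forwards it to the site's batch system via the named jobmanager.
# Assumes `grid-proxy-init` has been run and globus-job-run is installed.
import subprocess

gatekeeper = "grid3.aset.psu.edu/jobmanager-condor"   # CE contact string (jobmanager illustrative)
subprocess.check_call(["globus-job-run", gatekeeper, "/bin/hostname"])
```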

Page 24: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Installing Software

• The OSG software stack is based on the VDT
  – the majority of the software you’ll install
  – it is grid independent

• OSG software stack = VDT + OSG-specific configuration

• Installed via Pacman

Page 25: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

What is installed?

• GRAM: allows job submissions

• GridFTP: allows file transfers

• CEMon/GIP: publishes site information

• Some authorization mechanism
  – grid-mapfile: a file that lists authorized users (parsed in the sketch below), or
  – GUMS (grid identity mapping service)

• And a few other things…
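For the grid-mapfile option mentioned above, the file is a list of lines mapping a certificate subject (DN) in quotes to a local account. A small sketch of reading it; the path shown is the conventional default and the example entry is hypothetical.

```python
# Read a grid-mapfile: each non-comment line maps a quoted certificate
# subject (DN) to a local Unix account, e.g.
#   "/DC=org/DC=doegrids/OU=People/CN=Jane Doe 12345" osgusr01   (hypothetical)

def parse_grid_mapfile(path="/etc/grid-security/grid-mapfile"):
    mapping = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            dn, _, account = line.rpartition('"')     # split at the closing quote
            mapping[dn.lstrip('"')] = account.strip()
    return mapping

if __name__ == "__main__":
    for dn, account in parse_grid_mapfile().items():
        print(account, "<-", dn)
```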

Page 26: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

OSG Middleware

The middleware stack, from applications down to infrastructure:

• User science codes and interfaces
• VO middleware (examples: HEP - data and workflow management; Biology - portals and databases; Astrophysics - data replication)
• OSG Release Cache: OSG-specific configurations, utilities, etc.
• Virtual Data Toolkit (VDT): core technologies + software needed by stakeholders; many components shared with EGEE
• Core grid technology distributions: Condor, Globus, MyProxy (shared with TeraGrid and others)
• Existing operating systems, batch systems and utilities

Page 27: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Picture of a basic site

Page 28: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Shared file system

• OSG_APP
  – for users to store applications

• OSG_DATA
  – a place to store data
  – highly recommended, not required

• OSG_GRID
  – software needed on worker nodes
  – not required
  – may not exist on non-Linux clusters

• Home directories for users
  – not required, but often very convenient

(A short usage sketch follows this list.)
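A sketch of how a job typically uses these directories: read VO-installed software from $OSG_APP, and copy results into $OSG_DATA only if the site provides it and it is writable. The VO name "myvo" and the file names are made up.

```python
# Use the OSG directories named above from inside a job.
import os
import shutil

app_dir  = os.environ.get("OSG_APP", "/tmp")          # VO-installed software lives here
data_dir = os.environ.get("OSG_DATA", os.getcwd())    # recommended but not required

tool = os.path.join(app_dir, "myvo", "bin", "analyze")  # hypothetical VO tool
print("would run:", tool)

# Produce a (dummy) result and copy it to OSG_DATA if that area is writable;
# otherwise leave it in the job's local scratch directory for output transfer.
with open("result.dat", "w") as f:
    f.write("example output\n")

if os.path.isdir(data_dir) and os.access(data_dir, os.W_OK):
    shutil.copy("result.dat", data_dir)
    print("stored result in", data_dir)
else:
    print("keeping result in local scratch")
```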

Page 29: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Storage Element

• Some folks require more sophisticated storage management
  – How do worker nodes access data?
  – How do you handle terabytes (petabytes?) of data?

• Storage Elements are more complicated
  – more planning is needed
  – some are complex to install and configure

• Two OSG-supported SRM options (a transfer sketch follows):
  – dCache
  – BeStMan
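Moving data to or from an SE usually goes through GridFTP (globus-url-copy, part of the stack listed earlier) or an SRM client layered on top of it. A minimal sketch with a hypothetical SE host and path; a valid grid proxy is assumed.

```python
# Copy a local result to a Storage Element over GridFTP.
# Host and destination path are hypothetical; SRM clients (e.g. for dCache or
# BeStMan) add space management on top of the same kind of transfer.
import subprocess

src = "file:///home/user/result.dat"
dst = "gsiftp://se.example.edu/data/myvo/result.dat"
subprocess.check_call(["globus-url-copy", src, dst])
```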

Page 30: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

More information

• Site planning
  – https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/SitePlanning

• Installing the OSG software stack
  – https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/
  – Tutorial: http://www.mcs.anl.gov/~bacon/osgedu/sa_intro.html

Page 31: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Genome Analysis and Database Update system

• Runs across TeraGrid and OSG. Uses the Virtual Data System (VDS) for workflow & provenance.

• Passes through public DNA and protein databases for new and newly updated genomes of different organisms and runs BLAST, Blocks, and Chisel. 1,200 users of the resulting database.

• Request: 1,000 CPUs for 1-2 weeks, once a month, every month. On OSG at the moment: >600 CPUs and 17,000 jobs a week (see the fan-out sketch below).
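The workload described above is a classic fan-out: one monthly pass is split into many independent jobs so hundreds of CPUs can run at once. A hypothetical sketch using Condor's $(Process) macro; the chunk count, script and file names are invented (GADU itself drives this through VDS workflows).

```python
# Fan out one monthly database pass into many independent jobs, one per chunk.
# Chunk count, script name, and file layout are hypothetical.
import subprocess

n_chunks = 1000

submit = """universe              = vanilla
executable            = run_blast.sh
arguments             = chunk_$(Process).fasta
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files  = chunk_$(Process).fasta
output                = blast_$(Process).out
error                 = blast_$(Process).err
log                   = gadu.log
queue %d
""" % n_chunks

with open("gadu.sub", "w") as f:
    f.write(submit)

subprocess.check_call(["condor_submit", "gadu.sub"])   # assumes Condor is available
```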

Page 32: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Summary of OSG

• Provides core services, software and a distributed facility for an increasing set of research communities.

• Helps VOs access resources on many different infrastructures.

• Interested in collaborating and contributing our experience and efforts.

Page 33: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

[Figure: Computational resources by site (sizes approximate, not to scale): SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Tennessee, LONI/LSU; 504 TF in 2007, ~1 PF in 2008. Source: Tommy Minyard, TACC.]

Page 34: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

The TeraGrid Facility

• Grid Infrastructure Group (GIG)
  – University of Chicago
  – TeraGrid integration, planning, management, and coordination
  – Organized into areas (not VOs, as in OSG):
    • User Services
    • Operations
    • Gateways
    • Data/Visualization/Scheduling
    • Education, Outreach & Training
    • Software Integration

• Resource Providers (RP)
  – e.g., NCSA, SDSC, PSC, Indiana, Purdue, ORNL, TACC, UC/ANL
  – systems (resources, services) support and user support
  – provide access to resources via policies, software, and mechanisms coordinated by and provided through the GIG

Page 35: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

11 Resource Providers, One Facility

[Map: Resource Providers: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, LONI, NICS. Software integration partners: Caltech, USC/ISI, UNC/RENCI, UW. Grid Infrastructure Group: UChicago.]

Page 36: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

TeraGrid Hardware Components

• High-end compute hardware
  – Intel/Linux clusters
  – Alpha SMP clusters
  – IBM POWER3 and POWER4 clusters
  – SGI Altix SMPs
  – Sun visualization systems
  – Cray XT3
  – IBM Blue Gene/L

• Large-scale storage systems
  – hundreds of terabytes of secondary storage

• Visualization hardware

• Very high-speed network backbone (40 Gb/s)
  – bandwidth for rich interaction and tight coupling

Page 37: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

TeraGrid Objectives

• DEEP Science: enabling petascale science
  – make science more productive through an integrated set of very-high-capability resources
  – address key challenges prioritized by users

• WIDE Impact: empowering communities
  – bring TeraGrid capabilities to the broad science community
  – partner with science community leaders: “Science Gateways”

• OPEN Infrastructure, OPEN Partnership
  – provide a coordinated, general-purpose, reliable set of services and resources
  – partner with campuses and facilities

Page 38: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

TeraGrid Resources and Services

• Computing
  – nearly a petaflop of computing power today
  – 500 TFLOP Ranger system at TACC

• Remote visualization servers and software

• Data
  – allocation of data storage facilities
  – over 100 scientific data collections

• Central allocations process

• Technical support
  – central point of contact for support of all systems
  – Advanced Support for TeraGrid Applications (ASTA)
  – education and training events and resources
  – over 20 Science Gateways

Page 39: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Requesting Allocations of Time

• TeraGrid resources are provided free of charge to academic researchers and educators:

  – Development Allocations Committee (DAC): start-up and course accounts of up to 30,000 hours; requests are processed in about two weeks.

  – Medium Resource Allocations Committee (MRAC): requests of up to 500,000 hours, reviewed four times a year.

  – Large Resource Allocations Committee (LRAC): requests of over 500,000 hours, reviewed twice a year.

Page 40: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

TeraGrid User Community

[Chart: PIs: 879; active users: 3,197; charging users: 1,141; allocations: 1.8B NUs; NUs used: 618M. Usage is dominated by Molecular Biosciences, Chemistry, Physics, Astronomical Sciences, Materials Research, Chemical/Thermal Systems, and Atmospheric Sciences, with 20 other disciplines each below 2% of usage.]

Page 41: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

TeraGrid Web Resources

TeraGrid provides a rich array of web-based resources:

• a User Portal for managing user allocations and job flow
• a Knowledge Base for quick answers to technical questions
• user information, including documentation and information about hardware and software resources
• science highlights
• news and press releases
• education, outreach and training events and resources

In general, seminars and workshops will be accessible via video on the web. Extensive documentation will also be web-based.

Page 42: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Science Gateways: Broadening Participation in TeraGrid

• Increasing investment by communities in their own cyberinfrastructure, but it is heterogeneous in:
  – resources
  – users, from experts to K-12
  – software stacks and policies

• Science Gateways
  – provide “TeraGrid Inside” capabilities
  – leverage community investment

• Three common forms:
  – web-based portals
  – application programs running on users' machines but accessing services in TeraGrid
  – coordinated access points enabling users to move seamlessly between TeraGrid and other grids

Technical approach (example: biomedical and biology gateways, building biomedical communities)

[Diagram: a portal stack (OGCE portlets with container, Apache Jetspeed internal services, service APIs, local portal services) connecting through grid protocols and grid service stubs to grid services and grid resources, and via HTTP to remote content services and servers.]

• Open-source tools, including a workflow composer
• Build standard portals to meet the domain requirements of the biology communities
• Develop federated databases to be replicated and shared across TeraGrid

Source: Dennis Gannon ([email protected])

Page 43: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

HPC Education and Training

TeraGrid partners offer training and education events and resources to educators and researchers:

• workshops, institutes and seminars on high-performance scientific computing
• hands-on tutorials on porting and optimizing code for the TeraGrid systems
• online self-paced tutorials
• high-impact educational and visual materials suitable for K-12, undergraduate and graduate classes

Page 44: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)


“HPC University”

• Advance researchers’ HPC skills
  – catalog of live and self-paced training
  – schedule series of training courses
  – gap analysis of materials to drive development

• Work with educators to enhance the curriculum
  – search catalog of HPC resources
  – schedule workshops for curricular development
  – leverage the good work of others

• Offer student research experiences
  – enroll in HPC internship opportunities
  – offer student competitions

• Publish science and education impact
  – promote via TeraGrid Science Highlights and iSGTW
  – publish education resources to NSDL-CSERD

Page 45: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Sampling of Training Topics Offered

• HPC computing
  – Introduction to Parallel Computing
  – Toward Multicore Petascale Applications
  – Scaling Workshop: Scaling to Petaflops
  – Effective Use of Multi-core Technology
  – TeraGrid-Wide BlueGene Applications
  – Introduction to Using SDSC Systems
  – Introduction to the Cray XT3 at PSC
  – Introduction to & Optimization for SDSC Systems
  – Parallel Computing on Ranger & Lonestar

• Domain-specific sessions
  – Petascale Computing in the Biosciences
  – Workshop on Infectious Disease Informatics at NCSA

• Visualization
  – Introduction to Scientific Visualization
  – Intermediate Visualization at TACC
  – Remote/Collaborative TeraScale Visualization on the TeraGrid

• Other topics
  – NCSA-hosted workshop on data center design
  – Rocks Linux Cluster Workshop
  – LCI International Conference on HPC Clustered Computing

• Over 30 online asynchronous tutorials

Page 46: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

TeraGrid Resources (sites: ANL/UC, IU, NCSA, ORNL, PSC, Purdue, SDSC, TACC)

• Computational resources
  – ANL/UC: Itanium 2 (0.5 TF); IA-32 (0.5 TF)
  – IU: Itanium2 (0.2 TF); IA-32 (2.0 TF)
  – NCSA: Itanium2 (10.7 TF); SGI SMP (7.0 TF); Dell Xeon (17.2 TF); IBM p690 (2 TF); Condor Flock (1.1 TF)
  – ORNL: IA-32 (0.3 TF)
  – PSC: XT3 (10 TF); TCS (6 TF); Marvel SMP (0.3 TF)
  – Purdue: heterogeneous (1.7 TF); IA-32 (11 TF, opportunistic)
  – SDSC: Itanium2 (4.4 TF); Power4+ (15.6 TF); Blue Gene (5.7 TF)
  – TACC: IA-32 (6.3 TF)

• Online storage: ANL/UC 20 TB, IU 32 TB, NCSA 1,140 TB, ORNL 1 TB, PSC 300 TB, Purdue 26 TB, SDSC 1,400 TB, TACC 50 TB

• Mass storage (at six of the sites): 1.2 PB, 5 PB, 2.4 PB, 1.3 PB, 6 PB, 2 PB

• Network (Gb/s, hub): ANL/UC 30 CHI, IU 10 CHI, NCSA 30 CHI, ORNL 10 ATL, PSC 30 CHI, Purdue 10 CHI, SDSC 10 LA, TACC 10 CHI

• Data collections (count, approximate total size, access methods): 5 collections, >3.7 TB, URL/DB/GridFTP; >30 collections, URL/SRB/DB/GridFTP; 4 collections, 7 TB, SRB/Portal/OPeNDAP; >70 collections, >1 PB, GFS/SRB/DB/GridFTP; 4 collections, 2.35 TB, SRB/Web Services/URL

• Instruments: proteomics and X-ray crystallography; SNS and HFIR facilities (ORNL)

• Visualization resources (RI: remote interactive, RB: remote batch, RC: RI/collaborative): RI/RC/RB on IA-32 with 96 GeForce 6600GT cards; RB on SGI Prism with 32 graphics pipes plus IA-32; RI/RB on IA-32 with Quadro4 980 XGL; RB on IA-32, 48 nodes; RB; RI/RC/RB on UltraSPARC IV, 512 GB SMP, 16 graphics cards

In total: 100+ TF across 8 distinct architectures, 3 PB of online disk, and >100 data collections.

Page 47: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

Applications can cross infrastructures, e.g., OSG and TeraGrid.

Page 48: National Grid Cyberinfrastructure Open Science Grid (OSG) and TeraGrid (TG)

More Info:

• Open Science Grid: http://www.opensciencegrid.org

• TeraGrid: http://www.teragrid.org