
Thursday, August 21, 2008

Cyberinfrastructure for Research Teams

UAB High Performance Computing Services

John-Paul Robinson <[email protected]>

UAB Cyberinfrastructure (CI) Investments

Common Network User Identity (BlazerID) for consistent identity across systems

Early Internet2 Member providing high bandwidth network access to other research campuses

High Performance Computing (HPC) Investments to build investigative capacity for computational research

On-going Model of Engagement to support Research Technology Investments

Alabama State Optical Network and National LambdaRail

The Alabama SON is a very high bandwidth lambda network operated by SLR.

Connects major research institutions across state

Connects Alabama to National LambdaRail and Internet2

10GigE Campus Research Network

Connects campus HPC centers to facilitate resource aggregation

Compute clusters scheduled for connectivity

Facilitates secure network build outs

Expands access to regional and national compute resources

UAB Investments in HPC

Cyberinfrastructure Elements

A Continuum of Identity: lower assurance facilitates collaboration; higher assurance facilitates trust

Maximized Network Bandwidth

Pools of Execution Resources

A Common Data Framework

Reliability and Performance Monitoring

Harnessing CI with the Grid

Interconnects and coordinates resources across administrative domains

Uses standard, open, and general purpose interfaces and protocols

Allows resource combination to deliver high quality services built on the core utility

The “grid” is the Fabric of Interconnected Resources

About UABgrid

Leverages Local, Regional and National Cyberinfrastructure Components

Identity, Execution, Data, Status, and Networking

Integrated Technology Infrastructure to Facilitate and Encourage Collaboration

Remember: It's All About the Data. Sharing Information is the Motivation for Collaboration.

UABgrid Overview

UABgrid Pilot launched at the campus HPC Boot Camp in September 2007

User-driven collaboration environment supports web and grid applications

Leverages InCommon for user identification: SSO for web applications and VO management; self-service certificate generation for Globus users

Provides meta-cluster to harness on- and off-campus compute power using GridWay
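To make the meta-cluster idea concrete, here is a minimal sketch of submitting one job through a DRMAA-compliant metascheduler such as GridWay, using the Python drmaa bindings. The command and output paths are illustrative assumptions, not UABgrid settings.

```python
# Minimal sketch: submit a job through a DRMAA-compliant
# metascheduler (e.g., GridWay) via the Python drmaa bindings.
# The payload command and output path are illustrative assumptions.
import drmaa

session = drmaa.Session()
session.initialize()                    # connect to the local DRMAA library

jt = session.createJobTemplate()
jt.remoteCommand = '/bin/hostname'      # hypothetical payload
jt.outputPath = ':hostname.out'         # DRMAA "[host]:path" form

job_id = session.runJob(jt)
print('submitted job %s' % job_id)

# Block until the metascheduler reports completion.
info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
print('job finished with exit status %d' % info.exitStatus)

session.deleteJobTemplate(jt)
session.exit()
```

The point of the meta-cluster is that this code does not name a cluster: the metascheduler decides where the job runs.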

[Diagram: the cyberinfrastructure layer (IdM, Exec, Data, Net, Info) supporting UABgrid, which in turn supports Applications 1 through 4]

Building Standard Service Interfaces

Infrastructure to Support Application Domains

[Diagram: the cyberinfrastructure layer (IdM, Exec, Data, Net, Info) and UABgrid supporting an application serving Research, User Admin, and Education]

UABgrid Provides Services to Research Applications

[Diagram: the cyberinfrastructure layer (IdM, Exec, Data, Net, Info) and UABgrid exposing services to research applications: Users, Stats, Files, Processes, Groups, Comm]

UABgrid Applications and Services

Collaboration Support: VO tools including VO management, mailing lists, wikis, project management, portals...

Research Applications Support and Compute Expansion

Goals: Generic Model; Current Focus is Workflow Migration

Science Domains: Microbiology (DynamicBLAST), Statistical Genetics (R Statistical Package), Cancer Center (caBIG)

UABgrid VO Management: User Attributes to Apps

[Diagram: the myVocs system sits between identity providers (IdP1 through IdPn) and applications (App1 through Appn), combining user attributes asserted by the IdPs with VO attributes managed by the collaboration]
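As a conceptual illustration only, not the myVocs implementation, the sketch below shows the attribute flow the diagram depicts: attributes asserted by a home identity provider are merged with VO-managed attributes before release to an application. All attribute names and values are hypothetical.

```python
# Conceptual sketch of the myVocs attribute flow, not its code:
# a VO layer merges home-IdP attributes with VO-managed attributes
# before releasing them to an application. All names are illustrative.

def release_attributes(idp_attributes, vo_attributes):
    """Combine home-institution identity with VO authorization data."""
    merged = dict(idp_attributes)   # e.g., eppn, affiliation from the IdP
    merged.update(vo_attributes)    # e.g., VO name and role from the VO
    return merged

idp_attributes = {'eppn': '[email protected]', 'affiliation': 'faculty'}
vo_attributes = {'vo': 'dev.uabgrid', 'role': 'developer'}

print(release_attributes(idp_attributes, vo_attributes))
```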

Collaboration Support

The myVocs box forms the core of the VO collaboration infrastructure

VO resources like mailing lists, wikis, and Trac are intrinsic to the VO and can access common authorization information

Additional web collaboration tools are instantiated as needed (e.g., GridSphere)

VO resources are hosted in a VM cloud

dev.uabgrid is a working VO model for the construction and management of UABgrid

Compute Expansion

Meta-scheduling: Grid → Cluster

Cluster Upgrades and Acquisitions

Resource Aggregation: state resources; regional resources via SURAgrid; national and international resources via TeraGrid & Open Science Grid

UABgrid Compute Cluster Test Architecture

UABgrid Pilot Meta-Cluster Specifications

Today: 2 campus clusters + ASA resource, totaling 912 processing cores and >5 TFlops of power

2009 Targets: add all shared campus clusters, for 1156 more processing cores and 10 TFlops of additional power

Ongoing: local expansion through campus HPC investments; engage SURAgrid, OSG, TeraGrid, and other grid compute suppliers for more compute power

SURAgrid

Drawing Power from the Grid

Generic Grid Application Model

[Diagram: generic grid application stack, top to bottom]

User interfaces: command line, custom client, web portal

Application workflow logic

Meta-scheduling: GridWay, DRMAA, Swift, Pegasus, Avalon

Globus client tools

Globus services on each cluster

Local schedulers (SGE, LSF, PBS), each with application code and data, on Cluster 1 through Cluster n

Grid Migration Goals

Eliminate need for user-level grid technology awareness

Build on grid middleware, tools, and standards to maximize portability and resource utilization

Manage and leverage variable resource availability and dynamic load balancing

Efficiently and transparently handle issues like application availability, fault tolerance, and interoperability
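To make the fault-tolerance goal concrete, here is a hedged sketch, again assuming the Python drmaa bindings shown earlier, of a resubmit-on-failure loop that keeps transient grid faults invisible to the user. The retry policy is an illustrative assumption.

```python
# Sketch of transparent fault handling: resubmit a failed job a
# bounded number of times. Assumes the drmaa session/template setup
# shown earlier; the retry limit is an illustrative choice.
import drmaa

MAX_RETRIES = 3

def run_with_retries(session, job_template):
    """Run a job, resubmitting on failure, so transient faults stay hidden."""
    for attempt in range(1, MAX_RETRIES + 1):
        job_id = session.runJob(job_template)
        info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        if info.hasExited and info.exitStatus == 0:
            return info                 # success: the user never sees retries
        print('attempt %d failed; resubmitting' % attempt)
    raise RuntimeError('job failed after %d attempts' % MAX_RETRIES)
```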

Application Containers Simplify Administration

Types of Containers: user accounts, Java boxes, virtual machines

Account Containers are the initial target because they are the most common and address R application configuration

Allow for library dependency and site dependency configuration

Full continuum of deployment options, from fully staged for each job to statically cached on resources (see the sketch below)
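A minimal sketch of that deployment continuum, with all paths hypothetical: a job wrapper prefers a statically cached install already on the resource and falls back to staging a private per-account copy with the job.

```python
# Sketch of the account-container deployment continuum: prefer a
# statically cached install on the resource, fall back to staging a
# private copy with the job. All paths are hypothetical.
import os
import shutil

CACHED_APP = '/share/apps/R'                    # hypothetical site-wide cache
STAGED_APP = os.path.expanduser('~/staged/R')   # per-account copy

def locate_application(staged_source='./R-bundle'):
    """Prefer a cached site install; otherwise stage a private copy."""
    if os.path.isdir(CACHED_APP):
        return CACHED_APP               # statically cached on the resource
    if not os.path.isdir(STAGED_APP):
        shutil.copytree(staged_source, STAGED_APP)  # fully staged per job
    return STAGED_APP
```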

Migrating Workflows to Grid

Statistical Genetics: R Statistical Package; methodological analysis workflow with many isolated computations; work in progress with promising results; developing work led by John-Paul Robinson in UAB HPC Services

Microbiology: DynamicBLAST, a grid version of BLAST; master-worker type application that maximizes throughput and minimizes job turn-around; the leading model for migrations; work led by Enis Afgan and Dr. Puri Bangalore in CIS

Statistical Genetics on the Grid – MIG

[Diagram: the MIG workflow: tissue samples yield microarray data, which the grid distributes to compute clusters]

MIG Workflow Powered by the Grid

[Chart: MIG Workflow Performance, 10,000 iterations; x-axis: Job Granularity (Chunks), 10 to 1000; y-axis: Minutes per Chunk, 0 to 500]

Manual job control constrains performance to the human scale (~10)

Automating job control enables operating at a scale that significantly improves job performance and resource utilization (see the sketch below)
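A minimal sketch of that automation, assuming a submit_chunk helper that stands in for the DRMAA submission shown earlier: the 10,000 iterations are split into chunks of a chosen granularity, and each chunk is submitted as one grid job.

```python
# Sketch of automated job control at scale: split a 10,000-iteration
# analysis into chunks and submit each as one grid job. submit_chunk
# stands in for the DRMAA submission shown earlier.
TOTAL_ITERATIONS = 10000

def make_chunks(total, chunk_size):
    """Yield (start, end) iteration ranges of the requested granularity."""
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)

def submit_chunk(start, end):
    print('submitting iterations %d-%d' % (start, end))  # placeholder

# A human can reasonably manage ~10 jobs; a script can manage hundreds.
for start, end in make_chunks(TOTAL_ITERATIONS, chunk_size=100):
    submit_chunk(start, end)
```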

Dynamic BLAST Grid Workflow

BLAST is a gene sequence search algorithm

Dynamic BLAST breaks the application's steps and the search itself apart and spreads the effort across the grid

A good example of component and data parallelization (see the sketch below)
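To illustrate the data-parallel half only (this is not the DynamicBLAST code), here is a hedged sketch that splits a multi-sequence FASTA query into independent pieces that separate grid workers could search concurrently. The input filename is hypothetical.

```python
# Illustration of the data parallelization only (not DynamicBLAST
# itself): split a multi-sequence FASTA query into independent pieces
# that separate grid workers could search concurrently.
def split_fasta(path, sequences_per_piece=10):
    """Return lists of FASTA records, one list per work unit."""
    records, current = [], []
    with open(path) as handle:
        for line in handle:
            # A '>' header starts a new record; flush the previous one.
            if line.startswith('>') and current:
                records.append(''.join(current))
                current = []
            current.append(line)
        if current:
            records.append(''.join(current))
    return [records[i:i + sequences_per_piece]
            for i in range(0, len(records), sequences_per_piece)]

pieces = split_fasta('queries.fasta')   # hypothetical input file
print('%d work units ready for the grid' % len(pieces))
```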

SCOOP – Coastal Ocean Observation and Prediction

SURA program to advance the sciences of prediction and hazard planning for coastal populations

Harvests cycles around the grid

Working with MCNC/Renci to use Cheaha via SURAgrid

Research Initiative Support

caBIG

UAB Comprehensive Cancer Center funded to connect to caBIG

Contributed to completion of the Self-Assessment and Implementation Plan

Deploying the Life Sciences Distribution to support research workflows

caBIG provides a very good model for service and infrastructure abstractions

caGrid

Bring BlazerID system to NIST Level 2

Exploring integration of the caGrid GAARDS AuthX infrastructure (GridGrouper)

caGrid Provides Tools For Many Research Domains

Taxonomy Development

Taverna Workflow Management

Education and Training

UAB 2007 HPC Boot Camp included sessions on grid computing and UABgrid Pilot launch

2008 HPC Boot Camp: September 22, 2008

UAB's 1st Annual CI Day held in conjunction with the ASA campus visit

CIS has taught graduate-level grid computing courses since fall 2003

Active participation in grid technology communities: MardiGras08, OGF22, SURAgrid All-Hands, Internet2, caBIG

Open Development Model

UABgrid development work is done openly

Outside groups are actively engaged in the development of the infrastructure (CIS, ENG, ASA, etc.)

The development group relies on the same services available to all users (we eat our own dog food)

Virtual organizations build on the infrastructure and are free to engage to their level of interest

Collaborative Development

Engaging User Groups and Service Providers to leverage Infrastructure

We are building our own solutions to depend on the grid

In order to build a grid, you need carrots: there has to be a benefit, even if it's long term

Grid services and the development environment are built on a virtual machine foundation, key to the expectation of "running from the cloud"

Engagement in a Regional Infrastructure Construction

Involved in SURAgrid since its inception as a voluntary extension of the NSF Middleware Initiative Testbed

Have helped mold an organization that provides broad engagement across organizations in the development of infrastructure

The SURAgrid Governance Committee just completed a strategic plan to guide the next 4 years

Technology in Service of Research

IT expresses institutional initiatives: IT doesn't necessarily do it itself, but should help make it possible

To have leading research you need leading infrastructure: IT supports a leading-edge infrastructure and services framework and provides transparent interfaces to services and operations

Implement grid interfaces and conventions for our own services – “eat our own dog food”

Trust is the Foundation for Collaboration

People Use Technology They Trust

Open Communication Channels: researchers and infrastructure staff communicate as peers; intra-organizational communication is fluid

Control Over Implementation: application requirements lead acquisitions

Service Partnership: researchers and infrastructure staff work together to satisfy organizational commitments

Important Issues are Guaranteed Service: researchers have authorized influence over infrastructure because they are part of the same organization

On The Horizon

Data Services

UABgrid Backup: implement using technologies that satisfy the needs of the user community (e.g., GridFTP, REDDnet); see the sketch after this list

Focus on backup of VMs: putting our valuable data on-line... just like users would be expected to do

Data Stores: DSpace, Fedora, Alfresco, Subversion

Metrics: increase reliability confidence and maintain a pulse on the impact of our solutions

Resource Integration Guidelines

High Speed to the Desktop
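For the backup item above, a hedged sketch assuming the standard globus-url-copy client is installed on the machine: it pushes a local file to a GridFTP endpoint. The host and paths are placeholders, not UABgrid endpoints.

```python
# Sketch of a GridFTP-based backup step, assuming the standard
# globus-url-copy client is on the PATH. Host and paths are
# placeholders, not UABgrid endpoints.
import subprocess

def backup_to_gridftp(local_path, remote_url):
    """Copy a local file to a GridFTP URL (gsiftp://host/path)."""
    subprocess.check_call(['globus-url-copy',
                           'file://' + local_path,   # absolute local path
                           remote_url])

backup_to_gridftp('/var/backups/vm-image.tar',
                  'gsiftp://gridftp.example.org/backups/vm-image.tar')
```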

Acknowledgments

UAB Office of the Vice President for Information Technology

Collaborators at UAB in Computer and Information Sciences, the School of Engineering, the School of Public Health Section on Statistical Genetics, Comprehensive Cancer Center

Collaborators within SURAgrid, Internet2, and other organizations

A Closing Thought...

We are part of the Cyberinfrastructure

The reason for CI is to empower us as individuals to engage with others as we build community at UAB and reach out to collaborate with other like-minded communities around the globe