Topics in BMI: Grid Computing


Page 1: Topics in BMI: Grid Computing

Grid-1

CSE 300

Topics in BMI: Grid Computing

Grid Computing and its Applications in the Biomedical Informatics Domain
May 2008

Jay Coppola, Database Architect
Information Technologies Department
The University of Connecticut Health Center
263 Farmington Ave. MC-5210
Farmington, CT 06030
[email protected]
(860) 679-1682

Page 2: Topics in BMI: Grid Computing

What is a Computer Grid?

A Computer Grid is a grouping of computer resources (CPU, disk, memory, peripherals, etc.) for use as a single, albeit large and powerful, virtual computer.

“Distributed computing across virtualized resources” [1].

“Coordinates resources that are not subject to centralized control… using standard, open, general-purpose interfaces and protocols… to deliver non-trivial quality of service” [3].

Page 3: Topics in BMI: Grid Computing

What is a Computer Grid?

The basic premise is to leverage as many of the unused CPU cycles (and other computer resources of the grid) as possible in order to execute a computer program more quickly.
  – Years and months → days and hours!

Application requirements for grid usage
  – Application must be capable of executing remotely on the distributed grid architecture.
  – Application must be able to be subdivided into smaller “jobs” to take advantage of parallel processing.
  – Required data must be available without undue latency. May require replication of data sets across the grid.
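
The subdivision requirement can be illustrated with a small sketch. It is only an analogy: local worker threads stand in for grid nodes, and the names `split_into_jobs`, `run_job`, and `run_on_grid` are hypothetical, not part of any grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_jobs(data, job_size):
    """Cut the input into independent chunks, one per sub-job."""
    return [data[i:i + job_size] for i in range(0, len(data), job_size)]

def run_job(chunk):
    """Work done by one sub-job; here, a sum of squares."""
    return sum(x * x for x in chunk)

def run_on_grid(data, job_size=1000, workers=4):
    """Run the sub-jobs in parallel and combine the partial results."""
    jobs = split_into_jobs(data, job_size)
    # Threads stand in for remote nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_job, jobs))
    return sum(partials)
```

The sketch works only because each chunk can be processed without knowing anything about the others, which is exactly the property an application needs before a grid can help it.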

Page 4: Topics in BMI: Grid Computing

Why Grid?

The term “Grid” refers to the electric power “grid”.
  – Virtually unlimited power.
  – Customer is abstracted from type and location of power source.
    – Nuclear, coal, solar, wind…
    – Local power plant or remote generation facility.
      – Millstone 2, 3, or Niagara Falls.
    – Customer doesn’t know and doesn’t care!
  – Pays on a per-hour usage rate.
    – $.16 per kilowatt-hour; $1.00 per CPU-hour.
  – Plug ‘n Play. Grid computing is not quite there yet.

Page 5: Topics in BMI: Grid Computing

Grid Topologies

Grids come in all sizes but are typically segregated into 4 basic configurations or sizes that span physical location as well as geo-political issues.
  – Cluster – Similar H/W and S/W on a LAN.
  – Intragrid – Dissimilar H/W and S/W on a LAN (departments in the same company share).
  – Extragrid – Two or more Intragrids spanning across LANs, typically within the same geopolitical environment (corporate-wide).
  – Intergrid – A worldwide combination of multiple Extragrids.
    – Possible dedicated H/W and standalone mainframe and supercomputer systems.
    – Spans corporations as well as countries.
    – Internet or private network backbone.

Page 6: Topics in BMI: Grid Computing

Computer Grid Topologies (Intergrid)

[Figure: Intergrid topology. Labels: Cluster, Intragrid, Extragrid.]

Page 7: Topics in BMI: Grid Computing

Computer Grid Types

Computer Grids are divided into 3 types:
  – Computational (CPU) Grid
    – Most common (and mature) of all grids.
    – Logical extension of distributed computing.
    – CPU cycles.
  – Data Grid
    – Focuses on data storage capacity as the main shared resource.
    – Manages massive data sets ranging in size from mega (10^6) bytes to peta (10^15) bytes.
  – Network Grid
    – Focuses on the communication aspects of the available resources.
    – Provides fault-tolerant, high-performance communication services.

Each grid type requires some aspect of the others to be truly functional.

Page 8: Topics in BMI: Grid Computing

Grid Benefits and Issues

Exploit underutilized resources
  – Business desktop PCs are 5% utilized [2].
    – 2–3 GHz dual/quad CPUs, 1+ gigabytes of memory, 0.5 to 1 terabyte of disk, Gigabit Ethernet.
  – Servers are also underutilized, with even more performance and resources.
  – Performance and capacity are continually growing.

This unutilized computing power can be exploited by a computer grid architecture.
  – More efficient use of underutilized H/W.
  – Create a supercomputer for the cost of software!

May require application rewrite (“Grid Ready” app).

Remote grid “node” (computer) must meet any special H/W, S/W, or resource requirements of the executing app.

Page 9: Topics in BMI: Grid Computing

Benefits and Issues – Parallel CPU Capacity

Computer Grid offers the potential for massive parallel processing.

To truly exploit a grid, the application must be subdivided into multiple sub-jobs for parallel processing.

Not practical or workable for many applications.

Currently no practical tool exists that can transform an arbitrary app into sub-jobs to take advantage of parallel processing.

Applications that can be subdivided will experience huge performance gains!

Page 10: Topics in BMI: Grid Computing

Benefits and Issues – Virtual Resources

Virtualization of Resources
  – Fundamental point of Grid
    – Physical characteristics are abstracted.
    – Underlying H/W and S/W is transparent to the grid user.
    – User “sees” one large and powerful computer system.
    – User can focus on the task, not the computer system.

Page 11: Topics in BMI: Grid Computing

Benefits and Issues – Access to Additional Resources

Each computer (node) of the grid adds its resources to the entire grid.
  – CPU, memory, disk, and N/W.
  – Software licenses.
  – Specialized peripherals
    – Remote-controlled electron microscope
    – Sensors
  – May require a reservation system to guarantee availability.

Page 12: Topics in BMI: Grid Computing

Benefits and Issues – Resource Balancing and Reliability

Grid system maintains metadata about resources
  – Availability of node
  – Available resources on a particular node
  – Average throughput/performance
  – Failure detection/long-executing jobs

System will direct a sub-job to an available node that can support the performance required.
  – If a node is busy, redirect to a different node.

System can detect failed nodes and sub-jobs
  – Restart a job on the same or a different node.
  – Resubmit job to a different node.
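
The balancing and recovery behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` fields, the `throughput` score, and the use of `RuntimeError` to signal a failed sub-job are all hypothetical choices, not how any particular grid middleware works.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    online: bool = True
    busy: bool = False
    throughput: float = 1.0  # hypothetical relative performance score

def pick_node(nodes, min_throughput, exclude=()):
    """Return the fastest idle, online node that meets the requirement."""
    candidates = [n for n in nodes
                  if n.online and not n.busy
                  and n.throughput >= min_throughput
                  and n.name not in exclude]
    return max(candidates, key=lambda n: n.throughput, default=None)

def run_with_retry(job, nodes, min_throughput=1.0, max_attempts=3):
    """Run job(node); on failure mark the node offline and resubmit elsewhere."""
    tried = []
    for _ in range(max_attempts):
        node = pick_node(nodes, min_throughput, exclude=tried)
        if node is None:
            break
        tried.append(node.name)
        try:
            return node.name, job(node)
        except RuntimeError:
            node.online = False  # failure detected; try a different node
    raise RuntimeError("no suitable node could complete the job")
```

Busy nodes are skipped at selection time, and a node that fails is excluded from later attempts, which mirrors the "redirect" and "resubmit to a different node" bullets.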

Page 13: Topics in BMI: Grid Computing

Benefits and Issues – Management and Virtual Organizations

Grid system can manage priorities among different projects and jobs
  – Requires cooperation among grid users.

Virtual Organizations (VO)
  – Political entity.
  – Formed by users, groups, teams, companies, countries…
  – Form a collaboration among users to achieve a common goal.
  – Defines/provides protocols and mechanisms for access to resources.
  – Can be stand-alone or a hierarchy of regional, national, or international VOs.

Page 14: Topics in BMI: Grid Computing

Benefits and Issues – Security

Important issue made even more important in a Grid architecture.
  – Application and data are now exposed to multiple computers (nodes), any of which may be directly communicating with or executing your application.

Addressed with Authentication, Authorization, and Encryption.
  – Each node must be authenticated by the “grid” it belongs to.
  – Once authenticated, authorization can be given to specific nodes to allow them to perform certain tasks.
  – Encryption is required to protect communication against interception.
  – Use technologies such as key encryption, Certificate Authority (CA), digital certificates, and SSL.

Page 15: Topics in BMI: Grid Computing

Software Components

Grid system requires a layer of software to manage the grid (middleware).

Management tasks
  – Scheduling of jobs.
  – Resource availability monitoring.
  – Node capacity and utilization information gathering.
  – Job status for recovery.

Local node S/W
  – Needed to allow node to accept a sub-job for execution.
  – Allow it to register its resources to the grid.
  – Monitor job progress and send status to grid.

Page 16: Topics in BMI: Grid Computing

Software Components – Job Scheduler

Major component of Grid “Middleware”
  – Can vary in complexity
    – Blindly submit jobs round-robin.
    – Job queuing system with several priority queues.
  – Advanced features include:
    – Maintain metadata for each node
      – Performance
      – Resources
      – Availability (idle/busy)
      – Status (on-line/off-line)
    – Automatically find the most appropriate node for the next job in queue.
    – Job monitoring for recovery.
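
The range of scheduler complexity listed above can be sketched in a few lines. The names `round_robin`, `JobQueue`, and `best_node`, and the dictionary keys used for node metadata, are illustrative assumptions rather than any real scheduler's interface.

```python
import heapq
import itertools

def round_robin(nodes):
    """Simplest scheduler: hand out nodes in a fixed, repeating order."""
    return itertools.cycle(nodes)

class JobQueue:
    """Priority queue of jobs; a lower number means a higher priority."""
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def submit(self, name, priority=5):
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def next_job(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

def best_node(nodes):
    """Pick the on-line, idle node with the highest performance score."""
    ready = [n for n in nodes if n["status"] == "on-line" and n["idle"]]
    return max(ready, key=lambda n: n["performance"], default=None)
```

A scheduler with the advanced features would pair `next_job()` with `best_node()` on each dispatch; the round-robin version ignores node metadata entirely.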

Page 17: Topics in BMI: Grid Computing

Software Components – Node Software

Grid systems can have 2 different node types
  – Resource only – No job submission.
  – Participating node – Can submit a job as well.

Every node of a grid system requires interface S/W regardless of type.

All nodes require…
  – Monitoring software that notifies the grid middleware/scheduler about…
    – Node availability
    – Current load
    – Available resources
    – Status of grid management software
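
As a rough picture of what such a notification might carry, the sketch below packages the four items into one message. The `NodeStatus` fields and the JSON encoding are assumptions made for illustration; real middleware defines its own message formats.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class NodeStatus:
    node: str
    available: bool        # node availability
    load: float            # current load as a fraction of CPU in use, 0.0 to 1.0
    free_memory_mb: int    # one example of an available resource
    agent_running: bool    # status of the grid management software

def to_message(status):
    """Serialize a status report for sending to the scheduler."""
    return json.dumps(asdict(status), sort_keys=True)
```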

Page 18: Topics in BMI: Grid Computing

Software Components – Node Software

All nodes require (continued)…
  – Software that allows the node to accept and execute a job
    – Node S/W must accept the executable file or select the appropriate file from a local copy.
    – Locate any required dataset, whether local or remotely located.
    – Communicate job status during execution.
    – Return results once completed.
    – Allow for communication between sub-jobs, whether local to that node or not.
    – Dynamically adjust priorities of a job to meet a “level of service” requirement of others.

Participating grid node has additional requirements
  – Allow jobs to be submitted to the grid scheduler.
  – May have its own scheduler or an interface to the grid’s common scheduler.

Page 19: Topics in BMI: Grid Computing

Globus Toolkit

Framework for creating a Grid solution.

Created by the Globus Alliance (http://www.ggf.org).

80% of all computer grid systems are implemented using a version of the Globus Toolkit [2].

Current release is Version 4.0.6 (GT4).

Version 3 introduced SOA to the framework.

Version 4 expanded SOA and leverages Web Services (WS) as the underlying technology.

Page 20: Topics in BMI: Grid Computing

Globus Toolkit

Global Grid Forum (GGF)
  – Created the Open Grid Services Architecture (OGSA).
    – Utilizes SOA for Grid implementation.
  – Two OGSA-compliant Grid Service implementations based on Web Service (WS) architecture
    – Open Grid Services Infrastructure (OGSI)
    – Web Services Resource Framework (WSRF)
  – WSRF is the latest and most true to the WS architecture
    – Utilizes standard XML schemas.
    – Provides a distinction between the service and the state of the service, which is required for Grid Services (GS).
    – Defines a WS-Resource, which includes “state” data using WSs.
      – Maintained in an XML document.
      – Defines life cycle.
      – Known to and accessed by one or more WSs.

Page 21: Topics in BMI: Grid Computing

Globus Toolkit – Components

Components of GT4 are segregated into 5 categories
  – Common Runtime
  – Security
  – Data Management
  – Monitor and Discovery
  – Execution

Not all component pieces are implemented as SOA.

Not all component pieces are fully operational code.
  – Partially functional: a starting point for a full implementation.

Page 22: Topics in BMI: Grid Computing

Globus Toolkit – Components

Component | Web service based components | Non Web service based components
Common Runtime | Java WS Core; C WS Core; Python WS Core | C Common Libraries; eXtensible IO (XIO)
Security | WS authentication and authorization; Community Authorization Service (CAS); Delegation services | Pre-WS authentication and authorization; Credential Management
Data Management | Reliable File Transfer (RFT); OGSA-DAI; Data Replication Service (DRS) | GridFTP; Replica Location Service (RLS)
Monitoring and Discovery Services | Index Service; Trigger Service; Aggregator Framework; WebMDS | MDS2
Execution Management | WS GRAM; Community Scheduler Framework 4 (CSF4); Globus Teleoperations Control Protocol (GTCP); Workspace Management Service (WMS) | Pre-WS GRAM

Page 23: Topics in BMI: Grid Computing

Globus Toolkit – Components

Common Runtime Components
  – “Building blocks” for most toolkit components.
  – Web Services implemented in 3 languages:
    – Java
    – C
    – Python (PyGridware)
  – All 3 consist of APIs and tools that implement the WSRF and WS-Notification standards.
  – Act as base components for various default services.
  – Java WS Core provides the development base library and tools for custom WSRF services.

Page 24: Topics in BMI: Grid Computing

Globus Toolkit – Components

eXtensible IO (XIO)
  – Extensible I/O library written in C.
  – Provides a single API
    – Supports multiple protocols.
    – Implementation encapsulated as drivers.
    – Framework for error handling.
    – Asynchronous message delivery.
    – Timeouts.
  – Driver approach
    – Supports concept of driver stacks.
    – Maximizes code reuse.
    – Drivers are written as atomic units and “stacked” on top of one another.
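
The driver-stack idea can be sketched conceptually as below. This is not the XIO API (XIO is a C library); the `Driver`, `CompressDriver`, and `FrameDriver` classes are hypothetical and only show how small, single-purpose units compose when stacked.

```python
import zlib

class Driver:
    """One atomic unit in the stack; passes data to the driver below it."""
    def __init__(self, below=None):
        self.below = below

    def write(self, data):
        return self.below.write(data) if self.below else data

    def read(self, data):
        return self.below.read(data) if self.below else data

class CompressDriver(Driver):
    """Compresses on the way down, decompresses on the way up."""
    def write(self, data):
        return super().write(zlib.compress(data))

    def read(self, data):
        return zlib.decompress(super().read(data))

class FrameDriver(Driver):
    """Adds a 4-byte length prefix on write and strips it on read."""
    def write(self, data):
        framed = len(data).to_bytes(4, "big") + data
        return super().write(framed)

    def read(self, data):
        raw = super().read(data)
        size = int.from_bytes(raw[:4], "big")
        return raw[4:4 + size]
```

For example, `CompressDriver(below=FrameDriver())` compresses and then frames outgoing data; swapping or removing one driver changes the protocol without touching the others, which is the code-reuse point made above.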

Page 25: Topics in BMI: Grid Computing

Globus Toolkit – Components

Security Components
  – Implemented using Grid Security Infrastructure (GSI)
    – Utilizes public key cryptography as its basis.
  – Three primary functions of GSI
    – Provide secure authentication and confidentiality between elements of the grid.
    – Provide support for security across organizational boundaries, i.e. no centrally managed security system.
    – Support “single sign-on” for grid users.

Page 26: Topics in BMI: Grid Computing

Globus Toolkit – Components

Security Components (continued) – Authentication and Authorization
  – Enabled with message-level and transport-level security for SOAP communication of WS.
  – Also provides an authorization framework for container-level authorization.

Community Authorization Service (CAS)
  – Provides access control to VOs.
  – Grants fine-grained permissions on subsets of resources to VO members.
  – Extensible to multiple services.
  – Currently supported by the GridFTP service.

Page 27: Topics in BMI: Grid Computing

Globus Toolkit – Components

Security Components (continued) – Delegation Services
  – Allows a single delegated credential to be used by many services.
  – Also supports a credential renewal interface.
    – Capable of extending a credential’s valid date.

SimpleCA
  – Simplified Certificate Authority.
  – Uses pre-WS Authentication, Authorization and OpenSSL.
  – Fully functional Public Key Infrastructure (PKI).
  – Suggested for testing only; a commercial CA solution should be utilized for production.

Page 28: Topics in BMI: Grid Computing

Globus Toolkit – Components

Security Components (continued) – MyProxy
  – Online credential repository.
  – Stores X.509 proxy credentials protected by pass phrase.
  – Eliminates need for manual copying of private keys and cert files between nodes.
  – Used for authentication to grid portals and credential renewal with job managers.

GSI-OpenSSH
  – Modified version of OpenSSH with added support for GSI authentication.
  – Permits file transfer between systems without user ID and password prompting.

Page 29: Topics in BMI: Grid Computing

Globus Toolkit – Components

Data Management Components
  – Set of tools concerned with location, transfer, and management of distributed data.
  – 2 basic categories:
    – Data Movement
    – Data Replication
  – Data Movement – GridFTP
    – Provides secure, reliable data transfer between nodes.
    – Based on the FTP standard with additional Grid features.
      – Added 3rd-party transfer.
  – Data Movement – Reliable File Transfer (RFT)
    – Provides WS interface for transfer and deletion of files.
    – Receives requests via SOAP over HTTP and uses GridFTP to perform the actual work.
    – Utilizes a database to store the list of files and their “state” for recovery if interrupted.
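
The recovery idea behind RFT, keeping each file's transfer “state” in a database so that interrupted work can resume, can be sketched as follows. This is not RFT's schema or interface; the table layout, the state names, and the `copy_file` callable are assumptions for illustration.

```python
import sqlite3

def open_state_db(path=":memory:"):
    """Open (or create) the table that tracks each requested transfer."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS transfers "
               "(src TEXT PRIMARY KEY, dst TEXT, state TEXT)")
    return db

def request_transfers(db, pairs):
    """Record each requested file as 'pending' before any work starts."""
    db.executemany(
        "INSERT OR IGNORE INTO transfers VALUES (?, ?, 'pending')", pairs)
    db.commit()

def resume(db, copy_file):
    """Copy every file not yet marked 'done'; safe to call again after a crash."""
    rows = db.execute("SELECT src, dst FROM transfers "
                      "WHERE state != 'done' ORDER BY src").fetchall()
    for src, dst in rows:
        copy_file(src, dst)  # hypothetical callable doing the real transfer
        db.execute("UPDATE transfers SET state = 'done' WHERE src = ?", (src,))
        db.commit()
    return len(rows)
```

Because a file is marked `done` only after its copy returns, a second call to `resume` after an interruption repeats only the unfinished transfers.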

Page 30: Topics in BMI: Grid Computing

Globus Toolkit – Components

Data Management Components
  – Data Replication – Replica Location Service (RLS)
    – Maintains access information about the location of replicated data.
    – Can map multiple physical replicas to one single logical file, enabling data redundancy on a grid.
  – Data Replication – OGSA-DAI
    – Open Grid Services Architecture – Data Access & Integration.
    – General grid interface for accessing data resources via WS
      – Databases and XML repositories
    – Supports query languages
      – SQL, XPath, and XQuery
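
The logical-to-physical mapping that RLS maintains can be pictured with a small in-memory catalog. This sketch is not the RLS interface; `ReplicaCatalog` and its methods are hypothetical, and the host names in the usage note are made up.

```python
from collections import defaultdict

class ReplicaCatalog:
    """Maps one logical file name to many physical replica locations."""
    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        self._replicas[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        self._replicas[logical_name].discard(physical_url)

    def lookup(self, logical_name):
        """Return all known replicas, or an empty list if none exist."""
        return sorted(self._replicas.get(logical_name, ()))
```

A client would register, say, `gsiftp://node1.example.org/data/genome.dat` and `gsiftp://node2.example.org/data/genome.dat` under the logical name `genome.dat`; if one copy becomes unreachable, `lookup` still returns the other.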

Page 31: Topics in BMI: Grid Computing

Globus Toolkit – Components

Data Management Components
  – Data Replication Service (DRS)
    – WSRF-compliant WS
      – Exposes a WS-Resource (Replicator resource).
      – Allows users to query the resource properties to monitor the “state” of the resource.
    – Supports locating file sets and creating local replicas
      – GridFTP for file transfer
    – New replicas are registered in the Replica Location Service (RLS).

Page 32: Topics in BMI: Grid Computing

Globus Toolkit – Components

Monitor and Discovery System (MDS)
  – Suite of WS concerned with collection, distribution, indexing, archival, and processing of grid resource availability and “state”.
  – MDS4
    – WSRF- and WS-Notification-compliant version in GT4.
  – Aggregator Framework
    – Framework for building services that collect and aggregate data (Aggregator Services).
    – Collects data from 3 source types (Information Providers)
      – Query, Subscription, and Execution sources.
      – Source data for Query and Subscription is a WSRF-compliant service.
      – Source data for Execution is an executable program.

Page 33: Topics in BMI: Grid Computing

Globus Toolkit – Components

Monitor and Discovery System (MDS)
  – Aggregator Framework – Index Service (IS)
    – Central component of MDS services of GT4.
    – Default instance exposed as a WSRF service.
    – Collects resource information from multiple sources.
    – Publishes it in a repository for discovery.
    – Repository queried using XPath.
    – A VO can configure a local index service to track relevant sources in its domain.
    – Key features
      – Configurable in a hierarchy, but no single global index exists with all information regarding all resources.
      – Information published is recent but not the latest.
      – Existence does not guarantee availability.
      – Requires periodic refreshing.
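
To give a feel for querying a resource repository with XPath, the sketch below runs a query over a small XML document. The document structure, element names, and site names are invented for illustration and do not reflect the actual MDS4 schema; the query uses the XPath subset supported by Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical index contents; real index data follows a different schema.
INDEX_XML = """
<index>
  <resource name="nodeA" site="uchc"><cpus>8</cpus><status>idle</status></resource>
  <resource name="nodeB" site="uchc"><cpus>4</cpus><status>busy</status></resource>
  <resource name="nodeC" site="remote"><cpus>16</cpus><status>idle</status></resource>
</index>
"""

def idle_resources(xml_text, site=None):
    """Return names of idle resources, optionally limited to one site."""
    root = ET.fromstring(xml_text)
    path = "./resource[status='idle']"
    if site is not None:
        path = f"./resource[@site='{site}'][status='idle']"
    return [r.get("name") for r in root.findall(path)]
```

As the key features note, a result like this reflects only the last refresh of the index; a node reported idle may no longer be available when a job is actually sent.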

Page 34: Topics in BMI: Grid Computing

Globus Toolkit – Components

Monitor and Discovery System (MDS)
  – Aggregator Framework – Trigger Service
    – Collects and compares resource information against a set of conditions
      – Conditions are defined in a configuration file.
      – Conditions are specified as an XPath expression.
  – WebMDS
    – Web-based interface to WSRF resource properties.
    – Used as a user-friendly interface to the index service.
    – Uses standard resource property requests.
    – Displays results in several formats.

Page 35: Topics in BMI: Grid Computing

Globus Toolkit – Components

Execution Management
  – Concerned with all aspects of remote computation
    – Initiation, monitoring, management, and scheduling.
  – Utilizes Grid Resource Allocation and Management (GRAM).
  – Typically deployed with Delegation and RFT services.
  – APIs implemented in C, Java, and Python.

Execution Management – WS GRAM
  – Grid service for remote execution and management of jobs.
  – SOAP messaging for communication between clients (nodes).
  – WS GRAM submits job to local scheduler for execution.
  – Collaborates with RFT service for staging any required files.

Page 36: Topics in BMI: Grid Computing

Globus Toolkit – Components

Community Scheduler Framework 4 (CSF4)
  – WSRF-compliant tool for grids that have multiple job schedulers.
  – Provides intelligent, policy-based meta-scheduling facility.
  – Enables a single interface for different resource managers.

Globus Teleoperations Control Protocol (GTCP)
  – Service interface for telecontrol.
  – WSRF version of NEESgrid Teleoperations Control Protocol (NTCP).
  – Controls heterogeneous instrumentation.
    – High-res cameras, electron microscopes, etc.

Dynamic Accounts
  – Allows Grid client to dynamically create, manage and delete user accounts on remote UNIX sites.

Page 37: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid
  – Largest non-military grid implementation in USA.
  – Network of supercomputers
    – 250 teraflops (trillion floating point operations/second)
    – 30 petabytes of secondary storage (disk)
    – 40 Gbps network backbone
    – High-resolution visualization environment
    – Toolkit for grid computing
  – National Science Foundation (NSF) Terascale initiative
    – Create an infrastructure of unbounded capacity and scope connecting universities and organizations with the fastest cross-country backbone in existence.

Page 38: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid
  – Currently composed of 11 supercomputers across the USA.
  – Each site contributes resources and expertise to create the largest computer grid in USA.
  – Primary usage is to support scientific research.
  – Medical field usage:
    – Brain imaging.
    – Drug interaction with cancer cells.

Page 39: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid – 11 Sites
  – Indiana University (IU)
    – “Big Red” is a distributed shared-memory cluster consisting of 768 IBM JS21 blades, each with two dual-core PowerPC 970MP processors, 8 GB of memory, and a PCI-X Myrinet 2000 adapter for high-bandwidth, low-latency Message Passing Interface (MPI) applications.
  – Joint Institute for Computational Sciences (JICS)
    – University of Tennessee and ORNL.
    – Future expansions are being planned that would add a 40-teraflops Cray XT3 system to the TeraGrid.
    – Additional plans to expand to a 170-teraflops Cray XT4 system, which in turn will be upgraded to a 10,000+ compute-socket Cray system of approximately 1 petaflop.

Page 40: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid – 11 Sites
  – Louisiana Optical Network Initiative (LONI)
    – “Queen Bee”, the core cluster of LONI, is a 668-node Dell PowerEdge 1950 cluster with 50.7 teraflops peak performance, running the Red Hat Enterprise Linux 4 operating system. Each node contains two quad-core Intel Xeon 2.33 GHz 64-bit processors and 8 GB of memory.
    – The cluster is interconnected with 10 GB/sec InfiniBand and has 192 TB of storage in a Lustre file system.
    – Half of Queen Bee’s computational cycles have been contributed to the TeraGrid community.

Page 41: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid – 11 Sites
  – Oak Ridge National Laboratory (ORNL)
    – More of a user than a provider.
    – Users of its neutron science facilities (the High Flux Isotope Reactor and the Spallation Neutron Source) will be able to access TeraGrid resources and services for their data storage, analysis, and simulation.
  – National Center for Supercomputing Applications (NCSA)
    – University of Illinois Urbana-Champaign.
    – Provides 10 teraflops of capability computing through its IBM Linux cluster, which consists of 1,776 Itanium2 processors.
    – The NCSA also includes 600 terabytes of secondary storage and 2 petabytes of archival storage capacity.

Page 42: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid – 11 Sites
  – Pittsburgh Supercomputing Center (PSC)
    – Provides computational power via its 3,000-processor HP AlphaServer system, TCS-1, which offers 6 teraflops of capability coupled uniquely to a 21-node visualization system.
    – It also provides a 128-processor, 512-gigabyte shared-memory HP Marvel system, a 150-terabyte disk cache, and a mass storage system with a capacity of 2.4 petabytes.
  – Purdue University
    – Provides 6 teraflops of computing capability.
    – 400 terabytes of data storage capacity.
    – Visualization resources, access to life science data sets, and a connection to the Purdue Terrestrial Observatory.

Page 43: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid – 11 Sites
  – San Diego Supercomputer Center (SDSC)
    – Leads the TeraGrid data and knowledge management effort.
    – Provides a data-intensive IBM Linux cluster based on Itanium processors that reaches over 4 teraflops and 540 terabytes of network disk storage.
    – In addition, a portion of SDSC’s IBM 10-teraflops supercomputer is assigned to the TeraGrid.
    – An IBM HPSS archive currently stores a petabyte of data.

Page 44: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid – 11 Sites
  – Texas Advanced Computing Center (TACC)
    – Provides a 1024-processor Cray/Dell Xeon-based Linux cluster.
    – A 128-processor Sun E25K Terascale visualization machine with 512 gigabytes of shared memory.
    – Total of 6.75 teraflops of computing/visualization capacity.
    – Provides a 50-terabyte Sun storage area network.
    – Only half of the cycles produced by these resources are available to TeraGrid users.

Page 45: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid – 11 Sites
  – University of Chicago/Argonne National Laboratory (UC/ANL)
    – Provides users with high-resolution rendering and remote visualization capabilities via a 1-teraflop IBM Linux cluster with parallel visualization hardware.
  – National Center for Atmospheric Research (NCAR)
    – Located in Boulder, CO.
    – “Frost” – BlueGene/L computing system. The 2048-processor system brings 250 teraflops of computing capability and more than 30 petabytes of online and archival data storage to the TeraGrid.

Page 46: Topics in BMI: Grid Computing

BMI Examples of Grid Usage

TeraGrid Applications
  – The Center for Imaging Science (CIS) at Johns Hopkins University has deployed shape-based morphometric tools on the TeraGrid to support the Biomedical Informatics Research Network, a National Institutes of Health initiative involving 15 universities and 22 research groups whose work centers on brain imaging of human neurological disorders and associated animal models.
  – University of Illinois, Urbana-Champaign has a project that uses massive parallelism on the TeraGrid for major advances in the understanding of membrane proteins.
  – Another project is also harnessing the TeraGrid to attack problems in the mechanisms of bioenergetic proteins, the recognition and regulation of DNA by proteins, the molecular basis of lipid metabolism, and the mechanical properties of cells.

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

Molecular visualization and modeling tool.
Study of geometry and properties of molecules.
GridMol features include:
– Modifying bond lengths and angles.
– Changing dihedral angles (the angle between two planes, each determined by three connected atoms).
– Adding or deleting atoms.
– Adding radicals.
Globus Toolkit based.
– Scheduling tool is non-GT middleware.
Coded in Java, Java 3D, C/C++, and OpenGL.
Standalone application or applet for browser.
Runs on the China National Grid (CNGrid).
– Composed of 8 supercomputer sites across China.
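
The dihedral angle mentioned in the feature list can be made concrete with a short calculation. The sketch below is not GridMol code; it is a minimal illustration, assuming four atom positions given as (x, y, z) tuples, of how the angle between the plane through atoms 1-2-3 and the plane through atoms 2-3-4 can be computed. The function names are hypothetical, and the result is the unsigned angle in degrees.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def dihedral_degrees(p0, p1, p2, p3):
    """Unsigned angle between plane (p0, p1, p2) and plane (p1, p2, p3)."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the first plane
    n2 = cross(b2, b3)  # normal of the second plane
    cos_angle = dot(n1, n2) / math.sqrt(dot(n1, n1) * dot(n2, n2))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

# Four atoms lying in one plane on opposite sides of the central bond (trans)
print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

Changing a dihedral angle in a modeling tool amounts to rotating the fourth atom (and everything attached to it) about the central bond until this value reaches the target.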

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

GridMol System Overview

[Figure: GridMol system overview. Labels: GridMol (Modeling, Job Submission, Visualization); CNGrid Middleware; Globus Toolkit; CNGrid (HPC1, HPC2, HPC3, HPC4).]

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

Overview figure points:
– A job is submitted to the CNGrid Middleware, which executes the application on available High Performance Computer (HPC) systems based on performance requirements.
– GridMol maintains a history of job descriptions so that jobs can be recalled in future operations.
– After the job is submitted, users can query its status to determine whether it was submitted successfully or has failed.
– After a job has finished, GridMol can be used to analyze the results using several different visualization tools.
– GridMol completely abstracts the underlying grid infrastructure from the user. Users do not need to know how to submit a job to the grid or on which HPC(s) it will run, which allows the researcher to focus on the molecule modeling problem and not be bothered with issues related to using the computer grid system.

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

GridMol supports six different molecule display models.

Different display models highlight different aspects of a molecule, each having its unique advantages and disadvantages.
– Line Model - Bonds are shown as lines while atoms are not displayed.

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

Different display models
– Ball and Stick Model - All atoms are shown as spheres of different size and all bonds are shown as cylinders of different lengths.

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

Different display models
– Space-filled Model - All atoms are shown as spheres whose size and color depend on their van der Waals radius and atom type, while bonds are not displayed. This model provides a good way to understand the volume of the molecule being studied.

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

Different display models
– Tube Model - For large molecules such as protein, DNA, or RNA, only the backbone atoms are displayed with cylinders.

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

Different display models
– Ribbon Model - In this model (also for large molecules) the backbone atoms and additional information are used to draw directed ribbons.

BMI Examples of Grid Usage
GridMol - Molecular modeling on a Computer Grid

Different display models
– Cartoon Model - This model is used to display a protein’s secondary structure including helix, sheet, and coil.

BMI Examples of Grid Usage
Genetic Research

Analysis and discovery of gene sequences require massive computational power.
Current techniques in this field generate increasingly large and complex data sets.
Knowledge discovery algorithms
Data mining
Remote collaboration between experts

Perfect candidate for computational grid architecture.

BMI Examples of Grid Usage
Genetic Research – GNARE

Genome Analysis Research Environment
High-performance scalable computer system for efficient automation of the major steps in genomic analysis.
Steps include:
– Data acquisition from different genomic DBs.
– Genome analysis by several bioinformatics tools and algorithms.
– Storing results of analysis and annotations.
Utilizes the resources of several Grid systems:
– Grid2003
– TeraGrid
– Department Of Energy (DOE)

BMI Examples of Grid Usage
Genetic Research – GNARE

Composed of three major parts
– GADU
– Integrated Database
– Web-based apps to run the system.
GADU
– “Heart” of the system.
– Gateway to the grid, handling all computational analysis.
– Automated, scalable, high-throughput workflow engine.
– Executes computationally intensive workflows on the grid.
– Interfaces to the Integrated database.

BMI Examples of Grid Usage
Genetic Research – GNARE

GNARE Architecture

BMI Examples of Grid Usage
Genetic Research – GNARE

Integrated Database
– Holds the genome sequence data and annotations from monitored public databases.
– Holds results of data analysis from the GADU update engine.
Web-based applications
– Front-end for GADU’s analysis services.
– Integrated database.
GADU details
– Heart of the system
– Diagram on next slide…

BMI Examples of Grid Usage
Genetic Research – GNARE

GADU Architecture

BMI Examples of Grid Usage
Genetic Research – GNARE

GADU detail
The job description file (item 3) describes the job.
– May involve simply running a bioinformatics tool on the local machine.
– May execute a predefined complex workflow on the grid.
Using the information in this file, the Job Processing Server (item 1), along with the Workflow Generator (item 5), creates the actual workflow in a Virtual Data Language.
The Job Processing Server (item 1) accepts a Job Description File (item 3) and creates a worker process (item 2) to handle the job. The Site Selector (item 4) and the Workflow Generator (item 5) are also involved.
The Job Processing Server also creates a session for each job and controls the Site Selector (item 4) to keep an updated list of working sites for job submission.
The worker process (item 2) determines how to handle each job based on the information in the Job Description File (item 3).
The worker process first creates the directory structure for the job and then sends the Sequence Database (item 7) to all usable sites on the grid as determined by the Site Selector (item 4).
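
The hand-off between the numbered GADU components can be sketched in a few lines. This is only an illustration of the flow described on this slide, not GADU's actual code: every class name, field, and log message is hypothetical, and the real system generates workflows in a Virtual Data Language rather than returning a list of strings.

```python
from dataclasses import dataclass, field

@dataclass
class JobDescription:
    """Item 3: describes the job (fields are illustrative assumptions)."""
    name: str
    tool: str
    run_on_grid: bool

@dataclass
class SiteSelector:
    """Item 4: keeps track of which grid sites are currently usable."""
    sites: dict  # site name -> True if the site is working

    def working_sites(self):
        return sorted(name for name, ok in self.sites.items() if ok)

@dataclass
class Worker:
    """Item 2: handles one job according to its description."""
    job: JobDescription
    selector: SiteSelector
    log: list = field(default_factory=list)

    def run(self):
        self.log.append(f"create directories for {self.job.name}")
        if not self.job.run_on_grid:
            self.log.append(f"run {self.job.tool} locally")
            return self.log
        for site in self.selector.working_sites():
            self.log.append(f"send sequence database to {site}")
        self.log.append(f"generate workflow for {self.job.tool}")
        return self.log

class JobProcessingServer:
    """Item 1: accepts job descriptions and keeps one session per job."""
    def __init__(self, selector):
        self.selector = selector
        self.sessions = {}

    def accept(self, job):
        worker = Worker(job, self.selector)
        self.sessions[job.name] = worker
        return worker.run()

server = JobProcessingServer(SiteSelector({"siteA": True, "siteB": False}))
for line in server.accept(JobDescription("job1", "toolX", run_on_grid=True)):
    print(line)
```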

BMI Examples of Grid Usage
Genetic Research – GNARE

Additional uses include
– Bio-Defense research
– Structural Biology
– Bioremediation
Benefits
– Reduced human intervention to process genome sequences.
– Dramatic reduction in time to process a sequence.
– Simplifies the analysis of newly sequenced genomes.

BMI Examples of Grid Usage
Genetic Research – Grid-Allegro

“Marries” the Allegro genome analysis S/W with a computational grid architecture.
– Goal - Improve Performance
Allegro is a genome linkage analysis tool
– Readily sub-dividable into multiple sub-jobs.
  – Perfect for the massive parallel processing power of a grid.
– Genotype simulation on a stand-alone PC is “doable”
  – But…takes weeks or months depending on the number of genotypes and pedigrees.
  – Grid can help with this!
Grid-Allegro implemented on Swegrid VO.
– Globus-based Swedish national computer grid.
– 600 computers in 6 clusters located across Sweden.

BMI Examples of Grid Usage
Genetic Research – Grid-Allegro

Swegrid VO
– Member of the NorduGrid VO
  – Created in 2001 with 2500 processors
Grid-Allegro Environment

[Figure: Grid-Allegro environment. Labels: Local http server / Master node; Globus Middleware; SWEGRID/NORDUGRID; Remote grid clusters / Remote workers.]

BMI Examples of Grid Usage
Genetic Research – Grid-Allegro

Grid-Allegro system overview
Two programs, written in Perl, were created to implement the Grid-Allegro system.
The first program (Gridallegrosteep1.pl) runs locally on the master node. It
– prepares the input files that will be submitted to the grid
– creates a specific number of grid jobs using the Globus Resource Specification Language (RSL).
The Grid-broker program (the second program, named gridallegrosteep2.pl) handles the distribution of the jobs to the remote nodes of the grid. It
– constantly evaluates the status of each job, managing re-submissions in the case of failure or excessive delay in a grid scheduling queue
– collects the output results when completed by a remote node.
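
The two-step design above (split the work, then broker the jobs with re-submission) can be illustrated as follows. The original programs are Perl scripts that emit Globus RSL; this Python sketch makes no attempt to reproduce them. All function names are hypothetical, and the `submit` callable stands in for whatever actually sends a job to a remote node, returning `None` to signal a failure or excessive queue delay.

```python
def split_simulations(total, n_jobs):
    """Step 1: divide `total` simulations into `n_jobs` near-equal sub-jobs."""
    base, extra = divmod(total, n_jobs)
    return [base + (1 if i < extra else 0) for i in range(n_jobs)]

def run_broker(jobs, submit, max_attempts=3):
    """Step 2: submit every sub-job, re-submitting failed ones, and collect results."""
    results, attempts = {}, {}
    for job_id, size in enumerate(jobs):
        for attempt in range(1, max_attempts + 1):
            attempts[job_id] = attempt
            outcome = submit(job_id, size)  # None means failure or timeout
            if outcome is not None:
                results[job_id] = outcome
                break
    return results, attempts

def flaky_submit_factory(fail_once):
    """Fake grid for demonstration: job ids in `fail_once` fail on their first try."""
    seen = set()
    def submit(job_id, size):
        if job_id in fail_once and job_id not in seen:
            seen.add(job_id)
            return None
        return size  # pretend the result is the number of simulations completed
    return submit

jobs = split_simulations(1000, 600)
results, attempts = run_broker(jobs, flaky_submit_factory({5}))
print(sum(results.values()), "simulations completed;", attempts[5], "attempts for job 5")
```

A real broker would poll job status asynchronously instead of looping job by job; the sequential loop here only keeps the sketch short.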

BMI Examples of Grid Usage
Genetic Research – Grid-Allegro

Performance improvements
– The Grid-Allegro system was tested with an analysis of Swedish families with Alzheimer’s disease (AD).
– The study was conducted on 109 families consisting of 470 individuals.
– A test of this size (requiring 1000 simulations) using the Allegro system on a 2 GHz/512 MB PC was calculated to take 1200 days, or roughly 3.3 years.
– Including an application movement latency of 12 hours and running with the full complement of nodes (600), the complete analysis took 2.6 days.
– 2.6 days (62.4 hours) versus 1200 days is a 461-fold improvement.
– No hardware costs, only software!
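
The figures on this slide can be checked with a few lines of arithmetic. The snippet below uses only the numbers quoted above (1200 days, 2.6 days, 600 nodes); the parallel-efficiency figure is a derived quantity added here for illustration and does not appear in the source.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

pc_days = 1200      # estimated run time on a single PC
grid_days = 2.6     # measured run time on the grid, latency included
nodes = 600         # full complement of grid nodes

grid_hours = grid_days * HOURS_PER_DAY
pc_years = pc_days / DAYS_PER_YEAR
speedup = pc_days / grid_days
efficiency = speedup / nodes  # fraction of the ideal 600-fold speedup

print(f"grid run time: {grid_hours:.1f} hours")
print(f"single PC:     {pc_years:.1f} years")
print(f"speedup:       {speedup:.1f}-fold")
print(f"efficiency:    {efficiency:.0%} of ideal")
```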

BMI Examples of Grid Usage
Medical Imaging

Mammography
– Important tool for detection of breast cancer.
– Old film-based images made storing and retrieving difficult.
  – Must be converted to digital.
– Modern images are digital – no conversion required.
– Benefits of digital images
  – Allows for faster retrieval and more efficient storage.
  – Allows for analysis by doctors and researchers in remote locations.
  – Easier access to images by hospitals, universities, and research facilities would improve breast cancer screening and diagnosis.
– Need for a system that is able to provide large-scale digital image storage and analysis services, allow multiple medical sites to store, process, and data-mine the images, manage mammograms as digital images, and make these images available to other hospitals, universities, and research institutions.

Data grid architecture would be an excellent choice for these requirements.

BMI Examples of Grid Usage
Medical Imaging

Mammography System requirements
– A typical digital mammographic image is 32 MB, with four images taken per patient examination, for a total of 128 MB per patient [3].
– Each record would need to include patient demographics, attending physician information, and related examination notes.
– Must be scalable to support thousands of patients, with thousands more added every year for multiple hospitals and clinics.
– Patient privacy and image access must be controlled to protect against unauthorized viewing and modification.
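
The storage requirement follows directly from the per-image size. The sketch below reproduces the 128 MB per examination figure and shows how yearly storage grows with examination volume; the examination count of 10,000 per year is a made-up input for illustration, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
IMAGE_MB = 32          # typical digital mammographic image, from the slide
IMAGES_PER_EXAM = 4    # images taken per patient examination, from the slide
MB_PER_TB = 1_000_000  # decimal terabyte (assumption)

def exam_size_mb():
    """Storage needed for one patient examination, in MB."""
    return IMAGE_MB * IMAGES_PER_EXAM

def yearly_storage_tb(exams_per_year):
    """Image storage needed per year, in TB, ignoring metadata and replicas."""
    return exams_per_year * exam_size_mb() / MB_PER_TB

print(exam_size_mb(), "MB per examination")
print(yearly_storage_tb(10_000), "TB per year for 10,000 examinations")
```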

BMI Examples of Grid Usage
Medical Imaging

Proposed Mammography architecture components

BMI Examples of Grid Usage
Medical Imaging

Mammography architecture component descriptions
Image capture workstation
– Technicians digitize images and convert them to a high-quality data image using the Digital Imaging and Communications in Medicine (DICOM) open format.
– The DICOM standard was developed by the American College of Radiology along with others.
– Simplifies the development of image recognition and analysis programs.
Grid nodes
– Grid nodes are resources provided to/by the grid.
– Each university, clinic, or research facility that participates will add servers along with image storage for DICOM files.
– Each node would have a relational federated database to store patient data and image metadata.

BMI Examples of Grid Usage
Medical Imaging

Mammography architecture component descriptions
Image session workstation
– Radiologists and researchers retrieve the images and related data to perform diagnostics and research-related activities.
Management portal
– A centrally located component used by administrators to perform management tasks, such as managing the system workload and capacity.
OGSA-DAI – Open Grid Services Architecture – Data Access and Integration
– Provides a standard interface for a distributed query processing system to access data in different databases using SOA and open standards.

BMI Examples of Grid Usage
Medical Imaging

Mammography architecture component descriptions
Security and Privacy
– Data is protected using cryptography and other security strategies defined in the OGSA standard.
Current implementation
– Using GT4 architecture.
– Includes four screening centers and five universities with approximately 35 staff members, with 256 Terabytes of data being stored per year.
– Only supports mammography, but future plans include expanding to other digital images and expanding the number of participating members to include world-wide facilities.
– Data mining technologies that let a researcher find “similar” images are also being planned for future releases.

BMI Examples of Grid Usage
Neuroimaging

Helps physicians and researchers expand their knowledge and understanding of the brain.
Helps with detecting abnormalities and cancers.
Includes imaging technologies
– Positron Emission Tomography (PET)
– functional Magnetic Resonance Imaging (fMRI)
Computational infrastructure advancements are required in the storage, analysis, and sharing of fMRI data.
Properly analyzing fMRIs of the brain requires repeated averaging of subsections of time series (TS) data and correlating this TS data.
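
The two operations named above, averaging subsections of a time series and correlating time series, can be shown on toy data. This is a generic sketch, not the analysis pipeline of any cited system: the function names are hypothetical, the subsections are assumed to be equal-length windows, and Pearson correlation is assumed as the correlation measure.

```python
from math import sqrt

def average_epochs(series, starts, length):
    """Element-wise mean of the equal-length subsections beginning at `starts`."""
    epochs = [series[s:s + length] for s in starts]
    return [sum(values) / len(epochs) for values in zip(*epochs)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Toy signal from one location: two 3-sample subsections starting at 0 and 3
signal = [0, 1, 2, 0, 3, 4]
print(average_epochs(signal, starts=[0, 3], length=3))
print(pearson([1, 2, 3], [2, 4, 6]))
```

With many thousands of locations and many subjects, these simple operations are repeated often enough that the total cost becomes substantial, which is the motivation for distributing them.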

BMI Examples of Grid Usage
Neuroimaging

System proposed in paper [Uri Hasson. Improving the analysis, storage and sharing of neuroimaging data using relational databases and distributed computing. NeuroImage 39 (2008)]

Proposes a Grid-enabled, DBMS-based approach to handle the computational demands for imaging research, storage, and analysis of fMRI data.

Requirements of the proposed system
– efficiently store data.
– enable rapid selection of data.
– make data easily accessible for both local and remote users.

BMI Examples of Grid Usage
Neuroimaging

The cited advantages of a database-centric framework include
– Using the DBMS for storing and sharing data.
– Taking advantage of the DBMS capabilities by making the database an integral part of the fMRI data analysis workflow.
System Architecture
– Distributed clients pull data from a central server and work independently and simultaneously to conduct their analysis.
– The server maintains a relational database that stores the data to be analyzed as well as the metadata (assignment of nodes to anatomical regions of interest).
– Regional replicas of the database are maintained for locality/performance and backup.
  – A TS data set can be 10 GB or more; localized replicas reduce network latency issues.
  – Also helpful with database concurrency performance.
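
Making the database part of the analysis workflow means that selection and aggregation happen in SQL rather than in client code. The sketch below illustrates the idea with an in-memory SQLite database. The schema (tables `node_region` and `ts`) is a hypothetical simplification invented for this example, not the schema used in the cited paper.

```python
import sqlite3

def build_db(regions, samples):
    """Create a toy database: node-to-region metadata plus time series samples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node_region (node_id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute("CREATE TABLE ts (node_id INTEGER, t INTEGER, value REAL)")
    conn.executemany("INSERT INTO node_region VALUES (?, ?)", regions)
    conn.executemany("INSERT INTO ts VALUES (?, ?, ?)", samples)
    return conn

def region_mean_series(conn, region):
    """Mean signal per time point over all nodes assigned to `region`."""
    rows = conn.execute(
        "SELECT ts.t, AVG(ts.value) FROM ts "
        "JOIN node_region USING (node_id) "
        "WHERE node_region.region = ? "
        "GROUP BY ts.t ORDER BY ts.t",
        (region,),
    ).fetchall()
    return [mean for _, mean in rows]

regions = [(1, "A"), (2, "A"), (3, "B")]
samples = [(1, 0, 1.0), (1, 1, 3.0), (2, 0, 3.0), (2, 1, 5.0), (3, 0, 10.0), (3, 1, 10.0)]
conn = build_db(regions, samples)
print(region_mean_series(conn, "A"))
```

A client working on one anatomical region only pulls the rows it needs, which is the kind of rapid selection the requirements list calls for.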

BMI Examples of Grid Usage
Neuroimaging

System diagram

BMI Examples of Grid Usage
Drug Discovery and Design

Drug design uses a molecular modeling technique that requires the screening of millions of molecular compounds located in a chemical database (CDB) to identify those that are potentially useful drugs.

Drug development is a lengthy process that can take up to 15 years from the first compound synthesis in the laboratory to the drug being available to the consumer.

This process has been estimated to cost an average of $800 million per drug [3].

The screening technique is referred to as molecular docking.
– Docking helps scientists predict how small molecules chemically bind to an enzyme or a protein receptor of a known three-dimensional molecular structure.

BMI Examples of Grid Usage
Drug Discovery and Design

The docking process is both a computationally intensive and a data-intensive task.
– This makes it a perfect candidate for computer grid technologies.
Performing molecular docking requires information about the molecule that is located in one of many large CDBs.
– Each CDB can require as much as 1 Terabyte of storage.
– However, each docking process only requires one molecular compound record (ligand[1]), not the entire database.

[1] An ion, a molecule, or a molecular group that binds to another chemical entity to form a larger complex entity. [4]

BMI Examples of Grid Usage
Drug Discovery and Design

Designing a drug requires the screening of millions of ligands located in different CDBs.

Depending on the complexity of the compound, a single screening may take anywhere between a few minutes and a few hours on a standard PC.
– Screening all compounds in a single database can take years!
– A drug design problem that involves screening 180,000 compounds, with each compound screening job taking three hours on a desktop PC, requires (180,000 x 3) 540,000 hours, or roughly 61 years!
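
The 61-year estimate, and the effect of spreading the same jobs over a grid, can be reproduced with a short calculation. The serial figures come from the slide; the 600-node grid size is a hypothetical input chosen for illustration, and the parallel estimate assumes perfectly independent jobs with no scheduling or data-transfer overhead.

```python
from math import ceil

HOURS_PER_YEAR = 24 * 365

def serial_hours(compounds, hours_each):
    """Total time to screen every compound one after another on a single PC."""
    return compounds * hours_each

def ideal_grid_days(compounds, hours_each, nodes):
    """Best-case wall-clock days when jobs are spread evenly over `nodes` machines."""
    rounds = ceil(compounds / nodes)  # jobs each node must run back to back
    return rounds * hours_each / 24

total = serial_hours(180_000, 3)
print(total, "hours on one PC, about", round(total / HOURS_PER_YEAR, 1), "years")
print(ideal_grid_days(180_000, 3, nodes=600), "days on an ideal 600-node grid")
```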

BMI Examples of Grid Usage
Drug Discovery and Design

The screening process can be implemented in parallel utilizing grid technology.
– Depending on the grid size, this time (61 years) can be significantly reduced.
Reference [4] proposes a “Virtual Laboratory Tool” that transforms existing molecular modeling applications so they can be processed in parallel.
– Sub-jobs require minimal CDB access but are computationally intense.

BMI Examples of Grid Usage
Drug Discovery and Design

Virtual Laboratory (VL) workflow…
– The drug designer formulates the molecular docking problem.
– The problem is submitted to the Grid Resource Broker, along with performance and optimization requirements.
– The broker discovers the resources, establishing their cost and capabilities.
– A schedule is then prepared to map docking jobs to resources.
– The broker dispatcher deploys its agents to the appropriate resources.
– The agent executes a list of commands specified in the job’s task specification.
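
The scheduling step, mapping docking jobs to discovered resources, can be illustrated with a simple greedy rule. This is not the algorithm used by the Grid Resource Broker described in [4]; it is a minimal sketch that assumes each resource is characterized only by a speed (work units per hour) and that each job goes to whichever resource would finish it earliest. Costs and deadlines, which a real broker also weighs, are left out.

```python
def schedule(jobs, speeds):
    """Greedy earliest-finish mapping of jobs to resources.

    jobs:   list of (job_id, work_units)
    speeds: dict of resource name -> work units processed per hour
    Returns (plan, makespan): jobs assigned per resource and total hours needed.
    """
    busy_until = {name: 0.0 for name in speeds}
    plan = {name: [] for name in speeds}
    for job_id, work in jobs:
        # Choose the resource on which this job would complete soonest;
        # sorting the names makes tie-breaking deterministic.
        name = min(sorted(speeds), key=lambda n: busy_until[n] + work / speeds[n])
        busy_until[name] += work / speeds[name]
        plan[name].append(job_id)
    return plan, max(busy_until.values())

plan, makespan = schedule(
    jobs=[("ligand-a", 2), ("ligand-b", 2), ("ligand-c", 2)],
    speeds={"fast": 2.0, "slow": 1.0},
)
print(plan, makespan)
```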

BMI Examples of Grid Usage
Drug Discovery and Design

Virtual Laboratory (VL) workflow…
Typical tasks may include
– Copying executables and input files from the user machine or extracting records from a remote CDB.
– Substituting parameters declared in the input file.
– Executing the program.
– Copying the results back to the user.
Virtual Laboratory features
– VL builds on existing grid technologies.
– Provides new tools for managing and accessing remote CDBs as a network service.
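
The "substitute parameters declared in the input file" task is what turns one docking setup into many independent sub-jobs. The sketch below shows the idea with Python's standard `string.Template`. The input-file layout and the `${ligand}` placeholder are invented for illustration; they are not the real DOCK input syntax or the Nimrod plan-file format.

```python
from string import Template

# Hypothetical input-file template with one swept parameter.
INPUT_TEMPLATE = "ligand_file ${ligand}.mol2\nreceptor_file receptor.pdb\n"

def make_sub_jobs(template_text, ligand_ids):
    """Produce one concrete input file (as text) per ligand identifier."""
    template = Template(template_text)
    return [template.substitute(ligand=ligand_id) for ligand_id in ligand_ids]

for text in make_sub_jobs(INPUT_TEMPLATE, ["lig0001", "lig0002"]):
    print(text)
```

Each generated text would accompany one sub-job, so a remote agent needs only the single ligand record named in it rather than the whole chemical database.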

BMI Examples of Grid Usage
Drug Discovery and Design

VL software stack components
1 - The drug design molecular modeling software.
2 - The Nimrod Parameter Modeling tools for enabling DOCK as a parameter sweep application (creating sub-jobs).
3 - The Nimrod/G Grid Resource Broker for scheduling DOCK jobs on the grid.
4 - Chemical Database (CDB) management and intelligent access tools.
5 - The GrACE (GRid seArch & Categorization Engine) software for resource trading.
6 - The Globus middleware for secure and uniform access to the grid resources.
7 - The distributed computing and database resources (Grid).

BMI Examples of Grid Usage
Drug Discovery and Design

Layered architecture for the drug design VL.

APPLICATIONS: Molecular Modeling for Drug Design Software
PROGRAMMING TOOLS: Nimrod and Virtual Lab Tools (parametric programming language, GUI tools, and CDB indexer)
USER LEVEL MIDDLEWARE: Nimrod-G and CDB Data Broker (task farming engine, scheduler, dispatcher agent, CDB server)
CORE MIDDLEWARE: Globus (security, information, job management) and GrACE
FABRIC: Worldwide Grid (distributed computers and databases with different architectures, OS, and local resource management systems)

Topics in BMI: Grid Computing
Conclusion

Computational grid systems bring together multiple computer system resources.
– They create a (virtual) massive supercomputer for use by people and organizations to address a particular computational need (VO).
BMI applications can be particularly computationally challenging.
– They can require years to complete on conventional PCs.
– Utilizing grid technologies has shown a 400+ fold increase in performance (1200 days reduced to 2.6 days).

Topics in BMI: Grid Computing
Conclusion

What a Grid enables…
– Lessening time for mammography analysis and diagnosis.
– Enabling analysis of the human genome at a much faster pace.
– Helping with brain scan analysis and diagnosis.
– Quickening the process for drug discovery.
– Allowing for better and faster research in all fields.

Grid technology is saving Time, Money, and LIVES!

Can you think of a better use for a computer?

Topics in BMI: Grid Computing
References cited in this presentation

1. Bart Jacob, Michael Brown, Kentaro Fukui, and Nithar Trivedi. IBM Redbooks, Introduction to Grid Computing. IBM.com/redbooks. December 2005.

2. Roger Smith. Grid Computing: A Brief Technology Analysis. CTOnet.org. 2005. http://www.ctonet.org/documents/GridComputing_analysis.pdf

3. Luis Ferreira, Fabiano Lucchese, Tomoari Yasuda, Chin Yau Lee, Carlos Alexandre Queiroz, Elton Minetto, and Antonio Mungioli. IBM Redbooks, Grid Computing in Research and Education. IBM.com/redbooks. April 2005.

4. Rajkumar Buyya, Kim Branson, Jon Giddy, and David Abramson. The Virtual Laboratory: a toolset to enable distributed molecular modeling for drug design on the World-Wide Grid. Concurrency and Computation: Practice and Experience. 2003; 15:1–25.

7. Uri Hasson, Jeremy I. Skipper, Michael J. Wilde, Howard C. Nusbaum, and Steven L. Small. Improving the analysis, storage and sharing of neuroimaging data using relational databases and distributed computing. www.elsevier.com/locate/ynimg. NeuroImage 39 (2008) 693–706.

Topics in BMI: Grid Computing
Questions?

Comments?

Want to join a Grid system to help the world?
http://www.worldcommunitygrid.org/
http://boinc.berkeley.edu/

Thank you!