shared cyberinfrastructure for global medical research (pdf)

Garuda: The National Grid Computing Initiative - The shared cyberinfrastructure for data and compute intensive research
Subrata Chattopadhyay, CDAC Knowledge Park, Bangalore
www.garudaindia.in


Post on 14-Feb-2017


Page 1: Shared CyberInfrastructure for Global Medical Research (pdf)

Garuda : The National Grid Computing Initiative - The shared cyberinfrastructure for data and compute intensive research

Subrata Chattopadhyay, CDAC Knowledge Park, Bangalore

www.garudaindia.in

Page 2: Shared CyberInfrastructure for Global Medical Research (pdf)

Outline

• Introduction on Garuda
• NKN – highlights
• Tools and Services - GarudaWare
• Major Applications
• Collaborations
• Q & A

Page 3: Shared CyberInfrastructure for Global Medical Research (pdf)

Global Access to Resources Using Distributed Architecture

Page 4: Shared CyberInfrastructure for Global Medical Research (pdf)

Garuda on MPLS based NKN

[Figure: Garuda network diagram over the MPLS-based NKN. Legend: H = Head Node, G = Gateway. Labels shown: C-DAC, Bangalore; partners with resources (head node, compute nodes, storage, LAN); partners without resources (access terminals); local user on LAN; user; telescope; Gridfs access terminal; Internet access; MPLS access.]

Page 5: Shared CyberInfrastructure for Global Medical Research (pdf)

NKN – National Knowledge Network

• High Capacity, Highly Scalable Backbone
• Provides Quality of Service (QoS) and Security
• Wide Geographical Coverage
• Common Standard Platform
• Bandwidth from many NLDs
• Highly Reliable & Available by Design
• Test beds (for various implementations)
• Dedicated and Owned

Page 6: Shared CyberInfrastructure for Global Medical Research (pdf)

Garuda High Level System Components

[Figure: layered system architecture diagram. Labels shown: Access Portal, CLI, Visualization, Hand held devices, Cloud Interface; GARUDA-enabled Applications; Programming/Development Environment (Grid PSE, Workflows, Virtualization support); WSRF+GT4 + other Services + Cloud S/W (Nimbus/VMware); Federated Information Server; Job Scheduler; Resource Management; Data Grid; Grid Resource Enabler & Monitoring; Grid Security and High-Performance Grid Networking; NKN; Computing Resources and Virtual Organizations (CDAC Resource centers, Research Organizations, Educational institutions, Computing Centers, Non-Research Organizations).]

Page 7: Shared CyberInfrastructure for Global Medical Research (pdf)

GARUDA Middleware components

Utility tools
• RAT
• Compiler Service
• Gridftp GUI
• GARUDA Information Registry

Access Methods
• Access Portal
• Problem Solving Environments
• Workflows
• Visualization gateways
• Hand held device
• Cloud Interface

Management, Monitoring & Accounting
• Paryaveekshanam
• GARUDA Accounting
• MDS4

Security Framework
• IGCA Certificates
• VOMS
• MyProxy
• Login Service

Resource Mgmt & Scheduling
• Resource Reservation
• QoS
• GridWay Meta-scheduler
• Torque, Load Leveler
• Globus 4.x (WS Components)

Data Management
• SRB
• GSRM
• GridFTP

Page 8: Shared CyberInfrastructure for Global Medical Research (pdf)

– Indian Grid Certification Authority (IGCA), located at C-DAC, Knowledge Park, Bangalore, India.
– IGCA is an accredited member of APGridPMA.
– Issues X.509 certificates to support a secure Grid environment (for GARUDA, Indian institutes that do grid research, and foreign institutes that collaborate with GARUDA).
– http://ca.garudaindia.in

Page 9: Shared CyberInfrastructure for Global Medical Research (pdf)

GARUDA Short-Lived Certificate

GARUDA SLCS provides grid users instant access to the GARUDA grid for a trial period of 30 days.

Highlights:
• Hassle-free registration
• Get access in less than 5 minutes
• Service over the internet

Features:
• GARUDA Job submission portal
• GARUDA Compiler Service

Website: http://labs.garudaindia.in

Page 10: Shared CyberInfrastructure for Global Medical Research (pdf)

GARUDA Resources

CDAC Resource :

Fourteen of the partner institutions are also contributing resources, including satellite terminals.
Total computing power is more than 5500 CPUs, equivalent to 65 TF.
Storage space: 220 TB

Page 11: Shared CyberInfrastructure for Global Medical Research (pdf)

GARUDA Resources – cont...

Institution                                    Location   Resources
Space Application Centre                       Ahmedabad  VSAT Terminal - 2 Nos.
Indian Institute of Science                    Bangalore  64 cpu; POWER5; Linux
Raman Research Institute                       Bangalore  32 cpu; Opteron; Linux
Institute of Mathematical Sciences             Chennai    24 cpu; Opteron cluster (Cray XD1)
Madras Institute of Technology                 Chennai    16 cpu; P4; Linux
Indian Institute of Technology                 Delhi      32 cpu; Opteron; Linux
Jawaharlal Nehru University                    Delhi      32+16+16 cpu; Opteron, Opteron, Itanium; Linux
Institute of Genomics and Integrative Biology  Delhi      48 cpu; Xeon; Linux
Indian Institute of Technology                 Guwahati   128 cpu; Opteron; Linux
University of Hyderabad                        Hyderabad  32 way SMP; POWER4; AIX
Indian Institute of Technology                 Kharagpur  16+16 cpu; Power PC2, Xeon; AIX, Linux
Physical Research Laboratory                   Ahmedabad  320 cpu; 64-bit AMD
CDAC                                           Bangalore  64 cpu Power 5; 320 cpu Xeon Linux
CDAC                                           Hyderabad  320 cpu Xeon Linux
CDAC                                           Chennai    320 cpu Xeon Linux
CDAC                                           Pune       32 cpu Xeon Linux: 4068 CPU Linux

Page 12: Shared CyberInfrastructure for Global Medical Research (pdf)

GARUDA Operations & Management

• Looks after deployment of middleware and network
• Operates from CDAC KP Bangalore
• Operation Centre with high-resolution, scalable display wall
• Conducts regular Monday meetings among administrators to maintain Garuda health

Page 13: Shared CyberInfrastructure for Global Medical Research (pdf)

GARUDA Partners

• Motivation
– To collaborate on Research and Engineering of Technologies, Architectures, Standards and Applications
– To contribute to the aggregation of GARUDA resources

• Participation
– 36 research & academic institutions in 17 cities
– 8 centres of C-DAC
– Total of 45 institutions
– Over 20 additional labs with LOE

Page 14: Shared CyberInfrastructure for Global Medical Research (pdf)

Virtual User Community (VOMS)

Group Name         Description
Bioinformatics     Application of statistics and computer science to molecular biology
ClimateModelling   Deals with the dynamics of the climate system
OSDD               Community dedicated to developing drugs for tropical infectious diseases like malaria and tuberculosis
GeoPhysis          Study related to the physics of the Earth and its environment in space
CAE                Usage of computer software to solve engineering problems
IndianHeritage     Focused on technology products for preserving & processing Heritage texts
HealthInformatics  Focused on utilizing compute power for health informatics
MaterialScience    Interdisciplinary field applying the properties of matter to science and engineering
Euindia            The vision of a worldwide Grid for Research by both Europe and India
ToolsDeveloper     Forum to communicate and collaborate on developing Garuda Tools
GarudaAdmin        Meant for administrators from resource providers & Garuda Operation team members

Page 15: Shared CyberInfrastructure for Global Medical Research (pdf)

Applications on GARUDA

Page 16: Shared CyberInfrastructure for Global Medical Research (pdf)

OSDD Chemo-informatics

[Figure: OSDD chemo-informatics pipeline. Labels shown: PubChem, ChEMBL, DrugBank, Experimental Assays; Curated molecule datasets; Cheminformatics Models; Data Mining and Analysis; HT Virtual screening; Community of about 400.]

Page 17: Shared CyberInfrastructure for Global Medical Research (pdf)

Role of Garuda Grid in OSDD

Project Team

Page 18: Shared CyberInfrastructure for Global Medical Research (pdf)

OSDD-Garuda Interface

[Figure: OSDD-Garuda interface diagram. Labels shown: Internet/NKN, NKN, Results, Galaxy Workflow.]

Page 19: Shared CyberInfrastructure for Global Medical Research (pdf)

Weka Workflow

Page 20: Shared CyberInfrastructure for Global Medical Research (pdf)

Customized Galaxy Framework on GARUDA for OSDD: Chemo-informatics

• Integrated with the Grid authentication mechanism - Indian Grid Certification Authority (IGCA)

• Integrated with the GridWay Meta-scheduler - job scheduling and management

• Integrated the tools OSDD requires - Weka (for data mining) and Autodock (for virtual screening)

• Added support for uploading multiple input files as a single tar file

• Data libraries of the OSDD community are uploaded and shared by all users

• Integrated with PostgreSQL
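
The multi-file upload mentioned above implies bundling inputs into one tar archive before submission. A minimal sketch of that bundling step with Python's standard tarfile module follows; the helper names and the example file names are hypothetical and are not taken from the Galaxy customization itself.

```python
import io
import tarfile


def bundle_inputs(files):
    """Pack a {name: bytes} mapping into one in-memory gzip-compressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # Sort names so the archive layout is reproducible.
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_members(archive_bytes):
    """Return the member names of a tar archive, in stored order."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return archive.getnames()


# Hypothetical input files for a virtual screening run.
payload = bundle_inputs({"ligand_a.pdbqt": b"A", "ligand_b.pdbqt": b"BB"})
print(list_members(payload))
```

Building the archive in memory keeps the sketch free of temporary files; a real client would more likely tar files that already exist on disk.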

Page 21: Shared CyberInfrastructure for Global Medical Research (pdf)

Bioinformatics: Protein Structure Prediction on Grid

• The Genetic Algorithm (GA) for Protein Structure Prediction (PSP), an in-house code, has been Grid-enabled

• PSP jobs run concurrently by splitting the protein molecule into multiple overlapping parts

• Uses a Divide-and-Construct approach for
– Reduction in complexity
– Possibility of concurrency
– Handling larger protein molecules
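
The divide step can be illustrated with a short sketch. The slides do not state the actual splitting rule, so fixed-length windows with a fixed overlap are an assumption here, and the function name split_with_overlap is hypothetical.

```python
def split_with_overlap(sequence, part_length, overlap):
    """Split a sequence into consecutive windows that share `overlap` residues.

    The last window may be shorter than `part_length`.
    """
    if not 0 <= overlap < part_length:
        raise ValueError("overlap must be non-negative and smaller than part_length")
    step = part_length - overlap
    parts = []
    start = 0
    while True:
        parts.append(sequence[start:start + part_length])
        if start + part_length >= len(sequence):
            break
        start += step
    return parts


# Toy sequence of 10 residues, windows of 4 with 1 shared residue.
print(split_with_overlap("ABCDEFGHIJ", part_length=4, overlap=1))
```

The shared residues give the construct step a common region for joining neighbouring parts.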

Page 22: Shared CyberInfrastructure for Global Medical Research (pdf)

Flow of PSP

• Divide the protein sequence into parts
• Map each part onto a grid resource and run GA
• Construct the molecule by combining the parts and run GA on the combined sequence

[Figure: flow diagram. Input: Protein Sequence → Protein Sequence Parts 1, 2, 3 → Grid Resources 1, 2, 3 → Torsion Angles of Parts 1, 2, 3 → Combined GA output for full molecule.]
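
The map and construct steps of this flow can be sketched as follows. This is only an illustration under stated assumptions: local threads stand in for grid resources, predict_torsion_angles is a hypothetical placeholder for the GA run (it returns dummy angles), and the final GA pass over the combined sequence is omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_torsion_angles(part):
    """Placeholder for the per-part GA run: one dummy (phi, psi) pair per residue."""
    return [(0.0, 0.0) for _ in part]


def run_parts_concurrently(parts, max_workers=3):
    """Run the placeholder GA on all parts at once, keeping results in part order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict_torsion_angles, parts))


def combine(per_part_angles, overlap):
    """Join per-part angle lists, dropping the overlapping prefix of each later part."""
    combined = list(per_part_angles[0]) if per_part_angles else []
    for angles in per_part_angles[1:]:
        combined.extend(angles[overlap:])
    return combined


# Three overlapping parts of a toy 10-residue sequence (1 shared residue each).
parts = ["ABCD", "DEFG", "GHIJ"]
angles = run_parts_concurrently(parts)
print(len(combine(angles, overlap=1)))
```

On the grid, each call to the placeholder would instead be a job submitted to a separate resource, with the combined angles feeding the final GA run on the full molecule.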

Page 23: Shared CyberInfrastructure for Global Medical Research (pdf)

Performance of GA based PSP on Garuda

• Dataset:
– 1TUP, a tumor suppressor protein having 219 amino acids

• The molecule is split into 9 parts, each with 30 amino acids

• GA on the full molecule took 76 hours, whereas distributed GA on Garuda took only 3 hours
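
The figures above can be checked with simple arithmetic. The sketch below assumes every part has exactly 30 residues and that the overlap is spread evenly over the 8 junctions between the 9 parts; the slides state neither.

```python
def speedup(serial_hours, distributed_hours):
    """Ratio of serial run time to distributed run time."""
    return serial_hours / distributed_hours


def average_overlap(total_residues, parts, part_length):
    """Average number of residues shared at each junction between neighbouring parts."""
    shared = parts * part_length - total_residues
    return shared / (parts - 1)


print(round(speedup(76, 3), 1))      # 25.3
print(average_overlap(219, 9, 30))   # 6.375
```

A speedup of about 25 on 9 parts exceeds the part count, which is consistent with the earlier point that splitting reduces complexity as well as enabling concurrency.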

Page 24: Shared CyberInfrastructure for Global Medical Research (pdf)
Page 25: Shared CyberInfrastructure for Global Medical Research (pdf)

Data/Memory Intensive Applications on Garuda

Page 26: Shared CyberInfrastructure for Global Medical Research (pdf)

Computationally Intensive Applications on Garuda

Page 27: Shared CyberInfrastructure for Global Medical Research (pdf)

caBIG - Garuda

• Exploring possibilities in Collaboration

• Interoperability of the “grid” technologies
– Make the software components talk to each other
– Follow same Data standards for publication
– Common tool base for researchers

• Leverage HPC capabilities for applications

Page 28: Shared CyberInfrastructure for Global Medical Research (pdf)

Other Areas of Collaboration

• Indian Cancer Grid

• Protein folding analysis (using caGrid workflow, transport and security technology)

• caTissue – implement in the Software as a Service (SaaS) model

• Building a regional biobanking system—based on caTissue—at the Tata Memorial Centre & Hospital in Mumbai

Page 29: Shared CyberInfrastructure for Global Medical Research (pdf)

Translational Cancer Prevention & Biomarkers Workshop 2011 @ Bangalore

Overall goal

• Facilitate meeting the priorities of NCI and ICMR towards the discovery and application of carcinogenesis biology in cancer prevention.

• Discuss development of a personalized approach to cancer prevention and control through linking cancer biology to population diversity.

• Consider development and validation of cost-effective biomarkers capable of early detection of cancers through global scientific, population and technological resources.

• Improve and share population databases from India and the United States to compare cancer biology, incidence, mortality, natural history, geographic and population diversity.

• Create understanding between India and Western nations to develop collaborative studies.

• Further details at http://canbio.in/overview.htm

Page 30: Shared CyberInfrastructure for Global Medical Research (pdf)

Innovative Startup

Highlights

• Founded in 2002 by two Yale-trained physicians.

• Teleradiology services to hospitals around the globe.

• Teleradiology services include interpretation of all non-invasive imaging studies, namely CT, MRI, ultrasound, nuclear medicine studies and digitized X-rays.

• Emergency reports are provided within thirty minutes.

• Joint research partnerships with major technology vendors such as GE, to explore new techniques in 3D imaging analysis.

• Further details at http://www.telradsol.com/

Page 31: Shared CyberInfrastructure for Global Medical Research (pdf)


Collaborative Class Room

Supported Features:

• Interface to Access Grid
• GSRM-based data storage for maintaining course repositories
• Indexing of course material based on keywords

Website: http://ccr.garudaindia.in

Page 32: Shared CyberInfrastructure for Global Medical Research (pdf)

Interoperability with International Grids

• Integrating technological components of Garuda and EGI
– gLite and Globus
– Customizing the GridWay meta-scheduler
– To run real-life applications across both infrastructures

• Collaboration between caBIG and Garuda
– Interoperation of technological services among these grids
– Cancer research application portability
– Contribution to standards for using distributed computing in health care

Page 33: Shared CyberInfrastructure for Global Medical Research (pdf)

Thank you!
很好, 谢谢
Grazie tanto!