Grids for Chemical Informatics Randall Bramley, Geoffrey Fox, Dennis Gannon, Beth Plale Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401

Grids for Chemical Informatics

Randall Bramley, Geoffrey Fox, Dennis Gannon, Beth Plale

Computer Science, Informatics, Physics

Pervasive Technology Laboratories

Indiana University Bloomington IN 47401

What is a Grid?

• Name borrowed from the power grid.
• The concept: a ubiquitous information & computation resource.
• A definition: a network of compute and data resources that has been supplemented with a layer of services that provide uniform and secure access to a set of applications of interest to a distributed community of users.
• Grids may be wide-area or enterprise.

Scientific Challenges

The current and future generations of scientific problems are:

• Data oriented
  • Increasingly stream based
  • Often need petabyte archives
• In need of on-demand computing resources
• Conducted by geographically distributed teams of specialists
  • Who don’t want to become experts in grid computing

Science Communities and Outreach

• Communities
  • CERN’s Large Hadron Collider experiments
  • Physicists working in HEP and similarly data intensive scientific disciplines
  • National collaborators and those across the digital divide in disadvantaged countries
• Scope
  • Interoperation between LHC Data Grid Hierarchy and ETF
  • Create and deploy Scientific Data and Services Grid Portals
  • Bring the power of ETF to bear on LHC physics analysis: help discover the Higgs boson!
• Partners
  • Caltech
  • University of Florida
  • Open Science Grid and Grid3
  • Fermilab
  • DOE PPDG
  • CERN
  • NSF GriPhyn and iVDGL
  • EU LCG and EGEE
  • Brazil (UERJ,…)
  • Pakistan (NUST,…)
  • Korea (KAIST,…)

LHC Data Distribution Model

[Figure: Genetics and Disease Susceptibility. Identify genes across Phenotypes 1 to 4 for predictive disease susceptibility, drawing on physiology, metabolism, endocrine, proteome, immune, transcriptome, biomarker signatures, morphometrics, pharmacokinetics, ethnicity, environment, age and gender. Source: Terry Magnuson, UNC]

[Figure: On-demand storm predictions. Elements shown: streaming observations, data mining, storms forming, forecast model]

Information/Knowledge Grids

• Distributed (10’s to 1000’s) data sources (instruments, file systems, curated databases …)
• Data Deluge: 1 (now) to 100’s of petabytes/year (2012)
  • Moore’s law for sensors
• Possible filters assigned dynamically (on-demand)
  • Run image processing algorithm on telescope image
  • Run gene sequencing algorithm on compiled data
• Needs decision support front end with “what-if” simulations
• Metadata (provenance) critical to annotate data
• Integrate across experiments as in multi-wavelength astronomy
• Data Deluge comes from pixels/year available
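
The idea of filters assigned on demand, with provenance metadata recorded alongside the data, can be sketched roughly as follows. This is a minimal illustration and not taken from the talk: the `Record` type, the `threshold` and `scale` filters, and the sample readings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Record:
    """One item from a data source plus its provenance trail (hypothetical type)."""
    source: str
    value: float
    provenance: List[str] = field(default_factory=list)


# A filter returns a (possibly new) record, or None to drop the item.
Filter = Callable[[Record], Optional[Record]]


def threshold(minimum: float) -> Filter:
    """Keep only records whose value is at least `minimum`."""
    def apply(rec: Record) -> Optional[Record]:
        return rec if rec.value >= minimum else None
    apply.__name__ = f"threshold({minimum})"
    return apply


def scale(factor: float) -> Filter:
    """Multiply each value by `factor`, preserving the provenance so far."""
    def apply(rec: Record) -> Optional[Record]:
        return Record(rec.source, rec.value * factor, list(rec.provenance))
    apply.__name__ = f"scale({factor})"
    return apply


def run_filters(stream: Iterable[Record], filters: List[Filter]) -> Iterator[Record]:
    """Apply the chosen filters in order, logging each one as provenance."""
    for rec in stream:
        # Work on a copy so the caller's records are never mutated.
        current: Optional[Record] = Record(rec.source, rec.value, list(rec.provenance))
        for f in filters:
            current = f(current)
            if current is None:
                break  # dropped by this filter
            current.provenance.append(f.__name__)
        else:
            yield current


# Hypothetical readings; the filter list is chosen at run time ("on demand").
readings = [Record("telescope-1", 0.25), Record("telescope-1", 1.5), Record("sensor-7", 0.5)]
results = list(run_filters(readings, [threshold(0.5), scale(2.0)]))
for rec in results:
    print(rec.source, rec.value, rec.provenance)
```

Because the filter list is ordinary data, a portal or workflow engine could swap filters per request, and each surviving record carries the list of steps that produced it.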

Internet Scale Distributed Services

• Grids use Internet technology to manage sets of network connected resources
  • Classic Web: independent one-to-one access to individual resources
  • Grids integrate together and manage multiple Internet-connected resources: people, sensors, computers, data systems
• Grids are built on top of commodity web service technology with broad industry support
• Organization can be explicit as in
  • TeraGrid, which federates many supercomputers
  • CrisisGrid, which federates first responders, commanders, sensors, GIS, (Tsunami) simulations, science/public data
• Organization can be implicit, such as curated databases and simulation resources that “harmonize a community”

The Architecture of Gateway Grids

[Figure: layered architecture diagram. Layers and components shown:]

• The User’s Desktop
• Grid Portal Server
• Gateway Services: Proxy Certificate Server / vault, Application Events, Resource Broker, User Metadata Catalog, Replica Mgmt, Application Workflow, App. Resource catalogs, Application Deployment
• Core Grid Services (OGSA-like Layer): Execution Management, Information Services, Self Management, Data Services, Resource Management, Security Services
• Physical Resource Layer

Let’s look at a few real examples

(about a dozen … many more exist!)

BIRN – Biomedical Information

Mesoscale Meteorology

NSF LEAD project
- making the tools that are needed to make accurate predictions of tornados and hurricanes.
- Data exploration and Grid workflow

Workflow in the LEAD Grid

[Figure: Katrina output]

Renci Bio Gateway

Providing access to biotechnology tools running on a back-end Grid.

- leverage state-wide investment in bioinformatics
- undergraduate & graduate education, faculty research
- another portal soon: national evolutionary synthesis center

X-Ray Crystallography

SERVOGrid

SERVOGrid Requirements

• Seamless access to data repositories and large scale computers
• Integration of multiple data sources including sensors, databases, file systems with analysis system
  • Including filtered OGSA-DAI (Grid database access)
• Rich meta-data generation and access with SERVOGrid specific Schema extending openGIS (Geography as a Web service) standards and using Semantic Grid
• Portals with component model for user interfaces and web control of all capabilities
• Collaboration to support world-wide work
• Basic Grid tools: workflow and notification
• NOT metacomputing

[Figure: Grid of Grids: Research Grid and Education Grid. SERVOGrid elements shown: sensors, streaming data and field trip data; data filter services; repositories and federated databases; discovery services; research simulations; analysis and visualization portal; customization services leading from research to education; Education Grid computer farm. Component grids: GIS Grid, Sensor Grid, Database Grid, Compute Grid]

Integrating Archived Web Feature Services and Google Maps

Google Maps can be integrated with Web Feature Service archives to filter and browse seismic records.
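
As a rough illustration of the filtering step, a map client could ask a Web Feature Service for only the records inside the visible map area. The sketch below builds a WFS 1.1.0 `GetFeature` key-value request with a bounding box. The endpoint `https://example.org/wfs` and the feature type `seismic:events` are hypothetical placeholders, not services named in the talk, and the axis order of the bounding box depends on the server and coordinate reference system.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wfs_getfeature_url(base_url, type_name, bbox, max_features=100):
    """Build a WFS 1.1.0 GetFeature request restricted to a bounding box."""
    min_x, min_y, max_x, max_y = bbox
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,            # hypothetical feature type
        "bbox": f"{min_x},{min_y},{max_x},{max_y}",
        "maxFeatures": str(max_features),  # cap the response size
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and feature type; the box roughly covers California.
url = wfs_getfeature_url(
    "https://example.org/wfs",
    "seismic:events",
    (-125.0, 32.0, -114.0, 42.0),
)
print(url)

# Round-trip check: the query string decodes back to the original parameters.
query = parse_qs(urlsplit(url).query)
print(query["bbox"])
```

The features returned for the box could then be drawn as markers on the map; that rendering step is outside this sketch.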

MyGrid - Bioinformatics

The Williams Workflows

A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
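
A workflow engine has to run steps like these in an order that respects their dependencies. The sketch below shows one generic way to derive such an order with Python's standard `graphlib` module. The chain A → B → C is an assumption made for illustration; the slide names the three workflows but does not state how they depend on each other.

```python
from graphlib import TopologicalSorter

# Step labels as given on the slide.
steps = {
    "A": "Identification of overlapping sequence",
    "B": "Characterisation of nucleotide sequence",
    "C": "Characterisation of protein sequence",
}

# Each step maps to the set of steps it depends on.
# ASSUMPTION: B consumes A's output and C consumes B's output.
depends_on = {"A": set(), "B": {"A"}, "C": {"B"}}


def execution_order(dependencies):
    """Return the steps in an order where every dependency runs first."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(depends_on)
for name in order:
    print(name, "-", steps[name])
```

If the dependencies contained a cycle, `static_order()` would raise `graphlib.CycleError` instead of returning an order.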

[Figure: layered Grid architecture diagram. Elements shown:]

• Physical Network
• Core Low Level Grid Services: Discovery, Metadata, Data Access/Storage, Security, Workflow, Messaging, Management, Policy
• Information/Knowledge, Instrument/Sensor, Compute/Supercomputer
• MIS
• CIS: HTS Tools, Quantum Calculations
• BIS: Sequencing Tools, Biocomplexity Simulations
• Application Services
• Domain Specific Grids/Services: BioInformatics Grid, Chemical Informatics Grid, …
• Portals, Collaboration, Services

M(B,C)IS is Molecular (Bio, Chem) Information System supporting specific metadata (CML, CellML, SBML) and physical representations

Comments on Grid Components

• Support GT4 and WS-I+(+); support Java and .NET
• Portals – all services will have a portlet interface
• Compute Grid -- This is some sort of Condor Grid (as used by Cambridge)
• Supercomputer Grid -- (extended) TeraGrid
• Workflow, Metadata, Information Management – learn from Taverna, link with BPEL style workflow, link with other Semantic Grid/metadata services
• Instruments – learn from CIMA/Reciprocal Net, compare with Sensors in LEAD/SERVOGrid
• MIS/CIS – See if idea sensible – in any case need CML, LSID, Molecular visualization
• Application Services – Need a wizard. Support “filters” (Wild) and loosely coupled simulations (Baik)
• Data – Link to PubChem and Bioinformatics – link to Baik database
• Discovery – Extended UDDI
• Security – review any special requirements and status of PubChem, caBIG, myGrid etc.
• Collaboration, Management, Messaging, Policy -- nothing special needed
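
Of the items listed, LSID (Life Science Identifier) has a simple enough shape to sketch. An LSID is a URN of the form `urn:lsid:authority:namespace:object`, with an optional trailing revision. The parser below is a minimal illustration, assuming only that colon-separated layout; the `LSID` tuple and `parse_lsid` function are hypothetical names, and the sample identifier is made up.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parsed parts of a Life Science Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    # The 'urn' and 'lsid' prefixes are matched case-insensitively.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    # Treat a missing or empty revision as "no revision given".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# Made-up identifier for illustration only.
benzene = parse_lsid("urn:lsid:example.org:molecules:benzene:2")
print(benzene)
```

A molecular information service could use the authority and namespace parts to decide which resolver or database to query for the object.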