Page 1: Copyright Discovery Net Imperial College 2001-2004 SARS Analysis on the Grid Discovery Net in Bioinformatics

SARS Analysis on the Grid

Discovery Net in Bioinformatics

Page 2:

Overview

• Introduction to Discovery Net

• SARS project

• Demo

• Conclusion

Page 4:

Structure of Discovery Net

[Diagram: layers of the Discovery Net architecture]

• Workflow Execution: a compositional Grid

• Workflow Management: collaborative knowledge management

• Workflow Deployment: Grid service and portal

• Workflow Warehousing

• Resource Mapping

• Service Abstraction

• Workflow Authoring: composing services

• Component Design/Integration

• Resources shown: Condor-G, Native MPI, OGSA service, Web Service, Unicore, Oracle 10g, Web Wrapper, Sun Grid Engine

Page 5:

Component model

• Components
– Nodes
– Basic units of composition
– Contain compositional, integrity and execution logic

• Component frameworks
– Groups of related nodes (sequence alignment)
– Common object model (inputs/outputs are typed)

• Component architectures
– Grouping of related frameworks (bioinformatics)

Page 6:

Three levels of a component

• Connectivity:
– What are my inputs?
– What are my outputs?

• Metadata:
– What are my logical constraints?
– How do I verify myself?
– What will I produce?

• Execution:
– What do I actually do?

[Diagram: inputs and outputs at each level]

• Input types → Output types

• Input metadata → Result metadata

• Input data → Result
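The three levels can be illustrated with a small Python sketch. It is not the Discovery Net SDK: the `Node` class, its field names and the `gc_content` example are hypothetical, chosen only to show connectivity (typed ports), metadata (a check that runs without executing) and execution (the actual computation) as separate concerns.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Node:
    """Hypothetical component; not the real Discovery Net API."""
    name: str
    input_types: Dict[str, type]    # connectivity: what are my inputs?
    output_types: Dict[str, type]   # connectivity: what are my outputs?
    constraint: Callable[[Dict[str, Any]], bool]     # metadata: logical constraints
    run: Callable[[Dict[str, Any]], Dict[str, Any]]  # execution: what do I do?

    def verify(self, inputs: Dict[str, Any]) -> bool:
        """Metadata level: check port types and constraints without executing."""
        for port, expected in self.input_types.items():
            if port not in inputs or not isinstance(inputs[port], expected):
                return False
        return self.constraint(inputs)

    def execute(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Execution level: run only after verification, then type-check the result."""
        if not self.verify(inputs):
            raise ValueError(f"{self.name}: inputs failed verification")
        result = self.run(inputs)
        for port, expected in self.output_types.items():
            if not isinstance(result.get(port), expected):
                raise TypeError(f"{self.name}: output '{port}' has the wrong type")
        return result


# Toy node: fraction of G/C bases in a DNA string.
gc_content = Node(
    name="gc_content",
    input_types={"sequence": str},
    output_types={"gc_fraction": float},
    constraint=lambda inp: len(inp["sequence"]) > 0
    and set(inp["sequence"]) <= set("ACGT"),
    run=lambda inp: {
        "gc_fraction": sum(b in "GC" for b in inp["sequence"]) / len(inp["sequence"])
    },
)

print(gc_content.execute({"sequence": "ACGT"}))
```

Keeping `verify` separate from `execute` means a workflow editor could reject an ill-typed connection before any computation is scheduled.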

Page 7:

Construction of a component

• Through the Software Development Kit (SDK), for new algorithms

• Using template nodes for web services and command-line tools

• With specialised IDEs to produce customised components

• Idea is to remove the complexity of component construction as far as possible from the user
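As a rough illustration of the template-node idea for command-line tools, the sketch below wraps an arbitrary command in a callable. The helper `command_line_node` and its argument-template convention are assumptions made for this example, not the actual template mechanism; the wrapped "tool" is just the Python interpreter reversing a string, so the example runs anywhere.

```python
import subprocess
import sys
from typing import Callable, List


def command_line_node(argv_template: List[str]) -> Callable[..., str]:
    """Return a callable that fills `{name}` placeholders and runs the command.

    Hypothetical helper: the placeholder convention is an assumption.
    """
    def node(**params: str) -> str:
        argv = [part.format(**params) for part in argv_template]
        completed = subprocess.run(argv, capture_output=True, text=True, check=True)
        return completed.stdout
    return node


# Stand-in tool: the Python interpreter reversing its first argument.
reverse = command_line_node(
    [sys.executable, "-c", "import sys; print(sys.argv[1][::-1])", "{sequence}"]
)

print(reverse(sequence="ATGC").strip())
```

The user of `reverse` sees only a named parameter, which is the point of the template: the command line, its flags and its output handling stay inside the wrapper.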

Page 8:

Workflow Warehousing and Provenance

• Workflows/Services record their history:

• Discovery Net records the full authoring information

• Users may annotate workflows

• All information stored in DPML

• Shared IP for a virtual Organization

• Users can browse for services based on properties

• Users can browse for existing workflows and workflow templates

• Users can see full project history for each service

Page 9:

Publishing of workflows

• Parameterisation of a workflow

• Defining the black box that is offered to the end-user

• Once deployed, workflow is accessible as:
– Web service
– Grid service
– Command line tool
– Web page

• Workflows combined in personalised portals
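Parameterisation and the "black box" can be sketched as a wrapper that fixes most workflow settings and exposes only a chosen few. The `publish` function, the `count_motif` workflow and the parameter names are hypothetical; the real deployment mechanism (web service, Grid service, portal page) is not modelled here.

```python
from typing import Any, Callable, Dict, Iterable


def publish(workflow: Callable[..., Any],
            defaults: Dict[str, Any],
            exposed: Iterable[str]) -> Callable[..., Any]:
    """Wrap `workflow` so end users can set only the parameters in `exposed`."""
    exposed_set = frozenset(exposed)
    unknown = exposed_set - set(defaults)
    if unknown:
        raise ValueError(f"exposed parameters without defaults: {sorted(unknown)}")

    def black_box(data: Any, **params: Any) -> Any:
        hidden = set(params) - exposed_set
        if hidden:
            raise TypeError(f"parameters not exposed: {sorted(hidden)}")
        # User-supplied values override the designer's defaults.
        return workflow(data, **{**defaults, **params})

    return black_box


def count_motif(sequence: str, motif: str, overlapping: bool) -> int:
    """Toy workflow: count occurrences of `motif` in `sequence`."""
    step = 1 if overlapping else len(motif)
    count, i = 0, 0
    while True:
        i = sequence.find(motif, i)
        if i < 0:
            return count
        count += 1
        i += step


# The analysis designer exposes `motif` and keeps `overlapping` fixed.
motif_counter = publish(
    count_motif,
    defaults={"motif": "ATG", "overlapping": True},
    exposed=["motif"],
)

print(motif_counter("ATGATGCC"))
print(motif_counter("AAAA", motif="AA"))
```

An end user calling `motif_counter("AAAA", overlapping=False)` gets a `TypeError`, because that setting was left inside the box.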

Page 10:

Discovery Net users

• Component developers
– IT-literate to an extent

• Analysis designers
– Domain experts with understanding of the research problem

• End users
– Scientists with no interest in IT and coding/assembling their software

• Line does get blurry!

Page 11:

Discovery Net Application Examples

• Environmental Modelling
– High throughput dispersed air sensing technology

• Life Sciences
– High throughput genomics and proteomics

• Real time geo-hazard modelling
– Earthquake modelling through satellite imagery

• GM Crop trial studies
– Simulating the effects of GM crops on the surrounding ecosystem

Page 12:

Overview

• Introduction to Discovery Net

• SARS project

• Demo

• Conclusion

Page 13:

SARS Basic Facts

• First appeared in November 2002 in Guangdong province, China

• SARS Coronavirus (SARS-CoV) identified as the cause

• China started a major research initiative to investigate the biology of the virus and predict its behaviour

Page 14:

SARS project

• Collaboration between Discovery Net and SCBIT (Shanghai Center for Bioinformation Technology)

• Annotation of SARS genomes obtained from different patient samples

• Analysis of mutation patterns of SARS virus

• Discovery Net providing the IT platform to organize the analysis

Page 15:

Work done

• Data
– Research performed on 33 samples of the SARS virus, sequenced from Chinese patients
– Combined with publicly available data from NCBI

• Goal
– Deeper understanding of the mutation patterns of the SARS virus

• Analysis
– Examining the variability of the virus at both the genomic and proteomic level
– Providing full insight into the significance of changes in the nucleic structure of the virus

Page 16:

Genomic analysis

Alignment - data intensive, performed on the Grid

Retrieval of publicly available knowledge

Examining the variations in different strains
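The last step, examining variation across strains, can be illustrated on sequences that are already aligned. The sketch below is not the project's algorithm: the function `variable_sites` and the three toy sequences are made up for illustration, and the data-intensive alignment step itself is assumed to have been done beforehand.

```python
from collections import Counter
from typing import Dict, List, Tuple


def variable_sites(alignment: Dict[str, str]) -> List[Tuple[int, Dict[str, int]]]:
    """Return (position, base counts) for every column where strains differ.

    `alignment` maps a strain name to its aligned sequence; all sequences
    must have the same length, with '-' marking gaps.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    sites = []
    for pos in range(lengths.pop()):
        column = Counter(seq[pos] for seq in alignment.values())
        if len(column) > 1:  # more than one distinct base at this position
            sites.append((pos, dict(column)))
    return sites


# Made-up toy sequences, not real SARS-CoV data.
toy_alignment = {
    "strain_A": "ATGCCGTA",
    "strain_B": "ATGCTGTA",
    "strain_C": "ATGCCGAA",
}

for position, counts in variable_sites(toy_alignment):
    print(position, counts)
```

Positions are zero-based offsets into the alignment, so they would need mapping back to genome coordinates before being compared with public annotations.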

Page 17:

Phylogenetic view

SARS Genome taken from Hong Kong Patients

SARS Genome taken from Beijing Patients

SARS Genome taken from Singapore Patients

Page 18:

Proteomic analysis

Isolating interesting genomic regions

Identifying relevant protein sequences

Observing the variations in the resulting protein
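These three steps can be sketched as: slice a genomic region, translate it with the standard genetic code, and compare the resulting proteins. The function names, the region coordinates and the toy sequences are assumptions for illustration; the sketch ignores reading-frame detection, reverse strands and gaps, which a real pipeline would have to handle.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 0; unknown codons become 'X', '*' is a stop."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def protein_changes(reference: str, variant: str,
                    start: int, end: int) -> List[Tuple[int, str, str]]:
    """Compare the proteins encoded by region [start, end) of two genomes.

    Returns (residue index, reference residue, variant residue) per difference.
    """
    ref_protein = translate(reference[start:end])
    var_protein = translate(variant[start:end])
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(ref_protein, var_protein))
            if a != b]


# Made-up toy sequences, not real SARS-CoV data.
reference = "ATGGCTGAA"
silent_variant = "ATGGCCGAA"    # GCT -> GCC, both alanine
missense_variant = "ATGGCTGTA"  # GAA -> GTA, glutamate to valine

print(protein_changes(reference, silent_variant, 0, 9))
print(protein_changes(reference, missense_variant, 0, 9))
```

The silent variant shows why the proteomic view matters: a nucleotide change that leaves the protein unchanged appears in the genomic comparison but not here.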

Page 19:

Proteomic annotation

• Parallel annotation with multiple sequence analysis tools

• Framework first used in Supercomputing 2002
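Parallel annotation with several tools can be sketched with a thread pool that sends the same sequence to every tool and gathers the results by tool name. This is only an illustration: `annotate_in_parallel` and the three stand-in "tools" are hypothetical, whereas the real framework dispatches to external sequence analysis programs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def annotate_in_parallel(sequence: str,
                         tools: Dict[str, Callable[[str], Any]]) -> Dict[str, Any]:
    """Run every tool on the same sequence concurrently; collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(tools))) as pool:
        futures = {name: pool.submit(tool, sequence) for name, tool in tools.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-in annotators; real ones would wrap external analysis tools.
HYDROPHOBIC = set("AVILMFWY")
toy_tools = {
    "length": len,
    "hydrophobic_count": lambda seq: sum(res in HYDROPHOBIC for res in seq),
    "starts_with_met": lambda seq: seq.startswith("M"),
}

print(annotate_in_parallel("MKVLAA", toy_tools))
```

Threads suit this case because the tools are expected to spend their time waiting on external processes or remote services rather than on Python computation.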

Page 20:

Annotation editor

Page 21:

SCBIT Analysis Portal

Page 22:

Overview

• Introduction to Discovery Net

• SARS project

• Demo

• Conclusion

Page 23:

Next step

• Portal technology used to build thematic portals concentrating on particular research areas

• Goal: to construct a number of public portals for the needs of the UK eScience community and make them accessible to all

Page 24:

Overview

• Introduction to Discovery Net

• SARS project

• Demo

• Conclusion

Page 25:

Discovery Net Advantages…

• Rapid component integration through SDK or generic connectors:
– Grid services
– Web services
– Command-line tools etc.

• Intuitive research assembly and management
– Graphical workflow assembly

• Provenance of analysis
– Within the server warehouse

• Personalised end-user environments
– Discovery Portal

Page 26:

… applied to SCBIT research

• Integrated
– Existing tools (EMBOSS, alignment apps)
– In-house data stores (with SARS sequence data)
– Original algorithms for mining variation info

• Workflows assembled by the whole research group

• Research history tracked through the project change information

• SCBIT Portal creating a common platform for multidisciplinary users

Page 27:

Summary

• IT platform supporting urgent discovery research

• Access to data within a scalable knowledge creation infrastructure

• Exploitation and annotation of biological information using multiple sources, data types and locations

• Integration of external applications within a unified environment

• Sharing of methods, results and data views across the Virtual Organisation

Page 28:

Credits and further info

• Discovery Net team, especially Moustafa Ghanem, Jameel Syed and Stuart Hassard

• http://www.discovery-on-the.net

• Exhibiting at EPSRC and LESC stands

• Demo today at 13:15 – 14:45 at EPSRC stand