myGrid and the Semantic Web, Phillip Lord, School of Computer Science, University of Manchester


myGrid and the Semantic Web

Phillip Lord

School of Computer Science

University of Manchester


myGrid: eScience and Bioinformatics

• Oct 2001 – April 2005.

• £3.4 million.

• UK e-Science Pilot Project.

• £0.4 million studentships.

Newcastle

Nottingham

Manchester

Southampton

Hinxton

Sheffield


Data (Type) Intensive Bioinformatics

ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
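The record above is typed only by its two-letter line codes (ID, DE, GN, OS, OC, KW, FT, SQ); everything after the code is free text. As a minimal sketch, assuming the fixed layout shown (code in columns 1 and 2, data from column 6), a client can recover that much structure like this. The parse_swissprot name is hypothetical; this is not part of myGrid and not a complete Swiss-Prot parser (feature-table columns, for instance, stay as raw strings).

```python
from collections import defaultdict

def parse_swissprot(record: str) -> dict:
    """Group the lines of one Swiss-Prot style flat-file entry by line code.

    Assumes the layout on the slide: a two-letter code, three spaces, then data.
    Sequence lines have a blank code; their residues are collected under the
    key "  " with the 10-residue grouping spaces removed.
    """
    fields = defaultdict(list)
    for line in record.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        code, rest = line[:2], line[5:].rstrip()
        if code == "  ":
            rest = rest.replace(" ", "")  # sequence line: drop grouping spaces
        fields[code].append(rest)
    return dict(fields)

# An excerpt of the record above (only two of its sequence lines).
SAMPLE = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
"""

fields = parse_swissprot(SAMPLE)
print(fields["ID"][0].split()[0])   # MURA_BACSU
print(len("".join(fields["  "])))   # 120 residues in this two-line excerpt
```

Even with the line codes recovered, the DE, OC and KW values remain strings whose meaning a program has to be told separately.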


[Figure: the myGrid architecture. Side labels: Bioinformaticians, Tool Providers, Service Providers; Applications, Core services, External services. Components shown: Workbench (Taverna, Talisman), Web Portal, Views, Legacy apps, Gateway, Provenance, Personalisation, Event Notification, Service and Workflow Discovery, myGrid Information Repository, Ontology Mgt, Metadata Mgt, FreeFluo Workflow Enactment Engine, OGSA-DQP Distributed Query Processor, AMBIT Text Extraction Service, Native Web Services, SoapLab, GowLab, Registries, Ontologies; all connected by a Web Service (Grid Service) communication fabric.]


Support not Automation


Thin Semantics

PRETTYSEQ of CDS1|>CDS2|strand_1 from 1 to 129

         ---------|---------|---------|---------|---------|---------|
       1 atgacggacactgctggtcgctgtggcttcctcctacgcgttcggtcactcctgcacatg 60
       1 M  T  D  T  A  G  R  C  G  F  L  L  R  V  R  S  L  L  H  M   20
         ---------|---------|---------|---------|---------|---------|
      61 tccgcagtagtggtgctctcggggaccccctcgccaccccacaataccgctcaccacatg 120
      21 S  A  V  V  V  L  S  G  T  P  S  P  P  H  N  T  A  H  H  M   40
         ---------
     121 gccaaacag 129
      41 A  K  Q   43

CPGREPORT of CDS1|>CDS2|strand_1 from 1 to 129

Sequence             Begin   End  Score  CpG   %CG  CG/GC
CDS1|>CDS2|strand_1      5   109     58    9  64.8   1.12

########################################
# Program: restrict
# Rundate: Thu Jul 15 16:32:30 2004
# Report_format: table
# Report_file: /scratch/emboss_interfaces/a/unknown/Projects/default/Data/out1089905549241
########################################

  Start     End Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
      4       8 TspGWI      ACGGA                19     17         .         .
      9      15 TspRI       CASTGNN              15      6         .         .
     14      19 BtsI        GCAGTG                8      6         .         .
     25      28 CviJI       RGCY                 26     26         .         .
     30      33 MnlI        CCTC                 40     39         .         .
     36      41 MluI        ACGCGT               36     40         .         .

#---------------------------------------
#---------------------------------------
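These outputs illustrate thin semantics: the restrict report has column headers but no machine-readable types, so any consumer has to know the layout in advance. A minimal sketch of that scraping, assuming whitespace-separated columns exactly as in the restrict table above; RestrictionHit and parse_restrict_table are hypothetical names, and this is not EMBOSS's own report reader.

```python
from dataclasses import dataclass

@dataclass
class RestrictionHit:
    start: int
    end: int
    enzyme: str
    site: str
    cut5: int  # value from the "5prime" column
    cut3: int  # value from the "3prime" column

def parse_restrict_table(report: str) -> list:
    """Turn the rows of a restrict 'table' report into typed records.

    Assumes one hit per line with whitespace-separated columns; comment
    lines (starting with '#'), blank lines and the header row are skipped.
    The 5primerev/3primerev columns ('.' here) are ignored.
    """
    hits = []
    for line in report.splitlines():
        cols = line.split()
        if not cols or cols[0].startswith("#") or cols[0] == "Start":
            continue
        start, end, enzyme, site, cut5, cut3 = cols[:6]
        hits.append(RestrictionHit(int(start), int(end), enzyme, site,
                                   int(cut5), int(cut3)))
    return hits

REPORT = """\
########################################
# Program: restrict
# Report_format: table
########################################

  Start     End Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
      4       8 TspGWI      ACGGA                19     17         .         .
      9      15 TspRI       CASTGNN              15      6         .         .
     14      19 BtsI        GCAGTG                8      6         .         .
     25      28 CviJI       RGCY                 26     26         .         .
     30      33 MnlI        CCTC                 40     39         .         .
     36      41 MluI        ACGCGT               36     40         .         .

#---------------------------------------
#---------------------------------------
"""

for hit in parse_restrict_table(REPORT):
    print(hit.enzyme, hit.start, hit.end)
```

Any change to the column order or header wording silently breaks a parser like this, which is the practical cost of thin semantics.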


Semantic Discovery with Feta

Query ontology: discovering workflows and services described in the registry by building a query in Taverna.

A common ontology is used both to annotate and to query. (Planned for OBO release.)


Knowledge in Feta

Ontology (OWL-DL)

Service Descriptions (XML)

Jena Querying (RDF)
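Feta answers a discovery query by combining service descriptions with the ontology, so that asking for services that consume a general class also finds services annotated with a more specific one. Feta itself used an OWL-DL ontology and Jena; the sketch below swaps in plain Python dictionaries and a hand-rolled subclass closure, which is far weaker than DL reasoning. All class names, service names and the consumes predicate are hypothetical.

```python
from collections import defaultdict

# Hypothetical ontology fragment: child class -> parent class.
SUBCLASS_OF = {
    "ProteinSequence": "BiologicalSequence",
    "NucleotideSequence": "BiologicalSequence",
    "DNASequence": "NucleotideSequence",
}

# Hypothetical service descriptions flattened to (subject, predicate, object).
TRIPLES = [
    ("blastp_service", "consumes", "ProteinSequence"),
    ("restrict_service", "consumes", "DNASequence"),
    ("cpgreport_service", "consumes", "NucleotideSequence"),
]

def subclasses(cls: str) -> set:
    """Return cls plus every class that is transitively a subclass of it."""
    children = defaultdict(set)
    for child, parent in SUBCLASS_OF.items():
        children[parent].add(child)
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children[current])
    return result

def find_services(predicate: str, cls: str) -> list:
    """Services whose description links `predicate` to `cls` or a subclass."""
    wanted = subclasses(cls)
    return sorted(s for s, p, o in TRIPLES if p == predicate and o in wanted)

print(find_services("consumes", "NucleotideSequence"))
# ['cpgreport_service', 'restrict_service']
```

A query for BiologicalSequence would return all three services, even though none is annotated with that class directly.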


Service Discovery

GOOD:

RDF provides a convenient search capability, with a well-defined link to an ontology.

BAD:

We are unsure about scalability. Issues of security and concurrency will probably also affect us.


Provenance

• Bioinformatics has a data circularity problem.

• Computational data is hard to trace, reproduce or repeat.

• We need to store provenance.

• A Service-Oriented Architecture and service descriptions start to enable us to do this.


Provenance: The Semantic Web


Generating Provenance

[Figure: generating provenance. Components shown: Web Services, Taverna, FreeFluo, Metadata Repository (reified), Data Repository, LaunchPad Haystack.]


[Figure: the myGrid provenance model. Entities: Organisation, Person, Project, Experiment design, Workflow design, Workflow run, Process, Service, Event, Data item. Relations: data derivation (e.g. output data derived from input data), instanceOf, partOf, componentProcess (e.g. web service invocation of BLAST @ NCBI), componentEvent (e.g. completion of a web service invocation at 12.04pm), runBy (e.g. BLAST @ NCBI), run for. Two levels: organisation level provenance and process level provenance.]

Users can add templates to each workflow process to determine the links between data items.
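Following data derivation links transitively answers the question "where did this result come from?", which is the tracing problem raised earlier. A minimal sketch, assuming provenance is held as (subject, predicate, object) triples and that data derivation is recorded with a hypothetical derivedFrom predicate; the item names are illustrative, not myGrid's actual vocabulary. Triples with other predicates, such as componentProcess, are ignored by the traversal.

```python
from collections import defaultdict, deque

# Hypothetical provenance triples from one workflow run.
PROVENANCE = [
    ("blast_report", "derivedFrom", "query_sequence"),
    ("query_sequence", "derivedFrom", "swissprot_entry"),
    ("alignment", "derivedFrom", "query_sequence"),
    ("blast_report", "componentProcess", "blast_invocation"),
]

def lineage(item: str, triples) -> list:
    """All data items that `item` was derived from, nearest first (breadth-first)."""
    derived_from = defaultdict(list)
    for s, p, o in triples:
        if p == "derivedFrom":  # only data derivation links are followed
            derived_from[s].append(o)
    seen, order, queue = {item}, [], deque([item])
    while queue:
        current = queue.popleft()
        for source in derived_from[current]:
            if source not in seen:  # guard against cycles and shared ancestors
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order

print(lineage("blast_report", PROVENANCE))
# ['query_sequence', 'swissprot_entry']
```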


Provenance

GOOD:

RDF provides a convenient data model, which is flexible and adaptable.

BAD:

Visualisation tools are lacking. Scalability is even more of an issue with reification.
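One reason reification worsens scalability: to say something about a triple (who recorded it, and when), standard RDF reification describes that triple with four further triples using rdf:Statement, rdf:subject, rdf:predicate and rdf:object. A minimal sketch, assuming that standard vocabulary and a hypothetical ex:recordedBy annotation; the myGrid metadata repository may have organised its statements differently.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(statement_id, subject, predicate, obj):
    """Describe one triple with the four standard RDF reification triples."""
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]

def reified_graph(triples, annotate):
    """Keep each plain assertion, add its reification and its annotations."""
    graph = []
    for i, (s, p, o) in enumerate(triples):
        stmt = f"_:stmt{i}"                 # blank node naming the statement
        graph.append((s, p, o))             # the assertion itself
        graph.extend(reify(stmt, s, p, o))  # four triples describing it
        graph.extend(annotate(stmt))        # e.g. who recorded it
    return graph

facts = [
    ("blast_report", "ex:derivedFrom", "query_sequence"),
    ("query_sequence", "ex:derivedFrom", "swissprot_entry"),
]
graph = reified_graph(facts, lambda stmt: [(stmt, "ex:recordedBy", "FreeFluo")])
print(len(facts), len(graph))  # 2 12
```

With the plain assertion kept and one annotation per statement, each fact costs six triples instead of one.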


LSIDs

• Standard identifier mechanism, aimed at the life sciences.

• Has a standard resolution mechanism by which the data can be obtained.

• Has semantics for versioning.

• Has a standard association with metadata.

• Abbreviation distressingly similar to LSD.
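The LSID specification writes identifiers as URNs of the form urn:lsid:authority:namespace:object, with an optional fifth part giving the revision, which is where the versioning semantics live; the authority part is what a client resolves to find the data and metadata services. A minimal sketch of splitting an LSID into those parts, assuming no part contains a colon; the example identifier is hypothetical, and the resolution protocol itself is not attempted.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when no revision is given

def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    # The 'urn:lsid' prefix is compared case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

print(parse_lsid("urn:lsid:example.org:proteins:MURA_BACSU:1"))
# LSID(authority='example.org', namespace='proteins', object_id='MURA_BACSU', revision='1')
```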


Provenance

• We use LSIDs within provenance; all of our data is stored and resolved with LSIDs.

• The notion of a single identifier system within myGrid is attractive.


Worries

• We are unclear how the metadata/data split works with LSIDs: should we use the former for mutable content and the latter for immutable content?

• We have also tended towards using “metadata” for RDF-based data, and “data” for relational data.
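One reading of the split raised above, assuming the data returned for an LSID must never change while its metadata may: data is write-once per identifier, and changing it means minting a new revision, whereas metadata can be updated freely. The LsidStore class is hypothetical and is not the LSID API; it only shows the two policies side by side.

```python
class LsidStore:
    """Hypothetical store: data is write-once per LSID, metadata is mutable."""

    def __init__(self):
        self._data = {}      # LSID -> bytes, never overwritten
        self._metadata = {}  # LSID -> {property: value}, freely updated

    def put_data(self, lsid: str, payload: bytes) -> None:
        existing = self._data.get(lsid)
        if existing is not None and existing != payload:
            raise ValueError(f"data for {lsid} is immutable; mint a new revision")
        self._data[lsid] = payload  # re-putting identical bytes is allowed

    def get_data(self, lsid: str) -> bytes:
        return self._data[lsid]

    def set_metadata(self, lsid: str, key: str, value: str) -> None:
        self._metadata.setdefault(lsid, {})[key] = value

    def get_metadata(self, lsid: str) -> dict:
        return dict(self._metadata.get(lsid, {}))  # copy, so callers cannot mutate

store = LsidStore()
key = "urn:lsid:example.org:runs:42"
store.put_data(key, b"ACGT")
store.set_metadata(key, "status", "draft")
store.set_metadata(key, "status", "reviewed")  # metadata may change
print(store.get_metadata(key))  # {'status': 'reviewed'}
```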


LSID

GOOD:

Defined resolution mechanism, data and metadata.

BAD:

Unclear how to use the data/metadata split.


Acknowledgements

Core

• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.

Users

• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK

• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK

Postgraduates

• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire

Industrial

• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)

• Robin McEntire (GSK)

Collaborators

• Keith Decker


Summary

GOOD:

RDF provides a convenient search capability, with a well-defined link to an ontology.

RDF provides a convenient data model, which is flexible and adaptable.

LSID: Defined resolution mechanism, data and metadata.

BAD:

We are unsure about scalability. Issues of security and concurrency will probably also affect us.

Visualisation tools are lacking. Scalability is even more of an issue with reification.

LSID: Unclear how to use the data/metadata split.