

Exploring Williams-Beuren Syndrome using myGrid

R.D. Stevens (a), H.J. Tipney (b), C.J. Wroe (a), T.M. Oinn (c), M. Senger (c), P.W. Lord (a), C.A. Goble (a), A. Brass (a), M. Tassabehji (b)

(a) Department of Computer Science, University of Manchester

(b) Academic Unit of Medical Genetics, St Mary’s Hospital, University of Manchester

(c) European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton


Williams-Beuren Syndrome (WBS)

• Congenital disorder caused by sporadic gene deletion
• 1/20,000 live births
• Affects multiple systems: muscular, nervous, circulatory
• Characteristic facial features
• Unique cognitive profile
• Mental retardation (IQ 40-100, mean ~60; ‘normal’ mean ~100)
• Outgoing personality, friendly nature, ‘charming’
• Haploinsufficiency of the region results in the phenotype


Williams-Beuren Syndrome Microdeletion

[Figure: physical map of the Williams-Beuren syndrome region. Only the labels survive in this transcript.]

• Chr 7 (~155 Mb); microdeletion of ~1.5 Mb at 7q11.23
• Genes labelled: GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/EIF4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2, WBSCR14
• Duplicon blocks (centromeric, medial and telomeric copies): A-cen, A-mid, A-tel; B-cen, B-mid, B-tel; C-cen, C-mid, C-tel
• Block A: STAG3, PMS2L
• Block B: GTF2IP, NCF1P, GTF2IRD2P
• Block C: FKBP6T, POM121, NOLR1
• Patient deletions: WBS, SVAS
• Physical map: clones CTA-315H11 and CTB-51J22, with the ‘Gap’ marked

Eichler E, Clark R & She X. An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Reviews Genetics (2004) 5:345-354

Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:157-164


Filling a genomic gap in silico

1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at nucleotide and amino acid level

Done by cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan etc.

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa


Filling a genomic gap in silico

• Frequently repeated: information is rapidly added to public databases
• Time consuming and mundane
• Don’t always get results
• Huge amount of interrelated data is produced, handled in notebooks and files saved to local hard drive
• Much knowledge remains undocumented: the bioinformatician does the analysis

Advantages: specialist human intervention at every step, quick and easy access to distributed services

Disadvantages: labour intensive, time consuming, highly repetitive and error-prone process; tacit procedure, so difficult to share both protocol and results


Why Workflows and Services?

Workflow = general technique for describing and enacting a process
Workflow = describes what you want to do, not how you want to do it
Web Service = how you want to do it
Web Service = automated programmatic internet access to applications

• Automation
  – Capturing processes in an explicit manner
  – Tedium! Computers don’t get bored/distracted/hungry/impatient!
  – Saves repeated time and effort
• Modification, maintenance, substitution and personalisation
• Easy to share, explain, relocate, reuse and build
• Available to wider audience: don’t need to be a coder, just need to know how to do Bioinformatics
• Releases Scientists/Bioinformaticians to do other work
• Record
  – Provenance: what the data is like, where it came from, its quality
  – Management of data (LSID - Life Science Identifiers)
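The split between a workflow (what to do) and services (how to do it), together with provenance recording, can be illustrated with a minimal Python sketch. This is not Scufl or the myGrid implementation. The `Step`, `ProvenanceRecord` and `enact` names are hypothetical, and the two "services" are local stand-in functions rather than real web services.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str                      # label used in the workflow description
    service: Callable[[str], str]  # stand-in for a remote web service call
    source: str                    # name of the step (or "input") that feeds this one


@dataclass
class ProvenanceRecord:
    step: str
    input_from: str
    started: str
    output_length: int


def enact(steps: List[Step], workflow_input: str) -> Tuple[Dict[str, str], List[ProvenanceRecord]]:
    """Run the steps in order and note where each result came from."""
    results = {"input": workflow_input}
    log = []
    for step in steps:
        started = datetime.now(timezone.utc).isoformat()
        results[step.name] = step.service(results[step.source])
        log.append(ProvenanceRecord(step.name, step.source, started, len(results[step.name])))
    return results, log


# Hypothetical local stand-ins for services.
def to_upper(seq: str) -> str:
    return seq.upper()


def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# The workflow itself is only a description: which step feeds which.
WORKFLOW = [
    Step("normalise", to_upper, "input"),
    Step("revcomp", reverse_complement, "normalise"),
]

results, provenance = enact(WORKFLOW, "acgtt")
print(results["revcomp"])
for record in provenance:
    print(record)
```

Swapping a service only means changing the callable attached to a step. The description of the process, and the provenance log it produces, keep the same shape.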


myGrid

• E-Science pilot research project funded by EPSRC (www.mygrid.org.uk)
• Manchester, Newcastle, Sheffield, Southampton, Nottingham, EBI and RFCGR, also industrial partners
• ‘Targeted to develop open source software to support personalised in silico experiments in biology on a grid.’

Which means:

• Distributed computing: machines, tools, databanks, people
• Personalisation
• Provenance and data management
• Enactment and notification

A virtual lab ‘workbench’, a toolkit which serves life science communities.


Workflow Components

• Scufl: Simple Conceptual Unified Flow Language
• Taverna: writing and running workflows, and examining results
• Soaplab: makes applications available
• Freefluo: workflow engine to run workflows

[Diagram: Freefluo calls web services, either a Soaplab web service wrapping any application, or a web service in its own right, e.g. DDBJ BLAST.]


Williams Workflow Plan

[Figure: workflow plan. The labels below are recovered from the diagram; the grouping is a best-effort reading of the flattened figure.]

Input: GenBank Accession No → GenBank Entry → Seqret → Nucleotide seq (Fasta)

Identification of overlapping sequence:

• Query nucleotide sequence → RepeatMasker → BLASTwrapper → Sort for appropriate sequences only

Characterisation of nucleotide sequence:

• GenScan: coding sequence, ORFs
• sixpack: 6 ORFs
• prettyseq: translation/sequence file; good for records and publications
• restrict: restriction enzyme map
• cpgreport: CpG island locations and %
• RepeatMasker: repetitive elements
• ncbiBlastWrapper: blastn vs nr, est databases
• transeq: amino acid translation
• RepeatMasker → TF binding prediction, promoter prediction, regulation element prediction: identify regulatory elements in genomic sequence

Characterisation of protein sequence:

• epestfind: identifies PEST sequences
• pepcoil: predicts coiled-coil regions
• pepstats: MW, length, charge, pI, etc.
• pscan: identifies FingerPRINTS
• SignalP, TargetP, PSORTII: predict cellular location
• InterPro: identifies functional and structural domains/motifs
• Pepwindow? Octanol?: hydrophobic regions
• BlastWrapper: tblastn vs nr, est, est_mouse, est_human databases; blastp vs nr

Also labelled: URL inc GB identifier

Colour key:

• Pink: outputs/inputs of a service
• Purple: tailor-made services
• Green: EMBOSS Soaplab services
• Yellow: Manchester Soaplab services
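To give a flavour of what a nucleotide-level characterisation step reports, here is a minimal Python sketch of a CpG summary: GC percentage and the observed/expected CpG ratio, computed as (number of CpG × length) / (number of C × number of G). It is not the algorithm used by cpgreport, which scores regions along the sequence. The function name `cpg_summary` and the short test sequence are made up for illustration.

```python
def cpg_summary(sequence: str) -> dict:
    """Return GC percentage, CpG count and observed/expected CpG ratio."""
    seq = sequence.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # Count every position where a C is immediately followed by a G.
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_percent": gc_percent, "cpg_count": cpg, "obs_exp": obs_exp}


summary = cpg_summary("cgcgcgatat")  # made-up 10-base example
print(summary)
```

A real workflow step would run this kind of calculation over sliding windows of a long genomic sequence rather than over the whole sequence at once.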


The Williams Workflows

A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence


The Workflow Experience

• Correct and biologically meaningful results
• Automation
  – Saved time, increased productivity
  – Process split into three; you still require humans!
• Sharing
  – Other people have used and want to develop the workflows
• Change of work practices
  – Post hoc analysis: don’t analyse data piece by piece, receive all data at once
  – Data stored and collected in a more standardised manner
  – Results amplification
  – Results management and visualisation

Have workflows delivered on their promise? YES!


The Workflow Experience

• Activation energy versus reusability trade-off
  – Lack of ‘available’ services; levels of redundancy can be limited
  – But once available, services can be reused for the greater good of the community
• Licensing of bioinformatics applications
  – Means they can’t be used outside of the licensing body
  – No licence = access third-party websites
• Instability of external services
  – Research level
  – Reliant on other people’s servers
  – Taverna can retry or substitute before graceful failure
• Shims
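The "retry or substitute before graceful failure" behaviour can be sketched in a few lines of Python. This shows the general pattern only, not Taverna's actual mechanism. The function `call_with_fallback` and the two stand-in services are hypothetical.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallback(
    services: Sequence[Callable[[str], str]],
    payload: str,
    retries: int = 2,
    delay: float = 0.0,
) -> Optional[str]:
    """Try each service in turn, retrying on failure; return None if all fail."""
    for service in services:
        for attempt in range(retries + 1):
            try:
                return service(payload)
            except Exception:
                # Broad catch is deliberate in this sketch: any failure triggers a retry.
                if attempt < retries:
                    time.sleep(delay)
    return None  # graceful failure: the caller decides what to do next


calls = {"primary": 0, "mirror": 0}


def primary_blast(seq: str) -> str:
    """Stand-in for an unreliable external service."""
    calls["primary"] += 1
    raise ConnectionError("primary service unavailable")


def mirror_blast(seq: str) -> str:
    """Stand-in for a substitute service offering the same operation."""
    calls["mirror"] += 1
    return f"hits for {len(seq)} bp query"


outcome = call_with_fallback([primary_blast, mirror_blast], "ACGTACGT", retries=2)
print(outcome, calls)
```

Substitution only works when an equivalent service exists, which is why the limited redundancy of available services noted above matters.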


Shims

shim (shĭm) n. A thin, often tapered piece of material used to fill gaps, make something level, or adjust something to fit properly. v. shimmed, shim·ming, shims. To fill in, level, or adjust by using shims or a shim.

• Explicitly capturing the process
• Unrecorded ‘steps’ which aren’t realised until attempting to build something
• Enable services to fit together
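A typical shim is a format conversion between two services. The Python sketch below turns a GenBank-style flat-file record into FASTA so that a retrieval service can feed an analysis service. It is a deliberately small parser written for illustration, not a replacement for Seqret. The function name `genbank_to_fasta` and the record identifier `DEMO0001` are hypothetical.

```python
def genbank_to_fasta(record: str, width: int = 60) -> str:
    """Extract the accession and sequence from a GenBank-style record."""
    accession = "unknown"
    bases = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Drop the leading position number and the spaces between blocks.
            bases.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(bases).upper()
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{accession}"] + wrapped)


# Made-up record; the 24 bases are the start of the sequence shown earlier.
EXAMPLE = """LOCUS       DEMO0001      24 bp    DNA
ACCESSION   DEMO0001
ORIGIN
        1 acatttctac caacagtgga tgag
//
"""

fasta = genbank_to_fasta(EXAMPLE)
print(fasta)
```

Steps like this are trivial when done by hand with cut and paste, which is why they go unrecorded until the process has to be made explicit.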


Shims

‘I want to identify new sequences which overlap with my query sequence and determine if they are useful’

[Diagram: the steps hidden behind that one sentence. The order below is a best-effort reading of the flattened figure.]

• Input: sequence, i.e. last known 3000bp
• Mask, then BLAST: identify new sequences and determine their degree of identity
• Simplify and Compare: against the old BLAST result
• Lister
• Retrieve: sequence database entry, as Fasta format sequence or Genbank format sequence
• BLAST2: alignment of full query sequence v full ‘new’ sequence
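The ‘Simplify and Compare’ step can be pictured as filtering a fresh BLAST result against the previous one so that only new, sufficiently similar sequences are passed on. The Python sketch below assumes each result has already been simplified to (accession, percent identity) pairs. The function name `new_hits`, the 95% threshold, the clone labels and the identity values are all made up for illustration.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (accession, percent identity)


def new_hits(old_result: List[Hit], new_result: List[Hit], min_identity: float = 95.0) -> List[Hit]:
    """Return hits absent from the old result, best identity first."""
    seen = {accession for accession, _ in old_result}
    fresh = [
        (accession, identity)
        for accession, identity in new_result
        if accession not in seen and identity >= min_identity
    ]
    return sorted(fresh, key=lambda hit: hit[1], reverse=True)


# Made-up simplified BLAST results.
old_blast = [("CLONE_A", 99.1)]
new_blast = [("CLONE_A", 99.1), ("CLONE_B", 98.4), ("CLONE_C", 81.0), ("CLONE_D", 99.7)]

candidates = new_hits(old_blast, new_blast)
print(candidates)
```

Each surviving accession would then go on to the retrieval and pairwise alignment steps.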


The Biological Results

[Figure: map of the extended region. Only the labels survive in this transcript.]

• Clones: CTA-315H11, CTB-51J22, RP11-622P13, RP11-148M21, RP11-731K22
• 314,004bp extension
• All nine known genes identified
• Genes labelled: ELN, WBSCR14, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, WBSCR22, WBSCR24, WBSCR27, WBSCR28
• Four workflow cycles totalling ~10 hours
• The gap was correctly closed and all known features identified


Conclusions

• It works: a new tool has been developed which is being utilised by biologists
• More regularly undertaken, less mundane, less error prone
• Once notification is installed, won’t even need to initiate it
• More systematic collection and analysis of results
• Increased productivity
• Services: only as good as the individual services, lots of them, we don’t own them, many are unique and at a single site, research level software, reliant on other people’s services, licences
• Activation energy


Future Directions

• Scheduling and Notification

• Portals

• Results visualisation

• Re-use: other genomic disorders, Graves’ disease


Acknowledgments

• Dr May Tassabehji

• Prof Andy Brass

• Medical Genetics team at St Mary’s Hospital, Manchester

• Wellcome Trust

www.mygrid.org.uk


myGrid People

www.mygrid.org.uk

Core

Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.

Users

Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK

Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK

Postgraduates

Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire

Industrial

Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)

Robin McEntire (GSK)

Collaborators

Keith Decker