![Page 1: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/1.jpg)
The Seven Deadly Sins of Bioinformatics
Professor Carole Goble
The University of Manchester, UK
The myGrid project http://www.mygrid.org.uk
myExperiment http://myexperiment.org
OMII-UK http://www.omii.ac.uk
![Page 2: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/2.jpg)
We’ve been developing software and ontologies and data and stuff with and
for Bioinformatics and Bioinformaticians for a long time.
“though it took two years before we understood each other!”
Andy Brass,
Professor of Bioinformatics
![Page 3: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/3.jpg)
![Page 4: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/4.jpg)
myGrid Taverna Workflow Workbench
http://www.mygrid.org.uk
http://taverna.sourceforge.net
![Page 5: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/5.jpg)
http://myexperiment.org
![Page 6: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/6.jpg)
The Seven Deadly Sins of Bioinformatics
BOSC 2007
Slideshare.net: 8091 views (02-June-2008)
http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics/
From an original idea by Stevens and Lord
![Page 7: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/7.jpg)
Methodology
• Email a handful of bioinformaticians.
• Stand well back.
• Collect.
• Edit.
• Therapy on the cheap.
• We all felt better.
![Page 8: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/8.jpg)
I am grateful to…
• Phil Lord (University of Newcastle)
• Anil Wipat (University of Newcastle)
• Matthew Pocock (University of Newcastle)
• Robert Stevens (University of Manchester)
• Paul Fisher (University of Manchester)
• Duncan Hull (Manchester Centre for Systems Biology)
• Norman Paton (University of Manchester)
• Marco Roos (University of Amsterdam)
• Rodrigo Lopez (EBI)
• Tom Oinn (EBI)
• Andy Law (Roslin Institute)
• Graham Cameron (EBI)
![Page 9: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/9.jpg)
The Traditional Sins….
1. Lust
2. Gluttony
3. Greed
4. Sloth
5. Wrath
6. Envy
7. Pride
http://en.wikipedia.org/wiki/Seven_deadly_sins
![Page 10: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/10.jpg)
The Sins of Bioinformatics
1. Parochialism and Insularity
2. Exceptionalism
3. Autonomy or death!
4. Vanity: Pride and Narcissism
5. Monolith Megalomania
6. Scientific method Sloth
7. Instant Gratification
![Page 11: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/11.jpg)
Sin 1: Parochialism and Insularity
Parochialism
• “being provincial, being narrow in scope, or considering only small sections of an issue.” http://en.wikipedia.org/wiki/Parochialism
Insularity
• “a person, group of people, or a community that is only concerned with their limited way of life and not at all interested in new ideas or other cultures.” http://en.wikipedia.org/wiki/Insularity
![Page 12: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/12.jpg)
Reinvention of the Wheel
• Rediscovering the same old problems, techniques, methods.
• Creating…Yet another …
– identity scheme.
– representation mechanism for data.
– ontology.
– data warehouse.
– integration framework.
– query or ontology or workflow language.
• Result? Misery. But more work for the boys….
![Page 13: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/13.jpg)
Q93038 = Tumor necrosis factor receptor superfamily member 25 precursor
Q92983 O00275 O00276 O00277 O00278 O00279 O00280 O14865 O14866 P78507 P78515 Q93036 Q93037 Q99722 Q99830 Q99831 Q9BY86 Q9UME0 Q9UME1 Q9UME5
• WSL-1 protein
• Apoptosis-mediating receptor DR3
• Apoptosis-mediating receptor TRAMP
• Death domain receptor 3
• WSL protein
• Apoptosis-inducing receptor AIR
• Apo-3
• Lymphocyte-associated receptor of death
• LARD
• GENE: Name=TNFRSF25
Annotation history:
http://www.expasy.org/uniprot/Q93038
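The problem on this slide, one protein hiding behind many accession numbers, can be sketched as a simple lookup. This is only an illustration: it assumes that the accession numbers grouped around Q93038 on the slide are alternative identifiers for that single entry, and the names `PRIMARY`, `ALTERNATIVE_ACCESSIONS`, `ACCESSION_INDEX` and `resolve` are invented here, not part of any UniProt API.

```python
# Illustrative sketch: collapse the many accession numbers on the slide
# onto one primary accession. Assumption: the identifiers grouped around
# Q93038 on the slide all refer to the same entry.
PRIMARY = "Q93038"

ALTERNATIVE_ACCESSIONS = [
    "Q92983", "O00275", "O00276", "O00277", "O00278",
    "O00279", "O00280", "O14865", "O14866", "P78507",
    "P78515", "Q93036", "Q93037", "Q99722", "Q99830",
    "Q99831", "Q9BY86", "Q9UME0", "Q9UME1", "Q9UME5",
]

# One lookup table so that every known identifier maps to the primary one.
ACCESSION_INDEX = {acc: PRIMARY for acc in ALTERNATIVE_ACCESSIONS}
ACCESSION_INDEX[PRIMARY] = PRIMARY


def resolve(accession: str) -> str:
    """Return the primary accession, or raise KeyError with a readable message."""
    key = accession.strip().upper()
    try:
        return ACCESSION_INDEX[key]
    except KeyError:
        raise KeyError(f"unknown accession: {accession!r}") from None
```

Every consumer of the data ends up maintaining a table like this one, which is the duplicated effort the slide complains about.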
![Page 14: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/14.jpg)
Andy Law's Third Law
• “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”... and is frequently many, many more.
http://bioinformatics.roslin.ac.uk/lawslaws.html
![Page 15: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/15.jpg)
Bioinformatics is about mapping one schema to another, one format to another, one id scheme to another.
Comparative Genomics? Comparative Bioinformatics!
What a waste of time. But a handy distraction from doing some Real Science™.
![Page 16: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/16.jpg)
Andy Law’s First (Format) Law
“The first step in developing a new genetic analysis algorithm is to decide how to make the input data file format different from all pre-existing analysis data file formats.”
http://bioinformatics.roslin.ac.uk/lawslaws.html
                 female  male
crimap           0       1
Keightly         1       0
Knott and Haley  1       2
When they use ‘3’ and ‘4’ we will know they are doing it deliberately.
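The cost of such format differences shows up as translation code like the sketch below. It is illustrative only and rests on a loud assumption: the (female, male) code pairs are matched to the three programs in the order they appear in this transcript, which the flattened slide does not confirm. The names `SEX_CODES` and `translate_sex_code` are invented for the example.

```python
# Illustrative sketch: translate sex codes between the three input
# conventions on the slide. Assumption: the (female, male) code pairs are
# matched to programs in the order they appear in the transcript.
SEX_CODES = {
    "crimap": {"female": "0", "male": "1"},
    "Keightly": {"female": "1", "male": "0"},
    "Knott and Haley": {"female": "1", "male": "2"},
}


def translate_sex_code(code: str, source: str, target: str) -> str:
    """Convert one sex code from the source convention to the target one."""
    # Invert the source table so a raw code can be turned back into a sex.
    decode = {value: sex for sex, value in SEX_CODES[source].items()}
    if code not in decode:
        raise ValueError(f"{code!r} is not a valid sex code for {source}")
    return SEX_CODES[target][decode[code]]
```

Under that assumed pairing, the code `1` means male in one convention and female in the other two, so copying a column between formats without translation silently swaps the sexes.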
![Page 17: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/17.jpg)
Yet another database …
• Organism databases
• 250+ metabolic pathway databases.
• Generic Model Organism Database Toolkit.
FlyBase, WormBase, SGD, BeeBase and many other large and small community databases
Yet another Integration Platform…
• Warehouses, Views, Mashups, Workflows …
• Goble and Roberts, The state of the nation in data integration in the Life Sciences. JBI.
![Page 18: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/18.jpg)
BioBabel
• bioperl
• biojava
• biopython
• bioruby
• biophp
• biosql
• biouml
• bioeclipse
• biofoo
• biobar
• bio*
![Page 19: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/19.jpg)
Computer Science? No thanks!
Ontology and Knowledge Representation Languages
Database Schemas
Workflow systems
Integration platforms
Programming Languages
Tools
![Page 20: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/20.jpg)
Why don’t biologists modularise OWL* ontologies properly?
Er, well, like how should we do it “properly” and where are the tools to help us?
We don’t know and we haven’t got any. But here are some vague guidelines.
W3C Semantic Web for Life Sciences mailing list, 2005
*The W3C Web Ontology Language
![Page 21: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/21.jpg)
A few months in the laboratory can save a few hours in the library.
Westheimer's Law
A few years at the computer can save a few hours on Google.
![Page 22: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/22.jpg)
Reuse Rocks. Collaboration through workflow and web services
VL-e Project
• “instant collaboration” with Martijn Schuemie (Rotterdam) through a web service that discloses their protein synonym data.
• Exchanging services and (sub)workflows with food scientists.
• Web services make that easier.
This isn’t the workflow – it’s just a picture of one!
![Page 23: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/23.jpg)
Of Mice and Cows
• A Trypanosomiasis in Cattle workflow (by Paul*) reused without change for Trichuris muris Infection (by Jo).
• Identified the biological pathways believed to be involved in the ability of mice to expel the parasite.
• Workflows are memes. Scientific commodities. To be exchanged and traded and vetted and mashed.
• Users add value.
*Fisher P et al. A systematic strategy for large-scale analysis of genotype–phenotype correlations: identification of candidate genes involved in African trypanosomiasis, Nucleic Acids Research, 2007, 1–9
![Page 24: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/24.jpg)
Sin 2: Exceptionalism
• Biologist exceptionalism
• Biological exceptionalism
• Biology exceptionalism
A root cause of Reinvention Syndrome
“Bioinformatics is special”
“Our domain-specific outcomes require specific approaches and technologies”
![Page 25: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/25.jpg)
Biologist exceptionalism
• I know there is already a gene name for that gene, but, I don't like it and it doesn't fit in with my schema.
• It would be better if I wrote the script I need so I know what it does, how it does it and how to modify it later because I haven’t specified what it was supposed to do in the first place.
I’m different. We are all individuals.
![Page 26: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/26.jpg)
I am considerably more complex than you…
• “There are proteins, and there are records about proteins. Records come in different formats. If I make a statement using this url, is it about the record? or the protein?” Alan Ruttenberg
• “[Usually] we have one entry per gene. We have several entries for a single gene when description of variations are too complicated to describe in FT lines (of course, this criteria depends on the annotator). For viruses, it is much more messy, due to ribosomal frame-shifts. Formalise that!” Eric Jain UniProtDB
er…decomposition and untangling?
![Page 27: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/27.jpg)
Biology Exceptionalism
• Drawing graphs of data sets over time.
• Stop it.
• The real problem is complexity not scale.
• The number of data sets, their diversity and how they overlap.
• How they change.
• Their reliability.
![Page 28: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/28.jpg)
Biological exceptionalism
“Biology is all exception.”
• Don’t complicate everyone’s life for the sake of a few esoteric cases.
– Cameron’s 5th Commandment of Curation
• Exceptionalism paralysis.
• Gather requirements expansively, prune ruthlessly
• The EMBL/GenBank/DDBJ Feature Table
![Page 29: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/29.jpg)
Sin 3: Autonomy or death!
With added churn, indifference to users and monopoly mentality.
• Compounded by the Early Adopter tendency of the community
• “Hell is other people’s systems” as Jean-Paul Sartre would have said if he had been a bioinformatician.
![Page 30: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/30.jpg)
Autonomy IS death!
• I’ll change my interface / format whenever I feel like it, despite the fact I wanted lots of users and I have lots of users who depend on this. And I won’t bother to debug either or provide backwards compatibility. So there.
– BioMART changed 4 times in 2007.
– NCBI changes as it fancies.
– Ensembl relational schema.
– Early BioJava.
• Churning for change’s sake
Professional
![Page 31: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/31.jpg)
No tool is an island…
• Assume
– only we will use it, whatever it may be.
– that it will be freestanding and unlinked to anything else.
– that it will always work and will keep on working.
– that everyone will understand it.
• “Well I know what I mean. And so does my friend. So I don’t need to specify it. Or document it properly. Or keep the metadata up to date.”
• Never mind the interface, just look at my implementation!
• Metadata, Models, Interfaces, Services matter.
![Page 32: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/32.jpg)
Workflow commodities
• Workflow published with its paper and its data set.
• So what happens when I want to run this workflow again?
• Is the service dead?
• Is the dataset still there?
• Was it designed to be reproduced or reused in the first place?
![Page 33: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/33.jpg)
The myGrid Service DeCrypting Sweatshop
notice how tired they look
Franck Tanoh Katy Wolstencroft
![Page 34: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/34.jpg)
Lincoln Stein said a while ago…
“An interface is a contract between data provider and data consumer”
• Document interface; warn if it is unstable
• Do not make changes lightly
– Even little fiddly changes break things
– Provide plenty of advance warning
• When possible, maintain legacy interfaces until clients can port their scripts
Support as many interfaces as you can
• HTML, Text only (better), HTTP, REST, SOAP
• Easy Interfaces + Power User Interfaces
…and he could say it again today.
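One way to honour "maintain legacy interfaces" and "provide plenty of advance warning" is sketched below. It is a minimal illustration, not anything from Stein's talk: the function names `fetch_record` and `get_entry` and the record layout are hypothetical, and only Python's standard `warnings` module is used.

```python
import warnings

# Illustrative sketch of "maintain legacy interfaces until clients can port".
# All names here (fetch_record, get_entry, the record layout) are hypothetical.

_RECORDS = {"Q93038": {"gene": "TNFRSF25"}}


def fetch_record(accession: str) -> dict:
    """Current interface: return a copy of the record for an accession."""
    return dict(_RECORDS[accession])


def get_entry(acc: str) -> dict:
    """Legacy interface: still works, but gives advance warning of removal."""
    warnings.warn(
        "get_entry() is deprecated; use fetch_record() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return fetch_record(acc)
```

Old client scripts keep running against `get_entry`, and the warning tells their authors what to port to before the old name disappears.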
![Page 35: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/35.jpg)
Stability is more important than Standards, Smartness, or the latest Thing that everybody is talking about.
![Page 36: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/36.jpg)
Sin 4
Vanity
Pride
Narcissism
conceit, egotism or simple selfishness.
Applied to a social group, denotes elitism or an indifference to the plight of others
![Page 37: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/37.jpg)
I know it all.
• Claiming to know everything about biology and everything about computers.
• Really irritating to biologists AND computer scientists.
• Even they don’t claim to know everything about biology or computer science.
• Computer scientists do know a lot of stuff. And they publish too.
• “Biologists are the experts on everything because we produce the data”
And what would you suggest, Mr. Smartie Pants?
![Page 38: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/38.jpg)
Think like me! Be like me!
• Designing good experiments is hard.
• Workflows are computational experimental protocols.
Ergo….
• Writing workflows is hard.
• Writing good workflows is really hard.
• Writing good reusable workflows is really really hard.
Misunderstanding and disrespecting users
• Building interfaces that only you can use.
• Not actually using your tools in the field.
I understand workflows
Workflows are for biologists. My granny can do workflows...
![Page 39: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/39.jpg)
A good User Experience outweighs smart features and cool computing.
Bummer.
Can I use it?
Is the user interface familiar?
Does it fit with my needs?
![Page 40: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/40.jpg)
Sin 5: Monolith Megalomania
• delusions of grandeur.
• obsession with grandiosity and extravagance.
• Data mining – “my data is mine, and your data is mine”
![Page 41: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/41.jpg)
• Integration – the more the merrier. No.
– Every link is a potential dead link.
– Every dependency finds its way on to your critical path.
• “Uber-tools” and “Uber-databases”
– Ensembl, BioMART etc etc….cost
• Apps/libraries in bioinformatics workbenches
– with loads of crap bundled in, none of it kept up to date, none of it properly integrated.
• Put it all in a warehouse.
– Lots of warehouses and lots of toolkits…GMOD, BioWarehouse, BioMART blah blah….
– 50% warehouses fail. (Standish Group)
![Page 42: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/42.jpg)
The trouble with warehouses
• “Warehouses work? Piffle. They never manage to maintain synchrony with the source data. Mostly they fall down of their own weight!” Graham Cameron, EMBL-EBI
• "Our ability to capture and store data far outpaces our ability to process and exploit it. This growing challenge has produced a phenomenon we call the data tombs, or data stores that are effectively write-only; data is deposited to merely rest in peace, since in all likelihood it will never be accessed again. Data tombs also represent missed opportunities." Usama Fayyad Yahoo! Research! Laboratories!
• “We believe that attempts to solve the issues of scientific data management by building large, centralised, archival repositories are both dangerous and unworkable” Microsoft 2020 Science report.
![Page 43: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/43.jpg)
Distributed Annotation System http://www.biodas.org
[Diagram: one Reference Server and three Annotation Servers; sequence labels AC003027, AC005122, M10154; marker labels WI1029, AFM820, AFM1126, WI443]
Now we call this a mash up
![Page 44: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/44.jpg)
Sin 6: Scientific Method Sloth
• It’s easier to think of a new name than use someone else’s.
• I want my own view over data and views are difficult, so I’ll create my own database.
• Leads to Reinvention, Exceptionalism
• Often the result of Instant Gratification
![Page 45: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/45.jpg)
Ennui
• Garbage in, garbage out
– Running analysis over the wrong datasets
– E.g. Identifying chicken proteins in mouse cells.
• Configuration traditionalism
– Not changing the parameters of BLAST. Ever.
• Top list ennui
– If there is a list only looking at the first one.
– Look no further than the first Blast hit / first Google hit.
• Arbitrary cut-offs on rank-ordered result list
– Absolute truth above, absolute falsehood below
– E.g. differentially expressed genes in microarray analyses.
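The "absolute truth above, absolute falsehood below" habit can be illustrated with a small sketch. Everything in it is made up for the purpose: the gene names, scores, threshold and margin are hypothetical values, and `banded_cutoff` is just one possible way to flag results that sit too close to the line to call, not a recommended statistical method.

```python
# Illustrative sketch: a hard cut-off versus one that flags a borderline band.
# The gene names, scores, threshold and margin are made-up values.

def hard_cutoff(scores: dict, threshold: float) -> dict:
    """Label each item 'accept' or 'reject' with nothing in between."""
    return {name: "accept" if s >= threshold else "reject"
            for name, s in scores.items()}


def banded_cutoff(scores: dict, threshold: float, margin: float) -> dict:
    """Same cut-off, but items within `margin` of it are marked 'borderline'."""
    labels = {}
    for name, s in scores.items():
        if abs(s - threshold) <= margin:
            labels[name] = "borderline"
        else:
            labels[name] = "accept" if s >= threshold else "reject"
    return labels


scores = {"geneA": 2.75, "geneB": 2.0, "geneC": 1.875, "geneD": 0.5}
hard = hard_cutoff(scores, threshold=2.0)
banded = banded_cutoff(scores, threshold=2.0, margin=0.25)
```

With these values the hard cut-off accepts `geneB` and rejects `geneC` although their scores differ by only 0.125, while the banded version marks both as borderline and leaves the decision to a human.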
![Page 46: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/46.jpg)
Quality Delusions
• The bioinformatics does not have to be sound, because we only trust wet-lab results anyway.
• Worrying about errors in experimental data but believing that derived data is always true.
• Believing TrEMBL is always right.
• Believing computational gene predictions are always correct.
![Page 47: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/47.jpg)
Irreproducible Black Box Science
• Can you reproduce bioinformatics analyses?
– Not collecting the provenance of the analysis.
– Not testing during software development.
• UniGene
– What is happening during UniGene clustering?
– ‘Human’ descriptions (via NCBI), are not exact.
– The Human Transcriptome Map project and other microarray analysts ended up reclustering UniGene [Marco Roos].
Stuff In
Stuff Out
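Collecting the provenance of an analysis need not be elaborate. The sketch below shows one minimal way to record what went in and how it was processed; the field names, the tool name `example-clusterer`, its version and its parameter are all hypothetical, and only Python standard-library calls are used.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def provenance_record(tool: str, version: str, parameters: dict,
                      input_bytes: bytes) -> dict:
    """Describe one analysis run well enough to check a later rerun against it."""
    return {
        "tool": tool,
        "version": version,
        "parameters": parameters,
        # A content hash identifies the exact input, whatever the file is called.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "python": platform.python_version(),
        "run_at": datetime.now(timezone.utc).isoformat(),
    }


record = provenance_record(
    tool="example-clusterer",         # hypothetical tool name
    version="0.1",                    # hypothetical version
    parameters={"min_overlap": 40},   # hypothetical parameter
    input_bytes=b">seq1\nACGT\n",
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Stored next to the results, a record like this turns "Stuff In, Stuff Out" into something another laboratory can at least attempt to repeat.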
![Page 48: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/48.jpg)
“No experiment is reproducible.” Wyszowski's Law
“An experiment is reproducible until another laboratory tries to repeat it.” Alexander Kohn
![Page 49: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/49.jpg)
Sin 7: Instant Gratification
• Greed? Gluttony?
• Always the immediate return.
• Never investing for the future.
• The quick and dirty fix.
• Refusing to model or abstract.
• Refusing to plan for recording and exchanging.
• Just getting the next quick fix.
• The pressure to deliver now and pay later
www.CartoonStock.com
![Page 50: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/50.jpg)
Hackery
• Deliver now, pay later
– Producing crap, non-reusable, software because only the biological results matter for publication X.
– Collect! Analyse! Er…now what?
– NO up to date or useful DOCUMENTATION!
• Spaghetti-ism
– Over-indulgence in PERL
– Over-indulgence in Ascii Art flat files.
– Modelling a system by hacking up XSD fragments on a whiteboard.
– Writing perl scripts that resemble my high-school BASIC of the 80s.
![Page 51: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/51.jpg)
Law's Second Law
• “Error messages should never be provided”
corollary... “If error messages are provided, they should be utterly cryptic so as to convey as little information as possible to the end user”
![Page 52: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/52.jpg)
Blind faith in XML (or any mechanism)
• “It’s in XML, thus all data integration problems are solved.”
• “The good thing about XML is that it is human readable”. !!*£!*!
XML
![Page 53: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/53.jpg)
Blind Faith in Foo.
• There's a new thing to use.
• we don't understand it yet.
• so it sucks up all the stuff we already know we don't understand.
• Lack of appreciation about exactly what the new technology addresses in itself before trying to make it work for us.
![Page 54: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/54.jpg)
Pioneering development methods
• Development by anecdote
– I heard in the pub that the way to go was Foo.
– Though I have no idea what Foo is or why it is the way to go.
• Design by hacking
– It would be better if I wrote the script I need so I know what it does, how it does it and how to modify it later because I haven’t specified what it was supposed to do in the first place.
– Hmmm…..We call that Extreme Programming or Emergent Semantics or Web 2.0 in CS.
![Page 55: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/55.jpg)
At the other end of the spectrum…
• Often found in industry
• Over-engineered solutions
• Delivered too late
• That solve a problem the users didn’t know they had
• Or are hard to use
• Or wire their favourite tool in
![Page 56: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/56.jpg)
Sin Summary
Maybe only one “original sin” in bioinformatics.
Parochialism and Insularity
Autonomy or death!
Vanity: Pride and Narcissism
Monolith Megalomania
Scientific method Sloth
Instant Gratification
Reinvention
Churn
Exceptionalism
![Page 57: The Seven Deadly Sins of Bioinformatics Professor Carole Goble carole.goble@manchester.ac.uk The University of Manchester, UK The myGrid project](https://reader036.vdocuments.net/reader036/viewer/2022081515/56649e5e5503460f94b56fed/html5/thumbnails/57.jpg)
Why do these sins exist? Can we become less sinful?
Are bioinformaticians particularly naughty?
No naughtier than Computer Scientists.
And it’s all very hard.
Though they are naughty…
Forgive us our sins…
1. Reward Culture
– The Selfish Scientist and Self-promotion, Research vs Production, Reuse is Hard
2. Mechanism evangelism
– More than one mechanism
– New and my shiny Gadget syndrome
3. Gang warfare
– If you are not with us you are against us
– Interdisciplinary mistrust and conflict
4. Blazing the trail
– Invention, not Reinvention
– Delivery bulge
5. Fear and Trust
– Luddism
– Trust (or lack of it)
6. It’s Hard
– Hybrid Exhaustion
The Selfish, Self-interested Scientist
• Reputation. Results right now. Win. More funds!
• Fear of dependency. Fear of being left behind.
• Understand the incentives and barriers to adoption.
“A biologist would rather share their toothbrush than their (gene) names”
Mike Ashburner, Professor of Genetics, University of Cambridge, UK
Funding and Social structures
Are against shareable reusable software/ontology/database/thing
1. Self-promotion
– I can publish every new monolithic thing and I can’t publish if I reuse someone else’s thing.
2. Novelty vs Standards
– Standards are boring “blue collar” science (Quackenbush)
3. Research vs Production Confusion
– How do you get funding for production software other than claiming to be researching stuff?
– How do you get a publication out of a bit of research software without claiming a potential user-base?
– I don’t want to be a long-term service provider!
Reuse is Hard
• Writing reusable workflows is hard.
– Local services. Permissions. Licences.
• Writing reusable services is hard.
– Predicting the unknown required by the unknown.
– It’s an effort
– What is in it for me? Helping out my competitors! Forcing me to support others?
• Finding and using “reusable” workflows and services and tools is hard
– Where do you go?? What does it DO??
My New Gadget Syndrome
• This is inherited from Computer Science I fear.
• Sorry.
• A few evangelistic voices, very loud, vested interest, for their application, win.
More than One Mechanism Muddle
Global Identity naming mechanism for data objects in the Life Sciences. LSIDs and URIs and PURLs. WS-Naming and all its friends.
urn:lsid:uniprot.org:{db}:{id}
http://purl.uniprot.org/{db}/{id}
W3C Semantic Web Health Care and Life Sciences Interest Group [email protected]
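To make the muddle concrete: the two identifier forms on this slide carry the same {db} and {id}, so a bridge between them can be mechanical. The following is a minimal sketch, assuming only the two templates as written above. The function names and the sample values (`uniprot`, `P12345`) are hypothetical illustrations, and nothing here contacts a real resolver.

```python
import re

# Patterns derived from the two templates on the slide (an assumption:
# {db} and {id} contain no ':' or '/' characters).
LSID_PATTERN = re.compile(r"^urn:lsid:uniprot\.org:(?P<db>[^:/]+):(?P<id>[^:/]+)$")
PURL_PATTERN = re.compile(r"^http://purl\.uniprot\.org/(?P<db>[^:/]+)/(?P<id>[^:/]+)$")


def lsid_to_purl(lsid: str) -> str:
    """Rewrite urn:lsid:uniprot.org:{db}:{id} as http://purl.uniprot.org/{db}/{id}."""
    match = LSID_PATTERN.match(lsid)
    if match is None:
        raise ValueError(f"not a recognised LSID form: {lsid!r}")
    return f"http://purl.uniprot.org/{match['db']}/{match['id']}"


def purl_to_lsid(purl: str) -> str:
    """Rewrite http://purl.uniprot.org/{db}/{id} as urn:lsid:uniprot.org:{db}:{id}."""
    match = PURL_PATTERN.match(purl)
    if match is None:
        raise ValueError(f"not a recognised PURL form: {purl!r}")
    return f"urn:lsid:uniprot.org:{match['db']}:{match['id']}"


if __name__ == "__main__":
    # Hypothetical sample identifier, used only to exercise the round trip.
    sample = "urn:lsid:uniprot.org:uniprot:P12345"
    print(lsid_to_purl(sample))
    print(purl_to_lsid(lsid_to_purl(sample)))
```

The point of the sketch is that when two camps cannot agree on one mechanism, a small, shared bridge like this may be cheaper than another naming war.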
Endurants, Perdurants, Being, Substance, Event
Discipline gang warfare
The Montagues and The Capulets, Comparative and Functional Genomics, 2004
The Ontology War
Invention, not Reinvention
• BioMOBY pre-dates (Semantic) Web service revolution
– Though not 20 years of SOA
• OBO and OBO-Edit pre-date OWL and Protégé-OWL
– Though not 20 years of Knowledge Representation.
• Taverna pre-dates a reliable Open Source BPEL engine
– Though not 20 years of lambda and pi calculus and Petri nets.
• There ARE features that Bioinformatics needs that other solutions don’t cater for.
• And solutions needed URGENTLY
Delivery Bulge
Luddism
• Refusing to have biology go beyond a cottage industry.
• Being scared to do it properly.
• Railing against big science
• The cult of amateurism.
[Stevens]
Trust
I don’t trust your code
I don’t trust your data
I don’t trust you will still be around in one year
I don’t trust your workflow
I don’t trust you will use my data / workflow / code properly
Hybrid exhaustion and pressure.
• Biology + Computing + Bioinformatics
• It. Is Hard.
What can we do to be less sinful?
Make sharing easier
Make sharing rewarded
Think Components
Only standardise the minimum
Embed bioinformaticians and computer scientists
Presume naughtiness
Think Safe Sharing of Stuff
• Understanding outside my expertise. e.g. sources of error.
• A comprehensive catalogue of web services
• A Facebook for workflow builders.
• Learn from others. Even Computer Science. And other Sciences.
• Try and create a culture of raising quality. Somehow.
http://myexperiment.org
Think Components
• Stop building monolithic solutions
• Component-ise Bioinformatics
– Loosely coupled systems
– Stable APIs, standardised metadata.
– Design to combine.
– Sort out the naming/id problem
– If you can’t agree, agree on the bridge.
– Virtualise
• Raise the level of abstraction
– Less Perl, more workflows
– Enable users to extract the data they need without hassling you.
• “Standardise messages not structures” Graham Cameron
Web Services
Think how it Really Is™…
• …incremental change
• …others use our stuff
• …others add value to our stuff
• …scientific naughtiness
• deal with it, or expose it.
• end to black boxes
• workflows as a route to transparency
• open notebook science
• scary stuff.
Think User and Developer together
• Embed Bioinformaticians with Computer Scientists and Biologists
Think Web 2.0 Design Patterns
• De Roure, D. and Goble, C. (2008) Six Principles of Software Design to Empower Scientists. IEEE Software (to appear). http://eprints.ecs.soton.ac.uk/15032/
1. The Long Tail
2. Data is the Next Intel Inside
3. Users Add Value
4. Network Effects by Default
5. Some Rights Reserved
6. The Perpetual Beta
7. Cooperate, Don't Control
8. Software Above the Level of a Single Device
“Other men's sins are before our eyes; our own are behind our backs”
Seneca
Roman philosopher, mid-1st century AD