myexperiment @ nettab

Who Are You? Managing collaborative digital identities in bioinformatics with myExperiment. Duncan Hull, Postdoctoral Research Associate, Manchester Biocentre (mib.ac.uk), School of Chemistry, University of Manchester, UK. NETTAB 2009, Catania, Italy, June 2009

Uploaded by: duncan-hull

Posted on 11-May-2015


DESCRIPTION

Digital Identity is fundamental to collaboration in bioinformatics research and development because it enables attribution, contribution and publication to be recorded and quantified. However, current models of identity are often obsolete and have problems capturing both small contributions ("microattribution") and large contributions ("mega-attribution") in science. Without adequate identity mechanisms, the incentive for collaboration can be reduced and the utility of collaborative social tools hindered. Using examples of metabolic pathway analysis with the Taverna workbench and myexperiment.org, this talk will illustrate problems and solutions for identifying scientists accurately and effectively in collaborative bioinformatics networks on the Web.

TRANSCRIPT

Page 1: myExperiment @ Nettab

Who Are You? Managing collaborative digital identities in bioinformatics with myExperiment

Duncan Hull, Postdoctoral Research Associate, Manchester Biocentre (mib.ac.uk), School of Chemistry, University of Manchester, UK. NETTAB 2009, Catania, Italy, June 2009

Page 2

• Intro: Collaborative social software on the Web generally

– Scientists and the web

– Publishing

– Digital Identity

• Sets the scene: http://www.myexperiment.org in a nutshell

– The what, who, why and how of myExperiment

– Building an online community where scientists share data more efficiently

– Encouraging people to share and re-use data (especially experimental protocols)

• Overcoming publish or perish culture

• Incentives to share data, tooling to make it as easy as possible

• Case Study: REFINE Project http://www.nactem.ac.uk/refine

– Refining pathway models: myExperiment from a personal user’s point of view (40 minutes)

• Demonstration of myexperiment (30 minutes)


Page 3

Social software for collaborating on the Web: less than 10 years old

Designed to allow communication by sharing data with friends, colleagues and other people

http://tinyurl.com/myscience

Some people call this “Web 2.0”

Page 4

Unfortunately

• Many scientists don’t use these tools for serious work (if at all)

• Why?

• It’s complicated but…

Page 5

Galileo Galilei (1632) Dialogo sopra i due massimi sistemi del mondo

Page 6

Scientific publishing has worked this way for centuries

• Publishing is the main (perhaps the only) way of sharing data and communicating:

• “Publish or Perish”

Page 7

Digital Data Driven Science

• Science is increasingly digital and data-driven

– Scientists’ contributions are increasingly digital

– Not just digital publications in electronic journals…

– wiki edits, software development, workflows, database curation, ontology development, blog posts

– Traditional journal publishing is often inadequate for sharing this kind of data and attributing it to individual people

Page 8

Burying or Destroying Data and Metadata?

• Publishing can be inadequate, and its output difficult to mine

Barend Mons, WikiProteins

Why bury it [data] first and then mine it again?

Which gene did you mean? http://pubmed.gov/15941477

BMC Bioinformatics. 2005 Jun 7;6:142.

In other cases important data and metadata get destroyed completely

(author, title, gene, protein, chemical names etc.)

This makes digital libraries difficult to use: Defrosting the Digital Library, Hull, Pettifer and Kell, http://www.pubmed.gov/18974831, PLoS Computational Biology 2008 Oct;4(10):e1000204

Page 9

Double Trouble!

1. Scientists are reluctant to share data until it is published in peer-reviewed journals

2. When they do publish, data often gets badly damaged or destroyed in the process. The digital identity of people gets especially mangled…

CC licensed double trouble picture by Puck90 http://www.flickr.com/photos/puck90/2480833393/

Page 10

Digital Identity is currently a mess (part 1)

• One person can be identified by many different URIs

• People who know Paolo can tell the difference

– People who don’t (and software) face a significant challenge to disambiguate

• Digital Identity is a second-class citizen on the Web (see http://www.flickr.com/photos/dullhunk/3618998907/ for a web example)

1. http://www.nettab.org/promano/ (NETTAB organiser)

2. mailto:[email protected]

3. http://www.paoloromano.it/

4. en.wikipedia.org/wiki/Paolo_Romano (sculptor)

5. it.wikipedia.org/wiki/Paolo_Romano (actor)

6. www.linkedin.com/in/paoloromano

7. http://pubmed.gov?Term=Paolo+Romano[author]

8. myspace.com/paoloromano (musician)

9. www.paoloromano.net/ (politician and friend of Berlusconi)

10. citeulike.org/tag/paolo-romano

11. ...uni-trier.de/~ley/db/indices/a-tree/r/Romano_0001:Paolo.html

Will the real Paolo Romano please stand up?

URIs are used to identify people on the Web
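People can tell these URIs apart; software needs explicit links stating that two URIs denote the same person. A minimal sketch, assuming such equivalence links already exist (for example owl:sameAs statements, or profile pages that link to each other): a union-find (disjoint-set) structure groups every URI that is transitively linked. The links below are hypothetical and are not a claim about which of the URIs above really belong to the same Paolo Romano.

```python
class URIGroups:
    """Union-find (disjoint-set) over URI strings."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        """Return the representative URI of the group containing `uri`."""
        self.parent.setdefault(uri, uri)  # unseen URIs start as their own group
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every URI on the path directly at the root.
        while self.parent[uri] != root:
            next_uri = self.parent[uri]
            self.parent[uri] = root
            uri = next_uri
        return root

    def union(self, a, b):
        """Record that URIs `a` and `b` identify the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        """Return every group of equivalent URIs as a set."""
        out = {}
        for uri in list(self.parent):
            out.setdefault(self.find(uri), set()).add(uri)
        return list(out.values())


# Hypothetical equivalence links, for illustration only.
SAME_PERSON = [
    ("http://www.nettab.org/promano/", "http://www.paoloromano.it/"),
    ("http://www.paoloromano.it/", "http://romano.myopenid.com/"),
]

ids = URIGroups()
for a, b in SAME_PERSON:
    ids.union(a, b)
ids.find("http://en.wikipedia.org/wiki/Paolo_Romano")  # no links: stays on its own

for group in ids.groups():
    print(sorted(group))
```

Without any links, every URI stays in its own group, which is exactly the disambiguation problem the slide describes: the hard part is obtaining trustworthy links, not merging them.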

Page 11

Digital attribution

Neil Smalheiser and Vetle Torvik:

“Attribution would seem to be a simple process and yet it represents a major, unsolved problem for information science.”

Author name disambiguation. Chapter published in Volume 43 (2009) of the Annual Review of Information Science and Technology (ARIST), edited by B. Cronin, available from the publisher Information Today, Inc.

http://www.hbs.edu/units/tom/seminars/2007/docs/Author%20Name%20Disambiguation.pdf

Page 12

Misattribution

Google Scholar thinks I’m Maurice Wilkins

[Diagram: Dr. Duncan Hull (humble postdoc), Maurice Wilkins and an article titled “DNA mania”, connected by “about” and “Authored-by” links; one link is marked “Wrong!”]

http://tinyurl.com/mistakenid

Page 13

Digital identity is currently a mess (part 2)

• On three levels, the three A’s:

– Authentication: is Paolo who he says he is? Or a fake?

– Authorisation: is Paolo authorised to view/operate on a workflow?

– Attribution: Paolo AuthorOf Nettab-Workflow, or Paolo Reused Workshop-Workflow

Currently done through a combination of username and password

http://tinyurl.com/too-many-passwords

Paolo Romano

Simon Willison (The Guardian): “The average user has [at least] 18 user accounts and 3.49 passwords”
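The attribution examples on this slide read naturally as subject, predicate, object statements. A minimal sketch, assuming a hypothetical vocabulary of two predicates (AuthorOf, and BasedOn standing in for "Reused") and a hypothetical author, Alice, of Workshop-Workflow: following BasedOn links lets reuse earn credit for the original author as well as for the person who reused it.

```python
from collections import deque

# Hypothetical attribution statements as (subject, predicate, object) triples.
# "BasedOn" stands in for "Paolo Reused Workshop-Workflow"; Alice is made up.
TRIPLES = [
    ("Paolo", "AuthorOf", "Nettab-Workflow"),
    ("Alice", "AuthorOf", "Workshop-Workflow"),
    ("Nettab-Workflow", "BasedOn", "Workshop-Workflow"),
]


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is asserted."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def credited_people(triples, workflow):
    """Authors of `workflow` plus authors of every workflow it is
    (transitively) based on, found by breadth-first search."""
    seen = {workflow}
    queue = deque([workflow])
    people = set()
    while queue:
        current = queue.popleft()
        people.update(subjects(triples, "AuthorOf", current))
        for source in objects(triples, current, "BasedOn"):
            if source not in seen:  # guard against cycles
                seen.add(source)
                queue.append(source)
    return people


print(sorted(credited_people(TRIPLES, "Nettab-Workflow")))  # ['Alice', 'Paolo']
```

The sketch only works if "Paolo" and "Alice" each resolve to one unambiguous person, which is why attribution depends on getting identity right first.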

Page 14

Digital Identity Really Matters

• Digital Identity is fundamental to collaboration because it enables

– Attribution …

– Contribution…

– Publication … to be recorded and quantified.

• Important decisions made on digital identity

– Hiring, funding, promotion, collaboration

– Selecting appropriate reviewers for grants and publications

– Attributing published data

• This is the environment in which myExperiment operates:

– A “Publish or perish” culture in science

– Encourage workflow sharing before, during & after traditional publication

• Via the website www.myexperiment.org and its various APIs

– Get digital attribution done right, with more reliable digital identities

Page 15

What is myExperiment?

• Facebook for Scientists?

• Collaborative software for sharing and finding experimental protocols on the web

Page 16

What is myExperiment?

Unique selling points, key differentiators from Facebook etc.:

User profiles, groups, friends, sharing, tags, workflows, developer interface, credits and attributions, fine control over privacy, packs, federation, enactment

Page 17

Kepler

Triana

BPEL

Ptolemy II

Taverna

Trident

BioExtract

Page 18

Who is involved in myExperiment?

• Small team of developers (2-3 full time)

• 1500 users have uploaded 560 workflows, 150 files and 40 packs in 130 groups

Carole Goble

David De Roure

Page 19

http://openid.net/

Tackling Digital Identity and attribution

Page 20

OpenID is quickly becoming widespread

“42,235 sites are now enabled to accept OpenID logins” (source: http://blog.janrain.com/2009/05/relying-party-stats-as-of-may-1-2009.html)

Page 21

But you can’t force OpenID on people… (yet)

Log in with an OpenID such as http://romano.myopenid.com/ (password handled by a third-party OpenID provider)

OR

with a username and password, e.g. [email protected] / nettab

[Chart labels: 84%, 16%]
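With OpenID the identifier is itself a URL, so a site accepting OpenID logins has to normalise whatever the user types before using it as the key for their account. A simplified sketch of the normalisation step in the OpenID Authentication 2.0 specification (section 7.2): strip an "xri://" prefix, leave XRI identifiers alone, add "http://" when no scheme is given and drop any fragment. Lowercasing the scheme and host and defaulting an empty path to "/" stand in for the full RFC 3986 normalisation, which this sketch does not implement; discovery and the authentication exchange are out of scope.

```python
from urllib.parse import urlsplit, urlunsplit

# First characters that mark an XRI (global context symbols, or a cross-reference).
XRI_MARKERS = ("=", "@", "+", "$", "!", "(")


def normalize_openid(user_input: str) -> str:
    """Simplified OpenID 2.0 identifier normalisation (not full RFC 3986)."""
    ident = user_input.strip()
    if ident.startswith("xri://"):
        ident = ident[len("xri://"):]
    if ident.startswith(XRI_MARKERS):
        # XRIs are resolved differently; returned unchanged in this sketch.
        return ident
    if not ident.lower().startswith(("http://", "https://")):
        ident = "http://" + ident  # bare host name: assume http
    parts = urlsplit(ident)
    # Lowercase scheme and authority (note: this also lowercases any userinfo),
    # default an empty path to "/", and drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


print(normalize_openid("romano.myopenid.com"))  # http://romano.myopenid.com/
```

Normalising first means that "romano.myopenid.com" and "HTTP://Romano.MyOpenID.com/" map to the same account, so one person does not accidentally acquire two identities on the same site.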

Page 22

Once logged in, each user gets a profile page identified by a URI

Page 23

Page 24

Page 25

Page 26

[Diagram: myExperiment architecture. Components shown: a mySQL database (profiles, groups, friendships, reviews, ratings, tags, files, workflows, packs, credits), a search engine, an enactor and the HTML interface; “For Developers”: a managed REST API returning XML (with API config) and an RDF store with a SPARQL endpoint; clients: Facebook, iGoogle, Android.]

Page 27

All accepted friendships, including the accepted-at time (SIOC: Semantically-Interlinked Online Communities)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myexp: <http://rdf.myexperiment.org/ontology#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?friend1 ?friend2 ?acceptedat WHERE {
  ?z rdf:type <http://rdf.myexperiment.org/ontology#Friendship> .
  ?z myexp:has-requester ?x .
  ?x sioc:name ?friend1 .
  ?z myexp:has-accepter ?y .
  ?y sioc:name ?friend2 .
  ?z myexp:accepted-at ?acceptedat
}

SPARQL endpoint: maximises data re-use
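Because a SPARQL endpoint speaks plain HTTP, any client can re-use the data. A minimal sketch following the W3C SPARQL Protocol (the query travels in the `query` parameter of a GET request) and the SPARQL JSON results format; it restates the friendship query above using the `a` shorthand for rdf:type. The endpoint address is an assumption, and the sample response is made up for illustration, so nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint address; check the myExperiment developer pages before use.
ENDPOINT = "http://rdf.myexperiment.org/sparql"

# The friendship query from the slide, using "a" as shorthand for rdf:type.
QUERY = """PREFIX myexp: <http://rdf.myexperiment.org/ontology#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?friend1 ?friend2 ?acceptedat WHERE {
  ?z a myexp:Friendship ;
     myexp:has-requester ?x ;
     myexp:has-accepter ?y ;
     myexp:accepted-at ?acceptedat .
  ?x sioc:name ?friend1 .
  ?y sioc:name ?friend2 .
}"""


def build_request(endpoint, query):
    """SPARQL Protocol query via HTTP GET, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json):
    """Flatten a SPARQL JSON results document into {variable: value} dicts.
    Unbound variables are simply missing from a row."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]


# Made-up response in the SPARQL JSON results format, for illustration only.
SAMPLE = json.dumps({
    "head": {"vars": ["friend1", "friend2", "acceptedat"]},
    "results": {"bindings": [{
        "friend1": {"type": "literal", "value": "Alice"},
        "friend2": {"type": "literal", "value": "Bob"},
        "acceptedat": {"type": "literal", "value": "2009-06-01T10:00:00Z"},
    }]},
})

request = build_request(ENDPOINT, QUERY)
# urllib.request.urlopen(request) would send it; not done in this sketch.
print(rows(SAMPLE))
```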

Page 28

Future work: Phase 2

• Repository integration (institutional: EPrints, Fedora)

• Controlled vocabularies

• Relationships between items (in and between packs)

• Recommendations

• Improved search ranking and faceted browsing

• Indexing of packs

• New contribution types (Meandre, Kepler, e-books)

• Further blog / wiki integration

• BioCatalogue integration

Page 29

Representing Evidence For Interacting Network Elements

Page 30

http://www.biomodels.net

http://www.sbml.org

http://pubmed.gov

Case Study: REFINE Project, improving SBML models

Metabolic reconstruction

Difficult

Document level

“tools and resources” - fairly straightforward

Page 31

Example from Glycolysis in Yeast

[Diagram: a single reaction with two reactants, two products and a modifier]

This is just one reaction; there are at least 1700 more in yeast
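In SBML each reaction lists its reactants, products and modifiers as references to species, which is what step 1 of the REFINE workflow has to extract. A minimal sketch using Python's standard XML parser; the model is a hypothetical, heavily trimmed fragment (the hexokinase step of glycolysis, glucose + ATP giving glucose 6-phosphate + ADP), and the namespace assumes SBML Level 2 Version 4, so files of other levels or versions need a different namespace. The workflow in the talk runs in Taverna; this Python is only an illustration.

```python
import xml.etree.ElementTree as ET

# Namespace for SBML Level 2 Version 4 (assumed; differs between levels/versions).
NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# Hypothetical, heavily trimmed model containing a single reaction.
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_glycolysis">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glc" name="D-glucose" compartment="cell"/>
      <species id="atp" name="ATP" compartment="cell"/>
      <species id="g6p" name="D-glucose 6-phosphate" compartment="cell"/>
      <species id="adp" name="ADP" compartment="cell"/>
      <species id="hxk" name="hexokinase" compartment="cell"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="HXK">
        <listOfReactants>
          <speciesReference species="glc"/>
          <speciesReference species="atp"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="g6p"/>
          <speciesReference species="adp"/>
        </listOfProducts>
        <listOfModifiers>
          <modifierSpeciesReference species="hxk"/>
        </listOfModifiers>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def species_names(element, path, names):
    """Resolve the species references under `path` to human-readable names."""
    return [names.get(ref.get("species"), ref.get("species"))
            for ref in element.iterfind(path, NS)]


def list_reactions(sbml_text):
    """Step 1: list every reaction with its reactants, products and modifiers."""
    model = ET.fromstring(sbml_text).find("sbml:model", NS)
    names = {s.get("id"): s.get("name") or s.get("id")
             for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)}
    return [
        {
            "id": r.get("id"),
            "reactants": species_names(r, "sbml:listOfReactants/sbml:speciesReference", names),
            "products": species_names(r, "sbml:listOfProducts/sbml:speciesReference", names),
            "modifiers": species_names(r, "sbml:listOfModifiers/sbml:modifierSpeciesReference", names),
        }
        for r in model.iterfind("sbml:listOfReactions/sbml:reaction", NS)
    ]


for reaction in list_reactions(SBML):
    print(reaction)
```

Resolving species IDs to their `name` attributes matters here because the later steps search the literature by name, not by the model's internal IDs.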

Page 32

REFINE Workflow:

1. Given an SBML file, list all reactions

2. For each reactant, get synonyms (e.g. synonyms of “D-glucose”)

3. Construct PubMed queries and execute them

4. Rank results

5. Display results to user

The workflow itself is not rocket science (just a tool that needed to be built)

Services 2 and 4 are based on other people’s workflows, which saved a lot of effort re-inventing the wheel

Services 1, 3 and 5 are “private” during prototyping
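Steps 2 and 3 can be sketched as building a Boolean PubMed query from synonyms and turning it into an NCBI E-utilities ESearch URL (`db`, `term` and `retmax` are ESearch parameters). Assumptions, all hypothetical: the synonym table stands in for the real synonym service, and pairing reactant synonyms with the modifier (enzyme) name is one plausible query design, not necessarily the one REFINE used. Ranking (step 4) is not shown.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical synonym table standing in for step 2's synonym service.
SYNONYMS = {
    "D-glucose": ["D-glucose", "glucose", "dextrose"],
    "hexokinase": ["hexokinase"],
}


def any_of(terms):
    """Quote each term and OR them together: ("a" OR "b")."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(reactant, modifier, synonyms):
    """Step 3 (one possible design): any synonym of the reactant AND any
    synonym of the modifier. Unknown names fall back to themselves."""
    return (any_of(synonyms.get(reactant, [reactant]))
            + " AND "
            + any_of(synonyms.get(modifier, [modifier])))


def esearch_url(term, retmax=20):
    """ESearch URL returning up to `retmax` PubMed IDs for `term`."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


query = build_query("D-glucose", "hexokinase", SYNONYMS)
print(query)  # ("D-glucose" OR "glucose" OR "dextrose") AND ("hexokinase")
print(esearch_url(query))
```

Expanding synonyms with OR widens recall, while the AND with the enzyme keeps the results tied to the specific reaction rather than to every paper that mentions glucose.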

Page 33

Some preliminary data: first few months of use

• Of the 661 workflows, 531 are publicly visible and 502 are publicly downloadable.

• 3% of the workflows with restricted access are entirely private to the contributor; for the remainder, contributors elected to share with individual users and groups.

• 69 workflows (over 10%) have been shared with the owner granting edit permissions to specific users and groups.

• In addition, there are 52 instances where users have noted that a workflow is based on another workflow on the site.

• The most viewed workflow has 1566 views.

• There are 50 packs, ranging from tutorial examples to bundles of materials relating to specific experiments.
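The percentages implied by these counts are easy to check. A small worked calculation using only the numbers on this slide; "not publicly visible" is simply the total minus the publicly visible workflows.

```python
TOTAL_WORKFLOWS = 661  # from the slide


def percent(part, whole=TOTAL_WORKFLOWS):
    """Share of `whole`, as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


counts = {
    "publicly visible": 531,
    "publicly downloadable": 502,
    "shared with edit permissions": 69,
}

for label, n in counts.items():
    print(f"{label}: {n} ({percent(n)}%)")

not_visible = TOTAL_WORKFLOWS - counts["publicly visible"]
print(f"not publicly visible: {not_visible} ({percent(not_visible)}%)")
```

This gives about 80.3% publicly visible, 75.9% publicly downloadable and 10.4% shared with edit permissions (consistent with "over 10%"), leaving 130 workflows that are not publicly visible.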

Page 34

Conclusions

• myExperiment experience so far:

• Scientists do share data, but…

– you need to get digital identity right (still an unsolved problem)

– Get digital attribution right

• Allow fine-grained control over what is shared, when, with whom and under what license…
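Fine-grained control of this kind amounts to an authorisation check on every item. A minimal sketch, assuming a hypothetical model with three ordered permission levels (view, download, edit, mirroring the sharing categories reported on the preliminary-data slide) granted either publicly or to named users and groups. It does not cover licences or timing, and it is not myExperiment's actual code; the user and group names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

# Hypothetical ordered permission levels: each level implies the ones below it.
LEVELS = {"view": 1, "download": 2, "edit": 3}


@dataclass
class SharingPolicy:
    owner: str
    public: Optional[str] = None  # highest action open to everyone, or None
    grants: Dict[str, str] = field(default_factory=dict)  # user/group -> action

    def allows(self, action: str, user: str, groups: Iterable[str] = ()) -> bool:
        """Can `user` (a member of `groups`) perform `action` on this item?"""
        needed = LEVELS[action]  # KeyError for an unknown action
        if user == self.owner:
            return True
        best = LEVELS.get(self.public, 0)
        for principal in (user, *groups):
            best = max(best, LEVELS.get(self.grants.get(principal), 0))
        return best >= needed


# A workflow anyone may view, which members of a (hypothetical) group may also edit.
policy = SharingPolicy(owner="duncan", public="view", grants={"refine-group": "edit"})
print(policy.allows("view", "stranger"))                        # True
print(policy.allows("download", "stranger"))                    # False
print(policy.allows("edit", "paolo", groups=["refine-group"]))  # True
```

A policy with no public level and no grants is private to its owner, which corresponds to the "entirely private" workflows mentioned earlier.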

Page 35

Conclusions: Aristocracy 2.0 or Democracy 2.0?

Web 2.0 vs. Science 1.0?

• Web 2.0: wisdom of crowds; lightly filtered information (or not filtered at all); democratic (“a link is a vote”) and technocratic (“The geeks shall inherit the earth”); low barrier to entry, inclusive

• Science 1.0: wisdom of experts; heavily filtered information (peer review); aristocratic? (program committees, editorial boards, funding panels, academic faculty staff etc.); high barrier to entry, exclusive

What will Science 2.0 look like once scientists start sharing more data on the web?

We live in exciting times!

Page 36

Conclusions: Participation Inequality

http://www.useit.com/alertbox/participation_inequality.html

Dr. Jakob Nielsen

90% of users in online communities are “lurkers” who never contribute
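Nielsen's rule is usually stated as 90-9-1: about 90% of users lurk, 9% contribute occasionally and 1% account for most contributions. As a purely illustrative calculation, applying that split to the roughly 1500 myExperiment users mentioned earlier gives the figures below; they are not measured myExperiment data.

```python
def participation_split(users, lurkers_pct=90, occasional_pct=9, heavy_pct=1):
    """Split a user count by Nielsen's 90-9-1 rule of thumb (integer arithmetic)."""
    if lurkers_pct + occasional_pct + heavy_pct != 100:
        raise ValueError("percentages must add up to 100")
    return {
        "lurkers": users * lurkers_pct // 100,
        "occasional contributors": users * occasional_pct // 100,
        "heavy contributors": users * heavy_pct // 100,
    }


print(participation_split(1500))
# {'lurkers': 1350, 'occasional contributors': 135, 'heavy contributors': 15}
```

If the rule held, only around 150 of 1500 users would contribute at all, which is why lowering the barrier to sharing matters so much.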

Page 37

We need you!

• It’s all about collaboration

• Sign up for an account at http://www.myexperiment.org

• Please get in touch if you’d like to join in

• Mailing list [email protected]

• Questions?

• … and now for a live demonstration

Page 38

Grazie! (Thank you!)

• Paolo Romano, Rosalba Giugno and the organisers / delegates of NETTAB 2009

• Università degli Studi di Catania (University of Catania) for hosting

• Rete Nazionale di Bioinformatica Oncologica (Italian Network for Oncology Bioinformatics) http://www.rnbio.it for funding

• myExperiment team, led by Dave De Roure and Carole Goble, also Jiten Bhagat, Danius Michaelides, Don Cruickshank, Sergejs Aleksejevs, Paul Fisher (also Kell Group lab members Paul Dobson and Neil Swainston)

• REFINE project: Sophia Ananiadou, Douglas Kell, Steve Pettifer, Jun'ichi Tsujii, Yoshimasa Tsuruoka, funded by BBSRC, http://www.nactem.ac.uk