data citation principles harvard may 2011: orcid and data publication - identifying knowledge...

Upload: gudmundur-thorisson

Post on 10-May-2015

TRANSCRIPT

Page 1: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Data Citation Principles Workshop, IQSS, Harvard University 16-17 May 2011

http://www.orcid.org G. A. Thorisson, University of Leicester / ORCID

ORCID and data publication: Identifying knowledge contributors to motivate sharing

Gudmundur A. Thorisson <[email protected]>

Tony Brookes bioinformatics group

Department of Genetics, University of Leicester

-- Outline --

• Pretext: my route to workshop

• Ongoing & planned data publication projects

• Disease genetics data

• Planned integration with ORCID for researcher identification

• Role of ORCID in data publication ecosystem?

• [shameless] plug for Sept workshop on researcher identity

This work can be freely copied, redistributed and adapted, as long as proper attribution is given

Monday, 16 May 2011

Page 2: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Prologue

Page 4: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Prof Anthony J Brookes

GEN2PHEN coordinator

Chair, Bioinformatics and Genomics

Department of Genetics

University of Leicester, UK

Page 6: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

The data sharing problem

Page 7: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Lack of incentives for sharing

• Effort required to prepare, package and submit datasets to public repositories

• Time better spent writing papers & grants

• All sticks (funders, journals) - no carrots

• Need incentives - treat data as publications and credit creators

“[...] Many of the issues regarding data availability can be addressed if the principles of “publication” rather than “sharing” are applied. However, online data publication systems also need to develop mechanisms for data citation and indices of data access comparable to those for citation systems in print journals”

Costello, M. Motivating Online Publication of Data. BioScience (2009) vol. 59 (5) pp. 418-427

Page 8: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Name ambiguity => attribution challenges

Are these authors all the same person?

• G. Thorisson, University of Leicester

• G. A. Thorisson, University of Leicester

• G. A. Thorisson, Cold Spring Harbor Laboratory

Or these?

• J. Smith, J. Smith, J. Smith, J. Smith, J. Smith [etc.]

∼2/3 of the ∼6 million authors in MEDLINE share a last name and first initial with at least one other author, and an ambiguous name refers to ∼8 persons on average.

Torvik and Smalheiser. Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data (2009) vol. 3 (3)

How about these?
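To make the ambiguity concrete, here is a minimal Python sketch that groups author strings by last name plus first initial, the same key the MEDLINE figure above refers to. The helper names (`name_key`, `ambiguous_groups`) are invented for illustration, and the records are the example names and affiliations used on these slides, not real bibliographic data.

```python
from collections import defaultdict


def name_key(author: str) -> str:
    """Reduce e.g. 'G. A. Thorisson' to 'thorisson g' (last name + first initial)."""
    parts = author.replace(".", " ").split()
    return f"{parts[-1].lower()} {parts[0][0].lower()}"


def ambiguous_groups(records):
    """Return only those keys that are shared by more than one record."""
    groups = defaultdict(list)
    for name, affiliation in records:
        groups[name_key(name)].append((name, affiliation))
    return {key: members for key, members in groups.items() if len(members) > 1}


# Illustrative records taken from the slide examples.
records = [
    ("G. Thorisson", "University of Leicester"),
    ("G. A. Thorisson", "University of Leicester"),
    ("G. A. Thorisson", "Cold Spring Harbor Laboratory"),
    ("J. Smith", "Univ. North Pole"),
    ("J. Smith", "Luthor Corporation"),
]

groups = ambiguous_groups(records)
for key, members in groups.items():
    print(key, "->", len(members), "candidate records")
```

The key alone cannot say whether the three Thorisson records are one person or the two J. Smith records are two people; that is exactly the gap a persistent contributor identifier is meant to fill.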

Page 9: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

ORCID - tackling the contributor identity problem

ORCID ID: B-1242-2010

• G. Thorisson, Univ. Leicester

• G. A. Thorisson, Univ. Leicester

• G. A. Thorisson, Cold Spring Harbor Lab.

ORCID ID: G-1442-2009

• J. Smith, Univ. North Pole

ORCID ID: D-2400-2010

• J. Smith, Luthor Corporation

Page 10: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Projects

Page 11: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Cafe Variome - facilitating exchange of genetic data

1. Diagnostic laboratories

2. Central ‘clearinghouse’

3. End-users (e.g. LSDB curators)

Publish data; retrieve Atom feeds

Submitting mutations from diagnostic labs using “Café RouGE enabled” software via simple button click

Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval
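The feed-based monitoring step can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not the Cafe Variome client: the sample feed, its entry IDs and the helper names (`parse_entries`, `unseen`) are made up, and only the Atom namespace itself is standard.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; everything in SAMPLE_FEED is invented for illustration.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example mutation feed</title>
  <entry>
    <id>urn:example:mutation:1</id>
    <title>Variant in gene X</title>
    <updated>2011-01-21T00:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:mutation:2</id>
    <title>Variant in gene Y</title>
    <updated>2011-02-03T00:00:00Z</updated>
  </entry>
</feed>"""


def parse_entries(feed_xml: str):
    """Extract id, title and updated timestamp from each Atom entry."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "id": entry.findtext("atom:id", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", namespaces=ATOM_NS),
        }
        for entry in root.findall("atom:entry", ATOM_NS)
    ]


def unseen(entries, seen_ids):
    """Keep only entries whose id has not been processed before."""
    return [e for e in entries if e["id"] not in seen_ids]


entries = parse_entries(SAMPLE_FEED)
new_entries = unseen(entries, seen_ids={"urn:example:mutation:1"})
print([e["title"] for e in new_entries])
```

An end-user such as an LSDB curator would poll the feed periodically and keep the set of seen IDs between runs; fetching the feed over HTTP is left out here to keep the sketch self-contained.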

Page 12: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Cafe Variome - facilitating exchange of genetic data

dbSNP (coding), UniProt, PhenCode

Submission from diag. lab

Metadata describing variation data published elsewhere

Data shared with diverse 3rd parties and data usage/citation tracked via DOI

DOI assigned to incoming data upload

Already stable IDs so no DOI assigned

Attribution given to data submitters via ORCID unique identifier

Page 15: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Publication credit for Cafe Variome deposits

G. Thorisson, Univ. [email protected]

ORCID ID: A-883-2010

CV user has linked his user account with his ORCID profile

4x variants in BRCA2 gene in patient X

G. A. Thorisson (A-883-2010). 4x variants in BRCA2 gene. Published online via Cafe Variome. 21 January (2011) doi:10.1255/caferouge.BRCA2-2352354

=> http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354
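As a rough sketch of how such a citation line could be generated from a deposit record, the Python below reproduces the example citation on this slide. The `Deposit` fields and `format_citation` are hypothetical names chosen for illustration; the slides do not specify a record format.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    """Hypothetical deposit record; field names are assumptions, not a real schema."""
    author: str
    orcid_id: str
    title: str
    publisher: str
    day_month: str
    year: int
    doi: str


def format_citation(d: Deposit) -> str:
    """Render a deposit in the citation style shown on the slide."""
    return (
        f"{d.author} ({d.orcid_id}). {d.title}. "
        f"Published online via {d.publisher}. {d.day_month} ({d.year}) doi:{d.doi}"
    )


# Values copied from the example citation on the slide.
deposit = Deposit(
    author="G. A. Thorisson",
    orcid_id="A-883-2010",
    title="4x variants in BRCA2 gene",
    publisher="Cafe Variome",
    day_month="21 January",
    year=2011,
    doi="10.1255/caferouge.BRCA2-2352354",
)

citation = format_citation(deposit)
print(citation)
```

Because the contributor ID travels with the citation, the deposit can later be attributed without relying on the author's name string.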

Page 16: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

GWAS nanopublications

• Foray into semantic publishing

– GWAS Central as ‘nano-publisher’

– variant<->disease assertion as nanopub

rs19243 <associatedWith> Type II diabetes + condition & provenance

• Provenance part to include:

– Contributor IDs

– Contributor roles:

• Author(s) on original GWAS paper

• Curator

• Registrant

• Citability: register DOI for nanopub?
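One possible in-memory shape for such a nanopublication is sketched below in Python: an assertion, a condition, and a provenance part listing contributor IDs with roles. All class and field names are assumptions made for illustration, the condition is a placeholder, and the pairing of the example IDs with roles is invented; a real nanopub would be expressed in RDF.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Contributor:
    contributor_id: str  # e.g. an ORCID-style identifier
    role: str            # "author", "curator" or "registrant"


@dataclass
class Nanopub:
    assertion: Assertion
    condition: str
    contributors: List[Contributor] = field(default_factory=list)
    doi: Optional[str] = None  # left empty: whether to register a DOI is an open question

    def ids_with_role(self, role: str) -> List[str]:
        """Contributor IDs that hold the given role in the provenance part."""
        return [c.contributor_id for c in self.contributors if c.role == role]


# Assertion from the slide; contributor/role pairings are hypothetical.
nanopub = Nanopub(
    assertion=Assertion("rs19243", "associatedWith", "Type II diabetes"),
    condition="(condition placeholder)",
    contributors=[
        Contributor("G-1442-2009", "author"),
        Contributor("A-883-2010", "curator"),
    ],
)

print(nanopub.ids_with_role("curator"))
```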

Page 17: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

BRIF - measuring bioresource use and impact

• Biobanks: collections of biomaterials + associated metadata

– Identification: citing, acknowledging, tracking use of

– Evaluation: assess impact

– Attribution: crediting PIs, repository managers, technicians [?]

• Digital resources, incl. biomedical databases

– E.g. locus-specific databases (LSDBs), variation archives (e.g. Cafe Variome)

– How to acknowledge researchers who:

• Maintain vital community resource (e.g. http://www.wormbase.org )

• Undertake value-adding curation

– Micro-attribution: Giardine, B. et al. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nature Genetics, advance online publication (2011). http://dx.doi.org/10.1038/ng.785

• BRIF online group: http://bit.ly/brif-group

Page 21: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Identifying & citing databases

• Bio-databases are often cited as a collection

– E.g. “In our analysis, we used release X of SwissProt”; “Our results were compared with the COL3A1 database as of Jan11”

– Example: OI variant database:

Dalgleish, R. (1998) Nucleic Acids Research 26(1), 253 http://dx.doi.org/10.1093/nar/26.1.253

Dalgleish, R. (1997) Nucleic Acids Research 25(1), 181 http://dx.doi.org/10.1093/nar/25.1.181

Osteogenesis Imperfecta Variant Database - https://oi.gene.le.ac.uk

• Are DOIs appropriate? - db’s are not ‘unchanging entities’

• Minimal information about a database - include DOI name?

– What does the DOI point to? URL for database site vs. URL for db description

Page 24: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Acknowledging contributions to bio-resources

• Database curation

– Overall mgmt/responsibility: R. Dalgleish A-3523-534-144 <maintained> 10.5335/lsdb.oi.325dff

– Temporary curator appointment: J. Smith G-1442-2009 <curated> 10.5335/lsdb.oi.325dff

– Microattribution: fine-grained tracking of curator activity (insert/update/delete)

– [see also GBIF presentation]

• Biobanking activities

– Principal Investigator responsible for project (aka ‘corresponding author’)

– Laboratory personnel?

– Clinical collaborators?
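The microattribution idea, fine-grained tracking of curator activity, can be illustrated with a small tally of insert/update/delete events per contributor ID. This Python sketch is an assumption about what such a log might look like; the event list is invented and the function name `tally_curation` is hypothetical.

```python
from collections import Counter, defaultdict

ALLOWED_ACTIONS = {"insert", "update", "delete"}


def tally_curation(events):
    """Count curation actions per contributor ID.

    `events` is an iterable of (contributor_id, action) pairs.
    """
    counts = defaultdict(Counter)
    for contributor_id, action in events:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown curation action: {action!r}")
        counts[contributor_id][action] += 1
    return counts


# Invented event log using the example IDs from the slide.
events = [
    ("A-3523-534-144", "insert"),
    ("G-1442-2009", "update"),
    ("G-1442-2009", "update"),
    ("G-1442-2009", "delete"),
]

activity = tally_curation(events)
for contributor_id, actions in activity.items():
    print(contributor_id, dict(actions))
```

Keyed by a persistent contributor ID rather than a name, such counts could be credited to the right curator even after a temporary appointment ends.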

Page 27: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Characterizing citations and contributions

• What is the nature of the resource citation?

– acknowledgement / earlier or related work

– reused data or materials

– extended methodology

– ‘..this study is flawed and complete rubbish!!’

• What is the nature of my contribution to the resource?

– Paper: authored / undertook analysis / conceived of study / designed experiment

– Dataset: created / submitted / managed

– Database: curator / manager / PI responsible

– Biobank: sample collector / day-to-day manager / ??

– Temporal aspect:

• E.g. Mummi contributed in a curator role for SwissProt Jun 2004 to Oct 2009
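The temporal aspect can be captured by attaching a start and end date to each role. The Python sketch below is one hypothetical way to model this; the `Contribution` class is invented, and because the slide gives only months, the exact days (1 June 2004, 31 October 2009) are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Contribution:
    contributor: str
    resource: str
    role: str
    start: date
    end: Optional[date] = None  # None means the role is still held

    def active_on(self, day: date) -> bool:
        """True if the contributor held this role on the given day."""
        return self.start <= day and (self.end is None or day <= self.end)


# Example from the slide; day-of-month values are assumed.
mummi = Contribution(
    contributor="Mummi",
    resource="SwissProt",
    role="curator",
    start=date(2004, 6, 1),
    end=date(2009, 10, 31),
)

print(mummi.active_on(date(2007, 1, 1)))
print(mummi.active_on(date(2010, 1, 1)))
```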

Page 29: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

my study <cito:extends> Thorisson et al. 2008 doi:10.433/888544jamaX

my study <cito:usesSamplesFrom> Biobank X doi:10.424/35xxjapan.5 ??

G. Thorisson (A-523-44-3423) <pro:manager> Biobank X doi:10.424/35xxjapan??
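The three statements above follow a subject, typed relation, object pattern. As a minimal sketch, the Python below parses lines of that shape into triples and filters them by vocabulary prefix. The parser and its function names are invented for illustration; it handles only the informal notation on this slide, not RDF, and the trailing "??" marks from the slide are omitted from the input.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_statement(line: str) -> Statement:
    """Split 'subject <prefix:relation> object' into its three parts."""
    subject, rest = line.split("<", 1)
    predicate, obj = rest.split(">", 1)
    return Statement(subject.strip(), predicate.strip(), obj.strip())


def by_vocabulary(statements: List[Statement], prefix: str) -> List[Statement]:
    """Select statements whose relation comes from the given vocabulary prefix."""
    return [s for s in statements if s.predicate.startswith(prefix + ":")]


lines = [
    "my study <cito:extends> Thorisson et al. 2008 doi:10.433/888544jamaX",
    "my study <cito:usesSamplesFrom> Biobank X doi:10.424/35xxjapan.5",
    "G. Thorisson (A-523-44-3423) <pro:manager> Biobank X doi:10.424/35xxjapan",
]

statements = [parse_statement(line) for line in lines]
citations = by_vocabulary(statements, "cito")      # how a work cites a resource
contributions = by_vocabulary(statements, "pro")   # who did what for a resource
print(len(citations), len(contributions))
```

Separating citation-type relations from role-type relations mirrors the two questions on the previous slide: the nature of the citation and the nature of the contribution.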

Shotton, D., 2010. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1). doi:10.1186/2041-1480-1-S1-S6

Page 30: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

ORCID and contributor recognition

Page 31: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

• Who contributed to dataset 10.4259/psycho.5gtpq-thorisson?

• All data publications by A-883-2010 ?

• Which papers have cited the works of A-883-2010 ?

• Total no. citations to datasets by A-883-2010 in the last 2 years?

• Total no. downloads of datasets by A-883-2010?

• Which database projects has A-883-2010 contributed to?

• [...]

G. Thorisson, Univ. [email protected]

ORCID ID: A-883-2010

Why track all this stuff?

Enable aggregation of contributions by unique researcher ID
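A toy version of this aggregation is sketched below in Python. The record layout, the second and third DOIs, and all citation and download counts are made up purely for illustration; only the first DOI and the researcher IDs come from the slides.

```python
def summarize(records, researcher_id):
    """Aggregate datasets, citations and downloads for one researcher ID."""
    own = [r for r in records if researcher_id in r["contributors"]]
    return {
        "datasets": [r["doi"] for r in own],
        "citations": sum(r["citations"] for r in own),
        "downloads": sum(r["downloads"] for r in own),
    }


# Invented records; the counts are placeholders, not measured values.
records = [
    {"doi": "10.4259/psycho.5gtpq-thorisson",
     "contributors": ["A-883-2010", "G-1442-2009"],
     "citations": 3, "downloads": 40},
    {"doi": "10.0000/example.dataset-b",
     "contributors": ["A-883-2010"],
     "citations": 1, "downloads": 10},
    {"doi": "10.0000/example.dataset-c",
     "contributors": ["G-1442-2009"],
     "citations": 5, "downloads": 7},
]

summary = summarize(records, "A-883-2010")
print(summary)
```

Each of the questions listed on this slide becomes a simple lookup or sum once every record carries contributor IDs instead of name strings.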

Page 33: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Current ORCID status & timeline

• Alpha prototype

– Running on a sandbox website for limited testing

• partial functionality - based on ResearcherID software

• Early adopters / collaborators

• Looking to collaborate with projects

– Gather use cases => feed requirements for ORCID core system

– WHERE/HOW might ORCID be used to identify contributors?

– Joint fund-seeking to do pilot implementations

• Timeline for live beta system: early 2012

Page 34: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Example: SageCite?

• i) dataset published in SageCommons

– assigned DOI via DataCite

– attribution link deposited in ORCID

• ii) derivative datasets published in SageCommons

– assigned DOI => DataCite

– attribution link deposited in ORCID

• iii) analysis workflow published via myExperiment

– attribution => ORCID (creator/submitter & others who contributed)

– DOI (or not - not essential?)

Page 35: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

<Shameless_plug>

Page 37: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

</Shameless_plug>

Page 38: Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing

Acknowledgements

GEN2PHEN Consortium

http://www.gen2phen.org/about-gen2phen/partners

Prof Anthony J. Brookes

Bioinformatics Group

This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.

Contact me!

Gudmundur ‘Mummi’ Thorisson

<[email protected]> | <[email protected]>

http://friendfeed.com/mummi

http://www.linkedin.com/in/mummi

http://www.twitter.com/gthorisson
