TRANSCRIPT
NISO Working Group Connection LIVE! Research Data Metrics Landscape:
An update from the NISO Altmetrics Working Group B: Output Types & Identifiers
Monday, November 16, 2015
Presenters:
Kristi Holmes, PhD, Director, Galter Health Sciences Library, Northwestern University
Mike Taylor, Senior Product Manager, Informetrics, Elsevier
Philippe Rocca-Serra, Ph.D., Technical Project Leader, Oxford
Tom Demeranville, THOR Senior Project Officer & ORCID Software Engineer
Martin Fenner, Technical Director, DataCite
Dr. Sarah Callaghan, Senior Researcher and Project Manager, British Atmospheric Data Centre
Dr. Melissa Haendel, Associate Professor, Ontology Development Group, OHSU Library, Dept of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University
http://www.niso.org/news/events/2015/wg_connections_live/altmetrics_wgb/
Thank you!
Data-Level Metrics
Martin Fenner, DataCite Technical Director
http://orcid.org/0000-0003-1419-2405
Project Partners: California Digital Library, PLOS, DataONE
National Science Foundation Grant 1448821 http://www.nsf.gov/awardsearch/showAward?AWD_ID=1448821
Project Page: http://mdc.lagotto.io
Making Data Count
MDC Team: Stephen Abrams, Matt Jones, Peter Slaughter, John Kratz, Dave Vieglais, Jennifer Lin, John Chodacki, Patricia Cruse, Martin Fenner, Kristen Ratan, Carly Strasser
Project ends February 29, 2016
Goals
What metrics for research data do researchers and data managers want?
Do data repositories make these metrics available?
If not, build services to collect these metrics for DataONE repository network
How interested would you be to know each of the following about the impact of your data?
http://doi.org/10.1038/sdata.2015.39
http://dx.doi.org/10.5060/D8H59D
What metrics/statistics does your repository currently track and expose?
http://doi.org/10.1038/sdata.2015.39
http://dx.doi.org/10.5060/D8H59D
Citations
Metadata of datasets
https://search.labs.datacite.org/?q=10.5061%2FDRYAD.KG943
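As a minimal sketch of how such dataset metadata can be retrieved programmatically, the snippet below uses DOI content negotiation to ask doi.org for the DataCite JSON representation of the Dryad DOI from the search URL above. The media type and field names are assumptions to verify against the current DataCite documentation.

```python
# Sketch: retrieve DataCite metadata for a dataset DOI via DOI content
# negotiation. The DOI is the Dryad dataset from the search example above;
# the requested media type is the DataCite JSON representation.
import requests

doi = "10.5061/dryad.kg943"
resp = requests.get(
    "https://doi.org/" + doi,
    headers={"Accept": "application/vnd.datacite.datacite+json"},
    timeout=30,
)
resp.raise_for_status()
metadata = resp.json()

# Print a few common DataCite fields (names may vary across schema versions).
print(metadata.get("titles"))
print(metadata.get("publicationYear"))
```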
Metadata of articles
References are part of the metadata deposited to CrossRef
Cited-by service aggregates these citations for CrossRef DOIs
Work is underway to exchange DOI <-> DOI links between CrossRef and DataCite
https://cls.labs.datacite.org
https://det.crossref.org
DOI <-> DOI links are stored outside of the DataCite and CrossRef Metadata Stores
Fulltext search
http://dlm.labs.datacite.org/works/http://doi.org/10.5061/dryad.f1cb2
Second order events
http://dlm.labs.datacite.org/sources/pmceurope
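The endpoints above belong to a Lagotto instance, so a client can, in principle, ask it for all events gathered for one dataset DOI. This sketch assumes the standard Lagotto /api/works endpoint and its ids parameter; treat the path, the parameter, and the response fields as best-effort guesses from the Lagotto documentation of the time.

```python
# Sketch: ask a Lagotto instance (here the Making Data Count deployment)
# which events it has collected for one dataset DOI.
import requests

resp = requests.get(
    "http://dlm.labs.datacite.org/api/works",
    params={"ids": "http://doi.org/10.5061/dryad.f1cb2"},
    timeout=30,
)
resp.raise_for_status()
for work in resp.json().get("works", []):
    # "events" is expected to hold per-source counts (views, citations, ...)
    print(work.get("DOI"), work.get("events"))
```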
Downloads
Usage Stats
aggregate DataONE usage log files from DataONE member nodes
parse logs, applying COUNTER rules (a filtering sketch follows below):
• double-click intervals
• whitelist user agents
two versions of usage stats:
• COUNTER-compliant
• partially compliant (includes some bots)
Average % of log entries not filtered out:
since 2005: COUNTER 63.57%, Partial 63.59%
this past year: COUNTER 44.88%, Partial 47.05%
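A minimal sketch of the two COUNTER-style filters described above: dropping known robots by user agent and collapsing repeat requests inside a double-click window. The 30-second window and the robot list here are illustrative placeholders; the COUNTER Code of Practice defines the real values.

```python
# Sketch of COUNTER-style usage-log filtering: (1) drop known robots via a
# user-agent list, and (2) collapse repeat requests for the same dataset
# from the same client inside a 30-second "double-click" window.
from datetime import timedelta

ROBOT_AGENTS = ("googlebot", "bingbot", "crawler", "spider")  # illustrative
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)                   # illustrative

def countable_downloads(records):
    """records: iterable of dicts with 'ip', 'user_agent', 'dataset', and
    'time' (datetime), sorted by time. Yields records surviving both filters."""
    last_seen = {}  # (ip, dataset) -> time of last counted download
    for rec in records:
        agent = rec["user_agent"].lower()
        if any(bot in agent for bot in ROBOT_AGENTS):
            continue  # robot filter
        key = (rec["ip"], rec["dataset"])
        prev = last_seen.get(key)
        if prev is not None and rec["time"] - prev < DOUBLE_CLICK_WINDOW:
            continue  # double-click filter
        last_seen[key] = rec["time"]
        yield rec
```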
Future Work
• Collect data citations from CrossRef
• Analyze usage statistics in more detail and provide input to COUNTER and NISO
• Analyze network graph, e.g. linked datasets and second order citations
• Turn research project into service, including integration of client applications for search and reporting
Introducing the Metadata Model v1
Philippe Rocca-Serra, PhD, University of Oxford e-Research Centre
on behalf of the WG3 Metadata Working Group
Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego
A trans-NIH funding initiative established to enable biomedical research as a digital research enterprise
• Facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable -> a catalog to enable researchers to find and cite research datasets
• Conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data -> ease the use of community standards to annotate datasets
Lucila Ohno-Machado (PI), Jeff Grethe
Pilot applications that ‘dock’ with the prototype and community-driven activities via Working Groups:
1. BD2K Centers of Excellence Collaboration
2. Data Identifiers Recommendation
3. Metadata Specifications
4. Use Cases and Testing Benchmarks
5. Dataset Citation Metrics
6. Criteria for Being Included in the DDI
7. Machine Actionable Licenses
8. Ranking Algorithm
9. End User Evaluation Criteria
10. Repository Collaboration
11. Outreach Meeting: Repository Operators
12. Standard-driven Curation Best Practices
13. Evaluation of Harvesting and NLP Pilot Projects
All this by August 2017!
Joint effort with BD2K Center for Expanded Data Annotation and Retrieval (CEDAR)
Synergies with BD2K cross-centers Metadata WG (co-chaired by M Musen/CEDAR, G Alter/bioCADDIE) and ELIXIR activities
WG3 Metadata - Goals
Define a set of metadata specifications that support the intended capability of the Data Discovery Index prototype - being designed by the bioCADDIE Core Development Team - as outlined in the White Paper
Core metadata, designed to be future-proofed for progressive extensions (phase 1: May-July 2015), followed by a test and implementation phase
Domain specific metadata for more specialized data types (phase 2)
Use cases and the competency questions have been used throughout the process to define the appropriate boundaries and level of granularity: which queries will be answered in full, which only partially, and which are out of scope.
WG3 Metadata – work to date
With contributions and comments from several WG3 members and colleagues, in particular: Joan Starr, George Alter, Ian Fore, Kevin Read, Stian Soiland-Reyes, Muhammad Amith, Michel Dumontier…
Contains lists of material reviewed:
• data discovery initiatives and metadata initiatives
• existing meta-models for representing metadata elements
Outlines the approach used to identify metadata descriptors:
• via use cases and competency questions (top-down approach)
• by mapping generic and life science-specific metadata schemas (bottom-up approach), listed in the BioSharing collection for bioCADDIE
The results of both approaches have been compared and converged on the core set of metadata.
Standard Operating Procedure (SOP)
List of Metadata Schema considered
• schema.org
• datacite
• hcls dataset descriptors
• biosample
• geo miniml
• prideml
• isatab/magetab
• ga4gh metadata schema
• sra xml
• bioproject
• cdisc sdm / element of bridge model
Bottom-up approach: survey of existing models
Selected competency questions (a representative set from the use cases workshop, the white paper, community submissions, and from Phil Bourne) have been abstracted, and key metadata elements have been highlighted, color-coded and categorized. As the set of core and extended metadata elements is defined, it will become clearer which questions the Data Discovery Index will be able to answer in full and which only in part.
Use Cases and Derived Metadata
Processing use cases
All use cases on equal footing
Term binning: Material, Process, Information, Property
Relation identification
Core metadata elements and initial model
The combined approaches have delivered a set of core metadata elements; progressively these will/could be extended to domain-specific ones in phase two, as needed.
We aim for maximum coverage of use cases with a minimal number of data elements, but we do foresee that not all questions can be answered in full.
Initial Set of Metadata Elements
Everything is on GitHub
Formal specifications: metadata schema in JSON (see the validation sketch below)
• https://github.com/biocaddie/WG3-MetadataSpecifications/tree/master/json-schemas
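As a sketch of how the published JSON schemas can be put to work, the snippet below validates a metadata record against one of them with the Python jsonschema library. The schema filename and the example record fields are hypothetical; the real required fields are whatever the schemas in the repository define.

```python
# Sketch: validate a minimal dataset description against one of the WG3
# JSON schemas downloaded from the GitHub repository above.
import json
from jsonschema import validate, ValidationError

with open("dataset_schema.json") as f:  # hypothetical local copy of a schema
    schema = json.load(f)

record = {
    "title": "RNA-seq of mouse liver",       # hypothetical example values
    "identifier": "10.5061/dryad.example",
}

try:
    validate(instance=record, schema=schema)
    print("record conforms to the schema")
except ValidationError as err:
    print("validation failed:", err.message)
```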
What’s next?
With this work, phase 1 has been completed and we have entered the evaluation phase:
• the model will be implemented and tested by the bioCADDIE Development Team with a number of data sources
• the results will inform the activities in phase 2, where the metadata elements and the model may be revised, simplified and/or enriched, as needed
Take Home Message
• primary goal: provide a general-purpose metadata schema to allow harvesting of key experimental and data descriptors from a variety of resources and enable indexing to support data discovery
– relations between authors, datasets, publications and funding sources
– nature of biological signal, nature of perturbation, …
Outstanding issues
• prioritizing the use cases
• defining mechanisms to deal with domain-specific, granular data
• moving into phase 2 and devising data ingesters
– ETL activities
– interact with other modeling efforts
• incorporate feedback from users and developers
Question Time
orcid.org
Contact Info: p. +1-301-922-9062 a. 10411 Motor City Drive, Suite 750, Bethesda, MD 20817 USA
ORCID, Metrics and Project THOR
Tom Demeranville, Senior Technical Officer – Project THOR
NISO Webinar, November 2015
What is ORCID?
• ORCID is an infrastructure that provides unique person identifiers.
• ORCID is a hub for linking identifiers for people with their activities.
• ORCID is researcher-centric, with 1.7 million registered identifiers.
• ORCID records are managed by the researchers themselves.
• ORCID is open source, community governed and non-profit.
• ORCID has a public API that allows querying of non-private data (see the sketch below).
• ORCID has a member API that enables updating and notifications.
• ORCID IDs are associated with over 4 million unique DOIs.
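As an illustration of the public API mentioned in the list above, here is a minimal sketch that lists the works on a public ORCID record. It assumes the v1.2 message format that was current in late 2015; the endpoint, Accept header, and JSON paths are best-effort and should be checked against the API documentation for whichever version is current.

```python
# Sketch: list the works on a public ORCID record via the public API
# (v1.2 message format assumed).
import requests

orcid_id = "0000-0003-1419-2405"  # Martin Fenner's iD, from an earlier slide
resp = requests.get(
    "https://pub.orcid.org/v1.2/%s/orcid-works" % orcid_id,
    headers={"Accept": "application/orcid+json"},
    timeout=30,
)
resp.raise_for_status()
profile = resp.json()

# Walk the nested v1.2 message structure down to the list of works.
works = (profile.get("orcid-profile", {})
                .get("orcid-activities", {})
                .get("orcid-works", {})
                .get("orcid-work", []))
for work in works:
    title = work.get("work-title", {}).get("title", {}).get("value")
    print(title)
```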
347 members, 4 national consortia, over 200 integrations
[Chart: membership by sector: research institution 68%, publisher 12%, funder 5%, association 6%, repository/MEA 3%, plus an unlabelled 9% segment]
[Chart: membership by region: Europe 58%, Latin America 1%, North America 26%, Pacific 7%, Asia 5%]
What ORCID isn’t
ORCID is not a CRIS system. ORCID is not a researcher profile system. ORCID is not a research activity metadata store.
Research outputs
• ORCID includes links to publications, patents, datasets, software and more.
• ORCID uses the CASRAI Output vocabulary for work types
• ORCID references over 20 other output identifiers (more are being added!)
Other researcher activities
• Peer review
• Education
• Employment
ORCID and Metrics
ORCID doesn’t track metrics – it’s not our focus
ORCID is an enabling infrastructure. ORCID improves the robustness of metrics.
ORCID and Metrics
• ORCID improves the quality of research information and makes gathering it and disseminating it easier.
• Other services use ORCID IDs to improve their data
• ORCID IDs are found in DOI metadata, funder systems, publishers, CRIS systems, national reporting frameworks and more
• Institutions can discover researcher-curated standard and non-standard outputs or be notified when they are added
Project THOR
http://project-thor.eu
EC-funded H2020 2.5-year project
Establish seamless integration between articles, data, and researchers across the research lifecycle
Make persistent identifier use for people and research artefacts the default
Both human and technical in scope
What THOR are up to
Research - Deciding what needs to be done
Integration - Doing what needs to be done
Outreach - Getting others involved
Sustainability - Making sure it lasts
Organisation identifiers
Organisation identifiers are important for all areas of scholarly communication, including metrics.
The organisation identifier landscape is fragmented. There are gaps.
It’s a hard problem. Everyone knows this.
Organisation identifiers
Community driven consensus on requirements is needed.
We need a way forward.
THOR will help by convening meetings with all interested parties in the community, including research institutions, funders, datacentres, publishers, standards bodies, existing organisation identifier and other identifier providers.
Bibliometrics for Data – what counts and what doesn’t?
Sarah Callaghan, [email protected]
@sorcha_ni
NISO Working Group Connections LIVE! Research Data Metrics Landscape:
An update from the NISO Altmetrics Working Group B: Output Types & Identifiers
Monday, November 16 from 11:00 a.m. - 1:00 p.m. (ET)
The UK’s Natural Environment Research Council (NERC) funds six data centres which between them have responsibility for the long-term management of NERC's environmental data holdings.
We deal with a variety of environmental measurements, along with the results of model simulations in:
• Atmospheric science
• Earth sciences
• Earth observation
• Marine Science
• Polar Science
• Terrestrial & freshwater science, Hydrology and Bioinformatics
• Space Weather
Who are we and why do we care about data?
Data, Reproducibility and Science
Science should be reproducible – other people doing the same experiments in the same way should get the same results.
Observational data is not reproducible (unless you have a time machine!)
Therefore we need to have access to the data to confirm the science is valid!
http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
It used to be “easy”…
Suber cells and mimosa leaves. Robert Hooke, Micrographia, 1665
The Scientific Papers of William Parsons, Third Earl of Rosse 1800-1867
…but datasets have gotten so big, it’s not useful to publish them in hard copy anymore
Hard copy of the Human Genome at the Wellcome Collection
Creating a dataset is hard work!
"Piled Higher and Deeper" by Jorge Chamwww.phdcomics.com
Managing and archiving data so that it’s understandable by other researchers is difficult and time consuming too.
We want to reward researchers for putting that effort in!
Most people have an idea of what a publication is
Some examples of data (just from the Earth Sciences)
1. Time series, some still being updated e.g. meteorological measurements
2. Large 4D synthesised datasets, e.g. Climate, Oceanographic, Hydrological and Numerical Weather Prediction model data generated on a supercomputer
3. 2D scans e.g. satellite data, weather radar data
4. 2D snapshots, e.g. cloud camera
5. Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature
6. Datasets consisting of data from multiple instruments as part of the same measurement campaign
7. Physical samples, e.g. fossils
What is a Dataset?
DataCite’s definition (http://www.datacite.org/sites/default/files/Business_Models_Principles_v1.0.pdf):
Dataset: "Recorded information, regardless of the form or medium on which it may be recorded including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data."
(from the U.S. National Institutes of Health (NIH) Grants Policy Statement via DataCite's Best Practice Guide for Data Citation).
In my opinion a dataset is something that is:
• The result of a defined process
• Scientifically meaningful
• Well-defined (i.e. a clear definition of what is in the dataset and what isn’t)
What metrics do we use for our data?
Metric | Breakdown | CEDA numbers | Notes
Number of discovery dataset records in the DCS | Quarterly | NEODC 26, BADC 242, UKSSDC 11 | Compliance with NERC data management policy. Reflects how many data sets NERC has. The number of dataset discovery records visible from the NERC data discovery service.
Web site visits | Quarterly | BADC: 61,600; NEODC: 10,200 | Active use and visibility of the data centre. Site visits from standard web log analysis systems, such as webaliser. Sensible web crawler filters should have been applied.
Web site page views | Quarterly | BADC: 219,900; NEODC: 25,800 | See web visits notes.
Queries closed this period | Quarterly | 362 helpdesk queries; 838 dataset applications | Active use and visibility of the data centre. Queries marked as resolved within the quarter. A query is a request for information, a problem or ad hoc data request.
Queries received in period | Quarterly | 388 helpdesk queries; 860 dataset applications | Active use and visibility of the data centre. See closed query notes.
Data centre metrics – produced 15th July 2014
Metric | Breakdown | CEDA numbers | Notes
Percent queries dealt with in 3 working days | Quarterly | Helpdesk: 84.06% (11.57% resolved after 3 days); Dataset applications: 87.67% (10.23% resolved after 3 days). Initial response within 1 working day: Helpdesk 93.57%, Dataset applications 97.91% | Responsiveness. See closed query notes.
Identifiable users actively downloading | None | Over year to date: BADC: 4,065; NEODC: 362 | Use and visibility of the data centre. An estimate of the number of users using data access services over the year.
Number of metadata records in data centre web site | None | BADC: 240; NEODC: 33 | INSPIRE compliance. Reflects how many data sets NERC has.
Number of datasets available to view via the data centre web site | None | (Metric in development) | INSPIRE compliance. Usable services.
Number of datasets available to download via the data centre web site | None | (Metric in development) | INSPIRE compliance. Usable services.
Data centre metrics – produced 15th July 2014
Metric | Breakdown | CEDA numbers | Notes
NERC funded Data centre staff (FTE) | None | 14 (estimate for FY 14/15) | Data management costs. Efficiency. Number of full time equivalent posts employed to perform data centre functions.
Direct costs of Data Stewardship in data centre | None | (reportable at end of financial year) | Data management costs. Efficiency. Cost to NERC.
Capital Expenditure directly related to Data Stewardship at data centre | None | (reportable at end of financial year) | Data management costs. Efficiency.
Direct Receipts from Data Licenses and Sales | None | £0 (CEDA does not charge for data) | Commercial value of data products and services.
Number of projects with Outline Data Management Plans | None | (Metric in development) | Means of tracking projects’ adoption of good DM practice. Outline DMP is at proposal stage.
Number of projects with Full Data Management Plans | None | (Metric in development) | Means of tracking projects’ adoption of good DM practice. Full DMP is at funded stage.
Users by area | UK 2534 (61%); Europe 494 (12%); Rest of the world 1024 (25%); Unknown 79 (2%) | Active use. Visibility of the data centre internationally. Percentage of user base in terms of geographical spread.
Users by institute type | University 2934 (71%); Government 694 (17%); NERC 160 (4%); Other 277 (7%); Commercial 42 (1%); School 35 (1%) | Active use. Visibility of the data centre sectorially. Percentage of user base in terms of the users’ host institute type.
Short answer:
We don’t know!!
Unless the data user comes back to us to tell us, or we stumble across a paper which:
• cites us
• or mentions us in a way that we can find
• and tells us what dataset the authors used.
This is why we’re working with other groups (like CODATA, Force11, RDA, DataCite, Thomson Reuters, …) to promote data citation.
After the data is downloaded, what happens then?
How we (NERC) cite data
We use digital object identifiers (DOIs) as part of our dataset citation because:
• They are actionable, interoperable, persistent links for (digital) objects
• Scientists are already used to citing papers using DOIs (and they trust them)
• Academic journal publishers are starting to require datasets be cited in a stable way, i.e. using DOIs.
• We have a good working relationship with the British Library and DataCite
NERC’s guidance on citing data and assigning DOIs can be found at: http://www.nerc.ac.uk/research/sites/data/doi.asp
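A quick sketch of what "actionable, persistent links" means in practice: resolving a dataset DOI through doi.org and following the redirect to the repository's landing page. The DOI is the one cited earlier in this deck; note that some landing pages reject HEAD requests, in which case a GET works the same way.

```python
# Sketch: confirm that a dataset DOI resolves and see where its landing
# page lives, by following the doi.org redirect.
import requests

doi = "10.5060/D8H59D"  # dataset DOI cited earlier in this deck
resp = requests.head("https://doi.org/" + doi, allow_redirects=True, timeout=30)
print(resp.status_code)   # 200 if the landing page is reachable
print(resp.url)           # the repository's landing page for the dataset
```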
Dataset catalogue page (and DOI landing page)
Dataset citation
Clickable link to Dataset in the archive
Another example of a cited dataset
Data metrics – the state of the art!
Data citation isn’t common practice (unfortunately)
Data citation counts don’t exist yet
To count how often BADC data is used we have to:
1. Search Google Scholar for “BADC”, “British Atmospheric Data Centre”
2. Scan the results and weed out false positives
3. Read the papers to figure out what datasets the authors are talking about (if we can)
4. Count the mentions and citations (if any)
http://www.lol-cat.org/little-lovely-lolcat-and-big-work/
We’re working with DataCite and Thomson Reuters to get data citation counts.
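A small sketch of what step 4 of the manual workflow above might look like once candidate papers have been retrieved as plain text: counting data centre mentions and candidate data citations with simple patterns. The DOI prefix used here is an assumption about NERC data centre DOIs, and real matching still needs the human curation the slides describe.

```python
# Sketch: tally mentions of the data centre and candidate data citations
# in a collection of paper full texts retrieved elsewhere.
import re

MENTION = re.compile(r"British Atmospheric Data Centre|\bBADC\b")
DATA_DOI = re.compile(r"\b10\.5285/\S+")  # assumed NERC data centre DOI prefix

def tally(paper_texts):
    """paper_texts: iterable of plain-text papers. Returns (mentions, citations)."""
    mentions, citations = 0, 0
    for text in paper_texts:
        if MENTION.search(text):
            mentions += 1   # paper mentions the data centre by name
        if DATA_DOI.search(text):
            citations += 1  # paper contains a candidate dataset DOI
    return mentions, citations
```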
Altmetrics and social media for data?
Mainly focussing on citation as a first step, as it’s most commonly accepted by researchers.
We have a social media presence @CEDAnews
- Mainly used for announcements about service availability
We definitely want ways of showing our funders that we provide a good service to our users and the research community.
And we want to be able to tell our depositors what impact their data has had!
RDA/WDS WG Bibliometrics Survey Results: Mostly Expected
Citations are preferred metrics, downloads next.
Standards are missing. Culture change is needed.
[Chart: “What do you currently use to evaluate the impact of data?” Response options: Nothing; Data citation counts; Downloads; Social media (likes/shares/tweets); Mentions in peer-reviewed papers; Hits in search engines; Mentions in blogs; Bookmarks in Zotero and/or Mendeley; Other (please specify). Counts per option were shown on a 0 to 70 scale.]
[Chart: “Are the methods you use to evaluate impact adequate for your needs?” Yes 31.5%, No 68.5%.]
Other projects in the data metrics space
1. CASRAI data level metrics
2. PLOS Making Data Count
3. NISO altmetrics
4. Jisc Giving Researchers Credit for their Data
Next steps for Bibliometrics for Data WG
Will be based on:
• WG survey results (presented at RDA P4 and P5)
• Spreadsheet of metrics being collected by repositories – still open for contributions! http://bit.ly/1MpyW4K
• Shared results from other projects – understanding the challenges and answering the questions posed in the case statement
• Preliminary analysis of data DOI resolutions
• Supporting and evaluating tools from other projects
• Preliminary guidance for the community – “minimal” rather than “best” practice – get people discussing the issues and coming up with solutions!
Thanks! Any questions?
[email protected] @sorcha_ni
http://citingbytes.blogspot.co.uk/
Image credit: Borepatch http://borepatch.blogspot.com/2010/06/its-not-what-you-dont-know-that-hurts.html
“Publishing research without data is simply advertising, not science” - Graham Steel
http://blog.okfn.org/2013/09/03/publishing-research-without-data-is-simply-advertising-not-science/
Getting (and giving) credit for all that we do
Melissa Haendel
NISO Research Data Metrics Landscape: An update from the NISO Altmetrics Working Group B:
Output Types & Identifiers, 11.16.2015
@ontowonka
What *IS* “success”?
https://goo.gl/b60moX
It’s not always what you see
What is attribution???
Over 1000 authors
Project CRediT
http://projectcredit.net
Many contributions don’t lead to authorship
BD2K co-authorship
D. Eichmann, N. Vasilevsky
20% of key personnel are not adequately profiled using publications
Some contributions are anonymous
Data deposition
Image credit: http://disruptiveviews.com/is-your-data-anonymous-or-just-encrypted/
Anonymous review
The Research Life Cycle
[Diagram: EXPERIMENT, CONSULT, PUBLISH, DATA, FUND]
The Research Life Cycle
[Diagram: EXPERIMENT, CONSULT, PUBLISH, DATA, FUND, shown as a network]
Evidence of meaningful impact:
• Measurement instruments
• Continuing education materials
• Cost-effective intervention
• Change in delivery of healthcare services
• Quality measure guidelines
• Gray literature
• New experimental methods, data models, databases, software tools
• New diagnostic criteria
• New standards of care
• Biological materials, animal models
• Consent documents
• Clinical/practice guidelines
https://becker.wustl.edu/impact-assessment
http://nucats.northwestern.edu/
Diverse outputs. Diverse impacts. Diverse roles. Each a critical component of the research process.
EXAMPLE OUTPUTS related to software:
Outputs: binary redistribution package (installer), algorithm, data analytic software tool, analysis scripts, data cleaning, APIs, codebook (for content analysis), source code, software to make metadata for libraries, archives and museums, program codes (for modeling), commentary in code (thinking of open source: the need to attribute code authors and commentators/enhancers/hackers, who can document what they did and why), computer language (a syntax to describe a set of operations or activities), software patch (set of changes to code to fix bugs, add features, etc.), digital workflow (automated sequence of programs, steps to an outcome), software library (non-stand-alone code that can be incorporated into something larger), software application (computer code that accomplishes something)
Roles: catalog, design, develop, test, hacker, bug finder, software developer, software engineer, developer, programmer, system administrator, execute, document, software package maintainer, project manager, database administrator
Attribution workshop results: >500 scholarly products
Connecting people to their “stuff”
Modeling & implementation
VIVO-ISF: Suite of ontologies that integrates and extends community standards
Credit extends beyond the original contribution
Stacy creates mouse1
Kristi creates mouse2
Karen performs RNAseq analysis on mouse1 and mouse2 to generate dataset3, which she subsequently curates and analyzes
Karen writes publication pmid:12345 about the results of her analysis
Karen explicitly credits Stacy as an author but not Kristi.
Credit is connected
Credit to Stacy is asserted, but credit to Kristi can be inferred
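A minimal sketch of that inference, using plain Python structures rather than the VIVO-ISF/openRIF ontologies themselves: contributions are stored as (agent, role, artifact) triples, artifacts record what they were derived from, and credit propagates back along the derivation chain. All names mirror the slide's example.

```python
# Sketch: infer connected credit from asserted contributions plus
# derivation links between artifacts.
contributions = [
    ("Stacy", "creator", "mouse1"),
    ("Kristi", "creator", "mouse2"),
    ("Karen", "analyst", "dataset3"),
    ("Karen", "author", "pmid:12345"),
]
derived_from = {
    "dataset3": ["mouse1", "mouse2"],
    "pmid:12345": ["dataset3"],
}

def inferred_credit(artifact):
    """Everyone who contributed to this artifact or to anything upstream."""
    credited = {(who, role) for who, role, what in contributions
                if what == artifact}
    for upstream in derived_from.get(artifact, []):
        credited |= inferred_credit(upstream)
    return credited

# Kristi is never asserted as an author of the paper, yet she appears:
print(inferred_credit("pmid:12345"))
```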
Introducing openRIF
The Open Research Information Framework
[Diagram: openRIF encompasses SciENcv, eagle-i, and VIVO-ISF]
Ensuring an openRIF that meets community needs: data entry, discovery, interoperability
A domain-configurable suite of ontologies to enable interoperability across systems
A community of developers, tools, data providers, and end-users
Developing a computable research ecosystem
Research information is scattered amongst:
• research networking tools
• citation databases (e.g., PubMed)
• award databases (e.g., NIH RePORTER)
• curated archives (e.g., GenBank)
• locked up in text (the research literature)
Map the SciENcv data model to VIVO-ISF/openRIF
Enable bi-directional data exchange
Integrate SciENcv and ORCID data into CTSAsearch
CTSAsearch: http://research.icts.uiowa.edu/polyglot/ (David Eichmann)
Thank you!
Join the FORCE11 Attribution Working Group at: https://www.force11.org/group/attributionwg
Join the openRIF listserv at: http://group.openrif.org
Identifying those scholarly outputs
Identifiers for things that are not publications or documents: we need to get beyond thinking about DOIs.