deepcarbon.net
Xiaogang (Marshall) Ma, Yu Chen, Han Wang, John Erickson, Patrick West, Peter Fox
Tetherless World Constellation, Rensselaer Polytechnic Institute
Deep Carbon Virtual Observatory: Leveraging Data Science to Facilitate Earth Science Research
2
Outline
• Deep Carbon Virtual Observatory
• Data Management, Publication and Citation
• Provenance of Research
• Era of Science 2.0
3
Deep Carbon Virtual Observatory
• A vision of the DCVO:
– A conceptual model of the interplay between data, people, publication, instruments, models, organizations, etc.
– Identify, annotate and link all key entities, agents and activities
– A repository for datasets and associated metadata
– Unique and powerful data and metadata visualization for dissemination of information
– Collaboration tools for scientific efforts
– An integrated portal for diverse content and applications
(Fox et al., 2014)
4
Data Management
data work
Image courtesy Randy Glasbergen
5
Data Management Plan
• DCO Open Access and Data Policies
– https://deepcarbon.net/dco/dco-open-access-and-data-policies
• Data Management Plan
– A formal document that outlines what you will do with your data during and after you complete your research
• Resources/Tools to help create DMPs:
– DCC Data Management Plans: http://www.dcc.ac.uk/resources/data-management-plans
– NSF Data Management Plan Requirements: http://www.nsf.gov/eng/general/dmp.jsp
– DMPTool: https://dmptool.org
– DCC DMPOnline: https://dmponline.dcc.ac.uk
6
Data Publication
• Data as first-class products of research
– e.g., NSF bio-sketches can include data publications
Image from j4h.net
http://www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp
(NSF, 2012)
7
“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
“…authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications.”
“…authors must make materials, data, and associated protocols available to readers.”
“…it is a condition of publication that authors make available the data and research materials supporting the results in the article.”
“…require authors to make all data underlying the findings described in their manuscript fully available without restriction…”
“Earth and space science data should be widely accessible in multiple formats and long‐term preservation of data is an integral responsibility of scientists and sponsoring institutions.”
“…support the principle that research data should be made freely available to all researchers…”
“…recommends depositing data that correspond to journal articles in reliable data repositories…”
8
• Ways of data publication
– Data as supplemental material of a paper
– Standalone data
– Data paper: data + descriptive ‘data paper’
(Strasser, 2014)
Examples:
• Standalone data journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives, Data in Brief
• Journals that publish data papers: Earth and Space Science, GigaScience, F1000 Research, Internet Archaeology
9
What does a DCO data publication look like?
10
11
Image from Internet; Anonymous author
12
Data Citation
• Data Citation Index
– Indexes the world's leading data repositories
– Connects datasets to related refereed literature indexed in the Web of Science™
– Provides efficient access to data across subjects and regions
13
Data interoperability
Ma et al., Nature Geoscience (2011)
Interoperable: “Data should be discoverable, accessible, decodable, understandable and usable, and data sharing should be legal and ethical for all participants.”
Original image from: http://ehna.org
14
Provenance of research
• Provenance documentation
– Linking a range of observations and model outputs, research activities, people and organizations involved in the production of scientific findings with the supporting data sets and methods used to generate them
Provenance enables the traceability, reproducibility, explanation, verification, and validation of scientific findings.
Image from nature.com
Ma et al., Nature Climate Change (2014)
We extended the IPython Notebook to enable automatic provenance capture during a scientific workflow
• IPython Notebook: A web-based interactive computational environment
(Di Stefano et al., 2014)
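The idea of automatic provenance capture can be sketched in plain Python. This is not the IPython Notebook extension described above; it is a minimal illustration under stated assumptions. The `capture_provenance` decorator, the `PROVENANCE_LOG` list, the record field names, and the `mean_concentration` step are all hypothetical. Each call to a decorated step records the activity, the agent, the inputs used, and the result generated.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory store for provenance records. A real system
# would persist these somewhere durable; that part is not shown.
PROVENANCE_LOG = []


def capture_provenance(agent):
    """Return a decorator that logs every call of the wrapped step."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = datetime.now(timezone.utc).isoformat()
            result = func(*args, **kwargs)
            # One record per executed step: what ran, who ran it,
            # what it used, and what it generated.
            PROVENANCE_LOG.append({
                "activity": func.__name__,
                "agent": agent,
                "used": {"args": args, "kwargs": kwargs},
                "generated": result,
                "started_at": started,
            })
            return result
        return wrapper
    return decorator


@capture_provenance(agent="example-researcher")
def mean_concentration(samples):
    """Toy analysis step: average a list of measurements."""
    return sum(samples) / len(samples)


value = mean_concentration([2.0, 4.0, 6.0])
record = PROVENANCE_LOG[-1]
print(record["activity"], record["agent"], record["generated"])
```

Because the record is written by the wrapper rather than by the analysis code, the scientist's function stays unchanged, which is the sense in which the capture is automatic in this sketch.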
16
Era of Science 2.0
• Science 2.0
– New practices of scientists who post raw experimental results, nascent theories, claims of discovery and draft papers on the Web for others to see and comment on
– Social scholarship: Reconsidering scholarly practices in the age of social media
(Waldrop, 2008; Greenhow and Gleason, 2014)
Practice
17
• altmetric.com – already a product used by NPG, Springer, PNAS, Wiley, etc.
http://www.nature.com/nature/journal/v497/n7449/nature12127/metrics
This Altmetric score means that the article is:
• in the 99th percentile (ranked 184th) of the 81,261 tracked articles of a similar age in all journals
• in the 92nd percentile (ranked 69th) of the 983 tracked articles of a similar age in Nature
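The relationship between rank and percentile in these figures can be checked with a little arithmetic. The sketch below assumes the percentile is the share of tracked articles ranked below the article, rounded down. That formula is an assumption chosen because it reproduces the numbers on this slide; it is not taken from Altmetric's documentation. The function name `percentile_from_rank` is hypothetical.

```python
def percentile_from_rank(rank, total):
    """Percentage of items ranked below `rank`, rounded down.

    Assumed formula, not Altmetric's documented method.
    Rank 1 is the best; `total` is the number of tracked items.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (total - rank) * 100 // total


all_journals = percentile_from_rank(184, 81261)
nature_only = percentile_from_rank(69, 983)
print(all_journals)  # 99
print(nature_only)   # 92
```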
18
Summary
• Data Science is making DCO a more open, more collaborative, and more productive community
• eScience: the digital or electronic facilitation of science
• Are you ready? http://deepcarbon.net/join
Image courtesy BGS © NERC