deepcarbon.net Xiaogang (Marshall) Ma, Yu Chen, Han Wang, John Erickson, Patrick West, Peter Fox Tetherless World Constellation Rensselaer Polytechnic Institute Deep Carbon Virtual Observatory: Leveraging Data Science to Facilitate Earth Science Research


Post on 04-Jan-2016

deepcarbon.net

Xiaogang (Marshall) Ma, Yu Chen, Han Wang, John Erickson, Patrick West, Peter Fox

Tetherless World Constellation, Rensselaer Polytechnic Institute

Deep Carbon Virtual Observatory: Leveraging Data Science to Facilitate Earth Science Research


Outline

• Deep Carbon Virtual Observatory

• Data Management, Publication and Citation

• Provenance of Research

• Era of Science 2.0


Deep Carbon Virtual Observatory

• A vision of the DCVO:
– A conceptual model of the interplay between data, people, publication, instruments, models, organizations, etc.
– Identify, annotate and link all key entities, agents and activities
– A repository for datasets and associated metadata
– Unique and powerful data and metadata visualization for dissemination of information
– Collaboration tools for scientific efforts
– An integrated portal for diverse content and applications

(Fox et al., 2014)


Data Management

data work

Image courtesy Randy Glasbergen


Data Management Plan

• DCO Open Access and Data Policies
– https://deepcarbon.net/dco/dco-open-access-and-data-policies

• Data Management Plan
– A formal document that outlines what you will do with your data during and after you complete your research

• Resources and tools that help create DMPs:
– DCC Data Management Plans: http://www.dcc.ac.uk/resources/data-management-plans
– NSF Data Management Plan Requirements: http://www.nsf.gov/eng/general/dmp.jsp
– DMPTool: https://dmptool.org
– DCC DMPOnline: https://dmponline.dcc.ac.uk


Data Publication

• Data as first-class products of research
– e.g., NSF bio-sketches can include data publications

Image from j4h.net

http://www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp

(NSF, 2012)


“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”

“…authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications.”

“…authors must make materials, data, and associated protocols available to readers.”

“…it is a condition of publication that authors make available the data and research materials supporting the results in the article.”

“…require authors to make all data underlying the findings described in their manuscript fully available without restriction…”

“Earth and space science data should be widely accessible in multiple formats and long‐term preservation of data is an integral responsibility of scientists and sponsoring institutions.”

“…support the principle that research data should be made freely available to all researchers…”

“…recommends depositing data that correspond to journal articles in reliable data repositories…”


• Ways of data publication
– Data as supplemental material of a paper
– Standalone data
– Data paper: data + descriptive ‘data paper’

(Strasser, 2014)

Examples:
• Standalone data journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives, Data in Brief
• Journals that publish data papers: Earth and Space Science, GigaScience, F1000 Research, Internet Archaeology


What does a DCO data publication look like?


Image from Internet; Anonymous author


Data Citation

• Data Citation Index
– Indexes the world's leading data repositories
– Connects datasets to related refereed literature indexed in the Web of Science™
– Provides efficient access to data across subjects and regions


Data interoperability

Ma et al., Nature Geoscience (2011)

Interoperable: “Data should be discoverable, accessible, decodable, understandable and usable, and data sharing should be legal and ethical for all participants.”

Original image from: http://ehna.org


Provenance of research

• Provenance documentation
– Linking a range of observations and model outputs, research activities, people and organizations involved in the production of scientific findings with the supporting data sets and methods used to generate them

Provenance enables the traceability, reproducibility, explanation, verification, and validation of scientific findings.

Image from nature.com

Ma et al., Nature Climate Change (2014)
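The linking described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (Agent, Entity, Activity), the trace function, and the example identifiers are all hypothetical and are not taken from the DCO infrastructure or from Ma et al. (2014). The sketch shows how recording which activity used and generated which entity makes a finding traceable back to its supporting data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str  # a person or organization (hypothetical example type)


@dataclass(frozen=True)
class Entity:
    identifier: str  # a dataset, model output, or finding (hypothetical)


@dataclass
class Activity:
    label: str
    agents: list     # who carried out the activity
    used: list       # entities consumed by the activity
    generated: list  # entities produced by the activity


def trace(entity, activities):
    """Return the activities that led to `entity`, most recent first."""
    chain, seen, frontier = [], set(), [entity]
    while frontier:
        current = frontier.pop()
        for act in activities:
            if current in act.generated and act.label not in seen:
                seen.add(act.label)
                chain.append(act)
                frontier.extend(act.used)  # keep walking upstream
    return chain


# Illustrative records only; none of these names come from the slides.
raw = Entity("raw-observations")
cleaned = Entity("cleaned-dataset")
finding = Entity("published-finding")
activities = [
    Activity("quality-control", [Agent("Analyst A")], [raw], [cleaned]),
    Activity("analysis", [Agent("Analyst B")], [cleaned], [finding]),
]
labels = [act.label for act in trace(finding, activities)]
print(labels)  # ['analysis', 'quality-control']
```

Walking the used/generated links upstream is what gives traceability: starting from the finding, the sketch recovers the analysis step and then the quality-control step that produced its input.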

We extended the IPython Notebook to enable automatic provenance capture during a scientific workflow

• IPython Notebook: A web-based interactive computational environment

(Di Stefano et al., 2014)
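The idea of automatic provenance capture can be illustrated with a plain-Python decorator. This is a minimal sketch and not the authors' extension: it uses no IPython APIs, and the names capture_provenance and PROVENANCE_LOG are hypothetical. It only shows the general pattern of recording what ran, on which inputs, with which output, and when, without the scientist writing the record by hand.

```python
import functools
from datetime import datetime, timezone

PROVENANCE_LOG = []  # hypothetical in-memory store for captured records


def capture_provenance(func):
    """Record each call to `func` as one provenance entry."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc)
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "activity": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "started": started.isoformat(),
            "ended": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@capture_provenance
def mean(values):
    # Stand-in for one step of a scientific workflow.
    return sum(values) / len(values)


average = mean([1.0, 2.0, 3.0])
print(PROVENANCE_LOG[0]["activity"], average)  # mean 2.0
```

A notebook-level extension would hook into cell execution rather than individual functions, but the captured record has the same shape: an activity, its inputs, its output, and timestamps.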


Era of Science 2.0

• Science 2.0
– New practices of scientists who post raw experimental results, nascent theories, claims of discovery and draft papers on the Web for others to see and comment on
– Social scholarship: Reconsidering scholarly practices in the age of social media

(Waldrop, 2008; Greenhow and Gleason, 2014)

Practice


• altmetric.com – already a product used by NPG, Springer, PNAS, Wiley, etc.

http://www.nature.com/nature/journal/v497/n7449/nature12127/metrics

This Altmetric score means that the article is:
• in the 99th percentile (ranked 184th) of the 81,261 tracked articles of a similar age in all journals
• in the 92nd percentile (ranked 69th) of the 983 tracked articles of a similar age in Nature
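The percentiles quoted above are consistent with a simple rank-to-percentile conversion, sketched below. The formula is an assumption for illustration: the slide does not state how Altmetric computes its percentiles, and percentile_from_rank is a hypothetical helper. It treats the percentile as the share of tracked articles ranked below the article, rounded down.

```python
import math


def percentile_from_rank(rank, total):
    """Share of tracked articles ranked below `rank`, rounded down.

    Assumed formula for illustration; not Altmetric's documented method.
    """
    return math.floor(100 * (1 - rank / total))


all_journals = percentile_from_rank(184, 81261)  # 100 * (1 - 184/81261) ≈ 99.77
same_journal = percentile_from_rank(69, 983)     # 100 * (1 - 69/983) ≈ 92.98
print(all_journals, same_journal)  # 99 92
```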


Summary

• Data Science is making DCO a more open, more collaborative, and more productive community

• eScience: the digital or electronic facilitation of science

• Are you ready? http://deepcarbon.net/join

Image courtesy BGS © NERC


[email protected]

Thank you!