ucmp 20150407
Perry Willett
Stephen Abrams
University of California Curation Center
www.cdlib.org/uc3
CDL services for UC researchers
www.flickr.com/photos/infocux/8450190120
www.flickr.com/photos/adavey/4735763989
Museum of Paleontology
UC Berkeley
April 7, 2015
Sharing your data is good for science
reproducibility integrity
enables collaboration and synergy
minimizes needless duplication of effort
© Universal Pictures
“Papers with publicly available microarray data received
more citations than similar papers that did not make their
data available, even after controlling for many variables
known to influence citation rate”
Sharing your data is good for scientists
get credit for your work
higher impact factor
… and you have to (and should want to)
funders (are starting to)
require it
journals require it
disciplinary best practice (increasingly) expects it
“To do otherwise should come to be regarded as scientific malpractice”
– Royal Society, 2014
what can I do?
www.flickr.com/photos/cristinacosta/4304968451
adopt the growing body of
good practices
10 aspirational goals ►
plan ahead
www.flickr.com/photos/wscullin/3770015203
10
implicit (non-)decisions
can have significant
consequences
dmptool.org
10
a data management plan
describes your intentions
during and after your
research project
www.flickr.com/photos/epublicist/3546059144
prefer formats that are …
standard rather than customized
open source rather than proprietary
commonly used rather than obscure
self-describing rather than opaque
text rather than binary
9
be preservation-friendly from the start
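As a minimal sketch of what "text" and "self-describing" can mean in practice, the snippet below writes a small table as CSV with a header row. The column names and values are hypothetical, chosen only for illustration; the point is that the file can be read by any tool, or by eye, without custom software.

```python
import csv
import io

# Hypothetical measurements; names and values are illustrative only.
rows = [
    {"specimen_id": "S-001", "length_mm": 41.2, "collected": "2015-04-07"},
    {"specimen_id": "S-002", "length_mm": 38.9, "collected": "2015-04-07"},
]

def to_csv(records):
    """Serialize records as CSV text with a header row naming each column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()      # the header makes the file self-describing
    writer.writerows(records)
    return buffer.getvalue()

csv_text = to_csv(rows)
```

Putting the unit in the column name (`length_mm`) and using ISO 8601 dates are design choices assumed here, not requirements stated in the talk.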
assign an identifier to your data
www.flickr.com/photos/erskinelibrary/4581870160
ezid.cdlib.org
datacite.org
8
DOIs provide unambiguous
reference, persistent access,
and citation metrics
[digital object identifier]
get an identifier for yourself
orcid.org
www.flickr.com/photos/mumpfpuffel/2337520969
7
ORCIDs provide unambiguous
reference and citation metrics[open researcher and contributor identifier]
describe and document
what would you
want to know
about someone
else’s data?
www.flickr.com/photos/61423903@N06/7357608430
who?
what?
when?
where?
how?
why?
…?
6
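One way to make the who/what/when/where/how/why questions concrete is a small metadata record stored next to the data files. The sketch below is an assumption about how that might look, not a format prescribed by the talk or by any repository; the field names mirror the questions and every value is a placeholder.

```python
import json

REQUIRED = ("who", "what", "when", "where", "how", "why")

# Placeholder values for illustration only; this is not a real dataset.
record = {
    "who": "A. Researcher (hypothetical)",
    "what": "Specimen length measurements",
    "when": "2015-04-07",
    "where": "Example field site",
    "how": "Digital calipers, three readings averaged",
    "why": "Illustrative growth-rate study",
}

def missing_fields(metadata):
    """Return the required questions that are absent or left blank."""
    return [key for key in REQUIRED if not str(metadata.get(key, "")).strip()]

def to_sidecar_json(metadata):
    """Serialize the record as indented JSON, refusing incomplete records."""
    gaps = missing_fields(metadata)
    if gaps:
        raise ValueError("unanswered: " + ", ".join(gaps))
    return json.dumps(metadata, indent=2, sort_keys=True)

sidecar_text = to_sidecar_json(record)
```

Refusing to serialize an incomplete record is a deliberate choice in this sketch: it surfaces gaps while the person who can fill them is still available.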
upload to a repository
www.flickr.com/photos/teegardin/6094310934
re3data.org
databib.org
5
professional,
pro-active
management
merritt.cdlib.org
replication
fixity
monitoring
media refresh
technology watch
disaster recovery/
business continuity
…
dash.berkeley.edu
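"Fixity" in the list above means confirming that a stored file has not changed since it was deposited, typically by comparing checksums. A minimal sketch of the idea follows; SHA-256 and the function names are assumptions made for illustration, not a description of how any particular repository implements its checks.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fixity_ok(path, recorded_digest):
    """Compare the file's current digest with the one recorded at deposit time."""
    return sha256_of(path) == recorded_digest.lower()
```

A typical use is to record `sha256_of(path)` when a file is first stored and to rerun `fixity_ok` on a schedule; a mismatch signals corruption or an unrecorded change, at which point a replicated copy can be used for repair.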
use a license with the most permissive terms
www.flickr.com/photos/_elemenoh_/147966697
4
allows simplest reuse
used by Dash
custom data use
agreement should be avoided
publish
3
www.nature.com/sdata
esapubs.org/archive
www.flickr.com/photos/takomabibelot/3984413475
so your data are available
to collaborators,
colleagues, and community
cite yourself and others
2
add data citations to your CV
and publications
track usage of your data
products through alt-metrics
www.flickr.com/photos/rob_stone/559595880
plumanalytics.com
altmetric.com
impactstory.org
preserve your code
1
everything just said about
data applies equally well
to code
www.flickr.com/photos/mwichary/3368836377
github.com sourceforge.com
plan
format
identify ( your data )
identify ( yourself )
describe
upload
license
publish
cite
code
data preservation 101
www.flickr.com/photos/santos/230060595
features:
• datasets!
• open to Berkeley researchers (faculty members, grad students)
• no cost (to you)
• assistance in describing your dataset
• easy drag-and-drop to assemble your files
• DOIs
• catalog of datasets from other UC researchers
dash.berkeley.edu
www.cdlib.org/uc3
datapub.cdlib.org
for more information …
… also, a good paper to review:
Goodman, Pepe, Blocker, Borgman, Cranmer et al. (2014)
“Ten simple rules for the care and feeding of scientific data”
PLOS Computational Biology 10(4):e1003542,
doi:10.1371/journal.pcbi.1003542
… and ask your local librarian