claudia bauzer medeiros digital preservation – caring for our data to foster knowledge discovery...
TRANSCRIPT
![Page 1: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/1.jpg)
Digital preservation – caring for our data to foster
knowledge discovery and dissemination
Claudia Bauzer Medeiros
Institute of Computing
UNICAMP
![Page 2: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/2.jpg)
Prae-servare
(Before) – (Save)
= save before it disappears
![Page 3: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/3.jpg)
Maintain
Manu-tenere
= being able to get/find it
![Page 4: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/4.jpg)
![Page 5: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/5.jpg)
Dec 2008
Feb 2010
![Page 6: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/6.jpg)
Data deluge
• At the end of 2011 – info created and replicated > 1.8 zettabytes
• 90% of data created in the last 2 years
• A 5-hour flight – 240 Tbytes
• Facebook – 200 million users, >70 languages
• Each person in England is filmed 300 times/day
• Teenagers in the US send an average of 110 phone text messages a day
=> We need to build arks during the deluge - PRESERVATION
![Page 7: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/7.jpg)
Outline
• Why preserve?
• What to preserve?
• How to preserve?
• Where to preserve?
And a few associated challenges
![Page 9: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/9.jpg)
WHY PRESERVE
• Costly to produce
• Contribute to progress of science
• Intrinsic value
culture/science/sustainability
![Page 10: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/10.jpg)
WHY PRESERVE
• Costly to produce
– Infrastructure, power, software, models, visualization, people
– Hardware, Software, Peopleware
• Contribute to progress of science
– Reproducibility and reusability
– Publication and sharing
– Quality
• Intrinsic value culture/science/sustainability
– Digital humanities
– Domesday project
– Fonoteca Neotropical Jacques Vieillard
![Page 13: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/13.jpg)
The Domesday Project 1086-1986
• Digital decay
• Equipment obsolescence
• Software obsolescence
![Page 14: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/14.jpg)
Domesday reloaded
![Page 15: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/15.jpg)
Fonoteca
Neotropical
Jacques
Vieillard
![Page 16: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/16.jpg)
![Page 17: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/17.jpg)
![Page 18: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/18.jpg)
Outline
• Why preserve?
• What to preserve?
• How to preserve?
And associated challenges
![Page 19: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/19.jpg)
What to preserve?
• Data
• BUT what is “data”?
• Only data?
![Page 21: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/21.jpg)
What to preserve?
• Data
• BUT what is “data”?
– Files and records
– Models, documentation, annotations, sketches,
experiments, recordings
• Only data?
– How it was produced: workflows, devices,
methodologies, materials and methods,
reasoning, logs (provenance)
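To make the provenance idea concrete, here is a minimal Python sketch of a sidecar record stored next to a data object. The class name `ProvenanceRecord`, its fields, and the sample values are all hypothetical and chosen only to mirror the items listed on this slide (workflows, devices, methods, logs); it is not a standard provenance model.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical sidecar record describing how a data object was produced."""
    data_sha256: str                      # ties the record to one exact data object
    workflow: str                         # steps that produced the data
    devices: list = field(default_factory=list)
    methods: str = ""                     # materials and methods, free text
    log: list = field(default_factory=list)


def describe(data: bytes, workflow: str, devices: list, methods: str, log: list) -> ProvenanceRecord:
    """Build a provenance record for the given bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return ProvenanceRecord(digest, workflow, list(devices), methods, list(log))


# Illustrative values only; nothing here comes from a real collection.
recording = b"illustrative bytes standing in for a field recording"
record = describe(
    recording,
    workflow="record -> denoise -> annotate",
    devices=["field recorder (hypothetical)"],
    methods="hypothetical protocol description",
    log=["recorded", "denoised", "annotated"],
)

# Serialize the record so it can be archived alongside the data itself.
sidecar = json.dumps(asdict(record), indent=2)
print(sidecar)
```

Storing the hash inside the record is one simple way to keep the description and the object from drifting apart; a real archive would more likely adopt an established provenance vocabulary.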
![Page 22: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/22.jpg)
What to preserve?
• Data
• Environment in which it was produced
• The data needed for preservation occupies more space
than the data itself
• Preservation means storing more than the object
itself
![Page 23: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/23.jpg)
What about our research data? (slide adapted from Jim Gray)
[Diagram labels: Questions, Answers, “Collaboratory”, Data-driven science, Models, Simulations, Papers, Files, Experiments, Instruments, DATA]
![Page 24: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/24.jpg)
Data sources?
Table of Product Characteristics

| id | Property name | Value |
| --- | --- | --- |
| MilkProd | productsrep | MilkA |
| MilkProd | quantity | 10000 |
| MilkProd | validity date | 10/06/2006 |
| CheeseProd | productsrep | Minas |
| CheeseProd | quantity | 2000 |
| CheeseProd | validity date | 12/02/2006 |
| CheeseProd | shape | Circular |
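The product table on this slide stores one property per row (an id, a property name, a value), so each product can carry its own set of properties. As a rough Python sketch, the rows can be regrouped into one record per product; the function name `pivot` is an assumption made for illustration, and the values are copied from the slide as plain strings.

```python
from collections import defaultdict

# Rows copied from the slide: (id, property name, value).
rows = [
    ("MilkProd", "productsrep", "MilkA"),
    ("MilkProd", "quantity", "10000"),
    ("MilkProd", "validity date", "10/06/2006"),
    ("CheeseProd", "productsrep", "Minas"),
    ("CheeseProd", "quantity", "2000"),
    ("CheeseProd", "validity date", "12/02/2006"),
    ("CheeseProd", "shape", "Circular"),
]


def pivot(triples):
    """Group (id, property, value) triples into one dict per id."""
    records = defaultdict(dict)
    for entity, prop, value in triples:
        records[entity][prop] = value
    return dict(records)


products = pivot(rows)
print(products["CheeseProd"])

# Properties that only the cheese product has:
print(sorted(set(products["CheeseProd"]) - set(products["MilkProd"])))  # ['shape']
```

The layout is flexible, but the meaning of each property name lives outside the table, which is one reason the surrounding context has to be preserved with the data.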
![Page 25: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/25.jpg)
eEnvironmental Science
• Direct and indirect observations
![Page 26: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/26.jpg)
Data sources
![Page 27: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/27.jpg)
![Page 28: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/28.jpg)
We are DATASCOPE engineers
Software is the device/tool
![Page 29: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/29.jpg)
Outline
• Why preserve?
• What to preserve?
• How to preserve?
And associated challenges
![Page 30: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/30.jpg)
How to preserve?
How to construct the ark during the
deluge?
Praeservare, Manutenere and Share
![Page 31: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/31.jpg)
How to preserve?
• To ensure retrievability and sharing
– Index structures
– Ontologies, metadata, keywords, standards
– Workflows
• To ensure longevity
– Media decay, software decay, hardware decay
• To ensure quality
– Curation procedures
• To afford maintenance costs
– Cloud? CAP theorem?
![Page 36: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/36.jpg)
How to preserve?
• To ensure retrievability and sharing
– Index structures
– Ontologies, metadata, keywords, standards
– Workflows
• To ensure longevity
– Media decay, software decay, hardware decay
– PEOPLE DECAY
• To ensure quality
– Curation procedures, metadata, standards
• To afford maintenance costs
– Cloud? CAP theorem? => WHERE
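One common way to notice media decay is a fixity check: store a checksum when the object is archived and recompute it later. The sketch below uses SHA-256 from Python's standard library; the helper names `fingerprint` and `is_intact` and the sample bytes are assumptions for illustration, not something prescribed by the talk.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """True when the data still matches the digest recorded at archiving time."""
    return fingerprint(data) == expected


original = b"archived object"
stored_digest = fingerprint(original)   # kept with the object's metadata

# Simulate decay by flipping a single bit in a copy of the object.
decayed = bytearray(original)
decayed[0] ^= 0x01

print(is_intact(original, stored_digest))        # True
print(is_intact(bytes(decayed), stored_digest))  # False
```

A check like this only detects damage; repairing it requires an intact replica, and it does nothing about software, hardware, or people decay.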
![Page 37: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/37.jpg)
Sharing and open access
NSF Data Management Policy
Paper and data publication
![Page 38: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/38.jpg)
![Page 39: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/39.jpg)
Sharing of Data Leads to Progress on Alzheimer’s
By Gina Kolata, The New York Times
Published: August 12, 2010
In 2003, a group of scientists and executives from the National Institutes of Health, the Food and
Drug Administration, the drug and medical-imaging industries, universities and nonprofit groups
joined in a project that experts say had no precedent: a collaborative effort to find the biological
markers that show the progression of Alzheimer’s disease in the human brain.
share all the data, making every single
finding public immediately, available to
anyone with a computer anywhere in the
world
=> AVAILABILITY and REUSE
![Page 40: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/40.jpg)
• Data must be properly curated throughout its
life-cycle and released with the appropriate
high-quality metadata.
• Medical Research Council UK
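As a small illustration of releasing data "with the appropriate high-quality metadata", the Python sketch below blocks release while required fields are missing. The field list in `REQUIRED_FIELDS` is an assumption made for this example and is not taken from any funder's policy.

```python
# Assumed minimal field list for illustration; a real policy would define its own.
REQUIRED_FIELDS = ("title", "creator", "date", "format")


def missing_fields(metadata: dict) -> list:
    """Return the required fields that are absent or blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def ready_for_release(metadata: dict) -> bool:
    """A dataset is releasable only when no required field is missing."""
    return not missing_fields(metadata)


draft = {"title": "Survey 12", "creator": "", "date": "2011-05-02"}
print(missing_fields(draft))       # ['creator', 'format']
print(ready_for_release(draft))    # False
```

Completeness is only the simplest curation rule; checking that the values are correct and follow a shared standard is the harder part.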
![Page 41: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/41.jpg)
• Research data should be made available for
use by other researchers. Researchers must
retain research data, including electronic data,
in a durable, indexed and retrievable form.
• Australian Government National Health and
Medical Research Council
![Page 42: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/42.jpg)
Microsoft Academic Search
40M publications
19M authors
75 publishers (Wiley, Springer, ACM, IEEE …)
![Page 43: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/43.jpg)
Google Scholar Citations
![Page 44: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/44.jpg)
• Citing data is as important as citing papers
• For researchers, publishers, data centers
• Over 1M DOIs, several major national research
libraries
– Germany, France, Korea, Netherlands, Australia,
USA...
• Present manager – German National Library of
Science and Technology
![Page 45: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/45.jpg)
Publish on the Cloud
Add metadata
Pre-print sharing
![Page 46: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/46.jpg)
FNJV
proj.lis.ic.unicamp.br/fnjv
• Sharing by publishing on the Web
• Retrievability by extending metadata
![Page 47: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/47.jpg)
![Page 48: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/48.jpg)
![Page 49: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/49.jpg)
CURATION AND USE OF STANDARDS
![Page 50: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/50.jpg)
Workflows and model preservation
![Page 51: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/51.jpg)
![Page 52: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/52.jpg)
Workflows and model preservation
Comb-e-Chem
[Diagram labels: X-Ray e-Lab, Diffractometer, Analysis, Properties, Properties e-Lab, Simulation, Video, Structures Database, Grid Middleware]
![Page 53: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/53.jpg)
The cloud and CAP
![Page 54: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/54.jpg)
Outline
• Why preserve?
• What to preserve?
• How to preserve?
• Where to preserve?
And a few associated challenges
PRE-SAVE and MANU-TENERE
![Page 55: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/55.jpg)
Outline
• Why preserve?
– Costly to produce (hardware, software, peopleware)
– Contribute to progress of science
– Value – culture, science, sustainability
• What to preserve?
– Data [WHAT IS DATA?]
– Context of production and use
• How to preserve?
– Accessibility and sharing – standards, metadata, ontologies
– Integrity and quality – context to use (hw, sw), standards
![Page 56: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/56.jpg)
![Page 57: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/57.jpg)
References
NSF – CISE Data management policy
The Domesday Project
http://www.atsf.co.uk/dottext/domesday.html
The CLARIN Project (languages)
Eigenfactor.org
Altmetrics movement
![Page 58: Claudia Bauzer Medeiros Digital preservation – caring for our data to foster knowledge discovery and dissemination](https://reader033.vdocuments.net/reader033/viewer/2022060108/5550612cb4c905ae3f8b53f6/html5/thumbnails/58.jpg)
Thank you!!!!