Data Infrastructure for Coastal and Estuarine Science
DESCRIPTION
This talk was given at the Atlantic Estuarine Research Society's Spring 2014 meeting in Ocean City, Maryland, USA.
TRANSCRIPT
![Page 1: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/1.jpg)
Data Infrastructures for Estuarine and Coastal Science
Anne E. Thessen
http://www.slideshare.net/[email protected]
![Page 2: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/2.jpg)
Photo Credit: NASA/ GSFC/ NOAA/ USGS
![Page 3: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/3.jpg)
Outline
• Why are we talking about data infrastructures?
• What are the challenges?
• What are the requirements?
• What parts are already available?
• How do we get there?
• PSA
![Page 4: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/4.jpg)
| Data Type | Important | Easy |
|---|---|---|
| Atmospheric Data | 52.2% | 21.6% |
| Climate Data | 56.0% | 23.3% |
| Oceanographic Data | 42.5% | 18.9% |
| Geophysical Data | 55.5% | 22.0% |
| Geological Data | 56.3% | 19.8% |
| Critical Zone Data | 19.3% | 8.2% |
| Hydrology Data | 48.4% | 20.1% |
Results from EarthCube Stakeholder Alignment Survey
Why Are We Talking About Data Infrastructure?
![Page 5: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/5.jpg)
Working with multiple data sets from many disciplines?
Working with multiple data sets within a discipline?
88.1% say it is important; 23.5% say it is easy
70.7% say it is important; 9.8% say it is easy
Results from EarthCube Stakeholder Alignment Survey
Why Are We Talking About Data Infrastructure?
![Page 6: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/6.jpg)
Why Are We Talking About Data Infrastructure?
• “Data Deluge”
• Large-scale problems
• Maturation of the internet
• Increased investment (e.g., EarthCube)
• Estuarine and coastal science has an interdisciplinary nature and a strong sharing culture
![Page 7: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/7.jpg)
User Needs
Where Do We Start?
Available Technology
Existing Infrastructure
Incentives
![Page 8: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/8.jpg)
Sociological
• Data sharing
• Incentives
• Data cultures
• Science practices
• Massive heterogeneity
Technological
• Storage capacity
• Moving data around
• Efficient query
• Processing speed
• Knowledge representation
![Page 9: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/9.jpg)
Stakeholder Assessment
Data producers
Photo Credit: The University of Nottingham
Photo Credit: Kay Nietfeld/EPA
Data consumers
![Page 10: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/10.jpg)
What is the current state of sharing?
• Data sharing varies widely by discipline
  – No universal rules or agreements
  – Sharing in marine science is 40%
  – Other disciplines: 10% to 100%
![Page 11: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/11.jpg)
What is the current state of sharing?
• Data sharing varies widely and by discipline
• Far more scientists say they are willing to share data than actually do
  – Time to prepare
  – Concerns about misuse
![Page 12: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/12.jpg)
What is the current state of sharing?
• Data sharing varies widely and by discipline
• Far more scientists say they are willing to share data than actually do
• Lack of access to data is a major impediment
![Page 13: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/13.jpg)
If sharing is so important, why aren’t more people doing it?
The large proportion of researchers who claim to be willing to share data, set against the low number who actually make their data easily available, suggests that data sharing would increase substantially if the proper infrastructure were in place.
![Page 14: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/14.jpg)
Reasons for Not Sharing
• Not enough time or funding
• No place to put the data
• No standards or policies for sharing
• Others have no need for the data
• Loss of control
• No way to get credit
• Sensitive data cannot be shared
• Errors will be exposed
• Loss of competitiveness
![Page 15: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/15.jpg)
Social Infrastructure Requirements
• Repository capability
• Place conditions on access
• Mechanisms for data citation and credit
• Data sharing policy
• Value added services
• Requirements from publishers and funders
• Respect for confidentiality
• Ease of use
![Page 16: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/16.jpg)
We need a system that can
• Share
• Preserve
• Digitize
• Automate
• Integrate
  – Data
  – Infrastructure
![Page 17: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/17.jpg)
Data Set Size
![Page 18: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/18.jpg)
Data Set Heterogeneity
• Data format
• Data file format
• Data quality and completeness
• Physical samples
![Page 19: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/19.jpg)
What Will We Do With the Data?
• Preserve Data
  – Format migration
  – Redundancy
  – Self-Repair
• Serve Data
  – Discoverable
  – Accessible
  – Usable
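The "Redundancy" and "Self-Repair" items above can be illustrated with a small sketch. This is not a description of any particular repository's implementation; it assumes a data set is held as several byte-for-byte replicas, that a SHA-256 checksum is an acceptable fixity check, and that a strict majority of the copies is intact. The function name `repair_replicas` and the site names are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest used here as a fixity value."""
    return hashlib.sha256(data).hexdigest()


def repair_replicas(replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Overwrite damaged copies with the majority copy.

    Returns the repaired replicas and the sorted names of the copies
    that disagreed with the majority (hypothetical helper, for illustration).
    """
    digests = {name: checksum(blob) for name, blob in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) / 2:
        # Without a strict majority we cannot tell which copy is correct.
        raise ValueError("no majority among replicas; manual recovery needed")
    good_blob = next(
        blob for name, blob in replicas.items() if digests[name] == majority_digest
    )
    damaged = sorted(name for name, d in digests.items() if d != majority_digest)
    repaired = {name: good_blob for name in replicas}
    return repaired, damaged


# Three copies of a tiny data file; one has been corrupted in storage.
replicas = {
    "site_a": b"salinity,12.1\n",
    "site_b": b"salinity,12.1\n",
    "site_c": b"salinity,1X.1\n",
}
repaired, damaged = repair_replicas(replicas)
print(damaged)  # ['site_c']
```

Majority voting is only one possible design; a system could instead compare each copy against a checksum recorded at deposit time, which also works when fewer than three copies exist.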
![Page 20: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/20.jpg)
Technical Infrastructure Requirements
• Preservation
• Layered service architecture
• Repository functions
• Accommodate heterogeneity
• Bridge digital and physical
![Page 21: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/21.jpg)
Review Requirements
Sociological
• Repository capability
• Place conditions on access
• Mechanisms for data citation and credit
• Data sharing policy
• Value added services
• Requirements from publishers and funders
• Respect for confidentiality
• Ease of use
Technological
• Preservation
• Layered service architecture
• Repository functions
• Accommodate heterogeneity
• Bridge digital and physical
![Page 22: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/22.jpg)
What is Available?
Repositories
![Page 23: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/23.jpg)
What is Available?
Citation
Repositories
![Page 24: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/24.jpg)
What is Available?
Preservation
Repositories
Citation
![Page 25: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/25.jpg)
What is Available?
Quality Control and Usage Metrics
Repositories
Citation
Preservation
Crowd Sourcing
Web 2.0
![Page 26: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/26.jpg)
What is Available?
Integration
Repositories
Citation
Preservation
Quality and Metrics
Web 3.0
![Page 27: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/27.jpg)
What is Available?
Mobilization
Repositories
Citation
Preservation
Quality and Metrics
Integration
![Page 28: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/28.jpg)
What is Available?
Access Protocols
Web Services
Data Brokers
Repositories
Citation
Preservation
Quality and Metrics
Integration
Mobilization
![Page 29: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/29.jpg)
What is Available?
Standards
Repositories
Citation
Preservation
Quality and Metrics
Integration
Mobilization
Access
![Page 30: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/30.jpg)
How Can it all Fit Together?
Quality and Metrics
Access
Citation
Preservation
Mobilization
Integration
Repositories
Standards
![Page 31: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/31.jpg)
Who Should Be Doing All This Work?
• Librarians
• Data Scientists
• Informaticians
• Ontologists
• Computer Scientists
• Software Developers
• Standards Groups
Image by Michael Krigsman
![Page 32: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/32.jpg)
PSA
![Page 33: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/33.jpg)
Why Share Data?
• Increased recognition
• Increased economic opportunities
• Improved data set
• Improved science
• Time and money saved
![Page 34: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/34.jpg)
Photo Credit: Emergency Cleaning Solutions
![Page 35: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/35.jpg)
Photo Credit: The Collared Sheep
![Page 36: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/36.jpg)
![Page 37: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/37.jpg)
Acknowledgements
• Benjamin Fertig
• David Patterson
• Mike Kemp
• John Milliman
• Melissa Cragin
• Sayeed Choudhury
• Tim DiLauro
• Carol Palmer
• Nathan Wilson
• Alan Renear
• Ruth Duerr
• Cyndy Chandler
• Peter Fox
• Krishna Sinha
• Janet Fredericks
• Carl Lagoze
![Page 38: Data Infrastructure for Coastal and Estuarine Science](https://reader036.vdocuments.net/reader036/viewer/2022062513/554eabd6b4c9055f7b8b4e51/html5/thumbnails/38.jpg)
Questions?