data sharing as publication: a view from archaeology
TRANSCRIPT
Data Sharing as Publishing: A View from Archaeology
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
Eric C. Kansa (@ekansa)
Open Context / UC Berkeley
Image Credit: ‘Peter’ via Flickr (CC-BY-SA) https://www.flickr.com/photos/12023825@N04/2898021822/
Publishing norms as if the past 30 years didn’t happen
My Precious Data: Data do not “count”
Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright
Data often discussed using language of compliance + metrics (Taylorist perspectives)
Image Credit: Wikimedia Commons (Public Domain) http://en.wikipedia.org/wiki/Frederick_Winslow_Taylor#mediaviewer/File:Frederick_Winslow_Taylor_crop.jpg
● Linked: Links with wide array of other systems + data (example: ORCID)
● Open: Code, data (mainly CC-BY) on GitHub, machine-readable formats, APIs
● Long-term: NSF, NEH data management. Archiving by the California Digital Library
● Global: Mirroring, backup, collaboration with the German Archaeological Institute (DAI)
Archaeology & Less Structured Data
1. More structured: classification, quantification
2. Less structured: images, field-notes
3. URIs useful even for less structured information
Federico Buccellati (CAA 2015)
1. Interested in energetics, logistics, labor investment in monumental architecture
2. Look at lots of pictures, read field notes.
3. BUT stable URIs to granular resources (individual context records, images) mean greater reproducibility
Open Context ≠ A conventional digital repository
Open Context
● Discover, cite “archaeological entities” (sites, coins, bones, sherds)
● High granularity (“URI per potsherd”)
● Schema mapping (one big database)
● Common human + machine interfaces for data + metadata
● Expensive “boutique” publishing
Digital Repository
● Discover, cite digital files
● Low granularity (files aggregate many items of observation)
● Multiple data schemas in data files
● Human + machine interfaces for metadata (not content)
● Cheaper, easier to scale, more “self-service” models
Managing Complexity: Data about this coin came from several different files (relational databases, spreadsheets)
Some archaeological projects can have dozens of different spreadsheets + databases!
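A minimal sketch of how records about one coin, scattered across several project files, might be brought together through schema mapping. Everything here is an assumption for illustration: the column names, the "coin-17" identifier, and the two source tables are hypothetical and do not reflect Open Context's actual import process.

```python
from collections import defaultdict

# Hypothetical rows from two project files that describe the same coin.
# Column names and the "coin-17" identifier are illustrative assumptions.
spreadsheet_rows = [
    {"Find No": "coin-17", "Material": "bronze", "Wt (g)": "3.2"},
]
database_rows = [
    {"find_id": "coin-17", "trench": "T4", "locus": "L112"},
]

# Schema mapping: each source's local column names map to shared field names.
SCHEMA_MAPS = {
    "spreadsheet": {"Find No": "id", "Material": "material", "Wt (g)": "weight_g"},
    "database": {"find_id": "id", "trench": "trench", "locus": "locus"},
}


def map_row(source, row):
    """Rename a row's columns to the shared schema, dropping unmapped ones."""
    mapping = SCHEMA_MAPS[source]
    return {mapping[key]: value for key, value in row.items() if key in mapping}


def merge_sources(sources):
    """Combine mapped rows from all sources into one record per entity id."""
    entities = defaultdict(dict)
    for source, rows in sources.items():
        for row in rows:
            mapped = map_row(source, row)
            entity = entities[mapped["id"]]
            entity.update(mapped)
            # Track which files contributed to this entity.
            entity.setdefault("sources", []).append(source)
    return dict(entities)


merged = merge_sources({"spreadsheet": spreadsheet_rows, "database": database_rows})
print(merged["coin-17"])
```

The point of the sketch is only that one entity ends up with one record, while the list of contributing files is kept so the provenance of each description is not lost.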
Stable Web URI: Reference this to disambiguate between “Alexandria” (Egypt) and other places called “Alexandria” (many of which are also ancient)
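A small sketch of why a URI disambiguates where a place name cannot. The gazetteer below is hypothetical: the example.org URIs are placeholders rather than real gazetteer identifiers, and the record structure is an assumption.

```python
# Hypothetical gazetteer entries. The example.org URIs are placeholders,
# not real gazetteer identifiers; the region labels are illustrative.
GAZETTEER = {
    "https://example.org/places/alexandria-egypt": {"name": "Alexandria", "region": "Egypt"},
    "https://example.org/places/alexandria-troas": {"name": "Alexandria", "region": "Troad"},
}


def places_named(name):
    """Return every gazetteer URI whose label matches the given name."""
    return sorted(uri for uri, place in GAZETTEER.items() if place["name"] == name)


def resolve(uri):
    """Look up exactly one place by its URI."""
    return GAZETTEER[uri]


# A data record that cites the place by URI instead of by name.
coin_record = {"id": "coin-17", "findspot": "https://example.org/places/alexandria-egypt"}

ambiguous = places_named("Alexandria")
findspot = resolve(coin_record["findspot"])

print(len(ambiguous), "places share the name")
print("URI resolves to:", findspot["region"])
```

A search by name returns several candidates, while the URI stored in the record points to exactly one of them.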
Pelagios: Heat map of museum collections, archives, databases referencing places in Pleiades (PIs Leif Isaksen, Elton Barker)
● Digital Index of North American Archaeology (DINAA): David G. Anderson, Joshua Wells (PIs). NSF-funded.
● URI for each archaeological “site file” record (curated by state agencies)
● DINAA Cross-referencing: tDAR and more specialized databases (Paleoindian Database of the Americas, Eastern Woodlands Household Archaeological Data Project)
● Google Refine / OpenRefine with API “recipes” to facilitate linking
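The kind of cross-referencing described above could be sketched roughly as follows. This is not the OpenRefine API and not the DINAA “recipes” themselves: the site identifiers, the example.org URIs, and the normalization rule are all hypothetical, chosen only to show how records in another database might be linked to index URIs.

```python
import re


def normalize_site_id(raw):
    """Uppercase an identifier and strip spaces, hyphens, and periods."""
    return re.sub(r"[\s\-.]", "", raw).upper()


# Hypothetical index records, each with a stable URI (placeholder URIs).
index_records = [
    {"site_id": "12 AB 34", "uri": "https://example.org/sites/1"},
    {"site_id": "12 AB 35", "uri": "https://example.org/sites/2"},
]

# Hypothetical records from a more specialized database,
# written with inconsistent identifier formatting.
specialist_records = [
    {"site": "12ab34", "note": "projectile points"},
    {"site": "56-CD-78", "note": "household features"},
]


def link_records(index, others):
    """Attach an index URI to each record whose normalized id matches."""
    uri_by_id = {normalize_site_id(r["site_id"]): r["uri"] for r in index}
    linked, unmatched = [], []
    for record in others:
        uri = uri_by_id.get(normalize_site_id(record["site"]))
        if uri is None:
            unmatched.append(record)
        else:
            linked.append({**record, "index_uri": uri})
    return linked, unmatched


linked, unmatched = link_records(index_records, specialist_records)
print(len(linked), "linked;", len(unmatched), "left for manual review")
```

Records that fail to match are kept in a separate list, since in practice such cases would still need human review.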
Publishing Workflow
Improve / Enhance
1. Consistency
2. Context (intelligibility)
Large scale data sharing & integration for exploring the origins of farming. Funded by EOL / NEH
1. 300,000 bone specimens
2. Complex: dozens, up to 110 descriptive fields
3. 34 contributors from 15 archaeological sites
4. More than 4 person years of effort to create the data!
7000 BC (many pigs, cattle)
7500 BC (sheep + goat dominate, few pigs, few cattle)
6500 BC (few pigs, mixing with wild animals?)
8000 BC (cattle, pigs, sheep + goats)
• Not a neat model of progress toward a more productive economy. Adoption was very different, sometimes piecemeal, from region to region.
Arbuckle BS, Kansa SW, Kansa E, Orton D, Çakırlar C, et al. (2014) Data Sharing Reveals Complexity in the Westward Spread of Domestic Animals across Neolithic Turkey. PLOS ONE 9(6): e99845. doi:10.1371/journal.pone.0099845
BIG Data versus SLOW DATA
Final Thoughts
Data require intellectual investment - creativity, not just compliance.
Need experimentation and community capacity building
Thank you!
Special Thanks! Rikk Mulligan, ARL