Data Sharing as Publication: A View from Archaeology

Uploaded by eric-kansa on 12-Apr-2017

TRANSCRIPT

Page 1: Data Sharing as Publication: A View from Archaeology

Data Sharing as Publishing: A View from Archaeology

Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>

Eric C. Kansa (@ekansa), Open Context / UC Berkeley

Page 2

Image Credit: ‘Peter’ via Flickr (CC-BY-SA) https://www.flickr.com/photos/12023825@N04/2898021822/

Publishing norms as if the past 30 years didn’t happen

Page 3

My Precious Data: Data do not “count”

Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright

Page 4

Data are often discussed in the language of compliance + metrics (Taylorist perspectives)

Image Credit: Wikimedia Commons (Public Domain) http://en.wikipedia.org/wiki/Frederick_Winslow_Taylor#mediaviewer/File:Frederick_Winslow_Taylor_crop.jpg

Page 5

● Linked: Links with a wide array of other systems + data (example: ORCID)
● Open: Code, data (mainly CC-BY) on GitHub, machine-readable formats, APIs
● Long-term: NSF, NEH data management; archiving by the California Digital Library
● Global: Mirroring, backup, collaboration with the German Archaeological Institute (DAI)

Page 6

Archaeology & Less Structured Data
1. More structured: classification, quantification
2. Less structured: images, field notes
3. URIs useful even for less structured information

Page 7
Page 8

Federico Buccellati (CAA 2015)
1. Interested in energetics, logistics, labor investment in monumental architecture
2. Look at lots of pictures, read field notes
3. BUT stable URIs to granular resources (individual context records, images) mean greater reproducibility
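One way to make that reproducibility concrete: an analysis can record the stable URIs of the exact context records and images it relied on, so someone else can retrieve the same evidence. The Python sketch below writes such a list as a sorted JSON manifest. The function name `build_manifest`, the manifest layout, and the example.org URIs are hypothetical illustrations, not an Open Context format.

```python
import json


def build_manifest(question: str, cited: list[tuple[str, str]]) -> str:
    """Serialize the URIs an analysis relied on as a JSON manifest.

    `cited` holds (uri, role) pairs, e.g. a photo or a context record.
    Duplicates are dropped and entries sorted so the output is identical
    across runs, which makes manifests easy to diff and re-check.
    """
    entries = sorted(set(cited))
    manifest = {
        "question": question,
        "sources": [{"uri": uri, "role": role} for uri, role in entries],
    }
    return json.dumps(manifest, indent=2)


# Hypothetical URIs, for illustration only.
print(build_manifest(
    "Labor investment in a monumental wall",
    [
        ("https://example.org/subjects/context-1", "context record"),
        ("https://example.org/media/photo-2", "photo"),
        ("https://example.org/media/photo-2", "photo"),
    ],
))
```

The manifest only works if the URIs stay stable; a file path or a spreadsheet row number would not survive reorganization of the source data.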

Page 9

Open Context ≠ a conventional digital repository

Page 10

Open Context
● Discover, cite “archaeological entities” (sites, coins, bones, sherds)
● High granularity (“URI per potsherd”)
● Schema mapping (one big database)
● Common human + machine interfaces for data + metadata
● Expensive “boutique” publishing

Digital Repository
● Discover, cite digital files
● Low granularity (files aggregate many items of observation)
● Multiple data schemas in data files
● Human + machine interfaces for metadata (not content)
● Cheaper, easier to scale, more “self-service” models
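The granularity contrast can be sketched in a few lines of Python: a file-level table (the repository model) is split into item-level records, each with its own web identifier (the “URI per potsherd” model). The base URI, the `explode_file` and `mint_uri` helpers, and the slug scheme are hypothetical choices for illustration, not Open Context's actual identifier scheme.

```python
from dataclasses import dataclass, field

# Hypothetical namespace; a publisher would use its own resolvable domain.
BASE_URI = "https://example.org/subjects/"


@dataclass
class ItemRecord:
    uri: str
    project: str
    local_id: str
    attributes: dict = field(default_factory=dict)


def mint_uri(project: str, local_id: str) -> str:
    """Derive a URI slug from the project code and the item's local ID."""
    slug = f"{project}-{local_id}".lower().replace(" ", "-")
    return BASE_URI + slug


def explode_file(project: str, rows: list[dict], id_field: str) -> list[ItemRecord]:
    """Turn one file's rows into item-level records, one URI per item."""
    records, seen = [], set()
    for row in rows:
        local_id = str(row[id_field]).strip()
        if local_id in seen:
            # Two rows claiming one ID would collapse into a single URI.
            raise ValueError(f"duplicate local ID: {local_id}")
        seen.add(local_id)
        attributes = {k: v for k, v in row.items() if k != id_field}
        records.append(ItemRecord(mint_uri(project, local_id), project, local_id, attributes))
    return records


rows = [
    {"sherd_id": "A 101", "ware": "red slip"},
    {"sherd_id": "A 102", "ware": "buff"},
]
for rec in explode_file("demo", rows, "sherd_id"):
    print(rec.uri, rec.attributes)
# https://example.org/subjects/demo-a-101 {'ware': 'red slip'}
# https://example.org/subjects/demo-a-102 {'ware': 'buff'}
```

Deriving URIs from local IDs keeps them readable, but a URI would change if an excavator later corrected an ID; opaque identifiers (for example random UUIDs) avoid that at the cost of readability.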

Page 11

Managing Complexity: Data about this coin came from several different files (relational databases, spreadsheets)

Some archaeological projects can have dozens of different spreadsheets + databases!
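A hedged sketch of the merge step behind a page like this: rows about the same find, drawn from different tables, are joined on a shared identifier, and each value remembers which file it came from. The table names, the `find_id` key, and the conflict policy (first value wins, disagreement logged for an editor) are assumptions for illustration.

```python
from collections import defaultdict


def merge_sources(sources: dict[str, list[dict]], key: str) -> dict[str, dict]:
    """Combine rows that describe the same item across several tables.

    Each merged item keeps every field's value, the source it came from,
    and a list of conflicts where two sources disagree (first value wins).
    """
    merged = defaultdict(lambda: {"fields": {}, "provenance": {}, "conflicts": []})
    for source_name, rows in sources.items():
        for row in rows:
            item = merged[row[key]]
            for name, value in row.items():
                if name == key:
                    continue
                if name in item["fields"]:
                    if item["fields"][name] != value:
                        # Keep the first value, but record the disagreement for review.
                        item["conflicts"].append((name, item["provenance"][name], source_name))
                    continue
                item["fields"][name] = value
                item["provenance"][name] = source_name
    return dict(merged)


# Hypothetical sources describing one coin.
sources = {
    "coin_catalog.xlsx": [{"find_id": "C-17", "denomination": "follis", "metal": "bronze"}],
    "excavation_db": [{"find_id": "C-17", "context": "Trench 3", "metal": "copper alloy"}],
}
coin = merge_sources(sources, "find_id")["C-17"]
print(coin["fields"])     # {'denomination': 'follis', 'metal': 'bronze', 'context': 'Trench 3'}
print(coin["conflicts"])  # [('metal', 'coin_catalog.xlsx', 'excavation_db')]
```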

Page 12

Page 13
Page 14
Page 15

Page 16

Stable Web URI: Reference this to disambiguate between “Alexandria” (Egypt) and other places called “Alexandria” (many of which are also ancient)
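In data terms, the point is that a record should store the place URI, not the bare name. The Python sketch below uses a tiny in-memory gazetteer; the entries, the example.org URIs, and the `resolve` helper are hypothetical stand-ins for a real gazetteer such as Pleiades, and the URIs are not real Pleiades identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    uri: str
    label: str
    region: str


# Hypothetical gazetteer; these URIs are placeholders, not real Pleiades IDs.
GAZETTEER = [
    Place("https://example.org/places/alexandria-egypt", "Alexandria", "Egypt"),
    Place("https://example.org/places/alexandria-troas", "Alexandria", "Troas"),
    Place("https://example.org/places/alexandria-arachosia", "Alexandria", "Arachosia"),
]


def candidates(name: str) -> list[Place]:
    """All gazetteer entries sharing a name: the string alone is ambiguous."""
    return [p for p in GAZETTEER if p.label == name]


def resolve(name: str, region: str) -> str:
    """Pick exactly one URI using extra context; refuse to guess otherwise."""
    matches = [p for p in candidates(name) if p.region == region]
    if len(matches) != 1:
        raise LookupError(f"{name!r} in {region!r}: {len(matches)} matches")
    return matches[0].uri


print(len(candidates("Alexandria")))   # 3
print(resolve("Alexandria", "Egypt"))  # https://example.org/places/alexandria-egypt
```

Once a record stores the URI, later users no longer need the surrounding context to know which Alexandria was meant.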

Page 17

Pelagios: Heat map of museum collections, archives, and databases referencing places in Pleiades (PIs Leif Isaksen, Elton Barker)
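Conceptually, a heat map like this is an aggregation over shared place URIs: because every collection cites the same identifiers, references can be counted across institutions. The sketch is hypothetical: the record layout (a `places` list per record), the URIs, and the approximate coordinates are illustrative assumptions, not Pelagios's data model.

```python
from collections import Counter


def reference_counts(collections: dict[str, list[dict]]) -> Counter:
    """Count records citing each place URI across all collections.

    A record that lists the same URI twice is counted once for that URI.
    """
    counts: Counter = Counter()
    for records in collections.values():
        for record in records:
            counts.update(set(record.get("places", [])))
    return counts


def heat_points(counts: Counter, coords: dict[str, tuple[float, float]]) -> list[tuple[float, float, int]]:
    """Join counts to coordinates as (lat, lon, weight); skip URIs with no coordinates."""
    return [(*coords[uri], n) for uri, n in counts.items() if uri in coords]


ALEX = "https://example.org/places/alexandria-egypt"  # hypothetical URIs
ROME = "https://example.org/places/rome"
collections = {
    "museum": [{"places": [ALEX, ALEX]}, {"places": [ROME]}],
    "archive": [{"places": [ALEX]}],
}
counts = reference_counts(collections)
print(counts[ALEX], counts[ROME])                 # 2 1
# Approximate coordinates, for illustration only.
print(heat_points(counts, {ALEX: (31.2, 29.9)}))  # [(31.2, 29.9, 2)]
```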

Page 18

● Digital Index of North American Archaeology (DINAA): David G. Anderson, Joshua Wells (PIs), NSF-funded.

● URI for each archaeological “site file” record (curated by state agencies)

Page 19

● DINAA Cross-referencing: tDAR and more specialized databases (Paleoindian Database of the Americas, Eastern Woodlands Household Archaeological Data Project)

● Google Refine / OpenRefine with API “recipes” to facilitate linking
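The linking step can be sketched as reconciliation: normalize each project's site designations, then look them up in the index to pick up its URIs. In OpenRefine this matching normally runs against a web service; the Python sketch below swaps that for a local dictionary. The `SITE_INDEX` entries, the site designations, and the example.org URIs are hypothetical, not actual DINAA records.

```python
import re

# Hypothetical index from normalized site designations to DINAA-style URIs.
SITE_INDEX = {
    "31WA1276": "https://example.org/dinaa/site-0001",
    "40KN45": "https://example.org/dinaa/site-0002",
}


def normalize_site_id(raw: str) -> str:
    """Drop spaces, periods and hyphens, then uppercase: '31 wa-1276' -> '31WA1276'."""
    return re.sub(r"[\s.\-]", "", raw).upper()


def reconcile(local_ids: list[str], index: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Match local site designations to index URIs; return matches and leftovers."""
    matched, unmatched = {}, []
    for raw in local_ids:
        uri = index.get(normalize_site_id(raw))
        if uri is None:
            unmatched.append(raw)  # left for a person to check by hand
        else:
            matched[raw] = uri
    return matched, unmatched


matched, unmatched = reconcile(["31 wa-1276", "40kn45", "12-XX-9"], SITE_INDEX)
print(matched)
print(unmatched)  # ['12-XX-9']
```

Unmatched values are returned rather than guessed at, which mirrors the manual review step that reconciliation tools usually leave to a person.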

Page 20

Publishing Workflow

Improve / Enhance:
1. Consistency
2. Context (intelligibility)
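A minimal sketch of both editorial steps, assuming a hypothetical controlled vocabulary: variant spellings are mapped to one preferred term (consistency), each term is published with a definition (context), and anything unmapped is flagged for an editor rather than silently changed. The vocabulary, definitions, and helper names are illustrative, not Open Context's actual editorial tables.

```python
from typing import Optional

# Hypothetical editorial vocabulary: contributor spellings -> one preferred term.
VOCAB = {
    "sheep/goat": "Ovis/Capra",
    "sheep-goat": "Ovis/Capra",
    "ovis/capra": "Ovis/Capra",
    "pig": "Sus scrofa",
    "sus scrofa": "Sus scrofa",
}
# Hypothetical definitions published alongside the data.
DEFINITIONS = {
    "Ovis/Capra": "Sheep or goat; not identified to one genus.",
    "Sus scrofa": "Pig (domestic or wild).",
}


def normalize(values: list[str]) -> tuple[list[Optional[str]], list[str]]:
    """Map raw values to preferred terms; None marks values needing review."""
    cleaned, needs_review = [], []
    for raw in values:
        term = VOCAB.get(raw.strip().lower())
        if term is None:
            needs_review.append(raw)
        cleaned.append(term)
    return cleaned, needs_review


def with_context(term: str) -> dict:
    """Attach the published definition to a term."""
    return {"term": term, "definition": DEFINITIONS.get(term)}


cleaned, review = normalize([" Sheep/Goat", "PIG", "cow?"])
print(cleaned)  # ['Ovis/Capra', 'Sus scrofa', None]
print(review)   # ['cow?']
print(with_context(cleaned[0]))
```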

Page 21

Large-scale data sharing & integration for exploring the origins of farming. Funded by EOL / NEH

Page 22

1. 300,000 bone specimens
2. Complex: dozens, up to 110 descriptive fields
3. 34 contributors from 15 archaeological sites
4. More than 4 person-years of effort to create the data!
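Integrating datasets with dozens of descriptive fields from 34 contributors involves mapping each contributor's columns onto shared field names (the “schema mapping” noted earlier). A minimal Python sketch of that step, assuming two hypothetical contributors with made-up column names; the real project's schema is not reproduced here.

```python
# Hypothetical column mappings for two contributors.
FIELD_MAPS = {
    "site_a": {"Taxon": "taxon", "Bone": "element", "Fusion": "fusion"},
    "site_b": {"species": "taxon", "elem": "element", "fus_state": "fusion"},
}
COMMON_FIELDS = ("taxon", "element", "fusion")


def to_common(contributor: str, row: dict) -> dict:
    """Rename one contributor's columns onto the shared schema."""
    mapping = FIELD_MAPS[contributor]
    record = {"contributor": contributor, **{f: None for f in COMMON_FIELDS}}
    extras = {}
    for column, value in row.items():
        target = mapping.get(column)
        if target is None:
            extras[column] = value  # keep unmapped columns instead of dropping them
        else:
            record[target] = value
    record["extras"] = extras
    return record


def integrate(datasets: dict[str, list[dict]]) -> list[dict]:
    """Stack every contributor's rows into one table with common field names."""
    return [to_common(name, row) for name, rows in datasets.items() for row in rows]


table = integrate({
    "site_a": [{"Taxon": "Sus", "Bone": "humerus", "Notes": "burnt"}],
    "site_b": [{"species": "Bos", "elem": "tibia", "fus_state": "fused"}],
})
for rec in table:
    print(rec)
```

Keeping unmapped columns in `extras` preserves contributor-specific detail that the shared schema does not yet cover.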

Page 23

Map annotations (Neolithic Turkey):
7000 BC (many pigs, cattle)
7500 BC (sheep + goat dominate, few pigs, few cattle)
6500 BC (few pigs, mixing with wild animals?)
8000 BC (cattle, pigs, sheep + goats)

• Not a neat model of progress toward a more productive economy: adoption was very different, sometimes piecemeal, in different regions.

Arbuckle BS, Kansa SW, Kansa E, Orton D, Çakırlar C, et al. (2014) Data Sharing Reveals Complexity in the Westward Spread of Domestic Animals across Neolithic Turkey. PLOS ONE 9(6): e99845. doi:10.1371/journal.pone.0099845

Page 24

BIG Data versus SLOW DATA

Page 25

Final Thoughts

Data require intellectual investment: creativity, not just compliance.

Need experimentation and community capacity building

Page 26

Thank you!

Special thanks: Rikk Mulligan, ARL