iassit kansa presentation

Case-Study: Publishing to the “Web of Data” in Archaeology Quality and Workflows Eric Kansa UC Berkeley / OpenContext.org Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>

Upload: ekansa

Post on 29-Nov-2014


DESCRIPTION

A presentation given at the "Data Stewardship: Increasing the Integrity and Effectiveness of Science and Scholarship" session on Friday, June 8, 2012, at the IASSIST 2012 conference in Washington, DC. The presentation introduced data publishing, using a social science (archaeology) case study to explore editorial processes and dissemination outcomes that increasingly demand “Linked Data” capabilities.

TRANSCRIPT

Page 1: IASSIT Kansa Presentation

Case-Study: Publishing to the “Web of Data” in Archaeology

Quality and Workflows

Eric Kansa UC Berkeley / OpenContext.org

Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>

Page 2: IASSIT Kansa Presentation

“Small Science” data sharing is hard:

(1) Complexity

(2) Scalability

(3) Ethics, cultural property claims, IP

(4) Incentives

(5) Preservation

Image Credit: “Grand Canyon NPS” via Flickr (CC-By) http://www.flickr.com/photos/grand_canyon_nps/5975537378/

Page 3: IASSIT Kansa Presentation

Thousand Flowers

● Open Context: Open access, open licensed data for archaeology

● Archiving by California Digital Library

● Persistent Identifiers (DOIs, ARKs)

● Web services

● NSF/NEH links for data management plans

Page 4: IASSIT Kansa Presentation

Thousand Flowers

Fills a Gap:

Most data sources are institutional. Open Context publishes individual, small group contributions

Page 5: IASSIT Kansa Presentation

Thousand Flowers

Challenge: Diverse contributions need substantial clean-up and “linking” to the Web of Data

Page 6: IASSIT Kansa Presentation

• 3-year project Oct 2010 – Sep 2013

• Funded with a National Leadership Grant from the Institute of Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse”

• Ixchel Faniel, PI & Elizabeth Yakel, Co-PI

http://www.dipir.org

Page 7: IASSIT Kansa Presentation

DIPIR Collaboration

Page 8: IASSIT Kansa Presentation

The Big DIPIR Questions

Research Questions

1. What are the significant properties of data that facilitate reuse by the designated communities at the three sites?

2. How can these significant properties be expressed as representation information to ensure the preservation of meaning and enable data reuse?

Page 9: IASSIT Kansa Presentation

Open Context Interviewees

• 22 Ph.D. or graduate students interviewed

– 13 men

– 9 women

• Novices / Experts

– 19 experts

– 3 novices

• Interviewees who were curators, or professors who also had a curatorial role = 6

Page 10: IASSIT Kansa Presentation

Raw Data is Unappetizing?

Page 11: IASSIT Kansa Presentation

Data Documentation Practices

“I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, "This is ridiculous."… I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)

Page 12: IASSIT Kansa Presentation

A long way to go before we get usable, intelligible data
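A rough sketch of what decoding such a private codebook involves, in Python. Only the pairing of 14 with sheep comes from the interview quote; the other codes, the column names, and the helper `decode_taxon` are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical codebook: only 14 -> "sheep" is taken from the interview quote.
# The remaining entries are invented for illustration.
CODEBOOK = {14: "sheep", 15: "goat", 16: "cattle"}


def decode_taxon(code, codebook=CODEBOOK):
    """Return the label for a numeric code, or flag it for editorial review."""
    return codebook.get(code, f"UNKNOWN CODE {code}")


# Two made-up spreadsheet rows; the second uses a code missing from the codebook.
rows = [
    {"specimen": "A-1", "taxon_code": 14},
    {"specimen": "A-2", "taxon_code": 99},
]

# Add a human-readable column next to the original code instead of replacing it.
decoded = [{**row, "taxon": decode_taxon(row["taxon_code"])} for row in rows]
```

Keeping the original code alongside the decoded label lets an editor trace any questionable value back to the contributor's own notation.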

Page 13: IASSIT Kansa Presentation

Sometimes data is better served cooked.

Page 14: IASSIT Kansa Presentation

Thousand Flowers

● Clean-up and document contributed data

● Map to ArchaeoML (general ontology)

● Mint URIs to entities (potsherds, projects, contexts, people)

● Link to important vocabularies / collections (Pleiades, Encyclopedia of Life)

● Working on CIDOC-CRM (RDF) representations (not straightforward)
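The URI minting and linking steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not Open Context's implementation: the base URI, the `mint_uri` and `link` helpers, and the target concept URI are all hypothetical placeholders. Only the SKOS `closeMatch` property URI is a real, published identifier.

```python
import uuid

# Hypothetical base URI; Open Context's real URI scheme is not shown in the slides.
BASE = "http://example.org/"

# Real SKOS property, used here as one possible way to express a vocabulary link.
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


def mint_uri(entity_type, project, local_id):
    """Mint a stable URI: the same inputs always yield the same identifier."""
    name = f"{BASE}{project}/{entity_type}/{local_id}"
    return f"{BASE}{entity_type}/{uuid.uuid5(uuid.NAMESPACE_URL, name)}"


def link(subject_uri, predicate, object_uri):
    """Represent a link to an external vocabulary as a plain triple."""
    return (subject_uri, predicate, object_uri)


# A made-up potsherd record linked to a placeholder external concept.
sherd = mint_uri("potsherd", "demo-project", "locus-12/sherd-3")
triple = link(sherd, SKOS_CLOSE_MATCH, "http://example.org/vocab/placeholder-concept")
```

Deriving the identifier deterministically from project and local ID means a re-import of the same contributed table does not mint a second URI for the same object.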

Page 15: IASSIT Kansa Presentation

Open Context: Record

Page 16: IASSIT Kansa Presentation

Open Context: Record

● XHTML + RDFa (Dublin Core, Open Annotation, etc.)

● XML (ArchaeoML)

● Atom

● RDF (draft CIDOC)

● Link to GitHub versioned file

Page 17: IASSIT Kansa Presentation

Open Context: Record

Page 18: IASSIT Kansa Presentation

Open Context: Record

Page 19: IASSIT Kansa Presentation

Open Context: Visualization of Data Linked to the EOL

Page 20: IASSIT Kansa Presentation

My Precious Data

Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright

Page 21: IASSIT Kansa Presentation

Data sharing as publication

Page 22: IASSIT Kansa Presentation

Data Publishing

Page 23: IASSIT Kansa Presentation

Data Quality and Standards Alignment

(1) Check consistency

(2) Edit functions

(3) Align to common standards (“Linked Data” if applicable)

(4) Issue tracking, version control
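As an illustration of step (1), a consistency check on one column might look like the sketch below. It groups values that differ only in case, punctuation, spacing, or word order, which is similar in spirit to the clustering an editor would do interactively in a tool such as Google Refine. The sample column and the function names are assumptions, not part of the presented workflow.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Normalize a value: lowercase, drop punctuation, sort and dedupe tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.strip().lower()).split()
    return " ".join(sorted(set(tokens)))


def find_inconsistencies(values):
    """Group raw values by fingerprint and keep groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Made-up column with the kinds of variation common in hand-entered spreadsheets.
column = ["Sheep", "sheep ", "Ovis aries", "aries, Ovis", "Goat"]
issues = find_inconsistencies(column)
```

Each group in `issues` is a candidate for an editorial decision; the check only surfaces variants and does not choose the correct spelling.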

Publishing

Page 24: IASSIT Kansa Presentation

Tools of the Trade

(1) Google Refine (check, edit, consistency)

(2) Mantis (issue-tracker, coordinate edits, metadata creation)

Publishing

Page 25: IASSIT Kansa Presentation

Tools of the Trade

(1) Domain scientists (Editorial Board) check data

(2) Iterative “coproduction” between contributors and editors

Publishing

Page 26: IASSIT Kansa Presentation

Publishing

Project Metadata

Column Descriptions

Page 27: IASSIT Kansa Presentation

Web of Data (2011)

Main Contributors:

● Institutions (esp. government)

● Thematic collections / projects

Page 28: IASSIT Kansa Presentation

Entity Reconciliation

(1) With Google Refine

(2) Implemented, EOL and Pleiades (gazetteer)

(3) Use existing mappings to improve future reconciliation
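Point (3) can be sketched as a lookup table that grows with each editorial decision. This is a minimal illustration, assuming a simple in-memory mapping; the labels, the `example.org` URIs, and both helper functions are hypothetical and do not reflect how Google Refine or Open Context store reconciliation results.

```python
def normalize(label):
    """Use a case- and whitespace-insensitive key for lookups."""
    return label.strip().lower()


def reconcile(labels, known_mappings):
    """Split labels into those already mapped to a URI and those needing review."""
    matched, unmatched = {}, []
    for label in labels:
        uri = known_mappings.get(normalize(label))
        if uri is not None:
            matched[label] = uri
        else:
            unmatched.append(label)
    return matched, unmatched


def record_decision(known_mappings, label, uri):
    """Store an editor's decision so the next dataset reconciles automatically."""
    known_mappings[normalize(label)] = uri


# Placeholder mapping from an earlier project (URIs are not real vocabulary entries).
mappings = {"sheep": "http://example.org/taxa/sheep"}

first_pass = reconcile(["Sheep", "Goat"], mappings)
record_decision(mappings, "Goat", "http://example.org/taxa/goat")
second_pass = reconcile(["Sheep", "Goat"], mappings)
```

After the editor's decision is recorded, the second pass matches both labels without further manual work.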

Publishing

Page 29: IASSIT Kansa Presentation

● CDL Archiving Service

● EZID for persistent Identity: DOIs (aggregate resources), ARKs (granular resources) and Merritt Repository

● Helps build trust in community

Page 30: IASSIT Kansa Presentation

● Platform / Services disciplinary communities can use for “Data Publishing”

● Different communities work out semantic/interoperability needs, editorial policies, incentives, etc.

University of California (System) Repository, All disciplines (UC-funded library, grants)

CDL as Infrastructure

Page 31: IASSIT Kansa Presentation

CDL as Infrastructure

Future data publisher

Page 32: IASSIT Kansa Presentation

eScholarship: UC’s OA Publishing Platform

Page 33: IASSIT Kansa Presentation

Platform for traditional publishing

Page 34: IASSIT Kansa Presentation

Also supports new genres

Page 35: IASSIT Kansa Presentation

Outcomes of Publishing Data:

(1) Communicate and set expectations about content and quality

(2) Organize workflows to improve data quality and usability

(3) Make “datasets” first class citizens in world of scholarly communications

Summary

Page 36: IASSIT Kansa Presentation

Final Thoughts

Publication needs to evolve!

(1) Participating in Linked Data is a great goal, but far removed from most everyday practice

(2) Researchers need help.

(3) 19th century publication norms are poorly suited to 21st century methods, research, and public goals