The Curator’s Approach to Data Management and Sustainability
Nic Weber & Megan Senseney
Center for Informatics Research in Science & Scholarship
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign
Digital Humanities at Oxford Summer School, 14-18 July 2014



Agenda

Data management
– ...as a DH technique
   •  “…valued ends…”
   •  “…available resources…”
– DMP Agency Mandates
– DMP beyond two pages

Sustainability
– Significant properties
– 2 Case studies in DH sustainability


“I’m trying to deflate the idea of digital humanities from a domain to an underlying set of practices” 6 July DH 2014


DM as a DH Technique

Many different techniques


Data Management as a DH Technique

“…the ensemble of practices by which one uses available resources in order to achieve certain valued ends.”

Harold Lasswell


Valued Ends

•  Preservation of knowledge (material artifacts that are produced, as well as ways of knowing)

•  Maximize the value of public investment

•  Increase the efficiency of doing digital humanities research – both immediate and long-term.


The Royal Society Science Policy Centre. (2012). Science as an open enterprise. Page 60.


Data management

•  Is highly personal
•  Interpersonal when collaborating
•  Intrapersonal in our relationship with institutions, organizations and funding agencies



Data management techniques include concerns of …

•  Planning (more in a bit) / Costing
•  Documentation
•  Formatting
•  Storage
•  Copyright / IP / Licensing


Documentation


Documentation : tricks and tips

•  Include a “header” line that describes the variables as the first line in the table.

•  Use plain ASCII text for your file names, variable names, and data values.

•  Develop naming schemes and record them.

•  When you export from an analysis environment (e.g., SPSS, R, Gephi), record the transformations in a separate readme_(filename).txt file.
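
As a rough illustration of the tips above, the following Python sketch writes a table with a header line, checks that names and values are plain ASCII, and records transformations in a readme file next to the data. The function name `export_with_readme`, the readme layout, and the choice to drop the file extension in `readme_(filename).txt` are assumptions made for this example, not part of the original slides.

```python
import csv
from pathlib import Path


def export_with_readme(rows, header, data_path, transformations):
    """Write a CSV with a header line plus a readme_(filename).txt beside it.

    Hypothetical helper: the readme layout is an assumption for illustration.
    """
    data_path = Path(data_path)

    # Tip: use plain ASCII for file names, variable names, and data values.
    values = [data_path.name, *header, *(str(v) for row in rows for v in row)]
    for value in values:
        if not value.isascii():
            raise ValueError(f"non-ASCII text found: {value!r}")

    # Tip: the first line of the table is a header describing the variables.
    with data_path.open("w", newline="", encoding="ascii") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

    # Tip: record transformations in a separate readme file.
    # Assumption: "(filename)" means the data file name without its extension.
    readme_path = data_path.with_name(f"readme_{data_path.stem}.txt")
    lines = [
        f"File: {data_path.name}",
        "Variables: " + ", ".join(header),
        "Transformations:",
    ]
    lines += [f"- {step}" for step in transformations]
    readme_path.write_text("\n".join(lines) + "\n", encoding="ascii")
    return readme_path
```

Calling `export_with_readme(rows, ["letter_id", "year"], "letters_1850.csv", ["dropped rows without a year"])` would, under these assumptions, leave `letters_1850.csv` and `readme_letters_1850.txt` side by side.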


Storage & Formatting!


Storage : DIY Cyberinfrastructure


Formatting & Storage: Tricks and Tips

•  Store data in nonproprietary software formats (e.g., comma delimited text file, .csv); proprietary software (e.g., Excel, Access) can become unavailable, whereas text files can always be read.

•  During analysis, keep an uncorrected (raw) copy of the data file. Do not make any corrections to this file; make corrections in a script instead.

Modified from: https://www.nceas.ucsb.edu/content/simple-guidelines-effective-data-management
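
One way to keep the raw file untouched is sketched below in Python: the script reads the raw CSV, applies corrections in memory, and writes a separate cleaned copy. The `CORRECTIONS` table, the column names, and the function name `apply_corrections` are hypothetical and only serve the illustration.

```python
import csv
from pathlib import Path

# Hypothetical corrections, keyed by (row identifier, column name).
CORRECTIONS = {
    ("L002", "year"): "1851",
}


def apply_corrections(raw_path, clean_path, id_column, corrections):
    """Read the raw CSV, apply corrections in memory, write a cleaned copy.

    The raw file is only ever opened for reading.
    """
    raw_path, clean_path = Path(raw_path), Path(clean_path)
    if raw_path.resolve() == clean_path.resolve():
        raise ValueError("refusing to overwrite the raw file")

    with raw_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    changed = 0
    for row in rows:
        for (row_id, column), new_value in corrections.items():
            if row[id_column] == row_id and row[column] != new_value:
                row[column] = new_value
                changed += 1

    with clean_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return changed
```

Because every correction lives in the script, the cleaned file can be regenerated from the raw file at any time, and the script itself documents what was changed.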


Copyright / IP slide


IP: Tricks of Trade

Melissa Levine’s Checklist on the DH Curation Guide: http://guide.dhcuration.org/legal/policy/#p05


Data Management Planning

•  Is highly social
   – Dialectic (optimal vs. practical)
   – Plans change


DMP Mandates (Funding Agencies)

AHRC
– Peer reviewed: Yes
– Components: Summary of Digital Outputs and Digital Technologies; Technical Methodology; Standards and Formats; Hardware and Software; Data Acquisition, Processing, Analysis and Use; Technical Support and Relevant Experience; Preservation, Sustainability and Use; Preserving Your Data; Ensuring Continued Access and Use of Your Digital Outputs
– Enforcement: Unclear

NEH
– Peer reviewed: Yes
– Components: Expected types of data; Period of data retention; Data forms and dissemination; Data storage and preservation
– Enforcement: Yes

EU
– Peer reviewed: No
– Components: Data set reference and name; Data set description; Standards and metadata; Data sharing; Archiving and preservation
– Enforcement: Sliding
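
The component lists on the DMP Mandates slide can double as a starting outline for a plan. The Python sketch below turns the NEH and EU components into a plain-text skeleton with one numbered section per component. The names `DMP_COMPONENTS` and `dmp_outline`, the placeholder text, and the output layout are assumptions for this example; the agencies' own guidance remains the authority on what a plan must contain.

```python
# Components copied from the NEH and EU entries on the DMP Mandates slide.
DMP_COMPONENTS = {
    "NEH": [
        "Expected types of data",
        "Period of data retention",
        "Data forms and dissemination",
        "Data storage and preservation",
    ],
    "EU": [
        "Data set reference and name",
        "Data set description",
        "Standards and metadata",
        "Data sharing",
        "Archiving and preservation",
    ],
}


def dmp_outline(agency, project_title):
    """Return a plain-text DMP skeleton with one numbered section per component."""
    components = DMP_COMPONENTS[agency]
    lines = [f"Data Management Plan: {project_title} ({agency})", ""]
    for number, component in enumerate(components, start=1):
        lines.append(f"{number}. {component}")
        lines.append("   [to be written]")  # hypothetical placeholder text
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dmp_outline("NEH", "Example Project"))
```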


AHRC Example Project: Kitchen Cosmology Project, University of Bristol. PI: Dr. Rita Langer. Link: http://bit.ly/1n0eVUn

NEH Example Project: A unified approach to preserving cultural software objects and their development histories, UC Santa Cruz. PI: Noah Wardrip-Fruin. Link: http://1.usa.gov/1kNxM8n


completed worksheets


Costing – Tricks and Tips

4C: Overview of 10 curation cost models: http://bit.ly/1lDMUFt “…provides a short description of each of the models and a presentation of their core features…”


More tricks of the trade slide

•  Advertise your data
•  Say how you would like it to be cited (paper? data? both?)
•  State known limitations (fit-for-purpose)
•  Rely on journals, repositories and colleagues for guidance
•  Don’t rely on journals, repositories or colleagues for guidance


SUSTAINABILITY

How do projects end?


Why this matters to DC

Fundamental questions of digital preservation:
1.  What must you retain to ensure the integrity and authenticity of the digital object?
2.  What can you lose without potential implications?


Significant Properties

“…characteristics of an information object that must be maintained to ensure that object’s continued access, use, and meaning over time as it is moved to new technologies.” (Wilson, 2007).


Five categories of SPs
•  Content
•  Context
•  Rendering
•  Structure
•  Behavior


Criteria for deciding significance

Grace, S. & Knight, G. (2008)


GLOBALIZATION AND AUTONOMY ONLINE COMPENDIUM

Case study 1 : Sustainability

Rockwell, Day, Yu, and Engel (2014). Burying Dead Projects: Depositing the Globalization Compendium. Digital Humanities Quarterly, 8(2). http://www.digitalhumanities.org/dhq/vol/8/2/000179/000179.html


Then we came to (planning for) the end

-  XML files with content;
-  A MySQL bibliographic database;
-  A metadata database of the content for generating topical pages and for searching;
-  A full text index for searching the text;
-  The code that handles the dynamic generation of the site, the searching, linking, and the XSL transforms;
-  Some HTML pages and CSS stylesheets;
-  And various images that are embedded in pages.

End of what?

http://globalautonomy.ca/global1/index.jsp


“The experience of the Compendium is that the intellectual work is not only in the individual articles, or even in the bibliographic data – it is in the interaction between these, mediated by code and in the user experience.”

Rockwell et al. 2014


What was deposited?

Content: …the texts, including bibliography, and glossary. We also considered the text on the HTML pages content.

Code: HTML, CSS, and includes the XSLT code that generated much of the interface

Process: …materials (but not all) that document the editorial processes, including the editorial backend that strictly speaking was not part of the Compendium as experienced.

The User Experience: …information about the experience of the Compendium as an interactive work by writing a narrative along with screen shots of typical use of the Compendium stored as PDFs


Five categories of SPs
•  Content
•  Context
•  Rendering
•  Structure
•  Behavior

Rockwell’s Categories
•  Content
•  Code
•  Process
•  User Experience


PERSEUS DIGITAL LIBRARY

Case study 2 : Sustainability


How would Perseus End? (hint – not by beheading Medusa)


RESOURCE LIST

Rockwell, Day, Yu, and Engel (2014). Burying Dead Projects: Depositing the Globalization Compendium. Digital Humanities Quarterly, 8(2). http://www.digitalhumanities.org/dhq/vol/8/2/000179/000179.html

Grace, S. & Knight, G. (2008). What are significant properties and why should I care? Presentation delivered at Digital Curation 101, October 7, 2008, Edinburgh, Scotland.