keep calm and curate

Keep Calm and Curate Easy steps to help you better manage your content and make it more useful Gareth Knight, Digital Curation Specialist [email protected] Food for Thought. University of Cambridge 27th June 2011

DESCRIPTION

Presentation given at the University of Cambridge for the English department's Food for Thought event on 27th June 2011

TRANSCRIPT

Page 1: Keep Calm and Curate

Keep Calm and Curate

Easy steps to help you better manage your content and make it more useful

Gareth Knight, Digital Curation Specialist

[email protected]

Food for Thought. University of Cambridge

27th June 2011

Page 2: Keep Calm and Curate

DIGITAL LIFECYCLE

We’ve created our research content

It has taken a lot of time & effort

I want to get maximum value from it

How can I ensure that it doesn’t go to an early grave?

Page 3: Keep Calm and Curate

WHY MANAGE YOUR CONTENT?

Researcher perspective
1. Protect value of content
2. Maximise visibility and impact of researcher
3. Enable continued development and use

Institutional perspective
4. Protect financial investment
5. Evidence of operation and impact
6. Compliance with appropriate regulations

Page 4: Keep Calm and Curate

MANAGEMENT CHALLENGES

• Inability to locate:
  • Data files have been lost or corrupted and an alternative copy cannot be found.
• Inability to access:
  • Data files cannot be decoded using available software
• Uncertainty over content:
  • Many data files exist, but it is unclear what constitutes the final product of the research process and what is a by-product of investigation.
• Inability to understand:
  • Data files can be accessed using appropriate software, but context of research content cannot be established
• Unclear usage:
  • Rights issues associated with publication and use of content are unclear: should the institution err on the side of caution?

Page 5: Keep Calm and Curate

RESEARCHER EXPERIENCE @ KCL

JISC PEKin project performed an assessment of six research & administrative departments within King’s College London during 2009/10:

Storage use:
• Staff were uncertain where to store data. Network drives did not offer sufficient capacity, resulting in use of local storage (external USB disks, USB sticks) that had no backups, or of 3rd party services unknown to the institution (e.g. DropBox)

Data encoding & conversion:
• Data formats: Uncertainty over correct file format to use to store data.
• Data conversion problematic: tools cause loss of some significant properties

Authenticity concerns: Questionable provenance:
• Staff do not understand origin of data. Many different copies with unidentified/unknown changes made by different authors
• Result: Staff store known good copy on local drive. Some rely upon print-outs of digital original

Archival value and retention period:
• Value of research papers understood, but value of datasets & other outputs not recognised (implications for REF & publication). Some data stored, others deleted

Page 6: Keep Calm and Curate

DATA STORAGE

Reality: ALL digital storage media are unreliable:
• Gradual degradation over time
• Unexpected failure through power surge, unexpected motion, and theft (as well as accidental washing)
• Media obsolescence: 5 ¼ and 3 ½ inch floppy disks, Zip disks, & many others
• 3rd party storage providers can close their service & delete your content

Practical approaches to take:
• Appraise: do you need to keep everything?
• Store content on at least 2 forms of storage in different locations, e.g. store 2 local copies, one on internal drive and one on USB stick, hard disk, etc., and at least one remote copy (e.g. departmental shared drive)
• Submit your research to the institutional repository
• Test your backups to ensure they are still valid.
• Copy data files to new media every 2-5 years after first creation

http://www.flickr.com/photos/timypenburg/5442288539/
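One possible way to "test your backups" is to compare checksums of the working copy against each backup copy. The sketch below is illustrative only and is not part of the original presentation; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(master: Path, backup: Path) -> dict[str, list[str]]:
    """Report files that are missing from, or differ in, the backup copy."""
    master_files = build_manifest(master)
    backup_files = build_manifest(backup)
    missing = [name for name in master_files if name not in backup_files]
    changed = [
        name
        for name, checksum in master_files.items()
        if name in backup_files and backup_files[name] != checksum
    ]
    return {"missing": missing, "changed": changed}
```

Calling compare_copies with the working directory and a backup directory (for example a folder on a USB disk) returns two lists; empty lists suggest the backup still matches the working copy at the time of the check.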

Page 7: Keep Calm and Curate

DATA ORGANISATION

• If someone examined your data for the first time, what would they wish to know?
  • What research collection is contained within the directory?
  • What type of information does it contain?
  • Where can I find specific content, e.g. final report, analysis data?

• Practical approaches to take:
  • Establish a directory structure that clearly distinguishes between groups of files (e.g. reports, photographs, etc.). Use sub-directories for sub-categories (e.g. topics, date, version)
  • Adopt a consistent approach to organising directories (across your department, if possible)
  • Label files in a manner that allows purpose, version and other relevant information to be quickly identified (e.g. using filename, cover page)

http://www.flickr.com/photos/amcclen/253640379/
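A consistent labelling scheme is easier to keep up if filenames are generated rather than typed by hand. The following is a minimal sketch of one such scheme; the ordering of the parts (project, description, date, version) and the helper names are assumptions made for illustration, not a convention recommended in the slides.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lower-case text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, version: int,
                   created: date, extension: str) -> str:
    """Compose a filename that shows project, content, date and version."""
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{created:%Y%m%d}_v{version:02d}.{extension.lower().lstrip('.')}"
    )


print(build_filename("Food for Thought", "Final Report", 3,
                     date(2011, 6, 27), "PDF"))
# prints: food-for-thought_final-report_20110627_v03.pdf
```

Putting the date in year-month-day order and zero-padding the version number means that an alphabetical directory listing also sorts the files chronologically.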

Page 8: Keep Calm and Curate

CHOOSING THE RIGHT FORMAT

How do you choose the correct file format to store your content?
• Each format has different capabilities and none is suitable for every task, e.g. MS Word is not suitable for web access.
• Some formats remove content or functionality to reduce file size & limit use, e.g. JPEGs lack detail, PDFs are difficult to edit
• Target audience may not see content in the same way that you do: each application interprets data and renders content differently, e.g. different fonts, layout changes

How do you ensure that content can be accessed in the long term?
• Format obsolescence: Gradual change may result in formats & older versions becoming difficult to access over time, e.g. MS Word, WordPerfect, older AutoCAD formats, complex objects. You may know that a file contains information, but what does it mean?
• Format conversion: Some content attributes may be changed or lost when converting between formats
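A simple way to detect loss during conversion is to convert, convert back, and compare with the original. The sketch below uses text encodings as a small stand-in for file formats (an assumption for illustration; real format conversions need format-specific checks), and the function names are hypothetical.

```python
def lossy_characters(text: str, target_encoding: str) -> list[str]:
    """List the distinct characters that cannot be stored in target_encoding."""
    lost = []
    for char in text:
        try:
            char.encode(target_encoding)
        except UnicodeEncodeError:
            if char not in lost:
                lost.append(char)
    return lost


def converts_cleanly(text: str, target_encoding: str) -> bool:
    """True if text survives a round trip through target_encoding unchanged."""
    converted = text.encode(target_encoding, errors="replace")
    return converted.decode(target_encoding) == text


# Sample text containing accented characters (e-acute and i-diaeresis).
sample = "Caf\u00e9 notes: na\u00efve r\u00e9sum\u00e9"
print(converts_cleanly(sample, "ascii"))   # False: accents are replaced by "?"
print(converts_cleanly(sample, "utf-8"))   # True: nothing is lost
print(lossy_characters(sample, "ascii"))
```

The same convert-and-compare idea applies to other content types, although the comparison step then has to look at the attributes that matter (colours, layout, formulae) rather than raw characters.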

Page 9: Keep Calm and Curate

DIFFERENCES IN SOFTWARE INTERPRETATION

[Screenshots: the same content as rendered by Microsoft PowerPoint and by Open Office Impress]

Page 10: Keep Calm and Curate

PHOTOGRAPH FORMATS

• Original photograph stored as TIFF: 79712 colours
• JPEG, 85% compression: considerable detail loss
• GIF, 256 colours: colour banding on petals

Page 11: Keep Calm and Curate

CHOOSING THE “RIGHT” FORMAT

Select different formats based upon needs, rather than a single format:
• Digital master: Preservation copy intended for long-term storage
• Dissemination: Access formats for use by specific users, e.g. PDF

• Format of the digital master:
  • Try to use common, widely used formats supported by a range of software tools.
  • Store content in formats that support required attributes (e.g. 16 million colours) and will not degrade when resaved. Ensure that you re-examine your file after you’ve saved it
  • Retain all data associated with the original creation/capture process, as it may contain information properties that are useful at a later date

Page 12: Keep Calm and Curate

POTENTIAL FORMATS

            Digital master                      Distribution copy
Plain Text  ASCII/Unicode text                  ASCII text, Unicode text
Document    Open Document, Rich Text Format,    PDF/A
            MS DocX (possibly)
Database    Comma Separated (CSV) or            MySQL, MS Access, FileMaker Pro
            Tab-delimited text (tab),           through appropriate front-end
            SQL Dump (possibly)
Photos      TIFF, PNG, RAW                      JPEG, PNG
Audio       AIFF, Microsoft Wave,               MP3, ASF
            FLAC (potentially)
Video       MPEG2 (as used on DVDs),            MPEG2, Quicktime, AVI, etc.
            JPEG 2000 in an MJPEG wrapper,
            MJPEG
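As an illustration of producing a CSV digital master from a database, the sketch below exports one table to a CSV file with a header row. It uses SQLite from the Python standard library as a stand-in (an assumption; the database products named above would need their own connectors), and the function name is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_table_to_csv(connection: sqlite3.Connection, table: str,
                        destination: Path) -> int:
    """Write every row of `table` to a CSV file with a header row.

    Returns the number of data rows written.
    """
    # Table names cannot be bound as SQL parameters, so check the name
    # against the database's own list of tables before using it.
    known = {
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")

    cursor = connection.execute(f'SELECT * FROM "{table}"')
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with destination.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

A CSV export keeps the values but not the relationships, data types or queries held in the database, so it is worth documenting those separately or keeping an SQL dump alongside it.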

Page 13: Keep Calm and Curate

DOCUMENTATION

Information necessary to interpret, understand and use a given dataset or set of documents

What would someone wish to know about your content?
• Who created it?
• When was it created?
• Why was it created? Who funded it?
• What is the source of the material used?
• What is the motivation for the approach you took?
• What content can be published?
• How can it be used?

Practical approach to take:
• Attach a cover page to your document with relevant creator & rights information
• Create a catalogue record for your digital repository
• Create an administrative file for internal use that can help colleagues and repository staff, and assign it an appropriate filename

Where to go for help:
DSpace@Cambridge Guidance: http://www.lib.cam.ac.uk/dataman/pages/metadata.html

http://www.flickr.com/photos/playingwithpsp/3031647963/
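The administrative file can be as simple as a small structured text file stored next to the data. The sketch below records answers to the questions listed above as JSON; the field names and the filename README_metadata.json are assumptions for illustration and do not follow any particular repository's metadata schema.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class DatasetRecord:
    """Answers to the questions a future user is likely to ask."""
    title: str
    creator: str
    created: str            # date of creation, e.g. "2011-06-27"
    purpose: str            # why the content was created
    funder: str
    sources: list[str] = field(default_factory=list)
    method_notes: str = ""  # motivation for the approach taken
    rights: str = ""        # what can be published and how it may be used


def write_record(record: DatasetRecord, directory: Path) -> Path:
    """Save the record as README_metadata.json alongside the data files."""
    target = directory / "README_metadata.json"
    target.write_text(
        json.dumps(asdict(record), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return target
```

Plain JSON (or plain text) is used here because it can be read without special software, in line with the advice on choosing formats for long-term storage.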

Page 14: Keep Calm and Curate

CONCLUSIONS

• A number of factors may limit use of your content over time. However, you can make choices that will enable your content to be accessed in the long term and be usable by others
• Ways to protect the value of your data:
  • Store your data in 2 or more locations
  • Organise it using an easy-to-understand structure
  • Adopt a digital master format that is fit for purpose
  • Document information that cannot be obtained elsewhere
• Support & documentation are available within the institution (DSpace@Cambridge) and externally to help you with choices

Page 15: Keep Calm and Curate

USEFUL REFERENCES

Cambridge http://www.lib.cam.ac.uk/dataman/

Glasgow http://www.gla.ac.uk/services/datamanagement/

Edinburgh http://www.ed.ac.uk/schools-departments/information-services/services/research-support/data-library/research-data-mgmt