digital curation and higher education it: lessons from the national agenda for digital stewardship...


Upload: educause

Post on 29-May-2017


Page 1: Digital Curation and Higher Education IT: Lessons from the National Agenda for Digital Stewardship (216484591)


Digital Stewardship and Higher Education IT: Lessons from the National Agenda

Prepared for

NERCOMP Annual Conference

March 2014

Presented by:

Micah Altman, <[email protected]>

Director of Research, MIT Libraries

Non-Resident Senior Fellow, Brookings Institution


Capturing Contributor Roles in Scholarly Publications

DISCLAIMER: These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators.

Secondary disclaimer:

“It’s tough to make predictions, especially about the future!” -- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.


Preview
• Who are the NDSA?
• Why develop an agenda for digital stewardship?
• What should national stewardship priorities be?
  … research & foundations of stewardship … digital content … technical infrastructure … organizational roles
• Lessons for Higher Ed IT


Digital Stewardship and Higher Education IT: Lessons from the National Agenda


Collaborators & Co-Conspirators
• The 160+ institutional members of NDSA, and the 10000+ hours contributed by their representatives to NDSA working groups, meetings and reports

• National Agenda Authors:

Micah Altman, Jefferson Bailey, Karen Cariani, Jim Corridan, Jonathan Crabtree, Blaine Dessy, Michelle Gallinger, Andrea Goethals, Abigail Grotke, Cathy Hartman, Butch Lazorchak, Jane Mandelbaum, Carol Minton Morris, Trevor Owens, Meg Phillips, John Spencer, Helen Tibbo, Tyler Walters, Kate Wittenberg, Kate Zwaard


Who are the NDSA?


About the NDSA
• Founded in 2010, the National Digital Stewardship Alliance (NDSA) is a consortium of institutions that are committed to the long-term preservation of digital information.

• Our mission is to establish, maintain, and advance the capacity to preserve our nation's digital resources for the benefit of present and future generations.

• NDSA member institutions represent all sectors, and include universities, consortia, professional associations, commercial enterprises, and government agencies at the federal, state, and local levels. The Library of Congress provides organizational support and substantive collaboration as Secretariat.

• Based on collaborative community effort -- there are no fees for NDSA membership. Each member institution commits to NDSA principles, and contributes efforts to working groups, reports, surveys, meetings and other NDSA initiatives.


NDSA Initiatives


Working Groups: Recent Outputs

Extending Knowledge
• Preservation Storage Survey
• Web Harvesting Survey
• Preservation Staffing Survey
• Geospatial Selection & Appraisal report
• Content case studies
• NDSA Interview Series

Tools for Practice
• Levels of Preservation
• Digital Preservation in a Box
• Digital Preservation on Wikipedia

Dissemination
• National agenda for digital stewardship
• NDSA Innovation Awards
• NDSA Social Media


NDSA Member Organizations
• 165 Member Organizations
• From all sectors
• Committed to digital stewardship


digitalpreservation.gov/ndsa/memberslist.html


Why develop an agenda for digital stewardship?


Why a national agenda for digital stewardship?

• Effective digital stewardship is vital for:
  – maintaining authentic public records
  – growing a reliable scientific evidence base
  – providing durable access to our cultural heritage

• Knowledge of ongoing research, practice, and organizational collaborations is distributed widely across disciplines, sectors, and communities of practice


How was this accomplished?
• Contributed community effort
  – Development: contributions from the (now 150+) institutional members through working group participation, workshop discussion, and commentary
  – Writing: LC staff, chairs of NDSA working groups, coordination committee
  – Reviewing: expert reviewers in the preservation community

• Integrating diverse perspectives from multiple disciplines & sectors

• The persistence, organization, and commitment of the Library of Congress in its role as Secretariat


Why Now - Climate

Strong trends towards:
• More production of digital content
• More publishing, filtering and access
• More learners and collaborators
• More attention to public information


Trends in Higher Education Technology Will Increase Need for Information Stewardship

• Adoption Trends
  – Growing Ubiquity of Social Media
  – Integration of Online, Hybrid, and Collaborative Learning
  – Rise of Data-Driven Learning and Assessment
  – Shift to Students as Creators
  – Evolution of Online Learning
• Significant Challenges
  – Low Digital Fluency of Faculty
  – Scaling Teaching Innovations
• Important Developments
  – Learning Analytics
  – 3D Printing
  – Quantified Self


more information, in new forms, created by more people

need to manage, understand, and retain information for teaching, research, and evaluation

Requires curation at scale


Maximizing the Impact of Research through Research Data Management


Climate vs Weather
• Climate is what you should expect -- weather is what you get.
• The climate for reproducibility and data management seems favorable… prepare for shifts in the weather.


What Was Accomplished?

The National Agenda for Digital Stewardship identifies high-impact opportunities to advance:
• the state of the art
• the state of practice
• the state of collaboration


Foundations of Content Stewardship: Framework & Research


What is Content Stewardship?

• Stewardship involves taking broad responsibility for preservation and curation

• The goal of preservation is ensuring meaningful long-term access

• Example: If you have 1,000 files (bitstreams) and you’d like a 99.99% chance of still being able to access them in 20 years, how do you store them?
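The storage question can be made concrete with a back-of-the-envelope calculation. The sketch below rests on assumptions that are not from the slides and are deliberately simplistic: every copy of every file is lost independently, with a constant, made-up annual loss probability, and nothing is ever audited or repaired. The names `prob_all_files_survive` and `min_copies` are hypothetical.

```python
# Hypothetical assumptions, for illustration only:
#   - each stored copy of a file is lost independently,
#   - with a constant annual loss probability p_annual,
#   - and lost copies are never detected or repaired.


def prob_all_files_survive(n_files: int, copies: int, p_annual: float, years: int) -> float:
    """Probability that every file still has at least one intact copy."""
    p_copy_lost = 1.0 - (1.0 - p_annual) ** years  # one copy lost within the horizon
    p_file_lost = p_copy_lost ** copies             # every copy of one file lost
    return (1.0 - p_file_lost) ** n_files           # no file lost at all


def min_copies(n_files: int, target: float, p_annual: float, years: int, max_copies: int = 50) -> int:
    """Smallest number of copies that meets the target, or -1 if none up to max_copies."""
    for k in range(1, max_copies + 1):
        if prob_all_files_survive(n_files, k, p_annual, years) >= target:
            return k
    return -1


if __name__ == "__main__":
    k = min_copies(n_files=1000, target=0.9999, p_annual=0.01, years=20)
    print(f"copies needed under these assumptions: {k}")
```

Independence is exactly the assumption the following slides question, so a copy count derived this way is an optimistic lower bound rather than a recommendation.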


Why not store everything with Amazon?

• Why not put everything in Amazon?
• Amazon claims reliability of 99.999999999%
(Better odds than winning Powerball ®, being struck by lightning, and finding alien life… combined)


What’s left out of the Eleven Nines?
• What are the units? Collection? Object? Bit?
• How was the failure rate calculated? (It’s theoretical.)
  – MTBF + independence * enough replicas = lots of nines
  – But… no details for the estimate are provided; no historical reliability statistics are provided; no service reliability auditing is provided
• What is the empirical evidence for MTBF?
  – Storage manufacturers’ hardware MTBF (mean time between failures) figures are inaccurate…
  – Failures across hardware replicas are not independent
• What threats are assumed away?
  – software failure (e.g., a bug in the AWS software for its control backplane)
  – legal threats (leading to account lock-out, deletion, or content removal)
  – institutional threats (such as a change in Amazon’s business model)
  – process threats (someone hits the delete button by mistake, forgets to pay the bill, or AWS rejects the payment)
• Do SLAs or audits back up “design” reliability claims?
  – No claim to reliability in SLAs (or to uptime, availability, response time…)
  – Can’t even prove AWS has the content without taking it out!
  – Sole recovery for breach is limited to a refund of fees for periods the service was unavailable
  – No right to inspect Amazon logs, get assistance with forensics, etc.
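The "MTBF + independence * enough replicas = lots of nines" step can be illustrated with a toy model. The sketch assumes, hypothetically and not from any AWS documentation, that replicas fail independently except for a single common-mode event (for example, a shared software bug or an account lock-out) that destroys every replica at once. All parameter values are made up.

```python
import math


def annual_loss_probability(p_replica: float, replicas: int, p_common_mode: float = 0.0) -> float:
    """Probability that an object is lost in one year.

    Hypothetical model: each replica fails independently with probability
    p_replica, but a common-mode event with probability p_common_mode
    destroys all replicas at once.
    """
    independent_loss = p_replica ** replicas
    return p_common_mode + (1.0 - p_common_mode) * independent_loss


def nines(p_loss: float) -> float:
    """Express durability (1 - p_loss) as a count of nines."""
    return -math.log10(p_loss)


if __name__ == "__main__":
    for q in (0.0, 1e-6, 1e-4):
        p = annual_loss_probability(p_replica=0.01, replicas=3, p_common_mode=q)
        print(f"common-mode probability {q:g}: about {nines(p):.1f} nines")
```

With these made-up numbers, three independent replicas give about six nines, while a one-in-ten-thousand common-mode risk caps the result near four nines no matter how many replicas are added.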


And How Much Does it Really Cost?

• Glacier storage is relatively cheap
• Getting your data back is not, if you want it fast
• Creates lock-in and gotchas


Observations

• Digital preservation does not equal “backup”
• Ensuring long-term access requires ongoing evaluation and management of a broad spectrum of risks & costs
• Without attention, the digital evidence base will erode


The Problem - Restated
Keeping risk of object loss fixed -- what choices minimize $?

“Dual problem”: keeping $ fixed, what choices minimize risk?

Extension
For specific cost functions for the loss of an object, Loss(object_i), summed over all lost objects, what choices minimize:

$$\text{Total cost} = \text{preservation cost} + \sum_i E[\text{Loss}(\text{object}_i)]$$

[Figure: trade-off between risk and cost]

Are we there yet?
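The total-cost objective can be sketched numerically. Assuming, hypothetically, that each additional copy of the collection costs the same fixed amount, that an object is lost only when all of its copies are lost, and that copies fail independently with one shared probability, the total cost for a given number of copies is the preservation cost plus the expected loss summed over objects. The sketch simply searches over copy counts; all names and values are illustrative.

```python
def expected_total_cost(copies: int, cost_per_copy: float, p_copy_lost: float, loss_values: list[float]) -> float:
    """Preservation cost plus the sum over objects of E(Loss(object_i)).

    Hypothetical model: an object is lost only if all of its copies are lost,
    copies are lost independently with probability p_copy_lost, and every
    additional copy of the collection costs cost_per_copy.
    """
    preservation_cost = copies * cost_per_copy
    p_object_lost = p_copy_lost ** copies
    expected_loss = sum(p_object_lost * loss for loss in loss_values)
    return preservation_cost + expected_loss


def cheapest_copy_count(max_copies: int, cost_per_copy: float, p_copy_lost: float, loss_values: list[float]) -> int:
    """Number of copies (1..max_copies) that minimizes expected total cost."""
    return min(
        range(1, max_copies + 1),
        key=lambda k: expected_total_cost(k, cost_per_copy, p_copy_lost, loss_values),
    )


if __name__ == "__main__":
    losses = [1_000.0, 5_000.0, 20_000.0]  # made-up Loss(object_i) values
    for k in range(1, 4):
        print(k, expected_total_cost(k, 100.0, 0.05, losses))
    print("cheapest:", cheapest_copy_count(5, 100.0, 0.05, losses))
```

For these made-up values, two copies minimize total cost: one copy leaves too much expected loss, and a third copy costs more than the loss it averts.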


What are some threats?
• Physical & Hardware
• Software
• Insider & External Attacks
• Curatorial Error
• Organizational Failure


Threat Modeling: Bit Corruption

Corruption
• Media characteristics
• Threat characteristics
• Correlations
• Logical scope of corruption
• Format characteristics
• File/encoding characteristics
• Filesystem characteristics

Detection
• Auditing frequency
• Auditing algorithm

Repair
• Probability of successful repair
• Repair algorithm
• Repair frequency
• Repair duration
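The corruption, detection, and repair stages can be explored with a small Monte Carlo simulation. This sketch makes simplifying assumptions that are not from the slides: copies are corrupted independently with a fixed daily probability, audits run at a fixed interval and detect every damaged copy, and each detected copy is repaired from an intact one with a fixed success probability. It ignores the format, filesystem, and correlation factors listed above. The function name and every parameter value are hypothetical.

```python
import math
import random


def simulate_loss_rate(
    copies: int,
    p_daily: float,
    horizon_days: int,
    audit_interval_days: int,
    p_repair: float = 1.0,
    trials: int = 5_000,
    seed: int = 42,
) -> float:
    """Estimate the fraction of objects lost within the horizon.

    Hypothetical model: each intact copy is corrupted independently with
    probability p_daily per day; at the end of every audit interval all
    damaged copies are detected, and each is repaired from an intact copy
    with probability p_repair. An object is lost once no intact copy remains.
    """
    rng = random.Random(seed)
    # Chance that one intact copy is corrupted at some point during one interval.
    p_interval = 1.0 - (1.0 - p_daily) ** audit_interval_days
    # The horizon is rounded up to a whole number of audit intervals.
    n_intervals = math.ceil(horizon_days / audit_interval_days)
    lost = 0
    for _ in range(trials):
        intact = copies
        for _ in range(n_intervals):
            # Corruption: each intact copy may be damaged during the interval.
            intact -= sum(1 for _ in range(intact) if rng.random() < p_interval)
            if intact == 0:
                lost += 1
                break
            # Detection and repair: the audit finds every damaged copy.
            damaged = copies - intact
            intact += sum(1 for _ in range(damaged) if rng.random() < p_repair)
    return lost / trials


if __name__ == "__main__":
    ten_years = 3650
    no_audit = simulate_loss_rate(2, 0.001, ten_years, audit_interval_days=ten_years)
    monthly = simulate_loss_rate(2, 0.001, ten_years, audit_interval_days=30)
    print(f"no audit/repair: {no_audit:.3f}, 30-day audits: {monthly:.3f}")
```

Under these made-up parameters, two copies checked only once at the end of ten years are lost roughly 95% of the time, while the same two copies audited and repaired every 30 days are lost roughly 10% of the time.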


Methods for Mitigating Bit-Level Risk
• Physical: media, hardware, environment
• Number of copies
• Diversification of copies
• Formats
• File transforms: compression, encoding, encryption
• Fixity
• Repair
• Local storage
• File systems: transforms, deduplication, redundancy
• Replication
• Verification
• Audit
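Fixity checking, one of the verification and audit methods listed above, can be sketched with Python's standard `hashlib`. The example records a SHA-256 digest for every file under a directory and later reports files that are missing, altered, or unexpected. The manifest layout (a plain dictionary from relative path to hex digest) and the function names are assumptions for illustration; production systems typically keep manifests in a standard packaging format such as BagIt.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current contents of root against a previously built manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "altered": sorted(k for k in manifest if k in current and current[k] != manifest[k]),
        "unexpected": sorted(set(current) - set(manifest)),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("original contents")
        (root / "b.txt").write_text("more contents")
        manifest = build_manifest(root)
        (root / "a.txt").write_text("silently changed")  # simulate corruption
        (root / "b.txt").unlink()                         # simulate loss
        print(audit(root, manifest))
```

A fixity check only detects damage; acting on the report (repairing from another verified copy) is a separate step.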


Observations
• Blind replication is rarely a rational long-run strategy – even with lots of copies.
• Without verification/audit and repair strategies, long-term risk often remains high.
• There are multiple methods for mitigating threats to access – use these to guide diversification.
• Use threat/lifecycle modeling in order to make a rational choice.
• Better practices, models, and evidence are needed.


Research Priorities
• Applied Research for Cost Modeling and Audit Modeling
• Value of Information
• Understanding Information Equivalence & Significance
• Policy Research on Trust Frameworks
• Preservation at Scale
• The Evidence Base for Digital Preservation


What Else Do We Need to Know?
• What is the expected future value of a specified collection of digital content?
• What content is already being effectively stewarded by other organizations?
• What is the expected future cost of preserving that content?
• How often do different threats to information manifest? For example, how often:
  – do storage hardware or media fail?
  – do software errors cause information loss?
  – does stored information become inaccessible because of obsolete formats or loss of other contextual knowledge?
  – does human error or maliciousness cause loss of content in an information system?
• What is the reliability of current digital preservation networks and services?
• How successful are other proposed strategies for replication, monitoring, certification, and auditing at preventing loss due to these threats?


The Limits of Case Studies
Most current evidence for digital preservation practices and outcomes is based on local case studies and convenience samples.

• Case studies are useful for:
  – existence proofs
  – raising awareness of problems
  – process tracing
  – hypothesis generation
• Case studies are not enough to:
  – advance our scientific knowledge
  – create robust predictive models
  – test causal hypotheses
  – strongly guide decision making
• Systematic evidence is needed to support both:
  – general selection of digital preservation practices and methods
  – application of selected digital preservation methods in a specific operational context


How will we learn?
• Apply existing research methodologies from other fields -- especially fields involving observational research on humans and human systems
• Some useful methodologies:
  – probability-based surveys (e.g., of information management practice and outcomes)
  – replicable simulation experiments tied to theoretically grounded models of information management and risk
  – creation of testbeds and test corpora that can be used to systematically compare new practices, tools, and methods
  – field experiments, in which randomized interventions are applied and evaluated in real operational environments


Observations

• Developing better practices will require going beyond case studies – to formal modeling, computer simulation, statistical analysis, experiments


National priorities for… Digital Content


Selected Digital Content Areas that Challenge Curation

• Web and Social Media
• Electronic Records
• Moving Image and Recorded Sound
• Research Data
• “Big” Data


Goals of content curation
• Curation involves selection of content for retention, and management for use
• Selection requires predicting future value, in order to build an information portfolio that increases in value
• Management requires capturing and maintaining the tacit information that ensures fitness for use
• Content size, uncertain value, rapid change, unstable form, and external context are core challenges to curation
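Selection framed as building a portfolio can be sketched as a budget-constrained choice. The example below is one simple way to model it, not a method from the agenda: it treats selection as a 0/1 knapsack problem, assuming each candidate collection has a single hypothetical expected-value estimate and an integer stewardship cost, and that values simply add. Real valuation is uncertain and interdependent, so this is only a starting point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    expected_value: float  # hypothetical estimate of future value
    cost: int              # hypothetical stewardship cost, in whole budget units


def select_portfolio(candidates: list[Candidate], budget: int) -> tuple[float, tuple[str, ...]]:
    """Return the highest total expected value achievable within budget,
    together with the names of the selected candidates (0/1 knapsack)."""
    # best[b] holds (value, names) for the best selection costing at most b.
    best: list[tuple[float, tuple[str, ...]]] = [(0.0, ())] * (budget + 1)
    for c in candidates:
        # Iterate budgets downward so each candidate is used at most once.
        for b in range(budget, c.cost - 1, -1):
            value, names = best[b - c.cost]
            if value + c.expected_value > best[b][0]:
                best[b] = (value + c.expected_value, names + (c.name,))
    return best[budget]


if __name__ == "__main__":
    pool = [
        Candidate("A", 60.0, 10),
        Candidate("B", 100.0, 20),
        Candidate("C", 120.0, 30),
    ]
    print(select_portfolio(pool, budget=50))
```

With these made-up candidates and a budget of 50, ranking by value per unit cost would pick A and B (total value 160), while the exact solution picks B and C (total value 220): the best portfolio is not always built from the individually "best" items.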


Observations:
• The tacit information needed to understand formats is lost over time. Format migration plans are needed to mitigate risk.
• Information objects are rarely self-documenting; ensuring fitness for use requires metadata, provenance, “documentation”, rights, and authenticity.
• To select content for long-term access, we need to develop theoretically grounded and empirically tested models of information valuation and portfolios.
• Cost models for digital stewardship exist, but they are most accurate for collections of small, static digital objects in stable formats. Generally, a few things are clear:
  – Raw storage is rarely the limiting cost factor
  – Management of objects is cheapest and most effective if tacit information is captured early in the lifecycle


National priorities for… Technical Infrastructure


2014 Technical Infrastructure Priorities

• Interoperability and Portability in Storage Architectures
• Integration of Digital Forensics Tools
• Ensuring Content Integrity


Interoperability and Portability in Storage Architectures

• As stewardship organizations manage increasingly large and complex data sets, interoperability at various levels of the hardware and software stacks that make up digital preservation systems becomes increasingly important.

• Interoperability among storage devices, hardware, data tape, and file system software would help alleviate bottlenecks between distinct functions in workflows.

• There is a need to establish and promote technical means by which lower levels of the technology stack can integrate directly, without requiring extensive computation and processing at higher levels.


Integration of Digital Forensics Tools
• Digital forensics tools are essential for working across the range of heterogeneous kinds of digital materials coming under stewardship.

• Projects like BitCurator are pulling together the suite of tools to do this work and developing processes and workflows.

• We are now at the point of implementation; it’s time for organizations to start implementing and sharing information about their work.

• The result of this work will be large sets of heterogeneous digital files, which will in turn push the development of tools to work with these kinds of data at scale.


Ensuring Content Integrity
• Digital preservation is possible through a chain of migrations from current hardware and software systems to yet-to-be-established future infrastructures.

• Maintaining file fixity is a minimum requirement.
• Beyond file fixity, there is a need to ensure that the semantics of the data and the quality of representation remain unchanged when the object is represented in different forms.

• Identifying the significant semantic properties of the digital object, and algorithms to create semantic fingerprints can ensure that meaning is preserved over time.
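File fixity and semantic fixity can diverge: the same table saved as CSV or as TSV, or with "1.50" written as "1.5", has different bytes but the same meaning. The sketch below illustrates a semantic fingerprint under illustrative assumptions: the object is a table, only cell values are treated as significant (not delimiters or number formatting), numbers are rounded to a fixed number of significant digits, and the normalized result is hashed. It is a hypothetical toy, not an implementation of any published fingerprinting standard.

```python
import csv
import hashlib
import io


def canonical_value(cell: str, digits: int = 7) -> str:
    """Normalize one cell: numbers become a fixed exponent format rounded to
    `digits` significant digits; other text is stripped of outer whitespace."""
    text = cell.strip()
    try:
        number = float(text)
    except ValueError:
        return "s:" + text
    return "n:" + format(number, f".{digits - 1}e")


def semantic_fingerprint(data: str, delimiter: str) -> str:
    """SHA-256 over the normalized cell values, row by row."""
    h = hashlib.sha256()
    for row in csv.reader(io.StringIO(data), delimiter=delimiter):
        canonical_row = "\x1f".join(canonical_value(cell) for cell in row)
        h.update(canonical_row.encode("utf-8") + b"\x1e")
    return h.hexdigest()


if __name__ == "__main__":
    as_csv = "x,y\n1.50,2\n3,4.0\n"
    as_tsv = "x\ty\n1.5\t2.000\n3.0\t4\n"
    print("file-level digests equal:",
          hashlib.sha256(as_csv.encode()).digest() == hashlib.sha256(as_tsv.encode()).digest())
    print("semantic fingerprints equal:",
          semantic_fingerprint(as_csv, ",") == semantic_fingerprint(as_tsv, "\t"))
```

Deciding which properties are "significant" (here, values to seven significant digits) is the hard curatorial judgment; the hashing step is routine.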


Observations:
• Interoperability and portability across local and cloud storage architectures remain a significant issue – beware economic and technical lock-in
• Curation of objects acquired later in the information lifecycle often requires digital forensics – invest in tools and expertise
• Ensuring integrity of content over time requires assessing fixity at both the file and semantic levels


National priorities for… Organizational Development


State of the curation practice: Trusted Digital Repositories

An organization with a mission to provide reliable, long-term access to managed digital resources to its designated community, coupled with sufficient evidence of practices to ensure the success of this mission.

• Formalized in:
  – OAIS Reference Model (standardized in ISO 14721:2012)
  – Trustworthy Repositories Audit & Certification (TRAC) (standardized in ISO 16363:2012)


National Priorities for Organizational Roles, Policies, and Practices

Identifies the need for greater cross‐organizational cooperation to increase the impact and leverage of investments made by individual institutions.


Auditing Distributed Digital Preservation Networks

Potential Nexuses for Preservation Failure
• Technical
  – Media failure: storage conditions, media characteristics
  – Format obsolescence
  – Preservation infrastructure software failure
  – Storage infrastructure software failure
  – Storage infrastructure hardware failure
• External Threats to Institutions
  – Third-party attacks
  – Institutional funding
  – Change in legal regimes
• Quis custodiet ipsos custodes? (Who will guard the guards themselves?)
  – Unintentional curatorial modification
  – Loss of institutional knowledge & skills
  – Intentional curatorial de-accessioning
  – Change in institutional mission

Source: Reich & Rosenthal 2005
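Auditing across a network of replicas, rather than within one repository, can be sketched as comparing digests held by different institutions. The example is a simplified majority-vote comparison, loosely in the spirit of peer polling but not the actual LOCKSS protocol: site names and contents are hypothetical, a copy is trusted only if a strict majority of sites agree with it, and disagreeing copies are replaced from an agreeing one.

```python
import hashlib
from collections import Counter
from typing import Optional


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def poll_replicas(replicas: dict[str, bytes]) -> tuple[Optional[str], list[str]]:
    """Compare the digests held at each site.

    Returns (consensus digest, sites that disagree with it). If no digest is
    held by a strict majority of sites, returns (None, all sites) to signal
    that the disagreement needs manual review.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    digests = {site: digest(content) for site, content in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None, sorted(digests)
    return winner, sorted(site for site, d in digests.items() if d != winner)


def repair_minority(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Return a copy of replicas with disagreeing sites restored from a consensus copy."""
    consensus, damaged = poll_replicas(replicas)
    if consensus is None:
        raise RuntimeError("no majority; manual review needed")
    good = next(c for c in replicas.values() if digest(c) == consensus)
    return {site: (good if site in damaged else content) for site, content in replicas.items()}


if __name__ == "__main__":
    network = {"site-a": b"data", "site-b": b"data", "site-c": b"dat4"}  # hypothetical sites
    print(poll_replicas(network)[1])
    print(repair_minority(network))
```

Majority voting only helps when damage is uncorrelated; a shared software bug or coordinated attack that alters most copies the same way would win the vote, which is one reason institutional and curatorial failures appear alongside technical ones.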


Priorities for Organizational Collaboration

1) Provision networked preservation services – a network of preservation service providers with specialized services, rather than every organization performing all aspects of digital preservation -- a number of core risks are institutional

2) Collaborate on shepherding and promotion of standards – digital preservation community representation on the relevant standards bodies, rather than each organization needing to participate in every body

3) Share digital preservation training and staffing resources


Observations
• Trustworthy repository standards provide good abstract models of a single institution’s curatorial responsibilities, and an inventory of accepted practices

• Many threats to content require multi-institutional stewardship

• Certification of trustworthiness and evaluation of the impact of accepted practices are still in early stages

• Both intra- and inter-institutional collaboration is needed to provision preservation services, set standards, and establish and evaluate trustworthiness


What’s next?


A National Stewardship Agenda for 2015 and Beyond

• Drafts and update process starts this winter
• Community review process late spring
• An update will be presented in July at Digital Preservation 2014


Moving Digital Stewardship Forward

NDSA has a commitment to:

• Facilitating broad collaboration
• Promoting dissemination and engagement
• Regular updates and revisions of the National Agenda and core NDSA surveys


Want more information?

Contact NDSA for…
• Briefings, webinars, and consultations on the Agenda or other NDSA work
• Assistance in gathering comments on national policies and programs
• Assistance in recruiting experts for review and discussion panels; grant review
• Referrals to content stewards in specific areas


Observation: Principles
• The core of digital stewardship is taking broad responsibility for preservation and curation

• The goal of preservation is meaningful long-term access

• The principal activities of curation are selection and management for use


Observations: Planning
• Blind replication is rarely a rational long-run strategy – even with lots of copies.
• Without verification and repair strategies, long-term risk often remains high.
• There are multiple methods for mitigating threats to access – use these to guide diversification.
• Use threat/lifecycle modeling in order to make a rational choice.
• Developing better practices will require going beyond case studies – to formal modeling, computer simulation, statistical analysis, experiments.


Observations: Curation
• The tacit information needed to understand formats is lost over time. Format migration plans are needed to mitigate risk.
• Information objects are rarely self-documenting; ensuring fitness for use requires metadata, provenance, “documentation”, rights, and authenticity.
• To select content for long-term access, we need to develop theoretically grounded and empirically tested models of information valuation and portfolios.
• Cost models for digital stewardship exist, but they are most accurate for collections of small, static digital objects in stable formats. Generally, a few things are clear:
  – Raw storage is rarely the limiting cost factor
  – Management of objects is cheapest and most effective if tacit information is captured early in the lifecycle


Page 57: Digital Curation and Higher Education IT: Lessons from the National Agenda for Digital Stewardship (216484591)

Observations: Infrastructure

• Interoperability and portability across local and cloud storage architectures remain a significant issue – beware economic and technical lock-in.

• Curation of objects acquired later in the information lifecycle often requires digital forensics – invest in tools and expertise.

• Ensuring the integrity of content over time requires assessing fixity at both the file level and the semantic level.
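To make the file-level versus semantic-level distinction concrete, here is a minimal sketch, assuming a UTF-8 CSV table as the example object. File-level fixity is a SHA-256 digest of the exact bytes; the "semantic" digest is one hypothetical approach, hashing a canonical form of the parsed table so that representation-only changes (such as line endings altered during a storage migration) do not register as corruption, while changed values do. Real semantic fixity checks would depend heavily on the format and on what the designated community considers meaningful.

```python
import csv
import hashlib
import io
import json


def file_fixity(data: bytes) -> str:
    """File-level fixity: SHA-256 over the exact bytes as stored."""
    return hashlib.sha256(data).hexdigest()


def semantic_fixity(data: bytes) -> str:
    """Semantic-level fixity for a UTF-8 CSV table (illustrative assumption).

    Parses the table, trims whitespace around each cell, and hashes a
    canonical JSON serialization of the rows. Representation-only changes
    such as line endings leave the digest unchanged; changed values do not.
    """
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(data.decode("utf-8")))
    ]
    canonical = json.dumps(rows, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    original = b"id,value\r\n1,42\r\n"  # as deposited (Windows line endings)
    migrated = b"id,value\n1,42\n"      # same table after a storage migration
    altered = b"id,value\n1,43\n"       # a value has actually changed

    print(file_fixity(original) == file_fixity(migrated))          # False: bytes differ
    print(semantic_fixity(original) == semantic_fixity(migrated))  # True: same content
    print(semantic_fixity(original) == semantic_fixity(altered))   # False: content differs
```

Checking both digests lets a repository distinguish benign transformations from genuine loss of integrity, which a byte-level checksum alone cannot do.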




Key Terms

• Audit: an independent evaluation of records and activities to assess a system of controls.
• Authenticity: information used to verify the truthfulness of assertions about content or its provenance.
• Curation: selection of content for retention, and its management to ensure fitness for use.
• Content stewardship: broad responsibility for curation and preservation.
• File fixity: information used to verify that a digital object has not been altered or corrupted.
• Provenance: the chronology of the ownership, custody, operations on, and/or location of an information object.
• Preservation: ensuring meaningful long-term access.
• Trusted Digital Repository: an organization with a mission to provide reliable, long-term access to managed digital resources to its designated community, coupled with sufficient evidence of practices to ensure the success of this mission.
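The definitions of provenance and authenticity above fit together naturally: provenance is a chronology of custody and operations, and authenticity depends on being able to trust that chronology. The sketch below shows one common technique, a hash-chained provenance log, purely as an illustration; it is not something the National Agenda prescribes, and the class, field names, and sample events are hypothetical. Each event stores the digest of the event before it, so silently editing or deleting an earlier record is detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceLog:
    """Append-only chronology of custody of and operations on one object.

    Each event stores the SHA-256 of the previous event (a simple hash chain),
    so altering or deleting an earlier event breaks verification. The newest
    event has no successor to vouch for it; in practice its digest would be
    kept somewhere independent (hypothetical here).
    """
    events: List[dict] = field(default_factory=list)

    @staticmethod
    def digest(event: dict) -> str:
        # Canonical JSON (sorted keys) so the same event always hashes the same way.
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def record(self, when: str, agent: str, action: str, location: str) -> None:
        prev: Optional[str] = self.digest(self.events[-1]) if self.events else None
        self.events.append({"when": when, "agent": agent, "action": action,
                            "location": location, "prev": prev})

    def verify(self) -> bool:
        """True if every event's 'prev' matches the digest of the event before it."""
        if self.events and self.events[0]["prev"] is not None:
            return False
        return all(later["prev"] == self.digest(earlier)
                   for earlier, later in zip(self.events, self.events[1:]))


if __name__ == "__main__":
    log = ProvenanceLog()
    log.record("2014-01-10T09:00Z", "depositor", "created", "lab file server")
    log.record("2014-02-01T12:00Z", "repository", "ingested", "archival storage")
    log.record("2014-03-05T08:30Z", "repository", "format migrated", "archival storage")
    print(log.verify())                 # True
    log.events[0]["agent"] = "unknown"  # tamper with an earlier custody record
    print(log.verify())                 # False
```

A hash chain only shows that the record has not changed since it was written; it says nothing about whether the original entries were truthful, which is why audit of practices remains part of being a trusted repository.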


Bibliography

• Bailey, Charles (2011). Digital Curation and Preservation Bibliography. <digital-scholarship.org/dcpb/>

• CCSDS (2012). Reference Model for an Open Archival Information System (OAIS). <public.ccsds.org/publications/archive/650x0m2.pdf>

• Digital Curation Centre (2010-4). How-to Guides: <dcc.ac.uk/resources/how-guides>; Curation Reference Manual: <dcc.ac.uk/resources/curation-reference-manual>

• Giaretta, David (2011). Advanced Digital Preservation. <amazon.com/Advanced-Digital-Preservation-David-Giaretta>

• ISO (2012). ISO 16363:2012: Audit and certification of trustworthy digital repositories. <iso.org/iso/catalogue_detail.htm?csnumber=56510>

• Johnson, L., Adams Becker, S., Estrada, V., and Freeman, A. (2014). NMC Horizon Report: 2014 Higher Education Edition. Austin, Texas: The New Media Consortium.

• NDSA (2013). National Agenda for Digital Stewardship. <digitalpreservation.gov/ndsa/nationalagenda/>

• Rosenthal, David S. H., Thomas S. Robertson, Tom Lipkis, Vicky Reich, and Seth Morabito (2005). "Requirements for Digital Preservation Systems: A Bottom-Up Approach." D-Lib Magazine 11(11). <dlib.org/dlib/november05/rosenthal/11rosenthal.html>


More Information

digitalpreservation.gov/ndsa/nationalagenda

[email protected]


Digital Stewardship and Higher Education IT: Lessons from the National Agenda

Questions?

E-mail: [email protected]

informatics.mit.edu
