Rethinking Repositories
Jason Casden, Head, Software Development (UNC University Libraries)
Julie Rudder, Repository Program Librarian (UNC University Libraries)
Paolo Mangiafico, Coordinator, Scholarly Communications Technology (Duke University Libraries)
Will Sexton, Head, Digital Curation and Production (Duke University Libraries)
CNI Spring Meeting
April 3, 2017
Clifford A. Lynch, “Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age,” ARL Bimonthly Report 226 (February 2003), 1-7. (PDF) “In the fall of 2002, something extraordinary occurred in the continuing networked information revolution…. The development of institutional repositories emerged as a new strategy….”
What goes into the repository?
What goes into the repository? 1/2
Lynch, 2003:
● “intellectual works of faculty and students - both teaching and research materials”
● “documentation of the activities of the institution itself in the form of records of events and performance(s) and of the ongoing intellectual life of the institution”
● “experimental and observational data captured by members of the institution that support their scholarly activities”
What goes into the repository? 2/2
DDR Mission Statement:
“...research data, scholarly output, digital collections, archival records, and other digital materials that provide enduring value for intellectual inquiry and documentation of University activities.”
Campus Services
Ginny Boyer SDIS
Will Sexton DCAPS Chair
Jim Tuttle Repository Architect
Maggie Dickson Metadata Architect
Herndon, DVS
Mangiafico, ScholComm
Gillispie, UA
Bragg, DDC
DDR Program Committee
Faculty Research Data: Provost’s mandate, data management planning, compliance
Digital Archives: ETDs, e-records, born-digital, patron requests
Digital Collections: New & legacy, access UI, AV, images, MSI
Open Access: Faculty pubs, DukeSpace, VIVO/Scholars@Duke
Metadata Advisory Group
Preservation Advisory Group
Library Collections
DDR Program Areas
● Research Data
● Scholarly Publications
● Library Collections
Each program area has its own collections policy and working group
Duke Digital Repository
The DDR is an umbrella service providing preservation, management, access, and discovery for five collecting or depositing areas that currently have staff with dedicated time and established (or developing) processes:
DDR Program Area | DUL Collecting/Depositing Area | Departmental Owner(s)
Research Data | Duke Faculty Research Data | Data & Visualization Services, Digital Curation & Production
Scholarly Publications | Electronic Theses & Dissertations | University Archives
Scholarly Publications | Open Access Publications | Scholarly Communications
Library Collections | Digital Collections | Digital Curation & Production
Library Collections | RL Born-Digital | Rubenstein Library
Duke Digital Repository
● Born-digital
● Digital collections
● Research Data
● Scholarly publications
● Electronic Theses & Dissertations
● Library general collections
● Scholarly products not data or peer-reviewed
Duke Digital Repository: Library general collections
“proposed additions to the DDR that originate with the collecting and curatorial activities of staff in the Libraries”
● Materials purchased or licensed by librarians
● Patrons may only use if we provide access
Scholarly products not data or peer-reviewed
DUL gets requests from campus to use the repository for a variety of items that our current policies don’t cover
● Student scholarship
● Non-peer-reviewed publications
● Archiving web sites for academic programs
● Digital objects related to exhibits
● Spawn of interdisciplinary centers
Is anyone doing these things?
Are we building collections for our libraries, are we providing services for members of our campus communities, are we doing both … or what?
From https://purr.purdue.edu/start “Get Started” @ Purdue University Research Repository
What does this mean?
UNC at Chapel Hill Libraries
Jason Casden & Julie Rudder
Rethinking repositories
History
Governance increasingly centered at the library
Digital Curation & Institutional Repository Committee (DC/IRC) formed (2005)
Partnership between School of Information and Library Science, Campus IT, and Libraries
Focus on specifying and developing a pilot preservation repository
Carolina Digital Library and Archives (CDLA) formed (2007)
DC/IRC reports out (2007-2008)
Carolina Digital Repository pilots launched (Spring 2009)
Carolina Digital Repository production launch (Fall 2009)
Carolina Digital Repository project shifted from CDLA to Libraries IT (2010)
CDLA dissolved (November 2013)
Julie Rudder hired (October 2015)
Jason Casden hired (March 2017)
Shifting scope
“The purpose of the UNC-CH Pilot IR is to demonstrate a system for the capture, storage, and dissemination of digital assets created and/or maintained by the UNC-CH community members. These digital assets represent the scholarly, intellectual, and creative contributions of UNC-CH community members, as well as represent valuable digital assets required for stewardship in accordance with the North Carolina Public Records Law.”
UNC-CH Pilot IR Service Definition Policy from DC/IRC Report (2007-2008)
Shifting scope
“The mission of the UNC Institutional Repository is to collect, preserve, and ensure continuing access to digital content of enduring value to the University”
Draft mission statement from DC/IRC Report (2007-2008)
Shifting scope
“An IR is intended as a solution for managing the diverse, unique digital assets produced at the University. … Such a system is intended to benefit the digital asset management needs and deficiencies of the University community in its entirety, composed of numerous stakeholder groups of administrators, faculty, researchers, staff and students, across a multitude of disciplines, and to allow for managed stewardship so that these assets remain accessible in the long-term. Further, an IR is intended to serve as a rich resource of University-produced and/or managed scholarly materials, and to make these assets discoverable over time by University community members and the academic and research community at-large.”
Executive summary from DC/IRC Report (2007-2008)
Shifting scope
“The Carolina Digital Repository (CDR) is a digital archives for materials produced by members of the University of North Carolina at Chapel Hill community. The main goal of the CDR is to keep UNC digital scholarly output safe and accessible for as long as needed. It also serves as a repository of historical materials that broadly support the University’s academic mission. More specifically, the CDR aims to acquire UNC digital material and ensure it is accessible, searchable and safe from alteration.”
“About the Repository” - https://blogs.lib.unc.edu/cdr/
Our current infrastructure
ContentDM for digitized materials (~1,500,000 objects)
Carolina Digital Repository for at-risk materials and IR (~650,000 objects)
We manage the two side by side: ContentDM for digitized collections (access focus) and CDR for born-digital materials and faculty deposit (preservation focus)
Filesystem storage for most digitized preservation masters
Thriving
Large, varied collection
Sophisticated customized preservation workflows around Fedora
Self-deposit features
Developed a community of researcher and departmental users of IR services
ETDs, Master’s papers, Honors Theses
Data sets
A/V materials
Publications
Satisficing
CDR has gradually become our de facto preservation repository
“At-risk digital materials” from Special Collections
All born-digital materials
A subset of digitized materials
Research outputs of the institution
Student papers
University Archives
...and also a staging area for unprocessed born-digital materials
Prompted by a storage crisis
Custom infrastructure
As complexity increases, we rarely fully benefit from common international efforts.
Services
“While guidance, training and engagement activities are a critical factor in achieving IR support, participation, use, and deposits, with the latter a major obstacle reported by many operational IRs, it is premature to act on such strategic plans unless a system is in place, or nearing completion, for demonstration and use”
Recommendations from DC/IRC Report (2007-2008)
UNC Open Access Policy
“Each Faculty member of the University of North Carolina at Chapel Hill grants to the University a nonexclusive, noncommercial, irrevocable, worldwide license to exercise, and to authorize others to exercise, any and all rights under copyright relating to each of his or her scholarly articles, in any medium, for the purpose of making those articles freely and widely available in an open access repository.”
January 1, 2016 - http://policies.unc.edu/policies/open-access/
UNC Open Access Policy
“The Scholarly Communications Office of the University Library, or other office designated by the Provost, will be responsible for interpreting this policy, resolving disputes concerning its interpretation and application, and recommending changes from time to time. The Scholarly Communications Office shall also create and maintain the repository terms of use governing public access to and use of the scholarly articles licensed under this policy.”
January 1, 2016 - http://policies.unc.edu/policies/open-access/
Institutional support for library services
Three new provost-funded positions: Institutional Repository Librarian, Open Access Librarian, Software Developer
One-time funds to support outreach and internal training
Renewing our focus
Redouble scholarly service development efforts
Reevaluate the benefits of community-developed software
Redefine the scope of our preservation infrastructure
Rethink our definition of success
Some of what we set out to do:
Learn from other universities.
Understand the local UNC community and needs.
Involve as many library staff as possible.
What we heard from faculty:
Care about getting content from behind paywalls.
Don’t care as much about getting already-open content into the repository.
Care about authors’ rights (they suspect this is the main thing their colleagues will care about).
The majority of faculty don’t yet know about the policy.
There are some basic IR-type features that are lacking from the current CDR.
The policy absolutely, positively cannot be a great burden to faculty.
When we talked to library staff and administration, priorities included:
Having lots of content in the repository.
Populating faculty profiles.
Showing the impact and the volume of UNC research.
A desire to really understand the policy and strategies to talk to faculty about the benefit of OA.
“Perhaps the most compelling arguments for open access policies, however, have nothing at all to do with the numbers but, instead, are driven by the anecdotal stories we hear of how open research makes a difference to people around the world.”
~Catherine Mitchell http://osc.universityofcalifornia.edu/2016/10/does-the-uc-open-access-policy-miss-the-mark-depends-on-which-mark/
“...in our preoccupation with low compliance rates for open access policies, we often forget to highlight the transformative rights declarations of these policies, which are purely independent of compliance and apply to all authors.”
~Catherine Mitchell http://osc.universityofcalifornia.edu/2016/10/does-the-uc-open-access-policy-miss-the-mark-depends-on-which-mark/
We decided to focus on four areas (3-year plan)
Harvest open content: Use automated tools, invest in sustainable community efforts, integrate with campus faculty activity tracker and profile system.
Paywalled content recovery: Create systematic way to find UNC publications and make them open. IR Librarian’s focus.
Open access education and outreach: Launch major campaign (hire marketing firm) around authors’ rights. Create a robust outreach and education program. OA Librarian’s focus.
Cultivate the value of openness (goes beyond policy): Prioritize work that makes our collections more open and usable (rights statements, better metadata and discovery). Clarify and increase support for data and other non-article content.
How does the OA policy relate to the scholarly record and to the IR as a whole?
Hyrax (formerly Sufia), a Hydra application: Data, Publications, Presentations, Student Work, ETDs.
Library Managed Collections (current CDR technology stack): Born-digital collections. At-risk digitization.
Why two systems?
● Very different users with different needs for ingest, management and access of content.
● Need for focused direction and governance.
● Distinct user groups and service management needs between use cases.
● Need for different policies
○ Preservation policies
○ Deletion/editing policies
○ Collection development policies
● Constant collection ingest and churn for Library Managed Collections (10,000+ files). Collections are getting bigger, files are getting bigger.
● Need to provide almost 24/7 system uptime for self-deposit users.
Near term work:
● Increasing our data curation services.
● Hire marketing consultants to launch campaign for open access.
● Create better metadata for EVERYTHING.
● Add rights statements to EVERYTHING.
● Train and invest in library staff.
● ORCID implementation.
● Needs assessment for Digital Collections.
● Launch and customize Hyrax
Some of our open questions:
● What level of preservation do we need for IR content?
● How does the heavy preservation focus affect what we accept into an IR?
● How will policies change for the new system?
● Governance model for 2 systems?
● What faculty activity tracking system will UNC adopt? We know this is the future.
● Can we use the current CDR system for all library managed content?
● Where are other institutions doing a good job of inserting themselves into the research life cycle?