The Purdue University Research Repository: HUBzero Customization for Dataset Publication and Digital Preservation
![Page 1: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/1.jpg)
THE PURDUE UNIVERSITY RESEARCH REPOSITORY:
HUBZERO CUSTOMIZATION FOR DATASET PUBLICATION AND DIGITAL PRESERVATION
Amy Barton, MLS, Assistant Professor of Library Science, Metadata Specialist
Carly Dearborn, MSIS, Digital Preservation and Electronic Records Archivist
Neal Harmeyer, MLS, Digital Archivist
![Page 2: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/2.jpg)
WHAT IS PURR?
TECHNICAL AND INSTITUTIONAL INFRASTRUCTURE
![Page 3: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/3.jpg)
THE PURDUE UNIVERSITY RESEARCH REPOSITORY
A BRIEF OVERVIEW:
The Purdue University Research Repository (PURR) is a research collaboration and data management solution for Purdue researchers and their collaborators.
• Data management support
• A workspace for researchers to collaborate on research and publish datasets online
• Access to published datasets with a unique Digital Object Identifier (DOI)
• Long-term preservation component
• https://purr.purdue.edu
![Page 4: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/4.jpg)
THE PURDUE UNIVERSITY RESEARCH REPOSITORY
A CUSTOMIZED INSTANCE OF HUBZERO®
• PURR utilizes HUBzero as its foundation: https://hubzero.org
• Designed to facilitate virtual communities, online collaboration, research, and teaching
• Built on the open source LAMP (Linux, Apache, MySQL, and PHP) platform with the Joomla! Content Management System (CMS)
• PURR was specially customized for data stewardship, which includes a workflow for the curation, publication, dissemination, and preservation of datasets
• Unique customization of HUB software will be added to the base HUBzero package in the next release
![Page 5: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/5.jpg)
THE PURDUE UNIVERSITY RESEARCH REPOSITORY
COLLABORATIVE INSTITUTIONAL INFRASTRUCTURE
• Collaborative effort
  – Purdue University Libraries
  – Information Technology at Purdue (ITaP)
  – Office of the Vice President for Research (OVPR)
• Governed by an Executive Committee, Steering Group, and a Working Group
• PURR Libraries team
  – Project Director 50%
  – Digital Data Repository Specialist 100%
  – Two Software Developers 100%
  – Metadata Specialist 20%
  – Digital Archivist 25%
  – Two Graduate Assistants 50%
  – Graduate Assistant 25%
![Page 6: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/6.jpg)
DATA PRESERVATION
ISO 16363 & OAIS
![Page 7: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/7.jpg)
DIGITAL PRESERVATION IN PURR
THE DATA DELUGE
• Long-term data management plans required by many federal funding agencies
• Trustworthy repositories, sound metadata creation and capture, open standards for file formats, and information literacy vital to longevity of digital resources
• Working Group drafted the PURR Digital Preservation Policy using the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) as a guiding document.
• TRAC/ISO 16363 influenced documentation such as mission statement, policies, job descriptions, business plan, etc.
![Page 8: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/8.jpg)
DIGITAL PRESERVATION IN PURR
DEVELOPING POLICIES AND STRATEGIES
• PURR’s preservation mandate and its organizational commitment.
• PURR commits to preservation for a period of 10 years, after which the content is subject to the Libraries’ selection criteria and archival appraisal.
• Preservation strategies: full preservation, bit-level preservation, and no preservation.
• All objects receive bit-level maintenance, a DOI permanent identifier, PREMIS preservation metadata, onsite and offsite backups, regular virus checks, and regular rotation to new storage media.
• PURR accepts all file formats but recommends formats that are more sustainable long-term.
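Bit-level maintenance usually rests on recomputing a file's checksum and comparing it with the value recorded at ingest. The sketch below illustrates that idea only. The choice of SHA-256, the function names, and the sample file are assumptions, not details of PURR's actual tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """Compare the file's current digest with the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


def demo():
    """Record a digest, verify it, then simulate bit-level damage."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "dataset.csv"  # hypothetical dataset file
        sample.write_bytes(b"id,value\n1,42\n")
        recorded = sha256_of(sample)
        before = fixity_ok(sample, recorded)
        sample.write_bytes(b"id,value\n1,43\n")  # simulated corruption
        after = fixity_ok(sample, recorded)
    return before, after


print(demo())
```

A scheduled job could run `fixity_ok` over every stored file and raise an alert on the first mismatch.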
![Page 9: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/9.jpg)
DIGITAL PRESERVATION IN PURR
OAIS & DISTRIBUTED DIGITAL PRESERVATION
• The Open Archival Information System (OAIS) Reference Model is a standard in digital preservation and an ISO standard – ISO 14721
• Producers submit a content item for publication with appropriate Dublin Core metadata – this acts as the Submission Information Package (SIP)
• The Content Information (CI) is then bundled together with Preservation Description Information using Library of Congress specifications for BagIt. This is the Archival Information Package (AIP)
• Unlike in most OAIS repositories, the Dissemination Information Package (DIP) is derived not from the AIP but from the SIP.
• In February 2013, Purdue joined The MetaArchive Cooperative
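The BagIt packaging step for the AIP can be pictured with a small standard-library sketch. It writes the three parts a minimal bag needs: a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest. The BagIt version string, the SHA-256 manifest, and the file names are assumptions for illustration; the slides do not say which options PURR uses.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write a minimal BagIt bag: bagit.txt, data/, and manifest-sha256.txt."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        checksum = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{checksum}  data/{name}")
    # Bag declaration; version 0.97 is an assumed value.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def validate_bag(bag_dir):
    """Recompute every payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != checksum:
            return False
    return True


def demo():
    """Build a bag, validate it, then damage one payload file."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "aip-bag"  # hypothetical bag name
        make_bag(bag, {"survey.csv": b"q1,q2\nyes,no\n", "readme.txt": b"notes"})
        intact = validate_bag(bag)
        (bag / "data" / "survey.csv").write_bytes(b"tampered")
        damaged = validate_bag(bag)
    return intact, damaged


print(demo())
```

Because the manifest travels with the payload, any replica of the bag can be validated without contacting the source repository.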
![Page 10: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/10.jpg)
[Diagram: PURR system architecture. Labels: Designated Community; purr.purdue.edu; HUBzero, Joomla!, PHP, MySQL, Apache, Linux; SIP, DIP, AIP; BagIt; Media; Backup (MetaArchive); LOCKSS access.]
Diagram designed by: Sriram Kiran Valavala
![Page 11: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/11.jpg)
PURR METADATA
WEAVING OF STANDARDS FOR PRESERVATION
![Page 12: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/12.jpg)
TALKING POINTS
METADATA & AIP CREATION TOOL
• Metadata Overview
• Dataset Publication Process
• Archival Information Package Generation
• Metadata Generation
![Page 13: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/13.jpg)
DATASET METADATA
METADATA GOALS FOR PURR
• Capture all pertinent information about the dataset file for long-term preservation
• Descriptive metadata
• Administrative metadata
  – Technical metadata
  – Structural metadata
  – Rights metadata
  – Preservation metadata
![Page 14: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/14.jpg)
METADATA GENERATION
METADATA STANDARDS FOR PURR
• Metadata Encoding and Transmission Standard (METS)
  – Wrapper
• DCMI Metadata Terms (dcterms)
  – Descriptive metadata
• Metadata Object Description Schema (MODS)
  – Dataset ownership
  – Access condition
• Preservation Metadata: Implementation Strategies (PREMIS)
  – Preservation metadata
![Page 15: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/15.jpg)
METADATA GENERATION
WHY METS?
• METS acts as a structured container in which other standard metadata schemas can be referenced externally or embedded internally.
• Structure:
  – Descriptive Section <mets:dmdSec>
  – Administrative Section <mets:amdSec>
    – Technical Section <mets:techMD>
    – Rights Section <mets:rightsMD>
    – Digital Provenance Section <mets:digiprovMD>
  – File Section <mets:fileSec>
  – File Structure Section <mets:structMap>
[Figure labels: METS; DCTERMS; PREMIS; PREMIS & MODS]
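The METS section layout listed on this slide can be sketched as an empty skeleton built with Python's standard library. The `ID` values and the helper names are placeholders, and a real PURR record would embed dcterms, MODS, and PREMIS content inside these sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def mets_tag(name):
    """Return the namespace-qualified tag name for a METS element."""
    return f"{{{METS_NS}}}{name}"


def build_mets_skeleton():
    """Create the top-level METS sections in the order shown on the slide."""
    root = ET.Element(mets_tag("mets"))
    ET.SubElement(root, mets_tag("dmdSec"), ID="dmd-1")  # descriptive metadata
    amd = ET.SubElement(root, mets_tag("amdSec"), ID="amd-1")  # administrative
    for section in ("techMD", "rightsMD", "digiprovMD"):
        ET.SubElement(amd, mets_tag(section), ID=f"{section}-1")
    ET.SubElement(root, mets_tag("fileSec"))  # file inventory
    struct_map = ET.SubElement(root, mets_tag("structMap"))  # structure
    ET.SubElement(struct_map, mets_tag("div"))
    return root


skeleton = build_mets_skeleton()
section_names = [child.tag.split("}")[1] for child in skeleton]
print(section_names)
print(ET.tostring(skeleton, encoding="unicode"))
```

Each embedded schema would normally sit inside a `mets:mdWrap` / `mets:xmlData` pair under the matching section, as in the XML excerpts later in the deck.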
![Page 16: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/16.jpg)
METADATA GENERATION
WHY QUALIFIED DUBLIN CORE?
• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Our Digital Library Software Developer, Brandon Beatty, developed OAI-PMH functionality
• The code was submitted to the HUBzero development group and added to the core HUBzero code.
• HUBzero now comes standard with OAI-PMH functionality
• A contribution for the greater good
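An OAI-PMH harvester retrieves records with plain HTTP requests that carry a `verb` parameter. The sketch below only builds a `ListRecords` request URL. The base URL is a hypothetical placeholder, since the slides do not give the real endpoint path, and `oai_dc` is used because it is the metadata prefix every OAI-PMH repository must support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real PURR / HUBzero OAI-PMH path is not given here.
BASE_URL = "https://repository.example.org/oaipmh"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2013-01-01")
print(url)
```

A harvester would fetch this URL, parse the returned XML, and follow any `resumptionToken` it contains to page through large result sets.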
![Page 17: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/17.jpg)
METADATA GENERATION
WHY PREMIS?
• PREMIS is a robust preservation standard that captures digital preservation activities applied to a digital object.
• Intellectual Entity
  – A coherent set of content that is reasonably described as a unit (dataset).
• Objects
  – A discrete unit of information in digital form.
• Events
  – An action that involves at least one object or agent known to the preservation repository.
• Rights
  – Assertions of one or more rights or permissions pertaining to an object and/or agent.
• Agents
  – A person, organization, or software program associated with preservation events in the life of an object.
Data Dictionary for Preservation Metadata (http://www.oclc.org/content/dam/research/activities/pmwg/premis-final.pdf)
![Page 18: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/18.jpg)
METADATA GENERATION
PREMIS EVENTS FOR PURR

| Event Name | Event Description | Event Preservation Explanation |
| --- | --- | --- |
| capture | Initial capture of the publication data from the user, the first event in the event stream. | Preserving capture would help with HUBzero debugging, as it tells us when the SIP capture/creation process started. |
| in-revision | Generated when a SIP must be revised before it can be approved (it would occur between capture and validation). | Preserving in-revision would help with HUBzero debugging, as in-revision is a major change in the SIP status. Note that in-revision only occurs if the SIP is sent back to the author(s) for revision before it can be approved for AIP status. |
| validation | Validation of the SIP to ensure it is ready to become an AIP. | Validation is one of the major steps in a SIP's journey to AIP status, so it should be preserved. |
| ingestion | Creation of the AIP from the approved SIP. | Ingestion is the creation of the AIP, so it should be preserved. |
| fixity check | Periodic event where the fixity of the files in the AIP is re-validated. | Preserving fixity check would help with HUBzero debugging, as it tells when there is a problem with the preservation process. |
| replication | Copying the AIP bit-for-bit to another location for preservation purposes (as in LOCKSS). | As replication creates another copy of the AIP, it should be preserved for debugging purposes. |
| migration | Transforming the AIP and its contents into a more contemporary format. | As migration creates a newer, automatically generated version of the AIP, it should be preserved for debugging purposes. |

Data Dictionary for Preservation Metadata (http://www.oclc.org/content/dam/research/activities/pmwg/premis-final.pdf)
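The event descriptions imply an ordering: capture comes first, in-revision falls between capture and validation, ingestion follows validation, and the remaining events act on an existing AIP. A small checker can make that ordering explicit. The rules encoded below are an interpretation of the table, not a documented PURR or HUBzero rule set, and the function name is hypothetical.

```python
# Events that operate on an existing AIP, so they are assumed to follow ingestion.
POST_INGEST = {"fixity check", "replication", "migration"}


def check_event_stream(events):
    """Return a list of ordering problems; an empty list means none were found."""
    problems = []
    if not events or events[0] != "capture":
        problems.append("stream does not start with capture")
    seen = set()
    for event in events:
        if event == "in-revision" and "validation" in seen:
            problems.append("in-revision after validation")
        elif event == "ingestion" and "validation" not in seen:
            problems.append("ingestion before validation")
        elif event in POST_INGEST and "ingestion" not in seen:
            problems.append(f"{event} before ingestion")
        seen.add(event)
    return problems


good = ["capture", "in-revision", "validation", "ingestion",
        "fixity check", "replication"]
bad = ["capture", "ingestion", "fixity check"]
print(check_event_stream(good))
print(check_event_stream(bad))
```

Event types outside the table are ignored here rather than rejected, since a repository may record additional event types.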
![Page 19: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/19.jpg)
METADATA GENERATION
WHY MODS?
• Capture Dataset Ownership
• <name> The name of a person, organization, or event (conference, meeting, etc.) associated in some way with the resource.
• <affiliation> The name of an organization, institution, etc. with which the entity recorded in <name> was associated at the time that the resource was created.
• <role> Designates the relationship (role) of the entity recorded in <name> to the resource described in the record.
• <accessCondition> Information about restrictions imposed on access to a resource.
• <mods:accessCondition type="restriction on access">publically accessible </mods:accessCondition>
• <mods:accessCondition type="restriction on access">embargoed until 2015-06-30 </mods:accessCondition>
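The MODS elements described above can be assembled with Python's standard library as follows. This is a minimal sketch: the owner, affiliation, and role values are illustrative, and the exact element layout PURR emits may differ.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def mods_tag(name):
    """Return the namespace-qualified tag name for a MODS element."""
    return f"{{{MODS_NS}}}{name}"


def build_ownership(owner, affiliation, role, access_text):
    """Build a MODS fragment with one name entry and one access condition."""
    root = ET.Element(mods_tag("mods"))
    name = ET.SubElement(root, mods_tag("name"))
    ET.SubElement(name, mods_tag("namePart")).text = owner
    ET.SubElement(name, mods_tag("affiliation")).text = affiliation
    role_el = ET.SubElement(name, mods_tag("role"))
    ET.SubElement(role_el, mods_tag("roleTerm"), type="text").text = role
    access = ET.SubElement(
        root, mods_tag("accessCondition"), type="restriction on access"
    )
    access.text = access_text
    return root


# Illustrative values only; the role term is a hypothetical choice.
record = build_ownership(
    "Amy Barton", "Purdue University Libraries", "owner",
    "embargoed until 2015-06-30",
)
access_el = record.find(mods_tag("accessCondition"))
print(ET.tostring(record, encoding="unicode"))
```

Keeping the embargo date inside the element text, as the slide does, means a consumer has to parse the string to act on it.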
![Page 20: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/20.jpg)
METADATA GENERATION AIP CREATION
![Page 21: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/21.jpg)
METADATA GENERATION AIP CREATION
![Page 22: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/22.jpg)
METADATA GENERATION AIP CREATION
![Page 23: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/23.jpg)
METADATA GENERATION
AIP CREATION
• CC0 - Creative Commons
• Creative Commons Attribution Unported 3.0 License
• Creative Commons Attribution-NoDerivs Unported 3.0 License
• Creative Commons Attribution-NonCommercial-ShareAlike Unported 3.0 License
• Creative Commons Attribution-ShareAlike Unported 3.0 License
• Creative Commons Attribution-NonCommercial Unported 3.0 License
• Creative Commons Attribution-NonCommercial-NoDerivs Unported 3.0 License
![Page 24: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/24.jpg)
METADATA GENERATION
AIP CREATION
![Page 25: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/25.jpg)
METADATA GENERATION
AIP CREATION
PURR
Puuuurrrrrrrrrrr….
![Page 26: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/26.jpg)
METADATA GENERATION
PREMIS EVENT CAPTURED

    <mets:digiprovMD ID="METS-digiprovMD-premis-event-unpacking-20130312T112352-processId-17937-seq-1">
      <mets:mdWrap MDTYPE="PREMIS:EVENT">
        <mets:xmlData>
          <premis:event>
            <premis:eventIdentifier>
              <premis:eventIdentifierType>HUBzero</premis:eventIdentifierType>
              <premis:eventIdentifierValue>premis-event-unpacking-20130312T112352-processId-17937-seq-1</premis:eventIdentifierValue>
            </premis:eventIdentifier>
            <premis:eventType>unpacking</premis:eventType>
            <premis:eventDateTime>2013-03-12T11:23:52+00:00</premis:eventDateTime>
            <premis:eventDetail>tool: HUBzero</premis:eventDetail>
            <premis:eventOutcomeInformation>
              <premis:eventOutcome>unpackaged</premis:eventOutcome>
            </premis:eventOutcomeInformation>
            …
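The PREMIS event on this slide follows a regular pattern: the identifier combines the event type, a compact timestamp, a process ID, and a sequence number. The sketch below reproduces that pattern with Python's standard library. The PREMIS 2.x namespace URI and the helper names are assumptions, since the excerpt does not show its namespace declarations or the code that generated it.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumed namespace (PREMIS 2.x); the slide excerpt does not declare one.
PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def premis_tag(name):
    """Return the namespace-qualified tag name for a PREMIS element."""
    return f"{{{PREMIS_NS}}}{name}"


def event_identifier(event_type, when, process_id, seq):
    """Mirror the identifier pattern visible in the slide's example."""
    stamp = when.strftime("%Y%m%dT%H%M%S")
    return f"premis-event-{event_type}-{stamp}-processId-{process_id}-seq-{seq}"


def build_event(event_type, when, process_id, seq, outcome):
    """Build a premis:event element with the fields shown on the slide."""
    event = ET.Element(premis_tag("event"))
    ident = ET.SubElement(event, premis_tag("eventIdentifier"))
    ET.SubElement(ident, premis_tag("eventIdentifierType")).text = "HUBzero"
    ET.SubElement(ident, premis_tag("eventIdentifierValue")).text = (
        event_identifier(event_type, when, process_id, seq)
    )
    ET.SubElement(event, premis_tag("eventType")).text = event_type
    ET.SubElement(event, premis_tag("eventDateTime")).text = when.isoformat()
    ET.SubElement(event, premis_tag("eventDetail")).text = "tool: HUBzero"
    info = ET.SubElement(event, premis_tag("eventOutcomeInformation"))
    ET.SubElement(info, premis_tag("eventOutcome")).text = outcome
    return event


when = datetime(2013, 3, 12, 11, 23, 52, tzinfo=timezone.utc)
event = build_event("unpacking", when, 17937, 1, "unpackaged")
identifier = event.find(
    f"{premis_tag('eventIdentifier')}/{premis_tag('eventIdentifierValue')}"
).text
timestamp = event.find(premis_tag("eventDateTime")).text
print(ET.tostring(event, encoding="unicode"))
```

In a METS record this element would sit inside `mets:digiprovMD` / `mets:mdWrap` / `mets:xmlData`, as in the slide's excerpt.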
![Page 27: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/27.jpg)
METADATA GENERATION
METS WRAPPER & DC TERMS

    <mets:dmdSec ID="METS-dmdSec-doi__10.5072__FK250925">
      <mets:mdWrap MDTYPE="DC">
        <mets:xmlData>
          <mets:dcterms>
            <dcterms:creator>Amy Barton</dcterms:creator>
            <dcterms:date>2013-01-07T16:40:43-05:00</dcterms:date>
            <dcterms:description>projectName: Metadata Project</dcterms:description>
            <dcterms:description>projectAlias: metadata</dcterms:description>
            <dcterms:description>publicationState: Draft under review</dcterms:description>
            <dcterms:description>publicationVersion: 1</dcterms:description>
            <dcterms:description>abstract: A metadata workshop was developed based on subject liaison librarians’ feedback in a Qualtrics survey.</dcterms:description>
            <dcterms:description>notes: The dataset contains survey data.</dcterms:description>
            <dcterms:description>synopsis: Subject Librarian survey and resulting metadata workshop.</dcterms:description>
            <dcterms:format>BagIt</dcterms:format>
            <dcterms:identifier>doi:10.5072/FK250925</dcterms:identifier>
            <dcterms:publisher>Purdue University Research Repository</dcterms:publisher>
            <dcterms:rights>CC0 - Creative Commons</dcterms:rights>
            <dcterms:subject>Instruction</dcterms:subject>
            <dcterms:subject>Metadata</dcterms:subject>
            <dcterms:subject>Survey data</dcterms:subject>
            <dcterms:subject>Library Science</dcterms:subject>
            <dcterms:title>Metadata Madness Workshop:</dcterms:title>
            <dcterms:type>dataset</dcterms:type>
          </mets:dcterms>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:dmdSec>
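In the record on this slide, several `dcterms:description` elements carry a `label: value` pair (projectName, publicationState, and so on). A consumer could split those pairs back into named fields as sketched below. The `<record>` wrapper element and the function name are hypothetical, and the sample reuses only a few of the slide's values.

```python
import xml.etree.ElementTree as ET

DCTERMS_NS = "http://purl.org/dc/terms/"

# Hypothetical wrapper element holding a few descriptions from the slide.
SAMPLE = """<record xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:description>projectName: Metadata Project</dcterms:description>
  <dcterms:description>projectAlias: metadata</dcterms:description>
  <dcterms:description>publicationState: Draft under review</dcterms:description>
  <dcterms:description>publicationVersion: 1</dcterms:description>
</record>"""


def description_fields(record):
    """Split each 'label: value' description into a dictionary entry."""
    fields = {}
    for element in record.iter(f"{{{DCTERMS_NS}}}description"):
        label, _, value = (element.text or "").partition(":")
        fields[label.strip()] = value.strip()
    return fields


fields = description_fields(ET.fromstring(SAMPLE))
print(fields)
```

Splitting on the first colon only keeps values that themselves contain colons intact.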
![Page 28: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/28.jpg)
METADATA GENERATION METS TECHNICAL SECTION
![Page 29: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/29.jpg)
METADATA GENERATION METS TECHNICAL SECTION
![Page 30: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/30.jpg)
METADATA GENERATION METS RIGHTS
![Page 31: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/31.jpg)
METADATA GENERATION MODS DATASET OWNERSHIP
![Page 32: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/32.jpg)
METADATA GENERATION PREMIS AGENT
![Page 33: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/33.jpg)
METADATA GENERATION PREMIS EVENT
![Page 34: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/34.jpg)
METADATA GENERATION METS FILES AND STRUCTURE MAP
![Page 35: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/35.jpg)
PURR
Puuuurrrrrrrrrrr….
![Page 36: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/36.jpg)
PURR CONTACTS
WANT TO LEARN MORE?
• Visit https://purr.purdue.edu/
• Digital Data Repository Specialist: Courtney Matthews at [email protected]
![Page 37: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/37.jpg)
SPECIAL THANKS TO:
• Neal Harmeyer, Digital Archivist
• Brandon Beatty, Digital Library Software Developer
• Courtney Matthews, Digital Data Repository Specialist
• Mark Fisher, Digital Library Software Developer
![Page 38: The Purdue University Research Repository:](https://reader036.vdocuments.net/reader036/viewer/2022062501/568168b8550346895ddf9e64/html5/thumbnails/38.jpg)
REFERENCES
• Faniel, Ixchel M., Zimmerman, Ann (2011). “Beyond the Data Deluge: A Research Agenda for Large-Scale Data Sharing and Reuse.” The International Journal of Digital Curation 6(1): 59.
• Lee, C., and Tibbo, H. (2007). “Digital Curation and Trusted Repositories: Steps toward Success.” Journal of Digital Information. http://journals.tdl.org/jodi/index.php/jodi/article/view/229/183
• Klimeck, G., McLennan, M., Brophy, S.P., Adams, G.B., & Lundstrom, M.S. (2008). “nanoHUB.org: Advancing Education and Research in Nanotechnology.” Computing in Science and Engineering 10(5): 17, 19, 21.
• Witt, M. (2012). “Curation Service Models: Purdue University Research Repository.” Libraries and Staff Presentations. Paper 3. http://docs.lib.purdue.edu/lib_fspress/3
• Witt, M. (2012). “Co-designing, Co-developing, and Co-implementing an Institutional Data Repository Service.” Journal of Library Administration 52(2). DOI:10.1080/01930826.2012.655607