  • 5/26/2018 Digital Curation Plan


    Research Data Curation and Lifecycle Management

    REPORT

    Submitted to: Pennington Biomedical Research Center

    Submission date: April 24, 2014

    Submitted by: Just in Time Data Solutions


    Prepared by:

    Jennifer

    Katie Schmitt

    Troy

    Katrina


    Table of Contents

    Executive Summary

    The Curation Lifecycle

    Getting Started

    Educate & Plan

    Receive & Pre-process

    Appraise & Select

    Secure

    Quality Assurance

    Store & Preserve

    Access, Use & Reuse

    Transform

    Glossary

    Workflow Diagrams

    Appendices

    Appendix A: Data Management Plan

    Appendix B: Recommended Metadata Schemas and Tools

    Appendix C: Descriptive Metadata Template

    Appendix D: Deposit Agreements

    Appendix E: Repository Software Recommendations

    Appendix F: Budget Tools


    Executive Summary

    This report has been prepared for Pennington Biomedical Research Center by Just in Time Data Solutions (JTDS) to assist in the storage and preservation of research data for current and future analytical use. It assigns responsibility for implementing the plan and its subsequent workflows to Pennington Biomedical's Library and Information Center staff. The report features nine sections, each relating to a phase of the lifecycle. It also includes a glossary defining all bolded terms, workflow diagrams, and six appendices. In this context, data is defined as the digital output of researchers and can include manuscripts, images, research data sets, supplemental computer code, and any supplemental lab reports or documentation.

    This report is based in part on the Digital Curation Center (DCC) Lifecycle Model (Figure 1). JTDS has used the DCC Lifecycle Model as the backbone of the plan. However, the recommendations presented here are those JTDS considers best for Pennington Biomedical, whether or not they conform to the DCC Model.

    Figure 1. Digital Curation Center (DCC) Lifecycle Model.


    The proposed data curation plan and lifecycle have been designed specifically for Pennington Biomedical research data. They more accurately reflect the action steps that take place in each phase (Figure 2).

    Figure 2. Pennington Biomedical Data Curation Lifecycle.


    The Curation Lifecycle: Getting Started

    Conducting a Data Audit

    Before a data curation plan can be implemented, the Library and Information Center staff should conduct a data audit. The audit will help staff understand Pennington Biomedical's potential digital holdings. It will also benchmark Researchers' willingness to transfer responsibility for research data.

    To conduct the audit, JTDS recommends that Library and Information Center staff begin with the Data Asset Framework, developed by HATII at the University of Glasgow in conjunction with the DCC. The Data Asset Framework provides example interview questions and web surveys that give institutions the means to "identify, locate, describe and assess how they are [currently] managing their research data assets."

    The data audit will also open a dialogue between Researchers and the Library and Information Center staff. This dialogue will help ensure a successful implementation of the curation plan outlined in this document.

    The Implementation Guide: http://www.data-audit.eu/docs/DAF_Implementation_Guide.pdf

    More on the Data Asset Framework: http://www.data-audit.eu/index.html

    Audit Trail

    JTDS strongly suggests that an audit trail be created immediately upon implementation of this data curation plan. A large portion of the curation process takes place outside of the repository software, so each step should be recorded as it is completed. This includes, but is not limited to:

    Date/time a digital item is received by the Library and Information Center staff

    Date/time the item is processed

    Name of the staff member who processed the item

    Date/time a repository record is created

    Name of the staff member who created the record

    The audit trail will help ensure that Library and Information Center staff complete the workflow in a timely and accurate manner. It will also provide detailed information if a digital item is lost or corrupted. JTDS recommends that specific audit trail information be captured both in an item's administrative metadata (see Appendix B) and in a secure spreadsheet.
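    The Python below is a minimal sketch of the spreadsheet half of this recommendation. It appends one row per workflow event to a CSV file that any spreadsheet program can open. The column names, the example file name, and the helpers log_audit_event and read_audit_trail are hypothetical; the plan does not prescribe a layout. A real log would still need to live on a secured drive.

```python
import csv
import os
from datetime import datetime

# Hypothetical column layout; the plan lists the events but not a file format.
AUDIT_FIELDS = ["timestamp", "file_name", "event", "staff_member"]


def log_audit_event(log_path, file_name, event, staff_member, when=None):
    """Append one audit-trail row, writing a header row if the log is new or empty."""
    if when is None:
        when = datetime.now()
    is_new = not os.path.exists(log_path) or os.path.getsize(log_path) == 0
    with open(log_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=AUDIT_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": when.isoformat(timespec="seconds"),
            "file_name": file_name,
            "event": event,  # e.g. "received", "processed", "record created"
            "staff_member": staff_member,
        })


def read_audit_trail(log_path):
    """Return every audit-trail row as a dictionary keyed by column name."""
    with open(log_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

    For example, log_audit_event("audit_trail.csv", "KMS_ObesityStudy_20140424.csv", "received", "K. Schmitt") records a receipt. Appending rows rather than rewriting the file leaves earlier entries untouched, which suits an audit trail.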


    Educate & Plan

    DCC Lifecycle Step: Conceptualise

    At the beginning of the data lifecycle, Library and Information Center staff should help ensure that Researchers create or collect data efficiently. Researchers should be made aware of Pennington Biomedical's digital curation workflow (Figure 2) and of the digital repository managed by the Library and Information Center staff. The roles of staff and Researchers in the workflow should be clearly defined.

    For the workflow to succeed, Researchers must know what is expected of them before they create or collect data and throughout the digital curation process. JTDS stresses the need for open and consistent communication between Researchers and Library and Information Center staff, because the curation process will be new and will evolve over time.

    JTDS recommends that Pennington Biomedical publish a website to facilitate communication about the data lifecycle. The site should give Researchers information on policies and procedures for the new repository, answer frequently asked questions, and promote consistent processes. Contact information and hours for Library and Information Center staff should be easy to find.

    Examples of similar websites:

    University of Michigan - ICPSR: http://www.icpsr.umich.edu/icpsrweb/deposit/index.jsp

    Purdue University Research Repository (PURR): https://purr.purdue.edu/about

    Data Management Plan

    JTDS encourages Library and Information Center staff to provide Researchers with a customized sample Data Management Plan (DMP). DMPs are required by many funding sources and are an integral part of open science. A sample DMP with details specific to Pennington Biomedical's repository will save Researchers time and will also assist in the grant application process (see Appendix A).


    Receive & Pre-process

    DCC Lifecycle Step: Create or Receive

    Immediately after a research or analysis project is completed, all data and supplemental files (including code) should be transferred to a secured shared drive. Both the project's Researchers and the Library and Information Center staff should be able to access this drive. This step may occur well before project results are published. Even so, JTDS believes that this proactive transfer of responsibility will encourage Researchers to make curation part of their workflow.

    If the research or analysis is published later, the peer-reviewed, pre-publication manuscript should also be transferred to the Library and Information Center staff upon notification of publication, along with any other supplemental figures and images. Library and Information Center staff should ask Researchers to sign a Deposit Agreement (see Appendix D) when Pennington Biomedical takes ownership of the digital files, if they have not done so already.

    File Formats

    Files should be delivered to Library and Information Center staff in a non-proprietary, uncompressed format:

    Research data should be stored as comma-separated values files (.csv).

    Manuscripts and all accompanying documentation should be stored as plain text (.txt).

    Images should be stored as TIFF files (.tiff).

    The majority of Pennington Biomedical research items are stored in proprietary Microsoft Office file formats (.doc, .docx, .xls, .xlsx). The Library and Information Center staff may therefore decide to accept data in these proprietary formats and convert it into the recommended formats later. If staff adopt this approach, it should be well documented and included in deposit agreements and audit trails. The original files should be kept for quality assurance purposes later in the lifecycle.
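    The Python below is a minimal sketch of how staff might triage deposits against these format rules. It assumes the file extension reliably indicates the format, which is not always true, since it checks names only and not file contents. The helper classify_format and its three outcome labels are hypothetical. Accepting .tif alongside .tiff is also an assumption.

```python
from pathlib import Path

# Recommended preservation formats named in this plan (.tif added as an assumption).
RECOMMENDED_SUFFIXES = {".csv", ".txt", ".tif", ".tiff"}
# Proprietary Microsoft Office formats the plan expects to convert later.
MIGRATE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx"}


def classify_format(file_name):
    """Return 'accepted', 'migrate', or 'review' based on the file extension only."""
    suffix = Path(file_name).suffix.lower()
    if suffix in RECOMMENDED_SUFFIXES:
        return "accepted"
    if suffix in MIGRATE_SUFFIXES:
        return "migrate"  # keep the original for later quality assurance
    return "review"  # unknown or compressed formats need a staff decision
```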

    File Names

    All digital files deposited by Researchers should follow Pennington Biomedical's file naming conventions. If a research item is not properly named, a member of the Library and Information Center staff should rename the file before proceeding. JTDS recommends the following conventions:

    The file name should include the researcher's name or initials, at least a portion of the title, and a date associated with the item.

    The file name should not exceed 207 characters.

    The file name should not include spaces or periods, except for a single period before the file extension.

    These naming conventions should be clearly documented on Pennington's repository website, maintained by the Library and Information Center staff.
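    The Python below is a minimal sketch of an automated check for these naming rules. It assumes dates are written as YYYYMMDD, which the plan does not specify. It cannot verify that the researcher's name or the title appears in the name. The helper file_name_problems is hypothetical and is not part of any repository software.

```python
import re
from datetime import datetime

MAX_NAME_LENGTH = 207  # limit recommended in this plan
# Assumption: dates appear as eight digits in YYYYMMDD form; the plan sets no format.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def _has_valid_date(text):
    """Return True if text contains an eight-digit run that is a real calendar date."""
    for match in DATE_PATTERN.finditer(text):
        try:
            datetime.strptime(match.group(1), "%Y%m%d")
            return True
        except ValueError:
            continue
    return False


def file_name_problems(name):
    """Return a list of naming-rule violations for a file name (empty if it passes)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    if name.count(".") != 1 or name.startswith(".") or name.endswith("."):
        problems.append("must contain exactly one period, before the extension")
    if not _has_valid_date(name.rsplit(".", 1)[0]):
        problems.append("no YYYYMMDD date found")
    return problems
```

    Staff could run this check on each deposit and rename only the files that return a non-empty list.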


    Resources for best practices of file naming:

    Stanford University Libraries: http://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-naming

    University of Leicester: http://www2.le.ac.uk/services/research-data/organise-data/naming-files

    University of Illinois at Urbana-Champaign: http://www.library.illinois.edu/dcc/pdfs/best_practicespdfs/02_best_practices_for_file_naming_opt.pdf

    Metadata

    Metadata facilitates the management, use, and retrieval of a digital object or record, and it is a critical part of the digital curation process. JTDS strongly recommends that metadata be created and validated soon after an object is first received. Doing so helps ensure accuracy and a complete audit trail while the depositing Researcher is still available to answer questions. Metadata schema recommendations, along with links to the specifications for these schemas, are included in Appendix B.

    Most descriptive metadata will be provided by the Researcher upon transfer of data to Library and Information Center staff. At a minimum, the Researcher should provide:

    Title

    Author

    Publisher (if applicable)

    Journal Name (if applicable)

    Abstract/Description

    Related Publications/Datasets

    Grant Support

    Researcher ID

    Null value (for data sets)

    Embargo Periods

    This information can be provided in a plain text template transferred with the research item; an example is provided in Appendix C. Library and Information Center staff should review the completed template for missing or inconsistent fields. Researchers should then be contacted to resolve any questions before the research item moves on to the Appraise & Select phase.
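    The review for missing fields can be partly automated if the Appendix C template uses one "Field: value" pair per line. That format is an assumption, because the appendix is not reproduced here. In this minimal Python sketch, the required-field list mirrors the minimum list above but omits the "if applicable" entries, and Null value is required only for data sets. The helpers parse_template and missing_fields are hypothetical.

```python
# Minimum descriptive fields from this plan, excluding the "if applicable" ones.
REQUIRED_FIELDS = ["Title", "Author", "Abstract/Description",
                   "Related Publications/Datasets", "Grant Support",
                   "Researcher ID", "Embargo Periods"]


def parse_template(text):
    """Parse 'Field: value' lines into a dict; a repeated field keeps its last value."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or free-text lines
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def missing_fields(fields, is_dataset=False):
    """List required fields that are absent or left empty in the submission."""
    required = REQUIRED_FIELDS + (["Null value"] if is_dataset else [])
    return [name for name in required if not fields.get(name)]
```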

    File Fixity

    For preservation and validation purposes, a checksum should be generated for each digital item when it is transferred to the Library and Information Center staff. Checksums support fixity checks throughout the preservation process. Many online tools can generate checksums, including Online MD5|SHA1, available at http://onlinemd5.com/. The checksum should be stored in the plain text metadata template for later use (see Appendix C).
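    As an alternative to an online tool, checksums can be computed locally with Python's standard hashlib module. This minimal sketch defaults to SHA-256, which is our assumption. hashlib can also produce the MD5 and SHA-1 values that the tool above generates. The helper name file_checksum is hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in 1 MiB chunks to limit memory use.

    algorithm can be any name hashlib.new accepts, such as "md5" or "sha1".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    Reading the file in chunks means very large data sets can be hashed without loading them fully into memory.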


    Appraise & Select

    DCC Lifecycle Step: Appraise & Select

    The Library and Information Center staff should create an official written collection policy that defines which data files will be stored in the Pennington Biomedical repository. JTDS recommends that, at minimum, staff store de-identified data and supplemental files related to studies conducted by Pennington Biomedical. Pennington Biomedical may also choose to store post-peer-review, pre-publication manuscripts and supplemental images. Distinctions that the collection policy may address include:

    Scope

    Purpose

    Documentation

    Accuracy

    Acceptable file formats

    Federal Mandates

    Because Pennington Biomedical is a federally funded organization, it may be affected by two mandates: the Open Data Policy set out in the Executive Order of May 9, 2013, and the Public Access Policy of the National Institutes of Health. With the help of Pennington Biomedical's legal counsel, the collection policy should ensure that Pennington Biomedical remains in compliance so that it can continue to receive federal funding.

    Deposited materials that meet the terms of the collection policy should move on to the Secure phase. The collection policy should also define the steps to take when deposited materials do not meet the requirements for long-term storage. JTDS recommends that Researchers be made clearly aware of these steps and that the steps be followed exactly. Grounds for rejection include personally identifiable or HIPAA-protected data. Such data should be either returned to the Researcher or securely disposed of.

    Examples of a collection policy include:

    Purdue University Research Repository (PURR): https://purr.purdue.edu/legal/collection-policy

    Western Australia Department of Health:

    http://www.health.wa.gov.au/CircularsNew/attachments/664.pdf


    Secure

    DCC Lifecycle Step: Ingest

    During the Secure phase of the lifecycle, each data file should be moved to a secured file folder or network drive that only the Library and Information Center staff can access. This ensures that any future manipulation of the object can be done only by Library and Information Center staff. The folder or drive should be backed up frequently, preferably at least once every 24 hours.

    At this point, JTDS recommends that Library and Information Center staff pause to confirm that each data file is ready to move on in the lifecycle. Two conditions should be met:

    Each file should be accompanied by a plain text metadata or description file, as outlined in the Receive & Pre-process phase.

    The Library and Information Center staff should have a signed deposit agreement from the Researcher. Appendix D explains deposit agreements and why they are an essential part of the curation lifecycle.

    Once a staff member confirms that these steps have been completed, the data files are ready for Quality Assurance.
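    The readiness check described above can be sketched in Python as follows. Both conventions in this sketch are hypothetical and do not come from this plan. The metadata file is assumed to sit beside the data file as <stem>_metadata.txt. Signed deposit agreements are assumed to be tracked as a set of data file names.

```python
from pathlib import Path


def ready_for_quality_assurance(data_file, agreements_on_file):
    """Return (ready, reasons) for one data file in the secured folder.

    Hypothetical conventions: the description file is '<stem>_metadata.txt' in the
    same folder, and agreements_on_file is a set of data file names with signed
    deposit agreements.
    """
    data_path = Path(data_file)
    reasons = []
    metadata_path = data_path.with_name(f"{data_path.stem}_metadata.txt")
    if not metadata_path.is_file():
        reasons.append(f"missing metadata file {metadata_path.name}")
    if data_path.name not in agreements_on_file:
        reasons.append("no signed deposit agreement recorded")
    return (not reasons, reasons)
```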


    Quality Assurance

    DCC Lifecycle Step: Preservation Actions

    To begin the preservation process, all data files should be reviewed to confirm that they are saved in uncompressed, non-proprietary formats and conform to Pennington Biomedical's file naming conventions. If a file was migrated from a proprietary format into a recommended format, the Library and Information Center staff should validate the result. To do so, staff should open both the new copy and the original file and compare them. As outlined previously, JTDS also recommends that audit trails be recorded consistently so that all changes and quality assurance steps are fully documented.

    Library and Information Center staff will create administrative and preservation metadata for each item. JTDS recommends METS for administrative metadata and PREMIS for preservation metadata; links to the specifications for these schemas are included in Appendix B. All metadata should be stored as XML.

    The researcher's descriptive metadata submission should be entered into the chosen descriptive schema and then wrapped inside the METS record. The METS record will also include the METS administrative metadata fields. The PREMIS record should be kept as a separate file. Both file names should match the file name of the data file, with either _METS.xml or _PREMIS.xml added to the end.

    Finally, a fixity (checksum) check should be performed to confirm that the transfer did not alter the file. If changes were made to the file, a new checksum should be generated and stored in the PREMIS file.
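    One way to combine the transfer and the fixity check is to copy the file and then immediately compare the copy's checksum with the value recorded at deposit. The Python below is a minimal sketch that uses only the standard library and assumes SHA-256 checksums. The helper transfer_with_fixity_check is hypothetical, and it deletes a failed copy rather than keeping it.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_fixity_check(source, destination_dir, expected_checksum):
    """Copy a file into destination_dir and confirm the copy matches the stored checksum.

    Returns the path of the verified copy. If the checksums differ, the copy is
    deleted and ValueError is raised so the failure can be logged in the audit trail.
    """
    destination = Path(destination_dir) / Path(source).name
    shutil.copy2(source, destination)  # copy2 also preserves file timestamps
    if sha256_of(destination) != expected_checksum.lower():
        destination.unlink()
        raise ValueError(f"fixity check failed for {destination.name}")
    return destination
```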


    Store & Preserve

    DCC Lifecycle Step: Store

    Ultimately, Pennington Biomedical's research items should be stored in an institutional repository with preservation capabilities for their long-term management. In the interim, however, it may be necessary to create a drive on the Pennington Biomedical network to act as a temporary repository. If an interim storage space is created, JTDS strongly recommends that Library and Information Center staff back up all files frequently, preferably at least once every 24 hours.

    Repository Software

    Appendix E explains the repository software recommendations in full. Many options and resources exist; this report presents only JTDS's recommended options.

    The storage location and repository software that Pennington Biomedical chooses will dictate a specific workflow for creating a repository record that stores each data file with its metadata. Whatever option is chosen, Library and Information Center staff should ensure the workflow is repeatable and consistent. Library and Information Center staff should consider the following:

    Pennington Biomedical Hosted Repository: Once items are prepared, a member of the Library and Information Center staff will create a record in the repository software. The staff member will enter the required metadata into the web form or upload the XML files, and then attach the object to the record. These steps should be documented on the data lifecycle website that the Library and Information Center maintains. This part of the website should state:

    which staff have administrative rights to create repository records (e.g., only full-time staff and management, or full-time staff, management, and graduate school trainees);

    the timeline for record creation (i.e., how long after an item is prepared and processed a repository record will be created);

    whether and how the researcher responsible for creating the item will be notified that a repository record has been created.

    It should also include screenshots with an explanation of the process for knowledge transfer purposes.

    Louisiana State University (LSU) Repository: LSU staff should answer the questions included in Appendix E. The answers should be documented in a standard operating procedure that outlines the entire process and is maintained by both LSU and Pennington Biomedical staff.

    Third Party Hosted Repository: Once items are prepared, a member of the Library and Information Center staff will create a record in the repository software. The staff member will enter the required metadata into the web form or upload the XML files, and then attach the object to the record. These steps should be documented in the online procedures manual that the Library and Information Center maintains. This part of the manual should state:

    which staff have administrative rights to create repository records (e.g., only full-time staff and management, or full-time staff, management, and graduate school trainees);

    the timeline for record creation (i.e., how long after an item is prepared and processed a repository record will be created);

    whether and how the researcher responsible for creating the item will be notified that a repository record has been created.

    It should also include screenshots with an explanation of the process for knowledge transfer purposes.

    Depending on how the repository infrastructure is implemented, it may not be possible to upload XML metadata files directly into the repository software. In that case, JTDS recommends storing the XML metadata files separately on the Pennington Biomedical network, in a file folder or drive that is backed up frequently, preferably at least once every 24 hours.

    Many of the repository infrastructures and software packages recommended in Appendix E require additional modules or add-on architecture to properly preserve objects in the repository. Appendix E also includes links to and overviews of selected modules and add-ons. This list is not comprehensive, but it presents a few of the best options for Pennington Biomedical.

    Review Schedules

    In addition to the add-on architecture, Pennington Biomedical should implement a review schedule for file formats, file fixity, and deposit agreements. Some of the add-on architecture for the repository software includes configurable automatic alerts for format obsolescence. JTDS recommends that this review take place at least once a year.
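    The yearly review can be tracked with simple date arithmetic. This minimal Python sketch assumes that each item's last review date is recorded somewhere, for example in the audit trail, and it treats 365 days as "yearly." The helper reviews_due is hypothetical and is not a feature of the add-on software mentioned above.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # "at minimum, yearly"


def reviews_due(last_reviewed, today):
    """Return sorted item names whose last review is at least a year old.

    last_reviewed maps an item name to the date of its most recent format,
    fixity, and deposit-agreement review; items never reviewed (None) are due.
    """
    return sorted(name for name, when in last_reviewed.items()
                  if when is None or today - when >= REVIEW_INTERVAL)
```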

    File fixity checks, using the stored checksums, will also need to be performed. JTDS recommends a fixity check every time a research item is transferred or moved. In addition, a random sample of at least one percent of research items should be checked every month to verify that items remain stable.
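    The monthly one-percent sample can be drawn as in this minimal Python sketch. It rounds the sample size up so that small collections still receive at least one check. It also takes the checksum function as a parameter, so it can be paired with whatever method produced the stored values. The helpers monthly_fixity_sample and failed_items are hypothetical.

```python
import random


def monthly_fixity_sample(item_ids, percent=1, rng=None):
    """Draw a random sample of at least `percent` percent of the items, never fewer than one.

    The size is rounded up with integer arithmetic, so 250 items at 1% gives 3.
    """
    items = list(item_ids)
    if not items:
        return []
    size = min(len(items), max(1, (len(items) * percent + 99) // 100))
    rng = rng or random.Random()
    return rng.sample(items, size)


def failed_items(sample, stored_checksums, compute_checksum):
    """Return the sampled items whose current checksum no longer matches the stored one."""
    return [item for item in sample if compute_checksum(item) != stored_checksums[item]]
```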

    Backups/Redundancy

    Backups should also be considered for each repository option. If a repository hosted by Pennington Biomedical is chosen, Pennington Biomedical will need to select either a cloud-based service or a second geographic location for backups of the entire repository and its contents. If the LSU repository or a third-party hosted solution is chosen, backup arrangements should be confirmed with the provider. The following questions provide high-level considerations:

    Where are the backups? (i.e., in a different geographical location and/or cloud-based?)

    How often do the backups occur?

    How many copies are made?

    What is the maximum down-time in case of an outage?

    How do these features affect the final cost of the repository solution?


    Access, Use & Reuse

    DCC Lifecycle Step: Access, Use & Reuse

    Most of Pennington Biomedical's research items must comply with the directives of the United States Office of Science and Technology Policy (OSTP) and the National Institutes of Health (NIH). These items must therefore be discoverable and reusable, with free access to metadata. Open access should encourage continued public-private collaboration and add to public knowledge, while protecting confidentiality and respecting proprietary interests.

    To be reusable, items must be stored in a machine-readable format (.txt, .csv, .tiff) and have appropriate metadata, as outlined in the Receive & Pre-process phase. Staff must take care to ensure that all open data has been de-identified and that embargo periods are respected.
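    The Python below is a minimal sketch of the two release conditions named above: de-identification and embargo expiry. The record layout (name, deidentified, embargo_until) is hypothetical. The sketch treats an item as releasable on its embargo end date. In practice, these decisions would be enforced through the repository software's access permissions.

```python
from datetime import date


def publicly_accessible(records, today):
    """Return names of records that are de-identified and not under an active embargo.

    Each record is a dict with hypothetical keys 'name', 'deidentified' (bool) and
    'embargo_until' (a date, or None when there is no embargo). An item becomes
    available on its embargo end date.
    """
    return [record["name"] for record in records
            if record["deidentified"]
            and (record["embargo_until"] is None or record["embargo_until"] <= today)]
```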

    All of the recommended repository software includes a web access component, so any item with the proper permissions can be made available through the web (see Appendix E). Users will access items through these web interfaces, which allows Pennington Biomedical to monitor usage statistics and check for proper citation. Only copies of items will be distributed to users.

    Any versioned or manipulated data file should be treated as a new file and enter the curation lifecycle from the beginning.


    Transform

    DCC Lifecycle Step: Transform

    As outlined in the Store & Preserve phase, a review schedule for format obsolescence should be implemented. If an item requires format migration, a new copy of the item should be created in the updated format. This copy should be treated as a new item and enter the curation lifecycle from the beginning.


    Glossary

    Checksum: An algorithmically computed numeric value for a file or set of files. It is used to validate the state and content of the file in order to detect accidental errors that may have been introduced during transmission or storage. The integrity of the data can be checked at any later time by re-computing the checksum and comparing it with the stored value. If the checksums match, the data is assumed not to have been altered.

    Collection Policy: A formal collection policy defines specific criteria to determine the value of data deposits. The policy ensures appraisal decisions are made in an open, consistent, and lawful manner. For more information, visit http://www.dcc.ac.uk/resources/how-guides/appraise-select-data#5

    Controlled Lots of Copies Keep Stuff Safe (CLOCKSS): A global long-term archive committed to open access. Scholarly publishers have agreed to make content available for free under a Creative Commons license if they can no longer supply it. The archive is distributed across 12 geopolitically and geographically diverse, long-lived steward libraries that have agreed to take on an archival role on behalf of the wider international community. For more information, visit http://www.clockss.org/clockss/Home.

    Crosswalk: A table or schema that maps one metadata standard to another, showing equivalent fields.

    Data: The digital output of researchers, which may include manuscripts, images, research data sets, supplemental computer code, and any supplemental lab reports/documentation. According to the DCC Lifecycle Model, data is any information in binary digital form. It can include simple digital objects (discrete digital items such as text files, image files or sound files, along with their related identifiers and metadata) or complex digital objects (discrete digital objects made by combining a number of other digital objects, such as websites). It can also include databases, which are structured collections of records or data stored in a computer system.

    Data Asset Framework: A system of interview questions and web surveys used by institutions to audit and assess data holdings and data management procedures. For more information, visit http://www.data-audit.eu/index.html

    Data Management Plan (DMP): A plan that is generated before a scientific study begins and is often included with grant applications. The plan states what data are to be created and managed and describes the specific plans for preservation and access. For more information, visit http://www.dcc.ac.uk/resources/data-management-plans.

    Deposit Agreement: A receipt of transfer signed at the time custody of digital files is transferred from researchers to the digital repository staff (see Appendix D for recommendations).

    Fixity: The property of a digital object being fixed or unchanged. Fixity information, such as checksums, provides evidence for the integrity and authenticity of digital objects and is essential to establishing trust.

    Institutional Repository (IR): A set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution. For more information, visit http://www.arl.org/storage/documents/publications/arl-br-226.pdf.

    Metadata: Structured, descriptive data or information about digital and physical objects or records. For more information, visit http://www.niso.org/publications/press/UnderstandingMetadata.pdf.

    Administrative Metadata: Metadata that describes how the information in a digital record is organized. This can include management metadata, such as when and how the record was created, file types, digital object identifiers (DOIs), and other technical information, as well as intellectual property metadata, including how proprietary data is protected and who can have access to it.

    Descriptive Metadata: Metadata that captures important characteristics about an object for discovery and identification. It can include elements such as title, abstract, author, and keywords.

    Preservation Metadata: Metadata that supports and documents the process of digital preservation. The term is usually reserved for metadata that specifically supports the functions of maintaining the fixity, viability, renderability, understandability, and/or authenticity of digital materials in a preservation context. For more information, visit http://www.dcc.ac.uk/sites/default/files/documents/resource/curation-manual/chapters/preservation-metadata/preservation-metadata.pdf.

    Open Access Policy: The National Institutes of Health enacted a Public Access Policy in 2008. This policy ensures that the public has access to the published results of NIH-funded research. It requires that final peer-reviewed manuscripts resulting from NIH-funded research be accessible to the public on PubMed Central to help advance science and improve human health. For more information, visit http://publicaccess.nih.gov/FAQ.htm#821.

    Open Data Policy: The Executive Office of the President defines open data as publicly available data structured in a way that enables the data to be fully discoverable and usable by end users. To read the policy, visit http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf.

    Repository Software: The technical infrastructure, or software package, for an institutional repository. Most repository software includes a web access portal, a database, and administrative portals for data management.


    Workflow Diagrams

    [Figure: Workflow for Receive & Pre-process Phase; Workflow for Quality Assurance and Store & Preserve Phases. Steps shown in the diagrams: Receive File, Migrate File Format, Rename File, Review Metadata, Generate Checksum, Review File, Create Administrative and Preservation Metadata, Move Descriptive Metadata to XML, Perform Checksum Check, Create Repository Record.]


    Appendix A: Data Management Plan

    A Researcher who is applying for funding will almost always need to write a data management plan that gives specific details about the repository where the data will be deposited. JTDS recommends that the Library and Information Center staff provide a sample DMP to encourage Researchers to deposit data through Pennington Biomedical and to ease the burden of the grant application process.

    Although each funding agency will have its own set of requirements, most DMPs will address the following points:

    What types of data will be created or collected?

    Which data will be retained?

    How will the data be managed and preserved (short and long term)?

    How will the primary data be shared?

    What factors may affect the ability to manage data?

        e.g., legal or ethical restrictions on non-aggregated data

    What other information should be preserved?

        e.g., code, supplemental files, metadata

    What formats will data be stored in?

    How will data be disseminated?

    What additional data management requirements apply?

    JTDS recommends that Library and Information Center staff use the DMP Tool (created by the California Digital Library), DMPonline (developed by the Digital Curation Centre), or templates provided by Columbia University Libraries to create the sample DMP.

    DMP Tool: https://dmp.cdlib.org/

    DMPonline: https://dmponline.dcc.ac.uk/

    Templates from Columbia University Libraries: http://scholcomm.columbia.edu/data-management/data-management-plan-templates/

    The following are sample plans provided by other institutions' repositories:

    http://www.northumbria.ac.uk/static/5007/ceispdf/dmpfull.pdf

    http://rci.ucsd.edu/_files/DMP%20Example%20Nitz.pdf

    http://www.irss.unc.edu/odum/contentSubpage.jsp?nodeid=570


    Appendix B: Recommended Metadata Schemas and Tools

    Descriptive Metadata Schemas

    Dublin Core - The Dublin Core Metadata Initiative (DCMI) maintains a metadata schema that features a small set of descriptive terms for web resources. For more information, visit http://dublincore.org/documents/dcmi-terms/.

    MODS - The Metadata Object Description Schema (MODS) is a schema for a bibliographic element set that may be used for a variety of purposes, particularly for library applications. For more information, visit http://www.loc.gov/standards/mods/mods-outline-3-5.html.

    Metadata Crosswalk from Dublin Core to MODS:

    http://www.loc.gov/standards/mods/dcsimple-mods.html

    NISO JATS - The Journal Article Tag Suite (JATS) is a National Information Standards Organization (NISO) standard that defines a set of XML elements and attributes for tagging journal articles... JATS is a continuation of the National Library of Medicine Archiving and Interchange DTD work begun in 2002 by the National Center for Biotechnology Information. For more information, visit http://jats.nlm.nih.gov/archiving/tag-library/1.1d1/ and http://jats.niso.org/.

    DataCite - The DataCite Metadata Schema is a list of core metadata properties chosen for the accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions. For more information, visit http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.0.pdf.

    Descriptive Metadata Recommendations

    Tables 1 and 2 present the recommended descriptive metadata schemas for manuscripts/publications, images, and datasets, as well as crosswalks between the schemas. Each item's metadata should include, at minimum, the fields listed in the tables. For publications and images, JTDS recommends a more robust schema such as NISO JATS in order to capture fields that support the special uses and reuses of Pennington Biomedical research items. For datasets, JTDS recommends either NISO JATS or DataCite. Although NISO JATS is usually reserved for manuscripts/publications, using the same schema to describe datasets and/or images¹ will help keep workflows consistent.

    ¹ Contextual metadata for images, such as the related publication/dataset field, should always be included. This type of information describes why a digital object was created and how it relates to or is distinguished from other digital objects, which is especially important for graphs and other images that can be misunderstood out of context.
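    The Move Descriptive Metadata to XML step in the workflow could look roughly like the sketch below, which serializes a record as Dublin Core elements using Python's standard xml.etree.ElementTree. The `<metadata>` wrapper element and the function name are hypothetical (simple Dublin Core defines no root element of its own); the namespace is the Dublin Core element set namespace. A `relation` element is one place to carry the contextual information the footnote recommends.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the "dc:" prefix

def dublin_core_xml(fields):
    """Serialize {element name: value or list of values} as Dublin Core XML.

    The <metadata> wrapper is a hypothetical container element.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        if isinstance(values, str):
            values = [values]  # allow a single string for single-valued fields
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

    For example, `dublin_core_xml({"title": "...", "creator": ["...", "..."], "relation": "..."})` emits one `dc:creator` element per author, which keeps repeated fields explicit.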


    Table 1. Publication and Image Descriptive Metadata Crosswalks

    Suggested Field | Dublin Core Fields | MODS | NISO JATS
    Title | Title | Title | Title
    Author | Creator | Name | Contributor / Contributor Group
    Publisher | Publisher | Publisher | Publisher
    Journal Name | - | - | Journal Title
    Volume | - | - | Volume
    Issue | - | - | Issue
    Date | Date | Date Issued / Date Created / Date Captured / Date Other | Date
    Page Range | - | - | Page Range
    DOI/Unique Identifier | Identifier | Identifier | Object ID
    Abstract | Description | Abstract | Abstract
    PubMed ID | - | - | Article ID
    PubMed Central ID | - | - | Article ID
    MeSH Terms | Subject | Subject | Kwd
    Related Publication or Dataset | Relation | Related Item | Related Article / Related Object
    Grant Support | - | Note (can be used with Note Type: funding) | Funding Source
    Researcher ID | - | - | Contributor Identifier
    File Type | Type | Type of Resource | Custom Meta
    File Format | Format | Physical Description | Custom Meta
    Copyright | Rights | Access Condition | Copyright Holder / Copyright Statement
    Embargo Period | - | Note (can be used with Note Type: restriction) | Custom Meta
    Open Access Statement | - | - | Open Access

    Table 2. Dataset Descriptive Metadata Crosswalks

    Suggested Field | Dublin Core | MODS | NISO JATS | DataCite
    Title | Title | Title | Title | Title
    Author | Creator | Name | Contributor / Contributor Group | Creator
    Publisher, if applicable | Publisher | Publisher | Publisher | Publisher
    Date | Date | Date Issued / Date Created / Date Captured / Date Other | Date | Publication Year (required); Date
    Abstract/Description | Description | Abstract | Abstract | Description
    DOI/Unique Identifier | Identifier | Identifier | Object ID | Identifier
    MeSH Terms | Subject | Subject | Kwd | Subject
    Related Publication or Dataset | Relation | Related Item | Related Article / Related Object | Related Identifier
    Grant Support | - | Note (can be used with Note Type: funding) | Funding Source | Description can be used
    Researcher ID | - | - | Contributor Identifier | Name Identifier
    File Type | Type | Type of Resource | Custom Meta | Description can be used
    File Format | Format | Physical Description | Custom Meta | Format
    Copyright | Rights | Access Condition | Copyright Holder / Copyright Statement | Rights
    Embargo Period | - | Note (can be used with Note Type: restriction) | Custom Meta | -
    Open Access Statement | - | - | Open Access | -
    NULL value | - | - | Custom Meta | Description can be used
    Version of Dataset | - | Note (can be used with Note Type: version identification) | Custom Meta | Version
    Size of dataset | - | - | Custom Meta | Size
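    Because each item's metadata should include at least the suggested fields, a simple completeness check could be run when metadata is reviewed. This Python sketch assumes records are dictionaries keyed by the Suggested Field labels of Table 2 and treats Publisher as optional because the table marks it "if applicable"; the record format and function name are hypothetical.

```python
# Suggested fields from Table 2; Publisher is omitted because it applies only when relevant.
DATASET_FIELDS = [
    "Title", "Author", "Date", "Abstract/Description", "DOI/Unique Identifier",
    "MeSH Terms", "Related Publication or Dataset", "Grant Support", "Researcher ID",
    "File Type", "File Format", "Copyright", "Embargo Period", "Open Access Statement",
    "NULL value", "Version of Dataset", "Size of dataset",
]

def missing_fields(record, required=DATASET_FIELDS):
    """Return the required fields that are absent or blank in a record, in table order."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat absent keys, None, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

    A non-empty result could be sent back to the depositing Researcher along with the Appendix C template, rather than rejecting the deposit outright.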


    Administrative Metadata Schema

    METS - The Metadata Encoding and Transmission Standard (METS) builds upon the work of the Making of America II project (MOA2), which provided an encoding format for descriptive, administrative, and structural metadata for textual and image-based works. METS provides an XML document format for encoding the metadata necessary both for managing digital library objects within a repository and for exchanging such objects between repositories (or between repositories and their users). For more information, visit http://www.loc.gov/standards/mets/METSOverview.v2.html.

    Preservation Metadata Schema

    PREMIS - The PREMIS Data Dictionary for Preservation Metadata is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. The PREMIS Editorial Committee coordinates revisions and implementation of the standard, which consists of the Data Dictionary, an XML schema, and supporting documentation. For more information, visit http://www.loc.gov/standards/premis/v2/premis-2-2.pdf.
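    To illustrate how a fixity check might be documented as preservation metadata, the sketch below builds a PREMIS-style event with Python's xml.etree.ElementTree. The element names follow the PREMIS version 2 event semantic units as we understand them (namespace info:lc/xmlns/premis-v2); the "pass"/"fail" outcome values, the UUID event identifier type, and the "local" object identifier type are assumptions, and the output should be validated against the official PREMIS schema before use.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "info:lc/xmlns/premis-v2"  # PREMIS version 2 namespace
ET.register_namespace("premis", PREMIS_NS)

def _el(parent, tag, text=None):
    """Append a namespaced PREMIS child element, with optional text content."""
    child = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        child.text = text
    return child

def fixity_event(object_id, checksums_match):
    """Build a PREMIS-style <event> recording the outcome of one fixity check."""
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = _el(event, "eventIdentifier")
    _el(ident, "eventIdentifierType", "UUID")  # assumed identifier scheme
    _el(ident, "eventIdentifierValue", str(uuid.uuid4()))
    _el(event, "eventType", "fixity check")
    _el(event, "eventDateTime", datetime.now(timezone.utc).isoformat())
    outcome = _el(event, "eventOutcomeInformation")
    _el(outcome, "eventOutcome", "pass" if checksums_match else "fail")  # assumed values
    link = _el(event, "linkingObjectIdentifier")
    _el(link, "linkingObjectIdentifierType", "local")  # hypothetical identifier type
    _el(link, "linkingObjectIdentifierValue", object_id)
    return ET.tostring(event, encoding="unicode")
```

    Recording a failed check as an event, rather than only raising an alert, leaves an audit trail showing when integrity problems were detected.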

    Additional Metadata Resources

    Minimum Information for Biological and Biomedical Investigations (MIBBI) is a portal to a large number of minimum information guidelines for various biological disciplines: http://www.dcc.ac.uk/resources/metadata-standards/mibbi-minimum-information-biological-and-biomedical-investigations

    Ontology for Biomedical Investigations (OBI): http://obi-ontology.org/page/Main_Page

    The DCC Digital Curation Reference Manual installment on Scientific Metadata, which provides an overview to help a scientific institution determine its metadata needs: http://www.dcc.ac.uk/sites/default/files/documents/Scientific%20Metadata_2011_Final.pdf


    Appendix C: Descriptive Metadata Template

    Name and Department: ________________

    Researcher ID: _______________________

    Today's Date: ________________________

    Are you transferring a manuscript, a dataset, supplemental images, or other? (Please highlight.)

    If you highlighted other, please describe:

    If you highlighted dataset, what value is used to represent null (missing) values in the data?

    What is the title of the item?

    Please list the author(s):

    If the item has been accepted for publication, please list the Publisher and Journal Name:

    Please provide the abstract or a description: (Be as complete as possible.)

    Is the item related to a publication or a dataset (i.e., is this a supplemental image for a publication, or does this manuscript have accompanying data)? If yes, please list the title, author, and date for the related publication or dataset:

    Was this item created with funding from the National Institutes of Health? If yes, please list all NIH grant numbers: (This will help ensure that your research items remain in compliance with open access requirements.)

    Does this item have any embargo periods that the Library and Information Center staff should be aware of? If yes, please describe the type and length:

    ___________________________________________________________________________

    **For Administrative Use Only** | Checksum value:


    Appendix D: Deposit Agreements

    Once the data audit process is completed and the technical infrastructure for this plan has been implemented, the Library and Information Center staff will need every Pennington Biomedical Researcher to sign a deposit agreement that transfers responsibility for their data files to the Library and Information Center staff and authorizes storage of those files in the new institutional repository, regardless of which repository software is chosen. This agreement should include a high-level outline of the workflows for metadata creation and preservation processing, and it should explain how copyright will be handled, especially for items that are open access. This may require additional investigation into the Researchers' publisher agreements and how publisher restrictions can or will be handled. The agreement should also include a statement granting the Library and Information Center staff proxy authority to upload the processed data files into the repository.

    The length of time for which the deposit agreement is valid should be based on the Researcher's position. Since Pennington Biomedical's tenure appointments are five years in length, tenured faculty should sign agreements that match the time remaining in her/his tenure. For example, a tenured researcher with three years remaining in her/his tenure should sign a three-year agreement, while a tenured researcher who has just started a new tenure appointment should sign a five-year agreement. Post-docs and adjunct researchers should sign shorter agreements, lasting only one or two years.
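    The rule of thumb above can be expressed as a small function. This Python sketch is hypothetical: the position labels, the assumption that between one and five whole years of tenure can remain, and letting staff choose one or two years for short-term appointments are illustrative, not institutional policy.

```python
def agreement_years(position, tenure_years_remaining=None, short_term_years=1):
    """Return a deposit-agreement length in years for a (hypothetical) position label."""
    if position == "tenured":
        if tenure_years_remaining is None or not 1 <= tenure_years_remaining <= 5:
            raise ValueError("tenured researchers need 1 to 5 years remaining")
        # Match the agreement to the time left in the five-year tenure appointment.
        return tenure_years_remaining
    if position in ("post-doc", "adjunct"):
        if short_term_years not in (1, 2):
            raise ValueError("post-doc and adjunct agreements last one or two years")
        return short_term_years
    raise ValueError(f"unknown position: {position!r}")
```

    For example, `agreement_years("tenured", 3)` returns 3, matching the three-year example in the text.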

    JTDS recommends that Pennington Biomedical consult its legal counsel on the exact wording; the following examples may help:

    http://www.lib.cam.ac.uk/repository/deposit_agreement.html

    http://www.unimelb.edu.au/copyright/umeragreement13August07.pdf


    Appendix E: Repository Software Recommendations

    A full explanation of the recommendations for repository software follows. Many options and resources exist; only the best options are presented here.

    Before getting started, becoming familiar with a high-level repository guide will help staff gauge which features are important to Pennington Biomedical and answer additional questions:

    JISC Digital Repositories InfoKit: http://tools.jiscinfonet.ac.uk/downloads/repositories/digital-repositories.pdf

    LEarning About Digital Institutional Repositories (LEADIRs) Workbook:

    http://dspace.mit.edu/bitstream/handle/1721.1/26698/Barton_2004_Creating.pdf?sequence=1

    The repository software recommendations are broken into three categories:

    Pennington Biomedical Hosted Repository - All hardware and technical infrastructure will be housed by Pennington Biomedical, and all IT support will be Pennington Biomedical's responsibility (Table 1).

    Louisiana State University Hosted Repository - All hardware and technical infrastructure will be housed by Louisiana State University. IT support may be shared between Louisiana State University and Pennington Biomedical, depending on the agreement signed between the two institutions (Table 2).

    Third-Party Hosted Repository - All hardware and technical infrastructure will be housed by a third party. These repositories can be either a stand-alone Pennington Biomedical repository or a shared repository to which researchers from many institutions deposit (Table 3).

    High-level pros and cons are presented for each repository software option. More information on comparing and contrasting repository software features can be found in the following guides:

    DCC's Preservation and Curation in Institutional Repositories:

    http://www.dcc.ac.uk/sites/default/files/documents/reports/irpc-report-v1.3.pdf

    Institutional repository software comparison: DSpace, EPrints, Digital Commons, Islandora and Hydra:

    https://circle.ubc.ca/bitstream/handle/2429/44812/Castagne_M_LIBR596_IR_comparison_2013.pdf?sequence=1

    United Nations Educational, Scientific, and Cultural Organization (UNESCO) Institutional Repository Software Comparison: http://unesdoc.unesco.org/images/0022/002271/227115E.pdf


    Table 1. Pennington Biomedical Hosted Repository.

    Islandora DSpace

    Documentation:https://wiki.duraspace.org/display/ISLANDOR

    A6131/Islandora

    Documentation:https://wiki.duraspace.org/display/DSDOC/All+Doc

    umentation

    Islandora Modules/Add-on Architecture:

    http://islandora.ca/resources/modules

    Pros

    Robust both in customization and

    documentation

    Easy, out-of-the-box implementation

    Open Source & Free Download Open Source & Free Download

    Well-documented organization/supportstructure with 79 implementations worldwide

    Well-documented organization/support structurewith over 1000 implementations worldwide

    Built-in relationship functionality to support

    links between Pennington Biomedical openaccess publications, supplementary files, anddata, while still allowing them to stand on their

    own as an object

    N/A

    Requires persistent identifiers, which canbenefit grant applications and renewals thatrequire open access

    Requires persistent identifiers, which can benefitgrant applications and renewals that require openaccess

    Easy to use interface for adding objects,changing metadata, and other administrative

    tasks

    Easy to use interface for adding objects, changingmetadata, and other administrative tasks

    Features batch ingest and workflow add-ontools

    Supports batch importing and METS packageimports, as well as metadata ingest from PubMed

    https://wiki.duraspace.org/display/ISLANDORA6131/Islandorahttps://wiki.duraspace.org/display/ISLANDORA6131/Islandorahttps://wiki.duraspace.org/display/ISLANDORA6131/Islandorahttps://wiki.duraspace.org/display/ISLANDORA6131/Islandorahttps://wiki.duraspace.org/display/DSDOC/All+Documentationhttps://wiki.duraspace.org/display/DSDOC/All+Documentationhttps://wiki.duraspace.org/display/DSDOC/All+Documentationhttps://wiki.duraspace.org/display/DSDOC/All+Documentationhttp://islandora.ca/resources/moduleshttp://islandora.ca/resources/moduleshttp://islandora.ca/resources/moduleshttp://islandora.ca/resources/moduleshttp://islandora.ca/resources/moduleshttp://islandora.ca/resources/moduleshttps://wiki.duraspace.org/display/DSDOC/All+Documentationhttps://wiki.duraspace.org/display/DSDOC/All+Documentationhttps://wiki.duraspace.org/display/DSDOC/All+Documentationhttps://wiki.duraspace.org/display/ISLANDORA6131/Islandorahttps://wiki.duraspace.org/display/ISLANDORA6131/Islandorahttps://wiki.duraspace.org/display/ISLANDORA6131/Islandora
  • 5/26/2018 Digital Curation Plan

    29/34

    29 | P a g e

    New modules include support for:

    BagIt - This module provides a Create Bag option that allows the packaging of the datastreams in Islandora objects.

    Checksum - A simple module that allows repository managers to enable the creation of a checksum for objects. If enabled, the following checksum algorithms are available: MD5, SHA-1, SHA-256, SHA-384, SHA-512. Note: this will checksum all datastreams.

    Basic PREMIS - This module produces XML and HTML representations of PREMIS metadata for objects in your repository. Currently, it documents all fixity checks performed on datastreams, includes agent entries for your institution and for the Fedora Commons software, and maps the contents of each object's rights elements in DC datastreams to equivalent PREMIS rightsExtension elements.

    Checksum Checker - This module verifies the checksums derived from Islandora object datastreams and adds a PREMIS fixity check entry to the object's audit log for each datastream checked.

    Allows tasks that assist in long-term preservation efforts to be run on items stored in the repository. Some examples include:

    applying a virus scan to item bitstreams

    identifying a collection based on format types, which can help with format migrations

    ensuring a given set of metadata fields is present in every item

    ensuring all item bitstreams are readable and their checksums agree with the ingest values

    Scholar Module allows for ingest from PubMed, as well as setting embargo periods and citation suggestions: http://islandora.ca/sites/default/files/Islandora%20Scholar%20Module%20-%20Islandora%20Camp%20NY.pdf

    Supports embargo periods

    N/A

    Allows versioning of items, but currently has restrictions on the versioning functionality: https://wiki.duraspace.org/display/DSDOC4x/Item+Level+Versioning


    Cons

    Implementation and installation would require heavy IT involvement. Any customizations would most likely require additional IT time or Library and Information Center staff familiar with XML.

    Implementation and installation would require heavy IT involvement. Any customizations would most likely require additional IT time or Library and Information Center staff familiar with XML, the command line, and SQL:

    http://www.dspace.org/sites/dspace.org/files/dspacehowtoguide.pdf

    Out-of-the-box descriptive metadata support is only for Dublin Core and MODS. Though automatic generation of technical metadata is supported through an additional add-on, it is limited to integration with FITS. New Content Models would need to be created for any other metadata standards.

    Out-of-the-box support for descriptive, administrative, and structural metadata uses a custom DSpace schema. Other metadata schemas would require customized ingest forms.

    Out-of-the-box format support is limited to PDFs, video, audio, images, and books (TIFFs). This creates a need to build custom content models for Pennington Biomedical Excel/CSV data files, as well as text manuscript items.

    Default bitstream formats do not include comma-separated values files (.csv), only Excel spreadsheets (.xls).

    No automatic support for any preservation actions. These steps would need to happen outside of the repository, requiring additional policies, workflows, and staff time.

    Checksums are the only built-in preservation action; there is no automatic support for any other preservation actions. These steps would need to happen outside of the repository, requiring additional policies, workflows, and staff time.
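    Because packaging and fixity checking may need to happen outside the repository, the sketch below shows one way staff could script those steps. It is a minimal Python example, not part of Islandora or DSpace. It assumes a local folder of research files, copies it into a BagIt-style package (a data/ payload directory, a bagit.txt declaration, and a manifest-<algorithm>.txt checksum list, following the layout in RFC 8493), and later recomputes the checksums to report any file that is missing or has changed. The function names make_bag, check_fixity, and file_digest are hypothetical; the accepted algorithms mirror those listed for the Islandora Checksum module.

```python
import hashlib
import shutil
from pathlib import Path

# Algorithms listed for the Islandora Checksum module, as hashlib names.
SUPPORTED = {"md5", "sha1", "sha256", "sha384", "sha512"}


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path, algorithm: str = "sha256") -> Path:
    """Copy source_dir into bag_dir/data, then write bagit.txt and a payload manifest."""
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # also creates bag_dir if needed
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(bag_dir).as_posix()  # e.g. "data/study/results.csv"
        lines.append(f"{file_digest(path, algorithm)}  {rel}\n")
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    manifest.write_text("".join(lines), encoding="utf-8")
    return manifest


def check_fixity(bag_dir: Path, algorithm: str = "sha256") -> list:
    """Recompute payload checksums; return manifest paths that are missing or changed."""
    problems = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        expected, rel = line.split(maxsplit=1)  # path may contain spaces
        path = bag_dir / rel
        if not path.is_file() or file_digest(path, algorithm) != expected:
            problems.append(rel)
    return problems
```

    For example, running make_bag(Path("study_001"), Path("study_001_bag")) at deposit time and check_fixity(Path("study_001_bag")) on a review schedule returns an empty list while every file still matches its ingest checksum; "study_001" is a placeholder folder name. For production use, a maintained BagIt tool such as the Library of Congress bagit-python library would be preferable to this sketch.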

    Table 2. Louisiana State University (LSU) Hosted Repository.

    Hub-Zero

    Documentation: http://hubzero.org/documentation

    Requires a different deposit agreement for researchers and a guarantee of open access by Louisiana State University.

    Basic considerations to confirm before agreeing to deposit: http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re


    Additional questions to consider:

    How is LSU going to handle preservation workflows, specifically the steps recommended in this document?

    What metadata is being captured for each item? Is it based on a trusted and well-known schema? Can HubZero support preservation and technical metadata? Are there places to capture grant information and open access statements?

    Are batch uploads possible? Can metadata be imported from PubMed?

    Are persistent identifiers being implemented?

    Which modules are being installed? How will those support and/or add on to these recommendations?

    Is LSU staff documenting the implementation of HubZero, including the ingest and long-term preservation workflows?

    Will a member of the Pennington Biomedical Library and Information Center staff be allowed to have manager privileges? If not, how will the storage workflow and transfer of research items from Pennington to LSU take place? How will updates to Pennington Biomedical items be handled? This should be well documented and included in an agreement between Pennington Biomedical and LSU.

    Will researchers retain copyright? Or will some sort of open license (like Creative Commons) be required?

    Are embargoes supported?

    Which formats are supported by default? Are customizations to this default list being considered? Ensure that all formats listed in the data audit results are accounted for.

    What backups or redundancy measures are being implemented? Is at least one backup stored in a different geographic location?

    What costs will Pennington be expected to cover? First year? Five years? Seven to ten years?

    Are review schedules in place for hardware? For format monitoring? Is format monitoring occurring automatically?

    Pros

    Implementation would be Louisiana State University's responsibility.

    IT involvement may be Louisiana State University's responsibility if outlined in the agreement between the two institutions.

    Cons

    Little control over customizations.

    May have to compromise on metadata and long-term preservation workflows.
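    Whether every format in the data audit results is accounted for can be checked mechanically once both lists exist. The Python sketch below assumes two hypothetical inputs: a list of file extensions taken from the data audit and a list of formats the hosting repository says it supports. It also profiles a set of file paths by extension, similar in spirit to identifying a collection based on format types. File extensions are only a rough proxy for format; a signature-based identification tool such as FITS would give more reliable results.

```python
from collections import Counter
from pathlib import Path


def profile_formats(paths):
    """Count files by lowercase extension; files without one are counted as '(none)'."""
    return Counter(Path(p).suffix.lower() or "(none)" for p in paths)


def unaccounted_formats(audit_formats, supported_formats):
    """Return audited formats missing from the repository's supported list, sorted."""
    audit = {f.lower() for f in audit_formats}
    supported = {f.lower() for f in supported_formats}
    return sorted(audit - supported)


if __name__ == "__main__":
    audit = [".csv", ".xls", ".docx", ".sav"]  # hypothetical data audit results
    supported = [".xls", ".pdf", ".tif"]       # hypothetical repository format list
    print(unaccounted_formats(audit, supported))  # ['.csv', '.docx', '.sav']
```

    Any format this reports would need to be raised with LSU before deposit, either as a customization to the default list or as a conversion step on the Pennington side.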


    Table 3. Third-Party Hosted Repository.

    DSpace - Pennington Biomedical Stand Alone

    Documentation: http://dspacedirect.org/

    DSpace also offers a hosted option, DSpaceDirect, for which libraries and small institutions pay a subscription fee.

    Pros

    In addition to the pros listed under a local implementation of DSpace, costs would shift to a subscription fee rather than IT involvement, hardware, and staff time for implementation.

    Cons

    In addition to the cons listed under a local implementation of DSpace, customizations may not be available; any that are would require additional fees.

    May have to compromise on metadata and long-term preservation workflows.

    Dryad - Shared Repository

    Documentation: http://datadryad.org/pages/repository

    Pros

    Costs would shift to a subscription fee rather than IT involvement, hardware, and staff time for implementation. Pricing information: http://datadryad.org/pages/pricing

    Repository infrastructure is built upon the DSpace software and partners with CLOCKSS to ensure long-term access.

    DOIs are assigned to each item.

    Versioning of items is supported, as well as automatic monitoring for format obsolescence.

    Cons

    May have to compromise on metadata and long-term preservation workflows.

    Repository will not feature institutional branding.

    All items are released under Creative Commons terms.

    figshare - Shared Repository

    Documentation: http://figshare.com/about

    Pros

    Offers unlimited storage space for free, as long as items are made public.

    Format agnostic.


    Repository infrastructure partners with CLOCKSS to ensure long-term access.

    DOIs are assigned to each item.

    Supports a variety of metrics.

    A separate institutional repository space can be claimed, for example, penningtonbiomedical.figshare.com.

    Cons

    May have to compromise on metadata and long-term preservation workflows.

    Repository will not feature institutional branding.

    All items are released under Creative Commons terms.
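    One way to limit how much metadata is compromised when depositing into a shared repository is to check records against a locally agreed minimum before export, much like ensuring a given set of metadata fields is present in every item. The Python sketch below is hypothetical: the field names in REQUIRED_FIELDS are placeholders chosen to echo this plan's questions about grant information and open access statements, and the records are plain dictionaries rather than any repository's actual export format.

```python
# Hypothetical local minimum; adjust to the schema Pennington Biomedical adopts.
REQUIRED_FIELDS = ("title", "creator", "date", "grant", "rights")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return required fields that are absent, None, or blank in one record."""
    gaps = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            gaps.append(field)
    return gaps


def audit_records(records, required=REQUIRED_FIELDS):
    """Map each incomplete record's 'id' to its missing fields; complete records are omitted."""
    report = {}
    for rec in records:
        gaps = missing_fields(rec, required)
        if gaps:
            report[rec.get("id", "<no id>")] = gaps
    return report
```

    An empty report from audit_records means every record meets the local minimum; anything else lists exactly which items need attention before upload.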


    Appendix F: Budget Tools

    Before Pennington Biomedical implements any of the recommendations in this plan, a complete budget should be created. JTDS recommends the following resources to help Pennington Biomedical project costs, including hardware, software, and staff:

    DCC's suggestions for creating a business plan and understanding the costs of implementing Data Management Services: http://www.dcc.ac.uk/resources/how-guides/how-develop-rdm-services#Business-plans

    The espida model helps to make business cases for proposals that may not necessarily offer immediate financial benefit to an organisation, but rather bring benefit in more intangible spheres: http://www.gla.ac.uk/services/library/espida/

    The Life Cycle Information for E-Literature (LIFE) Project has developed a methodology to model the digital lifecycle and calculate the costs of preserving digital information for the next 5, 10, or 20 years: http://www.life.ac.uk/

    The Transparent Approach to Costing (TRAC) provide[s] information on the income and expenditure of universities. TRAC has been the standard methodology used by Higher Education Institutes (HEIs) in the UK for costing their main activities (teaching, research, and other core activities): http://www.jcpsg.ac.uk/guidance/

    The Keeping Research Data Safe (KRDS) cost/benefit studies, funded by JISC, feature tools and methodologies that focus on the challenges of assessing the costs and benefits of curating and preserving research data: http://beagrie.com/krds/
