Ya-ning Arthur Chen, Feng-chien Chung, Computing Centre, Academia Sinica, 11 April, ISGC 2008
A hybrid approach of digital long term preservation to institutional repositories - A case study of DSpace/SRB Integration
Ya-ning Arthur Chen, Feng-chien Chung
Computing Centre, Academia Sinica
11 April, ISGC 2008
Outline
• Background of MAAT
• From Website to Institutional Repository
• Long Term Preservation & OAIS
• The Hybrid Approach
• Future
MAAT – Background
• The Metadata Architecture & Application Team (MAAT) was established in 2002 to conduct metadata research and provide services in support of the National Digital Archives Program (NDAP) in Taiwan
• To date, the MAAT has supported over 80 digital library projects of the Taiwan E-Learning & Digital Archive Program (TELDAP, formerly NDAP)
MAAT – Motivation
• A number of documents have been created and can be categorized into
– questionnaires,
– work sheets,
– meeting records,
– metadata mapping tables,
– system specifications,
– best practices of metadata standards,
– technical reports,
– research papers,
– briefings, and
– tutorial materials.
• Most documents of the MAAT website are arranged in a static manner.
MAAT Website
http://www.sinica.edu.tw/~metadata
Academia Sinica
MAAT - Consideration 1
• Document management and repository
– Over 1,000 documents and URL links have been arranged and served at the MAAT website.
– The MAAT website needs an effective system of document management.
• Access control
– The MAAT website still lacks access control for documents.
MAAT - Consideration 2
• Workflow reengineering
– The MAAT website adopts a centralized model to maintain documents and the website arrangement.
– This model is complicated and labor-intensive, and its overhead cost is high.
• Usage statistics report
MAAT - Challenge
• Too many publications,
• Too much change (that is, various document versions),
• Too many contributors, and
• Too many institutions.
Implementation Level
Static Website
Institutional Repository
Phase 1: from website to IR
DSpace - Features
• Captures
– Digital research material in any format
– Directly from creators (e.g. faculty)
– Large-scale, stable, managed long-term storage
• Describes
– Descriptive metadata (Dublin Core)
– Technical metadata (file size, format…)
– Rights metadata (licenses, Creative Commons…)
• Distributes
– Via WWW, with necessary access control
• Preserves
– Persistent ID and Handle
– Bitstream format registry
DSpace - Data Model
MAAT – Content 1
• Content Type
– 支援計畫 (Documents from the projects we support)
– 出版與活動 (Documents of publications and activities)
– 計畫管理 (Project management related – restricted documents)
– 研究發展 (Research & development – restricted documents)
– 48 communities, 110 collections, 783 items
• Document Format
– User uploads: 794 PDF files, 446 MS Word files, 59 MS PowerPoint slides, 27 XML files, 17 JPEG images, 16 HTML files, 7 MS Excel files… and others
– System generated: over 1,900 plain-text files (mainly DSpace license files)…
MAAT – Content 2
• Access Method
– DSpace user browse and search interface
– Search engines (Google, Yahoo, etc.)
– OAI-PMH harvesting
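To make the OAI-PMH access path concrete, here is a minimal sketch of how a harvester might build its requests. The verb and argument names (`ListRecords`, `metadataPrefix`, `from`, `set`, `resumptionToken`) come from the OAI-PMH 2.0 protocol; the base URL is a placeholder, since the slides do not give the repository's OAI endpoint, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the slides do not state the OAI-PMH base URL.
BASE_URL = "http://repository.example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict harvesting to one set
    if from_date:
        params["from"] = from_date    # incremental harvest since this date
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request for the next page of a ListRecords response."""
    # resumptionToken is an exclusive argument: only the verb accompanies it.
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


first_page = list_records_url(BASE_URL, from_date="2008-01-01")
print(first_page)
```

A harvester would fetch `first_page`, read any `resumptionToken` from the response, and continue with `resume_url` until no token is returned.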
MAAT DSpace
http://pl11.sinica.edu.tw:8080/dspace/index.jsp
DSpace - Consideration
• The need to extend DSpace storage capabilities
– The number of documents grows so fast that a very large storage solution is required
• The lack of a risk management mechanism
– Reliable backup and disaster recovery systems are not included in the default DSpace installation
Implementation Level
Static Website
Institutional Repository
Phase 1: from website to IR
Institutional Repository + Grid
Phase 2: from IR to Long Term Preservation
DSpace/SRB Approach 1
• In 2004, NARA (with NSF/NPACI) funded a project aimed at integrating DSpace and SRB to
– allow DSpace to use the data grid as a storage layer
– permit the exchange of authentic documents between them
• NARA Proposal & Participants
– San Diego Supercomputer Center (SDSC), a member of the National Partnership for Advanced Computational Infrastructure (NPACI), an NSF-sponsored program
– MIT Libraries
– UC San Diego Libraries (UCSD)
– Hewlett Packard Laboratories (HP)
– National Archives and Records Administration (NARA)
DSpace/SRB Approach 2
• In DSpace, there can be multiple bitstream stores; each store can be either traditional storage or SRB storage.
• Both traditional and SRB bitstream stores are specified by configuration parameters in dspace.cfg
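As a rough illustration of numbered bitstream stores, the sketch below parses a properties-style fragment and groups the settings per store. The key names (`assetstore.dir`, `srb.host.1`, `assetstore.incoming`, and so on) follow DSpace 1.x conventions as best recalled and should be checked against the installed version; all values are made up, and the helper functions are hypothetical rather than part of DSpace.

```python
import re

# Illustrative configuration text. Key names are an assumption to verify
# against the installed DSpace release; every value here is invented.
SAMPLE_CFG = """
# store 0: traditional file system storage
assetstore.dir = /dspace/assetstore
# store 1: SRB storage
srb.host.1 = srb.example.org
srb.port.1 = 5544
srb.parentdir.1 = dspace-assetstore
assetstore.incoming = 1
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def bitstream_stores(props):
    """Group settings by store number; keys without a suffix belong to store 0."""
    stores = {}
    for key, value in props.items():
        match = re.fullmatch(r"(assetstore\.dir|srb\.(\w+))(?:\.(\d+))?", key)
        if not match:
            continue
        number = int(match.group(3) or 0)
        store = stores.setdefault(number, {"kind": "local", "settings": {}})
        if match.group(2):  # any srb.* key marks the store as SRB-backed
            store["kind"] = "srb"
            store["settings"][match.group(2)] = value
        else:
            store["settings"]["dir"] = value
    return stores


props = parse_properties(SAMPLE_CFG)
stores = bitstream_stores(props)
incoming = int(props.get("assetstore.incoming", 0))
print(incoming, stores[incoming]["kind"])  # store that receives new bitstreams
```

Under these assumptions, pointing the incoming store at an SRB-backed entry sends newly ingested bitstreams to the grid, while existing files stay in the local store.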
Examination of DSpace/SRB
• An Open Archival Information System (OAIS) is an archive that intends to preserve information for access and use by a Designated Community
OAIS Functional Model
Workflow
1. Common service (Network, OS…)
2. DSpace Ingest
3. SRB Storage
4. DSpace DB/SRB MCAT
5. DSpace & SRB Admin
6. DSpace User Interface
Other diagram labels: Submit Interface & Batch Import; DSpace/Google User
OAIS Functional Model…Again
Diagram labels: DSpace & SRB Administration; DSpace RDBMS & SRB MCAT; DSpace Submit Interface; DSpace Batch Import; DSpace Ingest; SRB Mass Storage; DSpace User Interface
Producer, Management and Consumer
• Producer
– DSpace may play the role of ingesting the SIP (Submission Information Package) from the producer and generating the AIP (Archival Information Package) for management and storage
• Management
– SRB may play the role of receiving the AIP, storing and managing the data, and generating the AIP for access
• Consumer
– DSpace may play the role of processing the access request and generating the proper DIP (Dissemination Information Package) for dissemination
Diagram labels: DSpace RDBMS & SRB MCAT; DSpace Submit Interface; DSpace Batch Import; DSpace Ingest; SRB Mass Storage; DSpace User Interface; package flow SIP → AIP → AIP → DIP
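The division of roles on this slide can be sketched as a small package flow: the repository turns a SIP into an AIP, the grid stores and returns the AIP, and the repository derives a DIP on request. The classes and field names below are hypothetical stand-ins chosen for illustration; they are not DSpace or SRB APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    kind: str                      # "SIP", "AIP", or "DIP"
    content: bytes
    metadata: dict = field(default_factory=dict)


class Repository:
    """Stand-in for DSpace (hypothetical): ingests SIPs and serves DIPs."""

    def ingest(self, sip, identifier):
        # Producer side: attach a persistent identifier to the submitted content.
        if sip.kind != "SIP":
            raise ValueError("ingest expects a SIP")
        return Package("AIP", sip.content, {**sip.metadata, "id": identifier})

    def disseminate(self, aip):
        # Consumer side: expose only selected descriptive fields.
        public = {k: v for k, v in aip.metadata.items() if k in ("id", "title")}
        return Package("DIP", aip.content, public)


class GridStorage:
    """Stand-in for SRB (hypothetical): keeps AIPs and returns them on request."""

    def __init__(self):
        self._objects = {}

    def store(self, aip):
        self._objects[aip.metadata["id"]] = aip

    def retrieve(self, identifier):
        return self._objects[identifier]


repo, grid = Repository(), GridStorage()
sip = Package("SIP", b"report body", {"title": "Metadata report", "submitter": "staff"})
grid.store(repo.ingest(sip, "item-1"))          # SIP -> AIP -> storage
dip = repo.disseminate(grid.retrieve("item-1"))  # storage -> AIP -> DIP
print(dip.kind, dip.metadata)
```

Keeping the AIP unchanged inside the storage layer and deriving the DIP only at access time mirrors the slide's split between management and consumer roles.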
Archives arrangement
• Logical archives structure:
– DSpace allows multi-level communities and one level of collections
– Archival principles: principle of provenance and principle of respect des fonds
• Physical files arrangement:
– SRB mass storage technology
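The logical structure described here, nested communities with a single level of collections beneath them, can be modeled in a few lines. This is only an illustrative sketch: the class names echo DSpace terminology, but the fields, methods, and sample names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    name: str
    items: List[str] = field(default_factory=list)  # item titles only, for brevity


@dataclass
class Community:
    name: str
    subcommunities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)  # one level only

    def count_items(self):
        """Total items in this community and all nested sub-communities."""
        own = sum(len(c.items) for c in self.collections)
        return own + sum(s.count_items() for s in self.subcommunities)

    def paths(self, prefix=""):
        """Yield 'community/.../collection' paths, which keep provenance visible."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        for c in self.collections:
            yield f"{here}/{c.name}"
        for s in self.subcommunities:
            yield from s.paths(here)


# Sample names are invented for illustration.
project = Community("Project A", collections=[Collection("Meeting records", ["2007-01 minutes"])])
root = Community(
    "Supported Projects",
    subcommunities=[project],
    collections=[Collection("Best practices", ["DC guide", "Mapping table"])],
)
print(root.count_items())
print(list(root.paths()))
```

Because every collection path begins with the chain of owning communities, documents from one creating body stay grouped together, which is the intent of the provenance and respect des fonds principles.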
Future 1
• Best practice & SOP for DSpace/SRB integration
• Deeper check against the activities of OAIS
• Preservation planning and policy
– Monitor Producer/Management/Consumer service requirements and emerging technology; develop an archival strategy and migration plan
Future 2
• Feasibility evaluation
– Migrate from SRB to other advanced technologies, such as SRM, iRODS…
– Adopt a metadata approach to enhance digital preservation, such as PREMIS and METS (e.g. structural map, behavior section…)
Thank You