New Value from the DSpace Foundation and Fedora Commons
Michele Kimpton and Sandy Payette
Executive Directors
DuraSpace
Social and Technical Forces (2000-present)
Waves of Repository-Enabled Applications
• Institutional Repositories
• Digital Collections
• Digital Libraries
• Collaborative Spaces and “Web 2.0”
• Scholarly and Scientific Infrastructure
• E-Research
• Data (archiving, linking, sharing)
Implications for our future work
• More distributed
• More collaborative
• More web-oriented
• More open
• More interoperable
Emergence of Infrastructure
Source: Understanding Infrastructure: Lessons for New Scientific Infrastructure, http://deepblue.lib.umich.edu/handle/2027.42/49353
Systems
• Integrate components
• Central control
• Dedicated/specialized gateways
• More closed
• More preconceived
Networks
• Integrate systems
• Distributed control
• Generic gateways
• More open
• More reconfigurable
Source: Francine Berman, Got Data? A Guide to Data Preservation in the Information Age, pp. 50-56, December 2008 (pages 55 and 53)
History: DSpace and Fedora
• Two open source repository systems
– DSpace:
  • End-user application and repository
  • Turnkey system that is easy to use out of the box
– Fedora:
  • Web services (repository and supporting services)
  • Flexible, modular, and scalable
• Enabling technology supporting…
– scholarship, science, culture, education
– open access
– preservation and archiving
DSpace and Fedora Installations
Largest share of open repositories worldwide… over 700 institutions tracked in our registries
Universities
Research Centers
Libraries
Archives
Cultural Heritage
Government
More…
DSpace Foundation and Fedora Commons
501(c)(3) non-profit organizations
Web APIs
Storage Abstraction
Architecture Strategy
SWORD Deposit
MS Word Plug-In
DuraSpace
Future Joint Offerings
Business Strategy
Communication/Outreach
Progression of Partnership
http://blogs.the451group.com/opensource/
Goals of Strategic Partnership
• Stewardship:
– Support and align open source development communities for DSpace and Fedora
– Keepers of the cause (durability + access)
• Innovation:
– Think beyond existing platforms
– New strategic directions for repositories
– New products and services
• Sustainability:
– Devise business models that fit our sector
– Services that generate revenue for non-profits
What About the Cloud?
An emerging architecture in which data and applications reside in cyberspace, allowing users to access them via the internet (Pew Internet, 9/08)
A style of computing where massively scalable IT-related capabilities are provided “as a service” using Internet technologies to multiple external customers (Gartner, 6/08)
Types of Cloud Services
• Software as a Service (SaaS)
– e.g., Google Apps
• Cloud Computing
– e.g., Amazon Elastic Compute Cloud (EC2)
• Cloud Storage
– e.g., Amazon Simple Storage Service (S3)
Cloud Services
Vision: Federated Repositories and Cyberinfrastructure
DuraSpace
Heaven
DuraSpace Proposition
Trust and durability in the cloud
What have we learned from our users?
Focus Groups
Site Visits
Forums
Problems
• Tools and processes unproven
• Limited IT support
• Capital expenditures limited
• Task can be overwhelming (replication, migration, emulation, etc.)
Preservation important but difficult to implement
Problems
• Systems not interoperable
• Heterogeneous applications/platforms
• Lack of common standards
• Inelastic compute capability
Barriers to making content more accessible and useful to researchers
Advantages – Cloud Services
• Flexibility
• Scalability
• Pay for use
• Easy to implement
• Cost
Cost
Public cloud providers drive cost down through scale, location and virtualization technology
Large data centers (50k+ servers) can achieve 5 to 7 times cost savings over medium data centers (1,000 servers)
*Hamilton, J., Internet-Scale Service Efficiency (Sept 08)
Technology*   Cost, Medium DC          Cost, Large DC
Network       $95 per Mbit/sec/mo      $13 per Mbit/sec/mo
Storage       $2.20 per Gbyte/mo       $0.40 per Gbyte/mo
Admin         140 servers/admin        >1000 servers/admin
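The "5 to 7 times" figure can be checked against the rows of the table. A minimal sketch in Python, using only the numbers quoted above from Hamilton; the variable names are illustrative, and the admin ratio is a lower bound because the large data center figure is given as ">1000".

```python
# Cost figures quoted in the table above (Hamilton, Sept 08).
# Each entry: (medium data center, large data center).
costs = {
    "network ($ per Mbit/sec/mo)": (95.0, 13.0),
    "storage ($ per Gbyte/mo)": (2.20, 0.40),
}

# Admin efficiency is servers per admin, so higher is better.
# The large data center value is a lower bound (>1000).
admin = (140, 1000)

# For costs, the saving factor is medium cost divided by large cost.
ratios = {name: medium / large for name, (medium, large) in costs.items()}
# For admin, the factor is large servers/admin divided by medium.
ratios["admin (servers per admin)"] = admin[1] / admin[0]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

The three factors come out at roughly 7.3, 5.5, and at least 7.1, which is consistent with the 5 to 7 times range stated on the slide.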
Issues
• Security
• Transparency
• Data lock-in
• SLAs
• Trust
DuraSpace
Trusted management of and access to durable digital assets in the cloud
DuraSpace
Mediating Service
DuraSpace: Notional Architecture
Architectural view
Core services: Preservation based
• Replicate to multiple storage providers
• Replicate to multiple geographic areas
• Be able to manage content and services through web based “Dashboard”
• Includes integrity checking and monitoring
• “Pay for use” for services and storage
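Replication with integrity checking can be illustrated with a small sketch. This is not the DuraSpace implementation: `InMemoryProvider`, `replicate`, and `check_integrity` are hypothetical names, the providers are in-memory stand-ins rather than real cloud APIs, and SHA-256 is an assumed choice of checksum.

```python
import hashlib


class InMemoryProvider:
    """Hypothetical stand-in for one storage provider (not a real cloud API)."""

    def __init__(self, name, region):
        self.name = name
        self.region = region
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


def checksum(data):
    """Return the SHA-256 hex digest used as the integrity fingerprint."""
    return hashlib.sha256(data).hexdigest()


def replicate(key, data, providers):
    """Store one copy per provider and return the expected checksum."""
    for provider in providers:
        provider.put(key, data)
    return checksum(data)


def check_integrity(key, expected, providers):
    """Return the names of providers whose copy is missing or altered."""
    damaged = []
    for provider in providers:
        try:
            ok = checksum(provider.get(key)) == expected
        except KeyError:
            ok = False
        if not ok:
            damaged.append(provider.name)
    return damaged


# Three copies in different (assumed) locations, including a local store.
providers = [
    InMemoryProvider("provider-a", "us-east"),
    InMemoryProvider("provider-b", "eu-west"),
    InMemoryProvider("local-store", "on-campus"),
]
expected = replicate("thesis.pdf", b"example content", providers)
print(check_integrity("thesis.pdf", expected, providers))  # all copies intact

# Simulate silent corruption at one provider, then re-check.
providers[1].put("thesis.pdf", b"corrupted")
print(check_integrity("thesis.pdf", expected, providers))
```

A monitoring service along these lines would run the check on a schedule and repair a damaged copy from one of the intact replicas.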
Technology Services
• Build and run services on top of content stored in the cloud
– Search
– Aggregation
– Streaming
– Migration
– Hosting
• Enable others to build services/apps on top of content
Use Cases: DuraSpace with Cloud Storage
• Online backup for text, images, datasets, video, audio
• Preservation: multiple copies, geographies, administrations
• Temporary or permanent project storage
Use Cases: DuraSpace with Cloud Compute
• Streaming service for video
• JPEG2000 image engine
• Indexing and other processing-heavy jobs
• Staging area for repository ingest
• Repositories in cloud
• Data and text mining over open data
• Aggregation and web 2.0 tools on open content and collections
DuraSpace software
• Open source: Apache license
• Open core
• Run Your Own: Private clouds, University consortia
• Extensible: Research partners
Critical success factors
• Ease of use, simplicity
• Trusted partner for end user
• Cost effective
• Scalable/Flexible
• Can establish key partnerships with service providers
• Can build community of developers and users
Timeline
• Identified initial cloud partners
• Identified initial pilot partners
• Defined initial requirements
• Initial open source release: Q3 2009
• Begin pilot: Fall 2009
• Extensions available for repository platforms: Q1 2010
• Roll out to repository community: Q1 2010
• Launch production service: Q2 2010
Initial capabilities
• Replication, up to three providers (including local store)
• Web based “Dashboard”
• Data integrity checking and monitoring
• Can push content from DSpace/Fedora repository platform
• Integrated billing
• Compute capability
• A few initial compute services TBD
Listen…
Sandy and Michele’s DuraSpace webinar
http://www.education-webevents.com/