DuraSpace, Fedora and DuraCloud
Thorny Staples
Director, Community Strategy and Alliances
ESIP Meeting, July 8, 2009
DuraSpace, Inc.
• Combined Fedora Commons, Inc. and DSpace Foundation
• 501(c)(3) private, non-profit company
• 4-year project funded by Moore Foundation to become self-sustaining
• Mission-driven, products evolve to follow community needs
• Moving towards community-based software development
The world we work in…
• Scholarly and Scientific Collections
• Preservation and Archiving
• Education, Knowledge Spaces
• Data Curation, Linking, Publishing
• blog and wiki
• and more …
DuraSpace Products
• Fedora
• DSpace
• Akubra – storage plug-in module with transactional file system
• Mulgara – RDF indexing engine
• Topaz – core semantic knowledgebase components
• DuraCloud
Solution Communities
• Community group that creates and maintains the vision for a solution bundle in an area
• Gathers resources to create software for the solution
• Coordinates development with DuraSpace technical staff
• Smaller group that gets things done will emerge
Solution Areas
• Data Curation
• Open Access Publishing
• Integration Services
• Preservation and Archiving
• Small Archives
• Scholars’ Workbench
Other Possible Community Groups
• Other software development groups
• News and Publications Outreach group that works with our Communications Director
• Issue/advocacy groups that work on standards important to the community
The Flexible Extensible Digital Object Repository Architecture
• A set of abstractions that can be used to represent different kinds of data
• A repository management system
• A foundation for many information management applications
• Designed to make data “durable” over the long term
165 Current Known Users
• Broadcasting and media – 1
• Consortia – 9
• Corporations – 14
• Government agencies – 8
• IT-Related Institutions – 10
• Medical Centers and Libraries – 4
• Museums and Cultural Organizations – 5
• National Libraries and Archives – 16
• Professional Societies – 2
• Publishing – 4
• Research Groups and Projects – 18
• Semantic and Virtual Library Projects – 6
• University Libraries and Archives – 68
Community-built Applications
• Fez
• Muradora
• Islandora (Drupal-based)
• Arrow
• VITAL (vendor software from VTLS)
• eSciDoc
• Hydra
Making complex digital information “durable” is a very hard problem
• The existence and meaning of content need to be verifiable as technologies change
• A history of the changes to the encoding and state of content must be reliably provided
• A meaningful context for any unit of content may be one of many and must be sustained
• Complex resources will increasingly be dispersed across institutional boundaries.
The Fedora abstractions provide a durability framework.
• Content is “unitized” as information objects that combine data, metadata, policies, relationships and the history of the object.
• Complex digital resources are formally defined graphs of related objects.
• The public view of the content is presented as abstract behaviors.
• The web services orientation of Fedora provides the basis for repository federation.
[Diagram labels: Abstract Data Management (Fedora Commons Service Framework); Preservation and Archiving Solutions; Data Curation Solutions; Scholars' Repository; Publishing Solutions; Tape Libraries; SAN; RAID array]
A data object is one unit of content
[Diagram labels: Persistent ID; Reserved Datastreams: DC, RELS-EXT, AUDIT, POLICY; Custom Datastreams (any type, any number): 1, 2, … n]
Object Properties
• PID: “namespace:name”
• Object Type: “Data” or “Service Definition” or “Service Deployment”
• State: “A”, “I”, or “D” (Active, Inactive, Deleted)
• Label: “Any string”
• Content Model: “Any string”
• Created Date: “2007-04-30T19:59:03.000Z” (UTC, ISO8601 format)
• Last Modified Date: “2007-04-30T19:59:03.000Z” (UTC, ISO8601 format)
• Owner ID: “Any string”
Legend: System generates value / Either way / Client provides value
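The property list above can be sketched as a small record type. This is only an illustration under stated assumptions: the class name, field names, defaults and the simplified PID check are hypothetical, and only the property names and value formats come from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import re

# Simplified assumption: a PID is "namespace:name" with exactly one colon.
PID_PATTERN = re.compile(r"^[^:\s]+:[^:\s]+$")
OBJECT_TYPES = {"Data", "Service Definition", "Service Deployment"}
STATES = {"A": "Active", "I": "Inactive", "D": "Deleted"}


def utc_now() -> str:
    """Return a UTC timestamp shaped like 2007-04-30T19:59:03.000Z."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


@dataclass
class ObjectProperties:
    """Hypothetical container for the object properties listed on the slide."""
    pid: str
    object_type: str = "Data"
    state: str = "A"
    label: str = ""
    content_model: str = ""
    owner_id: str = ""
    created_date: str = field(default_factory=utc_now)
    last_modified_date: str = field(default_factory=utc_now)

    def __post_init__(self) -> None:
        # Reject values that do not match the formats shown on the slide.
        if not PID_PATTERN.match(self.pid):
            raise ValueError(f"PID must look like namespace:name, got {self.pid!r}")
        if self.object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type {self.object_type!r}")
        if self.state not in STATES:
            raise ValueError(f"state must be one of {sorted(STATES)}")
```

For example, `ObjectProperties(pid="demo:1", label="Test image")` fills in the state and both dates, while a PID without a colon raises `ValueError`.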
Datastreams hold or represent the content
• Managed: Fedora stores and manages the content bytestream. Content located via internal ID.
• External: Fedora stores a reference (URL) to the content and mediates access to the content.
• External Redirected: Fedora stores a reference (URL) to the content, but will not mediate access to content.
• Inline XML: Fedora stores a name-spaced block of XML content within the Fedora digital object XML wrapper file.
Datastream Properties
• Datastream ID: Any XML “NCName” unique within the object
• Control Group: “X”, “M”, “E”, or “R” (Inline XML, Managed, Externally Referenced, or Redirected)
• State: “A”, “I”, or “D” (Active, Inactive, Deleted)
• Versionable: “true” or “false”
• Version: 1 or more
Legend: System generates value / Either way / Client provides value
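A minimal sketch of a datastream record with its control group may help make the four storage modes concrete. The class and method names are hypothetical, and the rule that a non-versionable datastream overwrites its single version is an assumption made for this example, not something the slide states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ControlGroup(Enum):
    INLINE_XML = "X"   # XML block stored inside the object wrapper file
    MANAGED = "M"      # bytestream stored and managed by the repository
    EXTERNAL = "E"     # URL stored; repository mediates access
    REDIRECTED = "R"   # URL stored; repository does not mediate access


@dataclass
class Datastream:
    """Hypothetical model of the datastream properties listed above."""
    datastream_id: str
    control_group: ControlGroup
    state: str = "A"
    versionable: bool = True
    versions: List[str] = field(default_factory=list)  # content or URL per version

    def add_version(self, content: str) -> None:
        # Assumption: a non-versionable datastream keeps only its latest version.
        if self.versionable or not self.versions:
            self.versions.append(content)
        else:
            self.versions[-1] = content

    def repository_holds_content(self) -> bool:
        return self.control_group in (ControlGroup.INLINE_XML, ControlGroup.MANAGED)

    def repository_mediates_access(self) -> bool:
        return self.control_group is not ControlGroup.REDIRECTED
```

A redirected datastream therefore reports that the repository neither holds its bytes nor mediates access, which matches the "External Redirected" description.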
Relationships Among Objects
• Describes adjacency relationships among objects, among units of content
• RDF data of the form:
PID – typeOfRelationship – relatedObjectPID
• Can be used to assemble aggregations of objects
• Can build graphs of relationships to feed into user interfaces
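The triple form above can be illustrated with plain tuples. This is a sketch only: the PIDs and the `isMemberOf` predicate are example values chosen for illustration, and a real repository would hold these triples in an RDF store rather than a Python list.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (PID, typeOfRelationship, relatedObjectPID)
Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("demo:image1", "isMemberOf", "demo:collection1"),
    ("demo:image2", "isMemberOf", "demo:collection1"),
    ("demo:collection1", "isMemberOf", "demo:project1"),
]


def aggregation(root: str, predicate: str, triples: List[Triple]) -> Set[str]:
    """Collect every object that points at root, directly or indirectly."""
    # Index subjects by the object they point to, for the chosen predicate.
    children: Dict[str, List[str]] = defaultdict(list)
    for subject, pred, obj in triples:
        if pred == predicate:
            children[obj].append(subject)

    found: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in found:  # guards against cycles
                found.add(child)
                stack.append(child)
    return found
```

Calling `aggregation("demo:project1", "isMemberOf", TRIPLES)` gathers the collection and both images, which is the kind of graph a user interface could then render.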
Optional Object Behaviors
• Data objects can have different views or transformations
• Sets of abstract behaviors that different kinds of objects can subscribe to
• Corresponding sets of services that specific objects can execute
• The business logic is hidden behind an abstraction
[Diagram labels: PID; System Meta; MODS; JP2000; Thumb, Screen, Master, Custom Size; DublinCore, MODS, Citation; MODS File; JPEG2000 File]
[Diagram: Content Access and Content Management]
General Image Object: Persistent ID (PID), System Metadata, thumbnail image file, med res. image file, high res. image file, max res. image file
JPEG2000 Image Object: Persistent ID (PID), System Metadata, JPEG2000 image file
Service Description: get-thumbnail-sized-image, get-med-sized-image, get-high-res-image, get-max-sized-image
Service Mechanism for General Image Objects: get-thumbnail-sized-image, get-med-sized-image, get-high-res-image, get-max-sized-image
Service Mechanism for JPEG2000 Image Objects: get-smallest-JPEG2000-size, get-midrange-JPEG2000-size, get-high-res-JPEG2000-size, get-max-JPEG2000-size
Content Models
• Create classes of data objects
• Expressed as Cmodel objects
• A Cmodel object defines the number and types of data streams for objects of that class
• A Cmodel object binds to service objects to enable appropriate behaviors to be inherited by data objects
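The two roles of a Cmodel object described above (declaring the expected datastreams and binding to services) can be sketched as follows. All class, field and function names are hypothetical, and describing a datastream only by ID and MIME type is a simplification chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentModel:
    """Hypothetical Cmodel: required datastreams plus bound service definitions."""
    pid: str
    required_datastreams: Dict[str, str]  # datastream ID -> MIME type
    service_definitions: List[str] = field(default_factory=list)


@dataclass
class DataObject:
    pid: str
    datastreams: Dict[str, str]  # datastream ID -> MIME type


def conformance_problems(obj: DataObject, cmodel: ContentModel) -> List[str]:
    """List the ways obj fails to match the class that cmodel defines."""
    problems = []
    for ds_id, mime in cmodel.required_datastreams.items():
        if ds_id not in obj.datastreams:
            problems.append(f"missing datastream {ds_id}")
        elif obj.datastreams[ds_id] != mime:
            problems.append(f"datastream {ds_id} should be {mime}")
    return problems


def available_behaviors(obj: DataObject, cmodel: ContentModel) -> List[str]:
    """A conforming object inherits every service definition the Cmodel binds to."""
    if conformance_problems(obj, cmodel):
        return []
    return list(cmodel.service_definitions)
```

Extra datastreams on the data object are allowed in this sketch; only the ones the Cmodel names are checked.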
[Diagram: Data Objects, a Cmodel Object, a Service Definition Object and a Service Mechanism Object, each with a Persistent ID (PID), System Metadata and Datastreams. Further labels: RDF data; Service Definition Metadata; Service Binding Metadata (WSDL); Web Service; service contract; service subscription; data contract]
A behavior call has the form:
Object PID + SDef Name + Method Name
Other components include:
- Parameter values used by the method
- Datetime stamp for earlier version
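As an illustration, the components of a behavior call can be held in one small structure and serialized. The class name and the path layout below are hypothetical and are not presented as Fedora's actual URL syntax; only the list of components comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import quote, urlencode


@dataclass
class BehaviorCall:
    """Object PID + SDef name + method name, plus the optional components."""
    object_pid: str
    sdef_name: str
    method_name: str
    parameters: Dict[str, str] = field(default_factory=dict)
    as_of: Optional[str] = None  # datetime stamp selecting an earlier version

    def to_path(self) -> str:
        # Hypothetical layout: /pid/sdef/method[/datetime][?params]
        parts = [self.object_pid, self.sdef_name, self.method_name]
        if self.as_of:
            parts.append(self.as_of)
        path = "/" + "/".join(quote(part, safe=":") for part in parts)
        if self.parameters:
            path += "?" + urlencode(self.parameters)
        return path
```

For instance, `BehaviorCall("demo:1", "demo:imageSDef", "getThumbnail").to_path()` yields `/demo:1/demo:imageSDef/getThumbnail`.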
Objects Representing Aggregations
• Creating parent objects for complex resources
• Representing explicit collections
• Representing implicit collections
• Creating digital surrogates for physical entities
A Research Project
[Diagram labels: Remote Sensing Device; Images; Datasets; Proposal; Remote Camera; Final Report]
Fedora Framework Service Integration
[Diagram labels: Fedora Repository Service; Simple JMS; GSearch; OAI; Ingest; More…]
• Repository publishes events
• Services listen and consume events or other messages
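The publish-and-consume pattern can be sketched with an in-process event bus. This stands in for a real message broker such as JMS and uses no JMS API; the topic name, event fields and class names are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Event = Dict[str, str]
Listener = Callable[[Event], None]


class EventBus:
    """Tiny in-process stand-in for a message broker."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, topic: str, listener: Listener) -> None:
        self._listeners[topic].append(listener)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver to a copy so listeners may subscribe during delivery.
        for listener in list(self._listeners.get(topic, [])):
            listener(event)


class Repository:
    """Publishes an event for every change, without knowing who listens."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self.objects: Set[str] = set()

    def ingest(self, pid: str) -> None:
        self.objects.add(pid)
        self._bus.publish("repository.updates", {"method": "ingest", "pid": pid})
```

A search indexer or harvesting service would call `bus.subscribe("repository.updates", handler)` and react to each ingest, so new services can be added without changing the repository.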
Current Work… early seeds for DuraCloud concept
[Diagram labels: Shared Storage Abstraction; Plug-in 1, Plug-in 2, Plug-in …; Amazon; University SAN/Fabric; Local Storage]
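A shared storage abstraction with plug-ins might look roughly like the sketch below. The interface, class names and the in-memory plug-in are assumptions made for illustration; they do not reproduce the Akubra or DuraCloud APIs.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StoragePlugin(ABC):
    """Hypothetical contract that every storage back end implements."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class InMemoryPlugin(StoragePlugin):
    """Stand-in back end; a real plug-in would talk to a cloud store or a SAN."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]  # raises KeyError if the key is absent


class SharedStorage:
    """Routes each request to a named plug-in behind one common interface."""

    def __init__(self) -> None:
        self._plugins: Dict[str, StoragePlugin] = {}

    def register(self, name: str, plugin: StoragePlugin) -> None:
        self._plugins[name] = plugin

    def write(self, backend: str, key: str, data: bytes) -> None:
        self._plugins[backend].write(key, data)

    def read(self, backend: str, key: str) -> bytes:
        return self._plugins[backend].read(key)
```

Callers only see `SharedStorage`, so swapping local storage for a cloud provider means registering a different plug-in rather than changing repository code.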
DuraCloud - basics
• Replicate to multiple storage providers
• Replicate to multiple geographic areas
• Monitor and audit digital assets
• Compute services in cloud next to content
• Hosted by DuraSpace not-for-profit org
• Partnerships with cloud providers
• “Pay for use” for services and storage
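Replication and auditing can be illustrated with checksums. In this sketch each storage provider is just a Python dict, and the function names and the choice of SHA-256 are assumptions for the example; the slides do not describe how DuraCloud performs its audits.

```python
import hashlib
from typing import Dict, List

Store = Dict[str, bytes]  # one dict stands in for one storage provider


def replicate(key: str, data: bytes, providers: Dict[str, Store]) -> str:
    """Copy data to every provider and return the checksum to audit against."""
    for store in providers.values():
        store[key] = data
    return hashlib.sha256(data).hexdigest()


def audit(key: str, expected: str, providers: Dict[str, Store]) -> List[str]:
    """Return the names of providers whose copy is missing or altered."""
    failed = []
    for name, store in providers.items():
        data = store.get(key)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            failed.append(name)
    return failed
```

An empty list from `audit` means every replica still matches the checksum recorded at replication time; any provider named in the result would need its copy repaired from a good replica.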
DuraCloud
Trusted management of and access to durable digital assets in the cloud
[Diagram labels: DuraSpace Mediating Service; Sun; EMC; Amazon; Microsoft]
http://www.duraspace.org/
http://www.fedora-commons.org/
Examples
• Text, images only
• Art and Architecture
• Quantitative data
• Aggregation objects