DuraSpace, Fedora and DuraCloud Thorny Staples Director, Community Strategy and Alliances ESIP Meeting, July 8, 2009

DuraSpace, Fedora and DuraCloud

Thorny Staples

Director, Community Strategy and Alliances

ESIP Meeting, July 8, 2009

DuraSpace, Inc.

• Combined Fedora Commons, Inc. and DSpace Foundation

• 501(c)(3) private, non-profit company

• 4-year project funded by Moore Foundation to become self-sustaining

• Mission-driven, products evolve to follow community needs

• Moving towards community-based software development

The world we work in…

Scholarly and Scientific Collections

Preservation and Archiving

Education, Knowledge Spaces

Data Curation, Linking, Publishing

blog and wiki

and more…

DuraSpace Products

• Fedora

• DSpace

• Akubra – storage plug-in module with transactional file system

• Mulgara – RDF indexing engine

• Topaz – core semantic knowledgebase components

• DuraCloud

Solution Communities

• Community group that creates and maintains the vision for a solution bundle in an area

• Gathers resources to create software for the solution

• Coordinates development with DuraSpace technical staff

• Smaller group that gets things done will emerge

Solution Areas

• Data Curation

• Open Access Publishing

• Integration Services

• Preservation and Archiving

• Small Archives

• Scholars’ Workbench

Other Possible Community Groups

• Other software development groups

• News and Publications Outreach group that works with our Communications Director

• Issue/advocacy groups that work on standards important to the community

The Flexible Extensible Digital Object Repository Architecture

• A set of abstractions that can be used to represent different kinds of data

• A repository management system

• A foundation for many information management applications

• Designed to make data “durable” over the long term

165 Current Known Users

• Broadcasting and media – 1

• Consortia – 9

• Corporations – 14

• Government agencies – 8

• IT-Related Institutions – 10

• Medical Centers and Libraries – 4

• Museums and Cultural Organizations – 5

• National Libraries and Archives – 16

• Professional Societies – 2

• Publishing – 4

• Research Groups and Projects – 18

• Semantic and Virtual Library Projects – 6

• University Libraries and Archives – 68

Community-built Applications

• Fez

• Muradora

• Islandora (Drupal-based)

• Arrow

• VITAL (vendor software from VTLS)

• eSciDoc

• Hydra

Making complex digital information “durable” is a very hard problem

• The existence and meaning of content needs to be verifiable as technologies change

• A history of the changes to the encoding and state of content must be reliably provided

• A meaningful context for any unit of content may be one of many and must be sustained

• Complex resources will increasingly be dispersed across institutional boundaries.

The Fedora abstractions provide a durability framework.

• Content is “unitized” as information objects that combine data, metadata, policies, relationships and the history of the object.

• Complex digital resources are formally defined graphs of related objects.

• The public view of the content is presented as abstract behaviors.

• The web services orientation of Fedora provides the basis for repository federation.

Preservation and Archiving Solutions

Data Curation Solutions

Scholars' Repository

Publishing Solutions

Abstract Data Management (Fedora Commons Service Framework)

Tape Libraries

SAN

RAID array

A data object is one unit of content

Persistent ID

Reserved Datastreams: DC, RELS-EXT, AUDIT, POLICY

Custom Datastreams (any type, any number): 1, 2, n

Object Properties

• PID: “namespace:name”

• Object Type: “Data”, “Service Definition”, or “Service Deployment”

• State: “A”, “I”, or “D” (Active, Inactive, Deleted)

• Label: “Any string”

• Content Model: “Any string”

• Created Date: “2007-04-30T19:59:03.000Z” (UTC, ISO 8601 format)

• Last Modified Date: “2007-04-30T19:59:03.000Z” (UTC, ISO 8601 format)

• Owner ID: “Any string”

Legend: System generates value; Either way; Client provides value
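As a rough illustration of the object properties listed on this slide, the sketch below models them as a small Python record. The class name, the PID character set, and the choice to fill in missing dates automatically are assumptions made for the example; the transcript does not preserve which properties the legend marks as system-generated.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: a conservative "namespace:name" pattern, not the repository's actual rule.
PID_PATTERN = re.compile(r"^[A-Za-z0-9.-]+:[A-Za-z0-9._~-]+$")
OBJECT_TYPES = {"Data", "Service Definition", "Service Deployment"}
OBJECT_STATES = {"A": "Active", "I": "Inactive", "D": "Deleted"}


def utc_timestamp(moment: datetime) -> str:
    """Format a datetime as UTC ISO 8601 with milliseconds and a trailing Z."""
    moment = moment.astimezone(timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


@dataclass
class ObjectProperties:
    pid: str
    object_type: str = "Data"
    state: str = "A"
    label: str = ""
    content_model: str = ""
    owner_id: str = ""
    created: str = ""        # filled in below when the caller leaves it empty
    last_modified: str = ""  # defaults to the created date

    def __post_init__(self):
        if not PID_PATTERN.match(self.pid):
            raise ValueError(f"PID must look like 'namespace:name': {self.pid!r}")
        if self.object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {self.object_type!r}")
        if self.state not in OBJECT_STATES:
            raise ValueError(f"state must be one of {sorted(OBJECT_STATES)}")
        self.created = self.created or utc_timestamp(datetime.now(timezone.utc))
        self.last_modified = self.last_modified or self.created
```

For example, `ObjectProperties(pid="demo:1", label="Field notes")` yields an Active data object with both dates set to the current UTC time.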

Datastreams hold or represent the content

• Managed: Fedora stores and manages the content bytestream. Content is located via an internal ID.

• External: Fedora stores a reference (URL) to the content and mediates access to the content.

• External Redirected: Fedora stores a reference (URL) to the content, but will not mediate access to the content.

• Inline XML: Fedora stores a name-spaced block of XML content within the Fedora digital object XML wrapper file.

Datastream Properties

• Datastream ID: any XML “NCName” unique within the object

• State: “A”, “I”, or “D” (Active, Inactive, Deleted)

• Control Group: “X”, “M”, “E”, or “R” (Inline XML, Managed, Externally Referenced, or Redirected)

• Versionable: “true” or “false”

• Version: 1 or more

Legend: System generates value; Either way; Client provides value
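The datastream properties and the four control groups can be pictured with a small Python sketch. The class and method names are hypothetical, and the rule that a non-versionable datastream overwrites its single version is an assumption about what "Versionable" implies, not something the slides state.

```python
from dataclasses import dataclass, field
from enum import Enum


class ControlGroup(Enum):
    INLINE_XML = "X"
    MANAGED = "M"
    EXTERNAL = "E"
    REDIRECTED = "R"


@dataclass
class DatastreamVersion:
    # Inline XML text, an internal ID, or a URL, depending on the control group.
    content_ref: str


@dataclass
class Datastream:
    datastream_id: str
    control_group: ControlGroup
    state: str = "A"
    versionable: bool = True
    versions: list = field(default_factory=list)

    def add_version(self, content_ref: str) -> None:
        """Append a version, or overwrite the only one when not versionable (assumed rule)."""
        version = DatastreamVersion(content_ref)
        if self.versionable or not self.versions:
            self.versions.append(version)
        else:
            self.versions[-1] = version

    def mediated_by_repository(self) -> bool:
        """Per the control-group descriptions, only Redirected content bypasses mediation."""
        return self.control_group is not ControlGroup.REDIRECTED
```

Using an enum keyed on the one-letter codes means `ControlGroup("M")` resolves directly to the Managed group, which keeps the slide's codes and their meanings in one place.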

Relationships Among Objects

• Describes adjacency relationships among objects, among units of content

• RDF data of the form:

PID – typeOfRelationship – relatedObjectPID

• Can be used to assemble aggregations of objects

• Can build graphs of relationships to feed into user interfaces
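A minimal sketch of how such PID – relationship – PID statements might be used to assemble an aggregation and an adjacency map, using plain Python tuples rather than an RDF library. The PIDs and relationship names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical statements of the form (PID, typeOfRelationship, relatedObjectPID).
triples = [
    ("demo:page1", "isMemberOf", "demo:book1"),
    ("demo:page2", "isMemberOf", "demo:book1"),
    ("demo:book1", "isMemberOfCollection", "demo:rarebooks"),
]


def members_of(target_pid, relationship, statements):
    """Return the PIDs of objects that assert `relationship` to `target_pid`."""
    return [s for s, p, o in statements if p == relationship and o == target_pid]


def adjacency(statements):
    """Build a subject -> [(relationship, object)] map, e.g. to feed a browse interface."""
    graph = defaultdict(list)
    for s, p, o in statements:
        graph[s].append((p, o))
    return dict(graph)
```

Here `members_of("demo:book1", "isMemberOf", triples)` gathers the two page objects into one aggregation, while `adjacency(triples)` gives a user interface the outgoing links for each object.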

Optional Object Behaviors

• Data objects can have different views or transformations

• Sets of abstract behaviors that different kinds of objects can subscribe to

• Corresponding sets of services that specific objects can execute

• The business logic is hidden behind an abstraction

Content Management: PID, System Meta, MODS (MODS File), JP2000 (JPEG2000 File)

Content Access: Thumb, Screen, Master, Custom Size, Dublin Core, MODS, Citation

General Image Object: Persistent ID (PID), System Metadata, thumbnail image file, med res. image file, high res. image file, max res. image file

JPEG2000 Image Object: Persistent ID (PID), System Metadata, JPEG2000 image file

Service Description

• get-thumbnail-sized-image

• get-med-sized-image

• get-high-res-image

• get-max-sized-image

Service Mechanism for General Image Objects

• get-thumbnail-sized-image

• get-med-sized-image

• get-high-res-image

• get-max-sized-image

Service Mechanism for JPEG2000 Image Objects

• get-smallest-JPEG2000-size

• get-midrange-JPEG2000-size

• get-high-res-JPEG2000-size

• get-max-JPEG2000-size
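The idea on this slide, one abstract set of behaviors backed by a different mechanism for each kind of object, can be sketched as a simple dispatch table. The dictionary layout, the `kind` key, and the `?size=` request string are placeholders invented for the example, not part of Fedora.

```python
# Abstract behaviors named in the service description.
BEHAVIORS = [
    "get-thumbnail-sized-image",
    "get-med-sized-image",
    "get-high-res-image",
    "get-max-sized-image",
]


def general_image_mechanism(obj, behavior):
    """Serve a pre-rendered file: one stored image per behavior."""
    file_for = dict(zip(BEHAVIORS, ["thumbnail", "med", "high", "max"]))
    return obj["files"][file_for[behavior]]


def jpeg2000_mechanism(obj, behavior):
    """Derive every size from the single JPEG2000 file (shown here as a request string)."""
    size_for = dict(zip(BEHAVIORS, ["smallest", "midrange", "high-res", "max"]))
    return f"{obj['files']['jp2']}?size={size_for[behavior]}"


MECHANISMS = {
    "general-image": general_image_mechanism,
    "jpeg2000-image": jpeg2000_mechanism,
}


def call_behavior(obj, behavior):
    """Callers use the abstract behavior name; the object's kind selects the mechanism."""
    if behavior not in BEHAVIORS:
        raise KeyError(behavior)
    return MECHANISMS[obj["kind"]](obj, behavior)
```

A client asking for `get-thumbnail-sized-image` never needs to know whether the object holds four separate files or one JPEG2000 master; that choice stays behind the abstraction.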

Content Models

• Create classes of data objects

• Expressed as Cmodel objects

• A Cmodel object defines the number and types of datastreams for objects of that class

• A Cmodel object binds to service objects to enable appropriate behaviors to be inherited by data objects
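To make the two roles of a Cmodel object concrete, here is a small sketch in which a content model checks an object's datastreams and passes on behaviors through its service definitions. The class, the MIME-type check, and all PIDs are assumptions for illustration; the real Cmodel format is not shown in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class ContentModel:
    pid: str
    required_datastreams: dict  # datastream ID -> expected MIME type
    service_definitions: list = field(default_factory=list)

    def conforms(self, datastreams: dict) -> bool:
        """True when every required datastream is present with the expected MIME type."""
        return all(
            datastreams.get(ds_id) == mime
            for ds_id, mime in self.required_datastreams.items()
        )


def behaviors_for(model, methods_by_sdef):
    """Collect the methods a data object inherits via its model's service definitions."""
    return [
        method
        for sdef in model.service_definitions
        for method in methods_by_sdef.get(sdef, [])
    ]
```

A data object that declares this model and passes `conforms` would then expose whatever `behaviors_for` returns, without carrying any behavior definitions itself.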

• Service Definition Object: Persistent ID (PID), Service Definition Metadata, System Metadata, Datastreams

• Service Mechanism Object: Persistent ID (PID), Service Binding Metadata (WSDL), System Metadata, Datastreams

• Cmodel Object: Persistent ID (PID), RDF data, System Metadata, Datastreams

• Data Objects: Persistent ID (PID), System Metadata, Datastreams

• Web Service

• Connections: service contract, service subscription, data contract

A behavior call has the form:

Object PID + SDef Name + Method Name

Other components include:

- Parameter values used by the method

- Datetime stamp for earlier version
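One way to picture a behavior call is as a URL assembled from those parts. The sketch below is illustrative only: the base URL, the path layout, and the `asOfDateTime` parameter name are assumptions, so the repository's own API documentation should be consulted for the actual request syntax.

```python
from urllib.parse import quote, urlencode


def behavior_call_url(base_url, pid, sdef, method, params=None, as_of=None):
    """Join Object PID + SDef name + method name, then append optional components."""
    # Keep ":" readable in the path so PIDs stay in their namespace:name form.
    path = "/".join(quote(part, safe=":") for part in (pid, sdef, method))
    query = dict(params or {})
    if as_of:
        # Hypothetical parameter carrying the datetime stamp of an earlier version.
        query["asOfDateTime"] = as_of
    url = f"{base_url.rstrip('/')}/{path}"
    return f"{url}?{urlencode(query)}" if query else url
```

For instance, calling it with a hypothetical base of `http://localhost:8080/fedora/get`, the PID `demo:5`, the SDef `demo:imageSDef`, and the method `getThumbnail` produces a single path with the three parts in order, with any method parameters following as a query string.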

Objects Representing Aggregations

• Creating parent objects for complex resources

• Representing explicit collections

• Representing implicit collections

• Creating digital surrogates for physical entities

A Research Project

Remote Sensing Device

Images

Datasets

Proposal

Remote Camera

Final Report

Fedora Framework Service Integration

Fedora Repository Service

Simple JMS

Services: GSearch, OAI, Ingest, More…

The repository publishes events; services listen and consume events or other messages.
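The publish-and-listen pattern on this slide can be sketched with a tiny in-process event bus. This stands in for a real JMS broker purely for illustration; the topic name, message fields, and the toy search index are all hypothetical.

```python
from collections import defaultdict


class EventBus:
    """A minimal stand-in for a message broker: topics map to listener callbacks."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, topic, listener):
        self._listeners[topic].append(listener)

    def publish(self, topic, message):
        for listener in self._listeners[topic]:
            listener(message)


# A toy "search service" that keeps an index in step with repository events.
search_index = {}


def index_listener(message):
    if message["method"] == "purgeObject":
        search_index.pop(message["pid"], None)
    else:
        search_index[message["pid"]] = message.get("label", "")


bus = EventBus()
bus.subscribe("repository.update", index_listener)
bus.publish("repository.update", {"method": "ingest", "pid": "demo:1", "label": "Field notes"})
```

The repository side only publishes; it does not need to know which services are listening, which is what lets search, OAI, and other services be added independently.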

Current Work… early seeds for DuraCloud concept

Shared Storage Abstraction

Plug-in 1 Plug-in 2 Plug-in …

Amazon University SAN/Fabric

Local Storage
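A storage abstraction with plug-ins might look roughly like the following sketch: one small interface, with each backend supplied as a plug-in. The interface, class names, and the in-memory backend are assumptions for illustration; a real plug-in would wrap a cloud provider, a SAN, or a local file system.

```python
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Hypothetical plug-in interface: every backend offers the same two operations."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(StoragePlugin):
    """Stand-in backend that keeps content in a dictionary."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class StorageAbstraction:
    """Routes requests to whichever plug-in is registered under a given name."""

    def __init__(self):
        self._plugins = {}

    def register(self, name, plugin: StoragePlugin):
        self._plugins[name] = plugin

    def put(self, store, key, data):
        self._plugins[store].put(key, data)

    def get(self, store, key):
        return self._plugins[store].get(key)
```

Callers address content by store name and key only, so swapping one backend for another is a matter of registering a different plug-in.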

DuraCloud - basics

• Replicate to multiple storage providers

• Replicate to multiple geographic areas

• Monitor and audit digital assets

• Compute services in cloud next to content

• Hosted by DuraSpace not-for-profit org

• Partnerships with cloud providers

• “Pay for use” for services and storage
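Replication and auditing can be illustrated with a short sketch that copies content to several providers and later verifies each copy against a checksum. Providers are plain dictionaries here and SHA-256 is an arbitrary choice; neither reflects how DuraCloud itself is implemented.

```python
import hashlib


def replicate(key, data, providers):
    """Write the same bytes to every provider; return the SHA-256 recorded at write time."""
    for store in providers.values():
        store[key] = data
    return hashlib.sha256(data).hexdigest()


def audit(key, expected_digest, providers):
    """Return the names of providers whose copy is missing or no longer matches."""
    failures = []
    for name, store in providers.items():
        blob = store.get(key)
        if blob is None or hashlib.sha256(blob).hexdigest() != expected_digest:
            failures.append(name)
    return failures
```

An empty list from `audit` means every replica still matches the digest recorded at write time; any name it returns identifies a copy to repair from a healthy replica.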

DuraCloud

Trusted management of and access to durable digital assets in the cloud

DuraSpace Mediating Service

Sun

EMC

Amazon

Microsoft

http://www.duraspace.org/

http://www.fedora-commons.org/

Examples

• Text, images only

• Art and Architecture

• Quantitative data

• Aggregation objects