The Flexible Extensible Digital Object Repository Architecture


Upload: mave

Post on 06-Jan-2016


Page 1: The Flexible Extensible Digital Object Repository Architecture

The Flexible Extensible Digital Object Repository Architecture

• A set of abstractions that can be used to represent different kinds of data

• A repository management system

• A foundation for many information management applications

• Designed to make data “durable” over the long term

Page 2: The Flexible Extensible Digital Object Repository Architecture

The Fedora Project

• Developed at Cornell under an NSF grant

• UVA Library re-interpreted the architecture and created the first practical implementation

• 3-year project funded in 2001 by the Andrew W. Mellon Foundation to create an open-source system

• Another 3 years of development funded by Mellon in 2004

Page 3: The Flexible Extensible Digital Object Repository Architecture

Fedora Commons, Inc.

• 501(c)(3) private, non-profit company

• 4-year project funded by the Moore Foundation to become self-sustaining

• Continuing software development

• Moving towards community-based software development

• Establishing “solution communities” for the development of solution bundles

Page 4: The Flexible Extensible Digital Object Repository Architecture

Other Fedora Commons Projects

• Akubra – storage plug-in module with transactional file system

• Mulgara – RDF indexing engine

• Topaz – core semantic knowledge base components

Page 5: The Flexible Extensible Digital Object Repository Architecture

The world we work in…

[Diagram labels: Scholarly and Scientific Collections; Preservation and Archiving; Education, Knowledge Spaces; Data Curation, Linking, Publishing; blog and wiki; and more …]

Page 6: The Flexible Extensible Digital Object Repository Architecture

148 Current Known Users

• Broadcasting and media – 1

• Consortia – 6

• Corporations – 13

• Government agencies – 4

• IT-Related Institutions – 9

• Medical Centers and Libraries – 4

• Museums and Cultural Organizations – 4

• National Libraries and Archives – 16

• Professional Societies – 2

• Publishing – 4

• Research Groups and Projects – 14

• Semantic and Virtual Library Projects – 6

• University Libraries and Archives – 65

Page 7: The Flexible Extensible Digital Object Repository Architecture

7 Known Vendors and Integrators:

• Acuity Unlimited

• Aptivate

• Atos Origin, France

• FIZ Karlsruhe

• MediaShelf, LLC

• Sun Microsystems

• VTLS

Page 8: The Flexible Extensible Digital Object Repository Architecture

[Diagram: Abstract Data Management (Fedora Commons Service Framework)]

Solutions: Preservation and Archiving Solutions; Data Curation Solutions; eResearch Solutions; Publishing Solutions

Storage: Tape Libraries; Honeycomb; RAID array

Page 9: The Flexible Extensible Digital Object Repository Architecture

Solution Communities

• A community group that creates and maintains the vision for a solution bundle in an area

• Gathers resources to create software for the solution

• Coordinates development with the FC Architecture Council

• A smaller group that gets things done will emerge

Page 10: The Flexible Extensible Digital Object Repository Architecture

Solution Areas

• Data Curation – Sayeed Choudhury, from Johns Hopkins University

• Preservation and Archiving – Ron Jantz, from Rutgers

• Open Access Publishing – Rich Cave, from PLOS

• Integration Services – Matt Zumwalt, from MediaShelf, LLC

Page 11: The Flexible Extensible Digital Object Repository Architecture

Other Possible Community Groups

• Other software development groups

• News and Publications Outreach group that works with our Communications Director

• Issue/advocacy groups that work on standards important to the community

Page 12: The Flexible Extensible Digital Object Repository Architecture

Making complex digital information “durable” is a very hard problem

• The existence and meaning of content need to be verifiable as technologies change

• A history of the changes to the encoding and state of content must be reliably provided

• A meaningful context for any unit of content may be one of many and must be sustained

• Complex resources will increasingly be dispersed across institutional boundaries.

Page 13: The Flexible Extensible Digital Object Repository Architecture

The Fedora abstractions provide a durability framework.

• Content is “unitized” as information objects that combine data, metadata, policies, relationships and the history of the object.

• Complex digital resources are formally defined graphs of related objects.

• The public view of the content is presented as abstract behaviors.

• The web services orientation of Fedora provides the basis for repository federation.

Page 14: The Flexible Extensible Digital Object Repository Architecture

A data object is one unit of content, represented by an XML file.

Persistent ID (PID)

System Metadata

Policies

Relationships

Local Content

Datastreams managed by the system

Datastreams for the components of the content

Page 15: The Flexible Extensible Digital Object Repository Architecture

Datastreams hold or represent the content

• Inline XML: content is in the FoXML object

• Managed Content: content is managed by the repository

• Externally Referenced: URL of remote content is in the FoXML object

• Re-directed Referenced: external, but content is not disseminated through Fedora
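
The four datastream kinds above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, field names, and the example PID and URL are hypothetical and are not Fedora's actual API or FoXML schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class DatastreamKind(Enum):
    """The four kinds of datastream named on the slide."""
    INLINE_XML = "inline XML"
    MANAGED = "managed content"
    EXTERNAL = "externally referenced"
    REDIRECT = "re-directed referenced"


@dataclass
class Datastream:
    ds_id: str
    kind: DatastreamKind
    content: Optional[bytes] = None  # used by INLINE_XML and MANAGED
    url: Optional[str] = None        # used by EXTERNAL and REDIRECT

    def stored_in_repository(self) -> bool:
        # Inline and managed content live with the object or in the repository.
        return self.kind in (DatastreamKind.INLINE_XML, DatastreamKind.MANAGED)

    def disseminated_through_repository(self) -> bool:
        # Only re-directed content bypasses the repository on delivery.
        return self.kind is not DatastreamKind.REDIRECT


@dataclass
class DataObject:
    pid: str
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def add(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds


# Hypothetical object with one inline and one re-directed datastream.
obj = DataObject("demo:1")
obj.add(Datastream("DC", DatastreamKind.INLINE_XML, content=b"<dc/>"))
obj.add(Datastream("VIDEO", DatastreamKind.REDIRECT, url="http://example.org/video"))
```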

Page 16: The Flexible Extensible Digital Object Repository Architecture

Datastream Characteristics

• An object can have any number of datastreams, of multiple types

• Versioned automatically by default

• Checksummed automatically by default

• Formal identifier

• Alternate identifiers

• Audit trail maintained for all datastream actions
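
A minimal sketch of how versioning, checksums and an audit trail might fit together. The slides do not say which checksum algorithm is used or how versions are stored, so SHA-256, the class names and the audit message format are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DatastreamVersion:
    version: int
    content: bytes
    checksum: str  # hex SHA-256 digest of content (algorithm is an assumption)


@dataclass
class VersionedDatastream:
    ds_id: str
    versions: List[DatastreamVersion] = field(default_factory=list)
    audit_trail: List[str] = field(default_factory=list)

    def update(self, content: bytes, who: str) -> DatastreamVersion:
        """Append a new version instead of overwriting the old one."""
        digest = hashlib.sha256(content).hexdigest()
        new = DatastreamVersion(len(self.versions), content, digest)
        self.versions.append(new)
        self.audit_trail.append(f"{who} wrote {self.ds_id} version {new.version}")
        return new

    def verify(self, version: int) -> bool:
        """Recompute the checksum and compare it with the stored one."""
        v = self.versions[version]
        return hashlib.sha256(v.content).hexdigest() == v.checksum


ds = VersionedDatastream("TEXT")
ds.update(b"first draft", who="alice")
ds.update(b"second draft", who="bob")
```

Earlier versions stay readable (`ds.versions[0]`), and `ds.verify(n)` gives a simple fixity check on any version.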

Page 17: The Flexible Extensible Digital Object Repository Architecture

Relationships Among Objects

• Describes adjacency relationships among objects

• RDF data of the form: PID – typeOfRelationship – relatedObjectPID

• Can be used to assemble aggregations of objects

• Can build graphs of relationships to feed into user interfaces
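
The triple form above can be sketched with plain tuples. The PIDs and relationship names below are illustrative assumptions, and a real repository would query an RDF index rather than scan a list.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (PID, typeOfRelationship, relatedObjectPID)
Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("demo:page1", "isPartOf", "demo:book"),
    ("demo:page2", "isPartOf", "demo:book"),
    ("demo:book", "isMemberOfCollection", "demo:rare-books"),
]


def members_of(target: str, relationship: str, data: List[Triple]) -> List[str]:
    """Assemble an aggregation: every object that points at target."""
    return sorted(s for s, p, o in data if p == relationship and o == target)


def adjacency(data: List[Triple]) -> Dict[str, Set[str]]:
    """Build an undirected neighbor map, e.g. to drive a browse interface."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for subject, _predicate, obj in data:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return dict(graph)


pages = members_of("demo:book", "isPartOf", triples)
neighbors = adjacency(triples)
```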

Page 18: The Flexible Extensible Digital Object Repository Architecture

Objects Representing Aggregations

• Creating parent objects for complex resources

• Representing explicit collections

• Representing implicit collections

• Creating digital surrogates for physical entities

Page 19: The Flexible Extensible Digital Object Repository Architecture

Optional Object Behaviors

• Data objects can have different views or transformations

• Sets of abstract behaviors that different kinds of objects can subscribe to

• Corresponding sets of services that specific objects can execute

• The business logic is hidden behind an abstraction

Page 20: The Flexible Extensible Digital Object Repository Architecture

[Diagram: two kinds of image object subscribing to the same service description]

General Image Object: Persistent ID (PID); System Metadata; thumbnail image file; med res. image file; high res. image file; max res. image file

JPEG2000 Image Object: Persistent ID (PID); System Metadata; JPEG2000 image file

Service Description: get-thumbnail-sized-image; get-med-sized-image; get-high-res-image; get-max-sized-image

Service Mechanism for General Image Objects: get-thumbnail-sized-image; get-med-sized-image; get-high-res-image; get-max-sized-image

Service Mechanism for JPEG2000 Image Objects: get-smallest-JPEG2000-size; get-midrange-JPEG2000-size; get-high-res-JPEG2000-size; get-max-JPEG2000-size
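
One way to picture the idea is an abstract interface with two implementations: callers ask for a thumbnail or a high-resolution image without knowing whether the object holds four separate files or one JPEG2000 file. The sketch below is an analogy in plain Python, not Fedora's service mechanism; all class names, file names and the `#level` notation are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ImageBehaviors(ABC):
    """Abstract behaviors shared by every kind of image object."""

    @abstractmethod
    def get_image(self, size: str) -> str:
        """Return a reference to the image at 'thumbnail', 'med', 'high' or 'max'."""

    def get_thumbnail_sized_image(self) -> str:
        return self.get_image("thumbnail")

    def get_med_sized_image(self) -> str:
        return self.get_image("med")

    def get_high_res_image(self) -> str:
        return self.get_image("high")

    def get_max_sized_image(self) -> str:
        return self.get_image("max")


class GeneralImageMechanism(ImageBehaviors):
    """Object stores one file per size; just look the file up."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files

    def get_image(self, size: str) -> str:
        return self.files[size]


class Jpeg2000Mechanism(ImageBehaviors):
    """Object stores a single JPEG2000 file; each size is derived from it."""

    LEVELS = {"thumbnail": "smallest", "med": "midrange", "high": "high-res", "max": "max"}

    def __init__(self, jp2_file: str) -> None:
        self.jp2_file = jp2_file

    def get_image(self, size: str) -> str:
        # Stand-in for a real JPEG2000 decode at the requested level.
        return f"{self.jp2_file}#{self.LEVELS[size]}"


general = GeneralImageMechanism(
    {"thumbnail": "thumb.jpg", "med": "med.jpg", "high": "high.jpg", "max": "max.tif"}
)
jp2 = Jpeg2000Mechanism("scan.jp2")
```

Both objects answer the same four calls, which is the point of hiding the business logic behind the abstraction.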

Page 21: The Flexible Extensible Digital Object Repository Architecture

Content Models

• Create classes of data objects

• Expressed as Cmodel objects

• A Cmodel object defines the number and types of data streams for objects of that class

• A Cmodel object binds to service objects to enable appropriate behaviors to be inherited by data objects
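
A content model can be thought of as a checklist that a data object's datastreams must satisfy. The sketch below assumes a model is simply a map from datastream ID to expected MIME type; that representation, the class name and the example IDs are assumptions for illustration, not the format Fedora uses for Cmodel objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ContentModel:
    pid: str
    required: Dict[str, str]  # datastream ID -> expected MIME type

    def violations(self, datastreams: Dict[str, str]) -> List[str]:
        """List every way the given datastreams fail to match this model."""
        problems: List[str] = []
        for ds_id, mime in self.required.items():
            if ds_id not in datastreams:
                problems.append(f"missing datastream {ds_id}")
            elif datastreams[ds_id] != mime:
                problems.append(f"{ds_id} is {datastreams[ds_id]}, expected {mime}")
        return problems


image_model = ContentModel(
    "demo:image-cmodel", {"THUMB": "image/jpeg", "FULL": "image/tiff"}
)

ok = image_model.violations({"THUMB": "image/jpeg", "FULL": "image/tiff"})
bad = image_model.violations({"THUMB": "image/png"})
```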

Page 22: The Flexible Extensible Digital Object Repository Architecture

[Diagram: how the four object types relate]

Data Objects: Persistent ID (PID); System Metadata; Datastreams

Cmodel Object: Persistent ID (PID); RDF data; Datastreams; System Metadata

Service Definition Object: Persistent ID (PID); Service Definition Metadata; System Metadata; Datastreams

Service Mechanism Object: Persistent ID (PID); Service Binding Metadata (WSDL); System Metadata; Datastreams

Web Service

Connector labels: service contract; service subscription; data contract

Page 23: The Flexible Extensible Digital Object Repository Architecture

A behavior call has the form:

Object PID + SDef Name + Method Name

Other components include:

- Parameter values used by the method

- Datetime stamp for an earlier version
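
As a rough sketch, the parts of a behavior call could be assembled into a URL as follows. The path layout (`/get/PID/SDef/method`, then an optional datetime and query parameters), the host, and the example PIDs and method name are assumptions for illustration; the slides specify only which components make up a call.

```python
from typing import Mapping, Optional
from urllib.parse import quote, urlencode


def behavior_call_url(
    base_url: str,
    pid: str,
    sdef: str,
    method: str,
    as_of: Optional[str] = None,
    params: Optional[Mapping[str, object]] = None,
) -> str:
    """Combine Object PID + SDef Name + Method Name into one request URL."""
    # Keep ':' readable because PIDs such as 'demo:5' contain it.
    parts = [quote(pid, safe=":"), quote(sdef, safe=":"), quote(method, safe="")]
    if as_of is not None:
        # Optional datetime stamp selecting an earlier version.
        parts.append(quote(as_of, safe=":"))
    url = base_url.rstrip("/") + "/get/" + "/".join(parts)
    if params:
        # Optional parameter values used by the method.
        url += "?" + urlencode(params)
    return url


current = behavior_call_url(
    "http://localhost:8080/fedora", "demo:5", "demo:sdef", "getThumbnail"
)
earlier = behavior_call_url(
    "http://localhost:8080/fedora",
    "demo:5",
    "demo:sdef",
    "getThumbnail",
    as_of="2008-01-01T00:00:00Z",
    params={"width": 100},
)
```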

Page 24: The Flexible Extensible Digital Object Repository Architecture

The Fedora Service Framework

[Diagram components: Fedora Repository Service; GSearch; OAI; DirIngest; SimpleJMS; Preserve]

These are the core service components we distribute.

Page 25: The Flexible Extensible Digital Object Repository Architecture

http://www.fedora-commons.org/

Page 26: The Flexible Extensible Digital Object Repository Architecture

Examples

• Text, images only

• Art and Architecture

• Quantitative data

• Aggregation objects