the flexible extensible digital object repository architecture
The Flexible Extensible Digital Object Repository Architecture
• A set of abstractions that can be used to represent different kinds of data
• A repository management system
• A foundation for many information management applications
• Designed to make data “durable” over the long term
The Fedora Project
• Developed at Cornell under an NSF grant
• UVA Library re-interpreted the architecture and created the first practical implementation
• 3-year project funded in 2001 by the Andrew W. Mellon Foundation to create an open-source system
• Another 3 years of development funded by Mellon in 2004
Fedora Commons, Inc.
• 501(c)(3) private, non-profit company
• 4-year project funded by Moore Foundation to become self-sustaining
• Continuing software development
• Moving towards community-based software development
• Establishing “solution communities” for the development of solution bundles
Other Fedora Commons Projects
• Akubra – storage plug-in module with transactional file system
• Mulgara – RDF indexing engine
• Topaz – core semantic knowledgebase components
The world we work in…
• Scholarly and Scientific Collections
• Preservation and Archiving
• Education, Knowledge Spaces
• Data Curation, Linking, Publishing
• Blog and wiki
• and more …
148 Current Known Users
• Broadcasting and media – 1
• Consortia – 6
• Corporations – 13
• Government agencies – 4
• IT-Related Institutions – 9
• Medical Centers and Libraries – 4
• Museums and Cultural Organizations – 4
• National Libraries and Archives – 16
• Professional Societies – 2
• Publishing – 4
• Research Groups and Projects – 14
• Semantic and Virtual Library Projects – 6
• University Libraries and Archives – 65
7 Known Vendors and Integrators:
• Acuity Unlimited
• Aptivate
• Atos Origin, France
• FIZ Karlsruhe
• MediaShelf, LLC
• Sun Microsystems
• VTLS
Abstract Data Management (Fedora Commons Service Framework)
• Solutions: Preservation and Archiving, Data Curation, eResearch, Publishing
• Storage: tape libraries, Honeycomb, RAID array
Solution Communities
• Community group that creates and maintains the vision for a solution bundle in an area
• Gathers resources to create software for the solution
• Coordinates development with the FC Architecture Council
• A smaller group that gets things done will emerge
Solution Areas
• Data Curation – Sayeed Choudhury, from Johns Hopkins University
• Preservation and Archiving – Ron Jantz, from Rutgers
• Open Access Publishing – Rich Cave, from PLOS
• Integration Services – Matt Zumwalt from MediaShelf, LLC
Other Possible Community Groups
• Other software development groups
• News and Publications Outreach group that works with our Communications Director
• Issue/advocacy groups that work on standards important to the community
Making complex digital information “durable” is a very hard problem
• The existence and meaning of content needs to be verifiable as technologies change
• A history of the changes to the encoding and state of content must be reliably provided
• A meaningful context for any unit of content may be one of many and must be sustained
• Complex resources will increasingly be dispersed across institutional boundaries.
The Fedora abstractions provide a durability framework.
• Content is “unitized” as information objects that combine data, metadata, policies, relationships and the history of the object.
• Complex digital resources are formally defined graphs of related objects.
• The public view of the content is presented as abstract behaviors.
• The web services orientation of Fedora provides the basis for repository federation.
A data object is one unit of content, represented by an XML file.
Persistent ID (PID)
System Metadata
Policies
Relationships
Local Content
Datastreams managed by the system
Datastreams for the components of the content
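As a rough illustration of the structure on this slide, a data object can be modeled as a PID plus a set of named datastreams, some maintained by the system and some carrying the content. This is a minimal sketch, not Fedora's own API: the class names, the example PID `demo:1`, the datastream IDs, and the choice of which datastreams count as system-managed are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """One named component of a data object (hypothetical model)."""
    dsid: str             # identifier of the datastream within the object
    mime_type: str
    system_managed: bool  # True for datastreams the system maintains


@dataclass
class DataObject:
    """One unit of content, identified by a persistent ID (PID)."""
    pid: str
    datastreams: dict = field(default_factory=dict)

    def add(self, ds: Datastream) -> None:
        self.datastreams[ds.dsid] = ds

    def content_datastreams(self) -> list:
        """IDs of the datastreams that carry the components of the content."""
        return [d.dsid for d in self.datastreams.values() if not d.system_managed]


# Hypothetical object: two system-managed datastreams and one content datastream.
obj = DataObject(pid="demo:1")
obj.add(Datastream("POLICY", "text/xml", system_managed=True))
obj.add(Datastream("RELATIONSHIPS", "application/rdf+xml", system_managed=True))
obj.add(Datastream("IMAGE", "image/jpeg", system_managed=False))
print(obj.content_datastreams())  # ['IMAGE']
```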
Datastreams hold or represent the content
• Inline XML: content is in the FOXML object
• Managed Content: content is managed by the repository
• Externally Referenced: URL of remote content is in the FOXML object
• Redirect Referenced: external, but content is not disseminated through Fedora
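The four datastream types differ in two respects: whether the repository holds the bytes, and whether the repository delivers them. The sketch below encodes that distinction as read from the slide. The enum, the one-letter codes, and the two helper functions are illustrative assumptions, not a reproduction of Fedora's code.

```python
from enum import Enum


class ControlGroup(Enum):
    """The four datastream types from the slide (codes are assumed labels)."""
    INLINE_XML = "X"   # content sits inside the object's XML file
    MANAGED = "M"      # content is stored and managed by the repository
    EXTERNAL = "E"     # object holds only a URL; repository relays the content
    REDIRECT = "R"     # object holds only a URL; client is sent to it directly


def held_by_repository(group: ControlGroup) -> bool:
    """True when the repository itself stores the content bytes."""
    return group in (ControlGroup.INLINE_XML, ControlGroup.MANAGED)


def disseminated_through_repository(group: ControlGroup) -> bool:
    """True when content reaches the client via the repository."""
    return group is not ControlGroup.REDIRECT


for group in ControlGroup:
    print(group.name, held_by_repository(group), disseminated_through_repository(group))
```

One consequence worth noting: only inline and managed content is under the repository's direct control, so only those types can be preserved without depending on a remote host.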
Datastream Characteristics
• Can have any number and multiple types in the same object
• Versioned automatically by default
• Checksummed automatically by default
• Formal identifier
• Alternate identifiers
• Audit trail maintained about all datastream actions
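To make versioning, checksums, and the audit trail concrete, here is a small sketch of a datastream that never overwrites earlier content. It is a toy model under stated assumptions: the class and method names are hypothetical, and SHA-256 is chosen only for the example (the slides do not say which checksum algorithm is used).

```python
import hashlib
from datetime import datetime, timezone


class VersionedDatastream:
    """Toy datastream that keeps every version, its checksum, and an audit trail."""

    def __init__(self, dsid: str):
        self.dsid = dsid
        self.versions = []  # list of (timestamp, checksum, content)
        self.audit = []     # one human-readable entry per action

    def update(self, content: bytes, who: str) -> str:
        """Append a new version instead of replacing the old one."""
        checksum = hashlib.sha256(content).hexdigest()
        stamp = datetime.now(timezone.utc)
        self.versions.append((stamp, checksum, content))
        self.audit.append(
            f"{stamp.isoformat()} {who} wrote version {len(self.versions)} of {self.dsid}"
        )
        return checksum

    def current(self) -> bytes:
        return self.versions[-1][2]

    def verify(self, index: int) -> bool:
        """Recompute the checksum of a stored version and compare it."""
        _, checksum, content = self.versions[index]
        return hashlib.sha256(content).hexdigest() == checksum


ds = VersionedDatastream("IMAGE")
ds.update(b"first scan", who="alice")
ds.update(b"corrected scan", who="bob")
print(len(ds.versions), ds.verify(0), ds.verify(1))  # 2 True True
```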
Relationships Among Objects
• Describes adjacency relationships among objects
• RDF data of the form: PID – typeOfRelationship – relatedObjectPID
• Can be used to assemble aggregations of objects
• Can build graphs of relationships to feed into user interfaces
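The triple form above can be sketched with plain tuples. The PIDs and relationship names here are made up for illustration, and a real repository would query an RDF index rather than scan a list.

```python
from collections import defaultdict

# Each relationship is (PID, typeOfRelationship, relatedObjectPID).
# All identifiers below are hypothetical.
triples = [
    ("demo:page1", "isMemberOf", "demo:book1"),
    ("demo:page2", "isMemberOf", "demo:book1"),
    ("demo:book1", "isMemberOfCollection", "demo:collection1"),
]


def members_of(target, triples, predicate="isMemberOf"):
    """Assemble an aggregation: every object that points at `target`."""
    return [s for s, p, o in triples if p == predicate and o == target]


def adjacency(triples):
    """Build a graph (subject -> list of (relationship, object)) for a UI to walk."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


print(members_of("demo:book1", triples))  # ['demo:page1', 'demo:page2']
print(adjacency(triples)["demo:book1"])   # [('isMemberOfCollection', 'demo:collection1')]
```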
Objects Representing Aggregations
• Creating parent objects for complex resources
• Representing explicit collections
• Representing implicit collections
• Creating digital surrogates for physical entities
Optional Object Behaviors
• Data objects can have different views or transformations
• Sets of abstract behaviors that different kinds of objects can subscribe to
• Corresponding sets of services that specific objects can execute
• The business logic is hidden behind an abstraction
General Image Object
• Persistent ID (PID)
• System Metadata
• thumbnail image file
• med res. image file
• high res. image file
• max res. image file
JPEG2000 Image Object
• Persistent ID (PID)
• System Metadata
• JPEG2000 image file
Service Description
• get-thumbnail-sized-image
• get-med-sized-image
• get-high-res-image
• get-max-sized-image
Service Mechanism for General Image Objects
• get-thumbnail-sized-image
• get-med-sized-image
• get-high-res-image
• get-max-sized-image
Service Mechanism for JPEG2000 Image Objects
• get-smallest-JPEG2000-size
• get-midrange-JPEG2000-size
• get-high-res-JPEG2000-size
• get-max-JPEG2000-size
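The image example shows one set of abstract behaviors backed by two different mechanisms. A minimal sketch of the same idea follows. The class names, file names, and the string locators returned are hypothetical; only the behavior names are taken from the slide.

```python
from abc import ABC, abstractmethod


class ImageBehaviors(ABC):
    """The abstract behaviors that any kind of image object subscribes to."""

    def get_thumbnail_sized_image(self) -> str:
        return self._locate("thumbnail")

    def get_med_sized_image(self) -> str:
        return self._locate("med")

    def get_high_res_image(self) -> str:
        return self._locate("high")

    def get_max_sized_image(self) -> str:
        return self._locate("max")

    @abstractmethod
    def _locate(self, size: str) -> str:
        """Mechanism-specific logic, hidden behind the abstraction."""


class GeneralImageMechanism(ImageBehaviors):
    """Object stores one file per size; pick the matching file."""

    def __init__(self, files: dict):
        self.files = files  # size name -> stored file name

    def _locate(self, size: str) -> str:
        return self.files[size]


class Jpeg2000Mechanism(ImageBehaviors):
    """Object stores one JPEG2000 file; each size maps to a JPEG2000 operation."""

    LEVELS = {"thumbnail": "smallest", "med": "midrange", "high": "high-res", "max": "max"}

    def __init__(self, jp2_file: str):
        self.jp2_file = jp2_file

    def _locate(self, size: str) -> str:
        return f"{self.jp2_file}#get-{self.LEVELS[size]}-JPEG2000-size"


general = GeneralImageMechanism(
    {"thumbnail": "thumb.jpg", "med": "med.jpg", "high": "high.jpg", "max": "max.tif"}
)
jp2 = Jpeg2000Mechanism("scan.jp2")

# The caller uses the same behavior name regardless of how the object is stored.
for image in (general, jp2):
    print(image.get_thumbnail_sized_image())
```

The caller never learns whether four files or one JPEG2000 file sits behind the object, which is what lets the stored format change later without breaking clients.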
Content Models
• Create classes of data objects
• Expressed as Cmodel objects
• A Cmodel object defines the number and types of datastreams for objects of that class
• A Cmodel object binds to service objects to enable appropriate behaviors to be inherited by data objects
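One way to picture a content model is as a checklist that a data object's datastreams must satisfy, plus the list of services that conforming objects inherit. The sketch below assumes that reading. The `ContentModel` class, the PIDs, and the datastream IDs are hypothetical and do not reflect how Fedora stores a Cmodel object.

```python
from dataclasses import dataclass


@dataclass
class ContentModel:
    """Hypothetical stand-in for a Cmodel object."""
    pid: str
    required: dict   # datastream ID -> expected MIME type
    services: tuple  # PIDs of the service objects this model binds to


def problems(datastreams: dict, model: ContentModel) -> list:
    """Compare an object's datastreams (ID -> MIME type) with the model.

    Returns a list of mismatches; an empty list means the object conforms.
    """
    found = []
    for dsid, mime in model.required.items():
        if dsid not in datastreams:
            found.append(f"missing datastream {dsid}")
        elif datastreams[dsid] != mime:
            found.append(f"{dsid} is {datastreams[dsid]}, expected {mime}")
    return found


jp2_model = ContentModel(
    pid="demo:jp2Cmodel",
    required={"JP2": "image/jp2", "DESC": "text/xml"},
    services=("demo:imageSDef",),
)

good = {"JP2": "image/jp2", "DESC": "text/xml"}
bad = {"JP2": "image/jpeg"}

print(problems(good, jp2_model))  # []
print(problems(bad, jp2_model))
```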
Service Definition Object
• Persistent ID (PID)
• Service Definition Metadata
• System Metadata
• Datastreams
Service Mechanism Object
• Persistent ID (PID)
• Service Binding Metadata (WSDL)
• System Metadata
• Datastreams
Cmodel Object
• Persistent ID (PID)
• RDF data
• Datastreams
• System Metadata
Data Objects
• Persistent ID (PID)
• System Metadata
• Datastreams
Links shown between the objects: service contract, service subscription, data contract; the Service Mechanism Object connects to a Web Service.
A behavior call has the form:
Object PID + SDef Name + Method Name
Other components include:
- Parameter values used by the method
- Datetime stamp for earlier version
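The pieces of a behavior call can be assembled into a single request address. The sketch below is only an illustration of that composition: the function name, the base URL, the path layout, and the example identifiers are assumptions, and the actual URL syntax should be taken from the Fedora documentation.

```python
from urllib.parse import quote, urlencode


def behavior_call(base_url, pid, sdef, method, params=None, as_of=None):
    """Compose Object PID + SDef name + method name into one URL.

    `params` holds parameter values used by the method; `as_of` is an optional
    datetime stamp selecting an earlier version. The layout is hypothetical.
    """
    parts = [quote(pid, safe=":"), quote(sdef, safe=":"), quote(method, safe="")]
    if as_of:
        parts.append(quote(as_of, safe=":"))
    url = base_url.rstrip("/") + "/" + "/".join(parts)
    if params:
        url += "?" + urlencode(params)
    return url


print(behavior_call(
    "http://localhost:8080/fedora/get",  # assumed base URL
    "demo:1",
    "demo:imageSDef",
    "get-thumbnail-sized-image",
))
# http://localhost:8080/fedora/get/demo:1/demo:imageSDef/get-thumbnail-sized-image
```

Passing `as_of="2008-01-01T00:00:00Z"` appends the datetime stamp as a fourth path segment, and `params={"width": 100}` adds `?width=100` as a query string.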
The Fedora Service Framework
• Fedora Repository Service
• GSearch
• OAI
• DirIngest
• SimpleJMS
• Preserve
These are the core service components we distribute.
http://www.fedora-commons.org/
Examples
• Text, images only
• Art and Architecture
• Quantitative data
• Aggregation objects