Andy Powell, Eduserv Foundation, [email protected], June 2006: Eprints Application Profile
Andy Powell, Eduserv [email protected]
www.eduserv.org.uk/foundation
Eprints Application Profile
June 2006 Eprint Application Profile Meeting - London
Agenda
• Welcome and introductions
• Issues with current use of simple DC
• Functional Requirements
• Model
• Lunch
• Eprints Application Profile
• Workplan
Current issues
• what’s the problem with using simple DC to describe eprints?
• difficult to differentiate ‘works/expressions’ from ‘manifestations/items’
• does dc:identifier identify the work/expression or a particular manifestation/item of the work?
– in ePrints UK guidelines, dc:identifier used to identify ‘work/expression’ and dc:relation used to identify ‘manifestation/item’
– but dc:relation may be used for other resources (e.g. cited works), therefore ambiguity in the metadata record
– and guidelines not widely implemented anyway…
– therefore difficult for software applications to move reliably from the metadata record to the full text
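The ambiguity can be shown with a minimal Python sketch of a simple DC record that follows the ePrints UK convention described above. The record contents, the URLs and the helper name `candidate_full_text_links` are hypothetical and are here for illustration only.

```python
# A hypothetical simple DC record, flattened to (element, value) pairs.
# All URLs are made-up examples.
record = [
    ("dc:title", "An example eprint"),
    # identifies the work/expression
    ("dc:identifier", "http://repository.example.org/eprint/123"),
    # points at a manifestation/item (the full text)
    ("dc:relation", "http://repository.example.org/eprint/123/paper.pdf"),
    # points at a cited work, using the same element
    ("dc:relation", "http://publisher.example.com/articles/456"),
]


def candidate_full_text_links(dc_record):
    """Return every dc:relation value; simple DC gives no way to narrow this down."""
    return [value for element, value in dc_record if element == "dc:relation"]


links = candidate_full_text_links(record)
print(links)  # both URLs come back; nothing marks which one is the full text
```

A harvester has to guess (for example from the file extension), which is the unreliability the last bullet describes.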
Current issues (2)
• not possible to determine whether subject terms are taken from a controlled vocabulary or not (e.g. is ‘Physics’ a free-text keyword or a term taken from Dewey?).
– therefore difficult to base subject-browse interfaces on controlled vocabulary hierarchy
• not possible to disambiguate authors with the same name or reconcile instances of the same author being given different forms of name
– therefore difficult to build browse-by-author type interfaces
• dates are ambiguous (either because of formatting and/or because type of date is not known)
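The formatting half of the date problem can be sketched in a few lines of Python. The date string is a hypothetical value of the kind that might appear in a dc:date element; it is not taken from any real record.

```python
from datetime import datetime

raw_date = "03/04/2006"  # hypothetical value found in a dc:date element

# The same string is valid under two common conventions.
day_first = datetime.strptime(raw_date, "%d/%m/%Y").date()
month_first = datetime.strptime(raw_date, "%m/%d/%Y").date()

print(day_first.isoformat())    # 2006-04-03
print(month_first.isoformat())  # 2006-03-04
```

Even with an unambiguous format, simple DC still does not say whether the value is a date issued or a date modified, which is the second half of the problem.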
Functional requirements
• support search based on title, author, description, keyword, full text index
• support browse by keyword and author
• support rich subject browse based on knowledge of controlled vocabulary
• support filtering of search results and browse tree by type, publisher, date range, status and version(?)
• display title, author, publisher, keyword, full-text match in search results and browse tree
• move reliably from search results and browse tree to available copies, filtered by format
• move from search results and browse tree to OpenURL ‘link server’
• support citation analysis (between works/expressions)
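As an illustration of the 'link server' requirement, the following Python sketch builds an OpenURL in the Z39.88-2004 key/encoded-value form. The resolver address and the function name `build_openurl` are hypothetical, and the choice of the journal metadata format and of these particular keys is an assumption for the example; the slides do not say which keys the profile would use.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; a real deployment would use its own link server.
RESOLVER_BASE = "http://resolver.example.org/openurl"


def build_openurl(title, author_last_name, year):
    """Build a key/encoded-value OpenURL for an article-like eprint."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", title),
        ("rft.aulast", author_last_name),
        ("rft.date", year),
    ]
    # urlencode percent-encodes the values and joins the pairs with "&".
    return RESOLVER_BASE + "?" + urlencode(params)


openurl = build_openurl("An example eprint", "Smith", "2006")
print(openurl)
```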
Functional requirements (2)
• enable capture of metadata about and relationships between different ‘versions’ of the same eprint
• be suitable for use in the context of OpenURLs and OpenURL resolvers
– i.e. support navigation/discovery of a particular version of an eprint (e.g. most recent version of Author’s Original) and navigation/discovery of the most appropriate copy of the discovered ‘version’
• be compatible with dc-citation WG recommendations
• be compatible with preservation metadata approaches
• be compatible with library cataloguing approaches
Functional assumptions
• citations are made between eprint ‘expressions’ (in FRBR terms)
• hypertext links tend to be made between eprint ‘items’ (in FRBR terms)
• adopting a simple underlying model now may be expedient in the short term but costly to interoperability in the long term
• the underlying model needs to be as complex as it needs to be, but not more so!
• a complex underlying model may be manifest in relatively simple metadata and/or end-user interfaces
FRBR (1)
• FRBR models the bibliographic world using 4 key entities - 'Work', 'Expression', 'Manifestation' and 'Item'.
– A work is a distinct intellectual or artistic creation. A work is an abstract entity
– An expression is the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms. An expression is the specific intellectual or artistic form that a work takes each time it is "realized."
– A manifestation is the physical embodiment of an expression of a work. The entity defined as manifestation encompasses a wide range of materials, including manuscripts, books, periodicals, maps, posters, sound recordings, films, video recordings, CD-ROMs, multimedia kits, etc.
– An item is a single exemplar of a manifestation. The entity defined as item is a concrete entity.
FRBR (2)
• FRBR also defines a set of additional entities that are related to the four entities above - 'Person', 'Corporate body', 'Concept', 'Object', 'Event' and 'Place' - and a set of relationships between each of the entities.
• the key entity-relations appear to be:
– Work -- is realized through --> Expression
– Expression -- is embodied in --> Manifestation
– Manifestation -- is exemplified by --> Item
– Work -- is created by --> Person or Corporate Body
– Manifestation -- is produced by --> Person or Corporate Body
– Expression -- has a translation --> Expression
– Expression -- has a revision --> Expression
– Manifestation -- has an alternative --> Manifestation
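The relations listed above can be written down as plain (subject, relation, object) triples. The Python sketch below is one possible encoding, not part of FRBR itself; the names `FRBR_RELATIONS`, `relations_from` and `path_to_item` are hypothetical.

```python
# The key FRBR relations from the list above, as (subject, relation, object) triples.
FRBR_RELATIONS = [
    ("Work", "is realized through", "Expression"),
    ("Expression", "is embodied in", "Manifestation"),
    ("Manifestation", "is exemplified by", "Item"),
    ("Work", "is created by", "Person or Corporate Body"),
    ("Manifestation", "is produced by", "Person or Corporate Body"),
    ("Expression", "has a translation", "Expression"),
    ("Expression", "has a revision", "Expression"),
    ("Manifestation", "has an alternative", "Manifestation"),
]


def relations_from(entity):
    """List the (relation, target) pairs that start at the given entity type."""
    return [(rel, obj) for subj, rel, obj in FRBR_RELATIONS if subj == entity]


def path_to_item(start="Work"):
    """Follow the three downward relations from the start entity to Item."""
    downward = {"is realized through", "is embodied in", "is exemplified by"}
    path, current = [start], start
    while current != "Item":
        current = next(obj for subj, rel, obj in FRBR_RELATIONS
                       if subj == current and rel in downward)
        path.append(current)
    return path


print(path_to_item())  # ['Work', 'Expression', 'Manifestation', 'Item']
```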
FRBR (3)
• Simple metadata standards like Dublin Core have traditionally tended to model the resources being described in a rather flat way - for example, as a set of relatively unrelated 'document-like objects'
• while this approach may be sufficient in the context of describing Web pages, it is rather limited in those cases, like scholarly publications, where the things being described are more complex. For example, a typical eprint (the publisher's PDF file that is deposited in an eprint archive) is a single item that is an exemplar of a particular manifestation (the PDF manifestation) of a particular expression (the published version) of a work (the conceptual work that is the eprint). There may be other items that are exemplars of the same manifestation (the PDF file as served from the publisher's Web site for example), other manifestations of the same expression (the HTML manifestation), and other expressions of the same work (the pre-print for example), and so on.
Model
• based on FRBR
• but some of the labels have been changed
• intention is to make things more intuitive
• but may not have succeeded!
Eprints model
– Eprint -- isExpressedAs (0..∞) --> Version
– Version -- isManifestedAs (0..∞) --> Format
– Format -- isAvailableAs (0..∞) --> Copy
– Eprint -- isAuthoredBy (0..∞) --> Agent
– Format -- isPublishedBy (0..∞) --> Agent
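One way to read the Eprints model is as a set of nested record types, with a list for each 0..∞ relationship. The Python sketch below is an illustration under that assumption; the field names (`media_type`, `label`, `uri` and so on) and the instance data are hypothetical and are not defined by the profile.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str


@dataclass
class Copy:
    uri: str  # where this copy can be retrieved


@dataclass
class Format:
    media_type: str
    published_by: List[Agent] = field(default_factory=list)   # isPublishedBy, 0..∞
    available_as: List[Copy] = field(default_factory=list)    # isAvailableAs, 0..∞


@dataclass
class Version:
    label: str
    manifested_as: List[Format] = field(default_factory=list)  # isManifestedAs, 0..∞


@dataclass
class Eprint:
    title: str
    authored_by: List[Agent] = field(default_factory=list)     # isAuthoredBy, 0..∞
    expressed_as: List[Version] = field(default_factory=list)  # isExpressedAs, 0..∞


# Hypothetical instance data.
author = Agent("A. N. Author")
pdf = Format("application/pdf",
             available_as=[Copy("http://repository.example.org/123.pdf")])
eprint = Eprint("An example eprint", authored_by=[author],
                expressed_as=[Version("Version of Record", manifested_as=[pdf])])

# Walk Eprint -> Version -> Format -> Copy.
print(eprint.expressed_as[0].manifested_as[0].available_as[0].uri)
```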
Eprints model and FRBR
• the same model, with each entity labelled with its FRBR equivalent:
– Eprint: FRBR Work
– Version: FRBR Expression
– Format: FRBR Manifestation
– Copy: FRBR Item
Eprints model and FRBR
• the same model, with an example of each entity:
– Eprint: the eprint (an abstract concept)
– Version: the ‘version of record’ or the ‘french version’ or ‘version 2.1’
– Format: the PDF format of the version of record
– Copy: the publisher’s copy of the PDF …
– Agent: the author or the publisher
FRBR for eprints
• the levels of the model, with the examples shown at each level:
– eprint (work): the eprint, an abstract work
– version (expression): Author’s Original 1.0; Author’s Original 1.1; Version of Record (French); … Version of Record (English)
– format (manifestation): html; pdf
– copy (item): publisher’s copy; institutional repository copy
Here we are using FRBR to model eprints. A work is “a distinct intellectual or artistic creation”. An expression is “the intellectual or artistic realization of a work in the form of alpha-numeric … notation …”. A manifestation is “the physical [or digital] embodiment of an expression of a work”. Finally, an item is “a single exemplar of a manifestation”.
Note that “Author’s Original” and “Version of Record” (used above) are taken from the ALPSP/NISO ‘status’ vocabulary at http://www.niso.org/committees/Journal_versioning/TermsandDefinitionsdraft2006.pdf
Note 1: different languages modelled as versions as per FRBR sect 5.3.2
Note 2: orange parts used as basis for examples later…
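The example hierarchy can be walked to meet the requirement of moving reliably to available copies, filtered by format. The Python sketch below uses nested dictionaries; the URLs, the key names and the function `copies` are hypothetical, and attaching the html and pdf formats to the English Version of Record is an assumption, since the transcript does not preserve which version they belong to.

```python
# Hypothetical data loosely following the example above; URLs are made up.
eprint = {
    "title": "The eprint",
    "versions": [
        {"label": "Author's Original 1.0", "formats": []},
        {"label": "Version of Record (English)", "formats": [
            {"format": "text/html",
             "copies": ["http://publisher.example.com/paper.html"]},
            {"format": "application/pdf",
             "copies": [
                 "http://publisher.example.com/paper.pdf",   # publisher's copy
                 "http://repository.example.org/paper.pdf",  # institutional repository copy
             ]},
        ]},
    ],
}


def copies(eprint, version_label, wanted_format):
    """Walk eprint -> version -> format -> copy and return the matching copy URIs."""
    return [
        uri
        for version in eprint["versions"] if version["label"] == version_label
        for fmt in version["formats"] if fmt["format"] == wanted_format
        for uri in fmt["copies"]
    ]


pdf_copies = copies(eprint, "Version of Record (English)", "application/pdf")
print(pdf_copies)
```

Because every level is explicit, the lookup never has to guess which link is the full text.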
Vertical vs. horizontal relationships
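• vertical relationships (between levels of the model):
– Eprint -- isExpressedAs --> Version (shown for two Versions)
– Version -- isManifestedAs --> Format (shown for two Formats)
• horizontal relationships (within a level of the model):
– Version -- hasVersion --> Version
– Format -- hasFormat --> Format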
Vertical vs. horizontal relationships (2)
• the diagram shows the vertical relationships only:
– Eprint -- isExpressedAs --> Version (shown for two Versions)
– Version -- isManifestedAs --> Format (shown for two Formats)
• hasVersion and hasFormat relationships inferred by following vertical relations
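The inference can be sketched as follows: two entities that share a parent in a vertical relation are treated as horizontally related. This Python example assumes that reading (hasVersion between versions of the same eprint, hasFormat between formats of the same version); the identifiers and the function name `infer_siblings` are hypothetical.

```python
from itertools import permutations

# Vertical relations only, as parent -> children. Identifiers are hypothetical.
is_expressed_as = {"eprint1": ["versionA", "versionB"]}
is_manifested_as = {
    "versionA": ["formatA-pdf", "formatA-html"],
    "versionB": ["formatB-pdf"],
}


def infer_siblings(vertical):
    """Children that share a parent are related horizontally, in both directions."""
    pairs = set()
    for children in vertical.values():
        pairs.update(permutations(children, 2))
    return pairs


has_version = infer_siblings(is_expressed_as)
has_format = infer_siblings(is_manifested_as)

print(sorted(has_version))  # [('versionA', 'versionB'), ('versionB', 'versionA')]
print(sorted(has_format))   # [('formatA-html', 'formatA-pdf'), ('formatA-pdf', 'formatA-html')]
```

Only the vertical relations need to be stored in the metadata; the horizontal ones can be derived when needed.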
Attributes
• Eprint: title; subject; abstract; identifier (URI); creator; is expressed as
• Version: date issued; status; version number; language; type; copyright; identifier (URI); OpenURL or citation (string); is manifested as
• Format: format; date modified; identifier (URI); publisher; is available as (URI)
• Copy: identifier (URI)
• Agent: name; type; date of birth; affiliation; mailbox; homepage; identifier (URI)
Attributes
• Eprint: title; subject; abstract; identifier (URI); creator; is expressed as
• Version: date issued; status; version number; language; type; rights; OpenURL or citation (string); is manifested as
• Format: format; date modified; publisher; is available as (URI)
• Agent: name; type; date of birth; affiliation; mailbox; homepage
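The attribute lists on this slide can be used as a simple check that a description only uses properties allowed for its entity type. The Python sketch below is an illustration only: the attribute names are transcribed from the slide, while the dictionary layout, the function name `unknown_attributes` and the sample description are hypothetical.

```python
# Attribute names per entity, transcribed from the slide.
ATTRIBUTES = {
    "Eprint": {"title", "subject", "abstract", "identifier", "creator",
               "is expressed as"},
    "Version": {"date issued", "status", "version number", "language", "type",
                "rights", "OpenURL or citation", "is manifested as"},
    "Format": {"format", "date modified", "publisher", "is available as"},
    "Agent": {"name", "type", "date of birth", "affiliation", "mailbox",
              "homepage"},
}


def unknown_attributes(entity_type, description):
    """Return the keys of a description that are not listed for that entity type."""
    return sorted(set(description) - ATTRIBUTES[entity_type])


# Hypothetical description of one version; 'publisher' belongs on Format instead.
version = {
    "status": "Version of Record",
    "language": "en",
    "date issued": "2006-06-01",
    "publisher": "Example Press",
}

print(unknown_attributes("Version", version))  # ['publisher']
```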