towards a digital library theory: a formal digital library ontology

37
Towards a Digital Library Theory: A Formal Digital Library Ontology Marcos André Gonçalves, Layne T. Watson, and Edward A. Fox Virginia Tech, Blacksburg, VA 24061 USA, [email protected] (For ACM SIGIR Mathematical/Formal Methods

Upload: chanel

Post on 19-Jan-2016

67 views

Category:

Documents


0 download

DESCRIPTION

Towards a Digital Library Theory: A Formal Digital Library Ontology. Marcos Andr é Gonçalves, Layne T. Watson, and E dward A. Fox Virginia Tech, Blacksburg, VA 24061 USA, [email protected] (For ACM SIGIR Mathematical/Formal Methods in Information Retrieval, MF/IR 2004, Sheffield, UK, Aug. 29, 2004). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Towards a Digital Library Theory: A Formal Digital Library Ontology

Towards a Digital Library Theory: A Formal Digital Library Ontology

Marcos André Gonçalves, Layne T. Watson, and Edward A. Fox

Virginia Tech, Blacksburg, VA 24061 USA, [email protected]

(For ACM SIGIR Mathematical/Formal Methods in Information

Retrieval, MF/IR 2004, Sheffield, UK, Aug. 29, 2004)

Page 2: Towards a Digital Library Theory: A Formal Digital Library Ontology

Outline Background: The 5S Model Motivation for this Work Digital Library Formal Ontology Taxonomy of DL Services Applications of the Theory Conclusions and Future Work

Page 3: Towards a Digital Library Theory: A Formal Digital Library Ontology

Background: The 5S Model Why 5S?

DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc.

DL construction: difficult, ad-hoc, lacking support for tailoring/customization

Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development. Lack of specific DL models, formalisms, languages

Page 4: Towards a Digital Library Theory: A Formal Digital Library Ontology

Background: The 5S Model Informally, DLs can be defined as complex

information systems that: help satisfy info needs of users (societies) provide info services (scenarios) organize info in usable ways (structures) (re)present info in usable ways (spaces) communicate info with users (streams)

Page 5: Towards a Digital Library Theory: A Formal Digital Library Ontology

Background: The 5S ModelStreams

Scenarios

Societies

Structures

Spaces

Static /Passive

Dynamic /Active

Page 6: Towards a Digital Library Theory: A Formal Digital Library Ontology

Background: 5S and DL formal definitions and compositions (April 2004 TOIS)

5S

structures (d.10)streams (d.9) spaces (d.18) scenarios (d.21) societies (d. 24)

structural metadataspecification(d.25)

descriptive metadataspecification(d.26)

repository(d. 33)

collection (d. 31)

(d.34)indexingservice

structured stream (d.29)

digitalobject (d.30)

metadata catalog (d.32)

browsingservice

(d.37)

searchingservice (d.35)

digital library(minimal) (d. 38)

services (d.22)

sequence (d. 3)

graph (d. 6)function (d. 2)

measurable(d.12), measure(d.13), probability (d.14), vector (d.15), topological (d.16) spaces

event (d.10)state (d. 18)

hypertext(d.36)

sequence (d. 3)

transmission(d.23)

relation (d. 1) language (d.5)

grammar (d. 7)

tuple (d. 4)*

Page 7: Towards a Digital Library Theory: A Formal Digital Library Ontology

Background: The 5S Model Summary of TOIS 2004 Formal Definitions:

A digital library is a 10-tuple (Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc) in which Streams is a set of streams, which are sequences of

arbitrary types (e.g., bits, characters, pixels, frames); Structs is a set of structures, which are tuples, (G, ),

where G= (V, E) is a directed graph and : (V E) L is a labeling function;

Sps is a set of spaces each of which can be a measurable, measure, probability, topological, metric, or vector space.

Page 8: Towards a Digital Library Theory: A Formal Digital Library Ontology

Background: The 5S Model Scs = {sc1, sc2, …, scd} is a set of scenarios where each sck =

<e1k({p1k}), e2k({p2k}), …, ed_kk({pd_kk})> is a sequence of events that also can have a number of parameters {pik}. Events represent changes in computational states; parameters represent specific locations in a state and respective values.

St2 is a set of functions : V Streams ( ) that associate nodes of a structure with a pair of natural numbers (a, b) corresponding to a portion (span/segment) of a stream.

Coll = {C1, C2, …, Cf} is a set of DL collections where each DL

collection Ck = {do1k, do2k, …, dof_kk} is a set of digital objects.

Each digital object dok = (hk, Stm1k, Stt2k, k) is a tuple where

Stm1k Streams, Stt2k Structs, k St2, and hk is a handle

which represents a unique identifier for the object.

Page 9: Towards a Digital Library Theory: A Formal Digital Library Ontology

Background: The 5S Model Cat = {DMC_1, DMC_2, …, DMC_f} is a set of metadata catalogs for Coll where

each metadata catalog DMC_k = {(h, msshk)}, and msshk = {mshk1, mshk2, …, mshkn_hk} is a set of descriptive metadata specifications. Each descriptive metadata specification mshki is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes.

A repository Rep = {(Ci, DMC_i)} (i=1 to f) is a set of pairs (collection, metadata catalog); it is assumed there exist operations to manipulate the family of pairs (e.g., get, store, delete).

Serv = {Se1, Se2, …, Ses} is a set of services where each service Sek = {sc1k, .., scs_kk} is described by a set of related scenarios.

Soc = (C, R) where C is a set of communities and R is a set of relationships among communities. SM = {sm1, sm2, …, smj}, and Ac = {ac1, ac2, …, acr } are two such communities where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services. Being basically an electronic entity, a member smk of SM distinguishes

itself from actors by defining or implementing a set of operations {op1k, op2k, …, opnk} smk

Page 10: Towards a Digital Library Theory: A Formal Digital Library Ontology

BackgroundStreams

text

audio

image

video do mss

R

C DMcIc

Se

Sc

e

SM

Ac

op

Scenarios

Societies

Top

Pr

Metric

Measurable

Measure

Structures

Spaces

Vec

ms

Page 11: Towards a Digital Library Theory: A Formal Digital Library Ontology

Motivation Previous definitions emphasize syntactic aspects, i.e., how

digital library concepts are composed or built from previously defined concepts.

Complete a formal DL theory by: Making explicit the implicit relationships that exist among the DL

formal concepts defined in [Gonc04] Providing set of axiomatic rules that precisely define and constrain

the semantics of the relationships Categorizing and classifying DL services on the basis of the

ontology Research questions

How should DL services be built from the other DL components

Which are the fundamental and elementary DL services ? How can services be built/composed from other DL services?

We will explore semantic relations and rules of the DL domain by using ontologies.

Page 12: Towards a Digital Library Theory: A Formal Digital Library Ontology

Digital Library Formal Ontology An ontology is a tuple = (Ontol_Concepts,

Ontol_Rels) where: Ontol_Concepts is a family of ontological concepts, Ontol_Rels is a family of relations. Relations in Ontol_Rels are operationally realized by

one or more rules (e.g., first-order logic axioms) which intentionally specify or constrain which elements of a concept can participate in a relation.

Ontol_Rules is a family of rules of a particular ontology.

Page 13: Towards a Digital Library Theory: A Formal Digital Library Ontology

Relationships Intra-Model

Video contains Audio (MM) Metadata Catalog describes Collection (LIS) Probabilistic Space is_a Measure Space Service extends Service (reuse) Service Manager inherits_from Service Manager (OO)

Inter-Model Event executes Operation Actor participates_in Scenario Service Manager runs Service Service employs/produces Streams Structures

Spaces

Digital Library Formal Ontology

Page 14: Towards a Digital Library Theory: A Formal Digital Library Ontology

Digital Library Formal Ontology Concepts: {Se, Sc, e}; Key: Se = service; Sc = scenario; e =

event. Relations:

contains Sc e Symbolic Rule. x, y (x contains y Sc(x) e(y) j: (j x.Dom y = x(j)) )

precedes e e Sc; happens_before e e Sc Symbolic Rule 1. x, y, z (x precedesz y e(x) e(y) Sc(z) i, j: (z contains x

z contains y x = z(i) y=z(j) i + 1 = j)) Symbolic Rule 2. x, y, z (x happens_beforez y e(x) e(y) Sc(z) i, j: (z

contains x z contains y x = z(i) y=z(j) i < j)) includes Se Se Sc Sc; extends Se Se Sc Sc

Symbolic Rule 1. x, y (x includes y Sc(x) Sc(y) (z: e(z) y contains z x contains z) (p, q: e(p) e(q) p precedesy q p precedesx q))

Symbolic Rule 2. x, y (x extends y Sc(x) Sc(y) (z: e(z) y contains z x contains z) (p, q: e(p) e(q) p happens_beforey q p happens_beforex q))

Symbolic Rule 3. x, y (x extends y Se(x) Se(y) y x (x y p, q: Sc(p) Sc(q) p x q y p extends q))

Page 15: Towards a Digital Library Theory: A Formal Digital Library Ontology

Digital Library Formal OntologyStreams

text

audio

image

video do mss

R

C DMc

describes

stores

is_version_of

Ic

Se

Sc

e

extendsreuses

SM

Ac

opexecutes

participates_in

recipient

runs

Scenarios

Societies

inherits_from/includes

association

uses

Top

Pr Metric

Measurable

Measure

describes

employsproduces

employsproduces

employsproduces

Structures

Spaces

Vec

belongs_to

contains

ms

is_ais_a

precedeshappens_before

is_a

redefinesinvokes

contains

contains

contains

Page 16: Towards a Digital Library Theory: A Formal Digital Library Ontology

Digital Library Formal Ontology Consistency Rules

Catalog-Collection A complete catalog has at least one set of metadata

specifications for each digital object in the collection it describes (surjective partial function).

In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function).

Scenarios-Society A scenario x is consistent with regards to a set of

service managers Y if each operation executed by each event in the scenario is defined in some service manager y Y.

Page 17: Towards a Digital Library Theory: A Formal Digital Library Ontology

Digital Library Formal Ontology Characterizing employs/produces relationships In the table each service is characterized by

parameters (input, output) of the initial and final events of the scenarios that compose those services

All other previous definitions and keys apply here.

That set is complemented with the following definitions:

Page 18: Towards a Digital Library Theory: A Formal Digital Library Ontology

Services Related Definitions A query q is the representation of user interest or

information need. Hyptxt is an hypertext; wherein an anchor is a node. A log_entry is a descriptive metadata specification about

an event of a scenario. Let {doi} = {doi1, doi2,…, doin } be a set of digital

objects and Ct = {c1, c2,…,cn} be a set of labels for categories. A classifier classCt: {doi} 2Ct is a function that maps a digital object to a set of categories.

A cluster cluk = {do1k, do2k, …, donk} is a subset of a set of digital objects.

Page 19: Towards a Digital Library Theory: A Formal Digital Library Ontology

Service User input Other Service Input

Output

Acquiring {doi} Ci Cj

Browsing anchor Hyptxtk {doi}

Cataloging doi, msi_k (hi, mssi_m) (hi, mssi_(m+k))

Classifying doi classCt (doi, {ck_i})

Clustering {doi} X {cluk_i}

Expanding (query) {doi} IC_i, qi qj

Indexing Ci none IC_i

Linking doi Hyptxtk Hyptxtik

Logging none ei({pi}) log_entryi

Rating doi ,acj none {(doi,acj,rk)}

Searching q, Ci IC_i {dok}

Visualizing {doi} tfrk spik

Page 20: Towards a Digital Library Theory: A Formal Digital Library Ontology

Infrastructure Services: dealing with basic concepts such as collections and catalogs Repository-Building: create collections (digital objects)

and/or catalogs (metadata specifications). Preservational: generate instances by copying collections

(digital objects) or transforming (converting/translating) objects into different formats for preservation purposes

Add_Value: either aggregate value/information to collections (digital objects) or connect objects together.

Information Satisfaction: dealing with higher level societal requirements

KEY in next slide: Fundamental: minimal set of services or essential to existence

of a DL Composite DL service: takes input from some other service;

otherwise the service is called elementary.

Applications: A Taxonomy of DL Services

Page 21: Towards a Digital Library Theory: A Formal Digital Library Ontology

Applications: A Taxonomy of DL Services

Infrastructure Services Repository-Building Creational Preservational

Add_ Value

Information Satisfaction Services

Acquiring Authoring Cataloging Crawling (focused) Describing Digitizing Harvesting Submitting

Conserving Converting Copying/Replicating Emulating Translating (format)

Annotating Classifying Clustering Evaluating Extracting Indexing Linking Logging Measuring Rating Reviewing (peer) Surveying Training (classifier) Translating (language/format)

Binding Browsing Customizing Disseminating Expanding (query) Filtering Recommending Requesting Searching Visualizing

Page 22: Towards a Digital Library Theory: A Formal Digital Library Ontology

Searching Browsing

Ic

AcquiringUser interests/needs

query anchor

UniversalCollection

Ci

DMCi

Indexing

Society

actor

DescribingCataloguing

Linking

Hypertext

Infra-structure Services(fundamental)

Information Satisfaction Services(fundamental)

criteria sortOrder

{doi}

Submitting

Authoring

dok

mskj

Application: A Taxonomy of DL Services

Page 23: Towards a Digital Library Theory: A Formal Digital Library Ontology

DL Services I/O Behavior Regarding the prior figure, which shows:

Instantiations of the “Services Definition” model Inputs and outputs of examples of infrastructure

and information satisfaction DL services Key:

CDL = Collection

ICDL = index for collection CDL

{doi} = digital object

Soc = Society

Page 24: Towards a Digital Library Theory: A Formal Digital Library Ontology

Applications: A Taxonomy of DL Services

SearchingBrowsing

queryanchor

Society

actor

criteria sortOrder

Ck, {doi}

Recommending Filtering Binding Visualizing Expanding query

user model/expr Classifier/expr {doj}

{doR} {doF}

bi

InformationSatisfaction Services

spV query’

fundamental

Rating/Reviewing (peer)

Training

Infrastructure

Services (Add_Value)

composite

Page 25: Towards a Digital Library Theory: A Formal Digital Library Ontology

Application: Defining Quality in Digital Libraries

Formal theory can help to define “what’s a good digital library” by: Formally defining metrics of quality for each formal

concept (and relationships) Helping defining and applying numerical measures to

these metrics

Consider this in the Information Life Cycle

Page 26: Towards a Digital Library Theory: A Formal Digital Library Ontology

AuthoringModifying

OrganizingIndexing

Storing

Archiving

NetworkingAccessing

Filtering

Creation

DistributionUtilization

Reputation

Similarity

Desirability

AccuracyCompletenessConformance

Discovery

SearchingBrowsingRecommending

Relevance

Timeliness

Accessibility

Usage

Inactive

Active

Discard

RetentionMining

Semi-Active

Preservability

Timeliness

Page 27: Towards a Digital Library Theory: A Formal Digital Library Ontology

Defining Quality in Digital LibrariesDL Concept Dimensions of Quality Digital object Accessibility

Pertinence Preservability Relevance Similarity Significance Timeliness

Metadata specification Accuracy Completeness Conformance

Collection Completeness Impact Factor

Catalog Completeness Consistency

Repository Completeness Consistency

Services Composability Efficiency Effectiveness Extensibility Reusability Reliability

Page 28: Towards a Digital Library Theory: A Formal Digital Library Ontology

Defining Quality in Digital Libraries

Metadata specifications and metadata format - completeness Completeness of metadata specifications refers to the degree to

which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not.

 Metric Completeness(msx) = 1 - (no. of missing attributes in

msx/ total attributes of the schema to which msx

conforms)

Page 29: Towards a Digital Library Theory: A Formal Digital Library Ontology

Defining Quality in Digital Libraries Metadata specifications and metadata format

- completeness OCLC NDLTD Union Catalog

00. 10. 20. 30. 40. 50. 60. 70. 80. 91

GWUD LSU

VTETD

MIT

UBC

PHYSNET

VTINDIV

VANDERBILT

NCSU

USASK

PITT HKU

HUMBOLT

OCLC

BGMYU

DRESDEN

VIENNA

GATECH

ETSU USF

MUENCHEN

UTENN

CCSD

WATERLOO

NSYSU

LAVAL

UPSALLA

CALTECH

UCL

WagUniv

Page 30: Towards a Digital Library Theory: A Formal Digital Library Ontology

Defining Quality in Digital Libraries Services - Extensibility and Reusability

A service Y reuses a service X if the behavior of Y incorporates the behavior of X.

A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.

Metrics Macro-Reusability(Serv) = ( reused(sei), sei Serv)/ |Serv|,

where reused is a 1, if smj, sej reuses si; 0, otherwise. Micro-Reusability(Serv) = ( LOC(smx) * reused(sei), smx

SM, sei Serv, sex runs sei )/ |LOC(sm), sm SM|, where LOC corresponds to the number of lines of code of a service manager

Page 31: Towards a Digital Library Theory: A Formal Digital Library Ontology

Defining Quality in Digital Libraries Services - Extensibility and Reusability

Service Component

Based

LOC for implementing

service

LOC reused from

component

Total LOC

Searching – Back-end Yes - 1650 1650

Search Wrapping No 100 - 100

Recommending Yes - 700 700

Recommend Wrapping No 200 - 200

Annotating – Back-end Yes 50 600 600

Annotate Wrapping No 50 - 50

Union Catalog Yes - 680 680

User Interface Service No 1800 - 1600

Browsing No 1390 - 1390

Comparing (objects) No 650 - 650

Marking Items No 550 - 550

Items of Interest No 480 - 480

Recent Searches/Discussions

No 230 - 230

Collections Description No 250 - 250

User Management No 600 - 600

Framework Code No 2000 - 2000

Total 8280 3630 11910

Macro-Reusability = 3/16 = 0.187Micro-Reusability = 3630 / 11910 = 0.304

Page 32: Towards a Digital Library Theory: A Formal Digital Library Ontology

Application: Re-engineering a DL Specification Language

5SL: Specification Language Reengineering

Using the relationships to redefine/reorganize the semantics and organization of the XML elements within the several sections of the DL specification

Page 33: Towards a Digital Library Theory: A Formal Digital Library Ontology

Re-engineering a DL Specification Language

Page 34: Towards a Digital Library Theory: A Formal Digital Library Ontology

Re-engineering a DL Specification Language

Page 35: Towards a Digital Library Theory: A Formal Digital Library Ontology

5SLGen: Automatic DL Generation

5S Meta

Model5SLGraph

DL Expert

DL Designer

5SL DL

Model

5SLGen

Practitioner

Researcher

TailoredDL

Services

Teacher

componentpool

ODLSearch,ODLBrowse,ODLRate,ODLReview,

…….

Requirements (1) Analysis (2)

Implementation (4)

Design (3)

Page 36: Towards a Digital Library Theory: A Formal Digital Library Ontology

Conclusions and Future Work Presented a DL formal ontology which

specifies the semantics of the relationships among the DL concepts therefore completing a theory for DLs

Applied the resulting ontology to: Define a taxonomy of DL services Create a Quality Model for DLs Re-engineer a DL specification language

Page 37: Towards a Digital Library Theory: A Formal Digital Library Ontology

Conclusions and Future Work Future Work Include:

Including Pre- and Post-Conditions in the Service Behavior Analysis

New Applications of the Model/theory New Design and Generation Tools Quality tools Modeling Complex Heterogeneous/Integrated Systems

Archaeology (ETANA)

Develop theorems and proofs Writing books…