Prototyping Digital Libraries Handling Heterogeneous Data Sources – An ETANA-DL Case Study

Unni Ravindranathan, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox, James W. Flanagan. [email protected] http://fox.cs.vt.edu. Virginia Tech, Blacksburg, VA, USA (and CWRU). ECDL 2004, Bath, England, September 2004.


Page 1: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Prototyping Digital Libraries Handling Heterogeneous Data Sources – An ETANA-DL Case Study

Unni Ravindranathan, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox, James W. Flanagan

[email protected] http://fox.cs.vt.edu

Virginia Tech, Blacksburg, VA, USA (and CWRU)

ECDL 2004, Bath, England, September 2004

Page 2: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Acknowledgements (Selected)

Sponsors: NSF grant ITR-0325579; AOL, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech

Faculty/Staff: Lillian Cassel, Debra Dudley, Roger Ehrich, Manuel Perez, Naren Ramakrishnan

VT (Former) Students: Aaron Krowne, Ming Luo, Fernando Das Neves, Ricardo Torres, Hussein Suleman

Page 3: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Acknowledgements (contd.)

• Karen Borstad, MPP

• Douglas Clark, Walla Walla College

• Joanne Eustis, CWRU

• Nick Fischio, CWRU

• Paul Gherman, Vanderbilt U.

• Andrew Graham, U. Toronto

• Tim Harrison, U. Toronto

• Larry Herr, Canadian University College

• Christopher Holland, LRP

• Paul Jacobs, Mississippi State U.

• Douglas Knight, Vanderbilt U.

• Stan LaBianca, Andrews U.

• David McCreery, Willamette U.

• Eric Meyers, Duke U.

• Adam Porter, Illinois College

• Jack Sasson, Vanderbilt U.

• Tom Schaub, Indiana U. of Penn.

• Randall Younker, Andrews U.

Page 4: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Outline

Problems
Background
Approach
ETANA-DL
ETANA-DL Prototype System
Modeling ETANA-DL
ETANA-DL Services
Analysis
Conclusions
Future Work

Page 5: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Problems

Interoperability among heterogeneous archaeological systems

Delay in publication of primary archaeological data

Lack of sustainable solutions to long-term preservation of valuable information

Lack of services useful to the archaeology community, including “traditional DL services”

Difficulty in understanding complex archaeological information systems

Difficulty in requirements elicitation for archaeological systems


Page 7: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Open Archives Initiative

Promotes interoperability among DLs

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

Data Provider
• possesses metadata and shares it (internally / externally)
• via well-defined OAI protocols (e.g., database servers)

Service Provider
• harvests data from Data Providers
• provides higher-level services to users
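The harvesting step described on this slide can be sketched in a few lines. This is a minimal illustration, not the ETANA-DL harvester: the `ListRecords` verb, the `metadataPrefix` parameter, and the `oai_dc` format are part of OAI-PMH, while the base URL, the sample response, and the function names are assumptions made up for the example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a service provider would request from a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        ident = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        pairs.append((ident, title))
    return pairs


# Hypothetical response from a hypothetical data provider (not real ETANA data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:bone-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Bone field record 1</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real service provider would also follow `resumptionToken` elements to page through large result sets; that is left out here to keep the sketch short.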

Page 8: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Traditional Digital Libraries

[Figure: Users reach Digital Objects (programs, documents, images, videos) through a Digital Library built as a monolithic and/or custom-built web-based application.]

Page 9: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Introduction to ODL (Open Digital Libraries)

Open Digital Libraries
• Framework for componentized Digital Libraries
• Design principles for components
• Protocols for inter-component communications
• Built upon OAI

Page 10: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Open Digital Libraries Approach

[Figure: Users reach the ETANA-DL site collections (Bone, Seed, Figurine, Pottery) through a user interface backed by Search, Browse, Recent, Filter, and Union components.]

Page 11: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Basic ODL Model: An application for Archaeology

[Figure: component diagram. Labels: OAI Data Provider (Nimrin); ETANA-DL Union Catalog; ETANA-DL Search Engine; ODL Service Provider Component; User Interface (WWW Interface). Connections are labeled OAI-PMH and ODL Protocol.]

Page 12: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Componentized services example

[Figure: a search request passing among User, User Interface, SearchHandlerServlet, IRDBSearchEngine, and IndexDB. Arrow labels: Query; Query in the IRDB query language; Results in XML; Parsed XML; Results.]
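The hand-off between components on this slide can be illustrated with a toy pipeline. This is a sketch under stated assumptions, not the ETANA-DL code: the class names only echo the slide labels, the "engine query language" is reduced to a list of AND-ed terms, and the XML result layout is invented for the example.

```python
import xml.etree.ElementTree as ET


class IndexDB:
    """Toy inverted index: term -> set of record ids (hypothetical stand-in)."""

    def __init__(self, docs):
        self.docs = docs
        self.postings = {}
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings.setdefault(term, set()).add(doc_id)


class SearchEngine:
    """Stand-in for the search-engine component; answers in XML."""

    def __init__(self, index):
        self.index = index

    def search(self, engine_query):
        # Assumption: the engine query is a list of terms that must all match.
        hits = None
        for term in engine_query:
            ids = self.index.postings.get(term, set())
            hits = ids if hits is None else hits & ids
        root = ET.Element("results")
        for doc_id in sorted(hits or ()):
            ET.SubElement(root, "hit", id=doc_id).text = self.index.docs[doc_id]
        return ET.tostring(root, encoding="unicode")


class SearchHandler:
    """Stand-in for the handler sitting between the user interface and the engine."""

    def __init__(self, engine):
        self.engine = engine

    def handle(self, user_query):
        engine_query = user_query.lower().split()        # user query -> engine query
        xml_results = self.engine.search(engine_query)   # results come back as XML
        parsed = ET.fromstring(xml_results)              # parsed XML for the interface
        return [(hit.get("id"), hit.text) for hit in parsed]


if __name__ == "__main__":
    docs = {"b1": "goat bone", "s1": "barley seed", "b2": "sheep bone"}
    handler = SearchHandler(SearchEngine(IndexDB(docs)))
    print(handler.handle("bone"))
```

Keeping the handler ignorant of how the index is stored is the point of the componentized design: the engine could be swapped without touching the user interface.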

Page 13: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

5S Model – Informally

Digital libraries are complex information systems that:
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)


Page 15: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Solution – our approach

Applying and extending Digital Library (DL) techniques to solve the following problems: interoperability, making primary data available, data preservation

Modeling archaeological information systems using 5S theory to better understand the domain and design the system and the supported services

Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks: eliciting requirements, providing useful services


Page 17: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

ETANA-DL

Archaeological Digital Library

Applies and extends the OAI-PMH
• Open Archives Initiative Protocol for Metadata Harvesting

Design considerations
• Componentized
• Distributed architecture
• Extensible
• Portable

Page 18: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

ETANA Digital Library Core Components - DigBase

DigBase (DB)
• Central repository - stores metadata
• Union catalog - for the collections in ETANA-DL
• Various kinds of digital objects – excavation records, images, text collections, etc.
• General services - Search, Browse, Annotate, Recommend, etc.
• Archaeology-specific services - artifact analysis, visualizations, artifact interpretation, workflows, etc.

Page 19: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

ETANA Digital Library Core Components - DigKit

DigKit (DK)
• A suite of tools for collecting and recording archaeological data in the field, which can be used for a new dig
• Metadata will migrate to DigBase (DB).
• Real-time collaborative archaeology: Metadata in DB will be rapidly available to others.


Page 21: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Architecture

[Figure: architecture diagram with two sides, Archaeological Site and ETANA-DL. Labels: DigKit; Data Mapping Component; OAI Data Provider (XOAI); OAI; DigBase; Union Catalog; Index; Inverted Files; Search Component; Browse DB; Browse Engine; Other ETANA-DL Services; DB used by Services; Web Interface; Configure.]

Page 22: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Modeling ETANA-DL – An Archaeological DL Meta-model

[Figure: 5S meta-model for an archaeological DL. Labels as extracted, grouped by model:
Stream model: Text, Video, Audio, Drawing, Photo, 3D
Structure model: Region, *Site, *Partition, *Sub-partition, *Locus, *Container, *Artifact; Metadata; Taxonomies (Spatial, Temporal, Artifact-specific)
Space model: Geographic space, Metric space, User interface
Society model: Archaeologist, General public, Service Manager
Scenario model: Services (Information Satisfaction, Value added, Repository building, Domain specific)]

Page 23: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Modeling ETANA-DL – The ETANA-DL model

[Figure: the meta-model instantiated for ETANA-DL. Labels as extracted, grouped by model:
Stream model: Field record, locus sheet; Figurine image (photo); Site/field plan (drawing); Preliminary/Final Report (application/pdf)
Structure model: Jordan, Umayri, *Field, *Square, *Locus, *Pail; Jordan Valley, Nimrin, *Quadrant, *Square, *Locus, *Bag; Southern Israel, Halif, *Field, *Area, *Locus, *Basket; artifacts *Bone, *Seed, *Figurine; Taxonomies (Spatial, Archaeological periods, Bone type, Seed species)
Space model: Site-specific coordinate system, Vector space, Web interface
Society model: Archaeologist, Generic public, ETANA-DL Service Manager
Scenario model: Services (Searching, Browsing; Annotation, binding; Harvesting, Converting; Object comparison, marking item for analysis)]

Page 24: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Modeling ETANA-DL – Mapping heterogeneous data to the structural model

Site   | Partition   | Sub-partition          | Locus       | Container
Lahav  | Field I     | Area A8                | Locus A8074 | Basket 224
Nimrin | Quadrant NW | Quadrant Value N25/W50 | Locus 96    | Bag 240
Umayri | Field A     | Square 7J59            | Locus 001   | Pail 12
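The mapping on this slide, from each site's own terminology to the shared Partition / Sub-partition / Locus / Container levels, can be sketched as a lookup table. This is an illustrative sketch only: the level and field names come from the slide, while the dict-based record layout and the function name are assumptions, not the project's Data Mapping Component.

```python
# Site-specific field names mapped to the common structural model.
# Field names are taken from the slide; the data layout is hypothetical.
SITE_MAPPINGS = {
    "Lahav": {"Field": "partition", "Area": "subpartition",
              "Locus": "locus", "Basket": "container"},
    "Nimrin": {"Quadrant": "partition", "Quadrant Value": "subpartition",
               "Locus": "locus", "Bag": "container"},
    "Umayri": {"Field": "partition", "Square": "subpartition",
               "Locus": "locus", "Pail": "container"},
}


def to_common_record(site, local_record):
    """Rename a site's local fields to the common structural levels."""
    mapping = SITE_MAPPINGS[site]
    common = {"site": site}
    for local_name, value in local_record.items():
        if local_name not in mapping:
            raise KeyError(f"no mapping for field {local_name!r} at site {site!r}")
        common[mapping[local_name]] = value
    return common


if __name__ == "__main__":
    print(to_common_record("Lahav", {"Field": "I", "Area": "A8",
                                     "Locus": "A8074", "Basket": "224"}))
    print(to_common_record("Umayri", {"Field": "A", "Square": "7J59",
                                      "Locus": "001", "Pail": "12"}))
```

Once every record carries the same four levels, cross-site services such as browsing by site structure can be written once instead of per site.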

Page 25: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Data Mapping

Page 26: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

ETANA-DL Schema Design

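[Figure: schema diagram. ETANA-DL Object carries the shared attributes ID, Owner, Collection, Partition, Subpartition, Locus, Container, ……; it is specialized by Bone (Count, Animal, ……), Seed (Species, Name, ……), and Figurine (Description, Dimensions, ……).]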
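One way to read the schema design on this slide is as a base object type with artifact-specific extensions. The sketch below is an assumption-laden illustration: the attribute names come from the slide, but which attributes belong to which artifact type is inferred from their order in the transcript, the Python types are guesses, and the elided attributes ("……") are left out rather than invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EtanaObject:
    """Attributes shared by every harvested object (names from the slide)."""
    id: str
    owner: str
    collection: str
    partition: str
    subpartition: str
    locus: str
    container: str


@dataclass
class Bone(EtanaObject):
    animal: Optional[str] = None
    count: Optional[int] = None


@dataclass
class Seed(EtanaObject):
    species: Optional[str] = None
    name: Optional[str] = None


@dataclass
class Figurine(EtanaObject):
    description: Optional[str] = None
    dimensions: Optional[str] = None


if __name__ == "__main__":
    # Hypothetical record; the values are made up for illustration.
    bone = Bone(id="umayri-bone-1", owner="Umayri", collection="bones",
                partition="A", subpartition="7J59", locus="001",
                container="12", animal="sheep", count=3)
    print(bone)
```

Services that only need the shared location attributes can accept any `EtanaObject`, while artifact-specific services can depend on the subclass fields.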


Page 28: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

ETANA-DL Services: Categories

Information satisfaction
• Searching
• Browsing
• Recommendation

Archaeology (Domain) specific
• Object comparison
• Marking items

Value-added
• Annotation
• Items of interest (Binding service)
• Recent searches/discussions
• User management

Page 29: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Searching: Search Interface

Page 30: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Searching: Search Results

Page 31: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Searching: Advanced Search

Page 32: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Searching: Advanced Search Results

Page 33: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Multi Dimensional Browsing

Site structure

Temporal

Object-specific

User context

Page 34: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Searching within a Context

Page 35: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Searching within a Context: Search Results

Page 36: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Restoring Browsing Contexts

Page 37: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Object Comparison: Selecting Objects for Comparison

Page 38: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Object Comparison: Editing Attributes

Page 39: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Object Comparison: Editing Attributes

Page 40: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Object Comparison: Comparing Objects

Page 41: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Object Comparison: Comparison Results

Page 42: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Marking items

Page 43: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Viewing marked items

Page 44: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Remarking items

Page 45: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Discussion Board (Annotation): View Messages

Page 46: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Discussion Board (Annotation): Post Messages/Replies

Page 47: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Collections Description

Page 48: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Other services

Items of Interest (Binding service)
Recent searches/discussions
Recommendation
User management
• Account creation
• Login

Page 49: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Items of Interest: Binding Service

Page 50: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Recent Searches/Discussions

Page 51: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Recommendation

Page 52: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

User Management: New User Account

Page 53: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

User Management: Login

Page 54: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

User Management: Navigations


Page 56: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Heterogeneous data handling

Site   | Artifact type     | Original data source    | Attributes in original record | Attributes in harvested record | Records harvested
Lahav  | Figurine          | Tab-delimited text file | 15 | 18 | 564
Nimrin | Bone field record | Table in Oracle DB      | 21 | 24 | 7420
Nimrin | Seed field record | Table in Oracle DB      | 12 | 15 | 430
Umayri | Bone field record | 2 tables in Access DB   | 8  | 24 | 2123
Total  |                   |                         |    |    | 10537

Page 57: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Heterogeneous data handling

Site   | Data Analysis (hours) | Data Mapping (hours) | Data Provider Implementation (hours) | Service Provider Implementation (hours)
Lahav  | 48  | 144 | 4  | 1
Nimrin | 48  | 48  | 4  | 1
Umayri | 24  | 48  | 4  | 1
Total  | 120 | 240 | 12 | 3

Page 58: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Heterogeneous data handling

[Pie chart: share of total effort by task]
Data Analysis: 32%
Data Mapping: 64%
Data Provider Implementation: 3%
Service Provider Implementation: 1%
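The percentages in this chart follow from the per-site hours reported on the previous slide (375 hours in total). A short check, using only the numbers from that table; the variable and function names are made up for the example:

```python
# Hours per task and site, copied from the effort table on the previous slide.
HOURS = {
    "Data Analysis": {"Lahav": 48, "Nimrin": 48, "Umayri": 24},
    "Data Mapping": {"Lahav": 144, "Nimrin": 48, "Umayri": 48},
    "Data Provider Implementation": {"Lahav": 4, "Nimrin": 4, "Umayri": 4},
    "Service Provider Implementation": {"Lahav": 1, "Nimrin": 1, "Umayri": 1},
}


def effort_shares(hours):
    """Return each task's share of total effort, rounded to whole percent."""
    totals = {task: sum(per_site.values()) for task, per_site in hours.items()}
    grand_total = sum(totals.values())
    return {task: round(100 * total / grand_total) for task, total in totals.items()}


if __name__ == "__main__":
    for task, share in effort_shares(HOURS).items():
        print(f"{task}: {share}%")
```

Analysis and mapping together account for 360 of the 375 hours, which is why (semi-)automatic data mapping appears under future work.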

Page 59: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Rapid prototyping: Lines of Code

Type of Service   | LOC for implementing service | LOC reused from components | Total LOC | Reuse Percentage
Componentized     | 350  | 3630 | 3980  | 91
Non-componentized | 7950 | -    | 7950  | -
Total             | 8300 | 3630 | 11930 | 30.4
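The reuse percentages in this table are consistent with dividing reused lines by total lines. A small check, assuming that definition (the slide does not state the formula; the function name is made up for the example):

```python
def reuse_percentage(new_loc, reused_loc):
    """Reused lines as a percentage of all lines (new plus reused)."""
    return 100 * reused_loc / (new_loc + reused_loc)


if __name__ == "__main__":
    # Componentized services: 350 new lines, 3630 reused.
    print(round(reuse_percentage(350, 3630)))
    # All services: 8300 new lines, 3630 reused.
    print(round(reuse_percentage(8300, 3630), 1))
```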

Page 60: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Rapid prototyping: Service development times

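[Figure: two pie charts, one for Componentized Services and one for Non-componentized Services, showing the share of development time spent on Requirements Analysis and Design, Implementation, and Testing. Extracted slice values: 28%, 14%, 58% and 35%, 27%, 38%; the transcript does not preserve which chart or slice each value belongs to.]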

Page 61: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

User Analysis

Initial comments from all 3 projects, plus others interested in ETANA-DL

Positive feedback – users liked:
• Data integration
• Prototype cross-collection information access services
• Information structuring
• Utility of supported services

Negative feedback – user concerns:
• Need for service enhancements
• Usability


Page 63: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Conclusions

• Applied 5S to the archaeological domain
• Identified requirements for future versions of the system
• Extensible and componentized approach for handling heterogeneous archaeological data from disparate sources
• Rapidly generated prototype archaeological DL
• Making primary archaeological data available without significant delay


Page 65: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Future Work

Componentizing current DL services
Creating next-generation DL services from expanding set of requirements
Integrating richer content
(Semi-)automatic data mapping
Automating the ingest of DL content
Enhancing interface capabilities
Formal usability studies

Page 66: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Visual Browsing

Visual Browse

By sites

Page 67: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Visual Browsing: Topographical Drawings

Full site North west quadrant

Square:N40/W20

Page 68: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Visual Browsing: Square information

Loci layout

Square:N40/W20

Locus: 86

Page 69: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Visual Browsing: locus sheet

Page 70: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Publications

1. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. ETANA-DL: A Digital Library for Integrated Handling of Heterogeneous Archaeological Data. To be presented at the ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.

2. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. ETANA-DL: Managing Complex Information Applications – An Archaeology Digital Library. Demo to be presented at the ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.

3. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. Prototyping Digital Libraries Handling Heterogeneous Data Sources – The ETANA-DL Case Study. European Conference on Digital Libraries (ECDL 2004), Bath, U.K., September 12-17, 2004 (submitted).

Page 71: Prototyping Digital Libraries Handling Heterogeneous Data Sources –  An ETANA-DL Case Study

Questions/Feedback ??