Prototyping Digital Libraries Handling Heterogeneous Data Sources – An ETANA-DL Case Study. ECDL 2004, Bath, England, September 2004. Unni Ravindranathan, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox, James W. Flanagan [email protected] http://fox.cs.vt.edu
Prototyping Digital Libraries Handling Heterogeneous Data Sources – An ETANA-DL Case Study
Unni Ravindranathan, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox, James W. Flanagan
[email protected] http://fox.cs.vt.edu
Virginia Tech, Blacksburg, VA, USA (and CWRU)
ECDL 2004, Bath, England, September 2004
Acknowledgements (Selected)
Sponsors: NSF grant ITR-0325579; AOL, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech
Faculty/Staff: Lillian Cassel, Debra Dudley, Roger Ehrich, Manuel Perez, Naren Ramakrishnan
VT (Former) Students: Aaron Krowne, Ming Luo, Fernando Das Neves, Ricardo Torres, Hussein Suleman
Acknowledgements (contd.)
• Karen Borstad, MPP
• Douglas Clark, Walla Walla College
• Joanne Eustis, CWRU
• Nick Fischio, CWRU
• Paul Gherman, Vanderbilt U.
• Andrew Graham, U. Toronto
• Tim Harrison, U. Toronto
• Larry Herr, Canadian University College
• Christopher Holland, LRP
• Paul Jacobs, Mississippi State U.
• Douglas Knight, Vanderbilt U.
• Stan LaBianca, Andrews U.
• David McCreery, Willamette U.
• Eric Meyers, Duke U.
• Adam Porter, Illinois College
• Jack Sasson, Vanderbilt U.
• Tom Schaub, Indiana U. of Penn.
• Randall Younker, Andrews U.
Outline
• Problems
• Background
• Approach
• ETANA-DL
• ETANA-DL Prototype System
• Modeling ETANA-DL
• ETANA-DL Services
• Analysis
• Conclusions
• Future Work
Problems
Interoperability among heterogeneous archaeological systems
Delay in publication of primary archaeological data
Lack of sustainable solutions to long-term preservation of valuable information
Lack of services useful to the archaeology community, including “traditional DL services”
Difficulty in understanding complex archaeological information systems
Difficulty in requirements elicitation for archaeological systems
Open Archives Initiative
Promotes interoperability among DLs
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Data Provider
• possess metadata and share it (internally / externally)
• via well-defined OAI protocols (e.g., database servers)
Service Provider
• harvest data from Data Providers
• provide higher-level services to users
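As a minimal sketch of the Data Provider / Service Provider exchange, the Python below builds an OAI-PMH ListRecords request URL and pulls Dublin Core titles out of a response. The base URL, the hand-written sample response, and the helper names are illustrative assumptions and are not taken from ETANA-DL; only the verb, the metadataPrefix argument, and the XML namespaces come from OAI-PMH and Dublin Core.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a Service Provider would request to harvest records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# Hypothetical response from a Data Provider (hand-written sample, not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:bone-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Bone field record 1</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow resumption tokens and handle error responses; both are left out here to keep the exchange itself visible.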
Traditional Digital Libraries
[Figure: users interact with a monolithic and/or custom-built web-based application (the digital library), which manages digital objects such as programs, documents, images, and video.]
Introduction to ODL (Open Digital Libraries)
Open Digital Libraries
• Framework for componentized Digital Libraries
• Design principles for components
• Protocols for inter-component communications
• Built upon OAI
Open Digital Libraries Approach
[Figure: users reach ETANA-DL through a user interface built on Search, Browse, Recent, Filter, and Union components, which draw on site collections of bone, seed, figurine, and pottery records.]
Basic ODL Model: An application for Archaeology
[Figure: components shown are the Nimrin OAI Data Provider, the ETANA-DL Union Catalog, the ETANA-DL Search Engine (an ODL Service Provider component), and the WWW user interface; the links between them are labeled OAI-PMH or ODL Protocol.]
Componentized services example
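[Figure: the user submits a query through the user interface to SearchHandlerServlet, which sends a query in the IRDB query language to IRDBSearchEngine (backed by IndexDB); the engine returns results in XML, which are parsed and shown to the user.]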
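The request flow on this slide (handler receives a query, a search component answers in XML, the handler parses the XML) can be sketched as follows. This is an illustration only: `SearchHandler`, `StubSearchEngine`, the toy "all terms must match" query form, and the `<results>`/`<hit>` XML layout are hypothetical stand-ins, not the actual SearchHandlerServlet, IRDBSearchEngine, or IRDB query language.

```python
import xml.etree.ElementTree as ET


class StubSearchEngine:
    """Stand-in for a search component: takes a query string, returns XML."""

    def __init__(self, index):
        # index maps a record identifier to its text (a toy index store).
        self.index = index

    def search(self, engine_query):
        # Toy query form (an assumption): whitespace-separated terms, all must match.
        terms = engine_query.lower().split()
        hits = [rid for rid, text in self.index.items()
                if all(term in text.lower() for term in terms)]
        root = ET.Element("results")
        for rid in sorted(hits):
            ET.SubElement(root, "hit", id=rid)
        return ET.tostring(root, encoding="unicode")


class SearchHandler:
    """Stand-in for the handler that sits between the user interface and the engine."""

    def __init__(self, engine):
        self.engine = engine

    def handle(self, user_query):
        # Normalize the user's input into the engine's (toy) query form.
        engine_query = " ".join(user_query.split())
        results_xml = self.engine.search(engine_query)
        # Parse the XML reply into a plain list for the user interface.
        return [hit.get("id") for hit in ET.fromstring(results_xml).iter("hit")]


engine = StubSearchEngine({
    "rec-1": "Bone fragment, sheep",
    "rec-2": "Ceramic figurine",
    "rec-3": "Sheep bone, burnt",
})
handler = SearchHandler(engine)

if __name__ == "__main__":
    print(handler.handle("sheep  bone"))
```

Because the handler only depends on "query string in, XML out", the engine can be swapped for another component without touching the user interface, which is the point of the componentized design.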
5S Model – Informally
Digital libraries are complex information systems that:
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
Solution – our approach
Applying and extending Digital Library (DL) techniques to solve the following problems: interoperability, making primary data available, data preservation
Modeling archaeological information systems using 5S theory to better understand the domain and design the system and the supported services
Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks, to support requirements elicitation and provide useful services
ETANA-DL
Archaeological Digital Library
Applies and extends the OAI-PMH
• Open Archives Initiative Protocol for Metadata Harvesting
Design considerations
• Componentized
• Distributed architecture
• Extensible
• Portable
ETANA Digital Library Core Components - DigBase
DigBase (DB)
• Central repository - stores metadata
• Union catalog - for the collections in ETANA-DL
• Various kinds of digital objects – excavation records, images, text collections, etc.
• General services - Search, Browse, Annotate, Recommend, etc.
• Archaeology-specific services - artifact analysis, visualizations, artifact interpretation, workflows, etc.
ETANA Digital Library Core Components - DigKit
DigKit (DK)
• A suite of tools for collecting and recording archaeological data in the field, that can be used for a new dig
• Metadata will migrate to DigBase (DB).
• Real-time collaborative archaeology: Metadata in DB will be rapidly available to others.
Architecture
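[Figure: ETANA-DL architecture.
Archaeological site side: site DB, DigKit, Data Mapping Component, OAI Data Provider.
ETANA-DL side: DigBase with the Union Catalog (linked to the sites via OAI), Search Component with an inverted-file index, Browse Engine with a Browse DB index, other ETANA-DL services with the DB they use, and the Web Interface. Links to the Union Catalog are labeled XOAI. A Configure step is also shown.]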
Modeling ETANA-DL – An Archaeological DL Meta-model
[Figure: archaeological DL meta-model, organized by the 5S models.
Stream model: Text, Video, Audio, Drawing, Photo, 3D
Structure model: Region, *Site, *Partition, *Sub-partition, *Locus, *Container, *Artifact; Metadata; Taxonomies (Temporal, Spatial, Artifact-specific)
Space model: Geographic space, Metric space, User interface
Society model: Archaeologist, General public, Service Manager
Scenario model: Services (Information Satisfaction, Value added, Repository building, Domain specific)]
Modeling ETANA-DL – The ETANA-DL model
[Figure: the ETANA-DL model, an instance of the meta-model.
Stream model: Field record, locus sheet; Figurine image (photo); Site/field plan (drawing); Preliminary/Final Report (application/pdf)
Structure model, site hierarchies:
  Jordan, Umayri: *Field, *Square, *Locus, *Pail
  Jordan Valley, Nimrin: *Quadrant, *Square, *Locus, *Bag
  Southern Israel, Halif: *Field, *Area, *Locus, *Basket
Structure model, artifacts: *Bone, *Seed, *Figurine
Structure model, taxonomies: Archaeological periods, Bone type, Seed species, Spatial
Space model: Site-specific coordinate system, Vector space, Web interface
Society model: Archaeologist, General public, ETANA-DL Service Manager
Scenario model, services: Searching, Browsing; Annotation, binding; Harvesting, Converting; Object comparison, marking item for analysis]
Modeling ETANA-DL – Mapping heterogeneous data to the structural model
Site | Partition | Sub-partition | Locus | Container
Lahav | Field I | Area A8 | Locus A8074 | Basket 224
Nimrin | Quadrant NW | Quadrant Value N25/W50 | Locus 96 | Bag 240
Umayri | Field A | Square 7J59 | Locus 001 | Pail 12
Data Mapping
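The mapping table above can be read as a per-site lookup from local level names to the shared Site / Partition / Sub-partition / Locus / Container structure. A minimal sketch of that idea in Python, assuming each raw record arrives as a dictionary keyed by the site's own level names (the record layout, `SITE_LEVELS`, and `map_record` are hypothetical and not the project's actual Data Mapping Component):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralLocation:
    """Location of a find, expressed in the shared structural model."""
    site: str
    partition: str
    sub_partition: str
    locus: str
    container: str


# Per-site names for each level, following the mapping table.
SITE_LEVELS = {
    "Lahav": {"partition": "Field", "sub_partition": "Area",
              "locus": "Locus", "container": "Basket"},
    "Nimrin": {"partition": "Quadrant", "sub_partition": "Quadrant Value",
               "locus": "Locus", "container": "Bag"},
    "Umayri": {"partition": "Field", "sub_partition": "Square",
               "locus": "Locus", "container": "Pail"},
}


def map_record(site, raw):
    """Translate a site-specific record (hypothetical dict layout) to the shared model."""
    levels = SITE_LEVELS[site]
    return StructuralLocation(
        site=site,
        **{level: raw[local_name] for level, local_name in levels.items()},
    )


lahav = map_record("Lahav", {"Field": "I", "Area": "A8", "Locus": "A8074", "Basket": "224"})
nimrin = map_record("Nimrin", {"Quadrant": "NW", "Quadrant Value": "N25/W50",
                               "Locus": "96", "Bag": "240"})
umayri = map_record("Umayri", {"Field": "A", "Square": "7J59", "Locus": "001", "Pail": "12"})

if __name__ == "__main__":
    for location in (lahav, nimrin, umayri):
        print(location)
```

Once every site is expressed in the same five levels, services such as browsing by site structure can work across collections without knowing each dig's vocabulary.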
ETANA-DL Schema Design
[Figure: ETANA-DL schema.
ETANA-DL Object (common attributes): ID, Owner, Collection, Partition, Subpartition, Locus, Container, ……
Bone: Count, Animal, ……
Seed: Species, Name, ……
Figurine: Description, Dimensions, ……]
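One way to read the schema slide is as a base object carrying the shared attributes, with Bone, Seed, and Figurine adding their own. The Python sketch below models that reading; the attribute types, the sample values, and the `common_view` helper are assumptions for illustration, and the schema's elided attributes ("……") are left out rather than guessed.

```python
from dataclasses import dataclass, fields


@dataclass
class EtanaObject:
    """Attributes shared by every harvested object (types are assumed)."""
    id: str
    owner: str
    collection: str
    partition: str
    subpartition: str
    locus: str
    container: str


@dataclass
class Bone(EtanaObject):
    animal: str
    count: int


@dataclass
class Seed(EtanaObject):
    species: str
    name: str


@dataclass
class Figurine(EtanaObject):
    description: str
    dimensions: str


def common_view(obj):
    """Return only the shared attributes, e.g. for cross-collection services."""
    return {f.name: getattr(obj, f.name) for f in fields(EtanaObject)}


# Hypothetical sample record; the values are made up for the example.
bone = Bone(id="b1", owner="Umayri", collection="bones", partition="A",
            subpartition="7J59", locus="001", container="12",
            animal="sheep", count=3)

if __name__ == "__main__":
    print(common_view(bone))
```

Keeping the shared attributes in one place is what lets a single search or browse service treat records from different sites and artifact types uniformly.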
ETANA-DL Services: Categories
Information satisfaction
• Searching
• Browsing
• Recommendation
Archaeology (Domain) specific
• Object comparison
• Marking items
Value-added
• Annotation
• Items of interest (Binding service)
• Recent searches/discussions
• User management
Searching: Search Interface
Searching: Search Results
Searching: Advanced Search
Searching: Advanced Search Results
Multi Dimensional Browsing
Site structure
Temporal
Object-specific
User context
Searching within a Context
Searching within a Context: Search Results
Restoring Browsing Contexts
Object Comparison: Selecting Objects for Comparison
Object Comparison: Editing Attributes
Object Comparison: Editing Attributes
Object Comparison: Comparing Objects
Object Comparison: Comparison Results
Marking items
Viewing marked items
Remarking items
Discussion Board (Annotation): View Messages
Discussion Board (Annotation): Post Messages/Replies
Collections Description
Other services
• Items of Interest (Binding service)
• Recent searches/discussions
• Recommendation
• User management (account creation, login)
Items of Interest: Binding Service
Recent Searches/Discussions
Recommendation
User Management: New User Account
User Management: Login
User Management: Navigations
Heterogeneous data handling
Site | Artifact type | Original data source | Attributes in original record | Attributes in harvested record | Records harvested
Lahav | Figurine | Tab-delimited text file | 15 | 18 | 564
Nimrin | Bone field record | Table in Oracle DB | 21 | 24 | 7420
Nimrin | Seed field record | Table in Oracle DB | 12 | 15 | 430
Umayri | Bone field record | 2 tables in Access DB | 8 | 24 | 2123
Total | | | | | 10537
Heterogeneous data handling
Site | Data Analysis (hours) | Data Mapping (hours) | Data Provider Implementation (hours) | Service Provider Implementation (hours)
Lahav | 48 | 144 | 4 | 1
Nimrin | 48 | 48 | 4 | 1
Umayri | 24 | 48 | 4 | 1
Total | 120 | 240 | 12 | 3
Heterogeneous data handling
[Pie chart: share of total effort]
Data Analysis: 32%
Data Mapping: 64%
Data Provider Implementation: 3%
Service Provider Implementation: 1%
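The chart percentages follow from the totals in the hours table (120, 240, 12, and 3 hours). A short check in Python, using only those totals:

```python
# Total hours per activity, taken from the "Total" row of the hours table.
HOURS = {
    "Data Analysis": 120,
    "Data Mapping": 240,
    "Data Provider Implementation": 12,
    "Service Provider Implementation": 3,
}

total_hours = sum(HOURS.values())

# Share of total effort per activity, rounded to whole percent as on the chart.
shares = {task: round(100 * hours / total_hours) for task, hours in HOURS.items()}

if __name__ == "__main__":
    print(total_hours)
    for task, share in shares.items():
        print(f"{task}: {share}%")
```

Data analysis and data mapping together account for 360 of the 375 hours, so almost all of the effort went into understanding and mapping the source data rather than into implementing the providers.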
Rapid prototyping: Lines of Code
Type of service | LOC for implementing service | LOC reused from components | Total LOC | Reuse percentage
Componentized | 350 | 3630 | 3980 | 91
Non-componentized | 7950 | - | 7950 | -
Total | 8300 | 3630 | 11930 | 30.4
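The reuse percentages in the table are consistent with reused LOC divided by total LOC. A small Python check of that reading (the function name is ours, not from the slides):

```python
def reuse_percentage(new_loc, reused_loc):
    """Percentage of total lines of code that came from reused components."""
    return 100 * reused_loc / (new_loc + reused_loc)


# Componentized services: 350 new lines, 3630 reused lines.
componentized = reuse_percentage(350, 3630)

# All services together: 8300 new lines, 3630 reused lines.
overall = reuse_percentage(8300, 3630)

if __name__ == "__main__":
    print(round(componentized), round(overall, 1))
```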
Rapid prototyping: Service development times
[Pie charts: share of development time by phase (Requirements Analysis and Design, Implementation, Testing) for Componentized Services and Non-componentized Services. Slice values shown: 28%, 14%, 58%; 35%, 27%, 38%.]
User Analysis
Initial comments from all 3 projects, plus others interested in ETANA-DL
Positive feedback – users liked:
• Data integration
• Prototype cross-collection information access services
• Information structuring
• Utility of supported services
Negative feedback – user concerns:
• Need for service enhancements
• Usability
Conclusions
• Applied 5S to the archaeological domain
• Identified requirements for future versions of system
• Extensible and componentized approach for handling heterogeneous archaeological data from disparate sources
• Rapidly generated prototype archaeological DL
• Making primary archaeological data available without significant delay
Future Work
• Componentizing current DL services
• Creating next-generation DL services from expanding set of requirements
• Integrating richer content
• (Semi-)automatic data mapping
• Automating the ingest of DL content
• Enhancing interface capabilities
• Formal usability studies
Visual Browsing
Visual Browse: By sites
Visual Browsing: Topographical Drawings
Full site North west quadrant
Square:N40/W20
Visual Browsing: Square information
Loci layout
Square:N40/W20
Locus: 86
Visual Browsing: locus sheet
Publications
1. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. ETANA-DL: A Digital Library for Integrated Handling of Heterogeneous Archaeological Data. To be presented at the ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
2. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. ETANA-DL: Managing Complex Information Applications – An Archaeology Digital Library. Demo to be presented at the ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
3. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. Prototyping Digital Libraries Handling Heterogeneous Data Sources – The ETANA-DL Case Study. European Conference on Digital Libraries (ECDL 2004), Bath, U.K., September 12-17, 2004 (submitted).
Questions/Feedback?