Incremental, Semi-automatic, Mapping-Based Integration of Heterogeneous Collections into Archaeological Digital Libraries: Megiddo Case Study ECDL 2005, Vienna, September 19, 2005 Ananth Raghavan, Naga Srinivas Vemuri, Rao Shen, Marcos André Gonçalves, Weiguo Fan, and Edward A. Fox [email protected] http://fox.cs.vt.edu

Page 1: Incremental, Semi-automatic, Mapping- Based Integration of Heterogeneous Collections into Archaeological Digital Libraries: Megiddo Case Study ECDL 2005,

Incremental, Semi-automatic, Mapping-Based Integration of Heterogeneous Collections into Archaeological Digital Libraries: Megiddo Case Study

ECDL 2005, Vienna, September 19, 2005

Ananth Raghavan, Naga Srinivas Vemuri, Rao Shen, Marcos André Gonçalves, Weiguo Fan, and Edward A. Fox

[email protected] http://fox.cs.vt.edu

Acknowledgements (Selected)

• Sponsors: NSF grant ITR-0325579; AOL, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech

• Faculty/Staff: Lillian Cassel, Debra Dudley, Roger Ehrich, Manuel Perez, …

• VT (Former) Students: Doug Gorton, Aaron Krowne, Ming Luo, Hussein Suleman, Ricardo Torres, …

Acknowledgements (Selected)

• Karen Borstad, MPP

• Giorgio Buccellati, UCLA

• Douglas Clark, Walla Walla College

• Joanne Eustis, CWRU

• Nick Fischio, CWRU

• Israel Finkelstein, Tel-Aviv University

• Paul Gherman, Vanderbilt U.

• Andrew Graham, U. Toronto

• Tim Harrison, U. Toronto

• Larry Herr, Canadian University College

• Christopher Holland, LRP

• Paul Jacobs, Mississippi State U.

• Douglas Knight, Vanderbilt U.

• Stan LaBianca, Andrews U.

• David McCreery, Willamette U.

• Eric Meyers, Duke U.

• Adam Porter, Illinois College

• Jack Sasson, Vanderbilt U.

• Tom Schaub, Indiana U. of Penn.

• Randall Younker, Andrews U.

• Doug Gorton, Virginia Tech

Outline

• Problems
• Background: ETANA-DL, Megiddo
• Approaches
  • Within the 5S framework
  • Visual mapping service
  • Multi-dimensional browsing
• Conclusions
• Future Work

Problems

• Vast quantities of heterogeneous archaeological data
• Integration is a monumental task.
  • Wrapper automation
  • Difficult to construct a global schema in the archaeological domain

Background

ETANA-DL Web Site

Background (Cont.)

• Megiddo Collection
  • Archaeological site in Israel
  • Contains over 30,000 records
  • 7 different types of artifacts: Wall, Locus, Pottery Bucket, Flint tool, Vessel, Lab Item, Miscellaneous Artifact

Approaches

• Within the 5S framework
• Visual mapping service
  • Semi-automatically generate a wrapper based on a visual schema mapping tool that simultaneously improves the global schema.
• Multi-dimensional browsing service
  • Extend access to newly integrated collections through the multi-dimension browsing component.

[Figure: 5SSuite diagram. Recoverable labels: 5S MetaModel; DL Expert; DL Designer; 5SGraph; 5SL DL Model; 5SLGen; 5SGen; Mapping Tool; component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …); Tailored DL Services; Practitioner; Researcher; Teacher; phases Requirements (1), Analysis (2), Design (3), Implementation (4).]

[Figure: diagram. Recoverable labels: Structure Sub-model; Scenario Sub-model; 5S Archaeology MetaModel; ArchDL Expert; ArchDL Designer; 5SGraph; Mapping Tool; Wrapper; Local Schema; ETANA-DL Schema; Local data; Global data; Union Catalog; ETANA-DL Union Services Descriptions (Harvesting, Mapping, Searching, Browsing); 5SGen; Component Pool (Browsing, …); Multi-dimension Browsing Service.]

Megiddo Site Organization in Structure Sub-model

[Figure: site organization diagram. Recoverable node labels: Megiddo; *Area; *Square; *Locus; *Pottery bucket; *Flint tool; *Vessel; *Lab item; *Miscellaneous artifact.]

Visual Mapping Service

• Features of visual schema mapping tool
• Scenario usage
  • Mapping Megiddo local schema into ETANA global one
• Usability evaluation

Features of Visual Schema Mapping Tool

• Schema visualization using hyperbolic trees
• Recommendation engine that uses 3 algorithms
  • Name-based matching (edit distance)
  • Rules
  • Mapping history
• Colors to distinguish between different types of schema nodes (root, leaf, non-leaf, selected, recommended, and mapped)
• Mapping table that stores mappings from local to global nodes
• Allows for renaming, deleting a node, and adding a local schema sub-tree as a child in the global schema.
• Generates an XSLT style sheet as a result of the mapping process.
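
The name-based matching mentioned above can be illustrated with a small sketch. This is not the tool's actual implementation; it assumes a plain Levenshtein edit distance, case-insensitive comparison, and a hypothetical threshold `max_distance`. The function names are invented for illustration, and the global node names are the ones that appear elsewhere in these slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recommend_by_name(local_name, global_names, max_distance=2):
    """Return global node names within max_distance edits, closest first.

    max_distance is an assumed threshold, not a value from the slides.
    """
    key = local_name.lower()
    scored = [(edit_distance(key, g.lower()), g) for g in global_names]
    return [g for d, g in sorted(scored) if d <= max_distance]


global_nodes = ["LOCUS", "PARTITION", "SUBPARTITION", "CONTAINER", "DESCRIPTION"]
print(recommend_by_name("Locus", global_nodes))
print(recommend_by_name("Area", global_nodes))
```

A local name such as Area has no close lexical match among these global names, which is presumably why the tool also relies on rules and mapping history.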

Features of Visual Schema Mapping Tool

Mapping Megiddo Local Schema into ETANA Global Schema

• Mapping of flint tool and vessel collections
• Name-based matching (edit distance)
• Rules
  • Area -> PARTITION
  • Square1 -> SUBPARTITION
  • OriginalBucket -> CONTAINER
  • Locus -> LOCUS
• Mapping history
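
As a rough sketch of how rules and mapping history might feed a recommendation, the following looks a local node up first in a rule table and then in previously saved mappings. The four rules are taken from this slide; the lookup order, the function name `recommend`, and the sample history entry are assumptions made for illustration only.

```python
# Rules as listed on the slide: local node name -> global node name.
RULES = {
    "Area": "PARTITION",
    "Square1": "SUBPARTITION",
    "OriginalBucket": "CONTAINER",
    "Locus": "LOCUS",
}

# Hypothetical mapping history saved from an earlier mapping session.
HISTORY = {"Description": "DESCRIPTION"}


def recommend(local_name, rules, history):
    """Return (global_name, source) for a local node.

    Rules are consulted before history; this priority is an assumption.
    """
    if local_name in rules:
        return rules[local_name], "rule"
    if local_name in history:
        return history[local_name], "history"
    return None, "none"


for name in ["Area", "Description", "Weight"]:
    print(name, "->", recommend(name, RULES, HISTORY))
```

In such a design, a node with no rule or history entry would fall through to name-based matching or to manual mapping by the user.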

Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)

Initial set of mappings for flint tool based on rules and name-based matching

Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)

Adding FLINT sub-tree as a child of OBJECT in the global schema

Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)

Global node Description renamed to DESCRIPTION, and user choosing to Save Mappings
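
The two schema edits shown in these slides (adding a local sub-tree under a global node, and renaming a global node) can be sketched on a toy tree. This assumes the schemas are held as XML element trees, which the slides do not state; the helper names and the `Material` child node are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def add_subtree(global_parent, local_subtree):
    """Attach a deep copy so later edits do not alter the local schema."""
    child = copy.deepcopy(local_subtree)
    global_parent.append(child)
    return child


def rename_node(node, new_name):
    """Rename a schema node and return the old name."""
    old_name = node.tag
    node.tag = new_name
    return old_name


# Toy global schema fragment: OBJECT with a Description child.
global_object = ET.Element("OBJECT")
description = ET.SubElement(global_object, "Description")

# Toy local schema fragment; the Material child is a hypothetical example.
local_flint = ET.Element("FLINT")
ET.SubElement(local_flint, "Material")

add_subtree(global_object, local_flint)
rename_node(description, description.tag.upper())
print(ET.tostring(global_object, encoding="unicode"))
```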

Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)

Flint tool style sheet generated
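
To make the idea of a generated style sheet concrete, here is a minimal sketch that turns a flat table of 1-1 mappings into an XSLT 1.0 stylesheet. It is not the output of the actual tool: the function name, the root element names `FlintTool` and `OBJECT`, and the flat (non-nested) output shape are assumptions. The code only builds and re-parses the stylesheet; it does not run the transformation.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def build_stylesheet(mappings, local_root, global_root):
    """Build an XSLT 1.0 stylesheet string from (local, global) name pairs."""
    ET.register_namespace("xsl", XSL_NS)
    sheet = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})
    template = ET.SubElement(
        sheet, f"{{{XSL_NS}}}template", {"match": f"/{local_root}"}
    )
    out_root = ET.SubElement(template, global_root)
    for local_name, global_name in mappings:
        # Each global element copies the text value of its local counterpart.
        target = ET.SubElement(out_root, global_name)
        ET.SubElement(target, f"{{{XSL_NS}}}value-of", {"select": local_name})
    return ET.tostring(sheet, encoding="unicode")


# Mapping pairs taken from the rules slide; root names are hypothetical.
mapping_table = [
    ("Area", "PARTITION"),
    ("Square1", "SUBPARTITION"),
    ("OriginalBucket", "CONTAINER"),
    ("Locus", "LOCUS"),
]
xslt = build_stylesheet(mapping_table, "FlintTool", "OBJECT")
ET.fromstring(xslt)  # raises ParseError if the result is not well-formed XML
print(xslt)
```

A real wrapper would also need to handle nested schema nodes and any sub-trees added during editing, which this flat sketch leaves out.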

Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)

Using the View Only Top Level Leaf Nodes option mapping Vessel Collection

Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)

Name change recommendation based on mapping history

Usability Evaluation

• Claims analysis: exploring trade-offs between
  • linear representation and hyperbolic tree representation with recommendations, in terms of mapping speed;
  • scrolling involved in linear representation and re-orient actions involved in hyperbolic trees;
  • representing mappings as lines across the screen and in a separate mapping table;
  • editing capability in the same tool and mapping and editing in different tools, in terms of ease of use and editing and mapping speed.
• Benchmark Tasks (BTs) to explore the above claims
• Comparison between Schema Mapper and MapForce for 1-1 schema mapping (as found in ETANA-DL).

Benchmark Task 1

Required the user to map 6 given nodes from the local to global schema.

Used to compare time and scrolls vs. re-orients and number of errors.

Users were asked to indicate which tool helped them locate nodes faster.

Benchmark Task 1 Quantitative Results

Benchmark Task 1 Quantitative Results (Cont.)

Benchmark Task 1 Quantitative Results (Cont.)

2 users recorded 1 error each when using Schema Mapper, no errors for MapForce.

The error was that they selected the wrong local schema node.

However, both of them realized their error because of the mapping table provided.

This reduces the criticality of the error.

Benchmark Task 1 Qualitative Results

• Wins
  • 8 out of 9 users felt that Schema Mapper helped locate both local schema and global schema nodes faster than MapForce.
  • The remaining user felt that both tools were equally effective for local schema node detection. However, for global schema node detection, Schema Mapper was superior.
• Areas for Improvement
  • Users complained that they could not look at the full node name in Schema Mapper.

Benchmark Task 2

User asked to map Megiddo Flint collection into ETANA-DL.

Task involves schema editing.

Task accomplished by using MapForce for mapping and XML Spy for editing for comparison with Schema Mapper.

Used to compare efficiency between the two tools.

Benchmark Task 2 Quantitative Results

Benchmark Task 2 Quantitative Results (Cont.)

Benchmark Task 2 Quantitative Results (Cont.)

• Schema Mapper: all errors were due to the Rename feature.
  • The task required the user to rename a node to the uppercase form of its existing name.
  • The Rename box in the UI did not contain the old name.
  • Critical incident with high criticality.
  • Rectified by showing the old name in the Rename box while prompting the user to enter a new name.
• In MapForce, one user actually lost all his mappings!

Benchmark Task 2 Qualitative Results

• Wins
  • All 9 users preferred the editing capability of Schema Mapper over that of MapForce and XML Spy combined.
• Areas for Improvement
  • Rename functionality to be extended to the mapping table.
  • Allowing a group rename by selecting multiple nodes and renaming them in a separate window.

Benchmark Task 3

Asks the users to identify mappings done in BT-2.

Compares the time taken by each tool to identify the mappings.

Compares errors in identifying mappings.

Benchmark Task 3 Quantitative Results

Benchmark Task 3 Quantitative Results (Cont.)

Benchmark Task 3 Quantitative Results

• Wins
  • 7 out of 9 users were faster using Schema Mapper.
  • No errors using Schema Mapper, whereas 2 users made 1 error each while using MapForce.
• Areas for Improvement
  • Sorting feature can be added to further aid the user in locating the mappings faster. (Has been subsequently added.)

Benchmark Task 3 Qualitative Results

• Wins
  • All 9 users found it easier to identify mappings with Schema Mapper than with MapForce.

Benchmark Task 4

Users were asked whether they would use the View Only Top-Level Leaf Nodes and View Only This Sub-tree features.

This question was mainly posed to find out whether an undo feature (getting back the original view with all nodes displayed) needed to be implemented.

Users unanimously agreed that they would use both features.

(Undo feature was implemented subsequently.)

Summary of Usability Evaluation

All claims justified.

Rename box modified to display old name while prompting for new name.

Undo feature implemented.

Sort feature provided for sorting the mapping table.

Multi-dimension Browsing Service

• Extend browsing service to integrated Megiddo collection
  • Flint
  • Vessel
  • Lab item
  • Miscellaneous artifact
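
One way to picture multi-dimension browsing is as filtering records on several dimensions at once and counting what remains along another dimension. The sketch below is an assumption-laden illustration, not the ETANA-DL component: the record fields `object_type` and `partition`, the sample records, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical harvested records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "object_type": "Flint tool", "partition": "A"},
    {"id": 2, "object_type": "Vessel", "partition": "A"},
    {"id": 3, "object_type": "Vessel", "partition": "B"},
    {"id": 4, "object_type": "Lab item", "partition": "B"},
]


def browse(records, **selected):
    """Keep records that match every selected dimension value."""
    return [
        r for r in records
        if all(r.get(dim) == value for dim, value in selected.items())
    ]


def facet_counts(records, dimension):
    """Count how many records fall under each value of one dimension."""
    return Counter(r[dimension] for r in records)


vessels_in_a = browse(RECORDS, object_type="Vessel", partition="A")
print([r["id"] for r in vessels_in_a])
print(facet_counts(browse(RECORDS, partition="B"), "object_type"))
```

Because newly integrated records share the global schema, a component of this kind could cover them without per-collection code, which is the benefit the slide points to.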

Multi-dimension Browsing Service

Integrated Megiddo collection

Conclusions

Demonstrate the DL integration workflow through Megiddo case study.

Visual schema mapping tool supports integration by wrapper generation and global schema enrichment.

Positive results from initial pilot studies of the visual schema mapping tool

Future Work

Extensive usability studies

Explore complex mappings

Enhance mapping recommendations

Questions? Comments?