Post on 21-Dec-2015
Managing Change on the Web
Luis Francisco-Revilla Frank M. Shipman
Richard Furuta Unmil Karadkar
Avital Arora
Center for the Study of Digital Libraries, Texas A&M University
What is this talk about?
A system-based approach to help manage digital libraries whose collections consist of fluid resources with distributed location and ownership
Modern paradigms of digital libraries
Pointers rather than the resources themselves
Web-based collections, e.g., NSDL (http://www.ehr.nsf.gov/due/programs/nsdl/)
Meta-documents
High fluidity
Changes vary in relevance
Little system aid for assessing the relevance of changes
Related work
David Johnson, PhD dissertation, University of Washington: document distance (weighted, asymmetric)
Change monitoring systems: AIDE, URL Minder, WatzNew (fine-grained yes/no detection); WebWatcher (evolving)
Identification of what is “interesting”: Syskill & Webert, Do-I-Care-Agent, Letizia (personal, reader-specific, profile-based)
Motivation
Managing the Walden’s Paths collection
Paths are meta-documents:
Sequential arrangement of Web pages
Rhetorically coherent
Contextualized
Distributed ownership
Distributed authorship
Continuous revision of the collection
Mechanisms for addressing the issue
Caching the pages
Caching strategies
But some changes are desirable
Fluid paths
Ephemeral paths
Rhetorical coherence
The real issue
These mechanisms allowed only a limited reaction to changes
Detecting changes is easy, but determining their relevance is difficult
Humans are still required to determine the significance of changes
To react to changes, their relevance must first be assessed
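The easy half of the problem, yes/no change detection, can be as small as comparing digests of two retrieved versions. The sketch below is a minimal illustration, assuming pages are compared as raw bytes; the function names are hypothetical and not taken from the Path Manager. It reports a trailing space exactly as it reports a rewritten paragraph, which is why detection alone says nothing about relevance.

```python
import hashlib


def fingerprint(page: bytes) -> str:
    """Return a SHA-256 digest of the raw page bytes."""
    return hashlib.sha256(page).hexdigest()


def has_changed(old_page: bytes, new_page: bytes) -> bool:
    """Yes/no change detection: any byte-level difference counts as a change."""
    return fingerprint(old_page) != fingerprint(new_page)


# A trailing space is flagged just like a rewritten paragraph:
# detection cannot tell an irrelevant edit from a relevant one.
print(has_changed(b"<p>Hours: 9-5</p>", b"<p>Hours: 9-5</p> "))  # True
print(has_changed(b"<p>Hours: 9-5</p>", b"<p>Closed</p>"))       # True
```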
The perception of change (overview)
Observe how humans perceive changes to Web pages
Inform and evaluate the approach and design
Questions:
1. Do people view the same changes differently when given different amounts of time?
2. What kinds of changes are easily perceived?
3. Which kinds of changes do users want to be notified of?
Kinds of change
Content changes (what)
Presentation changes (how)
Structural changes (linking)
Behavioral changes
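To make the four kinds of change concrete, the following sketch splits an HTML page into rough feature lists and reports which kinds differ between two versions. It uses the standard-library `html.parser.HTMLParser`; the mapping is an assumption for illustration, not the study's definition: visible text stands for content, tag names with `class`/`style` attributes and stylesheets for presentation, `<a href>` targets for structure, and scripts for behavior. Class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ChangeFeatures(HTMLParser):
    """Collect rough features for each kind of change (heuristic, assumed mapping)."""

    def __init__(self):
        super().__init__()
        self.content = []       # visible text: what the page says
        self.presentation = []  # tags, class/style attributes, stylesheets: how it looks
        self.structure = []     # link targets: how the page connects to others
        self.behavior = []      # scripts: what the page does
        self._raw_tag = None    # set while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.presentation.append((tag, attrs.get("class"), attrs.get("style")))
        if tag == "a" and attrs.get("href"):
            self.structure.append(attrs["href"])
        elif tag == "script":
            self.behavior.append(attrs.get("src") or "<inline>")
        if tag in ("script", "style"):
            self._raw_tag = tag

    def handle_endtag(self, tag):
        if tag == self._raw_tag:
            self._raw_tag = None

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        if self._raw_tag == "script":
            self.behavior.append(text)
        elif self._raw_tag == "style":
            self.presentation.append(text)
        else:
            self.content.append(text)


def extract_features(html):
    parser = ChangeFeatures()
    parser.feed(html)
    parser.close()
    return {
        "content": parser.content,
        "presentation": parser.presentation,
        "structure": parser.structure,
        "behavior": parser.behavior,
    }


def changed_kinds(old_html, new_html):
    """Names of the kinds of change whose features differ between two versions."""
    old, new = extract_features(old_html), extract_features(new_html)
    return sorted(kind for kind in old if old[kind] != new[kind])


old = '<p>Hello <a href="a.html">A</a></p>'
print(changed_kinds(old, '<p class="big">Hello <a href="a.html">A</a></p>'))  # ['presentation']
print(changed_kinds(old, '<p>Hello <a href="b.html">A</a></p>'))              # ['structure']
```

Real pages blur these categories (a retargeted link also changes markup), so a classifier like this only gives a first approximation.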
Results and implications
Presentation changes were usually perceived as irrelevant
The desire for notification and the perception of overall change increased with the degree of content change
Time played a larger role in the perception of structural changes than in the perception of content changes
As the degree of structural change increased, so did the desire for notification
Links are useful metrics
Path Manager: the system
Java-based
Handles paths or bookmark lists of HTML pages
Functional states of each document: original, valid, last-time
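The three functional states suggest a per-page record that is checked against more than one reference version. The slides name the states without defining them, so this sketch assumes that "original" is the snapshot taken when the page was added to the path, "valid" is the last snapshot a reviewer accepted, and "last-time" is the snapshot from the most recent check. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TrackedPage:
    """Hypothetical record of the three document states named on the slide."""
    url: str
    original: str                    # snapshot taken when the page was added (assumed meaning)
    valid: str                       # last snapshot a reviewer accepted (assumed meaning)
    last_time: Optional[str] = None  # snapshot from the most recent check

    def record_check(self, snapshot: str) -> Dict[str, bool]:
        """Store a new snapshot and report which reference states it differs from."""
        previous, self.last_time = self.last_time, snapshot
        return {
            "since_original": snapshot != self.original,
            "since_valid": snapshot != self.valid,
            "since_last_time": previous is not None and snapshot != previous,
        }

    def accept_current(self) -> None:
        """A reviewer approves the latest snapshot, which becomes the new valid state."""
        if self.last_time is not None:
            self.valid = self.last_time


page = TrackedPage("http://example.org/lesson.html", original="v1", valid="v1")
print(page.record_check("v2"))  # differs from original and valid, nothing earlier to compare
page.accept_current()
print(page.record_check("v2"))  # still differs from original, but no longer from valid
```

Keeping "valid" separate from "original" means a change the reviewer has already accepted is not reported again, while "last-time" isolates what changed since the previous check.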
Algorithms
Variation of Johnson's document distance
Weighted sum of additions, deletions and modifications for each metric
Added a metric for structural changes
Flexible, asymmetric, lacks normalization
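A weighted sum of additions, deletions and modifications per metric can be sketched as follows. The slides give neither Johnson's formula nor the Path Manager's weights, so everything concrete here is an assumption: each metric is a list of items (for example text blocks for content and link targets for structure), operations are counted with the standard-library `difflib.SequenceMatcher`, and the weight values are made up. Because deletions are weighted more heavily than additions, the measure is asymmetric, and since it simply accumulates, it has no upper bound.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the slides do not give concrete values.
OPERATION_WEIGHTS = {"added": 1.0, "deleted": 2.0, "modified": 1.5}
METRIC_WEIGHTS = {"content": 1.0, "structure": 2.0}  # structure = link targets


def count_operations(old_items, new_items):
    """Count added, deleted and modified items between two versions of one metric."""
    counts = {"added": 0, "deleted": 0, "modified": 0}
    matcher = SequenceMatcher(a=old_items, b=new_items, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_old, n_new = i2 - i1, j2 - j1
        if tag == "insert":
            counts["added"] += n_new
        elif tag == "delete":
            counts["deleted"] += n_old
        elif tag == "replace":
            paired = min(n_old, n_new)  # items replaced one-for-one count as modifications
            counts["modified"] += paired
            counts["added"] += n_new - paired
            counts["deleted"] += n_old - paired
    return counts


def weighted_distance(old_doc, new_doc):
    """Weighted sum of operations over all metrics: asymmetric and unbounded."""
    total = 0.0
    for metric, metric_weight in METRIC_WEIGHTS.items():
        counts = count_operations(old_doc[metric], new_doc[metric])
        total += metric_weight * sum(OPERATION_WEIGHTS[op] * n for op, n in counts.items())
    return total


old = {"content": ["Intro", "Syllabus"], "structure": ["a.html"]}
new = {"content": ["Intro", "Syllabus", "Schedule"], "structure": ["a.html", "b.html"]}
print(weighted_distance(old, new))  # 3.0: one addition per metric
print(weighted_distance(new, old))  # 6.0: the same edits read as deletions
```

The flexibility comes from the weights: raising the structure weight makes link changes count more toward overall change.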
Proportional
Determines the proportion of modification for each metric
Simple, symmetric, normalized
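One way to obtain a simple, symmetric, normalized proportion of change per metric is the Jaccard distance between the sets of items in the two versions, averaged over metrics. This is a swapped-in technique chosen for illustration; the slides do not say how the proportion is computed. The sketch assumes the same hypothetical "content" and "structure" item lists as the weighted variant. Sets ignore item order and duplicates, which keeps the measure simple at the cost of missing reorderings.

```python
def proportion_changed(old_items, new_items):
    """Jaccard distance: share of distinct items present in only one version (0..1)."""
    old_set, new_set = set(old_items), set(new_items)
    union = old_set | new_set
    if not union:
        return 0.0  # both versions empty: nothing changed
    return len(old_set ^ new_set) / len(union)


def proportional_distance(old_doc, new_doc, metrics=("content", "structure")):
    """Average the per-metric proportions; symmetric and normalized to [0, 1]."""
    proportions = [proportion_changed(old_doc[m], new_doc[m]) for m in metrics]
    return sum(proportions) / len(proportions)


old = {"content": ["Intro", "Syllabus", "Readings"], "structure": ["a.html"]}
new = {"content": ["Intro", "Syllabus", "Readings", "Schedule"],
       "structure": ["a.html", "b.html"]}
print(proportional_distance(old, new))  # 0.375 = (1/4 + 1/2) / 2
print(proportional_distance(new, old))  # 0.375: symmetric
```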
Web page retrieval and connectivity
Potentially slow and unpredictable
Parallel retrieval
Multi-threaded
Multiple attempts and retries
Different states:
Connection state
Retrieval state
Analysis state
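Parallel, retrying retrieval with explicit connection, retrieval and analysis states can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor` and `urllib.request.urlopen`. This is an illustrative sketch, not the Path Manager's Java implementation: the names, the default of three attempts, the immediate retry without backoff and the default `analyze=len` are all assumptions. The opener is a parameter so a stub can replace real network access.

```python
import enum
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, NamedTuple
from urllib.request import urlopen


class State(enum.Enum):
    CONNECTION = "connection"
    RETRIEVAL = "retrieval"
    ANALYSIS = "analysis"
    DONE = "done"
    FAILED = "failed"


class PageResult(NamedTuple):
    url: str
    state: State
    result: Any
    states: List[State]  # every state the check passed through, in order


def check_page(url: str, opener: Callable = urlopen, analyze: Callable = len,
               attempts: int = 3, timeout: float = 10.0) -> PageResult:
    """Fetch one page with retries, recording connection/retrieval/analysis states."""
    states: List[State] = []
    for _ in range(attempts):
        try:
            states.append(State.CONNECTION)
            with opener(url, timeout=timeout) as response:
                states.append(State.RETRIEVAL)
                body = response.read()
        except OSError:  # URLError and socket timeouts are OSError subclasses
            continue     # retry immediately (no backoff in this sketch)
        states.append(State.ANALYSIS)
        return PageResult(url, State.DONE, analyze(body), states)
    states.append(State.FAILED)
    return PageResult(url, State.FAILED, None, states)


def check_all(urls, workers: int = 8, **kwargs) -> List[PageResult]:
    """Check many pages in parallel so one slow server does not block the others."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: check_page(u, **kwargs), urls))
```

For example, `check_all(["http://example.org/"])` returns one `PageResult` per URL, in input order. Note that HTTP errors such as 404 are also `OSError` subclasses here, so they are retried too; a fuller version would distinguish permanent from transient failures.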
Challenges and limitations
Heuristic identification of document structure (i.e., headings)
Indirection
Behavior
Dynamic pages
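Identifying headings is heuristic because many pages style headings with ordinary markup rather than heading tags. The sketch below shows one plausible heuristic, assumed for illustration: text inside `<h1>` to `<h6>` always counts, and short `<b>`/`<strong>` runs count as headings when they are at most a hypothetical `max_emphasis_len` characters long. It also shows the fragility: a short bold phrase inside a sentence is a false positive, and a heading styled with CSS alone is missed.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
EMPHASIS_TAGS = {"b", "strong"}  # hypothetical: short bold runs treated as headings


class HeadingFinder(HTMLParser):
    """Heuristically collect heading text from a page."""

    def __init__(self, max_emphasis_len=60):
        super().__init__()
        self.max_emphasis_len = max_emphasis_len
        self.headings = []
        self._candidate = None  # tag of the candidate heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._candidate is None and (tag in HEADING_TAGS or tag in EMPHASIS_TAGS):
            self._candidate, self._text = tag, []

    def handle_data(self, data):
        if self._candidate is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._candidate:
            return
        text = " ".join("".join(self._text).split())
        if text and (tag in HEADING_TAGS or len(text) <= self.max_emphasis_len):
            self.headings.append(text)
        self._candidate = None


def find_headings(html, max_emphasis_len=60):
    finder = HeadingFinder(max_emphasis_len)
    finder.feed(html)
    finder.close()
    return finder.headings


page = "<h2>Intro</h2><p>Body text</p><b>Short title</b><b>" + "x" * 100 + "</b>"
print(find_headings(page))  # ['Intro', 'Short title']
```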
Conclusions
Managing distributed collections of documents remains challenging and time-consuming, and still requires human assistance
The Path Manager supports the maintenance of collections of Web pages
by recognizing and evaluating relevant changes and informing the user of them
by keeping track of the original, valid and last-time states of Web pages
The study indicated that users want structural changes to be included in the determination of overall change
Contact information
Luis Francisco-Revilla
Frank M. Shipman
Richard Furuta
Unmil Karadkar
Avital Arora