


An XML Log Standard and Tool for Digital Library Logging Analysis

Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox

Virginia Tech

Outline
  Motivation
  Related Work
    Problems with existing DL logs
  The Digital Library Standardized Log Format
    DL log standard design
    DL log format structure
  DL log tool and its implementation
  Conclusions and future work

Motivation
  Log analysis
    Source of information about:
      How patrons really use DL services
      How systems behave while supporting users' information-seeking activities
      Examples: patterns
    Used to:
      Evaluate and enhance services
      Help design user interfaces
      Allocate resources better
    Common practice in the web setting
      Supported by web servers, proxy caching

Motivation (cont.)
  DLs differ from the web
    DL collections are explicitly organized, described, managed, and preserved
    Users have more specific tasks and needs
    Digital objects and collections are more structured
  DL logging should offer much richer information and opportunities
    Tradeoff: user privacy
  Current DL logs
    Differences in formats and recorded information
    Problems:
      Lack of interoperability
      No reuse of analysis tools
      Poor comparability of log analysis results

Related Work
  Web servers (Common Log Format)
    Focused on browsing; stateless

bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:21 -0400] "GET /~harley/courses.html HTTP/1.0" 200 1734
bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:22 -0400] "GET /~harley/clip_art/word_icon.gif HTTP/1.0" 200 1050
www4.e-softinc.com - - [22/Oct/1998:00:20:27 -0400] "HEAD / HTTP/1.0" 200 0
user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/capehatteras.html HTTP/1.0" 200 328
user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/PB2panforringed.mirror.gif HTTP/1.0" 200 20222
eger-dl01.agria.hu - - [22/Oct/1998:00:20:51 -0400] "GET /~tjohnson/pinouts/ HTTP/1.0" 200 26994
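To make the "stateless" point concrete, here is a minimal Python sketch that parses one Common Log Format line into its fields. The pattern and the function name `parse_clf_line` are illustrative assumptions, not part of the presentation. Each record describes a single HTTP request; nothing in it identifies a session, a query, or a collection.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Example timestamp: 22/Oct/1998:00:20:21 -0400
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" size means no body was returned.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = ('bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:21 -0400] '
          '"GET /~harley/courses.html HTTP/1.0" 200 1734')
entry = parse_clf_line(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Any grouping of such requests into user sessions has to be inferred afterwards, for example from host and time proximity.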

Related Work (cont.)
  DL: Greenstone


/fast-cgi-bin/niupepalibrary

(a) its-www1.massey.ac.nz

(b) [Thu Dec 07 23:47:00 NZDT 2000]

(c) (a=p, b=0, bcp=, beu=, c=niupepa, cc=, ccp=0, ccs=0, cl=, cm=, cq2=, d=, e=, er=, f=0, fc=1, gc=0, gg=text, gt=0, h=, h2=, hl=1, hp=, il=l, j=, j2=, k=1, ky=, l=en, m=50, n=, n2=, o=20, p=home, pw=, q=, q2=, r=1, s=0, sp=frameset, t=1, ua=, uan=, ug=, uma=listusers, umc=, umnpw1=, umnpw2=, umpw=, umug=, umun=, umus=, un=, us=invalid, v=0, w=w, x=0, z=130.123.128.4-950647871)

(d) "Mozilla/4.08 [en] (Win95; I ;Nav)"
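As a rough illustration of what an analyst must do with field (c), the following Python sketch splits an abbreviated copy of the argument list into key/value pairs. The function name `parse_greenstone_args` is hypothetical, and the sketch assumes values never contain commas, which the sample does not guarantee in general.

```python
def parse_greenstone_args(field):
    """Split a '(k=v, k=v, ...)' argument field into a dict.

    Arguments with no value are kept with an empty string.
    """
    body = field.strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    args = {}
    for pair in body.split(","):
        key, _, value = pair.strip().partition("=")
        if key:
            args[key] = value
    return args

# Abbreviated subset of the arguments in field (c) of the sample entry.
field = "(a=p, b=0, bcp=, c=niupepa, l=en, p=home, z=130.123.128.4-950647871)"
args = parse_greenstone_args(field)
non_empty = {key: value for key, value in args.items() if value}
print(sorted(non_empty))
```

Even after parsing, the short keys (a, b, bcp, ...) carry no meaning without Greenstone-specific documentation.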

Related Work (cont.)
  Search engine: OpenText

Mon Sep 28 17:48:42 1998 ----- Starting Search -----
Mon Sep 28 17:48:42 1998 {Transaction Begin}
Mon Sep 28 17:48:42 1998 {RankMode Relevance1}
Mon Sep 28 17:48:42 1998 "Bacillus thuringiensis "
Mon Sep 28 17:48:42 1998 P0 = "Bacillus thuringiensis "
Mon Sep 28 17:48:42 1998 R = (*D including (*P0))
Mon Sep 28 17:48:42 1998 R = (((*R rankedby *P0)))
Mon Sep 28 17:48:42 1998 S = (subset.1.10 (*R))
Mon Sep 28 17:48:42 1998 SL0 = (region "OTSummary" within.1 (*S))
Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.1.1 *S ))
Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.2.1 *S ))
Mon Sep 28 17:48:42 1998 {Transaction End}

Related Work (cont.)
  Problems with existing DL logs
    Incompatibility
    Incompleteness
    Complexity of analysis
    Lack of organization
    Ambiguity
    Inflexibility
    Verboseness

The Digital Library Standardized Log Format
  Comprehensive
  Reflective of the actual DL system behavior
  Easily readable
  Precise
  Flexible enough to accommodate varying systems
  Succinct enough to be implemented
  Concern: user privacy

The Digital Library Standardized Log Format - Design (cont.)
  Capture high-level user and system behaviors
    Hierarchical organization
    Encapsulated in transactions
      Interactions between the users and the system or among the system components
  Log format designed to record a number of different kinds of transactions
    Examples:
      1. Login to the system
      2. Submission of a search query
      3. Browsing a result list
      4. Recording of a user failure

The Digital Library Standardized Log Format - Design (cont.)
  Design
    Reflective of DL behavior
    Based on the 5S formal theory
      Unifying, mathematical theory to formally describe the semantics of DL components
      Guidance for how to organize the log structure

The Digital Library Standardized Log Format - Design (cont.)

5S: definition and use in log design

Streams
  Definition: Represent static and dynamic multimedia content
  Use in log design: Temporal events, types of digital objects

Structures
  Definition: Labeled directed graphs; provide organization within the DL
  Use in log design: Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme

Spaces
  Definition: Sets, properties and operations on those sets
  Use in log design: Retrieval mode, presentation information

Scenarios
  Definition: Sequences of events that modify states of a computation in order to accomplish some functional requirement
  Use in log design: Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios

Societies
  Definition: Sets of communities and relationships among them
  Use in log design: User information

The Digital Library Standardized Log Format (cont.)
  Specification
    Collection of an extensive, flat set of attributes

[Figure: unordered set of attributes: query, event, registering, transaction, session, error, browse, action, timestamp, machine information, help, search, update, sorting rule, catalog, collection, result cutoff, response]

The Digital Library Standardized Log Format - Specification
  Organization in a structured, logical way
    XML and XML Schema
      Standard syntax
      Guarantee quality, correctness
      Rich set of basic types helps standardization
      Abundance of XML parsers helps construction of analysis tools

The Digital Library Standardized Log Format - Structure
  Top-level hierarchy

Log
  Log Entry
    Transaction
      SessionId
      MachineInfo
      TimeStamp
      Statement
  . . .
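The hierarchy can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under stated assumptions: the helper `build_transaction` is hypothetical, the root is named `Log` after the figure, the Log Entry level is omitted for brevity, and the `IPAddress` and `Port` children of `MachineInfo` are borrowed from the login example later in the deck.

```python
import xml.etree.ElementTree as ET

def build_transaction(trans_id, session_id, timestamp, ip_address, port):
    """Build a Transaction element with the top-level children from the figure."""
    transaction = ET.Element("Transaction", {"ID": str(trans_id)})
    ET.SubElement(transaction, "SessionId").text = session_id
    machine = ET.SubElement(transaction, "MachineInfo")
    ET.SubElement(machine, "IPAddress").text = ip_address
    ET.SubElement(machine, "Port").text = str(port)
    ET.SubElement(transaction, "TimeStamp").text = timestamp
    # Statement is left empty in this sketch.
    ET.SubElement(transaction, "Statement")
    return transaction

log = ET.Element("Log")  # root element name assumed from the figure
log.append(build_transaction(3452, "987654usr3",
                             "2002-05-31T20:10:55.000-05:00",
                             "128.173.244.56", 8000))
child_tags = [child.tag for child in log[0]]
print(ET.tostring(log, encoding="unicode"))
```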

The Digital Library Standardized Log Format - Structure (cont.)
  Decomposition of statement into different types

Statement
  AdmInfo
  SessionInfo
  Event
  ErrorInfo
  HelpInfo
  RegisterInfo

The Digital Library Standardized Log Format - Structure (cont.)
  Decomposition of event

[Figure: Statement with its types AdmInfo, SessionInfo, Event, ErrorInfo, HelpInfo, and RegisterInfo; Event is expanded into Action and StatusInfo, with a further level containing Search, Browse, StoreSysInfo, and Update]

The Digital Library Standardized Log Format - Structure (cont.)
  Search attributes

Search
  QueryString
  TimeFrame
  PresentationInfo
    Format
    NumberOfResults
    SortBy
    CutOff
  SearchBy
  Collection
  Catalog

DL Log Tool and Implementation
  Java classes
    XMLLogData: stores data
    XMLLogManager: methods to read and write log information according to the format
    Synchronized reads and writes: avoid conflicts and inconsistencies
  Middleware to plug the DL log tool into a target system
    Events based on target system architecture and implementation
  Implemented in the MARIAN DL system
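The slides do not show the Java code, so the following is only a Python analog of the idea of synchronized reads and writes. The classes `LogStore` and `LogManager` and their methods are hypothetical stand-ins for XMLLogData and XMLLogManager; a single lock serializes every access so concurrent writers cannot interleave.

```python
import threading
import xml.etree.ElementTree as ET

class LogStore:
    """Hypothetical stand-in for XMLLogData: holds the log tree in memory."""
    def __init__(self):
        self.root = ET.Element("Log")

class LogManager:
    """Hypothetical stand-in for XMLLogManager: serializes reads and writes."""
    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()

    def write_log_entry(self, transaction):
        # Only one thread at a time may modify the tree.
        with self._lock:
            self._store.root.append(transaction)

    def get_log_data(self):
        # Readers take the same lock, so they never see a partial write.
        with self._lock:
            return ET.tostring(self._store.root, encoding="unicode")

def worker(manager, start, count):
    for i in range(start, start + count):
        manager.write_log_entry(ET.Element("Transaction", {"ID": str(i)}))

manager = LogManager(LogStore())
threads = [threading.Thread(target=worker, args=(manager, n * 100, 100))
           for n in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
total = len(ET.fromstring(manager.get_log_data()))
print(total)  # 400 transactions, none lost
```

In Java the same effect would typically come from `synchronized` methods; the lock here plays that role.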

DL Log Tool and Implementation (cont.): the MARIAN DL system

[Figure: MARIAN architecture. Components: Database Layer; Search Layer; User Interaction Layer; Data Analysis, Collection Builders & Loading Tools; Webgate. Labels: semantic networks persistent storage; generalized inverted index interfaces; DL information networks characterization, indexing and loading; tailored DL infrastructure generation; database management API; searcher community; semantic network management API; fusion modules; distributed client communication; structured logging; customization and personalization; query history; multilingual support]

DL Log Tool and Implementation (cont.)

[Figure: logging data flow. Elements: DL patron; user event; MARIAN User Layer; system event; log middleware; XMLLogManager with writeLogEntry(parameters); storelogData(parameters); XMLLogData; DL analyst; analysis request; analysis tool; getLogData(parameters); logData; result. Two connections are labeled c1 and c2]

DL Log Tool and Implementation (cont.)
Example 1: Login to the system

<Transaction ID="3452">
  <SessionId>987654usr3</SessionId>
  <SessionInfo>
    <SessionStart>Start</SessionStart>
    <LoginInfo>
      <UserId>mhabib</UserId>
    </LoginInfo>
  </SessionInfo>
  <TimeStamp>2002-05-31T20:10:55.000-05:00</TimeStamp>
  <MachineInfo>
    <IPAddress>128.173.244.56</IPAddress>
    <Port>8000</Port>
  </MachineInfo>
</Transaction>
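A short Python sketch of how an analysis tool might read this login transaction with a stock XML parser. It uses a well-formed copy of the example (every element closed with its matching tag); the function `summarize_login` and the keys of the returned dictionary are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

LOGIN_XML = """
<Transaction ID="3452">
  <SessionId>987654usr3</SessionId>
  <SessionInfo>
    <SessionStart>Start</SessionStart>
    <LoginInfo><UserId>mhabib</UserId></LoginInfo>
  </SessionInfo>
  <TimeStamp>2002-05-31T20:10:55.000-05:00</TimeStamp>
  <MachineInfo>
    <IPAddress>128.173.244.56</IPAddress>
    <Port>8000</Port>
  </MachineInfo>
</Transaction>
"""

def summarize_login(xml_text):
    """Extract the fields of a login transaction into a flat dict."""
    root = ET.fromstring(xml_text.strip())

    def text(path):
        # Strip padding so values compare cleanly.
        return root.findtext(path).strip()

    return {
        "transaction_id": root.get("ID"),
        "session": text("SessionId"),
        "user": text("SessionInfo/LoginInfo/UserId"),
        "timestamp": text("TimeStamp"),
        "ip": text("MachineInfo/IPAddress"),
        "port": int(text("MachineInfo/Port")),
    }

info = summarize_login(LOGIN_XML)
print(info["user"], info["session"], info["ip"], info["port"])
```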

DL Log Tool and Implementation (cont.)
Example 2: query all Dirline records about "low back pain"

...
<Event>
  <Action>
    <Search>
      <Collection>Dirline</Collection>
      <ObjectType>CommunityRecord</ObjectType>
      <SearchBy>SearchByAnyParts</SearchBy>
      <SearchType>NonPersistant</SearchType>
      <QueryString>low back pain</QueryString>
      <TimeFrame>
        <StartTime>2002-05-31T20:11:07.000-05:00</StartTime>
        <EndTime>2002-05-31T20:11:09.000-05:00</EndTime>
      </TimeFrame>
      <PresentationInfo>
        <Format>List</Format>
        <SortBy>ByRank</SortBy>
        <NumberOfResults>217</NumberOfResults>
        <Cutoff>20</Cutoff>
      </PresentationInfo>
      ...
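One thing the structured search record makes easy is timing a query. The Python sketch below takes only the Search element from the example, closed here so that it parses on its own, and computes the elapsed time from TimeFrame. The function `summarize_search` is hypothetical, and reading Cutoff as the maximum number of results displayed is an assumption, not something the slides state.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

SEARCH_XML = """
<Search>
  <Collection>Dirline</Collection>
  <QueryString>low back pain</QueryString>
  <TimeFrame>
    <StartTime>2002-05-31T20:11:07.000-05:00</StartTime>
    <EndTime>2002-05-31T20:11:09.000-05:00</EndTime>
  </TimeFrame>
  <PresentationInfo>
    <NumberOfResults>217</NumberOfResults>
    <Cutoff>20</Cutoff>
  </PresentationInfo>
</Search>
"""

def summarize_search(xml_text):
    """Return query text, elapsed seconds, and result counts for one search."""
    search = ET.fromstring(xml_text.strip())
    start = datetime.fromisoformat(search.findtext("TimeFrame/StartTime"))
    end = datetime.fromisoformat(search.findtext("TimeFrame/EndTime"))
    found = int(search.findtext("PresentationInfo/NumberOfResults"))
    cutoff = int(search.findtext("PresentationInfo/Cutoff"))
    return {
        "collection": search.findtext("Collection"),
        "query": search.findtext("QueryString"),
        "seconds": (end - start).total_seconds(),
        "found": found,
        # Assumption: Cutoff caps how many results are presented.
        "shown": min(found, cutoff),
    }

summary = summarize_search(SEARCH_XML)
print(summary)
```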

DL Log Tool and Implementation (cont.)
Example 3: Browse an item of the ranked list returned as an answer for the previous search

<Transaction ID="3456">
  <SessionId>987654usr3</SessionId>
  ...
  <Statement>
    <Event>
      <Action>
        <Browse>
          <DocID>5114</DocID>
          <DocName>University of Washington School of Medicine Multidisciplinary Pain Center (UWPC)</DocName>
          ...
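Because every transaction carries a SessionId, an analysis tool can group activity by session without guessing, unlike with stateless web server logs. The Python sketch below assumes a small hypothetical log: the two transactions with session 987654usr3 echo the IDs used in the examples, while the third transaction, the `Log` wrapper, and both function names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

LOG_XML = """
<Log>
  <Transaction ID="3452"><SessionId>987654usr3</SessionId></Transaction>
  <Transaction ID="3456">
    <SessionId>987654usr3</SessionId>
    <Statement><Event><Action>
      <Browse><DocID>5114</DocID></Browse>
    </Action></Event></Statement>
  </Transaction>
  <Transaction ID="9001"><SessionId>hypothetical-session</SessionId></Transaction>
</Log>
"""

def transactions_by_session(xml_text):
    """Map each SessionId to the list of its transaction IDs, in log order."""
    sessions = defaultdict(list)
    for transaction in ET.fromstring(xml_text.strip()).iter("Transaction"):
        session_id = transaction.findtext("SessionId", default="").strip()
        sessions[session_id].append(transaction.get("ID"))
    return dict(sessions)

def browsed_documents(xml_text):
    """Return the DocID of every Browse action in the log."""
    root = ET.fromstring(xml_text.strip())
    return [doc.text.strip() for doc in root.findall(".//Browse/DocID")]

sessions = transactions_by_session(LOG_XML)
browsed = browsed_documents(LOG_XML)
print(sessions, browsed)
```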

In conclusion
  Analysis of current DL log formats
    Need for standardization, common practices, interoperable tools
  Designed an XML-based log format standard for DL logging analysis
    Captures a rich, detailed set of system and user behaviors
  Implemented format in a log component tool
    Connected to the MARIAN DL system

Future Work
  Build suite of components for evaluation
  Use log format and tools to evaluate several projects
    Networked Digital Library of Theses and Dissertations (NDLTD)
    CITIDEL
  Broaden the scope of use to other NSDL projects
  Extend and use log tool with other DL systems and architectures
  Consider user privacy issues
  Explore info for personalization

Future Work (cont.)
  Crosswalks to other standards (e.g. CLF)
    "Not yet other standard"
  More challenges
    Distributed logs
    Large settings
      Investigate compression issues to deal with XML verboseness
  Promote discussions:
    Listserv: [email protected]