semantic analysis of user browsing patterns in the web of data @usewod,

KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association www.kit.edu Enabling Semantic Analysis of User Browsing Patterns in the Web of Data M.Sc. Julia Hoxha Institute of Applied Informatics and Formal Description Methods (AIFB) Karlsruhe Institute of Technology USEWOD Workshop @WWW2012 Lyon, France


DESCRIPTION

Enabling Semantic Analysis of User Browsing Patterns in the Web of Data, USEWOD Workshop @WWW2012.

A useful step towards better interpretation and analysis of usage patterns is to formalize the semantics of the resources that users access on the Web. We focus on this problem and present an approach for the semantic formalization of usage logs, which lays the basis for effective techniques for querying expressive usage patterns. We also present a query answering approach that finds expressive patterns of usage behavior in the logs by formulating semantic and temporal constraints. We have processed over 30 thousand user browsing sessions extracted from usage logs of DBPedia and Semantic Web Dog Food. The logs are semantically formalized using the respective domain ontologies and RDF representations of the Web resources being accessed. We show the effectiveness of our approach through experimental results, which also provide an exploratory analysis of the way users browse the Web of Data.

TRANSCRIPT

Page 1: Semantic Analysis of User Browsing Patterns in the Web of Data @USEWOD,

KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association www.kit.edu

Enabling Semantic Analysis of User Browsing Patterns in the Web of Data

M.Sc. Julia Hoxha Institute of Applied Informatics and Formal Description Methods (AIFB) Karlsruhe Institute of Technology

USEWOD Workshop @WWW2012 Lyon, France

Page 2:

Paper

Hoxha, J., Junghans, M., and Agarwal, S. (2012). Enabling Semantic Analysis of User Browsing Patterns in the Web of Data. In 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD), 21st International World Wide Web Conference (WWW2012), Lyon, France, vol. CoRR, abs/1204.2713.

http://arxiv.org/abs/1204.2713

Page 3:

Outline

Introduction

Framework for Behavior Analysis

Semantic Modeling of Cross-site Browsing Behavior

Web Browsing Activity Model (WAM)

Formalization Approach

Querying Behavioral Patterns

Evaluation

Conclusions

3 J. Hoxha – USEWOD Workshop, Lyon, 2012

Page 4:

Introduction

Understanding user behavior in accessing Web resources helps site providers/domain experts:

• Discover user preferences or detect bottlenecks

• Build adaptive Web sites

• Make appropriate recommendations to users, etc.

How to facilitate the analysis of usage patterns?

• Provide formal, semantic description of usage logs

• Offer techniques to expressively query patterns


HTTP Requests of Usage Logs

ID   Time            User Action
1    [17:11:49:21]   http://www.google.de/search?q=Lyon+www2012
1    [17:11:49:33]   http://dbpedia.org/page/Lyon
1    [17:11:49:39]   http://data.semanticweb.org/conference/www/2011/demo/a-demo-search-engine-for-products

[Figure: SWDF Domain Ontology. Classes: InProceedings, swrc:ConferenceEvent, swrc:Proceedings, swrc:Publication, foaf:Person, dbpedia:PopulatedPlace. Properties: dc:creator, isA, ns2:relatedToEvent, ns1:name, ns3:based_near. Some properties point to literal values.]

Page 5:

Modeling and Analysis Framework

[Figure: framework diagram with three layers, Monitoring, Formalization and Analysis. Labels in the diagram: User 1 ... User n; Web Browsing Behavior Monitoring System; Cross-site Browsing Activities; Selection, Target Data, Preprocessing, Preprocessed Data, Transformation, Transformed Data; Event A, Event B, Event C, Event K, Event N; Semantic Formalization (Browsing Activity Formalization, Annotation with Domain Ontology); Domain Ontologies; Semantic Activity Models; Repository; Pattern Mining; Querying Capabilities.]

User Session of browsing Events

Event e1 = (A1, I1, t1): Type A1, Input I1 = {i1,...,ik}, URL l1, Time t1

Event en = (An, In, tn): Type An, Input In = {i1,...,ik}, URL ln, Time tn

Type Ai = {content, function}

s: <l1, l2, l3, ..., ln>

Page 6:

Definitions

Event

• l full URL invoked, T types, P parameter, t timestamp

Event types

• Tc content type of an event

• Tf function type of an event

Session

• s is ordered sequence of events

• , s.t. i is the event order in s

• Ts start time and Te end time, s.t.
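The definitions above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class and field names are hypothetical, the calendar date in the sample timestamps is made up (the log excerpt only shows a time of day), and taking Ts and Te as the timestamps of the first and last event is an assumption, since the formula on the slide did not survive.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    url: str                                  # l: full URL invoked
    timestamp: datetime                       # t: time of the request
    content_types: frozenset = frozenset()    # Tc: content types of the event
    function_types: frozenset = frozenset()   # Tf: function types of the event
    parameters: tuple = ()                    # P: (name, value) pairs


@dataclass
class Session:
    events: list = field(default_factory=list)

    def add(self, event):
        # Keep the session an ordered sequence of events (ordered by time).
        self.events.append(event)
        self.events.sort(key=lambda e: e.timestamp)

    def order_of(self, event):
        # i: 1-based order of the event within the session.
        return self.events.index(event) + 1

    @property
    def start_time(self):
        # Ts, assumed here to be the timestamp of the first event.
        return self.events[0].timestamp

    @property
    def end_time(self):
        # Te, assumed here to be the timestamp of the last event.
        return self.events[-1].timestamp


# Two requests from the log excerpt, added out of order on purpose.
session = Session()
session.add(Event("http://dbpedia.org/page/Lyon",
                  datetime(2012, 4, 17, 11, 49, 33)))
session.add(Event("http://www.google.de/search?q=Lyon+www2012",
                  datetime(2012, 4, 17, 11, 49, 21),
                  function_types=frozenset({"SearchEngine"}),
                  parameters=(("q", "Lyon+www2012"),)))
print([e.url for e in session.events])
```

Sorting on insertion keeps the event order consistent with the timestamps, so the order index and the start and end times can be read directly off the list.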

Page 7:

Web browsing Activity Model (WAM)

[Figure: WAM ontology diagram.
Classes: wam:Session, wam:Event, wam:StartEvent, wam:EndEvent, wam:EventType, wam:FunctionType, wam:ContentType, wam:EventURL, wam:BaseURL, wam:Parameter, wam:InputVariable, wam:OutputVariable, wam:User, event:Event, time:TemporalEntity, time:Instant, time:Interval, owa:Parameter Name.
Properties: wam:hasEvent, wam:hasStartEvent, wam:hasEndEvent, wam:hasUser, wam:userID, wam:userIP, wam:hasTime, wam:inInterval, wam:eventURL, wam:fullURL, wam:baseURL, wam:hasInput, wam:hasParameter, wam:hasName, wam:hasValue, wam:functionType, wam:contentType, wam:order, rdfs:subClassOf. Several properties point to Literal values.
Annotations: domain ontology used for semantic enrichment, based on function and content. Example URLs: http://www.avis.com/car-rental/reservation/start-reservation.ac?resForm.pickUpLocation=Lyon and http://data.semanticweb.org/person/julia-hoxha]

Prefixes:
wam: <http://greenlinkeddata.org/wam.owl#>
time: <http://www.w3.org/2006/time#>
event: <http://purl.org/NET/c4dm/event.owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>

Page 8:

Formalization Approach

Formalization based on WAM ontology

• Step 1. Semantic Enrichment

• Step 2. Extend Knowledge Base (ABox assertions for events & domain ontology)

• Step 3. RDF Serialization
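To make Steps 2 and 3 concrete, here is a minimal sketch that writes the assertions for one event as N-Triples lines using plain string formatting. It is an illustration under assumptions, not the authors' serializer: the session and event IRIs under http://example.org/ are hypothetical, and attaching wam:fullURL, wam:contentType and wam:order directly to the event is a simplification of the WAM diagram, whose exact domains and ranges are not given in the slides.

```python
WAM = "http://greenlinkeddata.org/wam.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Build a quoted literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    suffix = f"^^{iri(datatype)}" if datatype else ""
    return f'"{escaped}"{suffix}'


def event_triples(session_iri, event_iri, url, content_type_iri, order):
    """Return the N-Triples lines describing one event of a session."""
    triples = [
        (iri(session_iri), iri(WAM + "hasEvent"), iri(event_iri)),
        (iri(event_iri), iri(RDF_TYPE), iri(WAM + "Event")),
        (iri(event_iri), iri(WAM + "fullURL"), literal(url)),
        (iri(event_iri), iri(WAM + "contentType"), iri(content_type_iri)),
        (iri(event_iri), iri(WAM + "order"), literal(order, XSD_INTEGER)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = event_triples(
    "http://example.org/log/session/1",   # hypothetical session IRI
    "http://example.org/log/event/1",     # hypothetical event IRI
    "http://data.semanticweb.org/person/julia-hoxha",
    "http://xmlns.com/foaf/0.1/Person",   # content type from the domain ontology
    1,
)
print("\n".join(lines))
```

N-Triples is used here only because it needs no library; any RDF syntax would carry the same assertions.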

[Figure: pipeline from the framework diagram. Labels: Selection, Target Data, Preprocessing, Preprocessed Data, Transformation, Transformed Data; Event A, Event B, Event C, Event K, Event N; Semantic Formalization; Annotation with Domain Ontology; Semantic Activity Models.]

Semantic Enrichment

• For each link in logs, find URI of Web resource

• Find RDF representation of the resource (via a Mapping Template)

e.g. SWDF:
http://data.semanticweb.org/person/julia-hoxha/html - HTML
http://data.semanticweb.org/person/julia-hoxha - URI
http://data.semanticweb.org/person/julia-hoxha/rdf - RDF/XML

• Extract the ontology classes to which it belongs; these are used as the ContentType of the event (Person, ResearchGroup, Publication, MusicGroup, etc.)
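The mapping template for SWDF can be sketched from the three URL forms listed above. This is a hedged illustration only: the function names are hypothetical, and the suffix rule (/html for the HTML page, /rdf for RDF/XML) is inferred from that single example rather than from a documented template.

```python
SWDF_BASE = "http://data.semanticweb.org/"
REPRESENTATION_SUFFIXES = ("/html", "/rdf")


def swdf_resource_uri(logged_url):
    """Map a logged SWDF URL to the URI of the resource it represents.

    Returns None for URLs outside the SWDF domain.
    """
    if not logged_url.startswith(SWDF_BASE):
        return None
    url = logged_url.rstrip("/")
    for suffix in REPRESENTATION_SUFFIXES:
        if url.endswith(suffix):
            return url[: -len(suffix)]
    return url


def swdf_rdf_url(logged_url):
    """Return the URL of the RDF/XML representation, or None if not SWDF."""
    uri = swdf_resource_uri(logged_url)
    return None if uri is None else uri + "/rdf"


logged = "http://data.semanticweb.org/person/julia-hoxha/html"
print(swdf_resource_uri(logged))
print(swdf_rdf_url(logged))
```

Fetching and parsing the RDF/XML document to read off the classes of the resource would be the next step and is left out here.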


Page 9:

Semantic Analysis

Querying with semantic constraints

Address also temporal constraints regarding the dynamics of user browsing behavior

Example: In how many sessions within Mar-Apr 2011 did users search in Google and afterwards visit a page in SWDF?

Various levels of abstraction: e.g. instead of Google -> any search engine, or instead of any page -> a WWW2011 page, or at an even higher level of abstraction -> a Conference page

[Figure: session s: <e1, ..., e2, ef> with event attributes e1.time, e1.urlBase and e1.type; isA links relate the event type to "WWW2011" and "Conference".]
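The levels of abstraction in the example can be illustrated with a toy isA hierarchy. This is a minimal sketch under assumptions: the dictionary below is a hand-written stand-in for a domain ontology, the names are hypothetical, and a real system would delegate subsumption checks to an OWL reasoner.

```python
# Toy isA hierarchy: each type maps to its direct, more abstract type.
IS_A = {
    "WWW2011": "Conference",
    "Google": "SearchEngine",
}


def generalizations(type_name):
    """Return the type followed by all of its ancestors, most specific first."""
    chain = [type_name]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def matches(event_type, queried_type):
    """True if the event type equals the queried type or is a kind of it."""
    return queried_type in generalizations(event_type)


print(matches("WWW2011", "Conference"))   # query at the Conference level
print(matches("Conference", "WWW2011"))   # the reverse does not hold
```

Querying for "SearchEngine" then covers a Google event without naming Google, which is the abstraction step the slide describes.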

Page 10:

Temporal Constraints

Consider real time (timestamps) and abstract time (order of events) to query usage patterns

Q: find sessions with start time Ts and end time Te containing an event e1 with URL www.ex1.org, eventually succeeded by another event e2 in the session with URL www.ex2.org

We address temporal logics capable of ontological reasoning

• apply temporal operators, e.g. next, eventually, always (based on Linear Temporal Logic, LTL)

• query formulated as an LTL formula extended with DL axioms

LTL formula in a state transition system:

• next: A is true at the next state after the initial state s1

• eventually: A is true at some state on the path

• always: A is true at all states along the path

LTL + DL: proposition A is a set of ABox assertions
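The three operators can be sketched over a session treated as a finite sequence of states. This is an illustration under assumptions: LTL is normally defined over infinite paths, so evaluating it on a finite trace is a simplification; propositions are plain Python predicates rather than sets of ABox assertions; and the function names and the URL www.other.org are hypothetical.

```python
def holds_next(trace, prop, i=0):
    """next: prop is true at the state right after position i."""
    return i + 1 < len(trace) and bool(prop(trace[i + 1]))


def holds_eventually(trace, prop, i=0):
    """eventually: prop is true at some state from position i onward."""
    return any(prop(state) for state in trace[i:])


def holds_always(trace, prop, i=0):
    """always: prop is true at every state from position i onward."""
    return all(prop(state) for state in trace[i:])


def eventually_followed_by(trace, first, second):
    """Some state satisfies `first`, and a strictly later one satisfies `second`."""
    return any(
        first(state) and holds_eventually(trace, second, i + 1)
        for i, state in enumerate(trace)
    )


# Query Q from the slide: www.ex1.org eventually succeeded by www.ex2.org.
trace = ["www.ex1.org", "www.other.org", "www.ex2.org"]
is_ex1 = lambda url: url == "www.ex1.org"
is_ex2 = lambda url: url == "www.ex2.org"
print(eventually_followed_by(trace, is_ex1, is_ex2))
```

Swapping the two predicates makes the query fail on the same trace, because order matters even when both URLs occur.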

Page 11:

DL-LTL Query Formulation

Queries formulate

• 1) certain conditions on the session itself

• 2) temporal patterns in the events within the session

Query Q(s):

1) Conditions on the session itself: find sessions with start time Ts and end time Te

2) Temporal patterns within a session expressed as a DL-LTL formula, e.g. containing an event e1 with content type “publication”, eventually succeeded by another event e2 with function type “search engine”


Page 12:

Query Answering Approach

Step 1. Check constraints on the session itself

Step 2. Verify temporal constraints applying model checking technique

Iterate over sessions S={S1, S2,…,Sn}

(a) build a finite state automaton (FSA) for each Si

(b) verification of DL-LTL formula

iterate over the states of FSA to determine whether a condition holds in the respective state
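The two steps can be sketched end to end on a handful of sessions. This is a simplified illustration, not the authors' model checker: each session is a linear chain of states (one per event), each state is a set of type labels, labels are matched literally with no OWL reasoning, and reading the session constraint as "the session lies within [Ts, Te]" is an assumption. The session data and names are made up.

```python
from datetime import datetime


def session_ok(session, ts, te):
    """Step 1: check the constraints on the session itself."""
    return ts <= session["start"] and session["end"] <= te


def pattern_ok(states, first_type, second_type):
    """Step 2: walk the states in order, tracking whether `first_type` was seen.

    Accepts once a state with `second_type` occurs strictly after a state
    with `first_type`.
    """
    seen_first = False
    for types in states:
        if seen_first and second_type in types:
            return True
        if first_type in types:
            seen_first = True
    return False


def answer(sessions, ts, te, first_type, second_type):
    """Return the ids of sessions that satisfy both steps."""
    return [
        s["id"]
        for s in sessions
        if session_ok(s, ts, te) and pattern_ok(s["states"], first_type, second_type)
    ]


sessions = [
    {"id": "s1", "start": datetime(2011, 3, 5, 10, 0), "end": datetime(2011, 3, 5, 10, 10),
     "states": [{"publication"}, {"person"}, {"search engine"}]},
    {"id": "s2", "start": datetime(2011, 4, 1, 9, 0), "end": datetime(2011, 4, 1, 9, 5),
     "states": [{"search engine"}, {"publication"}]},
    {"id": "s3", "start": datetime(2011, 6, 1, 9, 0), "end": datetime(2011, 6, 1, 9, 5),
     "states": [{"publication"}, {"search engine"}]},
]
result = answer(sessions, datetime(2011, 3, 1), datetime(2011, 5, 1),
                "publication", "search engine")
print(result)
```

Checking the session constraints first means the state walk only runs on sessions that already fall in the queried time window.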


Page 13:

Evaluation

Validate feasibility of the formalization approach

Show feasibility of the query answering approach

• Query sessions with different patterns

• Measure performance

Formalization

                     SWDF 2009              DBPedia 3-3
Monitoring Period    01.Jul.09-12.Jul.09    01.Jul.09-12.Jul.09
avg. #sessions/day   235.9                  2899
#sessions            2831                   31893

[Charts: SWDF 2009 and DBPedia 2009, % of sessions initiated in the domain; surviving labels: Google 97%, Bing 2.7%]

• Only 1.46% of daily sessions contain SPARQL queries

Page 14:

Evaluation (II)

Querying

• answering time varies slightly for the queries (~0.15 seconds)

• For up to 1000 sessions, below 1.4 seconds

• model checking time is small

• OWL reasoning takes ~94% of the overall answering time

[Chart: answering time (sec) against nr. sessions; series Q1]

Page 15:

Conclusions

Propose a framework for behavior modeling and analysis:

• Approach for semantic formalization of logs

• Techniques of querying patterns with temporal and semantic constraints

Challenges and Future Work

• Find datasets of client-side navigation logs at multiple sites

• Domain Ontology acquisition

• Classification Techniques to find FunctionType

• Optimization of Query Answering
