


From User Access Patterns to Dynamic Hypertext Linking

Patrick Farrell, Siddharth Gudka, Mike Oxley, Simon Phillips

A Research Directions In Computing Presentation

T. Yan, M. Jacobsen, H. Garcia-Molina, U. Dayal


Agenda

• Introduction
• Some theory
• The paper
• A short critique
• After the paper
  – Academic research
  – The authors’ work
• The technology in use today
• Conclusion
• Questions


Introduction

Hypothesis
That hyperlinks to unvisited and indirectly linked pages can be offered based upon pages the user has already visited

Experiment
a) to analyse log files to form clusters of commonly co-accessed pages
b) to categorize online users into the correct categories and offer appropriate links


Mass customisation

• Concept of adapting things to each user – on a large scale
• Economic benefit in adding value
• Satisfied shoppers also more likely to return
• What’s new?
  – In the physical world, customisation doesn’t scale.
  – Using technology and intelligent algorithms, it can.


Adaptive Web Sites

• Sites that automatically improve their organisation and presentation based on visitor access patterns
• We can cluster pages on a site together based on their co-occurrence frequency
  – Likelihood that user will visit page P having visited Q
• For a user browsing the site, use session history to predict which pages a user may want to access – and so adapt site
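The co-occurrence idea above can be sketched in a few lines of Python. This is only an illustration, assuming sessions are already available as lists of page names; the function name `cooccurrence_likelihood` and the sample pages are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_likelihood(sessions):
    """Estimate the likelihood of visiting p given q was visited.

    The estimate is the fraction of sessions containing q that also contain p.
    """
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = set(session)  # count each page once per session
        page_count.update(pages)
        pair_count.update(permutations(pages, 2))  # ordered pairs (q, p), q != p
    return {(q, p): n / page_count[q] for (q, p), n in pair_count.items()}


# Hypothetical sessions for illustration only
sessions = [
    ["home", "courses", "exams"],
    ["home", "courses"],
    ["home", "staff"],
]
likelihood = cooccurrence_likelihood(sessions)
print(likelihood[("home", "courses")])  # share of "home" sessions that also saw "courses"
```

Pairs that never occur in the same session are simply absent from the result, which keeps the table small for sparse access logs.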


The Paper

• Yan et al. implement an adaptive web site, based on user access logs.

• Paper discusses different approaches to clustering and implementation

• Experimental data is presented
  – validating the concept of clustering on an academic site
  – showing the value added by an adaptive website using their technique

• The log analysis software used is published


The paper - Justification

• Use the metaphor of a shopper browsing an online shop

• Adaptive site can provide links to similar items to those being browsed
  – e.g. “Male Yuppie” browsing executive toys
  – Might also be interested in sportswear

• As site grows, static links to ‘related’ content more of a challenge - dynamic is much better

• Many practical examples today – but not 10 years ago!


The Paper – System Design

[Diagram: system architecture]
• Offline: Access logs → Preprocess → Cluster → User Categories
• Online: End user, URL, Web Server, Link Generator, HTML Documents, HTML with suggestions


The paper - Preprocessing

• For each user session
  – form an n-dimensional vector of the pages visited
  – can weight vector elements using a metric
    • Number of hits to page
    • Estimate of time spent on page (possibly normalised)
• ‘Close’ session vectors in n-dimensional space form a cluster
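The preprocessing step above might look like the following sketch. It assumes each session is a list of (page, seconds) visits and that the site's page list is known in advance; `session_vectors` and its `weight` parameter are hypothetical names chosen for this example, and the normalisation shown (elements summing to 1) is just one possible choice.

```python
def session_vectors(sessions, pages, weight="hits"):
    """Turn each session into an n-dimensional vector with one element per page."""
    index = {page: i for i, page in enumerate(pages)}
    vectors = []
    for visits in sessions:
        vec = [0.0] * len(pages)
        for page, seconds in visits:
            # weight by number of hits, or by estimated time spent on the page
            vec[index[page]] += 1.0 if weight == "hits" else seconds
        if weight == "time":
            total = sum(vec)
            if total > 0:
                vec = [v / total for v in vec]  # normalise so elements sum to 1
        vectors.append(vec)
    return vectors


# Hypothetical site with three pages and one session
pages = ["home", "courses", "exams"]
sessions = [[("home", 10), ("courses", 30), ("home", 20)]]
print(session_vectors(sessions, pages, weight="hits"))
print(session_vectors(sessions, pages, weight="time"))
```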


The paper - Clustering

• Different algorithms to cluster vectors by ‘closeness’
• Paper uses Leader algorithm – with additional constraints
  – Constraint: Minimum hits in a valid session
  – Constraint: Minimum cluster size
• Algorithm fast and memory efficient
  – But not order invariant
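A leader-style clustering pass can be sketched as below. This is a generic variant under stated assumptions, not the paper's exact procedure: vectors are hit counts (so the sum of a vector is the session's total hits), closeness is Euclidean distance, and a vector joins the nearest leader within a distance `threshold`. The names `leader_cluster`, `threshold`, `min_hits` and `min_cluster_size` are hypothetical.

```python
import math


def leader_cluster(vectors, threshold, min_hits=1, min_cluster_size=1):
    """Single pass: join the nearest leader within `threshold`, else become a new leader."""
    clusters = []  # each cluster is (leader_vector, [member_vectors])
    for vec in vectors:
        if sum(vec) < min_hits:
            continue  # constraint: skip sessions with too few hits
        best, best_dist = None, threshold
        for cluster in clusters:
            dist = math.dist(cluster[0], vec)
            if dist <= best_dist:
                best, best_dist = cluster, dist
        if best is None:
            clusters.append((vec, [vec]))  # this vector starts a new cluster
        else:
            best[1].append(vec)
    # constraint: drop clusters that ended up too small
    return [c for c in clusters if len(c[1]) >= min_cluster_size]


# The same three vectors in two different orders give different clusterings
a, b, c = [1, 0], [1, 1], [1, 2]
print(len(leader_cluster([a, b, c], threshold=1.0)))
print(len(leader_cluster([b, a, c], threshold=1.0)))
```

The two calls illustrate why the algorithm is not order invariant: only one pass is made and leaders are never revised, so the first vectors seen decide where the cluster centres sit.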


Dynamic Link Generation

• Use session history to track pages a user has visited
  – Authors buffered logs in memory using a database
  – Sessions part of most web servers now
• Match partial vector of session with pre-calculated categories to build list of appropriate pages
  – Partial vector, so Euclidean distance not necessarily appropriate
  – May be better to simply count matching categories
• Filter the suggestion list to remove pages visited - and possibly any already adjacent in navigation tree
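The matching and filtering steps above could be sketched as follows. The sketch takes the simpler counting approach rather than a distance measure, and assumes each pre-calculated category is available as a set of page names; `suggest_links`, `adjacent` and `top_n` are hypothetical names, and the categories are invented for illustration.

```python
def suggest_links(visited, categories, adjacent=(), top_n=3):
    """Score categories by how many visited pages they contain, then suggest unvisited pages."""
    visited = set(visited)
    scored = sorted(
        ((len(visited & pages), name) for name, pages in categories.items()),
        reverse=True,
    )
    suggestions = []
    for score, name in scored:
        if score == 0:
            break  # remaining categories share no page with this session
        for page in sorted(categories[name]):
            # filter out pages already visited or already adjacent in the navigation tree
            if page not in visited and page not in adjacent and page not in suggestions:
                suggestions.append(page)
    return suggestions[:top_n]


# Hypothetical user categories
categories = {
    "students": {"courses", "exams", "timetable"},
    "research": {"papers", "projects", "staff"},
}
print(suggest_links(["courses", "exams"], categories))
```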


Paper – Experimental results

• Time spent on particular pages follows Zipfian distribution – not useful for page weight

• The authors present a number of experimental results about clustering algorithm parameters, e.g. min. cluster size

• Found clusters on academic website that were not evident from hypertext layout – so clustering serves purpose.


Critique

• Paper presents new concept of clustering web accesses – but essentially draws together existing work in other fields
• Makes key simplifications
  – Ignores any web caching, proxies, etc.
  – Considering all pages in a session as being in a category is naïve – e.g. navigation pages, indexes, etc.
• Weakness in experiments
  – Authors invented nominal ‘sessions’ based on unique end-user addresses as server didn’t support sessions
  – Only present data for one site
    • 2,709 sessions – of which 50% were in the same cluster!


Further Work

• Garcia-Molina:
  – Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies (2000)
    • Discusses judging value of web documents based on user behaviour
• Dayal:
  – Knowledge-Based Support Services: Monitoring and Adaptation (2000)
    • Discusses a Knowledge-Based Service deployed within HP to deliver customer support services.
    • System adapts based on observed user patterns and evolving needs


Related Work

• Web Prefetching (Jiang & Kleinrock, 1998)
  – Addresses slow access speeds of World Wide Web
    • PREDICTION MODULE: Computes access probabilities.
    • THRESHOLD MODULE: Computes prefetch thresholds.
  – Uses clustering to divide users into categories by access probability
• Restoring Meaningful Episodes in a Proxy Log (Lou et al. 2001)
  – Extracting user’s activity information from proxy logs
  – Classifies individual requests into meaningful semantic elements
  – Semantics-based CUT-AND-PICK approach


Related Work

• SUGGEST (Baraglia et al. 2002, 2004)
  – No off-line component
  – Quality metric to estimate effectiveness of suggestions
• Media Agents (Wenyin et al. 2003)
  – Automatic collection of semantic indices of multimedia data
  – Semantic descriptions from content of documents
  – User’s interaction refines semantic indices and suggests other multimedia data


Applications & The Paper

• Custom application - Analog
• Means: uses clustering tech to analyse log files
• End: to dynamically generate possibly interesting links
• Successful (to an extent)


1996-2005 Technology Directions

• Clustering Documents: Vivisimo, Google Labs
• Collaborative Filtering: Amazon, Flickr, Tivo


Amazon.com

• Uses recommendation algorithm
  – person who bought ‘x’ also bought ‘y’
• Item-to-item collaborative filtering
  – provides recommendations based on grouped items, not customers

Essence:

For each item in product catalog, I1
  For each customer C who purchased I1
    For each item I2 purchased by customer C
      Record that a customer purchased I1 and I2
  For each item I2
    Compute the similarity between I1 and I2


Amazon.com

• Creates vectors where each vector is an item with M dimensions (customers)

• Similarity between two items computed by measuring cosine of angle between two vectors.

• Offline computation theoretically expensive: O(N²M)

• In practice only O(NM) as most customers have few purchases.
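The item-vector and cosine steps above can be illustrated with a small sketch. It assumes purchase data is a mapping from customer to the set of items bought, so each item vector is binary over customers; `item_similarities` and the sample data are hypothetical and this is not Amazon's implementation. For binary vectors the cosine reduces to the number of shared buyers divided by the square root of the product of the two buyer counts.

```python
import math
from collections import defaultdict


def item_similarities(purchases):
    """Return cosine similarity for every pair of items bought together at least once."""
    buyers = defaultdict(set)  # item -> customers who bought it (the item's binary vector)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    similarity = {}
    for i1 in buyers:
        # only items co-purchased with i1 can have a non-zero similarity
        related = set()
        for customer in buyers[i1]:
            related |= purchases[customer]
        related.discard(i1)
        for i2 in related:
            common = len(buyers[i1] & buyers[i2])
            similarity[(i1, i2)] = common / math.sqrt(len(buyers[i1]) * len(buyers[i2]))
    return similarity


# Hypothetical purchase history
purchases = {
    "ann": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "cat": {"book"},
}
similarity = item_similarities(purchases)
print(similarity[("book", "lamp")])
```

Restricting the inner loop to co-purchased items is what keeps the practical cost low when most customers have few purchases.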


Conclusion

• The paper was on the right track

• Appreciated applicability of clustering to e-commerce

• Hypothesis proved by experiment

• Failed to address or even predict scalability issues


References

• Authors’ Work
  – Yan, T., Jacobsen, M., Garcia-Molina, H., Dayal, U., ‘From User Access Patterns to Dynamic Hypertext Linking’, In: Fifth International World Wide Web Conference, 1996 (Paris, France)
  – Paepcke, A., Garcia-Molina, H., Rodriquez, G. and Cho, J., ‘Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies’, In: Stanford University Technical Report, 2000
  – Delic, K. A. and Dayal, U., ‘Knowledge-Based Support Services: Monitoring and Adaptation’, In: Proceedings of the 11th International Workshop on Database and Expert Systems Applications, IEEE Computer Society, 2000


References

• Related Work
  – Baraglia, R., Silvestri, F., Palmerini, P., ‘On-line Generation of Suggestions for Web Users’, In: Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 2004
  – Baraglia, R., Palmerini, P., ‘A web usage mining system’, In: Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 2002
  – Wenyin, L., Chen, Z., Lin, F., Zhang, H., Ma, W., ‘Ubiquitous Media Agents: A framework for managing personally accumulated multimedia files’, 9th ACM international conference on multimedia, 2003 (Toronto, Canada)
  – Jiang, Z., Kleinrock, L., ‘Web prefetching in a mobile environment’, IEEE Personal Communications 5(5): 25-34, October 1998


References

– Lou, W., Lu, H., Liu, G., Yiang, Q., ‘Restoring Meaningful Episodes in a Proxy Log’, 2001.

– Ungar, L., Foster, D., ‘Clustering Methods For Collaborative Filtering’, In: AAAI Workshop On Recommendation Systems, 1998.

– Linden, G., Smith, B., York, J., ‘Amazon.com Recommendations: Item-to-Item Collaborative Filtering’, In: IEEE Internet Computing, Vol. 7, No. 1, Jan 2003.