Exploiting Implicit Feedback to Identify Usage Patterns on the Desktop
04/20/23Michał Kopycki 1
Exploiting Implicit Feedback to Identify Usage Patterns on the Desktop
Bachelor Thesis Leibniz University of Hanover Michał Kopycki
Bestseller: How to write SPYWARE for “research purposes” and get paid for it
Personalization Research Issues (from Eelco’s presentation)
Data Acquisition
Knowledge Inference
Adaptation Decision Making
Adaptation Mechanism
User Model
Outline
Motivation
Logging Framework
User study
Conclusion and future work
Motivation
[Figure: systems and publications arranged around the label “User Context”]
Systems: MovieLens, Amazon, Del.icio.us, LastFM, StumbleUpon, Libra, Letizia ‘95, LifeStreams ‘96, Haystack ‘97, JITIR ‘00, Stuff I’ve Seen ‘03, Beagle++ ‘05
Publications: [BM02], [CDH+08], [Her06], [CSC+07], [RM00], [WJR02], [CGNP05], [CN06], [TDH05]
User Context ... in our context
Resource as context
- TF×IDF
- GPS location
- Reference
- Genre
- Sender
- Web address
Interaction with resource as context
- Sequence of access
- Time windows
- Bookmarking
- Reading time
- Printing document
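To make "sequence of access" and "time windows" concrete, here is a minimal sketch of how logged access events might be grouped into windows separated by idle gaps. The deck does not say how windows are built; the 5-minute gap, the function names, and the sample file names are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def split_into_windows(events, max_gap=timedelta(minutes=5)):
    """Group (timestamp, resource) events into windows separated by idle gaps.

    The max_gap threshold is a hypothetical parameter, not taken from the thesis.
    """
    windows = []
    for ts, resource in sorted(events):
        # Extend the current window while the gap to its last event is small.
        if windows and ts - windows[-1][-1][0] <= max_gap:
            windows[-1].append((ts, resource))
        else:
            windows.append([(ts, resource)])
    return windows


def access_sequences(windows):
    """Reduce each window to its ordered sequence of accessed resources."""
    return [[resource for _, resource in window] for window in windows]


# Hypothetical events; the document names are invented for the example.
events = [
    (datetime(2008, 3, 12, 10, 30), "pharos_deliverable.doc"),
    (datetime(2008, 3, 12, 10, 32), "www.pharos-audiovisual-search.eu"),
    (datetime(2008, 3, 12, 12, 58), "pim2008_paper.tex"),
    (datetime(2008, 3, 12, 13, 1), "pas.kbs.uni-hannover.de"),
]
sequences = access_sequences(split_into_windows(events))
```

Resources that fall into the same window can then be treated as sharing a context, which is one possible reading of "interaction with resource as context".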
04/20/23Sergey Chernov, Task Detection for Activity-Based Desktop Search, L3S Research Seminar
Slide 7 of 16
What is user context good for?
[Figure: Desktop Search combined with the Logger. Items shown: www.pharos-audiovisual-search.eu, Pharos Project, Pharos Deliverable, Pharos Review, pas.kbs.uni-hannover.de, PIM Research, PIM 2008 paper, Logger v0.2]
[Timeline for 3/12/2008, axis 11:00 to 17:00: 10:30 - 11:59 Pharos work; 12:58 - 13:58 PIM Research; 13:58 - 14:57 Pharos; 14:57 - 17:01 PIM]
1. Relationships between resources
2. Elicitation of user interests
3. Activity based computing
Thesis goals
1. User context recognition support
2. Public Desktop dataset alternative
“…exploiting usage analysis information about sequences of accesses to local resources…” (L3S 2006)
“…The absence of shared information makes it difficult to focus research problems, and to compare research results…” (Newman 1997)
“…an appropriate common test collection that is accepted by the community is required…” (Voorhees 2001)
“…Desktop datasets within different research groups using a single methodology and a common set of tools…” (L3S 2008)
“…Building a Desktop IR testbed seems to be more challenging…” (L3S 2007)
Requirements
- Automatic
- Cross-application
- Implicit Feedback
- Privacy preserving
- Extensible
[Figure: resources A, B and C, each marked “Relevant” or “Not relevant”; monitored sources Web, File System and IM; the Logging Framework extended by a “New best Email client plug-in” and a “New best Web browser plug-in”]
Our approach
[Figure labels: Applications, Resources]
Component view
User Activity Logger
Monitored sources: Desktop (Window Events, File System), Internet Explorer, Outlook Express, Thunderbird, Firefox, Outlook 2003, Outlook 2007
Technologies:
- C/C++: window hooks, file system drivers, undocumented Windows API
- JavaScript, XUL
- C#, VSTO, .NET
Logging Framework
Supported notifications
Nepomuk adaptation
User Observation Hub
Logging Framework
User study
- 21 participants
- Average of 170 active logging days
- 2,828,706 events
- Average of 2,815 distinct emails per user
- Average of 9,337 distinct URLs per user
- Average of 902 events per user per day
- Average of 5 hours of active interaction per user per day
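As a rough sanity check on these figures, the snippet below derives a few rates from the reported totals. The variable names are illustrative. The pooled per-day rate need not equal the reported 902 events per user per day, because the deck does not say how that average was computed (for example, it may be a mean of per-user daily rates).

```python
# Figures reported for the user study.
participants = 21
avg_logging_days = 170
total_events = 2_828_706
reported_events_per_user_day = 902
active_hours_per_day = 5

# Pooled rates derived from the totals (an assumption about how to combine them).
events_per_user = total_events / participants
pooled_events_per_user_day = events_per_user / avg_logging_days

# Event density during active interaction, based on the reported averages.
events_per_active_hour = reported_events_per_user_day / active_hours_per_day

print(f"events per user:            {events_per_user:,.0f}")
print(f"pooled events/user/day:     {pooled_events_per_user_day:.0f}")
print(f"events per active hour:     {events_per_active_hour:.1f}")
```

Under these assumptions the pooled rate comes out somewhat below the reported per-user average, and the reported average corresponds to roughly three logged events per minute of active interaction.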
Dataset activity coverage
Data collection
Data Encryption schema
Encryption schemas:
- File path: level1 \ level2 \ filename . extension
- URL: protocol \ host \ dynamic part
- URL host part: host name . domain name . TLD
- Address book entry: user name \ email address
- Email address: domain name . TLD
Methodology: [figure; surviving labels: www, l3s, de]
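The schemas above suggest that each value is split into components that are obscured separately, so structure survives while content does not. The sketch below illustrates that idea only. The deck does not name the algorithm, so the keyed hash (HMAC-SHA256), the truncation to 8 hex characters, the key, and all function names are assumptions, and `pas.l3s.de` is a made-up host for comparison.

```python
import hashlib
import hmac

# Hypothetical per-study secret; without it the tokens cannot be recomputed.
SECRET_KEY = b"per-study secret"


def pseudonym(token, length=8):
    """Map a token to a short keyed-hash pseudonym (assumed scheme)."""
    digest = hmac.new(SECRET_KEY, token.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def encrypt_host(host):
    """Obscure host name, domain name and TLD separately: www . l3s . de."""
    return ".".join(pseudonym(part) for part in host.split("."))


def encrypt_file_path(path):
    """Obscure level1 \\ level2 \\ filename . extension component by component."""
    *folders, last = path.split("\\")
    name, dot, ext = last.rpartition(".")
    if not dot:  # file without an extension
        name, ext = last, ""
    hashed = "\\".join([pseudonym(f) for f in folders] + [pseudonym(name)])
    return hashed + ("." + pseudonym(ext) if ext else "")


host_a = encrypt_host("www.l3s.de")
host_b = encrypt_host("pas.l3s.de")  # hypothetical second host
path = encrypt_file_path("level1\\level2\\filename.extension")
```

Because equal components map to equal pseudonyms, two hosts in the same domain remain recognisably related, and files in the same folder or with the same extension can still be grouped, without exposing the original names.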
A glimpse into user behavior
Instant reader Moderate reader
Conclusion
1. Logging Framework
• http://pas.kbs.uni-hannover.de/
• http://sourceforge.net/projects/activity-logger
2. User study
3. Desktop Dataset
4. Nepomuk integration
5. PIM’08 Workshop paper
Future work
1. Logging Framework
• centralized architecture
• ontology based RDF output format
• support for new applications and notifications
• Vista support
2. Exploratory analysis of the Desktop dataset
• Email interaction
• Web search interaction
• Application interaction
References
[BM02] Peter Brusilovsky and Mark T. Maybury. From adaptive hypermedia to the adaptive web. Communications of the ACM, volume 45, pages 30–33, 2002.
[CDH+08] Sergey Chernov, Gianluca Demartini, Eelco Herder, Michał Kopycki, and Wolfgang Nejdl. Evaluating Personal Information Management using an activity logs enriched Desktop dataset. In PIM ’08: Proceedings of the Workshop on Personal Information Management, 2008 (to appear).
[CSC+07] Sergey Chernov, Pavel Serdyukov, Paul-Alexandru Chirita, Gianluca Demartini, and Wolfgang Nejdl. Building a desktop search test-bed. In ECIR ’07: Proceedings of 29th European Conference on IR Research, Advances in Information Retrieval, pages 686–690. Springer, 2007.
[Her06] E. Herder. Forward, Back and Home Again - Analyzing User Behavior on the Web. PhD thesis, University of Twente, Enschede, 2006.
[RM00] B. J. Rhodes and P. Maes. Just-in-time information retrieval agents. IBM Systems Journal, volume 39, pages 685–704, 2000.
[TDH05] Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 449–456. ACM, 2005.
[WJR02] R. W. White, J. M. Jose, and I. Ruthven. Comparing explicit and implicit feedback techniques for web retrieval: TREC-10 interactive track report. In TREC ’02: Proceedings of the Tenth Text Retrieval Conference, 2002.
[CN06] Paul-Alexandru Chirita and Wolfgang Nejdl. Analyzing User Behavior to Rank Desktop Items. In String Processing and Information Retrieval, 13th International Conference, SPIRE 2006, Proceedings, pages 86–97, 2006.
[CGNP05] Paul-Alexandru Chirita, Stefania Costache, Wolfgang Nejdl, and Raluca Paiu. Beagle++: Semantically Enhanced Searching and Ranking on the Desktop. In The Semantic Web: Research and Applications, 3rd European Semantic Web Conference, ESWC 2006, Proceedings, pages 348–362, 2006.
[WTN00] Steve Whittaker, Loren Terveen, and Bonnie A. Nardi. Let’s stop pushing the envelope and start addressing it: a Reference Task Agenda for HCI. Human Computer Interaction, volume 15, pages 75–106, 2000.
[McG95] Joseph E. McGrath. Methodology matters: doing research in the behavioral and social sciences. Human-computer interaction: toward the year 2000, pages 152–169, 1995.
[CLWB01] Mark Claypool, Phong Le, Makoto Wased, and David Brown. Implicit interest indicators. In IUI ’01: Proceedings of the 6th international conference on Intelligent user interfaces, pages 33–40. ACM, 2001.
[TAAK04] Jaime Teevan, Christine Alvarado, Mark S. Ackerman, and David R. Karger. The perfect search engine is not enough: a study of orienteering behavior in directed search. In CHI ’04: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 415–422. ACM, 2004.
[WRJ02] Ryen W. White, Ian Ruthven, and Joemon M. Jose. Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes. In SIGIR ’02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 57–64. ACM, 2002.
[Voo02] Ellen M. Voorhees. The philosophy of information retrieval evaluation. In CLEF ’01: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems, pages 355–370, London, 2002.
Many thanks to:
Sergey and Eelco
Study participants
YOU !!
Related work
[Figure: related systems placed along two axes, Implicit Feedback vs. Explicit Feedback and Single domain (Web, Email) vs. Cross domain. Systems shown: Dragontalk, Connections, Beagle++, Stuff I’ve Seen, LifeStreams, Haystack, MyLifeBits. Publications shown: [TAAK04], [WRJ02]]
Collected data
A glimpse into user behavior
File access over folder hierarchy
A glimpse into user behavior
Web page visit length
Alternative to the public Desktop dataset
[Figure: Dataset 1, Dataset 2 and Dataset 3, each built from a Desktop dump and the Logging Framework. Labels: Common output, Common structure, Comparable, Soft-repeatable]
Seems hard, but…
“It is possible” [BLA06], [APRILFOOL08], [HAHA07] DEADLINE