Retrieval and Evaluation Techniques for Personal Information
Jin Young Kim
7/26 Ph.D. Dissertation Seminar
Personal Information Retrieval (PIR)
The practice and the study of supporting users in retrieving their personal information effectively
Personal Information Retrieval in the Wild
Everyone has unique information & practices
Different information and information needs
Different preferences and behaviors
Many existing software solutions
Platform-level: desktop search, folder structure
Application-level: email, calendar, office suites
Previous Work in PIR (Desktop Search)
Focus
User interface issues [Dumais03,06]
Desktop-specific features [Solus06] [Cohen08]
Limitations
Each is based on a different environment and user group
None of them performed comparative evaluation
Research findings do not accumulate over the years
Our Approach
Develop general techniques for PIR
Start from the essential characteristics of PIR
Applicable regardless of users and information types
Make contributions to related areas
Structured document retrieval
Simulated evaluation for known-item finding
Build a platform for sustainable progress
Develop repeatable evaluation techniques
Share the research findings and the data
Essential Characteristics of PIR
Many document types
Unique metadata for each type
People combine search and browsing [Teevan04]
Long-term interactions with a single user
People mostly find known-items [Elsweiler07]
Privacy concerns over the data set
Field-based Search Models
Associative Browsing Model
Simulated Evaluation Methods
Challenge: Users may remember different things about the document. How can we present effective results for both cases?
Search and Browsing Retrieval Models
[Diagram: the user's lexical memory feeds a keyword query (search), while associative memory drives browsing; both lead to the retrieval results, illustrated with a registration email from James.]
Information Seeking Scenario in PIR
[Diagram: user input and system output over a session; the user alternates between search and browsing to find a 2011 registration email from James.]
1. A user initiates a session with a keyword query
2. The user switches to browsing by clicking on an email document
3. The user switches back to search with a different query
Challenge: A user's query originates from what she remembers. How can we simulate the user's querying behavior realistically?
Simulated Evaluation Techniques
Research Questions
Field-based Search Models
How can we improve retrieval effectiveness in PIR?
How can we improve type prediction quality?
Associative Browsing Model
How can we enable browsing support for PIR?
How can we improve the suggestions for browsing?
Simulated Evaluation Methods
How can we evaluate a complex PIR system by simulation?
How can we establish the validity of simulated evaluation?
Field-based Search Models
Searching for Personal Information
An example of desktop search
Field-based Search Framework for PIR
Type-specific Ranking: rank documents in each document collection (type)
Type Prediction: predict the document type relevant to the user's query
Final Results Generation: merge into a single ranked list (a sketch follows below)
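A minimal sketch of the final merge step, assuming each type-specific ranker returns (doc_id, score) pairs and the type predictor supplies P(type | query); the data shapes and function names are illustrative, not the dissertation's actual implementation.

```python
def merge_results(ranked_lists, type_probs, k=10):
    """Merge per-type ranked lists into one list for presentation.

    ranked_lists: {doc_type: [(doc_id, score), ...]}  # type-specific ranking
    type_probs:   {doc_type: P(type | query)}         # type prediction
    """
    merged = []
    for doc_type, results in ranked_lists.items():
        for doc_id, score in results:
            # Weight each type-specific score by the predicted type relevance.
            merged.append((doc_id, doc_type, score * type_probs.get(doc_type, 0.0)))
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged[:k]
```

Note that raw scores from different rankers are not necessarily comparable across types; in practice they would need normalization before merging.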
Type-specific Ranking for PIR
Each collection has type-specific features
Thread-based features for emails
Path-based features for documents
Most of these documents have rich metadata
Email: <sender, receiver, date, subject, body>
Document: <title, author, abstract, content>
Calendar: <title, date, place, participants>
We focus on developing general retrieval techniques for structured documents
Structured Document Retrieval
Field operators / advanced search interfaces
A user's search terms are found in multiple fields
Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., and Hacker, M. [SIGIR '11]
Structured Document Retrieval: Models
Document-based Retrieval Model: score each document as a whole
Field-based Retrieval Model: combine evidence from each field
[Diagram: document-based scoring matches query terms q1..qm against the document as a whole; field-based scoring matches them against each field f1..fn and combines the per-field scores with fixed weights w1..wn.]
Field Relevance
Different fields are important for different query terms
‘james’ is relevant when it occurs in <to>
‘registration’ is relevant when it occurs in <subject>
Field Relevance Model for Structured IR
Estimating the Field Relevance: Overview
If the user provides feedback: a relevant document provides sufficient information
If no feedback is available: combine field-level term statistics from multiple sources
[Diagram: the field-level term distributions (from/to, title, content) of the collection plus the top-k retrieved documents approximate those of the relevant documents.]
Estimating Field Relevance using Feedback
Assume a user who marked D_R as relevant
Estimate the field relevance from the field-level term distribution of D_R
We can personalize the results accordingly
Rank documents with similar field-level term distributions higher
This weighting is provably optimal under the language modeling retrieval framework
Example: <to> is relevant for ‘james’; <content> is relevant for ‘registration’
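A sketch of this feedback estimate, assuming the simplest case where P(F_j | q_i) is taken proportional to the field-level term frequency of q_i in D_R; the dictionary-based representation is illustrative.

```python
from collections import Counter

def field_relevance_from_feedback(query_terms, relevant_doc):
    """Estimate per-term field relevance P(F_j | q_i) from a relevant doc D_R.

    relevant_doc: {field_name: text}, e.g. {"to": "james ...", "subject": "..."}
    Returns {term: {field: weight}} with weights summing to 1 per term.
    """
    field_counts = {f: Counter(text.lower().split())
                    for f, text in relevant_doc.items()}
    relevance = {}
    for term in query_terms:
        # Field-level term frequency of the term in D_R.
        counts = {f: field_counts[f][term] for f in field_counts}
        total = sum(counts.values())
        if total == 0:
            # Term absent from D_R: fall back to uniform field weights.
            relevance[term] = {f: 1.0 / len(counts) for f in counts}
        else:
            relevance[term] = {f: c / total for f, c in counts.items()}
    return relevance
```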
Estimating Field Relevance without Feedback
Field relevance as a linear combination of multiple sources, with weights estimated using training queries
Features:
Field-level term distribution of the collection (unigram and bigram LMs; the unigram case is the same as PRM-S)
Field-level term distribution of the top-k docs (unigram and bigram LMs; a form of pseudo-relevance feedback)
A priori importance of each field (wj), estimated using held-out training queries (similar to MFLM and BM25F)
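A sketch of the no-feedback combination, assuming each source exposes a per-field distribution for the query term and the lambda weights were learned on training queries; names and shapes are illustrative.

```python
def field_relevance_no_feedback(term, sources, weights):
    """Combine field-level term distributions from several sources.

    sources: {source_name: {field: P(term | field) under that source}},
             e.g. collection LM, top-k pseudo-relevant docs LM, field prior w_j.
    weights: {source_name: lambda} learned on training queries.
    All sources are assumed to cover the same set of fields.
    """
    fields = next(iter(sources.values())).keys()
    combined = {f: sum(weights[s] * sources[s][f] for s in sources)
                for f in fields}
    z = sum(combined.values()) or 1.0
    return {f: v / z for f, v in combined.items()}  # normalized P(F_j | q_i)
```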
Retrieval Using the Field Relevance: Comparison with Previous Work
Ranking in the Field Relevance Model
[Diagram: previous models score each query term q1..qm against fields f1..fn with fixed field weights w1..wn; the field relevance model replaces them with per-term field weights P(F1|qi)..P(Fn|qi). Per-term field scores are summed across fields and the per-term scores are multiplied across query terms.]
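A sketch of that ranking rule, done in log space for numerical stability: per-term scores are summed over fields, weighted by the field relevance, and combined multiplicatively over query terms. Smoothed per-field document language models are assumed to be precomputed.

```python
import math

def frm_score(query_terms, doc_field_lms, field_relevance):
    """Field relevance model scoring: sum per-term field scores over fields,
    multiply over query terms (in log space).

    doc_field_lms:   {field: {term: P(term | smoothed doc field LM)}}
    field_relevance: {term: {field: P(F_j | q_i)}}
    """
    log_score = 0.0
    for q in query_terms:
        term_score = sum(field_relevance[q][f] * doc_field_lms[f].get(q, 1e-9)
                         for f in doc_field_lms)
        log_score += math.log(term_score)
    return log_score
```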
Evaluating the Field Relevance Model
Retrieval Effectiveness (Metric: Mean Reciprocal Rank)

Collection  DQL     BM25F   MFLM    FRM-C   FRM-T   FRM-R
TREC        54.2%   59.7%   60.1%   62.4%   66.8%   79.4%
IMDB        40.8%   52.4%   61.2%   63.7%   65.7%   70.4%
Monster     42.9%   27.9%   46.0%   54.2%   55.8%   71.6%

[Chart: the same results per collection; per-term field weights (FRM-*) outperform fixed field weights (DQL, BM25F, MFLM).]
Type Prediction Methods
Field-based Collection Query-Likelihood (FQL), sketched below
Calculate a QL score for each field of a collection
Combine field-level scores into a collection score
Feature-based Method
Combine existing type-prediction methods
Grid search / SVM for finding combination weights
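A sketch of FQL under these assumptions: each collection (type) exposes per-field collection language models, and field-level query-likelihood scores are combined with per-field weights; the exact combination used in the dissertation may differ.

```python
import math

def fql_score(query_terms, collection_field_lms, field_weights):
    """Field-based collection query-likelihood for one collection (type).

    collection_field_lms: {field: {term: P(term | collection field LM)}}
    field_weights:        {field: weight}, summing to 1
    """
    score = 0.0
    for q in query_terms:
        per_term = sum(w * collection_field_lms[f].get(q, 1e-9)
                       for f, w in field_weights.items())
        score += math.log(per_term)
    return score

def predict_type(query_terms, collections, field_weights):
    """Pick the document type whose collection best explains the query."""
    return max(collections,
               key=lambda t: fql_score(query_terms, collections[t],
                                       field_weights[t]))
```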
Type Prediction Performance (% of queries with correct prediction)
[Charts: results on the Pseudo-desktop Collections and the CS Collection.]
FQL improves performance over CQL
Combining features improves the performance further
Summary So Far…
Field relevance model for structured document retrieval
Enables relevance feedback through field weighting
Improves performance using linear feature-based estimation
Type prediction methods for PIR
Field-based type prediction method (FQL)
Combining features improves the performance further
We now move on to the associative browsing model: what happens when users can't recall good search terms?
Associative Browsing Model
Recap: Retrieval Framework for PIR
[Diagram: the retrieval framework revisited; keyword search and associative browsing both operate over the user's personal documents.]
User Interaction for Associative Browsing
Users enter a concept or document page by search
The system provides a list of suggestions for browsing
[Screenshots: data model and user interface.]
How can we build associations?
Manually? Participants wouldn't create associations beyond simple tagging operations [Sauermann et al. 2005]
Automatically? How would it match the user's preference?
Building the Associative Browsing Model
1. Document Collection
2. Concept Extraction
3. Link Extraction: term similarity, temporal similarity, co-occurrence
4. Link Refinement: click-based training
[Screenshot: the concept page for ‘Search Engine’.]
Link Extraction and Refinement
Link Scoring: combination of link type scores, S(c1,c2) = Σi [ wi × Linki(c1,c2) ]
Link Presentation: a ranked list of suggested items; users click on them for browsing
Link Refinement (training wi): maximize click-based relevance
Grid search: maximize retrieval effectiveness (MRR)
RankSVM: minimize error in pairwise preference
Link features
Concepts: term vector similarity, temporal similarity, tag similarity, string similarity, co-occurrence
Documents: term vector similarity, temporal similarity, tag similarity, path/type similarity, concept similarity
(A scoring sketch follows below.)
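A direct transcription of the scoring formula above into code, with the link feature functions and trained weights passed in; the suggestion list is a ranked cut-off of the scored candidates.

```python
def link_score(c1, c2, link_features, weights):
    """S(c1, c2) = sum_i w_i * Link_i(c1, c2), as on the slide.

    link_features: {name: f(c1, c2) -> similarity score}
    weights:       {name: w_i}, trained from clicks (grid search or RankSVM)
    """
    return sum(weights[name] * feature(c1, c2)
               for name, feature in link_features.items())

def suggest(item, candidates, link_features, weights, k=10):
    """Rank candidate items to build the browsing suggestion list."""
    scored = [(c, link_score(item, c, link_features, weights))
              for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```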
Evaluating the Associative Browsing Model
Data set: CS Collection
Collect public documents in the UMass CS department
CS dept. people competed in known-item finding tasks
Value of browsing for known-item finding
% of sessions where browsing was used
% of sessions where browsing was used & led to success
Quality of browsing suggestions
Mean Reciprocal Rank using clicks as judgments (sketch below)
10-fold cross-validation over the click data collected
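A minimal MRR computation using clicks as relevance judgments, assuming each session records the suggestion list shown and the item the user clicked.

```python
def mean_reciprocal_rank(sessions):
    """MRR over browsing suggestions, with clicks as judgments.

    sessions: list of (suggestion_list, clicked_item) pairs.
    """
    total = 0.0
    for suggestions, clicked in sessions:
        if clicked in suggestions:
            # Reciprocal rank of the clicked item (1-based rank).
            total += 1.0 / (suggestions.index(clicked) + 1)
    return total / len(sessions)
```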
Value of Browsing for Known-item Finding
Comparison with simulation results: roughly matches in terms of overall usage and success ratio
The value of associative browsing: browsing was used in 30% of all sessions, and saved 75% of the sessions in which it was used

Evaluation Type                     Total (#sessions)  Browsing used   Successful outcome
Simulation                          63,260             9,410 (14.8%)   3,957 (42.0%)
User Study (1: document only)       290                42 (14.5%)      15 (35.7%)
User Study (2: document + concept)  142                43 (30.2%)      32 (74.4%)
Quality of Browsing Suggestions
[Charts: MRR of concept browsing (features: title, content, tag, time, string, co-occurrence) and document browsing (features: title, content, tag, time, topic, path, type, concept), comparing Uniform, Grid, and SVM weighting on CS/Top1 and CS/Top5.]
Simulated Evaluation Methods
Challenges in PIR Evaluation
Hard to create a ‘test collection’
Each user has different documents and habits
People will not donate their documents and queries for research
Limitations of user studies
Experimenting with a working system is costly
Experimental control is hard with real users and tasks
Data is not reusable by third parties
Our Approach: Simulated Evaluation
Simulate the components of evaluation
Collection: the user's documents with metadata
Task: search topics and relevance judgments
Interaction: query and click data
Simulated Evaluation Overview
Simulated document collections
Pseudo-desktop Collections: subsets of the W3C mailing list + other document types
CS Collection: UMass CS mailing list / calendar items / crawl of homepages

Evaluation Methods     Controlled User Study            Simulated Interaction
Field-based Search     DocTrack search game             Query generation methods
Associative Browsing   DocTrack search + browsing game  Probabilistic user modeling
Controlled User Study: DocTrack Game
Procedure
Collect public documents in the UMass CS dept. (CS Collection)
Build a web interface where participants can find documents
People in the CS department participated
DocTrack search game: 20 participants / 66 games played; 984 queries collected for 882 target documents
DocTrack search + browsing game: 30 participants / 53 games played; 290 + 142 search sessions collected
DocTrack Game
*Users can use both search and browsing in the DocTrack search + browsing game
Query Generation for Evaluating PIR
Known-item finding for PIR
A target document represents an information need
Users would take terms from the target document
Query generation for PIR
Randomly select a target document
Algorithmically take terms from the document (sketch below)
Parameters of query generation
Choice of extent: document [Azzopardi07] vs. field
Choice of term: uniform vs. TF vs. IDF vs. TF-IDF [Azzopardi07]
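A sketch of the term-selection step covering the four term-choice strategies; sampling is with replacement here, and the choice of extent (whole document vs. a single field) is decided by what is passed in as doc_terms. The function is illustrative, not the dissertation's exact generator.

```python
import random
from collections import Counter

def generate_query(doc_terms, idf, n_terms=3, strategy="tfidf"):
    """Sample query terms from a target document, in the style of [Azzopardi07].

    doc_terms: list of terms from the chosen extent (whole doc or one field)
    idf:       {term: inverse document frequency}
    """
    tf = Counter(doc_terms)
    if strategy == "uniform":
        weights = {t: 1.0 for t in tf}
    elif strategy == "tf":
        weights = dict(tf)
    elif strategy == "idf":
        weights = {t: idf.get(t, 0.0) for t in tf}
    else:  # "tfidf"
        weights = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    terms, probs = zip(*weights.items())
    # Draw n_terms terms with probability proportional to the chosen weighting.
    return random.choices(terms, weights=probs, k=n_terms)
```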
Validating the Generated Queries
Basic idea
Use the set of human-generated queries for validation
Compare at the level of query terms and retrieval scores
Validation by comparing query terms: the generation probability of a manual query q under P_term
Validation by comparing retrieval scores [Azzopardi07]: two-sided Kolmogorov-Smirnov test (sketch below)
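A sketch of the score-distribution check using SciPy's two-sided Kolmogorov-Smirnov test; the alpha threshold is an illustrative choice.

```python
from scipy.stats import ks_2samp

def validate_by_retrieval_scores(manual_scores, generated_scores, alpha=0.05):
    """Two-sided KS test [Azzopardi07]: do retrieval scores of generated
    queries follow the same distribution as those of manual queries?"""
    statistic, p_value = ks_2samp(manual_scores, generated_scores)
    # True means we cannot reject that the two score distributions match.
    return p_value >= alpha
```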
Validation Results for Generated Queries
Validation based on query terms
Validation based on retrieval score distribution
Probabilistic User Model for PIR
Query generation model: term selection from a target document
State transition model: use browsing when a result looks marginally relevant
Link selection model: click on browsing suggestions based on perceived relevance
A User Model for Link Selection
User's level of knowledge:
Random: randomly click on the ranked list
Informed: more likely to click on a more relevant item
Oracle: always click on the most relevant item
Relevance is estimated using the position of the target item (sketch below)
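A sketch of the three click models, with perceived relevance approximated from each suggestion's distance to the target item's position, as described above; the exact relevance estimate here is illustrative.

```python
import random

def select_link(ranked_suggestions, target, knowledge="informed"):
    """Simulated click on a browsing suggestion list.

    ranked_suggestions: items shown to the simulated user
    target:             the known item being sought
    knowledge:          "random", "informed", or "oracle"
    """
    n = len(ranked_suggestions)
    if knowledge == "random":
        return random.choice(ranked_suggestions)
    # Perceived relevance is higher for items ranked near the target item.
    target_pos = (ranked_suggestions.index(target)
                  if target in ranked_suggestions else n)
    relevance = [1.0 / (1 + abs(i - target_pos)) for i in range(n)]
    if knowledge == "oracle":
        return ranked_suggestions[relevance.index(max(relevance))]
    # "informed": click probability proportional to perceived relevance.
    return random.choices(ranked_suggestions, weights=relevance, k=1)[0]
```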
Success Ratio of Browsing
Varying the level of knowledge and fan-out for simulation
Exploration is valuable for users with a low knowledge level
[Chart: success ratio for fan-out FO1–FO3 (more exploration to the right) under the random, informed, and oracle user models.]
Community Efforts using the Data Sets
Conclusions & Future Work
Major Contributions
Field-based Search Models
Field relevance model for structured document retrieval
Field-based and combination-based type prediction methods
Associative Browsing Model
An adaptive technique for generating browsing suggestions
Evaluation of associative browsing in known-item finding
Simulated Evaluation Methods for Known-item Finding
DocTrack game for controlled user studies
Probabilistic user model for generating simulated interaction
Field Relevance for Complex Structures
Current work assumes documents with a flat structure
Field relevance for complex structures?
XML documents with hierarchical structure
Joined database relations with graph structure
Cognitive Model of Query Generation
Current query generation methods assume:
Queries are generated from the complete document
Query terms are chosen independently from one another
Relaxing these assumptions
Model the user's degradation in memory
Model the dependency in query term selection
Ongoing work
Graph-based representation of documents
Query terms can be chosen by random walk (sketch below)
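A sketch of the random-walk idea, assuming a simple adjacency-list term graph built from within-document co-occurrence; this illustrates the ongoing work rather than a finished method.

```python
import random

def random_walk_query(graph, start_term, n_terms=3, restart=0.2, max_steps=100):
    """Pick query terms by a short random walk over a term graph.

    graph:      {term: [neighboring terms]} from within-document co-occurrence
    start_term: seed term sampled from the target document
    restart:    probability of jumping back to the seed term at each step
    """
    current, picked = start_term, [start_term]
    for _ in range(max_steps):
        if len(picked) >= n_terms:
            break
        neighbors = graph.get(current, [])
        if not neighbors or random.random() < restart:
            current = start_term  # restart the walk at the seed term
        else:
            current = random.choice(neighbors)
            if current not in picked:
                picked.append(current)
    return picked
```

Unlike independent term sampling, consecutive terms here are dependent through the graph structure, which is the point of relaxing the independence assumption.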
Thank you for your attention! Special thanks to my advisor, coauthors, and all of you here!
Are we closer to the superhuman now?
One More Slide: What I Learned…
Start from what's happening in the user's mind
Field relevance / query generation, …
Balance user input and algorithmic support
Generating suggestions for associative browsing
Learn from your peers & make contributions
Query generation method / DocTrack game
Simulated test collections & workshop