Retrieval and Evaluation Techniques for Personal Information
Jin Young Kim
7/26 Ph.D. Dissertation Seminar
Personal Information Retrieval (PIR)
The practice and the study of supporting users in retrieving their personal information effectively
Personal Information Retrieval in the Wild
Everyone has unique information & practices
Different information and information needs
Different preferences and behaviors
Many existing software solutions
Platform-level: desktop search, folder structure
Application-level: email, calendar, office suites
Previous Work in PIR (Desktop Search)
Focus
User interface issues [Dumais03,06]
Desktop-specific features [Solus06] [Cohen08]
Limitations
Each is based on a different environment and user group
None of them performed comparative evaluation
Research findings do not accumulate over the years
Our Approach
Develop general techniques for PIR
Start from the essential characteristics of PIR
Applicable regardless of users and information types
Make contributions to related areas
Structured document retrieval
Simulated evaluation for known-item finding
Build a platform for sustainable progress
Develop repeatable evaluation techniques
Share the research findings and the data
Essential Characteristics of PIR
Many document types
Unique metadata for each type
People combine search and browsing [Teevan04]
Long-term interactions with a single user
People mostly find known-items [Elsweiler07]
Privacy concerns over the data set
Field-based Search Models
Associative Browsing Model
Simulated Evaluation Methods
Challenge: Users may remember different things about the document. How can we present effective results for both cases?
Search and Browsing Retrieval Models
[Diagram: the user's lexical memory feeds a keyword query (search), while associative memory drives browsing; both lead to the retrieval results, illustrated with a registration email from James.]
Information Seeking Scenario in PIR
[Diagram: user input and system output over a session; the user alternates between search and browsing to find a 2011 registration email from James.]
1. A user initiates a session with a keyword query
2. The user switches to browsing by clicking on an email document
3. The user switches back to search with a different query
Challenge: A user's query originates from what she remembers. How can we simulate the user's querying behavior realistically?
Simulated Evaluation Techniques
Research Questions
Field-based Search Models
How can we improve retrieval effectiveness in PIR?
How can we improve type prediction quality?
Associative Browsing Model
How can we enable browsing support for PIR?
How can we improve the suggestions for browsing?
Simulated Evaluation Methods
How can we evaluate a complex PIR system by simulation?
How can we establish the validity of simulated evaluation?
Field-based Search Models
Searching for Personal Information
An example of desktop search
Field-based Search Framework for PIR
Type-specific Ranking: rank documents in each document collection (type)
Type Prediction: predict the document type relevant to the user's query
Final Results Generation: merge into a single ranked list (a sketch follows below)
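A minimal sketch of the final merge step, assuming each type-specific ranker returns (doc_id, score) pairs and the type predictor supplies P(type | query); the data shapes and function names are illustrative, not the dissertation's actual implementation.

```python
def merge_results(ranked_lists, type_probs, k=10):
    """Merge per-type ranked lists into one list for presentation.

    ranked_lists: {doc_type: [(doc_id, score), ...]}  # type-specific ranking
    type_probs:   {doc_type: P(type | query)}         # type prediction
    """
    merged = []
    for doc_type, results in ranked_lists.items():
        for doc_id, score in results:
            # Weight each type-specific score by the predicted type relevance.
            merged.append((doc_id, doc_type, score * type_probs.get(doc_type, 0.0)))
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged[:k]
```

Note that raw scores from different rankers are not necessarily comparable across types; in practice they would need normalization before merging.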
Type-specific Ranking for PIR
Each collection has type-specific features
Thread-based features for emails
Path-based features for documents
Most of these documents have rich metadata
Email: <sender, receiver, date, subject, body>
Document: <title, author, abstract, content>
Calendar: <title, date, place, participants>
We focus on developing general retrieval techniques for structured documents
Structured Document Retrieval
Field operators / advanced search interfaces
A user's search terms are found in multiple fields
Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., and Hacker, M. [SIGIR '11]
Structured Document Retrieval: Models
Document-based Retrieval Model: score each document as a whole
Field-based Retrieval Model: combine evidence from each field
[Diagram: document-based scoring matches query terms q1..qm against the document as a whole; field-based scoring matches them against each field f1..fn and combines the per-field scores with fixed weights w1..wn.]
Field Relevance
Different fields are important for different query terms
‘james’ is relevant when it occurs in <to>
‘registration’ is relevant when it occurs in <subject>
Field Relevance Model for Structured IR
Estimating the Field Relevance: Overview
If the user provides feedback: a relevant document provides sufficient information
If no feedback is available: combine field-level term statistics from multiple sources
[Diagram: the field-level term distributions (from/to, title, content) of the collection plus the top-k retrieved documents approximate those of the relevant documents.]
Estimating Field Relevance using Feedback
Assume a user who marked D_R as relevant
Estimate the field relevance from the field-level term distribution of D_R
We can personalize the results accordingly
Rank documents with similar field-level term distributions higher
This weighting is provably optimal under the language modeling retrieval framework
Example: <to> is relevant for ‘james’; <content> is relevant for ‘registration’
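A sketch of this feedback estimate, assuming the simplest case where P(F_j | q_i) is taken proportional to the field-level term frequency of q_i in D_R; the dictionary-based representation is illustrative.

```python
from collections import Counter

def field_relevance_from_feedback(query_terms, relevant_doc):
    """Estimate per-term field relevance P(F_j | q_i) from a relevant doc D_R.

    relevant_doc: {field_name: text}, e.g. {"to": "james ...", "subject": "..."}
    Returns {term: {field: weight}} with weights summing to 1 per term.
    """
    field_counts = {f: Counter(text.lower().split())
                    for f, text in relevant_doc.items()}
    relevance = {}
    for term in query_terms:
        # Field-level term frequency of the term in D_R.
        counts = {f: field_counts[f][term] for f in field_counts}
        total = sum(counts.values())
        if total == 0:
            # Term absent from D_R: fall back to uniform field weights.
            relevance[term] = {f: 1.0 / len(counts) for f in counts}
        else:
            relevance[term] = {f: c / total for f, c in counts.items()}
    return relevance
```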
Estimating Field Relevance without Feedback
Field relevance as a linear combination of multiple sources, with weights estimated using training queries
Features:
Field-level term distribution of the collection (unigram and bigram LMs; the unigram case is the same as PRM-S)
Field-level term distribution of the top-k docs (unigram and bigram LMs; a form of pseudo-relevance feedback)
A priori importance of each field (wj), estimated using held-out training queries (similar to MFLM and BM25F)
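A sketch of the no-feedback combination, assuming each source exposes a per-field distribution for the query term and the lambda weights were learned on training queries; names and shapes are illustrative.

```python
def field_relevance_no_feedback(term, sources, weights):
    """Combine field-level term distributions from several sources.

    sources: {source_name: {field: P(term | field) under that source}},
             e.g. collection LM, top-k pseudo-relevant docs LM, field prior w_j.
    weights: {source_name: lambda} learned on training queries.
    All sources are assumed to cover the same set of fields.
    """
    fields = next(iter(sources.values())).keys()
    combined = {f: sum(weights[s] * sources[s][f] for s in sources)
                for f in fields}
    z = sum(combined.values()) or 1.0
    return {f: v / z for f, v in combined.items()}  # normalized P(F_j | q_i)
```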
Retrieval Using the Field Relevance: Comparison with Previous Work
Ranking in the Field Relevance Model
[Diagram: previous models score each query term q1..qm against fields f1..fn with fixed field weights w1..wn; the field relevance model replaces them with per-term field weights P(F1|qi)..P(Fn|qi). Per-term field scores are summed across fields and the per-term scores are multiplied across query terms.]
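A sketch of that ranking rule, done in log space for numerical stability: per-term scores are summed over fields, weighted by the field relevance, and combined multiplicatively over query terms. Smoothed per-field document language models are assumed to be precomputed.

```python
import math

def frm_score(query_terms, doc_field_lms, field_relevance):
    """Field relevance model scoring: sum per-term field scores over fields,
    multiply over query terms (in log space).

    doc_field_lms:   {field: {term: P(term | smoothed doc field LM)}}
    field_relevance: {term: {field: P(F_j | q_i)}}
    """
    log_score = 0.0
    for q in query_terms:
        term_score = sum(field_relevance[q][f] * doc_field_lms[f].get(q, 1e-9)
                         for f in doc_field_lms)
        log_score += math.log(term_score)
    return log_score
```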
Evaluating the Field Relevance Model
Retrieval Effectiveness (Metric: Mean Reciprocal Rank)

Collection  DQL     BM25F   MFLM    FRM-C   FRM-T   FRM-R
TREC        54.2%   59.7%   60.1%   62.4%   66.8%   79.4%
IMDB        40.8%   52.4%   61.2%   63.7%   65.7%   70.4%
Monster     42.9%   27.9%   46.0%   54.2%   55.8%   71.6%

[Chart: the same results per collection; per-term field weights (FRM-*) outperform fixed field weights (DQL, BM25F, MFLM).]
Type Prediction Methods
Field-based Collection Query-Likelihood (FQL), sketched below
Calculate a QL score for each field of a collection
Combine field-level scores into a collection score
Feature-based Method
Combine existing type-prediction methods
Grid search / SVM for finding combination weights
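A sketch of FQL under these assumptions: each collection (type) exposes per-field collection language models, and field-level query-likelihood scores are combined with per-field weights; the exact combination used in the dissertation may differ.

```python
import math

def fql_score(query_terms, collection_field_lms, field_weights):
    """Field-based collection query-likelihood for one collection (type).

    collection_field_lms: {field: {term: P(term | collection field LM)}}
    field_weights:        {field: weight}, summing to 1
    """
    score = 0.0
    for q in query_terms:
        per_term = sum(w * collection_field_lms[f].get(q, 1e-9)
                       for f, w in field_weights.items())
        score += math.log(per_term)
    return score

def predict_type(query_terms, collections, field_weights):
    """Pick the document type whose collection best explains the query."""
    return max(collections,
               key=lambda t: fql_score(query_terms, collections[t],
                                       field_weights[t]))
```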
Type Prediction Performance (% of queries with correct prediction)
[Charts: results on the Pseudo-desktop Collections and the CS Collection.]
FQL improves performance over CQL
Combining features improves the performance further
Summary So Far…
Field relevance model for structured document retrieval
Enables relevance feedback through field weighting
Improves performance using linear feature-based estimation
Type prediction methods for PIR
Field-based type prediction method (FQL)
Combining features improves the performance further
We now move on to the associative browsing model: what happens when users can't recall good search terms?
Associative Browsing Model
Recap: Retrieval Framework for PIR
[Diagram: the retrieval framework revisited; keyword search and associative browsing both operate over the user's personal documents.]
User Interaction for Associative Browsing
Users enter a concept or document page by search
The system provides a list of suggestions for browsing
[Screenshots: data model and user interface.]
How can we build associations?
Manually? Participants wouldn't create associations beyond simple tagging operations [Sauermann et al. 2005]
Automatically? How would it match the user's preference?
Building the Associative Browsing Model
1. Document Collection
2. Concept Extraction
3. Link Extraction: term similarity, temporal similarity, co-occurrence
4. Link Refinement: click-based training
[Screenshot: the concept page for ‘Search Engine’.]
Link Extraction and Refinement
Link Scoring: combination of link type scores, S(c1,c2) = Σi [ wi × Linki(c1,c2) ]
Link Presentation: a ranked list of suggested items; users click on them for browsing
Link Refinement (training wi): maximize click-based relevance
Grid search: maximize retrieval effectiveness (MRR)
RankSVM: minimize error in pairwise preference
Link features
Concepts: term vector similarity, temporal similarity, tag similarity, string similarity, co-occurrence
Documents: term vector similarity, temporal similarity, tag similarity, path/type similarity, concept similarity
(A scoring sketch follows below.)
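A direct transcription of the scoring formula above into code, with the link feature functions and trained weights passed in; the suggestion list is a ranked cut-off of the scored candidates.

```python
def link_score(c1, c2, link_features, weights):
    """S(c1, c2) = sum_i w_i * Link_i(c1, c2), as on the slide.

    link_features: {name: f(c1, c2) -> similarity score}
    weights:       {name: w_i}, trained from clicks (grid search or RankSVM)
    """
    return sum(weights[name] * feature(c1, c2)
               for name, feature in link_features.items())

def suggest(item, candidates, link_features, weights, k=10):
    """Rank candidate items to build the browsing suggestion list."""
    scored = [(c, link_score(item, c, link_features, weights))
              for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```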
Evaluating the Associative Browsing Model
Data set: CS Collection
Collect public documents in the UMass CS department
CS dept. people competed in known-item finding tasks
Value of browsing for known-item finding
% of sessions where browsing was used
% of sessions where browsing was used & led to success
Quality of browsing suggestions
Mean Reciprocal Rank using clicks as judgments (sketch below)
10-fold cross-validation over the click data collected
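A minimal MRR computation using clicks as relevance judgments, assuming each session records the suggestion list shown and the item the user clicked.

```python
def mean_reciprocal_rank(sessions):
    """MRR over browsing suggestions, with clicks as judgments.

    sessions: list of (suggestion_list, clicked_item) pairs.
    """
    total = 0.0
    for suggestions, clicked in sessions:
        if clicked in suggestions:
            # Reciprocal rank of the clicked item (1-based rank).
            total += 1.0 / (suggestions.index(clicked) + 1)
    return total / len(sessions)
```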
Value of Browsing for Known-item Finding
Comparison with simulation results: roughly matches in terms of overall usage and success ratio
The value of associative browsing: browsing was used in 30% of all sessions, and saved 75% of the sessions in which it was used

Evaluation Type                     Total (#sessions)  Browsing used   Successful outcome
Simulation                          63,260             9,410 (14.8%)   3,957 (42.0%)
User Study (1: document only)       290                42 (14.5%)      15 (35.7%)
User Study (2: document + concept)  142                43 (30.2%)      32 (74.4%)
Quality of Browsing Suggestions
[Charts: MRR of concept browsing (features: title, content, tag, time, string, co-occurrence) and document browsing (features: title, content, tag, time, topic, path, type, concept), comparing Uniform, Grid, and SVM weighting on CS/Top1 and CS/Top5.]
Simulated Evaluation Methods
Challenges in PIR Evaluation
Hard to create a ‘test collection’
Each user has different documents and habits
People will not donate their documents and queries for research
Limitations of user studies
Experimenting with a working system is costly
Experimental control is hard with real users and tasks
Data is not reusable by third parties
Our Approach: Simulated Evaluation
Simulate the components of evaluation
Collection: the user's documents with metadata
Task: search topics and relevance judgments
Interaction: query and click data
Simulated Evaluation Overview
Simulated document collections
Pseudo-desktop Collections: subsets of the W3C mailing list + other document types
CS Collection: UMass CS mailing list / calendar items / crawl of homepages

Evaluation Methods     Controlled User Study            Simulated Interaction
Field-based Search     DocTrack search game             Query generation methods
Associative Browsing   DocTrack search + browsing game  Probabilistic user modeling
Controlled User Study: DocTrack Game
Procedure
Collect public documents in the UMass CS dept. (CS Collection)
Build a web interface where participants can find documents
People in the CS department participated
DocTrack search game: 20 participants / 66 games played; 984 queries collected for 882 target documents
DocTrack search + browsing game: 30 participants / 53 games played; 290 + 142 search sessions collected
DocTrack Game
*Users can use both search and browsing in the DocTrack search + browsing game
Query Generation for Evaluating PIR
Known-item finding for PIR
A target document represents an information need
Users would take terms from the target document
Query generation for PIR
Randomly select a target document
Algorithmically take terms from the document (sketch below)
Parameters of query generation
Choice of extent: document [Azzopardi07] vs. field
Choice of term: uniform vs. TF vs. IDF vs. TF-IDF [Azzopardi07]
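A sketch of the term-selection step covering the four term-choice strategies; sampling is with replacement here, and the choice of extent (whole document vs. a single field) is decided by what is passed in as doc_terms. The function is illustrative, not the dissertation's exact generator.

```python
import random
from collections import Counter

def generate_query(doc_terms, idf, n_terms=3, strategy="tfidf"):
    """Sample query terms from a target document, in the style of [Azzopardi07].

    doc_terms: list of terms from the chosen extent (whole doc or one field)
    idf:       {term: inverse document frequency}
    """
    tf = Counter(doc_terms)
    if strategy == "uniform":
        weights = {t: 1.0 for t in tf}
    elif strategy == "tf":
        weights = dict(tf)
    elif strategy == "idf":
        weights = {t: idf.get(t, 0.0) for t in tf}
    else:  # "tfidf"
        weights = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    terms, probs = zip(*weights.items())
    # Draw n_terms terms with probability proportional to the chosen weighting.
    return random.choices(terms, weights=probs, k=n_terms)
```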
Validating the Generated Queries
Basic idea
Use the set of human-generated queries for validation
Compare at the level of query terms and retrieval scores
Validation by comparing query terms: the generation probability of a manual query q under P_term
Validation by comparing retrieval scores [Azzopardi07]: two-sided Kolmogorov-Smirnov test (sketch below)
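A sketch of the score-distribution check using SciPy's two-sided Kolmogorov-Smirnov test; the alpha threshold is an illustrative choice.

```python
from scipy.stats import ks_2samp

def validate_by_retrieval_scores(manual_scores, generated_scores, alpha=0.05):
    """Two-sided KS test [Azzopardi07]: do retrieval scores of generated
    queries follow the same distribution as those of manual queries?"""
    statistic, p_value = ks_2samp(manual_scores, generated_scores)
    # True means we cannot reject that the two score distributions match.
    return p_value >= alpha
```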
Validation Results for Generated Queries
Validation based on query terms
Validation based on retrieval score distribution
Probabilistic User Model for PIR
Query generation model: term selection from a target document
State transition model: use browsing when a result looks marginally relevant
Link selection model: click on browsing suggestions based on perceived relevance
A User Model for Link Selection
User's level of knowledge:
Random: randomly click on the ranked list
Informed: more likely to click on a more relevant item
Oracle: always click on the most relevant item
Relevance is estimated using the position of the target item (sketch below)
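A sketch of the three click models, with perceived relevance approximated from each suggestion's distance to the target item's position, as described above; the exact relevance estimate here is illustrative.

```python
import random

def select_link(ranked_suggestions, target, knowledge="informed"):
    """Simulated click on a browsing suggestion list.

    ranked_suggestions: items shown to the simulated user
    target:             the known item being sought
    knowledge:          "random", "informed", or "oracle"
    """
    n = len(ranked_suggestions)
    if knowledge == "random":
        return random.choice(ranked_suggestions)
    # Perceived relevance is higher for items ranked near the target item.
    target_pos = (ranked_suggestions.index(target)
                  if target in ranked_suggestions else n)
    relevance = [1.0 / (1 + abs(i - target_pos)) for i in range(n)]
    if knowledge == "oracle":
        return ranked_suggestions[relevance.index(max(relevance))]
    # "informed": click probability proportional to perceived relevance.
    return random.choices(ranked_suggestions, weights=relevance, k=1)[0]
```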
Success Ratio of Browsing
Varying the level of knowledge and fan-out for simulation
Exploration is valuable for users with a low knowledge level
[Chart: success ratio for fan-out FO1–FO3 (more exploration to the right) under the random, informed, and oracle user models.]
Community Efforts using the Data Sets
Conclusions & Future Work
Major Contributions
Field-based Search Models
Field relevance model for structured document retrieval
Field-based and combination-based type prediction methods
Associative Browsing Model
An adaptive technique for generating browsing suggestions
Evaluation of associative browsing in known-item finding
Simulated Evaluation Methods for Known-item Finding
DocTrack game for controlled user studies
Probabilistic user model for generating simulated interaction
Field Relevance for Complex Structures
Current work assumes documents with a flat structure
Field relevance for complex structures?
XML documents with hierarchical structure
Joined database relations with graph structure
Cognitive Model of Query Generation
Current query generation methods assume:
Queries are generated from the complete document
Query terms are chosen independently from one another
Relaxing these assumptions
Model the user's degradation in memory
Model the dependency in query term selection
Ongoing work
Graph-based representation of documents
Query terms can be chosen by random walk (sketch below)
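A sketch of the random-walk idea, assuming a simple adjacency-list term graph built from within-document co-occurrence; this illustrates the ongoing work rather than a finished method.

```python
import random

def random_walk_query(graph, start_term, n_terms=3, restart=0.2, max_steps=100):
    """Pick query terms by a short random walk over a term graph.

    graph:      {term: [neighboring terms]} from within-document co-occurrence
    start_term: seed term sampled from the target document
    restart:    probability of jumping back to the seed term at each step
    """
    current, picked = start_term, [start_term]
    for _ in range(max_steps):
        if len(picked) >= n_terms:
            break
        neighbors = graph.get(current, [])
        if not neighbors or random.random() < restart:
            current = start_term  # restart the walk at the seed term
        else:
            current = random.choice(neighbors)
            if current not in picked:
                picked.append(current)
    return picked
```

Unlike independent term sampling, consecutive terms here are dependent through the graph structure, which is the point of relaxing the independence assumption.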
Thank you for your attention! Special thanks to my advisor, coauthors, and all of you here!
Are we closer to the superhuman now?
One More Slide: What I Learned…
Start from what's happening in the user's mind
Field relevance / query generation, …
Balance user input and algorithmic support
Generating suggestions for associative browsing
Learn from your peers & make contributions
Query generation method / DocTrack game
Simulated test collections & workshop