1
Capturing User Intent for Information Retrieval
Hien Nguyen
University of Connecticut
Major advisor: Dr. Eugene Santos Jr. (UCONN)
Associate advisor: Dr. Robert McCartney (UCONN)
Associate advisor: Dr. AnHai Doan (UIUC)
Associate advisor: Dr. Robert Henning (UCONN)
6
Problem
Why do we need user models for IR?
[Diagram: a User with information needs, an Intermediary, and Information resources]
Employing a cognitive user model for Information Retrieval (IR): capture and use knowledge about a user to improve a user’s effectiveness in an information seeking task.
8
Outline
Problem
Motivation
Our approach
Empirical evaluation
Conclusion
9
Motivation
Existing methodologies for building user models for IR include:
System-centered approaches: use IR techniques (e.g., Spink & Losee (1996), Efthimis (96), Lopez-Pujalte (03), Drucker et al. (02)).
User-centered approaches: use Human Computer Interaction (HCI)/Artificial Intelligence (AI) techniques (e.g., Belkin (93), Radlord (96)).
Hybrid approaches: combine IR with HCI/AI techniques (e.g., Logan et al. (94), Decampos et al. (98), Ruthven et al. (03)).
There is very little crossover between IR and AI/HCI in building user models for IR.
12
Motivation
Important factors for building user models for IR:
Partiality
Vagueness
Incremental
Uncertainty
Dynamics Adaptive
Intent:
Author’s intent
User’s intent
Relevance feedback
13
Thesis of our research
We try to improve a user’s effectiveness in an information seeking task by:
Developing a hybrid user model that captures user intent dynamically by analyzing behavioral information from retrieved relevant documents, and that combines the captured user intent with the elements of an IR system in a decision theoretic framework (ICAI00, IAT01, UM03, HFES03 & 04, AH04).
Using IR evaluation procedures and collections, and examining usability testing, to evaluate this model (HFES04, AH04, UM05).
14
Contributions
Develop a hybrid user model by combining information about a user and information about an IR system in a decision theoretic framework
Develop a unified evaluation framework
Fine-grained representation
Ability to learn user knowledge dynamically
15
Outline
Problem
Motivation
Our approach: IPC Model, Hybrid Model
Empirical evaluation
Conclusion
16
IPC User Model
Captures user intent.
Consists of 3 components:
User interests (I): “What needs to be done or accomplished?”
User preferences (P): “How is something done or accomplished?”
User context (C): “Why is the user trying to accomplish something?”
(ICAI00, IAT01, UM03, HFES03)
17
Context Network (C)
Captures user knowledge. It contains concept nodes and relation nodes.
Is constructed “on-the-fly” by finding intersections of all retrieved relevant document graphs.
[Figure: an example context network with concept nodes Urate oxidase, Urate, Enzyme, and Biologically Active Substance joined by Isa relation nodes. Panel (a): Cosmids Isa Enzyme Isa Biologically Active Substance. Panel (b): Urate oxidase Isa Enzyme Isa Biologically Active Substance.]
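The "on-the-fly" construction can be pictured with a small sketch. It assumes each document graph is stored as a set of (concept, relation, concept) triples and reads "intersections" as the union of pairwise intersections; both the encoding and the name `build_context_network` are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

# Hypothetical encoding: one triple per relation node and its two concept nodes.
Triple = tuple[str, str, str]


def build_context_network(doc_graphs: list[set[Triple]]) -> set[Triple]:
    """Return the union of the pairwise intersections of the document graphs."""
    context: set[Triple] = set()
    for first, second in combinations(doc_graphs, 2):
        context |= first & second  # keep only what the two graphs share
    return context


# Two toy document graphs modelled on panels (a) and (b) of the slide.
graph_a = {
    ("cosmids", "isa", "enzyme"),
    ("enzyme", "isa", "biologically_active_substance"),
}
graph_b = {
    ("urate_oxidase", "isa", "enzyme"),
    ("enzyme", "isa", "biologically_active_substance"),
}

context_network = build_context_network([graph_a, graph_b])
```

With these two toy graphs, only the shared "enzyme isa biologically_active_substance" triple ends up in the context network.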
18
Interest Set (I)
Determines what is currently relevant to a user.
Each element of interest set consists of interest concept (a) and interest level L(a).
Fading mechanism: $L(a) = 0.5 \cdot (L(a) + n/m)$, where
n: number of retrieved relevant documents containing a
m: number of retrieved documents
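As a worked illustration of the fading mechanism, the sketch below applies the update L(a) = 0.5 * (L(a) + n/m) once. The function name and the sample numbers are hypothetical.

```python
def update_interest_level(level: float, n_relevant_with_a: int, m_retrieved: int) -> float:
    """Apply one fading step: average the old level with the ratio n/m."""
    if m_retrieved <= 0:
        raise ValueError("m (number of retrieved documents) must be positive")
    return 0.5 * (level + n_relevant_with_a / m_retrieved)


# Hypothetical numbers: a concept at level 0.76 appears in 3 relevant
# documents out of 10 retrieved, so its level moves to 0.5 * (0.76 + 0.3).
faded = update_interest_level(0.76, 3, 10)

# A concept that no longer appears in relevant documents loses half its level.
unseen = update_interest_level(0.8, 0, 10)
```

Because the old level is halved at every step, a concept that stops appearing in relevant documents fades geometrically, while a concept that keeps appearing converges toward its n/m ratio.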
19
Preference Network (P)
Represents how a user wants to form a query.
Is represented using Bayesian networks.
Consists of pre-condition, goal and action nodes:
Pre-condition: represents the requirement of a tool used to form a query
Goal: represents a tool to form a query (filter/expander)
Action: represents the modified query
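[Figure: a preference network with pre-condition nodes Pc11, Pc12, Pc22, Pc31, Pc32, goal nodes G1, G2, G3, and action nodes A1, A2, A3. Detail: Pc11 and Pc12 are parents of G1, and G1 is the parent of A1.]
Conditional probability tables for the detail:
P(Pc11):   Pc11=T 0.9   Pc11=F 0.1

P(G1 | Pc11, Pc12):
        Pc11=T,Pc12=T   Pc11=T,Pc12=F   Pc11=F,Pc12=T   Pc11=F,Pc12=F
G1=T    1               0               0               0
G1=F    0               1               1               1

P(A1 | G1):
        G1=T   G1=F
A1=T    1      0
A1=F    0      1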
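To make the tables concrete, the sketch below computes the probability of the action node A1 by brute-force enumeration over Pc11, Pc12, and G1. The prior on Pc12 is not given on the slide, so the value 0.8 is a made-up assumption; the other numbers come from the tables, and the function name is hypothetical.

```python
from itertools import product

P_PC11 = {True: 0.9, False: 0.1}     # from the slide
P_PC12 = {True: 0.8, False: 0.2}     # hypothetical prior, not in the source
P_A1_TRUE = {True: 1.0, False: 0.0}  # P(A1=T | G1), from the slide


def p_g1_true(pc11: bool, pc12: bool) -> float:
    """P(G1=T | Pc11, Pc12): the slide's table is a logical AND."""
    return 1.0 if (pc11 and pc12) else 0.0


def prob_a1_true(evidence: dict[str, bool]) -> float:
    """P(A1=T | evidence on Pc11 and/or Pc12) by full enumeration."""
    numerator = 0.0
    normalizer = 0.0
    for pc11, pc12, g1 in product([True, False], repeat=3):
        assignment = {"Pc11": pc11, "Pc12": pc12}
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the observed evidence
        p_g1 = p_g1_true(pc11, pc12) if g1 else 1.0 - p_g1_true(pc11, pc12)
        joint = P_PC11[pc11] * P_PC12[pc12] * p_g1
        normalizer += joint
        numerator += joint * P_A1_TRUE[g1]
    return numerator / normalizer


prior_a1 = prob_a1_true({})                    # 0.9 * 0.8 under the assumed prior
given_pc12 = prob_a1_true({"Pc12": True})      # only Pc11 remains uncertain
given_not_pc11 = prob_a1_true({"Pc11": False})
```

Enumeration is only practical for a network this small; a real system would rely on a belief-updating algorithm from a Bayesian network library.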
20
Preference Network (P)
Update: performed when a user gives relevance feedback after each query.
A correction function calculates, for each of the two tools (filter and expander), the probability that a new preference network built with that tool will improve retrieval performance.
The network with the higher probability is added.
21
Implementation of IPC User Model
Given M={I,P,C} and a query graph q:
Construct I’ by running a spreading activation algorithm on C.
Set as evidence all interest concepts of I’ found in P and the query node representing q found in P.
Perform belief updating on P.
Choose the top n goal nodes from P (G).
For every goal g in G: depending on g, add the corresponding paths in C to q.
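The first step, constructing I’ by spreading activation on C, might look like the sketch below. The slides do not specify the activation rule, so the decay factor, the cut-off threshold, the adjacency-list encoding, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def spread_activation(
    graph: dict[str, list[str]],
    seeds: dict[str, float],
    decay: float = 0.5,      # hypothetical: share of activation passed on
    threshold: float = 0.2,  # hypothetical: weaker activation is dropped
) -> dict[str, float]:
    """Propagate activation from seed concepts to their neighbours in C."""
    activation = dict(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        passed = activation[node] * decay
        if passed < threshold:
            continue  # too weak to spread any further
        for neighbour in graph.get(node, ()):
            if passed > activation.get(neighbour, 0.0):
                activation[neighbour] = passed
                queue.append(neighbour)
    return activation


# Toy context network as an adjacency list (hypothetical edges).
context = {
    "banking_transaction": ["transaction", "bank"],
    "bank": ["bank_account"],
    "bank_account": ["account"],
}

expanded_interests = spread_activation(context, {"banking_transaction": 1.0})
```

Here the query concept activates its direct neighbours at 0.5 and "bank_account" at 0.25, while "account" stays out because the activation reaching it falls below the threshold.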
22
An example
Query: Banking transaction
Retrieved documents:
Report 1: date 1 April, 2003
Report 14: date 21 April, 2003
Report 16: date 27 April, 2003
Report 7: date 15 April, 2003
Report 8: date 19 April, 2003
suspicious banking transactions involving Abdul Ramazi.
23
Example of Query Graph
Query: banking transaction
[Figure: query graph with concept nodes banking_transaction, transaction, and bank, connected through the relation nodes isa and related_to]
24
Example of Document Graph
FBI 1) Report Date: 1 April, 2003.
FBI: Abdul Ramazi is the owner of the Select Gourmet Foods shop in Springfield Mall. First Union National Bank lists Select Gourmet Foods as holding account number. Six checks totaling $35.000 have been deposited in this account in the past four months and are recorded as having been drawn on accounts at the Pyramid Bank of Cairo, Egypt and the Central Bank of Dubai, United Arab Emirates. Both of these banks have just been listed as possible conduits in money laundering schemes.
[Figure: document graph built from the report, with concept nodes including Abdul_Ramazi, Abdul, Ramazi, Select Gourmet Foods shop, Springfield Mall, First Union National Bank, Bank, Holding, Account number, account, Pyramid Bank Cairo, Cairo, Central Bank Dubai, Dubai, money laundering scheme, and scheme, connected through Isa and Related_to relation nodes; the rest of the graph is elided (…)]
25
Intersection of retrieved relevant documents
[Figure: intersection graph with concept nodes bank, First Union National bank, account_owner, Abdul_Ramazi, Abdul, ramazi, and bank_account, connected through isa and related_to relation nodes; part of the graph is elided (…)]
26
Existing Interest Set
Interest concept              Interest level
money_laundering              0.87
deposit                       0.82
withdraw                      0.8
bank_account                  0.76
…
27
Updated Interest Set
Interest concept              Interest level
abdul_ramazi                  0.83
chicago                       0.76
bank_account                  0.7
first_union_national_bank     0.66
…
28
Existing Context Network
[Figure: context network with concept nodes money_laundering, banking_transaction, deposit, withdraw, bank_account, account, transaction, and bank, connected through isa and related_to relation nodes]
29
Updated Context Network
[Figure: the existing context network (money_laundering, banking_transaction, deposit, withdraw, bank_account, account, transaction, bank) extended with the concept nodes account_owner, Abdul_Ramazi, and First Union National bank through additional isa and related_to relation nodes; part of the network is elided (...)]
30
Existing Preference Network
[Figure: preference network with nodes bank_account, forged_document, terrorism, money_laundering, wmd, Qusay, Iraq, filter_1068822.., expander_1168822, query_1068822.., query_116678, proactive_query_1068822.., and proactive_query_1168822; part of the network is elided (…)]
31
Updated Preference Network
[Figure: preference network with nodes bank_account, forged_document, terrorism, money_laundering, deposit, withdraw, filter_1068822.., filter_1163, query_1068822.., query_116678, proactive_query_1068822.., and proactive_query_1168822; part of the network is elided (…)]
32
Modified Query Graph
[Figure, original query graph: concept nodes banking_transaction, transaction, and bank, connected through isa and related_to relation nodes]
[Figure, modified query graph: the original nodes plus Abdul_Ramazi, First Union National bank, bank_account, and account_owner, connected through additional isa and related_to relation nodes]
33
Outline
Problem
Motivation
Our approach: IPC Model, Hybrid Model
Empirical evaluation
Conclusion
34
Hybrid User Model
Motivation:
Allows deeper influence on an IR system.
Adaptation using only a user’s information may not be helpful if the user is new to a domain.
Insight into an IR system may help a user get closer to his/her final searching goal.
35
Hybrid User Model
Our approach:
Convert this problem into a multi-attribute decision problem.
• Determine a set of attributes: {I,P,C,Q,T,In,D,S}
• Evaluate each outcome by an effectiveness function: average precision at three-point fixed recall
36
Hybrid User Model
Our approach (continued):
Reduce the number of attributes: only Query (Q) and Threshold (T) are considered.
Construct a value function over these two attributes:
$V(Q,T) = \lambda_1 V_1(Q) + \lambda_2 V_2(T)$
$(x_{21}, x_{22}) \succ (x_{11}, x_{12})$ iff $x_{2i} \ge x_{1i}$ for all $i = 1, 2$ and $x_{2i} > x_{1i}$ for some $i$
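The additive value function over Query and Threshold, and the dominance condition that goes with it, can be sketched as follows. The equal weights, the candidate sub values, and the function names are hypothetical; only the additive form and the "at least as good on every attribute, strictly better on one" condition come from the slide.

```python
def value(v_query: float, v_threshold: float,
          w_query: float = 0.5, w_threshold: float = 0.5) -> float:
    """Weighted sum of the two sub values (weights are made up)."""
    return w_query * v_query + w_threshold * v_threshold


def dominates(x2: tuple[float, float], x1: tuple[float, float]) -> bool:
    """True if x2 is at least as good on every attribute and better on one."""
    pairs = list(zip(x1, x2))
    return all(b >= a for a, b in pairs) and any(b > a for a, b in pairs)


# Hypothetical candidates: (V1 of the query, V2 of the threshold).
candidates = {
    "Q1": (0.4, 1.0),
    "Q2": (0.7, 1.0),
    "Q3": (0.7, 0.0),
}

best_query = max(candidates, key=lambda name: value(*candidates[name]))
q2_beats_q1 = dominates(candidates["Q2"], candidates["Q1"])
q3_beats_q1 = dominates(candidates["Q3"], candidates["Q1"])
```

Q2 dominates Q1 because it ties on the threshold attribute and wins on the query attribute, whereas Q3 and Q1 are incomparable under dominance and can only be ranked through the weighted sum.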
37
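Hybrid User Model
Sub value function for a query:
Takes advantage of the IR literature on predicting query performance.
Initial sub value function (He and Ounis 04): $V_1(q)$ is defined in terms of idf [the formula variants did not survive extraction], with
$idf(q) = \frac{\log_2((N + 0.5)/N_q)}{\log_2(N + 1)}$
Update sub value function: $idf_{new}(q)$ is computed from $idf_{old}(q)$ and a query-dependent term [the operator and the term's symbol did not survive extraction].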
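The normalized idf on this slide, log2((N + 0.5)/N_q) divided by log2(N + 1), is easy to compute directly. The sketch below also scores a whole query by the standard deviation of its term idf values; that query-level score is an assumption borrowed from the query performance prediction literature, not a formula confirmed by the slide, and the collection size and document frequencies are made up.

```python
import math
from statistics import pstdev


def idf(n_docs: int, n_docs_with_term: int) -> float:
    """Normalized idf: log2((N + 0.5) / N_q) / log2(N + 1)."""
    return math.log2((n_docs + 0.5) / n_docs_with_term) / math.log2(n_docs + 1)


def idf_spread(n_docs: int, doc_freqs: list[int]) -> float:
    """Assumed query-level score: population std. dev. of the term idf values."""
    return pstdev([idf(n_docs, df) for df in doc_freqs])


# Hypothetical collection of 1000 documents.
rare_term = idf(1000, 10)      # term found in 10 documents
common_term = idf(1000, 500)   # term found in 500 documents

uniform_query = idf_spread(1000, [10, 10])   # both terms equally rare
mixed_query = idf_spread(1000, [10, 500])    # one rare and one common term
```

The normalization by log2(N + 1) keeps the idf of any term that occurs in at least one document between 0 and 1, which makes scores comparable across collections of different sizes.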
38
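Hybrid User Model
Sub value function over threshold:
[Equations garbled in extraction: V(T) is a two-case function that equals 1 when a condition relating T to the current threshold T_t holds and 0 otherwise; the initial threshold T_0 is defined from p and N_0; the update rule gives T_{t+1} from T_t, sim(last), and an exponential term in R_t (Boughanem 00).]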
39
Implementation of Hybrid User Model
[Flow diagram: Query → IPC Model → Q1,Q2,…,Qm → Compute V(Qi) → Compute Threshold (using Threshold preference) → Send Qi,T to search module → Feedback → Update V(Q), V(T)]
40
Outline
Problem
Motivation
Our approach: IPC Model, Hybrid Model
Empirical evaluation
Conclusion
41
Evaluation objectives
Does our user model capture a user’s intent accurately?
Does our user model improve a user’s effectiveness in an information seeking task?
42
Evaluation framework
[Diagram: the Evaluation Framework covers Accuracy and Effectiveness, with a Hypothetical User and a Real User]
43
Evaluation of user model accuracy
Objective: determine how accurately a user’s intent has been captured by comparing models generated by humans with models generated by our system.
Procedures:
5 graduate students
10 queries from the CACM collection on distributed computing and optimization
Each user filled out a questionnaire
For each query, each user generates a model from looking at the first 15 returned documents.
44
Evaluation of user model accuracy
Profile of 5 participants
User      Search engine   Relevance feedback   Distributed computing   Optimization
User 1    1               2                    2                       2
User 2    1               7                    4                       5
User 3    3               3                    3                       3
User 4    2               6                    5                       3
User 5    2               2                    3                       3
Average   1.8             4                    3.4                     3.2
45
Evaluation of user model accuracy
Metrics:
$avgSimInterest = \frac{1}{n} \sum_{i=1}^{n} sim_{(I_1,I_2)}(Q_i)$
$avgSimPreference = \frac{1}{n} \sum_{i=1}^{n} sim_{(P_1,P_2)}(Q_i)$
$avgSimLexicalContext = \frac{1}{n} \sum_{i=1}^{n} SM_{L_1,L_2}(Q_i)$
$avgSimTaxonomyContext = \frac{1}{n} \sum_{i=1}^{n} TO_{C_1,C_2}(Q_i)$
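Each of the four metrics on this slide averages a per-query similarity between a human-generated model and a system-generated model over the n queries. The sketch below shows that averaging step only; the per-query similarity is not spelled out on the slide, so Jaccard overlap between concept sets is used here purely as a stand-in, and the sample models are invented.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Stand-in per-query similarity (assumption): overlap of two concept sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def average_similarity(human_models: list[set[str]],
                       system_models: list[set[str]],
                       sim=jaccard) -> float:
    """Mean of the per-query similarity over all n queries."""
    if len(human_models) != len(system_models) or not human_models:
        raise ValueError("need one human and one system model per query")
    n = len(human_models)
    return sum(sim(h, s) for h, s in zip(human_models, system_models)) / n


# Two hypothetical queries: interest sets from a participant and from the system.
human = [{"scheduling", "deadlock"}, {"optimization"}]
system = [{"scheduling"}, {"optimization", "compiler"}]

avg_sim_interest = average_similarity(human, system)
```

Swapping the `sim` argument for a string-matching or taxonomic-overlap function would give the lexical and taxonomy variants of the same average.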
46
Evaluation of user model accuracy
User      Preference   Interest           Lexical   Taxonomy
User 1    20%          7.77% (48.7%)      25.97%    3.19%
User 2    80%          17.96% (50.5%)     25.97%    2.44%
User 3    90%          33.3% (66.67%)     27.62%    9.06%
User 4    50%          45.8% (72.5%)      41.87%    15.22%
User 5    40%          19.7% (38.54%)     35.4%     10.22%
Average   56%          24.89% (55.38%)    30.2%     8.02%
47
Discussion
Context:
Lexical similarity (30.2%) is in line with the work reported in (Maedche and Staab 2002) for similarity between two ontologies generated by humans.
Taxonomy similarity shows the differences between machine and humans.
Interests and Preferences are captured relatively accurately.
48
Evaluations with a hypothetical user
Metrics: precision, recall, average precision at three-point fixed recall
Testbed: home-made medical database, Cranfield, CACM, Medline.
Procedures: standard and new.
Compare with Ide dec-hi using term frequency inverse document frequency (TFIDF) (Salton and Buckley 90) (Lopez-Pujalte et al 03)
(HFES04, AH04)
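Average precision at three fixed recall points can be computed as in the sketch below. The slides do not state which recall levels are used, so 0.25, 0.5, and 0.75 are an assumption, as are the interpolation rule (best precision at any recall at or above the level) and the toy ranking.

```python
def three_point_average_precision(
    ranking: list[str],
    relevant: set[str],
    recall_levels: tuple[float, ...] = (0.25, 0.5, 0.75),  # assumed levels
) -> float:
    """Mean interpolated precision at the given fixed recall levels."""
    if not relevant:
        raise ValueError("the query needs at least one relevant document")
    points = []  # (recall, precision) measured at each relevant document
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    def interpolated(level: float) -> float:
        # Best precision achieved at any recall at or above this level.
        return max((p for r, p in points if r >= level), default=0.0)

    return sum(interpolated(level) for level in recall_levels) / len(recall_levels)


# Toy ranking of eight documents, four of which are relevant.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = {"d1", "d3", "d6", "d8"}

score = three_point_average_precision(ranking, relevant)
```

In the toy ranking, the interpolated precision is 1 at recall 0.25, 2/3 at recall 0.5, and 1/2 at recall 0.75, and the score is the mean of those three values.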
49
Standard procedure for IPC model
[Figure: three bar charts (CRANFIELD, CACM, Medline) of average precision for the initial run and the feedback run, comparing TFIDF/Ide dec-hi with the IPC Model]
50
New procedure for IPC model
[Figure: three bar charts (CRANFIELD, CACM, MEDLINE) of average precision for the initial run and the feedback run, comparing TFIDF/Ide dec-hi with Exp 1, Exp 2, Exp 3, and Exp 4]
51
Discussion
Effectiveness of feedback: Experiments 1, 3 and 4 show that, with feedback, average precision is always higher than in the initial run.
Competitiveness with TFIDF/Ide dec-hi:
MEDLINE, CRANFIELD: all experiments of the new procedure show competitive results in the feedback run while offering better results in the initial run.
CACM: competitive results with TFIDF/Ide dec-hi.
52
Relevant documents in top 15 for CRANFIELD
Rank TFIDF Exp1 Exp2 Exp3 Exp4
1 13 19 19 20 22
2 33 41 42 41 39
3 49 64 62 65 62
4 26 31 32 31 28
Total 121 155 155 157 151
53
Evaluation with real users (UM05)
Test bed: CNS collection on WMD and terrorism.
Our approach vs. Verity Query Language.
Subjects use two different systems in parallel.
There are 10 scripted queries on “Iran R&D supporting Biological Weapons”.
Only 10 documents are reviewed for relevancy.
Three analysts took part in the experiments.
54
Evaluation with real users
[Figure: two Venn diagrams, one for the User Model and one for Verity Query Language, showing how the documents marked relevant by User 1, User 2, and User 3 overlap]
                                                          User Model   Verity
Total unique relevant documents                           39           27
Documents marked as relevant by all 3 analysts            8            3
Documents marked as relevant by at least 2 analysts       15           19
Documents marked as relevant by only 1 analyst            24           8
55
Evaluation with real users
Retrieves more relevant documents compared to Verity Query Language.
Tracks individual differences better than off-the-shelf commercial keyword-based system.
56
Standard procedure for hybrid model
[Figure: CRANFIELD bar chart of average precision for the initial run and the feedback run, comparing TFIDF/Ide dec-hi, Non-hybrid, Hybrid (sidf), and two Hybrid (midf/sidf) configurations]
57
New procedure for hybrid model
[Figure: Experiment 1 bar chart of average precision for the initial run and the feedback run, comparing TFIDF/Ide dec-hi, Non-hybrid, Hybrid (sidf), Hybrid (midf/sidf)("isa"), and Hybrid (midf/sidf)("isa"+"related to")]
58
Discussion
The hybrid user model retrieves more relevant documents in the top 15 compared to the IPC model alone.
There is insufficient evidence to conclude which value function works best.
Implication: we can use a simple value function and still achieve good results.
[Figure: bar chart with a vertical axis from 114 to 134, comparing TFIDF, Standard, Exp 1, Exp 2, and Exp 3]
59
Outline
Problem
Motivation
Our approach: IPC Model, Hybrid Model
Empirical evaluation
Conclusion
60
Conclusion
Capturing a user’s intent and combining the captured user intent with elements of an IR system in a decision theoretic framework.
Novelties of our approach:
Hybrid user model which truly integrates information about a user and information about an IR system.
Unified evaluation framework.
Fine-grained representation of user model.
Learn user knowledge dynamically.
61
Future research directions
[Diagram: Human Factors, User Modeling, and Information Retrieval, connecting the User, the Intermediary, and the Information resources]
Text,Image
Single database, distributed database
System-based metrics, user-based metrics
Value function, utility function
Explanation
Small sample size, Large sample size
Quantitative, Qualitative
Single user, Group
Self-generated knowledge, common knowledge, hybrid
62
Research summary
User Models for Information Retrieval: ICAI00, IAT01, UM03, AAAI DC 2004
Empirical Evaluations of Adaptive User Model: UM03, AH04, UM05
Human Factors: HFES03, HFES04
Collaborative Filtering: UAI99, AAAI Workshop 99, AAAI Workshop 98
Planning, Agents, Distributed Computing: AIPS98, IC2000, SPIE05
63
Acknowledgement
This research has been funded by AFRL Human Effectiveness Directorate through Sytronics Inc. and Advanced Research and Development Activity (ARDA), US Government.
Thanks to Dr. Santos Jr, Dr. McCartney, Dr. Henning, Dr. Doan, Dr. Zhao, Hua Wang, Fei Gao, Erik Pukinskis, Bence Mayar, Chester Lee, Greg Johnson, Hang Dinh, Mohamed, and Feng Zhang.
64
References
Impacts of User Modeling on Personalization of Information Retrieval: An evaluation with human intelligence analysts. Eugene Santos Jr., Qunhua Zhao, Hien Nguyen, Hua Wang. 2005. In Technical report of Workshop on Evaluation of Adaptive Systems at UM 2005. To appear.
Capturing User Intent for Information Retrieval. 2004. Hien Nguyen, Eugene Santos Jr., Qunhua Zhao and Hua Wang. In Proceedings of the 48th Annual Meeting for the Human Factors and Ergonomics Society (HFES-04). New Orleans, LA. 2004. Pages 371-375.
Evaluation of Effects on Retrieval Performance for an Adaptive User Model. Hien Nguyen, Eugene Santos Jr., Qunhua Zhao and Chester Lee. 2004. In Adaptive Hypermedia 2004: Workshop Proceedings - Part I. Eindhoven, the Netherlands. Pages 193-202.
User Modeling for Intent Prediction in Information Analysis. 2003. Eugene Santos Jr., Hien Nguyen, Qunhua Zhao, and Hua Wang. In Proceedings of the 47th Annual Meeting for the Human Factors and Ergonomics Society (HFES-03), Denver, CO. Pages 1034-1038.
Empirical Evaluation of Adaptive User Modeling in a Medical Information Retrieval Application. 2003. Eugene Santos Jr., Hien Nguyen, Qunhua Zhao, and Erik Pukinskis. Lecture Notes in Artificial Intelligence 2702: User Modeling 2003 (Eds. P. Brusilovsky, A. Corbett, and F. de Rosis), Springer. Pages 292-296.
Kavanah: An Active User Interface Information Retrieval Application. 2001. Eugene Santos Jr., Hien Nguyen and Scott M. Brown. Proceedings of the 2nd Asia-Pacific Conference on Intelligent Agent Technology. Maebashi, Japan. Pages 412-423.
Active User Interface in a Knowledge Discovery and Retrieval System. 2000. Hien Nguyen, Mitch G. Saba, Eugene Santos, Jr. and Scott M. Brown. In Proceedings of the International Conference on Artificial Intelligence (ICAI2000). Las Vegas, Nevada. June 2000. Pages 339-344.
Medical Document Information Retrieval through Active User Interfaces. 2000. Eugene Santos Jr., Hien Nguyen, Scott M. Brown. In Proceedings of the International Conference on Artificial Intelligence (ICAI2000). Las Vegas, Nevada. June 2000. Pages 323-329. (Invited paper).