![Page 1: Active Feedback in Ad Hoc IR Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign](https://reader036.vdocuments.net/reader036/viewer/2022082908/5a4d1ad07f8b9ab05997138f/html5/thumbnails/1.jpg)
Active Feedback in Ad Hoc IR
Xuehua Shen, ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
Normal Relevance Feedback (RF)
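Query → Retrieval System (over the Document Collection)
Top K Results shown to the User: d1 3.5, d2 2.4, …, dk 0.5
Judgments from the User: d1 +, d2 -, …, dk -
Feedback: the judgments are returned to the Retrieval System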
Document Selection in RF
Query → Retrieval System (over the Document Collection)
Which k docs to present to the User?
Judgments from the User: d1 +, d2 -, …, dk -
Feedback: the judgments are returned to the Retrieval System
Can we do better than just presenting top-K? (Consider diversity…)
Active Feedback (AF)
An IR system actively selects documents for obtaining relevance judgments
If a user is willing to judge K documents,
which K documents should we present
in order to maximize learning effectiveness?
Outline
• Framework and specific methods
• Experiment design and results
• Summary and future work
A Framework for Active Feedback
• Consider active feedback as a decision problem
  – Decide K documents (D) for relevance judgment
• Formalize it as an optimization problem
  – Optimize the expected learning benefits (loss) by requesting relevance judgments on D from the user
• Consider two cases of loss function according to the interaction between documents
Formula of the Framework
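$$D^* = \arg\min_{D} \int_{\Theta} L(D, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

$$L(D, \theta) = \sum_{j} l(D, j, \theta)\, p(j \mid D, U, \theta) = \sum_{j} l(D, j, \theta) \prod_{i=1}^{k} p(j_i \mid d_i, U, \theta)$$

• l(D, j, θ): value of documents for learning
• Sum over j: different judgments
• Product over i: independent judgment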
Independent Loss
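$$L(D, \theta) = \sum_{j} l(D, j, \theta) \prod_{i=1}^{k} p(j_i \mid d_i, U, \theta)$$

Independent Loss:

$$l(D, j, \theta) = \sum_{i=1}^{k} l(d_i, j_i, \theta)$$

$$L(D, \theta) = \sum_{i=1}^{k} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)$$

$$D^* = \arg\min_{D} \sum_{i=1}^{k} \int_{\Theta} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

Expected loss of each document:

$$r(d_i) = \int_{\Theta} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$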
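Under the independent loss, the selection reduces to ranking documents by their individual expected loss and keeping the K smallest. A minimal Python sketch of that idea follows. It assumes binary judgments and approximates the integral over the model parameter with a handful of posterior samples of the relevance probability. The function names, the sample-based approximation, and the toy numbers are illustrative assumptions, not taken from the slides.

```python
from typing import Callable, List, Mapping, Sequence


def expected_loss(p_rel_samples: Sequence[float],
                  loss: Callable[[int, float], float]) -> float:
    """Monte Carlo estimate of r(d) for one document.

    Each entry of p_rel_samples is p(j=1 | d, U, theta) under one sampled
    theta (an assumed approximation of the integral over theta).
    loss(j, p) plays the role of l(d, j, theta) for judgment j in {0, 1}.
    """
    total = 0.0
    for p in p_rel_samples:
        total += loss(1, p) * p + loss(0, p) * (1.0 - p)
    return total / len(p_rel_samples)


def select_independent(doc_samples: Mapping[str, Sequence[float]],
                       loss: Callable[[int, float], float],
                       k: int) -> List[str]:
    """Pick the k documents with the smallest estimated expected loss."""
    risks = {d: expected_loss(s, loss) for d, s in doc_samples.items()}
    return sorted(risks, key=risks.get)[:k]


# Hypothetical data: two posterior samples per document.
samples = {"a": [0.9, 0.7], "b": [0.2, 0.4], "c": [0.5, 0.5]}
# Constant loss: 0 for a relevant judgment, 1 for a non-relevant one.
chosen = select_independent(samples, lambda j, p: 0.0 if j == 1 else 1.0, k=2)
print(chosen)
```

Because the total loss is a sum of per-document terms, no interaction between the chosen documents is modeled here.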
Independent Loss (cont.)
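$$r(d_i) = \int_{\Theta} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

Top K

$$\forall d_i \in C:\quad l(d_i, 1, \theta) = C_1,\quad l(d_i, 0, \theta) = C_0,\quad C_1 < C_0$$

$$r(d_i) = C_0 + (C_1 - C_0) \int_{\Theta} p(j_i = 1 \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

Relevant docs more useful than non-relevant docs

Uncertainty Sampling

$$\forall d_i \in C:\quad l(d_i, 1, \theta) = \log p(R = 1 \mid d_i, \theta),\quad l(d_i, 0, \theta) = \log p(R = 0 \mid d_i, \theta)$$

$$r(d_i) = -\int_{\Theta} H(R \mid d_i, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

More uncertain, more useful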
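The two special cases can be compared side by side in a short Python sketch. It assumes a single point estimate of the relevance probability per document instead of an integral over the model parameter, and the default constants (0 for a relevant judgment, 1 for a non-relevant one) are illustrative. All names and numbers are hypothetical.

```python
import math
from typing import Callable, Dict, List


def top_k_risk(p_rel: float, c1: float = 0.0, c0: float = 1.0) -> float:
    """Constant-loss case: C0 + (C1 - C0) * p(j=1 | d), with C1 < C0."""
    return c0 + (c1 - c0) * p_rel


def uncertainty_risk(p_rel: float) -> float:
    """Log-loss case: the risk is the negative entropy of the judgment."""
    entropy = 0.0
    for p in (p_rel, 1.0 - p_rel):
        if p > 0.0:  # treat 0 * log(0) as 0
            entropy -= p * math.log(p)
    return -entropy


def select(p_rel_by_doc: Dict[str, float],
           risk: Callable[[float], float], k: int) -> List[str]:
    """Return the k documents with the lowest risk."""
    return sorted(p_rel_by_doc, key=lambda d: risk(p_rel_by_doc[d]))[:k]


# Hypothetical relevance estimates.
probs = {"d1": 0.9, "d2": 0.5, "d3": 0.1, "d4": 0.4}
by_top_k = select(probs, top_k_risk, k=2)
by_uncertainty = select(probs, uncertainty_risk, k=2)
print(by_top_k, by_uncertainty)
```

With these numbers the constant loss favors the documents most likely to be relevant, while the log loss favors the documents whose relevance is closest to a coin flip.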
Dependent Loss
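$L(D, U, \theta)$ combines two terms:
• A relevance term, $\sum_{i=1}^{k} p(j_i = 1 \mid d_i, U, \theta)$: more relevant, more useful
• A diversity term over the selected set D: more diverse, more useful

Methods:
• MMR
• Gapped Top K: pick one doc every G+1 docs
• K Cluster Centroid: first select the top N docs of baseline retrieval, then cluster the N docs into K clusters
• …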
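One way to trade relevance against diversity is a greedy MMR-style selection. The Python sketch below uses the standard maximal marginal relevance criterion as an illustration, not the exact loss on the slide. The weight `lam`, the similarity function, and the toy scores are assumptions.

```python
from typing import Callable, Dict, List


def mmr_select(scores: Dict[str, float],
               sim: Callable[[str, str], float],
               k: int, lam: float = 0.5) -> List[str]:
    """Greedily pick k docs, balancing relevance against redundancy.

    Each step picks the candidate maximizing
    lam * relevance - (1 - lam) * (max similarity to docs already picked).
    """
    selected: List[str] = []
    candidates = set(scores)
    while candidates and len(selected) < k:
        def marginal(d: str) -> float:
            redundancy = max((sim(d, s) for s in selected), default=0.0)
            return lam * scores[d] - (1.0 - lam) * redundancy
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical retrieval scores and pairwise similarities.
scores = {"a": 1.0, "b": 0.9, "c": 0.5}
pair_sim = {frozenset(("a", "b")): 0.95}


def toy_sim(x: str, y: str) -> float:
    return pair_sim.get(frozenset((x, y)), 0.1)


picked = mmr_select(scores, toy_sim, k=2)
print(picked)
```

Here "b" scores almost as high as "a" but is nearly a duplicate of it, so the less similar "c" is picked second.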
Illustration of Three AF Methods
Ranked list: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 …
• Top-K (normal feedback)
• Gapped Top-K
• K-Cluster Centroid
Aiming at high diversity …
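Gapped Top-K is simple enough to write as a single slice over the ranked list. In this Python sketch, a gap of G means G documents are skipped between consecutive picks, so one document is taken every G+1 ranks. The function name and the example ranking are illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def gapped_top_k(ranked: Sequence[T], k: int, gap: int) -> List[T]:
    """Take every (gap + 1)-th document from the ranked list, up to k docs.

    gap = 0 reduces to ordinary Top-K.
    """
    return list(ranked[::gap + 1][:k])


ranking = list(range(1, 17))  # ranks 1..16, as in the illustration
print(gapped_top_k(ranking, k=4, gap=0))
print(gapped_top_k(ranking, k=4, gap=2))
```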
Evaluating Active Feedback
Query → Initial Results (No Feedback)
Select K Docs (Top-k, Gapped, Clustering) → K docs
K docs + Judgment File → Judged Docs (+ + + - -)
Judged Docs → Feedback → Feedback Results
Retrieval Methods (Lemur toolkit)
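Query Q → query model $\theta_Q$
Document D → document model $\theta_D$
KL Divergence: rank by $D(\theta_Q \,\|\, \theta_D)$ → Results
Active Feedback → Feedback Docs F = {d1, …, dn}
Mixture Model Feedback (only learn from relevant docs): F → feedback model $\theta_F$

$$\theta_{Q'} = (1 - \alpha)\,\theta_Q + \alpha\,\theta_F$$

Default parameter settings unless otherwise stated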
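The interpolation step and the KL-divergence ranking can be sketched in a few lines of Python. This is not the Lemur implementation. It assumes unigram language models stored as plain dictionaries, a feedback model that has already been estimated, and document models smoothed so that every query term has non-zero probability. All names and numbers are hypothetical.

```python
import math
from typing import Dict

Model = Dict[str, float]  # unigram language model: word -> probability


def interpolate(theta_q: Model, theta_f: Model, alpha: float) -> Model:
    """Updated query model: (1 - alpha) * theta_Q + alpha * theta_F."""
    vocab = set(theta_q) | set(theta_f)
    return {w: (1.0 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
            for w in vocab}


def kl_divergence(theta_q: Model, theta_d: Model) -> float:
    """D(theta_Q || theta_D); smaller means the document matches better.

    Assumes theta_d is smoothed, i.e. non-zero for every query-model word.
    """
    return sum(p * math.log(p / theta_d[w])
               for w, p in theta_q.items() if p > 0.0)


theta_q = {"active": 0.5, "feedback": 0.5}
theta_f = {"feedback": 0.5, "relevance": 0.5}  # assumed already estimated
new_q = interpolate(theta_q, theta_f, alpha=0.5)

docs = {
    "doc1": {"active": 0.4, "feedback": 0.4, "relevance": 0.2},
    "doc2": {"active": 0.1, "feedback": 0.1, "relevance": 0.8},
}
ranked = sorted(docs, key=lambda d: kl_divergence(new_q, docs[d]))
print(ranked)
```

Estimating the feedback model itself (the mixture model step) is left out. Only the way alpha mixes it into the query model is shown.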
Comparison of Three AF Methods
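| Collection | Active FB Method | #AFRel per topic | MAP (include judged docs) | Pr@10doc (include judged docs) |
|---|---|---|---|---|
| HARD 2003 | Baseline | / | 0.301 | 0.501 |
| HARD 2003 | Pseudo FB | / | 0.320 | 0.515 |
| HARD 2003 | Top-K | 3.0 | 0.325 | 0.527 |
| HARD 2003 | Gapped | 2.6 | 0.330** | 0.548* |
| HARD 2003 | Clustering | 2.4 | 0.332 | 0.565 |
| AP88-89 | Baseline | / | 0.201 | 0.326 |
| AP88-89 | Pseudo FB | / | 0.218 | 0.343 |
| AP88-89 | Top-K | 2.2 | 0.228 | 0.351 |
| AP88-89 | Gapped | 1.5 | 0.234* | 0.389** |
| AP88-89 | Clustering | 1.3 | 0.237** | 0.393** |

Top-K is the worst!
Clustering uses fewest relevant docs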
Appropriate Evaluation of Active Feedback
• Original DB with judged docs (AP88-89, HARD)
  – Can’t tell if the ranking of un-judged documents is improved
• Original DB without judged docs
  – Different methods have different test documents
• New DB (AP88-89, AP90)
  – See the learning effect more explicitly
  – But the docs must be similar to original docs
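The difference between the first two settings is easiest to see in code. The Python sketch below computes precision at a cutoff and average precision, once on the full ranking and once after removing the judged documents from both the ranking and the relevant set. The function names and the toy ranking are assumptions, and average precision is normalized by the number of relevant documents, which is the usual convention.

```python
from typing import List, Set, Tuple


def precision_at(ranking: List[str], relevant: Set[str], n: int = 10) -> float:
    """Fraction of the top n ranked docs that are relevant."""
    return sum(1 for d in ranking[:n] if d in relevant) / n


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant doc occurs."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def remove_judged(ranking: List[str], relevant: Set[str],
                  judged: Set[str]) -> Tuple[List[str], Set[str]]:
    """Drop judged docs from both the ranking and the relevant set."""
    return [d for d in ranking if d not in judged], relevant - judged


ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
judged = {"d1", "d2"}  # docs the user judged during feedback

with_judged = average_precision(ranking, relevant)
res_ranking, res_relevant = remove_judged(ranking, relevant, judged)
without_judged = average_precision(res_ranking, res_relevant)
print(with_judged, without_judged, precision_at(res_ranking, res_relevant, n=2))
```

Since each selection method judges a different set of documents, the residual ranking differs per method, which is the comparability problem noted for the second setting.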
Retrieval Performance on AP90 Dataset
| Method | Baseline | Pseudo FB | Top K | Gapped Top K | K Cluster Centroid |
|---|---|---|---|---|---|
| MAP | 0.203 | 0.220 | 0.220 | 0.222 | 0.223 |
| pr@10 | 0.295 | 0.317 | 0.321 | 0.326** | 0.325 |
Top-K is consistently the worst!
Mixture Model Parameter Factor
[Figure: effect of the mixture model parameter alpha on performance. x-axis: alpha (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98); y-axis: pr@10docs (0 to 0.7); series: Top K, Gapped Top K, and K Cluster Centroid, each on HARD and on AP88-89]

$$\theta_{Q'} = (1 - \alpha)\,\theta_Q + \alpha\,\theta_F$$
Summary
• Introduce the active feedback problem
• Propose a preliminary framework and three methods (Top-k, Gapped Top-k, Clustering)
• Study the evaluation strategy
• Experiment results show that
  – Presenting the top-k is not the best strategy
  – Clustering can generate fewer, higher quality feedback examples
Future Work
• Explore other methods for active feedback
• Develop a general framework
• Combine pseudo feedback and active feedback
Thank you!
The End