Lab 2 · 2019-03-20 · Schedule: Ground-Truth. Ad-hoc metrics for SE evaluation. Implementation of...
Schedule
● Ground-Truth.
● Ad-hoc metrics for SE evaluation.
● Implementation of some metrics for SE evaluation in Python.
SE: black box
Input: query
Output: ranked list of document identifiers
SE
1. did_3
2. did_5
3. did_1
4. did_10
5. did_4
6. did_2
7. did_7
8. did_6
...
How to choose the best one?
SE_1
1. did_3
2. did_5
3. did_1
4. did_0
5. did_4
...
SE_2
1. did_73
2. did_3
3. did_42
4. did_10
5. did_3
...
Ground-Truth
Ground-Truth (GT): the set of relevant documents for a particular query.
We do not have relevance scores.
We have no order among the relevant documents.
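A minimal Python sketch (Python 3.9+) of how the two kinds of data could be held in memory, assuming dictionaries keyed by query id; the names `ground_truth`, `results_se_1` and `binary_relevance` are hypothetical. The Ground-Truth becomes a set (no scores, no order), the engine output an ordered list, and the only relevance signal left is set membership:

```python
# Hypothetical in-memory layout (names are illustrative, not from the lab files).
# Ground-Truth: query id -> SET of relevant document ids (no scores, no order).
ground_truth: dict[str, set[str]] = {
    "q_1": {"did_3", "did_6", "did_2", "did_8", "did_0"},
    "q_2": {"did_7", "did_3"},
}

# Search engine output: query id -> ORDERED list of document ids (rank 1 first).
results_se_1: dict[str, list[str]] = {
    "q_1": ["did_0", "did_1", "did_2", "did_3", "did_4",
            "did_5", "did_6", "did_7", "did_8", "did_9"],
    "q_2": ["did_3", "did_7", "did_2", "did_1", "did_4",
            "did_5", "did_6", "did_0", "did_8", "did_9"],
}


def binary_relevance(ranking: list[str], relevant: set[str]) -> list[int]:
    """Map each ranked document to 1 if it belongs to the Ground-Truth, else 0."""
    return [1 if doc_id in relevant else 0 for doc_id in ranking]


if __name__ == "__main__":
    print(binary_relevance(results_se_1["q_1"], ground_truth["q_1"]))
    # → [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
```

Because the Ground-Truth carries no order, two engines can only be compared by where they place the members of this set, which is what the following metrics measure.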
Which document is the most important for query q_5?
How to choose the best one?
SE_1
1. did_9
2. did_5
3. did_7
4. did_0
5. did_4
...
SE_2
1. did_3
2. did_4
3. did_42
4. did_10
5. did_1
...
GT(q_1) = {did_3, did_6, did_2, did_8, did_0}
Result_SE_1(q_1) = [did_0, did_1, did_2, did_3, did_4, did_5, did_6, did_7, did_8, did_9, ...]
GT(q_2) = {did_7, did_3}
Result_SE_1(q_2) = [did_3, did_7, did_2, did_1, did_4, did_5, did_6, did_2, did_8, did_9, ...]
Result_SE_2(q_2) = [did_0, did_1, did_2, did_8, did_4, did_5, did_6, did_9, did_7, did_3, ...]
GT(q_2) = {did_7, did_3}
Result_SE_2(q_2) = [did_7, did_3, did_2, did_1, did_4, did_5, did_6, did_0, did_8, did_9, ...]
GT(q_3) = {did_9, did_5, did_7, did_3, did_1, did_11, did_21, did_43, did_27}
Result_SE_1(q_3) = [did_3, did_7, did_2, did_6, did_4, did_10, did_8, did_14, did_12, did_0, ...]
P@k
● The order within the first k positions does not matter.
● k is not related to |GT|.
\text{P@k}(q)=\frac{\text{Number of relevant documents in the first } k \text{ positions}}{k}
R-Precision
● The order within the first |GT(q)| positions does not matter.
● k is now related to |GT|: k = |GT(q)|.
\text{R-Precision}(q)=\frac{\text{Number of relevant documents in the first } |GT(q)| \text{ positions}}{|GT(q)|}
What is the relation between P@k and R-Precision?
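A possible Python sketch of both formulas, assuming a ranking is a list of document ids and GT(q) is a set of ids (the function names are hypothetical). `r_precision` simply calls `precision_at_k` with k = |GT(q)|, which follows directly from the two formulas:

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """P@k: number of relevant documents in the first k positions, divided by k.

    The order inside the top k is ignored, and k is chosen independently
    of |GT(q)|. Following the formula literally, we divide by k even if
    the engine returned fewer than k documents.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def r_precision(ranking: list[str], relevant: set[str]) -> float:
    """R-Precision: P@k with k fixed to |GT(q)|."""
    if not relevant:
        raise ValueError("R-Precision is undefined for an empty Ground-Truth")
    return precision_at_k(ranking, relevant, len(relevant))


if __name__ == "__main__":
    gt_q1 = {"did_3", "did_6", "did_2", "did_8", "did_0"}
    se1_q1 = [f"did_{i}" for i in range(10)]  # did_0 ... did_9
    print(precision_at_k(se1_q1, gt_q1, 5))   # → 0.6
    print(precision_at_k(se1_q1, gt_q1, 10))  # → 0.5
    print(r_precision(se1_q1, gt_q1))         # → 0.6, since |GT(q_1)| = 5
```

Raising an error for k <= 0 or an empty Ground-Truth is a design choice of this sketch; the formulas are simply undefined there.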
GT(q_2) = {did_7, did_3}
Result_SE_1(q_3) = [did_7, did_3, did_2, did_1, did_4, did_5, did_6, did_0, did_8, did_9, ...]
GT(q_3) = {did_9, did_5, did_7, did_3, did_1, did_11, did_21, did_43, did_27}
Result_SE_1(q_3) = [did_3, did_7, did_2, did_6, did_4, did_10, did_8, did_14, did_12, did_0, ...]
GT(q_4) = {did_8, did_6, did_4, did_2, did_0}
Result_SE_1(q_4) = [did_3, did_5, did_7, did_0, did_2, did_1, did_6, did_9, did_8, did_4, ...]
Result_SE_2(q_4) = [did_6, did_0, did_9, did_1, did_3, did_4, did_7, did_5, did_8, did_2, ...]
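As a worked instance of the formulas on the q_4 lists above (|GT(q_4)| = 5, so only the first five positions count), both engines obtain the same R-Precision even though SE_2 places its relevant documents at positions 1 and 2 while SE_1 places them at positions 4 and 5. This is what "the order does not matter" means in practice; a smaller cut-off such as P@2 does separate them:

```latex
\text{R-Precision}_{SE_1}(q_4) = \frac{|\{\text{did\_0}, \text{did\_2}\}|}{|GT(q_4)|} = \frac{2}{5} = 0.4
\text{R-Precision}_{SE_2}(q_4) = \frac{|\{\text{did\_6}, \text{did\_0}\}|}{|GT(q_4)|} = \frac{2}{5} = 0.4
\text{P@2}_{SE_1}(q_4) = \frac{0}{2} = 0 \qquad \text{P@2}_{SE_2}(q_4) = \frac{2}{2} = 1
```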
MRR: Mean Reciprocal Rank
GT(q_1) = {did_3, did_6, did_2, did_8, did_0}
GT(q_2) = {did_7, did_3}
GT(q_3) = {did_9, did_5, did_7, did_3, did_1, did_11, did_21, did_43, did_27}
GT(q_4) = {did_8, did_6, did_4, did_2, did_0}
Result_SE_1(q_1) = [did_0, did_1, did_2, did_3, did_4, did_5, did_6, did_7, did_8, did_9, ...]
Result_SE_1(q_2) = [did_3, did_7, did_2, did_1, did_4, did_5, did_6, did_0, did_8, did_9, ...]
Result_SE_1(q_3) = [did_7, did_3, did_2, did_1, did_4, did_5, did_6, did_0, did_8, did_9, ...]
Result_SE_1(q_4) = [did_3, did_5, did_7, did_0, did_2, did_1, did_6, did_9, did_8, did_4, ...]
Result_SE_2(q_1) = [did_2, did_5, did_7, did_6, did_4, did_3, did_9, did_0, did_8, did_1, ...]
Result_SE_2(q_2) = [did_1, did_3, did_2, did_8, did_0, did_5, did_6, did_4, did_7, did_9, ...]
Result_SE_2(q_3) = [did_1, did_3, did_0, did_4, did_7, did_6, did_9, did_5, did_2, did_8, ...]
Result_SE_2(q_4) = [did_9, did_6, did_4, did_0, did_1, did_3, did_5, did_8, did_2, did_7, ...]
MRR(Q) = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{index_q(FirstRelevantResult)}
where Q is the set of queries and index_q(FirstRelevantResult) is the 1-based position of the first relevant document in the ranking returned for q.
nDCG: normalized Discounted Cumulative Gain
GT(q_1) = {did_3, did_6, did_2, did_8, did_2}
Result_SE_1(q_1) = [did_0, did_1, did_2, did_3, did_4, did_5, did_6, did_7, did_8, did_9]
GT(q_2) = {did_7, did_3}
Result_SE_1(q_2) = [did_3, did_7, did_2, did_1, did_4, did_5, did_6, did_2, did_8, did_9]
Result_SE_2(q_2) = [did_0, did_1, did_2, did_8, did_4, did_5, did_6, did_9, did_7, did_3]
GT(q_3) = {did_0, did_2, did_2, did_3, did_4, did_5, did_6, did_7, did_8, did_9}
Result_SE_1(q_3) = [did_0, did_1, did_2, did_3, did_4, did_5, did_6, did_7, did_8, did_9]
nDCG(q,k) = \frac{DCG(q,k)}{IDCG(q,k)}
DCG(q,k) = relevance(doc_1, q) + \sum_{position=2}^{k} \frac{relevance(doc_{position}, q)}{\log_2(position)}
Since the Ground-Truth has no scores, relevance(doc, q) is binary: 1 if doc ∈ GT(q), 0 otherwise.
IDCG(q,k) is the DCG(q,k) of a perfect ranking algorithm, i.e., one that ranks the relevant documents of GT(q) first.
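A sketch of nDCG@k in Python that follows the DCG formula above literally (no discount at position 1, division by log2(position) afterwards), assuming binary relevance taken from GT membership; the function names are hypothetical. Other references use a log2(position + 1) discount instead, so absolute values are only comparable under the same definition:

```python
import math


def dcg(ranking: list[str], relevant: set[str], k: int) -> float:
    """DCG(q, k): relevance at position 1, then relevance / log2(position).

    Assumption: relevance is binary (1 if the document is in GT(q), else 0).
    """
    score = 0.0
    for position, doc_id in enumerate(ranking[:k], start=1):
        rel = 1.0 if doc_id in relevant else 0.0
        score += rel if position == 1 else rel / math.log2(position)
    return score


def ndcg(ranking: list[str], relevant: set[str], k: int) -> float:
    """nDCG(q, k) = DCG(q, k) / IDCG(q, k)."""
    # A perfect engine ranks all relevant documents first; with binary
    # relevance their mutual order is irrelevant, so any order works.
    ideal_ranking = sorted(relevant)
    idcg = dcg(ideal_ranking, relevant, k)
    if idcg == 0.0:
        return 0.0  # assumption: empty Ground-Truth scores 0
    return dcg(ranking, relevant, k) / idcg


if __name__ == "__main__":
    gt_q1 = {"did_3", "did_6", "did_2", "did_8", "did_0"}
    se1_q1 = [f"did_{i}" for i in range(10)]
    print(round(ndcg(se1_q1, gt_q1, 5), 4))  # → 0.5983
```

For q_1 at k = 5 this is DCG = 1 + 1/log2(3) + 1/log2(4) ≈ 2.131 over IDCG = 1 + 1 + 1/log2(3) + 1/log2(4) + 1/log2(5) ≈ 3.562.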
Your turn ;)
GT: /Lab_2/GT/Ground_Truth.tsv
Results from SE_1: /Lab_2/SEs_Results/Results_from_SE1.tsv
Results from SE_2: /Lab_2/SEs_Results/Results_from_SE2.tsv
Results from SE_2: /Lab_2/SEs_Results/Results_from_SE2.tsv
Write a Python program to evaluate the performance of these three SEs.
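The slide does not describe the layout of the .tsv files, so this starter sketch assumes, hypothetically, a header row with `Query_id` and `Doc_ID` columns in the Ground-Truth file and an extra `Rank` column in each results file. The column constants (`QUERY_COL`, `DOC_COL`, `RANK_COL`) and function names are placeholders to adapt to the real headers:

```python
import csv
import io
from collections import defaultdict
from typing import TextIO

# Hypothetical column names: check them against the real .tsv headers.
QUERY_COL, DOC_COL, RANK_COL = "Query_id", "Doc_ID", "Rank"


def load_ground_truth(stream: TextIO) -> dict[str, set[str]]:
    """Read Ground-Truth rows into {query_id: set of relevant doc ids}."""
    gt = defaultdict(set)
    for row in csv.DictReader(stream, delimiter="\t"):
        gt[row[QUERY_COL]].add(row[DOC_COL])
    return dict(gt)


def load_results(stream: TextIO) -> dict[str, list[str]]:
    """Read one SE's rows into {query_id: doc ids ordered by ascending rank}."""
    pairs = defaultdict(list)
    for row in csv.DictReader(stream, delimiter="\t"):
        pairs[row[QUERY_COL]].append((int(row[RANK_COL]), row[DOC_COL]))
    # Sort on the numeric rank so that rank 10 comes after rank 2.
    return {q: [doc for _, doc in sorted(ranked)] for q, ranked in pairs.items()}


if __name__ == "__main__":
    # In-memory demo rows in the assumed layout.
    demo_gt = io.StringIO("Query_id\tDoc_ID\nq_2\tdid_7\nq_2\tdid_3\n")
    demo_se = io.StringIO("Query_id\tDoc_ID\tRank\nq_2\tdid_7\t2\nq_2\tdid_3\t1\n")
    print(load_ground_truth(demo_gt))  # q_2 maps to the set {did_7, did_3}
    print(load_results(demo_se))       # → {'q_2': ['did_3', 'did_7']}
```

With the real files, open each path listed above with `newline=""` and pass the returned dictionaries to your P@k, R-Precision, MRR and nDCG implementations.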