unbiased offline recommender evaluation for missing-not-at...
TRANSCRIPT
Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback
1
Funders:
Longqi Yang Yin Cui Yuan Xuan Chenyang Wang Serge Belongie Deborah Estrin
2
Offline Evaluation of Recommendation Algorithm
( , )( , )
( , )… …
user-item interactions
R
recommendation algorithms
rewards
…
…
R
recommendation algorithms
rewards
…
3
Offline Evaluation of Recommendation Algorithm
( , )( , )
( , )…
user interaction history
Pros:• Cost effective.• Efficient.• Iterate faster.• Experiment before deployment.
…
R
recommendation algorithms
rewards
…
4
Offline Evaluation of Recommendation Algorithm
( , )( , )
( , )…
user interaction history
Pros:• Cost effective.• Efficient.• Iterate faster.• Experiment before deployment.
Cons:• The data is Missing-Not-At-Random (MNAR)
5
Offline Evaluation procedure
user !
item "
user ! interacted with item "
6
Offline Evaluation procedure
train/test
7
Offline Evaluation procedure
1. Train and validate a recommendation model
2. Averaged performance over held-out (user, item) interaction pairs
(Average-Over-All)
8
Offline Evaluation procedure
1. Train and validate a recommendation model
(policy) !
2. Averaged performance over held-out (user, item) interaction pairs
(Average-Over-All)
Rating-based recommendation systems
Implicit feedback-based recommendation systems
9
Previous work: Average-Over-All is biased for rating-based recommendation systems, because ratings are MNAR
[Marlin et al. 09], [Schnabel et al. 16], [Steck 10], [Steck 11], and [Steck 13]
10
Previous work: Average-Over-All is biased for rating-based recommendation systems, because ratings are MNAR
[Marlin et al. 09], [Schnabel et al. 16], [Steck 10], [Steck 11], and [Steck 13]
Previous work: Average-Over-All is unbiased for implicit feedback-based recommendation systems, because implicit
feedback is missing uniformly at random.[Lim 15]
11
This work: Average-Over-All is biased for implicit feedback-based recommendation systems, because
implicit feedback is NOT missing uniformly at random.
12
This work: Average-Over-All is biased for implicit feedback-based recommendation systems, because
implicit feedback is NOT missing uniformly at random.
Popularity bias (Users are more likely to be exposed to popular items)
trending recommendation
13
A Hypothetical Example
Popular Items Long-tail Items
# of liked items(over all items)
# of liked items(over observations)
1 10
10 1
:
:
Algorithm 1 Performance
Algorithm 2 Performance
0.8
0.75 0.75
0
14
A Hypothetical Example
Popular Items Long-tail Items
# of liked items(over all items)
# of liked items(over observations)
1 10
10 1
:
:
Algorithm 1 Performance
Algorithm 2 Performance
0.8
0.75 0.75
0
15
A Hypothetical Example
Popular Items Long-tail Items
# of liked items(over all items)
# of liked items(over observations)
1 10
10 1
:
:
Algorithm 1 Performance
Algorithm 2 Performance
0.8
0.75 0.75
0
16
A Hypothetical Example
Popular Items Long-tail Items
# of liked items(over all items)
# of liked items(over observations)
1 10
10 1
:
:
Algorithm 1 Performance
Algorithm 2 Performance
0.8
0.75 0.75
0
17
A Hypothetical Example
Popular Items Long-tail Items
# of liked items(over all items)
# of liked items(over observations)
1 10
10 1
:
:
Algorithm 1 Performance
Algorithm 2 Performance
0.8
0.75 0.75
0
18
A Hypothetical Example
Popular Items Long-tail Items
# of liked items(over all items)
# of liked items(over observations)
1 10
10 1
:
:
Algorithm 1 Performance
Algorithm 2 Performance
0.8
0.75 0.75
0Any sensible evaluation
19
A Hypothetical Example
Popular Items Long-tail Items
# of liked items(over all items)
# of liked items(over observations)
1 10
10 1
:
:
Algorithm 1 Performance
Algorithm 2 Performance
0.8
0.75 0.75
0
Average-Over-All
20
Formalize Reward !
R(Z) =1
|U|X
u2U
1
|Su|X
i2Su
c(Zu,i)Ideal evaluation:
Item rankings predicted by an algorithm<latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit>
21
Formalize Reward !
R(Z) =1
|U|X
u2U
1
|Su|X
i2Su
c(Zu,i)Ideal evaluation:
Predicted ranking of item i for user u
Items liked by user u among the entire item set
Reward for (u, i) pair
scoring metric<latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit>
Item rankings predicted by an algorithm<latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit>
22
Formalize Reward !
R(Z) =1
|U|X
u2U
1
|Su|X
i2Su
c(Zu,i)Ideal evaluation:
Predicted ranking of item i for user u
Items liked by user u among the entire item set
Reward for user u
scoring metric<latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit>
Item rankings predicted by an algorithm<latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit>
23
Formalize Reward !
R(Z) =1
|U|X
u2U
1
|Su|X
i2Su
c(Zu,i)Ideal evaluation:
Predicted ranking of item i for user u
Items liked by user u among the entire item set scoring metric<latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit>
Item rankings predicted by an algorithm<latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit>
Reward for the algorithm<latexit sha1_base64="58zCSwWB0/MRbmW4ak6tFuTVoiw=">AAACAXicdVDLSgMxFM3UV62vqhvBTbAIrkpa+3InuHGpYlWopWTSO21oZjIkd5RS6sZfceNCEbf+hTv/xvQhqOhZhMM593Jzjh8raZGxDy81Mzs3v5BezCwtr6yuZdc3LqxOjIC60EqbK59bUDKCOkpUcBUb4KGv4NLvHY38yxswVuroHPsxNEPeiWQgBUcntbJbZ3DLTZsG2lDsAuWqo43EbtjK5lie1Ur7rEZZvlIqFytVR9x7UGa0kGdj5MgUJ63s+3VbiySECIXi1jYKLMbmgBuUQsEwc51YiLno8Q40HI14CLY5GCcY0l2nTD4R6AjpWP2+MeChtf3Qd5Mhx6797Y3Ev7xGgkGtOZBRnCBEYnIoSBRFTUd10LY0IFD1HeHCBZeCii43XKArLeNK+EpK/ycXxXzB8dNi7rA0rSNNtskO2SMFUiWH5JickDoR5I48kCfy7N17j96L9zoZTXnTnU3yA97bJ5YylvE=</latexit><latexit sha1_base64="58zCSwWB0/MRbmW4ak6tFuTVoiw=">AAACAXicdVDLSgMxFM3UV62vqhvBTbAIrkpa+3InuHGpYlWopWTSO21oZjIkd5RS6sZfceNCEbf+hTv/xvQhqOhZhMM593Jzjh8raZGxDy81Mzs3v5BezCwtr6yuZdc3LqxOjIC60EqbK59bUDKCOkpUcBUb4KGv4NLvHY38yxswVuroHPsxNEPeiWQgBUcntbJbZ3DLTZsG2lDsAuWqo43EbtjK5lie1Ur7rEZZvlIqFytVR9x7UGa0kGdj5MgUJ63s+3VbiySECIXi1jYKLMbmgBuUQsEwc51YiLno8Q40HI14CLY5GCcY0l2nTD4R6AjpWP2+MeChtf3Qd5Mhx6797Y3Ev7xGgkGtOZBRnCBEYnIoSBRFTUd10LY0IFD1HeHCBZeCii43XKArLeNK+EpK/ycXxXzB8dNi7rA0rSNNtskO2SMFUiWH5JickDoR5I48kCfy7N17j96L9zoZTXnTnU3yA97bJ5YylvE=</latexit><latexit sha1_base64="58zCSwWB0/MRbmW4ak6tFuTVoiw=">AAACAXicdVDLSgMxFM3UV62vqhvBTbAIrkpa+3InuHGpYlWopWTSO21oZjIkd5RS6sZfceNCEbf+hTv/xvQhqOhZhMM593Jzjh8raZGxDy81Mzs3v5BezCwtr6yuZdc3LqxOjIC60EqbK59bUDKCOkpUcBUb4KGv4NLvHY38yxswVuroHPsxNEPeiWQgBUcntbJbZ3DLTZsG2lDsAuWqo43EbtjK5lie1Ur7rEZZvlIqFytVR9x7UGa0kGdj5MgUJ63s+3VbiySECIXi1jYKLMbmgBuUQsEwc51YiLno8Q40HI14CLY5GCcY0l2nTD4R6AjpWP2+MeChtf3Qd5Mhx6797Y3Ev7xGgkGtOZBRnCBEYnIoSBRFTUd10LY0IFD1HeHCBZeCii43XKArLeNK+EpK/ycXxXzB8dNi7rA0rSNNtskO2SMFUiWH5JickDoR5I48kCfy7N17j96L9zoZTXnTnU3yA97bJ5YylvE=</latexit><latexit sha1_base64="58zCSwWB0/MRbmW4ak6tFuTVoiw=">AAACAXicdVDLSgMxFM3UV62vqhvBTbAIrkpa+3InuHGpYlWopWTSO21oZjIkd5RS6sZfceNCEbf+hTv/xvQhqOhZhMM593Jzjh8raZGxDy81Mzs3v5BezCwtr6yuZdc3LqxOjIC60EqbK59bUDKCOkpUcBUb4KGv4NLvHY38yxswVuroHPsxNEPeiWQgBUcntbJbZ3DLTZsG2lDsAuWqo43EbtjK5lie1Ur7rEZZvlIqFytVR9x7UGa0kGdj5MgUJ63s+3VbiySECIXi1jYKLMbmgBuUQsEwc51YiLno8Q40HI14CLY5GCcY0l2nTD4R6AjpWP2+MeChtf3Qd5Mhx6797Y3Ev7xGgkGtOZBRnCBEYnIoSBRFTUd10LY0IFD1HeHCBZeCii43XKArLeNK+EpK/ycXxXzB8dNi7rA0rSNNtskO2SMFUiWH5JickDoR5I48kCfy7N17j96L9zoZTXnTnU3yA97bJ5YylvE=</latexit>
24
Formalize Reward !
Average-Over-All: RAOA(Z) =1
|U|X
u2U
1
|S⇤u|
X
i2S⇤u
c(Zu,i)
Items liked by user u (observed)
25
Formalize Bias
Ou,i = 1 if (u, i) is observed, and Ou,i = 0 otherwise
EO
hRAOA(Z)
i6= R(Z)
Ou,i ⇠ B(1, Pu,i)
26
RIPS(Z|P ) =1
|U|X
u2U
1
|Su|X
i2S⇤u
c(Zu,i)
Pu,i
RAOA(Z) =1
|U|X
u2U
1
|S⇤u|
X
i2S⇤u
c(Zu,i)
Inverse-Propensity-Scoring (IPS)
27
RIPS(Z|P ) =1
|U|X
u2U
1
|Su|X
i2S⇤u
c(Zu,i)
Pu,i
RAOA(Z) =1
|U|X
u2U
1
|S⇤u|
X
i2S⇤u
c(Zu,i)
EO
hRIPS(Z|P )
i= R(Z)
Inverse-Propensity-Scoring (IPS)
28
RIPS(Z|P ) =1
|U|X
u2U
1
|Su|X
i2S⇤u
c(Zu,i)
Pu,i
Self-Normalized Inverse-Propensity-Scoring (SNIPS)[Swaminathan et al.15]
RSNIPS(Z|P ) =1
|U|X
u2U
1Pi2S⇤
u
1Pu,i
X
i2S⇤u
c(Zu,i)
Pu,i
29
Estimating Propensity Scores
Factor: Popularity bias (Users are more likely to be exposed to popular items)
Assumptions:
• User-independence assumption
• Two-steps assumption
• User preference is not affected by item presentation
Pu,i = P (Ou,i = 1) = P (O⇤,i = 1) = P⇤,i
P⇤,i = P select⇤,i · P interact|select
⇤,i
P interact|select⇤,i = P interact
⇤,i
30
Estimating Propensity Scores
Popularity bias model [Steck 11]:
P select⇤,i / (n⇤
i )�
Observed item popularity
31
Estimating Propensity Scores
Popularity bias model [Steck 11]:
P select⇤,i / (n⇤
i )�
P⇤,i / (n⇤i )( �+1
2 )
Estimated from known online content
serving policy
32
Measuring bias in recommender evaluation(Yahoo! music rating dataset)
ModelAverage-Over-All
!SNIPS(# = 1.5)
!SNIPS(# = 2.0)
!SNIPS(# = 2.5)
!SNIPS(# = 3.0)
U-CML 0.401 0.270 0.260 0.253 0.248
A-CML 0.399 0.274 0.264 0.258 0.253
BPR 0.380 0.275 0.268 0.262 0.258
PMF 0.386 0.267 0.259 0.252 0.248
Mean Absolute Error (MAE), Recall
!SNIPS produces significantly lower
MAE
33
Measuring bias in recommender evaluation(Yahoo! music rating dataset)
ModelAverage-Over-All
!SNIPS(# = 1.5)
!SNIPS(# = 2.0)
!SNIPS(# = 2.5)
!SNIPS(# = 3.0)
U-CML 0.401 0.270 0.260 0.253 0.248
A-CML 0.399 0.274 0.264 0.258 0.253
BPR 0.380 0.275 0.268 0.262 0.258
PMF 0.386 0.267 0.259 0.252 0.248
Mean Absolute Error (MAE), Recall
!SNIPS produces significantly lower
MAE
The accuracy of recommending popular items is a significant overestimation of the true recommendation performance
34
Please come to our poster or refer to our paper for:
• Proofs
• Experimental details.
• More experiments.
• Deeper analysis of the unbiased evaluator.
35
Conclusions and Future Work
RSNIPS(Z|P ) =1
|U|X
u2U
1Pi2S⇤
u
1Pu,i
X
i2S⇤u
c(Zu,i)
Pu,i
EO
hRIPS(Z|P )
i= R(Z)
• Understanding variance of evaluators.
• Propensity estimation (e.g., incorporate auxiliary
user and item information).
• Debias training of recommendation systems (e.g.,
[Liang et al. 16]).
36
http://www.openrec.aiGithub link, documents, and tutorials
Longqi YangPh.D. candidate
Computer Science, Cornell Tech, Cornell University
Email: [email protected]
Web: bit.ly/longqi
Twitter: @ylongqi
Connected Experiences Lab
http://cx.jacobs.cornell.edu/
Small Data Lab
http://smalldata.io/
Funders: