unbiased offline recommender evaluation for missing-not-at...

36
Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback 1 Funders: Longqi Yang Yin Cui Yuan Xuan Chenyang Wang Serge Belongie Deborah Estrin

Upload: others

Post on 26-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback

1

Funders:

Longqi Yang Yin Cui Yuan Xuan Chenyang Wang Serge Belongie Deborah Estrin

Page 2: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

2

Offline Evaluation of Recommendation Algorithm

( , )( , )

( , )… …

user-item interactions

R

recommendation algorithms

rewards

Page 3: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

R

recommendation algorithms

rewards

3

Offline Evaluation of Recommendation Algorithm

( , )( , )

( , )…

user interaction history

Pros:• Cost effective.• Efficient.• Iterate faster.• Experiment before deployment.

Page 4: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

R

recommendation algorithms

rewards

4

Offline Evaluation of Recommendation Algorithm

( , )( , )

( , )…

user interaction history

Pros:• Cost effective.• Efficient.• Iterate faster.• Experiment before deployment.

Cons:• The data is Missing-Not-At-Random (MNAR)

Page 5: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

5

Offline Evaluation procedure

user !

item "

user ! interacted with item "

Page 6: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

6

Offline Evaluation procedure

train/test

Page 7: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

7

Offline Evaluation procedure

1. Train and validate a recommendation model

2. Averaged performance over held-out (user, item) interaction pairs

(Average-Over-All)

Page 8: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

8

Offline Evaluation procedure

1. Train and validate a recommendation model

(policy) !

2. Averaged performance over held-out (user, item) interaction pairs

(Average-Over-All)

Rating-based recommendation systems

Implicit feedback-based recommendation systems

Page 9: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

9

Previous work: Average-Over-All is biased for rating-based recommendation systems, because ratings are MNAR

[Marlin et al. 09], [Schnabel et al. 16], [Steck 10], [Steck 11], and [Steck 13]

Page 10: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

10

Previous work: Average-Over-All is biased for rating-based recommendation systems, because ratings are MNAR

[Marlin et al. 09], [Schnabel et al. 16], [Steck 10], [Steck 11], and [Steck 13]

Previous work: Average-Over-All is unbiased for implicit feedback-based recommendation systems, because implicit

feedback is missing uniformly at random.[Lim 15]

Page 11: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

11

This work: Average-Over-All is biased for implicit feedback-based recommendation systems, because

implicit feedback is NOT missing uniformly at random.

Page 12: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

12

This work: Average-Over-All is biased for implicit feedback-based recommendation systems, because

implicit feedback is NOT missing uniformly at random.

Popularity bias (Users are more likely to be exposed to popular items)

trending recommendation

Page 13: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

13

A Hypothetical Example

Popular Items Long-tail Items

# of liked items(over all items)

# of liked items(over observations)

1 10

10 1

:

:

Algorithm 1 Performance

Algorithm 2 Performance

0.8

0.75 0.75

0

Page 14: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

14

A Hypothetical Example

Popular Items Long-tail Items

# of liked items(over all items)

# of liked items(over observations)

1 10

10 1

:

:

Algorithm 1 Performance

Algorithm 2 Performance

0.8

0.75 0.75

0

Page 15: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

15

A Hypothetical Example

Popular Items Long-tail Items

# of liked items(over all items)

# of liked items(over observations)

1 10

10 1

:

:

Algorithm 1 Performance

Algorithm 2 Performance

0.8

0.75 0.75

0

Page 16: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

16

A Hypothetical Example

Popular Items Long-tail Items

# of liked items(over all items)

# of liked items(over observations)

1 10

10 1

:

:

Algorithm 1 Performance

Algorithm 2 Performance

0.8

0.75 0.75

0

Page 17: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

17

A Hypothetical Example

Popular Items Long-tail Items

# of liked items(over all items)

# of liked items(over observations)

1 10

10 1

:

:

Algorithm 1 Performance

Algorithm 2 Performance

0.8

0.75 0.75

0

Page 18: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

18

A Hypothetical Example

Popular Items Long-tail Items

# of liked items(over all items)

# of liked items(over observations)

1 10

10 1

:

:

Algorithm 1 Performance

Algorithm 2 Performance

0.8

0.75 0.75

0Any sensible evaluation

Page 19: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

19

A Hypothetical Example

Popular Items Long-tail Items

# of liked items(over all items)

# of liked items(over observations)

1 10

10 1

:

:

Algorithm 1 Performance

Algorithm 2 Performance

0.8

0.75 0.75

0

Average-Over-All

Page 20: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

20

Formalize Reward !

R(Z) =1

|U|X

u2U

1

|Su|X

i2Su

c(Zu,i)Ideal evaluation:

Item rankings predicted by an algorithm<latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit>

Page 21: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

21

Formalize Reward !

R(Z) =1

|U|X

u2U

1

|Su|X

i2Su

c(Zu,i)Ideal evaluation:

Predicted ranking of item i for user u

Items liked by user u among the entire item set

Reward for (u, i) pair

scoring metric<latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit>

Item rankings predicted by an algorithm<latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit>

Page 22: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

22

Formalize Reward !

R(Z) =1

|U|X

u2U

1

|Su|X

i2Su

c(Zu,i)Ideal evaluation:

Predicted ranking of item i for user u

Items liked by user u among the entire item set

Reward for user u

scoring metric<latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit>

Item rankings predicted by an algorithm<latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit>

Page 23: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

23

Formalize Reward !

R(Z) =1

|U|X

u2U

1

|Su|X

i2Su

c(Zu,i)Ideal evaluation:

Predicted ranking of item i for user u

Items liked by user u among the entire item set scoring metric<latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit><latexit sha1_base64="kRMqKdaDY8ZBXiHQ/ORnrxYWwbE=">AAAB9XicdVDLSgMxFM3UV62vqks3wSK4Gmamj6m7ghuXFewD2rFk0kwbmkmGJKOUof/hxoUibv0Xd/6N6UNQ0QMXDufcy733hAmjSjvOh5VbW9/Y3MpvF3Z29/YPiodHbSVSiUkLCyZkN0SKMMpJS1PNSDeRBMUhI51wcjn3O3dEKir4jZ4mJIjRiNOIYqSNdKuwkJSPYEy0pHhQLDl2vVop+z507HLNLXtVQzzHr1/UoGs7C5TACs1B8b0/FDiNCdeYIaV6rpPoIENSU8zIrNBPFUkQnqAR6RnKUUxUkC2unsEzowxhJKQpruFC/T6RoVipaRyazhjpsfrtzcW/vF6qo3qQUZ6kmnC8XBSlDGoB5xHAIZUEazY1BGFJza0Qj5FEWJugCiaEr0/h/6Tt2a5ju9deqVFZxZEHJ+AUnAMX+KABrkATtAAGEjyAJ/Bs3VuP1ov1umzNWauZY/AD1tsnKXGS5Q==</latexit>

Item rankings predicted by an algorithm<latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit><latexit sha1_base64="ZyVWiPK9uixN125Ap0AIBWyDfRc=">AAACEHicdVDPaxNBGJ1Nq42xaqzHXoYGsaewG3YxuQW81FuF5gckIXw7+yUZMjO7zHxbCCF/ghf/FS8eWopXj978b5ykEWzRBwOP977H8F5aKOkoDH8FlYPDJ0+Pqs9qz49fvHxVf33Sd3lpBfZErnI7TMGhkgZ7JEnhsLAIOlU4SJcftv7gGq2TubmiVYETDXMjZ1IAeWlaf/eRUHMLZinN3HGfzaQgzHi64mA4qHluJS30tN4Im504iaOEh804jJMo8iRqd8J2wqNmuEOD7XE5rf8cZ7koNRoSCpwbRWFBkzVYkkLhpjYuHRYgljDHkacGNLrJeldow996JeOz3PpniO/UvxNr0M6tdOovNdDCPfa24r+8UUmz9mQtTVESGnH/0axUnHK+XYdn0qIgtfIEhC8uBRcLsOAnsa7mR/jTlP+f9FvNyPNPrUY33s9RZafsjJ2ziL1nXXbBLlmPCfaZfWU37Db4EnwL7oLv96eVYJ95wx4g+PEb612dHA==</latexit>

Reward for the algorithm<latexit sha1_base64="58zCSwWB0/MRbmW4ak6tFuTVoiw=">AAACAXicdVDLSgMxFM3UV62vqhvBTbAIrkpa+3InuHGpYlWopWTSO21oZjIkd5RS6sZfceNCEbf+hTv/xvQhqOhZhMM593Jzjh8raZGxDy81Mzs3v5BezCwtr6yuZdc3LqxOjIC60EqbK59bUDKCOkpUcBUb4KGv4NLvHY38yxswVuroHPsxNEPeiWQgBUcntbJbZ3DLTZsG2lDsAuWqo43EbtjK5lie1Ur7rEZZvlIqFytVR9x7UGa0kGdj5MgUJ63s+3VbiySECIXi1jYKLMbmgBuUQsEwc51YiLno8Q40HI14CLY5GCcY0l2nTD4R6AjpWP2+MeChtf3Qd5Mhx6797Y3Ev7xGgkGtOZBRnCBEYnIoSBRFTUd10LY0IFD1HeHCBZeCii43XKArLeNK+EpK/ycXxXzB8dNi7rA0rSNNtskO2SMFUiWH5JickDoR5I48kCfy7N17j96L9zoZTXnTnU3yA97bJ5YylvE=</latexit><latexit sha1_base64="58zCSwWB0/MRbmW4ak6tFuTVoiw=">AAACAXicdVDLSgMxFM3UV62vqhvBTbAIrkpa+3InuHGpYlWopWTSO21oZjIkd5RS6sZfceNCEbf+hTv/xvQhqOhZhMM593Jzjh8raZGxDy81Mzs3v5BezCwtr6yuZdc3LqxOjIC60EqbK59bUDKCOkpUcBUb4KGv4NLvHY38yxswVuroHPsxNEPeiWQgBUcntbJbZ3DLTZsG2lDsAuWqo43EbtjK5lie1Ur7rEZZvlIqFytVR9x7UGa0kGdj5MgUJ63s+3VbiySECIXi1jYKLMbmgBuUQsEwc51YiLno8Q40HI14CLY5GCcY0l2nTD4R6AjpWP2+MeChtf3Qd5Mhx6797Y3Ev7xGgkGtOZBRnCBEYnIoSBRFTUd10LY0IFD1HeHCBZeCii43XKArLeNK+EpK/ycXxXzB8dNi7rA0rSNNtskO2SMFUiWH5JickDoR5I48kCfy7N17j96L9zoZTXnTnU3yA97bJ5YylvE=</latexit><latexit sha1_base64="58zCSwWB0/MRbmW4ak6tFuTVoiw=">AAACAXicdVDLSgMxFM3UV62vqhvBTbAIrkpa+3InuHGpYlWopWTSO21oZjIkd5RS6sZfceNCEbf+hTv/xvQhqOhZhMM593Jzjh8raZGxDy81Mzs3v5BezCwtr6yuZdc3LqxOjIC60EqbK59bUDKCOkpUcBUb4KGv4NLvHY38yxswVuroHPsxNEPeiWQgBUcntbJbZ3DLTZsG2lDsAuWqo43EbtjK5lie1Ur7rEZZvlIqFytVR9x7UGa0kGdj5MgUJ63s+3VbiySECIXi1jYKLMbmgBuUQsEwc51YiLno8Q40HI14CLY5GCcY0l2nTD4R6AjpWP2+MeChtf3Qd5Mhx6797Y3Ev7xGgkGtOZBRnCBEYnIoSBRFTUd10LY0IFD1HeHCBZeCii43XKArLeNK+EpK/ycXxXzB8dNi7rA0rSNNtskO2SMFUiWH5JickDoR5I48kCfy7N17j96L9zoZTXnTnU3yA97bJ5YylvE=</latexit><latexit sha1_base64="58zCSwWB0/MRbmW4ak6tFuTVoiw=">AAACAXicdVDLSgMxFM3UV62vqhvBTbAIrkpa+3InuHGpYlWopWTSO21oZjIkd5RS6sZfceNCEbf+hTv/xvQhqOhZhMM593Jzjh8raZGxDy81Mzs3v5BezCwtr6yuZdc3LqxOjIC60EqbK59bUDKCOkpUcBUb4KGv4NLvHY38yxswVuroHPsxNEPeiWQgBUcntbJbZ3DLTZsG2lDsAuWqo43EbtjK5lie1Ur7rEZZvlIqFytVR9x7UGa0kGdj5MgUJ63s+3VbiySECIXi1jYKLMbmgBuUQsEwc51YiLno8Q40HI14CLY5GCcY0l2nTD4R6AjpWP2+MeChtf3Qd5Mhx6797Y3Ev7xGgkGtOZBRnCBEYnIoSBRFTUd10LY0IFD1HeHCBZeCii43XKArLeNK+EpK/ycXxXzB8dNi7rA0rSNNtskO2SMFUiWH5JickDoR5I48kCfy7N17j96L9zoZTXnTnU3yA97bJ5YylvE=</latexit>

Page 24: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

24

Formalize Reward !

Average-Over-All: RAOA(Z) =1

|U|X

u2U

1

|S⇤u|

X

i2S⇤u

c(Zu,i)

Items liked by user u (observed)

Page 25: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

25

Formalize Bias

Ou,i = 1 if (u, i) is observed, and Ou,i = 0 otherwise

EO

hRAOA(Z)

i6= R(Z)

Ou,i ⇠ B(1, Pu,i)

Page 26: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

26

RIPS(Z|P ) =1

|U|X

u2U

1

|Su|X

i2S⇤u

c(Zu,i)

Pu,i

RAOA(Z) =1

|U|X

u2U

1

|S⇤u|

X

i2S⇤u

c(Zu,i)

Inverse-Propensity-Scoring (IPS)

Page 27: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

27

RIPS(Z|P ) =1

|U|X

u2U

1

|Su|X

i2S⇤u

c(Zu,i)

Pu,i

RAOA(Z) =1

|U|X

u2U

1

|S⇤u|

X

i2S⇤u

c(Zu,i)

EO

hRIPS(Z|P )

i= R(Z)

Inverse-Propensity-Scoring (IPS)

Page 28: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

28

RIPS(Z|P ) =1

|U|X

u2U

1

|Su|X

i2S⇤u

c(Zu,i)

Pu,i

Self-Normalized Inverse-Propensity-Scoring (SNIPS)[Swaminathan et al.15]

RSNIPS(Z|P ) =1

|U|X

u2U

1Pi2S⇤

u

1Pu,i

X

i2S⇤u

c(Zu,i)

Pu,i

Page 29: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

29

Estimating Propensity Scores

Factor: Popularity bias (Users are more likely to be exposed to popular items)

Assumptions:

• User-independence assumption

• Two-steps assumption

• User preference is not affected by item presentation

Pu,i = P (Ou,i = 1) = P (O⇤,i = 1) = P⇤,i

P⇤,i = P select⇤,i · P interact|select

⇤,i

P interact|select⇤,i = P interact

⇤,i

Page 30: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

30

Estimating Propensity Scores

Popularity bias model [Steck 11]:

P select⇤,i / (n⇤

i )�

Observed item popularity

Page 31: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

31

Estimating Propensity Scores

Popularity bias model [Steck 11]:

P select⇤,i / (n⇤

i )�

P⇤,i / (n⇤i )( �+1

2 )

Estimated from known online content

serving policy

Page 32: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

32

Measuring bias in recommender evaluation(Yahoo! music rating dataset)

ModelAverage-Over-All

!SNIPS(# = 1.5)

!SNIPS(# = 2.0)

!SNIPS(# = 2.5)

!SNIPS(# = 3.0)

U-CML 0.401 0.270 0.260 0.253 0.248

A-CML 0.399 0.274 0.264 0.258 0.253

BPR 0.380 0.275 0.268 0.262 0.258

PMF 0.386 0.267 0.259 0.252 0.248

Mean Absolute Error (MAE), Recall

!SNIPS produces significantly lower

MAE

Page 33: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

33

Measuring bias in recommender evaluation(Yahoo! music rating dataset)

ModelAverage-Over-All

!SNIPS(# = 1.5)

!SNIPS(# = 2.0)

!SNIPS(# = 2.5)

!SNIPS(# = 3.0)

U-CML 0.401 0.270 0.260 0.253 0.248

A-CML 0.399 0.274 0.264 0.258 0.253

BPR 0.380 0.275 0.268 0.262 0.258

PMF 0.386 0.267 0.259 0.252 0.248

Mean Absolute Error (MAE), Recall

!SNIPS produces significantly lower

MAE

The accuracy of recommending popular items is a significant overestimation of the true recommendation performance

Page 34: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

34

Please come to our poster or refer to our paper for:

• Proofs

• Experimental details.

• More experiments.

• Deeper analysis of the unbiased evaluator.

Page 35: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

35

Conclusions and Future Work

RSNIPS(Z|P ) =1

|U|X

u2U

1Pi2S⇤

u

1Pu,i

X

i2S⇤u

c(Zu,i)

Pu,i

EO

hRIPS(Z|P )

i= R(Z)

• Understanding variance of evaluators.

• Propensity estimation (e.g., incorporate auxiliary

user and item information).

• Debias training of recommendation systems (e.g.,

[Liang et al. 16]).

Page 36: Unbiased Offline Recommender Evaluation for Missing-Not-At …ylongqi/presentation/YangCXWBE18Slid… · [Marlin et al. 09], [Schnabel et al. 16], [Steck10], [Steck11], and [Steck13]

36

http://www.openrec.aiGithub link, documents, and tutorials

Longqi YangPh.D. candidate

Computer Science, Cornell Tech, Cornell University

Email: [email protected]

Web: bit.ly/longqi

Twitter: @ylongqi

Connected Experiences Lab

http://cx.jacobs.cornell.edu/

Small Data Lab

http://smalldata.io/

Funders: