Estimation–Action–Reflection: Towards Deep Interaction Between Conversational and Recommender Systems

Wenqiang Lei, Xiangnan He, Yisong Miao, Qingyun Wu, Richang Hong, Min-Yen Kan, Tat-Seng Chua {wenqianglei, xiangnanhe, miaoyisong}@gmail.com, [email protected], [email protected],

{kanmy,chuats}@comp.nus.edu.sg

What is conversational recommendation?

User: I want a new phone.
System: What operating system do you want? [asking attribute]
User: iOS.
System: What about the latest iPhone 11? [attempt to recommend]
User: No, too expensive. [the system reflects on why the user rejected the recommended item]
System: Do you want an all-screen design with Face ID? [asking attribute]
User: Yes!
System: Do you want more color options? Red, blue? [asking attribute]
User: Red is a great option.
System: The iPhone XR in red with 128 GB is a real bargain! [attempt to recommend]
User: Nice! I will take it! [the user accepts; the conversation terminates]

Our proposed multi-round scenario

Workflow of the multi-round conversational recommendation scenario

• A session starts when the user specifies a desired attribute.

• A session stops only when a recommendation succeeds or the user quits.

Objective: Recommend the right item to the user in as few turns as possible.

Method: EAR (Estimation, Action, Reflection), a deep interaction between the CC (conversation component) and the RC (recommender component).

Estimation:
• The RC ranks the candidate items and the item attributes.

Action:
• The CC takes the ranked items and ranked attributes into account to decide whether to ask about an attribute or make a recommendation.

Reflection:
• When the user rejects a list of recommendations, the RC adjusts its estimation of the user.

[Figure: the EAR loop. Estimation passes ranked items and attributes to Action; Action passes rejected items to Reflection; Reflection adjusts the estimation for the user.]
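The three stages can be read as one loop per session. The following is a minimal sketch, not the authors' implementation: the function names (`run_session`, `estimate`, `decide`), the toy restaurant data, and the rule "recommend once one candidate is left" are all hypothetical stand-ins chosen only to make the control flow executable.

```python
def run_session(estimate, decide, ask, recommend, reflect,
                first_attribute, max_turns=15, list_len=10):
    """Run one session; return the turn at which a recommendation succeeds, else None."""
    known = [first_attribute]        # the user opens by naming a desired attribute
    asked = {first_attribute}
    for turn in range(1, max_turns + 1):
        ranked_items, ranked_attributes = estimate(known)          # Estimation
        open_attributes = [a for a in ranked_attributes if a not in asked]
        action = decide(ranked_items, ranked_attributes, turn)     # Action
        if action == "recommend" or not open_attributes:
            proposal = ranked_items[:list_len]
            if recommend(proposal):
                return turn
            reflect(proposal)                                      # Reflection
        else:
            attribute = open_attributes[0]
            asked.add(attribute)
            if ask(attribute):
                known.append(attribute)
    return None


# Toy stand-ins (hypothetical data and rules, only to make the sketch executable).
ITEMS = {"r1": {"italian", "pizza"}, "r2": {"italian", "wine"}, "r3": {"chinese"}}
TARGET = "r2"
rejected = set()

def estimate(known):
    # Keep items that match every known attribute and were not rejected before.
    items = sorted(i for i, attrs in ITEMS.items()
                   if set(known) <= attrs and i not in rejected)
    attributes = sorted({a for i in items for a in ITEMS[i]} - set(known))
    return items, attributes

def decide(ranked_items, ranked_attributes, turn):
    # Hypothetical rule: recommend only when a single candidate is left.
    return "recommend" if len(ranked_items) <= 1 else "ask"

turn = run_session(estimate, decide,
                   ask=lambda a: a in ITEMS[TARGET],
                   recommend=lambda proposal: TARGET in proposal,
                   reflect=lambda proposal: rejected.update(proposal),
                   first_attribute="italian")
```

In this toy run the agent wastes one turn asking about pizza, confirms wine on the second turn, and recommends successfully on the third. In the approach described in the slides, `decide` is a learned policy and `estimate` and `reflect` are backed by the recommender model.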

The position of conversational recommendation: bridging recommender systems and search

Traditional ways for a user to get an item: search or recommendation.

Search: the user's intention is totally clear.

Recommendation: the user's intention is totally unclear.

Conversational recommendation: try to elicit the user's preference through conversation!

• We have 3 key research tasks:
1. What item to recommend? What attribute to ask?
2. What strategy to use for asking and recommending?
3. How to adapt to the user's online feedback?

Objective: Recommend the right item to the user in as few turns as possible.

Estimation stage — Item prediction

User: I'd like some Italian food.
System: Got you, do you like some pizza?
User: Yes!
System: Got you, do you also want some nightlife?
User: Yes!

Candidates remaining: 1000, then 95.

• How do we rank the restaurants she really wants at the top of all the remaining candidates?

Estimation stage — Attribute prediction

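• What question should I ask next so that she gives me positive feedback, given the attributes I already know?

User: I'd like some Italian food.
System: Got you, do you like some Chinese food?
User: No!
System: Got you, do you also want some ___?___

After the rejected question, 1000 candidates remain: a wasted turn! After the next question, _?_ candidates remain.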

Preliminary - FM (Factorization Machine)
The de facto choice for recommender systems

- A framework that learns embeddings in the same vector space.

- Captures the interaction between vectors through their inner product.

- Entities that co-occur end up with similar embeddings.

Notation                    Meaning
u                           User embedding
v                           Item embedding
P_u = {p_1, p_2, …, p_n}    Known user-preferred attributes in the current conversation session

Score function to decide how likely the user is to like an item:
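A minimal sketch of a score function consistent with the notation above, assuming the score is the user-item inner product plus the item's inner product with each known preferred attribute, i.e. u·v + Σ v·p_i over p_i in P_u. The helper names and the toy 2-dimensional vectors are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def item_score(u, v, P_u):
    """Assumed form: u.v plus the sum of v.p_i over the known attributes P_u."""
    return dot(u, v) + sum(dot(v, p) for p in P_u)

# Toy embeddings (hypothetical values).
u = [1.0, 0.0]                      # user embedding
v = [0.5, 2.0]                      # item embedding
P_u = [[0.0, 1.0], [1.0, 1.0]]      # embeddings of attributes confirmed so far
score = item_score(u, v, P_u)       # 0.5 + (2.0 + 2.5)
```

Each confirmed attribute adds a term, so items that match what the user has said in the current session move up the ranking even when the long-term user embedding alone would not favor them.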

Method: Bayesian Personalized Ranking

[Formula labels: positive sample, negative sample]
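As a reminder of what Bayesian Personalized Ranking optimizes, here is a minimal sketch of the standard pairwise BPR loss for one (positive, negative) pair: -ln σ(score_pos - score_neg). Regularization is omitted, and the function name is hypothetical.

```python
import math

def bpr_loss(pos_score, neg_score):
    """-ln sigmoid(pos - neg): small when the positive sample outranks the negative one."""
    diff = pos_score - neg_score
    # -ln sigmoid(diff) == ln(1 + exp(-diff)), computed in a numerically stable way.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

well_ranked = bpr_loss(3.0, 1.0)    # positive scored above negative: low loss
badly_ranked = bpr_loss(1.0, 3.0)   # positive scored below negative: high loss
```

Minimizing this loss over many pairs pushes the score of each positive sample above the score of the paired negative sample, which is a ranking objective rather than a rating-prediction objective.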

Method: Attribute-aware BPR for item prediction and attribute preference prediction

Multi-task learning
Note: We use information gathered by the CC (conversation part) to enhance the RC!

Score function for attribute preference prediction
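A minimal sketch of an attribute preference score in the same embedding space, assuming it mirrors the item score: the user-attribute inner product plus the candidate attribute's inner product with each already-known attribute, i.e. u·p + Σ p·p_i over p_i in P_u. The function names, attribute names, and vectors are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attribute_score(u, p, P_u):
    """Assumed form: u.p plus the sum of p.p_i over the known attributes P_u."""
    return dot(u, p) + sum(dot(p, p_i) for p_i in P_u)

def rank_attributes(u, candidates, P_u):
    """Return candidate attribute names, highest predicted preference first."""
    return sorted(candidates,
                  key=lambda name: attribute_score(u, candidates[name], P_u),
                  reverse=True)

# Toy embeddings (hypothetical values).
u = [1.0, 0.0]
P_u = [[0.0, 1.0]]                                   # e.g. "Italian" already confirmed
candidates = {"pizza": [0.5, 1.0], "sushi": [0.2, -1.0]}
ranking = rank_attributes(u, candidates, P_u)
```

Asking about the top-ranked attribute is the behavior the attribute prediction slide motivates: a question the user is likely to answer positively shrinks the candidate set instead of wasting a turn.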

Action stage: Strategy to ask and recommend?

User: I'd like some Italian food.
System: Got you, do you like some pizza?
User: Yes!
System: Got you, do you like some nightlife?
User: Yes!
System: Try to recommend 10 items!
User: Rejected!
System: Got you, do you like some wine?
...
User: Accepted!

[Figure annotations: "This time, I try to recommend earlier..."; 95 candidates remain (should recommend?); 30 candidates remain (should recommend?); target item rank 6/10.]

Method: Strategy to ask and recommend? (Action Stage)

We use reinforcement learning to find the best strategy.
• Policy gradient method
• Simple policy network: a 2-layer feedforward network

Note: 3 of the 4 pieces of information come from the recommender part.

Action space:
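A minimal sketch of such a policy network in plain Python, under several assumptions that the slide does not confirm: "2-layer" is read as one hidden layer plus an output layer, the hidden activation is ReLU, the state has 4 features, and the action space has one "ask" action per attribute plus a single "recommend" action. All names, sizes, and values are hypothetical.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, W, b):
    """Dense layer: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def policy_probs(state, params):
    """Forward pass: hidden layer with ReLU (assumed), then softmax over actions."""
    W1, b1, W2, b2 = params
    hidden = [max(0.0, h) for h in linear(state, W1, b1)]
    return softmax(linear(hidden, W2, b2))

def init_params(state_dim, hidden_dim, n_actions, rng):
    """Small random weights and zero biases."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return (matrix(hidden_dim, state_dim), [0.0] * hidden_dim,
            matrix(n_actions, hidden_dim), [0.0] * n_actions)

rng = random.Random(0)
n_attributes = 3                                     # hypothetical toy size
actions = [f"ask_attribute_{i}" for i in range(n_attributes)] + ["recommend"]
params = init_params(state_dim=4, hidden_dim=8, n_actions=len(actions), rng=rng)
state = [0.9, 0.2, 0.0, 0.5]                         # hypothetical state features
probs = policy_probs(state, params)
action = rng.choices(actions, weights=probs, k=1)[0]  # sample, as policy gradient requires
```

Sampling from the distribution, rather than always taking the most probable action, is what lets a policy gradient method explore different ask/recommend strategies during training.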

Reflection stage: How to adapt to user's online feedback?

User: I'd like some Italian food.
System: Got you, do you like some pizza?
User: Yes!
System: Got you, do you like some nightlife?
...
User: Rejected!

[Figure annotations: "This time, I try to recommend earlier..."; 95 candidates remain (should recommend?); adjust estimation.]

She rejected my 10 recommended items... However, those are what she should love according to her history. How can I infer her current preference from these 10 items?

Method: How to adapt to user's online feedback? (Reflection stage)
Solution: We treat the 10 recently rejected items as negative samples to re-train the recommender, adjusting its estimation of the user's preference.
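A minimal sketch of such an online update, under assumptions the slide does not state: positives are items the user liked in the past, only the user embedding is updated, the loss is the pairwise BPR loss, and attribute terms are left out for brevity. The function names, learning rate default, and vectors are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def reflect(u, liked_items, rejected_items, lr=0.01):
    """One SGD pass over (liked, rejected) pairs; returns an updated copy of u."""
    u = list(u)
    for v_pos in liked_items:
        for v_neg in rejected_items:
            diff = dot(u, v_pos) - dot(u, v_neg)
            # Gradient of -ln sigmoid(diff) w.r.t. u is -(1 - sigmoid(diff)) * (v_pos - v_neg),
            # so a descent step moves u toward v_pos and away from v_neg.
            step = lr * (1.0 - sigmoid(diff))
            u = [ui + step * (p - n) for ui, p, n in zip(u, v_pos, v_neg)]
    return u

# Toy embeddings (hypothetical values).
u = [0.5, 0.5]
liked = [[1.0, 0.0]]        # an item from the user's history
rejected = [[0.0, 1.0]]     # an item the user just rejected
u_new = reflect(u, liked, rejected)
```

After the update the rejected item scores lower for this user and the liked item scores higher, so the next estimation round ranks the rejected items further down.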

Experiment setup (1) - Dataset Collection

Dataset description

Dataset    #users    #items    #interactions    #attributes
Yelp       27,675    70,311    1,368,606        590
Last.FM    1,801     7,432     76,693           33

Why did we need to create datasets?
• There are no existing datasets built specifically for CRS, as the field is very new.
• Datasets from previous work have too few attributes for real-world applications.

How did we create the datasets?
• Standard pruning (users/items with fewer than 5 reviews are removed).
• For Last.FM, we build 33 binary attributes (Classic, Popular, Rock, etc.).
• For Yelp, we build 29 enumerated attributes on a 2-level taxonomy over the 590 original attributes.

Experiment setup (2)
User simulator
• There is no existing offline experiment environment for conversational recommendation.
• We use real interaction pairs between users and items.
• The user simulator keeps the target item "in its heart" and responds interactively to our agent. Responses include answering a question, and accepting or rejecting the items when our agent proposes a list of recommendations.
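A minimal sketch of a simulator with this behavior, assuming it answers "yes" to an attribute question exactly when the target item has that attribute, and accepts a recommendation list exactly when the list contains the target item. The class name, method names, and toy data are hypothetical.

```python
class UserSimulator:
    """Simulated user who keeps one target item in mind for the whole session."""

    def __init__(self, target_item, item_attributes):
        self.target_item = target_item
        self.target_attributes = set(item_attributes[target_item])

    def answer(self, attribute):
        """Answer a question: True if the target item has this attribute."""
        return attribute in self.target_attributes

    def respond_to_recommendation(self, items):
        """Accept (True) only if the target item is in the proposed list."""
        return self.target_item in items


# Toy catalogue (hypothetical); in the setup above the target comes from a real
# user-item interaction pair.
item_attributes = {"r1": {"italian", "pizza"}, "r2": {"chinese"}}
simulator = UserSimulator("r1", item_attributes)
likes_pizza = simulator.answer("pizza")
accepts_r2 = simulator.respond_to_recommendation(["r2"])
```

Because every response is derived from one observed interaction, the agent can be trained and evaluated offline without live users.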

Training details
• We set the maximum conversation length to 15 and fix the length of the recommendation list to 10.
• We use the SGD optimizer to train the FM model (hidden size = 64) with L2 regularization of 0.001; the learning rate is 0.01 for item prediction and 0.001 for attribute prediction.
• For the policy network (MLP), we use 2 layers with a hidden size of 256. We pre-train it as a classifier according to max-entropy results, then train it with the REINFORCE algorithm at a learning rate of 0.001. Rewards: r_success = 1, r_ask = 0.1, r_quit = -0.3, r_prevent = -0.1; discount factor γ = 0.7.
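REINFORCE weights each action by the discounted return that followed it. A minimal sketch using the reward values and γ listed above; the example episode, and the choice of which reward fires at each turn, are hypothetical illustrations rather than the authors' exact reward schedule.

```python
REWARDS = {"success": 1.0, "ask": 0.1, "quit": -0.3, "prevent": -0.1}
GAMMA = 0.7

def discounted_returns(rewards, gamma=GAMMA):
    """Return G_t = r_t + gamma * G_{t+1} for every turn t of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

# Hypothetical episode: two rewarded questions, then a successful recommendation.
episode = [REWARDS["ask"], REWARDS["ask"], REWARDS["success"]]
returns = discounted_returns(episode)
```

With γ = 0.7 a success three turns away is worth less than half of an immediate one, which pushes the policy toward short conversations.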

Main Experiment Results
Evaluation metrics:
• SR@k (success rate at the k-th turn)
• AT (average turns of conversation)
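A minimal sketch of both metrics, assuming SR@k is cumulative (the share of sessions that succeeded at or before turn k) and that a failed session counts as the maximum length of 15 turns when averaging. The function names and the example sessions are hypothetical.

```python
def success_rate_at_k(success_turns, k):
    """Share of sessions whose successful recommendation came at or before turn k."""
    hits = sum(1 for turn in success_turns if turn is not None and turn <= k)
    return hits / len(success_turns)

def average_turns(success_turns, max_turns=15):
    """Mean session length; failed sessions (None) count as max_turns (assumption)."""
    return sum(max_turns if turn is None else turn
               for turn in success_turns) / len(success_turns)

# Hypothetical outcomes: success turn per session, None for a failed session.
sessions = [3, 7, None, 15]
sr_at_5 = success_rate_at_k(sessions, 5)
sr_at_15 = success_rate_at_k(sessions, 15)
at = average_turns(sessions)
```

Higher SR@k and lower AT both correspond to the stated objective: a correct recommendation in as few turns as possible.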

Experiment results – Estimation stage item and attribute prediction

The offline AUC score for item and attribute prediction:
• Standard FM model
• FM + A (attribute-aware item BPR)
• FM + A + MT (multi-task learning)
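For reference, AUC can be read as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A minimal sketch of that pairwise definition, with ties counted as half; the function name and scores are hypothetical, and this is not necessarily how the authors computed their numbers.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores.
example_auc = auc([0.9, 0.4], [0.5, 0.1])   # 3 of the 4 pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so AUC lines up directly with the pairwise objective that BPR optimizes.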

Experiment results – Action stage: Strategy to ask and recommend?

We conducted an ablation study on the state vector fed into the policy network to find the contribution of each component.

• Entropy seems to be the most salient component.

Experiment results – Reflection stage: How to adapt to user's online feedback?

Performance after removing the online update module. Yelp suffers less than Last.FM. Why?
• The Yelp dataset has a better offline AUC.
• When the offline AUC is higher, the reflection stage tends to have less effect.

“Bad update” in the Yelp dataset

Conclusion and Future Work
• We formalize the task of multi-turn conversational recommendation.

• We refine the recommender system in a conversational scenario for attribute-aware item ranking and attribute-aware preference estimation.

• We propose a three-stage solution, EAR, for CRS, outperforming state-of-the-art baselines.

• We plan to do online evaluation and obtain real-world exposure data by collaborating with e-commerce companies.

Thank you!

Spare Slides

Importance of this research project
The importance of CRS (Conversational Recommendation System):
• Overcome the limitations of traditional static recommender systems, thus improving user satisfaction and bringing revenue to businesses!
• Embrace recent advances in conversation technology.

The advances brought by our work:
• We are the first to consider a realistic multi-round conversational recommendation scenario.
• We unify the CC (Conversation Component) and RC (Recommender Component) and propose a novel three-stage solution, EAR.
• We build two datasets by simulating user conversations to make the task suitable for offline academic research.

Literature Review (1)

• Static traditional recommendation systems:
- Collaborative filtering
- Matrix factorization
- Factorization machine
- etc.

• Limitation 1:
- Offline: they learn from user history data, so they can only mimic the user's historical preference.
• Limitation 2:
- The user cannot explicitly tell the system her preference.
- The system cannot leverage the user's feedback.

Existing online recommendation methods (bandit):
• Epsilon-greedy
• Thompson sampling
• Upper Confidence Bound (UCB)
• Linear UCB
• Collaborative UCB...

Limitation:
• They can only attempt to recommend items; they cannot ask about item attributes.
• The mathematical formulation of bandits restricts them to recommending only 1 item each turn.

Literature Review (2)
Towards Conversational Recommendation, Sun et al., SIGIR 2018

Limitation:
- Can recommend only once. The session ends regardless of whether the recommendation succeeds.

- The Recommender Component and the Conversation Component are isolated parts.

- Simply takes the belief tracker as input for the action decision.

Interaction?

Screenshot from SIGIR 2018, Towards Conversational Recommendation
