estimation–action–reflection · - co-occur, similar. notation meaning u user embedding v item...
TRANSCRIPT
Estimation–Action–Reflection:Towards Deep Interaction Between
Conversational and Recommender Systems
1
Wenqiang Lei, Xiangnan He, Yisong Miao, Qingyun Wu, Richang Hong, Min-Yen Kan, Tat-Seng Chua {wenqianglei, xiangnanhe, miaoyisong}@gmail.com, [email protected], [email protected],
{kanmy,chuats}@comp.nus.edu.sg
Iwantanewphone.Whatoperatingsystemdoyouwant?
iOSWhataboutthelatestiPhone11?
No,tooexpensive.DoyouwantallscreendesignwithFaceID?
Yes!Doyouwantmorecoloroptions?Red,blue?
RedisgreatoptioniPhoneXRRedwith128GBisarealbargain!
Nice!Iwilltakeit!
Reflectonwhyuserreject
recommendeditems?
Useraccept,conversationterminates.
Askingattribute
Askingattribute
Askingattribute
Attempttorecommend
Attempttorecommend
What is conversational recommendation
2
Workflow of multi-round Conversational Recommendation Scenario
• Onesessionisstartedbytheuserspecifyingadesiredattribute.
• Onesessionwillbestoppedonlywhentherecommendationissuccessfulortheuserquits.
Our proposed multi-round scenario3
Objective: Accurately recommend item to user in shortest turns
Method: EAR- Estimation, Action, Reflection Deep interaction among CC(conversation system) and RC (recommendation system)
Estimation:• RCranksthecandidateitemanditemattribute.Action:• CCtakesintoaccountrankeditemsandrankedattributestodecidewhethertoask
attributeormakerecommendationReflection:• Whenuserrejectslistofrecommendation,theRCadjustsitsestimationforuser.
Estimation Action Reflection
Rankeditemandattributes Rejecteditems
Adjusttheestimationfortheuser
4
ThePositionofConversationalRecommendation—BridgingRecommendationSystemandSearch
Traditionalmethodforusertogetanitem:SearchorRecommendation
Search:User'sIntentionistotallyclear
Recommendation:User'sIntentionistotallyunclear
ConversationalRecommendation:Trytoinduceuserpreference
throughconversation!
• We have 3 Key Research Tasks:1. What item to recommend? What attribute to ask?2. Strategy to ask and recommend?3. How to adapt to user's online feedback?
Objective: Accurately recommend item to user in shortest turns
5
Estimation stage — Item prediction
I'dlikesomeItalianfood.Gotyou,doyoulikesomePizza?
Yes!Gotyou,doyoualsowantsomenightlife?
Yes!
• Howtoranktopthatrestaurantshereallywantswithinallcandidatesremained?
1000candidatesremains
250candidatesremains
95candidatesremains
6
Estimation stage — Attribute prediction
• WhatquestionshouldIasknext,soshecangivemepositivefeedback?giventheattributesIalreadyknow.
I'dlikesomeItalianfood.Gotyou,doyoulikesomeChinesefood?
No!Gotyou,doyoualsowantsome??
___?___
1000candidatesremains
1000candidatesremains.
Wasteaturn!
_?_candidatesremains
7
Preliminary - FM (Factorization Machine)De Facto Choice for recommender system
- A framework to learn embedding in a same vector space.
- Capture the interaction between vectors by their inner product.
- Co-occur, similar.
Notation Meaning
u Userembedding
v Itemembedding
P_u={p_1,p_2,…, p_n}
Knownuserpreferredattributesincurrentconversationsession.
Score Function to decide how likely user would like an item:
8
Method: Bayesian Personalized Ranking
Positivesample Negativesample
9
Method: Attribute-aware BRP for item prediction and attribute preference prediction
Multi-taskLearningNote: We use information gathered by CC(conversation part) to enhance the RC!
Score function for attribute preference prediction
10
Action stage: Strategy to ask and recommend?
I'dlikesomeItalianfood.Gotyou,doyoulikesomepizza?
Yes!Gotyou,doyoulikesomenightlife?
Yes!Trytorecommend10items!
Rejected!Gotyou,doyoulikesomeWine?
Yes!Trytorecommend10items!
Accepted!
Thistime,Itrytorecommendmoreearlier...
1000candidatesremains
250candidatesremains
95candidatesremains
Targetitemrank6/10
30candidatesremains
Shouldrecommend?
Shouldrecommend?
95candidatesremains
11
Method: Strategy to ask and recommend? (Action Stage)
Weusereinforcementlearningtofindthebeststrategy.• policygradientmethod• simplepolicynetworkof2-layerfeedforwardnetwork
Note: 3 of the 4 information come from Recommender Part
Action Space: 12
Reflection stage: How to adapt to user's online feedback?
I'dlikesomeItalianfood.Gotyou,doyoulikesomepizza?
Yes!Gotyou,doyoulikesomenightlife?
Yes!Trytorecommend10items!
Rejected!
Thistime,Itrytorecommendmoreearlier...
1000candidatesremains
250candidatesremains
95candidatesremains
Adjustestimation
Shouldrecommend?
Sherejectedmyrecommended10items...However,thatiswhatsheshouldloveaccordingtoherhistory.HowcanIinducehercurrentpreferencewiththis10items?
13
Method: How to adapt to user's online feedback? (Reflection stage)Solution:Wetreattherecentlyrejected10itemsasnegativesamplestore-traintherecommender,toadjusttheestimationofuserpreference.
14
Experiment setup (1) - Dataset Collection
Dataset #user #item #interactions #attributes
Yelp 27,675 70,311 1,368,606 590Last.FM 1,801 7,432 76,693 33
DatasetDescription
Whyweneedtocreatedataset?• There’snoexistingdatasetsspeciallyforCRSasthisfieldisverynew.• Datasetsofpreviousworkhastoofewattributesforreal-worldapplications.
Howwecreatedataset?• Standardpruningoperation(user/itemhas<5reviews)• ForLast.FM,webuild33BinaryattributesforLast.FM(Classic,Popular,Rock,etc…)• ForYelp,webuild29enumeratedattributesona2-leveltaxonomyover590original
attributes. 15
Experiment setup (2)Usersimulator• Lackanofflineexperimentenvironmentforconversationalrecommendation.• Weusetherealinteractionspairbetweenuseranditem.• Theusersimulatorwillkeepthetargetitemin“itsheart”,thengiveresponsesinteractively
toouragents.Responsesincludegiveanswertoaquestion,andaccept/rejectitemwhenouragentproposesalistofrecommendation.
Trainingdetails• Wesetthemaxlengthofconversationto15,andfixthelengthofrecommendationlistto
10. • WeuseSGDoptimizertotrainFMmodel(hiddensize=64),withL2regularizationof0.001,
thelearningrateofitempredictionis0.01andattributepredictionis0.001• Forthepolicynetwork(MLP),weuse2layerhiddensizeof256,wepre-trainitasa
classifieraccordingtomax-entropyresults,anduseREINFORCEalgorithmtotrainwithlearningrateof0.001.r_success=1,r_ask=0.1,r_quit=-0.3,r_prevent=-0.1,discountfactorγ=0.7 16
Main Experiment ResultsEvaluationMatrices:• SR@k(Successrateatk-thturn)• AT(Averageturnofconversation)
17
Experiment results – Estimation stage item and attribute prediction
The offline AUC score of prediction of item and attributes• Standard FM model, • FM + A (attribute aware item BPR)• FM + A + MT (Multitask learning)
18
Experiment results – Action stageStrategy to ask and recommend?
Weconductedablationstudyonthestatevectorfedintopolicynetwork,inordertofindthecontributionofeachcomponent.
• entropyseemstobethemostsalientcomponent.
19
Experiment Result: Reflection stage How to adapt to user's online feedback?
Performance of removing the online update module. Yelp suffers less than LastFM, Why?• Yelp dataset has a better offline AUC.• When offline AUC is higher, the reflection stage tend to have less effect.
“Bad update” in Yelp Dataset
20
Conclusion and Future Works• Weformalizethetaskofmulti-turnconversationalrecommendation
• Werefinetherecommendationsysteminaconversationalscenarioforattribute-awareitemrankingandattribute-awarepreferenceestimation.
• Weproposesathree-stagesolutionEARforCRS,outperformingstate-of-the-artbaselines.
• Weplantodoonlineevaluationandobtainreal-worldexposuredatabycollaboratingwithE-commercecompanies.
21
Thank you!
22
Spare Slides
23
Importance of this research projectTheImportanceofCRS(ConversationalRecommendationSystem):• Overcomethelimitationoftraditionalstaticrecommendersystems,thusimprove
user’ssatisfactionandbringrevenueforbusiness!• Embracerecentadvancesinconversationtechnology.
TheAdvancesBroughtByOurWork:• We’rethefirsttoconsiderarealisticmulti-roundconversationalrecommendation
scenario.• UnifyingCC(ConversationComponent)andRC(RecommenderComponent),and
proposeanovelthree-stagedsolutionEAR.• Webuildtwodatasetsbysimulatinguserconversationstomakethetasksuitablefor
offlineacademicresearch.
24
Literature Review (1)
• StaticTraditionalRecommendationSystems:-CollaborativeFiltering-MatrixFactorization-FactorizationMachine-etc...
• Limitation1:-Offline:learnfromuserhistorydata,socanonlymimicuser'shistorypreference.• Limitation2:-Usercannotexplicittellsystemherpreference.-Systemcannotleverageuser'sfeedback.
Existingonlinerecommendationmethods(bandit):• epsilon-greedy• Thompson-Sampling• UpperConfidenceBound(UCB)• Linear-UCB• CollaborativeUCB...
Limitation:• Canonlyattempttorecommend
items,cannotaskattributesofitem• Themathematicsformulationof
banditrestrictsittoonlyrecommend1itemeachturn.
25
Literature Review (2)Towards Conversational Recommendation — Sun et.al. SIGIR 2018
Limitation:- Can only recommend for 1 time.
The session will end regardless of success or not.
- Recommender Component and Conversation Component are isolated part.
- Simply taking belief tacker as input for action decision.
Interaction?
Screenshot from SIGIR 2018, Towards Conversational Recommendation 26