Towards Better Online Personalization: A Framework for Empirical Evaluation and
Real-Life Validation of Hybrid Recommendation Systems
Stijn Geuens, Koen W. De Bock, Kristof Coussement
Recommendation Systems: Examples
AMS World Marketing Congress 2016, 07/20/2016
How to Calculate Recommendations [Bobadilla et al. 2013; Adomavicius et al. 2008]
Classification based on calculation paradigm:
Memory-based [Goldberg, 1992]
Model-based [Koren, 2008]
Classification based on input data:
Socio-demographic information: Demographic RecSys [e.g. Pazzani 1999; Porcel et al. 2012]
Product characteristics: Content-based RecSys [e.g. Lang 1995; Meteren and Someren 2000]
Real-time navigation information: Knowledge-based RecSys [e.g. Burke 2000]
Behavioral history: Collaborative filtering RecSys [e.g. Herlocker et al. 2004]
Hybrid solutions [e.g. Burke 2002; Preece and Shneiderman 2009]
A Shift Towards Hybrid Algorithms
Single data source systems: advantages and disadvantages [Bobadilla et al. 2013]
Hybridization resolves these issues and leads to better performance [Bobadilla et al. 2013]
Algorithm combination vs. data source combination [Bobadilla et al. 2013]
Burke’s classification [Burke, 2002]: Weighting
Feature combination
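The two hybridization classes named above differ in where the combination happens. A minimal sketch of the weighting approach, assuming each single data source recommender has already produced a relevance score per item: the scores are merged a posteriori with a weighted average. The function names, item ids, and weights below are hypothetical and chosen only for illustration; they do not come from the study.

```python
from typing import Dict, List

Scores = Dict[str, float]  # item id -> predicted relevance


def weighted_hybrid(sources: Dict[str, Scores], weights: Dict[str, float]) -> Scores:
    """Combine per-source scores a posteriori with a weighted average."""
    total = sum(weights[name] for name in sources)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    combined: Scores = {}
    for name, scores in sources.items():
        for item, score in scores.items():
            # An item missing from a source simply contributes nothing for that source.
            combined[item] = combined.get(item, 0.0) + weights[name] * score / total
    return combined


def top_n(scores: Scores, n: int) -> List[str]:
    """Return the n best items, breaking ties by item id for determinism."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Hypothetical scores from two single data source recommenders.
sources = {
    "collaborative": {"shoe_a": 0.9, "shoe_b": 0.2, "shoe_c": 0.4},
    "content_based": {"shoe_a": 0.1, "shoe_b": 0.8, "shoe_c": 0.5},
}
weights = {"collaborative": 0.75, "content_based": 0.25}

hybrid = weighted_hybrid(sources, weights)
ranking = top_n(hybrid, 2)
```

Feature combination, by contrast, feeds the features of all data sources into a single model, so the combination is learned during training rather than applied to finished scores.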
Contributions
Go beyond the creation of a hybrid algorithm by:
Creating a decision framework that guides marketing academics and professionals in building recommendation systems
Opening the black box of recommendation systems by introducing the concept of feature importance
Research Questions
Data:
RQ1.a. Do recommendation systems based on different single data sources differ in performance?
RQ1.b. Does combining different data sources add predictive performance?
Recommendation Calculation:
RQ2. Which hybridization technique performs best for algorithms with the optimal number of data sources?
Feature Importance:
RQ3. Which are the most important predictors in the best performing algorithm?
Framework
[Figure: framework diagram, built up over several slides; the diagram itself is not recoverable. Citations shown on the successive builds: Song, 2000; Kohavi et al., 2004; Rendle, 2010; Burke, 2002; Adomavicius & Tuzhilin, 2005; Lipton, 2014; Herlocker et al., 2004; Breiman, 2003]
Experimental Setup
8 different company-specific datasets
Product Category       Visitors   Products
Shoes                    31,536     11,712
Children's Clothing      16,752      3,956
Decoration               12,747      5,054
Lingerie                 11,672      3,514
Furniture                20,507      6,481
Men's Clothing            8,412      4,737
Women's Clothing         50,336     12,979
Household linen          12,376      2,934
Evaluation metric: F1@5 [Lipton, 2014]
Method of analysis:
Evaluation (Data and Recommendation Calculation): Friedman aligned rank test with Li’s procedure for post hoc testing [García, 2010]
Interpretation (Variable importance): implementation of Breiman’s (2003) method developed for random forests
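\[ \mathrm{FeatImp}_i = \frac{F1@5_{\mathrm{Full}} - F1@5_{\mathrm{Random\ permutation},\, i}}{F1@5_{\mathrm{Full}}} \]
\[ \mathrm{FeatImp}_{\mathrm{aggr},\, i} = \frac{1}{d} \sum_{d} \mathrm{FeatImp}_i \]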
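The metric and the importance scores can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, precision is computed over the items actually recommended when fewer than k are available, and the aggregation is a plain mean because the slides do not say what d ranges over. The recommender that would be re-scored after each permutation is left out.

```python
import random
from typing import Dict, List, Sequence, Set


def f1_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """F1 of the top-k recommendations against the set of relevant items."""
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)  # assumption: divide by the items actually shown
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def permute_column(rows: List[Dict], column: str, seed: int = 0) -> List[Dict]:
    """Return copies of the rows with one feature randomly permuted across rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def feat_imp(f1_full: float, f1_permuted: float) -> float:
    """Relative drop in F1@5 after permuting feature i (first formula)."""
    return (f1_full - f1_permuted) / f1_full


def feat_imp_aggr(importances: Sequence[float]) -> float:
    """Mean of FeatImp values (the (1/d) * sum in the second formula)."""
    return sum(importances) / len(importances)


# Hypothetical example: five recommendations, three relevant items.
score = f1_at_k(["a", "b", "c", "d", "e"], {"a", "c", "x"}, k=5)
```

In use, one would compute F1@5 with the full model, permute one feature with `permute_column`, recompute F1@5, and pass both values to `feat_imp`.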
Results: Data
RQ1.a. Do recommendation systems based on different single data sources differ in performance?
Yes, single data source recommendation systems differ in performance.
When building a single data source recommender system, a company does best to focus on a system based on RBD (Raw Behavioral Data) or PD (Product Data).
[Figure not recoverable; legend: dashed lines (----) indicate a non-significant difference at the 95% confidence level]
RQ1.b. Does combining different data sources add predictive performance?
Yes, performance increases when data sources are added.
It is worthwhile for a company to investigate combining data sources to improve the performance of its recommendation systems.
[Figure not recoverable; legend: dotted lines (......) indicate a marginally significant difference]
Results: Recommendation Calculation
RQ2. Which hybridization technique performs best for algorithms with the optimal number of data sources?
Factorization machines outperform a posteriori weighting of single data source algorithms
It is worthwhile for a company to investigate advanced hybridization techniques to improve the performance of recommendation systems
Results: Feature Importance
RQ3. Which are the most important predictors in the best performing algorithm?
Within the best-performing algorithm (RQ1 and RQ2), the data sources differ in importance: RBD > PD > CD > ABD (Raw Behavioral Data, Product Data, Customer Data, Aggregated Behavioral Data).
[Bar chart not recoverable: importance per data source on an axis from 0% to 40%; bars for Aggregated Behavioral Data, Customer Data, Product Data, and Raw Behavioral Data]
[Bar chart not recoverable: importance per individual feature on an axis from 0% to 14%, with bars keyed by data source (RBD, PD, CD, ABD). Feature labels in the order shown: Number of total purchases, Mean product rating, Total value of purchases, Length of relationship, Time since last purchase, Internal vs external, Value-based segmentation, Mean product rating, Explicit ratings, Number of children, Marital status, Place of residence, Age of children, Brand, Gender, Age, Internal search, Product Division 3, Product Division 2, Product Division 1, Purchases, Addition to cart, Views]
Conclusions
A framework to guide marketing professionals and academics in their efforts to create recommendation systems
Empirical validation of the framework on 8 datasets:
Single data source recommendation systems differ in performance
Combining data sources adds to the performance of recommendation systems
An advanced combination technique based on feature combination outperforms a posteriori weighting of single data source algorithms
RBD is the most important data source in the best performing model followed by PD, CD, and finally ABD
Future Work
Incorporation of other evaluation metrics in the framework
Field test: evaluation of different recommendation strategies in terms of business metrics
Identification of the relationship between ‘academic’ metrics and business metrics
References
J. Bobadilla, F. Ortega, A. Hernando, A. Gutierrez, Recommender systems survey, Knowl.-Based Syst., 46 (2013) 109-132
G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng., 17 (2005) 734-749
Y. Koren, Factorization meets the neighborhood: A multifaceted collaborative filtering model, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Las Vegas, NV, 2008, pp. 426-434
M.J. Pazzani, A framework for collaborative, content-based and demographic filtering, Artif. Intell. Rev., 13 (1999) 393-408
C. Porcel, A. Tejeda-Lorente, M.A. Martinez, E. Herrera-Viedma, A hybrid recommender system for the selective dissemination of research resources in a technology transfer office, Inform. Sciences, 184 (2012) 1-19
R. Burke, Hybrid recommender systems: Survey and experiments, User Modeling and User-Adapted Interaction, 12 (2002) 331-370
J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inf. Syst., 22 (2004) 5-53
I.-Y. Song, Database Design for Real-World E-Commerce Systems, IEEE Data Engineering Bulletin, 23 (2000) 23-28.
R. Kohavi, L. Mason, R. Parekh, Z. Zheng, Lessons and Challenges from Mining Retail E-Commerce Data, Mach. Learn., 57 (2004) 83-113
S. Rendle, Factorization Machines, IEEE International Conference on Data Mining, Sydney, Australia, 2010
Z.C. Lipton, C. Elkan, B. Narayanaswamy, Optimal thresholding of classifiers to maximize F1 measure, in: T. Calders, F. Esposito, E. Hüllermeier, R. Meo (Eds.) Machine Learning and Knowledge Discovery in Databases, Springer Berlin Heidelberg 2014, pp. 225-239
L. Breiman, Random forests, Mach. Learn., 45 (2001) 5-32
Thank you for your Attention
Contact: Stijn Geuens
IESEG School of Management
3 Rue de la Digue
F-59000 Lille
(0)3.20.545.892
[email protected]
fr.linkedin.com/pub/stijn-geuens/
stijn.geuens
Appendix 1: Advantages and disadvantages of different systems [Burke, 2002]
Columns: Collaborative Filtering | Content-based | Knowledge-Based | Demographic
Pros
No metadata engineering needed | Comparison between items possible | Deterministic | No metadata engineering needed
Serendipity in results | No metadata engineering needed | No cold-start | Serendipity in results
Adaptive | Adaptive (column assignment not recoverable)
Cons
Scalability | Overspecialization | Knowledge engineering required | Long tail
Cold start for new users and items | Cold start for new users | Subjective | Cold start for new users
Long tail problem | Collection of product information | Static | Static
Stability (column assignment not recoverable)
Appendix 2: Experimental Framework: Data
Product Data
Three main product divisions
Brand
Mean product rating
Internal vs. external
Availability on the web
Customer Data
Age
Gender
Marital status
Place of residence
Number of children
Age of children
Aggregated Behavioral Data
RFM
Time since last purchase
Number of total purchases
Total value of purchases
Relationship features
Length of Relationship
Value-based segmentation
Mean product rating
Raw Behavioral Data
Explicit ratings
Purchases
Internal search
Addition to cart
Views
Appendix 3: Experimental Framework: Recommendation Calculation
Factorization Machines, introduced by Rendle (2010)
Combine the advantages of Support Vector Machines (SVM) and factorization models.
SVM: works with any real-valued feature vector, which makes it possible to integrate different data sources.
Factorization models: variable interactions are computed from factorized parameters, which makes it possible to estimate interactions under huge sparsity, where SVMs fail.
General FM model equation of degree 2 [Rendle, 2010]:
\[ \hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j \]
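A minimal sketch of how a degree-2 factorization machine scores one feature vector, following the model form in Rendle (2010): a global bias, a linear term, and pairwise interactions whose weights are inner products of per-feature factor vectors. The function names and parameter values are hypothetical and not fitted to any data. The first function uses the linear-time reformulation of the pairwise term, the second spells out the double sum for comparison.

```python
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               v: Sequence[Sequence[float]]) -> float:
    """Degree-2 FM score using the O(k * n) form of the pairwise term."""
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    k = len(v[0])  # number of latent factors per feature
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     v: Sequence[Sequence[float]]) -> float:
    """Same model, written as the explicit double sum over feature pairs."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score


# Hypothetical sparse feature vector mixing several data sources.
x = [1.0, 0.0, 1.0, 0.5]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.0]
v = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.5, 0.5]]

fast = fm_predict(x, w0, w, v)
slow = fm_predict_naive(x, w0, w, v)
```

Because each interaction weight is built from shared factor vectors rather than estimated independently, an interaction between two features can still be scored when that pair rarely or never co-occurs in the training data.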