predicting expedia hotel cluster groupings with user...

Predicted label Predicted label Predicted label Model Selection and Combination • Generalize SVM Prediction to compute an ordered list of 5 clusters • n(n-1)/2 one vs. one classifiers • Each classifier gets a “vote” • Sort the cluster numbers in decreasing order by number of votes, and choose 5 • Decision Tree: 5 with highest probability • Unique proportional ensembling of of both classifiers • Multipliers for each classifier normalized and made proportional to generalization error • Higher weight to the model with lowest generalization error • For each item in each of the 2 lists, calculate score by position • Use 5 highest-scoring items in sorted order Scoring • MAP@5 (Mean Average Precision @ 5) score to evaluate list of 5 predictions for hidden test data Predicted label Predicted label • Statistics • Training data • N = 3,000,693 • Test Data • N = 2,528,243 • 100 hotel clusters • No strong individual correlations between 19 features and hotel_cluster • Preprocessing • Normalization [0, 1] for each feature • Standardization (N~(0,1)) Predicting Expedia Hotel Cluster Groupings with User Search Queries Jarrod Cingel and Liezl Puzon Stanford University Project Objective Data Results ● Provide smarter suggestions for Expedia users by predicting what group of hotels a user will book a hotel from based on certain search criteria • PCA • 5 PC’s account for 99% of the variance • Wrapper method to select 5 features with forward search • Average k-fold cross validation score as evaluation function Feature Selection Methodology 0.00 0.20 0.40 0.60 0.80 1.00 1 3 5 7 9 11 13 15 17 19 Variance vs # PC's Cumulative Variance Combine predictions Predict 5 Train SVM Predict 5 Train Decision Tree 0 0.2 0.4 0.6 MAP@5 Score MAP@5 Cross Validation on Local Training Set First guess only All five guesses MAP@5 on Hidden Set Optimal model combination has greater precision than the individual SVM, decision tree models SVM Decision Trees Combined • Source: Kaggle.com • Input: user queries on Expedia.com • Output: Hotel cluster booking • Hidden test data on Kaggle Precision Mean: 0.620 Range: 0.198 (id 71) 0.999 (id 24) Recall Mean: 0.525 Range: 0.048 (id 88) 1.000 (id 74) Predicted label Precision Mean: 0.620 Range: 0.067 (id 91) 1.000 (id 24) Recall Mean: 0.525 Range: 0.029 (id 14) 1.000 (id 24) Precision Mean: 0.670 Range: 0.000 (id 24) 1.000 (id 35) Recall Mean: 0.533 Range: 0.000 (id 24) 1.000 (id 27) |U| = number of user events P(k) = precision at cutoff k n = number of predicted hotel clusters.

Upload: dodung

Post on 19-Feb-2018

218 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Predicting Expedia Hotel Cluster Groupings with User ...cs229.stanford.edu/proj2016spr/poster/065.pdf · Expedia.com • Output: Hotel cluster booking • Hidden test data on Kaggle

Predicted label

Model Selection and Combination •  Generalize SVM Prediction to compute an ordered list of

5 clusters •  n(n-1)/2 one vs. one classifiers •  Each classifier gets a “vote” •  Sort the cluster numbers in decreasing order by

number of votes, and choose 5 •  Decision Tree: 5 with highest probability •  Unique proportional ensembling of of both classifiers

•  Multipliers for each classifier normalized and made proportional to generalization error •  Higher weight to the model with lowest

generalization error •  For each item in each of the 2 lists, calculate score

by position •  Use 5 highest-scoring items in sorted order Scoring •  MAP@5 (Mean Average Precision @ 5) score to

evaluate list of 5 predictions for hidden test data

Predicted label

•  Statistics •  Training data

•  N = 3,000,693 •  Test Data

•  N = 2,528,243 •  100 hotel clusters •  No strong individual correlations between 19

features and hotel_cluster •  Preprocessing

•  Normalization [0, 1] for each feature •  Standardization (N~(0,1))

Predicting Expedia Hotel Cluster Groupings with User Search Queries Jarrod Cingel and Liezl Puzon

Stanford University

Project Objective

Data

Results ●  Provide smarter suggestions for Expedia users

by predicting what group of hotels a user will book a hotel from based on certain search criteria

•  PCA •  5 PC’s account for

99% of the variance •  Wrapper method to

select 5 features with forward search •  Average k-fold cross

validation score as evaluation function

Feature Selection

Methodology

0.00

0.20

0.40

0.60

0.80

1.00

1 3 5 7 9 11 13 15 17 19

Variance vs # PC's Cumulative Variance

Combine predictions

Predict 5 Train SVM

Predict 5 Train Decision Tree

0 0.2 0.4 0.6

MAP@5 Score

MAP@5

Cross Validation on Local Training Set First guess only All five guesses

MAP@5 on Hidden Set Optimal model combination has greater precision than the individual SVM, decision tree models

Dec

isio

n Tr

ees

Com

bine

•  Source: Kaggle.com •  Input: user queries on

Expedia.com •  Output: Hotel cluster

booking •  Hidden test data on

Kaggle

Precision Mean: 0.620 Range: 0.198 (id 71) 0.999 (id 24) Recall Mean: 0.525 Range: 0.048 (id 88) 1.000 (id 74)

Predicted label

Precision Mean: 0.620 Range: 0.067 (id 91) 1.000 (id 24) Recall Mean: 0.525 Range: 0.029 (id 14) 1.000 (id 24)

Precision Mean: 0.670 Range: 0.000 (id 24) 1.000 (id 35) Recall Mean: 0.533 Range: 0.000 (id 24) 1.000 (id 27)

|U| = number of user events P(k) = precision at cutoff k n = number of predicted hotel clusters.

PDF DARU - Home - Suvarna Suterasuvarnasutera.com/public/fck/file/e-brosur daru_september...Puma CLUSTER ABIRA @CLUSTER BIANCA CLUSTER ALAM CLUSTER BAYIJ CLUSTER CITRA CLUSTER DHANA

Prediction of Stock Price Movement from Options Datacs229.stanford.edu/proj2016spr/poster/052.pdfTypeequationhere. Typeequationhere. Prediction of Stock Price Movement from Options

Machine Learning for Autonomous Jellyﬁsh Orientation ...cs229.stanford.edu/proj2016spr/report/023.pdfMachine Learning for Autonomous Jellyﬁsh Orientation Determination and Tracking

Cluster Profile Report Faridabad - Mixed Engineering Cluster Profile Report - Faridabad... · Cluster Profile Report Faridabad - Mixed Engineering Cluster ... Cluster Profile Report

Expedia.com pptx

developer.ean.com [email protected] Subject: Thack London 2012

Cluster Incharge Schools of the Cluster Cluster No. Sch ID

mrslooney.weebly.commrslooney.weebly.com/.../5/4/9454444/_intnl.case.study.projects.6… · Web viewFrom which airport in the United States will you fly out? Using Expedia.com, copy

EXPEDIA.COM 2011 VACATION DEPRIVATION STUDY · EXPEDIA.COM 2011 VACATION DEPRIVATION STUDY For press inquiries, please call (425) 679-4317 or email [email protected]. Expedia’s

Classification of hotels for Expedia. - LTHfileadmin.cs.lth.se/cs/Education/edan70/AIProjects/2016/slides/... · site_name – Expedia point of sale (Expedia.com, Expedia.se, ...)

Classiﬁcation of Hand Gestures using Surface ...cs229.stanford.edu/proj2016spr/report/040.pdfClassiﬁcation of Hand Gestures using Surface Electromyography Signals For Upper-Limb

Cluster mode and plf cluster

Topic Retrieval and Articles Recommendationcs229.stanford.edu/proj2016spr/poster/050.pdf · 2017. 9. 23. · Topic Retrieval and Articles Recommendation Yu Wang , Xinyu Shen, Jinzhi

Household Energy Disaggregation based on …cs229.stanford.edu/proj2016spr/report/067.pdf1 Household Energy Disaggregation based on Difference Hidden Markov Model Ali Hemmatifar ([email protected])

CROSS-VALIDATION AND MODEL SELECTION Many Slides are from: Dr. Thomas Jensen -Expedia.com and Prof. Olga Veksler - CS9840 - Learning and Computer Vision

Humanities Research Recommendations via …cs229.stanford.edu/proj2016spr/poster/008.pdfEmpirical Study: Humanities Research • Sparser datasets (fewer users, recommendations) •

EXUMA Selloffvacations.com Travelonly.com Expedia.com United.com Tpi.ca Aircanada.com Navtours.com Onetravel.com Aa.com Hotels.com Atlantis.com Delta.com/deltavacations.com Cruiseplanners.com

Automatic Segmentation and Diagnosis of Mass …cs229.stanford.edu/proj2016spr/report/064.pdfAutomatic Segmentation and Diagnosis of Mass Lesions in Mammograms Margaret Shen (marshen),

Using Cluster Analysis, Cluster Validation, and …compbio/thesis/JessShenThesis.pdf · Using Cluster Analysis, Cluster Validation, ... using cluster analysis, cluster validation,

Cluster-Cluster // The Launch Issue

CLUSTER -AN IDENTIFICATION FIXATION CLUSTER …

Material decomposition using neural network for …cs229.stanford.edu/proj2016spr/report/019.pdfMaterial decomposition using neural network for photon counting detector Picha Shunhavanich

li · 2007. 8. 9. · Discriminant analysis Clusterl Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Original-validated Cross-validated (Yo) significantly different from

Lecciones aprendidas de iniciativas cluster. Que son las iniciativas cluster? CLUSTER INICIATIVA CLUSTER RUTAS COMPETITIVAS

Demystifying the workings of Lending Clubcs229.stanford.edu/proj2016spr/report/039.pdf2 Data Pre-Processing The dataset, available atLending Club Website, is a comprehensive dataset

Introduction to Machine Learning, Stanford Universitycs229.stanford.edu/proj2016spr/poster/025.pdfIntroduction to Machine Learning, Stanford University With the dramatic growth of

Booking.Com - Expedia.Com - Hotels.Com (by Drakou Xristina - Nasiara Zoi)

WiMAX Cluster Cluster Optimization Report

ZigBee Cluster Library User Guide - NXP · PDF fileZigBee Cluster Library User Guide. ZigBee Cluster Library. ZigBee Cluster Library. ZigBee Cluster Library. ZigBee Cluster Library

Annexure-1 CLUSTER ULBS “Cluster” or “GMADA Cluster”

Cluster Initiative Bavaria - Cluster-Offensive Bayern · Chemistry Cluster 20 Cluster Energy Technology 22 Cluster Food 24 Cluster Forestry and Wood 26 Cluster Information and Communication

Presentazione Oss. Appennino · presente su expedia.com, venere.com, tui.it . I FENOMENI DELL’ESTATE 2010 SERVIZI RICHIESTI nelle strutture ricettive (oltre ai tradizionali) •

COMUNIDAD CLUSTER MEDELLiN & COLOMBIA CLUSTER … · 2010. 9. 23. · medellin & colombia cluster energia elÉctricaØ cluster textil/confecciÓn diseÑo y modaØ cluster construcciÓn

Regional Foundations of U.S. Competitiveness on... · Food Cluster Tourism Cluster Tourism Cluster California Agricultural Cluster California Agricultural Cluster State Government

Hybridization patterns in two contact zones of grass ... · Colour code: helvetica cluster cluster yellow + red yellow cluster eastern cluster cluster adjacent lineages red cluster