What are the 100 best restaurants on Yelp?
A difficult answer to a simple question
Travis Brooks
PM - Search and Data Science
tbrooks@yelp @traviscb1998
Problem
What are the 100 best restaurants on Yelp?
Many words in the question are ill-defined:
Best? For what? For whom?
Restaurant? Coffee? Tacos?
On Yelp? Tourists? Locals?
We won’t answer all of these, but we can play with a few
But that’s cheating
Users give us a lot of information in their queries, locations, times, etc.
Search Experiments
Multiple Geographies
I want to be able to talk about methodologies…
...so let’s do it from scratch
How to start?
SELECT name, average_rating FROM business WHERE business_category = 'restaurant' ORDER BY average_rating DESC LIMIT 100;
Next Try?
SELECT name, average_rating FROM business WHERE business_category = 'restaurant' AND review_count > 50 ORDER BY average_rating DESC LIMIT 100;
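To see why the review-count filter matters, here is a minimal sketch that runs both queries against an in-memory SQLite table. The table layout follows the column names on the slides; the four restaurants and their numbers are made up for illustration and are not Yelp data.

```python
import sqlite3

# Hypothetical toy data: (name, average_rating, review_count).
ROWS = [
    ("One Review Wonder", 5.0, 1),
    ("Solid Favorite", 4.6, 800),
    ("Busy Diner", 4.1, 2500),
    ("New Taco Stand", 4.9, 12),
]


def top(conn, min_reviews=0, limit=100):
    """Return restaurant names ordered by average rating, highest first.

    With min_reviews=0 every row in the toy data passes the filter,
    which matches the first, unfiltered query.
    """
    cur = conn.execute(
        "SELECT name FROM business "
        "WHERE business_category = 'restaurant' AND review_count > ? "
        "ORDER BY average_rating DESC LIMIT ?",
        (min_reviews, limit),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business (name TEXT, average_rating REAL, "
    "review_count INTEGER, business_category TEXT)"
)
conn.executemany("INSERT INTO business VALUES (?, ?, ?, 'restaurant')", ROWS)

naive = top(conn, min_reviews=0)      # a single 5-star review wins
filtered = top(conn, min_reviews=50)  # only well-reviewed places remain
print(naive)
print(filtered)
```

The unfiltered list is led by a place with one review, while the filtered list drops it entirely. The threshold of 50 is a hard cutoff, so a place with 49 reviews is treated the same as a place with one.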
Next Try?
SELECT name, average_rating + review_count AS score FROM business WHERE business_category = 'restaurant' AND review_count > 50 ORDER BY score DESC LIMIT 100;
Next Try?
score = average_rating + b * review_count
Choose b such that you are happy with the list!
What does b mean?
How much difference, relative to rating, does an extra review make?
How much difference, relative to rating, do 900 reviews make?
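One way to get a feel for b is to compute how many stars a block of reviews is worth and watch the ranking flip as b changes. The sketch below uses the score formula from the slide; the two restaurants and the candidate values of b are hypothetical, chosen only to show the effect.

```python
def linear_score(average_rating, review_count, b):
    """score = average_rating + b * review_count, as on the slide."""
    return average_rating + b * review_count


def rank(businesses, b):
    """Return names sorted by linear score, best first."""
    ordered = sorted(
        businesses,
        key=lambda row: linear_score(row[1], row[2], b),
        reverse=True,
    )
    return [row[0] for row in ordered]


# Hypothetical restaurants: (name, average_rating, review_count).
BUSINESSES = [
    ("Small Gem", 4.8, 100),
    ("Big Crowd Pleaser", 4.2, 1000),
]

for b in (1.0, 0.001, 0.0001):
    stars_for_900_reviews = b * 900
    print(b, stars_for_900_reviews, rank(BUSINESSES, b))
```

With b = 1 the score is essentially the review count, so the busier place always wins. With b = 0.001, 900 extra reviews are worth 0.9 stars, which is still enough to overcome the 0.6-star rating gap here. With b = 0.0001 they are worth 0.09 stars and the higher-rated place comes out on top.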
Other Ideas
Smoothing - used by IMDb, among others
Add some number of “pseudoratings” at the global average rating. Choosing how many to add is an art...
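A minimal sketch of this kind of smoothing, assuming we only have each restaurant's average rating and review count. The global average of 3.7 and the 50 pseudoratings are hypothetical values picked for illustration, not numbers from the talk.

```python
def smoothed_rating(average_rating, review_count, global_average, pseudo_count):
    """Average after adding pseudo_count ratings at the global average.

    Equivalent to (sum of real ratings + pseudo_count * global_average)
    divided by (review_count + pseudo_count).
    """
    total_stars = average_rating * review_count + global_average * pseudo_count
    return total_stars / (review_count + pseudo_count)


GLOBAL_AVERAGE = 3.7  # hypothetical global mean rating
PSEUDO_COUNT = 50     # hypothetical number of pseudoratings

one_review = smoothed_rating(5.0, 1, GLOBAL_AVERAGE, PSEUDO_COUNT)
established = smoothed_rating(4.6, 800, GLOBAL_AVERAGE, PSEUDO_COUNT)
print(round(one_review, 3), round(established, 3))
```

A place with a single 5-star review is pulled almost all the way back to the global average, while a place with 800 reviews barely moves. Raising the pseudorating count makes the list favor heavily reviewed places; setting it to zero recovers the raw average.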
2014 List - Smoothed - All time faves
2015 List - Dirichlet - Weight 2014 Reviews
Product Perspective
Analysis -> Insights -> Actionable Insights -> Impact
- Paraphrased from Ken Rudin
Goals?
Encourage exploration? We probably have better, more targeted ways to do that
Reward good restaurants? I don’t object, but not sure it’s that effective
Inspire trust in Yelp? Maybe, but how many of a top 100 list is any person likely to have been to?
Generate buzz and discussion? Well - if you are interested in trying some of them, and itching to tell us we are wrong - perhaps we succeeded...
Takeaways
Data-driven products are subtle - even if the feature is trivial (a list)
Reasoning from first principles is great - but getting dirty and playing with the data is a better way to see what is likely to work
Keep the product goals in mind - the distractions here can be scientific and statistical puzzles and challenges, rather than traditional engineering and design rabbit holes
Thanks
Dan Frank - Dirichlet work and 2015 List
Will Seltzer - Dirichlet work
Yelp Users - Ratings, Reviews and Photos
Further Reading:
Bayesian Methods for Hackers
Wilson Score and How not to sort by rating
Nate Silver’s Burrito Bracket (This intro highly recommended for interesting methodology)