What are the 100 best restaurants on Yelp? A difficult answer to a simple question

What are the 100 best restaurants on Yelp? A difficult answer to a simple question. Travis Brooks, PM - Search and Data Science, tbrooks@yelp, @traviscb1998

Upload: extract-data-conference

Post on 16-Apr-2017

TRANSCRIPT

What are the 100 best restaurants on Yelp?
A difficult answer to a simple question

Travis Brooks
PM - Search and Data Science

tbrooks@yelp @traviscb1998

Yelp’s Mission: Connecting people with great local businesses.

Yelp Stats: As of Q2 2015

83M 3268%83M

What types of things are on Yelp?

But we eat three times a day

Question

What are the 100 best restaurants on Yelp?

Problem

What are the 100 best restaurants on Yelp?

Many words in the question are ill-defined

Best? For what? For whom?
Restaurant? Coffee? Tacos?
On Yelp? Tourists? Locals?

We won’t answer all of these, but we can play with a few

Possible Answer


But that’s cheating

Users give us a lot of information in their queries, locations, times, etc.

Search Experiments

Multiple Geographies

I want to be able to talk about methodologies…

...so let’s do it from scratch

How to start?

SELECT name, average_rating
FROM business
WHERE business_category = 'restaurant'
ORDER BY average_rating DESC
LIMIT 100;

Next Try?

SELECT name, average_rating
FROM business
WHERE business_category = 'restaurant'
  AND review_count > 50
ORDER BY average_rating DESC
LIMIT 100;
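The two queries above can be compared on a toy table. This is a minimal sketch using Python's built-in sqlite3 module; the rows are hypothetical and only the column names come from the slides. It illustrates what is presumably the motivation for the review_count filter: sorting on average_rating alone lets places with one or three reviews take the top spots.

```python
import sqlite3

# Hypothetical toy data: (name, business_category, average_rating, review_count)
ROWS = [
    ("New Taco Cart", "restaurant", 5.0, 1),
    ("Quiet Diner", "restaurant", 4.8, 3),
    ("Busy Bistro", "restaurant", 4.6, 900),
    ("Corner Cafe", "coffee", 4.9, 300),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business ("
    "name TEXT, business_category TEXT, average_rating REAL, review_count INTEGER)"
)
conn.executemany("INSERT INTO business VALUES (?, ?, ?, ?)", ROWS)


def top_names(min_reviews, limit=100):
    """Return restaurant names ordered by average rating, best first."""
    query = (
        "SELECT name FROM business "
        "WHERE business_category = 'restaurant' AND review_count > ? "
        "ORDER BY average_rating DESC LIMIT ?"
    )
    return [row[0] for row in conn.execute(query, (min_reviews, limit))]


# min_reviews=-1 keeps every restaurant, which matches the first query.
naive = top_names(min_reviews=-1)
# min_reviews=50 matches the second query.
filtered = top_names(min_reviews=50)
```

On this toy data the naive list is led by the one-review taco cart, while the filtered list contains only the heavily reviewed bistro.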

Popularity ~ Quality

Next Try?

SELECT name, average_rating + review_count AS score
FROM business
WHERE business_category = 'restaurant'
  AND review_count > 50
ORDER BY score DESC
LIMIT 100;

Next Try?

score = average_rating + b * review_count

Choose b such that you are happy with the list!

What does b mean?

How much difference, relative to rating, does an extra review make?

How much difference, relative to rating, do 900 reviews make?
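One way to get a feel for b is to try a couple of values on two made-up restaurants. The sketch below implements the score formula from the slide; the restaurants and the two values of b are hypothetical, chosen only to show the ranking flipping.

```python
def linear_score(average_rating, review_count, b):
    """score = average_rating + b * review_count, as on the slide."""
    return average_rating + b * review_count


# Hypothetical restaurants: (average_rating, review_count)
small_favorite = (4.9, 60)
popular_spot = (4.2, 960)

# b = 0.001 means 900 extra reviews are worth 0.9 stars of rating.
popular_wins = linear_score(*popular_spot, b=0.001) > linear_score(*small_favorite, b=0.001)

# b = 0.0001 means 900 extra reviews are worth only 0.09 stars.
favorite_wins = linear_score(*small_favorite, b=0.0001) > linear_score(*popular_spot, b=0.0001)
```

With b = 1, as in the query that simply adds review_count to average_rating, the review count swamps the rating entirely, which is why b needs tuning.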

Other Ideas

Wilson Score ~ estimate of confidence interval in the rating

Rank by the lower bound
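A minimal sketch of ranking by the Wilson lower bound follows. The Wilson interval applies to a proportion, so this assumes each rating has first been reduced to positive or not positive (for example, 4 or 5 stars counts as positive); that reduction and the 95% z value are assumptions, not something the slides specify.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.

    positive: number of ratings counted as positive (assumed mapping)
    total:    total number of ratings
    z:        normal quantile, 1.96 for roughly 95% confidence
    """
    if total == 0:
        return 0.0
    p_hat = positive / total
    denominator = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z * z / (4 * total)) / total)
    return (centre - margin) / denominator


# Two perfect ratings versus 450 positive ratings out of 500.
two_perfect = wilson_lower_bound(2, 2)
well_reviewed = wilson_lower_bound(450, 500)
```

The place with two perfect ratings gets a much lower bound than the place with 450 positive ratings out of 500, so it ranks below it even though its raw proportion is higher.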

Other Ideas

Smoothing - used by IMDB, among others

Add some number of “pseudoratings” at the global average rating. Choosing how many to add is an art...
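The pseudorating idea can be sketched in a few lines. The global average of 3.7 and the pseudo-count of 20 below are hypothetical values picked for illustration; as the slide says, choosing the count is an art.

```python
def smoothed_rating(average_rating, review_count, global_average, pseudo_count):
    """Average after adding pseudo_count ratings at the global average."""
    total_stars = average_rating * review_count + global_average * pseudo_count
    return total_stars / (review_count + pseudo_count)


GLOBAL_AVERAGE = 3.7  # hypothetical
PSEUDO_COUNT = 20     # hypothetical

# A place with two 5-star reviews is pulled strongly toward the global average.
new_place = smoothed_rating(5.0, 2, GLOBAL_AVERAGE, PSEUDO_COUNT)
# A place with 900 reviews barely moves.
established = smoothed_rating(4.6, 900, GLOBAL_AVERAGE, PSEUDO_COUNT)
```

A larger pseudo-count demands more evidence before a restaurant can move away from the global average; a pseudo-count of zero gives back the raw average.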

Other Ideas

Dirichlet and more sophisticated modelling
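The slides do not say how the Dirichlet model was set up, so the following is only one simple possibility, not the method used for the actual list: treat the counts of 1 to 5 star ratings as multinomial, put a Dirichlet prior on the star probabilities, and score by the posterior expected rating. The uniform prior of one pseudo-rating per star level is an assumption.

```python
def dirichlet_expected_rating(star_counts, prior=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Posterior expected star rating under a Dirichlet-multinomial model.

    star_counts: observed counts of 1, 2, 3, 4 and 5 star ratings, in order
    prior:       Dirichlet pseudo-counts per star level (assumed uniform)
    """
    posterior = [count + alpha for count, alpha in zip(star_counts, prior)]
    total = sum(posterior)
    # Posterior mean probability of each star level is posterior[k] / total.
    return sum(stars * weight for stars, weight in enumerate(posterior, start=1)) / total


# Two 5-star ratings only.
sparse = dirichlet_expected_rating([0, 0, 0, 0, 2])
# 900 ratings, mostly 4 and 5 stars.
dense = dirichlet_expected_rating([20, 30, 50, 200, 600])
```

This behaves like the smoothing approach, but keeps the full distribution over star levels, which is what more sophisticated variants (for example, ranking by a lower credible bound) would build on.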

Product Perspective

Analysis -> Insights -> Actionable Insights -> Impact

- Paraphrased from Ken Rudin

Goals?

Encourage exploration? We probably have better, more targeted ways to do that

Reward good restaurants? I don’t object, but not sure it’s that effective

Inspire trust in Yelp? Maybe, but how many of a top 100 list is any person likely to have been to?

Generate buzz and discussion? Well - if you are interested in trying some of them, and itching to tell us we are wrong - perhaps we succeeded...

Takeaways

Data-driven products are subtle - even if the feature is trivial (a list)

Reasoning from first principles is great - but getting dirty and playing with the data is a better way to see what is likely to work

Keep the product goals in mind - the distractions here can be scientific and statistical puzzles and challenges, rather than traditional engineering and design rabbit holes

Thanks:
Dan Frank - Dirichlet work and 2015 List
Will Seltzer - Dirichlet work
Yelp Users - Ratings, Reviews and Photos

Further Reading:
Bayesian Methods for Hackers
Wilson Score and How not to sort by rating
Nate Silver’s Burrito Bracket (This intro highly recommended for interesting methodology)

@YelpEngineering

YelpEngineers

engineeringblog.yelp.com

github.com/yelp

yelp.com/careers