2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

Upload: natekupp

Post on 21-Apr-2017

Category: Engineering


TRANSCRIPT

Page 1: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack
Page 2: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

THUMBTACK NOVEMBER 2015

HIRING LOCAL PROFESSIONALS IS STILL SHOCKINGLY HARD

Directories have moved online, but the process hasn’t changed in generations

Page 3: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

THUMBTACK IS BUILDING THE BEST AND MOST TRUSTED WAY TO HIRE A PROFESSIONAL FOR ANY PROJECT, ANYTIME, ANYWHERE

Page 4: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

OUR MARKETPLACE CONNECTS CUSTOMERS AND PROFESSIONALS

Customers

• Interested, available and qualified professionals come to customers

• Customers have the confidence to know who is best and that they’re paying a fair price

• One-stop shop for all their service needs

• Free to use

Professionals

• A cost-effective, performance-based way to acquire new customers ($3–15 to submit each quote)

• Eliminates the need to spend time on outbound marketing

• Mobile platform to run their business

Page 5: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

CUSTOMERS CAN GET FROM REQUEST TO HIRE IN < 1 HOUR

Customers tell us what they need

• 800 categories with questions customized to each service

• 8-10 unique questions per category

Receive multiple quotes from pros

• Up to 5 quotes

• Median quote within 1 hour

• Pricing and response customized to customer’s unique needs

Compare prices, reviews, profiles

• Detailed info on each pro

• Reviews tied to past work

• Licensing and other credentials

Hire the pro who’s right for them

• Customers can call or message pros to discuss the work before hiring

Page 6: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

DATA INFRASTRUCTURE @ THUMBTACK

Democratizing access to data & building data products

Page 7: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

USE CASES

Experiments

• A/B Testing

Analytics

• Ad-hoc SQL

• BI Dashboarding

• Event analytics

Data Products

• Matching

• Pricing

Icon Credit: Noun Project. Blake Thompson, Mister Pixel, Creative Stall

Page 8: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

2014: WHERE WE CAME FROM

[Diagram. Data stores: events, relational. Consumers: Analytics & BI; Event Analytics; A/B Testing & Experiments]

Page 9: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

KEY PROBLEMS TO SOLVE

• Analytics queries hitting production Postgres replica

• MongoDB not scaling with event data volume

• Single Python process on one machine running Mongo queries for event & experiment analysis
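
The event and experiment analysis described in the last bullet boils down to grouping events by experiment variant and comparing conversion rates. Below is a minimal single-process sketch of that workload, not Thumbtack's actual code. The event fields (`type`, `variant`, `user_id`) and the choice of a pooled two-proportion z-test are assumptions made for illustration.

```python
import math
from collections import defaultdict


def conversion_by_variant(events):
    """Return {variant: (converted_users, exposed_users)} from raw events.

    Each event is a dict with hypothetical keys: "type" ("exposure" or
    "conversion"), "variant", and "user_id".
    """
    exposed = defaultdict(set)
    converted = defaultdict(set)
    for e in events:
        if e["type"] == "exposure":
            exposed[e["variant"]].add(e["user_id"])
        elif e["type"] == "conversion":
            converted[e["variant"]].add(e["user_id"])
    # Count a conversion only if the same user was exposed to that variant.
    return {v: (len(converted[v] & users), len(users))
            for v, users in exposed.items()}


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for rate_b - rate_a using a pooled standard error."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / se
```

Every event passes through one Python loop on one machine, so runtime grows linearly with event volume. That is the scaling limit the bullet points at.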

Page 10: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

2015 DATA PLATFORM INFRASTRUCTURE

[Architecture diagram; labels as they appear on the slide]

• events, relational

• Production Cluster

• Sqoop, Custom ETL

• HA HDFS

• JSON, Parquet (Snappy)

• Spark Core, Spark SQL, MLLib (1.5.1 on YARN)

• Impala, Spark SQL

• Airflow

• Analytics / BI: Looker, Mode Analytics

• Custom APIs

• A/B Testing, Matching, Pricing, Eng Clients, ...

Page 11: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack

WHAT'S NEXT?

• Spark: Investigating Spark Streaming for event data, several additional use cases for MLLib

• Migrating Spark batch jobs from cron onto Airflow

• Moving event ETL pipeline onto Kinesis
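
One likely motivation for moving batch jobs from cron onto a workflow scheduler is that dependencies between jobs become explicit, instead of being implied by staggered start times. The sketch below illustrates that idea with Python's standard-library `graphlib`. It is not Airflow's API, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical batch jobs; each maps to the set of jobs it depends on.
JOBS = {
    "sqoop_import_relational": set(),
    "event_etl_to_parquet": set(),
    "experiment_metrics": {"sqoop_import_relational", "event_etl_to_parquet"},
    "bi_dashboard_refresh": {"experiment_metrics"},
}


def run_order(jobs):
    """Return one valid execution order in which every job follows its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(jobs).static_order())


if __name__ == "__main__":
    for name in run_order(JOBS):
        print("would run:", name)
```

With cron, the same ordering has to be approximated by spacing start times far enough apart. A dependency graph lets a downstream job start as soon as its inputs are ready, and not before.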

Page 12: 2015-11-12 - Advanced Apache Spark Meetup @ Thumbtack