testtting
TRANSCRIPT
![Page 1: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/1.jpg)
Data Platform and Services
Vipul Sharma and Eyal Reuveni
![Page 2: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/2.jpg)
Agenda
Eventbrite
Data Products
Data Platform
Recommendations
Questions
![Page 3: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/3.jpg)
• A social event ticketing and discovery platform
• 50 millionth ticket sold
• Revenue doubled year over year
• 180 employees in SoMa, San Francisco
• Solving significant engineering problems
• Data
• Data, Infrastructure, Mobile, Web, Scale, Ops, QA
• Firing on all cylinders and hiring blazing fast: www.eventbrite.com/jobs
![Page 4: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/4.jpg)
Data Products
![Page 5: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/5.jpg)
![Page 6: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/6.jpg)
![Page 7: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/7.jpg)
Analytics
• Ad hoc queries by analysts
![Page 8: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/8.jpg)
Fraud and Spam
![Page 9: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/9.jpg)
Data Platform
![Page 10: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/10.jpg)
![Page 11: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/11.jpg)
Hadoop Cluster
• 30 persistent EC2 High-Memory Instances
• 30 TB disk with replication factor of 2, ext3 formatted
• CDH3
• Fair Scheduler
• HBase
![Page 12: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/12.jpg)
Infrastructure
• Search
  • Solr
  • Incremental updates towards event driven
• Recommendation/Graph
  • Hadoop
  • Native Java MapReduce
  • Bash for workflow
• Persistence
  • MySQL
  • HDFS
  • HBase
  • MongoDB (investigating Cassandra and Riak)
![Page 13: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/13.jpg)
Infrastructure
• Stream
  • RabbitMQ
  • Internal fire hose (investigating Kafka)
• Offline
  • MapReduce
  • Streaming
  • Hive
  • Hue
![Page 14: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/14.jpg)
Infrastructure - Sqoozie
• Workflow for MySQL imports to HDFS
• Generate Sqoop commands
• Run these imports in parallel
• Transparent to schema changes
• Include or exclude on column, data type, or table level
• Data type casting: tinyint(1) to Integer
• Distributed table imports
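The slide does not show how Sqoozie is implemented, so the following is only a minimal sketch of the idea it describes: generate one `sqoop import` command per table, optionally restricting columns and casting types, then run the commands in parallel. The function names, the JDBC URL, and the target directory layout are hypothetical; the flags `--connect`, `--table`, `--target-dir`, `--columns`, and `--map-column-java` are standard Sqoop import options, though availability may depend on the Sqoop version in use.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def build_sqoop_command(jdbc_url, table, target_root, columns=None, java_casts=None):
    """Return the argument list for importing one MySQL table into HDFS.

    columns:    optional list of column names to include (hypothetical filter).
    java_casts: optional {column: JavaType} overrides, e.g. tinyint(1) -> Integer.
    """
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", f"{target_root}/{table}",
    ]
    if columns:
        cmd += ["--columns", ",".join(columns)]
    if java_casts:
        mapping = ",".join(f"{col}={typ}" for col, typ in sorted(java_casts.items()))
        cmd += ["--map-column-java", mapping]
    return cmd


def run_imports(commands, runner=subprocess.check_call, max_workers=4):
    """Run the generated commands in parallel; `runner` is injectable for dry runs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))
```

Passing `runner=print` gives a dry run that only prints the commands, which is a cheap way to review what would be executed before touching the cluster.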
![Page 15: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/15.jpg)
Infrastructure - Blammo
• Raw logs are imported to HDFS via Flume
• Almost real-time: 5 min latency
• Logs are key-value pairs in JSON
• Each log producer publishes its schema in YAML
• Hive schema and schema YAML are kept in sync using Thrift
• Control exclusion and inclusion
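To make the schema-sync idea concrete, here is a small sketch that turns a producer's published schema into a Hive table definition while honoring an exclusion list. It is not Blammo's actual mechanism: the slide says the sync goes through Thrift, whereas this sketch simply renders DDL text. The schema is shown as an already-parsed Python dict standing in for the YAML file, and the function name, field names, and HDFS path are all hypothetical.

```python
def hive_ddl(table, schema, location, exclude=()):
    """Render a CREATE EXTERNAL TABLE statement from a {field: hive_type} schema.

    Fields listed in `exclude` are left out of the Hive table, mirroring the
    inclusion/exclusion control mentioned on the slide.
    """
    columns = [
        f"`{name}` {hive_type}"
        for name, hive_type in schema.items()
        if name not in exclude
    ]
    if not columns:
        raise ValueError("schema has no columns left after exclusion")
    body = ",\n  ".join(columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {body}\n)\n"
        f"LOCATION '{location}'"
    )


# Stand-in for a schema a log producer might publish in YAML (hypothetical fields).
page_view_schema = {"user_id": "BIGINT", "event_id": "BIGINT", "debug_blob": "STRING"}
ddl = hive_ddl("page_views", page_view_schema, "/logs/page_views", exclude={"debug_blob"})
```

Regenerating the statement whenever a producer changes its schema is one simple way to keep the two definitions from drifting apart.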
![Page 16: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/16.jpg)
Recommendations
![Page 17: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/17.jpg)
You will like to attend this event
![Page 18: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/18.jpg)
Recommendation Engines
Item Hierarchy (You bought a camera, so you need batteries - Amazon)
Collaborative Filtering: User-User Similarity (People who bought a camera also bought batteries - Amazon)
Collaborative Filtering: Item-Item Similarity (You like The Godfather, so you will like Scarface - Netflix)
Social Graph Based (Your friends like Lady Gaga, so you will like Lady Gaga; PYMK - Facebook, LinkedIn)
Interest Graph Based (Your friends who, like you, like rock music are attending an Eric Clapton event - Eventbrite)
![Page 19: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/19.jpg)
Why Interest?
Events are Social
Events are Interest
Dense Graph is Irrelevant
Interests are Changing
![Page 20: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/20.jpg)
How do we know your Interest?
• We ask you
• Based on your activity
  • Events attended
  • Events browsed
• Facebook interests
  • User interest has to match event category
  • Static
• Machine learning
  • Logistic regression using MLE
  • Sparse matrix is generated using MapReduce
  • A model for each interest
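The machine learning bullets can be illustrated with a minimal sketch of logistic regression fitted by maximum likelihood on sparse feature rows, with one model trained per interest. The optimizer (stochastic gradient ascent on the log-likelihood), the learning rate, the epoch count, and all names and toy data below are assumptions made for illustration; the slides do not say how the models were actually fitted.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, row):
    """Probability of the interest given a sparse row {feature_index: value}."""
    z = weights[-1] + sum(weights[j] * v for j, v in row.items())
    return sigmoid(z)


def train_logistic(rows, labels, n_features, lr=0.5, epochs=200):
    """Fit weights by stochastic gradient ascent on the log-likelihood.

    The last weight is the bias term. Only the non-zero entries of each
    sparse row are touched, which is what makes sparse input cheap.
    """
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for row, y in zip(rows, labels):
            err = y - predict(weights, row)  # d(log-likelihood)/dz
            for j, v in row.items():
                weights[j] += lr * err * v
            weights[-1] += lr * err
    return weights


def train_per_interest(rows, labels_by_interest, n_features):
    """Train one independent binary model for each interest."""
    return {
        interest: train_logistic(rows, labels, n_features)
        for interest, labels in labels_by_interest.items()
    }


# Toy data: feature 0 signals the "music" interest, feature 2 signals "tech".
rows = [{0: 1.0}, {0: 1.0, 1: 1.0}, {2: 1.0}, {1: 1.0, 2: 1.0}]
models = train_per_interest(
    rows, {"music": [1, 1, 0, 0], "tech": [0, 0, 1, 1]}, n_features=3
)
```

Training a separate binary model per interest keeps each problem small and lets a user belong to several interests at once.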
![Page 21: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/21.jpg)
Model Based vs Clustering
Building Social Graph is Clustering Step
Social Graph Recommendation is a Ranking Problem
Item-Item vs User-User
![Page 22: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/22.jpg)
Implicit Social Graph
[Diagram: graph of users U1-U5 and events E1-E4]
![Page 23: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/23.jpg)
Mixed Social Graph
[Diagram: graph of users U1-U5, events E1-E3, and the nodes FB and LI]
![Page 24: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/24.jpg)
15M * 260 * 260 ≈ 1.01 Trillion Edges
4 Billion edges ranked
Each node is a feature vector representing a User
Each edge is a feature vector representing a Relationship
![Page 25: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/25.jpg)
Feature Generation
• Mixed features
• A series of MapReduce jobs
• Output on HDFS in flat files; input to subsequent jobs
• Orders = event attendees
  • MAP: eid: uid
  • REDUCE: eid:[uid]
• Attendees social graph
  • Input: eid:[uid]
  • MAP: uid_i:[uid]
  • REDUCE: uid:[neighbors]
• Interest-based features, user-specific features, graph mining, etc.
• Upload feature values to HBase
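The two jobs listed above can be sketched in memory as follows. This is only an illustration of the data flow (orders to attendee lists, attendee lists to neighbor lists), written in plain Python rather than as the native Java MapReduce jobs the talk mentions; the function names and sample IDs are hypothetical.

```python
from collections import defaultdict


def group_attendees(orders):
    """Job 1. MAP emits eid: uid per order; REDUCE collects eid: [uid]."""
    attendees = defaultdict(set)
    for eid, uid in orders:
        attendees[eid].add(uid)
    return attendees


def build_social_graph(attendees):
    """Job 2. MAP emits uid_i: [uid] for every attendee of an event;
    REDUCE merges those lists across events into uid: [neighbors]."""
    neighbors = defaultdict(set)
    for uids in attendees.values():
        for uid in uids:
            neighbors[uid] |= uids - {uid}  # everyone else at the same event
    return neighbors


orders = [("e1", "u1"), ("e1", "u2"), ("e2", "u2"), ("e2", "u3")]
graph = build_social_graph(group_attendees(orders))
```

Here `u2` attended both events, so it ends up linked to `u1` and `u3`, while `u1` and `u3` are not linked to each other.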
![Page 26: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/26.jpg)
[Diagram: users U1, U2, and U3]
![Page 27: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/27.jpg)
HBase
![Page 28: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/28.jpg)
HBase
• Collects data from multiple MapReduce jobs
• Stores the entire social graph
• Over one million writes per second
![Page 29: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/29.jpg)
HBase
| rowid | neighbors | events | featureX |
|---|---|---|---|
| 2718282 | 101 | 3 | 0.3678795 |
![Page 30: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/30.jpg)
HBase
| rowid | 314159:n | 314159:e | 314159:fx | 161803:n | 161803:e | 161803:fx |
|---|---|---|---|---|---|---|
| 2718282 | 31 | 1 | 0.3183 | 83 | 2 | 0.618 |
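The two row layouts shown on the HBase slides can be modeled with plain dictionaries to see how they differ. This sketch rests on an assumption the slides do not state outright: that the prefixes `314159` and `161803` are the IDs of related users, and that the suffixes `n`, `e`, and `fx` hold per-relationship values analogous to `neighbors`, `events`, and `featureX`. No HBase client is used; the helper names are hypothetical.

```python
def user_row(neighbors, events, feature_x):
    """First layout: one fixed set of columns describing the user itself."""
    return {"neighbors": neighbors, "events": events, "featureX": feature_x}


def wide_row(edges):
    """Second layout: one group of columns per related user.

    `edges` maps a related user's ID to an (n, e, fx) tuple; each value
    becomes a column whose qualifier is prefixed with that ID.
    """
    row = {}
    for other_id, (n, e, fx) in edges.items():
        row[f"{other_id}:n"] = n
        row[f"{other_id}:e"] = e
        row[f"{other_id}:fx"] = fx
    return row


def edge_features(row, other_id):
    """Read back the (n, e, fx) values stored for one related user."""
    return tuple(row[f"{other_id}:{suffix}"] for suffix in ("n", "e", "fx"))


# Values taken from the two tables above, both stored under row key 2718282.
node = user_row(101, 3, 0.3678795)
edges = wide_row({314159: (31, 1, 0.3183), 161803: (83, 2, 0.618)})
```

Because the related user's ID lives in the column name, a single row lookup returns every relationship for that user, and new relationships add columns rather than rows.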
![Page 31: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/31.jpg)
Tips & Tricks
• Distributed cache database
• Sped up some MapReduce jobs by hours
• Be sure to use counters!
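The slide does not say how the distributed cache was used, so the sketch below shows only one common pattern that fits the description: each mapper loads a small lookup table once and joins against it in memory, avoiding a shuffle-heavy reduce-side join, while counters record how many records matched. The function name, counter names, and sample data are hypothetical, and a `collections.Counter` stands in for Hadoop's job counters.

```python
from collections import Counter


def map_side_join(records, lookup, counters):
    """Join each (uid, value) record against an in-memory lookup table.

    `lookup` plays the role of the small dataset shipped to every mapper;
    `counters` tallies hits and misses so data problems show up in the
    job summary instead of silently shrinking the output.
    """
    for uid, value in records:
        extra = lookup.get(uid)
        if extra is None:
            counters["lookup_miss"] += 1
            continue
        counters["lookup_hit"] += 1
        yield uid, value, extra


counters = Counter()
records = [("u1", 3), ("u2", 5), ("u9", 1)]
lookup = {"u1": "music", "u2": "tech"}
joined = list(map_side_join(records, lookup, counters))
```

A miss count that is unexpectedly high is often the first sign that the cached table is stale or keyed differently from the input.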
![Page 32: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/32.jpg)
Tips & Tricks
• Hive (ab)uses
• Almost as many Hive jobs as custom ones
• “flip join”
• Statistical functions using Hive
• UDF
![Page 33: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/33.jpg)
Tips & Tricks
• Memory, memory, memory
• LZO, WAL
• Combiners are great until
• Shuffle and sorting stage
• Hadoop ecosystem is still new
![Page 34: Testtting](https://reader036.vdocuments.net/reader036/viewer/2022081519/55c1879cbb61eb03568b458f/html5/thumbnails/34.jpg)
Questions?