Introducing VenmoPlus.com - Explore your Venmo network!
Qingpeng “Q.P.” Zhang, Insight Data Engineering Fellow
Features - VenmoPlus.com
● Fuzzy search of user names, with friend lists to help tell apart users with the same name
● Labeling the relationship between the payer and the receiver
● Friend recommendation
● Searching transactions in the friend circle
● Listing a user's friends
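The fuzzy name search could be expressed as an Elasticsearch `match` query with the `fuzziness` option. A minimal sketch of the request body, assuming a hypothetical user index with a `name` field (the deck does not show the real mapping):

```python
import json


def fuzzy_name_query(name: str, size: int = 10) -> dict:
    """Build an Elasticsearch match query that tolerates typos in a user name.

    The field name "name" is a hypothetical mapping, not taken from the deck.
    """
    return {
        "size": size,
        "query": {
            "match": {
                "name": {
                    "query": name,
                    # "AUTO" lets Elasticsearch choose the edit distance from the term length
                    "fuzziness": "AUTO",
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(fuzzy_name_query("Grahm Hadley"), indent=2))
```

Users who share a name would then be told apart by fetching each candidate's friend list from Redis.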
Demo: VenmoPlus.com
Challenge:
● Find the distance between nodes in a dynamic graph in real time
Solutions
● Two databases
○ Redis and Elasticsearch
● Algorithm design
○ BFS -> bidirectional search
○ Querying the relationship of a past transaction
● Query/search optimizations
A Tale of Two Databases
[Figure: architecture diagram with historical transactions, real-time transactions, and the API]
Redis for graph structure
User ID: name
420890 Graham Hadley
1630476 Leon Tang
810029 Harminder Toor
1371353 Ephraim Park
562884 Paul Min
User ID: friend set
420890 set(14935158, 562884)
1630476 set(1371353)
810029 set(190230, 14935158)
1371353 set(810029, 971156)
562884 set(196371, 1371353)
Scale: 35 million edges, 6 million nodes
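Because the graph changes with every payment, each new transaction has to update the friend sets. A minimal in-memory sketch of that layout, assuming payments are treated as undirected friendships; the class `FriendGraph` is hypothetical, and in Redis the two operations would correspond to `SADD` and `SMEMBERS` on a per-user set key:

```python
from __future__ import annotations

from collections import defaultdict


class FriendGraph:
    """In-memory stand-in for the Redis layout: one set of friend IDs per user ID.

    Hypothetical helper; in Redis, add_transaction would be two SADD calls and
    friends() an SMEMBERS call. No key naming scheme is assumed here.
    """

    def __init__(self) -> None:
        self._adj: dict[int, set[int]] = defaultdict(set)

    def add_transaction(self, payer: int, receiver: int) -> None:
        # Assumption: a payment links both users, so the edge goes in both sets
        self._adj[payer].add(receiver)
        self._adj[receiver].add(payer)

    def friends(self, user: int) -> set[int]:
        # .get does not trigger the defaultdict factory; return a copy
        return set(self._adj.get(user, ()))


if __name__ == "__main__":
    g = FriendGraph()
    g.add_transaction(420890, 562884)
    g.add_transaction(562884, 1371353)
    print(g.friends(562884))
```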
Elasticsearch for everything
Redis + Elasticsearch => search transactions within the user's friend circle
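One way to combine the two stores: read the user's friend IDs from Redis, then pass them to Elasticsearch as a `terms` filter next to the full-text `match` on the message. A sketch of the query body, assuming hypothetical field names `message` and `actor_id` (the payer); the deck does not show the actual query:

```python
from __future__ import annotations


def friend_circle_message_query(friend_ids: set[int], text: str, size: int = 20) -> dict:
    """Search transaction messages, restricted to payers in the user's friend circle.

    friend_ids is assumed to come from Redis (e.g. SMEMBERS on the user's friend set).
    Field names "message" and "actor_id" are hypothetical.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text relevance on the transaction note
                "must": {"match": {"message": text}},
                # Non-scoring filter: keep only transactions paid by a friend
                "filter": {"terms": {"actor_id": sorted(friend_ids)}},
            }
        },
    }


if __name__ == "__main__":
    print(friend_circle_message_query({1371353, 420890}, "pizza"))
```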
Breadth-first search -> bidirectional search
Shortest distance -> intersection of sets (friend lists), with N the typical number of friends per user:
● A's 1st-degree friends ∩ B's 1st-degree friends (distance 2): O(N^2) -> O(2*N)
● A's 2nd-degree friends ∩ B's 1st-degree friends (distance 3): O(N^3) -> O(N + N^2)
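The set-intersection check can be sketched in Python, assuming an undirected graph stored as a dict of friend sets. The function expands at most A's 2nd-degree and B's 1st-degree friends and reports distances up to 3; `None` stands for ">3". The function name is hypothetical:

```python
from __future__ import annotations


def distance(adj: dict[int, set[int]], a: int, b: int) -> int | None:
    """Distance between a and b up to 3 hops via friend-set intersections.

    Returns 0, 1, 2 or 3, or None when the distance is greater than 3.
    Assumes adj is symmetric (undirected friendships).
    """
    if a == b:
        return 0
    a1 = adj.get(a, set())  # A's 1st-degree friends
    if b in a1:
        return 1
    b1 = adj.get(b, set())  # B's 1st-degree friends
    if a1 & b1:  # A1 ∩ B1 non-empty => distance 2, roughly 2N work
        return 2
    # A's 2nd-degree friends: union of the friend sets of A's friends, roughly N^2 work
    a2 = set().union(*(adj.get(f, set()) for f in a1))
    if a2 & b1:  # A2 ∩ B1 non-empty => distance 3, roughly N + N^2 work
        return 3
    return None


if __name__ == "__main__":
    chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
    print([distance(chain, 1, x) for x in (2, 3, 4, 5)])
```

A full BFS from A would have to reach depth 3 to find the same answer; meeting in the middle only needs one extra hop on B's side.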
VenmoPlus.com
[Figure: AWS deployment using two m4.xlarge, two m4.large, and one t2.micro instances; cost $29.11/day]
More optimizations
● Store only the necessary fields in Elasticsearch
● Label the distance of historical transactions in a batch job, reducing the number of real-time queries
● Adjust AWS instance types to reduce cost
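The batch labeling could replay historical transactions in time order and label each one with the distance between payer and receiver just before its edge is added. A minimal sketch, using a plain BFS capped at 3 hops rather than the bidirectional search, and assuming undirected friendships; the function names are hypothetical:

```python
from collections import defaultdict, deque


def bounded_distance(adj, a, b, max_hops=3):
    """Plain BFS from a, stopping after max_hops; returns None if b is farther."""
    if a == b:
        return 0
    seen = {a}
    frontier = deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adj.get(node, ()):
            if nxt == b:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None


def label_history(transactions):
    """Replay (payer, receiver) pairs in time order; label each with the
    distance between the two users just before that transaction happened."""
    adj = defaultdict(set)
    labels = []
    for payer, receiver in transactions:
        labels.append(bounded_distance(adj, payer, receiver))
        # Add the edge only after labeling, so later transactions see it
        adj[payer].add(receiver)
        adj[receiver].add(payer)
    return labels


if __name__ == "__main__":
    print(label_history([(1, 2), (2, 3), (1, 3), (1, 2), (3, 4), (1, 4)]))
```

Replaying in time order yields the distance at the moment of each transaction, so no edge has to be removed and restored.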
Qingpeng “Q.P.” Zhang
● Postdoc
○ Lawrence Berkeley National Lab
● PhD in Computer Science
○ Michigan State University
What I learned from Insight:
● Thinking as a data engineer
● Open-source tools
○ Redis, Elasticsearch, Kafka, Spark Streaming, Flask, AngularJS, etc.
Query relationship of a past transaction
Query the distance between two vertices at a past moment in a constantly changing graph (needed because distances are not precomputed)
● If the pair transacted before this transaction: distance = 1
● If this is their first transaction: distance > 1
○ Temporarily remove the influence of that specific transaction
○ Check the distance in the graph (2, 3, or >3)
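The two rules can be sketched as follows, assuming a dict of friend sets that already contains the transaction's edge and a hypothetical `pair_times` index of transaction timestamps per user pair. As written, only the pair's own edge is removed, so edges created by other, later transactions still affect the result:

```python
from __future__ import annotations


def capped_distance(adj: dict[int, set[int]], a: int, b: int) -> int | None:
    """Return 1, 2 or 3 via friend-set intersections; None means more than 3 hops."""
    a1, b1 = adj.get(a, set()), adj.get(b, set())
    if b in a1:
        return 1
    if a1 & b1:
        return 2
    a2 = set().union(*(adj.get(f, set()) for f in a1))
    return 3 if a2 & b1 else None


def past_transaction_distance(
    adj: dict[int, set[int]],
    pair_times: dict[tuple[int, int], list[float]],
    payer: int,
    receiver: int,
    txn_time: float,
) -> int | None:
    """Distance label for a past transaction whose edge is already in adj (hypothetical helper)."""
    key = (min(payer, receiver), max(payer, receiver))
    # Rule 1: any earlier transaction between the pair means distance = 1
    if any(t < txn_time for t in pair_times.get(key, [])):
        return 1
    # Rule 2: first transaction, so drop its edge temporarily (both users must be in adj)
    adj[payer].discard(receiver)
    adj[receiver].discard(payer)
    try:
        return capped_distance(adj, payer, receiver)
    finally:
        # Always restore the edge afterwards
        adj[payer].add(receiver)
        adj[receiver].add(payer)
```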
Pipeline: processing raw data in a distributed way
Query/Search Optimizations
1. Remove aggregations for better performance (a trade-off)
2. Friend recommender:
a. Use a Counter to return only the 5 users with the most common friends
3. Search messages in the friend circle:
a. Combine Elasticsearch and Redis queries
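Assuming the `Counter` mentioned here is Python's `collections.Counter` (the backend uses Flask), the recommender can be sketched over a dict of friend sets: count how many of the user's friends each friend-of-friend shares, then let `most_common(5)` keep the top five. The function name is hypothetical:

```python
from __future__ import annotations

from collections import Counter


def recommend_friends(adj: dict[int, set[int]], user: int, k: int = 5) -> list[tuple[int, int]]:
    """Return up to k (candidate, common_friend_count) pairs, most common friends first."""
    friends = adj.get(user, set())
    counts = Counter()
    for f in friends:
        for candidate in adj.get(f, set()):
            # Skip the user and people who are already friends
            if candidate != user and candidate not in friends:
                counts[candidate] += 1
    # most_common(k) returns only the top k instead of the full ranking
    return counts.most_common(k)


if __name__ == "__main__":
    graph = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 4}, 4: {2, 3}, 5: {2}}
    print(recommend_friends(graph, 1))
```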
[Figure: pipeline diagram with historical transactions, real-time transactions, and the API]