advanced visual analytics and real-time analytics at platform scale by brian bulkowski, co-founder...
TRANSCRIPT
© 2014 Aerospike. All rights reserved. Confidential 1
Advanced Visual Analytics and
Real-time Analytics at
Platform scale
Kunal Umrigar
Senior Architect at Pubmatic
In conversation with Brian Bulkowski
CTO and co-founder
Aerospike
Who am I?
■ Starting: TRS-80, PC, Apple II, Vax 11/70, Wang
■ First product: lightpen university teaching kiosk
■ Networks: computers without people are boring
■ Silicon Valley internet boom
■ 10B market cap in 1999, employee 32
■ 2003-2007 “time off” (startups)
■ Citrusleaf / Aerospike history
■ 42-year-old first-time CEO (me)
■ 2008 Prototype
■ 2010 First sale, get the band back together
■ 2011+ 3 rounds of funding (Draper, ALP, NEA, CNTP)
■ 2014 Open Source
■ 70 employees, 2 offices
@bbulkow
MILLIONS OF CONSUMERS
BILLIONS OF DEVICES
APP SERVERS
DATA
WAREHOUSE
INSIGHTS
Advertising Technology Stack
WRITE CONTEXT
In-memory NoSQL
WRITE REAL-TIME CONTEXT
READ RECENT CONTENT
PROFILE STORE
Cookies, email, deviceID, IP address, location,
segments, clicks, likes, tweets, search terms...
REAL-TIME ANALYTICS
Best sellers, top scores, trending tweets
BATCH ANALYTICS
Discover patterns,
segment data:
location patterns,
audience affinity
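The profile store and real-time analytics boxes above can be sketched roughly as follows. This is a minimal illustration only, assuming a plain in-process dictionary stands in for the in-memory NoSQL tier. The names `ProfileStore`, `write_context`, `read_recent`, and `trending` are hypothetical and are not Aerospike APIs.

```python
from collections import Counter, defaultdict, deque


class ProfileStore:
    """Toy in-process stand-in for an in-memory profile store (hypothetical)."""

    def __init__(self, max_events_per_user=100):
        # Keep only the most recent events per user key (e.g. a cookie ID).
        self._events = defaultdict(lambda: deque(maxlen=max_events_per_user))
        # Running counts across all users, for "trending" style queries.
        self._counts = Counter()

    def write_context(self, user_key, event):
        """WRITE REAL-TIME CONTEXT: append an event to the user's profile."""
        self._events[user_key].append(event)
        self._counts[event] += 1

    def read_recent(self, user_key, n=10):
        """READ RECENT CONTENT: return up to n events, newest first."""
        if n <= 0:
            return []
        events = self._events.get(user_key, ())
        return list(events)[-n:][::-1]

    def trending(self, n=3):
        """REAL-TIME ANALYTICS: the n most frequent events seen so far."""
        return self._counts.most_common(n)


store = ProfileStore()
store.write_context("cookie-1", "search:shoes")
store.write_context("cookie-1", "click:ad-42")
store.write_context("cookie-2", "search:shoes")
```

With the three writes above, `store.read_recent("cookie-1")` returns `["click:ad-42", "search:shoes"]` and `store.trending(1)` returns `[("search:shoes", 2)]`. A production store would also need persistence, expiry, and replication, none of which this sketch attempts.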
Introduction to Advertising: Real-time Bidding
North American RTB speeds & feeds
■ 1 to 6 billion cookies tracked
■ Some companies track 200M, some track 20B
■ Each bidder has their own data pool
■ Data is your weapon
■ Recent searches, behavior, IP addresses
■ Audience clusters (K-cluster, K-means) from offline Hadoop
■ “Remnant” from Google, Yahoo is about 0.6 million / sec
■ Facebook exchange: about 0.6 million / sec
■ “other” is 0.5 million / sec
Currently about 3.0M / sec in North America
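As a back-of-envelope check on these rates, the arithmetic can be written out as below. The per-source figures and the 3.0M / sec total are taken directly from the slide. The constant and function names are illustrative only. Integer requests per second are used to avoid floating-point rounding.

```python
# Per-source rates from the slide, in requests per second.
SOURCE_RATES = {
    "remnant (Google, Yahoo)": 600_000,
    "Facebook exchange": 600_000,
    "other": 500_000,
}

# "about 3.0M / sec" as quoted on the slide.
QUOTED_TOTAL_PER_SEC = 3_000_000

SECONDS_PER_DAY = 24 * 60 * 60


def itemized_total(rates):
    """Sum of the individually listed sources, in requests per second."""
    return sum(rates.values())


def per_day(rate_per_sec):
    """Scale a per-second rate to a full day at a constant rate."""
    return rate_per_sec * SECONDS_PER_DAY


listed = itemized_total(SOURCE_RATES)         # 1_700_000
daily_quoted = per_day(QUOTED_TOTAL_PER_SEC)  # 259_200_000_000
```

The three itemized sources account for 1.7M / sec of the quoted 3.0M / sec. The slide does not break out the remainder. At a constant 3.0M / sec, the quoted total corresponds to roughly 259 billion requests per day.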
Old Architecture (scale out in 2000)
Request routing and sharding
APP SERVERS
CACHE
DATABASE
STORAGE
CONTENT
DELIVERY NETWORK
LOAD BALANCER
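The "request routing and sharding" layer in this older design typically lived in application code. A minimal sketch of one common approach, hash-modulo sharding, is shown below. The slide does not say which sharding scheme was used, so this is an assumption, and `shard_for` and `ShardRouter` are hypothetical names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (not Python's hash())."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardRouter:
    """Routes each key to one of a fixed list of database shards."""

    def __init__(self, shard_names):
        self._shards = list(shard_names)
        if not self._shards:
            raise ValueError("at least one shard is required")

    def route(self, key: str) -> str:
        return self._shards[shard_for(key, len(self._shards))]


router = ShardRouter(["db-0", "db-1", "db-2", "db-3"])
target = router.route("cookie-12345")  # always the same shard for this key
```

One known drawback of modulo sharding is that changing the number of shards remaps most keys, which is part of why this routing logic was painful to operate in application code.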
Modern Scale Out Architecture
Load balancer
Simple stateless
APP SERVERS
IN-MEMORY NoSQL
RESEARCH
WAREHOUSE
CONTENT
DELIVERY NETWORK
LOAD BALANCER
Long-term cold storage
Fast stateless
HDFS BASED
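The key property of the modern layout is that app servers hold no state of their own, so the load balancer can send any request to any server. A minimal sketch, assuming a simple in-process object stands in for the in-memory NoSQL tier; `SharedStore` and `AppServer` are hypothetical names, not part of any real client library.

```python
class SharedStore:
    """Stand-in for the shared in-memory NoSQL tier (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class AppServer:
    """Stateless app server: all per-user state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def handle(self, user_key):
        # Read-modify-write against the shared store. A real store would
        # offer an atomic increment; this sketch is single-threaded.
        visits = self._store.get(user_key, 0) + 1
        self._store.put(user_key, visits)
        return {"served_by": self.name, "user": user_key, "visits": visits}


store = SharedStore()
server_a = AppServer("app-a", store)
server_b = AppServer("app-b", store)

first = server_a.handle("cookie-1")   # visits == 1
second = server_b.handle("cookie-1")  # visits == 2, seen by a different server
```

Because neither server keeps anything locally, either can be added, removed, or restarted without losing user state, which is what lets this tier scale out behind a plain load balancer.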