ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent Parallel Processing
TRANSCRIPT
ReStream: Accelerating Backtesting and Stream Replay
with Serial-Equivalent Parallel Processing
October 6, 2016
Johann Schleier-Smith, Erik T. Krogen, Joseph M. Hellerstein (UC Berkeley)
@jssmith @joe_hellerstein
Overview
• Motivations for backtesting and stream replay
• Alternatives for scaling throughput
• ReStream and Multi-Version Parallel Streaming (MVPS)
• Evaluation
Research Motivation
• Operating the Tagged and hi5 social networks
• >300 million registered users
• Millions of daily active users
• Practical pains, plus curiosity
Real-Time Spam Detection for a Dating Product
• >10 million active accounts
• >1,000 updates/sec
• Must respond to current activity
• Require near-instant decisions
• Facts recorded in an event log
• Real-time stream processing
• Need to evaluate new ideas quickly, e.g., simulate a model on the past 30 days of data in under 10 minutes
Replay: a powerful tool for creating and enhancing streaming applications
When latency matters… streaming shines
• Spam detection
• Payment fraud
• Money laundering
• Real-time recommendations
• Ad serving
• Dynamic pricing and inventory management for e-commerce, car services, etc.
• Financial trading
• Industrial monitoring
• IoT applications
• And more
Given a program that processes an ordered log sequentially
How can we achieve parallel speedup?
Serial-Equivalent Parallel Replay
[Animation: an ordered log of events (1, 2, 3, …) feeds the program; the log is then split between parallel program instances (A, B, C), each advancing its own logical time (t=4, t=5, t=9), with the requirement that the parallel execution produce the same output as a single serial program.]
Serial-Equivalent Parallel Replay
• Deterministic output
• More restrictive than transaction serializability
• Partition the input between multiple parallel programs
• Obtain same output as from one program
Developers’ Accelerated Replay Wish List
• Semantics of sequential operations with mutable state
• Full fine-grained temporal resolution
• Process months in minutes: 10,000x real-time rate
Want serial-equivalent parallel replay
Workload Assumptions
• Total order provided by log
• Abundant cloud resources available
• Per-event latency not a concern
Possible Solutions
Streaming Databases
Examples: StreamBase / Aurora, Truviso / TelegraphCQ, recent startups (PipelineDB, RethinkDB)
• Query interface derived from SQL
• Set-oriented approach allows query plan optimization, parallelism, and reordering
• Some programs can be difficult to express
• Most systems emphasize latency over replay throughput
OLTP Databases
Examples: PostgreSQL, IBM DB2, MS SQL Server, Oracle
• SQL interface
• Robust high-performance implementations
• Need to coordinate parallel replay programs
• Transactional serializability gives weaker consistency than serial equivalence
Parallel Big Data Systems
Examples: Hadoop, Apache Spark Streaming, Lambda architecture
• Routinely delivers desired log-processing throughput
• Easy to integrate arbitrary functions
• MapReduce foundation does not lend itself naturally to sequential processing
• Throughput and program semantics may be linked
Other Systems
• Other streaming: Google MillWheel, Yahoo! S4, Apache Storm, Twitter Heron, Apache Flink, Apache Samza, Walmart MUPD8
• Deterministic databases: Calvin, Bohm
• Transactional: VoltDB / S-Store
• Complex Event Processing: Esper, Tibco, JBoss
• Other recent systems: Trill, Naiad, Google Cloud Dataflow
ReStream
Challenge: serial equivalence and parallelism
Observation: causal dependencies are often sparse
• A consequence of the input data
• Suggests an opportunity for parallelism
• Can we maintain order when necessary, but not necessarily otherwise?
Multi-Versioned State
SET(timestamp=10, key=x, value=3) → stores version x=3@t=10
SET(timestamp=20, key=x, value=5) → stores version x=5@t=20
GET(timestamp=11, key=x) → 3
GET(timestamp=15, key=x) → 3
GET(timestamp=25, key=x) → 5
SET(timestamp=21, key=x, value=7) ✗ (a late write at t=21 conflicts with the earlier read at t=25)
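The versioned reads and writes above can be sketched as a small Scala class (Scala matching the sample code later in the talk). The class and method names here are illustrative, not ReStream's actual API; only the semantics come from the slide: a GET at logical time t returns the latest version with timestamp ≤ t.

```scala
import scala.collection.immutable.TreeMap

// Sketch of one multi-versioned cell: each set adds a version tagged with
// its logical timestamp; a get at time t returns the value of the latest
// version whose timestamp is <= t, or a default if none exists yet.
class MultiVersionedValue[V](default: V) {
  private var versions = TreeMap.empty[Long, V]

  def set(timestamp: Long, value: V): Unit =
    versions += (timestamp -> value)

  def get(timestamp: Long): V =
    versions.filter { case (t, _) => t <= timestamp } // versions visible at t
            .lastOption                               // latest such version
            .map { case (_, v) => v }
            .getOrElse(default)
}
```

With the slide's events: after `set(10, 3)` and `set(20, 5)`, both `get(11)` and `get(15)` return 3, while `get(25)` returns 5.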
Social Network Anti-Spam Example
sender has sent 2× as many messages to non-friends as to friends, AND
> 20% of messages sent from the IP contain an e-mail address
⇒ the message is spam
Social Network Anti-Spam Example
Express the program in four pieces:
A. Track friendships
B. Track how often a user sends to friends / non-friends
C. Track how often an IP address sends text containing an e-mail address
D. For each message, check B and C to label spam
Sample Code (piece A)
{ e: NewFriendshipEvent =>
  userPair = (e.userIdA, e.userIdB)
  friendships.merge(e.timestamp, userPair, _ => true)   // WRITE
}
Sample Code (piece B)
{ e: MessageEvent =>
  userPair = (e.senderId, e.recipientId)
  if (friendships.get(e.timestamp, userPair)) {         // READ
    friendMsgs.merge(e.timestamp, e.senderId, _ + 1)    // WRITE
  } else {
    nonfriendMsgs.merge(e.timestamp, e.senderId, _ + 1) // WRITE
  }
}
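The slides show code only for pieces A and B. To make the four-piece decomposition concrete, here is a runnable single-threaded Scala sketch of all four pieces, using plain mutable maps in place of ReStream's timestamped multi-versioned state. The event field names `ipAddress` and `text`, and the `containsEmail` helper, are assumptions for illustration, not from the talk.

```scala
import scala.collection.mutable

// Runnable single-threaded sketch of pieces A-D. Plain mutable maps stand
// in for multi-versioned state; field names and containsEmail are assumed.
sealed trait Event { def timestamp: Long }
case class NewFriendshipEvent(timestamp: Long, userIdA: Int, userIdB: Int) extends Event
case class MessageEvent(timestamp: Long, senderId: Int, recipientId: Int,
                        ipAddress: String, text: String) extends Event

object AntiSpam {
  val friendships   = mutable.Set.empty[(Int, Int)]
  val friendMsgs    = mutable.Map.empty[Int, Int].withDefaultValue(0)
  val nonfriendMsgs = mutable.Map.empty[Int, Int].withDefaultValue(0)
  val ipMsgs        = mutable.Map.empty[String, Int].withDefaultValue(0)
  val ipEmailMsgs   = mutable.Map.empty[String, Int].withDefaultValue(0)

  private def containsEmail(text: String): Boolean =
    text.matches(""".*\S+@\S+\.\S+.*""")

  // Run one event through all four pieces; returns Some(isSpam) for messages.
  def process(e: Event): Option[Boolean] = e match {
    case f: NewFriendshipEvent =>                        // A: track friendships
      friendships += ((f.userIdA, f.userIdB))
      friendships += ((f.userIdB, f.userIdA))
      None
    case m: MessageEvent =>
      // B: count messages sent to friends vs. non-friends
      if (friendships((m.senderId, m.recipientId)))
        friendMsgs(m.senderId) += 1
      else
        nonfriendMsgs(m.senderId) += 1
      // C: count messages, and e-mail-bearing messages, per IP
      ipMsgs(m.ipAddress) += 1
      if (containsEmail(m.text)) ipEmailMsgs(m.ipAddress) += 1
      // D: apply the spam rule from the earlier slide
      Some(nonfriendMsgs(m.senderId) >= 2 * friendMsgs(m.senderId) &&
           ipEmailMsgs(m.ipAddress) > 0.2 * ipMsgs(m.ipAddress))
  }
}
```

Note that B reads what A writes, and D reads what B and C write: exactly the read-write dependencies the next slide draws.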
Read-write dependency graph
[Figure: operators A–D and the state each reads (R) or writes (W). A writes friendships; B reads friendships and writes friendMsgs and nonfriendMsgs; C writes ipMsgs and ipEmailMsgs; D reads friendMsgs, nonfriendMsgs, ipMsgs, and ipEmailMsgs.]
Topological sort
[Animation: operators A–D, topologically sorted by their read-write dependencies, process events from the log. Successive builds show: reading from the log; reading from the log while writing shared state; loose coupling between operators; the need to respect dependencies (a read must not run ahead of the write it depends on: NO); and otherwise-safe out-of-order processing (OK).]
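The dependency analysis behind the topological sort can be sketched directly: operator X must precede operator Y (at a given logical time) whenever X writes state that Y reads. The read/write sets below come from the anti-spam example; the sort itself is standard Kahn's algorithm, shown only as an illustration, not ReStream's implementation.

```scala
// Build the operator DAG from read/write sets, then topologically sort it.
object DependencyOrder {
  val writes = Map(
    "A" -> Set("friendships"),
    "B" -> Set("friendMsgs", "nonfriendMsgs"),
    "C" -> Set("ipMsgs", "ipEmailMsgs"),
    "D" -> Set.empty[String])
  val reads = Map(
    "A" -> Set.empty[String],
    "B" -> Set("friendships"),
    "C" -> Set.empty[String],
    "D" -> Set("friendMsgs", "nonfriendMsgs", "ipMsgs", "ipEmailMsgs"))

  // Edge X -> Y when X writes something that Y reads.
  val edges: Set[(String, String)] = for {
    x <- writes.keySet
    y <- reads.keySet
    if x != y && (writes(x) & reads(y)).nonEmpty
  } yield (x, y)

  // Kahn's algorithm over the operator DAG.
  def topoSort: List[String] = {
    var inDeg = writes.keySet.map(op => op -> edges.count(_._2 == op)).toMap
    var ready = inDeg.collect { case (op, 0) => op }.toList.sorted
    var order = List.empty[String]
    while (ready.nonEmpty) {
      val op = ready.head
      ready = ready.tail
      order = order :+ op
      for ((src, dst) <- edges if src == op) {
        inDeg += dst -> (inDeg(dst) - 1)
        if (inDeg(dst) == 0) ready = (dst :: ready).sorted
      }
    }
    order
  }
}
```

For the example's sets, the edges are A→B, B→D, and C→D, so A, B, C, D is a valid order.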
MVPS: Multi-Version Parallel Streaming
[Figure: the log is partitioned (e.g., odd and even event streams) across parallel instances of the full operator pipeline A–D, which share globally multi-versioned state.]
Mini-batches for MVPS
[Figure: each partition processes the log in mini-batches (events 1–10, 11–20, 21–30, …), so coordination is amortized over a batch rather than paid per event.]
Multi-Versioned Parallel Streaming (MVPS)
• Partitioned parallel dataflow
• Input events passed to all operators
• Globally shared multi-versioned state
• Logical timestamps referenced throughout the computation
• Analyze the DAG of operators for potential read-write dependencies
• May use mini-batches to amortize coordination
• Serial-equivalent semantics
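The bullets above can be condensed into a toy replay driver. This is a deliberate simplification, not ReStream's scheduler: it cuts the log into mini-batches, partitions each batch, and runs the operators stage by stage, so any coordination cost is paid once per batch rather than per event. Real MVPS would run the partitions in parallel, relying on multi-versioned state and logical timestamps to keep results serial-equivalent.

```scala
// Illustrative mini-batch replay driver (sequential stand-in for MVPS).
def replay[E](log: Seq[E],
              operators: Seq[Seq[E] => Unit], // in topological order
              batchSize: Int,
              partitions: Int): Unit =
  for (batch <- log.grouped(batchSize)) {
    // round-robin partition of the mini-batch
    val parts = batch.zipWithIndex
                     .groupBy { case (_, i) => i % partitions }
                     .values.map(_.map(_._1))
    // stage by stage: each operator sees every partition of the batch;
    // the per-stage boundary is the (amortized) coordination point
    for (op <- operators; part <- parts) op(part.toSeq)
  }
```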
Evaluation
ReStream Evaluation Aims
• Demonstrate parallel speedup vs. single-thread (COST)
• Compare to alternative systems
• Understand limits to parallelism
ReStream Evaluation Workload
• Simulated social network spam detection
• Structure of read-write dependency graph linked to structure of social network
• Can tune workload characteristics by generating different social graphs
[Figure: two generated social graphs, one with a uniform degree distribution and one with a skewed degree distribution.]
Scaling Throughput
[Chart: throughput (events/s, 0–600,000) vs. number of hosts (1–32) for three execution engines: ReStream, MVPS on Spark, and a single-threaded baseline.]
Modeling Performance
• Greater parallel speedup possible when there are fewer read-write dataflow dependencies
• Track reads and writes of global state, compute critical path length along chained dependencies
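The critical-path measurement in the second bullet can be sketched as a single pass over the log: each event's chain depth is one more than that of the latest event it depends on, where a read depends on the last write to that key, and a write depends on the last write to and any later reads of that key. The longest chain bounds the achievable parallel speedup. The `Access` modeling here is an illustrative assumption.

```scala
import scala.collection.mutable

// One event = the set of state accesses it performs.
case class Access(key: String, isWrite: Boolean)

// Length of the longest chain of read-write-dependent events in the log.
def criticalPathLength(events: Seq[Seq[Access]]): Int = {
  val lastWriteDepth = mutable.Map.empty[String, Int].withDefaultValue(0)
  val lastReadDepth  = mutable.Map.empty[String, Int].withDefaultValue(0)
  var critical = 0
  for (accesses <- events) {
    // deepest chain among everything this event depends on
    val dep = accesses.map { a =>
      if (a.isWrite) math.max(lastWriteDepth(a.key), lastReadDepth(a.key))
      else lastWriteDepth(a.key)
    }.foldLeft(0)(math.max)
    val depth = dep + 1
    for (a <- accesses)
      if (a.isWrite) lastWriteDepth(a.key) = depth
      else lastReadDepth(a.key) = math.max(lastReadDepth(a.key), depth)
    critical = math.max(critical, depth)
  }
  critical
}
```

Two reads of the same written key chain to the write but not to each other, so they can proceed in parallel; this is the sparsity the model exploits.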
[Chart: throughput (events/s, 0–300,000) vs. the degree-distribution parameter α (1.5–3.0, skewed to uniform) for 2, 4, 8, and 16 hosts, with reference α values marked for web in-links, social networks, photo sharing, and web out-links.]
Modeling Performance
• Fit gives R² = 0.94
• Per-host batch size: 2,500–40,000 events (10,000 shown)
ReStream Summary
• Serial-equivalent results from parallel replay
• Throughput much greater than the real-time rate
• MVPS consistency (Multi-Versioned Parallel Streaming):
  - Analyze for potential read-write dependencies
  - Timestamped multi-versioned state
  - Track logical time at runtime
• May also apply to online stream processing and deterministic databases
@jssmith @joe_hellerstein This work was supported in part by AWS Cloud Credits for Research