TRANSCRIPT
Streaming OLAP Applications
C. Scott Andreas | HPTS 2013 @cscotta
From square one to multi-gigabit streams and beyond
Roadmap
– Framing the problem
– Four phases of an architecture’s evolution
– Code: A general-purpose lockless aggregator
– Demonstration
– Further reading
A journey of up and out
– Started at ~7,000 flows/second on one node
– Added distribution, bringing us to 7,000 flows/sec/node
– Implemented a custom OLAP engine: 1.6 MM flows/sec/node
– Further work remains on a streaming OLAP map/reduce, demonstrated on a stream of 80 Gbps.
[Chart: Single-Node Scalability vs. Many-Node Scalability, with an “x” marking “A good place to be”]
[Chart: Single-Customer Scalability vs. Many-Customer Scalability, with an “x” marking “A good place to be”]
Four phases
– Up: Off-the-shelf CEP software
– Out: Distribution
– Up: Custom streaming OLAP engine
– Out: Evolution toward a streaming map/reduce
[1] Off-the-Shelf CEP
– Single customer, single node
– Exists, works!
A sample EPL query that returns the average price per symbol for the last 100 stock ticks:

select symbol, avg(price) as avgPrice from StockTickEvent.win:length(100) group by symbol;

http://esper.codehaus.org/tutorials/tutorial/tutorial.html
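A minimal sketch of registering that statement with Esper’s Java client API of this era (the StockTickEvent POJO and listener body here are illustrative, not from the talk):

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;
import com.espertech.esper.client.EventBean;
import com.espertech.esper.client.UpdateListener;

public class AvgPriceExample {
    // Illustrative event POJO; Esper reads properties via JavaBean getters.
    public static class StockTickEvent {
        private final String symbol;
        private final double price;
        public StockTickEvent(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
        public String getSymbol() { return symbol; }
        public double getPrice() { return price; }
    }

    public static void main(String[] args) {
        Configuration config = new Configuration();
        config.addEventType("StockTickEvent", StockTickEvent.class);
        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);

        // The statement from the slide.
        EPStatement stmt = engine.getEPAdministrator().createEPL(
            "select symbol, avg(price) as avgPrice " +
            "from StockTickEvent.win:length(100) group by symbol");

        // Incremental results are pushed to the listener as each tick arrives.
        stmt.addListener(new UpdateListener() {
            public void update(EventBean[] newEvents, EventBean[] oldEvents) {
                for (EventBean e : newEvents) {
                    System.out.println(e.get("symbol") + " avg=" + e.get("avgPrice"));
                }
            }
        });

        engine.getEPRuntime().sendEvent(new StockTickEvent("IBM", 75.0));
    }
}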
[Chart: Single-Node Scalability vs. Many-Node Scalability – “you are here”: 7,000 events/second, one node, no HA]
[2] Distribution
Designing an HA multi-tenant analytics engine to map M customers onto N nodes.
[Diagram: collectors (coll01–coll06) → stream buffering on Kafka (kafka01–kafkaNN) → OLAP filtering + aggregation (olap01–olapNN) → Storage (0–NN) → Client API (0–NN), coordinated by a ZooKeeper ensemble]
Self-Organization
• ZooKeeper broadcasts a consistently-ordered view of cluster state changes for all nodes, all active streams, and who owns what.
• “Claim streams until I have at least my fair share.”
• If I have too much, “hand off streams until I’m doing my fair share.”
• If I’m shutting down, tell others, hand streams off, and don’t claim any more.

github.com/boundary/ordasity
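A hypothetical sketch of that fair-share policy in plain Java – this is not Ordasity’s API; ClusterState, tryClaim, and handOff are illustrative stand-ins for the ZooKeeper-backed operations:

import java.util.HashSet;
import java.util.Set;

// Illustrative view of the state ZooKeeper broadcasts to every node.
class ClusterState {
    Set<String> nodes = new HashSet<String>();       // all live nodes
    Set<String> allStreams = new HashSet<String>();  // all active streams
    Set<String> myStreams = new HashSet<String>();   // streams this node owns
}

class FairShareBalancer {
    // Each node targets ceil(streams / nodes) work units.
    static int fairShare(ClusterState state) {
        return (int) Math.ceil((double) state.allStreams.size() / state.nodes.size());
    }

    void rebalance(ClusterState state, boolean shuttingDown) {
        if (shuttingDown) {
            // Tell others, hand streams off, and don't claim any more.
            for (String stream : state.myStreams) handOff(stream);
            state.myStreams.clear();
            return;
        }
        int target = fairShare(state);
        // Claim streams until I have at least my fair share.
        for (String stream : state.allStreams) {
            if (state.myStreams.size() >= target) break;
            if (!state.myStreams.contains(stream) && tryClaim(stream)) {
                state.myStreams.add(stream);
            }
        }
        // If I have too much, hand off streams until I'm doing my fair share.
        while (state.myStreams.size() > target) {
            String stream = state.myStreams.iterator().next();
            handOff(stream);
            state.myStreams.remove(stream);
        }
    }

    // In practice these would be ZooKeeper operations (e.g., ephemeral claim nodes).
    boolean tryClaim(String stream) { return true; }
    void handOff(String stream) { }
}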
[Chart: Single-Node Scalability vs. Many-Node Scalability – “you are here”: 7,000 flows/second, any number of nodes, HA]
[Chart: Single-Customer Scalability vs. Many-Customer Scalability – “but you are still here”: 7,000 flows/second, any number of nodes, HA]
[3] Custom Streaming OLAP
Lockless aggregation of event streams
[Diagram: event tuple – Timestamp | Dimension Key | Rollup Object]
Methodology: Launch the process with a thread-count configuration, preload all data into memory, run for 10 minutes, and exit, printing the final mean processing rate. Batch size: 10,000. Hardware: tests run on an EC2 cc2.8xlarge (2x Xeon E5-2670; 32 vcores, 16 physical). Software: Java 1.7.0_40-b43, -Xmx24G, CMS+ParNew, on EC2 Linux 3.4.43-43.43.amzn1.x86_64 (ami-a73758ce).

[Chart: Lockless Aggregator – mean processing rate (0–5,000,000) vs. thread count (1–32)]
[Chart: Lock-Striping Aggregator – mean processing rate (0–2,400,000) vs. thread count (1–32)]
[Chart: Lockless Aggregator (NonBlockingHashMap) vs. Lock-Striping Aggregator (ConcurrentHashMap) – mean processing rate (0–5,000,000) vs. thread count (1–32)]
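A minimal sketch of the lockless pattern benchmarked above – atomic rollup cells in Cliff Click’s NonBlockingHashMap, updated via CAS with no locks held. The one-second bucketing, key shape, and use of a bare AtomicLong as the rollup object are illustrative, not the deschutes implementation:

import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import org.cliffc.high_scale_lib.NonBlockingHashMap;

// One rollup cell per (timestamp bucket, dimension key); updated without locks.
class LocklessAggregator {
    private final ConcurrentMap<String, AtomicLong> rollups =
        new NonBlockingHashMap<String, AtomicLong>();

    public void record(long timestamp, String dimensionKey, long value) {
        // Bucket to one-second resolution; key shape is illustrative.
        String key = (timestamp / 1000) + "|" + dimensionKey;
        AtomicLong cell = rollups.get(key);
        if (cell == null) {
            AtomicLong fresh = new AtomicLong();
            AtomicLong raced = rollups.putIfAbsent(key, fresh);
            cell = (raced == null) ? fresh : raced;  // lose the race gracefully
        }
        cell.addAndGet(value);  // CAS retry loop inside addAndGet; no lock held
    }
}

Swapping the map for a java.util.concurrent.ConcurrentHashMap yields the lock-striping variant in the comparison above.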
[Chart: Single-Customer Scalability vs. Many-Customer Scalability – “moving on up!”: 1.6 MM flows/second/node, any number of nodes, HA]
Example Implementation (demo)
[Chart: Many-Node and Many-Customer Scalability – “what gets us here?”: high processing rate, HA, any number of nodes, no “single-node” sharding limit]
[4] Streaming OLAP Map/Reduce
Incremental lockless filtering / aggregation of event streams, final rollups of total streams
[Diagram: Input Sources (many, high-velocity, partitioned streams) → Map stages (filtering and aggregation; low-velocity incremental output) → Reduce (top-level filtering and final aggregation) → Output]
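A hedged sketch of the contract that diagram implies: each mapper emits a low-velocity partial rollup per time window, and the reducer merges partials into the window’s final top-level aggregate. PartialRollup and Reducer are illustrative names, not from the talk:

import java.util.HashMap;
import java.util.Map;

// Partial aggregate emitted by one mapper for one time window.
class PartialRollup {
    final long windowStart;
    final Map<String, Long> countsByKey = new HashMap<String, Long>();

    PartialRollup(long windowStart) { this.windowStart = windowStart; }

    void add(String key, long value) {
        Long current = countsByKey.get(key);
        countsByKey.put(key, current == null ? value : current + value);
    }
}

// Reducer: merges many incremental partials into the final rollup.
class Reducer {
    final Map<String, Long> finalCounts = new HashMap<String, Long>();

    void merge(PartialRollup partial) {
        for (Map.Entry<String, Long> e : partial.countsByKey.entrySet()) {
            Long current = finalCounts.get(e.getKey());
            finalCounts.put(e.getKey(),
                current == null ? e.getValue() : current + e.getValue());
        }
    }
}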
Streaming Map/Reduce
• Higher latency, but much higher velocity
• Challenging for time-windowed aggregations (the case of the slow mapper)
• Implementations: Apache Samza atop YARN (LinkedIn), Storm (Twitter), Summingbird (Twitter)
• Papers: MillWheel (Google, VLDB): http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p734-akidau.pdf
Parallel OLAP Aggregation
• Fundamental problem: contention
• Lockless data structures reduce contention – but CAS is no silver bullet
• One approach: thread-local aggregation with TreeMaps/HashMaps, combining operations once/sec (see the sketch below)
• “Flat Combining and the Synchronization-Parallelism Tradeoff”
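A minimal sketch of that thread-local approach, assuming a once-per-second combine into a shared concurrent map; the flush cadence and class names are illustrative:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Each worker thread aggregates into a private HashMap (zero contention)
// and flushes it into the shared rollup roughly once per second, so
// synchronization costs are paid per-combine rather than per-event.
class ThreadLocalAggregator {
    private final ConcurrentMap<String, AtomicLong> shared =
        new ConcurrentHashMap<String, AtomicLong>();

    private static class LocalState {
        final Map<String, Long> counts = new HashMap<String, Long>();
        long lastFlush = System.currentTimeMillis();
    }

    private final ThreadLocal<LocalState> local = new ThreadLocal<LocalState>() {
        @Override protected LocalState initialValue() { return new LocalState(); }
    };

    public void record(String key, long value) {
        LocalState state = local.get();
        Long current = state.counts.get(key);
        state.counts.put(key, current == null ? value : current + value);

        long now = System.currentTimeMillis();
        if (now - state.lastFlush >= 1000) {  // combine once per second
            flush(state);
            state.lastFlush = now;
        }
    }

    private void flush(LocalState state) {
        for (Map.Entry<String, Long> e : state.counts.entrySet()) {
            AtomicLong cell = shared.get(e.getKey());
            if (cell == null) {
                AtomicLong fresh = new AtomicLong();
                AtomicLong raced = shared.putIfAbsent(e.getKey(), fresh);
                cell = (raced == null) ? fresh : raced;
            }
            cell.addAndGet(e.getValue());
        }
        state.counts.clear();
    }
}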
Code
Streaming Aggregation: https://github.com/cscotta/deschutes
Cluster Coordination: https://github.com/boundary/ordasity
Documentation: http://taco.cat/deschutes
Streaming OLAP Applications
C. Scott Andreas | HPTS 2013 @cscotta
From square one to multi-gigabit streams and beyond