The LMAX Architecture with Disruptors: 6M Transactions per Second

Stephan Schmidt, Vice CTO, brands4friends

About me: Stephan Schmidt, Vice CTO, brands4friends

@codemonkeyism · www.codemonkeyism.com · stephan.schmidt@brands4friends.de

brands4friends
• No. 1 shopping club in Germany
• More than 360k daily visitors
• 4.5M users
• An eBay company

20.04.12, WJAX 2011


Development at brands4friends
• Team: Java and web developers, data warehouse developers
• Process: Scrum since 2009, Kanban for the DWH since 2012

LMAX - The London Multi-Asset Exchange


"We aim to build the highest performance financial exchange in the world"

High Performance Transaction Processing


Service / Transaction Processor

Figure: Receive → Unmarshal → Replicate → Journal → Business Logic → Marshal → Send, with a queue between each pair of stages

Figure: CPU clock speed (GHz) vs. number of cores


Actors? SEDA?

Stuff that did not work for various reasons


1.  RDBMS

2.  Actors

3.  SEDA

4.  J2EE …


LMAX Architecture



Figure: Linked List Queue (Node → Node → Node → Node, with Add, Remove and Size) vs. Array Queue (Add and Remove positions in the same cache lines)

Problems with Queues as a Data Structure


1.  Reading (take) and writing (add) are both write accesses, since both modify the queue's head, tail or size => write contention
2.  Write contention is usually solved with locks
    1.  Other solutions include deques
3.  Locks lead to context switches into the kernel
    1.  Context switches lead to CPU cache misses etc.
    2.  The kernel might use the opportunity to do other work as well
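To make point 1 concrete, here is a minimal sketch of a bounded array queue guarded by a single lock. The class name `LockedArrayQueue` is hypothetical; `java.util.concurrent.ArrayBlockingQueue` uses a similar single-lock design. Note how `take()` writes to `head`, the slot and `size`, so producers and consumers contend for the same lock and the same fields.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-lock bounded array queue, for illustration only.
class LockedArrayQueue<E> {
    private final Object[] items;
    private final ReentrantLock lock = new ReentrantLock(); // shared by producers AND consumers
    private int head;  // next slot to take from: written by take()
    private int tail;  // next slot to add into: written by add()
    private int size;  // written by add() AND take(): the contended field

    LockedArrayQueue(int capacity) {
        items = new Object[capacity];
    }

    boolean add(E e) {
        lock.lock();
        try {
            if (size == items.length) return false; // full
            items[tail] = e;
            tail = (tail + 1) % items.length;
            size++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    E take() {
        lock.lock();
        try {
            if (size == 0) return null; // empty
            E e = (E) items[head];
            items[head] = null;               // a "read" still writes: it clears the slot,
            head = (head + 1) % items.length; // advances head,
            size--;                           // and decrements the shared size
            return e;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedArrayQueue<String> q = new LockedArrayQueue<>(2);
        q.add("a");
        q.add("b");
        System.out.println(q.add("c")); // false: queue is full
        System.out.println(q.take());   // a
    }
}
```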

Cost of Locks, according to the LMAX (Disruptor) Paper


Incrementing a 64-bit counter 500 million times:

Method                               Time (ms)
Single thread                              300
Single thread with lock                 10,000
Two threads with lock                  224,000
Single thread with CAS                   5,700
Two threads with CAS                    30,000
Single thread with volatile write        4,700
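A rough way to reproduce the single-threaded rows of the table on your own machine is sketched below. It is an assumption-laden toy, not the paper's harness: no warm-up, a smaller default iteration count, no two-thread variants, and the JIT may optimise the plain loop heavily. For real measurements a harness such as JMH is the better tool.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongUnaryOperator;

// Rough, single-threaded re-creation of the counter experiment. NOT a rigorous
// benchmark: no warm-up, and the JIT may optimise the plain loop heavily.
class CounterCosts {
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static long lockedCounter;
    private static final AtomicLong casCounter = new AtomicLong();
    private static volatile long volatileCounter;

    static long plain(long n) {
        long c = 0;
        for (long i = 0; i < n; i++) c++;
        return c;
    }

    static long withLock(long n) {
        lockedCounter = 0;
        for (long i = 0; i < n; i++) {
            LOCK.lock();
            try { lockedCounter++; } finally { LOCK.unlock(); }
        }
        return lockedCounter;
    }

    static long withCas(long n) {
        casCounter.set(0);
        for (long i = 0; i < n; i++) {
            long current;
            do {
                current = casCounter.get();
            } while (!casCounter.compareAndSet(current, current + 1)); // retry if another thread won
        }
        return casCounter.get();
    }

    static long withVolatileWrite(long n) {
        volatileCounter = 0;
        for (long i = 0; i < n; i++) volatileCounter++; // safe only because there is a single writer
        return volatileCounter;
    }

    static void time(String label, LongUnaryOperator run, long n) {
        long start = System.nanoTime();
        long result = run.applyAsLong(n);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + ms + " ms (counter = " + result + ")");
    }

    public static void main(String[] args) {
        long n = 50_000_000L; // the paper used 500 million increments
        time("single thread", CounterCosts::plain, n);
        time("single thread with lock", CounterCosts::withLock, n);
        time("single thread with CAS", CounterCosts::withCas, n);
        time("single thread with volatile write", CounterCosts::withVolatileWrite, n);
    }
}
```

Absolute numbers will differ from the paper; the interesting part is the relative ordering of the rows.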

Compare-and-swap (CAS), e.g. AtomicReference and friends in Java: no context switch, but still a memory read/write barrier
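The typical CAS pattern is a retry loop: read the current value, compute the new one, and `compareAndSet` it; if another thread changed the value in between, read again and retry. A classic illustration is a lock-free (Treiber) stack built on `AtomicReference`. The class name `CasStack` is hypothetical and not part of LMAX's code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free (Treiber) stack: a textbook CAS retry loop on AtomicReference.
class CasStack<E> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    void push(E value) {
        while (true) {
            Node<E> current = top.get();
            Node<E> updated = new Node<>(value, current);
            if (top.compareAndSet(current, updated)) return; // success: no lock, no kernel call
            // another thread changed top in between: re-read and retry
        }
    }

    E pop() {
        while (true) {
            Node<E> current = top.get();
            if (current == null) return null;                // empty
            if (top.compareAndSet(current, current.next)) return current.value;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CasStack<Integer> stack = new CasStack<>();
        Thread[] pushers = new Thread[4];
        for (int t = 0; t < pushers.length; t++) {
            pushers[t] = new Thread(() -> { for (int i = 0; i < 10_000; i++) stack.push(i); });
            pushers[t].start();
        }
        for (Thread t : pushers) t.join();
        int count = 0;
        while (stack.pop() != null) count++;
        System.out.println("popped " + count); // popped 40000: no push was lost
    }
}
```

Under contention the loser of a race simply retries in user space instead of being parked by the kernel, which is where the "no context switch" advantage comes from.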

LMAX Data Structure – Ring Buffer


Figure: Ring Buffer between a Publisher and an Event Processor

Pre-Allocation of Buckets


Figure: Ring Buffer with 2^5 = 32 slots numbered 0 to 31, written by a Publisher and read by an Event Processor

•  No (or fewer) GC problems
•  Objects are near each other in memory => cache friendly
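Pre-allocation can be sketched as follows: all event objects are created once when the ring is built and are then overwritten in place, and because the size is a power of two, the slot for an ever-increasing sequence number is found with a bit mask instead of a modulo. `PreallocatedRing` and `TradeEvent` are hypothetical names, not the LMAX Disruptor API.

```java
import java.util.function.Supplier;

// Hypothetical pre-allocated ring buffer (not the Disruptor API).
class PreallocatedRing<T> {
    private final Object[] entries;
    private final int mask; // size - 1; works as a modulo only because size is a power of two

    PreallocatedRing(int size, Supplier<T> factory) {
        if (Integer.bitCount(size) != 1) throw new IllegalArgumentException("size must be a power of two");
        entries = new Object[size];
        mask = size - 1;
        for (int i = 0; i < size; i++) entries[i] = factory.get(); // allocate once, up front
    }

    // Sequences grow forever; the slot index is (sequence mod size), computed with a mask.
    @SuppressWarnings("unchecked")
    T get(long sequence) {
        return (T) entries[(int) (sequence & mask)];
    }

    public static void main(String[] args) {
        PreallocatedRing<TradeEvent> ring = new PreallocatedRing<>(32, TradeEvent::new); // 2^5 slots
        TradeEvent e = ring.get(33); // sequence 33 maps to slot 1
        e.price = 100;               // publishers overwrite fields instead of allocating
        e.quantity = 5;
        System.out.println(ring.get(1) == e); // true: same pre-allocated object
    }
}

// Hypothetical mutable event, reused for every lap around the ring.
class TradeEvent {
    long price;
    long quantity;
}
```

Since the same objects live for the whole run, the garbage collector has little to do, and the entries were allocated together, which tends to keep them close in memory.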

Coordination


Claim Strategy

1.  Claim a slot
2.  Write the event into it
3.  Make it public by advancing the sequence
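For the simplest case of a single publisher, the three steps can be sketched like this. `SingleProducerRing` is a hypothetical class; it deliberately omits protection against overwriting slots that consumers have not read yet, as well as padding and batching.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer ring: claim, write, then publish by advancing the cursor.
// No protection against overwriting slots that consumers have not read yet.
class SingleProducerRing {
    private final long[] slots;
    private final int mask;
    private long nextToClaim = 0;                            // touched only by the producer thread
    private final AtomicLong published = new AtomicLong(-1); // last published sequence

    SingleProducerRing(int sizePowerOfTwo) {
        slots = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    void publish(long value) {
        long seq = nextToClaim++;          // 1. claim: one producer, so a plain increment suffices
        slots[(int) (seq & mask)] = value; // 2. write into the claimed slot
        published.set(seq);                // 3. make public: volatile write, so the slot write is
                                           //    visible to any thread that reads this sequence
    }

    long cursor() { return published.get(); }

    long read(long seq) { return slots[(int) (seq & mask)]; }

    public static void main(String[] args) throws InterruptedException {
        SingleProducerRing ring = new SingleProducerRing(16);
        Thread producer = new Thread(() -> {
            for (long v = 1; v <= 16; v++) ring.publish(v); // 16 values fit without wrapping
        });
        producer.start();
        long sum = 0, next = 0;
        while (next < 16) {
            long available = ring.cursor();                 // everything up to here is readable
            while (next <= available) sum += ring.read(next++);
            Thread.onSpinWait();
        }
        producer.join();
        System.out.println(sum); // 136 = 1 + 2 + ... + 16
    }
}
```

With several publishers, step 1 would need an atomic claim (e.g. a CAS on the claim counter), which is what distinguishes the claim strategies.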

Wait Strategy
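A wait strategy decides what a consumer does while the sequence it needs has not been published yet. The LMAX Disruptor library ships several (for example `BusySpinWaitStrategy`, `YieldingWaitStrategy`, `SleepingWaitStrategy`, `BlockingWaitStrategy`); the sketch below only mimics the idea behind three of them, with hypothetical names and a hypothetical `WaitStrategy` interface, not the library's code.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;
import java.util.function.LongSupplier;

// Hypothetical wait strategies: each trades CPU usage against latency differently.
class BusySpinWait implements WaitStrategy {
    public long waitFor(long sequence, LongSupplier cursor) {
        long available;
        while ((available = cursor.getAsLong()) < sequence) {
            Thread.onSpinWait(); // lowest latency, but keeps a core fully busy
        }
        return available;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong cursor = new AtomicLong(-1);
        Thread publisher = new Thread(() -> {
            for (int i = 0; i < 10; i++) cursor.incrementAndGet(); // publishes sequences 0..9
        });
        publisher.start();
        WaitStrategy[] strategies = { new BusySpinWait(), new YieldingWait(), new SleepingWait() };
        for (WaitStrategy w : strategies) {
            System.out.println(w.getClass().getSimpleName() + " saw " + w.waitFor(9, cursor::get));
        }
        publisher.join();
    }
}

class YieldingWait implements WaitStrategy {
    public long waitFor(long sequence, LongSupplier cursor) {
        long available;
        int spins = 100;
        while ((available = cursor.getAsLong()) < sequence) {
            if (spins > 0) { spins--; Thread.onSpinWait(); } // spin briefly first...
            else Thread.yield();                            // ...then let other threads run
        }
        return available;
    }
}

class SleepingWait implements WaitStrategy {
    public long waitFor(long sequence, LongSupplier cursor) {
        long available;
        while ((available = cursor.getAsLong()) < sequence) {
            LockSupport.parkNanos(1_000); // gentle on the CPU, adds wake-up latency
        }
        return available;
    }
}

interface WaitStrategy {
    /** Waits until the cursor is at least {@code sequence}; returns the cursor value seen. */
    long waitFor(long sequence, LongSupplier cursor);
}
```

Returning the cursor value actually seen lets a consumer process everything up to it in one batch, not just the one sequence it asked for.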


Figure: Latency breakdown across Receive Message, Journal, Replicate, Unmarshal and Business Logic

Figure: Service / Transaction Processor (Receive, Unmarshal, Replicate, Journal, Business Logic, Marshal, Send) with the queues replaced by data structures: an Input Disruptor in front of the Business Logic Handler and Output Disruptors after it

LMAX Architecture


Figure: Input Disruptor (Receiver, Journaler, Replicator, Un-Marshaller) → Business Logic Handler → Output Disruptor (Marshaller, Publisher); the Journaler writes to the File System and the Replicator to an HA Node

Each stage can have multiple threads

Figure: positions of the Receiver, Journaler, Replicator, Un-Marshaller and Business Logic Handler on the ring buffer (slots 18, 19, 24 and 31 marked)

The Receiver writes to slot 31. The Journaler and Replicator are reading slot 24 and can advance up to slot 30, the last one already published.

The Business Logic Handler must stay behind all the others.

The Un-Marshaller can move ahead of the Journaler and Replicator, also up to slot 30.
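These rules boil down to taking a minimum: each handler tracks the last sequence it has finished, and may only advance up to the smallest sequence among the handlers it depends on. The publisher is gated the same way from the other side, since it may only reuse a slot once the slowest handler is done with it. The class `Gating` and the progress values below are hypothetical and only illustrate the scenario above.

```java
// Hypothetical sketch of sequence gating; values are illustrative.
class Gating {
    /** Highest sequence a handler may process, given the sequences it depends on. */
    static long availableUpTo(long... dependencies) {
        long min = Long.MAX_VALUE;
        for (long s : dependencies) min = Math.min(min, s);
        return min;
    }

    /** The publisher may only reuse a slot once the slowest handler is done with it. */
    static boolean canClaim(long nextSequence, int bufferSize, long slowestHandler) {
        return nextSequence - bufferSize <= slowestHandler;
    }

    public static void main(String[] args) {
        long published = 30;                                     // Receiver has made 0..30 public
        long journaled = 23, replicated = 23, unmarshalled = 28; // illustrative progress
        long businessLogic = 18;

        // Journaler, Replicator and Un-Marshaller depend only on the Receiver.
        System.out.println(availableUpTo(published));                           // 30
        // The Business Logic Handler depends on all three of them.
        System.out.println(availableUpTo(journaled, replicated, unmarshalled)); // 23
        // With a 32-slot ring, the Receiver may claim up to 18 + 32 = 50.
        System.out.println(canClaim(50, 32, businessLogic));                    // true
        System.out.println(canClaim(51, 32, businessLogic));                    // false
    }
}
```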


Java API


Figure: example producer/consumer topologies built with the Java API (producers P1, P2; consumers C1 to C4)


Demo

LMAX Low Level Ideas


1.  Simple Code

2.  Everything in memory

3.  Single-threaded business logic, one thread per CPU

4.  Business logic does no I/O; I/O is done elsewhere

5.  The scheduler “knows” the dependencies between handlers
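Points 2 to 4 can be sketched as a business logic handler that keeps all state in plain in-memory collections, is only ever called from one thread, and never touches disk or network. `AccountBalanceHandler` and its methods are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical business logic handler: all state in memory, invoked from exactly
// one thread, no I/O (journaling, replication and sending happen in other stages).
class AccountBalanceHandler {
    // Plain HashMap with no locks: safe because only the business logic thread touches it.
    private final Map<String, Long> balances = new HashMap<>();

    void deposit(String account, long amount) {
        balances.merge(account, amount, Long::sum);
    }

    /** Returns false and changes nothing if the source account lacks funds. */
    boolean transfer(String from, String to, long amount) {
        long available = balances.getOrDefault(from, 0L);
        if (available < amount) return false;
        balances.put(from, available - amount);
        balances.merge(to, amount, Long::sum);
        return true;
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    public static void main(String[] args) {
        AccountBalanceHandler handler = new AccountBalanceHandler();
        // In the architecture above, these calls would arrive in order from the input disruptor.
        handler.deposit("alice", 100);
        handler.transfer("alice", "bob", 30);
        System.out.println(handler.balance("alice") + " " + handler.balance("bob")); // 70 30
        System.out.println(handler.transfer("bob", "alice", 50)); // false: bob only has 30
    }
}
```

Because the handler is deterministic and single-threaded, its state could in principle be rebuilt by replaying the journaled input events in the same order.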

6M TPS? How did LMAX do it?


•  10K+ TPS: if you don't do anything stupid (a modern CPU executes about 3 billion instructions per second)
•  100K+ TPS (x 10): clean, organized code; standard libraries
•  1000K+ TPS (x 10): custom, cache-friendly collections; performance testing; controlled GC; a very well modeled domain
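A back-of-the-envelope budget shows why each tier needs more care. Assuming, as a simplification, a single core retiring about 3 billion instructions per second, the instructions available per transaction are:

```latex
\text{budget} \approx \frac{3 \times 10^{9}\ \text{instructions/s}}{\text{TPS}}:
\qquad
\frac{3 \times 10^{9}}{10^{4}} = 3 \times 10^{5},
\qquad
\frac{3 \times 10^{9}}{10^{5}} = 3 \times 10^{4},
\qquad
\frac{3 \times 10^{9}}{10^{6}} = 3 \times 10^{3},
\qquad
\frac{3 \times 10^{9}}{6 \times 10^{6}} = 500
```

At the 6M TPS headline figure that leaves only a few hundred instructions per transaction, so avoiding locks, kernel calls and cache misses matters far more than at 10K TPS.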

We’re looking for very good developers

Thanks! @codemonkeyism stephan.schmidt@brands4friends.de


Images CC from Flickr: nimboo, imjustcreative, gremionis, justonlysteve, John_Scone, Matthias Wicke, irisgodd3ss, TunnelBug, alandd, seasonal wanderer, raulbarraltamayo, Gilmoth, Dunechaser, graftedno1

Sources


Martin Thompson, Dave Farley, Michael Barker, Patricia Gee, Andrew Stewart: “Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads”, 2011

Martin Fowler: “The LMAX Architecture”, 2011, http://martinfowler.com/articles/lmax.html

Martin Thompson, Michael Barker: “How to do 100K+ TPS at less than 1ms latency”, 2010
