gc free coding in @java presented @geecon

58
GC Free coding Peter Lawrey CEO, Principal Consultant Higher Frequency Trading

Upload: peter-lawrey

Post on 10-May-2015

2.290 views

Category:

Technology


4 download

DESCRIPTION

Look at why, when and how to reduce garbage in a Java application. Includes an example of the simple to hacky optimisations.

TRANSCRIPT

Page 1: GC free coding in @Java presented @Geecon

GC Free coding

Peter LawreyCEO, Principal ConsultantHigher Frequency Trading

Page 2: GC free coding in @Java presented @Geecon

Agenda

Who are we?

When to optimise.

What can be achieved in extreme cases.

Examples of object replacement.

Libraries designed to be ultra-low GC.

Page 3: GC free coding in @Java presented @Geecon

Who are we

Higher Frequency Trading is a small consulting and software development house specialising in

Low latency, high throughput software 8 developers in Europe and USA. Sponsor HFT related open source projects Core Java engineering

Page 4: GC free coding in @Java presented @Geecon

Who am I?

Peter Lawrey

- CEO and Principal Consultant

- 3rd on Stackoverflow for Java, most Java Performance answers.

- Founder of the Performance Java User's Group

- An Australian, based in the U.K.

Page 5: GC free coding in @Java presented @Geecon

What is our OSS

Key OpenHFT projects Chronicle, low latency logging, event store and

IPC. (record / log everything) HugeCollections, cross process embedded

persisted data stores. (only need the latest)

Millions of operations per second.

Micro-second latency.

Page 6: GC free coding in @Java presented @Geecon

Why do we optimise for memory

The level required depends on the application Maximise throughput Remove worst pauses (Full GC, tenured) Remove small pauses Optimise CPU cache utilisation

Page 7: GC free coding in @Java presented @Geecon

Maximise Throughput

Latency is not important, only the amount of time it takes to perform a batch.

If you have a system which clean up do say 10 GB/s of garbage, it might be acceptable to spend 20% of your time GC-ing

In which case 2 GB/s of garbage might acceptable

Page 8: GC free coding in @Java presented @Geecon

Remove worst pauses

Stop the world, Full GCs and Tenure space collections.

An allocation rate of around 500 MB/s GC to below 5% of the CPU time reduce pressure on your tenured space Tuning the GC will do the rest

Page 9: GC free coding in @Java presented @Geecon

Remove small pauses

Reduce your garbage over a day to your Eden size.

Say you have a 24 GB Eden space and an allocation rate of 1 GB/hour, you can clean up once per day.

Page 10: GC free coding in @Java presented @Geecon

Remove small pauses

Page 11: GC free coding in @Java presented @Geecon

Time scales every developer should know.

Operation Latency In human terms

L1 Cache hit 1 ns A blink of an eye (~20 ms)

L2 Cache hit 3 ns Noticeable flicker

L3 Cache hit 10 – 20 ns Time to say “A”

Main memory 70 – 100 ns Time to say a ten word sentence

Signal down a 200m fibre cable

1 μsec One slide (speaking quickly)

SSD access 5 – 25 μsec Time to reheat a meal (3 mins)

HDD access 8 msec Time to flight around the world. (1.8 days)

Network packet from Germany to the USA

45 msec Waiting for a 7 working day delivery

Page 12: GC free coding in @Java presented @Geecon

Improve CPU cache utilisation

Your L1 cache is a small 32 KB of data, and L2 is 256 KB shared.

If you are producing 32 MB/s of garbage in a thread, you are filling your L1 cache with garbage every milli-seconds.

This makes it harder for the CPU to keep useful stuff in the cache. It doesn't know what is garbage.

Improving CPU Cache efficiency can speed up an application by 2-5x.

Page 13: GC free coding in @Java presented @Geecon

Why use Java?

A rule of thumb is that 90% of the time is spent in 10% of the code.

Writing in Java will mean that 10% of your code might mean optimising heavily.

Writing in C or C++ will mean that 100% of your code will be harder to write, or you have to use JNI, JNA, JNR-FFI.

Low level Java works well with natural Java.

Page 14: GC free coding in @Java presented @Geecon

When to optimise

Premature optimisation is the root of all evil

– Donald Knuth

Measure, Don't Guess

– Jack Shirazi & Kirk Pepperdine

Page 15: GC free coding in @Java presented @Geecon

When to optimise

You will need Key portion of code working correctly. A representative, reproducible test case. An idea of the optimisation requirements

Throughput or average latency. Latency profile at a given throughput.

Page 16: GC free coding in @Java presented @Geecon

Where to start

With representative and correct test case, I start with a commercial profiler with

CPU Profiling enabled Memory Profiling enabled.

With both on I look at the CPU profiling. Having the memory profiling on, gives memory allocation more weight.

Page 17: GC free coding in @Java presented @Geecon

An example: busy waiting for files

List<File> files = Arrays.asList(new File("dir/a.txt"), new File("dir/b.txt"), new File("dir/c.txt"));

// waiting for all files to existsboolean allFound;do { allFound = true; for (File file : files) { if (!file.exists()) { LockSupport.parkNanos(1); allFound = false; } }} while (!allFound);System.out.println("All files exist");

BTW DirectoryWatcher is the best solution.

Page 18: GC free coding in @Java presented @Geecon

Heap usage: SawTooth

Page 19: GC free coding in @Java presented @Geecon

Allocations

Page 20: GC free coding in @Java presented @Geecon

Green Level Optimisation

List<File> files = Arrays.asList(new File("dir/a.txt"), new File("dir/b.txt"), new File("dir/c.txt"));

// waiting for all files to existsboolean allFound;do { allFound = true; for (int i = 0; i < files.size(); i++) { if (!file.exists()) { LockSupport.parkNanos(1); allFound = false; } }} while (!allFound);System.out.println("All files exist");

Simple, saves 8% of garbage.

Page 21: GC free coding in @Java presented @Geecon

Red Level Optimisationstatic class MyFile extends File { String name; public MyFile(String pathname) { super(pathname); }

@Override public boolean exists() { return super.exists(); }

@Override public String getName() { if (name != null) return name; return name = super.getName(); }}public static void main(String... ignored) { List<MyFile> files = Arrays.asList(new MyFile("dir/a.txt"), new MyFile("dir/b.txt"), new MyFile("dir/c.txt"));

A bit scary and invasive but saves 46% of garbage.

Page 22: GC free coding in @Java presented @Geecon

Brown Level Optimisationprivate GetBytes getBytesCache;

public byte[] getBytes(String charsetName) throws UnsupportedEncodingException { if (charsetName == null) throw new NullPointerException(); GetBytes getBytes = getBytesCache; if (getBytes != null && getBytes.charsetName.equals(charsetName)) return getBytes.bytes; byte[] encode = StringCoding.encode(charsetName, value, 0, value.length); getBytesCache = new GetBytes(charsetName, encode); return encode;}

static class GetBytes { final String charsetName; final byte[] bytes; GetBytes(String charsetName, byte[] bytes) { this.charsetName = charsetName; this.bytes = bytes; }}

A very scary, uses -Xbootclasspath/p but saves 46% of garbage.

Page 23: GC free coding in @Java presented @Geecon

Heap usage: Flat line

Page 24: GC free coding in @Java presented @Geecon

Is it alive?

Page 25: GC free coding in @Java presented @Geecon

Common techniques to reduce garbage

Use primitives instead of wrappers.

Double → double

BigDecimal → double or long

Use primitive collections

Trove4j

Guava collections and utilities

Page 26: GC free coding in @Java presented @Geecon

Thread local mutable data

You can recycle data structures yourself if it is mutable.

Sharing mutable state between threads is hard.

Using mutable state which is local to a thread is much simpler and can be very low GC.

A quick win can be reusing buffers like ByteBuffer (and it can be off heap too)

Page 27: GC free coding in @Java presented @Geecon

SAX Parser instead of DOM

DOM is easier to work with, but requires reading in the whole document and building a tree of data.

SAX is harder to use, but you can build just the data structure you need as you go. You can populate a mutable structure for re-use.

Page 28: GC free coding in @Java presented @Geecon

Example, GC-free FIX parser

SAXophone is a library for SAX parsers FIX parser JSon parser BSon parser Yaml parser

Page 29: GC free coding in @Java presented @Geecon

SAX Parser interface

public interface BytesSaxParser { /** * reset any state */ void reset();

/** * Parse as much of the Bytes as possible. * * @param bytes */ void parse(Bytes bytes);}

Page 30: GC free coding in @Java presented @Geecon

FIX SAX Handler interface

public interface BytesSaxParser { /** * reset any state */ void reset();

/** * Parse as much of the Bytes as possible. * * @param bytes */ void parse(Bytes bytes);}

Page 31: GC free coding in @Java presented @Geecon

FIX Handlerfinal StringBuilder sender, target, clOrdId, symbol;double quantity, price;int ordType;

@Overridepublic void onField(long fieldNumber, Bytes value) { switch ((int) fieldNumber) { case 8: resetAll(); break; case 35: assert value.readByte() == 'D'; break; case 49: value.parseUTF(sender, StopCharTesters.ALL); break; case 56: value.parseUTF(target, StopCharTesters.ALL); break; case 11: value.parseUTF(clOrdId, StopCharTesters.ALL); break; case 55: value.parseUTF(symbol, StopCharTesters.ALL); break; case 38: quantity = value.parseLong(); break; case 44: price = value.parseDouble(); break; case 40: ordType = (int) value.parseLong(); break; case 10: processFixMessage(sender, target, clOrdId, symbol, quantity, price, ordType); break; } count.incrementAndGet();}

Page 32: GC free coding in @Java presented @Geecon

FIX Message to Decode

String s = "8=FIX.4.2|9=130|35=D|34=659|49=BROKER04|56=REUTERS|52=20070123-19:09:43|38=1000|59=1|100=N|40=1|11=ORD10001|60=20070123-19:01:17|55=HPQ|54=1|21=2|10=004|";

NativeBytes nb = new DirectStore(s.length()).bytes();

nb.append(s.replace('|', '\u0001'));

nb.flip();

Page 33: GC free coding in @Java presented @Geecon

Test harness

final AtomicInteger count = new AtomicInteger();FixSaxParser parser = new FixSaxParser(new MyFixHandler(count));

int runs = 200000;for (int t = 0; t < 5; t++) { count.set(0); long start = System.nanoTime(); for (int i = 0; i < runs; i++) { nb.position(0); parser.parse(nb); } long time = System.nanoTime() - start; System.out.printf("Average parse time was %.2f us, " + "fields per message %.2f%n", time / runs / 1e3, (double) count.get() / runs);}

Note : this is average latency which is really a proxy for throughput. Average latency hides large delays, but in the sub-microsecond range it is harder to get a meaningful latency distribution.

Page 34: GC free coding in @Java presented @Geecon

PerformanceRun with -verbose:gc -Xmx32m

Average parse time was 0.96 us, fields per message 17.00

Average parse time was 0.58 us, fields per message 17.00

Average parse time was 0.58 us, fields per message 17.00

Average parse time was 0.55 us, fields per message 17.00

Average parse time was 0.54 us, fields per message 17.00

Page 35: GC free coding in @Java presented @Geecon

Q & A

https://github.com/OpenHFT/OpenHFT

@PeterLawrey

[email protected]

Page 36: GC free coding in @Java presented @Geecon

What is HFT?

No standard definition. Trading faster than a human can see. Being fast can make the difference between

making and losing money. For different systems this means typical

latencies of between 10 micro-seconds and 10 milli-second.

(Latencies external to the provider)

Page 37: GC free coding in @Java presented @Geecon

Time scales every developer should know.

Operation Latency In human terms

L1 Cache hit 1 ns A blink of an eye (~20 ms)

L2 Cache hit 3 ns Noticeable flicker

L3 Cache hit 10 – 20 ns Time to say “A”

Main memory 70 – 100 ns Time to say a ten word sentence

Signal down a 200m fibre cable

1 μsec One slide (speaking quickly)

SSD access 5 – 25 μsec Time to reheat a meal (3 mins)

HDD access 8 msec Time to flight around the world. (1.8 days)

Network packet from Germany to the USA

45 msec Waiting for a 7 working day delivery

Page 38: GC free coding in @Java presented @Geecon

Simple Trading System

Page 39: GC free coding in @Java presented @Geecon

Event driven processing

Trading system use event driven processing to minimise latency in a system.

Any data needed should already be loaded in memory, not go off to a slow SQL database.

Each input event triggers a response, unless there is a need to limit the output.

Page 40: GC free coding in @Java presented @Geecon

Critical Path

A trading system is designed around the critical path. This has to be as short in terms of latency as possible.

Critical path has a tight latency budget which excludes many traditional databases.

Even the number of network hops can be minimised.

Non critical path can use tradition databases

Page 41: GC free coding in @Java presented @Geecon

Critical Path databases

Time Series databases Kdb, kona InfluxDB OpenTSDB

Designed for millions of writes per second.

Column based database => 100 Million operations per second e.g. sum a column.

Page 42: GC free coding in @Java presented @Geecon

Critical Path Databases

Page 43: GC free coding in @Java presented @Geecon

Critical Path data store

HFT strategies are; described using graphs. handle events in real time ~10 – 100 μsec. cache state rather than query a database. all custom written libraries AFAIK.

Page 44: GC free coding in @Java presented @Geecon

Critical Path data store

Logging is performed by appending to memory mapped files.

OpenHFT's Java Chronicle makes this easier to do in Java in a GC-free, off heap, lock less way.

Such low level coding is relatively easy in C or C++.

Page 45: GC free coding in @Java presented @Geecon

Non-critical Datastore

Configuration management ZooKeeper, etcd Plain files with Version control LDAP Any distributed key-value store. e.g. MongoDB

Page 46: GC free coding in @Java presented @Geecon

Big Data

Back testing a HFT system is critical and a number of solutions are available

Hadoop Matlab Time series R

Page 47: GC free coding in @Java presented @Geecon

Operational Infrastructure

Control and management infrastructure JMS, JMX Tibco RV, LBM Terracotta MongoDB

Page 48: GC free coding in @Java presented @Geecon

Reliable persistence

Trades and Orders are high value data and less voluminous than Market data or strategy results.

Typically SQL Database. Sometimes multiple databases for different

applications.

Page 49: GC free coding in @Java presented @Geecon

Why use more exotic database?

Mostly for high throughput. Million per second in one node.

Often for low latency. Latencies well below a milli-second.

Page 50: GC free coding in @Java presented @Geecon

Why wouldn't you use exotic DB

Not easy to learn, high knowledge investment.

(!R)@&{&/x!/:2_!x}'!R Often harder to use.

Less management tools. Not designed to work with web applications.

More sensitive to the details of the hardware and what else is running on the same machine.

Page 51: GC free coding in @Java presented @Geecon

Low latency at high throughput

Java Chronicle is designed as a low latency logger and IPC.

At one million small messages per second Almost zero garbage Latency between processes around 1 micro-

second Concurrent readers and writers

Supports bursts of 10 million messages/sec.

Page 52: GC free coding in @Java presented @Geecon

Chronicle and replication

Replication is point to point (TCP)

Server A records an event

– replicates to Server B

Server B reads local copy

– B processes the event

Server B stores the result.

– replicates to Server A

Server A replies.

Round trip 25 micro-seconds99% of the time

GC-freeLock lessOff heap

Unbounded

Page 53: GC free coding in @Java presented @Geecon

HugeCollections

HugeCollections provides key-value storage. Persisted (by the OS) Embedded in multiple processes Concurrent reads and writes Off heap accessible without serialization.

Page 54: GC free coding in @Java presented @Geecon

HugeCollections and throughput

SharedHashMap tested on a machine with 128 GB, 16 cores, 32 threads.

String keys, 64-bit long values. 10 million key-values updated at 37 M/s 500 million key-values updated at 23 M/s On tmpfs, 2.5 billion key-values at 26 M/s

Page 55: GC free coding in @Java presented @Geecon

HugeCollections and latency

For a Map of small key-values (both 64-bit longs)

With an update rate of 1 M/s, one thread.

Percentile 100K

entries1 M entries 10 M entries

50% (typical) 0.1 μsec 0.2 μsec 0.2 μsec

90% (worst 1 in 10) 0.4 μsec 0.5 μsec 0.5 μsec

99% (worst 1 in 100) 4.4 μsec 5.5 μsec 7 μsec

99.9% 9 μsec 10 μsec 10 μsec

99.99% 10 μsec 12 μsec 13 μsec

worst 24 μsec 29 μsec 26 μsec

Page 56: GC free coding in @Java presented @Geecon

Bonus topic: Units

A peak times an application writes 49 “mb/s” to a disk which supports 50 “mb/s” and is replicated over a 100 “mb/s” network.

What units were probably intended and where would you expect buffering if any?

Page 57: GC free coding in @Java presented @Geecon

Bonus topic: Units

A peak times an application writes 49 MiB/s to a disk which supports 50 MB/s and is replicated over a 100 Mb/s network.

MiB = 1024^2 bytes

MB = 1000^2 bytes

Mb = 125,000 bytes

The 49 MiB/s is the highest rate and 100 Mb/s is the lowest.

Page 58: GC free coding in @Java presented @Geecon

Bonus topic: Units

Unit bandwidth Used for

mb - miili-bit mb/s – milli-bits per second ?

mB - milli-byte mB/s – milli-bytes per second ?

kb – kilo-bit (1000) kb/s – kilo-bits (baud) per second Dial up bandwidth

kB – kilo-byte (1000) kB/s – kilo-bytes per second ?

Mb – mega-bit (1000^2) Mb/s – mega-bits (baud) per second Cat 5 ethernet

MB - mega-byte (1000^2) MB/s – mega bytes per second Disk bandwidth

Mib – mibi-bit (1024^2) Mib – Mibi-bits per second ?

MiB – mibi-byte (1024^2) MiB – Mibi-bytes per second Memory bandwidth

Gb – giga-bit (1000^3) Gb/s – giga-bit (baud) per second High speed networks

GB – giga-byte (1000^3) GB/s – giga-byte per second -

Gib – gibi-bit (1024^3) Gib/s – gibi-bit per second -

GiB – gibi-byte (1024^3) GiB/s – gibi-byte per second. Memory Bandwidth