Dealing with JVM limitations in Apache Cassandra (FOSDEM 2012)

Post on 20-May-2015


Dealing with JVM limitations in Apache Cassandra

Jonathan Ellis / @spyced

Pain points for Java databases

✤ GC
✤ GC
✤ GC

Pain points for Java databases

✤ GC
✤ Platform-specific code

GC

✤ Concurrent and compacting: choose one
✤ G1
✤ Azul C4 / Zing?

Fragmentation

✤ Bloom filter arrays
✤ Compression offsets

Fragmentation, 2

✤ Arena allocation for memtables
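The idea behind arena allocation is to copy many small, long-lived values into a few large slabs, so the old generation holds a handful of big regions that die together at flush time instead of thousands of scattered small arrays. The following is a minimal sketch under stated assumptions: `SlabArena`, its method names, and the slab size are hypothetical and are not taken from Cassandra's code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of arena (slab) allocation; not Cassandra's actual allocator.
final class SlabArena {
    private final int slabSize;
    private final List<ByteBuffer> slabs = new ArrayList<>();
    private ByteBuffer current;

    SlabArena(int slabSize) {
        this.slabSize = slabSize;
    }

    // Copies the value into the current slab and returns a view of the copy.
    ByteBuffer copy(byte[] value) {
        if (value.length > slabSize) {
            // Oversized values get their own allocation rather than a slab slot.
            return ByteBuffer.wrap(value.clone());
        }
        if (current == null || current.remaining() < value.length) {
            // Current slab is full (or missing): start a new one.
            current = ByteBuffer.allocate(slabSize);
            slabs.add(current);
        }
        int start = current.position();
        current.put(value);
        // Build a view limited to exactly the bytes just written.
        ByteBuffer view = current.duplicate();
        view.position(start);
        view.limit(start + value.length);
        return view.slice();
    }

    int slabCount() {
        return slabs.size();
    }
}
```

Dropping the arena (for example when a memtable is flushed) makes every slab unreachable at once, which is what keeps the heap from fragmenting.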

(Memtables?)

[Diagram sequence: the write path. Each write is applied to the Memtable in memory and appended to the Commit log on the hard drive. The writes shown are write(k1, c1:v), write(k1, c2:v), write(k2, c1:v c2:v) and write(k1, c1:v c3:v). A flush then writes the memtable to an SSTable on the hard drive (rows k1 c1:v c2:v c3:v and k2 c1:v c2:v, plus an index), followed by cleanup.]
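The write path in the diagram can be approximated with a toy model: append to a log, update a sorted in-memory map, and on flush freeze the map as an immutable sorted table. This is only a sketch; `ToyStore` and all of its members are hypothetical, the "disk" structures are plain in-memory lists, and each call writes a single column for simplicity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the write path in the diagram; all names here are hypothetical.
final class ToyStore {
    // Stand-in for the on-disk commit log: an append-only list of records.
    final List<String> commitLog = new ArrayList<>();
    // Memtable: rows sorted by key, each row a sorted map of column -> value.
    private TreeMap<String, TreeMap<String, String>> memtable = new TreeMap<>();
    // Stand-in for on-disk SSTables: immutable snapshots of flushed memtables.
    final List<Map<String, TreeMap<String, String>>> sstables = new ArrayList<>();

    void write(String key, String column, String value) {
        commitLog.add(key + " " + column + ":" + value); // log append comes first
        memtable.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    void flush() {
        // The flushed table keeps the memtable's key order.
        sstables.add(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear(); // cleanup: flushed writes no longer need replay
    }

    int memtableRows() {
        return memtable.size();
    }
}
```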

“Java is a memory hog”

✤ Large overhead for typical objects and collections
✤ How large?
✤ java.lang.instrument.Instrumentation
✤ JAMM: Java Agent for Memory Measurements
✤ https://github.com/jbellis/jamm

org.apache.cassandra.cache.SerializingCache

✤ Live objects are about 85% JVM bookkeeping
✤ org.apache.cassandra.cache.FreeableMemory using reference counting
✤ Considering doing reference-counted, off-heap memtables as well
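Reference counting is what lets off-heap memory be released deterministically instead of waiting for the collector. A minimal sketch of the pattern, assuming a hypothetical `RefCountedBuffer` class; it is not the `FreeableMemory` class named on the slide, and it uses a direct `ByteBuffer` rather than explicitly freed native memory.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of reference counting around off-heap memory.
// It is not the FreeableMemory class named on the slide.
final class RefCountedBuffer {
    private final AtomicInteger refs = new AtomicInteger(1); // creator holds one reference
    private volatile ByteBuffer buffer;

    RefCountedBuffer(int size) {
        buffer = ByteBuffer.allocateDirect(size); // contents live outside the Java heap
    }

    // Tries to take a reference; fails if the buffer has already been released.
    boolean reference() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    // Drops a reference; the last one out releases the memory.
    void unreference() {
        if (refs.decrementAndGet() == 0) {
            // A real implementation would free native memory here; this sketch
            // only drops the reference and leaves reclamation to the JVM.
            buffer = null;
        }
    }

    boolean isFreed() {
        return buffer == null;
    }

    ByteBuffer buffer() {
        ByteBuffer b = buffer;
        if (b == null) throw new IllegalStateException("already freed");
        return b;
    }
}
```

Readers call `reference()` before touching the buffer and `unreference()` afterwards; the compare-and-set loop ensures nobody can revive a buffer whose count has already reached zero.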

Don’t forget about young gen

✤ Always stop-the-world for ~100ms

Platform-specific code

✤ OS
✤ JVM

m[un]map

✤ Log-structured storage wants to remove old files post-compaction; some platforms disallow deleting open files

✤ Old workaround (pre-1.0):
  ✤ use PhantomReference to tell when mmap’d file is GC’d (hence unmapped)
  ✤ Poor user experience and messy corner cases
✤ New workaround:
  ✤ Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner")
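The reflective lookup can be wrapped so that a missing or inaccessible cleaner degrades to "not unmapped" rather than an exception. This is a hedged sketch: `Unmapper` and `tryClean` are hypothetical names, the `clean` method on the returned cleaner object is an internal detail that is assumed rather than guaranteed, and on JDKs that enforce strong encapsulation the reflective call may simply be denied, in which case the method returns false.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Sketch of the reflective cleaner lookup named on the slide; names are hypothetical.
final class Unmapper {
    // Returns true if the buffer's cleaner was invoked, false if that was not possible.
    static boolean tryClean(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) return false;
        try {
            Method cleanerMethod =
                Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner");
            Object cleaner = cleanerMethod.invoke(buffer);
            if (cleaner == null) return false; // some buffers carry no cleaner
            // Assumption: the cleaner object exposes a public no-arg clean() method.
            Method clean = cleaner.getClass().getMethod("clean");
            clean.invoke(cleaner);
            return true;
        } catch (Exception e) {
            // Internal API: may be missing or inaccessible on this JVM.
            return false;
        }
    }
}
```

A buffer must never be touched after a successful clean; that is exactly the segfault risk the talk refers to.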

mmap part 2

✤ 2GB limit via ByteBuffer: public abstract byte get(int index)

✤ Workaround: MmappedSegmentedFile: public Iterator<DataInput> iterator(long position)
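Because `ByteBuffer` is indexed by `int`, a file larger than 2GB has to be mapped as several segments and addressed with a `long` that is split into a segment number and an offset. The sketch below illustrates that arithmetic; `SegmentedMap` is a hypothetical class and is not Cassandra's `MmappedSegmentedFile`, which exposes an iterator rather than a `get(long)` method.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of addressing a file with long offsets via several mappings.
final class SegmentedMap {
    private final MappedByteBuffer[] segments;
    private final long segmentSize;

    SegmentedMap(Path file, long segmentSize) {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE)
            throw new IllegalArgumentException("segmentSize must fit in an int");
        this.segmentSize = segmentSize;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long length = ch.size();
            int count = (int) ((length + segmentSize - 1) / segmentSize);
            segments = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = i * segmentSize;
                long size = Math.min(segmentSize, length - start);
                // Mappings remain valid after the channel is closed.
                segments[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one byte at a position that may exceed Integer.MAX_VALUE.
    byte get(long position) {
        int segment = (int) (position / segmentSize);
        int offset = (int) (position % segmentSize);
        return segments[segment].get(offset);
    }
}
```

In practice the segment size would be close to the 2GB limit; a tiny value is only useful for exercising the boundary logic.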

link

✤ Used for snapshots
✤ Old workaround: JNA
✤ New workaround: supported directly by Java 7
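On Java 7 and later, hard links are available through `java.nio.file.Files.createLink`, so a snapshot can be taken without copying data. A minimal sketch, assuming a hypothetical `Snapshots` helper and a file system that supports hard links:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical snapshot helper built on java.nio.file.Files.createLink (Java 7+).
final class Snapshots {
    // Hard-links dataFile into snapshotDir; no data is copied.
    static Path snapshot(Path dataFile, Path snapshotDir) {
        try {
            Files.createDirectories(snapshotDir);
            Path link = snapshotDir.resolve(dataFile.getFileName());
            return Files.createLink(link, dataFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because both names point at the same underlying file, the snapshot keeps the data alive even after the original name is deleted following compaction.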

mlockall

✤ swappiness: pissing off database developers since 2001 (?)
✤ mlockall(MCL_CURRENT)

Low-level i/o

✤ posix_fadvise
✤ mincore/fincore
✤ fcntl
✤ ... JNA

A plug for JNA

✤ https://github.com/twall/jna

static { try { Native.register("c"); ...

private static native int mlockall(int flags) throws LastErrorException;

The fallacy of choosing portability over power

✤ Applets have been dead for years
✤ Python gets it right
✤ import readline

The fallacy of choosing safety over power

✤ Allowing munmap would expose developers to segfaults
✤ But, relying on the GC to clean up external resources is a well-known antipattern
✤ File.close
✤ We need munmap badly enough that we resort to unnatural and unportable code to get it
✤ You haven’t kept us from risking segfaults, you’ve just made us miserable

Compatibility through obscurity?

✤ sun.misc.Unsafe
✤ Used by high-profile libraries like high-scale-lib

... even public options

http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card

Too negative?

Still true

✤ "Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free." -- Cliff Click
