
Java in High-Performance Computing

Dawid Weiss

Carrot Search; Institute of Computing Science, Poznan University of Technology

GeeCon Poznan, 05/2010


Learn from the mistakes of others. You can’t live long enough to make them all yourself.

— Eleanor Roosevelt


Talk outline

• What is “High performance”?

• What is “Java”?

• Measuring performance (benchmarking).

• HPPC library.

Crosscutting: (un?)common pitfalls and performance killers. Some HotSpot internals.



Divide-and-conquer style algorithm

for (Example e : examples) {
    if (e.hasQuiz()) e.showQuiz(); else e.showCode();
    e.explain();
    e.deriveConclusions();
}


— PART I —

High Performance Computing


High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems.

— Wikipedia


Is Java faster than C/C++? The short answer is: it depends.

— Cliff Click


It’s usually hard to make a fast program run faster.

It’s easy to make a slow program run even slower.

It’s easy to make fast hardware run slow.


For now, HPC means:

• limited allowed computation time,

• constrained resources (hardware, memory).

Good HPC software ∝ no (obvious) flaws.


— PART II —

What is Java?

(Recall: Is Java faster than C/C++?)


Example 1

public void testSum1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
    result = sum;
}

public void testSum2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum2(i, i);
    result = sum;
}

where the bodies of sum1 and sum2 sum their arguments and return the result, and COUNT is significantly large...
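A self-contained, runnable sketch of Example 1 follows. It is not the talk's actual harness: the COUNT value, the 5 warm-up plus 10 measured rounds (borrowed from the footnote of the result tables) and the nanoTime-based timing are assumptions. Storing the sum in the result field keeps the JIT from discarding the loop.

```java
// Hypothetical, self-contained version of Example 1. COUNT, the round counts
// and the timing code are assumptions for illustration, not the talk's harness.
class Example01 {
    static final int COUNT = 5_000_000; // "significantly large" (assumed value)
    static int result; // storing the sum in a field keeps the loop from being dead code

    static int sum1(int a, int b) { return a + b; }             // primitives only

    static Integer sum2(Integer a, Integer b) { return a + b; } // unbox, add, re-box

    static int testSum1(int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) sum += sum1(i, i);
        return result = sum;
    }

    static int testSum2(int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) sum += sum2(i, i); // boxes both arguments per call
        return result = sum;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 15; round++) { // 5 warm-up + 10 measured rounds
            long t0 = System.nanoTime();
            testSum1(COUNT);
            long t1 = System.nanoTime();
            testSum2(COUNT);
            long t2 = System.nanoTime();
            if (round >= 5) {
                System.out.printf("round %2d: sum1 %.3fs, sum2 %.3fs%n",
                        round - 5, (t1 - t0) / 1e9, (t2 - t1) / 1e9);
            }
        }
    }
}
```

Both methods compute the same (wrapping) int sum; only the boxing differs.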


VM                 sum1   sum2
sun-1.6.0-20       0.04   2.62
sun-1.6.0-16       0.04   3.20
sun-1.5.0-18       0.04   3.29
ibm-1.6.2          0.08   6.28
jrockit-27.5.0     0.18   0.16
harmony-r917296    0.17   0.35

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).


VM                 sum1   sum2   sum3    sum4
sun-1.6.0-20       0.04   2.62   1.05    3.76
sun-1.6.0-16       0.04   3.20   1.39    4.99
sun-1.5.0-18       0.04   3.29   1.46    5.20
ibm-1.6.2          0.08   6.28   0.16   14.64
jrockit-27.5.0     0.18   0.16   1.16    3.18
harmony-r917296    0.17   0.35   9.18   22.49

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).


int sum1(int a, int b) {
    return a + b;
}

Integer sum2(Integer a, Integer b) {
    return a + b;
}

// ...which the compiler desugars (autoboxing/unboxing) into:
Integer sum2(Integer a, Integer b) {
    return Integer.valueOf(a.intValue() + b.intValue());
}


int sum3(int... args) {
    int sum = 0;
    for (int i = 0; i < args.length; i++)
        sum += args[i];
    return sum;
}

Integer sum4(Integer... args) {
    int sum = 0;
    for (int i = 0; i < args.length; i++) {
        sum += args[i];
    }
    return sum;
}

// ...where the varargs parameter is compiled as a plain array:
Integer sum4(Integer[] args) {
    // ...
}
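Varargs are plain syntactic sugar for an array parameter, which is where sum3 and sum4 lose time: every call allocates a fresh array, and sum4 additionally boxes each argument. A minimal sketch (the class name VarargsDemo is hypothetical):

```java
// Sketch: sum3(1, 2, 3) is compiled as sum3(new int[] {1, 2, 3}), so every call
// allocates an array; sum4(1, 2, 3) also boxes each argument into an Integer[].
class VarargsDemo {
    static int sum3(int... args) {
        int sum = 0;
        for (int i = 0; i < args.length; i++) sum += args[i];
        return sum;
    }

    static Integer sum4(Integer... args) {
        int sum = 0;
        for (int i = 0; i < args.length; i++) sum += args[i]; // unboxing
        return sum;                                             // re-boxing
    }

    public static void main(String[] args) {
        // Each pair of calls is equivalent; the compiler generates the second form.
        System.out.println(sum3(1, 2, 3));
        System.out.println(sum3(new int[] {1, 2, 3}));
        System.out.println(sum4(1, 2, 3));
        System.out.println(sum4(new Integer[] {1, 2, 3}));
    }
}
```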


Conclusions

• Syntactic sugar may be costly.

• Primitive types are fast.

• Large differences between different VMs.


Example 2

Write once, run anywhere!


But it’s the same VM!


It works on my machine!


private static boolean ready;

public static void startThread() {
    new Thread() {
        public void run() {
            try {
                sleep(2000);
            } catch (Exception e) { /* ignore */ }
            System.out.println("Marking loop exit.");
            ready = true;
        }
    }.start();
}

public static void main(String[] args) {
    startThread();
    System.out.println("Entering the loop...");
    while (!ready) {
        // Do nothing.
    }
    System.out.println("Done, I left the loop!");
}


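while (!ready) {
    // Do nothing.
}

≡ ?

boolean r = ready;
while (!r) {
    // Do nothing.
}

In most cases, yes, from the JMM perspective: ready is a plain (non-volatile) field, so the JIT is allowed to read it once and hoist the read out of the loop.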
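The standard way to make Example 2 terminate reliably is to declare the flag volatile: the JIT may then no longer cache or hoist the read, so the spinning thread sees the writer's store. A minimal sketch; the class name and the runOnce helper are hypothetical, added so the behavior can be checked.

```java
// Sketch: same busy-wait as Example 2, but with a volatile flag.
class VolatileFlag {
    private static volatile boolean ready;

    static boolean runOnce(long delayMillis) {
        ready = false;
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready = true; // volatile write
        });
        writer.start();
        while (!ready) {
            // Busy-wait; a volatile read may not be hoisted out of the loop.
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ready;
    }

    public static void main(String[] args) {
        System.out.println("Entering the loop...");
        runOnce(2000);
        System.out.println("Done, I left the loop!");
    }
}
```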


JVM Internals...


C1:

• fast compilation

• not (much) optimization

C2:

• slow(er) compilation than C1

• a lot of JMM-allowed optimizations


There are hundreds of JVM tuning/diagnostic switches.


My personal favorite:


Conclusions

• Bytecode is far from what is executed.

• A lot going on under the (VM) hood.

• Bad code may work, but will eventually crash.

• HotSpot-level optimizations are good.

• If there is a bug in the HotSpot compiler...


Any other diversifying factors?


J2ME

• more VM vendors,

• hardware diversity,

• software and hardware quirks.


Non-JVM target platforms

• Dalvik

• GWT

• IKVM


Conclusions

• There is no “single” Java performance model.

• Performance depends on the VM, environment, class library, hardware.

• Apply benchmark-and-correct cycle.


Benchmarking


Example 3

public void testSum1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
    result = sum;
}

public void testSum1_2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
}


VM                 sum1   sum1_2
sun-1.6.0-20       0.04   0.00
sun-1.6.0-16       0.04   0.00
sun-1.5.0-18       0.04   0.00
ibm-1.6.2          0.08   0.01
jrockit-27.5.0     0.17   0.08
harmony-r917296    0.17   0.11

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).


java -server -XX:+PrintOptoAssembly -XX:+PrintCompilation ...

- method holder: 'com/dawidweiss/geecon2010/Example03'
- access: 0xc1000001 public
- name: 'testSum1_2'
...
010   pushq  rbp
      subq   rsp, #16    # Create frame
      nop                # nop for patch_verified_entry
016   addq   rsp, 16     # Destroy frame
      popq   rbp
      testl  rax, [rip + #offset_to_poll_page]    # Safepoint: poll for GC
021   ret


Conclusions

• Benchmarks must provide feedback (use their results); otherwise the measured code may never be executed at all.

• HotSpot is smart and effective at removing dead code.
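A sketch of the two variants from Example 3 side by side (class and method names are hypothetical). The only difference is whether the loop's result escapes into a field; when it does not, an optimizing JIT is free to treat the whole loop as dead code, which is what the 0.00 timings for sum1_2 reflect.

```java
// Sketch: a benchmark loop only "counts" if its result is observable.
class DeadCodeDemo {
    static int sink; // results are published here so the JIT must keep computing them

    static int sum1(int a, int b) { return a + b; }

    // Result never used: C2 may compile this to an (almost) empty method, as in Example 3.
    static void discardsResult(int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) sum += sum1(i, i);
    }

    // Result stored in a field and returned: the sum has to be computed.
    static int consumesResult(int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) sum += sum1(i, i);
        sink = sum;
        return sum;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            discardsResult(50_000_000);
            long t1 = System.nanoTime();
            consumesResult(50_000_000);
            long t2 = System.nanoTime();
            System.out.printf("discards: %.3fs, consumes: %.3fs%n",
                    (t1 - t0) / 1e9, (t2 - t1) / 1e9);
        }
    }
}
```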


Example 4

@Test
public void testAdd1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) {
        sum += add1(i);
    }
    guard = sum;
}

public int add1(int i) {
    return i + 1;
}

Note: add1 is virtual (a public, non-final instance method).


Switches                              testAdd1
-XX:+Inlining -XX:+PrintInlining      0.04
-XX:-Inlining                         0.45

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200, JRE 1.7b80-debug).


Most Java calls are monomorphic.


HotSpot adjusts to megamorphic calls automatically.


Example 5

abstract class Superclass {
    abstract int call();
}

class Sub1 extends Superclass { int call() { return 1; } }
class Sub2 extends Superclass { int call() { return 2; } }
class Sub3 extends Superclass { int call() { return 3; } }

Superclass[] mixed = initWithRandomInstances(10000);
Superclass[] solid = initWithSub1Instances(10000);

@Test
public void testMonomorphic() {
    int sum = 0;
    int m = solid.length;
    for (int i = 0; i < COUNT; i++)
        sum += solid[i % m].call();
    guard = sum;
}

@Test
public void testMegamorphic() {
    int sum = 0;
    int m = mixed.length;
    for (int i = 0; i < COUNT; i++)
        sum += mixed[i % m].call();
    guard = sum;
}
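The slides do not show initWithRandomInstances, initWithSub1Instances, COUNT or guard, so the runnable version below fills them in with hypothetical implementations (a seeded Random picks among the three subclasses). Each test keeps its own call site, as in the original: sharing one method would merge the receiver-type profiles and make both loops megamorphic. With three receiver types at one call site, HotSpot typically cannot apply its monomorphic or bimorphic inlining and falls back to a virtual call.

```java
import java.util.Random;

abstract class Superclass { abstract int call(); }
class Sub1 extends Superclass { int call() { return 1; } }
class Sub2 extends Superclass { int call() { return 2; } }
class Sub3 extends Superclass { int call() { return 3; } }

class Example05 {
    static int guard; // hypothetical sink for results

    // Hypothetical helper: random mix of the three subclasses (seeded for repeatability).
    static Superclass[] initWithRandomInstances(int n, long seed) {
        Random rnd = new Random(seed);
        Superclass[] a = new Superclass[n];
        for (int i = 0; i < n; i++) {
            switch (rnd.nextInt(3)) {
                case 0:  a[i] = new Sub1(); break;
                case 1:  a[i] = new Sub2(); break;
                default: a[i] = new Sub3(); break;
            }
        }
        return a;
    }

    // Hypothetical helper: a single receiver type.
    static Superclass[] initWithSub1Instances(int n) {
        Superclass[] a = new Superclass[n];
        for (int i = 0; i < n; i++) a[i] = new Sub1();
        return a;
    }

    // Call site sees only Sub1: can be inlined behind a cheap type check.
    static int testMonomorphic(Superclass[] solid, int count) {
        int sum = 0;
        int m = solid.length;
        for (int i = 0; i < count; i++) sum += solid[i % m].call();
        return guard = sum;
    }

    // Separate call site that sees three receiver types.
    static int testMegamorphic(Superclass[] mixed, int count) {
        int sum = 0;
        int m = mixed.length;
        for (int i = 0; i < count; i++) sum += mixed[i % m].call();
        return guard = sum;
    }

    public static void main(String[] args) {
        Superclass[] solid = initWithSub1Instances(10_000);
        Superclass[] mixed = initWithRandomInstances(10_000, 42L);
        for (int round = 0; round < 10; round++) {
            long t0 = System.nanoTime();
            testMonomorphic(solid, 20_000_000);
            long t1 = System.nanoTime();
            testMegamorphic(mixed, 20_000_000);
            long t2 = System.nanoTime();
            System.out.printf("mono %.3fs, mega %.3fs%n", (t1 - t0) / 1e9, (t2 - t1) / 1e9);
        }
    }
}
```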


VM                 monomorphic   megamorphic
sun-1.6.0-20       0.19          0.32
sun-1.6.0-16       0.19          0.34
sun-1.5.0-18       0.18          0.34
ibm-1.6.2          0.20          0.30
jrockit-27.5.0     0.22          0.29
harmony-r917296    0.27          0.32

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).


Example 6

@Test
public void testBitCount1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += Integer.bitCount(i);
    guard = sum;
}

@Test
public void testBitCount2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += bitCount(i);
    guard = sum;
}

/* Copied from
 * {@link Integer#bitCount}
 */
static int bitCount(int i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}
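Before comparing timings, it is worth checking that the copied routine computes exactly what the library method does, so that any difference can only come from how the JVM executes the call (the PrintInlining output for this example reports Integer.bitCount as an intrinsic). A quick sketch; the class and the agree helper are hypothetical.

```java
import java.util.Random;

// Sketch: cross-check the copied Hacker's Delight routine against Integer.bitCount.
class BitCountCheck {
    static int bitCount(int i) {
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;
    }

    // Compares both implementations on edge cases plus random samples.
    static boolean agree(int samples) {
        int[] edge = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 0x55555555, 0xAAAAAAAA};
        for (int v : edge) {
            if (bitCount(v) != Integer.bitCount(v)) return false;
        }
        Random rnd = new Random(0);
        for (int k = 0; k < samples; k++) {
            int v = rnd.nextInt();
            if (bitCount(v) != Integer.bitCount(v)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(1_000_000));
    }
}
```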


VM                 testBitCount1   testBitCount2
sun-1.6.0-20       0.43            0.43
sun-1.7.0-b80      0.43            0.43

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).

VM                 testBitCount1   testBitCount2
sun-1.6.0-20       0.08            0.33
sun-1.7.0-b83      0.07            0.32

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Windows 7, Intel Core i7 860).


... -XX:+PrintInlining ...

...
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Example06.testBitCount1: [measured 10 out of 15 rounds]
round: 0.07 [+- 0.00], round.gc: 0.00 [+- 0.00] ...

@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)

Example06.testBitCount2: [measured 10 out of 15 rounds]
round: 0.32 [+- 0.01], round.gc: 0.00 [+- 0.00] ...


... -XX:+PrintOptoAssembly ...

{method}
- klass: {other class}
- method holder: com/dawidweiss/geecon2010/Example06
- name: testBitCount1

...
0c2   B13: #  B12 B14 <- B8 B12  Loop: B13-B12 inner stride: ...
0c2   movl    R10, RDX          # spill
...
0e1   movl    [rsp + #40], R11  # spill
0e6   popcnt  R8, R8
...
0f5   addl    R9, #7            # int
0f9   popcnt  R11, R11
0fe   popcnt  RCX, R9


Conclusions

• Benchmarks must be statistically sound.
  → averages, variance, min, max, warm-up phase

• Account for HotSpot optimizations.

• Account for hardware differences.
  → test on target

• Use domain data and real scenarios.

• Inspect suspicious output with debug JVM.

See more: Cliff Click, http://java.sun.com/javaone/2009/articles/rockstar_click.jsp.
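A minimal sketch of statistically sound reporting, assuming rounds are timed with System.nanoTime: discard the warm-up rounds, then report the mean, standard deviation, minimum and maximum of the measured ones. RoundStats and measure are hypothetical names (not junit-benchmarks' API); the 5 + 10 round split mirrors the setup used in the result tables.

```java
// Sketch: summary statistics over measured rounds, warm-up rounds excluded.
class RoundStats {
    final double mean, stdDev, min, max;

    RoundStats(double[] seconds, int warmupRounds) {
        int n = seconds.length - warmupRounds;
        if (warmupRounds < 0 || n <= 0) throw new IllegalArgumentException("no measured rounds");
        double sum = 0, lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        for (int i = warmupRounds; i < seconds.length; i++) {
            sum += seconds[i];
            lo = Math.min(lo, seconds[i]);
            hi = Math.max(hi, seconds[i]);
        }
        double m = sum / n;
        double squares = 0;
        for (int i = warmupRounds; i < seconds.length; i++) {
            double d = seconds[i] - m;
            squares += d * d;
        }
        mean = m;
        stdDev = Math.sqrt(squares / n); // population standard deviation
        min = lo;
        max = hi;
    }

    @Override
    public String toString() {
        return String.format("%.3f [+- %.3f], min %.3f, max %.3f", mean, stdDev, min, max);
    }

    // Runs the task 'rounds' times and returns each round's wall time in seconds.
    static double[] measure(Runnable task, int rounds) {
        double[] t = new double[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            task.run();
            t[r] = (System.nanoTime() - start) / 1e9;
        }
        return t;
    }

    public static void main(String[] args) {
        int[] sink = new int[1];
        double[] times = measure(() -> {
            int s = 0;
            for (int i = 0; i < 10_000_000; i++) s += Integer.bitCount(i);
            sink[0] = s; // consume the result so the loop is not dead code
        }, 15);
        System.out.println(new RoundStats(times, 5));
    }
}
```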


HPPC: High Performance Primitive Collections


Motivation

• Primitive types: fast and memory-friendly.

• Optional assertions.

• Single-threaded. No fail-fast.

• Fast, fast, fast iterators, with no GC overhead.

• Open internals (explicit implementation).

• Programmers know what they’re doing.


Why not JCF?

public interface List<E> extends Collection<E> {
    boolean contains(Object o);  // [-] contract-enforced methods
    Iterator<E> iterator();      // [-] iterators over primitive types?
    Object[] toArray();          // [-] troublesome covariants
    ...
}


Friendly Competition

• fastutil

• PCJ

• GNU Trove

• Apache Mahout (ported COLT)

• Apache Primitive Collections

All of these have pros and cons and deal with JCF compatibility somehow.


Iterators in fastutil or PCJ

interface IntIterator extends Iterator<Integer> {
    // Primitive-specific method
    int nextInt();
}


Iterators in HPPC

public final class IntCursor {
    public int index;
    public int value;
}

public class IntArrayList implements Iterable<IntCursor> {
    public Iterator<IntCursor> iterator() { ... }
}
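The key property of the cursor design is that the iterator hands out the same mutable IntCursor on every call to next(), so a for-each loop creates no per-element objects. A minimal sketch of that idea; MiniIntList is a hypothetical class, not HPPC's IntArrayList, and callers must not keep references to the cursor between iterations.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class IntCursor {
    public int index;
    public int value;
}

// Hypothetical minimal growable int list with a cursor-based iterator.
class MiniIntList implements Iterable<IntCursor> {
    int[] buffer = new int[4];
    int size;

    void add(int v) {
        if (size == buffer.length) buffer = Arrays.copyOf(buffer, size * 2);
        buffer[size++] = v;
    }

    @Override
    public Iterator<IntCursor> iterator() {
        return new Iterator<IntCursor>() {
            private final IntCursor cursor = new IntCursor(); // one allocation per loop
            private int next = 0;

            @Override
            public boolean hasNext() { return next < size; }

            @Override
            public IntCursor next() {
                if (next >= size) throw new NoSuchElementException();
                cursor.index = next;
                cursor.value = buffer[next++];
                return cursor; // same object every time
            }
        };
    }

    public static void main(String[] args) {
        MiniIntList list = new MiniIntList();
        for (int i = 1; i <= 5; i++) list.add(i * 10);
        for (IntCursor c : list) System.out.println(c.index + ": " + c.value);
    }
}
```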


Iterating over list elements in HPPC

for (IntCursor c : list) {
    System.out.println(c.index + ": " + c.value);
}

...or

list.forEach(new IntProcedure() {
    public void apply(int value) {
        System.out.println(value);
    }
});

...or

final int [] buffer = list.buffer;
final int size = list.size();
for (int i = 0; i < size; i++) {
    System.out.println(i + ": " + buffer[i]);
}
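The two remaining styles can be sketched the same way. IntProc and IntBag are hypothetical stand-ins for HPPC's IntProcedure and list classes: forEach moves the loop into the collection and passes raw ints to the callback, while the third style walks the public backing array, which is valid only up to size because such an array may be larger than the number of elements.

```java
// Hypothetical primitive callback interface (in the spirit of IntProcedure).
interface IntProc {
    void apply(int value);
}

// Hypothetical minimal list exposing its backing array.
class IntBag {
    public int[] buffer;
    public int size;

    IntBag(int... values) {
        buffer = values.clone();
        size = values.length;
    }

    // The loop runs inside the collection; the callback receives raw ints (no boxing).
    void forEach(IntProc procedure) {
        final int[] b = buffer;
        for (int i = 0; i < size; i++) procedure.apply(b[i]);
    }

    public static void main(String[] args) {
        IntBag bag = new IntBag(3, 1, 4, 1, 5);

        bag.forEach(new IntProc() {
            public void apply(int value) {
                System.out.println(value);
            }
        });

        // Direct access: only indices [0, size) hold valid elements.
        final int[] buffer = bag.buffer;
        final int size = bag.size;
        for (int i = 0; i < size; i++) System.out.println(i + ": " + buffer[i]);
    }
}
```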


The fastest one?


What’s in HPPC?


Open implementation is good.


/**
 * Applies a supplemental hash function to a given
 * hashCode, which defends against poor quality
 * hash functions. [...]
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

HashMap rehashes your (carefully crafted) hash code.
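A small sketch of why the JDK 6-era HashMap applies this rehash: bucket indices are taken from the low bits (h & (length - 1)), so keys that differ only in their high bits would all collide in one bucket. hash() is copied from the slide; indexFor and bucketsUsed are illustrative helpers, not HashMap's code.

```java
import java.util.HashSet;
import java.util.Set;

class SupplementalHash {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // tableLength must be a power of two; only the low bits of h select the bucket.
    static int indexFor(int h, int tableLength) {
        return h & (tableLength - 1);
    }

    // Counts distinct buckets used by keys i << 16, whose low 16 bits are all zero.
    static int bucketsUsed(int keys, int tableLength, boolean rehash) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < keys; i++) {
            int h = i << 16;
            buckets.add(indexFor(rehash ? hash(h) : h, tableLength));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println("raw:      " + bucketsUsed(1024, 1024, false) + " bucket(s)");
        System.out.println("rehashed: " + bucketsUsed(1024, 1024, true) + " bucket(s)");
    }
}
```

The flip side, as the slide says, is that a carefully crafted hashCode() gets scrambled too, whether it needed it or not.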


HPPC approach (example):

public class LongIntOpenHashMap implements LongIntMap {
    // ...
    public LongIntOpenHashMap(int initialCapacity, float loadFactor,
            LongHashFunction keyHashFunction, IntHashFunction valueHashFunction) {
        // ...
    }
    // ...
}

Defaults: LongMurmurHash, IntHashFunction.
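The open alternative is to let the caller choose the hash function. A sketch under assumed names (IntHasher, IdentityHasher and MurmurMixHasher are hypothetical, not HPPC's actual types): an identity hash is cheapest when keys are already well spread, while a MurmurHash3-style 32-bit finalizer (fmix32) scrambles keys that differ only in high bits. Every step of that finalizer is invertible, so distinct keys always get distinct hashes.

```java
import java.util.HashSet;
import java.util.Set;

interface IntHasher {
    int hash(int key);
}

class IdentityHasher implements IntHasher {
    public int hash(int key) { return key; } // fine when keys are already well spread
}

class MurmurMixHasher implements IntHasher {
    public int hash(int h) { // MurmurHash3 fmix32 finalizer
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}

class HasherDemo {
    // capacity must be a power of two
    static int slot(IntHasher hasher, int key, int capacity) {
        return hasher.hash(key) & (capacity - 1);
    }

    public static void main(String[] args) {
        IntHasher[] hashers = {new IdentityHasher(), new MurmurMixHasher()};
        for (IntHasher hasher : hashers) {
            Set<Integer> slots = new HashSet<>();
            for (int i = 0; i < 64; i++) slots.add(slot(hasher, i << 16, 64));
            System.out.println(hasher.getClass().getSimpleName()
                    + ": " + slots.size() + " of 64 slots used");
        }
    }
}
```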


Example 7

Frequency count of character bigrams in a given text.


• HPPC:

final char [] CHARS = DATA;
final IntIntOpenHashMap counts = new IntIntOpenHashMap();
for (int i = 0; i < CHARS.length - 1; i++) {
    counts.putOrAdd((CHARS[i] << 16 | CHARS[i + 1]), 1, 1);
}

• JCF, boxed integer types.

final Integer currentCount = map.get(bigram);map.put(bigram, currentCount == null ? 1 : currentCount + 1);

• JCF, with IntHolder (mutable value object).

• GNU Trove

map.adjustOrPutValue(bigram, 1, 1);

• fastutil, OpenHashMap and LinkedOpenHashMap

map.put(bigram, map.get(bigram) + 1);

• PCJ, OpenHashMap and ChainedHashMap
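A runnable sketch contrasting the first two approaches. IntIntCounter is a hypothetical, minimal open-addressing map (linear probing, int keys and values in parallel arrays) whose putOrAdd mimics the HPPC call above; it is not HPPC's implementation. The boxed variant allocates Integer objects for every key and count outside the small-value Integer cache, which is the kind of spurious GC activity a primitive map avoids.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal primitive map: no Integer objects are created per update.
class IntIntCounter {
    private int[] keys = new int[16];
    private int[] values = new int[16];
    private boolean[] used = new boolean[16];
    private int size;

    // Stores putValue if key is absent; otherwise adds addValue to the stored value.
    void putOrAdd(int key, int putValue, int addValue) {
        if (4 * (size + 1) > 3 * keys.length) grow(); // keep the load factor <= 0.75
        int mask = keys.length - 1;
        int slot = mix(key) & mask;
        while (used[slot]) {
            if (keys[slot] == key) {
                values[slot] += addValue;
                return;
            }
            slot = (slot + 1) & mask; // linear probing
        }
        used[slot] = true;
        keys[slot] = key;
        values[slot] = putValue;
        size++;
    }

    int get(int key) { // 0 when the key is absent
        int mask = keys.length - 1;
        int slot = mix(key) & mask;
        while (used[slot]) {
            if (keys[slot] == key) return values[slot];
            slot = (slot + 1) & mask;
        }
        return 0;
    }

    int size() { return size; }

    private static int mix(int h) { // cheap scrambling so (c1 << 16 | c2) keys spread out
        h *= 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    private void grow() {
        int[] oldKeys = keys, oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) putOrAdd(oldKeys[i], oldValues[i], 0);
        }
    }
}

class BigramCount {
    static IntIntCounter countPrimitive(char[] chars) {
        IntIntCounter counts = new IntIntCounter();
        for (int i = 0; i < chars.length - 1; i++) {
            counts.putOrAdd(chars[i] << 16 | chars[i + 1], 1, 1);
        }
        return counts;
    }

    static Map<Integer, Integer> countBoxed(char[] chars) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < chars.length - 1; i++) {
            Integer bigram = chars[i] << 16 | chars[i + 1]; // boxed key
            Integer currentCount = map.get(bigram);
            map.put(bigram, currentCount == null ? 1 : currentCount + 1);
        }
        return map;
    }

    public static void main(String[] args) {
        char[] text = "to be or not to be".toCharArray();
        IntIntCounter primitive = countPrimitive(text);
        Map<Integer, Integer> boxed = countBoxed(text);
        int to = 't' << 16 | 'o';
        System.out.println("distinct bigrams: " + primitive.size() + " / " + boxed.size());
        System.out.println("\"to\": " + primitive.get(to) + " / " + boxed.get(to));
    }
}
```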


Is Java faster than C/C++? The short answer is: it depends.

— Cliff Click


Example 8

The same algorithm for building a DFSA automaton accepting a set of strings. Input: 3 565 575 strings, 158M of text.

        gcc -O2    java 1.6.0_20-64
real    63.850s    43.197s
user    63.110s    46.370s
sys      0.240s     0.840s


Summary and Conclusions


Performance checklist (sanity check)

• Algorithms, algorithms, algorithms.

• Proper data structures.

• Spurious GC activity.

• Memory barriers in tight loops.

• CPU cache utilization.

• Low-level, hotspot-specific code structuring.
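To illustrate the "memory barriers in tight loops" item, a hypothetical sketch: reading volatile fields inside a loop forces a fresh load on every iteration and restricts reordering (and may cost a fence instruction on some CPUs), while copying them into locals once lets the JIT keep the values in registers. The local-copy version is only correct when the loop does not need to observe concurrent updates.

```java
// Hypothetical example class; both methods compute the same sum.
class VolatileInLoop {
    private volatile int[] data;
    private volatile int scale;

    VolatileInLoop(int n, int scale) {
        int[] d = new int[n];
        for (int i = 0; i < n; i++) d[i] = i % 100;
        this.data = d;
        this.scale = scale;
    }

    // Reads 'data' (twice) and 'scale' from volatile fields on every iteration.
    int sumVolatileReads() {
        int sum = 0;
        for (int i = 0; i < data.length; i++) sum += data[i] * scale;
        return sum;
    }

    // Reads each volatile field once; the loop then works on locals only.
    int sumLocalCopies() {
        final int[] d = data;
        final int s = scale;
        int sum = 0;
        for (int i = 0; i < d.length; i++) sum += d[i] * s;
        return sum;
    }

    public static void main(String[] args) {
        VolatileInLoop v = new VolatileInLoop(1 << 20, 3);
        for (int round = 0; round < 10; round++) {
            long t0 = System.nanoTime();
            int a = v.sumVolatileReads();
            long t1 = System.nanoTime();
            int b = v.sumLocalCopies();
            long t2 = System.nanoTime();
            System.out.printf("volatile %.4fs, local %.4fs (equal: %b)%n",
                    (t1 - t0) / 1e9, (t2 - t1) / 1e9, a == b);
        }
    }
}
```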


HPPC and junit-benchmarks are at: http://labs.carrotsearch.com