So you want to write your own microbenchmark
DESCRIPTION
Performance has always been a major concern in software development and should not be taken lightly, even now that commodity computers have multicore CPUs and gigabytes of RAM. One of the handiest, simplest tools for performance testing is the microbenchmark. Unfortunately, developing correct Java microbenchmarks is a complex task with many pitfalls along the way. This presentation covers the do's and don'ts of Java microbenchmarking and the tools that are out there to help with this tricky task.
TRANSCRIPT
So you want to write your own microbenchmark
Dror Bereznitsky
December 18th, 2008
Agenda
• Introduction
• Java™ micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
Microbenchmark – simple definition
1. Start the clock
2. Run the code
3. Stop the clock
4. Report
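Those four steps map directly onto a few lines of Java. A minimal sketch (runTheCodeUnderTest() is a placeholder for whatever you are measuring; the rest of this talk is about why this naive harness is not enough):
long start = System.nanoTime(); // 1. start the clock
runTheCodeUnderTest(); // 2. run the code under test (placeholder method)
long durationNs = System.nanoTime() - start; // 3. stop the clock
System.out.format("Took %d ns %n", durationNs); // 4. report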
Better microbenchmark definition
• A small program
• Goal: measure something about a few lines of code
• All other variables should be removed
• Returns some kind of numeric result
Why do I need microbenchmarks?
• To discover something about my code:
• How fast is it?
• Calculate throughput – TPS, KB/s
• To measure the result of changing my code:
• Should I replace a HashMap with a TreeMap? (sketched below)
• What is the cost of synchronizing a method?
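For instance, the HashMap-versus-TreeMap question might be approached with a sketch like the following (class and method names are mine, and it is deliberately naive – every pitfall discussed later in this talk applies to it as written):
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapLookupBenchmark {
    static long timeLookups(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put(i, i); // populate the map under test
        }
        long sum = 0; // consume the lookups so they are not dead code
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sum += map.get(i);
        }
        long duration = System.nanoTime() - start;
        System.out.println(map.getClass().getSimpleName() + " checksum: " + sum);
        return duration / 1000000; // ns -> ms
    }

    public static void main(String[] args) {
        int n = 1000 * 1000;
        System.out.format("HashMap: %d (ms) %n", timeLookups(new HashMap<Integer, Integer>(), n));
        System.out.format("TreeMap: %d (ms) %n", timeLookups(new TreeMap<Integer, Integer>(), n));
    }
}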
Why are you talking about this?
• It’s hard to write a robust microbenchmark
• It’s even harder to do it in Java™
• There are not enough Java microbenchmarking tools
• There are too many flawed microbenchmarks out there
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
A microbenchmark story: the problem
The boss asks you to solve a performance issue in one of the components
Blah, blah …
A microbenchmark story: the cause
You find out that the cause is excessive use of Math.sqrt()
A microbenchmark story: a solution?
• You decide to develop a state-of-the-art square root approximation
• After developing it, you want to benchmark it against the java.lang.Math implementation
SQRT approximation microbenchmark
Let’s run this little piece of code in a loop and see what happens …
public static void main(String[] args) {
    long start = System.currentTimeMillis(); // start the clock
    for (double i = 0; i < 10 * 1000 * 1000; i++) {
        mySqrt(i); // little piece of code: the new approximation (defined elsewhere)
    }
    long end = System.currentTimeMillis(); // stop the clock
    long duration = end - start;
    System.out.format("Test duration: %d (ms) %n", duration);
}
SQRT microbenchmark results
Test duration: 0 (ms)
Wow, this is really fast!
Flawed microbenchmark
SQRT microbenchmark: what’s wrong?
The Java™ HotSpot virtual machine:
• Dynamic optimizations
• On-stack replacement
• Dynamic compilation
• Dead code elimination
• Classloading
• Garbage collection
The HotSpot: a mixed-mode system
1. Code is interpreted
2. Profiling
3. Dynamic compilation
4. Stuff happens
5. Interpreted again or recompiled
Dynamic compilation
• Dynamic compilation is unpredictable
• Don’t know when the compiler will run
• Don’t know how long the compiler will run
• Same code may be compiled more than once
• The JVM can switch to compiled code at will
Dynamic compilation cont.
• Dynamic compilation can seriously influence microbenchmark results:
Interpreted execution + dynamic compilation + compiled code execution
≠ pure compiled (or interpreted) code execution
Continuous recompilation ≠ steady state
Dynamic optimizations
• The HotSpot server compiler performs a large variety of optimizations:
• loop unrolling
• range check elimination
• dead-code elimination
• code hoisting …
Code hoisting?
Did he just say “code hoisting”?
What the heck is code hoisting?
• Hoist = to raise or lift
• A size optimization
• Eliminates duplicated pieces of code in method bodies by hoisting expressions or statements
Code hoisting example
(from Optimizing Java for Size: Compiler Techniques for Code Compaction, Samuli Heilala)
Before: a + b is a busy expression
After hoisting the expression a + b, a new local variable t has been introduced
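A rough Java rendering of that figure (my own code; the method names are made up) – the compiler replaces the duplicated a + b with a single temporary:
// Before: a + b is computed on both paths (a "busy expression")
static int before(int a, int b, boolean flag) {
    if (flag) {
        return (a + b) * 2;
    }
    return (a + b) - 1;
}

// After hoisting: a + b is computed once, into the new local variable t
static int after(int a, int b, boolean flag) {
    int t = a + b;
    if (flag) {
        return t * 2;
    }
    return t - 1;
}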
Dynamic optimizations cont.
• Most of the optimizations are performed at runtime
• Profiling data is used by the compiler to improve optimization decisions
• You don’t have access to the dynamically compiled code
Example: Very fast square root?
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
10,000,000 calls to Math.sqrt() ~ 4 ms
Example: not so fast?
A single line of code added:
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    System.out.format("Result: %d %n", result); // the added line
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
Now it takes ~ 2000 ms ?!?
DCE – Dead Code Elimination
• Dead code – code that has no effect on the outcome of the program execution
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) { // dead code: result is never used,
        result += Math.sqrt(i);                  // so the JIT may remove the whole loop
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
OSR – On-Stack Replacement
• Methods are HOT if they cumulatively execute more than 10,000 loop iterations
• Older JVM versions did not switch to the compiled version until the method exited and was re-entered
• OSR – switching from interpreted to compiled code in the middle of a loop
OSR and microbenchmarking
• OSR’d code may be less performant
• Some optimizations are not performed
• OSR usually happens when you put everything into one long method
• Developers tend to write long main() methods when benchmarking
• Real-life applications are hopefully divided into more fine-grained methods (see the sketch below)
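A sketch of the structure that avoids OSR (class and method names are mine): keep the measured loop in a small method that gets invoked many times, so it becomes hot through invocation counts and is compiled as a whole method rather than OSR’d mid-loop.
public class SqrtBenchmark {
    // The measured loop lives in its own method
    private static double runBatch(int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        double result = 0;
        long start = System.nanoTime();
        for (int batch = 0; batch < 1000; batch++) {
            result += runBatch(10 * 1000); // many short invocations instead of one huge loop
        }
        long duration = (System.nanoTime() - start) / 1000000;
        System.out.format("Result: %f, test duration: %d (ms) %n", result, duration);
    }
}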
Classloading
• Classes are usually loaded only when they are first used
• Class loading takes time:
• I/O
• Parsing
• Verification
• May skew your benchmark results
Garbage Collection
• The JVM automatically reclaims resources by:
• Garbage collection
• Object finalization
• Outside of the developer’s control
• Unpredictable
• Should be measured if invoked as a result of the benchmarked code
Time measurement
How long is one millisecond?
public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    Thread.sleep(1);
    final long end = System.currentTimeMillis();
    final long duration = (end - start);
    System.out.format("Test duration: %d (ms) %n", duration);
}
Test duration: 16 (ms)
System.currentTimeMillis()
• Accuracy varies with platform (a way to check yours is sketched below):
Platform | Resolution | Source
Linux – 2.6 kernel | 1 ms | Markus Kobler
Mac OS X | 1 ms | Java Glossary
Windows NT, 2K, XP, 2003 | 10 – 15 ms | David Holmes
Windows 95/98 | 55 ms | Java Glossary
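A sketch (my code) that estimates the granularity on the current platform: spin until currentTimeMillis() ticks over twice and report the gap between ticks.
public class TimerGranularity {
    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        long t1;
        do {
            t1 = System.currentTimeMillis(); // align to a tick boundary
        } while (t1 == t0);
        long t2;
        do {
            t2 = System.currentTimeMillis(); // wait for the next tick
        } while (t2 == t1);
        System.out.format("Observed granularity: ~%d (ms) %n", t2 - t1);
    }
}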
Wrong target platform
• Choosing the wrong platform for your microbenchmark:
• Benchmarking on Windows when your target platform is Linux
• Benchmarking a highly threaded application on a single-core machine
• Benchmarking on a Sun JVM when the target platform is Oracle (BEA) JRockit
Caching
• Hardware – CPU caching
• Operating system – file system caching
• Database – query caching
Caching: CPU L1 and L2 caches
• The farther the accessed data is from the CPU, the higher the access latency
• The size of the dataset affects access cost (a way to observe this is sketched below)
[Table: jcachev2 results for Intel® Core™2 Duo T8300, L1 = 32 KB, L2 = 3 MB – per-access cost rises from roughly 9.8 ns for a 16 KB array to roughly 136.4 ns for an 8192 KB array]
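A sketch of how one might observe this effect directly (my code, not jcachev2; absolute numbers will vary, and the random-index generation adds its own overhead): random reads from a cache-sized array versus a much larger one.
import java.util.Random;

public class CacheEffect {
    static long probe(int sizeInInts, int accesses) {
        int[] data = new int[sizeInInts];
        Random rnd = new Random(42);
        long sum = 0; // consume the reads so they are not dead code
        long start = System.nanoTime();
        for (int i = 0; i < accesses; i++) {
            sum += data[rnd.nextInt(sizeInInts)];
        }
        long ns = System.nanoTime() - start;
        System.out.println("checksum: " + sum);
        return ns / accesses; // rough cost per access
    }

    public static void main(String[] args) {
        System.out.format("16 KB array: ~%d ns/access %n", probe(4 * 1024, 10 * 1000 * 1000));
        System.out.format("8 MB array: ~%d ns/access %n", probe(2 * 1024 * 1024, 10 * 1000 * 1000));
    }
}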
Busy environment
• Running in a busy environment – contention for CPU, I/O, and memory skews results
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
Warm up your code
Warm up your code
• Let the JVM reach a steady-state execution profile before you start benchmarking
• All classes should be loaded before benchmarking
• Usually executing your code for ~10 seconds should be enough (see the sketch below)
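A minimal warm-up sketch, assuming the code under test is Math.sqrt() as in the earlier examples (the 10-second figure is the rule of thumb from the slide):
public static void main(String[] args) {
    double sink = 0; // consume every result so the warm-up is not dead code
    long warmupEnd = System.currentTimeMillis() + 10 * 1000;
    while (System.currentTimeMillis() < warmupEnd) {
        sink += Math.sqrt(sink + 1); // exercise the code under test for ~10 seconds
    }
    // the measured run now executes compiled, optimized code
    double result = 0;
    long start = System.nanoTime();
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Result: %f (sink %f), test duration: %d (ms) %n", result, sink, duration);
}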
Warm up your code – cont.
• Detect JIT compilations by using:
• CompilationMXBean.getTotalCompilationTime()
• -XX:+PrintCompilation
• Measure classloading time:
• Use the ClassLoadingMXBean (sketched after the next slide)
CompilationMXBean usage
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

long compilationTimeTotal = 0;
CompilationMXBean compBean = ManagementFactory.getCompilationMXBean();
if (compBean.isCompilationTimeMonitoringSupported()) {
    // cumulative time (ms) spent in JIT compilation so far
    compilationTimeTotal = compBean.getTotalCompilationTime();
}
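The ClassLoadingMXBean mentioned on the previous slide can be used the same way; a sketch (my code) that checks whether class loading has settled down after the warm-up:
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

ClassLoadingMXBean classBean = ManagementFactory.getClassLoadingMXBean();
long loadedBefore = classBean.getTotalLoadedClassCount();
// ... run the warm-up ...
long loadedDuring = classBean.getTotalLoadedClassCount() - loadedBefore;
if (loadedDuring > 0) {
    System.out.format("%d classes loaded during warm-up; warm up longer %n", loadedDuring);
}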
Dynamic optimizations
• Avoid on-stack replacement:
• Don’t put all your benchmark code in one big main() method
• Avoid dead code elimination:
• Print the final result (or otherwise consume it – see the sketch below)
• Report unreasonable speedups
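When printing inside the timed section is undesirable, one alternative (my own pattern, not from the slides) is to write the result to a volatile field the JIT cannot prove unused:
public class Sink {
    public static volatile double SINK; // the JIT must preserve writes to this field

    public static void main(String[] args) {
        long start = System.nanoTime();
        double result = 0;
        for (int i = 0; i < 10 * 1000 * 1000; i++) {
            result += Math.sqrt(i);
        }
        SINK = result; // consumes the loop's result without I/O in the timed section
        long duration = (System.nanoTime() - start) / 1000000;
        System.out.format("Test duration: %d (ms) %n", duration);
    }
}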
Garbage Collection
• Measure garbage collection time (see the sketch below)
• Force garbage collection and finalization before benchmarking
• Perform enough iterations to reach garbage collection steady state
• Gather GC stats: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
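A sketch of the first two bullets (my code), using the GC beans from java.lang.management – note that System.gc() is only a hint, and getCollectionTime() may return -1 if unsupported:
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcAwareTiming {
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // cumulative ms; -1 if unsupported
        }
        return total;
    }

    public static void main(String[] args) {
        System.gc(); // best effort: a hint, not a guarantee
        System.runFinalization();
        long gcBefore = totalGcTimeMillis();
        // ... the timed benchmark run goes here ...
        long gcDuring = totalGcTimeMillis() - gcBefore;
        System.out.format("GC time during measurement: %d (ms) %n", gcDuring);
    }
}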
Time measurement
• Use System.nanoTime()
• Microsecond accuracy on modern operating systems and hardware
• Never worse than currentTimeMillis()
• Notice, Windows users: the call itself can take microseconds to execute
• Don’t overuse it!
JVM configuration
• Use JVM options similar to your target environment:
• -server or -client JVM
• Enough heap space (-Xmx)
• Garbage collection options
• Thread stack size (-Xss)
• JIT compilation options
Other issues
• Use fixed-size data sets
• Too-large data sets can cause L1 cache blowout
• Notice system load
• Don’t play GTA while benchmarking!
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
Java™ benchmarking tools
• Various specialized benchmarks:
• SPECjAppServer®
• SPECjvm™
• CaffeineMark 3.0™
• SciMark 2.0
• Only a few benchmarking frameworks
Japex Micro-Benchmark framework
• Similar in spirit to JUnit
• Measures throughput – work over time:
• Transactions per second (default)
• KB per second
• XML-based configuration
• XML/HTML reports
Japex: Drivers
• A driver encapsulates knowledge about a specific algorithm implementation
• Must extend JapexDriverBase
public interface JapexDriver extends Runnable {
    public void initializeDriver();
    public void prepare(TestCase testCase);
    public void warmup(TestCase testCase);
    public void run(TestCase testCase);
    public void finish(TestCase testCase);
    public void terminateDriver();
}
Japex: Writing your own driver
public class SqrtNewtonApproxDriver extends JapexDriverBase {
    private long tmp;
    …
    @Override
    public void warmup(TestCase testCase) {
        tmp += sqrt(getNextRandomNumber());
    }
    …
}
Japex: Test suite
<testSuite name="SQRT Test Suite"
    xmlns="http://www.sun.com/japex/testSuite" …>
  <param name="libraryDir" value="C:/java/japex/lib"/>
  <param name="japex.classPath" value="./target/classes"/>
  <param name="japex.runIterations" value="1000000"/>
  <driver name="SqrtApproxNewtonDriver">
    <param name="Description" value="Newton Driver"/>
    <param name="japex.driverClass"
        value="com.alphacsp.javaedge.benchmark.japex.driver.SqrtNewtonApproxDriver"/>
  </driver>
  <testCase name="testcase1"/>
</testSuite>
Japex: HTML Reports
Japex: more chart types
• Scatter chart
• Line chart
Japex: pros and cons
• Pros:
• Similar to JUnit
• Nice HTML reports
• Cons:
• Last stable release dates from March 2007
• HotSpot issues are not handled
• XML configuration
Brent Boyer’s Benchmark framework
• Part of the “Robust Java benchmarking” article by Brent Boyer
• Automates as many aspects as possible:
• Resource reclamation
• Class loading
• Dead code elimination
• Statistics
Benchmark framework example
Benchmark.Params params = new Benchmark.Params(true);
params.setExecutionTimeGoal(0.5); // target execution time (s) per measurement
params.setNumberMeasurements(50); // number of measurements to collect
Runnable task = new Runnable() {
    public void run() {
        sqrt(getNextRandomNumber()); // the code under test
    }
};
Benchmark benchmark = new Benchmark(task, params);
System.out.println(benchmark.toString()); // prints the single-line summary
Benchmark single-line summary
Benchmark output:
first = 25.702 us,
mean = 91.070 ns (CI deltas: -115.591 ps, +171.423 ps)
sd = 1.451 us (CI deltas: -461.523 ns, +676.964 ns)
WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE
Outlier and serial correlation issues
• Records outlier and serial correlation issues
• Outliers indicate that a major measurement error happened
• Large outliers - some other activity started on the computer during measurement
• Small outliers might hint that DCE occurred
• Serial correlation indicates that the JVM has not reached its steady-state performance profile
Benchmark: pros and cons
• Pros:
• Handles HotSpot-related issues
• Detailed statistics
• Cons:
• Each run takes a lot of time
• Not a formal project
• Lacks documentation
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
Summary 1
• Microbenchmarking is hard when it comes to Java™
• Define what you want to measure and how you want to do it; pick your goals
• Know what you are doing
• Always warm up your code
• Handle DCE, OSR, and GC issues
• Use fixed-size data sets and fixed work
Summary 2
• Do not rely solely on microbenchmark results
• Sanity-check results
• Use a profiler
• Test your code in real-life scenarios under realistic load (macro-benchmark)
Summary: resources
• http://www.ibm.com/developerworks/java/library/j-benchmark1.html
• http://www.azulsystems.com/events/javaone_2002/microbenchmarks.pdf
• https://japex.dev.java.net/
• http://www.ibm.com/developerworks/java/library/j-jtp12214/
• http://www.dei.unipd.it/~bertasi/jcache/
Thank You!