So you want to write your own microbenchmark
DESCRIPTION
Performance has always been a major concern in software development and should not be taken lightly, even now that commodity computers have multicore CPUs and gigabytes of RAM. One of the handiest, simplest tools for performance testing is the microbenchmark. Unfortunately, developing correct Java microbenchmarks is a complex task with many pitfalls along the way. This presentation covers the do's and don'ts of Java microbenchmarking and the tools that are out there to help with this tricky task.
TRANSCRIPT
So you want to write your own microbenchmark
Dror Bereznitsky
December 18th, 2008
Agenda
• Introduction
• Java™ micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
Microbenchmark – simple definition
1. Start the clock
2. Run the code
3. Stop the clock
4. Report
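Those four steps map directly onto a few lines of Java. A minimal sketch (runTheCodeUnderTest() is a placeholder for whatever you are measuring; the rest of this talk is about why this naive harness is not enough):
long start = System.nanoTime(); // 1. start the clock
runTheCodeUnderTest(); // 2. run the code under test (placeholder method)
long durationNs = System.nanoTime() - start; // 3. stop the clock
System.out.format("Took %d ns %n", durationNs); // 4. report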
Better microbenchmark definition
• A small program
• Goal: measure something about a few lines of code
• All other variables should be removed
• Returns some kind of numeric result
Why do I need microbenchmarks?
• To discover something about my code:
• How fast is it?
• Calculate throughput – TPS, KB/s
• To measure the result of changing my code:
• Should I replace a HashMap with a TreeMap? (sketched below)
• What is the cost of synchronizing a method?
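For instance, the HashMap-versus-TreeMap question might be approached with a sketch like the following (class and method names are mine, and it is deliberately naive – every pitfall discussed later in this talk applies to it as written):
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapLookupBenchmark {
    static long timeLookups(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put(i, i); // populate the map under test
        }
        long sum = 0; // consume the lookups so they are not dead code
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sum += map.get(i);
        }
        long duration = System.nanoTime() - start;
        System.out.println(map.getClass().getSimpleName() + " checksum: " + sum);
        return duration / 1000000; // ns -> ms
    }

    public static void main(String[] args) {
        int n = 1000 * 1000;
        System.out.format("HashMap: %d (ms) %n", timeLookups(new HashMap<Integer, Integer>(), n));
        System.out.format("TreeMap: %d (ms) %n", timeLookups(new TreeMap<Integer, Integer>(), n));
    }
}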
Why are you talking about this?
• It’s hard to write a robust microbenchmark
• It’s even harder to do it in Java™
• There are not enough Java microbenchmarking tools
• There are too many flawed microbenchmarks out there
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
A microbenchmark story: the problem
The boss asks you to solve a performance issue in one of the components
Blah, blah …
A microbenchmark story: the cause
You find out that the cause is excessive use of Math.sqrt()
A microbenchmark story: a solution?
• You decide to develop a state-of-the-art square root approximation
• After developing it, you want to benchmark it against the java.lang.Math implementation
SQRT approximation microbenchmark
Let’s run this little piece of code in a loop and see what happens …
public static void main(String[] args) {
    long start = System.currentTimeMillis(); // start the clock
    for (double i = 0; i < 10 * 1000 * 1000; i++) {
        mySqrt(i); // little piece of code: the new approximation (defined elsewhere)
    }
    long end = System.currentTimeMillis(); // stop the clock
    long duration = end - start;
    System.out.format("Test duration: %d (ms) %n", duration);
}
SQRT microbenchmark results
Test duration: 0 (ms)
Wow, this is really fast!
Flawed microbenchmark
SQRT microbenchmark: what’s wrong?
The Java™ HotSpot virtual machine:
• Dynamic optimizations
• On-stack replacement
• Dynamic compilation
• Dead code elimination
• Classloading
• Garbage collection
The HotSpot: a mixed-mode system
1. Code is interpreted
2. Profiling
3. Dynamic compilation
4. Stuff happens
5. Interpreted again or recompiled
Dynamic compilation
• Dynamic compilation is unpredictable
• Don’t know when the compiler will run
• Don’t know how long the compiler will run
• Same code may be compiled more than once
• The JVM can switch to compiled code at will
Dynamic compilation cont.
• Dynamic compilation can seriously influence microbenchmark results:
Interpreted execution + dynamic compilation + compiled code execution
≠ pure compiled (or interpreted) code execution
Continuous recompilation ≠ steady state
Dynamic optimizations
• The HotSpot server compiler performs a large variety of optimizations:
• loop unrolling
• range check elimination
• dead-code elimination
• code hoisting …
Code hoisting?
Did he just say “code hoisting”?
What the heck is code hoisting?
• Hoist = to raise or lift
• A size optimization
• Eliminates duplicated pieces of code in method bodies by hoisting expressions or statements
Code hoisting example
(from Optimizing Java for Size: Compiler Techniques for Code Compaction, Samuli Heilala)
Before: a + b is a busy expression
After hoisting the expression a + b, a new local variable t has been introduced
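A rough Java rendering of that figure (my own code; the method names are made up) – the compiler replaces the duplicated a + b with a single temporary:
// Before: a + b is computed on both paths (a "busy expression")
static int before(int a, int b, boolean flag) {
    if (flag) {
        return (a + b) * 2;
    }
    return (a + b) - 1;
}

// After hoisting: a + b is computed once, into the new local variable t
static int after(int a, int b, boolean flag) {
    int t = a + b;
    if (flag) {
        return t * 2;
    }
    return t - 1;
}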
Dynamic optimizations cont.
• Most of the optimizations are performed at runtime
• Profiling data is used by the compiler to improve optimization decisions
• You don’t have access to the dynamically compiled code
Example: Very fast square root?
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
10,000,000 calls to Math.sqrt() ~ 4 ms
Example: not so fast?
A single line of code added:
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    System.out.format("Result: %d %n", result); // the added line
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
Now it takes ~ 2000 ms ?!?
DCE – Dead Code Elimination
• Dead code – code that has no effect on the outcome of the program execution
public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    for (int i = 0; i < 10 * 1000 * 1000; i++) { // dead code: result is never used,
        result += Math.sqrt(i);                  // so the JIT may remove the whole loop
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
OSR – On-Stack Replacement
• Methods are HOT if they cumulatively execute more than 10,000 loop iterations
• Older JVM versions did not switch to the compiled version until the method exited and was re-entered
• OSR – switching from interpreted to compiled code in the middle of a loop
OSR and microbenchmarking
• OSR’d code may be less performant
• Some optimizations are not performed
• OSR usually happens when you put everything into one long method
• Developers tend to write long main() methods when benchmarking
• Real-life applications are hopefully divided into more fine-grained methods (see the sketch below)
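A sketch of the structure that avoids OSR (class and method names are mine): keep the measured loop in a small method that gets invoked many times, so it becomes hot through invocation counts and is compiled as a whole method rather than OSR’d mid-loop.
public class SqrtBenchmark {
    // The measured loop lives in its own method
    private static double runBatch(int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        double result = 0;
        long start = System.nanoTime();
        for (int batch = 0; batch < 1000; batch++) {
            result += runBatch(10 * 1000); // many short invocations instead of one huge loop
        }
        long duration = (System.nanoTime() - start) / 1000000;
        System.out.format("Result: %f, test duration: %d (ms) %n", result, duration);
    }
}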
Classloading
• Classes are usually loaded only when they are first used
• Class loading takes time:
• I/O
• Parsing
• Verification
• May skew your benchmark results
Garbage Collection
• The JVM automatically reclaims resources by:
• Garbage collection
• Object finalization
• Outside of the developer’s control
• Unpredictable
• Should be measured if invoked as a result of the benchmarked code
Time measurement
How long is one millisecond?
public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    Thread.sleep(1);
    final long end = System.currentTimeMillis();
    final long duration = (end - start);
    System.out.format("Test duration: %d (ms) %n", duration);
}
Test duration: 16 (ms)
System.currentTimeMillis()
• Accuracy varies with platform (a way to check yours is sketched below):
Platform | Resolution | Source
Linux – 2.6 kernel | 1 ms | Markus Kobler
Mac OS X | 1 ms | Java Glossary
Windows NT, 2K, XP, 2003 | 10 – 15 ms | David Holmes
Windows 95/98 | 55 ms | Java Glossary
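A sketch (my code) that estimates the granularity on the current platform: spin until currentTimeMillis() ticks over twice and report the gap between ticks.
public class TimerGranularity {
    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        long t1;
        do {
            t1 = System.currentTimeMillis(); // align to a tick boundary
        } while (t1 == t0);
        long t2;
        do {
            t2 = System.currentTimeMillis(); // wait for the next tick
        } while (t2 == t1);
        System.out.format("Observed granularity: ~%d (ms) %n", t2 - t1);
    }
}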
Wrong target platform
• Choosing the wrong platform for your microbenchmark:
• Benchmarking on Windows when your target platform is Linux
• Benchmarking a highly threaded application on a single-core machine
• Benchmarking on a Sun JVM when the target platform is Oracle (BEA) JRockit
Caching
• Hardware – CPU caching
• Operating system – file system caching
• Database – query caching
Caching: CPU L1 and L2 caches
• The farther the accessed data is from the CPU, the higher the access latency
• The size of the dataset affects access cost (a way to observe this is sketched below)
[Table: jcachev2 results for Intel® Core™2 Duo T8300, L1 = 32 KB, L2 = 3 MB – per-access cost rises from roughly 9.8 ns for a 16 KB array to roughly 136.4 ns for an 8192 KB array]
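A sketch of how one might observe this effect directly (my code, not jcachev2; absolute numbers will vary, and the random-index generation adds its own overhead): random reads from a cache-sized array versus a much larger one.
import java.util.Random;

public class CacheEffect {
    static long probe(int sizeInInts, int accesses) {
        int[] data = new int[sizeInInts];
        Random rnd = new Random(42);
        long sum = 0; // consume the reads so they are not dead code
        long start = System.nanoTime();
        for (int i = 0; i < accesses; i++) {
            sum += data[rnd.nextInt(sizeInInts)];
        }
        long ns = System.nanoTime() - start;
        System.out.println("checksum: " + sum);
        return ns / accesses; // rough cost per access
    }

    public static void main(String[] args) {
        System.out.format("16 KB array: ~%d ns/access %n", probe(4 * 1024, 10 * 1000 * 1000));
        System.out.format("8 MB array: ~%d ns/access %n", probe(2 * 1024 * 1024, 10 * 1000 * 1000));
    }
}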
Busy environment
• Running in a busy environment – contention for CPU, I/O, and memory skews results
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
Warm up your code
Warm up your code
• Let the JVM reach a steady-state execution profile before you start benchmarking
• All classes should be loaded before benchmarking
• Usually executing your code for ~10 seconds should be enough (see the sketch below)
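A minimal warm-up sketch, assuming the code under test is Math.sqrt() as in the earlier examples (the 10-second figure is the rule of thumb from the slide):
public static void main(String[] args) {
    double sink = 0; // consume every result so the warm-up is not dead code
    long warmupEnd = System.currentTimeMillis() + 10 * 1000;
    while (System.currentTimeMillis() < warmupEnd) {
        sink += Math.sqrt(sink + 1); // exercise the code under test for ~10 seconds
    }
    // the measured run now executes compiled, optimized code
    double result = 0;
    long start = System.nanoTime();
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Result: %f (sink %f), test duration: %d (ms) %n", result, sink, duration);
}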
Warm up your code – cont.
• Detect JIT compilations by using:
• CompilationMXBean.getTotalCompilationTime()
• -XX:+PrintCompilation
• Measure classloading time:
• Use the ClassLoadingMXBean (sketched after the next slide)
CompilationMXBean usage
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

long compilationTimeTotal = 0;
CompilationMXBean compBean = ManagementFactory.getCompilationMXBean();
if (compBean.isCompilationTimeMonitoringSupported()) {
    // cumulative time (ms) spent in JIT compilation so far
    compilationTimeTotal = compBean.getTotalCompilationTime();
}
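The ClassLoadingMXBean mentioned on the previous slide can be used the same way; a sketch (my code) that checks whether class loading has settled down after the warm-up:
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

ClassLoadingMXBean classBean = ManagementFactory.getClassLoadingMXBean();
long loadedBefore = classBean.getTotalLoadedClassCount();
// ... run the warm-up ...
long loadedDuring = classBean.getTotalLoadedClassCount() - loadedBefore;
if (loadedDuring > 0) {
    System.out.format("%d classes loaded during warm-up; warm up longer %n", loadedDuring);
}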
Dynamic optimizations
• Avoid on-stack replacement:
• Don’t put all your benchmark code in one big main() method
• Avoid dead code elimination:
• Print the final result (or otherwise consume it – see the sketch below)
• Report unreasonable speedups
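When printing inside the timed section is undesirable, one alternative (my own pattern, not from the slides) is to write the result to a volatile field the JIT cannot prove unused:
public class Sink {
    public static volatile double SINK; // the JIT must preserve writes to this field

    public static void main(String[] args) {
        long start = System.nanoTime();
        double result = 0;
        for (int i = 0; i < 10 * 1000 * 1000; i++) {
            result += Math.sqrt(i);
        }
        SINK = result; // consumes the loop's result without I/O in the timed section
        long duration = (System.nanoTime() - start) / 1000000;
        System.out.format("Test duration: %d (ms) %n", duration);
    }
}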
Garbage Collection
• Measure garbage collection time (see the sketch below)
• Force garbage collection and finalization before benchmarking
• Perform enough iterations to reach garbage collection steady state
• Gather GC stats: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
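A sketch of the first two bullets (my code), using the GC beans from java.lang.management – note that System.gc() is only a hint, and getCollectionTime() may return -1 if unsupported:
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcAwareTiming {
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // cumulative ms; -1 if unsupported
        }
        return total;
    }

    public static void main(String[] args) {
        System.gc(); // best effort: a hint, not a guarantee
        System.runFinalization();
        long gcBefore = totalGcTimeMillis();
        // ... the timed benchmark run goes here ...
        long gcDuring = totalGcTimeMillis() - gcBefore;
        System.out.format("GC time during measurement: %d (ms) %n", gcDuring);
    }
}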
Time measurement
• Use System.nanoTime()
• Microsecond accuracy on modern operating systems and hardware
• Never worse than currentTimeMillis()
• Notice, Windows users: the call itself can take microseconds to execute
• Don’t overuse it!
JVM configuration
• Use JVM options similar to your target environment:
• -server or -client JVM
• Enough heap space (-Xmx)
• Garbage collection options
• Thread stack size (-Xss)
• JIT compilation options
Other issues
• Use fixed-size data sets
• Too-large data sets can cause L1 cache blowout
• Notice system load
• Don’t play GTA while benchmarking!
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
Java™ benchmarking tools
• Various specialized benchmarks:
• SPECjAppServer®
• SPECjvm™
• CaffeineMark 3.0™
• SciMark 2.0
• Only a few benchmarking frameworks
Japex Micro-Benchmark framework
• Similar in spirit to JUnit
• Measures throughput – work over time:
• Transactions per second (default)
• KB per second
• XML-based configuration
• XML/HTML reports
Japex: Drivers
• A driver encapsulates knowledge about a specific algorithm implementation
• Must extend JapexDriverBase
public interface JapexDriver extends Runnable {
    public void initializeDriver();
    public void prepare(TestCase testCase);
    public void warmup(TestCase testCase);
    public void run(TestCase testCase);
    public void finish(TestCase testCase);
    public void terminateDriver();
}
Japex: Writing your own driver
public class SqrtNewtonApproxDriver extends JapexDriverBase {
    private long tmp;
    …
    @Override
    public void warmup(TestCase testCase) {
        tmp += sqrt(getNextRandomNumber());
    }
    …
}
Japex: Test suite
<testSuite name="SQRT Test Suite"
    xmlns="http://www.sun.com/japex/testSuite" …>
  <param name="libraryDir" value="C:/java/japex/lib"/>
  <param name="japex.classPath" value="./target/classes"/>
  <param name="japex.runIterations" value="1000000"/>
  <driver name="SqrtApproxNewtonDriver">
    <param name="Description" value="Newton Driver"/>
    <param name="japex.driverClass"
        value="com.alphacsp.javaedge.benchmark.japex.driver.SqrtNewtonApproxDriver"/>
  </driver>
  <testCase name="testcase1"/>
</testSuite>
Japex: HTML Reports
Japex: more chart types
• Scatter chart
• Line chart
Japex: pros and cons
• Pros:
• Similar to JUnit
• Nice HTML reports
• Cons:
• Last stable release dates from March 2007
• HotSpot issues are not handled
• XML configuration
Brent Boyer’s Benchmark framework
• Part of the “Robust Java benchmarking” article by Brent Boyer
• Automates as many aspects as possible:
• Resource reclamation
• Class loading
• Dead code elimination
• Statistics
Benchmark framework example
Benchmark.Params params = new Benchmark.Params(true);
params.setExecutionTimeGoal(0.5); // target execution time (s) per measurement
params.setNumberMeasurements(50); // number of measurements to collect
Runnable task = new Runnable() {
    public void run() {
        sqrt(getNextRandomNumber()); // the code under test
    }
};
Benchmark benchmark = new Benchmark(task, params);
System.out.println(benchmark.toString()); // prints the single-line summary
Benchmark single-line summary
Benchmark output:
first = 25.702 us,
mean = 91.070 ns (CI deltas: -115.591 ps, +171.423 ps)
sd = 1.451 us (CI deltas: -461.523 ns, +676.964 ns)
WARNING: execution times have mild outliers, SD VALUES MAY BE INACCURATE
Outlier and serial correlation issues
• Records outlier and serial correlation issues
• Outliers indicate that a major measurement error happened
• Large outliers - some other activity started on the computer during measurement
• Small outliers might hint that DCE occurred
• Serial correlation indicates that the JVM has not reached its steady-state performance profile
Benchmark: pros and cons
• Pros:
• Handles HotSpot-related issues
• Detailed statistics
• Cons:
• Each run takes a lot of time
• Not a formal project
• Lacks documentation
Agenda
• Introduction
• Java micro benchmarking pitfalls
• Writing your own benchmark
• Micro benchmarking tools
• Summary
Summary 1
• Microbenchmarking is hard when it comes to Java™
• Define what you want to measure and how you want to do it; pick your goals
• Know what you are doing
• Always warm up your code
• Handle DCE, OSR, and GC issues
• Use fixed-size data sets and fixed work
Summary 2
• Do not rely solely on microbenchmark results
• Sanity-check results
• Use a profiler
• Test your code in real-life scenarios under realistic load (macro-benchmark)
Summary: resources
• http://www.ibm.com/developerworks/java/library/j-benchmark1.html
• http://www.azulsystems.com/events/javaone_2002/microbenchmarks.pdf
• https://japex.dev.java.net/
• http://www.ibm.com/developerworks/java/library/j-jtp12214/
• http://www.dei.unipd.it/~bertasi/jcache/
Thank You!