Java profiling: do it yourself
JVM diagnostic interfaces
• JMX
• JVMTI (native API only)
• Attach API
Ad hoc instrumentation and more
• Perf counters
• Heap dump
• Flight recorder
MBeans: threading
• CPU usage per thread (user / sys)
• Memory allocation per thread
• Block / wait times (should be enabled)
• Stack traces (invaluable)
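The threading MBean features listed above can be tried directly from Java. The following is a minimal sketch, assuming a HotSpot-style JVM that supports thread contention monitoring; the class ThreadContention and its dump method are hypothetical names. Block and wait times only start accumulating once contention monitoring is switched on, which matches the note that it should be enabled.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadContention {
    /** Prints state, blocked/waited times and the top stack frames of every live thread. */
    public static int dump(int maxDepth) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Blocked/waited times are only accumulated while contention monitoring is on
        if (threads.isThreadContentionMonitoringSupported()
                && !threads.isThreadContentionMonitoringEnabled()) {
            threads.setThreadContentionMonitoringEnabled(true);
        }
        ThreadInfo[] infos = threads.getThreadInfo(threads.getAllThreadIds(), maxDepth);
        int reported = 0;
        for (ThreadInfo info : infos) {
            if (info == null) continue; // thread exited between the two calls
            // getBlockedTime()/getWaitedTime() return -1 when monitoring is unavailable
            System.out.printf("[%06d] %s state=%s blocked=%dms waited=%dms%n",
                    info.getThreadId(), info.getThreadName(), info.getThreadState(),
                    info.getBlockedTime(), info.getWaitedTime());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
            reported++;
        }
        return reported;
    }

    public static void main(String[] args) {
        dump(5);
    }
}
```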
SJK: ttop
2014-10-01T19:27:22.825+0400 Process summary
  process cpu=101.80%
  application cpu=100.50% (user=86.21% sys=14.29%)
  other: cpu=1.30%
  GC cpu=0.00% (young=0.00%, old=0.00%)
  heap allocation rate 123mb/s
[000037] user=83.66% sys=14.02% alloc= 121mb/s - Proxy:ExtendTcpProxyService1:TcpAcceptor:TcpProcessor
[000075] user= 0.97% sys= 0.08% alloc= 411kb/s - RMI TCP Connection(35)-10.139.200.51
[000029] user= 0.61% sys=-0.00% alloc= 697kb/s - Invocation:Management
[000073] user= 0.49% sys=-0.01% alloc= 343kb/s - RMI TCP Connection(33)-10.128.46.114
[000023] user= 0.24% sys=-0.01% alloc= 10kb/s - PacketPublisher
[000022] user= 0.00% sys= 0.10% alloc= 11kb/s - PacketReceiver
[000072] user= 0.00% sys= 0.07% alloc= 22kb/s - RMI TCP Connection(31)-10.139.207.76
[000056] user= 0.00% sys= 0.05% alloc= 20kb/s - RMI TCP Connection(25)-10.139.207.76
[000026] user= 0.12% sys=-0.07% alloc= 2217b/s - Cluster|Member(Id=18, Timestamp=2014-10-01 15:58:3 ...
[000076] user= 0.00% sys= 0.04% alloc= 6657b/s - JMX server connection timeout 76
[000021] user= 0.00% sys= 0.03% alloc= 526b/s - PacketListener1P
[000034] user= 0.00% sys= 0.02% alloc= 1537b/s - Proxy:ExtendTcpProxyService1
[000049] user= 0.00% sys= 0.02% alloc= 6011b/s - JMX server connection timeout 49
[000032] user= 0.00% sys= 0.01% alloc= 0b/s - DistributedCache
https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command
Available via PerfCounters
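As a rough illustration of where per-thread figures like the ones above can come from, here is a ttop-like sampler built on com.sun.management.ThreadMXBean (a HotSpot extension). This is a sketch under stated assumptions, not SJK's implementation: MiniTtop and report are hypothetical names, and in this sketch "sys" is taken as total CPU time minus user time, read from two separate counters.

```java
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class MiniTtop {
    // HotSpot-specific extension of ThreadMXBean that adds per-thread allocation counters
    static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Cumulative counters of one thread at one point in time. */
    static final class Sample {
        final long cpuNs, userNs, allocBytes;
        Sample(long cpuNs, long userNs, long allocBytes) {
            this.cpuNs = cpuNs; this.userNs = userNs; this.allocBytes = allocBytes;
        }
    }

    static Map<Long, Sample> snapshot() {
        Map<Long, Sample> result = new HashMap<>();
        for (long id : THREADS.getAllThreadIds()) {
            long cpu = THREADS.getThreadCpuTime(id);   // -1 if the thread died or timing is off
            long user = THREADS.getThreadUserTime(id);
            long alloc = THREADS.getThreadAllocatedBytes(id);
            if (cpu >= 0 && user >= 0 && alloc >= 0) {
                result.put(id, new Sample(cpu, user, alloc));
            }
        }
        return result;
    }

    /** Samples twice, intervalMs apart; returns {user%, sys%, bytes/s} per thread id and prints them. */
    public static Map<Long, double[]> report(long intervalMs) {
        Map<Long, Sample> before = snapshot();
        long t0 = System.nanoTime();
        try {
            Thread.sleep(intervalMs);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        Map<Long, Sample> after = snapshot();
        double wallNs = System.nanoTime() - t0;
        Map<Long, double[]> rows = new HashMap<>();
        for (Map.Entry<Long, Sample> e : after.entrySet()) {
            Sample b = before.get(e.getKey());
            if (b == null) continue; // thread started during the interval
            Sample a = e.getValue();
            double user = 100.0 * (a.userNs - b.userNs) / wallNs;
            // sys = total CPU - user CPU; the counters are read separately, so it may dip below zero
            double sys = 100.0 * ((a.cpuNs - a.userNs) - (b.cpuNs - b.userNs)) / wallNs;
            double allocRate = (a.allocBytes - b.allocBytes) * 1e9 / wallNs;
            rows.put(e.getKey(), new double[] {user, sys, allocRate});
            System.out.printf("[%06d] user=%5.2f%% sys=%5.2f%% alloc=%10.0fb/s%n",
                    e.getKey(), user, sys, allocRate);
        }
        return rows;
    }

    public static void main(String[] args) {
        report(1000);
    }
}
```

Calling MiniTtop.report(1000) prints one line per thread that existed in both snapshots; rates are normalised to the measured wall-clock interval.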
MBeans: memory
• Memory geometry information
• Collection count
• Last collection details for each collector
• GC events available as notifications (since Java 7)
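GC notifications can be consumed with a listener registered on each collector MXBean. A minimal sketch, assuming a HotSpot JVM (the com.sun.management GC notification classes are HotSpot-specific); GcWatcher and its output format, loosely modelled on the SJK GC lines below, are illustrative only.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class GcWatcher {
    /** Formatted GC events; notifications are delivered on a JVM-internal thread. */
    public static final List<String> EVENTS = new CopyOnWriteArrayList<>();

    public static void install() {
        NotificationListener listener = (notification, handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            GcInfo gc = info.getGcInfo();
            StringBuilder line = new StringBuilder("[GC: ")
                    .append(info.getGcName()).append('#').append(gc.getId())
                    .append(" time: ").append(gc.getDuration()).append("ms mem:");
            // Used memory per pool before -> after this collection, in KiB
            for (Map.Entry<String, MemoryUsage> pool : gc.getMemoryUsageBeforeGc().entrySet()) {
                MemoryUsage after = gc.getMemoryUsageAfterGc().get(pool.getKey());
                line.append(' ').append(pool.getKey()).append(": ")
                        .append(pool.getValue().getUsed() / 1024).append("k->")
                        .append(after == null ? "?" : (after.getUsed() / 1024) + "k");
            }
            line.append(']');
            EVENTS.add(line.toString());
            System.out.println(line);
        };
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // On HotSpot the collector MXBeans implement NotificationEmitter
            if (gcBean instanceof NotificationEmitter) {
                ((NotificationEmitter) gcBean).addNotificationListener(listener, null, null);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        System.gc();
        Thread.sleep(500); // give the asynchronous notification time to arrive
    }
}
```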
SJK: GC
[GC: Copy#1806 time: 7ms interval: 332ms mem: Eden Space: 108032k-108032k->0k[max:191168k,rate:-325397.59kb/s] Tenured Gen: 162185k+14k->162199k[max:477888k,rate:42.22kb/s] Survivor Space: 235k-13k->222k[max:23872k,rate:-41.93kb/s]]
[GC: Copy#1807 time: 8ms interval: 338ms mem: Eden Space: 108032k-108032k->0k[max:191168k,rate:-319621.30kb/s] Tenured Gen: 162199k+219k->162418k[max:477888k,rate:648.30kb/s] Survivor Space: 222k-217k->4k[max:23872k,rate:-644.90kb/s]]
[GC: Copy#1808 time: 7ms interval: 321ms mem: Eden Space: 108032k-108032k->0k[max:191168k,rate:-336548.29kb/s] Tenured Gen: 162418k+0k->162418k[max:477888k,rate:0.00kb/s] Survivor Space: 4k-2k->1k[max:23872k,rate:-7.64kb/s]]
[GC: Copy#1809 time: 7ms interval: 321ms mem: Eden Space: 108032k-108032k->0k[max:191168k,rate:-336548.29kb/s] Tenured Gen: 162418k+0k->162418k[max:477888k,rate:0.00kb/s] Survivor Space: 1k+0k->1k[max:23872k,rate:0.24kb/s]]
[GC: Copy#1810 time: 4ms interval: 700ms mem: Eden Space: 108032k-108032k->0k[max:191168k,rate:-154331.43kb/s] Tenured Gen: 162418k+0k->162418k[max:477888k,rate:0.00kb/s] Survivor Space: 1k+288k->290k[max:23872k,rate:412.00kb/s]]
[GC: Copy#1811 time: 5ms interval: 311ms mem: Eden Space: 108032k-108032k->0k[max:191168k,rate:-347369.77kb/s] Tenured Gen: 162418k+0k->162418k[max:477888k,rate:0.00kb/s] Survivor Space: 290k-155k->135k[max:23872k,rate:-498.52kb/s]]
[GC: Copy#1812 time: 3ms interval: 340ms mem: Eden Space: 108032k-108032k->0k[max:191168k,rate:-317741.18kb/s] Tenured Gen: 162418k+0k->162418k[max:477888k,rate:0.00kb/s] Survivor Space: 135k-2k->132k[max:23872k,rate:-6.14kb/s]]
[GC: Copy#1813 time: 6ms interval: 325ms mem: Eden Space: 108032k-108032k->0k[max:191168k,rate:-332406.15kb/s] Tenured Gen: 162418k+0k->162418k[max:477888k,rate:0.00kb/s] Survivor Space: 132k+0k->133k[max:23872k,rate:0.65kb/s]]
Total
Copy[ collections: 28 | avg: 0.0065 secs | total: 0.2 secs ]
MarkSweepCompact[ collections: 0 | avg: NaN secs | total: 0.0 secs ]
https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#gc-command
MBeans: diagnostic commands
com.sun.management:type=DiagnosticCommand (Java 8)
com.sun.management:type=HotSpotDiagnostic
• Forcing GC / GC log rotation
• Heap dump
• Flight recorder
• Changing -XX options
• etc.
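Both MBeans can be driven from plain JMX. A sketch, assuming a HotSpot JVM on Java 8 or later; DiagCommands, vmOption and dcmd are hypothetical helper names. The DiagnosticCommand MBean exposes jcmd commands as operations named in camel case (for example VM.flags becomes vmFlags); commands that take options, such as VM.flags or GC.class_histogram, accept a String[] of arguments.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class DiagCommands {
    /** Reads a VM option through the HotSpotDiagnostic MBean. */
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = hotspot.getVMOption(name);
        return option.getName() + "=" + option.getValue() + " (writeable=" + option.isWriteable() + ")";
    }

    /** Invokes a diagnostic command operation that takes a String[] of arguments. */
    public static String dcmd(String operation, String... args) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.sun.management:type=DiagnosticCommand");
            // The whole String[] is passed as the operation's single parameter
            return (String) server.invoke(name, operation,
                    new Object[] {args}, new String[] {String[].class.getName()});
        } catch (Exception e) {
            throw new IllegalStateException("diagnostic command failed: " + operation, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(vmOption("MaxHeapSize"));
        System.out.println(dcmd("vmFlags"));
        // HotSpotDiagnosticMXBean.dumpHeap(path, true) would write a heap dump of live objects
    }
}
```

Only flags reported as writeable (manageable ones such as HeapDumpOnOutOfMemoryError) can be changed at runtime with HotSpotDiagnosticMXBean.setVMOption.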
JVM Attach API
• List JVM processes
• Attach to JVM by PID
• Send control commands: heap dump / histogram, stack dump
• Inspect system properties and VM options
• Launch instrumentation agents
https://github.com/gridkit/jvm-attach
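The JDK ships its own Attach API in the jdk.attach module (com.sun.tools.attach); the gridkit/jvm-attach project linked above is a separate library and is not used here. A minimal sketch of listing JVMs and reading another JVM's system properties, assuming a full JDK 9+ runtime; AttachDemo and its methods are hypothetical names.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class AttachDemo {
    /** Lists local JVMs visible to the Attach API as "pid displayName" lines. */
    public static List<String> listVms() {
        List<String> lines = new ArrayList<>();
        for (VirtualMachineDescriptor vmd : VirtualMachine.list()) {
            lines.add(vmd.id() + " " + vmd.displayName());
        }
        return lines;
    }

    /** Attaches to a JVM by PID and reads one of its system properties. */
    public static String systemProperty(String pid, String key) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            Properties props = vm.getSystemProperties();
            return props.getProperty(key);
        } finally {
            vm.detach(); // always release the attach connection
        }
    }

    public static void main(String[] args) throws Exception {
        listVms().forEach(System.out::println);
        if (args.length > 0) {
            // Self-attach is refused by default since Java 9, so pass another JVM's PID
            System.out.println(systemProperty(args[0], "user.dir"));
        }
    }
}
```

VirtualMachine also provides loadAgent(jarPath) for launching instrumentation agents in the target JVM.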
SJK: hh --dead
Dead object histogram
• Similar to jmap -histo
• Invokes jmap -histo twice (all heap objects, then live heap objects only) and calculates the difference
• Can show top N rows
https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#hh-command
SJK: hh --dead
  1:  19117456  2038375696  [C
  2:   9543865   441272568  [Ljava.lang.Object;
  3:  13519356   432619392  java.util.HashMap$Entry
  4:  12558262   301398288  java.lang.String
  5:   7193066   287722640  org.hibernate.engine.spi.CollectionKey
  6:    619253   160678888  [I
  7:   4710497   113051928  org.jboss.seam.international.Messages$1$1
  8:    571327   100876880  [Ljava.util.HashMap$Entry;
  9:   1436183    57447320  org.hibernate.event.spi.FlushEntityEvent
 10:   1661932    53181824  java.util.Stack
 11:    209899    52047904  [B
 12:   1624200    51974400  org.hibernate.engine.internal.Cascade
 13:    929354    44608992  java.util.HashMap
 14:   1812762    43506288  org.hibernate.i.u.c.IdentityMap$IdentityMapEntry
 15:    850157    34006280  java.util.TreeMap$Entry
 16:   1044636    25071264  java.util.ArrayList
 17:   1340986    23423328  [Ljava.lang.Class;
 18:    710973    22751136  java.io.ObjectStreamClass$WeakClassKey
 19:    885164    21243936  org.hibernate.event.internal.WrapVisitor
 20:    885126    21243024  org.hibernate.event.internal.FlushVisitor
...
Total 95197823  4793878008
SJK: jps
JDK's jps on steroids
• Uses Attach API
• Lists VMs
• Filters by JVM system properties
• Prints property values
• Prints effective -XX options
https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#jps-command
SJK: jps
My favorite command
> sjk jps -pd PID MAIN duser.dir XMaxHeapSize
90543 sjk-0.3.1-SNAPSHOT.jar /var/vas_sdk_test_server -XX:MaxHeapSize=32126271488
5315 WrapperSimpleApp /var/vas_sdk_test_server/vas-sdk-test-13030 -XX:MaxHeapSize=4294967296
11094 WrapperSimpleApp /var/vas_sdk_test_server/vas-sdk-test-13020 -XX:MaxHeapSize=4294967296
993 Main /var/gedoms-uat/private/rtdb_1 -XX:MaxHeapSize=12884901888
56603 AxiomApplication /var/gedoms-uat/private/gedoms_1 -XX:MaxHeapSize=2147483648
24046 WrapperSimpleApp /var/sonar/sonar-3.6.2/bin/linux-x86-64 -XX:MaxHeapSize=536870912
Perf counters
• Based on shared memory, so reading them is safe for the target JVM
• Flat data model
• Misc JVM counters, including true GC CPU usage data
• You can add your own counters programmatically
Stack Trace Sampling
Capture
• Dump stack traces via local connection
• Store in highly compressed dump (10-30 bytes per trace)
Analysis
• Frame frequency
• Conditional frame frequency
• Traces classification histogram
https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#stcap-command
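The analysis side is easy to prototype. The sketch below is not SJK's implementation: it samples one thread in-process with Thread.getStackTrace() instead of dumping traces over a local connection, and FrameFrequency, frequency and conditional are hypothetical names. Frame frequency here means the fraction of traces that contain a frame; conditional frame frequency is assumed to be the same measure restricted to traces containing a given frame.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FrameFrequency {
    /** Collects up to `count` stack traces of `thread`, sleeping `periodMs` between samples. */
    public static List<StackTraceElement[]> sample(Thread thread, int count, long periodMs) {
        List<StackTraceElement[]> traces = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            StackTraceElement[] trace = thread.getStackTrace();
            if (trace.length > 0) traces.add(trace);
            try {
                Thread.sleep(periodMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return traces;
    }

    static String key(StackTraceElement frame) {
        return frame.getClassName() + "." + frame.getMethodName();
    }

    /** Share of traces containing each frame, most frequent first; a frame counts once per trace. */
    public static Map<String, Double> frequency(List<StackTraceElement[]> traces) {
        Map<String, Integer> hits = new TreeMap<>();
        for (StackTraceElement[] trace : traces) {
            Set<String> seen = new HashSet<>(); // recursion must not inflate the count
            for (StackTraceElement frame : trace) seen.add(key(frame));
            for (String f : seen) hits.merge(f, 1, Integer::sum);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        hits.entrySet().stream()
                .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue()))
                .forEach(e -> result.put(e.getKey(), (double) e.getValue() / traces.size()));
        return result;
    }

    /** Frame frequency restricted to the traces that contain `frame` (e.g. "Db.query"). */
    public static Map<String, Double> conditional(List<StackTraceElement[]> traces, String frame) {
        List<StackTraceElement[]> matching = new ArrayList<>();
        for (StackTraceElement[] trace : traces) {
            for (StackTraceElement f : trace) {
                if (key(f).equals(frame)) {
                    matching.add(trace);
                    break;
                }
            }
        }
        return frequency(matching);
    }
}
```

Counting each frame once per trace keeps a deeply recursive method from scoring above 100%.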
Stack Trace Sampling
[Chart: share of sampled stack traces (0-100%) by category, bar "Base". Categories: other, DefaultServlet.doGet, LifecycleImpl.render, LifecycleImpl.execute, WorkItemController.doselect, Seam bean interceptor - lock contention, Seam bean interceptor - inject/disinject/outject, ResourceBundle - getObject, ResourceBundle - missing resource, Facelets compile, Hibernate (rest), Hibernate (autoFlush), JDBC]
Working with heap dumps
Java API to traverse heap dump object graph
Available at https://github.com/aragozin/jvm-tools/tree/master/hprof-heap
• Based on NetBeans profiler library
• No temporary files used
• Fixed generic method signatures
• Improved performance
Useful for
• In-place processing of large heap dumps (150 GiB is my personal record)
• Writing domain-specific heap usage reports
Working with heap dumps
HeapPath
• Convenient way to extract values from a dump
• Error-proof
• Handles String, primitives/boxed and arrays
myfield1.myfield2.myfield3
myarrayfield[0].myfield
myarrayfield[*].myfield
myarrayfield[*][*]
myfield1.*.myfield3[*].value(MyClass)
myhashmap?entrySet[key=description].value
Working with heap dumps
See also
https://github.com/vlsi/mat-calcite-plugin
Heap dump meets SQL
SJK Summary
Visit https://github.com/aragozin/jvm-tools
• Single executable JAR
• Command line interface
• Exploits JMX / Attach API / PerfCounters
• Sampling profiler included
• Extensible commands: write commands for your own application
Sigar
System Information Gatherer And Reporter
https://github.com/hyperic/sigar
• Cross-platform
• Common system metrics: CPU, context switches, IO, etc.
• Java bindings
Self-extracting JAR: org.gridkit.lab:sigar-lib:1.6.4
BTrace
Visit https://kenai.com/projects/btrace
• Instrumentation profiling
• Injects code snippets written in Java
• CLI or Java API
• Extensible
BTrace
// Assumes `bench` is a Profiler field declared elsewhere in the script,
// e.g. created with Profiling.newProfiler()
@OnMethod(clazz = "org.jboss.seam.Component", method = "/(inject)/")
void entryByMethod2(@ProbeClassName String className,
                    @ProbeMethodName String methodName,
                    @Self Object component) {
    if (component != null) {
        // read the component's "name" field reflectively to label the probe
        Field nameField = field(classOf(component), "name", true);
        if (nameField != null) {
            String name = (String) get(nameField, component);
            Profiling.recordEntry(bench,
                concat("org.jboss.seam.Component.", concat(methodName, concat(":", name))));
        }
    }
}

@OnMethod(clazz = "org.jboss.seam.Component", method = "/(inject)/",
          location = @Location(value = Kind.RETURN))
void exitByMthd2(@ProbeClassName String className,
                 @ProbeMethodName String methodName,
                 @Self Object component,
                 @Duration long duration) {
    if (component != null) {
        Field nameField = field(classOf(component), "name", true);
        if (nameField != null) {
            String name = (String) get(nameField, component);
            Profiling.recordExit(bench,
                concat("org.jboss.seam.Component.", concat(methodName, concat(":", name))),
                duration);
        }
    }
}
Flight Recorder
+ Accessible via JMX
+ Targeting JVM internals
+ Low overhead
- Non-compact file format
- Biased profiling
- Weak support for thread sampling
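Since JDK 11 the recorder can also be driven in-process through the jdk.jfr API. The slide refers to JMX access (the jdk.management.jfr FlightRecorderMXBean covers that); the sketch below swaps in the in-process API because it is shorter. JfrSnippet and record are hypothetical names, and the 10 ms sampling period is an arbitrary choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

public class JfrSnippet {
    /**
     * Runs `work` under a JFR recording with execution sampling every 10 ms,
     * dumps the recording to `out` and returns the number of sampling events found.
     */
    public static long record(Runnable work, Path out) {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            recording.start();
            work.run();
            recording.stop();
            recording.dump(out); // writes the stopped recording to disk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            return RecordingFile.readAllEvents(out).stream()
                    .filter(ev -> ev.getEventType().getName().equals("jdk.ExecutionSample"))
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path out = Path.of(System.getProperty("java.io.tmpdir"), "snippet.jfr");
        long samples = record(() -> {
            long end = System.nanoTime() + 500_000_000L; // ~0.5 s of busy work
            double acc = 0;
            while (System.nanoTime() < end) acc += Math.sqrt(acc + 1);
            System.out.println(acc > 0);
        }, out);
        System.out.println(samples + " execution samples written to " + out);
    }
}
```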
Flight Recorder
Non-uniform
Some real cases
Self-profiling benchmarks
Memory allocation regression tests
• Assert on memory consumption using the thread's allocation counter
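A minimal sketch of such a test helper, assuming a HotSpot JVM where com.sun.management.ThreadMXBean.getThreadAllocatedBytes is available; AllocationGuard and its methods are hypothetical names. The counter is per thread, so the measured code must run on the calling thread, and the measurement calls themselves may add a few bytes.

```java
import java.lang.management.ManagementFactory;

public class AllocationGuard {
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Bytes allocated by the current thread while running `action`. */
    public static long allocatedBy(Runnable action) {
        long id = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(id);
        action.run();
        return THREADS.getThreadAllocatedBytes(id) - before;
    }

    /** Throws AssertionError if `action` allocates more than `limitBytes` on the calling thread. */
    public static void assertAllocatesAtMost(long limitBytes, Runnable action) {
        long used = allocatedBy(action);
        if (used > limitBytes) {
            throw new AssertionError("allocated " + used + " bytes, limit " + limitBytes);
        }
    }
}
```

In a regression test, the limit would typically be set a little above the observed baseline so that only real regressions fail the build.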
Microbenchmarks
• Monitor GC events to exclude results affected by GC pauses
• Track CPU usage
• Thread sampling during the benchmark
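The GC-exclusion idea can be sketched with the collection counters of the GarbageCollectorMXBeans: an iteration is kept only if no collector's count changed while it ran. GcAwareBench and run are hypothetical names, and comparing counts (rather than listening for notifications) is a simplifying assumption; it errs on the side of discarding an iteration when a GC lands next to it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcAwareBench {
    /** Sum of collection counts over all collectors; undefined (-1) counts are skipped. */
    static long gcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) total += c;
        }
        return total;
    }

    /**
     * Runs `task` `iterations` times and returns the wall-clock durations (ns)
     * of the iterations during which no garbage collection was observed.
     */
    public static List<Long> run(Runnable task, int iterations) {
        List<Long> clean = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long gcBefore = gcCount();
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (gcCount() == gcBefore) {
                clean.add(elapsed); // keep only iterations not disturbed by a GC pause
            }
        }
        return clean;
    }
}
```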
Performance tests
Nimble: framework for automated distributed performance testing
https://github.com/gridkit/nimble
https://code.google.com/p/gridkit/source/browse/grid-lab/trunk/examples/zk-benchmark-sample/
End-to-end automation
• Set up environment
• Run test scenarios
• Capture application KPIs
• Capture metrics for OS and Java processes
• Integrated profiling with BTrace
• Single output data file
Profiling in production
• In-house continuous query engine
• 20+ applications
• Different environments, support teams, etc.
• A lot of performance challenges
Built-in self-profiling (currently in pilot)
• Thread sampling
• CPU / allocation tracing
• Application-specific diagnostics
• Fully encapsulated in the application itself
Threading MBean performance degrades under multithreaded access
Heap analyzer
• In-house continuous query engine
• Relational graph of up to a few thousand nodes
• High memory consumption
Heap dump reporter automatically generates, in Excel-friendly format:
• Memory consumption by operator
• Row count per operator
• Graph topology
Ideas for future
StackViewer
Visual tools for thread dump analysis
https://github.com/aragozin/stackviewer
Heap dump API + scripting
Coding in Java works well, but
• Reports are not interactive
• Trial-and-error turnaround is slow
How about an interactive console for heap analysis?
• BeanShell / Groovy?
• How to do code completion in a console?
Big Brother 4 J
We have already done it for performance testing
• Project detection
• Metrics capturing
• Integrated profiling (sampling + instrumentation)
• All metrics in one place
How about a standalone tool for production?
• Detecting specific Java processes
• Creating flight recordings automatically
• In a hyper-dense file format
Thank you
Alexey Ragozin
http://blog.ragozin.info - my technical blog
http://github.com/aragozin, http://github.com/gridkit - my open source projects