and toshio nakatani from a hardware performance monitor€¦ · low overhead is important for...

35
IBM Research October 28, 2009 | OOPSLA 2009 at Orlando, Florida © 2009 IBM Corporation How a Java VM Can Get More from a Hardware Performance Monitor Hiroshi Inoue and Toshio Nakatani IBM Research – Tokyo

Upload: others

Post on 21-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

IBM Research

October 28, 2009 | OOPSLA 2009 at Orlando, Florida © 2009 IBM Corporation

How a Java VM Can Get More from a Hardware Performance Monitor

Hiroshi Inoue and Toshio NakataniIBM Research – Tokyo

Page 2: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

2

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Motivation

Motivation

Runtime profile information is important for better optimizations in Just-In-Time compilersLow overhead is important for online profiling techniques

We exploit the hardware performance monitor (HPM) of the processor to minimize the profiling overhead

We study following two important profilesObject creation profile

Lock contention profile

• what class? where in source code?

Page 3: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

3

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Hardware Performance Monitor (HPM) of POWER6

HPM of the POWER6 processor has four programmable performance counters, each of which:

– counts the number of hardware events, such as

• executed instructions

• memory accesses

• cache misses

– generates an interrupt when the counter overflows

• to sample events with a specified sampling interval

• two special purpose registers (SIAR, SDAR) to provide the instruction address and the data address that caused the interrupt

Page 4: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

4

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

HPM-based profiler vs. software-based profiler

HPM-based profilers (e.g. Oprofile in Linux, Intel’s VTune)– to generate profiles for hardware events (e.g. cache miss)

Software-based profilers (e.g. HPROF in JDK)– to generate profiles for Java-level events (e.g. object creation)

We propose HPM-based profiling techniquesfor Java-level events (object creation and lock contention)– HPM-based techniques are easy to balance the accuracy and

the overhead by controlling the sampling interval of the HPM

Page 5: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

5

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Overview of our HPM-based profiler in JVM

JVM(IBM JDK for Java6 SR2)

OS (RedHat Enterprise Linux 5.2)

device driver for HPM

HPM profiler

CPU (POWER6)

Hardware Performance Monitor

Page 6: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

6

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

JVM

OSdevice driver for HPM

HPM profiler

CPU Hardware Performance Monitor

1) configure the HPM via the device driver

set events to countset sampling intervals

How our HPM-based profiler interact with the HPM

Page 7: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

7

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

JVM

device driver for HPM

HPM profiler

CPU Hardware Performance Monitor

1) configure the HPM via the device driver

2) record the instruction address (SIAR) and the data address (SDAR) in a buffer

How our HPM-based profiler interact with the HPM

when an interrupt occurs(e.g. once per 10,000 L1D cache misses)

OS

Page 8: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

8

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

JVM

device driver for HPM

HPM profiler

CPU Hardware Performance Monitor

1) configure the HPM via the device driver

3) translate the hardware addresses into Java-level information

How our HPM-based profiler interact with the HPM

when the buffer becomes full

OS

2) record the instruction address (SIAR) and the data address (SDAR) in a buffer

instruction address Java method data address Java class, field and

location (nursery, tenure, or stack)

Page 9: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

9

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

JVM

device driver for HPM

HPM profiler

CPU Hardware Performance Monitor

4) configure the HPM again

How our HPM-based profiler interact with the HPM

OS

Page 10: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

10

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Agenda

Motivation

Overview

Object creation profiling using the HPM

Lock contention profiling using the HPM

Context-sensitive profiling

Summary

Page 11: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

11

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Our approach

How we can profile Java-level events using the HPM

To correlate Java-level events with the hardware events directly supported by the HPM

We propose two different approaches– for the object creation profile– for the lock contention profile

Page 12: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

12

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

classes of created objectsand allocation sites

largeInserting a hook in allocation site

classes of created objectsmediumCounting objects in garbage collector

OutputOverhead

Existing techniques for object creation profiling

Page 13: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

13

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Object creation profiling using the HPM

Observation

The virtual function table pointer in the Java object header is modified only at the object creation time

store for the virtual function table pointer = object creation

VFT flags lock fields

three-word object header

object format in our JVM

Page 14: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

14

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Steps of the object creation profiling

1. To generate a memory store event profile using the HPM

each sampled event includes the instruction address and the data address

2. To translate the data address of each sampled store event into Java class and the offset in the object

3. To filter out events whose offset is not for the virtual function table pointer in the object header

4. The memory store event profile has became the object creation profile after the filtering

Page 15: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

15

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

class of created objectsand allocation sites

largeInserting a hook in allocation site

class of created objects onlymediumCounting objects in garbage collector

class of created objectsand allocation sites

for sampled events

small (depend on

sampling interval)

Our HPM-based technique

OutputOverhead

no overhead by stopping the HPM to generate interrupts!

Comparing object creation profiling techniques

Page 16: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

16

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Examples of the object creation profiles

hardware: 4 cores of POWER6 4.0GHz, 16 GB memory (IBM BladeCenter JS22)software: RedHat Enterprise Linux 5.2, IBM JDK for Java6 SR2 32 bit

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Countedin GC

Ourprofiler

other classes

java/lang/Integer

spec/jbb/Orderline

java/math/BigDecimal

java/lang/String

char array

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Countedin GC

Ourprofiler

other classes

integer array

char array

com/sun/tools/javac/code/Types$Subst

com/sun/tools/javac/tree/JCTree$JCIdent

com/sun/tools/javac/code/Type$MethodType

java/util/HashMap$Entry

byte array

java/lang/Integer

com/sun/tools/javac/util/ListBuffer

com/sun/tools/javac/util/List

brea

kdow

n b

y th

e nu

mbe

r of

obj

ects

See the paper for quantitative evaluationsSee the paper for quantitative evaluations

SPECjbb2005 compiler.compiler (SPECjvm2008)

Page 17: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

17

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Examples of the object creation profiles

hardware: 4 cores of POWER6 4.0GHz, 16 GB memory (IBM BladeCenter JS22)software: RedHat Enterprise Linux 5.2, IBM JDK for Java6 SR2 32 bit

SPECjbb2005 SPECjvm2008 compiler.compiler

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Countedin GC

Ourprofiler

other classes

java/lang/Integer

spec/jbb/Orderline

java/math/BigDecimal

java/lang/String

char array

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Countedin GC

Ourprofiler

other classes

integer array

char array

com/sun/tools/javac/code/Types$Subst

com/sun/tools/javac/tree/JCTree$JCIdent

com/sun/tools/javac/code/Type$MethodType

java/util/HashMap$Entry

byte array

java/lang/Integer

com/sun/tools/javac/util/ListBuffer

com/sun/tools/javac/util/List

brea

kdow

n b

y th

e nu

mbe

r of

obj

ects

See the paper for quantitative evaluationsSee the paper for quantitative evaluations

java/math/BigDecimal.multiply 34.8%spec/jbb/Orderline.process 15.5%java/math/BigDecimal.setScale 11.6%java/math/BigDecimal.movePointRight 10.0%other methods 28.1%

spec/jbb/infra/Util/XMLTransactionLog.populateXML 52.9%java/math/BigDecimal.longString1 35.6%other methods 11.5%

spec/jbb/infra/Util/XMLTransactionLog.populateXML 39.0%java/math/BigDecimal.longString1 55.2%spec/jbb/Company.getCustomerByLastName 2.8%other methods 3.0%

Page 18: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

18

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Agenda

Motivation

Overview

Object creation profiling using the HPM

Lock contention profiling using the HPM

Context-sensitive profiling

Summary

Page 19: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

19

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Comparing lock contention profiling techniques

largeblocking locksoftware-based profiling

smallspin lockour HPM-based profiling

overheadsuitable for

Two profiling techniques are complementary

smaller overhead is important for lock contention profiling because the overhead may change the lock contentions

Page 20: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

20

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Instrument with the special NOP instruction

Our approach

insert a special NOP instruction, called ProbeNOP, in the code of interest to correlate with the hardware event

where, the ProbeNOP instruction does not affect the program meaningthe HPM can count the number of ProbeNOP execution

Page 21: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

21

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Our ProbeNOP Implementation

No existing processor has the ProbeNOP instruction

We need to find an existing instruction that can be used as the ProbeNOP

The instruction does not change the program behavior

The HPM can count the execution of the instruction

The JVM does not use the instruction

We used the vector permute instruction (vperm) of the VMX (Altivec) instruction on POWER6

The HPM can count the execution of the VMX instructions

The current JVM does not use the VMX

Page 22: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

22

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Profiling lock contentions

while (true) {if (tryLock(obj) == SUCCESS) break;

ProbeNOP;}

// critical section begins

monitor enter for obj

Note: This pseudocode is radically simplifiedSee the paper for more realistic implementation

Note: This pseudocode is radically simplifiedSee the paper for more realistic implementation

The HPM counter increments at every iteration in the spin loop

The HPM periodically generates interrupts

The profiler can identify the location of the contended locksbased on the HPM interrupts

Additionally, we want to know which object causes the contention

Page 23: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

23

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Value profiling with the ProbeNOP

while (true) {if (tryLock(obj) == SUCCESS) break;ProbeNOP(obj);

}

// critical section begins

monitor enter for obj

Note: This pseudocode is radically simplifiedSee the paper for more realistic implementation

Note: This pseudocode is radically simplifiedSee the paper for more realistic implementation

We encode information on which register or memory location to profile in the unused bits of the ProbeNOP

The interrupt handler decodes the information encoded in the ProbeNOP instruction and collects the values of the specified target

Page 24: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

24

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Example of the lock contention profile

SPIN LOOP METHOD============== ==================================================

% samples 36.9% (75) com/ibm/ejs/ras/Tr.register(Ljava/lang/Class;Ljava14.3% (29) org/apache/openjpa/meta/MetaDataRepository.getMeta6.4% (13) com/ibm/io/async/ResultHandler.runEventProcessingL3.0% (6) com/ibm/ws/persistence/EntityManagerImpl.createNam

...

com/ibm/ejs/ras/Tr.register(Ljava/lang/Class;Ljava 36.9% (75)SPIN LOOP LOCATION CLASS============== ======== ===============================

% samples 36.9% (75) tenure com/ibm/ws/bootstrap/WsLogManager

for DayTrader benchmark

Page 25: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

25

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Agenda

Motivation

Overview

Object creation profiling using the HPM

Lock contention profiling using the HPM

Context-sensitive profiling

Summary

Page 26: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

26

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Context-sensitive profiling for the HPM: Motivation

Motivation

Methods in collection classes (e.g. HashMap) often cause many cache missesjust the code location of the cache misses are not informative enough for optimization

Calling context for such methods are important

Challenge:To obtain a calling context information with minimum overhead

Page 27: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

27

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

An example of the context-sensitive profiling

method A

method B

method C

method A

method D

method C

method A

method C

call

call

call

call

call

cachemiss

cachemiss

cachemiss

Page 28: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

28

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

An example of the context-sensitive profiling

method A

method B

method C

method A

method D

method C

method A

method C

call

call

call

call

call

cachemiss

cachemiss

cachemiss

See the paper for more detailSee the paper for more detail

12 12 12

8 8

820 10

size of the stack frame

call stack depth = 40 call stack depth = 30

call stack depth = 20

We collect the value of the stack frame pointer in the HPM interrupt handler

Page 29: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

29

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

SPECjbb20

05co

mpi

ler.c

ompil

er

derb

y

sunf

low

xml.v

alida

tion

seria

lm

pega

udio

DayTra

der

Accuracy of the calling context information

L1 c

ache

mis

s ev

ents

who

se c

alle

rs w

ere

un

ique

ly id

entif

ied

for

at l

east

thre

e le

vels

Page 30: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

30

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Summary

We showed how our profiler in JVM generates the object creation profile and the lock contention profile using the HPM

We correlate the object creations with the store instructions for Java object headers

We correlate the lock contentions with the special NOP instruction, called ProbeNOP

We proposed a technique to identify the calling context for each sample using the call stack depth

This technique works surprisingly well with many tested benchmarks including a large application server workload

Page 31: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

31

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Thank you!

Page 32: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

32

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Backup slides

Page 33: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

33

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

JVM

device driver for HPM

HPM profiler

CPU Hardware Performance Monitor

How our HPM-based profiler interact with the HPM

L1D cache miss occursat 0x1234 (SIAR) when accessing 0xAAAA (SDAR)

buffered in the OS-space bufferSIAR SDAR

{ 0x1234, 0xAAAA }{ 0x1000, 0xBBBB }{ 0x1000, 0xCCCC }....

{ method = hashMap.get(), class = java.lang.String,field = 1st field, location = nursery heap }

{ method = hashMap.get,....

OS

Page 34: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

34

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

Accuracy of the object creation profiling

0

0.10.2

0.3

0.4

0.50.6

0.7

0.80.9

1

30 seconds2,000 samples/sec

30 seconds8,000 samples/sec

240 seconds8,000 samples/sec

number of samples

over

lap

met

ric

SPECjbb2005 compiler.compilerderby sunflowxml.validation serialmpegaudio daytrader

Page 35: and Toshio Nakatani from a Hardware Performance Monitor€¦ · Low overhead is important for online profiling techniques ... CPU Hardware Performance Monitor 1) configure the HPM

35

IBM Research

How a Java VM Can Get More from a Hardware Performance Monitor © 2009 IBM Corporation

0.9

0.92

0.94

0.96

0.98

1

1.02SPECjbb

2005

com

piler

.com

piler

derb

y

sunf

low

xml.v

alida

tion

seria

lm

pega

udio

DayTra

der

Geo. M

EAN

rela

tive

thro

ughp

ut .

without profiler

with profiler

Overhead of the profiler

high

er is

fast

er

(non-zeroorigin)

sampling rate8,000 samples/secor 1 sample/msec/HW thread