Garbage Collection Advantage: Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)


TRANSCRIPT

Page 1: Garbage Collection Advantage: Improving Program Locality

Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)

Page 2: Motivation

• Memory gap problem
• OO programs are becoming more popular
• OO programs exacerbate the memory gap problem
  – Automatic memory management
  – Pointer data structures
  – Many small methods

Goal: improve OO program locality

Page 3: Cache Performance Matters

[Figure: _213_javac, total cycles (in billions), y-axis from 0 to 40]

Page 4: Opportunity

• Generational copying garbage collector reorders objects at runtime

Page 5: Copying of Linked Objects

[Figure: linked objects numbered 1 to 7 and the order in which a breadth-first copy lays them out]

Page 6: Copying of Linked Objects

[Figure: linked objects numbered 1 to 7 copied in breadth-first and depth-first order]

Page 7: Copying of Linked Objects

[Figure: linked objects numbered 1 to 7 copied in breadth-first order, in depth-first order, and by online object reordering]
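The effect of traversal order on layout can be illustrated with a minimal Java sketch. It assumes a hypothetical seven-object tree (1 references 2 and 3, 2 references 4 and 5, 3 references 6 and 7), because the edges of the graph in the figure are not recoverable from this transcript. The class name CopyOrderSketch and its methods are illustrative only, and the explicit queue and stack stand in for whatever work list a real copying collector uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: the traversal order of a copying collector decides the layout in to-space. */
final class CopyOrderSketch {

    // Hypothetical graph (not the one in the slide figure): object id -> ids it references.
    static final Map<Integer, List<Integer>> GRAPH = Map.of(
            1, List.of(2, 3),
            2, List.of(4, 5),
            3, List.of(6, 7),
            4, List.<Integer>of(),
            5, List.<Integer>of(),
            6, List.<Integer>of(),
            7, List.<Integer>of());

    /** Breadth-first copy: a FIFO queue places siblings next to each other. */
    static List<Integer> breadthFirst(int root) {
        List<Integer> toSpace = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(root);
        seen.add(root);
        while (!queue.isEmpty()) {
            int obj = queue.poll();
            toSpace.add(obj); // "copy" the object to the next free slot
            for (int child : GRAPH.get(obj)) {
                if (seen.add(child)) {
                    queue.add(child);
                }
            }
        }
        return toSpace;
    }

    /** Depth-first copy: a LIFO stack places a parent next to its first child. */
    static List<Integer> depthFirst(int root) {
        List<Integer> toSpace = new ArrayList<>();
        Set<Integer> copied = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int obj = stack.pop();
            if (!copied.add(obj)) {
                continue; // already copied through another reference
            }
            toSpace.add(obj);
            List<Integer> children = GRAPH.get(obj);
            // Push in reverse so the first child is copied right after its parent.
            for (int i = children.size() - 1; i >= 0; i--) {
                stack.push(children.get(i));
            }
        }
        return toSpace;
    }

    public static void main(String[] args) {
        System.out.println("Breadth first: " + breadthFirst(1));
        System.out.println("Depth first:   " + depthFirst(1));
    }
}
```

For this hypothetical tree, breadth-first copying lays objects out as 1 2 3 4 5 6 7, while depth-first copying yields 1 2 4 5 3 6 7, so which objects end up adjacent depends only on the fixed traversal order and not on how the program uses them.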

Page 8: Outline

• Motivation
• Online Object Reordering (OOR)
• Methodology
• Experimental Results
• Conclusion

Page 9: Online Object Reordering

• Where are the cache misses?
• How to identify hot field accesses at runtime?
• How to reorder the objects?

Page 10: Where Are The Cache Misses?

• Heap structure:

[Figure, not to scale: VM objects, stack, older generation, nursery]

Page 11: Where Are The Cache Misses?

[Figure: _209_db, total accesses (in millions), y-axis from 0 to 2000, split into L2 hits and L2 misses]

Page 12: Where Are The Cache Misses?

• Two opportunities to reorder objects in the older generation
  – Promote nursery objects
  – Full heap collection

Page 13: How to Find Hot Fields?

• Runtime info (intercept every read)?
• Compiler analysis?
• Runtime information + compiler analysis

Key: Low overhead estimation

Page 14: Which Classes Need Reordering?

Step 1: Compiler analysis
  – Excludes cold basic blocks
  – Identifies field accesses

Step 2: JIT adaptive sampling identifies hot methods
  – Marks field accesses in hot methods as hot

Key: Low overhead estimation
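The two steps above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the Jikes RVM implementation: the class HotFieldSketch, the BasicBlock type, and the "Class.field" string encoding are all hypothetical, and the set of hot methods is simply passed in rather than produced by a sampler.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: combine per-method access lists with the set of hot methods. */
final class HotFieldSketch {

    /** Hypothetical basic block: a cold flag plus the fields it reads, written "Class.field". */
    static final class BasicBlock {
        final boolean cold;
        final List<String> fieldAccesses;

        BasicBlock(boolean cold, List<String> fieldAccesses) {
            this.cold = cold;
            this.fieldAccesses = fieldAccesses;
        }
    }

    /** Step 1: collect the field accesses of one method, skipping cold basic blocks. */
    static Set<String> accessList(List<BasicBlock> blocks) {
        Set<String> accesses = new LinkedHashSet<>();
        for (BasicBlock block : blocks) {
            if (!block.cold) {
                accesses.addAll(block.fieldAccesses);
            }
        }
        return accesses;
    }

    /** Step 2: every field access recorded for a hot method is marked hot. */
    static Set<String> hotFields(Map<String, List<BasicBlock>> methods, Set<String> hotMethods) {
        Set<String> hot = new LinkedHashSet<>();
        for (String method : hotMethods) {
            List<BasicBlock> blocks = methods.get(method);
            if (blocks != null) {
                hot.addAll(accessList(blocks));
            }
        }
        return hot;
    }
}
```

Neither step touches individual reads at run time: the access lists are built once per compiled method and the hot-method set comes from sampling, which is consistent with the low-overhead goal stated on the slide.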

Page 15: Example: Compiler Analysis

Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}

Compiler:
  – Hot BB: collect access info
  – Cold BB: ignore

Access List:
1. A.b
2. ….
….

Page 16: Example: Adaptive Sampling

Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}

Adaptive sampling: Foo is hot

Foo Accesses:
1. A.b
2. ….
….

A.b is hot

[Figure: A's type information, listing fields b and c]
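The sampling example can be sketched as a counter per method that, once a threshold is reached, registers the method's recorded accesses as hot fields in per-class type information. This is a hedged illustration only: SamplingSketch, the threshold parameter, and the "Class.field" strings are assumptions made for the sketch, and the slides do not say how Jikes RVM decides that a method is hot.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: timer samples turn methods hot, and hot methods turn their field accesses hot. */
final class SamplingSketch {
    private final int hotThreshold;                       // assumed: samples needed (at least 1)
    private final Map<String, List<String>> accessLists;  // method -> "Class.field" accesses
    private final Map<String, Integer> sampleCounts = new HashMap<>();
    private final Map<String, Set<String>> hotFieldsByClass = new HashMap<>();

    SamplingSketch(int hotThreshold, Map<String, List<String>> accessLists) {
        this.hotThreshold = hotThreshold;
        this.accessLists = accessLists;
    }

    /** Called on each (simulated) timer tick with the method that was executing. */
    void sample(String method) {
        int count = sampleCounts.merge(method, 1, Integer::sum);
        if (count != hotThreshold) {
            return; // not hot yet, or its accesses were already registered
        }
        for (String access : accessLists.getOrDefault(method, List.of())) {
            int dot = access.indexOf('.'); // entries are assumed to be well-formed "Class.field"
            String className = access.substring(0, dot);
            String field = access.substring(dot + 1);
            hotFieldsByClass.computeIfAbsent(className, k -> new LinkedHashSet<>()).add(field);
        }
    }

    /** The hot fields recorded in the type information of the given class. */
    Set<String> hotFields(String className) {
        return hotFieldsByClass.getOrDefault(className, Set.of());
    }

    public static void main(String[] args) {
        SamplingSketch sampler = new SamplingSketch(3, Map.of("Foo", List.of("A.b")));
        for (int tick = 0; tick < 3; tick++) {
            sampler.sample("Foo");
        }
        System.out.println("Hot fields of A: " + sampler.hotFields("A"));
    }
}
```

In the slide's example, A.c never becomes hot because it appears only in the cold catch block and is therefore absent from Foo's access list.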

Page 17: Copying of Linked Objects

[Figure: linked objects numbered 1 to 7 copied by online object reordering, which consults type information and places objects in a hot space and a cold space]
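One way to picture a copy guided by type information is the Java sketch below. It rests on assumptions that the slides do not confirm: referents of hot fields are copied before referents of other fields, and an object goes to the hot space if its class has hot fields or if it was reached through a hot field. The names HotCopySketch and Obj and the two-queue work list are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: a copy pass that follows hot fields first and separates hot from cold objects. */
final class HotCopySketch {

    /** Hypothetical heap object: an id, a class name, and named reference fields. */
    static final class Obj {
        final int id;
        final String className;
        final Map<String, Obj> fields = new LinkedHashMap<>();

        Obj(int id, String className) {
            this.id = id;
            this.className = className;
        }
    }

    final List<Integer> hotSpace = new ArrayList<>();
    final List<Integer> coldSpace = new ArrayList<>();

    /** Copies everything reachable from root, using hotFieldsByClass as the type information. */
    void copy(Obj root, Map<String, Set<String>> hotFieldsByClass) {
        Deque<Obj> hotQueue = new ArrayDeque<>();   // referents of hot fields, served first
        Deque<Obj> coldQueue = new ArrayDeque<>();  // everything else
        Set<Obj> copied = new HashSet<>();          // identity-based: Obj does not override equals
        coldQueue.add(root);
        while (!hotQueue.isEmpty() || !coldQueue.isEmpty()) {
            boolean viaHotField = !hotQueue.isEmpty();
            Obj obj = viaHotField ? hotQueue.poll() : coldQueue.poll();
            if (!copied.add(obj)) {
                continue; // already copied through another reference
            }
            Set<String> hot = hotFieldsByClass.getOrDefault(obj.className, Set.of());
            boolean isHot = viaHotField || !hot.isEmpty(); // assumed definition of a hot object
            (isHot ? hotSpace : coldSpace).add(obj.id);
            for (Map.Entry<String, Obj> field : obj.fields.entrySet()) {
                if (field.getValue() == null) {
                    continue; // null reference, nothing to copy
                }
                (hot.contains(field.getKey()) ? hotQueue : coldQueue).add(field.getValue());
            }
        }
    }
}
```

With an object 1 of class A whose field c references object 2 and whose hot field b references object 3, this sketch places 1 and 3 next to each other in the hot space and leaves 2 in the cold space, regardless of the order in which the fields are declared.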

Page 18: OOR System Overview

[Diagram labels: Source Code; Baseline Compiler; Executing Code; Adaptive Sampling; Hot Methods; Optimizing Compiler; Adds Entries; Access Info Database; Look Up; Register Hot Field Accesses; Advice; GC: Copies Objects; Affects Locality; Improves Locality]

[Legend: OOR addition; Jikes RVM component; Input/Output]

Page 19: Outline

• Motivation
• Online Object Reordering
• Methodology
• Experimental Results
• Conclusion

Page 20: Methodology: Virtual Machine

• Jikes RVM
  – VM written in Java
  – High performance
  – Timer-based adaptive sampling
  – Dynamic optimization
• Experiment setup
  – Pseudo-adaptive
  – 2nd iteration [Eeckhout et al.]

Page 21: Methodology: Memory Management

• Memory Management Toolkit (MMTk):
  – Allocators and garbage collectors
  – Multi-space heap
    • Boot image
    • Large object space (LOS)
    • Immortal space
• Experiment setup
  – Generational copying GC with 4 MB bounded nursery

Page 22: Overhead: OOR Analysis Only

Benchmark  Base execution time (sec)  With only OOR analysis (sec)  Overhead
jess       4.39                       4.43                          0.84%
jack       5.79                       5.82                          0.57%
raytrace   4.63                       4.61                          -0.59%
mtrt       4.95                       4.99                          0.70%
javac      12.83                      12.70                         -1.05%
compress   8.56                       8.54                          0.20%
pseudojbb  13.39                      13.43                         0.36%
db         18.88                      18.88                         -0.03%
antlr      0.94                       0.91                          -2.90%
hsqldb     160.56                     158.46                        -1.30%
ipsixql    41.62                      42.43                         1.93%
jython     37.71                      37.16                         -1.44%
ps-fun     129.24                     128.04                        -1.03%
Mean                                                                -0.19%

Page 23: Detailed Experiments

• Separate application and GC time
• Vary thresholds for method heat
• Vary thresholds for cold basic blocks
• Three architectures
  – x86, AMD, PowerPC
• x86 performance counters:
  – DL1, trace cache, L2, DTLB, ITLB

Page 24: Performance javac

Page 25: Performance db

Page 26: Performance jython

Any static ordering leaves you vulnerable to pathological cases.

Page 27: Phase Changes

Page 28: Related Work

• Evaluate static orderings [Wilson et al.]
  – Large performance variation
• Static profiling [Chilimbi et al., and others]
  – Lack of flexibility
• Instance-based object reordering [Chilimbi et al.]
  – Too expensive

Page 29: Conclusion

• Static traversal orders have up to 25% variation
• OOR improves or matches best static ordering
• OOR has very low overhead
• Past predicts future

Page 30: Questions?

Thank you!

Page 31: OOR System Overview

• Records object accesses in each method (excludes cold basic blocks)

• Finds hot methods by adaptive sampling

• Reorders objects with hot fields in older generation during GC

• Copies hot objects into separate region