generational stack collection and profile driven pretenuring perry cheng robert harper peter lee...
Post on 20-Dec-2015
![Page 1: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/1.jpg)
Generational Stack Collection and Profile-Driven Pretenuring
Perry Cheng, Robert Harper, Peter Lee
Presented by Moti Alperovitch
![Page 2: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/2.jpg)
The problem
• Some data die young, and some data die old.
• In recursive programs, the deep part of the stack is unwound very infrequently.
• Scanning unchanged roots can dominate collection time.
![Page 3: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/3.jpg)
We compare the following collectors
• Semispace copying collection (Cheney).
• Generational collection.
• Generational collection with stack markers.
• Pretenuring with stack markers.
![Page 4: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/4.jpg)
Semispace copy collection
• Scan the stack for roots and copy the data reachable from the roots to an unused area (Nursery, Survive).
• Disadvantage:
– All reachable data is copied at every collection, even though some data dies young and some dies old.
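A minimal sketch of the copying step, assuming a toy heap model in which every object is a Python dict with a `fields` list and any dict-valued field is a pointer. The function name `cheney_collect` and the side table of forwarding pointers are illustrative choices, not taken from the slides.

```python
def cheney_collect(roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Objects are modelled as {"fields": [...]}; a field that is a dict is a
    pointer, anything else is a non-pointer. Returns (new_roots, to_space).
    """
    to_space = []   # copied objects, in the order they were copied
    forward = {}    # id(old object) -> its copy (stands in for forwarding pointers)

    def copy(value):
        if not isinstance(value, dict):
            return value                      # non-pointer: nothing to copy
        key = id(value)
        if key not in forward:
            new_obj = {"fields": list(value["fields"])}
            forward[key] = new_obj
            to_space.append(new_obj)
        return forward[key]

    new_roots = [copy(root) for root in roots]

    scan = 0        # Cheney scan pointer: objects before it are fully processed
    while scan < len(to_space):
        obj = to_space[scan]
        obj["fields"] = [copy(field) for field in obj["fields"]]
        scan += 1
    return new_roots, to_space
```

Unreachable objects are never visited, so the cost is proportional to the reachable data only; the drawback noted above is that this whole reachable set is copied again at every collection.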
![Page 5: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/5.jpg)
Generational collection
• Based on semispace copy collection.
• Arranges the heap into areas according to object lifetime.
• Disadvantages:
– For programs with deep call chains, stack scanning can take a lot of time.
– Long-lived objects are typically copied several times before they are tenured.
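The second disadvantage can be illustrated with a small model of a minor collection. This is only a sketch: the promotion threshold `TENURE_AGE`, the object fields `age` and `copies`, and the function name are assumptions made for the example.

```python
TENURE_AGE = 2  # assumed promotion threshold, not a value from the slides


def minor_collection(nursery, tenured, is_live):
    """Copy live nursery objects and promote those that reach TENURE_AGE.

    Returns (surviving_nursery, number_of_objects_copied).
    """
    survivors = []
    copied = 0
    for obj in nursery:
        if not is_live(obj):
            continue                # dead young data is reclaimed without copying
        obj["age"] += 1
        obj["copies"] += 1          # every survival costs one copy
        copied += 1
        if obj["age"] >= TENURE_AGE:
            tenured.append(obj)     # promoted to the older generation
        else:
            survivors.append(obj)
    return survivors, copied
```

In this model a long-lived object is copied `TENURE_AGE` times before it reaches the older generation, which is the cost that pretenuring tries to avoid.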
![Page 6: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/6.jpg)
Generational stack collection
• Use stack markers to cache the result of the root scan.
• Disadvantage:
– Long-lived objects are typically copied several times before they are tenured.
![Page 7: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/7.jpg)
Pretenuring
• Perform a profiling run to build a lifetime profile for objects, grouped by allocation site.
![Page 8: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/8.jpg)
TIL Compiler
• Optimizing compiler for Standard ML (SML).
• Intensional polymorphism.
• Nearly tag-free garbage collection.
• Conventional functional-language optimizations.
• Loop optimizations.
![Page 9: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/9.jpg)
Stack Scanning
• At any execution point, data is live if it will be accessed as the program continues to execute.
• The collector must retain all data that is reachable by following pointers from the roots.
• The roots are the registers and the stack slots.
![Page 10: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/10.jpg)
Difficulties
• Accurately determining the root set.
• With callee-save registers, the content of a register or stack slot can come from a caller's frame, so stack frames cannot be decoded in isolation.
• With polymorphism, the compiler cannot always determine statically whether a value is a pointer or not.
![Page 11: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/11.jpg)
Finding the root
• When the GC is called from the mutator, the return address (RA) indicates the current execution point.
• Using the RA as a key into a table, we can determine the layout of the frame of the GC's caller.
• Continuing in this way, frame by frame, we can find the roots.
![Page 12: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/12.jpg)
Finding the roots
• Determine the root set starting from the initial frame, by scanning downwards.
• Scanning in both directions is needed because the type of some stack slots depends on an earlier stack slot.
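The frame-by-frame walk driven by return addresses can be sketched as follows. The stack layout is an assumption made for the example: the stack is a flat list of words with the youngest frame first, the first word of each frame holds the saved return address into its caller, and `BOTTOM` is a hypothetical sentinel that ends the walk.

```python
BOTTOM = 0  # hypothetical sentinel return address marking the oldest frame


def walk_frames(stack, gc_return_address, frame_size_table):
    """Return (return_address, start_index, size) for every frame.

    `frame_size_table` maps a return address to the size of the frame of
    the function that contains that address.
    """
    frames = []
    sp = 0
    ra = gc_return_address
    while ra != BOTTOM:
        size = frame_size_table[ra]   # the layout is looked up by RA
        frames.append((ra, sp, size))
        ra = stack[sp]                # saved RA points into the caller
        sp += size                    # move on to the caller's frame
    return frames
```

For example, with `frame_size_table = {0x10: 3, 0x20: 2}` and `stack = [0x20, 7, 8, BOTTOM, 9]`, a collection entered with return address `0x10` finds a 3-word frame at index 0 and a 2-word frame at index 3.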
![Page 13: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/13.jpg)
Trace table information
• The return address (RA).
• Stack frame size.
• For each stack slot we record its trace:
– Pointer: the compiler statically determined that it is a pointer.
– Non-pointer: the value is not a root.
– Callee-save + (register): callee-save information.
![Page 14: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/14.jpg)
Trace table information - 2
– Compute: the compiler could not statically determine the pointer status of the value. Additional information records where the type of the value resides.
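A hedged sketch of how the four trace kinds could be used to decode a single frame. The tuple encoding of traces, the `reg_is_pointer` map (pointer status of registers as already determined from older frames), and the runtime type tags `"INT"` and `"BOXED"` are all assumptions made for illustration.

```python
def frame_roots(slots, traces, reg_is_pointer):
    """Return the indices of the slots in one frame that are roots.

    traces[i] describes slots[i] and is one of:
      ("pointer",)           statically known to be a pointer
      ("non_pointer",)       statically known not to be a root
      ("callee_save", reg)   the slot holds the caller's value of register reg
      ("compute", j)         slot j of this frame holds the runtime type
    """
    roots = []
    for index, trace in enumerate(traces):
        kind = trace[0]
        if kind == "pointer":
            is_pointer = True
        elif kind == "non_pointer":
            is_pointer = False
        elif kind == "callee_save":
            # Cannot be decided from this frame alone: ask the caller's info.
            is_pointer = reg_is_pointer[trace[1]]
        elif kind == "compute":
            # The type is only known at run time; read it from another slot.
            is_pointer = slots[trace[1]] == "BOXED"
        else:
            raise ValueError(f"unknown trace kind: {kind}")
        if is_pointer:
            roots.append(index)
    return roots
```

The `callee_save` case is why frames cannot be decoded in isolation: the answer depends on what an older frame stored in that register.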
![Page 15: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/15.jpg)
Stack frames and the corresponding table entry.
[Figure: a stack frame (RA=0x2001c718, slots 1 to 6) shown next to its table entry. The table entry records RA=0x2001c718, frame size = 6, one trace per slot (for example Non Pointer, Pointer, Pointer, Compute: Stack 4, Compute: Callee $10), followed by trace info on registers.]
![Page 16: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/16.jpg)
Semispace against generational collection: time for K = 1.5
[Bar chart: time in ms (axis from 0 to 100) per program for the SemiSpace and Generational collectors. Programs: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple.]
![Page 17: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/17.jpg)
Semispace against generational collection: time for K = 4
[Bar chart: time in ms (axis from 0 to 60) per program for the SemiSpace and Generational collectors. Programs: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple.]
![Page 18: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/18.jpg)
Semispace against generational collection: number of GCs for K = 1.5
[Bar chart: number of collections (axis from 0 to 35000) per program for the SemiSpace and Generational collectors. Programs: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple.]
![Page 19: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/19.jpg)
Semispace against generational collection: number of GCs for K = 4
[Bar chart: number of collections (axis from 0 to 12000) per program for the SemiSpace and Generational collectors. Programs: CheckSum, Color, FFT, Grobner, Knuth-Bendix, Lexgen, Life, Peg, PIA, Simple.]
![Page 20: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/20.jpg)
Stack marking
• When the stack is deep, scanning the roots can take up most of the GC time.
• Most of the stack usually does not change between the previous GC and the current one.
• Marking the stack frames that have not changed can significantly speed up root scanning.
![Page 21: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/21.jpg)
Marking the stack - 1st method
• In each stack frame, add a flag that says whether the frame has changed. The collector resets the flag when it passes the frame, and the mutator sets it.
• Disadvantages:
– The mutator is involved in the GC process.
– The compiled code must perform extra operations for the GC on every return, although most of the time the GC is not running.
![Page 22: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/22.jpg)
Marking the stack - 2nd method
• When scanning the roots, replace the RA of every n-th stack frame with the address of a special stub function.
• The stub function keeps a table of the original RAs.
• The stub function records that the frame has been deactivated and then continues to the original RA.
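A minimal simulation of this second method, under stated assumptions: frames are dicts holding only their return address, index 0 is the oldest frame, `STUB` stands in for the address of the stub function, and the class and method names are invented for the example. It reflects one reading of the scheme: while a marker is intact, control has not returned past the marked frame, so every frame older than it is unchanged.

```python
STUB = "stub"  # stand-in for the address of the special stub function


class MarkedStack:
    def __init__(self, n):
        self.n = n            # a marker is placed on every n-th frame
        self.frames = []      # index 0 is the oldest frame
        self.saved_ra = {}    # frame index -> original return address

    def call(self, ra):
        self.frames.append({"ra": ra})

    def ret(self):
        """Pop the youngest frame and return the address to resume at."""
        index = len(self.frames) - 1
        frame = self.frames.pop()
        if frame["ra"] == STUB:
            # The stub notes that this marked frame is gone, then
            # continues to the original return address.
            return self.saved_ra.pop(index)
        return frame["ra"]

    def scan(self):
        """Root scan at GC time; returns how many frames were rescanned."""
        # Frames older than the youngest intact marker are unchanged.
        reuse = max(self.saved_ra, default=0)
        rescanned = len(self.frames) - reuse
        for i in range(reuse, len(self.frames)):
            if i % self.n == 0 and self.frames[i]["ra"] != STUB:
                self.saved_ra[i] = self.frames[i]["ra"]
                self.frames[i]["ra"] = STUB
        return rescanned
```

The mutator pays nothing on an ordinary return; only returns through a marked frame run the stub, which is the advantage over the per-frame flag of the first method.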
![Page 23: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/23.jpg)
Marking the stack - Method 2
• Problems with this method:
– Functions do not always return normally.
– When an exception is raised, handlers are tried in stack order until a matching handler is found, so the stub functions are skipped.
– Fortunately, we can maintain a value M, updated on every exception, that holds the shallowest stack pointer reached as a result of a raised exception.
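One way to read the role of M is sketched below. It assumes frames are numbered from 0 (oldest), that `marker_depths` lists the frames whose markers have not fired, and that `m` is the smallest stack depth reached by any exception unwind since the last GC (`None` if no exception occurred). The function name and this encoding are illustrative only.

```python
def reusable_frames(marker_depths, m):
    """Number of oldest frames whose cached roots can be reused.

    A marker at depth d is trusted only if no exception unwound the
    stack to depth d or below, i.e. only if d < m.
    """
    valid = [d for d in marker_depths if m is None or d < m]
    # Frames strictly older than the youngest trusted marker are unchanged.
    return max(valid, default=0)
```

For instance, with markers at depths 0, 2 and 4, no exception allows reuse of the 4 oldest frames, while an unwind to depth 3 invalidates the marker at depth 4 and only the 2 oldest frames can be reused.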
![Page 24: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/24.jpg)
Stack Marker improvement
[Bar chart: improvement in % (axis from -10 to 80) per program. Programs: CheckSum, Color, FFT, Grobner, KB, Lexgen, Life, Nqueen, Peg, PIA, Simple.]
![Page 25: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/25.jpg)
Pretenuring
• Use profile data to predict the survival rate of an object.
• We speculate that objects allocated at the same place in the program have similar lifetimes.
• To check this hypothesis, we divide the program's heap allocations into allocation sites.
![Page 26: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/26.jpg)
Pretenuring - 2
• The compiler is modified to update a table of allocation sites whenever an object is created.
• During garbage collection the entries are updated.
• After each collection we scan the allocation area to locate dead objects and update the entries of their allocation sites.
![Page 27: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/27.jpg)
Pretenuring - 3
• Using this information we can compute statistics on the number, size, and average age of the objects created at each allocation site.
• We include only allocation sites that account for at least 1% of the allocations or 1% of the copied data.
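A small sketch of such a per-site table and of the 1% filter. The class `SiteProfile`, its method names, and the choice to count allocations by number and copied data by bytes are assumptions for the example; the slides do not specify the table layout.

```python
from collections import defaultdict


class SiteProfile:
    def __init__(self):
        self.alloc_count = defaultdict(int)   # site -> number of allocations
        self.copied_bytes = defaultdict(int)  # site -> bytes copied by the GC

    def record_alloc(self, site):
        """Called by the (hypothetically instrumented) allocation code."""
        self.alloc_count[site] += 1

    def record_copy(self, site, size):
        """Called by the collector when it copies an object from `site`."""
        self.copied_bytes[site] += size

    def significant_sites(self, threshold=0.01):
        """Sites with at least `threshold` of the allocations or copied data."""
        total_allocs = sum(self.alloc_count.values())
        total_copied = sum(self.copied_bytes.values())
        keep = []
        for site in sorted(set(self.alloc_count) | set(self.copied_bytes)):
            allocs = self.alloc_count.get(site, 0)
            copied = self.copied_bytes.get(site, 0)
            alloc_share = allocs / total_allocs if total_allocs else 0.0
            copy_share = copied / total_copied if total_copied else 0.0
            if alloc_share >= threshold or copy_share >= threshold:
                keep.append(site)
        return keep
```

A site that allocates rarely can still pass the filter if its objects account for much of the copied data, which is exactly the kind of site that pretenuring targets.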
![Page 28: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/28.jpg)
The profile results
![Page 29: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/29.jpg)
The profile results
![Page 30: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/30.jpg)
The results
• The results show that 90% of the allocations have a very short lifetime, but 96-99% of the copied data comes from 4 allocation sites.
![Page 31: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/31.jpg)
Using the profile data
• Objects created at allocation sites with long lifetimes are allocated directly in the older generation.
• Problem: an object allocated directly in the older generation may hold a reference to an object in the younger generation.
![Page 32: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/32.jpg)
Solutions ?
• Allocate that type of object in the young generation.
– May lead to a lot more copying.
• Remember the areas of the older generation that hold references into the young generation, and scan them on each minor collection.
– Scanning without copying does not take much time.
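The second solution can be sketched as a scan that produces extra roots for the minor collection. The object model (dicts with a `fields` list, dict-valued fields are pointers) and the names `extra_roots_from_pretenured` and `is_young` are assumptions for the example.

```python
def extra_roots_from_pretenured(pretenured_region, is_young):
    """Scan remembered older-generation objects without copying them.

    Returns the young objects they reference; a minor collection would
    treat these as additional roots.
    """
    roots = []
    for obj in pretenured_region:
        for field in obj["fields"]:
            if isinstance(field, dict) and is_young(field):
                roots.append(field)
    return roots
```

The remembered objects themselves stay where they are, so the cost is a read-only pass over the region rather than a copy.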
![Page 33: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/33.jpg)
Improvement of pretenuring (ms)
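| Program | Generational K=1.5 | Generational K=2.0 | Generational K=4.0 | With pretenuring K=1.5 | With pretenuring K=2.0 | With pretenuring K=4.0 | % Improve |
|---|---|---|---|---|---|---|---|
| Knuth-Bendix | 7.66 | 8.00 | 8.07 | 1.44 | 1.76 | 1.88 | 33 |
| Lexgen | 3.20 | 2.58 | 2.43 | 2.63 | 2.00 | 1.55 | 27 |
| Nqueen | 1.83 | 1.86 | 1.95 | 13.88 | 14.03 | 13.53 | 50 |
| Simple | 5.05 | 4.81 | 4.33 | 3.58 | 3.74 | 3.71 | 12 |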
![Page 34: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/34.jpg)
Improvement of pretenuring (bytes copied)
| Program | Generational K=1.5 | Generational K=2.0 | Generational K=4.0 | With pretenuring K=1.5 | With pretenuring K=2.0 | With pretenuring K=4.0 | % Improve |
|---|---|---|---|---|---|---|---|
| Knuth-Bendix | 14,569,800 | 17,869,436 | 17,695,560 | 2,050,212 | 5,376,156 | 5,151,708 | 70 |
| Lexgen | 27,427,544 | 18,647,632 | 16,435,292 | 24,278,388 | 15,452,696 | 13,397,340 | 18 |
| Nqueen | 5,312,548 | 5,312,548 | 5,312,548 | 194,256 | 194,256 | 194,256 | 96 |
| Simple | 25,771,348 | 25,431,144 | 25,430,248 | 14,241,500 | 14,734,176 | 14,133,376 | 44 |
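The slides do not say how the "% Improve" column is computed. As a sanity check, assuming it is the relative reduction in bytes copied, the Nqueen row (whose figures are identical for every K) can be reproduced as follows; the function name is illustrative.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Nqueen: bytes copied without and with pretenuring (same for every K).
nqueen = percent_reduction(5_312_548, 194_256)
print(f"Nqueen: {nqueen:.1f}% fewer bytes copied")
```

Rounded to a whole number this gives 96, matching the table. For the other programs the result depends on K, so which column the reported percentage refers to remains an open question.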
![Page 35: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/35.jpg)
Comparing between all the methods
[Bar chart (axis from 0 to 120) comparing three collectors per program: Generational, Stack Markers, and Pretenuring with Stack Markers. Programs: Color, Grobner, KB, Lexgen, Life, Nqueen, PIA, Simple.]
![Page 36: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/36.jpg)
Conclusion for pretenuring
• The reduction in GC time is smaller than expected from the reduction in data copied.
• Since we still have to check the younger generations, the cost of GC remains proportional to the live data (with a smaller constant).
![Page 37: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/37.jpg)
Suggestion to improve the speed
• Perform control-flow and data-flow analysis on objects.
![Page 38: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/38.jpg)
Conclusions
• The generational collector is twice as fast in GC time; it also improves cache locality, which further reduces GC time.
• For programs with deep stacks, caching the root data can improve GC time by up to 74%.
• Profiling the heap can improve speed by 50% in some cases.
![Page 39: Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch (moti@nmt.co.il)](https://reader036.vdocuments.net/reader036/viewer/2022081514/56649d425503460f94a1dd11/html5/thumbnails/39.jpg)
The End