AIM Products and Technology
IBM Presentations | July 21, 2004 © 2004 IBM Corporation
Modron
Production world view of Garbage Collection in the J2ME and J2SE spaces
IBM Products
J9/Modron © 2004 IBM Corporation
What this talk is about
Description of the lines in the sand we draw between the various parts of the memory manager in J9
– Allocation, garbage collection, free list management
Comparison of the collectors in the J2ME and J2SE spaces
– Limitations of the environment, and how they affect what you can do
Examination of some of the performance-related issues associated with the decisions that get made
– For better or for worse (or of no consequence?)
A quick history lesson
J9 was started as a clean-room Java VM implementation for the embedded (J2ME) space
– Small, HotSwap debugging, JVMPI, TCK compliant
– Garbage collection was a single-threaded generational solution
Fast forward…
J9 continues to be clean-room, but is also targeted at the desktop and server (J2SE) space
– Keep things small, but features add size
– Scalable collection strategy (CPU + Memory)
The problem space
The original problem space starts with small devices
– Hand-helds, cell phones and air conditioners
– Limited OS support, sometimes none
– Threading packages are suspect
– Hardware is simplistic
As we move forward, desktops and servers enter the picture
– Desktop machines: web browsers, development environments (Eclipse)
– Servers: WebSphere
Breaking down the components
The highest-level concept that has differing properties is the Heap.
The problem space drives how the heap is organized
– Virtual memory?
– Contiguous or non-contiguous?
Based on how the heap is divided (physically, or virtually through collection strategies), each area is associated with a Segment.
[Figure: the Heap divided into segments]
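A minimal sketch of a segmented heap, assuming the heap is built from non-contiguous segments (for example, separate malloc() calls) kept sorted by base address. `Heap`, `Segment`, `add_segment` and `segment_for` are hypothetical names, not J9's API; the sketch only shows how "which segment holds this address?" can be answered with a binary search.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base: int  # first address in the segment
    top: int   # one past the last address


class Heap:
    """Hypothetical heap made of non-contiguous segments, sorted by base address."""

    def __init__(self):
        self.segments = []
        self._bases = []  # parallel list of base addresses for bisect

    def add_segment(self, base, size):
        seg = Segment(base, base + size)
        i = bisect.bisect_left(self._bases, base)
        self._bases.insert(i, base)
        self.segments.insert(i, seg)
        return seg

    def segment_for(self, address):
        """Return the segment containing address, or None if it is outside the heap."""
        i = bisect.bisect_right(self._bases, address) - 1
        if i >= 0 and address < self.segments[i].top:
            return self.segments[i]
        return None
```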
Breaking down the components
A Memory Space describes the garbage collection strategy that is applied to the heap.
– Mark/Sweep/Compact (single area – “flat”)
– Generational semi-space copying collector
– Is the topmost container for the segments that divide the heap
A typical heap has only a single Memory Space
– J9 supports multiple memory spaces in a single heap
– “Lightweight processes”
The Memory Space's responsibility is to identify which collection strategies apply to the part of the heap it manages
Breaking down the components
Memory Spaces are divided into Memory Subspaces, which associate the different parts of the Memory Space with different collection strategies.
A Memory Subspace is responsible for handling allocation requests and failures, as well as garbage collection requests, made on different parts of the heap.
[Figure: a Memory Space divided into New and Old Memory Subspaces over the heap segments]
Breaking down the components
A Memory Subspace that can allocate from the heap uses a Memory Pool, which handles adding entries to and removing them from the free list
A Pool is only responsible for the management of the free list
– Not responsible for garbage collection
– Not responsible for object initialization
– Handles any synchronization issues on the free list
[Figure: the New subspace uses a bump-pointer pool; the Old subspace uses an address-ordered free-list pool]
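To make the Pool's narrow job concrete, here is a sketch of an address-ordered free-list pool, assuming first-fit allocation and coalescing of adjacent entries on release. `AddressOrderedPool` and its methods are hypothetical, and the lock stands in for whatever synchronization the real pool uses. The pool never collects or initializes objects; it only hands out and takes back address ranges.

```python
import threading


class AddressOrderedPool:
    """Hypothetical free-list pool: entries sorted by address, first-fit allocation."""

    def __init__(self, base, size):
        self.free = [(base, size)]      # (address, size) pairs, sorted by address
        self._lock = threading.Lock()   # the pool handles its own synchronization

    def allocate(self, size):
        with self._lock:
            for i, (addr, length) in enumerate(self.free):
                if length >= size:
                    if length == size:
                        del self.free[i]
                    else:
                        self.free[i] = (addr + size, length - size)  # split the entry
                    return addr
            return None  # allocation failure: the subspace decides what happens next

    def release(self, addr, size):
        with self._lock:
            i = 0
            while i < len(self.free) and self.free[i][0] < addr:
                i += 1
            self.free.insert(i, (addr, size))
            # Coalesce with the following entry if they touch.
            if i + 1 < len(self.free) and addr + size == self.free[i + 1][0]:
                self.free[i] = (addr, size + self.free[i + 1][1])
                del self.free[i + 1]
            # Coalesce with the preceding entry if they touch.
            if i > 0:
                prev_addr, prev_size = self.free[i - 1]
                if prev_addr + prev_size == addr:
                    self.free[i - 1] = (prev_addr, prev_size + self.free[i][1])
                    del self.free[i]
```

Releasing three adjacent 30-byte blocks in any order folds them back into a single free entry.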
Memory Space Breakdown
“Flat” Memory Space
– Memory Space: container class that groups regions of memory into a “space”
– Memory Subspace: focal point for allocation, failed allocation and collection
– Memory Pool: handles allocation for a particular memory space
Breaking down the components
Memory Subspaces also have Collectors associated with them as part of the allocation failure handling process
– A subspace can call for a collection if its associated pool fails the allocation request
A collection of a particular subspace is responsible for the memory associated with it and with all its child subspaces
[Figure: New and Old subspaces under a Generational subspace, with a Local GC and a Global GC]
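The allocation-failure path can be sketched as follows, assuming a subspace that owns one pool and one collector callback and retries once after collecting. `MemorySubspace`, `BumpPool` and the retry-once policy are illustrative assumptions; a real subspace might also ask its Physical Sub Arena to expand before giving up.

```python
class BumpPool:
    """Tiny stand-in pool (hypothetical) so the sketch runs on its own."""

    def __init__(self, size):
        self.size = size
        self.top = 0

    def allocate(self, size):
        if self.top + size > self.size:
            return None
        addr = self.top
        self.top += size
        return addr


class MemorySubspace:
    """Hypothetical subspace: the focal point for allocation and allocation failure."""

    def __init__(self, pool, collector):
        self.pool = pool
        self.collector = collector  # callable(subspace) that frees memory in the pool

    def allocate(self, size):
        addr = self.pool.allocate(size)
        if addr is None:
            # Allocation failure: ask the associated collector, then retry once.
            self.collector(self)
            addr = self.pool.allocate(size)
        return addr  # None here means out of memory (or time to expand)
```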
Breaking down the components
Expansion and contraction of the various Memory Subspaces is handled by Physical Arenas (PA) and Physical Sub Arenas (PSA).
– PAs are associated with the Memory Space
– PSAs are responsible for communicating directly with the heap (and their governing PA) when allocating or releasing memory
[Figure: a Memory Space with New and Old subspaces; a Physical Arena with one Physical Sub Arena per subspace, over the heap segments]
Breaking down the components
[Figure: overview of the components: Heap and segments, Memory Space, Memory Subspace, Memory Pool, GC, Physical Arena and Physical Sub Arena]
VM Facilities
Whenever I hear this:
We’d love to write a collector for J9!
It is usually followed by these two questions:
1. How do I “stop the world” so that I can collect?
2. How do I walk the stacks to find all references, or can we even do that?
VM Facilities – Stop The World
J9 uses a co-operative suspend mechanism
– Java threads are either actively mutating the heap or are external to the VM (e.g., JNI calls)
– All threads are sent asynchronous messages to “suspend” through VM facilities
– Threads external to the VM during a suspend request are locked from re-entering the VM
– When all threads have been accounted for, the stop is successful
Motivation: Not all embedded platforms have thread packages that work.
Question: But why not use thread package facilities when available?
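The co-operative protocol can be modelled with Python threads. This is a sketch under stated assumptions, not J9's implementation: mutators call a hypothetical `safepoint()` where their stacks are in a known state, and threads that leave the VM (as a JNI call would) are blocked in `enter_vm()` until the world is resumed. `stop_the_world()` returns only once every thread inside the VM has been accounted for.

```python
import threading


class VM:
    """Hypothetical model of co-operative suspend (not J9's implementation)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._suspend_requested = False
        self._in_vm = 0         # threads currently inside the VM (mutating the heap)
        self._at_safepoint = 0  # in-VM threads that have acknowledged a suspend

    def enter_vm(self):
        """Called when a thread (re-)enters the VM, e.g. returning from native code."""
        with self._cond:
            while self._suspend_requested:  # locked out while the world is stopped
                self._cond.wait()
            self._in_vm += 1

    def exit_vm(self):
        """Called when a thread leaves the VM; it no longer needs to be stopped."""
        with self._cond:
            self._in_vm -= 1
            self._cond.notify_all()  # the collector may be waiting on this thread

    def safepoint(self):
        """Polled by mutators at points where their stack is in a known state."""
        with self._cond:
            if not self._suspend_requested:
                return
            self._at_safepoint += 1
            self._cond.notify_all()
            while self._suspend_requested:
                self._cond.wait()
            self._at_safepoint -= 1

    def stop_the_world(self):
        with self._cond:
            self._suspend_requested = True  # the "message" every thread will see
            while self._at_safepoint < self._in_vm:
                self._cond.wait()  # wait until every in-VM thread is accounted for

    def resume_the_world(self):
        with self._cond:
            self._suspend_requested = False
            self._cond.notify_all()
```

The design choice this illustrates is that the VM never needs the thread package to suspend a thread asynchronously; it only needs a condition variable that the threads themselves check.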
VM Facilities – Scanning stacks
Part of the root set walk is finding all references on stacks. If you are unable to tell where references are on the stack, you must pessimistically treat anything that looks like a reference as one (conservative scanning).
Co-operative suspend model: part of the agreement is to leave the stack in a well-understood state
– This includes being able to find all references at all times
The collector can find all references through a stack walk
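The difference between the two approaches fits in a few lines, assuming a toy model where a stack is a list of machine words, the set of valid object start addresses plays the role of an object allocation map, and each frame has a hypothetical per-slot reference map. A conservative scan must keep alive anything that looks like a reference; a precise scan keeps only real references.

```python
# Toy model (hypothetical): stack words are ints, objects are identified by start address.

def conservative_roots(stack_words, heap_base, heap_top, object_starts):
    """Keep anything that *looks* like a reference: a word that falls inside the
    heap and matches a known object start (object_starts acts as an allocation map)."""
    return {w for w in stack_words
            if heap_base <= w < heap_top and w in object_starts}


def precise_roots(stack_words, ref_map):
    """With the stack in a well-understood state, a (hypothetical) per-slot
    reference map says exactly which words are references."""
    return {w for w, is_ref in zip(stack_words, ref_map) if is_ref}
```

With a stack of `[0x1040, 0x1080, 42, 0x1000]` where `0x1080` is really an integer, the conservative scan retains the object at `0x1080` (and could not move it, since the "reference" cannot safely be updated), while the precise scan does not.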
The Heap Lock
[Chart: SPECjbb2000 (http://www.spec.org/jbb2000/) throughput versus number of warehouses]
The Heap Lock
Possibly the single most important item for scaling!
– Misleading term; the focus is on possible contention in acquiring memory for allocation in the heap.
Simplest example: the bump pointer.
Guarantee a compacted heap, and allocating from the single free entry is easy
– Inlining this allocation (JIT, VM) is also easy.
[Figure: heap with the allocate pointer at the boundary between used and free memory]
(Always compacting the entire heap may be slow, but there are ways to mitigate this)
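A bump-pointer allocator over one contiguous free area can be sketched as below. It is a sketch only: the lock stands in for the atomic compare-and-swap an inlined JIT allocation path would use, and `BumpAllocator` is a hypothetical name.

```python
import threading


class BumpAllocator:
    """Hypothetical bump-pointer allocator over one contiguous free area."""

    def __init__(self, base, size):
        self.alloc_ptr = base          # start of the free space
        self.top = base + size         # end of the heap
        self._lock = threading.Lock()  # stands in for an atomic compare-and-swap

    def allocate(self, size):
        with self._lock:               # every allocating thread contends here
            addr = self.alloc_ptr
            if addr + size > self.top:
                return None            # allocation failure: collect (and compact)
            self.alloc_ptr = addr + size
            return addr
```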
The Heap Lock
Problem with the bump pointer: it does not handle contention well.
– Many threads, many CPUs: much looping trying to bump the pointer
– CPU is busy, but doing nothing
– Bus lock contention attempting to atomically change the value
Reduce contention on the lock by going to it less: “batch” allocate more heap than we need each time
The Heap Lock
Thread Local Heaps (TLH)
Allocate a region of memory from the free list each time
– Region is local to a thread only (no contention to allocate out of it)
– Reduces the time spent on the lock per object
[Figure: heap with thread-local regions carved out of the free list]
Works well for a true free list system (fragmented heap).
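A thread-local heap can be sketched on top of a shared bump area. The sketch assumes a fixed TLH size and sends objects larger than a TLH straight to the shared heap; `SharedHeap`, `ThreadLocalHeap` and `lock_acquisitions` are hypothetical names used to show that the lock is taken once per refill rather than once per object.

```python
import threading


class SharedHeap:
    """Hypothetical shared free area; every refill takes the heap lock once."""

    def __init__(self, size):
        self.ptr = 0
        self.top = size
        self.lock = threading.Lock()
        self.lock_acquisitions = 0

    def take(self, size):
        with self.lock:
            self.lock_acquisitions += 1
            if self.ptr + size > self.top:
                return None
            start = self.ptr
            self.ptr += size
            return start


class ThreadLocalHeap:
    """A region owned by one thread; allocating inside it needs no lock."""

    def __init__(self, heap, tlh_size):
        self.heap = heap
        self.tlh_size = tlh_size
        self.cur = self.top = 0  # empty until the first refill

    def allocate(self, size):
        if size > self.tlh_size:
            return self.heap.take(size)  # too big for a TLH: go to the shared heap
        if self.cur + size > self.top:
            # TLH exhausted: the unused tail is abandoned (in a real heap it would
            # go back to the free list or become dark matter).
            start = self.heap.take(self.tlh_size)
            if start is None:
                return None
            self.cur, self.top = start, start + self.tlh_size
        addr = self.cur
        self.cur += size
        return addr
```

With 32-byte objects and 1,024-byte TLHs, 100 allocations take the lock 4 times instead of 100.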
The Heap Lock
A few key points to making TLHs work:
Need to guarantee some form of minimum size on the free list
– Too small, and you’ve gone back to single lock/single object
– Too large, and you unnecessarily fragment the heap (dark matter)
– Variable? Might make sense, if the average object size changes
Vary the rate of consumption when allocating a TLH
– Threads that allocate frequently grow their TLH consumption rate (hungry threads get fed more)
– Quiet threads keep a low consumption rate
Minimum free list size is the only guarantee on what you’ll get back!
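The sizing rules can be sketched as two small functions, assuming doubling and halving as the growth policy and a free list of `(address, size)` pairs. Both the policy and the names `next_tlh_request` and `carve_tlh` are hypothetical; the point is that the request is only a hint, and the minimum size is the one thing the free list guarantees.

```python
def next_tlh_request(current, minimum, maximum, refilled_recently):
    """Hypothetical policy: hungry threads double their request; quiet ones decay."""
    if refilled_recently:
        return min(maximum, current * 2)
    return max(minimum, current // 2)


def carve_tlh(free_entries, request, minimum):
    """Carve up to `request` bytes from the first free entry of at least `minimum`.

    free_entries is a list of (address, size) pairs and is updated in place.
    Returns (address, size) or None; the size may be less than requested.
    """
    for i, (addr, size) in enumerate(free_entries):
        if size < minimum:
            continue  # too small to be worth a TLH
        got = min(size, request)
        if size - got < minimum:
            got = size  # don't leave a remainder below the minimum on the list
        if got == size:
            del free_entries[i]
        else:
            free_entries[i] = (addr + got, size - got)
        return addr, got
    return None
```

A second 2,048-byte request against a 952-byte remainder returns all 952 bytes: less than asked for, but never less than the minimum.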
Object (Header) Overhead
One of the most asked questions from customers:
What is your object header size?
Motivation for the question is really two things:
1. Reduce memory footprint (embedded space)
2. Reduce frequency of garbage collection (more efficient use of the heap)
The better question is: What is the average overhead per object with respect to the heap and total memory?
Object (Header) Overhead
Heap factors:
– Alignment? More than pointer width means a chance for wastage
– Monitor slot, hash code: Does it exist? What is the cost of creating it?
Non-heap factors: meta-level structures for completing collection
– Mark maps
– Card tables
Meta-level data for completing collection
– Reference object description
– Object allocation map
This does not even include the cost associated with execution!
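One way to answer the better question is to do the arithmetic per object. The sketch assumes a fixed-size header, rounding up to a fixed alignment, and a one-bit-per-slot mark map as the only non-heap structure; every parameter is illustrative rather than a J9 value.

```python
def average_overhead_per_object(payload_sizes, header_bytes, alignment,
                                slot_bytes=4, mark_bits_per_slot=1):
    """Illustrative model: average bytes of overhead per object, counting header
    and alignment padding on the heap, plus the share of a mark map (a non-heap
    structure) covering the object."""
    total = 0.0
    for payload in payload_sizes:
        raw = header_bytes + payload
        aligned = -(-raw // alignment) * alignment  # round up to the alignment
        padding = aligned - raw
        mark_map_bytes = (aligned // slot_bytes) * mark_bits_per_slot / 8
        total += header_bytes + padding + mark_map_bytes
    return total / len(payload_sizes)
```

For example, with an 8-byte header, 8-byte alignment and 4-byte slots, payloads of 4 and 12 bytes cost 12.5 and 12.75 bytes of overhead respectively, an average of 12.625 bytes.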
The Collectors
J9 has historically had a generational garbage collector
– Two generations
– New area collected by a semi-space copying collector
– Old area collected by mark/sweep/compact (+ new area)
– Always compacted, always stayed small
– Non-contiguous address space for heap
J9 continues to be generational, and also offers a single-generation collector
– Two configurations, tiny and standard (parallel)
– New area is optional
– Compaction is optional
Global Collector
There are 3 parts to the global collector:
Marking
– Traces through root sets and objects to find all live objects in the system
Sweep
– Finds all objects that are dead (or that have died previously) and forms the free list
Compact
– Shuffles live objects in memory so that free entries form large contiguous chunks
Global Collector - Marking
For small devices, the less extra data you carry around, the better
– Already a small amount of memory
Keep in mind the heap can be a series of malloc()’d pieces of memory
– Why not allocate a big chunk? Can’t, or don’t want to.
To actually mark an object as live, set a bit in the object header
– Class slot (which is aligned) is as good as any
Trace through marked objects by keeping a list of what has been visited
– This can be achieved by adding two slots to the Class type
Global Collector - Marking
[Figure: instance linking. Classes A, M and X each carry classLink and instanceLink slots; objects A1, A2, M1 and X1 each carry a classPointer]
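The diagram leaves the exact linking scheme open, so the following is one plausible reading, written as a sketch: each class carries the two extra words (`class_link` chains classes that have unscanned marked instances, `instance_link` heads a chain of those instances), and while an object sits on its class's chain its class slot is assumed to hold the next instance, since the class is implied by the chain it is on. The `marked` attribute stands in for the mark bit in the aligned class slot. All names are hypothetical.

```python
class Klass:
    """Hypothetical type with the two extra words used for tracing."""

    def __init__(self, name):
        self.name = name
        self.class_link = None     # next class with unscanned marked instances
        self.instance_link = None  # first unscanned marked instance of this class


class Obj:
    def __init__(self, klass, *fields):
        self.class_ptr = klass     # normally the object's class
        self.marked = False        # stands in for the mark bit in the class slot
        self.fields = list(fields)


class InstanceLinkingMarker:
    def __init__(self):
        self.pending = None  # head of the classLink chain

    def mark(self, obj):
        if obj is None or obj.marked:
            return
        obj.marked = True
        cls = obj.class_ptr
        if cls.instance_link is None:      # class not on the chain yet: link it in
            cls.class_link = self.pending
            self.pending = cls
        obj.class_ptr = cls.instance_link  # ASSUMPTION: class slot reused as the link
        cls.instance_link = obj

    def trace(self, roots):
        for root in roots:
            self.mark(root)
        while self.pending is not None:
            cls = self.pending
            obj = cls.instance_link
            cls.instance_link = obj.class_ptr   # unlink the instance
            if cls.instance_link is None:       # chain drained: unlink the class too
                self.pending = cls.class_link
                cls.class_link = None
            obj.class_ptr = cls                 # restore the real class pointer
            for ref in obj.fields:              # scan the whole instance in one shot
                self.mark(ref)
```

The invariant that keeps this correct is that a class is on the classLink chain exactly when its instanceLink is non-null, which is why a class is unlinked as soon as its last pending instance is taken.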
Global Collector - Marking
Positives:
– Overhead of 2 words per type in the system for tracing, +1 bit to declare the object as marked
– Trace through an instance entirely in one shot
Negatives:
– The entire heap gets written to (caching), twice: once to mark and once to clear the “mark” bit
– Not parallel friendly
Global Collector - Marking
On larger machines, the heap is contiguous memory
– A mark map is used to track objects
– Single bit per slot on the heap (one bit per 32-bit slot, 1/32 = 3.125% overhead)
[Figure: the heap and its mark map]
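A mark map with one bit per slot can be sketched with a `bytearray`, assuming 4-byte slots and a heap starting at `heap_base`. `MarkMap` is a hypothetical name; a parallel marker would set bits with an atomic operation, which this single-threaded sketch does not attempt. The `overhead` property confirms the 1/32 figure, and shows it halves with 8-byte slots.

```python
class MarkMap:
    """One mark bit per heap slot, stored outside the heap (hypothetical layout)."""

    def __init__(self, heap_base, heap_size, slot_bytes=4):
        self.heap_base = heap_base
        self.heap_size = heap_size
        self.slot_bytes = slot_bytes
        slots = heap_size // slot_bytes
        self.bits = bytearray((slots + 7) // 8)

    def _locate(self, address):
        slot = (address - self.heap_base) // self.slot_bytes
        return slot >> 3, 1 << (slot & 7)   # byte index, bit mask

    def mark(self, address):
        """Set the bit; True only if this call marked the object (not atomic here)."""
        index, mask = self._locate(address)
        if self.bits[index] & mask:
            return False
        self.bits[index] |= mask
        return True

    def is_marked(self, address):
        index, mask = self._locate(address)
        return bool(self.bits[index] & mask)

    @property
    def overhead(self):
        return len(self.bits) / self.heap_size
```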
Global Collector - Marking
Tracing live objects has 2 stages
– Gathering all root references (stacks, JNI references, constant tables, classes)
– Recursive scanning of live objects until no new objects remain
A marking thread uses a WorkStack
– Push objects that it has successfully marked
– Pop objects whose fields should be scanned
Objects have all their fields scanned at the same time
– Cache-friendly technique
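The two stages can be sketched with an explicit stack instead of recursion, assuming a toy object model where `refs_of(obj)` returns an object's reference fields. `mark_live` is a hypothetical name. Objects are pushed only when this call marked them, and all fields of a popped object are scanned together.

```python
def mark_live(roots, refs_of):
    """Trace from roots with an explicit work stack (hypothetical object model:
    refs_of(obj) returns the references held in obj's fields)."""
    marked = set()
    work_stack = []
    for root in roots:                      # stage 1: gather root references
        if root is not None and root not in marked:
            marked.add(root)                # push only what we successfully marked
            work_stack.append(root)
    while work_stack:                       # stage 2: scan until nothing new remains
        obj = work_stack.pop()
        for ref in refs_of(obj):            # scan all of obj's fields at once
            if ref is not None and ref not in marked:
                marked.add(ref)
                work_stack.append(ref)
    return marked
```

Because the stack is explicit, a chain of 100,000 objects is traced without hitting Python's recursion limit.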
Global Collector - Marking
The WorkStack uses an input/output system
– Queue for items to process (objects whose fields it will scan)
– Queue for objects it has found and successfully marked
[Figure: WorkStack with an input queue and an output queue; the next object to scan is shown with its header and object fields]
Global Collector - Marking
The queue used by the WorkStack is called a WorkPacket
– There are many WorkPacket queues in the collector
– The WorkPackets object manages full (to be processed) and empty (available for filling) WorkPacket objects
[Figure: a WorkStack holding an input and an output WorkPacket; the WorkPackets object holding empty and full WorkPackets]
Global Collector - Marking
Of course… we only have finite resources.
– You can run out of WorkPackets, so how do you handle overflow?
1. Take a full packet, and move its contents to an overflow “list” to be processed later
2. Overflow avoidance
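WorkPackets and the first overflow strategy can be sketched as follows. Packets are plain lists of a fixed capacity; when a thread publishes a full packet and no empty one is left, the sketch moves a full packet's contents to a shared overflow list and reuses the packet, and threads refill from that list once no full packets remain. `WorkPackets`, `WorkStack` and their methods are hypothetical, and the termination test in `pop()` is only correct for a single marking thread.

```python
import threading


class WorkPackets:
    """Hypothetical shared pool of fixed-capacity packets for marking threads."""

    def __init__(self, count, capacity):
        self.capacity = capacity
        self._empty = [[] for _ in range(count)]
        self._full = []
        self.overflow = []  # option 1: contents of spilled packets, processed later
        self._lock = threading.Lock()

    def get_empty(self):
        with self._lock:
            return self._empty.pop()  # assumes count >= 2 per WorkStack

    def swap_full_for_empty(self, full_packet):
        """Publish a full output packet and get an empty one back."""
        with self._lock:
            self._full.append(full_packet)
            if self._empty:
                return self._empty.pop()
            packet = self._full.pop()       # out of packets: spill one to overflow
            self.overflow.extend(packet)
            packet.clear()
            return packet

    def swap_empty_for_work(self, spare):
        """Trade an empty input packet for a full one, or refill it from overflow."""
        with self._lock:
            if self._full:
                self._empty.append(spare)
                return self._full.pop()
            if self.overflow:
                spare.extend(self.overflow[-self.capacity:])
                del self.overflow[-self.capacity:]
                return spare
            return None


class WorkStack:
    """Per-thread view: pop from an input packet, push to an output packet."""

    def __init__(self, packets):
        self.packets = packets
        self.input = packets.get_empty()
        self.output = packets.get_empty()

    def push(self, obj):
        if len(self.output) == self.packets.capacity:
            self.output = self.packets.swap_full_for_empty(self.output)
        self.output.append(obj)

    def pop(self):
        if not self.input:
            work = self.packets.swap_empty_for_work(self.input)
            if work is not None:
                self.input = work
            elif self.output:   # no shared work: consume our own output packet
                self.input, self.output = self.output, self.input
            else:
                return None     # single-threaded termination only
        return self.input.pop()
```

No marked object is ever lost: it is always in an input packet, an output packet, a full packet, or the overflow list.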
Global Collector - Marking
Reference array splitting
– Do not scan all references at once
– Defer scanning to the output WorkPacket
– API to push two elements: Array and Index
[Figure: WorkStack input and output queues; a reference array is deferred to the output queue as an (array, index) pair]
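Reference array splitting can be sketched by letting work-stack entries be `(object, start_index)` pairs, assuming a toy model where reference arrays are Python lists and other objects are dicts with a `fields` list. `mark_with_array_splitting` and its `chunk` parameter are hypothetical. The function also reports the peak work-stack depth, to show how splitting bounds it.

```python
def mark_with_array_splitting(roots, chunk=4):
    """Toy model (hypothetical): reference arrays are Python lists, other objects
    are dicts with a 'fields' list. Returns (ids of marked objects, peak depth)."""
    marked = set()
    work = []   # entries are (object, start_index): the two-element push

    def mark(obj):
        if obj is not None and id(obj) not in marked:
            marked.add(id(obj))
            work.append((obj, 0))

    for root in roots:
        mark(root)
    peak = len(work)
    while work:
        obj, start = work.pop()
        if isinstance(obj, list):            # reference array: scan one slice
            end = min(start + chunk, len(obj))
            if end < len(obj):
                work.append((obj, end))      # defer the rest as (array, index)
            for ref in obj[start:end]:
                mark(ref)
        else:
            for ref in obj["fields"]:
                mark(ref)
        peak = max(peak, len(work))
    return marked, peak
```

For an array of 10 references, a `chunk` of 3 keeps the peak depth at 4 entries, whereas scanning it whole (`chunk=100`) pushes all 10 at once.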
Global Collector - Marking
Positives:
– Parallel story for tracing
– Also included a work-sharing story
– “Marking” is a more localized operation
Negatives:
– Destroyed part of the locality for tracing (but was it any better before? We could improve this anyway)
– Memory overhead
Global Collector - Sweep
Find everything that is dead (or has died previously) and try to add it to the free list.
For the small collector this is easy:
– Walk the heap memory, unmarking live objects and coalescing unmarked objects into the free list
– A very single-threaded approach
For the large collector, this is almost as easy:
– Walk the mark map, finding ranges of zeroes that might be free list candidates, and process them
This doesn’t appear to need any extra data structures, unless you parallelize it.
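The large collector's sweep can be sketched over a list of mark bits, assuming one bit per slot set only at the start of each live object, and a `size_of(slot)` helper standing in for reading the object's size from its header. Because a live object's body also shows up as zeroes, the walk skips each live object by its size before looking for the next run. Runs shorter than a hypothetical `min_free` are left off the free list as dark matter. A real implementation would test whole words of the mark map at a time.

```python
def sweep_mark_map(mark_bits, size_of, min_free=2):
    """Toy model (hypothetical): mark_bits[i] is 1 where a live object starts;
    size_of(i) returns that object's size in slots. Returns free entries as
    (start_slot, length); runs shorter than min_free are left as dark matter."""
    free = []
    i, n = 0, len(mark_bits)
    while i < n:
        if mark_bits[i]:
            i += size_of(i)              # skip over the live object
            continue
        start = i
        while i < n and not mark_bits[i]:
            i += 1                       # extend the run of zeroes
        if i - start >= min_free:
            free.append((start, i - start))
    return free
```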
Global Collection - Sweep
[Figure: the heap divided into sweep chunks, each building its own inner free list]
Global Collection - Sweep
[Figure: a sweep chunk with its leading free entry and a projection]
Each chunk records 3 things
– Inner free list
– Leading free entry
– Projection
Chunks are then connected
Global Collection - Sweep
Connecting chunks has a few gotchas
– Chunks that appear completely empty
– Projections can span several chunks
– Chunks that appeared to be completely free are in fact consumed
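Connecting chunks can be sketched as below, reusing a toy model of mark bits at object starts and a `size_of` helper for object sizes. This is a sketch under stated assumptions: each chunk is swept independently, recording its leading free run, its other free runs, and its projection (taken here to mean where its last live object ends, if that is past the chunk's end); runs are half-open `(start, end)` slot ranges. The connect step then trims runs consumed by an earlier projection, which handles both chunks that only appear empty and projections spanning several chunks, and joins runs that meet at a chunk boundary. All names are hypothetical.

```python
def sweep_chunk(mark_bits, size_of, lo, hi):
    """Sweep slots [lo, hi) without looking at any other chunk.

    Returns (leading, inner, projection_end): leading is the free run starting at lo
    (or None), inner holds the other free runs as (start, end) pairs, and
    projection_end is where the last live object starting here ends (hi if it
    does not reach past the chunk).
    """
    leading, inner = None, []
    i = lo
    while i < hi:
        if mark_bits[i]:
            i += size_of(i)              # skip the live object, possibly past hi
            continue
        start = i
        while i < hi and not mark_bits[i]:
            i += 1
        if start == lo:
            leading = (start, i)
        else:
            inner.append((start, i))
    return leading, inner, max(i, hi)


def connect_chunks(results, min_free=2):
    """Join per-chunk results, in address order, into (start, length) free entries."""
    merged = []
    covered_until = 0                    # furthest projection from earlier chunks
    for leading, inner, projection_end in results:
        runs = ([leading] if leading else []) + inner
        for start, end in runs:
            start = max(start, covered_until)  # consumed by an earlier projection?
            if start >= end:
                continue                 # the run only appeared to be free
            if merged and merged[-1][1] == start:
                merged[-1] = (merged[-1][0], end)  # joins the previous chunk's tail
            else:
                merged.append((start, end))
        covered_until = max(covered_until, projection_end)
    return [(s, e - s) for s, e in merged if e - s >= min_free]
```

Because `sweep_chunk` reads only its own range, the calls can run on different threads; only `connect_chunks` needs to run in address order.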
Local Collection
“Objects die young”
Semi-space copying collector for the nursery
– Typically a fraction of total heap size
Objects are promoted to old space when a copy threshold is reached
– The age is adaptive based on nursery population
– Tenure quickly based on size (not available, but not hard to add)
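A scavenge with an age threshold can be sketched in the Cheney style, assuming a toy model where objects are Python objects, the nursery is given as a list, and old objects that may point into it are passed in as a remembered set. `Obj`, `scavenge` and `tenure_age` are hypothetical names, and the fixed threshold is a simplification of the adaptive age described above. The `forward` attribute plays the role of the forwarding pointer; a parallel scavenger would have to install it atomically.

```python
class Obj:
    """Toy object (hypothetical)."""

    def __init__(self, *fields):
        self.fields = list(fields)
        self.age = 0          # scavenges survived
        self.forward = None   # forwarding pointer, set once the object has been copied


def scavenge(roots, remembered, nursery, tenure_age=3):
    """Copy live nursery objects, promoting those that reach tenure_age.

    roots: list of references; remembered: old objects that may point into the
    nursery. Returns (updated roots, survivor space, newly tenured objects).
    """
    in_nursery = {id(o) for o in nursery}
    survivor, tenured, to_scan = [], [], []

    def copy(obj):
        if obj is None or id(obj) not in in_nursery:
            return obj                    # null, or not in the nursery: not moved
        if obj.forward is None:           # first visit: copy, leave a forwarding pointer
            new = Obj(*obj.fields)
            new.age = obj.age + 1
            obj.forward = new
            (tenured if new.age >= tenure_age else survivor).append(new)
            to_scan.append(new)
        return obj.forward

    new_roots = [copy(r) for r in roots]
    for old in remembered:
        old.fields = [copy(f) for f in old.fields]
    i = 0
    while i < len(to_scan):               # Cheney-style scan of copied objects
        obj = to_scan[i]
        obj.fields = [copy(f) for f in obj.fields]
        i += 1
    return new_roots, survivor, tenured
```

Objects that are never reached (like an unreferenced nursery object) are simply not copied, which is why the cost of a scavenge tracks the number of survivors rather than the size of the nursery.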
Local Collection
[Figure: the Allocate, Survivor and Tenure areas during a scavenge]
Local Collection
Work is done in parallel
– Must atomically install a forwarding pointer
– Avoid contention when allocating memory for a flip or tenure
– Can use a TLH-style system
– Problem: can exceed the space available
[Figure: TLH-style copying into the Allocate and Survivor areas, with possibly wasted space]
Local Collection
A remembered set is used to record objects in Old space that (potentially) contain references to new space
– Set is grown by mutator, never shrunk
– Collector is responsible for shrinking
– Old objects appear only once in the remembered set
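The mutator and collector sides of the remembered set can be sketched as follows, assuming a write barrier on every reference store and a per-object flag (a hypothetical header bit) recording membership so that an old object is added at most once. `Obj`, `RememberedSet`, `write_barrier` and `prune` are hypothetical names.

```python
class Obj:
    """Toy object (hypothetical)."""

    def __init__(self, old=False):
        self.old = old            # True once the object lives in old space
        self.remembered = False   # hypothetical header bit: already in the set?
        self.fields = {}


class RememberedSet:
    def __init__(self):
        self.entries = []

    def write_barrier(self, src, field, value):
        """Mutator side: run on every reference store; only ever grows the set."""
        src.fields[field] = value
        if src.old and value is not None and not value.old and not src.remembered:
            src.remembered = True     # so each old object appears only once
            self.entries.append(src)

    def prune(self):
        """Collector side: drop objects that no longer reference new space."""
        keep = []
        for obj in self.entries:
            if any(v is not None and not v.old for v in obj.fields.values()):
                keep.append(obj)
            else:
                obj.remembered = False
        self.entries = keep
```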
Local Collection
[Figure: New Space, Old Space and the Remembered Set]
Relax contention on the Remembered Set with TLH style allocation
Local Collection – The age debate
How come you have more than one age group?
Copying objects between spaces many times isn’t a waste of effort if the sum of that work is less than the cost of a garbage collection:
– Cost to garbage collect
– Cost to potentially compact
– Cost in fragmentation of the old area (allocator)
Remember: Not only does the scavenger offer short collection times, it acts as a form of incremental compactor (and that helps allocation times)
Local Collection
“Objects die young.” Combine this statement with the fact that we have a mechanism to abort a scavenge.
Why have a 50/50 split between the allocation and survivor areas?
[Figure: two ways of splitting the nursery into allocate and survivor areas]
RAS (Reliability Accessibility Serviceability)
Bad things happen
– User native code runs rampant, corrupts your heap
– The VM/JIT has bugs
– The GC has bugs
When crashes occur in the field, debugging can be difficult
– Machine is inaccessible to you
– Machine may be too small to hold trace info or core file
– Core file may contain IP that the customer wants to protect
RAS (Reliability Accessibility Serviceability)
Assert()s and trace points are all well and good… but many GC-related bugs occur long before the crash point, even many collections earlier
Heap verification
– Before/after collection
– Triggered events (class unload, finalization, etc)
– Both structural and checksum style verification
– Also extend to verifying structures within VM (JNI Global References)
– Post mortem solutions (core files)
Runtime repairing of structures/heap also possible
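A structural heap check can be sketched over a toy heap description: a dict mapping each start address to `(kind, size, refs)`, where kind is `"obj"` or `"free"`. The sketch assumes objects and free entries tile the heap back to back, so a walk that does not land exactly on the next entry indicates corruption; it then checks that every reference from roots and objects points at an object start. `verify_heap` and the model are hypothetical, and checksum-style checks are not shown.

```python
def verify_heap(entries, heap_top, roots, heap_base=0):
    """Toy model (hypothetical): entries maps start address -> (kind, size, refs),
    kind being "obj" or "free". Returns a list of problems; empty means consistent."""
    problems = []
    starts = set()
    addr = heap_base
    while addr < heap_top:                  # structural walk, entry by entry
        entry = entries.get(addr)
        if entry is None:
            problems.append(f"walk lost at {addr:#x}: no object or free entry starts here")
            break
        kind, size, _ = entry
        if size <= 0:
            problems.append(f"bad size {size} at {addr:#x}")
            break
        if kind == "obj":
            starts.add(addr)
        addr += size
    if addr > heap_top:
        problems.append(f"last entry overruns the heap top {heap_top:#x}")
    for obj in sorted(starts):              # every reference must hit an object start
        for ref in entries[obj][2]:
            if ref is not None and ref not in starts:
                problems.append(f"object {obj:#x} has bad reference {ref:#x}")
    for ref in roots:
        if ref not in starts:
            problems.append(f"root {ref:#x} does not point at an object")
    return problems
```

Run before and after each collection, a check like this flags a corrupted size or a dangling reference close to where it was introduced rather than at the eventual crash.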
Resource Managed Support
As mentioned earlier, a typical “run” has a single Memory Space
– All threads allocate from the same heap hierarchy
– There is only one hierarchy (new/old, single, etc)
Some programs are designed in modular fashion, where individual tasks could be considered stand alone programs themselves
To avoid heap pollution, each of these “tasks” gets its own memory space in the heap
– If the sizing of the Memory Spaces is right, we can avoid GC
– When the task is complete, we can “destroy” the space
In summary…
Garbage collection isn’t hard to understand
– It’s occasionally painful to debug
– Having a model you can understand across all collectors is important
The decisions you make at one end affect the other
– Performance can affect footprint
– Footprint can affect performance
– Optimizing for certain scenarios can lead to large corner cases
The environment you are targeting affects all your decisions
– Size, speed, hardware features
Almost as important as your collector are the tools used to debug it
– Particularly the ones you write yourself