Apache Geode Off-heap Storage
![Page 1: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/1.jpg)
Off-heap Storage
![Page 2: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/2.jpg)
Agenda
• Motivation and goals for off-heap storage
• Off-heap features and usage
• Implementation overview
• Preliminary benchmarks: off-heap vs. heap
• Tips and best practices
![Page 3: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/3.jpg)
Motivation and goals for off-heap storage
![Page 4: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/4.jpg)
Why Off-heap
• Increase data density and reduce memory overhead
• 50+ GB user data in one JVM
• 10+ TB user data in one cluster
• Usable out-of-box without extensive GC tuning of JVM
• Maintain existing throughput performance
![Page 5: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/5.jpg)
Off-heap Usage and Features
![Page 6: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/6.jpg)
Off-heap: How Do I Use It?
• Set the off-heap memory size for the process
  – Using the new property: off-heap-memory-size
• Mark regions whose entry values should be stored off-heap
  – Using the new region attribute: off-heap (false | true)
• Adjust the JVM heap memory size down accordingly
  – The smaller the better; at least try to keep it below 32G
• Optionally
  – Configure Resource Manager
![Page 7: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/7.jpg)
Off-heap Features
• Startup options
• Interaction with other features
• Resource Manager
• Monitoring & Management
• Limitations
![Page 8: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/8.jpg)
Startup Options
• --off-heap-memory-size – specifies the amount of off-heap memory to allocate
• --lock-memory – locks the allocated memory so that the OS does not page it out
• Example:
gfsh start server --initial-heap=10G --max-heap=10G --off-heap-memory-size=200G --lock-memory=true
![Page 9: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/9.jpg)
Off-heap Interaction with Other Features
• PDX
  – Values are currently copied from off-heap to create a PdxInstance
• Deltas: expensive
• Compression: compatible with off-heap
• Querying: more expensive with off-heap
• EntryEvents
  – Limited availability of oldValue, newValue
• Indexes
  – Functional range indexes not supported (too expensive)
![Page 10: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/10.jpg)
Off-heap and Resource Manager
• Out of Memory Semantics
• Eviction and Critical Thresholds
• Resource Manager API
![Page 11: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/11.jpg)
Out of Memory occurs when...
• Java heap runs out of memory
  – Threads start throwing OutOfMemoryError
• Off-heap runs out of memory
  – Threads start throwing OutOfOffHeapMemoryException
• => causing the Geode member to close and disconnect
  – Closes the Cache to prevent reading inconsistent data
  – Disconnects from the Geode cluster to prevent distribution problems or hangs
![Page 12: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/12.jpg)
Eviction and Critical Thresholds for Java Heap
• CriticalHeapPercentage
  – triggers LowMemoryException for puts into heap regions
  – default is 90%
  – critical member informs other members that it is critical
• EvictionHeapPercentage
  – triggers eviction of entries in heap regions configured with LRU_HEAP
  – default is 90% of CriticalHeapPercentage
![Page 13: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/13.jpg)
Eviction and Critical Thresholds for Off-heap
• CriticalOffHeapPercentage
  – triggers LowMemoryException for puts into off-heap regions
  – default is 90% if --off-heap-memory-size is specified
  – critical member informs other members that it is critical
• EvictionOffHeapPercentage
  – triggers eviction of entries in off-heap regions configured with LRU_HEAP
  – default is 90% of CriticalOffHeapPercentage if --off-heap-memory-size is specified
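To make the defaults above concrete, here is a minimal Java sketch of the threshold arithmetic. It assumes the slide's wording is literal, so the default eviction threshold is 90% of the critical percentage (81% of off-heap memory when the critical default of 90% applies), and it reuses the 200G size from the earlier start server example. The class and method names are hypothetical and are not part of the Geode API.

```java
// Sketch only: class and method names are hypothetical, not Geode APIs.
class OffHeapThresholds {
    // Default stated on the slide (applies when an off-heap size is configured).
    static final double DEFAULT_CRITICAL_PERCENT = 90.0;
    // Assumption: "90% of CriticalOffHeapPercentage" is read literally.
    static final double EVICTION_FRACTION_OF_CRITICAL = 0.90;

    /** Default eviction percentage derived from a critical percentage. */
    static double defaultEvictionPercent(double criticalPercent) {
        return criticalPercent * EVICTION_FRACTION_OF_CRITICAL;
    }

    /** Number of used bytes at which a threshold percentage is reached. */
    static long thresholdBytes(long offHeapBytes, double percent) {
        return Math.round(offHeapBytes * (percent / 100.0));
    }

    /** True when used memory has reached the given percentage of the total. */
    static boolean isAtOrAbove(long usedBytes, long offHeapBytes, double percent) {
        return usedBytes >= thresholdBytes(offHeapBytes, percent);
    }

    public static void main(String[] args) {
        long offHeap = 200L * 1024 * 1024 * 1024; // 200G of off-heap memory
        double critical = DEFAULT_CRITICAL_PERCENT;
        double eviction = defaultEvictionPercent(critical);
        System.out.println("eviction percent: " + eviction);
        System.out.println("eviction begins at bytes: " + thresholdBytes(offHeap, eviction));
        System.out.println("critical at bytes: " + thresholdBytes(offHeap, critical));
    }
}
```

Under these assumptions there is a gap between the two thresholds, which gives eviction room to free memory before puts start failing with LowMemoryException.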
![Page 14: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/14.jpg)
Startup Options
• Existing:
  • --critical-heap-percentage
  • --eviction-heap-percentage
• New:
  • --critical-off-heap-percentage
  • --eviction-off-heap-percentage
• Example:
start server --initial-heap=10G --max-heap=10G --off-heap-memory-size=200G --lock-memory=true --critical-off-heap-percentage=99
![Page 15: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/15.jpg)
ResourceManager API
• GemFireCache#getResourceManager()
• com.gemstone.gemfire.cache.control.ResourceManager
  – exposes getters/setters for all of the heap and off-heap threshold percentages
  – Examples:
    ▪ public void setCriticalOffHeapPercentage(float offHeapPercentage);
    ▪ public float getCriticalOffHeapPercentage();
![Page 16: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/16.jpg)
Monitoring & Management
• Statistics
• Mbeans
• gfsh
![Page 17: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/17.jpg)
Statistics

| name | description |
| --- | --- |
| compactions | The total number of times off-heap memory has been compacted. |
| compactionTime | The total time spent compacting off-heap memory. |
| fragmentation | The percentage of off-heap memory fragmentation. Updated every time a compaction is performed. |
| fragments | The number of fragments of free off-heap memory. Updated every time a compaction is done. |
| freeMemory | The amount of off-heap memory, in bytes, that is not being used. |
| largestFragment | The largest fragment of memory found by the last compaction of off-heap memory. Updated every time a compaction is done. |
| maxMemory | The maximum amount of off-heap memory, in bytes. This is the amount of memory allocated at startup and does not change. |
| objects | The number of objects stored in off-heap memory. |
| reads | The total number of reads of off-heap memory. |
| usedMemory | The amount of off-heap memory, in bytes, that is being used to store data. |
![Page 18: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/18.jpg)
MBeans
• MemberMXBean
  – getOffHeapCompactionTime: provides the value of the compactionTime statistic
  – getOffHeapFragmentation: provides the value of the fragmentation statistic
  – getOffHeapFreeMemory: provides the value of the freeMemory statistic
  – getOffHeapObjects: provides the value of the objects statistic
  – getOffHeapUsedMemory: provides the value of the usedMemory statistic
  – getOffHeapMaxMemory: provides the value of freeMemory + usedMemory
• RegionMXBean
  – listRegionAttributes (operation)
    ▪ enableOffHeapMemory (true | false)
![Page 19: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/19.jpg)
Gfsh Support for Off-heap Memory
• alter disk-store: new option "--off-heap" for setting off-heap for each region in the disk-store
• create region: new option "--off-heap" for setting off-heap
• describe member: now displays the off-heap size
• describe offline-disk-store: now shows if a region is off-heap
• describe region: now displays the off-heap region attribute
• show metrics: now has an offheap category. The offheap metrics are: maxMemory, freeMemory, usedMemory, objects, fragmentation, and compactionTime
• start server: added --lock-memory, --off-heap-memory-size, --critical-off-heap-percentage, and --eviction-off-heap-percentage
![Page 20: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/20.jpg)
Off-heap Limitations
• Maximum object size limited to slightly less than 2 GB
• All data nodes must consistently configure a region to be off-heap
• Functional Range Indexes not supported
• Keys, subscription queue entries not stored off-heap
• Fragmentation statistic is only updated during off-heap compactions
![Page 21: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/21.jpg)
Implementation Overview
![Page 22: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/22.jpg)
Off-heap: How are We Doing It?
• Using memory that is separate from the Java heap
  – Build our own Memory Manager
  – Memory Manager is very finely tuned and specific to our usage
  – Avoid GC overhead
    ▪ Avoid copying of objects for promotion between generations
    ▪ Garbage Collector is a major performance killer
  – Use sun.misc.Unsafe API for performance
• Optimizing code to minimize usage of heap memory
• Using off-heap as primary store instead of overflowing to it
![Page 23: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/23.jpg)
Off-heap Memory Management
![Page 24: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/24.jpg)
Off-heap Implementation
• Memory allocated in 2GB slabs
  – Max data value size: ~2GB
  – Object values stored serialized; blobs stored as byte arrays
  – Allocation faster for values < 128KB
    ▪ Controlled by a system property: gemfire.OFF_HEAP_FREE_LIST_COUNT
    ▪ First try to allocate from the free list; if that fails, allocate from unused memory
    ▪ Small values (< 8B) inlined (not using any off-heap space)
• Compaction consolidates free memory to minimize fragmentation
  – Blocks writes; best to avoid by minimizing fragmentation
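The allocation order described above (free list first, then unused memory) can be illustrated with a toy Java model. This is a sketch under stated assumptions, not Geode's memory manager: the class name FreeListSlab is hypothetical, addresses are plain offsets and no real memory is touched, and the policy of keeping one free list per exact chunk size is an assumption made only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of one slab; hypothetical, not Geode's implementation.
class FreeListSlab {
    private final long capacity;   // slab size in bytes
    private long nextUnused = 0;   // start of memory that was never handed out
    // Assumption for this sketch: one free list per exact chunk size.
    private final Map<Integer, Deque<Long>> freeLists = new HashMap<>();

    FreeListSlab(long capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of a chunk of the given size, or -1 if none is available. */
    long allocate(int size) {
        Deque<Long> list = freeLists.get(size);
        if (list != null && !list.isEmpty()) {
            return list.pop();             // 1) reuse a previously freed chunk
        }
        if (nextUnused + size <= capacity) {
            long address = nextUnused;     // 2) otherwise carve from unused memory
            nextUnused += size;
            return address;
        }
        return -1;                         // a real manager would compact or fail here
    }

    /** Returns a chunk to the free list for its size. */
    void free(long address, int size) {
        freeLists.computeIfAbsent(size, k -> new ArrayDeque<>()).push(address);
    }

    long unusedBytes() {
        return capacity - nextUnused;
    }

    public static void main(String[] args) {
        FreeListSlab slab = new FreeListSlab(64);
        long a = slab.allocate(16);
        slab.allocate(16);
        slab.free(a, 16);
        System.out.println("reused freed chunk: " + (slab.allocate(16) == a));
        System.out.println("unused bytes left: " + slab.unusedBytes());
    }
}
```

In this model a freed 16-byte chunk cannot satisfy a 24-byte request, so workloads that mix many value sizes leave free chunks stranded. That is one way to picture why fragmentation builds up and why compaction is eventually needed.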
![Page 25: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/25.jpg)
Off-heap Implementation (cont’d)
• Allocated chunks
  – Header
    ▪ isSerialized
    ▪ isCompressed
    ▪ Size
    ▪ Padding size
• Free chunks
  – Header
    ▪ Size
    ▪ Address of next chunk in the list
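As an illustration of the header fields listed above, the following Java sketch writes and reads chunk headers in a direct (off-heap) ByteBuffer. The byte layout is an assumption made for this example (size at offset 0, flags at offset 4, padding at offset 5, next-chunk address at offset 8); the slides do not specify the real layout, and the deck says the implementation uses sun.misc.Unsafe rather than ByteBuffer. All names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical header layout for illustration; not Geode's actual format.
class ChunkHeaderSketch {
    static final int FLAG_SERIALIZED = 0x1;
    static final int FLAG_COMPRESSED = 0x2;

    /** Allocated chunk: int size at +0, flag byte at +4, padding byte at +5. */
    static void writeAllocated(ByteBuffer mem, int offset, int size,
                               boolean serialized, boolean compressed, int padding) {
        int flags = (serialized ? FLAG_SERIALIZED : 0) | (compressed ? FLAG_COMPRESSED : 0);
        mem.putInt(offset, size);
        mem.put(offset + 4, (byte) flags);
        mem.put(offset + 5, (byte) padding);
    }

    /** Free chunk: int size at +0, long address of the next free chunk at +8. */
    static void writeFree(ByteBuffer mem, int offset, int size, long nextAddress) {
        mem.putInt(offset, size);
        mem.putLong(offset + 8, nextAddress);
    }

    static int size(ByteBuffer mem, int offset) {
        return mem.getInt(offset);
    }

    static boolean isSerialized(ByteBuffer mem, int offset) {
        return (mem.get(offset + 4) & FLAG_SERIALIZED) != 0;
    }

    static boolean isCompressed(ByteBuffer mem, int offset) {
        return (mem.get(offset + 4) & FLAG_COMPRESSED) != 0;
    }

    static int padding(ByteBuffer mem, int offset) {
        return mem.get(offset + 5) & 0xFF;
    }

    static long nextFree(ByteBuffer mem, int offset) {
        return mem.getLong(offset + 8);
    }

    public static void main(String[] args) {
        ByteBuffer mem = ByteBuffer.allocateDirect(64); // memory outside the Java heap
        writeAllocated(mem, 0, 24, true, false, 3);
        writeFree(mem, 32, 32, -1L);                    // -1 marks the end of the list here
        System.out.println("allocated size: " + size(mem, 0));
        System.out.println("serialized: " + isSerialized(mem, 0));
        System.out.println("next free: " + nextFree(mem, 32));
    }
}
```

Keeping the size in the same position for both chunk kinds lets a scan over the slab step from chunk to chunk without knowing in advance whether each one is allocated or free; that choice is part of the sketch, not something the slides state.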
![Page 26: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/26.jpg)
What is Stored On-heap vs. Off-heap

| Stored On-heap | Stored Off-heap |
| --- | --- |
| Region Meta-Data | Values |
| Entry Meta-Data | Reference Counts |
| Off-Heap Addresses | Lists of Free Memory Blocks |
| Keys | WAN Queue Elements |
| Indexes | |
| Subscription Queue Elements | |
![Page 27: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/27.jpg)
Preliminary Benchmarks: Off-heap vs. Heap
![Page 28: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/28.jpg)
Off-heap: Initial Testing Results
• 256 GB user data per node across 8 nodes for total of 2 TB of user data
• Heap-only test worked twice as hard to produce 1/3 the updates as test using Off-Heap
  – Details on the next slide
• Succeeded in scaling up to much larger in-memory data
• Increased throughput of operations for large data sets
![Page 29: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/29.jpg)
Heap vs. Off-Heap Comparison

| Metric | Java Heap | Off-Heap |
| --- | --- | --- |
| creates/sec | 30,000 | 45,000 |
| updates/sec | 17,000 (std dev: 2130) | 51,000 (std dev: 737) |
| Java RSS size | 50 GB | 32 GB |
| CPU load | 70% (load avg 10 cpus) | 32% (load avg 5 cpus) |
| JVM GC | ConcurrentMarkSweep | ConcurrentMarkSweep |
| GC ms/sec | 777 ms | 24 ms |
| GC marks (GC pauses) | 1 per 30 sec | never |
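The earlier summary ("twice as hard to produce 1/3 the updates") can be checked against these figures with a few ratios. The short Java sketch below only divides numbers taken from the comparison; the class and method names are hypothetical.

```java
// Ratios computed from the heap vs. off-heap comparison figures.
class BenchmarkRatios {
    static double ratio(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Off-heap throughput relative to heap.
        double updateSpeedup = ratio(51_000, 17_000);
        double createSpeedup = ratio(45_000, 30_000);
        // Heap resource usage relative to off-heap.
        double cpuRatio = ratio(70, 32);
        double rssRatio = ratio(50, 32);

        System.out.printf("updates/sec, off-heap vs heap: %.2fx%n", updateSpeedup);
        System.out.printf("creates/sec, off-heap vs heap: %.2fx%n", createSpeedup);
        System.out.printf("CPU load, heap vs off-heap: %.2fx%n", cpuRatio);
        System.out.printf("RSS size, heap vs off-heap: %.2fx%n", rssRatio);
    }
}
```

The update ratio is exactly 3, which matches "1/3 the updates", and the CPU ratio is a little over 2, which matches "twice as hard".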
![Page 30: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/30.jpg)
Recommendations and Best Practices
![Page 31: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/31.jpg)
Off-heap Rules of Thumb
• Avoid fragmentation
  – In order to avoid compaction
  – Avoid usage patterns that lead to fragmentation
    ▪ Many updates of varying value size
• Avoid “unfriendly” features
  – Deltas
  – Functional Range Indexes
  – Querying
![Page 32: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/32.jpg)
Off-heap Recommendations
• Do use when
  – The values are relatively uniform in size
  – The values are mostly less than 128K in size
  – The usage patterns involve cycles of many creates followed by destroys or clear
  – The values do not need to be frequently deserialized
• Configure all data nodes with the same off-heap-memory-size
![Page 33: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/33.jpg)
Questions for You...
![Page 34: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/34.jpg)
We’d appreciate your thoughts...
• Would you like an API to invoke a compaction?
• Would you like to be able to configure the slab size?
• Would you like to configure the max value size for the most efficient off-heap allocation, or maybe the size increment?
• Anything else?
• Full spec at: https://cwiki.apache.org/confluence/display/GEODE/Off-Heap+Memory+Spec
![Page 35: Apache Geode Offheap Storage](https://reader036.vdocuments.net/reader036/viewer/2022062306/5889bf891a28abca448b4bdd/html5/thumbnails/35.jpg)
Thank You!