Powerpoint Templates 1
Presentation by: Pradeeban Kathiravelu, INESC-ID Lisboa, Instituto Superior Técnico, Universidade de Lisboa
Garbage Collection Auto-Tuning for Java MapReduce on Multi-Cores
Jeremy Singer, George Kovoor, Gavin Brown, Mikel Luján
University of [email protected]
Agenda
Introduction; Motivation; Contributions; Evaluation (Scalability, GC Impact, GC Auto-Tuning); Related Work; Conclusions.
Introduction
MRJ: a MapReduce Java framework for multi-core architectures.
Uses memory-management auto-tuning techniques based on machine learning.
MRJ performance is within 10% of optimal on 75% of the benchmark tests.
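The "within 10% of optimal" criterion can be made concrete with a small calculation. The sketch below assumes that, for each test, the runtime under the auto-tuned policy and the runtime under the best (optimal) policy are known, and counts the tests where tuned ≤ 1.10 × optimal. The class name `WithinOptimal` and the runtimes are hypothetical illustrations, not data from the study.

```java
public class WithinOptimal {
    // Fraction of tests whose tuned runtime is within `tolerance` (e.g. 0.10 = 10%)
    // of the optimal runtime for that test.
    static double fractionWithin(double[] tuned, double[] optimal, double tolerance) {
        if (tuned.length != optimal.length || tuned.length == 0) {
            throw new IllegalArgumentException("need equal, non-empty arrays");
        }
        int hits = 0;
        for (int i = 0; i < tuned.length; i++) {
            // tuned / optimal <= 1 + tolerance means "within tolerance of optimal"
            if (tuned[i] <= optimal[i] * (1.0 + tolerance)) {
                hits++;
            }
        }
        return (double) hits / tuned.length;
    }

    public static void main(String[] args) {
        // Hypothetical runtimes in seconds, NOT measurements from the study.
        double[] optimal = {10.0, 20.0, 5.0, 8.0};
        double[] tuned = {10.5, 21.0, 6.0, 8.0};
        System.out.println(fractionWithin(tuned, optimal, 0.10)); // prints 0.75
    }
}
```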
Why GC Auto-Tuning?
The MRJ end user cannot be expected to perform expert analysis to determine whether GC activity is reducing MRJ performance, or how to improve the JVM configuration.
Motivation
Efficient adaptation to benchmark-specific or heap-size-specific anomalies.
Could be installed by the system administrator and automatically enabled for users who do not have sufficient permissions to change JVM parameters.
Enables rapid deployment of MRJ on new multi-core architecture layouts.
Contributions
A scalable Java fork/join framework for MapReduce (MRJ) on a commodity multi-core platform.
A comprehensive study of the impact of Java runtime garbage collection (GC) on MRJ.
An auto-tuning approach to optimizing GC for MRJ.
MRJ
Same application interface as Hadoop: only map() and reduce() need to be defined.
Abstracts away the details of parallelization, runtime scheduling, and so on, so the user can focus on the application logic.
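A minimal sketch of how a MapReduce job can sit on top of Java's fork/join framework. The `Job` interface, the `MiniMapReduce` class and the split threshold are hypothetical illustrations, not MRJ's actual API: the input list is split recursively with `RecursiveTask`, each leaf applies map() into a task-local table, and partial tables are combined with reduce() as the recursion unwinds.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.BiConsumer;

public class MiniMapReduce {
    // Hypothetical user-facing API: map() emits (key, value) pairs,
    // reduce() combines two values that share a key.
    interface Job<I, K, V> {
        void map(I input, BiConsumer<K, V> emit);
        V reduce(V a, V b);
    }

    // Splits [lo, hi) in half until small, maps each leaf into a local table,
    // then merges child tables with reduce() on the way back up.
    static final class Task<I, K, V> extends RecursiveTask<Map<K, V>> {
        private static final int THRESHOLD = 1_000; // hypothetical split granularity
        private final List<I> inputs;
        private final int lo, hi;
        private final Job<I, K, V> job;

        Task(List<I> inputs, int lo, int hi, Job<I, K, V> job) {
            this.inputs = inputs; this.lo = lo; this.hi = hi; this.job = job;
        }

        @Override
        protected Map<K, V> compute() {
            if (hi - lo <= THRESHOLD) {
                Map<K, V> local = new HashMap<>();
                for (int i = lo; i < hi; i++) {
                    job.map(inputs.get(i), (k, v) -> local.merge(k, v, job::reduce));
                }
                return local;
            }
            int mid = (lo + hi) >>> 1;
            Task<I, K, V> left = new Task<>(inputs, lo, mid, job);
            left.fork();                                   // left half runs asynchronously
            Map<K, V> right = new Task<>(inputs, mid, hi, job).compute();
            Map<K, V> merged = left.join();
            right.forEach((k, v) -> merged.merge(k, v, job::reduce));
            return merged;
        }
    }

    static <I, K, V> Map<K, V> run(List<I> inputs, Job<I, K, V> job, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new Task<>(inputs, 0, inputs.size(), job));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Job<String, String, Integer> wordCount = new Job<>() {
            public void map(String line, BiConsumer<String, Integer> emit) {
                for (String w : line.split("\\s+")) if (!w.isEmpty()) emit.accept(w, 1);
            }
            public Integer reduce(Integer a, Integer b) { return a + b; }
        };
        List<String> lines = List.of("the quick fox", "the lazy dog", "the end");
        System.out.println(run(lines, wordCount, 4).get("the")); // prints 3
    }
}
```

Merging per-task tables on the way up avoids shared mutable state between worker threads; how finely to split the input is a tuning choice the sketch simply fixes as a constant.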
Evaluation
Scalability evaluation on a four-core, hyperthreaded Intel Core i7 processor, using standard MapReduce benchmarks.
Scalability of grep
Scalability of grep degrades with increasing numbers of processors for small heap sizes.
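One way to reproduce a grep scalability curve is to time the same search with fork/join pools of increasing size. This sketch, using a hypothetical `GrepScaling` class and synthetic input, counts matching lines with a `RecursiveTask` and prints the elapsed time for 1, 2, 4, ... worker threads up to the number of available processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.regex.Pattern;

public class GrepScaling {
    // grep-style task: count lines in [lo, hi) that contain a match for the pattern.
    static final class Grep extends RecursiveTask<Long> {
        private final List<String> lines;
        private final Pattern pattern;
        private final int lo, hi;

        Grep(List<String> lines, Pattern pattern, int lo, int hi) {
            this.lines = lines; this.pattern = pattern; this.lo = lo; this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= 512) { // small chunk: scan sequentially
                long n = 0;
                for (int i = lo; i < hi; i++) {
                    if (pattern.matcher(lines.get(i)).find()) n++;
                }
                return n;
            }
            int mid = (lo + hi) >>> 1;
            Grep left = new Grep(lines, pattern, lo, mid);
            left.fork();
            long right = new Grep(lines, pattern, mid, hi).compute();
            return right + left.join();
        }
    }

    static long countMatches(List<String> lines, Pattern pattern, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new Grep(lines, pattern, 0, lines.size()));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Synthetic input: every tenth line contains the search word.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            lines.add("line " + i + (i % 10 == 0 ? " needle" : " hay"));
        }
        Pattern p = Pattern.compile("needle");
        int maxThreads = Runtime.getRuntime().availableProcessors();
        for (int t = 1; t <= maxThreads; t *= 2) {
            long start = System.nanoTime();
            long hits = countMatches(lines, p, t);
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("threads=%d hits=%d time=%.1f ms%n", t, hits, ms);
        }
    }
}
```

Running it under different maximum heap sizes (for example `java -Xmx64m GrepScaling` versus `java -Xmx1g GrepScaling`) is one way to probe the small-heap effect; single un-warmed runs like these are noisy, so the timings are indicative only.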
GC Overhead
GC overhead increases with the number of processors, and more significantly so for small heap sizes.
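GC overhead can be approximated from the standard `java.lang.management` API. The sketch below assumes overhead is defined as accumulated collection time divided by JVM uptime (an assumption; the slide does not give the exact definition) and sums `getCollectionTime()` over all collectors. The `GcOverhead` class is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcOverhead {
    // Approximate total milliseconds spent in GC across all collectors.
    // getCollectionTime() returns -1 when the value is undefined, so skip those.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    // Assumed definition: GC overhead = time in GC / elapsed JVM uptime.
    static double gcOverhead() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime == 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        List<byte[]> keep = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            byte[] b = new byte[1024];       // mostly short-lived garbage
            if (i % 100 == 0) keep.add(b);   // a small surviving fraction
        }
        System.out.printf("GC overhead: %.2f%% (kept %d arrays)%n", 100 * gcOverhead(), keep.size());
    }
}
```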
Relative GC Performance
Input dependent: application performance differs across inputs and heap sizes. Small inputs → Serial; medium and large inputs → Parallel and Concurrent.
Application dependent: Parallel >> Serial and Concurrent?
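The input- and heap-size dependence above is what a learned policy selector has to capture. As a hedged sketch, the code below uses a 1-nearest-neighbour classifier over (input size, heap size), a technique chosen here for brevity and not necessarily the learner used in the study. The training samples are hypothetical, and mapping Serial/Parallel/Concurrent to HotSpot's `-XX:+UseSerialGC`, `-XX:+UseParallelGC` and `-XX:+UseConcMarkSweepGC` flags is an assumption.

```java
import java.util.List;

public class GcPolicySelector {
    // Assumed mapping from the slide's policy names to HotSpot flags.
    enum Policy {
        SERIAL("-XX:+UseSerialGC"),
        PARALLEL("-XX:+UseParallelGC"),
        CONCURRENT("-XX:+UseConcMarkSweepGC"); // CMS: deprecated in JDK 9, removed in JDK 14

        final String flag;
        Policy(String flag) { this.flag = flag; }
    }

    // One hypothetical training observation: run features plus the fastest policy for that run.
    static final class Sample {
        final double inputMb, heapMb;
        final Policy best;
        Sample(double inputMb, double heapMb, Policy best) {
            this.inputMb = inputMb; this.heapMb = heapMb; this.best = best;
        }
    }

    // 1-nearest-neighbour on log-scaled features: return the policy of the closest sample.
    static Policy predict(List<Sample> training, double inputMb, double heapMb) {
        Sample nearest = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Sample s : training) {
            double dx = Math.log(s.inputMb / inputMb);
            double dy = Math.log(s.heapMb / heapMb);
            double d = dx * dx + dy * dy;
            if (d < bestDist) { bestDist = d; nearest = s; }
        }
        if (nearest == null) throw new IllegalArgumentException("empty training set");
        return nearest.best;
    }

    public static void main(String[] args) {
        // Hypothetical training data shaped after the slide's qualitative trend, not measured results.
        List<Sample> training = List.of(
                new Sample(10, 256, Policy.SERIAL),
                new Sample(10, 1024, Policy.SERIAL),
                new Sample(200, 512, Policy.PARALLEL),
                new Sample(1000, 1024, Policy.CONCURRENT));
        Policy p = predict(training, 15, 512);
        System.out.println(p + " -> " + p.flag); // prints SERIAL -> -XX:+UseSerialGC
    }
}
```

Features are compared on a log scale so that doubling the input or heap size counts the same at any magnitude. Since CMS no longer exists on current JDKs, a modern variant would need a different concurrent option, such as G1.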
sm: Concurrent > Parallel?
sm searches for a word in an input file.
$$\text{death rate} = \frac{\text{total garbage collected}}{\text{total execution time}}$$
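Death rate, as defined above, can be estimated at run time on HotSpot-based JVMs through GC notifications. This sketch assumes the `com.sun.management` extensions are available (they are JDK-specific, not part of the Java SE standard) and approximates "garbage collected" as the drop in total memory-pool usage across each collection. The `DeathRate` class is hypothetical.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import com.sun.management.GcInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

public class DeathRate {
    // Approximate bytes reclaimed by all collections observed since install().
    private static final AtomicLong bytesCollected = new AtomicLong();
    private static long startNanos;

    static long usedBytes(Map<String, MemoryUsage> pools) {
        long sum = 0;
        for (MemoryUsage u : pools.values()) sum += u.getUsed();
        return sum;
    }

    // Listener callback: add (used before GC - used after GC) over all reported pools.
    static void onGc(Notification n, Object handback) {
        if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) return;
        GcInfo gc = GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData()).getGcInfo();
        long freed = usedBytes(gc.getMemoryUsageBeforeGc()) - usedBytes(gc.getMemoryUsageAfterGc());
        if (freed > 0) bytesCollected.addAndGet(freed);
    }

    static void install() {
        startNanos = System.nanoTime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(DeathRate::onGc, null, null);
            }
        }
    }

    // Death rate = total garbage collected / total execution time, in bytes per second.
    static double deathRate() {
        double seconds = (System.nanoTime() - startNanos) / 1e9;
        return seconds > 0 ? bytesCollected.get() / seconds : 0.0;
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        long sink = 0;
        for (int i = 0; i < 200_000; i++) sink += new byte[1024].length; // short-lived garbage
        System.gc();
        Thread.sleep(200); // notifications are delivered asynchronously
        System.out.printf("death rate ~ %.1f MB/s (sink=%d)%n", deathRate() / 1e6, sink);
    }
}
```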
GC Auto-Tuning Performance (relative to optimal policy)
GC Auto-Tuning Performance (relative to default policy)
Related Work
The original work on MapReduce [13, 14] applies to compute-clusters.
Ranger et al. describe the first application of MapReduce to multi-core processors [31].
Conventional memory management techniques do not scale to large multi-core environments [40].
Application of machine learning to Java runtime performance auto-tuning is a growing trend [26, 39].
Conclusions
MRJ: a Java-based framework for MapReduce parallelism that targets conventional multi-core architectures.
Speedups of up to 6x over the default GC policy, and a 10% geometric mean speedup over all benchmarks with the largest input data sets.
Scalable performance with an increasing number of threads in the underlying Java fork/join pool.
A machine-learning GC auto-tuning policy that improves runtime performance.
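A geometric mean speedup averages per-benchmark speedup ratios in log space, so that a 2x gain on one benchmark and a 2x loss on another cancel out. A minimal sketch with a hypothetical `GeoMeanSpeedup` class and hypothetical runtimes (not the paper's measurements):

```java
public class GeoMeanSpeedup {
    // Geometric mean of per-benchmark speedups (baseline time / tuned time),
    // computed as exp(mean(log(speedup))) to avoid overflow on long lists.
    static double geometricMeanSpeedup(double[] baseline, double[] tuned) {
        if (baseline.length != tuned.length || baseline.length == 0) {
            throw new IllegalArgumentException("need equal, non-empty arrays");
        }
        double logSum = 0;
        for (int i = 0; i < baseline.length; i++) {
            logSum += Math.log(baseline[i] / tuned[i]);
        }
        return Math.exp(logSum / baseline.length);
    }

    public static void main(String[] args) {
        // Hypothetical runtimes in seconds under the default and auto-tuned GC policies.
        double[] defaultPolicy = {12.0, 30.0, 8.0};
        double[] autoTuned = {10.0, 30.0, 8.0};
        double g = geometricMeanSpeedup(defaultPolicy, autoTuned);
        System.out.printf("geometric mean speedup: %.3f (%.1f%%)%n", g, 100 * (g - 1));
    }
}
```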
Selected References
[13] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation, pages 137–150, 2004.
[14] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[26] F. Mao and X. Shen. Cross-input learning and discriminative prediction in evolvable virtual machines. In Proceedings of the 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 92–101, 2009.
[31] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In Proceedings of the 13th International Symposium on High Performance Computer Architecture, pages 13–24, 2007.
[39] C. Zhang and M. Hirzel. Online phase-adaptive data layout selection. In ECOOP 2008: Object-Oriented Programming, pages 309–334, 2008.
[40] Y. Zhao, J. Shi, K. Zheng, H. Wang, H. Lin, and L. Shao. Allocation wall: a limiting factor of Java applications on emerging multi-core platforms. ACM SIGPLAN Notices, 44(10):361–376, 2009.