Mimir: MapReduce over MPI

Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji, Michela Taufer

Project Overview

Data analytics is an integral part of large-scale scientific computing, and MapReduce is the programming model that has gained the most traction. Efforts have been made to enable efficient MapReduce on supercomputing systems, but they are limited to homogeneous workloads. Mimir, a novel MapReduce-over-MPI framework, tackles skewed data, imbalanced memory usage, and loss of data scalability with (a) combiner optimizations to minimize and balance memory usage; (b) dynamic repartitioning to achieve close-to-optimal balance of memory usage across processes and reduce execution time; and (c) a split method to handle datasets with superkeys. Results show that Mimir scales to at least 3,072 processes on the Tianhe-2 supercomputer.
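As a rough illustration of why a combiner reduces memory pressure, the sketch below simulates a WordCount job in plain Python. It is not Mimir's API: the function names (`map_words`, `combine`, `owner`, `word_count`) and the CRC32 partitioner are assumptions made for this example, and each input chunk merely stands in for one MPI process. The combiner merges pairs that share a key before they are shuffled, so fewer key-value pairs have to be buffered and communicated.

```python
import zlib
from collections import Counter, defaultdict


def map_words(text):
    """Map phase: emit one (word, 1) pair per token."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Combiner: merge pairs with the same key locally, before the shuffle.

    This is only valid because addition is associative and commutative.
    """
    merged = Counter()
    for key, value in pairs:
        merged[key] += value
    return list(merged.items())


def owner(key, nprocs):
    """Hypothetical deterministic partitioner: hash the key to a process."""
    return zlib.crc32(key.encode("utf-8")) % nprocs


def word_count(chunks, use_combiner):
    """Simulate one job; each chunk plays the role of one process."""
    nprocs = len(chunks)
    inbox = defaultdict(list)  # pairs received by each simulated process
    shuffled = 0               # number of pairs sent through the shuffle
    for chunk in chunks:
        pairs = map_words(chunk)
        if use_combiner:
            pairs = combine(pairs)
        shuffled += len(pairs)
        for key, value in pairs:
            inbox[owner(key, nprocs)].append((key, value))
    # Reduce phase: sum the values received for each key.
    counts = {}
    for pairs in inbox.values():
        for key, value in pairs:
            counts[key] = counts.get(key, 0) + value
    return counts, shuffled


chunks = ["the cat and the dog", "the dog and the bird"]
plain_counts, plain_pairs = word_count(chunks, use_combiner=False)
combined_counts, combined_pairs = word_count(chunks, use_combiner=True)
```

Both runs produce the same word counts, but the run with the combiner shuffles fewer pairs. The saving grows with the number of repeated keys per process, which is the situation skewed datasets create.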

Scalability Study

Benchmarks: WordCount (WC), a single-pass MapReduce application with an associative and commutative reduce function; Octree Clustering (OC), an iterative chain of MapReduce jobs; and Join, a single-pass MapReduce application that merges two imbalanced datasets.

Tianhe-2: each compute node has two Intel Xeon E5-2692 v2 CPUs (12 cores each, 24 cores total) running at 2.2 GHz and 64 GB of memory.
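To make the imbalance that such datasets cause concrete, the following sketch compares plain hash partitioning with a simple way of splitting a superkey (a key with far more values than the others). It is an assumption-laden toy, not Mimir's split method: `owner`, `partition`, and `imbalance` are hypothetical names, the round-robin placement is one possible strategy chosen for illustration, and processes are simulated with a list of counters.

```python
import zlib
from collections import Counter


def owner(key, nprocs):
    """Hypothetical deterministic partitioner: hash the key to a process."""
    return zlib.crc32(key.encode("utf-8")) % nprocs


def partition(pairs, nprocs, superkeys=frozenset()):
    """Return how many key-value pairs each simulated process receives.

    Keys listed in `superkeys` are split: their values are spread
    round-robin over all processes instead of going to a single owner.
    """
    load = [0] * nprocs
    next_slot = Counter()  # per-superkey round-robin position
    for key, _value in pairs:
        if key in superkeys:
            dest = (owner(key, nprocs) + next_slot[key]) % nprocs
            next_slot[key] += 1
        else:
            dest = owner(key, nprocs)
        load[dest] += 1
    return load


def imbalance(load):
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    return max(load) / (sum(load) / len(load))


# One superkey with 96 values plus four ordinary keys with one value each.
pairs = [("hot", i) for i in range(96)] + [(k, 0) for k in ("a", "b", "c", "d")]
hashed = partition(pairs, nprocs=4)
split = partition(pairs, nprocs=4, superkeys={"hot"})
```

With plain hashing, all 96 values of the superkey land on one process. With splitting, each process receives 24 of them. In a scheme like this one, the partial results for a split key would still need to be merged after the reduce step.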

Scalability of Mimir in terms of number of processes for the in-memory workflow, the combiner workflow (cb), the dynamic repartition workflow (rp), and the splitting approach (sp). Numbers in bold mark the best-performing configuration, after which further optimizations do not improve performance.

[Figure panels. Data with imbalanced values: WordCount (WC), Octree Clustering (OC), Join. Data with imbalanced keys: WordCount (WC), Octree Clustering (OC), Join.]

Publication: T. Gao, Y. Guo, B. Zhang, P. Cicotti, Y. Lu, P. Balaji, and M. Taufer: Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems. IPDPS 2017: 1098-1108.
