Data-Intensive Computing: From Clouds to GPUs, Gagan Agrawal, December 3, 2015


Data-Intensive Computing: From Clouds to GPUs

Gagan Agrawal

April 18, 2023


Motivation

• Growing need for analysis of large-scale data
  • Scientific
  • Commercial
• Data-Intensive Supercomputing (DISC)
• Map-Reduce has received a lot of attention
  • Database and data mining communities
  • High-performance computing community


Motivation (2)

• Processor architecture trends
  • Clock speeds are not increasing
  • Trend towards multi-core architectures
  • Accelerators like GPUs are very popular
  • Clusters of multi-cores / GPUs are becoming common
• Trend towards the cloud
  • Use storage/computing/services from a provider
  • How many of you prefer gmail over cse/osu account for e-mail?
  • Utility model of computation
• Need high-level APIs and adaptation


My Research Group

• Data-intensive theme at multiple levels
  • Parallel programming models
  • Multi-cores and accelerators
  • Adaptive middleware
  • Scientific data management / workflows
  • Deep web integration and analysis


This Talk

• Parallel programming API for data-intensive computing
  • An alternate API and system for Google’s Map-Reduce
  • Show actual comparison
• Fault-tolerance for data-intensive computing
• Data-intensive computing on accelerators
  • Compilation for GPUs


Map-Reduce

• Simple API for (data-intensive) parallel programming
• Computation is:
  • Apply map on each input data element
  • Produce (key, value) pair(s)
  • Sort them using the key
  • Apply reduce on the set of values with each distinct key
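The computation steps listed above can be sketched in a few lines of Python. This is only an illustrative single-process model, not Hadoop or Google's implementation; the names `map_fn`, `reduce_fn`, and `map_reduce` are made up for this sketch, and word counting is used as the example application.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    """Map: emit one (key, value) pair per word in an input element."""
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)

def map_reduce(records):
    # 1. Apply map on each input data element, producing (key, value) pairs.
    pairs = [pair for record in records for pair in map_fn(record)]
    # 2. Sort the pairs by key so that equal keys become adjacent.
    pairs.sort(key=itemgetter(0))
    # 3. Apply reduce once per distinct key.
    return [reduce_fn(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))]

word_counts = map_reduce(["a b a", "b c"])
```

Note that every word occurrence becomes an intermediate pair that has to be sorted before any reduction happens.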


Map-Reduce Execution


Map-Reduce: Positives and Questions

• Positives:
  • Simple API
    • Functional language based
    • Very easy to learn
  • Support for fault-tolerance
    • Important for very large-scale clusters
• Questions
  • Performance?
    • Comparison with other approaches
  • Suitability for different classes of applications?


Class of Data-Intensive Applications

• Many different types of applications
  • Data-center kind of applications
    • Data scans, sorting, indexing
  • More "compute-intensive" data-intensive applications
    • Machine learning, data mining, NLP
    • Map-reduce / Hadoop being widely used for this class
  • Standard database operations
    • SIGMOD 2009 paper compares Hadoop with databases and OLAP systems
• What is Map-reduce suitable for?
• What are the alternatives?
  • MPI/OpenMP/Pthreads – too low level?


FREERIDE: Goals

• Developed 2001-2003 at Ohio State
• Framework for Rapid Implementation of Data Mining Engines
• The ability to rapidly prototype a high-performance mining implementation
  • Distributed memory parallelization
  • Shared memory parallelization
  • Ability to process disk-resident datasets
• Only modest modifications to a sequential implementation for the above three


FREERIDE – Technical Basis

• Popular data mining algorithms have a common canonical loop
  • Generalized reduction
• Can be used as the basis for supporting a common API
• Demonstrated for popular data mining and scientific data processing applications

    While ( ) {                        /* outer sequential loop */
        forall (data instances d) {    /* reduction loop */
            I = process(d)
            R(I) = R(I) op f(d)
        }
        ...
    }
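As a concrete instance of the canonical loop, the sketch below runs a one-dimensional k-means in plain Python. It is an assumption-laden illustration rather than FREERIDE code: the reduction object `R` is taken to be one `[sum, count]` accumulator per cluster, `op` is addition, and the outer loop runs a fixed number of times because the slide leaves the loop condition unspecified.

```python
def process(point, centers):
    """Return I, the index of the reduction-object element that d updates."""
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))

def kmeans_iteration(points, centers):
    # Reduction object R: one [sum, count] accumulator per cluster.
    R = [[0.0, 0] for _ in centers]
    for d in points:                 # forall (data instances d)
        I = process(d, centers)      # I = process(d)
        R[I][0] += d                 # R(I) = R(I) op f(d), with op = +
        R[I][1] += 1
    # Post-processing of R yields the centers for the next outer iteration.
    return [s / n if n else c for (s, n), c in zip(R, centers)]

centers = [0.0, 10.0]
for _ in range(5):                   # outer sequential loop (fixed count here)
    centers = kmeans_iteration([1.0, 2.0, 3.0, 9.0, 11.0], centers)
```

Because the updates to `R` use addition, the order in which data instances are visited does not change the result, which is what makes the inner loop a candidate for parallel execution.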


Comparing Processing Structure

Similar, but with subtle differences


Processing Structure for FREERIDE

Basic one-stage dataflow


Observations on Processing Structure

• Map-Reduce is based on a functional idea
  • Does not maintain state
• This can lead to sorting overheads
• FREERIDE API is based on a programmer-managed reduction object
  • Not as ‘clean’
  • But avoids sorting
  • Can also help shared memory parallelization
  • Helps with better fault recovery
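The difference in intermediate state can be made concrete with a small histogram computed both ways. This is a simplified sketch, not a measurement of either system: it ignores combiner-style optimizations that real map-reduce runtimes may apply, and the function names and the four-bucket example are assumptions chosen for illustration.

```python
from itertools import groupby

def histogram_map_reduce(values, bucket_of):
    """Stateless style: emit every (key, 1) pair, sort, then reduce."""
    pairs = sorted((bucket_of(v), 1) for v in values)   # sorting overhead
    counts = {k: sum(c for _, c in g)
              for k, g in groupby(pairs, key=lambda p: p[0])}
    return counts, len(pairs)        # intermediate state: one pair per input

def histogram_reduction_object(values, bucket_of, num_buckets):
    """Stateful style: update a programmer-managed reduction object in place."""
    robj = [0] * num_buckets         # intermediate state: one slot per bucket
    for v in values:
        robj[bucket_of(v)] += 1      # no pairs emitted, nothing to sort
    return {k: c for k, c in enumerate(robj) if c}, len(robj)

data = list(range(1000))
bucket = lambda v: v % 4
mr_counts, mr_state = histogram_map_reduce(data, bucket)
ro_counts, ro_state = histogram_reduction_object(data, bucket, 4)
```

Both versions produce the same counts, but in this sketch the first holds 1000 intermediate pairs while the second holds a 4-slot reduction object.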


Experiment Design

• Tuning parameters in Hadoop
  • Input split size
  • Max number of concurrent map tasks per node
  • Number of reduce tasks
• For comparison, we used four applications
  • Data mining: KMeans, KNN, Apriori
  • Simple data scan application: Wordcount
• Experiments on a multi-core cluster
  • 8 cores per node (8 map tasks)


Results – Data Mining

[Figure: KMeans, varying # of nodes (4, 8, 16). Hadoop vs. FREERIDE; y-axis: avg. time per iteration (sec), 0 to 400. Dataset: 6.4G, K: 1000, Dim: 3]


Results – Data Mining (II)

[Figure: Apriori, varying # of nodes (4, 8, 16). Hadoop vs. FREERIDE; y-axis: avg. time per iteration (sec), 0 to 140. Dataset: 900M, Support level: 3%, Confidence level: 9%]


Results – Data Mining (III)

[Figure: KNN, varying # of nodes (4, 8, 16). Hadoop vs. FREERIDE; y-axis: avg. time per iteration (sec), 0 to 160. Dataset: 6.4G, K: 1000, Dim: 3]


Results – Datacenter-like Application

[Figure: Wordcount, varying # of nodes (4, 8, 16). Hadoop vs. FREERIDE; y-axis: total time (sec), 0 to 600. Dataset: 6.4G]


Scalability Comparison

[Figure: KMeans, varying dataset size (800M, 1.6G, 3.2G, 6.4G, 12.8G) on 8 nodes. Hadoop vs. FREERIDE; y-axis: avg. time per iteration (sec), 0 to 180. K: 100, Dim: 3]


Scalability – Word Count

[Figure: Wordcount, varying dataset size (800M, 1.6G, 3.2G, 6.4G, 12.8G) on 8 nodes. Hadoop vs. FREERIDE; y-axis: total time (sec), 0 to 600]


Observations

• Performance issues with Hadoop are now well known
• How much of a factor is the API?
  • Java, file system
• API comparison on the same platform
  • Design of MATE
    • Map-reduce with an AlternaTE API


• Basis: Phoenix implementation
  • Shared memory map-reduce implementation from Stanford
  • C based
  • An efficient runtime that handles parallelization, resource management, and fault recovery
• Support FREERIDE-like API


Functions: APIs provided by the runtime

Function / Description                                                 R/O
---------------------------------------------------------------------------
int mate_init(scheduler_args_t * args)                                  R
int mate_scheduler(void * args)                                         R
int mate_finalize(void * args)                                          O
void reduction_object_pre_init()                                        R
int reduction_object_alloc(int size)  (returns the object id)           R
void reduction_object_post_init()                                       R
void accumulate/maximal/minimal(int id, int offset, void * value)       O
void reuse_reduction_object()                                           O
void * get_intermediate_result(int iter, int id, int offset)            O


Experiments: K-means

• K-means: 400MB, 3-dim points, k = 100 on one AMD node with 16 cores


Fault-Tolerance in FREERIDE/MATE

• Map-reduce supports fault-tolerance by replicating files
  • Storage and processing time overheads
• FREERIDE/MATE API offers another option
  • Reduction object is a low-cost application-level checkpoint
  • Can support efficient recovery
  • Can also allow redistribution of work on other nodes
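The idea of using the reduction object as a checkpoint can be illustrated with a toy single-process simulation. Everything here is an assumption made for the sketch (the function name, the chunking scheme, the sum-and-count reduction, and a failure injected at the second element of one chunk); it is not how FREERIDE or MATE implements recovery.

```python
def run_with_checkpoints(data, chunk_size, fail_in_chunk=None):
    """Sum-and-count reduction over chunks, checkpointing the reduction object."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    robj = {"sum": 0, "count": 0}
    checkpoint = (0, dict(robj))          # (next chunk index, saved object)
    failed_once = False
    i = 0
    while i < len(chunks):
        for pos, d in enumerate(chunks[i]):
            if i == fail_in_chunk and pos == 1 and not failed_once:
                failed_once = True
                break                     # simulated crash in mid-chunk
            robj["sum"] += d
            robj["count"] += 1
        else:
            # Chunk finished: save a copy of the (small) reduction object.
            i += 1
            checkpoint = (i, dict(robj))
            continue
        # Recovery: drop partial updates and resume from the last checkpoint.
        i, robj = checkpoint[0], dict(checkpoint[1])
    return robj

clean = run_with_checkpoints(list(range(1, 11)), chunk_size=3)
recovered = run_with_checkpoints(list(range(1, 11)), chunk_size=3, fail_in_chunk=1)
```

The checkpoint costs as much as the reduction object, not the dataset, and the saved chunk index is what would let remaining chunks be handed to a different node.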


Fault Tolerance Results


This Talk

• Parallel programming API for data-intensive computing
  • An alternate API and system for Google’s Map-Reduce
  • Show actual comparison
• Fault-tolerance for data-intensive computing
• Data-intensive computing on accelerators
  • Compilation for GPUs


Background - GPU Computing

• Many-core architectures/Accelerators are becoming more popular

• GPUs are inexpensive and fast

• CUDA is a high-level language for GPU programming


CUDA Programming

• Significant improvement over use of Graphics Libraries

• But ..
  • Need detailed knowledge of the architecture of GPU and a new language
  • Must specify the grid configuration
  • Deal with memory allocation and movement
  • Explicit management of memory hierarchy


Parallel Data mining

• Common structure of data mining applications (FREERIDE)

    /* outer sequential loop */
    while() {
        /* Reduction loop */
        Foreach (element e) {
            (i, val) = process(e);
            Reduc(i) = Reduc(i) op val;
        }
    }


Porting on GPUs

• High-level parallelization is straightforward
• Details of data movement
• Impact of thread count on reduction time
• Use of shared memory


Architecture of the System

[Figure: system architecture. Components shown: User Input (variable information, reduction functions, optional functions); Code Analyzer (in LLVM); Variable Analyzer; Variable Access Pattern and Combination Operations; Code Generator; Host Program (grid configuration and kernel invocation); Kernel functions; Executable]


User Input

A sequential reduction function

Optional functions (initialization function, combination function…)

Values of each variable or size of array

Variables to be used in the reduction function


Analysis of Sequential Code

• Get information on the access features of each variable
• Determine the data to be replicated
• Get the operator for global combination
• Variables for shared memory


Memory Allocation and Copy

• Need copy for each thread
• Copy the updates back to host memory after the kernel reduction function returns

[Figure: threads T0, T1, T2, T3, T4, …, T61, T62, T63 within each block; callout labels A, B, C]
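The "copy for each thread" idea can be mimicked on the CPU in plain Python. This is a sequential simulation under stated assumptions, not generated CUDA code: the function name, the interleaved assignment of elements to threads, and the choice of `+` as the combination operator are all made up for the sketch, with 64 threads chosen to match the T0 to T63 labels in the figure.

```python
def replicated_reduction(data, num_threads, num_slots, slot_of):
    """Simulate per-thread reduction-object copies plus a global combination."""
    # One private copy of the reduction object per thread: no locking needed.
    copies = [[0] * num_slots for _ in range(num_threads)]
    for t in range(num_threads):
        # Thread t handles elements t, t + num_threads, t + 2*num_threads, ...
        for d in data[t::num_threads]:
            copies[t][slot_of(d)] += d
    # Global combination: merge the private copies with the same operator (+).
    return [sum(c[s] for c in copies) for s in range(num_slots)]

totals = replicated_reduction(list(range(256)), num_threads=64,
                              num_slots=2, slot_of=lambda d: d % 2)
```

The memory cost grows with the thread count times the size of the reduction object, which is one reason the choice of thread count and the use of shared memory matter.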


Generating CUDA Code and C++/C Code Invoking the Kernel Function

• Memory allocation and copy
• Thread grid configuration (block number and thread number)
• Global function
• Kernel reduction function
• Global combination
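To make "thread grid configuration" concrete, the sketch below shows one common way to pick a block count and to assign elements to threads. The slides do not say how the system chooses these values, so the defaults (64 threads per block, at most 16 blocks), the function names, and the strided assignment are all assumptions, written in Python rather than CUDA for consistency with the other sketches.

```python
def grid_config(num_elements, threads_per_block=64, max_blocks=16):
    """Pick a block count that covers the data without exceeding max_blocks."""
    needed = -(-num_elements // threads_per_block)     # ceiling division
    return min(needed, max_blocks), threads_per_block

def elements_for_thread(block, thread, num_blocks, threads_per_block, n):
    """Indices a given thread processes under a strided assignment."""
    global_id = block * threads_per_block + thread
    stride = num_blocks * threads_per_block
    return list(range(global_id, n, stride))

blocks, threads = grid_config(5000)
first = elements_for_thread(0, 0, blocks, threads, 5000)
```

With these assumed values, 5000 elements map onto 16 blocks of 64 threads, and each thread walks the data with a stride of 1024 so that every element is visited exactly once.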


Optimizations

• Using shared memory
• Providing user-specified initialization functions and combination functions
• Specifying variables that are allocated once


Applications

• K-means clustering
• EM clustering
• PCA


K-means Results

Speedups


Speedup of EM


Speedup of PCA


Summary

• Data-intensive computing is of growing importance
• One size doesn’t fit all
  • Map-reduce has many limitations
• Accelerators can be promising for achieving performance
