A Prediction Framework for Fast Sparse Triangular Solves

* Koç University, Istanbul, Turkey parcorelab.com A Prediction Framework for Fast Sparse Triangular Solves Najeeb Ahmad * , Buse Yilmaz * , Didem Unat * Best Artifact Awardee

Page 1: A Prediction Framework for Fast Sparse Triangular Solves

*Koç University, Istanbul, Turkey

parcorelab.com

A Prediction Framework for Fast Sparse Triangular Solves

Najeeb Ahmad*, Buse Yilmaz*, Didem Unat*

Best Artifact Awardee

Page 3: A Prediction Framework for Fast Sparse Triangular Solves

Outline

• Part I: Main Topic

– Introduction

– Background and Motivation

– Prediction Framework

– Evaluation

– Related Work

– Conclusion

• Part II: Artifact Evaluation

– Our Artifact Evaluation Experience

Page 7: A Prediction Framework for Fast Sparse Triangular Solves

Introduction

• Sparse Triangular Solve (SpTRSV)

– an important computational kernel

– most time-consuming part of an application in many cases

• e.g. ILU-preconditioned GMRES solvers [1]

• Many CPU, GPU SpTRSV algorithms available

– CPU: Intel MKL, Park et al. [2]

– GPU: NVIDIA cuSPARSE library, Liu et al. [3], Li et al. [4]

• No single algorithm/platform performs best for all input matrices

– Algorithm performance varies with matrix sparsity pattern

Selecting the fastest SpTRSV algorithm is a non-trivial task!

Page 11: A Prediction Framework for Fast Sparse Triangular Solves

Contributions

• A machine learning-based framework

– predicts the fastest SpTRSV algorithm on heterogeneous CPU-GPU systems

– automated feature extraction, performance data collection and model training

– extensible with new training datasets and algorithms

• Performance, accuracy, and overhead evaluation of the framework on a state-of-the-art CPU-GPU system

• Performance study of six SpTRSV algorithms (CPU & GPU)

• Identification of matrix sparsity features for SpTRSV performance prediction

Page 14: A Prediction Framework for Fast Sparse Triangular Solves

Background and Motivation

• Sparse triangular systems

Ly = b or Ux = y

[Figure: example lower triangular matrix L and upper triangular matrix U]
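A minimal sketch of solving Ly = b by forward substitution, to make the kernel concrete. It assumes L is stored in CSR format with a non-zero diagonal; the function name and the small example matrix are illustrative and do not come from the slides.

```python
def sptrsv_lower_csr(row_ptr, col_idx, vals, b):
    """Solve L y = b by forward substitution, L lower triangular in CSR."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                # y[j] is already known because rows are processed in order.
                acc -= vals[k] * y[j]
            elif j == i:
                diag = vals[k]
        if diag is None or diag == 0.0:
            raise ValueError(f"zero or missing diagonal in row {i}")
        y[i] = acc / diag
    return y


# Illustrative system: L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]], b = [2, 3, 11].
row_ptr = [0, 1, 3, 5]
col_idx = [0, 0, 1, 1, 2]
vals = [2.0, 1.0, 1.0, 3.0, 4.0]
y = sptrsv_lower_csr(row_ptr, col_idx, vals, [2.0, 3.0, 11.0])
print(y)  # [1.0, 2.0, 1.25]
```

Row i can only be finished once every y[j] it references is known. This dependency between rows is what makes SpTRSV hard to parallelize.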

Page 16: A Prediction Framework for Fast Sparse Triangular Solves

Background and Motivation

• SpTRSV characteristics

[Figure: sparsity pattern of L and the dependency graph for L]
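A sketch of how the dependency graph of L can be turned into levels, assuming a CSR sparsity pattern. Row i depends on row j whenever L has a non-zero at (i, j) with j < i. Rows in the same level do not depend on each other and could be solved in parallel. The function name and example pattern are illustrative.

```python
def level_sets(row_ptr, col_idx):
    """Group the rows of a lower triangular CSR pattern into dependency levels."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        # Levels of the rows that row i depends on (strictly lower entries).
        deps = [level[j] for j in col_idx[row_ptr[i]:row_ptr[i + 1]] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    levels = [[] for _ in range(max(level) + 1)] if n else []
    for i, lvl in enumerate(level):
        levels[lvl].append(i)
    return levels


# Illustrative 4x4 pattern: row 2 depends on row 0; row 3 on rows 1 and 2.
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
levels = level_sets(row_ptr, col_idx)
print(levels)  # [[0, 1], [2], [3]]
```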

Page 20: A Prediction Framework for Fast Sparse Triangular Solves

Background and Motivation

• SpTRSV Algorithms

– Level-scheduling

– Synchronization-free

• SpTRSV performance

– CPU

• Intel MKL library

– MKL(sequential)

– MKL(parallel)

– GPU

• NVIDIA cuSPARSE library

– CUS1

– CUS2(level)

– CUS2(no level)

• Sync-Free [3]

Fastest SpTRSV on Intel Gold (6148) + NVIDIA V100 GPU for 37 sparse matrices (from the SuiteSparse collection)

Algorithm       Share of matrices
MKL(seq)        30%
MKL(par)        5%
CUS1            19%
CUS2(lvl)       19%
CUS2(no lvl)    5%
Sync-Free       22%

Page 21: A Prediction Framework for Fast Sparse Triangular Solves

Background and Motivation

Fastest platform for SpTRSV on Intel Gold (6148) + NVIDIA V100 GPU for 37 sparse matrices (from the SuiteSparse collection): CPU 35%, GPU 65%

Page 22: A Prediction Framework for Fast Sparse Triangular Solves

Background and Motivation

How to select the fastest SpTRSV algorithm for a given input matrix on a CPU-GPU platform?

Page 32: A Prediction Framework for Fast Sparse Triangular Solves

SpTRSV Prediction Framework

• A machine learning-based framework for the fastest SpTRSV prediction on a CPU-GPU machine

– Based on features and SpTRSV performance data of a pre-selected matrix set

• Framework overview

– Five components

[Figure: framework overview. Components: (1) Sparse Matrix Dataset, (2) Matrix Feature Extractor, (3) SpTRSV Algorithm Repository, (4) Performance Data Collector, (5) Model Trainer and Tester. The resulting Prediction Model takes an Input Sparse Matrix and outputs the Predicted SpTRSV Algorithm.]

Page 33: A Prediction Framework for Fast Sparse Triangular Solves

Feature Selection

• Structural or sparsity features

– Started with ~50 features, 30 features finalized

• Based on feature scores

Page 40: A Prediction Framework for Fast Sparse Triangular Solves

Feature Selection

• Selected Features

No.     Features                                 Description                      Score rank
1       nnzs                                     # of non-zeros                   1
2-4     <max,mean,std>_nnz_pl_rw                 nnz per level row wise stats     2,4,5
5       max_nnz_pl_cw                            nnz per level col wise stats     3
6       m                                        Number of rows/columns           6
7-10    <max,mean,median,std>_rpl                Rows per level stats             7,12,13,16
11-12   <min,max>_cl_cnt                         Columns with max/min length      8,10
13-14   <max,min>_rl_cnt                         Rows with max/min lengths        9,11
15-17   <max,std,median>_cl                      Column-length stats              14,22,29
18      lvls                                     # of levels                      15
19-21   mean_<max,mean,std>_cl_pl                Column-length per level stats    17,18,20
22-25   <max,mean,median,std>_rl                 Row-length stats                 19,27,28,30
26-30   mean_<max,std,mean,median,min>_rl_pl     Row-length per level stats       21,23,24,25,26
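A sketch of how a few of the listed features (nnzs, m, lvls and the rows-per-level statistics) could be computed from a lower triangular CSR pattern. This is plain Python for illustration, not the authors' C++/CUDA extractor. Using the population standard deviation for std_rpl is an assumption, and the input is assumed to have at least one row.

```python
import statistics


def basic_features(row_ptr, col_idx):
    """Compute a small subset of sparsity features for a lower triangular CSR pattern."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        deps = [level[j] for j in col_idx[row_ptr[i]:row_ptr[i + 1]] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    n_levels = max(level) + 1
    rows_per_level = [0] * n_levels
    for lvl in level:
        rows_per_level[lvl] += 1
    return {
        "nnzs": row_ptr[-1],                       # number of non-zeros
        "m": n,                                    # number of rows/columns
        "lvls": n_levels,                          # number of levels
        "max_rpl": max(rows_per_level),
        "mean_rpl": statistics.mean(rows_per_level),
        "median_rpl": statistics.median(rows_per_level),
        "std_rpl": statistics.pstdev(rows_per_level),  # assumption: population std
    }


# Illustrative 4x4 pattern with levels {0, 1}, {2}, {3}.
feats = basic_features([0, 1, 2, 4, 7], [0, 1, 0, 2, 1, 2, 3])
print(feats["nnzs"], feats["m"], feats["lvls"], feats["max_rpl"])  # 7 4 3 2
```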

Page 46: A Prediction Framework for Fast Sparse Triangular Solves

SpTRSV Prediction Framework

• Matrix Feature Extractor

– A C++/CUDA tool

• Uses CPU, GPU for efficient feature extraction

• Performance data collector

– For each input matrix, collects performance data for all algorithms

– Reports ID of the fastest algorithm

• Model Trainer and Tester

• Model Selection

– Scikit-learn library for model selection and evaluation

– Supervised machine learning

• Deep learning requires large data sets and long training times

– Evaluated classification models:

• Decision trees

• Random Forests

• Support Vector Machine (with grid-search)

• K-Nearest Neighbors

• Multi-Layer Perceptron

[Figure: the Model Trainer and Tester takes the matrix features and the IDs of the fastest algorithm as inputs]
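A sketch of what the performance data collector does conceptually: time every algorithm on one matrix and report the ID of the fastest. The callables stand in for real SpTRSV implementations, and taking the best of a few repeats is an assumption. The slides do not say how the framework measures time.

```python
import time


def time_algorithm(solve, matrix, repeats=3):
    """Return the best wall-clock time of `solve(matrix)` over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        solve(matrix)
        best = min(best, time.perf_counter() - start)
    return best


def collect_performance(algorithms, matrix, repeats=3):
    """Map each algorithm ID to its measured time on `matrix`."""
    return {alg_id: time_algorithm(solve, matrix, repeats)
            for alg_id, solve in algorithms.items()}


def fastest_id(timings):
    """ID of the algorithm with the smallest time; this becomes the class label."""
    return min(timings, key=timings.get)


# Stand-in callables (hypothetical); a real repository would wrap MKL, cuSPARSE, etc.
algorithms = {"MKL(seq)": lambda m: sum(m), "Sync-Free": lambda m: max(m)}
measured = collect_performance(algorithms, [1.0, 2.0, 3.0])

# Made-up timings in seconds, only to show the labelling step.
label = fastest_id({"MKL(seq)": 1.8e-3, "CUS1": 2.4e-3, "Sync-Free": 0.9e-3})
print(label)  # Sync-Free
```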

Page 53: A Prediction Framework for Fast Sparse Triangular Solves

Model Training and Testing

[Figure: model training and testing flow. Labels: Sparse Matrix Dataset, Matrix Feature Extraction, Algorithm Performance Data, Feature Scaling, Training Set (75%), Testing Set (25%), Classifier Training, Trained Model, Predicted Algorithm, 10-fold cross validation]
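A small sketch of the train/test flow in plain Python: a 75%/25% split, feature scaling, and a k-nearest-neighbors classifier (one of the evaluated model types). The framework itself uses scikit-learn. The helper names, the two-feature data and its labels are made up for illustration. Fitting the scaler on the training set only is an assumption, not something the slides state.

```python
import math
import random


def split_75_25(samples, seed=0):
    """Shuffle and split samples into 75% training and 25% testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(rows):
    """Per-feature mean and standard deviation (std of 0 is replaced by 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, scaler):
    means, stds = scaler
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = {}
    for i in nearest:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(votes, key=votes.get)


# Hypothetical (nnzs, lvls) features with made-up fastest-algorithm labels.
data = [((1000, 900), "MKL(seq)"), ((1100, 950), "MKL(seq)"),
        ((1200, 1000), "MKL(seq)"), ((1300, 980), "MKL(seq)"),
        ((9000, 20), "Sync-Free"), ((9100, 30), "Sync-Free"),
        ((9200, 25), "Sync-Free"), ((9300, 35), "Sync-Free")]
train, test = split_75_25(data)
scaler = fit_scaler([x for x, _ in train])
train_x = [scale(x, scaler) for x, _ in train]
train_y = [y for _, y in train]
correct = sum(knn_predict(train_x, train_y, scale(x, scaler)) == y for x, y in test)
accuracy = correct / len(test)
```

With real data, the same split would be repeated inside a 10-fold cross validation to get a more stable accuracy estimate.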

Page 65: A Prediction Framework for Fast Sparse Triangular Solves

Effects of CPU-GPU Data Transfers

• Computations before/after SpTRSV on different platforms

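[Figure: a numerical application solving Ly = b, with the computations before and after SpTRSV on different platforms. First diagram: computations on the CPU, then SpTRSV, then computations on the GPU; SpTRSV on the CPU transfers y, SpTRSV on the GPU transfers b. Second diagram: computations on the GPU, then SpTRSV, then computations on the CPU; SpTRSV on the GPU transfers y, SpTRSV on the CPU transfers b.]

Data transfers have no impact on algorithm selection in this case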

Page 75: A Prediction Framework for Fast Sparse Triangular Solves

Effects of CPU-GPU Data Transfers

• Computations before/after SpTRSV on same platform

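[Figure: a numerical application solving Ly = b, with the computations before and after SpTRSV on the same platform. First diagram: computations on the CPU before and after SpTRSV; SpTRSV on the CPU is shown without transfers, SpTRSV on the GPU transfers b and y. Second diagram: boxes labeled "Computations on GPU" and "Computations on CPU" around SpTRSV; SpTRSV on the GPU is shown without transfers, SpTRSV on the CPU transfers y and b.]

Data transfers may impact algorithm selection in this case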

Framework accounts for data transfer costs
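A sketch of why transfers matter only in the same-platform case. It assumes one fixed cost per vector transfer (b before the solve, y after it) and simply adds that cost to the kernel time. The cost model, function names and timings are illustrative, not the framework's actual implementation.

```python
CPU_ALGOS = {"MKL(seq)", "MKL(par)"}  # remaining algorithm IDs run on the GPU


def platform_of(alg_id):
    return "CPU" if alg_id in CPU_ALGOS else "GPU"


def total_time(alg_id, kernel_time, before, after, transfer_time):
    """Kernel time plus the transfers needed for b (input) and y (output)."""
    platform = platform_of(alg_id)
    cost = kernel_time
    if platform != before:
        cost += transfer_time  # b must be moved to the SpTRSV platform
    if platform != after:
        cost += transfer_time  # y must be moved to the next computation
    return cost


def select(kernel_times, before, after, transfer_time):
    """Pick the algorithm with the lowest total time including transfers."""
    return min(kernel_times,
               key=lambda a: total_time(a, kernel_times[a], before, after, transfer_time))


# Made-up kernel times and per-vector transfer cost (milliseconds).
kernel_times = {"MKL(seq)": 1.0, "Sync-Free": 0.7}

# Different platforms: every choice pays exactly one transfer, so the ranking is unchanged.
different = select(kernel_times, before="CPU", after="GPU", transfer_time=0.2)

# Same platform: only the GPU algorithm pays for both b and y, which flips the choice.
same = select(kernel_times, before="CPU", after="CPU", transfer_time=0.2)
print(different, same)  # Sync-Free MKL(seq)
```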

Page 80: A Prediction Framework for Fast Sparse Triangular Solves

Evaluation

• Hardware platform

– CPU: Intel Gold (6148)• 40 cores (20 cores/socket)

– GPU: NVIDIA Tesla V100

• Software configuration

– Intel Parallel Studio 2019

– NVIDIA CUDA 10.1

– Compiler options:• -O3

• -gencode arch=compute_70, code=sm_70

• Sparse Matrix Dataset

– 998 real square matrices from SuiteSparse matrix collection• 1K to 16.24M rows

• 1.074K to 232M nnzs

– Extensible with new matrices

• Performance of SpTRSV Algorithms (998 matrices)

A Prediction Framework for Fast Sparse Triangular Solves 15

MKL(seq)41% MKL(par)

1%

CUS111%

CUS2(lvl)6%

CUS2(no lvl)2%Sync-Free

39%

Page 81: A Prediction Framework for Fast Sparse Triangular Solves

Evaluation

• Hardware platform

– CPU: Intel Xeon Gold 6148

• 40 cores (20 cores/socket)

– GPU: NVIDIA Tesla V100

• Software configuration

– Intel Parallel Studio 2019

– NVIDIA CUDA 10.1

– Compiler options:

• -O3

• -gencode arch=compute_70,code=sm_70

• Sparse Matrix Dataset

– 998 real square matrices from the SuiteSparse Matrix Collection

• 1K to 16.24M rows

• 1.074K to 232M nonzeros (nnz)

– Extensible with new matrices

• Performance of SpTRSV Algorithms (998 matrices)

– MKL(seq): 41%

– MKL(par): 1%

– CUS1: 11%

– CUS2(lvl): 6%

– CUS2(no lvl): 2%

– Sync-Free: 39%

– By platform: CPU 42%, GPU 58%
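The CPU and GPU totals follow directly from the per-algorithm percentages above. A small sketch of that arithmetic, assuming the MKL variants run on the CPU and the cuSPARSE (CUS1, CUS2) and Sync-Free variants run on the GPU, as the introduction of the talk lists them:

```python
# Per-algorithm percentages as shown on the slide.
shares = {
    "MKL(seq)": 41,
    "MKL(par)": 1,
    "CUS1": 11,
    "CUS2(lvl)": 6,
    "CUS2(no lvl)": 2,
    "Sync-Free": 39,
}

# Assumed platform of each algorithm (MKL on CPU, the rest on GPU).
platform = {name: ("CPU" if name.startswith("MKL") else "GPU") for name in shares}

totals = {"CPU": 0, "GPU": 0}
for name, pct in shares.items():
    totals[platform[name]] += pct

print(totals)                # CPU: 41 + 1, GPU: 11 + 6 + 2 + 39
print(sum(shares.values()))  # the six shares cover all matrices
```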

Page 84: A Prediction Framework for Fast Sparse Triangular Solves

Evaluation: Prediction Accuracy

– 10-fold cross validation

– With 30 features: mean ~87%, ~89%, ~87%, ~87%

– With top 10 features: mean ~80%, ~81%, ~80%, ~79%
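For readers unfamiliar with 10-fold cross validation, the procedure can be sketched in plain Python. This is an illustration only: the slides do not say which classifier the framework uses, so a 1-nearest-neighbour rule stands in here, and the two-cluster data set and all function names are made up.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in classifier: label of the closest training point."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds, each fold held out once for testing."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        tX = [X[i] for i in train]
        ty = [y[i] for i in train]
        correct = sum(nearest_neighbor_predict(tX, ty, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Toy data: two well-separated clusters of hypothetical matrix features,
# labelled with the algorithm assumed to be fastest for each.
X = [(i * 0.1, 0.0) for i in range(20)] + [(100 + i * 0.1, 0.0) for i in range(20)]
y = ["MKL(seq)"] * 20 + ["Sync-Free"] * 20
accuracy = cross_val_accuracy(X, y, k=10)
print(accuracy)
```

Every sample is used for testing exactly once and for training in the other nine folds, so the reported mean is less sensitive to one lucky or unlucky split.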

Page 85: A Prediction Framework for Fast Sparse Triangular Solves

Evaluation: Speedup Gain

• Framework achieves significant speedups over arbitrary algorithm choice
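One common way to quantify such a speedup is the mean of per-matrix ratios between the time of a fixed algorithm and the time of the predicted one. The sketch below assumes that definition (the slide does not state how the mean is computed) and uses made-up timings; `mean_speedup` is a hypothetical helper.

```python
def mean_speedup(times, predicted, baseline):
    """Arithmetic mean of baseline_time / predicted_time over all matrices.

    times:     one dict per matrix mapping algorithm name -> time in ms
    predicted: the algorithm chosen for each matrix
    baseline:  a single algorithm used for every matrix
    """
    ratios = [t[baseline] / t[choice] for t, choice in zip(times, predicted)]
    return sum(ratios) / len(ratios)


# Made-up timings for two matrices and two algorithms.
times = [
    {"MKL(seq)": 4.0, "Sync-Free": 2.0},
    {"MKL(seq)": 1.0, "Sync-Free": 3.0},
]
predicted = ["Sync-Free", "MKL(seq)"]  # assume the model picks the faster one

vs_mkl = mean_speedup(times, predicted, "MKL(seq)")         # (4/2 + 1/1) / 2
vs_syncfree = mean_speedup(times, predicted, "Sync-Free")   # (2/2 + 3/1) / 2
print(vs_mkl, vs_syncfree)
```

The result depends on which fixed algorithm serves as the baseline, which is one reason a range of mean speedups is reported rather than a single number.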

Page 86: A Prediction Framework for Fast Sparse Triangular Solves

Evaluation: Overheads

• Acceptable overheads, especially for large matrices

Page 87: A Prediction Framework for Fast Sparse Triangular Solves

Related Work

• OSKI library [5]

– Runtime autotuning of SpTRSV

• PetaBricks [6]

– Algorithm selection based on data size

• Nitro [7]

– Algorithm selection through user-guided machine learning

• MAPS simulation framework [8]

– Heuristics-based SpTRSV algorithm selection on CPU/GPU

– Limited to reservoir simulation

• SpTRSV algorithm selection on GPUs [9]

– Machine learning-based approach

Page 88: A Prediction Framework for Fast Sparse Triangular Solves

Conclusions

• We use a supervised machine learning approach for SpTRSV algorithm selection on CPU-GPU systems

• We implemented the approach as an automated, extensible framework for training prediction models and predicting the fastest SpTRSV algorithm

• Framework evaluation on an Intel Gold CPU + NVIDIA V100 GPU shows 87% model accuracy and 1.4-2.7x mean SpTRSV speedups

Page 91: A Prediction Framework for Fast Sparse Triangular Solves

Paper Artifacts

• Artifacts

– Support materials (source code, tools, benchmarks, datasets, models) required for reproducibility of claimed experimental results¹

• Artifact Evaluation Process (AEP) at Euro-Par¹

– Completely optional but highly recommended

– Evaluated by independent committee of experts

• Our motivation for Artifact Evaluation

– Making our research more accessible

– Documenting and organizing the research for future reference/extension

– Enhancing the credibility of the research claims

¹ Euro-Par 2020 Call for Artifact Evaluation

Page 97: A Prediction Framework for Fast Sparse Triangular Solves

Artifact Preparation for Evaluation

• File organization

– A selection criterion: Artifact should be self-contained

Page 101: A Prediction Framework for Fast Sparse Triangular Solves

Artifact Preparation for Evaluation

• File organization

– A selection criterion: Artifacts with long running time will not be evaluated

Page 104: A Prediction Framework for Fast Sparse Triangular Solves

Artifact Preparation for Evaluation

• File organization

– A selection criterion: Ease of use

Page 105: A Prediction Framework for Fast Sparse Triangular Solves

Artifact Preparation for Evaluation

• The Overview Document

Page 112: A Prediction Framework for Fast Sparse Triangular Solves

Artifact Preparation for Evaluation

• Reproducing Paper Results

Page 116: A Prediction Framework for Fast Sparse Triangular Solves

Artifact Preparation for Evaluation

• Dataset Generation Guide

Page 122: A Prediction Framework for Fast Sparse Triangular Solves

Artifact Evaluation

• Final Remarks

– Artifact Evaluation is a time-consuming but rewarding process

– Artifact mirror: https://github.com/ParCoreLab/SpTRSV_Framework

– For questions, queries, suggestions, contact: [email protected]

Page 123: A Prediction Framework for Fast Sparse Triangular Solves

THANK YOU

Page 124: A Prediction Framework for Fast Sparse Triangular Solves

References

[1] Jamal, A. et al., “A Hybrid CPU/GPU Approach for the Parallel Algebraic Recursive Multilevel Solver pARMS”, 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Timisoara, 2016

[2] Park, J. et al., “Sparsifying synchronization for high-performance shared-memory sparse triangular solver”, ISC, 2014

[3] Liu, W. et al., “Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides”, Concurrency and Computation: Practice and Experience, 2017

[4] Li et al., “Efficient parallel implementations of sparse triangular solves for GPU architectures”, SIAM Conference on Parallel Processing for Scientific Computing, 2020

[5] Vuduc et al., “OSKI: A library of automatically tuned sparse matrix kernels”, Journal of Physics: Conference Series, 2005

[6] Ansel, J. et al., “PetaBricks: A language and compiler for algorithmic choice”, SIGPLAN, 2009

[7] Muralidharan, S. et al., “Nitro: A framework for adaptive code variant tuning”, IEEE 28th IPDPS, 2014

[8] Klie, H. et al., “Exploiting capabilities of many core platforms in reservoir simulation”, SPE Reservoir Simulation Symposium, 2011

[9] Dufrechou, E. et al., “Automatic selection of sparse triangular linear system solvers on GPUs through machine learning techniques”, International Symposium on Computer Architecture and High Performance Computing, 2019