Gaussian Elimination By Yequn Zhang, Yu Zhang

Post on 20-Dec-2015

Contents

Introduction
Problem Analysis
Proposed Algorithm
Evaluation

Introduction

Gaussian Elimination
Forward Elimination
Back Substitution
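
For reference, the two phases named on this slide can be sketched sequentially as follows. This is a minimal plain-Python sketch, not the authors' GPU code; the function names `forward_eliminate`, `back_substitute`, and `gaussian_solve` are hypothetical, and partial pivoting is included because later slides mention finding the max row and swapping.

```python
def forward_eliminate(a, b):
    """Reduce the system a x = b to upper-triangular form, in place."""
    n = len(a)
    for k in range(n):
        # Partial pivoting: choose the row with the largest |a[i][k]|.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        # Subtract a multiple of the pivot row from every row below it.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]


def back_substitute(a, b):
    """Solve the upper-triangular system a x = b, last row first."""
    n = len(a)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def gaussian_solve(a, b):
    """Solve a x = b without modifying the caller's data."""
    a = [row[:] for row in a]
    b = b[:]
    forward_eliminate(a, b)
    return back_substitute(a, b)
```

Calling `gaussian_solve` on a small dense system returns the solution vector as a list of floats; the inputs are copied first so the caller's matrix is left untouched.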

Problem Analysis
The data size used by the kernels changes continuously
It is difficult to find an appropriate block size that avoids divergence
Block-based approach
Assign part of the computation to the CPU, leaving the irregularity to the CPU
Manually make the data size change in steps of the block size
The number of blocks per grid is then easy to set
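
One possible reading of this block-based split, sketched in plain Python. The helper `gpu_region` is hypothetical, and the default block size of 64 is taken from the back-substitution slide; the slides do not say how the authors compute the split, so treat this as an assumption.

```python
BLOCK_SIZE = 64  # assumed; the slides mention 64 when discussing back substitution


def gpu_region(k, n, block_size=BLOCK_SIZE):
    """Return (start, blocks) for the block-aligned region at pivot index k.

    Indices k+1 .. start-1 form the irregular remainder left to the CPU.
    Indices start .. n-1 begin on a block boundary, so the number of
    blocks per grid follows directly from a ceiling division.
    """
    start = min((k // block_size + 1) * block_size, n)  # next block boundary
    remaining = n - start
    blocks = -(-remaining // block_size)  # ceiling division
    return start, blocks
```

With `n = 256` and the default block size, every pivot from 0 to 63 gives a region starting at 64 that needs 3 blocks, so the grid size changes only once per block of pivots rather than at every step.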

Proposed Algorithm

Forward Elimination
A block-based approach
Try to avoid divergence
Try to use GPU
Try to be fine-grained

K 1

Find Max Row

Swap on CPU

Now start to eliminate the block of data on the CPU
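
The find-max-row and swap steps together amount to partial pivoting. A minimal CPU-side sketch, assuming the matrix is a list of rows and the right-hand side is a separate list; `find_max_row` and `swap_rows` are hypothetical names, not taken from the slides.

```python
def find_max_row(a, k):
    """Return the index of the row i >= k with the largest |a[i][k]|."""
    best = k
    for i in range(k + 1, len(a)):
        if abs(a[i][k]) > abs(a[best][k]):
            best = i
    return best


def swap_rows(a, b, k, p):
    """Exchange rows k and p of the matrix and of the right-hand side."""
    if p != k:
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
```

Choosing the largest entry in the column as the pivot keeps every multiplier at most 1 in magnitude, which limits the growth of rounding errors during elimination.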

Calculate coefficients

Elimination on CPU
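
Splitting the work into "calculate coefficients" and "elimination" can be sketched as two separate passes. This is a plain-Python illustration under the assumption that "coefficients" means the row multipliers a[i][k] / a[k][k]; the function names are hypothetical.

```python
def calculate_coefficients(a, k):
    """Multipliers for rows k+1 .. n-1, relative to the pivot a[k][k]."""
    pivot = a[k][k]
    return [a[i][k] / pivot for i in range(k + 1, len(a))]


def eliminate(a, b, k, coeffs):
    """Subtract coeffs[i] times the pivot row from each row below row k."""
    n = len(a)
    for offset, m in enumerate(coeffs):
        i = k + 1 + offset
        for j in range(k, n):
            a[i][j] -= m * a[k][j]
        b[i] -= m * b[k]
```

Once the multipliers are stored, each update of a single entry depends only on its own multiplier and on the pivot row, so the updates are independent of one another. That independence is what makes the elimination pass a natural candidate for a GPU kernel.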

K 1

Calculate Coefficients

K 2

Elimination on CPU

Swap on GPU

K 3

K 4

Elimination on GPU

K 5

Elimination on GPU

Intra-block loop

Inter-block loop

Last inter-block loop processed on CPU
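
The loop structure suggested by these slides can be sketched as below. This is only a guess at the overall shape, written in plain Python: both paths run on the CPU here, the call marked as a stand-in is where a real implementation would presumably launch a kernel, and the names `update_columns` and `blocked_forward_eliminate` are hypothetical.

```python
def update_columns(a, k, coeffs, col_begin, col_end):
    """Apply a[i][j] -= m_i * a[k][j] for rows below k, columns [col_begin, col_end)."""
    for offset, m in enumerate(coeffs):
        i = k + 1 + offset
        for j in range(col_begin, col_end):
            a[i][j] -= m * a[k][j]


def blocked_forward_eliminate(a, b, block_size):
    """Forward elimination with the column range split at block boundaries.

    Returns a log of which path handled each pivot, for illustration only.
    """
    n = len(a)
    log = []
    for block_start in range(0, n, block_size):        # inter-block loop
        block_end = min(block_start + block_size, n)
        for k in range(block_start, block_end):        # intra-block loop
            p = max(range(k, n), key=lambda i: abs(a[i][k]))
            if a[p][k] == 0.0:
                raise ValueError("matrix is singular")
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
            coeffs = [a[i][k] / a[k][k] for i in range(k + 1, n)]
            # Irregular columns up to the block boundary: CPU path.
            update_columns(a, k, coeffs, k, block_end)
            if block_end < n:
                # Block-aligned columns: stand-in for a GPU kernel launch.
                update_columns(a, k, coeffs, block_end, n)
                log.append((k, "cpu+gpu"))
            else:
                # Last inter-block iteration: nothing block-aligned remains.
                log.append((k, "cpu"))
            for offset, m in enumerate(coeffs):
                b[k + 1 + offset] -= m * b[k]
    return log
```

In this sketch the width of the block-aligned region is fixed for all pivots inside one block and shrinks by one block per inter-block iteration. In the final iteration the region is empty, which matches the slide's note that the last inter-block loop is processed on the CPU.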

Back Substitution
Launch a kernel when the number of coefficients per row exceeds four times the block size (64*4=256)
Fine-grained: handled in a similar way to forward elimination, part on the CPU and part on the GPU
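
A sketch of the threshold rule, in plain Python. It assumes that "number of coefficients per row" means the count of already-solved unknowns that enter the sum for a given row; the slides do not spell this out. The GPU path is simulated by a block-wise partial-sum helper, and all names are hypothetical.

```python
BLOCK_SIZE = 64
GPU_THRESHOLD = 4 * BLOCK_SIZE  # 256, as stated on the slide


def blockwise_dot(row, x, begin, end, block_size=BLOCK_SIZE):
    """Stand-in for a GPU reduction: one partial sum per block, then a final sum."""
    partials = [
        sum(row[j] * x[j] for j in range(s, min(s + block_size, end)))
        for s in range(begin, end, block_size)
    ]
    return sum(partials)


def back_substitute(a, b, threshold=GPU_THRESHOLD):
    """Solve an upper-triangular system, picking a path per row by term count."""
    n = len(a)
    x = [0.0] * n
    paths = []
    for i in range(n - 1, -1, -1):
        count = n - 1 - i  # terms in the sum for row i
        if count > threshold:
            s = blockwise_dot(a[i], x, i + 1, n)  # would be a kernel launch
            paths.append("gpu")
        else:
            s = sum(a[i][j] * x[j] for j in range(i + 1, n))
            paths.append("cpu")
        x[i] = (b[i] - s) / a[i][i]
    return x, paths
```

The rows solved first have very few terms, so a kernel launch would cost more than it saves; only once a row has enough terms does the parallel reduction become worthwhile. Passing a small `threshold` makes it easy to exercise both paths on a tiny system.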

Evaluation

Block size effect

The contribution of swap and find max row
Is it necessary to implement every part on GPU?

Performance breakdown
Contribution of each part to the total performance, including kernels as well as the CPU part

Speedup

Questions?