
Parallel Algorithms For Dense Linear Algebra Computations

K.A. GALLIVAN, R.J. PLEMMONS, and A.H. SAMEH

February 6, 2013


Outline

1 Context

2 Intro/Abstract

3 Architecture

4 Computational Primitives

4.1 BLAS Level 1

4.2 BLAS Level 2

4.3 BLAS Level 3


5 The Big Idea: Blocksize Analysis

5.1 Results

6 Conclusion


1 Context

Meta-analysis covering:

1. Parallel algorithms for dense matrix computations

2. Implementation practices

3. Efficiency analysis


2 Intro/Abstract

1. Efficient parallel algorithm design ought to be architecture-specific

2. Efficient algorithms can be decomposed into computational primitives


3 Architecture


Both hierarchical shared-memory and distributed-memory architectures influence algorithm design through a data-loading overhead, denoted ∆l. Including the arithmetic time Ta, the total time is

T = Ta + ∆l = naτa + nlτl, (1)

where na and nl are the numbers of arithmetic operations and data loads, and τa and τl their respective costs.

This is the basis of our analysis. Equivalently,

∆l/Ta = λµ, (2)

where µ = nl/na is the cache-miss ratio and λ = τl/τa is the cost ratio.
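As a numeric sketch of this timing model (the machine parameters below are hypothetical, chosen only to make equations (1) and (2) concrete):

```python
# Hypothetical machine parameters, for illustration only.
tau_a = 1.0      # cost per arithmetic operation
tau_l = 8.0      # cost per data load
n_a = 1_000_000  # number of arithmetic operations
n_l = 50_000     # number of data loads

T_a = n_a * tau_a      # arithmetic time
delta_l = n_l * tau_l  # data-loading overhead
T = T_a + delta_l      # total time, eq. (1)

mu = n_l / n_a   # cache-miss ratio
lam = tau_l / tau_a  # cost ratio
# eq. (2): the overhead relative to arithmetic time equals lambda * mu
assert abs(delta_l / T_a - lam * mu) < 1e-12
```

Here ∆l/Ta = 0.4, i.e. data loading adds 40% on top of the pure arithmetic time.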


4 Computational Primitives

BLAS

1. Basic Linear Algebra Subprograms

2. Comprise the basic computational units of linear algebra


4.1 BLAS Level 1

Vector-Vector Operations

1. α← xT y (dot product)

2. y ← y ± αx (vector triads)

3. Note: BLAS 1 requires many synchronizations relative to the number of arithmetic ops (large µ = nl/na).
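The two primitives above can be sketched in a few lines of Python (reference implementations for clarity, not tuned code). Each does O(n) arithmetic on O(n) data, so µ stays roughly constant rather than shrinking with problem size:

```python
def dot(x, y):
    """BLAS 1 dot product: alpha <- x^T y."""
    return sum(xi * yi for xi, yi in zip(x, y))

def axpy(alpha, x, y):
    """BLAS 1 vector triad: y <- y + alpha * x (updates y in place)."""
    for i in range(len(y)):
        y[i] += alpha * x[i]
    return y
```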


4.2 BLAS Level 2

Matrix-vector Operations

1. y ← y ±Ax (matrix-vector product)

2. A← A± xyT (rank-1 update)

3. BLAS 2 allows us to compute many BLAS 1 primitives in parallel, thereby increasing na relative to nl per process.

4. Note: BLAS 2 degrades to BLAS 1 as the smaller dimension of A goes to 1.
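A reference sketch of the two operations (illustrative Python, not a tuned implementation). Note how each row of the matrix-vector product is itself a BLAS 1 dot product, which is exactly how BLAS 2 exposes the extra parallelism described above:

```python
def gemv(A, x, y):
    """BLAS 2 matrix-vector product: y <- y + A x (updates y in place)."""
    for i, row in enumerate(A):
        y[i] += sum(a * xj for a, xj in zip(row, x))  # one dot product per row
    return y

def ger(A, x, y):
    """BLAS 2 rank-1 update: A <- A + x y^T (updates A in place)."""
    for i in range(len(x)):
        for j in range(len(y)):
            A[i][j] += x[i] * y[j]
    return A
```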


4.3 BLAS Level 3

Matrix-matrix Operations

1. C ← C +AB (Matrix multiplication)

2. According to Gallivan et al., BLAS 3 is typically the most efficient primitive, provided the cache size is taken into account when partitioning/decomposing the problem. The blocksize decision gives us maximum speed-up.
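An unblocked reference sketch of the primitive (illustrative Python; real BLAS 3 implementations are blocked and tuned). The key property is that it performs O(n³) arithmetic on O(n²) data, which is why cache-aware blocking can drive µ so low:

```python
def gemm(C, A, B):
    """BLAS 3 matrix product: C <- C + A B (updates C in place)."""
    for i in range(len(A)):
        for j in range(len(B[0])):
            C[i][j] += sum(A[i][k] * B[k][j] for k in range(len(B)))
    return C
```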


5 The Big Idea: Blocksize Analysis

Consider the BLAS 3 primitive C ← C + AB. We would expect to partition the matrices C, A, and B into submatrices Cij, Aik, and Bkj whose dimensions are m1×m3, m1×m2, and m2×m3, respectively. Our basic loop might be of the form:
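A plausible sketch of such a blocked loop, in Python (for simplicity, the block sizes m1, m2, m3 are assumed to divide the matrix dimensions exactly):

```python
def blocked_gemm(C, A, B, m1, m2, m3):
    """Blocked C <- C + A B, with C, A, B partitioned into
    m1 x m3, m1 x m2, and m2 x m3 submatrices respectively."""
    n1, n2, n3 = len(A), len(B), len(B[0])
    assert n1 % m1 == 0 and n2 % m2 == 0 and n3 % m3 == 0
    for i0 in range(0, n1, m1):          # block rows of C
        for j0 in range(0, n3, m3):      # block columns of C
            for k0 in range(0, n2, m2):  # accumulate C_ij += A_ik * B_kj
                for i in range(i0, i0 + m1):
                    for j in range(j0, j0 + m3):
                        s = 0.0
                        for k in range(k0, k0 + m2):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C
```

The point of the blocking is that each Cij, Aik, Bkj triple can be kept in cache while its inner loops run, so the data-load count depends on the block sizes rather than on the full matrix dimensions.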


where n1 = k1m1, n2 = k2m2, and n3 = k3m3. Consider the number of transfers required for the given submatrices:

µ = 1/(2m1) + 1/(2m2) + 1/(2n3) (3)

With an infinite cache, we have a minimum of:

µ = 1/(2n1) + 1/(2n2) + 1/(2n3) (4)

We want to choose m1 and m2 to minimize µ, subject to the number of processors and the cache size.


As it turns out, this takes the form

µ = 1/√CS + p/(2CS) + 1/(2n3), (5)

where CS is the cache size, p is the number of processors, and n3 is assumed larger than √CS.
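The blocksize choice behind (5) can be illustrated in Python (the CS and n3 values below are hypothetical). Picking m1 = m2 = √CS, so that one block fills the cache, collapses the first two terms of (3) into the leading 1/√CS term of (5); the p/(2CS) contribution from partitioning across processors is not modeled here:

```python
import math

def mu_block(m1, m2, n3):
    """Cache-miss ratio for block sizes m1, m2, per eq. (3)."""
    return 1 / (2 * m1) + 1 / (2 * m2) + 1 / (2 * n3)

CS = 4096  # hypothetical cache size, in words
n3 = 1024  # hypothetical problem dimension

b = int(math.sqrt(CS))  # m1 = m2 = sqrt(CS): one b x b block fits in cache
# First two terms of (3) become 2 * 1/(2*sqrt(CS)) = 1/sqrt(CS),
# matching the leading term of eq. (5).
assert abs(mu_block(b, b, n3) - (1 / math.sqrt(CS) + 1 / (2 * n3))) < 1e-12
```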


5.1 Results

[Figure: performance for a square matrix multiplication on the Alliant FX/8]


6 Conclusion

1. Data Locality - The key factor in exploiting parallelism.

2. Blocksize - Main tool to control factors of Data Locality and ensure effective load management


Questions?