Generic Programming for High Performance Numerical Linear Algebra
Jeremy Siek and Andrew Lumsdaine
Department of Computer Science and Engineering
University of Notre Dame
Overview
1. Introduction & Motivation
2. Generic Programming
3. Generic Algorithms for Linear Algebra
4. The MTL Algorithms
5. The MTL Components
6. High Performance
7. Conclusion
Introduction & Motivation
• Scientific software can benefit from software engineering methodologies
  – Development
  – Maintenance
• Perpetual interest in using C++ (e.g.) in scientific computing
• Common perception: Abstraction is the enemy of performance
Introduction & Motivation
• C++ can be used effectively in scientific computing (with concomitant software engineering benefits)
• Generic programming has some particular benefits in this domain
• Follow the theme of the STL
• Always keep high performance in mind
Combinatorial Explosion
• Four precision types
• Several dense storage types
• A multitude of sparse storage types
• Row and column oriented matrices
• Scaling and striding
Combinatorial Explosion
• Unnecessary artifact of certain programming languages
• Algorithm expression also includes data type information
• Not necessary in certain other languages (most notably, C++)
Generic Programming
• Algorithms can be expressed independently of data storage formats
• Define standard interfaces for data storage components
• Iterators are the interface between containers and algorithms
• E.g., the Standard Template Library
• High performance linear algebra is amenable to generic programming
Iterator Bridge Between Algorithms and Containers
[Figure: the Iterator sits between the Algorithm and the Container]
Example of Generic Programming
template <class InIter, class T>
T accumulate(InIter first, InIter last, T init)
{
  // Works for any input iterator range [first, last) and any T that supports +.
  while (first != last)
    init = init + *first++;
  return init;
}
// how it is used:
std::vector<double> x(10, 1.0);   // ten elements, each 1.0
double sum = accumulate(x.begin(), x.end(), 0.0);
Generic Algorithms for Linear Algebra
• Extend the generic style of programming to the domain of linear algebra
• A matrix can be abstractly thought of as a container of containers
• Use iterators and 2-dimensional iterators to traverse the matrix
• A large class of matrix types can be implemented with this interface
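The container-of-containers view can be sketched with standard containers alone. This is only an illustration under assumptions: `sum_all` and `demo_sum_all` are hypothetical names, and the nested `std::vector`/`std::list` types stand in for real matrix classes; none of this is MTL code.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Sums every element of any "container of containers".
// Only begin()/end() on both levels are required, so the same code
// works for different storage choices.
template <class Matrix, class T>
T sum_all(const Matrix& A, T init)
{
  // The outer (2-D) iterator walks the rows; the inner (1-D) iterator walks one row.
  for (typename Matrix::const_iterator i = A.begin(); i != A.end(); ++i)
    for (typename Matrix::value_type::const_iterator j = i->begin();
         j != i->end(); ++j)
      init = init + *j;
  return init;
}

// Hypothetical demo: two different storage layouts, one algorithm.
inline void demo_sum_all()
{
  std::vector<std::vector<double> > dense(2, std::vector<double>(3, 1.0));
  std::list<std::vector<int> > ragged;
  ragged.push_back(std::vector<int>(2, 5));   // row with two entries
  ragged.push_back(std::vector<int>(1, 7));   // row with one entry

  assert(sum_all(dense, 0.0) == 6.0);
  assert(sum_all(ragged, 0) == 17);
}
```

Calling `demo_sum_all()` from `main()` exercises both layouts; the rows of `ragged` have different lengths, which the algorithm never needs to know.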
The MTL Generic Algorithms
• Encompass BLAS functionality
• A single algorithm is typically used for all matrix and numeric types
• Index-less algorithms
• Sparse and dense algorithms unified
• Transpose, scaling, and striding handled by adapters, not the algorithm
Index-less Algorithms
• Iterate from begin() to end() of a vector
• Iterate from begin_rows() to end_rows() (or the column equivalents) of a 2-D container
• This side-steps traditional annoyances such as the difference between Fortran (from 1) and C (from 0) indexing
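As a rough sketch of what index-less means in practice, the routine below computes a one-norm purely in terms of an iterator range, so no base index ever appears. The names `one_norm` and `demo_one_norm` are assumptions made for this illustration and are not taken from the MTL.

```cpp
#include <cassert>
#include <cmath>
#include <list>
#include <vector>

// Index-less one-norm: sum of absolute values over [first, last).
// The loop never mentions an index, so 0-based versus 1-based
// conventions cannot leak into the algorithm.
template <class Iter, class T>
T one_norm(Iter first, Iter last, T init)
{
  for (; first != last; ++first)
    init = init + std::abs(*first);
  return init;
}

// Hypothetical demo: the same routine on three kinds of storage.
inline void demo_one_norm()
{
  double raw[] = { 1.0, -2.0, 3.0 };
  std::vector<double> v(raw, raw + 3);
  std::list<double> l(raw, raw + 3);

  assert(one_norm(raw, raw + 3, 0.0) == 6.0);         // plain C array
  assert(one_norm(v.begin(), v.end(), 0.0) == 6.0);   // std::vector
  assert(one_norm(l.begin(), l.end(), 0.0) == 6.0);   // std::list
}
```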
Unifying Sparse and Dense
• Iterators hide the difference in traversal
• The index() method hides the difference in indexing
• An example from a matrix-vector multiply:
  for (j = i->begin(); not_at(j, i->end()); ++j)
    tmp += *j * x[j.index()];
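To make the role of index() concrete, here is a minimal sketch with two hand-written iterators, one dense and one sparse, that share the same small interface. The classes `dense_iter` and `sparse_iter` and the function `row_times_vector` are hypothetical and greatly simplified; they are not the MTL iterator classes, and the loop uses `!=` in place of `not_at`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Dense iterator: the index is simply the position in the array.
class dense_iter {
  const double* base_;
  std::size_t pos_;
public:
  dense_iter(const double* base, std::size_t pos) : base_(base), pos_(pos) {}
  double operator*() const { return base_[pos_]; }
  dense_iter& operator++() { ++pos_; return *this; }
  bool operator!=(const dense_iter& o) const { return pos_ != o.pos_; }
  std::size_t index() const { return pos_; }
};

// Sparse iterator: visits only the stored (index, value) pairs.
class sparse_iter {
  typedef std::vector<std::pair<std::size_t, double> >::const_iterator it_t;
  it_t it_;
public:
  explicit sparse_iter(it_t it) : it_(it) {}
  double operator*() const { return it_->second; }
  sparse_iter& operator++() { ++it_; return *this; }
  bool operator!=(const sparse_iter& o) const { return it_ != o.it_; }
  std::size_t index() const { return it_->first; }
};

// One inner loop for both kinds of row: multiply the row by the dense vector x.
template <class RowIter>
double row_times_vector(RowIter j, RowIter last, const std::vector<double>& x)
{
  double tmp = 0.0;
  for (; j != last; ++j)
    tmp += *j * x[j.index()];
  return tmp;
}

// Hypothetical demo: the same row stored densely and sparsely.
inline void demo_row_times_vector()
{
  std::vector<double> x(4);
  x[0] = 1.0; x[1] = 2.0; x[2] = 3.0; x[3] = 4.0;

  const double dense_row[] = { 0.0, 5.0, 0.0, 2.0 };
  std::vector<std::pair<std::size_t, double> > sparse_row;
  sparse_row.push_back(std::make_pair(std::size_t(1), 5.0));
  sparse_row.push_back(std::make_pair(std::size_t(3), 2.0));

  double d = row_times_vector(dense_iter(dense_row, 0),
                              dense_iter(dense_row, 4), x);
  double s = row_times_vector(sparse_iter(sparse_row.begin()),
                              sparse_iter(sparse_row.end()), x);
  assert(d == 18.0);   // 5*2 + 2*4, zeros included in the traversal
  assert(s == 18.0);   // same result, zeros never visited
}
```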
A Generic Matrix-Vector Multiply
template <class Matrix, class IterX, class IterY>
void matvec::mult(Matrix A, IterX x, IterY y) {
  typename Matrix::row_2Diterator i;        // walks the rows of A
  typename Matrix::RowVector::iterator j;   // walks the stored entries of one row
  for (i = A.begin_rows(); not_at(i, A.end_rows()); ++i) {
    typename Matrix::PR tmp = y[i.index()];   // start from the current entry of y
    for (j = i->begin(); not_at(j, i->end()); ++j)
      tmp += *j * x[j.index()];               // index() gives the column of *j
    y[i.index()] = tmp;
  }
}
Transpose, Scaling, and Striding
• Matrix and vector adapters
• An adapter wraps up an object and modifies its behavior
  // y <- A' * alpha x
  matvec::mult(trans(A),
               scale(x, alpha),
               stride(y, incy));
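The adapter idea can be sketched as thin wrapper classes that change how an existing vector is read or written without copying it. Everything below is an assumption made for illustration: `scaled_view`, `strided_view`, `scale_view`, `stride_view`, and `add_to` are hypothetical names, the element type is fixed to double for brevity, and the real MTL adapters are iterator based rather than built on `operator[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scaling adapter: reading element i yields alpha * v[i]; nothing is copied.
template <class Vec>
class scaled_view {
  const Vec& v_;
  double alpha_;
public:
  scaled_view(const Vec& v, double alpha) : v_(v), alpha_(alpha) {}
  double operator[](std::size_t i) const { return alpha_ * v_[i]; }
  std::size_t size() const { return v_.size(); }
};

// Striding adapter: element i maps to element i * stride of the wrapped vector.
template <class Vec>
class strided_view {
  Vec& v_;
  std::size_t stride_;
public:
  strided_view(Vec& v, std::size_t stride) : v_(v), stride_(stride) {}
  double& operator[](std::size_t i) const { return v_[i * stride_]; }
  std::size_t size() const { return (v_.size() + stride_ - 1) / stride_; }
};

// Helper functions in the spirit of scale(x, alpha) and stride(y, incy).
template <class Vec>
scaled_view<Vec> scale_view(const Vec& v, double alpha)
{ return scaled_view<Vec>(v, alpha); }

template <class Vec>
strided_view<Vec> stride_view(Vec& v, std::size_t s)
{ return strided_view<Vec>(v, s); }

// y[i] += x[i], written once with no knowledge of scaling or striding.
template <class X, class Y>
void add_to(const X& x, Y y)
{
  for (std::size_t i = 0; i < x.size(); ++i)
    y[i] += x[i];
}

// Hypothetical demo: y(0), y(2) += 3 * x
inline void demo_adapters()
{
  std::vector<double> x(2), y(4, 0.0);
  x[0] = 1.0; x[1] = 2.0;
  y[0] = 10.0; y[2] = 20.0;
  add_to(scale_view(x, 3.0), stride_view(y, 2));
  assert(y[0] == 13.0 && y[1] == 0.0 && y[2] == 26.0 && y[3] == 0.0);
}
```

The point of the design is that `add_to` stays a single plain loop; the scaling and the stride live entirely in the argument types.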
The MTL Components
• Iterators
• 1-D Containers
• 2-D Containers
• Orientation Adapter: Row and Column
• Shape Adapter: Banded, Triangle, Symmetric
Example: triangle<row<array2D<dense1D<double>>>,lower>
The MTL Iterators
• For sparse vectors
• For dense vectors
• Strided iterator adapter
• Scaled iterator adapter
• Block iterator
The MTL 1-D Containers
• dense1D: similar to STL's vector class
• sparse1D: index-value pairs
• compressed1D: separate index and value arrays
• scaled1D: adapter class
• strided: adapter class
Example: triangle<row<array2D<dense1D<double>>>,lower>
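The two sparse layouts named above differ only in how the index and value of each stored entry are kept. The sketch below contrasts them using hypothetical structs (`pair_sparse_vec`, `compressed_vec`) and a hypothetical `sparse_dot` helper; it is meant to show the storage idea only and does not reproduce the MTL classes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Layout 1 (index-value pairs): one array whose elements carry both fields.
struct pair_sparse_vec {
  std::vector<std::pair<std::size_t, double> > entries;
  void push_back(std::size_t i, double v)
  { entries.push_back(std::make_pair(i, v)); }
};

// Layout 2 (compressed): two parallel arrays, one for indices, one for values.
struct compressed_vec {
  std::vector<std::size_t> indices;
  std::vector<double> values;
  void push_back(std::size_t i, double v)
  { indices.push_back(i); values.push_back(v); }
};

// Dot product of a sparse vector with a dense vector, once per layout.
inline double sparse_dot(const pair_sparse_vec& s, const std::vector<double>& x)
{
  double sum = 0.0;
  for (std::size_t k = 0; k < s.entries.size(); ++k)
    sum += s.entries[k].second * x[s.entries[k].first];
  return sum;
}

inline double sparse_dot(const compressed_vec& s, const std::vector<double>& x)
{
  double sum = 0.0;
  for (std::size_t k = 0; k < s.values.size(); ++k)
    sum += s.values[k] * x[s.indices[k]];
  return sum;
}

// Hypothetical demo: the same sparse vector in both layouts.
inline void demo_sparse_layouts()
{
  std::vector<double> x(5);
  for (std::size_t i = 0; i < x.size(); ++i) x[i] = double(i + 1);  // 1..5

  pair_sparse_vec a;
  compressed_vec b;
  a.push_back(1, 10.0); a.push_back(4, 2.0);
  b.push_back(1, 10.0); b.push_back(4, 2.0);

  assert(sparse_dot(a, x) == 30.0);   // 10*2 + 2*5
  assert(sparse_dot(b, x) == 30.0);
}
```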
The MTL 2-D Containers
• array2D: composes 1-D containers into a matrix
• dense2D: contiguous dense matrix
• compressed2D: contiguous sparse matrix
• scaled2D: adapter class
Example: triangle<row<array2D<dense1D<double>>>,lower>
The MTL Orientation Classes
• Row Orientation
  – Maps row to major
  – Maps column to minor
• Column Orientation
  – Maps column to major
  – Maps row to minor
• All 2-D methods and typedefs are mapped
Example: triangle<row<array2D<dense1D<double>>>,lower>
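One way to picture the major/minor mapping is a storage class that knows nothing about rows and columns, wrapped by two thin orientation classes. This is a minimal sketch under stated assumptions: `dense_storage`, `row_oriented`, and `column_oriented` are hypothetical names with a deliberately tiny interface, and the actual MTL orientation classes map far more (iterators, typedefs) than shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Orientation-free storage: a contiguous block addressed as (major, minor).
class dense_storage {
  std::size_t nmajor_, nminor_;
  std::vector<double> data_;
public:
  dense_storage(std::size_t nmajor, std::size_t nminor)
    : nmajor_(nmajor), nminor_(nminor), data_(nmajor * nminor, 0.0) {}
  double& at(std::size_t i_major, std::size_t i_minor)
  { return data_[i_major * nminor_ + i_minor]; }
  std::size_t num_major() const { return nmajor_; }
  std::size_t num_minor() const { return nminor_; }
  const double* raw() const { return &data_[0]; }
};

// Row orientation: row maps to major, column maps to minor.
template <class TwoD>
class row_oriented {
  TwoD s_;
public:
  row_oriented(std::size_t nrows, std::size_t ncols) : s_(nrows, ncols) {}
  double& operator()(std::size_t r, std::size_t c) { return s_.at(r, c); }
  std::size_t nrows() const { return s_.num_major(); }
  std::size_t ncols() const { return s_.num_minor(); }
  const double* raw() const { return s_.raw(); }
};

// Column orientation: column maps to major, row maps to minor.
template <class TwoD>
class column_oriented {
  TwoD s_;
public:
  column_oriented(std::size_t nrows, std::size_t ncols) : s_(ncols, nrows) {}
  double& operator()(std::size_t r, std::size_t c) { return s_.at(c, r); }
  std::size_t nrows() const { return s_.num_minor(); }
  std::size_t ncols() const { return s_.num_major(); }
  const double* raw() const { return s_.raw(); }
};

// Hypothetical demo: the same logical element lands at different offsets.
inline void demo_orientation()
{
  row_oriented<dense_storage> R(2, 3);
  column_oriented<dense_storage> C(2, 3);
  R(0, 1) = 7.0;
  C(0, 1) = 7.0;
  assert(R.nrows() == 2 && R.ncols() == 3);
  assert(C.nrows() == 2 && C.ncols() == 3);
  assert(R.raw()[1] == 7.0);   // row-major offset 0*3 + 1
  assert(C.raw()[2] == 7.0);   // column-major offset 1*2 + 0
}
```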
The MTL Shape Adapters
• banded, triangle
• symmetric, hermitian
Example: triangle<row<array2D<dense1D<double>>>,lower>
High Performance with C++
• Explicit unrolling and blocking in C++ using the BLAIS
• Use a good optimizing compiler to remove layers of abstraction: lightweight object optimization and inlining
• Follow a set of coding guidelines to ensure the above optimizations can be made, and double-check the intermediate C code
• Don't interfere with backend compiler unrolling and scheduling
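Explicit unrolling in C++ is commonly done with compile-time recursion over a template size parameter. The sketch below shows that general technique for a fixed-size dot product and a simple blocked driver loop. It is an assumption-laden illustration: `unrolled_dot` and `blocked_dot` are hypothetical names, the block size of 4 is arbitrary, and this is not the BLAIS interface.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size dot product, unrolled at compile time by template recursion.
// After inlining, unrolled_dot<4>::apply becomes four multiply-adds with no loop.
template <std::size_t N>
struct unrolled_dot {
  static double apply(const double* x, const double* y)
  {
    return x[0] * y[0] + unrolled_dot<N - 1>::apply(x + 1, y + 1);
  }
};

// The recursion ends at size zero.
template <>
struct unrolled_dot<0> {
  static double apply(const double*, const double*) { return 0.0; }
};

// Blocked dot product: unrolled blocks of 4, then a cleanup loop.
inline double blocked_dot(const double* x, const double* y, std::size_t n)
{
  double sum = 0.0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    sum += unrolled_dot<4>::apply(x + i, y + i);
  for (; i < n; ++i)          // remainder when n is not a multiple of 4
    sum += x[i] * y[i];
  return sum;
}

// Hypothetical demo with values that are exact in floating point.
inline void demo_blocked_dot()
{
  const double x[6] = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
  const double y[6] = { 2.0, 2.0, 2.0, 2.0, 2.0, 2.0 };
  assert(blocked_dot(x, y, 6) == 42.0);   // one block of 4 plus 2 leftovers
}
```

Whether the recursion is actually flattened depends on the compiler's inlining, which is why the slide's advice about checking the generated code matters.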
Dense Matrix-Matrix Performance (UltraSPARC 170E)
[Figure: Mflops (0 to 300) versus matrix size (log scale, 10^1 to 10^3) for MTL, Sun Perf Lib, ATLAS, Fortran BLAS, and TNT]
Dense Matrix-Matrix Performance (RS6000 590)
[Figure: Mflops (0 to 300) versus matrix size (log scale, 10^1 to 10^3) for MTL and ESSL]
Dense Matrix-Vector Performance (UltraSPARC 170E)
[Figure: Mflops (0 to 160) versus N (0 to 400) for MTL, Fortran BLAS, Sun Perf Lib, and TNT]
Sparse Matrix-Vector Performance (UltraSPARC 170E)
[Figure: Mflops (0 to 70) versus average non-zeroes per row (log scale, 10^0 to 10^2) for MTL, SPARSKIT, NIST, and TNT]
Conclusion
• High performance linear algebra
• Comprehensive (sparse, dense, etc.) and orthogonal
• Only 25,000 lines of code
• By comparison, 150,000 lines for the Fortran BLAS