Claude Tadonki
Mines ParisTech – CRI – Mathématiques et Systèmes
Laboratoire de l’Accélérateur Linéaire/IN2P3/CNRS
France
[email protected]

2nd Workshop on Architecture and Multi-Core Applications
23rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2011)
October 26-29, 2011, Vitória, Espírito Santo, Brazil.


Large Scale Kronecker Product on Supercomputers C. TADONKI

The Kronecker product (definition and applications)
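
For reference, the standard textbook definition of the Kronecker product is recalled here. The symbols A, B and the sizes m, n, p, q are chosen for illustration and are not taken from the slides.

```latex
A \otimes B =
\begin{pmatrix}
a_{11} B & \cdots & a_{1n} B \\
\vdots   & \ddots & \vdots   \\
a_{m1} B & \cdots & a_{mn} B
\end{pmatrix}
\in \mathbb{R}^{mp \times nq},
\qquad A \in \mathbb{R}^{m \times n},\quad B \in \mathbb{R}^{p \times q}.
```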


The Kronecker product (properties and problem formulation)


The Kronecker product (complexity and recurrence equation)


Forming the matrix first would
• require a huge amount of memory
• yield a lot of redundant multiplications, which in total would be [formula not preserved]

Using the so-called normal factorization, we could derive an optimal scheme which reduces the number of floating-point multiplications to [formula not preserved]
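
The contrast between the two approaches can be illustrated with a minimal sketch. It assumes the operation of interest is y = (A1 ⊗ ... ⊗ AN) x with square factors of orders n_1, ..., n_N and L = n_1 · ... · n_N; this setting, the function names, and the column-vector convention are assumptions made here, not the author's code. Under these assumptions, the explicit product costs L² multiplications for the matrix-vector step alone, whereas applying one factor at a time costs L · (n_1 + ... + n_N).

```python
from functools import reduce
from operator import mul


def kron(A, B):
    """Explicit Kronecker product of two dense matrices (lists of rows)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def kron_matvec_naive(mats, x):
    """Form A1 (x) ... (x) AN explicitly, then multiply it by x."""
    K = reduce(kron, mats)
    return [sum(k * v for k, v in zip(row, x)) for row in K]


def kron_matvec_factored(mats, x):
    """Apply the factors one at a time without forming the big matrix.

    Returns the result vector and the number of scalar multiplications.
    """
    sizes = [len(A) for A in mats]
    L = reduce(mul, sizes, 1)
    y = list(x)
    mults = 0
    right = L
    for A, n in zip(mats, sizes):
        right //= n  # stride of this factor's index in the flat vector
        z = [0] * L
        for base in range(L):
            if (base // right) % n:
                continue  # each fibre is handled once, from its first element
            for r in range(n):
                acc = 0
                for c in range(n):
                    acc += A[r][c] * y[base + c * right]
                    mults += 1
                z[base + r * right] = acc
        y = z
    return y, mults


if __name__ == "__main__":
    # Small illustrative factors (hypothetical data).
    mats = [[[1, 2], [3, 4]],
            [[0, 1], [1, 0]],
            [[1, 0, 2], [0, 1, 0], [3, 0, 1]]]
    x = list(range(12))
    y, mults = kron_matvec_factored(mats, x)
    print("results agree:", y == kron_matvec_naive(mats, x))
    print("multiplications, factored:", mults, "explicit:", 12 * 12)
```

The factored routine touches each of the L entries n_i times for factor i, which is where the L · Σ n_i count comes from; it also needs only two vectors of length L rather than an L × L matrix.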


The Kronecker product and its applications


Performance issues and heuristic for finding a good topology


The total (parallel) execution time depends on
• the sizes of the matrices
• the gap between the virtual topology and the physical topology
• the way the task is split among the processors (decomposition)
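
To make the notion of decomposition concrete, here is a rough sketch that enumerates candidate virtual topologies. It assumes that a decomposition of p processors is a tuple (p_1, ..., p_N) with p_1 · ... · p_N = p and each p_i dividing the order n_i of the i-th matrix; that reading, the function names, and the ranking proxy are assumptions for illustration and are not the heuristic from the talk.

```python
def topologies(orders, p):
    """Yield tuples (p_1, ..., p_N) whose product is p, with p_i dividing orders[i]."""
    if not orders:
        if p == 1:
            yield ()
        return
    n = orders[0]
    for d in range(1, min(n, p) + 1):
        if n % d == 0 and p % d == 0:
            for rest in topologies(orders[1:], p // d):
                yield (d,) + rest


def comm_steps_proxy(topology):
    """Hypothetical proxy: one exchange per other processor along each dimension."""
    return sum(d - 1 for d in topology)


if __name__ == "__main__":
    orders = [30, 36, 32, 18, 24, 16]  # matrix orders quoted in the talk
    candidates = sorted(topologies(orders, 64), key=comm_steps_proxy)
    print(len(candidates), "candidate topologies for 64 processors")
    print("lowest proxy cost:", candidates[0], comm_steps_proxy(candidates[0]))
```

The proxy only counts exchanges in a simple communication loop; it ignores message sizes and the mapping onto the physical network, both of which affect the real execution time.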


Performance


We consider N = 6 matrices of orders 30, 36, 32, 18, 24, 16, thus L = 159 252 480

We see that
• our heuristic yields a significant improvement compared to trivial decompositions
• we start losing scalability when the number of cores increases (communication)

We then turn to a hybrid implementation.


Performance of the hybrid implementation


We see that
• the hybrid implementation is better for larger numbers of cores
• for smaller numbers of cores, the shared-memory (SM) implementation exacerbates cache misses

We need to investigate the trade-off and a better memory layout.


END & QUESTIONS
