CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries


Page 1: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

CS 591x – Cluster Computing and Programming Parallel Computers

Parallel Libraries

Page 2: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

Parallel Libraries

Recall that so far we have been:
- Breaking up (decomposing) our "large" problems into smaller pieces
- Distributing the pieces of the problem to multiple processors
- Explicitly moving data among processes through message passing (see the sketch below)
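
As a reminder of what that explicit message passing looks like, here is a minimal sketch (not from the slides) in which rank 0 hands one piece of a decomposed problem to rank 1 with plain MPI_Send/MPI_Recv:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    double chunk[4] = {1.0, 2.0, 3.0, 4.0};   /* one "piece" of a decomposed problem */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* explicitly move the data to process 1 */
        MPI_Send(chunk, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(chunk, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %f ... %f\n", chunk[0], chunk[3]);
    }

    MPI_Finalize();
    return 0;
}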

Page 3: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

Parallel Libraries

Note that:
- Large scientific and engineering problems often represent data in matrices and vectors
- Large scientific and engineering problems make heavy use of linear algebra: linear systems and non-linear systems

Page 4: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

Parallel Libraries

MPI is designed to support the development of libraries. Consequently, there are a number of libraries, based on MPI, used to develop parallel software. Some libraries take care of much, or all, of the parallelization. That means…

Page 5: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

Parallel Libraries

… you don’t have to… but you still can… if you want… sometimes…

Page 6: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

Parallel Libraries

ScaLAPACK – Scalable Linear Algebra PACKage

PETSc – Portable, Extensible Toolkit for Scientific Computation

Page 7: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK

Built on LAPACK – the Linear Algebra PACKage
- powerful
- widely used in scientific and engineering computing
- not scalable to distributed-memory parallel computers

LAPACK is built on BLAS – the Basic Linear Algebra Subprograms library

Page 8: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK

uses PBLAS – Parallel BLAS
- performs local matrix and vector operations in the parallel application
- uses BLAS

uses BLACS – Basic Linear Algebra Communications Subprograms library
- handles interprocess communication for ScaLAPACK
- uses MPI (other implementations also exist)

Page 9: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK

Maps matrices and vectors to a process grid called a BLACS grid
- similar to an MPI Cartesian topology
- matrices and vectors are decomposed into rectangular blocks
- blocks are block-cyclically distributed over the BLACS grid (see the sketch below)
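
As an illustration only (this is not a ScaLAPACK routine; the helper name and variables are made up for the sketch), the owner of a row in a block-cyclic distribution can be computed like this:

/* Which process row owns global matrix row i when rows are dealt out
   in blocks of row_block_size, round-robin over nproc_rows process rows? */
int owning_proc_row(int i, int row_block_size, int nproc_rows) {
    int block = i / row_block_size;        /* which block the row falls in    */
    return block % nproc_rows;             /* blocks are dealt out cyclically */
}

/* The same formula with col_block_size and nproc_cols gives the owning
   process column, so (owning row, owning column) locates a block in the
   BLACS grid. */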

Page 10: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK – sample (based on Pacheco, pp. 345-350)

/* (variable declarations are omitted on the slides) */
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

/* read problem size, process-grid shape and block sizes */
Get_input(p, my_rank, &n, &nproc_rows, &nproc_cols,
          &row_block_size, &col_block_size);
m = n;

Cblacs_get(0, 0, &blacs_grid);               /* build BLACS grid */
/* "R": process grid will use row-major order */
Cblacs_gridinit(&blacs_grid, "R", nproc_rows, nproc_cols);
/* this process's coordinates in the grid */
Cblacs_pcoord(blacs_grid, my_rank, &my_proc_row, &my_proc_col);

Page 11: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK – sample cont.

/* dimensions of this process's local blocks */
local_mat_rows = get_dim(m, row_block_size, my_proc_row, nproc_rows);
local_mat_cols = get_dim(n, col_block_size, my_proc_col, nproc_cols);
Allocate(my_rank, "A", &A_local, local_mat_rows*local_mat_cols, 1);

b_local_size = get_dim(m, row_block_size, my_proc_row, nproc_rows);
Allocate(my_rank, "b", &b_local, b_local_size, 1);

exact_local_size = get_dim(m, col_block_size, my_proc_row, nproc_rows);
Allocate(my_rank, "Exact", &exact_local, exact_local_size, 1);

Page 12: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK – sample cont.

/* build ScaLAPACK array descriptors for A, b and Exact */
Build_descript(my_rank, "A", A_descript, m, n, row_block_size, col_block_size,
               blacs_grid, local_mat_rows);
Build_descript(my_rank, "b", b_descript, m, 1, row_block_size, 1,
               blacs_grid, b_local_size);
Build_descript(my_rank, "Exact", exact_descript, n, 1, col_block_size, 1,
               blacs_grid, exact_local_size);

Page 13: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK – sample cont.

/* fill A and Exact, then form b = A*Exact so the system has a known solution */
Initialize(p, my_rank, A_local, local_mat_rows, local_mat_cols,
           exact_local, exact_local_size);

Mat_vect_mult(m, n, A_local, A_descript, exact_local, exact_descript,
              b_local, b_descript);

Allocate(my_rank, "pivot_list", &pivot_list, local_mat_rows + row_block_size, 0);

MPI_Barrier(MPI_COMM_WORLD);

Page 14: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK – sample cont.

/* psgesv solves Ax = b; the solution is returned in b */
solve(my_rank, n, A_local, A_descript, pivot_list, b_local, b_descript);

Cblacs_exit(1);      /* nonzero: do not shut down MPI; MPI_Finalize is called next */
MPI_Finalize();
}

Page 15: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

ScaLAPACK – sample cont.

void Mat_vect_mult(int m, int n, float* A_local, int* A_descript,
                   float* x_local, int* x_descript,
                   float* y_local, int* y_descript) {
    char transpose = 'N';

    /* PBLAS matrix-vector multiply: y = alpha*A*x + beta*y
       (alpha, beta and the first_row/first_col index variables are set
       elsewhere in the full example) */
    psgemv(&transpose, &m, &n, &alpha, A_local,
           &first_row_A, &first_col_A, A_descript,
           x_local, &first_row_x, &first_col_x, x_descript,
           &beta,
           y_local, &first_row_y, &first_col_y, y_descript,
           &y_increment);
}

Page 16: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

Crossing Languages – Some Issues

Calling routines from another language
- e.g., calling a Fortran subroutine from C

Using n-dimensional arrays
- remember: C is row major, Fortran is column major

Passing arguments in routine/function calls
- Fortran passes by address, C passes by value
- so C code must pass addresses to Fortran routines (see the sketch below)
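
A minimal sketch of what this means in practice, calling the Fortran BLAS routine DDOT from C (the trailing underscore and the integer width are compiler dependent, so treat this as an illustration rather than a portable recipe):

#include <stdio.h>

/* Fortran BLAS: DDOT(N, DX, INCX, DY, INCY) -- note every argument is an
   address, because Fortran passes by address */
extern double ddot_(const int *n, const double *dx, const int *incx,
                    const double *dy, const int *incy);

int main(void) {
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    int n = 3, inc = 1;

    /* C must pass addresses, even for the scalar arguments */
    double d = ddot_(&n, x, &inc, y, &inc);
    printf("dot product = %f\n", d);        /* 32.0 */

    /* For 2-D arrays, remember that Fortran expects column-major storage. */
    return 0;                               /* link against a BLAS library, e.g. -lblas */
}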

Page 17: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc

Portable, Extensible Toolkit for Scientific Computation
Large, powerful
Solves:
- partial differential equations
- linear systems
- non-linear systems

Handles both dense and sparse matrices

Page 18: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc

PETSc routines return error codes.
PETSc provides error-checking macros to help troubleshoot problems, e.g. CHKERRQ(errorcode) (see the sketch below).
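
A hedged sketch of the usual error-checking idiom. CHKERRQ is used inside a function that itself returns a PetscErrorCode; VecSetFromOptions is a real PETSc routine that does not appear on these slides, and the exact conventions have shifted between PETSc versions:

#include <petscvec.h>

PetscErrorCode build_vector(MPI_Comm comm, int n, Vec *x)
{
    PetscErrorCode ierr;

    ierr = VecCreate(comm, x); CHKERRQ(ierr);           /* bail out if creation failed */
    ierr = VecSetSizes(*x, PETSC_DECIDE, n); CHKERRQ(ierr);
    ierr = VecSetFromOptions(*x); CHKERRQ(ierr);        /* pick vector type from the command line */
    return 0;
}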

Page 19: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc

Built on top of MPI
Developed primarily for C/C++ (unlike ScaLAPACK)
- also has a Fortran interface

Dense and sparse matrices use the same interface

Page 20: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc

Includes many non-blocking operations
- e.g., any process can update any matrix cell as a non-blocking operation
- other work can go on while the update is carried out

Many options are available from the command line
- PETSc includes many solvers
- solvers can be selected from the command line, so you can change solvers without recompiling
- PETSC_DECIDE lets PETSc choose values (such as local sizes) for you (see the sketch below)
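
For example, a small fragment (a sketch assuming a reasonably recent PETSc; KSPSetFromOptions and the option names are real PETSc features, though they do not appear on these slides):

/* let PETSc choose how many of the N entries live on each process */
VecSetSizes(x, PETSC_DECIDE, N);

/* pick the solver and preconditioner from the command line, e.g.
       mpiexec -n 4 ./app -ksp_type gmres -pc_type jacobi
   so switching solvers needs no recompilation */
KSPSetFromOptions(ksp);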

Page 21: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc

[figure from http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2]

Page 22: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc

[figure from http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/node2.html#Node2]

Page 23: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc – sample routines

PetscOptionsGetInt(PETSC_NULL, "-n", &n, &flg);

VecSetType(Vec x, VecType vec_type);

VecCreate(MPI_Comm comm, Vec *x);

VecSetSizes(Vec x, int m, int M);

VecDuplicate(Vec old, Vec *new);

MatCreate(MPI_Comm comm, int m, int n, int M, int N, Mat *A);

MatSetValues(Mat A, int m, int *im, int n, int *in,
             PetscScalar *values, INSERT_VALUES);
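
A hedged sketch of how MatSetValues is typically used: each process inserts the rows it owns, a row at a time. MatGetOwnershipRange is a real PETSc call that is not on the slide, N is assumed to be the global matrix size, and the tridiagonal (1-D Laplacian) stencil is just an illustration:

PetscInt    Istart, Iend, i, cols[3];
PetscScalar vals[3] = {-1.0, 2.0, -1.0};       /* 1-D Laplacian stencil */

MatGetOwnershipRange(A, &Istart, &Iend);       /* rows this process owns */
for (i = Istart; i < Iend; i++) {
    cols[0] = i - 1;  cols[1] = i;  cols[2] = i + 1;
    if (i == 0)                                /* first row: drop the left neighbour */
        MatSetValues(A, 1, &i, 2, &cols[1], &vals[1], INSERT_VALUES);
    else if (i == N - 1)                       /* last row: drop the right neighbour */
        MatSetValues(A, 1, &i, 2, cols, vals, INSERT_VALUES);
    else
        MatSetValues(A, 1, &i, 3, cols, vals, INSERT_VALUES);
}
/* the values are not usable until the assembly calls on the next page have run */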

Page 24: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc – sample routines

MatAssemblyBegin(Mat A, MAT_FINAL_ASSEMBLY);

MatAssemblyEnd(Mat A, MAT_FINAL_ASSEMBLY);

KSPCreate(MPI_Comm comm, KSP *ksp);

KSPSolve(KSP ksp, Vec b, Vec x);

PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);  /* last two args: optional options file and help string */

PetscFinalize();
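
Putting the pieces together, a hedged, minimal skeleton of a PETSc linear solve. It follows the modern PETSc calling sequence (MatSetSizes, MatSetUp, KSPSetOperators, KSPSetFromOptions, the *Destroy calls), which differs slightly from the older signatures listed above; error checking with CHKERRQ is omitted for brevity, and the matrix is a trivial diagonal just so the sketch actually solves something:

#include <petscksp.h>

int main(int argc, char **argv)
{
    Mat         A;
    Vec         x, b;
    KSP         ksp;
    PetscInt    n = 100, Istart, Iend, i;   /* global problem size */
    PetscScalar two = 2.0;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* distributed matrix: PETSc decides the local sizes */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);

    /* trivial diagonal matrix A = 2*I (replace with a real MatSetValues loop) */
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++)
        MatSetValues(A, 1, &i, 1, &i, &two, INSERT_VALUES);
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    /* right-hand side and solution vectors */
    VecCreate(PETSC_COMM_WORLD, &b);
    VecSetSizes(b, PETSC_DECIDE, n);
    VecSetFromOptions(b);
    VecDuplicate(b, &x);
    VecSet(b, 1.0);

    /* Krylov solver: type and preconditioner can be chosen on the command line */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);
    KSPSolve(ksp, b, x);                    /* here x = 0.5 everywhere */

    KSPDestroy(&ksp);
    VecDestroy(&x);  VecDestroy(&b);  MatDestroy(&A);
    PetscFinalize();
    return 0;
}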

Page 25: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

BLAS – Basic Linear Algebra Subprograms
http://www.netlib.org/blas/

LAPACK – Linear Algebra PACKage
http://www.netlib.org/lapack/
http://www.netlib.org/lapack/lug/index.html

ScaLAPACK
http://www.netlib.org/scalapack/scalapack_home.html

Page 26: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries

PETSc
http://www-unix.mcs.anl.gov/petsc/petsc-as/
http://acts.nersc.gov/petsc/
http://www.chuug.org/talks/petsc.pdf
http://www.epcc.ed.ac.uk/tracsbin/petsc-2.0.24/docs/splitmanual/manual.html#Node0

Page 27: CS 591x – Cluster Computing and Programming Parallel Computers Parallel Libraries