blocked algorithms for the matrix square root - university of …siam/files/snscc12/... · 2013. 1....

49
Blocked Algorithms for the Matrix Square Root Edvin Deadman 1 Nick Higham 2 Rui Ralha 3 1 University of Manchester / Numerical Algorithms Group 2 University of Manchester 3 University of Minho, Portugal May 18, 2012 Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, Portugal Blocked Algorithms for the Matrix Square Root 1/29

Upload: others

Post on 31-Jan-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

  • Blocked Algorithms for the Matrix SquareRoot

    Edvin Deadman 1 Nick Higham 2 Rui Ralha 3

    1University of Manchester / Numerical Algorithms Group

    2University of Manchester

    3University of Minho, Portugal

    May 18, 2012

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 1/29

  • Outline

    1. Numerical Libraries and NAG2. The Matrix Square Root:

    Definition and UsesThe Schur MethodBlockingParallelism

    3. Implementation of Algorithms in Numerical Libraries

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 2/29

  • Why use numerical libraries?

    Numerical computation is difficult to do accurately.Writing routines from scratch is time consuming.Commonly encountered problems:

    Overflow/underflow: how does the computer deal with large/ small numbers?Condition: how sensitive is the solution to small changes inthe input?Stability: how sensitive is the computation to roundingerrors?

    Important considerations:Error analysis.Information about error bounds.Parallelism.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 3/29

  • How a numerical library is used

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 4/29

  • The NAG Library

    Root FindingSummation of SeriesQuadratureOrdinary Differential EquationsPartial Differential EquationsNumerical DifferentiationIntegral EquationsMesh GenerationInterpolationCurve and Surface FittingOptimizationApproximations of SpecialFunctions

    Dense Linear AlgebraSparse Linear AlgebraCorrelation & Regression AnalysisMultivariate MethodsAnalysis of VarianceRandom Number GeneratorsUnivariate EstimationNonparametric StatisticsSmoothing in StatisticsContingency Table AnalysisSurvival AnalysisTime Series AnalysisOperations Research

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 5/29

  • Defining the Matrix Square Root

    A square root of A is a solution of X 2 = A.

    There can be infinitely many square roots. They are neverunique: (

    cos θ sin θsin θ − cos θ

    )2=

    (1 00 1

    ).

    If A has m Jordan blocks and s ≤ m distinct eigenvaluesthen there are 2s square roots that are primary functions ofA. [5]

    If A has no eigenvalues on the negative real line, thenthere is a unique principal square root A1/2 witheigenvalues in the right half plane.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 6/29

  • Defining the Matrix Square Root

    A square root of A is a solution of X 2 = A.There can be infinitely many square roots. They are neverunique: (

    cos θ sin θsin θ − cos θ

    )2=

    (1 00 1

    ).

    If A has m Jordan blocks and s ≤ m distinct eigenvaluesthen there are 2s square roots that are primary functions ofA. [5]

    If A has no eigenvalues on the negative real line, thenthere is a unique principal square root A1/2 witheigenvalues in the right half plane.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 6/29

  • Defining the Matrix Square Root

    A square root of A is a solution of X 2 = A.There can be infinitely many square roots. They are neverunique: (

    cos θ sin θsin θ − cos θ

    )2=

    (1 00 1

    ).

    If A has m Jordan blocks and s ≤ m distinct eigenvaluesthen there are 2s square roots that are primary functions ofA. [5]

    If A has no eigenvalues on the negative real line, thenthere is a unique principal square root A1/2 witheigenvalues in the right half plane.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 6/29

  • Defining the Matrix Square Root

    A square root of A is a solution of X 2 = A.There can be infinitely many square roots. They are neverunique: (

    cos θ sin θsin θ − cos θ

    )2=

    (1 00 1

    ).

    If A has m Jordan blocks and s ≤ m distinct eigenvaluesthen there are 2s square roots that are primary functions ofA. [5]

    If A has no eigenvalues on the negative real line, thenthere is a unique principal square root A1/2 witheigenvalues in the right half plane.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 6/29

  • Defining the Matrix Square Root: Example

    A =

    1 1 0 00 1 0 00 0 2 10 0 0 2

    Four primary square roots:1 12 0 00 1 0 00 0

    √2 1

    2√

    20 0 0

    √2

    ,−1 −12 0 00 −1 0 00 0 −

    √2 − 1

    2√

    20 0 0 −

    √2

    ,

    1 12 0 00 1 0 00 0 −

    √2 − 1

    2√

    20 0 0 −

    √2

    ,−1 −12 0 00 −1 0 00 0

    √2 1

    2√

    20 0 0

    √2

    .

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 7/29

  • Defining the Matrix Square Root: Example

    A =

    1 1 0 00 1 0 00 0 2 10 0 0 2

    Four primary square roots:

    1 12 0 00 1 0 00 0

    √2 1

    2√

    20 0 0

    √2

    ,−1 −12 0 00 −1 0 00 0 −

    √2 − 1

    2√

    20 0 0 −

    √2

    ,

    1 12 0 00 1 0 00 0 −

    √2 − 1

    2√

    20 0 0 −

    √2

    ,−1 −12 0 00 −1 0 00 0

    √2 1

    2√

    20 0 0

    √2

    .Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, Portugal

    Blocked Algorithms for the Matrix Square Root 7/29

  • Defining the Matrix Square Root: Example

    A =

    1 1 0 00 1 0 00 0 2 10 0 0 2

    Four primary square roots:

    1 12 0 00 1 0 00 0

    √2 1

    2√

    20 0 0

    √2

    ,−1 −12 0 00 −1 0 00 0 −

    √2 − 1

    2√

    20 0 0 −

    √2

    ,

    1 12 0 00 1 0 00 0 −

    √2 − 1

    2√

    20 0 0 −

    √2

    ,−1 −12 0 00 −1 0 00 0

    √2 1

    2√

    20 0 0

    √2

    .Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, Portugal

    Blocked Algorithms for the Matrix Square Root 8/29

  • Uses of the Matrix Square Root

    Markov models of finance and population dynamics:If P(t) is the transition matrix for a time step t , then

    √P

    could be used for t2 .Roots of stochastic matrices: still an open problem.

    Solution of differential equations:ÿ + Ay = 0 has solution y = cos(

    √At)y0 + sin(

    √At)y1.

    Polar decomposition / matrix sign decomposition.Important kernel routine for computing matrix logarithm,pth roots and powers and trigonometric matrixfunctions. [1;6]

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 9/29

  • Uses of the Matrix Square Root

    Markov models of finance and population dynamics:If P(t) is the transition matrix for a time step t , then

    √P

    could be used for t2 .Roots of stochastic matrices: still an open problem.

    Solution of differential equations:ÿ + Ay = 0 has solution y = cos(

    √At)y0 + sin(

    √At)y1.

    Polar decomposition / matrix sign decomposition.Important kernel routine for computing matrix logarithm,pth roots and powers and trigonometric matrixfunctions. [1;6]

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 9/29

  • Uses of the Matrix Square Root

    Markov models of finance and population dynamics:If P(t) is the transition matrix for a time step t , then

    √P

    could be used for t2 .Roots of stochastic matrices: still an open problem.

    Solution of differential equations:ÿ + Ay = 0 has solution y = cos(

    √At)y0 + sin(

    √At)y1.

    Polar decomposition / matrix sign decomposition.

    Important kernel routine for computing matrix logarithm,pth roots and powers and trigonometric matrixfunctions. [1;6]

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 9/29

  • Uses of the Matrix Square Root

    Markov models of finance and population dynamics:If P(t) is the transition matrix for a time step t , then

    √P

    could be used for t2 .Roots of stochastic matrices: still an open problem.

    Solution of differential equations:ÿ + Ay = 0 has solution y = cos(

    √At)y0 + sin(

    √At)y1.

    Polar decomposition / matrix sign decomposition.Important kernel routine for computing matrix logarithm,pth roots and powers and trigonometric matrixfunctions. [1;6]

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 9/29

  • The Schur Method

    1. Compute a Schur decomposition: A = QTQ∗ with T uppertriangular. [2]

    2. Expand U2 = T elementwise. For a primary square root, Uis also upper triangular:

    U2ii = Tii ,

    UiiUij + UijUjj = Tij −j−1∑

    k=i+1

    UikUkj .

    U is found either a column or a superdiagonal at a time.3.√

    A = QUQ∗

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 10/29

  • The Schur Method: Properties

    Cost: 2813n3 flops.

    Real arithmetic version uses quasi upper triangular Schurdecomposition and 2× 2 diagonal block structure. [4]

    Computed square root of the triangular matrix Û satisfiesÛ2 = T + ∆T , where

    |∆T | ≤ γ̃n|Û|2

    (this only holds normwise in the real arithmetic version).

    No use of level 3 BLAS - slow!Now focus on the triangular phase of the algorithm.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 11/29

  • The Schur Method: Properties

    Cost: 2813n3 flops.

    Real arithmetic version uses quasi upper triangular Schurdecomposition and 2× 2 diagonal block structure. [4]

    Computed square root of the triangular matrix Û satisfiesÛ2 = T + ∆T , where

    |∆T | ≤ γ̃n|Û|2

    (this only holds normwise in the real arithmetic version).No use of level 3 BLAS - slow!Now focus on the triangular phase of the algorithm.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 11/29

  • Run times for random complex triangular matrices

    0 1000 2000 3000 4000 5000 6000 7000 80000

    50

    100

    150

    200

    250

    300

    350

    400

    450

    n

    time

    (s)

    point

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 12/29

  • The Blocked Schur Method

    The Uij and Tij are now taken to be blocks:

    U2ii = Tii , (1)

    UiiUij + UijUjj = Tij −j−1∑

    k=i+1

    UikUkj . (2)

    Solve (1) using the point method and (2) by solving theSylvester equation (e.g. xTRSYL in LAPACK).

    Same error bounds hold true.Over 90% of run time spent in GEMM calls and 8% inSylvester equation solution.Not very sensitive to block size or choice of kernel routines.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 13/29

  • The Blocked Schur Method

    The Uij and Tij are now taken to be blocks:

    U2ii = Tii , (1)

    UiiUij + UijUjj = Tij −j−1∑

    k=i+1

    UikUkj . (2)

    Solve (1) using the point method and (2) by solving theSylvester equation (e.g. xTRSYL in LAPACK).Same error bounds hold true.Over 90% of run time spent in GEMM calls and 8% inSylvester equation solution.Not very sensitive to block size or choice of kernel routines.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 13/29

  • Run times for random complex triangular matrices

    0 1000 2000 3000 4000 5000 6000 7000 80000

    50

    100

    150

    200

    250

    300

    350

    400

    450

    n

    time

    (s)

    pointblock

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 14/29

  • The Recursive Blocked Schur Method

    Recursive solution of the triangular phase: [3](U11 U120 U22

    )2=

    (T11 T120 T22

    ).

    U211 = T11 and U222 = T22 are solved recursively.

    Solve U11U12 + U12U22 = T12 using the recursive methodof Jonsson & Kågström. [7]

    ‘Point’ algorithms are used when the recursion hasreached a certain size (e.g. n = 64).Same error bound holds as point algorithm (proof byinduction).

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 15/29

  • Run times for random complex triangular matrices

    0 1000 2000 3000 4000 5000 6000 7000 80000

    50

    100

    150

    200

    250

    300

    350

    400

    450

    n

    time

    (s)

    pointblockrecursion

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 16/29

  • Aside: multiplying two triangular matrices

    0 1000 2000 3000 4000 5000 6000 7000 80000

    50

    100

    150

    200

    250

    n

    time

    (s)

    pointblockrecursion

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 17/29

  • Full complex matrices (called from MATLAB)

    0 200 400 600 800 1000 1200 1400 1600 1800 20000

    50

    100

    150

    200

    250

    n

    time

    (s)

    sqrtmfort_pointfort_recurse

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 18/29

  • Full real matrices (called from MATLAB)

    0 200 400 600 800 1000 1200 1400 1600 1800 20000

    50

    100

    150

    200

    250

    n

    time

    (s)

    sqrtm_realsqrtmfort_point_realfort_recurse_real

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 19/29

  • Parallelism

    Which approach is best in parallel?

    Options for parallelism:Threaded BLAS.Explicit loop-based parallelism of the triangular phase.OpenMP tasks.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 20/29

  • Parallelism

    Which approach is best in parallel?

    Options for parallelism:Threaded BLAS.Explicit loop-based parallelism of the triangular phase.OpenMP tasks.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 20/29

  • Parallel Point and Block Methods

    Blocks below and left of (i , j) block must be computed first:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 21/29

  • Parallel Point and Block Methods

    Synchronisation required after each superdiagonal:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 22/29

  • Parallel Point and Block Methods

    Synchronisation required after each superdiagonal:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 22/29

  • Parallel Point and Block Methods

    Synchronisation required after each superdiagonal:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 22/29

  • Parallel Point and Block Methods

    Synchronisation required after each superdiagonal:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 22/29

  • Recursive Blocking in Parallel

    Task based approach - each recursive call generates newtasks. Synchronization points required:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29

  • Recursive Blocking in Parallel

    Task based approach - each recursive call generates newtasks. Synchronization points required:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29

  • Recursive Blocking in Parallel

    Task based approach - each recursive call generates newtasks. Synchronization points required:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29

  • Recursive Blocking in Parallel

    Task based approach - each recursive call generates newtasks. Synchronization points required:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29

  • Recursive Blocking in Parallel

    Task based approach - each recursive call generates newtasks. Synchronization points required:

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29

  • Parallel Results for 4000× 4000 Triangular Matrices

    1 Thread 8 Threads − BLAS 8 Threads − OpenMP0

    20

    40

    60

    80

    100

    120

    140tim

    e (s

    )

    pointblockrecursion

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 24/29

  • Parallel Results for 4000× 4000 Full Matrices

    1 Thread 8 Threads − BLAS 8 Threads − OpenMP0

    100

    200

    300

    400

    500

    600

    700

    800

    900

    1000tim

    e (s

    )

    pointblockrecursion

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 25/29

  • Implementation for the NAG Library

    Key: robustness and error handling.

    Extension to singular matrices:Eigenvalue considered to vanish if λ < �‖A‖Reorder and check if the vanishing eigenvalue issemisimple.

    Negative eigenvalues:Eigenvalue considered to lie on R− if Re(λ) < 0 and|Im(λ)| < �|Re(λ)|.A non-principal square root can be returned in this case.

    Condition estimation:Condition number estimates and residual bounds areavailable for the matrix square root.Condition number is expensive to compute.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 26/29

  • Implementation for the NAG Library

    Key: robustness and error handling.Extension to singular matrices:

    Eigenvalue considered to vanish if λ < �‖A‖Reorder and check if the vanishing eigenvalue issemisimple.

    Negative eigenvalues:Eigenvalue considered to lie on R− if Re(λ) < 0 and|Im(λ)| < �|Re(λ)|.A non-principal square root can be returned in this case.

    Condition estimation:Condition number estimates and residual bounds areavailable for the matrix square root.Condition number is expensive to compute.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 26/29

  • Implementation for the NAG Library

    Key: robustness and error handling.Extension to singular matrices:

    Eigenvalue considered to vanish if λ < �‖A‖Reorder and check if the vanishing eigenvalue issemisimple.

    Negative eigenvalues:Eigenvalue considered to lie on R− if Re(λ) < 0 and|Im(λ)| < �|Re(λ)|.A non-principal square root can be returned in this case.

    Condition estimation:Condition number estimates and residual bounds areavailable for the matrix square root.Condition number is expensive to compute.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 26/29

  • Implementation for the NAG Library

    Key: robustness and error handling.Extension to singular matrices:

    Eigenvalue considered to vanish if λ < �‖A‖Reorder and check if the vanishing eigenvalue issemisimple.

    Negative eigenvalues:Eigenvalue considered to lie on R− if Re(λ) < 0 and|Im(λ)| < �|Re(λ)|.A non-principal square root can be returned in this case.

    Condition estimation:Condition number estimates and residual bounds areavailable for the matrix square root.Condition number is expensive to compute.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 26/29

  • Implementation for the NAG Library

    Real and complex routines, with or without conditionestimation, and extra routines specifically for triangularmatrices.Test programs:

    Check the residual (√

    A)2 − A against the theoreticalresidual bound.Test against computation in VPA using condition estimate.Tricky test cases e.g. almost singular matrices, nearlynegative eigenvalues.Error exits and illegal inputs.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 27/29

  • Summary

    Recursive blocking can be fast, when the algorithm allows.NAG implementation uses OpenMP, so will use standardblocking.Matrix square root is a key computational kernel - BLAS?The fastest method in serial is not necessarily the fastestmethod in parallel.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 28/29

  • References[1] A. H. Al-Mohy and N. J. Higham. Improved inverse scaling and

    squaring algorithms for the matrix logarithm. MIMS Eprint, 83,2011.

    [2] Å. Björck and S. Hammarling. A Schur method for the square rootof a matrix. Lin. Alg. & Appl., 52/53:127–140, 1983.

    [3] E. Deadman, N. J. Higham, and R. Ralha. A recursive blockedSchur algorithm for computing the matrix square root. MIMSEprint, 26, 2012.

    [4] N. J. Higham. Computing real square roots of a real matrix. Lin.Alg. & Appl., 88/89:405–430, 1987.

    [5] N. J. Higham. Functions of Matrices: Theory and Computation.SIAM, 2008.

    [6] N. J. Higham and L. Lin. A Schur-Padé algorithm for fractionalpowers of a matrix. SIAM J. Matrix Anal. & Appl.,32(3):1056–1078, 2010.

    [7] I. Jonsson and B. Kågström. Recursive blocked algorithms forsolving triangular systems - part I: One-sided and coupledSylvester-type matrix equations. ACM Transactions onMathematical Software, 28(4):392–415, December 2002.

    Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 29/29