blocked algorithms for the matrix square root - university of …siam/files/snscc12/... · 2013. 1....
TRANSCRIPT
-
Blocked Algorithms for the Matrix SquareRoot
Edvin Deadman 1 Nick Higham 2 Rui Ralha 3
1University of Manchester / Numerical Algorithms Group
2University of Manchester
3University of Minho, Portugal
May 18, 2012
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 1/29
-
Outline
1. Numerical Libraries and NAG2. The Matrix Square Root:
Definition and UsesThe Schur MethodBlockingParallelism
3. Implementation of Algorithms in Numerical Libraries
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 2/29
-
Why use numerical libraries?
Numerical computation is difficult to do accurately.Writing routines from scratch is time consuming.Commonly encountered problems:
Overflow/underflow: how does the computer deal with large/ small numbers?Condition: how sensitive is the solution to small changes inthe input?Stability: how sensitive is the computation to roundingerrors?
Important considerations:Error analysis.Information about error bounds.Parallelism.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 3/29
-
How a numerical library is used
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 4/29
-
The NAG Library
Root FindingSummation of SeriesQuadratureOrdinary Differential EquationsPartial Differential EquationsNumerical DifferentiationIntegral EquationsMesh GenerationInterpolationCurve and Surface FittingOptimizationApproximations of SpecialFunctions
Dense Linear AlgebraSparse Linear AlgebraCorrelation & Regression AnalysisMultivariate MethodsAnalysis of VarianceRandom Number GeneratorsUnivariate EstimationNonparametric StatisticsSmoothing in StatisticsContingency Table AnalysisSurvival AnalysisTime Series AnalysisOperations Research
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 5/29
-
Defining the Matrix Square Root
A square root of A is a solution of X 2 = A.
There can be infinitely many square roots. They are neverunique: (
cos θ sin θsin θ − cos θ
)2=
(1 00 1
).
If A has m Jordan blocks and s ≤ m distinct eigenvaluesthen there are 2s square roots that are primary functions ofA. [5]
If A has no eigenvalues on the negative real line, thenthere is a unique principal square root A1/2 witheigenvalues in the right half plane.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 6/29
-
Defining the Matrix Square Root
A square root of A is a solution of X 2 = A.There can be infinitely many square roots. They are neverunique: (
cos θ sin θsin θ − cos θ
)2=
(1 00 1
).
If A has m Jordan blocks and s ≤ m distinct eigenvaluesthen there are 2s square roots that are primary functions ofA. [5]
If A has no eigenvalues on the negative real line, thenthere is a unique principal square root A1/2 witheigenvalues in the right half plane.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 6/29
-
Defining the Matrix Square Root
A square root of A is a solution of X 2 = A.There can be infinitely many square roots. They are neverunique: (
cos θ sin θsin θ − cos θ
)2=
(1 00 1
).
If A has m Jordan blocks and s ≤ m distinct eigenvaluesthen there are 2s square roots that are primary functions ofA. [5]
If A has no eigenvalues on the negative real line, thenthere is a unique principal square root A1/2 witheigenvalues in the right half plane.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 6/29
-
Defining the Matrix Square Root
A square root of A is a solution of X 2 = A.There can be infinitely many square roots. They are neverunique: (
cos θ sin θsin θ − cos θ
)2=
(1 00 1
).
If A has m Jordan blocks and s ≤ m distinct eigenvaluesthen there are 2s square roots that are primary functions ofA. [5]
If A has no eigenvalues on the negative real line, thenthere is a unique principal square root A1/2 witheigenvalues in the right half plane.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 6/29
-
Defining the Matrix Square Root: Example
A =
1 1 0 00 1 0 00 0 2 10 0 0 2
Four primary square roots:1 12 0 00 1 0 00 0
√2 1
2√
20 0 0
√2
,−1 −12 0 00 −1 0 00 0 −
√2 − 1
2√
20 0 0 −
√2
,
1 12 0 00 1 0 00 0 −
√2 − 1
2√
20 0 0 −
√2
,−1 −12 0 00 −1 0 00 0
√2 1
2√
20 0 0
√2
.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 7/29
-
Defining the Matrix Square Root: Example
A =
1 1 0 00 1 0 00 0 2 10 0 0 2
Four primary square roots:
1 12 0 00 1 0 00 0
√2 1
2√
20 0 0
√2
,−1 −12 0 00 −1 0 00 0 −
√2 − 1
2√
20 0 0 −
√2
,
1 12 0 00 1 0 00 0 −
√2 − 1
2√
20 0 0 −
√2
,−1 −12 0 00 −1 0 00 0
√2 1
2√
20 0 0
√2
.Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, Portugal
Blocked Algorithms for the Matrix Square Root 7/29
-
Defining the Matrix Square Root: Example
A =
1 1 0 00 1 0 00 0 2 10 0 0 2
Four primary square roots:
1 12 0 00 1 0 00 0
√2 1
2√
20 0 0
√2
,−1 −12 0 00 −1 0 00 0 −
√2 − 1
2√
20 0 0 −
√2
,
1 12 0 00 1 0 00 0 −
√2 − 1
2√
20 0 0 −
√2
,−1 −12 0 00 −1 0 00 0
√2 1
2√
20 0 0
√2
.Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, Portugal
Blocked Algorithms for the Matrix Square Root 8/29
-
Uses of the Matrix Square Root
Markov models of finance and population dynamics:If P(t) is the transition matrix for a time step t , then
√P
could be used for t2 .Roots of stochastic matrices: still an open problem.
Solution of differential equations:ÿ + Ay = 0 has solution y = cos(
√At)y0 + sin(
√At)y1.
Polar decomposition / matrix sign decomposition.Important kernel routine for computing matrix logarithm,pth roots and powers and trigonometric matrixfunctions. [1;6]
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 9/29
-
Uses of the Matrix Square Root
Markov models of finance and population dynamics:If P(t) is the transition matrix for a time step t , then
√P
could be used for t2 .Roots of stochastic matrices: still an open problem.
Solution of differential equations:ÿ + Ay = 0 has solution y = cos(
√At)y0 + sin(
√At)y1.
Polar decomposition / matrix sign decomposition.Important kernel routine for computing matrix logarithm,pth roots and powers and trigonometric matrixfunctions. [1;6]
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 9/29
-
Uses of the Matrix Square Root
Markov models of finance and population dynamics:If P(t) is the transition matrix for a time step t , then
√P
could be used for t2 .Roots of stochastic matrices: still an open problem.
Solution of differential equations:ÿ + Ay = 0 has solution y = cos(
√At)y0 + sin(
√At)y1.
Polar decomposition / matrix sign decomposition.
Important kernel routine for computing matrix logarithm,pth roots and powers and trigonometric matrixfunctions. [1;6]
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 9/29
-
Uses of the Matrix Square Root
Markov models of finance and population dynamics:If P(t) is the transition matrix for a time step t , then
√P
could be used for t2 .Roots of stochastic matrices: still an open problem.
Solution of differential equations:ÿ + Ay = 0 has solution y = cos(
√At)y0 + sin(
√At)y1.
Polar decomposition / matrix sign decomposition.Important kernel routine for computing matrix logarithm,pth roots and powers and trigonometric matrixfunctions. [1;6]
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 9/29
-
The Schur Method
1. Compute a Schur decomposition: A = QTQ∗ with T uppertriangular. [2]
2. Expand U2 = T elementwise. For a primary square root, Uis also upper triangular:
U2ii = Tii ,
UiiUij + UijUjj = Tij −j−1∑
k=i+1
UikUkj .
U is found either a column or a superdiagonal at a time.3.√
A = QUQ∗
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 10/29
-
The Schur Method: Properties
Cost: 2813n3 flops.
Real arithmetic version uses quasi upper triangular Schurdecomposition and 2× 2 diagonal block structure. [4]
Computed square root of the triangular matrix Û satisfiesÛ2 = T + ∆T , where
|∆T | ≤ γ̃n|Û|2
(this only holds normwise in the real arithmetic version).
No use of level 3 BLAS - slow!Now focus on the triangular phase of the algorithm.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 11/29
-
The Schur Method: Properties
Cost: 2813n3 flops.
Real arithmetic version uses quasi upper triangular Schurdecomposition and 2× 2 diagonal block structure. [4]
Computed square root of the triangular matrix Û satisfiesÛ2 = T + ∆T , where
|∆T | ≤ γ̃n|Û|2
(this only holds normwise in the real arithmetic version).No use of level 3 BLAS - slow!Now focus on the triangular phase of the algorithm.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 11/29
-
Run times for random complex triangular matrices
0 1000 2000 3000 4000 5000 6000 7000 80000
50
100
150
200
250
300
350
400
450
n
time
(s)
point
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 12/29
-
The Blocked Schur Method
The Uij and Tij are now taken to be blocks:
U2ii = Tii , (1)
UiiUij + UijUjj = Tij −j−1∑
k=i+1
UikUkj . (2)
Solve (1) using the point method and (2) by solving theSylvester equation (e.g. xTRSYL in LAPACK).
Same error bounds hold true.Over 90% of run time spent in GEMM calls and 8% inSylvester equation solution.Not very sensitive to block size or choice of kernel routines.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 13/29
-
The Blocked Schur Method
The Uij and Tij are now taken to be blocks:
U2ii = Tii , (1)
UiiUij + UijUjj = Tij −j−1∑
k=i+1
UikUkj . (2)
Solve (1) using the point method and (2) by solving theSylvester equation (e.g. xTRSYL in LAPACK).Same error bounds hold true.Over 90% of run time spent in GEMM calls and 8% inSylvester equation solution.Not very sensitive to block size or choice of kernel routines.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 13/29
-
Run times for random complex triangular matrices
0 1000 2000 3000 4000 5000 6000 7000 80000
50
100
150
200
250
300
350
400
450
n
time
(s)
pointblock
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 14/29
-
The Recursive Blocked Schur Method
Recursive solution of the triangular phase: [3](U11 U120 U22
)2=
(T11 T120 T22
).
U211 = T11 and U222 = T22 are solved recursively.
Solve U11U12 + U12U22 = T12 using the recursive methodof Jonsson & Kågström. [7]
‘Point’ algorithms are used when the recursion hasreached a certain size (e.g. n = 64).Same error bound holds as point algorithm (proof byinduction).
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 15/29
-
Run times for random complex triangular matrices
0 1000 2000 3000 4000 5000 6000 7000 80000
50
100
150
200
250
300
350
400
450
n
time
(s)
pointblockrecursion
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 16/29
-
Aside: multiplying two triangular matrices
0 1000 2000 3000 4000 5000 6000 7000 80000
50
100
150
200
250
n
time
(s)
pointblockrecursion
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 17/29
-
Full complex matrices (called from MATLAB)
0 200 400 600 800 1000 1200 1400 1600 1800 20000
50
100
150
200
250
n
time
(s)
sqrtmfort_pointfort_recurse
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 18/29
-
Full real matrices (called from MATLAB)
0 200 400 600 800 1000 1200 1400 1600 1800 20000
50
100
150
200
250
n
time
(s)
sqrtm_realsqrtmfort_point_realfort_recurse_real
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 19/29
-
Parallelism
Which approach is best in parallel?
Options for parallelism:Threaded BLAS.Explicit loop-based parallelism of the triangular phase.OpenMP tasks.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 20/29
-
Parallelism
Which approach is best in parallel?
Options for parallelism:Threaded BLAS.Explicit loop-based parallelism of the triangular phase.OpenMP tasks.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 20/29
-
Parallel Point and Block Methods
Blocks below and left of (i , j) block must be computed first:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 21/29
-
Parallel Point and Block Methods
Synchronisation required after each superdiagonal:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 22/29
-
Parallel Point and Block Methods
Synchronisation required after each superdiagonal:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 22/29
-
Parallel Point and Block Methods
Synchronisation required after each superdiagonal:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 22/29
-
Parallel Point and Block Methods
Synchronisation required after each superdiagonal:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 22/29
-
Recursive Blocking in Parallel
Task based approach - each recursive call generates newtasks. Synchronization points required:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29
-
Recursive Blocking in Parallel
Task based approach - each recursive call generates newtasks. Synchronization points required:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29
-
Recursive Blocking in Parallel
Task based approach - each recursive call generates newtasks. Synchronization points required:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29
-
Recursive Blocking in Parallel
Task based approach - each recursive call generates newtasks. Synchronization points required:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29
-
Recursive Blocking in Parallel
Task based approach - each recursive call generates newtasks. Synchronization points required:
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 23/29
-
Parallel Results for 4000× 4000 Triangular Matrices
1 Thread 8 Threads − BLAS 8 Threads − OpenMP0
20
40
60
80
100
120
140tim
e (s
)
pointblockrecursion
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 24/29
-
Parallel Results for 4000× 4000 Full Matrices
1 Thread 8 Threads − BLAS 8 Threads − OpenMP0
100
200
300
400
500
600
700
800
900
1000tim
e (s
)
pointblockrecursion
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 25/29
-
Implementation for the NAG Library
Key: robustness and error handling.
Extension to singular matrices:Eigenvalue considered to vanish if λ < �‖A‖Reorder and check if the vanishing eigenvalue issemisimple.
Negative eigenvalues:Eigenvalue considered to lie on R− if Re(λ) < 0 and|Im(λ)| < �|Re(λ)|.A non-principal square root can be returned in this case.
Condition estimation:Condition number estimates and residual bounds areavailable for the matrix square root.Condition number is expensive to compute.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 26/29
-
Implementation for the NAG Library
Key: robustness and error handling.Extension to singular matrices:
Eigenvalue considered to vanish if λ < �‖A‖Reorder and check if the vanishing eigenvalue issemisimple.
Negative eigenvalues:Eigenvalue considered to lie on R− if Re(λ) < 0 and|Im(λ)| < �|Re(λ)|.A non-principal square root can be returned in this case.
Condition estimation:Condition number estimates and residual bounds areavailable for the matrix square root.Condition number is expensive to compute.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 26/29
-
Implementation for the NAG Library
Key: robustness and error handling.Extension to singular matrices:
Eigenvalue considered to vanish if λ < �‖A‖Reorder and check if the vanishing eigenvalue issemisimple.
Negative eigenvalues:Eigenvalue considered to lie on R− if Re(λ) < 0 and|Im(λ)| < �|Re(λ)|.A non-principal square root can be returned in this case.
Condition estimation:Condition number estimates and residual bounds areavailable for the matrix square root.Condition number is expensive to compute.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 26/29
-
Implementation for the NAG Library
Key: robustness and error handling.Extension to singular matrices:
Eigenvalue considered to vanish if λ < �‖A‖Reorder and check if the vanishing eigenvalue issemisimple.
Negative eigenvalues:Eigenvalue considered to lie on R− if Re(λ) < 0 and|Im(λ)| < �|Re(λ)|.A non-principal square root can be returned in this case.
Condition estimation:Condition number estimates and residual bounds areavailable for the matrix square root.Condition number is expensive to compute.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 26/29
-
Implementation for the NAG Library
Real and complex routines, with or without conditionestimation, and extra routines specifically for triangularmatrices.Test programs:
Check the residual (√
A)2 − A against the theoreticalresidual bound.Test against computation in VPA using condition estimate.Tricky test cases e.g. almost singular matrices, nearlynegative eigenvalues.Error exits and illegal inputs.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 27/29
-
Summary
Recursive blocking can be fast, when the algorithm allows.NAG implementation uses OpenMP, so will use standardblocking.Matrix square root is a key computational kernel - BLAS?The fastest method in serial is not necessarily the fastestmethod in parallel.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 28/29
-
References[1] A. H. Al-Mohy and N. J. Higham. Improved inverse scaling and
squaring algorithms for the matrix logarithm. MIMS Eprint, 83,2011.
[2] Å. Björck and S. Hammarling. A Schur method for the square rootof a matrix. Lin. Alg. & Appl., 52/53:127–140, 1983.
[3] E. Deadman, N. J. Higham, and R. Ralha. A recursive blockedSchur algorithm for computing the matrix square root. MIMSEprint, 26, 2012.
[4] N. J. Higham. Computing real square roots of a real matrix. Lin.Alg. & Appl., 88/89:405–430, 1987.
[5] N. J. Higham. Functions of Matrices: Theory and Computation.SIAM, 2008.
[6] N. J. Higham and L. Lin. A Schur-Padé algorithm for fractionalpowers of a matrix. SIAM J. Matrix Anal. & Appl.,32(3):1056–1078, 2010.
[7] I. Jonsson and B. Kågström. Recursive blocked algorithms forsolving triangular systems - part I: One-sided and coupledSylvester-type matrix equations. ACM Transactions onMathematical Software, 28(4):392–415, December 2002.
Edvin Deadman , Nick Higham , Rui Ralha , University of Manchester / Numerical Algorithms Group, University of Manchester, University of Minho, PortugalBlocked Algorithms for the Matrix Square Root 29/29