Section 1.5 - Exploiting partitioning of matrices and vectors
Maggie Myers, Robert A. van de Geijn
The University of Texas at Austin
Practical Linear Algebra – Fall 2009
http://z.cs.utexas.edu/wiki/pla.wiki/ 1
Example: Partitioning vectors
Given a vector
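$$x = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}$$
one can think of this as its four elements
$$x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \chi_2 \\ \chi_3 \end{pmatrix}$$
where $\chi_0 = 4$, $\chi_1 = -1$, etc.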
Note
The parentheses are only there to delimit (outline) the vector. They have no other particular meaning.
Example: Partitioning vectors
Given a vector
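$$x = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}$$
one can think of this as two subvectors:
$$x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} 4 \\ -1 \end{pmatrix} \\ \begin{pmatrix} 3 \\ 1 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}$$
so that
$$x_0 = \begin{pmatrix} 4 \\ -1 \end{pmatrix} \quad\text{and}\quad x_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}.$$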
Example: Partitioning vectors
Given a vector
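$$x = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}$$
one can think of this as two subvectors:
$$x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} 4 \end{pmatrix} \\ \begin{pmatrix} -1 \\ 3 \\ 1 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}$$
so that
$$x_0 = \begin{pmatrix} 4 \end{pmatrix} \quad\text{and}\quad x_1 = \begin{pmatrix} -1 \\ 3 \\ 1 \end{pmatrix}.$$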
Example: Partitioning vectors
Given a vector
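$$x = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}$$
one can think of this as two subvectors, the first of which is empty:
$$x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} \; \end{pmatrix} \\ \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}$$
so that
$$x_0 = \begin{pmatrix} \; \end{pmatrix} \text{ (no elements)} \quad\text{and}\quad x_1 = \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix}.$$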
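The three partitionings above can be tried out with list slicing. This is only a sketch: the helper name `partition` and the use of plain Python lists for vectors are assumptions made for illustration, not part of the course material.

```python
def partition(x, k):
    """Split x into (x0, x1), where x0 holds the first k elements.

    Hypothetical helper: k = 0 gives an empty x0, as in the last example.
    """
    return x[:k], x[k:]


x = [4, -1, 3, 1]
for k in (2, 1, 0):
    x0, x1 = partition(x, k)
    # Stacking the subvectors back together recovers the original vector.
    assert x0 + x1 == x
    print(f"k = {k}: x0 = {x0}, x1 = {x1}")
```

An empty subvector is a perfectly legal part of a partitioning, which is what lets the loop-based algorithms later in this section start with "nothing processed yet".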
Example: Inner product with partitioned vectors
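Given vectors
$$x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} = \left(\begin{array}{r} 4 \\ -1 \\ \hline 3 \\ 1 \end{array}\right) \quad\text{and}\quad y = \begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = \left(\begin{array}{r} 1 \\ -2 \\ \hline 3 \\ -4 \end{array}\right)$$
we find that
$$x^T y = \underbrace{\underbrace{(4)\times(1) + (-1)\times(-2)}_{= \begin{pmatrix} 4 \\ -1 \end{pmatrix}^T \begin{pmatrix} 1 \\ -2 \end{pmatrix} = x_0^T y_0} + \underbrace{(3)\times(3) + (1)\times(-4)}_{= \begin{pmatrix} 3 \\ 1 \end{pmatrix}^T \begin{pmatrix} 3 \\ -4 \end{pmatrix} = x_1^T y_1}}_{= x_0^T y_0 + x_1^T y_1}$$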
Theorem
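Let $x, y \in \mathbb{R}^n$ and partition
$$x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix},$$
where $x_i$ and $y_i$ have the same size, for $i = 0, \ldots, N-1$.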
Theorem (continued)
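Then
$$\begin{aligned}
x^T y &= \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}^T \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}
= \begin{pmatrix} x_0^T & x_1^T & \cdots & x_{N-1}^T \end{pmatrix} \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix} \\
&= x_0^T y_0 + x_1^T y_1 + \cdots + x_{N-1}^T y_{N-1}.
\end{aligned}$$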
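A small numerical check of the theorem, using the vectors from the example above. The function names `dot` and `partitioned_dot` and the `sizes` argument (the lengths of the subvectors) are assumptions introduced for this sketch.

```python
def dot(x, y):
    """Plain inner product x^T y of two equal-length lists."""
    return sum(chi * psi for chi, psi in zip(x, y))


def partitioned_dot(x, y, sizes):
    """Compute x^T y as the sum of x_i^T y_i over matching subvectors.

    sizes[i] is the length of subvectors x_i and y_i (hypothetical argument).
    """
    assert len(x) == len(y) == sum(sizes)
    total, start = 0, 0
    for size in sizes:
        total += dot(x[start:start + size], y[start:start + size])
        start += size
    return total


x = [4, -1, 3, 1]
y = [1, -2, 3, -4]
print(dot(x, y))                      # inner product computed directly
print(partitioned_dot(x, y, [2, 2]))  # x_0^T y_0 + x_1^T y_1
```

Any choice of sizes that adds up to the vector length gives the same value, including partitionings with empty subvectors.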
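Example: axpy with partitioned vectors
Given
$$x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix} = \left(\begin{array}{r} 4 \\ -1 \\ \hline 3 \\ 1 \end{array}\right), \quad y = \begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = \left(\begin{array}{r} 1 \\ -2 \\ \hline 3 \\ -4 \end{array}\right), \quad\text{and}\quad \alpha = 4,$$
we find that
$$\begin{aligned}
\alpha x + y &= 4 \begin{pmatrix} 4 \\ -1 \\ 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \\ 3 \\ -4 \end{pmatrix}
= \begin{pmatrix} (4)\times(4) + (1) \\ (4)\times(-1) + (-2) \\ (4)\times(3) + (3) \\ (4)\times(1) + (-4) \end{pmatrix} \\
&= \begin{pmatrix} 4 \begin{pmatrix} 4 \\ -1 \end{pmatrix} + \begin{pmatrix} 1 \\ -2 \end{pmatrix} \\ 4 \begin{pmatrix} 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ -4 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} \alpha x_0 + y_0 \\ \alpha x_1 + y_1 \end{pmatrix}.
\end{aligned}$$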
Theorem
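Let $x, y \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$, and partition
$$x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix},$$
where $x_i$ and $y_i$ have the same size, for $i = 0, \ldots, N-1$.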
Theorem (continued)
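Then
$$\alpha x + y = \alpha \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix} + \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix} = \begin{pmatrix} \alpha x_0 + y_0 \\ \alpha x_1 + y_1 \\ \vdots \\ \alpha x_{N-1} + y_{N-1} \end{pmatrix}.$$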
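The same kind of check works for axpy. In this sketch the names `axpy` and `partitioned_axpy` and the `sizes` argument are illustrative assumptions; the data are the vectors and the scalar from the example above.

```python
def axpy(alpha, x, y):
    """Return alpha * x + y for equal-length lists x and y."""
    return [alpha * chi + psi for chi, psi in zip(x, y)]


def partitioned_axpy(alpha, x, y, sizes):
    """Apply axpy to each pair of subvectors and stack the results.

    sizes[i] is the length of subvectors x_i and y_i (hypothetical argument).
    """
    assert len(x) == len(y) == sum(sizes)
    result, start = [], 0
    for size in sizes:
        result += axpy(alpha, x[start:start + size], y[start:start + size])
        start += size
    return result


x = [4, -1, 3, 1]
y = [1, -2, 3, -4]
print(axpy(4, x, y))
print(partitioned_axpy(4, x, y, [2, 2]))
```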
Partitioning matrices
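$$A = \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
= \left(\begin{array}{rr|r|rr} -1 & 2 & 4 & 1 & 0 \\ 1 & 0 & -1 & -2 & 1 \\ \hline 2 & -1 & 3 & 1 & 2 \\ \hline 1 & 2 & 3 & 4 & 3 \\ -1 & -2 & 0 & 1 & 2 \end{array}\right)$$
Pronounce $a_{21}$ as "a-two-one" instead of "a-twenty-one", please.
Notice how the labels "$A$", "$a$", and "$\alpha$" are used.
Why do we use the label "$a_{10}^T$"?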
Example: blocked matrix-vector multiplication
Consider
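$$A = \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
= \left(\begin{array}{rr|r|rr} -1 & 2 & 4 & 1 & 0 \\ 1 & 0 & -1 & -2 & 1 \\ \hline 2 & -1 & 3 & 1 & 2 \\ \hline 1 & 2 & 3 & 4 & 3 \\ -1 & -2 & 0 & 1 & 2 \end{array}\right),$$
$$x = \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix} = \left(\begin{array}{r} 1 \\ 2 \\ \hline 3 \\ \hline 4 \\ 5 \end{array}\right), \quad\text{and}\quad y = \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix},$$
where $y_0, y_2 \in \mathbb{R}^2$.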
Example (continued)
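Then
$$\begin{aligned}
Ax &= \begin{pmatrix} -1 & 2 & 4 & 1 & 0 \\ 1 & 0 & -1 & -2 & 1 \\ 2 & -1 & 3 & 1 & 2 \\ 1 & 2 & 3 & 4 & 3 \\ -1 & -2 & 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix} \\
&= \begin{pmatrix}
(-1)\times(1) + (2)\times(2) + (4)\times(3) + (1)\times(4) + (0)\times(5) \\
(1)\times(1) + (0)\times(2) + (-1)\times(3) + (-2)\times(4) + (1)\times(5) \\
(2)\times(1) + (-1)\times(2) + (3)\times(3) + (1)\times(4) + (2)\times(5) \\
(1)\times(1) + (2)\times(2) + (3)\times(3) + (4)\times(4) + (3)\times(5) \\
(-1)\times(1) + (-2)\times(2) + (0)\times(3) + (1)\times(4) + (2)\times(5)
\end{pmatrix}
\end{aligned}$$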
Example (continued)
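Grouping the terms of these sums according to the partitioning of $A$ and $x$ gives
$$\begin{aligned}
Ax &= \begin{pmatrix}
\begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 4 \\ -1 \end{pmatrix} 3 + \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
\begin{pmatrix} 2 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + (3)3 + \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
\begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 0 \end{pmatrix} 3 + \begin{pmatrix} 4 & 3 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix}
\end{pmatrix} \\
&= \begin{pmatrix}
\begin{pmatrix} 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 12 \\ -3 \end{pmatrix} + \begin{pmatrix} 4 \\ -3 \end{pmatrix} \\
0 + 9 + 14 \\
\begin{pmatrix} 5 \\ -5 \end{pmatrix} + \begin{pmatrix} 9 \\ 0 \end{pmatrix} + \begin{pmatrix} 31 \\ 14 \end{pmatrix}
\end{pmatrix}
= \begin{pmatrix} 19 \\ -5 \\ 23 \\ 45 \\ 9 \end{pmatrix}
\end{aligned}$$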
Blocked matrix-vector multiplication
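Let $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^m$. Let
$m = m_0 + m_1 + \cdots + m_{M-1}$, $m_i \geq 0$ for $i = 0, \ldots, M-1$; and
$n = n_0 + n_1 + \cdots + n_{N-1}$, $n_j \geq 0$ for $j = 0, \ldots, N-1$.
Partition
$$A = \begin{pmatrix} A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1} \end{pmatrix}, \quad
x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}, \quad\text{and}\quad
y = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{M-1} \end{pmatrix}$$
with $A_{i,j} \in \mathbb{R}^{m_i \times n_j}$, $x_j \in \mathbb{R}^{n_j}$, and $y_i \in \mathbb{R}^{m_i}$.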
Theorem (continued)
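Then
$$\begin{aligned}
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{M-1} \end{pmatrix}
&= \begin{pmatrix} A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1} \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix} \\
&= \begin{pmatrix} A_{0,0} x_0 + A_{0,1} x_1 + \cdots + A_{0,N-1} x_{N-1} \\ A_{1,0} x_0 + A_{1,1} x_1 + \cdots + A_{1,N-1} x_{N-1} \\ \vdots \\ A_{M-1,0} x_0 + A_{M-1,1} x_1 + \cdots + A_{M-1,N-1} x_{N-1} \end{pmatrix}
\end{aligned}$$
In other words,
$$y_i = \sum_{j=0}^{N-1} A_{i,j} x_j .$$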
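The formula for $y_i$ translates almost directly into two nested loops over blocks. The sketch below stores matrices as lists of rows; the names `matvec`, `offsets`, `blocked_matvec`, `row_sizes`, and `col_sizes` are assumptions made for illustration. The data are the 5 x 5 matrix and the vector from the blocked example in this section.

```python
def matvec(A, x):
    """Unblocked y = A x for a matrix stored as a list of rows."""
    return [sum(a * chi for a, chi in zip(row, x)) for row in A]


def offsets(sizes):
    """Start index of every block, followed by the overall end index."""
    result = [0]
    for size in sizes:
        result.append(result[-1] + size)
    return result


def blocked_matvec(A, x, row_sizes, col_sizes):
    """Compute y_i as the sum over j of A_{i,j} x_j, one block at a time."""
    r, c = offsets(row_sizes), offsets(col_sizes)
    y = []
    for i in range(len(row_sizes)):
        yi = [0] * row_sizes[i]
        for j in range(len(col_sizes)):
            Aij = [row[c[j]:c[j + 1]] for row in A[r[i]:r[i + 1]]]
            xj = x[c[j]:c[j + 1]]
            yi = [psi + t for psi, t in zip(yi, matvec(Aij, xj))]
        y += yi
    return y


A = [[-1, 2, 4, 1, 0],
     [1, 0, -1, -2, 1],
     [2, -1, 3, 1, 2],
     [1, 2, 3, 4, 3],
     [-1, -2, 0, 1, 2]]
x = [1, 2, 3, 4, 5]
print(blocked_matvec(A, x, [2, 1, 2], [2, 1, 2]))
print(matvec(A, x))
```

The row and column partitionings are independent of each other; only the column partitioning of $A$ has to match the partitioning of $x$.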
Example (revisited)
Consider
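$$A = \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}
= \left(\begin{array}{rr|r|rr} -1 & 2 & 4 & 1 & 0 \\ 1 & 0 & -1 & -2 & 1 \\ \hline 2 & -1 & 3 & 1 & 2 \\ \hline 1 & 2 & 3 & 4 & 3 \\ -1 & -2 & 0 & 1 & 2 \end{array}\right),$$
$$x = \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix} = \left(\begin{array}{r} 1 \\ 2 \\ \hline 3 \\ \hline 4 \\ 5 \end{array}\right), \quad\text{and}\quad y = \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix},$$
where $y_0, y_2 \in \mathbb{R}^2$.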
Example (continued)
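Then
$$\begin{aligned}
y = \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}
&= \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix} \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} A_{00} x_0 + a_{01} \chi_1 + A_{02} x_2 \\ a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 \\ A_{20} x_0 + a_{21} \chi_1 + A_{22} x_2 \end{pmatrix} \\
&= \begin{pmatrix}
\begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 4 \\ -1 \end{pmatrix} 3 + \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
\begin{pmatrix} 2 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + (3)3 + \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
\begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 0 \end{pmatrix} 3 + \begin{pmatrix} 4 & 3 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix}
\end{pmatrix} \\
&= \begin{pmatrix}
\begin{pmatrix} 3 \\ 1 \end{pmatrix} + \begin{pmatrix} 12 \\ -3 \end{pmatrix} + \begin{pmatrix} 4 \\ -3 \end{pmatrix} \\
0 + 9 + 14 \\
\begin{pmatrix} 5 \\ -5 \end{pmatrix} + \begin{pmatrix} 9 \\ 0 \end{pmatrix} + \begin{pmatrix} 31 \\ 14 \end{pmatrix}
\end{pmatrix}
= \begin{pmatrix} 19 \\ -5 \\ 23 \\ 45 \\ 9 \end{pmatrix}
\end{aligned}$$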
We are now going to “play” with partitioned matrices, to get the hang of it.
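Special case: Partition matrix by rows and result vector by elements
Partition
$$A = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}$$
Then $y = Ax$ can be computed as
$$\begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix} = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} x = \begin{pmatrix} a_0^T x \\ a_1^T x \\ \vdots \\ a_{m-1}^T x \end{pmatrix}$$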
Concrete example
(on blackboard)
A very strange way of presenting the algorithm...
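$y := \text{Mvmult\_unb\_var1}(A, x, y)$
Partition $A \rightarrow \begin{pmatrix} A_T \\ A_B \end{pmatrix}$, $y \rightarrow \begin{pmatrix} y_T \\ y_B \end{pmatrix}$
  where $A_T$ is $0 \times n$ and $y_T$ is $0 \times 1$
while $m(A_T) < m(A)$ do
  Repartition
    $\begin{pmatrix} A_T \\ A_B \end{pmatrix} \rightarrow \begin{pmatrix} A_0 \\ a_1^T \\ A_2 \end{pmatrix}$, $\begin{pmatrix} y_T \\ y_B \end{pmatrix} \rightarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$
    where $a_1^T$ is a row
  $\psi_1 := a_1^T x + \psi_1$
  Continue with
    $\begin{pmatrix} A_T \\ A_B \end{pmatrix} \leftarrow \begin{pmatrix} A_0 \\ a_1^T \\ A_2 \end{pmatrix}$, $\begin{pmatrix} y_T \\ y_B \end{pmatrix} \leftarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$
endwhile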
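Stripped of the partitioning notation, the algorithm above marches through the rows of $A$ and updates one element of $y$ with an inner product per iteration. A minimal sketch, assuming matrices stored as lists of rows; the function name mirrors the algorithm's name but the code itself is only an illustration, and it returns a new list rather than overwriting $y$.

```python
def mvmult_unb_var1(A, x, y):
    """Return A x + y, computed one row (inner product) at a time."""
    y = list(y)  # work on a copy so the caller's y is left untouched
    for i, a1t in enumerate(A):
        # a1t plays the role of the current row a_1^T,
        # y[i] the role of psi_1: psi_1 := a_1^T x + psi_1
        y[i] += sum(alpha * chi for alpha, chi in zip(a1t, x))
    return y


A = [[-1, 2, 4, 1, 0],
     [1, 0, -1, -2, 1],
     [2, -1, 3, 1, 2],
     [1, 2, 3, 4, 3],
     [-1, -2, 0, 1, 2]]
x = [1, 2, 3, 4, 5]
print(mvmult_unb_var1(A, x, [0, 0, 0, 0, 0]))
```

Rows already visited correspond to $A_T$ and $y_T$, rows still to come to $A_B$ and $y_B$; the loop index is the boundary between them.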
Special case: Partition matrix by columns and vector by elements
Partition
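$$A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix} \quad\text{and}\quad x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}$$
Then $y = Ax$ can be computed as
$$\begin{aligned}
y = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix} \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}
&= a_0 \chi_0 + a_1 \chi_1 + \cdots + a_{n-1} \chi_{n-1} \\
&= \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1}.
\end{aligned}$$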
Concrete example
(on blackboard)
A very strange way of presenting the algorithm...
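$y := \text{Mvmult\_unb\_var2}(A, x, y)$
Partition $A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix}$, $x \rightarrow \begin{pmatrix} x_T \\ x_B \end{pmatrix}$
  where $A_L$ is $m \times 0$ and $x_T$ is $0 \times 1$
while $m(x_T) < m(x)$ do
  Repartition
    $\begin{pmatrix} A_L & A_R \end{pmatrix} \rightarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}$, $\begin{pmatrix} x_T \\ x_B \end{pmatrix} \rightarrow \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}$
    where $a_1$ is a column
  $y := \chi_1 a_1 + y$
  Continue with
    $\begin{pmatrix} A_L & A_R \end{pmatrix} \leftarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}$, $\begin{pmatrix} x_T \\ x_B \end{pmatrix} \leftarrow \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}$
endwhile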
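The column-oriented variant can be sketched the same way: each iteration picks up one column of $A$ and one element of $x$ and performs an axpy into all of $y$. As before, the list-of-rows storage and the function below are illustrative assumptions, not the course's implementation.

```python
def mvmult_unb_var2(A, x, y):
    """Return A x + y, computed one column (axpy) at a time."""
    y = list(y)  # work on a copy so the caller's y is left untouched
    for j, chi1 in enumerate(x):
        a1 = [row[j] for row in A]  # current column a_1 of A
        # y := chi_1 a_1 + y
        y = [chi1 * alpha + psi for alpha, psi in zip(a1, y)]
    return y


A = [[-1, 2, 4, 1, 0],
     [1, 0, -1, -2, 1],
     [2, -1, 3, 1, 2],
     [1, 2, 3, 4, 3],
     [-1, -2, 0, 1, 2]]
x = [1, 2, 3, 4, 5]
print(mvmult_unb_var2(A, x, [0, 0, 0, 0, 0]))
```

Both variants perform the same multiplications and additions; they differ only in the order in which the elements of $A$ are visited.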
Example: Transpose matrix-vector multiplication
Let
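$$A = \begin{pmatrix} 1 & -2 & 0 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \quad\text{and}\quad x = \begin{pmatrix} -1 \\ 2 \\ -3 \end{pmatrix}.$$
Then
$$A^T x = \begin{pmatrix} 1 & -2 & 0 \\ 2 & -1 & 1 \\ 1 & 2 & 3 \end{pmatrix}^T \begin{pmatrix} -1 \\ 2 \\ -3 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 1 \\ -2 & -1 & 2 \\ 0 & 1 & 3 \end{pmatrix} \begin{pmatrix} -1 \\ 2 \\ -3 \end{pmatrix}
= \begin{pmatrix} 0 \\ -6 \\ -7 \end{pmatrix}.$$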
Algorithm for transposing a matrix
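$B := \text{Trans\_unb\_var1}(A, B)$
Partition $A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix}$, $B \rightarrow \begin{pmatrix} B_T \\ B_B \end{pmatrix}$
  where $A_L$ is $m \times 0$ and $B_T$ is $0 \times m$
while $n(A_L) < n(A)$ do
  Repartition
    $\begin{pmatrix} A_L & A_R \end{pmatrix} \rightarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}$, $\begin{pmatrix} B_T \\ B_B \end{pmatrix} \rightarrow \begin{pmatrix} B_0 \\ b_1^T \\ B_2 \end{pmatrix}$
    where $a_1$ is a column and $b_1^T$ is a row
  $b_1^T := a_1^T$
  Continue with
    $\begin{pmatrix} A_L & A_R \end{pmatrix} \leftarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}$, $\begin{pmatrix} B_T \\ B_B \end{pmatrix} \leftarrow \begin{pmatrix} B_0 \\ b_1^T \\ B_2 \end{pmatrix}$
endwhile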
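In code, the loop copies one column of $A$ into one row of $B$ per iteration. The sketch below assumes list-of-rows storage and builds $B$ from scratch instead of overwriting an existing matrix; `trans_unb_var1` and `matvec` are names chosen for this illustration. It reuses the 3 x 3 matrix from the transpose matrix-vector example.

```python
def trans_unb_var1(A):
    """Return B = A^T by copying column j of A into row j of B."""
    n = len(A[0]) if A else 0
    B = []
    for j in range(n):
        a1 = [row[j] for row in A]  # current column a_1 of A
        B.append(a1)                # b_1^T := a_1^T
    return B


def matvec(A, x):
    """Unblocked y = A x for a matrix stored as a list of rows."""
    return [sum(a * chi for a, chi in zip(row, x)) for row in A]


A = [[1, -2, 0],
     [2, -1, 1],
     [1, 2, 3]]
x = [-1, 2, -3]
B = trans_unb_var1(A)
print(B)
print(matvec(B, x))  # A^T x
```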
Example: Blocked matrix transposition
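$$\begin{pmatrix} -1 & 2 & 4 & 1 \\ 1 & 0 & -1 & -2 \\ 2 & -1 & 3 & 1 \\ 1 & 2 & 3 & 4 \\ -1 & -2 & 0 & 1 \end{pmatrix}^T
= \begin{pmatrix} -1 & 1 & 2 & 1 & -1 \\ 2 & 0 & -1 & 2 & -2 \\ 4 & -1 & 3 & 3 & 0 \\ 1 & -2 & 1 & 4 & 1 \end{pmatrix}$$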
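$$\begin{aligned}
\left(\begin{array}{rr|r|r} -1 & 2 & 4 & 1 \\ 1 & 0 & -1 & -2 \\ \hline 2 & -1 & 3 & 1 \\ \hline 1 & 2 & 3 & 4 \\ -1 & -2 & 0 & 1 \end{array}\right)^T
&= \begin{pmatrix}
\begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix} & \begin{pmatrix} 4 \\ -1 \end{pmatrix} & \begin{pmatrix} 1 \\ -2 \end{pmatrix} \\
\begin{pmatrix} 2 & -1 \end{pmatrix} & \begin{pmatrix} 3 \end{pmatrix} & \begin{pmatrix} 1 \end{pmatrix} \\
\begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix} & \begin{pmatrix} 3 \\ 0 \end{pmatrix} & \begin{pmatrix} 4 \\ 1 \end{pmatrix}
\end{pmatrix}^T \\
&= \begin{pmatrix}
\begin{pmatrix} -1 & 2 \\ 1 & 0 \end{pmatrix}^T & \begin{pmatrix} 2 & -1 \end{pmatrix}^T & \begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix}^T \\
\begin{pmatrix} 4 \\ -1 \end{pmatrix}^T & \begin{pmatrix} 3 \end{pmatrix}^T & \begin{pmatrix} 3 \\ 0 \end{pmatrix}^T \\
\begin{pmatrix} 1 \\ -2 \end{pmatrix}^T & \begin{pmatrix} 1 \end{pmatrix}^T & \begin{pmatrix} 4 \\ 1 \end{pmatrix}^T
\end{pmatrix} \\
&= \begin{pmatrix}
\begin{pmatrix} -1 & 1 \\ 2 & 0 \end{pmatrix} & \begin{pmatrix} 2 \\ -1 \end{pmatrix} & \begin{pmatrix} 1 & -1 \\ 2 & -2 \end{pmatrix} \\
\begin{pmatrix} 4 & -1 \end{pmatrix} & \begin{pmatrix} 3 \end{pmatrix} & \begin{pmatrix} 3 & 0 \end{pmatrix} \\
\begin{pmatrix} 1 & -2 \end{pmatrix} & \begin{pmatrix} 1 \end{pmatrix} & \begin{pmatrix} 4 & 1 \end{pmatrix}
\end{pmatrix} \\
&= \begin{pmatrix} -1 & 1 & 2 & 1 & -1 \\ 2 & 0 & -1 & 2 & -2 \\ 4 & -1 & 3 & 3 & 0 \\ 1 & -2 & 1 & 4 & 1 \end{pmatrix}
\end{aligned}$$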
Theorem
Let
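$$A = \begin{pmatrix} A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1} \end{pmatrix}.$$
Then
$$A^T = \begin{pmatrix} A_{0,0}^T & A_{1,0}^T & \cdots & A_{M-1,0}^T \\ A_{0,1}^T & A_{1,1}^T & \cdots & A_{M-1,1}^T \\ \vdots & \vdots & \ddots & \vdots \\ A_{0,N-1}^T & A_{1,N-1}^T & \cdots & A_{M-1,N-1}^T \end{pmatrix}.$$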
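The theorem says two things happen at once: the grid of blocks is transposed, and each block is transposed. A sketch of that, assuming a matrix is given as a grid `blocks[i][j]` of small list-of-rows matrices; `transpose`, `blocked_transpose`, and `assemble` are hypothetical helper names. The data are the blocks of the 5 x 4 matrix from the blocked transposition example.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def blocked_transpose(blocks):
    """Block (j, i) of the result is the transpose of block (i, j)."""
    M, N = len(blocks), len(blocks[0])
    return [[transpose(blocks[i][j]) for i in range(M)] for j in range(N)]


def assemble(blocks):
    """Glue a grid of blocks back into one list-of-rows matrix."""
    rows = []
    for block_row in blocks:
        for k in range(len(block_row[0])):
            rows.append([v for block in block_row for v in block[k]])
    return rows


blocks = [
    [[[-1, 2], [1, 0]], [[4], [-1]], [[1], [-2]]],
    [[[2, -1]], [[3]], [[1]]],
    [[[1, 2], [-1, -2]], [[3], [0]], [[4], [1]]],
]
A = assemble(blocks)
print(assemble(blocked_transpose(blocks)))
print(transpose(A))
```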
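$y := \text{Mvmult\_unb\_var1b}(A, x, y)$
Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, $x \rightarrow \begin{pmatrix} x_T \\ x_B \end{pmatrix}$, $y \rightarrow \begin{pmatrix} y_T \\ y_B \end{pmatrix}$
  where $A_{TL}$ is $0 \times 0$, $x_T$, $y_T$ are $0 \times 1$
while $m(A_{TL}) < m(A)$ do
  Repartition
    $\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}$, $\begin{pmatrix} x_T \\ x_B \end{pmatrix} \rightarrow \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}$, $\begin{pmatrix} y_T \\ y_B \end{pmatrix} \rightarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$
    where $\alpha_{11}$, $\chi_1$, and $\psi_1$ are scalars
  $\psi_1 := a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1$
  Continue with
    $\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}$, $\begin{pmatrix} x_T \\ x_B \end{pmatrix} \leftarrow \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}$, $\begin{pmatrix} y_T \\ y_B \end{pmatrix} \leftarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$
endwhile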
Theorem
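Let $U$ be an upper triangular matrix. Partition
$$U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ U_{BL} & U_{BR} \end{pmatrix} = \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix},$$
where $U_{TL}$ and $U_{00}$ are square matrices. Then
$$U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix} = \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ 0 & \upsilon_{11} & u_{12}^T \\ 0 & 0 & U_{22} \end{pmatrix},$$
where $U_{TL}$ and $U_{BR}$ are upper triangular matrices.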
Example
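Consider
$$\begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix}
= \left(\begin{array}{rr|r|rr} -1 & 2 & 4 & 1 & 0 \\ 0 & 0 & -1 & -2 & 1 \\ \hline 0 & 0 & 3 & 1 & 2 \\ \hline 0 & 0 & 0 & 4 & 3 \\ 0 & 0 & 0 & 0 & 2 \end{array}\right)$$
We notice that $u_{10}^T = 0$, $U_{20} = 0$, and $u_{21} = 0$.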
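$y := \text{Trmv\_un\_unb\_var1}(U, x, y)$
Partition $U \rightarrow \begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix}$, $x \rightarrow \begin{pmatrix} x_T \\ x_B \end{pmatrix}$, $y \rightarrow \begin{pmatrix} y_T \\ y_B \end{pmatrix}$
  where $U_{TL}$ is $0 \times 0$, $x_T$, $y_T$ are $0 \times 1$
while $m(U_{TL}) < m(U)$ do
  Repartition
    $\begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ 0 & \upsilon_{11} & u_{12}^T \\ 0 & 0 & U_{22} \end{pmatrix}$, $\begin{pmatrix} x_T \\ x_B \end{pmatrix} \rightarrow \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}$, $\begin{pmatrix} y_T \\ y_B \end{pmatrix} \rightarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$
    where $\upsilon_{11}$, $\chi_1$, and $\psi_1$ are scalars
  $\psi_1 := \upsilon_{11} \chi_1 + u_{12}^T x_2 + \psi_1$ (the term $u_{10}^T x_0$ is dropped because $u_{10}^T = 0$)
  Continue with
    $\begin{pmatrix} U_{TL} & U_{TR} \\ 0 & U_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ 0 & \upsilon_{11} & u_{12}^T \\ 0 & 0 & U_{22} \end{pmatrix}$, $\begin{pmatrix} x_T \\ x_B \end{pmatrix} \leftarrow \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix}$, $\begin{pmatrix} y_T \\ y_B \end{pmatrix} \leftarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$
endwhile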
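A sketch of the triangular variant in plain Python, assuming list-of-rows storage. The function name follows the algorithm's name, but the code and the helper `matvec` are illustrative assumptions. In iteration `k`, the diagonal element `U[k][k]` plays the role of $\upsilon_{11}$ and the elements to its right the role of $u_{12}^T$; nothing to the left of the diagonal is touched. The data are the upper triangular example matrix from this section and an arbitrary test vector.

```python
def trmv_un_unb_var1(U, x, y):
    """Return U x + y for upper triangular U, skipping the zeros below the diagonal."""
    n = len(U)
    y = list(y)  # work on a copy so the caller's y is left untouched
    for k in range(n):
        # psi_1 := upsilon_11 chi_1 + u_12^T x_2 + psi_1
        y[k] += sum(U[k][j] * x[j] for j in range(k, n))
    return y


def matvec(A, x):
    """General y = A x, used only to cross-check the result."""
    return [sum(a * chi for a, chi in zip(row, x)) for row in A]


U = [[-1, 2, 4, 1, 0],
     [0, 0, -1, -2, 1],
     [0, 0, 3, 1, 2],
     [0, 0, 0, 4, 3],
     [0, 0, 0, 0, 2]]
x = [1, 2, 3, 4, 5]
print(trmv_un_unb_var1(U, x, [0] * 5))
print(matvec(U, x))
```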
Exercise
Let $U \in \mathbb{R}^{n \times n}$ be an upper triangular matrix. Modify the algorithm for computing $y = Ax$ to compute $y = Ux$ instead, taking advantage of the zeroes in the matrix.
Cost of a triangular matrix-vector multiplication?
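Consider $U \rightarrow \begin{pmatrix} U_{00} & u_{01} & U_{02} \\ u_{10}^T & \upsilon_{11} & u_{12}^T \\ U_{20} & u_{21} & U_{22} \end{pmatrix}$ with $U \in \mathbb{R}^{n \times n}$ and $U_{00} \in \mathbb{R}^{k \times k}$. Then:
What is the size of $u_{12}^T$? $n - k - 1$
What is the cost of $\psi_1 := \upsilon_{11} \chi_1 + u_{12}^T x_2 + \psi_1$?
$2 + 2(n - k - 1) = 2(n - k)$.
What is the total cost of the algorithm (in flops)?
$$\begin{aligned}
\text{Cost} &= \sum_{k=0}^{n-1} [2(n-k)] = 2 \sum_{k=0}^{n-1} (n-k) = 2 \sum_{j=1}^{n} j = 2 \left( \sum_{j=0}^{n-1} j + n \right) \\
&= 2 \left( \frac{n(n-1)}{2} + n \right) = 2 \left( \frac{n(n+1)}{2} \right) = n(n+1) \approx n^2.
\end{aligned}$$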