
Powers of Tensors and Fast Matrix Multiplication

François Le Gall

Department of Computer Science, Graduate School of Information Science and Technology

The University of Tokyo

Simons Institute, 12 November 2014

Overview of our Results

Algebraic Complexity of Matrix Multiplication

Compute the product of two $n \times n$ matrices $A$ and $B$ over a field $\mathbb{F}$.

• Model: algebraic circuits
  ‣ gates: $+, -, \times, \div$ (operations on two elements of the field)
  ‣ input: $a_{ij}, b_{ij}$ ($2n^2$ inputs)
  ‣ output: $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ ($n^2$ outputs)

$C_M(n)$ = minimal number of algebraic operations needed to compute the product.

Exponent of matrix multiplication:
$$\omega = \inf\left\{\, \rho \mid C_M(n) \le n^{\rho} \text{ for all large enough } n \,\right\}$$

Obviously, $2 \le \omega \le 3$.

note: $\omega$ may depend on the field $\mathbb{F}$
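For concreteness, the trivial upper bound $\omega \le 3$ comes from the schoolbook algorithm; a minimal Python sketch (illustrative, not from the talk):

```python
def naive_matmul(A, B):
    """Schoolbook product of two n x n matrices: ~n^3 multiplications
    and additions, so C_M(n) = O(n^3) and hence omega <= 3."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```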

History of the main improvements on the exponent of square matrix multiplication

  Upper bound      Year   Authors
  ω ≤ 3                   (trivial algorithm)
  ω < 2.81         1969   Strassen
  ω < 2.79         1979   Pan
  ω < 2.78         1979   Bini, Capovani, Romani and Lotti
  ω < 2.55         1981   Schönhage
  ω < 2.53         1981   Pan
  ω < 2.52         1982   Romani
  ω < 2.50         1982   Coppersmith and Winograd
  ω < 2.48         1986   Strassen
  ω < 2.376        1987   Coppersmith and Winograd
  ω < 2.373        2010   Stothers
  ω < 2.3729       2012   Vassilevska Williams
  ω < 2.3728639    2014   Le Gall (this work)

The last five results all come from the analysis of a tensor by the laser method.


The five laser-method (LM) results correspond to successive versions of the analysis:

  ω < 2.48        1986   Strassen                 LM-based analysis v1
  ω < 2.376       1987   Coppersmith, Winograd    LM-based analysis v2.0
  ω < 2.373       2010   Stothers                 LM-based analysis v2.1
  ω < 2.3729      2012   Vassilevska Williams     LM-based analysis v2.2
  ω < 2.3728639   2014   Le Gall                  LM-based analysis v2.3

The tensors considered become more difficult to analyze (technical difficulties appear, and the "size" of the tensor increases).

Previous versions (up to v2.2): analyzing the tensor required solving a complicated optimization problem (difficult when the size of the tensor increases).

Our new technique (v2.3): analyzing the tensor (i.e., obtaining an upper bound on ω from it) can be done in time polynomial in the size of the tensor:
‣ analysis based on convex optimization

Applications of our method

Laser-method-based analysis v2.3: take any tensor from which an upper bound on ω can be obtained from the laser method; the corresponding upper bound on ω is then computed in polynomial time.

Which tensor? Powers of the basic tensor from Coppersmith and Winograd's paper. Analysis of the m-th power of the tensor by CW:

  m    Upper bound      Variables in the optimization problem   Authors
  1    ω < 2.3871900      1     CW (1987)
  2    ω < 2.3754770      3     CW (1987)
  4    ω < 2.3729269      9     Stothers (2010)
  8    ω < 2.3729        29     Vassilevska Williams (2012)
  16   ω < 2.3728640    101     Le Gall (2014)
  32   ω < 2.3728639    373     Le Gall (2014)


How to Obtain Upper Bounds on ω?

Strassen's algorithm (for the product of two 2×2 matrices)

Goal: compute the product of $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ by $B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$.

1. Compute:
$$\begin{aligned}
m_1 &= a_{11} (b_{12} - b_{22}), \\
m_2 &= (a_{11} + a_{12})\, b_{22}, \\
m_3 &= (a_{21} + a_{22})\, b_{11}, \\
m_4 &= a_{22} (b_{21} - b_{11}), \\
m_5 &= (a_{11} + a_{22}) (b_{11} + b_{22}), \\
m_6 &= (a_{12} - a_{22}) (b_{21} + b_{22}), \\
m_7 &= (a_{11} - a_{21}) (b_{11} + b_{12}).
\end{aligned}$$

2. Output:
$$\begin{aligned}
c_{11} &= -m_2 + m_4 + m_5 + m_6, \\
c_{12} &= m_1 + m_2, \\
c_{21} &= m_3 + m_4, \\
c_{22} &= m_1 - m_3 + m_5 - m_7.
\end{aligned}$$

7 multiplications, 18 additions/subtractions.

Recursive application (for the product of two $2^k \times 2^k$ matrices) gives
$$C_M(2^k) = O(7^k) = O\left( (2^k)^{\log_2 7} \right) \implies \omega \le \log_2(7) = 2.807...\quad \text{[Strassen 69]}$$

More generally:

Suppose that the product of two $m \times m$ matrices can be computed with $t$ multiplications. Then $\omega \le \log_m(t)$ or, equivalently, $m^\omega \le t$.

Strassen's algorithm is the case $m = 2$ and $t = 7$.
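To make the recursion concrete, here is a short Python sketch (the function name and the use of numpy are mine; it assumes $n$ is a power of 2) of Strassen's 7-multiplication scheme applied blockwise:

```python
import numpy as np

def strassen(A, B):
    """Strassen's algorithm for n x n matrices, n a power of 2.
    Recursing on the 7-multiplication scheme gives O(n^(log2 7))."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    a11, a12, a21, a22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    b11, b12, b21, b22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    m1 = strassen(a11, b12 - b22)
    m2 = strassen(a11 + a12, b22)
    m3 = strassen(a21 + a22, b11)
    m4 = strassen(a22, b21 - b11)
    m5 = strassen(a11 + a22, b11 + b22)
    m6 = strassen(a12 - a22, b21 + b22)
    m7 = strassen(a11 - a21, b11 + b12)
    C = np.empty_like(A)
    C[:h, :h] = -m2 + m4 + m5 + m6
    C[:h, h:] = m1 + m2
    C[h:, :h] = m3 + m4
    C[h:, h:] = m1 - m3 + m5 - m7
    return C

A, B = np.random.rand(8, 8), np.random.rand(8, 8)
assert np.allclose(strassen(A, B), A @ B)   # agrees with the usual product
```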

The tensor of matrix multiplication

Definition. The tensor corresponding to the multiplication of an $m \times n$ matrix by an $n \times p$ matrix is
$$\langle m, n, p \rangle = \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{n} a_{ik} \otimes b_{kj} \otimes c_{ij}.$$

Intuitive interpretation:
‣ this is a formal sum
‣ when the $a_{ik}$ and the $b_{kj}$ are replaced by the corresponding entries of matrices, the coefficient of $c_{ij}$ becomes $\sum_{k=1}^{n} a_{ik} b_{kj}$

General 3-tensors

Consider three vector spaces $U$, $V$ and $W$ over $\mathbb{F}$, and take bases:
$U = \mathrm{span}\{x_1, \dots, x_{\dim(U)}\}$
$V = \mathrm{span}\{y_1, \dots, y_{\dim(V)}\}$
$W = \mathrm{span}\{z_1, \dots, z_{\dim(W)}\}$

A tensor over $(U, V, W)$ is an element of $U \otimes V \otimes W$, i.e., a formal sum
$$T = \sum_{u=1}^{\dim(U)} \sum_{v=1}^{\dim(V)} \sum_{w=1}^{\dim(W)} d_{uvw}\, x_u \otimes y_v \otimes z_w, \qquad d_{uvw} \in \mathbb{F}$$

"a three-dimensional array with $\dim(U) \times \dim(V) \times \dim(W)$ entries in $\mathbb{F}$"

The matrix multiplication tensor $\langle m, n, p \rangle$ is a 3-tensor with $\dim(U) = mn$, $\dim(V) = np$ and $\dim(W) = mp$, with bases
$$U = \mathrm{span}\{a_{ik}\}_{1 \le i \le m,\, 1 \le k \le n}, \quad V = \mathrm{span}\{b_{k'j}\}_{1 \le k' \le n,\, 1 \le j \le p}, \quad W = \mathrm{span}\{c_{i'j'}\}_{1 \le i' \le m,\, 1 \le j' \le p}$$
and coefficients
$$d_{ik,\,k'j,\,i'j'} = \begin{cases} 1 & \text{if } i = i',\ j = j',\ k = k' \\ 0 & \text{otherwise.} \end{cases}$$

Rank

rank = number of multiplications of the best (bilinear) algorithm (equivalently, the minimal number of rank-one terms $u \otimes v \otimes w$ whose sum equals the tensor). In particular $R(\langle m, n, p \rangle) \le mnp$.

Strassen's algorithm gives $R(\langle 2, 2, 2 \rangle) \le 7$:
$$\begin{aligned}
\langle 2,2,2 \rangle = {} & a_{11} \otimes (b_{12} - b_{22}) \otimes (c_{12} + c_{22}) \\
& + (a_{11} + a_{12}) \otimes b_{22} \otimes (-c_{11} + c_{12}) \\
& + (a_{21} + a_{22}) \otimes b_{11} \otimes (c_{21} - c_{22}) \\
& + a_{22} \otimes (b_{21} - b_{11}) \otimes (c_{11} + c_{21}) \\
& + (a_{11} + a_{22}) \otimes (b_{11} + b_{22}) \otimes (c_{11} + c_{22}) \\
& + (a_{12} - a_{22}) \otimes (b_{21} + b_{22}) \otimes c_{11} \\
& + (a_{11} - a_{21}) \otimes (b_{11} + b_{12}) \otimes (-c_{22})
\end{aligned}$$
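As a sanity check, this rank-7 decomposition can be verified mechanically. A small numpy sketch (the array-index conventions are mine) builds $\langle 2,2,2 \rangle$ as a 4×4×4 array and checks that Strassen's seven rank-one terms sum to it:

```python
import itertools
import numpy as np

# <2,2,2> as a 4x4x4 array: U indexed by pairs (i,k), V by (k,j), W by (i,j).
pair = lambda r, c: 2 * r + c
T = np.zeros((4, 4, 4))
for i, j, k in itertools.product(range(2), repeat=3):
    T[pair(i, k), pair(k, j), pair(i, j)] = 1

# Strassen's 7 rank-one terms, as 2x2 coefficient matrices on the
# a-variables (indexed [i,k]), b-variables ([k,j]) and c-variables ([i,j]).
terms = [
    ([[1, 0], [0, 0]], [[0, 1], [0, -1]], [[0, 1], [0, 1]]),    # m1
    ([[1, 1], [0, 0]], [[0, 0], [0, 1]], [[-1, 1], [0, 0]]),    # m2
    ([[0, 0], [1, 1]], [[1, 0], [0, 0]], [[0, 0], [1, -1]]),    # m3
    ([[0, 0], [0, 1]], [[-1, 0], [1, 0]], [[1, 0], [1, 0]]),    # m4
    ([[1, 0], [0, 1]], [[1, 0], [0, 1]], [[1, 0], [0, 1]]),     # m5
    ([[0, 1], [0, -1]], [[0, 0], [1, 1]], [[1, 0], [0, 0]]),    # m6
    ([[1, 0], [-1, 0]], [[1, 1], [0, 0]], [[0, 0], [0, -1]]),   # m7
]
S = sum(np.einsum('u,v,w->uvw', np.ravel(a), np.ravel(b), np.ravel(c))
        for a, b, c in terms)
assert np.array_equal(S, T)   # the 7 rank-one terms sum to <2,2,2>
```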

How to obtain upper bounds on ω?

Remember: if the product of two $m \times m$ matrices can be computed with $t$ multiplications, then $\omega \le \log_m(t)$ or, equivalently, $m^\omega \le t$. In our terminology: $R(\langle m, m, m \rangle) \le t \implies m^\omega \le t$.

First generalization:
Theorem. $R(\langle m, n, p \rangle) \le t \implies (mnp)^{\omega/3} \le t$

Second generalization, via the border rank $\underline{R}$ (note $\underline{R}(\langle m, n, p \rangle) \le R(\langle m, n, p \rangle)$):
Theorem [Bini et al. 1979]. $\underline{R}(\langle m, n, p \rangle) \le t \implies (mnp)^{\omega/3} \le t$

How to obtain upper bounds on ω?

Third generalization, via direct sums ($\oplus$):

Theorem (the asymptotic sum inequality, special case) [Schönhage 1981].
$$\underline{R}(\langle m_1, n_1, p_1 \rangle \oplus \langle m_2, n_2, p_2 \rangle) \le t \implies (m_1 n_1 p_1)^{\omega/3} + (m_2 n_2 p_2)^{\omega/3} \le t$$

Theorem (the asymptotic sum inequality, general form) [Schönhage 1981].
$$\underline{R}\left( \bigoplus_{i=1}^{k} \langle m_i, n_i, p_i \rangle \right) \le t \implies \sum_{i=1}^{k} (m_i n_i p_i)^{\omega/3} \le t$$

History of the main improvements on the exponent of square matrix multiplication, by technique:

  upper bound on ω from the analysis of the rank of a tensor:
    ω < 2.81 (Strassen 1969), ω < 2.79 (Pan 1979)
  analysis of the border rank of a tensor:
    ω < 2.78 (Bini et al. 1979)
  analysis of a tensor by the asymptotic sum inequality:
    ω < 2.55 (Schönhage 1981), ω < 2.53 (Pan 1981), ω < 2.52 (Romani 1982), ω < 2.50 (Coppersmith and Winograd 1982)
  analysis of a tensor by the laser method:
    ω < 2.48 (Strassen 1986) down to ω < 2.3728639 (Le Gall 2014)

The Laser Method on a Simpler Example

Why is this called the "laser method"? The name comes from V. Strassen, "Algebra and Complexity", Proceedings of the First European Congress of Mathematics, pp. 429-446, 1994 (Ref. [27]). All the improvements from ω < 2.48 (Strassen 1986) to ω < 2.3728639 (Le Gall 2014) are variants (improvements) of the laser method.

The first CW construction

Let $q$ be a positive integer. Consider three vector spaces $U$, $V$ and $W$ of dimension $q+1$ over $\mathbb{F}$:
$$U = \mathrm{span}\{x_0, \dots, x_q\}, \quad V = \mathrm{span}\{y_0, \dots, y_q\}, \quad W = \mathrm{span}\{z_0, \dots, z_q\}$$

Coppersmith and Winograd (1987) introduced the following tensor over $(U, V, W)$:
$$T_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}},$$
where
$$T^{011}_{\mathrm{easy}} = \sum_{i=1}^{q} x_0 \otimes y_i \otimes z_i \cong \langle 1, 1, q \rangle \quad \text{(1×1 matrix by 1×q matrix)}$$
$$T^{101}_{\mathrm{easy}} = \sum_{i=1}^{q} x_i \otimes y_0 \otimes z_i \cong \langle q, 1, 1 \rangle \quad \text{(q×1 matrix by 1×1 matrix)}$$
$$T^{110}_{\mathrm{easy}} = \sum_{i=1}^{q} x_i \otimes y_i \otimes z_0 \cong \langle 1, q, 1 \rangle \quad \text{(1×q matrix by q×1 matrix)}$$

Indeed, renaming the variables with double indices exhibits the isomorphisms:
$$T^{011}_{\mathrm{easy}} = \sum_{i=1}^{q} x_{00} \otimes y_{0i} \otimes z_{0i}, \qquad T^{101}_{\mathrm{easy}} = \sum_{i=1}^{q} x_{i0} \otimes y_{00} \otimes z_{i0}, \qquad T^{110}_{\mathrm{easy}} = \sum_{i=1}^{q} x_{0i} \otimes y_{i0} \otimes z_{00}.$$

The first CW construction

Decompose the spaces as
$$U = U_0 \oplus U_1, \quad \text{where } U_0 = \mathrm{span}\{x_0\} \text{ and } U_1 = \mathrm{span}\{x_1, \dots, x_q\}$$
$$V = V_0 \oplus V_1, \quad \text{where } V_0 = \mathrm{span}\{y_0\} \text{ and } V_1 = \mathrm{span}\{y_1, \dots, y_q\}$$
$$W = W_0 \oplus W_1, \quad \text{where } W_0 = \mathrm{span}\{z_0\} \text{ and } W_1 = \mathrm{span}\{z_1, \dots, z_q\}$$

Then $T^{011}_{\mathrm{easy}}$ is a tensor over $(U_0, V_1, W_1)$, $T^{101}_{\mathrm{easy}}$ a tensor over $(U_1, V_0, W_1)$, and $T^{110}_{\mathrm{easy}}$ a tensor over $(U_1, V_1, W_0)$.

This is not a direct sum.

The first CW construction

Since the sum $T_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}$ is not direct, we cannot use the asymptotic sum inequality directly.

Consider instead
$$T^{\otimes 2}_{\mathrm{easy}} = (T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}) \otimes (T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}) = T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} + T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes T^{110}_{\mathrm{easy}} \quad (9 \text{ terms})$$
and more generally
$$T^{\otimes N}_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} \otimes \cdots \otimes T^{011}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes \cdots \otimes T^{110}_{\mathrm{easy}} \quad (3^N \text{ terms})$$

Coppersmith and Winograd showed how to select $\left( \frac{3}{2^{2/3}} \right)^N$ terms that do not share variables (i.e., form a direct sum), by zeroing variables (i.e., without increasing the rank).

We have $\underline{R}(T_{\mathrm{easy}}) \le q + 2$; actually, $\underline{R}(T_{\mathrm{easy}}) = q + 2$. Note: $\underline{R}(T^{\otimes N}_{\mathrm{easy}}) = (q+1)^{N + o(N)}$ would imply $\omega = 2$.

The first CW construction: Analysis

Theorem [Coppersmith and Winograd 87]. By zeroing variables (i.e., without increasing the rank), the tensor $T^{\otimes N}_{\mathrm{easy}}$ can be converted into a direct sum of
$$\exp\left( \left( H\left(\tfrac13, \tfrac23\right) - o(1) \right) N \right) = \left( \frac{3}{2^{2/3}} \right)^{(1-o(1))N}$$
terms, each containing $N/3$ copies of $T^{011}_{\mathrm{easy}}$, $N/3$ copies of $T^{101}_{\mathrm{easy}}$ and $N/3$ copies of $T^{110}_{\mathrm{easy}}$.

Here $H$ denotes the entropy function:
$$H\left(\tfrac13, \tfrac23\right) = -\tfrac13 \log\left(\tfrac13\right) - \tfrac23 \log\left(\tfrac23\right) = \log\left( 3^{1/3} \left(\tfrac32\right)^{2/3} \right) = \log\left( \frac{3}{2^{2/3}} \right)$$

(Among the $3^N$ terms of $T^{\otimes N}_{\mathrm{easy}}$, the extremes are the term with $N$ copies of $T^{011}_{\mathrm{easy}}$ and none of the others, and the term with $N$ copies of $T^{110}_{\mathrm{easy}}$; the theorem keeps only balanced terms.)

Each term in this direct sum is isomorphic to
$$\left(T^{011}_{\mathrm{easy}}\right)^{\otimes N/3} \otimes \left(T^{101}_{\mathrm{easy}}\right)^{\otimes N/3} \otimes \left(T^{110}_{\mathrm{easy}}\right)^{\otimes N/3} \cong \left\langle q^{N/3},\, q^{N/3},\, q^{N/3} \right\rangle$$

Consequence, by the asymptotic sum inequality (general form) [Schönhage 1981]:
$$\left( \frac{3}{2^{2/3}} \right)^{(1-o(1))N} \cdot q^{N\omega/3} \le \underline{R}\left(T^{\otimes N}_{\mathrm{easy}}\right) \le \underline{R}(T_{\mathrm{easy}})^N = (q+2)^N$$
$$\implies \frac{3}{2^{2/3}} \cdot q^{\omega/3} \le q + 2 \implies \omega \le 2.403...\ \text{ for } q = 8$$
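A one-liner check (illustrative Python, not from the talk) that the resulting bound $\omega \le 3 \log_q\!\big((q+2) \cdot 2^{2/3} / 3\big)$ is indeed optimized at $q = 8$:

```python
import math

# (3 / 2^(2/3)) * q^(omega/3) <= q + 2 rearranges to
# omega <= 3 * log((q + 2) * 2^(2/3) / 3) / log(q); scan q to find the best choice.
best = min((3 * math.log((q + 2) * 2 ** (2 / 3) / 3) / math.log(q), q)
           for q in range(2, 30))
print(best)   # (2.4037..., 8): the bound is minimized at q = 8
```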

Idea behind the proof

Consider $N = 2$:
$$T^{\otimes 2}_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} + T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes T^{110}_{\mathrm{easy}} \quad (9 \text{ terms})$$

Grouping the $U$-, $V$- and $W$-indices, each of the 9 terms gets a 6-bit label:
011011, 001111, 111100, 011110, 100111, 110011, 110110, 101101, 111001.

For instance,
$$T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} = \sum_{i,i'=1}^{q} (x_0 \otimes x_{i'}) \otimes (y_i \otimes y_0) \otimes (z_i \otimes z_{i'})$$
is a tensor over $(U_0 \otimes U_1) \otimes (V_1 \otimes V_0) \otimes (W_1 \otimes W_1)$, with label 011011, while
$$T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} = \sum_{i,i'=1}^{q} (x_0 \otimes x_0) \otimes (y_i \otimes y_{i'}) \otimes (z_i \otimes z_{i'})$$
is a tensor over $(U_0 \otimes U_0) \otimes (V_1 \otimes V_1) \otimes (W_1 \otimes W_1)$, with label 001111. These two terms SHARE VARIABLES (the $z$-variables of $W_1 \otimes W_1$).

We can remove the second term, e.g., by setting all variables in $V_1 \otimes V_1$ to zero (note: this removes more than one term!). Then, by setting all variables in $U_1 \otimes U_0$, $V_0 \otimes V_0$ and $W_0 \otimes W_1$ to zero, only the terms with labels 011011 and 110110 survive, and they do not share variables.

Conclusion: we can convert $T^{\otimes 2}_{\mathrm{easy}}$ (a sum of 9 terms) into a direct sum of 2 terms.

NEXT STEP: consider $T^{\otimes N}_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} \otimes \cdots \otimes T^{011}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes \cdots \otimes T^{110}_{\mathrm{easy}}$ ($3^N$ terms). The labels are now strings of length $3N$ (one block of $N$ bits per coordinate), ranging from $0 \cdots 0\ 1 \cdots 1\ 1 \cdots 1$ to $1 \cdots 1\ 1 \cdots 1\ 0 \cdots 0$.

We keep the labels $0 \cdots 1\ 1 \cdots 0\ 0 \cdots 1$ in which each of the three blocks contains $N/3$ zeros and $2N/3$ ones. The number of possibilities per block is
$$\binom{N}{N/3,\ 2N/3} \approx \exp\left( H\left(\tfrac13, \tfrac23\right) N \right),$$
and among these we can select
$$\left( \frac{3}{2^{2/3}} \right)^{(1-o(1))N} = \exp\left( \left( H\left(\tfrac13, \tfrac23\right) - o(1) \right) N \right)$$
labels that pairwise do not share a block in any of the three coordinates. The proof of this theorem is based on a complicated construction using the existence of dense sets of integers with no three-term arithmetic progression.
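A quick numerical check (illustrative Python) of the entropy estimate for the number of such labels per block:

```python
import math

# The number of length-N blocks with N/3 zeros and 2N/3 ones grows like
# exp(H(1/3, 2/3) * N); check the exponential rate numerically (natural logs).
H = -(1 / 3) * math.log(1 / 3) - (2 / 3) * math.log(2 / 3)  # = log(3 / 2^(2/3))
for N in (30, 300, 3000):
    count = math.comb(N, N // 3)
    print(N, math.log(count) / N, H)   # the ratio approaches H as N grows
```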

General Formulation of the Laser Method and Reinterpretation

The laser method: general formulation

For any tensor $T$, any $N \ge 1$ and any $\rho \in [2, 3]$, define $V_{\rho,N}(T)$ as the maximum of $\sum_{i=1}^{k} (m_i n_i p_i)^{\rho/3}$ over all restrictions of $T^{\otimes N}$ isomorphic to $\bigoplus_{i=1}^{k} \langle m_i, n_i, p_i \rangle$.

The value of $T$:
$$V_\rho(T) = \lim_{N \to \infty} V_{\rho,N}(T)^{1/N}$$

(This is the definition for symmetric tensors; otherwise we use $V_\rho(T) = V_\rho(T \otimes \sigma T \otimes \sigma^2 T)^{1/3}$, where $\sigma$ cyclically rotates the three components.)

Properties:
‣ $V_\rho(T)$ is an increasing function of ρ
‣ $V_\rho(T \oplus T') \ge V_\rho(T) + V_\rho(T')$ and $V_\rho(T \otimes T') \ge V_\rho(T) \times V_\rho(T')$
‣ $V_\rho(\langle m, n, p \rangle) = (mnp)^{\rho/3}$

Example: the first CW construction. The theorem of Coppersmith and Winograd converts $T^{\otimes N}_{\mathrm{easy}}$ into a direct sum of $\left( \frac{3}{2^{2/3}} \right)^{(1-o(1))N}$ terms, each isomorphic to $\langle q^{N/3}, q^{N/3}, q^{N/3} \rangle$. Therefore
$$V_{\rho,N}(T_{\mathrm{easy}}) \ge \left( \frac{3}{2^{2/3}} \right)^{(1-o(1))N} q^{\rho N/3} \implies V_\rho(T_{\mathrm{easy}}) \ge \frac{3}{2^{2/3}}\, q^{\rho/3}$$

The laser method: general formulation

Theorem (simple generalization of the asymptotic sum inequality).
$$V_\omega(T) \le \underline{R}(T)$$

For instance, $V_\rho(\langle m, n, p \rangle) = (mnp)^{\rho/3}$, so for direct sums of matrix multiplication tensors this recovers Schönhage's asymptotic sum inequality:
$$\sum_{i=1}^{k} (m_i n_i p_i)^{\omega/3} \le \underline{R}\left( \bigoplus_{i=1}^{k} \langle m_i, n_i, p_i \rangle \right)$$

The laser method: general formulation

Consider three vector spaces $U$, $V$ and $W$ over $\mathbb{F}$, decomposed as
$$U = \bigoplus_{i \in I} U_i, \quad V = \bigoplus_{j \in J} V_j, \quad W = \bigoplus_{k \in K} W_k \quad \text{for some } I, J, K \subset \mathbb{Z}$$

A tensor $T$ over $(U, V, W)$ is a partitioned tensor (with respect to this decomposition) if it can be written as
$$T = \sum_{(i,j,k) \in I \times J \times K} T_{ijk}, \quad \text{where } T_{ijk} \in U_i \otimes V_j \otimes W_k \text{ for each } (i,j,k) \in I \times J \times K$$

Support of the tensor: $\mathrm{supp}(T) = \{(i,j,k) \in I \times J \times K \mid T_{ijk} \ne 0\}$; each non-zero $T_{ijk}$ is called a component of $T$.

We say that the tensor is tight if there exists an integer $d$ such that $i + j + k = d$ for all $(i,j,k) \in \mathrm{supp}(T)$.

Example: the first CW construction. With $I = J = K = \{0, 1\}$ and the decomposition $U = U_0 \oplus U_1$, $V = V_0 \oplus V_1$, $W = W_0 \oplus W_1$ from before, $T_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}$ is a partitioned tensor with components over $(U_0, V_1, W_1)$, $(U_1, V_0, W_1)$ and $(U_1, V_1, W_0)$, so
$$\mathrm{supp}(T_{\mathrm{easy}}) = \{(0,1,1),\ (1,0,1),\ (1,1,0)\}$$
It is tight, since $i + j + k = 2$ for all $(i,j,k) \in \mathrm{supp}(T_{\mathrm{easy}})$.

The laser method: general formulation

Main Theorem [LG 14] (reinterpretation of prior works). For any tight partitioned tensor $T$, any probability distribution $P$ over $\mathrm{supp}(T)$, and any $\rho \in [2,3]$, we have
$$\log(V_\rho(T)) \ge \sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} + \sum_{(i,j,k) \in \mathrm{supp}(T)} P(i,j,k) \log(V_\rho(T_{ijk})) - \Gamma(P).$$

$H$: entropy. $P_\ell$: projection of $P$ along the $\ell$-th coordinate (= marginal distribution). $\Gamma(P)$: to be defined later (zero in the case of simple tensors).

Conclusion: we can compute a lower bound on the value of $T$ if we know a lower bound on the value of each component. We can then obtain an upper bound on ω via $V_\omega(T) \le \underline{R}(T)$; concretely, we use $V_\rho(T) \ge \underline{R}(T) \implies \omega \le \rho$ and do a binary search on ρ.
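A sketch of that binary search (the function and variable names are mine), applied to the first CW construction with $q = 8$ as a test case:

```python
import math

def omega_upper_bound(log_value_lower, log_border_rank, tol=1e-9):
    """Binary search on rho: if the laser-method lower bound on log V_rho(T)
    reaches log Rbar(T), then V_rho(T) >= Rbar(T) >= V_omega(T), and since
    V_rho is increasing in rho this forces omega <= rho."""
    lo, hi = 2.0, 3.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_value_lower(mid) >= log_border_rank:
            hi = mid
        else:
            lo = mid
    return hi

# For T_easy: log V_rho >= H(1/3, 2/3) + (rho/3) log q, and Rbar(T_easy) = q + 2.
q = 8
H = -(1 / 3) * math.log(1 / 3) - (2 / 3) * math.log(2 / 3)
print(omega_upper_bound(lambda rho: H + rho / 3 * math.log(q),
                        math.log(q + 2)))   # ~2.4037, as above
```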

Example: the first CW construction.
$$\mathrm{supp}(T_{\mathrm{easy}}) = \{(0,1,1),\ (1,0,1),\ (1,1,0)\}$$
with component values
$$V_\rho(T^{011}_{\mathrm{easy}}) = V_\rho(\langle 1,1,q \rangle) = q^{\rho/3}, \quad V_\rho(T^{101}_{\mathrm{easy}}) = V_\rho(\langle q,1,1 \rangle) = q^{\rho/3}, \quad V_\rho(T^{110}_{\mathrm{easy}}) = V_\rho(\langle 1,q,1 \rangle) = q^{\rho/3}$$

Take $P(0,1,1) = P(1,0,1) = P(1,1,0) = 1/3$. Then $P_1(0) = 1/3$, $P_1(1) = 2/3$ and $P_2 = P_3 = P_1$, and $\Gamma(P) = 0$. The Main Theorem gives
$$\log(V_\rho(T_{\mathrm{easy}})) \ge H\left(\tfrac13, \tfrac23\right) + \tfrac13 \log\left(q^{\rho/3}\right) + \tfrac13 \log\left(q^{\rho/3}\right) + \tfrac13 \log\left(q^{\rho/3}\right),$$
recovering the bound $V_\rho(T_{\mathrm{easy}}) \ge \frac{3}{2^{2/3}}\, q^{\rho/3}$ from Coppersmith and Winograd's analysis.


The laser method: general formulation

Interpretation of the Main Theorem [LG 14]: the laser method enables us to convert (by zeroing variables) $T^{\otimes N}$ into a direct sum of
$$\exp\left( \left( \sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} - \Gamma(P) - o(1) \right) N \right)$$
terms, each isomorphic to $\bigotimes_{(i,j,k) \in \mathrm{supp}(T)} \left[ T_{ijk} \right]^{\otimes P(i,j,k) N}$.

The second CW construction

Let $q$ be a positive integer. Consider three vector spaces $U$, $V$ and $W$ of dimension $q+2$ over $\mathbb{F}$:
$$U = \mathrm{span}\{x_0, \dots, x_q, x_{q+1}\}, \quad V = \mathrm{span}\{y_0, \dots, y_q, y_{q+1}\}, \quad W = \mathrm{span}\{z_0, \dots, z_q, z_{q+1}\}$$

Coppersmith and Winograd (1987) considered the following tensor:
$$T_{\mathrm{CW}} = T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}},$$
where $T^{011}_{\mathrm{CW}} = T^{011}_{\mathrm{easy}}$, $T^{101}_{\mathrm{CW}} = T^{101}_{\mathrm{easy}}$, $T^{110}_{\mathrm{CW}} = T^{110}_{\mathrm{easy}}$, and
$$T^{002}_{\mathrm{CW}} = x_0 \otimes y_0 \otimes z_{q+1} \cong \langle 1,1,1 \rangle, \quad T^{020}_{\mathrm{CW}} = x_0 \otimes y_{q+1} \otimes z_0 \cong \langle 1,1,1 \rangle, \quad T^{200}_{\mathrm{CW}} = x_{q+1} \otimes y_0 \otimes z_0 \cong \langle 1,1,1 \rangle.$$

Thus $T_{\mathrm{CW}} = T_{\mathrm{easy}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}}$, and $\underline{R}(T_{\mathrm{CW}}) = q + 2$.

The second CW construction

This is not a direct sum. Decompose
$$U = U_0 \oplus U_1 \oplus U_2, \quad \text{where } U_0 = \mathrm{span}\{x_0\},\ U_1 = \mathrm{span}\{x_1, \dots, x_q\} \text{ and } U_2 = \mathrm{span}\{x_{q+1}\},$$
and similarly $V = V_0 \oplus V_1 \oplus V_2$ and $W = W_0 \oplus W_1 \oplus W_2$. Then:
$T^{011}_{\mathrm{CW}}$: tensor over $(U_0, V_1, W_1)$
$T^{101}_{\mathrm{CW}}$: tensor over $(U_1, V_0, W_1)$
$T^{110}_{\mathrm{CW}}$: tensor over $(U_1, V_1, W_0)$
$T^{002}_{\mathrm{CW}} = x_0 \otimes y_0 \otimes z_{q+1}$: tensor over $(U_0, V_0, W_2)$
$T^{020}_{\mathrm{CW}} = x_0 \otimes y_{q+1} \otimes z_0$: tensor over $(U_0, V_2, W_0)$
$T^{200}_{\mathrm{CW}} = x_{q+1} \otimes y_0 \otimes z_0$: tensor over $(U_2, V_0, W_0)$

The second CW construction: laser method

$$\mathrm{supp}(T_{\mathrm{CW}}) = \{(0,1,1),\ (1,0,1),\ (1,1,0),\ (0,0,2),\ (0,2,0),\ (2,0,0)\}$$
with component values
$$V_\rho(T^{011}_{\mathrm{CW}}) = V_\rho(T^{101}_{\mathrm{CW}}) = V_\rho(T^{110}_{\mathrm{CW}}) = q^{\rho/3}, \qquad V_\rho(T^{002}_{\mathrm{CW}}) = V_\rho(T^{020}_{\mathrm{CW}}) = V_\rho(T^{200}_{\mathrm{CW}}) = 1$$

Take, with $0 \le \alpha \le 1/3$:
$$P(0,1,1) = P(1,0,1) = P(1,1,0) = \alpha, \qquad P(0,0,2) = P(0,2,0) = P(2,0,0) = \tfrac13 - \alpha$$
Then $P_1(0) = \alpha + 2(\tfrac13 - \alpha) = \tfrac23 - \alpha$, $P_1(1) = 2\alpha$, $P_1(2) = \tfrac13 - \alpha$, and by symmetry $P_2 = P_3 = P_1$.

The second CW construction: laser method

Applying the Main Theorem [LG 14] with this distribution:
$$\log(V_\rho(T_{\mathrm{CW}})) \ge H\left( \tfrac23 - \alpha,\ 2\alpha,\ \tfrac13 - \alpha \right) + \log\left(q^{\rho\alpha}\right)$$

Combined with $V_\omega(T_{\mathrm{CW}}) \le \underline{R}(T_{\mathrm{CW}}) = q + 2$, this gives $\omega \le 2.38718...$ for $q = 6$ and $\alpha = 0.3173$.
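A short Python sketch (illustrative; the grid granularity is mine) that recovers the stated bound by scanning over α for $q = 6$:

```python
import math

def H(*p):  # entropy, natural logarithms
    return -sum(x * math.log(x) for x in p if x > 0)

q = 6
# Each alpha in (0, 1/3] yields, via V_omega(T_CW) <= q + 2, the bound
# omega <= (log(q + 2) - H(2/3 - a, 2a, 1/3 - a)) / (a * log q).
best = min((math.log(q + 2) - H(2/3 - a, 2 * a, 1/3 - a)) / (a * math.log(q))
           for a in (i / 10000 for i in range(1, 3334)))
print(best)   # ~2.38719, attained near alpha = 0.3173
```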


Analysis of the second power

$$T^{\otimes 2}_{\mathrm{CW}} = (T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}})^{\otimes 2} \quad (36 \text{ terms}), \qquad \underline{R}(T^{\otimes 2}_{\mathrm{CW}}) \le (q+2)^2$$

Idea: rewrite it as a (non-direct) sum of 15 terms by regrouping (MERGING) terms:
$$T^{\otimes 2}_{\mathrm{CW}} = T^{400} + T^{040} + T^{004} + T^{310} + T^{301} + T^{103} + T^{130} + T^{013} + T^{031} + T^{220} + T^{202} + T^{022} + T^{211} + T^{121} + T^{112},$$
where
$$\begin{aligned}
T^{400} &= T^{200}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}}, \\
T^{310} &= T^{200}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}}, \\
T^{220} &= T^{200}_{\mathrm{CW}} \otimes T^{020}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}}, \\
T^{211} &= T^{200}_{\mathrm{CW}} \otimes T^{011}_{\mathrm{CW}} + T^{011}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{101}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}},
\end{aligned}$$
and the other 11 terms are obtained by permuting the variables (e.g., $T^{040} = T^{020}_{\mathrm{CW}} \otimes T^{020}_{\mathrm{CW}}$).

Analysis of the second power

$$\mathrm{supp}(T^{\otimes 2}_{\mathrm{CW}}) = \{(4,0,0), \dots, (0,0,4),\ (3,1,0), \dots, (0,1,3),\ (2,2,0), \dots, (0,2,2),\ (2,1,1), \dots, (1,1,2)\}$$
(3 permutations, 6 permutations, 3 permutations and 3 permutations, respectively)

Lower bounds on the values of each component can be computed (recursively).

Choice of distribution ($4 - 1 = 3$ free parameters):
$$P(4,0,0) = \dots = P(0,0,4) = \alpha, \qquad P(3,1,0) = \dots = P(0,1,3) = \beta,$$
$$P(2,2,0) = \dots = P(0,2,2) = \gamma, \qquad P(2,1,1) = \dots = P(1,1,2) = \delta$$

We have $\Gamma(P) = 0$.

Applying the Main Theorem [LG 14] together with $V_\omega(T) \le \underline{R}(T)$:

Theorem. $\omega \le 2.3755...$ for $q = 6$ and $\alpha = 0.00023$, $\beta = 0.0125$, $\gamma = 0.10254$, $\delta = 0.2056$.

What about the third power (using similar merging schemes)? It does not give any improvement.

Analysis of the fourth power

$$T^{\otimes 4}_{\mathrm{CW}} = (T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}})^{\otimes 4} \quad (6^4 \text{ terms}), \qquad \underline{R}(T^{\otimes 4}_{\mathrm{CW}}) \le (q+2)^4$$

Idea: rewrite it as a (non-direct) sum of a smaller number of terms by regrouping:
$$T^{\otimes 4}_{\mathrm{CW}} = T^{800} + T^{710} + T^{620} + T^{611} + T^{530} + T^{521} + T^{440} + T^{431} + T^{422} + T^{332} + \text{permutations of these terms}$$
(the permutations being $T^{080}, T^{008}, T^{701}, T^{107}, T^{170}, T^{017}, T^{071}, \dots$).

This time there are $10 - 1 = 9$ free parameters for the probability distribution, and $\Gamma(P) \ne 0$.

The laser method: general formulation

In the Main Theorem [LG 14], the penalty term is
$$\Gamma(P) = \max[H(Q)] - H(P),$$
where the max is over all distributions $Q$ over $\mathrm{supp}(T)$ such that $P_1 = Q_1$, $P_2 = Q_2$ and $P_3 = Q_3$.

When the structure of the support is simple, we typically have $P_1 = Q_1,\ P_2 = Q_2,\ P_3 = Q_3 \implies P = Q$, and thus $\Gamma(P) = 0$.

Interpretation: the laser method converts (by zeroing variables) $T^{\otimes N}$ into a direct sum of $\exp\left( \left( \sum_{\ell=1}^{3} H(P_\ell)/3 - \Gamma(P) - o(1) \right) N \right)$ terms of "type P", i.e., each isomorphic to $\bigotimes_{(i,j,k) \in \mathrm{supp}(T)} [T_{ijk}]^{\otimes P(i,j,k)N}$. We can control only the choice of the marginal distributions $P_1$, $P_2$ and $P_3$: what we obtain is a (non-direct) sum of all "type Q" terms with these marginals. The most frequent terms are those with $Q$ maximizing $H(Q)$; the fact that the "type P" terms are not the most frequent introduces the penalty term $-\Gamma(P)$.

The laser method: computing the bound

How to find the best distribution for a given ρ? Assume that (a lower bound on) each $V_\rho(T_{ijk})$ is known. In the right-hand side of the Main Theorem, the term $\sum_{\ell} H(P_\ell)/3$ is concave in $P$ and the term $\sum P(i,j,k) \log(V_\rho(T_{ijk}))$ is linear in $P$. If $\Gamma(P) = 0$ for all distributions $P$, the best distribution can therefore be found efficiently (numerically) using convex optimization: maximization of a concave function under linear constraints.
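A minimal numerical sketch of this convex-optimization step, assuming scipy (all names are mine), applied to the basic tensor $T_{\mathrm{CW}}$, on whose support Γ vanishes identically: maximize the concave objective at fixed ρ, then binary-search ρ against $\underline{R}(T_{\mathrm{CW}}) = q + 2$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import xlogy

q = 6
supp = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 2, 0), (2, 0, 0)]
# log V_rho of each component: q^(rho/3) for the three <1,1,q>-type
# components, 1 for the three <1,1,1> components.
log_vals = lambda rho: np.array([rho / 3 * np.log(q)] * 3 + [0.0] * 3)

def objective(p, rho):   # sum_l H(P_l)/3 + sum_s P(s) log V_rho(T_s)
    total = p @ log_vals(rho)
    for axis in range(3):
        marg = np.zeros(3)
        for w, s in zip(p, supp):
            marg[s[axis]] += w
        total -= xlogy(marg, marg).sum() / 3   # adds H(P_axis)/3
    return total

def best_log_value(rho):  # max over P; Gamma = 0 on this support
    res = minimize(lambda p: -objective(p, rho), np.full(6, 1 / 6),
                   bounds=[(0, 1)] * 6, method='SLSQP',
                   constraints=[{'type': 'eq', 'fun': lambda p: p.sum() - 1}])
    return -res.fun

lo, hi = 2.0, 3.0         # binary search on rho against Rbar = q + 2
for _ in range(40):
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if best_log_value(mid) >= np.log(q + 2) else (mid, hi)
print(hi)                 # ~2.3872, matching the first-power bound for q = 6
```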

The laser method: computing the bound

In general, however, the term
$$\Gamma(P) = \max[H(Q)] - H(P)$$
(max over all distributions $Q$ over $\mathrm{supp}(T)$ such that $P_1 = Q_1$, $P_2 = Q_2$ and $P_3 = Q_3$) is neither linear nor concave in $P$, so the optimization problem is no longer convex. It is hard to solve, but can be done up to the 4th power of the CW tensor [Stothers 10].

Simplification: restrict the search to the set of distributions $P$ such that $\Gamma(P) = 0$. The problem is still hard to solve, but can be done up to the 8th power of the CW tensor [Vassilevska Williams 12].

The laser method: computing the bound

Call $f(P)$ the right-hand side of the Main Theorem:
$$f(P) = \sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} + \sum_{(i,j,k) \in \mathrm{supp}(T)} P(i,j,k) \log(V_\rho(T_{ijk})) - \Gamma(P)$$

Efficient method [LG 14] to find a solution (close to the optimal solution):

1. Find a distribution maximizing $f(P)$ with the $\Gamma(P)$ term dropped, and call it $\bar{P}$ (concave objective function, linear constraints).
2. Find the distribution $Q$ that maximizes $H(Q)$ under the constraints $Q_1 = \bar{P}_1$, $Q_2 = \bar{P}_2$ and $Q_3 = \bar{P}_3$. Call it $\bar{Q}$ (concave objective function, linear constraints).
3. Output $f(\bar{Q})$.

Since $\Gamma(\bar{Q}) = 0$, the theorem gives $\log(V_\rho(T)) \ge f(\bar{Q})$.
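A sketch of step 2, assuming scipy (the helper name is mine): computing the maximum-entropy distribution with prescribed marginals.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import xlogy

def max_entropy_same_marginals(supp, p_bar):
    """Step 2 sketch: the max-entropy distribution Q on supp whose three
    marginals match those of p_bar; by construction Gamma(Q) = 0."""
    cons = [{'type': 'eq', 'fun': lambda p: p.sum() - 1}]
    for axis in range(3):
        for value in {s[axis] for s in supp}:
            mask = np.array([1.0 if s[axis] == value else 0.0 for s in supp])
            target = float(mask @ np.asarray(p_bar))
            cons.append({'type': 'eq',
                         'fun': lambda p, m=mask, t=target: m @ p - t})
    res = minimize(lambda p: xlogy(p, p).sum(),   # minimizing -H(Q)
                   p_bar, bounds=[(0, 1)] * len(supp),
                   constraints=cons, method='SLSQP')
    return res.x
```

Both optimization steps maximize a concave function under linear constraints, which is why the whole analysis runs in time polynomial in the size of the tensor.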

Analysis of powers 16 and 32: solutions to the optimization problems were obtained numerically by convex optimization.

Conclusion

Laser-method-based analysis (v2.3): for any tight partitioned tensor for which (lower bounds on) the value of each component are known, an upper bound on ω is obtained in polynomial time, via convex optimization.

We constructed a time-efficient implementation of the laser method, and applied it to study higher powers of the basic tensor by CW (see the table in "Applications of our method" above; the analysis of the 32nd power gives ω < 2.3728639).

Recent result [Ambainis, Filmus, LG 14]: studying higher powers (using the same approach) cannot give an upper bound better than 2.3725.
