probabilistic rank and matrix rigidity - arxiv

21
arXiv:1611.05558v2 [cs.CC] 7 Jan 2017 Probabilistic Rank and Matrix Rigidity Josh Alman Ryan Williams Abstract We consider a notion of probabilistic rank and probabilistic sign-rank of a matrix, which measures the extent to which a matrix can be probabilistically represented by low-rank matrices. We demonstrate several connections with matrix rigidity, communication complexity, and circuit lower bounds. The most interesting outcomes are: The Walsh-Hadamard Transform is Not Very Rigid. We give surprising upper bounds on the rigidity of a family of matrices whose rigidity has been extensively studied, and was conjectured to be highly rigid. For the 2 n × 2 n Walsh-Hadamard transform H n (a.k.a. Sylvester matrices, a.k.a. the communication matrix of Inner Product modulo 2), we show how to modify only 2 ε n entries in each row and make the rank of H n drop below 2 n(1(ε 2 / log(1/ε ))) , for all small ε > 0, over any field. That is, it is not possible to prove arithmetic circuit lower bounds on Hadamard matrices such as H n , via L. Valiant’s matrix rigidity approach. We also show non-trivial rigidity upper bounds for H n with smaller target rank. Matrix Rigidity and Threshold Circuit Lower Bounds. We give new consequences of rigid matri- ces for Boolean circuit complexity. First, we show that explicit n × n Boolean matrices which maintain rank at least 2 (log n) 1δ after n 2 2 (log n) δ /2 modified entries (over any field, for any δ > 0) would yield an explicit function that does not have sub-quadratic-size AC 0 circuits with two layers of arbitrary linear threshold gates. Second, we prove that explicit 0/1 matrices over R which are modestly more rigid than the best known rigidity lower bounds for sign-rank would imply exponential-gate lower bounds for the infamously difficult class of depth-two linear threshold circuits with arbitrary weights on both layers (LTF LTF). In particular, we show that matrices defined by these seemingly-difficult circuit classes actually have low probabilistic rank and sign-rank, respectively. An Equivalence Between Communication, Probabilistic Rank, and Rigidity. It has been known since Razborov [1989] that explicit rigidity lower bounds would resolve longstanding lower-bound prob- lems in communication complexity, but it seemed possible that communication lower bounds could be proved without making progress on matrix rigidity. We show that for every function f which is randomly self-reducible in a natural way (the inner product mod 2 is an example), bounding the communication complexity of f (in a precise technical sense) is equivalent to bounding the rigidity of the matrix of f , via an equivalence with probabilistic rank. Computer Science Department, Stanford University, [email protected]. Supported by NSF DGE-114747. Computer Science Department, Stanford University. Supported in part by a Microsoft Research Faculty Fellowship and NSF CCF-1552651 (CAREER). Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Upload: others

Post on 19-Feb-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

arX

iv:1

611.

0555

8v2

[cs.

CC

] 7

Jan

2017

Probabilistic Rank and Matrix Rigidity

Josh Alman∗ Ryan Williams†

Abstract

We consider a notion of probabilistic rank and probabilistic sign-rank of a matrix, which measuresthe extent to which a matrix can be probabilistically represented by low-rank matrices. We demonstrateseveral connections with matrix rigidity, communication complexity, and circuit lower bounds. The mostinteresting outcomes are:

The Walsh-Hadamard Transform is Not Very Rigid. We give surprisingupper boundson therigidity of a family of matrices whose rigidity has been extensively studied, and was conjectured tobe highly rigid. For the 2n× 2n Walsh-Hadamard transformHn (a.k.a. Sylvester matrices, a.k.a. thecommunication matrix of Inner Product modulo 2), we show howto modify only 2εn entries in each rowand make the rank ofHn drop below 2n(1−Ω(ε2/ log(1/ε))), for all smallε > 0, over any field. That is, it isnot possibleto prove arithmetic circuit lower bounds on Hadamard matrices such asHn, via L. Valiant’smatrix rigidity approach. We also show non-trivial rigidity upper bounds forHn with smaller target rank.

Matrix Rigidity and Threshold Circuit Lower Bounds. We give new consequences of rigid matri-ces for Boolean circuit complexity. First, we show that explicit n×n Boolean matrices which maintainrank at least 2(logn)1−δ

after n2

2(logn)δ/2 modified entries (overany field, for anyδ > 0) would yield an

explicit function that does not have sub-quadratic-sizeAC0 circuits with two layers of arbitrary linear

threshold gates. Second, we prove that explicit 0/1 matrices overR which are modestly more rigid thanthe best known rigidity lower bounds forsign-rankwould imply exponential-gatelower bounds for theinfamously difficult class of depth-two linear threshold circuits with arbitrary weights on both layers(LTF LTF). In particular, we show that matrices defined by these seemingly-difficult circuit classesactually have low probabilistic rank and sign-rank, respectively.

An Equivalence Between Communication, Probabilistic Rank, and Rigidity. It has been knownsince Razborov [1989] that explicit rigidity lower bounds would resolve longstanding lower-bound prob-lems in communication complexity, but it seemed possible that communication lower bounds could beproved without making progress on matrix rigidity. We show that for every functionf which is randomlyself-reducible in a natural way (the inner product mod 2 is anexample), bounding the communicationcomplexity of f (in a precise technical sense) isequivalentto bounding the rigidity of the matrix off ,via an equivalence with probabilistic rank.

∗Computer Science Department, Stanford University,[email protected]. Supported by NSF DGE-114747.†Computer Science Department, Stanford University. Supported in part by a Microsoft Research Faculty Fellowship and NSF

CCF-1552651 (CAREER). Any opinions, findings and conclusions or recommendations expressed in this material are those of theauthors and do not necessarily reflect the views of the National Science Foundation.

1 Introduction

Let R be a ring. In analogy with the notion of a probabilistic polynomial, we define aprobabilistic matrixover Rto be a distribution of matricesM ⊂ Rn×n. A probabilistic matrixM computesa matrixA∈ Rn×n

with errorε > 0 if for every entry(i, j) ∈ [n]2,

PrB∼M

[A[i, j] = B[i, j]]≥ 1− ε .

In this way, a probabilistic matrix is aworst-case randomized representationof a fixed matrix. A probabilis-tic matrixM has rankr if the maximum rank of aM ∼ M is r.

We define theε-probabilistic rankof a matrixM ∈ Rn×n to be the minimum rank of a probabilistic matrixcomputingM with error ε . Such probabilistic matrices are of interest and potentially very useful, becausesome full rank matrices can be represented by probabilisticmatrices of rather low rank. For example, everyidentity matrix hasε-probabilistic rankO(1/ε) over any field, by simulating a protocol for EQUALITYusing log(1/ε)+O(1) communication that computes random inner products (cf. TheoremD.1).

We began studying probabilistic rank in the hopes of better understanding the use of probabilistic polyno-mials in algorithm design. Recent work has shown how substituting low-degree probabilistic polynomialsin place of common subroutines can be very useful for speeding up the best known running times for manycore problems [Wil14b, Wil14a, AWY15, AW15, ACW16, LPT+17]. However, almost every algorithmicapplication ends up embedding the low-degree polynomial evaluation problem in afast multiplication oftwo low-rank (rectangular) matrices. That is, this algorithmic work is really using the fact thatthat variouscircuits and subroutines from core algorithms havelow probabilistic rank, and is applying low-rank repre-sentations to obtain an algorithmic speedup. Because “low probabilistic rank” is potentially a far broadernotion than that of “low-degree probabilistic polynomials”, it makes more sense to study probabilistic rankdirectly, in the hopes of finding stronger algorithmic applications.

In this paper, we consider complexity-theoretic aspects ofprobabilistic rank. We demonstrate how proba-bilistic rank is a powerful notion for understanding the age-old problem of matrix rigidity, and some modelsof communication complexity where knowledge is still sparse.

Matrix Rigidity. A central part of our paper connects the probabilistic rank of a matrix to its rigidity. Therank-r rigidity of a matrixA ∈ Rn×n, denoted byRA(r), is the minimum Hamming distance fromA to ann×n matrix of rankr overR. That is,RA(r) is the number of entries ofA that must be modified in orderfor the rank to drop tor. (Sometimes we’ll want to work over a particular fieldK; in that case we’ll speakof “RA(r) over K.”) Matrix rigidity was introduced by Leslie Valiant [Val77] in 1977, as a path towardsarithmetic circuit lower bounds for linear transformations. Valiant showed that for a fieldF, and every lineartransformationT : Fn → Fn computable by a circuit ofO(n) addition gates of bounded fan-in (with scalarmultiplications on the wires) andO(logn) depth,RT(O(n/ log logn))≤ n1+ε , for every fixedε > 0. Thus toprove a circuit lower bound forT, it suffices to lower bound the rigidity ofT for rankO(n/ log logn). Valiantproved that random 0/1 matrices over a field are highly rigid (whp), and strong rigidity lower bounds areknown when one allows exponential (or infinite) precision inthe matrix entries [Lok06, KLPS14]. However,no explicit rigid matricesT with (say)RT(O(n/ log logn)) > n1.0001 are known1, despite decades of effort(see the surveys [Cod00, Lok09] and the recent work [GT16]). The best known lower bounds for explicitM yield only RM(r) ≥ Ω(n2

r · log(n/r)) [Fri93, SSS97], and Lokam [Lok00] argues that known methods(“untouched minor arguments”) cannot prove rigidity lowerbounds larger than this. Very recently, Goldreichand Tal showed an improved “semi-explicit” rigidity lower bound: for randomn× n Toeplitz matrices

1An infinite family of matricesMn | n∈S is said to be explicit if there is a polynomial time algorithmA such thanA(1n) printsMn whenn∈ S.

1

M, they proved thatRM(r) ≥ Ω( n3

r2 logn) whp, when rankr ≥ √n [GT16] (note that such matrices can be

generated withO(n) bits of randomness).

In 1989, Razborov [Raz89] (see also [Wun12]) described a connection between matrix rigidity and com-munication complexity: Lettingf be a function inPHcc (the communication complexity equivalent of thepolynomial-time hierarchy), the 2n×2n communication matrixM f of f hasRM f (2

logc(n/ε))≤ ε ·4n, whereε > 0 is arbitrary andc> 0 is a constant depending only onf , but notn. (Razborov’s proof uses low-degreepolynomials which approximateAC0 functions.) Thus, explicit rigidity lower bounds in the “low” rank and“high” error setting would imply long-open communication lower bounds.

Among the many attempts to prove arithmetic circuit lower bounds via rigidity, perhaps the most com-monly studied explicit matrix has been the Walsh-Hadamard transform [PS88, Alo90, Gri, Nis, KR98,Cod00, Lok01, LTV03, Mid05, dW06, Ras16]:

Definition 1.1. For vectors x,y ∈ Rd, let 〈x,y〉 denote their inner product. Let v1, . . . ,v2n ∈ 0,1n be theenumeration of all n-bit vectors in lexicographical order.TheWalsh-Hadamard matrixHn is the2n × 2n

matrix defined by Hn(vi ,v j) := (−1)〈vi ,vj 〉.

It was believed thatHn is rigid because its rows are mutually orthogonal (i.e.,Hn is Hadamard), so inseveral of the above references, only that property was assumed of the matrices. The best rigidity lowerbounds known forHn have the formRHn(r) ≥ Ω(4n/r); for the target rankr = O(2n/ logn) in Valiant’sproblem, the lower bound is onlyΩ(2n logn). It was a folklore theorem that one can modify onlyO(n)entries of ann× n Hadamard matrix and make its rank at mostn/2 [Lok14], but it was believed that forlower rank many more entries would require modification.

Hadamard Ain’t So Rigid. We give a good excuse for the weakness of these lower bounds:

Theorem 1.1(Non-Rigidity of Hadamard Matrices). For every field K, for every sufficiently smallε > 0,and for all n, we haveRHn

(

2n− f (ε)n)≤ 2n(1+ε) over K, for a function f where f(ε) = Θ(ε2/ log(1/ε)).

In fact, we show a strong non-rigidity upper bound: by modifying at most 2εn entries in each row ofHn,the rank ofHn drops to 2n− f (ε)n. That is, the matrix rigidity approach to arithmetic circuit lower boundsdoes notapply to Hadamard matrices such as the Walsh-Hadamard transform. We would have requiredlower bounds of the formRHn(2

n/(logn)) ≥ 2n(1+ε) for someε > 0 to obtain circuit lower bounds; theupper bound of Theorem1.1shows this is impossible. The proof is in Section3.

We do not (yet) believe that the Walsh-Hadamard transform has O(2n)-sizeO(n)-depth circuits; a moreappropriate conclusion is that rigidity is too coarse to adequately capture the lower bound problem in thiscase. Having said that, Theorem1.1 does imply new circuit constructions: it follows that thereis a depth-two unbounded fan-in arithmetic circuit for the Walsh-Hadamard transform with 2n+O(ε log(1/ε))n+22n−Ω(ε2n)

gates; settingε > 0 appropriately, we have a 4δn-size circuit for someδ < 1.

We also show non-trivial rigidity upper bounds forHn in the regime that would be useful for communica-tion complexity, where the rigidity is much closer to 4n.

Theorem 1.2(Non-Rigidity of Hadamard Matrices, Part II). For every integer r∈ [22n], one can modify at

most22n/r entries of Hn and obtain a matrix of rank(n/ ln(r))O(√

nlog(r)).

See AppendixA for the proof. While the product of rank and rigidity (a natural measure) ofHn is only

known to be at leastΩ(4n), Theorem1.2provides an upper bound of 4n ·nO(√

nlog(r))

/r, which is not smallenough to refute the conjectured rigidity lower bounds required for communication complexity applica-tions. But as we show later, these upper bounds still have non-trivial consequences for the communicationcomplexity of IP2.

2

New Applications of Explicit Rigid Matrices. Rigidity has been studied primarily for its connections tocommunication complexity and to lower bounds on arithmeticcircuits computing linear transformations.We show new implications of constructing explicit rigid matrices for Boolean circuit complexity.

First, we show how explicit rigidity lower bounds would yield Boolean circuit lower bounds where onlysomewhat weak results are known:

Theorem 1.3. Let K be an arbitrary field, andMn be a family of Boolean matrices such that (a) Mn isn×n, (b) there is a poly(logn) time algorithm A such that A(n, i, j) prints Mn(i, j), and (c) there is aδ > 0such that for infinitely many n,

RMn

(

2(logn)1−δ)

≥ n2

2(logn)δ/2 over K.

Then the language(n, i, j) | Mn(i, j) = 1 ∈ P does not haveAC0 LTF AC0 LTF circuits of n2−ε -sizeand o(logn/ log logn)-depth, for allε > 0.

The theorem is obtained by giving non-trivial probabilistic rank bounds for such circuits, building onLokam [Lok01]. Therefore, proving rigidity (or probabilistic rank) lower bounds for explicit 0/1 matricesover a fieldK would imply nearly-quadratic size lower bounds forAC

0 LTF AC0 LTF circuits of un-bounded depth, a powerful class of Boolean circuits. (The best known lower bounds are that functions inthe huge classENP do not have such circuits [ACW16].) See AppendixC for these results.

Sign-Rank Rigidity. The sign rank of a−1/1 matrix M is the lowest rank of a matrixN such thatsign(M[i, j]) = sign(N[i, j]), for all (i, j). Lower bounds on the sign-rank of matrices were used 15 yearsago to prove exponential lower bounds againstLTF MAJ andLTF SYM circuits [For02, FKL+01], i.e.restricted versions of depth-two threshold circuits. We extend the sign-rank connection to a circuit class forwhich strong lower bounds have long been open: explicit matrices with high rigidity under sign-rank wouldimply strong depth-two threshold circuit lower bounds. (Here, sign-rank rigidity is defined in the naturalway, with “rank” replaced with “sign-rank” in the rigidity definition.)

A corollary of a theorem of Razborov and Sherstov [RS10] (see AppendixD) is that for alln, Hn hassign-rankr-rigidity at leastΩ(4n/r), just as in the case of normal rank rigidity. We show that evenasomewhat minor improvement would already imply exponential-size lower bounds for depth-two linearthreshold circuits with unbounded weights on both layers, aproblem open for decades [HMP+93, KW16]:

Theorem 1.4. Suppose the sign rank r-rigidity of Hn is Ω(4n/r .999) for some rank bound r≥ 2αn and someα > 0. Then the Inner Product Modulo2 requires2Ω(n)-sizeLTFLTF circuits.

TheoremD.3 gives a more general statement. Under the hood is an upper bound: matrices defined bysmallLTFLTF circuits have lowprobabilistic sign-rank: for every such circuit ofsgates, viewing its truthtable as a 2n/2 × 2n/2 matrix, there is a distribution ofO(s2n2/ε)-rank matrices which sign-represent thetruth table in a worst-case probabilistic way with errorε .

Rigidity, Communication, and Probabilistic Rank: An Equiv alence. Probabilistic rank arises verynaturally in studying generalized models of communicationcomplexity. For a Boolean functionf : 0,1n×0,1n → 0,1, let M f be the 2n × 2n truth table matrix of fwith M f [x,y] = f (x,y) for all x,y. Thefollowing correspondence between probabilistic rank and communication complexity is immediate (onecould even take the proposition as adefinitionof BP ·MODmP communication complexity).

Proposition 1. Let m> 1 be an integer, let f: 0,1n ×0,1n → 0,1, and let Mf be its truth tablematrix. TheBP ·MODmP communication complexity of f with errorε equals the (base-2) logarithm of theε-probabilistic rank of Mf overZm (within additive constants).

3

Similarly,AM (Arthur-Merlin communication complexity) is equivalent to probabilistic Boolean rank.

It’s easy to see that if a matrix hasε-probabilistic rankr, then its rank-r rigidity is at mostε22n; thus rigid-ity lower bounds imply communication lower bounds. But conversely, it seems easier to prove lower boundson probabilistic rank compared to rigidity: with probabilistic rank, we need to rule out a “distribution” oferroneous matrix entries which are required to “spread the errors” around; with rigidity, we have to rule outanyadversarial choice of bad entries.

We show (in AppendixG) that for every randomly self-reducible functionf : 0,12n → R in which theself-reduction makesk non-adaptive queries, low rigidity implies low probabilistic rank: theε-probabilisticrank of its corresponding matrix is at most(kr)k if its rank-r rigidity is at mostε · 4n. Thus there is astrong relationship betweenε-probabilistic rank (and communication complexity, by Proposition1) and therank for which the rigidity is anε-fraction of the matrix. For the Walsh-Hadamard transform,we prove (inSection4) that the probabilistic rank ofHn and the rigidity ofHn are equivalent concepts over fields:

Theorem 1.5. For every field K and for every n,RHn(r)≤ ε ·4n over K if and only if Hn hasε-probabilisticrank r over K.

The matricesHn represent the communication matrices of the widely-studied Inner Product Modulo 2(IP2) function. By Proposition1, theBP ·MODpP communication complexity of IP2 and the rigidity ofHn

overFp are really equivalent concepts. Applying this theorem, ourearlier rigidity upper bounds also implysome modest but interesting improvements on communicationcomplexity protocols. From the rigidity upperbound of Theorem1.2, we obtain a communication protocol for IP2 withO(

nlog(1/ε) log( nlog(1/ε))) bits

and errorε in theBP ·MODpP communication model, for every primep. (Aaronson and Wigderson gaveanMA protocol for IP withO(

√nlog(n/ε)) communication complexity and errorε [AW09]; ours is more

efficient for ε ≪ 1/2√

logn.) Applying Theorem1.1 yields an IP2 protocol withn(1−Ω(ε2/ log(1/ε)))communication and only 1/2n−εn error. We are skeptical that our rigidity upper bounds forHn are tight;we hope these results will aid future work (to prove rigidityupper bounds, one only has to think aboutcommunication protocols for IP2).

1.1 Related Work

Besides the many references already mentioned earlier, there are a few other related works we know of.

Toggle Rank.By Yao’s minimax principle [Yao83], BP ·MODmP communication complexity (randomizedcommunication with “counting modulom” power) equals worst-case distributionalMODmP communicationcomplexity. In matrix terms, putting an arbitrary distribution P on the pairs0,1n×0,1n, the worst-caseε-distributional complexity ofM is the lowest rank (overZm) of a 2n×2n matrixN with error||M−N|| ≤ εoverP. Wunderlich [Wun12] calls this rank notion theapproximate toggle rank. Proposition1 shows thatprobabilistic rank and approximate toggle rank are very closely related, but they arenot the same as theusual rigidity concept, which corresponds to the uniform distribution on pairs. For structured functions likeIP2, we prove (Theorem1.5) that the uniform distributionis the worst case.

Sign-rank Rigidity and AC0-MOD2 circuits. A tantalizing open problem that has gained popularity inrecent years [SV12, ABG+14, CGJ+16] is whether IP2 has polynomial-sizeAC0 MOD2 circuits: i.e.,circuits of O(1)-depth over AND/OR/NOT, but with a layer of gates computing PARITY at the bottomnearest the inputs. Servedio and Viola [SV12] propose an interesting attack: in our terminology, they notethatAC0MOD2 circuits of sizeshavenO(logd−1 s) log(1/ε) sign-rank rigidity at mostε22n overR, and prove alower bound on the correlation of signs of sparse polynomials (taken as a proxy for low-rank sign-matrices)with IP2. That is, they prove a weak sign-rank rigidity lowerbound (note Razborov and Sherstov prove ananalogous lower bound for sign-rank rigidity of IP2; see AppendixD). Our results have two consequencesfor this sort of approach. First, TheoremD.1 shows that a sign-rank rigidity lower bound would prove

4

something much stronger: a lower bound fordepth-two threshold circuitscomputing IP2, a longstandingopen problem. Second, our non-trivial upper bounds on the rank rigidity of the IP2 matrix (which isHn)suggest that IP2 may have much lower sign-rank rigidity thanexpected.

Sign-Rank Rigidity and Margin Complexity. Linial and Shraibman [LS09] prove (in our terminology)that the sign-rank rigidity of ann× n matrix A is at mostεn2 for target rankO(mc(A)2 log(1/ε)), wheremc(A) is the “margin complexity” ofA. Thus the margin complexity of a matrix can be used to upperbound sign-rank rigidity. They also study rigidity notion based onmc, conjecture that highmc implies highmargin-complexity rigidity, and show that high margin-complexity also implies communication complexitylower bounds (for similar parameters as the standard rank-rigidity setting).

Approximate Rank. A different “approximating” rank notion has been studied in[BdW01, KS10, ALSV13],with connections to quantum computing and approximation algorithms. Theε-approximate rank ofM ∈Rn×n is the lowest rank of a matrixA such that||M−A||∞ ≤ ε . That is, we can obtain one matrix from theother by perturbing each entry by at mostε > 0. The appropriate analogy here seems to be that probabilisticpolynomials are to probabilistic rank, asℓ∞-approximate polynomials are to approximate rank: both arenatural generalizations of polynomial representations tomatrix representations, with different properties.

2 Preliminaries

We assume basic familiarity with complexity theory. For circuit complexity, we useC D to denote depth-two circuits where the output gate is of typeC and the “hidden” layer is of typeD , e.g.,LTFLTF denotes“depth-two linear threshold circuits”,LTF MOD2 denotes “linear threshold function of parities”, etc. Forvariablesx1, . . . ,xn, we use~x to denote(x1, . . . ,xn), and for~x ∈ 0,1n, we use|~x| = ∑i xi to denote itsHamming weight. We use the Iverson bracket[P] : 0,1n → 0,1 to denote the Boolean function whichoutputs 1 if and only if propertyP is true of then inputs. Below we describe some basic properties relatingprobabilistic polynomials, probabilistic rank, and rigidity.

Definition 2.1. Let R be any ring, and f: 0,12n → R be any function on2n Boolean variables. Thetruthtable matrixM f of f is the2n×2n matrix given by

M f (vi ,v j) = f (vi ,v j),

where v1, . . . ,v2n ∈ 0,1n is the enumeration of all n-bit vectors in lexicographical order.

Given the above definition, it is natural to define the probabilistic rank of a function:

Definition 2.2. Theε-probabilistic rank of a function f: 0,12n → R is theε-probabilistic rank of its truthtable matrix Mf . The rank of f and the rigidity of f are defined similarly.

Definition 2.3 (Razborov [Raz87], Smolensky [Smo87]). Let R be a ring, and let f: 0,1n → R. Aprobabilistic polynomialfor f with error ε and degree d is a distributionP on polynomials p: 0,1n → Rof degree at most d such that foreveryx∈ 0,1n, Prp∼P [p(x) = f (x)] ≥ 1− ε . We may similarly refer toa probabilistic polynomial with m monomials.

The following simple mapping from sparse polynomials to low-rank matrices is very useful:

Lemma 2.1. Let R be any ring, and f: 0,12n → R. Let p: R2n → R be a polynomial with m monomialssuch that p(x,y) = f (x,y) for any x,y∈ 0,1n. Then the rank of f is at most m.

Proof. Let a1, . . . ,am,b1, . . . ,bm : Rn →Rbe monomials such thatp(x,y) =∑mi=1ai(x) ·bi(y) is the monomial

expansion ofp. For 1≤ i ≤ m, define vectors~αi ,~βi ∈ R2nby ~αi [x] = ai(x) and~βi [y] = bi(y) for eachx,y∈

0,1n. ThenM f = ∑mi=1~αi ⊗~βi, where⊗ denotes the outer product of vectors. Thus rank(M f )≤ m.

5

As a corollary, the probabilistic rank off is at most the sparsity of a probabilistic polynomial forf :

Corollary 2.1. Let R be any ring, and f: 0,12n → R. If f has a probabilistic polynomialP with at mostm monomials and errorε , then theε-probabilistic rank of f is at most m.

Proof. Let p be a polynomial in the support of the distributionP. Since p has at mostm monomials,by Lemma2.1 the truth table matrixMp of p (restricted to the domain0,12n) has rank at mostm. Thedistribution ofMp over p drawn fromP is therefore anε-probabilistic rank-m distribution forM f , sinceM f (x,y) = Mp(x,y) if and only if f (x,y) = p(x,y).

By drawing a ‘typical’ matrix from the probabilistic rank distribution, we can always obtain a matrixrigidity upper bound from a sparse probabilistic polynomial.

Corollary 2.2. Let R be any ring, and f: 0,12n → R be any function on2n Boolean variables. If f has aprobabilistic polynomial P with at most m monomials and error ε , then one can modifyε22n entries of thetruth table matrix Mf and obtain a matrix of rank at most m.

3 Non-Rigidity of Walsh-Hadamard

Now we prove that the Walsh-Hadamard matrices are not rigid enough for Valiant’s program:

Reminder of Theorem 1.1 For every field K, for every sufficiently smallε > 0, and for all n, we haveRHn

(

2n− f (ε)n)≤ 2n(1+ε) over K, for a function f where f(ε) = Θ(ε2/ log(1/ε)).For a vectorv∈ 0,1n, let |v| be the number of ones inv. Let H : [0,1]→ [0,1] denote the binary entropy

functionH(p) =−plog2 p− (1− p) log2(1− p).

We will need some estimates of binomial coefficients. Forε ∈ (0,1/2):(

nεn

)

≤ n·2H(ε)n, and (1)

2n−O(ε2n) ≤(

n(1/2− ε)n

)

≤ 2n−Ω(ε2n). (2)

Equation (1) is standard; equation (2) follows from standard tail bounds on the binomial distribution. Inparticular, the probability that a uniform random bit string has at most(1/2− ε)n ones is at most 2−c1ε2n

and at least 2−c2ε2n, for universal constantsc1,c2 > 0.

Our first (simple) lemma uses a polynomial to compute a large fraction ofHn’s entries with a low-rankmatrix. However, this fraction won’t be high enough; we’ll need another idea to “correct” many entries later.

Lemma 3.1. For every field K, and for everyε ∈ (0,1/2), there is a multilinear polynomial p(x1, . . . ,xn,y1, . . . ,yn)

over K with at most2n−Ω(ε2n) monomials, such that for all~x,~y∈ 0,1n with 〈~x,~y〉 ∈ [2εn,(1/2+ ε)n],

p(~x,~y) = (−1)〈~x,~y〉.

The proof uses properties of multivariate polynomial interpolation over the integers. To be concrete, wewill apply the following lemma from one of our previous papers:

Lemma 3.2([AW15], Lemma 3.1). For any integers n, r,k with n≥ r +k and any integers c1, . . . ,cr , thereis a multivariate polynomial p: 0,1n → Z of degree r−1 with integer coefficients such that p(z) = ci forall~z∈ 0,1n with Hamming weight|~z|= k+ i.

6

Intuitively, Lemma3.2 is true because the dimension of the space of degree-(r − 1) polynomials innvariables is large enough that we can always construct a polynomial with the desired constraints.

Proof of Lemma3.1. By Lemma3.2 with k = 2εn− 1, r = (1/2− ε)n+ 1, andci = (−1)k+i , one canconstruct a multivariate polynomialq : 0,1n →Z with integer coefficients, of degree(1/2−ε)n, such thatfor all~z∈ 0,1n with |~z| ∈ [2εn,(1/2+ ε)n], we haveq(~z) = (−1)|~z|. Since the prime subfield of everyfield K is eitherQ or Fm for some primem, andq has integer coefficients,q can be viewed as a polynomialoverK (by taking the coefficients modulom if appropriate). Then our desired polynomial is

p(x1, . . . ,xn,y1, . . . ,yn) = q(x1y1,x2y2, . . . ,xnyn) .

We can upper-bound the number of monomials inp as follows. First, since we only care about the value ofp on 0,12n, we can makep multilinear by applying the equationv2 = v to all variables. Second, observethat for all i = 1, . . . ,n, xi andyi appear in exactly the same monomials. So if we introduce a variablezi inplace of eachxi ·yi in p, the number of monomials in our newn-variate polynomialp′ equals the number ofmonomials inp.

Since p′ is multilinear and degree(1/2− ε)n+ 1, the number of monomials is at mostn( n(1/2−ε)n+1

)

,

which by (2) is at most 2n−c2ε2n for some constantc2 > 0.

Our second lemma says: fixing a vectorx with about 1/2 ones, there is a strong upper bound the numberof vectors which has about 1/2 ones but has small (integer) inner product withx; we’ll use this to upperbound the number of erroneous entries at the very end.

Lemma 3.3. For every vector x∈ 0,1n with |x| ∈ [(1/2− a)n,(1/2+ a)n], and any parameters a,b ∈(0,1/5), the probability that a uniformly random vector y from0,1n satisfies both

• |y| ∈ [(1/2−a)n,(1/2+a)n], and

• ∑nk=1 xkyk ≤ bn

is at most(2an+1)(bn+1) ·2( f (a,b)−1)n, where f is a function such that f(a,b)→ 0 as a,b→ 0.

The usual toolbox of small-deviation estimates does not seem to yield the lemma; we give a direct proof.

Proof. For allx of the above form, everyk∈ [(1/2−a)n,(1/2+a)n], and everys≤ bn, we count the numberof y∈ 0,1n with |y|= k and∑n

k=1xkyk = s. A vectory satisfies these properties if and only if:

• there are exactlys integersi with y[i] = 1 andx[i] = 1, and

• there are exactlyk−s integersi with y[i] = 1 andx[i] = 0.

So there are(|x|

s

)(n−|x|k−s

)

such choices ofy. The total probability is hence

12n

(1/2+a)n

∑k=(1/2−a)n

bn

∑s=0

(|x|s

)(

n−|x|k−s

)

=12n

(1/2+a)n

∑k=(1/2−a)n

bn

∑s=0

(|x|s

)(

n−|x|k−s

)

≤ 12n

(1/2+a)n

∑k=(1/2−a)n

bn

∑s=0

(

(1/2+a)ns

)(

(1/2+a)nk−s

)

≤ 12n

(1/2+a)n

∑k=(1/2−a)n

(bn+1) ·(

(1/2+a)nbn

)(

(1/2+a)n(1/2−a−b)n

)

. (*)

7

Recall that ifk1 < k2 < n/3 then( n

k1

)

<( n

k2

)

, and if k3 > k4 > n/2, then( n

k3

)

<( n

k4

)

. Step (*) therefore

follows sinces≤ bn< 12(1/2+a)n andk−s≥ (1/2−a−b)n> 1

2(1/2+a)n whenever 0< a,b< 1/5. Letg(n) = (2an+1) · (bn+1). Simplifying further, the above expression is at most

g(n)2n

(

(1/2+a)nbn

)(

(1/2+a)n(2a+b)n

)

≤ g(n)2n ·2(1/2+a)n·H(b/(1/2+a))2(1/2+a)n·H((2a+b)/(1/2+a)) (by (1))

≤ g(n)2n ·2(1/2+a)n·H(2b)2(1/2+a)n·H(4a+2b)

≤ g(n)4n ·2(1/2+a)n·2·2b·log(1/2b)2(1/2+a)n·2·(4a+2b)·log(1/(4a+2b)) (H(ε)≤ 2ε log2(1/ε) for ε < 1/2)

≤ g(n)2n ·2f (a,b)n,

where f (a,b) = (1/2+a)(4blog(1/2b)+ (8a+4b) log(1/(4a+2b))).

Our third lemma is a simple linear-algebraic observation: given a low-rank matrixM that computes an-other matrixN on all but a small number of rows and columns,N must also have relatively low rank.

Lemma 3.4. Let M′ be a matrix of rank r which is equal to M except in at most k columns andℓ rows. Thenthe rank of M is at most r+k+ ℓ.

Proof. We will start withM′, and add at mostk+ ℓ rank-one matrices toM′ so that it equalsM.

Consider a columnc on whichM does not equalM′. We can add toM′ a correction matrixCc given by

Cc(i, j) =

M(i,v)−M′(i,v) if j = v,

0 otherwise.

Then,M′+Cc equalsM on columnc, and is unchanged in any other column. Moreover, sinceCc is onlynonzero on a single column, it has rank one. So all we have to dois add the correction matrixCc for eachcolumnc on whichM andM′ differ. The rows ofM′ can be corrected analogously.

Corollary 3.1. Let T be any2n×2n matrix. Let a∈ (0,1/2), and let M be a2n×2n matrix of rank r, indexedby n-bit vectors. There is a2n×2n matrix M′ of rank at most r+4·n·2n−Ω(a2n) such that M′(vi ,v j)=T(vi ,v j)on all vi ,v j ∈ 0,1n where at least one of the following holds:

• |vi | /∈ [(1/2−a)n,(1/2+a)n],

• |v j | /∈ [(1/2−a)n,(1/2+a)n], or,

• M(vi ,v j) = T(vi ,v j).

Proof. The number ofvi ∈ 0,1n with |vi | /∈ [(1/2−a)n,(1/2+a)n] is at most

(1/2−a)n

∑i=0

(

ni

)

+n

∑i=(1/2+a)n

(

ni

)

= 2(1/2−a)n

∑i=0

(

ni

)

≤ n·2n−Ω(a2n),

by (2). Applying Lemma3.4 to M andM′ with k andℓ set to 2·n·2n−Ω(a2n), the result follows.

8

Let us outline how we’ll use all of the above. First, we construct a matrixM of rank about 2n−Ω(ε2n)

approximatingHn, using the polynomial from Lemma3.1 in a straightforward way. This matrixM hasfar more erroneous entries than what we desire. But by Lemma3.3, we can infer that the errors inM arehighly concentrated on a relatively small number of rows andcolumns. Applying Corollary3.1, the rowsand columns can be “corrected” in a way that increases the rank of M by only 2n−Ω(ε2n). By Lemma3.3,each row of the matrix left over will have 2O(ε log(1/ε)n) erroneous entries.

Proof of Theorem1.1. In fact, we prove that one only has to modify 2O(ε log(1/ε)n) entries in each row ofHn,to obtain the desired rank.

Let ε > 0 be given. By Lemma3.1, there is a polynomialp(x,y) in 2n variables withm= 2n−Ω(ε2n)

monomials which computes(−1)〈x,y〉 correctly, on all(x,y) ∈ 0,12n such that〈x,y〉 ∈ [2εn,(1/2+ ε)n].Construct a 2n ×2n matrix M of rank m as in Corollary2.1, so thatM(x,y) = p(x,y). By definition,M

equalsHn on all (x,y) ∈ 0,12n satisfying〈x,y〉 ∈ [2εn,(1/2+ ε)n].Applying Corollary3.1 to M with T = Hn anda= ε , we obtain a matrixM′ of rankm+4 ·n ·2n−Ω(ε2n)

which is correct on all(x,y) where either|x| /∈ [(1/2− ε)n,(1/2+ ε)n], |t| /∈ [(1/2− ε)n,(1/2+ ε)n], or〈x,y〉 ∈ [2εn,(1/2+ ε)n].

Fix a row ofHn indexed byx∈ 0,1n with |x| ∈ [(1/2−ε)n,(1/2+ε)n] (note the other rows are alreadycorrect). To show thatM′ differs fromHn on a small number of entries, we need to bound the number ofysuch that none of the above conditions hold, i.e.,

1. |y| ∈ [(1/2− ε)n,(1/2+ ε)n] and

2. 〈x,y〉 /∈ [2εn,(1/2+ ε)n].

Note for our givenx, it is never true that〈x,y〉 > (1/2+ ε)n. Therefore we only need to bound the numberN of y such that|y| ∈ [(1/2−a)n,(1/2+a)n] and yet〈x,y〉 < 2εn. By Lemma3.3 with a= ε andb= ε ,the probability that a randomy satisfies〈x,y〉 < 2εn and |y| ∈ [(1/2− ε)n,(1/2+ ε)n], is at mostO(n2) ·2( f (ε ,ε)−1)n, where f → 0 asε → 0. ThereforeN ≤ 2n ·O(n2) ·2( f (ε ,ε)−1)n ≤ O(n2) ·2f (ε ,ε)n.

Now for sufficiently largen andε ∈ (0,1/2), M′ has rank at mostm+ 4 ·n ·2n−Ω(ε2n) ≤ 5n ·2n−Ω(ε2n).Furthermore, on every row,M′ differs fromHn in at mostn2 ·2f (ε ,ε) ≤ 2O(ε log(1/ε)n) entries.

Other rigidity upper bounds forHn are described in AppendixA.

4 Probabilistic Rank and Rigidity: An Equivalence

In this section, we show that the probabilistic rank ofHn and the rigidity ofHn are thesameconcept overfields. It is easy to see that ifε-probabilistic rank ofHn is k over a fieldK, then the rank-k rigidity of Hn isat mostε22n overK. Exploiting the random self-reducibility of theHn function, we can show a converse:lower bounds on probabilistic rank imply proportionate rigidity lower bounds. This is of interest becauseprobabilistic rank lower bounds appear to be fundamentallyeasier to prove than rigidity lower bounds.

Reminder of Theorem1.5 For every field K and for every n,RHn(r) ≤ ε22n over K if and only if Hn hasε-probabilistic rank r over K.

First let us give some definitions. Let⊗ denote the outer product of vectors. For vectorsa∈ K2nwhose

entries are indexed byv1, . . . ,v2n ∈ 0,1n, andx,y∈ 0,1n, let a(x,y) denote the vector inK2ngiven by

a(x,y)[vi ] = (−1)〈vi ,y〉a[vi ⊕x].

This permutes the entries ofa, then negates half of the entries.

9

Proof. One direction is trivial: low probabilistic rank implies low rigidity, by simply drawing a “typical”matrix from the distribution. For the other direction, supposea1, . . . ,ar andb1, . . . ,br are vectors inK2n

suchthat the 2n×2n matrix

M :=r

∑k=1

ak⊗bk (3)

differs from Hn in at mostε22n entries. Pick vectorsx,y ∈ 0,1n uniformly at random, and consider the2n×2n matrix

M′ = (−1)〈x,y〉r

∑k=1

a(x,y)k ⊗b(y,x)k . (4)

In this form it is clear thatM′ has rank at mostr. We claim that each entry ofM′ is equal to the correspondingentry ofHn with probability at least 1− ε , over the choice ofx andy, which will complete the proof.

Consider a given entryM′(vi ,v j). It is sufficient to show that ifM(vi ⊕x,v j ⊕y) = Hn(vi ⊕x,v j ⊕y) thenM′(vi ,v j) = Hn(vi ,v j), since(vi ⊕x,v j ⊕y) is a uniformly random pair of vectors in0,1n. Suppose this isthe case, meaningM(vi ⊕x,v j ⊕y) = (−1)〈vi⊕x,vj⊕y〉. Applying definition (3) and then (4) we see that

(−1)〈vi⊕x,vj⊕y〉 =r

∑k=1

ak[vi ⊕x] ·bk[v j ⊕y]

= (−1)〈vi ,y〉+〈vj ,x〉r

∑k=1

(−1)〈vi ,y〉ak[vi ⊕x] · (−1)〈vj ,x〉bk[v j ⊕y]

= (−1)〈vi ,y〉+〈vj ,x〉r

∑k=1

a(x,y)k [vi ] ·b(y,x)k [v j ]

= (−1)〈vi ,y〉+〈vj ,x〉 · (−1)〈x,y〉 ·M′(vi ,v j).

Rearranging, we see as desired that

M′(vi ,v j) = (−1)〈vi⊕x,vj⊕y〉+〈vi ,y〉+〈vj ,x〉+〈x,y〉 = (−1)〈vi ,vj 〉,

where the last step follows from the bilinearity of the innerproduct〈·, ·〉.

Therefore, proving communication lower bounds for the IP2 function against (for example) the classBP ·MODmP is equivalentto proving rigidity lower bounds forHn over the ringZm. Applying our rigidityupper bounds forHn (Theorems1.1 and1.2), we obtain surprisingly low probabilistic rank bounds forHn

(and therefore communication-efficient protocols as well):

Corollary 4.1. For every field K, for every sufficiently smallε > 0, and for all n, Hn has 1/2n(1−ε)-probabilistic rank at most2n−Ω(ε2/ log(1/ε))n over K, andε-probabilistic rank at most(1/ε)O(

√nlogn).

Our reduction from rigidity to probabilistic rank in fact works for any (non-adaptive) random self-reduciblefunction [FF93] that makes a small number of oracle calls. See AppendixG.

5 Discussion

Our most significant finding is that Hadamard matrices are notas rigid as previously believed: for everyε > 0, there are infinitely manyN andN×N Hadamard matrices whose rank drops belowN1−Ω(ε2/ log(1/ε))

10

after modifying onlyNε entries in each row. This rules out a proof of arithmetic circuit lower bounds forthe DFT overZn

2 via matrix rigidity. Our proof shows precisely how low rank-rigidity can be more powerfulthan low-sparsity polynomial approximations: we start with a sparse polynomial that has errors concentratedon negligibly many rows and columns, and use a simple lemma tocorrect most erroneous rows and columns.

Are there other conjectured-to-be-rigid matrices which are not? One candidate would be the generatingmatrix of a good linear code overF2. Very recently, Goldreich [Dvi16] has reported a distribution of matricesin which most of them are the generating matrix of a good linear code that is not rigid, found by Dvir. Itwould be very interesting to find an explicit code with this property. Another next natural target would beVandermonde matrices. Given a fieldF of ordern, and lettingg be a generator of the multiplicative groupF×, then×n matrixV[i, j] := g(i−1)·( j−1) also has structure that may be similarly exploitable.

Our proof that functions with smallLTF LTF circuits have low probabilistic sign-rank (TheoremD.1)effectively shows how to randomly reduce an “inner product defined by aLTFLTF” to an “inner productdefined by aLTF XOR.” It seems likely that this result could have further applications (beyond what weshowed). The theorem suggests the research question: is it possible to write aLTFLTF circuit as an small“approximate-MAJORITY” ofLTFXOR circuits, i.e. aprobabilisticPTF, in the sense of [ACW16]? Thiswould be an intriguing simulation of depth-two threshold circuits.

Another significant theme in this paper is the close relationship between probabilistic rank and thresholdcircuits, as well as rigidity. It seems likely that more algorithmic applications will be found by further studyof probabilistic rank of matrices; perhaps some lower bounds can also be proved via these connections.

Acknowledgements. We thank Mrinal Kumar for helpful comments, Oded Goldreich,Michael Forbesand Satya Lokam for helpful discussions, and Amir Shpilka and Avishay Tal for patiently listening to R.W.’sconjectures and results on non-rigidity at Banff (BIRS) in August 2016.

References

[ABG+14] Adi Akavia, Andrej Bogdanov, Siyao Guo, Akshay Kamath, and Alon Rosen. Candidate weakpseudorandom functions in AC0 o MOD2. In ITCS, pages 251–260, 2014.

[ACW16] Josh Alman, Timothy Chan, and Ryan Williams. Polynomial representations of threshold func-tions and algorithmic applications. InFOCS, pages 467–476, 2016.

[Alo90] Noga Alon. On the rigidity of an Hadamard matrix. Manuscript. See [Juk01, Section 15.1.2],1990.

[ALSV13] Noga Alon, Troy Lee, Adi Shraibman, and Santosh Vempala. The approximate rank of a matrixand its algorithmic applications. InSTOC, pages 675–684. ACM, 2013.

[AMY16] Noga Alon, Shay Moran, and Amir Yehudayoff. Sign rank versus VC dimension. InCOLT,pages 47–80, 2016.

[AW09] Scott Aaronson and Avi Wigderson. Algebrization: A new barrier in complexity theory.ACMTransactions on Computation Theory (TOCT), 1(1):2, 2009.

[AW15] Josh Alman and Ryan Williams. Probabilistic polynomials and hamming nearest neighbors. InFOCS, pages 136–150. IEEE, 2015.

[AWY15] Amir Abboud, Richard Ryan Williams, and Huacheng Yu. More applications of the polynomialmethod to algorithm design. InSODA, pages 218–230, 2015.

11

[BdW01] Harry Buhrman and Ronald de Wolf. Communication complexity lower bounds by polynomials.In CCC, pages 120–130. IEEE, 2001.

[CGJ+16] Mahdi Cheraghchi, Elena Grigorescu, Brendan Juba, KarlWimmer, and Ning Xie. ACˆ0 oMOD2 lower bounds for the boolean inner product. InICALP, pages 35:1–35:14, 2016.

[Cod00] Bruno Codenotti. Matrix rigidity.Linear Algebra and its Applications, 304(1–3):181 – 192,2000.

[Dvi16] Zeev Dvir. On the non-rigidity of generating matrices of good codes. Writeup by Oded Goldre-ich. Available athttp://www.wisdom.weizmann.ac.il/~oded/MC/209.html, October 302016.

[dW06] Ronald de Wolf. Lower bounds on matrix rigidity via a quantum argument. InProc. ICALP,volume 4051, pages 62–71, 2006.

[FF93] Joan Feigenbaum and Lance Fortnow. Random-self-reducibility of complete sets.SIAM J.Comput., 22(5):994–1005, 1993.

[FKL+01] Jurgen Forster, Matthias Krause, Satyanarayana V. Lokam, Rustam Mubarakzjanov, NielsSchmitt, and Hans Ulrich Simon. Relations between communication complexity, linear ar-rangements, and computational complexity. InFSTTCS, pages 171–182, 2001.

[For02] Jurgen Forster. A linear lower bound on the unbounded error probabilistic communicationcomplexity.J. Comput. System. Sci., 65:612–625, 2002.

[Fri93] Joel Friedman. A note on matrix rigidity.Combinatorica, 13(2):235–239, 1993.

[GPW16] Mika Goos, Toniann Pitassi, and Thomas Watson. Zero-information protocols and unambiguityin arthur-merlin communication.Algorithmica, 76(3):684–719, 2016.

[Gri] D. Yu. Grigor’ev. Unpublished work. Cited in [KR98].

[GT16] Oded Goldreich and Avishay Tal. Matrix rigidity of random toeplitz matrices. InSTOC, pages91–104, 2016.

[HMP+93] Andras Hajnal, Wolfgang Maass, Pavel Pudlak, Mario Szegedy, and Gyorgy Turan. Thresholdcircuits of bounded depth.J. Comput. Syst. Sci., 46(2):129–154, 1993.

[Juk01] Stasys Jukna.Extremal Combinatorics, With Applications in Computer Science. EATCS Series.Springer, 2001.

[KLPS14] Abhinav Kumar, Satyanarayana V. Lokam, Vijay M. Patankar, and M. N. Jayalal Sarma. Usingelimination theory to construct rigid matrices.computational complexity, 23(4):531–563, 2014.

[KR98] B. S. Kashin and A. A. Razborov. Improved lower boundson the rigidity of Hadamard matrices.Matematicheskie Zametki, 63(4):535–540, 1998. (in Russian).

[KS10] Adam R Klivans and Alexander A Sherstov. Lower boundsfor agnostic learning via approxi-mate rank.Computational Complexity, 19(4):581–604, 2010.

[KW16] Daniel M. Kane and Ryan Williams. Super-linear gate and super-quadratic wire lower boundsfor depth-two and depth-three threshold circuits. InSTOC, pages 633–643, 2016.

12

[Lok00] Satyanarayana V. Lokam. On the rigidity of vandermonde matrices.Theoretical ComputerScience, 237(1–2):477 – 483, 2000.

[Lok01] Satyanarayana V. Lokam. Spectral methods for matrix rigidity with applications to size-depthtradeoffs and communication complexity.Journal of Computer and System Sciences, 63:449–473, 2001.

[Lok06] Satyanarayana V. Lokam. Quadratic lower bounds on matrix rigidity. In Proc. TAMC, volume3959, pages 295–307. Springer, 2006.

[Lok09] Satyanarayana V. Lokam. Complexity lower bounds using linear algebra.Foundations andTrends in Theoretical Computer Science, 4(1-2):1–155, 2009.

[Lok14] Satyanarayana V. Lokam. Exercises on matrix rigid-ity. Simons Institute for Theory of Computing. Available athttps://simons.berkeley.edu/sites/default/files/docs/1738/exercises.pdf,2014.

[LPT+17] Daniel Lokshtanov, Ramamohan Paturi, Suguru Tamaki, Ryan Williams, and Huacheng Yu.Beating brute force for systems of polynomial equations over finite fields. InSODA, page toappear, 2017.

[LS09] Nathan Linial and Adi Shraibman. Learning complexity vs communication complexity.Com-binatorics, Probability & Computing, 18(1-2):227–245, 2009.

[LTV03] JM Landsberg, J. Taylor, and N.K. Vishnoi. The geometry of matrix rigidity. preprint availableathttps://smartech.gatech.edu/handle/1853/6514, 2003.

[Mid05] Gatis Midrijanis. Three lines proof of the lower bound for the matrix rigidity. Technical report,Arxiv e-print, June 2005.

[MT98] Alexis Maciel and Denis Therien. Threshold circuits of small majority-depth.Information andComputation, 146(1):55–83, 1998.

[Nis] Noam Nisan. Unpublished work. Cited in [KR98].

[PS88] P. Pudlak and P. Savicky. Private communication, cited in [Raz89], 1988.

[Ras16] Cyrus Rashtchian. Bounded matrix rigidity and John’s theorem. Electronic Colloquium onComputational Complexity (ECCC), 23:93, 2016.

[Raz87] A. A. Razborov. Lower bounds on the size of bounded depth circuits over a complete basis withlogical addition.Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338,1987.

[Raz89] A. A. Razborov. On rigid matrices (in Russian). Manuscript can be found athttp://people.cs.uchicago.edu/~razborov/files/rigid.pdf, 1989.

[RS10] Alexander A. Razborov and Alexander A. Sherstov. Thesign-rank ofAC0. SIAM J. Comput.,39(5):1833–1855, 2010.

[Smo87] Roman Smolensky. Algebraic methods in the theory oflower bounds for Boolean circuit com-plexity. In STOC, pages 77–82, 1987.

13

[SSS97] M.A. Shokrollahi, D.A. Spielman, and V. Stemann. A remark on matrix rigidity.InformationProcessing Letters, 64(6):283 – 285, 1997.

[SV12] Rocco A. Servedio and Emanuele Viola. On a special case of rigidity. Electronic Colloquiumon Computational Complexity (ECCC), 19:144, 2012.

[Tar93] Jun Tarui. Probabilistic polynomials, AC0 functions and the polynomial-time hierarchy.Theor.Comput. Sci., 113(1):167–183, 1993.

[Val77] Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. InMathematical Founda-tions of Computer Science (MFCS), pages 162–176, Berlin, Heidelberg, 1977. Springer BerlinHeidelberg.

[Wil14a] Richard Ryan Williams. The polynomial method in circuit complexity applied to algorithmdesign (invited talk). InFSTTCS, pages 47–60, 2014.

[Wil14b] Ryan Williams. Faster all-pairs shortest paths via circuit complexity. InSymposium on Theoryof Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 664–673, 2014.

[Wun12] Henning Wunderlich. On a theorem of razborov.Computational Complexity, 21(3):431–477,2012.

[Yao83] Andrew C. Yao. Lower bounds by probabilistic arguments (extended abstract). InFOCS, pages420–428. IEEE, 1983.

A Rigidity Upper Bounds For High Error

In this section, we prove upper bounds on the rigidity of the Walsh-Hadamard transform in the regime wherethe error is constant, or much larger than 1/2n; this setting is of interest for communication complexity lowerbounds.

Reminder of Theorem 1.2 For every integer r∈ [22n], one can modify at most22n/r entries of Hn and

obtain a matrix of rank at most(n/ log(r))O(√

nlog(r)).

The proof follows from applying an optimal-degree probabilistic polynomial for symmetric functions:

Theorem A.1 ([AW15]). There is a probabilistic polynomial over any field, or the integers, for any sym-metric Boolean function on n variables, with errorε and degree O(

nlog(1/ε)).

Proof of Theorem 1.2. Set ε = 1/r, and define the Boolean functionIP2 : 0,12n → −1,1 byIP2(x,y) = (−1)〈x,y〉 for all x,y∈ 0,1n. We can see thatHn is the truth table matrixMIP2. By Corollary

2.2, it is sufficient to construct a probabilistic polynomial for IP2 with errorε and(n/ ln(1/ε))O(√

nlog(1/ε))

monomials. Consider the Boolean functionPARITY(z1, . . . ,zn) = (−1)z1+···+zn for all z∈ 0,1n, and notethat IP2(x1, . . . ,xn,y1, . . . ,yn) = PARITY(x1y1,x2y2, . . . ,xnyn). SincePARITY is symmetric, by TheoremA.1 it has a probabilistic polynomialP of errorε and degreed = O(

nlog(1/ε)). Hence, the distributionof p(x1y1, . . . ,xnyn) over p drawn fromP is a probabilistic polynomial forIP2. Since we are only interestedin the value ofp(z) whenz∈ 0,1n, we can makep multilinear by applying the equationv2 = v to all

variables. Then the number of monomials ofp is at most∑O(√

nlog(1/ε))i=0

(ni

)

≤ (n/ ln(1/ε))O(√

nlog(1/ε)).Since inp(x1y1, . . . ,xnyn) we are substituting in a monomial for each variable, its expansion has the samenumber of monomials asp, as desired.

14

B Rigidity Upper Bound for SYM-AND circuits

Here we generalize Theorems1.1 and1.2 to SYM AND circuits. In the proof of Theorem1.1, the keyproperty of theIP2 function required is that has the form

IP2(x1, . . . ,xn,y1, . . . ,yn) = f (x1∧y1, . . . ,xn∧yn),

where f is a symmetric Boolean function (in our case,f computes parity). The same proof yields thefollowing generalization:

Theorem B.1. For every symmetric function f: 0,1n → R, define the function IPf : 0,12n → R byIPf (x,y) = f (x1 ∧ y1, . . . ,xn ∧ yn). For all sufficiently smallε , there is aδ < 1 and a matrix of rank2δn

which differs from the truth table matrix MIPf in at most2(1+ε)n entries.

The proof of Theorem1.2 only requires a probabilistic polynomial construction in Corollary 2.2. Ourprobabilistic matrix distribution simply substitutes monomials into the probabilistic polynomial of Theo-remA.1 for any symmetric function. Since each monomial can be viewed as anAND, the same argumentwill work for any SYMAND circuit.

Theorem B.2. For any Boolean function f: 0,12n → R which can be written as aSYM AND circuitwith sAND gates, and for every integer r∈ [22n], one can modify22n/r entries of the truth table matrix Mfand obtain a matrix of rank at most(s/ logr)O(

√slogr).

C Explicit Rigid Matrices and Threshold Circuits

In this section, we show how explicit rigidity lower bounds would also imply circuit lower bounds wherewe currently only know weak results (e.g., we know that some functions inENP do not have such circuits).

Theorem C.1. For every constantδ > 0 and everyAC0 LTF AC0 LTF circuit C of size-s= n2−δ anddepth-d= o(log(n)/ log log(n/ε)), there exists aγ > 0 such that the truth table of C as a2n/2×2n/2 matrix

MC has rigidityRMC

(

2n1−γ log(1/ε))

≤ ε2n, for all ε ∈ (1/2n,1), over any field.

Our proof will use a technique by Maciel and Therien for converting each middle layerLTF gate into anequivalentAC0MAJ circuit:

Theorem C.2([MT98] Theorem 3.3, [ACW16] Theorem 7.1). For everyα > 0, everyLTF on n inputs canbe computed by a polynomial-sizeAC0 MAJ circuit where the fan-in of eachMAJ gate is n1+α and thecircuit has depth O(log(1/α)).

We will also use Tarui’s probabilistic polynomial forAC0:

Theorem C.3([Tar93] Theorem 3.6). Every circuit inAC0 with depth d has a probabilistic polynomial overZ of degree O(logd(n)) and error1/2logO(1)(n).

Proof of Theorem C.1. By LemmaD.3, eachLTF gate in the bottom layer hasε/s-probabilistic rankO(n2s/ε). We will design a probabilistic polynomial for the upperAC

0 LTF AC0 circuitry, which willgive the desired result when composed with this probabilistic rank expression.

First, eachLTF gate in the middle layer has fan-in at mosts= n2−δ . Applying TheoremC.2with α = δ/2to each, the upperAC0 LTF AC0 circuit becomes aAC0 MAJAC0 where eachMAJ gate has fan-in atmostn(2−δ )(1+δ/2) = n2−δ 2/2, and the depth is stillO(d).

15

We can now apply the probabilistic polynomial forAC0 from TheoremC.3 with degreeO(logd(n))

error 1/2logO(1)(n) to theAC0 circuits, and the probabilistic probabilistic polynomialfor symmetric func-

tions onn2−δ 2/2 bits from TheoremA.1 with error ε/s and degreeO(n1−δ 2/4 log(s/ε)) to theMAJ gatesin the middle. This results in a probabilistic polynomial ofdegreeO(n1−δ 2/4 logO(d)(n/ε)). For d =o(log(n)/ log log(n/ε)), this isO(n1−β ) for anyβ ∈ (0,δ 2/4).

We can view the terms in the probabilistic rank expression for the LTF gates in the bottom layer asvariables that we substitute into this probabilistic polynomial; the number of monomials in this expansionwill upper bound the rank, as in Lemma2.1. Since there are at mosts such gates, and each probabilisticrank expression hasO(n2s/ε) terms, we are substitutingO(n2s2/ε) terms into our polynomial. Hence, thenumber of monomials will be upper bounded by

(n2s2/ε)O(n1−β ) = 2O(n1−γ) log(1/ε),

for anyγ < β . This is of the desired form, where we can pick any positive value γ < δ 2/4. The correctnessfollows by union bounding over all≤ sprobabilistic substitutions we make, each of which has error proba-bility at mostε/s.

From the above theorem, setting the errorε appropriately, we infer a new consequence of explicit rigidmatrices:

Reminder of Theorem1.3 Let K be an arbitrary field, andMn be a family of Boolean matrices such that(a) Mn is n×n, (b) there is a poly(logn) time algorithm A such that A(n, i, j) prints Mn(i, j), and (c) there isa δ > 0 such that for infinitely many n,

RMn

(

2(logn)1−δ)

≥ n2

2(logn)δ/2 over K.

Then the language(n, i, j) | Mn(i, j) = 1 ∈ P does not haveAC0 LTF AC0 LTF circuits of n2−ε -sizeand o(logn/ log logn)-depth, for allε > 0.

Therefore, proving strong rigidity lower bounds for explicit matrices overR has consequences for Booleancircuit complexity as well. Indeed, the desired circuit lower bounds could be derived from lower-boundingprobabilistic rank.

D Sign-rank Rigidity and Depth-Two Threshold Circuits

Given a matrixA∈Rn×n, its sign rank is the minimum rank of anyB∈ −1,1n×n such thatsign(A[i, j]) =sign(B[i, j]) for all i, j ∈ [n]. Theε-probabilistic sign-rank ofA is defined analogously as with probabilisticrank. We sayA hassign rank r-rigidity t if a minimum oft entries ofA need to be modified in order forA tohave sign rank at mostr.

First, we observe (in AppendixE) that in the sign-rank setting, random -1/1 matrices are still rigid: forexample, with high probability, a random -1/1 matrix has sign-rank-(n/ log2n) rigidity at leastΩ(n2). Eventhough most -1/1 matrices have high sign-rank rigidity, we show that the truth table of a smallLTF LTFcircuit is always close to a matrix of low sign-rank. For evenn, we say a functionf : 0,1n → 0,1hasε-probabilistic sign rank rif the truth table ofC construed as a 2n/2 × 2n/2 matrix hasε-probabilisticsign-rankr.

Theorem D.1. For every function f with aLTFLTF circuit of size s, and everyε > 0, theε-probabilisticsign-rank of f is O(s2n2/ε). Moreover, we can sample a low-rank matrix from the distribution of matricesin 2n/2 ·poly(s,n) time.

16

We will prove this theorem in a few steps. LetEQn : 0,12n → 0,1 be the equality function, i.e.,EQn(x,x) = [x= y] (using Iverson bracket notation). Similarly, letLEQn : 0,12n →0,1 be the functionLEQn(x,x) = [x≤ y] wherex andy are interpreted as integers in0, . . . ,2n−1.

Lemma D.1. For every n, EQn hasε-probabilistic rank at most O(1/ε) over any field.

Proof. We mimic a well-known randomized communication protocol for EQn. Pick k = ⌈log2(1/ε)⌉ uni-formly random subsetsS1, . . . ,Sk ⊆ 0,1n, and define the hash functionsh1, . . . ,hk : 0,1n → 0,1 byhi(x) =

j∈Six j . Note thathi(x) 6= hi(y) with 1/2 chance ifx 6= y. Hence, the following expression equals

EQ(x,y) with error probability at mostε :

k

∏i=1

(hi(x)hi(y)+ (1−hi(x))(1−hi(y))). (5)

When expanded out, (5) is a sum of 2k = O(1/ε) terms of the formf (x) ·g(y) for some functionsf andg,each of which has rank one.

Lemma D.2. For every n, LEQn hasε-probabilistic rank at most O(n2/ε) over any field.

Proof. We expressLEQn in terms ofEQpredicates which check for the first bit in whichx andy differ, as

LEQn(x1, . . . ,xn,y1, . . . ,yn) =n

∑i=1

(1−xi) ·yi ·EQi−1(x1, . . . ,xi−1,y1, . . . ,yi−1). (6)

We then get the desired rank bound by replacing eachEQwith the probabilistic rank expression from LemmaD.1 with errorε/n. By the union bound, alln of theEQ predicates will be correct with probability at least1− ε , and hence we will correctly computeLEQn.

Lemma D.3. For every n, every linear threshold function f: 0,12n → 0,1 has ε-probabilistic rankO(n2/ε).

Proof. A linear threshold functionf is defined asf (x1, . . . ,xn,y1, . . . ,yn) = [∑i vixi +∑i wiyi ≥ k], where allvi ’s, wi ’s, andk are reals. We want to show that the 2n×2n matrix indexed byxi-assignments on the rowsandyi-assignments on the columns has low probabilistic rank. We will exploit the fact that the linear formson xi ’s andyi ’s can be preprocessed separately in a rank decomposition.

Definea : 0,1n → R by a(x1, . . . ,xn) = ∑ni=1vixi , andb : 0,1n → R by b(y1, . . . ,yn) = k−∑n

j=1wiyi .Hence

f (x,y) = [a(x) ≤ b(y)] .

Let L be the list, sorted in increasing order, of all values ofa(x) andb(y), for all x∈ 0,1n andy∈ 0,1n.Then define the functionα : 0,1n → 0,1n+1 whereα(x) equals the earliest index ofa(x) in the sortedlist L, interpreted as an+1 bit number. Defineβ : 0,1n →0,1n+1 similarly. Then

f (x,y) = LEQn+1(α(x),β (y)).

So theε-probabilistic rank ofM f is at most that ofMEQn+1, which we upper-bounded in LemmaD.2.

Now we are ready to upper-bound the probabilistic sign-rankof depth-two threshold circuits:

17

Proof of Theorem D.1. We interpret ourLTF LTF circuit C as a function on two groups ofn/2 bits,x1, . . . ,xn/2, andy1, . . . ,yn/2. Let w1, . . . ,ws,k∈ R be the weights of the output gate, so that

C(x1, . . . ,xn,y1, . . . ,yn) =

[

s

∑i=1

wi · fi(x1, . . . ,xn,y1, . . . ,yn)≤ k

]

,

for sdifferentLTF functions fi . By LemmaD.3, the truth table matrixM fi of eachfi has(ε/s)-probabilisticrank r = O(n2s/ε). Our probabilistic distribution of matrices forMC can be constructed as follows: for alli = 1, . . . ,s, draw a random rank-r matrix Pi from the distribution forM fi , and set

QC = (−k · I)+∑i

(wi ·Pi).

QC has rank at mostsr+1≤ O(n2s2/ε) and for all(~x,~y), Pr[sign(QC[~x,~y]) 6=C(~x,~y)]≤ ε .

Are there explicit matrices with non-trivial sign-rank rigidity? We observe that the best-known rankrigidity lower bounds forHn extend to sign-rank rigidity:

Theorem D.2(Follows from Razborov and Sherstov [RS10]). For all n, and r∈ [2n/2,2n], the sign-rank-rrigidity of Hn is at leastΩ(4n/r).

Proof. Theorem 5.1 of [RS10] gives the following lower bound on sign-rank: given any matrix A∈−1,1n×n,suppose that all buth entries of matrixA have absolute value at leastγ . Then

sign-rank(A)≥ γn2

||A||n+ γh,

where||A|| is the spectral norm ofA. For the case ofHn, if we modify h := 4n/r entries arbitrarily, all buthentries have absolute value equal to 1. Thus

sign-rank(Hn)≥4n

||Hn||n+4n/r.

As ||Hn|| ≤ O(2n/2) [For02], we have sign-rank(Hn)≥ Ω(4n/(23n/2+4n/r))≥ Ω(2n/2+ r)≥ Ω(r).

Can the above lower bound be improved slightly? Combining the previous two theorems, it followsthat any minor improvement in the above rank/rigidity trade-off would begin to imply lower bounds forLTFLTF:

Theorem D.3. Suppose there is anα > 0 such that for infinitely many n, the sign rank r-rigidity of Hn isΩ(4n/r1−α), for some r≥ ω(n2/α s(n)2/α ). Then the Inner Product Modulo2 does not haveLTF LTFcircuits of s(n) gates.

Proof. Suppose the sign rankr-rigidity of Hm is Ω(4n/r1−α ). Let ε = 1/r1−α . It follows that theε-probabilistic sign-rank ofHn is greater thanr. But for aLTFLTF function withs gates, its matrix alwayshasΩ(ε)-probabilistic rankO(s2n2/ε) = O(s2n2r1−α), by TheoremD.1. Thus we have a contradictionwhenO(s2n2r1−α) is asymptotically less thanr, i.e.,

r = ω(n2/α s2/α),

corresponding to ans-gate lower bound againstLTF LTF circuits. SinceHn is just a linear translation ofthe matrix for Inner Product Modulo 2, the proof is complete.

For instance, proving the sign-rank 2αn-rigidity of Hn is at least 4n/2.999αn for someα > 0 would implyexponential-gate lower bounds for depth-two threshold circuits computing IP2.

18

E Random Matrices are Sign-Rank Rigid

The proof that random -1/1 matrices have high sign-rank rigidity follows readily from recent work:

Theorem E.1(Follows from Alon-Moran-Yehudayoff [AMY16]). Let r(n) = o(n/ logn). For all sufficientlylarge n, a random n× n matrix with−1/1 entries has sign-rank-r(n) rigidity at least Ω(n2), with highprobability.

Proof. There are 2n2

matrices over−1,1. The number of distinct matrices with sign rank at mostr isbounded by 2O(rn logn) [AMY16]. For a fixed matrixM, the number of matrices within Hamming distancedof M is at mostO(

(n2

t

)

). Thus the number of matrices for which up tot entries can be changed to obtain amatrix of sign rank at mostr, is upper-bounded by

2O(rn logn) ·(

n2

t

)

≤ nO(rn) · (en2/t)t .

Suppose we sett = εn2. Then the above quantity is at most

nO(rn) · (e/ε)εn2.

For r = o(n/ logn) andε log2(e/ε)< 1, a random matrix is not among these matrices with high probability.Therefore a random matrix has sign rank-o(n/ logn) rigidity Ω(n2) with high probability.

F Equivalence Between Probabilistic Rank Modulo m and BP-MODm Com-munication Complexity

Here we sketch how probabilistic rank overZm is equivalent toBP ·MODmP communication complexity:

Proposition 2. Let m> 1 be an integer, let f: 0,1n ×0,1n → 0,1, and let Mf be its truth tablematrix. Let Cε( f ) be theBP ·MODmP communication complexity of f with errorε , and letε-rankZm(M f )be theε-probabilistic rank of Mf overZm. Then Cε( f )≤ log2(ε-rankZm(M f )+1)≤ 2Cε( f ).

This proposition is different from the one quoted in the introduction (giving constant-factor equivalencesbetween the log of the rank and the communication complexity) because we are assuming a more stringentcommunication model here. However, the more general model is often taken as the definition, in which casethe probabilistic rank and communication complexity trulycoincide.

First, given a distribution of low-rank matrices forM f , it is easy to construct a protocol forf : Aliceand Bob publicly randomly sample a matrix from the distribution, which is a product of two matricesAand B. Alice takes the row ofA of length r corresponding to her input, Bob takes the column ofB oflength r corresponding to his, and they then compute the inner product of these two vectors overZm with⌈log2(r +1)⌉ communication in theMODmP model.

To construct a distribution of matrices from communicationprotocols, we do a simple modification of theBPP

⊕P communication model. In fact, sometimes the literaturedefinestheBPP⊕P communication modelin this modified way [GPW16]. After the public randomness is chosen, Alice and Bob can, along with theirc nondeterministic bits, also sum over all possibletranscriptsof at mostc bits between them. For eachchoice of randomness and nondeterminism there is a unique accepting transcript, so this extra choice doesnot alter the number of accepting communication patterns. But in this modified version, now Alice and Bobdo not even have to communicate: they only have to send a single bit indicating whether they would acceptor not, given the transcript and the nondeterminism. From such a protocol, it is straightforward to constructa 2n×22c matrix A representing Alice’s protocol and a 22c×2n matrix B representing Bob, for any givenstring of public randomness.

19

G Random Self-Reducibility, Rigidity, and Probabilistic Rank

The reduction from rigidity to probabilistic rank works forany (non-adaptive) random self-reducible func-tion [FF93] that makes a small number of oracle calls. Our notion of random self-reducibility is adapted forthe communication complexity setting (for example, we do not care about the feasibility of the reduction).

Definition G.1. A function f : 0,1n ×0,1n → 0,1 is k-random self-reducibleif there are randomsampling procedures S1,S2 and a function g: 0,1k →0,1 such that:

(a) S1 takes x∈ 0,1n and a random string r and outputs x1, . . . ,xk ∈ 0,1n such that for all n-bitstrings z,Prr [xi = z] = 1/2n for all i,

(b) S2 takes y∈ 0,1n and a random string s and outputs y1, . . . ,yk ∈0,1n such that for all n-bit stringsz,Prs[yi = z] = 1/2n for all i, and

(c) f(x,y) = g( f (x1,y1), . . . , f (xk,yk)).

Requirements (a) and (b) in the definition ensures that eachxi and yi are uniform random variables;requirement (c) says that we can reconstructf (x,y) from the valuesf (x1,y1), . . . , f (xk,yk).

Theorem G.1. Let r,n∈N andε ∈ (0,1), and let f : 0,12n →0,1 be k-random self-reducible. SupposeM f has rank-r rigidity at mostε4n over K. Then the(kε)-probabilistic rank of Mf over K is at most O(

(krk

)

).

Proof. Suppose there is an 2n× r matrix A andr ×2n matrix B, such thatM f andA ·B differ in at mostε4n

entries. We construct a distribution of low-rank matrices for M f as follows.Let P(z1, . . . ,zk) be the unique multilinear polynomial overK that represents the functiong from the

random self-reduction forf . Givenk rowsX1, . . . ,Xk ∈ Kr of A, andk columnsY1, . . . ,Yk ∈ Kr of B, definea polynomial in 2kr variables:

Q(X1, . . . ,Xk,Y1, . . . ,Yk) = P(〈X1,Y1〉 , . . . ,〈Xk,Yk〉).Treating each term of the formXi[ j] ·Yi [ j] as a variable,Q can be written as a sum oft ≤

(krk

)

total terms.Call the termsm1, . . . ,mt , each of which are over 2kr variables.

Let r be a random string forS1 andsbe a random string forS2. Forx∈ 0,1n, let x1, . . . ,xk ∈ 0,1n bethe outputs ofS1(x) with randomnessr. We define thexth row of a new 2n× t matrix Ar to be

[m1(A[x1, :], . . . ,A[xk, :],~1, . . . ,~1), . . . ,mt(A[x1, :], . . . ,A[xk, :],~1, . . . ,~1)].

For y ∈ 0,1n, let y1, . . . ,yk be the outputs ofS2(x) with randomnesss. Define theyth column of a newt ×2n matrix Bs to be

[m1(~1, . . . ,~1,B[:,y1], . . . ,B[:,yk]), . . . ,mt(~1, . . . ,~1,B[:,y1], . . . ,B[:,yk])]T .

Then, for all(x,y) ∈ 0,1n×0,1n, the inner product of thexth row ofAr and theyth column ofBs is

∑i

mi(A[x1, :], . . . ,A[xk, :],B[:,y1], . . . ,B[:,yk]) = Q(A[x1, :], . . . ,A[xk, :],B[:,y1], . . . ,B[:,yk])

= P(〈A[x1, :],B[:,y1]〉 , . . . ,〈A[xk, :],B[:,yk]〉).Since (A · B) differs from M f on an ε-fraction of entries, for uniform randomxi ,y j ∈ 0,1n we have〈A[xi , :],B[:,yi ]〉 6= f (xi ,yi) with probability at mostε . So with probability at least 1− kε , f (xi ,yi) =〈A[xi , :],B[:,yi ]〉 for all i = 1, . . . ,k. Thus the polynomialP(〈A[x1, :],B[:,y1]〉 , . . . ,〈A[xk, :],B[:,yk]〉) beingimplemented byAr ·Bs outputs f (x,y) with probability at least 1− kε . Hence all matricesCr,s = Ar ·Bs inour defined distribution have rank at mostO(

(krk

)

), and for every(x,y) ∈ 0,1n ×0,1n, Prr,s[Cr,s[x,y] =f (x,y)] ≥ 1−kε .

20