
Page 1: Harmonic Analysis in  Learning Theory

Harmonic Analysis in Learning Theory

Jeff Jackson

Duquesne University

Page 2: Harmonic Analysis in  Learning Theory

Themes

• Harmonic analysis is central to learning-theoretic results in a wide variety of models
– Results are generally the strongest known for learning with respect to the uniform distribution

• Work on learning problems has led to some new harmonic results
– Spectral properties of Boolean function classes
– Algorithms for approximating Boolean functions

Page 3: Harmonic Analysis in  Learning Theory

Uniform Learning Model

[Model diagram] A target function f : {0,1}^n → {0,1} is drawn from a Boolean function class F (e.g., DNF). The example oracle EX(f) supplies the learning algorithm A with uniform random examples <x, f(x)>. Given an accuracy ε > 0, A must output a hypothesis h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.

Page 4: Harmonic Analysis in  Learning Theory

Circuit Classes

• Constant-depth AND/OR circuits (AC0 without the polynomial-size restriction; call this CDC)

• DNF: depth-2 circuit with OR at root

[Figure: a circuit of AND/OR gates arranged in d levels over inputs v1, v2, v3, …, vn; negations allowed.]

Page 5: Harmonic Analysis in  Learning Theory

Decision Trees

[Figure: a decision tree with root v3, internal nodes v1, v2, v4, and leaves labeled 0 and 1.]

Page 6: Harmonic Analysis in  Learning Theory

Decision Trees

[Figure: the same tree evaluated on x = 11001. Since x3 = 0, evaluation follows the 0-branch out of the root v3.]

Page 7: Harmonic Analysis in  Learning Theory

Decision Trees

[Figure: evaluation of x = 11001 continues at v1; since x1 = 1, it follows the 1-branch.]

Page 8: Harmonic Analysis in  Learning Theory

Decision Trees

[Figure: evaluation of x = 11001 ends at a leaf labeled 1, so f(x) = 1.]

Page 9: Harmonic Analysis in  Learning Theory

Function Size

• Each function representation has a natural size measure:
– CDC, DNF: # of gates
– DT: # of leaves

• Size s_F(f) of f with respect to class F is the size of the smallest representation of f within F
– For all Boolean f, s_CDC(f) ≤ s_DNF(f) ≤ s_DT(f)

Page 10: Harmonic Analysis in  Learning Theory

Efficient Uniform Learning Model

[Model diagram] Identical to the uniform learning model: the example oracle EX(f) for a target f : {0,1}^n → {0,1} from class F (e.g., DNF) supplies uniform random examples <x, f(x)> to the learning algorithm A, which, given accuracy ε > 0, outputs h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε. In addition, A must now run in time poly(n, s_F, 1/ε).

Page 11: Harmonic Analysis in  Learning Theory

Harmonic-Based Uniform Learning

• [LMN]: constant-depth circuits are quasi-efficiently (n^{polylog(s/ε)}-time) uniform learnable

• [BT]: monotone Boolean functions are uniform learnable in time roughly 2^{√n · log n}
– Monotone: for all x, i: f(x|xi=0) ≤ f(x|xi=1)
– Also exponential in 1/ε (so assumes ε constant)
– But independent of any size measure

Page 12: Harmonic Analysis in  Learning Theory

Notation

• Assume f : {0,1}^n → {-1,1}

• For all a in {0,1}^n, χ_a(x) ≡ (-1)^{a·x}

• For all a in {0,1}^n, the Fourier coefficient f̂(a) of f at a is (estimated empirically in the sketch below):

f̂(a) ≡ E_{x~U}[f(x)·χ_a(x)]

• Sometimes write, e.g., f̂({1}) for f̂(10…0)
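To make the definition concrete, here is a minimal sketch (not from the talk) that estimates a single coefficient f̂(a) from uniform random examples; the helper names and sample size are illustrative assumptions.

```python
import random

def chi(a, x):
    """Parity character χ_a(x) = (-1)^(a·x) for 0/1 vectors a, x."""
    return -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1

def estimate_coefficient(f, n, a, samples=10000, rng=random):
    """Estimate f̂(a) = E_{x~U}[f(x)·χ_a(x)] from uniform examples.

    f maps a length-n 0/1 list to {-1, +1}.
    """
    total = 0
    for _ in range(samples):
        x = [rng.randrange(2) for _ in range(n)]
        total += f(x) * chi(a, x)
    return total / samples

# Example: f = parity of the first two bits, so f̂(11000) ≈ 1.
f = lambda x: chi([1, 1, 0, 0, 0], x)
print(estimate_coefficient(f, 5, [1, 1, 0, 0, 0]))
```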

Page 13: Harmonic Analysis in  Learning Theory

Fourier Properties of Classes

Σ_{a∉S} f̂²(a) < ε if:

• [LMN]: f is a constant-depth circuit of depth d and S = {a : |a| < log^d(s/ε)} (|a| ≡ # of 1's in a)

• [BT]: f is a monotone Boolean function and S = {a : |a| < √n/ε}

Page 14: Harmonic Analysis in  Learning Theory

Spectral Properties

Page 15: Harmonic Analysis in  Learning Theory

Proof Techniques

• [LMN]: Håstad's Switching Lemma + harmonic analysis

• [BT]: Based on [KKL]
– Define AS(f) ≡ n · Pr_{x,i}[f(x|xi=0) ≠ f(x|xi=1)] (estimated empirically below)
– If S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε
– For monotone f, harmonic analysis + Cauchy-Schwarz shows AS(f) ≤ √n
– Note: this is tight for MAJ
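The following minimal sketch (an illustration, not part of the [BT] proof) approximates AS(f) by sampling random (x, i) pairs; names and sample size are assumptions.

```python
import random

def average_sensitivity(f, n, samples=20000, rng=random):
    """Estimate AS(f) = n * Pr_{x,i}[f(x|xi=0) != f(x|xi=1)] by sampling.

    f maps a length-n 0/1 list to {-1, +1} (or {0, 1}).
    """
    disagreements = 0
    for _ in range(samples):
        x = [rng.randrange(2) for _ in range(n)]
        i = rng.randrange(n)
        x0, x1 = list(x), list(x)
        x0[i], x1[i] = 0, 1          # x with bit i forced to 0 and to 1
        disagreements += f(x0) != f(x1)
    return n * disagreements / samples

# Sanity check: AS(MAJ) ≈ √(2n/π), i.e., on the order of √n.
maj = lambda x: 1 if sum(x) * 2 > len(x) else -1
print(average_sensitivity(maj, 101))
```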

Page 16: Harmonic Analysis in  Learning Theory

Function Approximation

• For all Boolean f:

f = Σ_{a∈{0,1}^n} f̂(a)·χ_a

• For S ⊆ {0,1}^n, define

f_S ≡ Σ_{a∈S} f̂(a)·χ_a

• [LMN]:

Pr_{x~U}[f(x) ≠ sign(f_S(x))] ≤ Σ_{a∉S} f̂²(a)

Page 17: Harmonic Analysis in  Learning Theory

“The” Fourier Learning Algorithm

• Given: ε (and perhaps s, d)

• Determine k such that for S = {a : |a| < k}, Σ_{a∉S} f̂²(a) < ε

• Draw a sufficiently large sample of examples <x, f(x)> to closely estimate f̃(a) ≈ f̂(a) for all a∈S
– Chernoff bounds: ~n^k/ε sample size sufficient

• Output h ≡ sign(Σ_{a∈S} f̃(a)·χ_a)

• Run time ~ n^{2k}/ε (a code sketch follows)
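A minimal sketch of this low-degree algorithm, assuming labeled examples are already in hand; the helper names and the degree/sample choices are illustrative, and none of the sample-size bookkeeping above is enforced.

```python
import itertools, random

def chi(a, x):
    """χ_a(x) for a given as a tuple of coordinate indices (a subset of {1..n})."""
    return -1 if sum(x[i] for i in a) % 2 else 1

def low_degree_learn(examples, n, k):
    """LMN-style low-degree algorithm: estimate every Fourier coefficient
    of degree < k from the examples, then predict with the sign of the
    truncated Fourier expansion."""
    coeffs = {}
    for d in range(k):
        for a in itertools.combinations(range(n), d):
            coeffs[a] = sum(y * chi(a, x) for x, y in examples) / len(examples)
    def h(x):
        return 1 if sum(c * chi(a, x) for a, c in coeffs.items()) >= 0 else -1
    return h

# Usage sketch: learn MAJ on 7 bits (labels in {-1,+1}) with degree < 4.
maj = lambda x: 1 if sum(x) * 2 > len(x) else -1
xs = [[random.randrange(2) for _ in range(7)] for _ in range(4000)]
h = low_degree_learn([(x, maj(x)) for x in xs], 7, 4)
```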

Page 18: Harmonic Analysis in  Learning Theory

Halfspaces

• [KOS]: halfspaces are efficiently uniform learnable (given ε is constant)
– Halfspace: w∈R^{n+1} s.t. f(x) = sign(w · (x∘1))
– If S = {a : |a| < (21/ε)²} then Σ_{a∉S} f̂²(a) < ε
– Apply the LMN algorithm

• A similar result applies for an arbitrary function applied to a constant number of halfspaces
– Intersection of halfspaces is a key learning problem

Page 19: Harmonic Analysis in  Learning Theory

Halfspace Techniques

• [O] (cf. [BKS], [BJTa]):
– Noise sensitivity of f at γ is the probability that corrupting each bit of x independently with probability γ changes f(x)
– NS_γ(f) = ½(1 - Σ_a (1-2γ)^{|a|}·f̂²(a)) (estimated by sampling below)

• [KOS]:
– If S = {a : |a| < 1/γ} then Σ_{a∉S} f̂²(a) < 3·NS_γ(f)
– If f is a halfspace then NS_ε < 9√ε
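NS_γ(f) can be estimated directly by sampling correlated input pairs, as in this minimal sketch (names and sample count are assumptions, not from [O] or [KOS]):

```python
import random

def noise_sensitivity(f, n, gamma, samples=20000, rng=random):
    """Estimate NS_γ(f): draw uniform x, flip each bit independently
    with probability γ to get y, and report Pr[f(x) != f(y)]."""
    changed = 0
    for _ in range(samples):
        x = [rng.randrange(2) for _ in range(n)]
        y = [b ^ (rng.random() < gamma) for b in x]
        changed += f(x) != f(y)
    return changed / samples

# For a halfspace such as MAJ, NS_ε grows like √ε rather than ε.
maj = lambda x: 1 if sum(x) * 2 > len(x) else -1
print(noise_sensitivity(maj, 101, 0.01))
```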

Page 20: Harmonic Analysis in  Learning Theory

Monotone DT

• [OS]: monotone functions are efficiently learnable given:
– ε is constant
– s_DT(f) is used as the size measure

• Techniques:
– Harmonic analysis: for monotone f, AS(f) ≤ √(log s_DT(f))
– [BT]: if S = {a : |a| < AS(f)/ε} then Σ_{a∉S} f̂²(a) < ε
– Friedgut: ∃ T with |T| ≤ 2^{AS(f)/ε} s.t. Σ_{A∉T} f̂²(A) < ε

Page 21: Harmonic Analysis in  Learning Theory

Weak Approximators

• KKL also show that if f is monotone, there is an i such that -f̂({i}) ≥ log²n / n

• Therefore Pr[f(x) = -χ_{i}(x)] ≥ ½ + log²n / 2n

• In general, h s.t. Pr[f = h] ≥ ½ + 1/poly(n,s) is called a weak approximator to f

• If A outputs a weak approximator for every f in F, then F is weakly learnable

Page 22: Harmonic Analysis in  Learning Theory

Uniform Learning Model

[Model diagram, repeated for comparison] The example oracle EX(f) for a target f : {0,1}^n → {0,1} from class F (e.g., DNF) supplies uniform random examples <x, f(x)> to the learning algorithm A, which, given accuracy ε > 0, outputs h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.

Page 23: Harmonic Analysis in  Learning Theory

Weak Uniform Learning Model

[Model diagram] As in the uniform model, the example oracle EX(f) for a target f : {0,1}^n → {0,1} from class F (e.g., DNF) supplies uniform random examples <x, f(x)> to A, but now the hypothesis h : {0,1}^n → {0,1} need only satisfy Pr_{x~U}[f(x) ≠ h(x)] < ½ - 1/p(n,s).

Page 24: Harmonic Analysis in  Learning Theory

Efficient Weak Learning Algorithm for Monotone Boolean Functions

• Draw a set of ~n² examples <x, f(x)>

• For i = 1 to n:
– Estimate f̂({i})

• Output h ≡ -χ_{i} for the i maximizing -f̂({i}) (sketch below)
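A minimal sketch of this weak learner, assuming uniform examples are given as (x, y) pairs with y ∈ {-1, +1}; names and sample handling are illustrative.

```python
import random

def weak_learn_monotone(examples, n):
    """Estimate every degree-1 coefficient f̂({i}) = E[f(x)·(-1)^{x_i}]
    and return the single-variable hypothesis h = -χ_{i*} with the most
    negative coefficient (the best weak approximator per [KKL])."""
    m = len(examples)
    coeff = [sum(y * (-1) ** x[i] for x, y in examples) / m for i in range(n)]
    i_star = min(range(n), key=lambda i: coeff[i])  # maximizes -f̂({i})
    return lambda x: -((-1) ** x[i_star])

# Usage sketch on MAJ (monotone):
maj = lambda x: 1 if sum(x) * 2 > len(x) else -1
xs = [[random.randrange(2) for _ in range(9)] for _ in range(5000)]
h = weak_learn_monotone([(x, maj(x)) for x in xs], 9)
```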

Page 25: Harmonic Analysis in  Learning Theory

Weak Approximation for MAJ of Constant-Depth Circuits

• Note that adding a single MAJ to a CDC destroys the LMN spectral property

• [JKS]: MAJ of CDCs is quasi-efficiently, quasi-weakly uniform learnable
– If f is a MAJ of CDCs of depth d, and if the number of gates in f is s, then there is an A ∈ {0,1}^n such that:
• |A| < log^d s ≡ k
• Pr[f(x) = χ_A(x)] ≥ ½ + 1/(4sn^k)

Page 26: Harmonic Analysis in  Learning Theory

Weak Learning Algorithm

• Compute k = log^d s

• Draw ~sn^k examples <x, f(x)>

• Repeat over A with |A| < k:
– Estimate f̂(A)

• Until an A is found s.t. f̂(A) > 1/(2sn^k)

• Output h ≡ χ_A

• Run time ~ n^{polylog(s)} (code sketch below)
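A minimal sketch of this search over low-degree parities; the threshold is passed in by the caller (e.g., 1/(2sn^k)), and names and sample handling are illustrative assumptions.

```python
import itertools

def chi(a, x):
    """χ_a(x) for a tuple of coordinate indices a."""
    return -1 if sum(x[i] for i in a) % 2 else 1

def weak_learn_maj_of_cdc(examples, n, k, threshold):
    """Scan all parities χ_A with |A| < k and return the first whose
    estimated Fourier coefficient exceeds the threshold."""
    m = len(examples)
    for d in range(k):
        for a in itertools.combinations(range(n), d):
            if sum(y * chi(a, x) for x, y in examples) / m > threshold:
                return lambda x, a=a: chi(a, x)
    return None  # no sufficiently correlated low-degree parity found
```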

Page 27: Harmonic Analysis in  Learning Theory

Weak Approximator Proof Techniques

• "Discriminator Lemma" (HMPST)
– Implies one of the CDCs is a weak approximator to f

• LMN spectral characterization of CDC

• Harmonic analysis

• Beigel result used to extend weak learning to CDC with polylog-many MAJ gates

Page 28: Harmonic Analysis in  Learning Theory

Boosting

• In many (not all) cases, uniform weak learning algorithms can be converted to uniform (strong) learning algorithms using a boosting technique ([S], [F], …)
– Need to learn weakly with respect to near-uniform distributions
• For near-uniform distribution D, find weak h_j s.t. Pr_{x~D}[h_j = f] > ½ + 1/poly(n,s)
– Final h is typically a MAJ of the weak approximators

Page 29: Harmonic Analysis in  Learning Theory

Strong Learning for MAJ of Constant-Depth Circuits

• [JKS]: MAJ of CDC is quasi-efficiently uniform learnable
– Show that for near-uniform distributions, some parity function is a weak approximator
– Beigel result again extends this to CDC with polylog-many MAJ gates

• [KP] + boosting: there are distributions for which no parity is a weak approximator

Page 30: Harmonic Analysis in  Learning Theory

Uniform Learning from a Membership Oracle

[Model diagram] The learning algorithm A now has a membership oracle MEM(f) for the target f : {0,1}^n → {0,1} from class F (e.g., DNF): A sends any x of its choosing and receives f(x). Given accuracy ε > 0, A outputs h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.

Page 31: Harmonic Analysis in  Learning Theory

Uniform Membership Learning of Decision Trees

• [KM]
– L₁(f) ≡ Σ_a |f̂(a)| ≤ s_DT(f)
– If S = {a : |f̂(a)| ≥ ε/L₁(f)} then Σ_{a∉S} f̂²(a) < ε
– [GL]: algorithm (membership oracle) for finding {a : |f̂(a)| ≥ θ} in time ~n/θ⁶ (sketched below)
– So can efficiently uniform membership learn DT
– Output h has the same form as LMN: h ≡ sign(Σ_{a∈S} f̃(a)·χ_a)
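Below is a minimal sketch of the [KM]/[GL]-style coefficient search with a membership oracle, under simplifying assumptions (a fixed sample count per weight estimate, no failure-probability analysis); the function names are illustrative.

```python
import random

def km_weight(f, n, prefix, samples=2000, rng=random):
    """Estimate W(prefix) = Σ_b f̂(prefix∘b)², the Fourier weight of all
    coefficients extending the 0/1 prefix. Uses the identity
    W(a) = E_{x,y,z}[f(x∘z)·f(y∘z)·χ_a(x)·χ_a(y)], where x, y range over
    {0,1}^{|a|} and z over the remaining coordinates. f is a membership
    oracle mapping a length-n 0/1 list to {-1, +1}."""
    k = len(prefix)
    total = 0
    for _ in range(samples):
        x = [rng.randrange(2) for _ in range(k)]
        y = [rng.randrange(2) for _ in range(k)]
        z = [rng.randrange(2) for _ in range(n - k)]
        cx = -1 if sum(a & b for a, b in zip(prefix, x)) % 2 else 1
        cy = -1 if sum(a & b for a, b in zip(prefix, y)) % 2 else 1
        total += f(x + z) * f(y + z) * cx * cy
    return total / samples

def km_heavy_coefficients(f, n, theta, prefix=()):
    """Recursively find all a with |f̂(a)| ≥ θ (with high probability):
    extend the prefix one bit at a time, pruning whenever the remaining
    Fourier weight drops below θ²/2. Only ~1/θ² branches can survive at
    each level, by Parseval."""
    if len(prefix) == n:
        return [prefix]
    found = []
    for bit in (0, 1):
        p = prefix + (bit,)
        if km_weight(f, n, list(p)) >= theta ** 2 / 2:
            found += km_heavy_coefficients(f, n, theta, p)
    return found
```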

Page 32: Harmonic Analysis in  Learning Theory

Uniform Membership Learning of DNF

• [J]
– ∀ distributions D, ∃ χ_a s.t. Pr_{x~D}[f(x) = χ_a(x)] ≥ ½ + 1/(6·s_DNF)
– Modified [GL] can efficiently locate such a χ_a given an oracle for near-uniform D
• Boosters can provide such an oracle when uniform learning
– Boosting then provides strong learning

• [BJTb] (see also [KS]):
– Modified Levin algorithm finds χ_a in time ~ns²

Page 33: Harmonic Analysis in  Learning Theory

Uniform Learning from a Classification Noise Oracle

[Model diagram] The classification noise oracle EX_η(f) for the target f : {0,1}^n → {0,1} from class F (e.g., DNF) draws uniform random x and, with error rate η > 0, returns <x, f(x)> with probability 1-η and <x, -f(x)> with probability η. Given accuracy ε > 0, A outputs h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.

Page 34: Harmonic Analysis in  Learning Theory

Uniform Learning from a Statistical Query Oracle

[Model diagram] The statistical query oracle SQ(f) for the target f : {0,1}^n → {0,1} from class F (e.g., DNF) accepts a query (q(·,·), τ) from A and returns E_U[q(x, f(x))] ± τ. Given accuracy ε > 0, A outputs h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.

Page 35: Harmonic Analysis in  Learning Theory

SQ and Classification Noise Learning

• [K]
– If F is uniform SQ learnable in time poly(n, s_F, 1/ε, 1/τ) then F is uniform CN learnable in time poly(n, s_F, 1/ε, 1/τ, 1/(1-2η))
– Empirically, it is almost always true that if F is efficiently uniform learnable then F is efficiently uniform SQ learnable (i.e., 1/τ poly in the other parameters)
• Exception: F = PAR_n ≡ {χ_a : a ∈ {0,1}^n, |a| ≤ n}

Page 36: Harmonic Analysis in  Learning Theory

Uniform SQ Hardness for PAR

• [BFJKMR]
– Harmonic analysis shows that for any q, χ_a:

E_U[q(x, χ_a(x))] = q̂(0^{n+1}) + q̂(a∘1) (derived below)

– Thus the adversarial SQ response to (q, τ) is q̂(0^{n+1}) whenever |q̂(a∘1)| < τ
– Parseval: |q̂(b∘1)| < τ for all but at most 1/τ² Fourier coefficients
– So each 'bad' query eliminates only poly-many coefficients
– Even PAR_{log n} is not efficiently SQ learnable
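The identity can be checked in a few lines. Here is a sketch of the calculation, assuming the label bit is encoded as y ∈ {±1} (my assumption about the slide's convention; note q̂(0^n ∘ 0) is the same coefficient as q̂(0^{n+1})):

```latex
\begin{align*}
q(x,y) &= \sum_{b} \hat{q}(b \circ 0)\,\chi_b(x)
        + \sum_{b} \hat{q}(b \circ 1)\,\chi_b(x)\,y
        && \text{(Fourier expansion, label } y \in \{\pm 1\})\\
\mathbf{E}_U\!\left[q(x,\chi_a(x))\right]
       &= \sum_{b} \hat{q}(b \circ 0)\,\mathbf{E}[\chi_b(x)]
        + \sum_{b} \hat{q}(b \circ 1)\,\mathbf{E}[\chi_b(x)\,\chi_a(x)]\\
       &= \hat{q}(0^n \circ 0) + \hat{q}(a \circ 1)
        && \text{(orthonormality of the } \chi_b)
\end{align*}
```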

Page 37: Harmonic Analysis in  Learning Theory

Uniform Learning from an Attribute Noise Oracle

[Model diagram] The attribute noise oracle EX_{D_N}(f) for the target f : {0,1}^n → {0,1} from class F (e.g., DNF) draws uniform random x and a noise vector r~D_N, and returns <x⊕r, f(x)>. Given accuracy ε > 0 and noise model D_N, A outputs h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.

Page 38: Harmonic Analysis in  Learning Theory

Uniform Learning with Independent Attribute Noise

• [BJTa]:
– On noisy examples, the LMN algorithm produces estimates of f̂(a) · E_{r~D_N}[χ_a(r)] (corrected for in the sketch below)

• Example application
– Assume the noise process D_N is a product distribution: D_N(r) = ∏_i (p_i·r_i + (1-p_i)·(1-r_i))
– Assume p_i < 1/polylog n, 1/ε at most quasi-poly(n) (mild restrictions)
– Then modified LMN uniform learns attribute-noisy AC0 in quasi-poly time
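For product noise the attenuation factor factors across coordinates: E_{r~D_N}[χ_a(r)] = ∏_{i∈a} (1 - 2p_i), so the noisy coefficient estimate can be divided back out. A minimal sketch under that assumption (names are illustrative):

```python
def corrected_coefficient(noisy_examples, a, flip_probs):
    """Estimate f̂(a) from attribute-noisy examples <x⊕r, f(x)>, where
    bit i of r is 1 with probability flip_probs[i] (independent product
    noise). The raw estimate equals f̂(a)·∏_{i∈a}(1-2p_i), so divide by
    that attenuation factor to recover f̂(a). Here a is a tuple of
    coordinate indices and labels are in {-1, +1}."""
    m = len(noisy_examples)
    raw = sum(y * (-1) ** sum(x[i] for i in a) for x, y in noisy_examples) / m
    attenuation = 1.0
    for i in a:
        attenuation *= 1 - 2 * flip_probs[i]
    return raw / attenuation
```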

Page 39: Harmonic Analysis in  Learning Theory

Agnostic Learning Model

[Model diagram] The target f : {0,1}^n → {0,1} is now an arbitrary Boolean function. The example oracle EX(f) supplies uniform random examples <x, f(x)> to A, which outputs a hypothesis h : {0,1}^n → {0,1} with Pr_{x~U}[f(x) ≠ h(x)] minimized.

Page 40: Harmonic Analysis in  Learning Theory

Near-Agnostic Learning via LMN

• [KKM]:
– Let f be an arbitrary Boolean function
– Fix any set S of coefficient indices and fix ε
– Let g be any function s.t.:
• Σ_{a∉S} ĝ²(a) < ε, and
• Pr[f ≠ g] is minimized (call this η)
– Then for the h learned by LMN by estimating the coefficients of f over S:
• Pr[f ≠ h] < 4η + ε

Page 41: Harmonic Analysis in  Learning Theory

Average Case Uniform Learning Model

[Model diagram] As in the uniform learning model, except that the target f : {0,1}^n → {0,1} is itself drawn at random from a distribution D over the class F (e.g., DNF). The example oracle EX(f) supplies uniform random examples <x, f(x)> to A, which, given accuracy ε > 0, outputs h : {0,1}^n → {0,1} s.t. Pr_{x~U}[f(x) ≠ h(x)] < ε.

Page 42: Harmonic Analysis in  Learning Theory

Average Case Learning of DT

• [JSa]:
– D: uniform over complete, non-redundant log-depth DTs
– DT efficiently uniform learnable on average
– Output is a DT (proper learning)

Page 43: Harmonic Analysis in  Learning Theory

Average Case Learning of DT

• Technique
– [KM]: all Fourier coefficients of a DT with min depth d are rational with denominator 2^d (illustrated below)
– In an average-case tree, the coefficient f̂({i}) for at least one variable v_i has an odd numerator
• So log(denominator) is the min depth of the tree
– Try all variables at the root and find the depth of the child trees, choosing the root with the shallowest children
– Recurse on the child trees to choose their roots
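The denominator fact is easy to experiment with, since coefficients of a small tree can be computed exactly by brute force. A minimal sketch (for small n only; names and the example tree are illustrative):

```python
import itertools
from fractions import Fraction

def exact_coefficient(f, n, a):
    """Exact Fourier coefficient f̂(a) by enumerating all of {0,1}^n.
    f maps a 0/1 tuple to {-1, +1}; a is a 0/1 tuple. For a depth-d
    decision tree, the result is rational with denominator dividing 2^d."""
    total = Fraction(0)
    for x in itertools.product((0, 1), repeat=n):
        chi = -1 if sum(ai & xi for ai, xi in zip(a, x)) % 2 else 1
        total += f(x) * chi
    return total / 2 ** n

# Example: a depth-2 tree ("if x0 then leaf(x1) else leaf(x2)", ±1 output).
tree = lambda x: (1 if x[1] else -1) if x[0] else (1 if x[2] else -1)
print(exact_coefficient(tree, 3, (0, 1, 0)))  # -1/2; denominator divides 4
```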

Page 44: Harmonic Analysis in  Learning Theory

Average Case Learning of DNF

• [JSb]:
– D: s terms, each term drawn uniformly from the terms of length log s
– Monotone DNF with < n² terms and DNF with < n^{1.5} terms are properly and efficiently uniform learnable on average

• Harmonic property
– In an average-case DNF, the sign of f̂({i,j}) (usually) indicates whether v_i and v_j appear in a common term or not

Page 45: Harmonic Analysis in  Learning Theory

Summary

• Most uniform-learning results depend on harmonic analysis

• Learning theory provides motivation for new harmonic observations

• Even very “weak” harmonic results can be useful in learning-theory algorithms

Page 46: Harmonic Analysis in  Learning Theory

Some Open Problems

• Efficient uniform learning of monotone DNF
– Best to date for small s_DNF is [S], time ~ n·s^{log s} (based on [BT], [M], [LMN])

• Non-uniform learning
– Relatively easy to extend many results to product distributions, e.g. [FJS] extends [LMN]
– Key issue in real-world applicability

Page 47: Harmonic Analysis in  Learning Theory

Open Problems (cont’d)

• Weaker dependence on ε
– Several algorithms are fully exponential (or worse) in 1/ε

• Additional proper learning results
– Proper hypotheses allow for interpretation of the learned hypothesis