Hypothesis Testing for Structured Probability Distributions
Ilias Diakonikolas (USC)
Joint work with Daniel Kane (UCSD)
Vladimir Nikishkin (Edinburgh)
What this talk is about
Basic object of study: probability distributions over an ordered domain: [n] = {1, . . . , n} or I = [a, b] ⊆ R.
Notation: p, q denote either a pmf or a pdf.
Menu
Explaining the title:
• Let D be a family of probability distributions.
• Identity Testing Problem: given samples from an unknown p ∈ D and a known/unknown q ∈ D,
− Distinguish between the cases p = q and dTV(p, q) > ε.
− Minimize sample size and computation time.
Total Variation Distance: dTV(p, q) = (1/2)‖p − q‖_1.
This Talk
Unified Framework for Identity Testing: leads to sample-optimal and computationally efficient estimators for a variety of structured distribution families (with matching information-theoretic lower bounds).
Outline
§ Introduction, Related and Prior Work § Framework Overview § Testing Identity to a Fixed Distribution § Testing Closeness between two Unknown Distributions § Future Directions and Concluding Remarks
Distribution Testing (Hypothesis Testing)
Given samples (observations) from one (or more) unknown probability distribution(s) (model), decide whether it satisfies a certain property.
• Introduced by Karl Pearson (’99).
• Classical Problem in Statistics [Neyman-Pearson’33, Lehmann-Romano’05]
• Last twenty years (TCS): property testing [Goldreich-Ron’00, Batu et al. FOCS’00/JACM’13]
Related Work – Property Testing (I)
Focus has been on arbitrary distributions over support of size n.
Testing Identity to an explicitly known Distribution:
• [Goldreich-Ron’00]: O(√n/ε^4) upper bound for uniformity testing (collision statistics).
• [Batu et al., FOCS’01]: O(√n) · poly(1/ε) upper bound for testing identity to any known distribution.
• [Paninski ’03]: upper bound of O(√n/ε^2) for uniformity testing, assuming ε = Ω(n^{−1/4}). Lower bound of Ω(√n/ε^2).
• [Valiant-Valiant, FOCS’14; D-Kane-Nikishkin, SODA’15]: upper bound of O(√n/ε^2) for identity testing to any known distribution.
Related Work – Property Testing (II)
Focus has been on arbitrary distributions over support of size n.
Testing Closeness between two unknown distributions:
• [Batu et al., FOCS’00]: O(n^{2/3} log n / ε^{8/3}) upper bound for testing closeness between two unknown discrete distributions.
• [P. Valiant, STOC’08]: lower bound of Ω(n^{2/3}) for constant error.
• [Chan-D-Valiant-Valiant, SODA’14]: tight upper and lower bound of Θ(max{n^{2/3}/ε^{4/3}, n^{1/2}/ε^2}).
Summary of Related Work
(support size: n, total variation distance error: ε)

Problem           | Tight Bound                           | Reference
Testing Identity  | Θ(n^{1/2}/ε^2)                        | [Valiant-Valiant’14, D-Kane-Nikishkin’15]
Testing Closeness | Θ(max{n^{2/3}/ε^{4/3}, n^{1/2}/ε^2})  | [Chan-D-Valiant-Valiant’14]
Learning          | Θ(n/ε^2)                              | [folklore]
Estimating Structured Distributions
• Statistical Estimation well-understood for arbitrary discrete distributions.
• How about for structured distributions?
• Long line of work in statistics since the 1950’s [Grenander’56, Rao’69, Wegman’70, Birgé’87, …]. Focus has been on density estimation (learning).
• [Batu-Kumar-Rubinfeld, STOC’04]: identity testing of monotone distributions.
• [Daskalakis-D-Servedio-Valiant-Valiant, SODA’13]: generalization to
k-modal distributions.
Types of Structured Distributions
• Distributions with “shape restrictions”: monotone, bimodal, log-concave, …
• Simple combinations of simple distributions:
− Mixtures of simple distributions (e.g., mixtures of Gaussians)
− Sums of simple distributions (e.g., Poisson Binomial Distributions)
Outline
§ Introduction, Related and Prior Work § Framework Overview § Testing Identity to a Fixed Distribution § Testing Closeness between two Unknown Distributions § Future Directions and Concluding Remarks
First Step: Changing the metric
Identity Testing Problem for family D. Given (sample) access to p, q ∈ D:
• Output “YES” (with high probability) if p = q (completeness)
• Output “NO” (with high probability) if ‖p − q‖_1 ≥ ε (soundness)
Reduces to the Identity Testing Problem under the A_k-distance. Given (sample) access to p, q:
• Output “YES” (with high probability) if p = q
• Output “NO” (with high probability) if ‖p − q‖_{A_k} ≥ ε
A_k-Distance between Distributions (I)
Definition. For p, q : R → [0, 1] and k ≥ 2, we define the A_k-distance between p, q as
‖p − q‖_{A_k} = sup_{I = (I_i)_{i=1}^k} Σ_{i=1}^k |p(I_i) − q(I_i)|,
where the supremum is over collections of k disjoint intervals I_1, . . . , I_k.
Facts:
• For k = 2, (essentially) equivalent to the Kolmogorov distance.
• For any k ≥ 2, we have ‖p − q‖_{A_k} ≤ ‖p − q‖_1.
• We have: lim_{k→∞} ‖p − q‖_{A_k} = ‖p − q‖_1.
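For discrete distributions given as explicit probability vectors, the A_k-distance above can be computed exactly by dynamic programming over domain positions: since |x| = max(x, −x), each interval can be counted with a chosen sign, and the maximization picks the right one. A minimal sketch (the function name and DP layout are ours, not from the talk):

```python
def ak_distance(p, q, k):
    # A_k distance between discrete distributions p, q on {0, ..., n-1}:
    # max over <= k disjoint intervals of sum_I |p(I) - q(I)|.
    n = len(p)
    d = [p[i] - q[i] for i in range(n)]
    NEG = float("-inf")
    # best[j][s]: best value on a prefix, having used j intervals;
    # s = 0: outside any interval, s = 1: inside a "+"-signed interval,
    # s = 2: inside a "-"-signed interval.
    best = [[NEG] * 3 for _ in range(k + 1)]
    best[0][0] = 0.0
    for x in d:
        new = [[NEG] * 3 for _ in range(k + 1)]
        for j in range(k + 1):
            for s in range(3):
                v = best[j][s]
                if v == NEG:
                    continue
                # close the current interval (or stay outside)
                new[j][0] = max(new[j][0], v)
                # extend the current interval with the same sign
                if s == 1:
                    new[j][1] = max(new[j][1], v + x)
                if s == 2:
                    new[j][2] = max(new[j][2], v - x)
                # open a new interval at this position
                if j < k:
                    new[j + 1][1] = max(new[j + 1][1], v + x)
                    new[j + 1][2] = max(new[j + 1][2], v - x)
        best = new
    return max(best[j][s] for j in range(k + 1) for s in range(3))
```

For k equal to the support size this recovers the L1 distance (each point its own interval), matching the limit fact above.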
A_k-Distance between Distributions (II)
Definition (restated). For p, q : R → [0, 1] and k ≥ 1,
‖p − q‖_{A_k} = sup_{I = (I_i)_{i=1}^k} Σ_{i=1}^k |p(I_i) − q(I_i)|.
Upper Bound on Sample Complexity: For a family D of one-dimensional distributions and ε > 0, let k = k(D, ε) be the smallest integer such that for any p, q ∈ D it holds
‖p − q‖_1 ≤ ‖p − q‖_{A_k} + ε/2.
Then, the parameter k is the “right” complexity measure for estimating a property of the family D.
Overview of Framework
(Error parameter: ε > 0)
• Approximation (Existential Step): find the smallest k = k(D, ε) such that for all p, q ∈ D, ‖p − q‖_1 ≤ ‖p − q‖_{A_k} + ε/2.
• Algorithmic Step: run an identity tester under the A_k-distance with error parameter ε′ = ε/2, and output its YES/NO answer.
Together, the two steps yield an L1-identity tester for D.
Second Step: Design an A_k-Distance Tester
Identity Testing Problem under the A_k-distance. Given (sample) access to p, q:
• Output “YES” (with high probability) if p = q
• Output “NO” (with high probability) if ‖p − q‖_{A_k} ≥ ε
Two fundamentally different regimes:
• One of p, q known explicitly [Testing Identity to a Fixed Distribution].
• Both p, q unknown [Testing Closeness].
A_k-distance vs L1 distance
(tight bounds)

Problem           | Support [n], L1 distance              | A_k-distance
Testing Identity  | Θ(n^{1/2}/ε^2)                        | Θ(k^{1/2}/ε^2)
Testing Closeness | Θ(max{n^{2/3}/ε^{4/3}, n^{1/2}/ε^2})  | ?
Learning          | Θ(n/ε^2)                              | Θ(k/ε^2) [VC]
Outline
§ Introduction, Related and Prior Work § Framework Overview § Testing Identity to a Fixed Distribution § Testing Closeness between two Unknown Distributions § Future Directions and Concluding Remarks
A_k-Testing Identity to a Fixed Distribution
Theorem [D-Kane-Nikishkin’15]. For any ε > 0, k ≥ 2, and any explicit q, there exists a computationally efficient algorithm that distinguishes between the case p = q versus ‖p − q‖_{A_k} ≥ ε with constant error probability using O(k^{1/2}/ε^2) samples from p. Moreover, this sample size is information-theoretically necessary for this task.
Remark:
• The upper bound holds both for discrete and continuous distributions.
Applications: L1-Identity Testing for Structured Distributions

Distribution Family    | Parameter k          | Sample Size
t-flat                 | k = O(t)             | O(t^{1/2}/ε^2)
t-piecewise degree-d   | k = O(t(d+1))        | O((t(d+1))^{1/2}/ε^2)
Log-concave            | k = O(ε^{−1/2})      | O(ε^{−9/4})
Log-concave t-mixture  | k = O(t ε^{−1/2})    | O(t^{1/2}/ε^{9/4})
t-modal over [n]       | k = O(t log(n)/ε)    | O((t log n)^{1/2}/ε^{5/2})
MHR over [n]           | k = O(log(n)/ε)      | O((log n)^{1/2}/ε^{5/2})
A_k-Identity Testing: Basic Facts
Lemma: Identity testing reduces to uniformity testing.
Proof Idea: Appropriately “stretch” the domain size. Henceforth, focus on uniformity testing.
Observation: If we knew the partition J_1, . . . , J_k maximizing the discrepancy, i.e.,
‖p − U‖_{A_k} = Σ_{j=1}^k |p(J_j) − U(J_j)|,
we could reduce to L1-identity testing over a domain of size k.
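The “stretching” reduction can be sketched as follows for a discrete known q whose masses are (approximately) rational; the helper names and the rounding precision are our own illustration, not the talk’s exact construction:

```python
import random

def make_stretcher(q, precision=1000):
    # Split element i into a block of roughly q_i * precision sub-elements,
    # so that the known distribution q maps to the (near-)uniform
    # distribution over the stretched domain of size n_new.
    copies = [max(1, round(qi * precision)) for qi in q]
    offsets = [0]
    for c in copies:
        offsets.append(offsets[-1] + c)

    def stretch(i):
        # A sample i from p is sent to a uniformly random sub-element of
        # block i; if p = q, the stretched sample is (near-)uniform.
        return offsets[i] + random.randrange(copies[i])

    return stretch, offsets[-1]
```

Testing identity of p to q then amounts to testing uniformity of the stretched samples.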
A_k-Uniformity Testing: First Approach
• Partition the domain into ℓ = 10k/ε intervals I_1, . . . , I_ℓ of equal length.
• Apply an L1-uniformity tester to the reduced distribution over these intervals.
Claim: ‖p − U‖_{A_k} − ε/2 ≤ Σ_{i=1}^ℓ |p(I_i) − U(I_i)| ≤ ‖p − U‖_{A_k}.
Sample Complexity: O(ℓ^{1/2}/ε^2) = O(k^{1/2}/ε^{5/2}).
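The first approach can be sketched as follows. For clarity, the L1 tester below is a simple plug-in comparison of the empirical reduced distribution against uniform, which needs O(ℓ/ε^2) samples; the O(ℓ^{1/2}/ε^2) rate quoted in the talk requires a collision/chi-square-style tester instead. Function names and the acceptance threshold are illustrative:

```python
import math

def reduced_counts(samples, ell):
    # "Flatten" samples from [0, 1) into ell equal-length intervals.
    counts = [0] * ell
    for x in samples:
        counts[min(int(x * ell), ell - 1)] += 1
    return counts

def uniformity_test_first_approach(samples, k, eps):
    # Partition into ell = ceil(10 * k / eps) intervals, then run a plug-in
    # L1 comparison of the reduced empirical distribution against uniform.
    ell = math.ceil(10 * k / eps)
    m = len(samples)
    counts = reduced_counts(samples, ell)
    l1 = sum(abs(c / m - 1.0 / ell) for c in counts)
    return "YES" if l1 <= eps / 2 else "NO"
```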
A_k-Uniformity Testing: Optimal Algorithm
• Construct several oblivious decompositions of the domain.
• Use an L2-uniformity tester over the reduced distributions.
In more detail:
• Consider M = log(1/ε) equal-length interval partitions I^{(j)} of the domain; partition I^{(j)} consists of ℓ_j = k · 2^j intervals.
• For each j, apply an L2-uniformity tester with L2-error ε_j = ε · 2^{3j/8}/ℓ_j^{1/2}.
• Accept if and only if all testers accept.
Structural Lemma: One of the partitions will detect the discrepancy.
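The multi-scale parameters can be written out directly. The sketch below only computes the partition sizes and per-level L2-error targets from the slide (we take j = 1, …, M with M rounded up; the L2 testers themselves are omitted):

```python
import math

def partition_schedule(k, eps):
    # M = log(1/eps) oblivious equal-length partitions; partition j has
    # ell_j = k * 2^j intervals and L2-error target
    # eps_j = eps * 2^(3j/8) / sqrt(ell_j).
    M = max(1, math.ceil(math.log2(1.0 / eps)))
    schedule = []
    for j in range(1, M + 1):
        ell_j = k * 2 ** j
        eps_j = eps * 2 ** (3 * j / 8) / math.sqrt(ell_j)
        schedule.append((j, ell_j, eps_j))
    return schedule
```

Coarser partitions catch discrepancy spread over few intervals; finer ones (with looser per-level error) catch discrepancy hidden at small scales.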
Outline
§ Introduction, Related and Prior Work § Framework Overview § Testing Identity to a Fixed Distribution § Testing Closeness between two Unknown Distributions § Future Directions and Concluding Remarks
A_k-distance vs L1 distance
(tight bounds)

Problem           | Support [n], L1 distance              | A_k-distance
Testing Identity  | Θ(n^{1/2}/ε^2)                        | Θ(k^{1/2}/ε^2)
Testing Closeness | Θ(max{n^{2/3}/ε^{4/3}, n^{1/2}/ε^2})  | Θ(max{k^{4/5}/ε^{6/5}, k^{1/2}/ε^2})
Learning          | Θ(n/ε^2)                              | Θ(k/ε^2) [VC]
A_k-Equivalence Testing
Theorem. For any ε > 0 and k ≥ 2, and any distributions p, q, there exists a computationally efficient algorithm that distinguishes between the case p = q versus ‖p − q‖_{A_k} ≥ ε with constant error probability using O(max{k^{4/5}/ε^{6/5}, k^{1/2}/ε^2}) samples. Moreover, this sample size is information-theoretically necessary for this task.
Remarks:
• The upper bound holds both for discrete and continuous distributions.
• The lower bound applies to continuous distributions, or discrete distributions over a domain of size N ≥ 2^{poly(k)}.
Ak
A_k-Closeness Testing: Basic Facts
• No oblivious decomposition can work: the discrepancy may be hidden in intervals even though the reduced distributions are the same.
• Can partition the domain into “light” intervals, and apply a standard closeness tester on the reduced distributions over these intervals.
• This inherently leads to Ω(k)-sample algorithms: we need an adaptive partition in which at least one distribution has small mass.
• How do we obtain o(k) sample size?
A_k-Closeness Testing Algorithm
Consider the following “order-based” algorithm:
• Let m = O(k^{4/5}/ε^{6/5}). Draw m_1 = Poi(m) samples S_p from p, and m_2 = Poi(m) samples S_q from q.
• Let S be the union of S_p and S_q sorted in increasing order.
• Let Z = #(pairs of consecutive elements of S from the same distribution) − #(pairs of consecutive elements of S from different distributions).
• If Z > 3√m, return “NO”; otherwise, return “YES.”
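The order-based statistic is easy to implement. The sketch below assumes continuous samples (no ties) and, for simplicity, uses fixed sample sizes rather than the Poissonized Poi(m) sizes of the algorithm:

```python
import math

def order_statistic_Z(sample_p, sample_q):
    # Merge the two labeled samples, sort by value, and count consecutive
    # pairs with equal labels minus pairs with different labels.
    labeled = sorted([(x, 0) for x in sample_p] + [(x, 1) for x in sample_q])
    z = 0
    for (_, a), (_, b) in zip(labeled, labeled[1:]):
        z += 1 if a == b else -1
    return z

def closeness_test(sample_p, sample_q):
    # Reject (output "NO") when Z exceeds the 3 * sqrt(m) threshold.
    m = (len(sample_p) + len(sample_q)) / 2.0
    z = order_statistic_Z(sample_p, sample_q)
    return "NO" if z > 3 * math.sqrt(m) else "YES"
```

Intuitively, when p = q the labels of the sorted merged sample look like fair coin flips and Z concentrates near 0; when p and q differ, samples from the same distribution cluster together and Z grows.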
Closeness Testing: Sketch of Analysis
• Bound the mean E[Z] and variance Var[Z], then apply concentration.
• Completeness: E[Z] = 0 and Var[Z] = 2m − 1.
• Soundness: the main technical step is bounding E[Z] from below.
− Easy to argue: Var[Z] = O(m).
− Highly non-trivial: E[Z] = Ω(m^3 ε^3 / k^2).
Outline
§ Introduction, Related and Prior Work § Framework Overview § Testing Identity to a Fixed Distribution § Testing Closeness between two Unknown Distributions § Future Directions and Concluding Remarks
Future Directions
Unified Technique for Identity Testing: Use the A_k-distance as a proxy.
Concrete Open Problems:
• Understanding the regime log n ≤ k ≤ n [DKN’16].
• Testing Other Properties of Structured Distributions: Independence, Entropy, etc.
A Few Open-ended Challenges:
• Other Criteria: Privacy, Communication
• High-Dimensional Structured Distributions
• Tradeoffs between sample size and computational efficiency?

Thank you for your attention!
Sketch of Lower Bound (I)
• Suppose the algorithm only considers the ordering of the samples.
• Consider the following instance: [Figure: the domain is divided into k buckets, each subdivided into 2k mini-buckets; within each bucket, either p = q or p and q differ.]
• If fewer than 3 samples land in a mini-bucket, they carry no useful information.
Sketch of Lower Bound (II)
• If fewer than 3 samples land in a mini-bucket, an order-based tester gets no useful information from them.
• Expected number of mini-buckets with ≥ 3 samples: ≈ k · (m/k)^3 = m^3/k^2.
• Need this quantity to be Ω(√m); for constant ε, solving m^3/k^2 ≥ √m gives m = Ω(k^{4/5}).
How about for general testers?
• Can embed the above instance into a larger domain, so that order-based testers suffice.
• Non-constructive argument (Ramsey’s theorem).