Sublinear Algorithms via Precision Sampling
Alexandr Andoni (Microsoft Research)
joint work with:
Robert Krauthgamer (Weizmann Inst.) Krzysztof Onak (CMU)
Goal
Compute the number of Dacians in the empire
Estimate S = a1 + a2 + … + an where ai ∈ [0,1]
sublinearly…
Sampling
Send accountants to a subset J of the provinces, |J| = m
Estimator: S̃ = (∑_{j∈J} aj) · n/m
Chebyshev bound: with 90% success probability,
0.5·S − O(n/m) < S̃ < 2·S + O(n/m)
For constant additive error, need m ~ n
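As a sanity check, the naive sampling estimator can be simulated directly. This is a toy sketch; the per-province values, sample size, and seeds are hypothetical:

```python
import random

def naive_sampling_estimate(a, m, seed=0):
    """Estimate S = sum(a) from m provinces sampled uniformly at random."""
    rng = random.Random(seed)
    n = len(a)
    sample = [a[rng.randrange(n)] for _ in range(m)]  # the subset J (with replacement)
    return sum(sample) * n / m                        # rescale by n/|J|

rng = random.Random(42)
a = [rng.random() for _ in range(100)]  # hypothetical per-province counts in [0,1]
S = sum(a)
S_est = naive_sampling_estimate(a, m=50)
print(S, S_est)  # the estimate concentrates around S as m grows
```

Driving the additive error down to a constant forces the sample size to grow linearly in n, which is exactly what Precision Sampling avoids.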
Send accountants to each province, but require only approximate counts
Estimate each ai up to some pre-selected precision ui: |ãi − ai| < ui
Challenge: achieve a good trade-off between
the quality of the approximation to S
the total cost of estimating each ai to precision ui
Precision Sampling Framework
Formalization
A game between a Sum Estimator and an Adversary:
1. The Adversary fixes a1, a2, …, an; the Estimator fixes the precisions ui
2. The Adversary fixes ã1, ã2, …, ãn s.t. |ãi − ai| < ui
3. Given ã1, ã2, …, ãn, the Estimator outputs S̃ s.t. |∑ai − S̃| < 1
What is the cost? Here, average cost = (1/n) · ∑ 1/ui
To achieve precision ui, one uses 1/ui “resources”: e.g., if ai is itself a sum ai = ∑j aij computed by subsampling, then one needs Θ(1/ui) samples
For example, one can choose all ui = 1/n:
average cost ≈ n
This is best possible if the estimator is S̃ = ∑ ãi
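A quick numeric check of the cost model (the value of n is hypothetical): choosing all ui = 1/n indeed gives average cost ≈ n.

```python
n = 1000
u = [1.0 / n] * n  # give every coordinate the same precision 1/n
avg_cost = sum(1.0 / ui for ui in u) / n  # average cost = (1/n) * sum_i 1/u_i
print(avg_cost)  # ≈ n, the best possible for the plain-sum estimator
```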
Precision Sampling Lemma
Goal: estimate ∑ai from {ãi} satisfying |ãi − ai| < ui
Precision Sampling Lemma: with 90% success probability, one can get O(1) additive error and 1.5 multiplicative error:
S − O(1) < S̃ < 1.5·S + O(1)
with average cost equal to O(log n)
Example: distinguish ∑ai = 5 vs ∑ai = 0. Consider two extreme cases:
if five ai = 1 (and the rest are 0): sample all, but only a crude approximation is needed (ui = 1/10)
if all ai = 5/n: only a few need a good approximation ui = 1/n, and the rest can have ui = 1
More generally, for any ε > 0: S − ε < S̃ < (1+ε)·S + ε, with average cost O(ε⁻³ · log n)
Precision Sampling Algorithm
Recall the Precision Sampling Lemma: with 90% success probability, one can get O(1) additive error and 1.5 multiplicative error:
S − O(1) < S̃ < 1.5·S + O(1)
with average cost equal to O(log n)
Algorithm:
Choose each ui ∈ [0,1] i.i.d.
Estimator: S̃ = count of the i’s s.t. ãi/ui > 6 (modulo a normalization constant)
Proof of correctness:
we use only those ãi which are (1+ε)-approximations to ai
E[S̃] ≈ ∑i Pr[ai/ui > 6] = ∑i ai/6
the average of the 1/ui is O(log n) w.h.p.
For the (1+ε) guarantee, S − ε < S̃ < (1+ε)·S + ε: the estimator becomes a function of [ãi/ui − 4/ε]⁺ and the ui’s, with the concrete distribution of ui a minimum of O(ε⁻³) uniform random variables, giving average cost O(ε⁻³ · log n)
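A toy simulation of the basic algorithm. Two assumptions are baked in: the normalization constant is taken to be the threshold 6 itself (since Pr[ai/ui > 6] = ai/6 for ui uniform in (0,1] and ai ≤ 6), and we feed in the exact ai rather than adversarially perturbed ãi:

```python
import random

def psl_estimate(a_tilde, u, threshold=6.0):
    """Count the i's with a_tilde[i]/u[i] > threshold, then rescale.
    Rescaling by `threshold` is an assumed normalization:
    E[count] = sum(a_i) / threshold for u_i uniform in (0,1]."""
    count = sum(1 for ai, ui in zip(a_tilde, u) if ai / ui > threshold)
    return threshold * count

rng = random.Random(1)
n = 10_000
a = [0.1 * rng.random() for _ in range(n)]   # hypothetical small a_i in [0, 0.1]
u = [1.0 - rng.random() for _ in range(n)]   # precisions u_i uniform in (0, 1]
S = sum(a)
S_est = psl_estimate(a, u)
print(S, S_est)  # S_est is close to S with good probability
```

Note how most coordinates draw a large ui and hence need only a crude estimate; the few small ui account for the O(log n) average cost.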
Why?
Save time:
Problem: computing the edit distance between two strings
New algorithm that obtains a (log n)^{1/ε} approximation in n^{1+O(ε)} time, via an efficient property-testing algorithm that uses Precision Sampling
More details: see the talk by Robi on Friday!
Save space:
Problem: compute norms / frequency moments in streams
Precision Sampling gives a simple and unified approach to compute all ℓp, Fk moments, and other goodies
More details: now
Streaming frequencies
Setup: (1+ε)-estimate the frequencies, in small space
Let xi = frequency of ethnicity i
k-th moment: ∑ xi^k
k ∈ [0,2]: space O(1/ε²) [AMS’96, I’00, GC’07, Li’08, NW’10, KNW’10, KNPW’11]
k > 2: space O(n^{1−2/k}) [AMS’96, SS’02, BYJKS’02, CKS’03, IW’05, BGKS’06, BO’10]
Sometimes the frequencies xi are negative, e.g., if measuring traffic differences (delay, etc.)
We want a linear “dimensionality reduction” L: Rⁿ → Rᵐ with m << n
Ethnicity   | Frequency
Dacians     | 358
Galois      | 12
Barbarians  | 2988
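For the toy table above, the moments are easy to compute exactly when the whole vector fits in memory; the streaming challenge is approximating these values without storing x:

```python
freq = {"Dacians": 358, "Galois": 12, "Barbarians": 2988}
x = list(freq.values())

def moment(x, k):
    """k-th frequency moment F_k = sum_i |x_i|^k."""
    return sum(abs(xi) ** k for xi in x)

print(moment(x, 1))  # F_1 = 3358 (total population counted)
print(moment(x, 2))  # F_2 = 358**2 + 12**2 + 2988**2 = 9056452
```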
Norm Estimation via Precision Sampling
Idea: use the PSL to compute the sum ||x||_k^k = ∑ |xi|^k
General approach:
1. Pick the ui’s according to the PSL and let yi = xi / ui^{1/k}
2. Compute all yi^k up to additive approximation O(1); this can be done by computing the heavy hitters of the vector y
3. Use the PSL to compute the sum ||x||_k^k = ∑ |xi|^k
The space bound is controlled by the norm ||y||₂, since heavy hitters under ℓ₂ is the best we can do
Note that ||y||₂ ≤ ||x||₂ · E[1/ui]
Streaming Fk moments
Theorem: there is a linear sketch for Fk with O(1) approximation, O(1) update time, and O(n^{1−2/k} · log n) space (in words).
Sketch:
Pick random ui ∈ [0,1] and si ∈ {±1}, and let yi = si · xi / ui^{1/k}
Throw the yi into one hash table H with m = O(n^{1−2/k} · log n) cells
Update: on (i, a), set H[h(i)] += si · a / ui^{1/k}
Estimator: max_{j∈[m]} |H[j]|^k
Randomness: O(1) independence suffices
[Diagram: the vector x = (x1, …, x6) is hashed into the table H; e.g., H = (y1+y3, y4, y2+y5+y6)]
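The sketch above can be written out as a toy implementation. Simplifications and assumptions: fully independent randomness instead of O(1)-wise independence, and a single table with no repetition for amplification, so the guarantee holds only with constant probability:

```python
import random

class FkSketch:
    """Toy linear sketch for F_k: y_i = s_i * x_i / u_i^(1/k), hashed into H."""
    def __init__(self, n, k, m, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.H = [0.0] * m                                # the hash table
        self.h = [rng.randrange(m) for _ in range(n)]     # hash i -> cell
        self.s = [rng.choice((-1, 1)) for _ in range(n)]  # random signs s_i
        self.u = [1.0 - rng.random() for _ in range(n)]   # u_i uniform in (0,1]

    def update(self, i, a):
        # linear update: on (i, a), add s_i * a / u_i^(1/k) to cell h(i)
        self.H[self.h[i]] += self.s[i] * a / self.u[i] ** (1.0 / self.k)

    def estimate(self):
        return max(abs(c) for c in self.H) ** self.k

n, k = 1000, 3
sk = FkSketch(n, k, m=256, seed=7)
x = [0.0] * n
for i in range(n):       # a toy stream of (index, increment) updates
    sk.update(i, 1.0)
    x[i] += 1.0
sk.update(0, 100.0)      # one heavy coordinate
x[0] += 100.0

F_k = sum(abs(xi) ** k for xi in x)
print(F_k, sk.estimate())  # estimate is typically within a constant factor of F_k
```

Because the sketch is linear in x, positive and negative updates are handled identically, matching the “dimensionality reduction” view above.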
More Streaming Algorithms
Other streaming algorithms:
Algorithm for all k-moments, including k ≤ 2
for k > 2, improves the existing space bounds [AMS’96, IW’05, BGKS’06, BO’10]
for k ≤ 2, worse space bounds than [AMS’96, I’00, GC’07, Li’08, NW’10, KNW’10, KNPW’11]
Improved algorithm for mixed norms (ℓp of ℓk) [CM’05, GBD’08, JW’09]: space bounded by the (Rademacher) p-type constant
Algorithm for the ℓp-sampling problem [MW’10]; this work was extended to give tight bounds by [JST’11]
Connections:
inspired by the streaming algorithm of [IW’05], but simpler
turns out to be a distant relative of Priority Sampling [DLT’07]
Finale
Other applications for the Precision Sampling framework?
Better algorithms for precision sampling?
Best bound for the average cost (for 1+ε approximation):
upper bound: O(1/ε³ · log n) (tight for our algorithm)
lower bound: Ω(1/ε² · log n)
Bounds for other cost models? E.g., if precision u costs 1/√u, the bound is O(1/ε^{3/2})
Other forms of “access” to the ai’s?