foundations of data science course ramesh hariharan jan 2014hariharan-ramesh.com/ppts/ndim.pdf ·...

High Dimensional Spaces

Foundations of Data Science Course

Ramesh Hariharan

Jan 2014

What is Volume?

Volume of a cuboid with sides $l_1, \dots, l_n$ is $l_1 \cdot l_2 \cdots l_n$

For a general object, integrate:

Decompose the object into infinitesimal n-dimensional cuboids

Count the number of such cuboids

Scaling each dimension by $r$ multiplies volume by $r^n$.

Volume of an n-Dimensional Sphere

$V_n(r) = f_n \, r^n$ for radius $r$

$f_1 = 2$

$f_2 = \pi$

$f_3 = \frac{4}{3}\pi$

Does $f_n$ increase or decrease with $n$?

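Inductive View of $f_n$

Inductive Derivation for $f_n$

$f_n = 2 f_{n-1} \int_0^{\pi/2} \sin^n\theta \, d\theta$, for $n \ge 2$

$f_1 = 2$

$f_n = 2^n \int_0^{\pi/2} \sin^n\theta \, d\theta \int_0^{\pi/2} \sin^{n-1}\theta \, d\theta \cdots \int_0^{\pi/2} \sin^1\theta \, d\theta$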

Volume of a 1, 2, 3, 4-Dimensional Sphere

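$f_1 = 2$

$f_2 = 2^2 \int_0^{\pi/2} \sin^2\theta \, d\theta \int_0^{\pi/2} \sin^1\theta \, d\theta = \pi$

$f_3 = 2^3 \int_0^{\pi/2} \sin^3\theta \, d\theta \int_0^{\pi/2} \sin^2\theta \, d\theta \int_0^{\pi/2} \sin^1\theta \, d\theta = \frac{4}{3}\pi$

$f_4 = 2^4 \int_0^{\pi/2} \sin^4\theta \, d\theta \int_0^{\pi/2} \sin^3\theta \, d\theta \int_0^{\pi/2} \sin^2\theta \, d\theta \int_0^{\pi/2} \sin^1\theta \, d\theta = \frac{\pi^2}{2}$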
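As a numerical sanity check on these values, the recursion $f_n = 2 f_{n-1} \int_0^{\pi/2} \sin^n\theta \, d\theta$ can be evaluated directly. The sketch below is illustrative only: the function names and the midpoint-rule integrator are assumptions, not part of the course material.

```python
import math

def sine_power_integral(n, steps=20000):
    """Midpoint-rule estimate of the integral of sin^n(theta) over [0, pi/2]."""
    h = (math.pi / 2) / steps
    return h * sum(math.sin((i + 0.5) * h) ** n for i in range(steps))

def unit_sphere_factor(n):
    """f_n from the recursion f_n = 2 * f_{n-1} * I_n, starting at f_1 = 2."""
    f = 2.0
    for j in range(2, n + 1):
        f = 2.0 * f * sine_power_integral(j)
    return f

# Compare against the closed forms 2, pi, 4*pi/3, pi^2/2.
for n in range(1, 5):
    print(n, unit_sphere_factor(n))
```

The printed values should agree with $2$, $\pi$, $\frac{4}{3}\pi$ and $\frac{\pi^2}{2}$ to several decimal places.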

Sine Power Integrals

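$\int_0^{\pi/2} \sin^n\theta \, d\theta = \frac{n-1}{n} \int_0^{\pi/2} \sin^{n-2}\theta \, d\theta$

$\int_0^{\pi/2} \sin^n\theta \, d\theta = \frac{n-1}{n} \cdot \frac{n-3}{n-2} \cdots \frac{1}{2} \cdot \frac{\pi}{2}$, for even $n$

$\int_0^{\pi/2} \sin^n\theta \, d\theta = \frac{n-1}{n} \cdot \frac{n-3}{n-2} \cdots \frac{2}{3}$, for odd $n$

$\int_0^{\pi/2} \sin^n\theta \, d\theta \int_0^{\pi/2} \sin^{n-1}\theta \, d\theta = \frac{\pi}{2n}$

$\sqrt{\frac{\pi}{2(n+1)}} \le \int_0^{\pi/2} \sin^n\theta \, d\theta \le \sqrt{\frac{\pi}{2n}}$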
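The reduction formula, the product identity and the two-sided bound can be checked exactly with a few lines of code. This is a minimal sketch; the function name is hypothetical and the starting values $I_0 = \frac{\pi}{2}$, $I_1 = 1$ are the standard base cases.

```python
import math

def sine_power_integral(n):
    """I_n = integral of sin^n over [0, pi/2], via I_n = (n-1)/n * I_{n-2}."""
    value = math.pi / 2 if n % 2 == 0 else 1.0  # I_0 = pi/2, I_1 = 1
    for j in range(2 + n % 2, n + 1, 2):
        value *= (j - 1) / j
    return value

for n in (1, 2, 5, 10, 50):
    i_n = sine_power_integral(n)
    lower = math.sqrt(math.pi / (2 * (n + 1)))
    upper = math.sqrt(math.pi / (2 * n))
    product = i_n * sine_power_integral(n - 1)
    print(n, lower, i_n, upper, product, math.pi / (2 * n))
```

Each row shows the lower bound, $I_n$, the upper bound, and the product $I_n I_{n-1}$ next to $\frac{\pi}{2n}$.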

The Formula for $f_n$

$f_n = \dfrac{\pi^{n/2}}{(n/2)!}$, for even $n$

$f_n = \dfrac{\pi^{(n-1)/2}}{\frac{n}{2}\left(\frac{n}{2}-1\right)\cdots\frac{1}{2}}$, for odd $n$

$f_n \to 0$ as $n \to \infty$!

The biggest unit sphere sits in 5-d!
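The closed form is easy to tabulate, which makes both claims (the peak at $n = 5$ and the decay to zero) visible. The following is a small sketch under the formulas above; the function name is an assumption for illustration.

```python
import math

def unit_sphere_factor(n):
    """f_n from the closed form: pi^(n/2) / (n/2)! for even n, and
    pi^((n-1)/2) / ((n/2)(n/2 - 1)...(1/2)) for odd n."""
    numerator = math.pi ** (n // 2)  # n // 2 equals (n-1)/2 when n is odd
    denominator = 1.0
    x = n / 2
    while x > 0:                     # multiplies n/2, n/2 - 1, ... down to 1 or 1/2
        denominator *= x
        x -= 1
    return numerator / denominator

for n in range(1, 13):
    print(n, round(unit_sphere_factor(n), 4))
print(100, unit_sphere_factor(100))
```

The table rises until $n = 5$ and then falls; by $n = 100$ the unit sphere's volume is vanishingly small.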

The Unit Sphere vs the Unit Cube

Corners of a unit cube are at distance $\frac{\sqrt{n}}{2}$ from the origin

Center points of each side are at distance $\frac{1}{2}$ from the origin

It looks like this

Where is the Volume Concentrated?

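How much of the volume is located outside a band of angle $2\alpha$ around the equator?

$\dfrac{\int_0^{\pi/2-\alpha} \sin^n\theta \, d\theta}{\int_0^{\pi/2} \sin^n\theta \, d\theta}$

Denominator: $\int_0^{\pi/2} \sin^n\theta \, d\theta \ge \sqrt{\frac{\pi}{2(n+1)}}$

Numerator: $\int_0^{\pi/2-\alpha} \sin^n\theta \, d\theta \le\ ?$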

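$\int_0^{\pi/2-\alpha} \sin^n\theta \, d\theta \le\ ?$

$\int_0^{\pi/2-\alpha} \sin^n\theta \, d\theta = \frac{1}{2} \int_{\sin^2\alpha}^{1} \frac{1}{\sqrt{y}} (1-y)^{\frac{n-1}{2}} \, dy$, where $y = \cos^2\theta$

$\le \frac{1}{2\sin\alpha} \int_{\sin^2\alpha}^{1} e^{-y\frac{n-1}{2}} \, dy$

$\le \frac{1}{(n-1)\sin\alpha} \, e^{-\frac{n-1}{2}\sin^2\alpha}$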

Volume Fraction outside the 2α-angle Equatorial Band

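$\dfrac{\int_0^{\pi/2-\alpha} \sin^n\theta \, d\theta}{\int_0^{\pi/2} \sin^n\theta \, d\theta} \le \dfrac{\sqrt{\frac{2(n+1)}{\pi}}}{(n-1)\sin\alpha} \, e^{-\frac{n-1}{2}\sin^2\alpha}$

For $\alpha \sim \sin\alpha = \frac{1}{\sqrt{n}}$, this is $\sim \sqrt{\frac{2}{\pi e}} = .4839$

More than half the volume is in a $\frac{2}{\sqrt{n}}$ angle band around the equator.

For $\sin\alpha = \frac{a}{\sqrt{n}}$, the above bound is $\sim \sqrt{\frac{2}{\pi}} \, \frac{1}{a} \, e^{-\frac{a^2}{2}}$

Reminiscent of the Normal distribution?

$2\int_a^\infty \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x^2}{2}} \, dx \le \sqrt{\frac{2}{\pi}} \, \frac{1}{a} \, e^{-\frac{a^2}{2}}$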
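To see how tight the equatorial-band bound is, the exact volume fraction can be computed by numerical integration and compared with the bound. The sketch below is an illustration under assumptions: the function names, the midpoint rule and the choice $n = 400$ are not from the slides.

```python
import math

def fraction_outside_band(n, alpha, steps=20000):
    """Ratio of the integrals of sin^n over [0, pi/2 - alpha] and [0, pi/2]."""
    def integral(upper):
        h = upper / steps
        return h * sum(math.sin((i + 0.5) * h) ** n for i in range(steps))
    return integral(math.pi / 2 - alpha) / integral(math.pi / 2)

def band_bound(n, alpha):
    """sqrt(2(n+1)/pi) / ((n-1) sin(alpha)) * exp(-(n-1)/2 * sin^2(alpha))."""
    s = math.sin(alpha)
    return math.sqrt(2 * (n + 1) / math.pi) / ((n - 1) * s) * math.exp(-(n - 1) / 2 * s * s)

n = 400
for a in (1.0, 2.0, 3.0):
    alpha = math.asin(a / math.sqrt(n))  # sin(alpha) = a / sqrt(n)
    print(a, fraction_outside_band(n, alpha), band_bound(n, alpha))
```

For $a = 1$ the bound is already close to its limiting value $\sqrt{\frac{2}{\pi e}}$, while the exact fraction sits noticeably below it.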

Do 2 Equators sum to more than the whole!

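Surface Area $A_n(r)$ of an $n$-Dimensional Sphere

$\int_0^r A_n(\rho) \, d\rho = V_n(r)$

$\frac{dV_n(r)}{dr} = A_n(r)$

$A_n(r) = a_n r^{n-1}$, and $a_n = n f_n$

$a_n = 2 a_{n-1} \int_0^{\pi/2} \sin^{n-2}\theta \, d\theta$

$a_2 = 2\pi$

$a_n = 2^{n-1}\pi \int_0^{\pi/2} \sin^{n-2}\theta \, d\theta \int_0^{\pi/2} \sin^{n-3}\theta \, d\theta \cdots \int_0^{\pi/2} \sin^1\theta \, d\theta$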

Inductive View of $a_n$

Dot Product between a Fixed Unit Vector and a Random Unit Vector

A Spherically Symmetric Random Unit Vector: the probability of lying in any specific patch $P$ on the surface is proportional to the area of $P$.

The Dot Product is also the length of the projection of the fixed vector on the random vector.

The Dot Product equals $\cos\theta$, where $\theta$ is the angle between the two vectors.

$E(\cos^2\theta)$, $\mathrm{Var}(\cos^2\theta)$, and tail bounds on $\cos^2\theta$?

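$E(\cos^2\theta)$

$= \dfrac{\int_0^{\pi/2} \sin^{n-2}\theta \cos^2\theta \, d\theta}{\int_0^{\pi/2} \sin^{n-2}\theta \, d\theta}$

$= \dfrac{\int_0^{\pi/2} \sin^{n-2}\theta \, d\theta - \int_0^{\pi/2} \sin^{n}\theta \, d\theta}{\int_0^{\pi/2} \sin^{n-2}\theta \, d\theta}$

$= 1 - \frac{n-1}{n} = \frac{1}{n}$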

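$\mathrm{Var}(\cos^2\theta)$

$= \dfrac{\int_0^{\pi/2} \sin^{n-2}\theta \cos^4\theta \, d\theta}{\int_0^{\pi/2} \sin^{n-2}\theta \, d\theta} - \frac{1}{n^2}$

$= \dfrac{\int_0^{\pi/2} \sin^{n-2}\theta \, d\theta - 2\int_0^{\pi/2} \sin^{n}\theta \, d\theta + \int_0^{\pi/2} \sin^{n+2}\theta \, d\theta}{\int_0^{\pi/2} \sin^{n-2}\theta \, d\theta} - \frac{1}{n^2}$

$= 1 - 2\,\frac{n-1}{n} + \frac{(n-1)(n+1)}{n(n+2)} - \frac{1}{n^2} = \frac{2(n-1)}{n^2(n+2)} \le \frac{2}{n^2}$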
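The mean $\frac{1}{n}$ and variance $\frac{2(n-1)}{n^2(n+2)}$ can be checked by simulation. The sketch below assumes the standard way of drawing a spherically symmetric unit vector (normalizing a vector of independent Gaussians) and takes the fixed vector to be the first coordinate axis; both choices, and the function name, are illustrative assumptions.

```python
import math
import random

def squared_dot_samples(n, trials, seed=0):
    """Samples of cos^2(theta) between e_1 and a random unit vector in n dimensions."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm_sq = sum(x * x for x in g)
        # (g / |g|) . e_1 is g[0] / |g|, so its square is g[0]^2 / |g|^2.
        samples.append(g[0] * g[0] / norm_sq)
    return samples

n, trials = 20, 20000
samples = squared_dot_samples(n, trials)
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
expected_var = 2 * (n - 1) / (n * n * (n + 2))
print("mean", mean, "expected", 1 / n)
print("var ", var, "expected", expected_var)
```

With these sample sizes the empirical mean and variance should land within a few percent of the formulas.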

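Tail Bounds on $\cos^2\theta$

$\Pr\left(\cos^2\theta > \frac{a^2}{n}\right) = \dfrac{\int_0^{\cos^{-1}(a/\sqrt{n})} \sin^{n-2}\theta \, d\theta}{\int_0^{\pi/2} \sin^{n-2}\theta \, d\theta}$

Applying the equatorial-band bound with $n$ replaced by $n-2$ and $\sin\alpha = \frac{a}{\sqrt{n}}$:

$\le \sqrt{\frac{2(n-1)n}{\pi}} \, \frac{1}{(n-3)a} \, e^{-\frac{n-3}{2n}a^2} \sim \sqrt{\frac{2}{\pi}} \, \frac{1}{a} \, e^{-\frac{a^2}{2}}$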

Projection Length of Fixed Unit Vector on Random Unit Vector

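With probability at least $1 - \sqrt{\frac{2}{\pi}} \, \frac{1}{a} \, e^{-\frac{a^2}{2}}$ (for large $n$), the projected length is between $0$ and $\frac{a}{\sqrt{n}}$

With probability at least 0.946 (taking $a = 2$), the projected length is between $0$ and $\frac{2}{\sqrt{n}}$

Can we drive the projected length to be much more tightly distributed around $\frac{1}{\sqrt{n}}$?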

Project on to many Random Vectors

Let $X_1, \dots, X_k$ be the projection lengths on to $k$ independent random unit vectors

The resulting $k$-tuple defines a mapping from $n$-dimensional space to $k$-dimensional space

$X = \sqrt{X_1^2 + \cdots + X_k^2}$ is the length of the vector post-mapping

Consider $X^2 = X_1^2 + \cdots + X_k^2$

Sums of Random Variables

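Since $X_1^2, \dots, X_k^2$ are i.i.d., $E\left(\frac{X^2}{k}\right) = E(X_1^2)$ and $\mathrm{Var}\left(\frac{X^2}{k}\right) = \frac{\mathrm{Var}(X_1^2)}{k}$

I.e., the distribution of $\frac{X^2}{k}$ preserves the mean but is much tighter around the mean.

$\Pr\left(\left|\frac{X^2}{k} - E\left(\frac{X^2}{k}\right)\right| \ge \alpha\right) \ll \Pr\left(|X_1^2 - E(X_1^2)| \ge \alpha\right)$

$\Pr\left(|X^2 - E(X^2)| \ge k\alpha\right) \ll \Pr\left(|X_1^2 - E(X_1^2)| \ge \alpha\right)$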

Approximate Length Preservation in $k$-Dimensional Random Projection

$E(X^2) = \frac{k}{n}$, by Linearity of Expectation

$\mathrm{Var}(X^2) \le \frac{2k}{n^2}$, by Linearity of Variance under Independence

With probability $1 - ?$, $X^2$ is in $(1-\varepsilon)\frac{k}{n} \dots (1+\varepsilon)\frac{k}{n}$

If $?$ is as small as $m^{-3}$...

Union Bound: With probability $1 - m^{-1}$, lengths for $m^2$ distinct fixed vectors of arbitrary lengths are all simultaneously approximately preserved, modulo scaling by $\sqrt{\frac{n}{k}}$!!
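A small simulation illustrates the claim $E(X^2) = \frac{k}{n}$ and the rescaling by $\sqrt{\frac{n}{k}}$. This is a sketch under assumptions: random unit vectors are drawn by normalizing Gaussian vectors, and the helper names and the sizes $n = 100$, $k = 40$ are chosen only for illustration.

```python
import math
import random

def random_unit_vector(n, rng):
    """Spherically symmetric unit vector: a normalized vector of i.i.d. Gaussians."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def squared_projected_length(v, k, rng):
    """X^2: sum of squared dot products of v with k independent random unit vectors."""
    total = 0.0
    for _ in range(k):
        r = random_unit_vector(len(v), rng)
        total += sum(a * b for a, b in zip(v, r)) ** 2
    return total

rng = random.Random(1)
n, k, trials = 100, 40, 200
v = random_unit_vector(n, rng)  # the fixed unit vector
values = [squared_projected_length(v, k, rng) for _ in range(trials)]
mean = sum(values) / trials
print("mean X^2:", mean, " k/n:", k / n)
print("rescaled length in first trial:", math.sqrt(n / k * values[0]))
```

The rescaled length $\sqrt{\frac{n}{k}} X$ should hover around 1, the original length of the fixed vector.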

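Asymptotic Tight Concentration for $X^2$

By CLT, for $k \to \infty$, the distribution of $X^2 = \sum_{i=1}^{k} X_i^2$ tends to $N\left(\frac{k}{n}, \le \frac{2k}{n^2}\right)$

$\Pr\left(\left|X^2 - \frac{k}{n}\right| \ge \varepsilon\frac{k}{n}\right)$ should then be $\le \sqrt{\frac{4}{\varepsilon^2 k \pi}} \, e^{-\frac{\varepsilon^2 k}{4}}$

For $k > \frac{12 \log m}{\varepsilon^2}$, this is at most $\frac{1}{m^3}$

How do we show this for finite $k$?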

Tight Concentration and Tail Bound Inequalities

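Markov's Inequality for a non-negative random variable $Y$ and any $c > 0$

$\Pr(Y > c) \le E(Y)/c$

Chebyshev's Inequality

$\Pr\left(\left|X^2 - \frac{k}{n}\right| \ge \varepsilon\frac{k}{n}\right) \le \frac{\mathrm{Var}(X^2)}{\left(\varepsilon\frac{k}{n}\right)^2} \le \frac{2}{\varepsilon^2 k}$

Not strong enough to yield negative exponential dependence on $k$.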
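The gap between the two rates is easy to see numerically. The sketch below tabulates the Chebyshev bound $\frac{2}{\varepsilon^2 k}$ against the CLT-based estimate $\sqrt{\frac{4}{\varepsilon^2 k \pi}} \, e^{-\varepsilon^2 k / 4}$; the function names and the choice $\varepsilon = 0.5$ are illustrative assumptions.

```python
import math

def chebyshev_bound(eps, k):
    """Chebyshev bound 2 / (eps^2 k) on Pr(|X^2 - k/n| >= eps k/n)."""
    return 2.0 / (eps ** 2 * k)

def gaussian_heuristic_bound(eps, k):
    """CLT-based estimate sqrt(4 / (eps^2 k pi)) * exp(-eps^2 k / 4)."""
    return math.sqrt(4.0 / (eps ** 2 * k * math.pi)) * math.exp(-eps ** 2 * k / 4.0)

eps = 0.5
for k in (50, 100, 200, 400):
    print(k, chebyshev_bound(eps, k), gaussian_heuristic_bound(eps, k))
```

Doubling $k$ only halves the Chebyshev bound, while the exponential estimate is roughly squared.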

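Lower Tail Bound for $X^2$

Using Markov's inequality on $e^{-tX^2}$, where $t > 0$ (as in Chernoff Bounds):

$\Pr\left(X^2 < (1-\varepsilon)\frac{k}{n}\right) = \Pr\left(-tX^2 > -t(1-\varepsilon)\frac{k}{n}\right)$

$= \Pr\left(e^{-tX^2} > e^{-t(1-\varepsilon)\frac{k}{n}}\right) \le E(e^{-tX^2}) \, e^{t(1-\varepsilon)\frac{k}{n}}$

Since $X^2 = \sum_{i=1}^{k} X_i^2$ and the $X_i$'s are identical and independent:

$E(e^{-tX^2}) \, e^{t(1-\varepsilon)\frac{k}{n}} = E(e^{-tX_i^2})^k \, e^{t(1-\varepsilon)\frac{k}{n}}$

$E(e^{-tX_i^2}) \le\ ?$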

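$E(e^{-tX_i^2}) \le\ ?$

Using $1 - x \le e^{-x} \le 1 - x + \frac{x^2}{2}$, for all $x \ge 0$:

$E(e^{-tX_i^2}) \le E\left(1 - tX_i^2 + \frac{t^2 X_i^4}{2}\right)$

$\le 1 - \frac{t}{n} + \frac{3t^2}{2n^2} \le e^{-\frac{t}{n}\left(1 - \frac{3t}{2n}\right)}$, using $E(X_i^2) = \frac{1}{n}$ and $E(X_i^4) = \mathrm{Var}(X_i^2) + \frac{1}{n^2} \le \frac{3}{n^2}$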

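Completing the Lower Tail Bound for $X^2$

$\Pr\left(X^2 < (1-\varepsilon)\frac{k}{n}\right) \le E(e^{-tX_i^2})^k \, e^{t(1-\varepsilon)\frac{k}{n}}$

$\le e^{-\frac{kt}{n}\left(1 - \frac{3t}{2n}\right) + \frac{kt}{n}(1-\varepsilon)} = e^{-\frac{kt}{n}\left(\varepsilon - \frac{3t}{2n}\right)}$

Setting $t = \frac{n\varepsilon}{3} > 0$ to minimize the above:

$\Pr\left(X^2 < (1-\varepsilon)\frac{k}{n}\right) \le e^{-\frac{k\varepsilon}{3}\left(\varepsilon - \frac{\varepsilon}{2}\right)} = e^{-\frac{k\varepsilon^2}{6}}$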
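The choice $t = \frac{n\varepsilon}{3}$ can be recovered by minimizing the exponent directly. The worked step below is a sketch; the name $h(t)$ for the exponent is introduced here only for illustration.

```latex
\begin{align*}
h(t) &= -\frac{kt}{n}\left(\varepsilon - \frac{3t}{2n}\right), \qquad
h'(t) = -\frac{k}{n}\left(\varepsilon - \frac{3t}{n}\right) = 0
\;\Longrightarrow\; t = \frac{n\varepsilon}{3}, \\
h\!\left(\tfrac{n\varepsilon}{3}\right)
&= -\frac{k\varepsilon}{3}\left(\varepsilon - \frac{\varepsilon}{2}\right)
= -\frac{k\varepsilon^2}{6},
\qquad h''(t) = \frac{3k}{n^2} > 0 .
\end{align*}
```

Since $h''(t) > 0$, the stationary point is a minimum of the exponent and therefore of the bound.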

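Upper Tail Bound for $X^2$

As for the Lower Tail Bound, with $t > 0$:

$\Pr\left(X^2 > (1+\varepsilon)\frac{k}{n}\right) = \Pr\left(tX^2 > t(1+\varepsilon)\frac{k}{n}\right)$

$= \Pr\left(e^{tX^2} > e^{t(1+\varepsilon)\frac{k}{n}}\right) \le E(e^{tX^2}) \, e^{-t(1+\varepsilon)\frac{k}{n}}$

Since $X^2 = \sum_{i=1}^{k} X_i^2$ and the $X_i$'s are identical and independent:

$E(e^{tX^2}) \, e^{-t(1+\varepsilon)\frac{k}{n}} = E(e^{tX_i^2})^k \, e^{-t(1+\varepsilon)\frac{k}{n}}$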

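The Upper Tail Bound for $X^2$

Setting $y = \cos^2\theta$:

$E(e^{tX_i^2}) = \dfrac{\int_0^{\pi/2} \sin^{n-2}\theta \, e^{t\cos^2\theta} \, d\theta}{\int_0^{\pi/2} \sin^{n-2}\theta \, d\theta} \le \sqrt{\frac{2(n-1)}{\pi}} \cdot \frac{1}{2} \int_0^1 \frac{(1-y)^{\frac{n-3}{2}} \, e^{ty}}{\sqrt{y}} \, dy$

Using $1 - y \le e^{-y}$, $\forall y$:

$\le \sqrt{\frac{2(n-1)}{\pi}} \cdot \frac{1}{2} \int_0^1 \frac{e^{-y\left(\frac{n-3}{2} - t\right)}}{\sqrt{y}} \, dy$

Using $\int_0^{c} y^{-\frac{1}{2}} e^{-y} \, dy \le \sqrt{\pi}$ for every $c > 0$, after rescaling $y$ by $\frac{n-3}{2} - t > 0$:

$\le \sqrt{\frac{2(n-1)}{\pi}} \cdot \frac{\sqrt{\pi}}{2\sqrt{\frac{n-3}{2} - t}} = \sqrt{\frac{n-1}{n-3-2t}}$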

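Completing the Upper Tail Bound for $X^2$

$E(e^{tX_i^2})^k \, e^{-t(1+\varepsilon)\frac{k}{n}} \le \left(\sqrt{\frac{n-1}{n-3-2t}}\right)^k e^{-t(1+\varepsilon)\frac{k}{n}}$

Using $(1-x)^{-\frac{1}{2}} \le \sqrt{1 + x + 2x^2} \le e^{\frac{x}{2}(1+2x)}$, for $0 \le x \le \frac{1}{2}$, and constraining $0 < 2t < \frac{n-3}{2}$, $k \ll n$:

$\left(\sqrt{\frac{n-1}{n-3-2t}}\right)^k = \left(\sqrt{\frac{n-1}{n-3}}\right)^k \left(1 - \frac{2t}{n-3}\right)^{-\frac{k}{2}} \le e^{O\left(\frac{k}{n}\right) + \frac{tk}{n-3}\left(1 + \frac{4t}{n-3}\right)}$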

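So: $E(e^{tX_i^2})^k \, e^{-t(1+\varepsilon)\frac{k}{n}}$

$\le \left[e^{O\left(\frac{k}{n}\right) + \frac{tk}{n-3}\left(1 + \frac{4t}{n-3}\right)}\right] \left[e^{-t(1+\varepsilon)\frac{k}{n}}\right]$

$= e^{O\left(\frac{k}{n}\right) + \left[\frac{tk}{n-3} - \frac{tk}{n}\right] + \left[\frac{4t^2k}{(n-3)^2} - \frac{\varepsilon tk}{n}\right]}$

$\le e^{O\left(\frac{k}{n}\right) + \left[\frac{4t^2k}{(n-3)^2} - \frac{\varepsilon tk}{n}\right]}$

Setting $t = \frac{\varepsilon(n-3)^2}{8n}$ and assuming $k \ll n$, we get:

$\le e^{-\frac{\varepsilon^2 k}{16} + O\left(\frac{k}{n}\right)} \le 2e^{-\frac{\varepsilon^2 k}{16}}$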
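The value of $t$ and the resulting exponent follow from minimizing the bracketed quadratic in $t$. The worked step below is a sketch; the name $g(t)$ is introduced here only for illustration.

```latex
\begin{align*}
g(t) &= \frac{4t^2 k}{(n-3)^2} - \frac{\varepsilon t k}{n}, \qquad
g'(t) = \frac{8tk}{(n-3)^2} - \frac{\varepsilon k}{n} = 0
\;\Longrightarrow\; t = \frac{\varepsilon (n-3)^2}{8n}, \\
g\!\left(\tfrac{\varepsilon (n-3)^2}{8n}\right)
&= \frac{\varepsilon^2 k (n-3)^2}{16 n^2} - \frac{\varepsilon^2 k (n-3)^2}{8 n^2}
= -\frac{\varepsilon^2 k}{16}\left(1 - \frac{3}{n}\right)^2
= -\frac{\varepsilon^2 k}{16} + O\!\left(\frac{k}{n}\right).
\end{align*}
```

For $\varepsilon \le 1$ this choice also satisfies the earlier constraint $0 < 2t < \frac{n-3}{2}$, since $2t \le \frac{n-3}{4}$.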

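Wrapping Up: The Johnson-Lindenstrauss Theorem

Given $m$ points $a_1, \dots, a_m$ in $n$-dimensional space, $m \ge n$, and given $\varepsilon$, $0 < \varepsilon \le 1$.

Choose $k$ random unit vectors $r_1, \dots, r_k$, where $k = \frac{48 \ln m}{\varepsilon^2} \ll n$.

Define $k$-dimensional points $b_1, \dots, b_m$, where $b_i = (a_i \cdot r_1, a_i \cdot r_2, \dots, a_i \cdot r_k)$.

Consider any pair $a_i, a_j$. Then:

$\dfrac{|b_i - b_j|}{|a_i - a_j|} = \sqrt{\left(\frac{a_i - a_j}{|a_i - a_j|} \cdot r_1\right)^2 + \left(\frac{a_i - a_j}{|a_i - a_j|} \cdot r_2\right)^2 + \cdots + \left(\frac{a_i - a_j}{|a_i - a_j|} \cdot r_k\right)^2}$

Then $\sqrt{1-\varepsilon}\,\sqrt{\frac{k}{n}} \le \frac{|b_i - b_j|}{|a_i - a_j|} \le \sqrt{1+\varepsilon}\,\sqrt{\frac{k}{n}}$ with probability at least $1 - \frac{3}{m^3}$.

And this holds for all pairs simultaneously with probability at least $1 - \frac{3}{2m}$.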
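The constants in the theorem can be checked with a short calculation that combines the lower-tail bound $e^{-k\varepsilon^2/6}$ and the upper-tail bound $2e^{-\varepsilon^2 k/16}$ derived earlier. The sketch below is illustrative: the function names and the example values $m = 1000$, $\varepsilon = 0.5$ are assumptions, and the bounds themselves presuppose $n$ much larger than the resulting $k$.

```python
import math

def jl_dimension(m, eps):
    """Smallest integer k with k >= 48 ln(m) / eps^2."""
    return math.ceil(48.0 * math.log(m) / eps ** 2)

def pair_failure_bound(k, eps):
    """Lower-tail bound e^{-k eps^2/6} plus upper-tail bound 2 e^{-k eps^2/16}."""
    return math.exp(-k * eps ** 2 / 6.0) + 2.0 * math.exp(-k * eps ** 2 / 16.0)

m, eps = 1000, 0.5
k = jl_dimension(m, eps)
pairs = m * (m - 1) // 2  # number of distinct pairs, fewer than m^2 / 2
print("k =", k)
print("per-pair failure bound:", pair_failure_bound(k, eps), "vs 3/m^3 =", 3 / m ** 3)
print("all-pairs failure bound:", pairs * pair_failure_bound(k, eps), "vs 3/(2m) =", 3 / (2 * m))
```

The target dimension depends only on $m$ and $\varepsilon$, not on the original dimension $n$, which is the point of the theorem.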
