On Complexity, Sampling, and ε-Nets and ε-Samples

Upload: elissa-cramer

Post on 14-Dec-2015

Range Spaces

• A range space is a pair $S = (X, R)$, where $X$ is a ground set, whose elements are called points, and $R$ is a family of subsets of $X$, whose elements are called ranges.

• For example: $S = (\mathbb{R}^2, \{\text{all disks}\})$.

Measure

• Let $S = (X, R)$ be a range space and let $x$ be a finite subset of $X$. Then for $r \in R$ the measure is:

$$m(r) = \frac{|r \cap x|}{|x|}$$

Motivation

• For $N \subseteq x$ and $r \in R$, the estimation of $m(r)$ is defined by:

$$s(r) = \frac{|r \cap N|}{|N|}$$

• We want to find a sample $N \subseteq x$ such that for every $r \in R$, $m(r) \approx s(r)$.

More Definitions

• Let $S = (X, R)$ be a range space. For $Y \subseteq X$, let $R_{|Y} = \{ r \cap Y : r \in R \}$ be the projection of $R$ on $Y$.

• If $R_{|Y}$ contains all subsets of $Y$, then $Y$ is shattered by $R$.

• $\dim_{VC}(S)$ is the maximum cardinality of a subset of $X$ shattered by $R$.

Half-Spaces

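• Claim: let $P = \{p_1, p_2, \ldots, p_{d+2}\}$ be a set of points in $\mathbb{R}^d$. There are real numbers $\beta_1, \ldots, \beta_{d+2}$, not all of them zero, such that $\sum_i \beta_i p_i = 0$ and $\sum_i \beta_i = 0$ (the $d+2$ vectors $(p_i, 1) \in \mathbb{R}^{d+1}$ are linearly dependent).

• Theorem (Radon): Let $P = \{p_1, p_2, \ldots, p_{d+2}\}$ be a set of points in $\mathbb{R}^d$. Then there exist 2 disjoint subsets $C, D \subseteq P$ such that $CH(C) \cap CH(D) \neq \emptyset$ and $C \cup D = P$, where $CH(\cdot)$ denotes the convex hull.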
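The claim and Radon's theorem are constructive: the $\beta_i$ form a null vector of the $(d+1) \times (d+2)$ matrix with columns $(p_i, 1)$, and splitting the indices by the sign of $\beta_i$ gives $C$ and $D$. A minimal Python sketch, assuming exact rational coordinates (the helper names `null_vector` and `radon_partition` are our own):

```python
from fractions import Fraction

def null_vector(rows):
    """Return a nonzero beta with sum_j rows[i][j] * beta[j] == 0 for every row i.
    Assumes fewer rows than columns, so at least one free column exists."""
    a = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    pivot_cols, r = [], 0
    for c in range(n_cols):
        if r == n_rows:
            break
        piv = next((i for i in range(r, n_rows) if a[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column: it stays free
        a[r], a[piv] = a[piv], a[r]
        pivot = a[r][c]
        a[r] = [v / pivot for v in a[r]]
        for i in range(n_rows):  # eliminate column c from every other row
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [vi - factor * vr for vi, vr in zip(a[i], a[r])]
        pivot_cols.append(c)
        r += 1
    free = next(c for c in range(n_cols) if c not in pivot_cols)
    beta = [Fraction(0)] * n_cols
    beta[free] = Fraction(1)
    for row, c in enumerate(pivot_cols):
        beta[c] = -a[row][free]  # reduced row: beta[c] + a[row][free] * beta[free] = 0
    return beta

def radon_partition(points):
    """Split d+2 points of R^d into index lists C, D whose convex hulls share `point`."""
    d = len(points[0])
    assert len(points) == d + 2
    # Rows = coordinates plus an all-ones row: sum beta_i p_i = 0 and sum beta_i = 0.
    rows = [[p[j] for p in points] for j in range(d)] + [[1] * (d + 2)]
    beta = null_vector(rows)
    C = [i for i, b in enumerate(beta) if b > 0]
    D = [i for i, b in enumerate(beta) if b <= 0]
    total = sum(beta[i] for i in C)  # > 0, and equal to -(sum of beta over D)
    point = tuple(sum(beta[i] * points[i][j] for i in C) / total for j in range(d))
    return C, D, point

# A point inside a triangle: the common point is that point itself.
print(radon_partition([(0, 0), (4, 0), (0, 4), (1, 1)]))
```

The common point is a convex combination of $C$ by construction, and, because $\sum_{i \in C} \beta_i p_i = \sum_{i \in D} (-\beta_i) p_i$, also of $D$.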

Half-Spaces

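• Lemma: let $P \subset \mathbb{R}^d$ be a finite set, let $s \in CH(P)$, and let $h$ be a half-space of $\mathbb{R}^d$ containing $s$. Then there is a point of $P$ contained in $h$.

• Corollary: Let $S = (\mathbb{R}^d, \{\text{closed half-spaces}\})$. Then $\dim_{VC}(S) = d + 1$. Proof:
  – $\dim_{VC}(S) \ge d + 1$: the $d + 1$ vertices of a regular simplex are shattered.
  – $\dim_{VC}(S) < d + 2$: by Radon's theorem, any $d + 2$ points split into $C$ and $D$ with a common point $s \in CH(C) \cap CH(D)$. A half-space containing $C$ contains $CH(C)$, hence $s$, so by the lemma it also contains a point of $D$; thus $C$ cannot be cut out.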

The Growth Function

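• Let $G_\delta(n) = \sum_{i=0}^{\delta} \binom{n}{i}$ be the growth function. For $n, \delta \ge 2$ it satisfies $G_\delta(n) \le n^\delta$.

• Notice (by Pascal's rule $\binom{n}{i} = \binom{n-1}{i} + \binom{n-1}{i-1}$):

$$G_\delta(n) = G_{\delta-1}(n-1) + G_\delta(n-1)$$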
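A quick Python check of the definition, the recurrence, and the polynomial bound (the function name `growth` is our own). It also shows why the bound needs $\delta \ge 2$: $G_1(n) = n + 1$.

```python
from math import comb

def growth(delta: int, n: int) -> int:
    """G_delta(n) = sum_{i=0}^{delta} C(n, i); math.comb(n, i) is 0 when i > n."""
    return sum(comb(n, i) for i in range(delta + 1))

# Pascal's rule gives G_delta(n) = G_{delta-1}(n-1) + G_delta(n-1).
for delta in range(1, 8):
    for n in range(1, 50):
        assert growth(delta, n) == growth(delta - 1, n - 1) + growth(delta, n - 1)

# The bound G_delta(n) <= n^delta, checked for delta >= 2 and n >= 2.
for delta in range(2, 8):
    for n in range(2, 50):
        assert growth(delta, n) <= n ** delta
```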

Sauer’s Lemma

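• If $(X, R)$ is a range space with $\dim_{VC}(X, R) = \delta$ and $|X| = n$, then $|R| \le G_\delta(n)$.

• Proof: By induction. The claim trivially holds for $\delta = 0$ or $n = 0$.

• Let $x \in X$, and consider the sets:

$$R_x = \{ r \setminus \{x\} : x \in r,\ r \in R,\ r \setminus \{x\} \in R \}, \qquad R \setminus x = \{ r \setminus \{x\} : r \in R \}$$

$$|R| = |R_x| + |R \setminus x|$$

(a set in $R \setminus x$ comes from two different ranges of $R$ exactly when it belongs to $R_x$).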

Sauer’s Lemma

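• Notice: $\dim_{VC}(X \setminus \{x\}, R_x) \le \delta - 1$ (if $B$ is shattered by $R_x$, then $B \cup \{x\}$ is shattered by $R$), while $\dim_{VC}(X \setminus \{x\}, R \setminus x) \le \delta$.

• By induction:

$$|R| = |R_x| + |R \setminus x| \le G_{\delta-1}(n-1) + G_\delta(n-1) = G_\delta(n)$$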
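Sauer's lemma holds for every finite set system, so it can be sanity-checked by brute force on random families of subsets. A small Python sketch (the names `vc_dim` and `growth`, the ground set of 7 points, and the random families are our own choices):

```python
import itertools
import random
from math import comb

def growth(delta, n):
    return sum(comb(n, i) for i in range(delta + 1))

def vc_dim(X, ranges):
    """Brute-force VC dimension of a finite range space (X, ranges); ranges are frozensets."""
    best = 0
    for k in range(1, len(X) + 1):
        if 2 ** k > len(ranges):
            break  # k points need 2^k distinct traces
        shattered = any(
            len({r & frozenset(Y) for r in ranges}) == 2 ** k
            for Y in itertools.combinations(X, k)
        )
        if not shattered:
            break  # every subset of a shattered set is shattered, so larger k fail too
        best = k
    return best

rng = random.Random(0)
X = list(range(7))
for trial in range(200):
    ranges = {frozenset(p for p in X if rng.random() < 0.5) for _ in range(rng.randint(1, 40))}
    assert len(ranges) <= growth(vc_dim(X, ranges), len(X))  # Sauer's lemma
```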

Even More Definitions

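• Let $S = (X, R)$ be a range space. We define its shatter function to be:

$$\pi_S(m) = \max_{B \subseteq X,\ |B| = m} |R_{|B}|$$

• We define the shattering dimension as the smallest $d$ such that $\pi_S(m) = O(m^d)$ for all $m$.

• If $\dim_{VC}(S) = \delta$, then $d \le \delta$: let $B \subseteq X$ with $n = |B|$; by Sauer's lemma $|R_{|B}| \le G_\delta(n) = O(n^\delta)$. As such $\pi_S(n) = O(n^\delta)$, and by its definition $d \le \delta$.

• Conversely, $\dim_{VC}(S) = O(d \log d)$: let $N \subseteq X$ be the largest set shattered by $S$, and let $\delta = |N|$. Then, for some constant $c$:

$$2^\delta = |R_{|N}| \le c|N|^d = c\delta^d \implies \delta \le \lg c + d \lg \delta \implies \frac{\delta - \lg c}{\lg \delta} \le d$$

Assuming $\delta \ge \max(2, 2\lg c)$, we have $\delta - \lg c \ge \delta/2$, and so (using that $x / \ln x \le a$ implies $x \le 2a \ln a$):

$$\frac{\delta}{2\lg \delta} \le d \implies \frac{\delta}{\ln \delta} \le \frac{2d}{\ln 2} \le 6d \implies \delta \le 2(6d)\ln(6d)$$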
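As a small illustration (our own example, not from the slides), take intervals on the real line. An interval meets $m$ distinct points in a contiguous run, so $\pi(m) = 1 + m + \binom{m}{2} = O(m^2)$ and the shattering dimension is 2, which here equals the VC dimension. A brute-force count in Python:

```python
def interval_traces(points):
    """All subsets of a finite point set on the line cut out by closed intervals,
    including the empty set.  For distinct points these are exactly the contiguous runs."""
    pts = sorted(points)
    traces = {frozenset()}
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            traces.add(frozenset(pts[i:j + 1]))  # the interval [pts[i], pts[j]]
    return traces

def shatter_function_intervals(m):
    """pi(m) for intervals; any m distinct points give the same count."""
    return len(interval_traces(range(m)))

for m in range(30):
    assert shatter_function_intervals(m) == 1 + m + m * (m - 1) // 2
```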

Why Do We Need Shattering Dimension?

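• Let $S = (\mathbb{R}^2, \{\text{all disks}\})$ be a range space. The shattering dimension of $S$ is 3.

• Proof: Consider any set $P$ of $n$ points; then $|R_{|P}| \le 4n^3$:
  – $|\{ r \cap P : r \in R,\ |r \cap P| = 1 \}| \le n$
  – $|\{ r \cap P : r \in R,\ |r \cap P| = 2 \}| \le \binom{n}{2}$
  – $|\{ r \cap P : r \in R,\ |r \cap P| \ge 3 \}| \le 4\binom{n}{2} + 8\binom{n}{3}$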

Mixing Range Spaces

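• Let $S_1 = (X, R_1), \ldots, S_k = (X, R_k)$ be range spaces with finite VC dimensions $\delta_1, \ldots, \delta_k$.

• Let $f(r_1, \ldots, r_k)$ be a function that maps $k$-tuples of sets $r_1 \in R_1, \ldots, r_k \in R_k$ into a subset of $X$ by combining them with the set operations $\cup$, $\cap$, $\setminus$.

• Consider the range set $R' = \{ f(r_1, \ldots, r_k) : r_i \in R_i \}$.

• Theorem: The associated range space $(X, R')$ has a VC dimension bounded by $O(k\delta \lg k)$, where $\delta = \max_i \delta_i$.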

Proof

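• Assume $Y \subseteq X$, $|Y| = t$, is being shattered by $R'$. Since set operations commute with intersecting by $Y$:

$$2^t = |R'_{|Y}| \le \left|\{ f(r_1, \ldots, r_k) : r_i \in (R_i)_{|Y} \}\right| \le \prod_{i=1}^{k} G_{\delta_i}(t) \le \left(G_\delta(t)\right)^k \le \left(\frac{te}{\delta}\right)^{\delta k}$$

• Assume $t \ge e\delta$ (otherwise $t = O(\delta)$ and we are done), so $\lg(t/\delta) \ge \lg e > 1$. Taking logarithms:

$$t \le \delta k \lg\frac{te}{\delta} = \delta k\left(\lg e + \lg\frac{t}{\delta}\right) \le 3\delta k \lg\frac{t}{\delta}$$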

Proof

• Setting $x = t/\delta$ gives us:

$$x \le 3k \lg x \implies \frac{x}{\ln x} \le \frac{3k}{\ln 2} \le 6k \implies x \le 2 \cdot 6k \ln(6k) \implies t \le 12\delta k \ln(6k)$$
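The final bound can be sanity-checked numerically: scan for the largest $t$ that satisfies the inequality $2^t \le (te/\delta)^{\delta k}$ from the proof and compare it with $12\delta k \ln(6k)$. This Python sketch (the name `max_t` and the parameter grid are our own) only tests that inequality; it does not construct shattered sets.

```python
import math

def max_t(delta, k, limit):
    """Largest t <= limit with t <= delta*k*lg(t*e/delta), i.e. 2^t <= (te/delta)^(delta*k)."""
    best = 0
    for t in range(1, limit + 1):
        if t <= delta * k * math.log2(t * math.e / delta):
            best = t
    return best

for delta in (1, 2, 3, 5):
    for k in (1, 2, 4, 8):
        bound = 12 * delta * k * math.log(6 * k)
        t = max_t(delta, k, limit=int(3 * bound) + 10)
        assert t <= bound, (delta, k, t, bound)
        print(f"delta={delta} k={k}: largest t = {t}, 12*delta*k*ln(6k) = {bound:.1f}")
```

The gap between the two columns shows how loose the constant 12 is.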

Dual Range Spaces

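• Let $S = (X, R)$ be a range space.

• Let $p \in X$. We define $R_p = \{ r \in R : p \in r \}$.

• The dual range space is $S^* = (R, X^*)$, where $X^* = \{ R_p : p \in X \}$.

• Assume $\{r_1, \ldots, r_{2^\ell}\} \subseteq R$ is shattered by $S^*$. Then there are points $p_1, \ldots, p_{2^{2^\ell}}$ (one for every subset of these ranges) creating the following matrix, where $M_{ij} = 1$ if and only if $p_j \in r_i$:

$$M = \begin{array}{c|cccc} & p_1 & p_2 & \cdots & p_{2^{2^\ell}} \\ \hline r_1 & 0 & 1 & \cdots & 1 \\ r_2 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ r_{2^\ell} & 0 & 0 & \cdots & 1 \end{array}$$

• The columns of $M$ are all the binary vectors of length $2^\ell$, so we can choose $\ell$ of them forming a submatrix $M'$ whose rows are the binary representations of $0, 1, \ldots, 2^\ell - 1$:

$$M' = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 1 \\ 0 & 1 & \cdots & 0 \\ \vdots & & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}$$

• The set of $\ell$ points represented by the columns of $M'$ is shattered by $S$, so $\ell = \lg 2^\ell \le \delta = \dim_{VC}(S)$. Hence $\dim_{VC}(S^*) < 2^{\delta + 1}$.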
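For small finite range spaces the dual can be built explicitly and both VC dimensions computed by brute force. A Python sketch (the helpers `vc_dim` and `dual` and the example of intervals on six points are our own); for intervals the bound $2^{\delta+1}$ is far from tight:

```python
import itertools

def vc_dim(ground, ranges):
    """Brute-force VC dimension of a finite range space; `ranges` holds frozensets."""
    ranges = set(ranges)
    best = 0
    for k in range(1, len(ground) + 1):
        if 2 ** k > len(ranges):
            break  # not enough ranges to produce 2^k traces
        if not any(len({r & frozenset(Y) for r in ranges}) == 2 ** k
                   for Y in itertools.combinations(ground, k)):
            break
        best = k
    return best

def dual(ground, ranges):
    """Dual space S* = (R, {R_p : p in X}): the new points are (indices of) the ranges,
    and each original point p yields the range R_p of all ranges containing p."""
    ranges = list(ranges)
    dual_ground = list(range(len(ranges)))
    dual_ranges = {frozenset(i for i, r in enumerate(ranges) if p in r) for p in ground}
    return dual_ground, dual_ranges

# X = {0, ..., 5} with all intervals (contiguous runs, including the empty one) as ranges.
X = list(range(6))
R = list({frozenset(X[i:j]) for i in range(len(X)) for j in range(i, len(X) + 1)})
delta = vc_dim(X, R)
delta_star = vc_dim(*dual(X, R))
print(delta, delta_star)
assert delta_star < 2 ** (delta + 1)
```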

ε-Samples

• Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points. A subset $C \subseteq x$ is an ε-sample for $x$ if (with $s(r) = |r \cap C| / |C|$):

$$\forall r \in R: \quad |m(r) - s(r)| \le \varepsilon$$

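The ε-Sample Theorem

• There is a positive constant $c$ such that if $(X, R)$ is any range space with VC dimension at most $\delta$, $x \subseteq X$ is a finite subset and $\varepsilon, \varphi > 0$, then a random subset $C \subseteq x$ of cardinality

$$s = \frac{c}{\varepsilon^2}\left(\delta \log\frac{\delta}{\varepsilon} + \log\frac{1}{\varphi}\right)$$

is an ε-sample for $x$ with probability at least $1 - \varphi$.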
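To see the theorem at work, take $x = \{0, \ldots, n-1\}$ on the line with intervals as ranges (VC dimension 2). Every interval meets $x$ in a contiguous run, and $m(r) - s(r)$ is the sum over that run of $w_i = 1/n - [i \in C]/|C|$, so the worst deviation over all intervals is a maximum-subarray problem. A Python sketch (the function name and sizes are our own choices); the printed deviation should shrink as $|C|$ grows:

```python
import random

def max_interval_deviation(n, sample):
    """max over intervals r of |m(r) - s(r)| for x = {0, ..., n-1} and a sample of indices.
    Two Kadane passes (largest and smallest run sums, empty run allowed) give the answer."""
    counts = [0] * n
    for i in sample:
        counts[i] += 1
    best = run_max = run_min = 0.0
    for i in range(n):
        w = 1.0 / n - counts[i] / len(sample)
        run_max = max(0.0, run_max + w)  # best run sum ending at i
        run_min = min(0.0, run_min + w)  # most negative run sum ending at i
        best = max(best, run_max, -run_min)
    return best

rng = random.Random(1)
n = 50_000
for size in (100, 1_000, 10_000):
    C = rng.sample(range(n), size)
    print(size, round(max_interval_deviation(n, C), 4))
```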

ε-Nets

• Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points. A subset $N \subseteq x$ is an ε-net for $x$ if:

$$\forall r \in R: \quad m(r) \ge \varepsilon \implies r \cap N \neq \emptyset$$

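The ε-Net Theorem

• Let $S = (X, R)$ be a range space of VC dimension $\delta$, and let $x$ be a finite subset of points.

• Suppose $0 < \varepsilon \le 1$ and $0 < \varphi < 1$.

• Let $N$ be a set obtained by $m$ random independent draws from $x$, where:

$$m \ge \max\left(\frac{4}{\varepsilon}\lg\frac{4}{\varphi},\ \frac{8\delta}{\varepsilon}\lg\frac{16}{\varepsilon}\right)$$

• Then $N$ is an ε-net for $x$ with probability at least $1 - \varphi$.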
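A Python sketch that computes the theorem's sample size and tests the ε-net property for intervals on $x = \{0, \ldots, n-1\}$ (VC dimension 2). For intervals, $N$ fails to be an ε-net exactly when some run of consecutive points missed by $N$ has measure at least $\varepsilon$. The helper names and parameters are our own:

```python
import math
import random

def net_sample_size(delta, eps, phi):
    """Smallest integer m with m >= max((4/eps) lg(4/phi), (8 delta/eps) lg(16/eps))."""
    return math.ceil(max(4 / eps * math.log2(4 / phi),
                         8 * delta / eps * math.log2(16 / eps)))

def is_interval_eps_net(n, net, eps):
    """True iff every run of consecutive points of {0, ..., n-1} missed by `net`
    has fewer than eps * n points (such a run is an interval N fails to hit)."""
    hit = set(net)
    gap = 0
    for i in range(n):
        gap = 0 if i in hit else gap + 1
        if gap / n >= eps:
            return False
    return True

rng = random.Random(2)
n, eps, phi = 10_000, 0.05, 0.1
m = net_sample_size(2, eps, phi)          # intervals have VC dimension 2
N = [rng.randrange(n) for _ in range(m)]  # m independent draws from x
print(m, is_interval_eps_net(n, N, eps))
```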

Some Applications

Range Searching

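• Consider a very large set $P$ of points in the plane.

• We would like to be able to quickly decide how many points are included inside a query rectangle.

• The ε-sample theorem tells us that there is a subset of constant size that works for all query rectangles: since rectangles have finite VC dimension, a random sample $C \subseteq P$ whose size depends only on $\varepsilon$ and $\varphi$ satisfies, with probability at least $1 - \varphi$, $\left| \frac{|r \cap P|}{|P|} - \frac{|r \cap C|}{|C|} \right| \le \varepsilon$ for every rectangle $r$.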
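In practice, the query then scans only the sample and rescales: $|r \cap P| \approx |P| \cdot |r \cap C| / |C|$, with additive error at most $\varepsilon |P|$ when $C$ is an ε-sample. A minimal Python sketch with axis-parallel query rectangles (the helper names, the point distribution, and the sample size of 5000 are hypothetical choices for illustration):

```python
import random

def count_in_rect(points, rect):
    """Number of points (x, y) with x0 <= x <= x1 and y0 <= y <= y1."""
    x0, y0, x1, y1 = rect
    return sum(1 for x, y in points if x0 <= x <= x1 and y0 <= y <= y1)

def estimate_count(sample, n, rect):
    """Estimate |P ∩ rect| as n * |C ∩ rect| / |C| for a sample C of P with |P| = n."""
    return n * count_in_rect(sample, rect) / len(sample)

rng = random.Random(42)
P = [(rng.random(), rng.random()) for _ in range(100_000)]
C = rng.sample(P, 5_000)  # hypothetical sample size, independent of |P|
query = (0.2, 0.1, 0.7, 0.4)
print(count_in_rect(P, query), round(estimate_count(C, len(P), query)))
```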

Learning a Concept

• A function $f$ is defined over the plane and returns '1' inside an unknown disk $D_{\text{unknown}}$ and '0' outside of it.

• There is some distribution $\mathcal{D}$ over the plane, and we can pick (roughly) $O((1/\varepsilon)\log(1/\varepsilon))$ random points according to $\mathcal{D}$ into a sample $R$ and compute $f$ just for them.

Learning a Concept

[Figure: the function $f: \mathbb{R}^2 \to \{0, 1\}$ and the unknown disk $D_{\text{unknown}}$]

Learning a Concept

• We compute the smallest disk $D'$ that contains all the points of $R$ labeled '1' and none of those labeled '0', and define $g$ that returns '1' inside $D'$ and '0' outside it.

• We claim that $g$ classifies correctly all but an ε fraction of the points:

$$\Pr_{p \sim \mathcal{D}}\left[f(p) \neq g(p)\right] \le \varepsilon$$

Learning a Concept

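• Let $S = (\mathbb{R}^2, \bar{R})$ be a range space where $\bar{R}$ is the set of all symmetric differences between 2 disks (it has finite VC dimension by the mixing theorem with $k = 2$).

• $\mathcal{D}$ plays the role of our finite set $x$.

• $r = D_{\text{unknown}} \oplus D' \in \bar{R}$, and $m(r) = \Pr_{p \sim \mathcal{D}}[p \in r]$ is the probability of a mistake in classification.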

Learning a Concept

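• By the ε-net theorem, $R$ is an ε-net for $(\mathbb{R}^2, \bar{R})$ with probability at least $1 - \varphi$. Since $D'$ agrees with $f$ on every point of $R$, we have $r \cap R = \emptyset$, and so:

$$m(r) < \varepsilon$$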
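The same argument can be tried in a one-dimensional analogue, with intervals in place of disks (our substitution, to keep the code short). On the line, the smallest interval containing all samples labeled '1' lies inside the target interval, so it is automatically consistent with the '0' labels; under a uniform distribution on $[0, 1]$ (an assumption), its error is the length of the uncovered part of the target. A Python sketch with a hypothetical target $[0.3, 0.7]$:

```python
import random

def learn_interval(labelled):
    """Smallest interval containing every sample labelled 1 (None if there is none).
    It lies inside the target interval, so it contains no sample labelled 0."""
    positives = [p for p, label in labelled if label == 1]
    return (min(positives), max(positives)) if positives else None

def classify(h, p):
    return int(h is not None and h[0] <= p <= h[1])

a, b = 0.3, 0.7  # hypothetical unknown target interval

def f(p):
    return int(a <= p <= b)

rng = random.Random(3)
sample = [rng.random() for _ in range(200)]  # uniform distribution on [0, 1]
h = learn_interval([(p, f(p)) for p in sample])
error = (h[0] - a) + (b - h[1])  # probability mass where f and the hypothesis differ
print(h, round(error, 4))
```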

Discrepancy

• Let $\chi: X \to \{-1, +1\}$ be a coloring. We define the discrepancy of $\chi$ over a range $r$ as the amount of imbalance in the coloring:

$$|\chi(r)| = \left| \sum_{p \in r} \chi(p) \right|$$

• Let $\Pi$ be a partition of $X$ into pairs. We will refer to $\chi$ as compatible with $\Pi$ if the two points of every pair in $\Pi$ receive different colors under $\chi$.

Discrepancy

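• Let $\#_r$ denote the crossing number of $r$, i.e., the number of pairs of $\Pi$ with exactly one point in $r$ (pairs lying fully inside or fully outside $r$ contribute $0$ to $\chi(r)$).

• Let $X_i \in \{-1, +1\}$ be the contribution of the $i$'th crossing pair to $\chi(r)$. For a compatible coloring chosen uniformly at random, the $X_i$ are independent and equally likely to be $\pm 1$.

• Let $m$ be the number of ranges. For $\Delta_r = c\sqrt{\#_r \ln(4m)}$, where $c > 0$ is a threshold constant (any $c \ge \sqrt{2}$ works), we have by Chernoff's inequality:

$$\Pr\left[|\chi(r)| \ge \Delta_r\right] = 2\Pr\left[\sum_i X_i \ge \Delta_r\right] \le 2\exp\left(-\frac{\Delta_r^2}{2\#_r}\right) \le \frac{1}{2m}$$

• By the union bound over the $m$ ranges, with probability at least $1/2$ every range $r$ has $|\chi(r)| < \Delta_r$.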
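An empirical look at the Chernoff bound, in a setting of our own choosing: the points $0, \ldots, n-1$ on a line, all intervals as ranges, a uniformly random pairing $\Pi$, and a random compatible coloring. The sketch grows each interval one point at a time, updating $\chi(r)$ and $\#_r$ incrementally, and counts ranges where $|\chi(r)|$ exceeds $\Delta_r$ with $c = 2$ (an arbitrary choice above $\sqrt{2}$):

```python
import math
import random

def random_compatible_coloring(n, rng):
    """Pair up 0..n-1 at random (n even) and give the two points of each pair
    opposite colours, the sign of each pair chosen by a fair coin."""
    order = list(range(n))
    rng.shuffle(order)
    color, partner = [0] * n, [0] * n
    for u, v in zip(order[0::2], order[1::2]):
        s = rng.choice((-1, 1))
        color[u], color[v] = s, -s
        partner[u], partner[v] = v, u
    return color, partner

def chernoff_violations(color, partner, c=2.0):
    """Count intervals r of 0..n-1 with |chi(r)| > c * sqrt(#r * ln(4m))."""
    n = len(color)
    m = n * (n + 1) // 2 + 1  # nonempty runs plus the empty range
    log_term = math.log(4 * m)
    violations = 0
    for i in range(n):
        disc = crossing = 0
        for j in range(i, n):  # grow r = {i, ..., j} one point at a time
            disc += color[j]
            # j's pair is now fully inside r (partner already in r) or newly crossing
            crossing += -1 if i <= partner[j] < j else 1
            if abs(disc) > c * math.sqrt(crossing * log_term):
                violations += 1
    return violations

rng = random.Random(4)
color, partner = random_compatible_coloring(200, rng)
print(sum(color), chernoff_violations(color, partner))
```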

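Building ε-Sample via Discrepancy

• Lemma: Let $Q \subseteq P$ be an $\varepsilon_1$-sample for $P$, and let $Q' \subseteq Q$ be an $\varepsilon_2$-sample for $Q$. Then $Q'$ is an $(\varepsilon_1 + \varepsilon_2)$-sample for $P$.

• Proof:

$$\left| \frac{|P \cap r|}{|P|} - \frac{|Q' \cap r|}{|Q'|} \right| \le \left| \frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|} \right| + \left| \frac{|Q \cap r|}{|Q|} - \frac{|Q' \cap r|}{|Q'|} \right| \le \varepsilon_1 + \varepsilon_2$$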

Building ε-Sample via Discrepancy

• Let $S = (X, R)$ be a range space with shattering dimension $d$.

• Let $P \subseteq X$ be a set of $n$ points, and consider the induced range space $S_P = (P, R_{|P})$, with $m = |R_{|P}| = O(n^d)$.

• Consider a coloring $\chi$ of $P$, compatible with a pairing of $P$, with the discrepancy bounded as we have seen, and let

$$Q = \{ p \in P : \chi(p) = 1 \}, \qquad |Q| = \frac{n}{2}$$

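Building ε-Sample via Discrepancy

• Now, for every range $r \in R$ (using $\#_r \le n$):

$$\left| |(P \setminus Q) \cap r| - |Q \cap r| \right| = |\chi(r)| \le c\sqrt{\#_r \ln(4m)} \le c\sqrt{n \ln\left(4 \cdot O(n^d)\right)} \le c'\sqrt{d\, n \ln n}$$

for some absolute constant $c'$. Since $|P \cap r| = |Q \cap r| + |(P \setminus Q) \cap r|$:

$$\left| |P \cap r| - 2|Q \cap r| \right| = \left| |(P \setminus Q) \cap r| - |Q \cap r| \right| \le c'\sqrt{d\, n \ln n}$$

• Dividing by $|P| = n = 2|Q|$:

$$\left| \frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|} \right| \le c'\sqrt{\frac{d \ln n}{n}} = \tau(n)$$

so $Q$ is a $\tau(n)$-sample for $P$.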

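Building ε-Sample via Discrepancy

• Let $P_0 = P$. For $i = 1, 2, \ldots$ we compute a coloring $\chi_{i-1}$ of $P_{i-1}$ with low discrepancy, and let $P_i$ be the points of $P_{i-1}$ colored $-1$ by $\chi_{i-1}$ (by symmetry, the analysis above holds for either color class), so $|P_i| = n/2^i$.

• Let

$$\varepsilon_i = \tau\left(|P_{i-1}|\right) = \tau\left(\frac{n}{2^{i-1}}\right) = c'\sqrt{\frac{d \ln(n/2^{i-1})}{n/2^{i-1}}},$$

so that $P_i$ is an $\varepsilon_i$-sample for $P_{i-1}$.

• By the lemma we have that $P_k$ is a $\left(\sum_{i=1}^{k} \varepsilon_i\right)$-sample for $P$.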

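Building ε-Sample via Discrepancy

• We look for the maximal $k$ such that $\sum_{i=1}^{k} \varepsilon_i \le \varepsilon$. The terms grow roughly geometrically (each halving multiplies $\varepsilon_i$ by about $\sqrt{2}$), so the sum is dominated by its last term:

$$\sum_{i=1}^{k} \varepsilon_i = \sum_{i=1}^{k} c'\sqrt{\frac{d \ln(n/2^{i-1})}{n/2^{i-1}}} \le c_1\sqrt{\frac{d \ln(n/2^{k-1})}{n/2^{k-1}}}$$

• The right-hand side is at most $\varepsilon$ as long as

$$\frac{n}{2^{k-1}} \ge \frac{2c_1^2 d}{\varepsilon^2} \ln\frac{c_1^2 d}{\varepsilon^2}$$

• So, taking the largest such $k$ results in a set $P_k$ of size $O\left(\frac{d}{\varepsilon^2}\ln\frac{d}{\varepsilon}\right)$, which is an ε-sample for $P$.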
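The halving procedure can be sketched for points on a line with intervals as ranges. As our own choice of low-discrepancy coloring, each round pairs consecutive points in sorted order and colors each pair $\pm 1$ at random; an interval then crosses at most the two pairs at its ends, so each round's error is at most $2/|P_{i-1}|$. The sketch records each round's measured $\varepsilon_i$, so the lemma's additivity can be checked directly:

```python
import random

def interval_deviation(P, Q):
    """max over intervals r of | |P∩r|/|P| - |Q∩r|/|Q| | for Q ⊆ P (distinct reals),
    computed with two Kadane passes over P in sorted order."""
    in_q = set(Q)
    best = run_max = run_min = 0.0
    for p in sorted(P):
        w = 1 / len(P) - (1 / len(Q) if p in in_q else 0.0)
        run_max = max(0.0, run_max + w)
        run_min = min(0.0, run_min + w)
        best = max(best, run_max, -run_min)
    return best

def halve(P, rng):
    """One round: pair consecutive points in sorted order, colour each pair +1/-1
    at random, and keep the points coloured -1."""
    pts = sorted(P)
    return [u if rng.random() < 0.5 else v for u, v in zip(pts[0::2], pts[1::2])]

def sample_by_halving(P, target_size, rng):
    """Halve until at most target_size points remain; return them and each round's eps_i."""
    steps = []
    while len(P) > target_size:
        Q = halve(P, rng)
        steps.append(interval_deviation(P, Q))
        P = Q
    return P, steps

rng = random.Random(7)
P0 = rng.sample(range(10**9), 2**14)  # distinct points on the line
Pk, steps = sample_by_halving(P0, 64, rng)
print(len(Pk), round(sum(steps), 4), round(interval_deviation(P0, Pk), 4))
```

The final error against $P_0$ never exceeds the sum of the per-round errors, as the lemma guarantees.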

Proof of the ε-Net Theorem

• Let $S = (X, R)$ be a range space of VC dimension $\delta$ and let $x$ be a set of points of size $n$.

• Let $N = (x_1, \ldots, x_m)$ be a set of points obtained by $m$ independent samples from $x$.

• Let $E_1 = \{ \exists r \in R : |r \cap x| \ge \varepsilon n,\ r \cap N = \emptyset \}$. We wish to bound $\Pr[E_1]$.

• Let $T = (y_1, \ldots, y_m)$ be a sample generated like $N$, independently of it.

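Proof of the ε-Net Theorem

• Let $E_2 = \left\{ \exists r \in R : |r \cap x| \ge \varepsilon n,\ r \cap N = \emptyset,\ |r \cap T| \ge \frac{\varepsilon m}{2} \right\}$.

• Claim: $\Pr[E_1] \le 2\Pr[E_2]$. Since $E_2 \subseteq E_1$:

$$\frac{\Pr[E_2]}{\Pr[E_1]} = \frac{\Pr[E_2 \cap E_1]}{\Pr[E_1]} = \Pr[E_2 \mid E_1],$$

so it suffices to show $\Pr[E_2 \mid E_1] \ge 1/2$.

• Assume $E_1$ occurs, and let $r'$ be a range with $|r' \cap x| \ge \varepsilon n$ and $r' \cap N = \emptyset$. Then:

$$\Pr[E_2 \mid E_1] \ge \Pr\left[|r' \cap T| \ge \frac{\varepsilon m}{2}\right]$$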

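Proof of the ε-Net Theorem

• Let $p = |r' \cap x| / n \ge \varepsilon$ and $X' = |r' \cap T|$. Then $X'$ is a binomial variable with:

$$E[X'] = pm, \qquad V[X'] = p(1 - p)m \le pm$$

• Thus, by Chebyshev's inequality:

$$\Pr\left[X' < \frac{\varepsilon m}{2}\right] \le \Pr\left[X' < \frac{pm}{2}\right] \le \Pr\left[|X' - pm| \ge \frac{pm}{2}\right] \le \frac{4V[X']}{(pm)^2} \le \frac{4}{pm} \le \frac{1}{2},$$

since $m \ge \frac{4}{\varepsilon}\lg\frac{4}{\varphi} > \frac{8}{\varepsilon} \ge \frac{8}{p}$. Hence $\Pr\left[|r' \cap T| \ge \varepsilon m / 2\right] \ge 1/2$, which proves the claim.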

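Proof of the ε-Net Theorem

• Let $E_2' = \left\{ \exists r \in R : r \cap N = \emptyset,\ |r \cap T| \ge \frac{\varepsilon m}{2} \right\}$. Since $E_2 \subseteq E_2'$:

$$\Pr[E_1] \le 2\Pr[E_2] \le 2\Pr[E_2']$$

• Claim: $\Pr[E_2'] \le G_\delta(2m)\, 2^{-\varepsilon m / 2}$. Let $Z = (z_1, \ldots, z_{2m}) = (x_1, \ldots, x_m, y_1, \ldots, y_m)$, and let $z$ range over the possible contents of $Z$; given $Z = z$, $N$ is a uniformly random half of $z$ and $T$ is the other half. Then:

$$\Pr[E_2'] = \sum_{z} \Pr[E_2' \cap (Z = z)] = \sum_{z} \Pr[E_2' \mid Z = z]\Pr[Z = z] \le \max_{z} \Pr[E_2' \mid Z = z]$$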

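Proof of the ε-Net Theorem

• Now we fix a set $Z$, and it's enough to consider the range space $(Z, R_{|Z})$; we know $|R_{|Z}| \le G_\delta(2m)$.

• Let us fix any $r \in R_{|Z}$ and consider the event

$$E_r = \left\{ r \cap N = \emptyset \text{ and } |r \cap T| \ge \frac{\varepsilon m}{2} \right\}$$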

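Proof of the ε-Net Theorem

• Claim: $\Pr[E_r \mid Z] \le 2^{-\varepsilon m / 2}$. Let $k = |r \cap (N \cup T)|$.

• If $k < \varepsilon m / 2$, then $\Pr[E_r \mid Z] = 0$ and the claim is trivial.

• Otherwise,

$$\Pr[E_r \mid Z] \le \Pr[r \cap N = \emptyset \mid Z] = \frac{\binom{2m-k}{m}}{\binom{2m}{m}} = \frac{(2m-k)(2m-k-1)\cdots(m-k+1)}{2m(2m-1)\cdots(m+1)} = \frac{m(m-1)\cdots(m-k+1)}{2m(2m-1)\cdots(2m-k+1)} \le 2^{-k} \le 2^{-\varepsilon m/2}$$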

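Proof of the ε-Net Theorem

$$\Pr[E_2' \mid Z] \le \Pr\left[\bigcup_{r \in R_{|Z}} E_r \,\middle|\, Z\right] \le \sum_{r \in R_{|Z}} \Pr[E_r \mid Z] \le |R_{|Z}|\, 2^{-\varepsilon m / 2} \le G_\delta(2m)\, 2^{-\varepsilon m / 2}$$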

• And to sum up:

$$\Pr[E_1] \le 2\Pr[E_2] \le 2\Pr[E_2'] \le 2\,G_\delta(2m)\, 2^{-\varepsilon m / 2}$$
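What remains is to check that the theorem's choice of $m$ makes this bound at most $\varphi$. Rather than reproduce that algebra, here is a numerical sanity check in Python for a few hypothetical parameter choices (a check, not a proof), computed in log space to avoid overflow:

```python
import math

def net_sample_size(delta, eps, phi):
    """Smallest integer m with m >= max((4/eps) lg(4/phi), (8 delta/eps) lg(16/eps))."""
    return math.ceil(max(4 / eps * math.log2(4 / phi),
                         8 * delta / eps * math.log2(16 / eps)))

def log2_failure_bound(delta, eps, m):
    """log2 of 2 * G_delta(2m) * 2^(-eps*m/2)."""
    g = sum(math.comb(2 * m, i) for i in range(delta + 1))
    return 1 + math.log2(g) - eps * m / 2

for delta, eps, phi in [(1, 0.5, 0.5), (3, 0.1, 0.1), (10, 0.05, 0.01)]:
    m = net_sample_size(delta, eps, phi)
    lhs = log2_failure_bound(delta, eps, m)
    print(delta, eps, phi, m, round(lhs, 1), round(math.log2(phi), 2))
    assert lhs <= math.log2(phi)
```

In these cases the failure bound is many orders of magnitude below $\varphi$, so the constants in the theorem are far from tight.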