On Complexity, Sampling, and ε-Nets and ε-Samples


Upload: clara

Post on 24-Feb-2016


DESCRIPTION

On Complexity, Sampling, and ε-Nets and ε-Samples. PowerPoint presentation.

TRANSCRIPT

Page 1: On Complexity, Sampling, and ε-Nets and ε-Samples

Page 2

Range Spaces

• A range space is a pair $S = (X, R)$, where $X$ is a ground set, whose elements are called points, and $R \subseteq 2^X$ is a family of subsets of $X$, whose elements are called ranges.

• For example: $S = (\mathbb{R}^2, R)$, where $R$ is the set of all disks in the plane.

Page 3

Measure

• Let $S = (X, R)$ be a range space and let $x$ be a finite subset of $X$. Then for $r \in R$, the measure of $r$ is:

$$m(r) = \frac{|r \cap x|}{|x|}$$

Page 4

Motivation

• For $N \subseteq x$, the estimation of $m(r)$, $r \in R$, is defined by:

$$s(r) = \frac{|r \cap N|}{|N|}$$

• We want to find a sample $N \subseteq x$ such that for every $r \in R$, $|m(r) - s(r)| \le \varepsilon$.
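As a quick illustration of $m(r)$ and $s(r)$, the sketch below represents a range as a membership predicate (an assumption of this sketch, not something the slides prescribe) and compares the two quantities for a single disk. The helper names `measure`, `estimate` and `disk` are hypothetical.

```python
import random

def measure(r, x):
    """m(r) = |r ∩ x| / |x|, where the range r is given as a predicate on points."""
    return sum(1 for p in x if r(p)) / len(x)

def estimate(r, sample):
    """s(r) = |r ∩ N| / |N| for a sample N ⊆ x."""
    return sum(1 for p in sample if r(p)) / len(sample)

def disk(cx, cy, radius):
    """Hypothetical helper: the closed disk with center (cx, cy) as a predicate."""
    return lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2

random.seed(1)
x = [(random.random(), random.random()) for _ in range(10_000)]
N = random.sample(x, 500)          # a uniform sample without replacement
r = disk(0.5, 0.5, 0.3)
print(f"m(r) = {measure(r, x):.3f}, s(r) = {estimate(r, N):.3f}")
```

With 500 of 10,000 points the two values usually agree to within a few hundredths for this one disk; the goal stated above is a guarantee for every range at once.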

Page 5

More Definitions

• Let $S = (X, R)$ be a range space. For $Y \subseteq X$, let $R_{|Y} = \{ r \cap Y \mid r \in R \}$ be the projection of $R$ on $Y$.

• If $R_{|Y}$ contains all subsets of $Y$, then $Y$ is shattered by $R$.

• $\dim_{VC}(S)$ is the maximum cardinality of a shattered subset of $X$.
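These definitions can be checked by brute force on tiny finite examples. The sketch below stores ranges as `frozenset`s (an illustrative choice), computes projections, tests shattering, and finds the VC dimension of the ranges that closed intervals induce on six points of a line. It is exponential in $|X|$ and only meant for very small inputs.

```python
from itertools import combinations

def projection(ranges, Y):
    """R|_Y = {r ∩ Y : r ∈ R}, with every range given as a frozenset."""
    Y = frozenset(Y)
    return {r & Y for r in ranges}

def is_shattered(ranges, Y):
    """Y is shattered iff the projection contains all 2^|Y| subsets of Y."""
    return len(projection(ranges, Y)) == 2 ** len(set(Y))

def vc_dimension(ranges, X):
    """Largest |Y| over shattered Y ⊆ X (brute force, tiny inputs only)."""
    best = 0
    for k in range(1, len(X) + 1):
        if any(is_shattered(ranges, Y) for Y in combinations(X, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so no larger Y works
    return best

# Ranges induced on X = {0, ..., 5} by closed intervals, plus the empty range.
X = list(range(6))
intervals = {frozenset(range(a, b + 1)) for a in X for b in X if a <= b} | {frozenset()}
print(vc_dimension(intervals, X))  # 2
```

No three points are shattered because an interval cannot contain the outer two points without the middle one.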

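Page 6

Half-Spaces

• Claim: Let $P = \{p_1, p_2, \ldots, p_{d+2}\}$ be a set in $\mathbb{R}^d$. There are real numbers $\beta_1, \ldots, \beta_{d+2}$, not all of them zero, such that $\sum_i \beta_i p_i = 0$ and $\sum_i \beta_i = 0$ (the $d+2$ vectors $(p_i, 1) \in \mathbb{R}^{d+1}$ are linearly dependent).

• Theorem (Radon): Let $P = \{p_1, p_2, \ldots, p_{d+2}\}$ be a set in $\mathbb{R}^d$. Then there exist two disjoint subsets $C, D \subseteq P$ such that $CH(C) \cap CH(D) \neq \emptyset$ and $C \cup D = P$. (Take $C = \{p_i \mid \beta_i > 0\}$ and $D = P \setminus C$; the point $\sum_{\beta_i > 0} \beta_i p_i / \sum_{\beta_i > 0} \beta_i$ lies in both hulls.)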
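The claim is constructive: the $\beta_i$ form a null vector of the $(d+1) \times (d+2)$ matrix whose columns are $(p_i, 1)$. The sketch below finds such a vector by exact Gaussian elimination over `fractions.Fraction` and splits $P$ by the sign of $\beta_i$. The names `null_vector` and `radon_partition` are illustrative; exact arithmetic sidesteps floating-point issues but the code makes no attempt at efficiency.

```python
from fractions import Fraction

def null_vector(A):
    """Return a nonzero b with A b = 0, assuming A has more columns than rows."""
    rows, cols = len(A), len(A[0])
    M = [[Fraction(v) for v in row] for row in A]
    pivots = []  # pivots[i] is the pivot column of row i
    r = 0
    for c in range(cols):
        pr = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        pivot = M[r][c]
        M[r] = [v / pivot for v in M[r]]
        for i in range(rows):  # eliminate column c from every other row
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    free = next(c for c in range(cols) if c not in pivots)
    b = [Fraction(0)] * cols
    b[free] = Fraction(1)  # one free variable set to 1, the other free ones to 0
    for i, c in enumerate(pivots):
        b[c] = -M[i][free]
    return b

def radon_partition(points):
    """Split d+2 points of R^d into index lists C, D and return a point of CH(C) ∩ CH(D)."""
    d = len(points[0])
    assert len(points) == d + 2
    # Rows: the d coordinates, then all ones, so A b = 0 encodes both sums of the claim.
    A = [[p[k] for p in points] for k in range(d)] + [[1] * len(points)]
    beta = null_vector(A)
    C = [i for i, b in enumerate(beta) if b > 0]
    D = [i for i, b in enumerate(beta) if b <= 0]
    w = sum(beta[i] for i in C)
    point = tuple(sum(beta[i] * points[i][k] for i in C) / w for k in range(d))
    return C, D, point

# The two diagonals of a square meet at its center (1, 1).
print(radon_partition([(0, 0), (2, 0), (0, 2), (2, 2)]))
```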

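Page 7

Half-Spaces

• Lemma: Let $P \subseteq \mathbb{R}^d$ be a finite set. Let $s \in CH(P)$ and let $h$ be a half-space of $\mathbb{R}^d$ containing $s$. Then there is a point of $P$ contained in $h$.

• Corollary: Let $S = (\mathbb{R}^d, R)$, where $R$ is the set of closed half-spaces. Then $\dim_{VC}(S) = d + 1$. Proof:
  – $\dim_{VC}(S) \ge d + 1$: the $d+1$ vertices of a regular simplex are shattered.
  – $\dim_{VC}(S) < d + 2$: by Radon's theorem, any set $P$ of $d+2$ points splits into $C$ and $D$ with a common point $s \in CH(C) \cap CH(D)$. Every half-space containing $C$ contains $s$, hence by the lemma a point of $D$, so no range cuts out exactly $C$ and $P$ is not shattered.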

Page 8

The Growth Function

• Let $G_\delta(n) = \sum_{i=0}^{\delta} \binom{n}{i}$ be the growth function; for $n, \delta \ge 2$ it satisfies $G_\delta(n) \le n^\delta$.

• Notice: $G_\delta(n) = G_\delta(n-1) + G_{\delta-1}(n-1)$.

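Page 9

Sauer’s Lemma

• If $(X, R)$ is a range space with $\dim_{VC}(X, R) = \delta$ and $|X| = n$, then $|R| \le G_\delta(n)$.

• Proof: By induction. The claim trivially holds for $\delta = 0$ or $n = 0$.

• Let $x \in X$, and consider the sets:

$$R_x = \{ r \setminus \{x\} \mid r \cup \{x\} \in R \text{ and } r \setminus \{x\} \in R \}, \qquad R \setminus x = \{ r \setminus \{x\} \mid r \in R \}.$$

Then $|R| = |R_x| + |R \setminus x|$, since two ranges collapse to the same set of $R \setminus x$ exactly when they differ only in $x$.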

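Page 10

Sauer’s Lemma

• Notice: $\dim_{VC}(X \setminus \{x\}, R_x) \le \delta - 1$ (if $B$ is shattered by $R_x$, then $B \cup \{x\}$ is shattered by $R$), and $\dim_{VC}(X \setminus \{x\}, R \setminus x) \le \delta$.

• By induction:

$$|R| = |R_x| + |R \setminus x| \le G_{\delta-1}(n-1) + G_\delta(n-1) = G_\delta(n).$$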
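The growth function and the identity used in the induction step are easy to check numerically. The sketch below does so and also counts the ranges that closed intervals induce on $n$ points of a line: intervals have VC dimension 2, and the count equals $G_2(n)$ exactly, so Sauer's bound is tight for them. The helper names are illustrative.

```python
from math import comb

def growth(delta, n):
    """G_delta(n) = sum_{i=0}^{delta} C(n, i)."""
    return sum(comb(n, i) for i in range(delta + 1))

def interval_ranges(n):
    """Distinct subsets of {0, ..., n-1} cut out by closed intervals, including the empty set."""
    ranges = {frozenset()}
    for a in range(n):
        for b in range(a, n):
            ranges.add(frozenset(range(a, b + 1)))
    return ranges

# The recurrence used in the induction step.
for delta in range(1, 6):
    for n in range(1, 30):
        assert growth(delta, n) == growth(delta, n - 1) + growth(delta - 1, n - 1)

# The polynomial bound G_delta(n) <= n^delta for n, delta >= 2.
for delta in range(2, 6):
    for n in range(2, 30):
        assert growth(delta, n) <= n ** delta

# Intervals have VC dimension 2 and meet Sauer's bound with equality: 1 + n + C(n, 2).
for n in range(1, 12):
    assert len(interval_ranges(n)) == growth(2, n)
print("all checks passed")
```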

Page 11

Even More Definitions

• Let $S = (X, R)$ be a range space. We define its shatter function to be:

$$\pi_S(m) = \max_{B \subseteq X,\ |B| = m} |R_{|B}|$$

• We define the shattering dimension as the smallest $d$ such that $\pi_S(m) = O(m^d)$ for all $m$.

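Page 12

• Let $\delta = \dim_{VC}(S)$ and let $B \subseteq X$ with $|B| = n$. By Sauer’s lemma, $|R_{|B}| \le G_\delta(n) = O(n^\delta)$. As such $\pi_S(n) = O(n^\delta)$, and by its definition the shattering dimension satisfies $d \le \delta$.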

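Page 13

• Conversely, $\dim_{VC}(S) = O(d \log d)$. Let $N \subseteq X$ be the largest set shattered by $S$ and let $\delta = |N|$, so $2^\delta = |R_{|N}| \le \pi_S(\delta) \le c\,\delta^d$ for some constant $c$, and therefore

$$\delta \le \lg c + d \lg \delta.$$

Assuming $\delta \ge \max(2, 2\lg c)$, we have $\lg c \le \delta/2$, so $\delta/2 \le d \lg \delta$, that is, $\delta / \lg \delta \le 2d$. Since $x / \lg x$ is increasing for $x \ge e$, and for $x = 6d\ln(6d)$

$$2d \lg x = \frac{2d\,\big(\ln(6d) + \ln\ln(6d)\big)}{\ln 2} < 6d\ln(6d) = x,$$

we get $\delta \le 6d\ln(6d) = O(d \log d)$.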
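To see how the $O(d\log d)$ bound compares with the exact threshold, the sketch below finds the largest integer $\delta$ with $2^\delta \le c\,\delta^d$ and prints it next to $\max(2, 2\lg c, 6d\ln(6d))$. The slides leave the constant $c$ unspecified, so the values of `c` here are arbitrary, and the search `limit` is an assumption that is comfortably above every bound printed.

```python
from math import log, log2

def largest_delta(c, d, limit=2_000):
    """Largest integer delta in [1, limit] with 2^delta <= c * delta^d (0 if there is none)."""
    best = 0
    for delta in range(1, limit + 1):
        if 2 ** delta <= c * delta ** d:  # exact integer comparison
            best = delta
    return best

def vc_bound(c, d):
    """The bound derived above: delta <= max(2, 2 lg c, 6 d ln(6d))."""
    return max(2, 2 * log2(c), 6 * d * log(6 * d))

for c in (1, 10, 1000):
    for d in range(1, 7):
        print(c, d, largest_delta(c, d), round(vc_bound(c, d), 1))
```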

Page 14

Why Do We Need Shattering Dimension?

• Let $S = (\mathbb{R}^2, R)$, where $R$ is the set of all disks, be a range space. The shattering dimension of $S$ is 3.

• Proof: Consider any set $P$ of $n$ points; then $|R_{|P}| \le 4n^3$:
  – $|\{ r \in R_{|P} : |r| = 1 \}| = n$
  – $|\{ r \in R_{|P} : |r| = 2 \}| \le \binom{n}{2}$
  – $|\{ r \in R_{|P} : |r| \ge 3 \}| \le 4\binom{n}{2} + 8\binom{n}{3}$

Page 15

Mixing Range Spaces

• Let $S_1 = (X, R_1), \ldots, S_k = (X, R_k)$ be range spaces with finite VC dimensions $\delta_1, \ldots, \delta_k$.

• Let $f(r_1, \ldots, r_k)$ be a function that maps $k$-tuples of sets $r_1 \in R_1, \ldots, r_k \in R_k$ into a subset of $X$ by combining them with set operations such as $\cup$, $\cap$ and $\setminus$.

• Consider the range set $R' = \{ f(r_1, \ldots, r_k) \mid r_i \in R_i \}$.

• Theorem: The associated range space $(X, R')$ has a VC dimension bounded by $O(k\delta \lg k)$, where $\delta = \max_i \delta_i$.

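Page 16

Proof

• Assume $Y \subseteq X$, $|Y| = t$, is being shattered by $R'$. A range $f(r_1, \ldots, r_k) \cap Y$ is determined by $r_1 \cap Y, \ldots, r_k \cap Y$, so, for $t \ge \delta$:

$$2^t = |R'_{|Y}| \le \prod_{i=1}^{k} |(R_i)_{|Y}| \le \prod_{i=1}^{k} G_{\delta_i}(t) \le \big(G_\delta(t)\big)^k \le \left(2\left(\frac{te}{\delta}\right)^{\delta}\right)^k.$$

• Assume $t \ge e\delta$, so that $\lg(t/\delta) \ge \lg e > 1$. Then

$$t \le k\left(1 + \delta \lg\frac{te}{\delta}\right) = k\left(1 + \delta\lg e + \delta \lg\frac{t}{\delta}\right) \le 3k\delta \lg\frac{t}{\delta}.$$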

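Page 17

Proof

• Setting $x = t/\delta$ in $t \le 3k\delta \lg(t/\delta)$ gives us:

$$x \le 3k \lg x = \frac{3k \ln x}{\ln 2} \le 6k \ln x \;\Longrightarrow\; \frac{x}{\ln x} \le 6k \;\Longrightarrow\; x \le 2 \cdot 6k \ln(6k) = 12k\ln(6k),$$

using that $x/\ln x \le y$ implies $x \le 2y\ln y$ for $y \ge e$. Hence $t \le 12\,\delta k \ln(6k) = O(k\delta \lg k)$.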
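The two-step estimate can be sanity-checked numerically. The sketch below finds the largest $t \ge \delta$ that survives the counting bound $t \le k(1 + \delta\lg(te/\delta))$ from the previous slide and prints it next to $12\,\delta k\ln(6k)$. The parameter values and the search `limit` are arbitrary choices for illustration.

```python
from math import e, log, log2

def satisfies_counting_bound(t, k, delta):
    """t <= k (1 + delta lg(t e / delta)), i.e. 2^t <= (2 (t e / delta)^delta)^k."""
    return t <= k * (1 + delta * log2(t * e / delta))

def largest_t(k, delta, limit=20_000):
    """Largest t in [delta, limit) that passes the counting bound."""
    return max(t for t in range(delta, limit) if satisfies_counting_bound(t, k, delta))

for k in (1, 2, 4, 8):
    for delta in (1, 2, 3):
        bound = 12 * delta * k * log(6 * k)
        print(k, delta, largest_t(k, delta), round(bound))
```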

Page 18

Dual Range Spaces

• Let $S = (X, R)$ be a range space.

• Let $p \in X$. We define $R_p = \{ r \mid p \in r,\ r \in R \}$.

• The dual range space is $S^* = (R, X^*)$, where $X^* = \{ R_p \mid p \in X \}$.

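Page 19

• Assume $\dim_{VC}(S) = \delta$ and that $S^*$ shatters a set $R' = \{r_1, \ldots, r_m\} \subseteq R$. Then for every subset of $R'$ there is a point lying in exactly those ranges, so there are points $p_1, \ldots, p_{2^m}$ creating the following incidence matrix ($M_{ij} = 1$ iff $p_j \in r_i$), whose columns are all the distinct 0/1 vectors of length $m$:

$$M = \begin{array}{c|cccc} & p_1 & p_2 & \cdots & p_{2^m} \\ \hline r_1 & 0 & 1 & \cdots & 1 \\ r_2 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ r_m & 0 & 0 & \cdots & 1 \end{array}$$

• Let $\ell = \lfloor \lg m \rfloor$. Choose $\ell$ columns of $M$ such that, on the rows $r_1, \ldots, r_{2^\ell}$, row $r_i$ is the binary representation of $i - 1$ (such columns exist because every 0/1 column appears in $M$). Restricted to these rows and columns we get:

$$M' = \begin{pmatrix} 0 & \cdots & 0 & 0 \\ 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & \vdots & \vdots \\ 1 & \cdots & 1 & 1 \end{pmatrix}$$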

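Page 20

• The set of points represented by the columns of $M'$ is shattered by $S$: each of its $2^{\lfloor \lg m \rfloor}$ subsets appears as a row of $M'$, that is, it is cut out by the corresponding range. Hence $\lfloor \lg m \rfloor \le \delta$, so $\lg m < \delta + 1$ and $m < 2^{\delta+1}$:

$$\dim_{VC}(S^*) < 2^{\delta + 1}.$$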

Page 21

ε-Samples

• Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points. A subset $C \subseteq x$ is an $\varepsilon$-sample for $x$ if:

$$\forall r \in R: \quad \left| m(r) - s(r) \right| \le \varepsilon, \qquad \text{where } s(r) = \frac{|r \cap C|}{|C|}.$$
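For ranges that are intervals on a line, the $\varepsilon$-sample condition can be checked exactly, because every interval meets the sorted points in a contiguous run. The sketch below is a brute-force $O(n^2)$ checker under that assumption (distinct real points, intervals as ranges); the function names are illustrative and it is not meant as an efficient algorithm.

```python
def max_interval_error(x, C):
    """max over closed intervals r of |m(r) - s(r)|, for a finite set x of reals and C ⊆ x.

    Every interval meets sorted(x) in a run xs[a:b], so scanning all runs covers
    every range (the empty range has error 0).
    """
    xs = sorted(x)
    in_c = set(C)
    prefix = [0]  # prefix[i] = number of points of C among xs[:i]
    for p in xs:
        prefix.append(prefix[-1] + (p in in_c))
    n, k = len(xs), len(C)
    worst = 0.0
    for a in range(n):
        for b in range(a + 1, n + 1):
            worst = max(worst, abs((b - a) / n - (prefix[b] - prefix[a]) / k))
    return worst

def is_eps_sample(x, C, eps):
    return max_interval_error(x, C) <= eps

x = list(range(400))
# Every 4th point is an excellent sample; the first quarter of the points is a poor one.
print(is_eps_sample(x, x[::4], 0.01), is_eps_sample(x, x[:100], 0.5))  # True False
```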

Page 22

The ε-Sample Theorem

• There is a positive constant $c$ such that if $(X, R)$ is any range space with VC dimension at most $\delta$, $x \subseteq X$ is a finite subset and $\varepsilon, \varphi > 0$, then a random subset $C \subseteq x$ of cardinality

$$s = \frac{c}{\varepsilon^2}\left(\delta \log\frac{\delta}{\varepsilon} + \log\frac{1}{\varphi}\right)$$

is an $\varepsilon$-sample for $x$ with probability at least $1 - \varphi$.

Page 23

ε-Nets

• Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points. A subset $N \subseteq x$ is an $\varepsilon$-net for $x$ if:

$$\forall r \in R: \quad m(r) \ge \varepsilon \;\Longrightarrow\; r \cap N \neq \emptyset.$$
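Unlike an ε-sample, an ε-net only has to hit the heavy ranges, so for intervals on a line the check reduces to the longest run of points that contains no net point. The sketch below assumes distinct points and intervals as ranges; the function name is illustrative.

```python
def is_eps_net_intervals(x, N, eps):
    """Check the ε-net condition for intervals on a line (points of x assumed distinct).

    An interval that misses N contains at most the longest run of consecutive
    points of sorted(x) with no point of N, so N is an ε-net iff that run is < ε|x|.
    """
    in_n = set(N)
    run = longest = 0
    for p in sorted(x):
        run = 0 if p in in_n else run + 1
        longest = max(longest, run)
    return longest / len(x) < eps

x = list(range(100))
# Every 10th point leaves gaps of 9 points, i.e. a 0.09 fraction of x.
print(is_eps_net_intervals(x, x[::10], 0.1), is_eps_net_intervals(x, x[::10], 0.09))  # True False
```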

Page 24

The ε-Net Theorem

• Let $S = (X, R)$ be a range space of VC dimension $\delta$, and let $x$ be a finite subset of points.

• Suppose $0 < \varepsilon \le 1$ and $0 < \varphi < 1$.

• Let $N$ be a set obtained by $m$ random independent draws from $x$, where:

$$m \ge \max\left(\frac{4}{\varepsilon}\lg\frac{4}{\varphi},\ \frac{8\delta}{\varepsilon}\lg\frac{16}{\varepsilon}\right)$$

• Then $N$ is an $\varepsilon$-net for $x$ with probability at least $1 - \varphi$.
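The bound on $m$ depends only on $\delta$, $\varepsilon$ and $\varphi$, not on $|x|$. The sketch below computes the smallest admissible $m$ and draws one sample for intervals on a line (VC dimension 2), testing the net property with a longest-gap check. Function names and parameter values are illustrative assumptions.

```python
import random
from math import ceil, log2

def eps_net_sample_size(delta, eps, phi):
    """Smallest integer m allowed by the theorem."""
    return ceil(max(4 / eps * log2(4 / phi), 8 * delta / eps * log2(16 / eps)))

def longest_gap_fraction(x, N):
    """Largest fraction of x inside an interval that misses N (points assumed distinct)."""
    in_n = set(N)
    run = longest = 0
    for p in sorted(x):
        run = 0 if p in in_n else run + 1
        longest = max(longest, run)
    return longest / len(x)

random.seed(7)
delta, eps, phi = 2, 0.1, 0.01            # intervals on a line have VC dimension 2
m = eps_net_sample_size(delta, eps, phi)
x = list(range(100_000))
N = [random.choice(x) for _ in range(m)]  # m independent draws, with replacement
print(m, longest_gap_fraction(x, N) < eps)
```

The same `m` would be used for a ground set of any size, which is the point of the theorem.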

Page 25

Some Applications

Page 26

Range Searching

• Consider a very large set $P$ of points in the plane.

• We would like to be able to quickly estimate how many points are included inside a query rectangle.

• Since rectangles have finite VC dimension, the ε-Sample Theorem tells us that there is an $\varepsilon$-sample $C$ of constant size (independent of $|P|$) that works for all query rectangles: the number of points of $C$ in the rectangle, scaled by $|P|/|C|$, is within $\varepsilon|P|$ of the true count.
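A minimal sketch of the estimate, assuming axis-parallel rectangles given as `(x0, y0, x1, y1)` and using a plain random subset as the sample. By the ε-Sample Theorem such a subset is an ε-sample only with high probability; the code does not verify that property, and the names and sizes are illustrative.

```python
import random

def count_in_rect(points, rect):
    x0, y0, x1, y1 = rect
    return sum(1 for (px, py) in points if x0 <= px <= x1 and y0 <= py <= y1)

def estimated_count(sample, n_total, rect):
    """Scale the number of sample points in rect by n_total / |sample|."""
    return n_total * count_in_rect(sample, rect) / len(sample)

random.seed(3)
P = [(random.random(), random.random()) for _ in range(100_000)]
C = random.sample(P, 2_000)   # stands in for an ε-sample; its size does not depend on |P|
rect = (0.1, 0.2, 0.6, 0.7)
print(count_in_rect(P, rect), round(estimated_count(C, len(P), rect)))
```

A query then costs $O(|C|)$ instead of $O(|P|)$, at the price of an additive error of about $\varepsilon|P|$.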

Page 27

Learning a Concept

• A function $f$ is defined over the plane and returns ‘1’ inside an unknown disk $D_{\text{unknown}}$ and ‘0’ outside of it.

• There is some distribution $\mathcal{D}$, and we can pick (roughly) $O((1/\varepsilon)\log(1/\varepsilon))$ random points from $\mathcal{D}$ in a sample $R$ and compute $f$ just for them.

Page 28

Learning a Concept

[Figure: the function $f : \mathbb{R}^2 \to \{0, 1\}$, equal to 1 inside the unknown disk $D_{\text{unknown}}$]

Page 29

Learning a Concept

• We compute the smallest disk $D'$ that contains exactly the points of $R$ labeled ‘1’, and define $g$ that returns ‘1’ inside $D'$ and ‘0’ outside it.

• We claim that, with high probability over the choice of $R$, $g$ classifies correctly all but an $\varepsilon$ fraction of the points:

$$\Pr_{p \sim \mathcal{D}}\left[ f(p) \neq g(p) \right] \le \varepsilon$$

[Figure: the disks $D'$ and $D_{\text{unknown}}$]

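Page 30

Learning a Concept

• Let $S = (\mathbb{R}^2, \mathcal{R})$ be a range space where $\mathcal{R}$ is the set of all symmetric differences between two disks.

• The distribution $\mathcal{D}$ plays the role of our finite set, so the measure of a range is $m(r) = \Pr_{p \sim \mathcal{D}}[p \in r]$.

• $r = D' \oplus D_{\text{unknown}} \in \mathcal{R}$.

• $m(r)$ is the probability of a mistake in classification.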

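Page 31

Learning a Concept

• By the ε-Net Theorem, $R$ is an $\varepsilon$-net for $(\mathbb{R}^2, \mathcal{R})$ (this range space has constant VC dimension by the mixing theorem), and so:

$$m(r) \ge \varepsilon \;\Longrightarrow\; r \cap R \neq \emptyset.$$

Since $f$ and $g$ agree on every point of $R$, we have $r \cap R = \emptyset$, and hence $m(r) < \varepsilon$.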
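The same argument works in any range space of bounded VC dimension. The sketch below is a one-dimensional analogue, swapping the disk for an interval on $[0, 1]$ and $\mathcal{D}$ for the uniform distribution (both assumptions made to keep the code short). In one dimension the smallest interval containing the points labeled ‘1’ never contains a point labeled ‘0’, so $g$ is trivial to compute, and its error is exactly the length of the symmetric difference $r$. All names are illustrative.

```python
import random

def learn_interval(labeled):
    """Hypothesis g: the smallest interval containing every sample point labeled 1.

    Because the target is an interval, this interval contains no point labeled 0.
    """
    positives = [p for p, label in labeled if label == 1]
    if not positives:
        return None  # g is identically 0
    return min(positives), max(positives)

def error_uniform(target, learned):
    """Pr[f(p) != g(p)] for p uniform on [0, 1]: the length of the symmetric difference."""
    (a, b), (lo, hi) = target, learned
    return (lo - a) + (b - hi)  # valid because a <= lo <= hi <= b

random.seed(5)
target = (0.3, 0.7)  # plays the role of D_unknown

def f(p):
    return int(target[0] <= p <= target[1])

sample = [random.random() for _ in range(1_000)]
g = learn_interval([(p, f(p)) for p in sample])
print(g, error_uniform(target, g))
```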

Page 32

Discrepancy

• Let $\chi : X \to \{-1, 1\}$ be a coloring. We define the discrepancy of $\chi$ over a range $r$ as the amount of imbalance in the coloring:

$$|\chi(r)| = \left| \sum_{p \in r} \chi(p) \right|$$

• Let $\Pi$ be a partition of $X$ into pairs. We will refer to $\chi$ as compatible with $\Pi$ if the two points of each pair in $\Pi$ are colored by different colors by $\chi$.

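Page 33

Discrepancy

• Let $\#_r$ denote the crossing number of $\Pi$ with $r$, that is, the number of pairs of $\Pi$ with exactly one point in $r$.

• Let $X_i \in \{-1, +1\}$ be the contribution of the $i$’th crossing pair to $\chi(r)$. Pairs lying entirely inside or outside $r$ contribute 0, so $\chi(r) = \sum_{i=1}^{\#_r} X_i$, and for a random compatible coloring the $X_i$ are independent and uniform.

• Let $m = |R_{|X}|$. For the threshold $\Delta_r = \sqrt{2\#_r \ln(4m)}$ we have by Chernoff’s inequality:

$$\Pr\left[|\chi(r)| \ge \Delta_r\right] = 2\Pr\left[\sum_{i=1}^{\#_r} X_i \ge \Delta_r\right] \le 2\exp\left(-\frac{\Delta_r^2}{2\#_r}\right) = \frac{1}{2m}.$$

By a union bound over the $m$ ranges, with probability at least $1/2$ a random compatible coloring satisfies $|\chi(r)| < \Delta_r$ for all ranges simultaneously.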
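The argument suggests a simple randomized procedure: give each pair one $+1$ and one $-1$ at random, and retry if some range exceeds its threshold $\Delta_r$ (each attempt succeeds with probability at least $1/2$). The sketch below does this for intervals on the points $0, \ldots, n-1$ with a random pairing; the range space, the pairing and all function names are assumptions chosen for illustration, and the checker is brute force.

```python
import math
import random

def random_pairing(points):
    """An arbitrary partition into pairs (here: random); len(points) must be even."""
    pts = points[:]
    random.shuffle(pts)
    return list(zip(pts[0::2], pts[1::2]))

def random_compatible_coloring(pairs):
    """Each pair gets one +1 and one -1, independently and uniformly at random."""
    chi = {}
    for p, q in pairs:
        s = random.choice((-1, 1))
        chi[p], chi[q] = s, -s
    return chi

def within_threshold(n, pairs, chi):
    """Check |chi(r)| < Delta_r for every interval range r on the points 0, ..., n-1."""
    m = 1 + n + n * (n - 1) // 2  # |R|_X| for intervals
    for a in range(n):
        for b in range(a, n):
            crossing = sum(1 for p, q in pairs if (a <= p <= b) != (a <= q <= b))
            disc = abs(sum(chi[p] for p in range(a, b + 1)))
            # With no crossing pair, disc is 0 and the range is trivially balanced.
            if crossing and disc >= math.sqrt(2 * crossing * math.log(4 * m)):
                return False
    return True

random.seed(11)
n = 64
pairs = random_pairing(list(range(n)))
for attempt in range(1, 101):
    chi = random_compatible_coloring(pairs)
    if within_threshold(n, pairs, chi):
        break
print("attempts:", attempt)
```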

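Page 34

Building ε-Sample via Discrepancy

• Lemma: Let $Q \subseteq P$ be an $\varepsilon_1$-sample for $P$, and let $Q' \subseteq Q$ be an $\varepsilon_2$-sample for $Q$. Then $Q'$ is an $(\varepsilon_1 + \varepsilon_2)$-sample for $P$.

• Proof: For every range $r$,

$$\left| \frac{|P \cap r|}{|P|} - \frac{|Q' \cap r|}{|Q'|} \right| \le \left| \frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|} \right| + \left| \frac{|Q \cap r|}{|Q|} - \frac{|Q' \cap r|}{|Q'|} \right| \le \varepsilon_1 + \varepsilon_2.$$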

Page 35

Building ε-Sample via Discrepancy

• Let $S = (X, R)$ be a range space with shattering dimension $d$.

• Let $P \subseteq X$ be a set of $n$ points ($n$ even), and consider the induced range space $S_P = (P, R_{|P})$, with $m = |R_{|P}| = O(n^d)$.

• Consider a coloring $\chi$ of $P$, compatible with a pairing $\Pi$, with the discrepancy bounded as we have seen, and let

$$Q = \{ p \in P \mid \chi(p) = 1 \}, \qquad |Q| = \frac{n}{2}.$$

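Page 36

Building ε-Sample via Discrepancy

• Now, for every range $r \in R$, since $\#_r \le n/2$:

$$\big| |(P \setminus Q) \cap r| - |Q \cap r| \big| = |\chi(r)| \le \sqrt{2\#_r \ln(4m)} \le \sqrt{n \ln\big(4 \cdot O(n^d)\big)} \le c\sqrt{d\, n \ln n},$$

for some absolute constant $c$. Since $|P \cap r| = |(P \setminus Q) \cap r| + |Q \cap r|$, $|P| = n$ and $|Q| = n/2$:

$$\big| |P \cap r| - 2|Q \cap r| \big| \le c\sqrt{d\, n \ln n} \;\Longrightarrow\; \left| \frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|} \right| \le c\sqrt{\frac{d \ln n}{n}} =: \varepsilon(n),$$

so $Q$ is an $\varepsilon(n)$-sample for $P$.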

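Page 37

Building ε-Sample via Discrepancy

• Let $P_0 = P$ and $P_1 = Q$. For $i \ge 1$, we compute a coloring $\chi_{i-1}$ of $P_{i-1}$ with low discrepancy, and let $P_i$ be the points of $P_{i-1}$ colored 1 by $\chi_{i-1}$ (as for $Q$), so $n_i = |P_i| = n/2^i$.

• Let $\varepsilon_i = \varepsilon(n_{i-1}) = \varepsilon\!\left(n/2^{i-1}\right)$, so that $P_i$ is an $\varepsilon_i$-sample for $P_{i-1}$.

• By the lemma, we have that $P_k$ is a $\left(\sum_{i=1}^{k} \varepsilon_i\right)$-sample for $P$.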

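Page 38

Building ε-Sample via Discrepancy

• We look for the maximal $k$ such that $\sum_{i=1}^{k} \varepsilon_i \le \varepsilon$. As long as $n_{k-1} \ge 4$, consecutive terms satisfy

$$\frac{\varepsilon_i}{\varepsilon_{i+1}} = \sqrt{\frac{\ln n_{i-1}}{2\ln n_i}} = \sqrt{\frac{\ln n_i + \ln 2}{2\ln n_i}} \le \frac{\sqrt{3}}{2},$$

so the sum is dominated by its last term:

$$\sum_{i=1}^{k} \varepsilon_i \le \frac{\varepsilon_k}{1 - \sqrt{3}/2} \le 8\varepsilon_k = 8c\sqrt{\frac{d \ln n_{k-1}}{n_{k-1}}}.$$

• It therefore suffices that $8\varepsilon_k \le \varepsilon$. For the maximal such $k$, the condition fails at $k+1$, so $n_k / \ln n_k < y$ with $y = 64c^2 d/\varepsilon^2$, which gives $n_k \le 2y\ln y$.

• So, taking the largest such $k$ results in a set $P_k$ of size $n_k = O\big((d/\varepsilon^2)\ln(d/\varepsilon)\big)$ which is an $\varepsilon$-sample for $P$.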
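The repeated halving can be sketched for intervals on a line. Here the pairing matches consecutive points in sorted order, an assumption that the analysis permits (it works for any pairing) and that makes every interval cross at most two pairs, so $|\chi(r)| \le 2$ for every compatible coloring and no retry is needed. Each round then changes the measure of any interval by at most $2/n_{i-1}$, so five rounds starting from 1024 points keep the total error at most $62/1024 \approx 0.06$. The names and sizes are illustrative.

```python
import random

def halve(points):
    """One round: pair consecutive points of the sorted set, color each pair +1/-1 at
    random, and keep the points colored +1 (that is, one random point per pair)."""
    pts = sorted(points)
    return [p if random.random() < 0.5 else q for p, q in zip(pts[0::2], pts[1::2])]

def max_interval_error(x, C):
    """max over intervals r of | |x ∩ r|/|x| - |C ∩ r|/|C| | (distinct points assumed)."""
    xs, in_c = sorted(x), set(C)
    prefix = [0]
    for p in xs:
        prefix.append(prefix[-1] + (p in in_c))
    n, k = len(xs), len(C)
    return max(abs((b - a) / n - (prefix[b] - prefix[a]) / k)
               for a in range(n) for b in range(a + 1, n + 1))

random.seed(2)
P = [random.random() for _ in range(1024)]
levels = [P]
for _ in range(5):  # P_1, ..., P_5 with |P_i| = 1024 / 2^i
    levels.append(halve(levels[-1]))
print(len(levels[-1]), max_interval_error(P, levels[-1]))
```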

Page 39

Proof of the ε-Net Theorem

• Let $S = (X, R)$ be a range space of VC dimension $\delta$ and let $x$ be a set of points of size $n$.

• Let $N = (x_1, \ldots, x_m)$ be a set of points obtained by $m$ independent samples from $x$.

• Let $E_1 = \{ \exists r \in R : |r \cap x| \ge \varepsilon n,\ r \cap N = \emptyset \}$. We wish to bound $\Pr[E_1]$.

• Let $T = (y_1, \ldots, y_m)$ be a sample generated like $N$, independently of it.

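Page 40

Proof of the ε-Net Theorem

• Let $E_2 = \{ \exists r \in R : |r \cap x| \ge \varepsilon n,\ r \cap N = \emptyset,\ |r \cap T| \ge \varepsilon m/2 \}$.

• Claim: $\Pr[E_1] \le 2\Pr[E_2]$. Since $E_2 \subseteq E_1$,

$$\Pr[E_2 \mid E_1] = \frac{\Pr[E_2 \cap E_1]}{\Pr[E_1]} = \frac{\Pr[E_2]}{\Pr[E_1]},$$

so it suffices to show that $\Pr[E_2 \mid E_1] \ge 1/2$.

• Assume $E_1$ occurs, and let $r'$ be a range with $|r' \cap x| \ge \varepsilon n$ and $r' \cap N = \emptyset$. Then, since $T$ is independent of $N$,

$$\Pr[E_2 \mid E_1] \ge \Pr\left[ |r' \cap T| \ge \frac{\varepsilon m}{2} \right].$$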

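Page 41

Proof of the ε-Net Theorem

• Let $p = |r' \cap x| / n \ge \varepsilon$. Then $X' = |r' \cap T|$ is a binomial variable with:

$$\mathbb{E}[X'] = pm, \qquad \mathbb{V}[X'] = p(1-p)m \le pm.$$

• Thus, by Chebyshev’s inequality:

$$\Pr\left[X' < \frac{\varepsilon m}{2}\right] \le \Pr\left[X' < \frac{pm}{2}\right] \le \Pr\left[|X' - pm| \ge \frac{pm}{2}\right] \le \frac{4\,\mathbb{V}[X']}{(pm)^2} \le \frac{4}{pm} \le \frac{1}{2},$$

since $m \ge 8/\varepsilon \ge 8/p$ (this follows from the choice of $m$ in the theorem, as $\lg(16/\varepsilon) \ge 4$). Hence $\Pr[E_2 \mid E_1] \ge 1/2$.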

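Page 42

Proof of the ε-Net Theorem

• Let $E_2' = \{ \exists r \in R : r \cap N = \emptyset,\ |r \cap T| \ge \varepsilon m/2 \}$. Since $E_2 \subseteq E_2'$,

$$\Pr[E_1] \le 2\Pr[E_2] \le 2\Pr[E_2'].$$

• Claim: $\Pr[E_2'] \le G_\delta(2m)\, 2^{-\varepsilon m/2}$.

• Let $Z = (z_1, \ldots, z_{2m}) = N \cup T$. Then

$$\Pr[E_2'] = \sum_{z \in x^{2m}} \Pr\big[E_2' \cap (Z = z)\big] = \sum_{z \in x^{2m}} \Pr[E_2' \mid Z = z] \Pr[Z = z] \le \max_{z} \Pr[E_2' \mid Z = z],$$

so it suffices to bound $\Pr[E_2' \mid Z = z]$ for a fixed $z$.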

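Page 43

Proof of the ε-Net Theorem

• Now we fix a set $Z$, and it is enough to consider the range space $(Z, R_{|Z})$; we know $|R_{|Z}| \le G_\delta(2m)$. Conditioned on $Z$, the sample $N$ is a uniformly random half of the $2m$ draws and $T$ is the other half.

• Let us fix any $r \in R_{|Z}$ and consider the event

$$E_r = \left\{ r \cap N = \emptyset \ \text{and}\ |r \cap T| \ge \frac{\varepsilon m}{2} \right\}.$$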

Page 44

Proof of the ε-Net Theorem

• Let $k = |r \cap (N \cup T)|$. If $k < \varepsilon m/2$, then $\Pr[E_r] = 0$ trivially.

• Otherwise,

$$\Pr[E_r] \le \Pr\big[ r \cap N = \emptyset \,\big|\, |r \cap (N \cup T)| = k \big] = \frac{\binom{2m-k}{m}}{\binom{2m}{m}} = \frac{m(m-1)\cdots(m-k+1)}{2m(2m-1)\cdots(2m-k+1)} \le 2^{-k} \le 2^{-\varepsilon m/2}.$$

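Page 45

Proof of the ε-Net Theorem

$$\Pr[E_2' \mid Z] \le \Pr\Big[\bigcup_{r \in R_{|Z}} E_r\Big] \le \sum_{r \in R_{|Z}} \Pr[E_r] \le |R_{|Z}|\, 2^{-\varepsilon m/2} \le G_\delta(2m)\, 2^{-\varepsilon m/2}$$

• And to sum up:

$$\Pr[E_1] \le 2\Pr[E_2] \le 2\Pr[E_2'] \le 2\,G_\delta(2m)\, 2^{-\varepsilon m/2}.$$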
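The choice of $m$ in the ε-Net Theorem is what brings this final bound down to at most $\varphi$. The sketch below evaluates $2\,G_\delta(2m)\,2^{-\varepsilon m/2}$ in log space, to avoid overflow, for a few arbitrarily chosen parameter settings and compares it with $\varphi$. It is a numerical check for these values only, not a proof, and the function names are illustrative.

```python
from math import ceil, comb, log2

def growth(delta, n):
    """G_delta(n) = sum_{i=0}^{delta} C(n, i)."""
    return sum(comb(n, i) for i in range(delta + 1))

def eps_net_sample_size(delta, eps, phi):
    return ceil(max(4 / eps * log2(4 / phi), 8 * delta / eps * log2(16 / eps)))

def log2_failure_bound(delta, eps, m):
    """log2 of 2 * G_delta(2m) * 2^(-eps m / 2)."""
    return 1 + log2(growth(delta, 2 * m)) - eps * m / 2

for delta in (1, 2, 5):
    for eps in (0.5, 0.1, 0.05):
        for phi in (0.1, 0.01):
            m = eps_net_sample_size(delta, eps, phi)
            print(delta, eps, phi, m, log2_failure_bound(delta, eps, m) <= log2(phi))
```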