On Complexity, Sampling, and ε-Nets and ε-Samples
Range Spaces
• A range space is a pair $S = (X, R)$, where $X$ is a ground set, its elements called points, and $R$ is a family of subsets of $X$, its elements called ranges.
• For example: $S = (X, R)$ with $X = \mathbb{R}^2$ and $R$ = all disks.
Measure
• Let $S = (X, R)$ be a range space and let $x$ be a finite subset of $X$, then for $r \in R$ the measure is:
$$m(r) = \frac{|r \cap x|}{|x|}$$
Motivation
• For $N \subseteq x$, the estimation of $m(r)$ is defined by:
$$s(r) = \frac{|r \cap N|}{|N|}$$
• We want to find a sample $N \subseteq x$, such that for every $r \in R$, $m(r) \approx s(r)$.
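The measure and its estimate can be sketched in a few lines of Python. This is only an illustration under assumptions made here: the points are uniform in the unit square, the range is a single disk, and the names `in_disk` and `fraction_in_disk` are hypothetical helpers, not part of the slides.

```python
import random

def in_disk(p, center, radius):
    """True if point p lies in the closed disk with the given center and radius."""
    return (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 <= radius ** 2

def fraction_in_disk(points, center, radius):
    """Fraction of `points` that fall inside the disk: |r ∩ points| / |points|."""
    return sum(in_disk(p, center, radius) for p in points) / len(points)

rng = random.Random(0)                       # fixed seed, for reproducibility
x = [(rng.random(), rng.random()) for _ in range(10_000)]   # the finite set x
N = rng.sample(x, 500)                       # a random sample N ⊆ x (size chosen arbitrarily)

center, radius = (0.5, 0.5), 0.3             # one range r (a disk)
m_r = fraction_in_disk(x, center, radius)    # the measure m(r)
s_r = fraction_in_disk(N, center, radius)    # the estimate s(r)
print(m_r, s_r)
```

The two printed values should be close for this one disk; the rest of the deck is about making that hold for every range at once.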
More Definitions
• Let $S = (X, R)$ be a range space. For $Y \subseteq X$, let $R_{|Y} = \{ r \cap Y \mid r \in R \}$ be the projection of $R$ on $Y$.
• If $R_{|Y}$ contains all subsets of $Y$, then $Y$ is shattered by $R$.
• $\dim_{VC}(S)$ is the maximum cardinality of a shattered subset of $X$.
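For a small finite range space these definitions can be checked by brute force. The sketch below assumes a toy range space chosen here for illustration (five points on a line, with all intervals as ranges); the function names are hypothetical and the search is exponential, so it is only meant for tiny inputs.

```python
from itertools import combinations

def projection(ranges, Y):
    """R_|Y: the distinct sets r ∩ Y over all ranges r."""
    Y = frozenset(Y)
    return {frozenset(r) & Y for r in ranges}

def is_shattered(ranges, Y):
    """Y is shattered iff the projection contains all 2^|Y| subsets of Y."""
    return len(projection(ranges, Y)) == 2 ** len(Y)

def vc_dimension(X, ranges):
    """Largest size of a shattered subset of X (brute force)."""
    best = 0
    for size in range(1, len(X) + 1):
        if any(is_shattered(ranges, Y) for Y in combinations(X, size)):
            best = size
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best

X = [1, 2, 3, 4, 5]
# Ranges: every interval [a, b] restricted to X, plus the empty range.
intervals = [set(range(a, b + 1)) for a in X for b in X if a <= b] + [set()]

print(is_shattered(intervals, {1, 3}))      # two points can be shattered
print(is_shattered(intervals, {1, 2, 3}))   # {1, 3} cannot be cut out of three points
print(vc_dimension(X, intervals))
```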
Half-Spaces
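• Claim: let $P = \{p_1, p_2, \ldots, p_{\ell+2}\}$ be a set in $\mathbb{R}^\ell$. There are real numbers $\beta_1, \beta_2, \ldots, \beta_{\ell+2}$, not all of them zero, such that
$$\sum_i \beta_i p_i = 0, \qquad \sum_i \beta_i = 0$$
• Theorem: Let $P = \{p_1, p_2, \ldots, p_{\ell+2}\}$ be a set in $\mathbb{R}^\ell$. Then, there exist 2 disjoint subsets $C, D \subseteq P$ such that
$$CH(C) \cap CH(D) \neq \emptyset, \qquad C \cup D = P$$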
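A small worked instance may help here. It takes the dimension to be 1 and three points on the line; the points and coefficients are chosen here purely for illustration and do not come from the slides.

```latex
% Dimension 1, so the set has 1 + 2 = 3 points.
p_1 = 0, \quad p_2 = 1, \quad p_3 = 2, \qquad
(\beta_1, \beta_2, \beta_3) = (1, -2, 1)

% Both conditions of the claim hold:
\sum_i \beta_i p_i = 1\cdot 0 - 2\cdot 1 + 1\cdot 2 = 0, \qquad
\sum_i \beta_i = 1 - 2 + 1 = 0

% Split by the sign of the coefficients:
C = \{p_1, p_3\}, \quad D = \{p_2\}, \qquad
CH(C) = [0, 2] \ni 1, \quad CH(D) = \{1\}
\;\Longrightarrow\; CH(C) \cap CH(D) = \{1\} \neq \emptyset
```

The common point is the weighted average $(\beta_1 p_1 + \beta_3 p_3)/(\beta_1 + \beta_3) = 1$, which is how the partition is read off from the coefficients in general.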
Half-Spaces
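• Lemma: let $P \subseteq \mathbb{R}^\ell$ be a finite set. Let $s \in CH(P)$ and let $h$ be a half-space of $\mathbb{R}^\ell$ containing $s$. Then there is a point in $P$ contained in $h$.
• Corollary: Let $S = (X, R)$ with $X = \mathbb{R}^\ell$ and $R$ = all closed half-spaces. Then, $\dim_{VC}(S) = \ell + 1$. Proof:
– The $\ell + 1$ vertices of a regular simplex are shattered, so $\dim_{VC}(S) \ge \ell + 1$.
– Any $\ell + 2$ points split into $C, D$ with $CH(C) \cap CH(D) \neq \emptyset$; a half-space containing $C$ contains a point of $CH(D)$ and hence, by the lemma, a point of $D$. So no set of $\ell + 2$ points is shattered and $\dim_{VC}(S) < \ell + 2$.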
The Growth Function
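• Let $G_\delta(n) = \sum_{i=0}^{\delta} \binom{n}{i} \le n^\delta$ (for $\delta > 1$ and $n > 1$) be the growth function.
• Notice:
$$G_\delta(n) = G_\delta(n-1) + G_{\delta-1}(n-1)$$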
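The growth function and its recurrence are easy to check numerically. The snippet below is a minimal sketch; the function name `growth` is chosen here, and the loop bounds are arbitrary.

```python
from math import comb

def growth(delta, n):
    """G_delta(n) = sum_{i=0}^{delta} C(n, i)."""
    return sum(comb(n, i) for i in range(delta + 1))

# Check the recurrence G_delta(n) = G_delta(n-1) + G_{delta-1}(n-1)
# and the polynomial bound G_delta(n) <= n^delta on a small grid.
for delta in range(2, 6):
    for n in range(2, 30):
        assert growth(delta, n) == growth(delta, n - 1) + growth(delta - 1, n - 1)
        assert growth(delta, n) <= n ** delta

print(growth(2, 5))   # 1 + 5 + 10
```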
Sauer’s Lemma
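• If $(X, R)$ is a range space with $\dim_{VC}(X, R) = \delta$ and $|X| = n$ then $|R| \le G_\delta(n)$.
• Proof: By induction. The claim trivially holds for $\delta = 0$ or $n = 0$.
• Let $x \in X$, and consider the sets:
$$R_x = \{\, r \setminus \{x\} \mid r \cup \{x\} \in R,\ r \setminus \{x\} \in R \,\}, \qquad R \setminus x = \{\, r \setminus \{x\} \mid r \in R \,\}$$
• $|R| = |R_x| + |R \setminus x|$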
Sauer’s Lemma
• Notice: $\dim_{VC}(X \setminus \{x\}, R_x) \le \delta - 1$
• By induction:
$$|R| = |R_x| + |R \setminus x| \le G_{\delta-1}(n-1) + G_\delta(n-1) = G_\delta(n)$$
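The bound can be checked on a concrete family. The sketch below uses intervals on $n$ points of a line, an example chosen here rather than taken from the slides. Intervals have VC dimension 2, so the number of distinct ranges should never exceed $G_2(n)$; for this family the bound happens to be tight.

```python
from math import comb

def growth(delta, n):
    """G_delta(n) = sum_{i=0}^{delta} C(n, i)."""
    return sum(comb(n, i) for i in range(delta + 1))

def interval_ranges(n):
    """All distinct subsets of {0, ..., n-1} cut out by intervals, plus the empty set."""
    ranges = {frozenset()}
    for a in range(n):
        for b in range(a, n):
            ranges.add(frozenset(range(a, b + 1)))
    return ranges

for n in range(1, 10):
    count = len(interval_ranges(n))
    assert count <= growth(2, n)      # the bound of the lemma with delta = 2
    print(n, count, growth(2, n))
```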
Even More Definitions
• Let $S = (X, R)$ be a range space. We define its shatter function to be:
$$\pi_S(m) = \max_{B \subseteq X,\ |B| = m} |R_{|B}|$$
• We define the shattering dimension as the smallest $d$ such that $\pi_S(m) = O(m^d)$ for all $m$.
• Let $n = |B|$, we have that $|R_{|B}| \le G_\delta(n) \le n^\delta$. As such $|R_{|B}| \le n^\delta$ and by its definition $d \le \delta$.
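• Let $N \subseteq X$ be the largest set shattered by $S$ and $\delta = |N|$, so
$$2^\delta = |R_{|N}| \le \pi_S(|N|) \le c\,\delta^d \;\Longrightarrow\; \delta \le \lg c + d \lg \delta \;\Longrightarrow\; \frac{\delta - \lg c}{\lg \delta} \le d$$
Assuming $\delta \ge \max(2, 2\lg c)$:
$$\frac{\delta}{2 \lg \delta} \le d \;\Longrightarrow\; \frac{\delta}{\ln \delta} \le \frac{2d}{\ln 2} \le 6d \;\Longrightarrow\; \delta \le 2(6d)\ln(6d)$$
so $\delta = O(d \log d)$.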
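Why Do We Need Shattering Dimension?
• Let $S = (X, R)$, with $X = \mathbb{R}^2$ and $R$ = all disks, be a range space. The shattering dimension of $S$ is 3.
• Proof: Consider any set $P$ of $n$ points, then $|R_{|P}| \le 4n^3$:
– $|\{ r \in R_{|P} : |r| = 1 \}| \le n$
– $|\{ r \in R_{|P} : |r| = 2 \}| \le \binom{n}{2}$
– $|\{ r \in R_{|P} : |r| \ge 3 \}| \le 8\binom{n}{3}$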
Mixing Range Spaces
• Let $S_1 = (X, R^1), \ldots, S_k = (X, R^k)$ be range spaces with finite VC dimensions $\delta_1, \ldots, \delta_k$.
• Let $f(r_1, \ldots, r_k)$ be a function that maps $k$-tuples of sets $r_i \in R^i$ into a subset of $X$, for example by $\cup$, $\cap$, $\setminus$.
• Consider the range set $R' = \{\, f(r_1, \ldots, r_k) \mid r_i \in R^i \,\}$.
• Theorem: The associated range space $(X, R')$ has a VC dimension bounded by $O(k \delta \lg k)$, where $\delta = \max_i \delta_i$.
Proof
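• Assume $Y \subseteq X$, $|Y| = t$, is being shattered by $R'$.
•
$$2^t = |R'_{|Y}| \le \big|\{ (r_1, \ldots, r_k) \mid r_i \in R^i_{|Y} \}\big| \le \prod_{i=1}^{k} |R^i_{|Y}| \le \prod_{i=1}^{k} G_{\delta_i}(t) \le \big(G_\delta(t)\big)^k \le \left( 2 \left( \frac{te}{\delta} \right)^{\delta} \right)^{k}$$
• Assume $t \ge e$ and $\delta \lg(t/\delta) \ge 1$. Taking $\lg$ of both sides:
$$t \le k\big(1 + \delta \lg(te/\delta)\big) \le 3k\delta \lg(t/\delta)$$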
Proof
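• Setting $x = t/\delta$ gives us:
$$t \le 3k\delta \lg(t/\delta) \;\Longrightarrow\; x \le 3k \lg x \;\Longrightarrow\; \frac{x}{\ln x} \le \frac{3k}{\ln 2} \le 6k \;\Longrightarrow\; x \le 2 \cdot 6k \ln(6k) \;\Longrightarrow\; t \le 12 k \delta \ln(6k)$$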
Dual Range Spaces
• Let $S = (X, R)$ be a range space.
• Let $p \in X$. We define $R_p = \{\, r \mid p \in r,\ r \in R \,\}$.
• The dual range space is $S^* = (R, X^*)$, where $X^* = \{\, R_p \mid p \in X \,\}$.
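• Assume $\{r_1, \ldots, r_k\} \subseteq R$, with $k = 2^{\delta+1}$, is shattered by $S^*$, then there are points $p_1, \ldots, p_{2^k}$ creating the following matrix (the entry in row $r_i$ and column $p_j$ is 1 if and only if $p_j \in r_i$), in which every binary vector of length $k$ appears as a column:
$$M = \begin{array}{c|cccc} & p_1 & p_2 & \cdots & p_{2^k} \\ \hline r_1 & 0 & 1 & \cdots & 1 \\ r_2 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ r_k & 0 & 0 & \cdots & 1 \end{array}$$
• Choose $\lg k = \delta + 1$ columns of $M$ that form the matrix $M'$, whose $i$-th row is the binary representation of $i - 1$:
$$M' = \begin{pmatrix} 0 & \cdots & 0 & 0 \\ 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & \vdots & \vdots \\ 1 & \cdots & 1 & 1 \end{pmatrix}$$
• The set of points represented by the columns of $M'$ is shattered by $S$. It has $\lg k = \delta + 1$ points, so if $\dim_{VC}(S) = \delta$ this is a contradiction, and hence $\dim_{VC}(S^*) < 2^{\delta+1}$.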
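ε-Samples
• Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points. A subset $C \subseteq x$ is an $\varepsilon$-sample for $x$ if:
$$\forall r \in R: \quad |m(r) - s(r)| \le \varepsilon$$
where $m(r) = |r \cap x| / |x|$ and $s(r) = |r \cap C| / |C|$.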
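The ε-Sample Theorem
• There is a positive constant $c$ such that if $(X, R)$ is any range space with VC dimension at most $\delta$, $x \subseteq X$ is a finite subset and $\varepsilon, \varphi > 0$, then a random subset $C \subseteq x$ of cardinality
$$s = \frac{c}{\varepsilon^2} \left( \delta \log \frac{\delta}{\varepsilon} + \log \frac{1}{\varphi} \right)$$
is an $\varepsilon$-sample for $x$ with probability at least $1 - \varphi$.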
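ε-Nets
• Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points. A subset $N \subseteq x$ is an $\varepsilon$-net for $x$ if:
$$\forall r \in R: \quad m(r) \ge \varepsilon \;\Longrightarrow\; r \cap N \neq \emptyset$$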
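The ε-Net Theorem
• Let $S = (X, R)$ be a range space of VC dimension $\delta$, and let $x$ be a finite subset of points.
• Suppose $0 < \varepsilon \le 1$ and $\varphi < 1$.
• Let $N$ be a set obtained by $m$ random independent draws from $x$, where:
$$m \ge \max\left( \frac{4}{\varepsilon} \lg \frac{4}{\varphi},\ \frac{8\delta}{\varepsilon} \lg \frac{16}{\varepsilon} \right)$$
• Then $N$ is an $\varepsilon$-net with probability at least $1 - \varphi$.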
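The number of draws required by the theorem can be evaluated directly. The helper below is a small sketch: the function name and the parameter name `phi` (the failure probability) are choices made here, and `lg` is taken to be the base-2 logarithm.

```python
from math import ceil, log2

def epsilon_net_sample_size(eps, phi, delta):
    """Smallest integer m with m >= max((4/eps) lg(4/phi), (8 delta/eps) lg(16/eps))."""
    if not (0 < eps <= 1 and 0 < phi < 1):
        raise ValueError("need 0 < eps <= 1 and 0 < phi < 1")
    return ceil(max((4 / eps) * log2(4 / phi),
                    (8 * delta / eps) * log2(16 / eps)))

# Example: VC dimension 3 (e.g. disks in the plane), eps = 5%, failure probability 1%.
print(epsilon_net_sample_size(eps=0.05, phi=0.01, delta=3))
```

Note that the result does not depend on the size of $x$, only on $\varepsilon$, $\varphi$ and the VC dimension.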
Some Applications
Range Searching
• Consider a very large set of points in the plane.
• We would like to be able to quickly decide how many points are included inside a query rectangle.
• The ε-Sample Theorem tells us that there is a subset of constant size that works for all query rectangles.
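A rough sketch of this idea follows. It assumes uniformly random points and a sample size of 1,000 picked arbitrarily here (the theorem's constant $c$ is not specified, so the size is not derived from it); `count_in_rect` and `estimate_count` are hypothetical helper names.

```python
import random

def count_in_rect(points, rect):
    """Number of points inside the axis-parallel rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    return sum(x0 <= px <= x1 and y0 <= py <= y1 for px, py in points)

def estimate_count(sample, total, rect):
    """Scale the fraction of sample points in the rectangle up to the full set."""
    return total * count_in_rect(sample, rect) / len(sample)

rng = random.Random(1)                                   # fixed seed
points = [(rng.random(), rng.random()) for _ in range(50_000)]
sample = rng.sample(points, 1_000)                       # built once, reused for every query

rect = (0.2, 0.1, 0.7, 0.6)                              # one query rectangle
exact = count_in_rect(points, rect)                      # scans all 50,000 points
approx = estimate_count(sample, len(points), rect)       # scans only the sample
print(exact, approx)
```

The same sample is reused for every query, which is the point of the theorem: the error guarantee holds for all rectangles simultaneously, not just for one fixed in advance.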
Learning a Concept
• A function $f : \mathbb{R}^2 \to \{0, 1\}$ is defined over the plane and returns '1' inside an unknown disk $D_{unknown}$ and '0' outside of it.
• There is some distribution $\mathcal{D}$ and we can pick (roughly) $O((1/\varepsilon) \log(1/\varepsilon))$ random points in a sample $R$ and compute $f$ just for them.
Learning a Concept
[Figure: the function $f$ and the unknown disk $D_{unknown}$]
Learning a Concept
• We compute the smallest disk $D'$ that contains only the points in $R$ labeled '1' and define $g$ that returns '1' inside $D'$ and '0' outside it.
• We claim that $g$ classifies correctly all but an $\varepsilon$ fraction of the points:
$$\Pr_{p \in \mathcal{D}}\big[ f(p) \neq g(p) \big] \le \varepsilon$$
Learning a Concept
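• Let $S = (\mathbb{R}^2, \mathcal{R})$ be a range space where $\mathcal{R}$ is all the symmetric differences between 2 disks (written $\mathcal{R}$ here to keep it apart from the sample $R$).
• $\mathcal{D}$ is our finite set.
• $r = D_{unknown} \oplus D' \in \mathcal{R}$
• $m(r)$ is the probability of mistake in classification.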
Learning a Concept
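• By the ε-Net Theorem the sample $R$ is an $\varepsilon$-net, and so:
$$m(r) \ge \varepsilon \;\Longrightarrow\; r \cap R \neq \emptyset$$
• Since $f$ and $g$ agree on every point of the sample, $r \cap R = \emptyset$, and therefore $m(r) < \varepsilon$.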
Discrepancy
• Let $\chi : X \to \{-1, 1\}$ be a coloring. We define the discrepancy of $\chi$ over a range $r$ as the amount of imbalance in the coloring:
$$|\chi(r)| = \Big| \sum_{p \in r} \chi(p) \Big|$$
• Let $\Pi$ be a partition of $X$ into pairs. We will refer to $\chi$ as compatible with $\Pi$ if the two points of each pair in $\Pi$ are colored by different colors by $\chi$.
Discrepancy
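• Let $\#_r$ denote the crossing number of $r$, that is, the number of pairs of $\Pi$ with exactly one point in $r$.
• Let $X_i \in \{-1, +1\}$ be the contribution of the $i$'th crossing pair to $\chi(r)$.
• For $\#_r > 0$ and the threshold $c_r = \sqrt{2 \#_r \ln(4m)}$, where $m$ is the number of ranges, we have by Chernoff's inequality:
$$\Pr\big[ |\chi(r)| \ge c_r \big] = 2 \Pr\Big[ \sum_i X_i \ge c_r \Big] \le 2 \exp\left( -\frac{c_r^2}{2 \#_r} \right) = \frac{1}{2m}$$
• By the union bound over the $m$ ranges, with probability at least $1/2$ every range satisfies $|\chi(r)| < c_r$.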
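The definitions of this section can be made concrete with a short sketch. The ground set, the pairing of consecutive points and the single range below are all choices made here for illustration; the helper names are hypothetical.

```python
import random

def compatible_coloring(pairs, rng):
    """Random coloring in which the two points of every pair get opposite colors."""
    chi = {}
    for p, q in pairs:
        sign = rng.choice((-1, 1))
        chi[p], chi[q] = sign, -sign
    return chi

def discrepancy(chi, r):
    """|chi(r)| = |sum of chi(p) over the points p in r|."""
    return abs(sum(chi[p] for p in r))

def crossing_number(pairs, r):
    """Number of pairs with exactly one point inside r."""
    return sum((p in r) != (q in r) for p, q in pairs)

X = list(range(16))
pairs = [(X[i], X[i + 1]) for i in range(0, len(X), 2)]   # (0,1), (2,3), ...
chi = compatible_coloring(pairs, random.Random(0))

r = set(range(3, 10))                  # the range {3, ..., 9}
print(crossing_number(pairs, r))       # only the pair (2, 3) crosses r
print(discrepancy(chi, r))             # pairs fully inside r cancel out
```

Pairs that lie entirely inside or entirely outside a range contribute zero, so the discrepancy is always at most the crossing number; the Chernoff bound sharpens this to roughly its square root for a random compatible coloring.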
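Building ε-Sample via Discrepancy
• Lemma: Let $Q \subseteq P$ be an $\varepsilon_1$-sample for $P$, and let $Q' \subseteq Q$ be an $\varepsilon_2$-sample for $Q$. Then $Q'$ is an $(\varepsilon_1 + \varepsilon_2)$-sample for $P$.
• Proof:
$$\left| \frac{|P \cap r|}{|P|} - \frac{|Q' \cap r|}{|Q'|} \right| \le \left| \frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|} \right| + \left| \frac{|Q \cap r|}{|Q|} - \frac{|Q' \cap r|}{|Q'|} \right| \le \varepsilon_1 + \varepsilon_2$$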
Building ε-Sample via Discrepancy
• Let $S = (X, R)$ be a range space with shattering dimension $d$.
• Let $P \subseteq X$ be a set of $n$ points, and consider the induced range space $S_{|P} = (P, R_{|P})$.
• $m = |R_{|P}| = O(n^d)$
• Consider a coloring $\chi$, with the discrepancy bounded as we have seen, and let
$$Q = \{\, p \in P \mid \chi(p) = -1 \,\}, \qquad |Q| = \frac{n}{2}$$
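Building ε-Sample via Discrepancy
• Now, for every range $r \in R$:
$$\big|\, |(P \setminus Q) \cap r| - |Q \cap r| \,\big| = |\chi(r)| < \sqrt{2 \#_r \ln(4m)} \le \sqrt{n \ln O(n^d)} \le c \sqrt{n \ln(n^d)}$$
for some absolute constant $c$.
•
$$\big|\, |P \cap r| - 2|Q \cap r| \,\big| = \big|\, |(P \setminus Q) \cap r| - |Q \cap r| \,\big| \le c \sqrt{n \ln(n^d)}$$
• Dividing by $n = |P| = 2|Q|$:
$$\left| \frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|} \right| \le \tau(n), \qquad \tau(n) = c \sqrt{\frac{d \ln n}{n}}$$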
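Building ε-Sample via Discrepancy
• Let $P_0 = P$, $P_1 = Q$. We will compute a coloring $\chi_{i-1}$ of $P_{i-1}$ with low discrepancy. Let $P_i$ be the points of $P_{i-1}$ colored $-1$ by $\chi_{i-1}$.
• Let $\varepsilon_i = \tau(n_{i-1})$, where $n_{i-1} = |P_{i-1}| = n / 2^{i-1}$ and $\tau(n) = c\sqrt{d \ln(n) / n}$.
• By the lemma we have that $P_k$ is a $\left( \sum_{i=1}^{k} \varepsilon_i \right)$-sample for $P$.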
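Building ε-Sample via Discrepancy
• We look for the maximal $k$ such that $\sum_{i=1}^{k} \varepsilon_i \le \varepsilon$:
$$\sum_{i=1}^{k} \varepsilon_i = \sum_{i=1}^{k} c \sqrt{\frac{d \ln(n / 2^{i-1})}{n / 2^{i-1}}} \le c_1 \sqrt{\frac{d \ln(n / 2^{k-1})}{n / 2^{k-1}}} = c_1 \sqrt{\frac{d \ln n_{k-1}}{n_{k-1}}} \le \varepsilon$$
• This holds when
$$c_1^2 \frac{d \ln n_{k-1}}{n_{k-1}} \le \varepsilon^2 \;\Longleftrightarrow\; \frac{c_1^2 d}{\varepsilon^2} \le \frac{n_{k-1}}{\ln n_{k-1}}, \quad \text{which holds for} \quad n_{k-1} \ge 2 \frac{c_1^2 d}{\varepsilon^2} \ln \frac{c_1^2 d}{\varepsilon^2}$$
• So, taking the largest $k$ results in a set $P_k$ of size $O((d/\varepsilon^2) \ln(d/\varepsilon))$ which is an $\varepsilon$-sample for $P$.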
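The repeated halving can be sketched as follows. This is an illustration under assumptions made here, not the construction from the slides in full generality: the points lie on a line, the ranges are prefixes $(-\infty, t]$, and consecutive points are paired. Keeping one random point of each pair is the same as keeping the points colored $-1$ by a random compatible coloring.

```python
import random
from bisect import bisect_right

def halve(points, rng):
    """One round: pair consecutive points and keep one point of each pair at random."""
    return [points[i + rng.randrange(2)] for i in range(0, len(points) - 1, 2)]

def max_prefix_error(P, Q):
    """max over thresholds t in P of | |P ∩ (-inf,t]|/|P| - |Q ∩ (-inf,t]|/|Q| |."""
    Ps, Qs = sorted(P), sorted(Q)
    return max(abs(bisect_right(Ps, t) / len(Ps) - bisect_right(Qs, t) / len(Qs))
               for t in Ps)

rng = random.Random(0)
P = sorted(rng.random() for _ in range(1024))   # P_0 = P

Q = P
for _ in range(4):                              # P_1, ..., P_4
    Q = halve(Q, rng)

print(len(Q), max_prefix_error(P, Q))
```

With this pairing every prefix range crosses at most one pair per round, so round $i$ adds an error of at most $1/n_{i-1}$ and the errors add up as in the lemma: $1/1024 + 1/512 + 1/256 + 1/128 < 1/64$.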
Proof of the ε-Net Theorem
• Let $S = (X, R)$ be a range space of VC dimension $\delta$ and let $x$ be a set of points of size $n$.
• Let $N = (x_1, \ldots, x_m)$ be a set of points obtained by $m$ independent samples from $x$.
• Let $E_1 = \{\, \exists r \in R : |r \cap x| \ge \varepsilon n,\ r \cap N = \emptyset \,\}$.
• We wish to bound $\Pr[E_1] \le \varphi$.
• Let $T = (y_1, \ldots, y_m)$ be a sample generated like $N$.
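Proof of the ε-Net Theorem
• Let $E_2 = \{\, \exists r \in R : |r \cap x| \ge \varepsilon n,\ r \cap N = \emptyset,\ |r \cap T| \ge \varepsilon m / 2 \,\}$.
• Claim: $\Pr[E_1] \le 2 \Pr[E_2]$. Since $E_2 \subseteq E_1$:
$$\Pr[E_2 \mid E_1] = \frac{\Pr[E_2 \cap E_1]}{\Pr[E_1]} = \frac{\Pr[E_2]}{\Pr[E_1]}$$
so it is enough to show $\Pr[E_2 \mid E_1] \ge 1/2$.
• Assume $E_1$ occurs, and let $r'$ be a range witnessing it, that is, $|r' \cap x| \ge \varepsilon n$ and $r' \cap N = \emptyset$.
• Then
$$\Pr[E_2 \mid E_1] \ge \Pr\left[ |r' \cap T| \ge \frac{\varepsilon m}{2} \right]$$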
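Proof of the ε-Net Theorem
• Let $p = |r' \cap x| / n \ge \varepsilon$.
• Then $X' = |r' \cap T|$ is a binomial variable with:
$$E[X'] = pm, \qquad V[X'] = p(1-p)m \le pm$$
• Thus, by Chebychev's inequality:
$$\Pr\left[ X' < \frac{\varepsilon m}{2} \right] \le \Pr\left[ X' < \frac{pm}{2} \right] \le \Pr\left[ |X' - pm| > \frac{pm}{2} \right] = \Pr\left[ |X' - pm| > \frac{\sqrt{pm}}{2} \sqrt{pm} \right]$$
$$\le \Pr\left[ |X' - E[X']| > \frac{\sqrt{pm}}{2} \sqrt{V[X']} \right] \le \left( \frac{2}{\sqrt{pm}} \right)^2 \le \frac{1}{2}, \qquad \text{since } m \ge 8/\varepsilon \ge 8/p$$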
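Proof of the ε-Net Theorem
• Let $E_2' = \{\, \exists r \in R : r \cap N = \emptyset,\ |r \cap T| \ge \varepsilon m / 2 \,\}$.
• Since $E_2 \subseteq E_2'$: $\Pr[E_1] \le 2 \Pr[E_2] \le 2 \Pr[E_2']$.
• Claim: $\Pr[E_2'] \le G_\delta(2m)\, 2^{-\varepsilon m / 2}$. Let $Z = N \cup T$ be the combined sample of $2m$ points:
$$\Pr[E_2'] = \sum_{z \in x^{2m}} \Pr\big[ E_2' \cap (Z = z) \big] = \sum_{z \in x^{2m}} \frac{\Pr\big[ E_2' \cap (Z = z) \big]}{\Pr[Z = z]} \Pr[Z = z] = \sum_{z \in x^{2m}} \Pr\big[ E_2' \mid Z = z \big] \Pr[Z = z]$$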
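Proof of the ε-Net Theorem
• Now we fix a set $Z$, and it's enough to consider the range space $(Z, R_{|Z})$; we know $|R_{|Z}| \le G_\delta(2m)$.
• Let us fix any $r \in R_{|Z}$ and consider the event
$$E_r = \left\{\, r \cap N = \emptyset,\ |r \cap T| \ge \frac{\varepsilon m}{2} \,\right\}$$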
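Proof of the ε-Net Theorem
• Claim: $\Pr[E_r] \le 2^{-\varepsilon m / 2}$.
• If $k = |r \cap (N \cup T)| < \varepsilon m / 2$ it is trivial, since then $\Pr[E_r] = 0$.
• Otherwise,
$$\Pr[E_r] \le \Pr[r \cap N = \emptyset] = \binom{2m - k}{m} \Big/ \binom{2m}{m} = \frac{(2m-k)(2m-k-1) \cdots (m-k+1)}{2m(2m-1) \cdots (m+1)}$$
$$= \frac{m(m-1) \cdots (m-k+1)}{2m(2m-1) \cdots (2m-k+1)} \le 2^{-k} \le 2^{-\varepsilon m / 2}$$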
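Proof of the ε-Net Theorem
$$\Pr[E_2' \mid Z] = \Pr\Big[ \bigcup_{r \in R_{|Z}} E_r \,\Big|\, Z \Big] \le \sum_{r \in R_{|Z}} \Pr[E_r \mid Z] \le |R_{|Z}|\, 2^{-\varepsilon m / 2} \le G_\delta(2m)\, 2^{-\varepsilon m / 2}$$
$$\Longrightarrow\; \Pr[E_2'] \le G_\delta(2m)\, 2^{-\varepsilon m / 2}$$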
• And to sum up:
$$\Pr[E_1] \le 2 \Pr[E_2] \le 2 \Pr[E_2'] \le 2\, G_\delta(2m)\, 2^{-\varepsilon m / 2}$$
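As a numeric sanity check, the final bound can be evaluated for the number of draws stated in the ε-Net Theorem. The parameter values below are picked here only as an example, `phi` is the name used here for the allowed failure probability, and `lg` is taken as the base-2 logarithm.

```python
from math import ceil, comb, log2

def growth(delta, n):
    """G_delta(n) = sum_{i=0}^{delta} C(n, i)."""
    return sum(comb(n, i) for i in range(delta + 1))

def net_failure_bound(eps, delta, m):
    """The bound 2 * G_delta(2m) * 2^(-eps * m / 2) on Pr[E_1]."""
    return 2 * growth(delta, 2 * m) * 2.0 ** (-eps * m / 2)

eps, phi, delta = 0.1, 0.1, 2
m = ceil(max((4 / eps) * log2(4 / phi), (8 * delta / eps) * log2(16 / eps)))
bound = net_failure_bound(eps, delta, m)
print(m, bound, bound <= phi)
```

For these values the bound comes out far below `phi`, because the exponential term $2^{-\varepsilon m/2}$ overwhelms the polynomial growth of $G_\delta(2m)$.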