15-826: multimedia databases and data mining › ... › courses › 826.s09 › foils-pdf ›...
TRANSCRIPT
C. Faloutsos 15-826
1
CMU SCS
15-826: Multimedia Databases
and Data Mining
Lecture #12: Fractals - case studies Part III
(regions, quadtrees, knn queries)
C. Faloutsos
15-826 Copyright: C. Faloutsos (2009) 2
CMU SCS
Must-read Material
• Alberto Belussi and Christos Faloutsos,
Estimating the Selectivity of Spatial Queries
Using the `Correlation' Fractal Dimension
Proc. of VLDB, p. 299-310, 1995
15-826 Copyright: C. Faloutsos (2009) 3
CMU SCS
Optional Material
Optional, but very useful: Manfred Schroeder
Fractals, Chaos, Power Laws: Minutes
from an Infinite Paradise W.H. Freeman
and Company, 1991
C. Faloutsos 15-826
2
15-826 Copyright: C. Faloutsos (2009) 4
CMU SCS
Outline
Goal: ‘Find similar / interesting things’
• Intro to DB
• Indexing - similarity search
• Data Mining
15-826 Copyright: C. Faloutsos (2009) 5
CMU SCS
Indexing - Detailed outline
• primary key indexing
• secondary key / multi-key indexing
• spatial access methods
– z-ordering
– R-trees
– misc
• fractals
– intro
– applications
• text
• ...
15-826 Copyright: C. Faloutsos (2009) 6
CMU SCS
Indexing - Detailed outline
• fractals
– intro
– applications
• disk accesses for R-trees (range queries)
• dimensionality reduction
• selectivity in M-trees
• dim. curse revisited
• “fat fractals”
• quad-tree analysis [Gaede+]
• nn queries [Belussi+]
C. Faloutsos 15-826
3
15-826 Copyright: C. Faloutsos (2009) 7
CMU SCS
‘Fat’ fractals & R-tree
performance on region data• Problem [Proietti+,’99]
• Given
– N (# of data regions )
• estimate how many of them will qualify for the average range query (q1 x q2 x ... qE)
Of course, we need more info
Q: what?
15-826 Copyright: C. Faloutsos (2009) 8
CMU SCS
R-tree performance on region
dataA: the distributions of their sizes
Q: do we also need some info about the locations?
15-826 Copyright: C. Faloutsos (2009) 9
CMU SCS
R-tree performance on region
dataA: the distributions of their sizes
Q: do we also need some info about the locations?
A: no (not for range queries)
C. Faloutsos 15-826
4
15-826 Copyright: C. Faloutsos (2009) 10
CMU SCS
R-tree performance on region
dataA: the distributions of their sizes
Q: what exactly would we need?
15-826 Copyright: C. Faloutsos (2009) 11
CMU SCS
R-tree performance on region
dataA: the distributions of their sizes
Q: what exactly would we need?
A: for self-similar regions (~ ‘fat’ fractals), we just need the slope of the Korcak law!
(and the total area) [Proietti+]
15-826 Copyright: C. Faloutsos (2009) 12
CMU SCS
More power laws: areas –
Korcak’s law
Scandinavian lakes
Any pattern?
C. Faloutsos 15-826
5
15-826 Copyright: C. Faloutsos (2009) 13
CMU SCS
More power laws: areas –
Korcak’s law
Scandinavian lakes
area vs
complementary
cumulative count
(log-log axes) 0
2
4
6
8
10
0 2 4 6 8 10 12 14
Num
ber
of
regio
ns
Area
Area distribution of regions (SL dataset)
-0.85*x +10.8
log(count( >= area))
log(area)
B (patchiness
exponent)
15-826 Copyright: C. Faloutsos (2009) 14
CMU SCS
More power laws: Korcak
1
10
100
1000
1 10 100 1000 10000 100000
Nu
mb
er
of
isla
nd
s
Area
Area distribution of Japan Archipelago
Japan islands;
area vs cumulative
count (log-log axes) log(area)
log(count( >= area))
15-826 Copyright: C. Faloutsos (2009) 15
CMU SCS
Korcak’s law & “fat fractals”
C. Faloutsos 15-826
6
15-826 Copyright: C. Faloutsos (2009) 16
CMU SCS
R-tree performance on regions
• Once we know ‘B’ (and the total area)
• we can second-guess the individual sizes
• and then apply the [Pagel+93] formula
• Bottom line:
15-826 Copyright: C. Faloutsos (2009) 17
CMU SCS
R-tree performance on regions
15-826 Copyright: C. Faloutsos (2009) 18
CMU SCS
R-tree performance on regions
0.70190,526757REGIONS
0.60136,893470ISLANDS
0.8575,910816LAKES
BANDataset
C. Faloutsos 15-826
7
15-826 Copyright: C. Faloutsos (2009) 19
CMU SCS
R-tree performance on regions
query side
sel. error
15-826 Copyright: C. Faloutsos (2009) 20
CMU SCS
‘Fat’ fractals - observation
B: patchiness exp; d: embedding dim
DH: Hausdorff of periphery
B= DH / d
15-826 Copyright: C. Faloutsos (2009) 21
CMU SCS
‘Fat’ fractals - observation
C. Faloutsos 15-826
8
15-826 Copyright: C. Faloutsos (2009) 22
CMU SCS
‘Fat’ fractals
• intuition behind B = DH / d ?
15-826 Copyright: C. Faloutsos (2009) 23
CMU SCS
‘Fat’ fractals
• intuition behind B = DH / d ?
• A: consider ‘flooding’:
15-826 Copyright: C. Faloutsos (2009) 24
CMU SCS
‘Fat’ fractals
• intuition behind B = DH / d ?
• A: consider ‘flooding’:
C. Faloutsos 15-826
9
15-826 Copyright: C. Faloutsos (2009) 25
CMU SCS
Conclusions
• ‘Fat’ fractals model regions well
• patchiness exp.: B = DH / d
• can help us estimate selectivities
15-826 Copyright: C. Faloutsos (2009) 26
CMU SCS
Indexing - Detailed outline
• fractals
– intro
– applications
• disk accesses for R-trees (range queries)
• dimensionality reduction
• selectivity in M-trees
• dim. curse revisited
• “fat fractals”
• quad-tree analysis [Gaede+]
• nn queries [Belussi+]
15-826 Copyright: C. Faloutsos (2009) 27
CMU SCS
Fractals and Quadtrees
• Problem: how many quadtree nodes will we
need, to store a region in some level of
approximation? [Gaede+96]
C. Faloutsos 15-826
10
15-826 Copyright: C. Faloutsos (2009) 28
CMU SCS
Fractals and Quadtrees
• I.e.:
15-826 Copyright: C. Faloutsos (2009) 29
CMU SCS
Fractals and Quadtrees
• I.e.:
level of quadtree
# of quadtree
‘blocks’
?
15-826 Copyright: C. Faloutsos (2009) 30
CMU SCS
Fractals and Quadtrees
• Datasets:
FranconiaBrain Atlas
C. Faloutsos 15-826
11
15-826 Copyright: C. Faloutsos (2009) 31
CMU SCS
Fractals and Quadtrees
• Hint:
– assume that the boundary is self-similar, with a
given fd
– how will the quad-tree (oct-tree) look like?
15-826 Copyright: C. Faloutsos (2009) 32
CMU SCS
Fractals and Quadtrees
white
black
gray
15-826 Copyright: C. Faloutsos (2009) 33
CMU SCS
Fractals and QuadtreesLet pg(i) the prob. to find a gray node at level i.
If self-similar, what can we say for pg(i) ?
C. Faloutsos 15-826
12
15-826 Copyright: C. Faloutsos (2009) 34
CMU SCS
Fractals and QuadtreesLet pg(i) the prob. to find a gray node at level i.
If self-similar, what can we say for pg(i) ?
A: pg(i) = pg= constant
15-826 Copyright: C. Faloutsos (2009) 35
CMU SCS
Fractals and Quadtrees
Assume only ‘gray’ and ‘white’ nodes (ie., no volume’)
Assume that pg is given - how many gray nodes at level i?
15-826 Copyright: C. Faloutsos (2009) 36
CMU SCS
Fractals and Quadtrees
Assume only ‘gray’ and ‘white’ nodes (ie., no volume’)
Assume that pg is given - how many gray nodes at level i?
A: 1 at level 0;
4*pg
(4*pg)* (4*pg)
...
(4*pg )i
C. Faloutsos 15-826
13
15-826 Copyright: C. Faloutsos (2009) 37
CMU SCS
Fractals and Quadtrees
• I.e.:
level of quadtree
# of quadtree
‘blocks’
? (4*pg) i
15-826 Copyright: C. Faloutsos (2009) 38
CMU SCS
Fractals and Quadtrees
• I.e.:
level of quadtree
log(# of quadtree
‘blocks’)
? log[(4*pg) i]
15-826 Copyright: C. Faloutsos (2009) 39
CMU SCS
Fractals and Quadtrees
• Conclusion: Self-similarity leads to easy
and accurate estimation
level
log2(#blocks)
C. Faloutsos 15-826
14
15-826 Copyright: C. Faloutsos (2009) 40
CMU SCS
Fractals and Quadtrees
• Conclusion: Self-similarity leads to easy
and accurate estimation
level
log2(#blocks)
15-826 Copyright: C. Faloutsos (2009) 41
CMU SCS
Fractals and Quadtrees
15-826 Copyright: C. Faloutsos (2009) 42
CMU SCS
Fractals and Quadtrees
level
log(#blocks)
C. Faloutsos 15-826
15
15-826 Copyright: C. Faloutsos (2009) 43
CMU SCS
Fractals and Quadtrees
• Final observation: relationship between pg
and fractal dimension?
15-826 Copyright: C. Faloutsos (2009) 44
CMU SCS
Fractals and Quadtrees
• Final observation: relationship between pg
and fractal dimension?
• A: very close:
(4*pg)i = # of gray nodes at level i =
# of Hausdorff grid-cells of side (1/2)i = r
Eventually: DH = 2 + log2( pg )
and, for E-d spaces: DH = E + log2( pg )
15-826 Copyright: C. Faloutsos (2009) 45
CMU SCS
Fractals and Quadtrees
for E-d spaces: DH = E + log2( pg )
Sanity check:
- point: DH = 0 pg = ??
- line in 2-d: DH = 1 pg = ??
- plane in 2-d: DH=2 pg= ??
C. Faloutsos 15-826
16
15-826 Copyright: C. Faloutsos (2009) 46
CMU SCS
Fractals and Quadtrees
Final conclusions:
• self-similarity leads to estimates for # of z-
values = # of quadtree/oct-tree blocks
• close dependence on the Hausdorff fractal
dimension of the boundary
15-826 Copyright: C. Faloutsos (2009) 47
CMU SCS
Indexing - Detailed outline
• fractals
– intro
– applications
• disk accesses for R-trees (range queries)
• dimensionality reduction
• selectivity in M-trees
• dim. curse revisited
• “fat fractals”
• quad-tree analysis [Gaede+]
• nn queries [Belussi+]
15-826 Copyright: C. Faloutsos (2009) 48
CMU SCS
NN queries
• Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95]
rLinf
L2
L1
C. Faloutsos 15-826
17
15-826 Copyright: C. Faloutsos (2009) 49
CMU SCS
NN queries
• Q: in NN queries, what is the effect of the shape of the query region?
• that is, for L2, and self-similar data:
log(d)
log(#pairs-within(<=d))
r L2
D2
15-826 Copyright: C. Faloutsos (2009) 50
CMU SCS
NN queries
• Q: What about L1, Linf?
log(d)
log(#pairs-within(<=d))
r L2
D2
15-826 Copyright: C. Faloutsos (2009) 51
CMU SCS
NN queries
• Q: What about L1, Linf?
• A: Same slope, different intercept
log(d)
log(#pairs-within(<=d))
r L2
D2
C. Faloutsos 15-826
18
15-826 Copyright: C. Faloutsos (2009) 52
CMU SCS
NN queries
• Q: What about L1, Linf?
• A: Same slope, different intercept
log(d)
log(#neighbors)
15-826 Copyright: C. Faloutsos (2009) 53
CMU SCS
NN queries
• Q: what about the intercept? Ie., what can we say about N2 and Ninf
r L2
N2 neighbors
rLinf
Ninf neighbors
volume: V2 volume: Vinf
SKIP
15-826 Copyright: C. Faloutsos (2009) 54
CMU SCS
NN queries
• Consider sphere with volume Vinf and r’radius
r L2
N2 neighbors
rLinf
Ninf neighbors
volume: V2 volume: Vinf
r’
SKIP
C. Faloutsos 15-826
19
15-826 Copyright: C. Faloutsos (2009) 55
CMU SCS
NN queries
• Consider sphere with volume Vinf and r’radius
• (r/r’)^E = V2 / Vinf
• (r/r’)^D2 = N2 / N2’
• N2’ = Ninf (since shape does not matter)
• and finally:
SKIP
15-826 Copyright: C. Faloutsos (2009) 56
CMU SCS
NN queries
( N2 / Ninf ) ^ 1/D2 = (V2 / Vinf) ^ 1/E
SKIP
15-826 Copyright: C. Faloutsos (2009) 57
CMU SCS
NN queries
Conclusions: for self-similar datasets
• Avg # neighbors: grows like (distance)^D2 , regardless of query shape (circle, diamond, square, e.t.c. )
C. Faloutsos 15-826
20
15-826 Copyright: C. Faloutsos (2009) 58
CMU SCS
Indexing - Detailed outline
• fractals
– intro
– applications• disk accesses for R-trees (range queries)
• dimensionality reduction
• selectivity in M-trees
• dim. curse revisited
• “fat fractals”
• quad-tree analysis [Gaede+]
• nn queries [Belussi+]
– Conclusions
15-826 Copyright: C. Faloutsos (2009) 59
CMU SCS
Fractals - overall conclusions
• self-similar datasets: appear often
• powerful tools: correlation integral, NCDF, rank-frequency plot
• intrinsic/fractal dimension helps in
– estimations (selectivities, quadtrees, etc)
– dim. reduction / dim. curse
• (later: can help in image compression...)
15-826 Copyright: C. Faloutsos (2009) 60
CMU SCS
References
1. Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the
Selectivity of Spatial Queries Using the `Correlation' Fractal
Dimension. Proc. of VLDB, Zurich, Switzerland.
2. Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-
ordering Method Using the Hausdorff Fractal Dimension.
VLDB, Bombay, India.
3. Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O
complexity for range queries on region data stored using an R-
tree. International Conference on Data Engineering (ICDE),
Sydney, Australia.