Anne Driemel: Algorithms and data structures for spaces of curves
Algorithms and Data Structures in Spaces of Curves
Anne Driemel
TU Eindhoven, the Netherlands

Spaces of Curves
A curve is a continuous map
$f : [0, 1] \to \mathbb{R}^d$
Example: time series data
[Figure: a curve with the points f(0), f(t) and f(1) marked]
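A minimal sketch of the definition above for the polygonal case, assuming a uniform parametrization in which every edge receives an equal share of [0, 1]. The function name `evaluate_polygonal_curve` and the parametrization are illustrative assumptions, not part of the talk.

```python
def evaluate_polygonal_curve(vertices, t):
    """Return f(t) for the polygonal curve through `vertices`, with t in [0, 1].

    Assumption: uniform parametrization, each edge covers 1/num_edges of [0, 1].
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    num_edges = len(vertices) - 1
    if num_edges == 0:
        return tuple(vertices[0])
    position = t * num_edges
    edge = min(int(position), num_edges - 1)  # clamp so that t = 1 maps to the last edge
    local = position - edge                   # position within the edge, in [0, 1]
    a, b = vertices[edge], vertices[edge + 1]
    return tuple((1 - local) * x + local * y for x, y in zip(a, b))


curve = [(0, 0), (2, 0), (2, 2)]
start = evaluate_polygonal_curve(curve, 0.0)
middle = evaluate_polygonal_curve(curve, 0.5)
end = evaluate_polygonal_curve(curve, 1.0)
```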

Machine Learning
Step (1): feature extraction
Step (2): learning the distribution in feature space
Hyndman, Wang, Laptev: Large-Scale Unusual Time Series Detection, ICDM Workshops 2015

Machine Learning
Disadvantages of working in feature space:
– results depend on the choice of features
– features have to be manually designed
– features capture certain aspects only
– features are not always independent of one another
Question: Is it possible to bypass the feature space and learn the distribution of curves directly?

Overview of this talk
[Mind map centered on "Machine learning in spaces of curves"]
– Clustering
– Median and depth
– Classification of curves
– Range searching
– Dimension reduction
– Geometric approximation
– Representation
– NN search
– Density estimate
– Overfitting
– Distribution of curves
– Decision trees
– Complexity: time, error rate, space, sample size

Density estimate
Density estimate using sophisticated histograms (KDE)
Only works in low-dimensional space
[Figure: density estimate with its modes marked]
[Scott, Multivariate Density Estimation, 1950]
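To make the KDE idea concrete, here is a minimal one-dimensional sketch with a Gaussian kernel. The kernel choice, the fixed bandwidth, and the name `gaussian_kde` are assumptions made for illustration; the slide does not specify them.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from 1D samples with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centered at each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


density = gaussian_kde([-0.1, 0.0, 0.1, 5.0], bandwidth=0.5)
near_mode = density(0.0)
between_modes = density(2.5)
```

The estimate is high near the cluster of samples around 0 and low in the gap before the isolated sample at 5, which is the sense in which modes show up in the density.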

Density estimate for curves?
What is the typical shape of a hurricane trajectory?

Definition of the median
Let (X, d) be a metric space on a set X with distance function $d : X^2 \to \mathbb{R}_{\ge 0}$
The median of a set P ⊆ X is defined as
$\operatorname{argmin}_{c \in X} \sum_{p \in P} d(c, p)$
[Figure: a point set with its median c]
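A small sketch of this definition, under the simplifying assumption that the minimization runs over a finite candidate set (by default the input set itself) rather than over all of X. The name `discrete_median` is hypothetical.

```python
def discrete_median(points, dist, candidates=None):
    """Return the candidate c minimizing the sum of distances d(c, p) over p in points.

    Assumption: c is restricted to `candidates` (default: the input points),
    whereas the definition above minimizes over the whole space X.
    """
    if candidates is None:
        candidates = points
    return min(candidates, key=lambda c: sum(dist(c, p) for p in points))


def absolute_difference(a, b):
    return abs(a - b)


values = [1, 2, 3, 10]
median = discrete_median(values, absolute_difference)
cost = sum(absolute_difference(median, p) for p in values)
```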

Overfitting of the median
[Figure: two groups of curves $A_1, \dots, A_{n/2}$ and $B_1, \dots, B_{n/2}$ with their median c]

Smoothing of the median
Definition: Let $X^d_\ell$ be the set of polygonal curves in $\mathbb{R}^d$ that have exactly $\ell$ edges.
The $\ell$-median of a set $P \subseteq X^d_m$ is defined as
$\operatorname{argmin}_{c \in X^d_\ell} \sum_{p \in P} d(c, p)$

Clustering of curves
Let $C \subseteq X^d_\ell$ be a k-set of centers
Let nn(p, C) be the closest center to a point p
The $(k, \ell)$-center clustering is
$\operatorname{argmin}_{C \subseteq X^d_\ell,\ |C| = k} \max_{p \in P} d(\mathrm{nn}(p, C), p)$
The $(k, \ell)$-median clustering is
$\operatorname{argmin}_{C \subseteq X^d_\ell,\ |C| = k} \sum_{p \in P} d(\mathrm{nn}(p, C), p)$
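The two objectives differ only in how the per-point costs are aggregated. The sketch below evaluates both for a given set of centers; it does not search for the optimal centers, and the function names are illustrative assumptions.

```python
def nn(p, centers, dist):
    """Return the center closest to p."""
    return min(centers, key=lambda c: dist(c, p))


def center_cost(points, centers, dist):
    """Objective of the center clustering: the largest distance to a nearest center."""
    return max(dist(nn(p, centers, dist), p) for p in points)


def median_cost(points, centers, dist):
    """Objective of the median clustering: the sum of distances to nearest centers."""
    return sum(dist(nn(p, centers, dist), p) for p in points)


def absolute_difference(a, b):
    return abs(a - b)


points = [0, 1, 9, 10]
centers = [0, 10]
radius = center_cost(points, centers, absolute_difference)
total = median_cost(points, centers, absolute_difference)
```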

Clustering in discrete metric spaces
Most variants of clustering are NP-hard or APX-hard
k-center:
– 2-approximation in O(kn) time [Gonzalez ’85]
k-median:
– $(1 + \sqrt{3} + \varepsilon)$-approximation [Li and Svensson ’13]
– Lower bound for the approximation factor ∼ 1.73
k-median, randomised:
– $(1 + \varepsilon)$-approximation in $O(n 2^{(k/\varepsilon)^{O(1)}})$ time
– Algorithm of [Kumar, Sabharwal, Sen ’10]
– Doubling spaces: [Ackermann, Blömer, Sohler ’10]
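The O(kn) bound for k-center is achieved by farthest-first traversal, the greedy strategy commonly attributed to the cited reference. A minimal sketch, assuming 1 ≤ k ≤ n and an arbitrary choice of the first center (here simply the first input point):

```python
def farthest_first_k_center(points, k, dist):
    """Greedy 2-approximation for k-center using O(kn) distance evaluations.

    Assumptions: 1 <= k <= len(points); the first center is points[0].
    Returns the chosen centers and the resulting clustering radius.
    """
    centers = [points[0]]
    # closest[i] is the distance from points[i] to its nearest chosen center.
    closest = [dist(points[0], p) for p in points]
    while len(centers) < k:
        farthest = max(range(len(points)), key=closest.__getitem__)
        centers.append(points[farthest])
        closest = [min(c, dist(points[farthest], p)) for c, p in zip(closest, points)]
    return centers, max(closest)


def absolute_difference(a, b):
    return abs(a - b)


centers, radius = farthest_first_k_center([0, 1, 9, 10, 20], 3, absolute_difference)
```

Each round scans all n points once, which gives the O(kn) running time when a distance evaluation is counted as one step.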

Distance measure for curves
The Fréchet distance measures the similarity of curves
What is the minimal distance with which two people can traverse both entire curves?
[Figure: two curves A and B at distance δ]
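The traversal question has a simple dynamic-programming answer when both people may only stand on vertices. The sketch below computes this discrete Fréchet distance, which is a different (though related) measure from the continuous distance pictured on the slide. It uses the standard quadratic-time recurrence and assumes curves are given as lists of coordinate tuples.

```python
import math


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between vertex sequences P and Q (quadratic-time DP)."""
    n, m = len(P), len(Q)
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = table[0][j - 1]
            elif j == 0:
                prev = table[i - 1][0]
            else:
                # Either one walker advances, or both advance together.
                prev = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            table[i][j] = max(prev, d)
    return table[-1][-1]


parallel = discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)])
detour = discrete_frechet([(0, 0), (2, 0)], [(0, 0), (1, 1), (2, 0)])
```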

Clustering of curves
Single distance computation:
Given two curves with $m_1$ and $m_2$ edges
– $O(m_1 m_2 \log(m_1 + m_2))$ [Alt and Godau ’95] [Buchin et al. ’13]
– Lower bound based on SETH: $\Omega((m_1 + m_2)^{2-\delta})$ ($\forall \delta > 0$) [Bringmann ’14] [Bringmann and Mulzer ’15]
1-center (no smoothing):
Given n curves in $\mathbb{R}^d$ with m edges each
– weak Fréchet distance: $O(m^n)$ time and space [Har-Peled, Raichel ’11]
– discrete Fréchet distance: $O(m^n)$ time and space [Ahn et al. ’15]

Our results for clustering
Theorem: We can compute an O(1)-approximation for the $(k, \ell)$-center clustering and $(k, \ell)$-median clustering in O(nm) time. For d = 1 we can compute a $(1 + \varepsilon)$-approximation in the same time.
Sampling property: For d = 1 one can derive a $(1 + \varepsilon)$-approximation of the $(1, \ell)$-median based on a sample of size $m_{\varepsilon,\ell}$.
Driemel, Krivosija and Sohler: Clustering time series under the Fréchet distance, SODA 2016

Classification
A central task in machine learning is the approximation of a function
$f : X \to L$
X = instances, L = labels
The input is usually a set of examples
$\{(p, \ell) \mid p \in X,\ \ell \in L\}$

Classification of curves
[Figure: vessel trajectories labeled Cargo ship, Tanker, Law enforcement, Tug]
de Vries, van Someren: Machine learning for vessel trajectories using compression, alignments and domain knowledge, Expert Syst. Appl. 2012
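
Nearest-neighbor searching
[Figure: example instances with label 1 and label 2 in the plane with axes x1 and x2, and a query point q]
Query: q
Answer: The label of the example instance closest to q
$f : X \to L$
(1) Partition the space into homogeneous areas
(2) Data structure for point location
Voronoi diagram: $\Theta(n^{\lceil d/2 \rceil})$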
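As a baseline for the query/answer pair above, a linear scan already implements the classifier f without any partition or point-location structure. This is a minimal sketch; the name `nearest_neighbor_label` and the (instance, label) pair layout are assumptions.

```python
import math


def nearest_neighbor_label(examples, query, dist):
    """Return the label of the example instance closest to the query.

    `examples` is a list of (instance, label) pairs; this is a linear scan,
    not the Voronoi-diagram approach named on the slide.
    """
    _, label = min(examples, key=lambda example: dist(example[0], query))
    return label


examples = [((0.0, 0.0), "label 1"), ((1.0, 0.5), "label 1"), ((5.0, 5.0), "label 2")]
answer = nearest_neighbor_label(examples, (4.0, 4.5), math.dist)
```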

Nearest-neighbor searching
A set of hash functions H is called $(d_1, d_2, p_1, p_2)$-locality-sensitive if it holds for all $p, q \in \mathbb{R}^d$:
(a) if $d(p, q) \le d_1$ then $\Pr[h(p) = h(q)] \ge p_1$
(b) if $d(p, q) \ge d_2$ then $\Pr[h(p) = h(q)] \le p_2$
[Figure: point p with the radii $d_1$ and $d_2$]

Our results for NN-searching
We show LSH families for the discrete Fréchet distance
input: n curves from $X^d_m$.
Space | Query time | Approximation | Reference
$O(mn)$ | $O(m^2 n)$ | 1 | –
$O(|U|^{\sqrt{m}} (m^{\sqrt{m}} n)^2)$ | $O(m^{O(1)} \log n)$ | $O(\log m + \log\log n)$ | [Indyk02]
$O(n \log n + mn)$ | $O(m \log n)$ | $O(m)$ | new
$O(2^{4md} n \log n + mn)$ | $O(2^{4md} m \log n)$ | $O(1)$ | new
$O(2^{2k} m^{k-1} n \log n + mn)$ | $O(2^{2k} m^k \log n)$ | $O(m/k)$ | new
Driemel and Silvestri: Locality-sensitive hashing for curves, SoCG 2017
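One way to hash curves so that nearby curves tend to collide is to snap every vertex to a randomly shifted grid and then drop consecutive duplicates. The sketch below illustrates that idea only; the grid resolution `delta`, the rounding rule, and the name `make_grid_hash` are assumptions, and no claim is made that it reproduces the guarantees in the table.

```python
import random


def make_grid_hash(delta, dim, rng=random):
    """Draw one hash function that snaps curve vertices to a randomly shifted grid.

    Assumptions: grid resolution `delta`, shift drawn uniformly from [0, delta)^dim.
    """
    shift = [rng.uniform(0.0, delta) for _ in range(dim)]

    def hash_curve(curve):
        snapped = []
        for vertex in curve:
            cell = tuple(round((x - s) / delta) for x, s in zip(vertex, shift))
            # Consecutive vertices snapping to the same grid point are merged.
            if not snapped or snapped[-1] != cell:
                snapped.append(cell)
        return tuple(snapped)

    return hash_curve


h = make_grid_hash(delta=1.0, dim=2, rng=random.Random(0))
with_repeat = h([(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)])
without_repeat = h([(0.0, 0.0), (5.0, 5.0)])
far_away = h([(100.0, 100.0), (105.0, 105.0)])
```

Because duplicates are merged after snapping, two curves that visit the same grid points in the same order receive the same hash value even if they have different numbers of vertices.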

Range searching for curves
Which hurricane trajectories are similar to the query curve q?

Range searching for curves
A range query (q, r) among a set of curves $S \subseteq X^d_m$ should return the set
$\{p \in S \mid d(p, q) \le r\}$
The range is defined as the metric ball of radius r centered at $q \in X^d_\ell$.
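The definition can be read directly as a linear scan, which serves as the baseline any data structure has to beat. In this sketch `range_query` takes the distance as a parameter; the stand-in `vertexwise_distance` (maximum distance between corresponding vertices of equally long curves) is an assumption used only to keep the example short, and is not the Fréchet distance.

```python
import math


def range_query(curves, q, r, dist):
    """Return all curves p with dist(p, q) <= r by scanning the whole input."""
    return [p for p in curves if dist(p, q) <= r]


def vertexwise_distance(p, q):
    """Stand-in distance for equally long curves (assumption, not the Fréchet distance)."""
    return max(math.dist(a, b) for a, b in zip(p, q))


curves = [
    [(0, 0), (1, 0)],
    [(0, 1), (1, 1)],
    [(0, 5), (1, 5)],
]
query = [(0, 0.5), (1, 0.5)]
reported = range_query(curves, query, 1.0, vertexwise_distance)
```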

Range searching for curves
Only known data structure for this type of queries: [de Berg, Cook and Gudmundsson ’13]
For any parameter value $1 \le s \le \sqrt{n}$
– approximation ≈ 6.24
– query time in $O(s \operatorname{polylog} n)$
– space in $O((n/s)^2 \operatorname{polylog} n)$
Queries: Only supports query curves from $X^2_1$
Output: Count of the curves in the range

Our results for range searching
Theorem:
Assume there exists a data structure for Fréchet range queries among an n-set from $X^d_m$ with
– Space in ≤ S(n)
– Query time in ≤ Q(n) + O(k) (k is output size)
Then it holds
$S(n) = \Omega\left(\left(\frac{n}{Q(n)}\right)^2\right)$
Driemel and Afshani, On the complexity of range searching among curves (Manuscript).

Range searching for curves
Our proof uses a volume argument of [Afshani ’13]
Input: Slabs in $\mathbb{R}^2$
Output: Slabs that contain q
$Q = [0, 1]^2$
Input construction with output size k = t
[Figure: slabs crossing the unit square Q, with the labels $t/n$ and $\pi/t$]
$v \le \frac{t^3}{n^2 \cdot \pi}$
$S(n) \ge \frac{t}{v_{\max}} \ge \left(\frac{n}{t}\right)^2 = \left(\frac{n}{Q(n)}\right)^2$

Range searching for curves
Dualization of range queries
[Figure: a point p and a query q, shown once in the input space and once in the query space]

Range searching for curves
What is the locus of queries that output this curve?
Restrict queries to $X^2_1$ (single edges)
[Figure: an input curve with vertices 1 to 6 and radius r]

Range searching for curves
Represent query lines in the dual space (β, α)
[Figure: a line ℓ with parameters α and β, and a disk centered at $p = (x_c, y_c)$ with radius r]
Lines stabbing the disk at $(x_c, y_c)$ with radius r:
$\alpha \le x_c \sin(\beta) + y_c \cos(\beta) + r$
$\alpha \ge x_c \sin(\beta) + y_c \cos(\beta) - r$
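The two inequalities say that the disk center lies within distance r of the line. The sketch below checks them directly, assuming the line with dual coordinates (β, α) is the set of points (x, y) with x·sin(β) + y·cos(β) = α. That parametrization is inferred from the inequalities and is not stated explicitly on the slide.

```python
import math


def line_stabs_disk(alpha, beta, xc, yc, r):
    """Check whether the line x*sin(beta) + y*cos(beta) = alpha meets the disk.

    Assumption: the line parametrization above; the disk has center (xc, yc)
    and radius r.
    """
    offset = xc * math.sin(beta) + yc * math.cos(beta)
    return offset - r <= alpha <= offset + r


# With beta = 0 the line is horizontal: y = alpha.
horizontal_hit = line_stabs_disk(0.5, 0.0, 0.0, 0.0, 1.0)
horizontal_miss = line_stabs_disk(1.5, 0.0, 0.0, 0.0, 1.0)
# With beta = pi/2 the line is vertical: x = alpha.
vertical_hit = line_stabs_disk(2.5, math.pi / 2, 3.0, 0.0, 1.0)
```

For a fixed disk, the feasible (β, α) pairs form a band of vertical width 2r around a sinusoid in the dual space.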

Range searching for curves
Dualization of range queries for curves
[Figure: input curves A and B with a query Q in the input space, and the same objects in the query space with axes β and α]

Range searching for curves
Sketching the input construction for the lower bound:
– Fix the input curves at their endpoints
– A query line outputs an input curve $p_{ij}$ iff the query line intersects the vertical grid edge at (i, j)
[Figure: grid with side labels $t$ and $n/t$ and grid coordinates $x_i$ and $y_j$]

Range searching for curves
Two input curves in the query space
[Figure: the regions of two input curves in the query space with axes β and α, and a query Q]

Lower bounds for range searching
Theorem:
Assume there exists a data structure for Fréchet range queries in $X^d_\ell$ among an n-set from $X^d_m$ with
– Space in ≤ S(n)
– Query time in ≤ Q(n) + O(k) (k is output size)
Then it holds
$S(n) = \Omega\left(\left(\frac{n}{Q(n)}\right)^2\right)$
Driemel and Afshani, On the complexity of range searching among curves (Manuscript)

Lower bounds for range searching
Theorem (Extension):
Assume there exists a data structure for Fréchet range queries in $X^d_\ell$ among an n-set from $X^d_m$ with
– Space in ≤ S(n)
– Query time in ≤ Q(n) + O(k) (k is output size)
Then it holds
$S(n) = \Omega\left(\left(\frac{n}{Q(n)}\right)^2 \cdot \left(\frac{\log(n/Q(n))}{\ell^2}\right)^{\ell-2}\right)$
Driemel and Afshani, On the complexity of range searching among curves (Manuscript)

Data structures
We can match the lower bound with a multi-level partition tree that has the correct number of levels:
Theorem (discrete Fréchet):
We can build a data structure for discrete Fréchet range queries in $X^d_\ell$ among an n-set from $X^d_m$ with
– Space in $O(n (\log\log n)^{m-1})$
– Query time in $O(n^{1-1/d} \log^{O(m)} n \cdot \ell^{O(d)} + k)$ (where k is the output size)
assuming $\ell \in O(\operatorname{polylog} n)$.
Driemel and Afshani, On the complexity of range searching among curves (Manuscript)

Data structures
We can match the lower bound with a multi-level partition tree that has the correct number of levels:
Theorem (continuous Fréchet):
We can build a data structure for continuous Fréchet range queries in $X^2_\ell$ among an n-set from $X^2_m$ with
– Space in $O(n (\log\log n)^{O(m^2)})$
– Query time in $O(\sqrt{n} \log^{O(m)} n + k)$ (where k is the output size)
assuming $\ell \in O(\operatorname{polylog} n)$.
Driemel and Afshani, On the complexity of range searching among curves (Manuscript)

Data structure for discrete Fréchet
Sketch of the data structure
Multi-level partition tree
P1 P2 P3 . . .
[Figure: the input curves, with P1, P2, P3, P4, . . . highlighted in successive steps]

Data structure for discrete Fréchet
Sketch of the query algorithm
[Figure: query curve with vertices q1, . . . , q5]
     P1  P2  P3  P4
q1    1   0   0   1
q2    1   0   0   0
q3    0   1   0   0
q4    1   0   1   0
q5    0   0   0   1
feasible alignment
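Whether a 0/1 matrix like the one above admits a feasible alignment can be checked with a small dynamic program. The sketch assumes that an entry 1 means the query vertex of that row is close enough to the vertex set of that column, and that an alignment is a monotone path of 1-entries from the top-left to the bottom-right cell using down, right, and diagonal steps. Both readings are interpretations of the slide, and the function name is hypothetical.

```python
def has_feasible_alignment(matrix):
    """Return True if a monotone path of 1-entries joins the top-left and bottom-right cells."""
    rows, cols = len(matrix), len(matrix[0])
    reachable = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if not matrix[i][j]:
                continue
            if i == 0 and j == 0:
                reachable[i][j] = True
            else:
                # A cell is reachable from the cell above, to the left, or diagonally above-left.
                reachable[i][j] = (
                    (i > 0 and reachable[i - 1][j])
                    or (j > 0 and reachable[i][j - 1])
                    or (i > 0 and j > 0 and reachable[i - 1][j - 1])
                )
    return reachable[-1][-1]


slide_matrix = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
feasible = has_feasible_alignment(slide_matrix)
```

For the matrix from the slide, the path through (q1, P1), (q2, P1), (q3, P2), (q4, P3), (q5, P4) consists only of 1-entries, so the check succeeds.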

Summary
We talked about:
(1) Clustering for curves
(2) Data structuring for curves
(3) Machine learning for curves
Challenges:
– Curse of dimensionality
– Risk of overfitting
– Complexity of the distance measure

Some open problems
– Clustering
Practical algorithms for clustering of curves
More robust definitions of median and depth
– NN-Searching
Randomized curve simplification
LSH for continuous Fréchet distance
– Range-Searching
DS for high-space / low-query-time regime
Continuous Fréchet distance in 3D (seems much harder)