in spaces of curves algorithms and data structuresadriemel/wogml_talk.pdf · anne driemel:...

91
Algorithms and Data Structures in Spaces of Curves Anne Driemel TU Eindhoven, the Netherlands

Upload: hadung

Post on 20-Aug-2019

212 views

Category:

Documents


0 download

TRANSCRIPT

Anne Driemel: Algorithms and data structures for spaces of curves

Algorithms and Data Structures

in Spaces of Curves

Anne Driemel

TU Eindhoven, the Netherlands

Anne Driemel: Algorithms and data structures for spaces of curves

Spaces of Curves

A curve is a continuous map

f : [0, 1]→ Rd

f(0)

f(t)

f(1)

Anne Driemel: Algorithms and data structures for spaces of curves

Spaces of Curves

A curve is a continuous map

Example:time series data

f : [0, 1]→ Rd

f(0)

f(t)

f(1)

Anne Driemel: Algorithms and data structures for spaces of curves

Spaces of Curves

A curve is a continuous map

Example:time series data

f : [0, 1]→ Rd

f(0)

f(t)

f(1)

Anne Driemel: Algorithms and data structures for spaces of curves

Spaces of Curves

A curve is a continuous map

Example:time series data

f : [0, 1]→ Rd

f(0)

f(t)

f(1)

Anne Driemel: Algorithms and data structures for spaces of curves

Spaces of Curves

A curve is a continuous map

Example:time series data

f : [0, 1]→ Rd

f(0)

f(t)

f(1)

Anne Driemel: Algorithms and data structures for spaces of curves

Machine Learning

Step (1): feature extraction

Step (2): learning the distribution in feature space

Hyndman, Wang, Laptev: Large-Scale Unusual Time Series Detection,ICDM Workshops 2015

Anne Driemel: Algorithms and data structures for spaces of curves

– results depend on the choice of features

– features have to be manually designed

– features capture certain aspects only

– features are not always independent of one another

Disadvantages of working in feature space:

Machine Learning

Anne Driemel: Algorithms and data structures for spaces of curves

– results depend on the choice of features

– features have to be manually designed

– features capture certain aspects only

– features are not always independent of one another

Question: Is it possible to bypass the feature spaceand learn the distribution of curves directly?

Disadvantages of working in feature space:

Machine Learning

Anne Driemel: Algorithms and data structures for spaces of curves

Overview of this talk

Machine

learning in

spaces of

curves

Anne Driemel: Algorithms and data structures for spaces of curves

Overview of this talk

Clustering

Median and Depth

Machine

learning in

spaces of

curves

Density estimate

Distribution of curves

Anne Driemel: Algorithms and data structures for spaces of curves

Overview of this talk

Clustering

Median and Depth

Classification ofcurves

Range searching

Machine

learning in

spaces of

curves

NN-suche

Density estimate

Distribution of curves

Decision trees

Anne Driemel: Algorithms and data structures for spaces of curves

Overview of this talk

Clustering

Median and Depth

Classification ofcurves

Range searching

Dimension reduction

Machine

learning in

spaces of

curves

Geometric approximation

Representation

NN-suche

Density estimate

Overfitting

Distribution of curves

Decision trees

Anne Driemel: Algorithms and data structures for spaces of curves

Overview of this talk

Clustering

Median and Depth

Classification ofcurves

Range searching

Dimension reduction

Machine

learning in

spaces of

curves

Geometric approximation

Representation

NN-suche

Density estimate

Overfitting

Distribution of curvesComplexity

Time

Error rate

Space

Sample size

Decision trees

Anne Driemel: Algorithms and data structures for spaces of curves

Density estimate

Density estimate using sophisticated histograms (KDE)

Only works in low-dimensional space

Modes

[Scott Multivariate Density Estimation 1950].

Anne Driemel: Algorithms and data structures for spaces of curves

Density estimate for curves?

What is the typicalshape of a hurricanetrajectory?

Anne Driemel: Algorithms and data structures for spaces of curves

Definition of the median

Let (X, d) be a metric space on a set X withdistance function d : X2 → R≥0

The median of a set P ⊆ X is defined as

argminc∈X

∑p∈P

d(c, p)

c

Anne Driemel: Algorithms and data structures for spaces of curves

Overfitting of the median

A

B

Anne Driemel: Algorithms and data structures for spaces of curves

Overfitting of the median

A

B

c

Anne Driemel: Algorithms and data structures for spaces of curves

Overfitting of the median

A1, . . . , An/2

B1, . . . , Bn/2

c

Anne Driemel: Algorithms and data structures for spaces of curves

Smoothing of the median

Definition: Let Xd` be the set of polygonal curves in

Rd that have exactly ` edges.

The `-Median of a set P ⊆ Xdm is defined as

argminc∈Xd

`

∑p∈P

d(c, p)

Anne Driemel: Algorithms and data structures for spaces of curves

Smoothing of the median

Definition: Let Xd` be the set of polygonal curves in

Rd that have exactly ` edges.

The `-Median of a set P ⊆ Xdm is defined as

argminc∈Xd

`

∑p∈P

d(c, p)

Anne Driemel: Algorithms and data structures for spaces of curves

Clustering of curves

argminC⊆Xd

`|C|=k

∑p∈P

d(nn(p, C), p)

argminC⊆Xd

`|C|=k

maxp∈P

d(nn(p, C), p)

The (k, `)-center-clustering is

The (k, `)-median-clustering is

Let C ⊆ Xd` be a k-set of centers

Let nn(p, C) be the closest center to a point p

Anne Driemel: Algorithms and data structures for spaces of curves

Clustering in discrete metric spaces

k-median, randomised:– (1 + ε)-approximation in O(n2(k/ε)

O(1)

) time– Algorithm of [Kumar, Sabharwal, Sen ’10]– Doubling spaces: [Ackermann, Blomer, Sohler ’10]

k-center:– 2-approximation in O(kn) time [Gonzales ’85]

k-median:– (1 +

√3 + ε)-approximation [Li and Svensson ’13]

– Lower bound for the approximation factor ∼ 1.73

Most variants of clustering are NP-hard or APX-hard

Anne Driemel: Algorithms and data structures for spaces of curves

Distance measure for curves

A

B

What is the minimal distance with which two peoplecan traverse both entire curves?

δ

The Frechet distance measures the similarity of curves

Anne Driemel: Algorithms and data structures for spaces of curves

Clustering of curves

1-center (no smoothing):

– O(m1m2 log(m1 +m2))[Alt and Godau ’95] [Buchin et al. ’13]

– Lower bound based on SETH Ω((m1 +m2)2−δ) (∀δ > 0)[Bringmann ’14] [Bringmann and Mulzer ’15]

– weak Frechet distance: O(mn) time and space[Har-Peled, Raichel ’11]

Given n curves in Rd with m edges each

– discrete Frechet distance: O(mn) time and space[Ahn et al. ’15]

Single distance computation:

Given two curves m1 and m2 edges

Anne Driemel: Algorithms and data structures for spaces of curves

Theorem:We can compute an O(1)-approximation for the(k, `)-center-clustering and (k, `)-median-clusteringin O(nm) time. For d = 1 we can compute a(1 + ε)-approximation in the same time.

Our results for clustering

Driemel, Krivosija und Sohler:Clustering time series under the Frechet distanceSODA 2016

Anne Driemel: Algorithms and data structures for spaces of curves

Theorem:We can compute an O(1)-approximation for the(k, `)-center-clustering and (k, `)-median-clusteringin O(nm) time. For d = 1 we can compute a(1 + ε)-approximation in the same time.

Sampling-Property:For d = 1 one can derive a (1 + ε)-approximation of the(1, `)-median based on a sample of size mε`.

Our results for clustering

Driemel, Krivosija und Sohler:Clustering time series under the Frechet distanceSODA 2016

Anne Driemel: Algorithms and data structures for spaces of curves

Overview of this talk

Clustering

Median and Depth

Classification ofcurves

Range searching

Dimension reduction

Machine

learning in

spaces of

curves

Geometric approximation

Representation

NN-suche

Density estimate

Overfitting

Distribution of curvesComplexity

Time

Error rate

Space

Sample size

Decision trees

Anne Driemel: Algorithms and data structures for spaces of curves

Classification

A central task in machine learning is the approximationof a function

The input is usually a set of examples

(p, `) | p ∈ X, ` ∈ L

f : X → L

X = instancesL = labels

Anne Driemel: Algorithms and data structures for spaces of curves

Classification of curves

de Vries, van Someren Machine learning for vessel trajectories using compression,alignments and domain knowledge Expert Syst. Appl. 2012

Cargo ship

Tanker

Law enforcement

Tug

Anne Driemel: Algorithms and data structures for spaces of curves

Nearest-neighbor searching

x1

x2

label 1label 2

Anne Driemel: Algorithms and data structures for spaces of curves

Nearest-neighbor searching

x1

x2

label 1label 2

q

Query

Anne Driemel: Algorithms and data structures for spaces of curves

Nearest-neighbor searching

x1

x2

label 1label 2

q The label of the exampleinstance closest to q

Answer:Query

Anne Driemel: Algorithms and data structures for spaces of curves

Nearest-neighbor searching

x1

x2

label 1label 2

q

(1) Partition the space intohomogeneous areas

(2) Data structure forpoint location

The label of the exampleinstance closest to q

Answer:

f : X → L

Query

Anne Driemel: Algorithms and data structures for spaces of curves

Nearest-neighbor searching

x1

x2

label 1label 2

q

(1) Partition the space intohomogeneous areas

(2) Data structure forpoint location

The label of the exampleinstance closest to q

Answer:

f : X → L

Voronoi diagram

Θ(ndd/2e)

Query

Anne Driemel: Algorithms and data structures for spaces of curves

Nearest-neighbor searching

(a) if d(p,q) ≤ d1 then Pr [h(p) = h(q)] ≥ p1(b) if d(p,q) ≥ d2 then Pr [h(p) = h(q)] ≤ p2

A set of hash functions H is called (d1, d2, p1, p2)-locality-sensitive if it holds for all p,q ∈ Rd:

d1

d2

p

Anne Driemel: Algorithms and data structures for spaces of curves

Driemel und Silvestri: Locality-sensitive-hashing for curvesSoCG 2017

Space Query time ApproximationO(mn) O

(m2n

)1 –

O(|U |√m(m

√mn)2

)O(mO(1) log n

)O(logm+ log log n) [Indyk02]

O(n log n+mn) O(m log n) O(m) new

O(24mdn log n+mn) O(24mdm log n)O(1) new

O(22kmk−1n log n+mn)O(22kmk log n) O(m/k) new

Our results for NN-searching

We show LSH-families for the discrete Frechet distanceinput: n curves from Xd

m.

Anne Driemel: Algorithms and data structures for spaces of curves

Overview of this talk

Clustering

Median and Depth

Classification ofcurves

Range searching

Dimension reduction

Machine

learning in

spaces of

curves

Geometric approximation

Representation

NN-suche

Density estimate

Overfitting

Distribution of curvesComplexity

Time

Error rate

Space

Sample size

Decision trees

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

q

Which hurricate trajectories aresimilar to the query curve q?

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

A range query (q, r) among a set of curves S ⊆ Xdm

should return the set

p ∈ S | d(p, q) ≤ r

The range is defined as the metric ball of radius rcentered at q ∈ Xd

` .

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Only known data structure for this type of queries:

For any parameter value 1 ≤ s ≤√n

– approximation ≈ 6.24– query time in O(spolylog n)– space in O((ns )2 polylog n)

Queries: Only supports query curves from X21

[de Berg, Cook and Gudmundsson ’13]

Output: Count of the curves in the range

Anne Driemel: Algorithms and data structures for spaces of curves

Our results for range searching

Theorem:

Assume there exists a data structure for Frechet rangequeries among an n-set from Xd

m with– Space in ≤ S(n)– Query time in ≤ Q(n) +O(k) (k is output size)

Then it holds

S(n) = Ω

((n

Q(n)

)2)

Driemel und Afshani, On the complexity of range searchingamong curves (Manuscript).

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Our proof uses a volume argument of [Afshani ’13]

Input:Slabs in R2

Output:Slabs that contain q

q

Q = [0, 1]2

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Our proof uses a volume argument of [Afshani ’13]

Input:Slabs in R2

Output:Slabs that contain q

Q = [0, 1]2

Input construction withoutput size k = t

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Our proof uses a volume argument of [Afshani ’13]

Input:Slabs in R2

Output:Slabs that contain q

Q = [0, 1]2

Input construction withoutput size k = t

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Our proof uses a volume argument of [Afshani ’13]

Input:Slabs in R2

Output:Slabs that contain q

Q = [0, 1]2

Input construction withoutput size k = t

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Our proof uses a volume argument of [Afshani ’13]

Input:Slabs in R2

Output:Slabs that contain q

Q = [0, 1]2

S(n) ≥ tvmax

Input construction withoutput size k = t

vmax

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Our proof uses a volume argument of [Afshani ’13]

Input:Slabs in R2

Output:Slabs that contain q

v ≤ t3

n2·π

tn

Q = [0, 1]2

πt

Input construction withoutput size k = t

S(n) ≥ tvmax

≥(nt

)2

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

Our proof uses a volume argument of [Afshani ’13]

Input:Slabs in R2

Output:Slabs that contain q

Q = [0, 1]2

Input construction withoutput size k = t

S(n) ≥ tvmax

≥(nt

)2=(

nQ(n)

)2

v ≤ t3

n2·π

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

q q

Dualization of range queries

p p

input space query space

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

What is the locus of queries that output this curve?

Restrict queries to X21 (Single edges)

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

1

2

3

4

5

6

r

What is the locus of queries that output this curve?

Restrict queries to X21 (Single edges)

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

1

2

3

4

5

6

r

What is the locus of queries that output this curve?

Restrict queries to X21 (Single edges)

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

What is the locus of queries that output this curve?

Restrict queries to X21 (Single edges)

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

α

β

`

Represent query lines in the dual space (β, α)

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

α

β

`

p

xc

ycr

Represent query lines in the dual space (β, α)

Lines stabbing the disk at (xc, yc) with radius r:

α ≤ xc sin(β) + yc cos(β) + r

α ≥ xc sin(β) + yc cos(β)− r

Anne Driemel: Algorithms and data structures for spaces of curves

β

α

q

q

Dualization of range queries for curves

Range searching for curves

input space query space

A

Anne Driemel: Algorithms and data structures for spaces of curves

β

α

q

q

Dualization of range queries for curves

Range searching for curves

input space query space

A B

Anne Driemel: Algorithms and data structures for spaces of curves

β

α

Dualization of range queries for curves

Q

Q

Range searching for curves

input space query space

A B

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

t

nt

Sketching the input construction for the lower bound:

– Fix the input curves at their endpoints

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

t

nt

xi

Sketching the input construction for the lower bound:

– A query line outputs an input curve pij iff the queryline intersects the vertical grid edge at (i, j)

yj

– Fix the input curves at their endpoints

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

t

nt

xi

Sketching the input construction for the lower bound:

– A query line outputs an input curve pij iff the queryline intersects the vertical grid edge at (i, j)

yj

– Fix the input curves at their endpoints

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

t

nt

Sketching the input construction for the lower bound:

– A query line outputs an input curve pij iff the queryline intersects the vertical grid edge at (i, j)

xi

– Fix the input curves at their endpoints

yj

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

t

nt

Sketching the input construction for the lower bound:

– A query line outputs an input curve pij iff the queryline intersects the vertical grid edge at (i, j)

– Fix the input curves at their endpoints

???

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

β

α

Two input curves in the query space

Q

Anne Driemel: Algorithms and data structures for spaces of curves

Range searching for curves

β

α

Two input curves in the query space

Anne Driemel: Algorithms and data structures for spaces of curves

Lower bounds for range searching

Theorem:

Assume there exists a data structure for Frechet rangequeries in Xd

` among an n-set from Xdm with

– Space in ≤ S(n)– Query time in ≤ Q(n) +O(k) (k is output size)

Then it holds

Driemel und Afshani, On the complexity of range searchingamong curves (Manuscript)

S(n) = Ω

((n

Q(n)

)2)

Anne Driemel: Algorithms and data structures for spaces of curves

Lower bounds for range searching

Assume there exists a data structure for Frechet rangequeries in Xd

` among an n-set from Xdm with

– Space in ≤ S(n)– Query time in ≤ Q(n) +O(k) (k is output size)

Then it holds

Driemel und Afshani, On the complexity of range searchingamong curves (Manuscript)

S(n) = Ω

((n

Q(n)

)2

·(

log(n/Q(n))

`2

)`−2)

Theorem (Extension):

Anne Driemel: Algorithms and data structures for spaces of curves

Data structures

Theorem (discrete Frechet):

We can build a data structure for discrete Frechet rangequeries in Xd

` among an n-set from Xdm with

– Space in O(n(log log n)m−1)

– Query time in O(n1−1/d logO(m) n · `O(d) + k)( where k is the output size)

assuming ` ∈ O(polylog n).

We can match the lower bound with a multi-levelpartition tree that has the correct number of levels:

Driemel und Afshani, On the complexity of range searchingamong curves (Manuscript)

Anne Driemel: Algorithms and data structures for spaces of curves

Data structures

We can match the lower bound with a multi-levelpartition tree that has the correct number of levels:

Driemel und Afshani, On the complexity of range searchingamong curves (Manuscript)

We can build a data structure for continuous Frechetrange queries in X2

` among an n-set from X2m with

– Space in O(n(log log n)O(m2))

– Query time in O(√n logO(m) n+ k)

( where k is the output size)assuming ` ∈ O(polylog n).

Theorem (continuous Frechet):

Anne Driemel: Algorithms and data structures for spaces of curves

Data structure for discrete Frechet

Sketch of the data structure

(input curves)

Anne Driemel: Algorithms and data structures for spaces of curves

Data structure for discrete Frechet

Sketch of the data structure

P1

Multi-level partition tree

P1 P2 P3 . . .

(input curves)

Anne Driemel: Algorithms and data structures for spaces of curves

P2

Data structure for discrete Frechet

Sketch of the data structure

Multi-level partition tree

P1 P2 P3 . . .

(input curves)

Anne Driemel: Algorithms and data structures for spaces of curves

Data structure for discrete Frechet

Sketch of the data structure

P3

Multi-level partition tree

P1 P2 P3 . . .

(input curves)

Anne Driemel: Algorithms and data structures for spaces of curves

Data structure for discrete Frechet

Sketch of the data structure

P4

Multi-level partition tree

P1 P2 P3 . . .

. . .

(input curves)

Anne Driemel: Algorithms and data structures for spaces of curves

q1

q2

q3q4

q5

Data structure for discrete Frechet

Sketch of the query algorithm

(query curve)

Anne Driemel: Algorithms and data structures for spaces of curves

q1

q2

q3q4

q5

Data structure for discrete Frechet

Sketch of the query algorithm

(query curve)

Anne Driemel: Algorithms and data structures for spaces of curves

q1

q2

q3q4

q5

Data structure for discrete Frechet

Sketch of the query algorithm

q1 1q2 1q3 0q4 1q5 0

(query curve)

Anne Driemel: Algorithms and data structures for spaces of curves

q1 1 0 0 1q2 1 0 0 0q3 0 1 0 0q4 1 0 1 0q5 0 0 0 1

Data structure for discrete Frechet

Sketch of the query algorithm

Anne Driemel: Algorithms and data structures for spaces of curves

q1 1 0 0 1q2 1 0 0 0q3 0 1 0 0q4 1 0 1 0q5 0 0 0 1

Data structure for discrete Frechet

Sketch of the query algorithm

P1

Anne Driemel: Algorithms and data structures for spaces of curves

q1 1 0 0 1q2 1 0 0 0q3 0 1 0 0q4 1 0 1 0q5 0 0 0 1

Data structure for discrete Frechet

Sketch of the query algorithm

P2

Anne Driemel: Algorithms and data structures for spaces of curves

q1 1 0 0 1q2 1 0 0 0q3 0 1 0 0q4 1 0 1 0q5 0 0 0 1

Data structure for discrete Frechet

Sketch of the query algorithm

P3

Anne Driemel: Algorithms and data structures for spaces of curves

q1 1 0 0 1q2 1 0 0 0q3 0 1 0 0q4 1 0 1 0q5 0 0 0 1

Data structure for discrete Frechet

Sketch of the query algorithm

P4

Anne Driemel: Algorithms and data structures for spaces of curves

q1 1 0 0 1q2 1 0 0 0q3 0 1 0 0q4 1 0 1 0q5 0 0 0 1

Data structure for discrete Frechet

Sketch of the query algorithm

feasible alignment

Anne Driemel: Algorithms and data structures for spaces of curves

Data structures

Theorem (discrete Frechet):

We can build a data structure for discrete Frechet rangequeries in Xd

` among an n-set from Xdm with

– Space in O(n · (log log n)m−1)

– Query time in O(n1−1/d logO(m) n · `O(d) + k)( where k is the output size)

assuming ` in O(polylog n).

We can match the lower bound with a multi-levelpartition tree that has the correct number of levels:

Driemel und Afshani, On the complexity of range searchingamong curves (Manuscript)

Anne Driemel: Algorithms and data structures for spaces of curves

Data structures

We can match the lower bound with a multi-levelpartition tree that has the correct number of levels:

Driemel und Afshani, On the complexity of range searchingamong curves (Manuscript)

We can build a data structure for continuous Frechetrange queries in X2

` among an n-set from X2m with

– Space in O(n · (log log n)O(m2))

– Query time in O(√n logO(m) n+ k)

( where k is the output size)assuming ` in O(polylog n).

Theorem (continuous Frechet):

Anne Driemel: Algorithms and data structures for spaces of curves

Overview of this talk

Clustering

Median and Depth

Classification ofcurves

Range searching

Dimension reduction

Machine

learning in

spaces of

curves

Geometric approximation

Representation

NN-suche

Density estimate

Overfitting

Distribution of curvesComplexity

Time

Error rate

Space

Sample size

Decision trees

Anne Driemel: Algorithms and data structures for spaces of curves

Summary

We talked about:

(3) Machine learning for curves

(1) Clustering for curves

Challenges:

– Curse of dimensionality

– Risk of overfitting

– Complexity of the distance measure

(2) Data structuring for curves

Anne Driemel: Algorithms and data structures for spaces of curves

Some open problems

– Clustering

– NN-Searching

Practical algorithms for clustering of curvesMore robust definitions of median and depth

Randomized curve simplificationLSH for continuous Frechet distance

– Range-Searching

DS for high-space/ low-query-time regime

Continuous Frechet distance in 3D (seems much harder)