distributed trajectory similarity searchlifeifei/papers/traj-sim-vldb-poster.pdfdistributed...

1
Distributed Trajectory Similarity Search Dong Xie, Feifei Li, Jeff M. Phillips {dongx, lifeifei, jeffp}@cs.utah.edu University of Utah Search Procedure Trajectory Similarity Search Segment-based vs. Trajectory-based Indexing Motivation Pruning Theorem Customized R-Tree Local Index Two Level Indexing in Apache Spark Design Choice and Optimization Experiment Results Huge amount of trajectory data are being generated everyday. Widely used in traffic analysis, transportation planning. Classic problem: find ‘similar’ trajectories. Require distributed solutions to scale out. nearest neighbor query over trajectories under a specific distance metric . Not yet studied under a distributed environment. Different metrics: Hausdorff distance vs. Frechet distance Partition trajectories as individual objects Partition all segments in trajectories Given a distance threshold >0, and two trajectories and . If there exists a segment such that mindist , > , then we have , > and , > Step 1: Pruning Bound Selection Find a safe pruning bound covering at least data trajectories. Sample trajectories passing similar regions as the query trajectory. Find the -th closest distance as the pruning bound . Theory beneath: quantile estimation based on samples. Step 2: Index - based Pruning Utilize the distributed index to find the set of trajectory IDs can be safely pruned by . Step 3: Finalizing Results Rebuild all candidate trajectories, then launch a distributed top-K. Concise Data Structure for TID sets in customized R-Trees. Roaring Bitmap: a concise and flexible compressed bitmap Dual Indexing Strategy. Keep another data copy organized in trajectory objects. Eliminate the procedure of regrouping candidate trajectories.

Upload: others

Post on 08-Oct-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Trajectory Similarity Searchlifeifei/papers/traj-sim-vldb-poster.pdfDistributed Trajectory Similarity Search Dong Xie, Feifei Li, Jeff M. Phillips {dongx, lifeifei, jeffp}@cs.utah.edu

Distributed Trajectory Similarity Search Dong Xie, Feifei Li, Jeff M. Phillips

{dongx, lifeifei, jeffp}@cs.utah.edu

University of Utah

Search Procedure

Trajectory Similarity Search

Segment-based vs. Trajectory-based Indexing

Motivation Pruning Theorem

Customized R-Tree Local Index

Two Level Indexing in Apache Spark

Design Choice and Optimization

Experiment Results

Huge amount of trajectory data are being generated everyday.

Widely used in traffic analysis, transportation planning.

Classic problem: find ‘similar’ trajectories.

Require distributed solutions to scale out.

𝒌 nearest neighbor query over trajectories under a specific

distance metric 𝐷.

Not yet studied under a distributed environment.

Different metrics: Hausdorff distance vs. Frechet distance

Partition trajectoriesas individual objects

Partition all segmentsin trajectories

Given a distance threshold 𝜀 > 0, and two trajectories 𝑄 and 𝑇.

If there exists a segment ℓ𝑖 ∈ 𝑇 such that mindist ℓ𝑖 , 𝑄 > 𝜀,

then we have 𝐷𝐻 𝑄, 𝑇 > 𝜀 and 𝐷𝐹 𝑄, 𝑇 > 𝜀

Step 1: Pruning Bound SelectionFind a safe pruning bound 𝜀 covering at least 𝑘 data trajectories.

Sample 𝑐 ⋅ 𝑘 trajectories passing similar regions as the query trajectory.

Find the 𝑘-th closest distance as the pruning bound 𝜀.

Theory beneath: quantile estimation based on samples.

Step 2: Index-based Pruning

Utilize the distributed index to find the set of trajectory IDs can be safely pruned by 𝜀.

Step 3: Finalizing Results

Rebuild all candidate trajectories, then launch a distributed top-K.

Concise Data Structure for TID sets in customized R-Trees.

Roaring Bitmap: a concise and flexible compressed bitmap

Dual Indexing Strategy.

Keep another data copy organized in trajectory objects.

Eliminate the procedure of regrouping candidate trajectories.