comparing efficient nearest neighbor techniques for ...diana carolina montanez,~ adolfo quiroz,...

24
Introducction Algortihms Data and Results Comparing Efficient Nearest Neighbor Techniques for Support Vector Machines in Large Feature Spaces Diana Carolina Monta˜ nez, Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Monta˜ nez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Upload: others

Post on 30-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Comparing Efficient Nearest Neighbor Techniquesfor Support Vector Machines in Large Feature

Spaces

Diana Carolina Montanez, Adolfo Quiroz, Mateo Dulce, AlvaroJ. Riascos Villegas

February de 2018

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 2: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Contenido

1 Introducction

2 AlgortihmsEfficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

3 Data and Results

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 3: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Introduction

We address the problem of efficiently finding nearestneighbors in order to train a support vector machine classifierin a large feature space.

SVM in a nutshell.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 4: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Introduction

We address the problem of efficiently finding nearestneighbors in order to train a support vector machine classifierin a large feature space.

SVM in a nutshell.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 5: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Contenido

1 Introducction

2 AlgortihmsEfficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

3 Data and Results

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 6: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Algortihms

We follow Camelo et.al strategy for solving for SVs. usingK-NN.

Solving the K-NN query is costly in large feture space. Wefollow a pivot strategy and Chavez, et.al. proximity preservingorder algorithm.

We refine the pivot strategy by choosing appropiate (base)pivots following Mico et.al.

We experimentally explore the relative merits of the pivotstrategy against the more refined strategy and the state of theart benchmark: LIBSVM.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 7: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Algortihms

We follow Camelo et.al strategy for solving for SVs. usingK-NN.

Solving the K-NN query is costly in large feture space. Wefollow a pivot strategy and Chavez, et.al. proximity preservingorder algorithm.

We refine the pivot strategy by choosing appropiate (base)pivots following Mico et.al.

We experimentally explore the relative merits of the pivotstrategy against the more refined strategy and the state of theart benchmark: LIBSVM.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 8: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Algortihms

We follow Camelo et.al strategy for solving for SVs. usingK-NN.

Solving the K-NN query is costly in large feture space. Wefollow a pivot strategy and Chavez, et.al. proximity preservingorder algorithm.

We refine the pivot strategy by choosing appropiate (base)pivots following Mico et.al.

We experimentally explore the relative merits of the pivotstrategy against the more refined strategy and the state of theart benchmark: LIBSVM.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 9: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Algortihms

We follow Camelo et.al strategy for solving for SVs. usingK-NN.

Solving the K-NN query is costly in large feture space. Wefollow a pivot strategy and Chavez, et.al. proximity preservingorder algorithm.

We refine the pivot strategy by choosing appropiate (base)pivots following Mico et.al.

We experimentally explore the relative merits of the pivotstrategy against the more refined strategy and the state of theart benchmark: LIBSVM.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 10: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Camelo, et.al

Finding SV is costly.

They propose solving for SV on small subsamples and look toK-NN of SV.

They demonstrate two theorems that rationalizes theirsampling strategy.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 11: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Camelo, et.al

Finding SV is costly.

They propose solving for SV on small subsamples and look toK-NN of SV.

They demonstrate two theorems that rationalizes theirsampling strategy.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 12: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Camelo, et.al

Finding SV is costly.

They propose solving for SV on small subsamples and look toK-NN of SV.

They demonstrate two theorems that rationalizes theirsampling strategy.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 13: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

Camelo, et.al

Page 14: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

Camelo, et.al

Page 15: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

Camelo, et.al

Page 16: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

Camelo, et.al

Page 17: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

Camelo, et.al

Page 18: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Chavez, et.al

K-NN queries are expensive specially in high dimensionalfeature spaces.

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 19: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

Chavez, et.al

Pivot strategy:

pivots.jpg

Page 20: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

Chavez, et.al

Proximity preserving order:1 Then, for each d ∈ D, the distance to pi ∈ P is calculated. For

each d , the pi are ordered according to their distance to d .Therefore we have for each d a permutation πd of theelements of P.

2 Spearman’s Footrule. Let πd(pi ) be the position of pi in πd ,then

F (πd , πd′) =∑

1≤i≤k

|πd(pi )− πd′(pi )|

3 Example: πd = (p2, p1, p3), πd′ = (p1, p3, p2) thenF (πd , πd′) = |(2− 1)|+ |(1− 3)|+ |(3− 2)|

Page 21: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Efficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

Mico, et.al

Random vrs. well separated pivots

base pivots.jpg

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 22: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Contenido

1 Introducction

2 AlgortihmsEfficient Search of Support VectorsProximity Preserving OrderEfficient Pivot Selection

3 Data and Results

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 23: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

IntroducctionAlgortihms

Data and Results

Data

Table 1. Dataset description.

Daraset Train size Test size Features

real-sim 16,525 4,131 14,462COM 8,205 2,172 9,892CC 8,224 2,153 9,906

Montanez, Quiroz, Dulce, Riascos Nearest Neighbor Techniques SVM in Large Feature Spaces

Page 24: Comparing Efficient Nearest Neighbor Techniques for ...Diana Carolina Montanez,~ Adolfo Quiroz, Mateo Dulce, Alvaro J. Riascos Villegas February de 2018 Montanez,~ Quiroz, Dulce, Riascos

Results