Database Implementation of a Model-Free Classifier
Konstantinos Morfonios
ADBIS 2007
University of Athens
Introduction
Motivation
LOCUS
Parallel Execution
Experimental Evaluation
Conclusions & Future Work
Introduction
Classification: given a tuple x = <x1, x2, …, xD>, determine its class ω = f(x), e.g. ω1 or ω2
Introduction
Training set:
<x1,1, x1,2, …, x1,D, ω1>
<x2,1, x2,2, …, x2,D, ω2>
<x3,1, x3,2, …, x3,D, ω1>
<x4,1, x4,2, …, x4,D, ω1>
…
Tuples to classify: x1 = <x1, x2, …, xD>, x2 = <x1, x2, …, xD>
• “Lazy” classifiers (Nearest Neighbors)
• “Eager” classifiers (Decision Trees)
(+) Faster decisions
(-) Large/complex datasets
(-) Dynamic datasets
(-) Dynamic models
Motivation
• Large/complex datasets
• Dynamic datasets
• Dynamic models
Lazy (model-free)
• Nearest Neighbors
• Disk-based
Nearest Neighbors suffers from the “curse of dimensionality”:
• Not reliable [Beyer et al., ICDT 1999]
• Not indexable [Shaft et al., ICDT 2005]
LOCUS (Lazy Optimal Classifier of Unlimited Scalability)
• Category? Lazy
• Scaling? Based on simple SQL queries
• Accuracy? Converges to the optimal Bayes classifier
• Other features? Parallelizable
LOCUS
Example: x = <f1, f2>, with f1 ∈ [0, 20], f2 ∈ [0, 10], and two classes, ω1 and ω2
[Figure: training tuples of classes ω1 and ω2 on a grid, with f1 (0 to 20) on the horizontal axis and f2 (0 to 10) on the vertical axis]
Ideally: dense space, where ω(<7, 4>) can be read off the training tuples located at <7, 4>
Reality:
• Many features
• Large domains
Result: sparse space
[Figure: the same f1/f2 grid, sparsely populated; ω(<7, 4>) = ?]
3-NN: the three nearest neighbors of <7, 4> vote ω1: 2, ω2: 1, so ω(<7, 4>) = ω1
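
The 3-NN vote can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the training tuples, the Euclidean distance, and the names `knn_classify`, `w1`, and `w2` (ASCII stand-ins for ω1 and ω2) are all assumptions, chosen so that the vote reproduces the 2-to-1 split above.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Majority vote among the k training tuples closest to `query`.

    Each training tuple is (f1, f2, ..., label); distance is Euclidean
    (an assumption, the slides do not name a metric).
    """
    nearest = sorted(training, key=lambda t: math.dist(t[:-1], query))[:k]
    votes = Counter(label for *_, label in nearest)
    return votes.most_common(1)[0][0], dict(votes)

# Hypothetical sparse training set: (f1, f2, class)
training = [
    (6, 5, "w1"), (8, 3, "w1"), (7, 6, "w2"),
    (15, 9, "w2"), (2, 1, "w1"), (18, 2, "w2"),
]

label, votes = knn_classify(training, (7, 4))
print(label, votes)
```

Sorting the whole training set is fine for a toy example, but it touches every tuple for each decision, which is part of what makes this approach hard to scale on disk.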
LOCUS: the training tuples in a box around <7, 4> vote ω1: 7, ω2: 3, so ω(<7, 4>) = ω1
Disk-based implementation
Training relation: R(f1, f2, ω)
Query for a tuple <x1, x2>, counting classes inside a box of size 2δ1 × 2δ2 centered at <x1, x2>:
SELECT ω, count(*)
FROM R
WHERE f1 ≥ x1-δ1 AND f1 ≤ x1+δ1 AND f2 ≥ x2-δ2 AND f2 ≤ x2+δ2
GROUP BY ω
Result for <7, 4>: ω1: 7, ω2: 3, so ω(<7, 4>) = ω1
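
As a rough sketch, the query above can be run from Python with the standard `sqlite3` module. The column name `w` (for ω), the class labels `w1`/`w2`, the values δ1 = δ2 = 2, and the training tuples are assumptions made for illustration; the data is constructed so that the counts match the 7-to-3 result on the slide. The slides do not say how δ is chosen.

```python
import sqlite3

def locus_classify(conn, x, delta):
    """Count the classes of the tuples of R inside the box x ± delta
    and return (majority class, counts per class)."""
    (x1, x2), (d1, d2) = x, delta
    rows = conn.execute(
        "SELECT w, COUNT(*) FROM R "
        "WHERE f1 >= ? AND f1 <= ? AND f2 >= ? AND f2 <= ? "
        "GROUP BY w",
        (x1 - d1, x1 + d1, x2 - d2, x2 + d2),
    ).fetchall()
    counts = dict(rows)
    # An empty box carries no evidence, so no class is returned
    label = max(counts, key=counts.get) if counts else None
    return label, counts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 REAL, w TEXT)")

# Hypothetical training tuples: 7 of w1 and 3 of w2 inside the box, 2 outside
inside_w1 = [(5, 3), (6, 4), (6, 5), (7, 3), (8, 4), (8, 5), (9, 6)]
inside_w2 = [(5, 6), (7, 2), (9, 2)]
outside = [(15, 9, "w2"), (1, 1, "w1")]
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [(a, b, "w1") for a, b in inside_w1]
    + [(a, b, "w2") for a, b in inside_w2]
    + outside,
)

label, counts = locus_classify(conn, x=(7, 4), delta=(2, 2))
print(label, counts)
```

The classifier itself holds no state: adding or deleting rows in R changes the next decision immediately, which is the point of a lazy, model-free method.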
What if R is large?
Classical optimization techniques for a well-known type of aggregate queries:
• Indexing
• Presorting
• Materialized views
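
The indexing option might look as follows in SQLite. This is only a sketch under assumptions: the index name `idx_r_box`, the column order (f1, f2, w), and the random data are hypothetical, and whether a given optimizer actually uses the index depends on the engine and its statistics. The code prints the plan rather than assuming it.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 REAL, w TEXT)")

# Hypothetical training data: 10,000 random tuples over the example domains
rng = random.Random(0)
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [(rng.uniform(0, 20), rng.uniform(0, 10), rng.choice(["w1", "w2"]))
     for _ in range(10000)],
)

QUERY = (
    "SELECT w, COUNT(*) FROM R "
    "WHERE f1 >= ? AND f1 <= ? AND f2 >= ? AND f2 <= ? "
    "GROUP BY w"
)
BOX = (5, 9, 2, 6)  # x = <7, 4>, delta = <2, 2>

before = sorted(conn.execute(QUERY, BOX).fetchall())

# A composite index that also contains w, so the query can be answered
# from the index alone if the optimizer chooses to use it
conn.execute("CREATE INDEX idx_r_box ON R (f1, f2, w)")

after = sorted(conn.execute(QUERY, BOX).fetchall())

# Inspect the plan instead of assuming it; the text differs across SQLite versions
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, BOX):
    print(row[-1])
```

The index changes only how the counts are computed, not the counts themselves, so the classifier's decisions are unaffected.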
Method reliability?
LOCUS converges to the optimal Bayes classifier as the size of the dataset increases (proof in the paper)
What if a feature, say f2, is categorical (e.g. sex)?
The range predicate on f2 becomes an equality predicate:
SELECT ω, count(*)
FROM R
WHERE f1 ≥ x1-δ1 AND f1 ≤ x1+δ1 AND f2 = x2
GROUP BY ω
Not a problem, since generally in practice:
• Datasets combine categorical and numeric features
• Categorical features have small domains
Hence, they do not contribute to sparsity
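
A small helper can build the WHERE clause for a mix of numeric and categorical features. The sketch below is an illustration under assumptions: the function name `locus_counts`, the column `w` for ω, and the sample rows are hypothetical. `BETWEEN` is used as an equivalent of the paired ≥ and ≤ predicates.

```python
import sqlite3

def locus_counts(conn, numeric, categorical):
    """Class counts for a query tuple with mixed feature types.

    numeric:     {column: (value, delta)}  -> range predicate
    categorical: {column: value}           -> equality predicate
    Column names are interpolated into the SQL text, so they must come
    from trusted code, and at least one feature must be given.
    """
    clauses, params = [], []
    for col, (value, delta) in numeric.items():
        clauses.append(f"{col} BETWEEN ? AND ?")
        params += [value - delta, value + delta]
    for col, value in categorical.items():
        clauses.append(f"{col} = ?")
        params.append(value)
    sql = f"SELECT w, COUNT(*) FROM R WHERE {' AND '.join(clauses)} GROUP BY w"
    return dict(conn.execute(sql, params).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 TEXT, w TEXT)")
# Hypothetical training tuples: f1 numeric, f2 categorical
conn.executemany("INSERT INTO R VALUES (?, ?, ?)", [
    (6, "F", "w1"), (7, "F", "w1"), (8, "F", "w2"),
    (7, "M", "w2"), (7.5, "M", "w2"), (15, "F", "w1"),
])

counts = locus_counts(conn, numeric={"f1": (7, 2)}, categorical={"f2": "F"})
print(counts)
```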
Parallel Execution
R = R1 ∪ R2 ∪ R3 ∪ R4
• Each partition Ri runs the same SELECT query locally
• Partial results from the four partitions: (ω1: 5, ω2: 2), (ω1: 7, ω2: 1), (ω1: 5, ω2: 1), (ω1: 6, ω2: 0)
• Count is a distributive function, so the main server only adds the partial counts as they arrive: (ω1: 5, ω2: 2) → (ω1: 12, ω2: 3) → (ω1: 18, ω2: 3) → (ω1: 23, ω2: 4)
• Final result: ω1: 23, ω2: 4
Benefits:
• Small network traffic
• Load balancing
• Lightweight operations on the main server
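
Because count is distributive, merging partial results reduces to adding per-class counters. The sketch below simulates four partitions with threads and in-memory lists instead of separate database servers; `local_counts` stands in for the SELECT query run on each partition, and the data is constructed (hypothetically) so that the partial counts match the ones on the slide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_counts(partition, x, delta):
    """Stand-in for the SELECT ... GROUP BY query run on one partition:
    count the classes of the tuples inside the box x ± delta."""
    return Counter(
        label
        for *features, label in partition
        if all(abs(f - xi) <= d for f, xi, d in zip(features, x, delta))
    )

def parallel_counts(partitions, x, delta):
    """Run the local query on every partition and add up the partial counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: local_counts(p, x, delta), partitions))
    total = Counter()
    for partial in partials:
        total += partial  # only one small counter per partition reaches the main server
    return partials, total

def make_partition(n1, n2):
    """Hypothetical partition: n1 tuples of w1 and n2 of w2 inside the box, one outside."""
    return [(7, 4, "w1")] * n1 + [(8, 5, "w2")] * n2 + [(19, 9, "w2")]

partitions = [make_partition(5, 2), make_partition(7, 1),
              make_partition(5, 1), make_partition(6, 0)]

partials, total = parallel_counts(partitions, x=(7, 4), delta=(2, 2))
print(total)
```

Each partition returns one number per class regardless of its size, which is why network traffic stays small and the main server's work stays light.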
Experimental Evaluation
LOCUS vs. DTs and NNs (Weka)
• Synthetic datasets: ten functions [Agrawal et al., IEEE TKDE 1993], D = 9, N ∈ [5·10^3, 5·10^6]
• Real-world datasets: UCI Repository
[Figure: Classification error rate (synthetic datasets, N = 5·10^4). Error rate (%) per function (1 to 10), LOCUS vs. DT]
[Figure: Effect of dataset size on classification error rate of LOCUS (synthetic datasets, N ∈ [5·10^3, 5·10^6]). Error rate (%) per function (1 to 10) for N = 5000, 50000, 500000, 5000000]
[Figure: Effect of dataset size on time scalability of LOCUS (synthetic datasets, N ∈ [5·10^3, 5·10^6]). Average decision time (msec) vs. training set size]
[Figure: Classification error rate (real-world datasets). Error rate (%) of LOCUS vs. DT on Patient, Glass, Liver, BreastCancer, Diabetes, Letters, and CovType 50000]
[Figure: Effect of dataset size on classification error rate (dataset CovType, N ∈ [5·10^3, 5·10^5]). Error rate (%) of DT and LOCUS for N = 5000, 50000, 500000]
Conclusions & Future Work
LOCUS
• Lazy (complex/dynamic datasets and models)
• Efficient (based on simple SQL queries)
• Reliable (converging to optimal)
• Parallelizable
Future work
• Similar techniques for feature selection and regression
• Implementation of a parallel version
Questions?
Thank you!