Object Oriented Data Analysis, Last Time


Upload: gwenda-nash

Post on 27-Dec-2015


Page 1:

Object Oriented Data Analysis, Last Time

• HDLSS (High Dimension, Low Sample Size) Discrimination

– MD (Mean Difference) much better

• Maximal Data Piling

– HDLSS space is a strange place

• Kernel Embedding

– Embed data in higher dimensional manifold

– Gives greater flexibility to linear methods

– Which manifold? Radial basis functions

– Careful about overfitting?

Page 2:

Kernel Embedding

Aizerman, Braverman and Rozonoer (1964)

• Motivating idea: extend the scope of linear discrimination by adding nonlinear components to the data (embedding in a higher dimensional space)

• Better use of name: nonlinear discrimination?

Page 3:

Kernel Embedding

Stronger effects for higher order polynomial embedding:

E.g. for cubic, $\{(x, x^2, x^3) : x \in \mathbb{R}\} \subset \mathbb{R}^3$,

linear separation can give 4 parts (or fewer)
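One way to see the "4 parts (or fewer)" claim: a hyperplane in the embedded space, restricted to the curve $(x, x^2, x^3)$, is a cubic polynomial in $x$, which changes sign at most three times. The following Python sketch counts the parts numerically on a grid. The function name `count_parts`, the grid range, and the example hyperplanes are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def count_parts(w, b, lo=-3.0, hi=3.0, num=6001):
    """Count the intervals of [lo, hi] cut out by the hyperplane w.z + b = 0
    after the cubic embedding z = (x, x^2, x^3)."""
    w = np.asarray(w, dtype=float)
    x = np.linspace(lo, hi, num)
    embedded = np.vstack([x, x ** 2, x ** 3])   # 3 x num, one column per point
    side = np.sign(w @ embedded + b)            # which side of the hyperplane
    side = side[side != 0]                      # ignore points exactly on the plane
    return 1 + int(np.count_nonzero(np.diff(side)))

parts_cubic = count_parts([-1.0, 0.0, 1.0], 0.0)    # x^3 - x: roots at -1, 0, 1
parts_one_root = count_parts([1.0, 0.0, 1.0], 0.0)  # x^3 + x: single real root
parts_linear = count_parts([1.0, 0.0, 0.0], 0.0)    # no nonlinear terms used
```

Only sign changes inside the grid range are counted, so a hyperplane whose roots lie outside `[lo, hi]` would be undercounted.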

Page 4:

Kernel Embedding

General View: for original data matrix:

$$\begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{d1} & \cdots & x_{dn} \end{pmatrix}$$

add rows:

$$\begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{d1} & \cdots & x_{dn} \\ x_{11}^2 & \cdots & x_{1n}^2 \\ \vdots & & \vdots \\ x_{d1}^2 & \cdots & x_{dn}^2 \\ x_{11} x_{21} & \cdots & x_{1n} x_{2n} \\ \vdots & & \vdots \end{pmatrix}$$

i.e. embed in Higher Dimensional Space

Then slice with a hyperplane
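The row-adding construction can be sketched in a few lines of Python. This is a minimal illustration that assumes the slide's layout (a $d \times n$ matrix with one data vector per column) and stacks the original rows, their squares, and all pairwise products. The function name `quadratic_embedding` is hypothetical.

```python
import numpy as np
from itertools import combinations

def quadratic_embedding(X):
    """Stack the original rows, their squares, and all pairwise products.
    X is d x n with one data vector per column."""
    X = np.asarray(X, dtype=float)
    d = X.shape[0]
    blocks = [X, X ** 2]
    cross = [X[j] * X[k] for j, k in combinations(range(d), 2)]
    if cross:                                   # d = 1 has no cross products
        blocks.append(np.vstack(cross))
    return np.vstack(blocks)

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])                      # d = 2, n = 2
Z = quadratic_embedding(X)                      # d + d + d(d-1)/2 = 5 rows
```

A linear rule fitted to the columns of `Z` is a quadratic rule in the original coordinates, which is the sense in which the hyperplane "slices" the embedded data.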

Pages 5 to 8:

Kernel Embedding

Polynomial Embedding, Toy Example 3: Donut
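The figures for the donut example did not survive in this transcript. As a rough stand-in, the sketch below generates donut-shaped data and compares Fisher Linear Discrimination (FLD) on the raw coordinates with FLD after appending squared coordinates. Everything here is an assumption made for illustration: the radii, sample sizes, seed, the choice of FLD as the linear rule, and the helper names `donut`, `embed`, and `fld_accuracy`.

```python
import numpy as np

rng = np.random.default_rng(0)

def donut(n, r_lo, r_hi):
    """n points (as columns) with radius uniform in [r_lo, r_hi]."""
    r = rng.uniform(r_lo, r_hi, n)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.vstack([r * np.cos(theta), r * np.sin(theta)])

def embed(X):
    """Append squared coordinates as new rows."""
    return np.vstack([X, X ** 2])

def fld_accuracy(A, B):
    """Training accuracy of FLD between classes A and B (each d x n)."""
    mA, mB = A.mean(axis=1), B.mean(axis=1)
    Sw = np.cov(A) + np.cov(B)                  # pooled within-class covariance (up to scale)
    w = np.linalg.solve(Sw, mA - mB)            # FLD direction
    cut = w @ (mA + mB) / 2.0                   # midpoint of the projected class means
    correct = np.sum(w @ A > cut) + np.sum(w @ B < cut)
    return correct / (A.shape[1] + B.shape[1])

inner, outer = donut(200, 0.0, 1.0), donut(200, 2.0, 3.0)
acc_raw = fld_accuracy(inner, outer)
acc_embedded = fld_accuracy(embed(inner), embed(outer))
```

After the embedding the classes differ in $x_1^2 + x_2^2$, so a hyperplane can separate them, while no line in the original plane can.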

Page 9:

Kernel Embedding

Toy Example 4: Checkerboard

Very Challenging!

Linear Method?

Polynomial Embedding?

Page 10:

Kernel Embedding

Toy Example 4: Checkerboard

Polynomial Embedding:

• Very poor for linear

• Slightly better for higher degrees

• Overall very poor

• Polynomials don’t have needed flexibility

Page 11:

Kernel Embedding

Toy Example 4: Checkerboard

Radial Basis Embedding + FLD Is Excellent!
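A radial basis embedding replaces each data vector by its similarities to a set of centers. The sketch below assumes Gaussian radial basis functions, uses the data points themselves as centers, and picks an arbitrary width `sigma`. The slides do not state these choices, so they are illustrative only, as is the name `rbf_embedding`.

```python
import numpy as np

def rbf_embedding(X, centers, sigma=1.0):
    """Gaussian radial basis embedding.
    X: d x n data (columns are points); centers: d x m.
    Returns an m x n matrix with entries exp(-||x_i - c_k||^2 / (2 sigma^2))."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    diff = centers[:, :, None] - X[:, None, :]  # d x m x n array of differences
    sq_dist = np.sum(diff ** 2, axis=0)         # m x n squared distances
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

X = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])                 # three points in the plane
E = rbf_embedding(X, centers=X, sigma=1.0)      # data points used as centers
```

Each column of `E` is the embedded version of one data vector. A linear method such as FLD would then be applied to those columns.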

Page 12:

Kernel Embedding

Other types of embedding:

• Explicit

• Implicit

Will be studied soon, after introduction to Support Vector Machines…

Page 13:

Kernel Embedding

There are generalizations of this idea to other types of analysis, and some clever computational ideas.

E.g. “Kernel based, nonlinear Principal Components Analysis”

Ref: Schölkopf, Smola and Müller (1998)

Page 14:

Support Vector Machines

Motivation:

• Find a linear method that “works well” for embedded data

• Note: Embedded data are very non-Gaussian

• Suggests value of really new approach

Page 15:

Support Vector Machines

Classical References:

• Vapnik (1982)

• Boser, Guyon & Vapnik (1992)

• Vapnik (1995)

Excellent Web Resource:

• http://www.kernel-machines.org/

Page 16:

Support Vector Machines

Recommended tutorial:

• Burges (1998)

Recommended Monographs:

• Cristianini & Shawe-Taylor (2000)

• Schölkopf & Smola (2002)

Page 17:

Support Vector Machines

Graphical View, using Toy Example:

• Find separating plane

• To maximize distances from data to plane

• In particular smallest distance

• Data points closest are called support vectors

• Gap between is called margin
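The quantities in this list can be computed directly for any candidate plane. The sketch below is a minimal illustration with made-up data: it returns the smallest signed distance from the data to a plane and the indices of the closest points. The function name `margin_and_support` and the two candidate planes are assumptions for the example, and no optimization is performed.

```python
import numpy as np

def margin_and_support(X, y, w, b, tol=1e-9):
    """X: d x n data, y: labels in {+1, -1}, plane {x : w.x + b = 0}.
    Returns (smallest signed distance to the plane, indices of the closest points)."""
    w = np.asarray(w, dtype=float)
    dist = y * (w @ X + b) / np.linalg.norm(w)  # positive when a point is on its own side
    smallest = dist.min()
    support = np.flatnonzero(dist <= smallest + tol)
    return smallest, support

X = np.array([[2.0, 3.0, -1.0, -4.0],
              [0.0, 1.0,  0.0,  2.0]])
y = np.array([1, 1, -1, -1])
d_vert, sv_vert = margin_and_support(X, y, w=[1.0, 0.0], b=-0.5)  # plane x1 = 0.5
d_tilt, sv_tilt = margin_and_support(X, y, w=[1.0, 1.0], b=-0.5)  # a tilted candidate
```

Here the vertical plane has the larger smallest distance (1.5 on each side, so a gap of 3), and the two points attaining it play the role of support vectors.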

Pages 18 to 21:

Support Vector Machines

Graphical View, using Toy Example:


Page 23:

SVMs, Optimization Viewpoint

Formulate Optimization problem, based on:

• Data (feature) vectors $x_1, \ldots, x_n$

• Class Labels $y_i = \pm 1$

• Normal Vector $w$

• Location (determines intercept) $b$

• Residuals (right side) $r_i = y_i (x_i^t w + b)$

• Residuals (wrong side) $\xi_i = -r_i$

• Solve (convex problem) by quadratic programming
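For reference, the separable case (no points on the wrong side) is usually written as the quadratic program below, with $r_i = y_i (x_i^t w + b)$. This is the textbook hard-margin form, given here as an assumption about what the quadratic programming step solves. The factor $\frac{1}{2}$ and the normalization $r_i \ge 1$ are conventions.

```latex
\min_{w,\, b} \ \frac{1}{2} \|w\|^2
\quad \text{subject to} \quad
r_i = y_i \left( x_i^t w + b \right) \ge 1, \qquad i = 1, \ldots, n.
```

With this normalization the closest points satisfy $r_i = 1$ and lie at distance $1 / \|w\|$ from the plane, so minimizing $\|w\|$ maximizes the smallest distance.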

Page 24:

SVMs, Optimization Viewpoint

Lagrange Multipliers primal formulation (separable case):

• Minimize: $L_P(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^n \alpha_i \{ y_i (x_i \cdot w + b) - 1 \}$

Where $\alpha_1, \ldots, \alpha_n \ge 0$ are Lagrange multipliers

Dual Lagrangian version:

• Maximize: $L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$

Get classification function: $f(x) = \sum_{i=1}^n \alpha_i y_i \, x_i \cdot x + b$
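The step from the primal to the dual is not spelled out on the slide. A sketch of the standard derivation, assuming the usual separable-case Lagrangian $L_P(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \{ y_i (x_i \cdot w + b) - 1 \}$ with $\alpha_i \ge 0$: set the derivatives with respect to $w$ and $b$ to zero, then substitute back.

```latex
\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^n \alpha_i y_i x_i ,
\qquad
\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^n \alpha_i y_i = 0 ,

L_D(\alpha) = \sum_{i=1}^n \alpha_i
  - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j ,
\qquad
f(x) = w \cdot x + b = \sum_{i=1}^n \alpha_i y_i \, x_i \cdot x + b .
```

Substituting $w$ turns $\frac{1}{2}\|w\|^2$ into half of the double sum and $\sum_i \alpha_i y_i \, x_i \cdot w$ into the full double sum, while the term in $b$ vanishes. This leaves $\sum_i \alpha_i$ minus half of the double sum.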

Page 25:

SVMs, Computation

Major Computational Point:

• Classifier only depends on data through inner products!

• Thus enough to only store inner products

• Creates big savings in optimization

• Especially for HDLSS data

• But also creates variations in kernel embedding (interpretation?!?)

• This is almost always done in practice
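To make the storage point concrete, the sketch below evaluates the dual objective and the classification function from the $n \times n$ matrix of inner products alone. It assumes the standard dual form $L_D = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$. The function names and the toy data are illustrative, and no optimizer is included.

```python
import numpy as np

def dual_objective(alpha, y, G):
    """L_D computed from the Gram matrix G, where G[i, j] = x_i . x_j."""
    ay = alpha * y
    return alpha.sum() - 0.5 * ay @ G @ ay

def decision_values(alpha, y, G_new, b):
    """f(x) = sum_i alpha_i y_i (x_i . x) + b, with G_new[i, k] = x_i . x_new_k."""
    return (alpha * y) @ G_new + b

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))        # HDLSS-style toy data: d = 50, n = 4
G = X.T @ X                             # 4 x 4 Gram matrix: 16 numbers instead of 200

# Two-point check: x_1 = (1, 0) with y = +1 and x_2 = (-1, 0) with y = -1.
X2 = np.array([[1.0, -1.0],
               [0.0,  0.0]])
y2 = np.array([1.0, -1.0])
alpha2 = np.array([0.5, 0.5])           # maximizes the dual for this pair
G2 = X2.T @ X2
value = dual_objective(alpha2, y2, G2)
f_train = decision_values(alpha2, y2, G2, b=0.0)
```

In the two-point check the constraint $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2 = \alpha$, the dual becomes $2\alpha - 2\alpha^2$, and its maximum at $\alpha = \frac{1}{2}$ gives $f = \pm 1$ at the two points.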

Page 26:

SVMs, Computation & Embedding

For an “Embedding Map”, $\Phi(x)$, e.g. $\Phi(x) = (x, x^2)^t$

Explicit Embedding:

Maximize: $L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i) \cdot \Phi(x_j)$

Get classification function: $f(x) = \sum_{i=1}^n \alpha_i y_i \, \Phi(x_i) \cdot \Phi(x) + b$

• Straightforward application of embedding

• But loses inner product advantage

Page 27:

SVMs, Computation & Embedding

Implicit Embedding:

Maximize: $L_D(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i \cdot x_j)$

Get classification function: $f(x) = \sum_{i=1}^n \alpha_i y_i \, \Phi(x_i \cdot x) + b$

• Still defined only via inner products

• Retains optimization advantage

• Thus used very commonly

• Comparison to explicit embedding?

• Which is “better”???
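One case where explicit and implicit versions can be compared directly: applying $t \mapsto t^2$ to the inner product of two 2-d vectors gives the same number as the inner product of a particular explicit quadratic embedding. This is a standard identity, not something stated on the slides, and the names `phi` and `k` are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit quadratic embedding of a 2-d vector into 3 dimensions."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def k(x, z):
    """Implicit version: a function applied to the original inner product."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
explicit = np.dot(phi(x), phi(z))       # inner product after embedding
implicit = k(x, z)                      # same value, computed in the original space
```

Not every function of the inner product corresponds to an explicit embedding in this way, which may be one reading of the slide's question about how the two versions compare.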

Page 28:

SVMs & Robustness

Usually not severely affected by outliers,

But a possible weakness: Can have very influential points

Toy E.g., only 2 points drive SVM

Page 29:

SVMs & Robustness

Can have very influential points

Page 30:

SVMs & Robustness

Usually not severely affected by outliers,

But a possible weakness: Can have very influential points

Toy E.g., only 2 points drive SVM

Notes:

• Huge range of chosen hyperplanes

• But all are “pretty good discriminators”

• Only happens when whole range is OK???

• Good or bad?

Page 31:

SVMs & Robustness

Effect of violators (toy example):

Page 32:

SVMs & Robustness

Effect of violators (toy example):

• Depends on distance to plane

• Weak for violators nearby

• Strong as they move away

• Can have major impact on plane

• Also depends on tuning parameter C
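The dependence on distance and on C can be illustrated with the standard soft-margin criterion $\frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i (w \cdot x_i + b))$. That form is an assumption, since the slides do not show the penalized objective, and the data and function name below are made up. The plane is held fixed while one class $-1$ point is moved further onto the wrong side.

```python
import numpy as np

def soft_margin_objective(X, y, w, b, C):
    """1/2 ||w||^2 + C * sum of slacks, with slack_i = max(0, 1 - y_i (w.x_i + b))."""
    w = np.asarray(w, dtype=float)
    slack = np.maximum(0.0, 1.0 - y * (w @ X + b))
    return 0.5 * w @ w + C * slack.sum(), slack

y = np.array([1.0, 1.0, -1.0, -1.0])
penalties = []
for shift in [0.5, 2.0, 8.0]:           # first coordinate of the violating point
    X = np.array([[2.0, 3.0, -2.0, shift],
                  [0.0, 1.0,  0.0, 0.0]])
    value, slack = soft_margin_objective(X, y, w=[1.0, 0.0], b=0.0, C=10.0)
    penalties.append(value)
```

For a fixed plane the penalty grows linearly with the violator's distance and is scaled by C, so a distant violator gives the optimizer a strong incentive to move the plane.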

Page 33:

SVMs, Computation

Caution: available algorithms are not created equal

Toy Example:

• Gunn’s Matlab code

• Todd’s Matlab code

Page 34:

SVMs, Computation

Toy Example: Gunn’s Matlab code

Page 35:

SVMs, Computation

Toy Example: Todd’s Matlab code

Page 36:

SVMs, Computation

Caution: available algorithms are not created equal

Toy Example:

• Gunn’s Matlab code

• Todd’s Matlab code

Serious errors in Gunn’s version: it does not find the real optimum…

Page 37:

SVMs, Tuning Parameter

Recall Regularization Parameter C:

• Controls penalty for violation

• I.e. lying on wrong side of plane

• Appears in slack variables

• Affects performance of SVM

Toy Example: d = 50, Spherical Gaussian data

Page 38:

SVMs, Tuning Parameter

Toy Example: d = 50, Spherical Gaussian data

Page 39:

SVMs, Tuning Parameter

Toy Example: d = 50, Spherical Gaussian data

X-Axis: Optimal Direction, Other: SVM Direction

• Small C:

– Where is the margin?

– Small angle to optimal (generalizable)

• Large C:

– More data piling

– Larger angle (less generalizable)

– Bigger gap (but maybe not better???)

• Between: Very small range

Page 40:

SVMs, Tuning Parameter

Toy Example: d = 50, Spherical Gaussian data

Put MD on horizontal axis

Page 41:

SVMs, Tuning Parameter

Toy Example: d = 50, Spherical Gaussian data

Careful look at small C:

Put MD on horizontal axis

• Shows SVM and MD same for C small

– Mathematics behind this?

• Separates for large C

– No data piling for MD
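The "angle to optimal" used in these comparisons can be computed for any direction vector. The sketch below does so for the Mean Difference (MD) direction on simulated data. It assumes d = 50 spherical Gaussian data with the two classes shifted by ±2.2 in the first coordinate and 20 points per class. The shift, the sample sizes, and the seed are assumptions for illustration, and the SVM direction is not computed here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20
shift = np.zeros(d)
shift[0] = 2.2                          # classes differ only in the first coordinate
plus = rng.standard_normal((d, n)) + shift[:, None]
minus = rng.standard_normal((d, n)) - shift[:, None]

def unit(v):
    return v / np.linalg.norm(v)

def angle_deg(u, v):
    """Angle between two directions, ignoring sign, in degrees."""
    c = abs(unit(u) @ unit(v))
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))

optimal = np.eye(d)[0]                        # true direction of the mean shift
md = plus.mean(axis=1) - minus.mean(axis=1)   # Mean Difference direction
angle_md = angle_deg(md, optimal)
```

The same `angle_deg` helper could be applied to a fitted SVM normal vector for each value of C to reproduce the kind of comparison described on the slide.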

Page 42:

Support Vector Machines

Important Extension:

Multi-Class SVMs

Hsu & Lin (2002)

Lee, Lin, & Wahba (2002)

• Defined for “implicit” version

• “Direction Based” variation???

Page 43:

Distance Weighted Discrimination

Improvement of SVM for HDLSS Data

Toy e.g. (similar to earlier movie): $d = 50$, $N(0,1)$, $\mu_1 = 2.2$, $n_+ = n_- = 20$

Page 44:

Distance Weighted Discrimination

Toy e.g.: Maximal Data Piling Direction

- Perfect Separation

- Gross Overfitting

- Large Angle

- Poor Generalizability

Page 45:

Distance Weighted Discrimination

Toy e.g.: Support Vector Machine Direction

- Bigger Gap

- Smaller Angle

- Better Generalizability

- Feels support vectors too strongly???

- Ugly subpops?

- Improvement?

Page 46:

Distance Weighted Discrimination

Toy e.g.: Distance Weighted Discrimination

- Addresses these issues

- Smaller Angle

- Better Generalizability

- Nice subpops

- Replaces min dist. by avg. dist.

Page 47:

Distance Weighted Discrimination

Based on Optimization Problem: $\min_{w, b} \sum_{i=1}^n \frac{1}{r_i}$

More precisely: Work in appropriate penalty for violations

Optimization Method: Second Order Cone Programming

• “Still convex” generalization of quadratic programming

• Allows fast greedy solution

• Can use available fast software (SDPT3, Michael Todd, et al)
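The difference between the two criteria can be seen by evaluating both on the same residuals $r_i = y_i (x_i^t w + b)$. The sketch below is an illustration for the separable case only: it omits the violation penalty and the cone programming solver, and the data and function names are made up.

```python
import numpy as np

def residuals(X, y, w, b):
    """r_i = y_i (x_i^t w + b) for a normal vector w rescaled to unit length."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    return y * (w @ X + b)

def svm_criterion(r):
    return r.min()                      # SVM: maximize the smallest residual

def dwd_criterion(r):
    return np.sum(1.0 / r)              # DWD: minimize the sum of reciprocal residuals

y = np.array([1.0, 1.0, -1.0, -1.0])
X_near = np.array([[1.0, 5.0, -1.0, -5.0],
                   [0.0, 3.0,  0.0,  3.0]])
X_far = np.array([[1.0, 10.0, -1.0, -10.0],
                  [0.0,  3.0,  0.0,   3.0]])    # same data with the distant points moved out
r_near = residuals(X_near, y, w=[1.0, 0.0], b=0.0)
r_far = residuals(X_far, y, w=[1.0, 0.0], b=0.0)
svm_near, svm_far = svm_criterion(r_near), svm_criterion(r_far)
dwd_near, dwd_far = dwd_criterion(r_near), dwd_criterion(r_far)
```

Moving the distant points leaves the SVM criterion unchanged but changes the DWD criterion, so every point has some influence, with nearby points weighted most heavily.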

Page 48:

Distance Weighted Discrimination

References for more on DWD:

• Current paper: Marron, Todd and Ahn (2007)

• Links to more papers: Ahn (2007)

• JAVA Implementation of DWD: caBIG (2006)

• SDPT3 Software: Toh (2007)

Page 49:

Distance Weighted Discrimination

2-d Visualization: $\min_{w, b} \sum_{i=1}^n \frac{1}{r_i}$

Pushes Plane Away From Data

All Points Have Some Influence

Pages 50 to 51:

Support Vector Machines

Graphical View, using Toy Example:

Page 52:

Distance Weighted Discrimination

Graphical View, using Toy Example: