Machine Learning, Saarland University, SS 2007
Lecture 1, Friday, April 19th, 2007 (basics and example applications)
Holger Bast, Marjan Celikik, Kevin Chang, Stefan Funke, Joachim Giesen
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Overview of this Lecture
Machine Learning Basics
– Classification
– Objects as feature vectors
– Regression
– Clustering
Example applications
– Surface reconstruction
– Preference learning
– Netflix challenge (how to earn $1,000,000)
– Text search
Classification
Given a set of points, each labeled + or –
– learn something from them …
– … in order to predict the label of new points
[Figure: points in the plane labeled + and –, with a new point marked "?" whose label is to be predicted]
this is an instance of supervised learning
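To make the prediction step concrete, here is a minimal sketch of one standard classifier, k-nearest-neighbors, in plain NumPy; the points, labels, and the choice k = 3 are made up for illustration and are not from the lecture:

```python
import numpy as np

# Hypothetical labeled training points (2-D) with labels +1 / -1.
points = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],   # the "+" cluster
                   [5.0, 5.5], [5.5, 5.0], [6.0, 5.8]])  # the "-" cluster
labels = np.array([+1, +1, +1, -1, -1, -1])

def knn_predict(x, k=3):
    """Predict the label of point x by majority vote of its k nearest neighbors."""
    dists = np.linalg.norm(points - x, axis=1)   # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    return np.sign(labels[nearest].sum())        # majority vote: +1 or -1

print(knn_predict(np.array([1.8, 2.1])))  # prints 1: the new point lands in the "+" region
```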
Classification — Quality
Which classifier is better?
– answer requires a model of where the data comes from
– and a measure of quality/accuracy
[Figure: the same + / – point set with the new "?" point and two candidate classifiers drawn through it]
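As a sketch of what such a quality measure can look like: hold out some labeled points and count each classifier's mistakes on them. The held-out points and the two linear rules f and g below are made-up stand-ins, not the lecture's classifiers:

```python
import numpy as np

def classification_error(classifier, points, labels):
    """Fraction of held-out points that the classifier labels incorrectly."""
    predictions = np.array([classifier(p) for p in points])
    return np.mean(predictions != labels)

# Made-up held-out points with true labels, and two made-up linear classifiers.
held_out = np.array([[1.0, 2.0], [5.0, 5.0], [2.0, 1.0], [6.0, 5.5]])
true_labels = np.array([+1, -1, +1, -1])

f = lambda p: +1 if p[0] + p[1] < 7 else -1
g = lambda p: +1 if p[0] < 2 else -1
print(classification_error(f, held_out, true_labels))  # 0.0  -- f is better here
print(classification_error(g, held_out, true_labels))  # 0.25
```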
Classification — Outliers and Overfitting
We have to find a balance between two extremes
– oversimplification (large classification error)
– overfitting (lack of regularity)
– again: requires a model of the data
[Figure: the + / – point set with a + outlier inside the – region, illustrating oversimplified vs. overfitted decision boundaries]
Classification — Point Transformation
If a classifier does not work for the original data
– try it on a transformation of the data
– typically: make points linearly separable by a suitable mapping to a higher-dimensional space
[Figure: 1-D points on a line, with – labels clustered around 0 and + labels on both sides; caption: map x to (x, |x|)]
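A minimal sketch of exactly this map in NumPy, with made-up 1-D points (– near the origin, + on both sides); after mapping x to (x, |x|), a horizontal line separates the two classes:

```python
import numpy as np

# Made-up 1-D points: "-" labels near the origin, "+" labels on both sides.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([+1, +1, -1, -1, -1, +1, +1])   # not linearly separable on the line

# Map x to (x, |x|): in the 2-D image the classes are separated by the
# horizontal line "second coordinate = 1.5".
mapped = np.stack([x, np.abs(x)], axis=1)

predictions = np.where(mapped[:, 1] > 1.5, +1, -1)
print(np.array_equal(predictions, y))  # True: a linear rule now classifies perfectly
```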
Classification — More Labels
[Figure: points with three different labels: +, –, and o]
Typically:
– first, a basic technique for binary classification
– then, an extension to more labels (one such extension is sketched below)
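One common extension of this kind (not named on the slide) is to give each label its own score and predict the label with the highest score, as in one-vs-rest. The sketch below uses a toy nearest-centroid score in place of trained binary classifiers; the centroids are made up:

```python
import numpy as np

# Hypothetical class "prototypes": one centroid per label. In one-vs-rest,
# each label would instead get its own binary classifier producing a score.
centroids = {"+": np.array([1.5, 2.0]),
             "-": np.array([5.5, 5.3]),
             "o": np.array([1.0, 6.0])}

def predict(x):
    # Score each label (here: negative distance to its centroid, a toy score)
    # and return the label with the highest score.
    scores = {label: -np.linalg.norm(x - c) for label, c in centroids.items()}
    return max(scores, key=scores.get)

print(predict(np.array([1.2, 5.5])))  # "o": highest score among the three labels
```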
Objects as Feature Vectors
But why learn something about points?
General idea:
– represent objects as points in a space of fixed dimension
– each dimension corresponds to a so-called feature of the object
Crucial:
– selection of features
– normalization of vectors
Objects as Feature Vectors
Example: objects with attributes
– features = values
– normalize by a reference value for each feature

          Person 1   Person 2   Person 3   Person 4
height    188 cm     181 cm     190 cm     172 cm
weight    75 kg      90 kg      77 kg      55 kg
age       36         33         34         24

Feature vectors (height, weight, age):
(188, 75, 36)   (181, 90, 33)   (190, 77, 34)   (172, 55, 24)

Normalized by (height/180, weight/80, age/40):
(1.04, 0.94, 0.90)   (1.01, 1.13, 0.83)   (1.06, 0.96, 0.85)   (0.96, 0.69, 0.60)
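A minimal sketch of this normalization in NumPy, using the attribute values and the reference values (180 cm, 80 kg, 40 years) from the table above:

```python
import numpy as np

# Attribute vectors from the table: (height in cm, weight in kg, age in years).
persons = np.array([[188.0, 75.0, 36.0],
                    [181.0, 90.0, 33.0],
                    [190.0, 77.0, 34.0],
                    [172.0, 55.0, 24.0]])

# Divide each feature by its reference value so all features are comparable;
# otherwise height in cm would dominate any distance computation.
reference = np.array([180.0, 80.0, 40.0])
print(persons / reference)   # rows match the normalized vectors above (up to rounding)
```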
Objects as Feature Vectors
Example: images
– features = pixels (with grey values)
– often fine without further normalization

Image 1:   2 8 2      Image 2:   1 6 1
           8 5 8                 6 6 6
           2 7 2                 1 6 1

Feature vectors, one dimension per pixel (1,1), (1,2), (1,3), (2,1), …, (3,3):
Image 1: (2, 8, 2, 8, 5, 8, 2, 7, 2)
Image 2: (1, 6, 1, 6, 6, 6, 1, 6, 1)
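These vectors are just the row-by-row flattening of the pixel grids; a two-line sketch:

```python
import numpy as np

# The two 3x3 grey-value images from the slide, flattened row by row,
# so that each dimension corresponds to a fixed pixel position.
image1 = np.array([[2, 8, 2], [8, 5, 8], [2, 7, 2]])
image2 = np.array([[1, 6, 1], [6, 6, 6], [1, 6, 1]])
print(image1.flatten())  # [2 8 2 8 5 8 2 7 2]
print(image2.flatten())  # [1 6 1 6 6 6 1 6 1]
```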
Objects as Feature Vectors
Example: text documents
– features = words
– normalize to unit norm

Doc. 1: Machine Learning SS 2007
Doc. 2: Statistical Learning Theory SS 2007
Doc. 3: Statistical Learning Theory SS 2006

word          Doc. 1   Doc. 2   Doc. 3
Learning      1        1        1
Machine       1        0        0
SS            1        1        1
Statistical   0        1        1
Theory        0        1        1
2006          0        0        1
2007          1        1        0
Objects as Feature Vectors
Example: text documents
– features = words
– normalize to unit norm

word          Doc. 1   Doc. 2   Doc. 3
Learning      0.5      0.4      0.4
Machine       0.5      0        0
SS            0.5      0.4      0.4
Statistical   0        0.4      0.4
Theory        0        0.4      0.4
2006          0        0        0.4
2007          0.5      0.4      0
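A sketch of the whole pipeline in NumPy, from the three documents above to unit-norm vectors (the 0.4 entries on the slide are 1/√5 ≈ 0.45, rounded down):

```python
import numpy as np

vocabulary = ["learning", "machine", "ss", "statistical", "theory", "2006", "2007"]
docs = ["machine learning ss 2007",
        "statistical learning theory ss 2007",
        "statistical learning theory ss 2006"]

# One binary vector per document: entry i is 1 iff vocabulary word i occurs.
vectors = np.array([[1.0 if word in doc.split() else 0.0 for word in vocabulary]
                    for doc in docs])

# Normalize each row to unit Euclidean norm, as on the slide.
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
print(np.round(vectors, 2))  # Doc 1: entries 0.5; Docs 2-3: entries 1/sqrt(5) ~ 0.45
```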
Regression
Learn a function that maps objects to values
Similar trade-off as for classification:
– risk of oversimplification vs. risk of overfitting
[Figure: data points (x) and a point "?" whose value is to be predicted; horizontal axis: given value (typically multi-dimensional); vertical axis: value to learn (typically a real number)]
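A sketch of this trade-off with made-up, roughly linear 1-D data and NumPy's least-squares polynomial fit: a low degree keeps some training error but generalizes, while a degree that matches the number of points interpolates the noise:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = 2.0 * x + 0.1 * rng.standard_normal(8)   # roughly linear data plus noise

for degree in (1, 7):
    coeffs = np.polyfit(x, y, degree)            # least-squares polynomial fit
    training_error = np.sum((y - np.polyval(coeffs, x)) ** 2)
    print(degree, round(float(training_error), 4))
# Degree 7 drives the training error to ~0 by interpolating all 8 points
# (overfitting); the degree-1 fit keeps some error but predicts new x far better.
```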
Clustering
Partition a given set of points into clusters
Similar problems as for classification:
– follow the data distribution, but not too closely
– transformation often helps (next slide)
[Figure: unlabeled points (x) falling into two natural groups]
this is an instance of unsupervised learning
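A minimal sketch of one standard clustering method, k-means (Lloyd's algorithm), in plain NumPy with made-up points; the slide does not name a specific algorithm, so this is just one illustrative choice:

```python
import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        centroids = np.array([points[assignment == j].mean(axis=0) for j in range(k)])
    return assignment, centroids

points = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]])
print(kmeans(points, k=2)[0])  # e.g. [0 0 0 1 1 1]: the two well-separated groups
```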
Clustering — Transformation
For clustering, dimension reduction typically helps
– whereas in classification, embedding in a higher-dimensional space typically helps
Word-document matrix:

           doc1   doc2   doc3   doc4   doc5
internet   1      0      1      0      0
web        1      1      0      0      0
surfing    1      1      1      1      0
beach      0      0      0      1      1

The vectors for documents 2, 3, and 4 are pairwise equally dissimilar.

Project to 2 dimensions:

           doc1   doc2   doc3   doc4   doc5
           0.9    0.8    0.8    0.0    0.0
           -0.1   0.0    0.0    1.1    0.9

A 2-clustering would work fine now.
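A sketch of one standard way to do such a projection: a rank-2 truncated SVD of the word-document matrix above (latent-semantic-analysis style). The slide does not name its exact method, so the numbers below will not reproduce the slide's values, but the grouping behavior is the same:

```python
import numpy as np

# Word-document matrix from the slide (rows: internet, web, surfing, beach).
A = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# Keep only the two strongest singular directions ("concepts").
U, S, Vt = np.linalg.svd(A, full_matrices=False)
docs_2d = np.diag(S[:2]) @ Vt[:2]   # column j: document j in 2-D concept space

print(np.round(docs_2d, 1))
# Up to sign, the second coordinate separates the beach documents (docs 4-5)
# from the surfing-related docs 1-3, so a 2-clustering now works.
```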
Application Example: Text Search
676 abstracts from the Max-Planck-Institute
– for example:
We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.
– 3283 words (words like and, or, this, … removed)
– abstracts come from 5 working groups: Algorithms, Logic, Graphics, CompBio, Databases
– reduce to 10 concepts
No dictionary, no training, only the plain text itself!