Machine Learning, Saarland University, SS 2007
Lecture 1, Friday, April 19th, 2007 (basics and example applications)
Holger Bast, Marjan Celikik, Kevin Chang, Stefan Funke, Joachim Giesen
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Overview of this Lecture
Machine Learning Basics
– Classification
– Objects as feature vectors
– Regression
– Clustering
Example applications
– Surface reconstruction
– Preference learning
– Netflix challenge (how to earn $1,000,000)
– Text search
Classification
Given a set of points, each labeled + or –
– learn something from them …
– … in order to predict the label of new points
[Figure: points in the plane labeled + and –, with a new point marked "?" whose label is to be predicted]
this is an instance of supervised learning
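To make the prediction step concrete, here is a minimal sketch of one standard classifier, k-nearest-neighbors, in plain NumPy; the points, labels, and the choice k = 3 are made up for illustration and are not from the lecture:

```python
import numpy as np

# Hypothetical labeled training points (2-D) with labels +1 / -1.
points = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],   # the "+" cluster
                   [5.0, 5.5], [5.5, 5.0], [6.0, 5.8]])  # the "-" cluster
labels = np.array([+1, +1, +1, -1, -1, -1])

def knn_predict(x, k=3):
    """Predict the label of point x by majority vote of its k nearest neighbors."""
    dists = np.linalg.norm(points - x, axis=1)   # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    return np.sign(labels[nearest].sum())        # majority vote: +1 or -1

print(knn_predict(np.array([1.8, 2.1])))  # prints 1: the new point lands in the "+" region
```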
Classification — Quality
Which classifier is better?
– answer requires a model of where the data comes from
– and a measure of quality/accuracy
[Figure: the same + / – point set with the new "?" point and two candidate classifiers drawn through it]
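As a sketch of what such a quality measure can look like: hold out some labeled points and count each classifier's mistakes on them. The held-out points and the two linear rules f and g below are made-up stand-ins, not the lecture's classifiers:

```python
import numpy as np

def classification_error(classifier, points, labels):
    """Fraction of held-out points that the classifier labels incorrectly."""
    predictions = np.array([classifier(p) for p in points])
    return np.mean(predictions != labels)

# Made-up held-out points with true labels, and two made-up linear classifiers.
held_out = np.array([[1.0, 2.0], [5.0, 5.0], [2.0, 1.0], [6.0, 5.5]])
true_labels = np.array([+1, -1, +1, -1])

f = lambda p: +1 if p[0] + p[1] < 7 else -1
g = lambda p: +1 if p[0] < 2 else -1
print(classification_error(f, held_out, true_labels))  # 0.0  -- f is better here
print(classification_error(g, held_out, true_labels))  # 0.25
```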
Classification — Outliers and Overfitting
We have to find a balance between two extremes
– oversimplification (large classification error)
– overfitting (lack of regularity)
– again: requires a model of the data
[Figure: the + / – point set with a + outlier inside the – region, illustrating oversimplified vs. overfitted decision boundaries]
Classification — Point Transformation
If a classifier does not work for the original data
– try it on a transformation of the data
– typically: make points linearly separable by a suitable mapping to a higher-dimensional space
[Figure: 1-D points on a line, with – labels clustered around 0 and + labels on both sides; caption: map x to (x, |x|)]
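A minimal sketch of exactly this map in NumPy, with made-up 1-D points (– near the origin, + on both sides); after mapping x to (x, |x|), a horizontal line separates the two classes:

```python
import numpy as np

# Made-up 1-D points: "-" labels near the origin, "+" labels on both sides.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([+1, +1, -1, -1, -1, +1, +1])   # not linearly separable on the line

# Map x to (x, |x|): in the 2-D image the classes are separated by the
# horizontal line "second coordinate = 1.5".
mapped = np.stack([x, np.abs(x)], axis=1)

predictions = np.where(mapped[:, 1] > 1.5, +1, -1)
print(np.array_equal(predictions, y))  # True: a linear rule now classifies perfectly
```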
Classification — More Labels
[Figure: points with three different labels: +, –, and o]
Typically:
– first, a basic technique for binary classification
– then, an extension to more labels (one such extension is sketched below)
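One common extension of this kind (not named on the slide) is to give each label its own score and predict the label with the highest score, as in one-vs-rest. The sketch below uses a toy nearest-centroid score in place of trained binary classifiers; the centroids are made up:

```python
import numpy as np

# Hypothetical class "prototypes": one centroid per label. In one-vs-rest,
# each label would instead get its own binary classifier producing a score.
centroids = {"+": np.array([1.5, 2.0]),
             "-": np.array([5.5, 5.3]),
             "o": np.array([1.0, 6.0])}

def predict(x):
    # Score each label (here: negative distance to its centroid, a toy score)
    # and return the label with the highest score.
    scores = {label: -np.linalg.norm(x - c) for label, c in centroids.items()}
    return max(scores, key=scores.get)

print(predict(np.array([1.2, 5.5])))  # "o": highest score among the three labels
```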
Objects as Feature Vectors
But why learn something about points?
General idea:
– represent objects as points in a space of fixed dimension
– each dimension corresponds to a so-called feature of the object
Crucial:
– selection of features
– normalization of vectors
Objects as Feature Vectors
Example: objects with attributes
– features = values
– normalize by a reference value for each feature

          Person 1   Person 2   Person 3   Person 4
height    188 cm     181 cm     190 cm     172 cm
weight    75 kg      90 kg      77 kg      55 kg
age       36         33         34         24

Feature vectors (height, weight, age):
(188, 75, 36)   (181, 90, 33)   (190, 77, 34)   (172, 55, 24)

Normalized by (height/180, weight/80, age/40):
(1.04, 0.94, 0.90)   (1.01, 1.13, 0.83)   (1.06, 0.96, 0.85)   (0.96, 0.69, 0.60)
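A minimal sketch of this normalization in NumPy, using the attribute values and the reference values (180 cm, 80 kg, 40 years) from the table above:

```python
import numpy as np

# Attribute vectors from the table: (height in cm, weight in kg, age in years).
persons = np.array([[188.0, 75.0, 36.0],
                    [181.0, 90.0, 33.0],
                    [190.0, 77.0, 34.0],
                    [172.0, 55.0, 24.0]])

# Divide each feature by its reference value so all features are comparable;
# otherwise height in cm would dominate any distance computation.
reference = np.array([180.0, 80.0, 40.0])
print(persons / reference)   # rows match the normalized vectors above (up to rounding)
```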
Objects as Feature Vectors
Example: images
– features = pixels (with grey values)
– often fine without further normalization

Image 1:   2 8 2      Image 2:   1 6 1
           8 5 8                 6 6 6
           2 7 2                 1 6 1

Feature vectors, one dimension per pixel (1,1), (1,2), (1,3), (2,1), …, (3,3):
Image 1: (2, 8, 2, 8, 5, 8, 2, 7, 2)
Image 2: (1, 6, 1, 6, 6, 6, 1, 6, 1)
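These vectors are just the row-by-row flattening of the pixel grids; a two-line sketch:

```python
import numpy as np

# The two 3x3 grey-value images from the slide, flattened row by row,
# so that each dimension corresponds to a fixed pixel position.
image1 = np.array([[2, 8, 2], [8, 5, 8], [2, 7, 2]])
image2 = np.array([[1, 6, 1], [6, 6, 6], [1, 6, 1]])
print(image1.flatten())  # [2 8 2 8 5 8 2 7 2]
print(image2.flatten())  # [1 6 1 6 6 6 1 6 1]
```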
Objects as Feature Vectors
Example: text documents
– features = words
– normalize to unit norm

Doc. 1: Machine Learning SS 2007
Doc. 2: Statistical Learning Theory SS 2007
Doc. 3: Statistical Learning Theory SS 2006

word          Doc. 1   Doc. 2   Doc. 3
Learning      1        1        1
Machine       1        0        0
SS            1        1        1
Statistical   0        1        1
Theory        0        1        1
2006          0        0        1
2007          1        1        0
Objects as Feature Vectors
Example: text documents
– features = words
– normalize to unit norm

word          Doc. 1   Doc. 2   Doc. 3
Learning      0.5      0.4      0.4
Machine       0.5      0        0
SS            0.5      0.4      0.4
Statistical   0        0.4      0.4
Theory        0        0.4      0.4
2006          0        0        0.4
2007          0.5      0.4      0
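A sketch of the whole pipeline in NumPy, from the three documents above to unit-norm vectors (the 0.4 entries on the slide are 1/√5 ≈ 0.45, rounded down):

```python
import numpy as np

vocabulary = ["learning", "machine", "ss", "statistical", "theory", "2006", "2007"]
docs = ["machine learning ss 2007",
        "statistical learning theory ss 2007",
        "statistical learning theory ss 2006"]

# One binary vector per document: entry i is 1 iff vocabulary word i occurs.
vectors = np.array([[1.0 if word in doc.split() else 0.0 for word in vocabulary]
                    for doc in docs])

# Normalize each row to unit Euclidean norm, as on the slide.
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
print(np.round(vectors, 2))  # Doc 1: entries 0.5; Docs 2-3: entries 1/sqrt(5) ~ 0.45
```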
Regression
Learn a function that maps objects to values
Similar trade-off as for classification:
– risk of oversimplification vs. risk of overfitting
[Figure: data points (x) and a point "?" whose value is to be predicted; horizontal axis: given value (typically multi-dimensional); vertical axis: value to learn (typically a real number)]
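A sketch of this trade-off with made-up, roughly linear 1-D data and NumPy's least-squares polynomial fit: a low degree keeps some training error but generalizes, while a degree that matches the number of points interpolates the noise:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = 2.0 * x + 0.1 * rng.standard_normal(8)   # roughly linear data plus noise

for degree in (1, 7):
    coeffs = np.polyfit(x, y, degree)            # least-squares polynomial fit
    training_error = np.sum((y - np.polyval(coeffs, x)) ** 2)
    print(degree, round(float(training_error), 4))
# Degree 7 drives the training error to ~0 by interpolating all 8 points
# (overfitting); the degree-1 fit keeps some error but predicts new x far better.
```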
Clustering
Partition a given set of points into clusters
Similar problems as for classification:
– follow the data distribution, but not too closely
– transformation often helps (next slide)
[Figure: unlabeled points (x) falling into two natural groups]
this is an instance of unsupervised learning
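A minimal sketch of one standard clustering method, k-means (Lloyd's algorithm), in plain NumPy with made-up points; the slide does not name a specific algorithm, so this is just one illustrative choice:

```python
import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        centroids = np.array([points[assignment == j].mean(axis=0) for j in range(k)])
    return assignment, centroids

points = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]])
print(kmeans(points, k=2)[0])  # e.g. [0 0 0 1 1 1]: the two well-separated groups
```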
Clustering — Transformation
For clustering, dimension reduction typically helps
– whereas in classification, embedding in a higher-dimensional space typically helps
Word-document matrix:

           doc1   doc2   doc3   doc4   doc5
internet   1      0      1      0      0
web        1      1      0      0      0
surfing    1      1      1      1      0
beach      0      0      0      1      1

The vectors for documents 2, 3, and 4 are pairwise equally dissimilar.

Project to 2 dimensions:

           doc1   doc2   doc3   doc4   doc5
           0.9    0.8    0.8    0.0    0.0
           -0.1   0.0    0.0    1.1    0.9

A 2-clustering would work fine now.
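A sketch of one standard way to do such a projection: a rank-2 truncated SVD of the word-document matrix above (latent-semantic-analysis style). The slide does not name its exact method, so the numbers below will not reproduce the slide's values, but the grouping behavior is the same:

```python
import numpy as np

# Word-document matrix from the slide (rows: internet, web, surfing, beach).
A = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# Keep only the two strongest singular directions ("concepts").
U, S, Vt = np.linalg.svd(A, full_matrices=False)
docs_2d = np.diag(S[:2]) @ Vt[:2]   # column j: document j in 2-D concept space

print(np.round(docs_2d, 1))
# Up to sign, the second coordinate separates the beach documents (docs 4-5)
# from the surfing-related docs 1-3, so a 2-clustering now works.
```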
Application Example: Text Search
676 abstracts from the Max-Planck-Institute
– for example:
We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved.
– 3283 words (words like and, or, this, … removed)
– abstracts come from 5 working groups: Algorithms, Logic, Graphics, CompBio, Databases
– reduce to 10 concepts
No dictionary, no training, only the plain text itself!