Introduction to Machine Learning Algorithms

Boonserm Kijsirikul
Department of Computer Engineering, Chulalongkorn University

Page 1: Introduction to Machine Learning Algorithms

1

Introduction to Machine Learning Algorithms

Boonserm Kijsirikul
Department of Computer Engineering, Chulalongkorn University

Page 2: Introduction to Machine Learning Algorithms

2

The Need for Machine Learning

Recently, full-genome sequencing has blossomed.

Several other technologies, such as DNA microarrays and mass spectrometry, have progressed considerably.

These technologies rapidly produce terabytes of data.

Page 3: Introduction to Machine Learning Algorithms

3

The Problem of Diabetes

To predict whether a patient shows signs of diabetes.

Data contains 2 classes, i.e.
+ : tested positive for diabetes
− : tested negative for diabetes

Page 4: Introduction to Machine Learning Algorithms

4

The Problem of Diabetes (cont.)

Data contains 8 attributes:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)

Page 5: Introduction to Machine Learning Algorithms

5

The Data of Diabetes (Table 1)

No.  Times Pregnant  Plasma Glucose  Blood Pressure  Fold Thickness  Serum Insulin  BMI   Pedigree  Age  Class
1    1               189             60              23              846            30.1  0.398     59   +
2    5               166             72              19              175            25.8  0.587     51   +
3    1               103             30              38              83             43.3  0.183     33   −
4    1               89              66              23              94             28.1  0.167     21   −
5    0               137             40              35              168            43.1  2.288     33   +
6    3               126             88              41              235            39.3  0.704     27   −
7    13              145             82              19              110            22.2  0.245     57   −
8    1               97              66              15              140            23.2  0.487     22   −
9    2               197             70              45              543            30.5  0.158     53   +

Page 6: Introduction to Machine Learning Algorithms

6

Inductive Learning from Data (Classification Tasks)

Input Data → Learning Algorithm → Learned Model

Page 7: Introduction to Machine Learning Algorithms

7

Inductive Learning from Data – Cont. (Classification Tasks)

Test Data → Learned Model (produced by the Learning Algorithm) → Classification

Page 8: Introduction to Machine Learning Algorithms

8

Machine Learning Algorithms

Decision Tree Learning
Neural Networks
Support Vector Machines
Etc.

Page 9: Introduction to Machine Learning Algorithms

9

Decision Tree Learning

Decision tree representation

Each internal node tests an attribute

Each branch corresponds to attribute value

Each leaf node assigns a classification

Page 10: Introduction to Machine Learning Algorithms

10

An Example of Decision Trees

[Figure: an example decision tree for the diabetes data. The root node tests plasma glucose (≤127 vs. >127); internal nodes test body mass index (threshold 29.9), blood pressure (thresholds 61, 66, and 72), and plasma glucose again (threshold 155); each leaf node assigns the class positive or negative.]

Page 11: Introduction to Machine Learning Algorithms

11

A Decision Tree Learning Algorithm

Main Loop:
1. A ← the “best” decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant of the node
4. Sort training examples to the leaf nodes
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes
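The main loop above can be sketched in code for numeric attributes with binary threshold splits. This is a minimal sketch, not the exact algorithm from the slides: the row layout (dicts of attribute values paired with labels) is an illustrative assumption, and attribute selection uses the information gain defined on the later slides.

```python
import math
from collections import Counter

def entropy(labels):
    # Impurity of a list of class labels (defined on a later slide).
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, attributes):
    # Step 1: pick the "best" attribute (and threshold) by information gain.
    base = entropy([y for _, y in rows])
    best = None
    for a in attributes:
        for thr in sorted({x[a] for x, _ in rows}):
            lo = [y for x, y in rows if x[a] <= thr]
            hi = [y for x, y in rows if x[a] > thr]
            if not lo or not hi:
                continue
            gain = base - (len(lo) * entropy(lo) + len(hi) * entropy(hi)) / len(rows)
            if best is None or gain > best[0]:
                best = (gain, a, thr)
    return best

def build_tree(rows, attributes):
    labels = [y for _, y in rows]
    if len(set(labels)) == 1:            # step 5: perfectly classified -> STOP
        return labels[0]
    split = best_split(rows, attributes)
    if split is None:                    # no useful split: return majority class
        return Counter(labels).most_common(1)[0][0]
    gain, a, thr = split                 # steps 2-4: assign A, create branches,
    left = [(x, y) for x, y in rows if x[a] <= thr]   # sort examples to leaves
    right = [(x, y) for x, y in rows if x[a] > thr]
    return (a, thr, build_tree(left, attributes), build_tree(right, attributes))
```

A tree is returned as nested `(attribute, threshold, left_subtree, right_subtree)` tuples, with class labels at the leaves.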

Page 12: Introduction to Machine Learning Algorithms

12

Selection of the “Best” Attribute

Using the data in Table 1 (4+,5−), some candidate attributes are shown below.

no. time pregnant:  ≤1 → 2+,3−;  >1 → 2+,2−
plasma glucose:     ≤126 → 0+,4−;  >126 → 4+,1−
body mass index:    ≤28.1 → 1+,3−;  >28.1 → 3+,2−
age:                ≤27 → 0+,3−;  >27 → 4+,2−

Page 13: Introduction to Machine Learning Algorithms

13

Entropy

S is a sample of training examples.
p+ is the proportion of positive examples in S.
p− is the proportion of negative examples in S.
Entropy measures the impurity of S:

Entropy(S) = − p+ log2 p+ − p− log2 p−
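The entropy definition above can be computed directly from class counts; a small sketch, using the counts from Table 1 (4 positive, 5 negative examples):

```python
import math

def entropy(pos, neg):
    # Entropy(S) = -p+ log2 p+ - p- log2 p-, with 0*log2(0) taken as 0.
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(4, 5), 3))   # -> 0.991
```

An all-positive or all-negative sample has entropy 0 (pure); a 50/50 sample has entropy 1 (maximally impure).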

Page 14: Introduction to Machine Learning Algorithms

14

Information Gain

Gain(S,A) = expected reduction in entropy due to sorting on A:

Gain(S,A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|S_v|/|S|) Entropy(S_v)

Example (plasma glucose splits S = 4+,5− into S_≤126 = 0+,4− and S_>126 = 4+,1−):

Gain(S, plasma glucose)
  ≡ [ −(4/9) log2 (4/9) − (5/9) log2 (5/9) ]
    − (4/9) [ −(0/4) log2 (0/4) − (4/4) log2 (4/4) ]
    − (5/9) [ −(4/5) log2 (4/5) − (1/5) log2 (1/5) ]
  = 0.59
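The worked gain example can be verified numerically; a minimal sketch reusing the entropy-from-counts form:

```python
import math

def entropy(pos, neg):
    # Entropy from class counts; empty-class terms contribute 0.
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

# S = 4+,5-; plasma glucose splits it into 0+,4- (<=126) and 4+,1- (>126).
gain = entropy(4, 5) - (4 / 9) * entropy(0, 4) - (5 / 9) * entropy(4, 1)
print(round(gain, 2))   # -> 0.59
```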

Page 15: Introduction to Machine Learning Algorithms

15

Selection of the “Best” Attribute by Gain

plasma glucose:     ≤126 → 0+,4−;  >126 → 4+,1−    Gain = 0.59
no. time pregnant:  ≤1 → 2+,3−;  >1 → 2+,2−        Gain = 0.01
body mass index:    ≤28.1 → 1+,3−;  >28.1 → 3+,2−  Gain = 0.09
age:                ≤27 → 0+,3−;  >27 → 4+,2−      Gain = 0.38

“Plasma glucose” is selected as the root node, and the process of adding nodes is repeated.
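All four gains can be recomputed from the class counts of each candidate split (S = 4+,5−), confirming that plasma glucose wins:

```python
import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

def gain(splits):
    # splits = [(pos, neg), (pos, neg)] for the two branches of a candidate.
    n = sum(p + q for p, q in splits)
    return entropy(4, 5) - sum((p + q) / n * entropy(p, q) for p, q in splits)

candidates = {
    "plasma glucose":    [(0, 4), (4, 1)],
    "no. time pregnant": [(2, 3), (2, 2)],
    "body mass index":   [(1, 3), (3, 2)],
    "age":               [(0, 3), (4, 2)],
}
for name, splits in candidates.items():
    print(name, round(gain(splits), 2))
best = max(candidates, key=lambda name: gain(candidates[name]))
print(best)   # -> plasma glucose
```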

Page 16: Introduction to Machine Learning Algorithms

16

Artificial Neural Networks

Artificial neural networks (ANN) simulate units in the human brain (neurons).
Many weighted interconnections among units.
Learning by adaptation of the connection weights.

Page 17: Introduction to Machine Learning Algorithms

17

Perceptron

One of the earliest neural network models.
Input is a real-valued vector (x1, …, xn).
Use an activation function to compute the output value (o).
The output is a thresholded linear combination of the input with weights (w1, …, wn).

[Figure: a perceptron unit. Inputs x1, …, xn with weights w1, …, wn, plus a constant input x0 = 1 with weight w0, feed a summation unit Σ that produces the output o.]

o(x1, x2, …, xn) =  1 if w1x1 + w2x2 + … + wnxn > θ
                   −1 if w1x1 + w2x2 + … + wnxn < θ

Equivalently, with w0 = −θ and x0 = 1:

o(x1, x2, …, xn) =  1 if w0 + w1x1 + w2x2 + … + wnxn > 0
                   −1 if w0 + w1x1 + w2x2 + … + wnxn < 0

Page 18: Introduction to Machine Learning Algorithms

18

The Data of Diabetes (Table 1)

(Table 1 repeated; see Page 5.)

Page 19: Introduction to Machine Learning Algorithms

19

The Data of Diabetes (Table 1)

(Table 1 repeated; see Page 5.)

Page 20: Introduction to Machine Learning Algorithms

20

Perceptron’s Decision Surface

[Figure: + and − examples plotted in the plane of x1 = plasma glucose (90–200) and x2 = BMI (22–44), separated by a straight line.]

The decision surface is the line

w0 + w1x1 + w2x2 = 0, i.e. x2 = −(w1/w2) x1 − w0/w2

Page 21: Introduction to Machine Learning Algorithms

21

Perceptron Training

Start with w = [0.25, −0.1, 0.5], giving the decision line x2 = 0.2 x1 − 0.5, with output o = 1 on one side (target t = 1) and o = −1 on the other (target t = −1). Misclassified examples update the weights:

(x,t) = ([−1,−1], 1):  o = sgn(0.25 + 0.1 − 0.5) = −1,  ∆w = [0.2, −0.2, −0.2]
(x,t) = ([2,1], −1):   o = sgn(0.45 − 0.6 + 0.3) = 1,   ∆w = [−0.2, −0.4, −0.2]
(x,t) = ([1,1], 1):    o = sgn(0.25 − 0.7 + 0.1) = −1,  ∆w = [0.2, 0.2, 0.2]
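The updates above follow the perceptron training rule w ← w + η(t − o)x. This sketch reproduces the three steps; the learning rate η = 0.1 is an assumption inferred from the slide’s ∆w values (with bias input x0 = 1):

```python
def sgn(v):
    return 1 if v > 0 else -1

def train_step(w, x, t, eta=0.1):
    # Perceptron training rule: w <- w + eta * (t - o) * x, with x0 = 1.
    xs = [1.0] + list(x)
    o = sgn(sum(wi * xi for wi, xi in zip(w, xs)))
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, xs)], o

w = [0.25, -0.1, 0.5]
for x, t in [([-1, -1], 1), ([2, 1], -1), ([1, 1], 1)]:
    w, o = train_step(w, x, t)
    print(x, t, o, [round(wi, 2) for wi in w])
```

When an example is classified correctly, t − o = 0 and the weights are unchanged; each line printed matches one ∆w step on the slide.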

Page 22: Introduction to Machine Learning Algorithms

22

Linearly Non-Separable Examples

[Figure: + and − examples arranged so that no single straight line can separate them.]

Page 23: Introduction to Machine Learning Algorithms

23

Multilayer Networks

Multilayer feedforward neural networks can represent non-linear decision surfaces.

An example of a multilayer network (binary threshold units): two hidden units each take inputs x1 and x2 with weights 1.0 and 1.0, one with bias w0 = −0.5 and one with bias w0 = −1.5; they feed an output unit o with weights 1.0 and −2.0 and bias w0 = −0.5. [Figure: the network diagram and the resulting decision regions for binary inputs x1, x2 ∈ {0, 1}.]
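Running a forward pass with the weights above shows what this network computes on binary inputs: the first hidden unit acts as OR, the second as AND, and the output fires only when OR is true and AND is false, i.e. XOR, which no single perceptron can represent.

```python
def step(v):
    # Binary threshold unit: 1 if the weighted sum exceeds 0, else 0.
    return 1 if v > 0 else 0

def network(x1, x2):
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)    # hidden unit, w0 = -0.5 (acts as OR)
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)    # hidden unit, w0 = -1.5 (acts as AND)
    return step(1.0 * h1 - 2.0 * h2 - 0.5)  # output unit, w0 = -0.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, network(x1, x2))      # output is 1 exactly when x1 != x2
```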

Page 24: Introduction to Machine Learning Algorithms

24

Sigmoid Function

The sigmoid function is used as the activation function in multilayer networks:

σ(y) = 1 / (1 + e^(−y))

[Figure: the sigmoid curve rising from 0 to 1, and a unit with inputs x1, …, xn, weights w1, …, wn, plus x0 = 1 with weight w0, whose summed input Σ is passed through σ to produce the output o.]
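A sigmoid unit replaces the perceptron’s hard threshold with the smooth function above, giving a differentiable output in (0, 1); a minimal sketch:

```python
import math

def sigmoid(y):
    # sigma(y) = 1 / (1 + e^(-y))
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_unit(weights, x):
    # weights = [w0, w1, ..., wn]; the constant input x0 = 1 multiplies w0.
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

print(round(sigmoid(0), 2))   # -> 0.5
```

Differentiability is what makes the backpropagation algorithm on the next slides possible; the derivative has the convenient form σ'(y) = σ(y)(1 − σ(y)).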

Page 25: Introduction to Machine Learning Algorithms

25

Using Sigmoid

[Figure: + and − examples separated by a smooth non-linear decision boundary produced with sigmoid units.]

Page 26: Introduction to Machine Learning Algorithms

26

Backpropagation Algorithm

Forward step: propagate activation from the input layer to the output layer.
Backward step: propagate errors from the output layer back to the hidden layer.

[Figure: a network with input xi, a hidden unit k (output xk, error term δk, incoming weights wki) and an output unit j (output yj, error term δj, incoming weights wjk).]
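The two steps can be sketched for a small sigmoid network. This is a minimal illustration, not the slides’ exact formulation: the 2-2-1 architecture, learning rate, and XOR training data are assumptions; the δ terms use the usual sigmoid-derivative form.

```python
import math, random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

random.seed(0)
W_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # hidden weights (bias first)
W_o = [random.uniform(-1, 1) for _ in range(3)]                      # output weights (bias first)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]          # XOR

def forward(x):
    # Forward step: propagate activation input -> hidden -> output.
    xs = [1.0] + x
    h = [sigmoid(sum(w * v for w, v in zip(ws, xs))) for ws in W_h]
    hs = [1.0] + h
    return xs, h, hs, sigmoid(sum(w * v for w, v in zip(W_o, hs)))

def train_epoch(eta=0.5):
    total = 0.0
    for x, t in data:
        xs, h, hs, o = forward(x)
        total += 0.5 * (t - o) ** 2
        # Backward step: error term at the output, then at each hidden unit.
        d_o = (t - o) * o * (1 - o)
        d_h = [h[k] * (1 - h[k]) * W_o[k + 1] * d_o for k in range(2)]
        for i in range(3):                       # gradient-descent weight updates
            W_o[i] += eta * d_o * hs[i]
        for k in range(2):
            for i in range(3):
                W_h[k][i] += eta * d_h[k] * xs[i]
    return total

errors = [train_epoch() for _ in range(2000)]
print(round(errors[0], 3), round(errors[-1], 3))   # training error shrinks over epochs
```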

Page 27: Introduction to Machine Learning Algorithms

27

Support Vector Machines

An SVM constructs an optimal hyperplane that separates the data points of the two classes with as wide a margin as possible.

(Table 1 repeated; see Page 5.)

Page 28: Introduction to Machine Learning Algorithms

28

Optimal Separating Hyperplane

[Figure: the + and − examples in the plane of x1 = plasma glucose (90–200) and x2 = BMI (22–44), separated by the maximum-margin hyperplane.]

Page 29: Introduction to Machine Learning Algorithms

29

An Example of Linearly Separable Functions

• No. of support vectors = 4

Page 30: Introduction to Machine Learning Algorithms

30

An Example of Linearly Non-Separable Functions

Page 31: Introduction to Machine Learning Algorithms

31

An Example of Linearly Separable Functions

• In case of using the input space

Page 32: Introduction to Machine Learning Algorithms

32

Feature Spaces

For a linearly non-separable function, it is very likely that a linear separator (hyperplane) can be constructed in a higher-dimensional space. Suppose we map the data points in the input space Rn into some feature space of higher dimension, Rm, using a function F:

F : Rn → Rm

Example: F : R2 → R3

x = (x1, x2),  F(x) = (x1, x2, √2 x1x2)
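With this map, points that no straight line separates in R2 can become separable by a plane in R3. A small sketch; the XOR-style example points and labels are illustrative assumptions, not from the slides:

```python
import math

def F(x1, x2):
    # The slide's feature map F : R^2 -> R^3.
    return (x1, x2, math.sqrt(2) * x1 * x2)

# Four points whose classes alternate by quadrant; no line in R^2 separates them.
points = [((1, 1), +1), ((-1, -1), +1), ((1, -1), -1), ((-1, 1), -1)]

# In the feature space, the plane z = 0 separates the two classes.
for (x1, x2), label in points:
    z = F(x1, x2)[2]
    print((x1, x2), label, z)   # z > 0 exactly for the +1 class
```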

Page 33: Introduction to Machine Learning Algorithms

33

An Example of Linearly Non-Separable Functions

• In case of using the feature space: F(x) = (x1, x2, √2 x1x2)

Page 34: Introduction to Machine Learning Algorithms

34

An Example of Linearly Non-Separable Functions

• The corresponding non-linear function in the input space

Page 35: Introduction to Machine Learning Algorithms

35

Multiclass SVMs

The Max Win algorithm is an approach for constructing multiclass SVMs.

Training:
− Construct all possible binary SVMs
− For N classes, there will be N(N−1)/2 binary SVMs
− Each classifier is trained on 2 out of the N classes

Testing:
− A test example is classified by all classifiers
− Each classifier provides one vote for its preferred class
− The majority vote is the final output
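The testing phase above is plain pairwise voting and can be sketched independently of how the binary SVMs are trained. Here `decide(a, b, x)` is a hypothetical stand-in for a trained binary classifier between classes a and b; the distance-to-prototype rule and the class prototypes are illustrative assumptions:

```python
from itertools import combinations
from collections import Counter

def max_win(classes, decide, x):
    # Max Win testing: every pairwise classifier votes; majority wins.
    votes = Counter()
    for a, b in combinations(classes, 2):   # N(N-1)/2 binary classifiers
        votes[decide(a, b, x)] += 1
    return votes.most_common(1)[0][0]

# Toy stand-in for the binary SVMs: each class is a 1-D prototype, and the
# pairwise "classifier" prefers whichever prototype is nearer to x.
prototypes = {"A": 0.0, "B": 5.0, "C": 10.0}
def decide(a, b, x):
    return a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b

print(max_win(list(prototypes), decide, 4.2))   # -> B
```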

Page 36: Introduction to Machine Learning Algorithms

36

Comparison of the Algorithms on Diabetes

Algorithm                Accuracy
Decision Tree            78.65%
Neural Network           79.17%
Support Vector Machine   78.65%