Multinomial Classification and Application of ML
![Page 1: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/1.jpg)
Machine/Deep Learning with Theano
Softmax classification: Multinomial classification
Application & Tips: Learning rate, data preprocessing, overfitting
Deep Neural Nets for Everyone
![Page 2: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/2.jpg)
Multinomial Classification
Softmax classification
![Page 3: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/3.jpg)
Logistic Regression
$$H_L(X) = WX$$
$$H_L(X) = Z$$
$$g(Z) = \frac{1}{1 + e^{-Z}}$$
$$H_R(X) = g(H_L(X))$$
$X \to W \to Z \to \hat{Y}$
$\hat{Y}$: Prediction (0 ~ 1), $Y$: Real Value (0 or 1)
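A minimal sketch of the logistic regression hypothesis on this slide, in plain Python. The weights and features are hypothetical values chosen only for illustration, and the function names are not from the slides.

```python
import math

def sigmoid(z):
    """Logistic function g(Z) = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_hypothesis(w, x):
    """H_L(X) = WX for one example: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def logistic_hypothesis(w, x):
    """H_R(X) = g(H_L(X)), a prediction between 0 and 1."""
    return sigmoid(linear_hypothesis(w, x))

# Hypothetical weights and features (assumed values, not from the slides).
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
print(linear_hypothesis(w, x))    # Z = 0.5 - 2.0 + 1.0 = -0.5
print(logistic_hypothesis(w, x))  # a value below 0.5, since Z is negative
```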
![Page 4: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/4.jpg)
Binomial Classification
Is the figure on the left a circle?
yes/no
![Page 5: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/5.jpg)
Binomial Classification
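[Figure: plane with axes $x_1$ and $x_2$, each measuring a shape tendency (one axis label is truncated in the source; the other reads "tendency of lines"). The two classes are circle and polygon, decided by $X \to W \to Z \to \hat{Y}$.]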
![Page 6: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/6.jpg)
Multinomial Classification
A / B / C
Which of A, B, or C is the figure on the left?
![Page 7: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/7.jpg)
Multinomial Classification
[Figure: classes A, B, and C plotted in the $(x_1, x_2)$ plane]
![Page 8: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/8.jpg)
Multinomial Classification
[Figure: classes A, B, and C in the $(x_1, x_2)$ plane; a binary classifier $X \to W \to Z \to \hat{Y}$ answers "A?"]
![Page 9: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/9.jpg)
Multinomial Classification
[Figure: classes A, B, and C in the $(x_1, x_2)$ plane; a binary classifier $X \to W \to Z \to \hat{Y}$ answers "B?"]
![Page 10: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/10.jpg)
Multinomial Classification
[Figure: classes A, B, and C in the $(x_1, x_2)$ plane; a binary classifier $X \to W \to Z \to \hat{Y}$ answers "C?"]
![Page 11: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/11.jpg)
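Multinomial Classification
[Figure: classes A, B, and C in the $(x_1, x_2)$ plane; three binary classifiers $X \to W \to Z \to \hat{Y}$, one each for "A?", "B?", and "C?"]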
![Page 12: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/12.jpg)
Multinomial Classification
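$X \to W \to Z$
$$\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_1 x_1 + w_2 x_2 + w_3 x_3 \end{bmatrix}$$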
![Page 13: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/13.jpg)
Multinomial Classification
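$X \to W \to Z$, once per class:
$$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3 \end{bmatrix}$$
$$\begin{bmatrix} w_{B1} & w_{B2} & w_{B3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3 \end{bmatrix}$$
$$\begin{bmatrix} w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3 \end{bmatrix}$$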
![Page 14: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/14.jpg)
Multinomial Classification
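$X \to W \to Z$, with the three classifiers stacked into one matrix:
$$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3 \\ w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3 \\ w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3 \end{bmatrix} = \begin{bmatrix} H_A(X) \\ H_B(X) \\ H_C(X) \end{bmatrix}$$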
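The stacked matrix product can be sketched in a few lines of plain Python. The weight matrix and input below are hypothetical; only the row-per-class layout comes from the slide.

```python
def scores(W, x):
    """Return [H_A(X), H_B(X), H_C(X)]: one dot product per row of W."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

# Hypothetical 3x3 weight matrix (rows: classes A, B, C) and input vector.
W = [
    [1.0, 0.0, 2.0],   # w_A1, w_A2, w_A3
    [0.5, 1.0, 0.0],   # w_B1, w_B2, w_B3
    [-1.0, 0.0, 1.0],  # w_C1, w_C2, w_C3
]
x = [1.0, 2.0, 3.0]
print(scores(W, x))  # [7.0, 2.5, 2.0]
```

One matrix multiplication replaces three separate binary classifiers, which is the point of stacking the rows.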
![Page 15: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/15.jpg)
Multinomial Classification
$[H_A(X),\ H_B(X),\ H_C(X)]^T$ = [ 1505−0.1] (example scores for A, B, C)
How similar?
![Page 16: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/16.jpg)
Multinomial Classification : Softmax Function
Score → Probability
$H_A(X) = Z_A \to \hat{Y}_A$
$H_B(X) = Z_B \to \hat{Y}_B$
$H_C(X) = Z_C \to \hat{Y}_C$
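The slide names the softmax function without spelling it out in the transcript. The standard definition is exp(Z_i) / Σ_j exp(Z_j), which gives outputs that each lie between 0 and 1 and sum to 1. A minimal sketch, with hypothetical scores:

```python
import math

def softmax(z):
    """Map scores to probabilities: exp(z_i) / sum_j exp(z_j)."""
    m = max(z)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, 1.0, 0.1]  # hypothetical scores Z_A, Z_B, Z_C
y_hat = softmax(z)
print(y_hat)       # three probabilities, largest for the largest score
print(sum(y_hat))  # sums to 1 up to floating-point rounding
```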
![Page 17: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/17.jpg)
Multinomial Classification
$XW_A = Z_A$, $XW_B = Z_B$, $XW_C = Z_C$ (classes A, B, C)
→ softmax → $[\hat{Y}_A,\ \hat{Y}_B,\ \hat{Y}_C] = [0.8,\ 0.15,\ 0.05]$
→ one-hot encoding (find maximum) → $[1.0,\ 0.0,\ 0.0]$
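The "find maximum" step can be sketched as an argmax followed by a one-hot vector. The function name is hypothetical; the probabilities are the ones on the slide.

```python
def one_hot_argmax(probs):
    """Return a vector with 1.0 at the position of the largest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [1.0 if i == best else 0.0 for i in range(len(probs))]

y_hat = [0.8, 0.15, 0.05]     # softmax output from the slide
print(one_hot_argmax(y_hat))  # [1.0, 0.0, 0.0]
```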
![Page 18: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/18.jpg)
Cost Function
Cross Entropy Function
![Page 19: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/19.jpg)
Entropy Function
(Information) Entropy
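$$H(p) = -\sum_x p(x) \log p(x)$$
• A measure of the uncertainty contained in the probability distribution p
• The larger the value, the more chaotic: no consistent direction or regularity
• The amount of information (bits) needed to represent p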
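A small sketch of the entropy formula, assuming base-2 logarithms so the result is in bits. Terms with p(x) = 0 are skipped, following the usual convention that they contribute nothing.

```python
import math

def entropy(p):
    """H(p) = -sum p(x) log2 p(x), in bits."""
    return sum(-px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))  # 0.0 bits: no uncertainty
print(entropy([0.25] * 4))  # 2.0 bits: four equally likely outcomes
```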
![Page 20: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/20.jpg)
Cross Entropy Function
Cross Entropy
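$$H(p, q) = -\sum_x p(x) \log q(x)$$
• A way to compute the amount of information between two probability distributions p and q
• The amount of information (bits) needed to encode samples from p using a code optimized for q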
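A sketch of the cross entropy formula in bits. The distributions are hypothetical; the comparison illustrates that H(p, q) equals the entropy of p when q = p and is larger otherwise.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum p(x) log2 q(x), in bits."""
    return sum(-px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]
q = [0.25, 0.75]            # hypothetical mismatched distribution
print(cross_entropy(p, p))  # 1.0, the entropy of p
print(cross_entropy(p, q))  # larger than 1.0, because q does not match p
```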
![Page 21: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/21.jpg)
Cross Entropy Cost Function
$XW_A = Z_A \to \hat{Y}_A$
$XW_B = Z_B \to \hat{Y}_B$
$XW_C = Z_C \to \hat{Y}_C$
$\hat{Y}$: Prediction (0 ~ 1), $Y$: Real Value (0 or 1)
$$D(\hat{Y}, Y) = -\sum_i Y_i \log \hat{Y}_i$$
![Page 22: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/22.jpg)
Cross Entropy Cost Function
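$$\hat{Y} = \begin{bmatrix} \hat{Y}_A \\ \hat{Y}_B \\ \hat{Y}_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad Y = \begin{bmatrix} Y_A \\ Y_B \\ Y_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$
$$D(\hat{Y}, Y) = -\sum_i Y_i \log \hat{Y}_i$$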
![Page 23: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/23.jpg)
Cross Entropy Cost Function
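$$D(\hat{Y}, Y) = -\sum_i Y_i \log \hat{Y}_i$$
$$\hat{Y} = \begin{bmatrix} \hat{Y}_A \\ \hat{Y}_B \\ \hat{Y}_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \qquad Y = \begin{bmatrix} Y_A \\ Y_B \\ Y_C \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$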
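The two examples on these slides can be checked numerically. This sketch assumes the first vector on each slide is the prediction and the second is the real value, and it treats -log(0) as infinite cost.

```python
import math

def cross_entropy_cost(y_hat, y):
    """D(prediction, real) = -sum_i real_i * log(prediction_i) for a one-hot label."""
    cost = 0.0
    for y_hat_i, y_i in zip(y_hat, y):
        if y_i == 0:
            continue              # a zero label contributes nothing
        if y_hat_i == 0:
            return math.inf       # -log(0): the prediction rules out the true class
        cost -= y_i * math.log(y_hat_i)
    return cost

print(cross_entropy_cost([1, 0, 0], [1, 0, 0]))  # 0.0: correct prediction, no cost
print(cross_entropy_cost([1, 0, 0], [0, 1, 0]))  # inf: confident wrong prediction
```

A correct prediction costs nothing and a confidently wrong one costs without bound, which is what makes this a usable cost function.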
![Page 24: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/24.jpg)
Logistic Cost VS Cross Entropy
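In binomial classification, the real data and $H(x)$ can each take only two forms: $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$.
These vectors can be written as $\begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix}$ and $\begin{bmatrix} y \\ 1 - y \end{bmatrix}$, with $H(x),\ y \in \{0, 1\}$.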
![Page 25: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/25.jpg)
Logistic Cost VS Cross Entropy
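Substituting into the cross entropy cost function:
$$H(H(x), y) = -\begin{bmatrix} y & 1 - y \end{bmatrix} \cdot \log \begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix} = -y \log H(x) - (1 - y) \log (1 - H(x))$$
The right-hand side is the logistic cost.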
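A quick numerical check that the two-class cross entropy matches the logistic cost. Function names and test values are hypothetical; H(x) is kept strictly between 0 and 1 so the logarithms are defined.

```python
import math

def logistic_cost(h, y):
    """-y log H(x) - (1 - y) log(1 - H(x))."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

def cross_entropy_two_class(h, y):
    """Cross entropy between the label [y, 1 - y] and the prediction [h, 1 - h]."""
    label = [y, 1 - y]
    pred = [h, 1 - h]
    return -sum(l * math.log(p) for l, p in zip(label, pred))

for y in (0, 1):
    for h in (0.1, 0.5, 0.9):
        same = math.isclose(logistic_cost(h, y), cross_entropy_two_class(h, y))
        print(y, h, same)  # True for every combination
```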
![Page 26: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/26.jpg)
Cross Entropy Cost Function
$$L = \frac{1}{N} \sum_n D_n(\hat{Y}, Y) = -\frac{1}{N} \sum_n \left( \sum_i Y_i \log \hat{Y}_i \right)$$
The mean of the costs over the N training examples
![Page 27: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/27.jpg)
Application & Tips
Learning Rate
Data Preprocessing
Overshooting
![Page 28: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/28.jpg)
Gradient Descent Function
$$W = W - \alpha \frac{\partial}{\partial W} Cost(W)$$
$\alpha$: Learning Rate
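A minimal sketch of the update rule as a loop. The cost function here, Cost(W) = (W - 3)^2, is a hypothetical stand-in chosen because its gradient is easy to write by hand.

```python
def gradient_descent(grad, w0, alpha, steps):
    """Repeat W = W - alpha * dCost/dW for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

# Hypothetical cost: Cost(W) = (W - 3)^2, whose gradient is 2 * (W - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0, alpha=0.1, steps=100)
print(w_final)  # close to the minimizer W = 3
```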
![Page 29: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/29.jpg)
Learning rate : Overshooting
[Figure: loss $L(W)$ plotted against $W$]
![Page 30: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/30.jpg)
Learning rate : Too small
[Figure: loss $L(W)$ plotted against $W$]
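The two failure modes can be reproduced on a toy loss. This sketch assumes L(W) = W^2 (gradient 2W) and three hypothetical learning rates; none of these values come from the slides.

```python
def run(alpha, steps=20, w0=1.0):
    """Gradient descent on L(W) = W^2; returns every W visited."""
    history = [w0]
    w = w0
    for _ in range(steps):
        w = w - alpha * 2.0 * w
        history.append(w)
    return history

overshoot = run(alpha=1.1)    # each step multiplies W by (1 - 2.2) = -1.2: diverges
too_small = run(alpha=0.001)  # each step multiplies W by 0.998: barely moves
reasonable = run(alpha=0.3)   # each step multiplies W by 0.4: converges quickly
print(abs(overshoot[-1]), abs(too_small[-1]), abs(reasonable[-1]))
```

With too large a rate the iterate jumps across the minimum and grows; with too small a rate it is still near the starting point after 20 steps.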
![Page 31: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/31.jpg)
Data Preprocessing
[Figure: loss $L(W)$ over the weights $w_1$ and $w_2$]
![Page 32: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/32.jpg)
Data Preprocessing
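[Figure: axes $w_1$ and $w_2$]
$$W = W - \alpha \frac{\partial}{\partial W} Cost(W)$$
When each weight is affected to a different degree as the values change, it becomes hard to find a suitable learning rate.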
![Page 33: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/33.jpg)
Data Preprocessing : Standardization
$$w_i' = \frac{w_i - \mu_i}{\sigma_i}$$
$\mu_i$: mean of $w_i$
$\sigma_i$: standard deviation of $w_i$
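A sketch of standardization applied to one column of values: subtract the mean, then divide by the standard deviation. In practice this is applied to each input feature of the data; treating the slide's symbol that way is an assumption here, as are the sample values. The population standard deviation is used, and the column is assumed not to be constant (otherwise the division fails).

```python
import math

def standardize(column):
    """Return (v - mean) / std for every value in the column."""
    mu = sum(column) / len(column)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in column) / len(column))
    return [(v - mu) / sigma for v in column]

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical column: mean 5, std 2
print(standardize(values))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```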
![Page 34: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/34.jpg)
Overfitting
• The model becomes excessively optimized for the training data.
• It does not perform well on real data.
![Page 35: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/35.jpg)
Overfitting
[Figure: two plots with axes $x_1$ and $x_2$, each marking the circle class]
![Page 36: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/36.jpg)
Overfitting
Solution:
• Train with a large amount of training data.
• Reduce the number of features.
• Regularization
![Page 37: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/37.jpg)
Overfitting : Regularization
$$L = \frac{1}{N} \sum_n D_n(\hat{Y}, Y) + \lambda \sum W^2$$
• Keep the weights from taking overly large values. => This keeps the cost function from bending too sharply.
$\lambda$: Regularization Strength
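A sketch of the regularized loss: the mean cross entropy cost plus λ times the sum of squared weights. The predictions, labels, weights, and λ below are all hypothetical values for illustration.

```python
import math

def cross_entropy_cost(y_hat, y):
    """D(prediction, real) = -sum_i real_i * log(prediction_i)."""
    return -sum(y_i * math.log(p_i) for p_i, y_i in zip(y_hat, y) if y_i > 0)

def regularized_loss(predictions, labels, weights, lam):
    """Mean cost over N examples plus lam * (sum of squared weights)."""
    n = len(predictions)
    data_loss = sum(cross_entropy_cost(p, y) for p, y in zip(predictions, labels)) / n
    penalty = lam * sum(w * w for row in weights for w in row)
    return data_loss + penalty

predictions = [[0.8, 0.15, 0.05], [0.1, 0.7, 0.2]]  # hypothetical softmax outputs
labels = [[1, 0, 0], [0, 1, 0]]                     # one-hot real values
weights = [[1.0, -2.0], [0.5, 0.0]]                 # sum of squares = 5.25
print(regularized_loss(predictions, labels, weights, lam=0.0))   # data loss only
print(regularized_loss(predictions, labels, weights, lam=0.01))  # data loss + 0.0525
```

A larger λ pushes the optimizer toward smaller weights at the expense of fitting the training data less closely.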
![Page 38: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/38.jpg)
Overfitting : Regularization
$$L = \frac{1}{N} \sum_n D_n(\hat{Y}, Y) + \lambda \sum W^2$$
$\lambda$: Regularization Strength
![Page 39: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/39.jpg)
Application & Tips
Learning and Test data sets
![Page 40: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/40.jpg)
Training, validation and test sets
• The model has already memorized the answers for the training data, so the training data cannot show whether it works well on real data. => Test data is needed!
• A validation step is needed to find a suitable learning rate and regularization strength for the trained model. => Validation data is needed!
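One way to carve a dataset into the three sets is a shuffled split. The 70/15/15 proportions, the fixed seed, and the function name are assumptions for illustration; the slides do not specify a ratio.

```python
import random

def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then slice into training, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test_set = shuffled[:n_test]
    val_set = shuffled[n_test:n_test + n_val]
    train_set = shuffled[n_test + n_val:]
    return train_set, val_set, test_set

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

The validation set is used to choose the learning rate and regularization strength; the test set is held back until the end so it measures performance on unseen data.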
![Page 41: Multinomial classification and application of ML](https://reader035.vdocuments.net/reader035/viewer/2022062820/58abaa291a28abdf3c8b5f63/html5/thumbnails/41.jpg)
Online Learning
[Figure: Data and Model]
• When there is too much data, split it into parts and train on them piece by piece.
• Also used when data keeps arriving continuously.
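A sketch of the idea: the model is updated from one chunk at a time and never sees the whole dataset at once. The one-parameter linear model y ≈ w·x, the synthetic data, and the chunk size are hypothetical stand-ins; the same loop applies to a classifier.

```python
def chunks(data, size):
    """Yield the data in consecutive pieces of at most `size` examples."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

def update(w, batch, alpha):
    """One gradient step on mean squared error for y ≈ w * x, using only this batch."""
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - alpha * grad

# Synthetic stream following y = 2x (assumed data, for illustration only).
data = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 101)]
w = 0.0
for batch in chunks(data, 10):
    w = update(w, batch, alpha=0.01)  # the model keeps what it learned from earlier chunks
print(w)  # close to 2 after a single pass over the stream
```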