03. linear regression

Jeonghun Yoon


TRANSCRIPT

Page 1: 03. linear regression

Jeonghun Yoon

Page 2: 03. linear regression

Last time..... the Naive Bayes Classifier

argmaxš‘¦š‘ƒ š‘„1, ā€¦ , š‘„š‘‘ š‘¦ š‘ƒ(š‘¦) = argmax

š‘¦ š‘ƒ š‘„š‘– š‘¦ š‘ƒ(š‘¦)

š‘‘

š‘–=1

class š‘¦ ģ˜ ė°œģƒ ķ™•ė„ ź³¼ test setģ—ģ„œ class š‘¦ģ˜ labelģ„ ź°€ģ§„ ė°ģ“ķ„°ģ˜ ķŠ¹ģ„± ė²”ķ„°ģ˜

ģ›ģ†Œ š‘„š‘– (ė¬øģ„œģ˜ ģ˜ˆģ—ģ„œėŠ” ė‹Øģ–“) ź°€ ė‚˜ģ˜¬ ķ™•ė„ ģ˜ ź³±

ex) To decide whether (I, love, you) is spam or not, compare:

the proportion of spam in the training set, multiplied by the probabilities that I, love, and you each appear in documents labeled spam,

against

the proportion of ham in the training set, multiplied by the probabilities that I, love, and you each appear in documents labeled ham.

Page 3: 03. linear regression

ģ§€ė‚œ ģ‹œź°„ ėÆøė¹„ķ–ˆė˜ ģ  ė“¤... 1. Laplacian Smoothing (appendix ģ°øź³ )

2. MLE / MAP


Page 4: 03. linear regression

Bayes' Rule

š‘ šœƒ š•© =š‘ š•© šœƒ š‘(šœƒ)

š‘ š•© šœƒ š‘(šœƒ)

posteriori (ģ‚¬ķ›„ ķ™•ė„ )

likelihood (ģš°ė„ ź°’)

prior (ģ‚¬ģ „ ķ™•ė„ )

ģ‚¬ķ›„ ķ™•ė„  : ź“€ģ°° ź°’ė“¤ģ“ ź“€ģ°° ėœ ķ›„ģ— ėŖØģˆ˜(parameter)ģ˜ ė°œģƒ ķ™•ė„ ģ„ źµ¬ķ•œė‹¤.

ģ‚¬ģ „ ķ™•ė„  : ź“€ģ°° ź°’ė“¤ģ“ ź“€ģ°° ė˜źø° ģ „ģ— ėŖØģˆ˜ģ˜ ė°œģƒ ķ™•ė„ ģ„ źµ¬ķ•œė‹¤.

ģš°ė„ ź°’ : ėŖØģˆ˜ģ˜ ź°’ģ“ ģ£¼ģ–“ģ”Œģ„ ė•Œ ź“€ģ°° ź°’ė“¤ģ“ ė°œģƒķ•  ķ™•ė„ 

Page 5: 03. linear regression

Maximum Likelihood Estimate

š•© = (š‘„1, ā€¦ , š‘„š‘›)

š“› šœ½ = š’‘ š•© šœ½

ģš°ė„(likelihood)ėŠ” ė‹¤ģŒź³¼ ź°™ģ“ ģ •ģ˜ ėœė‹¤.

ė³€ģˆ˜(parameter) šœƒź°€ ģ£¼ģ–“ģ”Œģ„ ė•Œ, data set š•© = (š‘„1, ā€¦ , š‘„š‘›) (ź“€ģ°° ėœ, observed) ė„¼ ģ–»ģ„ ģˆ˜ ģžˆėŠ”(obtaining) ķ™•ė„ 

š‘(š•©|šœƒ)

š‘‹

šœƒģ˜ ķ•Øģˆ˜. šœƒģ˜ pdfėŠ” ģ•„ė‹˜.

š•© = (š‘„1, ā€¦ , š‘„š‘›)

Page 6: 03. linear regression

The Maximum Likelihood Estimate is defined as follows:

the MLE is the \theta for which the probability of obtaining the observed data set \mathbb{x} = (x_1, \dots, x_n) is largest.

š‘(š•©|šœƒ1)

š‘‹ š•© = (š‘„1, ā€¦ , š‘„š‘›)

šœ½ = ššš«š š¦ššš±šœ½š“› šœ½ = ššš«š š¦ššš±

šœ½š’‘(š•©|šœ½) Ģ‚

š‘(š•©|šœƒ2) š‘(š•©|šœƒ3)

š‘(š•©|šœƒ) šœƒ = šœƒ2 Ģ‚

Page 7: 03. linear regression

When we know the likelihood function p(\mathbb{x} \mid \theta) and the prior p(\theta), Bayes' rule gives us the value of the posterior function:

p(\theta \mid \mathbb{x}) \propto p(\mathbb{x} \mid \theta)\, p(\theta)

Maximum A Posteriori Estimate

š‘ šœƒ š•© =š‘ š•© šœƒ š‘(šœƒ)

š‘ š•© šœƒ š‘(šœƒ)

posteriori (ģ‚¬ķ›„ ķ™•ė„ )

likelihood (ģš°ė„ ź°’)

prior (ģ‚¬ģ „ ķ™•ė„ )

Page 8: 03. linear regression

Likelihood š‘(š•©|šœƒ)

Prior š‘(šœƒ)

Posterior š‘ šœƒ š•© āˆ š‘ š•© šœƒ š‘(šœƒ)


Page 10: 03. linear regression

šœ½ = ššš«š š¦ššš±šœ½š’‘(šœ½|š•©)

Likelihood š‘(š•©|šœƒ)

Prior š‘(šœƒ)

Posterior š‘ šœƒ š•© āˆ š‘ š•© šœƒ š‘(šœƒ)

Page 11: 03. linear regression

Regression

Page 12: 03. linear regression

ė‚˜ėŠ” ķ° ģ‹ ė°œķšŒģ‚¬ģ˜ CEOģ“ė‹¤. ė§Žģ€ ģ§€ģ ė“¤ģ„ ź°€ģ§€ź³  ģžˆė‹¤.

ź·øė¦¬ź³  ģ“ė²ˆģ— ģƒˆė”œģš“ ģ§€ģ ģ„ ė‚“ź³  ģ‹¶ė‹¤. ģ–“ėŠ ģ§€ģ—­ģ— ė‚“ģ•¼ ė ź¹Œ?

ė‚“ź°€ ģƒˆė”œģš“ ģ§€ģ ģ„ ė‚“ź³  ģ‹¶ģ–“ķ•˜ėŠ” ģ§€ģ—­ė“¤ģ˜ ģ˜ˆģƒ ģˆ˜ģµė§Œ ķŒŒģ•…ķ•  ģˆ˜ ģžˆģœ¼ė©“

ķ° ė„ģ›€ģ“ ė  ź²ƒģøė°!

ė‚“ź°€ ź°€ģ§€ź³  ģžˆėŠ” ģžė£Œ(data)ėŠ” ź° ģ§€ģ ģ˜ ģˆ˜ģµ(profits)ź³¼ ź° ģ§€ģ ģ“ ģžˆėŠ” ģ§€ģ—­ģ˜

ģøźµ¬ģˆ˜(populations)ģ“ė‹¤.

ķ•“ź²°ģ±…! Linear Regression!

ģ“ź²ƒģ„ ķ†µķ•˜ģ—¬, ģƒˆė”œģš“ ģ§€ģ—­ģ˜ ģøźµ¬ģˆ˜ė„¼ ģ•Œź²Œ ė  ź²½ģš°, ź·ø ģ§€ģ—­ģ˜ ģ˜ˆģƒ ģˆ˜ģµģ„ źµ¬

ķ•  ģˆ˜ ģžˆė‹¤.

Example 1)

Page 13: 03. linear regression

Example 2)

I have just moved to Pittsburgh and want to get an apartment at the most reasonable price. These are the things I consider when buying a home: square footage, number of bedrooms, distance to school... What, then, would be the price of a house with the size and number of bedrooms I want?

Page 14: 03. linear regression

① Given an input x we would like to compute an output y. (e.g., given the size of house I want and the number of rooms, compute a predicted house price.)

② For example: 1) Predict height from age (height = y, age = x). 2) Predict Google's price from Yahoo's price (Google's price = y, Yahoo's price = x).

y = \theta_0 + \theta_1 x

ģ¦‰, źø°ģ”“ģ˜ dataė“¤ģ—ģ„œ

ģ§ģ„ (š‘¦ = šœƒ0 + šœƒ1š‘„)ģ„ ģ°¾ģ•„ė‚“ė©“,

ģƒˆė”œģš“ ź°’ š‘„š‘›š‘’š‘¤ź°€ ģ£¼ģ–“ģ”Œģ„ ė•Œ,

ķ•“ė‹¹ķ•˜ėŠ” š‘¦ģ˜ ź°’ģ„ ģ˜ˆģø”ķ•  ģˆ˜

ģžˆź² źµ¬ė‚˜!

learning, training

prediction

Page 15: 03. linear regression

Input : ģ§‘ģ˜ ķ¬źø°(š‘„1), ė°©ģ˜ ź°œģˆ˜(š‘„2), ķ•™źµź¹Œģ§€ģ˜ ź±°ė¦¬(š‘„3),.....

(š‘„1, š‘„2, ā€¦ , š‘„š‘›) : ķŠ¹ģ„± ė²”ķ„° feature vector

Output : ģ§‘ ź°’(š‘¦)

š’š = šœ½šŸŽ + šœ½šŸš’™šŸ + šœ½šŸš’™šŸ +ā‹Æ+ šœ½š’š’™š’

training setģ„ ķ†µķ•˜ģ—¬ ķ•™ģŠµ(learning)

Page 16: 03. linear regression

Simple Linear Regression

Page 17: 03. linear regression

š‘¦š‘– = šœƒ0 + šœƒ1š‘„š‘– + šœ€š‘–

š‘–ė²ˆģ§ø ź“€ģ°°ģ  š‘¦š‘– , š‘„š‘– ź°€ ģ£¼ģ–“ģ”Œģ„ ė•Œ ė‹Øģˆœ ķšŒź·€ ėŖØķ˜•ģ€ ė‹¤ģŒź³¼ ź°™ė‹¤.

šœ–3

šœ–š‘– : š‘–ė²ˆģ§ø ź“€ģ°°ģ ģ—ģ„œ ģš°ė¦¬ź°€ źµ¬ķ•˜ź³ ģž ķ•˜ėŠ” ķšŒź·€ģ§ģ„ ź³¼ ģ‹¤ģ œ ź“€ģ°°ėœ š‘¦š‘–ģ˜ ģ°Øģ“ (error)

ģš°ė¦¬ėŠ” ģ˜¤ė„˜ģ˜ ķ•©ģ„ ź°€ģž„ ģž‘ź²Œ ė§Œė“œėŠ” ģ§ģ„ ģ„ ģ°¾ź³  ģ‹¶ė‹¤. ģ¦‰ ź·øė ‡ź²Œ ė§Œė“œėŠ” šœ½šŸŽģ™€ šœ½šŸģ„ ģ¶”ģ •ķ•˜ź³  ģ‹¶ė‹¤ ! How!! ģµœģ†Œ ģ œź³± ė²•! (Least Squares Method)

min š‘¦š‘– āˆ’ šœƒ0 + šœƒ1š‘„š‘–2

š‘–

= š‘šš‘–š‘› šœ–š‘–2

š‘–

š‘¦ = šœƒ0 + šœƒ1š‘„

ģ‹¤ģ œ ź“€ģø” ź°’ ķšŒź·€ ģ§ģ„ ģ˜ ź°’(ģ“ģƒģ ģø ź°’)

ģ¢…ģ† ė³€ģˆ˜ ģ„¤ėŖ… ė³€ģˆ˜, ė…ė¦½ ė³€ģˆ˜

Page 18: 03. linear regression

min š‘¦š‘– āˆ’ šœƒ0 + šœƒ1š‘„š‘–2

š‘–

= min šœ–š‘–2

š‘–

ģ‹¤ģ œ ź“€ģø” ź°’ ķšŒź·€ ģ§ģ„ ģ˜ ź°’(ģ“ģƒģ ģø ź°’)

ģœ„ģ˜ ģ‹ģ„ ģµœėŒ€ķ•œ ė§Œģ”± ģ‹œķ‚¤ėŠ” šœƒ0, šœƒ1ģ„ ģ¶”ģ •ķ•˜ėŠ” ė°©ė²•ģ€ ė¬“ģ—‡ģ¼ź¹Œ?

(ģ“ėŸ¬ķ•œ šœƒ1, šœƒ2ė„¼ šœƒ1, šœƒ2 ė¼ź³  ķ•˜ģž.)

- Normal Equation

- Steepest Gradient Descent


Page 19: 03. linear regression

What is the normal equation?

To find a maximum or minimum, differentiate the given expression and find the values that set the derivative to zero.

min š‘¦š‘– āˆ’ šœƒ0 + šœƒ1š‘„š‘–2

š‘–

ėؼģ €, šœƒ0ģ— ėŒ€ķ•˜ģ—¬ ėÆøė¶„ķ•˜ģž. āˆ’ š‘¦š‘– āˆ’ šœƒ0 + šœƒ1š‘„š‘– = 0

š‘–

šœ•

šœ•šœƒ0 š‘¦š‘– āˆ’ šœƒ0 + šœƒ1š‘„š‘–

2

š‘–

=

ė‹¤ģŒģœ¼ė”œ, šœƒ1ģ— ėŒ€ķ•˜ģ—¬ ėÆøė¶„ķ•˜ģž. āˆ’ š‘¦š‘– āˆ’ šœƒ0 + šœƒ1š‘„š‘– š‘„š‘– = 0

š‘–

šœ•

šœ•šœƒ1 š‘¦š‘– āˆ’ šœƒ0 + šœƒ1š‘„š‘–

2

š‘–

=

ģœ„ ģ˜ ė‘ ģ‹ģ„ 0ģœ¼ė”œ ė§Œģ”±ģ‹œķ‚¤ėŠ” šœƒ0, šœƒ1ė„¼ ģ°¾ģœ¼ė©“ ėœė‹¤. ģ“ģ²˜ėŸ¼ 2ź°œģ˜ ėÆøģ§€ģˆ˜ģ— ėŒ€ķ•˜ģ—¬,

2ź°œģ˜ ė°©ģ •ģ‹(system)ģ“ ģžˆģ„ ė•Œ, ģš°ė¦¬ėŠ” ģ“ systemģ„ normal equation(ģ •ź·œė°©ģ •ģ‹)ģ“ė¼ ė¶€ė„øė‹¤.

Page 20: 03. linear regression

The normal equation form

š•©š‘– = 1, š‘„š‘–š‘‡, Ī˜ = šœƒ0, šœƒ1

š‘‡, š•Ŗ = š‘¦1, š‘¦2, ā€¦ , š‘¦š‘›š‘‡ , š‘‹ =

11ā€¦

š‘„1š‘„2ā€¦

1 š‘„š‘›

, š•– = (šœ–1, ā€¦ , šœ–š‘›) ė¼ź³  ķ•˜ģž.

š•Ŗ = š‘‹Ī˜ + š•–

š‘¦1 = šœƒ0 + šœƒ1š‘„1 + šœ–1

š‘¦2 = šœƒ0 + šœƒ1š‘„2 + šœ–2

.......

š‘¦š‘›āˆ’1 = šœƒ0 + šœƒ1š‘„š‘›āˆ’1 + šœ–š‘›āˆ’1

š‘¦š‘› = šœƒ0 + šœƒ1š‘„š‘› + šœ–š‘›

š‘›ź°œģ˜ ź“€ģø” ź°’ (š‘„š‘– , š‘¦š‘–)ģ€ ģ•„ėž˜ģ™€ ź°™ģ€ ķšŒź·€ ėŖØķ˜•ģ„ ź°€ģ§„ė‹¤ź³  ź°€ģ •ķ•˜ģž.

š‘¦1š‘¦2š‘¦3ā€¦š‘¦š‘›

=

111ā€¦

š‘„1š‘„2š‘„3ā€¦

1 š‘„š‘›

šœƒ0šœƒ1

+

šœ–1šœ–2šœ–3ā€¦šœ–š‘›

Page 21: 03. linear regression

šœ–š‘—2

š‘›

š‘—=1

= š•–š‘‡š•– = š•Ŗ āˆ’ š‘‹Ī˜ š‘‡(š•Ŗ āˆ’ š‘‹Ī˜)

= š•Ŗš‘‡š•Ŗ āˆ’ Ī˜š‘‡š‘‹š‘‡š•Ŗ āˆ’ š•Ŗš‘‡š‘‹Ī˜ + Ī˜š‘‡š‘‹š‘‡š‘‹Ī˜ = š•Ŗš‘‡š•Ŗ āˆ’ 2Ī˜š‘‡š‘‹š‘‡š•Ŗ + Ī˜š‘‡š‘‹š‘‡š‘‹Ī˜

1 by 1 ķ–‰ė ¬ģ“ėƀė”œ ģ „ģ¹˜ķ–‰ė ¬ģ˜ ź°’ģ“ ź°™ė‹¤!

šœ•(š•–š‘‡š•–)

šœ•Ī˜= šŸŽ

šœ•(š•–š‘‡š•–)

šœ•Ī˜= āˆ’2š‘‹š‘‡š•Ŗ + 2š‘‹š‘‡š‘‹Ī˜ = šŸŽ

š‘‹š‘‡š‘‹ššÆ = š‘‹š‘‡š•Ŗ ššÆ = š‘‹š‘‡š‘‹ āˆ’1š‘‹š‘‡š•Ŗ Ė†

ģ •ź·œė°©ģ •ģ‹

š•Ŗ = š‘‹Ī˜ + š•– š•– = š•Ŗ āˆ’ š‘‹Ī˜

Minimize šœ–š‘—2

š‘›

š‘—=1

Page 22: 03. linear regression

What is Gradient Descent?

machine learningģ—ģ„œėŠ” ė§¤ź°œ ė³€ģˆ˜(parameter, ģ„ ķ˜•ķšŒź·€ģ—ģ„œėŠ” šœƒ0, šœƒ1)ź°€ ģˆ˜ģ‹­~

ģˆ˜ė°± ģ°Øģ›ģ˜ ė²”ķ„°ģø ź²½ģš°ź°€ ėŒ€ė¶€ė¶„ģ“ė‹¤. ė˜ķ•œ ėŖ©ģ  ķ•Øģˆ˜(ģ„ ķ˜•ķšŒź·€ģ—ģ„œėŠ” Ī£šœ–š‘–2)ź°€

ėŖØė“  źµ¬ź°„ģ—ģ„œ ėÆøė¶„ ź°€ėŠ„ķ•˜ė‹¤ėŠ” ė³“ģž„ģ“ ķ•­ģƒ ģžˆėŠ” ź²ƒė„ ģ•„ė‹ˆė‹¤.

ė”°ė¼ģ„œ ķ•œ ė²ˆģ˜ ģˆ˜ģ‹ ģ „ź°œė”œ ķ•“ė„¼ źµ¬ķ•  ģˆ˜ ģ—†ėŠ” ģƒķ™©ģ“ ģ ģ§€ ģ•Šź²Œ ģžˆė‹¤.

ģ“ėŸ° ź²½ģš°ģ—ėŠ” ģ“ˆźø° ķ•“ģ—ģ„œ ģ‹œģž‘ķ•˜ģ—¬ ķ•“ė„¼ ė°˜ė³µģ ģœ¼ė”œ ź°œģ„ ķ•“ ė‚˜ź°€ėŠ” ģˆ˜ģ¹˜ģ 

ė°©ė²•ģ„ ģ‚¬ģš©ķ•œė‹¤. (ėÆøė¶„ģ“ ģ‚¬ģš© ėØ)

Page 23: 03. linear regression

What is Gradient Descent?

ģ“ˆźø°ķ•“ š›¼0 ģ„¤ģ • š‘” = 0

š›¼š‘”ź°€ ė§Œģ”±ģŠ¤ėŸ½ė‚˜?

š›¼š‘”+1 = š‘ˆ š›¼š‘” š‘” = š‘” + 1

š›¼ = š›¼š‘” Ė† No

Yes
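A sketch of this loop in Python, with the update rule U and the stopping test left as placeholders to be filled in by gradient descent below:

```python
def iterate(alpha0, update, satisfactory, max_iter=10_000):
    """Generic iterative improvement, following the flowchart above."""
    alpha = alpha0                # step 1: initial solution, t = 0
    for _ in range(max_iter):
        if satisfactory(alpha):   # step 2: stopping test
            break
        alpha = update(alpha)     # step 3: alpha_{t+1} = U(alpha_t)
    return alpha                  # hat{alpha} = alpha_t
```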

Page 24: 03. linear regression

What is Gradient Descent?

Gradient Descent

From the current position, find the direction in which the surface descends most steeply, move a little in that direction, and take that as the new position. By repeating this process we move toward the lowest point (the minimum).

Gradient Ascent

From the current position, find the direction in which the surface ascends most steeply, move a little in that direction, and take that as the new position. By repeating this process we move toward the highest point (the maximum).

Page 25: 03. linear regression

What is Gradient Descent?

Gradient Descent

\alpha_{t+1} = \alpha_t - \rho \left.\frac{\partial J}{\partial \alpha}\right|_{\alpha_t}

J = the objective function; \left.\frac{\partial J}{\partial \alpha}\right|_{\alpha_t} = the derivative evaluated at \alpha_t.

[Figure: a curve of J(\alpha), showing the step from \alpha_t to \alpha_{t+1} in the direction -\rho \left.\frac{\partial J}{\partial \alpha}\right|_{\alpha_t}.]

In the figure, the derivative at \alpha_t is negative. So if we were to add \left.\frac{\partial J}{\partial \alpha}\right|_{\alpha_t}, we would move to the left, i.e., in the direction in which the objective function increases. Therefore we subtract \left.\frac{\partial J}{\partial \alpha}\right|_{\alpha_t}, and we multiply it by a suitable \rho so that we move only a little.

Page 26: 03. linear regression

What is Gradient Descent?

Gradient Descent

\alpha_{t+1} = \alpha_t - \rho \left.\frac{\partial J}{\partial \alpha}\right|_{\alpha_t}

Gradient Ascent

\alpha_{t+1} = \alpha_t + \rho \left.\frac{\partial J}{\partial \alpha}\right|_{\alpha_t}

(J = the objective function; \left.\frac{\partial J}{\partial \alpha}\right|_{\alpha_t} = the derivative at \alpha_t.)

Gradient descent and gradient ascent are typical greedy algorithms: they consider neither the past nor the future, only the most favorable next position from the current one, so they may end up at a local optimal point.

Page 27: 03. linear regression

š½ Ī˜ = 1

2 šœƒ0 + šœƒ1š‘„š‘– āˆ’ š‘¦š‘–

2

š‘›

š‘–=1

= 1

2 Ī˜š‘‡š•©š‘– āˆ’ š‘¦š‘–

2

š‘›

š‘–=1

š•©š‘– = 1, š‘„š‘–š‘‡, Ī˜ = šœƒ0, šœƒ1

š‘‡, š•Ŗ = š‘¦1, š‘¦2, ā€¦ , š‘¦š‘›š‘‡ , š‘‹ =

11ā€¦

š‘„1š‘„2ā€¦

1 š‘„š‘›

, š•– = (šœ–1, ā€¦ , šœ–š‘›) ė¼ź³  ķ•˜ģž.

šœƒ0š‘”+1 = šœƒ0

š‘” āˆ’ š›¼šœ•

šœ•šœƒ0š½(Ī˜)š‘”

šœƒ1š‘”+1 = šœƒ1

š‘” āˆ’ š›¼šœ•

šœ•šœƒ1š½(Ī˜)š‘”

šœƒ0ģ˜ š‘”ė²ˆģ§ø ź°’ģ„,

š½(Ī˜)ė„¼ šœƒ0ģœ¼ė”œ ėÆøė¶„ķ•œ ģ‹ģ—ė‹¤ź°€ ėŒ€ģž….

ź·ø ķ›„ģ—, ģ“ ź°’ģ„ šœƒ0ģ—ģ„œ ė¹¼ ģ¤Œ.

ėÆøė¶„ķ•  ė•Œ ģ“ģš©.

Gradient descentė„¼ ģ¤‘ģ§€ķ•˜ėŠ”

źø°ģ¤€ģ“ ė˜ėŠ” ķ•Øģˆ˜

Page 28: 03. linear regression

š½ Ī˜ = 1

2 šœƒ0 + šœƒ1š‘„š‘– āˆ’ š‘¦š‘–

2

š‘›

š‘–=1

= 1

2 Ī˜š‘‡š•©š‘– āˆ’ š‘¦š‘–

2

š‘›

š‘–=1

š•©š‘– = 1, š‘„š‘–š‘‡, Ī˜ = šœƒ0, šœƒ1

š‘‡, š•Ŗ = š‘¦1, š‘¦2, ā€¦ , š‘¦š‘›š‘‡ , š‘‹ =

11ā€¦

š‘„1š‘„2ā€¦

1 š‘„š‘›

, š•– = (šœ–1, ā€¦ , šœ–š‘›) ė¼ź³  ķ•˜ģž.

Gradient of š½(Ī˜)

šœ•

šœ•šœƒ0š½ šœƒ = (Ī˜š‘‡š•©š‘– āˆ’ š‘¦š‘–)

š‘›

š‘–=1

1 šœ•

šœ•šœƒ1š½ šœƒ = (Ī˜š‘‡š•©š‘– āˆ’ š‘¦š‘–)

š‘›

š‘–=1

š‘„š‘–

š›»š½ Ī˜ =šœ•

šœ•šœƒ0š½ Ī˜ ,šœ•

šœ•šœƒ1š½ Ī˜

š‘‡

= Ī˜š‘‡š•©š‘– āˆ’ š‘¦š‘– š•©š‘–

š‘›

š‘–=1

Page 29: 03. linear regression

š•©š‘– = 1, š‘„š‘–š‘‡, Ī˜ = šœƒ0, šœƒ1

š‘‡, š•Ŗ = š‘¦1, š‘¦2, ā€¦ , š‘¦š‘›š‘‡ , š‘‹ =

11ā€¦

š‘„1š‘„2ā€¦

1 š‘„š‘›

, š•– = (šœ–1, ā€¦ , šœ–š‘›) ė¼ź³  ķ•˜ģž.

šœƒ0š‘”+1 = šœƒ0

š‘” āˆ’ š›¼ (Ī˜š‘‡š•©š‘– āˆ’ š‘¦š‘–)

š‘›

š‘–=1

1 ė‹Ø, ģ“ ė•Œģ˜ Ī˜ģžė¦¬ģ—ėŠ”

š‘”ė²ˆģ§øģ— ģ–»ģ–“ģ§„ Ī˜ź°’ģ„ ėŒ€ģž…ķ•“ģ•¼ ķ•œė‹¤.

šœƒ1š‘”+1 = šœƒ1

š‘” āˆ’ š›¼ Ī˜š‘‡š•©š‘– āˆ’ š‘¦š‘– š‘„š‘–

š‘›

š‘–=1

Page 30: 03. linear regression

Steepest Descent

Page 31: 03. linear regression

Steepest Descent

Pros: easy to implement, conceptually clean, guaranteed convergence.

Cons: often slow to converge.

\Theta^{t+1} = \Theta^t - \alpha \sum_{i=1}^{n} \{ (\Theta^t)^T \mathbb{x}_i - y_i \}\, \mathbb{x}_i

Normal Equations

Pros: a single-shot algorithm! Easiest to implement.

Cons: need to compute the pseudo-inverse (X^T X)^{-1}, expensive, numerical issues (e.g., the matrix may be singular), although there are ways to get around this...

\hat{\Theta} = (X^T X)^{-1} X^T \mathbb{y}

Page 32: 03. linear regression

Multivariate Linear Regression

Page 33: 03. linear regression

š’š = šœ½šŸŽ + šœ½šŸš’™šŸ + šœ½šŸš’™šŸ +ā‹Æ+ šœ½š’š’™š’

ė‹Øģˆœ ģ„ ķ˜• ķšŒź·€ ė¶„ģ„ģ€, input ė³€ģˆ˜ź°€ 1. ė‹¤ģ¤‘ ģ„ ķ˜• ķšŒź·€ ė¶„ģ„ģ€, input ė³€ģˆ˜ź°€ 2ź°œ ģ“ģƒ.

Googleģ˜ ģ£¼ģ‹ ź°€ź²©

Yahooģ˜ ģ£¼ģ‹ ź°€ź²©

Microsoftģ˜ ģ£¼ģ‹ ź°€ź²©

Page 34: 03. linear regression

š’š = šœ½šŸŽ + šœ½šŸš’™šŸšŸ + šœ½šŸš’™šŸ

šŸ’ + š

ģ˜ˆė„¼ ė“¤ģ–“, ģ•„ėž˜ģ™€ ź°™ģ€ ģ‹ģ„ ģ„ ķ˜•ģœ¼ė”œ ģƒź°ķ•˜ģ—¬ ķ’€ ģˆ˜ ģžˆėŠ”ź°€?

ė¬¼ė” , input ė³€ģˆ˜ź°€ polynomial(ė‹¤ķ•­ģ‹)ģ˜ ķ˜•ķƒœģ“ģ§€ė§Œ, coefficients šœƒš‘–ź°€ ģ„ ķ˜•(linear)ģ“ėƀė”œ ģ„ ķ˜• ķšŒź·€ ė¶„ģ„ģ˜ ķ•“ė²•ģœ¼ė”œ ķ’€ ģˆ˜ ģžˆė‹¤.

ššÆ = š‘‹š‘‡š‘‹ āˆ’1š‘‹š‘‡š•Ŗ Ė†

šœƒ0, šœƒ1, ā€¦ , šœƒš‘›š‘‡

Page 35: 03. linear regression

General Linear Regression

Page 36: 03. linear regression

š’š = šœ½šŸŽ + šœ½šŸš’™šŸ + šœ½šŸš’™šŸ +ā‹Æ+ šœ½š’š’™š’ ģ¤‘ ķšŒź·€ ė¶„ģ„

ģ¼ė°˜ ķšŒź·€ ė¶„ģ„ š’š = šœ½šŸŽ + šœ½šŸš’ˆšŸ(š’™šŸ) + šœ½šŸš’ˆšŸ(š’™šŸ) + ā‹Æ+ šœ½š’š’ˆš’(š’™š’)

š‘”š‘—ėŠ” š‘„š‘— ė˜ėŠ”

(š‘„āˆ’šœ‡š‘—)

2šœŽš‘— ė˜ėŠ”

1

1+exp(āˆ’š‘ š‘—š‘„)ė“±ģ˜ ķ•Øģˆ˜ź°€ ė  ģˆ˜ ģžˆė‹¤.

ģ“ź²ƒė„ ė§ˆģ°¬ź°€ģ§€ė”œ ģ„ ķ˜• ķšŒź·€ ķ’€ģ“ ė°©ė²•ģœ¼ė”œ ė¬øģ œė„¼ ķ’€ ģˆ˜ ģžˆė‹¤.

Page 37: 03. linear regression

š‘¤š‘‡ = (š‘¤0, š‘¤1, ā€¦ , š‘¤š‘›)

šœ™ š‘„š‘–š‘‡= šœ™0 š‘„

š‘– , šœ™1 š‘„š‘– , ā€¦ , šœ™š‘› š‘„

š‘–

Page 38: 03. linear regression

š‘¤š‘‡ = (š‘¤0, š‘¤1, ā€¦ , š‘¤š‘›)

šœ™ š‘„š‘–š‘‡= šœ™0 š‘„

š‘– , šœ™1 š‘„š‘– , ā€¦ , šœ™š‘› š‘„

š‘–

normal equation
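A sketch with Gaussian basis functions; the centers, width, and data below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=200)

# Gaussian basis functions phi_j(x) = exp(-(x - mu_j)^2 / (2 sigma^2)).
centers = np.linspace(0, 1, 9)
sigma = 0.15
Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * sigma**2))
Phi = np.column_stack([np.ones_like(x), Phi])  # phi_0(x) = 1 bias column

# Same normal equation as before, with Phi in place of X.
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w.shape)  # 10 weights: bias + 9 Gaussian bases
```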

Page 39: 03. linear regression

[ ģžė£Œģ˜ ė¶„ģ„ ]

ā‘  ėŖ©ģ  : ģ§‘ģ„ ķŒ”źø° ģ›ķ•Ø. ģ•Œė§žģ€ ź°€ź²©ģ„ ģ°¾źø° ģ›ķ•Ø.

ā‘” ź³ ė ¤ķ•  ė³€ģˆ˜(feature) : ģ§‘ģ˜ ķ¬źø°(in square feet), ģ¹Øģ‹¤ģ˜ ź°œģˆ˜, ģ§‘ ź°€ź²©

Page 40: 03. linear regression

(Source: http://aimotion.blogspot.kr/2011/10/machine-learning-with-python-linear.html)

ā‘¢ ģ£¼ģ˜ģ‚¬ķ•­ : ģ§‘ģ˜ ķ¬źø°ģ™€ ģ¹Øģ‹¤ģ˜ ź°œģˆ˜ģ˜ ģ°Øģ“ź°€ ķ¬ė‹¤. ģ˜ˆė„¼ ė“¤ģ–“, ģ§‘ģ˜ ķ¬źø°ź°€ 4000 square feetģøė°,

ģ¹Øģ‹¤ģ˜ ź°œģˆ˜ėŠ” 3ź°œģ“ė‹¤. ģ¦‰, ė°ģ“ķ„° ģƒ featureė“¤ ź°„ ź·œėŖØģ˜ ģ°Øģ“ź°€ ķ¬ė‹¤. ģ“ėŸ“ ź²½ģš°,

featureģ˜ ź°’ģ„ ģ •ź·œķ™”(normalizing)ė„¼ ķ•“ģ¤€ė‹¤. ź·øėž˜ģ•¼, Gradient Descentė„¼ ģˆ˜ķ–‰ķ•  ė•Œ,

ź²°ź³¼ź°’ģœ¼ė”œ ė¹ ė„“ź²Œ ģˆ˜ė “ķ•˜ė‹¤.

ā‘£ ģ •ź·œķ™”ģ˜ ė°©ė²•

- featureģ˜ mean(ķ‰ź· )ģ„ źµ¬ķ•œ ķ›„, featureė‚“ģ˜ ėŖØė“  dataģ˜ ź°’ģ—ģ„œ meanģ„ ė¹¼ģ¤€ė‹¤.

- dataģ—ģ„œ meanģ„ ė¹¼ ģ¤€ ź°’ģ„, ź·ø dataź°€ ģ†ķ•˜ėŠ” standard deviation(ķ‘œģ¤€ ķŽøģ°Ø)ė”œ ė‚˜ėˆ„ģ–“ ģ¤€ė‹¤. (scaling)

ģ“ķ•“ź°€ ģ•ˆ ė˜ė©“, ģš°ė¦¬ź°€ ź³ ė“±ķ•™źµ ė•Œ ė°°ģ› ė˜ ģ •ź·œė¶„ķ¬ė„¼ ķ‘œģ¤€ģ •ź·œė¶„ķ¬ė”œ ė°”ź¾øģ–“ģ£¼ėŠ” ź²ƒģ„ ė– ģ˜¬ė ¤ė³“ģž.

ķ‘œģ¤€ģ •ź·œė¶„ķ¬ė„¼ ģ‚¬ģš©ķ•˜ėŠ” ģ“ģœ  ģ¤‘ ķ•˜ė‚˜ėŠ”, ģ„œė”œ ė‹¤ė„ø ė‘ ė¶„ķ¬, ģ¦‰ ė¹„źµź°€ ė¶ˆź°€ėŠ„ķ•˜ź±°ė‚˜ ģ–“ė ¤ģš“ ė‘ ė¶„ķ¬ė„¼ ģ‰½ź²Œ

ė¹„źµķ•  ģˆ˜ ģžˆź²Œ ķ•“ģ£¼ėŠ” ź²ƒģ“ģ—ˆė‹¤.

š‘ = š‘‹ āˆ’ šœ‡

šœŽ If š‘‹~(šœ‡, šœŽ) then š‘~š‘(1,0)

Page 41: 03. linear regression

1. http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture5-LiR.pdf

2. http://www.cs.cmu.edu/~10701/lecture/RegNew.pdf

3. Regression Analysis, 3rd ed. (ė°•ģ„±ķ˜„)

4. Pattern Recognition (ģ˜¤ģ¼ģ„)

5. Mathematical Statistics, 3rd ed. (ģ „ėŖ…ģ‹)

Page 42: 03. linear regression

Laplacian Smoothing

A multinomial random variable z: z can take the values 1 through k.

We have m independent observations z^{(1)}, \dots, z^{(m)}.

From the observations, we want to estimate p(z = i) (for i = 1, \dots, k).

The estimate (MLE) is

\hat{p}(z = j) = \frac{\sum_{i=1}^{m} I\{ z^{(i)} = j \}}{m}

where I\{\cdot\} is the indicator function: we estimate using the frequencies within the observations.

One thing to note: what we are estimating is the population parameter p(z = i); we merely use the observed sample to estimate it.

For example, if z^{(i)} \neq 3 for all i = 1, \dots, m, then \hat{p}(z = 3) = 0. Statistically, this is a bad idea: setting the estimate of a population parameter to 0 merely because the value does not appear in the sample is statistically unsound. (A weakness of the MLE.)

Page 43: 03. linear regression

ģ“ź²ƒģ„ ź·¹ė³µķ•˜źø° ģœ„ķ•“ģ„œėŠ”,

ā‘  ė¶„ģžź°€ 0ģ“ ė˜ģ–“ģ„œėŠ” ģ•ˆ ėœė‹¤.

ā‘” ģ¶”ģ • ź°’ģ˜ ķ•©ģ“ 1ģ“ ė˜ģ–“ģ•¼ ķ•œė‹¤. š‘ š‘§ = š‘—š‘§ =1 (āˆµ ķ™•ė„ ģ˜ ķ•©ģ€ 1ģ“ ė˜ģ–“ģ•¼ ķ•Ø)

ė”°ė¼ģ„œ,

š’‘ š’› = š’‹ = š‘° š’› š’Š = š’‹ + šŸš’Žš’Š=šŸ

š’Ž+ š’Œ

ģ“ė¼ź³  ķ•˜ģž.

ā‘ ģ˜ ģ„±ė¦½ : test set ė‚“ģ— š‘—ģ˜ ź°’ģ“ ģ—†ģ–“ė„, ķ•“ė‹¹ ģ¶”ģ • ź°’ģ€ 0ģ“ ė˜ģ§€ ģ•ŠėŠ”ė‹¤.

ā‘”ģ˜ ģ„±ė¦½ : š‘§(š‘–) = š‘—ģø dataģ˜ ģˆ˜ė„¼ š‘›š‘—ė¼ź³  ķ•˜ģž. š‘ š‘§ = 1 = š‘›1+1

š‘š+š‘˜, ā€¦ , š‘ š‘§ = š‘˜ =

š‘›š‘˜+1

š‘š+š‘˜

ģ“ė‹¤. ź° ģ¶”ģ • ź°’ģ„ ė‹¤ ė”ķ•˜ź²Œ ė˜ė©“ 1ģ“ ė‚˜ģ˜Øė‹¤.

ģ“ź²ƒģ“ ė°”ė”œ Laplacian smoothingģ“ė‹¤.

š‘§ź°€ ė  ģˆ˜ ģžˆėŠ” ź°’ģ“ 1ė¶€ķ„° š‘˜ź¹Œģ§€ ź· ė“±ķ•˜ź²Œ ė‚˜ģ˜¬ ģˆ˜ ģžˆė‹¤ėŠ” ź°€ģ •ģ“ ģ¶”ź°€ė˜ģ—ˆė‹¤ź³ 

ģ§ź“€ģ ģœ¼ė”œ ģ•Œ ģˆ˜ ģžˆė‹¤. 1