

Machine Learning, Fall 2017

Computational Learning Theory: Occam's Razor

Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others

This lecture: Computational Learning Theory

• The theory of generalization
• Probably Approximately Correct (PAC) learning
• Positive and negative learnability results
• Agnostic learning
• Shattering and the VC dimension

Where are we?

• The theory of generalization
  – When can we trust the learning algorithm?
  – What functions can be learned?
  – Batch learning
• Probably Approximately Correct (PAC) learning
• Positive and negative learnability results
• Agnostic learning
• Shattering and the VC dimension

This section

1. Analyze a simple algorithm for learning conjunctions
2. Define the PAC model of learning
3. Make formal connections to the principle of Occam's razor

This section

✓ Analyze a simple algorithm for learning conjunctions
✓ Define the PAC model of learning
3. Make formal connections to the principle of Occam's razor

Occam's Razor

Named after William of Occam (1300s AD).

Prefer simpler explanations over more complex ones.

"Numquam ponenda est pluralitas sine necessitate"
(Never posit plurality without necessity.)

Historically, a widely prevalent idea across different schools of philosophy.

Towards formalizing Occam's Razor

Claim: The probability that there is a hypothesis h ∈ H that:
1. is consistent with m examples, and
2. has err_D(h) > ε (that is, is consistent yet bad)
is less than |H|(1 - ε)^m. (Assuming consistency.)

Proof: Let h be such a bad hypothesis, with error greater than ε.
The probability that h is consistent with one example is Pr[f(x) = h(x)] < 1 - ε.
The training set consists of m examples drawn independently, so the probability that h is consistent with all m examples is less than (1 - ε)^m.
By the union bound, the probability that some bad hypothesis in H is consistent with m examples is less than |H|(1 - ε)^m.
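The union bound above can be sanity-checked numerically. Below is a small simulation on a hypothetical toy setup (my choice, not the lecture's): a 10-point domain, H = all Boolean functions on it, a uniform distribution, and an all-zeros target. We estimate the probability that some ε-bad hypothesis survives m random examples and compare it to |H|(1 - ε)^m.

```python
import itertools
import random

random.seed(0)
DOMAIN = range(10)
H = list(itertools.product([0, 1], repeat=10))   # h[x] = label of point x; |H| = 2^10
f = (0,) * 10                                    # target function (all zeros)

def err(h):
    # true error of h under the uniform distribution on DOMAIN
    return sum(h[x] != f[x] for x in DOMAIN) / 10

eps, m, trials = 0.3, 20, 2000
bad = [h for h in H if err(h) > eps]             # the epsilon-bad hypotheses

hits = 0
for _ in range(trials):
    sample = [random.choice(DOMAIN) for _ in range(m)]   # m i.i.d. draws
    # does some bad hypothesis agree with f on every sampled point?
    if any(all(h[x] == f[x] for x in sample) for h in bad):
        hits += 1

empirical = hits / trials
bound = len(H) * (1 - eps) ** m
print(empirical, bound)   # the empirical probability sits well below the bound
```

Since the union bound ignores the overlap between the bad hypotheses' survival events, the empirical estimate is typically far smaller than |H|(1 - ε)^m.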


Occam's Razor

The probability that there is a hypothesis h ∈ H that is
1. consistent with m examples, and
2. has err_D(h) > ε
is less than |H|(1 - ε)^m.

Just like before, we want to make this probability small, say smaller than δ:

|H|(1 - ε)^m < δ
ln(|H|) + m ln(1 - ε) < ln δ

We know that ln(1 - ε) ≤ -ε. Let's use it to get a safer δ.

That is, if

m > (1/ε)(ln |H| + ln(1/δ))

then the probability of getting a bad hypothesis is small.
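Written out step by step, the derivation above (using ln(1 - ε) ≤ -ε, valid for 0 < ε < 1) is:

```latex
% From |H|(1-\epsilon)^m < \delta to the sample complexity bound:
|H|(1-\epsilon)^m < \delta
  \iff \ln|H| + m\ln(1-\epsilon) < \ln\delta .
% Since \ln(1-\epsilon) \le -\epsilon, it suffices that
\ln|H| - m\epsilon < \ln\delta
  \iff m\epsilon > \ln|H| + \ln\tfrac{1}{\delta}
  \iff m > \tfrac{1}{\epsilon}\left(\ln|H| + \ln\tfrac{1}{\delta}\right).
```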


Occam's Razor

Let H be any hypothesis space. With probability 1 - δ, a hypothesis h ∈ H that is consistent with a training set of size m will have error < ε on future examples if

m > (1/ε)(ln |H| + ln(1/δ))

This is called Occam's Razor because it expresses a preference towards smaller hypothesis spaces.

It shows when an m-consistent hypothesis generalizes well (i.e., error < ε).

Complicated/larger hypothesis spaces are not necessarily bad. But simpler ones are unlikely to fool us by being consistent with many examples!


Observations:
1. Expecting lower error increases sample complexity (i.e., more examples are needed for the guarantee).


2. If we have a larger hypothesis space, then learning is harder (i.e., higher sample complexity).


3. If we want higher confidence in the classifier we produce, sample complexity will be higher.
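All three observations can be checked numerically against the bound m > (1/ε)(ln|H| + ln(1/δ)). A minimal sketch (`sample_complexity` is a hypothetical helper name, not from the lecture):

```python
import math

def sample_complexity(h_size, eps, delta):
    # Occam bound m > (1/eps) * (ln|H| + ln(1/delta)), rounded up to whole examples
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

base = sample_complexity(2 ** 20, 0.1, 0.05)
# 1. Lower target error -> more examples needed
assert sample_complexity(2 ** 20, 0.01, 0.05) > base
# 2. Larger hypothesis space -> more examples needed
assert sample_complexity(2 ** 40, 0.1, 0.05) > base
# 3. Higher confidence (smaller delta) -> more examples needed
assert sample_complexity(2 ** 20, 0.1, 0.001) > base
```

Note that |H| and 1/δ enter only logarithmically, while 1/ε enters linearly, so the error target dominates the sample cost.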


Consistent Learners and Occam's Razor

From the definition, we get the following general scheme for PAC learning.

Given a sample D of m examples:
• Find some h ∈ H that is consistent with all m examples.
• If m is large enough, a consistent hypothesis must be close enough to f.
• Check that m does not have to be too large (i.e., polynomial in the relevant parameters): we showed that the "closeness" guarantee requires

m > (1/ε)(ln |H| + ln(1/δ))

• Show that the consistent hypothesis h ∈ H can be computed efficiently.

We worked out the details for conjunctions:
• The Elimination algorithm finds a hypothesis h that is consistent with the training set (easy to compute).
• We showed directly that if we have sufficiently many examples (polynomial in the relevant parameters), then h is close to the target function.
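The Elimination algorithm for conjunctions can be sketched as follows. This is a minimal version under my own representation choices (a literal is a pair `(i, v)` meaning x_i = v); start from the conjunction of all 2n literals and drop every literal violated by a positive example:

```python
def eliminate(examples):
    """examples: list of (x, y) with x a tuple of 0/1 features, y in {0, 1}.
    Returns the surviving literals as (index, value) pairs:
    (i, 1) means x_i, (i, 0) means not-x_i."""
    n = len(examples[0][0])
    literals = {(i, v) for i in range(n) for v in (0, 1)}  # most specific conjunction
    for x, y in examples:
        if y == 1:
            # a positive example falsifies every literal it disagrees with
            literals -= {(i, 1 - x[i]) for i in range(n)}
    return literals

def predict(literals, x):
    # the conjunction fires only if every surviving literal is satisfied
    return int(all(x[i] == v for i, v in literals))

# Toy run: target is (x0 AND not-x2) over 3 features
examples = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 0, 0), 0), ((1, 0, 1), 0)]
h = eliminate(examples)
print(sorted(h))   # the surviving literals: x0 and not-x2
```

Only positive examples shrink the hypothesis; the lecture's argument shows the negative examples are then handled with high probability once m is large enough.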

Exercises

We have seen the decision tree learning algorithm. Suppose our problem has n binary features. What is the size of the hypothesis space?

Are decision trees efficiently PAC learnable?
