CPSC 340: Machine Learning and Data Mining (fwood/CS340/lectures/L5.pdf)
TRANSCRIPT
CPSC 340: Machine Learning and Data Mining
Probabilistic Classification, Fall 2020
Admin
• Waiting list people: everyone should be in!
• Course webpage: https://www.cs.ubc.ca/~fwood/CS340/
• Homework 1 due tonight.
Last Time: Training, Testing, and Validation
• Training step:
• Prediction step:
• What we are interested in is the test error:
– Error made by the prediction step on new data.
Last Time: Fundamental Trade-Off
• We decomposed test error to get a fundamental trade-off:
E_test = E_approx + E_train, where E_approx = (E_test - E_train).
• E_train goes down as the model gets complicated:
– Training error goes down as a decision tree gets deeper.
• But E_approx goes up as the model gets complicated:
– Training error becomes a worse approximation of test error.
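A small sketch (not from the slides) of this trade-off, using polynomial degree as a stand-in for tree depth: the training error can only shrink as the degree grows, while E_approx tends to grow. The data and degrees are made up for illustration.

```python
# Illustrative sketch: polynomial degree plays the role of model complexity.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 30)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, size=30)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, size=200)

def train_test_error(degree):
    """Fit a polynomial of the given degree; return (E_train, E_test) as MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    e_train = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    e_test = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return e_train, e_test

for degree in [1, 3, 5, 9]:
    e_train, e_test = train_test_error(degree)
    print(f"degree {degree}: E_train={e_train:.3f}  E_approx={e_test - e_train:.3f}")
```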
Last Time: Validation Error
• Golden rule: we can't look at test data during training.
• But we can approximate E_test with a validation error:
– Error on a set of training examples we "hid" during training.
– Find the decision tree based on the "train" rows.
– Validation error is the error of the decision tree on the "validation" rows.
• We typically choose "hyper-parameters" like depth to minimize the validation error.
Overfitting to the Validation Set?
• Validation error usually has lower optimization bias than training error.
– Might optimize over 20 values of "depth", instead of millions+ of possible trees.
• But we can still overfit to the validation error (common in practice):
– Validation error is only an unbiased approximation if you use it once.
– Once you start optimizing it, you start to overfit to the validation set.
• This is most important when the validation set is "small":
– The optimization bias decreases as the number of validation examples increases.
• Remember, our goal is still to do well on the test set (new data), not the validation set (where we already know the labels).
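A hedged sketch of this optimization bias (not from the slides): evaluate many *random* classifiers on one small validation set. The best validation accuracy looks impressive purely by chance, even though every model is guessing.

```python
# Optimization bias demo: coin-flip "models" scored on one validation set.
import random

random.seed(0)
n_val = 100
y_val = [random.randint(0, 1) for _ in range(n_val)]  # made-up labels

def best_validation_accuracy(n_models):
    """Max validation accuracy over n_models random-guessing classifiers."""
    best = 0.0
    for _ in range(n_models):
        preds = [random.randint(0, 1) for _ in range(n_val)]
        acc = sum(p == t for p, t in zip(preds, y_val)) / n_val
        best = max(best, acc)
    return best

print(best_validation_accuracy(10))      # max over 10 tries: near chance level
print(best_validation_accuracy(100000))  # max over 100000 tries: well above chance, by luck
```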
Should you trust them?
• Scenario 1:
– "I built a model based on the data you gave me."
– "It classified your data with 98% accuracy."
– "It should get 98% accuracy on the rest of your data."
• Probably not:
– They are reporting training error.
– This might have nothing to do with test error.
– E.g., they could have fit a very deep decision tree.
• Why 'probably'?
– If they only tried a few very simple models, the 98% might be reliable.
– E.g., they only considered decision stumps with simple 1-variable rules.
Should you trust them?
• Scenario 2:
– "I built a model based on half of the data you gave me."
– "It classified the other half of the data with 98% accuracy."
– "It should get 98% accuracy on the rest of your data."
• Probably:
– They computed the validation error once.
– This is an unbiased approximation of the test error.
– Trust them if you believe they didn't violate the golden rule.
Should you trust them?
• Scenario 3:
– "I built 10 models based on half of the data you gave me."
– "One of them classified the other half of the data with 98% accuracy."
– "It should get 98% accuracy on the rest of your data."
• Probably:
– They computed the validation error a small number of times.
– Maximizing over these errors is a biased approximation of test error.
– But they only maximized it over 10 models, so the bias is probably small.
– They probably know about the golden rule.
Should you trust them?
• Scenario 4:
– "I built 1 billion models based on half of the data you gave me."
– "One of them classified the other half of the data with 98% accuracy."
– "It should get 98% accuracy on the rest of your data."
• Probably not:
– They computed the validation error a huge number of times.
– They tried so many models, one of them is likely to work by chance.
• Why 'probably'?
– If the 1 billion models were all extremely simple, 98% might be reliable.
Should you trust them?
• Scenario 5:
– "I built 1 billion models based on the first third of the data you gave me."
– "One of them classified the second third of the data with 98% accuracy."
– "It also classified the last third of the data with 98% accuracy."
– "It should get 98% accuracy on the rest of your data."
• Probably:
– They computed the first validation error a huge number of times.
– But they had a second validation set that they only looked at once.
– The second validation set gives an unbiased test error approximation.
– This is ideal, as long as they didn't violate the golden rule on the last third.
– And assuming the data is IID in the first place.
Validation Error and Optimization Bias
• Optimization bias is small if you only compare a few models:
– Best decision tree on the training set among depths 1, 2, 3, …, 10.
– Risk of overfitting to the validation set is low if we try 10 things.
• Optimization bias is large if you compare a lot of models:
– All possible decision trees of depth 10 or less.
– Here we're using the validation set to pick between a billion+ models:
• Risk of overfitting to the validation set is high: could have low validation error by chance.
– If you did this, you might want a second validation set to detect overfitting.
• And optimization bias shrinks as you grow the size of the validation set.
Aside: Optimization Bias Leads to Publication Bias
• Suppose that 20 researchers perform the exact same experiment:
• They each test whether their effect is "significant" (p < 0.05).
– 19/20 find that it is not significant.
– But the 1 group finding it's significant publishes a paper about the effect.
• This is again optimization bias, contributing to publication bias.
– A contributing factor to many reported effects being wrong.
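The arithmetic behind this aside can be made explicit: with a 5% false-positive rate per test and 20 independent groups studying a null effect, the chance that at least one group "finds" significance is substantial.

```python
# Probability that at least one of 20 null-effect experiments is "significant".
alpha = 0.05      # per-test false-positive rate
n_groups = 20
p_at_least_one = 1 - (1 - alpha) ** n_groups
print(f"P(at least one 'significant' finding) = {p_at_least_one:.3f}")  # ~0.64
```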
Cross-Validation (CV)
• Isn't it wasteful to only use part of your data?
• 5-fold cross-validation:
– Train on 80% of the data, validate on the other 20%.
– Repeat this 4 more times with different splits (5 rounds in total), and average the score.
Cross-Validation (CV)
Each row is one round of 5-fold CV; the validation fold rotates through the 5 blocks of data:

Round 1: TRAIN | TRAIN | TRAIN | TRAIN | VALIDATION
Round 2: TRAIN | TRAIN | TRAIN | VALIDATION | TRAIN
Round 3: TRAIN | TRAIN | VALIDATION | TRAIN | TRAIN
Round 4: TRAIN | VALIDATION | TRAIN | TRAIN | TRAIN
Round 5: VALIDATION | TRAIN | TRAIN | TRAIN | TRAIN
Cross-ValidationPseudo-Code
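The pseudo-code on this slide was an image in the source; a sketch of what k-fold CV looks like, where `fit(X, y, depth)` and `predict(model, X)` are hypothetical stand-ins for whatever learner is being tuned (e.g., a decision tree of a given depth):

```python
# Sketch of k-fold cross-validation for scoring one hyper-parameter value.
import numpy as np

def cross_validation_error(X, y, depth, fit, predict, k=5, seed=0):
    n = len(y)
    # Randomize first, then split into fixed folds (as the slides advise).
    perm = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(perm, k)
    fold_errors = []
    for i in range(k):
        val_idx = folds[i]                    # validate on fold i
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx], depth)
        y_hat = predict(model, X[val_idx])
        fold_errors.append(np.mean(y_hat != y[val_idx]))
    return float(np.mean(fold_errors))        # average over the k folds
```

To choose a depth, you would compute this score for each candidate depth and keep the minimizer.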
Cross-Validation (CV)
• You can take this idea further ("k-fold cross-validation"):
– 10-fold cross-validation: train on 90% of data and validate on 10%.
• Repeat 10 times and average (test on fold 1, then fold 2, …, then fold 10).
– Leave-one-out cross-validation: train on all but one training example.
• Repeat n times and average.
• Gets more accurate but more expensive with more folds.
– To choose depth we compute the cross-validation score for each depth.
• As before, if data is ordered then folds should be random splits.
– Randomize first, then split into fixed folds.
(pause)
The "Best" Machine Learning Model
• Decision trees are not always most accurate on test error.
• What is the "best" machine learning model?
• An alternative measure of performance is the generalization error:
– Average error over all xi vectors that are not seen in the training set.
– "How well we expect to do for a completely unseen feature vector."
• No free lunch theorem (proof in bonus slides):
– There is no "best" model achieving the best generalization error for every problem.
– If model A generalizes better to new data than model B on one dataset, there is another dataset where model B works better.
• This question is like asking which is "best" among "rock", "paper", and "scissors".
The "Best" Machine Learning Model
• Implications of the lack of a "best" model:
– We need to learn about and try out multiple models.
• So which ones to study in CPSC 340?
– We'll usually motivate each method by a specific application.
– But we're focusing on models that have been effective in many applications.
• Caveat of the no free lunch (NFL) theorem:
– The world is very structured.
– Some datasets are more likely than others.
– Model A really could be better than model B on every real dataset in practice.
• Machine learning research:
– Large focus on models that are useful across many applications.
Application: E-mail Spam Filtering
• We want to build a system that detects spam e-mails.
– Context: spam used to be a big problem.
• Can we formulate this as supervised learning?
Spam Filtering as Supervised Learning
• Collect a large number of e-mails, get users to label them.
• We can use (yi = 1) if e-mail 'i' is spam, (yi = 0) if e-mail is not spam.
• Extract features of each e-mail (like bag of words):
– (xij = 1) if word/phrase 'j' is in e-mail 'i', (xij = 0) if it is not.
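The bag-of-words extraction described above can be sketched in a few lines; the vocabulary and e-mails here are made up for illustration.

```python
# x_ij = 1 if word 'j' of the vocabulary appears in e-mail 'i', else 0.
def bag_of_words(emails, vocabulary):
    rows = []
    for email in emails:
        words = set(email.lower().split())
        rows.append([1 if word in words else 0 for word in vocabulary])
    return rows

vocab = ["$", "hi", "cpsc", "340", "vicodin", "offer"]
emails = ["Hi friend special Vicodin offer $",
          "CPSC 340 homework 1 is due tonight"]
print(bag_of_words(emails, vocab))  # [[1, 1, 0, 0, 1, 1], [0, 0, 1, 1, 0, 0]]
```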
| $ | Hi | CPSC | 340 | Vicodin | Offer | … | Spam? |
|---|----|------|-----|---------|-------|---|-------|
| 1 | 1 | 0 | 0 | 1 | 0 | … | 1 |
| 0 | 0 | 0 | 0 | 1 | 1 | … | 1 |
| 0 | 1 | 1 | 1 | 0 | 0 | … | 0 |
| … | … | … | … | … | … | … | … |
Feature Representation for Spam
• Are there better features than bag of words?
– We add bigrams (sets of two words):
• "CPSC 340", "wait list", "special deal".
– Or trigrams (sets of three words):
• "Limited time offer", "course registration deadline", "you're a winner".
– We might include the sender domain:
• <sender domain == "mail.com">.
– We might include regular expressions:
• <your first and last name>.
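Bigram and trigram extraction (adjacent word pairs and triples) can be sketched as follows; the example message is made up.

```python
# Extract the set of n-grams (adjacent n-word sequences) from a message.
def ngrams(text, n):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

message = "limited time offer for CPSC 340"
print(sorted(ngrams(message, 2)))  # bigrams, e.g. 'limited time', 'cpsc 340'
print(sorted(ngrams(message, 3)))  # trigrams, e.g. 'limited time offer'
```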
Review of Supervised Learning Notation
• We have been using the notation 'X' and 'y' for supervised learning:
• X is the matrix of all features, y is the vector of all labels.
– We use yi for the label of example 'i' (element 'i' of 'y').
– We use xij for feature 'j' of example 'i'.
– We use xi as the list of features of example 'i' (row 'i' of 'X').
• So in the table below, x3 = [0 1 1 1 0 0 …].
• In practice, we only store the list of non-zero features for each xi (small memory requirement).
| $ | Hi | CPSC | 340 | Vicodin | Offer | … | Spam? |
|---|----|------|-----|---------|-------|---|-------|
| 1 | 1 | 0 | 0 | 1 | 0 | … | 1 |
| 0 | 0 | 0 | 0 | 1 | 1 | … | 1 |
| 0 | 1 | 1 | 1 | 0 | 0 | … | 0 |
| … | … | … | … | … | … | … | … |
Probabilistic Classifiers
• For years, the best spam filtering methods used naïve Bayes.
– A probabilistic classifier based on Bayes rule.
– It tends to work well with bag of words.
– Recently shown to improve on the state of the art for CRISPR "gene editing" (link).
• Probabilistic classifiers model the conditional probability, p(yi | xi).
– "If a message has words xi, what is the probability that the message is spam?"
• Classify it as spam if the probability of spam is higher than not spam:
– If p(yi = "spam" | xi) > p(yi = "not spam" | xi):
• return "spam".
– Else:
• return "not spam".
Spam Filtering with Bayes Rule
• To model the conditional probability, naïve Bayes uses Bayes rule:
p(yi | xi) = p(xi | yi) p(yi) / p(xi).
• So we need to figure out three types of terms:
– Marginal probability p(yi) that an e-mail is spam.
– Marginal probability p(xi) that an e-mail has the set of words xi.
– Conditional probability p(xi | yi) that a spam e-mail has the words xi.
• And the same for non-spam e-mails.
Spam Filtering with Bayes Rule
• What do these terms mean?
[Diagram: the space of ALL E-MAILS (including duplicates).]
Spam Filtering with Bayes Rule
• p(yi = "spam") is the probability that a random e-mail is spam.
– This is easy to approximate from data: use the proportion of spam in your data,
p(yi = "spam") ≈ (number of spam e-mails) / (total number of e-mails).
– This is an "estimate" of the true probability. In particular, this formula is a "maximum likelihood estimate" (MLE). We will cover likelihoods and MLEs later in the course.
[Diagram: ALL E-MAILS (including duplicates), divided into SPAM and NOT SPAM.]
Spam Filtering with Bayes Rule
• p(xi) is the probability that a random e-mail has features xi:
– Hard to approximate: with 'd' words we need to collect 2^d "coupons", and that's just to see each word combination once.
Spam Filtering with Bayes Rule
• p(xi) is the probability that a random e-mail has features xi:
– Hard to approximate: with 'd' words we need to collect 2^d "coupons", but it turns out we can ignore it:
– p(xi) is the same denominator in both p(yi = "spam" | xi) and p(yi = "not spam" | xi), so we can just compare the numerators p(xi | yi) p(yi).
Spam Filtering with Bayes Rule
• p(xi | yi = "spam") is the probability that a spam e-mail has features xi.
• Also hard to approximate.
• And we need it.
[Diagram: ALL E-MAILS (including duplicates), divided into NOT SPAM and SPAM.]
Naïve Bayes
• Naïve Bayes makes a big assumption to make things easier:
• We assume all features xij are conditionally independent given the label yi:
p(xi | yi) = p(xi1 | yi) × p(xi2 | yi) × … × p(xid | yi).
– Once you know it's spam, the probability of "vicodin" doesn't depend on "340".
– Definitely not true, but sometimes a good approximation.
• And now we only need easy quantities like p("vicodin" = 0 | yi = "spam").
Naïve Bayes
• p("vicodin" = 1 | "spam" = 1) is the probability of seeing "vicodin" in a spam e-mail.
• Easy to estimate by counting:
p("vicodin" = 1 | "spam" = 1) ≈ (number of spam e-mails containing "vicodin") / (number of spam e-mails).
– Again, this is a "maximum likelihood estimate" (MLE). We will cover how to derive this later.
[Diagram: ALL POSSIBLE E-MAILS (including duplicates), divided into SPAM and NOT SPAM, with the "Vicodin" region overlapping both.]
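Putting the pieces together, a minimal naïve Bayes can be built purely from these counting estimates. This is a hedged sketch (no Laplace smoothing, which is commonly added in practice); the toy data mirrors the ($, Hi, CPSC, 340, Vicodin, Offer) table from earlier slides.

```python
# Naive Bayes from MLE counts: p(spam) and p(word=1 | class) are proportions.
import numpy as np

def fit_naive_bayes(X, y):
    X, y = np.asarray(X), np.asarray(y)
    p_spam = y.mean()                      # p(y = spam): proportion of spam
    p_w_spam = X[y == 1].mean(axis=0)      # p(x_j = 1 | spam): counting
    p_w_ham = X[y == 0].mean(axis=0)       # p(x_j = 1 | not spam)
    return p_spam, p_w_spam, p_w_ham

def predict_naive_bayes(model, x):
    p_spam, p_w_spam, p_w_ham = model
    x = np.asarray(x)
    # Conditional independence: p(x | y) is a product over features.
    # p(x) cancels, so we only compare the two Bayes-rule numerators.
    spam_score = p_spam * np.prod(np.where(x == 1, p_w_spam, 1 - p_w_spam))
    ham_score = (1 - p_spam) * np.prod(np.where(x == 1, p_w_ham, 1 - p_w_ham))
    return 1 if spam_score > ham_score else 0

X = [[1, 1, 0, 0, 1, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 0, 0]]
y = [1, 1, 0]
model = fit_naive_bayes(X, y)
print(predict_naive_bayes(model, [1, 0, 0, 0, 1, 0]))  # "$ ... Vicodin" -> 1
```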
Summary
• Optimization bias: using a validation set too much overfits.
• Cross-validation: allows better use of data to estimate test error.
• No free lunch theorem: there is no "best" ML model.
• Probabilistic classifiers: try to estimate p(yi | xi).
• Naïve Bayes: simple probabilistic classifier based on counting.
– Uses conditional independence assumptions to make training practical.
• Next time:
– A "best" machine learning model as 'n' goes to ∞.
Back to Decision Trees
• Instead of a validation set, you can use CV to select tree depth.
• But you can also use these to decide whether to split:
– Don't split if validation/CV error doesn't improve.
– Different parts of the tree will have different depths.
• Or fit a deep decision tree and use [cross-]validation to prune:
– Remove leaf nodes that don't improve CV error.
• Popular implementations have these tricks and others.
Random Subsamples
• Instead of splitting into k folds, consider the "random subsample" method:
– At each "round", choose a random set of size 'm'.
• Train on all examples except these 'm' examples.
• Compute the validation error on these 'm' examples.
• Advantages:
– Still an unbiased estimator of error.
– Number of "rounds" does not need to be related to 'n'.
• Disadvantage:
– Examples that are sampled more often get more "weight".
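The random-subsample method described above can be sketched as follows; `fit(X, y)` and `predict(model, X)` are hypothetical stand-ins for the learner, as before.

```python
# Random-subsample validation: each round holds out a random set of m examples.
import numpy as np

def random_subsample_error(X, y, m, rounds, fit, predict, seed=0):
    n = len(y)
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(rounds):
        val_idx = rng.choice(n, size=m, replace=False)  # random hold-out set
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False                     # train on the rest
        model = fit(X[train_mask], y[train_mask])
        errors.append(np.mean(predict(model, X[val_idx]) != y[val_idx]))
    return float(np.mean(errors))   # `rounds` need not be tied to n
```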
Cross-Validation Theory
• Does CV give an unbiased estimate of test error?
– Yes!
• Since each data point is only used once in validation, the expected validation error on each data point is the test error.
– But again, if you use CV to select among models then it is no longer unbiased.
• What about the variance of CV?
– Hard to characterize.
– CV variance on 'n' data points is worse than with a validation set of size 'n'.
• But we believe it is close.
• Does cross-validation remove optimization bias?
– No, but the bias might be smaller since you have more "test" points.
Handling Data Sparsity
• Do we need to store the full bag-of-words 0/1 variables?
– No: only need the list of non-zero features for each e-mail.
– Math/model doesn't change, but more efficient storage.

| $ | Hi | CPSC | 340 | Vicodin | Offer | … | Non-Zeroes |
|---|----|------|-----|---------|-------|---|------------|
| 1 | 1 | 0 | 0 | 1 | 0 | … | {1, 2, 5, …} |
| 0 | 0 | 0 | 0 | 1 | 1 | … | {5, 6, …} |
| 0 | 1 | 1 | 1 | 0 | 0 | … | {2, 3, 4, …} |
| 1 | 1 | 0 | 0 | 0 | 1 | … | {1, 2, 6, …} |
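The sparse representation in the table can be sketched directly (1-indexed to match the table):

```python
# Keep only the indices of non-zero features for each row.
def non_zero_set(row):
    return {j + 1 for j, value in enumerate(row) if value != 0}

rows = [[1, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 1, 1, 1, 0, 0],
        [1, 1, 0, 0, 0, 1]]
print([non_zero_set(r) for r in rows])
# [{1, 2, 5}, {5, 6}, {2, 3, 4}, {1, 2, 6}]
```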
Proof of No Free Lunch Theorem
• Let's show the "no free lunch" theorem in a simple setting:
– The xi and yi are binary, and yi is a deterministic function of xi.
• With 'd' features, each "learning problem" is a map from each of the 2^d feature combinations to 0 or 1: {0,1}^d -> {0,1}.
• Let's pick one of these maps ("learning problems") and:
– Generate a training set of 'n' IID samples.
– Fit model A (convolutional neural network) and model B (naïve Bayes).

| Feature 1 | Feature 2 | Feature 3 | Map 1 | Map 2 | Map 3 | … |
|-----------|-----------|-----------|-------|-------|-------|---|
| 0 | 0 | 0 | 0 | 1 | 0 | … |
| 0 | 0 | 1 | 0 | 0 | 1 | … |
| 0 | 1 | 0 | 0 | 0 | 0 | … |
| … | … | … | … | … | … | … |
Proof of No Free Lunch Theorem
• Define the "unseen" examples as the (2^d - n) not seen in training.
– Assuming no repetitions of xi values, and n < 2^d.
– Generalization error is the average error on these "unseen" examples.
• Suppose that model A got 1% error and model B got 60% error.
– We want to show model B beats model A on another "learning problem".
• Among our set of "learning problems", find the one where:
– The labels yi agree on all training examples.
– The labels yi disagree on all "unseen" examples.
• On this other "learning problem":
– Model A gets 99% error and model B gets 40% error.
Proof of No Free Lunch Theorem
• Further, across all "learning problems" with these 'n' examples:
– The average generalization error of every model is 50% on unseen examples.
• It's right on each unseen example in exactly half the learning problems.
– With 'k' classes, the average error is (k-1)/k (random guessing).
• This is kind of depressing:
– For general problems, no "machine learning" is better than "predict 0".
• But the proof also reveals the problem with the NFL theorem:
– It assumes every "learning problem" is equally likely.
– The world encourages patterns like "similar features implies similar labels".