practical advice for building machine learning applications · 2019-12-05 · machine learning...
TRANSCRIPT
MachineLearning
PracticalAdvicefor
BuildingMachineLearningApplications
1BasedonlecturesandpapersbyAndrewNg,PedroDomingos,TomMitchellandothers
MLandtheworld
• Diagnosticsofyourlearningalgorithm
• Erroranalysis
• InjectingmachinelearningintoYourFavoriteTask
• Makingmachinelearningmatter
2
MakingMLworkintheworld
MostlyexperientialadviceAlsobasedonwhatotherpeoplehavesaidSeereadingsonclasswebsite
MLandtheworld
• Diagnosticsofyourlearningalgorithm
• Erroranalysis
• InjectingmachinelearningintoYourFavoriteTask
• Makingmachinelearningmatter
3
Debuggingmachinelearning
SupposeyoutrainanSVMoralogisticregressionclassifierforspamdetection
Youobviously followbestpracticesforfindinghyper-parameters(suchascross-validation)
Yourclassifierisonly75%accurate
Whatcanyoudotoimproveit?
4
(assumingthattherearenobugsinthecode)
Differentwaystoimproveyourmodel
Moretrainingdata
Features1. Usemorefeatures2. Usefewerfeatures3. Useotherfeatures
Bettertraining1. Runformoreiterations2. Useadifferentalgorithm3. Useadifferentclassifier4. Playwithregularization
5
Differentwaystoimproveyourmodel
Moretrainingdata
Features1. Usemorefeatures2. Usefewerfeatures3. Useotherfeatures
Bettertraining1. Runformoreiterations2. Useadifferentalgorithm3. Useadifferentclassifier4. Playwithregularization
6
Tedious!Andpronetoerrors,dependenceonluck
Letustrytomakethisprocessmoremethodical
First,diagnostics
Somepossibleproblems:
1. Over-fitting(highvariance)
2. Under-fitting(highbias)
3. Yourlearningdoesnotconverge
4. Areyoumeasuringtherightthing?
7
Easiertofixaproblemifyouknowwhereitis
Detectingoverorunderfitting
Over-fitting:Thetrainingaccuracyismuchhigherthanthetestaccuracy– Themodelexplainsthetrainingsetverywell,butpoorgeneralization
Under-fitting:Bothaccuraciesareunacceptablylow– Themodelcannotrepresenttheconceptwellenough
8
Detectinghighvariance usinglearningcurves
9
Sizeoftrainingdata
Error
Trainingerror
Detectinghighvariance usinglearningcurves
10
Sizeoftrainingdata
Error
Trainingerror
Generalizationerror/testerror
Detectinghighvariance usinglearningcurves
11
Testerrorkeepsdecreasingastrainingsetincreases)moredatawillhelp
Sizeoftrainingdata
Error
Largegapbetweentrainandtesterror
Trainingerror
Generalizationerror/testerror
Typicallyseenformorecomplexmodels
Detectinghighbiasusinglearningcurves
12
Sizeoftrainingset
Error Trainingerror
Generalizationerror/testerror
Bothtrainandtesterrorareunacceptable(Butthemodelseemstoconverge)
Typicallyseenformoresimplemodels
Differentwaystoimproveyourmodel
Moretrainingdata
Features1. Usemorefeatures2. Usefewerfeatures3. Useotherfeatures
Bettertraining1. Runformoreiterations2. Useadifferentalgorithm3. Useadifferentclassifier4. Playwithregularization
13
Differentwaystoimproveyourmodel
Moretrainingdata
Features1. Usemorefeatures2. Usefewerfeatures3. Useotherfeatures
Bettertraining1. Runformoreiterations2. Useadifferentalgorithm3. Useadifferentclassifier4. Playwithregularization
14
Helpswithover-fitting
Helpswithunder-fittingHelpswithover-fittingCouldhelpwithover-fittingandunder-fitting
Couldhelpwithover-fittingandunder-fitting
Diagnostics
Somepossibleproblems:
ü Over-fitting(highvariance)
ü Under-fitting(highbias)
3. Yourlearningdoesnotconverge
4. Areyoumeasuringtherightthing?
15
Easiertofixaproblemifyouknowwhereitis
Doesyourlearningalgorithmconverge?
Iflearningisframedasanoptimizationproblem,tracktheobjective
16
Iterations
ObjectiveNotyetconvergedhere Convergedhere
Doesyourlearningalgorithmconverge?
Iflearningisframedasanoptimizationproblem,tracktheobjective
17
Iterations
ObjectiveNotyetconvergedhere Howabouthere?
Notalwayseasytodecide
Doesyourlearningalgorithmconverge?
Iflearningisframedasanoptimizationproblem,tracktheobjective
18
Iterations
Objective
Somethingiswrong
Doesyourlearningalgorithmconverge?
Iflearningisframedasanoptimizationproblem,tracktheobjective
19
Iterations
Objective
Helpstodebug
Somethingiswrong
Ifwearedoinggradientdescentonaconvexfunctiontheobjectivecan’tincrease(Caveat:ForSGD,theobjectivewillslightlyincreaseoccasionally,butnotbymuch)
Differentwaystoimproveyourmodel
Moretrainingdata
Features1. Usemorefeatures2. Usefewerfeatures3. Useotherfeatures
Bettertraining1. Runformoreiterations2. Useadifferentalgorithm3. Useadifferentclassifier4. Playwithregularization
20
Helpswithoverfitting
Helpswithunder-fittingHelpswithover-fittingCouldhelpwithover-fittingandunder-fitting
Couldhelpwithover-fittingandunder-fitting
Differentwaystoimproveyourmodel
Moretrainingdata
Features1. Usemorefeatures2. Usefewerfeatures3. Useotherfeatures
Bettertraining1. Runformoreiterations2. Useadifferentalgorithm3. Useadifferentclassifier4. Playwithregularization
21
Helpswithoverfitting
Helpswithunder-fittingHelpswithover-fittingCouldhelpwithover-fittingandunder-fitting
Couldhelpwithover-fittingandunder-fitting
Tracktheobjectiveforconvergence
Diagnostics
Somepossibleproblems:
ü Over-fitting(highvariance)
ü Under-fitting(highbias)
ü Yourlearningdoesnotconverge
4. Areyoumeasuringtherightthing?
22
Easiertofixaproblemifyouknowwhereitis
Whattomeasure
• Accuracyofpredictionisthemostcommonmeasurement
• Butifyourdatasetisunbalanced,accuracymaybemisleading– 1000positiveexamples,1negativeexample– Aclassifierthatalwayspredictspositivewillget99.9%accuracy.Hasit
reallylearnedanything?
• Unbalancedlabelsàmeasurelabelspecificprecision,recallandF-measure– Precisionforalabel:Amongexamplesthatarepredictedwithlabel,what
fractionarecorrect– Recallforalabel:Amongtheexampleswithgivengroundtruthlabel,
whatfractionarecorrect– F-measure:Harmonicmeanofprecisionandrecall
23
MLandtheworld
• Diagnosticsofyourlearningalgorithm
• Erroranalysis
• InjectingmachinelearningintoYourFavoriteTask
• Makingmachinelearningmatter
24
MachineLearninginthisclass
25
MLcode
MachineLearningincontext
26Figurefrom[Sculley,etalNIPS2015]
ErrorAnalysis
Generallymachinelearningplaysasmall(butimportant)roleinalargerapplication• Pre-processing• Featureextraction(possiblybyotherMLbasedmethods)• Datatransformations
Howmuchdoeachofthesecontributetotheerror?
Erroranalysistriestoexplainwhyasystemisnotperformingperfectly
27
Example:Atypicaltextprocessingpipeline
28
Example:Atypicaltextprocessingpipeline
29
Text
Example:Atypicaltextprocessingpipeline
30
Text
Words
Example:Atypicaltextprocessingpipeline
31
Text
Words
Parts-of-speech
Example:Atypicaltextprocessingpipeline
32
Text
Words
Parts-of-speech
Parsetrees
Example:Atypicaltextprocessingpipeline
33
Text
Words
Parts-of-speech
Parsetrees
AML-basedapplication
Example:Atypicaltextprocessingpipeline
34
Text
Words
Parts-of-speech
Parsetrees
AML-basedapplication
EachofthesecouldbeMLdriven
Ordeterministic
Butstillerrorprone
Example:Atypicaltextprocessingpipeline
35
Text
Words
Parts-of-speech
Parsetrees
AML-basedapplication
EachofthesecouldbeMLdriven
Ordeterministic
Butstillerrorprone
Howmuchdoeachofthesecontributetotheerrorofthefinalapplication?
Trackingerrorsinacomplexsystem
Pluginthegroundtruthfortheintermediatecomponentsandseehowmuchtheaccuracyofthefinalsystemchanges
36
System Accuracy
End-to-endpredicted 55%
Withgroundtruth words 60%
+groundtruthparts-of-speech 84%
+groundtruthparse trees 89%
+ground truthfinaloutput 100%
Trackingerrorsinacomplexsystem
Pluginthegroundtruthfortheintermediatecomponentsandseehowmuchtheaccuracyofthefinalsystemchanges
37
Errorinthepart-of-speechcomponenthurtsthemost
System Accuracy
End-to-endpredicted 55%
Withgroundtruth words 60%
+groundtruthparts-of-speech 84%
+groundtruthparse trees 89%
+ground truthfinaloutput 100%
Ablativestudy
Explainingdifferencebetweentheperformancebetweenastrongmodelandamuchweakerone(abaseline)
UsuallyseenwithfeaturesSupposewehaveacollectionoffeaturesandoursystemdoeswell,butwedon’tknowwhichfeaturesaregivingustheperformance
Evaluatesimplersystemsthatprogressivelyusefewerandfewerfeaturestoseewhichfeaturesgivethehighestboost
38
Itisnotenoughtohaveaclassifierthatworks;itisusefultoknowwhyitworks.
Helpsinterpretpredictions,diagnoseerrorsandcanprovideanaudittrail
MLandtheworld
• Diagnosticsofyourlearningalgorithm
• Erroranalysis
• InjectingmachinelearningintoYourFavoriteTask
• Makingmachinelearningmatter
39
Classifyingfish
SayyouwanttobuildaclassifierthatidentifieswhetherarealphysicalfishissalmonortunaHowdoyougoaboutthis?
40
Classifyingfish
SayyouwanttobuildaclassifierthatidentifieswhetherarealphysicalfishissalmonortunaHowdoyougoaboutthis?
41
Theslowapproach
1. Carefullyidentifyfeatures,getthebestdata,thesoftwarearchitecture,maybedesignanewlearningalgorithm
2. Implementitandhopeitworks
Advantage:Perhapsabetterapproach,maybeevenanewlearningalgorithm.Research.
Classifyingfish
SayyouwanttobuildaclassifierthatidentifieswhetherarealphysicalfishissalmonortunaHowdoyougoaboutthis?
42
Theslowapproach
1. Carefullyidentifyfeatures,getthebestdata,thesoftwarearchitecture,maybedesignanewlearningalgorithm
2. Implementitandhopeitworks
Advantage:Perhapsabetterapproach,maybeevenanewlearningalgorithm.Research.
Thehacker’sapproach
1. Firstimplementsomething
2. Usediagnosticstoiterativelymakeitbetter
Advantage:Fasterrelease,willhaveasolutionforyourproblemquicker
Classifyingfish
SayyouwanttobuildaclassifierthatidentifieswhetherarealphysicalfishissalmonortunaHowdoyougoaboutthis?
43
Theslowapproach
1. Carefullyidentifyfeatures,getthebestdata,thesoftwarearchitecture,maybedesignanewlearningalgorithm
2. Implementitandhopeitworks
Advantage:Perhapsabetterapproach,maybeevenanewlearningalgorithm.Research.
Thehacker’sapproach
1. Firstimplementsomething
2. Usediagnosticstoiterativelymakeitbetter
Advantage:Fasterrelease,willhaveasolutionforyourproblemquicker
Bewaryofprematureoptimization
Beequallywaryofprematurelycommittingtoabadpath
Whattowatchoutfor
• Doyouhavetherightevaluationmetric?– Anddoesyourlossfunctionreflectit?
• Bewareofcontamination:Ensurethatyourtrainingdataisnotcontaminatedwiththetestset– Learning=generalizationtonewexamples– Donotseeyourtestseteither.Youmayinadvertentlycontaminatethemodel
– Bewareofcontaminatingyourfeatureswiththelabel!– (Besuspiciousofperfectpredictors)
44
Whattowatchoutfor
• Beawareofbiasvs.variancetradeoff(orover-fittingvs.under-fitting)
• Beawarethatintuitionsmaynotworkinhighdimensions– Noproofbypicture– Curseofdimensionality
• Atheoreticalguaranteemayonlybetheoretical– Maymakeinvalidassumptions(eg:ifthedataisseparable)– Mayonlybelegitimatewithinfinitedata(eg:estimatingprobabilities)– Experimentsonrealdataareequallyimportant
45
Bigdataisnotenough
Butmoredataisalwaysbetter– Cleanerdataisevenbetter
Rememberthatlearningisimpossiblewithoutsomebiasthatsimplifiesthesearch– Otherwise,nogeneralization
Learningrequiresknowledgetoguidethelearner– Machinelearningisnot amagicwand
46
Whatknowledge?
• Whichmodelistherightoneforthistask?– Linearmodels,decisiontrees,deepneuralnetworks,etc
• Whichlearningalgorithm?– Doesthedataviolateanycrucialassumptionsthatwereusedto
definethelearningalgorithmorthemodel?– Doesthatmatter?
• Featureengineeringiscrucial
• Implicitly,theseareallclaimsaboutthenatureoftheproblem
47
Miscellaneousadvice
• Learnsimplermodelsfirst– Ifnothing,atleasttheyformabaselinethatyoucanimproveupon
• Ensemblesseemtoworkbetter
• Thinkaboutwhetheryourproblemislearnableatall– Learning=generalization
48
MLandtheworld
• Diagnosticsofyourlearningalgorithm
• Erroranalysis
• InjectingmachinelearningintoYourFavoriteTask
• Makingmachinelearningmatter
49
Makingmachinelearningmatter
ChallengestothegreaterMLcommunity1. AlawpassedorlegaldecisionmadethatreliesontheresultofanMLanalysis
2. $100MsavedthroughimproveddecisionmakingprovidedbyanMLsystem
3. AconflictbetweennationsavertedthroughhighqualitytranslationprovidedbyanMLsystem
4. A50%reductionincybersecurity break-insthroughMLdefenses
5. AhumanlifesavedthroughadiagnosisorinterventionrecommendedbyanMLsystem
6. Improvementof10%inonecountry’sHumanDevelopmentIndexattributabletoanMLsystem
50
MLandsystembuilding
51
SeveralrecentpapersabouthowMLfitsinthecontextoflargesoftwaresystems
Data-drivendecisionmaking:Increasinglyprevalent
Somebroaderconcernsaboutalgorithmicdecision-makingemerge:• Doclassifiersexhibitbiases?
• Howdoweensurethatsuchsystemsaretransparentintheirdecision-making?
• Whencanyoubelieveyourmodel?Shouldyouleavedecisionmakingtoit?
• Whatifstatisticalmodelsareusedinethicallydubiousways?
52
Algorithmsarenolongerjustaboutshowingproof-of-conceptlearning
Theserefertoauxiliarycriteriathatneednotdirectlytiedtothelossthatweminimize
Biasedclassifiers
Whatifclassifiersareusedtodecide…– … howlongsomeoneshouldbesentencedforacrime?– … orwhethersomeone’sloanapplicationshouldbeapproved?– … orwhethersomeoneshouldbefired?
Questions:• Howcanweensurethattheclassifiersdonotexhibitillegalor
unethicalbiases?– i.e.theclassifiersarefair?
• Howdowedesignalgorithmsandevaluationframeworkswhichavoiddiscrimination?
• Whatifthedataitself(eitherknowinglyorunknowingly)isunfair?
53
Allthesearerealexamples
Therighttoexplanations
Imagineyouhaveajobandaclassifierdecidestofireyou.– Maybebecauseitmadeanerroronthisinstance(ie.you!)
Questions:• Howdowedevelopstatisticalmethodsthatnotonly
makeaprediction,butarealsotransparentintheirdecisionmakingprocess?
• Howdowedevelopalgorithmsthatcanexplaintheirdecisions?
54
Arecurrentlegalsystemsrobust?
Perhapsthereisroomtorewrite/updatelawstoaccountformachinelearning
• Whatifevidenceisfakedbyasamplingfromagenerativemodel?
• WhoistoblameifaML-basedautomaticcarisinvolvedinanaccident?
• WhatifanML-basedautonomousweapondecidestokillsomeonewithoutanyhumanintervention?
55
Fairness,AccountabilityandTransparency
• Weneedtoensurethat– OuralgorithmsareFair– OuralgorithmscanbeheldAccountable– OuralgorithmsexhibitTransparency
• Thesearealldifficulttoformalize– Butstillimportant
• Anditisimportanttokeeptheseconcernswhenwe– Defineatask– Collectdata– Defineevaluations– Designfeatures,models– … reallyateverystepalongtheway
56
http://www.fatml.orghttps://fatconference.org
Aretrospectivelookatthecourse
57
Learning=generalization
“AcomputerprogramissaidtolearnfromexperienceEwithrespecttosomeclassoftasksT andperformancemeasureP,ifitsperformanceattasksinT,asmeasuredbyP,improveswithexperienceE.”
TomMitchell(1999)
58
Wesawdifferent“models”
Or:whatkindofafunctionshouldalearnerlearn
– Linearclassifiers
– Decisiontrees
– Non-linearclassifiers,featuretransformations,neuralnetworks
– Ensemblesofclassifiers
59
Differentlearningprotocols
• Supervisedlearning– Ateachersuppliesacollectionofexampleswithlabels– Thelearner hastolearntolabelnewexamplesusingthisdata
• Wedidnotsee– Unsupervisedlearning
• Noteacher,learnerhasonlyunlabeledexamples• Datamining
– Semi-supervisedlearning• Learnerhasaccesstobothlabeledandunlabeledexamples
60
Learningalgorithms
• Onlinealgorithms:Learnercanaccessonlyonelabeledatatime– Perceptron
• Batchalgorithms:Learnercanaccesstotheentiredataset– NaïveBayes– Supportvectormachines,logisticregression– Decisiontreesandnearestneighbors– Boosting– Neuralnetworks
61
Representingdata
Whatisthebestwaytorepresentdataforaparticulartask?
• Features
• Dimensionalityreduction(wedidn’tcoverthis,butdolookatthematerialifyouareinterested)
62
Thetheoryofmachinelearning
Mathematicallydefininglearning
– Onlinelearning
– ProbablyApproximatelyCorrect(PAC)Learning
– Bayesianlearning
63
Representation,optimization,evaluation
64Tablefrom[Domingos,2012]
Machinelearningistoo easy!
• Remarkablydiversecollectionofideas
• Yet,inpracticemanyoftheseapproachesworkroughlyequallywell– Eg:SVMvslogisticregressionvsaveragedperceptron
65
Whatwedidnotsee
MachinelearningisalargeandgrowingareaofscientificstudyWedidnotcover• Kernelmethods• Unsupervisedlearning,clustering• HiddenMarkovmodels• Multiclasssupportvectormachines• Topicmodels• Structuredmodels• ….
66
Butwesawthefoundationsofhowtothinkaboutmachinelearning
Whatwedidnotsee
MachinelearningisalargeandgrowingareaofscientificstudyWedidnotcover• Kernelmethods• Unsupervisedlearning,clustering• HiddenMarkovmodels• Multiclasssupportvectormachines• Topicmodels• Structuredmodels• ….
67
Butwesawthefoundationsofhowtothinkaboutmachinelearning
Severalclassesthatcanfollow(orarerelatedto)thiscourse:• DataMining• Clustering• TheoryofMachineLearning• Variousapplications(NLP,vision,…)• Datavisualization• Specializations:Deeplearning,StructuredPrediction,etc
Thiscourse
Focusontheunderlyingconceptsandalgorithmicideasinthefieldofmachinelearning
Not about• Usingaspecificmachinelearningtool• Anysinglelearningparadigm
68
Whatwesaw
1. Abroadtheoreticalandpracticalunderstandingofmachinelearningparadigmsandalgorithms
2. Abilitytoimplementlearningalgorithms
3. Identifywheremachinelearningcanbeappliedandmakethemostappropriatedecisions(aboutalgorithms,models,supervision,etc)
69