practical advice for building machine learning applications · 2019-12-05 · machine learning...

MachineLearning

PracticalAdvicefor

BuildingMachineLearningApplications

1BasedonlecturesandpapersbyAndrewNg,PedroDomingos,TomMitchellandothers

MLandtheworld

• Diagnosticsofyourlearningalgorithm

• Erroranalysis

• InjectingmachinelearningintoYourFavoriteTask

• Makingmachinelearningmatter

2

MakingMLworkintheworld

MostlyexperientialadviceAlsobasedonwhatotherpeoplehavesaidSeereadingsonclasswebsite

MLandtheworld


• Erroranalysis



3

Debuggingmachinelearning

SupposeyoutrainanSVMoralogisticregressionclassifierforspamdetection

Youobviously followbestpracticesforfindinghyper-parameters(suchascross-validation)

Yourclassifierisonly75%accurate

Whatcanyoudotoimproveit?

4

(assumingthattherearenobugsinthecode)

Differentwaystoimproveyourmodel

Moretrainingdata

Features1. Usemorefeatures2. Usefewerfeatures3. Useotherfeatures

Bettertraining1. Runformoreiterations2. Useadifferentalgorithm3. Useadifferentclassifier4. Playwithregularization

5


Moretrainingdata



6

Tedious!Andpronetoerrors,dependenceonluck

Letustrytomakethisprocessmoremethodical

First,diagnostics

Somepossibleproblems:

1. Over-fitting(highvariance)

2. Under-fitting(highbias)

3. Yourlearningdoesnotconverge

4. Areyoumeasuringtherightthing?

7

Easiertofixaproblemifyouknowwhereitis

Detectingoverorunderfitting

Over-fitting:Thetrainingaccuracyismuchhigherthanthetestaccuracy– Themodelexplainsthetrainingsetverywell,butpoorgeneralization

Under-fitting:Bothaccuraciesareunacceptablylow– Themodelcannotrepresenttheconceptwellenough

8

Detectinghighvariance usinglearningcurves

9

Sizeoftrainingdata

Error

Trainingerror


10

Sizeoftrainingdata

Error

Trainingerror

Generalizationerror/testerror


11

Testerrorkeepsdecreasingastrainingsetincreases)moredatawillhelp

Sizeoftrainingdata

Error

Largegapbetweentrainandtesterror

Trainingerror


Typicallyseenformorecomplexmodels

Detectinghighbiasusinglearningcurves

12

Sizeoftrainingset

Error Trainingerror


Bothtrainandtesterrorareunacceptable(Butthemodelseemstoconverge)

Typicallyseenformoresimplemodels


Moretrainingdata



13


Moretrainingdata



14

Helpswithover-fitting

Helpswithunder-fittingHelpswithover-fittingCouldhelpwithover-fittingandunder-fitting

Couldhelpwithover-fittingandunder-fitting

Diagnostics


ü Over-fitting(highvariance)

ü Under-fitting(highbias)

3. Yourlearningdoesnotconverge


15


Doesyourlearningalgorithmconverge?

Iflearningisframedasanoptimizationproblem,tracktheobjective

16

Iterations

ObjectiveNotyetconvergedhere Convergedhere



17

Iterations

ObjectiveNotyetconvergedhere Howabouthere?

Notalwayseasytodecide



18

Iterations

Objective

Somethingiswrong



19

Iterations

Objective

Helpstodebug

Somethingiswrong

Ifwearedoinggradientdescentonaconvexfunctiontheobjectivecan’tincrease(Caveat:ForSGD,theobjectivewillslightlyincreaseoccasionally,butnotbymuch)


Moretrainingdata



20

Helpswithoverfitting




Moretrainingdata



21

Helpswithoverfitting



Tracktheobjectiveforconvergence

Diagnostics


ü Over-fitting(highvariance)

ü Under-fitting(highbias)

ü Yourlearningdoesnotconverge


22


Whattomeasure

• Accuracyofpredictionisthemostcommonmeasurement

• Butifyourdatasetisunbalanced,accuracymaybemisleading– 1000positiveexamples,1negativeexample– Aclassifierthatalwayspredictspositivewillget99.9%accuracy.Hasit

reallylearnedanything?

• Unbalancedlabelsàmeasurelabelspecificprecision,recallandF-measure– Precisionforalabel:Amongexamplesthatarepredictedwithlabel,what

fractionarecorrect– Recallforalabel:Amongtheexampleswithgivengroundtruthlabel,

whatfractionarecorrect– F-measure:Harmonicmeanofprecisionandrecall

23

MLandtheworld


• Erroranalysis



24

MachineLearninginthisclass

25

MLcode

MachineLearningincontext

26Figurefrom[Sculley,etalNIPS2015]

ErrorAnalysis

Generallymachinelearningplaysasmall(butimportant)roleinalargerapplication• Pre-processing• Featureextraction(possiblybyotherMLbasedmethods)• Datatransformations

Howmuchdoeachofthesecontributetotheerror?

Erroranalysistriestoexplainwhyasystemisnotperformingperfectly

27

Example:Atypicaltextprocessingpipeline

28


29

Text


30

Text

Words


31

Text

Words

Parts-of-speech


32

Text

Words

Parts-of-speech

Parsetrees


33

Text

Words

Parts-of-speech

Parsetrees

AML-basedapplication


34

Text

Words

Parts-of-speech

Parsetrees


EachofthesecouldbeMLdriven

Ordeterministic

Butstillerrorprone


35

Text

Words

Parts-of-speech

Parsetrees


EachofthesecouldbeMLdriven

Ordeterministic

Butstillerrorprone

Howmuchdoeachofthesecontributetotheerrorofthefinalapplication?

Trackingerrorsinacomplexsystem

Pluginthegroundtruthfortheintermediatecomponentsandseehowmuchtheaccuracyofthefinalsystemchanges

36

System Accuracy

End-to-endpredicted 55%

Withgroundtruth words 60%

+groundtruthparts-of-speech 84%

+groundtruthparse trees 89%

+ground truthfinaloutput 100%

Trackingerrorsinacomplexsystem

Pluginthegroundtruthfortheintermediatecomponentsandseehowmuchtheaccuracyofthefinalsystemchanges

37

Errorinthepart-of-speechcomponenthurtsthemost

System Accuracy

End-to-endpredicted 55%

Withgroundtruth words 60%

+groundtruthparts-of-speech 84%

+groundtruthparse trees 89%

+ground truthfinaloutput 100%

Ablativestudy

Explainingdifferencebetweentheperformancebetweenastrongmodelandamuchweakerone(abaseline)

UsuallyseenwithfeaturesSupposewehaveacollectionoffeaturesandoursystemdoeswell,butwedon’tknowwhichfeaturesaregivingustheperformance

Evaluatesimplersystemsthatprogressivelyusefewerandfewerfeaturestoseewhichfeaturesgivethehighestboost

38

Itisnotenoughtohaveaclassifierthatworks;itisusefultoknowwhyitworks.

Helpsinterpretpredictions,diagnoseerrorsandcanprovideanaudittrail

MLandtheworld


• Erroranalysis



39

Classifyingfish

SayyouwanttobuildaclassifierthatidentifieswhetherarealphysicalfishissalmonortunaHowdoyougoaboutthis?

40

Classifyingfish


41

Theslowapproach

1. Carefullyidentifyfeatures,getthebestdata,thesoftwarearchitecture,maybedesignanewlearningalgorithm

2. Implementitandhopeitworks

Advantage:Perhapsabetterapproach,maybeevenanewlearningalgorithm.Research.

Classifyingfish


42

Theslowapproach




Thehacker’sapproach

1. Firstimplementsomething

2. Usediagnosticstoiterativelymakeitbetter

Advantage:Fasterrelease,willhaveasolutionforyourproblemquicker

Classifyingfish


43

Theslowapproach




Thehacker’sapproach

1. Firstimplementsomething

2. Usediagnosticstoiterativelymakeitbetter

Advantage:Fasterrelease,willhaveasolutionforyourproblemquicker

Bewaryofprematureoptimization

Beequallywaryofprematurelycommittingtoabadpath

Whattowatchoutfor

• Doyouhavetherightevaluationmetric?– Anddoesyourlossfunctionreflectit?

• Bewareofcontamination:Ensurethatyourtrainingdataisnotcontaminatedwiththetestset– Learning=generalizationtonewexamples– Donotseeyourtestseteither.Youmayinadvertentlycontaminatethemodel

– Bewareofcontaminatingyourfeatureswiththelabel!– (Besuspiciousofperfectpredictors)

44

Whattowatchoutfor

• Beawareofbiasvs.variancetradeoff(orover-fittingvs.under-fitting)

• Beawarethatintuitionsmaynotworkinhighdimensions– Noproofbypicture– Curseofdimensionality

• Atheoreticalguaranteemayonlybetheoretical– Maymakeinvalidassumptions(eg:ifthedataisseparable)– Mayonlybelegitimatewithinfinitedata(eg:estimatingprobabilities)– Experimentsonrealdataareequallyimportant

45

Bigdataisnotenough

Butmoredataisalwaysbetter– Cleanerdataisevenbetter

Rememberthatlearningisimpossiblewithoutsomebiasthatsimplifiesthesearch– Otherwise,nogeneralization

Learningrequiresknowledgetoguidethelearner– Machinelearningisnot amagicwand

46

Whatknowledge?

• Whichmodelistherightoneforthistask?– Linearmodels,decisiontrees,deepneuralnetworks,etc

• Whichlearningalgorithm?– Doesthedataviolateanycrucialassumptionsthatwereusedto

definethelearningalgorithmorthemodel?– Doesthatmatter?

• Featureengineeringiscrucial

• Implicitly,theseareallclaimsaboutthenatureoftheproblem

47

Miscellaneousadvice

• Learnsimplermodelsfirst– Ifnothing,atleasttheyformabaselinethatyoucanimproveupon

• Ensemblesseemtoworkbetter

• Thinkaboutwhetheryourproblemislearnableatall– Learning=generalization

48

MLandtheworld


• Erroranalysis



49

Makingmachinelearningmatter

ChallengestothegreaterMLcommunity1. AlawpassedorlegaldecisionmadethatreliesontheresultofanMLanalysis

2. $100MsavedthroughimproveddecisionmakingprovidedbyanMLsystem

3. AconflictbetweennationsavertedthroughhighqualitytranslationprovidedbyanMLsystem

4. A50%reductionincybersecurity break-insthroughMLdefenses

5. AhumanlifesavedthroughadiagnosisorinterventionrecommendedbyanMLsystem

6. Improvementof10%inonecountry’sHumanDevelopmentIndexattributabletoanMLsystem

50

MLandsystembuilding

51

SeveralrecentpapersabouthowMLfitsinthecontextoflargesoftwaresystems

Data-drivendecisionmaking:Increasinglyprevalent

Somebroaderconcernsaboutalgorithmicdecision-makingemerge:• Doclassifiersexhibitbiases?

• Howdoweensurethatsuchsystemsaretransparentintheirdecision-making?

• Whencanyoubelieveyourmodel?Shouldyouleavedecisionmakingtoit?

• Whatifstatisticalmodelsareusedinethicallydubiousways?

52

Algorithmsarenolongerjustaboutshowingproof-of-conceptlearning

Theserefertoauxiliarycriteriathatneednotdirectlytiedtothelossthatweminimize

Biasedclassifiers

Whatifclassifiersareusedtodecide…– … howlongsomeoneshouldbesentencedforacrime?– … orwhethersomeone’sloanapplicationshouldbeapproved?– … orwhethersomeoneshouldbefired?

Questions:• Howcanweensurethattheclassifiersdonotexhibitillegalor

unethicalbiases?– i.e.theclassifiersarefair?

• Howdowedesignalgorithmsandevaluationframeworkswhichavoiddiscrimination?

• Whatifthedataitself(eitherknowinglyorunknowingly)isunfair?

53

Allthesearerealexamples

Therighttoexplanations

Imagineyouhaveajobandaclassifierdecidestofireyou.– Maybebecauseitmadeanerroronthisinstance(ie.you!)

Questions:• Howdowedevelopstatisticalmethodsthatnotonly

makeaprediction,butarealsotransparentintheirdecisionmakingprocess?

• Howdowedevelopalgorithmsthatcanexplaintheirdecisions?

54

Arecurrentlegalsystemsrobust?

Perhapsthereisroomtorewrite/updatelawstoaccountformachinelearning

• Whatifevidenceisfakedbyasamplingfromagenerativemodel?

• WhoistoblameifaML-basedautomaticcarisinvolvedinanaccident?

• WhatifanML-basedautonomousweapondecidestokillsomeonewithoutanyhumanintervention?

55

Fairness,AccountabilityandTransparency

• Weneedtoensurethat– OuralgorithmsareFair– OuralgorithmscanbeheldAccountable– OuralgorithmsexhibitTransparency

• Thesearealldifficulttoformalize– Butstillimportant

• Anditisimportanttokeeptheseconcernswhenwe– Defineatask– Collectdata– Defineevaluations– Designfeatures,models– … reallyateverystepalongtheway

56

http://www.fatml.orghttps://fatconference.org

Aretrospectivelookatthecourse

57

Learning=generalization

“AcomputerprogramissaidtolearnfromexperienceEwithrespecttosomeclassoftasksT andperformancemeasureP,ifitsperformanceattasksinT,asmeasuredbyP,improveswithexperienceE.”

TomMitchell(1999)

58

Wesawdifferent“models”

Or:whatkindofafunctionshouldalearnerlearn

– Linearclassifiers

– Decisiontrees

– Non-linearclassifiers,featuretransformations,neuralnetworks

– Ensemblesofclassifiers

59

Differentlearningprotocols

• Supervisedlearning– Ateachersuppliesacollectionofexampleswithlabels– Thelearner hastolearntolabelnewexamplesusingthisdata

• Wedidnotsee– Unsupervisedlearning

• Noteacher,learnerhasonlyunlabeledexamples• Datamining

– Semi-supervisedlearning• Learnerhasaccesstobothlabeledandunlabeledexamples

60

Learningalgorithms

• Onlinealgorithms:Learnercanaccessonlyonelabeledatatime– Perceptron

• Batchalgorithms:Learnercanaccesstotheentiredataset– NaïveBayes– Supportvectormachines,logisticregression– Decisiontreesandnearestneighbors– Boosting– Neuralnetworks

61

Representingdata

Whatisthebestwaytorepresentdataforaparticulartask?

• Features

• Dimensionalityreduction(wedidn’tcoverthis,butdolookatthematerialifyouareinterested)

62

Thetheoryofmachinelearning

Mathematicallydefininglearning

– Onlinelearning

– ProbablyApproximatelyCorrect(PAC)Learning

– Bayesianlearning

63

Representation,optimization,evaluation

64Tablefrom[Domingos,2012]

Machinelearningistoo easy!

• Remarkablydiversecollectionofideas

• Yet,inpracticemanyoftheseapproachesworkroughlyequallywell– Eg:SVMvslogisticregressionvsaveragedperceptron

65

Whatwedidnotsee

MachinelearningisalargeandgrowingareaofscientificstudyWedidnotcover• Kernelmethods• Unsupervisedlearning,clustering• HiddenMarkovmodels• Multiclasssupportvectormachines• Topicmodels• Structuredmodels• ….

66

Butwesawthefoundationsofhowtothinkaboutmachinelearning

Whatwedidnotsee

MachinelearningisalargeandgrowingareaofscientificstudyWedidnotcover• Kernelmethods• Unsupervisedlearning,clustering• HiddenMarkovmodels• Multiclasssupportvectormachines• Topicmodels• Structuredmodels• ….

67

Butwesawthefoundationsofhowtothinkaboutmachinelearning

Severalclassesthatcanfollow(orarerelatedto)thiscourse:• DataMining• Clustering• TheoryofMachineLearning• Variousapplications(NLP,vision,…)• Datavisualization• Specializations:Deeplearning,StructuredPrediction,etc

Thiscourse

Focusontheunderlyingconceptsandalgorithmicideasinthefieldofmachinelearning

Not about• Usingaspecificmachinelearningtool• Anysinglelearningparadigm

68

Whatwesaw

1. Abroadtheoreticalandpracticalunderstandingofmachinelearningparadigmsandalgorithms

2. Abilitytoimplementlearningalgorithms

3. Identifywheremachinelearningcanbeappliedandmakethemostappropriatedecisions(aboutalgorithms,models,supervision,etc)

69

practical advice for building machine learning applications · 2019-12-05 · machine learning...

Documents