machine learning concepts for software monitoring - lior redlus, coralogix - devopsdays tel aviv...
TRANSCRIPT
About Myself
• 31 yr. Scientist at heart.
• B.Sc and M.Sc in Neuroscience and Information Processing (BIU)
• Co-founder and Chief Scientist @ Coralogix
About Coralogix
• A Machine Learning platform for software Log Analysis
• Log Management already included: indexing, querying, filtering, alerting etc.
• Coralogix Analytics:
  • Turns your data into patterns and flows
  • Gives you deep insights on your system
  • Automatically detects production problems
In this talk…
• We’ll explore some challenges in software logs today
• Have an overview of machine learning and some use cases
• Suggest a fully-automatic algorithm for anomaly detection in logs
Schedule:
• Logs today
• Machine Learning to the rescue!
• Types of Machine Learning
• Applying to Log Records
• Possible log analysis pipeline
Logs today (1)
• What do we use them for?
  • Debugging
  • Security
  • Compliance
  • User analytics
  • and many more!
• Two use cases stand out:
  • Production Monitoring (70%)
  • Production Troubleshooting (67%)
Logs today (2)
• Open-source software accelerates development
• Cloud enables massive scale
• Even small companies are generating huge amounts of logs
• The growth is exponential!
Logs today (3)
• Log Management (and Big Data) approach:
  1. Collect everything!
  2. Don’t worry, we’ll know what to do when we need it
• Or will we..?
Logs today (4)
• The problem with Log Management:
• Humans do the analysis
• And humans are bad at…
  • Identifying complex relationships
  • Noticing small (but important) changes
  • Staying 100% in focus all the time
Logs today (5)
• Too much time is wasted on FINDING issues instead of FIXING them
• Most DevOps spend >70% of issue resolution time just to find what went wrong!
Logs today (6)
• Problem: Log Management does not have a “brain”
• Solution: Give it a brain!
In other words: welcome to Log Analytics
Machine Learning to the rescue
• What is Machine Learning?
“Field of study that gives computers the ability to learn without being explicitly programmed.”
- Arthur Samuel, pioneer of Machine Learning, 1959
Machine Learning to the rescue
• Traditional coding:
  • You have a model of the world
  • You write code that explicitly represents this model
  • The code behaves exactly as expected
  • Need to manually update the code in a changing world
Machine Learning to the rescue
• Machine Learning:
  • You have loose concepts about the world (or even none!)
  • You write code that learns the data and builds models of the world
  • The exact behavior of the code is not known, but generally works well
  • Can automatically update the model as needed!
• How well?
  • Much faster than humans
  • Sometimes with better accuracy!
Types of Machine Learning – Supervised
• Supervised Learning:
  • Uses data with clearly-defined output (“labeled data”)
  • Machine learns explicitly through right and wrong answers
• Two main types:
  • Regression – Predict continuous values based on sets of (correlated) data
  • Classification – Predict the class of an item based on its properties
Types of Machine Learning – Regression (1)
• Regression 1 – Given the temperature and yogurt sold
  • Predict the temperature based on amount of yogurt sold
• Linear regression:
[Figure: linear regression fit; axes: Temperature (F), Frozen yogurt sold (lbs)]
Types of Machine Learning – Regression (2)
• Regression 2 – Given cups of coffee sold per 10 minutes
  • Predict how many cups are sold at any given time of the day
• Linear regression:
• Polynomial regression:
[Figure: linear and polynomial regression fits; axes: Time of day (hours), Cups of coffee sold]
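The linear regression from the slides can be sketched in a few lines. The snippet below fits a straight line by ordinary least squares. The data points and the function name `fit_line` are made up for illustration and are not the values plotted in the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: frozen yogurt sold (lbs) and temperature (F).
yogurt_lbs = [10.0, 20.0, 30.0, 40.0]
temperature_f = [60.0, 70.0, 80.0, 90.0]

slope, intercept = fit_line(yogurt_lbs, temperature_f)
predicted = slope * 25.0 + intercept  # temperature predicted for 25 lbs sold
```

Polynomial regression follows the same idea with powers of the input added as extra terms, which is why it can follow a curve such as coffee sales over the day where a straight line cannot.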
Types of Machine Learning – Classification (1)
• Classification: can we automatically identify the type of an iris?
• Assumption: we can differentiate iris types by the sizes of their leaves
Types of Machine Learning – Classification (2)
• Classification: given leaf sizes of irises (Fisher’s dataset, 1936)
  • Predict the type of an iris based on its leaves
Types of Machine Learning – Classification (3)
• Classification: Fisher’s iris dataset
  • Support Vector Machine (SVM) achieves 73% accuracy!
[Figure: SVM with linear kernel on the iris data; axes: Sepal Length (cm), Sepal Width (cm); classes: setosa, versicolor, virginica]
Types of Machine Learning – Reinforcement (1)
• Reinforcement (reward-based) Learning:
  • A set of rules defines interaction with the environment
  • “Good” actions may grant rewards
  • “Bad” actions may reduce rewards
  • Machine tries to maximize this score
• Used in game bots, recommender systems etc.
Types of Machine Learning – Reinforcement (2)
• Recommender systems:
  • Build profiles for items and for users
  • Recommend an item to a user based on previous purchases
  • Gain rewards when users click on recommended items
  • Update profiles based on recommendations, ratings etc.
Types of Machine Learning – Reinforcement (3)
• Generally speaking, recommender systems offer similar things to similar users:
[Figure: users Jim and Bob]
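A minimal sketch of "similar things to similar users", assuming each user profile is just a set of purchased items and similarity is the Jaccard overlap of those sets. This covers only the similarity step, not the reward-driven profile updates. The purchase data, the third user and the function names are hypothetical.

```python
def jaccard(a, b):
    """Share of items two users have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target, purchases):
    """Suggest items bought by the most similar other user."""
    others = [u for u in purchases if u != target]
    nearest = max(others, key=lambda u: jaccard(purchases[target], purchases[u]))
    return sorted(purchases[nearest] - purchases[target])


# Hypothetical purchase histories; only the names Jim and Bob come from the slide.
purchases = {
    "Jim": {"guitar", "amp", "strings"},
    "Bob": {"guitar", "amp", "tuner"},
    "Ann": {"blender", "toaster"},
}

suggestions = recommend("Jim", purchases)
```

Here Bob shares two of four distinct items with Jim, so Bob is the nearest user and the item Jim is missing from Bob's history becomes the suggestion.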
Types of Machine Learning – Unsupervised (1)
• Problem:
  • Supervised learning is good, but requires labeled data
  • Most data in the world is not labeled, there’s no right/wrong answer
  • Labeling requires human effort → tedious and expensive
• Unsupervised Learning:
  • The machine automatically recognizes relationships in the data
  • No right or wrong answers are given
  • Many times used to enhance Supervised Learning
• Some approaches include:
  • Clustering algorithms: k-means, k-nearest-neighbors etc.
  • Anomaly detection of rare events
  • Deep learning (for pretty much everything…)
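As an illustration of clustering without labels, here is a bare-bones k-means on plain numbers. It is a sketch under simple assumptions: one-dimensional data, a fixed number of iterations, and starting centers chosen by hand. The data and the function name `kmeans_1d` are invented for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign to nearest center, then re-average."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ended up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical response times in milliseconds, with no labels attached.
response_ms = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(response_ms, centers=[0, 50])
```

No right or wrong answers are supplied: the two groups (fast and slow responses) emerge from the distances alone.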
Types of Machine Learning – Unsupervised (2)
• Deep Learning approach:
  • Learn from a lot of non-labeled data
  • Learn highly non-linear correlations (represent complex relationships)
  • Surprisingly good results for many applications!
Types of Machine Learning – Unsupervised (3)
• Deep Learning: can we automatically cluster digits together?
  • Data: 60,000 b/w 20x20 pixel images of hand-written digits
  • Each image is “flattened” to a 1D vector of 400 floating point values [0..1]
[0.0, 0.0, 0.01, 0.07, 0.07, 0.07, 0.49, 0.65, 1.0, 0.97, …, 0.0, 0.0]
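The flattening step can be shown directly. This sketch assumes the image arrives as a 20x20 grid of 8-bit grey levels (0 to 255), which is an assumption about the input format rather than something stated in the talk. `flatten_image` is a hypothetical helper name.

```python
def flatten_image(pixels, max_value=255):
    """Turn a 2D grid of grey levels into a 1D list of floats in [0, 1]."""
    return [value / max_value for row in pixels for value in row]


# A hypothetical 20x20 image: all black except one white pixel in the corner.
image = [[0] * 20 for _ in range(20)]
image[0][0] = 255

vector = flatten_image(image)
```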
Types of Machine Learning – Unsupervised (4)
• Deep Learning: can we automatically cluster digits together?
  • Image vectors are fed to the neural network
[Figure: the vector [0.0, 0.0, 0.01, 0.07, 0.07, 0.07, 0.49, 0.65, 1.0, 0.97, …, 0.0, 0.0] feeding into a neural network]
Types of Machine Learning – Unsupervised (5)
• Deep Learning: can we automatically cluster digits together?
  • The neural network automatically learns features of the images
  • Each neuron “lights up” when it recognizes a feature in the previous layer
[Figure: learned features: round edges, vertical lines, diagonal lines, … etc.]
Types of Machine Learning – Unsupervised (6)
• Deep Learning: can we automatically cluster digits together?
  • The last layer recognizes highly complex features of the image: the digits!
  • This method achieves an amazing 0.2% error rate in this task!
[Figure: the vector [0.0, 0.0, 0.01, 0.07, 0.97, …, 0.0, 0.0] passing through the network; digits shown: 3, 1; Output: 1]
Applying to Log Records (1)
• Problems:
  • Log data is very redundant
  • Hard to find the important events
  • Rare logs are a needle in the haystack
• Also:
  • Actions in the system are represented by a series of log records
  • But other logs interrupt the visual flow
  • Tracing the logs of a complete action is hard
Applying to Log Records (2)
• Solutions:
  • Identify log prototypes (“log templates”)
  • Cluster logs which represent an action
  • Alert when actions are incomplete or anomalous
  • Notify about new errors which have never occurred before
And much more!
Log prototypes distribution – real-world
• The 10 most frequent logs make up ~60% of the data (!)
[Figure: log frequency per log prototype; axes: Log Prototypes, Log Frequency]
Log analysis pipeline – clustering
• Cluster log records (raw strings) into log prototypes:
  I. Find a distance metric to compare log records
  II. Create a new type of log if the distance is too large
  III. Find the variables within log types
Log 1: “Creating tag on Stream: -1 Position: 42”
Log 2: “Creating tag on Stream: 2 Position: 65”
Log analysis pipeline – clustering
• Problem: comparing all log sub-strings is expensive!
• Solution: use heuristic distance methods
“Creating tag on Stream: -1 Position: 42” → {Creating}{tag}{on}{Stream:}{-1}...
[Figure: locality-sensitive hashing (LSH) maps each log to a bit string such as 0011000010…0100: Log 1 Hash, Log 2 Hash, …, Log n Hash]
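The slides do not say which locality-sensitive hashing scheme is used. As one possible illustration, the sketch below uses SimHash, a well-known member of the LSH family, over whitespace-separated tokens. The 64-bit width, the md5-based token hash and all function names are assumptions made for the example.

```python
import hashlib


def token_hash(token, bits=64):
    """Stable hash of one token (md5 keeps runs reproducible)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(log_line, bits=64):
    """SimHash: each token votes on every bit; the majority sets the bit."""
    votes = [0] * bits
    for token in log_line.split():
        h = token_hash(token, bits)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming(a, b):
    """Number of differing bits; small values suggest similar log lines."""
    return bin(a ^ b).count("1")


log1 = simhash("Creating tag on Stream: -1 Position: 42")
log2 = simhash("Creating tag on Stream: 2 Position: 65")
distance = hamming(log1, log2)
```

Because logs that share most of their tokens tend to agree on most bits, comparing fixed-size hashes by Hamming distance can stand in for the expensive sub-string comparison.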
Log analysis pipeline – clustering
• Result: M raw log records → N log prototypes (N << M)
• M is in the billions; N is in the thousands
“Creating tag on Stream: -1 Position: 42”
“Creating tag on Stream: 2 Position: 65”
“Creating tag on Stream: {var1} Position: {var2}”
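A minimal sketch of how two raw records could collapse into one prototype, assuming logs are split on whitespace and that records of the same type have the same number of tokens. `extract_template` is a hypothetical helper, not the speaker's implementation.

```python
def extract_template(log_a, log_b):
    """Align two logs token by token; tokens that differ become variables."""
    tokens_a, tokens_b = log_a.split(), log_b.split()
    if len(tokens_a) != len(tokens_b):
        return None  # different shapes: treat as different prototypes
    template, var_count = [], 0
    for a, b in zip(tokens_a, tokens_b):
        if a == b:
            template.append(a)
        else:
            var_count += 1
            template.append("{var%d}" % var_count)
    return " ".join(template)


template = extract_template(
    "Creating tag on Stream: -1 Position: 42",
    "Creating tag on Stream: 2 Position: 65",
)
```

Applied to the two example records, the constant tokens survive and the two differing positions become `{var1}` and `{var2}`.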
Log analysis pipeline – variable statistics
• Model distribution of variables within log prototypes
• Define anomaly boundaries
“Creating tag on Stream: {var1} Position: {var2}”
Variable    Values
var1        [-1, 2, …, 1]
var2        [42, 65, …, 53]
[Figure label: Anomalous values]
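One simple way to define anomaly boundaries is a band of a few standard deviations around the mean. The talk does not specify the statistical model, so the choice of mean ± 3 standard deviations, the sample values beyond 42, 65 and 53, and the function names below are all assumptions.

```python
import statistics


def anomaly_bounds(values, k=3.0):
    """Return (low, high) limits at mean +/- k standard deviations."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return mean - k * spread, mean + k * spread


def is_anomalous(value, bounds):
    """True when the value falls outside the (low, high) band."""
    low, high = bounds
    return value < low or value > high


# Hypothetical history for {var2}; the slide only shows 42, 65 and 53.
position_history = [42, 65, 53, 48, 60, 55, 51, 58]
bounds = anomaly_bounds(position_history)
```

A new record whose `{var2}` lands far outside the band, for example `is_anomalous(5000, bounds)`, would be flagged, while a typical value such as 55 would not.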
Log analysis pipeline – sequence finding
• Find sequences of log prototypes that are statistically related
• Independence assumption – if logs are unrelated, all pairs should have the same probability
• Sequences with related logs will have higher counts, and fail the G-Test of independence.
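For reference, the standard G statistic is G = 2 · Σ O · ln(O / E), where O are observed counts and E the counts expected under independence. The sketch below applies it to a 2x2 table for one candidate pair "A followed by B". The table layout, the counts and the function name are assumptions made for illustration.

```python
import math


def g_test_2x2(ab, a_not_b, not_a_b, not_a_not_b):
    """G statistic for a 2x2 table: G = 2 * sum(O * ln(O / E))."""
    observed = [[ab, a_not_b], [not_a_b, not_a_not_b]]
    total = ab + a_not_b + not_a_b + not_a_not_b
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if observed[i][j] > 0:  # 0 * ln(0) is taken as 0
                g += observed[i][j] * math.log(observed[i][j] / expected)
    return 2.0 * g


# Critical value of chi-square with 1 degree of freedom at p = 0.001.
CRITICAL_P001 = 10.828

# Hypothetical counts of adjacent log pairs.
g_related = g_test_2x2(90, 10, 10, 90)    # B almost always follows A
g_unrelated = g_test_2x2(25, 25, 25, 25)  # B follows A as often as anything else
```

A pair whose G value exceeds the critical value rejects the independence assumption and becomes a candidate 2-sequence.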
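Log analysis pipeline – sequence finding
[Figure: flow diagram of log prototypes, nodes numbered 1, 2, 3, 4, 5, 5, 6, 7; labels: Purchase request, Get cart from DB, Process DB response, Authenticate payment, Update BI system 1, Update BI system 2, Mark as complete, Send response to client]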
Log analysis pipeline – sequence finding
• Count all log sequences of length 2 (2-sequences)
  • L1 L2 will be a frequent 2-sequence
  • We expect not to find any occurrences of L1 L4
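Counting 2-sequences can be sketched with a counter over adjacent pairs. This assumes the stream has already been mapped to prototype IDs and that only directly adjacent records count as a pair. A real pipeline may need a wider window, since unrelated logs interleave. The stream below is invented.

```python
from collections import Counter


def count_2_sequences(prototype_ids):
    """Count every pair of adjacent log prototypes in the stream."""
    return Counter(zip(prototype_ids, prototype_ids[1:]))


# Hypothetical stream: the action L1 L2 L3 L4 repeated three times.
stream = ["L1", "L2", "L3", "L4"] * 3
pairs = count_2_sequences(stream)
```

In this toy stream `("L1", "L2")` is counted on every repetition, while `("L1", "L4")` never appears.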
Log analysis pipeline – sequence finding
• After mapping all 2-sequences, normalize their scores:
  • Subtract the average
  • Divide by the standard deviation
• Try to lengthen all 2-sequences by one log to 3-sequences
$$\hat{S}_1^{2} = \frac{\mathrm{Freq}(S_1^{2}) - \mu(S^{2})}{\sigma(S^{2})}$$
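The normalization is a standard z-score over the frequencies of all sequences of the same length. A minimal sketch, assuming the population standard deviation is used and that frequencies are held in a dictionary keyed by sequence. The counts and the function name are hypothetical.

```python
import statistics


def normalize_scores(freq):
    """Map each sequence to (frequency - mean) / standard deviation."""
    counts = list(freq.values())
    mean = statistics.mean(counts)
    spread = statistics.pstdev(counts)
    if spread == 0:
        return {seq: 0.0 for seq in freq}  # all sequences equally frequent
    return {seq: (count - mean) / spread for seq, count in freq.items()}


# Hypothetical 2-sequence counts.
freq = {("L1", "L2"): 90, ("L2", "L3"): 90, ("L3", "L9"): 10, ("L7", "L1"): 10}
scores = normalize_scores(freq)
```

Normalizing makes scores comparable across sequence lengths, which is what allows a k-sequence to be compared with the longer sequence built from it.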
Log analysis pipeline – sequence finding
• Repeat the process:
• For each k-sequence try to construct a longer (k+1)-sequence
• Stop when failing the G-Test or when the normalized score decreases:
• Save the k-sequence as valid (an action in the system)
$$\hat{S}_1^{k} < \hat{S}_1^{k-1}$$
Log analysis pipeline
• Determine the ratio of each log within the sequence
  • E.g. 1:1:1 is a 3-sequence where the ratio of each log prototype is the same
• In our example:
  • 1:1:1:1:2:1:1, a 7-sequence with one log prototype expected twice as much as the others
Log analysis pipeline
• Alert about a sequence anomaly when the ratio is distant enough from the valid sequence, e.g. p < 0.001
• Software is constantly changing – update all models all the time
• Of course, there is much more than we explored here!
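The ratio check can be sketched as a goodness-of-fit test. The talk does not name the test used for alerting, so this example assumes a G-test against the expected ratio and restricts itself to 3-sequences, where the chi-square tail with 2 degrees of freedom has the exact closed form exp(-G/2). The counts and function names are hypothetical.

```python
import math


def ratio_p_value(observed, expected_ratio):
    """G-test goodness of fit for a 3-prototype sequence (2 degrees of freedom)."""
    if len(observed) != 3 or len(expected_ratio) != 3:
        raise ValueError("this sketch only covers 3-sequences")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    g = 0.0
    for count, part in zip(observed, expected_ratio):
        expected = total * part / ratio_sum
        if count > 0:  # 0 * ln(0) is taken as 0
            g += 2.0 * count * math.log(count / expected)
    # With 2 degrees of freedom the chi-square tail is exactly exp(-g / 2).
    return math.exp(-g / 2.0)


def should_alert(observed, expected_ratio, threshold=0.001):
    """Alert when the observed counts are very unlikely under the valid ratio."""
    return ratio_p_value(observed, expected_ratio) < threshold


healthy = should_alert([100, 101, 99], (1, 1, 1))
broken = should_alert([100, 100, 40], (1, 1, 1))
```

With counts close to 1:1:1 the p-value stays near 1 and nothing fires. When the third log of the action goes missing in most runs, the p-value drops far below 0.001 and the sequence is reported.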
Summary
• Everyone will analyze their Big Data – including logs
• Hard to do by yourself – but extremely rewarding!
• Most importantly:
You can focus on your product instead of its bugs
Questions?
• Please feel free to contact me directly:
Lior Redlus, Chief Scientist, [email protected]
http://www.coralogix.com