machine learning concepts for software monitoring - lior redlus, coralogix - devopsdays tel aviv...

Machine Learning concepts for software monitoring Lior Redlus Co-founder and Chief Scientist Coralogix

Upload: devopsdays-tel-aviv

Post on 08-Jan-2017

Machine Learning concepts for software monitoring

Lior Redlus, Co-founder and Chief Scientist

Coralogix

About Myself

• 31 years old. Scientist at heart.

• B.Sc and M.Sc in Neuroscience and Information Processing (BIU)

• Co-founder and Chief Scientist @ Coralogix

About Coralogix

• A Machine Learning platform for software Log Analysis

• Log Management already included: indexing, querying, filtering, alerting etc.

• Coralogix Analytics:
  • Turns your data into patterns and flows
  • Gives you deep insights on your system
  • Automatically detects production problems

In this talk…

• We’ll explore some challenges in software logs today

• Have an overview of machine learning and some use cases

• Suggest a fully-automatic algorithm for anomaly detection in logs

Schedule:

• Logs today

• Machine Learning to the rescue!

• Types of Machine Learning

• Applying to Log Records

• Possible log analysis pipeline

Logs today (1)

• What do we use them for?
  • Debugging
  • Security
  • Compliance
  • User analytics
  • and many more!

• Two use cases stand out:
  • Production Monitoring (70%)
  • Production Troubleshooting (67%)

Logs today (2)

• Open-source software accelerates development

• Cloud enables massive scale

• Even small companies are generating huge amounts of logs

• The growth is exponential!

Logs today (3)

• Log Management (and Big Data) approach:

1. Collect everything!

2. Don’t worry, we’ll know what to do when we need it

• Or will we..?

Logs today (4)

• The problem with Log Management:

• Humans do the analysis

• And humans are bad at…
  • Identifying complex relationships
  • Noticing small (but important) changes
  • Staying 100% in focus all the time

Logs today (5)

• Too much time is wasted on FINDING issues instead of FIXING them

• Most DevOps engineers spend >70% of issue resolution time just finding what went wrong!

Logs today (6)

• Problem: Log Management does not have a “brain”

• Solution: Give it a brain!

In other words: welcome to Log Analytics

Machine Learning to the rescue

• What is Machine Learning?

“Field of study that gives computers the ability to learn without being explicitly programmed.”

- Arthur Samuel, pioneer of Machine Learning, 1959

Machine Learning to the rescue

• Traditional coding:
  • You have a model of the world
  • You write code that explicitly represents this model
  • The code behaves exactly as expected
  • You need to manually update the code in a changing world

Machine Learning to the rescue

• Machine Learning:
  • You have loose concepts about the world (or even none!)
  • You write code that learns from the data and builds models of the world
  • The exact behavior of the code is not known, but it generally works well
  • The model can be updated automatically as needed!

• How well?
  • Much faster than humans
  • Sometimes with better accuracy!

Types of Machine Learning – Supervised

• Supervised Learning:
  • Uses data with clearly-defined output (“labeled data”)
  • Machine learns explicitly through right and wrong answers

• Two main types:
  • Regression – Predict continuous values based on sets of (correlated) data
  • Classification – Predict the class of an item based on its properties

Types of Machine Learning – Regression (1)

• Regression 1 – Given the temperature and yogurt sold
  • Predict the temperature based on the amount of yogurt sold

• Linear regression:

[Figure: linear regression fit. Axes: Temperature (F), Frozen yogurt sold (lbs)]

Types of Machine Learning – Regression (2)

• Regression 2 – Given cups of coffee sold per 10 minutes
  • Predict how many cups are sold at any given time of the day

• Linear regression:

• Polynomial regression:

[Figure: linear and polynomial regression fits. Axes: Time of day (hours), Cups of coffee sold]

Types of Machine Learning – Classification (1)

• Classification: can we automatically identify the type of an iris?

• Assumption: we can differentiate iris types by the size of their leaves (sepals and petals)

Types of Machine Learning – Classification (2)

• Classification: given leaf sizes of irises (Fisher’s dataset, 1936)
  • Predict the type of an iris based on its leaves

Types of Machine Learning – Classification (3)

• Classification: Fisher’s iris dataset
  • Support Vector Machine (SVM) achieves 73% accuracy!

[Figure: SVM with linear kernel. Axes: Sepal Length (cm), Sepal Width (cm). Classes: setosa, versicolor, virginica]

Types of Machine Learning – Reinforcement (1)

• Reinforcement (reward-based) Learning:
  • A set of rules defines interaction with the environment
  • “Good” actions may grant rewards
  • “Bad” actions may reduce rewards
  • Machine tries to maximize this score

• Used in game bots, recommender systems etc.

Types of Machine Learning – Reinforcement (2)

• Recommender systems:
  • Build profiles for items and for users
  • Recommend an item to a user based on previous purchases
  • Gain rewards when users click on recommended items
  • Update profiles based on recommendations, ratings etc.

Types of Machine Learning – Reinforcement (3)

• Generally speaking, recommender systems offer similar things to similar users:

[Figure: two similar users, Jim and Bob]

Types of Machine Learning – Unsupervised (1)

• Problem:
  • Supervised learning is good, but requires labeled data
  • Most data in the world is not labeled, there’s no right/wrong answer
  • Labeling requires human effort → tedious and expensive

• Unsupervised Learning:
  • The machine automatically recognizes relationships in the data
  • No right or wrong answers are given
  • Many times used to enhance Supervised Learning

Types of Machine Learning – Unsupervised (2)

• Some approaches include:
  • Clustering algorithms: k-means, k-nearest-neighbors etc.
  • Anomaly detection of rare events
  • Deep learning (for pretty much everything…)

• Deep Learning approach:
  • Learn from a lot of non-labeled data
  • Learn highly non-linear correlations (represent complex relationships)
  • Surprisingly good results for many applications!

Types of Machine Learning – Unsupervised (3)

• Deep Learning: can we automatically cluster digits together?
  • Data: 60,000 b/w 20x20 pixel images of hand-written digits
  • Each image is “flattened” to a 1D vector of 400 floating point values [0..1]

[0.0, 0.0, 0.01, 0.07, 0.07, 0.07, 0.49, 0.65, 1.0, 0.97, …, 0.0, 0.0]
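
The flattening step can be sketched in a few lines. This assumes the source pixels are 8-bit grayscale values (0 to 255), which the slide does not state; the function name `flatten_image` and the sample image are made up for illustration.

```python
def flatten_image(pixels: list[list[int]]) -> list[float]:
    """Flatten a 2D grid of 0-255 grayscale pixels into a 1D vector scaled to [0, 1]."""
    return [value / 255.0 for row in pixels for value in row]


# Hypothetical 20x20 image: all black except one white pixel at row 3, column 8.
image = [[0] * 20 for _ in range(20)]
image[3][8] = 255

vector = flatten_image(image)
print(len(vector))  # 400
```

Rows are concatenated in order, so the pixel at (row, column) ends up at index `row * 20 + column` of the vector.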

Types of Machine Learning – Unsupervised (4)

• Deep Learning: can we automatically cluster digits together?
  • Image vectors are fed to the neural network

[0.0, 0.0, 0.01, 0.07, 0.07, 0.07, 0.49, 0.65, 1.0, 0.97, …, 0.0, 0.0]

[Figure: the image vector feeding the input layer of the neural network]

Types of Machine Learning – Unsupervised (5)

• Deep Learning: can we automatically cluster digits together?
  • The neural network automatically learns features of the images
  • Each neuron “lights up” when it recognizes a feature in the previous layer

[Figure: features recognized by neurons: round edges, vertical lines, diagonal lines, etc.]

Types of Machine Learning – Unsupervised (6)

• Deep Learning: can we automatically cluster digits together?
  • The last layer recognizes highly complex features of the image: the digits!
  • This method achieves an amazing 0.2% error rate in this task!

[Figure: the image vector [0.0, 0.0, 0.01, 0.07, 0.97, …, 0.0, 0.0] passes through the network. Output: 1]

Applying to Log Records (1)

• Problems:
  • Log data is very redundant
  • Hard to find the important events
  • Rare logs are a needle in the haystack

• Also:
  • Actions in the system are represented by a series of log records
  • But other logs interrupt the visual flow
  • Tracing the logs of a complete action is hard

Applying to Log Records (2)

• Solutions:
  • Identify log prototypes (“log templates”)
  • Cluster logs which represent an action
  • Alert when actions are incomplete or anomalous
  • Notify about new errors which have never occurred before

And much more!

Log prototypes distribution – real-world

• The 10 most frequent logs make up ~60% of the data (!)

[Figure: log frequency per log prototype. Axes: Log Prototypes, Log Frequency]
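
A figure like the ~60% above can be measured with a frequency count. The sketch below assumes each log record has already been mapped to a prototype ID; the function name `top_k_share` and the sample stream are hypothetical.

```python
from collections import Counter


def top_k_share(prototype_ids: list[str], k: int = 10) -> float:
    """Return the fraction of all log records covered by the k most frequent prototypes."""
    counts = Counter(prototype_ids)
    top = sum(count for _, count in counts.most_common(k))
    return top / len(prototype_ids)


# Hypothetical stream of prototype IDs: three prototypes, one of them dominant.
stream = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
print(top_k_share(stream, k=1))  # 0.6
```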

Log prototypes distribution – real-world

[Figure annotations on the distribution:]

Show me statistics and correlate these:

Alert me when these happen:

Log analysis pipeline – clustering

• Cluster log records (raw strings) into log prototypes:
  I. Find a distance metric to compare log records
  II. Create a new type of log if the distance is too large
  III. Find the variables within log types

Log 1: “Creating tag on Stream: -1 Position: 42”

Log 2: “Creating tag on Stream: 2 Position: 65”
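
Steps I and II can be sketched as follows. The talk does not name its distance metric, so this example assumes a simple one: the fraction of whitespace-separated tokens that differ, with a hypothetical threshold of 0.5. The function names are made up for illustration.

```python
def distance(a: list[str], b: list[str]) -> float:
    """Fraction of token positions that differ; 1.0 if the token counts differ."""
    if len(a) != len(b):
        return 1.0
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(logs: list[str], max_distance: float = 0.5) -> list[list[str]]:
    """Assign each log to the first cluster whose representative is close enough."""
    clusters: list[list[str]] = []
    for log in logs:
        tokens = log.split()
        for members in clusters:
            if distance(tokens, members[0].split()) <= max_distance:
                members.append(log)
                break
        else:
            clusters.append([log])  # too far from every cluster: a new log type
    return clusters


logs = [
    "Creating tag on Stream: -1 Position: 42",
    "Creating tag on Stream: 2 Position: 65",
    "Connection closed by peer",  # hypothetical unrelated log
]
for members in cluster(logs):
    print(len(members), members[0])
```

The two "Creating tag" records differ in 2 of 7 tokens and share a cluster; the third record has a different token count and starts a new one.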

Log analysis pipeline – clustering

• Problem: comparing all log sub-strings is expensive!
• Solution: use heuristic distance methods

“Creating tag on Stream: -1 Position: 42” → {Creating} {tag} {on} {Stream:} {-1} ...

[Figure: locality-sensitive hashing (LSH) maps each tokenized log to a bit string such as 0011000010…0100, giving Log 1 Hash, Log 2 Hash, …, Log n Hash]
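
The slide names locality-sensitive hashing but not a specific scheme. As one possible illustration, the sketch below uses SimHash, an LSH variant in which logs sharing most tokens tend to get fingerprints that differ in few bits. The choice of SimHash, MD5 as the token hash, and the 64-bit width are all assumptions.

```python
import hashlib

BITS = 64  # fingerprint width (an assumption; the talk does not give one)


def simhash(tokens: list[str]) -> int:
    """Compute a SimHash fingerprint: each token votes +1 or -1 on every bit."""
    weights = [0] * BITS
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
        for i in range(BITS):
            weights[i] += 1 if (value >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


h1 = simhash("Creating tag on Stream: -1 Position: 42".split())
h2 = simhash("Creating tag on Stream: 2 Position: 65".split())
print(f"{h1:064b}")
print(hamming(h1, h2))
```

Comparing fixed-width fingerprints by Hamming distance is much cheaper than comparing all sub-strings of every pair of logs.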

Log analysis pipeline – clustering

• Result: M raw log records → N log prototypes (N << M)

• M is in the billions; N is in the thousands

“Creating tag on Stream: -1 Position: 42”

“Creating tag on Stream: 2 Position: 65”

“Creating tag on Stream: {var1} Position: {var2}”
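
Deriving the prototype from clustered records can be sketched as below, assuming whitespace tokenization and that all records in a cluster have the same number of tokens. The function name `extract_template` is hypothetical.

```python
def extract_template(logs: list[str]) -> str:
    """Replace token positions whose values differ across logs with {varN} placeholders."""
    token_rows = [log.split() for log in logs]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("all logs must have the same number of tokens")
    template, var_index = [], 0
    for column in zip(*token_rows):
        if len(set(column)) == 1:
            template.append(column[0])  # constant token: part of the prototype
        else:
            var_index += 1
            template.append(f"{{var{var_index}}}")  # varying token: a variable
    return " ".join(template)


print(extract_template([
    "Creating tag on Stream: -1 Position: 42",
    "Creating tag on Stream: 2 Position: 65",
]))
# Creating tag on Stream: {var1} Position: {var2}
```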

Log analysis pipeline – variable statistics

• Model distribution of variables within log prototypes
• Define anomaly boundaries

“Creating tag on Stream: {var1} Position: {var2}”

Values              Variable
[-1, 2, …, 1]       var1
[42, 65, …, 53]     var2

[Figure: distribution of variable values, with anomalous values marked]
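
The talk does not say how the variable distributions are modeled. One common choice, assumed here, is to treat a numeric variable as roughly normal and flag values more than three standard deviations from the mean. The history values beyond 42, 65 and 53 are made up.

```python
from statistics import mean, stdev


def anomaly_bounds(values: list[float], k: float = 3.0) -> tuple[float, float]:
    """Return (low, high) bounds at k sample standard deviations around the mean."""
    mu, sigma = mean(values), stdev(values)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value: float, bounds: tuple[float, float]) -> bool:
    """True if the value falls outside the anomaly boundaries."""
    low, high = bounds
    return value < low or value > high


# Hypothetical history for var2 (the slide only shows 42, 65 and 53).
history = [42, 65, 53, 48, 60, 55, 51, 58]
bounds = anomaly_bounds(history)
print(is_anomalous(57, bounds), is_anomalous(5000, bounds))  # False True
```

A categorical variable such as a stream ID would need a different model, for example flagging values never seen before.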

Log analysis pipeline – sequence finding

• Find sequences of log prototypes that are statistically related

• Independence assumption – if logs are unrelated, all pairs should have the same probability

• Sequences of related logs will have higher counts and fail the G-test of independence
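
The G-test compares observed counts O with the counts E expected under independence, using G = 2 Σ O ln(O / E). The sketch below applies it to a 2x2 table for one pair of log prototypes. The table layout and all counts are assumptions for illustration; 10.828 is the chi-square critical value for 1 degree of freedom at p = 0.001.

```python
import math


def g_statistic(table: list[list[int]]) -> float:
    """G = 2 * sum(O * ln(O / E)) over a contingency table of observed counts."""
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g


# Hypothetical counts of adjacent log pairs (first log, next log):
#            next is B, next is not B
related = [[90, 10],     # first is A
           [10, 890]]    # first is not A
unrelated = [[10, 90],
             [90, 810]]

CRITICAL_P001_DF1 = 10.828  # chi-square critical value, 1 degree of freedom, p = 0.001
print(g_statistic(related) > CRITICAL_P001_DF1)    # True
print(g_statistic(unrelated) > CRITICAL_P001_DF1)  # False
```

In the `unrelated` table every observed count equals its expected count, so G is zero and independence is not rejected.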

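Log analysis pipeline – sequence finding

[Figure: a purchase action drawn as a flow of log prototypes numbered 1 to 7, with number 5 appearing twice. Node labels: Purchase request, Get cart from DB, Process DB response, Authenticate payment, Update BI system 1, Update BI system 2, Send response to client, Mark as complete]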

Log analysis pipeline – sequence finding

• Count all log sequences of length 2 (2-sequences)
  • L1 L2 will be a frequent 2-sequence
  • We expect not to find any occurrences of L1 L4
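
Counting 2-sequences can be sketched with a counter over adjacent pairs. This assumes the stream has already been converted to prototype IDs and, as a simplification, that related logs appear next to each other; in real logs other records interleave. The sample stream is hypothetical.

```python
from collections import Counter


def count_2_sequences(prototype_stream: list[str]) -> Counter:
    """Count every pair of adjacent log prototypes in the stream."""
    return Counter(zip(prototype_stream, prototype_stream[1:]))


# Hypothetical stream: the action L1 -> L2 -> L3 -> L4 runs three times.
stream = ["L1", "L2", "L3", "L4"] * 3
pairs = count_2_sequences(stream)
print(pairs[("L1", "L2")], pairs[("L1", "L4")])  # 3 0
```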

Log analysis pipeline – sequence finding

• After mapping all 2-sequences, normalize their scores:
  • Subtract the average
  • Divide by the standard deviation

• Try to lengthen all 2-sequences by one log to 3-sequences

$$\hat{S}_i^{k} = \frac{\mathrm{Freq}(S_i^{k}) - \mu(S^{k})}{\sigma(S^{k})}$$
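
The normalization is a z-score over the frequencies of all sequences of the same length. A minimal sketch, assuming the population standard deviation is meant (the slide does not specify) and using made-up counts:

```python
from statistics import mean, pstdev


def normalize_scores(freq: dict[tuple[str, ...], int]) -> dict[tuple[str, ...], float]:
    """Z-score each sequence frequency: (Freq - mean) / standard deviation."""
    counts = list(freq.values())
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return {seq: 0.0 for seq in freq}  # all frequencies equal: nothing stands out
    return {seq: (count - mu) / sigma for seq, count in freq.items()}


# Hypothetical 2-sequence frequencies.
freq = {("L1", "L2"): 6, ("L2", "L3"): 6, ("L4", "L1"): 2, ("L3", "L1"): 2}
scores = normalize_scores(freq)
print(scores[("L1", "L2")], scores[("L4", "L1")])  # 1.0 -1.0
```

Normalizing makes scores comparable across sequence lengths, since longer sequences naturally have lower raw counts.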

Log analysis pipeline – sequence finding

• Repeat the process:

• For each k-sequence try to construct a longer (k+1)-sequence

• Stop when failing the G-Test or when the normalized score decreases:

$$\hat{S}_i^{k+1} < \hat{S}_i^{k}$$

• Save the k-sequence as valid (an action in the system)

Log analysis pipeline

• Determine the ratio of each log within the sequence
  • E.g. 1:1:1 is a 3-sequence where the ratio of each log prototype is the same

• In our example:
  • 1:1:1:1:2:1:1, a 7-sequence with one log prototype expected twice as much as the others

Log analysis pipeline

• Alert about a sequence anomaly when the ratio is distant enough from the valid sequence, e.g. p < 0.001

• Software is constantly changing – update all models all the time

• Of course, there is much more than we explored here!
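
The talk does not say which test produces the p-value for a ratio anomaly. One option, assumed here, is a G-test goodness of fit of the observed prototype counts against the expected 1:1:1:1:2:1:1 ratio; 22.458 is the chi-square critical value for 6 degrees of freedom at p = 0.001. The observed counts are hypothetical.

```python
import math

EXPECTED_RATIO = [1, 1, 1, 1, 2, 1, 1]  # the 7-sequence from the example
CRITICAL_P001_DF6 = 22.458  # chi-square critical value, 6 degrees of freedom, p = 0.001


def ratio_g_statistic(observed: list[int], ratio: list[int]) -> float:
    """G-test goodness of fit of observed prototype counts against the expected ratio."""
    total, ratio_total = sum(observed), sum(ratio)
    g = 0.0
    for count, part in zip(observed, ratio):
        if count > 0:  # a zero count contributes nothing to the sum
            expected = total * part / ratio_total
            g += count * math.log(count / expected)
    return 2.0 * g


def is_sequence_anomalous(observed: list[int]) -> bool:
    """Alert when the observed ratio deviates from the valid one at p < 0.001."""
    return ratio_g_statistic(observed, EXPECTED_RATIO) > CRITICAL_P001_DF6


healthy = [100, 100, 100, 100, 200, 100, 100]
broken = [100, 100, 100, 100, 200, 40, 40]  # the last two steps are often missing
print(is_sequence_anomalous(healthy), is_sequence_anomalous(broken))  # False True
```

Since software keeps changing, the expected ratio would itself need to be re-learned over time rather than fixed as a constant.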

Summary

• Everyone will analyze their Big Data – including logs

• Hard to do by yourself – but extremely rewarding!

• Most importantly:

You can focus on your product instead of its bugs

Questions?

• Please feel free to contact me directly:

Lior Redlus, Chief Scientist, [email protected]

http://www.coralogix.com