debugging skynet: a machine learning approach to log analysis - ianir ideses, logz.io - devopsdays...


Upload: devopsdays-tel-aviv

Post on 08-Jan-2017


Page 1: Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses, logz.io - DevOpsDays Tel Aviv 2016

Debugging Skynet

A Machine Learning Approach to Log Analysis

Ianir Ideses - Logz.io


The Problem - Overlogging

• Millions of logs per week

• Important logs get lost in the clutter

• Need to surface the relevant logs, deemphasize irrelevant logs


Proposed Solution

• A Machine Learning approach

• Can sift through large amounts of data

• Can evolve and react to changes in data

• Requires large amounts of data to be effective


Machine Learning

• Unsupervised
  • Clustering
  • Anomaly detection

• Supervised
  • Recommender systems
  • Classifiers


Unsupervised Machine Learning

• No labels are needed, just lots of data

• Useful for reducing a large number of data points to a smaller set of clusters


Unsupervised Machine Learning

"GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.Confi
"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.
"GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
"GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
"GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
"GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.
"GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Configuratio
"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
"GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732

"GET /app_dev.php/ HTTP/1.1" 200 6715 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
"GET /bundles/framework/css/body.css HTTP/1.1" 200 6657 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.231
"GET /bundles/framework/css/structure.css HTTP/1.1" 200 1191 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.
"GET /bundles/acmedemo/css/demo.css HTTP/1.1" 200 2204 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311
"GET /bundles/acmedemo/images/welcome-quick-tour.gif HTTP/1.1" 200 4770 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko)
"GET /bundles/acmedemo/images/welcome-demo.gif HTTP/1.1" 200 4053 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrom

Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000
Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000
Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555
Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777


Supervised Machine Learning

• Learning from labeled examples

• Requires a well-defined question:
  • Is this email spam?
  • Is this object a car?
  • Is this log interesting?

• Deployed successfully in many domains; the most notable classifiers are neural networks (NN), support vector machines (SVM), and Bayesian classifiers


Supervised Machine Learning - SVM

• Data elements are arranged in vectors

• Each vector index is assigned a weight in the training phase

• A score is computed by summing up the relevant weights

Term weights:

connection: 0.1
error: 0.5
success: -0.9
failure: 0.3

“Connection failure”: 0.1 + 0.3 = 0.4

“Connection success”: 0.1 - 0.9 = -0.8
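The scoring rule on this slide can be sketched in a few lines of Python. The four weights are the ones shown on the slide; the function name `score`, the whitespace tokenization, and the treatment of each term as present or absent are assumptions made for illustration, and no bias term is included because the slide does not show one.

```python
# Per-term weights as shown on the slide (learned during training).
WEIGHTS = {"connection": 0.1, "error": 0.5, "success": -0.9, "failure": 0.3}


def score(message, weights=WEIGHTS):
    """Sum the weights of the known terms that occur in the message.

    Assumption: a term counts once no matter how often it appears,
    and unknown terms contribute nothing.
    """
    terms = set(message.lower().split())
    return sum(weight for term, weight in weights.items() if term in terms)


if __name__ == "__main__":
    for msg in ("Connection failure", "Connection success"):
        print(f"{msg!r}: {score(msg):+.1f}")
```

A positive score pushes a log toward the "interesting" class and a negative score away from it; where to put the decision threshold is a separate choice that the slide does not specify.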


Log Relevancy

• An ill-posed problem

• Relevancy is user specific

• People tend to search for known issues

• There are also unknown unknowns

• Labels are potentially very tedious to acquire


Proposed Solution - Labels

• Acquiring labels:
  • Implicit/explicit user behavior
  • Inter-user similarities
  • Public knowledge bases


Machine Learning in Practice

• Data is textual, numerical and alphanumerical

• Classifiers that have shown good results:
  • Random Forests, which resemble flow-chart decision making
  • Linear SVM

• Both classifiers are easy to interpret in the feature space


Machine Learning in Practice

connected: -0.157199772246
to provider: -0.15319903564
connected successfully: -0.15319903564

unable: 0.671539714688
topic: 0.678756599452
error: 0.788508324168
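Per-n-gram weights like the ones listed above are what a linear model exposes after training. The following is a minimal sketch of how such weights could be produced, not the speaker's pipeline: it trains a linear classifier by subgradient descent on the L2-regularized hinge loss (the linear SVM objective) in plain Python instead of calling an SVM library. The training sentences, the labels, the hyperparameters, and the names `ngrams`, `train`, and `score` are all invented for illustration.

```python
from collections import defaultdict

# Toy labeled logs, invented for illustration: +1 = interesting, -1 = routine.
SAMPLES = [
    ("connected successfully to provider", -1),
    ("connected to provider", -1),
    ("connected successfully", -1),
    ("unable to connect to provider", +1),
    ("error reading topic", +1),
    ("error unable to fetch topic", +1),
]


def ngrams(message, n_max=2):
    """Return the set of lowercased unigrams and bigrams in a message."""
    tokens = message.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def train(samples, epochs=20, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for message, label in samples:
            feats = ngrams(message)
            margin = label * sum(weights[f] for f in feats)
            # Regularization step: shrink every weight slightly.
            for f in list(weights):
                weights[f] *= 1.0 - lr * reg
            # Hinge step: update only when the margin is violated.
            if margin < 1.0:
                for f in feats:
                    weights[f] += lr * label
    return dict(weights)


def score(message, weights):
    """Sum the learned weights of the n-grams present in the message."""
    return sum(weights.get(f, 0.0) for f in ngrams(message))


if __name__ == "__main__":
    learned = train(SAMPLES)
    # Print features from most "routine" to most "interesting".
    for feature, weight in sorted(learned.items(), key=lambda kv: kv[1]):
        print(f"{feature}: {weight:.3f}")
```

Because every feature is a readable n-gram, sorting the weights gives a direct view of what the model treats as routine (negative) or interesting (positive), which is the interpretability property the slides point to.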


Machine Learning in Practice - Modules

• Log normalization

• Label acquisition

• Model training

• Log classification and enhancement


Log Normalization

• Lowercase, stem, remove stop words

• Identify common fields (timestamp, severity, etc.)

• Identify variable, function, and class names

• Identify known reserved words

• Cluster logs that share the same prototype
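The normalization steps above can be illustrated on the syslog-style sample lines from the earlier slide. This is a rough sketch under several assumptions: the regular expression, the tiny stop-word list, the `<num>` placeholder, and the names `prototype` and `cluster` are hypothetical, and stemming is left out (a real pipeline would likely use a library stemmer).

```python
import re
from collections import defaultdict

# Assumed syslog-like layout: timestamp, host, program[pid]: message.
SYSLOG = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<program>[^\s\[]+)\[(?P<pid>\d+)\]: (?P<message>.*)$"
)
STOP_WORDS = {"a", "an", "the", "by", "of", "to"}  # illustrative only

LINES = [
    "Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000",
    "Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000",
    "Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555",
    "Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777",
]


def prototype(line):
    """Reduce a log line to a prototype with the variable parts masked."""
    match = SYSLOG.match(line.strip())
    if match:
        # Common fields are identified; timestamp, host and pid are dropped.
        program, message = match["program"], match["message"]
    else:
        program, message = "", line
    message = re.sub(r"\d+", "<num>", message.lower())
    body = " ".join(t for t in message.split() if t not in STOP_WORDS)
    return f"{program.lower()}: {body}" if program else body


def cluster(lines):
    """Group raw lines that share the same prototype."""
    groups = defaultdict(list)
    for line in lines:
        groups[prototype(line)].append(line)
    return dict(groups)


if __name__ == "__main__":
    for proto, members in cluster(LINES).items():
        print(f"{len(members)} x {proto}")
```

With these rules, the two "Program started" lines from different hosts and users collapse into one prototype, while "terminated" and "stopped" stay separate because no stemming or synonym handling is applied.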


Labeler

• Different sources for labels
  • CQA (community question answering) sites
  • Explicit user interaction
  • Implicit user interaction
  • Heuristics


Log Enhancer

• Use knowledge about log events to add prior data

• Suggest solutions to known problems

• Tag relevant logs for display to the user
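One way to picture the enhancer is as a function that takes a raw log plus a classifier score and returns the log with tags and, where a known problem matches, a suggested fix. The sketch below is an assumption-laden illustration: the `KNOWN_ISSUES` entries, the suggestion texts, the tag names, the threshold, and the `EnhancedLog` and `enhance` names are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical knowledge base: phrase found in a log -> suggested fix.
KNOWN_ISSUES = {
    "connection refused": "Check that the target service is up and reachable.",
    "no space left on device": "Free disk space or rotate old files.",
}


@dataclass
class EnhancedLog:
    raw: str
    relevant: bool
    tags: List[str] = field(default_factory=list)
    suggestion: Optional[str] = None


def enhance(raw, score, threshold=0.0):
    """Attach a relevance flag, tags and a suggestion to a raw log line.

    `score` is assumed to come from a previously trained classifier.
    """
    log = EnhancedLog(raw=raw, relevant=score > threshold)
    text = raw.lower()
    for phrase, suggestion in KNOWN_ISSUES.items():
        if phrase in text:
            log.tags.append("known-issue")
            log.suggestion = suggestion
            break
    if log.relevant:
        log.tags.append("relevant")
    return log


if __name__ == "__main__":
    print(enhance("ERROR Connection refused by 10.0.0.5", score=0.8))
    print(enhance("Connected successfully", score=-0.4))
```

Keeping the knowledge-base lookup separate from the classifier score means a log can carry a suggested fix even when the model does not flag it as relevant.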


Flow

[Diagram. Stages: Log Normalization, Labeler, ML - Training, Log Enhancer. Connector labels: Logs, Classifiers, Logs.]


Machine Learning at Scale

• Use Spark to drive high throughput, high scale

• Terabytes of data, daily

• Spot Instances to keep costs at bay


To Sum Up

• Formulate your question

• Get enough data

• Get enough labels

• Clean data

• Train your classifier
