Web Intrusion Detection with Bayesian Network by Kanatoko, AVTokyo 2013.5 (English slides)
Copyright (c) Bitforest Co., Ltd.
Web Intrusion Detection with Bayesian Network
Kanatoko
Chief Tech Officer
Bitforest Co., Ltd.
@kinyuka
http://www.jumperz.net/
http://www.scutum.jp/
02/17/14
Who am I?
– Kanatoko
– Web Application Firewall Developer
– My mission: Building an accurate WAF
• Reduce false positives/false negatives
Bayes’ theorem
– Used when we want to calculate P(A|B) when P(B|A) is known
– P(B|A): the probability of event B given event A
– Not so hard to understand
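For reference, the theorem written out in the slide's notation. This is the standard statement; P(A) and P(B) are the unconditional probabilities of each event.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```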
What is a Bayesian Network?
– A probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a graph (Wikipedia)
(Figure: example network with nodes AVTokyo, Hacker, Drunken, Beer in hand)
Famous sprinkler example
• Nodes and edges represent cause and effect
• Probabilities are shown as tables (CPT: conditional probability table)
• Observations (= evidence) are used as input to nodes
• Unobservable nodes are used as output (= what we want to know)
• “Grass is wet. What is the probability it rained?”
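The question on this slide can be answered by summing the joint distribution over the unobserved node. The sketch below does that in plain Python rather than in Weka. The CPT numbers are an assumption: they are the values commonly quoted for this textbook example, not figures from the slides.

```python
from itertools import product

# Assumed CPTs (commonly quoted textbook values, not from the slides).
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}  # P(sprinkler on | rain)
P_WET = {  # P(wet | sprinkler, rain), keyed by (sprinkler, rain)
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def joint(rain, sprinkler, wet):
    """Joint probability of one full assignment, using the chain rule."""
    p_r = P_RAIN if rain else 1 - P_RAIN
    p_s_on = P_SPRINKLER_GIVEN_RAIN[rain]
    p_s = p_s_on if sprinkler else 1 - p_s_on
    p_w_true = P_WET[(sprinkler, rain)]
    p_w = p_w_true if wet else 1 - p_w_true
    return p_r * p_s * p_w


def p_rain_given_wet():
    """P(rain | wet): the sprinkler node is unobserved, so sum it out."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den


print(f"P(rain | wet) = {p_rain_given_wet():.4f}")  # 0.3577
```

Enumeration like this only scales to tiny networks; it is shown here because it makes the role of each CPT visible.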
Weka
– OSS, Java, data mining software
– GUI/lib/tools
– (Sprinkler Demo)
Web Intrusion Detection with Bayesian Network
• Probability that the HTTP request is an attack: 1%
• Probability that the HTTP request is NOT an attack: 99%
• Probability that the HTTP request contains ‘alert’ given that the request is an attack: 8%
• Probability that the HTTP request does NOT contain ‘alert’ given that the request is an attack: 92%
• Probability that the HTTP request contains ‘alert’ given that the request is NOT an attack: 0.2%
• Probability that the HTTP request does NOT contain ‘alert’ given that the request is NOT an attack: 99.8%
What is the probability that the HTTP request is an attack? 1%
What is the probability that the HTTP request is an attack, given that the HTTP request contains ‘alert’? 28.8%
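The 28.8% figure follows directly from Bayes' theorem and the numbers on this slide (prior 1%, ‘alert’ in 8% of attacks and in 0.2% of non-attacks). A minimal sketch of the arithmetic; the function name `posterior` is only illustrative.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem for a binary hypothesis H and one piece of evidence."""
    num = p_evidence_given_h * prior
    den = num + p_evidence_given_not_h * (1 - prior)
    return num / den


# H = "request is an attack", evidence = "request contains 'alert'"
p = posterior(prior=0.01, p_evidence_given_h=0.08, p_evidence_given_not_h=0.002)
print(f"{p:.1%}")  # 28.8%
```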
Spam filter and Naïve Bayes
Building Accurate Intrusion Detection System / Web Application Firewall
– Signature-based (blacklist)
• If ‘alert’ then die!
• Simple and has some advantages
    – Clear
    – Performance: stable / fast enough
    – Maintainable / human readable
• Disadvantage: high false positive rate
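A minimal sketch of the "if ‘alert’ then die" rule, to show where the false positives come from. The signature list and both sample requests are hypothetical.

```python
SIGNATURES = ("alert",)  # hypothetical one-entry blacklist


def is_blocked(request: str) -> bool:
    """Block the request if any signature appears anywhere in it."""
    text = request.lower()
    return any(sig in text for sig in SIGNATURES)


attack = "GET /search?q=<script>alert(1)</script>"
benign = "GET /settings?notify=email_alert"

print(is_blocked(attack))  # True
print(is_blocked(benign))  # True: a false positive
```

The rule is clear, fast and easy to read, but it cannot tell the two requests apart.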
Building Accurate Intrusion Detection System / Web Application Firewall (cont.)
– Threshold model (vs. simple signature/blacklist model)
• Increment/decrement a score on each signature match
• Treated as an attack when the total score exceeds a certain threshold
• Low false positives (good)
• Hard to change/maintain (bad)
• Example rule 1: score +5 on ‘UNION’
• Example rule 2: score +5 on ‘SELECT’
• When both ‘UNION’ and ‘SELECT’ are found… score +10?
• Example rule 3: score +20 on ‘UNION and SELECT’
• Too complicated
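The three example rules can be sketched as a small scoring function. The scores come from the slide; the threshold of 15 and the whitespace tokenization are assumptions made for illustration.

```python
# (required tokens, score) pairs; scores are taken from the example rules.
RULES = [
    ({"union"}, 5),
    ({"select"}, 5),
    ({"union", "select"}, 20),  # combination rule
]
THRESHOLD = 15  # hypothetical value


def score(request: str) -> int:
    """Sum the scores of every rule whose tokens all appear in the request."""
    tokens = set(request.lower().split())
    return sum(points for required, points in RULES if required <= tokens)


def is_attack(request: str) -> bool:
    return score(request) > THRESHOLD


print(score("credit union"))                           # 5
print(score("select your seat"))                       # 5
print(score("id=1 union select password from users"))  # 30
```

Note that the combined request scores 30 because rules 1 and 2 still fire alongside rule 3. Every new combination rule interacts with the existing ones like this, which is the maintenance problem the slide describes.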
Building Accurate Intrusion Detection System / Web Application Firewall (cont.)
– Threshold model (vs. simple signature/blacklist model)
– Score +5 on ‘Alert’ (XSS)
– Score +5 on ‘UNION’ (SQLi)
– Score +10 on “Alert UNION”?
– Should distinguish XSS and SQLi (classes)
Building Accurate Intrusion Detection System / Web Application Firewall (cont.)
– Bayesian Network
• Resolves almost all problems of the threshold model
Advantages of Bayesian Network
– Complicated relations can be modeled as a network (GUI)
– Computation result is expressed as a probability
– Easy to maintain
– Corresponds to expert knowledge
Complicated relations can be modeled as a network (GUI)
– One-to-many and weak/strong relations can be expressed
– Models can be developed in a GUI tool and then used to compute the probabilities
– We use Weka Bayesian Network Editor
– Example: XSS/CMS
– Example: VA/User in Japan
– Example: ‘eval’ and programming languages (Java, Ruby, JavaScript, Perl, PHP…)
Computation result is expressed as probability
– ‘UNION’ only (not special)
– ‘SELECT’ only (not special)
– Both ‘UNION’ and ‘SELECT’ (should be marked)
– The probability of a ‘rare case’ is calculated as high by Bayes’ theorem
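A small sketch of why the rare combination comes out high. One evidence node holds the joint presence of ‘UNION’ and ‘SELECT’, conditioned on the class. Every number in the CPT, and the 1% prior, is hypothetical and chosen only to illustrate the effect.

```python
PRIOR_ATTACK = 0.01  # hypothetical prior

# P(UNION present, SELECT present | class); keys are (union, select).
# All values are hypothetical; each row sums to 1.
CPT = {
    "attack": {(True, True): 0.30, (True, False): 0.02,
               (False, True): 0.03, (False, False): 0.65},
    "benign": {(True, True): 0.0001, (True, False): 0.01,
               (False, True): 0.02, (False, False): 0.9699},
}


def p_attack(union: bool, select: bool) -> float:
    """Posterior P(attack | observed keywords) by Bayes' theorem."""
    a = CPT["attack"][(union, select)] * PRIOR_ATTACK
    b = CPT["benign"][(union, select)] * (1 - PRIOR_ATTACK)
    return a / (a + b)


print(f"UNION only:  {p_attack(True, False):.3f}")
print(f"SELECT only: {p_attack(False, True):.3f}")
print(f"both:        {p_attack(True, True):.3f}")
```

With these assumed values, each keyword alone gives a posterior of about 2% or less, while the pair gives about 97%. The pair scores high because it is very rare in benign traffic, and no hand-tuned combination score is needed.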
Easy to maintain
– Intermediate nodes (mediating variables) play an important role
– Influences are as expected when we update the values in the CPT
– Can be improved little by little because, unlike a neural network, it is not a black box
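As an illustration of both points, the sketch below uses a hypothetical three-node chain, Attack → XSS → ‘alert’, where XSS is the intermediate node. All probabilities are assumptions, including the simplification that benign requests are never XSS.

```python
PRIOR_ATTACK = 0.01  # hypothetical prior


def p_attack_given_alert(p_xss_given_attack, p_alert_given_xss,
                         p_alert_given_no_xss):
    """P(attack | 'alert' seen) for the chain Attack -> XSS -> 'alert'.

    Assumes benign requests are never XSS, so for them the keyword
    appears with probability p_alert_given_no_xss.
    """
    # Sum out the intermediate XSS node for the attack class.
    p_alert_attack = (p_xss_given_attack * p_alert_given_xss
                      + (1 - p_xss_given_attack) * p_alert_given_no_xss)
    p_alert_benign = p_alert_given_no_xss
    num = p_alert_attack * PRIOR_ATTACK
    return num / (num + p_alert_benign * (1 - PRIOR_ATTACK))


before = p_attack_given_alert(0.3, 0.25, 0.002)
# Edit one CPT entry: 'alert' is now believed more typical of XSS.
after = p_attack_given_alert(0.3, 0.50, 0.002)
print(f"{before:.3f} -> {after:.3f}")  # 0.278 -> 0.433
```

Raising a single CPT entry moves the posterior in the direction an expert would expect, which is the sense in which the model can be tuned step by step.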
Corresponds to expert knowledge
“If A and B, then maybe C …” is expressed as a probability
Similarity between the human decision-making process and a Bayesian Network
Conclusion
Bayesian Network can be used to make decisions based on observations
If a human (expert) can detect attacks,
then we want the computer to do the same
Use Bayesian Network!
We’re hiring!
– Bitforest Co., Ltd.
– Web Application Security Expert
– Data Science Expert
– Contact @kinyuka