five-sigma network events (and how to find them)
TRANSCRIPT
![Page 1: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/1.jpg)
Five-sigma Network Events (and how to find them)
John O'Neil
Edgewise Networks
Halloween, 2018
![Page 2: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/2.jpg)
Networks are Complex
• No one knows what's going on
![Page 3: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/3.jpg)
Finding the Strange & Unusual
• Or the new & unexpected 🎃
• …and if it’s different, it might be bad. ☠
• Outlier: improbable data point in the expected distribution 🦔
• Anomaly: data point generated by a different distribution 👽
![Page 4: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/4.jpg)
Mr. Splanky
![Page 5: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/5.jpg)
Anomaly & Outlier Tools
“If you want something done right, do it yourself.” (Charles-Guillaume Étienne)
![Page 6: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/6.jpg)
Using Python
• Interpretable pseudocode
• Mature libraries
• Easy to install
• Fast enough 😀
![Page 7: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/7.jpg)
Creating Tools for Outlier Detection
• Introducing a few tools written in Python
• Intended to answer interesting questions and scale well
• Easy to modify/improve to satisfy your curiosity
• A starting point for your own tools
• Code is available at: http://github.com/EdgewiseNetworks/five-sigma
![Page 8: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/8.jpg)
Discover Bad Things Before Big Problems
• Keep track of netflows across machines and across time
• Well enough to recognize unusual things
• But too much information
• And make it tunable
![Page 9: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/9.jpg)
Standard Deviation
$5\sigma \approx 10^{-6}$
$3.29\sigma \approx 10^{-3}$
The amount of “spread” in a (usually Gaussian) distribution.
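The two figures above match the two-sided tail probability of a Gaussian, that is, the chance that a value lies more than N standard deviations from the mean in either direction. A minimal sketch that reproduces them with the standard library follows; the function name `two_sided_tail` is illustrative and not taken from the talk's repository.

```python
import math


def two_sided_tail(n_sigma: float) -> float:
    """Return P(|Z| > n_sigma) for a standard normal variable Z."""
    # erfc(x / sqrt(2)) is exactly the two-sided Gaussian tail probability.
    return math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    for n_sigma in (3.29, 5.0):
        print(f"{n_sigma} sigma -> tail probability {two_sided_tail(n_sigma):.2e}")
```

Read this as a false-alarm budget: if the data really were Gaussian, a 3.29σ threshold would flag roughly one observation in a thousand by chance alone.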
![Page 10: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/10.jpg)
Project Overview
1. Create a feed of typical netflows
   • Based on real netflows but anonymized
   • Format: timestamp, src_ip, src_port, dest_ip, dest_port, flow_count
2. Create a consumer for these netflows.
3. Create a number of consumer tools to track interesting statistics.
   • Standard deviations
   • Update period
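As a rough illustration of step 2, a consumer for records in the format above might look like the sketch below. It assumes one comma-separated record per line and a numeric timestamp. Neither detail is confirmed by the slides, and the names `NetFlow`, `parse_netflow` and `consume` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class NetFlow(NamedTuple):
    """One record: timestamp, src_ip, src_port, dest_ip, dest_port, flow_count."""
    timestamp: float   # assumed numeric (e.g. epoch seconds)
    src_ip: str
    src_port: int
    dest_ip: str
    dest_port: int
    flow_count: int


def parse_netflow(line: str) -> NetFlow:
    """Parse one comma-separated line into a NetFlow record."""
    ts, src_ip, src_port, dest_ip, dest_port, count = (
        field.strip() for field in line.split(",")
    )
    return NetFlow(float(ts), src_ip, int(src_port), dest_ip, int(dest_port), int(count))


def consume(lines: Iterable[str]) -> Iterator[NetFlow]:
    """Yield parsed records, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_netflow(line)


# Made-up sample records for illustration only.
sample_feed = [
    "1540944000,10.0.0.5,51234,10.0.0.9,443,3\n",
    "\n",
    "1540944001, 10.0.0.5, 51235, 10.0.0.9, 22, 1\n",
]
flows = list(consume(sample_feed))
```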
![Page 11: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/11.jpg)
Examples Of Useful Information
1. Does an IP address keep scanning for new open ports?
2. Did an IP address suddenly get a lot busier than it’s ever been in the past?
3. Did an IP address suddenly get a lot busier than any other IP address?
4. Shouldn’t this IP address have stopped doing new things by now?
![Page 12: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/12.jpg)
Tools To Use: Sketching & Streaming
• Lots of data to keep track of
• But we're only interested in certain aspects of it
  - Set cardinality: HyperLogLog
  - Incremental means & standard deviations
  - Online linear regression
• Make big data into small data
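To make "big data into small data" concrete, here is a minimal HyperLogLog sketch. It estimates the number of distinct items from a fixed array of small registers. It is a simplified teaching version, not the implementation in the talk's repository: the hash (SHA-1 truncated to 64 bits), the default register count, and the `add`/`cardinality`/`clear` method names are assumptions chosen to mirror the pseudocode on the later slides.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers (assumes p >= 7)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> (64 - self.p)                     # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def cardinality(self) -> float:
        alpha = 0.7213 / (1.0 + 1.079 / self.m)        # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

    def clear(self) -> None:
        self.registers = [0] * self.m


hll = HyperLogLog()
for i in range(5000):
    hll.add(f"10.0.{i // 250}.{i % 250}:443")   # 5000 distinct made-up destinations
estimate = hll.cardinality()
```

With 4096 registers the expected relative error is on the order of a couple of percent, and memory use stays constant however many destinations are added.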
![Page 13: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/13.jpg)
Other Examples of Approximate Probabilistic Sketches
• Bloom Filter (set membership)
• Count-Min Sketch (counting items)
• MinHash (set intersection)
• Locality-Sensitive Hashing (LSH: nearest neighbors)
• Q-digest / T-digest (quantile distribution: MORE ABOUT THIS LATER)
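As one example from the list above, a Bloom filter answers "have I seen this item?" in fixed memory, with possible false positives but no false negatives. The sketch below is a toy version under stated assumptions: it uses one byte per bit for readability and derives its hash functions from seeded SHA-1. The sizes and names are illustrative.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter: fixed-size bit array plus several hash functions."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)    # one byte per bit, for clarity only

    def _positions(self, item: str) -> Iterator[int]:
        # Derive n_hashes positions by prefixing the item with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("10.0.0.9:443")
```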
![Page 14: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/14.jpg)
IpPortScanDetector
Q: Does an IP address keep scanning for new open ports?
Contains:
  {IP_address: HyperLogLog} map
  Each HLL counts distinct IP:port destinations.
At each period:
  mean, sigma = Stdev(hll.cardinality() for every HLL)
  For each IP_address & HLL:
    if HLL.cardinality() > N sigmas above the mean:
      report it.
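The pseudocode above can be sketched in runnable Python as follows. To stay self-contained, this version swaps the HyperLogLog for an exact Python `set` per source IP, so it shows the detection logic rather than the memory savings. The method names `observe` and `end_of_period` are assumptions, not the repository's API.

```python
import statistics
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class IpPortScanDetector:
    """Flag source IPs whose distinct-destination count is far above the mean."""

    def __init__(self, n_sigmas: float = 3.0) -> None:
        self.n_sigmas = n_sigmas
        # Exact stand-in for the {IP_address: HyperLogLog} map on the slide.
        self.destinations: DefaultDict[str, Set[Tuple[str, int]]] = defaultdict(set)

    def observe(self, src_ip: str, dest_ip: str, dest_port: int) -> None:
        self.destinations[src_ip].add((dest_ip, dest_port))

    def end_of_period(self) -> List[str]:
        counts = {ip: len(dests) for ip, dests in self.destinations.items()}
        if len(counts) < 2:
            return []                      # no population to compare against
        mean = statistics.mean(counts.values())
        sigma = statistics.pstdev(counts.values())
        if sigma == 0:
            return []                      # every host looks identical
        threshold = mean + self.n_sigmas * sigma
        return [ip for ip, count in counts.items() if count > threshold]


detector = IpPortScanDetector(n_sigmas=3.0)
for host in range(20):                     # 20 quiet hosts, 2 destinations each
    for port in (80, 443):
        detector.observe(f"10.0.0.{host}", "10.0.1.1", port)
for port in range(1, 501):                 # one host probing 500 ports
    detector.observe("10.0.9.9", "10.0.1.1", port)
suspects = detector.end_of_period()
```

One design caveat: the outlier itself inflates the mean and sigma it is compared against. With a population standard deviation, a single extreme host among n hosts can sit at most sqrt(n - 1) sigmas above the mean, so a high threshold can never fire on a small network.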
![Page 15: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/15.jpg)
GrowthDetector
Q: Did an IP address suddenly get a lot busier than it’s ever been in the past?
Contains:
  {IP_address: HyperLogLog} (periodCardinalityMap)
  Each HLL counts distinct IP:port destinations over all time.
  {IP_address: StdDev} (periodStatisticsMap)
  Each StdDev incrementally calculates means and stdevs.
At each period:
  For each IP_address & HLL & StdDev:
    currCount = HLL.cardinality()
    mean, sigma = StdDev.getMeanAndStdev()
    if currCount > N sigmas above its mean:
      report it.
    HLL.clear()
    StdDev.add(currCount, current_period)
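A hedged sketch of this detector follows. The `StdDev` class here uses Welford's incremental algorithm, which is one common way to keep a running mean and standard deviation in constant memory; the slides do not say which method the repository uses. As a simplification, a `set` stands in for each HyperLogLog, `add` drops the period argument shown in the pseudocode, and `min_periods` is an added assumption so that an IP is not judged before it has any history.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class StdDev:
    """Running mean and population standard deviation (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0                     # sum of squared deviations from the mean

    def add(self, value: float) -> None:
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def get_mean_and_stdev(self) -> Tuple[float, float]:
        sigma = math.sqrt(self._m2 / self.n) if self.n else 0.0
        return self.mean, sigma


class GrowthDetector:
    """Flag an IP whose per-period count jumps far above its own history."""

    def __init__(self, n_sigmas: float = 5.0, min_periods: int = 3) -> None:
        self.n_sigmas = n_sigmas
        self.min_periods = min_periods     # assumption: wait for some history
        self.period_destinations: DefaultDict[str, Set[Tuple[str, int]]] = defaultdict(set)
        self.period_stats: DefaultDict[str, StdDev] = defaultdict(StdDev)

    def observe(self, src_ip: str, dest_ip: str, dest_port: int) -> None:
        self.period_destinations[src_ip].add((dest_ip, dest_port))

    def end_of_period(self) -> List[str]:
        reports = []
        for ip, dests in self.period_destinations.items():
            curr_count = len(dests)
            stats = self.period_stats[ip]
            mean, sigma = stats.get_mean_and_stdev()
            if stats.n >= self.min_periods and curr_count > mean + self.n_sigmas * sigma:
                reports.append(ip)
            dests.clear()                  # start the next period from zero
            stats.add(curr_count)          # fold this period into the history
        return reports


growth = GrowthDetector()
history = []
for busy in (10, 12, 11, 9, 10, 200):      # distinct ports contacted per period
    for port in range(busy):
        growth.observe("10.0.0.5", "10.0.1.1", port)
    history.append(growth.end_of_period())
```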
![Page 16: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/16.jpg)
ExplosionDetector
Q: Did an IP address suddenly get a lot busier than any other IP address?
Contains:
  {IP_address: HyperLogLog} (periodCardinalityMap)
  Each HLL counts distinct IP:port destinations in current period.
At each period:
  mean, sigma = Stdev(hll.cardinality() for every HLL)
  For each IP_address, hll:
    curr = hll.cardinality()
    if curr > N sigmas above the mean:
      report it.
    hll.clear()
![Page 17: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/17.jpg)
Host Stabilization
• Assume an “exponential decay” of new IP:port contacts over time
• We know how many we’ve seen, but not how many are left.
• Can we estimate N_rem given N_obs? Some calculus later… why yes, we can.
$N_{\mathrm{rem}} \approx -\,\mathrm{slope}(x_i) \times \mathrm{avg}(x_i)$
$y \sim e^{-ax}$
![Page 18: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/18.jpg)
HostStabilizationDetector
Q: Shouldn’t this IP address have stopped doing new things by now?
Contains:
  {IP_address: HyperLogLog} (cardinalityMap)
  Each HLL counts distinct IP:port destinations over all time.
  {IP_address: StdDev} (periodAverageMap)
  Each StdDev incrementally calculates means and stdevs.
  {IP_address: IncrLinReg} (IncrementalLinearRegressionMap)
  Each IncrLinReg incrementally estimates a slope and intercept.
At each period:
  For each IP_address, HLL, StdDev, IncrLinReg:
    N_obs = HLL.cardinality()
    slope, intercept = IncrLinReg.estimate()
    avg = StdDev.getMean()
    N_rem = -slope * avg
    reportIfDisagree(N_rem < tol, IP_address.frozen)
    IP_address.setFrozen(N_rem < tol)
    {IncrLinReg, StdDev}.update(N_obs, current_period)
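The `IncrLinReg` piece can be sketched as an online least-squares fit that keeps only running means and co-moments, so each update costs constant time and memory. This is one standard formulation and may differ from the repository's; the `update(x, y)` argument order is an assumption, since the pseudocode passes `(N_obs, current_period)` without saying which is the independent variable.

```python
from typing import Tuple


class IncrLinReg:
    """Online simple linear regression y ~ slope * x + intercept."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self._sxx = 0.0    # running sum of (x - mean_x)^2
        self._sxy = 0.0    # running sum of (x - mean_x) * (y - mean_y)

    def update(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.mean_x               # deviation from the old x mean
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self._sxx += dx * (x - self.mean_x)
        self._sxy += dx * (y - self.mean_y)

    def estimate(self) -> Tuple[float, float]:
        """Return (slope, intercept); slope is 0.0 until it is well defined."""
        if self.n < 2 or self._sxx == 0.0:
            return 0.0, self.mean_y
        slope = self._sxy / self._sxx
        return slope, self.mean_y - slope * self.mean_x


reg = IncrLinReg()
for x, y in [(0, 3.0), (1, 2.5), (2, 2.0), (3, 1.5)]:   # points on y = 3 - 0.5x
    reg.update(x, y)
slope, intercept = reg.estimate()
```

The fitted `slope` is the quantity that feeds the slide's `N_rem = -slope * avg` estimate.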
![Page 19: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/19.jpg)
Demo Time!
![Page 20: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/20.jpg)
But is it Gaussian?
• “Long-tail” or “fat-tail” distributions?
• Try power law or log-linear fitting
  • And many others?
  • But this can get complicated….
• Replace StdDev with tdigest.TDigest
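To see why the Gaussian assumption matters, the sketch below compares a sigma-based threshold with a quantile-based one on a deterministic heavy-tailed sample. It sorts the data to get an exact quantile, purely as a stand-in: a t-digest would approximate the same quantile in bounded memory, and no `tdigest` API is used or assumed here. The sample and the helper name `tail_threshold` are made up for illustration.

```python
import statistics
from typing import List, Sequence


def tail_threshold(samples: Sequence[float], one_in: int) -> float:
    """Return a cut-off that leaves about len(samples) / one_in values above it."""
    ordered = sorted(samples)
    keep_above = len(ordered) // one_in
    return ordered[len(ordered) - keep_above - 1]


# Deterministic heavy-tailed sample: a few huge values, many small ones.
n = 10_000
samples: List[float] = [n / (k + 0.5) for k in range(n)]

# Gaussian-style rule: flag anything more than 5 sigmas above the mean.
sigma_cut = statistics.mean(samples) + 5.0 * statistics.pstdev(samples)
flagged_by_sigma = sum(x > sigma_cut for x in samples)

# Quantile-style rule: flag the top 1 in 1000, whatever the distribution is.
quantile_cut = tail_threshold(samples, one_in=1000)
flagged_by_quantile = sum(x > quantile_cut for x in samples)

print(f"above 5 sigma: {flagged_by_sigma}, above 99.9th percentile: {flagged_by_quantile}")
```

For truly Gaussian data, a 5σ rule would almost never fire on ten thousand points. On a fat-tailed sample it fires repeatedly, while the quantile rule flags a fixed, predictable fraction.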
![Page 21: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/21.jpg)
Conclusions
• Without the agonizing pain
• Python data science tools FTW
• Cool sketching & streaming data structures
• “A little learning is a dangerous thing” …and a little statistics is even better!
• Only the beginning: lots of room for improvement
![Page 22: Five-sigma Network Events (and how to find them)](https://reader031.vdocuments.net/reader031/viewer/2022012505/61802d4511d39a727774954e/html5/thumbnails/22.jpg)
The End
Thanks for attending!
Suggested questions
1. How do I install Python, again?
2. What can I do with flow_counts in my netflows?
3. Show me the calculus for estimating N_rem!
4. So, what is the real statistical distribution of that data?
5. How does HyperLogLog work?
http://github.com/EdgewiseNetworks/five-sigma