TRANSCRIPT
Detecting Botnets with Temporal Persistence
Jaideep Chandrashekar, Frederic Giroire, Nina Taft, Eve Schooler
Intel Labs
Mascotte project, I3S (CNRS, Univ. of Nice), INRIA Sophia Antipolis
Botnet Life-Cycle
• exploit: via dirty webpage / drive-by download / trojans
• call home: IRC, HTTP, P2P, hybrid
• actuation: spam, DoS, clickfraud, proxies, theft, espionage
Existing defenses:
• exploit stage: signature matching, software patching (hard to adapt; a-priori knowledge required)
• actuation stage: traffic anomaly detectors (NBAD), traffic correlation, port inspection, payload analysis (noisy; prone to false positives)
see RAID’09 proceedings
Botnet C&C Invariants
• Botmasters seldom try to connect to the drones:
• drones initiate the (rendezvous) connections ➡ watch outgoing traffic
• Drones need to call home often:
• if not, the drone falls off the radar ➡ use a frequency-based metric
Our Solution: CANARY
A general-purpose, non-specific, learning-based behavioral detector that uncovers botnet C&C destinations at the end-host, without a-priori assumptions about traffic types, destinations, or protocols.
High Level Method
Training (Day1 Day2 Day3 Day4 Day5 .. DayN): watch destinations; whitelist frequent destinations ➡ produces whitelist + params
Detection: ignore whitelisted destinations; track frequency for non-whitelisted destinations; raise alarm for new high-frequency destinations
• Botnet C&C’s are likely to be frequently visited
• adding to the whitelist is a very rare event
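The training/detection split above can be sketched in a few lines. This is a hypothetical simplification: `train_whitelist` and `detect` are illustrative names, and a plain fraction-of-days-seen frequency stands in for the persistence metric defined later in the talk.

```python
from collections import defaultdict

def train_whitelist(daily_destinations, threshold=0.6):
    """daily_destinations: one set of destination atoms per training day."""
    days_seen = defaultdict(int)
    for day in daily_destinations:
        for dest in day:
            days_seen[dest] += 1
    n_days = len(daily_destinations)
    # whitelist destinations visited on more than `threshold` of the days
    return {d for d, c in days_seen.items() if c / n_days > threshold}

def detect(daily_destinations, whitelist, threshold=0.6):
    """Raise alarms for non-whitelisted destinations seen frequently."""
    days_seen = defaultdict(int)
    for day in daily_destinations:
        for dest in day - whitelist:      # ignore whitelisted destinations
            days_seen[dest] += 1
    n_days = len(daily_destinations)
    return {d for d, c in days_seen.items() if c / n_days > threshold}
```

A destination visited every day of the detection window trips the alarm; one seen on a single day does not.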
Remaining Details
• Destination granularity: tracking individual IP addresses leads to very large whitelists
➡ destination atoms
• The (frequency) metric needs to capture loosely periodic behavior at unknown timescales
➡ persistence
We track the persistence of destination atoms and build whitelists of destination atoms.
Destination Atoms
mail1.sc.intel.com, mail3.sc.intel.com, mail3.jf.intel.com ➡ mail.intel.com
xyz.google.com, abs.google.com ➡ google.com
circuit.intel.com, cps.circuit.intel.com ➡ circuit.intel.com
Picking Timescale
Suppose w = 1hr, W = 24hr:
• Botnet X connects to C&C hourly ➡ p-value: 24/24 = 1
• Botnet Y connects to C&C every 5-6 hours ➡ p-value: 4/24 = 0.17
• Botnet Z connects to C&C once a day ➡ p-value: 1/24 = 0.042
Cannot assume a single, fixed timescale!
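For a single timescale, the persistence value is just the fraction of observation slots that contain activity. The sketch below (hypothetical names, timestamps in hours) reproduces the three p-values above for w = 1hr, W = 24hr.

```python
def persistence(conn_times, w=1, W=24):
    """Fraction of the W/w slots of width w containing >= 1 connection.

    conn_times: connection timestamps (in hours) within one window W.
    """
    n_slots = W // w
    occupied = {int(t // w) for t in conn_times if 0 <= t < W}
    return len(occupied) / n_slots

persistence(range(24))        # Botnet X, hourly: 24/24 = 1.0
persistence([0, 6, 12, 18])   # Botnet Y, every ~6 hours: 4/24 ≈ 0.17
persistence([3])              # Botnet Z, once a day: 1/24 ≈ 0.042
```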
Selecting Timescale(s)
• Select n overlapping timescales: TS1 = (w1, W1), TS2 = (w2, W2), TS3 = (w3, W3), ..., TSn = (wn, Wn)
• pi := persistence of an atom for TSi = (wi, Wi)
• track pi concurrently for all the timescales
• p(atom) := maxi pi(atom)
This can become very expensive!
Trick: select Wi = k·wi; then a single bitmap of size k·wmax suffices.
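One way to read the trick: with Wi = k·wi, every timescale spans exactly k slots, so a single activity bitmap at the finest slot width can serve all of them by OR-ing runs of bits. A hypothetical sketch, with slot widths expressed in multiples of the base slot:

```python
def persistences(bitmap, slot_widths, k):
    """pi for each timescale (wi, Wi = k*wi) from one shared bitmap.

    bitmap: one activity bit per base slot, length >= k * max(slot_widths).
    slot_widths: each wi in units of base slots.
    """
    results = []
    for w in slot_widths:
        window = bitmap[-k * w:]   # most recent Wi = k*wi base slots
        # a width-w slot is occupied if any of its base slots saw activity
        occupied = sum(any(window[j * w:(j + 1) * w]) for j in range(k))
        results.append(occupied / k)
    return results
```

For example, the talk's timescales (10,1), (40,4), (160,16) all share k = 10, so one 160-bit bitmap covers all three.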
Dataset: Training
• Normal user traces collected from 157 end-hosts for 4 weeks
• Data collected on end-hosts (winpcap + wrapper code)
• Traces assumed clean (some suspicious traffic observed; ground truth not available)
• Initial 2 weeks of data used for training:
• pick threshold for persistence
• construct per-user whitelists
Picking Threshold(s)
[Figure: histogram of # of atoms vs. persistence]
80% of destinations have a p-value < 0.2; 20% of destinations have a p-value > 0.2 ➡ a threshold seems reasonable.
If p(atom) > 0.6, add the atom to the whitelist.
Validation
• Started with 55 distinct malware binaries
• 27 had traffic; 12 had traffic for longer than 1 day
• Ran each malware for 1 week; all traffic logged
• Packet traces ➜ flow traces [Bro]
• Flow traces manually analyzed to isolate C&C traffic
[Figure: connection timelines, windows auto update vs. malware]

ClamAV Signature        C&C type                        # of C&C atoms   C&C volume (min-max)
Trojan.Aimbot-25        port 22                         1                0-5.7
Trojan.Wootbot-247      IRC port 12347                  4                0-6.8
Trojan.Gobot.T          IRC port 66659                  1                0.2-2.1
Trojan.Codbot-14        IRC port 6667                   2                0-9.2
Trojan.Aimbot-5         IRC via http proxy              3                0-10
Trojan.IRCBot-776*      HTTP                            16               0-1.8
Trojan.VB-666*          IRC port 6667                   1                0-1.3
Trojan.IRC-Script-50    IRC ports 6662-6669,9999,7000   8                0-2.1
Trojan.Spybot-248       port 9305                       4                3.8-4.6
Trojan.MyBot-8926       IRC port 7007                   1                0-0.1
Trojan.IRC.Zapchast-11  IRC ports 6666, 6667            9                0-1
Trojan.Peed-69 [Storm]  P2P/Overnet                     19672            0-30
Converted packet traces to flow traces and hand-analyzed each trace individually:
• to identify/isolate C&C traffic
• to identify/isolate attack traffic
3 Detailed Examples
• SDBot
• 2 atoms in covert channel - identified by IRC server names
• attack traffic - scans on ports 135, 139, 445 & 2097*
• Zapchast
• 9 atoms in covert channel - popular IRC ports
• attack traffic - netbios(?)
• Storm/Peacomm
• ~82,000 atoms (almost all atoms are singletons)
• no well-known port/address for C&C destinations
• attack traffic is SMTP (overwhelmingly), and possibly some http & ssh
C&C Detection (all)

Botnet            Persistence   Timescale   # dest. atoms
IRCBot-776        1.0           (10,1)      1
IRCBot-776        0.8           (200,20)    2
Aimbot-5          1.0           (10,1)      1
Aimbot-5          1.0           (40,4)      1
Aimbot-5          1.0           (160,16)    1
MyBot-8926        0.6           (160,16)    1
IRC.Zapchast-11   1.0           (40,4)      3
Spybot-248        1.0           (10,1)      2
IRC-Script-50     1.0           (10,1)      7
VB-666            0.7           (10,1)      1
Codbot-14         1.0           (10,1)      1
Gobot.T           1.0           (10,1)      1
Wootbot-247       1.0           (10,1)      3
IRC.Zapchast-11   1.0           (10,1)      6
Aimbot-25         1.0           (10,1)      1
Peed-69 [Storm]   1.0           (10,1)      > 1
All samples detected at threshold 0.6; associated false positive rate ~ 0.5/day.
Improvement in Anomaly Detection
Filtering traffic via whitelists:
• Reduces volume of suspicious traffic
• Reduces False Positive Rate of Anomaly detection
• improves sensitivity (allows lowered thresholds)
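A minimal sketch of this filtering step, assuming flows are simple records with a destination field and `atom_of` maps a flow to its destination atom (both hypothetical names):

```python
def filter_for_nbad(flows, whitelist, atom_of):
    """Drop flows whose destination atom is whitelisted, so the anomaly
    detector sees a smaller, less noisy traffic stream."""
    return [f for f in flows if atom_of(f) not in whitelist]
```

Because the high-volume, legitimately persistent destinations are removed, the detector's alarm threshold can be lowered without inflating the false positive rate.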
Caveats
• CANARY is not botnet specific, so alarms are non-specific (need external intelligence to characterize)
• Cannot detect some forms of fast fluxing and some P2P networks (Storm is easily detected though)
• Single fast flux can be detected
• New applications can cause false alarms
• Cannot deal with malware hosted on “whitelisted” sites
Conclusions
• Tracking persistence uncovers low-intensity, stealthy behavior such as C&C channels
• Filtering by whitelist improves (traffic) anomaly detection
• Augments existing techniques and provides additional coverage against unknown threats
• Not a silver bullet, but this is an arms race
False Positives
[Figure: ROC curve, detection rate (0.75-1.0) vs. false positives/day (1-7), labeled by persistence thresholds 0.1-1.0]