TRANSCRIPT
Detecting Botnets with Temporal Persistence
Jaideep Chandrashekar, Frederic Giroire, Nina Taft, Eve Schooler
Intel Labs
Mascotte project, I3S (CNRS, Univ. of Nice), INRIA Sophia Antipolis
Botnet Life-Cycle
• exploit: via dirty webpage / drive-by download / trojans
• call home: IRC, HTTP, P2P, hybrid
• actuation: spam, DoS, clickfraud, proxies, theft, espionage
Existing defenses:
• exploit stage: signature matching, software patching (hard to adapt; a-priori knowledge required)
• actuation stage: traffic anomaly detectors (NBAD), traffic correlation, port inspection, payload analysis (noisy; prone to false positives)
see RAID’09 proceedings
Botnet C&C Invariants
• Botmasters seldom try to connect to the drones:
• drones initiate the (rendezvous) connections ➡ watch outgoing traffic
• Drones need to call home often:
• if not, the drone falls off the radar ➡ use a frequency-based metric
Our Solution: CANARY
A general-purpose, non-specific, learning-based behavioral detector that uncovers botnet C&C destinations at the end-host, without a-priori assumptions about traffic types, destinations, or protocols.
High Level Method
Training (Day1 Day2 Day3 Day4 Day5 .. DayN): watch destinations; whitelist frequent destinations ➡ produces whitelist + params
Detection: ignore whitelisted destinations; track frequency for non-whitelisted destinations; raise alarm for new high-frequency destinations
• Botnet C&C’s are likely to be frequently visited
• adding to the whitelist is a very rare event
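The training/detection split above can be sketched in a few lines. This is a hypothetical simplification: `train_whitelist` and `detect` are illustrative names, and a plain fraction-of-days-seen frequency stands in for the persistence metric defined later in the talk.

```python
from collections import defaultdict

def train_whitelist(daily_destinations, threshold=0.6):
    """daily_destinations: one set of destination atoms per training day."""
    days_seen = defaultdict(int)
    for day in daily_destinations:
        for dest in day:
            days_seen[dest] += 1
    n_days = len(daily_destinations)
    # whitelist destinations visited on more than `threshold` of the days
    return {d for d, c in days_seen.items() if c / n_days > threshold}

def detect(daily_destinations, whitelist, threshold=0.6):
    """Raise alarms for non-whitelisted destinations seen frequently."""
    days_seen = defaultdict(int)
    for day in daily_destinations:
        for dest in day - whitelist:      # ignore whitelisted destinations
            days_seen[dest] += 1
    n_days = len(daily_destinations)
    return {d for d, c in days_seen.items() if c / n_days > threshold}
```

A destination visited every day of the detection window trips the alarm; one seen on a single day does not.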
Remaining Details
• Destination granularity: tracking individual IP addresses leads to very large whitelists
➡ destination atoms
• The (frequency) metric needs to capture loosely periodic behavior at unknown timescales
➡ persistence
We track the persistence of destination atoms and build whitelists of destination atoms.
Destination Atoms
mail1.sc.intel.com, mail3.sc.intel.com, mail3.jf.intel.com ➡ mail.intel.com
xyz.google.com, abs.google.com ➡ google.com
circuit.intel.com, cps.circuit.intel.com ➡ circuit.intel.com
Picking Timescale
Suppose w = 1hr, W = 24hr:
• Botnet X connects to C&C hourly ➡ p-value: 24/24 = 1
• Botnet Y connects to C&C every 5-6 hours ➡ p-value: 4/24 = 0.17
• Botnet Z connects to C&C once a day ➡ p-value: 1/24 = 0.042
Cannot assume a single, fixed timescale!
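For a single timescale, the persistence value is just the fraction of observation slots that contain activity. The sketch below (hypothetical names, timestamps in hours) reproduces the three p-values above for w = 1hr, W = 24hr.

```python
def persistence(conn_times, w=1, W=24):
    """Fraction of the W/w slots of width w containing >= 1 connection.

    conn_times: connection timestamps (in hours) within one window W.
    """
    n_slots = W // w
    occupied = {int(t // w) for t in conn_times if 0 <= t < W}
    return len(occupied) / n_slots

persistence(range(24))        # Botnet X, hourly: 24/24 = 1.0
persistence([0, 6, 12, 18])   # Botnet Y, every ~6 hours: 4/24 ≈ 0.17
persistence([3])              # Botnet Z, once a day: 1/24 ≈ 0.042
```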
Selecting Timescale(s)
• Select n overlapping timescales: TS1 = (w1, W1), TS2 = (w2, W2), TS3 = (w3, W3), ..., TSn = (wn, Wn)
• pi := persistence of an atom for TSi = (wi, Wi)
• track pi concurrently for all the timescales
• p(atom) := maxi pi(atom)
This can become very expensive!
Trick: select Wi = k·wi; then a single bitmap of size k·wmax suffices.
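One way to read the trick: with Wi = k·wi, every timescale spans exactly k slots, so a single activity bitmap at the finest slot width can serve all of them by OR-ing runs of bits. A hypothetical sketch, with slot widths expressed in multiples of the base slot:

```python
def persistences(bitmap, slot_widths, k):
    """pi for each timescale (wi, Wi = k*wi) from one shared bitmap.

    bitmap: one activity bit per base slot, length >= k * max(slot_widths).
    slot_widths: each wi in units of base slots.
    """
    results = []
    for w in slot_widths:
        window = bitmap[-k * w:]   # most recent Wi = k*wi base slots
        # a width-w slot is occupied if any of its base slots saw activity
        occupied = sum(any(window[j * w:(j + 1) * w]) for j in range(k))
        results.append(occupied / k)
    return results
```

For example, the talk's timescales (10,1), (40,4), (160,16) all share k = 10, so one 160-bit bitmap covers all three.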
Dataset: Training
• Normal user traces collected from 157 end-hosts for 4 weeks
• Data collected on end-hosts (winpcap + wrapper code)
• Traces assumed clean (some suspicious traffic observed; ground truth not available)
• Initial 2 weeks of data used for training:
• pick threshold for persistence
• construct per-user whitelists
Picking Threshold(s)
[Figure: histogram of # of atoms vs. persistence]
80% of destinations have a p-value < 0.2; 20% of destinations have a p-value > 0.2 ➡ a threshold seems reasonable.
If p(atom) > 0.6, add the atom to the whitelist.
Validation
• Started with 55 distinct malware binaries
• 27 had traffic; 12 had traffic for longer than 1 day
• Ran each malware for 1 week; all traffic logged
• Packet traces ➜ flow traces [Bro]
• Flow traces manually analyzed to isolate C&C traffic
[Figure: connection timelines, windows auto update vs. malware]

ClamAV Signature        C&C type                        # of C&C atoms   C&C volume (min-max)
Trojan.Aimbot-25        port 22                         1                0-5.7
Trojan.Wootbot-247      IRC port 12347                  4                0-6.8
Trojan.Gobot.T          IRC port 66659                  1                0.2-2.1
Trojan.Codbot-14        IRC port 6667                   2                0-9.2
Trojan.Aimbot-5         IRC via http proxy              3                0-10
Trojan.IRCBot-776*      HTTP                            16               0-1.8
Trojan.VB-666*          IRC port 6667                   1                0-1.3
Trojan.IRC-Script-50    IRC ports 6662-6669,9999,7000   8                0-2.1
Trojan.Spybot-248       port 9305                       4                3.8-4.6
Trojan.MyBot-8926       IRC port 7007                   1                0-0.1
Trojan.IRC.Zapchast-11  IRC ports 6666, 6667            9                0-1
Trojan.Peed-69 [Storm]  P2P/Overnet                     19672            0-30
Converted packet traces to flow traces and hand-analyzed each trace individually:
• to identify/isolate C&C traffic
• to identify/isolate attack traffic
3 Detailed Examples
• SDBot
• 2 atoms in covert channel - identified by IRC server names
• attack traffic - scans on ports 135, 139, 445 & 2097*
• Zapchast
• 9 atoms in covert channel - popular IRC ports
• attack traffic - netbios(?)
• Storm/Peacomm
• ~82,000 atoms (almost all atoms are singletons)
• no well-known port/address for C&C destinations
• attack traffic is SMTP (overwhelmingly), and possibly some http & ssh
C&C Detection (all)

Botnet            Persistence   Timescale   # dest. atoms
IRCBot-776        1.0           (10,1)      1
IRCBot-776        0.8           (200,20)    2
Aimbot-5          1.0           (10,1)      1
Aimbot-5          1.0           (40,4)      1
Aimbot-5          1.0           (160,16)    1
MyBot-8926        0.6           (160,16)    1
IRC.Zapchast-11   1.0           (40,4)      3
Spybot-248        1.0           (10,1)      2
IRC-Script-50     1.0           (10,1)      7
VB-666            0.7           (10,1)      1
Codbot-14         1.0           (10,1)      1
Gobot.T           1.0           (10,1)      1
Wootbot-247       1.0           (10,1)      3
IRC.Zapchast-11   1.0           (10,1)      6
Aimbot-25         1.0           (10,1)      1
Peed-69 [Storm]   1.0           (10,1)      > 1
All samples detected at threshold 0.6; associated false positive rate ~ 0.5/day.
Improvement in Anomaly Detection
Filtering traffic via whitelists:
• Reduces volume of suspicious traffic
• Reduces False Positive Rate of Anomaly detection
• improves sensitivity (allows lowered thresholds)
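A minimal sketch of this filtering step, assuming flows are simple records with a destination field and `atom_of` maps a flow to its destination atom (both hypothetical names):

```python
def filter_for_nbad(flows, whitelist, atom_of):
    """Drop flows whose destination atom is whitelisted, so the anomaly
    detector sees a smaller, less noisy traffic stream."""
    return [f for f in flows if atom_of(f) not in whitelist]
```

Because the high-volume, legitimately persistent destinations are removed, the detector's alarm threshold can be lowered without inflating the false positive rate.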
Caveats
• CANARY is not botnet specific, so alarms are non-specific (need external intelligence to characterize)
• Cannot detect some forms of fast fluxing and some P2P networks (Storm is easily detected though)
• Single fast flux can be detected
• New applications can cause false alarms
• Cannot deal with malware hosted on “whitelisted” sites
Conclusions
• Tracking persistence uncovers low-intensity, stealthy behavior such as C&C channels
• Filtering by whitelist improves (traffic) anomaly detection
• Augments existing techniques and provides additional coverage against unknown threats
• Not a silver bullet, but this is an arms race
False Positives
[Figure: ROC curve, detection rate (0.75-1.0) vs. false positives/day (1-7), labeled by persistence thresholds 0.1-1.0]