Botnet and Spam Detection in High-Speed Networks

Wenke Lee and Nick Feamster, Georgia Tech


Post on 02-Jan-2016





Overview

• Problem: Botnet and Spam Detection in high-speed networks

• Common theme: Examine network-level properties and build a classifier

• Two systems: BotMiner and SNARE
  – Overview
  – Integration with SMITE architecture

• Current integration status and plan


BotMiner: Structure and Protocol Independent

• Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …

[Figure: two botnet structure diagrams, panels (a) and (b), showing bots and a C&C node]


Definition of a Botnet

• “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
  – Hosts that have similar C&C-like traffic and similar malicious activities

• We need to monitor two planes
  – C-plane (C&C communication plane): “who is talking to whom”
  – A-plane (malicious activity plane): “who is doing what”


BotMiner Architecture

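[Figure: BotMiner architecture. Network traffic feeds an A-plane monitor (scan, spam, binary downloading, exploit, ...) that writes an activity log, and a C-plane monitor that writes a flow log. A-plane clustering and C-plane clustering feed cross-plane correlation, which produces reports. Stage labels: Sensors, Algorithms, Correlation.]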


BotMiner C-plane Clustering

• What characterizes a communication flow (C-flow) between a local host and a remote service?
  – <protocol, srcIP, dstIP, dstPort>
  – Temporal statistical distribution information
    – E.g., BPS (bytes per second), FPH (flows per hour)
  – Spatial statistical distribution information
    – E.g., BPP (bytes per packet), PPF (packets per flow)
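
The C-flow features above can be sketched in a few lines of Python. This is a minimal illustration, not BotMiner's implementation: the `Flow` record layout, the `cflow_features` name, and the choice to return raw samples instead of binned distributions are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record; field names are assumptions for this sketch."""
    protocol: str
    src_ip: str
    dst_ip: str
    dst_port: int
    start: float      # seconds since the start of the observation window
    duration: float   # seconds
    packets: int
    n_bytes: int


def cflow_features(flows):
    """Group flows into C-flows keyed by <protocol, srcIP, dstIP, dstPort>
    and collect the samples behind FPH, PPF, BPP, and BPS."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f.protocol, f.src_ip, f.dst_ip, f.dst_port)].append(f)

    features = {}
    for key, fs in groups.items():
        per_hour = Counter(int(f.start // 3600) for f in fs)
        features[key] = {
            # Temporal: flows per hour (only hours with at least one flow)
            "FPH": [per_hour[h] for h in sorted(per_hour)],
            # Temporal: bytes per second, one sample per flow
            "BPS": [f.n_bytes / f.duration for f in fs if f.duration > 0],
            # Spatial: packets per flow
            "PPF": [f.packets for f in fs],
            # Spatial: bytes per packet, one sample per flow
            "BPP": [f.n_bytes / f.packets for f in fs if f.packets > 0],
        }
    return features
```

Each C-flow ends up with four sample lists, which a clustering step could then summarize (for example as histograms) before comparing hosts.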


A-plane Clustering

• Capture “similar activity patterns”


Cross-plane Correlation

• Botnet score s(h) for every host h
  – A host has a higher score if it is in more activity clusters and in both activity and communication clusters
  – A host with a high score is considered a bot

• Similarity score between bot hosts h_i and h_j
  – Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
  – Each cluster is a botnet
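
The slides do not give the formula for s(h). The sketch below uses an assumed, simplified score: a host earns credit for each pair of different-type activity clusters it belongs to and for each activity/communication cluster pair it belongs to, weighted by the Jaccard overlap of the two clusters. The function name and input layout are hypothetical.

```python
from itertools import combinations


def botnet_scores(a_clusters, c_clusters):
    """a_clusters: list of (activity_type, set_of_hosts)
    c_clusters: list of sets of hosts
    Returns {host: score} for every host seen in an A-plane cluster."""

    def jaccard(x, y):
        return len(x & y) / len(x | y)

    hosts = set()
    for _, members in a_clusters:
        hosts |= members

    scores = {}
    for h in hosts:
        mine_a = [(t, m) for t, m in a_clusters if h in m]
        mine_c = [m for m in c_clusters if h in m]
        score = 0.0
        # Credit for appearing in several activity clusters of different types
        for (t1, a1), (t2, a2) in combinations(mine_a, 2):
            if t1 != t2:
                score += jaccard(a1, a2)
        # Credit for appearing in both an activity and a communication cluster
        for _, a in mine_a:
            for c in mine_c:
                score += jaccard(a, c)
        scores[h] = score
    return scores
```

A host that only scans and shares no communication pattern with others scores zero under this scheme, while hosts that scan, spam, and share a C-cluster score highest. Thresholding the scores to pick bots would be a separate, tunable step.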


SMITE Integration: BotMiner


Integrating BotMiner and SMITE

• Sensors
  – Feature extraction for C-plane and A-plane clustering
  – C-flow temporal and statistical features
    • Counting packets and connections between each pair of endpoints: bytes per second, flows per hour, bytes per packet, packets per flow
  – A-plane header and payload features
    • Destination IP addresses and ports, payload bytes/strings
  – These sensors are not specific to BotMiner


Integrating BotMiner and SMITE

• Algorithms
  – C-plane clustering
    • Multi-step clustering based on statistical and temporal C-flow features
  – A-plane clustering
    • Based on activity-specific similarity measures: e.g., spread of destination IP addresses and ports, Dice’s coefficient of string similarity, and byte frequency or entropy of payload
  – Bot scoring and botnet clustering methods
    • Scoring based on participation in C-plane and A-plane clusters
    • Clustering based on common memberships in the C-plane and A-plane clusters
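
Two of the A-plane similarity measures named on this slide, Dice's coefficient and payload byte entropy, are standard and easy to illustrate. The sketch below is an assumption about how they might be computed (character bigrams for Dice, Shannon entropy in bits per byte), not BotMiner's code. The function names are hypothetical.

```python
import math
from collections import Counter


def dice_coefficient(a: str, b: str) -> float:
    """Dice's coefficient over sets of character bigrams:
    2 * |X & Y| / (|X| + |Y|)."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        # Strings too short to have bigrams: fall back to exact match
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte
    (0.0 for constant data, 8.0 for a uniform spread over all 256 values)."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    n = len(payload)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())
```

High entropy tends to indicate compressed or encrypted payloads, while a high Dice score between two hosts' payload strings suggests they are sending near-identical content.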


Integrating BotMiner and SMITE

• Correlation
  – Botnet detection involves both vertical and horizontal analysis/clustering:
    • Vertical: what activities a host has been involved in
      – Bot detection
    • Horizontal: what other hosts have similar (vertical) behavior patterns
      – Botnet detection
  – Similar analysis can be applied to other alerts
    • Improve botnet detection
    • Understand malicious activities and plans of attacks
    • Measure the scale of attacks


Network-Based Spam Detection

• Filter email based on how it is sent, in addition to simply what is sent.

• Network-level properties are less malleable
  – Hosting or upstream ISP (AS number)
  – Membership in a botnet (spammer, hosting infrastructure)
  – Network location of sender and receiver
  – Set of target recipients


Finding the Right Features

• Goal: Sender reputation from a single packet header?
  – Low overhead
  – Fast classification
  – In-network
  – Perhaps more evasion resistant

• Key challenge
  – What features satisfy these properties and can distinguish spammers from legitimate senders?


Network-Level Features

• Single-Packet
  – AS of sender’s IP
  – Distance to k nearest senders
  – Status of email service ports
  – Geodesic distance
  – Time of day

• Single-Message
  – Number of recipients
  – Length of message

• Aggregate (Multiple Message/Recipient)


Sender-Receiver Geodesic Distance

90% of legitimate messages travel 2,200 miles or less
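
A geodesic (great-circle) distance between sender and receiver can be computed with the haversine formula once both IP addresses have been mapped to coordinates. The sketch below assumes that mapping has already been done by some geolocation lookup, which is not shown. The function name and the mean Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude)
    points given in degrees, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

A classifier could use this value directly as a feature, so that a message travelling far beyond the 2,200-mile mark stated above contributes evidence toward the spam class.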


Density of Senders in IP Space

For spammers, k nearest senders are much closer in IP space
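
One way to turn "distance to k nearest senders" into a number is to treat IPv4 addresses as integers and average the numeric gap to the k closest previously seen senders. The sketch below is an assumption about the feature's shape, not SNARE's code. The function name, the default k, and the use of plain numeric distance are illustrative.

```python
import bisect
import ipaddress


def avg_distance_to_k_nearest(sender_ip, known_senders, k=3):
    """Average numeric distance in IPv4 space from sender_ip to its k nearest
    neighbors among known_senders. Returns None if there are no neighbors."""
    target = int(ipaddress.IPv4Address(sender_ip))
    others = sorted(
        int(ipaddress.IPv4Address(ip)) for ip in known_senders if ip != sender_ip
    )
    if not others:
        return None
    pos = bisect.bisect_left(others, target)
    # The k nearest neighbors must lie within k positions on either side
    window = others[max(0, pos - k): pos + k]
    nearest = sorted(abs(x - target) for x in window)[:k]
    return sum(nearest) / len(nearest)
```

A small average means the sender sits in a densely populated block of other senders, which is the pattern this slide associates with spammers.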


Local Time of Day at Sender

Spammers “peak” at different local times of day


Other Network-Level Features

• Time-of-day at sender

• Upstream AS of sender

• Message size (and variance)

• Number of recipients (and variance)


Combining Features: RuleFit

• Put features into the RuleFit classifier

• 10-fold cross validation on one day of query logs from a large spam filtering appliance provider

• Comparable performance to Spamhaus
  – Incorporating it into the system can further reduce false positives (FPs)

• Using only network-level features

• Completely automated
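
The evaluation procedure, 10-fold cross validation, can be sketched independently of the classifier. The example below deliberately swaps RuleFit for a trivial one-feature threshold learner so that it stays self-contained. Every name here is hypothetical, and the point is only the fold handling and the false-positive and detection-rate bookkeeping.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_threshold(samples):
    """Stand-in learner (NOT RuleFit): midpoint between the class means of a
    single numeric feature. samples is a list of (feature, label) pairs with
    label 1 = spam, 0 = legitimate; both classes must be present."""
    spam = [x for x, y in samples if y == 1]
    ham = [x for x, y in samples if y == 0]
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2


def cross_validate(samples, k=10):
    """Train on k-1 folds, test on the held-out fold, and pool the counts."""
    tp = fp = tn = fn = 0
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        threshold = train_threshold(train)
        for i in held_out:
            x, y = samples[i]
            predicted = 1 if x > threshold else 0
            if predicted == 1 and y == 1:
                tp += 1
            elif predicted == 1 and y == 0:
                fp += 1
            elif predicted == 0 and y == 0:
                tn += 1
            else:
                fn += 1
    return {
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Because every sample is tested exactly once by a model that never saw it during training, the pooled false-positive rate is a fairer estimate than one measured on the training data.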


Benefits of Whitelisting

Whitelisting top 50 ASes: False positives reduced to 0.14%


Integrating SNARE and SMITE

[Figure: SNARE mapped onto the SMITE stages, labeled Sensors and Algorithms/Correlation]


Integration with SMITE

• Sensors
  – Extract network features from traffic
  – IP addresses
  – Combine with auxiliary data (routing, time, etc.)

• Algorithms
  – Clustering algorithm to identify behavioral fingerprints
  – Learning algorithm to classify based on multiple features

• Correlation
  – Clusters formed by aggregating sending behavior observed across multiple sensors
  – Various features also require input from data collected across collections of IP addresses


SMITE Integration Challenges

• Sources of labeled data
  – SNARE requires clean sources of labeled data for training

• Data collection
  – SNARE’s performance improves when behavior can be observed across multiple domains


Overall SMITE Integration


SMITE Integration: Current Work

• Study pipeline architecture and code

• Modify flow-analyzer to dump 5-tuple flow information
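
For readers unfamiliar with the term, a 5-tuple identifies a flow by source IP, destination IP, source port, destination port, and protocol. The sketch below shows one plausible shape for a per-flow dump. The slides do not describe the flow-analyzer's real interface or output format, so the record layout, the `dump_flows` name, and the CSV columns are all hypothetical.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def dump_flows(packets):
    """packets: iterable of (timestamp, FiveTuple, size_in_bytes).
    Returns one CSV line per flow, in first-seen order:
    src_ip,dst_ip,src_port,dst_port,protocol,first_ts,last_ts,packets,bytes"""
    flows = {}
    for ts, key, size in packets:
        rec = flows.setdefault(
            key, {"first": ts, "last": ts, "packets": 0, "bytes": 0}
        )
        rec["first"] = min(rec["first"], ts)
        rec["last"] = max(rec["last"], ts)
        rec["packets"] += 1
        rec["bytes"] += size
    return [
        ",".join(
            str(v)
            for v in (*key, rec["first"], rec["last"], rec["packets"], rec["bytes"])
        )
        for key, rec in flows.items()
    ]
```

Records of this kind carry enough information to derive per-flow byte, packet, and timing statistics downstream without touching packet payloads.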


SMITE Integration: Phase I

• Modify flow-analyzer with SMITE team to generate 5-tuple flow information (mid-March)

• Spam/scan detection, flow aggregation in BotMiner; Spam feature extraction in SNARE (end of March)

• Clustering and correlation in BotMiner; Classifier in SNARE (end of April)


SMITE Integration: Phase II

• Evaluate performance of BotMiner and SNARE
  – How many hours to process one day of traffic, or what is the “lag” time between event and detection?

• Design real-time detection algorithms
  – A two-tier system: off-line module outputs lists of suspicious hosts, and real-time module inspects all packets of these hosts; or, off-line module outputs clusters

• Design algorithms to handle asymmetric traffic
  – Cluster on each direction of traffic and cross-correlate


Thank You!