
Detecting Malicious Flux Service Networks through Passive Analysis of Recursive DNS Traces

Roberto Perdisci, Igino Corona, David Dagon, Wenke Lee

ACSAC (Dec. 2009)

2010/3/2


Agenda

• Introduction

• Objective

• Detecting Malicious Flux Networks

• Experiments

• Conclusion


Agenda: Introduction


Fast-Flux? In 2007

fast-flux domain names

Malicious Fast-Flux Network


Malicious flux service networks

• Can be viewed as illegitimate content-delivery networks (CDNs)

• The nodes of a malicious flux service network are called flux agents

• Commonly used to host phishing websites, illegal adult content, or serve as malware propagation vectors


Related Work

• Detecting fast-flux domain names

• Prior work characterized fast-flux domains and described the classification algorithms in detail

• Mainly limited to studying fast-flux domains advertised through spam emails


Approach

• Novel and passive

• Monitor the DNS queries and responses exchanged between the users and the RDNS servers, and selectively store information about potential fast-flux domains in a central DNS data collector

• Done by deploying sensors in front of the recursive DNS (RDNS) servers


Agenda: Objective


Focus on detecting malicious flux networks in the wild

Passive detection benefits the accuracy of spam filtering applications


Agenda: Detecting Malicious Flux Networks



Characteristics of Flux Domain Names

a) Short time-to-live (TTL)

b) The set of resolved IPs (i.e., the flux agents) returned at each query changes rapidly, usually after every TTL

c) The overall set of resolved IPs obtained by querying the same domain name over time is often very large

d) The resolved IPs are scattered across many different networks


Traffic Volume Reduction (F1) (1)

• $q^{(d)} = (t_i, T^{(d)}, P^{(d)})$
– DNS query performed by a user at time $t_i$ to resolve the set of IP addresses owned by domain name d

• $T^{(d)}$
– the time-to-live (TTL) of the DNS response

• $P^{(d)}$
– the set of resolved IPs returned by the RDNS server
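The query tuple can be made concrete with a minimal Python sketch of one record plus a pre-filter in the spirit of F1. The names `DnsQuery` and `looks_like_candidate` and both thresholds (`MAX_TTL`, `MIN_IPS`) are illustrative assumptions, not values from the slides. The filter only encodes the "short TTL, several resolved IPs" characteristics listed earlier.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DnsQuery:
    """One observed query q(d) = (t_i, T(d), P(d)) for a domain d."""
    domain: str
    timestamp: float               # t_i, seconds since the Unix epoch
    ttl: int                       # T(d), TTL of the DNS response in seconds
    resolved_ips: FrozenSet[str]   # P(d), IPs returned by the RDNS server


# Hypothetical thresholds, chosen only for illustration.
MAX_TTL = 3 * 60 * 60   # keep responses whose TTL is at most 3 hours
MIN_IPS = 3             # keep responses that resolve to at least 3 IPs


def looks_like_candidate(q: DnsQuery) -> bool:
    """Return True if q has a short TTL and several resolved IPs."""
    return q.ttl <= MAX_TTL and len(q.resolved_ips) >= MIN_IPS


q = DnsQuery("example.test", 1236000000.0, 300,
             frozenset({"192.0.2.1", "198.51.100.7", "203.0.113.9"}))
print(looks_like_candidate(q))
```

Only queries that pass such a filter would be forwarded to the central collector, which is what keeps the stored traffic volume manageable.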


Periodic List Pruning (F2) (1)

• Candidate flux domain name d
– $d = (t_i, Q_i^{(d)}, \hat{T}_i^{(d)}, R_i^{(d)}, G_i^{(d)})$

• $t_i$: the time when the last DNS query for d was observed

• $Q_i^{(d)}$: the total number of DNS queries related to d ever seen until $t_i$

• $\hat{T}_i^{(d)}$: the maximum TTL ever observed for d

• $R_i^{(d)}$: the cumulative set of all the resolved IPs ever seen for d until time $t_i$

• $G_i^{(d)}$: a sequence of pairs $\{(t_j, r_j^{(d)})\}_{j=1..i}$
– where $r_j^{(d)} = |R_j^{(d)}| - |R_{j-1}^{(d)}|$
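The bookkeeping for a candidate domain can be sketched as follows. The class and function names are hypothetical. Recording a growth pair only when new IPs actually appear is an assumption of this sketch, made so that the length of G counts how often the resolved IP set grew.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Candidate:
    """State kept for one candidate flux domain d."""
    last_seen: float = 0.0                       # t_i
    num_queries: int = 0                         # Q_i
    max_ttl: int = 0                             # maximum TTL observed so far
    ips: Set[str] = field(default_factory=set)   # R_i, cumulative resolved IPs
    growth: List[Tuple[float, int]] = field(default_factory=list)  # G_i


def update(c: Candidate, t: float, ttl: int, resolved: Set[str]) -> None:
    """Fold one observed query (t, ttl, resolved) into the record."""
    before = len(c.ips)
    c.ips |= resolved
    r = len(c.ips) - before          # r_j = |R_j| - |R_{j-1}|
    c.last_seen = t
    c.num_queries += 1
    c.max_ttl = max(c.max_ttl, ttl)
    if r > 0:                        # assumption: store growth events only
        c.growth.append((t, r))


c = Candidate()
update(c, 10.0, 300, {"192.0.2.1", "192.0.2.2"})
update(c, 20.0, 120, {"192.0.2.2", "198.51.100.5"})
update(c, 30.0, 60, {"192.0.2.1"})
print(c.num_queries, c.max_ttl, len(c.ips), c.growth)
```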


Periodic List Pruning (F2) (2)

F2-a) $Q_j^{(d)} > 100$ AND $|G_j^{(d)}| < 3$ AND ($|R_j^{(d)}| \le 5$ OR $p \le 0.5$)
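As a rough illustration, rule F2-a can be written as a single predicate. The thresholds 100, 3, 5 and 0.5 come from the slide. The comparison directions, and the reading of p as the network prefix diversity of the resolved IPs, are assumptions of this sketch; the function name is hypothetical.

```python
def prune_f2a(num_queries: int, num_growth_events: int,
              num_ips: int, prefix_diversity: float) -> bool:
    """Return True if a candidate should be dropped under the assumed F2-a.

    A domain is pruned when it has been queried many times, its set of
    resolved IPs has rarely grown, and it either resolves to few IPs or
    those IPs sit in few distinct networks.
    """
    return (num_queries > 100
            and num_growth_events < 3
            and (num_ips <= 5 or prefix_diversity <= 0.5))


# A popular, stable domain is pruned; a fast-growing one is kept.
print(prune_f2a(500, 1, 2, 0.9))
print(prune_f2a(500, 10, 40, 0.9))
```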


Domain Clustering (1)

• A similarity (or proximity) matrix $P = \{s_{ij}\}_{i,j=1..n}$ that consists of similarities $s_{ij} = sim(d_i, d_j)$
– $D = \{d_1, d_2, \ldots, d_n\}$
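A small sketch of building such a matrix is given here. The slide does not define sim, so the Jaccard index over the resolved IP sets is used as an assumed stand-in; the function names and sample domains are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(ips_by_domain: Dict[str, Set[str]]) -> List[List[float]]:
    """Return P = {s_ij}, with rows and columns in sorted domain order."""
    domains = sorted(ips_by_domain)
    return [[jaccard(ips_by_domain[di], ips_by_domain[dj]) for dj in domains]
            for di in domains]


sample = {
    "a.test": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "b.test": {"192.0.2.2", "192.0.2.3", "192.0.2.4"},
    "c.test": {"203.0.113.9"},
}
for row in similarity_matrix(sample):
    print(row)
```

Domains that share many flux agents get a similarity close to 1, which is what lets the clustering step group them into the same network.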


Domain Clustering (2)

• The hierarchical clustering algorithm takes P as input and produces a dendrogram as output, i.e., a tree-like data structure in which the leaves represent the original domains in D


Service Classifier (1)

• A set of features is used to distinguish between malicious flux services and legitimate/non-flux services

• Both passive and active features
– Passive: directly extracted from the information collected by passively monitoring the DNS queries
  • Ex: Number of resolved IPs (feature 1)
– Active: need some external information to be computed
  • Ex: Country code diversity (feature 10)
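The difference between the two kinds of features can be illustrated with a short sketch. The passive feature needs only the observed IPs, while the active one needs an external lookup, represented here by a hypothetical `country_of` dictionary. Defining diversity as "distinct country codes divided by number of IPs" is an assumption of this sketch.

```python
from typing import Dict, Set


def num_resolved_ips(ips: Set[str]) -> int:
    """Passive feature: number of distinct resolved IPs seen for a cluster."""
    return len(ips)


def country_code_diversity(ips: Set[str], country_of: Dict[str, str]) -> float:
    """Active feature: distinct country codes divided by number of IPs.

    country_of stands in for an external IP-to-country database.
    IPs missing from it are all counted under one unknown code "??".
    """
    if not ips:
        return 0.0
    codes = {country_of.get(ip, "??") for ip in ips}
    return len(codes) / len(ips)


ips = {"192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"}
country_of = {"192.0.2.1": "US", "192.0.2.2": "US",
              "198.51.100.7": "DE", "203.0.113.9": "BR"}
print(num_resolved_ips(ips), country_code_diversity(ips, country_of))
```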


Service Classifier (2)

• Employ the popular C4.5 decision-tree classifier to automatically classify a cluster Ci as either malicious flux service or legitimate/non-flux service
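C4.5 chooses its splits by gain ratio (information gain divided by split information). The sketch below shows only that criterion for a single numeric threshold, not the full tree induction or pruning. The feature values and labels are made-up toy data, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str],
               threshold: float) -> float:
    """Gain ratio of splitting on `value <= threshold` versus `value > threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0                      # a split that separates nothing
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


# Toy data: a hypothetical per-cluster feature value and its manual label.
values = [0.9, 0.8, 0.7, 0.1, 0.2, 0.05]
labels = ["flux", "flux", "flux", "legit", "legit", "legit"]
for t in (0.5, 0.75):
    print(t, gain_ratio(values, labels, t))
```

A tree learner would evaluate many candidate thresholds per feature and keep the one with the highest gain ratio at each node.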


Agenda: Experiments



Collecting Recursive DNS Traffic

• Two sensors in front of two different RDNS servers of a large North American ISP

• Between March 1 and April 14, 2009

• More than 4 million users

• Monitored 2.5 billion DNS queries per day

• Set the epoch E to be one day


Clustering Candidate Flux Domains (1)

• Apply a single-linkage hierarchical clustering algorithm to group together domains that belong to the same network

• Takes 30 to 40 minutes per day and per sensor

• Obtained 4000 domain clusters per day
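Cutting a single-linkage dendrogram at a fixed similarity level yields the connected components of the graph that links every pair of domains at least that similar, so the grouping step can be sketched with a union-find structure. Jaccard similarity over resolved IP sets, the cut level `min_sim`, and all names and sample data below are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two IP sets, 0.0 if both are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def single_linkage_clusters(ips_by_domain: Dict[str, Set[str]],
                            min_sim: float) -> List[List[str]]:
    """Group domains whose single-linkage similarity is at least min_sim."""
    domains = sorted(ips_by_domain)
    parent = {d: d for d in domains}

    def find(d: str) -> str:
        # Follow parent links to the root, halving the path on the way.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    # Merge every pair that is similar enough (quadratic in the number of domains).
    for i, a in enumerate(domains):
        for b in domains[i + 1:]:
            if jaccard(ips_by_domain[a], ips_by_domain[b]) >= min_sim:
                parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for d in domains:
        clusters.setdefault(find(d), []).append(d)
    return sorted(clusters.values())


sample = {
    "a.test": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "b.test": {"192.0.2.2", "192.0.2.3", "192.0.2.4"},
    "c.test": {"192.0.2.4", "192.0.2.5", "192.0.2.6"},
    "d.test": {"203.0.113.9"},
}
print(single_linkage_clusters(sample, 0.15))
print(single_linkage_clusters(sample, 0.5))
```

With the lower cut, `a.test` and `c.test` end up together even though they share no IP, because both are linked through `b.test`. This chaining is characteristic of single linkage.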


Clustering Candidate Flux Domains (2)

• Manually verified the quality of the results for a subset of the clusters obtained every day

• With the help of a graphical interface

• Ex:
– NTP server pool in Europe, North America, Oceania, etc.


Evaluation of the Service Classifier (1)

• Statistical supervised learning approach

• Label the cluster domains according to:
– network prefix diversity (feature 4)
– cumulative number of distinct resolved IPs (feature 1)
– the IP growth ratio (feature 6), etc.


Evaluation of the Service Classifier (2)

[Figure: classification between malicious flux networks and non-malicious flux networks. Features shown: 3 (Avg. TTL per domain), 5 (Number of domains per network), 6 (IP Growth Ratio)]


Can this Contribute to Spam Filtering? (1)

• Check the intersection between the set of domain names found in spam emails and the domains of the malicious flux networks identified by the detection system
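The check amounts to a set intersection. The sketch below uses invented sample domains. The normalisation step (lower-casing and dropping a trailing dot) is an assumption about how the two lists would be made comparable, and the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple


def overlap(spam_domains: Iterable[str],
            flux_domains: Iterable[str]) -> Tuple[Set[str], float]:
    """Return the common domains and the fraction of spam domains they cover."""
    spam = {d.lower().rstrip(".") for d in spam_domains}
    flux = {d.lower().rstrip(".") for d in flux_domains}
    common = spam & flux
    fraction = len(common) / len(spam) if spam else 0.0
    return common, fraction


spam_list = ["Pills.example.", "cheap.test", "news.example"]
flux_list = ["pills.example", "cheap.test", "other.test"]
common, fraction = overlap(spam_list, flux_list)
print(sorted(common), fraction)
```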


Can this Contribute to Spam Filtering? (2)


Agenda: Conclusion



Conclusion

• The detection system is based on passive analysis of recursive DNS (RDNS) traffic traces

• Not limited to the analysis of suspicious domain names extracted from spam emails or precompiled domain blacklists

• Benefits spam filtering applications
