Practical Anomaly Detection based on Classifying Frequent Traffic Patterns

Ignasi Paredes-Oliva (1), Ismael Castell-Uroz (1), Pere Barlet-Ros (1), Xenofontas Dimitropoulos (2), Josep Solé-Pareta (1)

(1) UPC BarcelonaTech, Spain: {iparedes,icastell,pbarlet,pareta}@ac.upc.edu
(2) ETH Zürich, Switzerland: [email protected]

15th IEEE Global Internet Symposium (GI), Orlando, FL, United States, March 30th, 2012



Outline

1 Introduction
2 Related Work
3 Our Proposal
4 Performance Evaluation
5 Conclusions


The problem

Growth of cyber-attacks [1]

Anomaly detection systems are not widely deployed, e.g., too many false positives, complex black boxes

Anomaly classification and root-cause analysis are still open issues, e.g., manual analysis → error-prone, complex, slow and expensive [2]

Our goal

Simple system for automatic anomaly detection and classification

High classification accuracy and low false positives

Conceptually simple working scheme

[1] Kim-Kwang Raymond Choo, The cyber threat landscape: Challenges and future research directions, Computers & Security, 2011.
[2] M. Molina et al., Anomaly Detection in Backbone Networks: Building a Security Service Upon an Innovative Tool, TNC 2010.



Related work and contributions

Many proposals on anomaly detection

Anomaly classification only marginally studied

Contributions of this paper

Novel approach for automatic anomaly detection and classification based on classifying frequent traffic patterns

Evaluated using data from two large networks

High classification accuracy and low false-positive ratio

System deployed in the Catalan NREN



System Overview

Two phases:

Offline: build a model to classify anomalies

Online: use the model to classify incoming traffic

[Pipeline diagram: Frequent Item-Set Mining → Feature Extraction → Machine Learning → Model (offline); Frequent Item-Set Mining → Feature Extraction → Classification against the model (online)]
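The two phases above can be sketched as a skeleton. Every name below is a placeholder for illustration, not the authors' API; the real system mines NetFlow records and trains a C5.0 tree.

```python
# Skeleton of the two-phase scheme above. All names here are placeholders
# for illustration; the real system mines NetFlow data and trains C5.0.

def mine_itemsets(flows):
    # stand-in for frequent item-set mining: one "item-set" per source IP
    return sorted({f["sIP"] for f in flows})

def compute_features(itemset, flows):
    # stand-in for feature extraction: just count matching flows
    return {"n_flows": sum(1 for f in flows if f["sIP"] == itemset)}

class Model:
    def classify(self, feats):
        # stand-in rule: many flows sharing one source looks anomalous
        return "anomalous" if feats["n_flows"] >= 3 else "normal"

def offline_phase(labeled_flows):
    # real system: mine item-sets, extract features, train a C5.0 model
    return Model()

def online_phase(model, flows):
    return {s: model.classify(compute_features(s, flows))
            for s in mine_itemsets(flows)}

flows = [{"sIP": "X.77.17.59"}] * 3 + [{"sIP": "Z.1.2.3"}]
model = offline_phase([])
print(online_phase(model, flows))
```

The key design point is that the same mining and feature-extraction code runs in both phases; only the last stage differs (training offline, classification online).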

Frequent Item-Set Mining

Originally used in market basket analysis to find products that were frequently bought together and make appealing offers (e.g., beer and chips)

What is an item-set?

A compact summarization of elements occurring together

Why is it useful for anomaly detection?

Many attacks involve a high volume of flows with common features

e.g., Port Scan: many flows with the same sIP and dIP


Frequent Item-Set Mining

Port Scan example

                sIP          dIP            sPort   dPort
    1st flow    X.77.17.59   Y.88.243.209   41393   21209
    2nd flow    X.77.17.59   Y.88.243.209   41393   54766
    3rd flow    X.77.17.59   Y.88.243.209   41393   31448
    4th flow    X.77.17.59   Y.88.243.209   41393   58514
    ...
    2911th flow X.77.17.59   Y.88.243.209   41393   48732

                sIP          dIP            sPort   dPort
    item-set    X.77.17.59   Y.88.243.209   41393   *

Need further information per item-set in order to classify it
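The mining step that collapses flows like these into a single item-set can be sketched with a naive enumeration over attribute subsets. Field names and the support threshold are illustrative, and real FIM implementations (Apriori, FP-growth) avoid this brute-force enumeration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical flow records (sIP, dIP, sPort, dPort) mimicking the Port Scan
# example: one source probing one destination on many destination ports.
flows = [("X.77.17.59", "Y.88.243.209", 41393, dport)
         for dport in (21209, 54766, 31448, 58514, 48732)]

FIELDS = ("sIP", "dIP", "sPort", "dPort")

def frequent_itemsets(flows, min_support):
    """Count every (field, value) combination and keep those whose
    absolute support (number of matching flows) reaches min_support."""
    counts = Counter()
    for flow in flows:
        items = tuple(zip(FIELDS, flow))
        for r in range(1, len(items) + 1):
            for subset in combinations(items, r):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

freq = frequent_itemsets(flows, min_support=5)
# The maximal frequent item-set fixes sIP, dIP and sPort but leaves dPort
# free, matching the slide's summary (X.77.17.59, Y.88.243.209, 41393, *).
best = max(freq, key=len)
print(dict(best))
```

Each varying dPort value occurs only once, so no combination containing dPort survives the support threshold; the wildcard in the slide's item-set falls out of the mining itself.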


Feature Extraction

Computed features for each frequent item-set

    Feature                   Value if defined    Value if undefined
    Src IP / Dst IP           True                False
    Src/Dst Port              Port number         NaN
    Protocol                  Protocol number     NaN
    URG/ACK/PSH/RST/SYN/FIN   True                False
    Bytes per Packet (bpp)    #Bytes/#Packets
    Packets per Flow (ppf)    #Packets/#Flows
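A minimal sketch of the feature table above, assuming a simplified flow-record layout (dicts with "bytes" and "packets") and representing wildcarded fields as simply absent from the item-set dict; the names are illustrative, not the authors' code.

```python
import math

# Sketch of the per-item-set features in the table above. The record layout
# and field names are assumptions for illustration.
def extract_features(itemset, matching_flows):
    """itemset: dict of fixed fields (wildcarded fields are absent).
    matching_flows: flow records covered by the item-set."""
    total_bytes = sum(f["bytes"] for f in matching_flows)
    total_pkts = sum(f["packets"] for f in matching_flows)
    feats = {
        # IP addresses: only whether they are fixed in the item-set
        "sIP_defined": "sIP" in itemset,
        "dIP_defined": "dIP" in itemset,
        # Ports/protocol: the concrete number when fixed, NaN when wildcarded
        "sPort": itemset.get("sPort", math.nan),
        "dPort": itemset.get("dPort", math.nan),
        "proto": itemset.get("proto", math.nan),
        "bpp": total_bytes / total_pkts,          # bytes per packet
        "ppf": total_pkts / len(matching_flows),  # packets per flow
    }
    # One boolean per TCP flag listed in the table
    for flag in ("URG", "ACK", "PSH", "RST", "SYN", "FIN"):
        feats[flag] = flag in itemset.get("flags", ())
    return feats

print(extract_features({"sIP": "X.77.17.59", "sPort": 41393},
                       [{"bytes": 40, "packets": 1},
                        {"bytes": 60, "packets": 1}]))
```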

Building the classifier (offline)

Goal: build a model from manually labeled frequent item-sets

Output classes

Anomalous: DoS (DDoS, SYN/ACK/UDP/ICMP floods), Network Scans (ICMP/Other Network Scans), Port Scans (SYN/ACK/UDP Port Scans)

Normal (legitimate traffic)

Unknown (not normal and did not fit in any anomalous class)

Labeled item-sets + features + output classes are given to the C5.0 algorithm (machine learning) → output: classification model
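C5.0 itself is a commercial tool; an ID3-style sketch built on the same information-gain idea, trained on a made-up toy set of labeled item-sets, illustrates what the offline step produces (a decision tree over the extracted features).

```python
from collections import Counter
import math

# Minimal ID3-style decision-tree induction: recursively pick the
# feature/value split with the highest information gain. The labeled
# item-sets below are invented for illustration.
def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    base, best = entropy(labels), None
    for i in range(len(rows[0])):
        for v in {r[i] for r in rows}:
            left = [l for r, l in zip(rows, labels) if r[i] == v]
            right = [l for r, l in zip(rows, labels) if r[i] != v]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(rows)
            if best is None or gain > best[0]:
                best = (gain, i, v)
    return best

def build_tree(rows, labels):
    if len(set(labels)) == 1:
        return labels[0]                       # pure leaf
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, i, v = split
    eq = [(r, l) for r, l in zip(rows, labels) if r[i] == v]
    ne = [(r, l) for r, l in zip(rows, labels) if r[i] != v]
    return (i, v,
            build_tree([r for r, _ in eq], [l for _, l in eq]),
            build_tree([r for r, _ in ne], [l for _, l in ne]))

def predict(tree, row):
    while isinstance(tree, tuple):
        i, v, eq, ne = tree
        tree = eq if row[i] == v else ne
    return tree

# Toy labeled item-sets: (sIP_defined, proto, low_ppf) -> class
rows = [(1, "tcp", 1), (1, "tcp", 1), (0, "udp", 1), (0, "udp", 1), (1, "tcp", 0)]
labels = ["port_scan", "port_scan", "ddos", "ddos", "normal"]
tree = build_tree(rows, labels)
print(predict(tree, (1, "tcp", 1)))
```

The output of this step corresponds to the slide's classification model: a tree of feature tests whose leaves are the output classes.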


Classifying an item-set (online)

Use the model to classify each incoming item-set

[Example decision-tree fragment: from the root, tests on bpp ≤ 29, proto ≤ 6, sIP_defined and ppf ≤ 1.04 lead to the Port Scan and DDoS leaves; the remaining branches are elided]
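The tree fragment can be walked as plain code. The placement of the DDoS and Port Scan leaves relative to each test is my reading of the figure, not stated explicitly on the slide, and the elided subtrees return None here.

```python
# One plausible reading of the decision-tree fragment above; leaf placement
# is inferred from the figure, and branches the slide elides return None.
def classify_itemset(f):
    """f: feature dict with 'bpp', 'proto', 'sIP_defined', 'ppf'."""
    if f["bpp"] > 29:
        return None                  # elided subtree
    if f["proto"] > 6:
        return None                  # elided subtree
    if not f["sIP_defined"]:
        return "DDoS"                # source IP wildcarded (e.g., spoofed)
    if f["ppf"] <= 1.04:
        return "Port Scan"           # fixed source, ~1 packet per flow
    return None                      # elided subtree

print(classify_itemset({"bpp": 20, "proto": 6, "sIP_defined": True, "ppf": 1.0}))
```

Online classification is just this walk from the root to a leaf for every incoming item-set, which is what makes the scheme easy to reason about.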


Datasets

1 GÉANT: European backbone NREN

Connects 34 European NRENs, 12 non-European NRENs and 2 commercial providers

Sampled NetFlow

2 Anella Científica: Catalan NREN

Connects more than 80 research institutions

NetFlow (unsampled)

Our system is currently deployed in this scenario


Building the Ground Truth

1 Run frequent item-set mining on GÉANT NetFlow data

2 Manually analyze and classify the returned item-sets as:

Anomalous

Normal

Unknown

Ground Truth composed of 1249 labeled item-sets


Results in GÉANT

[Bar chart: per-class precision and recall for ACK Flood, ACK Port Scan, DDoS, ICMP Flood, ICMP Scan, Network Scan, Normal, SYN Flood, SYN Port Scan, UDP Flood, UDP Port Scan and Unknown; overall accuracy: 95.7%]

Unbalanced model → overall performance is good (≈ 96%) but not for ACK Port Scans and ICMP Floods

In the balanced model (representativeness of the classes above was increased) → great improvement: 98% accuracy

Results in GÉANT (balanced model)

[Bar chart: per-class precision and recall for the same classes; overall accuracy: 98%]

Results in the Catalan NREN

[Bar chart: per-class precision and recall for ACK Port Scan, DDoS, ICMP Scan, Network Scan, SYN Flood, SYN Port Scan and Unknown; overall accuracy: 94.11%]

Decision tree from GÉANT data

In 10 days, 18 false positives out of 310 anomalies

Low precision for DDoS and ACK Port Scans → 80% of these FP were wrongly classified replies from Network Scans and SYN Floods

After improving the system to take this into account: in 10 days, 4 false positives out of 310 anomalies

Results in the Catalan NREN (after improvement)

[Bar chart: per-class precision and recall; overall accuracy: 99.1%]


Conclusions

Novel system to detect and classify anomalies in network traffic

Conceptually simple approach → easy to comprehend and reason about detected anomalies

High classification accuracy (e.g., > 98%)

Low number of false positives (≈ 1%)

Classification model trained on GÉANT and successfully used in the Catalan NREN

System deployed in the Catalan NREN

Acknowledgments

We thank DANTE and CESCA for having provided us access to GÉANT and Anella Científica, respectively. This work was partially funded by the Spanish Ministry of Education under contract TEC2011-27474 and the Catalan Government under contract 2009SGR-1140.