Practical Anomaly Detection based on Classifying Frequent Traffic Patterns

Ignasi Paredes-Oliva (1), Ismael Castell-Uroz (1), Pere Barlet-Ros (1), Xenofontas Dimitropoulos (2), Josep Solé-Pareta (1)

(1) UPC BarcelonaTech, Spain: {iparedes,icastell,pbarlet,pareta}@ac.upc.edu
(2) ETH Zürich, Switzerland: [email protected]

15th IEEE Global Internet Symposium (GI), Orlando, FL, United States, March 30th, 2012



Outline

1 Introduction
2 Related Work
3 Our Proposal
4 Performance Evaluation
5 Conclusions


The problem

Growth of cyber-attacks [1]

Anomaly detection systems are not widely deployed, e.g., too many false positives, complex black boxes

Anomaly classification and root-cause analysis are still open issues, e.g., manual analysis → error-prone, complex, slow and expensive [2]

Our goal

Simple system for automatic anomaly detection and classification

High classification accuracy and low false positives

Conceptually simple working scheme

[1] Kim-Kwang Raymond Choo, The cyber threat landscape: Challenges and future research directions, Computers & Security, 2011.
[2] M. Molina et al., Anomaly Detection in Backbone Networks: Building a Security Service Upon an Innovative Tool, TNC 2010.



Related work and contributions

Many proposals on anomaly detection

Anomaly classification only marginally studied

Contributions of this paper

Novel approach for automatic anomaly detection and classification based on classifying frequent traffic patterns

Evaluated using data from two large networks

High classification accuracy and low false-positive ratio

System deployed in the Catalan NREN



System Overview

Two phases:

Offline: build a model to classify anomalies

Online: use the model to classify incoming traffic

[Pipeline diagram: Frequent Item-Set Mining → Feature Extraction → Machine Learning → Model (offline); Frequent Item-Set Mining → Feature Extraction → Classification against the model (online)]
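The two phases above can be sketched as a skeleton. Every name below is a placeholder for illustration, not the authors' API; the real system mines NetFlow records and trains a C5.0 tree.

```python
# Skeleton of the two-phase scheme above. All names here are placeholders
# for illustration; the real system mines NetFlow data and trains C5.0.

def mine_itemsets(flows):
    # stand-in for frequent item-set mining: one "item-set" per source IP
    return sorted({f["sIP"] for f in flows})

def compute_features(itemset, flows):
    # stand-in for feature extraction: just count matching flows
    return {"n_flows": sum(1 for f in flows if f["sIP"] == itemset)}

class Model:
    def classify(self, feats):
        # stand-in rule: many flows sharing one source looks anomalous
        return "anomalous" if feats["n_flows"] >= 3 else "normal"

def offline_phase(labeled_flows):
    # real system: mine item-sets, extract features, train a C5.0 model
    return Model()

def online_phase(model, flows):
    return {s: model.classify(compute_features(s, flows))
            for s in mine_itemsets(flows)}

flows = [{"sIP": "X.77.17.59"}] * 3 + [{"sIP": "Z.1.2.3"}]
model = offline_phase([])
print(online_phase(model, flows))
```

The key design point is that the same mining and feature-extraction code runs in both phases; only the last stage differs (training offline, classification online).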

Frequent Item-Set Mining

Originally used in market basket analysis to find products that were frequently bought together and make appealing offers (e.g., beer and chips)

What is an item-set?

A compact summarization of elements occurring together

Why is it useful for anomaly detection?

Many attacks involve a high volume of flows with common features

e.g., Port Scan: many flows with the same sIP and dIP


Frequent Item-Set Mining

Port Scan example

                sIP          dIP            sPort   dPort
    1st flow    X.77.17.59   Y.88.243.209   41393   21209
    2nd flow    X.77.17.59   Y.88.243.209   41393   54766
    3rd flow    X.77.17.59   Y.88.243.209   41393   31448
    4th flow    X.77.17.59   Y.88.243.209   41393   58514
    ...
    2911th flow X.77.17.59   Y.88.243.209   41393   48732

                sIP          dIP            sPort   dPort
    item-set    X.77.17.59   Y.88.243.209   41393   *

Need further information per item-set in order to classify it
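The mining step that collapses flows like these into a single item-set can be sketched with a naive enumeration over attribute subsets. Field names and the support threshold are illustrative, and real FIM implementations (Apriori, FP-growth) avoid this brute-force enumeration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical flow records (sIP, dIP, sPort, dPort) mimicking the Port Scan
# example: one source probing one destination on many destination ports.
flows = [("X.77.17.59", "Y.88.243.209", 41393, dport)
         for dport in (21209, 54766, 31448, 58514, 48732)]

FIELDS = ("sIP", "dIP", "sPort", "dPort")

def frequent_itemsets(flows, min_support):
    """Count every (field, value) combination and keep those whose
    absolute support (number of matching flows) reaches min_support."""
    counts = Counter()
    for flow in flows:
        items = tuple(zip(FIELDS, flow))
        for r in range(1, len(items) + 1):
            for subset in combinations(items, r):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

freq = frequent_itemsets(flows, min_support=5)
# The maximal frequent item-set fixes sIP, dIP and sPort but leaves dPort
# free, matching the slide's summary (X.77.17.59, Y.88.243.209, 41393, *).
best = max(freq, key=len)
print(dict(best))
```

Each varying dPort value occurs only once, so no combination containing dPort survives the support threshold; the wildcard in the slide's item-set falls out of the mining itself.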


Feature Extraction

Computed features for each frequent item-set

    Feature                   Value if defined    Value if undefined
    Src IP / Dst IP           True                False
    Src/Dst Port              Port number         NaN
    Protocol                  Protocol number     NaN
    URG/ACK/PSH/RST/SYN/FIN   True                False
    Bytes per Packet (bpp)    #Bytes/#Packets
    Packets per Flow (ppf)    #Packets/#Flows
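A minimal sketch of the feature table above, assuming a simplified flow-record layout (dicts with "bytes" and "packets") and representing wildcarded fields as simply absent from the item-set dict; the names are illustrative, not the authors' code.

```python
import math

# Sketch of the per-item-set features in the table above. The record layout
# and field names are assumptions for illustration.
def extract_features(itemset, matching_flows):
    """itemset: dict of fixed fields (wildcarded fields are absent).
    matching_flows: flow records covered by the item-set."""
    total_bytes = sum(f["bytes"] for f in matching_flows)
    total_pkts = sum(f["packets"] for f in matching_flows)
    feats = {
        # IP addresses: only whether they are fixed in the item-set
        "sIP_defined": "sIP" in itemset,
        "dIP_defined": "dIP" in itemset,
        # Ports/protocol: the concrete number when fixed, NaN when wildcarded
        "sPort": itemset.get("sPort", math.nan),
        "dPort": itemset.get("dPort", math.nan),
        "proto": itemset.get("proto", math.nan),
        "bpp": total_bytes / total_pkts,          # bytes per packet
        "ppf": total_pkts / len(matching_flows),  # packets per flow
    }
    # One boolean per TCP flag listed in the table
    for flag in ("URG", "ACK", "PSH", "RST", "SYN", "FIN"):
        feats[flag] = flag in itemset.get("flags", ())
    return feats

print(extract_features({"sIP": "X.77.17.59", "sPort": 41393},
                       [{"bytes": 40, "packets": 1},
                        {"bytes": 60, "packets": 1}]))
```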

Building the classifier (offline)

Goal: build a model from manually labeled frequent item-sets

Output classes

Anomalous: DoS (DDoS, SYN/ACK/UDP/ICMP floods), Network Scans (ICMP/Other Network Scans), Port Scans (SYN/ACK/UDP Port Scans)

Normal (legitimate traffic)

Unknown (not normal and did not fit in any anomalous class)

Labeled item-sets + features + output classes are given to the C5.0 algorithm (machine learning) → output: classification model
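C5.0 itself is a commercial tool; an ID3-style sketch built on the same information-gain idea, trained on a made-up toy set of labeled item-sets, illustrates what the offline step produces (a decision tree over the extracted features).

```python
from collections import Counter
import math

# Minimal ID3-style decision-tree induction: recursively pick the
# feature/value split with the highest information gain. The labeled
# item-sets below are invented for illustration.
def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    base, best = entropy(labels), None
    for i in range(len(rows[0])):
        for v in {r[i] for r in rows}:
            left = [l for r, l in zip(rows, labels) if r[i] == v]
            right = [l for r, l in zip(rows, labels) if r[i] != v]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(rows)
            if best is None or gain > best[0]:
                best = (gain, i, v)
    return best

def build_tree(rows, labels):
    if len(set(labels)) == 1:
        return labels[0]                       # pure leaf
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, i, v = split
    eq = [(r, l) for r, l in zip(rows, labels) if r[i] == v]
    ne = [(r, l) for r, l in zip(rows, labels) if r[i] != v]
    return (i, v,
            build_tree([r for r, _ in eq], [l for _, l in eq]),
            build_tree([r for r, _ in ne], [l for _, l in ne]))

def predict(tree, row):
    while isinstance(tree, tuple):
        i, v, eq, ne = tree
        tree = eq if row[i] == v else ne
    return tree

# Toy labeled item-sets: (sIP_defined, proto, low_ppf) -> class
rows = [(1, "tcp", 1), (1, "tcp", 1), (0, "udp", 1), (0, "udp", 1), (1, "tcp", 0)]
labels = ["port_scan", "port_scan", "ddos", "ddos", "normal"]
tree = build_tree(rows, labels)
print(predict(tree, (1, "tcp", 1)))
```

The output of this step corresponds to the slide's classification model: a tree of feature tests whose leaves are the output classes.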


Classifying an item-set (online)

Use the model to classify each incoming item-set

[Example decision-tree fragment: from the root, tests on bpp ≤ 29, proto ≤ 6, sIP_defined and ppf ≤ 1.04 lead to the Port Scan and DDoS leaves; the remaining branches are elided]
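The tree fragment can be walked as plain code. The placement of the DDoS and Port Scan leaves relative to each test is my reading of the figure, not stated explicitly on the slide, and the elided subtrees return None here.

```python
# One plausible reading of the decision-tree fragment above; leaf placement
# is inferred from the figure, and branches the slide elides return None.
def classify_itemset(f):
    """f: feature dict with 'bpp', 'proto', 'sIP_defined', 'ppf'."""
    if f["bpp"] > 29:
        return None                  # elided subtree
    if f["proto"] > 6:
        return None                  # elided subtree
    if not f["sIP_defined"]:
        return "DDoS"                # source IP wildcarded (e.g., spoofed)
    if f["ppf"] <= 1.04:
        return "Port Scan"           # fixed source, ~1 packet per flow
    return None                      # elided subtree

print(classify_itemset({"bpp": 20, "proto": 6, "sIP_defined": True, "ppf": 1.0}))
```

Online classification is just this walk from the root to a leaf for every incoming item-set, which is what makes the scheme easy to reason about.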


Datasets

1 GÉANT: European backbone NREN

Connects 34 European NRENs, 12 non-European NRENs and 2 commercial providers

Sampled NetFlow

2 Anella Científica: Catalan NREN

Connects more than 80 research institutions

NetFlow (unsampled)

Our system is currently deployed in this scenario


Building the Ground Truth

1 Run frequent item-set mining on GÉANT NetFlow data

2 Manually analyze and classify the returned item-sets as:

Anomalous

Normal

Unknown

Ground Truth composed of 1249 labeled item-sets


Results in GÉANT

[Bar chart: per-class precision and recall for ACK Flood, ACK Port Scan, DDoS, ICMP Flood, ICMP Scan, Network Scan, Normal, SYN Flood, SYN Port Scan, UDP Flood, UDP Port Scan and Unknown; overall accuracy: 95.7%]

Unbalanced model → overall performance is good (≈ 96%) but not for ACK Port Scans and ICMP Floods

In the balanced model (representativeness of the classes above was increased) → great improvement: 98% accuracy

Results in GÉANT (balanced model)

[Bar chart: per-class precision and recall for the same classes; overall accuracy: 98%]

Results in the Catalan NREN

[Bar chart: per-class precision and recall for ACK Port Scan, DDoS, ICMP Scan, Network Scan, SYN Flood, SYN Port Scan and Unknown; overall accuracy: 94.11%]

Decision tree from GÉANT data

In 10 days, 18 false positives out of 310 anomalies

Low precision for DDoS and ACK Port Scans → 80% of these FP were wrongly classified replies from Network Scans and SYN Floods

After improving the system to take this into account: in 10 days, 4 false positives out of 310 anomalies

Results in the Catalan NREN (after improvement)

[Bar chart: per-class precision and recall; overall accuracy: 99.1%]


Conclusions

Novel system to detect and classify anomalies in network traffic

Conceptually simple approach → easy to comprehend and reason about detected anomalies

High classification accuracy (e.g., > 98%)

Low number of false positives (≈ 1%)

Classification model trained on GÉANT and successfully used in the Catalan NREN

System deployed in the Catalan NREN

Acknowledgments

We thank DANTE and CESCA for having provided us access to GÉANT and Anella Científica, respectively. This work was partially funded by the Spanish Ministry of Education under contract TEC2011-27474 and the Catalan Government under contract 2009SGR-1140.