Algorithms for Identification of Network Data Streams



As traffic flows through the router, the Staleness Detector monitors the characteristics of the traffic and triggers an alarm if its behavior has changed significantly. The alarm starts a process on the Signature Factory, which clusters the flows matching the alarmed signature into groups. The new cluster is then analyzed to extract signatures. The new signatures are merged with purchased signatures, and the resulting signature set is tested against a corpus of end-user traffic.


Background:

The volume of Internet traffic is very large, and accurately identifying its essential traits is a challenging problem. Existing techniques typically rely on manually generated signatures specified in packet headers, which makes traffic identification tests relatively simple. However, this approach lacks the flexibility required to deal with the constant changes in network traffic patterns.

Problems:

• How to continuously detect changes in network traffic streams

• How to identify suspicious traffic streams without pre-specified signatures

• How to generate network traffic signatures automatically (i.e., without consuming a network expert’s time)

• How to allocate network resources only when needed

Proposed Solution:

AutoImmune System: an Intelligent IP Service Infrastructure

AutoImmune addresses a more general traffic stream identification problem, one that requires complex packet-payload-based membership tests without pre-specified signature sets. We implemented AutoImmune by integrating the three algorithms we developed, and tested the system against simulated data traffic. The system performed well on non-stationary traffic streams across the simulated networking environments. It adapts automatically to changes in the characteristics of network traffic and identifies new types of traffic patterns almost in real time (obtaining a new traffic pattern takes less than 10 seconds in a Gbps communication network). Simulation results showed that the system successfully identifies a new type of network traffic that accounts for as little as 0.2% of total network traffic. To the best of our knowledge, the lowest such proportion previously reported in the literature for worm detection is 1.1%, by a worm detection system referred to as DoWitcher. The smaller the share of the new type of traffic, the longer it takes to identify its signature.

1. Change Detection

The algorithm (in the Staleness Detector) keeps a dictionary of data elements that are deemed useful in predicting future data elements. New data points that are not well explained by this dictionary are signaled as alarms. For each new data point:

• Compute the distance from this point to the points already in the dictionary

• If the point is very far, set a Red Alarm

• If it is somewhat far, set an Orange Alarm

• If it is close, raise no alarm

Periodically, evaluate the Orange Alarms and clean up the dictionary.

A study related to our change detection algorithm is [1].
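The alarm logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the poster's implementation: the Euclidean distance, the two thresholds, and the `review` rule (an Orange point is absorbed into the dictionary when enough other Orange points lie nearby, otherwise it is escalated) are all hypothetical choices, since the poster does not specify them.

```python
import math


class StalenessDetector:
    """Illustrative dictionary-based change detector (hypothetical parameters)."""

    def __init__(self, orange_threshold, red_threshold):
        # Both thresholds are assumed tuning knobs, not values from the poster.
        self.orange_threshold = orange_threshold
        self.red_threshold = red_threshold
        self.dictionary = []      # representative data points
        self.orange_points = []   # Orange Alarms awaiting periodic review

    def observe(self, point):
        """Return "none", "orange", or "red" for a new data point."""
        point = tuple(point)
        if not self.dictionary:
            self.dictionary.append(point)  # seed the dictionary with the first point
            return "none"
        nearest = min(math.dist(point, d) for d in self.dictionary)
        if nearest > self.red_threshold:
            return "red"
        if nearest > self.orange_threshold:
            self.orange_points.append(point)
            return "orange"
        return "none"

    def review(self, min_support=2):
        """Periodic step (assumed rule): absorb recurring Orange points."""
        promoted, escalated = [], []
        for p in self.orange_points:
            # Support counts the point itself plus nearby Orange points.
            support = sum(
                1 for q in self.orange_points
                if math.dist(p, q) <= self.orange_threshold
            )
            (promoted if support >= min_support else escalated).append(p)
        self.dictionary.extend(promoted)
        self.orange_points = []
        return promoted, escalated


if __name__ == "__main__":
    det = StalenessDetector(orange_threshold=1.0, red_threshold=5.0)
    for pt in [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0), (2.5, 0.0), (10.0, 0.0)]:
        print(pt, det.observe(pt))
    print(det.review(min_support=2))
```

Keeping the Orange points in a separate list makes the periodic review cheap, and it keeps the dictionary small because only recurring behavior is ever added to it.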

This research was supported in part by the MITACS Internship Program. The authors would like to acknowledge the contributions made by Katrina Rogers-Stewart, Yihui Tang, and Pin Yuan.

[1] T. Ahmed, M. Coates, and A. Lakhina, “Multivariate online anomaly detection using kernel recursive least squares,” in Proc. IEEE INFOCOM, Anchorage, AK, May 2007.

[2] V. Paxson, “Bro: A System for Detecting Network Intruders in Real-Time,” Lawrence Berkeley National Laboratory, in Proc. 7th USENIX Security Symposium, San Antonio, TX, Jan. 26-29, 1998.

[3] M. Roesch, “Snort - Lightweight Intrusion Detection for Networks,” in Proc. USENIX LISA '99, Seattle, Nov. 7-12, 1999.

[4] F. Hao, M.S. Kodialam, T.V. Lakshman, and H. Zhang, “Fast Payload-Based Flow Estimation for Traffic Monitoring and Network Security,” in Proc. ANCS 2005, New Jersey, USA, Oct. 26-28, 2005.


Jun Li*, and Peter Rabinovitch** *Carleton University, **Bell Labs, Alcatel-Lucent

Supervisor: Dr. Yiqiang Q. Zhao (Carleton University)


2. Data Clustering and Classification

The algorithm (in the Signature Factory) classifies test data points into two clusters: typical traffic and atypical traffic.

• The data space is split into small regions

• Two density estimates are obtained for each region:

1. The proportion of known observations that fall in the region

2. The proportion of test observations that fall in the region

Observations in regions that have a nil (or very small) estimate under typical traffic, but a relatively large estimate under test traffic, are classified as atypical traffic.
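As a minimal sketch of this region-based comparison, the following Python function bins points into square grid cells and compares the two proportions per cell. The cell size and the `low` and `high` cut-offs are hypothetical parameters chosen for illustration, and each point is assumed to be a (flow length, mean packet size) pair as in Fig. 2; the poster does not state how regions or thresholds were actually chosen.

```python
from collections import Counter


def classify_atypical(known, test, cell_size, low=0.0, high=0.05):
    """Return the test points that fall in regions rare under known traffic.

    known, test: lists of numeric tuples, e.g. (flow length, mean packet size).
    cell_size:   side length of each grid region (assumed, not from the poster).
    low, high:   hypothetical cut-offs for "nil or very small" and
                 "relatively large" density estimates.
    """
    def cell(point):
        # Map a point to the index of the grid region that contains it.
        return tuple(int(x // cell_size) for x in point)

    known_counts = Counter(cell(p) for p in known)
    test_counts = Counter(cell(p) for p in test)

    atypical = []
    for p in test:
        c = cell(p)
        known_density = known_counts[c] / len(known)  # estimate 1
        test_density = test_counts[c] / len(test)     # estimate 2
        if known_density <= low and test_density >= high:
            atypical.append(p)
    return atypical


if __name__ == "__main__":
    known = [(6, 1500)] * 50 + [(200, 200)] * 50
    test = [(6, 1500)] * 90 + [(40, 1000)] * 10
    print(len(classify_atypical(known, test, cell_size=100)))
```

A fixed grid is the simplest way to form regions; any other partition of the data space would slot into `cell` without changing the comparison itself.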

[Architecture diagram. Components: Purchased Signatures, Signature Factory, Staleness Detector, Router. Flows: packets of changed signature (5-tuple, packet size, …); alarms of signature changes; new signatures.]


3. Signature Extraction

A signature-based algorithm similar to Bro [2] and Snort [3], and based on [4], is used.

Only the cluster of atypical traffic is examined when extracting signatures.
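The poster does not spell out the extraction procedure, so the following Python sketch substitutes a deliberately simple stand-in technique: it looks for the fixed-length byte string shared by the largest share of payloads in the atypical cluster. The function name, the `length` and `min_fraction` parameters, and the sample payloads are all hypothetical, and this should not be read as the method of [2], [3], or [4].

```python
from collections import Counter


def extract_signature(payloads, length, min_fraction=0.8):
    """Return the most widely shared byte substring of the given length.

    payloads:     list of bytes objects from the atypical cluster only.
    length:       signature length in bytes (assumed to be chosen in advance).
    min_fraction: hypothetical minimum share of payloads that must contain
                  the candidate for it to be accepted as a signature.
    """
    counts = Counter()
    for payload in payloads:
        # Count each distinct substring at most once per payload.
        seen = {
            payload[i:i + length]
            for i in range(len(payload) - length + 1)
        }
        counts.update(seen)
    if not counts:
        return None
    # Break ties deterministically by comparing the byte strings themselves.
    candidate, hits = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return candidate if hits / len(payloads) >= min_fraction else None


if __name__ == "__main__":
    cluster = [
        b"GET /a WORMSIG123 x",
        b"zzWORMSIG123",
        b"WORMSIG123 tail",
        b"benign data",
    ]
    print(extract_signature(cluster, length=10, min_fraction=0.7))
```

Restricting the search to the atypical cluster keeps the candidate set small, and the `min_fraction` check tolerates the few non-malware items that end up in that cluster.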

Fig. 2: Clustering 2-dimensional Data (axes: mean packet size and flow length in packets; regions shown: data space, cluster of typical traffic, cluster of atypical traffic)


In the implemented system, 20 computers are connected through the Router (shown in Fig. 1) and exchange multimedia traffic. The Staleness Detector and the Signature Factory are connected to the Router and run separately. There are five types of traffic flows: Web, Mix, Smtp, VoIP, and Video. The statistics of the traffic flows are shown in Table 1.

Fig. 1: AutoImmune Architecture

Flow type   Avg. flow length (# of packets)   Std. flow length   Avg. packet size   Std. packet size
Web         6                                 2                  1500               100
SMTP        3                                 2                  1500               100
VoIP        200                               50                 200                100
Video       600                               100                400                200
Mix         40                                2                  1000               100

Table 1: Five Types of Traffic Flows

Network speed is assumed to be 1 Gbps. At the beginning of the simulation, each computer generates traffic without Mix flows. When the simulation enters steady state, each computer starts generating Mix flows in the proportion specified in Table 2. The payload of each Mix packet is injected with a synthetic worm. The injected Mix traffic appears as Web-type traffic while passing through the router.

Simulation run   Web     Smtp   VoIP   Video   Mix (or Malicious)
S1               45%     20%    20%    10%     5%
S2               49%     20%    20%    10%     1%
S3               49.8%   20%    20%    10%     0.2%

Table 2: Proportions of Traffic Flows
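To make the setup concrete, here is a rough Python sketch of how flows with the statistics of Table 1 could be drawn in the proportions of Table 2. It rests on assumptions the poster does not state: flow length and packet size are drawn independently from normal distributions, values are clamped to at least 1, and `sample_flows` is a hypothetical helper rather than the simulator actually used.

```python
import random

# (avg flow length, std flow length, avg packet size, std packet size), Table 1.
FLOW_STATS = {
    "Web": (6, 2, 1500, 100),
    "Smtp": (3, 2, 1500, 100),
    "VoIP": (200, 50, 200, 100),
    "Video": (600, 100, 400, 200),
    "Mix": (40, 2, 1000, 100),
}

# Share of each flow type per simulation run, Table 2.
RUNS = {
    "S1": {"Web": 0.45, "Smtp": 0.20, "VoIP": 0.20, "Video": 0.10, "Mix": 0.05},
    "S2": {"Web": 0.49, "Smtp": 0.20, "VoIP": 0.20, "Video": 0.10, "Mix": 0.01},
    "S3": {"Web": 0.498, "Smtp": 0.20, "VoIP": 0.20, "Video": 0.10, "Mix": 0.002},
}


def sample_flows(run, n, seed=0):
    """Draw n synthetic flows as (type, flow length, mean packet size)."""
    rng = random.Random(seed)
    kinds = list(RUNS[run])
    weights = [RUNS[run][k] for k in kinds]
    flows = []
    for kind in rng.choices(kinds, weights=weights, k=n):
        len_mu, len_sd, size_mu, size_sd = FLOW_STATS[kind]
        # Normal draws are an assumption; clamp so no flow is empty.
        length = max(1, round(rng.gauss(len_mu, len_sd)))
        size = max(1.0, rng.gauss(size_mu, size_sd))
        flows.append((kind, length, size))
    return flows


if __name__ == "__main__":
    flows = sample_flows("S1", 10000)
    mix_share = sum(1 for kind, _, _ in flows if kind == "Mix") / len(flows)
    print(f"Mix share in sample: {mix_share:.3f}")
```

The (flow length, mean packet size) pairs produced this way are the same two features plotted in Fig. 2, so such a sample can be used to exercise a clustering step.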

Fig. 3: Change Detection in S1

Fig. 4: Change Detection in S2

Fig. 5: Change Detection in S3

Fig. 6: Flow Clustering and Classification in S1

Fig. 7: Flow Clustering and Classification in S2

Fig. 8: Flow Clustering and Classification in S3

Simulation run   T       N     N’    MEAN     L
S1               0.25    679   48    1030.6   21
S2               0.679   859   80    1048.9   18
S3               3.14    789   117   1091.2   20

Table 3: Numerical Values of Parameters

The following parameters are defined for each simulation run:

1) T -- Period from when the malware (e.g., Mix traffic) starts until the new signature is obtained by the Router

2) N -- Number of items in the cluster of atypical traffic

3) N’ -- Number of items in the cluster of atypical traffic that are NOT malware (i.e., not of Mix type)

4) MEAN -- Mean length (in bytes) of the packets in the cluster of atypical traffic

5) L -- Length (in bytes) of the extracted signature
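One quantity that follows directly from Table 3 is the share of the atypical cluster that is not malware, N’/N. The short Python calculation below derives it for each run; the name `non_malware_share` is ours, and the ratio is a derived figure rather than one reported on the poster.

```python
# (T, N, N') per simulation run, copied from Table 3.
TABLE_3 = {
    "S1": (0.25, 679, 48),
    "S2": (0.679, 859, 80),
    "S3": (3.14, 789, 117),
}


def non_malware_share(n, n_prime):
    """Fraction of items in the atypical cluster that are not malware."""
    return n_prime / n


if __name__ == "__main__":
    for run, (t, n, n_prime) in TABLE_3.items():
        share = non_malware_share(n, n_prime)
        print(f"{run}: T={t}, N'/N = {share:.1%}")
    # S1: T=0.25, N'/N = 7.1%
    # S2: T=0.679, N'/N = 9.3%
    # S3: T=3.14, N'/N = 14.8%
```

Read together with Table 2, both T and N’/N rise as the Mix proportion falls from 5% to 0.2%, which is consistent with the observation that rarer traffic takes longer to identify.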