network security monitoring and analysis based on big data technologies

DESCRIPTION

Network Security Monitoring and Analysis based on Big Data Technologies. Bingdong Li. August 26, 2013. Outline: Motivation, Objectives, System Design, Monitoring and Visualization, Network Measurement, Classification and Identification of Network Objects, Conclusion, Future Work.

TRANSCRIPT

Network Security Monitoring and Analysis based on Big Data Technologies

Bingdong Li

August 26, 2013

Outline

Motivation

Objectives

System Design

Monitoring and Visualization

Network Measurement

Classification and Identification of Network Objects

Conclusion

Future Work

Motivation

Traditional security systems assume a static system

Network attacks are

– sophisticated

– organized

– targeted

– persistent

– dynamic

– external

– internal

Motivation

Problem: Network Security is becoming more challenging

Resource: A Large Amount of Security Data

– Network flow

– Firewall log

– Application log

– Server log

– SNMP

Opportunity: Big Data Technologies, Machine Learning

Objectives

A network security monitoring and analysis system based on Big Data technologies that provides

– Network measurement

– Real-time continuous monitoring and interactive visualization

– Intelligent classification and identification of network objects, using role behavior as context

Objectives

[Figure: diagram labeled Big Data, Machine Learning, and Network Security]

System Design

Data Collection

System Design

Online Real Time Process

System Design

NoSQL Storage

System Design

User Interfaces

System Design

The Design supports features:

– Real Time Continuous Monitoring and Interactive Visualization

– Network Measurement

– Classification and Identification of Network Objects

Monitoring and Visualization

Real time: response within a time constraint

Interactive: involves user interaction

Continuous: “continue to be effective over time in light of the inevitable changes that occur” (NIST)

Monitoring and Visualization

Retrieve Data

Web User Interfaces

Video Demo

Monitoring and Visualization

Data Retrieval:

Data are stored with the IP address as the primary (row) key and the time slice as the secondary (column) key

Accessing these data takes constant time, Θ(1)
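
The key layout described on this slide can be pictured as a two-level hash map. The following is a minimal in-memory sketch, not the system's actual NoSQL schema: the `FlowStore` class, the 60-second slice width, and the byte-counter payload are all assumptions made for illustration.

```python
from collections import defaultdict


class FlowStore:
    """Toy stand-in (hypothetical) for a store keyed by IP, then by time slice."""

    def __init__(self, slice_seconds=60):
        # Assumed slice width; the slides do not state one.
        self.slice_seconds = slice_seconds
        # row key (IP) -> column key (time slice start) -> accumulated bytes
        self.rows = defaultdict(lambda: defaultdict(int))

    def time_slice(self, timestamp):
        # Map a timestamp in seconds to the start of its time slice.
        return int(timestamp) - int(timestamp) % self.slice_seconds

    def add(self, ip, timestamp, byte_count):
        self.rows[ip][self.time_slice(timestamp)] += byte_count

    def get(self, ip, timestamp):
        # Two hash lookups: the cost does not grow with the amount of stored data.
        row = self.rows.get(ip)
        if row is None:
            return 0
        return row.get(self.time_slice(timestamp), 0)


store = FlowStore()
store.add("10.0.0.5", 120, 1200)
store.add("10.0.0.5", 150, 300)   # falls in the same 60-second slice as 120
store.add("10.0.0.5", 200, 700)   # falls in the next slice, starting at 180

print(store.get("10.0.0.5", 130))  # bytes recorded in slice 120
print(store.get("10.0.0.5", 185))  # bytes recorded in slice 180
```

Because both keys are resolved by hashing, a query for one host in one time slice avoids scanning other hosts or other slices, which is what makes constant-time access plausible for real-time querying.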

Real Time Querying

Host Network Connection

Network Status

Top N

Video Demo

Demo of Interactivity and Continuity

Network Measurement

A case study

The Anonymity Technology Usage on Campus Network

Using sFlow

– Geo-location

– Usage of anonymity systems

Geo-location of Anonymity Usage on Campus

One Instance: Bahamas, Belarus, Belgium, Bulgaria, Cambodia, Chile, Colombia, Estonia, Ghana, Greece, Hungary, Ireland, Israel, Jamaica, Jordan, Korea, Mongolia, Namibia, Nigeria, Pakistan, Panama, Philippines, Slovakia, Turkey, Ukraine, Vietnam, Zimbabwe

Two Instances: Chad, Czech Republic, Denmark, Hong Kong, Iran, Japan, Kazakhstan, Poland, Romania, Spain, Switzerland

Three Instances: Austria, France, Singapore

Four Instances: Australia, Indonesia, Taiwan, Thailand

Usage of Anonymity Systems

Category     Packets (%)     Traffic, MB (%)   Observed IPs (%)
Proxies      5,580 (62.65)   8.13 (43.53)      234 (3.23)
Tor          3,129 (35.13)   9.04 (48.37)      152 (0.25)
I2P          190 (2.13)      1.50 (8.02)       23 (1.01)
Commercial   7 (0.08)        0.016 (0.08)      2 (N/A)
Total        8,906 (100)     16.69 (100)       411 (N/A)
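
As a worked check of how the percentage columns relate to the raw counts, the short sketch below recomputes the packet shares from the packet counts in the table. The function name `percent_shares` is an assumption for illustration; only the packet column is used.

```python
# Packet counts per anonymity system, copied from the table.
packets = {"Proxies": 5580, "Tor": 3129, "I2P": 190, "Commercial": 7}


def percent_shares(counts):
    """Return each category's share of the total, in percent, to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


shares = percent_shares(packets)
for name, share in shares.items():
    print(f"{name:<11}{packets[name]:>6}  {share:>6.2f}%")
```

The packet counts sum to the table's total of 8,906, and the recomputed shares match the percentages listed in the Packets column.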

Classification of Host Roles

Data: three months of sFlow data from a large campus

Role               Count
Client             5494
Server             1920
Public Place       784
Personal Office    416
College1           163
College2           253
Web Server         56
Web Email Server   25

Classification of Host Roles

Algorithms

Decision Tree

On-line SVM
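
The slides do not say which on-line SVM variant was used. As one possible reading, the sketch below trains a linear SVM one example at a time with stochastic gradient descent on the L2-regularized hinge loss. The class name, learning rate, regularization strength, and toy data are all assumptions; the decision tree is not shown.

```python
class OnlineLinearSVM:
    """Linear SVM updated one example at a time (hinge loss, L2 penalty)."""

    def __init__(self, n_features, learning_rate=0.1, reg=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.learning_rate = learning_rate  # assumed constant step size
        self.reg = reg                      # assumed regularization strength

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def partial_fit(self, x, y):
        """Update the model on a single example; y must be +1 or -1."""
        margin = y * self.score(x)
        lr = self.learning_rate
        # Gradient step on the L2 penalty: shrink the weights slightly.
        self.w = [(1.0 - lr * self.reg) * wi for wi in self.w]
        # Gradient step on the hinge loss, only if the margin is violated.
        if margin < 1.0:
            self.w = [wi + lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += lr * y


# Toy stream of normalized two-feature examples: +1 and -1 are arbitrary labels.
stream = [
    ([1.0, 0.9], 1),
    ([0.0, 0.1], -1),
    ([0.8, 1.0], 1),
    ([0.2, 0.0], -1),
]

model = OnlineLinearSVM(n_features=2)
for _ in range(200):                 # replay the stream to mimic a long feed
    for x, y in stream:
        model.partial_fit(x, y)

print(model.predict([0.9, 0.9]), model.predict([0.1, 0.1]))
```

The appeal of an on-line learner in this setting is that each flow-derived example is processed once and then discarded, so the model can keep up with a continuous feed without storing the full training set.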

Classification of Host Roles

Features

Ad hoc based on domain knowledge

Aggregating features for on-line classification

24 features normalized between 0 and 1, inclusive
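
The slides do not state how the features were scaled to the range 0 to 1. A common choice is min-max normalization, sketched below under that assumption; the function name and the three-column sample data are hypothetical.

```python
def min_max_normalize(rows):
    """Scale every feature column to the closed interval [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, lo, hi in zip(row, lows, highs):
            # A constant column carries no information; map it to 0.0.
            scaled.append((value - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(scaled)
    return normalized


# Three hosts, three made-up raw features each.
raw = [
    [10, 64, 1500],
    [30, 128, 500],
    [20, 64, 1000],
]

scaled = min_max_normalize(raw)
for row in scaled:
    print(row)
```

In an on-line setting the true minimum and maximum are not known in advance, so a real pipeline would need running bounds or fixed limits taken from domain knowledge (for example, 0 to 65535 for port numbers).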

Classification of Host Roles

Features

24 features derived from

src/dest IP address

src/dest Port number

TTL

Packet Size

Transport protocol
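
The 24 features themselves are not listed on the slides. To illustrate how per-host features might be aggregated from the fields named above, the sketch below derives five hypothetical features from sampled packet records. The record field names and the chosen features are assumptions, not the author's feature set.

```python
from collections import defaultdict


def host_features(samples):
    """Aggregate sampled packet records into one feature dict per source IP."""
    grouped = defaultdict(list)
    for record in samples:
        grouped[record["src_ip"]].append(record)

    features = {}
    for ip, recs in grouped.items():
        n = len(recs)
        features[ip] = {
            # Hypothetical features; the slides do not list the real ones.
            "distinct_peers": len({r["dst_ip"] for r in recs}),
            "distinct_dst_ports": len({r["dst_port"] for r in recs}),
            "mean_ttl": sum(r["ttl"] for r in recs) / n,
            "mean_packet_size": sum(r["size"] for r in recs) / n,
            "tcp_fraction": sum(r["proto"] == "tcp" for r in recs) / n,
        }
    return features


samples = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "dst_port": 80,
     "ttl": 64, "size": 1500, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.7", "dst_port": 443,
     "ttl": 64, "size": 500, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "dst_port": 53,
     "ttl": 62, "size": 100, "proto": "udp"},
    {"src_ip": "10.0.0.9", "dst_ip": "10.0.0.5", "dst_port": 51000,
     "ttl": 128, "size": 1000, "proto": "tcp"},
]

features = host_features(samples)
print(features["10.0.0.5"])
```

Counts such as the number of distinct peers tend to separate servers (many peers, few local ports) from clients, which is the kind of role behavior the classification slides rely on.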

Classification of Host Roles

Ground Truth

Host Information in Active Directory

Crawler to validate its status

Classification of Host Roles

Classifying Client vs. Server

Classifying Web Server vs. Web Email Server

Classifying Hosts at Personal Office vs. Public Place

Classifying Hosts at Two Different Colleges

Feature Contributions

Classifying Client vs. Server

Classifying Web Server vs. Web Email Server

Classifying Host From Personal Office vs. Public Place

Classifying Host From Two Different Colleges

Accuracy

High accuracies of Host Role Classification

Classification                                 Accuracy (%)
Client vs. server                              99.2
Regular web server vs. web email server        100
Hosts from personal office vs. public place    93.3
Hosts from two different colleges              93.3

Feature Contribution

Identification of a User

Data: NetFlow data from a large campus

Count

College1 163

College2 253

Identification of a User

Algorithms

Decision Tree

On-line SVM

Ground Truth

Host Information in Active Directory

Crawler to validate its status

Identification of a User

Features

Discrete probability distribution function (pdf)

An example: system port numbers [6, 8, 9, 11, 14, 30, 80, 1020]

– Outlier fraction (P) is 1%

– Port of interest (S) is 80

– Number of bins (R) is 4

Identification of a User

An Example

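(1 - 0.01) * 8 = 7.92, rounded down to 7; the 7th sorted value is 80

bin width = 80 / (4 - 1) ≈ 26.67; the first three bins cover 0 to 80 and the fourth bin holds the outliers above 80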

[6, 8, 9, 11, 14, 30, 80, 1020]

pdf = 0.625 0.125 0.125 0.125
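
The arithmetic of this example can be reproduced with a short sketch. It assumes that the cutoff is the value at rank floor((1 - P) * n) in sorted order, that the first R - 1 bins split the range from 0 to the cutoff evenly, and that the last bin collects everything above the cutoff. The slide does not spell out how the port of interest S enters the computation; here it coincides with the cutoff, so the sketch uses only P and R. The function name is hypothetical.

```python
import math


def binned_pdf(values, outlier_fraction, n_bins):
    """Discrete pdf: n_bins - 1 equal-width bins plus one final outlier bin."""
    ordered = sorted(values)
    # Rank of the cutoff value, e.g. floor(0.99 * 8) = 7 for the slide's data.
    rank = max(1, math.floor((1 - outlier_fraction) * len(ordered)))
    cutoff = ordered[rank - 1]
    counts = [0] * n_bins
    for v in ordered:
        if v > cutoff:
            index = n_bins - 1                      # outlier bin
        elif cutoff > 0:
            # Equal-width bins over [0, cutoff]; the cutoff itself is placed
            # in the last regular bin rather than spilling into the outliers.
            index = min(int(v * (n_bins - 1) / cutoff), n_bins - 2)
        else:
            index = 0
        counts[index] += 1
    return [c / len(ordered) for c in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
pdf = binned_pdf(ports, outlier_fraction=0.01, n_bins=4)
print(pdf)
```

With the slide's inputs, five ports fall in the first bin and one port in each of the other three, which reproduces the pdf shown on the slide.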

[Figure: the four bins hold {6, 8, 9, 11, 14}, {30}, {80}, and {1020}]

Identification of a User

An Example without P and S

Bin width is 1024 / 4 = 256

[6, 8, 9, 11, 14, 30, 80, 1020]

pdf = 0.875 0 0 0.125
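
For comparison, the plain equal-width binning in this example can be sketched as follows. The function name is hypothetical, and the range of 1024 is the system port range used on the slide.

```python
def fixed_width_pdf(values, value_range, n_bins):
    """Discrete pdf over n_bins equal-width bins covering [0, value_range)."""
    width = value_range / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that a value equal to value_range lands in the last bin.
        index = min(int(v // width), n_bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
pdf = fixed_width_pdf(ports, value_range=1024, n_bins=4)
print(pdf)
```

Seven of the eight ports land in the first bin, so almost all of the distinguishing detail among the low port numbers is lost. That loss of resolution is what the outlier cutoff in the previous example avoids.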

[Figure: the first bin holds {6, 8, 9, 11, 14, 30, 80}; the last bin holds {1020}]

Identify a User Among Other Users

Accuracy

Identifying a particular user among other users

Decision Tree 93.3%

On-line Support Vector Machine 78.5%

Feature Contribution

Conclusion

Major Contributions

– A Big Data analysis system

• a conference paper

– Monitoring and interactive visualization

– Usage of anonymity technologies

• a conference paper and a journal paper

– Models for classification of host roles and identification of users

• a conference paper

Conclusion

The Big Data analysis system is high performance and scalable

Real Time Continuous Network Monitoring and Interactive Visualization are implemented and supported by the high performance system

Conclusion

Proxies and Tor are the main anonymity technologies used on campus

– US, Germany, and China are the top 3 countries

Models and features for classification of host roles:

– client vs. server, web server vs. web email server, personal office vs. public place, hosts from two different colleges

Models and features for identification of a particular user among other users

Future Work

Improvement to the Current Work

– More interactive features and better user interfaces

– Further analysis on user identification: features and algorithms (such as deep learning)

Future Work

Extension to the Current Work

– Define and filter out background traffic

– Detection of operating system fingerprinting

– Identity anonymity

– Fusion with other network security data sources

Future Work

Vision

To provide network security as a service for individuals, small businesses, or government offices
