Network Security Monitoring and Analysis based on Big Data Technologies
Bingdong Li
August 26, 2013
Outline
Motivation
Objectives
System Design
Monitoring and Visualization
Network Measurement
Classification and Identification of Network Objects
Conclusion
Future Work
Motivation
Traditional security systems assume a static system
Network attacks
– sophisticated
– organized
– targeted
– persistent
– dynamic
– external
– internal
Motivation
Problem: Network Security is becoming more challenging
Resource: A Large Amount of Security Data
– Network flow
– Firewall log
– Application log
– Server log
– SNMP
Opportunity: Big Data Technologies, Machine Learning
Objectives
A network security monitoring and analysis system based on Big Data technologies that provides
– Network measurement
– Real-time continuous monitoring and interactive visualization
– Intelligent network object classification and identification based on role behavior as context
Objectives
[Figure: Big Data, Machine Learning, and Network Security]
System Design
Data Collection
System Design
Online Real Time Process
System Design
NoSQL Storage
System Design
User Interfaces
System Design
The design supports these features:
– Real Time Continuous Monitoring and Interactive Visualization
– Network Measurement
– Classification and Identification of Network Objects
Monitoring and Visualization
Real Time: respond within a time constraint
Interactive: involve user interaction
Continuous: “continue to be effective over time in light of the inevitable changes that occur” (NIST)
Monitoring and Visualization
Retrieve Data
Web User Interfaces
Video Demo
Monitoring and Visualization
Data Retrieval:
Data are stored with the IP address as the primary key and the time slice as the secondary key in the column
Accessing these data takes Θ(1) time
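The two-level key layout can be pictured with a small in-memory stand-in. This is only a sketch under assumptions: a Python dict takes the place of the NoSQL store, and the names `FlowStore`, `put`, `get`, and the 60-second slice width are hypothetical, not taken from the system.

```python
from collections import defaultdict

SLICE_SECONDS = 60  # assumed slice width; the slides do not state one


class FlowStore:
    """Toy stand-in for a column store keyed by IP, then by time slice."""

    def __init__(self):
        # primary key (IP) -> secondary key (time slice) -> stored value
        self._rows = defaultdict(dict)

    @staticmethod
    def time_slice(timestamp):
        # Map a Unix timestamp to the start of its slice.
        return int(timestamp) // SLICE_SECONDS * SLICE_SECONDS

    def put(self, ip, timestamp, value):
        self._rows[ip][self.time_slice(timestamp)] = value

    def get(self, ip, timestamp):
        # Two hash lookups: expected constant time regardless of store size.
        return self._rows.get(ip, {}).get(self.time_slice(timestamp))


store = FlowStore()
store.put("10.0.0.5", 1377500410, {"packets": 12})
print(store.get("10.0.0.5", 1377500430))  # falls in the same 60-second slice
```

Because both keys are resolved by hashing, a lookup for one host and one time slice does not scan other hosts or other slices, which is the property a constant-time access claim relies on.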
Real Time Querying
Host Network Connection
Network Status
Top N
Video Demo
Demo of Interactivity and Continuity
Network Measurement
A case study
The Anonymity Technology Usage on Campus Network
Using sFlow
– Geo-Location
– Usage of Anonymity Systems
Geo-location of Anonymity Usage on Campus
One Instance: Bahamas, Belarus, Belgium, Bulgaria, Cambodia, Chile, Colombia, Estonia, Ghana, Greece, Hungary, Ireland, Israel, Jamaica, Jordan, Korea, Mongolia, Namibia, Nigeria, Pakistan, Panama, Philippines, Slovakia, Turkey, Ukraine, Vietnam, Zimbabwe
Two Instances: Chad, Czech Republic, Denmark, Hong Kong, Iran, Japan, Kazakhstan, Poland, Romania, Spain, Switzerland
Three Instances: Austria, France, Singapore
Four Instances: Australia, Indonesia, Taiwan, Thailand
Usage of Anonymity Systems
System       Packets (%)      Traffic in MB (%)   Observed IPs (%)
Proxies      5,580 (62.65)    8.13 (43.53)        234 (3.23)
Tor          3,129 (35.13)    9.04 (48.37)        152 (0.25)
I2P          190 (2.13)       1.50 (8.02)         23 (1.01)
Commercial   7 (0.08)         0.016 (0.08)        2 (N/A)
Total        8,906 (100)      16.69 (100)         411 (N/A)
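As a quick check of how the percentage columns are derived, the packet shares can be recomputed from the packet counts. This is a minimal sketch that uses only the packet column from the table above; the variable names are illustrative.

```python
# Packet counts per anonymity system, copied from the table above.
packets = {"Proxies": 5580, "Tor": 3129, "I2P": 190, "Commercial": 7}

total = sum(packets.values())
# Share of each system as a percentage of all observed anonymity packets.
shares = {name: round(100 * count / total, 2) for name, count in packets.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```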
Classification of Host Roles
Data: three months of sFlow data from a large campus
Role               Count
Client             5494
Server             1920
Public Place       784
Personal Office    416
College1           163
College2           253
Web Server         56
Web Email Server   25
Classification of Host Roles
Algorithms
Decision Tree
On-line SVM
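The slides name an on-line SVM without specifying the variant. As a rough illustration only, the sketch below trains a linear SVM one sample at a time with stochastic gradient descent on the hinge loss, which is one common way to build an online SVM. The function names, learning rate, regularization strength, and toy samples are all assumptions, not the author's implementation.

```python
def train_online_svm(stream, n_features, lr=0.1, lam=0.01):
    """Update a linear SVM one sample at a time (SGD on the hinge loss)."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:  # y is +1 or -1
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # L2 regularization shrinks the weights on every step.
        w = [wi * (1.0 - lr * lam) for wi in w]
        if margin < 1.0:
            # Sample is inside the margin or misclassified: move toward it.
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, made-up samples with two features already scaled to [0, 1].
samples = [
    ([0.9, 0.8], 1), ([0.1, 0.2], -1),
    ([0.8, 0.9], 1), ([0.2, 0.1], -1),
    ([0.7, 0.9], 1), ([0.3, 0.2], -1),
]
stream = samples * 200  # replay the samples to imitate a long flow stream
w, b = train_online_svm(stream, n_features=2)
print([predict(w, b, x) for x, _ in samples])
```

Each update touches only the current sample, so the model can follow a flow stream without storing it, which is the practical appeal of an online learner over a batch one.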
Classification of Host Roles
Features
Ad hoc based on domain knowledge
Aggregating features for on-line classification
24 features normalized between 0 and 1, inclusive
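The slides do not say which normalization was applied. Min-max scaling is one common way to map a feature into the closed interval from 0 to 1, and the sketch below assumes it; the feature name and values are made up for illustration.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly into [0, 1], inclusive."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw feature: mean packet size (bytes) observed for four hosts.
mean_packet_size = [64, 512, 1500, 782]
normalized = min_max_normalize(mean_packet_size)
print(normalized)
```

In an on-line setting the minimum and maximum would have to be fixed in advance or tracked as running values, since the full data set is not available up front.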
Classification of Host Roles
Features
24 features derived from
src/dest IP address
src/dest Port number
TTL
Packet Size
Transport protocol
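To show how per-host features might be aggregated from the fields listed above, here is a hedged sketch that updates running statistics one flow record at a time. The three features, the record layout, and the class name `HostAggregate` are hypothetical; the actual 24 features are not enumerated in the slides.

```python
from collections import defaultdict


class HostAggregate:
    """Running per-host statistics updated one flow record at a time."""

    def __init__(self):
        self.flows = 0
        self.bytes_total = 0
        self.dst_ports = set()
        self.tcp_flows = 0

    def update(self, record):
        self.flows += 1
        self.bytes_total += record["size"]
        self.dst_ports.add(record["dst_port"])
        if record["proto"] == "tcp":
            self.tcp_flows += 1

    def features(self):
        # Three illustrative features; call only after at least one update.
        return {
            "mean_packet_size": self.bytes_total / self.flows,
            "distinct_dst_ports": len(self.dst_ports),
            "tcp_fraction": self.tcp_flows / self.flows,
        }


hosts = defaultdict(HostAggregate)
records = [  # made-up sampled records
    {"src_ip": "10.0.0.5", "dst_port": 80, "size": 400, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "dst_port": 443, "size": 600, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "dst_port": 53, "size": 200, "proto": "udp"},
]
for rec in records:
    hosts[rec["src_ip"]].update(rec)

print(hosts["10.0.0.5"].features())
```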
Classification of Host Roles
Ground Truth
Host Information in Active Directory
Crawler to validate its status
Classification of Host Roles
Classifying Client vs. Server
Classifying Web Server vs. Web Email Server
Classifying Hosts at Personal Office vs. Public Place
Classifying Hosts at Two Different Colleges
Feature Contributions
Classifying Client vs. Server
Classifying Web Server vs. Web Email Server
Classifying Host From Personal Office vs. Public Place
Classifying Host From Two Different Colleges
Accuracy
High accuracies of Host Role Classification
Classification                                 Accuracy (%)
Client vs. server                              99.2
Regular web server vs. web email server        100
Hosts from personal office vs. public places   93.3
Hosts from two different colleges              93.3
Feature Contribution
Identification of a User
Data: NetFlow data from a large campus
College    Count
College1   163
College2   253
Identification of a User
Algorithms
Decision Tree
On-line SVM
Ground Truth
Host Information in Active Directory
Crawler to validate its status
Identification of a User
Features
Discrete probability distribution function (pdf)
An Example: System Port Number [6, 8, 9, 11, 14, 30, 80, 1020]
– Outlier fraction (P) is 1%
– 80 is the port of interest (S)
– Number of bins (R) is 4
Identification of a User
An Example
(1 - 0.01) * 8 = 7.92, truncated to 7; the 7th value is 80
Bin slice size = 80 / (4 - 1) ≈ 26.67
[6, 8, 9, 11, 14, 30, 80, 1020]
Bins: {6, 8, 9, 11, 14} {30} {80} {1020}
pdf = 0.625 0.125 0.125 0.125
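The arithmetic of this example can be reproduced with a short sketch. It rests on an interpretation that the slides do not spell out: the cutoff is the value at the truncated rank (1 - P) * n, the first R - 1 bins split the range up to the cutoff evenly, and the last bin collects the outliers. In this example the cutoff happens to coincide with S = 80. The function name and parameters are hypothetical, and the port values are assumed to be positive.

```python
def binned_pdf_with_cutoff(values, outlier_fraction, n_bins):
    """Histogram with n_bins - 1 equal bins up to a cutoff plus one overflow bin."""
    ordered = sorted(values)
    # 1-based rank of the cutoff value, truncated as in the example: 7.92 -> 7.
    rank = int((1 - outlier_fraction) * len(ordered))
    cutoff = ordered[rank - 1]
    width = cutoff / (n_bins - 1)

    counts = [0] * n_bins
    for v in ordered:
        if v > cutoff:
            counts[-1] += 1  # overflow bin for outliers
        else:
            # Clamp so that the cutoff value itself lands in the last regular bin.
            counts[min(int(v / width), n_bins - 2)] += 1
    return [c / len(ordered) for c in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
pdf = binned_pdf_with_cutoff(ports, outlier_fraction=0.01, n_bins=4)
print(pdf)
```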
Identification of a User
An Example without P and S
Bin slice size is 1024 / 4 = 256
[6, 8, 9, 11, 14, 30, 80, 1020]
Bins: {6, 8, 9, 11, 14, 30, 80} {} {} {1020}
pdf = 0.875 0 0 0.125
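For comparison, the plain equal-width binning of this second example can be sketched as follows. The function name is hypothetical, and the upper bound of 1024 is taken from the bin-size calculation in the example.

```python
def binned_pdf_uniform(values, upper, n_bins):
    """Histogram over [0, upper) with n_bins equal-width bins."""
    width = upper / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp to the last bin in case a value equals or exceeds `upper`.
        counts[min(int(v / width), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
pdf_uniform = binned_pdf_uniform(ports, upper=1024, n_bins=4)
print(pdf_uniform)
```

With equal-width bins almost all of the mass falls into the first bin, so ports 6 through 80 become indistinguishable; the cutoff-based binning spreads the same values over three bins and keeps more detail in the range where most traffic lies.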
Identify a User Among Other Users
Accuracy
Identifying a particular user among other users
Decision Tree 93.3%
On-line Support Vector Machine 78.5%
Feature Contribution
Conclusion
Major Contributions
– A Big Data analysis system
  • a conference paper
– Monitoring and interactive visualization
– Usage of anonymity technologies
  • a conference paper and a journal paper
– Models for classification of host roles and identification of users
  • a conference paper
Conclusion
The Big Data analysis system is high-performance and scalable
Real-time continuous network monitoring and interactive visualization are implemented and supported by the high-performance system
Conclusion
Proxies and Tor are the main anonymity technologies used on campus
– US, Germany, and China are the top 3 countries
Models and features for classification of host roles:
– client vs. server
– web server vs. web email server
– personal office vs. public place
– hosts from two different colleges
Models and features for identification of a particular user among other users
Future Work
Improvement to the Current Work
– More interactive features and better user interfaces
– Further analysis on user identification: features, algorithms (such as deep learning)
Future Work
Extension to the Current Work
– Define and filter out background traffic
– Detection of operating system fingerprinting
– Identity anonymity
– Fusion with other network security data sources
Future Work
Vision
Provide network security as a service for individuals, small businesses, or government offices