unconstrained endpoint profiling (googling the internet) powerpoint narus
TRANSCRIPT
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
1/23
Unconstrained Endpoint Profiling
(Googling the Internet)
Ionut Trestian
Supranamaya Ranjan
Aleksandar Kuzmanovic
Antonio Nucci
Northwestern University
Narus Inc.
http://networks.cs.northwestern.edu http://www.narus.com
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
2/23
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
3/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
3
Introduction
Can we use Google for networking research?
Can we systematically exploit search engines toharvest endpoint information available on the Internet?
Huge amount of endpointinformation available on the web
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
4/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
4
Websites run logging softwareand display statistics
Some popular proxy services alsodisplay logs
Popular servers (e.g., gaming) IPaddresses are listed
Blacklists, banlists, spamlists also
have web interfaces
Even P2P information is available
on the Internet since the first pointof contact with a P2P swarm is a
publicly available IP address
Where Does the Information Come From?
Servers
Clients
P2P
Malicious
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
5/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
Problem detecting active IP ranges
Needed for large-scale measurements e.g., Iplane [OSDI06]
Current approaches
Active
Passive
Our approach
IP addresses that generate hits on Google could suggest activeranges
5
Inferring Active IP Ranges in Target Networks
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
6/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
6
Can we infer what applications peopleare using across the world withouthaving access to network traces?
Detecting Application Usage Trends
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
7/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
7
Traffic Classification
Problem traffic classification
Current approaches
(port-based, payload signatures,
numerical and statistical etc.)
Our approach
Use information about destination IP
addresses available on the Internet
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
8/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
URL Hit text
URL Hit text
URL Hit text
. .
Rapid Match
Domain name Keywords
Domain name Keywords..
IP tagging
IP Addressxxx.xxx.xxx.xxx
Website cache
Search hits
8
58.61.33.40 QQ Chat Server
Methodology Web Classifier and IP Tagging
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
9/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
9
China
Packet level traces available
BrasilPacket level traces available
United States Sampled NetFlow available
France
No traces available
Evaluation Ground Truth
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
10/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
10
Inferring Active IP Ranges in Target Networks
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
11/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
11
Google hits
XXX.163.0.0/17 network rangeOverlap is around 77%
Inferring Active IP Ranges in Target Networks
Actual endpointsfrom trace
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
12/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
12
Application Usage Trends
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
13/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
13
Correlation Between Network Traces and UEP
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
14/23
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
15/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
Is this scalable?
15
165.124.182.169
Tagged IP Cache
Traffic Classification
Mail server
193.226.5.150 Website
68.87.195.25 Router
186.25.13.24 Halo server
Hold a small % of theIP addresses seen
Look at source anddestination IP addresses
and classify traffic
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
16/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
16
5% of the destinationssink 95% of traffic
Traffic Classification
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
17/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
UEP (SIGCOMM 2008)
- Uses information available on the web- Constructs a semantically rich endpoint database- Very flexible (can be used in a variety of scenarios)
17
BLINC vs. UEP
BLINC (SIGCOMM 2005)
- Works in the dark (doesnt examine payload) - Uses graphlets to identify traffic patterns - Uses thresholds to further classify traffic
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
18/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
Traffic[%]
BLINC doesnt find
some categories
UEP also provides better semanticsClasses can be further divided into different services
UEP classifies twice as
much traffic as BLINC
18
BLINC vs. UEP (cont.)
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
19/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
19
Each packet has a 1/Sampling ratechance of being kept
(Cisco Netflow)
Sampled data is considered to be poorer in information
However ISPs consider scalable to gather onlysampled data
Working with Sampled Traffic
X XX X
X XX X
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
20/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
A quarter of the IP addresses stillin the trace at sampling rate 100
Most of the popular IP addressesstill in the trace
20
Working with Sampled Traffic
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
21/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
When no sampling is doneUEP outperforms BLINC
UEP maintains a largeclassification ratio even at
higher sampling rates
BLINC stays in the dark2% at sampling rate 100
UEP retains high classificationcapabilities with sampled traffic
21
Working with Sampled Traffic
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
22/23
I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)
Performed clustering of endpoints in order to
cluster out common behavior
Please see the paper for detailed results
22
Endpoint Clustering
Real strength:
We managed to achieve similar results both by usingthe trace and only by using UEP
-
7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus
23/23
23
Key contribution:
Shift research focus from mining operational networktraces to harnessing information that is alreadyavailable on the web
Our approach can: Predict application and protocol usage trends in
arbitrary networks
Dramatically outperform classification tools Retain high classification capabilities when dealing
with sampled data
Conclusions