unconstrained endpoint profiling (googling the internet) powerpoint narus

Upload: stopspyingonme

Post on 02-Apr-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    1/23

    Unconstrained Endpoint Profiling

    (Googling the Internet)

    Ionut Trestian

    Supranamaya Ranjan

    Aleksandar Kuzmanovic

    Antonio Nucci

    Northwestern University

    Narus Inc.

    http://networks.cs.northwestern.edu http://www.narus.com

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    2/23

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    3/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    3

    Introduction

    Can we use Google for networking research?

    Can we systematically exploit search engines toharvest endpoint information available on the Internet?

    Huge amount of endpointinformation available on the web

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    4/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    4

    Websites run logging softwareand display statistics

    Some popular proxy services alsodisplay logs

    Popular servers (e.g., gaming) IPaddresses are listed

    Blacklists, banlists, spamlists also

    have web interfaces

    Even P2P information is available

    on the Internet since the first pointof contact with a P2P swarm is a

    publicly available IP address

    Where Does the Information Come From?

    Servers

    Clients

    P2P

    Malicious

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    5/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    Problem detecting active IP ranges

    Needed for large-scale measurements e.g., Iplane [OSDI06]

    Current approaches

    Active

    Passive

    Our approach

    IP addresses that generate hits on Google could suggest activeranges

    5

    Inferring Active IP Ranges in Target Networks

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    6/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    6

    Can we infer what applications peopleare using across the world withouthaving access to network traces?

    Detecting Application Usage Trends

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    7/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    7

    Traffic Classification

    Problem traffic classification

    Current approaches

    (port-based, payload signatures,

    numerical and statistical etc.)

    Our approach

    Use information about destination IP

    addresses available on the Internet

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    8/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    URL Hit text

    URL Hit text

    URL Hit text

    . .

    Rapid Match

    Domain name Keywords

    Domain name Keywords..

    IP tagging

    IP Addressxxx.xxx.xxx.xxx

    Website cache

    Search hits

    8

    58.61.33.40 QQ Chat Server

    Methodology Web Classifier and IP Tagging

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    9/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    9

    China

    Packet level traces available

    BrasilPacket level traces available

    United States Sampled NetFlow available

    France

    No traces available

    Evaluation Ground Truth

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    10/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    10

    Inferring Active IP Ranges in Target Networks

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    11/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    11

    Google hits

    XXX.163.0.0/17 network rangeOverlap is around 77%

    Inferring Active IP Ranges in Target Networks

    Actual endpointsfrom trace

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    12/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    12

    Application Usage Trends

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    13/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    13

    Correlation Between Network Traces and UEP

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    14/23

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    15/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    Is this scalable?

    15

    165.124.182.169

    Tagged IP Cache

    Traffic Classification

    Mail server

    193.226.5.150 Website

    68.87.195.25 Router

    186.25.13.24 Halo server

    Hold a small % of theIP addresses seen

    Look at source anddestination IP addresses

    and classify traffic

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    16/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    16

    5% of the destinationssink 95% of traffic

    Traffic Classification

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    17/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    UEP (SIGCOMM 2008)

    - Uses information available on the web- Constructs a semantically rich endpoint database- Very flexible (can be used in a variety of scenarios)

    17

    BLINC vs. UEP

    BLINC (SIGCOMM 2005)

    - Works in the dark (doesnt examine payload) - Uses graphlets to identify traffic patterns - Uses thresholds to further classify traffic

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    18/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    Traffic[%]

    BLINC doesnt find

    some categories

    UEP also provides better semanticsClasses can be further divided into different services

    UEP classifies twice as

    much traffic as BLINC

    18

    BLINC vs. UEP (cont.)

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    19/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    19

    Each packet has a 1/Sampling ratechance of being kept

    (Cisco Netflow)

    Sampled data is considered to be poorer in information

    However ISPs consider scalable to gather onlysampled data

    Working with Sampled Traffic

    X XX X

    X XX X

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    20/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    A quarter of the IP addresses stillin the trace at sampling rate 100

    Most of the popular IP addressesstill in the trace

    20

    Working with Sampled Traffic

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    21/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    When no sampling is doneUEP outperforms BLINC

    UEP maintains a largeclassification ratio even at

    higher sampling rates

    BLINC stays in the dark2% at sampling rate 100

    UEP retains high classificationcapabilities with sampled traffic

    21

    Working with Sampled Traffic

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    22/23

    I. Trestian Unconstrained Endpoint Profiling (Googling the Internet)

    Performed clustering of endpoints in order to

    cluster out common behavior

    Please see the paper for detailed results

    22

    Endpoint Clustering

    Real strength:

    We managed to achieve similar results both by usingthe trace and only by using UEP

  • 7/27/2019 Unconstrained Endpoint Profiling (Googling the Internet) powerpoint narus

    23/23

    23

    Key contribution:

    Shift research focus from mining operational networktraces to harnessing information that is alreadyavailable on the web

    Our approach can: Predict application and protocol usage trends in

    arbitrary networks

    Dramatically outperform classification tools Retain high classification capabilities when dealing

    with sampled data

    Conclusions