Security Vulnerability: Fingerprint in Tor Networks

Tor (The Onion Router)

Background Information

• What is onion routing?
§ Onion routing is a technique for anonymous communication over a computer network.
§ Messages are encrypted repeatedly (in layers) and then sent through several network nodes called onion routers (ORs), which together form a circuit.
§ Messages are put in cells, and each onion router unwraps one layer with a symmetric key.
§ Each OR knows only its predecessor and successor, not any other onion router.
o Hence the “onion router” analogy. This prevents the intermediary nodes from knowing the origin, destination, and contents of the message.
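The layering described above can be illustrated with a toy sketch. It assumes three hops with made-up keys and uses a hash-based XOR keystream as a stand-in cipher; this is not secure and is not Tor's actual cryptography, only a way to see one layer being removed per hop.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a toy keystream by hashing key + counter (illustration only, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(data: bytes, key: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(message: bytes, hop_keys: list) -> bytes:
    """Client side: add one layer per hop, starting with the last hop's key."""
    for key in reversed(hop_keys):
        message = xor_layer(message, key)
    return message

# Hypothetical symmetric keys shared between the client and each onion router.
hop_keys = [b"entry-key", b"middle-key", b"exit-key"]
message = b"GET / HTTP/1.1"

cell = wrap(message, hop_keys)
after_entry = xor_layer(cell, hop_keys[0])   # the entry node removes its layer only
recovered = after_entry
for key in hop_keys[1:]:                     # middle and exit nodes remove theirs
    recovered = xor_layer(recovered, key)
```

After the entry node has removed its layer, the data is still unreadable; only after the last hop does the original message appear.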

The structure of the Tor Network

Tor Browser

• It is a modified version of the popular Firefox web browser
§ With extra features to protect users’ privacy
• HTTPS-Everywhere
§ To use secure web connections whenever possible
• NoScript
§ To mitigate some weaknesses of JavaScript
• Disables Flash and uses only a few fonts
§ To prevent websites from identifying users based on the fonts they have installed

How Tor works

• The user's software (the client) incrementally builds a circuit of encrypted connections through relays on the network.

Who is using Tor?

• Normal people (e.g. protect their browsing records)

• Militaries (e.g. military field agents)

• Journalists and their audiences (e.g. citizen journalists encouraging social change)

• Law enforcement officers (e.g. for online “undercover” operations)

• Activists and whistleblowers (e.g. avoid persecution while still raising a voice)

• Bloggers

• IT professionals (e.g. during development and operational testing, access internet resources while leaving security policies in place)

Who is using Tor?

• Who else?
§ Malicious users who want to access the dark web
o Drugs
o Illegal content
o Etc.

Tor cell

• Tor packages its cells into TLS records, which the network then splits into TCP segments.

• Characteristics:
§ Encrypted
§ Tor sends data in fixed-size (512-byte) cells
§ Because every cell has the same size, the attacker gains no further information from an individual cell
• If there is not enough data to send, Tor pads cells with encrypted zeros
• Tor cells are also used for
§ Circuit construction, destruction, and flow control
§ SENDME cells
o Sent once per 50 incoming cells for each stream; and
o Once per 100 incoming cells for each circuit
o They add noise and interrupt bursts of traffic
§ Constant size (512 B); cells cannot be broken down or combined together
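A minimal sketch of the fixed-size cell idea, assuming the whole 512 bytes carry payload (cell headers and encryption are ignored) and that all incoming cells belong to one stream on one circuit. The function names are illustrative and not taken from the Tor code base.

```python
CELL_SIZE = 512  # fixed cell size stated in the slides

def to_cells(payload: bytes, cell_size: int = CELL_SIZE) -> list:
    """Split a payload into fixed-size cells, zero-padding the last one."""
    cells = []
    for start in range(0, len(payload), cell_size):
        chunk = payload[start:start + cell_size]
        cells.append(chunk.ljust(cell_size, b"\x00"))  # pad short chunk with zeros
    return cells

def sendme_counts(incoming_cells: int) -> tuple:
    """Return (stream-level, circuit-level) SENDME counts for the 50/100 windows."""
    return incoming_cells // 50, incoming_cells // 100

cells = to_cells(b"x" * 1300)        # 1300 bytes need three 512-byte cells
stream_sendmes, circuit_sendmes = sendme_counts(250)
```

An observer who sees only cell sizes learns the number of cells, not the exact payload length, which is why the features listed later rely on counts, ordering, and timing.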

How to distinguish two websites with network traffic

Feature sets (Wang 2014 USENIX)

• General features (strongest indicator)
§ Total transmission size, transmission time, number of incoming and outgoing packets
• Unique packet lengths
§ Packet lengths between 1 and 1500
• Packet ordering
§ The number of incoming packets between outgoing packets
• Outgoing packets
§ The number of outgoing packets
• Bursts
§ A burst of outgoing packets is a sequence of outgoing packets
• Initial packets
§ The lengths of the first 20 packets (with direction)
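The feature groups above can be sketched as follows. The trace representation (a list of `(timestamp, signed_length)` pairs, positive for outgoing and negative for incoming) and the dictionary keys are assumptions made for illustration. A burst is simplified here to a run of consecutive outgoing packets, which may differ from the exact definition in Wang et al.

```python
def extract_features(trace):
    """trace: time-ordered list of (timestamp_seconds, signed_length); +out, -in."""
    sizes = [s for _, s in trace]
    features = {
        "total_size": sum(abs(s) for s in sizes),
        "total_time": trace[-1][0] - trace[0][0] if trace else 0.0,
        "n_outgoing": sum(1 for s in sizes if s > 0),
        "n_incoming": sum(1 for s in sizes if s < 0),
        "unique_lengths": sorted({abs(s) for s in sizes}),
        "first_20": sizes[:20],  # sign keeps the direction
    }

    # Packet ordering: incoming packets seen between consecutive outgoing packets.
    gaps, count, seen_out = [], 0, False
    for s in sizes:
        if s > 0:
            if seen_out:
                gaps.append(count)
            seen_out, count = True, 0
        elif s < 0:
            count += 1
    features["incoming_between_outgoing"] = gaps

    # Bursts (simplified): lengths of runs of consecutive outgoing packets.
    bursts, run = [], 0
    for s in sizes:
        if s > 0:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    features["outgoing_bursts"] = bursts
    return features

# Made-up trace: two requests, each followed by incoming responses.
trace = [(0.0, 565), (0.1, -1500), (0.15, -1500),
         (0.3, 565), (0.31, 300), (0.5, -800)]
feats = extract_features(trace)
```

Two sites with different page structures would typically differ in several of these values at once, which is what a classifier exploits.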

Website Fingerprinting (WF)

• To execute a website fingerprinting (WF) attack
§ Exploit
o All features
• Processing steps
§ Data collection
§ Data preprocessing
§ Feature extraction
§ Model learning
§ Model evaluation

Characterization of Tor Traffic using Time based Features

Lashkari, Arash Habibi, et al., ICISSP. 2017

Summary

• Tor data analysis
• The experiment was carried out in two scenarios:
§ Scenario A: Tor or Non-Tor data
§ Scenario B: Find out which applications the Tor data comes from
o Categorize applications into 8 categories: browsing, email, chat, audio-streaming, video-streaming, file transfer, VoIP, P2P
• Flow: a sequence of packets with the same values for
§ {Source IP, Destination IP, Source Port, Destination Port, and Protocol (TCP or UDP)}
• For the analysis, this work focused on network traffic information (e.g., IP address, port number, protocol, etc.) and time-related features
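A minimal sketch of grouping packets into flows by the five-tuple above. The packet dictionaries and their field names are hypothetical. Treating the reverse five-tuple as the backward direction of the same flow is an assumption made here so that forward and backward packets end up together.

```python
def group_flows(packets):
    """Group packets into flows keyed by (src_ip, dst_ip, src_port, dst_port, proto)."""
    flows = {}
    for pkt in packets:
        fwd = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])
        bwd = (pkt["dst_ip"], pkt["src_ip"], pkt["dst_port"], pkt["src_port"], pkt["proto"])
        if bwd in flows:
            # Reply to an existing flow: record it as a backward packet.
            flows[bwd].append(("backward", pkt))
        else:
            flows.setdefault(fwd, []).append(("forward", pkt))
    return flows

# Made-up packets: a request, its reply, and an unrelated request.
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 5000, "dst_port": 443, "proto": "TCP"},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_port": 443, "dst_port": 5000, "proto": "TCP"},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.3", "src_port": 5001, "dst_port": 443, "proto": "TCP"},
]
flows = group_flows(packets)
```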

Experimental environments

23 features

• 23 features (p257)
§ fiat (Forward Inter Arrival Time): the time between two packets in the forward direction (source to destination; outgoing packets) (mean, min, max, std)
§ biat (Backward Inter Arrival Time): the time between two packets in the backward direction (destination to source; incoming packets) (mean, min, max, std)
§ flowiat (Flow Inter Arrival Time): the time between two packets, with no distinction between forward and backward (mean, min, max, std)
§ active: the length of time a flow is active before it becomes idle
§ idle: the length of time a flow is idle before it becomes active
o If there is no traffic between two nodes for 5 sec, the flow is idle.

23 features

• 23 features (p257)
§ fb_psec: flow bytes per second; no distinction between forward and backward
§ fp_psec: flow packets per second; no distinction between forward and backward
§ duration:
o The duration of the flow with the same source IP, destination IP, source port, destination port and protocol (TCP or UDP)
o In UDP, the flow is terminated by a timeout; 10, 15, 30, 60, and 120 sec
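The time-based features can be sketched as below. The flow representation (a time-ordered list of `(timestamp, direction, n_bytes)` tuples), the use of the population standard deviation, and the way active and idle periods are cut at 5-second gaps are assumptions for illustration; the paper's own tooling may compute them differently.

```python
from statistics import mean, pstdev

IDLE_THRESHOLD = 5.0  # seconds without traffic before a flow counts as idle

def iat_stats(timestamps):
    """Mean, min, max, std of the gaps between consecutive timestamps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return {"mean": 0.0, "min": 0.0, "max": 0.0, "std": 0.0}
    return {"mean": mean(gaps), "min": min(gaps), "max": max(gaps), "std": pstdev(gaps)}

def time_features(flow):
    """flow: non-empty, time-ordered list of (timestamp, 'fwd' or 'bwd', n_bytes)."""
    times = [t for t, _, _ in flow]
    duration = times[-1] - times[0]
    total_bytes = sum(n for _, _, n in flow)
    feats = {
        "fiat": iat_stats([t for t, d, _ in flow if d == "fwd"]),
        "biat": iat_stats([t for t, d, _ in flow if d == "bwd"]),
        "flowiat": iat_stats(times),
        "duration": duration,
        "fb_psec": total_bytes / duration if duration > 0 else 0.0,
        "fp_psec": len(flow) / duration if duration > 0 else 0.0,
    }
    # Cut the flow into active periods separated by idle gaps of >= 5 seconds.
    active, idle, start = [], [], times[0]
    for prev, cur in zip(times, times[1:]):
        if cur - prev >= IDLE_THRESHOLD:
            active.append(prev - start)
            idle.append(cur - prev)
            start = cur
    active.append(times[-1] - start)
    feats["active"], feats["idle"] = active, idle
    return feats

# Made-up flow with one 6-second pause between t=2 and t=8.
flow = [(0.0, "fwd", 100), (1.0, "bwd", 200), (2.0, "fwd", 100),
        (8.0, "bwd", 200), (10.0, "fwd", 100)]
feats = time_features(flow)
```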

Scenario A: Tor vs. Non-Tor

• Generated Tor data and encrypted traffic (Non-Tor)
§ How to generate encrypted data [1]:
o HTTPS protocol or VPN
o Note: the HTTPS protocol is defined as “regular encrypted”,
o VPN is defined as “encrypted traffic tunneled through VPN”

Scenario B: 8 different applications

• Generate data to classify the types of application using Tor
§ Browsing: HTTP and HTTPS can be used
§ Email: Thunderbird client with Gmail accounts (random accounts, e.g., Alice and Bob)
§ Chat: Facebook, Hangouts, Skype, AIM, ICQ
o Pidgin application for AIM and ICQ (http://pidgin.im)
§ Audio-Streaming: Spotify (https://www.spotify.com/us/)


Scenario B: 8 different applications

• Generate data to classify the types of application using Tor
§ Video-Streaming: YouTube or Vimeo
§ File Transfer: (1) file transfer provided by Skype, (2) file transfer over SSH (SFTP; SSH File Transfer Protocol), (3) FTP over SSL (FTPS)
§ VoIP: voice calls using Facebook, Hangouts, Skype
§ P2P: used BitTorrent to download Kali Linux (https://www.kali.org)
o Used the Vuze BitTorrent client (http://www.vuze.com)
o Created a general BitTorrent environment by controlling upload and download speed


Machine learning algorithms

• Tested with different machine learning algorithms
§ ZeroR
§ C4.5
§ KNN
• For binary classification (Tor vs. Non-Tor), ZeroR (which always predicts the majority class) is used as a baseline
• For all cases, C4.5 and KNN are better than ZeroR
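ZeroR ignores all features and always predicts the most frequent class in the training data, which is why it serves as a floor for the other classifiers. A minimal sketch, with made-up labels and an illustrative class name:

```python
from collections import Counter

class ZeroR:
    """Baseline classifier: always predict the majority class seen in training."""

    def fit(self, labels):
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_samples):
        return [self.majority_] * n_samples

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Made-up, imbalanced label set: 8 Non-Tor flows and 2 Tor flows.
labels = ["NonTor"] * 8 + ["Tor"] * 2
model = ZeroR().fit(labels)
baseline_accuracy = accuracy(model.predict(len(labels)), labels)
```

On imbalanced data the baseline accuracy can look high even though no Tor flow is ever detected, so a useful classifier has to beat it on per-class precision and recall as well.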

Experimental results

• Split into 2 groups based on performance
§ Good precision and recall: VoIP, P2P, Audio, File Transfer, and Video
§ Classifier fails: Browsing, Chat, and Email
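Per-class precision and recall, the two measures used to split the applications into groups, can be computed as in this sketch. The labels and predictions are invented for illustration and are not results from the paper.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up example: one VoIP flow is mistaken for Browsing.
y_true = ["VoIP", "VoIP", "Browsing", "Browsing", "P2P"]
y_pred = ["VoIP", "Browsing", "Browsing", "Browsing", "P2P"]

voip_precision, voip_recall = precision_recall(y_true, y_pred, "VoIP")
browsing_precision, browsing_recall = precision_recall(y_true, y_pred, "Browsing")
```

In this toy case the missed VoIP flow lowers VoIP recall and, because it was labelled Browsing, also lowers Browsing precision.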

Experimental results (details)

• VoIP and P2P are more distinct than other apps (with very few false positives).
§ The reason is that VoIP (e.g., Skype) uses a P2P network, which has a distinctive protocol.
§ However, the most common misclassification for VoIP is Browsing.
o This is because the VoIP applications are web-based, so they also generate browsing traffic.
§ The results for P2P are almost perfect.
o The reason is that it communicates using only the P2P protocol; the same reason as for VoIP above.

Experimental results (details)

• AUDIO and VIDEO have a similar pattern.
§ Because
o (1) they run over UDP and use RTP (Real-time Transport Protocol), https://tools.ietf.org/html/rfc3550;
o (2) intuitively, the server sends considerably more data than the client sends back;
o (3) RTP and RTCP (RTP Control Protocol)
§ However, they are also confused with Browsing because they run in browsers.
• AUDIO, VIDEO, and Chat used a web browser, so they generate browsing traffic.

Conclusion

• Applications are more likely to be distinguishable when they use different protocols.
§ Different protocols mean different rules for the packets that are exchanged (sent and received)
§ For example, different numbers of packets are required in the initialization process.

Research challenges

• Processing steps for website fingerprinting
§ Data collection
o How much data can be collected
§ Data preprocessing
o Convert to numeric data, normalize values
§ Feature extraction
o Select fewer, but more important, features for computation
§ Model learning & evaluation
o Various machine learning models, such as random forest, AdaBoost, XGBoost, and CNN

• Each step has various ways to improve performance
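As one concrete instance of the preprocessing step, the sketch below applies min-max normalization to each feature column. Choosing min-max scaling is an assumption here, since the slides only say "normalize values"; the feature rows are made up.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Made-up feature rows: (number of packets, duration in seconds).
rows = [[10, 0.5], [20, 0.5], [30, 1.5]]
normalized = min_max_normalize(rows)
```

Scaling matters most for distance-based models such as KNN, where a feature with a large numeric range would otherwise dominate the distance.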
