Security Vulnerability: Fingerprinting in Tor Networks
Tor (The Onion Router)
Background Information
• What is Onion Routing?
§ Onion routing is a technique for anonymous communication over a computer network.
§ Messages are encrypted in multiple layers and then sent through several network nodes called onion routers (ORs), which together form a circuit.
§ Messages are put into cells, and each onion router unwraps one layer with a symmetric key.
o Hence the analogy “onion router”: the layers are peeled off one at a time.
§ Each OR knows only its predecessor and successor, not any other onion router.
o This prevents the intermediary nodes from knowing the origin, destination, and contents of the message.
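The layering idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the three hop keys are random hypothetical values, and the SHA-256 counter-mode keystream is an insecure stand-in for a real symmetric cipher (Tor itself uses AES in counter mode), chosen only so the example needs no third-party packages. Key negotiation and cell headers are skipped entirely.

```python
import hashlib
import os


def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(key || counter) blocks. NOT secure; a stand-in for a real cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying the same key twice removes the layer again.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, hop_keys: list) -> bytes:
    # The client adds one layer per hop, starting with the exit's key and ending with the guard's.
    for key in reversed(hop_keys):
        message = xor_layer(key, message)
    return message


def relay_path(onion: bytes, hop_keys: list) -> bytes:
    # Each relay in path order (guard, middle, exit) peels off one layer with its own key.
    for key in hop_keys:
        onion = xor_layer(key, onion)
    return onion


hop_keys = [os.urandom(16) for _ in range(3)]  # hypothetical guard, middle, exit keys
onion = wrap(b"GET / HTTP/1.1", hop_keys)
print(relay_path(onion, hop_keys))  # b'GET / HTTP/1.1'
```

Only after the last layer is removed does the plaintext appear; a relay holding a single key sees data that is still wrapped in the remaining layers.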
The structure of the Tor Network
Tor Browser
• It is a modified version of the popular Firefox web browser
§ With extra features to protect users’ privacy
• HTTPS-Everywhere
§ To use secure web connections whenever possible
• NoScript
§ To mitigate some weaknesses of JavaScript
• Disables Flash and uses only a few fonts
§ To prevent websites from identifying users based on the fonts they have installed
How Tor works
• The user's software (client) incrementally builds a circuit of encrypted connections through relays on the network.
Who is using Tor?
• Ordinary people (e.g., to protect their browsing records)
• Militaries (e.g., military field agents)
• Journalists and their audiences (e.g., citizen journalists encouraging social change)
• Law enforcement officers (e.g., for online “undercover” operations)
• Activists and whistleblowers (e.g., to avoid persecution while still raising a voice)
• Bloggers
• IT professionals (e.g., during development and operational testing, or to access Internet resources while leaving security policies in place)
Who is using Tor? (continued)
• Who else?
§ Malicious users who want to access the dark web for
o Drugs
o Illegal content
o Etc.
Tor cell
• Tor packages its cells into TLS records, which the network then splits into TCP segments.
• Characteristics:
§ Encrypted
§ Tor sends data in fixed-size (512-byte) cells
§ The attacker gains no further information from the size of any individual cell
• If there is not enough data to fill a cell, Tor pads it with encrypted zeros
• Tor cells are used for:
§ Circuit construction, destruction, and flow control
§ SENDME cells
o once per 50 incoming cells for each stream, and
o once per 100 incoming cells for each circuit
o they act as noise that interrupts bursts of traffic
§ Constant size (512 B); cells cannot be split or combined
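A minimal sketch of why fixed-size cells hide individual payload sizes. It assumes, for simplicity, that the whole 512-byte cell is payload (real cells also carry circuit and command headers) and pads with zero bytes before any encryption. The SENDME counter simply applies the per-stream (50) and per-circuit (100) intervals from the slide to a single stream. The function names are hypothetical.

```python
CELL_SIZE = 512  # fixed cell size from the slide; headers are ignored in this sketch


def to_cells(data: bytes, cell_size: int = CELL_SIZE) -> list:
    # Split data into fixed-size cells, zero-padding the last one.
    cells = []
    for start in range(0, len(data), cell_size):
        cells.append(data[start:start + cell_size].ljust(cell_size, b"\x00"))
    return cells


def sendme_count(incoming_cells: int) -> int:
    # One stream-level SENDME per 50 incoming cells plus one circuit-level SENDME per 100,
    # assuming all cells belong to a single stream on a single circuit.
    return incoming_cells // 50 + incoming_cells // 100


cells = to_cells(b"x" * 1200)
print(len(cells), {len(c) for c in cells})  # 3 {512}
print(sendme_count(1000))  # 30
```

Under these assumptions 1,200 bytes and 1,500 bytes both become three identical-length cells, so what remains observable is the number of cells, their direction, and their timing.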
How to distinguish two websites using network traffic
Feature sets (Wang 2014 USENIX)
• General features (strongest indicator)
§ Total transmission size, transmission time, and the numbers of incoming and outgoing packets
• Unique packet lengths
§ Packet lengths between 1 and 1500
• Packet ordering
§ The number of incoming packets between outgoing packets
• Outgoing packets
§ The number of outgoing packets
• Bursts
§ A burst of outgoing packets is defined as a sequence of outgoing packets
• Initial packets
§ The lengths of the first 20 packets (with direction)
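The feature families above can be computed from a packet trace roughly as follows. This is an illustrative sketch, not Wang et al.'s implementation: it assumes a trace is a list of (timestamp, signed length) pairs with positive lengths for outgoing and negative lengths for incoming packets, and it simplifies bursts to maximal runs of consecutive outgoing packets.

```python
from itertools import groupby


def extract_features(trace):
    """trace: list of (time_s, signed_length); > 0 is outgoing, < 0 is incoming."""
    times = [t for t, _ in trace]
    lengths = [n for _, n in trace]
    features = {
        # General features
        "total_bytes": sum(abs(n) for n in lengths),
        "duration_s": (times[-1] - times[0]) if trace else 0.0,
        "n_outgoing": sum(1 for n in lengths if n > 0),
        "n_incoming": sum(1 for n in lengths if n < 0),
        # Unique packet lengths in 1..1500 (direction ignored in this sketch)
        "unique_lengths": sorted({abs(n) for n in lengths if 1 <= abs(n) <= 1500}),
        # Initial packets: signed lengths of the first 20 packets
        "first_20": lengths[:20],
    }
    # Packet ordering: incoming packets seen between consecutive outgoing packets
    between, count, seen_outgoing = [], 0, False
    for n in lengths:
        if n > 0:
            if seen_outgoing:
                between.append(count)
            seen_outgoing, count = True, 0
        elif n < 0:
            count += 1
    features["incoming_between_outgoing"] = between
    # Bursts (simplified): sizes of maximal runs of consecutive outgoing packets
    features["outgoing_bursts"] = [
        len(list(run)) for is_out, run in groupby(lengths, key=lambda n: n > 0) if is_out
    ]
    return features


trace = [(0.0, 512), (0.1, 512), (0.2, -512), (0.3, -512), (0.5, 512), (0.9, -1024)]
f = extract_features(trace)
print(f["incoming_between_outgoing"], f["outgoing_bursts"])  # [0, 2] [2, 1]
```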
Website Fingerprinting (WF)
• To execute a website fingerprinting (WF) attack
§ Exploit all of the features above
• Processing steps
§ Data collection
§ Data preprocessing
§ Feature extraction
§ Model learning
§ Model evaluation
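The five processing steps can be strung together in a toy pipeline. Everything here is hypothetical: the traces are synthetic (+1 for an outgoing cell, -1 for an incoming cell), the only features are outgoing and incoming cell counts, and a plain k-nearest-neighbour vote is used as the model purely for illustration.

```python
import math
from collections import Counter


def extract(trace):
    # Feature extraction: (number of outgoing cells, number of incoming cells)
    return (sum(1 for c in trace if c > 0), sum(1 for c in trace if c < 0))


def knn_predict(train, x, k=3):
    # Model: majority vote among the k training vectors closest to x (Euclidean distance)
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Data collection (synthetic, hypothetical sites)
train_traces = [
    ([1] * 3 + [-1] * 40, "site_a"),
    ([1] * 4 + [-1] * 45, "site_a"),
    ([1] * 3 + [-1] * 50, "site_a"),
    ([1] * 10 + [-1] * 300, "site_b"),
    ([1] * 12 + [-1] * 280, "site_b"),
    ([1] * 9 + [-1] * 310, "site_b"),
]
test_traces = [([1] * 4 + [-1] * 42, "site_a"), ([1] * 11 + [-1] * 295, "site_b")]

# Data preprocessing: the traces are already numeric (+1/-1), so nothing to convert here.
# Model learning: k-NN simply stores the labelled feature vectors.
train = [(extract(t), label) for t, label in train_traces]

# Model evaluation: accuracy on held-out traces
correct = sum(knn_predict(train, extract(t)) == label for t, label in test_traces)
print(f"accuracy: {correct}/{len(test_traces)}")  # accuracy: 2/2
```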
Characterization of Tor Traffic using Time-based Features
Lashkari, Arash Habibi, et al., ICISSP 2017
Summary
• Tor data analysis
• The experiment was carried out in two scenarios:
§ Scenario A: classify data as Tor or non-Tor
§ Scenario B: identify which application generated the Tor traffic
o Applications are grouped into 8 categories: browsing, email, chat, audio-streaming, video-streaming, file transfer, VoIP, and P2P
• Flow: a sequence of packets sharing the same values of
§ {Source IP, Destination IP, Source Port, Destination Port, Protocol (TCP or UDP)}
• For the analysis, this work focused on network traffic information (e.g., IP address, port number, protocol) and time-related features
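Grouping packets into flows by the 5-tuple can be sketched as below. The `Packet` record and its field names are hypothetical, the addresses come from documentation ranges, and treating both directions of a connection as one flow (so forward and backward packets end up together, as the fiat/biat features require) is an assumption of this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str  # "TCP" or "UDP"
    size: int


def flow_key(p: Packet) -> tuple:
    # Order the two endpoints so A->B and B->A packets share one key (bidirectional assumption).
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.proto)


def group_flows(packets):
    # Collect packets per flow key, in timestamp order.
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        flows[flow_key(p)].append(p)
    return dict(flows)


packets = [
    Packet(0.0, "192.0.2.10", "198.51.100.7", 50000, 443, "TCP", 517),
    Packet(0.1, "198.51.100.7", "192.0.2.10", 443, 50000, "TCP", 1460),
    Packet(0.2, "192.0.2.10", "203.0.113.5", 40000, 53, "UDP", 60),
]
flows = group_flows(packets)
print(len(flows))  # 2
```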
Experimental environments
23 features
• 23 features (p. 257)
§ fiat (Forward Inter-Arrival Time): the time between two packets sent in the forward direction (source to destination; outgoing packets) (mean, min, max, std)
§ biat (Backward Inter-Arrival Time): the time between two packets sent in the backward direction (destination to source; incoming packets) (mean, min, max, std)
§ flowiat (Flow Inter-Arrival Time): the time between two packets in either direction (no distinction between forward and backward) (mean, min, max, std)
§ active: the length of time a flow is active before becoming idle
§ idle: the length of time a flow is idle before becoming active again
o If there is no traffic between two nodes for 5 sec, the flow is considered idle.
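The time-based features on this slide can be computed per flow roughly as follows. This is a sketch, not the paper's tool: it assumes a flow is a time-sorted list of (timestamp in seconds, "fwd"/"bwd") pairs, uses the population standard deviation, and splits active and idle periods wherever the gap between consecutive packets exceeds the 5-second threshold from the slide.

```python
import statistics

IDLE_THRESHOLD_S = 5.0  # from the slide: no traffic for 5 sec means the flow is idle


def summarize(values):
    # mean, min, max, std as listed on the slide; population std is an assumption
    if not values:
        return {"mean": 0.0, "min": 0.0, "max": 0.0, "std": 0.0}
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.pstdev(values),
    }


def gaps(times):
    # Inter-arrival times between consecutive timestamps
    return [b - a for a, b in zip(times, times[1:])]


def time_features(flow):
    """flow: time-sorted, non-empty list of (timestamp_s, "fwd" | "bwd")."""
    if not flow:
        raise ValueError("flow must contain at least one packet")
    all_t = [t for t, _ in flow]
    fwd_t = [t for t, d in flow if d == "fwd"]
    bwd_t = [t for t, d in flow if d == "bwd"]
    # Close an active period (and record an idle one) whenever a gap exceeds the threshold
    active, idle, start = [], [], all_t[0]
    for prev, cur in zip(all_t, all_t[1:]):
        if cur - prev > IDLE_THRESHOLD_S:
            active.append(prev - start)
            idle.append(cur - prev)
            start = cur
    active.append(all_t[-1] - start)
    return {
        "fiat": summarize(gaps(fwd_t)),
        "biat": summarize(gaps(bwd_t)),
        "flowiat": summarize(gaps(all_t)),
        "active": summarize(active),
        "idle": summarize(idle),
    }


flow = [(0.0, "fwd"), (0.5, "bwd"), (1.0, "fwd"), (8.0, "bwd"), (9.0, "fwd")]
features = time_features(flow)
print(features["fiat"]["mean"], features["idle"]["max"])  # 4.5 7.0
```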
23 features
• 23 features (p. 257)
§ fb_psec: flow bytes per second (no distinction between forward and backward)
§ fp_psec: flow packets per second (no distinction between forward and backward)
§ duration:
o The duration of a flow, i.e., of traffic with the same source IP, destination IP, source port, destination port, and protocol (TCP or UDP)
o For UDP, the flow is terminated by a timeout: 10, 15, 30, 60, or 120 sec
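A sketch of fb_psec, fp_psec, and duration, together with splitting a stream of packets into flows by a timeout. Interpreting the UDP timeout as an inactivity timeout (a new flow starts when the gap since the previous packet exceeds it) and returning zero rates for a zero-length flow are both assumptions; the function names are hypothetical.

```python
def split_by_timeout(timestamps, timeout_s):
    # Start a new flow when the gap since the previous packet exceeds timeout_s (assumed semantics).
    flows, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > timeout_s:
            flows.append(current)
            current = []
        current.append(t)
    if current:
        flows.append(current)
    return flows


def rate_features(packets):
    # packets: non-empty list of (timestamp_s, size_bytes); forward and backward are not distinguished.
    times = [t for t, _ in packets]
    duration = max(times) - min(times)
    if duration == 0:
        return {"duration": 0.0, "fb_psec": 0.0, "fp_psec": 0.0}  # assumption: avoid dividing by zero
    total_bytes = sum(size for _, size in packets)
    return {
        "duration": duration,
        "fb_psec": total_bytes / duration,
        "fp_psec": len(packets) / duration,
    }


print(split_by_timeout([0.0, 1.0, 2.0, 20.0, 21.0], timeout_s=10))  # [[0.0, 1.0, 2.0], [20.0, 21.0]]
print(rate_features([(0.0, 1000), (1.0, 500), (2.0, 1500)]))  # {'duration': 2.0, 'fb_psec': 1500.0, 'fp_psec': 1.5}
```

Re-running the split with each timeout listed on the slide would show how the choice changes the number and length of the resulting flows.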
Scenario A: Tor vs. Non-Tor
• Generated Tor data and encrypted non-Tor traffic
§ How to generate encrypted data [1]:
o HTTPS protocol or VPN
o Note: HTTPS traffic is defined as “regular encrypted” traffic, and
o VPN traffic is defined as “encrypted traffic tunneled through VPN”
Scenario B: 8 different applications
• Generate data to classify the type of application using Tor
§ Browsing: HTTP and HTTPS can be used
§ Email: Thunderbird client; randomly created accounts (e.g., Alice and Bob) used Gmail
§ Chat: Facebook, Hangouts, Skype, AIM, ICQ
o AIM and ICQ were used through the Pidgin application (http://pidgin.im)
§ Audio-Streaming: Spotify (https://www.spotify.com/us/)
Scenario B: 8 different applications
• Generate data to classify the type of application using Tor (continued)
§ Video-Streaming: YouTube or Vimeo
§ File Transfer: (1) file transfer provided by Skype, (2) SFTP (SSH File Transfer Protocol), (3) FTP over SSL (FTPS)
§ VoIP: voice calls using Facebook, Hangouts, Skype
§ P2P: used BitTorrent to download Kali Linux (https://www.kali.org)
– Used the BitTorrent client Vuze (http://www.vuze.com)
– Created a general BitTorrent environment by controlling upload and download speed
Machine learning algorithms
• Tested with different machine learning algorithms
§ ZeroR
§ C4.5
§ KNN
• For binary classification (Tor vs. non-Tor), ZeroR (which always predicts the most frequent class) is used as a baseline
• In all cases, C4.5 and KNN perform better than ZeroR
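ZeroR ignores every feature and always predicts the most frequent class in the training labels, which is why it serves as a floor for C4.5 and KNN. A minimal sketch with a hypothetical, deliberately imbalanced label set shows that its accuracy simply equals the majority-class share.

```python
from collections import Counter


class ZeroR:
    def fit(self, labels):
        # Remember only the most frequent training label.
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        # Same answer for every row, whatever its features.
        return [self.prediction for _ in rows]


# Hypothetical labels: 80 non-Tor flows and 20 Tor flows
labels = ["non-Tor"] * 80 + ["Tor"] * 20
model = ZeroR().fit(labels)
predictions = model.predict(labels)  # features are irrelevant, so the labels stand in for rows
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(model.prediction, accuracy)  # non-Tor 0.8
```

Despite 80% accuracy, this baseline never identifies a single Tor flow, so a useful classifier has to beat it on per-class recall as well as on accuracy.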
Experimental results
• Results split into 2 groups based on performance
§ Good precision and recall: VoIP, P2P, Audio, File Transfer, and Video
§ Classifiers fail: Browsing, Chat, and Email
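Precision and recall are computed per application class. The sketch below uses a hypothetical five-flow example in which Browsing and Chat are confused with each other; the labels and predictions are invented for illustration only.

```python
def precision_recall(y_true, y_pred, label):
    # Per-class precision = TP / (TP + FP), recall = TP / (TP + FN); 0.0 when undefined.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = ["VoIP", "VoIP", "Browsing", "Browsing", "Chat"]
y_pred = ["VoIP", "VoIP", "Browsing", "Chat", "Browsing"]
for label in ["VoIP", "Browsing", "Chat"]:
    print(label, precision_recall(y_true, y_pred, label))
# VoIP (1.0, 1.0)
# Browsing (0.5, 0.5)
# Chat (0.0, 0.0)
```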
Experimental results (details)
• VoIP and P2P are more distinct than the other applications (with very few false positives)
§ The reason is that VoIP (e.g., Skype) uses a P2P network, which has a distinctive protocol.
§ However, VoIP is most often misclassified as Browsing.
o This is because VoIP applications are web-based, so they also generate browsing traffic.
§ The results of the P2P analysis are almost perfect.
o The reason is that it communicates using only the P2P protocol, the same reason as for VoIP above.
Experimental results (details)
• AUDIO and VIDEO have a similar pattern.
§ Because
o (1) they work over UDP and use RTP (Real-time Transport Protocol, https://tools.ietf.org/html/rfc3550);
o (2) intuitively, the server sends considerably more data than the client sends back; and
o (3) they use RTP together with RTCP (RTP Control Protocol)
§ However, they are also confused with Browsing because they run in browsers.
• AUDIO, VIDEO, and Chat use a web browser, so they generate browsing traffic.
Conclusion
• Each application is more likely to be distinguishable because different applications use different protocols.
§ Different protocols mean different rules for the packets that are exchanged (sent and received)
§ For example, different numbers of packets are required in the initialization process.
Research challenges
• Processing steps for website fingerprinting
§ Data collection
o How much data can be collected
§ Data preprocessing
o Convert to numeric data; normalize values
§ Feature extraction
o Select fewer, but important, features to reduce computation
§ Model learning & evaluation
o Various machine learning models, such as random forest, AdaBoost, XGBoost, and CNN
• Each step offers various ways to improve performance
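The preprocessing and feature-extraction ideas above (numeric conversion, normalization, and keeping fewer but informative features) can be sketched with the standard library. One-hot encoding, min-max scaling, and a variance threshold are just one simple choice for each step, picked here for illustration; the feature names and values are hypothetical.

```python
import statistics


def one_hot(values):
    # Convert a categorical column (e.g. protocol) into numeric 0/1 columns.
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values], categories


def min_max(column):
    # Scale to [0, 1]; a constant column becomes all zeros.
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def variance_filter(columns, threshold=1e-3):
    # Keep only features whose values actually vary across samples.
    return {name: col for name, col in columns.items() if statistics.pvariance(col) > threshold}


raw = {"duration": [1.0, 3.0, 5.0], "constant": [7.0, 7.0, 7.0]}
protocols = ["TCP", "UDP", "TCP"]

encoded, cats = one_hot(protocols)
columns = {name: min_max(col) for name, col in raw.items()}
for j, cat in enumerate(cats):
    columns[f"proto={cat}"] = [row[j] for row in encoded]

selected = variance_filter(columns)
print(sorted(selected))  # ['duration', 'proto=TCP', 'proto=UDP']
```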