Peer-to-Peer Application Recognition Based on Signaling Activity
DESCRIPTION
Because of the enormous growth in the number of peer-to-peer (P2P) applications in recent years, P2P traffic now constitutes a substantial proportion of Internet traffic. The ability to accurately identify different P2P applications from network traffic is essential for managing a number of network traffic issues, such as service differentiation and capacity planning. However, modern P2P applications often use proprietary protocols, dynamic port numbers, and packet encryption, which make traditional identification approaches such as port-based or signature-based identification less effective. In this paper, we propose an approach for accurately recognizing P2P applications running on monitored hosts based on their signaling behavior. Because signaling behavior is regulated by the underlying P2P protocol, each application possesses a distinguishing characteristic, and we consider that the signaling behavior of each P2P application can serve as a unique signature for application identification. Our approach is particularly useful for three reasons: 1) it does not need to access the packet payload; 2) it recognizes applications based purely on their signaling behavior; and 3) it can identify particular P2P applications. The performance evaluation shows that 92% of a real-life traffic trace can be correctly recognized within a 5-minute monitoring period.
TRANSCRIPT
Chen‐Chi Wu (1), Kuan‐Ta Chen (2), Yu‐Chun Chang (1), Chin‐Laung Lei (1)
(1) Department of Electrical Engineering, National Taiwan University
(2) Institute of Information Science, Academia Sinica
1ICC09
Talk Outline
Introduction
Fundamentals of our scheme
Methodology
Performance evaluation
Conclusion
Introduction
P2P traffic constitutes a substantial volume of Internet traffic
Accurately identifying P2P applications from network traffic is important
Network management, capacity planning, etc.
Conventional approaches: port numbers or payload signatures
Limitations: dynamic ports, encrypted payload
Fundamentals of Our Scheme
P2P applications generate two types of traffic
Data transfer traffic: file‐sharing or file‐redistribution
Signaling traffic: file information refreshment, peer discovery, control information exchange, etc.
Signaling activity is regulated by the underlying P2P protocol
Each P2P application may have a unique characteristic
Fundamentals of Our Scheme
Verify our conjecture
Compare the signaling activity patterns of BitTorrent, eMule, and Skype
Traffic data
Capture the traffic of 3 hosts that execute BitTorrent, eMule, or Skype
Assume packets with payload size smaller than 100 bytes are signaling packets
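The payload-size assumption above can be sketched as a simple filter. This is only an illustration: the `Packet` record and its field names are hypothetical and do not come from the talk; the only value taken from the slide is the 100-byte threshold.

```python
from dataclasses import dataclass

# Threshold from the slide: payloads smaller than 100 bytes count as signaling.
SIGNALING_MAX_PAYLOAD = 100


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record; the field names are illustrative only."""
    timestamp: float    # seconds since the start of monitoring
    remote_host: str    # peer on the other end of the packet
    direction: str      # "in" or "out", relative to the monitored host
    payload_size: int   # payload length in bytes


def is_signaling(pkt: Packet) -> bool:
    """Return True if the packet is treated as a signaling packet."""
    return pkt.payload_size < SIGNALING_MAX_PAYLOAD


def signaling_packets(packets):
    """Keep only the packets assumed to be signaling traffic."""
    return [p for p in packets if is_signaling(p)]
```

Note that the filter looks only at the payload length, never at the payload content, which matches the goal of not accessing the packet payload.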
Signaling Activity Patterns
Assign IDs to hosts contacted by the monitored host in the order in which they are observed
BitTorrent
Intensive exchange of signaling packets
The BitTorrent client progressively discovers new hosts
Signaling Activity Patterns
eMule
The number of hosts increases rapidly in the first 10 minutes but increases slowly thereafter
Skype
Most signaling packets belong to probe traffic
Proposed Scheme
Identify P2P applications running on hosts based on their signaling behavior
How to characterize signaling traffic?
Signaling Behavior Characterization
Keep track of signaling packets of a monitored host for a period of time
Count the number of hosts contacted and the number of packets sent and received every minute
Classify hosts in contact with the monitored host into 2 types
Sending/receiving packets within 5 minutes => old host
Otherwise => new host
Characterize the signaling behavior on two levels
Host level: based on the number of new or old hosts
Message level: based on the number of new or old packets
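The per-minute bookkeeping described on this slide might look like the sketch below. It rests on one interpretation that the slide does not spell out: a host counts as old in a given minute if it was last seen at most 5 minutes earlier, and as new otherwise. The function name and the `(minute, host)` event format are hypothetical.

```python
from collections import defaultdict

WINDOW_MINUTES = 5  # look-back window from the slide


def classify_hosts_per_minute(events, window=WINDOW_MINUTES):
    """Split the hosts seen in each minute into new and old hosts.

    events: iterable of (minute, host) pairs, one per signaling packet,
            for a single direction (incoming or outgoing).
    Returns {minute: {"new": set_of_hosts, "old": set_of_hosts}}.

    Assumption (not stated on the slide): a host is old if it was last
    seen at most `window` minutes before the current minute.
    """
    hosts_by_minute = defaultdict(set)
    for minute, host in events:
        hosts_by_minute[minute].add(host)

    last_seen = {}  # host -> most recent earlier minute in which it appeared
    result = {}
    for minute in sorted(hosts_by_minute):
        new, old = set(), set()
        for host in hosts_by_minute[minute]:
            prev = last_seen.get(host)
            if prev is not None and minute - prev <= window:
                old.add(host)
            else:
                new.add(host)
        # Update only after the whole minute has been classified.
        for host in hosts_by_minute[minute]:
            last_seen[host] = minute
        result[minute] = {"new": new, "old": old}
    return result
```

Running the function once on incoming packets and once on outgoing packets would give the per-direction host counts; counting packets instead of distinct hosts in each group would give the message-level counterpart.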
Signaling Behavior Features
Host level
Ratio of new / old hosts
Growth rate of new / old hosts
Correlation coefficient between the number of new and old hosts
Message level
Ratio of new / old packets
Growth rate of new / old packets
Correlation coefficient between the number of new and old packets
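The three feature families listed above could be computed from per-minute counts roughly as follows. The slides do not give exact formulas, so every definition here is an assumption: the ratio is taken as the share of new (or old) items among all items in a minute, the growth rate as the average minute-to-minute change in a count, and the correlation coefficient as Pearson's r. The same helpers apply to host counts and to packet counts.

```python
import math


def ratio(part_count, other_count):
    """Share of one group (e.g. new hosts) among both groups in a minute."""
    total = part_count + other_count
    return part_count / total if total else 0.0


def growth_rate(counts):
    """Assumed definition: mean change per minute across the series."""
    if len(counts) < 2:
        return 0.0
    return (counts[-1] - counts[0]) / (len(counts) - 1)


def pearson(xs, ys):
    """Pearson correlation between two equally long per-minute series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / math.sqrt(var_x * var_y)
```

For instance, a minute with 3 new hosts and 2 old hosts gives `ratio(3, 2) == 0.6`, that is, a new-host ratio of 3/5.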
Example
Host level ‐ ratio of new hosts
Keep track of hosts in contact with the monitored host
Incoming direction in the 6th min.: B and D are old hosts; A, G, and H are new hosts
Ratio of new hosts in the 6th min. => 3/5
[Figure: hosts seen in the incoming direction and in the outgoing direction during each minute of monitor time (min. 1 to 6), each marked as a new host or an old host]
Identifier Design
Adopt support vector machine (SVM)
Training phase
Derive features from each training sample
Label each training sample with the name of its P2P application
Train the SVM classifier
Identification phase
Derive features from a signaling packet stream
Use the trained classifier to determine the P2P application
Traffic Data
Category             Hosts      Packets
BitTorrent           110,711    104,722,150
eMule                42,377     36,716,588
Skype                61,777     34,076,328
World of Warcraft    218        2,528,359
TELNET               362        21,118,522
HTTP                 4,448      28,264,360
Performance Evaluation
10‐fold cross validation
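As a reminder of what 10-fold cross validation involves, the sketch below partitions sample indices into ten folds, each of which serves once as the test set while the other nine form the training set. The function name, the shuffling, and the seed are illustrative choices, not details from the talk, and the classifier itself is left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    The indices are shuffled once with a fixed seed (an illustrative
    choice) and dealt round-robin into k folds of near-equal size.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

In each round a classifier would be trained on `train`, evaluated on `test`, and the ten accuracy values averaged into a single estimate.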
Conclusion
Summary
Identify distinct P2P applications without examining payload
Characterize signaling behavior possessed by P2P applications
Future work
Consider the case in which a host launches multiple P2P applications
Short flows?
Thank you for your attention!