characterizing instant messaging traffic in an enterprise network
IBM Research, Network Server Systems Software
© 2006 IBM Corporation
Characterizing Instant Messaging Traffic in an Enterprise Network
Lei Guo, the Ohio State University
Mentor: Zhen Xiao, manager: John Tracey
2006 autumn intern presentation
Instant messaging
Quick response
User presence service
Interactive communication
Multitasking
Private chat
Enterprise cooperation
AIM: 53 M users
MSN: 29 M users
Jabber: 13.5 M users
SameTime: 15 M users
Peak online users
– Skype: 7 M
– QQ: 20 M
Challenges of IM measurements
No large scale measurement study on IM traffic characterization so far
No server logs
– In contrast to Web and streaming media servers
Difficulty of online packet analysis
User privacy concerns
Our Objective and Methodology
First large scale IM traffic measurement
– IM system design and optimization
– Experimental basis for IM workload generation
– Security in IM network
Online IM traffic parser that protects privacy-related user information
– Packet level workloads of AIM and MSN Messenger (by port number)
– Packet headers of Yahoo and GTalk/Jabber (by port number)
– Nearly one month in a large enterprise network with thousands of employees
– More than 20,000 user conversations by 469 AIM users and 408 MSN users
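The port-based split of the four IM systems described above could be sketched roughly as follows. The port numbers are the commonly used default server ports for each service and are an assumption here, since the slides do not list them; `classify_by_port` is a hypothetical helper, not part of the sniffer.

```python
# Commonly used default server ports (assumption: not stated in the slides).
DEFAULT_IM_PORTS = {
    5190: "AIM",
    1863: "MSN",
    5050: "Yahoo",
    5222: "GTalk/Jabber",
}


def classify_by_port(src_port, dst_port):
    """Return the IM service for a TCP packet, or None if neither port matches."""
    # Check the destination port first, then the source port, so that
    # both client-to-server and server-to-client packets are classified.
    for port in (dst_port, src_port):
        service = DEFAULT_IM_PORTS.get(port)
        if service is not None:
            return service
    return None
```

Port matching alone would miss clients that fall back to other ports, so a real capture would probably need a protocol check as well.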
IM Sniffer
MSNP
AIM protocol
– Classic: OSCAR
– Triton: new, N/A
10% AIM traffic
[Diagram: network interface, OS kernel, pcap library, online packet reconstructor, AIM packet parser, MSN packet parser; Ethernet packets are reduced to IP packets and then to IM packets (IM packet 1, IM packet 2); dump data goes to a pcap format file for offline analysis]
Protect user privacy information
– MD5 hash: "[email protected]: hello, how are you doing" is stored as "4d347c1b: e51c49a1043fc"
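The MD5-based privacy protection on this slide might look roughly like the sketch below. The truncation lengths (8 and 13 hex characters) are read off the example on the slide and are an assumption, as are the function names; the slides do not say how the digests are shortened or whether a salt is used.

```python
import hashlib


def anonymize(text, digest_chars=8):
    """Replace a string with a truncated MD5 hex digest (truncation is an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:digest_chars]


def anonymize_message(user_id, body):
    """Hash the user ID and the message body separately, keeping the 'id: body' layout."""
    return f"{anonymize(user_id)}: {anonymize(body, 13)}"
```

Because the same input always maps to the same digest, conversations by one user can still be grouped in offline analysis without revealing the user ID.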
Instant messaging in AIM
Authentication
Redirection
User-to-user chat
Multi-user chat
P2P communication
Authentication server
BOS server BOS server
Chat room server
P2P voice/video chat, file transfer
Email server
Buddy icon server
…
Other services
Instant messaging in MSN Messenger
Switchboard server
Dispatch server
Notification server Notification server
P2P voice/video chat, file transfer
MSN passport server
Email server
…
Other services
Outline
Overview of IM traffic
Online activity of IM users
Characterizing IM servers
Analysis of IM traffic
Conclusion
Overview of IM traffic
[Figure panels: traffic volume (MB) and number of packets with TCP payload (×10^6), each comparing inbound and outbound traffic for AIM, MSN, Yahoo, and GTalk]
For most IM systems, the traffic volume a client receives from IM servers is much greater than the volume it sends.
IM servers in our workloads
[Figure panels: total number of server IPs collected for AIM, MSN, Yahoo, and GTalk; cumulative number of server IPs collected over time]
The number of IM servers is very large
IM TCP connections
[Figure panels: number of TCP requests and percentage of failed TCP requests for AIM, MSN, Yahoo, and GTalk]
The percentage of failed TCP requests is non-trivial
IM traffic rate
IM traffic rate (sampled per minute)
IM traffic rate (sampled per hour)
IM traffic is highly bursty: a lot of spikes
8.9 Kbps on average
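A per-minute or per-hour traffic rate like the one plotted here could be computed from packet timestamps and sizes along these lines. This is a minimal sketch: the `(timestamp, size)` input layout and the function name are assumptions, not the format used by the sniffer.

```python
from collections import defaultdict


def traffic_rate_kbps(packets, bin_seconds=60):
    """Average rate in Kbps per time bin.

    packets: iterable of (timestamp_seconds, size_bytes) pairs.
    Returns {bin_start_seconds: rate_kbps}, ordered by bin start.
    """
    bytes_per_bin = defaultdict(int)
    for ts, size in packets:
        bin_start = int(ts // bin_seconds) * bin_seconds
        bytes_per_bin[bin_start] += size
    # bytes -> bits -> kilobits, averaged over the bin length.
    return {
        start: total * 8 / 1000 / bin_seconds
        for start, total in sorted(bytes_per_bin.items())
    }
```

Calling it with `bin_seconds=3600` gives the hourly view; bins with no packets are simply absent from the result.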
IM traffic rate
[Figure panels: hourly traffic rate of AIM; hourly traffic rate of MSN]
Each spike comes from a very small number of TCP connections (typically one or two) carrying voice/video chat or file transfers
IM traffic rate
[Figure panels: hourly traffic rate of Yahoo; hourly traffic rate of GTalk]
GTalk traffic rate has clear diurnal and weekly patterns because voice/video chat and file transfers are used less
Summary of IM traffic overview
The traffic volume a client receives from IM servers is much greater than the volume it sends (Yahoo is an exception)
A large number of servers are used for IM services
The failure ratio of IM TCP connections is non-trivial
IM traffic is highly bursty due to voice/video chat and file transfers
Online session and chat conversation: AIM
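Online session duration
– Login time to logout/disconnect time
– Duration of the TCP connection to the BOS server
Conversation
– All messages are forwarded by the BOS server
– Messages of different conversations are interleaved in one TCP connection
– A 5-minute threshold on message inter-arrival time identifies a conversation
[Diagram: users A, B, and C connected through the BOS server; a gap of more than 5 minutes between messages separates conversations AB1 and AB2, while AC1 is interleaved with them]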
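The 5-minute rule for splitting the interleaved AIM message stream into conversations can be sketched as follows. The message tuple layout and the function name are assumptions for illustration; only the threshold comes from the slide.

```python
from collections import defaultdict

IDLE_THRESHOLD = 5 * 60  # seconds; the 5-minute threshold from the slide


def split_conversations(messages, threshold=IDLE_THRESHOLD):
    """Group messages into conversations per user pair.

    messages: iterable of (timestamp_seconds, user_a, user_b).
    Returns {frozenset({user_a, user_b}): [[timestamps of conversation 1], ...]}.
    """
    by_pair = defaultdict(list)
    for ts, a, b in messages:
        # Direction does not matter: A->B and B->A belong to the same pair.
        by_pair[frozenset((a, b))].append(ts)

    conversations = {}
    for pair, stamps in by_pair.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > threshold:
                groups.append([cur])    # gap above threshold: new conversation
            else:
                groups[-1].append(cur)  # same conversation continues
        conversations[pair] = groups
    return conversations
```

With messages between A and B at 0 s, 100 s, and 500 s, the 400-second gap exceeds the threshold, so the pair ends up with two conversations, matching AB1 and AB2 in the diagram.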
Online session and chat conversation: MSN
Online session duration
– Login time to logout/disconnect time
– Duration of TCP connection to notification server
Conversation
– Each conversation is forwarded by a new switchboard server
– Disconnect automatically if idle > 5 min
– Removing conversations without chat messages
[Diagram: switchboard server and notification server]
Online activity of AIM users
[Figure panels: number of online users (annotation: 120 users); number of simultaneous chat conversations (annotation: 12 chat conversations)]
Clear diurnal and weekly patterns
Peak time about 2:00 PM
# of chat conversations << # of online users
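Counting how many users (or conversations) are active at the same time reduces to a sweep over start and end events. The sketch below assumes sessions are available as `(login_ts, logout_ts)` pairs; that layout and the function name are illustrative, not taken from the slides.

```python
def peak_concurrency(sessions):
    """Return the maximum number of sessions open at the same time.

    sessions: iterable of (start_ts, end_ts) pairs with start_ts <= end_ts.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # session opens
        events.append((end, -1))    # session closes
    # Tuples sort by timestamp first; at equal timestamps -1 sorts before +1,
    # so a logout is counted before a login that happens at the same instant.
    events.sort()

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

The same function applies to conversation intervals, which is one way to obtain both curves on the slide from session and conversation records.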
Online activity of MSN users
[Figure panels: number of online users (annotation: 90 users); number of simultaneous chat conversations (annotation: 14 chat conversations)]
Clear diurnal and weekly patterns
Peak time about 2:00 PM (lunch break)
# of chat conversations << # of online users
Number of conversations per user
Users are idle most of the time
Few users chat simultaneously with two buddies
[Figure panels: AIM and MSN; annotated averages, in panel order: 0.058 and 0.075]
Distribution of user online duration
AIM MSN
Weibull distribution has been reported by a P2P study (IMC 2006)
Complementary cumulative distribution: $P(X > x) = \exp[-(x/x_0)^c]$
$\log(-\log P) = \log[(x/x_0)^c] = c \log x - c \log x_0$
This is a straight line in $\log x$; the measured online durations do not fit it well
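The straight-line test above can be tried on any sample with an ordinary least-squares fit in the transformed coordinates. The sketch below is one possible way to do it, not the fitting procedure used in the study: the plotting position `(n - i - 0.5) / n` for the empirical CCDF is an assumption chosen to keep the value strictly between 0 and 1, and the function name is hypothetical.

```python
import math


def weibull_plot_fit(samples):
    """Fit log(-log P(X > x)) = c*log(x) - c*log(x0) by least squares.

    samples: positive values with at least two distinct entries.
    Returns (c, x0, r_squared); r_squared near 1 means a good Weibull fit.
    """
    xs = sorted(samples)
    n = len(xs)
    us, vs = [], []
    for i, x in enumerate(xs):
        ccdf = (n - i - 0.5) / n          # empirical P(X > x), kept inside (0, 1)
        us.append(math.log(x))
        vs.append(math.log(-math.log(ccdf)))

    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    syy = sum((v - mean_v) ** 2 for v in vs)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))

    c = sxy / sxx                          # slope
    intercept = mean_v - c * mean_u        # equals -c * log(x0)
    x0 = math.exp(-intercept / c)
    r_squared = sxy * sxy / (sxx * syy)
    return c, x0, r_squared
```

A low `r_squared`, or visible curvature in the transformed points, corresponds to the "not well fit" verdict for online durations.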
Online duration of IM user sessions
CDF CCDF
Two-mode distribution
10 hours – the divide between long online durations and short online durations
Online activity of AIM users
Login events Logout events
Peak time: about 9:00 AM Peak time: about 5:00 PM
Online activity of MSN users
Login events Logout events
Peak time: about 9:00 AM Peak time: about 5:00 PM
The 10-hour divide of online duration
[Figure panels: login events and logout events for AIM and MSN, each with a 10-hour annotation]
Online time roughly 10 hours: some employees working longer than 8 hours
Online time longer than 10 hours: users do not turn off computer when leaving work
Number of online days
[Figure panels: AIM, MSN]
Not a heavy-tailed distribution; shows user activity from another perspective
– Inactive users: online occasionally
– Active users: online every weekday
– Random users: online sporadically
Summary of user online activity
The numbers of online users and simultaneous chat conversations have clear diurnal and weekly patterns
Users are idle for most of their online time: # of chat conversations << # of online users
User online duration does not follow a Weibull distribution
– Most user sessions: login and logout events are closely tied to working hours
– Long duration user sessions (> 10 hours): users do not turn off their computers when they leave work
– Two-mode online duration distribution
Users can be classified into three categories based on their online days: actively online, inactively online, and sporadically online
Characterizing IM servers
Server response time measurement
Purpose: a first step to understanding the server load from the client side
– CRT: client perceived response time
– SRT: server response time of MSNP commands
– RTT: packet round trip time (obtained from the TCP handshake)
[Diagram: IM client, sniffer, and IM server, with the RTT, SRT, and CRT intervals marked]
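One plausible way to combine the three quantities is to subtract the handshake RTT from the client perceived response time. The slide does not state this formula, so the relation SRT ≈ CRT − RTT is an assumption, as are the function name and its timestamp parameters.

```python
def estimate_srt(request_ts, response_ts, syn_ts, syn_ack_ts):
    """Estimate the server response time of one command, in seconds.

    Assumes SRT is roughly CRT minus RTT, with all timestamps taken at the sniffer:
    request_ts / response_ts bracket the command and its reply (CRT),
    syn_ts / syn_ack_ts bracket the TCP handshake of the same connection (RTT).
    """
    crt = response_ts - request_ts
    rtt = syn_ack_ts - syn_ts
    # Clamp at zero: network jitter can make a single RTT sample exceed the CRT.
    return max(crt - rtt, 0.0)
```

For a command answered 350 ms after it was sent on a connection whose handshake took 50 ms, the estimate is about 300 ms.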
MSN server response time
Response time for the first MSNP command of a TCP connection
– RTT is still accurate
– Reflects the server load
Some commands are answered only after a long latency
Dispatch server Notification server Switchboard server
Message level analysis of IM traffic
Inbound traffic >> outbound traffic
# of msgs: chat < hint < presence (AIM hint msg is small because OSCAR is binary based)
MSN has more bin msgs for user icons, voice/video chats
AIM MSN
Size of chat messages
AIM: messages are in html format (not extracted online)
MSN: format is described in message header and easy to remove
MSN: 90% of messages are smaller than 50 bytes
CDF (semi-log scale) CCDF (log-log scale)
< 50 bytes
Number of messages in a user conversation
[Figure panels: CDF (semi-log scale) and CCDF (Weibull scale); annotations: 25, 40, 90%]
Most conversations have a small number of messages
The number of msg in a conversation
– Not power law
– Follows Weibull distribution approximately
Number of messages in a user conversation
AIM MSN
Weibull fitting results
Number of conversations by a user
Most users have a small number of conversations
Number of user conversations
– Not power law
– Follows Weibull distribution approximately
CDF (semi-log scale) CCDF (Weibull scale)
Distribution of MSN conversation duration
[Figure annotation: < 200 sec]
Most conversations are short
The MSN client disconnects from the SB server after a long idle time
IM social network: number of users contacted
Rank (log-log scale) CCDF (Weibull scale)
Users in buddy list
– Contact list packets may be lost or may not be completely parsed by the IM sniffer
Users chatted with
– IM spammers
MSN: Weibull; AIM: a little rough
Number of buddies an IM user chats with
A user contacts only a small portion of the buddies in their contact list
MSN users are more active?
– Not sure, we do not count AIM Triton users
MSN: a user chats with 5.5 buddies (about 25%) on average
AIM: a user chats with 1.9 buddies (about 7%) on average
Concluding remarks
IM sniffer and measurement
– Packet level
– User privacy protection
IM traffic characterization
– Diurnal and weekly patterns of IM traffic
– The traffic volume a client receives is much greater than it sends
– Chat msgs only account for a small percentage of total msgs
– Online activity of IM users
– Messages in conversations: Weibull
– Conversations of users: Weibull
– Social network: Weibull roughly
Future work
Implement IM sniffer in Linux kernel
– For heavy workload collection
Larger scale measurement at Cornell University
– Larger user population, dominated by students
Collect SameTime workload on the server side
– Understand IM servers better
– How IM is used in work cooperation: a global map of IM user social network