Characterizing Instant Messaging Traffic in an Enterprise Network

Page 1: Characterizing Instant Messaging Traffic in an Enterprise Network

IBM Research, Network Server Systems Software

© 2006 IBM Corporation

Characterizing Instant Messaging Traffic in an Enterprise Network

Lei Guo, the Ohio State University

Mentor: Zhen Xiao, manager: John Tracey

2006 autumn intern presentation

Page 2: Characterizing Instant Messaging Traffic in an Enterprise Network

Instant messaging

Quick response

User presence service

Interactive communication

Multitasking

Private chat

Enterprise cooperation

AIM: 53 M users

MSN: 29 M users

Jabber: 13.5 M users

SameTime: 15 M users

Peak online users: Skype 7 M, QQ 20 M

Page 3: Characterizing Instant Messaging Traffic in an Enterprise Network

Challenges of IM measurements

No large-scale measurement study has characterized IM traffic so far

No server logs

– In contrast to Web and streaming media servers

Difficulty of online packet analysis

User privacy concerns

Page 4: Characterizing Instant Messaging Traffic in an Enterprise Network

Our Objective and Methodology

First large-scale IM traffic measurement

– IM system design and optimization

– Experimental basis for IM workload generation

– Security in IM network

Online IM traffic parser that protects privacy-related user information

– Packet level workloads of AIM and MSN Messenger (by port number)

– Packet headers of Yahoo and GTalk/Jabber (by port number)

– Nearly one month in a large enterprise network with thousands of employees

– More than 20,000 user conversations by 469 AIM users and 408 MSN users
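The "by port number" selection above can be sketched as a lookup on TCP ports. This is only an illustration: the slides do not list the ports used, so the table below assumes the well-known default client ports of each system, and `classify_by_port` and `is_inbound` are hypothetical helper names.

```python
# Default client-to-server ports of each IM system (an assumption here;
# the slides do not say which ports the sniffer actually matched).
DEFAULT_IM_PORTS = {
    5190: "AIM",           # OSCAR default port
    1863: "MSN",           # MSNP default port
    5050: "Yahoo",         # Yahoo Messenger default port
    5222: "GTalk/Jabber",  # XMPP client-to-server port
}


def classify_by_port(src_port, dst_port, ports=DEFAULT_IM_PORTS):
    """Return the IM system for a TCP packet, or None if neither port matches."""
    for port in (dst_port, src_port):
        if port in ports:
            return ports[port]
    return None


def is_inbound(src_port, ports=DEFAULT_IM_PORTS):
    """Treat a packet as inbound (server to client) when it comes from a server port."""
    return src_port in ports
```

Port matching is cheap enough to run online, but it misses traffic on non-default ports, which may be one reason only a share of the traffic is captured.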

Page 5: Characterizing Instant Messaging Traffic in an Enterprise Network

IM Sniffer

[Figure: IM sniffer architecture. Components: network interface, OS kernel, pcap library, online packet reconstructor, AIM packet parser, MSN packet parser, offline analysis, and dump data (pcap format file). Data labels: Ethernet packets, IP packets, IM packet 1, IM packet 2.]

MSNP

AIM protocol

– Classic: OSCAR

– Triton: new, N/A

10% AIM traffic

Protect user privacy information

– Before MD5 hash: [email protected]: hello, how are you doing

– After MD5 hash: 4d347c1b: e51c49a1043fc
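A minimal sketch of the MD5-based privacy protection, assuming user names and message bodies are each replaced by a truncated hex digest. The truncation lengths (8 and 13 characters) only mirror the example on the slide, and `md5_token` and `anonymize_chat` are hypothetical names, not the sniffer's actual functions.

```python
import hashlib


def md5_token(text, length=8):
    """Return the first `length` hex characters of the MD5 digest of `text`."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:length]


def anonymize_chat(user, message, user_len=8, msg_len=13):
    """Replace a user name and a chat message with truncated MD5 tokens.

    The same input always maps to the same token, so users and repeated
    messages can still be counted without storing the original text.
    """
    return f"{md5_token(user, user_len)}: {md5_token(message, msg_len)}"
```

Because the mapping is deterministic, offline analysis can still group records by user and conversation.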

Page 6: Characterizing Instant Messaging Traffic in an Enterprise Network

Instant messaging in AIM

Authentication

Redirection

User-to-user chat

Multi-user chat

P2P communication

Authentication server

BOS server

Chat room server

P2P voice/video chat, file transfer

Email server

Buddy icon server

Other services

Page 7: Characterizing Instant Messaging Traffic in an Enterprise Network

Instant messaging in MSN Messenger

Switchboard server

Dispatch server

Notification server

P2P voice/video chat, file transfer

MSN passport server

Email server

Other services

Page 8: Characterizing Instant Messaging Traffic in an Enterprise Network

Outline

Overview of IM traffic

Online activity of IM users

Characterizing IM servers

Analysis of IM traffic

Conclusion

Page 9: Characterizing Instant Messaging Traffic in an Enterprise Network

Overview of IM traffic

For most IM systems, the traffic volume a client receives from IM servers is much greater than the volume it sends.

[Figure: two bar charts of inbound and outbound traffic for AIM, MSN, Yahoo, and GTalk. One panel shows traffic volume in MB; the other shows the number of packets with TCP payload (×10^6).]

Page 10: Characterizing Instant Messaging Traffic in an Enterprise Network

IM servers in our workloads

The number of IM servers is very large

[Figure: two panels. One shows the total number of server IPs collected for AIM, MSN, Yahoo, and GTalk; the other shows the cumulative number of server IPs collected over time.]

Page 11: Characterizing Instant Messaging Traffic in an Enterprise Network

IM TCP connections

[Figure: two bar charts for AIM, MSN, Yahoo, and GTalk. One panel shows the number of TCP requests; the other shows the percentage of failed TCP requests.]

The percentage of failed TCP requests is non-trivial

Page 12: Characterizing Instant Messaging Traffic in an Enterprise Network

IM traffic rate

[Figure: IM traffic rate, sampled per minute and sampled per hour.]

IM traffic is highly bursty, with many spikes

8.9 Kbps on average
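The per-minute and per-hour sampling can be illustrated with a small binning routine. This is a sketch under assumptions: packets are given as (timestamp in seconds, size in bytes) pairs, 1 Kbps is taken as 1000 bits per second, and `traffic_rate_kbps` is a hypothetical name.

```python
from collections import defaultdict


def traffic_rate_kbps(packets, bin_seconds=60):
    """Average traffic rate in Kbps for each time bin.

    packets: iterable of (timestamp_seconds, size_bytes) pairs.
    Returns a dict mapping the bin start time to the rate in that bin.
    Bins without any packet are omitted rather than reported as zero.
    """
    bytes_per_bin = defaultdict(int)
    for ts, size in packets:
        bin_start = int(ts // bin_seconds) * bin_seconds
        bytes_per_bin[bin_start] += size
    return {
        start: total * 8 / 1000 / bin_seconds
        for start, total in sorted(bytes_per_bin.items())
    }
```

Calling it with `bin_seconds=3600` gives the hourly view. Wider bins average a short spike over more time, so the hourly curve shows lower peaks than the per-minute one.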

Page 13: Characterizing Instant Messaging Traffic in an Enterprise Network

IM traffic rate

[Figure: hourly traffic rate of AIM and hourly traffic rate of MSN.]

Each spike is caused by a very small number of TCP connections (typically one or two) carrying voice/video chat or file transfers

Page 14: Characterizing Instant Messaging Traffic in an Enterprise Network

IM traffic rate

[Figure: hourly traffic rate of Yahoo and hourly traffic rate of GTalk.]

GTalk traffic rate shows clear diurnal and weekly patterns because voice/video chat and file transfers are used less

Page 15: Characterizing Instant Messaging Traffic in an Enterprise Network

Summary of IM traffic overview

The traffic volume a client receives from IM servers is much greater than the volume it sends (Yahoo is an exception)

A large number of servers are used for IM services

The failure ratio of IM TCP connections is non-trivial

IM traffic is highly bursty due to voice/video chat and file transfers

Page 16: Characterizing Instant Messaging Traffic in an Enterprise Network

Outline

Overview of IM traffic

Online activity of IM users

Characterizing IM servers

Analysis of IM traffic

Conclusion

Page 17: Characterizing Instant Messaging Traffic in an Enterprise Network

Online session and chat conversation: AIM

Online session duration

– Login time to logout/disconnect time

– Duration of TCP connection to BOS server

Conversation

– All messages are forwarded by the BOS server

– Interleaved together in one TCP connection

– 5-minute threshold on message inter-arrival time to identify a conversation

[Figure: users A, B, and C exchange messages through the BOS server; gaps longer than 5 minutes separate the conversations, labeled AB1, AB2, and AC1.]
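The 5-minute inter-arrival threshold can be sketched as follows. The input layout (timestamp, user, buddy) and the name `split_conversations` are assumptions made for illustration; the slides do not show how the parser stores messages.

```python
from collections import defaultdict


def split_conversations(messages, gap_seconds=300):
    """Group interleaved messages into conversations per pair of users.

    messages: iterable of (timestamp_seconds, user, buddy) tuples, any order.
    A new conversation starts when the time since the previous message
    between the same two parties exceeds gap_seconds (5 minutes by default).
    Returns a dict mapping a sorted (user, buddy) pair to a list of
    conversations, each a list of message timestamps.
    """
    by_pair = defaultdict(list)
    for ts, user, buddy in messages:
        by_pair[tuple(sorted((user, buddy)))].append(ts)

    conversations = {}
    for pair, stamps in by_pair.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > gap_seconds:
                groups.append([cur])     # gap too long: start a new conversation
            else:
                groups[-1].append(cur)   # same conversation continues
        conversations[pair] = groups
    return conversations
```

With messages between A and B at 0 s, 100 s, and 500 s, the 400-second gap splits them into two conversations, matching the AB1/AB2 labels in the figure.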

Page 18: Characterizing Instant Messaging Traffic in an Enterprise Network

Online session and chat conversation: MSN

Online session duration

– Login time to logout/disconnect time

– Duration of TCP connection to notification server

Conversation

– Each conversation is forwarded by a new switchboard server

– Disconnect automatically if idle > 5 min

– Conversations without chat messages are removed

[Figure: MSN client connections to the switchboard server and the notification server.]

Page 19: Characterizing Instant Messaging Traffic in an Enterprise Network

Online activity of AIM users

[Figure: number of online users over time (annotation: 120 users) and number of simultaneous chat conversations over time (annotation: 12 chat conversations).]

Clear diurnal and weekly patterns

Peak time about 2:00 PM

# of chat conversations << # of online users

Page 20: Characterizing Instant Messaging Traffic in an Enterprise Network

Online activity of MSN users

[Figure: number of online users over time (annotation: 90 users) and number of simultaneous chat conversations over time (annotation: 14 chat conversations).]

Clear diurnal and weekly patterns

Peak time about 2:00 PM (lunch break)

# of chat conversations << # of online users

Page 21: Characterizing Instant Messaging Traffic in an Enterprise Network

Number of conversations per user

Users are idle most of the time

Few users chat with two buddies simultaneously

[Figure: number of conversations per user. Panels: AIM, MSN. Annotated averages: 0.058, 0.075.]

Page 22: Characterizing Instant Messaging Traffic in an Enterprise Network

Distribution of user online duration

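[Figure: distribution of user online duration for AIM and MSN.]

Weibull distribution has been reported by a P2P study (IMC 2006)

Complementary cumulative distribution: $P(X > x) = \exp[-(x/x_0)^c]$

$\log(-\log P) = \log[(x/x_0)^c] = c \log x - c \log x_0$

On these axes a Weibull distribution appears as a straight line; the measured online durations do not fit one well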
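The straight-line test can be sketched as a least-squares fit on the linearized CCDF. This is an illustration, not the authors' fitting procedure: the plotting position `1 - (i + 0.5) / n` and the name `weibull_fit` are assumptions, and samples must be positive.

```python
import math


def weibull_fit(samples):
    """Estimate (c, x0, r_squared) from positive samples.

    Builds the empirical CCDF P(X > x) and fits the line
    log(-log P) = c * log(x) - c * log(x0) by ordinary least squares.
    An r_squared close to 1 means the points lie close to a straight line.
    """
    xs = sorted(samples)
    n = len(xs)
    pts = []
    for i, x in enumerate(xs):
        p = 1.0 - (i + 0.5) / n  # plotting position, strictly inside (0, 1)
        pts.append((math.log(x), math.log(-math.log(p))))
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    suu = sum((u - mean_u) ** 2 for u, _ in pts)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    svv = sum((v - mean_v) ** 2 for _, v in pts)
    c = suv / suu                      # slope of the fitted line
    intercept = mean_v - c * mean_u    # equals -c * log(x0)
    x0 = math.exp(-intercept / c)
    r_squared = (suv * suv) / (suu * svv)
    return c, x0, r_squared
```

A low `r_squared`, or visible curvature in the plotted points, corresponds to the "not well fit" observation for online durations.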

Page 23: Characterizing Instant Messaging Traffic in an Enterprise Network

Online duration of IM user sessions

[Figure: CDF and CCDF of online session duration.]

Two-mode distribution

10 hours – the divide between long online durations and short online durations

Page 24: Characterizing Instant Messaging Traffic in an Enterprise Network

Online activity of AIM users

Login events: peak time about 9:00 AM

Logout events: peak time about 5:00 PM

Page 25: Characterizing Instant Messaging Traffic in an Enterprise Network

Online activity of MSN users

Login events: peak time about 9:00 AM

Logout events: peak time about 5:00 PM

Page 26: Characterizing Instant Messaging Traffic in an Enterprise Network

The 10-hour divide of online duration

[Figure: login events and logout events for AIM and MSN, with a 10-hour interval marked in each panel.]

Online time roughly 10 hours: some employees working longer than 8 hours

Online time longer than 10 hours: users do not turn off computer when leaving work

Page 27: Characterizing Instant Messaging Traffic in an Enterprise Network

Number of online days

[Figure: number of online days per user, for AIM and MSN.]

Not a heavy-tailed distribution; it shows user activity from another perspective

– Inactive users: online occasionally

– Active users: online every weekday

– Random users: online sporadically

Page 28: Characterizing Instant Messaging Traffic in an Enterprise Network

Summary of user online activity

The numbers of online users and of simultaneous chat conversations show clear diurnal and weekly patterns

Users are idle for most of their online time: # of chat conversations << # of online users

User online duration does not follow a Weibull distribution

– Most user sessions: login and logout events are closely tied to working hours

– Long-duration user sessions (> 10 hours): users do not turn off their computers when they leave work

– Two-mode online duration distribution

Users can be classified into three categories based on their online days: actively online, inactively online, and sporadically online

Page 29: Characterizing Instant Messaging Traffic in an Enterprise Network

Outline

Overview of IM traffic

Online activity of IM users

Characterizing IM servers

Analysis of IM traffic

Conclusion

Page 30: Characterizing Instant Messaging Traffic in an Enterprise Network

Characterizing IM servers

Server response time measurement

Purpose: a first step toward understanding the server load from the client side

CRT: client-perceived response time

SRT: server response time of MSNP commands

RTT: packet round-trip time (obtained from the TCP handshake)

[Figure: IM client, sniffer, and IM server, with the RTT, SRT, and CRT intervals marked.]
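One way to relate the three quantities is sketched below. The slides do not give a formula, so this assumes the sniffer sits close to the client, that the delay it observes between a command and its response approximates CRT, and that SRT is roughly CRT minus RTT. Both function names are hypothetical.

```python
def handshake_rtt(syn_time, syn_ack_time):
    """RTT estimated at the sniffer from the TCP handshake (SYN to SYN-ACK)."""
    return syn_ack_time - syn_time


def server_response_time(command_time, response_time, rtt):
    """Estimate SRT as the observed response delay (CRT) minus the network RTT.

    The result is clamped at zero because the RTT measured during the
    handshake may differ from the RTT at the time of the command.
    """
    crt = response_time - command_time
    return max(0.0, crt - rtt)
```

For example, a handshake RTT of 0.25 s and a command answered 0.75 s later give an estimated SRT of 0.5 s.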

Page 31: Characterizing Instant Messaging Traffic in an Enterprise Network

MSN server response time

Response time for the first MSNP command of a TCP connection

– RTT is still accurate

– Reflects the server load

Some commands receive a response only after a long latency

[Figure: response time panels for the dispatch server, notification server, and switchboard server.]

Page 32: Characterizing Instant Messaging Traffic in an Enterprise Network

Outline

Overview of IM traffic

Online activity of IM users

Characterizing IM servers

Analysis of IM traffic

Conclusion

Page 33: Characterizing Instant Messaging Traffic in an Enterprise Network

Message level analysis of IM traffic

Inbound traffic >> outbound traffic

# of msgs: chat < hint < presence (AIM hint messages are small because OSCAR is a binary protocol)

MSN has more binary messages, used for user icons and voice/video chats

[Figure: message-level breakdown of traffic for AIM and MSN.]

Page 34: Characterizing Instant Messaging Traffic in an Enterprise Network

Size of chat messages

AIM: messages are in HTML format (not extracted online)

MSN: format is described in the message header and is easy to remove

MSN: 90% of messages are smaller than 50 bytes

[Figure: size of chat messages, shown as a CDF (semi-log scale) and a CCDF (log-log scale), with "< 50 bytes" marked.]

Page 35: Characterizing Instant Messaging Traffic in an Enterprise Network

Number of messages in a user conversation

[Figure: number of messages in a user conversation, shown as a CDF (semi-log scale) and a CCDF (Weibull scale). Annotations: 25, 40, 90%.]

Most conversations have a small number of messages

The number of messages in a conversation

– Not power law

– Approximately follows a Weibull distribution

Page 36: Characterizing Instant Messaging Traffic in an Enterprise Network

Number of messages in a user conversation

[Figure: Weibull fitting results for AIM and MSN.]

Page 37: Characterizing Instant Messaging Traffic in an Enterprise Network

Number of conversations by a user

Most users have a small number of conversations

Number of user conversations

– Not power law

– Approximately follows a Weibull distribution

[Figure: number of conversations by a user, shown as a CDF (semi-log scale) and a CCDF (Weibull scale).]

Page 38: Characterizing Instant Messaging Traffic in an Enterprise Network

Distribution of MSN conversation duration

[Figure: distribution of MSN conversation duration, with "< 200 sec" marked.]

Most conversations are short

The MSN client disconnects from the SB server after a long idle time

Page 39: Characterizing Instant Messaging Traffic in an Enterprise Network

IM social network: number of users contacted

[Figure: number of users contacted, shown by rank (log-log scale) and as a CCDF (Weibull scale).]

Users in buddy list

– Contact list packets may be lost or may not be completely parsed by the IM sniffer

Users chatted with

– IM spammers

MSN: Weibull; AIM: a rough fit

Page 40: Characterizing Instant Messaging Traffic in an Enterprise Network

Number of buddies an IM user chats with

A user contacts only a small portion of the buddies in their contact list

MSN users are more active?

– Not sure; AIM Triton users are not counted

MSN: a user chats with 5.5 buddies (about 25%) on average

AIM: a user chats with 1.9 buddies (about 7%) on average

Page 41: Characterizing Instant Messaging Traffic in an Enterprise Network

Concluding remarks

IM sniffer and measurement

– Packet level

– User privacy protection

IM traffic characterization

– Diurnal and weekly patterns of IM traffic

– The traffic volume a client receives is much greater than the volume it sends

– Chat msgs only account for a small percentage of total msgs

– Online activity of IM users

– Messages in conversations: Weibull

– Conversations of users: Weibull

– Social network: roughly Weibull

Page 42: Characterizing Instant Messaging Traffic in an Enterprise Network

Future work

Implement IM sniffer in Linux kernel

– For heavy workload collection

Larger-scale measurement at Cornell University

– Larger user population, dominated by students

Collect SameTime workload on the server side

– Understand IM servers better

– How IM is used in work cooperation: a global map of IM user social network