
IBM Research, Network Server Systems Software

© 2007 IBM Corporation

Instant Messaging Traffic Analysis

Zhen Xiao, Lei Guo, and John Tracey

The 27th International Conference on Distributed Computing Systems (ICDCS'07), Toronto, Canada, June 2007


Instant messaging

Quick response

User presence service

Multitasking

Private chat

Enterprise cooperation

AIM: 53 M users

MSN: 29 M users

Jabber: 13.5 M users

SameTime: 15 M users

Peak online users: Skype 7 M, QQ 20 M


Instant Messaging Traffic Analysis – Goals

Understanding instant messaging traffic characteristics

– Other workloads (Web, Database, etc.) well understood

– Little study on instant messaging workloads

Instant messaging is a key application for SIP

– Workload characterization essential to realistic workload generation

– Workload generation essential to benchmarking

Challenge: sniffing in the middle of the network is hard

– IM formats and protocols are proprietary

– Developing an IM sniffer has distinct challenges


Existing Work on Instant Messaging Analysis

Social behaviors of IM users [CSCW2000, CSCW2002]

– Based on surveys and interviews

– Small sample sizes, subjective descriptions

Specialized instant messengers [CSCS2002 Hubbub]

– Relatively large scale (437 users)

– Incompatible with popular IMs

– Also focuses on social behaviors

Security of IM networks [WORM2005]

– Propagation of viruses and worms

Very little focus on characteristics of instant messaging traffic


Outline

Introduction

Related work

Background on AIM and MSN

Instant Messaging Traffic Analysis

– Overview

– Message level analysis

– Online session analysis

– Social network analysis

Conclusion


AOL Instant Messaging (AIM)

Functions shown in the figure:

– Authentication

– Redirection

– User-to-user chat

– Multi-user chat

– P2P communication (voice/video chat, file transfer)

Components shown in the figure:

– Authentication server

– BOS servers

– Chat room server

– Email server

– Buddy icon server

– Other services


MSN Messenger

Components shown in the figure:

– Dispatch server

– Notification servers

– Switchboard server

– MSN passport server

– Email server

– Other services

– P2P voice/video chat, file transfer


Instant Messaging Traffic Analysis – Approach

Trace collection

– Analyzed traffic between the intranet of a large enterprise (~4K users) and the Internet

– Comprehensive analysis of AIM and MSN

– Cursory analysis of Yahoo, GTalk, SameTime, and IRC chat

– Logged an anonymized version of the traffic

– About one month in duration: 2006-10-14 to 2006-11-06

– More than 20K conversations

Figure: sniffer between the enterprise network and the Internet


Instant Messaging Sniffer Architecture

Processing stages shown in the figure:

– Network interface, OS kernel, pcap library: capture Ethernet packets

– Online packet reconstructor: rebuilds IM packets (IM packet 1, IM packet 2) from IP packets

– AIM packet parser and MSN packet parser

– Online anonymization: MD5 hash with random seed

– Dump to disk

Protocols:

– MSN: MSNP

– AIM: Classic uses OSCAR; Triton is new (Aug 2006), N/A, 10% AIM traffic
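
The slide names the anonymization step only as "MD5 hash with random seed". A minimal sketch of what such a step might look like follows. It assumes the seed is drawn once per capture run and prepended to each identifier before hashing; the function name, the normalization, and the seed length are hypothetical and not taken from the authors' sniffer.

```python
import hashlib
import os

# Assumption: one random seed per capture run, held in memory only, so the
# pseudonyms cannot be recomputed from screen names after the run ends.
SEED = os.urandom(16)

def anonymize(user_id: str, seed: bytes = SEED) -> str:
    """Map an IM screen name to a stable pseudonym (hypothetical helper)."""
    # Normalizing case and whitespace is an assumption, made so that the
    # same account always yields the same pseudonym within one run.
    normalized = user_id.strip().lower().encode("utf-8")
    return hashlib.md5(seed + normalized).hexdigest()

alice = anonymize("alice@example.com")
```

Because the same seed is reused within a run, one user always maps to the same pseudonym, which keeps session and social-network analysis possible on the anonymized log.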


Overview of IM traffic

Figure: traffic volume (MB), inbound vs. outbound, for AIM, MSN, Yahoo, and GTalk

For most IM systems, the traffic volume a client receives from IM servers is much greater than the volume it sends.

Figure: total # of server IPs collected for AIM, MSN, Yahoo, and GTalk

The number of IM servers is very large
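
The two measurements on this slide (inbound vs. outbound volume and the number of distinct server IPs) could be computed from a packet log along the following lines. This is a sketch under stated assumptions: the intranet prefix `10.0.0.0/8`, the record layout, and the function name are placeholders, not details from the study.

```python
import ipaddress
from collections import defaultdict

# Assumption: 10.0.0.0/8 stands in for the enterprise intranet prefix.
INTRANET = ipaddress.ip_network("10.0.0.0/8")

def summarize(packets):
    """packets: iterable of (src_ip, dst_ip, n_bytes) tuples.

    Returns (bytes per direction, set of external IPs seen).
    """
    volume = defaultdict(int)
    servers = set()
    for src, dst, n_bytes in packets:
        src_in = ipaddress.ip_address(src) in INTRANET
        dst_in = ipaddress.ip_address(dst) in INTRANET
        if src_in and not dst_in:
            volume["outbound"] += n_bytes
            servers.add(dst)
        elif dst_in and not src_in:
            volume["inbound"] += n_bytes
            servers.add(src)
        # Purely internal or purely external packets are ignored.
    return dict(volume), servers
```

Note that external addresses are not necessarily IM servers; P2P peers would need to be filtered out separately.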


IM traffic rate

Figure panels: hourly traffic rate of AIM; hourly traffic rate of MSN

Each spike is due to a very limited number of TCP connections (typically one or two) carrying voice/video chat or file transfers
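
One way to check the observation about spikes is to bin bytes per hour and measure how much of each hour's traffic belongs to its single largest TCP connection. The sketch below assumes a log of `(unix_ts, conn_id, n_bytes)` records; that layout and the function name are illustrative only.

```python
from collections import defaultdict

def hourly_rate(packets):
    """packets: iterable of (unix_ts, conn_id, n_bytes).

    Returns {hour_start_ts: (total_bytes, share_of_largest_connection)}.
    """
    per_hour = defaultdict(lambda: defaultdict(int))
    for ts, conn_id, n_bytes in packets:
        hour = int(ts) // 3600 * 3600  # truncate to the start of the hour
        per_hour[hour][conn_id] += n_bytes
    result = {}
    for hour, conns in per_hour.items():
        total = sum(conns.values())
        if total:  # skip hours that only saw zero-length records
            result[hour] = (total, max(conns.values()) / total)
    return result
```

An hour with a high total and a share close to 1.0 matches the pattern described on the slide.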


Breakdown of IM message types

Chat msgs: text msgs a user types

Hint msgs: generated by IM client software

Presence msgs: status of buddies

Icon/binary msgs: transfer user pictures, and relay voice/video chat and file transfers when two users cannot communicate directly

Service control msgs: login, logout, server redirection, application-level keep-alive, etc.

Other: all other msgs


Message level analysis of IM traffic

# of msgs: chat < hint < presence

MSN has more binary msgs, used for user icons and voice/video chats

Figure panels: AIM, MSN

During overload, instant messaging servers can prioritize traffic and drop lower priority traffic to protect the instantaneous nature of the communication
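
The overload remark can be illustrated with a simple shedding policy. The slide does not say which message types rank higher, so the ordering below (chat first, binary and other last) is purely an assumption, as are the type names and the function.

```python
# Assumption: lower number = higher priority. This ordering is illustrative;
# the slide does not specify one.
PRIORITY = {
    "chat": 0,
    "service_control": 1,
    "presence": 2,
    "hint": 3,
    "binary": 4,
    "other": 5,
}

def shed_load(messages, capacity):
    """Keep at most `capacity` messages, preferring higher priority.

    messages: list of (msg_type, payload). Arrival order is preserved
    among the messages that are kept.
    """
    ranked = sorted(
        range(len(messages)),
        key=lambda i: (PRIORITY.get(messages[i][0], PRIORITY["other"]), i),
    )
    keep = sorted(ranked[:capacity])
    return [messages[i] for i in keep]
```

With this policy, typing hints and presence updates are dropped before any chat text is delayed.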


Size of chat messages

AIM: messages are in HTML format (not extracted online)

MSN: format is described in the message header and is easy to remove

MSN: 90% of messages are smaller than 50 bytes

Figure panels: CDF (semi-log scale), CCDF (log-log scale)
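
The CDF and CCDF of message sizes, and a figure such as "90% smaller than 50 bytes", can be computed empirically as sketched below. The input is assumed to be a plain list of chat message sizes in bytes; the function names are hypothetical.

```python
from bisect import bisect_right

def cdf_ccdf(sizes):
    """Return [(x, P(X <= x), P(X > x))] for each distinct size x."""
    ordered = sorted(sizes)
    n = len(ordered)
    points = []
    for x in sorted(set(ordered)):
        cdf = bisect_right(ordered, x) / n
        points.append((x, cdf, 1.0 - cdf))
    return points

def fraction_below(sizes, threshold):
    """Fraction of messages strictly smaller than `threshold` bytes."""
    return sum(1 for s in sizes if s < threshold) / len(sizes)
```

Plotting the CCDF on log-log axes makes the tail of large messages visible, which a linear CDF plot hides.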


Online activity of AIM users

Figure panels: number of online users (annotated: 120 users); number of simultaneous chat conversations (annotated: 12 chat conversations)

Clear diurnal and weekly patterns

Peak time about 2:00 PM

# of chat conversations << # of online users


Online activity of MSN users

Figure panels: number of online users (annotated: 90 users); number of simultaneous chat conversations (annotated: 14 chat conversations)

Clear diurnal and weekly patterns

Peak time about 2:00 PM (lunch break)

# of chat conversations << # of online users
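
Curves like "number of online users" or "number of simultaneous chat conversations" can be derived from start/end events with a sweep over the sorted events. The sketch assumes each session or conversation is a `(start_ts, end_ts)` pair; this is an illustration, not the authors' tooling.

```python
def concurrent_counts(intervals):
    """intervals: iterable of (start_ts, end_ts).

    Returns (timeline, peak), where timeline is a list of
    (ts, active_count) after each event.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back sessions are not counted as overlapping.
    events.sort()
    active = 0
    peak = 0
    timeline = []
    for ts, delta in events:
        active += delta
        peak = max(peak, active)
        timeline.append((ts, active))
    return timeline, peak
```

Running it once on login/logout pairs and once on conversation start/end pairs gives the two curves compared on these slides.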


Online duration of IM user sessions

Figure panels: CDF, CCDF

Bimodal distribution

10 hours: the divide between long online durations and short online durations
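
Splitting sessions at the 10-hour divide is straightforward once durations are known. In this sketch, whether a session of exactly 10 hours counts as long is an assumption, as are the names.

```python
TEN_HOURS = 10 * 3600  # the divide named on the slide, in seconds

def split_sessions(sessions, threshold=TEN_HOURS):
    """sessions: iterable of (login_ts, logout_ts) in seconds.

    Returns (short_durations, long_durations).
    """
    short, long_ = [], []
    for login_ts, logout_ts in sessions:
        duration = logout_ts - login_ts
        # Assumption: the boundary value itself is treated as "long".
        (long_ if duration >= threshold else short).append(duration)
    return short, long_
```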


Online activity of AIM users

Login events: peak time about 9:00 AM

Logout events: peak time about 5:00 PM


Online activity of MSN users

Login events: peak time about 9:00 AM

Logout events: peak time about 5:00 PM
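
The login and logout peaks can be located with an hour-of-day histogram. The sketch assumes event times are already `datetime` objects in the enterprise's local time zone; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

def peak_hour(event_times):
    """event_times: iterable of datetime objects (local time assumed).

    Returns (hour_with_most_events, Counter of events per hour).
    Ties go to the earlier hour.
    """
    counts = Counter(t.hour for t in event_times)
    hour, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return hour, counts

logins = [
    datetime(2006, 10, 16, 9, 5),
    datetime(2006, 10, 16, 9, 40),
    datetime(2006, 10, 16, 13, 0),
]
login_peak, login_counts = peak_hour(logins)
```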


IM social network: number of users contacted

Figure panels: rank (log-log scale), CCDF (Weibull scale)

Disclaimer: the contact network of an IM system cannot be rebuilt from only a subset of its users

MSN: fits a Weibull distribution; AIM: the fit is a little rough
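
A "Weibull scale" plot works because a Weibull CCDF, exp(-(x/lam)^k), satisfies ln(-ln CCDF) = k ln x - k ln lam, which is a straight line in ln x with slope k. A power law would instead be straight on log-log axes. The sketch below transforms empirical CCDF points to Weibull coordinates and fits a line by least squares; the function names are hypothetical and the fitting method is an assumption, since the slides do not say how the fit was done.

```python
import math

def weibull_coordinates(ccdf_points):
    """ccdf_points: iterable of (x, p) with x > 0 and 0 < p < 1.

    Returns [(ln x, ln(-ln p))], which is linear for Weibull data.
    """
    return [(math.log(x), math.log(-math.log(p))) for x, p in ccdf_points]

def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

The fitted slope estimates the shape parameter k, and the scale parameter follows from the intercept as lam = exp(-intercept / k).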


Number of buddies an IM user chats with

A user contacts only a small portion of the buddies in their contact list

MSN: a user chats with 5.5 buddies (about 25%) on average

AIM: a user chats with 1.9 buddies (about 7%) on average

MSN users are more active?

– Not sure, since AIM Triton users are not counted
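
Figures of this kind (average number of buddies chatted with, and the share of the contact list they represent) could be computed as below. The slides do not say whether the percentage is an average of per-user ratios or a ratio of averages; the sketch assumes the former, and the input layout and names are hypothetical.

```python
def contact_stats(chat_pairs, buddy_list_sizes):
    """chat_pairs: iterable of (user, buddy) for each observed chat.
    buddy_list_sizes: {user: number of buddies in the contact list}.

    Returns (average buddies chatted with, average fraction of list).
    """
    contacted = {}
    for user, buddy in chat_pairs:
        contacted.setdefault(user, set()).add(buddy)
    users = list(buddy_list_sizes)
    counts = [len(contacted.get(u, ())) for u in users]
    avg_contacted = sum(counts) / len(users)
    # Assumption: average of per-user ratios, not a ratio of averages.
    avg_fraction = sum(
        c / buddy_list_sizes[u] for c, u in zip(counts, users)
    ) / len(users)
    return avg_contacted, avg_fraction
```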


Concluding remarks

Workload characterization essential to benchmarking

– "The Design, Implementation, and Validation of an Instant Messaging Workload Generator" (submitted for publication)

Message level analysis

– Chat messages constitute only a small percentage of the total IM traffic

Social network

– Does not follow a power law distribution

Future work

– Measurements from other user populations (e.g., universities)

– Server side workload (a global map of IM user social network)


Thank you!