IIT BOMBAY NETWORK MEASUREMENTS MONITORING THE PERFORMANCE OF BACKHAUL CAMPUS NETWORK Submitted by: Manveer Singh Chawla Guided by: Prof. Purushottam Kulkarni


Post on 11-Feb-2016


Page 1: IIT BOMBAY NETWORK MEASUREMENTS

IIT BOMBAY NETWORK MEASUREMENTS
MONITORING THE PERFORMANCE OF BACKHAUL CAMPUS NETWORK

Submitted by: Manveer Singh Chawla

Guided by: Prof. Purushottam Kulkarni

Page 2: IIT BOMBAY NETWORK MEASUREMENTS

OVERVIEW
• Motivation
• Problem statement
• Related Work
• IIT Bombay Network Background
• Our Solution
  • Architecture
  • Implementation
• Experimental Evaluation
  • Network measurement data
  • Proxy log analysis
• Future Work
• Thesis Contribution

Page 3: IIT BOMBAY NETWORK MEASUREMENTS

MOTIVATION
• Consider the following scenarios:
  • User writes a mail and clicks send, but sending fails
  • User is talking with a friend on gtalk and it disconnects
  • User is browsing the web but the browsing speed is very slow
• What will a novice user do?
  • No structured approach:
    • Starts fiddling around with network settings
    • Reboots machine
• Result?
  • Wastes a lot of time
  • May not even find the cause

Page 4: IIT BOMBAY NETWORK MEASUREMENTS

MOTIVATION CNTD.
• Multiple points of failure
  • User's machine
    • Incorrect network settings
    • Failure of ethernet card/cable
  • LAN
    • Switch
    • Router
    • DNS
    • Proxy
  • WAN
    • Web server
    • Network congestion
• No user control over LAN / WAN failures

Page 5: IIT BOMBAY NETWORK MEASUREMENTS

PROBLEM DEFINITION
1. Build a measurement tool which monitors the status of elements in the network backbone, such that in case of network failure it is able to detect and diagnose the cause of failure. These elements include the subnet routers, switches, DNS servers and network proxy.
2. A measurement study of the network proxy to study the response time variation, traffic pattern and object size variation across the day.

Page 6: IIT BOMBAY NETWORK MEASUREMENTS

RELATED WORK
• Jigsaw
  • Merges traces to passively measure queuing delays, throughput
  • We summarize a trace to determine status of nodes
• WiFiProfiler
  • Fault diagnosis in wireless setting for user machine
  • Performs distributed analysis
  • Ours is centralized processing of wired network
• Network measurement tools
  • Pathchar: bandwidth, queue size, packet drop rate
  • Traceroute: RTT, topology

Page 7: IIT BOMBAY NETWORK MEASUREMENTS

IIT BOMBAY NETWORK

Page 8: IIT BOMBAY NETWORK MEASUREMENTS

MAP

Page 9: IIT BOMBAY NETWORK MEASUREMENTS

SERVICES
• Proxy: netmon
  • Web caching
  • Authentication
  • Content filtering
• Firewall
  • NATing
  • Packet filtering
  • Internal and External
• DNS
  • DNS server for campus
  • DNS servers in few subnets
• Monitoring
  • Traffic statistics

Page 10: IIT BOMBAY NETWORK MEASUREMENTS

WORKING OF PROXY

Page 11: IIT BOMBAY NETWORK MEASUREMENTS

MEASUREMENT CHALLENGES
• Permission from Computer Centre
• Large volume of data
• Unaware and amateur users
• Specific h/w required
• What to measure in such a large network
• Use existing infrastructure
• Old h/w: unpredictable failures
• WAN: firewall makes it difficult to diagnose

Page 12: IIT BOMBAY NETWORK MEASUREMENTS

OUR SOLUTION

Page 13: IIT BOMBAY NETWORK MEASUREMENTS

ARCHITECTURE

Page 14: IIT BOMBAY NETWORK MEASUREMENTS

SERVER NODE

• Send logs to diagnostic-node after collection

[Figure labels: ICMP PORT_UNREACHABLE; Query reply from server; Bad request on HTTP GET request]

Page 15: IIT BOMBAY NETWORK MEASUREMENTS

CLIENT NODE

• Send logs to diagnostic-node after collection

Page 16: IIT BOMBAY NETWORK MEASUREMENTS

DIAGNOSTIC NODE

Page 17: IIT BOMBAY NETWORK MEASUREMENTS

DIAGNOSTIC NODE CNTD.

[Flowchart: Determining the status of proxy (netmon). Nodes: "Failure Seen", "Is it seen by all?" (Yes / No), "Machine down with failure", "Machine overloaded"]

Page 18: IIT BOMBAY NETWORK MEASUREMENTS

DIAGNOSTIC NODE CNTD.

[Flowchart: Determining the status of DNS servers. Nodes: "Is it not reachable for all?" (Yes / No), "Machine Down", "Machine overloaded", "Send back to back queries", "internal answered, external not answered", "Problem in hierarchy", "other cases"]

Page 19: IIT BOMBAY NETWORK MEASUREMENTS

DIAGNOSTIC NODE CNTD.

• Offline mode: statistics for specified time period
• Online mode: statistics for last 10 minutes
• Remote query mode: query status of node at specified time

Page 20: IIT BOMBAY NETWORK MEASUREMENTS

EXPERIMENTAL EVALUATION

Page 21: IIT BOMBAY NETWORK MEASUREMENTS

SETUP
• Server node at 8 locations around the campus
• Client node at 3 locations around campus
• Collected data from 26th March to 15th June
  • No data for 25th May to 2nd June
• Measurements for following nodes:

IP Address      Name
192.0.50.1      h8router-interface1
10.12.250.1     h8router-interface2
10.12.250.2     h12switch
10.2.250.1      h3router-interface1
10.105.250.1    cserouter-interface1
10.129.1.1      kresit-dns
10.200.1.11     iitbombay-dns
10.105.1.7      cse-dns
10.107.1.250    ccrouter-interface1
192.0.20.2      ccrouter-interface2
192.0.40.2      ccrouter-interface3
192.0.50.2      ccrouter-interface4
10.129.250.1    ccrouter-interface5
10.129.1.250    ccrouter-interface6
10.165.250.1    ccrouter-interface7
netmon.iitb     netmon

Page 22: IIT BOMBAY NETWORK MEASUREMENTS

DNS SERVICE TIME DISTRIBUTION

Page 23: IIT BOMBAY NETWORK MEASUREMENTS

DNS SERVICE TIME DISTRIBUTION: OBSERVATIONS

• Median response time is very low for all servers
• Average is significantly greater than median
  • heavy tailed
• kresit-dns has much higher average and 90th percentile
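The gap between average and median noted above is what a heavy-tailed sample looks like. The following is a minimal sketch on invented service times (not the measured data); the `percentile` helper uses the nearest-rank definition, which is an assumption about how the percentiles were computed.

```python
import statistics

# Hypothetical DNS service times in milliseconds (invented for illustration):
# mostly fast replies plus one very slow reply in the tail.
service_times_ms = [2, 3, 3, 4, 4, 5, 5, 6, 8, 2960]

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * p // 100))  # ceiling of n * p / 100
    return ordered[rank - 1]

median_ms = statistics.median(service_times_ms)
mean_ms = statistics.mean(service_times_ms)
p90_ms = percentile(service_times_ms, 90)

# A single slow reply pulls the mean far above the median.
print(median_ms, mean_ms, p90_ms)
```

The median describes the typical query, while the mean is dominated by the few slow ones, which is why both are worth reporting.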

Page 24: IIT BOMBAY NETWORK MEASUREMENTS

OUTAGE DISTRIBUTIONS

• Most of the outages are short.
• Median is <= 2 minutes, 90th percentile <= 10 minutes for almost all.
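Outage lengths of this kind can be derived from a node's status samples by measuring runs of consecutive "down" readings. This is a sketch under the assumption of one sample per minute; the function name and the trace are hypothetical, not taken from the tool.

```python
import statistics

def outage_lengths(samples):
    """Return lengths (in minutes) of consecutive runs of 'down' samples.

    samples: list of (minute, is_up) tuples sorted by minute, assumed to be
    taken at a fixed one-minute interval (an assumption of this sketch).
    """
    lengths, run = [], 0
    for _minute, is_up in samples:
        if is_up:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:  # outage still in progress at the end of the trace
        lengths.append(run)
    return lengths

# Hypothetical trace: a 2-minute outage, a 1-minute outage, then a 5-minute outage.
status = [True, False, False, True, False, True, True] + [False] * 5
trace = list(enumerate(status))
lengths = outage_lengths(trace)
print(lengths, statistics.median(lengths))
```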

Page 25: IIT BOMBAY NETWORK MEASUREMENTS

PERCENTAGE DOWNTIME ACROSS DAYS

• On most of the days downtimes are < 2 % for most of the nodes.
• There is not much pattern across days.

Page 26: IIT BOMBAY NETWORK MEASUREMENTS

COMBINED DOWNTIME

• netmon ~ 0.24 %
• Percentage of time at least one interface is not working is close to the percentage of time all are not working
  • Either the machine goes down
  • Or the measurements are not taking place at the same time
    • Time to check the status of a machine is variable

Element            At least one down (%)   All not working (%)
Hostel 8 Router    2.144                   1.980
Hostel 3 Router    0.686                   0.686
CC Router          0.657                   0.565
DNS Servers        0.414                   0.406
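The two columns can be computed from per-round interface status as follows. This is a sketch only; the data layout (one dict per measurement round) and the sample rounds are assumptions made for illustration.

```python
def combined_downtime(samples):
    """samples: list of dicts mapping interface name -> True (up) / False (down),
    one dict per measurement round.
    Returns (at_least_one_down_pct, all_down_pct)."""
    total = len(samples)
    any_down = sum(1 for s in samples if not all(s.values()))
    all_down = sum(1 for s in samples if not any(s.values()))
    return 100.0 * any_down / total, 100.0 * all_down / total

# Hypothetical rounds for a two-interface router (values invented).
rounds = (
    [{"if1": True, "if2": True}] * 96
    + [{"if1": False, "if2": False}] * 3   # whole router unreachable
    + [{"if1": False, "if2": True}] * 1    # only one interface seen as down
)
any_pct, all_pct = combined_downtime(rounds)
print(any_pct, all_pct)
```

When the two percentages are close, most failures affect every interface at once, which points to the machine itself rather than a single link.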

Page 27: IIT BOMBAY NETWORK MEASUREMENTS

RESULTS SUMMARY
• Router failure > DNS failure > netmon failure
• Median node outage <= 2 min
• Small number of outages each day
  • No pattern across days
• Average DNS service time ~ 300 ms
• netmon downtime is less than generally perceived
  • Dependence on other services: LDAP, DNS
• A lot of machinery in the network is old

Page 28: IIT BOMBAY NETWORK MEASUREMENTS

PROXY LOG ANALYSIS

Page 29: IIT BOMBAY NETWORK MEASUREMENTS

MOTIVATION
• Per day logs are huge, over 6 GB
• Storing logs to perform long historical analysis is a problem
  • Over 2 TB for a year!
• What is the traffic distribution?
• What is the object size distribution?
• What is the response time distribution?
• Is there some trend across days?
• What strategy can be used to select logs for long term historical analysis?
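The yearly storage figure follows directly from the per-day log size. A quick check, assuming decimal units (1 TB = 1000 GB) and a constant daily volume:

```python
DAILY_LOG_GB = 6          # approximate per-day log size quoted above
DAYS_PER_YEAR = 365

yearly_gb = DAILY_LOG_GB * DAYS_PER_YEAR
yearly_tb = yearly_gb / 1000   # decimal terabytes (assumption: 1 TB = 1000 GB)
print(f"{yearly_gb} GB per year, about {yearly_tb:.2f} TB")
```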

Page 30: IIT BOMBAY NETWORK MEASUREMENTS

PROBLEM DEFINITION
1. Build a measurement tool which monitors the status of elements in the network backbone, such that in case of network failure it is able to detect and diagnose the cause of failure. These elements include the subnet routers, switches, DNS servers and network proxy.
2. A measurement study of the network proxy to study the response time variation, traffic pattern and object size variation across the day.

Page 31: IIT BOMBAY NETWORK MEASUREMENTS

PROXY LOG ANALYSIS
• Log file has the following format:
  Month Date Time Proxy_Server squid_process_id epoch_timestamp process_time_ms source_ip tcp_status/http_status_code object_size request_type URL user_id hierarchy_code/server_ip object_type/object_sub_type
• Stored in a MySQL database
• Processed logs for a week, from May 14, 2009 to May 20, 2009
• Size of the log file ~ 6 GB
• Number of requests in a day ~ 22 million
• Bytes downloaded ~ 401.6 GB
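A line in this format could be parsed along the following lines before loading it into a database. This is a sketch, not the parser used in the study: it assumes the 15 fields are whitespace-separated in the order listed, and `ProxyLogEntry`, `parse_line` and the sample line are all hypothetical.

```python
from typing import NamedTuple

class ProxyLogEntry(NamedTuple):
    epoch_timestamp: float
    process_time_ms: int
    source_ip: str
    tcp_status: str
    http_status_code: int
    object_size: int
    request_type: str
    url: str
    user_id: str
    object_type: str
    object_sub_type: str

def parse_line(line: str) -> ProxyLogEntry:
    """Parse one log line, assuming the 15 fields are whitespace-separated
    in the order listed above (the separator is an assumption)."""
    f = line.split()
    if len(f) != 15:
        raise ValueError(f"expected 15 fields, got {len(f)}")
    tcp_status, http_code = f[8].split("/", 1)
    object_type, _, object_sub_type = f[14].partition("/")
    return ProxyLogEntry(
        epoch_timestamp=float(f[5]),
        process_time_ms=int(f[6]),
        source_ip=f[7],
        tcp_status=tcp_status,
        http_status_code=int(http_code),
        object_size=int(f[9]),
        request_type=f[10],
        url=f[11],
        user_id=f[12],
        object_type=object_type,
        object_sub_type=object_sub_type,
    )

# Hypothetical line, invented to match the field list; not a real log record.
sample = ("May 14 10:15:02 netmon squid[4242]: 1242276302.123 480 10.129.41.189 "
          "TCP_MISS/200 1536 GET http://example.org/index.html user1 "
          "DIRECT/192.0.2.10 text/html")
entry = parse_line(sample)
print(entry.tcp_status, entry.object_size, entry.object_type)
```

Splitting `tcp_status/http_status_code` and `object_type/object_sub_type` at parse time makes the per-type and hit/miss breakdowns on the later slides simple group-by queries.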

Page 32: IIT BOMBAY NETWORK MEASUREMENTS

TRAFFIC DISTRIBUTION ON OBJECT TYPE: REQUESTS

• Percentage distribution remains the same across days
• Multimedia traffic is the least ~ 0.2 %
• Text traffic is maximum ~ 40 %

Page 33: IIT BOMBAY NETWORK MEASUREMENTS

TRAFFIC DISTRIBUTION ON OBJECT TYPE: DOWNLOADED BYTES

• Percentage distribution remains the same across days
• Multimedia traffic is the maximum ~ 38 %

Page 34: IIT BOMBAY NETWORK MEASUREMENTS

TRAFFIC DISTRIBUTION ON LOCATION: REQUESTS

• Percentage distribution remains the same across week days
• Increase in hostel traffic on weekends
• Decrease in academic traffic on weekends

Page 35: IIT BOMBAY NETWORK MEASUREMENTS

TRAFFIC DISTRIBUTION ON LOCATION: DOWNLOADED BYTES

• Percentage distribution for downloaded bytes follows number of requests
• Object type distribution remains the same across days, thus the majority of users have similar behavior in different locations

Page 36: IIT BOMBAY NETWORK MEASUREMENTS

TRAFFIC DISTRIBUTION: SUMMARY

Category    Application (%)   Image (%)   Text (%)   Multimedia (%)   Other (%)
Requests    11.02             35.43       42.76      0.18             10.61
Bytes       30.52             12.05       14.94      38.28            4.20

Category    Admin (%)   Acad (%)   Hostel (%)   Resnet (%)
Requests    3.50        28.16      61.90        6.58
Bytes       2.83        25.59      64.73        6.85

Page 37: IIT BOMBAY NETWORK MEASUREMENTS

NUMBER OF ARRIVALS PER SECOND

• Lower activity from 2 a.m. to 11 a.m. (LAN curtailment)
• Higher activity points at 3 p.m., 7 p.m., and 11 p.m.
• Average ~ 250, Standard Deviation ~ 135

Page 38: IIT BOMBAY NETWORK MEASUREMENTS

NUMBER OF REQUESTS CONCURRENTLY SERVED

• Average ~ 2000, Standard Deviation ~ 859
• Follows the arrival curve

Page 39: IIT BOMBAY NETWORK MEASUREMENTS

MEAN RESPONSE TIME AT TIME OF DAY

• Response time remains almost constant throughout the day
• A peak at around 4 a.m.
• Average ~ 9.8 seconds
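As a rough consistency check, the averages reported on this and the two preceding slides can be compared using Little's law (average number in system = arrival rate x average time in system). This check is an addition made here for illustration, not part of the original analysis, and it treats the daily averages as steady-state values.

```python
arrival_rate_per_s = 250      # average arrivals per second (from the slides)
mean_response_s = 9.8         # average response time in seconds (from the slides)
observed_concurrent = 2000    # average requests concurrently served (from the slides)

# Little's law: average number in system = arrival rate x average time in system.
predicted_concurrent = arrival_rate_per_s * mean_response_s
implied_response_s = observed_concurrent / arrival_rate_per_s
print(predicted_concurrent, implied_response_s)
```

The predicted concurrency (about 2450) and the observed value (about 2000) are of the same order, so the three averages are broadly consistent with each other.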

Page 40: IIT BOMBAY NETWORK MEASUREMENTS

MEDIAN RESPONSE TIME AT TIME OF DAY

• Median response time remains constant throughout the day, 480 ms for the day
• Median curve is a better estimate of average value on a day
• Both the median and mean response time do not follow the requests-concurrently-served and arrival curves

Page 41: IIT BOMBAY NETWORK MEASUREMENTS

CUMULATIVE RESPONSE TIME DISTRIBUTION

• For multimedia the curve becomes linear
• For remaining categories it is heavy tailed
• Median response times: application ~ 472 ms, text ~ 563 ms, image ~ 172 ms, multimedia ~ 10175 ms and other ~ 672 ms

Page 42: IIT BOMBAY NETWORK MEASUREMENTS

CUMULATIVE OBJECT SIZE DISTRIBUTION

• For multimedia, object sizes are more evenly distributed
• Remaining categories have 90 % of objects < 10 KB
• Median object sizes: application ~ 1.5 KB, text ~ 0.8 KB, image ~ 1.7 KB, multimedia ~ 903 KB and other ~ 0.46 KB

Page 43: IIT BOMBAY NETWORK MEASUREMENTS

RESULTS SUMMARY
• Multimedia traffic is the major part of WAN traffic
• Percentage traffic distribution
  • Similar across object type on days
  • Similar in different areas except on weekends
• Thus any log file can be selected as a representative of the week
  • Larger log file for more data
  • One for weekend and one for weekdays

Page 44: IIT BOMBAY NETWORK MEASUREMENTS

FUTURE WORK
• Characterization of request processing time at proxy
• Explore the other causes of failure including the LDAP service
• Explore the failures from the side of ISP, from a point outside the network
• Studying the traffic within LAN

Page 45: IIT BOMBAY NETWORK MEASUREMENTS

THESIS CONTRIBUTIONS
• Studied the tools and methodologies used for network measurement
• Surveyed and documented the campus network of IIT Bombay
  • Architecture
  • Services
  • Failures
• Developed a tool to detect some of the failures
  • Can be easily extended to detect others
• Experimental evaluation of tool by setting up testbed
• Measurement analysis of proxy logs

Page 46: IIT BOMBAY NETWORK MEASUREMENTS

BIBLIOGRAPHY
[1] Computer Center, IIT Bombay. http://www.cc.iitb.ac.in
[2] dnscache. http://cr.yp.to/djbdns/dnscache.html
[3] Iperf. http://dast.nlanr.net/Projects/Iperf/
[4] iptables. http://www.netfilter.org/projects/iptables/index.html
[5] Jpcap: a Java library for capturing and sending network packets. http://netresearch.ics.uci.edu/kfujii/jpcap/doc/
[6] Squid logs. http://wiki.squid-cache.org/SquidFaq/SquidLogs
[7] Traceroute. http://sourceforge.net/projects/traceroute

Page 47: IIT BOMBAY NETWORK MEASUREMENTS

BIBLIOGRAPHY CNTD.
[8] Ultra monkey. http://www.ultramonkey.org/
[9] Wikimedia. http://www.squid-cache.org/Library/wikimedia.dyn
[10] Kostas G. Anagnostakis, Michael Greenwald, and Raphael Ryger. cing: Measuring network-internal delays using only existing infrastructure. In Proceedings of IEEE Infocom, April 2003.
[11] Ranveer Chandra, Venkata N. Padmanabhan, and Ming Zhang. WiFiProfiler: Cooperative Diagnosis in Wireless LANs. In Proceedings of the 4th International Conference on Mobile Systems, Applications and Services, June 2006.

Page 48: IIT BOMBAY NETWORK MEASUREMENTS

BIBLIOGRAPHY CNTD.
[12] Yu-Chung Cheng, John Bellardo, Peter Benko, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Savage. Jigsaw: Solving the Puzzle of Enterprise 802.11 Analysis. In Proceedings of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, September 2006.
[13] Ramesh Govindan and Hongsuda Tangmunarunkit. Heuristics for Internet map discovery. In Proceedings of the Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, 2000.

Page 49: IIT BOMBAY NETWORK MEASUREMENTS

BIBLIOGRAPHY CNTD.
[14] Bradley Huffaker, Marina Fomenkov, David Moore, and kc claffy. Macroscopic analyses of the infrastructure: measurement and visualization of Internet connectivity and performance. In Proceedings of Passive and Active Measurements, 2001.
[15] Van Jacobson. pathchar - a tool to infer characteristics of Internet paths, 1997.
[16] Alex Rousskov and Valery Soloviev. A performance study of the Squid proxy on HTTP/1.0. World-Wide Web Journal, Special Edition on WWW Characterization and Performance Evaluation, 1999.

Page 50: IIT BOMBAY NETWORK MEASUREMENTS

BIBLIOGRAPHY CNTD.
[17] Stefan Savage. Sting: a TCP-based Network Measurement Tool. In Proceedings of the Second Conference on USENIX Symposium on Internet Technologies and Systems, 1999.
[18] Subhabrata Sen and Jia Wang. Analyzing peer-to-peer traffic across large networks. In Proceedings of the 2006 ACM CoNEXT Conference, 2006.
[19] Nirav S. Uchat. IIT Bombay web traffic characterization.
[20] Ameya P. Usgaonkar. Network Performance Analysis by Mining Multi-Variate Time Series Data, January 2001.

Page 51: IIT BOMBAY NETWORK MEASUREMENTS

Extra Slides

Page 52: IIT BOMBAY NETWORK MEASUREMENTS

RELATED WORK
• Passive Measurement
  • WiFiProfiler
    • Collaborative diagnosis, information from neighbors
    • Blame assignment algorithm to predict actual cause
  • Jigsaw
    • Collect and merge traces from multiple vantage points
    • Create single unified view of network
      • Large scale synchronization
      • Frame unification
    • Measures
      • Queuing delays experienced by users
      • Throughput: compare observed vs expected (using RTT, path loss)
      • Effect of mobility techniques: scanning, DHCP, initial association

Page 53: IIT BOMBAY NETWORK MEASUREMENTS

RELATED WORK CNTD
• Squid Log Analysis by Rousskov et al.
  • Logs from seven proxies, 18 days of logs
  • Applied patch to squid to measure: proxy connect time, client connect time, server reply time, proxy reply time, swap-in time and swap-out time
  • Studied traffic distribution, response time at proxy, number of requests at proxy, disk traffic intensity, disk utilization, disk response time: all against TCP_STATUS, i.e. HITS and MISS
  • Shortcomings:
    • No long term historical analysis
    • No comparison of direct traffic with proxied traffic
• Active measurements
  • Pathchar: bandwidth, queue size, packet drop rate
  • Traceroute: RTT, topology

Page 54: IIT BOMBAY NETWORK MEASUREMENTS

RELATED WORK CNTD
• Active Measurement
  • PathChar
    • Measures: bandwidth, queue size, packet drop rate
    • Uses TTL field in IP header
    • Series of probes with varying packet size
    • Neglecting queuing delay, S_error/B and t_processing, reduces to RTT = S_packet/B
    • Packet loss: number of error messages received
    • Statistic for node n = statistic till nth node - statistic till (n-1)th node
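The differencing step can be sketched as follows. With RTT = S_packet/B, the slope of RTT against packet size up to hop n is the sum of 1/B over links 1..n, so subtracting the slope up to hop n-1 isolates link n. This is a simplified illustration and not pathchar's actual implementation; the function names and synthetic probes are hypothetical.

```python
def slope(points):
    """Least-squares slope of RTT (seconds) against packet size (bytes)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

def per_link_bandwidth(probes_by_hop):
    """probes_by_hop[i] holds (packet_size_bytes, min_rtt_seconds) pairs for
    probes whose TTL expires at hop i+1. Returns bytes/second per link."""
    bandwidths, previous = [], 0.0
    for points in probes_by_hop:
        cumulative = slope(points)                        # statistic up to this hop
        bandwidths.append(1.0 / (cumulative - previous))  # this link only
        previous = cumulative
    return bandwidths

# Synthetic probes: link 1 at 1e6 B/s, link 2 at 5e5 B/s, plus fixed latency.
sizes = [100, 500, 1000, 1500]
probes = [
    [(s, s / 1e6 + 0.001) for s in sizes],
    [(s, s / 1e6 + s / 5e5 + 0.002) for s in sizes],
]
bandwidths = per_link_bandwidth(probes)
print(bandwidths)
```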

Page 55: IIT BOMBAY NETWORK MEASUREMENTS

APPLICATION LAYER FAILURES
• Web Access Failures
  • Service Unavailable
  • Connection timed out failure
  • Connection refused
  • Connection reset
  • Gateway Timeout
  • No data received
  • Connection closed at an intermediate byte
• DNS Access Failures
  • Connection Timed Out
  • Blank answer field
• Router Access Failures
  • No Route To Host
  • No response received

Page 56: IIT BOMBAY NETWORK MEASUREMENTS

IMPLEMENTATION
• Client Module
  • Snoop on incoming packets using jpcap library
  • Node reachability
    • If any packets received -> subnet switch reachable
    • If IP packets from other subnet received -> subnet router reachable
    • If IP packets from DNS server received -> DNS server reachable
    • If IP packets from netmon received -> netmon reachable
  • Traffic characteristics
    • Size of packet
    • Delay using inter-arrival time of two packets
  • Threads to synchronize measurements and information sending
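The passive reachability rules above amount to a few set-membership checks on the source addresses of sniffed packets. The sketch below is in Python rather than the Java/jpcap used by the tool, and the function name, subnet mask and proxy address are hypothetical values chosen for illustration.

```python
import ipaddress

def reachability(observed_sources, own_subnet, dns_ip, proxy_ip):
    """Apply the passive rules above to the source IPs of sniffed packets."""
    subnet = ipaddress.ip_network(own_subnet)
    sources = {ipaddress.ip_address(s) for s in observed_sources}
    return {
        "switch": bool(sources),                          # any packet at all
        "router": any(s not in subnet for s in sources),  # packet from another subnet
        "dns": ipaddress.ip_address(dns_ip) in sources,
        "proxy": ipaddress.ip_address(proxy_ip) in sources,
    }

# Hypothetical capture window: one local packet and one reply from the DNS server.
status = reachability(
    observed_sources=["10.129.41.7", "10.200.1.11"],
    own_subnet="10.129.0.0/16",   # assumed mask, not stated in the slides
    dns_ip="10.200.1.11",
    proxy_ip="10.200.1.20",       # hypothetical address for netmon
)
print(status)
```

Absence of packets within a window only suggests unreachability, so in practice such a check would be combined with the active probes of the server module.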

Page 57: IIT BOMBAY NETWORK MEASUREMENTS

IMPLEMENTATION CNTD
• Server Module
  • Check plugged in ethernet cable: status of interface using ifconfig
  • Status of switch/router: hop-limited IP packets, using traceroute
  • Status of DNS server: query to the DNS server using dig
  • Status of proxy: an HTTP GET request at port 80 using wget
  • Web download: using wget
  • Using Java runtime library to run these utilities
  • Synchronize using SNTP protocol: implemented in Java

Page 58: IIT BOMBAY NETWORK MEASUREMENTS

IMPLEMENTATION CNTD
• Communication Module
  • Used for sending logs and querying diagnostic-nodes
  • Implemented using Java Net package
    • Receiver listens on a port
    • Sender connects and sends the logs/query
    • Our protocol to send and receive messages
• Logging Module
  • Used by diagnostic, server and client nodes
  • Stores log in directory hierarchy: ip/yyyy/mm/dd
  • Unsent logs stored to be sent in future
  • New threads are created to make logs -> prevent blocking
  • Implemented using Java threads and Java IO package
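The ip/yyyy/mm/dd layout can be illustrated with a small helper. This is a Python sketch rather than the Java used by the tool; the `log_directory` name, the `logs` root and the zero-padding of month and day are assumptions.

```python
from datetime import date
from pathlib import PurePosixPath

def log_directory(root, node_ip, day):
    """Build the ip/yyyy/mm/dd directory for a node's logs on a given day."""
    return (PurePosixPath(root) / node_ip
            / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}")

# Example with a node address from the measurement setup and an arbitrary date.
path = log_directory("logs", "10.129.1.1", date(2009, 5, 14))
print(path)
```

Keying the hierarchy by node and date means the offline mode can select a time period by listing directories instead of scanning every log.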

Page 59: IIT BOMBAY NETWORK MEASUREMENTS

IMPLEMENTATION CNTD
• Diagnostic Module
  • Uses the logs of server and client nodes
  • Continuous mode
    • Analyzes statistics every 10 minutes
    • Statistics generated: node outages, percentage status distribution, last uptime status of nodes, DNS service time statistics
  • Offline mode
    • User specifies the start and end time of measurement
    • Statistics generated: node outages, percentage status distribution, last uptime status of nodes, DNS service time statistics, node status at given time
  • Remote query mode
    • User can query about node status at given time

Page 60: IIT BOMBAY NETWORK MEASUREMENTS

PERCENTAGE DOWN-TIME ON A DAY: DNS SERVERS

• On most days percentage downtime is < 1 % for all servers
• No pattern in down-time across days

Page 61: IIT BOMBAY NETWORK MEASUREMENTS

PERCENTAGE DOWN-TIME ON A DAY: NETMON

• On most days percentage downtime is < 0.2 %
• No pattern in down-time across days

Page 62: IIT BOMBAY NETWORK MEASUREMENTS

PERCENTAGE DOWN-TIME ON A DAY: HOSTEL 8 ROUTER

• On most days percentage downtime is < 2 %
• No pattern in down-time across days

Page 63: IIT BOMBAY NETWORK MEASUREMENTS

PERCENTAGE DOWN-TIME ON A DAY: CC ROUTER

• On most days percentage downtime is < 1 %
• No pattern in down-time across days

Page 64: IIT BOMBAY NETWORK MEASUREMENTS

OUTAGE LENGTH DISTRIBUTION: CC ROUTER

• Most of the outages are of smaller length

[Figure: outage length distribution for CC router; plot annotations: 2, 6, 132, 133]

Page 65: IIT BOMBAY NETWORK MEASUREMENTS

OUTAGE LENGTH DISTRIBUTION: DNS

• Most of the outages are of small length
• Smaller number of outages

[Figure: outage length distribution for DNS; plot annotations: 2764, 1]

Page 66: IIT BOMBAY NETWORK MEASUREMENTS

OUTAGE LENGTH DISTRIBUTION: NETMON

• Most of the outages are of length < 3 min

[Figure: outage length distribution for netmon; plot annotation: 3]

Page 67: IIT BOMBAY NETWORK MEASUREMENTS

OUTAGE LENGTH DISTRIBUTION: HOSTEL 8 ROUTER

• Most of the outages are of smaller length

[Figure: outage length distribution for Hostel 8 router; plot annotations: 2, 4, 7]

Page 68: IIT BOMBAY NETWORK MEASUREMENTS

OUTAGE LENGTHS

Page 69: IIT BOMBAY NETWORK MEASUREMENTS

STATUS DISTRIBUTION

Page 70: IIT BOMBAY NETWORK MEASUREMENTS

PROXY RESPONSE TIME VS USER RESPONSE TIME

Page 71: IIT BOMBAY NETWORK MEASUREMENTS

PROXY RESPONSE TIME VS USER RESPONSE TIME

Page 72: IIT BOMBAY NETWORK MEASUREMENTS

PROXY RESPONSE TIME VS USER RESPONSE TIME

Page 73: IIT BOMBAY NETWORK MEASUREMENTS

EXPERIMENT: PROXY FAILURE
• Setup:
  • wget to fetch berkeley and netmon (http://netmon.iitb.ac.in)
  • Repeatedly performed at every 6 minute interval
  • From 2:42 on 22nd September to 1:06 on 25th September, from kresit (10.129.41.189)
  • 400 bad request response, denoted by 1, indicates proxy is up
  • -1 for connection refused error
  • -3 for 503 server error
• Result
  • netmon: 0.7 % connection refused error
  • berkeley: 8.7 % connection refused error, 0.28 % 503 error
  • Intersection of failure implies
    • Machine not running, or
    • Port is closed
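The numeric coding of probe outcomes and the resulting percentages could be tallied as in the sketch below. The outcome strings, function names and sample counts are hypothetical; only the codes 1, -1 and -3 come from the experiment description.

```python
from collections import Counter

# Status codes used in the experiment above.
PROXY_UP, CONNECTION_REFUSED, SERVER_ERROR_503 = 1, -1, -3

def code_for(outcome):
    """Map a probe outcome to the numeric code. The outcome strings are
    hypothetical labels chosen for this sketch."""
    return {
        "400_bad_request": PROXY_UP,          # proxy answered, so it is up
        "connection_refused": CONNECTION_REFUSED,
        "503": SERVER_ERROR_503,
    }[outcome]

def percentages(codes):
    """Percentage of probes that ended in each status code."""
    counts = Counter(codes)
    return {code: 100.0 * n / len(codes) for code, n in counts.items()}

# Invented sample: 18 successful probes and 2 refused connections.
outcomes = ["400_bad_request"] * 18 + ["connection_refused"] * 2
result = percentages([code_for(o) for o in outcomes])
print(result)
```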

Page 74: IIT BOMBAY NETWORK MEASUREMENTS

EXPERIMENT: PROXY FAILURE CNTD

Page 75: IIT BOMBAY NETWORK MEASUREMENTS

EXPERIMENT: DNS FAILURE
• Setup
  • dig to send back-to-back probes to dns.iitb.ac.in
  • Periodically sent once every 2 minutes
  • Conducted from 22:06 on 17th September to 13:36 on 18th September, from kresit (10.129.41.189)
  • One query for internal domain and other for external
  • Both the domains randomly generated
  • 1 -> answer field present, 0 -> answer field not present
• Result
  • External queries failed 2.36 % of time
  • Internal queries never failed

Page 76: IIT BOMBAY NETWORK MEASUREMENTS

EXPERIMENT: DNS FAILURE CNTD