Network-Aware Clustering of Web Clients
Advanced IP Topics Seminar, Fall 2000
Supervisor: Anat Bremler
Speaker: Zotenko Elena
Paper
• presentation is based on:
– Balachander Krishnamurthy and Jia Wang, “On Network-Aware Clustering Of Web Clients”, Proc. of ACM SIGCOMM 2000
– Balachander Krishnamurthy and Jia Wang, “On Network-Aware Clustering Of Web Clients”, Technical Report 000101-01-TM, AT&T Labs-Research, January 2000
Agenda
• problem definition
• simple approach for the problem solution
• network-aware approach for the problem solution using information from BGP routing tables
• applications
[Figure: example client IP addresses 9.3.5.110, 9.3.5.111, 12.2.94.30, 12.2.95.30 and 12.2.95.33]
Problem Definition
• definition of clustering in our case:
– a partitioning of a set of IP addresses into non-overlapping groups, such that all IP addresses in a group are topologically close and under common administrative control
net A: 9.3.5.0/255.255.255.0
net B: 12.2.94.0/255.255.254.0
Simple Approach
• assumes that the 24 most significant bits of each IP address identify the network
• groups IP addresses by the network portion of the IP address
• drawback: the assumption is not always correct because of CIDR:
– aggregation
– sub-netting
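The simple approach can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `simple_clusters` is hypothetical, and the sample addresses are the ones shown with net A and net B earlier.

```python
from collections import defaultdict
import ipaddress


def simple_clusters(addresses):
    """Group IPv4 addresses by their first 24 bits (the assumed network part)."""
    clusters = defaultdict(list)
    for addr in addresses:
        # strict=False accepts a host address and returns its enclosing /24.
        network = ipaddress.ip_network(f"{addr}/24", strict=False)
        clusters[str(network)].append(addr)
    return dict(clusters)


clients = ["9.3.5.110", "9.3.5.111", "12.2.94.30", "12.2.95.30", "12.2.95.33"]
result = simple_clusters(clients)
for prefix, members in result.items():
    print(prefix, members)
```

With these inputs the two net A clients land in one cluster, but the three net B clients are split between 12.2.94.0/24 and 12.2.95.0/24, which is the "one network spans several clusters" failure.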
Simple Approach
• clusters identified correctly
• misidentified clusters: one cluster contains several networks
• misidentified clusters: one network spans several clusters
[Figure: network prefix distribution for a BGP routing table snapshot from the MAE-West NAP]
NAA Overview
• NAA (the network-aware approach) identifies networks based on:
– BGP routing table snapshots
– IP dump files
• includes validation and adaptation stages
NAA Overview
stages: prefix table → clustering → validation → self-correction and adaptation
prefix table stage:
• input: network prefixes from:
– BGP routing table snapshots
– IP dump files from ARIN and NLANR
• output: prefix table, which contains all prefixes in one format
NAA Overview
clustering stage:
• input:
– prefix table
– IP addresses for clustering
• output: raw clusters
– each network prefix represents a cluster
– each IP address is put into the cluster with the longest matching prefix
NAA Overview
clustering stage, example:
• the prefix table contains:
– prefix A: 172.30.0.0/255.255.0.0
– prefix B: 172.30.110.0/255.255.255.0
• 172.30.110.25 is assigned to the cluster represented by prefix B
• 172.30.115.25 is assigned to the cluster represented by prefix A
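A minimal sketch of the longest-match rule, using Python's standard `ipaddress` module. The function name `longest_match` and the sample host addresses are illustrative assumptions (the last octets were chosen to be valid), and the linear scan stands in for whatever lookup structure a real implementation would use.

```python
import ipaddress


def longest_match(address, prefixes):
    """Return the most specific prefix containing address, or None."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)  # accepts the address/netmask form
        if ip in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best


prefix_table = [
    "172.30.0.0/255.255.0.0",      # prefix A
    "172.30.110.0/255.255.255.0",  # prefix B
]

for host in ["172.30.110.25", "172.30.115.25", "10.0.0.1"]:
    print(host, "->", longest_match(host, prefix_table))
```

An address that matches no prefix returns `None`; such clients would remain unclustered in this sketch.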
NAA Overview
validation stage:
• input: raw clusters
• estimates the goodness of the raw clusters by cross-checking a small sample (1% of clusters)
NAA Overview
validation stage, goodness of clusters:
• clusters are too big => a cluster includes IP addresses from different “networks”
• clusters are too small => several clusters include IP addresses from the same “network”
• “network”: a group of IP addresses that are topologically close and under common administrative control
NAA Overview
self-correction and adaptation stage:
• dynamically changes the clustering according to changes in network topology
NAA Building Prefix Table
ARIN (American Registry for Internet Numbers) IP dump file:
• contains IP addresses registered with ARIN
• on one hand, may contain addresses of non-existent networks, and is thus much larger than any BGP snapshot
• on the other hand, may contain a single entry that covers several networks
• only 1% of clients are clustered based on IP dump files
BGP snapshot taken from the AADS NAP:
• publicly available via www.merit.edu/ipma/routing_table
• much smaller than the IP dump files
• contains networks that physically exist and are reachable
NAA Validation
• cross-check of the clustering based on host names or, if names are unavailable, based on paths
• based on the assumption that hosts in the same network:
– share the same non-trivial suffix in their names
– share the same last few hops on the paths toward them
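The name-based check can be illustrated as follows. How many trailing labels make a suffix "non-trivial" is not stated on the slides, so the default of two labels is an assumption, as are the function names and the `example.*` host names.

```python
def name_suffix(hostname, labels=2):
    """Return the last `labels` dot-separated labels of a host name."""
    parts = hostname.lower().rstrip(".").split(".")
    return ".".join(parts[-labels:])


def shares_suffix(hostnames, labels=2):
    """True if all host names end in the same suffix (assumed non-trivial)."""
    return len({name_suffix(h, labels) for h in hostnames}) <= 1


same_network = ["a.cs.example.edu", "b.ee.example.edu"]
mixed = ["a.cs.example.edu", "c.example.org"]
print(shares_suffix(same_network))  # both names end in example.edu
print(shares_suffix(mixed))         # suffixes differ
```

The path-based check would have the same shape, comparing the last few traceroute hops instead of name suffixes.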
NAA Validation
• why names can be unavailable (50% of clients):
– the host is behind a firewall
– the local network acquires dynamic IP addresses via a DHCP server
– the ISP has not registered any names for its customers
NAA Validation
• validation procedure:
– sample 1% of clusters
– for each sampled cluster, use a modified traceroute to resolve the host name, or the last few hops toward the host, for each IP address in the cluster
– if a cluster contains hosts from several networks, declare the cluster misidentified
– if several clusters contain hosts from the same network, declare those clusters misidentified
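A rough sketch of this procedure, assuming the traceroute step is hidden behind a hypothetical callable `network_of(ip)` that returns a network label (a name suffix or last-hop signature) or `None` when nothing can be resolved. It flags clusters that mix several networks and groups of clusters that split one network, the "too big" and "too small" cases.

```python
import random
from collections import defaultdict


def validate(clusters, network_of, sample_fraction=0.01, seed=0):
    """Sample clusters and return (sampled ids, misidentified ids).

    clusters: dict mapping cluster id -> list of IP addresses.
    network_of: hypothetical stand-in for the modified traceroute.
    """
    rng = random.Random(seed)
    k = min(len(clusters), max(1, round(len(clusters) * sample_fraction)))
    sampled = rng.sample(sorted(clusters), k)

    networks_in = {}                # cluster id -> network labels seen in it
    clusters_of = defaultdict(set)  # network label -> clusters containing it
    for cid in sampled:
        labels = {network_of(ip) for ip in clusters[cid]} - {None}
        networks_in[cid] = labels
        for label in labels:
            clusters_of[label].add(cid)

    misidentified = set()
    for cid, labels in networks_in.items():
        if len(labels) > 1:         # too big: several networks in one cluster
            misidentified.add(cid)
    for cids in clusters_of.values():
        if len(cids) > 1:           # too small: one network in several clusters
            misidentified |= cids
    return sampled, misidentified


demo_clusters = {"A": ["ip1", "ip2"], "B": ["ip3"], "C": ["ip4"], "D": ["ip5"]}
demo_labels = {"ip1": "x", "ip2": "y", "ip3": "z", "ip4": "z", "ip5": "w"}
# sample_fraction=1.0 checks every cluster so the demo is deterministic.
sampled, bad = validate(demo_clusters, demo_labels.get, sample_fraction=1.0)
print(sorted(bad))
```

In the demo, cluster A mixes networks x and y, while B and C both hold hosts of network z, so A, B and C are flagged and D is not.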
NAA Validation
• about 10% of clusters are misidentified
• one reason for misidentification is the existence of national gateways (e.g. France, Japan): information about the networks behind these gateways is unavailable in routing tables
• about half of the sampled clients have resolvable names
BGP Dynamics And NAA
• BGP routing tables change dynamically due to changes in network topology
• NAA clusters clients based on BGP table snapshots, which may not reflect the current network topology, or the topology at the time the client IP address was logged
BGP Dynamics And NAA
• to find out how BGP dynamics affect NAA clustering:
– download BGP snapshots daily over a period of n days
– denote by S[i] the set of prefixes downloaded during day i
– define the maximum effect as the size of the set of prefixes that change during the entire testing period: $\left| \bigcup_{i=1}^{n} S[i] \setminus \bigcap_{i=1}^{n} S[i] \right|$
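The maximum effect can be computed directly from the daily sets. The sketch below reads "prefixes that change" as prefixes present on some days but not on all of them, which is an interpretation consistent with the 3-day example; the function name and the sample prefixes are made up for illustration.

```python
def maximum_effect(daily_prefix_sets):
    """Count prefixes that appear in some daily snapshot but not in all."""
    if not daily_prefix_sets:
        return 0
    union = set().union(*daily_prefix_sets)
    common = set(daily_prefix_sets[0]).intersection(*daily_prefix_sets[1:])
    return len(union - common)


# Hypothetical snapshots for a 3-day testing period.
s1 = {"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"}
s2 = {"10.0.0.0/8", "172.16.0.0/12", "198.51.100.0/24"}
s3 = {"10.0.0.0/8", "192.168.0.0/16", "198.51.100.0/24"}
print(maximum_effect([s1, s2, s3]))
```

Only 10.0.0.0/8 is present on all three days, so the other three prefixes count toward the maximum effect.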
BGP Dynamics And NAA
• example: testing period of 3 days
[Figure: sets S[1], S[2] and S[3]]
– set of prefixes that are unchanged during the entire 3-day testing period
– set of prefixes that changed during the 3-day testing period; the size of this set is the maximum effect
BGP Dynamics And NAA
[Figure labels:]
• number of prefixes in the AADS BGP snapshot on the 4th day of a 14-day testing period
• maximum effect observed so far, over 4 days
• number of prefixes from the AADS BGP snapshot used to identify clusters from the Apache server log
• maximum effect for the prefixes used to identify Apache server log clusters, observed so far
• the maximum effect is about 121/3929 = 3% of all client clusters
NAA Adaptation
• although empirical results show that only about 3% of client clusters are affected by changing network topology, an adaptation step can be employed to improve NAA applicability
• run periodic traceroute on sampled clusters
• using the traceroute results:
– merge clusters that span the same network into one cluster
– divide a cluster that includes several networks into several clusters
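One way to picture the merge and divide steps is to regroup the sampled clients by the network observed for each of them. This is a sketch under assumptions rather than the authors' procedure: `network_of` is a hypothetical stand-in for the traceroute result, and clients with no result simply stay with their original cluster.

```python
from collections import defaultdict


def adapt(clusters, network_of):
    """Regroup clients by observed network label.

    Clients from different clusters with the same label are merged;
    a cluster whose clients carry different labels is divided.
    """
    regrouped = defaultdict(list)
    for cid, ips in clusters.items():
        for ip in ips:
            label = network_of(ip)
            # Unresolved clients keep their original cluster as the key.
            key = label if label is not None else ("unresolved", cid)
            regrouped[key].append(ip)
    return dict(regrouped)


old = {"p1": ["a", "b"], "p2": ["c"], "p3": ["d", "e"]}
observed = {"a": "n1", "b": "n2", "c": "n3", "d": "n3"}  # "e" is unresolved
new = adapt(old, observed.get)
print(new)
```

Here cluster p1 is divided into n1 and n2, while c and d from p2 and p3 are merged into n3.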
Applications
• positioning of a Web caching proxy
• caching proxy:
– acts as a server for clients
– acts as a client for the original server
– caches frequently accessed resources
Applications
[Figure: client request distribution of a client cluster containing a spider; the spider issues 99.79% of all requests in the cluster]
Applications
[Figure: number of requests over time for the entire server log, for a cluster containing a proxy, and for a cluster containing a spider]
Applications
• identify busy clusters based on metrics such as the number of clients and the number of requests issued
• place a caching proxy in front of each busy cluster
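As an illustration of picking busy clusters by request count, the sketch below takes the largest clusters until they cover a given share of all requests. The 70% share, the function name and the sample counts are assumptions made for the example, not values from the slides.

```python
def busy_clusters(requests_per_cluster, share=0.7):
    """Return the largest clusters that together cover `share` of all requests."""
    total = sum(requests_per_cluster.values())
    busy, covered = [], 0
    ranked = sorted(requests_per_cluster.items(), key=lambda kv: kv[1], reverse=True)
    for cid, count in ranked:
        if covered >= share * total:
            break
        busy.append(cid)
        covered += count
    return busy


# Hypothetical request counts per client cluster.
counts = {"A": 500, "B": 300, "C": 150, "D": 50}
proxies_for = busy_clusters(counts)
print(proxies_for)
```

With these counts, clusters A and B together issue 800 of 1000 requests, so they are the ones that would get a proxy. The number of clients per cluster could be used as the ranking metric in the same way.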