1
Caching Characteristics of Internet and Intranet Web Proxy Traces
Arthur Goldberg, Ilya Pevzner, Robert Buff
Courant Institute of Mathematical Sciences, New York University
2
Clients, Servers and Proxy
[Diagram: clients C1, C2, ..., CM connect through proxy P to servers S1, S2, ..., SN]
Clients Ci are configured to use caching proxy P to access N servers Si. A server may use a proxy itself.
3
HTTP Through a Proxy
[Diagram: request flow among Browser, Proxy and Server, showing the Miss and Hit cases]
4
Potential Web Caching Benefits
• Reduce response time by delivering the document from a closer and/or less loaded server than the origin server
• Save bandwidth costs between proxy and origin server
5
Goals
• Study large internet and intranet traces
• Evaluate caching opportunities and problems
• Examine cache size needs and document residence times
6
Part 1
Proxy trace sources and proxy configurations
7
Data Sources
8
ISP Usage
• 450,000 users
• Load
  – Peak
    • 500 unique clients
    • 30 requests per second
  – Average
    • 1M requests per day
9
ISP hardware details
• IBM RS/6000 system
• 256 MB RAM
• Three 4 GB disks
10
ISP proxy configuration details
• 8 proxies nationwide
• Netscape 2.5 proxy
• 5.5 GB cache size
• Netscape extended-2 log format
• Parameters
  – max-uncheck: 6 hours
  – lm-factor: 0.1
  – term-percent: 80%
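The three parameters above can be read as a freshness heuristic plus an abort threshold. The sketch below is an illustration under stated assumptions, not the Netscape implementation. It assumes that `lm-factor` scales the document's age at fetch time into an estimated freshness period, that `max-uncheck` caps the time between up-to-date checks, and that `term-percent` is the fraction of a transfer after which the proxy finishes a download the client has aborted. The function and argument names are hypothetical.

```python
def needs_revalidation(seconds_since_check: float,
                       age_at_fetch_seconds: float,
                       max_uncheck_seconds: float = 6 * 3600,
                       lm_factor: float = 0.1) -> bool:
    """Return True if the cached copy should be checked with the origin server.

    Assumed heuristic: a document last modified `age_at_fetch_seconds` before
    it was fetched is presumed fresh for lm_factor times that age, but never
    for longer than max_uncheck_seconds.
    """
    estimated_fresh = lm_factor * age_at_fetch_seconds
    fresh_for = min(estimated_fresh, max_uncheck_seconds)
    return seconds_since_check > fresh_for


def keep_fetching_after_abort(bytes_received: int,
                              content_length: int,
                              term_percent: float = 80.0) -> bool:
    """Return True if an aborted transfer should still be completed (assumed rule)."""
    if content_length <= 0:
        return False
    return 100.0 * bytes_received / content_length >= term_percent
```

Under these assumptions, a document that was 10 hours old when fetched is presumed fresh for 1 hour, while a document that was 100 days old is capped at the 6-hour `max-uncheck` limit.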
11
Intranet Usage
• 8,000 employees
• Load
  – Peak
    • Varies
  – Average
    • 500K requests per day, over 10 hours
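To compare the two loads, the daily averages can be converted to requests per second. This is a rough calculation that assumes the ISP's 1M daily requests are spread over a full 24 hours, which the slides do not state:

$$
\text{ISP average} \approx \frac{10^{6}}{24 \times 3600\ \text{s}} \approx 11.6\ \text{requests/s},
\qquad
\text{Intranet average} \approx \frac{5 \times 10^{5}}{10 \times 3600\ \text{s}} \approx 13.9\ \text{requests/s}
$$

Under that assumption, the ISP peak of 30 requests per second is roughly 2.6 times its average rate.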
12
Intranet hardware details
• Sun Microsystems Ultra 1 server
• 1 GB RAM
• Seven 4 GB disks
13
Intranet proxy configuration details
• 2 proxies
• Squid 1.1.21 proxy
• 12 GB disk cache size
• 750 MB memory cache size
• Extended log format
14
Part 2
Analysis of ISP and Intranet traces assuming unlimited cache storage
15
Key Cache Metrics
• Hit Ratio (HR)
• Fractional Bandwidth Savings (BT)

$$\mathrm{HR} = \frac{\text{number of documents served from the cache}}{\text{number of documents served}}$$

$$\mathrm{BT} = \frac{\text{number of bytes served with documents in the cache}}{\text{number of bytes served}}$$
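As a minimal sketch of how these two metrics might be computed from a trace under the unlimited-cache assumption: the trace representation (a sequence of `(url, size_bytes)` pairs) and the function name `hit_ratio_and_bt` are hypothetical, and the sketch ignores document modification and cachability rules.

```python
from typing import Iterable, Tuple


def hit_ratio_and_bt(trace: Iterable[Tuple[str, int]]) -> Tuple[float, float]:
    """Return (HR, BT) for an unlimited cache that never expires documents."""
    seen = set()              # URLs already stored in the cache
    docs = hits = 0           # documents served, documents served from cache
    total_bytes = hit_bytes = 0
    for url, size in trace:
        docs += 1
        total_bytes += size
        if url in seen:
            hits += 1
            hit_bytes += size
        else:
            seen.add(url)     # first reference is always a miss
    if docs == 0:
        return 0.0, 0.0
    hr = hits / docs
    bt = hit_bytes / total_bytes if total_bytes else 0.0
    return hr, bt


# Example: "/a" is requested three times, "/b" once.
example_trace = [("/a", 100), ("/b", 300), ("/a", 100), ("/a", 100)]
hr, bt = hit_ratio_and_bt(example_trace)
```

In this example two of the four requests are hits, and 200 of the 600 bytes served come from the cache.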
16
Analyzing Caching Properties

Documents cached                   | Analysis name: hit rate | Analysis name: bandwidth savings
All                                | HR(ALL)                 | BT(ALL)
Cachable as per HTTP specification | HR(-RFC)                | BT(-RFC)
Cached by operating proxy          | HR(-ACTUAL)             | BT(-ACTUAL)
17
ISP documents that cannot be cached, as per HTTP specification

Reason for non-cacheability | % of entries in ISP trace
Expires                     | 3.2%
Cache control               | 0.8%
Pragma: no-cache            | 0.4%
Request method              | 0.0%
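The four reasons in the table could be detected from a logged request and response roughly as follows. This is a simplified sketch, not the classifier used for the trace. The function name, the order of the checks, and the exact rules are assumptions: methods other than GET or HEAD are uncachable, as are `no-store`, `no-cache` or `private` in Cache-Control, and an Expires value that is invalid or no later than the Date header.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _parse_http_date(value: str):
    """Parse an HTTP date string; return None if it is missing or malformed."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def non_cachable_reason(method: str,
                        request_headers: Mapping[str, str],
                        response_headers: Mapping[str, str]) -> Optional[str]:
    """Return the table's reason label, or None if the entry looks cachable."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}

    if method.upper() not in ("GET", "HEAD"):
        return "Request method"
    if "no-cache" in req.get("pragma", "").lower():
        return "Pragma: no-cache"
    directives = {d.strip().split("=")[0].lower()
                  for d in resp.get("cache-control", "").split(",")}
    if directives & {"no-store", "no-cache", "private"}:
        return "Cache control"
    if "expires" in resp:
        expires = _parse_http_date(resp["expires"])
        if expires is None:
            return "Expires"          # invalid dates are treated as expired
        date = _parse_http_date(resp.get("date", ""))
        try:
            if date is not None and expires <= date:
                return "Expires"      # already expired when served
        except TypeError:
            pass                      # naive/aware mismatch: leave as cachable
    return None
```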
18
Comment about “cookies”
• For Prodigy, RFC figures assume that the Netscape proxy follows the RFC
• In reality, the Netscape proxy does not cache documents with cookies
• Documents with cookies account for 2% of responses in the Prodigy trace
• It follows that RFC figures for Prodigy may be up to 2% higher than shown
19
ISP Hit Ratio vs. Trace Length
[Figure: hit ratio (y-axis, 0.0 to 0.6) versus trace length (x-axis, 0 to 4.2E+06); curves HR(ALL), HR(-RFC), HR(-ACTUAL)]
20
ISP BT vs. Trace Length
[Figure: fraction of bytes saved (y-axis, 0 to 0.4) versus trace length (x-axis, 0 to 5.1E+06); curves BT(ALL), BT(-RFC), BT(-ACTUAL)]
21
Intranet HR vs. Trace Length
[Figure: "Hit Ratios"; hit ratio (y-axis, 0.0 to 1.0) versus trace length (x-axis, 0 to 1.6E+06); curves HR(ALL), HR(-RFC), HR(-ACTUAL)]
22
Intranet BT vs. Trace Length
[Figure: "Fractions of Bytes Transferred Saved by Caching"; fraction (y-axis, 0 to 0.8) versus trace length (x-axis, 0 to 1.6E+06); curves BT(ALL), BT(-RFC), BT(-ACTUAL)]
23
Part 3
Analysis of ISP trace with finite cache sizes
24
Prophetic Cache Replacement Algorithm
• A Prophetic cache stores exactly the set of documents that will be referenced in the future
• An on-line prophetic cache algorithm cannot be built
• However, given a trace, prophetic caching decisions can be determined off-line
25
Prophetic Cache Replacement Algorithm (continued)
• Cache space used by a prophetic cache is the minimum size needed to avoid cache misses
  – notes:
    • true for any maximum residence time
    • analyses make cyclical traces
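The off-line decisions described above can be sketched as follows, under several assumptions that are not taken from the slides: the trace is a time-ordered list of `(time_hours, url, size_bytes)` tuples, a document is kept only if its next reference arrives within the maximum residence time, document sizes do not change, and the trace is treated as non-cyclical for simplicity. The function name `prophetic_cache` is hypothetical.

```python
from typing import Dict, List, Tuple


def prophetic_cache(trace: List[Tuple[float, str, int]],
                    max_residence: float) -> Tuple[float, int]:
    """Return (hit rate, peak bytes held) for an off-line prophetic cache."""
    if not trace:
        return 0.0, 0
    last_seen: Dict[str, float] = {}
    events: List[Tuple[float, int]] = []   # (time, change in bytes held)
    hits = 0
    for t, url, size in trace:
        prev = last_seen.get(url)
        if prev is not None and t - prev <= max_residence:
            # The copy stored at `prev` is re-referenced in time: a hit.
            hits += 1
            events.append((prev, size))    # space taken from prev ...
            events.append((t, -size))      # ... and released at t
        last_seen[url] = t
    # Sweep over the events; at equal times releases (negative) sort first,
    # so a document that is hit and kept again is not counted twice.
    held = peak = 0
    for _, delta in sorted(events):
        held += delta
        peak = max(peak, held)
    return hits / len(trace), peak


example = [(0, "a", 10), (1, "b", 20), (2, "a", 10), (30, "b", 20), (31, "a", 10)]
hit_rate_24h, peak_24h = prophetic_cache(example, max_residence=24)
hit_rate_48h, peak_48h = prophetic_cache(example, max_residence=48)
```

Running the same function over a range of `max_residence` values gives a curve of maximum hit rate against residence time, together with the cache space each setting would need.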
26
Maximum Hit Rate as a function of residence time
[Figure: hit rate (y-axis, 0.1 to 0.6) versus residence time in hours (x-axis, 20 to 140)]
27
Maximum Hit Rate as a function of residence time, by document size
[Figure: hit rate (y-axis, 0.1 to 0.6) versus residence time in hours (x-axis, 20 to 140); one curve per document size class: 1 - 10, 11 - 100, 101 - 1K, 1K - 10K, 10K - 100K, 100K - 1M]
28
Conclusions
• We analyze very long Web proxy traces from an ISP and an intranet
• We propose a new method to evaluate a proxy by comparing the actual hit rate with the potential hit rate
• We show that it is important to keep the cache residence time above one day
29
Addresses
• E-mail: {artg,pevzner,buff}@cs.nyu.edu
• WWW: www.cs.nyu.edu/{artg,pevzner,buff}
• Paper and presentation are available at www.cs.nyu.edu/artg