Meta algorithms for Hierarchical Web Caches
Nikolaos Laoutaris, Sofia Syntila
Ioannis Stavrakakis
Department of Informatics and Telecommunications, University of Athens
15784 Athens, Greece
{laoutaris,grad0585,ioannis}@di.uoa.gr
Introduction
The rapid growth of the Internet and the WWW has increased network traffic, user-perceived latency, and the load on web servers.
Caching has been employed in order to reduce access latency, reduce bandwidth consumption, balance server load, and improve data availability.
Contemporary hierarchical caches
A defining characteristic of contemporary hierarchical caches is Leave Copy Everywhere (LCE): a hit for a document at a level-l cache leads to the caching of the document in all intermediate caches on the path towards the leaf cache that received the initial request.
[Figure: a 3-level binary cache tree rooted at cache (3,1), with leaf caches (1,1)-(1,4); the client request misses at (1,1) and (2,1), hits at (3,1), and LCE leaves a copy at (2,1) and at (1,1) on the way back down.]
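The LCE behavior can be sketched in a few lines of Python. This is a hypothetical toy model (not the simulator behind the paper's results): caches on the path from the leaf to the root are searched in order, and a copy is inserted into every cache below the hit point. `LRUCache` and `lce_request` are illustrative names.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity LRU cache (the replacement policy used on the slides)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def __contains__(self, doc):
        return doc in self.store

    def touch(self, doc):
        self.store.move_to_end(doc)  # mark as most recently used

    def insert(self, doc):
        if doc in self.store:
            self.store.move_to_end(doc)
        else:
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict the LRU document
            self.store[doc] = True

def lce_request(path, doc):
    """LCE: search the caches on the path (leaf first, root last) and leave
    a copy in every cache below the hit point. Returns the hit level
    (0 = leaf), or len(path) if the document came from the origin server."""
    for level, cache in enumerate(path):
        if doc in cache:
            cache.touch(doc)
            hit_level = level
            break
    else:
        hit_level = len(path)  # served by the origin server
    for cache in path[:hit_level]:
        cache.insert(doc)  # LCE: copy into every cache below the hit
    return hit_level
```

After one request, all caches on the path hold the document, so the next request for it hits at the leaf.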
New approach
We introduce three new meta algorithms that revise the standard behavior of hierarchical caches by:
operating before, and independently of, the actual replacement algorithm running in each individual cache (hence the "meta")
keeping copies in a subset of the intermediate caches instead of all of them
We compare these algorithms against the de facto one (LCE) and the one proposed by Che, Tung and Wang (JSAC, Sep. 2002).
Additionally, we introduce a simple load balancing algorithm based on the concept of meta algorithms.
Advantages of the new algorithms
Significant reduction of the average hit distance (delay/traffic reduction gain) over LCE in most cases
Suitable for storage-constrained applications
Low complexity
Memoryless: they do not require additional information (e.g., object request frequencies)
Little or no change to the protocols used to implement existing hierarchical caches
The Prob algorithm
Each intermediate cache keeps a copy with probability p, and does not keep a copy with probability 1-p
[Figure: the same tree; the request misses at (1,1) and (2,1) and hits at (3,1); Prob leaves a copy at (2,1) and at (1,1), each independently with probability p.]
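The Prob placement step could be sketched as follows. Caches are modelled as plain sets, since the meta algorithm operates independently of the replacement policy; `prob_place` and the injectable `rng` parameter are our own illustrative choices.

```python
import random

def prob_place(path_caches, doc, hit_level, p, rng=random.random):
    """Prob placement: after a hit at `hit_level` (leaf = 0), each cache
    below the hit independently keeps a copy with probability p and skips
    it with probability 1 - p. Caches are plain sets; replacement is
    handled separately by each cache's own policy."""
    for cache in path_caches[:hit_level]:
        if rng() < p:
            cache.add(doc)
```

Passing a deterministic `rng` makes the behavior testable: always-low values copy everywhere (like LCE), always-high values copy nowhere.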
The LCD algorithm
Leave a copy only at the cache that resides immediately below the location of the hit on the path to the requesting client.
Requires multiple requests to bring document to a leaf cache
[Figure: the same tree; the request misses at (1,1) and (2,1) and hits at (3,1); LCD leaves a single copy, at (2,1) only.]
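The LCD placement step is even simpler to sketch (same set-based toy model; `lcd_place` is an illustrative name): a single copy goes to the cache immediately below the hit, so a document descends one level per request.

```python
def lcd_place(path_caches, doc, hit_level):
    """LCD placement: leave one copy in the cache immediately below the
    hit point (leaf = index 0; hit_level == len(path_caches) means the
    document was served by the origin server)."""
    if hit_level > 0:
        path_caches[hit_level - 1].add(doc)
```

Repeated requests illustrate the slide's point that multiple requests are needed to bring a document down to a leaf cache.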
The MCD algorithm
Similar to LCD, with the difference that a hit at level l moves the requested document to the underlying cache (whereas LCD copies it): the requested document is deleted from the cache where the hit occurred.
[Figure: the same tree; the request misses at (1,1) and (2,1) and hits† at (3,1); MCD copies the document to (2,1) and deletes it from (3,1).]
† The document does not have to be physically deleted but rather be marked for eviction
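In the same set-based sketch, MCD differs from LCD only in also removing the document from the cache where the hit occurred (`mcd_place` is an illustrative name; in a real cache the copy would just be marked for eviction, as the footnote notes).

```python
def mcd_place(path_caches, doc, hit_level):
    """MCD placement: move (not copy) the document one level down.
    The copy at the hit cache is removed, mimicking 'mark for eviction'."""
    if hit_level > 0:
        path_caches[hit_level - 1].add(doc)       # place one level below
    if hit_level < len(path_caches):              # hit inside the hierarchy
        path_caches[hit_level].discard(doc)       # delete at the hit cache
```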
The Filter algorithm (Che et al.)
Each cache is seen as a low-pass filter, with a cutoff frequency given by the inverse of its characteristic time.
The characteristic time T_m of cache m is approximated by:
T_m ≈ (current time) − (last access time of the replaced document)
A hit for document i at level l on behalf of client k leads to the caching of i in an intermediate cache m on the path to k when m satisfies the condition:
λ_ki ≥ 1/T_m
where λ_ki is the frequency with which client k requests document i.
Filter is non-memoryless (it requires frequency estimation).
Rarely requested objects are not cached; their requests thus pass the filter and flow to the upper levels.
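The two quantities above can be sketched directly (our reading of the slide's formulas; `characteristic_time` and `filter_admits` are illustrative names):

```python
def characteristic_time(now, last_access_of_evicted):
    """T_m ~ current time minus the last access time of the document
    that was just replaced at cache m."""
    return now - last_access_of_evicted

def filter_admits(lambda_ki, t_m):
    """Che-style filter: admit document i at cache m only if client k's
    request rate for i exceeds the cutoff frequency 1/T_m."""
    return lambda_ki >= 1.0 / t_m
```

A document requested at 0.5 req/s passes a cache with T_m = 10 s (cutoff 0.1 req/s); one requested at 0.05 req/s does not, so its requests flow upwards.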
The Filter algorithm (cont.)
When a document is evicted from a cache at level l the algorithm forces its caching at level l+1 (upwards) if not already cached there (this may lead to a domino effect)
[Figure: the same tree, with caches (1,1), (2,1), (3,1) assumed full; the request misses at (1,1) and (2,1) and hits at (3,1); a copy is left at cache m1 = (2,1), where λ_1i ≥ 1/T_m1, but not at m2 = (1,1), where λ_1i < 1/T_m2.]
Design Principles
Prob, LCD, and MCD take advantage of the following three design principles:
1. Avoid the amplification of replacement errors
2. Filter-out one-timer documents
3. Rationalize the degree of replication
1.Avoid the amplification of replacement errors
Replacement error: document i is evicted while there exists a document j whose eviction would instead lead to an improved hit ratio.
LCE: in an L-level hierarchical cache, a request for an unpopular document leads to its caching in all L caches, i.e., up to L replacement errors: an amplification of replacement errors.
Prob, LCD, and MCD reduce the extent of the amplification by reducing the number of copies triggered by a single request.
2.Filter-out one-timer documents
Measured proxy workloads contain a high percentage of so-called one-timer documents.
One-timers: documents that are requested only once. Caching a one-timer leads to the worst type of replacement error that can occur.
LCE: deprives popular documents of valuable storage capacity by allowing one-timers to clog all caches.
LCD, MCD: one-timers cannot affect any cache other than the root cache.
Prob: filters out one-timers by using a small caching probability p.
3.Rationalize the degree of replication
LCE places copies in all intermediate caches to achieve two goals:
have a nearby copy to service other clients connected to leaf caches
have a "backup" copy for the requesting client in case its leaf copy is evicted
Storing a large number of replicas is not always beneficial:
when the demand pattern is non-homogeneous
when storage capacity is limited
Prob, LCD, and MCD create fewer copies, allowing more distinct documents to be cached.
This improves the exclusivity† of caches (Wong, Wilkes, Usenix 2002). Exclusivity relates to the ability to avoid the ineffective caching of the same documents at multiple levels.
† We would like to thank an anonymous IPCCC reviewer for bringing Wong and Wilkes's work to our attention.
Synthetic Simulations
Zipf-like document popularity distribution (a = 0.9)
Simulated hierarchical cache: regular Q-ary tree with L levels (Q = 2, L = 3)
Documents originate from an origin server (level L+1)
Each client is co-located with a leaf cache; a client represents the population of an organization
Replacement policy at each cache: LRU
Storage capacity equally allocated to the caches
Further improvements are possible if the dimensioning of the caches is optimized (Laoutaris et al., Information Processing Letters, March 2004).
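The synthetic workload described above can be generated with a short sketch, assuming the usual Zipf-like form P(rank r) ∝ 1/r^a with a = 0.9 as on the slides (function names are our own):

```python
import random

def zipf_weights(n_docs, a=0.9):
    """Zipf-like popularity: probability of rank r proportional to 1/r^a."""
    weights = [1.0 / (r ** a) for r in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_requests(n_docs, n_reqs, a=0.9, seed=0):
    """Draw a request stream of document ranks (0 = most popular)."""
    rng = random.Random(seed)
    probs = zipf_weights(n_docs, a)
    return rng.choices(range(n_docs), weights=probs, k=n_reqs)
```

Feeding such a stream through the placement sketches above lets one reproduce the qualitative comparisons, though not the paper's exact numbers.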
Average hit distance for Prob
Prob: [+] a small p filters out one-timers more effectively; [-] the cost paid is slower convergence to steady state.
Average hit distance for LCE,Prob,LCD,MCD …
… Average hit distance for LCE,Prob,LCD,MCD
The following may be noted:
LCE has the worst performance
Prob(0.2) ranks second across all S
LCD's and MCD's performance is always better than that of LCE and Prob
Filter, although non-memoryless, is outperformed by LCD and closely matched by MCD
Non-stationary demand …
Non-stationary document sets are common in the web. Simulation scenario: every W requests, M documents out of the total N that can be requested are replaced by M new ones.
This models volatility in user access patterns.
… Non-stationary demand
Hit distance increases with the volatility (captured here by M).
LCE: for small M it is the worst performer; for large M it outperforms all other algorithms.
Why?
LCE is able to track the new demand more quickly by requiring a single request to bring a new document to the leaf level
Prob,LCD,MCD,Filter require multiple requests to bring a copy of a new document to the leaf cache
However, the required volatility to make LCE better than the new algorithms is too high and is not typical of measured workloads which appear quite stable (Chen et al., JSAC, Aug. 2003)
Trace-driven Simulations
Description of traces: traces were filtered to keep only requests for cacheable documents. Two types of caches were studied:
Leaf caches (duration: one week): UoA, NTUA
Root caches of the NLANR hierarchy (duration: one day): Boulder, Colorado; Palo Alto, California; Pittsburgh, Pennsylvania; Urbana-Champaign; San Diego, California; Silicon Valley, California
Trace                        Requests   Docs     1-timers
Urbana-Champaign             815194     279375   72%
Silicon Valley, California   1299024    726075   82%
Boulder, Colorado            698691     365060   81%
Pittsburgh, Pennsylvania     709180     405680   84%
San Diego, California        193769     94457    83%
Palo Alto                    273511     137497   76%
UoA                          282540     41088    71%
NTUA                         580460     234432   73%
Results
Filter is inferior to the best performing algorithm, LCD, across all traces; Filter is also more complicated than LCD.
Average hit distance (AHD): AHD_Prob > AHD_MCD > AHD_LCD
LCE compared to LCD: inferior under all six NLANR traces; almost as good under the UoA trace; slightly better under the NTUA trace.
LCE performs better when S/N is large (S: storage, N: # of docs).
Load Balancing…
LCE gives rise to the “filtering effect” (Williamson, ACM ToIT, Feb. 2002)
The "filtering effect": popular documents gather at the leaf caches. It leads to:
poor hit ratios at the upper levels
the servicing of most requests at the lower-level caches (causing load imbalance)
The solution? A simple load balancing mechanism: threshold-based and fully distributed. Each cache calculates its load and accepts new copies of documents only when its load is below the threshold.
Some popular documents are thus denied admission to the leaf level and reside only at upper levels; this allows load to flow upwards.
…Load Balancing
Load: we count as load only the requests that lead to hits (we neglect the relatively smaller load due to misses).
A nearby hit means small propagation delay, but this does not always lead to small total delivery delay. When? When the low-level cache is overloaded (processing then takes too long).
With the proposed load balancing mechanism we sacrifice an increase in propagation delay to gain in terms of end-system processing delay.
Simulations…
Load balancing may be applied to all discussed meta algorithms
Our experiments evaluate the effectiveness of the LB mechanism:
using trace data
under the LCE algorithm
LCE-LB: a variation of LCE that keeps copies at all intermediate caches, provided that a cache has not reached its load threshold TH.
TH = (Σ_j λ_j) / (n · k)
n: # of caches (λ_j: the load of cache j)
k: controls the intensity of the desired load balancing
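Assuming the threshold is the per-cache loads summed over all n caches and divided by n·k (our reading of the slide's formula; function names are our own), the admission test might look like:

```python
def load_threshold(cache_loads, k):
    """TH = (sum of per-cache hit loads) / (n * k), n = number of caches.
    k = 1 gives a loose bound near the mean load; larger k tightens it."""
    n = len(cache_loads)
    return sum(cache_loads) / (n * k)

def lce_lb_admits(current_load, cache_loads, k):
    """LCE-LB: a cache accepts a new copy only while below the threshold."""
    return current_load < load_threshold(cache_loads, k)
```

With loads [10, 20, 30, 40] and k = 1, TH = 25: the two lightly loaded caches still admit copies, the two heavily loaded ones stop, pushing load upwards.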
(1) LCE without LB
(2) LCE with LB (k=1)
No change relative to the no-LB case (previous slide); LB becomes effective after k = 2.
(3) LCE with LB (k=8)
The effect of LB becomes clear for k = 8. With k = 16 all levels get almost the same amount of load (see the paper for more results under several values of k).
Summary of LB related results
The previous figures show that:
As k increases, load tends to be more evenly distributed among levels.
The distribution of load under LCE-LB with k = 1 is almost identical to the one under LCE. Why? The load constraint under k = 1 is too loose: it is almost equal to the maximum load that is assigned under LCE.
Load balancing becomes effective for k > 2; almost perfect LB for high values of k.
The cost paid for having LB
The average hit distance (propagation delay) increases with the intensity of LB (i.e., with k).
Conclusions
We introduced three new meta algorithms and compared them against:
the de facto one (LCE)
the one proposed by Che, Tung and Wang
We showed that these algorithms are useful in a variety of situations
LCD (the best one) seems to perform well under all studied scenarios.
We introduced a simple load balancing algorithm, based on the concept of meta algorithms that deals effectively with the “filtering effect”
Post IPCCC work†
We have derived an approximate mathematical model for predicting the performance of LCD analytically.
The model predicts accurately the actual performance and gives further insights as to why LCD outperforms LCE
We have shown that LCD performs better than the DEMOTE algorithm of Wong and Wilkes (not discussed in this paper).
† Nikolaos Laoutaris, Hao Che, Ioannis Stavrakakis,
"The LCD interconnection of LRU caches and its analysis,"
submitted work, 2004.