Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol By Abuzafor Rasal and Vinoth Rayappan

Upload: nguyet

Post on 22-Jan-2016


Page 1: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

By Abuzafor Rasal and Vinoth Rayappan

Page 2: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Web caching

[Figure: Client1, Client2, and Client3 connected to a Server through a Cache; arrows numbered 1 and 2, with legend entries "HTTP request" and "HTTP response".]

Page 3: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Web Cache Sharing

[Figure: users reach proxy caches inside a regional network; the link from the regional network to the rest of the Internet is marked as the bottleneck.]

Page 4: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Web Cache Sharing: Internet Cache Protocol (ICP)

• The Internet Cache Protocol (ICP) is the web cache sharing technique currently in use.

• With ICP, a proxy multicasts a query message to all other proxies whenever a cache miss occurs.

Page 5: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Internet Cache Protocol

[Figure: a client connected to a proxy cache, which is connected to three other proxy caches and to the Internet.]

Page 6: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Internet Cache Protocol

First request: document is available in local proxy.

[Figure: clients 1 to n and proxies 1 to N connected to the Internet; labels: HTTP, HIT.]

Page 7: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Internet Cache Protocol

Second request: document is not available in local proxy.

[Figure: clients 1 to n and proxies 1 to N connected to the Internet; labels: HTTP, ICP.]

Page 8: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Problem of ICP

• As the number of collaborating proxies increases, the overhead increases dramatically, so ICP is not scalable.
  – A proxy multicasts a query message to all other proxies whenever a cache miss occurs.
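The scaling problem can be made concrete with a rough message count. The sketch below assumes that every local miss sends one query to each of the other proxies and receives one reply from each; the function name and the miss count used in the loop are hypothetical and not taken from the slides.

```python
def icp_messages(num_proxies: int, misses_per_proxy: int) -> int:
    """Count ICP query and reply messages for a group of cooperating proxies.

    Assumption: each local cache miss sends one query to every other
    proxy and receives one reply from each of them.
    """
    peers = num_proxies - 1
    per_miss = 2 * peers  # one query plus one reply per peer
    return num_proxies * misses_per_proxy * per_miss


if __name__ == "__main__":
    # Doubling the number of proxies more than quadruples the message count.
    for n in (4, 8, 16):
        print(n, icp_messages(n, misses_per_proxy=1000))
```

Under these assumptions the total grows roughly with the square of the number of proxies, which is the overhead the slide refers to.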

Page 9: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Problem of ICP

• UDP = ICP query and reply messages

• TCP = HTTP traffic between proxies, servers, and clients

• Total packets (IP) = UDP + TCP

Page 10: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Problem of ICP

[Figure: four bar charts comparing No ICP, ICP, and SC-ICP overhead. The values read from the charts are:]

Measurement       No ICP    ICP       SC-ICP
Client latency    2.75      3.07      2.85
UDP msgs          615       54774     1079
TCP msgs          334000    328000    330000
Total packets     355000    402000    351000

Page 11: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache

• Each proxy maintains a Bloom filter (a compressed representation) summarizing its local cache.

• It also holds the Bloom filters summarizing the caches of the other proxies.

• Updates to the Bloom filters are exchanged periodically, or after a certain percentage of the documents in the cache has been replaced.

• A request is sent only to the proxies that most likely hold the requested document.
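The lookup on a local miss can be sketched as follows. This is a minimal illustration, not the protocol's actual implementation: plain Python sets stand in for the Bloom filter summaries, and the names `peers_to_query`, `handle_miss`, and the proxy labels are hypothetical.

```python
from typing import Dict, List, Set


def peers_to_query(url: str, peer_summaries: Dict[str, Set[str]]) -> List[str]:
    """Return the peers whose summary claims to hold `url`.

    Plain sets stand in for the Bloom filter summaries here; any object
    that supports the `in` operator would work the same way.
    """
    return sorted(name for name, summary in peer_summaries.items() if url in summary)


def handle_miss(url: str, peer_summaries: Dict[str, Set[str]]) -> str:
    """Decide where a local miss goes: to promising peers or to the origin server."""
    candidates = peers_to_query(url, peer_summaries)
    if candidates:
        return "query " + ", ".join(candidates)
    return "fetch from origin server"


if __name__ == "__main__":
    summaries = {
        "proxy_b": {"http://a.example/x"},
        "proxy_c": {"http://a.example/x", "http://a.example/y"},
    }
    print(handle_miss("http://a.example/y", summaries))
    print(handle_miss("http://a.example/z", summaries))
```

Unlike ICP, no message is sent to a peer whose summary does not list the URL.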

Page 12: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache

First request: document is in other proxy

[Figure: a client, four proxy caches, and the Internet.]

Page 13: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache

Second request: the document is not in any proxy

[Figure: a client, four proxy caches, and the Internet.]

Page 14: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache

Third request: summary gives false hit

[Figure: a client, four proxy caches, and the Internet.]

Page 15: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache

• Two parameters in the design of the Summary Cache protocol:
  – The frequency of summary updates (inter-proxy traffic, overhead).
  – The representation of the summary (memory).

• Solutions:
  – Delay summary updates until a fixed percentage (e.g. 1%) of the cached documents are new.
    • Positive: reduces overhead (traffic)
    • Negative: introduces "false miss" errors
  – Store summaries as a "Bloom filter", an efficient hash-based probabilistic scheme that represents the URLs of cached documents.
    • Positive: reduces memory requirement
    • Negative: introduces "false hit" errors
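The delayed-update rule can be sketched as a small counter. This is only an illustration under stated assumptions: an update is considered due once the documents added since the last update reach the threshold fraction of the documents currently cached. The class name and its parameters are hypothetical.

```python
class SummaryUpdateTrigger:
    """Decide when a proxy should send its summary update to its peers.

    Assumption: an update is due once the documents added since the last
    update reach `threshold` (a fraction, e.g. 0.01 for 1%) of the
    documents currently in the cache.
    """

    def __init__(self, threshold: float = 0.01, cached_docs: int = 0) -> None:
        self.threshold = threshold
        self.cached_docs = cached_docs
        self.new_since_update = 0

    def document_added(self) -> bool:
        """Record one newly cached document; return True if an update is due."""
        self.cached_docs += 1
        self.new_since_update += 1
        if self.new_since_update >= self.threshold * self.cached_docs:
            self.new_since_update = 0  # the update is sent now
            return True
        return False


if __name__ == "__main__":
    trigger = SummaryUpdateTrigger(threshold=0.01, cached_docs=1000)
    results = [trigger.document_added() for _ in range(11)]
    print(results)
```

Documents cached between two updates are invisible to the other proxies, which is where the "false miss" errors come from.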

Page 16: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache

• False misses:
  – Definition:
    • The requested document is cached at some other proxy, but that proxy's summary does not reflect the fact.
  – Effect:
    • A remote cache hit is lost, and the total hit ratio within the collection of caches is reduced.
  – Improvement:
    • Can be reduced by updating the summaries more frequently.

• False hits:
  – Definition:
    • The requested document is not cached at some other proxy, but that proxy's summary indicates that it is. The proxy sends a query message to the other proxy, only to be told that the document is not cached there.
  – Effect:
    • A query message is wasted.
  – Improvement:
    • Can be reduced by increasing the vector size of the Bloom filter, that is, the memory used for the representation.

Page 17: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache

• Remote stale hits: the document is cached at another proxy, but the cached copy is stale (not because of update delay).

  – Delta compression can be used to transfer the new document. It transfers only the difference between the old and the new document instead of downloading the whole document.

Page 18: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache

• Two factors limit the scalability:
  – The network overhead, i.e. the inter-proxy communication.
    • Determined by the update frequency, false hits, and remote hits.
  – The memory required to store the summaries.
    • Determined by the size of an individual summary and the number of proxies.

Page 19: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Impact of Update Delay: Explanation of the Graph

ICP = hit ratio when no update delay is introduced

exact_dir = hit ratio with update delay introduced

false_hit = no delay – delay = ICP – exact_dir

stale-hit = remote stale hits: the cached document is stale (outdated), but this is not reflected in the summary

Page 20: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Impact of Update Delay: Observation of the Graph

exact_dir = the hit ratio decreases linearly as the threshold increases.

stale-hit = not affected by the threshold, because stale-hit errors exist for both ICP and Summary Cache.

false-hit = increases as the threshold increases, because a document deleted from the cache may still be shown as present in the summary.

Page 21: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Representations

• Summary representation = how the summaries are stored in the proxies.

• Summaries need to be stored in DRAM (main memory):
  – Disk arms become bottlenecks in a proxy cache
  – DRAM prices continue to drop
  – DRAM is faster

Page 22: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Representations: Naïve approach

• Exact-directory = the summary is essentially the list of URLs of cached documents, with each URL represented by its 16-byte MD5 signature.
  – Positive: fewer errors
  – Negative: consumes too much memory

• Server-name = the summary is the list of web server names in the URLs of cached documents.
  – Positive: cuts the memory requirement by a factor of 10, but introduces errors
  – Negative: generates too many false hits and thus increases network traffic
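The two naive representations can be compared on a toy cache. The sketch below is illustrative only: the example URLs are made up, and the helper names `in_exact_dir` and `in_server_names` are hypothetical. It uses the standard `hashlib.md5` digest (16 bytes) for the exact directory and `urllib.parse.urlsplit` to extract server names.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical cache contents, used only for illustration.
cached_urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/news/today.html",
    "http://www.example.org/about.html",
]

# Exact directory: one 16-byte MD5 signature per cached URL.
exact_dir = {hashlib.md5(u.encode("utf-8")).digest() for u in cached_urls}

# Server name: only the host part of each cached URL.
server_names = {urlsplit(u).hostname for u in cached_urls}


def in_exact_dir(url: str) -> bool:
    """True if the URL's MD5 signature is in the exact directory."""
    return hashlib.md5(url.encode("utf-8")).digest() in exact_dir


def in_server_names(url: str) -> bool:
    """True if the URL's server name appears in the server-name summary."""
    return urlsplit(url).hostname in server_names


exact_dir_bytes = 16 * len(exact_dir)  # memory for the signatures alone

uncached = "http://www.example.com/not-cached.html"
print(in_exact_dir(uncached))     # the exact directory rejects the URL
print(in_server_names(uncached))  # the server-name summary reports a false hit
```

The server-name summary stores two entries instead of three, but it answers "yes" for any document on a known server, which is the source of its false hits.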

Page 23: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Representations: Bloom Filters

• Process
  – Step 1: Take each URL as the input to four different hash functions.
  – Step 2: Map the output of each hash function (32 bits) to one bit position.
  – Step 3: Set the four bits selected by the four hash functions in a bit vector.

• Positive: consumes much less memory
• Negative: introduces an insignificant number of errors

Page 24: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Representations

• Server name produces too much network traffic, because a request is sent to every proxy whose summary contains the server name.

Page 25: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Bloom filter

• A Bloom filter is a technique for representing data compactly in memory; a larger filter helps avoid false hits.

• Summary Cache uses the Bloom filter technique for compression.

• It is a method of representing a set A of n elements to support membership queries.

• It is a mechanism for identifying which pages have associated comments stored within a common knowledge server.

Page 26: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Problem?

[Figure: URLs such as cnn.com/index.html and wayne.edu/ at place A are stored in a compact representation (Bloom filter); place B tests an arbitrary URI against it.]

Page 27: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

How does the Bloom filter work?

• Pick a large bit array with all bits set to 0.
• Pick a number of independent hash functions; in this case we have four (4).
• For every URL in the bag (the proxy's cache), apply the four hash functions to get four integers.
• Use the four integers as positions in the bit array.
• Turn the bits at those positions to 1.
• Repeat this for all URLs in the proxy's cache.
• The above is the encoding (insertion) process.
• To query, apply the same four hash functions to the URL and check whether all four bits are 1.
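The steps above can be sketched as a small class. This is a minimal illustration, not the presenters' implementation: the four hash functions are derived here by prefixing the URL with an index byte and hashing with SHA-256, which is an assumption made only to keep the example self-contained. The class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter over URLs with four hash functions by default.

    Assumption: hash function i is SHA-256 over (index byte i + URL),
    reduced modulo the vector size. This is for illustration only.
    """

    def __init__(self, num_bits: int, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, url: str):
        """Yield one bit position per hash function."""
        data = url.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, url: str) -> None:
        """Set the bits selected by the hash functions to 1."""
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url))


if __name__ == "__main__":
    summary = BloomFilter(num_bits=1024)
    summary.add("http://www.example.com/index.html")
    print(summary.might_contain("http://www.example.com/index.html"))
```

A `True` answer can be wrong (a false hit), but a `False` answer never is, as long as the filter reflects the current cache contents.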

Page 28: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

How does a hash work?

A hash function turns data into a relatively small number that may serve as a digital "fingerprint" of the data.

Page 29: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Bloom filter

• A hashing technique
• m bits
• k independent hash functions
• Many-to-one mapping
• "False positives"

Page 30: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Bloom filter

• False positive
  – Given a query for b, we check the bits at positions h1(b), h2(b), ..., hk(b). If any of them is 0, b is not in the set A.
  – Otherwise we conclude that b is in the set A, although there is a certain probability that we are wrong.

• If false positives increase, the number of accesses goes up; if false negatives increase, the probability of getting the wrong document goes up.

• The salient feature of the Bloom filter is the trade-off between memory size (the bit array) and the false positive rate.

Page 31: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Probability of false positive

upper graph: for 4 hash functions

lower graph: optimal integral number of hash functions (5 hash functions)
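The curves can be approximated numerically. The sketch below uses the standard approximation for a Bloom filter with m bits, n stored elements, and k hash functions, p ≈ (1 − e^(−kn/m))^k, and the real-valued minimizer k = ln 2 · m/n. The function names are hypothetical, and treating 8, 16, and 32 as bits per cached document is an assumption about what the Bloom_8, Bloom_16, and Bloom_32 labels mean.

```python
import math


def false_positive_rate(bits_per_entry: float, num_hashes: int) -> float:
    """Approximate false positive probability (1 - e^(-k*n/m))^k.

    bits_per_entry is m/n: bits in the vector per stored element.
    """
    return (1.0 - math.exp(-num_hashes / bits_per_entry)) ** num_hashes


def optimal_num_hashes(bits_per_entry: float) -> float:
    """Real-valued k that minimizes the approximation: k = ln(2) * m/n."""
    return math.log(2) * bits_per_entry


if __name__ == "__main__":
    for ratio in (8, 16, 32):
        print(ratio, false_positive_rate(ratio, 4), optimal_num_hashes(ratio))
```

Each doubling of the bits per entry lowers the false positive rate sharply, which is the memory versus accuracy trade-off described on the previous slide.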

Page 32: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Bloom filter as summaries

• Provides a straightforward mechanism to build summaries.

• The proxy builds the Bloom filter from the URLs of its cached documents.

• Increasing the memory decreases false positives, and vice versa.

• Provides a clear trade-off between the two.

Page 33: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

How is the hash function built?

[Figure: the URL (e.g. www.abc.com) is hashed with MD5 into a 128-bit value, 101101110101010111100 …… 010111, which is split into four 32-bit hashes.]
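The construction in the figure can be sketched with the standard library. The byte order used to split the digest (big-endian) and the modulo reduction to a bit position are assumptions, since the slide does not specify them; the function names are hypothetical.

```python
import hashlib
import struct


def four_hashes(url: str) -> tuple:
    """Split the 128-bit MD5 digest of `url` into four 32-bit integers."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    return struct.unpack(">4I", digest)  # four big-endian unsigned 32-bit words


def bit_positions(url: str, num_bits: int) -> list:
    """Map each 32-bit hash to a position in a bit vector of `num_bits` bits."""
    return [h % num_bits for h in four_hashes(url)]


if __name__ == "__main__":
    print(four_hashes("www.abc.com"))
    print(bit_positions("www.abc.com", num_bits=1024))
```

One MD5 computation yields all four hash values, so no additional hashing is needed per URL.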

Page 34: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Hit ratio

Page 35: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Observations of the cache hit ratio

• Exact_dir and Bloom filter_8, _16, and _32 have virtually the same hit ratio compared to server name.

• Exact_dir gives the same hit ratio as the Bloom filters, but it consumes more memory to store the information for every URL.

• Bloom filter_8, _16, and _32 consume less memory than exact_dir because of the hash functions.

Page 36: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

False hit ratio under different summary representations

Page 37: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Observation of false hit (miss) ratio

• Server name has a much higher false hit (miss) ratio. Why?

• Because it stores only the server name and does not have the specific address of the requested URL.

• So the request is sent to the other proxies, but the hit occurs at only one of them, so the false hit ratio is high.

• Exact_dir has the lowest false hit ratio of all, but it needs a large amount of memory.

Page 38: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Message per request

Page 39: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Observations on Msg/request

• We included ICP for a comparative study.

• With ICP (without the summary cache), the request is sent to all proxies to find the requested URL, so the number of messages per client request is high compared to the others.

• At the other extreme, Bloom_8, _16, and _32 and exact_dir spend far fewer messages per client request to find the URL, which makes them good and economical choices.

• Server name falls between the two, because it has more false hits (misses) and hence more messages per client request.

Page 40: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Bytes of Msg size per request

Page 41: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Observations on the size of inter-proxy network messages in bytes

• We consider this issue because update messages are larger than query messages.

• Summary Cache sends occasional bursts of large update messages in between the small query messages. This significantly reduces the CPU overhead and the number of network interface packets (results are in Tables 2 and 4).

For query messages

                                Header size   Average URL
ICP and others                  20            50

For summary updates

                                Header size   Bytes/Change
Exact directory                 20            16
Server name                     20            16
Bloom filter based summaries    32            4
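The table translates directly into per-message sizes. The sketch below assumes that a query message is one header plus one average-length URL, and that an update message is one header plus the per-change cost for each change it carries; the constant and function names are hypothetical.

```python
QUERY_HEADER_BYTES = 20
AVERAGE_URL_BYTES = 50

# (header bytes, bytes per change) for each summary representation
UPDATE_COSTS = {
    "exact_directory": (20, 16),
    "server_name": (20, 16),
    "bloom_filter": (32, 4),
}


def query_message_bytes() -> int:
    """Size of one query message: header plus an average-length URL."""
    return QUERY_HEADER_BYTES + AVERAGE_URL_BYTES


def update_message_bytes(representation: str, num_changes: int) -> int:
    """Size of one update message carrying `num_changes` changes."""
    header, per_change = UPDATE_COSTS[representation]
    return header + per_change * num_changes


if __name__ == "__main__":
    print(query_message_bytes())
    for name in UPDATE_COSTS:
        print(name, update_message_bytes(name, num_changes=100))
```

Under these assumptions, a Bloom filter update carrying 100 changes is roughly a quarter of the size of the corresponding exact-directory update.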

Page 42: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Memory requirements in terms of % of proxy cache: NLANR, 4 proxies

[Figure: bar chart of the storage requirement as a percentage of proxy cache size for the NLANR trace, comparing Exact_dir, Server name, Bloom_8, Bloom_16, and Bloom_32; vertical axis from 0.00% to 0.70%.]

Page 43: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Memory requirements in terms of % of proxy cache: DEC, 16 proxies

[Figure: bar chart of the storage requirement as a percentage of proxy cache size for the DEC traces, comparing Exact_dir, Server name, Bloom_8, Bloom_16, and Bloom_32; vertical axis from 0.00% to 3.00%.]

Page 44: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary

• Web caching is an active research area.

• Directory server: this approach uses a central server to keep track of the cache directories of all the proxies; the proxies query the server for cache hits in other proxies.

• This approach fails because the centralized server has to serve every request, so its network overhead is high.

• To overcome this, we have a summary-cache enhanced ICP web cache sharing protocol.

• Our inspection of the Questnet traces showed that child-to-parent ICP queries can be a significant portion of the messages that the parent proxy has to process. In this case, applying the summary cache will significantly reduce the number of queries and the overhead.

Page 45: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Future work

• Plan to investigate the impact of the protocol on parent-child proxy cooperation and the optimal hierarchy configuration for a given workload.

• Plan to investigate the application of summary cache in various web cache consistency protocols.

• Plan to design a new method for implementing summary cache in a proxy to speed up the lookup.

Page 46: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Conclusion

• We proposed summary-cache enhanced ICP, a scalable World Wide Web cache sharing protocol, and showed it to be the best choice among the techniques compared.

• Our study has two key concepts: the effect of delayed updates of the summary cache, and the representation of the summary.

• The solution to the first is that updates can be delayed until 1% to 10% of the cached documents are new (shown by trace-driven simulation). This causes errors, but they are tolerable.

• The solution to the second is the Bloom filter technique for representing the summary cache.

• We achieve over 50% reduction in bandwidth and reduce the number of inter-proxy communication messages by a factor of 25 to 60.