Page 1: Using Database Technology to Improve Performance of Web Proxy Servers

Using Database Technology to Improve Performance of Web Proxy Servers

K. Cheng¹, Y. Kambayashi¹, M. Mohania²
¹Kyoto University, Japan
²Western Michigan University, USA

Page 2: Using Database Technology to Improve Performance of Web Proxy Servers

24-25 May 2001 WebDB'2001, Santa Barbara CA 2

Caching on web proxy servers

Improve throughput of proxy servers
Improve response times for end users
Bridge bandwidth gap between WAN and LAN
Distribute workload from web servers

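[Figure: clients reach web servers through a proxy server. The WAN side (web servers) has lower bandwidth, the LAN side (clients) has higher bandwidth. Direct access from clients to web servers is crossed out (X).]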

Page 3: Using Database Technology to Improve Performance of Web Proxy Servers

Characteristics of proxy caching

                       Traditional Caching   Proxy Caching
Storage                Memory-based          Disk-based
Cache size             Small                 Huge
Object survival time   Short                 Long
Algorithm              Simple                Can be complex
Who uses it?           Programmed process    People with specific interest

Page 4: Using Database Technology to Improve Performance of Web Proxy Servers

Limitations of current caching schemes: case 1

1. Tom found a very good page “P1” about car models.

2. John was also looking for that kind of page, but he only found “P2”.

3. Both “P1” and “P2” were cached, but Tom didn’t know about “P2” and John didn’t know about “P1”.

4. After several days, however, both were replaced because there were no further visits.

5. As a result, Tom missed “P2”, John missed “P1”, and the cache missed 2 hits.

State-of-the-art caching schemes cannot deal with this case!

Page 5: Using Database Technology to Improve Performance of Web Proxy Servers

Limitations of current caching schemes: case 2

1. Suppose the users of a proxy server are mostly interested in “XML” but rarely in “Fuzzy”.

2. Suppose some clients retrieved pages “P1” and “P2”.

3. After checking the content of “P1” and “P2”, we know “P1” is an “XML” page and “P2” is a “Fuzzy” page.

Should we prefer to cache “P1” or “P2”?

Page 6: Using Database Technology to Improve Performance of Web Proxy Servers

Why can’t current schemes deal with these cases?

Physical-object-based cache management
  Content transparency, low utilization rate (Case 1)
    Approximately 60% of data in cache is never used
    Approximately 90% of data in cache is rarely used
Usage-based object replacement
  Needlessly long stay time for irrelevant contents (Case 2)

Page 7: Using Database Technology to Improve Performance of Web Proxy Servers

Our solution

We propose a hierarchical data model for management of web data (physical pages, logical pages and topics).

Object replacement based on
  Link structure (“logical pages”)
  Semantic similarity with other objects (“topics”)
Facilitate active access to cache contents

Page 8: Using Database Technology to Improve Performance of Web Proxy Servers

A hierarchical model for web data

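[Figure: three-layer hierarchy. The topic manager holds topics T1, T2; the logical page manager holds logical pages L1, L2, L3; the physical page manager holds physical pages p1 to p6. Adjacent layers are connected by mappings. Access modes labelled in the figure: search, browse, navigate.]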

Page 9: Using Database Technology to Improve Performance of Web Proxy Servers

Physical pages

http://www.difa.unibas.it/webdb2001/instructionsPage/index.html
../icons/webdblogo.gif

[Figure: the two files are labelled physical page “A” and physical page “B”.]

Page 10: Using Database Technology to Improve Performance of Web Proxy Servers

Logical page

[Figure: physical pages “A” and “B” grouped together as one logical page.]

Page 11: Using Database Technology to Improve Performance of Web Proxy Servers

Managing physical pages

Physical page
  HTML/plain text file (.html, .txt)
  Embedded media file (.gif, .png, .wav, .mp3)
  Application-generated file (.pdf, .ps, .doc)
Managing physical pages based on
  URL (protocol, IP, port, path)
  Physical properties (e.g. size, cost)
  Usage (frequency, recency)
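As an illustration of keying physical pages by URL components, here is a minimal Python sketch. The names `PageKey`, `page_key` and `DEFAULT_PORTS` are hypothetical and not from the slides. The sketch stores the host name rather than a resolved IP address, which is a simplifying assumption.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class PageKey(NamedTuple):
    """Hypothetical lookup key for a physical page, split into URL parts."""
    protocol: str
    host: str   # assumption: host name is used in place of a resolved IP
    port: int
    path: str


# Well-known default ports, used when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def page_key(url: str) -> PageKey:
    """Split a URL into (protocol, host, port, path)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 0)
    return PageKey(parts.scheme, parts.hostname or "", port, parts.path or "/")


key = page_key("http://www.difa.unibas.it/webdb2001/instructionsPage/index.html")
print(key)
```

Physical properties (size, cost) and usage counters (frequency, recency) would be stored alongside such a key; they are left out here to keep the sketch short.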

Page 12: Using Database Technology to Improve Performance of Web Proxy Servers

Constructing logical pages

Basic logical pages
  Single multimedia document
  HTML (1) + embedded media files (1..*)
Extended logical pages
  Several closely related, directly linked pages
  E.g. an HTML paper with sections on different multimedia documents
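A basic logical page (one HTML file plus its embedded media files) could be assembled roughly as follows. This is a sketch under assumptions, not the authors' implementation: the class `EmbeddedMediaParser`, the function `basic_logical_page`, and the set of tags treated as embedding media are all hypothetical choices.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedMediaParser(HTMLParser):
    """Collects the src attribute of tags that embed media in a page."""

    # Assumption: these tags are the ones treated as "embedded media".
    EMBED_TAGS = {"img", "embed", "audio", "video", "source"}

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in self.EMBED_TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def basic_logical_page(html_url, html_text):
    """Return the URLs of the HTML page followed by its embedded media files."""
    parser = EmbeddedMediaParser()
    parser.feed(html_text)
    members = [html_url]
    for src in parser.sources:
        absolute = urljoin(html_url, src)  # resolve relative references
        if absolute not in members:
            members.append(absolute)
    return members


page = basic_logical_page(
    "http://www.difa.unibas.it/webdb2001/instructionsPage/index.html",
    '<html><body><img src="../icons/webdblogo.gif"><p>text</p></body></html>',
)
print(page)
```

Extended logical pages would additionally follow hyperlinks between closely related pages; how "closely related" is decided is not specified in the slides and is not attempted here.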

Page 13: Using Database Technology to Improve Performance of Web Proxy Servers

Managing topics

Defining a topic
  Topic = <id, name, criteria, popularity, date, …>
  Popularity = f(F, R, P, U)
    F - Access frequency of the topic
    R - Time interval between last access time and current time
    P - Number of logical pages belonging to a topic
    U - Number of users accessing a topic
Deciding membership of a logical page to a topic
  IR approaches (K-NN, )
  ML approaches (e.g. Support Vector Machine, SVM)
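The slides do not give the form of f(F, R, P, U). The sketch below shows one way a topic record could track the four inputs, with a purely illustrative popularity function that grows with F, P and U and decays with R. The formula, the hour-based decay, and all names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical topic record tracking the inputs of Popularity = f(F, R, P, U)."""
    id: int
    name: str
    criteria: str
    access_count: int = 0                              # F: access frequency
    last_access: float = 0.0                           # used to derive R
    logical_pages: set = field(default_factory=set)    # P = number of members
    users: set = field(default_factory=set)            # U = number of members

    def record_access(self, user, logical_page, now):
        """Update the usage statistics for one access to this topic."""
        self.access_count += 1
        self.last_access = now
        self.logical_pages.add(logical_page)
        self.users.add(user)

    def popularity(self, now):
        """Illustrative f only: rises with F, P, U and falls as R (in hours) grows."""
        interval_hours = max(0.0, now - self.last_access) / 3600.0
        score = self.access_count + len(self.logical_pages) + len(self.users)
        return score / (1.0 + interval_hours)


xml = Topic(id=1, name="XML", criteria="xml")
xml.record_access("tom", "L1", now=100.0)
xml.record_access("john", "L2", now=200.0)
print(xml.popularity(now=200.0), xml.popularity(now=200.0 + 86400))
```

Membership decisions (K-NN, SVM) are out of scope for this sketch; pages are simply assigned to a topic by the caller.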

Page 14: Using Database Technology to Improve Performance of Web Proxy Servers

Definitions

We use the term “Priority” for object replacement. It is a function of several parameters, e.g. access frequency (F), time interval (R), size of object (S), retrieval cost (C), significance (G).

Significance: Importance of the topic

Page 15: Using Database Technology to Improve Performance of Web Proxy Servers

Caching policy: LRU-SP+

Topic management
  Priority = f(F, R, G)
Logical page management
  Basic logical pages only
  Priority = g(F, R)
Physical page management
  LRU-SP: size-adjusted & popularity-aware LRU (K. Cheng et al, Compsac’00)
  Priority = h(F, R, S)

Page 16: Using Database Technology to Improve Performance of Web Proxy Servers

Evaluate & add new objects

[Figure: topics T1, T2; logical pages L1, L2, L3; physical pages P10, P11, P12, P20, P21, P22, P30, P31, P40, P41, P42, arranged from higher to lower priority. A new object “D” is evaluated against the cached objects; “D” is of higher priority.]

Page 17: Using Database Technology to Improve Performance of Web Proxy Servers

Replace an object

1. Choose a candidate topic (T1)

2. T1 has 1 logical page (L1), choose (L1)

3. (L1) has 3 physical pages (P10), (P11), (P12), where (P12) is shared by (L2)

4. Choose a victim (P*) from (P10), (P11).

5. Replace (P*) with the new page

[Figure: the cache hierarchy used in the example: topics T1, T2; logical pages L1, L2, L3; physical pages P10, P11, P12, P20, P21, P22, P23, P30, P31, P40, P41, P42.]
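The replacement steps on this slide can be sketched in Python as a top-down walk through the hierarchy. The slides do not say how the candidate is chosen at each level; this sketch assumes the lowest-priority entry is picked, and it skips physical pages shared with another logical page, as in step 3. All class and function names are hypothetical, and the stored `priority` values stand in for f, g and h.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalPage:
    name: str
    priority: float                      # stands in for h(F, R, S)


@dataclass
class LogicalPage:
    name: str
    priority: float                      # stands in for g(F, R)
    pages: list = field(default_factory=list)


@dataclass
class Topic:
    name: str
    priority: float                      # stands in for f(F, R, G)
    logical_pages: list = field(default_factory=list)


def choose_victim(topics):
    """Pick a physical page to evict, or None if every candidate is shared."""
    # Steps 1-2: candidate topic, then candidate logical page (assumed: lowest priority).
    topic = min(topics, key=lambda t: t.priority)
    logical = min(topic.logical_pages, key=lambda lp: lp.priority)

    # Step 3: find which logical pages reference each physical page.
    owners = {}
    for t in topics:
        for lp in t.logical_pages:
            for p in lp.pages:
                owners.setdefault(p.name, set()).add(lp.name)

    # Step 4: only pages owned solely by the chosen logical page are candidates.
    candidates = [p for p in logical.pages if owners[p.name] == {logical.name}]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.priority)


p10, p11, p12 = PhysicalPage("P10", 0.5), PhysicalPage("P11", 0.2), PhysicalPage("P12", 0.1)
l1 = LogicalPage("L1", 0.3, [p10, p11, p12])
l2 = LogicalPage("L2", 0.6, [p12, PhysicalPage("P20", 0.4)])   # P12 is shared
topics = [Topic("T1", 0.1, [l1]), Topic("T2", 0.9, [l2])]
print(choose_victim(topics).name)
```

Step 5 (replacing the victim with the new page) then amounts to removing the returned page from its logical page and inserting the new object in the appropriate place in the hierarchy.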

Page 18: Using Database Technology to Improve Performance of Web Proxy Servers

Preliminary experiments

Replay access logs of our proxy server (Squid)
  30 clients, 30 days
  873,824 requests, 21.30 GB data
  7 topics, priority [1..5]
Significance factor ([0, 2])
  Measure the significance of each topic
Hit Rate (HR)
  Percentage of requests satisfied by cache

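Profit Rate (PR)

$$PR = \frac{\sum_{i=1}^{N} g_i \, y_i}{\sum_{i=1}^{N} g_i}, \qquad y_i = \begin{cases} 1, & \text{if } d_i \text{ in cache} \\ 0, & \text{otherwise} \end{cases}$$

where $g_i$ is the significance of the topic.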
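Both metrics can be computed from a replayed request trace. The sketch below assumes that PR is the significance-weighted fraction of requests served from the cache, that request i carries a hit flag y_i and the significance g_i of its topic, and that N is the number of requests; the function names and the sample trace are hypothetical.

```python
def hit_rate(hits):
    """Fraction of requests satisfied by the cache (hits is a list of 0/1 flags)."""
    return sum(hits) / len(hits) if hits else 0.0


def profit_rate(hits, significance):
    """Significance-weighted hit rate: sum(g_i * y_i) / sum(g_i)."""
    if len(hits) != len(significance):
        raise ValueError("hits and significance must have the same length")
    total = sum(significance)
    if total == 0:
        return 0.0
    gained = sum(g for g, y in zip(significance, hits) if y)
    return gained / total


# Made-up four-request trace: hit flags y_i and topic significance g_i in [0, 2].
hits = [1, 0, 1, 0]
significance = [2.0, 0.5, 1.0, 0.5]
print(hit_rate(hits), profit_rate(hits, significance))
```

With equal significance for every request the two metrics coincide, so PR differs from HR only when hits are concentrated on more (or less) significant topics.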

Page 19: Using Database Technology to Improve Performance of Web Proxy Servers

Baseline algorithm

LRV (Rizzo et al 1998)
  A physical-page-based algorithm
  Using size (S) to predict further access to incoming objects
Parameters in consideration
  Access frequency (F)
  Time interval (R)
  Size of objects (S)

Page 20: Using Database Technology to Improve Performance of Web Proxy Servers

Results: Hit Rates 20% Up

[Figure: hit rate (y-axis, 0 to 0.25) against cache space in % of total unique data (x-axis: 0.5, 3, 6, 10) for LRV and LRU-SP+.]

Page 21: Using Database Technology to Improve Performance of Web Proxy Servers

Results: Profit Rates 30% Up

[Figure: profit rate (y-axis, 0 to 0.5) against cache space in % of total unique data (x-axis: 0.5, 3, 6, 10) for LRV and LRU-SP+.]

Page 22: Using Database Technology to Improve Performance of Web Proxy Servers

Conclusion and future work

Performance of caching proxies can be remarkably improved if cache contents are well organized and managed
Proposed a hierarchical model and a cache management scheme based on that model
Future work
  Tuning various parameters to achieve better performance (logical page clustering, priority balancing between significance and popularity, etc.)
  More experiments