![Page 1: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/1.jpg)
Empirical Quantification of Opportunities for Content Adaptation
in Web Servers
Michael Gopshtein and Dror Feitelson
School of Engineering and Computer Science
The Hebrew University of Jerusalem
Supported by a grant from the Israel Internet Association
![Page 2: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/2.jpg)
Capacity Planning
[Figure: daily cycle of activity. Axes: time, capacity. Regions: utilized capacity, wasted capacity.]
![Page 3: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/3.jpg)
Capacity Planning
[Figure: flash crowd. Axes: time, capacity.]
![Page 4: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/4.jpg)
Capacity Planning
• The problem:
  – Required capacity for flash crowds cannot be anticipated in advance
  – Even capacity for daily fluctuations is highly wasteful
• Academic solution: use admission control
• Business practice: unacceptable to reject any clients
  – Especially in cases of surge in traffic
![Page 5: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/5.jpg)
Content Adaptation
• Trade off quality for throughput
  – Installed capacity matches normal load
  – Handle abnormal load by reducing quality
  – But still manage to provide meaningful service to all clients
• Assumes normal optimizations have been made already
  – Compress or combine images, promote caching, …
  – Empirically this usually is not the case
![Page 6: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/6.jpg)
Content Adaptation
[Figure: low load, with images labeled “smily”]
![Page 7: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/7.jpg)
Content Adaptation
[Figure: high load, with images labeled “smily”]
![Page 8: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/8.jpg)
Content Adaptation
• Maintain the invariant:
  $\text{rate of requests} \times \text{cost per request} \le \text{capacity}$
• Need to change quality (and cost!) of content
  – Prepare multiple versions in advance
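The invariant on this slide (rate of requests times cost per request must stay within capacity) can be read as a rule for choosing among versions prepared in advance. The following is a minimal sketch, not the authors' implementation: the version names, the cost figures, and the helper `pick_version` are all hypothetical, and cost is expressed in arbitrary capacity units.

```python
from typing import NamedTuple, Optional, Sequence


class Version(NamedTuple):
    name: str                # hypothetical label for a prepared version
    cost_per_request: float  # cost in arbitrary capacity units


def pick_version(versions: Sequence[Version], request_rate: float,
                 capacity: float) -> Optional[Version]:
    """Return the costliest (assumed highest-quality) version that fits.

    A version fits when request_rate * cost_per_request <= capacity.
    Returns None when even the cheapest version exceeds capacity.
    """
    fitting = [v for v in versions
               if request_rate * v.cost_per_request <= capacity]
    return max(fitting, key=lambda v: v.cost_per_request, default=None)


# Hypothetical versions prepared in advance, from full to minimal quality.
VERSIONS = [
    Version("full", 1.0),
    Version("no-decorations", 0.5),
    Version("text-only", 0.1),
]
```

The sketch assumes that a higher cost per request means higher quality, which is a simplification.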
![Page 9: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/9.jpg)
The Questions
• What are the main costs in web service?
  – Bottleneck is CPU / network / disk?
  – What do we gain by eliminating HTTP requests?
  – What do we gain by reducing file sizes?
• What can realistically be done?
  – What is the structure of a “random” site?
  – How much can we reduce quality?
Assumption: static web pages only
![Page 10: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/10.jpg)
Costs of Serving Web Pages
![Page 11: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/11.jpg)
Measuring Random Web Sites
• http://en.wikipedia.org/wiki/Special:Random
• Use title of page as input to Google search
• Extract domain of first link to get home page
• Retrieve it using IE
• Collect statistical data by intercepting system calls to send and receive
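The step "extract domain of first link to get home page" can be sketched as below. This covers only that one step: fetching the random Wikipedia page, running the search, and intercepting system calls are not shown. The helper name `home_page_url` is an assumption for illustration.

```python
from urllib.parse import urlsplit


def home_page_url(first_link: str) -> str:
    """Reduce a search-result link to the home page of its domain."""
    parts = urlsplit(first_link)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {first_link!r}")
    # Keep only scheme and host; drop path, query, and fragment.
    return f"{parts.scheme}://{parts.netloc}/"
```

For example, a first link of `http://www.example.org/wiki/page?x=1#top` would be reduced to `http://www.example.org/`.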
![Page 12: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/12.jpg)
Retrieved Component Sizes
[Figure: distribution of retrieved component sizes, with two annotations]
• “This is only 0.02% of the components”
• “A ¼ of total data from components larger than 200 KB”
![Page 13: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/13.jpg)
Download Times
Download time (and bandwidth requirements) roughly proportional to image size
![Page 14: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/14.jpg)
Network Bandwidth
• Typical Ethernet packets are 1526 bytes
  – Ethernet and TCP/IP headers require 54 bytes
  – HTTP response headers require 280-325 bytes
• Most components fit into few packets
  – 43% fit into a single packet
  – 24% more fit into 2 packets
Save bandwidth by reducing the number of small components or the size of large components
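The packet arithmetic on this slide can be sketched as follows. The sketch takes the slide's figures at face value (1526-byte packets, 54 bytes of Ethernet and TCP/IP headers per packet) and assumes the HTTP response headers are sent once, ahead of the component data. The default of 300 header bytes is an assumed value inside the quoted 280-325 range, and connection setup and acknowledgements are ignored.

```python
PACKET_BYTES = 1526        # typical Ethernet packet size quoted on the slide
LOWER_HEADERS_BYTES = 54   # Ethernet + TCP/IP headers, per packet
PAYLOAD_BYTES = PACKET_BYTES - LOWER_HEADERS_BYTES  # 1472 bytes


def packets_needed(component_bytes: int, http_header_bytes: int = 300) -> int:
    """Estimate how many packets one HTTP response occupies.

    http_header_bytes defaults to 300, an assumed value inside the
    280-325 range quoted on the slide.
    """
    if component_bytes < 0:
        raise ValueError("component size must be non-negative")
    total = component_bytes + http_header_bytes
    return -(-total // PAYLOAD_BYTES)  # ceiling division
```

Under these assumptions a component of up to 1172 bytes fits into a single packet, so shrinking it further saves no packets, while removing it saves a whole request.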
![Page 15: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/15.jpg)
Locality and Caching
• Flash crowds typically involve a very small number of pages (possibly the home page)
• Servers allocate GB of memory for cache
• This is enough for thousands of files
Disk is not expected to be a bottleneck
![Page 16: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/16.jpg)
CPU Overhead
• CPU usage reflects several activities
  – Opening TCP connection
  – Processing request
  – Sending data
• Measure using combinatorial microbenchmarks
  – Open connection only
  – One extremely large file
  – Many small files
  – Many requests for non-existent file
![Page 17: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/17.jpg)
CPU Overhead
Example: single 10 KB file
• Establishing connection: 25%
• Processing request: 72%
• Data transfer: 3%
• Equal processing and transfer at 240 KB
  – Only 0.3% of files are so big
If CPU is the bottleneck, need to reduce the number of requests
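The 240 KB break-even point appears consistent with the 10 KB breakdown if data transfer cost is assumed to grow linearly with file size while the cost of processing a request stays fixed. The sketch below works through that arithmetic. The linear model and the helper name `break_even_size_kb` are assumptions, not something stated on the slide.

```python
def break_even_size_kb(file_kb: float, processing_share: float,
                       transfer_share: float) -> float:
    """File size at which data transfer costs as much CPU as request processing.

    Assumes transfer cost grows linearly with file size while the
    processing cost per request stays fixed.
    """
    if transfer_share <= 0:
        raise ValueError("transfer share must be positive")
    return processing_share * file_kb / transfer_share


# Shares quoted on the slide for a single 10 KB file: 72% processing, 3% transfer.
size_kb = break_even_size_kb(file_kb=10, processing_share=72, transfer_share=3)
print(size_kb)  # 240.0
```

With transfer at 3% for 10 KB (0.3% per KB), transfer reaches the 72% processing share at 72 / 0.3 = 240 KB.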
![Page 18: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/18.jpg)
Optimizations
![Page 19: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/19.jpg)
Guidelines
• Either the CPU or the network is the bottleneck
• Network bandwidth saved by reducing large components
• CPU saved by eliminating small components
• Maintaining “acceptable” quality is subjective
![Page 20: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/20.jpg)
Eliminating Images
• Images have many functions
  – Story (main illustrative item)
  – Preview (for other page)
  – Commercial
  – Logo
  – Decoration (bullets, background)
  – Navigation (buttons, menus)
  – Text (special formatting)
• Some can be eliminated or replaced
![Page 21: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/21.jpg)
Distribution of Types
• Manually classified 959 images from 30 random sites
• 50% decoration
• 18% preview
• 11% commercial
• 6% logo
• 6% text
![Page 22: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/22.jpg)
Automatic Identification
• Decorations are candidates for elimination
• Identified by combination of attributes:
  – Use GIF format
  – Appear in HTML tags other than <IMG>
  – Appear multiple times in same page
  – Small original size
  – Displayed size much bigger than original
  – Large change in aspect ratio when displayed
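The six attributes listed above could be combined into a simple score. This is a hedged sketch only: the slide does not say how the attributes are weighted or what thresholds are used, so the `ImageUse` fields, the thresholds (`small_bytes`, `stretch`, `aspect_change`), the `min_score` cutoff, and the use of a `.gif` URL suffix as a proxy for the file format are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ImageUse:
    """One image as used in a page; all fields are illustrative."""
    url: str
    html_tag: str      # tag that references the image, e.g. "img" or "td"
    uses_in_page: int  # how many times the page references it
    file_bytes: int
    orig_w: int        # intrinsic size in pixels
    orig_h: int
    shown_w: int       # displayed size in pixels
    shown_h: int


def decoration_score(img: ImageUse, small_bytes: int = 2048,
                     stretch: float = 4.0, aspect_change: float = 2.0) -> int:
    """Count how many of the six decoration attributes an image matches."""
    orig_area = max(img.orig_w * img.orig_h, 1)
    shown_area = img.shown_w * img.shown_h
    orig_aspect = img.orig_w / max(img.orig_h, 1)
    shown_aspect = img.shown_w / max(img.shown_h, 1)
    # Relative change in aspect ratio, in either direction.
    ratio = shown_aspect / orig_aspect if orig_aspect > 0 else 0.0
    aspect_changed = ratio > 0 and max(ratio, 1 / ratio) >= aspect_change
    checks = [
        img.url.lower().endswith(".gif"),   # GIF format (suffix as a proxy)
        img.html_tag.lower() != "img",      # used outside <IMG>
        img.uses_in_page > 1,               # repeated in the same page
        img.file_bytes <= small_bytes,      # small original size
        shown_area >= stretch * orig_area,  # displayed much bigger than original
        aspect_changed,                     # large change in aspect ratio
    ]
    return sum(checks)


def is_decoration(img: ImageUse, min_score: int = 3) -> bool:
    """Treat an image as a decoration when enough attributes match."""
    return decoration_score(img) >= min_score
```

A 1x1 GIF stretched into a horizontal rule inside a table cell would match all six attributes, while a full-size JPEG story photo in an `<IMG>` tag would match none.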
![Page 23: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/23.jpg)
Image Sizes Distribution
decoration
preview
commercial
![Page 24: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/24.jpg)
Auxiliary Files
• JavaScript
  – May be crucial for page function
  – Impossible to understand automatically
• CSS (style sheets)
  – May be crucial for page structure
  – May be possible to identify those parts that are used
![Page 25: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/25.jpg)
Auxiliary Files
• Cannot be eliminated
• Common wisdom: use separate files
  – Allow caching at client
  – Save retransmission with each page
• Alternative: embed in HTML
  – Reduce number of requests
  – May be better for flash crowds that do not request multiple pages
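Embedding auxiliary files in the HTML can be illustrated with a small rewriting pass. The sketch below is regex-based and deliberately naive: it assumes double-quoted `href`/`src` attributes, treats any `<link>` whose `href` ends in `.css` as a style sheet, and would break on JavaScript that itself contains `</script>`. The function name `embed_auxiliary_files` and the `files` mapping are hypothetical.

```python
import re
from typing import Mapping

LINK_RE = re.compile(
    r'<link\b[^>]*\bhref="([^"]+\.css)"[^>]*>', re.IGNORECASE)
SCRIPT_RE = re.compile(
    r'<script\b[^>]*\bsrc="([^"]+\.js)"[^>]*>\s*</script>', re.IGNORECASE)


def embed_auxiliary_files(html: str, files: Mapping[str, str]) -> str:
    """Replace references to known CSS/JS files with their inline content.

    `files` maps the href/src value to the file's text. References that
    are not in `files` are left untouched.
    """
    def inline_css(m: re.Match) -> str:
        body = files.get(m.group(1))
        return m.group(0) if body is None else f"<style>{body}</style>"

    def inline_js(m: re.Match) -> str:
        body = files.get(m.group(1))
        return m.group(0) if body is None else f"<script>{body}</script>"

    return SCRIPT_RE.sub(inline_js, LINK_RE.sub(inline_css, html))
```

Each inlined file removes one HTTP request per page view, at the price of retransmitting the content with every page, which is the trade-off the slide describes.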
![Page 26: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/26.jpg)
Text and HTML
• Some areas may be eliminated under extreme conditions
  – Commercials
  – Some previews and navigation options
• Often encapsulated in <DIV> tags
• Sometimes identified by ID or class names, e.g. “sidebanner”
  – Especially when using modular design
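Removing blocks identified by ID or class name can be sketched with Python's standard `html.parser`. This is an illustration under assumptions: it handles only `<div>` blocks, relies on reasonably well-formed markup, and the names passed in (such as “sidebanner”) must be supplied by whoever knows the site. The class `BlockRemover` and the function `hide_blocks` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class BlockRemover(HTMLParser):
    """Copy HTML through, dropping <div> blocks whose id or class is listed."""

    def __init__(self, names: Iterable[str]) -> None:
        super().__init__(convert_charrefs=False)
        self.names = set(names)
        self.out: List[str] = []
        self.skip_depth = 0  # nesting depth inside a dropped <div>

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth += 1  # nested <div> inside a dropped block
            return
        if tag == "div":
            attr = dict(attrs)
            labels = set((attr.get("class") or "").split())
            if attr.get("id"):
                labels.add(attr["id"])
            if labels & self.names:
                self.skip_depth = 1
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if not self.skip_depth:
            self.out.append(f"<!{decl}>")


def hide_blocks(html: str, names: Iterable[str]) -> str:
    """Return html without the <div> blocks whose id or class is in names."""
    remover = BlockRemover(names)
    remover.feed(html)
    remover.close()
    return "".join(remover.out)
```

Calling `hide_blocks(page, {"sidebanner"})` drops every `<div>` carrying that class or ID, together with everything nested inside it.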
![Page 27: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/27.jpg)
Summary
![Page 28: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/28.jpg)
Content Adaptation
• Degraded content usually better than exclusion
• Only way to handle flash crowds that overwhelm installed capacity
• Empirical results identify main options
  – Identify and eliminate decorations
  – Compress large images (story, commercial)
  – Embed JavaScript and CSS
  – Hide unnecessary blocks
![Page 29: Empirical Quantification of Opportunities for Content Adaptation in Web Servers](https://reader035.vdocuments.net/reader035/viewer/2022062806/56814e8d550346895dbc305d/html5/thumbnails/29.jpg)
Next Paper Preview
• Implementation in Apache
• Monitor CPU utilization and idle threads to switch between modes
• Use mod_rewrite to redirect URLs to adapted content
• Achieve up to a 10x increase in throughput for extreme adaptation
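Switching between modes based on CPU utilization and idle threads could look roughly like the sketch below. The slides give neither thresholds nor the switching rule, so everything here is assumed: the two-mode design, the hysteresis band (`cpu_high`, `cpu_low`), the `min_idle_threads` floor, and the class name `ModeSwitch`. It is not the Apache implementation the slide refers to.

```python
from dataclasses import dataclass


@dataclass
class ModeSwitch:
    """Two-mode controller with hysteresis; all thresholds are assumed."""
    cpu_high: float = 0.90     # enter adapted mode above this CPU utilization
    cpu_low: float = 0.60      # return to normal mode below this utilization
    min_idle_threads: int = 5  # fewer idle threads also triggers adaptation
    adapted: bool = False

    def update(self, cpu_utilization: float, idle_threads: int) -> bool:
        """Record one sample and return True when adapted content is served."""
        overloaded = (cpu_utilization > self.cpu_high
                      or idle_threads < self.min_idle_threads)
        relaxed = (cpu_utilization < self.cpu_low
                   and idle_threads >= self.min_idle_threads)
        if not self.adapted and overloaded:
            self.adapted = True
        elif self.adapted and relaxed:
            self.adapted = False
        return self.adapted
```

The gap between the two CPU thresholds is there to keep the server from flapping between modes when load hovers near a single cutoff.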