TRANSCRIPT
NGINX High-performance Caching
Introduced by Andrew Alexeev
Presented by Owen Garrett
Nginx, Inc.
About this webinar
Content Caching is one of the most effective ways to dramatically improve
the performance of a web site. In this webinar, we’ll deep-dive into
NGINX’s caching abilities and investigate the architecture used, debugging
techniques and advanced configuration. By the end of the webinar, you’ll
be well equipped to configure NGINX to cache content exactly as you need.
BASIC PRINCIPLES OF CONTENT CACHING
Basic Principles
[Diagram: client issues GET /index.html; a cache sitting between the client and the Internet answers on behalf of the origin server]
Used by: Browser Cache, Content Delivery Network and/or Reverse Proxy Cache
Mechanics of HTTP Caching
• Origin server declares cacheability of content
• Requesting client honors cacheability
– May issue conditional GETs
Expires: Tue, 06 May 2014 02:28:12 GMT
Cache-Control: public, max-age=60
X-Accel-Expires: 30
Last-Modified: Tue, 29 Apr 2014 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
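On the origin side, NGINX itself can declare cacheability with headers like the ones above. A minimal sketch (the listen port, path and lifetime are illustrative, not recommendations):

```nginx
# Sketch: an origin server declaring cacheability of static content
server {
    listen 8080;

    location /static/ {
        root /var/www;
        expires 1h;   # emits both Expires and "Cache-Control: max-age=3600"
    }
}
```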
What does NGINX cache?
• Cache GET and HEAD with no Set-Cookie response
• Uniqueness defined by raw URL, or by a configured key:
proxy_cache_key $scheme$proxy_host$uri$is_args$args;
• Cache time defined by
– X-Accel-Expires
– Cache-Control
– Expires
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
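The key can be extended with other request attributes when the raw URL is not enough to identify a response. A sketch (the cookie name "user" is hypothetical):

```nginx
# Sketch: include a session cookie in the cache key so each
# logged-in user gets a separate cache entry
proxy_cache_key $scheme$proxy_host$uri$is_args$args$cookie_user;
```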
NGINX IN OPERATION…
NGINX Config
proxy_cache_path /tmp/cache keys_zone=one:10m levels=1:2 inactive=60m;
server {
listen 80;
server_name localhost;
location / {
proxy_pass http://localhost:8080;
proxy_cache one;
}
}
Caching Process
[Flow diagram: read request → check cache → HIT: respond from cache; MISS: wait (proxy_cache_lock_timeout) or fetch from upstream → response cacheable? → stream to disk]
NGINX can use stale content under the following circumstances:
proxy_cache_use_stale error | timeout | invalid_header |
updating | http_500 | http_502 | http_503 | http_504 |
http_403 | http_404 | off;
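A sketch of enabling stale-content delivery together with the request-collapsing cache lock (the upstream address and timeout value are illustrative):

```nginx
location / {
    proxy_pass http://localhost:8080;
    proxy_cache one;

    # Serve a stale copy while one request refreshes the entry,
    # or when the upstream errors out or times out
    proxy_cache_use_stale error timeout updating;

    # Collapse concurrent MISSes for the same key into one upstream fetch
    proxy_cache_lock on;
    proxy_cache_lock_timeout 5s;
}
```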
Caching is not just for HTTP
• FastCGI – functions much like HTTP
• Memcache – retrieve content from a memcached server (must be prepopulated)
• uwsgi and SCGI
[Diagram: NGINX proxying to HTTP, FastCGI, memcached, uwsgi and SCGI upstreams]
NGINX is more than just a reverse proxy
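The FastCGI cache uses a parallel set of directives to the HTTP proxy cache. A sketch for a PHP-FPM backend (paths, zone size and the upstream address are illustrative):

```nginx
# Sketch: caching FastCGI responses, e.g. from PHP-FPM
fastcgi_cache_path /tmp/fcgi-cache keys_zone=fcgi:10m levels=1:2;

server {
    listen 80;

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_cache fcgi;
        fastcgi_cache_key $scheme$host$uri$is_args$args;
        fastcgi_cache_valid 200 10m;
    }
}
```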
HOW TO UNDERSTAND WHAT’S GOING ON
add_header X-Cache-Status $upstream_cache_status;
MISS – response not found in cache; fetched from upstream (and may then have been saved to cache)
BYPASS – proxy_cache_bypass forced a fetch from upstream (response may then have been saved to cache)
EXPIRED – entry in the cache has expired; fresh content is returned from upstream
STALE – proxy_cache_use_stale takes control and serves stale content from the cache because the upstream is not responding correctly
UPDATING – stale content is served from the cache because the entry is currently being refreshed and proxy_cache_use_stale updating is configured
REVALIDATED – proxy_cache_revalidate verified that the current cached content was still valid (If-Modified-Since)
HIT – valid, fresh content served directly from the cache
Cache Instrumentation
map $remote_addr $cache_status {
    127.0.0.1 $upstream_cache_status;
    default   "";
}
server {
location / {
proxy_pass http://localhost:8002;
proxy_cache one;
add_header X-Cache-Status $cache_status;
}
}
Extended Status
Check out: demo.nginx.com
http://demo.nginx.com/status.html
http://demo.nginx.com/status
HOW CONTENT CACHING FUNCTIONS IN NGINX
How it works...
• NGINX uses a persistent disk-based cache
– OS Page Cache keeps content in memory, with hints from NGINX processes
• We’ll look at:
– How is content stored in the cache?
– How is the cache loaded at startup?
– Pruning the cache over time
– Purging content manually from the cache
How is cached content stored?
• Define the cache key:
proxy_cache_key $scheme$proxy_host$uri$is_args$args;
• Get the content into the cache, then md5 the key:
$ echo -n "httplocalhost:8002/time.php" | md5sum
6d91b1ec887b7965d6a926cff19379b4 -
• Verify it’s there:
$ cat /tmp/cache/4/9b/6d91b1ec887b7965d6a926cff19379b4
proxy_cache_path /tmp/cache keys_zone=one:10m levels=1:2
max_size=40m;
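The on-disk location follows from levels=1:2: the last hex digit of the key's MD5 names the first directory level, and the next two digits name the second. A sketch of that derivation in Python (the helper name is ours, not NGINX's):

```python
import hashlib

def cache_file_path(key: str, root: str = "/tmp/cache") -> str:
    """Map a proxy_cache_key value to its on-disk path for levels=1:2."""
    digest = hashlib.md5(key.encode()).hexdigest()
    level1 = digest[-1]       # last hex character -> first directory level
    level2 = digest[-3:-1]    # preceding two characters -> second level
    return f"{root}/{level1}/{level2}/{digest}"

# Key from the slide: $scheme + $proxy_host + $uri concatenated
print(cache_file_path("httplocalhost:8002/time.php"))
# → /tmp/cache/4/9b/6d91b1ec887b7965d6a926cff19379b4
```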
Loading cache from disk
• Cache metadata is stored in a shared memory segment
• Populated at startup from the cache by the cache loader
– Loads files in blocks of 100
– Works for no longer than 200ms at a time
– Pauses for 50ms, then repeats
proxy_cache_path path keys_zone=name:size
    [loader_files=number]     (default 100)
    [loader_threshold=time]   (default 200ms)
    [loader_sleep=time];      (default 50ms)
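Putting those parameters together, the loader can be tuned so startup I/O stays gentle. A sketch (all values are illustrative, not recommendations):

```nginx
# Sketch: larger batches, a longer work window, and longer pauses
proxy_cache_path /tmp/cache keys_zone=one:10m levels=1:2
                 loader_files=200 loader_threshold=300ms loader_sleep=100ms;
```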
Managing the disk cache
• The cache manager runs periodically, purging files that have been inactive irrespective of their cache time, and deleting files LRU-style if the cache is too big
– Removes files that have not been used within the inactive window (default 10m)
– Removes files if the cache size exceeds max_size
proxy_cache_path path keys_zone=name:size
    [inactive=time]   (default 10m)
    [max_size=size];
Purging content from disk
• Find it and delete it
– Relatively easy if you know the key
• NGINX Plus – cache purge capability
$ curl -X PURGE -D - "http://localhost:8001/*"
HTTP/1.1 204 No Content
Server: nginx/1.5.12
Date: Sat, 03 May 2014 16:33:04 GMT
Connection: keep-alive
X-Cache-Key: httplocalhost:8002/*
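In NGINX Plus, the purge capability is wired up with the proxy_cache_purge directive. A sketch (the map and the listen port are illustrative):

```nginx
# Allow the PURGE method to remove matching cache entries (NGINX Plus only)
map $request_method $purge_method {
    PURGE   1;
    default 0;
}

server {
    listen 8001;

    location / {
        proxy_pass http://localhost:8002;
        proxy_cache one;
        proxy_cache_purge $purge_method;
    }
}
```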
CONTROLLING CACHING
Delayed caching
• Saves on disk writes for very cool caches
proxy_cache_min_uses number;
Cache revalidation
• Saves on upstream bandwidth and disk writes
proxy_cache_revalidate on;
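Both controls can be combined in one location block. A sketch (the threshold of 3 and the upstream address are illustrative):

```nginx
location / {
    proxy_pass http://localhost:8080;
    proxy_cache one;

    # Only write to disk once the same key has been requested 3 times
    proxy_cache_min_uses 3;

    # Refresh expired entries with a conditional If-Modified-Since GET
    proxy_cache_revalidate on;
}
```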
Control over cache time
• Priority is:
– X-Accel-Expires
– Cache-Control
– Expires
– proxy_cache_valid
proxy_cache_valid 200 302 10m;
proxy_cache_valid 404 1m;
Set-Cookie response header means no caching
Cache / don’t cache
• Bypass the cache – go to origin; the result may be cached
• No-cache – if we go to origin, don’t cache the result
• Typically used with a complex cache key, and only if the origin does not send appropriate Cache-Control responses
proxy_cache_bypass string ...;
proxy_no_cache string ...;
proxy_no_cache $cookie_nocache $arg_nocache $http_authorization;
Multiple Caches
• Different cache policies for different tenants
• Pin caches to specific disks
• Temp-file considerations – put the temp path on the same disk as the cache:
proxy_cache_path /tmp/cache1 keys_zone=one:10m levels=1:2 inactive=60s;
proxy_cache_path /tmp/cache2 keys_zone=two:2m levels=1:2 inactive=20s;
proxy_temp_path path [level1 [level2 [level3]]];
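A sketch of two tenants pinned to caches on different disks, each with its temp path on the matching disk (paths, zone sizes and hostnames are illustrative):

```nginx
proxy_cache_path /mnt/disk1/cache keys_zone=one:10m levels=1:2 inactive=60s;
proxy_cache_path /mnt/disk2/cache keys_zone=two:2m  levels=1:2 inactive=20s;

server {
    server_name tenant1.example.com;
    location / {
        proxy_pass http://localhost:8080;
        proxy_cache one;
        # Temp files land on the same disk as this cache
        proxy_temp_path /mnt/disk1/tmp;
    }
}

server {
    server_name tenant2.example.com;
    location / {
        proxy_pass http://localhost:8080;
        proxy_cache two;
        proxy_temp_path /mnt/disk2/tmp;
    }
}
```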
QUICK REVIEW – WHY CACHE?
Why is page speed important?
• We used to talk about the ‘N second rule’:
– 10-second rule (Jakob Nielsen, March 1997)
– 8-second rule (Zona Research, June 2001)
– 4-second rule (Jupiter Research, June 2006)
– 3-second rule (PhocusWright, March 2010)
[Chart: acceptable page-load time in seconds (0–12) declining from Jan-97 to Jan-14]
Google changed the rules
“We want you to be able to get from one page to another as quickly as you turn the page on a book”
Urs Hölzle, Google
The costs of poor performance
• Google: search enhancements cost 0.5s of page load
– Ad click-through rate dropped 20%
• Amazon: artificially increased page load by 100ms
– Customer revenue dropped 1%
• Walmart, Yahoo, Shopzilla, Edmunds, Mozilla…
– All reported similar effects on revenue
• Google PageRank – page speed affects page rank
– Time to First Byte is what appears to count
NGINX Caching lets you
Improve end-user performance
Consolidate and simplify your web infrastructure
Increase server capacity
Insulate yourself from server failures
Closing thoughts
• 38% of the world’s busiest websites use NGINX
• Check out the blogs on nginx.com
• Future webinars: nginx.com/webinars
Try NGINX F/OSS (nginx.org) or NGINX Plus (nginx.com)