Varnish, The Good, The Awesome, and the Downright Crazy By Mike Willbanks Sr. Web Architect Manager NOOK Developer Northeast PHP August 12, 2012

Upload: mike-willbanks

Post on 17-May-2015


Page 1: Varnish Cache

Varnish, The Good, The Awesome, and the Downright Crazy

By Mike Willbanks

Sr. Web Architect Manager

NOOK Developer

Northeast PHP August 12, 2012

Page 2: Varnish Cache

• Talk
  - Slides will be online later!
• Me
  - Sr. Web Architect Manager at NOOK Developer
  - Prior MNPHP Organizer
  - Open Source Contributor (Zend Framework and various others)
  - Where you can find me:
    - Twitter: mwillbanks
    - G+: Mike Willbanks
    - IRC (freenode): mwillbanks
    - Blog: http://blog.digitalstruct.com
    - GitHub: https://github.com/mwillbanks

Housekeeping…

Page 3: Varnish Cache

• What is Varnish
• The Good : Why…
  - The quick, easy and hardly informed way…
• The Awesome : How…
  - VCLs, Directors and more…
• The Crazy : Go…
  - ESI, Purging, VCL C, and VMOD…
• Varnish Command Line Apps
  - varnishtop, varnishstat, etc.

Agenda

Page 4: Varnish Cache

What is Varnish?

• Official Statement
• What the hell it means
• Graphs, oh my!

Page 5: Varnish Cache

“Varnish is a web application accelerator. You install it in front of your web application and it will speed it up significantly.”

Official Statement

Page 6: Varnish Cache

• Varnish allows you to accelerate your website
  - By using memory and keeping in mind cookies, request headers and more…
• It caches pages so that your web server can RELAX!
  - What about my Apache, Tomcat, nginx and (mongrel|thin|goliath…)?
  - Generally caching by TTL + HTTP headers (cookies too!)
• A load balancer, proxy and more…
  - What? … Yes, it can do that!

What The Hell? Tell me!

Page 7: Varnish Cache

• CaringBridge Status Server
  - Getting a message to mobile users.
  - The system is down, or we want to communicate a message to them about some subject… maybe a campaign.
  - The apps and mobile site rely on an API
• Trouble in paradise? Few and far between.
  - Let an API talk to a server…
  - A story on crashing and burning before Varnish.

A General Use Case

Page 8: Varnish Cache

The Graph - AWS

[Figure: four bar charts comparing the Small, X-Large and Small Varnish setups. Panels: Requests, Time, Req/s, Peak Load.]

Page 9: Varnish Cache

The Raw Data

              Small    X-Large               Small Varnish
Concurrency   10       150                   150
Requests      5000     55558                 75000
Time          438      347                   36
Req/s         11.42    58                    585
Peak Load     11.91    8.44                  0.35
Comments               19,442 failed reqs

Page 10: Varnish Cache

The Good – Listen Up!

• Installation
• Documentation
• Finding Existing VCLs

Page 11: Varnish Cache

• RTM: http://goo.gl/hl4Tt
  - Debian: sudo apt-get install varnish
  - EPEL: yum install varnish
    - 6.x only, otherwise you’ll be out of date!
  - WOOT Compiling #git
    - git clone git://git.varnish-cache.org/varnish-cache
    - cd varnish-cache
    - sh autogen.sh
    - ./configure
    - make && make install

Installation

Page 12: Varnish Cache

Varnish Daemon

• varnishd
  - -a address[:port]   listen address for client requests
  - -b address[:port]   backend server address
  - -T address[:port]   management (CLI) interface
  - -s type[,options]   storage type (malloc, file, persistent)
  - -P /path/to/file    PID file
  - Many others; these are generally the most important. The defaults will usually do, with only the default VCL modified (more on it later).

Page 13: Varnish Cache

• Reference Manual
  - https://www.varnish-cache.org/docs/3.0/reference/index.html
• Tutorial – more like a book version of the reference manual
  - https://www.varnish-cache.org/docs/3.0/tutorial/index.html
• Knock yourselves out! There is a ton of documentation
• Yes, this makes happy developers.
  - Documentation is very accurate, read carefully.
  - Focus heavily on VCLs, that is generally what you need.
  - I’m attempting to show you some of how this works, but you will need the documentation to assist you.

Documentation

Page 14: Varnish Cache

• VCLs are available for common open source projects
  - Hi WordPress and Drupal!
    - https://www.varnish-cache.org/trac/wiki/VarnishAndWordpress
    - https://www.varnish-cache.org/trac/wiki/VarnishAndDrupal
  - Examples of all sorts of crazy
    - https://www.varnish-cache.org/trac/wiki/VCLExamples

Existing VCLs – The truly lazy…

Page 15: Varnish Cache

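backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Strip request cookies everywhere except the login/admin pages
    if (!(req.url ~ "wp-(login|admin)")) {
        unset req.http.cookie;
    }
}

sub vcl_fetch {
    # Drop Set-Cookie on responses outside login/admin so they can be cached
    if (!(req.url ~ "wp-(login|admin)")) {
        unset beresp.http.set-cookie;
    }
}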

Wordpress = Bad Slashdot Bad!!!

Page 16: Varnish Cache

The Awesome – Going Places

• VCL
• Directors
• A Few Examples

Page 17: Varnish Cache

VCLs by Diagram…

Page 18: Varnish Cache

• VCL State Engine
  - Each request is processed separately and independently
  - States are isolated but related
  - Return statements exit one state and start another
  - VCL defaults are ALWAYS appended below your own VCL
• VCL can be complex, but…
  - Two main subroutines: vcl_recv and vcl_fetch
  - Common actions: pass, hit_for_pass, lookup, pipe, deliver
  - Common variables: req, beresp and obj
  - More subroutines, functions and complexity can arise depending on conditions.

VCL – Varnish Configuration Language
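
A rough sketch of how return statements move a request from one state to the next, in Varnish 3.0 syntax. The backend address, the /admin rule and the 5 minute TTL are assumptions made up for illustration; they are not from the talk.

```vcl
backend default {
    .host = "127.0.0.1";   # assumed backend address
    .port = "8080";
}

sub vcl_recv {
    # Anything other than GET/HEAD skips the cache and goes to the backend
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    # Hypothetical rule: never cache the admin area
    if (req.url ~ "^/admin") {
        return (pass);
    }
    # Explicit return: the default vcl_recv appended below is never reached
    return (lookup);
}

sub vcl_fetch {
    # Hypothetical rule: responses that set cookies are not cached
    if (beresp.http.Set-Cookie) {
        return (hit_for_pass);
    }
    set beresp.ttl = 5m;
    return (deliver);
}
```

Leaving out the final return in a subroutine lets the request fall through to the default VCL, which is often the safer choice while you are still learning the state engine.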

Page 19: Varnish Cache

• vcl_init – VCL is loaded, no request yet; VMOD initialization

• vcl_recv – Beginning of request, req is in scope

• vcl_pipe – Client & backend data passed unaltered

• vcl_pass – Request goes to backend and not cached

• vcl_hash – call hash_data to add to the hash

• vcl_hit – called on request found in the cache

• vcl_miss – called on request not found in the cache

• vcl_fetch – called on document retrieved from backend

• vcl_deliver – called prior to delivery of cached object

• vcl_error – called on errors

• vcl_fini – all requests have exited VCL, cleanup of VMODs

VCL – Subroutines – breaking it down.
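
One common way to see which path a request took through these subroutines is to tag the response in vcl_deliver. A minimal sketch, assuming Varnish 3.0 and a backend defined elsewhere; the X-Cache header name is only a convention picked for this example.

```vcl
sub vcl_deliver {
    # obj.hits is the number of cache hits for this object;
    # 0 means it was just fetched from the backend
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```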

Page 20: Varnish Cache

• Always Available
  - now – epoch time
• Backend Declarations
  - .host – hostname / IP
  - .port – port number
• Request Processing
  - client – ip & identity
  - server – ip & port
  - req – request information
• Backend Request Preparation
  - bereq – backend request
• Retrieved Backend Response
  - beresp – backend response
• Cached Object
  - obj – cached object, can only change .ttl
• Response Preparation
  - resp – http stuff

VCL - Variables
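
To make the variable groups a little more concrete, here is a small sketch (Varnish 3.0 syntax, backend assumed to be defined elsewhere) that touches one variable from several of them. The 30 second TTL and the X-Served-By header are assumptions for illustration only.

```vcl
sub vcl_recv {
    # client.ip: address of the connecting client
    if (req.http.X-Forwarded-For) {
        set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
    } else {
        set req.http.X-Forwarded-For = client.ip;
    }
}

sub vcl_fetch {
    # beresp: the response just retrieved from the backend
    if (beresp.status == 404) {
        set beresp.ttl = 30s;   # hypothetical short TTL for missing pages
    }
}

sub vcl_deliver {
    # resp: the response about to be sent to the client
    set resp.http.X-Served-By = server.hostname;
    unset resp.http.X-Powered-By;
}
```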

Page 21: Varnish Cache

• hash_data(string) – adds a string to the hash input.
  - Request host and URL are the default input, from the default VCL.
• regsub(string, regex, sub) – substitution on the first occurrence
  - sub can contain \0 through \9 to inject matches from the regex.
• regsuball(string, regex, sub) – substitution on all occurrences
• ban(expression) – ban all objects in cache that match the expression
• ban_url(regex) – ban all objects in cache whose URL matches the regex

VCL - Functions
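
A short sketch of regsub, regsuball and hash_data in use, in Varnish 3.0 syntax with a backend assumed to be defined elsewhere. The "www." stripping, the slash cleanup and the "lang" cookie are hypothetical examples, not something the talk prescribes.

```vcl
sub vcl_recv {
    # regsub: replace the first match only; here, strip a leading "www."
    set req.http.Host = regsub(req.http.Host, "^www\.", "");

    # regsuball: replace every match; here, collapse runs of slashes in the URL
    set req.url = regsuball(req.url, "//+", "/");
}

sub vcl_hash {
    # Same inputs as the default VCL: URL plus Host (or server IP)
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    # Hypothetical extra input: the value of a "lang" cookie, injected via \2
    if (req.http.Cookie ~ "(^|; *)lang=") {
        hash_data(regsub(req.http.Cookie, "^(.*; *)?lang=([^;]*).*$", "\2"));
    }
    return (hash);
}
```

Every extra hash_data call splits the cache into more variants, so only add inputs that really change the page.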

Page 22: Varnish Cache

• Directors allow you to talk to the backend servers
• Directors are a glorified reverse proxy
  - Allows for certain types of load balancing
  - Allows for talking to a cluster

“A director is a logical group of backend servers clustered together for redundancy. The basic role of the director is to let Varnish choose a backend server amongst several so if one is down another can be used.”

Directors

Page 23: Varnish Cache

• Random Director – picks a backend by random number
• Client Director – picks a backend by client identity
• Hash Director – picks a backend by URL hash value
• Round-Robin Director – picks a backend in order
• DNS Director – picks a backend by means of DNS
  - Random OR Round-Robin
• Fallback – picks the first “healthy” backend

Directors – The Types
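
As a sketch of two of these types in Varnish 3.0 syntax: a round-robin director for normal traffic and a weighted random director for a hypothetical /api/ path. The backend names, addresses and weights are all made up for the example.

```vcl
backend web1 { .host = "192.0.2.10"; .port = "8080"; }
backend web2 { .host = "192.0.2.11"; .port = "8080"; }

# Round-robin: web1, web2, web1, web2, ...
director pool round-robin {
    { .backend = web1; }
    { .backend = web2; }
}

# Random: web1 receives roughly three out of four requests
director weighted random {
    { .backend = web1; .weight = 3; }
    { .backend = web2; .weight = 1; }
}

sub vcl_recv {
    if (req.url ~ "^/api/") {
        set req.backend = weighted;
    } else {
        set req.backend = pool;
    }
}
```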

Page 24: Varnish Cache

• To ensure healthy backends, you need to use probing.
  - It really sounds like a colonoscopy for servers; which it is.
• Variables
  - .url
  - .request
  - .window
  - .threshold
  - .initial
  - .expected_response
  - .interval
  - .timeout

Director - Probing
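
A minimal probing sketch in Varnish 3.0 syntax, combined with a fallback director so that web2 is only used when web1 fails its probes. The /health URL, the addresses and all timing values are assumptions chosen for illustration.

```vcl
probe healthcheck {
    .url = "/health";            # hypothetical health-check URL
    .interval = 5s;              # probe every 5 seconds
    .timeout = 1s;               # a probe fails after 1 second
    .window = 5;                 # look at the last 5 probes...
    .threshold = 3;              # ...and require 3 good ones to be healthy
    .initial = 2;                # probes assumed good at startup
    .expected_response = 200;    # HTTP status that counts as good
}

backend web1 { .host = "192.0.2.10"; .port = "8080"; .probe = healthcheck; }
backend web2 { .host = "192.0.2.11"; .port = "8080"; .probe = healthcheck; }

# Fallback: always the first healthy backend in the list
director failover fallback {
    { .backend = web1; }
    { .backend = web2; }
}

sub vcl_recv {
    set req.backend = failover;
}
```

Use .request instead of .url when the probe needs a full custom request, for example a specific Host header.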

Page 25: Varnish Cache

Example VCL Configuration

Page 26: Varnish Cache

The Crazy

• ESI – Edge-Side Includes
• Purging
• VMOD

Page 27: Varnish Cache

• ESI is a small markup language, much like SSI (server side includes), for including fragments (or dynamic content for that matter).
• Think of it as replacing regions inside of a page as if you were using XHR (AJAX) but single threaded.
• Three statements can be utilized.
  - esi:include – include a page
  - esi:remove – remove content
  - <!--esi ... --> – the markers are stripped and the contents processed when ESI runs; without ESI it stays an ordinary comment

ESI – Edge Side Includes

Page 28: Varnish Cache

ESI – By Diagram

Page 29: Varnish Cache

• In vcl_fetch, you must set ESI to be on
  - set beresp.do_esi = true;
  - By default, ESI fragments will still be cached, so add an exclusion if you need it
    - if (req.url == "/show_username.php") { return (pass); }
    - This is a good thing; you may want to cache user information for the right people (i.e. by cookie value) so that you don’t reload it on every request.
  - Varnish refuses to parse content for ESI if it does not look like XML
    - This is the default, so check varnishstat and varnishlog to ensure that it is functioning normally.

Using ESI
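
Put together, the two pieces might look roughly like this in Varnish 3.0 syntax (backend assumed to be defined elsewhere). Placing the exclusion in vcl_recv and limiting ESI to HTML responses are assumptions of this sketch rather than something the slide spells out.

```vcl
sub vcl_recv {
    # Per-user fragment: always fetch it from the backend, never from cache
    if (req.url == "/show_username.php") {
        return (pass);
    }
}

sub vcl_fetch {
    # Only ask Varnish to parse ESI tags in HTML responses
    if (beresp.http.Content-Type ~ "text/html") {
        set beresp.do_esi = true;
    }
}
```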

Page 30: Varnish Cache

<html>
  <head><title>Rock it with ESI</title></head>
  <body>
    <header>
      <esi:include src="/user_header.php" />
      <!-- Don't do this as you'd lose the advantage of varnish -->
      <!--esi
      <?php include 'user_header.php'; ?>
      -->
    </header>
    <section id="main"></section>
    <footer></footer>
  </body>
</html>

ESI – By Example

Page 31: Varnish Cache

• The various ways of purging
  - varnishadm – command line utility
    - It’s the ole finger in the back of the throat
  - Sockets (port 6082) – everyone likes a good socket wrench
    - Sure, Ipecac is likely overkill.
  - HTTP – now that is the sexiness
    - A few headers, nothing forced.

Purging

Page 32: Varnish Cache

varnishadm -T 127.0.0.1:6082 ban req.url == "/foo/bar"

telnet localhost 6082
ban req.url == "/foo/bar"

(Varnish 3.0 renamed the CLI purge command to ban; on 2.x the same lines use purge.)

telnet localhost 80
Response:
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
PURGE /foo/bar HTTP/1.0
Host: bacon.org

Purging Examples
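
The HTTP PURGE request above only works if the VCL handles it. A sketch of that handling in Varnish 3.0 syntax, closely following the pattern in the Varnish 3.0 purging documentation; the ACL name and the 192.0.2.0/24 network are placeholders, and a backend is assumed to be defined elsewhere.

```vcl
# Who is allowed to purge (addresses here are placeholders)
acl purgers {
    "localhost";
    "192.0.2.0"/24;
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed.";
        }
        # Look the object up so vcl_hit or vcl_miss can remove it
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 404 "Not in cache.";
    }
}
```

Without the ACL check anyone who can reach port 80 could empty your cache, so do not skip it.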

Page 33: Varnish Cache

• Distributed Purging… like a sorority party.
  - Use a message queue (or gearman job server)
  - Have a worker that knows about the varnish servers
  - Submit the request to clear the cache asynchronously or synchronously, depending on your use case.
    - Have enough workers to make this effective at purging the cache quickly.
  - This will make it far easier to scale; you can store the servers in a config file, a database or anything else you think is relevant.

Distributed Purging

Page 34: Varnish Cache

• Before getting into VMOD: did you know you can embed C into the VCL for Varnish?
• Want to do something crazy fast or leverage a C library for pre or post processing?
• I know… you’re thinking that’s useless…
  - On to the example; and a good one from the Varnish wiki!

Embedding C in VCL – you must be crazy

Page 35: Varnish Cache

C{
    #include <syslog.h>
}C

sub vcl_something {
    C{
        syslog(LOG_INFO, "Something happened at VCL line XX.");
    }C
}

# Example with using varnish variables
C{
    syslog(LOG_ERR,
        "Spurious response from backend: xid %s request %s %s \"%s\" %d \"%s\" \"%s\"",
        VRT_r_req_xid(sp),
        VRT_r_req_request(sp),
        VRT_GetHdr(sp, HDR_REQ, "\005host:"),
        VRT_r_req_url(sp),
        VRT_r_obj_status(sp),
        VRT_r_obj_response(sp),
        VRT_GetHdr(sp, HDR_OBJ, "\011Location:"));
}C

VCL - Embedded C for syslog – uber sexy

Page 36: Varnish Cache

• Taking VCL embedded C to the next level
• Allows you to extend Varnish and create new functions
• Now, if you are writing modules for Varnish you have a specialty use case!
  - Go read up on it!
  - https://www.varnish-cache.org/docs/trunk/reference/vmod.html

VMOD – Varnish Modules / Extensions

Page 37: Varnish Cache

• The VMOD std is shipped with Varnish; it provides some useful commands
  - toupper
  - tolower
  - set_ip_tos
  - random
  - log
  - syslog
  - fileread
  - duration
  - integer
  - collect

VMOD - std
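
A small sketch of a few std functions in Varnish 3.0 syntax, with a backend assumed to be defined elsewhere. The X-Cache-TTL header and the 120 second fallback are hypothetical; check the vmod_std reference for the exact signatures in your version.

```vcl
import std;

sub vcl_recv {
    # tolower: normalize the Host header so cache keys do not differ by case
    set req.http.Host = std.tolower(req.http.Host);

    # log: write a line into the shared memory log (visible in varnishlog)
    std.log("recv url=" + req.url);
}

sub vcl_fetch {
    # duration: turn a string such as "90s" into a TTL, with a fallback value.
    # X-Cache-TTL is a hypothetical header set by the backend.
    if (beresp.http.X-Cache-TTL) {
        set beresp.ttl = std.duration(beresp.http.X-Cache-TTL, 120s);
    }
}
```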

Page 38: Varnish Cache

Varnish Command Line Apps

varnish, varnishadm, varnishhist, varnishlog, varnishncsa, varnishreplay, varnishsizes, varnishstat, varnishtest, varnishtop

Page 39: Varnish Cache

• What is Varnish doing right now?
• How do I debug what is happening?
  - varnishtop

What is Varnish doing…

Page 40: Varnish Cache

What is Varnish doing…

Page 41: Varnish Cache

• Many times people want to log the requests to a file
  - By default Varnish only stores these in shared memory.
  - Apache Style Logs
    - varnishncsa -D -a -w log.txt
  - This will run as a daemon, in a separate process, and log all of your requests.

Logging

Page 42: Varnish Cache

Logging

Page 43: Varnish Cache

• Need to warm up your cache before putting a server in the queue, or load test an environment?
  - varnishreplay -a host:port -r log.txt (-a, the server to replay against, is required)
    - Replaying logs can allow you to do this. This is great for when you are going to be deploying code to check for performance issues.
  - Although… be careful so that you don’t POST data or create data on people’s accounts. Maybe filter the file first and remove anything that modifies data.

Cache Warmup

Page 44: Varnish Cache

• How to see your cache hit ratios…
  - varnishstat
• Want to parse them from XML so you can create a sexy administration panel?
  - varnishstat -x

Cache Hit Ratios? No Problem

Page 45: Varnish Cache

Cache Hit Ratios? No Problem

Page 46: Varnish Cache

Questions?

These slides will be posted to SlideShare & SpeakerDeck.
  - SpeakerDeck: http://speakerdeck.com/u/mwillbanks
  - Slideshare: http://www.slideshare.net/mwillbanks
  - Twitter: mwillbanks
  - G+: Mike Willbanks
  - IRC (freenode): mwillbanks
  - Blog: http://blog.digitalstruct.com
  - GitHub: https://github.com/mwillbanks