apache architecture

Apache Architecture

How do we measure performance?

• Benchmarks– Requests per Second– Bandwidth– Latency– Concurrency (Scalability)

Building a scalable web server

• handling an HTTP request – map the URL to a resource – check whether client has permission to access the

resource – choose a handler and generate a response – transmit the response to the client – log the request

• must handle many clients simultaneously • must do this as fast as possible

Resource Pools• one bottleneck to server performance is the operating

system – system calls to allocate memory, access a file, or create a

child process take significant amounts of time – as with many scaling problems in computer systems,

caching is one solution

• resource pool: application-level data structure to allocate and cache resources – allocate and free memory in the application instead of using

a system call – cache files, URL mappings, recent responses – limits critical functions to a small, well-tested part of code

Multi-Processor Architectures

• a critical factor in web server performance is how each new connection is handled – common optimization strategy: identify the most commonly-

executed code and make this run as fast as possible – common case: accept a client and return several static

objects – make this run fast: pre-allocate a process or thread, cache

commonly-used files and the HTTP message for the response

Connections

• must multiplex handling many connections simultaneously – select(), poll(): event-driven, singly-threaded – fork(): create a new process for a connection – pthread create(): create a new thread for a connection

• synchronization among processes/threads – shared memory: semaphores – message passing

Select

• select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

• Allows a process to block until data is available on any one of a set of file descriptors.

• One web server process can service hundreds of socket connections

Event Driven Architecture

• one process handles all events • must multiplex handling of many clients and

their messages – use select() or poll() to multiplex socket I/O events– provide a list of sockets waiting for I/O events – sleeps until an event occurs on one or more sockets – can provide a timeout to limit waiting time

• must use non-blocking system calls • some evidence that it can be more efficient than

process or thread architectures

Process Driven Architecture

• devote a separate process/thread to each event – master process listens for connections – master creates a separate process/thread for each

new connection

• performance considerations – creating a new process involves significant overhead – threads are less expensive, but still involve overhead

• may create too many processes/threads on a busy server

Process/Thread Pool Architecture

• master thread – creates a pool of threads – listens for incoming connections – places connections on a shared queue

• processes/threads – take connections from shared queue – handle one I/O event for the connection – return connection to the queue – live for a certain number of events (prevents long-lived

memory leaks)

• need memory synchronization

Hybrid Architectures

• each process can handle multiple requests – each process is an event-driven server – must coordinate switching among events/requests

• each process controls several threads – threads can share resources easily – requires some synchronization primitives

• event driven server that handles fast tasks but spawns helper processes for time-consuming requests

What makes a good Web Server?

• Correctness

• Reliability

• Scalability

• Stability

• Speed

Correctness

• Does it conform to the HTTP specification?

• Does it work with every browser?

• Does it handle erroneous input gracefully?

Reliability

• Can you sleep at night?

• Are you being paged during dinner?

• It is an appliance?

Scalability

• Does it handle nominal load?

• Have you been Slashdotted?– And did you survive?

• What is your peak load?

Speed (Latency)

• Does it feel fast?

• Do pages snap in quickly?

• Do users often reload pages?

Apache the General Purpose Webserver

Apache developers strive for

correctness first, and

speed second.

Apache HTTP Server

Architecture Overview

Classic “Prefork” Model

• Apache 1.3, and

• Apache 2.0 Prefork

• Many Children

• Each child handles one connection at a time. Child

Parent

ChildChild… (100s)

http://httpd.apache.org/docs/2.2/mod/prefork.html

Multithreaded “Worker” Model

• Apache 2.0 Worker

• Few Children

• Each child handles many concurrent connections.

Child

Parent

ChildChild… (10s)

10s of threads

Dynamic Content: Modules

• Extensive API

• Pluggable Interface

• Dynamic or Static Linkage

In-process Modules

• Run from inside the httpd process– CGI (mod_cgi)– mod_perl– mod_php– mod_python– mod_tcl

Out-of-process Modules

• Processing happens outside of httpd (eg. Application Server)

• Tomcat– mod_jk/jk2, mod_jserv

• mod_proxy• mod_jrun

Parent

TomcatChild

ChildChild

Performancetransactions per second

Architecture Hello Big DB

Apache-Mod_perl 324 82 98

Apache-CGI 59 25 6

Architecture: The Big Picture

Child

Parent

ChildChild… (10s)

10s of threads Tomcat

DB

100s of threads

mod_jkmod_rewritemod_phpmod_perl

“MPM”

• Multi-Processing Module

• An MPM defines how the server will receive and manage incoming requests.

• Allows OS-specific optimizations.

• Allows vastly different server models(eg. threaded vs. multiprocess).

“Child Process” aka “Server”

• Called a “Server” in

httpd.conf

• A single httpd process.

• May handle one or more

concurrent requests

(depending on the MPM).

Child

Parent

ChildChild… (100s)

Servers

“Parent Process”

• The main httpd process.

• Does not handle connections itself.

• Only creates and destroys children.

• Shared memory scoreboard to determine who handles connections

Child

Parent

Child

Child

… (100s)

Only one P

arent

“Thread”

• In multi-threaded MPMs (eg. Worker).

• Each thread handles a single connection.

• Allows Children to handle many

connections at once.

Prefork MPM

• Apache 1.3 and Apache 2.0 Prefork

• Each child handles one connection at a time

• Many children

• High memory requirements

• “You’ll run out of memory before CPU”

Prefork Directives (Apache 2.0)

• StartServers

• MinSpareServers

• MaxSpareServers

• MaxClients

• MaxRequestsPerChild

Worker MPM

• Apache 2.0 and later

• Multithreaded within each child

• Dramatically reduced memory footprint• Only a few children (fewer than prefork)

Worker Directives

• MinSpareThreads

• MaxSpareThreads

• ThreadsPerChild

• MaxClients

• MaxRequestsPerChild

Apache 1.3 and 2.0Performance Characteristics

Multi-process,

Multi-threaded,

or Both?

Prefork

• High memory usage

• Highly tolerant of faulty modules

• Highly tolerant of crashing children

• Fast

• Well-suited for 1 and 2-CPU systems

• Tried-and-tested model from Apache 1.3

• “You’ll run out of memory before CPU.”

Worker

• Low to moderate memory usage• Moderately tolerant to faulty modules• Faulty threads can affect all threads in child• Highly-scalable• Well-suited for multiple processors• Requires a mature threading library

(Solaris, AIX, Linux 2.6 and others work well)

• Memory is no longer the bottleneck.

Important Performance Considerations

• sendfile() support

• DNS considerations

sendfile() Support

• No more double-copy• Zero-copy*• Dramatic improvement for static files• Available on

– Linux 2.4.x– Solaris 8+– FreeBSD/NetBSD/OpenBSD– ...

* Zero-copy requires both OS support and NIC driver support.

aio

• Process does not block on – Socket– Disk read or write– semaphores

Sendfilebackend throughput % disk

utilUser+sys

writev 17.59MB 50% 25%

sendfile 33.13MB 70% 30%

aio 50MB 98% 60%

aio-sendfile

44.15MB 90% 40%

DNS RoundRobin

apache architecture

Documents

new process

child process

event master process

process cont

web server process

thread architectures

connection return connection

new thread