The Monitis Introduction to Website Performance
Publication of Monitis
INTRODUCTION
Websites are standard fare for almost all organizations nowadays. Web applications (webapps)
provide web-based services at a lower cost than the same human-provided service. Cell phone
applications (mobile apps) are increasing at a staggering rate, so much so that users sometimes
judge a cell phone by how many apps are available for it.
Creating and maintaining these web-based resources is no small challenge. There is so much
more to it than writing a few scripts and creating a database table. Today’s development team
must also consider availability, confidentiality, correctness, cost, data integrity, human factors,
integration with other components, maintainability, performance, portability, privacy, quality
assurance, release management, resource usage, scalability, security, service-level agreements,
stability, sustainability, usability, and so much more.
Computing changes almost daily and mobility is increasing at an astounding rate. As society’s
technical skills increase, there is a dramatic shift away from face-to-face interaction in favour of
keyboard/mouse/finger interaction. These trends necessitate reasonable performance. If an
app, webapp, or website dares to be slow, it will quickly find itself ignored, which is its death
knell.
This e-book introduces the reader to the concept of website performance. It is intended for those
who are a little bit technical. Those with greater technical skills may find it to be a useful
consolidation and compact retelling of familiar principles. Those without technical skills will still
find a few points to ponder, and they can easily skip over the more technical passages without
losing the gist of the discussion.
We start by defining performance and the critical timeframe, then ask why this is important. The
latter section will reveal not only that performance is critical, but also that it can significantly
impact an organization’s revenues. The e-book is rounded out by discussions on measuring and
monitoring performance.
A reprint of How to Read a Waterfall Chart is included in an appendix because waterfall charts
are such a ubiquitous tool for performance analysis. The appendix lays out the basics for
reading them and gives a few examples of using them to diagnose performance issues.
Society has rightly raised the issue of gender bias in literature for more than half a century. We
used to use the masculine pronouns (he, his, him) exclusively, just as if women did not exist.
Sadly, we still do not have a pronoun that means people of any gender. The use of “he or she,”
“s/he,” “their” in a singular context, or a random gender at each instance seems cumbersome to
the author, so he has opted to use the masculine pronouns in a non-gender-specific sense. The
reader is asked to please infer that masculine pronouns refer to people rather than men. Ladies,
thank you for your many contributions to the technical world.
WHAT IS PERFORMANCE?
In 2013, Wikipedia said,
“Computer performance is characterized by the amount of useful work accomplished by a
computer system compared to the time and resources used.
“Depending on the context, good computer performance may involve one or more of the
following:
short response time for a given piece of work
high throughput (rate of processing work)
low utilization of computing resource(s)
high availability of the computing system or application
fast (or highly compact) data compression and decompression
high bandwidth / short data transmission time”
The Wikipedia article reflects our tendency to think of performance in terms of the machine and
the measurements we use. There’s a reason for that.
In theory, performance should be defined in terms of the individual who is using the system,
which encompasses the entire user experience. However, this definition includes
factors such as perception, emotion, motivation, physical and mental well-being, and the
availability of other work that can be done while waiting. These factors either cannot be
measured or vary wildly in ways that are not within our control.
The system development community finds it more practical to discuss performance in terms of
factors that are both measurable and controllable. This e-book follows that practice. The more
esoteric factors will make good fodder for another e-book.
Why Should Performance be Defined from the User’s Viewpoint?
Our websites have a purpose. That purpose may be profit or some more altruistic goal. In either
case, success is judged by the website’s users. Without users, a website shrivels up and dies. It
may happen relatively quickly or it may take many months, even years, but happen it will.
The user will compare our website to other websites that offer similar benefits. He will also
consider options that do not use the World-Wide Web and the option of abandoning the task
altogether. Anything that makes the user’s experience mediocre can suggest to him that he take
a peek at other options. Anything that makes that experience annoying, frustrating, or
bothersome demands that he consider those other options. His investigation can easily result in
him abandoning our website and going elsewhere.
For this discussion about performance, we need to ask ourselves how much impact our
website’s performance has on the user’s decision to stay or go. For now, we’ll just assume that it
does affect the users’ decisions. When we get to Why is Performance So Important below, we’ll
see that this assumption is a glaring understatement.
Because success is determined by the users and because poor performance can drive them
away, our definition of performance must be based on the reality of what happens at the usersʼ
machines (i.e., the user experience).
What is the Definition of Performance?
The simplest definition may well be the best definition in this case. Performance is defined
as the user’s perception of being able to get on with what he wants to do without delay.
Delay has many causes, many more than what we think of as performance.
Delay resulting from distraction is not typically included in anyone’s definition of performance, but
it is a serious problem nevertheless. For example, when a user can’t find a parking spot for his
mouse pointer because it triggers a popup in so many places, his train of thought is derailed. He
must devise a solution to his problem, implement the solution, and refocus on what he was
doing. This delays him.
Delay resulting from unnecessary activities is also not typically included in anyone’s definition of
performance, but is also a serious problem. For example, requiring a user to provide excessive
information may be helpful to the marketing department, but the user may see it as irrelevant to
his task and an unnecessary delay.
Delay resulting from a learning curve is a third serious problem not typically included in anyone’s
definition of performance. For example, if a website changes its user interface frequently, every user
must relearn how to interact with the website every time it is updated. Users can resent the time
spent relearning, especially if it comes at an inconvenient time.
This e-book does not address the distraction issue, the unnecessary-activities issue, or
the learning-curve issue, but the reader should give them serious thought anyhow. They make a
difference to the users and are at least partly responsible for some of them leaving. Further
research and blog discussions on these topics are sorely needed.
If these things are not included in the definition of performance, what then is included? Like
most, this e-book limits the definition of performance to only those delays that are directly caused
by computer activity. For example, if the user has to wait while a web page downloads and
renders, that is a performance problem. In fact, that is the one performance problem that is
talked about almost to the exclusion of all others.
What Does This Definition Imply?
Measurements: Since delay figures so prominently in the definition of performance, it is incumbent upon us to
measure it, minimize it, and monitor it. Fortunately, delay is easy to measure. We just start the
clock at the beginning of some task and stop it when the task completes.
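As a minimal sketch (in Python, purely for illustration), the start-the-clock approach looks like this:

```python
import time

def timed(task, *args):
    """Run a task and return (result, elapsed_seconds)."""
    start = time.perf_counter()              # start the clock
    result = task(*args)
    elapsed = time.perf_counter() - start    # stop the clock
    return result, elapsed

# Time a simple computation standing in for "some task".
result, elapsed = timed(sum, range(1_000_000))
print(f"task took {elapsed * 1000:.2f} ms")
```

A real monitor would also store each reading so that trends can be spotted over time.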
Tradeoffs: Performance often finds itself at odds with other objectives. For example, most small businesses
use web hosting companies instead of maintaining their own servers. This limits what the small
business can do to improve performance, but it saves them a rather large capital investment.
This illustrates a tradeoff between performance and cost.
Engagement (the amount of time the user spends at our website) and interaction (keystrokes
and mouse clicks) are important, but they stand in stark contrast to the user’s goal of getting in,
doing what he came to do, and getting out. Engagement and interaction want users to dwell on
the website, but this use of their time may be, from their viewpoint, wasteful. The best way to
engage our users is to give them the functionality they need at the performance level they
demand. Engagement will happen, but in a different form – each visit will be shorter, but the
users will visit us more often.
We create webapps to avoid the high cost of human labour. Why pay an employee to do it when
the customer will do it for free? We also have in mind the desire to control users’ thoughts about
us and our products. We may also have other, more specific goals. In all cases, what we want
may be significantly different from what the users want. Although it may be possible to meet
everyone’s goals, there is often a tradeoff. Knowing and working toward the users’ goals is at
least as important as working toward our own.
The Critical Timeframe: Given the above definition of performance, we see that there is one time in particular when
performance is a concern – the time when the dreaded hourglass is displayed. True, performance
is always a concern because our server is always serving resources and we need to make sure it
doesn’t become a bottleneck, but the hourglass specifically identifies the wait time from the
user’s perspective. And I’m sure the reader has picked up on the key point that the user’s
perspective is the one we need to focus on. From a programming viewpoint, the hourglass is
displayed
1. while the page is loading, and
2. while client-side scripts are executing.
Keep in mind that these timeframes are critical because this is when the user is waiting.
Anything our code does during the critical timeframe, no matter how necessary it may be,
increases the user’s wait time. If it’s only a millisecond or two, it may not matter. However, every millisecond adds to every other millisecond. Separately they may seem inconsequential, but together they can add up to a performance problem. Developers need to analyze each algorithm that executes during the critical timeframe and ask
the question, “Can any of this be done outside the critical timeframe?”
Example: All other things being equal, static web pages load faster than dynamic web pages.
However, we can use static web pages to create the perception of dynamic web pages: During
the critical timeframe, we serve a static web page, but a background process running outside the
critical timeframe recreates the static web page on a regular basis. How often? Often enough
that the users don’t notice that the data is not truly dynamic. Perception is reality!
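A sketch of this technique follows; the render_page function is a hypothetical stand-in for the expensive dynamic work (database queries, templating, and so on):

```python
import time
import pathlib

OUTPUT = pathlib.Path("index.html")  # the static file the web server serves

def render_page() -> str:
    """Hypothetical stand-in for the expensive dynamic work."""
    return f"<html><body>Generated at {time.ctime()}</body></html>"

def regenerate() -> None:
    """Rebuild the static page. Run this from cron or a background
    worker, never while a user is waiting."""
    OUTPUT.write_text(render_page())

regenerate()
# A scheduler entry that runs regenerate() every minute would refresh
# the page often enough that users never notice it is static.
```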
Example: A lot can be done during build time, especially if we automate the build process.
Compress all the text files into various formats and instruct the server to serve the
compressed versions whenever possible.
Minify every image file by reducing the resolution to match the rendering resolution,
reducing the colour depth to a value we consider appropriate, and stripping out all
metadata.
Inline the needed-now components, including scripts, stylesheets, and even images.
Move all CSS to the <head> section.
Check to make sure style sheets, scripts, and other components are downloaded only
once.
Minify style sheets and scripts by stripping out all extraneous white space (including
newlines) and removing all comments.
Enforce shop standards.
...and more
The main point here is that these tasks should not be done while the user is waiting.
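As one example of a build-time task, the text-file compression step might be sketched like this (the asset directory layout is an assumption, and the server must be configured separately to serve the precompressed files):

```python
import gzip
import pathlib

def precompress(root: str) -> None:
    """Write a .gz sibling for every text asset so the web server can send
    Content-Encoding: gzip without compressing while the user waits."""
    for pattern in ("*.html", "*.css", "*.js"):
        for path in pathlib.Path(root).rglob(pattern):
            data = path.read_bytes()
            target = path.parent / (path.name + ".gz")
            target.write_bytes(gzip.compress(data, compresslevel=9))
```

This runs once per build; at request time the server simply picks the ready-made compressed file.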
WHY IS PERFORMANCE SO IMPORTANT?
Once upon a time, there was an eight-second rule. It said that users disappear rather quickly if a
web page takes longer than 8 seconds to load.
In 2009, Akamai Technologies found that “47 percent of consumers expect an e-commerce page
to load in two seconds or less” and “40% of consumers will wait no more than three seconds for
a web page to render before abandoning the site.” That’s more than disappointment; that’s
revenue handed over to the competition.
More recently, Eric Horvitz, a scientist at Microsoft’s research labs, said, “The old two-second
guideline has long been surpassed on the racetrack of Web expectations.” Don’t consider the
two-second rule an objective; consider it a starting point.
Just so you don’t walk away thinking that this advice is based on one or two studies, here are a
few others to ponder:
Shopzilla spent 16 months improving its performance, which dropped its average page
load time from 6 seconds to 1.2 seconds. Revenue increased by 5–12%.
Google and Microsoft both introduced artificial delays to see what would happen.
Microsoft found that increasing page load time by half a second reduced revenue by
1.2%. Google found that user activity declined steadily over the 4 week testing period.
After performance was brought back to pre-testing levels, user activity increased slowly.
Even after 5 weeks, though, user activity had not increased back to its pre-testing levels.
Bing measured a 1.8% drop in queries, a 3.75% drop in clicks, a 4% drop in satisfaction,
and a 4.3% drop in revenue, all from a 2 second delay in loading pages.
Walmart found a sharp decline in conversions as page load time increased (up to 4
seconds).
Gomez found that page abandonment rates rose by 38% when response time rose from
2 to 10 seconds. Their study also showed that more than ⅓ of those lost customers went on to tell others about their experience.
Aberdeen Research Group calculated that a $100,000 per day e-commerce site can
suffer a $2,500,000 annual loss of revenues from a 1 second delay.
AOL found that its visitors who visited the fastest web pages viewed an average of 7½
pages per visit. Visitors who visited the slowest web pages viewed an average of only 5
pages per visit.
MEASURING PERFORMANCE
Some have said, “What can’t be measured can’t be managed.” While that proverb is debatable,
it certainly reflects management’s dependence on measurements. Just as certainly, we know
that measurements can provide good information when used properly.
Measuring how long it takes to do something, then comparing that measurement to
previous readings, tells us about trends within our systems. If we see a time-based
measurement constantly increasing, we can provide a solution before users even realize
there’s a problem. That’s the ultimate in proactivity.
Service level agreements (SLA) specify what is acceptable within a system and what is
not. In some cases, performance measurements identify which SLA promises have (or
have not) been kept.
When faced with a performance issue that requires an immediate fix, measurements can
help us locate the source of the problem. If every subcomponent save one shows normal
readings, we can start our analysis with the subcomponent that has the high reading.
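That comparison can be automated. A sketch, in which the subcomponent names and the 1.5x tolerance are illustrative assumptions:

```python
def suspects(readings, baselines, tolerance=1.5):
    """Return the subcomponents whose current reading exceeds their
    baseline by more than `tolerance` times; that is where to start
    the analysis."""
    return [name for name, value in readings.items()
            if value > baselines[name] * tolerance]

# Hypothetical response times in milliseconds.
baseline = {"database": 20.0, "web server": 50.0, "DNS": 5.0}
current  = {"database": 22.0, "web server": 480.0, "DNS": 6.0}
print(suspects(current, baseline))   # ['web server']
```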
Yes, measurements provide benefits. Considering how easily some measurements can be
implemented, it is hard to understand why so many systems don’t use them.
If performance is not measured and monitored, it can scuttle our ship before we even know the
ship’s leaking. Given the previous discussions about the user response to poor performance
and its impact on the organization’s success, failing to measure performance is just plain
failing. Reactive mode can be disastrous; proactive mode is essential.
Key Performance Indicators
There is one key performance indicator (KPI) that matters most to a business – net income.
Growth by any other means is a house of cards. Without net income, the business shrivels.
Non-profit organizations (NPOʼs) do not use net income as their KPI. They are more interested in
how effectively or efficiently they accomplish their purpose. Different NPO’s define this in
different ways.
Whatever the organization’s KPI’s may be, they do not include website performance. However,
we do need to know which system measurements have the biggest impact on the organization’s
KPI’s. Let’s call these the system’s KPI’s. They will be the first measurements we take.
There is a good body of research tying page load times to revenues, so we usually consider
this the system KPI. It offers the added advantages of being easy to measure and easy to monitor.
Establishing a benchmark for page load times for every page in the system, followed by ongoing
monitoring of those page load times, is an essential first step for every website, webapp, and
mobile app. It is so easily accomplished that not doing it seems foolish.
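A crude benchmark of base-page load time can be taken with nothing more than the standard library. Note that this measures only the HTML download, not the images, scripts, and stylesheets a browser would also fetch:

```python
import time
import urllib.request

def page_load_time(url: str, timeout: float = 10.0) -> float:
    """Fetch a URL and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()                      # wait for the full body
    return time.perf_counter() - start

# Example (assumes network access):
# print(f"{page_load_time('https://example.com/'):.3f} s")
```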
Performance of user transactions is an equally important KPI. We start by defining critical user
transactions that span multiple web pages (e.g., logging in, adding an item to a shopping cart,
searching, paying for an order), then timing how long it takes an automated process to conduct
these transactions.
Benchmarking and monitoring user transactions is just as important as benchmarking and monitoring page load times.
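The timing of a multi-step transaction can be sketched as follows; the steps here are placeholders where a real monitor would drive a browser or issue HTTP requests:

```python
import time

def run_transaction(steps):
    """Execute an ordered list of (name, callable) steps, timing each
    step and the transaction as a whole."""
    timings = []
    start = time.perf_counter()
    for name, step in steps:
        step_start = time.perf_counter()
        step()
        timings.append((name, time.perf_counter() - step_start))
    return timings, time.perf_counter() - start

# Placeholder steps standing in for logging in and searching.
demo = [("log in", lambda: time.sleep(0.01)),
        ("search", lambda: time.sleep(0.02))]
timings, total = run_transaction(demo)
for name, elapsed in timings:
    print(f"{name}: {elapsed * 1000:.1f} ms")
print(f"total: {total * 1000:.1f} ms")
```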
Supporting Performance Indicators
Once KPI monitoring is up and running, attention should turn to the supporting subcomponents
that impact the system’s KPI’s. Databases, web servers, network/Internet traversal, the domain
name system, browser configurations, and much more each support the system in one way or
another. Individually, they can negatively impact the system’s KPI’s because of their own performance issues.
Benchmarking and monitoring the supporting subcomponents gives us our supporting performance indicators (SPI’s). SPI’s provide two important benefits.
1. When our system KPI’s show a performance problem, the SPI’s will often help us identify
the underlying cause of the problem.
2. When the system KPI’s are normal, the SPI’s will sometimes reveal emerging problems
that will affect performance down the road. Being able to resolve an issue before anyone
even knows it exists is the hallmark of a great development team.
Business Metrics
System owners, upper management, and marketing gurus are interested in attention,
engagement, and conversion. Attention measures how many people find out about us.
Engagement measures how long visitors stay and how much they interact with our web pages.
Conversion measures how often visitors do what we want them to do. For e-commerce sites,
conversion is typically measured by the percentage of visitors that place orders.
Business metrics are not performance metrics, so they are not discussed in this e-book. The
reader is reminded, though, that performance is an important topic only when it impacts business
metrics.
MONITORING PERFORMANCE
Characteristics of Monitors
Before discussing the types of monitors, letʼs take a look at their characteristics:
the location of the monitor,
what is being monitored,
what is being reported, and
how the monitor collects its data.
Each type of monitor presented below will be described in terms of these characteristics.
The Location of the Monitor
Monitors can be characterized by the location of the software that is doing the monitoring. This
can reveal important nuances because we get different results depending on where the monitor
is located.
Monitors in different places measure different things. For example, to see the Internetʼs impact
on a pageʼs performance, compare measurements from a location near the server and a location
near the user. The difference tells the story.
Internal Monitors run on the same machine as the thing that is being monitored. This allows us
to focus on a particular machine to identify localized issues.
Same-Network Monitors are on the same network as the thing that is being monitored. Both are
typically behind the same firewall. This allows us to focus our attention on the local network to
the exclusion of the vagaries of the Internet.
Cloud-Based Monitors (not to be confused with cloud monitors) run on some machine
somewhere on the Internet. They are typically not on the same network or intranet as the thing
that is being monitored. This allows us to continue monitoring even when one of our machines
or our entire network goes down. Using a cloud-based monitoring service (e.g., Monitis) lets us
focus on our webapps rather than on our monitoring system.
A good monitoring system will offer a variety of locations throughout the world.
Real-User Monitors (RUM) run on the actual users’ machines. This gives true end-to-end
monitoring by including the users’ environment, connectivity, and latency in the measurements.
Monitors in other locations help localize performance issues and can be easier to set up, but
only RUM gives a true picture of the user experience.
What is Being Monitored
We can also characterize monitors according to what it is they monitor. This is a never-ending
list because anything can be monitored. Some of the more common items are:
web pages - monitored to see if they are there, to see if they have changed, to see how
long it takes to load them, etc.,
transactions (multiple steps a user takes when interacting with our webapp; e.g., logging
in, searching, placing an order) - monitored to see if a step fails, to see how long it takes,
etc.,
devices (e.g., servers, switches, routers, gateways, firewalls),
protocols (e.g., DNS, FTP, HTTP, HTTPS, IMAP, POP3, SNMP, SSH),
databases,
system events and log files,
drives,
memory,
CPUʼs,
connectivity,
bandwidth,
web traffic (e.g., hit counters, paths through our system, client IP addresses, user agents,
bounce rates), and
monitors (i.e., monitors that monitor other monitors [Quis custodiet ipsos custodes?]).
What is Being Reported
Monitors can be characterized by the data they collect. This can be as simple as a boolean OK
or NOT OK (often abbreviated to NOK); a measure of elapsed time; or some character string.
A delta monitor can watch a resource and indicate every time the resource changes. This is a
handy way to watch for hacked web pages.
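A minimal delta monitor can be built from a content hash; everything here is standard-library Python, and the URL handling is illustrative:

```python
import hashlib
import urllib.request

def fingerprint(url: str) -> str:
    """Return the SHA-256 digest of a resource's current content."""
    with urllib.request.urlopen(url) as response:
        return hashlib.sha256(response.read()).hexdigest()

def changed(url: str, last_digest: str) -> bool:
    """True if the resource no longer matches the digest we recorded
    on the previous check."""
    return fingerprint(url) != last_digest
```

Store the digest after each check; a surprise True for a page that should be static is worth investigating.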
Although not as common, a data harvesting monitor can extract data from some resource and
store it for later retrieval. This lets us build a history of changes to dynamic resources.
How the Monitor Collects its Data
Monitors can be characterized by how they function.
Passive monitors access data by sniffing network activity. They are not used very often to
measure website performance.
Active monitors, also known as synthetic monitors, request resources from a server just as if they
were real users accessing the website. They either simulate user agents (e.g., browsers, cell
phones) or use scripting with real user agents. A monitoring service typically provides multiple
user agents and versions.
Monitoring agents are code that we install or embed. They can be installed on our server or
another of our machines, or embedded into web pages that are sent to users. The agent acts as
a monitor and stores its data externally (e.g., by sending the data to an external monitoring
system or a database).
Types of Monitors
Uptime Monitors
What is Monitored: the ability to connect
What is Reported: OK/NOK and response time
Location: cloud-based
Uptime monitors attempt to connect to a server through a specified protocol. If a connection can
be established, the monitor records OK and logs how long it took to get a response. If a
connection cannot be established, the monitor records NOK.
At the time of writing, Monitis provided uptime monitors for HTTP, FTP, DNS, MySQL, UDP,
IMAP, SIP, HTTPS, ping, SSH, TCP, POP3, SMTP, and SOAP.
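The core of an uptime monitor is small. A sketch for the TCP case (protocol-specific checks such as DNS or SMTP would add a protocol exchange on top of the connection):

```python
import socket
import time

def uptime_check(host: str, port: int, timeout: float = 5.0):
    """Try to open a TCP connection; return ('OK', response_seconds)
    on success or ('NOK', None) on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "OK", time.perf_counter() - start
    except OSError:
        return "NOK", None

# Example: check an HTTPS port (assumes network access).
# print(uptime_check("example.com", 443))
```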
Server/Device Monitors
What is Monitored: various system and environmental resources
What is Reported: OK/NOK, SLA compliance, others depending on the resource
Location: on the device or network being monitored
Server/device monitors are installed on the machine or network that is to be monitored. They
poll system devices and/or read system measurements on a regular basis and report their
findings back to the external monitoring system.
At the time of writing, Monitis provided server/device monitors for CPU’s, memory, Linux load,
drives, processes, system events, disk I/O, bandwidth, SNMP, ping, HTTP, and HTTPS.
Application Monitors
What is Monitored: supporting software installed on the server
What is Reported: OK/NOK, SLA compliance, others depending on the resource
Location: on the same machine as the application
Application monitors are installed on the machine to be monitored. They can query the target
software from a separate process or be installed as a module/extension of the target. Database
management systems are the most commonly monitored applications because they are essential
to almost all webapps.
At the time of writing, Monitis provided application monitors for Java/JMX, MySQL, and MSSQL.
End-User Monitors: Page Load Monitors
What is Monitored: the loading of a web page, including all downloads and script execution
What is Reported: begin, end, and elapsed time for each component downloaded
Location: cloud-based
Page load monitors load a web page and all of its components, keeping track of the begin and
end times for each phase of loading each component. Graphing the results in a waterfall chart
helps us identify slow resources and blocking, which we then work to minimize.
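To make the idea concrete, here is a toy text-mode waterfall built from (name, begin, end) tuples; the timings are invented for illustration:

```python
def waterfall(components, width=40):
    """Print a toy text waterfall chart from (name, begin_ms, end_ms)
    tuples, as a page load monitor might report them."""
    end_max = max(end for _, _, end in components)
    scale = width / end_max
    for name, begin, end in components:
        pad = " " * int(begin * scale)                  # wait before start
        bar = "#" * max(1, int((end - begin) * scale))  # time downloading
        print(f"{name:>12} |{pad}{bar}  {end - begin:.0f} ms")

# Invented timings for one page load.
waterfall([("index.html", 0, 120),
           ("style.css", 120, 180),
           ("app.js", 120, 300),
           ("logo.png", 180, 260)])
```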
End-User Monitors: Transaction Monitors
What is Monitored: transactions
What is Reported: OK/NOK, elapsed time, failure points
Location: cloud-based
Transaction monitors simulate user transactions on the specified server. A transaction is an
ordered list of mouse clicks and keyboard events defined by the developer, usually to navigate
through multiple web pages. Examples of transactions: logging in, searching a database,
adding an item to a shopping cart, signing a petition, sending an e-mail.
There is a tendency to rely only on page load monitors because they are simple to implement
and they tell us about most problems. However, they do not tell us if a multi-page sequence of
steps generates a correct response in an acceptable time, nor are they that much simpler than
transaction monitors.
End-User Monitors: Web Traffic Monitors
What is Monitored: client requests for a specified web page
What is Reported: details of each hit
Location: agent/embed
Web traffic monitors are installed on the server or embedded into web pages. They can log the
timestamp every time the page is served, along with the client IP address and port, the referring
page, and the session ID. This information can be used to identify paths users take through the
system and pages with high bounce rates. In its simplest form, we can graph the number of hits
per unit of time.
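Graphing hits per unit of time starts with counting them. A sketch, assuming a hypothetical hit-log format of "ISO-timestamp client-ip path":

```python
from collections import Counter
from datetime import datetime

def hits_per_minute(log_lines):
    """Count hits per minute from log lines of the hypothetical form
    'ISO-timestamp client-ip path'."""
    counts = Counter()
    for line in log_lines:
        stamp = line.split()[0]
        minute = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:%M")
        counts[minute] += 1
    return counts

log = ["2024-05-01T10:00:05 10.0.0.1 /index.html",
       "2024-05-01T10:00:40 10.0.0.2 /index.html",
       "2024-05-01T10:01:02 10.0.0.1 /cart"]
print(hits_per_minute(log))
```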
Cloud Monitors
What is Monitored: cloud-based virtual machines
What is Reported: number of instances, various measurements of each instance
Location: installed on the virtual machines
Cloud monitors are not the same as cloud-based monitors (described above). Cloud monitors
monitor cloud-based virtual computing environments provided by vendors. An installed agent
monitors the entire virtual environment and reports back to the external monitoring system.
At the time of writing, Monitis provided monitors for Amazon EC2, Rackspace, and GoGrid.
Real-User Monitors
What is Monitored: usually page load
What is Reported: begin/end/duration times
Location: embed
Real-User Monitors are embedded into web pages, usually to capture timing data. Newer ones
use the World-Wide Web Consortiumʼs PerformanceTiming interface to capture the timing data
that was created by the browser.
Custom Monitors
What is Monitored: anything you want
What is Reported: anything you want
Location: anywhere you want
Custom Monitors are monitors you create yourself, so they can do whatever you want them to
do.
Monitis provides an API for uploading results from custom monitors to the Monitis dashboard.
Contributors have created software development kits (SDKʼs) to further simplify development in
various languages.
What Should I Monitor?
#1 - Uptime: This is the most important metric because it tells whether the server is even there.
Think about it this way — a website constantly going down is the same thing as a shop being
closed at random times. We can lose a lot of customers.
#2 - Page Load: This metric is crucial because people have very short attention spans (a
couple of seconds). If web pages donʼt load quickly enough, we lose potential customers. Also
keep in mind that Google lowers the rankings of pages that load slowly.
#3 - Transactions: Identify and monitor key transactions in the system. Itʼs not enough to
know that the server is there and responding. We must also know that key transactions can be
completed. After all, the user experience is what counts.
#4 - Servers: Keep a close eye on CPU, RAM, storage, bandwidth, and application processes.
It not only helps prevent upcoming faults in the system, but also indicates whether we need to
upgrade resources.
#5 - Applications: Have you ever seen a web application without a database? No, right?
That’s because without databases, we have nowhere to store the information that drives the
business. Database monitoring helps us keep the database performing optimally. Not only
databases, though. We should be monitoring whatever applications our site relies on.
#6 - Networks: Monitor with SNMP, UDP, PING, TCP, SSH, and other protocols to make sure
the network is healthy. In addition, check for cross-office WAN connectivity: telnet, HTTP,
intranet, extranet, routers, and switches.
Notifications
A monitoring system stores its results and provides an interface to let us see those results. For
example, the raw data may be available in a table or it may be presented graphically. The better
systems provide both interfaces, and may provide others.
Seeing results from the past allows us to analyze what has happened. This is especially useful
after changes have been implemented or when problems surface. However, monitoring systems
that let us know about problems as they happen provide an additional benefit. They give us an
opportunity to react to the problem without delay. There is no need to wait for a user to send in
an error report. There is no need to wait for the help desk to pass the message along to the
development team. All of this can be bypassed if the monitoring system sends us an SMS
immediately upon the problem rearing its ugly head.
Notification is most often by SMS or e-mail, but can be by telephone, XMPP, HTTP callback,
instant messaging, fax, pager, Skype, RSS feed, or any other medium.
Of course, we want to differentiate between problems that should wake us up in the middle of the
night vs. those that can be dealt with the next day vs. those that are a lower priority. Monitoring
systems respond to this need by letting us set the bar for a metric and select the type of
notification. We can use this feature to specify less timely notifications (e.g., e-mail) for smaller
problems and more timely notifications (e.g., SMS) for critical problems.
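The idea of setting the bar per metric and choosing the type of notification can be sketched as a small routing function. The severity names, channels, and thresholds here are all made up for illustration:

```python
# Hypothetical mapping from problem severity to notification channel.
CHANNELS = {
    "critical": "sms",    # wake someone up
    "major": "email",     # deal with it the next day
    "minor": "none",      # post facto study only
}

def pick_channel(metric_value, warn_at, crit_at):
    """Compare a metric against two bars and choose a notification channel."""
    if metric_value >= crit_at:
        return CHANNELS["critical"]
    if metric_value >= warn_at:
        return CHANNELS["major"]
    return CHANNELS["minor"]
```

The point is not the particular numbers but that less timely channels handle smaller problems and more timely channels handle critical ones.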
Not all measurements need to trigger notifications. We may decide that certain measurements
are for post facto study only. We do not need notifications for these.
One common problem with notifications is their overuse. If notifications are received too
frequently for problems that can be postponed, the technical staff may ignore them or skim
through them without due diligence. Repetitious notifications also fall into this category. One or
two repetitions may be useful to ensure the message was delivered, but more than that may
actually cause grief for the response team. It is wise to use notifications sparingly, but still often
enough that we are made aware of higher priority issues as they develop.
What Triggers a Notification?
The development team defines which measurements trigger notifications. For example, if the
system is not available for more than one minute, we can send an e-mail to our help desk. If the
problem continues for more than fifteen minutes, we can send an SMS to one of our problem
determination specialists.
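The help-desk example above can be sketched as a small escalation function. The one-minute and fifteen-minute bars and the two targets follow the example in the text; everything else is illustrative:

```python
def escalation_targets(outage_minutes):
    """Return (channel, recipient) pairs to notify for an outage of this length.
    Thresholds follow the example in the text; names are illustrative."""
    targets = []
    if outage_minutes > 1:
        targets.append(("email", "help desk"))
    if outage_minutes > 15:
        targets.append(("sms", "problem determination specialist"))
    return targets
```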
Service-level agreements often set the bar for page load times. A notification can be sent
whenever a page loads more slowly than the SLA requires, or perhaps if it loads consistently
slowly over a 15-minute period.
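The “consistently slow” test can be sketched as a check over a sliding window of measurements. The one-sample-per-minute assumption is ours, for illustration:

```python
def consistently_slow(load_times, sla_seconds, window=15):
    """True if every one of the most recent `window` samples breaches the SLA.
    load_times is assumed to hold one page-load measurement per minute."""
    recent = load_times[-window:]
    return len(recent) == window and all(t > sla_seconds for t in recent)
```

A single slow sample does not fire; only a sustained breach over the whole window does, which keeps the notification from being noisy.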
Good notification systems give us the flexibility to define notifications in a variety of ways. This
lets us create the notifications that are best for our system.
CONCLUSION
Website performance is the user’s perception of being able to get on with what he wants to do
without delay. Anything that leaves the user tapping his fingers while staring at the computer
screen is a performance issue. It is not limited to page load time, although that is one of its main
components.
Performance is not the only objective. Website development teams must also consider
availability, confidentiality, correctness, cost, data integrity, human factors, integration with other
components, maintainability, performance, portability, privacy, quality assurance, release
management, resource usage, scalability, security, service-level agreements, stability, sustainability,
usability, and much more. Performance considerations can conflict with any of these other
objectives, which then forces a tradeoff. Teams should be encouraged to prioritize performance
as much as they can.
Performance’s critical timeframe is any time the user is waiting for the computer. Anything the
code does during the critical timeframe, no matter how necessary it may be, contributes to
performance issues. Opportunities often exist to move processing to the build process or to batch
jobs. These tasks can be scheduled outside the critical timeframe, and often at the system’s
quiet times.
Waterfall charts are a key tool in analyzing performance. They graphically show each resource
that is downloaded, when it is requested, when it starts to arrive, and when it is fully available.
Performance analysts can see at a glance which components are contributing to performance
problems.
Measurements are imperfect because they do not capture performance in its entirety, especially
since the word perception is included in the definition of performance. However, measurements
are far superior to doing nothing. Monitoring every page’s load time on an ongoing basis is
essential. Individual multi-page transactions also need to be continuously watched. Other
measurements can track the various subcomponents of the system to let us know where
performance problems originate. All of these can be automated easily with Monitis’ cloud-based tools.
Slow web pages have been shown to impact conversion rates, interaction, engagement, and
(more importantly) revenue. This means two things: #1) Ignoring or lowering the priority of web
performance reduces a company’s net income. #2) Focusing on website performance
presents an opportunity to increase revenue.
Research has shown, time and again, that minute changes to a web page’s speed can cause
dramatic changes in the user’s behaviour. The well-known two-second rule says web pages
should load in under two seconds. Some say two seconds should not be considered a goal, but
rather a starting point.
APPENDIX A – HOW TO READ A WATERFALL CHART
The first step in addressing any problem is to gather information. For website performance
problems, waterfall charts are a handy tool.
What is a Waterfall Chart?
A waterfall chart is a bar graph that shows the individual and cumulative effects of sequentially
introduced values. In website performance analysis, each bar represents one component on a
web page and the entire chart represents the fetching and rendering of the page.
What Do The Horizontal Bars Represent?
Each request for each component goes through these stages (although some stages will be
skipped if they are unnecessary):
1. wait for the browser – The browser may have to wait for some other component to
download before it can begin downloading the requested component. If it’s waiting for a
script, it may be waiting for both the download and the execution of the script. In both
cases, this is called blocking.
2. look up the IP address – If a domain name is used to specify the requested component,
the browser must find the IP address for that domain. It uses the Internet’s domain name
system (DNS) to do this.
3. connect – The browser uses the socket layer to establish a connection to the server.
4. shake hands – If the component is protected by SSL (public/private key encryption), the
browser and server must “shake hands” before a connection can be established.
5. send the request – The browser sends the request to the server.
6. wait for the first byte – The browser waits for the server to send the first byte.
7. receive – The server sends the content and the browser receives it.
8. render – The browser writes the words and draws the images in its viewport. The
content becomes visible to the end-user during this stage.
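To make the stages concrete, we can model one component’s bar as a list of (stage, duration) pairs. The stage names and timings below are invented for illustration:

```python
# One component's bar, as (stage, milliseconds) pairs. Illustrative values only.
STAGES = [
    ("blocked", 40), ("dns", 25), ("connect", 30), ("ssl", 55),
    ("send", 5), ("wait", 120), ("receive", 200), ("render", 35),
]

def total_time(stages):
    """Sum the stage durations; this is the component's bar length on the chart."""
    return sum(ms for _, ms in stages)

def longest_stage(stages):
    """The stage that dominates this component's bar."""
    return max(stages, key=lambda s: s[1])[0]
```

Reading a waterfall chart largely amounts to asking, per component, which stage dominates its bar.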
The browser can handle more than one component at a time. For example, it can receive
several components at once or it can render one component while sending the request for a
different component. However, for each individual component, the order of the above actions is
mostly fixed.
A waterfall chart’s horizontal bars present the download/render process in a visual manner.
Each bar represents one component. Its length represents the amount of time used (i.e., the
x-axis represents time). Different colours are used to represent the different stages.
In the waterfall chart pictured at the beginning of this appendix, light blue represents the DNS
lookup, dark blue represents establishing a connection, peach represents a blocked state,
orange represents sending a request, pink represents waiting for the first byte, red represents
the downloading/receiving stage, light green represents the creation of the DOM tree, and dark
green represents the time to draw the images and text.
Some waterfall charts use slightly different terminology (e.g., look up the IP address may be
called DNS or DNS lookup), but they mean the same thing. Some charts combine multiple
consecutive stages together into one (e.g., the SSL handshaking may be included in
the connect stage). Some charts may not show the first stage.
The list of stages above implies that each stage ends before the next stage begins. That is not
necessarily true. For example, a browser can start rendering before it finishes receiving all the
data. A waterfall chart may not show this overlap and it may show the overlapping time period
as either the preceding stage or the following stage. It’s another case where you need to check
the documentation.
If one component blocks another, the waiting component’s bar will start near the end of the first
component’s bar. The opposite is not true, though. If a bar begins at the end of some other bar,
it may not have been waiting on that other component.
What Do The Vertical Lines Represent?
Some waterfall charts, like the above, include variously-coloured vertical lines. These lines
represent the occurrence of some event. For example, vertical lines can be used to represent the
point in time at which:
1. the first byte is received,
2. the HTML and JavaScript are fully parsed and deferred scripts have finished executing
(i.e., JavaScript’s DOMContentLoaded event),
3. the document is fully loaded (i.e., JavaScript’s onLoad event), or
4. the browser starts to render the text and images.
The vertical lines of different waterfall charts may represent different events, so check the
legend. Note, too, that this information may not be presented as vertical lines on the waterfall
chart, but may be present elsewhere on the page.
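If a tool exposes these event timestamps as raw data, converting them to offsets from the start of navigation is straightforward. A sketch, assuming a simple dictionary of absolute timestamps in milliseconds (the key names loosely follow the browser’s timing events, but the data shape is ours):

```python
def event_offsets(timing):
    """Given absolute event timestamps (ms), return each event's offset
    from navigationStart, which is what a waterfall chart plots."""
    start = timing["navigationStart"]
    return {k: v - start for k, v in timing.items() if k != "navigationStart"}
```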
How Can a Waterfall Chart Help Me Fix a Performance Problem?
The following are things you may observe about individual components on a waterfall chart,
along with what they mean and what you can do about them:
Waiting for the browser takes too long: If this is because of non-browser processes that
are hogging the CPU, terminate processes that don’t need to be running or migrate to a bigger,
faster whiz-bang computer. If this is because the browser is doing other things, change the order
in which the components are downloaded, update the browser, or move to a faster browser. If
this is because of blocking, defer loading less critical components until after the document is fully
loaded. Also use concurrency as much as possible.
Looking up the IP address takes too long: The domain name system is slow. Avoid
lookups by using IP addresses instead of domain names, and set long TTLs so resolved records
are cached for as long as possible. Moving the DNS server physically closer to the browser may
also help.
SSL handshaking takes too long: Avoid SSL unless it is absolutely essential. We don’t
need to encrypt every component on a secure web page, do we? Example: Where’s the sense
in encrypting the company’s logo?
Connecting takes too long: This may be a network or Internet problem, or it may be a sign
of a busy server.
Sending a request takes too long: This may be a network or Internet problem, or it may be
a sign of a busy server.
Waiting for first byte takes too long: If this phase takes too long and other phases are
lightning fast, look at the server first. Look to see if the code or database is causing
performance problems. It may also indicate a networking or Internet problem, though, so consider
both possibilities.
Receiving content takes too long: This usually indicates a large quantity of data. Reduce
the number of bytes as much as possible. Smaller is better. Don’t assume that splitting the
content into multiple download streams will help, though. Quite the opposite: downloading more
components will incur additional overhead, which may increase the total download time.
Also look into minification, more compact algorithms, and eliminating redundancy. Finally, don’t
forget to keep images small and compress everything that can be compressed.
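Compression is often the cheapest of these wins. A quick sketch of how much a repetitive HTML payload shrinks under gzip (Python’s standard gzip module stands in here for the server’s compression; the payload is made up):

```python
import gzip

def compressed_size(payload: bytes) -> int:
    """Bytes on the wire if this payload is gzip-compressed before sending."""
    return len(gzip.compress(payload))

# Repetitive markup compresses extremely well.
payload = b"<p>hello world</p>" * 500
```

Text-based resources (HTML, CSS, JavaScript) typically compress to a fraction of their original size, which directly shortens the receive stage.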
The distance between the client and the server impacts this phase and the previous five phases.
Consider storing components on machines that are physically closer to your end-users.
Rendering takes too long: It’s time to look more closely at DOM performance tips. [As a
general rule, there should be no dynamic (script-based) changes to the DOM tree before the
document reaches the point of interactivity.]
Looking at the forest instead of the trees can be helpful. Take a step back and look at the chart
as a whole. If you see the following, here’s what you can do about it:
There are too many components: Each component has its own overhead. As shown
above, there is much more than data transfer happening here. Multiple components can be
joined together into a single data stream. Example: Multiple JavaScripts can be joined together
into a single JavaScript file.
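Joining scripts can be as simple as concatenating their text, with a separator that guards against a missing statement terminator at the end of one file. A minimal sketch (real bundlers do much more):

```python
def combine_scripts(sources):
    """Join several script bodies into one data stream. The semicolon
    guard keeps one file's missing terminator from breaking the next."""
    return ";\n".join(sources)
```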
One bar is longer than the others: The longest bar is for the component that takes the most
time. Re-evaluate it according to the above criteria. Be extra-finicky for this one. Small is your
friend; big is your enemy.
The chart shows a staircase formation: If the chart is a staircase, components are being
processed serially. That’s about as bad as it can get. Look into performance tips that increase
parallelization.
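A staircase is easy to detect mechanically: every bar starts at or after the previous bar’s end. A sketch, treating each bar as a (start, end) pair in milliseconds:

```python
def is_staircase(bars):
    """bars: list of (start_ms, end_ms) per component, in request order.
    True when every component starts only after the previous one ends,
    i.e. downloads are fully serial."""
    return all(bars[i][0] >= bars[i - 1][1] for i in range(1, len(bars)))
```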
All (or almost all) DNS lookups have non-zero times: If multiple components from the
same domain are all looking up IP addresses in the DNS, then caching for that domain name has
been turned off. Turn it back on. Better yet, avoid DNS lookups completely by serving
components from IP addresses instead.
All (or almost all) components have non-zero connect times: Every component is
establishing its own connection instead of reusing existing connections. Connection persistence
is probably turned off at the server. Turn it on.
The start render line is too far away from zero: This line tells us how long it takes before
the end-user gets the first bit of visual feedback. If it takes too long, the user may click the
refresh button, which restarts the entire process. [Worse yet, he may just give up.] Defer every
possible script execution and component download until after onLoad fires, then
download/execute in the order that gets the end-user to the point of interactivity as soon as
possible.
The document complete line is too far away from zero: Perhaps some of the previously-
loaded components can be deferred until after the document is fully loaded? There are several
techniques available to make this happen.
What Are a Waterfall Chart’s Drawbacks?
A single waterfall chart taken in isolation can be misleading. Factors that affect performance can
be dramatically different with a different user agent or even one minute later on the same
machine. Decisions should never be made on the basis of a standalone waterfall chart.
A waterfall chart for the first access of a web page will be dramatically different from subsequent
accesses because the subsequent accesses will use whatever caching is available. This is only
a drawback if the reader is not aware of the difference. It’s a benefit to see and compare both
waterfall charts.
Cloud-based waterfall charts do not measure performance at your end-users’ computers. Real
User Monitoring (RUM) does, but it will present different data at different times because of the
dynamic nature of the Internet, the server, and the user’s computer. Performance is subject to
so many factors that averages alone may be meaningless.
A Free Cloud-Based Waterfall Chart
Monitis’ Full Page Load Tool generates waterfall charts. It’s as simple as giving it a URL and
clicking a button, but there are several non-default options, too. And best of all, it’s free!