The Monitis Introduction to Website Performance
Publication of Monitis
INTRODUCTION
Websites are standard fare for almost all organizations nowadays. Web applications (webapps)
provide web-based services at a lower cost than the same human-provided service. Cell phone
applications (mobile apps) are increasing at a staggering rate, so much so that users sometimes
judge a cell phone by how many apps are available for it.
Creating and maintaining these web-based resources is no small challenge. There is so much
more to it than writing a few scripts and creating a database table. Today’s development team
must also consider availability, confidentiality, correctness, cost, data integrity, human factors,
integration with other components, maintainability, performance, portability, privacy, quality
assurance, release management, resource usage, scalability, security, service-level agreements,
stability, sustainability, usability, and so much more.
Computing changes almost daily and mobility is increasing at an astounding rate. As society’s
technical skills increase, there is a dramatic shift away from face-to-face interaction in favour of
keyboard/mouse/finger interaction. These trends necessitate reasonable performance. If an
app, webapp, or website dares to be slow, it will quickly find itself ignored, which is its death
knell.
This e-book introduces the reader to the concept of website performance. It is intended for those
who are a little bit technical. Those with greater technical skills may find it to be a useful
consolidation and compact retelling of familiar principles. Those without technical skills will still
find a few points to ponder, and they can easily skip over the more technical passages without
losing the gist of the discussion.
We start by defining performance and the critical timeframe, then ask why this is important. The
latter section will reveal not only that performance is critical, but also that it can significantly
impact an organization’s revenues. The e-book is rounded out by discussions on measuring and
monitoring performance.
A reprint of How to Read a Waterfall Chart is included in an appendix because waterfall charts
are such a ubiquitous tool for performance analysis. The appendix lays out the basics for
reading them and gives a few examples of using them to diagnose performance issues.
Society has rightly raised the issue of gender bias in literature for more than half a century. We
used to use the masculine pronouns (he, his, him) exclusively, just as if women did not exist.
Sadly, we still do not have a pronoun that means people of any gender. The use of “he or she,”
“s/he,” “their” in a singular context, or a random gender at each instance seems cumbersome to
the author, so he has opted to use the masculine pronouns in a non-gender-specific sense. The
reader is asked to please infer that masculine pronouns refer to people rather than men. Ladies,
thank you for your many contributions to the technical world.
WHAT IS PERFORMANCE?
In 2013, Wikipedia said,
“Computer performance is characterized by the amount of useful work accomplished by a
computer system compared to the time and resources used.
“Depending on the context, good computer performance may involve one or more of the
following:
short response time for a given piece of work
high throughput (rate of processing work)
low utilization of computing resource(s)
high availability of the computing system or application
fast (or highly compact) data compression and decompression
high bandwidth / short data transmission time”
The Wikipedia article reflects our tendency to think of performance in terms of the machine and
the measurements we use. There’s a reason for that.
In theory, performance should be defined in terms of the individual who is using the system,
which encompasses the entire user experience. However, this definition includes
factors such as perception, emotion, motivation, physical and mental well-being, and the
availability of other work that can be done while waiting. These factors either cannot be
measured or vary wildly in ways that are not within our control.
The system development community finds it more practical to discuss performance in terms of
factors that are both measurable and controllable. This e-book follows that practice. The more
esoteric factors will make good fodder for another e-book.
Why Should Performance be Defined from the User’s Viewpoint?
Our websites have a purpose. That purpose may be profit or some more altruistic goal. In either
case, success is judged by the website’s users. Without users, a website shrivels up and dies. It
may happen relatively quickly or it may take many months, even years, but happen it will.
The user will compare our website to other websites that offer similar benefits. He will also
consider options that do not use the World-Wide Web and the option of abandoning the task
altogether. Anything that makes the user’s experience mediocre can suggest to him that he take
a peek at other options. Anything that makes that experience annoying, frustrating, or
bothersome demands that he consider those other options. His investigation can easily result in
him abandoning our website and going elsewhere.
For this discussion about performance, we need to ask ourselves how much impact our
website’s performance has on the user’s decision to stay or go. For now, we’ll just assume that it
does affect the users’ decisions. When we get to Why is Performance So Important below, we’ll
see that this assumption is a glaring understatement.
Because success is determined by the users and because poor performance can drive them
away, our definition of performance must be based on the reality of what happens at the usersʼ
machines (i.e., the user experience).
What is the Definition of Performance?
The simplest definition may well be the best definition in this case. Performance is defined
as the user’s perception of being able to get on with what he wants to do without delay.
Delay has many causes, many more than what we think of as performance.
Delay resulting from distraction is not typically included in anyone’s definition of performance, but
it is a serious problem nevertheless. For example, when a user can’t find a parking spot for his
mouse pointer because it triggers a popup in so many places, his train of thought is derailed. He
must devise a solution to his problem, implement the solution, and refocus on what he was
doing. This delays him.
Delay resulting from unnecessary activities is also not typically included in anyone’s definition of
performance, but is also a serious problem. For example, requiring a user to provide excessive
information may be helpful to the marketing department, but the user may see it as irrelevant to
his task and an unnecessary delay.
Delay resulting from a learning curve is a third serious problem not typically included in anyone’s
definition of performance. For example, if a website changes its user interface frequently, every user
must relearn how to interact with the website every time it is updated. Users can resent the time
spent relearning, especially if it comes at an inconvenient time.
This e-book does not address the distraction issue, the unnecessary-activities issue, or
the learning-curve issue, but the reader should give them serious thought anyhow. They make a
difference to the users and are at least partly responsible for some of them leaving. Further
research and blog discussions on these topics are sorely needed.
If these things are not included in the definition of performance, what then is included? Like
most, this e-book limits the definition of performance to only those delays that are directly caused
by computer activity. For example, if the user has to wait while a web page downloads and
renders, that is a performance problem. In fact, that is the one performance problem that is
talked about almost to the exclusion of all others.
What Does This Definition Imply?
Measurements: Since delay figures so prominently in the definition of performance, it is incumbent upon us to
measure it, minimize it, and monitor it. Fortunately, delay is easy to measure. We just start the
clock at the beginning of some task and stop it when the task completes.
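As a minimal sketch (in Python, purely for illustration), the start-the-clock approach looks like this:

```python
import time

def timed(task, *args):
    """Run a task and return (result, elapsed_seconds)."""
    start = time.perf_counter()              # start the clock
    result = task(*args)
    elapsed = time.perf_counter() - start    # stop the clock
    return result, elapsed

# Time a simple computation standing in for "some task".
result, elapsed = timed(sum, range(1_000_000))
print(f"task took {elapsed * 1000:.2f} ms")
```

A real monitor would also store each reading so that trends can be spotted over time.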
Tradeoffs: Performance often finds itself at odds with other objectives. For example, most small businesses
use web hosting companies instead of maintaining their own servers. This limits what the small
business can do to improve performance, but it saves them a rather large capital investment.
This illustrates a tradeoff between performance and cost.
Engagement (the amount of time the user spends at our website) and interaction (keystrokes
and mouse clicks) are important, but they stand in stark contrast to the user’s goal of getting in,
doing what he came to do, and getting out. Engagement and interaction want users to dwell on
the website, but this use of their time may be, from their viewpoint, wasteful. The best way to
engage our users is to give them the functionality they need at the performance level they
demand. Engagement will happen, but in a different form – each visit will be shorter, but the
users will visit us more often.
We create webapps to avoid the high cost of human labour. Why pay an employee to do it when
the customer will do it for free? We also have in mind the desire to control users’ thoughts about
us and our products. We may also have other, more specific goals. In all cases, what we want
may be significantly different from what the users want. Although it may be possible to meet
everyone’s goals, there is often a tradeoff. Knowing and working toward the users’ goals is at
least as important as working toward our own.
The Critical Timeframe: Given the above definition of performance, we see that there is one time in particular when
performance is a concern – the time when the dreaded hourglass is displayed. True, performance
is always a concern because our server is always serving resources and we need to make sure it
doesn’t become a bottleneck, but the hourglass specifically identifies the wait time from the
user’s perspective. And I’m sure the reader has picked up on the key point that the user’s
perspective is the one we need to focus on. From a programming viewpoint, the hourglass is
displayed
1. while the page is loading, and
2. while client-side scripts are executing.
Keep in mind that these timeframes are critical because this is when the user is waiting.
Anything our code does during the critical timeframe, no matter how necessary it may be,
increases the user’s wait time. If it’s only a millisecond or two, it may not matter. However, every millisecond adds to every other millisecond. Separately they may seem inconsequential, but together they can add up to a performance problem. Developers need to analyze each algorithm that executes during the critical timeframe and ask
the question, “Can any of this be done outside the critical timeframe?”
Example: All other things being equal, static web pages load faster than dynamic web pages.
However, we can use static web pages to create the perception of dynamic web pages: During
the critical timeframe, we serve a static web page, but a background process running outside the
critical timeframe recreates the static web page on a regular basis. How often? Often enough
that the users don’t notice that the data is not truly dynamic. Perception is reality!
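A sketch of this technique follows; the render_page function is a hypothetical stand-in for the expensive dynamic work (database queries, templating, and so on):

```python
import time
import pathlib

OUTPUT = pathlib.Path("index.html")  # the static file the web server serves

def render_page() -> str:
    """Hypothetical stand-in for the expensive dynamic work."""
    return f"<html><body>Generated at {time.ctime()}</body></html>"

def regenerate() -> None:
    """Rebuild the static page. Run this from cron or a background
    worker, never while a user is waiting."""
    OUTPUT.write_text(render_page())

regenerate()
# A scheduler entry that runs regenerate() every minute would refresh
# the page often enough that users never notice it is static.
```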
Example: A lot can be done during build time, especially if we automate the build process.
Compress all the text files into various formats and instruct the server to serve the
compressed versions whenever possible.
Minify every image file by reducing the resolution to match the rendering resolution,
reducing the colour depth to a value we consider appropriate, and stripping out all
metadata.
Inline the needed-now components, including scripts, stylesheets, and even images.
Move all CSS to the <head> section.
Check to make sure style sheets, scripts, and other components are downloaded only
once.
Minify style sheets and scripts by stripping out all extraneous white space (including
newlines) and removing all comments.
Enforce shop standards.
...and more
The main point here is that these tasks should not be done while the user is waiting.
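As one example of a build-time task, the text-file compression step might be sketched like this (the asset directory layout is an assumption, and the server must be configured separately to serve the precompressed files):

```python
import gzip
import pathlib

def precompress(root: str) -> None:
    """Write a .gz sibling for every text asset so the web server can send
    Content-Encoding: gzip without compressing while the user waits."""
    for pattern in ("*.html", "*.css", "*.js"):
        for path in pathlib.Path(root).rglob(pattern):
            data = path.read_bytes()
            target = path.parent / (path.name + ".gz")
            target.write_bytes(gzip.compress(data, compresslevel=9))
```

This runs once per build; at request time the server simply picks the ready-made compressed file.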
WHY IS PERFORMANCE SO IMPORTANT?
Once upon a time, there was an eight-second rule. It said that users disappear rather quickly if a
web page takes longer than 8 seconds to load.
In 2009, Akamai Technologies found that “47 percent of consumers expect an e-commerce page
to load in two seconds or less” and “40% of consumers will wait no more than three seconds for
a web page to render before abandoning the site.” That’s more than disappointment; that’s
revenue handed over to the competition.
More recently, Eric Horvitz, a scientist at Microsoft’s research labs, said, “The old two-second
guideline has long been surpassed on the racetrack of Web expectations.” Don’t consider the
two-second rule an objective; consider it a starting point.
Just so you don’t walk away thinking that this advice is based on one or two studies, here are a
few others to ponder:
Shopzilla spent 16 months improving its performance, which dropped its average page
load time from 6 seconds to 1.2 seconds. Revenue increased by 5–12%.
Google and Microsoft both introduced artificial delays to see what would happen.
Microsoft found that increasing page load time by half a second reduced revenue by
1.2%. Google found that user activity declined steadily over the 4 week testing period.
After performance was brought back to pre-testing levels, user activity increased slowly.
Even after 5 weeks, though, user activity had not increased back to its pre-testing levels.
Bing measured a 1.8% drop in queries, a 3.75% drop in clicks, a 4% drop in satisfaction,
and a 4.3% drop in revenue, all from a 2 second delay in loading pages.
Walmart found a sharp decline in conversions as page load time increased (up to 4
seconds).
Gomez found that page abandonment rates rose by 38% when response time rose from
2 to 10 seconds. Their study also showed that more than ⅓ of those lost customers went on to tell others about their experience.
Aberdeen Research Group calculated that a $100,000 per day e-commerce site can
suffer a $2,500,000 annual loss of revenues from a 1 second delay.
AOL found that its visitors who visited the fastest web pages viewed an average of 7½
pages per visit. Visitors who visited the slowest web pages viewed an average of only 5
pages per visit.
MEASURING PERFORMANCE
Some have said, “What can’t be measured can’t be managed.” While that proverb is debatable,
it certainly reflects management’s dependence on measurements. Just as certainly, we know
that measurements can provide good information when used properly.
Measuring how long it takes to do something, then comparing that measurement to
previous readings, tells us about trends within our systems. If we see a time-based
measurement constantly increasing, we can provide a solution before users even realize
there’s a problem. That’s the ultimate in proactivity.
Service level agreements (SLA) specify what is acceptable within a system and what is
not. In some cases, performance measurements identify which SLA promises have (or
have not) been kept.
When faced with a performance issue that requires an immediate fix, measurements can
help us locate the source of the problem. If every subcomponent save one shows normal
readings, we can start our analysis with the subcomponent that has the high reading.
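That comparison can be automated. A sketch, in which the subcomponent names and the 1.5x tolerance are illustrative assumptions:

```python
def suspects(readings, baselines, tolerance=1.5):
    """Return the subcomponents whose current reading exceeds their
    baseline by more than `tolerance` times; that is where to start
    the analysis."""
    return [name for name, value in readings.items()
            if value > baselines[name] * tolerance]

# Hypothetical response times in milliseconds.
baseline = {"database": 20.0, "web server": 50.0, "DNS": 5.0}
current  = {"database": 22.0, "web server": 480.0, "DNS": 6.0}
print(suspects(current, baseline))   # ['web server']
```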
Yes, measurements provide benefits. Considering how easily some measurements can be
implemented, it is hard to understand why so many systems don’t use them.
If performance is not measured and monitored, it can scuttle our ship before we even know the
ship’s leaking. Given the previous discussions about the user response to poor performance
and its impact on the organization’s success, failing to measure performance is just plain
failing. Reactive mode can be disastrous; proactive mode is essential.
Key Performance Indicators
There is one key performance indicator (KPI) that matters most to a business – net income.
Growth by any other means is a house of cards. Without net income, the business shrivels.
Non-profit organizations (NPOʼs) do not use net income as their KPI. They are more interested in
how effectively or efficiently they accomplish their purpose. Different NPO’s define this in
different ways.
Whatever the organization’s KPI’s may be, they do not include website performance. However,
we do need to know which system measurements have the biggest impact on the organization’s
KPI’s. Let’s call these the system’s KPI’s. They will be the first measurements we take.
There is a good body of research tying page load times to revenues, so we usually consider
this the system KPI. It offers the added advantages of being easy to measure and easy to monitor.
Establishing a benchmark for page load times for every page in the system, followed by ongoing
monitoring of those page load times, is an essential first step for every website, webapp, and
mobile app. It is so easily accomplished that not doing it seems foolish.
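A crude benchmark of base-page load time can be taken with nothing more than the standard library. Note that this measures only the HTML download, not the images, scripts, and stylesheets a browser would also fetch:

```python
import time
import urllib.request

def page_load_time(url: str, timeout: float = 10.0) -> float:
    """Fetch a URL and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()                      # wait for the full body
    return time.perf_counter() - start

# Example (assumes network access):
# print(f"{page_load_time('https://example.com/'):.3f} s")
```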
Performance of user transactions is an equally important KPI. We start by defining critical user
transactions that span multiple web pages (e.g., logging in, adding an item to a shopping cart,
searching, paying for an order), then timing how long it takes an automated process to conduct
these transactions.
Benchmarking and monitoring user transactions is just as important as benchmarking and monitoring page load times.
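The timing of a multi-step transaction can be sketched as follows; the steps here are placeholders where a real monitor would drive a browser or issue HTTP requests:

```python
import time

def run_transaction(steps):
    """Execute an ordered list of (name, callable) steps, timing each
    step and the transaction as a whole."""
    timings = []
    start = time.perf_counter()
    for name, step in steps:
        step_start = time.perf_counter()
        step()
        timings.append((name, time.perf_counter() - step_start))
    return timings, time.perf_counter() - start

# Placeholder steps standing in for logging in and searching.
demo = [("log in", lambda: time.sleep(0.01)),
        ("search", lambda: time.sleep(0.02))]
timings, total = run_transaction(demo)
for name, elapsed in timings:
    print(f"{name}: {elapsed * 1000:.1f} ms")
print(f"total: {total * 1000:.1f} ms")
```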
Supporting Performance Indicators
Once KPI monitoring is up and running, attention should turn to the supporting subcomponents
that impact the system’s KPI’s. Databases, web servers, network/Internet traversal, the domain
name system, browser configurations, and much more each support the system in one way or
another. Individually, they can negatively impact the system’s KPI’s because of their own performance issues.
Benchmarking and monitoring the supporting subcomponents gives us our supporting performance indicators (SPI’s). SPI’s provide two important benefits.
1. When our system KPI’s show a performance problem, the SPI’s will often help us identify
the underlying cause of the problem.
2. When the system KPI’s are normal, the SPI’s will sometimes reveal emerging problems
that will affect performance down the road. Being able to resolve an issue before anyone
even knows it exists is the hallmark of a great development team.
Business Metrics
System owners, upper management, and marketing gurus are interested in attention,
engagement, and conversion. Attention measures how many people find out about us.
Engagement measures how long visitors stay and how much they interact with our web pages.
Conversion measures how often visitors do what we want them to do. For e-commerce sites,
conversion is typically measured by the percentage of visitors that place orders.
Business metrics are not performance metrics, so they are not discussed in this e-book. The
reader is reminded, though, that performance is an important topic only when it impacts business
metrics.
MONITORING PERFORMANCE
Characteristics of Monitors
Before discussing the types of monitors, letʼs take a look at their characteristics:
the location of the monitor,
what is being monitored,
what is being reported, and
how the monitor collects its data.
Each type of monitor presented below will be described in terms of these characteristics.
The Location of the Monitor
Monitors can be characterized by the location of the software that is doing the monitoring. This
can reveal important nuances because we get different results depending on where the monitor
is located.
Monitors in different places measure different things. For example, to see the Internetʼs impact
on a pageʼs performance, compare measurements from a location near the server and a location
near the user. The difference tells the story.
Internal Monitors run on the same machine as the thing that is being monitored. This allows us
to focus on a particular machine to identify localized issues.
Same-Network Monitors are on the same network as the thing that is being monitored. Both are
typically behind the same firewall. This allows us to focus our attention on the local network to
the exclusion of the vagaries of the Internet.
Cloud-Based Monitors (not to be confused with cloud monitors) run on some machine
somewhere on the Internet. They are typically not on the same network or intranet as the thing
that is being monitored. This allows us to continue monitoring even when one of our machines
or our entire network goes down. Using a cloud-based monitoring service (e.g., Monitis) lets us
focus on our webapps rather than on our monitoring system.
A good monitoring system will offer a variety of locations throughout the world.
Real-User Monitors (RUM) run on the actual users’ machines. This gives true end-to-end
monitoring by including the users’ environment, connectivity, and latency in the measurements.
Monitors in other locations help localize performance issues and can be easier to set up, but
only RUM gives a true picture of the user experience.
What is Being Monitored
We can also characterize monitors according to what it is they monitor. This is a never-ending
list because anything can be monitored. Some of the more common items are:
web pages - monitored to see if they are there, to see if they have changed, to see how
long it takes to load them, etc.,
transactions (multiple steps a user takes when interacting with our webapp; e.g., logging
in, searching, placing an order) - monitored to see if a step fails, to see how long it takes,
etc.,
devices (e.g., servers, switches, routers, gateways, firewalls),
protocols (e.g., DNS, FTP, HTTP, HTTPS, IMAP, POP3, SNMP, SSH),
databases,
system events and log files,
drives,
memory,
CPUʼs,
connectivity,
bandwidth,
web traffic (e.g., hit counters, paths through our system, client IP addresses, user agents,
bounce rates), and
monitors (i.e., monitors that monitor other monitors [Quis custodiet ipsos custodes?]).
What is Being Reported
Monitors can be characterized by the data they collect. This can be as simple as a boolean OK
or NOT OK (often abbreviated to NOK); a measure of elapsed time; or some character string.
A delta monitor can watch a resource and indicate every time the resource changes. This is a
handy way to watch for hacked web pages.
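A minimal delta monitor can be built from a content hash; everything here is standard-library Python, and the URL handling is illustrative:

```python
import hashlib
import urllib.request

def fingerprint(url: str) -> str:
    """Return the SHA-256 digest of a resource's current content."""
    with urllib.request.urlopen(url) as response:
        return hashlib.sha256(response.read()).hexdigest()

def changed(url: str, last_digest: str) -> bool:
    """True if the resource no longer matches the digest we recorded
    on the previous check."""
    return fingerprint(url) != last_digest
```

Store the digest after each check; a surprise True for a page that should be static is worth investigating.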
Although not as common, a data harvesting monitor can extract data from some resource and
store it for later retrieval. This lets us build a history of changes to dynamic resources.
How the Monitor Collects its Data
Monitors can be characterized by how they function.
Passive monitors access data by sniffing network activity. They are not used very often to
measure website performance.
Active monitors, also known as synthetic monitors, request resources from a server just as if they
were real users accessing the website. They either simulate user agents (e.g., browsers, cell
phones) or use scripting with real user agents. A monitoring service typically provides multiple
user agents and versions.
Monitoring agents are code that we install or embed. They can be installed on our server or
another of our machines, or embedded into web pages that are sent to users. The agent acts as
a monitor and stores its data externally (e.g., by sending the data to an external monitoring
system or a database).
Types of Monitors
Uptime Monitors
What is Monitored: the ability to connect
What is Reported: OK/NOK and response time
Location: cloud-based
Uptime monitors attempt to connect to a server through a specified protocol. If a connection can
be established, the monitor records OK and logs how long it took to get a response. If a
connection cannot be established, the monitor records NOK.
At the time of writing, Monitis provided uptime monitors for HTTP, FTP, DNS, MySQL, UDP,
IMAP, SIP, HTTPS, ping, SSH, TCP, POP3, SMTP, and SOAP.
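The core of an uptime monitor is small. A sketch for the TCP case (protocol-specific checks such as DNS or SMTP would add a protocol exchange on top of the connection):

```python
import socket
import time

def uptime_check(host: str, port: int, timeout: float = 5.0):
    """Try to open a TCP connection; return ('OK', response_seconds)
    on success or ('NOK', None) on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "OK", time.perf_counter() - start
    except OSError:
        return "NOK", None

# Example: check an HTTPS port (assumes network access).
# print(uptime_check("example.com", 443))
```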
Server/Device Monitors
What is Monitored: various system and environmental resources
What is Reported: OK/NOK, SLA compliance, others depending on the resource
Location: on the device or network being monitored
Server/device monitors are installed on the machine or network that is to be monitored. They
poll system devices and/or read system measurements on a regular basis and report their
findings back to the external monitoring system.
At the time of writing, Monitis provided server/device monitors for CPU’s, memory, Linux load,
drives, processes, system events, disk I/O, bandwidth, SNMP, ping, HTTP, and HTTPS.
Application Monitors
What is Monitored: supporting software installed on the server
What is Reported: OK/NOK, SLA compliance, others depending on the resource
Location: on the same machine as the application
Application monitors are installed on the machine to be monitored. They can query the target
software from a separate process or be installed as a module/extension of the target. Database
management systems are the most commonly monitored applications because they are essential
to almost all webapps.
At the time of writing, Monitis provided application monitors for Java/JMX, MySQL, and MSSQL.
End-User Monitors: Page Load Monitors
What is Monitored: the loading of a web page, including all downloads and script execution
What is Reported: begin, end, and elapsed time for each component downloaded
Location: cloud-based
Page load monitors load a web page and all of its components, keeping track of the begin and
end times for each phase of loading each component. Graphing the results in a waterfall chart
helps us identify slow resources and blocking, which we then work to minimize.
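To make the idea concrete, here is a toy text-mode waterfall built from (name, begin, end) tuples; the timings are invented for illustration:

```python
def waterfall(components, width=40):
    """Print a toy text waterfall chart from (name, begin_ms, end_ms)
    tuples, as a page load monitor might report them."""
    end_max = max(end for _, _, end in components)
    scale = width / end_max
    for name, begin, end in components:
        pad = " " * int(begin * scale)                  # wait before start
        bar = "#" * max(1, int((end - begin) * scale))  # time downloading
        print(f"{name:>12} |{pad}{bar}  {end - begin:.0f} ms")

# Invented timings for one page load.
waterfall([("index.html", 0, 120),
           ("style.css", 120, 180),
           ("app.js", 120, 300),
           ("logo.png", 180, 260)])
```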
End-User Monitors: Transaction Monitors
What is Monitored: transactions
What is Reported: OK/NOK, elapsed time, failure points
Location: cloud-based
Transaction monitors simulate user transactions on the specified server. A transaction is an
ordered list of mouse clicks and keyboard events defined by the developer, usually to navigate
through multiple web pages. Examples of transactions: logging in, searching a database,
adding an item to a shopping cart, signing a petition, sending an e-mail.
There is a tendency to rely only on page load monitors because they are simple to implement
and they tell us about most problems. However, they do not tell us if a multi-page sequence of
steps generates a correct response in an acceptable time, nor are they that much simpler than
transaction monitors.
End-User Monitors: Web Traffic Monitors
What is Monitored: client requests for a specified web page
What is Reported: details of each hit
Location: agent/embed
Web traffic monitors are installed on the server or embedded into web pages. They can log the
timestamp every time the page is served, along with the client IP address and port, the referring
page, and the session ID. This information can be used to identify paths users take through the
system and pages with high bounce rates. In its simplest form, we can graph the number of hits
per unit of time.
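Graphing hits per unit of time starts with counting them. A sketch, assuming a hypothetical hit-log format of "ISO-timestamp client-ip path":

```python
from collections import Counter
from datetime import datetime

def hits_per_minute(log_lines):
    """Count hits per minute from log lines of the hypothetical form
    'ISO-timestamp client-ip path'."""
    counts = Counter()
    for line in log_lines:
        stamp = line.split()[0]
        minute = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:%M")
        counts[minute] += 1
    return counts

log = ["2024-05-01T10:00:05 10.0.0.1 /index.html",
       "2024-05-01T10:00:40 10.0.0.2 /index.html",
       "2024-05-01T10:01:02 10.0.0.1 /cart"]
print(hits_per_minute(log))
```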
Cloud Monitors
What is Monitored: cloud-based virtual machines
What is Reported: number of instances, various measurements of each instance
Location: installed on the virtual machines
Cloud monitors are not the same as cloud-based monitors (described above). Cloud monitors
monitor cloud-based virtual computing environments provided by vendors. An installed agent
monitors the entire virtual environment and reports back to the external monitoring system.
At the time of writing, Monitis provided monitors for Amazon EC2, Rackspace, and GoGrid.
Real-User Monitors
What is Monitored: usually page load
What is Reported: begin/end/duration times
Location: embed
Real-User Monitors are embedded into web pages, usually to capture timing data. Newer ones
use the World-Wide Web Consortiumʼs PerformanceTiming interface to capture the timing data
that was created by the browser.
Custom Monitors
What is Monitored: anything you want
What is Reported: anything you want
Location: anywhere you want
Custom Monitors are monitors you create yourself, so they can do whatever you want them to
do.
Monitis provides an API for uploading results from custom monitors to the Monitis dashboard.
Contributors have created software development kits (SDKʼs) to further simplify development in
various languages.
What Should I Monitor?
#1 - Uptime: This is the most important metric because it tells whether the server is even there.
Think about it this way — a website constantly going down is the same thing as a shop being
closed at random times. We can lose a lot of customers.
#2 - Page Load: This metric is crucial because people have very short attention spans (a
couple of seconds). If web pages donʼt load quickly enough, we lose potential customers. Also
keep in mind that Google lowers the rankings of pages that load slowly.
#3 - Transactions: Identify and monitor key transactions in the system. Itʼs not enough to
know that the server is there and responding. We must also know that key transactions can be
completed. After all, the user experience is what counts.
#4 - Servers: Keep a close eye on CPU, RAM, storage, bandwidth, and application processes.
It not only helps prevent upcoming faults in the system, but also indicates whether we need to
upgrade resources.
#5 - Applications: Have you ever seen a web application without a database? No, right?
That’s because without databases, we have nowhere to store the information that drives the
business. Database monitoring helps us keep the database performing optimally. Not only
databases, though. We should be monitoring whatever applications our site relies on.
#6 - Networks: Monitor with SNMP, UDP, PING, TCP, SSH, and other protocols to make sure
the network is healthy. In addition, check for cross-office WAN connectivity: telnet, HTTP,
intranet, extranet, routers, and switches.
Notifications
A monitoring system stores its results and provides an interface to let us see those results. For
example, the raw data may be available in a table or it may be presented graphically. The better
systems provide both interfaces, and may provide others.
Seeing results from the past allows us to analyze what has happened. This is especially useful
after changes have been implemented or when problems surface. However, monitoring systems
that let us know about problems as they happen provide an additional benefit. They give us an
opportunity to react to the problem without delay. There is no need to wait for a user to send in
an error report. There is no need to wait for the help desk to pass the message along to the
development team. All of this can be bypassed if the monitoring system sends us an SMS
immediately upon the problem rearing its ugly head.
Notification is most often by SMS or e-mail, but can be by telephone, XMPP, HTTP callback,
instant messaging, fax, pager, Skype, RSS feed, or any other medium.
Of course, we want to differentiate between problems that should wake us up in the middle of the
night vs. those that can be dealt with the next day vs. those that are a lower priority. Monitoring
systems respond to this need by letting us set the bar for a metric and select the type of
notification. We can use this feature to specify less timely notifications (e.g., e-mail) for smaller
problems and more timely notifications (e.g., SMS) for critical problems.
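The idea of setting the bar per metric and choosing the type of notification can be sketched as a small routing function. The severity names, channels, and thresholds here are all made up for illustration:

```python
# Hypothetical mapping from problem severity to notification channel.
CHANNELS = {
    "critical": "sms",    # wake someone up
    "major": "email",     # deal with it the next day
    "minor": "none",      # post facto study only
}

def pick_channel(metric_value, warn_at, crit_at):
    """Compare a metric against two bars and choose a notification channel."""
    if metric_value >= crit_at:
        return CHANNELS["critical"]
    if metric_value >= warn_at:
        return CHANNELS["major"]
    return CHANNELS["minor"]
```

The point is not the particular numbers but that less timely channels handle smaller problems and more timely channels handle critical ones.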
Not all measurements need to trigger notifications. We may decide that certain measurements
are for post facto study only. We do not need notifications for these.
One common problem with notifications is their overuse. If notifications are received too
frequently for problems that can be postponed, the technical staff may ignore them or skim
through them without due diligence. Repetitious notifications also fall into this category. One or
two repetitions may be useful to ensure the message was delivered, but more than that may
actually cause grief for the response team. It is wise to use notifications sparingly, but still often
enough that we are made aware of higher priority issues as they develop.
What Triggers a Notification?
The development team defines which measurements trigger notifications. For example, if the
system is not available for more than one minute, we can send an e-mail to our help desk. If the
problem continues for more than fifteen minutes, we can send an SMS to one of our problem
determination specialists.
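The help-desk example above can be sketched as a small escalation function. The one-minute and fifteen-minute bars and the two targets follow the example in the text; everything else is illustrative:

```python
def escalation_targets(outage_minutes):
    """Return (channel, recipient) pairs to notify for an outage of this length.
    Thresholds follow the example in the text; names are illustrative."""
    targets = []
    if outage_minutes > 1:
        targets.append(("email", "help desk"))
    if outage_minutes > 15:
        targets.append(("sms", "problem determination specialist"))
    return targets
```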
Service-level agreements often set the bar for page load times. A notification can be sent
whenever a page loads more slowly than the SLA requires, or perhaps if it loads consistently
slowly over a 15-minute period.
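The “consistently slow” test can be sketched as a check over a sliding window of measurements. The one-sample-per-minute assumption is ours, for illustration:

```python
def consistently_slow(load_times, sla_seconds, window=15):
    """True if every one of the most recent `window` samples breaches the SLA.
    load_times is assumed to hold one page-load measurement per minute."""
    recent = load_times[-window:]
    return len(recent) == window and all(t > sla_seconds for t in recent)
```

A single slow sample does not fire; only a sustained breach over the whole window does, which keeps the notification from being noisy.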
Good notification systems give us the flexibility to define notifications in a variety of ways. This
lets us create the notifications that are best for our system.
CONCLUSION
Website performance is the user’s perception of being able to get on with what he wants to do
without delay. Anything that leaves the user tapping his fingers while staring at the computer
screen is a performance issue. It is not limited to page load time, although that is one of its main
components.
Performance is not the only objective. Website development teams must also consider
availability, confidentiality, correctness, cost, data integrity, human factors, integration with other
components, maintainability, performance, portability, privacy, quality assurance, release
management, resource usage, scalability, security, service-level agreements, stability, sustainability,
usability, and much more. Performance considerations can conflict with any of these other
objectives, which then forces a tradeoff. Teams should be encouraged to prioritize performance
as much as they can.
Performance’s critical timeframe is any time the user is waiting for the computer. Anything the
code does during the critical timeframe, no matter how necessary it may be, contributes to
performance issues. Opportunities often exist to move processing to the build process or to batch
jobs. These tasks can be scheduled outside the critical timeframe, and often at the system’s
quiet times.
Waterfall charts are a key tool in analyzing performance. They graphically show each resource
that is downloaded, when it is requested, when it starts to arrive, and when it is fully available.
Performance analysts can see at a glance which components are contributing to performance
problems.
Measurements are imperfect because they do not capture performance in its entirety, especially
since the word perception is included in the definition of performance. However, measurements
are far superior to doing nothing. Monitoring every page’s load time on an ongoing basis is
essential. Individual multi-page transactions also need to be continuously watched. Other
measurements can track the various subcomponents of the system to let us know where
performance problems originate. All of these can be automated easily with Monitis’ cloud-based tools.
Slow web pages have been shown to impact conversion rates, interaction, engagement, and
(more importantly) revenue. This means two things: #1) Ignoring or lowering the priority of web
performance reduces a company’s net income. #2) Focusing on website performance
presents an opportunity to increase revenue.
Research has shown, time and again, that minute changes to a web page’s speed can cause
dramatic changes in the user’s behaviour. The well-known two-second rule says web pages
should load in under two seconds. Some say two seconds should not be considered a goal, but
rather a starting point.
APPENDIX A – HOW TO READ A WATERFALL CHART
The first step in addressing any problem is to gather information. For website performance
problems, waterfall charts are a handy tool.
What is a Waterfall Chart?
A waterfall chart is a bar graph that shows the individual and cumulative effects of sequentially
introduced values. In website performance analysis, each bar represents one component on a
web page and the entire chart represents the fetching and rendering of the page.
What Do The Horizontal Bars Represent?
Each request for each component goes through these stages (although some stages will be
skipped if they are unnecessary):
1. wait for the browser – The browser may have to wait for some other component to
download before it can begin downloading the requested component. If it’s waiting for a
script, it may be waiting for both the download and the execution of the script. In both
cases, this is called blocking.
2. look up the IP address – If a domain name is used to specify the requested component,
the browser must find the IP address for that domain. It uses the Internet’s domain name
system (DNS) to do this.
3. connect – The browser uses the socket layer to establish a connection to the server.
4. shake hands – If the component is protected by SSL (public/private key encryption), the
browser and server must “shake hands” before a connection can be established.
5. send the request – The browser sends the request to the server.
6. wait for the first byte – The browser waits for the server to send the first byte.
7. receive – The server sends the content and the browser receives it.
8. render – The browser writes the words and draws the images in its viewport. The
content becomes visible to the end-user during this stage.
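To make the stages concrete, we can model one component’s bar as a list of (stage, duration) pairs. The stage names and timings below are invented for illustration:

```python
# One component's bar, as (stage, milliseconds) pairs. Illustrative values only.
STAGES = [
    ("blocked", 40), ("dns", 25), ("connect", 30), ("ssl", 55),
    ("send", 5), ("wait", 120), ("receive", 200), ("render", 35),
]

def total_time(stages):
    """Sum the stage durations; this is the component's bar length on the chart."""
    return sum(ms for _, ms in stages)

def longest_stage(stages):
    """The stage that dominates this component's bar."""
    return max(stages, key=lambda s: s[1])[0]
```

Reading a waterfall chart largely amounts to asking, per component, which stage dominates its bar.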
The browser can handle more than one component at a time. For example, it can receive
several components at once or it can render one component while sending the request for a
different component. However, for each individual component, the order of the above actions is
mostly fixed.
A waterfall chart’s horizontal bars present the download/render process in a visual manner.
Each bar represents one component. Its length represents the amount of time used (i.e., the
x-axis represents time). Different colours are used to represent the different stages.
In the waterfall chart pictured at the beginning of this appendix, light blue represents the DNS
lookup, dark blue represents establishing a connection, peach represents a blocked state,
orange represents sending a request, pink represents waiting for the first byte, red represents
the downloading/receiving stage, light green represents the creation of the DOM tree, and dark
green represents the time to draw the images and text.
Some waterfall charts use slightly different terminology (e.g., look up the IP address may be
called DNS or DNS lookup), but they mean the same thing. Some charts combine multiple
consecutive stages together into one (e.g., the SSL handshaking may be included in
the connect stage). Some charts may not show the first stage.
The list of stages above implies that each stage ends before the next stage begins. That is not
necessarily true. For example, a browser can start rendering before it finishes receiving all the
data. A waterfall chart may not show this overlap and it may show the overlapping time period
as either the preceding stage or the following stage. It’s another case where you need to check
the documentation.
If one component blocks another, the waiting component’s bar will start near the end of the first
component’s bar. The opposite is not true, though. If a bar begins at the end of some other bar,
it may not have been waiting on that other component.
What Do The Vertical Lines Represent?
Some waterfall charts, like the above, include variously-coloured vertical lines. These lines
represent the occurrence of some event. For example, vertical lines can be used to represent the
point in time at which:
1. the first byte is received,
2. the HTML and JavaScript are fully parsed and deferred scripts have finished executing
(i.e., JavaScript’s DOMContentLoaded event),
3. the document is fully loaded (i.e., JavaScript’s onLoad event), or
4. the browser starts to render the text and images.
The vertical lines of different waterfall charts may represent different events, so check the
legend. Note, too, that this information may not be presented as vertical lines on the waterfall
chart, but may be present elsewhere on the page.
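If a tool exposes these event timestamps as raw data, converting them to offsets from the start of navigation is straightforward. A sketch, assuming a simple dictionary of absolute timestamps in milliseconds (the key names loosely follow the browser’s timing events, but the data shape is ours):

```python
def event_offsets(timing):
    """Given absolute event timestamps (ms), return each event's offset
    from navigationStart, which is what a waterfall chart plots."""
    start = timing["navigationStart"]
    return {k: v - start for k, v in timing.items() if k != "navigationStart"}
```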
How Can a Waterfall Chart Help Me Fix a Performance Problem?
The following are things you may observe about individual components on a waterfall chart,
along with what they mean and what you can do about them:
Waiting for the browser takes too long: If this is because of non-browser processes that
are hogging the CPU, terminate processes that don’t need to be running or migrate to a bigger,
faster whiz-bang computer. If this is because the browser is doing other things, change the order
in which the components are downloaded, update the browser, or move to a faster browser. If
this is because of blocking, defer loading less critical components until after the document is fully
loaded. Also use concurrency as much as possible.
Looking up the IP address takes too long: The domain name system is slow. Avoid
lookups by using IP addresses instead of domain names, and set long TTLs so resolved records
are cached for as long as possible. Moving the DNS server physically closer to the browser may
also help.
SSL handshaking takes too long: Avoid SSL unless it is absolutely essential. We don’t
need to encrypt every component on a secure web page, do we? Example: Where’s the sense
in encrypting the company’s logo?
Connecting takes too long: This may be a network or Internet problem, or it may be a sign
of a busy server.
Sending a request takes too long: This may be a network or Internet problem, or it may be
a sign of a busy server.
Waiting for first byte takes too long: If this phase takes too long and other phases are
lightning fast, look at the server first. Look to see if the code or database is causing
performance problems. It may also indicate a networking or Internet problem, though, so consider
both possibilities.
Receiving content takes too long: This usually indicates a large quantity of data. Reduce
the number of bytes as much as possible. Smaller is better. Don’t assume that splitting the
content into multiple download streams will help, though. Quite the opposite: downloading more
components will incur additional overhead, which may increase the total download time.
Also look into minification, more compact algorithms, and eliminating redundancy. Finally, don’t
forget to keep images small and compress everything that can be compressed.
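Compression is often the cheapest of these wins. A quick sketch of how much a repetitive HTML payload shrinks under gzip (Python’s standard gzip module stands in here for the server’s compression; the payload is made up):

```python
import gzip

def compressed_size(payload: bytes) -> int:
    """Bytes on the wire if this payload is gzip-compressed before sending."""
    return len(gzip.compress(payload))

# Repetitive markup compresses extremely well.
payload = b"<p>hello world</p>" * 500
```

Text-based resources (HTML, CSS, JavaScript) typically compress to a fraction of their original size, which directly shortens the receive stage.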
The distance between the client and the server impacts this phase and the previous five phases.
Consider storing components on machines that are physically closer to your end-users.
Rendering takes too long: It’s time to look more closely at DOM performance tips. [As a
general rule, there should be no dynamic (script-based) changes to the DOM tree before the
document reaches the point of interactivity.]
Looking at the forest instead of the trees can be helpful. Take a step back and look at the chart
as a whole. If you see the following, here’s what you can do about it:
There are too many components: Each component has its own overhead. As shown
above, there is much more than data transfer happening here. Multiple components can be
joined together into a single data stream. Example: Multiple JavaScripts can be joined together
into a single JavaScript file.
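Joining scripts can be as simple as concatenating their text, with a separator that guards against a missing statement terminator at the end of one file. A minimal sketch (real bundlers do much more):

```python
def combine_scripts(sources):
    """Join several script bodies into one data stream. The semicolon
    guard keeps one file's missing terminator from breaking the next."""
    return ";\n".join(sources)
```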
One bar is longer than the others: The longest bar is for the component that takes the most
time. Re-evaluate it according to the above criteria. Be extra-finicky for this one. Small is your
friend; big is your enemy.
The chart shows a staircase formation: If the chart is a staircase, components are being
processed serially. That’s about as bad as it can get. Look into performance tips that increase
parallelization.
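A staircase is easy to detect mechanically: every bar starts at or after the previous bar’s end. A sketch, treating each bar as a (start, end) pair in milliseconds:

```python
def is_staircase(bars):
    """bars: list of (start_ms, end_ms) per component, in request order.
    True when every component starts only after the previous one ends,
    i.e. downloads are fully serial."""
    return all(bars[i][0] >= bars[i - 1][1] for i in range(1, len(bars)))
```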
All (or almost all) DNS lookups have non-zero times: If multiple components from the
same domain are all looking up IP addresses in the DNS, then caching for that domain name has
been turned off. Turn it back on. Better yet, avoid DNS lookups completely by serving
components from IP addresses instead.
All (or almost all) components have non-zero connect times: Every component is
establishing its own connection instead of reusing existing connections. Connection persistence
is probably turned off at the server. Turn it on.
The start render line is too far away from zero: This line tells us how long it takes before
the end-user gets the first bit of visual feedback. If it takes too long, the user may click the
refresh button, which restarts the entire process. [Worse yet, he may just give up.] Defer every
possible script execution and component download until after onLoad fires, then
download/execute in the order that gets the end-user to the point of interactivity as soon as
possible.
The document complete line is too far away from zero: Perhaps some of the previously-
loaded components can be deferred until after the document is fully loaded? There are several
techniques available to make this happen.
What Are a Waterfall Chart’s Drawbacks?
A single waterfall chart taken in isolation can be misleading. Factors that affect performance can
be dramatically different with a different user agent or even one minute later on the same
machine. Decisions should never be made on the basis of a standalone waterfall chart.
A waterfall chart for the first access of a web page will be dramatically different from subsequent
accesses because the subsequent accesses will use whatever caching is available. This is only
a drawback if the reader is not aware of the difference. It’s a benefit to see and compare both
waterfall charts.
Cloud-based waterfall charts do not measure performance at your end-users’ computers. Real
User Monitoring (RUM) does, but it will present different data at different times because of the
dynamic nature of the Internet, the server, and the user’s computer. Performance is subject to
so many factors that averages alone may be meaningless.
A Free Cloud-Based Waterfall Chart
Monitis’ Full Page Load Tool generates waterfall charts. It’s as simple as giving it a URL and
clicking a button, but there are several non-default options, too. And best of all, it’s free!