VISTA SOLUTIONS
Uncompromised IT.

IT’S NOT THE $#!&% NETWORK!

Everyone blames the network first. ...and you’re right, it’s probably not the network. This whitepaper explores the more likely culprit of a slow “network” and how to proactively find and fix issues before users notice.
Post on 05-Aug-2020


IT’S NOT THE $#!&% NETWORK!
Practical troubleshooting tips for a slow “network” and how to proactively find and fix issues before users notice.

Modern computer networks and the applications that run on them are profoundly complex. We architect them so that, from the point of view of any one component or program, the connections are simple and the rest of the system abstract. But huge numbers of hardware and software components and the connections between them create endless opportunities for things to go wrong.

When end users see things slow down they are apt to complain to the help desk that “the network is slow.” Maybe it is, maybe not. Problems at any number of points in the overall architecture could cause the slowdown.

Without the right intelligence, you could spend a great deal of time finding the cause of the problem and solving it. With the right intelligence, you can find and remediate problems quickly. The best tools and insights will bring the problem to your attention without users having to complain, and perhaps before they even notice.

A NOTE TO THE READER

Although the problems addressed in this paper are almost universal across all types and sizes of businesses, the solutions and recommendations made for addressing them may seem geared toward large enterprises.

Please understand that this is not the case. The average business (not just enterprises) loses $140,000 and up to 45 productive hours every year as a result of performance issues. There are services, such as those provided by Vista Solutions, that offer enterprise-level insights and can help IT professionals in small to mid-sized businesses increase productivity and reduce costs for a fraction of the cost of an enterprise hardware purchase.


WHAT IS “THE NETWORK” ANYWAY?

To an end user, everything on the other side of their Ethernet port is “the network,” and from their point of view there’s some truth to this. The image below shows just how complicated that blob called “the network” is.

Source: Exposing and Fixing Common App Performance Problems (Riverbed Technology)
http://www.slideshare.net/riverbedtechnology/take-control-of-application-performance-52822997

The diagram shows the path of a typical operation in a typical enterprise application, perhaps posting a payment to a customer record. Once the user clicks “Submit,” their browser sends all the fields to the web server. On the way there the request traverses the Local Area Network (LAN), perhaps the Wide Area Network (WAN), and eventually reaches a particular server. (Perhaps to keep it somewhat readable, the diagram does not include load balancers and security appliances through which the traffic may pass on the way to the server.)

Once at the server, the request is collected by the TCP/IP stack and picked up by the web server which, in the diagram, is Microsoft Internet Information Services (IIS). The web page running on that server, probably an ASP.NET page, invokes a .NET Worker Process to process the transaction.

As is commonly the case, this .NET code forwards the operation through the LAN on to a separate Application Server, in this case a Java App Server, perhaps running Oracle WebLogic or IBM WebSphere Application Server. At the edge of this physical server the transaction is initially handled by VMware, a virtualization hypervisor, and routed through a virtual internal network to a particular virtual server. This server has its own TCP/IP stack and is running the Apache Web Server, which puts the transaction into a queue for the Java App Server.


“FRANKEN-MONITORING” AND WHY IT’S SO HARD TO FIGURE OUT WHAT THE PROBLEM IS

We have established that the complexity of the modern IT infrastructure makes simply determining the SOURCE of the problem, not to mention its root cause, extremely challenging.

With so many components, such as web servers, database servers, application servers, application software, operating systems, networking software, networking hardware and many more (each of which is built from multiple components), finding the source of the problem is like looking for a needle in a haystack.

It’s no wonder it takes an average of 7 hours for serious application problems to be completely resolved.

The availability of tools at IT’s disposal to troubleshoot the problem is not the issue. In fact, a recent study revealed that 87% of companies employ more than 6 management/monitoring tools to troubleshoot performance issues.

But the majority of these tools examine the performance of individual segments of the infrastructure: one provides insight into the network, another into the database, another into the server, and so on.

The App Server contains code written by the organization’s developers to process the transaction, either committing it to at least one database server, almost certainly running on another physical server, or passing it through other processing on its way to the database.

The communications between these servers may be simple HTTP or HTTPS, or they may involve RPCs (Remote Procedure Calls), which give the programmer the illusion of simply calling a function but which, behind the scenes, involve complex marshalling and packaging of data to pass, often through protocols which are not common elsewhere, to the receiving end of the RPC.

When the database commit is complete and the App Server confirms it has completed successfully, it passes this information back through Apache, the TCP/IP stack and VMware or other hypervisor, over the LAN to the .NET code on the Windows server, and on to the ASP.NET page on the IIS server, which constructs an HTML response page and sends it through the TCP/IP stack, over the WAN and LAN, to the end user’s system, which displays “Transaction Complete.”

If 30 seconds go by after the user clicks “Submit” and they haven’t gotten “Transaction Complete” yet, what are they to make of it? The network is slow or broken. What are you, who have responsibility for determining the cause and fixing it, to make of it?

Even this description considerably simplifies the situation. Within a closed enterprise there are services like DNS and caching services which can play a significant role in performance as perceived by the end user. In truly modern applications which invoke third-party services from the cloud, a bad router on another continent could result in effects visible to your users.

A recent study conducted by a well-known research firm shows that a majority (52%) of the IT operations surveyed waste more than 20% of their operational resources to track and correct problems, and that application performance issues are a major cause of business productivity loss.



APPLICATION ISSUES

When the “network” is slow, as we all know, it is often not the network. It could very well be the applications on the network that are causing the issue.

In the scenario of the diagram, the application the user is running is itself widely distributed. It is their web browser, the IIS server, the .NET code, the Java App Server code, perhaps some database server code and much more. A problem in any one of these could cause what the end user perceives as a slow network.

Why do applications have these problems?

Just as the overall network is complex, so are the applications internally. They must be engineered to perform optimally and to allow administrators to configure them for optimal performance in varied circumstances. The very complexity of the systems makes it easy to create functional but inefficient connections between components. The inefficiency could be context-specific, and therefore more difficult to find.

In significant application communities, such as Java, there is often a great deal of literature and community knowledge on performance issues. An excellent example is the Quora discussion “Why are big Java programs so slow?” The short answer is that there are many reasons why they might be slow, but they don’t have to be. Knowing these general rules and the specific nature and needs of your applications will help you to find the specific problem and solution.

Application configuration can also have a large impact on performance. Sometimes allocating a greater amount of memory or changing the number of CPU cores on which the application runs can greatly improve performance. Addressing such problems usually requires expertise in the application. When an application or platform is significant enough you can obtain performance tuning guides from the vendor or from third parties, for example the Apache HTTP Server Performance Tuning documentation.

The issue then becomes not the absence of tools but the absence of actionable insight those tools provide. Used individually, it’s easy for them to miss the overall problem. The majority of organizations don’t have a “single pane of glass” that lends itself to that level of insight. This leaves IT to slowly stitch together a patchwork understanding of what is going on using domain-specific tools, monitoring a web server or a database engine or the network in isolation. The AppDynamics study referred to this as “Franken-Monitoring.”

This inefficient process leads to finger-pointing in IT as each group insists that nothing is wrong with their hardware and software, resulting in additional wasted time and resources.

In such a complex environment IT cannot be expected to diagnose performance problems without actionable insights that look at the totality of the application environment. Often IT departments are forced to waste time and generate acrimony rather than solutions when each of their siloed tools declares that nothing is wrong.

And even if you can isolate a problem quickly to a particular program or component, it may be more difficult to determine exactly what is wrong and what to do about it. In such cases application-specific tools, long nights digging through log files or contacting tech support may be necessary. In especially bad cases it may even be necessary to read the manual.

So what’s an IT professional to do?

Now, before you go crazy and pull out the manual, here is a list of a few “usual suspects” other than the network that could be causing the problem:


SERVER AND OTHER HARDWARE ISSUES

The overwhelming bulk of complexity and opportunity for error in an application is in the software, not the hardware. So it can be hard to believe when it happens, but hardware does fail sometimes. Even network cables can go bad, causing problems that are difficult to nail down.

Once again, the key to identifying the problem is good information. You really can’t have too much log data. Think about keeping logs like SMART logs from your hard drives and other hardware-based logs. These can show hardware problems that are slowing down the system as well as errors that may later develop into failures.

DATABASE ISSUES

Performance in modern database management systems (DBMS) is the subject of a great deal of study and literature. As with most other topics, there are issues of broad architecture and issues with a more minute focus. Before you need to do troubleshooting, you need to design and tune your DBMS and specific databases properly. The major vendors have guidelines you should follow, such as those in Oracle’s Database Performance Tuning Guide.

Almost any application that uses a DBMS has multiple layers of software surrounding and controlling access to it. So it’s important to confirm that the problem is in the DBMS and not in other software layers.

You have a wide choice of performance monitoring and management tools for database management systems, both from independent vendors like Riverbed and from the DBMS companies themselves. The tools from third-party vendors may have the advantage of integrating with monitoring and management tools that give a more complete view of the network, while still giving a very deep view into the behavior of the DBMS.

CLOUD ISSUES

A cloud architecture at once changes none of this, because the software doesn’t have to know that it’s running in a cloud, and everything, because the physical nature of the systems it’s running on is so radically different.

And so you need to maintain the fictional, traditional view of your architecture and manage and monitor it that way, because that’s what the software thinks it’s running on and many issues will proceed as normal. But it’s also the case that you don’t know what physical servers your software is running on, perhaps not even the data center they are in. The networking devices are probably all just software components, as are the database management systems. Do you suspect a problem with a particular DBMS instance? Start up another one in its place and dispose of the suspect one.

You should specifically do performance monitoring on your key applications, and from multiple locations. If you have 5 locations that use a particular application, monitor it from all 5 if possible. If only one of them has a performance problem, then you can eliminate many possibilities immediately. If all of them have a performance issue, then the problem is almost certainly at the server itself or something else where the server is housed. Monitoring and logging performance also gives you a sense of when peak periods are; it may make sense, if your architecture allows it, to pre-allocate more resources to the application at those times.
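The multi-location reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site names, the sample latencies, the 200 ms baseline and the "twice the baseline" rule are all hypothetical, and a real monitoring product would apply far richer logic.

```python
from statistics import median

def classify_slowdown(latency_ms_by_site, baseline_ms, slow_factor=2.0):
    """Give a rough hint about where to look, based on per-site response times.

    A site counts as slow when its median latency exceeds
    slow_factor * baseline_ms (an assumed rule of thumb, not a standard).
    """
    slow_sites = sorted(
        site
        for site, samples in latency_ms_by_site.items()
        if median(samples) > slow_factor * baseline_ms
    )
    if not slow_sites:
        return "no slowdown detected", slow_sites
    if len(slow_sites) == len(latency_ms_by_site):
        return "all sites slow: suspect the server or where it is housed", slow_sites
    return "some sites slow: suspect those sites or their network paths", slow_sites

# Hypothetical response times (ms) for one application measured from three locations.
measurements = {
    "site-a": [210, 190, 205],
    "site-b": [220, 200, 215],
    "site-c": [950, 1020, 980],
}
hint, slow = classify_slowdown(measurements, baseline_ms=200)
print(hint, slow)
```

Here only one location is slow, so the server itself becomes an unlikely suspect and attention can shift to that site and its path to the application.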

These logging and performance monitoring measures are not, in and of themselves, troubleshooting, but they make troubleshooting simpler, perhaps even easy.


OTHER ISSUES

The really nasty problems come when components which are working correctly independently interact to cause a problem. It happens.

An excellent example is described in a paper by Riverbed Technology [6] on finding and remediating performance problems. Performance of a .NET application was intermittently slowed for reasons which were difficult to determine.

The investigating analyst decided to correlate these delays with the performance data available from all other components of the system, and discovered that they coincided with garbage collection and associated CPU spikes in a Java application.

Using SteelCentral’s AppInternals tool, which integrates with VMware (as well as other hypervisors) to read its performance and log information, IT learned that the two applications, contrary to what they had believed, were running on the same CPU core under the same hypervisor. When the Java program decided to consume all the CPU capacity, inevitably it would slow the .NET application. The immediate solution was to separate the Java application into its own VM. The Java application itself arguably needed some fixing, but once in a separate VM with limited capacity it could no longer hamper other applications.
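The correlation step in this story can be approximated without any particular product. The sketch below is a simplified, hypothetical illustration, not the method used in the Riverbed paper: it assumes you have request timings from one application and pause windows (for example, garbage collection intervals) from another, and asks how many slow requests overlap a pause. All timestamps and the 1000 ms threshold are invented.

```python
def slow_requests_during_events(requests, events, slow_ms=1000):
    """Find slow requests that overlap an event window.

    requests: list of (start_seconds, duration_ms) tuples
    events:   list of (start_seconds, end_seconds) tuples, e.g. GC pauses
    Returns the overlapping slow requests and the fraction of all slow
    requests that they represent.
    """
    slow = [(start, dur) for start, dur in requests if dur >= slow_ms]
    overlapping = []
    for start, dur in slow:
        end = start + dur / 1000.0
        # Two intervals overlap when each one starts before the other ends.
        if any(start < ev_end and end > ev_start for ev_start, ev_end in events):
            overlapping.append((start, dur))
    fraction = len(overlapping) / len(slow) if slow else 0.0
    return overlapping, fraction

# Hypothetical data: .NET request timings and Java GC pause windows.
dotnet_requests = [(10.0, 120), (20.0, 2500), (35.0, 150), (41.0, 3100), (60.0, 1800)]
java_gc_pauses = [(19.5, 22.0), (40.0, 44.0)]

overlapping, fraction = slow_requests_during_events(dotnet_requests, java_gc_pauses)
print(f"{len(overlapping)} slow requests overlap GC pauses ({fraction:.0%} of slow requests)")
```

A high fraction does not prove causation, but it tells you which two components deserve a closer look together.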

Finally, there is another concept worth remembering when investigating complicated performance problems: the Flaw of Averages. A focus on broad measures can often obscure specific problems, and CPU utilization in the problem above is a good example. Overall performance of the system may have been fine when looked at in the big picture, because averaging a large set of numbers will obscure the impact of isolated problems, even if they are severe.
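A small numeric illustration makes the Flaw of Averages concrete. The CPU utilization samples below are invented for the example: 90 readings at 20% and 10 readings at 100%. The percentile uses the simple nearest-rank method, chosen here only for brevity.

```python
from statistics import mean

def summarize(samples):
    """Return the mean, nearest-rank 95th percentile and maximum of the samples."""
    ordered = sorted(samples)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {"mean": mean(ordered), "p95": ordered[rank - 1], "max": ordered[-1]}

# Hypothetical CPU utilization (%): mostly idle, with occasional saturation.
cpu_samples = [20] * 90 + [100] * 10
summary = summarize(cpu_samples)
print(summary)
```

The mean of 28% looks perfectly healthy, while the 95th percentile and the maximum both show the CPU pinned at 100% for part of the period. Looking at percentiles or raw spikes alongside averages is one way to keep isolated but severe problems visible.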

Another nice thing about the cloud is that the architecture makes adding new services cheap and easy, and monitoring is one of those services. Public cloud vendors like Amazon Web Services and Microsoft Azure provide a wealth of management and performance monitoring tools, while also supporting your existing tools.

Large public clouds also provide more redundancy than you could likely afford on your own: for systems, for Internet connectivity, even across multiple data centers. So many of the problems that might concern you for a conventional IT installation are less likely in the cloud.

And yet your network in the cloud may have its own special problems, especially if the provider itself has them. In 2015 Amazon Web Services had a major failure in their NoSQL database DynamoDB. One of the customers relying on this service was Netflix, but even though the problem persisted for about 8 hours Netflix wasn’t down for long [5], because they had taken extreme (and expensive) measures to ensure they would stay up and delivering even in the event of an AWS catastrophe.

Finally, while they may not strictly be cloud services, modern applications often rely on outside software services, such as DNS. Problems at an outside DNS service really could make “the network” slow. Remember to put these services on your checklist, although any decent analysis tool will note if DNS performance is off.
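As a rough spot check for the DNS item on that checklist, name resolution time can be measured with nothing more than the Python standard library. The helper name `time_dns_lookup` and the hostname used are illustrative assumptions; the call being timed is the standard `socket.getaddrinfo`. Results are affected by operating system and resolver caching, so a single fast lookup proves little, and repeated measurements over time are more useful.

```python
import socket
import time

def time_dns_lookup(hostname, resolver=socket.getaddrinfo):
    """Time one name resolution; return (succeeded, elapsed_milliseconds).

    The resolver argument defaults to socket.getaddrinfo and exists only so
    that a stand-in can be supplied when testing without a network.
    """
    start = time.perf_counter()
    try:
        resolver(hostname, None)
        ok = True
    except OSError:
        # socket.gaierror (resolution failure) is a subclass of OSError.
        ok = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ok, elapsed_ms

# "localhost" is used here only so the example does not depend on outside
# services; substitute the names your applications actually resolve.
ok, elapsed_ms = time_dns_lookup("localhost")
print(f"resolved={ok} in {elapsed_ms:.1f} ms")
```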


A BEAUTIFUL VIEW: BUILDING THE BUSINESS CASE FOR IT VISIBILITY

The fact of the matter is that most non-IT professionals don’t understand the inherent complexity of finding and fixing slow “network” problems, particularly when you don’t have the insights to quickly drill down to what is happening.

Most business professionals also don’t realize how much this “franken-monitoring” you are forced into is costing them. A recent study showed that the average business loses $140,000 per year due to performance problems. For large enterprises, an application failure can cost a whopping $500K to $1M per HOUR.

Put that in your pipe and smoke it.

Once business decision-makers in your organization realize the financial implications of poor application performance, they may be more open to a conversation on how to address it.

The best way to identify the source of performance problems, whether or not they actually are in “the network,” is to have the insights that allow you to pinpoint the source of the problem quickly. Having the most complete information available should make finding the problem straightforward. Solving it may still be a challenge, but at least you’ll know who is responsible for working on it and can take measures to work around it until you have a solution.

KEY RECOMMENDATIONS

The key to truly getting rid of the endless network finger-pointing is empowering IT with visibility. With a complete, holistic view of the application in all places, IT can quickly find the anomaly causing the problem.

The insights provided by a comprehensive Application Performance Monitoring (APM) and Network Performance Monitoring (NPM) system, one that takes monitoring information from all aspects of the system (servers, server applications, networking systems, outside data sources, and even end user experience), are critical to solving performance problems before they impact the business.

This granular view of the entire system covers the complete reach of the application, providing end-to-end visibility into its workings. It can trigger alerts early enough that users may not yet have noticed the problem.
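To illustrate what alerting "at an early stage" might mean in practice, here is a minimal sketch that compares recent response times against a baseline learned from normal operation. It is not how any specific APM or NPM product works; the three-standard-deviation rule and all the sample values are assumptions chosen for the example.

```python
from statistics import mean, stdev

def early_warning(baseline_ms, recent_ms, sigmas=3.0):
    """Return (alert, threshold_ms).

    An alert fires when the mean of the recent samples exceeds the baseline
    mean by more than `sigmas` sample standard deviations.
    """
    threshold = mean(baseline_ms) + sigmas * stdev(baseline_ms)
    return mean(recent_ms) > threshold, threshold

# Hypothetical response times (ms): a normal period, then the latest readings.
baseline = [200, 210, 190, 205, 195, 200, 208, 192]
recent = [230, 240, 235]

alert, threshold = early_warning(baseline, recent)
print(f"alert={alert}, threshold={threshold:.1f} ms")
```

A response time of about 235 ms is unlikely to draw a help desk call, yet it is clearly outside this application's normal range, which is the point: the drift is flagged while it is still too small for users to notice.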


A study conducted by a well-known research firm, which surveyed over 150 IT professionals with direct responsibility for business-critical applications, found that application and business services performance problems have the greatest impact on the business’ bottom line but were the most challenging issues to resolve. The lack of proactive alerting and root-cause identification proved to be a significant obstacle to fast resolution. These challenges lead not only to escalating business costs, but also to a fragmented and inefficient use of IT resources.

Based on these findings, the research organization made these 3 key recommendations:

1. In order to be EFFICIENT, IT needs the right TOOLS (or the depth of INSIGHTS those tools provide). Ideally, the tool set will have broad domain monitoring capabilities that can be abstracted in a way that focuses on services, and will therefore be able to model business services’ dependencies on the underlying infrastructure that is used to deliver each application. This capability promotes cooperation across teams, better resource prioritization, and more streamlined troubleshooting.

2. The TOOLS need to provide the right INFORMATION. An important part of the tool integration is the ability to understand the dynamic context of each business service and the ability to model which infrastructure components are used in delivering the service to the end user. This is the basis needed for an accurate analysis of issues.

3. The right INFORMATION needs to promote team COOPERATION. Infrastructure and application management requires teamwork. Multiple constituencies intervene at the different stages of the incident and problem management process. Each of these participants must not only find the right information to perform their tasks, but do so in accordance with the other team members. A common, integrated view of all component data is a key feature of a management solution.

And at the end of the day, IT teams empowered with this level of visibility have saved hundreds of productive hours and millions of dollars by being able to discover and fix issues quickly, possibly before any users notice them.

But what about the small to mid-sized business?

It would be remiss not to mention the budget issue. Every business loses money to this problem, but unfortunately the advanced APM/NPM equipment described above is generally only available to those with an enterprise-level budget. There are emerging companies, such as Vista Solutions, that provide this level of actionable data as a service to small and medium-sized businesses, helping them increase productivity and reduce costs for a fraction of the cost of an enterprise hardware purchase. This problem is prevalent across all sizes of businesses, and the solution is not continued trial-and-error multi-tool monitoring; it’s enterprise-level insights at affordable prices.

To learn more about services like this, visit http://vistasolutions.net.


FINAL THOUGHTS

What if you could find the root cause of slow applications before users could pick up the phone and blame the network?

What if you never had to get a call from your boss asking you “what the hell is going on?” (or could at least give them a definitive answer when they call)?

So, are you tired of the stress, ungrateful users blaming your network, and having to waste precious hours troubleshooting?

You are certainly not the only one.

Here’s the bottom line: No matter how big or small your business may be, every business relies on applications for productivity and profitability. Any level of slowdown of those applications loses the business money.

Businesses which rely on their IT need to ensure that the performance of those IT resources is as good as it can be. With the complexity of modern systems and the volumes of data involved, tools that look at the application from end to end are necessary. IT professionals may have access to numerous quality tools that monitor performance of individual components, but these are no longer enough. IT needs to see the complete picture in order to do its job right.

Whether your business is a large enterprise that can purchase advanced APM/NPM infrastructure, or a small business that hires a consultant to provide the insights these tools can provide, knowing what is going on in your infrastructure will quickly reap positive bottom-line benefits.

ABOUT VISTA SOLUTIONS

Vista Solutions works with business professionals and IT teams in small to medium businesses that are frustrated that their slow applications, network or website are causing them to lose time and money. Vista Solutions helps companies quickly identify and resolve the root cause of the problem so that they can be more productive and profitable.

Since 1975, Vista Solutions has worked with multinational Fortune 100 companies in highly complex environments to save millions of dollars and hundreds of productive hours by helping them sustainably improve the performance of their critical applications.

Learn more about our services at http://vistasolutions.net or contact us at http://vistasolutions.net/contact-us


SOURCES

1. Riverbed. (2015). Riverbed Global Application Performance Survey 2015. Retrieved from http://www.enterpriseinnovation.net/files/infographic/riverbed-app-performance-survey-apj-findings-page-001.jpg

2. AppDynamics. (2015). The Real Cost of IT Franken-monitoring. Retrieved from http://www.slideshare.net/appdynamics/the-real-cost-of-it-franken-monitoring

3. Avaya. (2014). Network Agility Research 2014. Retrieved from https://www.avaya.com/usa/documents/network-agility-research-2014-avaya-networking-feb-2014.pdf

4. Riverbed. (2015). Measuring the Business Impact of IT Through Application Performance.