- creating unified it monitoring and management in your environment

Upload: mike-mcdonald

Post on 04-Apr-2018


  • 7/30/2019 http://artandvids.blogspot.ca/ - Creating Unified IT Monitoring and Management in your Environment

    Creating Unified IT Monitoring and Management in Your Environment


    Creating Unified IT Monitoring and Management in Your Environment, Don Jones

    Introduction to Realtime Publishers

    by Don Jones, Series Editor

    For several years now, Realtime has produced dozens and dozens of high-quality books that just happen to be delivered in electronic format, at no cost to you, the reader. We've made this unique publishing model work through the generous support and cooperation of our sponsors, who agree to bear each book's production expenses for the benefit of our readers.

    Although we've always offered our publications to you for free, don't think for a moment that quality is anything less than our top priority. My job is to make sure that our books are as good as (and in most cases better than) any printed book that would cost you $40 or more. Our electronic publishing model offers several advantages over printed books: You receive chapters literally as fast as our authors produce them (hence the "realtime" aspect of our model), and we can update chapters to reflect the latest changes in technology.

    I want to point out that our books are by no means paid advertisements or white papers. We're an independent publishing company, and an important aspect of my job is to make sure that our authors are free to voice their expertise and opinions without reservation or restriction. We maintain complete editorial control of our publications, and I'm proud that we've produced so many quality books over the past years.

    I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if you've received this publication from a friend or colleague. We have a wide variety of additional books on a range of topics, and you're sure to find something that's of interest to you, and it won't cost you a thing. We hope you'll continue to come to Realtime for your educational needs far into the future.

    Until then, enjoy.

    Don Jones


    Introduction to Realtime Publishers

    Chapter 1: Managing Your IT Environment: Four Things You're Doing Wrong
      IT Management: How We Got to Where We Are Today
      Problem 1: You're Managing IT in Silos
      Problem 2: You Aren't Connecting Your Users, Service Desk, and IT Management
      Problem 3: You're Measuring the Wrong Things
      Problem 4: You're Losing Knowledge
      How Truly Unified Management Can Fix the Problems
      Summary

    Chapter 2: Eliminating the Silos in IT Management
      Too Many Tools Means Too Few Solutions
      Domain-Specific Tools Don't Facilitate Cooperation
      The Cloud Question: Unifying On-Premise and Off-Premise Monitoring
      Missing Pieces
      Not All of IT Is a Problem: Ordering, Routing, and Providing Services
      Coming Up Next

    Chapter 3: Connecting Everyone to the IT Management Loop
      Starting the Loop: Connecting Monitoring to the Service Desk
      Making Changes: How to Find a Change Management Window
      Communicating: How to Bring Users into the Loop
      SLAs: Setting and Meeting Realistic Expectations
      Tell Me What You Really Think
      When Everyone Doesn't Need to See Everything: A Multi-Tenant Approach
      Call It a Private Management Cloud: Allocating Costs
      Conclusion
      Coming Up Next

    Chapter 4: Monitoring: Look Outside the Data Center
      Monitoring Technical Counters vs. the End-User Experience
      How the EUE Drives Better SLAs
      How It's Done: Synthetic Transactions, Transaction Tracking, and More
      Top-Down Monitoring: From the EUE to the Root Problem
      Agent vs. Agentless Monitoring
      Monitoring What Isn't Yours
      Critical Capability: You Need to Monitor Everything
      Conclusion
      Coming Up Next

    Chapter 5: Turning Problems into Solutions
      Closing the Loop: Connecting the Service Desk to Monitoring
      Retaining Knowledge Means Faster Future Resolution
        Knowledge Bases
        Tickets as Knowledge Base Articles
        Unifying the Knowledge Base
        Making Tickets an Asset
      Past Performance Is an Indication of Future Results
        It's the Performance Database
      Summary
      Coming Up Next

    Chapter 6: Unified Management, Illustrated
      The Case Studies
        Detecting and Solving Problems
        Fulfilling User Orders
      A Shopping List for Unified IT Management
      Ways to Buy Your Unified IT
      Conclusion


    Copyright Statement

    © 2012 Realtime Publishers. All rights reserved. This site contains materials that have been created, developed, or commissioned by, and published with the permission of, Realtime Publishers (the "Materials") and this site and any such Materials are protected by international copyright and trademark laws.

    THE MATERIALS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Realtime Publishers or its web site sponsors. In no event shall Realtime Publishers or its web site sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials.

    The Materials (including but not limited to the text, images, audio, and/or video) may not be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, in whole or in part, except that one copy may be downloaded for your personal, non-commercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice.

    The Materials may contain trademarks, services marks and logos that are the property of third parties. You are not permitted to use these trademarks, services marks or logos without prior written consent of such third parties.

    Realtime Publishers and the Realtime Publishers logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners.

    If you have any questions about these terms, or if you would like information about licensing materials from Realtime Publishers, please contact us via e-mail [email protected].


    Chapter 1: Managing Your IT Environment: Four Things You're Doing Wrong

    At the very start of the IT industry, monitoring meant having a guy wander around inside the mainframe looking for burnt-out vacuum tubes. There wasn't really a way to locate the tubes that were working a bit harder than they were designed for, so monitoring, such as it was, was an entirely reactive affair.

    In those days, the Help desk was probably that same guy answering the phone when one of the other dozen or so computer people needed a hand feeding punch cards into a hopper, tracking down a burnt-out tube, and so on. The concepts of tickets, knowledge bases, service level agreements (SLAs), and so forth hadn't yet been invented.

    IT management has certainly evolved since those days, but it unfortunately hasn't evolved as much as it could or should have. Our tools have definitely become more complex and more mature, but the way in which we use those tools (our IT management processes) is in some ways still stuck in the days of reactive tube-changing.

    Some of the philosophies that underpin many organizations' IT management practices are really becoming a detriment to the organizations that IT is meant to support. The discussion in this chapter will revolve around several core themes, which will continue to drive the subsequent chapters in this book. The goal will be to help change your thinking about how IT management, particularly monitoring, should work, what value it should provide to your organization, and how you should go about building a better-managed IT environment.

    IT Management: How We Got to Where We Are Today

    In the earliest days of IT, we dealt with fairly straightforward systems. Even simplistic, by today's standards. The IT team often consisted of people who could fix any of the problems that arose, simply because there weren't all that many moving parts. It's as if IT was a car: a machine capable of complexity and of doing many different things, but perfectly comprehensible, in its entirety, by a single human being.


    As we started to evolve that IT car into a space shuttle, we gradually needed to allow for specialization. Individual systems became so complex in and of themselves that we needed domain-specific experts to be able to monitor, maintain, and manage each system. Messaging systems. Databases. Infrastructure components. Directory services. The vendors who produced these systems, along with third parties, developed tools to help our experts monitor and manage each system. That's really where things went wrong. It seemed perfectly sensible at the time, and indeed there was probably no other way to have done things, but that establishment of domain-specific silos, each with their own tools, their own procedures, and their own expertise, was the seed for what would become a towering problem inside many IT shops.

    Fast forward to today, when our systems are vastly more complex, vastly interconnected, and increasingly not even hosted within our own data centers. When a user encounters a problem, they obviously can't tell us which of our many complex systems is at fault. They simply tell us what they observe and experience about the problem, which may be the aggregate result of several systems' interactions and interdependencies. Our users see a holistic environment: IT. That doesn't correspond well to what we see on the back end: databases, servers, directories, files, networks, and more. As a result, we often spend a lot of time trying to track down the root cause of problems. Worse, we often don't even see the problems coming, because the problems only exist when you look at the end result of the entire environment rather than at individual subsystems. Users feel completely disconnected from the process, shielded from IT by a sometimes-helpful, sometimes-not Help desk. IT management has a difficult time wrapping their heads around things like performance, availability, and so on, simply because they're forced to use metrics that are specific to each system on the network rather than look at the environment as a whole.

    The way we've built out our IT organizations has led to very specific business-level issues, which have become common concerns and complaints throughout the world:

    • IT has difficulty defining and meeting business-level SLAs. "The messaging server will be up 99% of the time" isn't a business-level SLA; it's a technical one. "Email will flow between internal and external users 99% of the time" is a business-level SLA, but it can be difficult to measure because that statement involves significantly more systems than just the email server.

    • IT has difficulty proactively predicting problems based on system health, and remains largely reactive to problems.

    • When problems occur, IT often spends far too much time pinpointing the root cause of the problem.

    • IT's concept of performance and system health is driven by systems (database servers, directory services, network devices, and so forth) rather than by how users and the organization as a whole are experiencing the services delivered by those systems.

    • IT has a tough time rapidly adopting new technologies that can benefit the business. Paradoxically, IT is often the part of the organization most opposed to change, because change is usually the trigger for problems. Broken systems don't help anyone, but an inability to quickly incorporate changes can also be a detriment to the organization's competitiveness and flexibility.

    • IT has a really tough time adopting new technologies that are significantly outside the team's experience or physical reach, most specifically the bevy of outsourced offerings commonly grouped under the term "cloud computing." These technologies and approaches to technology are so different from what's come before that IT doesn't feel confident that they can monitor and manage these new systems. Thus, they resist implementing these types of systems for fear that doing so will simply damage the organization.

    • Even with modern self-service Help desk systems, users feel incredibly powerless and out of touch when it comes to IT.

    All of these business-level problems are the direct result of how we've always managed IT. Our processes for monitoring and managing IT basically have four core problems. Not every organization has every single one of these, of course, and most organizations are at least aware of some of these and work hard to correct them. Ultimately, however, organizations need to ensure that all four of these core problems are addressed. Doing so will immediately begin to resolve the business-level issues I've outlined.
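    To make the first item in the list above concrete, here is a minimal sketch of why a business-level SLA is harder to meet than a technical one. It assumes that an outbound email depends on several components in series and that those components fail independently; the component names and the 99% figures are hypothetical and are used only for illustration.

```python
from math import prod


def end_to_end_availability(component_availability: dict[str, float]) -> float:
    """Availability of a service that needs every listed component to be up.

    Assumes the components fail independently of one another, which real
    systems only approximate.
    """
    return prod(component_availability.values())


# Hypothetical components on the path of an outbound email. The names and
# figures are illustrative assumptions, not measurements from the text.
email_path = {
    "mail server": 0.99,
    "directory service": 0.99,
    "firewall": 0.99,
    "internet link": 0.99,
}

service = end_to_end_availability(email_path)
print(f"Each component: 99.00%  End to end: {service:.2%}")
```

    Under these assumptions, four components that each meet a 99% technical SLA deliver roughly 96% availability for the service the user actually cares about, which is why the two kinds of SLA cannot be treated as interchangeable.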

    Problem 1: You're Managing IT in Silos

    Figures 1.1, 1.2, and 1.3 illustrate one of the fundamental problems in IT monitoring and management today.

    Figure 1.1: Windows Performance Monitor.


    Figure 1.2: SQL Server Performance.

    Figure 1.3: Router Performance.

    These figures each illustrate a different performance chart for various components of an IT system. Each of these images was produced using a tool that is more or less specialized for the exact thing that was being monitored. The tool that produced the router performance chart, for example, can't produce the same chart for a database server or even for a router that's located on someone else's network.


    This is such a core, fundamental problem that many IT experts can't even recognize that it is a problem. Using these domain-specific tools is such an integrated and seemingly natural part of how IT works that many of us simply can't imagine a different way. But we need to move past using these domain-specific tools as our first line of defense when it comes to monitoring and troubleshooting.

    Why?

    One major reason is that these tools keep us all from being on the same page. IT experts can't even have meaningful cross-discipline discussions when these tools become involved. "I'm looking at the database server, and the performance is at more than 200 TPMs," one expert says. "Well, that must be a problem because the router is running well over 10,000 PPMs." Those two experts don't even have a common language for performance because they're locked into the domain-specific, deeply technical aspects of the technologies they manage.

    Domain-specific tools also encourage what is probably the worst single practice in all of IT: looking at systems in isolation. The database guy doesn't have the slightest idea what makes a router tick, what constitutes good or bad performance in a messaging server, or what to look for to see if the directory services infrastructure is running smoothly. So the database guy puts on a set of blinders and just looks at his database servers. But those servers don't exist in a vacuum; they're impacted by, and they in turn impact, many other systems. Everything works together, but we can't see that using domain-specific tools.

    We have to permanently remove the walls between our technical disciplines, breaking down the silos and getting everyone to work as a single team. In large part, that means we're going to have to adopt new tools that enable IT silos to work as a team, putting the information everyone needs into a common context. Sure, domain-specific tools will always have their place, but they can't be our first line of information.

    Case Study

    Jerry works for a typical IT department in a midsize company. His specialty is Windows server administration, and his team includes specialists for Web applications, Microsoft SQL Server and Oracle, VMware vSphere, and for the network infrastructure. The company outsources certain enterprise functionality, including their Customer Relationship Management (CRM) and email.

    Recently, a problem occurred that caused the company's main Web site to stop sending customer order confirmation emails. Jerry was initially called to solve the problem, on the assumption that it was with the company's outsourced messaging solution. Jerry discovered, however, that user email was flowing normally. He passed the problem to the Web specialist, who confirmed that the Web site was working properly but that emails sent by it were being rejected. Jerry filed a ticket with the messaging hosting company, who responded that their systems were in working order and that he should check the passwords that the Web servers were using.


    After more than a day of back-and-forth with the hosting company and various experts, the problem was traced to the company's firewall. It had recently been upgraded to a new version, and that version was now blocking outgoing message traffic from the company's perimeter network, which is where the Web servers were located. The network infrastructure specialist was called in to reconfigure the firewall, and the problem was solved.

    This narrative precisely demonstrates the problem: By managing our IT teams as domain-specific silos, we significantly hinder their ability to work together to solve problems. The fact that IT experts require domain-specific tools shouldn't be a barrier to breaking down those silos and getting our team to work more efficiently together. This becomes especially important when pieces of the infrastructure are outsourced; those hosting companies are an unbreakable silo, as they're not responsible for any systems other than the ones they provide to us. However, the dependencies that our systems and processes have on their systems mean our own team still has to be able to monitor and troubleshoot those outsourced systems as if they were located right in the data center.

    Problem 2: You Aren't Connecting Your Users, Service Desk, and IT Management

    Communication is a key component of making any team work; and the team that is your organization is no exception. In the case of IT, we typically use Help desk systems as our means of enabling communications, but that isn't always sufficient. Help desk systems are almost always built around the concept of reacting to problems, then managing that reaction; they're almost by definition not proactive.

    For example, how do you tell your users that a given system will have degraded performance or will be offline for some period of time? Probably through email, which creates a couple of problems:

    • Important messages tend to get lost in the glut of email that users deal with daily.

    • Users who don't get the message tend to go the Help desk route, which doesn't include a means of intercepting their mental process and letting them know that the problem was planned for.


    Most IT teams do know the things that need to be communicated throughout the organization, for example:

    • SLAs

    • The current status of SLAs: whether they're being met

    • Planned outages and degraded service

    • Average response times for specific services

    • Known issues that are being worked on

    What most IT teams have a problem with is communicating these items consistently across the entire organization. Some organizations rely on email, which, as I've already pointed out, can be inefficient and not consistently effective. Some organizations will use an intranet Web site, such as a SharePoint portal, to post notices, but these sites aren't directly integrated with the Help desk, making it an extra step to keep them updated and requiring users to remember to check them.

    Case Study

    Tom works as an inside salesperson for a midsize manufacturing company. Recently, the application that Tom uses to track prospects and create new orders started responding very slowly, and over the course of the day, stopped working completely.

    Tom's initial action was to call his company's IT Help desk. The Help desk technician sounded harried and frustrated, and told Tom, "We know, we're working on it," and hung up. Tom had no expectation when the system might return to normal, and was afraid to bother the Help desk by calling back for more details.

    Over the course of that day, the Help desk logged calls from nearly every salesperson, each of whom called on their own to find out what was going on. Eventually, the Help desk simply stopped logging the calls, telling everyone that, "A ticket is already open," and disconnecting the call.

    Someone on the IT management team eventually sent out an email explaining that a server had failed and that the application wasn't expected to be online until the next morning. Tom wished he had known earlier; although he'd originally planned to make sales calls all day, if he'd known that the application would be down for that long, he could have switched to other activities for the day or even just taken the day off.


    Management communications are equally important, and equally challenging. Providing frank numbers on service levels, response times, outages, and so forth is crucial in order for management to make better decisions about IT, but that information can often be difficult to come by.

    Problem 3: You're Measuring the Wrong Things

    This problem is very likely at the heart of everything IT is not doing to help better align technology with business needs. The following case study outlines the scenario.

    Case Study

    Shelly works in the Accounting department for her company. Recently, while trying to close the books for her company, the accounting application began to react very slowly. She called her company's IT Help desk to report the problem.

    The Help desk technician listened to her then said that, "Everything on that server looks fine right now. I'll open a ticket and ask someone to look at it, but since we are currently within our service level agreement for response times, it will be a low-priority ticket."

    Shelly continued to struggle with the slowly responding application. Eventually, someone was dispatched to her desktop. She demonstrated that every other application was responding normally. She pointed out that other people in her department were having similar problems with the application. The technician made her close all of her applications and then restarted her computer, to no effect. He shrugged, entered some notes into his smartphone, and left.

    By the next morning, the application's response times were better, but they were far from normal. Shelly continued to call the Help desk for updates on her ticket's status, but it seemed as if the IT team had given up on trying to fix the problem, and refused to even admit that there was a problem.

    This kind of scenario unfortunately happens all too often in many organizations. It exactly illustrates what happens when several problems are happening at once: IT is operating as a set of individual silos rather than as a team, and each silo has its own definition for words like "slow." A root issue here is that everyone is measuring the wrong thing. Figure 1.4 shows how the average IT team sees a multi-component, distributed application.


    Figure 1.4: IT perspective of a distributed application.


    They see the components. Domain experts measure the performance of each component using technical metrics, such as processor utilization, response time, and so forth. When a component's performance exceeds certain predefined thresholds, someone in IT pays attention. Figure 1.5, however, shows how a user sees this same application.

    Figure 1.5: User's perspective of a distributed application.

    The user doesn't (and often can't) see any of the components. They simply see an application, and either it's responding the way they expect, or it isn't. It doesn't matter a bit to the user if every single constituent component is running at an acceptable level of processor utilization, whatever that means. They simply care whether the application is working. This creates a major disconnect between the user population and IT, as Figure 1.6 illustrates.


    Figure 1.6: IT vs. user measurements of performance.


    Users and IT measure very different things. An IT-centric SLA might specify a given response time for queries sent to a database server; that often has little to do with whether an application is seen as "slow" by users. Worse, as we start to migrate services and components to "the cloud," we lose much of our ability to measure those components' performance the way we do for things that are in our own data center. The result? Nobody can agree on what an SLA should say.

    This all has to change. We have to start measuring things more from a user perspective. The performance of individual components is important, but only as they contribute to the total experience that a user perceives. We need to define SLAs that put everyone, users and IT alike, on the same page, then manage to those SLAs using tools that enable us to do so.
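    To make the idea of a user-centric SLA concrete, here is a minimal sketch that scores an SLA on end-to-end transaction times rather than on any single component's metrics. Everything in it is an assumption made for illustration: the Transaction type, the user_sla_compliance function, the sample timings, the 3-second target, and the 95% requirement are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """One end-to-end user action, timed from request to usable result."""
    name: str
    seconds: float


def user_sla_compliance(transactions, target_seconds):
    """Return the fraction of user transactions that finished within the target."""
    if not transactions:
        raise ValueError("no transactions were measured")
    within = sum(1 for t in transactions if t.seconds <= target_seconds)
    return within / len(transactions)


# Hypothetical measurements; real ones would come from timing actual user actions.
sample = [
    Transaction("open order screen", 1.2),
    Transaction("open order screen", 1.8),
    Transaction("open order screen", 6.5),
    Transaction("open order screen", 2.1),
]

TARGET_SECONDS = 3.0        # assumed SLA target for this transaction
REQUIRED_FRACTION = 0.95    # assumed share of transactions that must meet it

compliance = user_sla_compliance(sample, TARGET_SECONDS)
sla_met = compliance >= REQUIRED_FRACTION
print(f"{compliance:.0%} within {TARGET_SECONDS}s; SLA met: {sla_met}")
```

    Three of the four sample transactions meet the target, so this SLA is reported as missed even if every back-end component is sitting comfortably inside its own thresholds. That is exactly the gap between the two perspectives.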

    Some organizations will tell you that they're moving, or have moved, to a "service-based" IT offering. What that generally means in broad terms is that the organization is seeking to provide IT as a set of services to the organization's various departments and users. In many instances, however, those "service-oriented" organizations are still focused on components and devices, which isn't a service-oriented approach at all. When your phone line goes down, you don't call the phone company (on your cell phone, probably) and start asking questions about switches and trunk lines; you ask when your dial tone will be back. The back-end infrastructure is meaningless to the user. You don't ask for a service credit based on how long a particular phone company office will be offline; you ask for that credit based on how long you went without a dial tone. That's the model IT needs to move toward.

    Problem 4: You're Losing Knowledge

    The last problematic practice we'll look at is the issue of lost institutional knowledge. This problem is a purely human one, and frankly it's going to be difficult to address. Here's a quick scenario to set the scene.

    Case Study

    Aaron works for his company's IT department. He's been with the company for 3 years and is responsible for several of the company's systems and infrastructure components. One Tuesday, Aaron is contacted by his company's IT Help desk. "We're assigning you a ticket about the Oracle system," he's told. "Once every couple of months it starts acting really weird, and someone has to fix it."

    "I'm not the Oracle guy," Aaron says. "That's Jill."

    "Yeah, but Jill's out on vacation for 2 weeks. So you'll have to fix it."

    "I've no idea what to do!"

    "Well, figure something out. The CEO gets upset when this takes too long to fix."


    Unfortunately, too much knowledge gets wrapped up in the heads of specific individuals. In fact, it's a sad truth that many organizations "deal" with this problem by simply discouraging IT team members from taking lengthy vacations, and often resist other activities that would put them out of touch, such as sending them to conferences and classes to continue their education and to learn new skills.

    More than a few organizations have made half-hearted attempts at building "knowledge bases," in a hope that some of this institutional knowledge can be committed to electronic paper, preserved, and made more accessible. The problem is that IT professionals aren't necessarily good writers, so the act of producing the knowledge base is difficult for them. It also takes time, and that is time the organization is often unwilling to commit, especially in the face of other daily pressures and demands.

    As I said, this is a problem that's difficult to fix. The IT team realizes it's a problem, and is generally willing to fix it, but they're not tech writers, and often have a limited ability to fix the problem. You can usually create management requirements that require problems and solutions be logged in a Help desk ticketing system, but searching through that system for problems and solutions can often be difficult and time-consuming, much like searching for solutions on an Internet search engine, with all of the false "hits" such a search generally produces.

    But we must find a way to address this problem. Knowledge about the company's infrastructure, and how to solve problems, has to be captured and preserved. This requirement is crucial not only to solving problems faster in the future but also to eventually preventing those problems by making better IT management decisions.

    How Truly Unified Management Can Fix the Problems

    This book is going to be all about fixing these four problems, and the means by which I'll propose to do so falls under the umbrella term unified management. Essentially, unified management is all about bringing everything together in one place.

    We'll break down the silos between IT disciplines, putting everyone onto the same console, getting everyone working from the same data set, and getting everyone working together on problems. We'll do that in a way that brings users, IT, and management into a single viewport of IT service and performance. We'll create more transparency about things like service levels, letting users see what's happening in the environment so that they're more informed.

    We'll inform users in a way that's meaningful to them rather than using invisible, back-end technical metrics. We'll rebuild the entire concept of SLAs into something that's meaningful first to users and management, and that can withstand the transition to "hybrid IT" that's being brought about by outsourcing certain IT services to "the cloud."


    Finally, we'll find a way to capture information about our environment, including solutions to problems, to enable faster time-to-resolution when problems occur. In addition, this information will enable management to make smarter decisions about future technology directions and investments.

    We'll try to do all of this in a way that won't cost the organization an arm and a leg nor take half a lifetime to actually implement. That will involve a certain amount of creativity, including looking at outsourced solutions. The idea of an outsourced solution providing monitoring for in-sourced components is fairly innovative, and we'll see what applicability it has.

    I should point out that much of what we'll be looking at can work to support the IT management frameworks that many organizations are adopting these days, including the ITIL framework that's become popular in the past few years. You certainly don't have to be an ITIL expert to take advantage of the new processes and techniques I'll suggest, nor do you even have to think about implementing ITIL (or any other framework) if your organization isn't already doing so. If you are using a framework, however, you'll be pleased to know that everything I have to propose should fit right into it.

    Summary

    This chapter has established the four main themes that will drive the remaining chapters in this book. These core themes represent what many experts believe are the biggest and most fundamental problems with how IT is managed today, and they are what we'll focus on fixing throughout the remainder of this book. Our focus will be on changing management philosophies and practices, not on simply picking out new tools, although new tools may be something you'll acquire to help support these new practices.

    Chapter 2 will focus on the first problematic practice, which is the fact that IT tends to be managed in domain-specific silos. We'll look at the technical reasons organizations have been more or less forced to manage this way, and explore ways in which you can start to change that practice.

    Chapter 3 will look at connecting people: IT management, your users, your service desk, and more. Only by bringing everyone into the process can IT better align itself to the needs of the organization.

    Our third problem practice will be the subject of Chapter 4, where we dive into looking outside the data center for monitoring. The goal will be to solve the problems we've discussed in this chapter, further focusing IT on its value to the organization.


    Chapter 5 will discuss ways to turn problems into future solutions. Although modern organizations are fully aware of the need for Help desk tracking and knowledge building, how those activities are managed as part of the larger IT management process can make a huge difference in their value-add to the organization.

    We'll conclude in Chapter 6, with an attempt to visualize an IT environment where these new, unified management practices are in place. I'll provide narratives from several case studies, helping you see how these modernized practices work in a real environment.


    Chapter 2: Eliminating the Silos in IT Management

    In the previous chapter, I proposed that one of the biggest problems in modern IT is the fact that we manage our environment in technology-specific silos: database administrators are in charge of databases, Windows admins are in charge of their machines, VMware admins run the virtualization infrastructure, and so forth. I'm not actually proposing that we change that exact practice; having domain-specific experts on the team is definitely a benefit. However, having these domain-specific experts each using their own unique, domain-specific tool definitely creates problems. In this chapter, we'll explore some of those problems, and see what we can do to solve them and create a more efficient, unified IT environment.

    Too Many Tools Means Too Few Solutions

    "Comparing apples to oranges" is an apt phrase when it comes to how we manage performance, troubleshooting, and other core processes in IT. Tell an Exchange Server administrator that there's a performance problem with the messaging system, and he'll likely jump right into Windows Performance Monitor, perhaps with a pre-created counter set that focuses on disk throughput, processor utilization, RPC request count, and so forth, as shown in Figure 2.1.

    Figure 2.1: Monitoring Exchange.


    If the Exchange administrator can't find anything wrong with the server, he might pass the problem over to someone else. Perhaps it will be the Active Directory administrator, because Active Directory plays such a crucial role in Exchange's operation and performance. Out comes the Active Directory administrator's favorite performance tool, perhaps similar to the one shown in Figure 2.2. This is truly a domain-specific tool, with special displays and measurements that relate specifically to Active Directory.

    Figure 2.2: Monitoring Active Directory.

    If Active Directory looks fine, then the problem might be passed over to the network infrastructure specialist. Out comes another tool, this one designed to look at the performance of the organization's routers (see Figure 2.3).


    Figure 2.3: Monitoring router performance.

    Combined, all of these tools have led these three specialists to the same decision: Everything's working fine. In spite of the fact that Exchange is clearly, from the users' point of view, not working fine, there's no evidence that points to a problem.

    Simply put, this is a "too many tools, too few answers" problem. In today's complex IT environments, performance, along with other characteristics like availability and scalability, is the result of many components interacting with each other and working together. You can't manage IT by simply looking at one component; you have to look at entire systems of interacting, interdependent components.

    Our reliance on domain-specific tools holds us back from finding the answers to our IT problems. That reliance also holds us back when it comes time to grow the environment, manage service level agreements (SLAs), and handle other core tasks. I've actually seen instances where domain-specific tools acted almost as blinders, preventing an expert who should have been able to solve a problem, or at least identify it, from doing so as quickly as he or she might have done.


    Case Study

    Heather is a database administrator for her organization. She's responsible for the entire database server, including the database software, the operating system (OS), and the physical hardware.

    One day she receives a ticket indicating that users are experiencing sharply reduced performance from the application that uses her database. She whips out her monitoring tools, and doesn't see a problem. The server's CPU is idling along, disk throughput is well within norms, and memory consumption is looking good. In fact, she notices that the amount of workload being sent to the server is lower than she's used to seeing. That makes her suspect the network is having traffic jams, so she reassigns the ticket to the company's infrastructure team. That team quickly reassigns the ticket right back to her, assuring her that the network is looking a bit congested, but it's all traffic coming from her server.

    Heather looks again, and sees that the server's network interface is humming along with a bit more traffic than usual. Digging deeper, she finally realizes that the server is experiencing a high level of CRC errors, and is thus having to retransmit a huge number of packets. Clients experience this problem as a general slowdown because it takes longer for undamaged packets to reach their computers.

    Heather's focus on her specific domain expertise led her to toss the problem "over the wall" to the infrastructure team, wasting time. Because she wasn't accustomed to looking at her server's network interface, she didn't check it as part of her routine performance troubleshooting process.
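    The check Heather skipped is simple enough to fold into a routine. The following is a rough sketch, not taken from any particular tool: it compares two samples of a network interface's counters and flags the interface when too many received frames fail the CRC check. The counter names (rx_frames, rx_crc_errors), the sample values, and the 1% threshold are assumptions for illustration; the sketch also assumes rx_frames counts every received frame, damaged or not.

```python
def crc_error_ratio(before, after):
    """Share of frames received between two counter samples that failed the CRC check."""
    frames = after["rx_frames"] - before["rx_frames"]
    errors = after["rx_crc_errors"] - before["rx_crc_errors"]
    if frames <= 0:
        return 0.0  # no traffic in the interval, so nothing to judge
    return errors / frames


# Hypothetical counter samples taken a few minutes apart.
earlier = {"rx_frames": 1_000_000, "rx_crc_errors": 10}
later = {"rx_frames": 1_200_000, "rx_crc_errors": 8_010}

CRC_ALERT_RATIO = 0.01  # assumed alert threshold; pick one that suits the network

ratio = crc_error_ratio(earlier, later)
needs_attention = ratio > CRC_ALERT_RATIO
print(f"CRC error ratio: {ratio:.2%}; needs attention: {needs_attention}")
```

    Working from the difference between two samples, rather than from the raw totals, keeps errors that happened long ago from masking or exaggerating what is going on right now.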

    Domain-Specific Tools Don't Facilitate Cooperation

    If the components of our complex IT systems are cooperative and interdependent, our IT professionals are often anything but. In other words, IT management tends to encourage the silos that are built around specific technology domains. There's the database administration group, the Active Directory group, the infrastructure group, and so forth. Even companies that practice matrix management, in which multiple domain experts are grouped into a functional team, still tend to accept the silos around each technical domain.


    There are two major reasons that these silos persist, and almost any IT professional can describe them to you:

    "I don't know anything about that." Each domain expert is an expert in his technical area. The database administrator isn't proficient at monitoring or managing routers, and doesn't especially want to work with them anyway. There's little real value in extensive technical cross-training for most organizations, simply because their staff doesn't have the time. Devoting time to secondary and tertiary disciplines reduces the amount of time available for their primary job responsibilities.

    "I don't want anyone messing with my stuff." IT professionals want to do a good job, and they're keenly aware that most problems come about as the result of change. Allow someone to change something, and you're asking for trouble. If someone changes something in your part of the environment, and you don't know about their activity, you'll have a harder time fixing any resulting problems.

    Both of these reasons are completely valid, and I'm in no way suggesting that everyone on the IT team become an expert in every technology that the organization must support. However, the attitudes reflected in these two perspectives require some minor adjustment.

    One reason I keep coming back to domain-specific tools is that they encourage this kind of walled-garden separation, and do nothing to encourage even the most cursory cooperation between IT specialists. Cooperation, when it exists, comes about through good human working relationships, and those relationships often struggle with the fact that each specialist is looking at a different set of data and working from a different sheet of music, so to speak. I've been in environments and seen administrators spend hours arguing about whose "fault" something was, each pointing to their own domain-specific tools as evidence.

    Case Study

    Dan is an Active Directory administrator for his company, and is responsible for around two dozen domain controllers, each of which runs in a virtual machine. Peg is responsible for the organization's virtual server infrastructure, and manages the physical hosts that run all of the virtual machines.

    One afternoon, Peg gets a call from Dan. Dan's troubleshooting a performance problem on some of the domain controllers, and suspects that something is consuming resources on the virtualization host that his domain controllers need.


    Peg opens her virtual server console and assures Dan that the servers aren't maxed out on either physical CPU or memory, and that disk throughput is well within expected levels. Dan counters by pointing to his Active Directory monitoring tools, which show maxed-out processor and memory statistics, and lengthening disk queues that indicate data isn't being written to and read from disk as quickly as it should be. Peg insists that the physical servers are fine. Dan asks if the virtual machines' settings have been reconfigured to provide fewer resources to them, and Peg tells him no.

    The two go back and forth like this for hours. They're each looking at different tools, which are telling them completely different things. Because they're not able to speak a common technology language, they're not able to work together to solve the problem.

    We don't need to have every IT staffer be an expert in every IT technology; we do need to make it easier for specialists to cooperate with one another on things like performance, scalability, availability, and so forth. That's difficult to do with domain-specific tools. The router administrator doesn't want a set of database performance-monitoring tools, and the database administrator doesn't especially want the router admin to have those tools. Having domain-specific tools for someone else's technical specialization is exactly how the two attitudes I described earlier come into play.

    Ultimately, the problem can be solved by having a unified tool set. Get everyone's performance information onto the same screen. That way, everyone is playing from the same rule book, looking at the same data, and that data reflects the entire, interdependent environment. Everyone will be able to see where the problem lies, then they can pull out the domain-specific tools to start fixing the actual problem area, if needed.
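    As a rough illustration of "the same screen," the sketch below merges readings from two domain tools into a single list sorted by utilization, so the host-level and guest-level numbers from a dispute like Dan and Peg's sit side by side. The Reading type, the unified_view function, the component names, the values, and the 90% marker threshold are all hypothetical; a real unified console would gather this data itself.

```python
from collections import namedtuple

Reading = namedtuple("Reading", "domain component metric percent")


def unified_view(*domain_feeds):
    """Merge readings from every domain's tool into one list, highest utilization first."""
    merged = [reading for feed in domain_feeds for reading in feed]
    return sorted(merged, key=lambda r: r.percent, reverse=True)


# Hypothetical readings, loosely modeled on the case study above.
virtualization = [
    Reading("virtualization", "host-01", "physical CPU", 45.0),
    Reading("virtualization", "host-01", "physical memory", 60.0),
]
directory = [
    Reading("directory", "dc-03 (VM on host-01)", "guest CPU", 98.0),
    Reading("directory", "dc-03 (VM on host-01)", "guest memory", 95.0),
]

view = unified_view(virtualization, directory)
for r in view:
    marker = "!!" if r.percent >= 90 else "  "  # 90% is an assumed threshold
    print(f"{marker} {r.domain:<15} {r.component:<22} {r.metric:<16} {r.percent:5.1f}%")
```

    With both sets of numbers in one list, the conversation can shift from whose tool is right to why a guest is saturated on a host that still has headroom.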

    The Cloud Question: Unifying On-Premise and Off-Premise Monitoring

    This concept of a unified monitoring console becomes even more important as organizations begin shifting more of their IT infrastructure into "the cloud."

    The Cloud Is Nothing New

    I have to admit that I'm not a big fan of "the cloud" as a term. It's very sales-and-marketing flavored, and the fact is that it isn't a terribly new concept. Organizations have outsourced IT elements for years. Probably the most outsourced component is Web hosting, either outsourcing single Web sites into a shared-hosting environment, or outsourcing collocated servers into someone else's data center.


    For the purposes of this discussion, "the cloud" simply refers to some IT element being outsourced in a way that abstracts the underlying infrastructure. For example, if you have collocated servers in a hosting company's data center, you don't usually have details about their internal network architecture, their Internet connectivity, their routers, and so forth; the data center is the piece you're paying to have abstracted for you. In a modern cloud computing model like Windows Azure or Amazon Elastic Compute Cloud, you don't have any idea what physical hosts are running your virtual machines; that physical server level is what you're paying to have abstracted, along with supporting elements like storage, networking, and so on. For a Software as a Service (SaaS) offering, you don't even know what virtual machines might be involved in running the software because you're paying to have the entire underlying infrastructure abstracted.

    Regardless of which bits of your infrastructure wind up in some outsourced service provider's hands, those bits are still a part of your business. Critical business applications and processes rely on those bits functioning. You simply have less control over them, and typically have less insight into how well they're running at any given time.

    This is where domain-specific tools fall apart completely. Sure, part of the whole point of outsourcing is to let someone else worry about performance, but outsourced IT still supports your business, so you at least need the ability to see how the performance of outsourced elements is affecting the rest of your environment. If nothing else, you need the ability to authoritatively point the finger at the specific cause of a problem, even if that cause is an outsourced IT element and you can't directly effect a solution. This is where unified monitoring truly earns a place within the IT environment. For example, Figure 2.4 shows a very simple unified dashboard that shows the overall status of several components of the infrastructure, including several outsourced components, such as Amazon Web Services.


    Figure 2.4: Unified monitoring dashboard.

    The idea is to be able to tell, at a glance, where performance is failing, to drill through for more details, and then to either start fixing the problem, if it exists on your end of the cloud, or escalate the problem to someone who can.
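    One way to picture "at a glance, then drill through" is as a tree of components whose top-level status is simply the worst status found anywhere beneath it. The sketch below is an illustration under that assumption, not a description of any particular dashboard; the status names, the rollup and drill_down functions, and the sample environment are all hypothetical.

```python
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def rollup(node):
    """Return the worst status found in this node or anywhere beneath it."""
    worst = node.get("status", "ok")
    for child in node.get("children", []):
        child_status = rollup(child)
        if SEVERITY[child_status] > SEVERITY[worst]:
            worst = child_status
    return worst


def drill_down(node, path=()):
    """Yield the path to every leaf component whose status is not 'ok'."""
    here = path + (node["name"],)
    children = node.get("children", [])
    if not children:
        if node.get("status", "ok") != "ok":
            yield " > ".join(here)
        return
    for child in children:
        yield from drill_down(child, here)


# Hypothetical environment mixing on-premises and outsourced components.
environment = {
    "name": "Order application",
    "children": [
        {"name": "On-premises", "children": [
            {"name": "Database server", "status": "ok"},
            {"name": "Router", "status": "ok"},
        ]},
        {"name": "Outsourced", "children": [
            {"name": "SaaS sales application", "status": "critical"},
            {"name": "Amazon Web Services", "status": "ok"},
        ]},
    ],
}

overall = rollup(environment)
problems = list(drill_down(environment))
print(overall, problems)
```

    The roll-up gives the single indicator for the top of the dashboard, and the drill-down path shows whether the failing piece sits on your side of the cloud or needs to be escalated to a provider.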

    Let's be very clear on one thing: Any organization that's outsourcing any portion of its business IT environment and cannot monitor the basic performance of those outsourced elements is going to be in big trouble when something eventually goes wrong. Sure, you have SLAs with your outsourcing partners, but read those SLAs. Typically, they only commit to a refund of whatever fees you pay if the SLA isn't met. That does nothing to compensate you for lost business that results from the unmet SLA. It's in your best interests, then, to keep a close watch on performance. That way, when it starts to go bad, you can immediately contact your outsourcing partner and get someone working on a fix so that the impact on your business can at least be minimized.

    Missing Pieces

    There's another problem when it comes to performance monitoring and management, scalability planning, and so forth: missing pieces. Our technology-centric approach to IT tends to give us a myopic view of our environment. For example, consider the diagram in Figure 2.5. This is a typical (if simplified) diagram that any IT administrator might create to help visualize the components of a particular application.


    Figure 2.5: Application diagram.

    The problem is that there are obviously missing pieces. For example, where's the infrastructure? Whoever created this diagram clearly doesn't have to deal with the infrastructure (routers and switches and so forth), so they didn't include it. It's assumed, almost abstracted like an outsourced component of the infrastructure. Maybe Figure 2.6 is a more accurate depiction of the environment.


    Figure 2.6: Expanded application diagram.

    And even with this diagram, there are still probably missing pieces. This reality is probably one of the biggest dangers in IT management today: We forget about pieces that are outside our purview.


    Again, this is where a unified monitoring system can create an advantage. Rather than focusing on a single area of technology, like servers, it can be technology-agnostic, focusing on everything. There's no need to leave something out simply because it doesn't fit within the tool's domain of expertise; everything can be included.

    In fact, an even better approach is to focus on unified monitoring tools that can actually go out and find the components in the environment. Software doesn't have to make the same assumptions, or have the same technology prejudices, as humans. A unified monitoring console doesn't care if you happen to be a Hyper-V expert, or if you prefer Cisco routers over some other brand. It can simply take the environment as it is, discovering the various components and constructing a real, accurate, and complete diagram of the environment. It can then start monitoring those components (perhaps prompting you for credentials for each component, if needed), enabling you to get that complete, all-in-one, unified dashboard. I've been in environments where not using this kind of auto-discovery became a real problem.

    Case Study

    Terry is responsible for the infrastructure components that support his company's primary business application. Those components include routers, switches, database servers, virtualization hosts, messaging servers, and even an outsourced SaaS sales management application. Terry's heard about the unified monitoring idea, and his organization has invested in a service that provides unified monitoring for the environment. Terry's carefully configured each and every component so that everything shows up in the monitoring solution's dashboard.

    One afternoon, the entire application goes down. Terry leaps to the unified monitoring console, and sees several alarm indications. He drills down and discovers that the connection to the SaaS application is unavailable. Drilling further, he sees that the router for that connection is working fine, and that the firewall is up and responsive. He's at a complete loss.

    Several hours of manual troubleshooting and wire-tracing reveal something about the environment that Terry didn't know: There's a router on the other side of the firewall as well, and it's failed. Normal Internet communications are still working because those travel through a different connection, but the connection that carries the SaaS application's traffic is offline. The extra router is actually a legacy component that pretty much everyone had forgotten about.

    A monitoring solution capable of automated discovery wouldn't have forgotten, though. It could have detected the extra router and included it in Terry's dashboard, making it much easier for him to spot the problem. In fact, it might have prompted him to replace or remove that router much earlier, once he realized it existed.
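    The value of discovery in a case like this comes down to a comparison between what was configured by hand and what is actually out there. The sketch below shows only that comparison; the discovery step itself (scanning the network) is not shown, and the coverage_gaps function and both inventories are hypothetical names and data made up for illustration.

```python
def coverage_gaps(discovered, configured):
    """Compare what discovery found with what was configured by hand."""
    discovered, configured = set(discovered), set(configured)
    return {
        "unmonitored": sorted(discovered - configured),  # found, but not on the dashboard
        "stale": sorted(configured - discovered),         # on the dashboard, but not found
    }


# Hypothetical inventories, loosely modeled on the case study above.
configured_by_hand = ["db-server", "firewall", "inside-router", "saas-link"]
found_by_discovery = ["db-server", "firewall", "inside-router", "outside-router", "saas-link"]

gaps = coverage_gaps(found_by_discovery, configured_by_hand)
print(gaps)
```

    Anything listed as unmonitored is a candidate for the kind of forgotten component that cost Terry several hours; anything listed as stale is a dashboard entry that may no longer correspond to a real device.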


    Discovery can also help identify components that don't fit neatly within our technology silos, and that don't "belong" to anyone. Infrastructure components like routers and switches are commonly used examples of these "orphan" components because not every organization maintains a dedicated infrastructure specialist to support these devices. However, legacy applications and servers, specialty equipment, and other components can all be overlooked when they're not anyone's specific area of responsibility. Discovery helps keep us from overlooking them.

    Not All of IT Is a Problem: Ordering, Routing, and Providing Services

    Most organizations tend to get into the habit of thinking of their IT department as fire fighters. IT exists to solve problems. That isn't true, of course, and any organization probably (hopefully) depends on IT to carry out day-to-day tasks and requests more than it relies on IT to solve problems. But the day-to-day tasks are easy to overlook, whereas fire fighting gets everyone's attention.

    The result of this way of thinking is that IT management tends to focus on tools that help make problem-solving easier. Unified monitoring is exactly that kind of tool: If nothing ever went wrong, we wouldn't need it. It's there to make problem-solving faster, primarily in the areas of performance and availability. Right?

    Not quite. Truly unified management also entails making day-to-day IT tasks easier for everyone involved. Users, for example, need to order and receive routine services, from simple password resets and account unlocks to new hardware and software requests. I'll make what some consider to be a bold statement and say that those routine requests should be treated in the exact same way as a problem. Look at any IT management framework, such as ITIL, and you'll find that concept runs throughout: Routine IT requests should be part of a unified management process, which also includes problem-solving.

    Consider some of these broad functional capabilities that unified management (versus mere monitoring) can offer both to problem-solving activities and to routine IT services:

    Workflow. When problems arise, following a structured process, or workflow, can help make problem solving more consistent and efficient. Similarly, structured workflows can help make routine IT services more efficient and consistent. The workflows will be different for problem solving and for various routine services, but having the ability to manage and monitor workflows can be a real benefit.

    Approvals. Workflows should include approvals. This capability is most obvious for routine services like hardware and software requests, security requests, and so on, but it can be just as important for problem solving. Not every problem can be fixed by changing a setting or rebooting a device; sometimes you'll need to make a more significant change, and having the ability to formally route approval to make that change is a benefit.


    Routing. The specialist who fixes a problem is usually the last one to hear about it. Frontline resources, such as your Help desk and your end users, are the first responders. Being able to select a problem category and have a ticket routed to the right individual helps speed problem resolution. The same is true for routine services: Things get done quicker when the right person has the request. Automated routing capabilities can help get the right person on the job more quickly and more accurately.

    Self-service. Reducing phone calls and manual email juggling is crucial to achieving better efficiency. Self-service can help do that for both problems and routine requests. When users experience a problem, self-service can allow them to submit tickets as well as help them solve the problem on their own, through a knowledge base. When users need routine service, self-service helps them submit that request without having to engage additional IT services.

    Service catalog. Part of self-service is the ability to create an online "store" for services that users can request.

    There are more capabilities, of course, but we'll cover them in upcoming chapters. These are simply some of the basic capabilities that we need in order to make both routine IT requests and problem solving more consistent and efficient.

    Coming Up Next

    This chapter has been about breaking down the silos between technology specialties, or at least building doorways between them. That helps to solve one of the major problems in modern IT monitoring and management. The next chapter will tackle a somewhat more complicated problem: Keeping everyone in the management loop. It's about improving communications. Unfortunately, communications are too often a voluntary, secondary exercise: we have to make an effort to communicate, and when we're really feeling the pressure, it's easy to want to put that effort elsewhere. So we need to adopt processes and tools that make communications more automatic, helping keep everyone in the loop without requiring a massive secondary effort to do so.


    Chapter 3: Connecting Everyone to the IT Management Loop

    IT management has for too long involved discrete, disconnected processes that often leave key participants wondering what's going on. Bringing everyone (users, managers, IT professionals, and more) into the loop can create significant benefits as well as reduce the tendency to fall back into discipline-based silos. This is where the integration between monitoring and service desk truly happens, and these concepts deliver the most critical, central themes discussed throughout this book. It's all about communication: ways to better achieve communication as well as create opportunities for continuous improvement.

    Users sometimes perceive their IT department as out-of-touch, ivory-tower geeks with poor people skills. Whether or not that's true depends on the actual IT team members, but the perception, fair or not, often exists. That's because IT can too often be the last ones to know about things that users perceive as problems. "Sure, the server might be humming along within specs, but the order-entry application is incredibly slow." "IT says that email is working fine, but I've been waiting on an incoming purchase order for an hour; the email system can't possibly be working correctly!"

    IT has its own unique problems to deal with, and they sometimes involve a disconnect with management. Finding windows in which to make approved changes, for example, can be incredibly tricky. Simply coordinating the changes that are proposed, approved, under development, ready for implementation, and so forth can be difficult. Many organizations have adopted change management frameworks, such as those proposed by ITIL, that outline specific processes for reviewing and approving changes. Physically coordinating that process, however, can seem like herding cats. It's even worse when IT has been divided into silos: The database team might have a change scheduled for tonight, but that change is going to conflict with the power supply changes being implemented by the data center team. We need to get everyone on the same page.


    Starting the Loop: Connecting Monitoring to the Service Desk

    Most organizations today have a ticket-based system for coordinating IT activities. These organizations also usually have monitoring systems in place to watch their IT systems and alert them to any problems. Too few organizations, however, have connected these two systems. Ideally, that's what you want: A single, integrated IT management system that can detect problems and then automatically open tickets for the appropriate individuals. If the email server is down, the appropriate administrator should get a ticket. Those tickets, of course, should include notifications via text message, email, or whatever other medium is appropriate so that alerted individuals know they have an alert.

    That auto-assignment (you might even choose to call it auto-routing) of tickets needs to be pretty intelligent. Different systems, in different locations, at different times, all might change how the ticket is created, thus changing who is assigned to work the problem. Tickets should be as complete as possible, meaning as many fields as possible should be automatically populated; you shouldn't have to rely on a Help desk, or someone else, to fill in the details. Those details might include the affected server's information. Figure 3.1 shows what this kind of auto-generated ticket might look like, with several key bits of information prepopulated by the system.

    Figure 3.1: Automatically generated tickets in response to alarms.


    The idea is to have a service desk solution (that's the software that helps coordinate and manage IT activities, often through tickets) working with the monitoring solution, thus creating a truly integrated response to IT problems.
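
    As a minimal sketch of that integration, the Python below turns a monitoring alarm into a prepopulated ticket and picks an assignee based on the affected system, its location, and the time of day. Every name here is an assumption made up for illustration (the Alarm and Ticket fields, the ROUTING_RULES table, the team names, and the business-hours rule); a real service desk product will have its own way of configuring this.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alarm:
    """A hypothetical alarm raised by the monitoring system."""
    system: str
    location: str
    message: str
    raised_at: datetime


@dataclass
class Ticket:
    """A hypothetical ticket whose fields are prepopulated from an alarm."""
    summary: str
    system: str
    location: str
    opened_at: datetime
    assignee: str


# Hypothetical routing table: (system, location) -> (daytime team, after-hours team).
ROUTING_RULES = {
    ("email", "HQ"): ("messaging-team", "on-call-messaging"),
    ("database", "HQ"): ("dba-team", "on-call-dba"),
}
DEFAULT_ASSIGNEE = "help-desk"


def is_business_hours(moment: datetime) -> bool:
    """Assumed business hours: 08:00 to 18:00, Monday through Friday."""
    return moment.weekday() < 5 and 8 <= moment.hour < 18


def open_ticket(alarm: Alarm) -> Ticket:
    """Create a ticket from an alarm, routing by system, location, and time."""
    teams = ROUTING_RULES.get((alarm.system, alarm.location))
    if teams is None:
        assignee = DEFAULT_ASSIGNEE  # nobody owns this system, so fall back
    else:
        daytime, after_hours = teams
        assignee = daytime if is_business_hours(alarm.raised_at) else after_hours
    return Ticket(
        summary=f"[{alarm.system}] {alarm.message}",
        system=alarm.system,
        location=alarm.location,
        opened_at=alarm.raised_at,
        assignee=assignee,
    )
```

    The fallback assignee matters: an alarm from an "orphan" system that matches no rule still produces a ticket that someone will see.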

    This is all intended to provide specific benefits. First and foremost is faster problem resolution. By not waiting for users to inform you of a problem, you're getting started on solving the problem faster. By having prepopulated tickets, the IT team is able to work more quickly because they're starting with more information.

    There's a bit more depth that can be added, if you have the right service desk software in place. Frameworks like ITIL encourage root cause analysis, meaning your team should focus not only on solving today's specific problem but also on making the overall environment more stable and problem-resistant. To that end, a service desk solution can define two types of problems: global issues and specific incidents.

    Specific incidents might be day-to-day problems like "Email moving slowly throughout the organization," "Order entry application operating slowly," and so forth. Those might all be tied to a global issue of "Unexplained network slowdowns," which could be examined and solved, perhaps by locating a router that was overheating and dropping more packets than usual.

    Sometimes, specific incidents might not be entirely solved until the overarching global issue is solved. By tracking those individual incidents along with the global issue, you can help keep your users and managers more informed. For example, once that overheating router is discovered and replaced, everyone affected by an associated specific issue could be notified: "Hey, we think we've found the root cause for all the slowdowns, so things should be better from here on out." Figure 3.2 shows how a single global problem can be attached to multiple incidents.


    Figure 3.2: Relating multiple incidents to a single problem.

    I've used a couple of keywords in the foregoing discussion and want to take a moment to specifically define them in the context of this book:

    An incident is something that happens in the environment, such as a failed server or a slow application.

    IT staff create problem records to help manage the incident. Problems may in fact be associated with multiple incidents, as in the case of that overheating router, which caused multiple disparate failures throughout the environment.

    I'm going to start using those two terms more consistently from here on. Hopefully, some of the benefits of combining monitoring with problem solving will become clear. For example, more simplistic Help desk solutions allow multiple tickets to be opened against what is essentially the exact same issue. That can result in a lot of duplicated effort, as multiple IT team members attempt to work the issues on their own. It can also result in a lot of paperwork because solving the root cause then requires technicians to spend time laboriously closing each ticket. With a more sophisticated system in place, everything can be consolidated into a single, managed problem record. Doing so creates additional benefits, such as identifying solutions or workarounds, which I'll discuss in upcoming chapters.
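
    The relationship between incidents and a problem record can be sketched in a few lines of Python. This is only an illustration under assumed names (the Incident and Problem classes, their fields, and the notification wording are all hypothetical), but it shows why consolidation saves work: resolving the one problem closes every attached incident and produces a notice for each affected user.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Something that happened in the environment, as reported or detected."""
    description: str
    reporter: str
    status: str = "open"


@dataclass
class Problem:
    """A record IT staff create to manage one or more related incidents."""
    title: str
    incidents: list = field(default_factory=list)
    status: str = "open"

    def attach(self, incident: Incident) -> None:
        """Relate an incident to this problem instead of working it separately."""
        self.incidents.append(incident)

    def resolve(self, root_cause: str) -> list:
        """Close the problem and all attached incidents; return the notices to send."""
        self.status = "resolved"
        notices = []
        for incident in self.incidents:
            incident.status = "resolved"
            notices.append(
                f"To {incident.reporter}: '{incident.description}' resolved ({root_cause})."
            )
        return notices


problem = Problem("Unexplained network slowdowns")
problem.attach(Incident("Email moving slowly", "alice"))
problem.attach(Incident("Order entry application operating slowly", "bob"))
for notice in problem.resolve("overheating router replaced"):
    print(notice)
```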


    Problems and incidents, however, aren't the only reason that users interact with IT. Hopefully, they're not even the major reason your users interact with IT! Aside from reporting incidents, users also need to request routine services: advice, new hardware requests, routine change requests, access requests, and so forth. These interactions should be managed through a more formal workflow in which users submit their request, have it assigned to the appropriate technician after being approved, and be able to track the status of their request.

    For example:

    1. A user might visit a Web site to browse a catalog of items they can request, such as access to systems, changes to hardware, and so forth.

    2. A user selects an item from the catalog, and provides whatever details are necessary to complete the request.

    3. A ticket is created in the service desk that represents the user's request. Depending upon the request, the ticket might first be routed to the user's manager for approval.

    4. Once approved, the ticket would be automatically routed to the appropriate technician or IT team for completion.

    5. The user would receive status updates, perhaps via email, throughout this process, keeping them informed of its progress. The status updates would include a "completed" update once the request was finished.
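
    The five steps above amount to a small state machine, which might be sketched as follows. The state names, the ALLOWED_TRANSITIONS table, and the ServiceRequest class are hypothetical choices for illustration only; the point is that the workflow, not the technician, decides which moves are legal and records a status update at every step.

```python
# Hypothetical request states and the transitions allowed between them.
ALLOWED_TRANSITIONS = {
    "submitted": {"pending_approval", "assigned"},  # some requests need no approval
    "pending_approval": {"assigned", "rejected"},
    "assigned": {"in_progress"},
    "in_progress": {"completed"},
    "rejected": set(),
    "completed": set(),
}


class ServiceRequest:
    """A routine request moving through the steps listed above."""

    def __init__(self, requester: str, catalog_item: str, needs_approval: bool):
        self.requester = requester
        self.catalog_item = catalog_item
        self.needs_approval = needs_approval
        self.status = "submitted"
        self.updates = []  # status messages that would be emailed to the requester

    def advance(self, new_status: str) -> None:
        """Move to new_status if the workflow allows it, and record a status update."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        if new_status == "assigned" and self.status == "submitted" and self.needs_approval:
            raise ValueError("this request must be approved before assignment")
        self.status = new_status
        self.updates.append(f"{self.requester}: '{self.catalog_item}' is now {new_status}")
```

    A request flagged as needing approval cannot skip straight to a technician, and each successful transition leaves behind the text of the update the user would receive.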

    By using the same ticket-based system employed for problem solving to address routine requests, IT technicians can rely on a single interface to manage their workload. Figure 3.3 shows what a routine request ticket might look like.


    Figure 3.3: Routine requests can also be made into tickets.

    Even better, IT management can rely on all IT work being documented and tracked in a single system, enabling management to stay informed through reports, dashboards, and other mechanisms. Figure 3.4 shows an example of what such a report might look like.


    Figure 3.4: Management reports become more effective when they include all IT workload.

    The idea is to keep everyone in the loop: users remain informed, IT remains informed, management remains informed. Much of the burden of keeping everyone informed is handled by the software, which can send email updates and other kinds of notifications so that everyone is aware of what's happening at all times.

    Making Changes: How to Find a Change Management Window

    Large, multidiscipline IT departments have inherent problems. In the previous chapter, I discussed the problem of silo-based problem solving, where domain experts spend time passing a problem back and forth because everyone is looking at different tools and data to determine whether the problem is theirs. We're certainly not going to get rid of domain experts, so the solution is to get tools that could put everything into a single console in order to unify everyone's efforts.


    Another problem created by those silos relates to change management. At the start of this chapter, I outlined one of those problems: The database team is ready to implement a change, but it's going to be in conflict with a change being implemented by another group. Managing change windows is becoming increasingly difficult. Not only are applications and services needed round-the-clock, creating tiny change windows in the first place, but the varying needs of different experts create contention for those already-small windows. "Boss, we'd have that fix in place, but we can only implement it at night. It's going to take 4 hours, which just fits inside the window management allows us. But all this week, other teams have been using the window, and the changes they're making are blocking us from doing anything at the same time." It's not an unusual situation. It gets tough for management to even track what changes are pending and to slot them into the shrinking time that's available to make them.

    The lack of visibility into these windows, and the contention for them, makes it impossible to even make a management decision. For example, if management could see the number of changes stacked up, and see the contention, they might decide to expand the window for a period of time in order to get the changes implemented. They might not decide to do that, but they'd be consciously making a decision rather than remaining ignorant of the actual problem.

    The solution, of course, is software that facilitates the coordination of departments. Think about it: If you're using a service desk solution to track tickets, then tickets can be created for proposed changes. Those tickets would be assigned to a technician, routed for reviews and approvals, and so forth, all via some workflow you designed. That's an excellent way to support ITIL processes, by the way. The tickets themselves can then feed a unified calendar, built right into the service desk, which allows change planners to schedule activities. They can see agreed maintenance windows, manage contention between conflicting changes, and so forth. By getting this information into a familiar calendar form, they can also make decisions about whether to widen maintenance windows if doing so is necessary and beneficial to the organization. Figure 3.5 shows a change management calendar.


    Figure 3.5: Managing change schedules in a calendar view.
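
    Behind a calendar like that sits a fairly simple check: do two scheduled changes overlap in time, and does each change fit inside the agreed window? The Python below is a rough sketch of that check. The Change fields and function names are assumptions for illustration, and the rule that only changes from different teams count as contention is a simplification chosen here, not something a given product necessarily does.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    """A proposed change, as it might appear on a shared change calendar."""
    team: str
    description: str
    start: datetime
    end: datetime


def overlaps(a: Change, b: Change) -> bool:
    """Two changes contend when their time ranges intersect."""
    return a.start < b.end and b.start < a.end


def find_contention(changes: list) -> list:
    """Return every pair of changes from different teams that overlap in time."""
    ordered = sorted(changes, key=lambda change: change.start)
    conflicts = []
    for index, first in enumerate(ordered):
        for second in ordered[index + 1:]:
            if second.start >= first.end:
                break  # sorted by start, so nothing later can overlap `first`
            if first.team != second.team and overlaps(first, second):
                conflicts.append((first, second))
    return conflicts


def fits_window(change: Change, window_start: datetime, window_end: datetime) -> bool:
    """Check that a change falls entirely inside an agreed maintenance window."""
    return window_start <= change.start and change.end <= window_end
```

    Fed with the example from the start of this chapter (a database change scheduled for tonight and a data center power supply change landing in the same hours), find_contention would report that pair, which is exactly the information management needs in order to decide whether to reschedule or widen the window.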

    This is just another way to help keep everyone in the loop. Management now has a clear visual depiction of change and schedule contention. Such a calendar could even be made available to users so that they could see what changes were scheduled and plan their own activities accordingly.

    Communicating: How to Bring Users into the Loop

    The idea of keeping users informed certainly isn't new, but many organizations that have attempted to better engage their users haven't met with unqualified success. Too often, "keep users in the loop" solutions take the form of self-service Web portals, where users can log in to check the status of their tickets or to check the status of a particular service. That's all well and good, but Web portals like that don't always fall within the natural workflow of a user. For example, most users, when confronted with some kind of problem, don't necessarily think to check a Web site and see if something's wrong; they call the Help desk.


    Users do, however, spend a lot of time in their email inbox. Why not make that your channel for communication? Organizations don't use this method of communication in part because doing so could easily become a time burden for your IT team. "So on top of solving the problem, I have to send out hourly update emails with the status of the problem?" Sounds like a Dilbert cartoon!

    In reality, a good service desk solution can do it for you. Sending an email update when a user's ticket is updated, for example, is an easy operation for a piece of software. Such emails can be informative, and help users feel comfortable that their request is being handled. Figure 3.6 shows what one might look like.

    Figure 3.6: Keeping users informed with detailed emails.


    What's more compelling is a service desk solution that can actually accept requests via email rather than expecting users to go to a self-service Web portal and open a ticket. Face it: Your users are more likely to pick up the phone than visit a Web site, unless you've placed significant artificial barriers in the way, like complex voice menus in the phone system. Users are more likely to send an email. If your service desk, rather than a human technician, can receive those emails and use them to create a ticket, you've truly created a system your users are likely to embrace. Such tickets could still be auto-assigned and routed, helping the right technician to start working the problem more quickly.
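
    To give a feel for how little is involved in the "email in, ticket out" step, here is a minimal sketch using Python's standard email package. The ticket is just a dictionary with hypothetical field names, and the sketch assumes the raw message text has already been fetched from a mailbox by some other means; a real service desk would add spam filtering, threading of replies onto existing tickets, and so on.

```python
from email import message_from_string
from email.policy import default
from email.utils import parseaddr


def ticket_from_email(raw_message: str) -> dict:
    """Build a minimal ticket (a plain dict here) from a raw email message."""
    message = message_from_string(raw_message, policy=default)
    # parseaddr splits "Alice Example <alice@example.com>" into name and address.
    _, sender_address = parseaddr(str(message.get("From", "")))
    body = message.get_body(preferencelist=("plain",))
    return {
        "requester": sender_address,
        "summary": str(message["Subject"] or "(no subject)"),
        "details": body.get_content().strip() if body is not None else "",
        "status": "open",
    }
```

    The subject line becomes the ticket summary and the plain-text body becomes the details, so the user has effectively filled in the ticket form without ever seeing one.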

    Even for your users' routine, non-problem requests, email updates can be valuable. When their request is approved, rejected, underway, completed, and so forth, an email update helps keep users informed without additional human effort.

    Note

    I want to emphasize that self-service portals are a good thing. They can provide a rich user experience, help guide users to self-service solutions, and more. They just shouldn't be the only means of communicating with users.

    SLAs: Setting and Meeting Realistic Expectations

    Unless you've been living under a rock for the past decade or so, Service Level Agreements (SLAs) are probably pretty familiar to you. These are, in their simplest form, an agreement by the IT team to provide a specific level of performance or availability for a specific service or application. "The email service will be available 99.999% of the time on an annualized basis" is an example of a very simple SLA.
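
    It helps to translate an availability percentage into the downtime it actually permits. The short Python sketch below does that arithmetic; the function name is made up for illustration, and it assumes a 365-day year with no excluded maintenance time. A year has 525,600 minutes, so a 99.999% target leaves only about 5.26 minutes of downtime for the entire year.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365) -> float:
    """Minutes of downtime permitted over the period at the given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows {minutes:.2f} minutes of downtime per year")
```

    Seeing the number in minutes makes it easier to judge whether a proposed SLA is something the team can reasonably deliver.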

    But SLAs can get complicated quickly. You can't just pull a number out of thin air; what level of service can you reasonably provide? What level of service have you historically provided, and is that meeting the business' needs? Once established, how do you track the SLA to make sure you're actually meeting it, and ideally get some kind of notification when you're in danger of breaking the agreement?

    SLAs might not be the only type of agreement you need to define and track. Some organizations also use underpinning contracts (UCs) or operational-level agreements (OLAs) for different in-house and outsourced services; these often support SLAs.

    A well-built service desk and monitoring solution can help you handle these agreements more precisely. You'll start by defining top-level SLAs, then creating and managing UCs and OLAs as appropriate.

    Once defined, the solution should be able to track ongoing performance and availability, perhaps offering a simple dashboard (like the one shown in Figure 3.7) that illustrates your compliance with your SLAs. You might also have more comprehensive and detailed reports on SLA metrics.


    Figure 3.7: Managing SLAs with at-a-glance dashboards.

    Most importantly, however, the solution needs to provide you with the ability to define rules for your SLAs so that tickets can be created (and auto-assigned to the appropriate technicians) when SLAs are in danger of being broken. Further, the solution should support escalation rules so that if an SLA that is in danger of being broken is not corrected within a certain amount of time, the solution can automatically call for backup, summoning additional technicians, notifying management, and so forth.
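
    An escalation rule of that kind can be as simple as a ladder of time thresholds. The sketch below is one hypothetical way to express it in Python; the thresholds (immediately, 30 minutes, 2 hours) and the contact names are invented for illustration and would be whatever your organization agrees on.

```python
from datetime import datetime, timedelta

# Hypothetical escalation ladder: how long an at-risk SLA may go uncorrected
# before each additional group is brought in.
ESCALATION_STEPS = [
    (timedelta(minutes=0), "assigned technician"),
    (timedelta(minutes=30), "backup technician"),
    (timedelta(hours=2), "IT management"),
]


def who_to_notify(at_risk_since: datetime, now: datetime) -> list:
    """Return everyone who should have been notified by `now` for an uncorrected SLA risk."""
    elapsed = now - at_risk_since
    return [contact for delay, contact in ESCALATION_STEPS if elapsed >= delay]
```

    The longer the risk goes uncorrected, the longer the returned list becomes, which is the automatic "call for backup" behavior described above.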

    There's also a strong need to recognize that no SLA is perfect. Sometimes, for whatever reason, the business will decide to take a service offline. Perhaps it's for a software upgrade or for some kind of infrastructure maintenance. In those cases, you're not breaking the SLA; you're agreeing (along with whatever part of the business will be affected) to temporarily suspend the SLA to get the work done. A service desk solution should support these types of exceptions, including SLAs that are only valid during certain hours, holiday exceptions, agreed-upon reduced service windows, maintenance windows, and so forth.
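
    One way to picture how such exceptions affect the numbers is to compute availability while ignoring any downtime that falls inside an agreed window. The following Python is a sketch under stated assumptions, not a description of how any particular product calculates compliance: the function names are hypothetical, agreed windows are removed from both the downtime and the measured time, and outages (and windows) are assumed not to overlap one another.

```python
from datetime import datetime


def overlap_seconds(start_a, end_a, start_b, end_b):
    """Seconds shared by two time ranges, or 0.0 if they do not intersect."""
    shared = min(end_a, end_b) - max(start_a, start_b)
    return max(shared.total_seconds(), 0.0)


def sla_availability(period, outages, agreed_windows):
    """Availability percent for `period`, ignoring time inside agreed windows.

    `period` is a (start, end) tuple; the other arguments are lists of such tuples.
    """
    period_start, period_end = period
    excluded = sum(overlap_seconds(period_start, period_end, s, e) for s, e in agreed_windows)
    measured = (period_end - period_start).total_seconds() - excluded

    downtime = 0.0
    for out_start, out_end in outages:
        # Only count the part of the outage that falls inside the period...
        out_start, out_end = max(out_start, period_start), min(out_end, period_end)
        if out_start >= out_end:
            continue
        outage = (out_end - out_start).total_seconds()
        # ...and outside every agreed window.
        excused = sum(overlap_seconds(out_start, out_end, s, e) for s, e in agreed_windows)
        downtime += max(outage - excused, 0.0)
    return 100.0 * (1 - downtime / measured)


day = (datetime(2024, 1, 1), datetime(2024, 1, 2))
maintenance = [(datetime(2024, 1, 1, 2), datetime(2024, 1, 1, 4))]
outages = [
    (datetime(2024, 1, 1, 2, 30), datetime(2024, 1, 1, 3, 30)),       # inside the agreed window
    (datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 13, 12)),  # 792 unplanned seconds
]
print(round(sla_availability(day, outages, maintenance), 3))
```

    In this example, the hour of downtime during the agreed maintenance window doesn't count against the SLA; only the 792 unplanned seconds do, measured against the 79,200 seconds that remain once the window is excluded, which works out to 99% availability.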

    The idea is to automate SLA definition and management, and to automate the notifications that go with SLAs. If an SLA is broken, you might agree that the affected business users will receive an automatic notification. That lets them know that IT knows about the problem and is working on it, without forcing users to visit a self-service portal and open a ticket. That kind of proactive response can go a long way toward improving IT-user relationships, and in helping IT be viewed as responsive to, and supportive of, business requirements.


    Tell Me What You Really Think

    IT managers like IT to think of users as customers. In some cases, your users might actually be customers, in the sense of sending you a check for specific services you provide. In other cases, your users might be internal users, but they're still customers in the sense that they consume services that you, the IT department, provide, and that you get paid for your efforts.

    A big problem that IT has always struggled with is its perception by its customers. Do customers think you're doing a good job? What is a good job?

    For this reason, monitoring End-User Experience (EUE) metrics, which I discussed in the first chapter, has become a hot trend in the IT industry. You might see that your server's performance is within norms, but by the time you throw in old client computers, routers, network cabling, and everything else involved in delivering a service to users, they have