The Path to Operational Intelligence

The move from reactive problem-solving to real-time data-driven insights is easier than you think

WHITE PAPER



Digital Business Means Managing Complexity

Complexity is a way of life in IT, but with mobility, the cloud and software-defined everything now the norm, IT faces a rapidly growing and evolving set of digital footprints it must monitor, manage and troubleshoot. In a recent RightScale survey, over half of the respondents had more than 100 virtual machines (VMs) running in their private datacenters, while 40 percent had at least that many running on public clouds.[1] Another survey estimates that the average medium to large enterprise uses between 300 and 400 cloud apps,[2] while an NTT study found that the average U.S. enterprise has 100 internal business applications.[3]

Every one of these applications, physical servers, VMs and the associated network and storage equipment supporting them produces constant streams of data. Add in mobile apps and the situation gets worse. According to 451 Research, over the next two years, over 80 percent of enterprises in the U.S. plan to deliver custom-built mobile applications.[4] Indeed, many digital businesses now generate hundreds of terabytes of data per day, a mix of routine system logs, customer transactions and application telemetry.

In the right hands and with the proper tools, this wealth of data is invaluable for solving system problems, isolating performance bottlenecks and identifying issues. Big operational data paired with sophisticated analysis software is the key to preventing problems, not just reacting to them, which ultimately improves the customer experience and reveals new business opportunities. Turning big data noise into valuable IT and business information follows what Splunk calls the path to Operational Intelligence. It’s a road more companies are traveling, but the path needn’t be arduous or disruptive. Better still, the journey itself is rewarding, with incremental benefits, including improved operational efficiency and more strategic decision making, along the way.

The Current Situation Is Untenable

IT complexity creeps up on most organizations, often going unnoticed until it creates a crisis. Things may seem to be running smoothly until one day ad hoc system admin and troubleshooting processes can no longer handle the increasing workload from more VMs, running more apps, interconnected in more complex ways. As front-line IT staff struggle to keep up, picture the digital equivalent of Lucy and Ethel in the chocolate factory: Problems go unnoticed or are triaged, take longer to diagnose and fix, are seldom documented and end up as recurring headaches.

The current state of IT operations is reactive, catching problems after they’ve snowballed into business disruptions. Worse still, problem resolution is siloed in various organizations, meaning data isn’t shared, complex interactions between infrastructure and app components can’t be captured and subtle interactions between systems go unnoticed. Without systematic processes, valuable time is spent reinventing the wheel, there’s no organizational learning and little efficiency improvement over time. It’s an untenable state that’s hopelessly difficult to maintain and inherently not scalable.

IT’s natural response to work overload is to automate, but if systems management and troubleshooting remain siloed, the result is often a multitude of point tools, each monitoring a piece of the overall business nervous system. It’s a classic case of the blind men and the elephant, where each organization sees only a portion of the whole, meaning problems spanning multiple applications and infrastructure silos still can’t be systematically diagnosed and solved.

But it doesn’t have to be this way. It’s far better to take a comprehensive approach to operational automation. This is done by first integrating data from all sources into an easily searched repository that can evolve from a troubleshooting tool to a proactive monitoring and alerting platform and, ultimately, to an integrated intelligence system providing end-to-end operational visibility. The payoff is significant.

Splunk customers have been known to cut mean time to resolution from a few days to a few minutes. Tesco, a multi-national retailer, uses Splunk software to reduce the average time for problem investigation by 95 percent, while other customers have cut incident response times by 70 to 80 percent. Here’s how they did it, and your roadmap to Operational Intelligence.

Figure 1: With Splunk, you can transform machine data into real-time Operational Intelligence. The figure is labeled "Any Machine Data" and has four panels:

• Business Insights: Make better-informed business decisions by understanding trends, patterns and gaining Operational Intelligence from machine data.

• Operational Visibility: Gain end-to-end visibility across your operations and break down silos across your infrastructure.

• Proactive Monitoring: Monitor systems in real time to identify issues, problems and attacks before they impact your customers, services and revenue.

• Search + Investigation: Find and fix problems, correlate events across multiple data sources and automatically detect patterns across massive sets of data.

First Step: Search and Investigation

The path to Operational Intelligence begins with data aggregation. It’s impossible to systematically analyze a hodgepodge of hardware, VM and application logs, metrics and transaction records kept in discrete silos and hope to reconcile them across those silos. Taking a big data approach, consolidating all operational and application data in an easily searched repository, provides visibility across complex interactions between different systems and makes it possible to correlate seemingly unrelated events.


For example, congestion on a particular switch or router might be causing database latency that slows transaction processing, crippling overall application performance. Or, buggy software on a VM could slow down other VMs on the same physical server. Using a fast query engine on a complete repository of system telemetry frees the administrator from hunting and pecking across a multitude of individual files in hopes of finding anomalous data, and from trying to manually correlate events on different systems spaced over time.

Many big data systems require preprocessing or reformatting raw data into a secondary database before the information is usable. This is an unwise approach for operational machine data, as this data varies in format and streams at massive volumes. Preprocessing IT machine data injects delays that make it nearly impossible to deal with high volume streams in real time and can lead to important information being discarded. It’s better to have a scalable system that can index and search data in its native format.
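The idea of searching data in its native format can be illustrated with a minimal Python sketch. It is not Splunk's implementation: the sample log lines, the `key=value` extraction rule and the function names `extract_fields` and `search` are all assumptions made for illustration. The point is only that fields are extracted at search time, so the raw lines are never reformatted or discarded.

```python
import re
from typing import Iterator

# Hypothetical raw events in two different native formats. Nothing is
# reformatted or loaded into a secondary database before searching.
RAW_EVENTS = [
    '2015-03-01T10:00:01Z web01 GET /order status=200 latency_ms=120',
    '2015-03-01T10:00:02Z db01 level=WARN msg="slow query" latency_ms=950',
    '2015-03-01T10:00:03Z web01 GET /cart status=500 latency_ms=30',
]

# Assumed extraction rule: key=value pairs, with optional double quotes.
KEY_VALUE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(raw: str) -> dict[str, str]:
    """Pull key=value pairs out of a raw line at search time."""
    return {key: value.strip('"') for key, value in KEY_VALUE.findall(raw)}


def search(events: list[str], **criteria: str) -> Iterator[str]:
    """Yield the raw lines whose extracted fields match every criterion."""
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(k) == v for k, v in criteria.items()):
            yield raw
```

Calling `list(search(RAW_EVENTS, status="500"))` returns the untouched raw line for the failed `/cart` request, so no information is lost to an earlier preprocessing step.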

Splunk software can use data from any piece of equipment or application, whether on-premises or in the cloud. Its file-based storage records data in its raw, unfiltered format, avoiding relational database overhead. Splunk software includes a powerful query language (SPL®) that supports sophisticated data searches, extraction and analysis.

An operational analysis system can also help investigators with event correlation. Examples of correlation identified by Splunk include:

1. Time and geo-location: related events based on time proximity or physical location

2. Transactions: a series of events related to a larger transaction or application function

3. Sub-searches: using the results of one search as input to another, like a series of filters

4. External lookups: events triggered by an external source

5. Joins: powerful SQL-like constructs that combine data from two or more sources
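Two of the correlation types listed above, transactions and time proximity, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not how Splunk performs correlation: the event records, the `order_id` field and the helper names `group_transactions` and `near_in_time` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events from three sources, already parsed into dicts.
EVENTS = [
    {"time": datetime(2015, 3, 1, 10, 0, 0), "source": "web",
     "order_id": "A1", "msg": "order page"},
    {"time": datetime(2015, 3, 1, 10, 0, 2), "source": "switch",
     "order_id": None, "msg": "port congestion"},
    {"time": datetime(2015, 3, 1, 10, 0, 4), "source": "db",
     "order_id": "A1", "msg": "commit"},
    {"time": datetime(2015, 3, 1, 11, 0, 0), "source": "web",
     "order_id": "B2", "msg": "order page"},
]


def group_transactions(events):
    """Transaction-style correlation: group events sharing an order_id."""
    groups = defaultdict(list)
    for event in events:
        if event["order_id"] is not None:
            groups[event["order_id"]].append(event)
    return dict(groups)


def near_in_time(events, anchor, window):
    """Time-proximity correlation: other events within `window` of `anchor`."""
    return [e for e in events
            if e is not anchor and abs(e["time"] - anchor["time"]) <= window]
```

For instance, `near_in_time(EVENTS, EVENTS[1], timedelta(seconds=5))` surfaces the web and database events that surround the switch congestion, which is the kind of cross-system relationship that is hard to spot when each source lives in its own silo.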

The benefits of a comprehensive, consolidated operational monitoring strategy flow directly into IT’s bottom line. For example, after installing Splunk software, the grocery chain Safeway eliminated 27 separate monitoring tools, and QTS, a managed services provider, saved $575,000 per year through retiring redundant and unnecessary tools.

Moving from Searching to Proactive Monitoring

Manual searching of consolidated operational data is a vast improvement over the status quo of siloed ad hoc troubleshooting, but the logical next step is automation and augmentation: programmatically searching for anomalous events and deriving aggregate measures based on multiple events or queries. Two important features are the ability to define triggers for certain data values or events and the combination of several measures into a composite metric. An example is the aggregate delay between a web server displaying an order page and when the transaction is recorded in a database server.
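The composite metric and trigger described above can be sketched as follows. This is a minimal illustration, not a Splunk feature: the timestamps, the order IDs, the threshold and the function names `order_delays` and `triggered` are assumptions.

```python
from datetime import datetime

# Hypothetical timestamps: when the web server displayed the order page
# and when the database recorded the transaction, keyed by order ID.
PAGE_SHOWN = {
    "A1": datetime(2015, 3, 1, 10, 0, 0),
    "B2": datetime(2015, 3, 1, 10, 5, 0),
}
TXN_RECORDED = {
    "A1": datetime(2015, 3, 1, 10, 0, 3),
    "B2": datetime(2015, 3, 1, 10, 5, 12),
}


def order_delays(shown, recorded):
    """Composite metric: seconds from page display to database record."""
    return {oid: (recorded[oid] - shown[oid]).total_seconds()
            for oid in shown if oid in recorded}


def triggered(delays, threshold_s):
    """Trigger: order IDs whose delay exceeds the assumed threshold."""
    return sorted(oid for oid, delay in delays.items() if delay > threshold_s)
```

With a hypothetical threshold of 10 seconds, `triggered(order_delays(PAGE_SHOWN, TXN_RECORDED), 10.0)` flags only order `B2`, whose delay is 12 seconds.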

Many organizations begin by performing manual searches, filters and analyses, but these can also be aggregated and summarized on performance dashboards tailored to different constituencies: IT support, app developers, IT management and line of business management. Dashboards provide a quick view on overall system health, problem areas and end-user ramifications. Furthermore, Splunk users can harness these features immediately with little setup and no programming required.

Even more customized searches and reports can be built using Splunk. Splunk’s extensibility has fostered a broad ecosystem of apps providing prebuilt dashboards, panels and UI elements focused on a specific technology or use case to make Splunk immediately useful and relevant to different roles. For example, the Splunk App for VMware includes over 50 prepackaged reports with real-time dashboards that provide both high-level visualizations of the entire VMware environment and detailed drill-downs into individual servers and storage systems.

Cloud service provider CloudShare uses the Splunk App for VMware to collect and analyze performance metrics and logs from every aspect of its VMware environment, including storage, networks, operating systems and its custom applications. This empowers CloudShare to troubleshoot customer issues up to 70 percent faster and get useful information about its current and historical resource usage and system status.

Next Stop: End-to-End Operational Visibility

So far we’ve described reports and dashboards for specific elements of IT infrastructure or applications; however, an Operational Intelligence platform can be used much more broadly. The next stage of the Operational Intelligence journey moves from monitoring infrastructure to providing a service-level view of IT. By measuring and correlating the activity from all the systems comprising a business process or end-user service, an Operational Intelligence platform exposes the myriad entwined factors affecting their performance. But successfully executing the strategy requires breaking down the silos between infrastructure and applications.

The same Splunk capabilities that enable comprehensive infrastructure reports like those in the Splunk App for VMware dashboard can be used to build measures of high-level performance indicators like SLA criteria and KPIs. For example, a Splunk app can monitor all aspects of a Microsoft Exchange deployment and show a consolidated view of email performance across the organization, allowing admins to quickly see resource utilization for all mail servers.

Figure 2: More than 50 reports with real-time dashboards provide visualizations of the VMware environment.

Item # splunk_whitepaper_v3 © 2015 Splunk Inc. All rights reserved. Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Hunk, Splunk Cloud, Splunk Storm and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.

Comprehensive, system-wide visibility of relevant business metrics can improve capacity utilization by identifying under- and over-used systems or software licenses, and can inform infrastructure planning and IT budgeting. Holistic reporting can also improve security by identifying subtle interrelationships between systems, policy gaps or inconsistencies on infrastructure supporting multi-tier applications, unauthorized configuration changes and anomalous usage.
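Identifying under- and over-used systems can be as simple as comparing average utilization against thresholds. The Python sketch below is only an illustration: the host names, the sample values and the 10 and 85 percent cut-offs are assumptions, and a real deployment would pick thresholds from its own baselines.

```python
from statistics import mean

# Hypothetical CPU utilization samples (percent) per mail server.
SAMPLES = {
    "mail01": [5, 8, 6, 7],
    "mail02": [55, 60, 48, 52],
    "mail03": [92, 97, 95, 90],
}


def classify(samples, low=10.0, high=85.0):
    """Label each host by average utilization; thresholds are assumptions."""
    labels = {}
    for host, values in samples.items():
        average = mean(values)
        if average < low:
            labels[host] = "under-used"
        elif average > high:
            labels[host] = "over-used"
        else:
            labels[host] = "ok"
    return labels
```

Here `classify(SAMPLES)` marks `mail01` as under-used and `mail03` as over-used, the two cases that matter for capacity planning and budgeting.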

The benefits of business-level operational visibility are amplified for organizations moving infrastructure and applications to the cloud, where resource consumption can be difficult to predict, leading to costly over-provisioning. End-to-end Operational Intelligence can help eliminate underused services and wasted money—or conversely, overtaxed resources and unhappy customers, business partners and employees.

Endgame: Operational Intelligence for Business Insights

The ultimate goal of any big data system is to transform millions of pieces of mundane data into evidence-based business insights. Applied to Operational Intelligence, this means combining the same data, software and analysis techniques used to improve day-to-day IT operations with business-specific information like customer records (CRM), sales transactions and supply chain information (ERP) to provide deeper insight into strategic business initiatives, investment opportunities and new product and service roadmaps.

IT operational data and real-time user and usage analysis can identify your most valuable customers, segment a product’s most popular features by user demographic, expose where, when and how customers use or consume your product, highlight problem areas and yield insights toward changes that will improve customer satisfaction, stickiness and loyalty.

Operational Intelligence is a critical input in developing answers to questions like:

• What new product features should be prioritized?

• What is the performance of individual stores?

• Where should new stores be located?

• What is the sales mix in different locations and with different customer segments?

• What are the most effective sales promotions and ad campaigns?

For example, Domino’s Pizza uses Splunk software to visualize business sales trends across locations with metrics like orders per minute, numbers of transactions per store, most popular menu items, coupon usage and even the most common mobile devices used to place orders. These insights enable more targeted, timely and hence lucrative promotions. They even empower marketing teams to analyze the success of campaigns in real time, enabling on-the-spot creation of one-off promotions to exploit time-sensitive events or regional trends, without waiting 24 to 48 hours for a batch data warehouse report to arrive.
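Metrics such as orders per minute, transactions per store and most popular menu items are simple aggregations over order events. The sketch below shows the arithmetic in Python under stated assumptions: the order records, store IDs and item names are made up, and it says nothing about how Domino's or Splunk actually compute these figures.

```python
from collections import Counter
from datetime import datetime

# Hypothetical order events; store IDs and menu items are invented.
ORDERS = [
    {"time": datetime(2015, 3, 1, 18, 0, 10), "store": "S1", "item": "pepperoni"},
    {"time": datetime(2015, 3, 1, 18, 0, 45), "store": "S2", "item": "cheese"},
    {"time": datetime(2015, 3, 1, 18, 1, 5), "store": "S1", "item": "pepperoni"},
]


def orders_per_minute(orders):
    """Count orders in each calendar minute by truncating the timestamp."""
    return Counter(o["time"].replace(second=0, microsecond=0) for o in orders)


def count_by(orders, field):
    """Count orders grouped by any field, such as store or item."""
    return Counter(o[field] for o in orders)
```

Because each function is a single pass over the events, the counts can be refreshed as new orders arrive instead of waiting for a batch report.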

Operational Intelligence Benefits Start with IT, End with the Business

Comprehensive, end-to-end Operational Intelligence improves IT performance and effectiveness by providing an integrated view of the status and performance of IT infrastructure, services and business applications. The foundation of successful Operational Intelligence is data aggregation and mining. This approach transforms machine data from a hodgepodge of disconnected and often-unused log files into a valuable resource for analytics software that can highlight important usage trends, system inefficiencies and opportunities for process and business improvements, such as marketing campaigns and product or service enhancements. More significant benefits accrue as organizations build more automation, sophistication and business analytics into the system.

Technology trends like post-PC-era reliance on mobile devices and apps, along with the looming data explosion from the Internet of Things (IoT), make comprehensive Operational Intelligence a business imperative. The always-connected mobile lifestyle and work style of employees, business partners and customers present great opportunities to exploit Operational Intelligence by using app instrumentation and telemetry to quantify feature usage and failure modes and to collect direct, immediate user feedback. Aggregating device and application data is also critical to exploiting the IoT, or smart sensors and devices, to improve business efficiency and customer service.

In summary, the tools and expertise developed in achieving Operational Intelligence constitute both a solid foundation and baseline requirement for success in today’s era of digital business.

Learn more at www.splunk.com.

[1] “Cloud Computing Trends: 2015 State of the Cloud Survey,” RightScale, February 2015, http://www.rightscale.com/blog/cloud-industry-insights/cloud-computing-trends-2015-state-cloud-survey.

[2] “Cloud apps: Just how many does your firm use? Now guess again,” ZDNet, September 17, 2014, http://www.zdnet.com/article/cloud-apps-just-how-many-does-your-firm-use-now-guess-again/.

[3] “Cloud Reality Check,” NTT, March 2015, http://www.cloudrealitycheck.com.

[4] “Mobile APM Comes of Age as Continuous Improvement of the End-User Experience,” 451 Research, March 2015, https://451research.com/report-long?icid=3380.