Translating data into action

Infosys Information Platform

Table of Contents

Abstract
Introduction
The Infosys Information Platform (IIP) in Action
  Analytics-driven preventative maintenance and downtime reduction
  Real-time operational visibility
  Augmented revenue and profitability
Information Management Realities
  There’s lots more data than ever
  Data has tremendous diversity
  Data is generated by multiple external sources
  Data arrives very quickly
  Cross-system data is generally uncorrelated
Barriers to Success
  Installation and integration of a modern data platform
  Staffing shortfalls
  Administrative hassles
What can change
IIP in Brief
  IIP in the enterprise
IIP Customer Success Grid
  Analytics-driven preventative maintenance & downtime reduction
    One of world’s largest mining companies
    A major ATM manufacturer
    A multinational telecommunications enterprise
  Real-time business operational visibility
    ATP Tour
    A global financial services institution
    A world leader in agribusiness
    A global electronics and imaging major
  Augmenting revenue and profitability
    A global automation major
    A global pharmaceutical supplier
    North American freight railroad network
    The largest chocolate manufacturer in North America
IIP Building Blocks
  Layer 1: Flexible data management
  Layer 2: Insights development & analytics
  Layer 3: Insights-as-a-service
  Table 1: IIP Layer 1 components
  Table 2: IIP Layer 2 components
Next Steps

Abstract

Infosys has strengthened its incomparable proficiency in helping the world’s most advanced enterprises

improve operations and run their businesses with a power-packed, technology-driven platform known as the

Infosys Information Platform (IIP).

IIP is an analytics platform that orchestrates open source software, value-added enhancements, and strong

professional service expertise. Along with its single-click installer, data ingestion framework and graphical

data-modeling tool, IIP supplies a comprehensive array of adapters for diverse data sources as well as an

easy way to create new connections when needed. Out-of-the-box integration with RStudio simplifies

harnessing the power of clusters while modeling data. Significantly, all of IIP’s benefits can be realized

without requiring extensive coding.

All of these and other features help customers discover actionable insights and foresights by deriving

meaning from the abundant - and untapped - sources of information.

This paper will help you understand IIP, its design philosophy, technical architecture and success stories.

Introduction

So far, it’s been strenuous and time-consuming to obtain insights from the enormous amounts of raw data

from internal and external sources that flood enterprises every day. And even when information

analysis has been successful, tangible insights that convert to business results have frequently been

elusive.

Infosys Information Platform (IIP) brings in all the right ingredients such as technology, toolsets and

processes to obtain insights in near real time from all kinds of data – historic or current, idle or ever-changing,

structured or unstructured.

Solutions developed using IIP deliver quick and meaningful business outcomes such as:

Analytics-driven preventative maintenance and downtime reduction

Real-time operational visibility

Augmented revenue and profitability

Organizations that deploy IIP are able to realize these benefits while still protecting and rejuvenating

existing IT technology investments.

We begin this paper by describing a few common solutions, and then illustrate IIP’s architecture and

distinct value proposition. We then depict how the unique challenges of today’s information landscape

served as the rationale behind Infosys’ development of AiKiDo and the IIP solution that it incorporates.

The intended audience for this paper includes line-of-business executives, IT leadership, and anyone else

interested in translating raw data into insights and guidance that the business can swiftly use to drive action.

The Infosys Information Platform (IIP) in Action

By running existing workloads more efficiently while unearthing new opportunities, IIP makes it possible to

leverage untapped data from numerous internal and external systems and sources to reveal insights and

suggest quick courses of action.

Market adoption of IIP has been impressive: within the first year of its existence, 200 customers have

completed evaluations, with dozens now onboard. To help customers get up and running quickly, Infosys

offers a preconfigured Data Analytics solution.

IIP, combined with rich professional expertise in roles such as business analysts, technology architects, data scientists,

data engineers, and software developers, produces business solutions constructed on the platform. These

solutions have delivered bottom-line benefits in dozens of engagements across industries and

applications.

These achievements cover a broad range of applications, such as:

Fraud analytics

Predictive analytics for maintenance

Digital shopper insights

Customer churn analysis

Risk exposure analytics

Trade data analysis and regulatory reporting

Real-time machine learning

Working capital allocation optimization

Below are some examples of our solutions:

Analytics-driven preventative maintenance and downtime reduction

1. One of the world’s largest mining operators has placed nearly 200 sensors on each of its autonomous, unmanned vehicles. IIP’s real-time data analytics – ingesting and processing 27,000 messages per second - predicts which of these vehicles is about to fail.

This guidance drives repairs before downtime can occur, and is a great illustration of a completely new type of application made possible by IIP.

2. A major ATM manufacturer and service provider turned to IIP to develop a fresh, innovative solution that analyzes four million records of alert and incident data from 8,500 machines in an effort to foresee which ATMs would fail within one week.

The outcome included downtime reduction of 10%, a 14% increase in service call efficiency, and an 18% cost reduction thanks to more accurate, productive client visits.

3. A multinational telecommunications services company had recognized that network faults were the biggest single cause of disruptions, but determining the time and location of impending failures was nearly impossible. Complex analytics on millions of operational records conducted using IIP helped unearth the fact that three percent of the company’s lines were at risk of having a fault sometime during the next three weeks.

Armed with this knowledge, the organization immediately targeted the imperiled lines for repairs before the anticipated outages could occur.

Real-time operational visibility

1. ATP Tour - the governing body of men’s professional tennis - wanted to add new color and depth to fans’ understanding of the game in an open and cost-effective way.

Infosys loaded extensive historic data consisting of millions of data points from multiple systems into IIP. The results - which were available in near real-time - provided a comprehensive collection of in-depth, probability-led foresights on player performance to the media, game commentators, and the sport’s worldwide fan base.

2. Spurred by regulatory trade requirements, a leading global financial institution processing six million transactions per day employed near real-time analytics in IIP to slash report generation times from 10-15 minutes to 35 seconds.

This is an example of better utilization of existing infrastructure brought about by IIP.

3. A world leader in agribusiness was facing application performance challenges in its management reporting solution, which was strained by large volumes of data. Infosys implemented a proof-of-concept using the Infosys Information Platform (IIP) to improve the performance of the reporting solution. During the exercise, IIP ingested 19 million records in just six minutes. This was a breakthrough compared to the existing platform, which took over an hour to ingest half a million records.

IIP completed report and dashboard navigation in under five seconds, while the existing platform took over a minute to perform the same task.
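
The ingestion figures quoted for this proof-of-concept are easier to compare as records per second. The calculation below is only a back-of-the-envelope sketch: it treats "over an hour" as exactly 60 minutes, so the ratio it produces is a lower bound, and the function and variable names are illustrative rather than part of IIP.

```python
def records_per_second(records: int, seconds: float) -> float:
    """Average ingestion throughput over the whole load."""
    return records / seconds


# Figures quoted in the proof-of-concept description.
iip_rate = records_per_second(19_000_000, 6 * 60)    # 19 million records in six minutes
legacy_rate = records_per_second(500_000, 60 * 60)   # half a million records in an hour (assumed exactly 60 min)

# The existing platform took "over an hour", so its true rate is lower
# and the real speedup is at least this large.
speedup = iip_rate / legacy_rate

print(f"IIP:      {iip_rate:,.0f} records/s")
print(f"Existing: {legacy_rate:,.0f} records/s (upper bound)")
print(f"Speedup:  at least {speedup:,.0f}x")
```

Expressing both loads in the same unit makes it clear that the difference is a matter of orders of magnitude rather than a modest tuning gain.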

4. A global imaging and electronics manufacturer of printers, photocopiers and fax machines sought to rejuvenate its Accounts Receivable reporting process.

Along with ongoing, daily production details, Infosys migrated 24 months of historical information from the existing data warehouse into IIP. At the same time, the enterprise’s data models, views, and reports were all streamlined and optimized.

Turnaround time for daily data integration was 37% faster, and report performance times were cut in half.

Augmented revenue and profitability

1. To help identify existing customers with a propensity to buy specific products and services, a major office automation vendor rolled out a new application that utilized IIP’s machine learning and in-memory computing capabilities to analyze more than two million records of previous purchases.

The complete set of predictions was generated in seven seconds.

2. A global pharmaceutical supplier was hampered by the amount of time it took to identify backorders.

Using IIP to consume SAP-generated order details, they created a new solution that analyzes the entire data set and identifies - and helps correct - supply shortfalls within 10 seconds.

3. A major North American freight railroad network was eager to come up with new tactics to reduce the quantity of unnecessary braking events generated by the Positive Train Control (PTC) system for its locomotives. These unanticipated incidents diminished the organization’s ability to adhere to its published operating schedules. Infosys used IIP and the R programming language to analyze an expansive set of operational metrics and then develop a delay event prediction model.

The new approach helped the railroad adjust the PTC and significantly diminished the number of unnecessary braking occurrences.

4. The largest chocolate manufacturer in North America lacked a timely, systematic methodology for determining when its products were unavailable for purchase at retail locations.

Since many consumers make their buying decisions impulsively, these frustrating inventory shortages resulted in lost revenue and diminished brand loyalty.

IIP served as the computing platform for a collection of statistical models that helped to classify out-of-stock events and determine their root cause. Along with this analysis, the new solution also alerted the appropriate users to help prevent these costly episodes.

To learn more about these IIP accomplishments, please see the IIP Customer Success Grid that’s

presented later, or visit http://www.infosys.com/information-platform/case-studies/Pages/index.aspx

In the next section, we portray some of the modern information complexities that Infosys needed to

overcome when constructing the solutions that we just illustrated. These dynamics also helped influence

the design and development of the IIP platform itself.

Information Management Realities

Regardless of industry, every IT organization must confront an assortment of commonly discomforting

truths about how data is created and utilized today. Each of these factors was an integral consideration when

Infosys developed IIP.

There’s lots more data than ever

According to a study [1] published by EMC and IDC, from 2013 to 2020, the digital universe will grow by a

factor of 10 – from 4.4 trillion gigabytes to 44 trillion gigabytes. That is roughly a doubling every two years.
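
The growth figures quoted from the study can be turned into a compound annual rate and a doubling time. The sketch below uses only the two endpoints given in the text and assumes smooth compound growth between them, which the study itself may not claim. Under that assumption the rate is roughly 39 percent a year and the doubling time is a little over two years.

```python
import math

start_volume = 4.4   # trillion gigabytes in 2013
end_volume = 44.0    # trillion gigabytes in 2020
years = 2020 - 2013

growth_factor = end_volume / start_volume                  # overall multiple
annual_rate = growth_factor ** (1 / years) - 1             # compound annual growth rate
doubling_years = math.log(2) / math.log(1 + annual_rate)   # time to double at that rate
two_year_factor = (1 + annual_rate) ** 2                   # growth over any two-year span

print(f"Overall growth:   {growth_factor:.1f}x over {years} years")
print(f"Annual growth:    {annual_rate:.1%}")
print(f"Doubling time:    {doubling_years:.2f} years")
print(f"Two-year growth:  {two_year_factor:.2f}x")
```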

Data has tremendous diversity

Previously, information was principally generated by enterprise applications using a standard relational

structure that was easily catalogued and employed.

Naturally, structured application information is still a big and meaningful portion of the overall IT portfolio,

but data variety now encompasses unstructured sources such as:

Social Media Feeds

Machine Logs

Document scans

OCR Data

Data is generated by multiple external sources

There was a time that IT leadership could simply focus attention on its own application and data collection.

That’s passé: IT must now have a plan to interact with, and react to, data created by innumerable outside

sources.

Data arrives very quickly

These new information categories are typified by the speed at which they’re generated and distributed. For

example, consider how rapidly a video clip, tweet or Facebook post can go viral.

Cross-system data is generally uncorrelated

When the bulk of the enterprise’s data was from well-defined enterprise applications, it was relatively

straightforward to understand and manage information interconnections. This is much more daunting today,

since properly linking raw - and often unstructured - data from diverse sources takes a lot of work.

[1] The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, EMC

Digital Universe with Research & Analysis by IDC

Barriers to Success

Smart enterprises appreciate the untapped value of the data that’s generated by their operations every day.

Predictably, many organizations are making significant investments in hardware, software, and personnel

in an effort to derive advantages from this idle information.

Sadly, these technology expenditures have often failed to deliver on their promise. At a recent Gartner Big

Data Industry Insights event [2], analyst Lisa Kart stated, "73 percent of enterprises are either investing or

planning to invest in big data. However, of these, 65 percent are struggling with determining how to get

value from big data."

The gap between anticipation and results is not surprising: endeavors that promise to “make sense of big

data” need careful thought and deliberate action, not simply the selection and

purchase of technology. Also, apart from the usual data volume, velocity and variety, there are many

technology, skills and process challenges that diminish the returns from IT expenditures. Let’s take

a look.

Installation and integration of a modern data platform

Even with IT infrastructure, budgets and competencies in place, enterprises today demand advanced

analytics capabilities that require varied components to work together seamlessly to derive insights. Modern

information management platforms must be integrated with existing components while new big data

components are installed, which is a well-known challenge.

Staffing shortfalls

There’s a dearth of talented individuals with expertise in data science and platforms like Hadoop. This

means that key IT staff have additional, often perplexing responsibilities such as managing data diversity,

creating an accurate, meaningful enterprise information model, and designing and developing the algorithms and

applications necessary to turn potential returns into reality.

Administrative hassles

Since each constituent in a modern information management platform has its own lifecycle, administering

these solutions is an ongoing, often highly convoluted process that entails:

Keeping up with the rapid pace of open source technology advancements

Controlling versions, for infrastructure as well as applications

Managing across projects

Configuring and managing enterprise-grade security

Ensuring compliance with mandated internal standards as well as those from external regulatory agencies

While IT struggles to surmount the gaps between the promise and reality of the enterprise’s data, the user

community is often disappointed at the pace - and eventual upshot - of these efforts.

[2] Gartner Webinar, Big Data Industry Insights, Lisa Kart, January 27, 2015

First, there is a remarkable scarcity of easily deployed, user friendly, and industry-specific applications that

can make sense of - and offer meaningful recommendations about - the enterprise’s substantial information

assets.

In the absence of packaged solutions, staff members naturally turn to self-service options - including

analytics - as a mechanism for gaining the insights they crave. This often results in yet another instance

where the promises portrayed by IT and software vendors diverge from the actual end-user experience.

Finally, even in those rare cases where all of the prerequisites are in place for an exceptional user

experience, delays during the data ingestion process - for both structured and unstructured information -

introduce unendurable lags in turning suggestions into action.

What can change

Enterprises that seek to profit from the modern data landscape in real time should look for technologies that:

are adept at handling both structured and unstructured information

increase sophistication by moving from old-style, basic relations to more far-reaching correlations

future-proof investments by leveraging existing assets and letting the enterprise select best-of-breed modules

make it easy to profit from technology advancements without causing unnecessary expenditures or downtime

deliver rapid insights by moving away from the existing user request/IT response paradigm towards ‘self-discovery’

With these fundamentals in place, users are empowered with self-service capabilities that let them

answer questions without needing to request assistance from the IT organization.

All of the best practices that we’ve just described have served as foundational guidelines for the Infosys

Information Platform, which we describe next.

IIP in Brief

As a global leader in consulting, technology, outsourcing and next-generation services, Infosys enables

clients in more than 50 countries to stay a step ahead of emerging business trends and outperform the

competition.

An entrepreneurial adventure that began with seven engineers and $250, Infosys is now a publicly traded

company (NYSE: INFY) driven by more than 179,000 relentless innovators and annual revenues of more

than $8.7 billion.

By focusing on seven core mega-trends, Infosys supplies enterprises in every industry with strategic

insights and a framework to uncover opportunities for innovation-led growth.

Inspired by its unparalleled experience in supplying the precise combination of technology, people, and

know-how to the world’s largest and most sophisticated organizations, Infosys has launched the AiKiDo

initiative.

AiKiDo is an integrated, coordinated, and strategic solution amalgamated from three distinct families of

products and services:

Ai: Platforms and Platforms as a Service

Ki: Knowledge-based management and landscape evolution

Do: Design thinking and design-led initiatives

AiKiDo’s overall mission is to help Infosys’ customers improve current operations - and thereby renew their

existing technology and business process landscape - while driving innovation by uncovering formerly

hidden opportunities to solve challenges.

IIP in the enterprise

As part of the Open Data aspect of the Ai platform set, Infosys has developed the Infosys Information

Platform (IIP). It’s a complete solution that blends open source software, value-added components, a robust

partner ecosystem, and deep professional service proficiency.

It is an analytics platform that enables you to quickly glean insights from all types of data sources and use

them for decision support across industries.

IIP offers all necessary open source components in one validated package that can be installed with a

single click. This saves a tremendous amount of time, and keeps staff focused on delivering value, rather

than downloading, installing, and maintaining individual software bundles.

All of IIP’s benefits can be attained without writing or manipulating any open source platform code, so

enterprises can emphasize core applications and business outcomes without having to hire expensive open

source experts.

IIP is tailored to avoid vendor lock-in, and works with any licensed software or open source Hadoop

distribution such as Cloudera or Hortonworks, or relevant components acquired directly from the related

open source Apache project.

An intuitive user interface abstracts away the nuances and complexities of the underlying platform’s open

source technologies, while also eliminating the need to code for many common tasks – including data

modeling. This greatly increases the productivity of scarce data engineers and data scientists. When coding

is necessary, everything that IIP generates is 100% open source for ease of maintenance.

Enterprises are free to make substitutions without incurring any downtime or outages, keeping all options

open when surveying the latest technology advancements. In fact, if they wish, they can replace IIP’s

open source components with already-installed technologies: it’s entirely up to the enterprises to select the

amount and type of open source software in their environment. Also, they can select any desired

deployment model for IIP, including on-premises, public cloud, private cloud, and hybrid topologies.

Since it runs on commodity hardware and, being open source, is free from software license fees, IIP

significantly reduces hardware, software, professional service, and operational outlays, and thus offers a

significantly better total cost of ownership than existing market offerings.

The most salient feature of IIP is that it remains a secure platform while still taking full advantage of open source

technologies. Encryption, authentication, data lineage, and cluster monitoring tools provide far-reaching

security, management and compliance with data audit and governance mandates.

Infosys’ commitment to advancing the state of the art of modern information platforms

extends far beyond simply providing a well-integrated solution to its customers. For instance,

Infosys is a Platinum Sponsor of the Open Data Platform (ODP) initiative: the open ecosystem of Big Data.

Infosys actively works with other industry leaders to promote and enhance Big Data technologies and open

source projects such as Apache Hadoop. These advances include Infosys contributions to performance

and security, which we will describe later in this paper.

As another example, IIP has been certified as a test bed by the Industrial Internet Consortium,

demonstrating its relevance in the rapidly evolving landscape of sensor data analytics and the Internet of

Things (IoT).

IIP Customer Success Grid

The following section summarizes a few examples of customers that have profited from IIP’s speed, scalability, and open

architecture. Each captures the business challenge, a high-level solution summary, and the benefits. They are

classified as:

1. Analytics-driven preventative maintenance & downtime reduction

2. Real-time business operational visibility

3. Augmenting revenue and profitability

Analytics-driven preventative maintenance & downtime reduction

One of the world’s largest mining companies

Business context

One of the world’s largest mining companies utilizes a fleet of autonomous, unmanned trucks.

These vehicles operate in numerous locations globally, and a failure negatively impacts the entire supply chain.

Each truck is equipped with nearly 200 sensors, which continually broadcast 400 data points of telemetry about the state of the vehicle (e.g. temperature, vibrations, and tire pressure) along with details about the terrain in which it’s currently operating.

Solution highlights

Apache Kafka was configured to stream 27,000 of these messages per second into IIP, where a mathematical model developed in Apache Spark computed maintenance requirements as well as the likelihood of an upcoming failure.

A native HTML5 application presented a color-coded global map indicating the state of all vehicles, and permitting drill-down on any individual truck.

Results

Machine breakdowns and production interruptions have been significantly diminished, resulting in less downtime and more savings.

Users can interact with much more accurate indicators for a collection of critical metrics such as:

Production schedule adjustments

Spare part requirements

Energy costs

Optimal asset utilization

Thanks to the operational efficiencies gained from the real-time, elastic, and scalable IIP solution, the enterprise is launching an initiative to increase its fleet of unmanned trucks by 300%.
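
The paper does not describe the mathematical model that scores the truck telemetry, so the following is only an illustrative stand-in for the general idea of flagging unusual sensor readings as they stream in. It keeps a short rolling window per truck and sensor, and flags a reading that deviates strongly from recent history. The class name, window size, threshold, and sample values are all hypothetical, and plain Python is used here in place of the Apache Kafka and Apache Spark components named above.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW = 20        # readings kept per (truck, sensor); hypothetical value
THRESHOLD = 3.0    # z-score above which a reading is flagged; hypothetical value
MIN_HISTORY = 5    # readings required before any scoring is attempted


class TelemetryMonitor:
    """Rolling z-score check over streamed sensor readings (illustrative only)."""

    def __init__(self, window: int = WINDOW, threshold: float = THRESHOLD):
        self.threshold = threshold
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, truck_id: str, sensor: str, value: float) -> bool:
        """Record a reading and return True if it looks anomalous."""
        past = self.history[(truck_id, sensor)]
        flagged = False
        if len(past) >= MIN_HISTORY:
            mu, sigma = mean(past), pstdev(past)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                flagged = True
        past.append(value)
        return flagged


# Made-up engine temperature readings for one truck; the last one spikes.
monitor = TelemetryMonitor()
readings = [80.0, 81.0, 79.5, 80.5, 80.2, 79.8, 80.1, 95.0]
flags = [monitor.observe("truck-17", "engine_temp_c", r) for r in readings]
print(flags)
```

A production model would combine many sensors and learn failure patterns from history; the point here is only the shape of the computation, where each message updates a small per-vehicle state and yields an immediate signal.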

A major ATM manufacturer

Business context

A major ATM manufacturer and service provider sought techniques to reduce maintenance costs while offering higher reliability and improved customer service to its clients.

Solution highlights

An IIP solution - hosted on a 10-node Amazon Web Services (AWS) cluster - was developed to ingest four million records of ticketing data generated by 8,500 ATMs.

The entire data loading and cleansing process took 27 seconds, and the follow-on Apache Spark-based logistic regression analysis with reliability predictions concluded in only 60 seconds.

The final results were then transmitted to the customer’s Oracle database, and presented through the Tableau business intelligence solution.

Results

The IIP solution was able to predict - with an 80% reliability rate - the likelihood of an ATM failing within one week.

Using the outage predictions generated by the IIP solution, each technician is now able to conduct 4 service calls per day, which is a significant increase from the earlier average of 3.5 service calls per technician per day.

Accurate failure predictions and the resulting optimized service calls have helped shave costs by 18%.

Meanwhile, chronic defects are now corrected rapidly - in hours rather than weeks.
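
To make the logistic regression step more concrete, here is a minimal sketch of the technique in plain Python, trained with batch gradient descent. It is not the Apache Spark implementation used in the engagement. The single feature (a scaled count of alerts in the previous week), the tiny data set, and the learning rate are all assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Estimated probability that the machine described by x fails."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical training data: alerts in the past week, scaled to 0..1,
# and whether the ATM failed within the following week (1) or not (0).
X = [[0.0], [0.1], [0.2], [0.3], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(X, y)
for alerts in (0.1, 0.5, 0.9):
    print(alerts, round(predict_proba(weights, bias, [alerts]), 3))
```

The output of such a model is a probability per machine, which is what allows service visits to be ranked and scheduled instead of dispatched reactively.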

A multinational telecommunications enterprise

Business context

A multinational telecommunications enterprise wanted to identify - and then correct - potential network faults that could result in costly and inconvenient service disruptions.

Solution highlights

Experts from Infosys used IIP to process and analyze more than 16 million records of ADSL connection details such as attenuation/loss, code violations, upload/download rates, and re-initializations.

These computationally-intensive examinations resulted in two distinct “signatures”:

1. A profile produced by normal, non-fault activities (“Control signature”)

2. A profile that indicated an incipient fault (“Fault signature”)

Applying a statistical model to then compare these two signatures served as a reliable indicator of which lines were candidates for a near-term outage.

Results

The IIP-based solution identified three percent of the firm’s lines as being at-risk of a looming interruption at some point in the subsequent three weeks.

Using these insights as a roadmap, the firm was able to get a head start on repairing the problematic lines before trouble could develop.

This has resulted in reduced downtime and more optimally allocated maintenance resources.
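
The idea of comparing a line against a control signature and a fault signature can be sketched as a nearest-centroid check. The paper does not say which statistical model was actually applied, so this is a deliberately simple substitute. The three features are taken from the connection details listed above, while every numeric value below is invented for illustration.

```python
import math


def signature(profiles):
    """Mean feature vector of a group of line profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def at_risk(profile, control_sig, fault_sig):
    """True when a line resembles the fault signature more than the control one."""
    return distance(profile, fault_sig) < distance(profile, control_sig)


# Hypothetical profiles: (attenuation in dB, code violations/day, re-initializations/day)
healthy_lines = [[20, 2, 0], [22, 3, 1], [21, 1, 0]]
faulty_lines = [[35, 40, 6], [38, 55, 9], [33, 45, 7]]

control_sig = signature(healthy_lines)
fault_sig = signature(faulty_lines)

print(at_risk([34, 38, 5], control_sig, fault_sig))  # resembles the faulty group
print(at_risk([23, 4, 1], control_sig, fault_sig))   # resembles the healthy group
```

In practice the features would need to be normalized so that no single measurement dominates the distance, and the comparison would be validated against known outages before being used to schedule repairs.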

Real-time business operational visibility

ATP Tour

Business context Solution highlights Results

ATP Tour - the governing body of men’s professional tennis - eagerly sought new, innovative techniques to help commentators and fans get a better understanding of the fast-paced game.

Their mission was to go far beyond traditional statistics to uncover previously hidden insights.

Infosys loaded millions of data points into IIP, including 12 months of umpire data as well as five years of data from the computerized Hawk-Eye ball tracking system used in the Barclays ATP World Tour Finals.

Requiring just two nodes with eight-core CPUs and 16 GB of RAM for hardware, IIP concluded its analysis in near real-time.

An enormous number and variety of statistics - and their impact on the game - are now available for fans. Just a few examples of these metrics include:

Shot speed

Shot placement

Point winning shots

Fatigue indexes

Serve analysis

ATP now offers this research to match commentators along with publication on ATPWorldTour.com for the benefit of fans and journalists.

A global financial services institution

Business context Solution highlights Results

A global financial services institution carries out approximately six million trades per day. Regulatory requirements dictate that certain trades must be reported in a specific format within a 15-minute window.

There were numerous instances where the organization failed to make obligatory notifications within the mandated timeframe.

These delays resulted in non-compliance alerts and costly financial penalties.

Apache Sqoop extracts trade details from an Oracle database and loads them into the Hadoop File System (HDFS) residing on IIP. The data extraction process includes data cleansing, validation, and derivation operations.

Nearly 600 million transactions were loaded into a 100-node AWS cluster at a rate of 130,000 transactions per second.

Upon completion of the relevant computations, the regulatory results are returned to Oracle, and various analytic reports are available from Tableau.

IIP completed the entire processing and reporting assignment within 35 seconds.

The enterprise now has an elastic and scalable strategy that eliminates violations and penalties, and can easily support future growth.
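
The 15-minute reporting rule at the heart of this case can be expressed as a simple check. The sketch below is a minimal illustration, not the IIP implementation: only the 15-minute window comes from the text, while the tuple layout, function names, and the decision to treat an unreported trade as a miss are assumptions.

```python
from datetime import datetime, timedelta

# The 15-minute window comes from the text; everything else is illustrative.
REPORTING_WINDOW = timedelta(minutes=15)


def is_late(executed_at, reported_at, window=REPORTING_WINDOW):
    """Return True when a trade was reported after its window closed."""
    return reported_at - executed_at > window


def late_trades(trades):
    """Return the IDs of trades whose report missed the window.

    Each trade is a (trade_id, executed_at, reported_at) tuple. A
    reported_at of None means no report was filed and counts as a miss.
    """
    missed = []
    for trade_id, executed_at, reported_at in trades:
        if reported_at is None or is_late(executed_at, reported_at):
            missed.append(trade_id)
    return missed
```

A report filed exactly 15 minutes after execution is treated as on time here; the actual regulation may define the boundary differently.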

A world leader in agribusiness

Business context Solution highlights Results

To offer timely information to their user community, a large agribusiness concern aimed to speed up data loads and report generation by deploying an inexpensive, cloud-based business intelligence data warehouse acceleration solution.

The entire information portfolio - consisting of 19 million records of master data, sales, costs, and inventory - was loaded into an on-premise two-node IIP cluster.

It took only six minutes to transfer the complete data set, and a full suite of reports was generated in less than 20 seconds.

These reports - which were presented in Tableau - provided guidance on sales performance, budget variances, and geographic revenue summaries.

Month-end processing is now completed in near real-time, and the data load task is 600 times faster than in the previous solution.

Users are able to gain access to the reports they need 60 times more quickly than before.

A global electronics and imaging major

Business context Solution highlights Results

An international manufacturer of imaging and electronics technology such as printers, copiers, and fax machines desired a fresh alternative to its data warehouse infrastructure.

Infosys created a collection of ingestion models to load active and historical data related to payments, credits, debits, and adjustments from multiple source systems - including a massive existing data lake - directly into IIP.

The data model was optimized and harmonized, with summary tables created in Apache Hive.

Spark SQL and Tableau were assigned the task of presenting information to users.

As part of this migration and streamlining effort, Infosys was able to cut the number of views in half, and offered extensive new visualization and extraction options to users.

The essential job of loading information from source applications and data warehouses was improved by 37%.

Report generation times were trimmed by 50%, and the business profited from far greater accuracy and reduced variances using the new solution.

Augmenting revenue and profitability

A global automation major

Business context Solution highlights Results

In an effort to more effectively allocate marketing and sales resources, a large office automation concern sought a reliable method to predict the likelihood of existing customers making subsequent purchases.

More than two million records of current customer details and monthly sales transactions were retrieved from production systems and loaded into Hadoop.

Apache Spark was used to create near real-time, in-memory machine learning models to accurately identify which customers within a given sales territory were likely to make a purchase.

Results were available for user consumption in Tableau within seven seconds.

The ensuing reports were accurate in identifying those customers that were genuine candidates for repeat purchases.

The business used this information to drive cross-sell and upsell efforts.

A global pharmaceutical supplier

Business context Solution highlights Results

A global pharmaceutical manufacturer’s revenue was negatively impacted by delays in determining back order specifics.

To overcome these obstacles, the organization sought to take advantage of high performance distributed computing.

A comprehensive information portfolio was loaded into IIP.

This data set consisted of fine-grained details about customers, orders, products, and manufacturing plant availability.

Computations were performed in IIP, with the resulting guidance delivered in 10 seconds on a single node server.

The results were then presented to users via Tableau.

Management now has a much more accurate picture of potential backorder issues, and can take corrective action long before problems impact revenue.

North American freight railroad network

Business context Solution highlights Results

A major North American freight railroad network searched for ways to eliminate an ongoing series of needless braking events for its locomotives.

Infosys loaded a diverse set of metrics into IIP running on an AWS cloud. These values - which created a data set of hundreds of terabytes - included locomotive brake data, engineer characteristics, wayside data streams, weather information, maintenance details, and signal data from the Positive Train Control (PTC) system.

Using the R programming language with resulting visualizations presented in Tableau, Infosys performed a series of investigations such as Pareto analysis of braking events, basic text mining of delay comments, and locomotive delay prediction.

These inquiries demonstrated that erroneous signals, speed restrictions, and switch alignments were the primary culprits in the unwanted braking occurrences.

Applying the recommendations delivered by the IIP-based solution helped the railroad predict - and then prevent - the factors that were causing the braking events.

A one-mile-per-hour increase in train velocity can yield $200 million of incremental revenue, so these adjustments had a major impact on the enterprise’s bottom line.
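
A Pareto analysis of braking events, one of the investigations mentioned above, ranks causes by frequency and keeps the few that account for most incidents. The original work was done in R; the sketch below uses Python purely for illustration, and the function name and the 80% cutoff are assumptions rather than details from the source.

```python
from collections import Counter


def pareto(causes, cutoff=0.8):
    """Return the most frequent causes that together reach the cutoff share.

    The result is a list of (cause, count, cumulative_share) tuples,
    ordered from most to least frequent.
    """
    counts = Counter(causes)
    total = sum(counts.values())
    vital_few = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        vital_few.append((cause, count, running / total))
        if running / total >= cutoff:
            break
    return vital_few
```

Passing in one cause label per braking event returns the short list of causes worth addressing first.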

The largest chocolate manufacturer in North America

Business context Solution highlights Results

This organization recognized that its inability to accurately determine which retailers were lacking inventory was resulting in lost revenue and unhappy customers.

350 million rows of sales and inventory data were loaded into a five-node IIP instance running on AWS.

Infosys developed a collection of statistical models that classified - within four minutes of processing - the out-of-stock incidents and ascertained their root causes.

The resulting Tableau dashboard - which presented heat maps of details about stores, days, times, and items - gave users the necessary insights to avoid product availability shortfalls.

According to industry trade journal Retail Wire, out-of-stock incidents such as those experienced by this organization account for approximately 3.2% of lost revenue.

By eliminating these events, the enterprise stands to gain more than $100 million of incremental sales.
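
The heat maps on such a dashboard rest on counting out-of-stock incidents per cell. The sketch below shows one way to build a store-by-weekday grid. The source does not describe the statistical models or the data layout, so the record fields and the rule that zero units on hand means out of stock are illustrative assumptions.

```python
from collections import defaultdict


def out_of_stock_grid(records):
    """Count out-of-stock records per (store, weekday) cell.

    Each record is a (store, weekday, item, units_on_hand) tuple. A
    record counts as out of stock when units_on_hand is zero or less.
    """
    grid = defaultdict(int)
    for store, weekday, _item, units_on_hand in records:
        if units_on_hand <= 0:
            grid[(store, weekday)] += 1
    return dict(grid)
```

Each cell count can then be mapped to a color intensity by whatever visualization tool renders the grid.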

IIP Building Blocks

As illustrated in figure 1, IIP encompasses three fine-tuned yet well-integrated layers:

Layer 1: Flexible data management

Layer 2: Insights development & analytics

Layer 3: Insights-as-a-service

Figure 1

Layer 1: Flexible data management

IIP’s data management layer is a pre-configured, curated, and optimized collection of well-known, industrial-grade open source technologies. When architecting IIP, Infosys carefully surveyed the market to choose each component.

This tactic supplies all of open source’s advantages - such as cost, performance, transparency, and vendor flexibility - while minimizing the drawbacks such as research, technology acquisition, and maintenance that are prevalent when deploying open source.

Customers are also free to substitute their own already-implemented infrastructure for any of the bundled technologies provided by Infosys.

Table 1 enumerates the extensive list of open source software that constitutes IIP layer 1.

Infosys has been an active participant in advancing the open source projects that make up layer 1. A few instances of these contributions include:

Data level authorization on Spark views along with Hadoop File System (HDFS) tables accessible via Spark

Role-based access control on Spark Views and HDFS tables accessible via Spark

Auto-registration of Spark views on Apache Thrift server restart

Registration of multi-table joins as views through Spark beeline

Multi-threading in Sqoop

Callback capabilities in Sqoop created to report execution statistics

Infosys has also developed its own sentiment analytics engine that offers text analytics models and algorithms for meaningful indicators such as:

Buzz

Sentiment

Affinity

Opinion

Network

Influencer

Layer 2: Insights development & analytics

The second layer of the IIP architecture builds on the robust foundation of open source and customer-supplied information processing components that form IIP’s base layer.

Infosys has developed a collection of far-reaching, value-added, enterprise-grade capabilities that assist customers with essential tasks like:

Installation

Administration

Data loading and modeling

Performance

Scalability

Publication framework

Security

Table 2 describes each of the items found in IIP’s second layer.

We continue to make major investments in IIP.

Upcoming capabilities will include:

Rules engine integration

Elasticsearch integration

High availability and disaster recovery

Web aggregators

Archiving and aging

Layer 3: Insights-as-a-service

IIP is a potent combination of open source and Infosys-supplied supplemental software. It provides customers with the technical prerequisites to build applications that fully exploit today’s information landscape. Infosys stands behind IIP with a large pool of highly skilled specialists covering all aspects of developing modern applications:

Infrastructure management

Functional expertise

Technology acumen

Business analysts

Data scientists

Given our history of achievement, many clients also opt to take advantage of a group of related service offerings such as:

Integration and implementation customizations

Custom data extractors and adaptors

Client-specific data modeling and cleansing

Client-specific data science and advanced analytics

On-demand agile application development

Table 1: IIP Layer 1 components

Component Purpose

Apache Hadoop

A popular framework and ecosystem that facilitates batch-oriented distributed processing of massive amounts of data

Apache Hive

Infrastructure erected on top of Hadoop and the Hadoop File System (HDFS) to provide data warehouse capabilities such as querying, analysis, and summarization

Apache Kafka

A message broker that streamlines and speeds the important job of ingesting real-time data feeds

Apache OpenNLP

A machine learning toolkit intended for processing natural language text

Apache Shiro

Security framework for Java applications that offers authentication, authorization, encryption, and session management

Apache Spark

A cluster computing and processing framework, designed for very high throughput and performance, especially when incorporating machine-learning algorithms

Apache Sqoop

Technology developed to transfer data between relational databases and Hadoop

Apache Hadoop YARN

A platform that manages computational assets that are aggregated in clusters and schedules applications on those resources

Apache Zeppelin

Provides easily-created, interactive data analytics using popular Big Data technology back-ends

Apache ZooKeeper

Provides a naming registry for large distributed systems, along with keeping track of configuration and synchronizing information across the computing cluster

Azkaban

Technology developed by LinkedIn to permit scheduling of batch Hadoop jobs

Hibernate

A framework for mapping objects between Java and relational databases

HIPI

Hadoop Image Processing Interface: a library targeted at very fast image processing using MapReduce computational patterns

Java Development Kit (JDK)

Complete application development infrastructure for the Java Programming language

Kerberos

Software that implements a network authentication protocol that makes it possible for nodes to securely communicate, regardless of whether the underlying network is secure or not

MySQL

A widely adopted open source relational database, utilized as internal storage by the IIP platform’s Quartz scheduler.

Quartz

An open source, Java library that permits job scheduling and workflow coordination directly from an application.

RStudio

A development studio for R, a specialized programming language intended for developing data analysis and statistical applications

RStudio Server

Enables a browser-based user interface to applications written in the R programming language that are running on a remote server

Twitter4J

A software library that integrates Java applications with the Twitter API

Table 2: IIP Layer 2 components

Component Purpose

Administration workbench

Permits users to configure and manage workspaces and data sources.

Apache Ambari

Software meant to make the job of administering and managing Hadoop clusters less taxing

Cluster maintenance

A single click installs all of the components in the IIP platform. Infosys engineers provide robust maintenance and support for ongoing open source upgrades.

Data explorer

A graphical user interface-based information modeling and query tool for designing joins, aggregation, and filtering. Offers drag-and-drop capabilities to quickly correlate multiple disparate information sources and data types. This sets the stage for uncovering insights while still insulating developers from the specifics of the underlying technologies. It renders its results via visualization tools such as Tableau and Qlik, as well as native HTML5. The Data Explorer integrates with external data science and analytics toolsets - such as the R programming language - via commonly accepted standards and protocols.

Data extractor

A configurable, extensible, and fault-tolerant workbench that provides a drag-and-drop user interface for ingesting data (initially and for subsequent updates) with near real-time performance. It’s adept at loading data from multiple data sources such as relational databases, data streams, message queues, NoSQL databases, social media, and log files. It’s able to digest CSV, XML, PDF, and JSON encoding formats.

Governance

Delivers complete metadata and view management via data ingestion and management workbenches.

In-memory analytics

IIP supplies a high performance, comprehensive collection of libraries and features to facilitate rapid data mining and modeling. They apply mathematical and statistical algorithms to uncover patterns in raw data. This supports fast data transformation to create joins and views for subsequent consumption.

Resource manager

All IIP-hosted applications can be launched, monitored, and administered from a single integrated Web-based user interface.

Security

IIP was built to incorporate robust security capabilities. First, it provides three levels of authentication including operating system, LDAP, and Kerberos. It also offers highly granular cell-based authorization and role-based access to information. Customers are free to specify fine-grained role-based access control:

For the platform

For all tables

For all views

For all fields within tables and views
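
The four levels of role-based access control listed above (platform, tables, views, and fields) form a natural hierarchy. The sketch below is one hypothetical way to model it and is not IIP's actual API: role names, resource paths, and the rule that a grant on a parent covers its children are all assumptions, and cell-based authorization is not modeled.

```python
# Hypothetical grants: each role maps to the set of resources it may read.
# A resource is a path such as ("sales",) for a table or view, or
# ("sales", "amount") for one field. The empty path () is the whole platform.
GRANTS = {
    "admin": {()},
    "analyst": {("sales",), ("inventory", "sku")},
}


def can_read(roles, resource, grants=GRANTS):
    """Return True if any role holds a grant on the resource or an ancestor."""
    for role in roles:
        for granted in grants.get(role, set()):
            # A grant covers the resource when it is a prefix of its path.
            if resource[: len(granted)] == granted:
                return True
    return False
```

Under these assumptions, an analyst can read every field of the `sales` table but only the `sku` field of `inventory`.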

Next Steps

Infosys offers a collection of helpful resources that provide more information about IIP:

1. To learn more about the platform, visit the IIP website.

2. Sign up for an IIP test drive.

3. Buy today on AWS Marketplace.

Beyond the test drive, Infosys provides the ability to completely host IIP using customer-supplied cloud environments or on-premise hardware. This option consists of a fully configured, multi-node IIP solution that’s designed to deliver real-time insights. For more information, write to us – [email protected]