WHITE PAPER

Data Lake: Powering Security Insights

Abstract

The high tech industry is transitioning from traditional IT systems to a pool of integrated and loosely-coupled infrastructure and software components that generate huge amounts of data on a continual basis. Enterprises use metrics based on this data to address non-functional system requirements, and to provide actionable insights into their capacity and performance needs. While organizations focus on improving functional integration between IT systems, they tend to overlook the performance, capacity, and security metrics generated by the systems, and the correlation between these metrics across systems.

This white paper proposes an approach to building a data lake that can be used to capture various non-functional metrics across IT systems and establish correlations among those metrics. Such an approach enables the development of an ecosystem where capacity or performance constraints of a certain application can help isolate related applications and infrastructure components that are likely to be impacted. In addition, it provides operational insights to improve system performance and availability by proactively addressing constraints. The result: a future-ready, agile enterprise IT environment that can support superior performance and security.

Why Traditional Data Architecture Falls Short

As high tech organizations undertake transformation using disruptive digital technologies, Big Data analytics and actionable insights play a critical role in effective decision making. Traditional data architecture includes data warehouses built upon data marts that are designed to support specific business decisions. Such architecture is not amenable to dynamic changes in the data structure to enable effective decision making. Most traditional systems are also business transaction focused and not geared to drive IT decisions.

The introduction of unstructured data sets and Big Data platforms into decision making systems calls for sophisticated decision support systems that can integrate heterogeneous and diverse data structures and sources. Such systems provide additional insights through contextual information for effective decision making across the business, to improve performance and ensure the security and reliability of IT systems.

Non-functional requirements play a critical role in IT decision making with respect to the performance, availability, reliability, security, and compliance of IT systems. Some critical questions that need to be answered are:

• Are users satisfied with the latency of mission-critical systems, and what does the user satisfaction score indicate?

• Are storage consumption and tiering managed in a cost-effective way?

• Are there any vulnerabilities in the IT environment? What are the cost implications of such vulnerabilities?

• Which set of users or infrastructure assets poses a business risk in terms of security and unavailability?

The right blend of data warehouse and data lake solutions provides answers to these questions.

Using Data Warehousing Systems for Non-Functional Requirements

The Limitations

Most monitoring and management data warehouse systems accumulate a set of capacity, performance, availability, and security metrics, and provide reporting and dashboard capabilities to effectively visualize the data. Such systems are capable of providing historical reports and future projections based on past trends, and are built using extract, transform, load (ETL) solutions (see Figure 1).

[Figure 1 diagram labels: Data Sources; Extract, Transform, Load; Transformation Logic; Data Store; Dash Board]

Figure 1: ETL Solution-based System for Systems Monitoring and Management
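The "future projections based on past trends" mentioned above can be illustrated with a straight-line fit. The following is a minimal sketch only: the paper does not say which projection method such systems use, and the function names and the monthly storage figures are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for samples taken at t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def project(values: Sequence[float], periods_ahead: int) -> float:
    """Extend the fitted line periods_ahead steps past the last sample."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


# Hypothetical monthly storage consumption in TB (illustrative data only).
monthly_used_tb = [40.0, 42.0, 44.0, 46.0]
forecast = project(monthly_used_tb, 3)
print(f"Projected consumption in 3 months: {forecast:.1f} TB")
```

A straight line is the simplest possible trend model; seasonal or bursty workloads would need something richer.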

While most of these systems work well in silos, integrating them and leveraging them to generate correlated metrics across the enterprise requires significant investment in effort and cost. The integration effort involves incorporating data transformation logic and schemas into the systems, which increases complexity and limits the flexibility to make modifications.

The Upside

For certain non-functional IT metrics, especially security domain metrics, organizations require real-time feeds from unstructured data sources to drive real-time decision making and to categorize and isolate anomalies.

The major difference between a data lake and a standard ETL-based system lies in the way data is ingested. A data lake leverages a Big Data platform to ingest the data into the system in native format, rather than perform heavy transformation logic prior to loading (see Figure 2).

[Figure 2 diagram labels: Data Sources; Big Data Storage; Extract, Load, Transform; Dash Board]

Figure 2: Data Lake System with Big Data Platform for IT Security Metrics
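The difference in ingestion style can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the event format, the schema, and the function names are all assumptions. The ETL path forces each record into a fixed schema before loading, while the lake path stores every record in its native form and applies structure only at query time.

```python
import json

# Hypothetical raw events from two different sources (illustrative data only).
RAW_EVENTS = [
    '{"host": "web-01", "cpu_pct": 91, "ts": "2018-02-01T10:00:00"}',
    '{"host": "db-01", "event": "login_failed", "user": "alice", "ts": "2018-02-01T10:00:05"}',
]

# Assumed fixed schema of the ETL-based store.
ETL_SCHEMA = ("host", "cpu_pct", "ts")


def etl_load(raw_lines):
    """ETL style: transform to the fixed schema before loading.

    Records that do not fit the schema are dropped at load time.
    """
    store = []
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in ETL_SCHEMA):
            store.append({field: record[field] for field in ETL_SCHEMA})
    return store


def lake_ingest(raw_lines):
    """Data lake style: keep every record unaltered, in native format."""
    return list(raw_lines)


def lake_query(lake, predicate):
    """Apply structure at read time, when the question is known."""
    return [rec for rec in map(json.loads, lake) if predicate(rec)]


warehouse = etl_load(RAW_EVENTS)
lake = lake_ingest(RAW_EVENTS)
failed_logins = lake_query(lake, lambda r: r.get("event") == "login_failed")
print(len(warehouse), len(lake), failed_logins)
```

In this sketch the security event never reaches the ETL store because it does not match the schema, but it remains available in the lake for later analysis.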

Data lakes are particularly suitable for aggregated analysis of real-time streaming log sources. This helps provide a consolidated view of system health across various technology platforms, along with a correlation mechanism to monitor, troubleshoot, and remediate issues across the system landscape. The capabilities of such data lake systems can be enriched by increasing the quantity and variety of log information.
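One simple form such a correlation mechanism could take is matching events from different log sources that concern the same host and occur close together in time. The sketch below assumes that approach; the event fields, the five-minute window, and the sample data are hypothetical, and a production system would use a streaming engine rather than nested loops.

```python
from datetime import datetime, timedelta

# Hypothetical events from a performance log source (illustrative data only).
PERF_EVENTS = [
    {"host": "web-01", "ts": datetime(2018, 2, 1, 10, 0), "event": "cpu_spike"},
    {"host": "db-01", "ts": datetime(2018, 2, 1, 11, 0), "event": "slow_query"},
]

# Hypothetical events from a security log source (illustrative data only).
SECURITY_EVENTS = [
    {"host": "web-01", "ts": datetime(2018, 2, 1, 10, 3), "event": "failed_login_burst"},
    {"host": "web-01", "ts": datetime(2018, 2, 1, 12, 0), "event": "port_scan"},
]


def correlate(perf_events, security_events, window=timedelta(minutes=5)):
    """Pair events from the two sources that share a host and fall within the window."""
    matches = []
    for perf in perf_events:
        for sec in security_events:
            same_host = perf["host"] == sec["host"]
            close_in_time = abs(perf["ts"] - sec["ts"]) <= window
            if same_host and close_in_time:
                matches.append((perf, sec))
    return matches


matches = correlate(PERF_EVENTS, SECURITY_EVENTS)
for perf, sec in matches:
    print(f'{perf["host"]}: {perf["event"]} coincides with {sec["event"]}')
```

Adding more log sources increases the number of pairings available, which is one reading of the point above about enriching the lake with a greater quantity and variety of log information.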

Building an Integrated Data Lake for IT and Security Analytics

Before diving into a data lake, it’s important to note that data warehouse-based monitoring systems have built-in domain capabilities such as transformation logic written using domain knowledge. This is hard to ignore. In addition, enterprises need to enable advanced correlation and analytics capabilities using multiple heterogeneous data sources. To achieve this, enterprises can use an integrated data lake that combines the advanced technology capabilities of a Big Data platform with the domain capabilities of traditional ETL-based systems (see Figure 3).

[Figure 3 diagram labels: Data Sources (two sets); Extract, Transform, Load; Transformation Logic; Data Store; Extract, Load, Transform; Big Data Storage; Dash Board]

Figure 3: Integrated Data Lake: Combining Big Data Platform with Domain Capabilities of ETL-based Solutions

In this approach, a data lake for IT and security metrics would leverage the existing monitoring systems and ingest their data into the data lake, along with contextual feeds from a variety of new data sources. These include configuration management databases (CMDBs), IT service management (ITSM) systems, social media feeds, and so on. For the new data sources, enterprises can leverage the domain capabilities of an existing ETL solution, in the form of operations management or storage resource management software, for advanced reporting and analytics.
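To make the idea of contextual feeds concrete, the sketch below joins metrics from an existing monitoring system with business context from a CMDB. It is a hedged illustration only: the CMDB structure, the field names, and the sample records are hypothetical and do not reflect any particular product.

```python
from collections import defaultdict

# Hypothetical CMDB extract: host -> business context (illustrative data only).
CMDB = {
    "web-01": {"application": "storefront", "criticality": "high"},
    "db-01": {"application": "storefront", "criticality": "high"},
    "batch-07": {"application": "reporting", "criticality": "low"},
}

# Hypothetical metrics as an existing monitoring system might export them.
METRICS = [
    {"host": "web-01", "metric": "latency_ms", "value": 850},
    {"host": "batch-07", "metric": "disk_used_pct", "value": 97},
    {"host": "lab-99", "metric": "cpu_pct", "value": 99},
]

# Context used for hosts that the CMDB does not know about.
UNKNOWN = {"application": "unknown", "criticality": "unknown"}


def enrich(metric, cmdb):
    """Attach CMDB context to a metric record."""
    return {**metric, **cmdb.get(metric["host"], UNKNOWN)}


def group_by_application(enriched):
    """Group enriched records so issues can be viewed per application."""
    grouped = defaultdict(list)
    for rec in enriched:
        grouped[rec["application"]].append(rec)
    return dict(grouped)


enriched = [enrich(m, CMDB) for m in METRICS]
by_app = group_by_application(enriched)
for app, records in sorted(by_app.items()):
    print(app, [(r["host"], r["metric"], r["value"]) for r in records])
```

Hosts that appear in monitoring data but not in the CMDB surface under "unknown", which is itself a useful signal about unmanaged assets.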

One way to assess the applicability of an integrated data lake for augmenting storage resource management software would be to build a proof-of-concept (PoC) (see Figure 4). Hadoop is an ideal Big Data platform for integrating heterogeneous data from backup devices, ITSM systems, and CMDBs. At the same time, it can ingest virtualization platform, security audit, and HTTP server logs, as well as user account data. Customized data marts can be created to address security, backup, and incident related problem statements. Companies can also create advanced reports using these data marts in storage resource management software to demonstrate additional capabilities that can be introduced with the help of a Big Data platform.

[Figure 4 diagram labels, by layer:
Data Source: Application Logs; ITSM Metrics; Application Metrics; ServiceNow Metrics
Data Pipeline: Data Landing Zone
Data Store: Big Data Store (Original Unaltered Data)
Data Transformation: Refined Data; Trusted Data; Meta Data
Data Marts: Data lake mart; Data warehouse marts
Data Warehouse: Data Warehouse Engine
Analytics Tools: Query; Reporting; Virtualization; Analytics]

Figure 4: Conceptual View of Integrated Data Lake Augmenting Storage Resource Management Software
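The layers named in Figure 4 suggest a flow from a landing zone holding unaltered data, through a refinement step, into purpose-specific data marts. The sketch below illustrates that flow under stated assumptions: the record format, the validation rule, and the routing table are hypothetical, and plain Python lists stand in for what would be Hadoop storage in the PoC described above.

```python
import json
from collections import defaultdict

# Hypothetical raw feed lines (illustrative data only).
RAW = [
    '{"source": "security_audit", "ts": "2018-02-01T10:00:00", "msg": "failed login"}',
    '{"source": "backup", "ts": "2018-02-01T02:00:00", "msg": "job completed"}',
    '{"source": "itsm", "ts": "2018-02-01T09:30:00", "msg": "incident opened"}',
    "corrupted line",
]

# Assumed mapping from data source to the mart that serves it.
ROUTING = {
    "security_audit": "security_mart",
    "backup": "backup_mart",
    "itsm": "incident_mart",
}


def land(raw_lines):
    """Landing zone: keep the original data unaltered."""
    return list(raw_lines)


def refine(landing_zone):
    """Parse and validate records; unparseable or incomplete ones are set aside."""
    refined, rejected = [], []
    for line in landing_zone:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(line)
            continue
        if isinstance(rec, dict) and "source" in rec and "ts" in rec:
            refined.append(rec)
        else:
            rejected.append(line)
    return refined, rejected


def build_marts(refined, routing):
    """Route refined records into marts; unrouted sources stay in the lake only."""
    marts = defaultdict(list)
    for rec in refined:
        mart = routing.get(rec["source"])
        if mart is not None:
            marts[mart].append(rec)
    return dict(marts)


landing = land(RAW)
refined, rejected = refine(landing)
marts = build_marts(refined, ROUTING)
print({name: len(rows) for name, rows in marts.items()}, "rejected:", len(rejected))
```

Because the landing zone is never modified, rejected lines can be reprocessed later if the refinement rules change.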


Some Use Cases

An integrated data lake leverages Big Data analytics to gain meaningful insights to address specific business challenges such as cost optimization, simplification, and security. All analysis and recommendations can be presented in the context of associated cost implications. For example, security controls might vary based on information risk categorization. Additional security measures can be adopted after evaluating cost against the value of information. This helps enterprises make effective business decisions in terms of what it would take to achieve the desired measures and avoid unnecessary changes.
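One way to picture "evaluating cost against the value of information" is an expected-loss comparison. The paper does not prescribe a formula, so everything in this sketch is an assumption: expected loss is modeled as information value times breach likelihood, and a control is considered worthwhile when the loss it avoids exceeds its annual cost. All figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Control:
    """A candidate security control (hypothetical model)."""
    name: str
    annual_cost: float
    risk_reduction: float  # assumed fraction of expected loss avoided, 0.0 to 1.0


def is_worthwhile(control: Control, information_value: float, breach_likelihood: float) -> bool:
    """Adopt a control only if the expected loss it avoids exceeds its cost."""
    expected_loss = information_value * breach_likelihood
    avoided_loss = expected_loss * control.risk_reduction
    return avoided_loss > control.annual_cost


# Hypothetical figures for one information asset (illustrative data only).
INFORMATION_VALUE = 1_000_000.0
BREACH_LIKELIHOOD = 0.1

candidates = [
    Control("multi-factor authentication", annual_cost=20_000.0, risk_reduction=0.5),
    Control("dedicated hardware vault", annual_cost=80_000.0, risk_reduction=0.6),
]

decisions = {
    c.name: is_worthwhile(c, INFORMATION_VALUE, BREACH_LIKELIHOOD) for c in candidates
}
print(decisions)
```

The same control could be justified for a higher-risk information category and rejected for a lower one, which matches the point that controls vary with risk categorization.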

An integrated data lake approach can be applied to all use cases of traditional ETL-based and modern data lake systems. In addition, it addresses the following scenarios:

1. Capacity consumption: Optimize capacity consumption with storage tier recommendations derived from machine learning or Big Data analytics.

2. Performance anomalies and recommended actions: Detect performance anomalies in the current environment and use machine learning to recommend the ideal measures with the help of Big Data reporting.

3. User satisfaction scores for mission-critical systems: Store end user feedback on Big Data platforms and correlate the feedback to data from the enterprise environment.

4. Advanced correlation for threat detection: Use machine learning to model threat detection systems for future-ready threat detection capabilities.
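As a small illustration of the second scenario, the sketch below flags performance anomalies with a z-score test. This is a deliberately simple statistical stand-in, not the machine learning approach the paper refers to; the threshold and the latency samples are hypothetical.

```python
from statistics import mean, pstdev


def find_anomalies(samples, threshold=2.0):
    """Return (index, value) pairs lying more than threshold standard deviations from the mean."""
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return []  # all samples identical, nothing stands out
    return [
        (i, x) for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold
    ]


# Hypothetical response-time samples in milliseconds (illustrative data only).
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101]
anomalies = find_anomalies(latency_ms)
print(anomalies)
```

A z-score over a whole window is sensitive to the outliers it is trying to find, so a real deployment would more likely use rolling baselines or a trained model.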

The Data Lake: A Critical Technology Component for Long-term Strategic Decision-making

With the increasing focus on leveraging data-driven analytics for competitive advantage, it is critical to build a data lake that addresses some of the key aspects of non-functional requirements such as capacity, performance, and security. While serving as a bridge that transitions customers from traditional legacy monitoring systems to advanced digital IT platforms, the IT and security data lake also serves as a critical technology component for long-term strategic decision-making. However, to realize its true value, a data lake must be part of a carefully architected end-to-end platform that integrates different data sources and leverages machine learning algorithms for superior business decisions.


All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content / information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS. Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties. Copyright © 2018 Tata Consultancy Services Limited

About Tata Consultancy Services Ltd (TCS)

Tata Consultancy Services is an IT services, consulting and business solutions organization that delivers real results to global business, ensuring a level of certainty no other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled, infrastructure, engineering and assurance services. This is delivered through its unique Global Network Delivery Model™, recognized as the benchmark of excellence in software development. A part of the Tata Group, India’s largest industrial conglomerate, TCS has a global footprint and is listed on the National Stock Exchange and Bombay Stock Exchange in India.

For more information, visit us at www.tcs.com


About The Authors

Ankur Srivastava

Ankur Srivastava is an Enterprise Architect with TCS’ HiTech business unit and he currently heads the HiTech Solutions Lab. With 13 years of experience, Srivastava drives the cybersecurity and cloud infrastructure initiatives within the unit and is responsible for developing differentiated digital automation solutions in the cloud and cybersecurity space. He has a Master of Technology degree in Software Systems with specialization in Data Analytics from the Birla Institute of Technology and Science, Pilani, India.

Ashish Pandey

Ashish Pandey is a Solution Developer with TCS’ HiTech business unit. He has over seven years of experience and focuses on developing new solutions leveraging Big Data platforms and technologies. Pandey holds a Bachelor of Technology degree in Computer Science and Engineering from Uttar Pradesh Technical University, Lucknow, India.

Contact

Visit the page on Hitech www.tcs.com

Email: [email protected]

Subscribe to TCS White Papers

TCS.com RSS: http://www.tcs.com/rss_feeds/Pages/feed.aspx?f=w

Feedburner: http://feeds2.feedburner.com/tcswhitepapers