The past, present, and future of Big Data security


DESCRIPTION

ONE OF THE BIGGEST REMAINING CONCERNS REGARDING HADOOP, PERHAPS SECOND ONLY TO ROI, IS SECURITY. While Apache Hadoop and the craze around Big Data seem to have exploded out into the market, there are still a lot more questions than answers about this new environment. Hadoop is an environment with limited structure, high ingestion volume, massive scalability and redundancy, designed for access to a vast pool of multi-structured data. What's been missing is new security tools to match. Read more in this article by Ulf Mattsson, Protegrity CTO, originally published by Help Net Security's (IN)SECURE Magazine.

TRANSCRIPT


The past, present, and future of Big Data security
by Ulf Mattsson

While Apache Hadoop and the craze around Big Data seem to have exploded out into the market, there are still a lot more questions than answers about this new environment. One of the biggest concerns, second perhaps only to ROI, is security.

This is primarily due to the fact that many have yet to grasp the paradigm shift from traditional database platforms to Hadoop. Traditional security tools address a separation of duties, access control, encryption options, and more, but they are designed for a structured, limited environment, where data is carefully collected and cultivated.

Hadoop, on the other hand, is an environment with limited structure, high ingestion volume, massive scalability and redundancy, designed for access to a vast pool of multi-structured data. What's missing is new security tools to match.

Another challenge with securing Hadoop comes from the rapid expansion of the environment itself. Since its initial development, new tools and modules have been coming out not only from Apache, but from nearly every other third-party vendor as well. While security is tested and implemented for one module, three more have come out and are waiting for the same treatment. This makes it very difficult to create an overall security architecture for the entire Hadoop ecosystem as it continues to grow.

However, some security tools have been released over the last few years, including Kerberos, which provides strong authentication. But Kerberos does little to protect data flowing in and out of Hadoop, or to prevent privileged users such as DBAs or SAs from abusing the data. While authentication remains an important part of the data security structure in Hadoop, on its own it falls short of adequate data protection.

Another development was the addition of coarse-grained volume or disk encryption, usually provided by data security vendors. This solved one problem (protecting data at rest), but considering that one of the primary goals behind Hadoop is using the data, one might suggest that it provided little in the grand scheme of Big Data security. Sensitive data in use for analytics, traveling between nodes, sent to other systems, or even just being viewed is subject to full exposure.



Up until recently, Big Data technology vendors have often left it to customers to protect their environments, and they, too, feel the burden of limited options.

Today, vendors such as Teradata, Hortonworks, and Cloudera have partnered with data security vendors to help fill the security gap. What they're seeking is advanced functionality equal to the task of balancing security and regulatory compliance with data insights and "big answers".

The key to this balance lies not in protecting the growing ecosystem, or blanketing entire nodes with volume encryption, but in targeting the sensitive data itself at a very fine-grained level, with flexible, transparent security. Applying this security through a comprehensive policy-based system can provide further control and additional options to protect sensitive data, including multiple levels of access for various users or processes. Once secured, the data can travel throughout the Hadoop ecosystem, and even to outside systems, and remain protected.
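To make the policy idea concrete, here is a minimal, hypothetical Java sketch of how a policy-based check might decide which view of a protected field a given user or process receives. The role names, access levels, and field are invented for the example; this is not any vendor's actual policy engine.

    // Illustrative sketch only: a hypothetical policy check for one sensitive field.
    // Role names and access levels are invented for this example.
    import java.util.Map;

    public class AccessPolicy {

        // Hypothetical access levels for a sensitive data element.
        enum Level { NO_ACCESS, MASKED, CLEAR }

        // Example policy: which level each role receives for the "ssn" field.
        private static final Map<String, Level> SSN_POLICY = Map.of(
                "analyst", Level.MASKED,    // sees tokenized/masked values only
                "auditor", Level.CLEAR,     // may retrieve the original value
                "etl_job", Level.NO_ACCESS  // never sees the field at all
        );

        // Return the representation a given role is allowed to see.
        static String present(String role, String maskedValue, String clearValue) {
            switch (SSN_POLICY.getOrDefault(role, Level.NO_ACCESS)) {
                case CLEAR:  return clearValue;
                case MASKED: return maskedValue;
                default:     return "[REDACTED]";
            }
        }
    }

In a real deployment the policy would be defined and managed centrally and enforced wherever the data is accessed, rather than hard-coded as in this sketch.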

The options for fine-grained data security in Hadoop now include encryption (AES or format-preserving), masking, and Vaultless Tokenization.

Typically, encryption is the least desirable option, as standard strong encryption produces values that are unreadable to the tools and modules in Hadoop, format-preserving encryption is typically much slower than masking or Vaultless Tokenization, and both require complicated cryptographic key management across tens or even hundreds of nodes.
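As a rough illustration of the readability problem, the Java sketch below (standard JDK crypto APIs; the sample card number and the throwaway key handling are chosen only for the example) encrypts a 16-digit value with AES-GCM and prints the Base64-encoded ciphertext, which no longer resembles the numeric field a Hive or Pig job would expect.

    // Illustrative sketch only: strong AES encryption destroys the original format.
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;
    import java.util.Base64;

    public class AesExample {
        public static void main(String[] args) throws Exception {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            SecretKey key = kg.generateKey();       // throwaway key, for the demo only

            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

            byte[] ct = c.doFinal("4111111111111111".getBytes(StandardCharsets.UTF_8));
            // Prints opaque Base64 text, not a 16-digit number: schemas and jobs
            // expecting a numeric column can no longer work with the value.
            System.out.println(Base64.getEncoder().encodeToString(ct));
        }
    }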

Masking was developed for non-production systems and testing, and has found a home in Hadoop's early, experimental phase. Individual data elements are either replaced with random values or generalized so that they are no longer identifiable. It is fast, produces values that are readable to systems and processes, and requires no key management. However, because masking was designed for non-production use, it is usually not reversible, and is therefore not ideal for any situation where the original data may be needed some time after the data is masked.
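A minimal sketch of the masking idea, assuming a card-number-like field where only the last four digits are kept (an invented rule for this example): the remaining digits are replaced with random ones, so the operation is fast, needs no keys, and cannot be reversed.

    // Illustrative sketch only: one-way masking of a card-number-like value.
    import java.security.SecureRandom;

    public class MaskExample {
        private static final SecureRandom RNG = new SecureRandom();

        static String mask(String pan) {
            StringBuilder out = new StringBuilder(pan.length());
            for (int i = 0; i < pan.length(); i++) {
                boolean keep = i >= pan.length() - 4;      // keep only the last four digits
                char ch = pan.charAt(i);
                out.append(keep || !Character.isDigit(ch)
                        ? ch
                        : Character.forDigit(RNG.nextInt(10), 10));
            }
            // e.g. "4111111111111111" might become "5820394718721111";
            // the original digits are gone and cannot be recovered.
            return out.toString();
        }
    }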

Vaultless Tokenization, similar to masking, also replaces data elements with random values of the same data type and length. It is also much faster than format-preserving encryption, virtually eliminates key management, and is transparent to processes. The added benefit comes from the ability to perform both one-way protection and reversible security.

This provides ideal protection for test/dev environments and can also allow retrieval of the original data when required by authorized users or processes.
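The toy Java sketch below illustrates the general shape of a vaultless, format-preserving, reversible transform: digit offsets are derived from a keyed HMAC, so no token vault is required and the original can be recovered with the same key. It is emphatically not Protegrity's Vaultless Tokenization algorithm, nor production-grade cryptography; it only shows the idea.

    // Illustrative sketch only: a toy, reversible, format-preserving transform.
    // NOT a real tokenization product and NOT production-grade cryptography.
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;

    public class ToyTokenizer {
        private final byte[] key;

        ToyTokenizer(byte[] key) { this.key = key; }

        // Pseudorandom digit offset (0..9) derived from the key and the digit position.
        private int offset(int position) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] d = mac.doFinal(("pos:" + position).getBytes(StandardCharsets.UTF_8));
            return Math.floorMod(d[0], 10);
        }

        String tokenize(String digits) throws Exception {   // one-way or reversible protection
            return shift(digits, +1);
        }

        String detokenize(String token) throws Exception {  // retrieval by authorized callers
            return shift(token, -1);
        }

        // Shift each digit by the keyed offset; same length, still all digits.
        private String shift(String s, int sign) throws Exception {
            StringBuilder out = new StringBuilder(s.length());
            for (int i = 0; i < s.length(); i++) {
                int d = s.charAt(i) - '0';                   // assumes an all-digit input
                int t = Math.floorMod(d + sign * offset(i), 10);
                out.append((char) ('0' + t));
            }
            return out.toString();
        }
    }

Because the output has the same type and length as the input, downstream jobs can keep working with the tokenized values, while detokenization remains available to authorized users who hold the key.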

Due to the read-only nature of the Hadoop environment (files cannot be updated; you can only create a file, read it, and delete it), application of these fine-grained protection methods requires a unique approach.

This is typically performed in one of two ways. The first is a secured gateway, situated in front of Hadoop, which parses incoming data to identify sensitive data elements and applies the selected protection method before passing the data on to Hadoop. The second is a secured landing zone, which may be a node or partition within Hadoop that is protected with coarse-grained encryption. Files arrive in the landing zone and are then parsed by one of the processing applications in Hadoop (MapReduce, Hive, Pig, etc.), identifying and protecting sensitive data elements before ingesting the data into the main Hadoop cluster. This method utilizes the massively parallel processing of Hadoop to efficiently protect data.
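As a sketch of the landing-zone pattern, the map-only Hadoop job below parses comma-separated records and tokenizes an assumed sensitive third field before the data moves on to the main cluster. The field position, the delimiter, the hard-coded demo key, and the ToyTokenizer class from the earlier sketch are all assumptions made for this example.

    // Illustrative sketch only: a map-only protection step over landing-zone files.
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ProtectFieldMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        // A real deployment would fetch the key from a key-management service,
        // not embed it in the job.
        private final ToyTokenizer tokenizer =
                new ToyTokenizer("demo-key-do-not-use".getBytes());

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws java.io.IOException, InterruptedException {
            try {
                String[] fields = line.toString().split(",", -1);
                if (fields.length > 2) {
                    fields[2] = tokenizer.tokenize(fields[2]);   // protect the sensitive column
                }
                ctx.write(NullWritable.get(), new Text(String.join(",", fields)));
            } catch (Exception e) {
                throw new java.io.IOException("protection failed", e);
            }
        }
    }

Running many such mappers in parallel is what lets the landing-zone approach exploit Hadoop's own processing power to protect data at scale.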


As the environment becomes more established, usability and enterprise integration will improve, new data exchange protocols will be used, and a set of security tools will be standardized and made native to platforms.

In the next five years, the creation of data by more and more people and devices will continue to drive companies towards Hadoop and other Big Data platforms. The requirements for handling extreme levels of volume, velocity, variety, and veracity will only increase, and Big Data will assume more and more critical business functions.

Laws and regulations relating to privacy and security will also continue to increase, and security will become an even more vital component in Big Data. Companies will be unable to harness the massive amounts of machine-generated data from the Internet of Things without implementing comprehensive data security, first in industrial environments (power grids, etc.) and later in consumer use (healthcare, etc.). Security will be viewed not only in terms of loss prevention, but of value creation, enabling compliant data collection, use, analysis, and monetization.

Big Data security will evolve, becoming increasingly intelligent and data-driven in its own right. We will see more tools that can translate security event statistics into actionable information. Data security policies will be intricately designed, and likely multi-layered, utilizing a combination of coarse- and fine-grained security methods, access control, authentication, and monitoring.

In the exciting near future, the data is only getting bigger, but we must not allow it to outgrow security.

Ulf T. Mattsson is the CTO of Protegrity. Ulf created the initial architecture of Protegrity's database security technology, for which the company owns several key patents. His extensive IT and security industry experience includes 20 years with IBM as a manager of software development and a consulting resource to IBM's Research and Development organization, in the areas of IT Architecture and IT Security. Ulf holds a degree in electrical engineering from Polhem University, a degree in finance from the University of Stockholm, and a master's degree in physics from Chalmers University of Technology.