ISACA Journal: Bridging the Gap Between Access and Security in Big Data




Feature

Ulf T. Mattsson is the chief technology officer (CTO) of Protegrity. He created the initial architecture of Protegrity's database security technology, for which the company owns several key patents. His extensive IT and security industry experience includes 20 years with IBM as a manager of software development and a consulting resource to IBM's research and development organization in the areas of IT architecture and IT security.

Do you have something to say about this article? Visit the Journal pages of the ISACA web site, find the article and choose the Comments tab to share your thoughts.

Organizations are failing to truly secure their sensitive data in big data environments. Data analysts require access to the data to efficiently perform meaningful analysis and gain a return on investment (ROI), and traditional data security has served to limit that access. The result is skyrocketing data breaches and diminishing privacy, accompanied by huge fines and disintegrating public trust. It is critical to ensure individuals' privacy and proper security while retaining data usability and enabling organizations to responsibly utilize sensitive information for gain.

(BIG) DATA ACCESS

The Hadoop platform for big data is used here to illustrate the common security issues and solutions. Hadoop is the dominant big data platform, used by a global community, yet it lacks needed data security. It provides a massively parallel processing platform1 designed for access to extremely large amounts of data and experimentation to find new insights by analyzing and comparing more information than was previously practical or possible.

Data flow in faster, in greater variety, volume and levels of veracity, and can be processed efficiently by simultaneously accessing data split across up to hundreds or thousands of data nodes in a cluster. Data are also kept for much longer periods of time than they would be in databases or relational database management systems (RDBMS), as the storage is more cost-effective and historical context is part of the draw.

A FALSE SENSE OF SECURITY

If the primary goal of Hadoop is data access, data security is traditionally viewed as its antithesis. There has always been a tug of war between the two based on risk, balancing operational performance and privacy, but the issue is magnified exponentially in Hadoop (figure 1).

Also available in Spanish: www.isaca.org/currentissue

For example, millions of personal records may be used for analysis and data insights, but the privacy of all of those people can be severely compromised from one data breach. The risk involved is far too high to afford weak security, but obstructing performance or hindering data insights will bring the platform to its knees.

Despite the perception of sensitive data as obstacles to data access, sensitive data in big data platforms still require security according to various regulations and laws,2 much the same as any other data platform. Therefore, data security in Hadoop is most often approached from the perspective of regulatory compliance.

One may assume that this helps to ensure maximum security of data and minimal risk, and, indeed, it does bind organizations to secure their data to some extent. However, as security is viewed as obstructive to data access and, therefore, operational performance, the regulations actually serve as a guide to the least-possible amount of security necessary to comply. Compliance does not guarantee security.

Obviously, organizations do want to protect their data and the privacy of their customers, but access, insights and performance are paramount. To achieve maximum data access and security, the gap between them must be bridged. So how can this balance best be achieved?

Figure 1—Traditional View of Data Security (access weighed against security). Source: Ulf T. Mattsson. Reprinted with permission.


DATA SECURITY TOOLS

Hadoop, as of this writing, has no native data security, although many vendors, both of Hadoop and of data security, provide add-on solutions.3 These solutions are typically based on access control and/or authentication, as they provide a baseline level of security with relatively high levels of access.

Access Control and Authentication

The most common implementation of authentication in Hadoop is Kerberos.4 In access control and authentication, sensitive data are displayed in the clear during job functions, in transit and at rest. In addition, neither access control nor authentication provides much protection from privileged users, such as developers or system administrators, who can easily bypass them to abuse the data. For these reasons, many regulations, such as the Payment Card Industry Data Security Standard (PCI DSS)5 and the US Health Insurance Portability and Accountability Act (HIPAA),6 require security beyond them to be compliant.

Coarse-grained Encryption

Starting from a base of access controls and/or authentication, adding coarse-grained volume or disk encryption is typically the first choice for actual data security in Hadoop. This method requires the least implementation effort while still offering regulatory compliance. Data are secure at rest (for archive or disposal), and encryption is typically transparent to authorized users and processes. The result is still relatively high levels of access, but data in transit, in use or in analysis are always in the clear, and privileged users can still access sensitive data. This method protects only from physical theft.

Fine-grained Encryption

Adding strong encryption for columns or fields provides further security, protecting data at rest, in transit and from privileged users, but it requires data to be revealed in the clear (decrypted) to perform job functions, including analysis, as encrypted data are unreadable to users and processes.

Format-preserving encryption preserves the ability of users and applications to read the protected data, but it is one of the slowest-performing encryption processes.
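As a rough illustration of the format-preserving idea (a toy sketch, not NIST FF1/FF3 and not any vendor's implementation), a small Feistel construction can encrypt an even-length digit string into another digit string of the same length, so applications expecting, say, a numeric account number still parse the protected value:

```python
import hmac
import hashlib

def _round_f(key, rnd, half, width):
    """Keyed round function: HMAC-SHA256 of the half, reduced mod 10**width."""
    digest = hmac.new(key, f"{rnd}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** width)

def fpe_encrypt(key, digits, rounds=8):
    """Toy format-preserving encryption for even-length digit strings."""
    assert digits.isdigit() and len(digits) % 2 == 0
    h = len(digits) // 2
    left, right = int(digits[:h]), int(digits[h:])
    for rnd in range(rounds):
        left, right = right, (left + _round_f(key, rnd, f"{right:0{h}d}", h)) % (10 ** h)
    return f"{left:0{h}d}{right:0{h}d}"

def fpe_decrypt(key, digits, rounds=8):
    """Reverse the Feistel rounds to recover the original digits."""
    h = len(digits) // 2
    left, right = int(digits[:h]), int(digits[h:])
    for rnd in reversed(range(rounds)):
        left, right = (right - _round_f(key, rnd, f"{left:0{h}d}", h)) % (10 ** h), left
    return f"{left:0{h}d}{right:0{h}d}"
```

The extra per-field round-function work is also a hint at why the article calls this one of the slowest-performing encryption approaches.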

Implementing either of these methods can significantly impact performance, even with the fastest encryption/decryption processes available, such that it negates many of the advantages of the Hadoop platform. As access is paramount, these methods tip the balance too far in the direction of security to be viable.

Read Big Data: Impacts and Benefits. Discuss and collaborate on big data in the Knowledge Center.

Some vendors offer a virtual file system above the Hadoop Distributed File System (HDFS) with role-based dynamic data encryption. While this provides some data security in use, it does nothing to protect data in analysis or from privileged users, who can access the operating system (OS) and layers under the virtual layer and get at the data in the clear.

Data Masking

Masking preserves the type and length of structured data, replacing it with an inert, worthless value. Because the masked data look and act like the original, they can be read by users and processes.

Static data masking (SDM) permanently replaces sensitive values with inert data. SDM is often used to perform job functions by preserving enough of the original data or de-identifying the data. It protects data at rest, in use, in transit, in analysis and from privileged users. However, should the cleartext data ever be needed again (e.g., to carry out marketing operations or in health care scenarios), they are irretrievable. Therefore, SDM is utilized in test/development environments in which data that look and act like real data are needed for testing, but sensitive data are not exposed to developers or systems administrators. It is not typically used for data access in a production Hadoop environment. Depending on the masking algorithms used and what data are replaced, SDM data may be subject to data inference and be re-identified when combined with other data sources.
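A minimal static-masking sketch (hypothetical code, not a vendor product): each digit is replaced with a random digit and each letter with a random letter, so type, length and format survive while the original value is permanently lost.

```python
import random
import string

def static_mask(value, seed=None):
    """Irreversibly replace each digit/letter with a random character of
    the same class; separators are kept so format and length survive."""
    rng = random.Random(seed)
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            masked.append(rng.choice(string.ascii_letters))
        else:
            masked.append(ch)  # keep separators such as '-' so format holds
    return "".join(masked)
```

Because no key or table links the masked value back to the original, the cleartext is irretrievable, which is exactly why SDM suits test/development data but not production data access.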

Dynamic data masking (DDM) performs masking on the fly. As sensitive data are requested, policy is referenced and masked data are retrieved for the data the user or process is unauthorized to see in the clear, based on the user's/process's role. Much like dynamic data encryption and access control, DDM provides no security to data at rest or in transit and little from privileged users. Dynamically masked values can also be problematic to work with in production analytic scenarios, depending on the algorithm/method used.7

ISACA Journal Volume 6, 2014
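A dynamic-masking sketch (the policy, roles and field names here are invented for illustration): the stored value stays intact, and masking is applied per request according to the caller's role.

```python
# Toy dynamic data masking: cleartext stays in the store; the view
# returned to each caller is masked according to a role policy.
STORE = {"card": "4111-1111-1111-1234", "name": "Alice"}

# Hypothetical policy: which roles may see which fields in the clear.
POLICY = {"fraud_analyst": {"card", "name"}, "marketing": {"name"}}

def mask_card(number):
    """Reveal only the last four digits of a card number."""
    return "####-####-####-" + number[-4:]

def fetch(field, role):
    """Return the cleartext only if policy authorizes it; mask on the fly
    otherwise. Note the data at rest in STORE remain unprotected."""
    if field in POLICY.get(role, set()):
        return STORE[field]
    if field == "card":
        return mask_card(STORE[field])
    return "***"
```

Here `fetch("card", "marketing")` returns only the last four digits, while the same call for `fraud_analyst` returns the full number; the cleartext in `STORE` is untouched, which is precisely why DDM offers no protection at rest or from anyone who reads the store directly.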

Tokenization

Tokenization also replaces cleartext with a random, inert value of the same data type and length, but the process can be reversible. This is accomplished through the use of token tables rather than a cryptographic algorithm. In vaultless tokenization, small blocks of the original data are replaced with paired random values from the token tables, overlapping between blocks. Once the entire value has been tokenized, the process is run through again to remove any pattern in the transformation.

However, because the exit value is still dependent upon the entering value, a one-to-one relationship with the original data can still be maintained and, therefore, the tokenized data can be used in analytics as a replacement for the cleartext. Additionally, parts of the cleartext data can be preserved or bled through to the token, which is especially useful in cases w
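The mechanics can be sketched as follows (a deliberately simplified, hypothetical scheme, not Protegrity's patented method): per-position token tables are random permutations of the 100 two-digit blocks, derived from a secret seed, and substitution is applied in two passes. The mapping is deterministic, so the one-to-one property the article describes holds (the same input always yields the same token, which keeps joins and analytics working), and it is reversible via the inverse tables.

```python
import random

def build_tables(seed, positions):
    """One random permutation of the 100 two-digit blocks per position."""
    rng = random.Random(seed)
    tables = []
    for _ in range(positions):
        perm = list(range(100))
        rng.shuffle(perm)
        tables.append(perm)
    return tables

def _substitute(digits, tables, inverse=False):
    """Replace each two-digit block via its position's token table."""
    out = []
    for i in range(0, len(digits), 2):
        block = int(digits[i:i + 2])
        table = tables[(i // 2) % len(tables)]
        value = table.index(block) if inverse else table[block]
        out.append(f"{value:02d}")
    return "".join(out)

def tokenize(digits, tables):
    """Two substitution passes; the second pass further scrambles the
    result so the final token carries no single-pass block pattern."""
    assert digits.isdigit() and len(digits) % 2 == 0
    return _substitute(_substitute(digits, tables), tables)

def detokenize(token, tables):
    """Invert both passes to recover the original cleartext digits."""
    return _substitute(_substitute(token, tables, inverse=True), tables, inverse=True)
```

Because each table is a permutation, the composition is a bijection: tokens are the same type and length as the cleartext, collisions are impossible, and anyone holding the tables (or the seed that generates them) can reverse the process.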