intel boubker el mouttahid

McAfee Confidential Securing Big Data Boubker El Mouttahid | Enterprise Architect

Upload: bigdataexpo

Post on 08-Jan-2017

TRANSCRIPT

Page 1: Intel boubker el mouttahid

McAfee Confidential

Securing Big Data

Boubker El Mouttahid | Enterprise Architect

Page 2: Intel boubker el mouttahid

The risks associated with Big Data technologies

New Technology, New Risks

• Any technology that is not well understood will introduce new vulnerabilities.
• Big Data implementations typically include open source code.
• Lack of security best practices.

Security

• User authentication and access to data from multiple locations may not be sufficiently controlled.
• There is significant opportunity for malicious data input and inadequate data validation.
• The attack surface of the nodes in a cluster may not have been reviewed, and servers may not have been adequately hardened.

Data Privacy & Compliance

• Regulatory requirements may not be fulfilled, with access to logs and audit trails problematic.
• Data security: Hadoop stores data as is, without encryption, to improve efficiency.
• Granular access control: what types of personal information can be deemed sharable, and with whom.

Page 3: Intel boubker el mouttahid

Emerging Hadoop Security Maturity Model

Page 4: Intel boubker el mouttahid

Security Requirements of the Enterprise

• Network Security
• Authentication
• Authorization
• Data Protection
• Visibility & Monitoring
• Secure Configurations for Hardware and Software

Page 5: Intel boubker el mouttahid

Comprehensive, Security & Compliance-Ready

Securing the underlying operating system

Authentication, Authorization, Audit, and Compliance

Perimeter: Guarding access to the cluster itself (InfoSec Concept: Authentication)

Access: Defining what users and applications can do with data (InfoSec Concept: Authorization)

Visibility: Reporting on where data came from and how it’s being used (InfoSec Concept: Audit)

Data: Protecting data in the cluster from unauthorized visibility (InfoSec Concept: Compliance)

Network Security | Secure Configuration | Real Time Monitoring

Page 6: Intel boubker el mouttahid

Platform Security Requirements

Network Security: Defense, resilience; deep packet inspections

Secure Configuration: Securing the underlying operating system and the applications installed on the system

Real Time Monitoring: Log correlation and analysis to rapidly identify anomalies

Page 7: Intel boubker el mouttahid

Perimeter Security Requirements

Preserve user choice of the right Hadoop service (e.g. Impala, Spark)

Conform to centrally managed authentication policies

Implement with existing standard systems: Active Directory and Kerberos

Perimeter: Guarding access to the cluster itself (InfoSec Concept: Authentication)

Page 8: Intel boubker el mouttahid

Cloudera and Intel Project Rhino

• Contributed by Intel in 2013
• Blueprint for enterprise-grade security

Rhino Goal: Unified Authorization

Engineers at Intel and Cloudera (together with Oracle and IBM) are now jointly contributing to Apache Sentry

Rhino Goal: Encryption and Key Management Framework

Cloudera and Intel engineers are now contributing HDFS encryption capabilities that can plug into enterprise key managers

Page 9: Intel boubker el mouttahid

Authentication and Identity

Right Solution

• Provides maximum flexibility
• Delivers centrally managed authentication
• Automates configuration while leveraging existing infrastructure

Watch for other solutions that…

• Require setup of additional Kerberos server and cross-realm trust
• Lead to manual and error-prone Kerberos config on individual nodes
• Offer limited support for username/password authentication against AD

Business Impact

• DELAY: Must seek InfoSec sign-off for cross-realm trust establishment
• SET-UP COST: Must procure & configure Kerberos server & nodes
• ONGOING MAINTENANCE: Additional task for each new Kerberos user

“Integrate With Existing Enterprise Authentication Mechanisms for Hadoop Identity and Access Management.”

Gartner – Best Practices for Securing Hadoop

Page 10: Intel boubker el mouttahid

Access Security Requirements

Provide users access to data needed to do their job

Centrally manage access policies

Leverage a role-based access control model built on AD

Access: Defining what users and applications can do with data (InfoSec Concept: Authorization)

Page 11: Intel boubker el mouttahid

RBAC and Centralized Authorization

Manage data access by role, instead of by individual user

• Fraud Analyst Role has read access on ALL transaction data
• Branch Teller Role has read / write access on very limited set of data
• Relationships between users and roles are established via groups

An RBAC policy is then uniformly enforced for all Hadoop services

• Provides unified authorization controls
• As opposed to tools for managing numerous, service-specific policies

Page 12: Intel boubker el mouttahid

Unified Authorization with Apache Sentry

Sentry provides unified authorization via fine-grained RBAC for Impala, Hive, Search, MapReduce, Pig, HDFS…

[Diagram: Sam Smith → Group: Fraud Analysts → Sentry Role: Fraud Analyst Role → Sentry Perm.: Read Access to ALL Transaction Data]
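The chain on this slide (user, group, role, permission) can be sketched in a few lines of Python. This is an illustrative model only and not Sentry's API: the dictionaries, the `roles_for` and `is_allowed` helpers, the user IDs, and the resource names are all hypothetical. The teller entries are assumed from the Branch Teller example on the RBAC slide.

```python
# Illustrative RBAC model; every name here is hypothetical, not Sentry's API.

# Users belong to groups (in practice resolved from a directory such as AD).
USER_GROUPS = {
    "sam.smith": {"fraud_analysts"},
    "tina.teller": {"branch_tellers"},
}

# Groups are mapped to roles.
GROUP_ROLES = {
    "fraud_analysts": {"fraud_analyst_role"},
    "branch_tellers": {"branch_teller_role"},
}

# Roles carry (action, resource) privileges.
ROLE_PRIVILEGES = {
    "fraud_analyst_role": {("read", "transactions")},
    "branch_teller_role": {
        ("read", "branch_accounts"),
        ("write", "branch_accounts"),
    },
}


def roles_for(user):
    """Resolve a user's roles through group membership."""
    roles = set()
    for group in USER_GROUPS.get(user, set()):
        roles |= GROUP_ROLES.get(group, set())
    return roles


def is_allowed(user, action, resource):
    """Return True if any of the user's roles grants the privilege."""
    return any(
        (action, resource) in ROLE_PRIVILEGES.get(role, set())
        for role in roles_for(user)
    )
```

Because privileges hang off roles rather than users, moving a person between groups changes what they can do without touching any privilege definition.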

Page 13: Intel boubker el mouttahid

Sentry and Active Directory Groups

• Sentry can be configured to use AD to determine a user’s group assignments
• Group assignment changes in AD are automatically picked up, resulting in updated Sentry role assignments

[Diagram: Sam Smith → AD Group: Fraud Analysts → Sentry Role: Fraud Analyst Role → Sentry Perm.: Read Access to ALL Transaction Data]

Page 14: Intel boubker el mouttahid

Sentry enforces each rule across Hadoop components

Permissions rules: Permissions specified by administrators (top-level and delegated)

Rule 1: Allow fraud analysts read access to the transaction table

Common enforcement code for consistency. The diagram shows enforcement code in each of:

• Hive Server 2
• Impala
• MapReduce, Pig, HDFS*
• Apps: Datameer, Platfora, etc*
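The idea of common enforcement code can be sketched as one shared check that every access path calls. This is a minimal illustration under assumptions, not how Sentry is implemented: `RULES`, `authorize`, and `QueryService` are hypothetical names, and the two service instances are stand-ins for the components listed on the slide.

```python
# Illustrative only: one shared check reused by every access path.
# Function and class names are hypothetical, not Sentry's API.

# Rule 1 from the slide: fraud analysts may read the transaction table.
RULES = {("fraud_analysts", "read", "transactions")}


def authorize(groups, action, table):
    """Shared enforcement code: the single place where rules are evaluated."""
    return any((group, action, table) in RULES for group in groups)


class QueryService:
    """Stand-in for a component such as a SQL engine or batch framework."""

    def __init__(self, name):
        self.name = name

    def read(self, groups, table):
        # Every service delegates to the same authorize() function,
        # so a rule change takes effect on all access paths at once.
        if not authorize(groups, "read", table):
            raise PermissionError(f"{self.name}: read on {table} denied")
        return f"{self.name}: rows from {table}"


services = [QueryService("hive"), QueryService("impala")]
```

The alternative the deck warns about is a separate policy per service, where each copy has to be reviewed and kept in sync by hand.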

Page 15: Intel boubker el mouttahid

Sentry – The Open Standard

Broad Contributions

• Cloudera
• IBM
• Intel
• Oracle

Multi-Vendor Support

• Cloudera
• IBM
• MapR
• Oracle

Wide Industry Adoption

• Banking
• Healthcare
• Insurance
• Pharma
• Telco

Third-Party Integrations

• Oracle Endeca
• Platfora

Page 16: Intel boubker el mouttahid

Authorization and Access

Right solution

• User access determined via group assignments
• Unified authorization via granular RBAC policy
• Leverages existing AD infrastructure

Watch for other solutions that…

• Require redundant security policies; one for every access path
• Do not offer an RBAC policy model
• Cannot meet InfoSec requirement for centrally managed authorization
• Depend on manual mirroring of AD group assignments

Business Impact

• DELAY: InfoSec approval for data access assignments outside of AD
• ONGOING COSTS: Authorization policies have to be reviewed and tested for each access path
• ONGOING MAINTENANCE: Mirroring of directory group assignments

“A must-have for enterprise access scenarios, the most prominent solution here is Apache Sentry”

Gigaom– Hadoop Security: Solutions Emerge

Page 17: Intel boubker el mouttahid

Visibility Security Requirements

Understand where report data came from and discover more data like it

Comply with policies for audit, data classification, and lineage

Centralize the audit repository; perform discovery; automate lineage

Visibility: Reporting on where data came from and how it’s being used (InfoSec Concept: Audit)

Page 18: Intel boubker el mouttahid

Audit and Governance

Right solution

• Users can easily discover data and examine lineage
• Complies with requirements for audit, classification, lineage
• No additional IT burden: audit logs are centralized, lineage is automatic, users can self help

Watch for other solutions that…

• Offer no unified audit trail for point in time user access
• Limited lineage, available only at file level, no visualization
• Third-party tools required for data discovery

Business Impact

• DELAY: InfoSec testing, validation, and approval of third-party tools
• COMPLIANCE RISK: Inability to respond quickly if point-in-time user access history needed. Inability to meet core data governance requirements without column level lineage

“Cloudera Navigator is… one pane of glass for all Hadoop metadata and events including security”

Gartner – Protecting Big Data In Hadoop
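The point-in-time user access history mentioned on this slide can be illustrated with a small sketch of a centralized audit log. This is not Cloudera Navigator's schema or API: the `AuditEvent` fields, the `access_history` helper, and the sample events are all assumptions made for illustration.

```python
# Illustrative only: a centralized audit log queried for point-in-time access.
# The event fields and helper names are assumptions, not a product schema.
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    user: str
    service: str
    action: str
    resource: str
    allowed: bool


def access_history(events, user, start, end):
    """Return the user's events with start <= timestamp < end, oldest first."""
    matches = [
        e for e in events
        if e.user == user and start <= e.timestamp < end
    ]
    return sorted(matches, key=lambda e: e.timestamp)


# Events from different services land in one repository (here, a list).
# The entries below are made-up sample data.
AUDIT_LOG = [
    AuditEvent(datetime(2014, 3, 1, 9, 0), "sam.smith", "impala",
               "read", "transactions", True),
    AuditEvent(datetime(2014, 3, 1, 9, 5), "sam.smith", "hive",
               "write", "transactions", False),
    AuditEvent(datetime(2014, 3, 2, 8, 0), "sam.smith", "hdfs",
               "read", "/data/tx", True),
]
```

With all services writing to one log, a question such as "what did this user touch on 1 March?" becomes a single filtered query rather than a search across per-service log files.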

Page 19: Intel boubker el mouttahid

©2014 Cloudera, Inc. All rights reserved.

Data Security Requirements

Perform analytics on regulated data

Encrypt data, conform to key management policies, protect from root

Integrate with existing HSM as part of key management infrastructure

Data: Protecting data in the cluster from unauthorized visibility (InfoSec Concept: Compliance)

Page 20: Intel boubker el mouttahid

Encryption and Key Management

Right solution

• Brings the power of pervasive analytics to regulated data
• Delivers compliant protection with encryption, key management
• Provides separation of duties between system and data admins
• Conforms to existing policies regarding HSM based key management

Watch for other solutions that…

• Run risk of data breach or theft by storing data in clear text
• Require 3rd party encryption and key management solutions
• Offer no protection against privileged user access
• May not integrate with corporate HSMs

Business Impact

• EXPOSURE: theft or breach of data – fines, damage to brand
• DELAY: InfoSec testing, validation and approval of 3rd party solutions (COST)
• DELAY: InfoSec approval for key management independent of HSM
• COMPLIANCE RISK: Potential PCI issues

“Cloudera… recognized that data in Hadoop must be protected both at rest and in transit… and have made data encryption part of their products.”

Gartner – Protecting Big Data In Hadoop

Page 21: Intel boubker el mouttahid

Recommendation

Enterprise data hub with built-in security

Comprehensive and integrated security

Compliance Ready

Page 22: Intel boubker el mouttahid

Intel & McAfee Confidential