intel boubker el mouttahid
McAfee Confidential
Securing Big Data
Boubker El Mouttahid | Enterprise Architect
The risks associated with Big Data technologies
New Technology New Risks
Any technology that is not well understood will introduce new
vulnerabilities.
Typically includes open source code
Lack of security best practices
Security
User authentication and access to data from multiple locations may not be sufficiently controlled
There is significant opportunity for malicious data input and inadequate data validation
The attack surface of the nodes in a cluster may not have been reviewed, nor the servers adequately hardened.
Data Privacy & Compliance
Regulatory requirements may not be fulfilled, with access to logs and audit trails problematic
Data Security: Hadoop stores data as is, without encryption, to improve efficiency
Granular access control: What types of personal information can be deemed sharable and
with whom
Emerging Hadoop Security Maturity Model
Security Requirements of the Enterprise
• Network Security
• Authentication
• Authorization
• Data Protection
• Visibility & Monitoring
• Secure Configurations for Hardware and Software
Comprehensive, Security & Compliance-Ready
Securing the underlying operating system
Authentication, Authorization, Audit, and Compliance
Perimeter: Guarding access to the cluster itself
InfoSec Concept: Authentication
Access: Defining what users and applications can do with data
InfoSec Concept: Authorization
Visibility: Reporting on where data came from and how it’s being used
InfoSec Concept: Audit
Data: Protecting data in the cluster from unauthorized visibility
InfoSec Concept: Compliance
Network Security | Secure Configuration | Real Time Monitoring
Platform Security Requirements
Network Security: Defense, resilience, deep packet inspection
Secure Configuration: Securing the underlying operating system and the applications installed on the system
Real Time Monitoring: Log correlation and analysis to rapidly identify anomalies
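As a rough illustration of what log correlation for anomaly detection can look like, the sketch below flags users with repeated failed logins across several cluster nodes inside a short time window. The event format, the `correlate_failed_logins` function, the window, and the threshold are all hypothetical choices for this example; they are not taken from any specific monitoring product.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified log records: (timestamp, host, user, event).
EVENTS = [
    (datetime(2014, 1, 1, 9, 0), "node1", "sam", "LOGIN_FAILED"),
    (datetime(2014, 1, 1, 9, 1), "node2", "sam", "LOGIN_FAILED"),
    (datetime(2014, 1, 1, 9, 2), "node3", "sam", "LOGIN_FAILED"),
    (datetime(2014, 1, 1, 9, 0), "node1", "alice", "LOGIN_FAILED"),
    (datetime(2014, 1, 1, 10, 0), "node1", "alice", "LOGIN_FAILED"),
    (datetime(2014, 1, 1, 9, 5), "node2", "bob", "LOGIN_OK"),
]


def correlate_failed_logins(events, window=timedelta(minutes=5), threshold=3):
    """Return users with at least `threshold` failed logins, on any hosts, within `window`."""
    failures = {}
    # Merge events from all hosts into one timeline per user.
    for ts, _host, user, event in sorted(events):
        if event == "LOGIN_FAILED":
            failures.setdefault(user, []).append(ts)

    flagged = set()
    for user, stamps in failures.items():
        start = 0
        # Slide a window over the sorted timestamps of this user.
        for end in range(len(stamps)):
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(user)
                break
    return flagged


suspicious = correlate_failed_logins(EVENTS)
```

The point of correlating across hosts is that three failures spread over three nodes would look harmless in any single node's log, but stand out once the logs are merged.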
Perimeter Security Requirements
Preserve user choice of the right Hadoop service (e.g. Impala, Spark)
Conform to centrally managed authentication policies
Implement with existing standard systems: Active Directory and Kerberos
Perimeter: Guarding access to the cluster itself
InfoSec Concept: Authentication
Cloudera and Intel Project Rhino
• Contributed by Intel in 2013
• Blueprint for enterprise-grade security
Rhino Goal: Unified Authorization
Engineers at Intel and Cloudera (together with Oracle and IBM) are now jointly contributing to Apache Sentry
Rhino Goal: Encryption and Key Management Framework
Cloudera and Intel engineers are now contributing HDFS encryption capabilities that can plug into enterprise key managers
Right Solution
• Provides maximum flexibility
• Delivers centrally managed authentication
• Automates configuration while leveraging existing infrastructure
Watch for other solutions that…
• Require setup of additional Kerberos server and cross-realm trust
• Lead to manual and error-prone Kerberos config on individual nodes
• Offer limited support for username/password authentication against AD
Business Impact
• DELAY: Must seek InfoSec sign-off for cross-realm trust establishment
• SET-UP COST: Must procure & configure Kerberos server & nodes
• ONGOING MAINTENANCE: Additional task for each new Kerberos user
Authentication and Identity
“Integrate With Existing Enterprise Authentication Mechanisms for Hadoop Identity and Access Management.”
Gartner – Best Practices for Securing Hadoop
Access Security Requirements
Provide users access to data needed to do their job
Centrally manage access policies
Leverage a role-based access control model built on AD
Access: Defining what users and applications can do with data
InfoSec Concept: Authorization
Manage data access by role, instead of by individual user
• Fraud Analyst Role has read access on ALL transaction data
• Branch Teller Role has read / write access on very limited set of data
• Relationships between users and roles are established via groups
An RBAC policy is then uniformly enforced for all Hadoop services
• Provides unified authorization controls
• As opposed to tools for managing numerous, service specific policies
RBAC and Centralized Authorization
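The role-based model on this slide can be sketched in a few lines. This is a minimal illustration, assuming made-up user, group, role, and resource names; it is not Sentry's policy format or API. Users map to groups, groups map to roles, and only roles carry permissions.

```python
# Hypothetical data: users belong to groups, groups are bound to roles,
# and roles (never individual users) hold permissions.
USER_GROUPS = {
    "sam.smith": {"fraud_analysts"},
    "tina.teller": {"branch_tellers"},
}
GROUP_ROLES = {
    "fraud_analysts": {"fraud_analyst_role"},
    "branch_tellers": {"branch_teller_role"},
}
ROLE_PERMISSIONS = {
    "fraud_analyst_role": {("transactions", "read")},
    "branch_teller_role": {("branch_accounts", "read"), ("branch_accounts", "write")},
}


def is_allowed(user, resource, action):
    """Check a request by walking user -> groups -> roles -> permissions."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            if (resource, action) in ROLE_PERMISSIONS.get(role, set()):
                return True
    return False


analyst_can_read = is_allowed("sam.smith", "transactions", "read")
teller_can_read_all = is_allowed("tina.teller", "transactions", "read")
```

Because permissions hang off roles, onboarding another fraud analyst only requires adding that user to the group; no permission entries change.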
Sentry provides unified authorization via fine-grained RBAC for Impala, Hive, Search, MapReduce, Pig, HDFS…
Unified Authorization with Apache Sentry
User: Sam Smith
Group: Fraud Analysts
Sentry Role: Fraud Analyst Role
Sentry Perm.: Read Access to ALL Transaction Data
• Sentry can be configured to use AD to determine a user’s group assignments
• Group assignment changes in AD are automatically picked up, resulting in updated Sentry role assignments
Sentry and Active Directory Groups
User: Sam Smith
AD Group: Fraud Analysts
Sentry Role: Fraud Analyst Role
Sentry Perm.: Read Access to ALL Transaction Data
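One way to see why group changes are picked up without extra work is to resolve roles from the directory at check time rather than from a mirrored copy. The sketch below uses a hypothetical in-memory `FakeDirectory` as a stand-in for AD; it is not a real AD or Sentry client, and the `roles_for` helper is an assumption made for illustration.

```python
class FakeDirectory:
    """In-memory stand-in for a directory service such as AD (illustration only)."""

    def __init__(self):
        self._groups = {}

    def set_groups(self, user, groups):
        self._groups[user] = set(groups)

    def groups_of(self, user):
        return set(self._groups.get(user, ()))


# Hypothetical binding of directory groups to roles.
ROLE_BY_GROUP = {"Fraud Analysts": "Fraud Analyst Role"}


def roles_for(directory, user):
    """Resolve roles from the user's current directory groups on every call."""
    return {
        ROLE_BY_GROUP[group]
        for group in directory.groups_of(user)
        if group in ROLE_BY_GROUP
    }


directory = FakeDirectory()
directory.set_groups("Sam Smith", ["Fraud Analysts"])
roles_before = roles_for(directory, "Sam Smith")

# Sam leaves the group in the directory; no role data is edited by hand.
directory.set_groups("Sam Smith", [])
roles_after = roles_for(directory, "Sam Smith")
```

Since the directory is the only place membership is stored, there is nothing to mirror and nothing to fall out of sync.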
Sentry enforces each rule across Hadoop components
Permissions specified by administrators (top-level and delegated)
Permissions rules
Rule 1: Allow fraud analysts read access to the transaction table
Common enforcement code for consistency, embedded in:
• Hive Server 2
• Impala
• MapReduce, Pig, HDFS*
• Apps: Datameer, Platfora, etc*
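The idea of common enforcement code can be sketched as a single `authorize` function that every access path calls, so that one rule yields the same decision everywhere. The rule structure and the `Service` class below are hypothetical and do not reflect how Sentry is actually wired into Hive or Impala.

```python
# Hypothetical rule store; Rule 1 from the slide expressed as data.
RULES = [
    {"group": "fraud_analysts", "action": "read", "table": "transaction"},
]


def authorize(groups, action, table):
    """Single shared decision point: True if any rule matches the request."""
    return any(
        rule["group"] in groups
        and rule["action"] == action
        and rule["table"] == table
        for rule in RULES
    )


class Service:
    """Stand-in for a query engine that delegates every check to authorize()."""

    def __init__(self, name):
        self.name = name

    def run(self, groups, action, table):
        if not authorize(groups, action, table):
            raise PermissionError(f"{self.name}: {action} on {table} denied")
        return f"{self.name}: {action} on {table} allowed"


hive = Service("HiveServer2")
impala = Service("Impala")

hive_result = hive.run({"fraud_analysts"}, "read", "transaction")
impala_result = impala.run({"fraud_analysts"}, "read", "transaction")
```

Neither service holds its own copy of the policy, so there is no per-service rule set to review or keep consistent.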
Sentry – The Open Standard
Broad Contributions
• Cloudera
• IBM
• Intel
• Oracle
Multi-Vendor Support
• Cloudera
• IBM
• MapR
• Oracle
Wide Industry Adoption
• Banking
• Healthcare
• Insurance
• Pharma
• Telco
Third-Party Integrations
• Oracle Endeca
• Platfora
Right solution
• User access determined via group assignments
• Unified authorization via granular RBAC policy
• Leverages existing AD infrastructure
Watch for other solutions that…
• Require redundant security policies; one for every access path
• Do not offer an RBAC policy model
• Cannot meet InfoSec requirement for centrally managed authorization
• Depend on manual mirroring of AD group assignments
Business Impact
• DELAY: InfoSec approval for data access assignments outside of AD
• ONGOING COSTS: Authorization policies have to be reviewed and tested for each access path
• ONGOING MAINTENANCE: Mirroring of directory group assignments
Authorization and Access
“A must-have for enterprise access scenarios, the most prominent solution here is Apache Sentry”
Gigaom – Hadoop Security: Solutions Emerge
Visibility Security Requirements
Understand where report data came from and discover more data like it
Comply with policies for audit, data classification, and lineage
Centralize the audit repository; perform discovery; automate lineage
Visibility: Reporting on where data came from and how it’s being used
InfoSec Concept: Audit
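A centralized audit repository makes point-in-time questions ("what did this user access on that day, through any service?") a single query. The sketch below is a toy in-memory version with hypothetical `AuditEvent` and `AuditRepository` types and invented sample data; it does not represent Cloudera Navigator's data model or API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    user: str
    service: str
    resource: str
    action: str


class AuditRepository:
    """Single store that all services write their access events to."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def access_by(self, user, start, end):
        """Return the user's events with start <= timestamp <= end, oldest first."""
        matches = (
            e for e in self._events
            if e.user == user and start <= e.timestamp <= end
        )
        return sorted(matches, key=lambda e: e.timestamp)


repo = AuditRepository()
repo.record(AuditEvent(datetime(2014, 3, 1, 9, 30), "sam.smith", "Hive", "transactions", "read"))
repo.record(AuditEvent(datetime(2014, 3, 1, 9, 0), "sam.smith", "Impala", "transactions", "read"))
repo.record(AuditEvent(datetime(2014, 3, 2, 9, 0), "sam.smith", "HDFS", "/data/raw", "read"))
repo.record(AuditEvent(datetime(2014, 3, 1, 9, 15), "tina.teller", "Hive", "branch_accounts", "write"))

march_first = repo.access_by(
    "sam.smith", datetime(2014, 3, 1), datetime(2014, 3, 1, 23, 59, 59)
)
```

With per-service logs, the same question would mean collecting and merging several log formats before any filtering could start.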
Right solution
• Users can easily discover data and examine lineage
• Complies with requirements for audit, classification, lineage
• No additional IT burden: audit logs are centralized, lineage is automatic, users can self help
Watch for other solutions that…
• Offer no unified audit trail for point in time user access
• Limited lineage, available only at file level, no visualization
• Third-party tools required for data discovery
Business Impact
• DELAY: InfoSec testing, validation, and approval of third-party tools
• COMPLIANCE RISK: Inability to respond quickly if point-in-time user access history needed. Inability to meet core data governance requirements without column level lineage
Audit and Governance
“Cloudera Navigator is… one pane of glass for all Hadoop metadata and events including security”
Gartner – Protecting Big Data In Hadoop
©2014 Cloudera, Inc. All rights reserved.
Data Security Requirements
Perform analytics on regulated data
Encrypt data, conform to key management policies, protect from root
Integrate with existing HSM as part of key management infrastructure
Data: Protecting data in the cluster from unauthorized visibility
InfoSec Concept: Compliance
Right solution
• Brings the power of pervasive analytics to regulated data
• Delivers compliant protection with encryption, key management
• Provides separation of duties between system and data admins
• Conforms to existing policies regarding HSM based key management
Watch for other solutions that…
• Run risk of data breach or theft by storing data in clear text
• Require 3rd party encryption and key management solutions
• Offer no protection against privileged user access
• May not integrate with corporate HSMs
Business Impact
• EXPOSURE: theft or breach of data – fines, damage to brand
• DELAY: InfoSec testing, validation and approval of 3rd party solutions (COST)
• DELAY: InfoSec approval for key management independent of HSM
• COMPLIANCE RISK: Potential PCI issues
Encryption and Key Management
“Cloudera… recognized that data in Hadoop must be protected both at rest and in transit… and have made data encryption part of their products.”
Gartner – Protecting Big Data In Hadoop
Recommendation
Enterprise data hub with built-in security
Comprehensive and integrated security
Compliance Ready