© Hortonworks Inc. 2015
Protecting Enterprise Data in Apache Hadoop
June 2015
Owen O’Malley
[email protected]
@owen_omalley
Security
• What are the important threats?
• What are the attack vectors?
• Minimize attack surfaces.
Security Architecture
Attack Vectors
• Physical Access to Machines
• Remote Access
• Gateway Machines
• Slave Machines
• Master Machines
Attack Vectors
• Root
• Has complete access on machine
• Hadoop Administrator
• Has complete Hadoop access
• User
• Limited by Hadoop permissions
Threat: Accidental Damage
• User deletes files
• Need HDFS permissions
• User kills other users’ jobs
• Need Linux Container Executor
Threat: Remote Access
• Need Service Level Authorization
• Which users can use each service
• Apache Ranger simplifies this!
• Need firewall around cluster
• Control attack surface
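The "which users can use each service" check can be sketched as a small lookup. This is only an illustrative model, not Hadoop's or Ranger's implementation: the service names and principals are hypothetical, and the ACL string format ("users groups", with "*" meaning everyone) is modeled on the values used in Hadoop's policy file as an assumption.

```python
def parse_acl(spec):
    """Parse 'user1,user2 group1,group2'; a lone '*' allows everyone.

    A leading space means "no users, only groups" (assumed format).
    """
    if spec.strip() == "*":
        return {"all": True, "users": set(), "groups": set()}
    users_part, _, groups_part = spec.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return {"all": False, "users": users, "groups": groups}


# Hypothetical services and principals, for illustration only.
POLICY = {
    "client-namenode": parse_acl("alice,bob analysts"),
    "job-submission": parse_acl(" etl"),
    "admin-operations": parse_acl("hdfs"),
}


def authorize(service, user, user_groups):
    """Return True if the user, or one of the user's groups, may call the service."""
    acl = POLICY.get(service)
    if acl is None:
        return False  # unknown services are denied in this sketch
    return acl["all"] or user in acl["users"] or bool(acl["groups"] & set(user_groups))
```

Denying unknown services by default keeps the attack surface small, which matches the intent of the slide.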
Threat: Eavesdropping
• Root can watch network traffic
• Need wire encryption
• HTTP SSL encryption
• Shuffle
• Data transfer
• RPC encryption
Threat: User accesses private data
• Need Kerberos
• HDFS permissions are critical
• ACLs add additional flexibility
• Define user-to-group mapping
• Need group directories
• Minimize user or global spaces
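How permissions, ACLs, and the user-to-group mapping interact on a group directory can be sketched as follows. This is a simplified model that loosely follows POSIX-style ACL evaluation order and omits the ACL mask; it is not the HDFS implementation, and all users, groups, and the `Node` type are hypothetical.

```python
from dataclasses import dataclass, field

READ, WRITE, EXECUTE = 4, 2, 1

# Hypothetical user-to-group mapping.
USER_GROUPS = {"alice": {"finance"}, "bob": {"marketing"}, "carol": {"audit"}}


@dataclass
class Node:
    owner: str
    group: str
    mode: int  # classic permission bits, e.g. 0o770
    named_users: dict = field(default_factory=dict)   # extra ACL entries: user -> bits
    named_groups: dict = field(default_factory=dict)  # extra ACL entries: group -> bits


def can_access(node, user, want):
    """Check `want` bits: owner, then named user, then groups, then other."""
    groups = USER_GROUPS.get(user, set())
    if user == node.owner:
        return ((node.mode >> 6) & want) == want
    if user in node.named_users:
        return (node.named_users[user] & want) == want
    group_bits = []
    if node.group in groups:
        group_bits.append((node.mode >> 3) & 7)
    group_bits += [bits for g, bits in node.named_groups.items() if g in groups]
    if group_bits:
        # Once a group entry matches, "other" is never consulted.
        return any((bits & want) == want for bits in group_bits)
    return ((node.mode & 7) & want) == want


# A group directory: finance can write, audit gets read-only through an ACL entry.
finance_dir = Node("hdfs", "finance", 0o770, named_groups={"audit": READ | EXECUTE})
```

With mode `0o770` the "other" class gets nothing, so a user outside both groups (here `bob`) is denied, which is the point of minimizing global spaces.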
Threat: Physical access
• Very Rare
• Attacker can get to physical box
• Hopeless
• Can remove hard drives
• Includes access to retired drives
• Need raw file system encryption
Threat: Hadoop Admin in Cluster
• Attacker is a Hadoop Admin
• But not root
• Need HDFS Encryption Zones
• Directory sub-tree encryption
• Each file gets unique key
• Client decrypts data
HDFS Encryption
• Each zone has master key
• Each file gets unique sub-key
• HDFS stores sub-key encrypted with master key
• Client uses sub-key to decrypt file
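The key hierarchy above is a form of envelope encryption, which can be sketched in a few lines. Everything here is an assumption made for illustration: `toy_cipher` is a SHA-256-based stand-in so the example runs with the standard library alone (it is NOT AES and not secure), and the function and field names are hypothetical. Which component holds the master key and performs the unwrap is deliberately left out.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (stand-in for AES, NOT secure)."""
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def write_file(master_key: bytes, plaintext: bytes) -> dict:
    """Encrypt with a fresh per-file sub-key; store only the wrapped sub-key."""
    sub_key = secrets.token_bytes(32)
    wrap_iv = secrets.token_bytes(16)
    data_iv = secrets.token_bytes(16)
    return {
        "wrapped_sub_key": toy_cipher(master_key, wrap_iv, sub_key),
        "wrap_iv": wrap_iv,
        "data_iv": data_iv,
        "ciphertext": toy_cipher(sub_key, data_iv, plaintext),
    }


def read_file(master_key: bytes, stored: dict) -> bytes:
    """Unwrap the sub-key with the zone master key, then decrypt the file."""
    sub_key = toy_cipher(master_key, stored["wrap_iv"], stored["wrapped_sub_key"])
    return toy_cipher(sub_key, stored["data_iv"], stored["ciphertext"])
```

Because the stored record never contains the sub-key in the clear, someone who can read stored metadata and file contents but cannot reach the master key learns nothing useful from either.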
KeyProvider API
• Key management
• Allows 3rd party plugins
• Named keys
• Key versions and key rolling
• Key Management Server is alpha
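Named keys, key versions, and key rolling can be illustrated with a toy in-memory store. This is not the Hadoop KeyProvider API: the class, its method names, and the `name@N` version naming are assumptions chosen for this sketch.

```python
import secrets


class ToyKeyProvider:
    """In-memory named keys with versions (illustrative only)."""

    def __init__(self):
        self._versions = {}  # key name -> list of key material; index is the version

    def create_key(self, name: str) -> str:
        if name in self._versions:
            raise ValueError(f"key {name!r} already exists")
        self._versions[name] = [secrets.token_bytes(16)]
        return f"{name}@0"

    def roll_new_version(self, name: str) -> str:
        """Add fresh material; older versions stay readable for existing data."""
        self._versions[name].append(secrets.token_bytes(16))
        return self.current_version_name(name)

    def current_version_name(self, name: str) -> str:
        return f"{name}@{len(self._versions[name]) - 1}"

    def get_key_version(self, version_name: str) -> bytes:
        name, _, number = version_name.rpartition("@")
        return self._versions[name][int(number)]
```

Keeping old versions addressable is what makes rolling practical: data written under `zone1@0` can still be read after new writes move to `zone1@1`.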
Encryption Scheme
• AES/CTR
• Supports append and seek
• Does not pad files
• Uses Initialization Vector
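Why counter mode supports seek and append without padding can be shown with a small sketch. To stay runnable with the standard library alone, the block cipher is replaced by a SHA-256-based stand-in (an assumption for illustration, NOT AES and not secure); the counter arithmetic (IV plus block index) is likewise an assumed layout, not a statement about Hadoop's exact format.

```python
import hashlib

BLOCK = 16  # AES block size in bytes; the stand-in keeps the same geometry


def keystream_block(key: bytes, iv: bytes, index: int) -> bytes:
    """Keystream for block `index`: PRF(key, IV + index). SHA-256 stands in for AES."""
    counter = (int.from_bytes(iv, "big") + index) % (1 << 128)
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]


def crypt(key: bytes, iv: bytes, data: bytes, offset: int = 0) -> bytes:
    """Encrypt or decrypt `data` located at byte `offset` within the file.

    Each byte depends only on its absolute position, so any range can be
    processed on its own and the output is exactly as long as the input.
    """
    out = bytearray()
    for i, byte in enumerate(data):
        pos = offset + i
        block = keystream_block(key, iv, pos // BLOCK)
        out.append(byte ^ block[pos % BLOCK])
    return bytes(out)
```

A reader that seeks to byte 37 only needs the keystream for that position onward, and an append simply continues the counter from the current file length.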
Threat: User Deletes Hive tables
• Configure Storage-Based Auth
• Table permissions match HDFS
• Change owner and group on HDFS
• Modify permissions on HDFS
Threat: User reads private columns
• Some parts of file have different protection requirements.
• Configure SQL Standard Auth
• Supports grant permissions
• Restrict permissions in HDFS
• All normal users use Hive Server 2
Threat: User reads private columns
• Some columns are more sensitive
• But need custom file formats & UDF
• Need Column Encryption
• Strong encryption of entire column
• File format specific
• Working on ORC in HIVE-4227
ORC File Layout
Threat: User reads hidden values
• Want to hide values in some columns.
• Need value encryption
• Provided by 3rd parties
• Typically done using UDFs
Threat: Shadow Security
• Be very wary of non-public encryption schemes
• Cryptosystems need deep analysis
• Need to use AES or stronger.
• Avoid masking or format-preserving schemes that are not based on AES
Resources
• Use the developer lists:
• [email protected]
• [email protected]
• Report security holes: