Encryption and Anonymization in Hadoop - Linux
TRANSCRIPT
![Page 1: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/1.jpg)
Page 1
Encryption and Anonymization in Hadoop
Sept-28-2015 ApacheCon, Budapest
Current and Future needs
![Page 2: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/2.jpg)
Page 2
Agenda
• Need for data protection – Encryption and Anonymization
• Current State of Encryption in Hadoop
• Demo
• Future focus areas for the community
![Page 3: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/3.jpg)
Page 3
Speakers
Chief Security Architect, Hortonworks Committer - Apache Ranger and Apache Hawq
Sr Director, Enterprise Security Hortonworks Committer - Apache Ranger
![Page 4: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/4.jpg)
Page 4
Security today in Hadoop
Centralized Security Administration w/ Ranger, across the Hadoop ecosystem
• Authentication (Who am I/prove it?): Kerberos; API security with Apache Knox
• Authorization (What can I do?): Fine-grained access control with Apache Ranger
• Audit (What did I do?): Centralized audit reporting w/ Apache Ranger
• Data Protection (Can data be encrypted at rest and over the wire?): Wire encryption in Hadoop; HDFS, HBase encryption
![Page 5: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/5.jpg)
Page 5
Data Protection
Encryption and Anonymization
![Page 6: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/6.jpg)
Page 6
Why is Encryption at Rest required?
• Sensitive data could be stored in Hadoop
• Compliance or external regulation may mandate encryption, for example PCI (Retail, Consumer) or HIPAA (Healthcare)
• Cost of not encrypting is increasing
• Enhanced Security
  • Added layer on top of authentication (passwords) and authorization (ACLs)
  • Prevent rogue administrators from accessing sensitive data
![Page 7: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/7.jpg)
Page 7
Available Hadoop Encryption Options
[Figure: encryption options (OS, HDFS, HBase, Custom) plotted on axes of granularity and ease of implementation]
![Page 8: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/8.jpg)
Page 8
OS Level Encryption – LUKS/DM-CRYPT
[Diagram: Hadoop on top of mount points /root, /grid0, /grid2 ... /gridn, backed by Partition 1 and Partitions 2..n, with dm-crypt]
Why it helps?
• Encrypts entire disk volume – all data is encrypted
• Simpler setup, native OS and vendor solutions available
Cons
• Performance challenges
• Admin can still see raw data
![Page 9: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/9.jpg)
Page 9
HDFS Transparent Encryption Solution
[Diagram: Ranger KMS, NameNode (NN), HDFS Client, and three DataNodes (DN) holding blocks A, B, C, D]
Why it helps?
• Encrypt only specific data
• Different access control levels
• Transparent to end application, few changes needed
• Auditing of key access
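The "different access control levels" and "auditing of key access" points can be illustrated with a toy model. This sketch is not the Ranger KMS API: `ToyKMS`, its policy dictionary, and the user and key names are all hypothetical. It only shows the idea that key access is authorized separately from HDFS file permissions and that every attempt is logged.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKMS:
    """Hypothetical stand-in for a key server; not the Ranger KMS API."""
    # key name -> set of users allowed to decrypt with that key
    policies: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def can_decrypt(self, user: str, key_name: str) -> bool:
        allowed = user in self.policies.get(key_name, set())
        # Every attempt is recorded, whether it succeeds or not.
        outcome = "ALLOWED" if allowed else "DENIED"
        self.audit_log.append((user, key_name, outcome))
        return allowed


kms = ToyKMS(policies={"hr_key": {"alice"}, "finance_key": {"bob"}})

# The "hdfs" user can manage files but has no key policy in this model,
# so it cannot obtain keys to decrypt zone contents.
results = [
    kms.can_decrypt("alice", "hr_key"),
    kms.can_decrypt("hdfs", "hr_key"),
    kms.can_decrypt("alice", "finance_key"),
]
print(results)
for entry in kms.audit_log:
    print(entry)
```

Keeping key policies apart from file permissions is what lets different zones have different access levels even when the same administrators run the cluster.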
![Page 10: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/10.jpg)
Page 10
HDFS Encryption – Protect Application Data
[Diagram: HBase, Hive, Oozie, Sqoop, and Spark on top of HDFS, with the NameNode (NN) and three DataNodes (DN) holding blocks A, B, C, D]
Guidelines
• Encrypt Hive, HBase data stored in HDFS
• Specific changes in Hive to ensure scratch dir is encrypted
• Separate admins in HDFS, YARN, Oozie
• Spark application logs should be in an encryption zone (EZ)
![Page 11: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/11.jpg)
Page 11
Ranger KMS – Centralized Key Management
![Page 12: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/12.jpg)
Page 12
HDFS TDE Workflow
[Diagram: workflow between Ranger KMS, NN/DN, and Client]
1. Create EZ keys
2. Create encryption zone
3. Provide EZ keys
4. NN marks folder as EZ
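The steps above map onto the standard Hadoop command-line tools. A minimal sketch, assuming a client host with `hadoop` and `hdfs` on the PATH and a KMS already configured. The key name `ez_key_hr` and path `/data/hr` are made-up examples, and `ez_setup_commands` is a hypothetical helper that only builds the command lines without running them.

```python
import shlex


def ez_setup_commands(key_name: str, zone_path: str) -> list:
    """Build the commands that create an EZ key and an encryption zone."""
    return [
        # 1. Create the encryption zone key in the KMS.
        ["hadoop", "key", "create", key_name],
        # 2. Create the directory; it must be empty to become a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Ask the NameNode to mark the directory as an encryption zone.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 4. List zones to verify that the new zone was registered.
        ["hdfs", "crypto", "-listZones"],
    ]


commands = ez_setup_commands("ez_key_hr", "/data/hr")
for argv in commands:
    print(shlex.join(argv))
```

To run them for real, each list could be passed to `subprocess.run(argv, check=True)`. Key creation is typically done by a key administrator, while creating the zone requires HDFS superuser privileges.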
![Page 13: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/13.jpg)
Page 13
HDFS TDE Workflow – Write a File
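[Diagram: workflow between Client, NN/DN, and Ranger KMS. DEK = data encryption key; EDEK = encrypted DEK]
1. Client: Request to write to EZ
2. NN: Does access check
3. Ranger KMS: Create DEK and encrypt with EZ key
4. NN: Send block information to client. EDEK stored with file
5. Client: Receive EDEK. Request DEK
6. Ranger KMS: Decrypt EDEK, provide DEK
7. Client: Encrypt data and write to DN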
![Page 14: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/14.jpg)
Page 14
HDFS TDE Workflow – Read a File
[Diagram: workflow between Client, NN/DN, and Ranger KMS]
1. Client: Request to read from EZ
2. NN: Does access check. Provide data, EDEK
3. Client: Receive EDEK. Request DEK
4. Ranger KMS: Decrypt EDEK, provide DEK
5. Client: Use DEK to read file data
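The write and read workflows are an instance of envelope encryption: the data is encrypted with a per-file DEK, and only the encrypted form of that key (the EDEK) is stored with the file. The sketch below walks through both flows in one process. It rests on several assumptions: `ToyKMS` and `toy_cipher` are hypothetical names, the cipher is a SHA-256 counter keystream used as a standard-library stand-in for AES-CTR (not suitable for real use), and the NameNode and DataNodes are reduced to plain variables.

```python
import hashlib
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (toy stand-in for AES-CTR).

    Applying it twice with the same key and nonce returns the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big"))
        stream += block.digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds EZ keys; they never leave this object."""

    def __init__(self):
        self._ez_keys = {"ez_key_hr": os.urandom(32)}

    def generate_edek(self, ez_key_name: str):
        # Create a fresh DEK and return only its encrypted form.
        dek = os.urandom(32)
        nonce = os.urandom(16)
        return nonce, toy_cipher(self._ez_keys[ez_key_name], nonce, dek)

    def decrypt_edek(self, ez_key_name: str, nonce: bytes, edek: bytes) -> bytes:
        return toy_cipher(self._ez_keys[ez_key_name], nonce, edek)


kms = ToyKMS()

# Write: an EDEK is generated and stored with the file metadata.
edek_nonce, edek = kms.generate_edek("ez_key_hr")
file_meta = {"edek_nonce": edek_nonce, "edek": edek, "data_nonce": os.urandom(16)}
# The client asks the KMS for the DEK, encrypts, and sends ciphertext to DNs.
dek = kms.decrypt_edek("ez_key_hr", file_meta["edek_nonce"], file_meta["edek"])
stored_blocks = toy_cipher(dek, file_meta["data_nonce"], b"ssn=123-45-6789")

# Read: the client receives the EDEK again, gets the DEK back, and decrypts.
dek_again = kms.decrypt_edek("ez_key_hr", file_meta["edek_nonce"], file_meta["edek"])
plaintext = toy_cipher(dek_again, file_meta["data_nonce"], stored_blocks)
print(plaintext)
```

In this model neither the file metadata nor the stored blocks contain a usable key, so reading the data requires a successful call to the key server.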
![Page 15: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/15.jpg)
Page 15
HBase Encryption in 0.98
Why it helps?
• HFile encrypted and stored on disk
• Per column family (CF) configuration
• Keys stored in Java keystore
![Page 16: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/16.jpg)
Page 16
Demo
Don Bosco Durai
![Page 17: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/17.jpg)
Page 17
Future Work
Focus areas for the community
![Page 18: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/18.jpg)
Page 18
Encryption and Anonymization - Future Focus Areas
• Hive Column Encryption
• Solidifying HBase Encryption
• Kafka and Solr Encryption
• Need for Tokenization/Masking
![Page 19: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/19.jpg)
Page 19
Hive Column Encryption
• Being discussed in the community. Apache JIRA # ORC-14
• Handled at the ORC layer
• Elegant solution. Encryption done after ORC compression.
• Each column is stored separately within an ORC file, so columns can be encrypted with different keys
• Leverage KeyProvider API. Potentially can use Hadoop/Ranger KMS
How it will help?
• Encrypt fields instead of whole files
• Data protected in HDFS as well as OS layer
![Page 20: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/20.jpg)
Page 20
Kafka Encryption
• Discussion going on in Kafka community
• Two possible approaches
  • Broker encrypts and stores the data
  • Client(s) encrypt/decrypt the data
• Pros with client-side encrypt/decrypt
  • No encryption/decryption overhead on broker side
  • Keys not available on broker, so data is safe from anyone with broker access
  • No need for wire encryption
• Cons with client-side encrypt/decrypt
  • Compaction/compression not effective with encrypted data
  • Needs protocol change and updated client libraries
How it will help?
• Encrypt any local data stored on disks
• Data encrypted on wire
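The point that compression is not effective on encrypted data can be checked with a small experiment. This is a sketch under stated assumptions: `toy_encrypt` is a hypothetical SHA-256 counter keystream standing in for a real cipher, and the sample messages are invented. Good ciphertext looks random, so a compressor finds no repetition to exploit.

```python
import hashlib
import os
import zlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (toy stand-in for a cipher)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# A batch of repetitive messages, as log-style topics often carry.
messages = b'{"user": "alice", "action": "login"}\n' * 200
ciphertext = toy_encrypt(os.urandom(32), messages)

# Compare how well the same bytes compress before and after encryption.
plain_ratio = len(zlib.compress(messages)) / len(messages)
cipher_ratio = len(zlib.compress(ciphertext)) / len(ciphertext)
print(f"plaintext compresses to {plain_ratio:.0%} of its size")
print(f"ciphertext compresses to {cipher_ratio:.0%} of its size")
```

One common way around this, if the client handles encryption, is to compress first and encrypt second. Broker-side log compaction still needs the message key to be readable.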
![Page 21: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/21.jpg)
Page 21
Solr Encryption
• No active discussion currently
• Would be good to have native support
• Index files could be encrypted/decrypted just like ORC
• Could be integrated with external KMS (Hadoop/Ranger)
How it will help?
• Sensitive data could be stored in indexes and may need to be encrypted
• Finer granularity than OS or HDFS encryption
![Page 22: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/22.jpg)
Page 22
Beyond Encryption... Anonymization [1]?
- Tokenization – Replace a sensitive field (e.g. card number) with some other value. Could be a format-preserving or a random unique value.
- Redaction – Mask sensitive data (e.g. card numbers can be changed to xxxx xxxx xxxx 1234)
[1] Reference: http://blogs.gartner.com/merv-adrian/2014/01/13/aaa-is-not-enough-security-in-the-big-data-era/
How it helps?
• Protect sensitive data beyond access control
• Field-level control
• Enable compliance with privacy laws
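The difference between the two techniques can be shown in a few lines. A minimal sketch, assuming an in-memory store: `TokenVault` and `redact_card` are hypothetical names, and a real tokenization service would keep its mapping in a secured, persistent vault. Redaction is one-way, while tokenization can be reversed by whoever has access to the vault.

```python
import secrets


def redact_card(card_number: str) -> str:
    """Mask all but the last four digits, keeping spaces and separators."""
    total_digits = sum(c.isdigit() for c in card_number)
    keep_from = total_digits - 4
    out, seen = [], 0
    for c in card_number:
        if c.isdigit():
            out.append(c if seen >= keep_from else "x")
            seen += 1
        else:
            out.append(c)
    return "".join(out)


class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to real values."""

    def __init__(self):
        self._by_value = {}
        self._by_token = {}

    def tokenize(self, value: str) -> str:
        # The same value always maps to the same random token.
        if value not in self._by_value:
            token = secrets.token_hex(8)
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token: str) -> str:
        return self._by_token[token]


card = "4111 1111 1111 1234"
vault = TokenVault()
token = vault.tokenize(card)
print(redact_card(card))  # xxxx xxxx xxxx 1234
print(token)
```

The token here is a random unique value. A format-preserving variant would instead return something that still looks like a card number.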
![Page 23: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/23.jpg)
Page 23
Where is it applicable?
• Sensitive data in HDFS file
• Column values in Hive or Hbase
• Field values in Solr
• Messages in Kafka or NiFi
![Page 24: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/24.jpg)
Page 24
How?
• Tokenize on source
  • Tokenize while ingesting data (Flume, NiFi, Sqoop, etc.)
  • Data is stored tokenized, so it is safe to give access to others
  • Selected users can de-tokenize if needed
• Tokenize/Mask on read
  • E.g. select name, mobile_number from customer;
  • Based on policy, if the user is a Data Scientist, tokenize/mask the data before returning it

| Name | Returned (Format Preserved) | Actual |
|---|---|---|
| John Doe | 415-123-4567 | 415-682-5638 |
| Jane Smith | 408-123-4567 | 408-802-4027 |
| Mary Pick | 650-123-4567 | 650-865-6921 |
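The mask-on-read idea can be sketched as a policy check applied to query results. Everything here is illustrative: the role name `data_scientist`, the `select_name_mobile` function, and the hard-coded secret are assumptions. The masking itself is a simple keyed-hash pseudonymization that keeps the area code and the digit layout. It is not a standardized format-preserving encryption scheme and cannot be reversed, and its output differs from the sample values in the table.

```python
import hashlib
import hmac

SECRET = b"demo-only-secret"  # assumption: a real system would fetch this from a KMS

CUSTOMERS = [
    ("John Doe", "415-682-5638"),
    ("Jane Smith", "408-802-4027"),
    ("Mary Pick", "650-865-6921"),
]


def pseudonymize_phone(phone: str) -> str:
    """Keep the area code; replace the remaining digits deterministically."""
    area, rest = phone[:3], phone[3:]
    digest = hmac.new(SECRET, phone.encode(), hashlib.sha256).digest()
    replacement = iter(str(b % 10) for b in digest)
    return area + "".join(next(replacement) if c.isdigit() else c for c in rest)


def select_name_mobile(role: str) -> list:
    """Return (name, mobile_number) rows the way the role's policy allows."""
    if role == "data_scientist":
        return [(name, pseudonymize_phone(phone)) for name, phone in CUSTOMERS]
    return list(CUSTOMERS)


for name, phone in select_name_mobile("data_scientist"):
    print(name, phone)
```

Because the replacement is deterministic, the same customer always maps to the same masked number, so joins and counts on the masked column still work.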
![Page 25: Encryption and Anonymization in Hadoop - Linux](https://reader034.vdocuments.net/reader034/viewer/2022051507/58a2eac01a28ab02228b93a7/html5/thumbnails/25.jpg)
Page 25
Questions?