Issues Securing (Big) Data
The enclosed materials are highly sensitive, proprietary and confidential. Please use every effort to safeguard the confidentiality of these materials. Please do not copy, distribute, use, share or otherwise provide access to these materials to any person inside or outside DST Systems, Inc. without prior written approval.
This proprietary, confidential presentation is for general informational purposes only and does not constitute an agreement. By making this presentation available to you, we are not granting any express or implied rights or licenses under any intellectual property right.
If we permit your printing, copying or transmitting of content in this presentation, it is under a non-exclusive, non-transferable, limited license, and you must include or refer to the copyright notice contained in this document. You may not create derivative works of this presentation or its content without our prior written permission. Any reference in this presentation to another entity or its products or services is provided for convenience only and does not constitute an offer to sell, or the solicitation of an offer to buy, any products or services offered by such entity, nor does such reference constitute our endorsement, referral, or recommendation.
Our trademarks and service marks and those of third parties used in this presentation are the property of their respective owners.
© 2015 DST Systems, Inc. All rights reserved.
Disclaimer
• DST has established internal rules around the use of Big Data
• Data flowing into our data lake is partitioned into what we call Data Domains
• Each DST business unit is, in essence, at least one Data Domain
• Data Domains serve as the primary way of organizing our permissions
Big (or not) Data Security
• By default, one Business Unit is not granted access to another’s data
• Business units make agreements with each other to access data for a specific purpose
• Internal Data Scientists are given cross-Business Unit access to data
• Management mandates that data be secured from anyone who has not been explicitly granted access
What This Means
• These rules result in a very complex matrix of permissions
• Example below
  • Data Domain ‘Business Unit A’ may be accessed by Business Unit A and Business Unit D; Business Units B and C may not access this Data Domain
Complexity
Access matrix (X = the business unit may access the Data Domain):

Data Domain         BU A  BU B  BU C  BU D
Business Unit A      X                 X
Business Unit B            X           X
Business Unit C            X     X     X
Third Party Data                 X     X
• Let’s deal with just text data on a file system on a Linux server
• The logical approach is to arrange directories to track the Data Domains
• For permissions, create a group and a directory for each Data Domain
  • Assign the group ownership as appropriate
  • Set umask to 007 so new files get u:rw-, g:rw-, o:--- permissions (new directories get u:rwx, g:rwx, o:---)
Scenario
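A quick way to confirm what umask 007 produces, without root access, is to create a file and a directory under that mask in a scratch location. This is a sketch that assumes GNU coreutils `stat`; the function name `show_umask_effect` is hypothetical.

```shell
#!/usr/bin/env bash
# Show the modes that umask 007 gives to a new file and a new directory.
# Works in a throwaway temporary directory; assumes GNU coreutils stat.

show_umask_effect() {
  local dir
  dir=$(mktemp -d)
  (
    umask 007                 # mask out every "other" permission bit (subshell only)
    touch "$dir/new_file"     # files start from 0666: 0666 & ~0007 = 0660
    mkdir "$dir/new_dir"      # directories start from 0777: 0777 & ~0007 = 0770
  )
  stat -c '%A %n' "$dir/new_file" "$dir/new_dir"
  rm -rf "$dir"
}

show_umask_effect
```

The umask is set inside a subshell so the calling shell keeps its own mask.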
sudo useradd buaadm
sudo passwd -d buaadm
sudo useradd bubadm
sudo passwd -d bubadm
sudo useradd bucadm
sudo passwd -d bucadm
sudo useradd budadm
sudo passwd -d budadm
sudo useradd tpdadm
sudo passwd -d tpdadm
Details – Setup Users and Groups
sudo groupadd buag
sudo usermod -G buag buaadm
sudo groupadd bubg
sudo usermod -G bubg bubadm
sudo groupadd bucg
sudo usermod -G bucg bucadm
sudo groupadd budg
sudo usermod -G budg budadm
sudo groupadd tpdg
sudo usermod -G tpdg tpdadm
sudo usermod -a -G buag,bubg,bucg,budg,tpdg dt206031
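The same user and group setup can be written as a loop over the Data Domain prefixes. This is a sketch that assumes the naming convention used above (`<prefix>adm` for the admin account, `<prefix>g` for the group); `run`, `setup_domain_accounts`, and `DRY_RUN` are hypothetical names. By default it only prints the commands; `DRY_RUN=0` would execute them and requires sudo rights.

```shell
#!/usr/bin/env bash
# Hypothetical loop form of the user and group setup.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 runs it (requires sudo rights).

DRY_RUN=${DRY_RUN:-1}

# Print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: account to add to every Data Domain group; remaining args: Data Domain prefixes.
setup_domain_accounts() {
  local analyst=$1
  shift
  local prefix group_list=""
  for prefix in "$@"; do
    run sudo useradd "${prefix}adm"
    run sudo passwd -d "${prefix}adm"
    run sudo groupadd "${prefix}g"
    run sudo usermod -G "${prefix}g" "${prefix}adm"
    group_list+="${group_list:+,}${prefix}g"   # comma-separated, no leading comma
  done
  run sudo usermod -a -G "$group_list" "$analyst"
}

setup_domain_accounts dt206031 bua bub buc bud tpd
```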
umask 007
cd $HOME
mkdir data
cd data
mkdir bua
mkdir bub
mkdir buc
mkdir tpd
cd $HOME/data/bua
touch bua_file_1
touch bua_file_2
touch bua_file_3
touch bua_file_4
touch bua_file_5
sudo chown buaadm:buag *
Details – Setup Files
cd $HOME/data/bub
touch bub_file_1
touch bub_file_2
touch bub_file_3
touch bub_file_4
touch bub_file_5
sudo chown bubadm:bubg *
cd $HOME/data/buc
touch buc_file_1
touch buc_file_2
touch buc_file_3
touch buc_file_4
touch buc_file_5
sudo chown bucadm:bucg *
cd $HOME/data/tpd
touch tpd_file_1
touch tpd_file_2
touch tpd_file_3
touch tpd_file_4
touch tpd_file_5
sudo chown tpdadm:tpdg *
cd $HOME/data
sudo chown buaadm:buag bua
sudo chown bubadm:bubg bub
sudo chown bucadm:bucg buc
sudo chown tpdadm:tpdg tpd
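The directory and file setup can likewise be looped. This sketch creates the tree under a base directory (a new temporary directory unless `BASE` is set), applies umask 007 inside a subshell, and prints the `chown` commands instead of running them unless `APPLY_CHOWN=1`; `create_domain_tree`, `BASE`, and `APPLY_CHOWN` are hypothetical names. It uses `chown -R` on each directory, which covers both the directory and the files that the commands above chown separately.

```shell
#!/usr/bin/env bash
# Hypothetical loop form of the file setup.
# Creates <base>/<prefix>/<prefix>_file_1..5 and prints (or, with APPLY_CHOWN=1, runs) the chown.

create_domain_tree() {
  local base=$1
  shift
  local prefix i
  (
    umask 007                                   # new dirs 0770, new files 0660
    for prefix in "$@"; do
      mkdir -p "$base/$prefix"
      for i in 1 2 3 4 5; do
        touch "$base/$prefix/${prefix}_file_$i"
      done
    done
  )
  for prefix in "$@"; do
    if [ "${APPLY_CHOWN:-0}" = "1" ]; then
      sudo chown -R "${prefix}adm:${prefix}g" "$base/$prefix"
    else
      echo "sudo chown -R ${prefix}adm:${prefix}g $base/$prefix"
    fi
  done
}

BASE=${BASE:-$(mktemp -d)}
create_domain_tree "$BASE" bua bub buc tpd
```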
• The directory for the Data Domain ‘Business Unit A’ can be accessed by members of the ‘buag’ group
• How can we grant additional access to the ‘budg’ group, but still restrict other groups?
Complexity Redux
• POSIX Access Control Lists (ACLs) are the answer to our dilemma
  • They are not always enabled by default and may need to be enabled at the filesystem level
  • Remounting with the remount and acl options enables them:
    sudo mount -o remount,acl /dev/sda5 /home
  • To make the change permanent, add acl to that filesystem's options in /etc/fstab (see your system administrator)
The Secret Sauce
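Before relying on ACLs, it can help to confirm that the filesystem actually accepts them. The sketch below (hypothetical function `acl_support`) tries to add a named-user entry for the current user to a scratch file in the target directory. A named entry is used on purpose: an ACL containing only the u::, g:: and o:: entries maps onto ordinary mode bits and can succeed even where ACLs are disabled.

```shell
#!/usr/bin/env bash
# Probe whether the filesystem holding a directory accepts POSIX ACL entries.

acl_support() {
  local target_dir=${1:-.}
  local probe status
  if ! command -v setfacl >/dev/null 2>&1; then
    echo "setfacl not installed"
    return 0
  fi
  # Scratch file in the target directory, removed again below.
  if ! probe=$(mktemp "$target_dir/.acl_probe.XXXXXX" 2>/dev/null); then
    echo "cannot create probe file"
    return 0
  fi
  # A named-user entry (our own numeric UID) cannot be expressed with mode bits,
  # so setfacl fails here when the filesystem does not support ACLs.
  if setfacl -m "u:$(id -u):rw" "$probe" 2>/dev/null; then
    status="ACLs supported"
  else
    status="ACLs not supported"
  fi
  rm -f "$probe"
  echo "$status"
}

acl_support "${1:-$HOME}"
```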
• setfacl is used to set the ACL for a file or directory
• getfacl is used to query and list the ACL of a file or directory
• Our specific need:
  • In addition to rwx permissions for the group ‘buag’, add rwx permissions for the group ‘budg’ to the directory ‘bua’
  • In addition to rwx permissions for the group ‘bubg’, add rwx permissions for the group ‘budg’ to the directory ‘bub’
  • In addition to rwx permissions for the group ‘bucg’, add rwx permissions for the groups ‘bubg’ and ‘budg’ to the directory ‘buc’
  • In addition to rwx permissions for the group ‘tpdg’, add rwx permissions for the groups ‘bucg’ and ‘budg’ to the directory ‘tpd’
The Tools
• In addition to rwx permissions for the group ‘buag’, add rwx permissions for the group ‘budg’ to the directory and contents of ‘bua’
  sudo setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bua
• In addition to rwx permissions for the group ‘bubg’, add rwx permissions for the group ‘budg’ to the directory and contents of ‘bub’
  sudo setfacl -R --set u::rwx,g::rwx,o::-,g:budg:rwx bub
• In addition to rwx permissions for the group ‘bucg’, add rwx permissions for the groups ‘bubg’ and ‘budg’ to the directory and contents of ‘buc’
  sudo setfacl -R --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
• In addition to rwx permissions for the group ‘tpdg’, add rwx permissions for the groups ‘bucg’ and ‘budg’ to the directory and contents of ‘tpd’
  sudo setfacl -R --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
• sudo is needed because, after the earlier chown, only the owning *adm account or root can change these ACLs
The Commands
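Because each command is the same base entries plus one g:<group>:rwx entry per extra group, the commands can be generated from the permission matrix. This is a hypothetical sketch (`EXTRA_GROUPS` and `acl_spec` are illustrative names) that needs bash 4 or later for associative arrays and only prints the commands. They are prefixed with sudo on the assumption that the directories are owned by the *adm accounts, as after the chown step.

```shell
#!/usr/bin/env bash
# Hypothetical generator: builds setfacl commands from the permission matrix.
# Requires bash 4+ (associative arrays). Prints commands only; nothing is changed.

# Directory -> groups granted access in addition to the owning group (covered by g::).
declare -A EXTRA_GROUPS=(
  [bua]="budg"
  [bub]="budg"
  [buc]="bubg budg"
  [tpd]="bucg budg"
)

# Build the ACL spec for one directory: base entries plus one entry per extra group.
acl_spec() {
  local dir=$1 group spec="u::rwx,g::rwx,o::-"
  # Word splitting on the space-separated group list is intentional here.
  for group in ${EXTRA_GROUPS[$dir]}; do
    spec+=",g:${group}:rwx"
  done
  echo "$spec"
}

for dir in bua bub buc tpd; do
  echo "sudo setfacl -R --set $(acl_spec "$dir") $dir"
done
```

Using rwX instead of rwx in the spec would grant execute only to directories and to files that already have execute permission, which keeps -R from marking every data file executable.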
• Hadoop HDFS also supports POSIX-style ACLs (available since Apache Hadoop 2.4)
• Make sure to turn them on first in hdfs-site.xml:
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
• Restart the NameNode so the setting takes effect
• Set an ACL:
  hdfs dfs -setfacl -m u::rwx,g::rwx,o::-,g:budg:rwx /bua
• See the ACLs:
  hdfs dfs -getfacl /bua
How To Hadoop It
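To check the setting from a shell, `hdfs getconf -confKey` prints a configuration value. The sketch below (hypothetical function `hdfs_acls_enabled`) prints a message instead when the `hdfs` command is absent. It reports the configuration visible to this client; treating that as the NameNode's setting is an assumption, since the NameNode honors its own hdfs-site.xml after a restart.

```shell
#!/usr/bin/env bash
# Read dfs.namenode.acls.enabled from the Hadoop configuration visible on this machine.

hdfs_acls_enabled() {
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs not found"
    return 0
  fi
  # Prints "true" or "false" (defaults are filled in from hdfs-default.xml).
  hdfs getconf -confKey dfs.namenode.acls.enabled
}

hdfs_acls_enabled
```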
• Use a default ACL so it is applied automatically to new children:
  sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bua
  sudo setfacl -d --set u::rwx,g::rwx,o::-,g:budg:rwx bub
  sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bubg:rwx,g:budg:rwx buc
  sudo setfacl -d --set u::rwx,g::rwx,o::-,g:bucg:rwx,g:budg:rwx tpd
• And in Hadoop:
  hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::-,d:g:budg:rwx bua
  hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::-,d:g:budg:rwx bub
  hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::-,d:g:bubg:rwx,d:g:budg:rwx buc
  hadoop fs -setfacl --set d:u::rwx,d:g::rwx,d:o::-,d:g:bucg:rwx,d:g:budg:rwx tpd
Other Goodies
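Default ACL inheritance can be observed in a scratch directory without creating any accounts. In this sketch (hypothetical function `demo_default_acl`), a named entry for the current user's numeric UID stands in for a group such as budg. It requires `setfacl`/`getfacl` and an ACL-enabled filesystem, and prints a message instead if either is missing.

```shell
#!/usr/bin/env bash
# Observe a default ACL being inherited by a newly created file.

demo_default_acl() {
  local dir uid
  if ! command -v setfacl >/dev/null 2>&1 || ! command -v getfacl >/dev/null 2>&1; then
    echo "setfacl/getfacl not installed"
    return 0
  fi
  dir=$(mktemp -d)
  uid=$(id -u)   # stand-in for a group such as budg, so no extra accounts are needed
  # -d: the entries become the directory's default ACL, inherited by new children
  if ! setfacl -d --set "u::rwx,g::rwx,o::-,u:${uid}:rwx" "$dir" 2>/dev/null; then
    echo "ACLs not supported"
    rm -rf "$dir"
    return 0
  fi
  touch "$dir/child"
  # Print the inherited named-user entry of the new file (numeric IDs, no header)
  getfacl -n --omit-header "$dir/child" 2>/dev/null | grep "^user:${uid}:"
  rm -rf "$dir"
}

demo_default_acl
```

Because touch creates files with mode 0666, getfacl typically shows the inherited entry as rwx with an effective permission of rw-: the mask is limited by the mode requested at creation.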
• Don’t forget about the sticky bit
  • It makes it so that only a file’s owner, the directory’s owner, or root can delete or rename files in the directory
  sudo chmod +t bua
• Use the setgid bit so that new files in a directory get the same group owner as the directory
  • Very handy when paired with default ACLs
  sudo chmod g+s bua
Last Extra Bits
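Both bits can be seen on a scratch directory without root. The sketch below (hypothetical function `demo_special_bits`, GNU `stat` assumed) sets them on a directory created under umask 007 and creates a subdirectory inside it. Because the directory grants others no execute permission, the sticky bit shows as a capital T, and on Linux the new subdirectory inherits the setgid bit.

```shell
#!/usr/bin/env bash
# Show the sticky and setgid bits on a scratch directory, and setgid inheritance.

demo_special_bits() {
  local dir
  dir=$(mktemp -d)
  (
    umask 007
    mkdir "$dir/bua"          # drwxrwx---
    chmod g+s "$dir/bua"      # setgid: new entries take the directory's group
    chmod +t "$dir/bua"       # sticky: only file owner, directory owner, or root may delete
    mkdir "$dir/bua/sub"      # on Linux, a new subdirectory inherits the setgid bit
  )
  stat -c '%A %n' "$dir/bua" "$dir/bua/sub"
  rm -rf "$dir"
}

demo_special_bits
```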