Secure Hadoop Clusters on Windows Platform Hadoop Users Group Bucharest Jan 29/2015 Remus Rusanu

Upload: remus-rusanu

Post on 18-Jul-2015


Secure Hadoop Clusters on Windows Platform

Hadoop Users Group Bucharest Jan 29/2015

Remus Rusanu

about:me

• SQL Server engine developer since 2001

• Worked on HDInsight service (Azure Hadoop offering) and on PDW appliance Hadoop region

• Hive contributor: vectorized execution engine HIVE-4160

• Hadoop contributor: Windows secure YARN containers YARN-2190

• @rusanu

• Stack Overflow user 105929

Integrate Hadoop with Windows Security

• Integrate your cluster with the existing Active Directory domain

• Integrated security

  • Use Windows domain users

  • No need for local users, local passwords

• Single sign-on

  • Only provide password when opening OS session

• Group membership provided from AD groups

Benefits

• Group membership based access control

  • Domain\HadoopUsers: Granted access to Hadoop cluster

  • Domain\NewHire is added to HadoopUsers

  • Domain\NewHire has access to Hadoop cluster

• Centralized password control

  • Only administer Active Directory

• Integrate with the rest of the enterprise that uses AD

What can leverage AD based Access Control

• HDFS

• M/R queues

• HTTP interfaces (Web UI)

• Hadoop ecosystem stack

  • Oozie proxy (Hadoop super)

Secure Hadoop clusters

• “Kerberized” cluster

  • Users are authenticated using Kerberos

  • Services authenticate each other using Kerberos

• Data encryption in transit

  • RPC, block transfer, HTTP

  • No data encryption at rest

• Permission control for containers (tasks)

  • Containers cannot access service (NM) files

• Process isolation

  • Containers cannot access each other's files

• SecureMode: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html

Windows and Secure Hadoop Clusters

• Windows does support integrated authentication and single sign-on with a Kerberized cluster

  • Can be a Linux Kerberized cluster too, with proper KDC configuration

• Requires the allowtgtsessionkey value in the registry

• See KB308339: Registry Key to Allow Session Keys to Be Sent in Kerberos Ticket-Granting-Ticket: http://support.microsoft.com/kb/308339

• Not allowed for LUA, see KB2627903 Access to Session Keys not possible using a restricted Token: http://support.microsoft.com/kb/2627903

• This solves the problem of authenticating the user at cluster periphery (job submit)
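
The registry requirement above can be sketched as a small generator for a .reg file. This is only an illustration: the key path is the one KB308339 gives for Windows Server 2003 and later, and the helper name `render_reg_file` is hypothetical. Importing the file needs administrative rights and is not shown here.

```python
# Sketch: build the text of a .reg file that turns allowtgtsessionkey on.
# Assumption: the key location documented in KB308339 for Windows Server 2003
# and later; older systems used a slightly different path.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"


def render_reg_file(value_name: str = "allowtgtsessionkey", enabled: bool = True) -> str:
    """Return .reg file text that sets a REG_DWORD value to 1 (on) or 0 (off)."""
    dword = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        f'"{value_name}"=dword:{dword:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(render_reg_file())
```

The generated text can be saved as a .reg file and reviewed before anyone imports it on a client machine.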

Securing Hadoop Services

• Same as Linux configuration

  • hadoop.security.authentication: kerberos

  • hadoop.security.authorization: true

  • hadoop.http.filter.initializers: org.apache.hadoop.security.AuthenticationFilterInitializer

  • hadoop.http.authentication.type: kerberos

  • Etc. Refer to the Secure Mode guide for your installation.

• Use ktpass.exe to obtain keytab files for NT domain users

  • https://technet.microsoft.com/en-us/library/cc753771.aspx

• Configure KDC in krb5.ini for the realm (domain)

• Enable AES128 and AES256 for the user accounts in AD

  • msDS-SupportedEncryptionTypes

• The Hadoop services are not required to run as the service accounts, since Hadoop uses principal names and keytab files anyway

  • I recommend it nonetheless; it is confusing otherwise

• Java runtime must contain the Unlimited Strength JCE policy files
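
The four properties at the top of this slide normally live in core-site.xml. A minimal sketch of rendering them in Hadoop's `<configuration>`/`<property>` layout follows; the helper name `render_site_xml` is hypothetical, and a real secure cluster needs many more settings (principals, keytab paths) that the Secure Mode guide lists.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Only the properties named on the slide; a real deployment needs more.
SECURE_PROPERTIES = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
    "hadoop.http.filter.initializers":
        "org.apache.hadoop.security.AuthenticationFilterInitializer",
    "hadoop.http.authentication.type": "kerberos",
}


def render_site_xml(properties: dict) -> str:
    """Render name/value pairs as a Hadoop-style *-site.xml document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    raw = ET.tostring(root, encoding="unicode")
    # Pretty-print so the result is readable when pasted into core-site.xml.
    return minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(render_site_xml(SECURE_PROPERTIES))
```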

Group Membership Provider

• LDAP provider already works

  • hadoop.security.group.mapping: org.apache.hadoop.security.LdapGroupsMapping

• HDFS access control

• M/R queues access control

• Add LDAP_MATCHING_RULE_IN_CHAIN

  • hadoop.security.group.mapping.ldap.search.attr.member: member:1.2.840.113556.1.4.1941:

• This rule is limited to filters that apply to the DN. This is a special "extended" match operator that walks the chain of ancestry in objects all the way to the root until it finds a match

• See https://msdn.microsoft.com/en-us/library/aa746475%28v=vs.85%29.aspx
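
To see why the member attribute ends with a colon, here is a sketch of the search filter that results once the user's DN is appended. It assumes the group mapping composes the filter as group filter plus `(<member attr>=<user DN>)`; the function names and the example DN are hypothetical.

```python
# OID of LDAP_MATCHING_RULE_IN_CHAIN, as given on the slide.
LDAP_MATCHING_RULE_IN_CHAIN = "1.2.840.113556.1.4.1941"


def escape_filter_value(value: str) -> str:
    """Escape the characters that RFC 4515 reserves inside a filter value."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def nested_group_filter(user_dn: str,
                        member_attr: str = "member",
                        group_filter: str = "(objectClass=group)") -> str:
    """Build a filter matching every group that contains user_dn,
    directly or through nested groups."""
    # Assumption: the mapping appends "=<user DN>" to the configured attribute,
    # so the configured value must already end with ":".
    attr = f"{member_attr}:{LDAP_MATCHING_RULE_IN_CHAIN}:"
    return f"(&{group_filter}({attr}={escape_filter_value(user_dn)}))"


if __name__ == "__main__":
    print(nested_group_filter("CN=New Hire,OU=Users,DC=example,DC=com"))
```

With a plain `member` attribute the same filter would only return groups that list the user directly; the matching rule is what makes nested AD groups resolve.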

Windows Secure Container Executor

• Windows platform equivalent of LinuxContainerExecutor

  • http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/SecureContainer.html

• YARN-2190

• Leverages the S4U (Service-for-User) Kerberos extension

  • A process that has the SE_TCB (Trusted Computing Base) privilege can impersonate an arbitrary user without providing a password for said user

• Creates an isolated environment for a container and then launches the container impersonating the user

Configuring WSCE

• Requires a privileged NT service:

  • %HADOOP_HOME%\bin\winutils /service

  • Must run as LocalSystem

  • Equivalent of LinuxContainerExecutor’s container executor binary with setuid set and owned by root

• Requires %HADOOP_HOME%\etc\hadoop\wsce_site.xml

  • impersonate.allowed: users allowed to be impersonated

  • impersonate.denied: users explicitly forbidden from being impersonated

• Very powerful: launch a process as arbitrary user

  • Validates that wsce_site.xml is writable only by Administrators
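
The allowed/denied pair above could be evaluated roughly as follows. This is a sketch of the idea, not the winutils implementation: it assumes comma-separated user names, that a denied entry always wins over an allowed one, and case-insensitive matching as is usual for Windows account names. Both function names are hypothetical.

```python
def parse_user_list(raw: str) -> set:
    """Split a comma-separated list into a set of lower-cased user names."""
    return {entry.strip().lower() for entry in raw.split(",") if entry.strip()}


def may_impersonate(user: str, allowed: set, denied: set) -> bool:
    """Return True only if user is allowed and not explicitly denied."""
    name = user.strip().lower()
    if name in denied:
        # Assumption: an explicit deny overrides any allow entry.
        return False
    return name in allowed


if __name__ == "__main__":
    allowed = parse_user_list("alice, Bob")
    denied = parse_user_list("bob")
    for candidate in ("ALICE", "bob", "carol"):
        print(candidate, may_impersonate(candidate, allowed, denied))
```

Defaulting to "not allowed" for anyone missing from both lists keeps the privileged service conservative.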

Demo

//TODO

• Forests, domain trust etc.

  • Currently works only with one single domain

  • Hadoop infrastructure is modeled after the Linux security model and does not support “domain\user”

• Delegation

  • S4U extension does not support delegation

  • Container cannot access resources outside the node host

  • E.g. Sqoop accessing SQL Server under Integrated Security won’t work

• Deployment/configuration support

  • Ambari (Hortonworks)

Q & A