Hive Contributors Meetup: Apache Sentry
Current State of Authorization in Hive
• Advisory Authorization
  - Facilitates self-regulation to safeguard against accidental changes
  - Users can grant themselves privileges as necessary
  - Problem: Insufficient to guard against malicious users
• Impersonation
  - Data is protected at the file level by HDFS permissions
  - Problem: File-level access is not granular enough
  - Problem: Not role-based
Authorization Requirements
• Secure Authorization: Ability to control access to data and/or privileges on data for authenticated users
• Fine-Grained Authorization: Ability to give users access to a subset of the data in files
• Role-Based Authorization: Ability to create and apply templatized privileges based on functional roles
• Multi-Tenant Administration: Ability for a central admin group to empower lower-level admins to manage security for each database/schema
Introducing Sentry
Authorization module for the Hadoop ecosystem
• Unlocks key RBAC requirements
ᵒ Secure, fine-grained, role-based authorization
ᵒ Multi-tenant administration
ᵒ Open Source via Apache Incubator
ᵒ Modular RBAC Framework
ᵒ Multiple users in production for months
Sentry: Fine-Grained Authorization
Concepts and their implementations:
• Binding: Hive Binding, Solr Binding
• Policy: Database Policy (Hive), Search Policy (Solr)
• Policy Provider: File-based Provider (used by both)
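The layering above separates engine-specific bindings from reusable policy and provider pieces. The Python sketch below illustrates that separation under simplifying assumptions: every class and method name (PolicyProvider, InMemoryProvider, ExactMatchPolicy, HiveLikeBinding, authorize) is hypothetical and not Sentry's actual Java API, and the in-memory provider merely stands in for a file-based one.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class PolicyProvider(ABC):
    """Hypothetical: supplies the privilege strings granted to a set of groups."""

    @abstractmethod
    def privileges_for(self, groups: set[str]) -> set[str]: ...


class InMemoryProvider(PolicyProvider):
    """Stand-in for a file-based provider: maps group -> privilege strings."""

    def __init__(self, grants: dict[str, list[str]]):
        self.grants = grants

    def privileges_for(self, groups: set[str]) -> set[str]:
        return {p for g in groups for p in self.grants.get(g, [])}


class Policy(ABC):
    """Hypothetical: decides whether the granted privileges cover a request."""

    @abstractmethod
    def permits(self, granted: set[str], requested: str) -> bool: ...


class ExactMatchPolicy(Policy):
    """Deliberately simplistic policy: exact string match, no hierarchy."""

    def permits(self, granted: set[str], requested: str) -> bool:
        return requested in granted


class HiveLikeBinding:
    """Engine-specific glue: turns a table access into a privilege string."""

    def __init__(self, provider: PolicyProvider, policy: Policy, server: str):
        self.provider, self.policy, self.server = provider, policy, server

    def authorize(self, groups: set[str], db: str, table: str, action: str) -> bool:
        # Build the requested privilege in the same form the policy file uses.
        requested = (f"server={self.server}->db={db}"
                     f"->table={table}->action={action}").lower()
        granted = {p.lower() for p in self.provider.privileges_for(groups)}
        return self.policy.permits(granted, requested)
```

A Solr-like binding would differ only in how it builds the requested privilege, while reusing the same provider; that reuse is the point of keeping the three concepts separate.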
Sentry: Fine-Grained Authorization
• Ability to specify privileges on
  ᵒ SERVER, DATABASE, TABLE, VIEW, URI
• Privilege Granularity
  ᵒ SELECT
  ᵒ INSERT
  ᵒ ALL
• Multi-Tenant Administration
  ᵒ Administration per database
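The three privilege levels can be modeled as a small enumeration. This is a minimal sketch, assuming that ALL covers every action and that SELECT and INSERT only cover themselves; the names Action, Action.parse, and action_implies are hypothetical.

```python
from enum import Enum


class Action(Enum):
    """Privilege granularity listed on the slide."""
    SELECT = "select"
    INSERT = "insert"
    ALL = "all"

    @classmethod
    def parse(cls, text: str) -> "Action":
        # Case-insensitive, since the slides write both "SELECT" and "Select".
        return cls(text.strip().lower())


def action_implies(granted: Action, requested: Action) -> bool:
    """Assumption: ALL covers every action; otherwise actions must match."""
    return granted is Action.ALL or granted is requested
```

Note that the implication is one-way: holding SELECT does not grant INSERT, and neither grants ALL.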
Granting Privileges
• Example: Grant SELECT on table CUSTOMER in database SALES:
server=server1->db=sales->table=customer->action=SELECT
• Objects are represented by a containment hierarchy
• A privilege applies to the leaf object and everything it contains
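The containment rule can be checked by treating the part before `action=` as a path from the server down to the leaf object. A minimal sketch, assuming a granted privilege covers a request when its path is a prefix of the requested path and its action matches (or is ALL); the names parse_privilege and implies, and the default of "all" when no action is given, are assumptions for illustration.

```python
from __future__ import annotations


def parse_privilege(text: str) -> tuple[tuple, str]:
    """Split 'server=s1->db=sales->...->action=SELECT' into (object path, action)."""
    path = []
    action = "all"  # Assumption: a privilege with no action part grants every action.
    for part in text.split("->"):
        key, _, value = part.partition("=")
        key, value = key.strip().lower(), value.strip().lower()
        if key == "action":
            action = value
        else:
            path.append((key, value))
    return tuple(path), action


def implies(granted: str, requested: str) -> bool:
    """True if the granted privilege covers the requested one."""
    g_path, g_action = parse_privilege(granted)
    r_path, r_action = parse_privilege(requested)
    # Containment: the granted object must be the requested object or one of its ancestors.
    if r_path[: len(g_path)] != g_path:
        return False
    # Assumption: "all" (or "*") covers any action; otherwise actions must match.
    return g_action in ("all", "*") or g_action == r_action
```

Under this rule a database-level grant such as `server=server1->db=sales->action=SELECT` covers SELECT on every table in `sales`, while the table-level grant above does not cover the whole database.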
Specifying Roles
• Roles are collections of privileges
• Example: A role Seller that allows SELECT on table CUSTOMER and INSERT on table ITEMS:
seller_role = server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Insert
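A role line like the one above maps a role name to a comma-separated list of privileges, with a trailing backslash continuing the line. A minimal parser sketch, assuming only `name = a, b` lines, `#` comments, and backslash continuations as in the example; the function name parse_roles is hypothetical and this is not Sentry's own policy-file loader.

```python
from __future__ import annotations

import re


def parse_roles(text: str) -> dict[str, list[str]]:
    """Parse 'role = privilege, privilege' lines into role -> privilege list."""
    # Join a line ending in a backslash with the line that follows it.
    logical = re.sub(r"\\[ \t]*\n", " ", text)
    roles: dict[str, list[str]] = {}
    for line in logical.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, rhs = line.partition("=")  # split at the first '=' only
        roles[name.strip()] = [p.strip() for p in rhs.split(",") if p.strip()]
    return roles
```

Splitting at the first `=` matters here, because every privilege string on the right-hand side also contains `=`.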
Users and Groups
• Works with existing authentication mechanisms
• Groups connect the authentication system with the authorization system
  ᵒ A set of roles can be assigned to a group:
analyst = sales_reporting, data_export, audit_report
• User-to-group mapping:
  ᵒ Using Hadoop groups
  ᵒ Or specified locally in the sentry-site.xml file
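Authorization therefore resolves a user to groups, groups to roles, and roles to privileges. A minimal sketch of that resolution, assuming the three mappings are already loaded into dictionaries (the user-to-group one standing in for Hadoop groups or a local mapping); the function name effective_privileges is hypothetical.

```python
from __future__ import annotations


def effective_privileges(
    user: str,
    user_groups: dict[str, list[str]],
    group_roles: dict[str, list[str]],
    role_privileges: dict[str, list[str]],
) -> set[str]:
    """Collect every privilege reachable via user -> groups -> roles."""
    privileges: set[str] = set()
    for group in user_groups.get(user, []):      # from Hadoop groups or a local mapping
        for role in group_roles.get(group, []):  # e.g. analyst = sales_reporting, ...
            privileges.update(role_privileges.get(role, []))
    return privileges
```

Because the result is a set, a privilege granted through several roles appears once, and a user with no group mapping ends up with no privileges at all.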
User Feedback
I have implemented Hiveserver2 Authentication (openLDAP) and Authorization (using Cloudera Sentry). I am super-excited because we now can open our Hive Data Platform in "read only" mode to remote clients in the company and SAS clients.
Source: Apache Hive user mailing list, Tue, 17 Sep 2013 19:10:43 GMT (http://s.apache.org/hive-sentry-user)
Future Direction
• Integration with other systems
• More granular privileges
• Usability improvements
Hive Requirements
• Sentry plugs into existing hooks, such as the semantic analyzer hook interface
• Required changes are minor, estimated at ~600 LOC including unit tests
Hive Requirements
• Follow the Hive integration via SENTRY-67
• HIVE-4670 - Authentication module should pass the instance part of the Kerberos principal
• HIVE-4390 - Enable capturing input URI entities for DML statements
• HIVE-4741 - Add Hive config API to modify the restrict list
• HIVE-4641 - Support post execution/fetch hook for HiveServer2