hive contributors meetup apache sentry


Upload: brock-noland

Post on 27-Jan-2015


Page 1: Hive contributors meetup   apache sentry

November 18th, 2013

Apache Sentry (incubating) On Hive Integration


Current State of Authorization in Hive

•  Advisory Authorization
   -  Facilitates self-regulation to safeguard against accidental changes
   -  Users can grant themselves privileges as necessary
   -  Problem: Insufficient to guard against malicious users

•  Impersonation
   -  Data is protected at the file level by HDFS permissions
   -  Problem: File-level access is not granular enough
   -  Problem: Not role-based


Authorization Requirements

•  Secure Authorization: Ability to control access to data and/or privileges on data for authenticated users

•  Fine-Grained Authorization: Ability to give users access to a subset of data in files

•  Role-Based Authorization: Ability to create/apply templatized privileges based on functional roles

•  Multi-Tenant Administration: Ability for central admin group to empower lower-level admins to manage security for each database/schema


Introducing Sentry

Authorization module for Hadoop ecosystem

•  Unlocks Key RBAC Requirements

ᵒ  Secure, fine-grained, role-based authorization

ᵒ  Multi-tenant administration

ᵒ  Open Source via Apache Incubator

ᵒ  Modular RBAC Framework

ᵒ  Multiple users in production for months


Sentry: Fine-Grained Authorization

Concepts:         Binding        Policy           Policy Provider

Implementations:  Hive Binding   Database Policy  File-based Provider
                  Solr Binding   Search Policy    File-based Provider


Sentry: Fine-Grained Authorization

•  Ability to specify privileges on
   ᵒ  SERVER, DATABASE, TABLE, VIEW, URI

•  Privilege Granularity
   ᵒ  SELECT
   ᵒ  INSERT
   ᵒ  ALL

•  Multi-Tenant Administration
   ᵒ  Administration per database


Granting Privileges

•  Example: Grant SELECT on table CUSTOMER in database SALES:

server=server1->db=sales->table=customer->action=SELECT

•  Objects are represented by a containment hierarchy
•  A privilege is granted on the leaf object and everything it contains
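To make the containment idea concrete, here is a minimal Python sketch of how a privilege string could be split into its parts and how a grant on a parent object could cover a request on a child. It is an illustration only, not Sentry's implementation: the function names `parse_privilege` and `implies` are hypothetical, and the rules that names compare case-insensitively and that a grant without an `action` part means ALL are assumptions made for this example.

```python
def parse_privilege(text):
    """Split 'k1=v1->k2=v2' into a list of lower-cased (key, value) pairs."""
    pairs = []
    for part in text.strip().split("->"):
        key, _, value = part.partition("=")
        pairs.append((key.strip().lower(), value.strip().lower()))
    return pairs


def implies(granted, requested):
    """Return True if the granted privilege covers the requested one."""
    g = parse_privilege(granted)
    r = parse_privilege(requested)
    # Separate the object path from the trailing action.
    # Assumption for this sketch: a missing action is treated as ALL.
    g_action = g.pop()[1] if g and g[-1][0] == "action" else "all"
    r_action = r.pop()[1] if r and r[-1][0] == "action" else "all"
    # Containment: the granted object path must be a prefix of the requested path.
    if len(g) > len(r) or r[:len(g)] != g:
        return False
    # ALL covers every action; otherwise the actions must match exactly.
    return g_action in ("all", r_action)


grant = "server=server1->db=sales->table=customer->action=SELECT"
print(implies(grant, "server=server1->db=sales->table=customer->action=select"))
print(implies(grant, "server=server1->db=sales->table=items->action=select"))
```

Under these assumptions, a grant on `db=sales` with `action=ALL` would also cover any table inside that database, because the database path is a prefix of the table path.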


Specifying Roles

•  A role is a collection of privileges
•  Example: A role Seller that allows SELECT on table CUSTOMER and INSERT on table ITEMS

seller_role = server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Insert
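A role definition of this shape can be read mechanically: the role name sits left of the first `=`, and the privileges are a comma-separated list that may continue across lines with a trailing backslash. The Python sketch below shows one way to do that. The helper name `parse_role` is hypothetical and this is not the parser Sentry itself uses.

```python
def parse_role(definition):
    """Return (role_name, [privilege, ...]) from a role definition string."""
    # Join continuation lines: a trailing backslash means the entry continues.
    joined = "".join(
        line.rstrip().rstrip("\\") for line in definition.splitlines()
    )
    # Only the first '=' separates the role name from its privileges;
    # later '=' characters belong to the privilege strings themselves.
    name, _, privileges = joined.partition("=")
    return name.strip(), [p.strip() for p in privileges.split(",") if p.strip()]


DEFINITION = r"""seller_role = server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Insert"""

role, privileges = parse_role(DEFINITION)
print(role)
for privilege in privileges:
    print("  ", privilege)
```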


Users and Groups

•  Works with existing authentication mechanisms
•  A group connects the authentication system with the authorization system
   ᵒ  A set of roles can be assigned to a group

analyst = sales_reporting, data_export, audit_report

•  User-to-group mapping:
   ᵒ  Using Hadoop groups
   ᵒ  Or specified locally in the sentry-site.xml file
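The chain from user to privileges can be sketched as three lookups: user to groups, group to roles, role to privileges. The Python example below is a simplified illustration under stated assumptions. The `analyst` group line comes from the slide, while the user `alice`, the privileges attached to each role, and the function name `privileges_for` are all hypothetical.

```python
# User-to-group mapping (hypothetical; in practice this would come from
# Hadoop groups or a local mapping).
USER_GROUPS = {"alice": ["analyst"]}

# Group-to-role mapping, taken from the slide's example line.
GROUP_ROLES = {"analyst": ["sales_reporting", "data_export", "audit_report"]}

# Role-to-privilege mapping (hypothetical privileges for illustration).
ROLE_PRIVILEGES = {
    "sales_reporting": ["server=server1->db=sales->table=customer->action=select"],
    "data_export": ["server=server1->db=sales->table=items->action=select"],
    "audit_report": ["server=server1->db=audit->action=select"],
}


def privileges_for(user):
    """Collect every privilege a user holds through groups and roles."""
    result = set()
    for group in USER_GROUPS.get(user, []):
        for role in GROUP_ROLES.get(group, []):
            result.update(ROLE_PRIVILEGES.get(role, []))
    return result


for privilege in sorted(privileges_for("alice")):
    print(privilege)
```

A user with no group membership ends up with an empty privilege set, which matches the idea that privileges are never attached to users directly.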


User Feedback

I have implemented Hiveserver2 Authentication (openLDAP) and Authorization (using Cloudera Sentry). I am super-excited because we know can open our Hive Data Platform in "read only" mode to remote clients in the company and SAS clients.

Source:
•  Apache [email protected]
•  Tue, 17 Sep 2013 19:10:43 GMT
•  http://s.apache.org/hive-sentry-user


Future Direction

•  Integration with other systems
•  More Granular Privileges
•  Usability Improvements


Hive Requirements

•  Sentry plugs into existing hooks such as the Semantic Analyzer hook interface
•  Changes required are minor: an estimated ~600 LOC, including unit tests


Hive Requirements

Follow Hive integration via SENTRY-67

•  HIVE-4670 - Authentication module should pass the instance part of the Kerberos principal

•  HIVE-4390 - Enable capturing input URI entities for DML statements

•  HIVE-4741 - Add Hive config API to modify the restrict list

•  HIVE-4641 - Support post execution/fetch hook for HiveServer2
