big sql security demo


Upload: hadoop-dev

Post on 12-Apr-2017


© 2015 IBM Corporation

Make Hive Data More Secure with Big SQL

Paul Yip

BigInsights Product Manager

[email protected]


Watch this on YouTube (which adds a demo)…

https://www.youtube.com/watch?v=N2FN5h25-_s

Questions?

Paul Yip – [email protected]


Hive is Really 3 Things…

Storage Format, Metastore, and Execution Engine

[Diagram: Applications → SQL Execution Engine: Hive (open source), running on MapReduce → Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others… and Hive Metastore (open source)]


Hive “Execution Engine”

[Diagram: SQL → Hive → Map → Reduce → Output]

References Hive Meta Store to understand data

Translates SQL to Map Reduce
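To make "translates SQL to Map Reduce" concrete, here is a minimal Python sketch of how a query such as SELECT BRANCH_NAME, COUNT(*) … GROUP BY BRANCH_NAME could be expressed as a map phase and a reduce phase. It is an illustration only, not Hive's actual query plan; the rows and the function names are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Hive table; the data is illustrative only.
STAFF = [
    {"name": "Albert", "branch_name": "Branch_A"},
    {"name": "Bonnie", "branch_name": "Branch_B"},
    {"name": "Carlos", "branch_name": "Branch_A"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["branch_name"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values of each key, which yields COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_branch(rows):
    """Chain map, shuffle, and reduce into the equivalent of the GROUP BY query."""
    return reduce_phase(shuffle(map_phase(rows)))


print(count_by_branch(STAFF))
```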


Big SQL preserves open source foundation

Leverages Hive metastore and storage formats.

No Lock-in. Data part of Hadoop, not BigSQL. Fall back to Open Source Hive Engine at any time.

[Diagram: Applications → SQL Execution Engines: IBM BigSQL (IBM) and Hive (open source) → Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others… and Hive Metastore (open source)]


Problem: Managing privileges on users is tedious and error-prone…

GRANT SELECT, INSERT ON T1 TO USER ALBERT;

GRANT SELECT ON T2 TO USER ALBERT;

GRANT SELECT, DELETE ON T3 TO USER ALBERT;

GRANT SELECT ON T4 TO USER ALBERT;

GRANT SELECT ON T4 TO USER BONNIE;

GRANT SELECT ON T5 TO USER BONNIE;

GRANT SELECT, UPDATE, DELETE ON T6 TO USER BONNIE;

[Diagram: tables T1 to T6]


The problem magnifies with hundreds of users.

[Diagram: tables T1 to T7]

Issuing REVOKE for hundreds of users on T1 would be tedious

Issuing GRANT for hundreds of users on T6 would be tedious

New Users: have to decide what kind of access they need

Departing Users: need to clean up their access rights

New Table

Restrict Table Access


Best Practice: Role-Based Access Control

Define the roles that exist in the organization

Assign sets of privileges to roles

CREATE ROLE BRANCH_A_ROLE;

GRANT SELECT ON … TO ROLE BRANCH_A_ROLE;

GRANT SELECT ON … TO ROLE BRANCH_A_ROLE;

CREATE ROLE BRANCH_B_ROLE;

CREATE ROLE FINANCE_ROLE;

[Diagram: tables T1, T2, T3, T4, T6, T7 grouped under the roles BRANCH_A, BRANCH_B, and FINANCE]
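As a rough illustration of why roles scale better, the Python sketch below generates the statements needed under each approach and counts them. The user names, the table list, and the helper function names are hypothetical; only the GRANT and CREATE ROLE statement shapes come from the slides.

```python
def per_user_grants(users, tables):
    """One GRANT per (user, table) pair: grows as users x tables."""
    return [f"GRANT SELECT ON {t} TO USER {u};" for u in users for t in tables]


def role_grants(role, users, tables):
    """One CREATE ROLE, one GRANT per table, one GRANT ROLE per user."""
    statements = [f"CREATE ROLE {role};"]
    statements += [f"GRANT SELECT ON {t} TO ROLE {role};" for t in tables]
    statements += [f"GRANT ROLE {role} TO USER {u};" for u in users]
    return statements


def grants_for_new_table(role, users, table, use_roles):
    """Statements needed when one more table must become readable."""
    if use_roles:
        return [f"GRANT SELECT ON {table} TO ROLE {role};"]
    return [f"GRANT SELECT ON {table} TO USER {u};" for u in users]


# Hypothetical population: 200 users who all need the same three tables.
users = [f"USER{i}" for i in range(1, 201)]
tables = ["T1", "T2", "T3"]

print(len(per_user_grants(users, tables)))
print(len(role_grants("BRANCH_A_ROLE", users, tables)))
print(len(grants_for_new_table("BRANCH_A_ROLE", users, "T7", use_roles=False)))
print(len(grants_for_new_table("BRANCH_A_ROLE", users, "T7", use_roles=True)))
```

With roles, adding or restricting a table is a single statement against the role, independent of how many users hold it.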


GRANT and REVOKE roles to individuals

GRANT ROLE BRANCH_A_ROLE TO USER Albert;

GRANT ROLE BRANCH_B_ROLE TO USER Bonnie;

GRANT ROLE FINANCE_ROLE TO USER Frieda;

[Diagram: users grouped under BRANCH_A_ROLE, BRANCH_B_ROLE, and FINANCE_ROLE]


GRANT/REVOKE roles as user access needs change.

REVOKE ROLE BRANCH_A_ROLE FROM USER Albert;

GRANT ROLE BRANCH_B_ROLE TO USER Bonnie;

GRANT ROLE FINANCE_ROLE TO USER Frieda;

GRANT ROLE BRANCH_B_ROLE TO USER Albert;

Albert moves from Branch A to Branch B

[Diagram: users grouped under BRANCH_A_ROLE, BRANCH_B_ROLE, and FINANCE_ROLE, with Albert now under BRANCH_B_ROLE]
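The effect of Albert's move can be sketched in Python as a small model in which a user's privileges are always resolved through role membership. The class name, its methods, and the table-to-role assignment are assumptions made for illustration; a real engine keeps this in its catalog.

```python
from collections import defaultdict

# Hypothetical privilege sets per role; the slides do not say which tables each role covers.
ROLE_PRIVILEGES = {
    "BRANCH_A_ROLE": {("SELECT", "T1"), ("SELECT", "T2")},
    "BRANCH_B_ROLE": {("SELECT", "T3"), ("SELECT", "T4")},
    "FINANCE_ROLE": {("SELECT", "T4"), ("SELECT", "T6")},
}


class RoleCatalog:
    """Toy model: users hold roles, and roles hold privileges."""

    def __init__(self, role_privileges):
        self.role_privileges = role_privileges
        self.user_roles = defaultdict(set)

    def grant_role(self, role, user):
        if role not in self.role_privileges:
            raise KeyError(f"unknown role: {role}")
        self.user_roles[user].add(role)

    def revoke_role(self, role, user):
        self.user_roles[user].discard(role)

    def privileges_of(self, user):
        """Union of the privileges of every role the user currently holds."""
        privileges = set()
        for role in self.user_roles.get(user, set()):
            privileges |= self.role_privileges[role]
        return privileges


catalog = RoleCatalog(ROLE_PRIVILEGES)
catalog.grant_role("BRANCH_A_ROLE", "Albert")
catalog.grant_role("BRANCH_B_ROLE", "Bonnie")
before = catalog.privileges_of("Albert")

# Albert moves from Branch A to Branch B: two statements, no per-table cleanup.
catalog.revoke_role("BRANCH_A_ROLE", "Albert")
catalog.grant_role("BRANCH_B_ROLE", "Albert")
after = catalog.privileges_of("Albert")

print(sorted(before))
print(sorted(after))
```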


Problem #2: Users should only see data that matters to them…

“I need to see all the data”


Problem #3: Sensitive Data in Columns

“These users need to access the table,… but not SALARY!”


Big SQL - Dynamic Data Masking

Dynamically masked at query time based on user role

“BUT,… I DO NEED to see all the data”

“We don’t need to see salary data”


Big SQL preserves open source foundation

Leverages Hive metastore and storage formats.

No Lock-in. Data part of Hadoop, not BigSQL. Fall back to Open Source Hive Engine at any time.

[Diagram: Applications → SQL Execution Engines: IBM BigSQL (IBM) and Hive (open source) → Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others… and Hive Metastore (open source)]


Create ROLES and Assign Privileges to Roles

CREATE ROLE BRANCH_A_ROLE;

GRANT SELECT ON HR.STAFF TO ROLE BRANCH_A_ROLE;

CREATE ROLE BRANCH_B_ROLE;

GRANT SELECT ON HR.STAFF TO ROLE BRANCH_B_ROLE;

CREATE ROLE FINANCE_ROLE;

GRANT SELECT ON HR.STAFF TO ROLE FINANCE_ROLE;


Allow FINANCE_ROLE to see all rows of data

CREATE PERMISSION FINANCE_ACCESS ON HR.STAFF

FOR ROWS WHERE

VERIFY_ROLE_FOR_USER(SESSION_USER,'FINANCE_ROLE') = 1

ENFORCED FOR ALL ACCESS

ENABLE;


Allow BRANCH_A_ROLE to see Branch_A data only

CREATE PERMISSION BRANCH_A_ACCESS ON HR.STAFF

FOR ROWS WHERE

(

VERIFY_ROLE_FOR_USER(SESSION_USER,'BRANCH_A_ROLE') = 1

AND

HR.STAFF.BRANCH_NAME = 'Branch_A'

)

ENFORCED FOR ALL ACCESS

ENABLE;


Allow BRANCH_B_ROLE to see Branch_B data only

CREATE PERMISSION BRANCH_B_ACCESS ON HR.STAFF

FOR ROWS WHERE

(

VERIFY_ROLE_FOR_USER(SESSION_USER,'BRANCH_B_ROLE') = 1

AND

HR.STAFF.BRANCH_NAME = 'Branch_B'

)

ENFORCED FOR ALL ACCESS

ENABLE;
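The three permissions above can be modelled in Python to show how they interact. This sketch assumes Db2-style semantics, in which the enabled row permissions on a table are combined with OR once row access control is activated, so a user with no matching permission sees no rows. The sample rows, the user-to-role mapping, and the function names are made up for illustration.

```python
# Hypothetical sample of HR.STAFF; names and salaries are invented.
STAFF = [
    {"name": "Ann", "branch_name": "Branch_A", "salary": 50000},
    {"name": "Bob", "branch_name": "Branch_B", "salary": 60000},
    {"name": "Cy", "branch_name": "Branch_A", "salary": 55000},
]

# Role membership as set up by the GRANT ROLE statements in the slides.
USER_ROLES = {
    "Albert": {"BRANCH_A_ROLE"},
    "Bonnie": {"BRANCH_B_ROLE"},
    "Frieda": {"FINANCE_ROLE"},
}


def has_role(user, role):
    """Stand-in for VERIFY_ROLE_FOR_USER(SESSION_USER, role) = 1."""
    return role in USER_ROLES.get(user, set())


# One predicate per CREATE PERMISSION statement.
PERMISSIONS = [
    # FINANCE_ACCESS: no row condition, so every row qualifies.
    lambda user, row: has_role(user, "FINANCE_ROLE"),
    # BRANCH_A_ACCESS
    lambda user, row: has_role(user, "BRANCH_A_ROLE")
    and row["branch_name"] == "Branch_A",
    # BRANCH_B_ACCESS
    lambda user, row: has_role(user, "BRANCH_B_ROLE")
    and row["branch_name"] == "Branch_B",
]


def visible_rows(user, rows=STAFF):
    """A row is returned if at least one permission evaluates to true for it."""
    return [row for row in rows if any(p(user, row) for p in PERMISSIONS)]


for user in ("Albert", "Bonnie", "Frieda", "Mallory"):
    print(user, [row["name"] for row in visible_rows(user)])
```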


GRANT and REVOKE roles to individuals

GRANT ROLE BRANCH_A_ROLE TO USER Albert;

GRANT ROLE BRANCH_B_ROLE TO USER Bonnie;

GRANT ROLE FINANCE_ROLE TO USER Frieda;

[Diagram: users grouped under BRANCH_A_ROLE, BRANCH_B_ROLE, and FINANCE_ROLE]

ALTER TABLE HR.STAFF

ACTIVATE ROW ACCESS CONTROL;


SELECT FROM HR.STAFF …..

Branch specific data only…

BRANCH_A_ROLE

BRANCH_B_ROLE


SELECT FROM HR.STAFF …..

Finance can see all branches

FINANCE_ROLE


That was great! But, …

WE CAN MAKE THIS EVEN BETTER…


SELECT FROM HR.STAFF …..

BRANCH_A_ROLE

BRANCH_B_ROLE

Hmm… Salary seems pretty sensitive


SALARY is very sensitive.

We should hide that except from users in Finance…

CREATE MASK SALARY_MASK ON HR.STAFF

FOR COLUMN SALARY RETURN

CASE

WHEN VERIFY_ROLE_FOR_USER(SESSION_USER,'FINANCE_ROLE') = 1

THEN SALARY

ELSE NULL

END

ENABLE;

ALTER TABLE HR.STAFF

ACTIVATE COLUMN ACCESS CONTROL;
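The CASE expression in SALARY_MASK can be read as a per-row function of the current user and the stored value. The Python sketch below mirrors that logic; the sample rows, the role mapping, and the function names are hypothetical, and None stands in for SQL NULL.

```python
# Hypothetical sample of HR.STAFF; names and salaries are invented.
STAFF = [
    {"name": "Ann", "branch_name": "Branch_A", "salary": 50000},
    {"name": "Bob", "branch_name": "Branch_B", "salary": 60000},
]

USER_ROLES = {
    "Albert": {"BRANCH_A_ROLE"},
    "Frieda": {"FINANCE_ROLE"},
}


def has_role(user, role):
    """Stand-in for VERIFY_ROLE_FOR_USER(SESSION_USER, role) = 1."""
    return role in USER_ROLES.get(user, set())


def salary_mask(user, salary):
    """Mirror of the CASE expression: real value for Finance, NULL otherwise."""
    return salary if has_role(user, "FINANCE_ROLE") else None


def apply_masks(user, rows):
    """Return copies of the rows with SALARY replaced by its masked value."""
    return [{**row, "salary": salary_mask(user, row["salary"])} for row in rows]


print(apply_masks("Albert", STAFF))
print(apply_masks("Frieda", STAFF))
```

The stored data is untouched; only the value returned to the querying user changes, which is why the same table can serve both audiences.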


Combined Row and Column security

BRANCH_A_ROLE

BRANCH_B_ROLE


Finance can see SALARY and all branch data

FINANCE_ROLE


Summary

Big SQL preserves Hive’s open-source storage model and metastore

Use Big SQL as the execution engine:

ROLE BASED Access Control

ROW LEVEL Security (dynamic filtering)

COLUMN LEVEL security (dynamic masking)

Big SQL makes access to Hive data Faster and More Secure


Recap - Big SQL preserves open source foundation

[Diagram: Applications → SQL Execution Engines: IBM BigSQL (IBM) and Hive (open source) → Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others… and Hive Metastore (open source)]

Big SQL Makes Hive FASTER and more SECURE