big sql security demo
© 2015 IBM Corporation
Make Hive Data More Secure with Big SQL
Paul Yip
BigInsights Product Manager
Watch this on YouTube (which adds a demo)…
https://www.youtube.com/watch?v=N2FN5h25-_s
Questions?
Paul Yip – [email protected]
Hive is Really 3 Things…
Storage Format, Metastore, and Execution Engine
[Diagram: Applications; SQL Execution Engine: Hive (open source), on MapReduce; Hive Storage Model (open source): CSV, tab-delimited, Parquet, RC, others…; Hive Metastore (open source)]
Hive “Execution Engine”
[Diagram: SQL, Hive, Map, Reduce, Output]
Hive references the Hive Metastore to understand the data
Hive translates SQL to MapReduce
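To make the translation step concrete, here is a minimal HiveQL sketch. The table HR.STAFF and the column BRANCH_NAME are borrowed from later slides and are assumed to already be registered in the metastore. EXPLAIN asks Hive to print its plan instead of running the query; on the MapReduce engine, a GROUP BY like this one is typically planned as a map phase followed by a reduce phase.

```sql
-- HiveQL sketch. Assumes a table HR.STAFF with a BRANCH_NAME column
-- is already defined in the Hive Metastore.
EXPLAIN
SELECT BRANCH_NAME, COUNT(*) AS STAFF_COUNT
FROM HR.STAFF
GROUP BY BRANCH_NAME;
```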
Big SQL preserves open source foundation
Leverages Hive metastore and storage formats.
No lock-in. Data is part of Hadoop, not Big SQL. Fall back to the open source Hive engine at any time.
[Diagram: Applications; SQL Execution Engines: IBM Big SQL (IBM) and Hive (open source); Hive Storage Model (open source): CSV, tab-delimited, Parquet, RC, others…; Hive Metastore (open source)]
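As a rough illustration of what "leverages Hive metastore and storage formats" can look like in practice, the sketch below defines a table through Big SQL. The column list is an assumption: only BRANCH_NAME and SALARY appear later in this deck, while ID and NAME are hypothetical. The storage clauses are one plausible choice, not the configuration used in the demo.

```sql
-- Sketch only. ID and NAME are hypothetical columns; the delimited text
-- format is an assumed choice among the Hive storage formats.
CREATE HADOOP TABLE HR.STAFF (
    ID          INT,
    NAME        VARCHAR(50),
    BRANCH_NAME VARCHAR(20),
    SALARY      DECIMAL(10,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

Because the definition is recorded in the Hive Metastore and the data files stay in Hadoop, the same table should remain readable from the open source Hive engine.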
Problem: Managing privileges on users is tedious and error-prone…
GRANT SELECT, INSERT ON T1 TO USER ALBERT;
GRANT SELECT ON T2 TO USER ALBERT;
GRANT SELECT, DELETE ON T3 TO USER ALBERT;
GRANT SELECT ON T4 TO USER ALBERT;
GRANT SELECT ON T4 TO USER BONNIE;
GRANT SELECT ON T5 TO USER BONNIE;
GRANT SELECT, UPDATE, DELETE ON T6 TO USER BONNIE;
[Diagram: tables T1 to T6, with privileges granted directly to individual users]
The problem magnifies with hundreds of users.
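[Diagram: tables T1 to T7 with hundreds of users; callouts “New Table” and “Restrict Table Access”]
Restricting access to a table: REVOKE for hundreds of users on T1 would be tedious
Adding a new table: GRANT for hundreds of users on T6 would be tedious
New users: someone has to decide what kind of access they need
Departing users: their access rights need to be cleaned up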
Best Practice: Role-Based Access Control
Define the roles that exist in the organization
Assign sets of privileges to roles
CREATE ROLE BRANCH_A_ROLE;
GRANT SELECT ON … TO ROLE BRANCH_A_ROLE;
GRANT SELECT ON … TO ROLE BRANCH_A_ROLE;
CREATE ROLE BRANCH_B_ROLE;
…
CREATE ROLE FINANCE_ROLE;
…
[Diagram: tables T1, T2, T3, T4, T6, and T7 grouped under the roles BRANCH_A, BRANCH_B, and FINANCE]
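A short sketch of why roles ease the maintenance burden: once privileges are attached to a role, opening up a new table or restricting an old one takes one statement, however many users hold the role. The pairing of T6 and T1 with BRANCH_B_ROLE is purely hypothetical; the slides do not say which tables belong to which role.

```sql
-- Hypothetical mapping: members of BRANCH_B_ROLE need the new table T6
GRANT SELECT ON T6 TO ROLE BRANCH_B_ROLE;

-- Hypothetical mapping: members of BRANCH_B_ROLE should lose access to T1
REVOKE SELECT ON T1 FROM ROLE BRANCH_B_ROLE;
```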
GRANT and REVOKE roles to individuals
GRANT ROLE BRANCH_A_ROLE TO USER Albert;
GRANT ROLE BRANCH_B_ROLE TO USER Bonnie;
GRANT ROLE FINANCE_ROLE TO USER Frieda;
[Diagram: Albert in BRANCH_A_ROLE, Bonnie in BRANCH_B_ROLE, Frieda in FINANCE_ROLE]
GRANT/REVOKE roles as user access needs change.
Albert moves from Branch A to Branch B:
REVOKE ROLE BRANCH_A_ROLE FROM USER Albert;
GRANT ROLE BRANCH_B_ROLE TO USER Albert;
Grants for Bonnie and Frieda are unchanged:
GRANT ROLE BRANCH_B_ROLE TO USER Bonnie;
GRANT ROLE FINANCE_ROLE TO USER Frieda;
[Diagram: Albert and Bonnie in BRANCH_B_ROLE, Frieda in FINANCE_ROLE; BRANCH_A_ROLE no longer includes Albert]
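One way to confirm the move took effect is to look at the role grants recorded in the catalog. This sketch assumes the Db2-style catalog view SYSCAT.ROLEAUTH is available in the Big SQL environment, and that the user name is stored in uppercase because it was not quoted in the GRANT statement.

```sql
-- Sketch: list the roles currently granted to Albert.
-- Assumes SYSCAT.ROLEAUTH is available and the name is stored as 'ALBERT'.
SELECT GRANTEE, ROLENAME
FROM SYSCAT.ROLEAUTH
WHERE GRANTEE = 'ALBERT';
```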
Problem #2: Users should only see data that matters to them…
[Speech bubble: “I need to see all the data”]
Problem #3: Sensitive Data in Columns
“These users need to access the table,… but not SALARY!”
Big SQL - Dynamic Data Masking
Dynamically masked at query time based on user role
[Speech bubbles: “BUT,… I DO NEED to see all the data” and “We don’t need to see salary data”]
Create ROLES and Assign Privileges to Roles
CREATE ROLE BRANCH_A_ROLE;
GRANT SELECT ON HR.STAFF TO ROLE BRANCH_A_ROLE;
CREATE ROLE BRANCH_B_ROLE;
GRANT SELECT ON HR.STAFF TO ROLE BRANCH_B_ROLE;
CREATE ROLE FINANCE_ROLE;
GRANT SELECT ON HR.STAFF TO ROLE FINANCE_ROLE;
Allow FINANCE_ROLE to see all rows of data
CREATE PERMISSION FINANCE_ACCESS ON HR.STAFF
FOR ROWS WHERE
VERIFY_ROLE_FOR_USER(SESSION_USER,'FINANCE_ROLE') = 1
ENFORCED FOR ALL ACCESS
ENABLE;
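The permission above hinges on VERIFY_ROLE_FOR_USER, so it can help to evaluate that function on its own for the connected user. A minimal sketch: run it from a session logged in as the user being checked. It should return 1 if that user holds FINANCE_ROLE and 0 otherwise.

```sql
-- Sketch: check whether the current session user holds FINANCE_ROLE
VALUES VERIFY_ROLE_FOR_USER(SESSION_USER, 'FINANCE_ROLE');
```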
Allow BRANCH_A_ROLE to see Branch_A data only
CREATE PERMISSION BRANCH_A_ACCESS ON HR.STAFF
FOR ROWS WHERE
(
VERIFY_ROLE_FOR_USER(SESSION_USER,'BRANCH_A_ROLE') = 1
AND
HR.STAFF.BRANCH_NAME = 'Branch_A'
)
ENFORCED FOR ALL ACCESS
ENABLE;
Allow BRANCH_B_ROLE to see Branch_B data only
CREATE PERMISSION BRANCH_B_ACCESS ON HR.STAFF
FOR ROWS WHERE
(
VERIFY_ROLE_FOR_USER(SESSION_USER,'BRANCH_B_ROLE') = 1
AND
HR.STAFF.BRANCH_NAME = 'Branch_B'
)
ENFORCED FOR ALL ACCESS
ENABLE;
GRANT and REVOKE roles to individuals
GRANT ROLE BRANCH_A_ROLE TO USER Albert;
GRANT ROLE BRANCH_B_ROLE TO USER Bonnie;
GRANT ROLE FINANCE_ROLE TO USER Frieda;
[Diagram: Albert in BRANCH_A_ROLE, Bonnie in BRANCH_B_ROLE, Frieda in FINANCE_ROLE]
ALTER TABLE HR.STAFF
ACTIVATE ROW ACCESS CONTROL;
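After activation, it may be worth checking that the permissions exist and are enabled. This sketch assumes the Db2-style catalog view SYSCAT.CONTROLS is exposed in the Big SQL environment; the column names follow the Db2 catalog and should be verified against the installed release.

```sql
-- Sketch: list the row permissions and column masks defined on HR.STAFF.
-- Assumes SYSCAT.CONTROLS is available with these Db2 column names.
SELECT CONTROLNAME, CONTROLTYPE, ENABLE, VALID
FROM SYSCAT.CONTROLS
WHERE TABSCHEMA = 'HR'
  AND TABNAME = 'STAFF';
```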
SELECT … FROM HR.STAFF
Branch-specific data only…
[Figure: query results for BRANCH_A_ROLE and BRANCH_B_ROLE]
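To see why each user gets a different result from the same query, it helps to write out the filter the engine applies. In Db2-style row access control, the search conditions of all enabled permissions on a table are combined with OR. The second statement below is a conceptual equivalent for illustration only, not something a user would type.

```sql
-- What the user writes:
SELECT * FROM HR.STAFF;

-- Conceptual equivalent once row access control is active,
-- with the three enabled permissions combined with OR:
SELECT *
FROM HR.STAFF
WHERE VERIFY_ROLE_FOR_USER(SESSION_USER, 'FINANCE_ROLE') = 1
   OR (VERIFY_ROLE_FOR_USER(SESSION_USER, 'BRANCH_A_ROLE') = 1
       AND BRANCH_NAME = 'Branch_A')
   OR (VERIFY_ROLE_FOR_USER(SESSION_USER, 'BRANCH_B_ROLE') = 1
       AND BRANCH_NAME = 'Branch_B');
```

A user who holds none of the three roles matches no branch of the filter and sees no rows.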
SELECT … FROM HR.STAFF
[Figure: query results for BRANCH_A_ROLE and BRANCH_B_ROLE]
Hmm… salary seems pretty sensitive
SALARY is very sensitive.
We should hide that except from users in Finance…
CREATE MASK SALARY_MASK ON HR.STAFF
FOR COLUMN SALARY RETURN
CASE
WHEN VERIFY_ROLE_FOR_USER(SESSION_USER,'FINANCE_ROLE') = 1
THEN SALARY
ELSE NULL
END
ENABLE;
ALTER TABLE HR.STAFF
ACTIVATE COLUMN ACCESS CONTROL;
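The mask can be read as a rewrite of every reference to SALARY at query time. The sketch below shows a query and its conceptual equivalent; NAME is a hypothetical column, since the slides only confirm BRANCH_NAME and SALARY.

```sql
-- What the user writes (NAME is an assumed column):
SELECT NAME, BRANCH_NAME, SALARY
FROM HR.STAFF;

-- Conceptual equivalent once column access control is active:
SELECT NAME,
       BRANCH_NAME,
       CASE
         WHEN VERIFY_ROLE_FOR_USER(SESSION_USER, 'FINANCE_ROLE') = 1
         THEN SALARY
         ELSE NULL
       END AS SALARY
FROM HR.STAFF;
```

Members of FINANCE_ROLE see the real values. Everyone else sees NULL in the SALARY column, and the data stored in Hadoop is not changed.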
Summary
Big SQL preserves Hive’s open-source storage model and metastore
Use Big SQL as the execution engine:
ROLE BASED Access Control
ROW LEVEL Security (dynamic filtering)
COLUMN LEVEL security (dynamic masking)
Big SQL makes access to Hive data Faster and More Secure