Governance, Risk Management and Compliance in Hadoop

Mark Donsky, Director of Product Management, Cloudera

Post on 20-May-2020


Page 1 (files.meetup.com/12611842/Hadoop Talks - Navigator.pdf)

Governance, Risk Management and Compliance in Hadoop
Mark Donsky, Director of Product Management, Cloudera

Page 2

Big Data Security Breaches

Page 3

Definitions

• Governance: How do I ensure that data is sufficiently complete and accurate?
• Compliance: How do I ensure that data is accessed according to (legal) requirements, such as PCI, HIPAA and NIST?
• Risk Management: How do I identify risks that might adversely affect my ability to govern or comply?

Page 4

The Rise of Governance, Risk Management and Compliance (GRC) in Hadoop

Lots of data landing in Hadoop
• Huge quantities
• Many different sources, structured & unstructured
• Varying levels of sensitivity

Many users working with the data in multiple ways
• Users: Compliance Officers, Analysts, Data Scientists, Business Users
• Tools: BI tools, ETL tools, Hue, and more

Need to effectively control & consume data
• Get visibility & control over the environment
• Discover, explore and consume data

Page 5

GRC Requirements

Auditing and Access Management
• View, grant and revoke permissions across the Hadoop stack
• Identify access to a data asset around the time of a security breach
• Generate an alert when a restricted data asset is accessed

Lineage
• Given a data set, trace back to the original source
• Understand the downstream impact of purging/modifying a data set

Metadata Tagging and Discovery
• Search through metadata to find data sets of interest
• Given a data set, view schema, metadata and policies

Page 6

Auditing

Page 7

Why is auditing important?

• Who accessed a particular file or table?
• Who was denied access to a particular file or table?
• Who ran queries on a particular table?
• What did someone try to do during a security breach?

Page 8

Hadoop Audit Logs

Component            Location (CDH)
HDFS Audit Logs      /var/log/hadoop-hdfs/audit
Hive Audit Logs      /var/log/hive/audit
Impala Audit Logs    /var/log/impalad/audit
HBase Audit Logs     /var/log/hbase/audit

Page 9

HDFS and Hive Audit Logs

• Logs all file system access requests
• Impala, HBase and other components use a similar format
• Implemented in log4j at the INFO level

HDFS Property: log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit

HDFS Audit Log

    {
      "allowed": true,
      "serviceName": "HDFS-1",
      "username": "training",
      "src": "/user",
      "eventTime": 1398544478141,
      "ipAddress": "10.20.187.39",
      "operation": "getfileinfo",
      "dest": null,
      "permissions": null,
      "impersonator": null,
      "delegationTokenId": null
    }
    {
      "allowed": false,
      "serviceName": "HDFS-1",
      "username": "training",
      "src": "/user/test",
      "eventTime": 1398544478187,
      "ipAddress": "10.20.187.39",
      "operation": "mkdirs",
      "dest": null,
      "permissions": null,
      "impersonator": null,
      "delegationTokenId": null
    }

Hive Audit Log

    {
      "serviceName": "HIVE-1",
      "username": "admin",
      "impersonator": null,
      "ipAddress": "10.20.187.39",
      "operation": "QUERY",
      "eventTime": 1398402718797,
      "operationText": "select count(*) from salesdata",
      "allowed": true,
      "databaseName": "default",
      "tableName": "salesdata",
      "resourcePath": "/user/hive/warehouse/salesdata",
      "objectType": "TABLE"
    }
    {
      "serviceName": "HIVE-1",
      "username": "admin",
      "impersonator": null,
      "ipAddress": "10.20.187.39",
      "operation": "QUERY",
      "eventTime": 1398402762830,
      "operationText": "select s_zip, count(*) from salesdata group by s_zip",
      "allowed": true,
      "databaseName": "default",
      "tableName": "salesdata",
      "resourcePath": "/user/hive/warehouse/salesdata",
      "objectType": "TABLE"
    }
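The auditing questions raised earlier (for example, who was denied access to a particular file) can be answered from records like these with a small script. The following is a minimal sketch, assuming the audit records are available as back-to-back JSON objects with the field names shown on this slide; the helper names `iter_audit_events` and `denied_events` are illustrative and not part of Hadoop or Cloudera Navigator.

```python
import json


def iter_audit_events(text):
    """Yield each JSON object from a string of back-to-back audit records."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        event, pos = decoder.raw_decode(text, pos)
        yield event


def denied_events(events, path_prefix):
    """Return events that were denied on paths under path_prefix."""
    return [
        e for e in events
        if not e.get("allowed") and (e.get("src") or "").startswith(path_prefix)
    ]


# Two records modeled on the HDFS sample from the slide (null fields omitted).
SAMPLE = """
{"allowed": true, "serviceName": "HDFS-1", "username": "training",
 "src": "/user", "eventTime": 1398544478141,
 "ipAddress": "10.20.187.39", "operation": "getfileinfo"}
{"allowed": false, "serviceName": "HDFS-1", "username": "training",
 "src": "/user/test", "eventTime": 1398544478187,
 "ipAddress": "10.20.187.39", "operation": "mkdirs"}
"""

denied = denied_events(iter_audit_events(SAMPLE), "/user")
for e in denied:
    print(e["username"], e["operation"], e["src"])
```

Real audit files may wrap each record in a log4j line prefix, in which case the prefix would need to be stripped before decoding.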

Page 10

Hue Job Status

Page 11

Auditing Summary

• Hadoop components maintain a complete audit log, but the logs are:
  • Difficult to parse
  • Stored in different locations
  • Limited to chronological organization
  • Difficult to integrate with enterprise infrastructure
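To illustrate what overcoming these limitations involves, the sketch below merges events from two components into one normalized, per-user timeline. It assumes events have already been parsed into dictionaries with the field names from the HDFS and Hive samples; `normalize` and `index_by_user` are hypothetical helpers, not an existing API.

```python
from collections import defaultdict
from datetime import datetime, timezone


def normalize(event):
    """Map a component-specific audit event onto one common record shape."""
    return {
        "service": event["serviceName"],
        "user": event["username"],
        "operation": event["operation"],
        # HDFS events carry "src"; Hive events carry "resourcePath".
        "resource": event.get("src") or event.get("resourcePath"),
        "allowed": event["allowed"],
        # eventTime is in milliseconds since the epoch in the samples.
        "time": datetime.fromtimestamp(event["eventTime"] / 1000, tz=timezone.utc),
    }


def index_by_user(*event_streams):
    """Merge several event streams and group them by user, oldest first."""
    by_user = defaultdict(list)
    for stream in event_streams:
        for event in stream:
            record = normalize(event)
            by_user[record["user"]].append(record)
    for records in by_user.values():
        records.sort(key=lambda r: r["time"])
    return dict(by_user)


HDFS_EVENTS = [
    {"allowed": False, "serviceName": "HDFS-1", "username": "training",
     "src": "/user/test", "eventTime": 1398544478187, "operation": "mkdirs"},
    {"allowed": True, "serviceName": "HDFS-1", "username": "training",
     "src": "/user", "eventTime": 1398544478141, "operation": "getfileinfo"},
]
HIVE_EVENTS = [
    {"allowed": True, "serviceName": "HIVE-1", "username": "admin",
     "resourcePath": "/user/hive/warehouse/salesdata",
     "eventTime": 1398402718797, "operation": "QUERY"},
]

timeline = index_by_user(HDFS_EVENTS, HIVE_EVENTS)
for user, records in timeline.items():
    for r in records:
        print(user, r["time"].isoformat(), r["service"], r["operation"], r["resource"])
```

Grouping by user is only one possible index; the same normalized records could equally be grouped by resource or by service.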

Page 12

Metadata and Discovery

Page 13

Why is metadata important?

Technical Metadata
• Describes the information required to access the data, such as where the data resides or the structure of the data in its native environment
• Allows you to draw relations between disparate data sets like "emp_sal", "salary", "sal"

Business Metadata
• Details business-related information about the data, such as keywords related to the meta object or notes about the meta object
• Allows you to annotate data for your users and retrieve data based on business context (e.g., all data related to a clinical trial)

Page 14

What kind of technical metadata is available?

Hive
• Query Text
• Table name
• Column name
• Data Type
• Owner
• Partitions

Pig
• Script name
• Owner
• Creation date
• Last modified date

HDFS
• Permissions
• Owner
• Group
• Creation date
• Last modified date

MapReduce, YARN
• JobID
• Mapper Class
• Reducer Class
• Inputs
• Outputs

Page 15

Where is technical metadata located?

Component     Metadata
HDFS          fsimage (ls -lRa /)
Hive          Hive Metastore Server (database metadata tables)
MapReduce     JobTracker
YARN          Job History Server
Oozie         Oozie Server
Pig           JobTracker, Job History Server
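As a small illustration of pulling HDFS technical metadata out of a recursive listing, the sketch below parses lines in the layout printed by `hdfs dfs -ls -R /` (permissions, replication, owner, group, size, date, time, path). The sample lines and the `HdfsEntry` and `parse_ls_line` names are assumptions made for this example.

```python
from typing import NamedTuple, Optional


class HdfsEntry(NamedTuple):
    permissions: str
    owner: str
    group: str
    size: int
    modified: str
    path: str
    is_dir: bool


def parse_ls_line(line: str) -> Optional[HdfsEntry]:
    """Parse one listing line; return None for headers such as 'Found 3 items'."""
    # maxsplit=7 keeps paths that contain spaces in one piece.
    parts = line.strip().split(None, 7)
    if len(parts) != 8:
        return None
    perms, _replication, owner, group, size, date, time, path = parts
    return HdfsEntry(perms, owner, group, int(size), f"{date} {time}",
                     path, perms.startswith("d"))


# Hypothetical listing output used only to exercise the parser.
LISTING = """Found 2 items
drwxr-xr-x   - hdfs  supergroup          0 2014-04-26 13:34 /user
-rw-r--r--   3 admin hive          1048576 2014-04-24 22:11 /user/hive/warehouse/salesdata/sales.csv
"""

entries = [e for e in map(parse_ls_line, LISTING.splitlines()) if e is not None]
for e in entries:
    print(e.path, e.owner, e.group, e.permissions, e.modified)
```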

Page 16

Technical metadata: Hive Metastore

• The Hive Metastore provides a SQL-like querying capability for its own tables
• Restricted to Hive tables, not structured HDFS files

Page 17

Technical metadata: HCatalog and WebHCat

• HCatalog extends the Hive metastore with non-Hive structured data stored in HDFS
• Abstracts the file location and storage format
• Makes formats available to Pig, Hive, MapReduce, etc.
• WebHCat is a RESTful interface to HCatalog

    # hcat -e "describe salesdata"
    s_num          float     None
    s_borough      int       None
    s_neighbor     string    None
    s_b_class      string    None
    s_c_p          string    None
    s_block        string    None
    s_lot          string    None
    s_easement     string    None
    w_c_p_2        string    None
    s_address      string    None
    s_app_num      string    None
    s_zip          string    None
    s_res_units    string    None
    s_com_units    string    None
    s_tot_units    int       None
    s_sq_ft        float     None
    s_g_sq_ft      float     None
    s_yr_built     int       None
    s_tax_c        int       None
    s_b_class2     string    None
    s_price        float     None
    s_sales_dt     string    None
    Time taken: 1.847 seconds
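The same table description can also be requested over REST through WebHCat. The sketch below builds the request URL for WebHCat's table-describe resource (`/templeton/v1/ddl/database/<db>/table/<table>`) and reads column names from a response. The host name, the default port 50111, the user name, and the exact response shape (a `columns` list of `name`/`type` objects) are assumptions to verify against the WebHCat reference for your version.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def describe_table_url(host, database, table, user, port=50111):
    """Build the WebHCat URL that describes one table."""
    path = "/templeton/v1/ddl/database/{}/table/{}".format(
        quote(database, safe=""), quote(table, safe=""))
    query = urlencode({"user.name": user})
    return "http://{}:{}{}?{}".format(host, port, path, query)


def columns_from_response(payload):
    """Extract (name, type) pairs from a decoded describe-table response."""
    return [(c["name"], c["type"]) for c in payload.get("columns", [])]


def describe_table(host, database, table, user):
    """Fetch and decode the description; needs a reachable WebHCat server."""
    with urlopen(describe_table_url(host, database, table, user)) as resp:
        return columns_from_response(json.load(resp))


# Offline demonstration with a hypothetical host and a hand-written response.
url = describe_table_url("webhcat.example.com", "default", "salesdata", "training")
sample_response = {
    "database": "default",
    "table": "salesdata",
    "columns": [{"name": "s_zip", "type": "string"},
                {"name": "s_price", "type": "float"}],
}
print(url)
print(columns_from_response(sample_response))
```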

Page 18

What about Business Metadata?

• Business metadata = custom tags + key value pairs
  • Business-focused terms that make sense to end users and data custodians
  • May adhere to standards such as CDISC
  • Required for regulatory compliance (e.g., Basel II, SOX)

• Solves the following problems:
  • Show me everything related to clinical trial X
  • Gather all recorded customer calls about denied credit
  • Collect all credit information about customer Z
  • Provide consistent naming to similar columns (e.g., emp_salary, salary, sal → salary)

• Must integrate with existing business metadata stored in products from Informatica, Data Advantage Group, etc.

• Hadoop does not provide business metadata
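The "custom tags + key value pairs" model can be sketched as a small in-memory catalog. This is only an illustration of the data structure, assuming assets are identified by path or table name; `Catalog`, `BusinessMetadata`, the sample assets, and the property names are hypothetical and do not represent how any product stores business metadata.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessMetadata:
    tags: set = field(default_factory=set)          # free-form labels
    properties: dict = field(default_factory=dict)  # key value pairs


class Catalog:
    def __init__(self):
        self._entries = {}

    def annotate(self, asset, tags=(), **properties):
        """Attach tags and key value pairs to an asset (file, table, column)."""
        entry = self._entries.setdefault(asset, BusinessMetadata())
        entry.tags.update(tags)
        entry.properties.update(properties)

    def find(self, tag=None, **properties):
        """Return assets carrying the tag and all given key value pairs."""
        matches = []
        for asset, meta in self._entries.items():
            if tag is not None and tag not in meta.tags:
                continue
            if all(meta.properties.get(k) == v for k, v in properties.items()):
                matches.append(asset)
        return sorted(matches)


# Consistent naming for similar columns, as in the slide's salary example.
COLUMN_SYNONYMS = {"emp_salary": "salary", "sal": "salary", "salary": "salary"}

catalog = Catalog()
catalog.annotate("/data/trials/x/results.csv", tags={"clinical-trial"}, trial="X", phi=True)
catalog.annotate("default.trial_sites", tags={"clinical-trial"}, trial="X")
catalog.annotate("default.salesdata", tags={"sales"})

# "Show me everything related to clinical trial X"
print(catalog.find(tag="clinical-trial", trial="X"))
```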

Page 19

Business metadata examples

Pharmaceuticals/Healthcare
• Trial site ID
• Participant ID
• HIPAA metadata such as PHI flag

Financial Services
• Account number
• Sensitivity level
• Data origin
• PCI metadata such as PII flag

Page 20

Lineage

Page 21

Why is lineage important?

• Lineage:
  • Identifies the files, tables, columns, and transformations that have an impact on a selected table or column

• Answers the following questions:
  • Impact analysis: What happens if I delete a file, table, column, etc.?
  • Governance: What analyses were performed on sensitive data?
  • Data integrity: What data sources were used to generate a particular analysis?

• However:
  • Lineage is very complex to determine in Hadoop
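Once lineage relationships are known, both impact analysis and source tracing reduce to graph traversal. The sketch below models lineage as a directed graph and walks it breadth-first in either direction. It assumes the edges have already been collected; extracting them from Hadoop jobs is the hard part the slide refers to and is not shown. `LineageGraph` and the asset names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    def __init__(self):
        self._down = defaultdict(set)  # source -> assets derived from it
        self._up = defaultdict(set)    # asset -> its direct sources

    def add_edge(self, source, target):
        """Record that target was produced from source."""
        self._down[source].add(target)
        self._up[target].add(source)

    @staticmethod
    def _reach(start, edges):
        """Breadth-first search returning every node reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, asset):
        """Data integrity: which sources fed this asset?"""
        return self._reach(asset, self._up)

    def downstream(self, asset):
        """Impact analysis: what is affected if this asset changes?"""
        return self._reach(asset, self._down)


graph = LineageGraph()
graph.add_edge("/raw/sales.csv", "default.salesdata")
graph.add_edge("default.salesdata", "default.sales_by_zip")
graph.add_edge("default.zipcodes", "default.sales_by_zip")
graph.add_edge("default.sales_by_zip", "/reports/sales_by_zip.csv")

print(sorted(graph.upstream("default.sales_by_zip")))
print(sorted(graph.downstream("/raw/sales.csv")))
```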

Page 22

Cloudera Navigator

Page 23

Cloudera Enterprise Data Hub: Key Attributes

1. Secure & Compliant
• Robust access controls
• Data encryption options
• Shared security policies

2. Enterprise Data Governance
• Metadata management
• Data lineage/tethering
• Audit histories

3. Unified & Manageable
• Common storage & resource management
• On-prem, cloud & managed service
• Highly available (including DR)

4. Open Architecture
• Open source platform
• APIs & engines for multiple workloads
• Extensible for 3rd parties

[Diagram: Enterprise Data Hub. Unified scale-out storage for any type of data (elastic, fault-tolerant, self-healing, in-memory capabilities); resource management; engines: online NoSQL DBMS, analytic MPP DBMS, search engine, batch processing, stream processing, machine learning; access: SQL, streaming, file system (NFS); system management; data management; metadata, security, audit, lineage.]

Page 24

Cloudera Navigator

Auditing and Access Management
• View, grant and revoke permissions across the Hadoop stack
• Identify access to a data asset around the time of a security breach
• Generate an alert when a restricted data asset is accessed

Lineage
• Given a data set, trace back to the original source
• Understand the downstream impact of purging/modifying a data set

Metadata Tagging and Discovery
• Search through metadata to find data sets of interest
• Given a data set, view schema, metadata and policies

Page 25

Demo

Page 26

©2014 Cloudera, Inc. All rights reserved.

Mark Donsky [email protected]