isaca new delhi india - privacy and big data

Bridging the Gap Between Privacy and Big Data Ulf Mattsson, CTO Protegrity ulf.mattsson AT protegrity.com


Post on 11-Jan-2017


Bridging the Gap Between Privacy and Big Data

Ulf Mattsson , CTO

Protegrity

ulf.mattsson AT protegrity.com

20 years with IBM • Research & Development & Global Services

Inventor • Encryption, Tokenization & Intrusion Prevention

Involvement

Ulf Mattsson, CTO Protegrity

2

• PCI Security Standards Council (PCI SSC)

• American National Standards Institute (ANSI) X9

• Encryption & Tokenization

• International Federation for Information Processing

• IFIP WG 11.3 Data and Application Security

• ISACA New York Metro chapter

3

Agenda

1. What is Big Data & Cloud?

2. Risk & Drivers for Data Security

3. The Evolution of Data Security Methods

4. Data De-Identification

5. Off-Shoring & Outsourcing

6. Use Cases & Case Studies

4

Who is Protegrity?

Proven enterprise data protection software leader since the 90’s.

Business driven by compliance

• PCI (Payment Card Industry)

• PII (Personally Identifiable Information)

• PHI (Protected Health Information) – HIPAA

• State and Industry Privacy Laws

Servicing many Industries

• Retail, Hospitality, Travel and Transportation

• Financial Services, Insurance, Banking

• Healthcare

• Telecommunications, Media and Entertainment

• Manufacturing and Government

Big Data

Hadoop

• Designed to handle the emerging “4 V’s”

• Massively Parallel Processing (MPP)

• Elastic scale

• Usually Read-Only

• Allows for data insights on massive, heterogeneous data sets

What is Big Data?

• Includes an ecosystem of components:

[Figure: Hadoop ecosystem. Components shown: Hive, Pig, MapReduce, HDFS, physical storage, and others, grouped into application layers and storage layers.]

Has Your Organization Already Invested in Big Data?

8

Source: Gartner

Cloud

9

Services usually provided by a third party

• Can be virtual, public, private, or hybrid

Increasing adoption – up 12% from 2012*

Often an outsourced solution, sometimes cross-border

Allows for greater accessibility of data and low overhead

Cloud Services

*Source: GigaOM

Cloud Services and Models

Source: NIST, CSA

Drivers for Data Security

12

Data Security

Regulations & Laws

• Payment Card Industry Data Security Standard (PCI DSS)

• National Privacy Laws

• Cross-Border & Outsourcing Privacy Laws

Expanding Threat Landscape

• Hackers & APT


• Internal Threats & Rogue Privileged Users

• Excessive Privilege or Security Negligence

Sensitive Data Insight & Usability

• Unprotected Sensitive or Restricted Data is Unusable for Marketing, Monetization, Outsourcing, etc.

Vulnerabilities in Emerging Technologies

13

Regulations & Laws

PCI DSS

14

PCI Security Standards Council

Founded in 2006, comprising four major credit card brands

Each card brand's enforcement program issues fines, fees and schedule deadlines

• Visa's Cardholder Information Security Program (CISP): http://www.visa.com/cisp

• MasterCard's Site Data Protection (SDP) program: http://www.mastercard.com/us/sdp/index.html

• Discover's Discover Information Security and Compliance (DISC) program: http://www.discovernetwork.com/fraudsecurity/disc.html

• American Express Data Security Operating Policy (DSOP): http://www.americanexpress.com/datasecurity

15

PCI DSS

Build and maintain a secure network.

1. Install and maintain a firewall configuration to protect data

2. Do not use vendor-supplied defaults for system passwords and other security parameters

Protect cardholder data.

3. Protect stored data

4. Encrypt transmission of cardholder data and sensitive information across public networks

Maintain a vulnerability management program.

5. Use and regularly update anti-virus software

6. Develop and maintain secure systems and applications

Implement strong access control measures.

7. Restrict access to data by business need-to-know

8. Assign a unique ID to each person with computer access

9. Restrict physical access to cardholder data

Regularly monitor and test networks.

10. Track and monitor all access to network resources and cardholder data

11. Regularly test security systems and processes

Maintain an information security policy.

12. Maintain a policy that addresses information security

16

PCI DSS 3.0

Protection of cardholder data in memory

Clarification of key management dual control and split knowledge

Recommendations on making PCI DSS business-as-usual and best practices

Security policy and operational procedures added

Increased password strength

New requirements for point-of-sale terminal security

More robust requirements for penetration testing

17

PCI DSS Cloud Guidelines

Relevant to all sensitive data that is outsourced to cloud

1. Clients retain responsibility for the data they put in the cloud

2. Public-cloud providers often have multiple data centers, which may often be in multiple countries or regions

3. The client may not know the location of their data, or the data may exist in one or more of several locations at any particular time

4. A client may have little or no visibility into the controls

5. In a public-cloud environment, one client’s data is typically stored with data belonging to multiple other clients. This makes a public cloud an attractive target for attackers

18

Regulations & Laws

National Privacy Laws

19

National Privacy Laws - USA

Health Insurance Portability and Accountability Act – HIPAA

1. Names

2. All geographical subdivisions smaller than a State

3. All elements of dates (except year) related to individual

4. Phone numbers

5. Fax numbers

6. Electronic mail addresses

7. Social Security numbers

8. Medical record numbers

9. Health plan beneficiary numbers

10. Account numbers

11. Certificate/license numbers

12. Vehicle identifiers and serial numbers

13. Device identifiers and serial numbers

14. Web Universal Resource Locators (URLs)

15. Internet Protocol (IP) address numbers

16. Biometric identifiers, including finger prints

17. Full face photographic images

18. Any other unique identifying number
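As a rough illustration of how an identifier list like the one above can drive redaction, the sketch below drops identifier fields from a record and cuts dates down to the year (item 3). The field names and the `redact_identifiers` helper are hypothetical, chosen only for this example; a real mapping would have to cover every identifier category, including free-text fields.

```python
from datetime import date

# Hypothetical field names, loosely mapped onto the identifier categories above.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "fingerprint", "photo",
}
DATE_FIELDS = {"date_of_birth", "admission_date"}


def redact_identifiers(record: dict) -> dict:
    """Return a copy of record without identifier fields and with dates cut to the year."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # redact: drop the field entirely
        if field in DATE_FIELDS and isinstance(value, date):
            cleaned[field] = value.year  # keep only the year
        else:
            cleaned[field] = value
    return cleaned


patient = {
    "name": "Joe Smith",
    "ssn": "076-39-2778",
    "date_of_birth": date(1966, 12, 25),
    "state": "CA",
    "diagnosis": "fracture",
}
deidentified = redact_identifiers(patient)
```

The State is kept because item 2 only covers subdivisions smaller than a State; the original `patient` record is left unchanged.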

Privacy Laws

54 International Privacy Laws

30 United States Privacy Laws

21

National Privacy Laws - India

Information Technology Act – 2000 (IT Act)

• Requires that the corporate body and Data Processor implement reasonable security practices and standards

• IS/ISO/IEC 27001 requirements recognized

Information Technology Act – 2008 (Amended IT Act)

• Damages for negligence and wrongful gain or loss

• Criminal punishment for disclosing Sensitive Personal Information (SPI)

India Privacy Law – 2011

• Expanded definition of SPI to passwords, financial data, health data, medical treatment records, and more

Right to Privacy Bill – 2013 (Proposed)

• Increased jail terms & fines for disclosure of SPI

• Addresses data handled for foreign clients

22

Regulations & Laws

Cross-Border & Outsourcing Laws

23

The laws of the sending country apply to data sent across international borders, including outsourced operations

• i.e. National Privacy Laws

APEC Cross-Border Privacy Laws

• Non-binding privacy enforcement in Asia-Pacific region


24

Expanding Threat Landscape

26

Cyber Criminals Cost India USD 4 Billion

27

Source: Symantec 2013

28

29

http://www.ey.com/Publication/vwLUAssets/EY_-_2013_Global_Information_Security_Survey/$FILE/EY-GISS-Under-cyber-attack.pdf

Sensitive Data Insight & Usability

30

Vulnerabilities in Emerging Technologies

31

Holes in Big Data…

32

Source: Gartner

Many Ways to Hack Big Data

[Figure: Hadoop stack showing ETL Tools, BI Reporting, RDBMS, Pig (Data Flow), Hive (SQL), Sqoop, MapReduce (Job Scheduling/Execution System), HBase (Column DB), HDFS (Hadoop Distributed File System), Avro (Serialization), and Zookeeper (Coordination). Threat sources shown: Hackers, Unvetted Applications or Ad Hoc Processes, and Privileged Users.]

Source: http://nosql.mypopescu.com/post/1473423255/apache-hadoop-and-hbase

33

The Insider Threat

34

Sensitive Data Insight & Usability

Big Data and Cloud environments are designed for access and deep insight into vast data pools

Data can be monetized not only by marketing analytics, but through sale or use by a third party

The more accessible and usable the data is, the greater this ROI benefit can be

Security concerns and regulations are often viewed as opponents to data insight

35

Big Data (Hadoop) was designed for data access, not security

Security in a read-only environment introduces new challenges

Massive scalability and performance requirements

Big Data Vulnerabilities and Concerns

Sensitive data regulations create a barrier to usability, as data cannot be stored or transferred in the clear

Transparency and data insight are required for ROI on Big Data

36

Cloud Vulnerabilities and Concerns

Public cloud security is often not visible to the client, but the client is still responsible for security

Greater access to shared data sets by more users creates additional points of vulnerability

Data redundancy for high availability, often across multiple data centers, increases vulnerability

Virtualization can create numerous security issues

Transparency and data insight are required for ROI

37

How do you lock this?

Security Improving but We Are Losing Ground

38

Breach Discovery Methods

39

Verizon 2013 Data-breach-investigations-report

The Evolution of Data Security Methods

40

Evolution of Data Security Methods

[Figure: methods arranged along a Time axis, from coarse grained to fine grained security.]

Coarse Grained Security

• Access Controls

• Volume Encryption

• File Encryption

Fine Grained Security

• Access Controls

• Field Encryption (AES & )

• Masking

• Tokenization

• Vaultless Tokenization

41

Use of Enabling Technologies

[Chart: adoption percentages, including an "Evaluating" series, for access controls, database activity monitoring, database encryption, backup / archive encryption, data masking, application-level encryption, and tokenization.]

42

Access Control

Old and flawed: Minimal access levels so people can only carry out their jobs

[Figure: risk (low to high) plotted against access privilege level (low to high).]

43

Applying the protection profile to the content of data fields allows for a wider range of authority options

44

How the New Approach is Different

Old: Minimal access levels – Least Privilege to avoid high risks

New: Much greater flexibility and lower risk in data accessibility

[Figure: risk (low to high) plotted against access privilege level (low to high).]

45

Reduction of Pain with New Protection Techniques

[Figure: pain & TCO (high to low) over time, with axis marks at 1970, 2000, 2005 and 2010.]

Input Value: 3872 3789 1620 3675

• Strong Encryption (AES, 3DES). Output: !@#$%a^.,mhu7///&*B()_+!@

• Format Preserving Encryption (DTP, FPE). Output: 8278 2789 2990 2789

• Vault-based Tokenization. Output: 8278 2789 2990 2789

• Vaultless Tokenization. Output: 8278 2789 2990 2789

Tokenization notes: Format Preserving, Greatly reduced Key Management, No Vault (vaultless)

Fine Grained Security: Encryption of Fields

Production Systems

Encryption of fields

• Reversible

• Policy Control (Authorized / Unauthorized Access)

• Lacks Integration Transparency

• Complex Key Management

• Example: !@#$%a^.,mhu7///&*B()_+!@

Non-Production Systems

Fine Grained Security: Masking of Fields

Production Systems

Non-Production Systems

Masking of fields

• Not reversible

• No Policy, Everyone can access the data

• Integrates Transparently

• No Complex Key Management

• Example: 0389 3778 3652 0038
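A minimal sketch of the masking idea, assuming the simplest possible scheme: every digit is swapped for a random digit while separators and length are kept. The `mask_digits` helper is hypothetical; commercial masking tools offer many more rules. No mapping is stored anywhere, which is why the result is not reversible and needs no key management.

```python
import random

DIGITS = "0123456789"


def mask_digits(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit; keep spaces, dashes and length."""
    return "".join(rng.choice(DIGITS) if ch in DIGITS else ch for ch in value)


# Unseeded generator: the same input gives a different masked value each run,
# and nothing is kept that could lead back to the original.
masked = mask_digits("3872 3789 1620 3675", random.Random())
```

The masked value still looks like a card number, so test and development systems can use it without application changes.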

Fine Grained Security: Tokenization of Fields

Tokenization (Pseudonymization)

• No Complex Key Management

• Business Intelligence

• Example: 0389 3778 3652 0038

Production Systems

• Reversible

• Policy Control (Authorized / Unauthorized Access)

Non-Production Systems

• Not Reversible

• Integrates Transparently
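The reversible, policy-controlled behaviour can be sketched with a toy vault-based tokenizer. This is an illustration under stated assumptions, not any vendor's implementation: the `TokenVault` class, its in-memory dictionaries and the boolean `authorized` flag are all hypothetical stand-ins for a real vault and policy engine.

```python
import secrets

DIGITS = "0123456789"


class TokenVault:
    """Toy vault: random format-preserving tokens, mapping held in memory."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        if value in self._token_by_value:
            return self._token_by_value[value]  # same value, same token
        if not any(ch in DIGITS for ch in value):
            raise ValueError("value must contain at least one digit")
        while True:
            token = "".join(
                secrets.choice(DIGITS) if ch in DIGITS else ch for ch in value
            )
            # Retry on the rare collision with the input or an existing token.
            if token != value and token not in self._value_by_token:
                break
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        if not authorized:
            raise PermissionError("caller is not authorized to see clear data")
        return self._value_by_token[token]


vault = TokenVault()
card = "3872 3789 1620 3675"
token = vault.tokenize(card)
clear = vault.detokenize(token, authorized=True)
```

Authorized callers get the original value back; everyone else works with the token only. Note that the vault grows with every new value, which is the footprint issue that vaultless approaches aim to avoid.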

Fine Grained Data Security Methods

Tokenization and Encryption are Different

[Table: compares the approach used by Encryption (Cipher System) and Tokenization (Code System) across cryptographic algorithms, cryptographic keys, code books, and index tokens.]

Source: McGraw-Hill Encyclopedia of Science & Technology

Fine Grained Data Security Methods

Vault-based vs. Vaultless Tokenization

Aspect | Vault-based Tokenization | Vaultless Tokenization
Footprint | Large, Expanding. | Small, Static.
High Availability, Disaster Recovery | Complex, expensive replication required. | No replication required.
Distribution | Practically impossible to distribute geographically. | Easy to deploy at different geographically distributed locations.
Reliability | Prone to collisions. | No collisions.
Performance, Latency, and Scalability | Will adversely impact performance & scalability. | Little or no latency. Fastest industry tokenization.
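To make the "small, static, no replication" column concrete, here is a deliberately simplified sketch that tokenizes with a fixed set of lookup tables instead of a growing vault. It is a toy positional substitution, not cryptographically secure and not the algorithm used by any product; `build_tables`, the seed value and the table count are assumptions made for illustration.

```python
import random

DIGITS = "0123456789"


def build_tables(secret_seed: int, count: int = 16) -> list:
    """Derive a fixed set of digit permutations from a shared secret seed."""
    rng = random.Random(secret_seed)
    tables = []
    for _ in range(count):
        permutation = list(range(10))
        rng.shuffle(permutation)
        tables.append(permutation)
    return tables


def tokenize(value: str, tables: list) -> str:
    out, position = [], 0
    for ch in value:
        if ch in DIGITS:
            table = tables[position % len(tables)]
            out.append(str(table[int(ch)]))  # substitute via the static table
            position += 1
        else:
            out.append(ch)  # keep separators, so the format is preserved
    return "".join(out)


def detokenize(token: str, tables: list) -> str:
    out, position = [], 0
    for ch in token:
        if ch in DIGITS:
            table = tables[position % len(tables)]
            out.append(str(table.index(int(ch))))  # invert the permutation
            position += 1
        else:
            out.append(ch)
    return "".join(out)


# Two sites build identical tables from the same secret; nothing is replicated.
site_a = build_tables(secret_seed=2024)
site_b = build_tables(secret_seed=2024)
card = "3872 3789 1620 3675"
token = tokenize(card, site_a)
restored = detokenize(token, site_b)
```

Because each table is a permutation, two different inputs can never map to the same token, and the storage needed stays constant no matter how many values are tokenized.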

The Future of Tokenization

PCI DSS 3.0

• Split knowledge and dual control

PCI SSC Tokenization Task Force

• Tokenization and use of HSM

Card Brands – Visa, MC, AMEX …

• Tokens with control vectors

ANSI X9

• Tokenization and use of HSM

52

Security of Different Protection Methods

[Chart: security level (low to high) for Format Preserving Encryption, Vaultless Data Tokenization, AES CBC Encryption Standard, and Basic Data Tokenization.]

53

Speed of Different Protection Methods

[Chart: transactions per second*, with axis marks from 100 to 10 000 000, for Format Preserving Encryption, Vaultless Data Tokenization, AES CBC Encryption Standard, and Vault-based Data Tokenization.]

*: Speed will depend on the configuration

54

Risk Adjusted Data Protection

There is always a trade-off between security and usability.

[Table: data security methods rated from worst to best on Performance, Storage, Security, and Transparency. Methods: System without data protection; Monitoring + Blocking + Obfuscation; Data Type Preservation Encryption; Strong Encryption; Vaultless Tokenization; Hashing; Anonymisation.]

55

Data De-Identification

56

De-Identification

What is de-identification of identifiable data?

The solution to protecting identifiable data is to properly de-identify it.

[Figure: a record made up of Personally Identifiable Information and Health Information / Financial Information.]

Redact the information – remove it.

The identifiable portion of the record is de-identified with any number of protection methods such as masking, tokenization, encryption, redaction (removal), etc.

The method used will depend on your use case and the reason that you are de-identifying the data.

57

Identifiable Sensitive Information

Field | Real Data | Tokenized / Pseudonymized
Name | Joe Smith | csu wusoj
Address | 100 Main Street, Pleasantville, CA | 476 srta coetse, cysieondusbak, CA
Date of Birth | 12/25/1966 | 01/02/1966
Telephone | 760-278-3389 | 760-389-2289
E-Mail Address | [email protected] | [email protected]
SSN | 076-39-2778 | 937-28-3390
CC Number | 3678 2289 3907 3378 | 3846 2290 3371 3378
Business URL | www.surferdude.com | www.sheyinctao.com
Fingerprint | | Encrypted
Photo | | Encrypted
X-Ray | | Encrypted

Healthcare / Financial Services

Dr. visits, prescriptions, hospital stays and discharges, clinical, billing, etc.

Financial Services Consumer Products and activities

Protection methods can be equally applied to the actual healthcare data, but are not needed with de-identification

58

De-Identified Sensitive Data

Field | Real Data | Tokenized / Pseudonymized
Name | Joe Smith | csu wusoj
Address | 100 Main Street, Pleasantville, CA | 476 srta coetse, cysieondusbak, CA
Date of Birth | 12/25/1966 | 01/02/1966
Telephone | 760-278-3389 | 760-389-2289
E-Mail Address | [email protected] | [email protected]
SSN | 076-39-2778 | 076-28-3390
CC Number | 3678 2289 3907 3378 | 3846 2290 3371 3378
Business URL | www.surferdude.com | www.sheyinctao.com
Fingerprint | | Encrypted
Photo | | Encrypted
X-Ray | | Encrypted

Healthcare / Financial Services

Dr. visits, prescriptions, hospital stays and discharges, clinical, billing, etc.

Financial Services Consumer Products and activities

Protection methods can be equally applied to the actual data, but are not needed with de-identification

59

How Should I Secure Different Data?

[Figure: use case complexity plotted against type of data (structured to unstructured). Use cases range from simple (PCI: cardholder data) through PII (personally identifiable information) to complex (PHI: protected health information). Protection methods shown: Tokenization of Fields and Encryption of Files.]

Research Brief

Tokenization Gets Traction

Aberdeen has seen a steady increase in enterprise use of tokenization, rather than encryption, for protecting sensitive data

Nearly half of the respondents (47%) are currently using tokenization for something other than cardholder data

Over the last 12 months, tokenization users had 50% fewer security-related incidents than tokenization non-users

Author: Derek Brink, VP and Research Fellow, IT Security and IT GRC

The business intelligence exposed through Vaultless Tokenization can allow many users and processes to perform job functions on protected data

Extreme flexibility in data de-identification can allow responsible data monetization

Vaultless Tokenization & Data Insight

Data remains secure throughout data flows, and can maintain a one-to-one relationship with the original data for analytic processes
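The point about analytics on protected data can be illustrated with any deterministic pseudonym: equal inputs give equal outputs, so grouping and joining still work. The sketch below swaps in keyed hashing (HMAC-SHA256 from the Python standard library) as a stand-in, which is not vaultless tokenization and is not reversible; the key, the sample purchases and the `pseudonymize` helper are made up for the example.

```python
import hashlib
import hmac
from collections import Counter

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for the example


def pseudonymize(value: str) -> str:
    """Deterministic pseudonym: the same input always yields the same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Sample (SSN, purchase amount) rows; values are illustrative only.
purchases = [("076-39-2778", 40), ("937-28-3390", 15), ("076-39-2778", 25)]

raw_totals = Counter()
token_totals = Counter()
for ssn, amount in purchases:
    raw_totals[ssn] += amount                  # analysis on clear data
    token_totals[pseudonymize(ssn)] += amount  # same analysis on protected data
```

Spend per customer comes out identical in both cases, yet the protected data set never exposes an SSN to the analyst.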

62

Use Cases for Coarse & Fine Grained Security

63

Off-shoring & Outsourcing

Business Process Outsourcing (BPO)

• Business Processes

• E.g. Loans, Mortgages, Call Centre, Claims Processing, ERP, etc.

• Application Development

• Need to de-identify Data for Testing and Development

Off-Shoring

Privacy Impacts BPO & Offshore Business Solutions

• Same as Outsourcing, but data is sent for business functions (like call center, etc.) off-shore.

Laws governing your ability to send real data to 3rd parties are already restrictive, and becoming more so

Penalties for infringement are growing more severe

Risk of data breaches and data theft is increased

65

Examples

Major Bank in EU wants to centralise EDW operations in a single country and therefore send customer data from country A to country B. Privacy Laws in country A prohibit this.

Private Bank in Europe wants to offshore Finance Operations. Privacy Law prohibits transfer of citizen data to India.

Retail Bank in Scandinavia wants to offshore Customer Services. Privacy law prevents transfer of citizen data to the Far East.

66

Case Studies

Protegrity Use Case: UniCredit

CHALLENGES The primary challenge was to protect PII – names and addresses, phone and email, policy and account numbers, birth dates, etc. – to the satisfaction of EU Cross Border Data Security requirements. This included incoming source data from various European banking entities, and existing data within those systems, which would be consolidated at the Italian HQ.

Case Study - Large US Chain Store

Reduced cost

• 50 % shorter PCI audit

Quick deployment

• Minimal application changes

• 98 % application transparent

Top performance

• Performance better than encryption

Stronger security

69

Case Study: Large Chain Store

Why? Reduce compliance cost by 50%

• 50 million Credit Cards, 700 million daily transactions

• Performance Challenge: 30 days with Basic to 90 minutes with Vaultless Tokenization

• End-to-End Tokens: Started with the D/W and expanding to stores

• Lower maintenance cost – don’t have to apply all 12 requirements

• Better security – able to eliminate several business and daily reports

• Quick deployment

• Minimal application changes

• 98 % application transparent
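For scale, the figures quoted in this case study can be turned into rates with a few lines of arithmetic. The calculation assumes that "30 days" means 30 full 24-hour days and that the 700 million daily transactions are spread evenly over the day; neither assumption is stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# 700 million daily transactions, averaged over a full day.
daily_transactions = 700_000_000
average_tps = daily_transactions / SECONDS_PER_DAY

# 30 days with basic tokenization versus 90 minutes with vaultless tokenization.
basic_minutes = 30 * 24 * 60
vaultless_minutes = 90
speedup = basic_minutes / vaultless_minutes
```

Under those assumptions the average load is roughly 8,100 transactions per second, and the move from 30 days to 90 minutes is a 480-fold reduction in run time.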

70

Aadhaar/UID Big Data Use Case

Aadhaar Data Stores

• Mongo cluster (all enrolment records/documents – demographics + photo): Shards 1 to 5. Low latency indexed read (documents per sec), high latency random search (seconds per read)

• Solr cluster (all enrolment records/documents – selected demographics only): Shards 0, 2, 6, 9, a, d, f. Low latency indexed read (documents per sec), low latency random search (documents per sec)

• MySQL (all UID generated records - demographics only, track & trace, enrolment status): UID master (sharded) and Enrolment DB. Low latency indexed read (milli-seconds per read), high latency random search (seconds per read)

• HDFS (all raw packets): Data Nodes 1 to 20. High read throughput (MB per sec), high latency read (seconds per read)

• HBase (all enrolment biometric templates): Region Servers 1 to 20. High read throughput (MB per sec), low-to-medium latency read (milli-seconds per read)

• NFS (all archived raw packets): LUN 1 to LUN 4. Moderate read throughput, high latency read (seconds per read)

Protegrity Summary

Proven enterprise data security software and innovation leader

• Sole focus on the protection of data

• Patented Technology, Continuing to Drive Innovation

Cross-industry applicability

• Retail, Hospitality, Travel and Transportation

• Financial Services, Insurance, Banking

• Healthcare

• Telecommunications, Media and Entertainment

• Manufacturing and Government

74

Please contact us for more information

[email protected]

[email protected]

[email protected]

www.protegrity.com