Data Masking: Testing with Near-real Data


DESCRIPTION

Organizations worldwide collect data about customers, users, products, and services. Striving to get the most out of collected data, they use it to fuel many day-to-day processes including software testing, development, and personnel training. The majority of this collected data is sensitive and falls under specific government regulations or industry standards that define policies for privacy and generally limit or prohibit using the data for these secondary purposes. Data masking solves this problem. It replaces sensitive information with data that looks real and is structurally similar to the actual information but is useless to anyone trying to obtain the real data. Learn about the process, pros and cons of static and dynamic data masking architectures, subsetting, randomization, generalization, shuffling, and other basic techniques used to set up data masking. Discover how to start data masking and learn about common challenges on data masking projects.

TRANSCRIPT

Page 1: Data Masking: Testing with Near-real Data

T20 Test Techniques

5/2/2013 3:00:00 PM

Data Masking: Testing with

Near-real Data

Presented by:

Martin Kralj

Ekobit

Brought to you by:

340 Corporate Way, Suite 300, Orange Park, FL 32073

888-268-8770 ∙ 904-278-0524 ∙ [email protected] ∙ www.sqe.com


Martin Kralj

Martin Kralj is responsible for the data masking line of tools and services at Ekobit. In his fifteen years in the software industry, Martin has worked as a business analyst, enterprise software development professional, consultant, project manager, and customer support engineer. He has held key roles on teams producing Ekobit’s flagship products, TeamCompanion and BizDataX, and directed software projects in-house and worldwide. Recently, Martin has specialized in application lifecycle management (ALM), particularly agile software development methodologies, teamwork, and data masking. He presents at various conferences and writes about software.

13.5.2013

Data Masking

Testing with Near-real Data

About me

~ Martin Kralj

~ Software development

~ Project management, ALM, and consulting

~ BizDataX by Ekobit

  • Complex data relationships

  • Large databases

  • Near-real data

  • Designed for enterprise


Agenda

~ Handling sensitive data

  • Define “sensitive”

  • Norms and regulations

~ Data masking

  • Concepts and basic techniques

  • How can we do it?

  • Scripts vs. tools and platforms

Comply with data privacy and security laws


USA norms and regulations

~ Nationwide

  • HIPAA (Health Insurance Portability and Accountability Act)

  • HITECH (Health Information Technology for Economic and Clinical Health Act)

~ State-specific, California as an example

  • CMIA (Confidentiality of Medical Information Act)

  • IPA (Information Practices Act)

  • PAHRA (Patient Access to Health Records Act)

  • IPPA (Insurance Information and Privacy Protection Act)

  • Security Breach Notification Law

~ Industry-wide

  • PCI DSS (Payment Card Industry Data Security Standard)

Self-interest and reputation

~ Corporate rules

~ Competition and industrial espionage

~ Protecting intellectual property

~ Ethical reasons and protection of reputation


Work with near-real data

~ Format-preserving and context-sensitive

~ Secondary use of sensitive data is avoided

{

// Demo

}

Demo: is it real or fabricated?
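
To make "format-preserving" concrete, here is a minimal sketch of one simple approach: substitute each character with a random character of the same class, so length, separators, and letter case survive. The function name mask_preserving_format is hypothetical; this is not how BizDataX or any particular tool works, and it makes no attempt at context sensitivity (for example, keeping a first name plausible for the record's gender).

```python
import random
import string


def mask_preserving_format(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit and each letter with a random
    ASCII letter of the same case; keep separators such as '-' or ' '."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # punctuation and spaces keep the original layout
    return "".join(out)


rng = random.Random()
print(mask_preserving_format("1-340-337-7194", rng))  # still looks like a phone number
print(mask_preserving_format("Sasha", rng))           # capitalized, five letters
```

Passing a seeded random.Random makes a run reproducible while debugging; for real masking a fixed, guessable seed would usually be avoided.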


Suppression

ID First Name Last Name Date of Birth Phone Gender

1 Sasha Cortez 20.7.1967 1-340-337-7194 Female

2 Neve Dyer 17.11.1975 1-599-974-8272 Female

3 September Graves 9.6.1977 1-404-899-2966 Female

4 Theodore Graves 27.10.1962 1-266-364-7119 Male

5 Donovan Hoover 19.3.1978 1-728-752-4244 Male

6 Lynn Joyner 16.12.1984 1-124-859-5234 Female

7 Quon May 19.11.1954 1-406-895-7153 Female

8 Berk Mcclain 18.7.1966 1-938-803-0464 Male

9 Hakeem Ray 9.4.1964 1-734-314-8964 Male

10 Paki Sellers 10.11.1956 1-641-173-5621 Male

ID First Name Last Name Gender

2 Neve Dyer Female

4 Theodore Graves Male

5 Donovan Hoover Male

7 Quon May Female

8 Berk Mcclain Male
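
Suppression, as in the tables above, removes sensitive columns (here Date of Birth and Phone) and, optionally, whole records. Below is a minimal sketch over in-memory records; the slide does not say how the dropped rows were chosen, so the usage simply lists the IDs missing from the second table. The function name and data layout are illustrative assumptions.

```python
from typing import Callable

# A record is one row, keyed by column name (illustrative layout)
Record = dict[str, object]


def suppress(records: list[Record],
             drop_columns: set[str],
             drop_row: Callable[[Record], bool]) -> list[Record]:
    """Return new records without drop_columns, skipping rows where drop_row is True."""
    return [
        {col: val for col, val in r.items() if col not in drop_columns}
        for r in records
        if not drop_row(r)
    ]


people = [
    {"ID": 1, "First Name": "Sasha", "Last Name": "Cortez",
     "Date of Birth": "20.7.1967", "Phone": "1-340-337-7194", "Gender": "Female"},
    {"ID": 2, "First Name": "Neve", "Last Name": "Dyer",
     "Date of Birth": "17.11.1975", "Phone": "1-599-974-8272", "Gender": "Female"},
    {"ID": 3, "First Name": "September", "Last Name": "Graves",
     "Date of Birth": "9.6.1977", "Phone": "1-404-899-2966", "Gender": "Female"},
    {"ID": 4, "First Name": "Theodore", "Last Name": "Graves",
     "Date of Birth": "27.10.1962", "Phone": "1-266-364-7119", "Gender": "Male"},
]

# Rows 1, 3, 6, 9 and 10 are absent from the suppressed table on the slide
masked = suppress(people, {"Date of Birth", "Phone"},
                  lambda r: r["ID"] in {1, 3, 6, 9, 10})
for row in masked:
    print(row)
```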

Shuffling

ID First Name Last Name Gender

1 Cortez Female

2 Dyer Female

3 Graves Female

4 Graves Male

5 Hoover Male

6 Joyner Female

7 May Female

8 Mcclain Male

9 Ray Male

10 Sellers Male

First Name values, redistributed among the rows above: Sasha, Neve, September, Theodore, Donovan, Lynn, Quon, Berk, Hakeem, Paki
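
Shuffling keeps every real value in the column but detaches it from its record. A minimal sketch, assuming records held as Python dicts and a column name chosen by the caller; shuffle_column is a hypothetical helper, not a tool API.

```python
import random


def shuffle_column(records: list[dict], column: str, rng: random.Random) -> list[dict]:
    """Return new records whose `column` values are a random permutation of the
    originals; all other columns stay with their row."""
    values = [r[column] for r in records]
    rng.shuffle(values)  # in-place shuffle of the copied value list only
    return [{**r, column: v} for r, v in zip(records, values)]


people = [
    {"ID": 1, "First Name": "Sasha", "Last Name": "Cortez", "Gender": "Female"},
    {"ID": 2, "First Name": "Neve", "Last Name": "Dyer", "Gender": "Female"},
    {"ID": 3, "First Name": "September", "Last Name": "Graves", "Gender": "Female"},
    {"ID": 4, "First Name": "Theodore", "Last Name": "Graves", "Gender": "Male"},
    {"ID": 5, "First Name": "Donovan", "Last Name": "Hoover", "Gender": "Male"},
]

for row in shuffle_column(people, "First Name", random.Random()):
    print(row)
```

A plain shuffle may leave some values in place and can pair a female first name with a male record; shuffling within groups (for example, per Gender) is one way to keep the result plausible.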


Redaction (blacking-out)

ID First Name Last Name Age

1 Sasha Cortez 44

2 Neve Dyer 36

3 September Graves 34

4 Theodore Graves 49

5 Donovan Hoover 33

6 Lynn Joyner 27

7 Quon May 57

8 Berk Mcclain 45

9 Hakeem Ray 47

10 Paki Sellers 55
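
Redaction overwrites characters with a fixed masking character. The sketch below blacks out letters and digits while keeping separators; an optional keep_last parameter leaves trailing characters visible, mirroring the partially masked card number on the dynamic data masking slide. The function name and defaults are assumptions for illustration.

```python
def redact(value: str, mask_char: str = "X", keep_last: int = 0) -> str:
    """Black out every letter and digit with mask_char, leaving separators in
    place and, optionally, the last keep_last letters/digits visible."""
    alnum_positions = [i for i, ch in enumerate(value) if ch.isalnum()]
    visible = set(alnum_positions[-keep_last:]) if keep_last > 0 else set()
    return "".join(
        ch if (not ch.isalnum() or i in visible) else mask_char
        for i, ch in enumerate(value)
    )


print(redact("44"))                           # an age, fully blacked out
print(redact("1234-5678-4011", keep_last=4))  # card number, last group visible
```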

Generalization

ID First Name Last Name Age

1 Sasha Cortez 41-50

2 Neve Dyer 31-40

3 September Graves 31-40

4 Theodore Graves 41-50

5 Donovan Hoover 31-40

6 Lynn Joyner 21-30

7 Quon May 51+

8 Berk Mcclain 41-50

9 Hakeem Ray 41-50

10 Paki Sellers 51+
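
Generalization replaces an exact value with a range. This sketch assumes the 10-year bands from the table (21-30, 31-40, 41-50) plus an open-ended top band for 51 and over, written here as "51+"; generalize_age and its parameters are hypothetical.

```python
def generalize_age(age: int, width: int = 10, open_from: int = 51) -> str:
    """Map an exact age to a band such as '41-50'; ages >= open_from become 'open_from+'."""
    if age < 1:
        raise ValueError("age must be a positive integer")
    if age >= open_from:
        return f"{open_from}+"
    lower = ((age - 1) // width) * width + 1  # bands start at 1, 11, 21, ...
    return f"{lower}-{lower + width - 1}"


ages = [44, 36, 34, 49, 33, 27, 57, 45, 47, 55]  # Age column from the table
print([generalize_age(a) for a in ages])
```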


Randomization, generating and substitution

ID First Name Last Name Phone

1 Sasha Cortez 1-340-337-7194

2 Neve Dyer 1-599-974-8272

3 September Graves 1-404-899-2966

4 Theodore Graves 1-266-364-7119

5 Donovan Hoover 1-728-752-4244

6 Lynn Joyner 1-124-859-5234

7 Quon May 1-406-895-7153

8 Berk Mcclain 1-938-803-0464

9 Hakeem Ray 1-734-314-8964

10 Paki Sellers 1-641-173-5621

ID First Name Last Name Phone

1 Sasha Cortez 1-182-260-6935

2 Neve Dyer 1-886-794-9258

3 September Graves 1-847-263-1225

4 Theodore Graves 1-341-810-3139

5 Donovan Hoover 1-982-608-9112

6 Lynn Joyner 1-960-142-1834

7 Quon May 1-872-132-9340

8 Berk Mcclain 1-612-726-9353

9 Hakeem Ray 1-157-361-5540

10 Paki Sellers 1-834-906-6092
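
Randomization and substitution replace a real value with a generated one in the same layout. A minimal sketch for the phone column, assuming the 1-NNN-NNN-NNNN layout shown above and a requirement (an assumption, not stated on the slide) that masked numbers be unique and differ from the originals. Both function names are hypothetical.

```python
import random


def random_phone(rng: random.Random) -> str:
    """Generate a fictitious number in the 1-NNN-NNN-NNNN layout used in the table."""
    return f"1-{rng.randint(100, 999)}-{rng.randint(100, 999)}-{rng.randint(0, 9999):04d}"


def substitute_phones(records: list[dict], rng: random.Random) -> list[dict]:
    """Replace each Phone with a generated number that is unique within the
    result and never equal to the record's original number."""
    used: set[str] = set()
    out = []
    for r in records:
        phone = random_phone(rng)
        while phone in used or phone == r["Phone"]:
            phone = random_phone(rng)
        used.add(phone)
        out.append({**r, "Phone": phone})
    return out


people = [
    {"ID": 1, "First Name": "Sasha", "Last Name": "Cortez", "Phone": "1-340-337-7194"},
    {"ID": 2, "First Name": "Neve", "Last Name": "Dyer", "Phone": "1-599-974-8272"},
]
for row in substitute_phones(people, random.Random()):
    print(row)
```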

Masking techniques

~ Suppression

~ Shuffling

~ Redaction (blacking out)

~ Generalization

~ Randomization, generating and substitution


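Dynamic data masking

~ Real data: 1234-5678-4011

~ The primary process sees the real value: 1234-5678-4011

~ DDM masks on the fly, so the secondary process sees: XXXX-XXXX-4011

Static data masking

~ SDM converts real data (1234-5678-4011) into separate masked data (XXXX-XXXX-4011)

~ The primary process uses the real data: 1234-5678-4011

~ Secondary processes use the masked data: XXXX-XXXX-4011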
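
The difference between the two architectures can be sketched in a few lines: dynamic masking keeps real values at rest and masks them on read for secondary consumers, while static masking produces a masked copy once. The role names "primary"/"secondary", CardStore, mask_card and static_mask are hypothetical; real DDM typically sits in a database or proxy layer, which this toy does not model.

```python
def mask_card(number: str) -> str:
    """Replace every group except the last with X's: '1234-5678-4011' -> 'XXXX-XXXX-4011'.
    Assumes dash-separated digit groups."""
    *head, last = number.split("-")
    return "-".join(["X" * len(g) for g in head] + [last])


class CardStore:
    """Hypothetical data store; the real numbers are kept unchanged at rest."""

    def __init__(self, numbers: list[str]) -> None:
        self._numbers = list(numbers)

    def read(self, role: str) -> list[str]:
        # Dynamic masking: decide at read time, per consumer, what it may see
        if role == "primary":
            return list(self._numbers)
        return [mask_card(n) for n in self._numbers]


def static_mask(numbers: list[str]) -> list[str]:
    # Static masking: produce a separate masked copy once, e.g. for a test database
    return [mask_card(n) for n in numbers]


store = CardStore(["1234-5678-4011"])
print(store.read("primary"))            # real value for the primary process
print(store.read("secondary"))          # masked on the fly for a secondary process
print(static_mask(["1234-5678-4011"]))  # persisted masked copy
```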


{

// Demo

}

Simple script

Tools: masking logic built-in
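
A "simple script" in the sense of the demo might be a handful of hand-written SQL statements. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table, columns, and replacement values are hypothetical. The first_name column is deliberately left untouched to show that a script masks only the columns someone wrote a statement for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, birth_date TEXT, phone TEXT
);
INSERT INTO customers VALUES
    (1, 'Sasha', 'Cortez', '1967-07-20', '1-340-337-7194'),
    (2, 'Neve',  'Dyer',   '1975-11-17', '1-599-974-8272');

-- The hand-written masking logic of a simple script:
UPDATE customers SET
    last_name  = 'Customer' || id,                    -- substitution
    birth_date = NULL,                                -- suppression
    phone      = '1-000-000-' || printf('%04d', id);  -- generated value
""")

rows = conn.execute(
    "SELECT id, first_name, last_name, birth_date, phone FROM customers ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```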


Tools: declarative approach

{

// Demo

}

Define rules: simplicity and power
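
A declarative approach separates what to mask (one rule per column) from how the data is traversed. This is a minimal sketch of that idea with a hypothetical rule vocabulary (suppress, redact, substitute_from); it does not reproduce the rule syntax of BizDataX or any other product.

```python
import random
from typing import Callable

# A rule maps an original value to a masked one (hypothetical signature)
Rule = Callable[[object, random.Random], object]


def suppress(value, rng):
    return None


def redact(value, rng):
    return "".join("X" if c.isalnum() else c for c in str(value))


def substitute_from(choices):
    def rule(value, rng):
        return rng.choice(choices)
    return rule


# The declarative part: *what* to mask, column by column
RULES: dict[str, Rule] = {
    "Last Name": substitute_from(["Smith", "Jones", "Brown", "Taylor"]),
    "Date of Birth": suppress,
    "Phone": redact,
}


def apply_rules(records: list[dict], rules: dict[str, Rule], seed: int = 0) -> list[dict]:
    """Generic engine: apply the rule for each ruled column, copy the rest."""
    rng = random.Random(seed)
    return [
        {col: (rules[col](val, rng) if col in rules else val) for col, val in r.items()}
        for r in records
    ]


people = [{"ID": 1, "First Name": "Sasha", "Last Name": "Cortez",
           "Date of Birth": "20.7.1967", "Phone": "1-340-337-7194"}]
masked = apply_rules(people, RULES)
print(masked)
```

Adding or changing a rule means editing RULES only; the traversal in apply_rules stays the same.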


Tools: performance and expertise

~ Explicit and implicit parallelism

~ Automatic and scheduled execution

~ Notifications, monitoring and auditing

~ Efficient processing of large amounts of data

~ Deterministic or repeatable masking
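
Deterministic (repeatable) masking means the same input always yields the same masked output, which keeps values consistent across tables and runs. One common way to get this, assumed here rather than taken from the talk, is a keyed hash (HMAC-SHA256) used to pick a substitute from a list. The key, the FIRST_NAMES list, and deterministic_substitute are hypothetical.

```python
import hashlib
import hmac

# Hypothetical pool of substitute values
FIRST_NAMES = ["Alex", "Blake", "Casey", "Drew", "Emery", "Finley", "Harper", "Jordan"]


def deterministic_substitute(value: str, key: bytes, choices: list[str]) -> str:
    """Map value to an entry of choices using HMAC-SHA256, so the same input
    and key always yield the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(choices)
    return choices[index]


key = b"keep-this-secret"  # in practice, load from a secrets store
print(deterministic_substitute("Sasha", key, FIRST_NAMES))
print(deterministic_substitute("Sasha", key, FIRST_NAMES))  # same result again
```

The mapping is not one-to-one, so two different inputs can share a substitute. The key must also stay secret: anyone holding it could rebuild the mapping for a list of candidate names.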

Tools: systematic approach

~ Thorough analysis of existing infrastructure, data, people and processes

~ Natural separation of roles and responsibilities

~ Data masking can be handled like other must-haves and daily routines

~ Accountability and traceability


Conclusion

~ Data masking improves data security and the handling of sensitive data in general

~ The technology is heavily underutilized

  • A side job for an administrator or programmer

  • There is no real control

~ Done by the book

  • Systematic, project-based approach

  • Explore and use specialized tools

  • Ask for help


~ Martin Kralj

[email protected]

[email protected]