data masking: testing with near-real data

T20 Test Techniques

5/2/2013 3:00:00 PM

Data Masking: Testing with Near-real Data

Presented by:

Martin Kralj

Ekobit

Brought to you by:

340 Corporate Way, Suite 300, Orange Park, FL 32073

888-268-8770 ∙ 904-278-0524 ∙ [email protected] ∙ www.sqe.com

Martin Kralj

Martin Kralj is responsible for the data masking line of tools and services at Ekobit. In his fifteen years in the software industry, Martin has worked as a business analyst, enterprise software development professional, consultant, project manager, and customer support engineer. He has held key roles on teams producing Ekobit’s flagship products, TeamCompanion and BizDataX, and directed software projects in-house and worldwide. Recently Martin has specialized in application lifecycle management, particularly agile software development methodologies, team work, and data masking. He presents at various conferences and writes about software.

13.5.2013

Data Masking

•Testing with Near-real Data

About me

~ Martin Kralj

~ Software development

~ Project management and ALM + consulting

~ BizDataX by Ekobit

   • Complex data relationships

   • Large databases

   • Near-real data

   • Designed for enterprise

Agenda

~ Handling sensitive data

   • Define “sensitive”

   • Norms and regulations

~ Data masking

   • Concepts and basic techniques

   • How can we do it?

   • Scripts vs. tools and platforms

Comply with data privacy and security laws

USA norms and regulations

~ Nationwide

   • HIPAA (Health Insurance Portability and Accountability Act)

   • HITECH (Health Information Technology for Economic and Clinical Health Act)

~ State-specific, California as an example

   • CMIA (Confidentiality of Medical Information Act)

   • IPA (Information Practices Act)

   • PAHRA (Patient Access to Health Records Act)

   • IPPA (Insurance Information and Privacy Protection Act)

   • Security Breach Notification Law

~ Industry-wide

   • PCI DSS (Payment Card Industry Data Security Standard)

Self-interests and reputation

~ Corporate rules

~ Competition and industrial espionage

~ Protecting intellectual property

~ Ethical reasons and protection of reputation

Work with near-real data

~ Format-preserving and context-sensitive

~ Secondary usage of sensitive data is avoided

{

// Demo

}

Demo: is it real or fabricated?

Suppression

Before:

ID  First Name  Last Name  Date of Birth  Phone           Gender
1   Sasha       Cortez     20.7.1967      1-340-337-7194  Female
2   Neve        Dyer       17.11.1975     1-599-974-8272  Female
3   September   Graves     9.6.1977       1-404-899-2966  Female
4   Theodore    Graves     27.10.1962     1-266-364-7119  Male
5   Donovan     Hoover     19.3.1978      1-728-752-4244  Male
6   Lynn        Joyner     16.12.1984     1-124-859-5234  Female
7   Quon        May        19.11.1954     1-406-895-7153  Female
8   Berk        Mcclain    18.7.1966      1-938-803-0464  Male
9   Hakeem      Ray        9.4.1964       1-734-314-8964  Male
10  Paki        Sellers    10.11.1956     1-641-173-5621  Male

After:

ID  First Name  Last Name  Gender
2   Neve        Dyer       Female
4   Theodore    Graves     Male
5   Donovan     Hoover     Male
7   Quon        May        Female
8   Berk        Mcclain    Male
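The suppression step shown in the tables can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `suppress`, the dictionary field names, and the rule for which rows survive (`kept_ids`) are hypothetical and not taken from the presentation or from any tool.

```python
def suppress(records, drop_columns, keep_row):
    """Drop whole rows and whole columns; values are removed, never altered."""
    return [
        {name: value for name, value in record.items() if name not in drop_columns}
        for record in records
        if keep_row(record)
    ]

# Three of the sample rows from the slide.
people = [
    {"id": 1, "first_name": "Sasha", "last_name": "Cortez",
     "date_of_birth": "20.7.1967", "phone": "1-340-337-7194", "gender": "Female"},
    {"id": 2, "first_name": "Neve", "last_name": "Dyer",
     "date_of_birth": "17.11.1975", "phone": "1-599-974-8272", "gender": "Female"},
    {"id": 4, "first_name": "Theodore", "last_name": "Graves",
     "date_of_birth": "27.10.1962", "phone": "1-266-364-7119", "gender": "Male"},
]

kept_ids = {2, 4}  # assumption: how rows are selected is decided elsewhere
masked = suppress(people, {"date_of_birth", "phone"},
                  lambda record: record["id"] in kept_ids)
```

The input list is left untouched, so the same source can feed several differently suppressed extracts.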

Shuffling

ID  First Name  Last Name  Gender
1               Cortez     Female
2               Dyer       Female
3               Graves     Female
4               Graves     Male
5               Hoover     Male
6               Joyner     Female
7               May        Female
8               Mcclain    Male
9               Ray        Male
10              Sellers    Male

First names to be shuffled among the rows: Sasha, Neve, September, Theodore, Donovan, Lynn, Quon, Berk, Hakeem, Paki
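A minimal sketch of shuffling one column, assuming records are plain dictionaries. The helper name `shuffle_column` and the optional `seed` parameter are hypothetical. The values stay real and keep their overall distribution, but they no longer belong to the rows they came from.

```python
import random

def shuffle_column(records, column, seed=None):
    """Redistribute the values of one column among the rows."""
    rng = random.Random(seed)          # a seed makes the shuffle repeatable
    values = [record[column] for record in records]
    rng.shuffle(values)
    # Every other column stays attached to its original row.
    return [{**record, column: value} for record, value in zip(records, values)]

people = [
    {"id": 1, "first_name": "Sasha", "last_name": "Cortez", "gender": "Female"},
    {"id": 2, "first_name": "Neve", "last_name": "Dyer", "gender": "Female"},
    {"id": 3, "first_name": "September", "last_name": "Graves", "gender": "Female"},
    {"id": 4, "first_name": "Theodore", "last_name": "Graves", "gender": "Male"},
]

shuffled = shuffle_column(people, "first_name", seed=7)
```

A plain shuffle can leave some values in their original row and ignores context (here, a first name may land on a row with a different gender), so small tables in particular may need an extra check or a per-group shuffle.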

Redaction (blacking-out)

ID  First Name  Last Name  Age
1   Sasha       Cortez     44
2   Neve        Dyer       36
3   September   Graves     34
4   Theodore    Graves     49
5   Donovan     Hoover     33
6   Lynn        Joyner     27
7   Quon        May        57
8   Berk        Mcclain    45
9   Hakeem      Ray        47
10  Paki        Sellers    55
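Redaction can be sketched as replacing characters with a fixed mask symbol. The extracted text does not show which column the slide blacked out, so redacting the last name below is an assumption, as are the function name `redact` and its `keep_last` parameter.

```python
def redact(value, keep_last=0, mask_char="X"):
    """Mask letters and digits, keeping separators and the last `keep_last` characters."""
    text = str(value)
    total = sum(ch.isalnum() for ch in text)
    to_mask = max(total - keep_last, 0)
    out = []
    for ch in text:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the kept tail pass through unchanged
    return "".join(out)

row = {"id": 1, "first_name": "Sasha", "last_name": "Cortez", "age": 44}
redacted_row = {**row, "last_name": redact(row["last_name"])}
```

Because separators survive, the masked value keeps its length and shape, which helps when screens or reports expect a fixed format.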

Generalization

ID  First Name  Last Name  Age
1   Sasha       Cortez     41-50
2   Neve        Dyer       31-40
3   September   Graves     31-40
4   Theodore    Graves     41-50
5   Donovan     Hoover     31-40
6   Lynn        Joyner     21-30
7   Quon        May        51-
8   Berk        Mcclain    41-50
9   Hakeem      Ray        41-50
10  Paki        Sellers    51-
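The age ranges in the table follow a simple bucketing rule, sketched below. The bucket width of 10 and the open-ended top bucket starting at 51 are read off the slide. The function name and the rejection of ages below 1 are assumptions.

```python
def generalize_age(age, width=10, top=51):
    """Map an exact age to a range such as "41-50"; ages from `top` up share one bucket."""
    if age < 1:
        raise ValueError("this sketch assumes ages of 1 or more")
    if age >= top:
        return f"{top}-"
    low = ((age - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"

ages = [44, 36, 34, 49, 33, 27, 57, 45, 47, 55]   # exact ages from the redaction table
ranges = [generalize_age(age) for age in ages]
```

Wider buckets hide more but make the data less useful for tests that depend on age, so the width is a trade-off rather than a fixed value.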

Randomization, generating and substitution

Before:

ID  First Name  Last Name  Phone
1   Sasha       Cortez     1-340-337-7194
2   Neve        Dyer       1-599-974-8272
3   September   Graves     1-404-899-2966
4   Theodore    Graves     1-266-364-7119
5   Donovan     Hoover     1-728-752-4244
6   Lynn        Joyner     1-124-859-5234
7   Quon        May        1-406-895-7153
8   Berk        Mcclain    1-938-803-0464
9   Hakeem      Ray        1-734-314-8964
10  Paki        Sellers    1-641-173-5621

After:

ID  First Name  Last Name  Phone
1   Sasha       Cortez     1-182-260-6935
2   Neve        Dyer       1-886-794-9258
3   September   Graves     1-847-263-1225
4   Theodore    Graves     1-341-810-3139
5   Donovan     Hoover     1-982-608-9112
6   Lynn        Joyner     1-960-142-1834
7   Quon        May        1-872-132-9340
8   Berk        Mcclain    1-612-726-9353
9   Hakeem      Ray        1-157-361-5540
10  Paki        Sellers    1-834-906-6092
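One way to generate replacement phone numbers that keep the format of the originals is sketched below. It assumes the leading group (the "1" in the slide data) is a country code that should be preserved, and the function name `randomize_phone` is hypothetical. The generated digits are not checked against real numbering plans.

```python
import random

def randomize_phone(phone, rng):
    """Keep the leading group and the separators; replace every other digit at random."""
    prefix, sep, rest = phone.partition("-")
    generated = "".join(
        str(rng.randint(0, 9)) if ch.isdigit() else ch
        for ch in rest
    )
    return prefix + sep + generated

rng = random.Random()  # pass a seed, e.g. random.Random(42), for a reproducible run
masked_phone = randomize_phone("1-340-337-7194", rng)
```

Substitution works the same way for names or addresses, except that the replacement is drawn from a prepared list of plausible values instead of being generated digit by digit.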

Masking techniques

~ Suppression

~ Shuffling

~ Redaction (blacking out)

~ Generalization

~ Randomization, generating and substitution

Dynamic data masking

[Diagram: real data (1234-5678-4011) is masked on access by DDM; the primary process sees 1234-5678-4011, the secondary process sees XXXX-XXXX-4011]

Static data masking

[Diagram: SDM turns real data (1234-5678-4011) into a separate set of masked data (XXXX-XXXX-4011); the primary process works with 1234-5678-4011, secondary processes work with XXXX-XXXX-4011]
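The difference between the two modes can be sketched as follows. With dynamic masking there is one data store and values are masked while being read. With static masking a separate, permanently masked copy is produced once. All names here (`mask_card`, `read_dynamic`, `build_static_copy`, the `role` strings) are hypothetical, and the example value is the one shown on the slides.

```python
def mask_card(number):
    """Mask every digit before the last separator; keep the final group readable."""
    head, sep, tail = number.rpartition("-")
    return "".join("X" if ch.isdigit() else ch for ch in head) + sep + tail

production = [{"customer_id": 1, "card": "1234-5678-4011"}]

def read_dynamic(rows, role):
    """Dynamic masking: the same store serves everyone, masked on the fly if needed."""
    if role == "primary":
        return rows
    return [{**row, "card": mask_card(row["card"])} for row in rows]

def build_static_copy(rows):
    """Static masking: create a masked copy once; real values never reach the copy."""
    return [{**row, "card": mask_card(row["card"])} for row in rows]

test_database = build_static_copy(production)   # handed to secondary processes
```

In the static case the secondary environment never holds the real values at all, which is why it suits test and development databases. The dynamic case depends on every read path going through the masking layer.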

{

// Demo

}

Simple script

Tools: masking logic built-in

Tools: declarative approach

{

// Demo

}

Define rules: simplicity and power
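To give a feel for what a declarative approach means, here is a small sketch in which the masking rules are data and a generic engine applies them. It does not reproduce the demo or the rule syntax of any particular tool. The rule format, `TECHNIQUES`, and `apply_rules` are all hypothetical.

```python
def redact(value):
    """Replace every letter and digit with X, keeping separators."""
    return "".join("X" if ch.isalnum() else ch for ch in str(value))

def generalize(value, width=10):
    """Replace an exact number with its range, e.g. 44 -> "41-50"."""
    low = ((int(value) - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"

TECHNIQUES = {"redact": redact, "generalize": generalize}

# The rules say WHAT to mask and with which technique, not HOW to do it.
RULES = [
    {"column": "phone", "technique": "redact"},
    {"column": "age", "technique": "generalize", "width": 10},
]

def apply_rules(record, rules):
    masked = dict(record)
    for rule in rules:
        params = {k: v for k, v in rule.items() if k not in ("column", "technique")}
        technique = TECHNIQUES[rule["technique"]]
        masked[rule["column"]] = technique(masked[rule["column"]], **params)
    return masked

row = {"id": 1, "last_name": "Cortez", "age": 44, "phone": "1-340-337-7194"}
masked_row = apply_rules(row, RULES)
```

Keeping the rules separate from the engine means they can be reviewed, versioned and audited without reading masking code.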

Tools: performance and expertise

~ Explicit and implicit parallelism

~ Automatic and scheduled execution

~ Notifications, monitoring and auditing

~ Efficient processing of large amounts of data

~ Deterministic or repeatable masking
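Deterministic (repeatable) masking means the same input always maps to the same masked value, so values masked in different tables or on different runs still match. One common way to get this, sketched here as an assumption rather than as the method of any specific tool, is to pick the substitute with a keyed hash. The substitute list, the key, and the function name are hypothetical.

```python
import hashlib
import hmac

SUBSTITUTES = ["Avery", "Blake", "Casey", "Devon", "Ellis"]  # hypothetical lookup list

def deterministic_substitute(value, key, choices):
    """Pick a replacement that depends only on the input value and the secret key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(choices)
    return choices[index]

key = b"example-only-key"  # assumption: a real key would be stored and managed securely
first = deterministic_substitute("Sasha", key, SUBSTITUTES)
again = deterministic_substitute("Sasha", key, SUBSTITUTES)
# first == again on every run that uses the same key and substitute list
```

With a short substitute list different inputs will often share a replacement. That is harmless for names but not for values that must stay unique, such as identifiers.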

Tools: systematic approach

~ Thorough analysis of existing infrastructure, data, people and processes

~ Natural separation of roles and responsibilities

~ Data can be handled like other must-haves and daily routines

~ Accountability and traceability

Conclusion

~ Data masking improves data security and the handling of sensitive data in general

~ Technology is heavily underutilized

   • Side job for an administrator or programmer

   • There is no real control

~ Done by the book

   • Systematic, project-based approach

   • Explore and use specialized tools

   • Ask for help

~Martin Kralj

[email protected]

[email protected]
