Data Masking Concept in PowerCenter

…helping protect sensitive data

- What is Data Masking
- Why Data Masking
- DM Transformation - Informatica
- Different Masking Rules
  - Key Masking
  - Random Masking
  - Inbuilt Masking Rules
  - Substitution Masking
- Procedure followed - DM requirement in DICoE
- Challenges faced in AASI Data Masking

Data masking is the transformation of sensitive information into de-identified, realistic-looking data.

- Data remains relevant and meaningful
- Preserves the original characteristics of the data
- Preserves referential integrity

Enterprises need production data in non-production environments for purposes such as:

- Development
- Test
- Data analysis and training

Organizations take extensive measures to secure private data in production environments. As a result, non-production environments become an attractive target for malicious users. Production data therefore has to be used in test environments in such a way that the sensitive data is masked yet realistic. The Informatica PowerCenter Data Masking option protects sensitive information by masking it while maintaining the original nature of the data and preserving referential integrity.

The prerequisite for the Data Masking transformation is Infa 8.5.1. In AMP, the DM server components are installed in Infa 8.6.1.

The Data Masking feature can be used simply by adding a new transformation, the Data Masking transformation, to the mapping.

The DM transformation masks the source data based on the masking rules configured for each field identified as containing sensitive data.

Masking rules can be configured to provide:

- Non-deterministic randomization
- Deterministic and repeatable masking
- Blurring: adding a variance value to the original data
- Substitution of original data with false, unreal data

Different Masking Rules

Key Masking: produces deterministic data and maintains referential integrity through the use of a seed value. The DM transformation requires a seed value to return deterministic data. It creates a default seed value, which can be edited. The default seed value is a random number between 1 and 1,000.

Key Masking Types

String Masking: key masking for strings can be configured to generate repeatable outputs. The following can be specified for string key masking:

- Mask format: the available mask format characters are A, N, D, X, + and R
- Characters to be masked in the source string
- Replacement characters

Numeric Masking: a field in the source file or table can be configured for numeric key masking to generate repeatable outputs.

Date Masking: use this masking rule when a date column must be masked in a way that maintains referential integrity.
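The idea behind seeded, repeatable masking can be illustrated outside PowerCenter. The following is a minimal Python sketch, not the algorithm the DM transformation uses: the function name `key_mask_digits`, the SHA-256 hashing scheme and the sample seed are all assumptions chosen for illustration. It shows only the property that matters for referential integrity: the same source value and seed always give the same masked value.

```python
import hashlib


def key_mask_digits(value: str, seed: int) -> str:
    """Deterministically replace every digit in `value`.

    Hypothetical illustration only: the replacement digits are derived
    from a hash of (seed, value), so the same input and seed always
    produce the same output. Non-digit characters are left unchanged.
    """
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    masked = []
    for position, char in enumerate(value):
        if char.isdigit():
            # Pick a digit from the hash byte at this position.
            masked.append(str(digest[position % len(digest)] % 10))
        else:
            masked.append(char)
    return "".join(masked)


# The same key masked in two different tables stays joinable,
# because both calls use the same seed.
customer_table_key = key_mask_digits("123-45-6789", seed=190)
orders_table_key = key_mask_digits("123-45-6789", seed=190)
```

Because both calls share the seed, `customer_table_key` and `orders_table_key` are equal, which is what allows a masked primary key to keep matching its masked foreign keys.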

Random Masking: generates non-deterministic data. The Data Masking transformation returns different values when the same source value occurs in different rows.

Random Masking Types

Numeric Masking
Rules that can be applied for numeric random masking:

- Range: define the range of the masked value
- Blurring: generate masked values that are within a fixed or percentage variance of the source data

String Masking
Similar rules to string key masking. In addition, there is an option to specify the range of the string length.
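The range and blurring rules for numeric random masking can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the DM transformation's implementation: the function names and the use of Python's `random` module are hypothetical.

```python
import random


def mask_in_range(low: int, high: int, rng: random.Random) -> int:
    """Range rule: return a random value between low and high, inclusive."""
    return rng.randint(low, high)


def blur_fixed(value: int, variance: int, rng: random.Random) -> int:
    """Fixed blurring: return a value within +/- variance of the source."""
    return rng.randint(value - variance, value + variance)


def blur_percent(value: float, percent: float, rng: random.Random) -> float:
    """Percentage blurring: return a value within +/- percent of the source."""
    delta = abs(value) * percent / 100.0
    return rng.uniform(value - delta, value + delta)


rng = random.Random()  # unseeded, so repeated runs give different values
masked_age = mask_in_range(18, 65, rng)         # any value from 18 to 65
masked_salary = blur_fixed(50000, 1000, rng)    # 49000 to 51000
masked_balance = blur_percent(200.0, 10, rng)   # 180.0 to 220.0
```

Unlike key masking, nothing ties the output to the input beyond the configured bounds, so the same source value can map to different masked values in different rows.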

Date Random Masking
Masking rules that can be applied:

- Range: upper/lower bound of the masked date value. The default date time format is MM/DD/YYYY HH24:MI:SS.
- Blur: mask the date based on the variance applied to a unit of the date. The blur unit can be year, month, day or hour; the default is year. DM applies the variance to the selected blur unit, and random numbers are substituted for the other units.

For example, to restrict the masked date to a date within two years of the source date, select year as the unit and enter two as both the low and high bound.
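The year-unit blur in the example above can be sketched in Python as follows. This is a rough illustration of the described behaviour (variance on the selected unit, random values for the other units), not Informatica's actual algorithm; `blur_year` and `format_masked` are hypothetical names.

```python
import calendar
import random
from datetime import datetime


def blur_year(source: datetime, low: int, high: int,
              rng: random.Random) -> datetime:
    """Blur a date with year as the blur unit.

    The year moves by at most `low` years back or `high` years forward;
    all other units are replaced with random valid values.
    """
    year = source.year + rng.randint(-low, high)
    month = rng.randint(1, 12)
    # monthrange returns (weekday of first day, number of days in month).
    day = rng.randint(1, calendar.monthrange(year, month)[1])
    return datetime(year, month, day,
                    rng.randint(0, 23), rng.randint(0, 59), rng.randint(0, 59))


def format_masked(value: datetime) -> str:
    """Render in the MM/DD/YYYY HH24:MI:SS layout mentioned above."""
    return value.strftime("%m/%d/%Y %H:%M:%S")


source_date = datetime(2010, 6, 15, 10, 30, 0)
masked_date = blur_year(source_date, low=2, high=2, rng=random.Random())
masked_text = format_masked(masked_date)
```

With low and high bounds of two, the masked year always falls between 2008 and 2012 for this source date, while the month, day and time carry no information about the original.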

Inbuilt Masking Rules
Inbuilt masking rules that can be applied:

- SSN
- Credit card
- URL / IP address
- Phone
- Email address

Masking Social Security Number
A list containing the valid SSN numbers is stored on the Infa server at <Installation Directory>\infa_shared\SrcFiles\highgroup.txt. The DM transformation accesses the highgroup.txt file and generates a masked SSN that is not present in the list.

Substitution Masking:

Substituting data with lookup transformation
Apart from the different masking algorithms that are available, original data can also be substituted with unreal information from dictionary files. The default dictionary files are available in the following path: server\infa_shared\LkpFiles

Example: FirstNames.dic
This file contains an SNo column and a FirstNames column. In the mapping, we can generate a random number using the DM transformation and pass it as input to a lookup, which matches it against SNo and returns the first name from the lookup file. If the dic file has 100 names in it, the range for the random numbers can be specified as 1 to 100.
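The random-number-plus-lookup flow can be pictured with a small Python sketch. The in-memory dictionary below is a hypothetical stand-in for FirstNames.dic (its names and size are invented for illustration), and the function name is an assumption; in PowerCenter the same steps are done by the DM and Lookup transformations.

```python
import random

# Hypothetical stand-in for FirstNames.dic: SNo -> FirstNames.
FIRST_NAMES = {
    1: "Alice",
    2: "Bharat",
    3: "Carla",
    4: "Deepak",
    5: "Elena",
}


def substitute_first_name(dictionary: dict, rng: random.Random) -> str:
    """Pick a replacement first name by random serial number.

    The random range is 1..len(dictionary), mirroring the rule that a
    file with 100 names needs a random range of 1 to 100.
    """
    serial_number = rng.randint(1, len(dictionary))
    return dictionary[serial_number]


masked_first_name = substitute_first_name(FIRST_NAMES, random.Random())
```

Keeping the random range equal to the number of dictionary rows ensures every generated serial number finds a match in the lookup.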

Procedure followed

- Identify sensitive fields
- Document the DM requirement in a proper format. Ideally it should have: table/file name, attribute/field name, DM required (Y/N), PK/FK relation, rule type and description
- If the requirement has common fields to be masked across files/tables, creating a mapplet with the “to be masked” fields is helpful
- Coding/Testing

Few challenges faced in AASI Data Masking

In the AASI DM requirement, the source and target were MF files. To ensure that our DM mappings had no impact on the fields that did not require masking, we kept only the “fields to be masked” in the Data Map and declared all others as Filler with binary data type.

The mask format defined for an attribute should be in sync with the actual characters fed from the source, as in the case of String Masking. For example, if the mask format is defined as alphabets (A) but the value from the source contains special characters (@, $ etc.), the error “Invalid input mask format” is raised.

SSN Masking accepts only a valid TAX_ID as input, for example XXX-XX-XXXX. So if we plan to use inbuilt SSN masking, we need to decide whether to transform the input source value to the proper format and use SSN masking, or simply to use numeric/string key masking.

Maintaining Data Quality: the DM transformation produces masked output based on inbuilt algorithms, so even if NULL or 0 values are passed as input, DM generates a masked output value. However, downstream teams using the masked data may need to check for NULL or 0 values in the source, so we need to make sure that data quality is maintained. In such cases, we may have to apply a transformation to retain the source value when it is 0 or NULL.
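The NULL/0 pass-through described in the last point can be sketched as a thin wrapper around any masking function. This is a Python illustration of the logic only; in a mapping it would be built with PowerCenter transformations, and the names `mask_preserving_empty` and `mask_fn` are hypothetical.

```python
def mask_preserving_empty(value, mask_fn):
    """Return NULL (None) and numeric 0 unchanged; mask everything else.

    `mask_fn` is any masking function supplied by the caller.
    """
    if value is None or value == 0:
        return value
    return mask_fn(value)


def sample_mask(value):
    """Placeholder masking rule used only for this example."""
    return value + 1000


rows = [None, 0, 42]
masked_rows = [mask_preserving_empty(row, sample_mask) for row in rows]
```

Here the first two rows pass through untouched and only the third is masked, so downstream checks for NULL or 0 still behave as they would on the source data.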

Thank You
