Data Security Guidelines
May 2010 – Version 1.0
Gary Waldrom
Data Security Guidelines 2010· Page 2
Candidate Selection
Entity Class
A logical model of an identifiable party.
• Each instance of an entity defined within the system should be identified and marked for drill-down investigation.
Domains
A logical structure of attributes represented within a single entity.
• Each instance of a domain structure that is listed in the spreadsheet (slides 5-8) and contained within an identified entity should be marked for further drill-down investigation.
Attributes
Individual data fields subject to data type constraints and the associated business and integrity rules.
• Each attribute type that is listed in the spreadsheet (slides 5-8) and contained within an identified domain is a candidate for data obfuscation under the data obfuscation rules.
Data Sensitivity
Level 1
• Sensitivity level 1 is a unique identifier by which a party can be identified without reference to any other sensitive information (high cardinality). All instances should be obfuscated or masked.
Level 2
• Sensitivity level 2 is information that can positively identify a party only collectively, i.e. when more than one instance is combined. In isolation such data, although deemed sensitive, does not directly and uniquely identify the party; however, as more attributes are supplied they ultimately amount to a sensitivity level 1 identifier without any level 1 attribute being involved (normal cardinality). All combined instances must be obfuscated.
Level 3
• Sensitivity level 3 is data with a low cardinality ratio. All combined instances should be obfuscated, although individual instances will not identify a party.
Risk of Identification of Parties
Unique Identifier – Sensitivity Level 1
• ∞
• High risk: these identifiers uniquely identify a party and are traceable through various public-domain systems
Composite Identifiers – Sensitivity Level 2
• n exponent
• Becomes an identifier as multiple instances increase cardinality; the exponent is based on cardinality
Low Cardinality Identifiers – Sensitivity Level 3
• n + Composite
• Multiple composites increase identification; cardinality increases as instances are added
Attribute Identification
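Entity | Domain | Attribute | Data Type (Generic) | Classification
Client | Name | Firstname(s) | Character | 2
Client | Name | Surnames / Family Name | Character | 2
Client | Name | Title / Prefix | Denormalised: Character | 3
Client | Name | Suffix | Denormalised: Character | 3
Client | Name | Salutation | Denormalised: Character | 2
Client | Address | House Number/Name | Character | 2
Client | Address | Address Line 1 | Character | 2
Client | Address | Address Line 2 | Character | 2
Client | Address | Address Line 3 | Character | 2
Client | Address | Address Line 4 | Character | 2
Client | Address | State / County / Canton / Region etc | Denormalised: Character | 3
Client | Address | Zip / Post Code | Character | 2
Client | Address | Country | Denormalised: Character | 3
Client | Contact | Home Telephone Number | Character | 1
Client | Contact | Work Telephone Number | Character | 1
Client | Contact | Cell/Mobile Number | Character | 1
Client | Contact | Additional Telephone Numbers | Character | 1
Client | Contact | Email1 | Character | 1
Client | Contact | Email2 | Character | 1
Client | Contact | Additional Email Accts | Character | 1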
Attribute Identification
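Entity | Domain | Attribute | Data Type (Generic) | Classification
Client | Personal Details | Date of Birth | Date | 3
Client | Personal Details | Gender | Denormalised: Character | 3
Client | Personal Details | Political Persuasion | Denormalised: Character | 3
Client | Personal Details | Religious or Philosophical Beliefs | Denormalised: Character | 3
Client | Personal Details | Sexual Persuasion | Denormalised: Character | 3
Client | Personal Details | Race or Ethnic Origin | Denormalised: Character | 3
Client | Personal Details | Accusations or Suspicions | Denormalised: Character | 3
Client | Personal Details | Convictions / Judgements / Criminal Records | Denormalised: Character | 3
Client | Personal Details | Notes | Long Character (free text could hold sensitive details) | 1
Client | Personal Details | Internet usage & web tracking information | Character / W3C Logs | 2
Client | Personal Details | Physical and/or Mental Health | Character | 3
Client | Personal Details | Source of Wealth | Long Character (free text could hold sensitive details) | 1
Client | Personal Details | Nationality | Denormalised: Character | 3
Client | Personal Details | Domicile | Denormalised: Character | 3
Client | Personal Details | Spouse | Name Domain | 2
Client | Personal Details | Children | Name Domain | 2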
Attribute Identification
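Entity | Domain | Attribute | Data Type (Generic) | Classification
Client | Natural Keys | SSN / Tax ID / NI Number | Character | 1
Client | Natural Keys | Passport Number | Character | 1
Client | Natural Keys | Login IDs & Passwords | Character | 1
Client | Natural Keys | Union / Club / Society Membership | Character | 1
Client | Natural Keys | Bank Account Number(s) | Number | 1
Client | Natural Keys | Sort Code(s) | Number | 2
Client | Natural Keys | Account Name(s) | Character | 1
Client | Natural Keys | Residential Address | Address Domain | 2
Client | Linked Data | Beneficiary | Beneficiary Entity | 1
Client | Linked Data | IFA | IFA Entity | 2
Client | Linked Data | Intermediary | Intermediary Entity | 2
Client | Linked Data | Sub Account | Sub Account Entity | 1
Client | Linked Data | Accountant | Accountant Entity | 2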
Attribute Identification
Entity | Domain | Classification
Beneficiary | All Client Entity Domains | 2
IFA | All Client Entity Domains | 3
Intermediary | All Client Entity Domains | 3
Sub Account | All Client Entity Domains | 1
Classification Key
Sensitivity Level 1
Sensitivity level 1 is a unique identifier by which a party can be identified without reference to any other sensitive information (high cardinality). All instances should be obfuscated.
Sensitivity Level 2
Sensitivity level 2 is information that can positively identify a party only collectively, i.e. when more than one instance is combined. In isolation such data, although deemed sensitive, does not directly and uniquely identify the party; however, as more attributes are supplied they ultimately amount to a sensitivity level 1 identifier without any level 1 attribute being involved (normal cardinality). All combined instances must be obfuscated.
Sensitivity Level 3
Sensitivity level 3 is data with a low cardinality ratio. All combined instances should be obfuscated, although individual instances will not identify a party.
Note: Normalised data types have the obfuscation layer applied at the reference table level.
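One possible reading of the classification key, sketched in Python: level 1 attributes are always obfuscated, while level 2 and level 3 attributes are obfuscated only when more than one of them is exposed together. The function name and the "more than one" threshold are assumptions made for illustration; the lookup holds only a handful of rows copied from the attribute tables.

```python
# A few classifications copied from the attribute tables (not the full list).
SENSITIVITY = {
    "Passport Number": 1,
    "Email1": 1,
    "Firstname(s)": 2,
    "Zip / Post Code": 2,
    "Date of Birth": 3,
    "Gender": 3,
}


def attributes_to_obfuscate(exposed):
    """Return the subset of `exposed` attributes that should be obfuscated.

    Assumption: "combined instances" means two or more level 2/3
    attributes exposed in the same result.
    """
    level_1 = {name for name in exposed if SENSITIVITY[name] == 1}
    combinable = {name for name in exposed if SENSITIVITY[name] in (2, 3)}
    if len(combinable) > 1:
        return level_1 | combinable
    return level_1
```

Under this reading a lone `Gender` column could be shown as-is, whereas `Firstname(s)` together with `Zip / Post Code` would both be obfuscated.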
Use-Case Example of Composite Identifiers (Sensitivity Level 2)
Attribute | Cardinality
First Name | >= 1,000,000
Surname | >= 100,000
Country | >= 10,000
Region | >= 100
Post Code | >= 5
House Number | <= 2
Positive identification increases with the accumulation of sensitivity level 2 attributes held within the same domain.
Data is purely for reference.
Figure markers: Obfuscation point; Point of probability.
Use-Case Example of Composite Identifiers (Sensitivity Level 3)
Attribute | Cardinality
Gender | >= 100,000,000
Country | >= 10,000,000
Region | >= 1,000,000
Date of Birth | >= 3,000
Surname | >= 5
Post Code | <= 2
There is little increase in positive identification from accumulating sensitivity level 3 attributes until a sensitivity level 2 attribute is added.
Data is purely for reference.
Figure markers: Obfuscation point; Point of probability.
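The narrowing effect in these use cases can be reproduced on a toy data set. The sketch below counts how many records still match a target party as each attribute is added to the filter; the five records and the function name are invented purely for illustration.

```python
# Invented sample records, for illustration only.
RECORDS = [
    {"first": "Ann", "surname": "Lee", "country": "UK", "post_code": "AB1"},
    {"first": "Ann", "surname": "Lee", "country": "UK", "post_code": "CD2"},
    {"first": "Ann", "surname": "Lee", "country": "FR", "post_code": "75001"},
    {"first": "Ann", "surname": "Ray", "country": "UK", "post_code": "AB1"},
    {"first": "Bob", "surname": "Lee", "country": "UK", "post_code": "AB1"},
]


def narrowing(records, target, attributes):
    """Return (attribute, remaining matches) after each attribute is added."""
    remaining = list(records)
    counts = []
    for attribute in attributes:
        remaining = [r for r in remaining if r[attribute] == target[attribute]]
        counts.append((attribute, len(remaining)))
    return counts


steps = narrowing(RECORDS, RECORDS[0], ["first", "surname", "country", "post_code"])
# steps == [("first", 4), ("surname", 3), ("country", 2), ("post_code", 1)]
```

Once the remaining count approaches 1, the combination behaves like a sensitivity level 1 identifier.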
Numeric Obfuscation
Numbers that are used in aggregate functions and checked for accuracy (e.g. holdings, values, transactions) should not be obfuscated, provided that all other attributes within the domain/entity structure have been obfuscated and there is no method of reversing the obfuscation layer to identify sensitive data against the values. Where that does not hold:
• Integers should be obfuscated to a length equal to or less than that of the original number, and must still conform to any specific business rules.
• Fixed-point numbers should be obfuscated to a precision equal to or less than the original, retaining the original scale, and must still conform to any specific business rules.
• Floating-point numbers should be obfuscated to a precision and scale equal to or less than the original, and must still conform to any specific business rules.
• Currency/percentage formatting over numeric values should be retained for verification purposes.
• Ordinal numbers should have the alphabetic element obfuscated in the same way as an alpha data element, retaining the same two-character format.
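A minimal sketch of the integer and fixed-point rules, assuming Python with its standard `secrets` and `decimal` modules. The function names are hypothetical and business-rule validation is deliberately left out.

```python
import secrets
from decimal import Decimal


def obfuscate_integer(value: int) -> int:
    """Return a random integer with the same sign and digit count as `value`."""
    digits = len(str(abs(value)))
    if digits == 1:
        magnitude = secrets.randbelow(10)
    else:
        low = 10 ** (digits - 1)          # smallest number with `digits` digits
        magnitude = low + secrets.randbelow(9 * low)
    return -magnitude if value < 0 else magnitude


def obfuscate_fixed_point(value: Decimal) -> Decimal:
    """Randomise every digit of a finite Decimal while keeping its scale.

    A leading zero may be drawn, so the precision can shrink but never grow.
    """
    sign, digits, exponent = value.as_tuple()
    new_digits = tuple(secrets.randbelow(10) for _ in digits)
    return Decimal((sign, new_digits, exponent))
```

For example, `obfuscate_fixed_point(Decimal("123.45"))` always returns a value with two decimal places and at most five significant digits. A random draw can occasionally reproduce the original value, so a real implementation might redraw in that case.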
Alpha Obfuscation
Alphabetic and alphanumeric data types should be obfuscated while retaining the original structure of the underlying data; however, certain exceptions exist for search/view criteria:
• SGML/XML/HTML/XHTML/RSS data formats must retain XML reserved characters so that they can still be used in native views, DTD, XLS, web-based formats, etc.
• Embedded Java code must be retained, but its underlying attributes obfuscated.
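As an illustration of structure-preserving alpha obfuscation, the sketch below swaps each ASCII letter or digit for a random character of the same class and leaves everything else alone. For markup it keeps tags and entity references verbatim, so the reserved characters survive. The function names and the simple tag/entity pattern are assumptions; comments, CDATA sections and attribute values would need extra handling in a real tool.

```python
import re
import secrets
import string


def obfuscate_text(text: str) -> str:
    """Replace letters and digits with random ones of the same class."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(secrets.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(secrets.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(secrets.choice(string.digits))
        else:
            out.append(ch)  # punctuation, spaces and reserved characters stay
    return "".join(out)


# Tags and entity references are kept verbatim (simplified pattern).
_MARKUP = re.compile(r"(<[^>]*>|&#?\w+;)")


def obfuscate_markup(document: str) -> str:
    """Obfuscate only the text between tags and entity references."""
    parts = _MARKUP.split(document)
    return "".join(p if _MARKUP.fullmatch(p) else obfuscate_text(p) for p in parts)
```

Calling `obfuscate_markup("<name>Ann &amp; Bob</name>")` returns a string of the same length in which `<name>`, `&amp;` and `</name>` are unchanged and only the two names are scrambled.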
Key Obfuscation
Obfuscating keys risks breaking Declarative Referential Integrity when the data is presented to applications that rely upon it. Therefore:
• Natural keys that are identified as sensitive data can only be anonymised/masked.
• Natural keys that are identified as non-sensitive are out of scope and may be retained.
• Surrogate keys are out of scope and should be retained.
Date Obfuscation 1
Dates should retain the original date format of the national character set of the underlying data.
• Day numbers should be obfuscated but retain the 1-31 format.
• Day-of-the-week numbers should be obfuscated but retain the 0-6 or 1-7 format, depending on the platform.
• Day names should be obfuscated as per the alpha data element; however, the length of the day name must be changed to a length between 6 and 9 that differs from the length of the original day name.
• Month numbers should be obfuscated but retain the 1-12 format.
• Ordinal numbers should have the alphabetic element obfuscated in the same way as an alpha data element, retaining the same two-character format.
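The day and month rules could be sketched as follows in Python. The function names are hypothetical; `obfuscate_name` takes the permitted length range as parameters so that it can serve for day names (6 to 9).

```python
import secrets
import string


def obfuscate_day_number() -> int:
    """Random day number in the 1-31 format."""
    return 1 + secrets.randbelow(31)


def obfuscate_month_number() -> int:
    """Random month number in the 1-12 format."""
    return 1 + secrets.randbelow(12)


def obfuscate_name(original: str, min_len: int, max_len: int) -> str:
    """Random capitalised word whose length differs from the original's."""
    lengths = [n for n in range(min_len, max_len + 1) if n != len(original)]
    length = secrets.choice(lengths)
    first = secrets.choice(string.ascii_uppercase)
    rest = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length - 1))
    return first + rest
```

For instance, `obfuscate_name("Monday", 6, 9)` yields a random word of 7, 8 or 9 letters. Note that drawing the day number independently of the month can produce combinations such as 31 February, so date validation may need to be relaxed or the draw constrained.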
Date Obfuscation 2
Dates should retain the original date format of the national character set of the underlying data.
• Month names should be obfuscated as per the alpha data element; however, the length of the month name must be changed to a length between 3 and 12 that differs from the length of the original month name. Abbreviated month names should be obfuscated retaining the 3-character format.
• Year numbers should always retain the four-digit format including the century. For years in the past the range is (current year - any validation criteria) to (current year - 1); for projected ranges it is (current year + 1) to (current year + any validation criteria). This could cause problems with date verification functions: any function code that performs these verifications must use the same seed value as the date value and must fully enclose all other dates within the same block.
• Decision support systems that rely on "roll-forward"/"roll-back" date scenarios and date range queries must retain the requested period change between two dates.
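A hedged sketch of two of these rules in Python: drawing a past year inside the permitted range, and shifting related dates by one shared offset so that the period between them is retained. The names `obfuscate_past_year`, `shift_together`, `max_years_back` and `max_shift_days` are illustrative assumptions standing in for "any validation criteria".

```python
import secrets
from datetime import date, timedelta


def obfuscate_past_year(current_year: int, max_years_back: int) -> int:
    """Random year in [current_year - max_years_back, current_year - 1]."""
    return current_year - 1 - secrets.randbelow(max_years_back)


def shift_together(dates, max_shift_days: int = 365):
    """Move every date by the same non-zero offset, preserving intervals."""
    offset = 1 + secrets.randbelow(max_shift_days)
    if secrets.randbelow(2):
        offset = -offset
    delta = timedelta(days=offset)
    return [d + delta for d in dates]
```

Because `shift_together` applies a single offset to the whole block of dates, a "roll-forward" query over `date(2009, 1, 15)` and `date(2009, 3, 1)` still sees a 45-day period after obfuscation.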
Granularity of Access to Sensitive Data
Development
• Development environments must be fully obfuscated at the data level (not through obfuscated views), as developers usually hold higher privileges in these environments.
UAT
• UAT environments must be fully obfuscated to all Development, Support, and non-authorised users.
• Business users may see sensitive data based on their individual levels of authorisation.
• Access to data by Support users should be disallowed if possible.
• If access is allowed for "fix-on-fail" functionality, it must be keystroke-logged through an auditing application.
Production
• Production environments must be fully obfuscated to all Development, Support, and non-authorised users.
• Business users may see sensitive data based on their individual levels of authorisation.
• Access to data by Support users should be disallowed if possible.
• If access is allowed for "fix-on-fail" functionality, it must be keystroke-logged through an auditing application.
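The access rules above can be condensed into a small decision function. This is only a sketch: the environment and role labels, the function name and the two boolean flags are assumptions, and the keystroke logging itself is outside its scope.

```python
def sees_clear_data(environment: str, role: str,
                    authorised: bool = False,
                    fix_on_fail: bool = False) -> bool:
    """Return True if the user may see unobfuscated data.

    environment: "DEV", "UAT" or "PROD" (illustrative labels)
    role: "business", "development" or "support" (illustrative labels)
    """
    if environment == "DEV":
        return False                 # fully obfuscated at the data level
    if role == "business":
        return authorised            # per-user authorisation level
    if role == "support":
        return fix_on_fail           # must be keystroke-logged by an auditing application
    return False                     # development and non-authorised users
```

For example, `sees_clear_data("PROD", "support")` is `False` unless fix-on-fail access has been explicitly granted.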
Legend: Business Users, Development & Support; Business Users only; Development & Support only
Deployment Methods
Data Security Guideline Policy
Full Environment Access Control
• Prod, UAT, SIT and Dev environments are fully segregated by user type or privilege level.
• Data obfuscation/anonymisation/masking is performed through ETL tools from one environment to the next.
Shared Environment Access
• Prod, UAT, SIT and Dev environments may be shared by different user types, i.e. business, developers, support. The level of granularity must be defined on a per-user-type or privilege-level basis.
• Data is obfuscated/anonymised/masked based on the authority level of the user type or privilege level.
Hybrid Environments
• Prod, UAT and SIT environments may be obfuscated at a user-type level, but transfers of data into Dev environments may be performed through ETL utilities.
• Data is obfuscated to the same rules, but the deployment uses both technical methods.
Benefits & Drawbacks of Deployment Methods
Full Environment Access Control
Benefits
• Leverages existing tool capabilities and vendor support
• Guaranteed obfuscation contained within the environment
• User access managed at a different layer to data access
• Access to the environment determines visibility
Drawbacks
• ETL tool license/platform costs
• Load window issues
• Metadata & cipher security concerns
Shared Environment Access Control
Benefits
• Higher level of access granularity, greater flexibility
• The level of encryption can be defined to conform to national regulatory controls
• No load window issues: all users share the same data instance
Drawbacks
• Development costs
• Requires clear delineation of user roles and role management
• Proprietary technology solutions
Hybrid Environments
Benefits
• All prior mentioned
• Greater flexibility in defining a solution which fits with a current "modus operandi"
Drawbacks
• All prior mentioned
• Potential support complexity issues
Data Obfuscation Methodology
Full Environmental Access Control
• No data obfuscation; non-authorised users have no access
Hybrid Environment
• No access to PROD, obfuscation in UAT based on roles and rules, ETL obfuscation into DEV
Shared Environment
• Data obfuscation based on roles and rules of sensitivity
Environmental Control (Access Method)
[Diagram labels: PROD, UAT, DEV; Instance 1; Instance 2 (Obfuscated); Instance 3 (Obfuscated); two Informatica ETL steps (Apply Obfuscation Rules); Business Users; Development & Support Users]
Environmental Control (Hybrid Method)
[Diagram labels: PROD, UAT, DEV; Instance 1; Instance 1 or 2; Instance 3 (Totally Obfuscated); Obfuscation Layer; Informatica ETL (Apply Obfuscation Rules); Periodic Refresh or Duplex Feed; Business Users; Development & Support Users]
Appendix
Terms of Reference
Lingual Reference
Risk Impact/Probability
Non-Deterministic Obfuscation
Monte Carlo Method
Dynamic Obfuscation Function Methods
Terms of Reference
Anonymous/Anonymised
To remain unidentified or nameless, i.e. NULL. A field that is anonymised would not show any data at all, and you could not verify the structure of the data.
Obfuscate/Obfuscated
To confuse or scramble, i.e. encrypt. You could verify that a date is a date, albeit the wrong one, that a number is a number, albeit the wrong one, and that alpha is alpha in the same structure; you would see the structure, but the sensitive data would be indecipherable.
Mask/Masked
To cover or hide. This would normally be used in password protection, where an asterisk is displayed for each character typed.
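The difference between the three terms can be shown side by side on a single value. The functions below are a minimal Python illustration with hypothetical names, not a prescribed implementation.

```python
import secrets
import string


def anonymise(value: str) -> None:
    """Anonymised: nothing is shown, so the structure cannot be verified."""
    return None


def mask(value: str) -> str:
    """Masked: every character is covered by an asterisk."""
    return "*" * len(value)


def obfuscate(value: str) -> str:
    """Obfuscated: same structure, but letters and digits are scrambled."""
    pools = (string.ascii_uppercase, string.ascii_lowercase, string.digits)
    return "".join(
        next((secrets.choice(pool) for pool in pools if ch in pool), ch)
        for ch in value
    )
```

For the value `"1975-04-23"`, `anonymise` returns `None`, `mask` returns ten asterisks, and `obfuscate` returns ten characters that still have the digits-hyphen-digits shape of a date.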
Lingual Reference
"Anonymous" and "obfuscate" are both used in literature: an anonymous writer is unknown, whereas writing under a nom de plume is obfuscated.
Risk Impact/Probability
Probability - A risk is an event that "may" occur. The probability of it occurring can range anywhere from just above 0% to just below 100%. (Note: It can't be exactly 100%, because then it would be a certainty, not a risk. And it can't be exactly 0%, or it wouldn't be a risk.)
Impact - A risk, by its very nature, always has a negative impact. However, the size of the impact varies in terms of cost and impact on some other critical factor.
We apply these rules to determine when to obfuscate data and when not to.
Non-Deterministic Obfuscation
A variety of factors can cause an algorithm to behave non-deterministically:
• If it uses external state other than the input, such as user input, a global variable, a hardware timer value, a random value, or stored disk data.
• If it operates in a way that is timing-sensitive, for example if it has multiple processors writing to the same data at the same time. In this case, the precise order in which each processor writes its data will affect the result.
• If a hardware error causes its state to change in an unexpected way.
A major problem with deterministic algorithms is that sometimes we don't want the results to be predictable. For example, if you are playing an online game of blackjack that shuffles its deck using a pseudorandom number generator, a clever gambler might guess precisely the numbers the generator will choose and so determine the entire contents of the deck ahead of time, allowing him to cheat. Similar problems arise in cryptography, where private keys are often generated using such a generator. This sort of problem is generally avoided by using a cryptographically secure pseudo-random number generator.
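The blackjack example can be demonstrated with Python's standard library: a shuffle driven by a seeded `random.Random` is fully reproducible by anyone who learns the seed, whereas `secrets.SystemRandom` draws from the operating system's entropy source and has no seed to guess. The two helper names are illustrative.

```python
import random
import secrets


def seeded_shuffle(deck, seed):
    """Deterministic: the same seed always yields the same order."""
    rng = random.Random(seed)
    shuffled = list(deck)
    rng.shuffle(shuffled)
    return shuffled


def secure_shuffle(deck):
    """Non-deterministic: backed by the OS entropy source, cannot be seeded."""
    rng = secrets.SystemRandom()
    shuffled = list(deck)
    rng.shuffle(shuffled)
    return shuffled
```

Two calls to `seeded_shuffle(range(52), 42)` return identical decks, which is exactly what the clever gambler exploits; `secure_shuffle` offers no equivalent handle.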
The Monte Carlo Methods
Monte Carlo methods are computational algorithms that rely on repeated random sampling to compute their results; one application is a stochastic function used to create an obfuscation layer.
Stochastic programming is a framework for modelling optimization problems that involve uncertainty.
Because they rely on repeated computation of random or pseudo-random numbers, these methods are best suited to, and tend to be used in, cases where it is infeasible or impossible to compute an exact result with a deterministic algorithm, which is what makes them useful for data obfuscation.
These are the building blocks for secure obfuscation of highly sensitive data within the banking environment and will satisfy an external audit.
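For readers unfamiliar with the technique, the textbook illustration of repeated random sampling is estimating π from random points in the unit square. The sketch below is a generic Monte Carlo example and not the obfuscation function itself; the optional `seed` parameter exists only to make a run reproducible for testing.

```python
import random


def estimate_pi(samples: int, seed=None) -> float:
    """Estimate pi as 4 x the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples
```

The estimate tightens as `samples` grows, which is the general trade-off of Monte Carlo methods: accuracy is bought with repeated sampling rather than with an exact deterministic calculation.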
Dynamic Obfuscation Function Methods
This is an example of a high-level data obfuscation function in which a decision is made, based on the previous criteria, about when to obfuscate, followed by the process of obfuscation for an alpha data type (simplest form).
Data is obfuscated at the decision point using the underlying technology's info-gap, non-probabilistic methods of random number generation, which create seed data for the ASCII conversion of the real data.
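The slide's function is not reproduced in this transcript, so the following is only a rough Python sketch of the same two stages: a decision point followed by an ASCII-based conversion of alpha data. All names are hypothetical, the decision rule is a simplified reading of the sensitivity levels, and Python's `secrets` module stands in for the info-gap random number generation mentioned above.

```python
import secrets


def should_obfuscate(sensitivity_level: int, combined_with_others: bool) -> bool:
    """Decision point: level 1 always; levels 2 and 3 only when combined."""
    if sensitivity_level == 1:
        return True
    return combined_with_others


def obfuscate_alpha(value: str) -> str:
    """Rotate each ASCII letter by a random, non-zero offset within its case."""
    out = []
    for ch in value:
        code = ord(ch)
        if 65 <= code <= 90:        # 'A'-'Z'
            out.append(chr(65 + (code - 65 + 1 + secrets.randbelow(25)) % 26))
        elif 97 <= code <= 122:     # 'a'-'z'
            out.append(chr(97 + (code - 97 + 1 + secrets.randbelow(25)) % 26))
        else:
            out.append(ch)          # non-letters keep the original structure
    return "".join(out)


def dynamic_obfuscate(value: str, sensitivity_level: int,
                      combined_with_others: bool = False) -> str:
    """Apply the decision, then obfuscate only when required."""
    if should_obfuscate(sensitivity_level, combined_with_others):
        return obfuscate_alpha(value)
    return value
```

Because each offset is between 1 and 25, every letter is guaranteed to change while case, length and punctuation are preserved.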