
Risk management and the release of microdata: balancing disclosure risks and data utility

Sonia Whiteley & Eric Skuja, The Social Research Centre

Uploaded by miranda-wells on 26-Dec-2015


About the Social Research Centre (1)

We are a private, for-profit company owned by ANU Enterprise, a subsidiary of the Australian National University and co-founder of the Australian Centre for Applied Social Research Methods (AusCen).

Our resources include 60 professional staff, a 125-station call centre, a panel of 250 interviewing staff and qualitative interviewing facilities.

Typical services provided include survey design and execution (including sampling, questionnaire design, survey administration and interviewer training), qualitative research, survey data management, statistical consulting and analytical and interpretative reporting.

About the Social Research Centre (2)

We conduct a number of large-scale surveys that contribute to the annual Report on Government Services (ROGS), which provides information on the equity, effectiveness and efficiency of government services in Australia. These include:

National Survey of Community Satisfaction with Policing (NSCSP)

Student Outcomes Survey (SOS)

Australian Early Development Census (AEDC).

About the AEDC (1)

The Australian Early Development Census (AEDC) is conducted every three years for every Australian child currently enrolled in their first year of full-time school.

Approximately 100 checklist questions cover the five theoretical domains of early childhood development:

− Physical health and wellbeing

− Social competence

− Emotional maturity

− Language and cognitive skills (school-based), and

− Communication skills and general knowledge.

About the AEDC (2)

The AEDC was conducted for the first time in 2009 with a second collection in 2012. Preparations are already underway for the 2015 AEDC collection.

Approximately 290,000 checklists were completed in each AEDC collection, which equates to more than 96 per cent of in-scope children.

Australia has three school systems (Government, Catholic and Independent), and all three actively participate in the AEDC.

Our role in the AEDC

Our involvement in the AEDC is ongoing and includes:

Collecting checklist data from teachers via a secure, online system

Developing and maintaining a government website containing

− AEDC resources http://www.aedc.gov.au/ and

− AEDC macrodata and maps http://www.aedc.gov.au/data

Managing and disseminating the AEDC data collections.

Traditional risk management of microdata

A ‘worst-case scenario’ approach makes the data custodian responsible for identifying and mitigating all potential risks

This model assumes that data users:

Are unprofessional

Cannot be trusted

Do not have the required skills or training, and

(In extreme cases) intend to misuse the data maliciously

Data must be protected from the users, and the utility of the unit record data for research is a lower priority

The product of this approach is typically a confidentialised unit record file (CURF), and confidentialisation is regarded as the primary safeguard

Initial approach to AEDC data management

The AEDC Data Protocol and the AEDC Linkage Policy provide the research community with guidance regarding the appropriate uses of the data.

Two confidentialised unit record files (CURFs) were produced:

1. The Research CURF and

2. The Geography CURF

The files were split due to the

− large number of demographic variables

− fine level of geographic information that was available, and

− concerns about the re-identification and disclosure of children, classes and schools

Perturbation issues

Both files were perturbed by an external agency based on a ‘worst-case scenario’ view of risk

The perturbation rules were not disclosed, to prevent reverse engineering

Key variables of relevance to early childhood education researchers were changed significantly

− In particular, gender of the child was altered substantially in some geographic areas

Government agencies began using the CURFs because they were smaller and more accessible, which led to discrepancies between the official results and the agencies’ own results
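The actual perturbation rules were never disclosed, so the following is only a hypothetical illustration of the general mechanism. It uses a PRAM-style random swap that flips a binary gender variable with a fixed probability. The swap probability, the area names and all counts are invented. The sketch shows why small-area tabulations from a perturbed file can drift away from the official figures, which is the kind of discrepancy described above.

```python
import random
from collections import Counter


def perturb_gender(records, swap_prob, seed=0):
    """Hypothetical PRAM-style perturbation: flip each record's 'gender'
    ("M" <-> "F") independently with probability swap_prob."""
    rng = random.Random(seed)
    flip = {"M": "F", "F": "M"}
    perturbed = []
    for rec in records:
        new = dict(rec)  # copy so the official file is left untouched
        if rng.random() < swap_prob:
            new["gender"] = flip[new["gender"]]
        perturbed.append(new)
    return perturbed


def gender_counts_by_area(records):
    """Tabulate children by (area, gender)."""
    return Counter((r["area"], r["gender"]) for r in records)


# Invented data: a small area (10 children) and a large area (1,000 children).
official = (
    [{"area": "Small", "gender": "M"}] * 6
    + [{"area": "Small", "gender": "F"}] * 4
    + [{"area": "Large", "gender": "M"}] * 510
    + [{"area": "Large", "gender": "F"}] * 490
)
perturbed = perturb_gender(official, swap_prob=0.2)

before = gender_counts_by_area(official)
after = gender_counts_by_area(perturbed)
for area in ("Small", "Large"):
    n = before[(area, "M")] + before[(area, "F")]
    print(f"{area}: official {before[(area, 'M')] / n:.0%} male, "
          f"perturbed {after[(area, 'M')] / n:.0%} male")
```

Area totals are preserved, but the effect of each swap is very different by area size. In the small area, one swapped record moves the percentage by 10 points. In the large area, it moves it by only 0.1 points. The same swap rate therefore distorts small-area figures far more.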

Alternative approaches to risk management

The responsibility for appropriately using and reporting microdata is shared between the data custodian and the research community

It is assumed that researchers will observe their professional codes of conduct and have no intention of misusing the data

The main risk focus is on ensuring that researchers appropriately handle, store and publish the data. This includes:

Confirming a genuine research aim

Restricting data access to authorised users

Ensuring the microdata is anonymised, and

Undertaking a risk assessment of files prior to release

Formal risk assessments

Government departments are still extremely risk-averse, especially when it comes to information about very young children. Two simple risk assessments accompany each microdata file. Both serve as a basis for negotiation and support rather than as a barrier to access:

1. The proportion of unique records in the dataset is assessed, mainly to discourage other government agencies from undertaking unauthorised data linkage projects.

2. The proportion of cells in two-way tables with three or fewer children is calculated to foreshadow potential problems when researchers publish the data.
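Both assessments above are simple enough to sketch. The following is a minimal illustration, not the Social Research Centre's actual procedure. Several choices are assumptions:

- the choice of key variables;
- the record layout (a list of dicts);
- counting only non-empty cells as "cells".

The example records are invented.

```python
from collections import Counter
from itertools import combinations


def proportion_unique(records, key_vars):
    """Assessment 1: share of records whose combination of key_vars
    appears exactly once in the file (sample uniques)."""
    if not records:
        return 0.0
    keys = [tuple(r[v] for v in key_vars) for r in records]
    freq = Counter(keys)
    return sum(1 for k in keys if freq[k] == 1) / len(records)


def proportion_small_cells(records, var_a, var_b, threshold=3):
    """Assessment 2: share of non-empty cells in the var_a x var_b
    table that contain `threshold` or fewer children."""
    cells = Counter((r[var_a], r[var_b]) for r in records)
    if not cells:
        return 0.0
    return sum(1 for n in cells.values() if n <= threshold) / len(cells)


def small_cell_report(records, variables, threshold=3):
    """Run assessment 2 over every two-way combination of variables."""
    return {
        (a, b): proportion_small_cells(records, a, b, threshold)
        for a, b in combinations(variables, 2)
    }


# Invented example records (hypothetical variable names).
records = [
    {"area": "A", "school_system": "Government", "gender": "F"},
    {"area": "A", "school_system": "Government", "gender": "M"},
    {"area": "A", "school_system": "Catholic", "gender": "F"},
    {"area": "B", "school_system": "Government", "gender": "F"},
    {"area": "B", "school_system": "Government", "gender": "F"},
    {"area": "B", "school_system": "Independent", "gender": "M"},
]
# 4 of the 6 records are unique on these three variables.
print(proportion_unique(records, ["area", "school_system", "gender"]))
print(small_cell_report(records, ["area", "school_system", "gender"]))
```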

High-risk data

Built environment measures

Child Friendly Neighbourhoods: a composite index based on the following built environment measures, which requires the children’s X-Y coordinates:

− Child health resources: proximity of childcare facilities

− Parks and greenness: proximity of neighbourhood parks

− Residential density: number of residential dwellings

− Home environment: type of residence; size of backyard

− Traffic exposure: road network classifications

− Crime: crime and child-related offences

− Land use mix: evenness of different land uses

− Public transport: accessibility of bus and rail stops

− Street connectivity: number of intersections where three or more roads meet
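The slides do not say how the index is constructed. As a hypothetical sketch only, the following shows why the children's X-Y coordinates are needed. Two proximity measures are computed as the straight-line distance from each child to the nearest facility. Each measure is standardised as a z-score, and the results are averaged. The equal weighting, the Euclidean distance on projected coordinates, the measure names and the facility locations are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def nearest_distance(point, facilities):
    """Straight-line distance from a child's X-Y to the closest facility
    (assumes projected coordinates, e.g. metres)."""
    return min(math.dist(point, f) for f in facilities)


def zscores(values):
    """Standardise values to mean 0 and population standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def composite_index(children, measures):
    """Hypothetical equal-weight index over proximity measures.

    children: list of (x, y) coordinates, one per child.
    measures: dict mapping a measure name to a list of facility (x, y) points.
    A shorter distance is treated as more child friendly, so z-scores are negated.
    """
    per_measure = []
    for facilities in measures.values():
        distances = [nearest_distance(c, facilities) for c in children]
        per_measure.append([-z for z in zscores(distances)])
    # Average each child's scores across all measures.
    return [mean(scores) for scores in zip(*per_measure)]


children = [(0, 0), (100, 0), (1000, 1000)]
measures = {
    "childcare_proximity": [(0, 10), (950, 1000)],
    "park_proximity": [(120, 0)],
}
print(composite_index(children, measures))
```

Even this toy version cannot be computed from area-level geography alone. Each score depends on exact point locations, which is why the index places these data in the high-risk category.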

Managing high-risk data

Points dispersed within a mesh block
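One way to read the mesh block dispersion above is that each child's true X-Y coordinate is replaced with a random point inside the same mesh block. A mesh block is the smallest geographic unit in the Australian Bureau of Statistics' Australian Statistical Geography Standard. Released coordinates would still place the child in the correct mesh block, but would no longer reveal a home address. The sketch below assumes uniform random dispersion and a mesh block given as a simple polygon in projected coordinates. The function names and the example polygon are hypothetical, and the actual AEDC method is not described in the slides.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside a simple polygon given
    as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def disperse_point(polygon, rng, max_tries=10_000):
    """Hypothetical dispersion: return a uniformly random point inside the
    mesh block polygon, found by rejection sampling from its bounding box.
    The child's true coordinate is not used, so it cannot be recovered."""
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    for _ in range(max_tries):
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            return (x, y)
    raise RuntimeError("no point found inside the polygon")


# An invented L-shaped mesh block in projected coordinates.
mesh_block = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
rng = random.Random(42)
released = [disperse_point(mesh_block, rng) for _ in range(5)]
print(released)
```

Rejection sampling keeps the sketch short. For long, thin mesh blocks, a triangulation-based sampler would waste fewer draws.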

Implications of using anonymised microdata

Improves data utility but does not necessarily present higher levels of disclosure risk than a CURF.

Ensures that there is ‘one version of the truth’ and that outputs produced by researchers will be consistent across all data users.

Ensures that access requests for any unit record file follow the same formal, detailed assessment, management and close-out procedures.

Concerns about unintentional misuse of microdata need to be clearly communicated to the research community.

Provides an avenue for offering training and support as a condition of data release where a potential data user may not have the required skills or experience.

Creates a platform where all genuine data access requests can be accommodated through a combination of engaged negotiation regarding the required data elements and supported (and supportive) access modalities.