Creating a National Remote Access System for Register-based Research

Marianne Johnson, Statistics Finland

Statistical Data Confidentiality Work Session Oct 2015

Statistics Finland / Researcher Services

Finnish administrative registers

• Several comprehensive national registers
• Contain unit-level data on individuals, families, housing, enterprises
• Compiled and maintained for administrative or statistical purposes, e.g.
  – Population Register Centre (VRK)
    – Population information system
  – Social Insurance Institution (KELA)
    – Registers on obtained social benefits
  – National Institute for Health and Welfare (THL)
    – Medical Birth Register
    – Care Registers for Social Welfare and Health Care (HILMO)
    – Finnish Cancer Register
  – Ministry of Labour (TEM)
    – Register of job seekers
  – Statistics Finland (Tilastokeskus)

21.9.2015


Secondary usage of administrative registers

• Production of official statistics in Finland is to a large extent based on registers
  - The population and housing census has been based entirely on register sources since 1990
  - Handbook: Use of Registers and Administrative Data Sources for Statistical Purposes – Best Practices of Statistics Finland
• Register-based research
  – 20 % of doctoral theses in medicine in Finland include data from national registers


Prerequisites for register-based research

• Common personal identification number (PIN) in all registers
  – first used in 1964 (between 1964 and 1970 two different systems)
  – since 1971 a digital population register
  – all Finns have a PIN
  – data from different registers can be linked by PIN, e.g. for research purposes
• Legislation that allows the use of confidential personal data for scientific research
• Trust in register keepers and researchers
• Comprehensive, well-documented registers
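The linkage by PIN described above can be illustrated with a minimal Python sketch. The register contents, the field names and the function `link_by_pin` are invented for illustration; only the dummy PINs are taken from the pseudonymization slide later in this deck.

```python
from collections import defaultdict

def link_by_pin(*registers):
    """Merge records from several registers that share the same PIN."""
    linked = defaultdict(dict)
    for register in registers:
        for record in register:
            # Copy every field except the linkage key itself.
            fields = {k: v for k, v in record.items() if k != "pin"}
            linked[record["pin"]].update(fields)
    return dict(linked)

# Hypothetical extracts from two registers (dummy PINs, invented fields).
population = [
    {"pin": "123456-111A", "sex": "woman"},
    {"pin": "234567-222C", "sex": "man"},
]
benefits = [
    {"pin": "123456-111A", "age": 15},
    {"pin": "234567-222C", "age": 44},
]

linked = link_by_pin(population, benefits)
for pin, fields in linked.items():
    print(pin, fields)
```

Because every register carries the same identifier, no fuzzy matching on names or addresses is needed; the join is an exact key lookup.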


Legislative basis for research use of data from Statistics Finland

- Statistics Act (280/2004)
- In 2013 the Statistics Act was amended to better facilitate the use of data gathered at Statistics Finland for research purposes.
- New objective of the Act
  – To extend the use of the data collected for statistical purposes in scientific studies and statistical surveys on social conditions.
- Possibility for researchers to gain access to confidential data from which only the direct identifiers have been removed.
  – Before 2013 statistical authorities could not grant access to confidential data from which the statistical unit could be indirectly identified.
  – Gain access = see and analyse data via a remote access system


Remote access system (FIONA)

- In use at Statistics Finland since 2009, development project 2014-2015
- Model taken from Sweden, Denmark and the Netherlands
- Researchers use data on Statistics Finland’s server from their own workplace via a secured Internet connection; the data remains at Statistics Finland (SF)
- Researchers use a Windows remote desktop and have access to the data they have obtained permission for, as well as to metadata
- Researchers have access to a wide range of statistical programs: Stata, SPSS, R, SAS, Python (Anaconda), …
- Each research project has its dedicated folders and storage space in the system
- Technical maintenance of the FIONA system was transferred to CSC - IT Center for Science in 2015
- The number of users and data sets in the remote access system is growing steadily; currently about 150 active users


Confidentiality

- Research data sets are stored on Statistics Finland’s / CSC’s servers
- Only mouse, keyboard and graphic signals are transferred
- Access to the system only from preapproved IP addresses
- A disposable SMS password is sent each time the researcher logs in to FIONA
- All data transfers from and to FIONA are handled by personnel at the Researcher Services of SF
  – Outputs are checked so that direct or indirect identification is not possible, and files are saved for possible future reference
- Access to data is terminated when the permit for the project expires
- The FIONA environment is separated from the production network
- The system will be audited in fall 2015 after being transferred to CSC
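Three of the access controls listed above (preapproved IP addresses, a disposable password per login, and termination of access when the permit expires) can be combined as in the following Python sketch. This is not FIONA's implementation: the function names, the six-digit password length and the example networks are assumptions, and the SMS delivery itself is left out.

```python
import ipaddress
import secrets
from datetime import date

def ip_is_preapproved(client_ip, approved_networks):
    """True if client_ip falls inside any preapproved network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in approved_networks)

def new_one_time_password(digits=6):
    """Generate a disposable numeric password (length is an assumption)."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))

def may_log_in(client_ip, approved_networks, sent_password, typed_password,
               permit_expires, today):
    """Combine the checks: IP allowlist, one-time password, valid permit."""
    return (
        ip_is_preapproved(client_ip, approved_networks)
        and secrets.compare_digest(sent_password, typed_password)
        and today <= permit_expires
    )

# Hypothetical example using a documentation-only address range.
approved = ["192.0.2.0/24"]
otp = new_one_time_password()
print(may_log_in("192.0.2.17", approved, otp, otp,
                 permit_expires=date(2015, 12, 31), today=date(2015, 9, 21)))
```

All three conditions must hold at once, so a valid password typed from an unapproved address, or after the permit has expired, is still rejected.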


A typical process in applying for sensitive research data

1. A researcher applies for a licence to access data for a research project.
2. The application must include a research plan and a pledge of secrecy.
3. The Ethics Committee is consulted in cases involving large datasets with confidential data.
4. If the data can be given out, the licence is granted (possibly with modifications).
5. A contract is signed specifying the dataset and the fee as well as the date of delivery.
6. The data is put together, edited and uploaded to the remote access system.
7. The researcher uses a remote connection to analyse the data and sends the results to Researcher Services.
8. The results are checked to make sure that no units (persons, companies) can be identified.
9. The results are sent to the researcher and they can be used in publications.
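The slides do not say which rules Researcher Services applies when checking results. One common first screen in statistical disclosure control is a minimum frequency rule, sketched below in Python. The threshold of 3, the function name `small_cells` and the example rows are assumptions for illustration only.

```python
from collections import Counter

def small_cells(records, group_by, min_count=3):
    """Return the table cells whose frequency is below min_count."""
    counts = Counter(tuple(r[k] for k in group_by) for r in records)
    return {cell: n for cell, n in counts.items() if n < min_count}

# Invented example: a region-by-sex frequency table built from unit records.
rows = (
    [{"region": "North", "sex": "woman"}] * 5
    + [{"region": "North", "sex": "man"}] * 4
    + [{"region": "South", "sex": "man"}] * 1
)

flagged = small_cells(rows, group_by=("region", "sex"))
print(flagged)  # cells a reviewer would need to suppress or merge
```

A rule like this only catches direct low counts; indirect identification (for example through differencing between tables) still needs manual review.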

Present process for obtaining register data for research

[Diagram: the RESEARCHER is connected over the Internet to three separate Authorities, to Statistics Finland and to the Data Protection Authority; the connections are marked with §, @ and € symbols.]

• Researcher: searching for data sets and applying for permits from several different authorities, with varying practices
• Authorities and Statistics Finland:
  – Handling permit applications
  – Control and specification
  – Compiling data sets
  – Delivering data using varying practices
  – Possible corrections and re-sending
• Researcher is responsible for data security and disposal of data sets

FMAS

Remote access system

Services that require permit
• Remote desktop for analysing data (programs and tools)
• Separated server space for data and metadata
• Output service for results, input service for researcher’s data

Services that require registration
• Centralized digital permit application service

Public services
• Data catalogue
• Helpdesk for research and tuition

Interface service for data and metadata, pseudonymization

Administration services for user rights

- Commonly agreed metadata standards
- Data warehouse
- Archive of multiple user files

[Diagram: Organizations A, B, C, D and E and the Researcher are connected to these services.]
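The three service tiers on this slide (public, registration required, permit required) can be expressed as a small access table. The Python sketch below is illustrative only: the service keys are invented names, and it assumes the tiers are cumulative, i.e. a permit holder can also use the registered and public services.

```python
from enum import IntEnum

class Access(IntEnum):
    PUBLIC = 0      # no login needed
    REGISTERED = 1  # registered user
    PERMIT = 2      # user with a valid research permit

# Hypothetical service keys mapped to the tier named on the slide.
REQUIRED = {
    "data_catalogue": Access.PUBLIC,
    "helpdesk": Access.PUBLIC,
    "permit_application": Access.REGISTERED,
    "remote_desktop": Access.PERMIT,
    "project_storage": Access.PERMIT,
    "output_service": Access.PERMIT,
    "input_service": Access.PERMIT,
}

def can_use(service, user_level):
    """True if the user's tier is at least the tier the service requires."""
    return user_level >= REQUIRED[service]

print(can_use("data_catalogue", Access.PUBLIC))      # public service
print(can_use("remote_desktop", Access.REGISTERED))  # needs a permit
```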


Linking data from different sources

- Present method
  – Register keepers send the data requested by the researcher to Statistics Finland over a secure connection, by registered mail, by courier service, etc.
  – The data includes the Finnish PIN or BIN (or a pseudocode created by the register keeper, with the key sent separately)
  – Statistics Finland creates a project-specific pseudocode, replaces the PIN (BIN) in the research data sets with it and uploads the data to the remote access system
- Aim
  – Pseudocodes should be used in all data deliveries
  – Register keepers should be able to upload their data directly to the remote access system using a standard pseudonymization method
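The present method, in which one party replaces the PIN with a project-specific pseudocode and keeps the key apart from the data, can be sketched with a random lookup table. This is an assumption about one possible mechanism, not a description of Statistics Finland's procedure; `pseudonymize`, the field names and the code length are hypothetical.

```python
import secrets

def pseudonymize(records, id_field="pin"):
    """Replace the identifier with a random code; return the data and the key."""
    key = {}  # identifier -> pseudocode, to be stored separately from the data
    out = []
    for record in records:
        identifier = record[id_field]
        if identifier not in key:
            key[identifier] = secrets.token_hex(8)  # 16 random hex characters
        replaced = dict(record)
        replaced[id_field] = key[identifier]
        out.append(replaced)
    return out, key

# Invented rows with the dummy PINs used elsewhere in this deck.
rows = [
    {"pin": "123456-111A", "age": 15},
    {"pin": "234567-222C", "age": 44},
    {"pin": "123456-111A", "age": 16},
]
research_data, key_table = pseudonymize(rows)
print(research_data)
```

A random table cannot be reproduced by another register keeper without sharing the key, which is one reason a standard, shared pseudonymization method is listed as the aim.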


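Pseudonymization – project-specific

[Diagram: Statistics Finland and the other register keeper each de-identify their own data using the same inputs, Common 9843 and Project 211, and deliver the result to FIONA.]

De-identification at the two register keepers (Common 9843, Project 211):

  123456-111A, woman   →  nvaoepanwzl, woman
  234567-222C, man     →  bleokldawgs, man

  123456-111A, age 15  →  nvaoepanwzl, age 15
  234567-222C, age 44  →  bleokldawgs, age 44

In FIONA (Project 211) the two data sets carry the same pseudocodes and can be linked:

  nvaoepanwzl, woman      nvaoepanwzl, age 15
  bleokldawgs, man        bleokldawgs, age 44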
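The slide does not state how the pseudocode is computed from the common seed and the project number. One way to get the property it illustrates (two register keepers independently producing the same code for the same PIN within a project) is a keyed hash. The Python sketch below uses HMAC-SHA256 as an assumed technique; the function name, the way the key is built and the truncation to 11 characters are all illustrative choices.

```python
import hashlib
import hmac

def pseudocode(pin, common_seed, project_id, length=11):
    """Derive a project-specific code from a PIN with a keyed hash (assumed method)."""
    key = f"{common_seed}:{project_id}".encode("utf-8")
    digest = hmac.new(key, pin.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation shortens the code but raises the collision risk.
    return digest[:length]

# Both parties use the same seed and project number from the slide.
at_statistics_finland = pseudocode("123456-111A", "9843", "211")
at_other_keeper = pseudocode("123456-111A", "9843", "211")
in_another_project = pseudocode("123456-111A", "9843", "212")

print(at_statistics_finland == at_other_keeper)     # same project: codes match
print(at_statistics_finland == in_another_project)  # other project: codes differ
```

With this design the seed must stay secret: anyone who knows it can recompute the code for any known PIN, which is the concern raised under Solution 2 on the final slide.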


To be developed…

- We see a problem with the fixed pseudocodes of the ’ready-made’ data files

• Solution 1: Create a project-specific pseudocode also for projects that use the ’ready-made’ files
  – Problem: A copy of the ’ready-made’ data sets has to be made for each project -> much additional disk space is needed

• Solution 2: Send the seed code that has been used for the ’ready-made’ files to the other register keepers
  – Problem: The PIN/BIN-to-pseudocode key used by Statistics Finland will be widely known
