![Page 1: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/1.jpg)
Creating a National Remote Access System for Register-based Research
Marianne Johnson, Statistics Finland
Statistical Data Confidentiality Work Session Oct 2015
![Page 2: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/2.jpg)
Statistics Finland / Researcher Services
Finnish administrative registers
• several comprehensive national registers
• contain unit level data on individuals, families, housing, enterprises
• compiled and maintained for administrative or statistical purposes, e.g.
  – Population Register Centre (VRK)
    – Population information system
  – Social Insurance Institution (KELA)
    – Registers on obtained social benefits
  – National Institute for Health and Welfare (THL)
    – Medical Birth Register
    – Care Registers for Social Welfare and Health Care (HILMO)
    – Finnish Cancer Register
  – Ministry of Labour (TEM)
    – Register of job seekers
  – Statistics Finland (Tilastokeskus)
21.9.2015
![Page 3: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/3.jpg)
Secondary usage of administrative registers
• Production of official statistics is to a large extent based on registers in Finland
  – the population and housing census has been based entirely on register sources since 1990
  – Handbook: Use of Registers and Administrative Data Sources for Statistical Purposes – Best Practices of Statistics Finland
• Register-based research
  – 20 % of doctoral theses within medicine in Finland include data from national registers
![Page 4: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/4.jpg)
![Page 5: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/5.jpg)
Prerequisites for register-based research
• Common personal identification number (PIN) in all registers
  – first used in 1964 (between 1964 and 1970 two different systems)
  – since 1971 a digital population register
  – all Finns have a PIN
  – data from different registers can be linked by PIN, e.g. for research purposes
• Legislation that allows the use of confidential personal data for scientific research
• Trust in register keepers and researchers
• Comprehensive, well documented registers
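Linking by a common PIN amounts to a join on that identifier. The sketch below is only an illustration: the register contents, field names and the `link_by_pin` helper are hypothetical and do not describe any actual register or Statistics Finland tool.

```python
# Hypothetical extracts from two registers, keyed by PIN.
# Field names and values are illustrative only.
population_register = {
    "123456-111A": {"sex": "woman"},
    "234567-222C": {"sex": "man"},
}
benefit_register = {
    "123456-111A": {"benefit": "housing allowance"},
    "345678-333E": {"benefit": "study grant"},
}


def link_by_pin(*registers):
    """Inner-join register extracts on PIN and merge each person's fields."""
    common_pins = set(registers[0])
    for register in registers[1:]:
        common_pins &= set(register)

    linked = {}
    for pin in sorted(common_pins):
        merged = {}
        for register in registers:
            merged.update(register[pin])
        linked[pin] = merged
    return linked


# Only persons present in every extract end up in the linked data set.
linked = link_by_pin(population_register, benefit_register)
```

Because every register uses the same identifier, no fuzzy matching on names or addresses is needed; the join is exact.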
![Page 6: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/6.jpg)
Legislative basis for research use of data from Statistics Finland
- Statistics Act (280/2004)
- In 2013 the Statistics Act was amended to better facilitate the use of data gathered at Statistics Finland for research purposes.
- New objective of the Act
  – To extend the use of the data collected for statistical purposes in scientific studies and statistical surveys on social conditions.
- Possibility for researchers to gain access to confidential data from which only the direct identifiers have been removed.
  – Before 2013 statistical authorities could not grant access to confidential data from which the statistical unit could be indirectly identified.
  – Gain access = see and analyze data through a remote access system
![Page 7: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/7.jpg)
Remote access system (FIONA)
- In use at Statistics Finland since 2009, development project 2014-2015
- Model taken from Sweden, Denmark and the Netherlands
- Researchers use data on Statistics Finland’s server from their own workplace via a secured Internet connection; data remains at SF
- Researchers use a Windows remote desktop, and have access to the data they have obtained permission for as well as to metadata
- The researchers have access to a wide range of statistical programs: STATA, SPSS, R, SAS, Python Anaconda, …
- Each research project has its dedicated folders and storage space in the system
- Technical maintenance of the FIONA system transferred to CSC - IT Center for Science in 2015
- Number of users and data sets in the remote access system is growing steadily, currently about 150 active users
![Page 8: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/8.jpg)
Confidentiality
- Research data sets are stored on Statistics Finland’s / CSC’s servers
- Only mouse, keyboard and graphic signals are transferred
- Access to the system only from preapproved IP addresses
- A disposable SMS password is sent each time the researcher logs in to FIONA
- All data transfers from and to FIONA are handled by personnel at the Researcher Services of SF
  – Outputs are checked so that direct or indirect identification is not possible and files are saved for possible future reference
- Access to data is terminated when the permit for the project expires
- FIONA environment is separated from the production network
- The system will be audited in fall 2015 after being transferred to CSC
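Three of the controls above (preapproved IP addresses, a disposable SMS password, and termination of access when the permit expires) can be combined into a single login check. The following is a minimal sketch under assumptions: the function names, the six-digit code format and the example addresses are hypothetical, and nothing here describes how FIONA is actually implemented.

```python
import hmac
import ipaddress
import secrets
from datetime import date


def new_sms_code(digits=6):
    """Generate a one-time numeric code (assumed format) to send by SMS."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def may_log_in(client_ip, approved_networks, submitted_code, issued_code,
               permit_expires, today):
    """Allow login only if all three conditions hold."""
    ip = ipaddress.ip_address(client_ip)
    # 1. The connection must come from a preapproved address or network.
    ip_ok = any(ip in ipaddress.ip_network(net) for net in approved_networks)
    # 2. The one-time code must match; compare in constant time.
    code_ok = hmac.compare_digest(submitted_code, issued_code)
    # 3. The project permit must still be valid.
    permit_ok = today <= permit_expires
    return ip_ok and code_ok and permit_ok


# Hypothetical example values (documentation address ranges).
approved = ["192.0.2.0/24"]
code = new_sms_code()
allowed = may_log_in("192.0.2.10", approved, code, code,
                     date(2015, 12, 31), date(2015, 9, 21))
```

In a real system the issued code would also be invalidated after one use or after a short time window; that bookkeeping is left out of the sketch.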
![Page 9: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/9.jpg)
A typical process in applying for sensitive research data
• A researcher applies for a licence to access data for a research project
• The application must include a research plan and a pledge of secrecy
• The Ethics Committee is consulted in cases involving large datasets with confidential data
• If the data can be given out the licence is granted (possibly with modifications)
• A contract is signed specifying the dataset and the fee as well as the date of delivery
• The data is put together, edited and uploaded to the remote access system
• The researcher uses a remote connection to analyse the data and sends the results to Researcher Services
• The results are checked to make sure that no units (persons, companies) can be identified
• The results are sent to the researcher and they can be used in publications
![Page 10: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/10.jpg)
Present process for obtaining register data for research
[Diagram: the RESEARCHER deals separately, over the Internet, with several Authorities, Statistics Finland and the Data Protection Authority; permits (§), data deliveries (@) and fees (€) pass between them]
• Searching for data sets and applying for permits from several different authorities, with varying practices
• Handling permit applications
• Control and specification
• Compiling data sets
• Delivering data using varying practices
• Possible corrections and re-sending
• Researcher responsible for data security and disposal of data sets
![Page 11: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/11.jpg)
FMAS Remote access system
Services that require permit
• Remote desktop for analysing data (programs and tools)
• Separated server space for data and metadata
• Output service for results, Input service for researcher’s data
Services that require registration
• Centralized digital permit application service
Public services
• Data catalogue
• Helpdesk for research and tuition
Interface service for data and metadata, Pseudonymization
Administration services for user rights
- Commonly agreed metadata standards
- Data warehouse
- Archive of multiple user files
[Diagram labels: Researcher; Organization A, Organization B, Organization C, Organization D, Organization E]
![Page 12: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/12.jpg)
Linking data from different sources
- Present method
  – Register keepers send the data requested by the researcher over a secure connection, by registered mail, with courier services etc. to Statistics Finland
  – The data includes the Finnish PIN or BIN (or a pseudocode created by the register keeper, with the key sent separately)
  – Statistics Finland creates a project specific pseudocode, replaces the PIN (BIN) in the research data sets with it and uploads the data to the remote access system
- Aim
  – Pseudocodes should be used in all data deliveries
  – Register keepers should be able to upload their data directly to the remote access system using a standard pseudonymization method
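The present-method variant in which a register keeper delivers pseudocodes and sends the key separately can be sketched as follows. This is an illustration under assumptions: the row layout, the field names `pin` and `pseudocode`, and both helper functions are hypothetical, and the random codes stand in for whatever scheme a register keeper actually uses.

```python
import secrets


def pseudonymize_with_key(rows, id_field="pin"):
    """Replace the identifier in each row with a random pseudocode.

    Returns (delivered_rows, key); the key maps pseudocode -> identifier
    and is meant to travel separately from the data.
    """
    code_for_id = {}
    delivered = []
    for row in rows:
        identifier = row[id_field]
        if identifier not in code_for_id:
            # Same person always gets the same code within one delivery.
            code_for_id[identifier] = secrets.token_hex(8)
        out = {k: v for k, v in row.items() if k != id_field}
        out["pseudocode"] = code_for_id[identifier]
        delivered.append(out)
    key = {code: identifier for identifier, code in code_for_id.items()}
    return delivered, key


def restore_ids(delivered, key, id_field="pin"):
    """Recipient side: use the separately sent key to recover identifiers."""
    restored = []
    for row in delivered:
        out = {k: v for k, v in row.items() if k != "pseudocode"}
        out[id_field] = key[row["pseudocode"]]
        restored.append(out)
    return restored


# Hypothetical delivery with two persons and three rows.
rows = [
    {"pin": "123456-111A", "age": 15},
    {"pin": "234567-222C", "age": 44},
    {"pin": "123456-111A", "age": 16},
]
delivered, key = pseudonymize_with_key(rows)
```

The data file alone reveals no PIN; only a party holding both the file and the key can restore the identifiers and then apply a project specific pseudocode.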
![Page 13: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/13.jpg)
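Pseudonymization – project specific
[Diagram: Statistics Finland and the other register keeper each run the same de-identification, with the inputs "Common9843" and "Project 211", before the data reaches FIONA]
De-identification of the first data set (Common9843, Project 211):
  123456-111A, woman -> nvaoepanwzl, woman
  234567-222C, man -> bleokldawgs, man
De-identification of the second data set (Common9843, Project 211):
  123456-111A, age 15 -> nvaoepanwzl, age 15
  234567-222C, age 44 -> bleokldawgs, age 44
In FIONA, Project 211:
  nvaoepanwzl, age 15
  bleokldawgs, age 44
  nvaoepanwzl, woman
  bleokldawgs, man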
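One way two register keepers could independently arrive at the same project specific pseudocode is a keyed hash over the project number and the PIN. The sketch below uses HMAC-SHA256 purely as an assumed stand-in: the slides do not say which algorithm is used, the function names are hypothetical, and a real shared secret would be a long random value rather than the short label shown in the diagram.

```python
import hashlib
import hmac


def project_pseudocode(pin, common_secret, project_id):
    """Derive a deterministic pseudocode from the PIN, a shared secret
    and the project number (HMAC-SHA256 is an assumption, not the
    documented method)."""
    message = f"{project_id}:{pin}".encode("utf-8")
    return hmac.new(common_secret, message, hashlib.sha256).hexdigest()


def de_identify(rows, common_secret, project_id):
    """Replace the PIN in (pin, value) rows with the project pseudocode."""
    return [(project_pseudocode(pin, common_secret, project_id), value)
            for pin, value in rows]


# Illustrative only: the label from the diagram stands in for a real secret.
secret = b"Common9843"

first_set = de_identify(
    [("123456-111A", "woman"), ("234567-222C", "man")], secret, "211")
second_set = de_identify(
    [("123456-111A", "age 15"), ("234567-222C", "age 44")], secret, "211")

# The same person gets the same code in both deliveries, so the data sets
# can be linked inside the remote access system without any PIN.
same_person = first_set[0][0] == second_set[0][0]
```

Because the project number is part of the hashed message, the same person receives a different pseudocode in another project, which prevents linking data across projects.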
![Page 14: Creating a National Remote Access System for Register-based Research Marianne Johnson, Statistics Finland Statistical Data Confidentiality Work Session](https://reader035.vdocuments.net/reader035/viewer/2022062804/5697bf921a28abf838c8ec1d/html5/thumbnails/14.jpg)
To be developed…
- We see a problem with the fixed pseudocodes of the ’ready-made’ data files
• Solution 1: Create project specific pseudocodes also for projects that use the ’ready-made’ data files
  – Problem: A copy of the ’ready-made’ data sets has to be made for each project -> much additional disk space is needed
• Solution 2: Send the seed code that has been used for the ’ready-made’ files to the other register keepers
  – Problem: The PIN/BIN-to-pseudocode key used by Statistics Finland will be widely known