UK e-Science 2008 All Hands Meeting. Edinburgh.
Data Sharing e-Infrastructure
David Rodriguez (1), Trevor Carpenter (2), Jano van Hemert (1) & Joanna Wardlaw (2).
On behalf of the SINAPSE Collaboration.
1. National e-Science Centre. School of Informatics, University of Edinburgh.
2. SFC Brain Imaging Research Centre. Department of Clinical Neurosciences, University of Edinburgh.
Contents
The SINAPSE project
Data Protection & pseudonymisation
Data sharing
Components
Status
The SINAPSE Project
SINAPSE stands for Scottish Imaging Network: a Platform for Scientific Excellence.
Pooling initiative of six Scottish universities: Aberdeen, Dundee, Edinburgh, Glasgow, St Andrews and Stirling.
Main objectives: develop imaging expertise; support multi-centre clinical research in conjunction with the Clinical Research Networks; improve the ability of neuroscientists to collaborate on clinical trials; have a direct impact on patient health.
Data Sharing e-Infrastructure
Enabling multi-centre clinical research through data sharing.
The main objectives of the SINAPSE e-infrastructure project are:
Anonymisation: automatic compliance with data protection policies.
Security: advanced authentication and authorisation within projects.
Usability: a user-friendly environment to access data and applications.
Modularity: conformance to relevant standards and use of existing components.
Centralisation: leveraging existing compute clusters and storage.
Benefits
Easier Data Protection compliance for users.
Secure data sharing.
Coherent view of available data (single point of access).
Roadmap for end-of-project data publication and data curation.
Key Features
Single sign-on: identify once per session for all services, with authentication delegated to home universities.
Permission management using groups and roles.
Data Catalogue:
Files Catalogue.
Metadata Catalogue: stores relevant information to allow users to find the desired data.
Modularity: reuses existing components and allows future updates/changes.
Access Levels
Different access levels for different users and use cases.
The lowest level, for site operators, gives file access to encrypted files only.
Some researchers only need access to decrypted images and the associated basic image metadata; others will need access to more clinical information and metadata.
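The access levels above can be pictured as a simple ordered ladder. The following is a minimal sketch, assuming the levels are strictly hierarchical; the level names, role names (`site_operator`, `researcher`, `clinical_researcher`) and resource categories are hypothetical illustrations, not SINAPSE's actual configuration.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical access levels, ordered from least to most information."""
    ENCRYPTED_FILES = 1   # site operators: encrypted files only
    DECRYPTED_IMAGES = 2  # researchers: decrypted images + basic image metadata
    CLINICAL = 3          # researchers who also need clinical information


# Hypothetical role names; real roles would be defined per project/study.
ROLE_LEVELS = {
    "site_operator": AccessLevel.ENCRYPTED_FILES,
    "researcher": AccessLevel.DECRYPTED_IMAGES,
    "clinical_researcher": AccessLevel.CLINICAL,
}

# Minimum level required for each (hypothetical) category of resource.
RESOURCE_LEVELS = {
    "encrypted_file": AccessLevel.ENCRYPTED_FILES,
    "decrypted_image": AccessLevel.DECRYPTED_IMAGES,
    "basic_image_metadata": AccessLevel.DECRYPTED_IMAGES,
    "clinical_information": AccessLevel.CLINICAL,
}


def can_access(role: str, resource: str) -> bool:
    """True if the role's level covers the resource; unknown names are denied."""
    level = ROLE_LEVELS.get(role)
    required = RESOURCE_LEVELS.get(resource)
    if level is None or required is None:
        return False
    return level >= required
```

Denying unknown roles and resources by default keeps the failure mode safe; a non-hierarchical scheme would replace the integer comparison with explicit per-role permission sets.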
Data Protection
Data Protection Act (1998); other legislation also applies.
Personal data must be processed fairly and lawfully.
Projects run in SINAPSE must have an appropriate consent form covering the processing to be done, and all projects must have ethical approval.
A pseudonymous identifier replaces the CHI (Community Health Index) number; the two are linked using a database.
Other fields are anonymised:
Some data, such as name or address, are destroyed completely.
Depending on the project, other fields may be transformed into less informative representations: Postal Code -> Deprivation Index or partial Postal Code; Date of birth -> Age (at different precisions).
Any later access to personal data will be granted by the corresponding Data Controller.
All personal data processing will be logged for auditing.
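The anonymisation transformations listed above (destroying name and address, coarsening postcode and date of birth) can be sketched as a per-record function. This is a minimal illustration under stated assumptions: the field names are hypothetical, "partial Postal Code" is taken to mean the outward code of a UK postcode, and the Deprivation Index mapping is omitted because it needs a postcode-to-index lookup table.

```python
from datetime import date

# Fields destroyed outright (from the slide: name, address).
DESTROYED_FIELDS = {"name", "address"}


def partial_postcode(postcode: str) -> str:
    """Keep only the outward code of a UK postcode, e.g. 'EH8 9AB' -> 'EH8'."""
    compact = postcode.replace(" ", "").upper()
    # The inward code of a UK postcode is always the last three characters.
    return compact[:-3]


def age_at(dob: date, on: date, precision_years: int = 1) -> int:
    """Age in whole years on a given date, rounded down to a multiple of precision_years."""
    years = on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))
    return years - years % precision_years


def anonymise(record: dict, on: date, precision_years: int = 1) -> dict:
    """Return a copy of the record with identifying fields destroyed or coarsened."""
    out = {}
    for field, value in record.items():
        if field in DESTROYED_FIELDS:
            continue  # fully destroyed: never copied to the output
        elif field == "postcode":
            out["partial_postcode"] = partial_postcode(value)
        elif field == "date_of_birth":
            out["age"] = age_at(value, on, precision_years)
        else:
            out[field] = value
    return out
```

With `precision_years=5`, a 58-year-old is reported as 55, i.e. only the five-year band is kept; each project's policy would choose the precision.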
Data Pseudonymisation
[Figure: data pseudonymisation flow. Components shown: National PACS, CHI Transformation Service, Pseudonymisation Application, Link Table, Local Storage, NHS Research Centre; output: anonymous research data.]
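The role of the Link Table in this flow can be sketched as a two-way mapping between CHI numbers and random pseudonyms. This is a minimal in-memory illustration, assuming random (rather than derived) pseudonyms; the class name, the `SIN-` prefix and the record layout are hypothetical, and the real link table is a database.

```python
import secrets


class LinkTable:
    """Hypothetical in-memory link table; the real one would be a database at the NHS site."""

    def __init__(self) -> None:
        self._chi_to_pid: dict[str, str] = {}
        self._pid_to_chi: dict[str, str] = {}

    def pseudonym_for(self, chi: str) -> str:
        """Return the existing pseudonym for a CHI, or create a new random one."""
        if chi in self._chi_to_pid:
            return self._chi_to_pid[chi]
        while True:
            # Random identifier: carries no information about the CHI itself.
            pid = "SIN-" + secrets.token_hex(8)
            if pid not in self._pid_to_chi:
                break
        self._chi_to_pid[chi] = pid
        self._pid_to_chi[pid] = chi
        return pid

    def reidentify(self, pid: str) -> str:
        """Reverse lookup; in practice only with the Data Controller's approval."""
        return self._pid_to_chi[pid]


def pseudonymise(record: dict, table: LinkTable) -> dict:
    """Replace the CHI field with a pseudonym before the record leaves local storage."""
    out = dict(record)
    out["pseudonym"] = table.pseudonym_for(out.pop("chi"))
    return out
```

Because the pseudonym is random, re-identification is only possible through the link table, so protecting that table (and logging its use) is what keeps the research data pseudonymous.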
Pseudonymisation Tool
Implemented in Java.
To be deployed as close as possible to the point of data acquisition.
Configurable for each site using XML documents.
Different projects can apply different policies; each policy specifies the classes that perform the transformation of the data.
A graphical tool is provided for editing the policies.
The transformation classes will be distributed in signed JARs, and their authenticity will be checked using their hash.
For data provenance checks and auditing, the version of each class will be tracked.
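A policy that names transformation classes, pins their hashes and records their versions might be handled as in the following sketch. The XML schema, attribute names and class name `org.example.ChiPseudonymiser` are hypothetical (the tool's real schema is not given here), and it is written in Python for illustration even though the tool itself is Java. Checking a SHA-256 hash only confirms the JAR matches the one the policy expects; verifying the JAR signature is a separate step.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bytes of a distributed, signed JAR.
EXAMPLE_JAR = b"stand-in bytes for a signed JAR"

# Hypothetical policy: one rule per field, naming the class, its version and expected hash.
POLICY_XML = f"""<policy project="example-study">
  <rule field="chi" class="org.example.ChiPseudonymiser" version="1.2"
        sha256="{hashlib.sha256(EXAMPLE_JAR).hexdigest()}"/>
</policy>"""


def load_rules(xml_text: str) -> list[dict]:
    """Parse the policy document into a list of rule attribute dictionaries."""
    root = ET.fromstring(xml_text)
    return [dict(rule.attrib) for rule in root.findall("rule")]


def verify_artifact(rule: dict, jar_bytes: bytes) -> bool:
    """Compare the SHA-256 of the distributed JAR with the hash recorded in the policy."""
    return hashlib.sha256(jar_bytes).hexdigest() == rule["sha256"]


def provenance_entry(rule: dict) -> str:
    """Audit-log line recording which transformer class and version was applied."""
    return f"{rule['field']} -> {rule['class']} v{rule['version']}"
```

Any modification of the JAR, even a single byte, changes the digest, so `verify_artifact` rejects it before the class is loaded.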
CHI Transformation Service
CHI (Community Health Index) is the national unique identifier for NHS (Scotland) patients.
It is used in any health-related communication.
As it identifies the patient, it is sensitive information.
It is composed of 10 digits that include the date of birth, the gender and a check digit.
Possibilities:
Reversible or irreversible transformation.
Unique across all of SINAPSE, or unique for each Data Controller.
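The structure of the CHI number and the irreversible option can be sketched as below. The digit layout follows the standard CHI format: DDMMYY date of birth, a ninth digit that is odd for males and even for females, and a modulus-11 check digit. The keyed hash (HMAC-SHA256) is a swapped-in technique chosen for illustration, not the project's stated method. A plain unkeyed hash would be weak here because the space of valid CHI numbers is small enough to enumerate. Using one key for all of SINAPSE gives identifiers unique across SINAPSE; one key per Data Controller gives identifiers unique per controller. The reversible option corresponds to the link-table approach.

```python
import hashlib
import hmac
import re


def chi_is_valid(chi: str) -> bool:
    """Check the 10-digit format and the modulus-11 check digit (10th digit)."""
    if not re.fullmatch(r"[0-9]{10}", chi):
        return False
    # Weights 10, 9, ..., 2 applied to the first nine digits.
    total = sum(int(d) * w for d, w in zip(chi[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    # A computed value of 10 means no valid check digit exists.
    return check != 10 and check == int(chi[9])


def chi_fields(chi: str) -> tuple[str, str]:
    """Return (DDMMYY date of birth, sex); the 9th digit is odd for male, even for female."""
    return chi[:6], "M" if int(chi[8]) % 2 else "F"


def irreversible_pseudonym(chi: str, key: bytes) -> str:
    """Keyed one-way transformation; the key choice sets the scope of uniqueness."""
    return hmac.new(key, chi.encode("ascii"), hashlib.sha256).hexdigest()[:16]
```

For example, `0101501234` is a well-formed number: the weighted sum of its first nine digits is 62, 62 mod 11 is 7, and 11 - 7 = 4 matches the final digit.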
Data Sharing
A centralised model has been adopted: it is cheaper and easier, and it reduces the IT burden on research staff, even though several grid projects provide DICOM functionality.
Research data will be encrypted before being stored.
Data are organised per project.
Access control uses groups and roles.
Authentication uses Shibboleth, owing to usability concerns with X.509 certificates.
Uploading Data
[Figure: data upload architecture. Components shown: Data Files, Local Storage, Portal, University Authentication Service, VOMS, Data Upload Service, Metadata extraction, Data Encryption, Data Storage, Data Catalogue, Metadata Catalogue, SINAPSE Storage.]
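The upload components (Data Upload Service, metadata extraction, data encryption, data storage and the two catalogues) suggest an ordering: extract metadata, encrypt, store, then register. The following is a minimal sketch of that ordering only. The in-memory dictionaries stand in for SINAPSE Storage and the catalogues, the DICOM header is assumed to be already parsed into a dictionary, and the choice of key attributes is hypothetical. The `encrypt` function is passed in; in practice it would be a vetted cipher.

```python
import hashlib
from typing import Callable

# Hypothetical in-memory stand-ins for SINAPSE Storage and the two catalogues.
storage: dict[str, bytes] = {}
data_catalogue: dict[str, dict] = {}
metadata_catalogue: dict[str, dict] = {}

# Hypothetical choice of "key attributes" copied from the DICOM header.
KEY_ATTRIBUTES = ("Modality", "StudyDate", "SeriesDescription")


def upload(project: str, name: str, data: bytes, header: dict,
           encrypt: Callable[[bytes], bytes]) -> str:
    """Extract metadata, encrypt, store, then register in both catalogues."""
    # 1. Metadata extraction from the (already pseudonymised) header:
    #    only whitelisted attributes are kept.
    metadata = {k: header[k] for k in KEY_ATTRIBUTES if k in header}
    # 2. Research data are encrypted before storage.
    ciphertext = encrypt(data)
    # 3. Store under a per-project path.
    path = f"{project}/{name}"
    storage[path] = ciphertext
    # 4. Register the file (Data Catalogue) and its metadata (Metadata Catalogue).
    data_catalogue[path] = {
        "project": project,
        "sha256": hashlib.sha256(ciphertext).hexdigest(),
        "size": len(ciphertext),
    }
    metadata_catalogue[path] = metadata
    return path
```

Extracting metadata before encryption means the catalogues can be searched without decrypting any stored file; the whitelist ensures that header fields outside the chosen key attributes never reach the catalogue.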
Centralised Architecture
Simpler deployment.
Easier control of middleware releases.
Less impact on participating centres.
Easier to manage and use.
No resilience by default:
A second centre would be needed for that, but it is only necessary for critical services.
With good support, a reasonable service can be provided from a single centre.
Deployment Plan
ECDF (Edinburgh Compute and Data Facility, http://www.is.ed.ac.uk/ecdf/): a unique facility in Scotland, with 1456 CPU cores and 275 TB of disk.
Disk space and CPU time will be rented as needed.
In addition, a SINAPSE-owned server will be hosted by ECDF, which will provide basic hardware and software support. SINAPSE services to be hosted on it:
Portal, Data Catalogue, Research Data encryption service, OGSA-DAI, projects' customised databases, RAPID…
Portal
A GridSphere-based portal will give access to the resources.
Basic functionality to be provided by SINAPSE: data uploading, catalogue querying, …
Projects will customise the portal to their needs by providing their own portlets.
Authentication
Shibboleth federated authentication:
Single sign-on.
Delegated to home universities, so users will continue using a method they are already familiar with.
X.509 certificates are common in Grids, but can be a handicap for some users.
Authorisation
Dynamic Virtual Organisations (VOs):
It should be easy to add and remove members.
New VOs can be created for new projects/studies.
VO role management.
Role-based access:
Allows different levels of access to information for different users.
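The dynamic VO behaviour described above (adding and removing members, creating a VO per project/study, managing roles) can be sketched with two small classes. This is a hypothetical in-memory model for illustration only; it is not the API of VOMS or any other real VO service.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """Hypothetical minimal VO: each member holds a set of roles."""
    name: str
    members: dict[str, set[str]] = field(default_factory=dict)

    def add_member(self, user: str, *roles: str) -> None:
        self.members.setdefault(user, set()).update(roles)

    def remove_member(self, user: str) -> None:
        # Removing a member drops all of their roles at once.
        self.members.pop(user, None)

    def grant_role(self, user: str, role: str) -> None:
        if user not in self.members:
            raise KeyError(f"{user} is not a member of {self.name}")
        self.members[user].add(role)

    def has_role(self, user: str, role: str) -> bool:
        return role in self.members.get(user, set())


class VORegistry:
    """One VO per project/study, created on demand."""

    def __init__(self) -> None:
        self._vos: dict[str, VirtualOrganisation] = {}

    def create(self, project: str) -> VirtualOrganisation:
        if project in self._vos:
            raise ValueError(f"VO {project} already exists")
        vo = VirtualOrganisation(project)
        self._vos[project] = vo
        return vo
```

A role check such as `has_role(user, "researcher")` is then the input to role-based access decisions within that project.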
Communications
Encrypted communications for all services: GridFTP, SSH, and HTTPS for web services.
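For the HTTPS case, a client calling a web service should at minimum verify the server certificate and host name. The following is a minimal sketch using Python's standard `ssl` module; the TLS 1.2 floor is a present-day assumption, not something the slide specifies.

```python
import ssl


def client_tls_context() -> ssl.SSLContext:
    """TLS context for calling web services over HTTPS (a sketch)."""
    # create_default_context() loads the system CA store and enables
    # certificate verification and host-name checking.
    ctx = ssl.create_default_context()
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context can be passed to `urllib.request.urlopen(url, context=client_tls_context())`; GridFTP and SSH have their own transport security and are not covered by this sketch.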
Image Encryption
The encryption keys protect research data, not personal data, so they are less sensitive.
Keys are accessible from all SINAPSE sites.
Access to the keys is based on groups and roles, and is project/study dependent.
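Project-dependent, role-gated key access can be sketched as a key store that releases a project's key only to users holding an authorised role in that project. The class and method names are hypothetical, and a real deployment would keep the keys in a secured service rather than in process memory.

```python
import secrets


class ProjectKeyStore:
    """Hypothetical store: one data key per project/study, released only to
    users holding an authorised role in that project."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}
        self._authorised_roles: dict[str, set[str]] = {}

    def create_project_key(self, project: str, roles: set[str]) -> None:
        if project in self._keys:
            raise ValueError(f"key for {project} already exists")
        self._keys[project] = secrets.token_bytes(32)  # random 256-bit symmetric key
        self._authorised_roles[project] = set(roles)

    def get_key(self, project: str, user_roles: dict[str, set[str]]) -> bytes:
        """user_roles maps each project to the roles the user holds in it."""
        held = user_roles.get(project, set())
        # Roles held in a different project never grant access to this one.
        if not held & self._authorised_roles.get(project, set()):
            raise PermissionError(f"no authorised role for {project}")
        return self._keys[project]
```

Under this design a site operator, who is not given an authorised role, can move encrypted files but cannot obtain the key to decrypt them, which matches the lowest access level.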
Catalogues
Data Catalogue for keeping track of the files in the system
Metadata Catalogue storing key attributes extracted from the DICOM headers.
Clinical Information databases and additional metadata databases can be deployed by the different projects.
OGSA-DAI will be used to provide access to these resources.
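A Metadata Catalogue of key DICOM attributes, queried per project, might look like the following sketch. It uses an in-memory SQLite database as a stand-in; it does not use OGSA-DAI, and the table layout and chosen attributes are hypothetical.

```python
import sqlite3


def open_catalogue() -> sqlite3.Connection:
    """Create an in-memory metadata catalogue (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE metadata (
               path TEXT PRIMARY KEY,
               project TEXT NOT NULL,
               modality TEXT,
               study_date TEXT,
               series_description TEXT)"""
    )
    return conn


def register(conn: sqlite3.Connection, path: str, project: str, header: dict) -> None:
    """Store the key attributes extracted from a DICOM header (missing ones become NULL)."""
    conn.execute(
        "INSERT INTO metadata VALUES (?, ?, ?, ?, ?)",
        (path, project, header.get("Modality"), header.get("StudyDate"),
         header.get("SeriesDescription")),
    )


def find(conn: sqlite3.Connection, project: str, modality: str) -> list[str]:
    """Return the paths of files in one project with the given modality."""
    rows = conn.execute(
        "SELECT path FROM metadata WHERE project = ? AND modality = ? ORDER BY path",
        (project, modality),
    )
    return [row[0] for row in rows]
```

Always filtering by project mirrors the per-project organisation of the data; parameterised queries keep user-supplied search terms out of the SQL text.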
Status
Proposal endorsed by the SINAPSE IT & Image Analysis committee last July.
Grant application for machines & storage resources to be sent soon.
Pseudonymisation tool being tested.