15th of June 2009 Grids & e-Science 2009 Santander 1

eScience activities in a brain imaging research network.

David Rodríguez González
SINAPSE collaboration
National e-Science Centre, School of Informatics & SFC Brain Imaging Research Centre, Division of Clinical Neuroscience
University of Edinburgh

On behalf of the SINAPSE Collaboration.

Contents

- SINAPSE
- eScience for SINAPSE
- Data Protection
- Data de-identification
- Data Sharing
- Portal
- Status & Plans

Massive expansion in research imaging

- All branches of medicine – particularly brain
- Not just medicine – psychology, linguistics, engineering, parapsychology, etc.
- In Scotland too!!!
  - 8% of the UK population
  - 12.5% of all highest rated departments
  - Highest concentration of biotech in Europe
  - Neuroscience – much larger than NIH
- But in 2006 there were machines, pockets of excellence, but little cohesion

Slide by J. Wardlaw

The SINAPSE Project

- Stands for Scottish Imaging Network: a Platform for Scientific Excellence.
- Pooling initiative of six Scottish universities: Aberdeen, Dundee, Edinburgh, Glasgow, St. Andrews and Stirling.
- Main objectives:
  - develop imaging expertise,
  - support multi-centre clinical research in conjunction with the Clinical Research Networks,
  - improve the ability of neuroscientists to collaborate on clinical trials,
  - have a direct impact on patient health.

SINAPSE – gluing it together

- Networking
  - CRNs – large patient populations
  - CRFs – patient-focused research facilities
  - Poolings – buys more science
  - Individual projects – ageing cohorts, multicentre studies
- Harmonise framework for image data management
- Standardise imaging methods
- Make available image processing methods
- Ethics, good imaging research practice
- Translation from bench to bedside

Slide by J. Wardlaw

SINAPSE priority projects

Stroke, the brain and the blood-brain interface

Ageing brain to dementia

Novel molecular imaging markers for major psychiatric disorders

Innovative radiotracers for CNS inflammation

e-Science for SINAPSE

- Sharing of research data and applications between centres is an important part of the SINAPSE project’s objectives.
- The increasing amount of data acquired in modern imaging facilities and the distributed nature of SINAPSE require a proper data management strategy.
- The National e-Science Centre is actively involved in the SINAPSE collaboration, mainly through the IT & Image Analysis Committee.

eScience project activities

- Information governance & data de-identification
  - Networking
  - Development of de-identification tool
- Data sharing infrastructure
  - Facilitating multi-centre studies
- Portal for brain imaging
  - Improving usability
- Other
  - Analysis methods

Data Protection Act

- UK’s Data Protection Act (1998).
  - Implements the European Community Data Protection Directive 1995.
- Establishes individuals’ rights on data held about them and obligations for organisations or people processing personal data.
- Personal data must be processed in a fair and lawful manner.
  - 8 DPA principles.
- Other pieces of legislation apply to medical data.
  - Common law: duty of confidentiality.
  - Human Rights Act 1998 (article 8).

DPA in research

The DPA does not define the term “research purposes” apart from clarifying that it includes statistical or historical purposes.

Data processing for research should be ‘compatible’ with the purpose for which the data were originally obtained.

The data subjects should be aware that their personal information will be used for research purposes.

Anonymous Data

- Coded (pseudonymised or linked anonymised) data:
  - The identifiable information has been substituted by alphanumerical sequences with no plain meaning.
  - The data is anonymous to the research team.
  - The key to reverse the transformation shall be held securely by a third party to avoid falling within the scope of the DPA.
- (Fully) anonymised data:
  - All personal identifiers or codes have been irreversibly removed.
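
The difference between coded and fully anonymised data can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `LinkTablePseudonymiser`, the 12-character code length and the in-memory maps are all hypothetical and are not taken from the SINAPSE toolkit. The two maps together play the role of the "key" that a third party would hold; discarding them turns coded data into fully anonymised data.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration of coded (linked anonymised) identifiers. */
class LinkTablePseudonymiser {
    // Alphabet without easily confused characters (no I, O, 0, 1).
    private static final String ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
    private static final int CODE_LENGTH = 12; // assumption, chosen for the example

    private final SecureRandom random = new SecureRandom();
    // The link table: whoever holds these maps can reverse the coding.
    private final Map<String, String> idToCode = new HashMap<>();
    private final Map<String, String> codeToId = new HashMap<>();

    /** Returns the code for an identifier, creating a new random code if needed. */
    synchronized String codeFor(String identifier) {
        String existing = idToCode.get(identifier);
        if (existing != null) {
            return existing;
        }
        String code;
        do {
            code = randomCode();
        } while (codeToId.containsKey(code)); // retry on the (unlikely) collision
        idToCode.put(identifier, code);
        codeToId.put(code, identifier);
        return code;
    }

    /** Reverses the coding; only possible for the holder of the link table. */
    synchronized Optional<String> identifierFor(String code) {
        return Optional.ofNullable(codeToId.get(code));
    }

    private String randomCode() {
        StringBuilder sb = new StringBuilder(CODE_LENGTH);
        for (int i = 0; i < CODE_LENGTH; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}
```

Because the codes are random rather than derived from the identifier, the research team cannot recompute them; reversal requires access to the link table itself.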

MIDAS meeting (18th March 2009)

- Medical Imaging Data Access and Sharing
- Hosted in the e-Science Institute
- Brought together representatives from NHS Scotland & the universities
- Successful meeting with useful discussion
- Came out with a roadmap for improving the data sharing between both sides
- Report produced, now being circulated between attendees

SINAPSE DICOM De-Identification Toolkit

- Implemented in Java.
- Configurable for each site.
  - The idea is to deploy it as near as possible to the data acquisition.
- Privacy Policy configurable using XML documents.
  - Different projects can apply different policies.
  - The policy specifies the classes that will execute the transformation of the data.
  - Graphical tool for editing the policies.
- These classes will be distributed in signed jars, and their authenticity will be checked using their hash.
- For data provenance checks and auditing purposes the classes’ version will be tracked.
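
As a rough sketch of the hash check mentioned above, the following Java code computes the digest of a jar file and compares it with an expected value, such as one recorded in a policy. The slides do not say which hash algorithm the toolkit uses; SHA-256 is an assumption here, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: checks a jar file against an expected digest. */
class JarDigestCheck {

    /** Computes the lowercase hex SHA-256 digest of a file (algorithm is an assumption). */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true only if the file's digest equals the expected hex string. */
    static boolean matches(Path jar, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(jar).equalsIgnoreCase(expectedHex);
    }
}
```

A caller would refuse to load transformer classes from a jar for which `matches` returns false. Note that this check is separate from jar signature verification, which the Java platform handles through its own mechanisms.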

Data De-Identification

[Diagram: data flow between the NHS and the Research Centre. Components shown: National PACS, Local RIS, CHI Transformation Service, SINAPSE Anonymiser, Link Table, Local Storage, Anonymous research data.]

CHI Transformation Service

- CHI (Community Health Index) is the national unique identifier for NHS (Scotland) patients
  - Used in any health related communication
  - As it identifies the patient it is sensitive information
- It is composed of 10 digits that include
  - Date of birth
  - Gender
  - Control digit
- Possibilities
  - Reversible / Irreversible transformation
  - Unique for all SINAPSE / Unique for each Data Controller
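
One of the options listed above, an irreversible transformation that is unique for each Data Controller, could be sketched with a keyed hash. This is an assumption for illustration only: the slides do not state how the CHI Transformation Service works, and the class name, the choice of HMAC-SHA256 and the CHI value used in the usage note are invented.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical irreversible, keyed transformation of a CHI number. */
class ChiPseudonym {

    /**
     * Maps a 10-digit CHI to a 64-character hex pseudonym.
     * The same CHI and key always give the same pseudonym; a different key
     * (for example one per Data Controller) gives an unrelated pseudonym.
     */
    static String transform(String chi, byte[] secretKey) {
        if (chi == null || !chi.matches("\\d{10}")) {
            throw new IllegalArgumentException("CHI must be exactly 10 digits");
        }
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
            byte[] digest = mac.doFinal(chi.getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable or key rejected", e);
        }
    }
}
```

For example, `ChiPseudonym.transform("0101501234", key)` (a made-up CHI) returns a stable pseudonym for a given `key`. Using one key for all of SINAPSE would let records from different centres be linked, while one key per Data Controller would prevent that. A plain unkeyed hash would not be enough, since the space of valid 10-digit numbers is small enough to enumerate. A reversible variant would instead need encryption or a link table.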

Anonymiser workflow

[Diagram: workflow stages are data input, de-identification, metadata extraction and data output. Other labels shown: File system, Receiver, SFTP, Content, Structure, Provenance, Catalogue.]

SINAPSE Anonymiser components

[Diagram: components shown are DICOM standard, DICOM library, DICOM library adaptor, Anonymiser library, FieldTransformers, Application, Policy Builder, Privacy policy.]
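
The way these components might fit together can be sketched as follows: a policy maps field names to transformations, and the anonymiser applies them to the header fields handed over by a format adaptor. Everything in this sketch is hypothetical, including the class name `HeaderAnonymiser`, the use of a plain map for the header and the use of `UnaryOperator` in place of the toolkit's transformer classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical core loop: apply a field-to-transformation policy to a header. */
class HeaderAnonymiser {

    /** The transformed header plus the fields that were touched (useful for provenance). */
    static final class Result {
        final Map<String, String> header;
        final List<String> transformedFields;

        Result(Map<String, String> header, List<String> transformedFields) {
            this.header = header;
            this.transformedFields = transformedFields;
        }
    }

    /** Returns a transformed copy of the header; the input map is left unmodified. */
    static Result apply(Map<String, String> header,
                        Map<String, UnaryOperator<String>> policy) {
        Map<String, String> out = new LinkedHashMap<>(header);
        List<String> touched = new ArrayList<>();
        for (Map.Entry<String, UnaryOperator<String>> rule : policy.entrySet()) {
            String field = rule.getKey();
            if (out.containsKey(field)) {
                out.put(field, rule.getValue().apply(out.get(field)));
                touched.add(field);
            }
        }
        return new Result(out, touched);
    }
}
```

Keeping the loop independent of the file format means that only the adaptor, which turns a file into a field map and back, needs to know about DICOM.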

FieldTransformers

- Classes implementing an interface that are used for atomic transformations of the contents of fields.
- Specified at run time by the Privacy Policy in use.
- Format independent. Only work with the content.
- Examples:
  - DatesTransformer
  - StudyIDTransformer
  - InformationOverwriter
  - …
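
A minimal sketch of what such an interface and two implementations might look like is given below. The actual signatures of the toolkit are not shown in the slides, so the method signature, the parameter names (`replacement`, `days`) and the class names `OverwriteTransformer`, `DateShiftTransformer` and `TransformerLoader` are all assumptions. The date example assumes the field content is a `yyyyMMdd` string.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Map;

/** Hypothetical interface: one atomic transformation of a field's content. */
interface FieldTransformer {
    String transform(String content, Map<String, String> parameters);
}

/** Replaces any content with a fixed value taken from the policy parameters. */
class OverwriteTransformer implements FieldTransformer {
    @Override
    public String transform(String content, Map<String, String> parameters) {
        return parameters.getOrDefault("replacement", "");
    }
}

/** Shifts a yyyyMMdd date by the number of days given in the "days" parameter. */
class DateShiftTransformer implements FieldTransformer {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.BASIC_ISO_DATE;

    @Override
    public String transform(String content, Map<String, String> parameters) {
        long days = Long.parseLong(parameters.getOrDefault("days", "0"));
        return LocalDate.parse(content, FORMAT).plusDays(days).format(FORMAT);
    }
}

/** Instantiates a transformer from a class name, as a policy might specify it. */
class TransformerLoader {
    static FieldTransformer load(String className) throws ReflectiveOperationException {
        return Class.forName(className)
                .asSubclass(FieldTransformer.class) // fails if the class is not a transformer
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

Because the transformers see only the content string and their parameters, the same class can be reused for any file format that exposes its fields as text.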

Privacy Policies

- XML documents containing the rules for anonymising the data
- Specify:
  - The target fields
  - The class used for the transformation, including:
    - Version
    - Digest
    - Location (jar file)
  - Parameters
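
To make the structure above concrete, here is a sketch that parses an invented policy document with the standard Java DOM API. The slides do not show the real policy schema: the element and attribute names (`policy`, `rule`, `transformer`, `param`) and the placeholder digest values are assumptions chosen only to mirror the listed items.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** One rule of a hypothetical policy: a target field plus its transformer class. */
class PolicyRule {
    final String field, className, version, digest, location;
    final Map<String, String> parameters = new LinkedHashMap<>();

    PolicyRule(String field, String className, String version, String digest, String location) {
        this.field = field;
        this.className = className;
        this.version = version;
        this.digest = digest;
        this.location = location;
    }
}

class PolicyParser {
    // Invented example; the schema and the digest placeholders are assumptions.
    static final String EXAMPLE =
          "<policy project=\"example-study\">\n"
        + "  <rule field=\"PatientName\">\n"
        + "    <transformer class=\"InformationOverwriter\" version=\"1.0\"\n"
        + "                 digest=\"0000\" location=\"transformers.jar\">\n"
        + "      <param name=\"replacement\" value=\"ANONYMOUS\"/>\n"
        + "    </transformer>\n"
        + "  </rule>\n"
        + "  <rule field=\"StudyDate\">\n"
        + "    <transformer class=\"DatesTransformer\" version=\"1.2\"\n"
        + "                 digest=\"1111\" location=\"transformers.jar\"/>\n"
        + "  </rule>\n"
        + "</policy>\n";

    static List<PolicyRule> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so a policy cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<PolicyRule> rules = new ArrayList<>();
        NodeList ruleNodes = doc.getElementsByTagName("rule");
        for (int i = 0; i < ruleNodes.getLength(); i++) {
            Element rule = (Element) ruleNodes.item(i);
            Element t = (Element) rule.getElementsByTagName("transformer").item(0);
            if (t == null) {
                throw new IllegalArgumentException("rule without transformer: " + rule.getAttribute("field"));
            }
            PolicyRule parsed = new PolicyRule(rule.getAttribute("field"), t.getAttribute("class"),
                    t.getAttribute("version"), t.getAttribute("digest"), t.getAttribute("location"));
            NodeList params = t.getElementsByTagName("param");
            for (int j = 0; j < params.getLength(); j++) {
                Element p = (Element) params.item(j);
                parsed.parameters.put(p.getAttribute("name"), p.getAttribute("value"));
            }
            rules.add(parsed);
        }
        return rules;
    }
}
```

Calling `PolicyParser.parse(PolicyParser.EXAMPLE)` yields two rules, one per target field, each carrying the class name, version, digest and jar location that a loader would need.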

Policy Editor

- A graphical tool to help build policy documents.
- DICOM dictionary.
- Searches for “FieldTransformer” classes in jar files.

Registry

- A catalogue containing
  - privacy policies,
  - transformer classes,
  - and the corresponding jar files.
- The application can work without this.
  - But it helps to set a coherent set of policies.

Data Sharing e-Infrastructure

- For enabling multi-centre clinical research through data sharing
- Some features of the proposed SINAPSE e-infrastructure project are:
  - De-Identification, automatic compliance with data protection policies;
  - Security, advanced authentication and authorisation within projects;
  - Usability, providing a user friendly environment to access data and applications;
  - Modularity, conforming to relevant standards and use of existing components;
  - Centralisation, leveraging existing compute clusters and storage.

Benefits

- Easier Data Protection compliance for users
- Enables secure data sharing
- Coherent view of available data (single point of access)
- Roadmap for end-of-project data publication & data curation

Data Storage & Access

- Centralised model adopted: cheaper, easier, and it reduces the IT burden undertaken by research staff.
  - Although there are several grid projects that provide DICOM functionalities.
- The research data will be encrypted before being stored.
- Data organised per project.
  - Access control using groups & roles.
- Authentication using Shibboleth due to usability concerns regarding X.509 certificates.
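
Encrypting research data before storage could look roughly like the sketch below. The slides do not name a cipher or a key management scheme, so AES in GCM mode, the 128-bit key size, the storage layout (IV followed by ciphertext) and the class name `ResearchDataCipher` are assumptions made for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical encrypt-before-store helper using AES-GCM. */
class ResearchDataCipher {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128); // key size chosen for the example
        return generator.generateKey();
    }

    /** Returns IV || ciphertext || tag, ready to be written to storage. */
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv); // a fresh IV for every encryption
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] encrypted = cipher.doFinal(plain);
        byte[] stored = new byte[iv.length + encrypted.length];
        System.arraycopy(iv, 0, stored, 0, iv.length);
        System.arraycopy(encrypted, 0, stored, iv.length, encrypted.length);
        return stored;
    }

    /** Reverses encrypt(); throws if the stored bytes were modified. */
    static byte[] decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }
}
```

An authenticated mode such as GCM also detects tampering with the stored file. Where the keys live, and who may request them, would be a matter for a separate key storage service.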

Centralised Architecture (pros & cons)

- Simpler deployment
- Easier middleware release control
- Lesser impact on participant centres
- Easier to manage and use
- No default resilience
  - A second centre would be needed
  - But this is only necessary for critical services
  - With good support a reasonable service can be provided using a single centre

Deployment Plan

- ECDF (http://www.is.ed.ac.uk/ecdf/)
  - A facility unique in Scotland
  - 1456 CPU cores
  - 275 TB of disk
  - Disk space and CPU time will be rented as needed.
- Also a SINAPSE-owned server to be hosted by ECDF:
  - ECDF will provide basic hardware + software support
  - SINAPSE services to be hosted in it:
    - Portal
    - Data Catalogue
    - Research Data encryption service
    - OGSA-DAI
    - Projects’ customised databases
    - RAPID
    - …

[Architecture diagram: layers are Resources, Data provider services, SINAPSE services and Applications; resources are grouped as local, SINAPSE and external. Labels shown: CPUs, Storage, Network, Local Auth; SINAPSE Anonymiser, CHI Transformation Service; VOMS, JSS, Metadata Catalogue, RD Key Storage, Portal, Basic WS, Shibboleth, RAPID, RD Encryption, OGSA-DAI; applications: Ageing, Psychiatry, Stroke, …]

Authentication

- Shibboleth federated authentication
  - Single sign-on
  - Delegated to home universities
  - Users will continue using a method they are already familiar with
- X.509 certificates are usual in Grids
  - But can be a handicap for some users

Authorisation

- Dynamic Virtual Organisations
  - Members should be added/removed easily
  - New VOs creation for new projects/studies
  - VO role management
- Role based access
  - Allows different access levels to information for different users
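
The combination of dynamic Virtual Organisations and role based access can be illustrated with a small in-memory model. This is a sketch only: the role and permission names are invented, and a real deployment would delegate membership and roles to a service such as VOMS rather than to a local map.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical permissions on a project's data. */
enum Permission { READ_METADATA, READ_IMAGES, UPLOAD_DATA, MANAGE_MEMBERS }

/** Hypothetical roles, each granting a fixed set of permissions. */
enum Role {
    COLLABORATOR(EnumSet.of(Permission.READ_METADATA)),
    RESEARCHER(EnumSet.of(Permission.READ_METADATA, Permission.READ_IMAGES, Permission.UPLOAD_DATA)),
    VO_ADMIN(EnumSet.allOf(Permission.class));

    private final Set<Permission> permissions;

    Role(Set<Permission> permissions) {
        this.permissions = Collections.unmodifiableSet(permissions);
    }

    boolean allows(Permission permission) {
        return permissions.contains(permission);
    }
}

/** A Virtual Organisation: one per project or study, with per-member roles. */
class VirtualOrganisation {
    private final String name;
    private final Map<String, Role> members = new HashMap<>();

    VirtualOrganisation(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    void addMember(String userId, Role role) {
        members.put(userId, role); // also used to change an existing member's role
    }

    void removeMember(String userId) {
        members.remove(userId);
    }

    /** Non-members are denied everything; members get what their role allows. */
    boolean isAllowed(String userId, Permission permission) {
        Role role = members.get(userId);
        return role != null && role.allows(permission);
    }
}
```

Creating a new VO for a new study is then a matter of instantiating another `VirtualOrganisation`, and access levels differ between users simply because their roles differ.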

Catalogues

- Data Catalogue for keeping track of the files in the system
- Metadata Catalogue storing key attributes extracted from the DICOM headers
  - It will also keep information on the de-identification process for data provenance
- Clinical Information databases and customised metadata databases can be deployed by the different projects
- OGSA-DAI will be used to provide access to these resources

Portal

- A GridSphere-based portal will give access to the resources.
- Basic functionality to be provided by SINAPSE
  - Data uploading
  - Catalogues querying
  - …
- Different subprojects will develop their own customised portlets to be integrated in the portal

A portal for brain imaging (MSc project by Albert Heyrovský)

- Motivations
  - to facilitate the usage of complex software packages
  - to provide access to large computing resources like ECDF
- First application: brain perfusion imaging analysis
- Easily extensible to other brain imaging applications
- Portlets generated using the Rapid system developed at NeSC

Status

- The proposal was adopted by the SINAPSE IT & Image Analysis committee
- Grant application to support pilot project (including hardware & storage resources) rejected
  - Considering resubmission
- SINAPSE De-Identification Toolkit deployed
  - SBIRC (Edinburgh)
  - Aberdeen
  - Used for anonymising acute stroke study data

Plans

- Development of new components started: Catalogues.
- Portal for brain imaging to be kickstarted with the MSc in eScience student’s project
- Collaboration with other centres
  - CRIC (Edinburgh)
  - TMRC (Dundee)

Questions