TSD: a Secure and Scalable Service for Sensitive Data and eBiobanks

Gard Thomassen, PhD, Head of Research Support Services Group, University Center for Information Technology (USIT), University of Oslo

Upload: others

Post on 27-Jun-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TSD: a Secure and Scalable Service for Sensitive Data and ... · for Sensitive Data and eBiobanks Gard Thomassen, PhD ... • TSD is involved in the NeIC-based Tryggve project •

TSD: a Secure and Scalable Service for Sensitive Data and eBiobanks Gard Thomassen, PhD Head of Research Support Services Group University Center for Information Technology (USIT) University of Oslo


Outline

•  Sensitive data
•  TSD setup, solutions, demo, status and future
•  Q&A
•  How to get on board


What is sensitive data?

Norway: Personal Data Act §2, point 8 – racial or ethnic origin, political opinions, philosophical or religious beliefs, the fact that a person has been suspected of, charged with, indicted for or convicted of a criminal act, health, sex life and trade-union membership.


Who has sensitive data?

Almost everyone


TSD launch in Computerworld, 16 May 2014


Norsk KreftGenom Konsortium (Norwegian Cancer Genomics Consortium): "Compared with the hardware we used until the move to TSD, which can fairly be described as a moderately usable server with 64 cores, TSD gives us a theoretical speed-up of 30X. On top of this, we have optimised our analysis pipeline by parallelising several steps. Previously, a sequencing analysis of 48 tumour/normal pairs would have taken at least two to three months to run. This week we ran the same analysis on TSD in two days and a few hours. That is, to put it mildly, a dramatic improvement." Prof. Eivind Hovig, NCGC
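The run times in the quote can be turned into a rough observed speed-up. This Python sketch assumes 30-day months and reads "two days and a few hours" as between 2 and 3 days; neither assumption comes from the slide. Because two to three months is stated as a minimum, the lower bound of the range is conservative.

```python
from typing import Tuple


def speedup_range(old_days: Tuple[float, float],
                  new_days: Tuple[float, float]) -> Tuple[float, float]:
    """Bound the speed-up old/new when both run times are given as (low, high)."""
    old_low, old_high = old_days
    new_low, new_high = new_days
    # Smallest ratio: shortest old run over longest new run; largest: the reverse.
    return old_low / new_high, old_high / new_low


# Assumptions (not from the slide): 1 month = 30 days;
# "two days and a few hours" = between 2 and 3 days.
low, high = speedup_range((2 * 30, 3 * 30), (2, 3))
print(f"observed speed-up: {low:.0f}x to {high:.0f}x")  # observed speed-up: 20x to 45x
```

The resulting 20x to 45x range is of the same order as the theoretical 30X hardware speed-up; since the pipeline was also parallelised, these figures do not separate the two effects.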


Teknisk Ukeblad & E24, 5 May 2014


Uniforum


TSD

Pilot 2009 - 2012


System requirements
•  Security, isolation and access control as required by law
•  Large storage capacity
•  Multi-tenant (multiple users)
•  High performance computing (HPC) resource
•  High bandwidth
•  Easy to maintain and operate
•  Easy to use and "practical" (also for audio and video)
•  Some freedom within confined user space
•  Accessible from anywhere through proper mechanisms
•  A variety of software and public data sources must be available
•  Windows and Linux support (server/host side)
•  Data collection services
•  Data sharing services


Setup, solutions and status


System outline

[Diagram: system outline. Components: Gateway; HPC cluster (Colossus); VM server; Storage; Internet; secure encrypted network to special high-volume data production sites. Cardinality labels: 1 (project), 1 (storage area), n, 1.]


TSD Windows demo


Data import and export using TSD

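[Diagram: data import and export using TSD. Labels: TSD; file lock server; virtual file lock server; virtual project server; file lock HD; project HD; NFS mount. Annotation: data is copied to the file lock by sftp with 2-factor authentication, and is encrypted if sensitive. Steps are numbered 1 to 4; their order is not recoverable from the transcript.]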
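The import path keeps each project's data apart: uploads land on the file lock, and a project server reaches them through an NFS mount. The Python sketch below illustrates only that separation rule at the file-lock stage. Each project gets its own import directory, and any file name that would resolve outside it (for example via `../`) is refused. The directory layout (`<root>/<project>/import`), the project-ID pattern `p<digits>` and all function names are hypothetical, not TSD's actual layout or tooling; sftp, 2-factor authentication and encryption are out of scope. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Hypothetical project-ID format, e.g. "p01"; not TSD's documented naming.
PROJECT_ID = re.compile(r"p[0-9]+")


def import_area(filelock_root: Path, project: str) -> Path:
    """Return (creating if needed) the project's own import directory."""
    if not PROJECT_ID.fullmatch(project):
        raise ValueError(f"invalid project id: {project!r}")
    area = (filelock_root / project / "import").resolve()
    area.mkdir(parents=True, exist_ok=True)
    return area


def deposit(filelock_root: Path, project: str, name: str, data: bytes) -> Path:
    """Store an upload in the project's import area, refusing any name that
    resolves outside it (e.g. "../../p02/import/x" or an absolute path)."""
    area = import_area(filelock_root, project)
    target = (area / name).resolve()
    if target == area or not target.is_relative_to(area):
        raise PermissionError(f"{name!r} escapes the import area of {project}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


def project_view(filelock_root: Path, project: str) -> List[str]:
    """List what this project's server would see through its mount:
    only files under its own import area."""
    area = import_area(filelock_root, project)
    return sorted(p.relative_to(area).as_posix()
                  for p in area.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        deposit(root, "p01", "cohort.csv.gpg", b"ciphertext")
        print(project_view(root, "p01"))  # ['cohort.csv.gpg']
        print(project_view(root, "p02"))  # []
```

Resolving the path before the containment check is the design point: checking the raw string for `..` alone would miss absolute paths and other ways of escaping the area.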


Data collection using TSD

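[Diagram: data collection using TSD. Components: the Nettskjema homepage with minID login ("Nettskjema-minID"), the TSD file lock, the project disk and the project VM; submitted data is transferred as PGP-encrypted XML.]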


Homepage

http://www.uio.no/tjenester/it/forskning/sensitiv/


Projects

http://www.uio.no/tjenester/it/forskning/sensitiv/mer-om/kunder/


TSD status

•  > 80 research projects
•  > 350 users
•  Secure storage (> 1 PiB on disk)
•  Secure data analysis
•  Linux or Windows hosts (> 250 VMs)
•  Secure import and export
•  Web-based data harvesting
•  HPC cluster (> 1500 cores)
•  PostgreSQL databases
•  Video and sound display


Capabilities enabled by TSD

•  Large-scale NGS research on human genomes
•  Large-scale medical imaging studies
•  Large-scale studies with web-based data collection
•  Off-site analysis of sensitive data
•  Secure storage for verification of published research
•  Electronic consent


Nordic collaboration opportunities
•  Laws are fairly similar (Norway is very strict)
•  It is difficult to exchange sensitive data for research
•  Partners should learn from each other, as these systems demand very specialised IT knowledge
•  Service development and system administration know-how is non-sensitive and may be shared
•  Building TSD addressed many novel security questions in a university setting, and others can learn from them
•  Large databases/registries of health data may enable very interesting research in the future
•  TSD is involved in the NeIC-based Tryggve project
•  We are happy to collaborate!


Future of TSD: main topics
•  How to handle video and sound
   –  harvesting
   –  management
   –  metadata
   –  analysis
•  Journal system for psychologists (collaboration with Umeå University)
•  Biobanks
•  VMware and VDI infrastructure
•  Galaxy inside TSD
•  Elixir helpdesk connected to TSD
•  Hosting Docker containers
•  Invariant storage of research data (connected with Cristin?)
•  National eInfrastructure investment in TSD?


Main collaborators on TSD

Collaborators
•  Norwegian Storage Infrastructure (NorStore)
•  Norwegian Genetics Analysis Platform (GenAp)
•  Norwegian Dietary Registry (Medical Faculty)
•  Institute of Psychology (Faculty of Social Sciences)
•  Norwegian Cancer Genomics Consortium (NCGC)

Reference group: Oslo University Hospital, NorStore, Regional Ethical Committee, National Institute of Public Health, Norwegian Cancer Registry, Research Network at OUS, Elixir Norway, NCGC, GenAP, Institute of Psychology.


How to get on board

[email protected]
[email protected]

NB: Remember that NorStore (and StoreBioInfo) allocates TSD storage on a per-application basis.


Thanks to

•  tsd-core@usit
•  virt-core@usit
•  storage-core@usit
•  postgres-core@usit
•  network-core@usit
•  hpc-core@usit
•  windows-core@usit
•  unix-core@usit
•  IT-security@usit

Project group / developers
•  IT director Lars Oftedal
•  Hans A. Eide
•  Märtha Felton

Administration / associated


Security details

•  OATH TOTP two-factor authentication
   –  smartphones or programmable hardware tokens
•  Import/export is under strict control
•  No open connection to the internet
•  All administration happens from the inside
•  Strong separation between projects
•  Hardened FreeBSD gateway and firewall
•  Encrypted backup, one key per project
•  Sys-admins are single users (traceability)
•  Sys-admins have to use the same authentication process
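OATH TOTP, the second factor listed above, is the open time-based one-time password algorithm standardised in RFC 6238, built on HOTP (RFC 4226). The Python sketch below implements the standard algorithm with HMAC-SHA1, a 30-second step and 6 digits, the common defaults for smartphone authenticator apps. TSD's actual parameters, its tolerance for clock drift (`window=1` here) and its server-side code are not described in the slides, so treat those as assumptions; the function names are illustrative.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter set to the current time step."""
    if unix_time is None:
        unix_time = time.time()
    return hotp(secret, int(unix_time // step), digits)


def verify(secret: bytes, candidate: str, unix_time: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    """Accept a code from the current step or up to `window` steps either side
    (assumed drift tolerance), comparing in constant time."""
    if unix_time is None:
        unix_time = time.time()
    for delta in range(-window, window + 1):
        expected = totp(secret, unix_time + delta * step, step, len(candidate))
        if hmac.compare_digest(expected, candidate):
            return True
    return False


if __name__ == "__main__":
    shared_secret = b"12345678901234567890"  # RFC test secret; never use in production
    print(totp(shared_secret))
```

The token (phone app or hardware token) and the server hold the same `secret`; each derives the code independently from the current 30-second interval, so only a short-lived code crosses the network at login.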