iDASH National Center for Biomedical Computing Sharing and Protecting Human Subjects Data 6/14/15 50th iDASH external webinar NIH U54 HL108460 Lucila Ohno-Machado, MD, MBA, PhD Biomedical Informatics, University of California San Diego


Page 1

iDASH National Center for Biomedical Computing
Sharing and Protecting Human Subjects Data

6/14/15
50th iDASH external webinar

NIH U54 HL108460
Lucila Ohno-Machado, MD, MBA, PhD
Biomedical Informatics, University of California San Diego

Page 2

Patient Interaction

Data Analysis: Statistics, Machine Learning

Data Structuring: Natural Language Processing, Data Modeling

Predictive Modeling: Evaluation Methods

Decision Support Tools: Guidelines, Alert & Reminders

Data Collection Tools: Clinical Data Warehouse

Data Integration: Genomics, Proteomics, Sensors

Data De-Identification: Privacy Technology

Communication Strategies: Consumer Health Informatics, Medical Education

Page 3

Our Goals

• Share access to data and computation
• Train the new generation of data scientists
• Provide innovative software, platform, and infrastructure
• Protect privacy

Develop
» Algorithms
» Tools
» Infrastructure
» Policies

[Diagram labels: iDASH; Knowledge & Tools; Privacy; Consent; Data; Services; Platform; Sensors; Genomic; Clinical; Service; WWW; Apps; Exec.; Aggreg.; Hosting; Sharing; Policies; Research; Develop.; Federation]

Page 4

Trainees

Postdocs (8 total)
Current: Meng, Shuang, Wenrui

PhD and master’s students (14 total)
Current: Wei, Zhanglong

Undergrad Students (9 total)
Current: Brianda, Dexter

Summer Interns (66)

Past Postdocs, Past Interns
Yuan Wu, Asst Prof, Duke
Myoung La, Software Engineer, Microsoft
Shuang Wang, Postdoc, UCSD, K99/R00
Adela Grando, Asst Prof, ASU
Elizabeth Bell, Research Asst, UCSD
Xiaoqian Jiang, Asst Prof, UCSD, K99/R00
Mike Conway, Asst Prof, U Utah, K99/R00
Kaushik Sinha, Asst Prof, Wichita St

2011

Trainees 2011-2012: Challen (UCLA), Christos (Berkeley), Colin (break), Hyunchul (military), Jialan (industry), Melanie (industry), Neda (postdoc), Petra (industry), Pinghao (industry), Seena (industry), Stefan (industry), Stephanie (grad st UCSD), Wanmin (industry), Wenchao (grad st U Minn), Wenrui (postdoc)

NLM Training Grant started 2012 (9 pre- and 6 postdoc slots)
Graduates from the postdoc program: Mindy (Asst Prof UCLA), Dyvia (Fellowship in Resp Med UCSD), Edna (Residency in Surgery)

Page 5

Publications

• Published Articles and Book Chapters: 138
• Presentations: 244
• Posters: 72

Topic | # Published
Cell Biology | 2
Cloud Computing and Architecture | 1
Data Analysis and Compression | 5
Data Modeling and Integration | 4
Data Sharing | 5
Genomics | 28
Imaging Informatics | 4
Infrastructure | 4
Kawasaki Disease (DBP 1 & 4) | 13
Natural Language Processing | 7
Patient Centered Research | 9
Physical Activity Monitoring (DBP 3) | 2
Privacy Technology | 41
Statistics | 13
Total | 138

https://idash.ucsd.edu/publications

As of 6/4/15

Page 6

Integrating Different Types of Data

[Diagram labels: Genotype (genome); transcription; RNA (transcriptome); translation; Protein (proteome); Metabolites; Physiology (laboratory tests); Phenotype (physical exam, imaging, monitoring systems)]

Page 7

Page 8

Biometrics and PHI

PHI requires HIPAA

• Biometrics require HIPAA

Biometrics are Protected Health Information (PHI)

Page 9

Genomes are Biometrics

PHI requires HIPAA

Biometrics are Protected Health Information (PHI)

Page 10

Big Data Announcement

Page 11

UCSD Health Sciences: Building Protected Health Information Networks

Research Data, Clinical Data, Applications, Integration
2008-2009, 2010-2011, 2012-2013, 2014-2015

Electronic Health Record System: Epic & Clarity
Other Systems: PACS, lab, etc.
Personnel Systems: Active Directory
Query Tools: UC-ReX Explorer, Privacy Technology
Clinical Research Data: RedCAP, Velos, Other DBs
iDASH HIPAA SHADE: Images, human genomes, etc.
Analytical Tools
Recruitment Consent tools, Custom Apps
VA LA Clinics
UCSF, Davis, Irvine, UCLA
Healthcare Clinical Data: Clinical Data Warehouse for Research
Scalable Network (Distributed Analytics Tools)
HIPAA
External data (patient reported data, sensors)
pSCANNER: PCORI CDRN
iDASH HIPAA/FISMA OVERCAST: iDASH, CTRI, School of Medicine
De-ID Tools

[Other diagram labels: SCANNER; BRIGHT; iDASH; PhenDISCO; NLM Training Grant; K22, K99s; PCORI contracts; Private Cloud; iCONCUR; UC-ReX; pSCANNER; Accrual for Clinical Trials; CTSA renewal; bioCADDIE; R21, subcontracts; Health System Department; USC/LAC; Cedars Sinai; San Mateo; Epic; CDDS; New modules]

Page 12

Partners on Patient Privacy Projects

Privacy Preserving Analytics for KD in African-Americans
Consent for Data and Biosample Sharing in Underserved Populations
Partnership for Epidemiological Research Study on Latinos
StrongHeart Study on American Indian Populations

Data and Biospecimen Sharing
Privacy Preserving Computation

Which DNA variants are implicated in KD susceptibility in this population?
Emory, Genome Institute of Singapore, Imperial College

Does consent rate depend on who is obtaining the consent?
Maricopa Health System, FQHS in Arizona

Do patients understand what they consented for?
San Diego State University

What type of ‘sharing’ is acceptable?
University of Oklahoma

Page 13

What to share / Who to share with

Some preliminary findings from a limited set of interviews
• Healthy volunteers do not want to share with commercially sponsored researchers
• Some want their medical information shared with only UCSD researchers, no others
• Many do not want to share at least 1 category of sensitive information
• Most common decline was genetic, followed by sexual & reproductive health

Courtesy of E Bell

Page 14

Consent Management System

Do I wish to disclose data D to U?

Sharing Look-up

Yes

Patient I

Patient Interface

I can check that U looked at my data D

• Data use agreements

• Study registry

Trusted broker

Healthcare Institutions

User U requests Data D on individual I

Shifting control
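
The flow on this slide (user U requests data D on individual I, a trusted broker looks up I's sharing choice, and I can later check that U looked at D) can be sketched in a few lines. This is only an illustration under stated assumptions, not the iCONCUR implementation: the class name `ConsentBroker`, its methods, and the default-deny rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentBroker:
    """Hypothetical trusted broker: stores sharing choices and an access log."""
    # (patient, data category, requesting user) -> True if sharing is allowed
    choices: dict = field(default_factory=dict)
    # one (patient, data category, user, released) tuple per request
    audit_log: list = field(default_factory=list)

    def set_choice(self, patient, category, user, allowed):
        self.choices[(patient, category, user)] = allowed

    def request(self, user, category, patient):
        """User U requests data D on individual I; assumed default is to deny."""
        allowed = self.choices.get((patient, category, user), False)
        self.audit_log.append((patient, category, user, allowed))
        return allowed

    def who_looked(self, patient, category):
        """Lets patient I check which users were given data D."""
        return [u for (p, c, u, ok) in self.audit_log
                if p == patient and c == category and ok]


broker = ConsentBroker()
broker.set_choice("I", "genetic", "U", True)
granted = broker.request("U", "genetic", "I")   # a choice is recorded: allowed
denied = broker.request("V", "genetic", "I")    # no choice recorded: denied
viewers = broker.who_looked("I", "genetic")
```

Keeping the audit log inside the broker is what gives the patient the "I can check that U looked at my data D" view; a real system would also need authentication and tamper-evident logging, which this sketch leaves out.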

Page 15

Front page
Courtesy of H Kim

Page 16

Choice history
Courtesy of H Kim

Page 17

Sharing choice taxonomy I

Courtesy of H Kim

Page 18

Sharing choice taxonomy II

Page 19

Sharing choice taxonomy III

Page 20

iCONCUR: informed CONsent for Clinical data Use for Research

[Diagram labels: Patient I; Consent Management System; Sharing Look-up; Registry; Electronic Health Record; Clinical Data Warehouse; Healthcare Institution; User U; Query; Results; Concierge or Automated Services; Connecting Software; Trusted Entity]

Page 21

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165)

homomorphic encryption

secure multiparty computation

iDASH “commons”

Sharing Data, Tools, Systems

differential privacy

indexing

Page 22

Institutions with Signed Agreements

DCA
• National
  » UCSD
  » Children’s Health Care of Atlanta (GA)
  » Long Beach Veterans Affairs Medical Center
  » Ortho Kenematics (TX)
• International
  » Mahidol University (Thailand)

DUA
• National
  » UCSD
  » Databetes (NY)
  » Tin Man Labs, LLC (TX)
  » UMass Dartmouth
  » Georgia Institute of Technology
  » University of Utah
  » The Ola Grimsby Institute (CA)
  » The Methodist Hospital Research Institute (TX)
  » Wake Forest University Health Systems (NC)
• International
  » North West London Hospitals NHS Trust (UK)
  » The University Hospital of Leuven (Belgium)
  » INRIA (France)
  » Newton Circus Pte. Ltd. (Singapore)

Page 23

‘De-Identification’ (microdata release)

Name   Age  Education  Hours/week  …  HTN
Frank  42   6          40          …  Y
Bob    31   10         60          …  Y
Dave   43   9          40          …  N
…      …    …          …           …  …

Courtesy of Li Xiong

• HIPAA compliant methods
  » Safe harbor dataset
    • Removal of 18 safe harbor identifiers
  » Limited dataset
    • Removal of direct identifiers
  » Statistical methods
    • Removal/grouping of attributes
    • “Risk reasonably low”
• Re-identification and disclosure risks
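
Two of the operations named on this slide, removal of direct identifiers and grouping of attributes, can be illustrated on the example table. This is a minimal sketch under stated assumptions: the field names, the choice of `name` as the only direct identifier, and the 10-year age bands are made up for the example, and running it does not by itself make a dataset HIPAA compliant.

```python
records = [
    {"name": "Frank", "age": 42, "education": 6, "hours_per_week": 40, "htn": "Y"},
    {"name": "Bob", "age": 31, "education": 10, "hours_per_week": 60, "htn": "Y"},
    {"name": "Dave", "age": 43, "education": 9, "hours_per_week": 40, "htn": "N"},
]

DIRECT_IDENTIFIERS = {"name"}   # assumption: the only direct identifier here
AGE_BAND = 10                   # assumption: group ages into 10-year bands


def deidentify(record):
    """Drop direct identifiers and replace the exact age with an age band."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    low = (record["age"] // AGE_BAND) * AGE_BAND
    out["age"] = f"{low}-{low + AGE_BAND - 1}"
    return out


released = [deidentify(r) for r in records]
```

After grouping, Frank and Dave share the same age band, which is the kind of coarsening that lowers re-identification risk at the cost of detail.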

Page 24

Statistical Data Release (macrodata release)

Mohammed N et al. Privacy Preserving Heterogeneous Health Data Sharing. J Am Med Inform Assoc 2013

Jiang XL et al. Differential-Private Data Publishing Through Component Analysis. Transactions on Data Privacy 2013

Courtesy of Li Xiong

Page 25

Statistical Data Release: Disclosure Risk

[Figure panels: Original records; Original histogram]

Courtesy of Li Xiong

Page 26

Statistical Data Release: Differential Privacy

[Figure panels: Original records; Original histogram; Perturbed histogram with differential privacy]

Courtesy of Li Xiong
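
One common way to obtain a perturbed histogram like the one pictured is the Laplace mechanism. The sketch below is an assumption about how such a histogram could be produced, not the method behind the slide's figure; the function names and the example counts are hypothetical. It assumes every person falls in exactly one bin and that neighboring databases differ by adding or removing one record, so each count has sensitivity 1.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturbed_histogram(counts, epsilon, rng=None):
    """Add Laplace(1/epsilon) noise to every bin count (sensitivity assumed 1)."""
    rng = rng or random.Random()
    return {bin_: c + laplace_noise(1.0 / epsilon, rng)
            for bin_, c in counts.items()}


original = {"30-39": 1, "40-49": 2}
noisy = perturbed_histogram(original, epsilon=0.5, rng=random.Random(0))
```

Smaller values of epsilon give a larger noise scale, so stronger privacy comes with a less accurate histogram.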

Page 27

Differential Privacy (Dwork et al)

A privacy mechanism A gives ε-differential privacy if, for all neighboring databases D and D’ and for any possible output S ∈ Range(A),

Pr[A(D) = S] ≤ exp(ε) × Pr[A(D’) = S]

• D and D’ are neighboring databases if they differ in at most one record

Courtesy of Li Xiong
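
The inequality can be checked by hand on a mechanism small enough to enumerate. The sketch below uses randomized response on a single yes/no value, where the "database" is one bit and the two possible values are neighbors; this example and its function names are assumptions added for illustration and do not come from the slides.

```python
import math


def randomized_response_probs(true_bit, epsilon):
    """Output distribution {0: p0, 1: p1} of randomized response on one bit."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {true_bit: keep, 1 - true_bit: 1.0 - keep}


def satisfies_dp(epsilon):
    """Check Pr[A(D) = S] <= exp(epsilon) * Pr[A(D') = S] for all D, D', S."""
    bound = math.exp(epsilon)
    for d in (0, 1):
        for d_prime in (0, 1):
            p = randomized_response_probs(d, epsilon)
            q = randomized_response_probs(d_prime, epsilon)
            for s in (0, 1):
                # small relative tolerance for floating-point rounding
                if p[s] > bound * q[s] * (1 + 1e-12):
                    return False
    return True


ok = satisfies_dp(math.log(3))   # keep probability 3/4, flip probability 1/4
```

With ε = ln 3 the true answer is reported with probability 3/4, and the ratio between the two output probabilities is exactly exp(ε), so the bound is tight.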

Page 28

iDASH 2014 First Privacy Protection Challenge

• Task 1: Privacy-preserving SNP Data Sharing
• Task 2: Privacy-preserving release of top K most significant SNPs

Evaluate solutions that guarantee privacy protection for the output of genomic data analysis
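
To make Task 2 concrete, here is one standard way to release a top-K list with a differential privacy guarantee: run the exponential mechanism K times, removing each selected item before the next round. This is a sketch of a textbook technique, not a description of any challenge entry; the function name, the SNP ids, and the assumed sensitivity of 1 are hypothetical.

```python
import math
import random


def private_top_k(scores, k, epsilon, sensitivity=1.0, rng=None):
    """Pick k items by running the exponential mechanism k times ("peeling").

    Each round spends epsilon / k, so the k rounds compose to epsilon.
    `scores` maps an item (for example a SNP id) to its test statistic;
    `sensitivity` is how much one record can change any score (assumed).
    """
    rng = rng or random.Random()
    remaining = dict(scores)
    chosen = []
    for _ in range(k):
        items = list(remaining)
        best = max(remaining.values())
        # Subtracting the best score keeps every exponent <= 0 (no overflow).
        weights = [math.exp((epsilon / k) * (remaining[i] - best)
                            / (2.0 * sensitivity)) for i in items]
        pick = rng.choices(items, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


scores = {"rs1": 9.0, "rs2": 1.0, "rs3": 5.0, "rs4": 0.5}
released = private_top_k(scores, k=2, epsilon=1.0, rng=random.Random(0))
```

With a small epsilon the released list can differ from the true top K; as epsilon grows the selection concentrates on the highest-scoring items.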

Page 29

Publication Trends

As of 6/4/15

Page 30

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165)

homomorphic encryption

secure multiparty computation

iDASH “commons”

Sharing Data, Tools, Systems

differential privacy

indexing

Page 31

iDASH’15 Privacy Protection Challenge

• Task 1: Homomorphic encryption (HME) based secure genomic data analysis

• Task 2: Secure comparison between genomic data in a distributed setting

• Focus on secure outsourcing and secure data analysis in a distributed setting (humangenomeprivacy.org)
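
The idea behind secure outsourcing with homomorphic encryption is that a server computes on ciphertexts it cannot read. The toy below uses the Paillier cryptosystem, which supports addition only, to sum encrypted allele counts. It is an illustration under stated assumptions: the primes are far too small to be secure, the allele counts are made up, and Paillier is not necessarily the scheme used by challenge participants. It needs Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import random

# Toy parameters: real deployments use primes of 1024 bits or more.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p-1, q-1)
MU = pow(LAMBDA, -1, N)                                # valid because g = n + 1


def encrypt(m, rng=random):
    """Paillier encryption with generator g = n + 1."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover m from c using the private values LAMBDA and MU."""
    return ((pow(c, LAMBDA, N2) - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % N2


# An untrusted server sums encrypted allele counts without seeing them.
counts = [0, 1, 2, 1, 2]
total_cipher = encrypt(0)
for c in counts:
    total_cipher = add_encrypted(total_cipher, encrypt(c))
total = decrypt(total_cipher)
```

Only the holder of the private key can run `decrypt`, so the party doing the summation learns neither the individual counts nor the total.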

Page 32

Genome Privacy Challenge 2015

Winners for Homomorphic Encryption

• Stanford/MI
• IBM
• Microsoft

Page 33

Workshops and Symposia

• 12 Workshops (https://idash.ucsd.edu/events/workshops)
  » 4 Privacy
  » 2 NLP
  » 2 Imaging Informatics
  » 4 Others (High Performance Computing, Biomedical Data Sharing, IEEE HISB, Mobile Data)

• 9 Symposia (https://idash.ucsd.edu/news-and-events)
  » 4 All-Hands
  » 5 Internship

• Next internship symposium planned for August 8, 2015

Page 34

Partners on Patient Privacy Projects

Privacy Preserving Analytics for KD in African-Americans
Consent for Data and Biosample Sharing in Underserved Populations
Partnership for Epidemiological Research Study on Latinos
StrongHeart Study on American Indian Populations

Data and Biospecimen Sharing
Privacy Preserving Computation

Which DNA variants are implicated in KD susceptibility in this population?
Emory, Genome Institute of Singapore, Imperial College

Does consent rate depend on who is obtaining the consent?
Maricopa Health System, FQHS in Arizona

Do patients understand what they consented for?
San Diego State University

What type of ‘sharing’ is acceptable?
University of Oklahoma

Page 35

International Collaboration

Slide from Dr. Shuang Wang

Page 36

First Driving Biological Projects

• DBP 1: Molecular Phenotyping of Kawasaki Disease
  » Goal: Understand how molecular phenotype relates to clinical phenotype and how they may help predict susceptibility, response to treatment, and risk for cardiovascular sequelae

• DBP 2: Post-Marketing Surveillance of Hematologic Medications
  » Goal: Study adverse events associated with four different oral hematologic medications (prasugrel, clopidogrel, warfarin, and dabigatran)

• DBP 3: Individualized Intervention to Enhance Physical Activity
  » Goal: Create an intervention system to provide individualized feedback to increase physical activity and decrease sedentary behavior

Page 37

Distributed computing
Scalable National Network for Comparative Effectiveness Research

● Predictive modeling and adjustment for confounders require lots of data

● Some institutions cannot move data outside their firewalls; we can bring computation to the data

[Diagram labels: User requests data for Quality Improvement or Research; Trusted Broker(s): Identity & Trust Management, Policy enforcement; Security Entity; Diverse Healthcare Entities in 3 different states (federal, state, private)]

Wu Y et al. Grid Binary LOgistic REgression (GLORE): Building Shared Models Without Sharing Data. JAMIA, 2012
Wang S et al. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning. J Biomed Inf 2013
Jiang W et al. WebGLORE: A Webservice for Grid Logistic Regression. Bioinformatics 2014
Wu Y et al. Grid Multi-Category Response Logistic Models. BMC Med Inform Dec Making 2015
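
"Bringing computation to the data" can be sketched for logistic regression in the spirit of the GLORE reference: each site computes aggregate statistics behind its own firewall, and a coordinator combines them in a Newton-Raphson update. This is a simplified illustration, not the published code; the function names and the tiny two-site dataset are assumptions.

```python
import math


def local_statistics(X, y, beta):
    """Run inside one site: gradient and Hessian of the log-likelihood.

    Only these aggregates leave the site, never patient-level rows.
    """
    d = len(beta)
    grad = [0.0] * d
    hess = [[0.0] * d for _ in range(d)]
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        p = 1.0 / (1.0 + math.exp(-z))
        for i in range(d):
            grad[i] += (label - p) * row[i]
            for j in range(d):
                hess[i][j] += p * (1.0 - p) * row[i] * row[j]
    return grad, hess


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_distributed(sites, d, iterations=25):
    """Coordinator: sum the per-site statistics, then take a Newton step."""
    beta = [0.0] * d
    for _ in range(iterations):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for X, y in sites:
            g, h = local_statistics(X, y, beta)
            grad = [a + b for a, b in zip(grad, g)]
            hess = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(hess, h)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Columns: intercept, one covariate. Two sites with four patients each.
site_a = ([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.0]], [0, 1, 0, 1])
site_b = ([[1, 1.0], [1, 2.0], [1, 3.5], [1, 0.2]], [0, 0, 1, 1])
beta = fit_distributed([site_a, site_b], d=2)
```

Because the gradient and Hessian are sums over patients, adding up the per-site pieces gives the same update as fitting on pooled data, which is why the shared model matches the centralized one.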

Page 38

Horizontal and Vertical Partitions

Patient  Age  Insurance
A1       45   X
A2       32   Y

Patient  Age  Insurance
B1       45   Y
B2       32   Y

Patient  Age  Insurance
A1       45   X
A2       32   Y

Li Y, Jiang X, Wang S, Xiong L, Ohno-Machado L. VERTIcal Grid lOgistic regression (VERTIGO) submitted.
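
The difference between the two layouts can be shown on the slide's example rows. In this sketch, which sites hold which rows or columns is an assumption made for illustration; the slide's tables do not label them.

```python
table = [
    {"patient": "A1", "age": 45, "insurance": "X"},
    {"patient": "A2", "age": 32, "insurance": "Y"},
    {"patient": "B1", "age": 45, "insurance": "Y"},
    {"patient": "B2", "age": 32, "insurance": "Y"},
]

# Horizontal partition: each site holds all columns for its own patients.
site_a = [row for row in table if row["patient"].startswith("A")]
site_b = [row for row in table if row["patient"].startswith("B")]

# Vertical partition: each site holds some columns for the same patients,
# linked by a shared key (here the patient id).
ages = [{"patient": r["patient"], "age": r["age"]} for r in site_a]
insurance = [{"patient": r["patient"], "insurance": r["insurance"]} for r in site_a]

# Joining the vertical pieces on the key recovers the original rows.
by_id = {r["patient"]: dict(r) for r in ages}
for r in insurance:
    by_id[r["patient"]].update(r)
rejoined = list(by_id.values())
```

Horizontal partitions can be analyzed by summing per-site statistics over patients; vertical partitions are harder because no single site sees a complete row, which is the setting the VERTIGO reference addresses.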

Page 39

Clinical Data Network – UC-ReX

• Clinical Data Warehouses from 5 Medical Centers and affiliated institutions exchange (>13 million patients)
  » Translational research
  » Patient safety surveillance
  » Quality improvement

Funded by the UC Office of the President

UC-ReX was formed in 2010

Page 40

Phase 1

UC Davis 2.3M
UCSF 3.2M
UCLA 4.3M
UC Irvine 1.4M
UCSD 2.3M
VA National Enterprise Data Warehouse VINCI 8.7M
Altamed 200k
Children’s Clinic 24k
Queenscare 19k

[Diagram labels: UC ReX; SCANNER; USC; CTSA hub; Network]

21 Million people
Standardized data
Data governance
9 health systems

Page 41

Phase 2

Idaho State University Pocatello Family Medicine 16k
Critical Access Hospital Network 20k
Family Medicine Residency of Idaho 39k
UC Davis 2.3M
UCSF 3.2M
UCLA 4.3M
UC Irvine 300k
USC Keck 2M
LA Children’s 200k
LA DHS 600K
Bi-State Primary Care Association 122k
Cherokee Health Systems 89k
Denver Health Hospital Authority 150k
Colorado Community Managed Care Network 350k
VA VINCI 11M
UCSD 2.3M
Cedars-Sinai 2M
Intermountain Healthcare 2M
San Mateo Medical Center 77k
University of Colorado Health System 672k
Altamed 200k
Children’s Clinic 24k
Queenscare 19k

[Diagram labels: WWAMI Region Practice & Research Network; SAFTiNet; UC ReX; SCANNER; USC; University of Washington CTSA; CTSA hub; Network; University of Texas Houston (NLP)]

31 Million people
Same standards
Data governance
23 health systems

Page 42

Ohno-Machado L. To Share or Not To Share: That Is Not the Question. Science Translational Medicine, 2012 4(165)

homomorphic encryption

secure multiparty computation

iDASH “commons”

Sharing Data, Tools, Systems

differential privacy

indexing

Page 43

Clinical Research Informatics CTRI

Clinical Trial Management System, RedCAP
Data Concierge Service
Management of iDASH HIPAA cloud

Page 44

iDASH On-Demand Resources

Safe HIPAA-compliant Annotated Data deposit box Environment (powered by MIDAS)
On-demand Virtualized Elastic Resilient Compute And Storage Technology (powered by VMware)

[Diagram labels: HIPAA and non-public data; public data, tools, recipes; Data, Tools, Recipes; upload & download data; compute request, direct upload & download of proprietary data, tool, recipe; middleware and HIPAA security developed by iDASH; Compute nodes, Memory, Disk storage, Networking; AUTOMATED]

Page 45

iDASH HIPAA Private Cloud
3 computation tiers
3 storage tiers
10GbE throughout
Full redundancy
RSA Two Factor Auth.
Remote data replication

800+ cores
8TB+ RAM
700TB+ storage

35 iDASH CLOUD customers
196 instantiated VMs
992 expired/destroyed VMs
8.16 TB allocated memory
680 TB storage consumed

Page 46

Secure VM Templates
• Full disk encryption
• Built-in Firewall
• Secure shared memory
• No root SSH
• Protected su
• Harden sysctl networking
• Disabled Open DNS Recursion
• IP Spoofing protection
• Hardened PHP for webapps
• Apache application firewall - ModSecurity
• ModEvasive protection of webapps from DDOS attacks
• Automatic logs scanning and banning of suspicious hosts - DenyHosts and Fail2Ban
• Intrusion Detection - PSAD
• Periodic checking for RootKits - RKHunter and CHKRootKit
• Autoscan for open Ports - Nmap
• Analysis of system log files - LogWatch
• SELinux / Apparmor application boundary enforcement
• System security auditing with Tiger
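
Hardening items like "No root SSH" are usually verified by an automated check. The sketch below shows one way such a check might look for the OpenSSH `PermitRootLogin` directive. It is an assumption for illustration and not part of the iDASH templates: the function name is hypothetical, and it ignores `Match` blocks, `key=value` syntax, and included files.

```python
def root_ssh_disabled(sshd_config_text):
    """Return True if the first PermitRootLogin directive is set to "no".

    sshd uses the first value it finds for a keyword, and keywords are
    case-insensitive. A missing directive is reported as not disabled,
    which is a conservative choice made for this sketch.
    """
    for line in sshd_config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        if parts[0].lower() == "permitrootlogin":
            return len(parts) == 2 and parts[1].strip().lower() == "no"
    return False


sample = "# hardened template\nPort 22\nPermitRootLogin no\n"
hardened = root_ssh_disabled(sample)
```

In practice the text would be read from the VM's sshd configuration file, and the same pattern extends to the other settings in the list.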

Page 47

iDASH Tools

Research Topic: Tool Name

DBP tool (1)
  SenSed

Genomics (10)
  AbsCNVseq
  Whole Genome RVista
  DNA-Compact
  HUGO (Hierarchical mUlti-reference Genome cOmpression)
  IDEPI (Identify Epitopes)
  MAAMD: A Workflow to Standardize Meta-Analyses and Comparison of Affymetrix Microarray Data
  REPREVER: REPeat REsolVER - find and reconstruct extra copies given copy number gain regions
  VIRMID (VIRtual MicroDissection for SNP calling)
  Wessim
  WIDGET (Web Interface for Dynamic Genome-privacy EvaluaTion)

Genomics/DBP tool (2)
  MAGI
  Genetic Query Language (GQL)

NLP (2)
  NLP Virtual Machine
  PFINDER

Patient centered (1)
  Pain prediction

Privacy (9)
  Differentially Private Data Queries (DPDQ)
  OCEANS
  CUDA-miRanda
  Differentially Private Logistic Regression
  Count Perturbation Allowing User Preferences
  PPSVM
  Differentially Private Projected Histograms
  Spectral Swapping
  WITNESS

Privacy/DBP tool (1)
  WebGLORE

• 26 tools impacting over 3,000 researchers
• 2,048 unique views of website descriptions as of 6/4/15 (https://idash.ucsd.edu/idash-softwaretools)

Page 48

iDASH SHADE Repositories

• Based on Kitware MIDAS open-source technology
• File-level access control
• Separate PHI and Non-PHI repositories
• Two Factor Auth (PHI)

https://idash-data.ucsd.edu/

Page 49

Public Communities

Name | Date available | Views | Downloads | Size
AbsCNseq | Mar-14 | 72 | 81 | 3.6 MB
BREAST - RIDER Breast MRI | Jun-12 | 269 | 3,059 | 6.8 GB
BREAST-MRI | Jan-14 | 36 | 357 | 155 KB
Clinical Data Requests for Research | Mar-14 | 130 | 245 | 6.0 KB
Clinical Notes and Reports | Feb-14 | 25,839 | 259,768 | 765.8 MB
CT Colonography | Feb-14 | 1,485 | 790 | 12.6 GB
DMITRI1 | May-12 | 258 | 1,271 | 470.5 KB
iDASH webinars | May-15 | 161 | 180 | 7.5 MB
Informed Consent Templates | Mar-14 | 94 | 158 | 31.3 GB
Kawasaki Disease Biomarker | Jan-15 | 0 | 0 | 95 MB
KD-NLP | April-15 | 0 | 0 | 282.5 KB
Laboratory Data | Feb-14 | 0 | 2 | 587.7 MB
Lung Image Database Consortium (LIDC) | Aug-12 | 4,241 | 229,425 | 120.8 GB
Observational Cohort Event Analysis and Notification System (OCEANS) | Feb-14 | 37 | 89 | 813.1 KB
Pain Prediction Data | Mar-14 | 2,181 | 11,933 | 2.3 MB
Physical Activity Sensor Data | Jul-12 | 10,453 | 28,099 | 42.8 MB
Radiology Teaching Files | Feb-14 | 1,925 | 7,277 | 62.5 MB
RIDER Lung CT | Jul-12 | 342 | 20,647 | 10.3 GB
Trends in BMI publication | Oct-13 | 40 | 271 | 187 KB
Total | | 47,563 | 563,643 | 183.4 GB

• 19 open-access communities with 130 registered users
• 3,256 unique views of website descriptions as of 6/4/15 (https://idash.ucsd.edu/data-collections)

Page 50

Whole Exome Pipeline Workflow

Inputs: NORMAL DNA, TUMOR DNA

Processing steps (tool) and file types:
.fastq
Alignment (BWA)
.bam
Duplicate removal (Picard), Quality Recalibration (GATK)
.refined.bam
Indel Realignment (GATK)
.realigned.bam
Variant calling (VarScan)
.vcf
Variant Annotation (Oncotator, VariantTools)
.annotated.vcf
Copy Number calling (VarScan)
.copy_number

Databases: dbNSFP, ExAC, dbSNP, COSMIC

QC and Metadata: Variant counts, Overlap with databases, Substitution Profiles, …

Pipeline Infrastructure: OmicsPipe, Seq-Ware

Page 51

Repeatable Results

[Diagram labels, repeated across several panels in the original]
Workflow: Short reads; Index reference; Align to reference; Call variants; Annotate variants; Pick high impact; Deleterious SNPs
Context: Reference DB; Test data; Configuration; Helper tools; OS
Blueprint: Workflow and Context
Instance: Workflow and Context
iDASH On-Demand Resources: Bookshelf; MyDATA; Input; Results; Instance; External Data

Page 52

Collaborative Projects

Linked R01s
• Cardiac Atlas Project (R01HL121754)
  » Goal: Develop accurate new methods for analyzing cardiac shape, mechanics and blood flow in CHD patients

• CYCORE: Cyberinfrastructure for Cancer Comparative Effectiveness Research (R01CA177996)
  » Goal: Develop a system that improves the capture of patient-reported and objectively measured data from patients in cancer clinical trials

• Privacy-Preserved Sharing and Analysis of Human Genomic Data (R01HG007078)
  » Goal: Study and develop a suite of innovative and transformative techniques aimed at achieving practical and cost-effective genomic data protection

• SHARE: Statistical Health Information Release with Differential Privacy (R01GM114612)
  » Goal: Develop a toolkit for enabling privacy-preserving health information release to cover different data modalities and study needs

PCORI-funded methods grant to collaborator Li Xiong from Emory
NSF-funded infrastructure grant to collaborator Kevin Patrick
R21 on cloud privacy to Xiaoqian Jiang

Page 53

Journal Clubs

Page 54

Webinars

• iDASH External (https://idash.ucsd.edu/events/webinars)
  » 49 since February 2011
  » Most well attended
    • 4/15/2011 – Laura Rodriguez; “Sharing Genomic Data: NIH Data Sharing Policies Past, Present & Future” (100 attendees)
    • 2/18/2011 – Deven McGraw; “Protecting Privacy in Secondary Use: the Promise and Limits of Deidentification” (86 attendees)
  » Most SciVee views all time
    • 8/2/2013 – Elena Martinez and Ian Komenaka; “Informed Consent for Biospecimen Collection and Data Sharing among Low-income, Uninsured and Underinsured Women: Is it a Matter of Trust? (Special Webinar)” (1,618 views)
    • 6/17/2011 – Lucila Ohno-Machado; “Data Sharing in the 'Publish or Perish' Era: Barriers and Current Solutions” (1,360 views)
  » Most SciVee views per month
    • 1/16/2015 – Florian Kohlmayer and Fabian Prasser; “ARX - A Comprehensive Tool for Anonymizing Biomedical Data” (151 average monthly views)
    • 11/21/2014 – Nigam Shah; “Generating Practice-based Evidence from Electronic Health Records” (144 average monthly views)

• iDASH Internal
  » 37 since September 2011

Page 55

Open Source Software

Page 56

Bioinformatics Course, Maputo 2014

Page 57

The Near Future

• Ethics technology
  » Instrument policy makers with algorithms and tools to support ethics (including privacy)

• Serve HIPAA-storage and compute needs of a larger community
  » Data Discovery Index prototype environment
  » Private cloud for protected health information

• Hub infrastructure for large HIPAA-data networks
  » FISMA ATO
  » Distributed computing

Page 58

Acknowledgements: DBMI

At UCSD since 2009, funded by NIH U54, UL1, U24, UH3, R21, U01, T15, R00, K22, K99, D43, UCBRAID/OP, PCORI, NVIDIA

Page 59

Acknowledgements

iCONCUR: Elizabeth Bell, Dexter Friedman, Rita Germann-Kurtz, Briand Herrera, Paulina Paul, Joe Ramsdell, Richard Schwab, Amy Sitapati

Cleo Maehara, Jeff Grethe, Hua Xu, Cui Tao, Todd Johnson, Peter Rose, Ricky Taira

Vineet Bafna, Tyler Bath, Jane Burns, Sheila Castaneda, Michele Day, Robert El-Kareh, Claudiu Farcas, Olivier Harismendy, Chun-nan Hsu, Zhanglong Ji, Xiaoqian Jiang, Ian Komenaka, Jihoon Kim, Elisa Lee, Eric Levy, Kevin Patrick, Elena Martinez, Greg Talavera, Staal Vinterbo, Shuang Wang, Wei Wei

NIH: U54HL108460, R01LM011392, UL1TR000100, UH3HL108785, U24AI117966, R00LM011392, R21LM012060, K99HG008175, T15LM011271, D43TW007015, R01GM114612 (Xiong), R01HG007078 (Tang), R01HL121754 (McCulloch), R01CA177996 (Patrick)
PCORI: CDRN-1306-04819
AHRQ: R01HS019913
UCOP
NVIDIA

PhenDISCO: Son Doan, Hyeon-eui Kim, Ko-wei Lin

Zia Agha, Jason Doctor, Scott Duvall, Fern Fitzhenry, Pietro Galassetti, Michael Hogarth, Katherine Kim, Cleo Maehara, Michael Matheny, Daniella Meeker, Jonathan Nebeker, Fred Resnic, Dena Rifkin, Carl Stepnowski, Howard Taras, Mary Wooley

UC-ReX: Kent Anderson, Nick Anderson, Lattice Armstead, Doug Bell, Doug Berman, Lisa Dahm, Leslie Yuan

CTRI: Tony Chen, Daniel Clark, Jim Graczik, Gary Firestein, Carol Johnson, Mike Kalichman, Antonios Koures, Ashley Williams

David Brenner, Paul Viviano, Wolf Dillmann