The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Big Challenges for Statisticians Hongtu Zhu, Ph.D Department of Biostatistics and Biomedical Research Imaging Center The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA


Post on 03-Nov-2019


TRANSCRIPT

Page 1: Big Challenges for Statisticians · Big Challenges for Statisticians Hongtu Zhu, Ph.D Department of Biostatistics†and Biomedical Research Imaging Center‡ The University of North

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Big Challenges for Statisticians

Hongtu Zhu, Ph.D.

Department of Biostatistics† and Biomedical Research Imaging Center‡

The University of North Carolina at Chapel Hill,

Chapel Hill, NC 27599, USA

Page 2

Thank NSF and SAMSI!

Thank organizers!

Thank you!

Page 3

Science

Statistics

Page 4

Part 1. Technical Challenges

Page 5

Imaging Science

Imaging Science is a multidisciplinary field concerned with the generation, collection,

duplication, analysis, modification, and visualization of images.

As an evolving field, it includes research and researchers from

Physics, Mathematics, Statistics, Electrical Engineering, Computer

Vision, Computer Science and Perceptual Psychology.

From Wikipedia, the free encyclopedia

Page 6

Three key components

•Image acquisition: studies the physical mechanisms

and mathematical models and algorithms by which

imaging devices generate image observations.

•Image interpretation/application: is to see, monitor, and

interpret the targeted world/patterns being imaged.

•Image processing: is any linear or nonlinear operator

that operates on the images and produces targeted

patterns.
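The distinction between linear and nonlinear operators in the image processing bullet can be sketched on a toy image. This is only an illustration, not anything taken from the slides: the 5×5 image, the helper names (`neighbourhood`, `apply_filter`) and the choice of a mean filter (linear) versus a median filter (nonlinear) are all assumptions made for the example.

```python
from statistics import median

def neighbourhood(image, r, c, radius):
    """Collect pixel values in a square window, clipped at the image border."""
    rows, cols = len(image), len(image[0])
    return [image[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]

def apply_filter(image, reducer, radius=1):
    """Apply `reducer` to the window around every pixel."""
    return [[reducer(neighbourhood(image, r, c, radius))
             for c in range(len(image[0]))]
            for r in range(len(image))]

def mean_filter(image, radius=1):
    # Linear operator: output is a weighted sum of the input pixels.
    return apply_filter(image, lambda w: sum(w) / len(w), radius)

def median_filter(image, radius=1):
    # Nonlinear operator: the median is not a weighted sum of the inputs.
    return apply_filter(image, median, radius)

# Hypothetical toy image: flat background with one isolated spike.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0

smoothed = mean_filter(image)    # spreads the spike over its neighbours
despiked = median_filter(image)  # removes the isolated spike entirely
```

On this toy input the mean filter blurs the spike into its neighbourhood, while the median filter suppresses it, which is one reason both kinds of operator appear in preprocessing pipelines.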

Page 7

Structural MRI

Diffusion MRI

Functional MRI

(resting)

Functional MRI (task)

Level 1: Imaging Data

Overview

• Structural MRI

• Diffusion MRI

• Functional MRI

• Complementary techniques

- Variety of acquisitions

- Measurement basics

- Limitations & artefacts

- Analysis principles

- Acquisition tips

PET EEG/MEG Calcium CT

Page 8

Image Acquisition

Signal Models & Noise Sources

Image Preprocessing

Representation

Segmentation

Registration

Data Analysis & Interpretation

Statistical Modeling & Inference

Mathematics & Statistics

Image Processing

Computer Science/Engineering

Page 9

Individual Imaging Analysis

Imaging Construction Image Segmentation

Multimodal Analysis

DTI FLAIR

Marc

Page 10

Group Imaging Analysis

Longitudinal/Family Brain

Group Differences

Prediction

Imaging Genetics

NC/Diseased

Registration

Hibar, Dinggang, Martin

Page 11

 

$f \to T \to \hat{F} = T[f]$

FDA: Functional Data Analysis

Page 12

Voxel-wise Statistical Models

Multiple Comparisons

Smoothing

Prediction

Images

Registration

Estimation

FDA: Functional Data Analysis
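The multiple-comparisons step named on this slide can be illustrated with a small sketch. The slide does not say which correction is intended; Bonferroni and Benjamini-Hochberg are chosen here only as two familiar examples, and the ten p-values are made up, standing in for one p-value per voxel.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 at a voxel when p <= alpha / m (controls family-wise error)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis whose rank is at most that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected

# Hypothetical voxel-wise p-values (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_bonferroni = sum(bonferroni(p_values))
n_bh = sum(benjamini_hochberg(p_values))
```

With real images the number of voxels is in the hundreds of thousands and neighbouring voxels are correlated, so the choice of correction interacts with the smoothing and registration steps listed on the same slide.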

Page 13

ill-posed inverse problems

 

$F$

$f \to T \to \hat{F} = T[f]$

$d(F, \hat{F}) \to 0$?
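The question on this slide, whether d(F, F̂) goes to 0, can be probed by simulation. The sketch below is a toy stand-in and not an imaging inverse problem: F is assumed to be a sine curve, f is F plus Gaussian noise, T is a moving average whose window grows with the sample size, and d is the root-mean-square error. All of these choices, and every function name, are assumptions made for illustration.

```python
import math
import random

def truth(x):
    """Hypothetical target F on [0, 1)."""
    return math.sin(2 * math.pi * x)

def moving_average(values, half_width):
    """T: average each point with up to `half_width` neighbours on each side."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

def rmse(a, b):
    """d(F, F_hat): root-mean-square distance on the sampling grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def experiment(n, smooth, rng, noise_sd=0.5):
    xs = [i / n for i in range(n)]
    F = [truth(x) for x in xs]
    f = [v + rng.gauss(0.0, noise_sd) for v in F]   # observed data
    # With smoothing, the window grows like n**0.8, so its width shrinks
    # relative to the domain; half_width 0 means T is the identity.
    half_width = max(1, round(n ** 0.8 / 4)) if smooth else 0
    return rmse(F, moving_average(f, half_width))

rng = random.Random(0)
sizes = [100, 1000, 5000]
smoothed_errors = [experiment(n, True, rng) for n in sizes]
raw_errors = [experiment(n, False, rng) for n in sizes]
```

In this toy setting the regularised estimator's error shrinks as n grows, while the unregularised one (T equal to the identity) stays near the noise level, which is the sense in which the answer to the slide's question depends on how T is chosen.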

Page 14

Level 2: A Multiscale Physical System

The van Essen diagram

stimulus – activity – measurement chain

Robinson

Page 15

• Different models at different scales.

• Ladder of overlapping models.

• Must be testable against multiple

phenomena.

A Multi-modal Approach

Page 16

Level 3: Data Integration

Ritchie et al. (2015). Nature Reviews Genetics

Meta-dimensional analysis: An approach whereby all scales of data are combined simultaneously to produce complex models defined as multiple variables from multiple scales of data.

Multi-staged analysis: A stepwise or hierarchical analysis method that reduces the search space through different stages of analysis.

Systems genomics: An analysis approach that models the complex inter- and intra-individual variations of traits and diseases using data from next-generation omic data.

Data integration: The incorporation of multi-omic information in a meaningful way to provide a more comprehensive analysis of a biological point of interest.

In this Review, we describe the principles of meta-dimensional analysis and multi-staged analysis, and provide an overview of some of the approaches that are used to predict a given quantitative or categorical outcome, the tools available to implement these analyses, and the various strengths and weaknesses of these strategies. In addition, we describe the analytical challenges that emerge with data sets of this magnitude, and provide our perspective on how such systems genomic analyses might develop in the future.

Why integrate data?

Data integration can have numerous meanings; however, in this Review, we use it to mean the process by which different types of omic data are combined as predictor variables to allow more thorough and comprehensive modelling of complex traits or phenotypes (which are likely to be the result of an elaborate interplay among biological variation at various levels of regulation) through the identification of more informative models. Data integration methods are now emerging that aim to bridge the gap between our ability to generate vast amounts of data and our understanding of biology, thus reflecting the complexity within biological systems.

The primary motivation behind integrated data analysis is to identify key genomic factors, and importantly their interactions, that explain or predict disease risk or other biological outcomes. The success in understanding the genetic and genomic architecture of complex phenotypes has been modest, and this could be due to our limited exploration of the interactions among the genome, transcriptome, metabolome and so on. Data integration may provide improved power to identify the important genomic factors and their interactions (BOX 1). In addition, modelling the complexity of, and the interactions between, variation in DNA, gene expression, methylation, metabolites and proteins may improve our understanding of the mechanism or causal relationships of complex-trait architecture. There are two main approaches to data integration: multi-staged analysis, which involves integrating information using a stepwise or hierarchical analysis approach; and meta-dimensional analysis, which refers to the concept of integrating multiple different data types to build a multivariate model associated with a given outcome [16–18].

[Figure (Nature Reviews Genetics): flow of information Genome → Epigenome → Transcriptome → Proteome → Metabolome → Phenome.

Genome: SNP, CNV, LOH, genomic rearrangement, rare variant.

Epigenome: DNA methylation, histone modification, chromatin accessibility, TF binding, miRNA.

Transcriptome: gene expression, alternative splicing, long non-coding RNA, small RNA.

Proteome: protein expression, post-translational modification, cytokine array.

Metabolome: metabolite profiling in serum, plasma, urine, CSF, etc.

Phenome: cancer, metabolic syndrome, psychiatric disease.]

Figure 1 | Biological systems multi-omics from the genome, epigenome, transcriptome, proteome and metabolome to the phenome. Heterogeneous genomic data exist within and between levels, for example, single-nucleotide polymorphism (SNP), copy number variation (CNV), loss of heterozygosity (LOH) and genomic rearrangement, such as translocation, at the genome level; DNA methylation, histone modification, chromatin accessibility, transcription factor (TF) binding and micro RNA (miRNA) at the epigenome level; gene expression and alternative splicing at the transcriptome level; protein expression and post-translational modification at the proteome level; and metabolite profiling at the metabolome level. Arrows indicate the flow of genetic information from the genome level to the metabolome level and, ultimately, to the phenome level. The red crosses indicate inactivation of transcription or translation. CSF, cerebrospinal fluid; Me, methylation; TFBS, transcription factor-binding site.


Page 17

[Figure 1. A simplified flow chart for psychiatric disorders: from genes to symptoms.

Labels recovered from the figure: Genes; Molecules; Cells; Brain; Symptoms; Endophenotypes; RNA genes, protein-coding genes; RNA, proteins, metabolites; neuron development, organelle; structure, circuits, physiology; brain interactome; behavioral tests; diagnosis, self-report; feedback between stages; environmental, social and psychological factors.

Fields named in the figure: Genomics, Epigenomics, Expression, Transcriptomics, Proteomics, Metabolomics, Interactomics, Cell biology, Neuroscience, Imaging.]

Zhao and Castellanos (2016) Discovery science strategies in studies of the

pathophysiology of child and adolescent psychiatric disorders: promises and limitations

Page 18

http://en.wikipedia.org/wiki/DNA_sequence

Big Data Integration in Health Informatics

G

IE

D Selection

E: environmental factors

G: genetic/genomics

D: disease

I: imaging/device

Page 19

Part 2. Career Challenges

Page 20

Career Development

Start with simple projects

Learn from others

Try hard to get involved in some large studies

Think about how to do it better, in what sense?

More papers.

Develop new tools and packages.

Write more grants

Page 21

Training

SAMSI videos and slides for summer schools and

lectures.

Short courses at major conferences.

New Graduate Courses

Page 22

Collaborations

Good Mentors: Theory and Applications.

Good Collaborators: Radiology, Neuroscience,

Psychiatry, Psychology, Computer Science, …

Page 23

Big Public Data Sets:

• Alzheimer’s Disease Neuroimaging Initiative (ADNI)

• NIH MRI Study of Normal Brain Development

• National Database for Autism Research

• Human Connectome Project

• The Cancer Genome Atlas (TCGA)

• UK Biobank

https://en.wikipedia.org/wiki/List_of_neuroscience_databases

Data Sets

Page 24

UK Biobank Project

Page 25

The Human Connectome Project

The HCP aims to elucidate the neural pathways that underlie brain function and behavior.

Resting-state fMRI (rfMRI) and dMRI provide

information about brain connectivity.

Task-evoked fMRI reveals much about brain

function.

Structural MRI captures the shape of the highly

convoluted cerebral cortex.

Behavioral data relate brain circuits to individual

differences in cognition, perception, and

personality.

Magnetoencephalography (MEG) combined with

electroencephalography (EEG) yields information

about brain function on a millisecond time scale.

The Heavily Connected Brain

Peter Stern, “Connection, connection, connection…”, Science, Nov. 1 2013: Vol. 342 no. 6158 P.577

Page 26

http://www.nitrc.org/

NITRC = The Source for Neuroimaging Tools and Resources

Statistical Parametric Mapping (SPM)

FMRIB Software Library (FSL)

Analysis of Functional NeuroImages (AFNI)

3D Slicer

FreeSurfer

……

Software

Page 27

Human Brain Mapping (HBM)

ISMRM conference

SNF conference.

Information Processing in Medical Imaging (IPMI)

SIAM Conference on Imaging Science (IS)

Medical Image Computing and Computer Assisted Intervention (MICCAI)

International Symposium on Biomedical Imaging (ISBI)

IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Neural Information Processing Systems Foundation (NIPS)

Conferences

Page 28

NeuroImage

Medical Image Analysis

IEEE Transactions on Medical Imaging

Human Brain Mapping

IEEE Transactions on Signal Processing

IEEE Transactions on Image Processing

IEEE Signal Processing Magazine

SIAM Journal on Imaging Sciences

IEEE Transactions on Pattern Analysis and Machine Intelligence

Annals of Applied Statistics, Biometrics

Biostatistics

Journal of the American Statistical Association (Applications and Case Studies)

Publications

Page 29

Part 3. Software Challenges

Page 30

http://www.nitrc.org/

NITRC = The Source for Neuroimaging Tools and Resources

Software Development

We lack good, widely used statistical software

for neuroimaging data analysis

from our own community

Page 31

Software Development

• Share responsibilities and information

• Common input and output files compatible with major packages

• Build small Rcpp and Matlab packages

• Release them through your own websites, our neuroconduct website

and http://www.nitrc.org/

• Focus on a few key tools and expand from them

• Encourage other groups to download and use them.

Start a Neuroconduct project

Page 32

Software Development

1. Simulators for different imaging modalities

• Evaluate image processing tools

• Evaluate statistical methods (group analysis, reliability)

2. Standardize all image processing and analysis pipelines

• fMRI and resting fMRI

• EEG/MEG

• DTI

• CT

• Calcium

• PET

3. Develop new tools to do multi-modal analysis

4. Develop new tools to integrate imaging, genetic,

and clinical data
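Item 1 above, using simulators to evaluate statistical methods, could look roughly like the following. This is a deliberately small sketch under stated assumptions, not a proposal from the slides: each "image" is a hypothetical 1D vector of 200 voxels, the noise is independent Gaussian with a known standard deviation, the effect sits in the first 20 voxels, and the method under evaluation is an uncorrected voxel-wise two-sample z-test. All function and variable names are invented for the example.

```python
import random
from statistics import NormalDist, mean

def simulate_group(n_subjects, n_voxels, effect, signal_voxels, noise_sd, rng):
    """Simulate one group: `effect` is added only at the signal voxels."""
    return [[(effect if v in signal_voxels else 0.0) + rng.gauss(0.0, noise_sd)
             for v in range(n_voxels)]
            for _ in range(n_subjects)]

def voxelwise_z_test(group_a, group_b, noise_sd, alpha=0.05):
    """Two-sided z-test per voxel, assuming the noise SD is known."""
    n_a, n_b = len(group_a), len(group_b)
    se = noise_sd * (1 / n_a + 1 / n_b) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    detected = []
    for v in range(len(group_a[0])):
        diff = mean(s[v] for s in group_a) - mean(s[v] for s in group_b)
        detected.append(abs(diff / se) > z_crit)
    return detected

rng = random.Random(1)
n_voxels = 200
signal_voxels = set(range(20))          # ground truth known by construction

patients = simulate_group(20, n_voxels, 1.0, signal_voxels, 1.0, rng)
controls = simulate_group(20, n_voxels, 0.0, signal_voxels, 1.0, rng)
detected = voxelwise_z_test(patients, controls, noise_sd=1.0)

# Because the truth is known, sensitivity and false positives can be scored.
null_voxels = [v for v in range(n_voxels) if v not in signal_voxels]
true_positive_rate = sum(detected[v] for v in signal_voxels) / len(signal_voxels)
false_positive_rate = sum(detected[v] for v in null_voxels) / len(null_voxels)
```

A realistic simulator would replace the independent Gaussian noise with modality-specific signal and noise models; the point of the sketch is only that a known ground truth lets competing methods be scored on the same footing.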