High Dimensional Fused-Informatics

Uploaded by: joel-saltz

Posted on 05-Dec-2014

Category: Data & Analytics

DESCRIPTION

Tools, methods, algorithms, and software for integrative analysis of Pathology, Radiology, "omics" and outcome data.

TRANSCRIPT

1. High Dimensional Fused-Informatics. Joel Saltz MD, PhD; Chair, Biomedical Informatics, Stony Brook University; Associate Director for Informatics, Stony Brook Cancer Center

2. Integrative Biomedical Informatics Analysis: Reproducible anatomic/functional characterization at fine level (Pathology) and gross level (Radiology). Integration of anatomic/functional characterization, multiple types of omic information, and outcome. Predict treatment outcome; select and monitor treatments. Integrated analysis and presentation of observations and features. [Diagram: Radiology Imaging, Patient Outcome, Pathologic Features, Omic Data]

3. Pathology and Radiology imaging have different properties in their roles in discovery and in assessing aggressiveness potential. Differences arise from differing capabilities and need not completely correspond: sampling differences and global properties; differing purposes (discovery, staging, IMRT/brachyRx planning). Pathology: high spatial and increasing molecular resolution. Radiology: global view, temporal information, increasing spatial resolution. (Carl Jaffe)

4. Correlating Imaging Phenotypes with Genomic Signatures: Scientific Opportunities (Imaging Genomics Workshop, NCI, June 2013). Clinical approach and use: development of imaging + analysis methods to characterize heterogeneity within a tumor at one time point, its evolution over time, and among different tumor types. Development of imaging metrics that: can predict and detect emergence of resistance? correlate with genomic heterogeneity? correlate with habitat heterogeneity? can identify more homogeneous sub-types?

5. VASARI Feature Set

6. Pathology Analytical Imaging: provides rich information about morphological and functional characteristics. Image analysis and feature extraction on multiple scales; spatially mapped omics; multiple microscopy modalities. [Diagram: Glass Slides, Scanning, Whole Slide Images, Image Analysis]

7. Morphological Tissue Classification; Nuclei Segmentation; Cellular Features; Whole Slide Imaging. (Lee Cooper, Jun Kong)

8. Quantitative Feature Analysis in Pathology: Emory In Silico Center for Brain Tumor Research (PI = Dan Brat, PD = Joel Saltz). NLM/NCI: Integrative Analysis/Digital Pathology R01LM011119, R01LM009239 (Dual PIs Joel Saltz, David Foran)

9. Millions of Nuclei Defined by n Features. Top-down analysis: analyze features in the context of existing diagnostic constructs. Bottom-up analysis: let nuclear features define and drive the analysis.

10. Direct Study of Relationship Between vs (Lee Cooper, Carlos Moreno)

11. Clustering identifies three morphological groups. Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides). Groups are named for the functions of associated genes: Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB). Prognostically significant (log-rank p = 4.5e-4). [Figure: feature indices by cluster (CC, CM, PB); survival vs. days for CC, CM, PB]

12. Associations

13. Millions of Nuclei Defined by n Features. Top-down analysis: use the features with existing diagnostic constructs. Bottom-up analysis: let features define and drive the analysis.

14. Nuclear Analysis Workflow. Step 1: Nuclei Segmentation. Step 2: Feature Extraction: describe individual nuclei in terms of size, shape, and texture.

15. Step 3: Nuclei Classification. [Figure: Oligodendroglioma vs. Astrocytoma nuclear qualities, scale 1 to 10]

16. Survival Analysis [Figure panels: Human; Machine]

17. Gene Expression Correlates of High Oligo-Astro Ratio on Machine-based Classification. Oligo-related genes: Myelin Basic Protein, Proteolipoprotein, HoxD1. Nuclear features most associated with oligo signature genes: Circularity (high), Eccentricity (low).

18. Role of Microenvironment: Necrosis in TCGA GBM tissue samples vs. Verhaak transcriptional class. The Mesenchymal transcriptional class shows greater levels of necrosis than other classes. Gene expression signatures of non-mesenchymal GBMs became more similar to the mesenchymal signature with increasing levels of necrosis.

19. Microenvironment and Master Regulators: Extent of Necrosis Related Expression of Master Regulators of the Mesenchymal Transition. Necrosis and C/EBP-

20. Computation and Data Management: Requirements and Challenges. Explosion of derived data: 10^5 x 10^5 pixels per image; 1 million objects per image; hundreds to thousands of images per study. High computational complexity: image analysis, feature extraction, and machine learning pipelines; spatial queries involve heavy-duty geometric computations.

21. Projection 2025: 100K-1M pathology slides/hospital/year; 2 GB compressed per slide. 1-10 slides used for pathologist computer-aided diagnosis; 100-10K slides used in hospital quality control; groups of 100K+ slides used for clinical research studies, combined with molecular and outcome data.

22. HPC: Tools for Image Analysis, Feature Extraction, Machine Learning Pipelines

23. HPC Whole Slide Segmentation and Feature Extraction Pipeline. (Tony Pan, George Teodoro, Tahsin Kurc and Scott Klasky)

24. Titan Peak Speed: 30,000,000,000,000,000 floating point operations per second!

25. Large Scale Data Management. Data model capturing multi-faceted information, including markups, annotations, algorithm provenance, specimen, etc. Support for complex relationships and spatial queries: multi-level granularities, relationships between markups and annotations, spatial and nested relationships. Highly optimized spatial queries and analyses. Implemented in a variety of ways, including optimized CPU/GPU, Hadoop/HDFS, and IBM DB2.

26. Spatial Centric Pathology Imaging GIS. Point query: human-marked point inside a nucleus. Window query: return markups contained in a rectangle. Spatial join query: algorithm validation/comparison. Containment query: nuclear feature aggregation in tumor regions. (Fusheng Wang)

27. PAIS (Pathology Analytical Imaging Standards). PAIS Logical Model: 62 UML classes (markups, annotations, imageReferences, provenance). PAIS Data Representation: XML (compressed) or HDF5. PAIS Databases: loading, managing, querying, and sharing data; native XML DBMS or RDBMS + SDBMS. [Figure: PAIS UML class diagram with classes including Annotation, GeometricShape, Calculation, Observation, Specimen, ImageReference, Provenance, User, PAIS, Equipment, Group, AnatomicEntity, Subject, Field, Project, MicroscopyImageReference, DICOMImageReference, TMAImageReference, Markup, Inference, Region, WholeSlideImageReference, Patient, Surface, Collection, AnnotationReference] (Fusheng Wang)

28. High Performance Spatial Queries and Analytics: Hadoop-GIS. General framework to support high performance spatial queries and analytics for spatial big data on MapReduce and CPU-GPU hybrid platforms. Spatial data processing methods and pipelines with spatial-partition-level parallelism running on MapReduce. Multi-level indexing methods to accelerate spatial data processing. Declarative spatial queries and their translation into MapReduce operations. GPUs used to parallelize spatial operations, integrated into MapReduce. [VLDB12, GIS12, GIS13, VLDB13]

29. MICCAI 2014 Brain Tumor Classification and Segmentation Challenges: TCGA/TCIA Imaging Challenge and Digital Pathology Challenge. Phase 1: Training, June 20 - July 31. Phase 2: Leader Board, Aug 1 - Aug 29. Phase 3: Test, Sept 8 - Sept 12. For more information about these challenges and a related workshop on September 14, 2014 at MICCAI in Boston, see: cancerimagingarchive.net. MICCAI: Medical Image Computing and Computer Assisted Intervention (MICCAI2014.org). TCGA: The Cancer Genome Atlas (cancergenome.nih.gov). TCIA: The Cancer Imaging Archive (cancerimagingarchive.net).

30. Digital Pathology/Brain Tumor Image Segmentation (BRATS). Use data currently available through data archive resources of the National Institutes of Health (NIH), namely The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA). The Digital Pathology challenge will use digital slides from patients whose genomics data are available from TCGA. Similarly, the BRATS 2014 Challenge will use clinical MRI data, also from TCGA study subjects. Proposed outcome of the RSNA/ASCP workshop: a coordinated Pathology/Radiology 2015 challenge on feature selection and statistical/machine learning algorithms that leverage Radiology, Pathology, and omic features to predict outcome and response to treatment.

31. Thanks!
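Slide 11 reports that clustering nuclear features separates the TCGA GBMs into three morphological groups. The transcript does not say which clustering algorithm was used, so the sketch below uses plain k-means (Lloyd's algorithm) with k = 3 as a stand-in, not as the study's method. The toy per-patient feature vectors and the deterministic farthest-point seeding are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_point_init(points: List[Vector], k: int) -> List[Vector]:
    """Deterministic seeding (an assumption): start from the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(_sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: alternate nearest-center assignment and center update
    until the assignments stop changing or max_iter is reached."""
    centers = farthest_point_init(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties out
                centers[j] = _mean(members)
    return labels, centers


if __name__ == "__main__":
    # Hypothetical per-patient summaries of nuclear features: three separated groups.
    patients = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
                (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
                (10.0, 0.1), (10.1, 0.0), (9.9, 0.2)]
    labels, _ = kmeans(patients, k=3)
    print(labels)
```

With real data the vectors would typically be standardized first, since k-means is sensitive to feature scale, and the number of clusters would need its own justification (for example a stability or consensus analysis).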
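Slides 11 and 16 compare survival between groups, and slide 11 cites a log-rank p-value. Slide 11 compares three clusters, which calls for the k-group form of the test; the sketch below shows the simpler two-group log-rank test, written from its textbook definition in pure Python so the mechanics are visible. The function name and toy data are hypothetical, and a maintained survival-analysis library would be the safer choice for real work.

```python
import math
from typing import List, Sequence, Tuple


def logrank_two_group(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-sample log-rank test.

    times_*  : follow-up time for each subject
    events_* : 1 if the event (death) was observed, 0 if censored
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    data: List[Tuple[float, int, int]] = (
        [(t, e, 0) for t, e in zip(times_a, events_a)]
        + [(t, e, 1) for t, e in zip(times_b, events_b)]
    )
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0  # sum over event times of (O_A - E_A)
    variance = 0.0                 # hypergeometric variance of O_A
    for t in event_times:
        # Subjects still at risk just before time t (follow-up time >= t), per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = n_a + n_b
        # Events observed exactly at time t: total, and in group A.
        d = sum(1 for ti, ei, _ in data if ti == t and ei)
        d_a = sum(1 for ti, ei, g in data if ti == t and ei and g == 0)

        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Toy data: group A dies at t = 1, 2; group B dies at t = 3, 4.
    chi2, p = logrank_two_group([1, 2], [1, 1], [3, 4], [1, 1])
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")  # chi2 = 49/17, about 2.882 (worked by hand)
```

Censored subjects (event flag 0) still count in the risk sets up to their last follow-up time, which is what distinguishes this test from simply comparing death counts.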
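Slide 14 describes nuclei by size, shape and texture, and slide 17 singles out high circularity and low eccentricity. The transcript does not define these features; a common convention, assumed here, is circularity = 4*pi*A / P^2 (1 for a perfect circle) and eccentricity of the ellipse with the same second moments as the region (0 for a circle, approaching 1 for elongated shapes). The sketch computes both from a nucleus boundary polygon, such as a segmentation step might output; the function name and the input format are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def shape_features(polygon: List[Point]) -> Dict[str, float]:
    """Area, perimeter, circularity and eccentricity of a simple polygon.

    polygon: boundary vertices in order (either orientation), without
    repeating the first vertex at the end.
    """
    pts = list(polygon)
    # Signed area via the shoelace formula; reorder to counter-clockwise if needed
    # so the moment formulas below produce positive values.
    signed = 0.5 * sum(x0 * y1 - x1 * y0
                       for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))
    if signed < 0:
        pts.reverse()
    edges = list(zip(pts, pts[1:] + pts[:1]))

    area = perimeter = 0.0
    sx = sy = 0.0           # first moments: integral of x dA and of y dA
    sxx = syy = sxy = 0.0   # second moments about the origin: x^2, y^2, xy
    for (x0, y0), (x1, y1) in edges:
        c = x0 * y1 - x1 * y0
        area += c / 2.0
        sx += (x0 + x1) * c / 6.0
        sy += (y0 + y1) * c / 6.0
        sxx += (x0 * x0 + x0 * x1 + x1 * x1) * c / 12.0
        syy += (y0 * y0 + y0 * y1 + y1 * y1) * c / 12.0
        sxy += (x0 * y1 + 2 * x0 * y0 + 2 * x1 * y1 + x1 * y0) * c / 24.0
        perimeter += math.hypot(x1 - x0, y1 - y0)

    cx, cy = sx / area, sy / area
    # Central second moments divided by area: the covariance of the region.
    mu20 = sxx / area - cx * cx
    mu02 = syy / area - cy * cy
    mu11 = sxy / area - cx * cy

    # Eigenvalues of the 2x2 covariance matrix [[mu20, mu11], [mu11, mu02]].
    half_trace = (mu20 + mu02) / 2.0
    root = math.sqrt(((mu20 - mu02) / 2.0) ** 2 + mu11 ** 2)
    lam_max, lam_min = half_trace + root, half_trace - root

    return {
        "area": area,
        "perimeter": perimeter,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        "eccentricity": math.sqrt(max(0.0, 1.0 - lam_min / lam_max)),
    }


if __name__ == "__main__":
    print(shape_features([(0, 0), (4, 0), (4, 2), (0, 2)]))  # a 4 x 2 rectangle
```

A nucleus with high circularity and low eccentricity, the pattern slide 17 associates with the oligo signature genes, is close to round; a 2:1 rectangle, by contrast, has eccentricity sqrt(3)/2 under this definition.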
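Slides 20 and 21 give the scale figures: images of 10^5 x 10^5 pixels with about 1 million objects each, and a 2025 projection of 100K-1M slides per hospital per year at 2 GB compressed per slide. A back-of-the-envelope calculation makes these volumes concrete. The 3 bytes per pixel (8-bit RGB, uncompressed), the decimal units, and the 1,000-image study size are assumptions, not figures from the slides.

```python
# Back-of-the-envelope data volumes for the figures on slides 20 and 21.
# Assumed (not on the slides): 8-bit RGB, i.e. 3 bytes per uncompressed pixel,
# and decimal units (1 GB = 1e9 bytes, 1 TB = 1e12, 1 PB = 1e15).

GB, TB, PB = 10**9, 10**12, 10**15


def raw_image_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of one whole-slide image."""
    return width_px * height_px * bytes_per_pixel


def yearly_archive_bytes(slides_per_year: int, bytes_per_slide: int) -> int:
    """Storage added per year by a hospital's compressed slides."""
    return slides_per_year * bytes_per_slide


if __name__ == "__main__":
    raw = raw_image_bytes(10**5, 10**5)
    print(f"uncompressed slide: {raw / GB:.0f} GB")
    for slides in (100_000, 1_000_000):
        total = yearly_archive_bytes(slides, 2 * GB)
        print(f"{slides:>9,} slides/year x 2 GB = {total / TB:,.0f} TB")
    # Derived objects: 1 million per image, in a hypothetical 1,000-image study.
    print(f"objects in a 1,000-image study: {10**6 * 1_000:,}")
```

Under these assumptions one uncompressed slide is about 30 GB, and the projected archive grows by 200-2,000 TB (0.2-2 PB) per hospital per year before any derived features are stored.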
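Slide 26 lists the query types behind the pathology imaging GIS. A minimal in-memory sketch of three of them follows: a point query (which nucleus contains a human-marked point, via ray casting), a window query (which markups lie entirely inside a rectangle), and a containment query (average a nuclear feature over nuclei whose centroids fall in a tumor region). The data layout and function names are hypothetical; the talk describes running such queries on optimized CPU/GPU code, Hadoop/HDFS or IBM DB2 with spatial indexing, not with linear scans like these.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray casting: count edge crossings of a horizontal ray to the right of p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the ray's y coordinate
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def point_query(p: Point, nuclei: Dict[str, Polygon]) -> Optional[str]:
    """Point query: id of the first nucleus containing the marked point, if any."""
    for nid, poly in nuclei.items():
        if point_in_polygon(p, poly):
            return nid
    return None


def window_query(rect: Rect, markups: Dict[str, Polygon]) -> List[str]:
    """Window query: markups entirely inside the rectangle. Because a rectangle
    is convex, a polygon lies inside it exactly when all its vertices do."""
    xmin, ymin, xmax, ymax = rect
    return [mid for mid, poly in markups.items()
            if all(xmin <= x <= xmax and ymin <= y <= ymax for x, y in poly)]


def centroid(poly: Polygon) -> Point:
    """Area centroid of a simple polygon (shoelace-based, either orientation)."""
    a = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        c = x0 * y1 - x1 * y0
        a += c
        cx += (x0 + x1) * c
        cy += (y0 + y1) * c
    return cx / (3.0 * a), cy / (3.0 * a)


def containment_mean(region: Polygon, nuclei: Dict[str, Polygon],
                     feature: Dict[str, float]) -> Optional[float]:
    """Containment query: mean of a per-nucleus feature over nuclei whose
    centroid lies inside the region (None if no nucleus qualifies)."""
    vals = [feature[nid] for nid, poly in nuclei.items()
            if point_in_polygon(centroid(poly), region)]
    return sum(vals) / len(vals) if vals else None
```

Deciding containment by centroid is one simple convention; a stricter query could require every vertex of the nucleus to lie inside the region.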
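Slide 27 says PAIS represents markups, annotations, image references and provenance as (compressed) XML or HDF5. The real PAIS logical model has 62 UML classes; the sketch below invents a tiny, hypothetical subset (a markup polygon tied to an image reference and to the algorithm run that produced it) only to show how provenance can travel with each markup. Every class, element and attribute name here is an assumption, not the PAIS schema.

```python
import gzip
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Provenance:
    """Hypothetical: which algorithm run (name, version, parameters) made a markup."""
    algorithm: str
    version: str
    parameters: Dict[str, object] = field(default_factory=dict)


@dataclass
class Markup:
    """Hypothetical: a polygon drawn on a referenced whole-slide image."""
    markup_id: str
    image_ref: str
    points: List[Tuple[float, float]]
    provenance: Provenance


def markups_to_xml(markups: List[Markup]) -> str:
    """Serialize markups, each with its provenance, to an XML string."""
    root = ET.Element("MarkupCollection")
    for m in markups:
        el = ET.SubElement(root, "Markup", id=m.markup_id, imageRef=m.image_ref)
        ET.SubElement(el, "Polygon",
                      points=" ".join(f"{x},{y}" for x, y in m.points))
        prov = ET.SubElement(el, "Provenance",
                             algorithm=m.provenance.algorithm,
                             version=m.provenance.version)
        for name, value in m.provenance.parameters.items():
            ET.SubElement(prov, "Parameter", name=name, value=str(value))
    return ET.tostring(root, encoding="unicode")


def compressed(xml_text: str) -> bytes:
    """gzip-compress the XML text, mirroring the 'XML (compressed)' option."""
    return gzip.compress(xml_text.encode("utf-8"))


if __name__ == "__main__":
    m = Markup("m1", "slide-001", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
               Provenance("example-segmenter", "1.0", {"threshold": 0.5}))
    print(markups_to_xml([m]))
```

Keeping provenance next to each markup is what makes the spatial-join comparisons on slide 26 meaningful: two result sets for the same slide can be told apart by the algorithm and parameters that produced them.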
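Slide 28 describes spatial-partition-level parallelism on MapReduce, and slide 26 uses spatial joins to compare algorithm outputs. The sketch below imitates that pattern in plain Python: a map step assigns each bounding box to every grid tile it overlaps, a dictionary stands in for the shuffle, and a reduce step joins boxes within each tile. To avoid reporting a pair twice when boxes span several tiles, it uses the reference-point rule: a pair is reported only in the tile containing the lower-left corner of the two boxes' intersection. The uniform grid and all function names are assumptions; Hadoop-GIS's own partitioning, multi-level indexing and GPU refinement are not reproduced, and a real join would refine candidate pairs with exact polygon geometry.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Box = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)
TileKey = Tuple[int, int]
Record = Tuple[str, str, Box]             # (dataset tag, box id, box)


def intersects(a: Box, b: Box) -> bool:
    """Closed bounding boxes overlap (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def tile_of(x: float, y: float, tile: float) -> TileKey:
    """Grid cell containing point (x, y) for square tiles of side `tile`."""
    return math.floor(x / tile), math.floor(y / tile)


def map_phase(tag: str, boxes: Dict[str, Box], tile: float) -> Iterator[Tuple[TileKey, Record]]:
    """Map: emit (tile_key, record) for every tile a box overlaps."""
    for bid, box in boxes.items():
        tx0, ty0 = tile_of(box[0], box[1], tile)
        tx1, ty1 = tile_of(box[2], box[3], tile)
        for tx in range(tx0, tx1 + 1):
            for ty in range(ty0, ty1 + 1):
                yield (tx, ty), (tag, bid, box)


def reduce_phase(key: TileKey, records: List[Record], tile: float) -> Iterator[Tuple[str, str]]:
    """Reduce: join dataset A with dataset B inside one tile."""
    left = [(bid, b) for tag, bid, b in records if tag == "A"]
    right = [(bid, b) for tag, bid, b in records if tag == "B"]
    for ida, a in left:
        for idb, b in right:
            if intersects(a, b):
                # Reference point: lower-left corner of the intersection rectangle.
                ref_x, ref_y = max(a[0], b[0]), max(a[1], b[1])
                if tile_of(ref_x, ref_y, tile) == key:
                    yield ida, idb


def partitioned_join(boxes_a: Dict[str, Box], boxes_b: Dict[str, Box],
                     tile: float) -> List[Tuple[str, str]]:
    """Run map, an in-memory shuffle, and reduce; each intersecting pair appears once."""
    shuffled: Dict[TileKey, List[Record]] = defaultdict(list)
    for tag, boxes in (("A", boxes_a), ("B", boxes_b)):
        for key, rec in map_phase(tag, boxes, tile):
            shuffled[key].append(rec)
    pairs: List[Tuple[str, str]] = []
    for key, records in shuffled.items():
        pairs.extend(reduce_phase(key, records, tile))
    return pairs
```

Because the intersection's lower-left corner lies inside both boxes, both are always mapped to that corner's tile, so every overlapping pair is found exactly once without a separate de-duplication pass.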