Visualization and analysis of large data collections: a case study applied to confocal microscopy data
Wim de Leeuw, Swammerdam Institute for Life Sciences, Amsterdam
Pernette Verschure, Swammerdam Institute for Life Sciences, Amsterdam
Robert van Liere, Center for Mathematics and Computer Science, Amsterdam
2
Motivation (1):
Context: cell biology experiments
Phenomenon captured using digital microscopy
Experiment characteristics:
• Biological diversity
• Not all biological parameters can be controlled
Many measurements needed
3
Motivation (2):
Visualization and analysis of collections of data sets
• High variability
• Non-trivial information extraction (e.g., segmentation)
• Noise
Visualization Modes: Interactive vs Batch
• Interactive: control and direct feedback; batch: static parameter settings
• Interactive: time-consuming; batch: many data sets processed at once
Aim: combine the advantages of interactive and batch visualization
4
Agenda
Biological Problem
• Chromatin structure and gene control
Visualization Problem
• Data collection description
• Analysis with visual summaries
5
Chromatin Structure and Gene Control
Chromatin Structure
• Low level: DNA, nucleosomes, 30 nm fiber
• High level: fiber folding
Gene control
• Regulation of gene activity
Biological research question:
• Relation between chromatin structure and gene control
• Is there a relation? What is it? When does it occur? etc.
6
Experiment
Question: influence of heterochromatin protein 1 (HP1) on chromatin structure?
Approach:
• Prepare a collection of cells with a specific region
• Control group: target GFP to the region
• HP1 group: target GFP/HP1 to the region
• Observe the regions with a confocal microscope
Data analysis question:
• Identify and quantify the differences between the control and HP1 groups
7
Collection of data sets
60 data sets (30 control group, 30 HP1 group)
Each data set: 512 × 512 × 32 voxels
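A quick size estimate shows why processing the collection by hand is laborious. The slides give the dimensions but not the voxel type, so the 1- and 2-byte voxel depths below are assumptions, and `collection_size_bytes` is a hypothetical helper.

```python
# Dimensions and collection size from the slides; bytes per voxel is assumed.
WIDTH, HEIGHT, DEPTH = 512, 512, 32
N_DATASETS = 60

def collection_size_bytes(bytes_per_voxel: int) -> tuple[int, int]:
    """Return (bytes per data set, bytes for the whole collection)."""
    per_set = WIDTH * HEIGHT * DEPTH * bytes_per_voxel
    return per_set, per_set * N_DATASETS

for bpv in (1, 2):  # 8-bit and 16-bit voxels (assumed, not stated in the slides)
    per_set, total = collection_size_bytes(bpv)
    print(f"{bpv} B/voxel: {per_set / 2**20:.0f} MiB per set, {total / 2**20:.0f} MiB total")
```

Each stack holds 8,388,608 voxels, so the 60 sets come to roughly 480 MiB at 8 bits per voxel, or 960 MiB at 16 bits.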
Sample images:
• Control group (left)
• HP1 group (right)
Data analysis questions:
• Accurately detect region of interest
• Quantify region attributes (size, roughness, roundness, etc.)
• How do these attributes differ between the control and HP1 groups?
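The slides do not say how the region of interest was segmented. As one possible sketch, assuming a single global intensity threshold and taking the largest 6-connected component as the region, the code below returns a mask from which the size attribute follows directly. The function name `largest_region` and the thresholding approach are illustrative assumptions, not the method used in the study.

```python
from collections import deque

import numpy as np

def largest_region(volume: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of the largest 6-connected component above `threshold`.

    Assumptions: axes are (z, y, x), e.g. shape (32, 512, 512), and a single
    global threshold separates the region from the background.
    """
    fg = volume > threshold
    labels = np.zeros(volume.shape, dtype=np.int32)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    best_label, best_size, next_label = 0, 0, 0
    for seed in zip(*np.nonzero(fg)):
        if labels[seed]:
            continue  # voxel already belongs to a labelled component
        next_label += 1
        labels[seed] = next_label
        queue, size = deque([seed]), 0
        while queue:  # breadth-first flood fill of one component
            z, y, x = queue.popleft()
            size += 1
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, volume.shape))
                if inside and fg[n] and not labels[n]:
                    labels[n] = next_label
                    queue.append(n)
        if size > best_size:
            best_label, best_size = next_label, size
    if best_label == 0:
        return np.zeros(volume.shape, dtype=bool)  # nothing above threshold
    return labels == best_label
```

The region size in voxels is then `int(mask.sum())`. Roughness or roundness would need definitions the slides do not give.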
8
Diversity of the collection
9
Interactive Visualization of Collection
Advantages
• Control over visualization tools and parameters
• Segmentation
• Attribute computations
• Direct feedback
Disadvantages
• Laborious
• Error prone
10
Batch processing of collection
Advantages
• All sets are processed automatically
• A priori parameter settings
Disadvantage
• No feedback on the process
11
Visual Summaries
Definition: a user-defined, compact visual representation of the data during (batch) processing
Governing idea: the visual summary is used to visualize the steps in the batch process
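A visual summary is user-defined, so its form is open. One plausible choice, assumed here purely for illustration, is a small thumbnail made from a maximum-intensity projection along z followed by block averaging. `mip_thumbnail` and its `factor` parameter are hypothetical names.

```python
import numpy as np

def mip_thumbnail(volume: np.ndarray, factor: int = 8) -> np.ndarray:
    """Maximum-intensity projection along z, shrunk by block averaging.

    Assumes axes (z, y, x). Rows/columns that do not fill a whole
    factor x factor block are cropped before averaging.
    """
    mip = volume.max(axis=0)  # collapse the z stack to one 2D image
    h, w = mip.shape
    mip = mip[: h - h % factor, : w - w % factor]
    h, w = mip.shape
    return mip.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
```

For a 512 × 512 × 32 stack and `factor=8`, this gives a 64 × 64 image: small enough to show many data sets side by side, while still reflecting the region's shape.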
Examples:
General strategy:
• Interactive setup (determine parameters, attributes, etc)
• Batch processing using setup
• Information visualization with visual summaries
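The three-step strategy above can be sketched as a generic batch loop. It applies steps fixed during the interactive setup to every data set, keeps one visual summary per step so later views can link back, and records the final attributes. `Setup`, `BatchRecord`, and `run_batch` are hypothetical names; this is not the Argos API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Setup:
    """Parameters chosen interactively before the batch run (hypothetical fields)."""
    threshold: float
    steps: List[Callable[[Any, "Setup"], Any]]

@dataclass
class BatchRecord:
    dataset_id: str
    summaries: List[Any] = field(default_factory=list)  # one per step, for linking back
    attributes: Dict[str, float] = field(default_factory=dict)

def run_batch(datasets: Dict[str, Any], setup: Setup,
              summarize: Callable[[Any], Any],
              measure: Callable[[Any], Dict[str, float]]) -> List[BatchRecord]:
    """Apply every setup step to every data set without user interaction."""
    records = []
    for ds_id, data in datasets.items():
        rec = BatchRecord(ds_id)
        result = data
        for step in setup.steps:
            result = step(result, setup)
            rec.summaries.append(summarize(result))  # visual summary of this step
        rec.attributes = measure(result)  # attributes feed the information visualization
        records.append(rec)
    return records
```

The records are the batch output. A scatter plot is drawn from `attributes`, and `summaries` gives the per-step view of any selected data set.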
12
User Interface using Visual Summaries
13
Discriminating groups
Red: HP1 sets, Green: control
Region granularity vs number of spots in the region
Granularity attribute
• Average intensity gradient of the region
Plot tells us:
• Large variation, some outliers
• HP1 and control seem different
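The granularity attribute can be read as the mean gradient magnitude of the intensity over the region's voxels. The sketch below assumes that reading, a boolean region mask, and optional per-axis voxel spacing for anisotropic confocal data. `granularity` is a hypothetical helper name.

```python
import numpy as np

def granularity(volume: np.ndarray, mask: np.ndarray,
                spacing: tuple = (1.0, 1.0, 1.0)) -> float:
    """Mean intensity-gradient magnitude over voxels where `mask` is True.

    Assumes axes (z, y, x); `spacing` is the voxel size along each axis.
    An empty mask yields NaN.
    """
    gz, gy, gx = np.gradient(volume.astype(float), *spacing)  # central differences
    magnitude = np.sqrt(gz**2 + gy**2 + gx**2)
    return float(magnitude[mask].mean())
```

A rough, spotty region has many sharp intensity changes and scores high. A smooth, diffuse region scores low.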
14
Large variation, some outliers
Brush/link outliers
• Investigate their visual summaries
Problems found with the data sets
• Corrupt data
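Brushing and linking can be pictured as two lookups. A rectangular brush in the granularity vs number-of-spots plot selects data set IDs, and those IDs retrieve the matching visual summaries. The function names and the plain-dict data model are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (number of spots, granularity) for one data set

def brush(points: Dict[str, Point], x_range: Tuple[float, float],
          y_range: Tuple[float, float]) -> List[str]:
    """IDs of data sets whose scatter-plot point lies inside the brushed rectangle."""
    (x0, x1), (y0, y1) = sorted(x_range), sorted(y_range)
    return sorted(ds for ds, (x, y) in points.items()
                  if x0 <= x <= x1 and y0 <= y <= y1)

def link(selected: List[str], summaries: Dict[str, object]) -> Dict[str, object]:
    """Fetch the visual summary of every brushed data set for inspection."""
    return {ds: summaries[ds] for ds in selected}
```

Brushing an outlying point this way leads straight to its summary. Inspecting that summary is how a problem such as corrupt data can be traced.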
15
HP1 and control seem different
Further analysis
• Histograms
• Box plots
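The numbers behind a box plot can be computed per group as follows. This sketch uses the standard library's `statistics.quantiles` with the inclusive method and the common 1.5 × IQR whisker rule; the slides do not state which quartile or whisker convention was used.

```python
import statistics
from typing import Dict, Sequence

def box_stats(values: Sequence[float]) -> Dict[str, object]:
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }
```

Running `box_stats` separately on the control and HP1 attribute values gives the two boxes to compare side by side.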
Statistical tests
• Wilcoxon
The Wilcoxon test shows that there is indeed a significant difference
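The control and HP1 groups are independent samples, so the relevant Wilcoxon variant is presumably the rank-sum test. The sketch below assumes that variant and uses the large-sample normal approximation with a tie correction and no continuity correction. It is a plain-Python illustration, not the exact procedure from the study. If every value is tied, the variance is zero and the test is undefined.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (z, p)."""
    pooled = list(a) + list(b)
    n1, n2 = len(a), len(b)
    n = n1 + n2
    w = sum(average_ranks(pooled)[:n1])  # rank sum of the first group
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p
```

With 30 data sets per group, the normal approximation is generally considered adequate. A small p-value indicates that one group's attribute values tend to be systematically larger than the other's.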
16
Lessons learned
A significant difference in granularity vs number of spots shows that HP1 affects chromatin structure: chromatin is condensed into a number of compact regions.
Biologically significant result; two papers published
Strategy for analysis of collections of confocal data sets
• Interactive visualization and batch processing are both needed
• Information visualization is used for the analysis of batch output
• Visual summaries are used to link back to the original data sets or to previous steps in the batch process
Strategy has been implemented as the Argos system
17
Generality
Argos has been used for the analysis of an experiment consisting of 2500+ confocal data sets
Argos has been used for the analysis of microarray data