Post on 27-Mar-2015

Visualization and analysis of large data collections: a case study applied to confocal microscopy data

Wim de Leeuw, Swammerdam Institute for Life Sciences, Amsterdam

Pernette Verschure, Swammerdam Institute for Life Sciences, Amsterdam

Robert van Liere, Center for Mathematics and Computer Science, Amsterdam

2

Motivation (1):

Context: cell biology experiments

Phenomenon captured using digital microscopy

Experiment characteristics:

• Biological diversity

• Not all biological parameters can be controlled

Many measurements needed

3

Motivation (2):

Visualization and analysis of collections of data sets

• High variability

• Non-trivial information extraction (e.g. segmentation)

• Noise

Visualization Modes: Interactive vs Batch

• Interactive control+feedback vs static settings of parameters

• Time consuming vs multiple data sets processed simultaneously

Aim: combine advantages of Interactive and Batch Visualization

4

Agenda

Biological Problem

• Chromatin structure and gene control

Visualization Problem

• Data collection description

• Analysis with visual summaries

5

Chromatin Structure and Gene Control

Chromatin Structure

• Low level: DNA, nucleosomes, 30 nm fiber

• High level: fiber folding

Gene control

• Regulation of gene activity

Biological research question:

• Relation between chromatin structure and gene control

• Is there a relation, what is it, when does it apply, etc.

6

Experiment

Question: what is the influence of heterochromatin protein 1 (HP1) on chromatin structure?

Approach:

• Prepare collection of cells with a specific region

• Control group: target GFP to the region

• HP1 group: target GFP/HP1 to the region

• Observe regions with confocal microscope

Data analysis question:

• Identify and quantify the differences between control and HP1 group

7

Collection of data sets

60 data sets (30 control group, 30 HP1 group)

Each data set: 512 x 512 x 32

Sample images:

• Control group (left)

• HP1 group (right)

Data analysis questions:

• Accurately detect region of interest

• Quantify region attributes (size, roughness, roundness, etc.)

• What are the attribute differences between the control and HP1 groups?

8

Diversity of the collection

9

Interactive Visualization of Collection

Advantages

• Control over visualization tools and parameters

• Segmentation

• Attribute computations

• Direct feedback

Disadvantages

• Laborious

• Error prone

10

Batch processing of collection

Advantages

• All sets are processed automatically

• A-priori parameter settings

Disadvantage

• No feedback on the process

11

Visual Summaries

Definition: a user-defined, compact visual representation of the data during (batch) processing

Governing idea: the visual summary is used to visualize the steps in the batch process

Examples:

General strategy:

• Interactive setup (determine parameters, attributes, etc.)

• Batch processing using setup

• Information visualization with visual summaries
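The three-step strategy above can be sketched roughly as follows. This is an illustrative sketch only, not the Argos implementation: the names `Setup`, `Summary`, `process` and `run_batch`, the single `threshold` parameter, and the use of a flat list of intensities in place of a 512 x 512 x 32 volume are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Setup:
    """Parameters fixed once during the interactive step (hypothetical)."""
    threshold: float


@dataclass
class Summary:
    """Compact per-data-set record kept while the batch runs."""
    name: str
    group: str
    attributes: dict = field(default_factory=dict)
    steps: list = field(default_factory=list)


def process(name, group, intensities, setup):
    """Apply the fixed setup to one data set and log each step."""
    summary = Summary(name, group)
    # Step 1: crude threshold segmentation (stand-in for the real method).
    region = [v for v in intensities if v >= setup.threshold]
    summary.steps.append(("segment", len(region)))
    # Step 2: attribute computation on the segmented region.
    summary.attributes["size"] = len(region)
    summary.attributes["mean_intensity"] = (
        sum(region) / len(region) if region else 0.0
    )
    summary.steps.append(("measure", dict(summary.attributes)))
    return summary


def run_batch(collection, setup):
    """Process every (name, group, intensities) entry with the same setup."""
    return [process(n, g, data, setup) for n, g, data in collection]
```

The `steps` list is what a visual summary would draw from: each entry records an intermediate result, so an outlier in the final plot can be traced back to the step that produced it.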

12

User Interface using Visual Summaries

13

Discriminating groups

Red: HP1 sets, Green: control

Region granularity vs number of spots in region

Granularity attribute

• Average intensity gradient of region
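One way to read "average intensity gradient of region" is the mean gradient magnitude over the voxels of the segmented region. The sketch below makes that reading explicit. It assumes unit voxel spacing in all three directions (confocal stacks usually have a coarser z spacing, which would need a per-axis scale factor), a region mask that is already available, and nested lists indexed as `[z][y][x]`. The function names are hypothetical and the slides do not say how the attribute was actually computed.

```python
import math


def _diff(get, i, n):
    """Finite difference along one axis: central inside, one-sided at borders."""
    if n == 1:
        return 0.0
    lo, hi = max(i - 1, 0), min(i + 1, n - 1)
    return (get(hi) - get(lo)) / (hi - lo)


def granularity(volume, mask):
    """Mean gradient magnitude over the voxels where `mask` is truthy.

    `volume` and `mask` are nested lists indexed as [z][y][x].
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    total, count = 0.0, 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x]:
                    continue
                gz = _diff(lambda k: volume[k][y][x], z, nz)
                gy = _diff(lambda k: volume[z][k][x], y, ny)
                gx = _diff(lambda k: volume[z][y][k], x, nx)
                total += math.sqrt(gx * gx + gy * gy + gz * gz)
                count += 1
    if count == 0:
        raise ValueError("mask selects no voxels")
    return total / count
```

A smooth region yields a low value and a speckled region a high one, which is why the attribute can serve as a granularity measure.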

Plot tells us:

• Large variation, some outliers

• HP1 and control seem different

14

Large variation, some outliers

Brush / link outliers

• Investigate visual summary

Problems with data set

• Corrupt data

15

HP1 and control seem different

Further analysis

• Histograms

• Box plots

Statistical tests

• Wilcoxon

Wilcoxon tells us that there is indeed a significant difference
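For two independent groups such as control and HP1, the applicable Wilcoxon test is the rank-sum variant. The sketch below is a minimal version under stated assumptions: it uses the normal approximation, assigns midranks to ties, and applies neither a tie correction to the variance nor a continuity correction. The slides do not say which variant or software was used, and the function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p) where z is the standardized rank sum of sample `a`.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n + 1) / 12.0)
    z = (rank_sum_a - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p
```

Calling `rank_sum_test(control_values, hp1_values)` on one attribute per data set gives a z statistic and p-value. With 30 data sets per group the normal approximation is generally considered adequate.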

16

Lessons learned

The significant difference in granularity vs number of spots tells us that HP1 affects the structure of chromatin. The effect is that chromatin is condensed into a number of compact regions.

Biologically significant result: two papers published.

Strategy for analysis of collections of confocal data sets

• Interactive visualization and batch processing are both needed

• Information visualization is used for the analysis of batch output

• Visual summaries are used to link back to the original data set or previous steps in the batch process

Strategy has been implemented as the Argos system

17

Generality

Argos has been used for the analysis of an experiment consisting of 2500+ confocal data sets

Argos has been used for the analysis of microarray data