Post on 28-Dec-2015

GxDb: a universal tool to collect, analyse, manage and visualize transcriptomic data

Wolfgang Raffelsberger, Raymond Ripp and Laetitia Poidevin

BingGi Days, January 2010

What is transcriptomics?

-> a high-throughput analysis of gene expression by measuring the amount of mRNA

What are the techniques?

-> DNA microarrays
-> SAGE
-> Differential Display
-> …

=> large quantities of data

Introduction

GxDb: integrative tool to collect, treat, analyze, manage and visualize

GxDb is a website and a database

Organization of data in GxDb

Sample
• Individual: name, age, description
• Organism
• Genotype
• Tissue
• Treatment
• SampleCondition (ex: mouse wt aged 9 days)

Arraytype (ex: Mouse430_2)

RealExp: one Arraytype, one Sample and the replicate CEL files

• RealExp: Arraytype (ex: Mouse430_2), Sample (ex: wt_d9), CEL files r1, r2, r3
• RealExp 2: Arraytype, Sample 2 (ex: wt_d11), CEL files r3, r4, r5
• RealExp 3: Arraytype, Sample 3 (ex: wt_d13), CEL files r6, r7, r8
• RealExp 4: Arraytype, Sample 4 (ex: wt_d15), CEL files r9, r10, r11

Experiment

• An Experiment groups RealExps (RealExp, RealExp 2, RealExp 3, RealExp 4)
• Associated data: Signal Intensity, Ratio, Cluster, differentially expressed genes, Quality
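The hierarchy described in these slides (an Experiment groups RealExps; each RealExp combines an Arraytype, a Sample and replicate CEL files) can be pictured as a small set of record types. The sketch below is illustrative only: the class and field names mirror the slide labels, but the actual GxDb schema is not shown in the slides, so every attribute and every CEL file name here is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    # Hypothetical fields, named after the slide labels.
    name: str                 # e.g. "wt_d9"
    organism: str
    genotype: str
    tissue: str = ""
    treatment: str = ""


@dataclass
class RealExp:
    # One Arraytype + one Sample + the replicate CEL files.
    arraytype: str            # e.g. "Mouse430_2"
    sample: Sample
    cel_files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    # An Experiment groups several RealExps.
    name: str
    realexps: List[RealExp] = field(default_factory=list)

    def cel_files(self) -> List[str]:
        """All CEL files of the experiment, in RealExp order."""
        return [cel for rx in self.realexps for cel in rx.cel_files]


wt_d9 = Sample(name="wt_d9", organism="mouse", genotype="wt")
wt_d11 = Sample(name="wt_d11", organism="mouse", genotype="wt")
experiment = Experiment(
    name="example time course",
    realexps=[
        RealExp("Mouse430_2", wt_d9, ["r1.CEL", "r2.CEL", "r3.CEL"]),
        RealExp("Mouse430_2", wt_d11, ["r4.CEL", "r5.CEL", "r6.CEL"]),
    ],
)
print(len(experiment.cel_files()))
```

Keeping the CEL files on the RealExp rather than on the Sample reflects the slide layout, where the same Sample could in principle be hybridized on more than one Arraytype.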

Treatment and Analysis protocol

1) Normalization
6 methods: RMA, gcRMA, dChip, MAS5.0, plier, vsn

=> signal intensity

2) Calculate average (between replicates) and ratio

3) Filtering
- Eliminate probesets that are not expressed in any array of one experiment, based on distribution or call (according to normalization method)
- Eliminate probesets with very low changes between condition and reference, based on fold change or on standard deviation

4) Statistical analysis
- method: t-test combined with empirical Bayes for shrinkage
- estimation of FDR (false discovery rate)
- tag probesets with differential expression (automatic threshold finding)
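Steps 3 and 4 can be illustrated with a small stand-alone sketch. This is not the GxDb pipeline, which the slides say runs in R with a t-test combined with empirical Bayes. It only shows a fold-change filter and the Benjamini-Hochberg procedure, one common way to control the FDR. The slides do not say which FDR estimator GxDb uses, so the choice of Benjamini-Hochberg, the function names, the thresholds and all values below are assumptions.

```python
from typing import Dict, List


def filter_by_fold_change(log2_ratios: Dict[str, float],
                          min_abs_log2: float = 1.0) -> List[str]:
    """Keep probesets whose |log2 ratio| reaches the threshold."""
    return [p for p, r in log2_ratios.items() if abs(r) >= min_abs_log2]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log2 ratios and p-values, for illustration only.
kept = filter_by_fold_change({"ps_a": 2.1, "ps_b": 0.2, "ps_c": -1.5})
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
tagged = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(kept, adjusted, tagged)
```

With a fixed cut-off of 0.05 only the first p-value is tagged here; the "automatic threshold finding" mentioned in the slide is not reproduced.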

Treatment and Analysis protocol

1) Normalization 2) Calculate average (replicates) and ratio 3) Filtering 4) Statistical analysis

5) Clustering
tool: Cluspack
methods: k-means (DPC), mixture models (AIC and BIC)

=> clusters

6) Quality Control Report
tool: RReportGenerator for Automatic Statistical Analysis, to estimate the quality of arrays
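To make the clustering step more concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. It is not Cluspack, which the slides describe as a C tool, and it does not reproduce the DPC variant or the mixture models with AIC/BIC. The function names, the choice of starting centroids and the expression profiles are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], centroids: List[Point],
            max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm; returns the cluster index of each point."""
    centroids = list(centroids)  # work on a copy of the starting centroids
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels


# Hypothetical log2-ratio profiles of four probesets over two conditions.
profiles = [(0.1, 0.2), (0.0, 0.3), (2.0, 2.1), (1.9, 2.3)]
labels = k_means(profiles, centroids=[profiles[0], profiles[2]])
print(labels)
```

Passing the starting centroids explicitly keeps the example deterministic; real tools usually choose them by a dedicated initialization method.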

Upload form

Step 1: Selection of Arraytype and Experiment
• Create your new experiment
• Create your new samples: Organism, Genotype, SampleCondition, Individual, TreatmentType, Treatment, Tissue, Sample

Step 2: Upload of .cel files

Step 3: Select the sample corresponding to each .cel file

Step 4: Select the comparisons of interest to calculate ratios

Ratio: Condition / reference

Example: C3H_rd1_d10 / C3H_wt_d10

Step 5: Launch Treatment and Analysis protocol, then clustering, quality analysis and loading in database
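The ratio of Step 4 (condition / reference) can be sketched in a few lines. The example assumes linear-scale signal intensities that are averaged over replicates before the division. The slides do not state on which scale GxDb averages, and the intensity values are invented for illustration.

```python
import math
from statistics import mean
from typing import Dict, List


def ratio(condition: List[float], reference: List[float]) -> float:
    """Mean condition signal divided by mean reference signal."""
    return mean(condition) / mean(reference)


# Hypothetical linear-scale signal intensities for one probeset,
# three replicate arrays per RealExp.
signals: Dict[str, List[float]] = {
    "C3H_rd1_d10": [400.0, 380.0, 420.0],
    "C3H_wt_d10": [100.0, 90.0, 110.0],
}

r = ratio(signals["C3H_rd1_d10"], signals["C3H_wt_d10"])
log2_r = math.log2(r)
print(r, log2_r)
```

Reporting the log2 of the ratio makes up- and down-regulation symmetric around zero, which is why fold-change thresholds are often expressed on that scale.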

Organization of data in GxDb

• Experiment
• Sample
• RealExp
• Cel file
• Arraytype-Probeset
• Signal Intensity
• Ratio
• Differentially expressed genes
• Clustering
• Quality

Query GxDb

• Experiment
• Probeset
• Sample
• RealExp
• Signal Intensity
• Ratio
• Cluster

Example: time-course of retinal development

Visualization in GxDb

GxDb Website: Upload, Querying, Display

Diagram labels: alnitak; Star3, Star4, Star5, Star6, Star7, Star8; /GxData; GxDb SQL database; http://gx.igbmc.fr; Web Services; Café des sciences; QSub; Scheduler

GxDb resources

Languages used:

• PHP (HTML): Upload, PipeWork, RadarGenerator, Fed
• R: Treatment and analysis protocol, RReportGenerator
• SQL
• Tcl: Gx (~ Gscope), Probeset loading
• C: Cluspack

Conclusion and Prospects

• Automated raw-data upload, storage, treatment and analysis
- multiple treatment protocols
- multiple clustering methods
- multiple human and automatic expert analyses

=> Comparisons => Analyse the strengths and weaknesses of the different protocols

• Improvement of the website
- more user friendly
- visualization of clusters, ratio
- tools for meta-analysis

• Possibility to upload data directly from GEO

• Diagnostic report to make the data easier to analyze

• Links to other databases and tools: STRING, GSEA...

Ratio Pipework

• Organism
• Normalization
• Ratio minimum
• Ratio maximum
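The four fields of the Ratio Pipework form map naturally onto a parameterized SQL query. The sketch below uses Python's built-in sqlite3 module with an in-memory table. The table layout, column names and rows are hypothetical; the slides only say that GxDb uses an SQL database and do not show its schema or engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per probeset and comparison.
conn.execute(
    """CREATE TABLE ratio (
           probeset_id   TEXT,
           organism      TEXT,
           normalization TEXT,
           comparison    TEXT,
           value         REAL
       )"""
)
conn.executemany(
    "INSERT INTO ratio VALUES (?, ?, ?, ?, ?)",
    [
        ("ps_1", "mouse", "RMA", "C3H_rd1_d10 / C3H_wt_d10", 4.0),
        ("ps_2", "mouse", "RMA", "C3H_rd1_d10 / C3H_wt_d10", 1.1),
        ("ps_3", "mouse", "gcRMA", "C3H_rd1_d10 / C3H_wt_d10", 3.0),
    ],
)


def query_ratios(organism, normalization, ratio_min, ratio_max):
    """Probesets whose ratio lies in [ratio_min, ratio_max]."""
    rows = conn.execute(
        "SELECT probeset_id, value FROM ratio "
        "WHERE organism = ? AND normalization = ? "
        "AND value BETWEEN ? AND ? ORDER BY probeset_id",
        (organism, normalization, ratio_min, ratio_max),
    )
    return rows.fetchall()


result = query_ratios("mouse", "RMA", 2.0, 10.0)
print(result)
```

Using placeholders (`?`) rather than string concatenation keeps form input out of the SQL text, which matters for any web-facing query form.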

Advantages of GxDb

• Integration and storage in a unifying format

• Automated raw-data upload, storage, treatment and analysis
- multiple treatment protocols
- multiple clustering methods
- multiple human and automatic expert analyses

=> Comparisons => Analyse the strengths and weaknesses of the different protocols

• Facilitated querying and data visualization

GxDb transcriptomics

Experiment
• RealExp 1 to RealExp 4, each with an Arraytype, a Sample and replicate CEL files

Arraytype: PROBESET tables (PROBESET, PROBESET 2, PROBESET 3) (45000), each with the fields:
• probeset_id
• genename
• genedescription
• species
• speciessymbol
• representpublicid
• refseqtranscriptid
• gscope_id
• swissprot
• unigene_id
• entrezgene
• ensembl
• mgi
• cytoband
• chromoloc
• omim
• tissuespecificity
• linkeddiseases
• go_biologicalprocess
• go_cellularcomponent
• go_molecularfunction
• pathway
• interpro
• transmembrane

Sample
• Individual: name, age, description
• Organism
• Genotype
• Tissue
• Treatment
• SampleCondition

Signal Intensity, Ratio, Cluster

GxDb protocol from upload to display

• Arraytype: already exists? If not, create new Arraytype
• Sample: already exists? If not, create new Sample with
- existing or new Individual
- existing or new Organism
- existing or new Tissues
- existing or new Genotype
- existing or new Treatment
• Upload your .CEL files
• Enter their association to Arraytypes and Samples
• Define couples of RealExps for the ratio calculation
• Fill in the other information for the Experiment
• Run Automatic Analysis: Quality Report, Signal Intensity, Ratio, Cluster, Differentially Expressed Genes
• Query and Display Results
