Microarray Data Analysis Using R

Studies in Tissue Databases

Mark Reimers, NCI

Post on 31-Dec-2015

TRANSCRIPT

Page 1: Microarray Data Analysis Using R

Microarray Data Analysis Using R

Studies in Tissue Databases

Mark Reimers, NCI

Page 2: Microarray Data Analysis Using R

Outline

The GNF tissue database
Exploratory analysis - clustering
Positional co-regulation
Insight via co-regulation
Apoptotic configuration of tissues
Probe level analysis

Page 3: Microarray Data Analysis Using R

The GNF Expression Atlas

Su et al. (PNAS 2004) hybridized 150 samples from 61 tissues to Affymetrix U133A and custom arrays

Variation in gene expression (as proportion of transcriptome)

95% show at least one 2-fold change among 61 tissues

37% show more than 2-fold differences between lowest 10% and highest 10%
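The two percentages above can be illustrated with a small sketch, written in Python for illustration since the talk's own R code is not part of this transcript. It assumes that "at least one 2-fold change" means the highest tissue value is at least twice the lowest, and that the second figure compares the mean of the top tenth of tissues with the mean of the bottom tenth; both readings are assumptions, and the function name and toy values are hypothetical.

```python
from statistics import mean

def fold_change_fractions(expr, fold=2.0):
    """Return two fractions over the genes in `expr`
    (gene -> positive expression values, one per tissue):
      1. genes whose highest value is at least `fold` times the lowest;
      2. genes whose mean over the top tenth of tissues exceeds
         `fold` times the mean over the bottom tenth.
    """
    any_change = 0
    decile_change = 0
    for values in expr.values():
        ordered = sorted(values)
        k = max(1, len(ordered) // 10)  # number of tissues in each tenth
        if ordered[-1] >= fold * ordered[0]:
            any_change += 1
        if mean(ordered[-k:]) > fold * mean(ordered[:k]):
            decile_change += 1
    n_genes = len(expr)
    return any_change / n_genes, decile_change / n_genes

# Toy data: 20 hypothetical tissues per gene (not GNF values).
toy = {
    "flat": [100] * 20,                 # no variation
    "spike": [100] * 19 + [250],        # high in a single tissue only
    "broad": [100] * 10 + [300] * 10,   # high in half of the tissues
}
frac_any, frac_decile = fold_change_fractions(toy)
print(frac_any, frac_decile)
```

The "spike" gene counts toward the first fraction but not the second, which is the distinction the two slide bullets draw.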

Page 4: Microarray Data Analysis Using R

Clustering samples

All biological replicates are nearest neighbors

Dendrogram reflects discrepancy between healthy and cancerous
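One way to check the nearest-neighbor claim is sketched below in Python (for illustration only; the talk's R code is not in this transcript). It assumes a distance of 1 minus the Pearson correlation between sample profiles, which the slide does not specify, and uses made-up sample names and values.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def nearest_neighbors(samples):
    """Map each sample name to the other sample with the smallest
    correlation distance (1 - r)."""
    nearest = {}
    for name, profile in samples.items():
        others = [(1 - pearson(profile, p), other)
                  for other, p in samples.items() if other != name]
        nearest[name] = min(others)[1]
    return nearest

# Hypothetical replicate pairs; the tissue is the part before "_".
samples = {
    "liver_1": [9, 1, 2, 8, 1],
    "liver_2": [8, 2, 1, 9, 2],
    "brain_1": [1, 9, 8, 1, 7],
    "brain_2": [2, 8, 9, 2, 8],
}
nearest = nearest_neighbors(samples)
all_paired = all(name.split("_")[0] == other.split("_")[0]
                 for name, other in nearest.items())
print(nearest, all_paired)
```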

Page 5: Microarray Data Analysis Using R

Co-regulation of Nearby Genes

Some groups of genes next to one another on chromosome show high correlation across tissues
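A scan for such groups can be sketched as a sliding window over genes in chromosomal order. The Python below is an illustration under assumptions: a window of three adjacent genes and a cutoff of 0.6 on every pairwise correlation. The function name and toy profiles are hypothetical, and the talk's actual procedure may differ.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coregulated_windows(ordered_genes, expr, width=3, cutoff=0.6):
    """Return start indices of windows of `width` adjacent genes whose
    pairwise correlations across tissues all exceed `cutoff`."""
    hits = []
    for start in range(len(ordered_genes) - width + 1):
        window = ordered_genes[start:start + width]
        if all(pearson(expr[a], expr[b]) > cutoff
               for a, b in combinations(window, 2)):
            hits.append(start)
    return hits

# Hypothetical genes listed in chromosomal order, six tissues each.
order = ["g1", "g2", "g3", "g4", "g5"]
expr = {
    "g1": [5, 1, 4, 2, 3, 1],
    "g2": [1, 2, 3, 4, 5, 6],
    "g3": [2, 3, 3, 5, 6, 7],
    "g4": [1, 3, 2, 5, 5, 7],
    "g5": [6, 1, 5, 2, 1, 4],
}
hits = coregulated_windows(order, expr)
print(hits)  # only the window starting at g2 (index 1) qualifies
```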

Page 6: Microarray Data Analysis Using R

Significance of Co-regulation

How often would such correlations happen ‘by chance’ - e.g. by selecting genes at random?

Three random measures would have correlation greater than 0.6 with p < 10^-20!

However, 3 genes selected at random from atlas have probability ~ 10^-3 of having all corrs > 0.6
In 30,000 positions, we should see 30

156 regions of high correlation determined
Many are paralogs

Perhaps 50% false discovery rate among the rest
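The empirical null on this slide can be approximated by drawing random gene triples and counting how often all three pairwise correlations exceed the cutoff. The Python sketch below illustrates that idea and then repeats the slide's arithmetic. The function name, the number of draws, and the toy data are assumptions. The number of non-paralog regions is not given in the transcript, so the 50% figure is not recomputed here.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def random_triple_rate(expr, cutoff=0.6, draws=2000, seed=0):
    """Fraction of randomly drawn gene triples whose three pairwise
    correlations all exceed `cutoff`."""
    rng = random.Random(seed)
    genes = list(expr)
    hits = 0
    for _ in range(draws):
        triple = rng.sample(genes, 3)
        if all(pearson(expr[a], expr[b]) > cutoff
               for a, b in combinations(triple, 2)):
            hits += 1
    return hits / draws

# Toy data: one of the four possible triples passes, so the rate is near 0.25.
toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8],
       "c": [1, 3, 5, 7], "d": [4, 3, 2, 1]}
toy_rate = random_triple_rate(toy)

# Arithmetic from the slide: windows expected by chance alone.
positions = 30_000
p_triple = 1e-3                                # slide's estimate for random triples
expected_by_chance = positions * p_triple      # about 30 windows
chance_share = expected_by_chance / 156        # share of all 156 regions
print(toy_rate, expected_by_chance, round(chance_share, 2))
```

Estimating the rate from the atlas itself, rather than from independent random numbers, is what moves the null probability from below 10^-20 to roughly 10^-3: real genes share broad tissue-driven trends.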

Page 7: Microarray Data Analysis Using R

Prediction of Function

Zhang et al. (J. Biol. 2004, 3:21) hybridized 55 mouse tissues to spotted oligo arrays

Hypothesis: genes with similar tissue expression patterns share similar function

Able to recover prediction of GO biological process for known genes with better than 50% accuracy for many categories

Extended prediction to 1,092 uncharacterized transcripts

Page 8: Microarray Data Analysis Using R

Investigation of Poorly Characterized Gene - Top1MT

10-fold variation in expression (odd for a ‘housekeeping gene’)

More than 50 genes with expression highly correlated (> 0.75) with Top1MT across tissue database

Large proportion are splicing factors

Top1MT has an odd splice junction in intron 1, and may depend critically on abundant splicing factors
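The search for genes that track a query gene across tissues can be sketched as follows, in Python for illustration. The 0.75 cutoff follows the slide; the function name, partner names, and expression values are hypothetical and are not taken from the tissue database.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_partners(query, expr, cutoff=0.75):
    """Return (r, gene) pairs for genes whose correlation with `query`
    across tissues exceeds `cutoff`, strongest first."""
    target = expr[query]
    scored = [(pearson(target, profile), gene)
              for gene, profile in expr.items() if gene != query]
    return sorted((pair for pair in scored if pair[0] > cutoff), reverse=True)

# Made-up profiles over five tissues.
expr = {
    "TOP1MT": [1, 2, 4, 8, 10],
    "partner_a": [2, 3, 5, 9, 12],
    "partner_b": [1, 1, 3, 7, 9],
    "unrelated": [5, 1, 6, 2, 4],
}
partners = correlated_partners("TOP1MT", expr)
partner_names = [gene for _, gene in partners]
fold_range = max(expr["TOP1MT"]) / min(expr["TOP1MT"])  # max/min across tissues
print(partner_names, fold_range)
```

The resulting list would then be checked against functional annotation, which is how the slide arrives at the observation about splicing factors.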

Page 9: Microarray Data Analysis Using R

Apoptosis Patterns

Majority of epithelial tissues show common pattern (indisposed to apoptosis)

Blood cells show variety of patterns

Page 10: Microarray Data Analysis Using R

Exploration of Probe Sets

Examine correlation of probe sets across 150 samples

All but one probe verified to match latest Unigene build for gene

Probes organized by position in 3’ end

Color scale (correlation): red = 1; white = below 0

Page 11: Microarray Data Analysis Using R

Quality of Arrays

Regional bias images