Microarray Data Analysis Using R
Studies in Tissue Databases
Mark Reimers, NCI
Outline
- The GNF tissue database
- Exploratory analysis: clustering
- Positional co-regulation
- Insight via co-regulation
- Apoptotic configuration of tissues
- Probe-level analysis
The GNF Expression Atlas
Su et al. (PNAS, 2004) hybridized 150 samples from 61 tissues to Affymetrix U133A and custom arrays.
Variation in gene expression (as a proportion of the transcriptome):
- 95% of genes show at least one 2-fold change among the 61 tissues
- 37% show a more than 2-fold difference between the lowest 10% and the highest 10% of tissues
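Both summary statistics can be computed per gene from a genes-by-tissues table. The Python sketch below is an illustration, not the authors' code. It assumes linear-scale (not log) expression values. It reads "at least one 2-fold change" as max/min ≥ 2, and reads the second statistic as the 90th percentile exceeding twice the 10th percentile. The function names and toy data are hypothetical.

```python
import math

def quantile(values, q):
    """Quantile with linear interpolation between order statistics (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def variation_summary(profiles):
    """profiles: dict gene -> expression values across tissues (linear scale, > 0).

    Returns (fraction of genes with max/min >= 2,
             fraction of genes whose 90th percentile is more than twice the 10th).
    """
    n = len(profiles)
    any_2fold = sum(1 for v in profiles.values() if max(v) >= 2 * min(v))
    spread_2fold = sum(1 for v in profiles.values()
                       if quantile(v, 0.9) > 2 * quantile(v, 0.1))
    return any_2fold / n, spread_2fold / n

# Toy data: one gene with a single spike, one flat gene, one broadly varying gene.
toy = {
    "spike": [1, 1, 1, 1, 1, 1, 1, 1, 1, 4],
    "flat": [1] * 10,
    "graded": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
}
print(variation_summary(toy))
```

The "spike" gene passes the first test but not the second. Genes like it can help explain why the first figure (95%) is so much larger than the second (37%).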
Clustering samples
- All biological replicates are nearest neighbors
- The dendrogram reflects the separation between healthy and cancerous samples
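A quick check of the first observation is to find, for each sample, the other sample it is most similar to. This Python sketch measures similarity as the Pearson correlation across genes. That is a common choice for clustering expression data, but it is an assumption here. In R, the same distance is usually built with `as.dist(1 - cor(x))`, with samples as the columns of `x`, and then passed to `hclust`. The sample names and values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def nearest_neighbors(samples):
    """samples: dict sample name -> expression vector (same gene order).

    Returns dict sample name -> name of the most correlated other sample.
    """
    nn = {}
    for a, va in samples.items():
        nn[a] = max((b for b in samples if b != a),
                    key=lambda b: pearson(va, samples[b]))
    return nn

def replicates_are_nearest(samples, tissue_of):
    """True if every sample's nearest neighbour comes from the same tissue."""
    return all(tissue_of[a] == tissue_of[b]
               for a, b in nearest_neighbors(samples).items())

samples = {
    "liver_1": [10, 2, 5, 1],
    "liver_2": [9, 2, 6, 1],
    "brain_1": [1, 8, 2, 9],
    "brain_2": [2, 9, 1, 8],
}
tissue_of = {name: name.split("_")[0] for name in samples}
print(nearest_neighbors(samples))
print(replicates_are_nearest(samples, tissue_of))
```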
Co-regulation of Nearby Genes
Some groups of genes that lie next to one another on a chromosome show high correlation across tissues.
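One way to look for such groups is to slide a window along the genes, sorted by chromosomal position. The scan flags every window in which each pair of genes is correlated across tissues. The 3-gene window and the 0.6 cutoff echo the numbers on the next slide. The scan itself is a hypothetical sketch, not the procedure used in the study, and the genes and values are made up.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coregulated_windows(genes, width=3, threshold=0.6):
    """genes: list of (name, expression vector) sorted by chromosomal position.

    Returns the gene names of each window in which every pair has r > threshold.
    """
    hits = []
    for start in range(len(genes) - width + 1):
        window = genes[start:start + width]
        if all(pearson(u, v) > threshold
               for (_, u), (_, v) in combinations(window, 2)):
            hits.append([name for name, _ in window])
    return hits

genes = [
    ("g1", [1, 2, 3, 4]),
    ("g2", [2, 4, 6, 9]),
    ("g3", [1, 3, 3, 5]),
    ("g4", [4, 3, 2, 1]),
    ("g5", [1, 5, 2, 4]),
]
print(coregulated_windows(genes))
```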
Significance of Co-regulation
How often would such correlations occur ‘by chance’, e.g. among genes selected at random?
- Three independent random measures would all have correlation greater than 0.6 with probability $p < 10^{-20}$!
- However, 3 genes selected at random from the atlas have probability ~$10^{-3}$ of having all pairwise correlations > 0.6
- In 30,000 positions, we should therefore see about 30 such groups by chance
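The ~$10^{-3}$ figure is an empirical probability. One way to estimate it is by Monte Carlo: draw triples of genes at random from the atlas and count how often all three pairwise correlations exceed 0.6. Sampling real genes, rather than simulated noise, keeps whatever correlation structure the atlas has. That is consistent with this probability being far larger than the $10^{-20}$ figure for independent measures. The procedure below is an assumption, and the function names and toy data are hypothetical. Multiplying the estimate by the number of positions gives the count expected by chance (30,000 × $10^{-3}$ = 30).

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prob_triple_all_correlated(expr, threshold=0.6, trials=10_000, seed=0):
    """Monte Carlo estimate of P(all 3 pairwise r > threshold) for random gene triples.

    expr: list of expression vectors, one per gene (same sample order).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b, c = rng.sample(expr, 3)
        if min(pearson(a, b), pearson(a, c), pearson(b, c)) > threshold:
            hits += 1
    return hits / trials

def expected_chance_hits(p, n_positions):
    """Expected number of positions passing the cutoff by chance alone."""
    return p * n_positions

print(expected_chance_hits(1e-3, 30_000))  # the slide's arithmetic
```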
- 156 regions of high correlation found
- Many are paralogs
- Perhaps a 50% false discovery rate among the rest
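A rough false discovery rate follows from the same chance expectation. Divide the number of regions expected by chance by the number actually observed, counting only the regions left after the paralogs are set aside. The slide does not show how its 50% figure was obtained, so this is only an assumed reading. The counts in the example are placeholders, not the study's.

```python
def estimated_fdr(expected_by_chance, observed):
    """Crude FDR estimate: expected false hits / observed hits, capped at 1."""
    if observed == 0:
        return 0.0
    return min(1.0, expected_by_chance / observed)

# Placeholder counts for illustration only.
print(estimated_fdr(expected_by_chance=10, observed=40))
```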
Prediction of Function
Zhang et al. (J. Biol. 2004, 3:21) hybridized 55 mouse tissues to spotted oligo arrays
- Hypothesis: genes with similar tissue expression patterns share similar functions
- Recovered the GO biological process of known genes with better than 50% accuracy for many categories
- Extended prediction to 1,092 uncharacterized transcripts
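A simple k-nearest-neighbour vote can illustrate the guilt-by-association idea behind this hypothesis. An uncharacterized gene is assigned the GO process that is most common among the k annotated genes whose tissue profiles correlate best with it. This is a deliberately simple stand-in technique, not the prediction method used by Zhang et al. The profiles, labels and `k` are hypothetical.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def predict_process(query, annotated, k=3):
    """query: expression profile of an uncharacterized gene.
    annotated: list of (profile, GO process label) for known genes.
    Returns the label most common among the k best-correlated known genes.
    """
    ranked = sorted(annotated, key=lambda item: pearson(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

annotated = [
    ([1, 2, 3, 4], "muscle contraction"),
    ([1, 2, 3, 5], "muscle contraction"),
    ([1, 3, 2, 4], "immune response"),
    ([4, 3, 2, 1], "immune response"),
    ([5, 3, 2, 1], "immune response"),
]
print(predict_process([1, 2, 3, 4.5], annotated))
```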
Investigation of a Poorly Characterized Gene: Top1MT
- 10-fold variation in expression (odd for a ‘housekeeping gene’)
- >50 genes with expression highly correlated (r > 0.75) with Top1MT across the tissue database
- A large proportion of these are splicing factors
- Top1MT has an odd splice junction in intron 1, and may depend critically on abundant splicing factors
Apoptosis Patterns
- The majority of epithelial tissues show a common pattern (indisposed to apoptosis)
- Blood cells show a variety of patterns
Exploration of Probe Sets
- Examine correlation of probe sets across 150 samples
- All but one probe verified to match the latest UniGene build for the gene
- Probes ordered by position relative to the 3’ end
Color scale: red = correlation of 1; white = correlation below 0
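A probe-by-probe picture like this can be computed by correlating every pair of probes in a probe set across samples, with the probes ordered by position relative to the 3' end. The sketch also reports each probe's mean correlation with the others. That summary is a hypothetical addition that makes a discordant probe stand out. The intensities are made up and are assumed to be already background-corrected and log-transformed.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def probe_correlation_matrix(probes):
    """probes: intensity vectors (one per probe, across samples), ordered by
    position relative to the 3' end. Returns the probe-by-probe r matrix."""
    return [[pearson(p, q) for q in probes] for p in probes]

def mean_correlation_with_others(matrix):
    """Average off-diagonal correlation for each probe (row)."""
    n = len(matrix)
    return [sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

probes = [
    [1, 2, 3, 4],
    [2, 3, 4, 6],
    [1, 2, 4, 4],
    [4, 1, 3, 2],  # a probe that does not track the others
]
scores = mean_correlation_with_others(probe_correlation_matrix(probes))
print([round(s, 2) for s in scores])
```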
Quality of Arrays
- Regional bias (array images)