micro arrays ii - image analysis and data pre-processing(1)
TRANSCRIPT
-
8/2/2019 Micro Arrays II - Image Analysis and Data Pre-Processing(1)
1/34
FUNCTIONAL GENOMICS
DR. VÍCTOR [email protected]
A7-421
Microarrays Image Analysis
-
Microarray - Pre-Processing Purpose
-
Microarray Image Analysis: TECHNOLOGIES
DNA probes: Oligos ~20
40nt
Target (cDNA, PCR products, etc.)
Copies per gene: usually 1 / usually 3
Organization: sectors (print-tip) / n x m probesets
Probesets: m probesets (~100) by n probesets (~100)
Sectors: y sectors (~=3) by x sectors (~=3)
Each sector: i x j spots (18x20)
Empty spots
landing lights
perfect match probes (pm)
mismatch probes (mm)
Controls
-
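Microarray - Image Analysis: TECHNOLOGIES
RAW DATA (two dyes): 10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values
After processing: only 10,000 values
RAW DATA (oligonucleotides): 10,000 genes * 20 oligos * 2 (pm, mm) * ~36 pixels/gene = 14,400,000 values
After processing: only 10,000 values
Image Analysis and Pre-processing reduce the raw data to one value per gene.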
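The raw-data counts on this slide are plain products, so they can be checked directly. The sketch below is only an arithmetic check; the function name `raw_value_count` is an assumption made for illustration.

```python
from math import prod

def raw_value_count(genes, *per_gene_factors):
    """Number of raw values: genes times every per-gene factor."""
    return genes * prod(per_gene_factors)

# Two-dye slide: 2 dyes, 3 copies per gene, ~40 pixels.
two_dye = raw_value_count(10_000, 2, 3, 40)
# Oligonucleotide array: 20 oligos, 2 probe types (pm, mm), ~36 pixels.
oligo = raw_value_count(10_000, 20, 2, 36)

# Reduction factor when everything collapses to one value per gene.
two_dye_reduction = two_dye // 10_000
oligo_reduction = oligo // 10_000
print(two_dye, oligo, two_dye_reduction, oligo_reduction)
```

The two products are 2,400,000 and 14,400,000, so image analysis and pre-processing compress the raw data by factors of 240 and 1,440 respectively.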
-
Image Analysis
Addressing: Estimate the location of spot centers.
Segmentation: Classify pixels as foreground or background.
Extraction: For each spot on the array and each dye, obtain
foreground intensities
background intensities
quality measures.
Addressing: done by the Affymetrix GeneChip software
-
Image Analysis
Addressing (by grid, GenePix)
-
Image Analysis
Segmentation
Circular feature
Irregular feature shape
Finally, compute the average.
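Segmentation with a circular feature can be sketched as follows: pixels within a fixed radius of the spot center are foreground, the rest are background, and the foreground is then averaged. This is a minimal illustration, not the algorithm of any particular package; the function name, the 5x5 pixel window, and the use of the median for the background are assumptions.

```python
from statistics import mean, median

def segment_circular(image, center_row, center_col, radius):
    """Classify the pixels of a spot window as foreground or background.

    A pixel is foreground when its distance to the spot center is
    at most `radius`; every other pixel is background.
    """
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return foreground, background

# Hypothetical window: a bright spot on a dim background.
window = [
    [10, 11, 12, 11, 10],
    [11, 90, 95, 90, 11],
    [12, 95, 99, 95, 12],
    [11, 90, 95, 90, 11],
    [10, 11, 12, 11, 10],
]
fg, bg = segment_circular(window, center_row=2, center_col=2, radius=1.5)
spot_mean = mean(fg)            # "finally, compute the average"
background_median = median(bg)  # a robust background estimate
```

An irregular feature shape would replace the distance test with a per-pixel rule (for example an intensity threshold), but the extraction step stays the same.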
-
Background Reduction
Extraction:
Determining Background
-
Image Analysis
Segmentation (Spot detection)
Background Estimation
Value
Value = Spot Intensity - Spot Background

Gene      Sample 1   Sample 1
Gene 1    100        98
Gene 2    209        4209
Gene 3    7          2
...       ...        ...
Gene k    9882       9711
...       ...        ...
Gene N    2298       28
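The value reported for each gene is the spot intensity minus the spot background. A minimal sketch of that subtraction is shown below; the gene names and raw numbers are hypothetical, chosen only so that the results resemble the first rows of the table.

```python
def background_corrected(spot_intensity, spot_background):
    """Value = spot intensity - spot background, gene by gene."""
    return {
        gene: spot_intensity[gene] - spot_background[gene]
        for gene in spot_intensity
    }

# Hypothetical raw measurements for three genes on one array.
intensity = {"Gene 1": 150, "Gene 2": 300, "Gene 3": 40}
background = {"Gene 1": 50, "Gene 2": 91, "Gene 3": 33}

values = background_corrected(intensity, background)
print(values)
```

Note that a spot dimmer than its background gives a zero or negative value, which cannot be log-transformed later; how to treat such spots is a separate decision that this sketch does not make.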
-
Data Transformation: two dyes

Gene      Sample 1 (G)   Sample 1 (R)
Gene 1    100            98
Gene 2    209            4209
Gene 3    7              2
...       ...            ...
Gene k    9882           9711
...       ...            ...
Gene N    2298           28

G = Sample 1, R = Sample 1
Log2 transformation: Log2(G = Sample 1), Log2(R = Sample 1)
Microarray Bioinformatics - D. Stekel (Cambridge, 2003)
-
Data Transformation: two dyes (log2 scale)
R, G: 1 value?
M = log2(R/G)
A = log2(R*G)/2
MA-Plot (G = Sample 1, R = Sample 1)
M: deviation
A: intensity
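The two-dye transformation, M = log2(R/G) and A = log2(R*G)/2 as given in the pre-processing summary, can be computed per gene as below. This is a minimal sketch; the function name `ma_values` is an assumption, and the example intensities are the ones shown in the slide's table (first column taken as G, second as R).

```python
from math import log2

def ma_values(red, green):
    """Return (M, A) for one spot from background-corrected intensities.

    M = log2(R) - log2(G)        (deviation between channels)
    A = (log2(R) + log2(G)) / 2  (average log intensity)
    """
    m = log2(red) - log2(green)
    a = (log2(red) + log2(green)) / 2
    return m, a

green = [100, 209, 7, 9882, 2298]
red = [98, 4209, 2, 9711, 28]
ma = [ma_values(r, g) for r, g in zip(red, green)]

for m, a in ma:
    print(f"M = {m:+.2f}  A = {a:.2f}")
```

Plotting M against A gives the MA-plot: genes with equal signal in both channels fall on M = 0 regardless of their intensity.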
-
Normalization: 2 dyes
"Within" (2-color technologies)
(Assumption: the majority of genes do not change)
[Figure: MA-plot. x-axis: A = (log2(G)+log2(R)) / 2; y-axis: M = log2(R)-log2(G)]
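If most genes do not change, the cloud of M values should sit on 0 at every intensity A, so any trend of M along A is treated as bias and subtracted. The course uses loess for this; the sketch below swaps in a running median over spots ordered by A, a cruder stand-in chosen because it needs only the standard library. The function name and the window size are assumptions.

```python
from statistics import median

def running_median_normalize(m_values, a_values, window=3):
    """Subtract from each M the median M of its neighbours in A order."""
    order = sorted(range(len(a_values)), key=lambda i: a_values[i])
    half = window // 2
    normalized = [0.0] * len(m_values)
    for rank, i in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        # Local trend: median M among the spots closest in intensity.
        trend = median(m_values[j] for j in order[lo:hi])
        normalized[i] = m_values[i] - trend
    return normalized

# Hypothetical spots whose M drifts downwards as A grows.
a = [8, 10, 12, 14, 16]
m = [-4, -5, -6, -7, -8]
normalized = running_median_normalize(m, a)
print(normalized)
```

With real data the window would cover many more spots, and the edges of the intensity range are estimated less reliably than the middle, which is also true of loess.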
-
Normalization: 2 dyes
"Within" (2-color technologies)
(Assumption: the majority of genes do not change)
[Figure: MA-plots before and after normalization]
-
Normalization: 2 dyes
"Within", spatial (2-color technologies)
[Figure panels: before normalization; after global loess normalization; after loess normalization by sector (print-tip)]
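Normalizing by sector means applying the correction separately to the spots printed by each print-tip, so that spatial bias in one sector does not leak into the others. The sketch below shows only the grouping idea and uses a per-sector median shift in place of the per-sector loess fit named on the slide; function and variable names are assumptions.

```python
from collections import defaultdict
from statistics import median

def center_m_by_sector(m_values, sectors):
    """Shift the M values of each sector so that its median M is 0."""
    by_sector = defaultdict(list)
    for m, sector in zip(m_values, sectors):
        by_sector[sector].append(m)
    shifts = {sector: median(ms) for sector, ms in by_sector.items()}
    return [m - shifts[sector] for m, sector in zip(m_values, sectors)]

# Hypothetical spots: sector 1 is biased upwards, sector 2 downwards.
m = [1.0, 1.5, 2.0, -1.0, -0.5, 0.0]
sector = [1, 1, 1, 2, 2, 2]
centered = center_m_by_sector(m, sector)
print(centered)
```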
-
Data Transformation: one dye

Gene      Sample 1
Gene 1    100
Gene 2    209
Gene 3    7
...       ...
Gene k    9882
...       ...
Gene N    2298

Log2
-
Normalization: 1 or 2 dyes
"Between-slides"
[Figure: density plots of log intensity, before normalization and after normalization (N = 3840, bandwidth = 0.1051)]
quantile
MAD (median absolute deviation)
scale
qspline
invariantset
loess
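Of the between-slides methods listed, quantile normalization is the simplest to write down: every array is forced to share the same distribution, namely the mean of the sorted values across arrays. The following is a minimal sketch under stated assumptions (equal-length arrays, no special handling of ties or missing values); the function name is hypothetical.

```python
from statistics import mean

def quantile_normalize(arrays):
    """arrays: list of equal-length lists, one list of log intensities per array."""
    n = len(arrays[0])
    sorted_arrays = [sorted(values) for values in arrays]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]
    result = []
    for values in arrays:
        order = sorted(range(n), key=lambda i: values[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        result.append(normalized)
    return result

# Two hypothetical arrays measuring the same three genes.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(normalized)
```

After this step the density curves of all arrays coincide, while the ranking of genes within each array is unchanged.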
-
Summarization = "Average"(Intensities)
Summarization: Affymetrix
Oligonucleotide-dependent technologies
Usual methods: tukey-biweight, av-diff, median-polish
PM, MM
The "summarization" equivalent in two-dye technologies is the average
of gene replicates within the slide.
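As an illustration of one of the listed methods, the sketch below applies Tukey's median polish to the probes of a single probeset. It assumes a matrix of log2 intensities with one row per probe and one column per array, and returns one summarized value per array (overall level plus array effect). The function name, the matrix orientation, and the fixed iteration count are assumptions, not the behaviour of any specific package.

```python
from statistics import median

def median_polish_summary(matrix, iterations=10):
    """matrix[probe][array] of log2 intensities -> one value per array."""
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows  # probe effects
    col_eff = [0.0] * n_cols  # array effects
    for _ in range(iterations):
        # Sweep out row (probe) medians.
        for i in range(n_rows):
            d = median(resid[i])
            row_eff[i] += d
            resid[i] = [v - d for v in resid[i]]
        d = median(col_eff)
        overall += d
        col_eff = [c - d for c in col_eff]
        # Sweep out column (array) medians.
        for j in range(n_cols):
            d = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += d
            for i in range(n_rows):
                resid[i][j] -= d
        d = median(row_eff)
        overall += d
        row_eff = [r - d for r in row_eff]
    return [overall + c for c in col_eff]

# Hypothetical probeset: 3 probes (rows) measured on 3 arrays (columns).
probeset = [
    [7, 9, 11],
    [8, 10, 12],
    [9, 11, 13],
]
summary = median_polish_summary(probeset)
print(summary)
```

Because it works with medians, a single defective probe has little influence on the summarized value, which is the main reason to prefer it over a plain average.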
-
Microarrays: Filtering / Treating Undefined Values
Some spots may be defective from the printing process
Some spots could not be detected
Some spots may be damaged during the assay
Artefacts may be present (bubbles, etc.)

Use replicated spots as averages
Remove unrecoverable genes
Remove problematic spots in all arrays
Infer values using computational methods (warning)
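The first two treatments can be combined in a few lines: average whatever replicate spots are defined, and drop a gene only when none of its replicates is usable. A minimal sketch, assuming undefined spots are marked with `None`; the function name and the example data are hypothetical.

```python
from statistics import mean

def average_replicates(replicates):
    """replicates: dict gene -> list of spot values (None = undefined spot).

    Returns the mean of the defined replicates per gene; genes without
    any defined replicate are unrecoverable and are left out.
    """
    averaged = {}
    for gene, values in replicates.items():
        defined = [v for v in values if v is not None]
        if defined:
            averaged[gene] = mean(defined)
    return averaged

spots = {
    "Gene 1": [1.0, None, 3.0],    # one damaged replicate
    "Gene 2": [None, None, None],  # unrecoverable
    "Gene 3": [2.0, 2.0, 2.0],
}
averaged = average_replicates(spots)
print(averaged)
```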
-
Microarray Data Filtering
More than 10,000 genes
Too much data increases computation time and analysis complexity
Remove:
Genes that do not change significantly
Undefined genes
Low expression
Keep:
Large signal-to-noise ratio
Large statistical significance
Large variability
Large expression
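Two of the criteria, expression level and variability, translate directly into a filter. The sketch below keeps genes whose mean expression and standard deviation across arrays both reach a threshold; the thresholds, gene names, and values are hypothetical and would have to be chosen for each data set.

```python
from statistics import mean, stdev

def filter_genes(expression, min_mean, min_sd):
    """Keep genes with large expression and large variability.

    expression: dict gene -> list of values, one value per array.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if mean(values) >= min_mean and stdev(values) >= min_sd
    }

data = {
    "a": [10, 10, 10, 10],  # expressed but does not change
    "b": [4, 12, 5, 11],    # expressed and variable
    "c": [1, 5, 1, 5],      # variable but low expression
}
kept = filter_genes(data, min_mean=7, min_sd=1)
print(sorted(kept))
```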
-
Microarray Pre-Processing Summary
Data Processing
a) Microarray image (Affymetrix, two dyes)
b) Image analysis and background subtraction: scanning, spot detection, intensity value, background detection & subtraction
c) Transformation: A = log2(R*G)/2, M = log2(R/G)
d) Normalization: within, between
-
Image Analysis Exercise
Data processing of placental microarrays (Dr. Hugo A. Barrera Saldaña)
Paper in Mol. Med. 2007: DNA Microarrays - A Powerful Genomic Tool for Biomedical Research - Trevino - Barrera - Mol Med 2007
Search PubMed for Trevino V
-
Experimental Design
Goal: Differential Expression
Placenta 1, Placenta 2, Reference Pool (Green, Green, Red, Red)
mRNA Extraction
Labelling
Microarray Hybridization (by duplicates)
Scanning & Data Processing: Image Analysis; Within Normalization (per array); Between Normalization (all arrays)
Detection of Differentially Expressed Genes: t-test, H0: mean = 0; p-values correction: False Discovery Rate
Validation and Analysis: Comparison With Known Tissue Specific Genes (controls)
(Dr. Hugo Barrera)
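The detection step tests, gene by gene, whether the mean log ratio differs from 0 and then corrects the p-values for multiple testing. The sketch below shows the one-sample t statistic and a Benjamini-Hochberg adjustment; taking "False Discovery Rate" to mean Benjamini-Hochberg is an assumption, as are the function names and example numbers. Turning t into a p-value needs the t distribution, which is not in the standard library and is therefore left out.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(m_values):
    """One-sample t statistic for H0: mean(M) = 0."""
    n = len(m_values)
    return mean(m_values) / (stdev(m_values) / sqrt(n))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

t = t_statistic([1.0, 2.0, 3.0])                     # hypothetical M values
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])  # hypothetical p-values
print(t, adjusted)
```

Genes whose adjusted p-value falls below the chosen FDR level would be reported as differentially expressed.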
-
Experimental Design - Slides
SLIDES' SCANNINGS

GROUP  SLIDE   CY3 (GREEN)  CY5 (RED)  COMMENTS
1a     52 A V  Sample       Control
1b     52 B V  Sample       Control
2a     51 A V  Sample       Control    RIGHT TOP GROUP
2b     51 B V  Sample       Control    RIGHT BOTTOM GROUP
3a     56 A V  Control      Sample
3b     56 B V  Control      Sample
4a     A 54 V  Control      Sample
4b     B 54 V  Control      Sample
5a     A 55 V  Control      Control    LEFT TOP GROUP
5b     B 55 V  Control      Control    LEFT BOTTOM GROUP
6a     A 53 V  Control      Control
6b     B 53 V  Control      Control

Download images from http://bioinformatica.mty.itesm.mx/?q=node/68
-
Read Images
Read BOTH images together using SpotFinder
Mark file 1 as "Cy3" = Green
Mark file 2 as "Cy5" = Red
Adjust image brightness and contrast
-
Create Grid
Metarows = 12, Metacolumns = 4
Rows = 24, Columns = 24
Pixels = 450 (of the 24 x 24 spots)
Spacing = 18 (between metacolumns and metarows)
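A quick check of what these grid settings imply: the number of sub-grids that will have to be positioned by hand and the total number of spots the software will quantify. This is plain arithmetic on the values from the slide; the variable names are only illustrative.

```python
metarows, metacolumns = 12, 4
rows, columns = 24, 24

subgrids = metarows * metacolumns          # grids to adjust individually
spots_per_subgrid = rows * columns         # spots inside each grid
total_spots = subgrids * spots_per_subgrid # spots quantified per image

print(subgrids, spots_per_subgrid, total_spots)
```

So the layout contains 48 grids of 576 spots each, 27,648 spots in total for each of the two channels.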
-
Adjust Grid
Created grids are not aligned to the image.
Adjust each of the 12*4 grids to the correct position.
Right mouse button in a grid to move that grid. Arrow keys also work.
Right mouse button in a blank section to move all grids.
Use Move All to adjust the overall position.
Use Visible All (right click in a blank area) to restore the grid.
-
Save Grid
Save the grid frequently to avoid losing your work.
-
Image Analysis
Use Gridding and Processing
Adjust (save the grid first; on Mac, Adjust doesn't work well)
Process
Copy images:
1 from the grid adjust
1 from the RI plot
1 from the data (figure)
2 from the QC view (A and B)
What do they represent?
Export to a .mev file
Open the .mev file in Excel
Remove comment lines
Compute signal:
Signal A = Cy3 Green = MNA - MedBkgA = mean of spot A - median of background A
Signal B = Cy5 Red = MNB - MedBkgB = mean of spot B - median of background B
Plot Signal A vs Signal B
Copy the image into a Word file
DO NOT SAVE THE modified .MEV FILE
-
Execute Process
- Select Gridding Tab
- Use Histogram Segmentation
- Spot Size = 10
- Process All!
-
Inspect DATA PROCESSED
Select Data Tab
Select a row / spot
See results and interpret output
-
Inspect MA-PLOT
Select RI-PLOT Tab
Observe the MA-PLOT
You can switch specific grids on/off
A tendency can be observed (which has to be corrected to 0; see MIDAS exercise)
-
Quality Control View
Quality view tab
View 2 shows whether each spot had M > 1 (yellow; or 0.5 in this image) or M < -1
View 1 gives the count of all M values per color (yellow, gray, blue, and green)
-
Export DATA and VIEW in Excel
Save data to a .mev file
Open the .mev file in Excel
Remove comment lines (important!)
Compute signal:
Signal A = Cy3 Green = MNA - MedBkgA = mean of spot A - median of background A
Signal B = Cy5 Red = MNB - MedBkgB = mean of spot B - median of background B
Plot Signal A vs Signal B
Copy the image into a Word file
DO NOT SAVE THE modified .MEV FILE
The plot in Excel should be similar to the MA-plot (RI-Plot)
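The same signal computation can be done without a spreadsheet. The sketch below assumes the exported file is tab-delimited, that comment lines start with `#`, and that the header row names the columns MNA, MNB, MedBkgA and MedBkgB as on this slide; check these assumptions against an actual export before relying on them. The function name and the example rows are hypothetical.

```python
import csv

def read_signals(lines):
    """Return a list of (Signal A, Signal B) pairs from .mev-style lines.

    Comment and empty lines are skipped in memory; the file itself
    is never modified.
    """
    data_lines = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter="\t")
    signals = []
    for row in reader:
        signal_a = float(row["MNA"]) - float(row["MedBkgA"])  # Cy3, green
        signal_b = float(row["MNB"]) - float(row["MedBkgB"])  # Cy5, red
        signals.append((signal_a, signal_b))
    return signals

# Hypothetical two-spot export.
example = """# hypothetical comment line
UID\tMNA\tMNB\tMedBkgA\tMedBkgB
1\t150\t140\t50\t42
2\t300\t4300\t91\t91
""".splitlines()

signals = read_signals(example)
print(signals)
```

For a real file, pass the open file object instead of `example`, for instance `read_signals(open(path))` with `path` pointing at the export.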
-
Summary of SpotFinder Usage
We read 2 images, Green = Cy3 and Red = Cy5, to generate a background-reduced intensity value for each color:
We generated a grid with the number of spots and the spatial layout specified for the microarray
We adjusted the positions visually by moving the grids
We computed the signal value and the background noise for each color
We obtained a data file
Image
Data