Microarray Data Analysis
Day 2
Microarray Data Process/Outline
1. Experimental Design
2. Image Analysis – scan to intensity measures (raw data)
3. Normalization – “clean” data
4. More “low-level” analysis – fold change, ANOVA, (Z-score), data filtering
5. Data mining – how to interpret > 6000 measures
– Databases
– Software
– Techniques: clustering, pattern recognition, etc.
– Comparing to prior studies, across platforms?
6. Validation
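Item 4 lists the Z-score among the low-level analyses. As a minimal sketch, assuming log2-transformed intensities for a single array held in a plain list (the function name `z_scores` and the intensity values are hypothetical), each value is centered on the array mean and scaled by the array's standard deviation:

```python
import math
import statistics

def z_scores(log_intensities):
    """Return (x - mean) / sd for every log2 intensity on one array."""
    mean = statistics.mean(log_intensities)
    sd = statistics.stdev(log_intensities)  # sample standard deviation
    return [(x - mean) / sd for x in log_intensities]

# Hypothetical raw intensities for five probe sets on one array
raw = [120.0, 480.0, 950.0, 60.0, 2100.0]
log2_values = [math.log2(v) for v in raw]

for value, z in zip(raw, z_scores(log2_values)):
    print(f"{value:8.1f}  z = {z:+.2f}")
```

Putting each array on this common scale makes values from different hybridizations easier to compare before filtering; other Z-score variants used for microarrays differ in detail.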
Today we will be using Spotfire software to filter and search your data.
10928 records in Spotfire:
– 4999 S. pombe specific
– 166 Affy controls
– 5763 S. cerevisiae specific
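Spotfire filtering is done interactively, so the following is only a rough code analogue, not Spotfire's own API. It assumes each record carries a hypothetical `category` label and a signal value, tallies records per category (a quick check that the parts add up to the total), and keeps S. cerevisiae records above an illustrative cutoff. Probe set IDs, labels, signals, and the cutoff are all made up:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Record:
    probe_set: str  # hypothetical probe set identifier
    category: str   # assumed labels: "S. cerevisiae", "S. pombe", "control"
    signal: float   # hypothetical expression signal

records = [
    Record("ps_001", "S. cerevisiae", 850.0),
    Record("ps_002", "S. pombe", 40.0),
    Record("ps_003", "control", 15000.0),
    Record("ps_004", "S. cerevisiae", 95.0),
    Record("ps_005", "S. cerevisiae", 1200.0),
]

# Tally records per category; the categories should account for every record
counts = Counter(r.category for r in records)
assert sum(counts.values()) == len(records)

def filter_records(records, species, min_signal):
    """Keep records of one species whose signal is at least min_signal."""
    return [r for r in records if r.category == species and r.signal >= min_signal]

kept = filter_records(records, "S. cerevisiae", min_signal=100.0)
print(counts)
print([r.probe_set for r in kept])
```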
The Affy detection oligonucleotide sequences are frozen at the time of synthesis. How does this impact downstream data analysis?
Biology and Data Mining
Subcellular localization provides a simple goal for genome-scale functional prediction
Determine how many of the ~6000 yeast proteins go into each compartment
Subcellular localization: a standardized aspect of function
Nucleus
Membrane
Extra-cellular[secreted]
ER
Cytoplasm
Mitochondria
Golgi
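Once each protein has a compartment assignment, "how many go into each compartment" is a tally. A minimal sketch, assuming a dictionary from hypothetical protein IDs to a single compartment each (real proteins can occupy more than one compartment, which this ignores):

```python
from collections import Counter

# Hypothetical protein -> compartment assignments (placeholder IDs, not real annotations)
localization = {
    "protein_A": "Nucleus",
    "protein_B": "Cytoplasm",
    "protein_C": "Nucleus",
    "protein_D": "Mitochondria",
    "protein_E": "ER",
}

counts = Counter(localization.values())
total = sum(counts.values())

# Print compartments from most to least populated, with percentages
for compartment, n in counts.most_common():
    print(f"{compartment:<14}{n:>3}  ({100 * n / total:.0f}%)")
```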
"Traditionally" subcellular localization is "predicted" by sequence patterns
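[Figure: compartment diagram (Nucleus, Membrane, Extra-cellular [secreted], ER, Cytoplasm, Mitochondria, Golgi) labeled with sequence patterns: NLS (nuclear localization signal), TM-helix (transmembrane helix), Sig. Seq. (signal sequence), HDEL (ER-retention motif), Import Sig.]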
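To make "predicted by sequence patterns" concrete, here is a deliberately simplified rule-based sketch, not the predictor behind these slides. It covers only three of the patterns: a C-terminal HDEL (ER retention), a transmembrane helix found with a Kyte-Doolittle hydropathy window (19 residues, mean above 1.6, a common heuristic), and the classical monopartite NLS consensus K(K/R)X(K/R). Signal sequences and import signals are omitted, and the rule order and the "Cytoplasm" default are assumptions:

```python
import re

# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

NLS_PATTERN = re.compile(r"K[KR].[KR]")  # classical monopartite NLS consensus

def has_tm_helix(seq, window=19, threshold=1.6):
    """True if any window of `window` residues has mean hydropathy above threshold."""
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        if sum(KD.get(aa, 0.0) for aa in segment) / window > threshold:
            return True
    return False

def predict_from_motifs(seq):
    """Toy rule order (an assumption): HDEL -> ER, TM helix -> Membrane,
    NLS -> Nucleus, otherwise Cytoplasm."""
    seq = seq.upper()
    if seq.endswith("HDEL"):
        return "ER"
    if has_tm_helix(seq):
        return "Membrane"
    if NLS_PATTERN.search(seq):
        return "Nucleus"
    return "Cytoplasm"

# Hypothetical test sequences
for s in ["MSTAPQHDEL", "M" + "L" * 20 + "KDE", "MSPKKRKVEDS", "MSTAGSEDS"]:
    print(s, "->", predict_from_motifs(s))
```

A hydrophobic N-terminal signal sequence can also trip the TM-helix rule here, one reason real predictors are considerably more involved.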
Subcellular localization is associated with the level of gene expression
[Figure: expression level in copies/cell for proteins in each compartment: Nucleus, Membrane, Extra-cellular [secreted], ER, Cytoplasm, Mitochondria, Golgi]
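One way to look for this association is to summarize expression level per compartment. A minimal sketch, assuming (compartment, copies-per-cell) pairs for individual genes; every value below is illustrative, not measured data. The median is used because expression levels are typically strongly skewed:

```python
import statistics
from collections import defaultdict

# Hypothetical (compartment, mRNA copies per cell) pairs; illustrative only
observations = [
    ("Cytoplasm", 12.0), ("Cytoplasm", 30.0), ("Cytoplasm", 8.0),
    ("Nucleus", 1.5), ("Nucleus", 0.8), ("Nucleus", 2.2),
    ("Membrane", 0.6), ("Membrane", 1.1),
]

# Group the expression values by compartment
by_compartment = defaultdict(list)
for compartment, copies in observations:
    by_compartment[compartment].append(copies)

# Median copies/cell per compartment, printed from highest to lowest
medians = {c: statistics.median(v) for c, v in by_compartment.items()}
for compartment, med in sorted(medians.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{compartment:<10} median {med:5.2f} copies/cell")
```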
Combine Expression Information & Sequence Patterns to Predict Localization
[Figure: compartment diagram combining sequence patterns (NLS, TM-helix, Sig. Seq., HDEL, Import Sig.) with expression level in copies/cell]
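One standard way to combine independent pieces of evidence is a naive Bayes model, used here as a swapped-in illustration rather than as the specific method behind this slide. Each compartment's prior is multiplied by the likelihood of each observed sequence feature and of the gene's expression bin, then normalized. The three compartments, the two binary features, the "low"/"high" bins, and every probability are hypothetical placeholders, not fitted values:

```python
import math

# All probabilities below are hypothetical placeholders.
PRIOR = {"Nucleus": 0.4, "Cytoplasm": 0.4, "Membrane": 0.2}

# P(feature present | compartment) for two binary sequence features
P_FEATURE = {
    "nls":      {"Nucleus": 0.6,  "Cytoplasm": 0.1,  "Membrane": 0.05},
    "tm_helix": {"Nucleus": 0.05, "Cytoplasm": 0.05, "Membrane": 0.8},
}

# P(expression bin | compartment); bins are "low" / "high" copies per cell
P_EXPR = {
    "low":  {"Nucleus": 0.7, "Cytoplasm": 0.3, "Membrane": 0.7},
    "high": {"Nucleus": 0.3, "Cytoplasm": 0.7, "Membrane": 0.3},
}

def posterior(features, expr_bin):
    """Naive Bayes: prior x feature likelihoods x expression likelihood, normalized."""
    log_scores = {}
    for loc, prior in PRIOR.items():
        log_p = math.log(prior)
        for name, table in P_FEATURE.items():
            p = table[loc]
            # Use P(present) if the feature is observed, else P(absent)
            log_p += math.log(p if features.get(name, False) else 1.0 - p)
        log_p += math.log(P_EXPR[expr_bin][loc])
        log_scores[loc] = log_p
    # Convert log scores back to probabilities that sum to 1
    top = max(log_scores.values())
    unnorm = {loc: math.exp(s - top) for loc, s in log_scores.items()}
    total = sum(unnorm.values())
    return {loc: v / total for loc, v in unnorm.items()}

post = posterior({"nls": False, "tm_helix": False}, "high")
print(max(post, key=post.get), post)
```

Working in log space avoids underflow once many features are multiplied together.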
Major Objective: Discover a comprehensive theory of life’s organization at the molecular level
– The major actors of molecular biology: the nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)
– The central dogma of molecular biology???
Proteins are very complicated molecules, built from 20 different amino acids.
Epigenetics
RNA editing
Post-translational modification
Translational regulation
Data Mining
[Figure: data mining in context, linking Biology Application Domain, Experiment Design and Hypothesis, Microarray Experiment, Image Analysis, Data Analysis, Statistics, Artificial Intelligence (AI), Knowledge Discovery in Databases (KDD), Data Warehouse, and Validation]
Higher Level Microarray Data Analysis
• Clustering and pattern detection
• Data mining and visualization
• Linkage between gene expression data and gene sequence/function/metabolic pathways databases
• Discovery of common sequences in co-regulated genes
• Meta-studies using data from multiple experiments
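A common way to link a cluster of co-regulated genes to a function database is an enrichment test: does the cluster contain more genes annotated with some term than chance would predict? A minimal sketch of the one-sided hypergeometric test, assuming the four counts are already known (the example numbers are hypothetical):

```python
from math import comb

def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric P(X >= k).

    N: genes on the array, K: genes annotated with the term,
    n: genes in the cluster, k: annotated genes found in the cluster.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

# Hypothetical example: 6000 genes, 100 with the term, 50 in the cluster, 8 annotated
print(enrichment_p_value(6000, 100, 50, 8))
```

When many terms are tested, the p-values need a multiple-testing correction before any term is called enriched.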
Scatter plot of all genes in a simple comparison of two controls (A) and two treatments (B: high vs. low glucose), showing changes in expression greater than 2.2-fold and 3-fold.
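On a log-log scatter plot, the 2.2-fold and 3-fold cutoffs are lines parallel to the diagonal, so the filter compares the absolute log2 ratio with log2 of the threshold. A minimal sketch, assuming one mean signal per gene for the controls (A) and for the treatment (B); the gene names and values are hypothetical:

```python
import math

# Hypothetical mean signals per gene: (controls A, treatment B)
signals = {
    "gene_1": (200.0, 820.0),
    "gene_2": (500.0, 480.0),
    "gene_3": (900.0, 250.0),
    "gene_4": (150.0, 380.0),
}

def changed_genes(signals, threshold):
    """Genes changed at least `threshold`-fold, up or down (|log2 B/A| >= log2 threshold)."""
    cutoff = math.log2(threshold)
    return sorted(g for g, (a, b) in signals.items() if abs(math.log2(b / a)) >= cutoff)

for t in (2.2, 3.0):
    print(t, changed_genes(signals, t))
```

Using the log ratio treats a 3-fold increase and a 3-fold decrease symmetrically.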
Types of Clustering
• Hierarchical
– Link similar genes, building up to a tree of all genes
• Self Organizing Maps (SOM)
– Split all genes into similar sub-groups
– Finds its own groups (machine learning)
[Figure: cluster by color/expression difference]
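The hierarchical approach ("link similar genes, build up to a tree") can be sketched as agglomerative clustering. This version assumes average linkage and a distance of 1 minus the Pearson correlation between expression profiles, two common choices but not the only ones; it also assumes no profile is constant. Gene names and profiles are hypothetical:

```python
import statistics
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def average_linkage(profiles):
    """Agglomerative clustering, average linkage, distance = 1 - Pearson r.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples;
    each cluster is a tuple of gene names. Reading the merges in order
    gives the tree from the leaves up.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}

    def linkage(c1, c2):
        # Average of all pairwise gene distances between the two clusters
        return statistics.mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges

# Hypothetical profiles across four conditions
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [1.1, 2.1, 2.9, 4.2],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
    "gene_d": [3.9, 3.1, 1.8, 1.2],
}
for c1, c2, d in average_linkage(profiles):
    print(f"merge {c1} + {c2} at distance {d:.3f}")
```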
Self Organizing Maps
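A self-organizing map finds its own groups by pulling a small grid of nodes toward the data. The minimal 1-D SOM below is a sketch under assumed settings: three nodes, 200 epochs, a linearly decaying learning rate, a Gaussian neighborhood that shrinks over time, and Euclidean distance on raw profiles. Real SOM tools use 2-D grids and other schedules, and the gene profiles are hypothetical:

```python
import math
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def best_node(nodes, x):
    """Index of the node whose weight vector is closest to profile x."""
    return min(range(len(nodes)), key=lambda j: sq_dist(nodes[j], x))

def train_som(profiles, n_nodes=3, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D self-organizing map and return the node weight vectors."""
    rng = random.Random(seed)
    data = list(profiles.values())
    # Start each node at a randomly chosen gene profile
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 1e-3)  # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):     # visit genes in random order
            win = best_node(nodes, x)
            for j, node in enumerate(nodes):
                # Gaussian neighborhood measured along the 1-D grid of nodes
                h = math.exp(-((j - win) ** 2) / (2 * radius ** 2))
                for d in range(len(node)):
                    node[d] += lr * h * (x[d] - node[d])
    return nodes

# Hypothetical expression profiles across four conditions
profiles = {
    "up_1": [1.0, 2.0, 3.0, 4.0], "up_2": [1.2, 2.1, 2.8, 4.1],
    "down_1": [4.0, 3.0, 2.0, 1.0], "down_2": [3.8, 3.2, 1.9, 1.1],
    "flat_1": [2.5, 2.4, 2.6, 2.5],
}
nodes = train_som(profiles)
groups = {gene: best_node(nodes, p) for gene, p in profiles.items()}
print(groups)
```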
Public Databases
• Gene expression data is an essential aspect of annotating the genome
• Publication and data exchange for microarray experiments
• Data mining/Meta-studies
• Common data format - XML
• MIAME (Minimum Information About a Microarray Experiment)
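As an illustration of exchanging experiment descriptions in XML, the sketch below writes one record with Python's standard `xml.etree.ElementTree`. The element and attribute names are hypothetical and are not the MAGE-ML schema; the fields only loosely echo the kinds of information MIAME asks submitters to report, and the array name and normalization text are placeholders:

```python
import xml.etree.ElementTree as ET

def experiment_to_xml(experiment):
    """Serialize a simple experiment description to an XML string.

    The tag names used here are hypothetical, NOT the MAGE-ML schema.
    """
    root = ET.Element("experiment", id=experiment["id"])
    ET.SubElement(root, "design").text = experiment["design"]
    ET.SubElement(root, "array").text = experiment["array"]
    samples = ET.SubElement(root, "samples")
    for name, condition in experiment["samples"]:
        ET.SubElement(samples, "sample", name=name, condition=condition)
    ET.SubElement(root, "normalization").text = experiment["normalization"]
    return ET.tostring(root, encoding="unicode")

experiment = {
    "id": "exp_001",
    "design": "two controls (A) vs. two treatments (B: high vs. low glucose)",
    "array": "example_yeast_array",           # placeholder array name
    "samples": [("A1", "control"), ("A2", "control"),
                ("B1", "treatment"), ("B2", "treatment")],
    "normalization": "placeholder description",
}
xml_text = experiment_to_xml(experiment)
print(xml_text)
```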
The 3 Gene Ontologies
• Molecular Function = elemental activity/task
– the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
• Biological Process = biological goal or objective
– broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
• Cellular Component = location or complex
– subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
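Gene annotations are commonly stored as (gene, aspect, term) rows, with the one-letter aspect codes F, P, and C for the three ontologies. A minimal sketch that groups one gene's terms by ontology; the term names are the examples above, but their assignment to the hypothetical genes `gene_1` to `gene_3` is made up:

```python
from collections import defaultdict

# Hypothetical annotations: (gene, aspect code, term name)
annotations = [
    ("gene_1", "F", "ATPase activity"),
    ("gene_1", "C", "nucleus"),
    ("gene_2", "P", "mitosis"),
    ("gene_2", "C", "telomere"),
    ("gene_3", "F", "carbohydrate binding"),
    ("gene_3", "P", "purine metabolism"),
]

ASPECTS = {"F": "Molecular Function", "P": "Biological Process", "C": "Cellular Component"}

def terms_by_aspect(annotations, gene):
    """Group one gene's GO terms under the three ontologies."""
    grouped = defaultdict(list)
    for g, aspect, term in annotations:
        if g == gene:
            grouped[ASPECTS[aspect]].append(term)
    return dict(grouped)

print(terms_by_aspect(annotations, "gene_1"))
```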
One Last Note
• Microarrays are “cutting edge” technology
• You now have experience performing a technique that most Ph.D.s have never done
• Looks great on a resume…