TRANSCRIPT
Genomics Workshop
Demography of Aging Centers Biomarker Network Meeting in Conjunction with the Annual Meeting of the PAA
April 14, 9:00 AM to 3:30 PM – Hyatt Regency, Dallas, Texas
Sponsored by USC/UCLA Center of Biodemography and Population Health
Organized by Teresa Seeman, Steven Cole, Eileen Crimmins
Tactical aspects of study administration and sample capture/storage
Biological overview of genetics & functional genomics
Strategic aspects of study design and data analysis
Lunch
Technical aspects of study design and data analysis
Perspectives on the State of the Field
Application clinic
Tactical aspects of study administration and sample capture/storage
DNA
1. New sample capture
• Methods: e.g., Oragene, leukocytes
• Consent & administrative issues
2. Retrospective analyses
• Sources: blood spots, cheek swabs, etc.
• Consent & administrative issues
3. Epigenetics
• DNA methylation
• Histone acetylation & chromatin dynamics
• Tissue specificity (vs DNA)
4. Tactical issues – Reports from the Field
• I wish I’d known then…
RNA
1. Identifying appropriate target tissues
• Whole blood, PBMC, saliva, hair, pathology specimens
2. Sample capture/storage
3. Consent & administrative issues
[Diagram, built up over successive slides, with the labels: Gene (IL6), DNA, RNA, Health]
Biological overview of genetics & functional genomics
Theoretical framework: Genes, Environments, transcription, and health
1. “Genetic” influences (missing h², penetrance R-square, etc.)
2. Functional genomics
• Transcription factors
• Epigenetics
3. Gene-Environment interactions
• Regulatory polymorphism
• Coding polymorphism
System dynamics
1. Feedback, network pleiotropy
2. Recursive developmental trajectories
[Diagram, built up over successive slides, with the labels: Social Environment, Gene (IL6), DNA, RNA, Health]
[Diagram, built up over successive slides: IL6 promoter sequence TCT TGCGATGCTA AAG, with the labels NE, PKA, GATA1 (P), and IL6 gene transcription]
[Figure: IL6 promoter activity (fold-change), y-axis 0 to 10, by Norepinephrine (M)]
Socio-environmental regulation of IL6
[Figure: Survival (0.0 to 1.0) by Age (70 to 90), Non-depressed vs. Depressed; p = .008]
[Diagram, built up over successive slides, with the labels: Social Environment, Gene (IL6 … [G/C] …), DNA, RNA, Health]
Gene x Environment Interaction
In silico: IL6 promoter sequence TCT TGCGATGCTA AAG, shown with a C variant
V$GATA1_01 = .943
V$GATA1_01 = .619
In vitro: [Figure: Transcriptional activity (fold-change), y-axis 0 to 10, for IL6 promoter WT vs. -174C, by Norepinephrine (M)]
Difference: p < .0001
Gene x Environment Interaction
[Figure: Survival (0.0 to 1.0) by Age (70 to 90), Non-depressed vs. Depressed, in two panels: IL6 -174 GG and IL6 -174 CC/GC; p-values shown: p = .008 and p = .439]
[Diagram, built up over successive slides, with the labels: Social Environment, Gene (IL6), DNA … [G/C] …, RNA, Health (RNA2 and Health2 in the final build)]
[Diagram, built up over successive slides, with the labels: Social Environment, Gene (IL6), DNA, RNA, Health, Behavior]
Gene-Environment Correlation
Recursive Molecular Remodeling
Recursive developmental remodeling
Time 1: Environment1, Body1, RNA1, Behavior1
Time 2: Environment2, Body2, RNA2, Behavior2
Time 3: Environment3, Body3, RNA3, Behavior3
RNA = intra-organismic adaptation
Cole (2009) Current Directions in Psychological Science
Strategic aspects of study design and data analysis
Basic substantive objectives & study designs
1. “Gene discovery” (e.g., genetic epidemiology)
2. Environmental regulation of health (via transcription)
3. Gene-Environment interaction
[Diagrams, built up alongside the three objectives, with the labels: Gene (IL6), DNA … [G/C] …, RNA, Health]
Antagonistic pleiotropy
[Figure: CRP mg/L / Adversity SD (y-axis -3.0 to 3.0) by IL6 -174 genotype (CC, GC, GG), in two panels: Older Adult and Adolescent; p-values shown: p = .007 and p = .032]
Evolution deletes disadvantage, particularly to the young
Fisher’s regression:
[Figure: Outcome by genotype (GG, GC, CC)]
y = a + b(#G) + e
[Figure: Outcome by genotype (GG, GC, CC), in two panels: Environment A and Environment B]
y = a + b(#G) + c(Env) + d(#G x Env) + e
Omitting the environment terms moves them into the error term:
y = a + b(#G) + e’, where e’ = c(Env) + d(#G x Env) + e
↓ power
↑ parameter estimate bias
Marginal: 0
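The "Marginal: 0" point can be illustrated with a small simulation. This is a minimal sketch, assuming made-up data with an exact crossover (the #G slope is +1 in Environment A and -1 in Environment B); the sample size, allele frequency, and effect sizes are illustrative assumptions, not values from the workshop.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y = X @ beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 20000                                          # assumed sample size
g = rng.binomial(2, 0.5, size=n).astype(float)     # assumed #G count: 0, 1, or 2
env = rng.integers(0, 2, size=n).astype(float)     # assumed environment: 0 = A, 1 = B

# Assumed crossover: slope of +1 per G allele in Environment A, -1 in Environment B.
y = 1.0 * g - 2.0 * g * env + rng.normal(0.0, 1.0, size=n)

ones = np.ones(n)
# Main-effect-only model: y = a + b(#G) + e'
marginal = fit_ols(np.column_stack([ones, g]), y)
# Full model: y = a + b(#G) + c(Env) + d(#G x Env) + e
full = fit_ols(np.column_stack([ones, g, env, g * env]), y)

marginal_slope = marginal[1]     # close to 0: the two environments cancel
slope_in_env_a = full[1]         # close to the assumed +1
interaction = full[3]            # close to the assumed -2
print(marginal_slope, slope_in_env_a, interaction)
```

In this setup the main-effect-only fit reports almost no genetic effect, while the model with the #G x Env term recovers the opposing slopes.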
Strategic aspects of study design and data analysis
Basic substantive objectives & study designs
1. “Gene discovery” (e.g., genetic epidemiology)
2. Environmental regulation of health (via transcription)
3. Gene-Environment interaction
Antagonistic pleiotropy
Valid statistical models are one major reason that substantive interests (environments) matter.
OK, then, let’s have lunch.
Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
• Candidate gene studies
• Genome-wide association studies
• The bioinformatic middle road
2. Environmental regulation of health (via transcription)
• Candidate transcript studies
• Genome-wide approaches
3. Gene-Environment interaction
• Statistical issues
• Revisiting the bioinformatic middle road
Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
• Candidate gene studies
- Candidate identification
- Targeted genotyping
a. PCR
b. High-throughput approaches
- Statistical models
a. Fisher’s basic regression model
b. Multivariate mapping / association / recombination
i. Recombination
ii. Haplotype blocks
c. Confounding
i. Linkage disequilibrium & haplotype analyses
ii. Ethnic stratification
Phenotypic ascertainment
Genetic ancestry
iii. Mendelian randomization
Gene x Environment Interaction
[Figures repeated from the earlier IL6 example: in silico, V$GATA1_01 = .943 and .619 for the IL6 promoter sequence TCT TGCGATGCTA AAG and its C variant; in vitro, Transcriptional activity (fold-change) for IL6 promoter WT vs. -174C by Norepinephrine (M), Difference: p < .0001; Survival by Age, Non-depressed vs. Depressed, for IL6 -174 GG and IL6 -174 CC/GC, p = .439 and p = .008]
Well  ID1  ID2  RFU1     RFU2     Ct1    Ct2    Call
A01   053  053  1094.39  956.90   42.53  41.36  Heterozygote
A02   065  065  -43.33   1519.25  60.00  40.39  Allele2
A03   075  075  1126.77  890.96   42.82  42.02  Heterozygote
A04   079  079  2095.09  25.36    42.84  60.00  Allele1
A05   087  087  2187.80  18.09    41.27  60.00  Allele1
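The Call column is consistent with a simple rule on the two fluorescence readings: both high gives Heterozygote, only one high gives that allele. Below is a minimal sketch of such a rule, assuming a hypothetical cutoff of 500 RFU chosen only to separate the five wells shown; the actual calling algorithm and thresholds of the instrument software are not given in the source.

```python
# Hypothetical cutoff: an RFU reading above this counts as "allele detected".
RFU_CUTOFF = 500.0

def call_genotype(rfu1, rfu2, cutoff=RFU_CUTOFF):
    """Return a genotype call from two allele-specific fluorescence readings."""
    allele1 = rfu1 > cutoff
    allele2 = rfu2 > cutoff
    if allele1 and allele2:
        return "Heterozygote"
    if allele1:
        return "Allele1"
    if allele2:
        return "Allele2"
    return "No call"

# (RFU1, RFU2) per well, copied from the table above.
wells = {
    "A01": (1094.39, 956.90),
    "A02": (-43.33, 1519.25),
    "A03": (1126.77, 890.96),
    "A04": (2095.09, 25.36),
    "A05": (2187.80, 18.09),
}

calls = {well: call_genotype(*rfu) for well, rfu in wells.items()}
for well, call in calls.items():
    print(well, call)
```

With this assumed cutoff the sketch reproduces the five calls in the table; wells with intermediate readings would need a more careful rule.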
Fisher’s regression:
[Figure: Outcome by genotype (GG, GC, CC)]
y = a + b(#G)
y = a + b(GG) + c(GC) + d(CC)
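The two codings above differ only in how genotype enters the design matrix. A minimal sketch of both follows, assuming made-up outcomes and hypothetical helper names (`additive_design`, `genotypic_design`). One choice here departs from the slide notation: the genotypic coding drops a reference genotype (CC), because an intercept plus all three indicators would be collinear.

```python
import numpy as np

def additive_design(genotypes, allele="G"):
    """Intercept plus allele count (#G): the additive coding."""
    counts = np.array([gt.count(allele) for gt in genotypes], dtype=float)
    return np.column_stack([np.ones(len(genotypes)), counts])

def genotypic_design(genotypes, reference="CC"):
    """Intercept plus one indicator per non-reference genotype."""
    levels = [lv for lv in ("GG", "GC", "CC") if lv != reference]
    cols = [np.ones(len(genotypes))]
    cols += [np.array([float(gt == lv) for gt in genotypes]) for lv in levels]
    return np.column_stack(cols)

genotypes = ["GG", "GC", "CC", "GC", "GG", "CC"]
outcome = np.array([3.0, 2.0, 1.0, 2.0, 3.0, 1.0])  # made-up illustrative values

# Additive model: coefficients are [a, b] in y = a + b(#G).
add_beta, *_ = np.linalg.lstsq(additive_design(genotypes), outcome, rcond=None)
# Genotypic model: coefficients are [CC mean, GG minus CC, GC minus CC].
gen_beta, *_ = np.linalg.lstsq(genotypic_design(genotypes), outcome, rcond=None)
print(add_beta, gen_beta)
```

The additive coding spends one parameter and assumes equal steps between genotypes; the genotypic coding spends two and lets the heterozygote fall anywhere.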
Fisher’s regression:
[Figure: Outcome by genotype (GG, GC, CC)]
y = a + b(#G rs1800795)
y = a + b(#G rs1800795) + c(#T rs20937) + ….
y = a + b(Haplotype containing rs1800795)
y = a + b(ATTCGTAC)
HapMap Tag SNP
Linkage-driven indirect association gradients
Culture/behavior/exposure
“Environment”
Ancestry classification via mitochondrial haplogroups (also Y haplogroups for paternal lineage)
[Diagram, built up over successive slides, with the labels: CRP, CVD, IL-6]
Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
• Candidate gene studies
• Genome-wide association studies
- Marker selection for blind search: tag SNPs
- Massively parallel genotyping
a. Array-based strategies
b. Deep resequencing
- Statistical models
a. Main effect models
b. Interaction models
c. Managing Type I error
- Bonferroni & FDR
- Internal cross-validation
- External replication
Fisher’s regression:
[Figure: Outcome by genotype (GG, GC, CC), overall and in two panels: Environment A and Environment B]
y = a + b(#G)
y = a + b(GG) + c(GC) + d(CC)
y = a + b(#G) + c(Env) + d(#G x Env)
y = a + b(GG) + c(GC) + d(CC) + e(Env) + f(Env x GG) + g(Env x GC) + h(Env x CC)
Type 1 / false positive error:
Confirmatory hypothesis testing (candidate genes)
1 hypothesis = 1 t-test = 1 p-value = no problem: p < .05 = p < .05
Gene mapping (exploratory association testing)
Gene expression: 22,000 p-values = 1,100 false positives (p < .05)
p(false discovery > 0) = .999999999999999999999999+
Gene polymorphism: 10,000,000 p-values = 500,000 false positives (p < .05)
p(false discovery > 0) = .999999999999999999999999+
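The counts above follow from multiplying the number of tests by the per-test threshold. A minimal sketch of that arithmetic follows, assuming every null hypothesis is true and the tests are independent (the independence assumption matters only for the probability of at least one false positive); the function names are illustrative.

```python
import math

def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of null tests that pass the threshold by chance."""
    return n_tests * alpha

def prob_any_false_positive(n_tests, alpha=0.05):
    """1 - (1 - alpha)^n_tests, computed on the log scale; assumes independent tests."""
    return -math.expm1(n_tests * math.log1p(-alpha))

expression = expected_false_positives(22_000)        # gene expression case
polymorphism = expected_false_positives(10_000_000)  # gene polymorphism case
single_test = prob_any_false_positive(1)             # confirmatory case: one test
genome_wide = prob_any_false_positive(22_000)        # numerically indistinguishable from 1

print(expression, polymorphism, single_test, genome_wide)
```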
What to do?
1. Increase stringency (intra-study)
Bonferroni correct ( p = .05/22,000 = .00000227 )
Choice: huge samples or massive Type 2 “false negative” error
Model/simulate error
Randomization test or FDR modeling = less conservative bias
Unimpressive yield: p = .00000300 if you’re lucky. Still too conservative, and biased ( omitted true effects in error term )
Use a better sampling design
[Figure: power (0.0 to 1.0) by sample size, in two panels: Population prevalence design (sample size 0 to 20000) and Outcome-stratified design (sample size 0 to 2000)]
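The stringency options under item 1 can be compared on a toy example. This is a minimal sketch, assuming the Benjamini-Hochberg step-up rule as one concrete form of "FDR modeling" (the source does not say which FDR procedure was meant) and a made-up list of five p-values.

```python
import numpy as np

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of discoveries under the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Rank i (1-based) passes if p_(i) <= q * i / m.
    below = ranked <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting the bound
        reject[order[: k + 1]] = True      # reject everything up to that rank
    return reject

genome_wide_cutoff = bonferroni_threshold(0.05, 22_000)   # about .00000227

toy_p = [0.001, 0.012, 0.025, 0.2, 0.9]                   # made-up p-values
bonferroni_hits = [p <= bonferroni_threshold(0.05, len(toy_p)) for p in toy_p]
fdr_hits = benjamini_hochberg(toy_p, q=0.05).tolist()
print(genome_wide_cutoff, bonferroni_hits, fdr_hits)
```

On this toy list the FDR rule keeps three findings where Bonferroni keeps one, which is the "less conservative" trade-off named above.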
2. Replicate (inter-study or intra-study cross-validation)
.05 x .05 x .05 = .000125; .000125 x 22,000 = 2.75 false positives ( vs. 1,100 )
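The replication arithmetic generalizes to any number of studies. A minimal sketch follows, assuming the studies are independent and every tested gene is truly null; the function name is illustrative.

```python
def false_positives_after_replication(n_tests, alpha=0.05, n_studies=3):
    """Expected number of null tests passing alpha in every one of n_studies independent studies."""
    return n_tests * alpha ** n_studies

single_study = false_positives_after_replication(22_000, n_studies=1)
three_studies = false_positives_after_replication(22_000, n_studies=3)
print(single_study, three_studies)
```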
Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
• Candidate gene studies
• Genome-wide association studies
• The bioinformatic “middle road” – biological hypotheses buy power
- Candidate set selection
a. Regulatory polymorphism
b. Coding polymorphism
- Statistical considerations
a. Power
b. Differential enrichment
In silico prediction of Gene x Environment Interaction
In silico: IL6 promoter sequence TCT TGCGATGCTA AAG, shown with a C variant
V$GATA1_01 = .943
V$GATA1_01 = .619
In vitro: [Figure: Transcriptional activity (fold-change), y-axis 0 to 10, for IL6 promoter WT vs. -174C, by Norepinephrine (M)]
Difference: p < .0001
In vivo: [Figure: Survival (0.0 to 1.0) by Age (70 to 90), Non-depressed vs. Depressed, in two panels: IL6 -174 GG and IL6 -174 CC/GC; p-values shown: p = .439 and p = .008]
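The in silico step scores a promoter sequence against a transcription-factor binding matrix, once per allele, and compares the two scores. A minimal sketch of that kind of comparison follows, assuming a made-up four-position frequency matrix (`TOY_MATRIX`) and a simple min-max rescaling. Neither is the TRANSFAC V$GATA1_01 matrix nor the scoring method that produced the .943 and .619 values.

```python
# Hypothetical frequency matrix for a GATA-like motif; NOT the V$GATA1_01 matrix.
TOY_MATRIX = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

def similarity(site, matrix=TOY_MATRIX):
    """Sum the per-position weights for a site and rescale to the 0..1 range."""
    if len(site) != len(matrix):
        raise ValueError("site length must match matrix length")
    score = sum(column[base] for column, base in zip(matrix, site))
    lowest = sum(min(column.values()) for column in matrix)
    highest = sum(max(column.values()) for column in matrix)
    return (score - lowest) / (highest - lowest)

reference_score = similarity("GATA")   # best possible match to the toy matrix
variant_score = similarity("CATA")     # same site with a G-to-C substitution
print(reference_score, variant_score)
```

A single-base change at a high-weight position lowers the score, which is the sense in which a regulatory polymorphism can be flagged as a candidate before any genotyping.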
[Slide: dense list of gene symbols, each paired with a signed position offset (mostly negative, from about -1000 up to small positive values), e.g., FLJ20719 -734, LOC148490 -929, AKR7A2 -678, RHCE -292, ..., PGLS -935, LOC148206 -288, ZNF431 -967, CLECSF12 -885]
1205 GRE-modifying SNPs
Gene set enrichment analysis
Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
• Candidate gene studies
• Genome-wide association studies
• The bioinformatic “middle road” – biological hypotheses buy power
  - Candidate set selection
    a. Regulatory polymorphism
    b. Coding polymorphism
  - Statistical considerations
    a. Power
    b. Differential enrichment
[Figure, shown in two builds: power (0.0-1.0) vs. sample size in two panels. Population prevalence design: sample size 0-20,000. Outcome-stratified design: sample size 0-2,000. The second build adds the label "GEscan" to both panels.]
Technical take-home points:
Strengths & weaknesses of alternative approaches
1. Candidate gene studies: focus on 1 candidate
Advantages
- Scientifically tractable: incremental & cross-validatable
- Maximal statistical power (focused hypothesis)
Disadvantages
- Can only “discover” what we already know (i.e., biased)
2. Genome-wide association studies: focus on all candidates
Advantages
- Unbiased de novo discovery
Disadvantages
- Minimal statistical power, particularly for interactions
3. The bioinformatic “middle road”: focus on a small set of causally plausible candidates (unbiased search of regulatory and coding SNPs)
Advantages
- Scientifically tractable: “short leap of inference” & cross-validatable
- Relatively high statistical power (focus on 1-10% of plausible SNPs)
Disadvantages
- Likely missing some true causal genetic influences
- Bioinformatically intensive – thought (and programming) required
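The power trade-off among the three approaches can be sketched numerically. The example below is a minimal illustration, not a method from the workshop: it assumes a two-sided z-test, a Bonferroni-corrected threshold, and hypothetical values for the standardized effect size, the sample size, and the number of SNPs tested.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def approx_power(effect_size: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided z-test.

    Assumes the test statistic is normal with unit variance and mean
    effect_size * sqrt(n) (a simplifying assumption for illustration).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Hypothetical scenario: 1,000,000 SNPs genome-wide, of which the
    # "middle road" keeps 5% as causally plausible candidates.
    scenarios = {
        "candidate gene (1 test)": 1,
        "middle road (50,000 tests)": 50_000,
        "genome-wide (1,000,000 tests)": 1_000_000,
    }
    for label, n_tests in scenarios.items():
        alpha = bonferroni_alpha(0.05, n_tests)
        power = approx_power(effect_size=0.05, n=5000, alpha=alpha)
        print(f"{label}: per-test alpha = {alpha:.1e}, power = {power:.3f}")
```

Under these assumptions power falls as the number of tested candidates grows, which is the sense in which a focused hypothesis "buys" power.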
Take-home points for this group:
1. Gene-Environment interactions are likely far more…
- ubiquitous
- large in effect size
- clinically/socially meaningful
…than current genetic analyses presume.
There is plenty left for you to find.
2. If you have the study you have (i.e., can’t alter sampling design), your major opportunities for increasing power/discovery involve:
- focusing on substantive effects that are true/big (e.g., GxE, not G, given antagonistic pleiotropy; E, ExE, GxG, etc.)
- modeling biological mechanisms to focus power/impose constraints (e.g., candidate systems, functional themes, regulatory themes)
- combinatorial data-mining (e.g., machine learning in discovery sample)
- sequential testing designs (low-stringency discovery, medium-stringency test, high-stringency confirmation)
Your advantage is smart data analysis.
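A sequential testing design can be sketched as a chain of filters, each applied to p-values from an independent sample with a stricter cutoff. The function name, the candidate labels, and the three cutoffs below are hypothetical choices for illustration. The slides do not specify thresholds.

```python
def sequential_screen(stage_pvalues, thresholds):
    """Carry candidates forward only while they pass each stage's cutoff.

    stage_pvalues: list of dicts (candidate -> p-value), one dict per
        independent sample, ordered discovery -> test -> confirmation.
    thresholds: one p-value cutoff per stage, typically decreasing.
    Returns the sorted list of surviving candidates after each stage.
    """
    survivors = set(stage_pvalues[0])
    history = []
    for pvals, cutoff in zip(stage_pvalues, thresholds):
        survivors = {c for c in survivors if c in pvals and pvals[c] <= cutoff}
        history.append(sorted(survivors))
    return history


if __name__ == "__main__":
    # Hypothetical p-values for four candidate interaction terms.
    discovery = {"snp_A": 0.03, "snp_B": 0.20, "snp_C": 0.001, "snp_D": 0.04}
    test = {"snp_A": 0.008, "snp_C": 0.002, "snp_D": 0.30}
    confirm = {"snp_A": 0.02, "snp_C": 0.0004}
    stages = sequential_screen([discovery, test, confirm], [0.05, 0.01, 0.001])
    for name, kept in zip(("discovery", "test", "confirm"), stages):
        print(name, kept)
```

Because each stage tests only the survivors of the previous one, the strictest threshold is applied to a short list rather than to the full candidate set.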
Follow-up references
Overview of genetics / biology
Attia, J., et al. (2009) How to use an article about genetic association: A: Background concepts. JAMA, 301, 74-81.
Genetic association studies
Hirschhorn, J., & Daly, M. (2005) Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics, 6, 95-108.
Attia, J., et al. (2009) How to use an article about genetic association: B: Are the results of the study valid? JAMA, 301, 191-197.
Cordell, H., & Clayton, D. (2005) Genetic epidemiology 3: Genetic association studies. Lancet, 366, 1121-1131.
Basic statistical modeling for genetics
Siegmund, D., & Yakir, B. (2007) The statistics of gene mapping. New York, Springer.
Sampling & statistical approaches for GxE discovery
Thomas, D. (2010) Gene-environment-wide association studies: emerging approaches. Nature Reviews Genetics, 11, 259-272.
Statistical strategies for combinatorial discovery
Hastie, T., Tibshirani, R., & Friedman, J. (2001) The elements of statistical learning. New York, Springer.
Perspectives on the State of the Field
How can we best promote the integration of genetic and demographic approaches?
Application clinic
Open microphone
1. What do you want to accomplish?
2. At what stage are you now?
   i. Study design?
   ii. Data collection?
   iii. Analysis and reporting?
3. How can we be of help?
Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
• Candidate gene studies
• Genome-wide association studies
• The bioinformatic “middle road” – biological hypotheses buy power
2. Environmental regulation of health (via transcription)
• Candidate transcript studies
  - RT-PCR
  - Statistical analyses incorporating temporal & spatial heterogeneity
• Genome-wide approaches
  - Microarrays
  - Theme discovery
    a. Functional (Gene Ontology)
    b. Regulatory (TELiS)
    c. Spatial (SpAnGEL)
[Figure: RT-PCR schematic (RNA, RT, DNA) and antiviral cytokine mRNA. Two panels plot IFN mRNA (fold-induction over baseline, 0-900) against exposure (0, 6, 12 hrs.) for CpG vs. CpG + NE; one panel is labeled as a consensus IFN mRNA. The Greek-letter suffixes of the IFN labels were lost in extraction.]
Collado-Hidalgo et al (2006) Brain, Behavior and Immunity
SIV RNA (in situ hybridization)
[Figure: two panels of SIV replication (sites / spatial quadrat), comparing Social Stress (- vs. +) and SNS neurons (- vs. +); p < .0001 in each panel.]
Sloan et al. (2006) Journal of Virology
Sloan et al. (2007) Journal of Neuroscience
[Figure: microarray comparison of Lonely vs. Integrated individuals (Social isolation; J. Cacioppo; Genome Biology, 2007), with numeric labels 78 and 131.]
Palmer et al. BMC Genomics (2006)
[Diagram, shown in successive builds: Social Environment, Gene (IL6; DNA, RNA), and Biological function.]
[Figure, shown in successive builds: Lonely vs. Integrated comparison (Social isolation; J. Cacioppo; Genome Biology, 2007; numeric labels 78 and 131), annotated with functional themes: Inflammation, Cell growth/differentiation, Transcription control, Immunoglobulin production, Type I interferon antiviral response.]
Genes listed: TRIM54, ACSBG2, HIST4H4, KLHL32, FLJ35773, GPC4, TRPV4, LBP, C20ORF200, ASB15, OCLM
http://www.gostat.wehi.edu.au
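Functional theme discovery asks whether a Gene Ontology category appears among the differentially expressed genes more often than chance would predict. One common way to frame this is a hypergeometric upper-tail probability. The sketch below illustrates that general idea with hypothetical counts; it is not presented as the GOstat implementation, and a real analysis would also correct for the many categories tested.

```python
from math import comb


def hypergeom_upper_tail(pop_size, pop_hits, sample_size, sample_hits):
    """P(X >= sample_hits) when drawing sample_size genes without replacement.

    pop_size: number of genes assayed (the background)
    pop_hits: background genes annotated to the category
    sample_size: number of differentially expressed genes
    sample_hits: differentially expressed genes annotated to the category
    """
    upper = min(pop_hits, sample_size)
    total = comb(pop_size, sample_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 1,000 genes assayed, 50 in the category,
    # 20 differentially expressed, 5 of those in the category.
    p = hypergeom_upper_tail(1000, 50, 20, 5)
    print(f"enrichment p-value = {p:.4g}")
```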
[Diagram, shown in successive builds: Social Environment, Gene (IL6; DNA, RNA), and Biological function, followed by a promoter schematic relating Environment, Promoter Sequence, and Expression through the transcription factors Sp1, CREB, and NF-κB; one build marks an unknown link with "?".]
Cole et al (2005) Bioinformatics, 21, 803
http://www.telis.ucla.edu
[Figure, shown in successive builds: Lonely vs. Integrated comparison (Social isolation; J. Cacioppo; Genome Biology, 2007; numeric labels 78 and 131), annotated first with NF-κB and then with NF-κB and GRE.]
NaB de-repression - fibroblast
[Schematic, shown in successive builds: a column of 41 genes (gene 1 to gene 41) with regulators added build by build: TF1-TF3, then miRNA1-miRNA3, then DNMT1-DNMT3.]
Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
• Candidate gene studies
• Genome-wide association studies
• The bioinformatic “middle road” – biological hypotheses buy power
2. Environmental regulation of health (via transcription)
• Candidate transcript studies
• Genome-wide approaches
3. Gene-Environment interaction
• Statistical considerations
  - Main effects and antagonistic pleiotropy
  - Interaction models
  - Combinatorial discovery
• Revisiting the “bioinformatic” middle road
  - Candidate set selection
    a. Regulatory polymorphism
    b. Coding polymorphism
Fisher’s regression:
[Figure: Outcome plotted by genotype (GG, GC, CC)]
y = a + b(#G)
y = a + b(GG) + c(GC) + d(CC)
Fisher’s regression:
[Figure: Outcome plotted by genotype (GG, GC, CC), shown separately for Environment A and Environment B]
y = a + b(#G) + c(Env) + d(#G x Env)
y = a + b(GG) + c(GC) + d(CC) + e(Env) + f(Env x GG) + g(Env x GC) + h(Env x CC)
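The additive interaction model y = a + b(#G) + c(Env) + d(#G x Env) can be fitted by ordinary least squares. The sketch below is a minimal pure-Python illustration on noise-free toy data; the coefficient values, the 0/1 coding of Env, and the helper names are assumptions made for the example. For the genotype-indicator form, one genotype is usually treated as the reference category when an intercept is included, so that the coefficients are identifiable.

```python
def design_row(n_g, env):
    """Row for y = a + b(#G) + c(Env) + d(#G x Env); #G is 0, 1, or 2."""
    return [1.0, float(n_g), float(env), float(n_g * env)]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def toy_data():
    """Hypothetical outcomes: no genetic main effect, but a GxE interaction."""
    cells = [(g, e) for e in (0, 1) for g in (0, 1, 2)]
    y = [1.0 + 0.5 * e + 0.8 * g * e for g, e in cells]
    return cells, y


if __name__ == "__main__":
    cells, y = toy_data()
    full = ols([design_row(g, e) for g, e in cells], y)
    main_only = ols([[1.0, float(g)] for g, _ in cells], y)
    print("interaction model a, b, c, d:", [round(v, 3) for v in full])
    print("main-effect model a, b:", [round(v, 3) for v in main_only])
```

In this toy setting the main-effect-only model y = a + b(#G) averages the genetic slope over both environments, so an effect present in only one environment appears diluted.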
Combinatorial explosion
10^7 SNPs x 10^1 to 10^2 environments = 10^8 to 10^9 interaction terms
N = 2,000-20,000 for current main effect studies
Given that power/effect size, need 2 million subjects for an interaction sweep.
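The arithmetic behind the explosion, and its effect on a Bonferroni-style significance threshold, can be checked directly. The family-wise alpha of 0.05 below is an assumption for illustration. The sketch does not attempt to reproduce the 2 million subject estimate, which depends on effect sizes not given on the slide.

```python
def interaction_terms(n_snps: int, n_envs: int) -> int:
    """Number of SNP x environment interaction terms in a full sweep."""
    return n_snps * n_envs


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that holds the family-wise rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n_snps = 10 ** 7
    for n_envs in (10, 100):
        n_terms = interaction_terms(n_snps, n_envs)
        cutoff = bonferroni_threshold(0.05, n_terms)
        print(f"{n_envs} environments: {n_terms:.0e} terms, per-test p < {cutoff:.0e}")
```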
What to do?
1. Increase stringency (intra-study)
- Bonferroni correct / FDR correct
- Model/simulate error
- Use a better sampling design
2. Replicate (inter-study or intra-study cross-validation)
3. Get a hypothesis
- Biological
- Empirical
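The two corrections named under "Increase stringency" behave differently on the same results. The sketch below compares Bonferroni with the Benjamini-Hochberg step-up procedure for FDR control. The p-values are hypothetical and alpha = 0.05 is an assumed level.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Find the largest rank k with p_(k) <= k * alpha / m, then reject the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            max_k = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values from ten tests.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:", sum(bonferroni_reject(pvals)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg_reject(pvals)))
```

Bonferroni controls the chance of any false positive, while Benjamini-Hochberg controls the expected fraction of false positives among the rejections, so it is typically less conservative when many tests are run.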
Combinatorial discovery strategies
Smart study design + smart statistics + biological constraint
[Figure: power (0.0-1.0) vs. sample size for two designs. Population prevalence design: sample size 0-20,000. Outcome-stratified design: sample size 0-2,000.]
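Why an outcome-stratified design can reach a given power with far fewer subjects can be sketched with a normal-approximation power formula for comparing carrier frequency between cases and controls. Everything numeric below (5% prevalence, carrier frequencies of 0.30 vs. 0.25, alpha of 0.05) is a hypothetical input and does not reproduce the curves in the figure. The population design is given the expected number of cases, n x prevalence, while the stratified design samples cases and controls 1:1.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_cases, p_controls, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided z-test on a difference in proportions.

    Normal approximation; the negligible opposite-tail term is ignored.
    """
    se = sqrt(p_cases * (1 - p_cases) / n_cases
              + p_controls * (1 - p_controls) / n_controls)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_cases - p_controls) / se - z_crit)

def population_design(n_total, prevalence, p_cases, p_controls):
    """Random population sample: expected cases = n_total * prevalence."""
    n_cases = n_total * prevalence
    return power_two_proportions(p_cases, p_controls, n_cases, n_total - n_cases)

def stratified_design(n_total, p_cases, p_controls):
    """Outcome-stratified sample: half cases, half controls."""
    return power_two_proportions(p_cases, p_controls, n_total / 2, n_total / 2)

# Hypothetical inputs, chosen only for illustration.
PREVALENCE = 0.05
P_CASES, P_CONTROLS = 0.30, 0.25

pop_power = population_design(2000, PREVALENCE, P_CASES, P_CONTROLS)
strat_power = stratified_design(2000, P_CASES, P_CONTROLS)
pop_power_large = population_design(20000, PREVALENCE, P_CASES, P_CONTROLS)
print(f"population design, n=2000:  {pop_power:.2f}")
print(f"stratified design, n=2000:  {strat_power:.2f}")
print(f"population design, n=20000: {pop_power_large:.2f}")
```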
• Stratified sampling
• Multi-stage testing
• Cross-validation
• Data-mining / Machine learning
- CART/forests
- MARS
- PRIM
• Functional pathways
• Regulatory pathways
• Chromosomal units
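Multi-stage testing from the list above can be sketched as a simple filter: candidates are screened at a lenient threshold in a discovery sample, and only the survivors are carried into later, stricter stages in independent samples. The thresholds, candidate labels, and p-values below are invented for illustration, and `multi_stage_screen` is a hypothetical helper.

```python
def multi_stage_screen(p_by_stage, thresholds):
    """Return the candidates that pass every stage in order.

    p_by_stage: list of dicts, one per stage, mapping candidate -> p-value.
    thresholds: per-stage p-value cutoffs, typically increasingly strict.
    Candidates not tested at a stage are treated as failing it.
    """
    survivors = set(p_by_stage[0])
    for stage_p, cutoff in zip(p_by_stage, thresholds):
        survivors = {c for c in survivors if stage_p.get(c, 1.0) <= cutoff}
    return survivors

# Invented p-values for four candidate G x E terms in three independent samples.
discovery = {"A": 0.04, "B": 0.03, "C": 0.20, "D": 0.01}
test = {"A": 0.005, "B": 0.30, "D": 0.009}
confirm = {"A": 0.0004, "D": 0.02}

confirmed = multi_stage_screen([discovery, test, confirm], (0.05, 0.01, 0.001))
print(confirmed)
```

Because each later stage tests only the survivors of the previous one, the multiple-testing burden shrinks as the stringency rises.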
In silico prediction of Gene x Environment Interaction
[Figure, in silico panel: IL6 promoter sequence TCT TGCGATGCTA AAG with the -174C variant; V$GATA1_01 = .943 and V$GATA1_01 = .619]
[Figure, in vitro panel: transcriptional activity (fold-change, 0-10) for IL6 promoter WT vs. -174C at norepinephrine (M) 0 and 10; difference: p < .0001]
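The in silico step, scoring how well each allele matches a transcription factor binding motif, can be sketched with a toy position matrix. The matrix, the sequences, and the scaling below are all hypothetical: this is not the V$GATA1_01 matrix, not the IL6 promoter sequence, and not necessarily the scoring rule used to obtain the .943 and .619 values on the slide. It only shows how a single-base change can lower a motif similarity score.

```python
# Toy base-frequency matrix for a 6-bp GATA-like motif (hypothetical values).
TOY_MATRIX = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
]

def similarity(site, matrix):
    """Motif score rescaled to [0, 1]: (score - min) / (max - min)."""
    score = sum(column[base] for base, column in zip(site, matrix))
    lowest = sum(min(column.values()) for column in matrix)
    highest = sum(max(column.values()) for column in matrix)
    return (score - lowest) / (highest - lowest)

def best_match(sequence, matrix):
    """Highest similarity over all windows of the motif's width."""
    width = len(matrix)
    return max(similarity(sequence[i:i + width], matrix)
               for i in range(len(sequence) - width + 1))

# Hypothetical promoter fragments differing at one base (A -> C).
reference_allele = "CCAGATAACC"
variant_allele = "CCAGCTAACC"

ref_score = best_match(reference_allele, TOY_MATRIX)
var_score = best_match(variant_allele, TOY_MATRIX)
print(f"reference: {ref_score:.3f}  variant: {var_score:.3f}")
```

A genome-wide version of this idea would scan promoter SNPs for alleles that change the predicted binding of an environmentally responsive transcription factor, which narrows the candidate set before any association testing.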
1205 GRE-modifying SNPs
[Slide: list of gene symbols, each with a position, e.g., FLJ20719 -734, LOC148490 -929, AKR7A2 -678, RHCE -292, ...]
[Figure: power (0.0-1.0) vs. sample size, with GEscan marked in each panel. Population prevalence design: sample size 0-20,000. Outcome-stratified design: sample size 0-2,000.]
Coding sequence polymorphisms
[Figure: gene 1 through gene 41, with TF1, TF2, and TF3]
Combinatorial discovery strategies
Smart study design + smart statistics + biological constraint
Why is this critical?
Antagonistic pleiotropy is the norm → GxE
Epistatic interaction is the norm → GxG
High-order interactions are likely normal → GxGxExE
Low power, “replication failure”, and epistemological slop
- the missing “h”, and the missing “E”
Take-home points for this group:
1. Gene-Environment interactions are likely far more…
- ubiquitous
- large in effect size
- clinically/socially meaningful
…than current genetic analyses presume.
There is plenty left for you to find.
2. If you have the study you have (i.e., can’t alter sampling design), your major opportunities for increasing power/discovery involve:
- focusing on substantive effects that are true/big (e.g., GxE, not G, given antagonistic pleiotropy; E, ExE, GxG, etc.)
- modeling biological mechanisms to focus power/impose constraints (e.g., candidate systems, functional themes, regulatory themes)
- combinatorial data-mining (e.g., machine learning in discovery sample)
- sequential testing designs (low stringency discovery, med stringency test, high stringency confirm)
Your advantage is smart data analysis.
Follow-up references
Overview of genetics / biology
Attia, J., et al. (2009) How to use an article about genetic association: A: Background concepts. JAMA, 301, 74-81.
Genetic association studies
Hirschhorn, J., & Daly, M. (2005) Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics, 6, 95-108.
Attia, J., et al. (2009) How to use an article about genetic association: B: Are the results of the study valid? JAMA, 301, 191-197.
Cordell, H., & Clayton, D. (2005) Genetic epidemiology 3: Genetic association studies. Lancet, 366, 1121-1131.
Basic statistical modeling for genetics
Siegmund, D., & Yakir, B. (2007) The statistics of gene mapping. New York, Springer.
Sampling & statistical approaches for GxE discovery
Thomas, D. (2010) Gene-environment-wide association studies: emerging approaches. Nature Reviews Genetics, 11, 259-272.
Statistical strategies for combinatorial discovery
Hastie, T., Tibshirani, R., & Friedman, J. (2001) The elements of statistical learning. New York, Springer.
Perspectives on the State of the Field
How can we best promote the integration of genetic and demographic approaches?
Application clinic
Open microphone
1. What do you want to accomplish?
2. At what stage are you now?
i. Study design?
ii. Data collection?
iii. Analysis and reporting?
3. How can we be of help?
Richlin et al. Brain, Behavior & Immunity (2004)