Pedigree Analysis in a Genome-wide World: Of Mice and Moms
Janet Sinsheimer PhD Prof. Human Genetics
Center Applied Statistics Seminar 11/15/2011
Gene Mapping with Pedigrees
• When markers were scarce, pedigrees provided the optimal study design for mapping the location of trait genes.
• Linkage analysis = use the patterns of transmission of trait phenotypes from parent to child and compare them to the patterns of transmission of genes whose locations are known.
• Marker genes = genes of known location.
Example of Linkage Analysis
Disease susceptibility gene between markers 4 and 5
Linkage Analysis’ Resolution is Poor
Most likely region spans ~3 million bases and could contain dozens of genes
What about Linkage Analysis using Model Organism Pedigrees?
• Model organisms can have traits similar to those of humans.
• Find genes by using planned matings of inbred founders in a controlled environment (which increases power), then look for analogous genes in humans.
• Mice have been used extensively.
• Highly inbred stocks lead to identical founders.
• Cheap to keep, with rapid generation times.
• Used often in medical research; can be genetically engineered to carry human genes.
• An extensive number of mutations have been carefully characterized.
Classic Inter-Cross Design
• Start with highly inbred strains as the founders.
• Mate to create F1 generation and then mate brothers and sisters to create the F2s.
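As a minimal single-locus sketch of the intercross (the labels "A" and "B" are hypothetical stand-ins for the two founder strains, and no recombination is modeled), enumerating the matings shows that every F1 is heterozygous and that the F2 genotype classes appear in the expected 1:2:1 ratio:

```python
from collections import Counter
from itertools import product

# Hypothetical labels: inbred founder strain A is homozygous A/A and strain B is B/B.
FOUNDER_A = ("A", "A")
FOUNDER_B = ("B", "B")


def offspring_genotypes(mother, father):
    """List the four equally likely offspring genotypes (one allele from each parent)."""
    return [tuple(sorted((m, f))) for m, f in product(mother, father)]


# F1 generation: every A x B offspring is heterozygous A/B.
f1 = offspring_genotypes(FOUNDER_A, FOUNDER_B)

# F2 generation: mate two F1 siblings (both A/B) and tally the genotype classes.
f2 = Counter(offspring_genotypes(f1[0], f1[1]))

print(dict(f2))  # {('A', 'A'): 1, ('A', 'B'): 2, ('B', 'B'): 1}
```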
Classical Inbred Crosses
• Classical crosses, such as intercrosses, use just two strains and exploit recent recombination events.
• Sparse marker maps.
• Statistical analysis is typically analysis of variance.
• Power to map is high, but the resolution is still low.
• Design does not take advantage of the inbred lines’ common histories.
Genome Projects have made Association Testing in Unrelated Individuals Practical
• Simple idea: a marker M is associated with trait T if trait values differ by marker genotype (ANOVA).
• To be powerful, association testing in unrelated individuals requires very closely spaced markers and a high percentage of affected individuals carrying the same variant.
• Common variant – common disease hypothesis: common diseases are caused in part by genetic variants that are also common but have small to moderate effect sizes.
• Easy and quick to implement: millions of tests can be run in a couple of hours on an average desktop computer.
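The single-marker test described above can be sketched as a one-way ANOVA F statistic computed once per marker. This is a minimal standard-library version with hypothetical toy data; in practice the p-value would come from the F distribution with (k - 1, n - k) degrees of freedom (for example via `scipy.stats.f.sf`), which is omitted here.

```python
def anova_f(trait, genotype):
    """One-way ANOVA F statistic comparing trait means across marker genotype groups."""
    groups = {}
    for y, g in zip(trait, genotype):
        groups.setdefault(g, []).append(y)
    n, k = len(trait), len(groups)
    grand_mean = sum(trait) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    # Between-group and within-group sums of squares.
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data: genotypes coded as 0, 1, 2 copies of the minor allele.
genotypes = [0, 0, 1, 1, 2, 2]
trait = [1.0, 3.0, 4.0, 6.0, 7.0, 9.0]
print(anova_f(trait, genotypes))  # 9.0
```

A genome-wide scan simply repeats this for every marker, which is why the approach is so fast.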
Genome-wide Association Study for Height
• Weedon et al. (2008) used ~30,000 individuals and more than ¼ million loci (SNPs).
Success or Failure? Statistical but no Clinical Significance.
• Strong statistical support that 20 genes are associated with height (even after accounting for multiple testing).
• P-values are very small, 2×10^-24 to 3×10^-7, rejecting the null hypothesis of no association.
• These 20 polymorphisms explain very little of the variation in height – less than the amount the average person shrinks from morning to evening.
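To see how a highly significant SNP can still explain little variation, the standard additive formula gives the variance contributed by one biallelic SNP under Hardy-Weinberg equilibrium as 2p(1 - p)β², where p is the allele frequency and β the per-allele effect. The numbers below (p = 0.5, β = 0.4 cm, trait SD = 7 cm) are hypothetical, not values from Weedon et al. (2008).

```python
def variance_explained(p, beta, trait_sd):
    """Fraction of trait variance explained by one additive biallelic SNP under HWE."""
    snp_variance = 2 * p * (1 - p) * beta ** 2  # additive genetic variance of the SNP
    return snp_variance / trait_sd ** 2


# Hypothetical values: allele frequency 0.5, 0.4 cm per allele, trait SD of 7 cm.
fraction = variance_explained(p=0.5, beta=0.4, trait_sd=7.0)
print(f"{fraction:.4%}")  # 0.1633%
```

With a sample of ~30,000 people, an effect this small can still yield a very small p-value.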
What Should We Try Next?
• Return to pedigrees, but don’t return to linkage analysis.
• Develop association methods that exploit the information available in pedigrees.
• First project: association using local strain origins for inbred strains.
• Second project: use maternal genetic effects to improve power.
Project 1 Exploits Recent Inbred Cross Design Innovations
• Pedigrees are deeper and multiple founder strains are used, providing more contrast and more recombination events for better resolution.
• Gene chips with dense marker maps exist.
• Strain phylogenies are better known.
• Quasi-random mating is sometimes used.
• Want to take the common histories of the different founder strains into account. This is straightforward if mating is nearly random; otherwise a new method is needed.
Collaborative Cross Design
But the method is quite general and allows for a variety of other crossing schemes.
Overview of our New Method
• For mouse progeny, determine the strain origins of small sections all over the chromosomes (local strain origins).
• Use these local strain origins as predictors of trait values in a regression analysis – more informative than using SNP genotypes.
• Also take into account the common genetic history of the progeny by modeling the polygenic background, which is a function of the global strain fractions.
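One way to code local strain origins as regression predictors is to count, at each locus, how many copies (0, 1, or 2) of each founder strain an individual carries across its maternal and paternal chromosomes. The sketch below assumes four hypothetical founder strains labeled A to D and an additive coding; because the counts always sum to 2, one strain is dropped as the reference, so an additive test at a locus has (number of strains - 1) degrees of freedom. It is an illustration only, not necessarily the coding used in Mendel.

```python
STRAINS = ["A", "B", "C", "D"]  # hypothetical founder strain labels


def strain_dosage(maternal, paternal, strains=STRAINS):
    """Copies (0, 1, or 2) of each founder strain at one locus."""
    return [int(maternal == s) + int(paternal == s) for s in strains]


def design_row(maternal, paternal, strains=STRAINS):
    """Intercept plus dosages for every strain except the first (the reference)."""
    return [1] + strain_dosage(maternal, paternal, strains)[1:]


# An individual whose maternal segment came from strain B and paternal segment from C.
print(strain_dosage("B", "C"))  # [0, 1, 1, 0]
print(design_row("B", "C"))     # [1, 1, 1, 0]
```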
The Global Strain Fractions for the Progeny Affect the Trait Values
• Both the mean and the covariance of a trait depend on the fraction of the genome contributed by a particular strain, the global strain fractions. Think of them as providing a genetic history for the mouse progeny.
• Global strain fractions are calculated recursively starting with the founders.
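A minimal sketch of the recursion, assuming the expected global strain fraction of a non-founder is the average of its parents' fractions (each parent transmits half of its genome on average) and that each founder is a fully inbred member of a single strain. The pedigree layout (id, mother, father), listed parents-first, and the individual ids are hypothetical.

```python
def global_strain_fractions(pedigree, founder_strain, strains):
    """Expected fraction of the genome from each founder strain, computed founders-first.

    pedigree: list of (id, mother, father) tuples, parents listed before children;
    founders have mother = father = None and a strain label in founder_strain.
    """
    fractions = {}
    for ind, mother, father in pedigree:
        if mother is None and father is None:
            # A fully inbred founder carries 100% of its own strain.
            fractions[ind] = [1.0 if s == founder_strain[ind] else 0.0 for s in strains]
        else:
            # Each parent contributes half of the child's genome on average.
            fractions[ind] = [(m + f) / 2 for m, f in zip(fractions[mother], fractions[father])]
    return fractions


strains = ["A", "B", "C", "D"]
pedigree = [
    ("a", None, None), ("b", None, None), ("c", None, None), ("d", None, None),
    ("x", "a", "b"),  # F1 of strains A and B
    ("y", "c", "d"),  # F1 of strains C and D
    ("z", "x", "y"),  # four-way cross
]
founders = {"a": "A", "b": "B", "c": "C", "d": "D"}
result = global_strain_fractions(pedigree, founders, strains)
print(result["z"])  # [0.25, 0.25, 0.25, 0.25]
```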
Local Strain Origins are Mean Predictors in the Association Analysis
• Instead of using the SNPs in a region as predictors – use the best guess of the maternally derived and paternally derived section of the chromosome.
• Need to have a dense set of markers.
• Imputation of local strain origins is done in the Mendel software by optimizing a penalized likelihood one individual at a time and assigning the individual the most likely strain.
• The penalty reduces the number of switches between founder strains.
• We found the accuracy of the algorithm to be very high (>98%).
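The idea of a penalty that discourages switches between founder strains can be illustrated with a small dynamic program (a Viterbi-style recursion). This is a hypothetical sketch, not Mendel's algorithm: it assumes a single phased haplotype with 0/1 alleles, known founder haplotypes, and a cost equal to the number of allele mismatches plus `penalty` times the number of strain switches.

```python
def impute_strain_path(haplotype, founders, penalty):
    """Assign a founder strain to each marker, minimizing mismatches + penalty * switches."""
    strains = list(founders)
    # cost[s]: lowest total cost of any path that is in strain s at the current marker.
    cost = {s: int(founders[s][0] != haplotype[0]) for s in strains}
    back = []  # back[i][s]: best previous strain when in strain s at marker i + 1
    for i in range(1, len(haplotype)):
        new_cost, pointers = {}, {}
        for s in strains:
            # Cheapest way to arrive in strain s, paying the penalty if we switch.
            prev = min(strains, key=lambda t: cost[t] + (penalty if t != s else 0))
            mismatch = int(founders[s][i] != haplotype[i])
            new_cost[s] = cost[prev] + (penalty if prev != s else 0) + mismatch
            pointers[s] = prev
        cost = new_cost
        back.append(pointers)
    # Trace the optimal path back from the best final strain.
    state = min(strains, key=cost.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


founders = {"A": [0, 0, 0, 0, 0, 0], "B": [1, 1, 1, 1, 1, 1]}  # hypothetical
noisy = [0, 0, 1, 0, 0, 0]  # one discordant allele, e.g. a genotyping error
print(impute_strain_path(noisy, founders, penalty=2.5))  # ['A', 'A', 'A', 'A', 'A', 'A']
print(impute_strain_path(noisy, founders, penalty=0.4))  # ['A', 'A', 'B', 'A', 'A', 'A']
```

A larger penalty absorbs an isolated discordant allele instead of inventing two extra switches; a small penalty lets noise fragment the path.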
Strain Association versus SNP Association
• Strain association can be far more informative and therefore more powerful.
• Note: analyzing two traits together is better than analyzing each alone.
Traits analyzed      Trait 1 alone   Trait 2 alone   Traits 1 and 2
SNP LRT              6.984           0.494           11.914
SNP DF               1               1               2
SNP p-value          0.0082          0.493           0.0026
Strain LRT           27.55           31.57           46.57
Strain CI width      0.90 Mb         6.46 Mb         0.73 Mb
Strain DF            3               3               6
Strain p-value       4.51×10^-6      6.44×10^-7      2.34×10^-7
Project 1 Future Work
• Map genes for multivariate traits using actual Collaborative Cross data.
• Collaborative Cross status:
• Both genotype and phenotype data are now available.
• http://csbio.unc.edu/CCstatus/index.py
• Project could make an excellent master’s thesis and would provide experience with very large genetic data sets and with genetic analyses, as well as method development opportunities.
Project 2: Determining Maternal Influences on Offspring Traits
• Human data project.
• Disturbances that affect fetal development may lead to adult diseases.
• The prenatal environment has been postulated to play a role in common diseases such as:
- Cardiovascular disease
- Anxiety and depression
- Diabetes
- Schizophrenia
- Obesity
- ADHD
• Maternal effects are difficult to detect in GWAS.
Prenatal Effects can have Genetic Origins
• Examples of genetically induced prenatal effects include maternal-fetal genotype incompatibilities.
• Maternal-fetal genotype (MFG) incompatibility = combinations of maternal and fetal genes that create an adverse prenatal environment and lead to disease in the offspring.
• Phenotypes induced by MFG incompatibility cluster in families and are heritable.
Hypothetical Example of Maternal-Fetal Genotype Incompatibility
§ Simple case: one locus, two alleles.
§ One allele codes for an antigen.
§ The other allele codes for nothing (null).
§ Mother is homozygous null, fetus is heterozygous.
§ The mother produces an immune response to the fetus’s antigen that is detrimental to the fetus.
§ Does this really occur?
[Figure: schematic of fetal antigen (+), null (-), and maternal IgG antibodies]
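For the hypothetical one-locus example, the frequency of incompatible mother-child pairs in a randomly mating population can be computed by enumerating parental genotypes under Hardy-Weinberg equilibrium and Mendelian transmission. The result equals the closed form q²p, with p the antigen allele frequency and q = 1 - p. The allele labels "D" (antigen) and "d" (null) and the function names are illustrative assumptions.

```python
from itertools import product


def genotype_freqs(p_antigen):
    """Hardy-Weinberg genotype frequencies for antigen allele 'D' and null allele 'd'."""
    q = 1 - p_antigen
    return {("D", "D"): p_antigen ** 2, ("D", "d"): 2 * p_antigen * q, ("d", "d"): q ** 2}


def incompatible(mother, child):
    """Hypothetical rule: mother is homozygous null and the child carries the antigen."""
    return mother == ("d", "d") and "D" in child


def prob_incompatible_pair(p_antigen):
    """P(a random mother-child pair is incompatible) under random mating."""
    freqs = genotype_freqs(p_antigen)
    total = 0.0
    for (mother, pm), (father, pf) in product(freqs.items(), repeat=2):
        # Each of the four allele combinations is transmitted with probability 1/4.
        for m_allele, f_allele in product(mother, father):
            child = tuple(sorted((m_allele, f_allele)))
            if incompatible(mother, child):
                total += pm * pf * 0.25
    return total


print(prob_incompatible_pair(0.5))  # 0.125
```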
RHD Incompatibility and Hemolytic Disease of the Newborn
Mom forms IgG antibodies against the baby’s expressed antigen and destroys the baby’s RBCs.
RhHDN: jaundice, kernicterus, hypoxia
Drawing courtesy of C. Palmer
Mom’s genotype = dd Baby’s genotype = Dd
Example 2: HLA Matching and Immunological Intolerance
Immunological intolerance = failure to stimulate an immune response that is needed to protect the baby from the mother or from exogenous agents. Immunological intolerance can increase the risk of disease.
Mom’s genotype = i/j Baby’s genotype = i/i (Matched from mom’s view)
[Figure: maternal (i/j) and fetal (i/i) genotypes]
Find Study Designs and Analysis Approaches that can Answer the Following Questions
§ Is there a high-risk allele that acts through the offspring’s genotype alone to increase risk of disease?
§ Is there a high-risk allele that acts through the mother’s genotype alone to increase risk of disease?
§ Are there combinations of maternal and offspring genotypes that increase risk of disease in the offspring?
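The three questions map naturally onto a relative-risk model for the offspring's disease risk. The sketch below assumes a multiplicative model with hypothetical parameter names: a baseline risk `delta`, relative risks `r1`, `r2` for one or two copies of the high-risk allele in the offspring, `s1`, `s2` for one or two copies in the mother, and `mu` for an MFG incompatibility (or matching) combination. Each question then corresponds to testing r1 = r2 = 1, s1 = s2 = 1, or mu = 1.

```python
def offspring_risk(child_copies, mother_copies, mfg, delta, r1, r2, s1, s2, mu):
    """Hypothetical multiplicative model for the probability that the offspring is affected.

    child_copies, mother_copies: copies (0, 1, 2) of the high-risk allele;
    mfg: True if the mother-child genotype combination is an MFG incompatibility/match.
    """
    child_rr = {0: 1.0, 1: r1, 2: r2}[child_copies]    # offspring genotype effect
    mother_rr = {0: 1.0, 1: s1, 2: s2}[mother_copies]  # maternal genotype effect
    mfg_rr = mu if mfg else 1.0                         # maternal-fetal combination effect
    return delta * child_rr * mother_rr * mfg_rr


params = dict(delta=0.01, r1=2.0, r2=4.0, s1=1.5, s2=3.0, mu=2.0)  # hypothetical values
print(offspring_risk(0, 0, False, **params))  # 0.01 (baseline)
print(offspring_risk(1, 2, True, **params))   # approximately 0.12 = 0.01 * 2 * 3 * 2
```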
Possible Study Designs
• Standard GWAS designs that use unrelated individuals are poorly powered to detect these effects (Sinsheimer 2003), so MFG incompatibilities can account for a portion of the missing heritability.
• Effective designs:
• Case-mother, control-mother designs, e.g., Chen J, Zheng H, Wilson ML. 2009.
• Nuclear family based “affected only” tests, e.g., Hsieh HJ et al., 2006, 2007.
Our Motivation: Finnish Schizophrenia Family Study
• 230 families (161 nuclear and 69 extended) composed of affected individuals and their available relatives from Finland.
• Largest family has 73 individuals, 32 of them genotyped.
• 553 affected individuals, 1-6 per pedigree, 60% males.
• 1090 individuals genotyped at HLA-B.
• Our original analysis used nuclear families and found a significant effect of HLA-B matching (Palmer et al. 2006).
Using Data from Complex Pedigrees in Nuclear Family MFG test
Which Family to choose?
[Figure: extended pedigree with individuals 1 and 437-449]
Using all of the Nuclear Families could introduce Bias, Inflate Significance
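One reason that splitting an extended pedigree into all of its nuclear families can inflate significance is that the same person is counted in several families, so the families are not independent. A minimal sketch with a hypothetical pedigree (ids invented for illustration, given as (id, mother, father)) lists the nuclear families and flags shared members:

```python
from collections import defaultdict


def nuclear_families(pedigree):
    """Group children by their (mother, father) pair."""
    families = defaultdict(list)
    for ind, mother, father in pedigree:
        if mother is not None and father is not None:
            families[(mother, father)].append(ind)
    return dict(families)


def shared_members(families):
    """Individuals who belong to more than one nuclear family (as parent or child)."""
    counts = defaultdict(int)
    for (mother, father), children in families.items():
        for member in (mother, father, *children):
            counts[member] += 1
    return sorted(m for m, c in counts.items() if c > 1)


# Hypothetical three-generation pedigree: m1 is a child of g1 x g2 and the mother of c1.
pedigree = [
    ("g1", None, None), ("g2", None, None), ("s1", None, None),
    ("m1", "g1", "g2"), ("u1", "g1", "g2"), ("c1", "m1", "s1"),
]
families = nuclear_families(pedigree)
print(families)                  # {('g1', 'g2'): ['m1', 'u1'], ('m1', 's1'): ['c1']}
print(shared_members(families))  # ['m1']
```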
How to Test for MFG Incompatibility using Extended Pedigrees?
• Want an approach that can use varied pedigree structures and incomplete data.
• Use the likelihood of the genotype patterns conditional on the affected individuals in the pedigree.
• Requires more assumptions than nuclear family or case-mother, control-mother tests.
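For the simplest case, an affected child with both parents genotyped, conditioning on the child's affection status gives each Mendelian child genotype a probability proportional to its transmission probability times its relative risk. The sketch below assumes that HLA matching (no child allele absent from the mother) multiplies risk by `mu` and that there are no other child genotype effects; the baseline risk and any maternal effect cancel because the mother's genotype is fixed. Extending this to extended pedigrees requires summing over the genotypes of ungenotyped relatives, which this sketch does not attempt.

```python
from itertools import product


def matched(mother, child):
    """Matched from the mother's view: the child carries no allele foreign to the mother."""
    return set(child) <= set(mother)


def child_genotype_probs_given_affected(mother, father, mu):
    """P(child genotype | parental genotypes, child affected) when matching has relative risk mu."""
    weights = {}
    for m_allele, f_allele in product(mother, father):
        child = tuple(sorted((m_allele, f_allele)))
        # Mendelian transmission probability (1/4 per allele pair) times relative risk.
        weight = 0.25 * (mu if matched(mother, child) else 1.0)
        weights[child] = weights.get(child, 0.0) + weight
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


probs = child_genotype_probs_given_affected(("i", "j"), ("i", "k"), mu=2.0)
print(probs)  # matched i/i and i/j each get 1/3; unmatched i/k and j/k each get 1/6
```

With mu = 1 every child genotype returns to its Mendelian probability of 1/4, which is the null hypothesis of no matching effect.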
So what do we get in Return?
• More accurate and precise estimates when we have extended pedigrees with more than one affected.
• Illustrate with Simulated Data
Example Simulation
• Extended pedigrees with 4 affected individuals
• 300 extended pedigrees
• 1000 data sets
• Variable relative risks due to matching
• Compare results of the nuclear family test (3 families per pedigree) with the extended pedigree test
Selected Simulation Results
          Extended Pedigrees                       Nuclear Families
True µ    Est µ   95% Coverage   Rejection Rate    Est µ   95% Coverage   Rejection Rate
1.00      0.987   0.965          0.046             0.986   0.969          0.042
1.50      1.490   0.950          0.856             1.438   0.937          0.807
2.50      2.504   0.953          0.999             2.316   0.908          0.991
Treating Data as Nuclear Families leads to slight loss of power and underestimates of MFG incompatibility
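The three summaries in the table can be computed from simulation replicates roughly as follows. This sketch assumes "Est µ" is the average estimate across data sets, "95% Coverage" is the fraction of 95% confidence intervals containing the true µ, and "Rejection Rate" is the fraction of data sets with p < 0.05; the toy replicate values are hypothetical.

```python
def summarize(true_mu, estimates, intervals, p_values, alpha=0.05):
    """Mean estimate, confidence-interval coverage, and rejection rate over replicates."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    coverage = sum(lo <= true_mu <= hi for lo, hi in intervals) / n
    rejection_rate = sum(p < alpha for p in p_values) / n
    return mean_estimate, coverage, rejection_rate


# Four hypothetical replicates simulated with true mu = 1.5.
estimates = [1.4, 1.6, 1.8, 1.2]
intervals = [(1.1, 1.8), (0.9, 2.3), (1.6, 2.1), (0.9, 1.4)]
p_values = [0.01, 0.20, 0.001, 0.30]
print(summarize(1.5, estimates, intervals, p_values))  # approximately (1.5, 0.5, 0.5)
```

Under the null (true µ = 1.00), the rejection rate estimates the type I error, which is why values near 0.05 are expected in the first row.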
What about the Finnish Schizophrenia Example?
Model          Male µ (95% CI)         Female µ (95% CI)       Log Likelihood
Null           1.0                     1.0                     -2868.179
Full           0.890 (0.687, 1.153)    1.449 (1.109, 1.892)    -2864.638
Only Female    1.0                     1.417 (1.089, 1.843)    -2865.042
• Reject the null of no MFG matching in favor of the full model (p-value = 0.029).
• Reject the null of no MFG matching in favor of the female-only effect model (p-value = 0.012).
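The two p-values can be reproduced from the log likelihoods in the table with a likelihood ratio test, assuming a chi-square reference distribution with 2 degrees of freedom for the full model (male and female µ both free) and 1 degree of freedom for the female-only model. The closed-form chi-square tail probabilities for 1 and 2 degrees of freedom keep the sketch within the Python standard library.

```python
import math


def chi2_sf(x, df):
    """Chi-square upper-tail probability, closed forms for df = 1 or df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or 2")


def lrt_p_value(loglik_null, loglik_alt, df):
    """Likelihood ratio statistic 2 * (logL_alt - logL_null) and its p-value."""
    lrt = 2 * (loglik_alt - loglik_null)
    return lrt, chi2_sf(lrt, df)


full = lrt_p_value(-2868.179, -2864.638, df=2)
female = lrt_p_value(-2868.179, -2865.042, df=1)
print(f"full: LRT = {full[0]:.3f}, p = {full[1]:.3f}")              # full: LRT = 7.082, p = 0.029
print(f"female only: LRT = {female[0]:.3f}, p = {female[1]:.3f}")  # female only: LRT = 6.274, p = 0.012
```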
Project 2 Future Work
• Need to make the analysis more efficient: better algorithms to speed it up.
• How can we extend the method to handle quantitative traits?
• Again, this could make an excellent master’s project or part of a PhD dissertation.
References and Software
• Mouse references:
• Zhou JJ, Ghazalpour A, Sobel EM, Sinsheimer JS, Lange K. QTL Association Mapping by Imputation of Strain Origins in Multifounder Crosses. Under review.
• Bauman LE, Sinsheimer JS, Sobel EM, Lange K. (2008) Genetics. 180:1743-61.
• MFG references:
• Childs EJ, Sobel EM, Palmer CG, Sinsheimer JS. (2011) Hum Hered. 72:160-171.
• Childs EJ, Palmer CG, Lange K, Sinsheimer JS. (2010) Genet Epidemiol. 34:512-21.
• Both methods are implemented in the “Inbred Strains Analysis” option of Mendel, version 11.0 and higher: www.genetics.ucla.edu/software
Acknowledgements
• Collaborators: E. Childs, A. Ghazalpour, K. Lange, C. Palmer, E.M. Sobel, J.J. Zhou
• Funding NIH GM53275 and MH59490