Pedigree Analysis in a Genome-wide World: Of Mice and Moms

Janet Sinsheimer, PhD, Prof. Human Genetics

Center Applied Statistics Seminar 11/15/2011

Gene Mapping with Pedigrees

•  When markers were scarce, pedigrees provided the optimal study design to map the location of trait genes.

•  Linkage analysis = use the patterns of transmission of trait phenotypes from parent to child and compare them to the patterns of transmission of genes whose locations are known.

•  Marker genes = genes of known location.

Example of Linkage Analysis

Disease susceptibility gene between markers 4 and 5

Linkage Analysis’ Resolution is Poor

Most likely region ~ 3 million bases, could be dozens of genes

What about Linkage Analysis using Model Organism Pedigrees?

•  Model organisms can have traits similar to those of humans.

•  Find genes by using planned matings of inbred founders under a controlled environment (increases power), then look for analogous genes in humans.

•  Mice have been used extensively.

•  Highly inbred stocks lead to identical founders.

•  Cheap to keep and rapid generation times.

•  Used often in medical research. Can be genetically engineered to carry human genes.

•  An extensive number of mutations have been carefully characterized.

Classic Inter-Cross Design

•  Start with highly inbred strains as the founders.

•  Mate to create F1 generation and then mate brothers and sisters to create the F2s.

Classical Inbred Crosses

•  Classical crosses, such as intercrosses, use just two strains and exploit recent recombination events.

•  Sparse marker maps.

•  Statistical analysis typically is analysis of variance.

•  Power to map is high but the resolution is still low. Design does not take advantage of inbred lines’ common histories.

Genome Projects have made Association Testing in Unrelated Individuals Practical

•  Simple idea: A marker M is associated with trait T if trait values differ by marker genotype – ANOVA.

•  To be powerful, association testing in unrelateds requires very closely spaced markers and a high percentage of affected individuals having the same variant.

•  Common variant – Common disease hypothesis. Common diseases are caused in part by genetic variants that are also common but have small to moderate effect sizes.

•  Easy and quick to implement - do millions of tests in a couple of hours using an average desktop computer.
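
The "trait values differ by marker genotype" idea can be sketched as a one-way ANOVA F statistic for a single marker. This is only an illustration under simple assumptions (unrelated individuals, no covariates, no multiple-testing correction); the function name and the toy data are hypothetical and do not come from any GWAS package.

```python
from collections import defaultdict


def anova_f(genotypes, traits):
    """One-way ANOVA F statistic for trait values grouped by marker genotype.

    Returns (F, df_between, df_within). Illustrative sketch only.
    """
    groups = defaultdict(list)
    for g, y in zip(genotypes, traits):
        groups[g].append(y)

    n = len(traits)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two genotype groups and n > groups")

    grand_mean = sum(traits) / n
    # Between-genotype sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    # Within-genotype sum of squares: spread of individuals around their group mean.
    ss_within = sum(
        (y - sum(v) / len(v)) ** 2 for v in groups.values() for y in v
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: genotype coded as the number of copies of one allele (0, 1, 2).
f_stat, df1, df2 = anova_f([0, 0, 1, 1, 2, 2], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

A genome-wide scan would repeat this test at every marker, which is why millions of tests are cheap: each one is a handful of sums.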

Genome-wide Association Study for Height

•  Weedon et al. (2008) used ~30,000 individuals and more than ¼ million loci (SNPs).

Success or Failure? Statistical but no Clinical Significance.

•  Strong statistical support that 20 genes are associated with height (even after accounting for multiple testing).

•  P-values are very small, 2 x 10^-24 to 3 x 10^-7, reject the null hypothesis of no association.

•  These 20 polymorphisms explain very little of the variation in height – less than the amount the average person shrinks from morning to evening.

What Should we try next?

•  Return to pedigrees but don’t return to linkage analysis.

•  Develop association methods that exploit the information available in pedigrees.

•  First project: Association using local strain origins for inbred strains.

•  Second project: Use Maternal Genetic Effects to improve power.

Project 1 Exploits Recent Inbred Cross Design Innovations

•  Pedigrees are deeper and multiple founder strains are used providing more contrast and more recombination events for better resolution.

•  Gene chips with dense marker maps exist.

•  Strain phylogenies are better known.

•  Quasi-random mating sometimes used.

•  Want to take the common histories of the different founder strains into account. Straightforward to do if mating is nearly random. Otherwise need a new method.

Collaborative Cross Design

But the method is quite general and allows for a variety of other crossing schemes.

Overview of our New Method

•  For mice progeny, determine the strain origins for small sections all over the chromosomes (local strain origins).

•  Use these local strain origins as predictors of trait values in a regression analysis – more informative than using SNP genotypes.

•  Also take into account the common genetic history of the progeny by modeling the polygenic background, which is a function of the global strain fractions.

The Global Strain Fractions for the Progeny Affect the Trait Values

•  Both the mean and the covariance of a trait depend on the fraction of the genome contributed by a particular strain, the global strain fractions. Think of them as providing a genetic history for the mouse progeny.

•  Global strain fractions are calculated recursively starting with the founders.
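
A minimal sketch of the recursive calculation, assuming that each founder is 100% its own strain and that a child's expected fraction for each strain is the average of its parents' fractions. The data layout (`pedigree`, `founder_strain`) and the function name are hypothetical and are not Mendel's input format.

```python
def global_strain_fractions(pedigree, founder_strain):
    """Expected genome fraction from each founder strain for every individual.

    pedigree: dict mapping individual -> (father, mother), or None for founders.
    founder_strain: dict mapping each founder -> its strain label.
    """
    cache = {}

    def fractions(ind):
        if ind in cache:
            return cache[ind]
        parents = pedigree[ind]
        if parents is None:
            # Founder: the whole genome comes from its own inbred strain.
            result = {founder_strain[ind]: 1.0}
        else:
            # Non-founder: half of each parent's fractions, summed by strain.
            result = {}
            for parent in parents:
                for strain, frac in fractions(parent).items():
                    result[strain] = result.get(strain, 0.0) + 0.5 * frac
        cache[ind] = result
        return result

    return {ind: fractions(ind) for ind in pedigree}


# Toy cross: two founders, an F1, and a backcross to founder "a".
pedigree = {
    "a": None,
    "b": None,
    "f1": ("a", "b"),
    "bc": ("f1", "a"),
}
fractions = global_strain_fractions(pedigree, {"a": "StrainA", "b": "StrainB"})
```

In this toy cross the F1 is half of each strain and the backcross animal is three quarters StrainA, which is the kind of genetic history the polygenic background is meant to capture.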

Local Strain Origins are Mean Predictors in the Association Analysis

•  Instead of using the SNPs in a region as predictors – use the best guess of the maternally derived and paternally derived section of the chromosome.

•  Need to have a dense set of markers.

•  Imputation of local strain origins is done in Mendel software by minimizing a penalized likelihood one individual at a time and assigning the individual the most likely strain.

•  The penalty reduces the number of switches between founder strains.

•  We found the accuracy of the algorithm to be very high (>98%).
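
The role of the switch penalty can be illustrated with a small dynamic program. This is not Mendel's algorithm: it is a simplified stand-in that assumes a single already-phased haplotype, counts allele mismatches instead of evaluating a likelihood, and uses hypothetical names and default costs throughout.

```python
def impute_strain_path(observed, founder_alleles, switch_penalty=2.0,
                       mismatch_cost=1.0):
    """Assign a founder strain to every marker along one haplotype.

    Minimizes (mismatch_cost * allele mismatches) + (switch_penalty * strain
    switches) by dynamic programming. Simplified illustration only.
    """
    strains = list(founder_alleles)
    n = len(observed)
    if n == 0:
        return []

    def emit(strain, m):
        return 0.0 if founder_alleles[strain][m] == observed[m] else mismatch_cost

    cost = {s: emit(s, 0) for s in strains}
    back = []  # back[m - 1][s] = best strain at marker m - 1 given s at marker m
    for m in range(1, n):
        best_prev = min(strains, key=lambda s: cost[s])
        new_cost, pointers = {}, {}
        for s in strains:
            stay = cost[s]
            switch = cost[best_prev] + switch_penalty
            if stay <= switch:
                pointers[s], base = s, stay
            else:
                pointers[s], base = best_prev, switch
            new_cost[s] = base + emit(s, m)
        back.append(pointers)
        cost = new_cost

    # Trace back from the cheapest final state.
    state = min(strains, key=lambda s: cost[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


founders = {"A": [0, 0, 0, 0, 0, 0], "B": [1, 1, 1, 1, 1, 1]}
recombinant = impute_strain_path([0, 0, 0, 1, 1, 1], founders)
one_error = impute_strain_path([0, 0, 1, 0, 0, 0], founders)
```

With the penalty set to 2, a sustained change in alleles is explained by one switch from A to B, while a single discordant marker is treated as a mismatch rather than as two switches. That is the sense in which the penalty reduces the number of switches between founder strains.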

Strain Association versus SNP Association

•  Strain association can be far more informative and therefore more powerful.

•  Note: working with two traits together is better than each alone.

Traits analyzed             Trait 1 alone      Trait 2 alone      Traits 1 and 2
SNP LRT                     6.984              0.494              11.914
SNP DF                      1                  1                  2
SNP p-value                 0.0082             0.493              0.0026
Strain LRT (CI interval)    27.55 (0.90 Mb)    31.57 (6.46 Mb)    46.57 (0.73 Mb)
Strain DF                   3                  3                  6
Strain p-value              4.51 x 10^-6       6.44 x 10^-7       2.34 x 10^-7

Project 1 Future Work

•  Map Genes for Multivariate Traits using Actual Collaborative Cross Data

•  Collaborative Cross Status:

•  Both genotype and phenotype data now available

•  http://csbio.unc.edu/CCstatus/index.py

•  Project could make an excellent master’s thesis and would provide experience with very large genetic data sets and with genetic analyses, as well as method development opportunities.

Project 2: Determining Maternal Influences on Offspring Traits

•  Human Data Project

•  Disturbances that affect fetal development may lead to adult diseases.

•  Prenatal environment has been postulated to have a role in common diseases such as:

-  Cardiovascular disease

-  Anxiety and Depression

-  Diabetes

-  Schizophrenia

-  Obesity

-  ADHD

•  Maternal effects are difficult to detect in GWAS.

Prenatal Effects can have Genetic Origins

•  Examples of genetically induced prenatal effects include maternal-fetal genotype incompatibilities.

•  Maternal-fetal genotype (MFG) incompatibility = Combinations of maternal and fetal genes that create an adverse prenatal environment and lead to disease in the offspring.

•  Phenotypes induced by MFG incompatibility cluster in families and are heritable.

Hypothetical Example of Maternal-Fetal Genotype Incompatibility

§  Simple Case: One locus, two alleles

§  One allele codes for an antigen.

§  The other allele codes for nothing (null).

§  Mother is homozygous null, fetus is heterozygous.

§  The mother produces an immune response to the fetus’ antigen that is detrimental to the fetus.

§  Does this really occur?

RHD Incompatibility and Hemolytic Disease of the Newborn

Mom’s genotype = dd; Baby’s genotype = Dd

Mom forms IgG antibodies against baby’s expressed antigen, and these antibodies destroy the baby’s RBCs.

RhHDN: Jaundice, Kernicterus, Hypoxia

[Figure: cells labeled + and -, with IgG antibodies. Drawing courtesy of C. Palmer]

Example 2: HLA Matching and Immunological Intolerance

Immunological intolerance = failure to stimulate an immune response that is needed to protect the baby from mother or exogenous agents. Immunological intolerance can increase risk of disease

Mom’s genotype = i/j Baby’s genotype = i/i (Matched from mom’s view)

[Figure: cells labeled i/j (mom’s genotype) and i/i (baby’s genotype)]
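
The two examples can be written as simple indicator functions on a mother-child genotype pair. This is a sketch under stated assumptions: genotypes are represented as tuples of allele labels, RHD incompatibility is taken to mean a mother with no D allele carrying a child with at least one, and "matched from mom’s view" is taken to mean that the child carries no allele the mother lacks. The function names are hypothetical.

```python
def rhd_incompatible(mother, child):
    """True when the mother lacks the D allele but the child carries it.

    Example from the slide: mother ("d", "d"), child ("D", "d").
    """
    return "D" not in mother and "D" in child


def hla_matched(mother, child):
    """True when every allele of the child is also carried by the mother.

    Example from the slide: mother ("i", "j"), child ("i", "i").
    """
    return set(child) <= set(mother)


print(rhd_incompatible(("d", "d"), ("D", "d")))  # mother dd, baby Dd
print(hla_matched(("i", "j"), ("i", "i")))       # matched from mom's view
```

Both indicators depend on the pair of genotypes rather than on either genotype alone, which is what separates an MFG effect from a purely maternal or purely offspring effect.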

Find Study Designs and Analysis Approaches that can Answer the Following Questions

§  Is there a high risk allele that acts through the offspring’s genotype alone to increase risk of disease?

§  Is there a high risk allele that acts through the mother’s genotype alone to increase risk of disease?

§  Are there combinations of maternal and offspring’s genotypes that increase risk of disease in the offspring?

Possible Study Designs

•  Standard GWAS designs that use unrelated individuals are poorly powered to detect these effects (Sinsheimer 2003), and so MFG incompatibilities can account for a portion of the missing heritability.

•  Effective Designs:

•  Case-Mother, Control-Mother, e.g. Chen J, Zheng H, Wilson ML. 2009

•  Nuclear family based “affected only” tests, e.g. Hsieh HJ et al., 2006, 2007.

Our Motivation: Finnish Schizophrenia Family Study

•  230 families (161 nuclear and 69 extended) comprised of affected individuals and their available relatives from Finland.

•  Largest family has 73 individuals, 32 of them genotyped.

•  553 affected individuals, 1-6 per pedigree, 60% males.

•  1090 individuals genotyped at HLA B.

•  Our original analysis used nuclear families and found a significant effect of HLA B matching (Palmer et al. 2006).

Using Data from Complex Pedigrees in Nuclear Family MFG test

Which Family to choose?

[Figure: extended pedigree containing individuals 1 and 437-449]

Using all of the Nuclear Families could introduce Bias, Inflate Significance

[Figure: the same extended pedigree, individuals 1 and 437-449]

How to Test for MFG Incompatibility using Extended Pedigrees?

•  Want an approach that can use varied pedigree structures and incomplete data.

•  Use the likelihood of the genotype patterns conditional on the affecteds in the pedigree.

•  Requires more assumptions than a nuclear family or case-mom, control-mom tests.

So what do we get in Return?

•  More accurate and precise estimates when we have extended pedigrees with more than one affected.

•  Illustrate with Simulated Data

Example Simulation

•  Extended pedigrees with 4 affecteds

•  300 extended pedigrees

•  1000 data sets

•  Variable relative risks due to matching

•  Compare results of the nuclear family test (3 families per pedigree) with extended pedigree test

Selected Simulation Results

         Extended Pedigrees                         Nuclear Families
µ        Est µ    95% Coverage    Rejection Rate    Est µ    95% Coverage    Rejection Rate
1.00     0.987    0.965           0.046             0.986    0.969           0.042
1.50     1.490    0.950           0.856             1.438    0.937           0.807
2.50     2.504    0.953           0.999             2.316    0.908           0.991

Treating Data as Nuclear Families leads to slight loss of power and underestimates of MFG incompatibility

What about the Finnish Schizophrenia Example?

Model          Male µ (95% CI)          Female µ (95% CI)        Log Likelihood
Null           =1.0                     =1.0                     -2868.179
Full           0.890 (0.687, 1.153)     1.449 (1.109, 1.892)     -2864.638
Only Female    =1.0                     1.417 (1.089, 1.843)     -2865.042

•  Reject null of no MFG matching in favor of full model (p-value = 0.029)

•  Reject null of no MFG matching in favor of female effect (p-value = 0.012)
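
The two p-values can be checked from the log likelihoods reported above with a likelihood ratio test. The sketch assumes the usual chi-square reference distribution, with degrees of freedom equal to the number of µ parameters freed relative to the null (2 for the full model, 1 for the female-only model); that choice of degrees of freedom is an assumption, not something stated on the slide. The helper names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total


def lrt_p_value(loglik_null, loglik_alt, df):
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null) and its p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df)


full_stat, full_p = lrt_p_value(-2868.179, -2864.638, df=2)
female_stat, female_p = lrt_p_value(-2868.179, -2865.042, df=1)
print(f"full model:  LRT = {full_stat:.3f}, p = {full_p:.3f}")
print(f"female only: LRT = {female_stat:.3f}, p = {female_p:.3f}")
```

Under these assumptions the statistics are 7.082 and 6.274, and the p-values round to the 0.029 and 0.012 quoted in the bullets.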

Project 2 Future Work

•  Need to make the analysis more efficient. Better algorithms to speed up.

•  How can we extend the method to handle quantitative traits?

•  Again, this could make an excellent master’s project or part of a PhD dissertation.

References and Software

•  Mouse references:

•  Zhou JJ, Ghazalpour A, Sobel EM, Sinsheimer JS, Lange K. “QTL Association Mapping by Imputation of Strain Origins in Multifounder Crosses”, under review.

•  Bauman LE, Sinsheimer JS, Sobel EM, Lange K. (2008) Genetics. 180:1743-61.

•  MFG references:

•  Childs EJ, Sobel EM, Palmer CG, Sinsheimer JS. (2011) Hum Hered. 72:160-171.

•  Childs EJ, Palmer CG, Lange K, Sinsheimer JS. (2010) Genet Epidemiol. 34:512-21.

•  Both methods implemented in the “Inbred Strains Analysis” Option, Mendel version 11.0 and higher: www.genetics.ucla.edu/software

Acknowledgements

•  Collaborators: E. Childs, A. Ghazalpour, K. Lange, C. Palmer, E.M. Sobel, J.J. Zhou

•  Funding NIH GM53275 and MH59490
