Theory of IBD sharing in the Wright-Fisher model

Shai Carmi, Pier Francesco Palamara, Vladimir Vacic, and Itsik Pe’er

Department of Computer Science, Columbia University, New York, NY

References

1. A. Gusev et al., Whole population, genome-wide mapping of hidden relatedness, Genome Res. 19, 318 (2009).
2. B. L. Browning and S. R. Browning, A fast, powerful method for detecting identity-by-descent, AJHG 88, 173 (2011).
3. S. R. Browning and B. L. Browning, Identity by Descent Between Distant Relatives: Detection and Applications, Annu. Rev. Genet. 46, 615 (2012).
4. P. F. Palamara et al., Length Distributions of Identity by Descent Reveal Fine-Scale Demographic History, AJHG (2012).
5. H. Li and R. Durbin, Inference of human population history from individual whole-genome sequences, Nature 475, 493 (2011).
6. A. Gusev et al., Low-Pass Genome-Wide Sequencing and Variant Inference Using Identity-by-Descent in an Isolated Human Population, Genetics 190, 679 (2012).
7. Y. Shen et al., Coverage tradeoffs and power estimation in the design of whole-genome sequencing experiments for detecting association, Bioinformatics 27, 1995 (2011).

Sections:
• Background: Identity-by-descent
• Distribution of the total sharing: renewal theory
• Direct calculation of the variance
• The cohort-averaged sharing and imputation by IBD
• More applications and conclusions

[Figure: two individuals, A and B, with a shared segment marked on their chromosomes.]

Background: Identity-by-descent

• In populations that have recently undergone strong genetic drift, most individuals share a very recent common ancestor.
• Long haplotypes are frequently shared identical-by-descent (IBD).
• Algorithms can detect IBD shared segments between all pairs in large cohorts based on either segment length or frequency [1,2].
• Applications:
o Demographic inference
o Imputation
o Phasing
o Association of rare variants/haplotypes
o Pedigree reconstruction
o Detection of positive selection
o See review [3]

How does the amount of sharing depend on the demographic history of the population?

The Wright-Fisher model:
• Non-overlapping, discrete generations.
• Constant size of N haploid individuals, or a changing size.
• Ignore recent mutations.
• Recombination is a Poisson process.
• Each pair of individuals (lineages) has probability 1/N to coalesce in the previous generation.
• For continuous time and large population size, approximated by the coalescent.
• (Scaled) time to most recent common ancestor: $t$, in units of N generations, with PDF $e^{-t}$ for constant size.

• Assume a segment can be detected only if it is longer than m (Morgans).

• Denote the fraction of the chromosome shared between two random individuals as the total sharing fT.

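• Palamara et al. [4] derived the mean total sharing, $\langle f_T \rangle$, for a given demographic history.

• For constant size, $\langle f_T \rangle = \frac{1 + 4mN}{(1 + 2mN)^2}$.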

• Used to infer population histories.
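The constant-size mean can be checked numerically. The sketch below is an added illustration, not the authors' code. It assumes that, given a common ancestor g generations ago, the segment around a site extends an exponential distance with rate 2g on each side, so it is at least m long with probability (1 + 2mg)e^(-2mg). It then averages this probability over the geometric coalescence time of the discrete model. The function names and the truncation `max_gen` are hypothetical.

```python
import math


def mean_sharing_closed_form(m, N):
    """Coalescent (continuous-time) mean total sharing for constant size N."""
    x = m * N
    return (1.0 + 4.0 * x) / (1.0 + 2.0 * x) ** 2


def mean_sharing_discrete(m, N, max_gen=None):
    """Average P(segment >= m | g) over geometric coalescence times g = 1, 2, ..."""
    if max_gen is None:
        max_gen = 50 * N  # assumption: the geometric tail beyond this is negligible
    p = 1.0 / N           # per-generation coalescence probability of a pair
    total = 0.0
    survive = 1.0         # probability of no coalescence in the first g - 1 generations
    for g in range(1, max_gen + 1):
        prob_g = survive * p
        total += prob_g * (1.0 + 2.0 * m * g) * math.exp(-2.0 * m * g)
        survive *= 1.0 - p
    return total


if __name__ == "__main__":
    m, N = 0.01, 1000  # 1 cM cutoff, 1000 haploid individuals
    print(mean_sharing_closed_form(m, N), mean_sharing_discrete(m, N))
```

The two values should agree up to corrections of order 1/N, which is the sense in which the coalescent approximates the discrete model.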

Questions:
• Distribution? Higher moments?
• Differences between individuals?
• Applications [imputation by IBD, sharing between siblings]

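Distribution of the total sharing: renewal theory

[Figure: chromosomes A and B along the coordinate from 0 to L, divided into blocks of lengths ℓ1, ..., ℓ11; with length cutoff m, the total shared length is ℓT = ℓ1 + ℓ5 + ℓ9.]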

• In each block, the two chromosomes maintain the same ancestor.
• Blocks (segments) end at recombination events.
• Define ℓT as the total length of segments having length ≥ m.
• In the Sequentially Markov Coalescent, fT = ℓT/L.

• Li and Durbin [5] derived the distribution of t at segment ends.
• For a given t, the probability of no recombination within distance ℓ is $e^{-2Nt\ell}$ (t in units of N generations, ℓ in Morgans). Averaging over t at segment ends gives the segment length PDF P(ℓ) (see also [4]).
• For constant size, both distributions have explicit forms.
• Find the distribution of ℓT using renewal theory. Map:
o Coordinate on chromosome → time (t)
o Shared segments → waiting times between events
o L → T, ℓT → tS
o Segment length PDF P(ℓ) → waiting time PDF ψ(τ)
• Laplace transform the PDF PT(tS) → Ps(u).

The PDF of the number of shared segments (Laplace transformed T → s)
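The renewal picture can be illustrated by simulation. The sketch below is an added illustration under stated assumptions, not the authors' derivation. It assumes a constant population size and that t at segment ends has PDF t·e^(-t), which gives the segment length PDF P(ℓ) = 4N/(1 + 2Nℓ)^3. These explicit forms are not visible in the text above. Segments are drawn one after another until the chromosome is filled, and fT = ℓT/L is the fraction in segments of length at least m. All function names are hypothetical.

```python
import random


def sample_segment_length(N, rng):
    """Inverse-CDF draw from P(l) = 4N / (1 + 2Nl)^3, whose CDF is 1 - (1 + 2Nl)^-2."""
    u = rng.random()
    return ((1.0 - u) ** -0.5 - 1.0) / (2.0 * N)


def simulate_total_sharing(N, L, m, rng):
    """Tile a chromosome of length L (Morgans) with segments; return f_T = l_T / L."""
    pos = 0.0
    shared = 0.0
    while pos < L:
        seg = min(sample_segment_length(N, rng), L - pos)  # truncate at chromosome end
        if seg >= m:
            shared += seg  # only segments of length >= m count as detectable sharing
        pos += seg
    return shared / L


def mean_simulated_sharing(N, L, m, replicates, seed=0):
    """Average f_T over independent pairs of chromosomes."""
    rng = random.Random(seed)
    total = sum(simulate_total_sharing(N, L, m, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    N, L, m = 1000, 2.0, 0.01
    expected = (1 + 4 * m * N) / (1 + 2 * m * N) ** 2
    print(mean_simulated_sharing(N, L, m, replicates=300), expected)
```

Under these assumptions the simulated mean should be close to (1 + 4mN)/(1 + 2mN)^2, while individual replicates scatter widely around it because long segments are rare but carry most of the shared length.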

Direct calculation of the variance

• A general equation for the variance of the total sharing fT: $\mathrm{Var}[f_T] = \frac{1}{M^2}\sum_{s_1}\sum_{s_2}\left[\pi_2(s_1,s_2) - \pi^2\right]$.
• M: number of markers; the sums are over all markers.
• I(s): indicator that site s lies on a shared segment; equals 1 with probability π.
• π2(s1,s2): probability that both sites s1 and s2 lie on shared segments.
• A simple approximation is available, with an explicit form for a constant size population.
• A full solution of the variance (only key equations shown):
• $\pi_2 = p_{nr}\,\pi_{nr} + (1 - p_{nr})\,\pi_r$.
• pnr: probability of no recombination between the two sites in the history of the two chromosomes.
• πnr: the probability that the two sites lie on shared segments, given that there was no recombination (similarly for πr).
• The no-recombination terms are computed for a discrete ancestral process and distance d between the sites.
• When there was recombination, calculation of πr is complicated by the fact that the segments are bounded on one end.
• Solve by explicit calculations on the coalescent with recombination.
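The general variance equation follows from the definitions of I(s), π and π2. The short derivation below is an added sketch; its only assumption is that fT is the average of the indicators over the M markers.

```latex
\begin{align}
f_T &= \frac{1}{M}\sum_{s} I(s), \qquad \langle I(s) \rangle = \pi, \\
\mathrm{Var}[f_T]
  &= \frac{1}{M^2}\sum_{s_1}\sum_{s_2}
     \Big( \langle I(s_1) I(s_2) \rangle
         - \langle I(s_1) \rangle \langle I(s_2) \rangle \Big)
   = \frac{1}{M^2}\sum_{s_1}\sum_{s_2}
     \big[ \pi_2(s_1,s_2) - \pi^2 \big], \\
\pi_2(s_1,s_2) &= p_{nr}\,\pi_{nr} + (1 - p_{nr})\,\pi_r .
\end{align}
```

The diagonal terms have π2(s,s) = π, since I(s)² = I(s). The last line is the law of total probability over whether a recombination occurred between the two sites.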

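The cohort-averaged sharing and imputation by IBD

• Define the cohort-averaged sharing: $\bar{f}_T^{(i)} = \frac{1}{n-1}\sum_{j \neq i} f_T^{(i,j)}$.
• For each individual: the average sharing to the rest of the cohort.
• The variance can be approximated for small n and for large n; for large n it is independent of n.
• Distribution is approximately normal.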
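As a concrete illustration, the cohort-averaged sharing of each individual can be computed from a matrix of pairwise total sharing values. This is a minimal sketch; the matrix layout (a symmetric list of lists with an ignored diagonal) and the function names are assumptions made for the example.

```python
def cohort_averaged_sharing(pairwise):
    """Return, for each individual i, the mean of pairwise[i][j] over all j != i."""
    n = len(pairwise)
    if n < 2:
        raise ValueError("need at least two individuals")
    return [
        sum(pairwise[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def sample_variance(values):
    """Unbiased sample variance, used to look at the spread across individuals."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    # Hypothetical pairwise total sharing f_T between three individuals.
    f_T = [
        [0.0, 0.25, 0.75],
        [0.25, 0.0, 0.5],
        [0.75, 0.5, 0.0],
    ]
    averaged = cohort_averaged_sharing(f_T)
    print(averaged, sample_variance(averaged))
```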

Imputation by IBD:

• Assume a cohort of size n is genotyped and IBD sharing is detected between all pairs.

• A fraction ns/n of the individuals is selected for sequencing.

• Non-sequenced individuals are imputed using the sequenced individuals along segments of IBD sharing.

• What is the expected imputation success rate when individuals are randomly selected?

• What is the success rate when individuals are selected according to their cohort-averaged sharing [6]?

• Define pc as the fraction of the genome covered by IBD segments shared with the sequenced individuals.
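The coverage pc and the selection step can be sketched as follows. This is a hedged illustration: representing IBD segments as (start, end) coordinates in Morgans and selecting the individuals with the highest cohort-averaged sharing are both assumptions of the example, and the function names are hypothetical.

```python
def covered_fraction(segments, chrom_len):
    """Fraction of the chromosome covered by the union of (start, end) IBD segments."""
    covered = 0.0
    cur_start = cur_end = None
    for start, end in sorted(segments):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping segment: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / chrom_len


def select_by_sharing(avg_sharing, n_s):
    """Indices of the n_s individuals with the highest cohort-averaged sharing."""
    ranked = sorted(range(len(avg_sharing)), key=lambda i: avg_sharing[i], reverse=True)
    return sorted(ranked[:n_s])


if __name__ == "__main__":
    # Hypothetical segments shared between one non-sequenced individual
    # and the sequenced individuals, on a chromosome of length 2 Morgans.
    segs = [(0.0, 0.5), (0.25, 0.75), (1.0, 1.5)]
    print(covered_fraction(segs, 2.0))
    print(select_by_sharing([0.1, 0.4, 0.2, 0.3], 2))
```

Overlapping segments are merged before summing, so a region shared with several sequenced individuals is counted once.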

Downstream effect on power of association:

• The effective number of sequenced individuals increases with imputation success rate.

• The power to detect a variant of frequency β appearing in cases only is computed as in [7].

See our paper:

S. Carmi, P. F. Palamara, V. Vacic, T. Lencz, A. Darvasi, and I. Pe’er, The variance of identity-by-descent sharing in the Wright-Fisher model, Submitted (2012). arXiv:1206.4745.

More applications and conclusions

Sharing between siblings:

• The variance in sharing between (same parent) chromosomes of siblings is known.
• What happens when siblings come from an inbred population and thus also share due to remote ancestry?
• The mean sharing has an explicit expression.
• When calculating the variance, decompose sharing into either same-grandparent or remote.

A simple estimator of the population size:

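• Use $\langle f_T \rangle = \frac{1 + 4mN}{(1 + 2mN)^2}$, isolate N and simplify: $\hat{N} = \frac{1 - \bar{f}_T + \sqrt{1 - \bar{f}_T}}{2m\bar{f}_T}$, where $\bar{f}_T$ is the total sharing averaged over all pairs.
• The mean and the variance of the estimator can also be calculated.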
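A minimal sketch of the estimator, assuming the constant-size mean ⟨fT⟩ = (1 + 4mN)/(1 + 2mN)^2 and solving the resulting quadratic in mN for its positive root. The function names are illustrative and not taken from the authors.

```python
import math


def mean_sharing(m, N):
    """Mean total sharing for constant size N and length cutoff m (Morgans)."""
    x = m * N
    return (1.0 + 4.0 * x) / (1.0 + 2.0 * x) ** 2


def estimate_population_size(f_bar, m):
    """Invert mean_sharing: positive root of 4f x^2 + 4(f - 1) x + (f - 1) = 0, x = mN."""
    if not 0.0 < f_bar < 1.0:
        raise ValueError("average sharing must lie strictly between 0 and 1")
    return (1.0 - f_bar + math.sqrt(1.0 - f_bar)) / (2.0 * m * f_bar)


if __name__ == "__main__":
    m, N = 0.01, 5000
    f_bar = mean_sharing(m, N)  # stand-in for the sharing averaged over all pairs
    print(f_bar, estimate_population_size(f_bar, m))
```

Feeding the exact mean back into the estimator recovers N. With real data, the pair-averaged sharing fluctuates around its mean, and that fluctuation is what the variance of the estimator quantifies.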

Summary and discussion:

• We obtained analytical results for properties of IBD sharing in the Wright-Fisher model.

• Calculated the distribution using renewal theory and the variance using two methods.

• Treat genotyping/phasing errors by increasing the length cutoff m. If segments are missed with probability ε, one can show that both the mean and the variance are scaled by (1 - ε).

• Other analytical approaches and applications to demographic inference are given in [4] and in a talk here.

• The sharing per individual (averaged over cohort) exhibits a surprisingly wide distribution even for large cohorts.

• This can be exploited in imputation by IBD.
