Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium


Post on 12-Jan-2016






Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012. Last Time: sequence data and quantification of variation; infinite sites model; nucleotide diversity (π); sequence-based tests of neutrality; Tajima's D; Hudson-Kreitman-Aguade


  • Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium
    November 12, 2012

  • Last Time
    Sequence data and quantification of variation
    Infinite sites model
    Nucleotide diversity (π)
    Sequence-based tests of neutrality
    Tajima's D
    Hudson-Kreitman-Aguade
    Synonymous versus Nonsynonymous substitutions
    McDonald-Kreitman

  • Today
    Signatures of selection based on synonymous and nonsynonymous substitutions
    Multiple loci and independent segregation
    Estimating linkage disequilibrium

  • Using Synonymous Substitutions to Control for Factors Other Than Selection

    dN/dS or Ka/Ks Ratios

  • Types of Mutations (Polymorphisms)

  • Synonymous versus Nonsynonymous SNP
    First- and second-position SNPs often change the amino acid
    UCA, UCU, UCG, and UCC all code for Serine
    Third-position SNPs are often synonymous
    Majority of positions are nonsynonymous
    Not all amino acid changes affect fitness: allozymes

  • Synonymous & Nonsynonymous Substitutions
    Synonymous substitution rate can be used to set neutral expectation for nonsynonymous rate
    dS is the relative rate of synonymous mutations per synonymous site
    dN is the relative rate of nonsynonymous mutations per nonsynonymous site
    ω = dN/dS
    If ω = 1, neutral evolution
    If ω < 1, purifying selection
    If ω > 1, positive Darwinian selection
    For human genes, ω ≈ 0.1
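
The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `classify_omega` and the tolerance `tol` are assumptions made here, and real analyses test statistically whether dN/dS (often written ω) differs from 1 rather than comparing point estimates.

```python
def classify_omega(dn, ds, tol=1e-9):
    """Return (omega, label) for omega = dN/dS.

    dn and ds are assumed to be already-estimated per-site rates.
    tol is a hypothetical tolerance for treating omega as equal to 1.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        label = "neutral"
    elif omega < 1.0:
        label = "purifying selection"
    else:
        label = "positive selection"
    return omega, label


# Illustrative values only (not data from the lecture).
print(classify_omega(0.01, 0.10))
print(classify_omega(0.20, 0.10))
```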

  • Complications in Estimating dN/dS
    Multiple mutations in a codon give multiple possible paths:
    CGT(Arg)->AGA(Arg)
    CGT(Arg)->AGT(Ser)->AGA(Arg)
    CGT(Arg)->CGA(Arg)->AGA(Arg)
    Two types of nucleotide base substitutions resulting in SNPs, transitions and transversions, are not equally likely
    Back-mutations are invisible
    Complex evolutionary models using likelihood and Bayesian approaches must be used to estimate dN/dS (also called KA/KS or KN/KS depending on method) (PAML package)
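
The CGT to AGA example can be reproduced with a short enumeration. The sketch below is not how PAML works; it simply lists every shortest single-step path between two codons and counts synonymous and nonsynonymous steps under the standard genetic code. The names `CODON_TABLE` and `mutation_paths` are introduced here for illustration, and paths through stop codons are not treated specially.

```python
from itertools import permutations

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def mutation_paths(start, end):
    """List every shortest path of single-base changes from start to end.

    Returns a list of (codons_visited, n_synonymous, n_nonsynonymous).
    """
    differing = [pos for pos in range(3) if start[pos] != end[pos]]
    paths = []
    for order in permutations(differing):
        codon = start
        visited = [codon]
        syn = nonsyn = 0
        for pos in order:
            nxt = codon[:pos] + end[pos] + codon[pos + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[codon]:
                syn += 1
            else:
                nonsyn += 1
            codon = nxt
            visited.append(codon)
        paths.append((visited, syn, nonsyn))
    return paths


for visited, syn, nonsyn in mutation_paths("CGT", "AGA"):
    labeled = "->".join(f"{c}({CODON_TABLE[c]})" for c in visited)
    print(labeled, "synonymous:", syn, "nonsynonymous:", nonsyn)
```

One path passes through Ser (two nonsynonymous steps) and the other stays on Arg (two synonymous steps), which is why the inferred dN and dS depend on how the paths are weighted.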

  • dN/dS ratios for 363 mouse-rat comparisons
    Most genes show purifying selection (dN/dS < 1)
    Some evidence of positive selection, especially in genes related to immune system
    Figure label: interleukin-3 (mast cells and bone marrow cells in immune system)

  • McDonald-Kreitman Test
    Conceptually similar to HKA test
    Uses only one gene
    Contrasts ratios of synonymous divergence and polymorphism to ratios of nonsynonymous divergence and polymorphism
    Gene provides internal control for evolution rates and demography
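
A McDonald-Kreitman comparison is commonly laid out as a 2x2 table of counts (divergence vs. polymorphism, nonsynonymous vs. synonymous). The sketch below assumes that layout and evaluates it with a hand-written two-sided Fisher's exact test; the choice of test, the function names, and the counts are all assumptions made for illustration, not the lecture's own procedure or data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Return (Dn/Ds, Pn/Ps, p-value) for an MK-style table.

    dn, ds: nonsynonymous and synonymous fixed differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    """
    if ds == 0 or ps == 0:
        raise ValueError("synonymous counts must be nonzero to form ratios")
    return dn / ds, pn / ps, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts, invented purely for illustration.
div_ratio, poly_ratio, p_value = mk_test(dn=20, ds=30, pn=5, ps=40)
print(f"Dn/Ds = {div_ratio:.3f}, Pn/Ps = {poly_ratio:.3f}, p = {p_value:.4f}")
```

Under neutrality the two ratios are expected to be similar; an excess of nonsynonymous divergence relative to polymorphism is usually read as a signal of positive selection.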

  • Application of McDonald-Kreitman Test:
    Aligned 11,624 gene sequences between human and chimp
    Calculated synonymous and nonsynonymous substitutions between species (Divergence) and within humans (SNPs)
    Identified 304 genes showing evidence of positive selection (blue) and 814 genes showing purifying selection (red) in humans
    Positive selection: defense/immunity, apoptosis, sensory perception, and transcription factors
    Purifying selection: structural and housekeeping genes
    Bustamante et al. 2005. Nature 437, 1153-1157

  • Genes showing purifying (red) or positive (blue) selection in the human genome based on the McDonald-Kreitman Test
    Bustamante et al. 2005. Nature 437, 1153-1157

  • How can you differentiate between effects of selection and demographic effects on sequence variation?
    Will this work for organellar DNA?

  • Extending to Multiple Loci
    So far, only considering dynamics of alleles at single loci
    Loci occur on chromosomes, linked to other loci!
    "The fitness of a single locus ripped from its interactive context is about as relevant to real problems of evolutionary genetics as the study of the psychology of individuals isolated from their social context is to an understanding of man's sociopolitical evolution." Richard Lewontin (quoted in Hedrick 2005)
    Size of region that must be considered depends on Linkage Disequilibrium

  • Gametic (Linkage) Disequilibrium (LD)
    Nonrandom association of alleles at different loci into gametes
    Haplotype: Genotype of a group of closely linked loci
    LD is a major factor in evolution
    LD itself provides insights into population history
    Estimation of LD is critical for ALL population genetic data

  • Nomenclature and concepts
    Two loci, two alleles
    Frequency of allele i at locus 1 is p_i
    Frequency of allele i at locus 2 is q_i

  • Nomenclature and concepts
    Genotype is written as A1A2B1B2
    A1 and B1 are in coupling phase
    A1 and B2 are in repulsion phase

  • Gametic Disequilibrium
    Easiest to think about physically linked loci, but not necessarily the case
    What are expected frequencies of gametes in a population under independent assortment?

  • What are the expected frequencies of gametes with complete linkage?
    Alleles A1, A2, B1, B2 with frequencies p1, p2, q1, q2

  • Linkage disequilibrium measure, D
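
The slide's equation did not survive extraction. A standard textbook form is given here, under the assumption (not stated in the source) that x_ij denotes the frequency of the A_i B_j gamete:

```latex
% Assumed notation: x_{ij} is the frequency of the A_i B_j gamete.
\[
D = x_{11} - p_1 q_1 = x_{11} x_{22} - x_{12} x_{21}
\]
\[
\begin{aligned}
x_{11} &= p_1 q_1 + D, & x_{12} &= p_1 q_2 - D,\\
x_{21} &= p_2 q_1 - D, & x_{22} &= p_2 q_2 + D.
\end{aligned}
\]
```

With independent association of alleles each gamete frequency is just the product p_i q_j, so D = 0; any departure from those products is captured by D.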

  • Problem: D is sensitive to allele frequencies
    Example, if D is positive:
    p1 = 0.5, q2 = 0.5: Dmax = 0.25, but
    p1 = 0.1, q2 = 0.9: Dmax = 0.09
    Solution: D' = D/Dmax, which ranges from -1 to 1
    Dmax calculation:
    If D is positive, Dmax is the lesser of p1q2 or p2q1
    If D is negative, Dmax is the lesser of p1q1 or p2q2
    Can't have negative gamete frequencies
    Maximum D set by allele frequencies
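
The Dmax rules above translate directly into code. A minimal sketch, assuming D is computed as the A1B1 gamete frequency minus p1q1 (the function and argument names are chosen here for illustration):

```python
def d_prime(x11, p1, q1):
    """Return (D, Dmax, D') for two biallelic loci.

    x11 is the observed frequency of the A1B1 gamete (assumed notation);
    p1 and q1 are the frequencies of allele 1 at locus 1 and locus 2.
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = x11 - p1 * q1
    if d >= 0:
        d_max = min(p1 * q2, p2 * q1)
    else:
        d_max = min(p1 * q1, p2 * q2)
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return d, d_max, d / d_max


# The two allele-frequency settings from the slide, with A1 always
# paired with B1 so that D sits at its maximum in both cases.
print(d_prime(x11=0.5, p1=0.5, q1=0.5))
print(d_prime(x11=0.1, p1=0.1, q1=0.1))
```

In both cases D' is 1 even though D itself differs (0.25 vs. 0.09), which is the point of the standardization.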

  • LD can also be estimated as correlation between alleles
    r can also be standardized to a -1 to 1 scale
    It is equivalent to D' in this case
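
The correlation form is usually written r = D / sqrt(p1 p2 q1 q2). The sketch below assumes that definition, with D taken as the A1B1 gamete frequency minus p1q1; the function name is hypothetical.

```python
from math import sqrt


def ld_correlation(x11, p1, q1):
    """Return r, the correlation between alleles at two biallelic loci.

    x11 is the frequency of the A1B1 gamete (assumed notation).
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    if min(p1, p2, q1, q2) <= 0:
        raise ValueError("r is undefined when a locus is monomorphic")
    d = x11 - p1 * q1
    return d / sqrt(p1 * p2 * q1 * q2)


# Equal allele frequencies: complete association gives r = 1.
print(ld_correlation(x11=0.5, p1=0.5, q1=0.5))
# Unequal allele frequencies: A1 always occurs with B1, yet r < 1.
print(ld_correlation(x11=0.1, p1=0.1, q1=0.5))
```

The second case shows why unstandardized r does not reach 1 when allele frequencies differ between loci, even under the strongest association those frequencies allow.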

  • Recombination
    Shuffling of parental alleles during meiosis
    Occurs for unlinked loci and linked loci
    Rate of recombination for linked markers is partially a function of physical distance

  • What is the expected recombination rate for unlinked loci?

  • LD is partially a function of recombination rate
    Expected proportions of gametes produced by various genotypes over two generations
    Where c is the recombination rate and D0 is the initial amount of LD

  • Recombination degrades LD over time
    Where t is time (in generations) and e is the base of the natural log (2.718)
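
The equations for these two slides were lost in extraction. The standard result that matches the symbols named on them (c, D0, t, e) is sketched below; it is offered as the likely form, not as a transcription of the slides.

```latex
% One generation of random mating with recombination rate c:
\[
D_1 = (1 - c)\, D_0
\]
% After t generations, with the exponential approximation for small c:
\[
D_t = (1 - c)^t D_0 \approx D_0\, e^{-ct} \qquad (c \ll 1)
\]
```

The approximation follows from (1 - c)^t = e^{t ln(1 - c)} and ln(1 - c) ≈ -c when c is small.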

  • Effects of recombination rate on LD
    Decline in LD over time with different theoretical recombination rates (c)
    Even with independent segregation (c=0.5), multiple generations required to break up allelic associations
    Genome-wide linkage disequilibrium can be caused by demographic factors (more later)
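
The decline described on this slide can be tabulated with a few lines of Python, assuming the standard geometric decay D_t = D_0 (1 - c)^t under random mating. The function name and the starting value D0 = 0.25 are illustrative choices, not values from the lecture.

```python
def ld_decay(d0, c, generations):
    """Return [D_0, D_1, ..., D_t], assuming D_t = D_0 * (1 - c)**t."""
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination rate c must lie between 0 and 0.5")
    return [d0 * (1.0 - c) ** t for t in range(generations + 1)]


# Compare unlinked loci (c = 0.5) with progressively tighter linkage.
for c in (0.5, 0.1, 0.01):
    trajectory = ld_decay(d0=0.25, c=c, generations=5)
    print(f"c = {c}:", [round(d, 4) for d in trajectory])
```

With c = 0.5 the disequilibrium only halves each generation, so several generations are still needed before allelic associations effectively disappear.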
