Evolutionary Genetics: Part 1 Polymorphism in DNA S. peruvianum S. chilense Winter Semester 2012-2013 Prof Aurélien Tellier FG Populationsgenetik

Evolutionary Genetics: Part 1

Polymorphism in DNA

S. peruvianum

S. chilense

Winter Semester 2012-2013

Prof Aurélien Tellier

FG Populationsgenetik

Color code

Red: important result or definition

Purple: exercise to do

Green: some bits of maths

Population genetics

Why study the genetics of populations?

=> The population is the main unit at which selection acts!

Evolution = change between generations in the frequency of characters, traits or alleles

Useful definitions

- DNA has 4 bases: adenine, guanine, cytosine, thymine
- Haploid = organism with one set of non-paired chromosomes
- Diploids, polyploids:
  - Maize: ancient tetraploid (it now behaves as a diploid)
  - Up to hexaploid for wheat
  - Octoploid for cultivated strawberry, up to decaploid in some wild strawberries
  - Pentaploid to duodecaploid for sugarcane
- The chromosomal location of a gene is a locus
- Several alleles can be observed at a locus in a population; a diploid individual carries two copies (one from the mother and one from the father)
- The complete set of alleles in a species or population = gene pool
- The number of copies of one allele as a proportion of all allele copies at that locus in the gene pool = allele frequency
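The definition of allele frequency can be illustrated with a minimal sketch. The sample below is hypothetical (five diploid individuals, so ten allele copies at one locus), and the function name `allele_frequencies` is only an illustration, not part of the course material.

```python
from collections import Counter


def allele_frequencies(gene_pool):
    """Return {allele: frequency} for a list of allele copies at one locus."""
    counts = Counter(gene_pool)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical sample: 5 diploid individuals = 10 allele copies.
sample = ["A", "A", "a", "A", "a", "a", "A", "A", "A", "a"]
freqs = allele_frequencies(sample)
print(freqs)
```

The frequencies always sum to 1, because each allele copy in the gene pool is counted exactly once.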

Population genetics

Populations and species show variability:

- What type of genetic variation exists within populations and species, and how much?
- What are the forces that influence the amount of variation within populations?
- First question: what is the variability at the genetic (DNA) level?

Definition: population genetics is the study of the frequencies of alleles in populations, and of how they change in time and space

Polymorphism in DNA: Sanger sequencing

Polymorphism in DNA: Illumina NGS sequencing

www.seqanswers.com

Polymorphism in DNA: new new generation…

Polymorphism in DNA: what it looks like

Some definitions

Coding / non-coding DNA

Some definitions

For the coding DNA: start codon, stop codon

Some definitions: point mutations

Point mutation = change in the DNA base (e.g. T becomes G)

Insertions – deletions = removal or insertion of bases

Exercises 1.1 and 1.2: use DNASP and the file 055twolines.fas
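The two kinds of change defined above can be counted by hand on a pair of aligned sequences. The sketch below is not what DNASP does internally; it assumes two sequences already aligned to the same length with `-` as the gap character, and the example sequences are made up.

```python
def compare_aligned(seq1, seq2, gap="-"):
    """Count substitutions and gapped columns between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = 0
    gap_columns = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            # A gap in exactly one sequence marks an insertion or deletion.
            if a != b:
                gap_columns += 1
        elif a != b:
            # Two different bases at the same position: a point mutation.
            substitutions += 1
    return substitutions, gap_columns


# Made-up example: one T/G substitution and a 2-base indel.
line1 = "ATGCTTGACCA"
line2 = "ATGCGTG--CA"
print(compare_aligned(line1, line2))
```

Note that the second number counts gapped columns, not indel events: a single deletion of two bases contributes two columns.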

Some definitions: point mutations

Consequences on the protein:

Synonymous mutations: change the codon but not the amino acid (AA)

Non-synonymous mutations: change the amino acid

Silent sites = non-coding regions + synonymous sites

- Frameshift mutation: changes the reading frame (due to indels whose length is not a multiple of 3)

- Nonsense mutation: a stop codon is introduced

Positions in the codon:

- A position is fourfold degenerate if all four nucleotides at that position specify the same AA (only at the 3rd position of a codon), e.g. the glycine codons GGN

- Twofold degenerate if two of the four nucleotides at that position specify the same AA (e.g. 3rd position of the glutamic acid codons GAA and GAG)

- Threefold degenerate: three of the four nucleotides specify the same AA (3rd position of the isoleucine codons ATT, ATC and ATA)

- Non-degenerate: any change specifies a different AA (or a stop codon)
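These definitions can be checked against the standard genetic code. The sketch below builds the codon table from the usual TCAG ordering and then classifies a single-codon change and the degeneracy of a codon position. The function names are illustrative assumptions, and only the standard nuclear code is considered.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_mutation(codon_before, codon_after):
    """Classify a change from one codon to another."""
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "non-synonymous"


def degeneracy(codon, position):
    """Number of bases at `position` (0, 1 or 2) that keep the same amino acid.

    4 = fourfold, 3 = threefold, 2 = twofold, 1 = non-degenerate.
    """
    same = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == CODON_TABLE[codon]:
            same += 1
    return same


print(classify_mutation("GGT", "GGC"), degeneracy("GGT", 2))
print(classify_mutation("GAA", "GAT"), degeneracy("GAA", 2))
print(classify_mutation("TAT", "TAA"), degeneracy("ATT", 2))
```

Counting the original base together with its synonymous alternatives is what makes the glycine third position come out as 4 and the isoleucine third position as 3.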

Practical exercise

Using a database to find information on sequences

Exercise: download the file data-plants.fas

1) Open it with DNASP. What do you see?

2) Go to www.ncbi.nih.gov and open the BLAST tools

3) Look at the options; these are plant sequences. Can you find out where the sequences come from?

4) You will be directed to the BLAST results: let's look at them by moving the cursor over the lines. Note the alignment scores.

5) What are the best hits? From there you will be directed to the GenBank entries for the sequences. What information is given for these sequences?

Then we will insert information from the database into the sequences in DNASP.

Mark the coding region using the menu: data -> assign coding regions

DNASP then displays the sequences differently (see example)

Open the file in Mesquite as well

Can you find out how many changes are silent, synonymous or non-synonymous?

How many SNPs are there? (Single Nucleotide Polymorphisms = sites that differ because of mutational changes)
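As a cross-check on the SNP count reported by the software, variable sites can be counted directly from an aligned FASTA file. This is a minimal sketch under stated assumptions: the sequences are already aligned, columns containing a gap (`-`) are skipped, and the three short sequences are invented for illustration (they are not from data-plants.fas).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def count_snps(sequences, gap="-"):
    """Count gap-free alignment columns that show more than one base."""
    snps = 0
    for column in zip(*sequences):
        if gap in column:
            continue
        if len(set(column)) > 1:
            snps += 1
    return snps


# Invented alignment of three sequences.
example = """>seq1
ATGCTTGACCA
>seq2
ATGCGTGACCA
>seq3
ATGCGTGACTA
"""
alignment = parse_fasta(example)
print(count_snps(alignment.values()))
```

To use it on a real file, read the file contents into a string and pass them to `parse_fasta`; how gapped columns should be treated is a choice, and other tools may handle them differently.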

Different types of data

- Patterns of diversity can be observed within populations, within species or among species.
- Phylogenetic trees belong to the between-species comparisons.
- Types of data: DNA sequences (SNPs), protein sequences, microsatellites
- Microsatellites are short stretches of repeated DNA: TATATATATATATA
  - What matters is the number of motif repeats
  - Their size can be read from gel electrophoresis, but they contain less information than SNP data.
  - Their mutation rate is also higher, because the polymerase slips on the repeats.
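Since the informative quantity for a microsatellite is the number of motif repeats, here is a small sketch that counts the longest uninterrupted run of a motif in a sequence. The function name and the flanking bases in the example are assumptions for illustration only.

```python
import re


def longest_repeat_run(sequence, motif):
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(sequence)]
    return max(run_lengths, default=0) // len(motif)


# A TA microsatellite with made-up flanking bases.
allele = "GG" + "TA" * 7 + "CC"
print(longest_repeat_run(allele, "TA"))
```

Two alleles of the same microsatellite would differ in this repeat count, which is what shows up as a size difference on a gel.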

Questions to ask when looking at data

- Are the sequences already aligned?
- Are the data from one population or more than one? Or from different species?
- Are the data from sexually or asexually reproducing organisms?
- Are the sequences from coding DNA, non-coding DNA, or both?
- Are the data from one locus or several loci?
- Do we see all sites or only the variable ones (SNPs, indels or both)?
- Do we see all sequences or only the different ones?
- Are the data from microsatellites or SNPs?

=> Go to Exercise 1.3, also for the Solanum data

Population genetics: 4 evolutionary forces

Four forces act on molecular diversity:

- random genomic processes (mutation, duplication, recombination, gene conversion)
- natural selection
- random demographic process (drift)
- random spatial process (migration)

Population genetics investigates the laws governing the genetic structure of populations, and changes in allele frequencies over time

The diagram of the four forces is shown again, now with phenotypic variability added to molecular diversity.

We want to infer the role of the evolutionary forces from sequence data (a very useful tool is coalescent theory)

Divergence and mutation rate

Molecular clock

- When DNA is passed from one generation to the next, there is a constant probability µ that a mutation occurs
  - because the polymerase is not error-free
  - we assume the rate is constant (though over long periods of time this may not be true)
- Probability of mutation of what?
  - at a site (per-site mutation rate)
  - at an entire locus (locus mutation rate)
  - genome-wide mutation rate
- µ is the probability of a mutation per generation, and (1 - µ) is the probability that no mutation occurs

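Molecular clock

- Thus $P[\text{no mutation for } t \text{ generations}] = (1-\mu)^t$
- As long as $\mu \ll 1$ we can use the approximation $1 + x \approx e^{x}$ and $1 - x \approx e^{-x}$ (valid for small $x$)
- Result: $P[\text{no mutation for } t \text{ generations}] \approx e^{-\mu t}$. What does the graph of this exponential look like?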
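The quality of the exponential approximation can be checked numerically. The sketch below compares the exact probability of no mutation in t generations with the exponential form for a few values of µ; the values are arbitrary choices for illustration, not estimates from data.

```python
import math


def p_no_mutation_exact(mu, t):
    """Exact probability of no mutation in t generations: (1 - mu)^t."""
    return (1.0 - mu) ** t


def p_no_mutation_approx(mu, t):
    """Exponential approximation exp(-mu * t), good when mu is small."""
    return math.exp(-mu * t)


# Arbitrary illustrative values of mu and t.
for mu in (1e-8, 1e-4, 0.1):
    for t in (10, 1000):
        exact = p_no_mutation_exact(mu, t)
        approx = p_no_mutation_approx(mu, t)
        print(f"mu={mu:g} t={t}: exact={exact:.6g} approx={approx:.6g} "
              f"difference={abs(exact - approx):.3g}")
```

For small µ the two columns agree closely even for large t; the gap only becomes noticeable when µ itself is large (for example 0.1), which is why the condition is on µ and not on t.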

Molecular clock

- Maths 1: probability, expectation and variance, exponential distribution

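Molecular clock

- $T$ is the time to the next mutation
- $P[\text{no mutation for } t \text{ generations}] = P[T \ge t] = e^{-\mu t}$, so $T$ is exponentially distributed and has
  $$E[T] = \frac{1}{\mu} \quad \text{and} \quad \mathrm{Var}[T] = \frac{1}{\mu^2}$$
- Two sequences taken from populations that split $t$ generations ago are separated by at least $2t$ generations; example: split between Drosophila populations = 100 000 generations (10 000 years)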
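The expectation 1/µ and variance 1/µ² of the waiting time T can be checked by simulation. The sketch below draws waiting times generation by generation (a geometric distribution, sampled by inverse transform), which the exponential distribution approximates when µ is small. The value µ = 0.01, the sample size and the seed are arbitrary assumptions chosen to keep the run short.

```python
import math
import random
import statistics


def waiting_time(mu, rng):
    """Generations until the first mutation, when each generation mutates with probability mu."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(1.0 - mu)) + 1


def simulate(mu, n, seed=1):
    """Return the sample mean and variance of n simulated waiting times."""
    rng = random.Random(seed)
    times = [waiting_time(mu, rng) for _ in range(n)]
    return statistics.mean(times), statistics.pvariance(times)


mu = 0.01  # arbitrary illustrative value
mean_t, var_t = simulate(mu, 20_000)
print(f"sample mean     = {mean_t:.1f}   (1/mu   = {1 / mu:.1f})")
print(f"sample variance = {var_t:.0f}   (1/mu^2 = {1 / mu ** 2:.0f})")
```

The sample values should land close to, but not exactly on, the theoretical ones: the simulation is discrete in generations and the sample is finite.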

Molecular clock

- $D$ is the number of sites that differ between the two sequences, also called the divergence
- $K$ mutations have appeared along the branches leading to the two descendant populations, which have a total length of $2t$

Molecular clock

- Maths 2: estimator
- Assuming $K = D$ (every mutation is visible as a difference between the sequences)
- thus $E[D] = E[K] = 2\mu t$
- so we obtain a first estimator for $\mu$:
  $$\hat{\mu} = \frac{D}{2t}$$
- So we can calculate the mutation rate if we know the time of divergence and the divergence from molecular data!
- Exercise 1.4: calculate the mutation rate in Drosophila
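The estimator is a one-line calculation, sketched below. The divergence value used here is a made-up placeholder and not the answer to Exercise 1.4; only the split time of 100 000 generations comes from the slides. If D is given per site, the result is a per-site rate; if D is a count over a whole locus, the result is a per-locus rate.

```python
def estimate_mu(divergence, t_generations):
    """First estimator of the mutation rate: mu_hat = D / (2 t)."""
    if t_generations <= 0:
        raise ValueError("t_generations must be positive")
    return divergence / (2.0 * t_generations)


# Placeholder divergence (differences per site); NOT real Drosophila data.
d_per_site = 0.02
# Split time in generations, as given for the Drosophila populations.
t = 100_000

mu_hat = estimate_mu(d_per_site, t)
print(f"estimated mutation rate per site per generation: {mu_hat:.3g}")
```

The factor 2 in the denominator reflects the two branches, each of length t, along which mutations accumulate after the split.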