Genetic Variation and Genetic Diversity
02-223 How to Analyze Your Own Genome
Fall 2013
Terminology
• Allele: one of the different forms of genetic variation at a given gene or genetic locus
  – Locus 1 has two alleles, A and T, and Locus 2 has two alleles, C and G
• Genotype: specific allelic make-up of an individual’s genome
  – Individual 1 has genotype AA at Locus 1 and genotype CG at Locus 2
• Heterozygous/Homozygous
  – Locus 1 of Individual 1 is homozygous, and Locus 2 is heterozygous
[Figure: Individual 1 carries alleles A and A at Locus 1 and C and G at Locus 2; Individual 2 carries alleles A and T at Locus 1 and C and C at Locus 2.]
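The terminology above can be illustrated with a minimal Python sketch. The dictionary layout and the helper names `zygosity` and `alleles_at` are assumptions made for this example only; the genotype values are those of the two individuals on the slide.

```python
# Genotypes of the two example individuals, stored as pairs of alleles
# (one allele per chromosome copy). The data layout is an assumption.
genotypes = {
    "Individual 1": {"Locus 1": ("A", "A"), "Locus 2": ("C", "G")},
    "Individual 2": {"Locus 1": ("A", "T"), "Locus 2": ("C", "C")},
}


def zygosity(genotype):
    """Return 'homozygous' if both alleles match, else 'heterozygous'."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def alleles_at(locus, table):
    """Collect the distinct alleles observed at a locus across individuals."""
    return sorted({allele for loci in table.values() for allele in loci[locus]})


for name, loci in genotypes.items():
    for locus, genotype in loci.items():
        print(name, locus, "".join(genotype), zygosity(genotype))

for locus in ("Locus 1", "Locus 2"):
    print(locus, "alleles:", alleles_at(locus, genotypes))
```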
Single Nucleotide Polymorphisms (SNPs)
Advantages of SNPs in Population Genetics Studies
• Abundance: high frequency on the genome
• Position: throughout the genome
  – coding region, intron region, promoter site
• Ease of genotyping (high-throughput genotyping)
• Less mutable than other forms of polymorphisms
• SNPs account for around 90% of human genomic variation
• About 10 million SNPs exist in human populations
• Most SNPs are outside of the protein coding regions
• 1 SNP every 600 base pairs
• More than 5 million common SNPs, each with frequency 10-50%, account for the bulk of human DNA sequence difference
• It is estimated that ~60,000 SNPs occur within exons; 85% of exons are within 5 kb of the nearest SNP
• SNPs account for most of the genetic diversity among different (normal) individuals, e.g. drug response, disease susceptibility
• However, with only two alleles at each locus, SNPs are less informative than microsatellites. (Use haplotypes!)
Working with SNP Data in Practice
• At each locus, the two alleles of a SNP are represented as 0 or 1
  – A/T/C/G letters are converted to 0 or 1 for minor/major alleles
  – Genotypes at each locus of each individual are coded as
    • 0: minor allele homozygous
    • 1: heterozygous
    • 2: major allele homozygous
• Given genotype data for N individuals, for each locus we can define minor allele frequency as follows:
  (Minor allele frequency) = (the number of minor alleles in the population) / (total number of alleles in the population)
• Typically, SNPs with a very low minor allele frequency are discarded, since they don’t contain sufficient information about genetic diversity
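The minor allele frequency definition above can be sketched in Python as follows. Under the 0/1/2 coding on this slide, each code equals the number of major alleles an individual carries, so the number of minor alleles among N individuals is 2N minus the sum of the codes. The function names, the toy data, and the 0.05 cutoff are assumptions for illustration; the slides only say "very low".

```python
def minor_allele_frequency(codes):
    """Minor allele frequency at one locus.

    codes: genotype codes for N individuals, where
    0 = minor homozygous, 1 = heterozygous, 2 = major homozygous.
    """
    total_alleles = 2 * len(codes)          # two alleles per individual
    major_alleles = sum(codes)              # each code counts major alleles
    return (total_alleles - major_alleles) / total_alleles


def filter_snps(genotype_table, threshold=0.05):
    """Keep loci whose minor allele frequency is at least `threshold`.

    The default threshold is an illustrative assumption.
    """
    return {
        locus: codes
        for locus, codes in genotype_table.items()
        if minor_allele_frequency(codes) >= threshold
    }


# Hypothetical genotype data: locus name -> codes for each individual.
data = {
    "snp_a": [2, 1, 2, 0, 1],        # 4 minor alleles out of 10
    "snp_b": [2] * 19 + [1],         # 1 minor allele out of 40
}

for locus, codes in data.items():
    print(locus, minor_allele_frequency(codes))

print("kept:", sorted(filter_snps(data)))
```

With this toy data, `snp_b` falls below the assumed cutoff and is discarded, while `snp_a` is kept.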
The Effects of Single Nucleotide Mutations
• Mutations in the protein coding regions
  – Nonsynonymous mutations
    • Missense mutations change the protein sequence
      – CAC in RNA (or DNA) codes for amino acid His, but if A is mutated to U (CUC), it codes for amino acid Leu
    • Nonsense mutations truncate the protein
      – UGG codes for amino acid Trp, but if the middle G is mutated to A (UAG), it becomes a stop codon
  – Synonymous mutations do not change amino acids
    • Both CAC and CAU result in amino acid His
    • However, such mutations could affect splice sites
• Mutations in the regulatory (non-coding) regions
  – We have very little understanding of the regulatory regions and mutations in them
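The three coding-region categories above can be checked mechanically against the standard genetic code. The sketch below builds the standard codon table from its usual compact form (bases in U, C, A, G order, amino acids as one-letter codes, `*` for stop) and classifies a single-codon change. The function name `classify` is an assumption for this example, and it only distinguishes the three categories named on the slide.

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated with each
# position cycling through U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(ref_codon, alt_codon):
    """Classify a codon change as synonymous, missense, or nonsense.

    DNA codons (with T) are accepted and converted to RNA (with U).
    A change from a stop codon to an amino acid is reported as
    'missense' here, since the slide does not name that case.
    """
    ref = CODON_TABLE[ref_codon.upper().replace("T", "U")]
    alt = CODON_TABLE[alt_codon.upper().replace("T", "U")]
    if ref == alt:
        return "synonymous"
    if alt == "*":
        return "nonsense"
    return "missense"


# The examples from the slide.
for ref, alt in [("CAC", "CUC"), ("UGG", "UAG"), ("CAC", "CAU")]:
    print(ref, "->", alt, classify(ref, alt))
```

Note that this looks only at the encoded amino acid; it cannot capture the caveat that a synonymous change may still affect splice sites.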
Genetic Polymorphisms
• Insertion/deletion of a section of DNA
  – Minisatellites: repeated base patterns (several hundred base pairs)
  – Microsatellites: 2-4 nucleotides repeated
  – Presence or absence of Alu segments
  – Many alleles, very informative because of the high heterozygosity (the chance that a randomly selected person will be heterozygous)
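To see why many alleles imply high heterozygosity, here is a minimal sketch that assumes random mating (Hardy-Weinberg proportions), an assumption not stated on the slide. Under it, the chance that a randomly selected person is heterozygous at a locus is one minus the sum of squared allele frequencies. The function name and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Chance that a random individual is heterozygous at a locus,
    assuming random mating: 1 - sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# A two-allele locus (e.g. a SNP) with both alleles equally common.
snp = [0.5, 0.5]

# A hypothetical microsatellite with ten equally common alleles.
microsatellite = [0.1] * 10

print("two-allele locus:", expected_heterozygosity(snp))
print("ten-allele locus:", expected_heterozygosity(microsatellite))
```

With only two alleles the value can never exceed 0.5, whereas k equally frequent alleles give 1 - 1/k, which is why multi-allelic markers such as microsatellites are more informative per locus.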
Genetic Polymorphisms
• Structural variants
  – insertions/deletions, duplications, copy number variations
Genetic Polymorphisms
• Copy Number Variation
  – DNA segment whose number of copies differs between genomes
    • Kilobases to megabases in size
  – Usually two copies of all autosomal regions, one per chromosome
  – Variation due to deletion or duplication
Genetic Polymorphisms
• Copy Number Variations + SNPs
Detecting Genetic Polymorphisms from Shotgun Sequencing
Genetic Variant Frequencies from the 1000 Genomes Pilot Project
• The frequency of SNPs is greater than that of any other type of polymorphism
Genetic Markers
• Genetic markers
  – DNA sequence with a known physical location on a chromosome
  – An identifiable segment of DNA (e.g., SNPs, microsatellites) with enough variation between individuals that its inheritance and co-inheritance with alleles of a given gene can be traced
  – Genetic markers can be used to refer to a particular location in genomes or in a genetic map.
http://www.genome.gov/glossary/index.cfm?id=86
Check out the “Listen” voice recording of Dr. Hurle’s explanation of genetic markers
Summary
• Alleles and genotypes
• Different types of genetic polymorphisms
  – Single nucleotide polymorphisms (SNPs)
  – Structural variants
    • Insertions, deletions, copy number variations, etc.
  – SNPs are the most abundant polymorphisms and are often used as genetic markers