genecvariaonandgenecdiversity&sssykim/teaching/f13/slides/genetic... · 2013. 9. 11. ·...

Genetic Variation and Genetic Diversity 02-223 How to Analyze Your Own Genome Fall 2013


Page 1

Genetic Variation and Genetic Diversity

02-223 How to Analyze Your Own Genome

Fall 2013

Page 2

Terminology

• Allele: different forms of genetic variation at a given gene or genetic locus
  – Locus 1 has two alleles, A and T, and Locus 2 has two alleles, C and G
• Genotype: specific allelic make-up of an individual's genome
  – Individual 1 has genotype AA at Locus 1 and genotype CG at Locus 2
• Heterozygous/Homozygous
  – Locus 1 of Individual 1 is homozygous, and Locus 2 is heterozygous

[Figure: two individuals, each drawn with two loci. Individual 1: A/A at Locus 1, C/G at Locus 2. Individual 2: A/T at Locus 1, C/C at Locus 2.]
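The terminology above can be made concrete with a small Python sketch. It is only an illustration: the function name `zygosity` and the dictionary layout are assumptions, and the genotypes for Individual 2 are read off the slide's figure rather than stated in the text.

```python
def zygosity(genotype: str) -> str:
    """Classify a two-letter genotype such as "AA" or "CG"."""
    if len(genotype) != 2:
        raise ValueError("expected exactly two alleles, e.g. 'AA'")
    first, second = genotype.upper()
    # Two identical alleles: homozygous; two different alleles: heterozygous.
    return "homozygous" if first == second else "heterozygous"


# Genotypes for the slide's two example individuals (hypothetical layout).
individuals = {
    "Individual 1": {"Locus 1": "AA", "Locus 2": "CG"},
    "Individual 2": {"Locus 1": "AT", "Locus 2": "CC"},
}

# Classify every locus of every individual.
calls = {
    name: {locus: zygosity(gt) for locus, gt in loci.items()}
    for name, loci in individuals.items()
}

for name, loci in calls.items():
    print(name, loci)
```

For Individual 1 this reproduces the slide's statement: Locus 1 is homozygous and Locus 2 is heterozygous.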

Page 3

Single Nucleotide Polymorphisms (SNPs)

Page 4

Advantages of SNPs in Population Genetics Studies

• Abundance: high frequency on the genome
• Position: throughout the genome
  – coding region, intron region, promoter site
• Ease of genotyping (high-throughput genotyping)
• Less mutable than other forms of polymorphisms
• SNPs account for around 90% of human genomic variation
• About 10 million SNPs exist in human populations
• Most SNPs are outside of the protein-coding regions
• 1 SNP every 600 base pairs
• More than 5 million common SNPs, each with frequency 10-50%, account for the bulk of human DNA sequence difference
• It is estimated that ~60,000 SNPs occur within exons; 85% of exons are within 5 kb of the nearest SNP
• SNPs account for most of the genetic diversity among different (normal) individuals, e.g. drug response, disease susceptibility
• However, each SNP locus has only two alleles, so SNPs are less informative than microsatellites. (Use haplotypes!)

Page 5

Working with SNP Data in Practice

• At each locus, SNPs are represented as 0 or 1.
  – A/T/C/G letters are converted to 0 or 1 for minor/major alleles
  – Genotypes at each locus of each individual are coded as
    • 0: minor allele homozygous
    • 1: heterozygous
    • 2: major allele homozygous
• Given genotype data for N individuals, for each locus we can define the minor allele frequency as follows:
  (Minor allele frequency) = (the number of minor alleles in the population) / (total number of alleles in the population)
• Typically, SNPs with a very low minor allele frequency are discarded, since they don’t contain sufficient information about genetic diversity
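The coding scheme and the minor allele frequency (MAF) definition above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the toy genotypes, and the 0.05 cutoff are all hypothetical (the slide only says "very low"), and the major allele is assumed to be known in advance. With the 0/1/2 coding, each code counts the major alleles an individual carries, so `2 - code` counts the minor alleles.

```python
from typing import Dict, List

# Hypothetical cutoff for this sketch; the slide only says "very low".
MAF_THRESHOLD = 0.05


def encode_genotype(genotype: str, major: str) -> int:
    """Code a two-letter genotype as the number of major alleles (0, 1, or 2)."""
    if len(genotype) != 2:
        raise ValueError("expected exactly two alleles, e.g. 'AT'")
    return sum(1 for allele in genotype if allele == major)


def minor_allele_frequency(codes: List[int]) -> float:
    """MAF at one locus from the 0/1/2 codes of N diploid individuals."""
    if not codes:
        raise ValueError("need at least one genotype")
    total_alleles = 2 * len(codes)                   # two alleles per individual
    minor_alleles = sum(2 - code for code in codes)  # 2 - (major allele count)
    return minor_alleles / total_alleles


def filter_by_maf(loci: Dict[str, List[int]],
                  threshold: float = MAF_THRESHOLD) -> Dict[str, List[int]]:
    """Keep only loci whose MAF is at least the threshold."""
    return {name: codes for name, codes in loci.items()
            if minor_allele_frequency(codes) >= threshold}


# Toy data: five individuals at one locus, major allele assumed to be A.
raw = ["AA", "AT", "AA", "TT", "AA"]
codes = [encode_genotype(g, major="A") for g in raw]
maf = minor_allele_frequency(codes)   # 3 minor alleles out of 10

# A second toy locus with a single minor allele among 20 individuals.
loci = {"snp_common": codes, "snp_rare": [2] * 19 + [1]}
kept = filter_by_maf(loci)

print(codes, maf, list(kept))
```

In real data the major allele is usually determined by counting alleles at each locus first; it is passed in here only to keep the sketch short.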

Page 6

The Effects of Single Nucleotide Mutations

• Mutations in the protein-coding regions
  – Nonsynonymous mutations
    • Missense mutations change the protein sequence
      – CAC in RNA (or DNA) codes for amino acid His, but if A is mutated to U (CUC), it codes for amino acid Leu
    • Nonsense mutations truncate the protein
      – UGG codes for amino acid Trp, but if the middle G is mutated to A (UAG), it becomes a stop codon.
  – Synonymous mutations do not change amino acids
    • Both CAC and CAU result in amino acid His
    • However, such mutations could affect splice sites
• Mutations in the regulatory (non-coding) regions
  – We have very little understanding of the regulatory regions and mutations in them
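The three coding-region cases above (synonymous, missense, nonsense) can be checked mechanically against the standard genetic code. The sketch below is illustrative only: the names `translate` and `classify_substitution` are assumptions, it handles a single-nucleotide change within one codon, and it assumes the reference codon is not itself a stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def to_rna(codon: str) -> str:
    """Normalize a DNA or RNA codon to upper-case RNA letters."""
    rna = codon.upper().replace("T", "U")
    if rna not in CODON_TABLE:
        raise ValueError(f"not a valid codon: {codon!r}")
    return rna


def translate(codon: str) -> str:
    """Return the one-letter amino acid for a codon, or '*' for a stop."""
    return CODON_TABLE[to_rna(codon)]


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-nucleotide change within one codon."""
    ref, alt = to_rna(ref_codon), to_rna(alt_codon)
    if sum(a != b for a, b in zip(ref, alt)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid
    if alt_aa == "*":
        return "nonsense"     # new stop codon truncates the protein
    return "missense"         # different amino acid


# The slide's three examples.
for ref, alt in [("CAC", "CUC"), ("UGG", "UAG"), ("CAC", "CAU")]:
    print(ref, "->", alt, classify_substitution(ref, alt))
```

The classification looks only at the encoded amino acid, so it cannot detect the splice-site effects that the slide notes a synonymous mutation may still have.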

Page 7

Genetic Polymorphisms

• Insertion/deletion of a section of DNA
  – Minisatellites: repeated base patterns (several hundred base pairs)
  – Microsatellites: 2-4 nucleotides repeated
  – Presence or absence of Alu segments
  – Many alleles, very informative because of the high heterozygosity (the chance that a randomly selected person will be heterozygous)
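To see why many alleles imply high heterozygosity, one can compute the expected heterozygosity from allele frequencies. The sketch below is not from the slides: it assumes random mating (Hardy-Weinberg proportions), under which the chance of being heterozygous at a locus is 1 minus the sum of squared allele frequencies. The function name and the toy frequencies are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(allele_freqs: Sequence[float]) -> float:
    """Chance that a random individual is heterozygous, assuming random mating."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # The probability of being homozygous for allele i is p_i squared.
    return 1.0 - sum(p * p for p in allele_freqs)


# A two-allele locus (SNP-like) at its most variable: both alleles at 50%.
snp_like = expected_heterozygosity([0.5, 0.5])

# A hypothetical microsatellite with ten equally common alleles.
microsatellite_like = expected_heterozygosity([0.1] * 10)

print(snp_like, microsatellite_like)
```

Under this assumption a two-allele locus can never exceed 0.5, while a locus with ten equally frequent alleles reaches about 0.9, which is the sense in which multi-allelic markers are more informative.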

Page 8

Genetic Polymorphisms

• Structural variants
  – insertions/deletions, duplications, copy number variations

Page 9

Genetic Polymorphisms

• Copy Number Variation
  – DNA segments whose copy number differs between genomes
    • Kilobases to megabases in size
  – Usually two copies of all autosomal regions, one per chromosome
  – Variation due to deletion or duplication

Page 10

Genetic Polymorphisms

• Copy Number Variations + SNPs

Page 11

Detecting Genetic Polymorphisms from Shotgun Sequencing

Page 12

Genetic Variant Frequencies from 1000 Genome Pilot Project

SNPs are more frequent than any other type of polymorphism

Page 13

Genetic Markers

• Genetic markers
  – DNA sequence with a known physical location on a chromosome
  – An identifiable segment of DNA (e.g., SNPs, microsatellites) with enough variation between individuals that its inheritance and co-inheritance with alleles of a given gene can be traced
  – Genetic markers can be used to refer to a particular location in genomes or in a genetic map.

http://www.genome.gov/glossary/index.cfm?id=86
Check out the “Listen” voice recording of Dr. Hurle’s explanation of genetic markers

Page 14

Summary

• Alleles and genotypes
• Different types of genetic polymorphisms
  – Single nucleotide polymorphisms (SNPs)
  – Structural variants
    • Insertions, deletions, copy number variations, etc.
  – SNPs are the most abundant polymorphisms and are often used as genetic markers