methods in genome wide association studies. norú moreno cs374:: algorithms in biology professor:...
![Page 1: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/1.jpg)
Methods in genome wide association studies
Norú Moreno
CS374: Algorithms in Biology
Professor: Serafim Batzoglou
![Page 2: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/2.jpg)
Agenda
• GWA
• Polymorphisms
• HapMap Project
• Genotyping chip
• Integrating CNVs and SNPs
• Imputation
• Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
![Page 3: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/3.jpg)
Genome-wide Association Study (GWA study or GWAS)
• Completion of the Human Genome Project in 2003
• Examination of genetic variation across a given genome.
• Objective: Identify genetic associations with observable traits
![Page 4: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/4.jpg)
GWAS
•Scan SNPs across many individuals to associate alleles with a particular disease
•Use a detected association to detect, treat and prevent the disease
•Pharmacogenomics.
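A minimal sketch of how one SNP might be tested for association, assuming a simple allelic chi-square test on a 2x2 table of allele counts in cases versus controls. The slides do not name a specific test, and the function name and counts here are illustrative only.

```python
def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 d.f.) for a 2x2 table of allele counts.

    Rows are cases/controls, columns are alleles A/B. Assumes every row and
    column total is non-zero.
    """
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    row_sums = [sum(row) for row in table]
    col_sums = [case_a + control_a, case_b + control_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele and disease status were independent.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: allele A is more common in cases than in controls.
statistic = allelic_chi_square(60, 40, 40, 60)
```

In a real scan this statistic would be computed for every SNP and the resulting p-values corrected for the large number of tests.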
![Page 5: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/5.jpg)
Polymorphisms
• A specific sequence variation that some individuals possess
• Some variations are common, others are rare
• Examples:
◦ Blood types
◦ Height
◦ Skin Color
◦ Etc…
![Page 6: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/6.jpg)
Types of polymorphisms
1. Copy Number Variation (CNV)
• Segments of DNA that are found in different numbers of copies among individuals
• Substantial regions, not single nucleotides
[Figure: segments A B C, a copy with B missing (A C), and a copy with B repeated (A B B B C)]
![Page 7: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/7.jpg)
Types of polymorphisms
2. Single Nucleotide Polymorphism (SNP)
(Murray 2007)
![Page 8: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/8.jpg)
HapMap
• Two unrelated people share about 99.5% of their DNA sequence.
• HapMap focuses only on common SNPs: those present in at least 1% of the population.
• 269 individuals, ~4M SNPs
• Genotyped the individuals for these SNPs, and published the results
![Page 9: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/9.jpg)
Genotyping chip
[Figure: the sequence ACTGGGCTAATCGATCGACTAGCTAGCTAGTCTCGATCAAT and the probes on the chip]
![Page 10: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/10.jpg)
Genotyping chip
(Liu 2007) (Affymetrix)
![Page 11: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/11.jpg)
Genotyping chip
(Affymetrix)
![Page 12: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/12.jpg)
Genotyping chip
[Figure: A versus B probe intensity, with genotype clusters BB (0), AB (0.5) and AA (1)]
![Page 13: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/13.jpg)
Genotyping chip
• Affymetrix 100k chip set
◦ Entire genome with 100 000 SNPs (low density)
• Affymetrix 500k chip (SNP array 5.0)
◦ Entire genome with 500 000 SNPs (high density)
• Affymetrix 1M chip (SNP array 6.0)
◦ Entire genome with 1 000 000 SNPs (very high density)
![Page 14: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/14.jpg)
Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs (Birdsuite)
Korn, et al.
![Page 15: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/15.jpg)
Birdsuite
• Takes both CNVs and SNPs into account.
• Input: raw data from the genotyping chip.
• Output: an integrated CNV and SNP genotype per locus.
• CNVs and SNPs coexist.
• Both common and rare variants are needed to understand the role of genetic variation in disease.
![Page 16: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/16.jpg)
Birdsuite
[Figure: SNPs (AA, AB, BB) combined with CNPs give a new genotype; labels shown: A-null, AAAB, BBBB]
![Page 17: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/17.jpg)
Birdsuite – 4 Stages
• Canary: ‘genotypes’ common copy-number polymorphisms (CNPs).
• Birdseed: genotypes SNPs using the classical AA, AB, and BB genotypes.
• Birdseye: identifies rare CNVs via HMMs.
• Fawkes: integrates CNV information to produce mutually consistent SNP genotypes (i.e. including genotypes such as A-null and AAB).
![Page 18: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/18.jpg)
Birdsuite - Canary
• Determines the copy number of each individual at each predefined CNP locus.
• CNP = copy number polymorphism: a CNV with >1% frequency in the population

Locus   Number of copies
A       1
B       3
C       1

[Figure: segments A B B B C]
![Page 19: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/19.jpg)
Canary
(Korn, p.1255)
![Page 20: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/20.jpg)
Birdsuite - Birdseed
• We expect only AA, AB or BB.
• From Canary, keep only the loci with 2 copies: no fewer or extra copies.
[Figure: genotype clusters BB, AA and AB] (Korn, p.1257)
![Page 21: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/21.jpg)
Birdsuite - Birdseye
• Using Canary and Birdseed:
◦ Identify rare and de novo CNVs
◦ Small number of real CNVs at unknown sites.
• Search for consistent evidence of copy number variation across multiple neighboring probes.
• Implement an HMM-based algorithm to find strong, consistent evidence for altered copy number states
![Page 22: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/22.jpg)
Birdsuite - Birdseye
• HMM to find regions of variable copy number in a sample.
• Hidden state: the true copy number of the individual’s genome.
• Observed states: the normalized intensity measurements of each probe on the array.
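The hidden-state and observed-state setup can be illustrated with a small Viterbi decoder. This is a sketch under stated assumptions, not Birdseye's actual model: the copy-number states, the Gaussian emission means (copy number divided by 2, so two copies give an intensity near 1.0), the shared `sigma` and the `stay_prob` transition value are all made up for illustration.

```python
import math


def viterbi_copy_number(intensities, states=(0, 1, 2, 3, 4),
                        stay_prob=0.99, sigma=0.15):
    """Most likely copy-number path for a sequence of probe intensities.

    Assumed model (hypothetical): copy number c emits a Gaussian intensity
    with mean c / 2 and standard deviation sigma; the chain keeps its state
    with probability stay_prob and otherwise moves uniformly to another one.
    """
    n = len(states)
    switch_prob = (1.0 - stay_prob) / (n - 1)

    def log_emit(x, c):
        # Gaussian log-density up to a constant shared by all states.
        return -((x - c / 2.0) ** 2) / (2 * sigma ** 2)

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # Uniform prior over the starting state.
    score = [log_emit(intensities[0], c) for c in states]
    back = []
    for x in intensities[1:]:
        new_score, pointers = [], []
        for j, c in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(x, c))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[i] for i in path]


# Hypothetical probes: one noisy probe, then four neighboring low probes.
probes = [1.0, 1.0, 0.5, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
calls = viterbi_copy_number(probes)
```

With these settings a single low probe is treated as noise, while a run of neighboring low probes is called as a one-copy region, which is the "consistent evidence across multiple neighboring probes" idea.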
![Page 23: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/23.jpg)
Birdsuite - Fawkes
• Merge all the results.
• Show the CNVs within each SNP.
• Utilize the imputed locations (in A/B intensity space) of copy-variable clusters.
• Assign an allele-specific copy number genotype at each SNP (e.g. AAB, ABBB, A or B).
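To make the allele-specific genotype labels concrete, here is a deliberately simplified sketch. It assumes a total copy number and an A-allele fraction are already known and only builds the label. Fawkes itself works from cluster locations in A/B intensity space, which this does not attempt; the function name and inputs are hypothetical.

```python
def allele_specific_genotype(copy_number, a_fraction):
    """Build a label such as 'AAB' from a copy number and an A-allele fraction.

    Simplification: the number of A copies is the rounded product of the two
    inputs, clamped to the range 0..copy_number.
    """
    if copy_number == 0:
        return "null"
    a_copies = round(a_fraction * copy_number)
    a_copies = max(0, min(copy_number, a_copies))
    return "A" * a_copies + "B" * (copy_number - a_copies)


# Hypothetical calls at three SNPs inside copy-variable regions.
labels = [
    allele_specific_genotype(3, 0.67),
    allele_specific_genotype(4, 0.25),
    allele_specific_genotype(1, 1.0),
]
```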
![Page 24: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/24.jpg)
Fawkes
(Korn, p. 1254,1257)
![Page 25: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/25.jpg)
(Affymetrix website screenshot)
![Page 26: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/26.jpg)
Imputation
• Dealing with missing data points by filling in values.
• In SNPs: T A G G T ? T G C C T A G C G T
Why?
- Cost-saving
- Avoid re-genotyping
- Keep effective sample size
- SNP comparisons between existing platforms.
![Page 27: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/27.jpg)
Imputation
• High rate of occurrence.
◦ ‘Direct’ imputation.

T A G G T ? T G C C T A G C G T
T A G G T A T G C C T A G C G T
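One possible reading of ‘direct’ imputation is to fill a missing call with the allele seen most often at that position in a reference panel. The sketch below assumes that reading; the panel and function name are hypothetical, and real imputation tools use far richer models.

```python
from collections import Counter


def impute_most_frequent(sample, reference_panel, missing="?"):
    """Fill each missing call with the most common reference allele there."""
    imputed = []
    for pos, allele in enumerate(sample):
        if allele == missing:
            counts = Counter(
                ref[pos] for ref in reference_panel if ref[pos] != missing
            )
            # Leave the gap as-is if the panel has no call at this position.
            allele = counts.most_common(1)[0][0] if counts else missing
        imputed.append(allele)
    return "".join(imputed)


# The sequence from the slide, with a made-up three-sample reference panel.
sample = "TAGGT?TGCCTAGCGT"
panel = ["TAGGTATGCCTAGCGT", "TAGGTATGCCTAGCGT", "TAGGTCTGCCTAGCGT"]
filled = impute_most_frequent(sample, panel)
```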
![Page 28: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/28.jpg)
Imputation
• Linkage disequilibrium (LD)
◦ Non-random association of alleles at two or more loci.
[Figure: the SNP of interest and the SNPs in LD with it]
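Linkage disequilibrium between two biallelic SNPs is commonly summarized by r². A minimal sketch, assuming phased haplotypes coded as 0/1 pairs (the coding and the function name are choices made here, not taken from the slides):

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), each 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                      # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


# Hypothetical haplotypes: the two loci always travel together.
perfect_ld = ld_r_squared([(0, 0), (1, 1), (0, 0), (1, 1)])
# Hypothetical haplotypes: every allele combination is equally common.
no_ld = ld_r_squared([(0, 0), (0, 1), (1, 0), (1, 1)])
```

When r² is close to 1, a genotyped SNP predicts the alleles of an untyped neighbor well, which is what makes LD-based imputation possible.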
![Page 29: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/29.jpg)
Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
Homer, et al.
![Page 30: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/30.jpg)
The DNA Detective

Is an individual genome present in a DNA mixture?

[Figure labels: Query, Mixed DNA, Population]
![Page 31: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/31.jpg)
DNA Detective

We have:
• Different laboratories reach different conclusions.
• Usually not accurate at all.
• Hard, and cannot be automated.
![Page 32: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/32.jpg)
DNA Detective - Methodology
Summary:
• Cumulative sum of allele shifts over all available SNPs.
• The sign of the shift shows whether the individual of interest is closer to a reference sample or closer to a given mixture.
• First genotype a single SNP for a single person, then extend the approach to all mixtures and pooled data.
![Page 33: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/33.jpg)
DNA Detective – Single SNP, Single person
• Raw preprocessed data > allele intensity (how much of A and how much of B we have).
1. Transform normalized data into a ratio.
• Yi is the estimate of allele frequency

BB    AB     AA
~0    ~0.5   ~1
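A short sketch of the ratio step, assuming the form Y = A / (A + k·B), where k is a per-SNP correction factor. The default k = 1.0, the intensity values and the nearest-center genotype call are assumptions made for illustration.

```python
def allele_frequency_estimate(a_intensity, b_intensity, k=1.0):
    """Ratio Y = A / (A + k * B); k is an assumed per-SNP correction factor."""
    return a_intensity / (a_intensity + k * b_intensity)


def nearest_genotype(y):
    """Call the genotype whose expected ratio (0, 0.5 or 1) is closest to y."""
    centers = {"BB": 0.0, "AB": 0.5, "AA": 1.0}
    return min(centers, key=lambda g: abs(y - centers[g]))


# Hypothetical normalized intensities for three people at one SNP.
ratios = [
    allele_frequency_estimate(900, 100),
    allele_frequency_estimate(500, 500),
    allele_frequency_estimate(100, 900),
]
genotypes = [nearest_genotype(y) for y in ratios]
```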
![Page 34: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/34.jpg)
DNA Detective - Methodology
• Use relative probe intensity data.
• Compare the individual's allele frequency estimates with those from the mixture (M) and from a reference population (Pop).
• Assume the reference population (Pop) and the mixture have similar, interchangeable ancestral components.
![Page 35: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/35.jpg)
DNA Detective - Methodology
Distance measure for individual Yi
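The formula on this slide did not survive extraction. As best recalled from Homer et al., the per-SNP distance has the form below, where $Y_{i,j}$ is the allele frequency estimate of individual $i$ at SNP $j$, and $M_j$ and $Pop_j$ are the estimates for the mixture and the reference population. Treat it as a reconstruction and check it against the paper.

```latex
D(Y_{i,j}) = \lvert Y_{i,j} - Pop_j \rvert - \lvert Y_{i,j} - M_j \rvert
```

The value is positive when $Y_{i,j}$ is closer to the mixture and negative when it is closer to the reference population.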
![Page 36: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/36.jpg)
DNA Detective - Methodology
• Null hypothesis: the individual is not in the mixture, D(Yi,j) ~ 0
• Alternative hypothesis: D(Yi,j) > 0, the individual is more similar to M than to Pop
• D(Yi,j) < 0: Yi,j is more ancestrally similar to Pop than to M.
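A sketch of how the per-SNP distances might be combined into one test, assuming the distance D = |Y − Pop| − |Y − M| and a one-sample t-statistic of the mean of D against 0 over all SNPs. The function names and the frequencies are hypothetical, and the input is assumed to contain at least two SNPs with non-identical distances.

```python
import math


def distance(y, pop, mix):
    """Positive when y is closer to the mixture than to the reference."""
    return abs(y - pop) - abs(y - mix)


def t_statistic(y_values, pop_freqs, mix_freqs):
    """One-sample t-statistic of the per-SNP distances against a mean of 0."""
    d = [distance(y, p, m) for y, p, m in zip(y_values, pop_freqs, mix_freqs)]
    s = len(d)
    mean = sum(d) / s
    variance = sum((x - mean) ** 2 for x in d) / (s - 1)  # sample variance
    return mean / math.sqrt(variance / s)


# Hypothetical data for four SNPs: the mixture is shifted toward the person.
person = [1.0, 0.5, 0.0, 1.0]
reference = [0.5, 0.5, 0.5, 0.5]
mixture = [0.6, 0.5, 0.4, 0.7]
t_value = t_statistic(person, reference, mixture)
```

A clearly positive value points toward the individual being in the mixture; a real analysis would use hundreds of thousands of SNPs rather than four.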
![Page 37: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/37.jpg)
(Homer, p.4)
![Page 38: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/38.jpg)
DNA Detective - Results
• Accurate findings.
• Determined whether a trace amount (<1%) of DNA is present in a DNA mixture.
• Tested with different kinds of mixtures from publicly available data.
![Page 39: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/39.jpg)
DNA Detective - Implications
• Forensics application.
• Traceability
• Leak of private information.
◦ Public data from many studies. Summary statistics of allele frequency.
• Political implications.
◦ How to share the data now?
![Page 40: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/40.jpg)
Thank You!
![Page 41: Methods in genome wide association studies. Norú Moreno CS374:: Algorithms in Biology Professor: Serafim Batzoglou](https://reader030.vdocuments.net/reader030/viewer/2022033018/56649f285503460f94c408ea/html5/thumbnails/41.jpg)
References
• Korn J, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics. 2008 Oct;40(10):1253-60.
• Homer N, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008 Aug 29;4(8):e1000167.
• Liu Y, DPhil, Prchal F. SNP-Chip-Based Genome-Wide Analysis of Genetic Alterations in Hematologic Disorders: The Way Forward? The Hematologist. 2007.
• Murray, E. IST 341 Issues in Human Genetics. http://www.science.marshall.edu/murraye/341/snps/Human%20Genetics%20MTHFR%20SNP%20Page.html