![Page 1: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/1.jpg)
1
![Page 2: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/2.jpg)
![Page 3: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/3.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
A brief reminder
![Page 4: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/4.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
A brief reminder
sister chromatids
homologous chromosomes
Paternal Maternal
![Page 5: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/5.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
M:
P:
HeterozygosityTACAAATATTACAGATAT
7
Heterozygous sites
![Page 6: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/6.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
M:
P:
Heterozygosity
segregating sites
8
![Page 7: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/7.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
M:
P:
HeterozygosityTACAAATATTACAGATAT
9
Homozygous sites
![Page 8: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/8.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
M:P:
TACAAATATTACAGATAT
Heterozygosity
ind#A
M:P:
TACAGATCTTACAGATCT
ind#B
12
![Page 9: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/9.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
M:P:
TACAAATATTACAGATAT
Heterozygosity
M:P:
TACAGATCTTACAGATCT
ind#A
ind#B
13
![Page 10: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/10.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
M:P:
TACAAATATTACAGATAT
M:P:
TACAGATCTTACAGATCT
Homozygous variantind#A
ind#B
14
![Page 11: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/11.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
M:P:
TACAAATATTACAGATAT
M:P:
TACAGATCTTACAGATCT
Homozygous invariantind#A
ind#B
15
![Page 12: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/12.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
AAACAGATCCCGCTGGGTTT
reference
ind#X TACAAATATTACAGATAT
Genotyping
16
Which of the 10 possible genotypes is the most likely?
![Page 13: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/13.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
reference
ind#X
reference
ind#Y
TACAAATATTACAGATAT
TACAGATCTTACAGATCT
Joint Genotyping
17
![Page 14: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/14.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Menu
• Introduction
• Alignment post-processing
• From aligned reads to genomic variation
• Variant effect
18
![Page 15: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/15.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Generalized NGS analysis
19
![Page 16: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/16.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Brief probability reminder
20
Events:
= I pick a random human and that person is Danish
= pop. of Denmark / pop. world
?
![Page 17: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/17.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Brief probability reminder
21
Events:
= I pick a random human and that person is Danish
= I pick a random human and that person’s name is Rasmus
= pop. of Denmark / pop. world
= # of Rasmuses / pop. world
= # of Rasmuses in Denmark / # of Rasmuses in the world
?
My name is Rasmus!
![Page 18: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/18.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
What is genotyping?
Genotyping is determining which genotype maximizes:
22
genotype data
![Page 19: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/19.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
What is genotyping?
23
genotype data
Genotyping is determining which genotype maximizes:
![Page 20: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/20.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
What is genotyping?
24
genotype data
TA
Genotyping is determining which genotype maximizes:
![Page 21: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/21.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 25
What is genotyping?
![Page 22: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/22.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 26
What is genotyping? likelihood: What is the probability of seeing the data given the genotype?prior: what is the probability of
the genotype to begin with?
![Page 23: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/23.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 27
What is genotyping? likelihood: What is the probability of seeing the data given the genotype?prior: what is the probability of
the genotype to begin with?
evidence: What is the probability of generating that data to begin with?
![Page 24: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/24.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
29
i.e. each reads is an independent observation
![Page 25: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/25.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
30
G
T
T
Toy example:
![Page 26: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/26.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
31
G
T
T
Toy example:
quality score: 10
quality score: 20
quality score: 20
![Page 27: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/27.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
32
G
T
T
Toy example:
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
![Page 28: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/28.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
33
G
T
T
Toy example:
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
The 2 Ts are sequencing errors!The genotype is GG
Nope! They are all correct and the genotype is GT
Liar! The G is a sequencing error! TT is the genotype
![Page 29: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/29.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
34
G
T
T
Toy example:
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
Error model
P(G|A)=
P(G|C)=
P(G|G)=
P(G|T)=
What I think is the baseprobability of the data given the base
![Page 30: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/30.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
36
G
T
T
Toy example:
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
Let’s evaluate 3 possible genotypes:
GG
GT
TT
![Page 31: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/31.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
37
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
GG
![Page 32: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/32.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
38
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
GG
G G
![Page 33: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/33.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
39
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
GG
G G
![Page 34: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/34.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
40
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
GG
G G
![Page 35: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/35.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
41
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
GG
G G
![Page 36: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/36.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
42
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
GG
G G
![Page 37: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/37.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
43
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
GT
G T
![Page 38: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/38.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
44
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01
TT
T T
![Page 39: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/39.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
45
G
T
T
quality score: 10 P(error) = 0.1
quality score: 20 P(error) = 0.01
quality score: 20 P(error) = 0.01 TT = 0.0327
GT = 0.11511
GG = 0.00001
![Page 40: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/40.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
46
TT = 0.0327
GT = 0.11511
GG = 0.00001
GG GG GT GT TT TT
A likelihood in itself is not meaningful,
you need to compare it to other
models
![Page 41: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/41.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
47
TT = 0.0327
GT = 0.11511
GG = 0.00001
We will neglect the genotype prior this time
![Page 42: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/42.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
48
GG = 6.7e-05
GT = 0.77888
TT = 0.22104
![Page 43: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/43.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
49
GG = 6.7e-05
GT = 0.77888
TT = 0.22104
Important point: More coverage →More multiplications →The relative difference between models become larger
![Page 44: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/44.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
50
GG = 6.7e-05 41.70 40.60
GT = 0.77888 1.09 0.00
TT = 0.22104 6.56 5.47
PHRED PHRED-scaled
What is the
difference between the most likely and second
most likely
![Page 45: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/45.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Details I did not cover
- Error model- Most genotypers do not simply use raw quality scores
51
![Page 46: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/46.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Most common genotypers
• GATK
• SAMtools/BCFtools
• graphtyper
• FreeBayes
52
![Page 47: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/47.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 53
https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-
GATK’s recommended workflow
![Page 48: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/48.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The PCR amplification step included in the majority of NGS library construction techniques can introduce duplicates in the data.
![Page 49: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/49.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
We want: remove or mark them to avoid false callsex: the site below is probably heterozygous (i.e. the is the second allele)
![Page 50: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/50.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Genotypers will ignore reads marked as duplicatesex: the site below is probably homozygous (i.e. the is a seq. error)
![Page 51: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/51.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Duplicate/marking removal
Basic concepts of duplicate marking algorithm:
• Identify genomic position and strand for 5’-most bases.
• Mark reads that are duplicates of each other.
• Within a group of duplicate reads, the read with the highest sum of base
quality scores is retained.
http://picard.sourceforge.net/
![Page 52: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/52.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Duplicate/marking removal
Problems:
• Does not account for sequencing errors.
• Does not account for natural duplicates.
• Does not account for duplicate reads with different mapping locations.
![Page 53: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/53.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 59
https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-
GATK’s recommended workflow
![Page 54: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/54.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
• remember those?
• There are supposedto reflect P(error)
• They are not always accurate: problem for genotyping
Base quality score recalibration?
![Page 55: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/55.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
![Page 56: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/56.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
![Page 57: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/57.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Base quality score recalibration
To work we need:• East Asian or European (as in mostly West European) samples• WGS• Sufficient coverage
My biased opinion:• Just don’t bother
![Page 58: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/58.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 64
https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-
GATK’s recommended workflow
We covered this before
![Page 59: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/59.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
• Details which variants have been called
• Can be bgzip (block gzip) and indexed using tabix
• Using tabix, queries can be made like:
– return all variants in the region chr22:323,340-361,152
65
![Page 60: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/60.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
66
![Page 61: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/61.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
67
name of chromosome (ex: chr1, chr2 ...
![Page 62: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/62.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
68
coordinate on chromosome
![Page 63: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/63.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
69
ID (ex: rs23534)
![Page 64: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/64.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
70
reference base
![Page 65: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/65.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
71
alternative base
![Page 66: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/66.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
72
quality field
![Page 67: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/67.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
73
Filter (ex: ‘LowQual’)
![Page 68: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/68.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
74
Info field ex:AC= allele countDP = depthMQ = root mean square of the mapping quality
![Page 69: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/69.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
75
Format field, what do the next fields mean?
![Page 70: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/70.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
76
Most likely genotype
![Page 71: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/71.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
77
Allele distribution
![Page 72: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/72.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
78
Depth
![Page 73: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/73.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
79
Genotype quality
![Page 74: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/74.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant call format (VCF)
20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0
20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0
20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66
20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100
20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120
20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63
80
PHRED-scaled likelihood
![Page 75: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/75.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
The likelihood
81
GG = 6.7e-05 41.70 40.60
GT = 0.77888 1.09 0.00
TT = 0.22104 6.56 5.47
PHRED PHRED-scaled
PL field
GQ field
![Page 76: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/76.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 82
https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-
GATK’s recommended workflow
![Page 77: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/77.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant filtration (soft)
• How do we remove false positive calls?
• Use known polymorphic sites to estimate what a real variant and a false variant “looks like”
• Learn how does the known sites (=truth set) look like in our data
• Evaluate on all our data, filter sites that look different!
83
![Page 78: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/78.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
![Page 79: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/79.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant filtration (hard)• Hard filtering:
– Variant quality score /depth – Mapping quality– Mappability– Strand bias (the variant being seen only on the forward
strand or only on the reverse strand)– Depth
• BCFtools can perform this• Depends on the project at hand• Be careful of introducing a bias in favor of certain types of
variants
![Page 80: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/80.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 86
https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-
GATK’s recommended workflow
![Page 81: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/81.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant annotation
87
What does the SNP do?
![Page 82: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/82.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Variant annotation
• Some example of tools:
– Annovar
– Ensembl Variant Effect Predictor (VEP)
– SnpEff
• As good as annotations
• Beware of gene expression
88
![Page 83: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/83.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
we did not cover (in detail)...
![Page 84: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/84.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Germline vs somatic
![Page 85: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/85.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
de novo
![Page 86: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/86.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Polyploid
![Page 87: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/87.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Phasing
TACAAATAT
TAGAAACAT
TACAAACAT
TAGAAATATvs
TA AAA AT C T G C
![Page 88: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/88.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
INDELs
Insertions Deletion
TACAAATAT
TACAAAGCTAT
TACAAATAT
TACAAAT
![Page 89: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/89.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
TACAAA--TAT TACAAAGCTAT
INDELs
Caution:
GC was inserted
![Page 90: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/90.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
TACAAA--TAT TACAAAGCTAT
INDELs
Caution:
GC was deleted
![Page 91: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/91.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
TACAAA--TAT TACAAAGCTAT TACAAAGCTAT
GC was deletedmore likely, not guaranteed!
![Page 92: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/92.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Structural variants
![Page 93: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/93.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Structural variants
Translocation:
![Page 94: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/94.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Structural variants
Estivill, Xavier, and Lluís Armengol. "Copy Number Variants and Common Disorders: Filling the Gaps and Exploring Complexity in Genome-Wide Association Studies." PLoS Genet 3.10 (2007): e190.
Copy number variations (CNV)
![Page 95: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/95.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Structural variants
Copy number variations (CNV)effect on coverage
Weetman, David, Luc S. Djogbenou, and Eric Lucas. "Copy number variation (CNV) and insecticide resistance in mosquitoes: evolving knowledge or an evolving problem?." Current Opinion in Insect Science 27 (2018): 82-88.
![Page 96: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/96.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Ethical concerns
privacy, justice, fairness etc..
![Page 97: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT](https://reader035.vdocuments.net/reader035/viewer/2022081621/6128263a599d66182661ca6d/html5/thumbnails/97.jpg)
DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling
Exercise time!
http://teaching.healthtech.dtu.dk/22126/index.php/Postprocess_exercise