Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data
Kai Ye
![Page 1: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/1.jpg)
Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data
![Page 2: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/2.jpg)
Data collection for osteoarthritis, cardiovascular disease and longevity
• Serum parameters
• Cellular characteristics (biobank)
• Skin ageing
• Glycosylation
• Metabonomic
• Transcriptomic
• Genetic (GWAS/sequence)
• Epigenetic
• Data Integration
[Figure: chromatogram, signal (FLU, mV) against retention time (min, 0 to 80), with 18 numbered peaks between 36.281 and 76.407 min. Legend: N-Acetylglucosamine, Galactose, Mannose, Sialic acid, Fucose.]
![Page 3: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/3.jpg)
• Genetic & epigenetic analyses
• Biochem analyses
• Expression analysis
• Metabonomic analysis
• Glycosylation
• Cell responses
• Statistical analysis
Joost Kok, Erik vd Akker, Kai Ye
![Page 4: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/4.jpg)
About me
• 1995 – 2003 B.S. and M.S. in biology and pharmaceutical science
• 2004 – 2008 PhD cum laude at Leiden University. Thesis title: Novel algorithms for protein sequence analysis
• 2008 – 2009 Postdoc at the European Bioinformatics Institute, collaborating with scientists at the Sanger Institute
• Currently assistant professor at MolEpi
![Page 5: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/5.jpg)
A Pindel approach for identifying indels in Next-Gen sequencing data
• Paired-end reads in Next-gen sequencing
• Indel detection algorithms
• Pindel
• Cancer genome project
• 1000 genomes project
![Page 6: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/6.jpg)
Paired-end reads in Next Generation sequencing
~ insert size
![Page 7: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/7.jpg)
SNP
Mapping paired-end reads
CNVs: copy number variations; INDELs: insertions and deletions; SVs: Structural variations
![Page 8: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/8.jpg)
Gapped alignment for small indels
ATCCGTATCACGGTCA-CAGATCAGTCCAGT
ATCCGTATCACGGTCAGCAGATCAGTCCAGT
indel
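A minimal sketch of how the gap in the alignment above can be read off as an indel call. It assumes the two rows are already aligned and of equal length, with `-` marking a gap. The function name `find_gaps` is hypothetical, and the slide does not say which row is the read and which is the reference.

```python
def find_gaps(row_a, row_b, gap="-"):
    """Return (start, length, row) for each run of gap characters.

    row is "a" or "b", naming the row that contains the gap.
    Assumes both rows come from one pairwise alignment (equal length).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    gaps = []
    i = 0
    while i < len(row_a):
        for name, row in (("a", row_a), ("b", row_b)):
            if row[i] == gap:
                # Extend over the whole run of gap characters in this row.
                j = i
                while j < len(row) and row[j] == gap:
                    j += 1
                gaps.append((i, j - i, name))
                i = j - 1
                break
        i += 1
    return gaps


top = "ATCCGTATCACGGTCA-CAGATCAGTCCAGT"
bottom = "ATCCGTATCACGGTCAGCAGATCAGTCCAGT"
print(find_gaps(top, bottom))
```

A gap in one row is a deletion relative to the other row, or equivalently an insertion in the other row, so the call only becomes directional once the reference row is known.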
![Page 9: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/9.jpg)
Read-depth for CNVs
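Read depth can flag copy number changes because a deleted region collects fewer reads and a duplicated region collects more. A rough sketch, assuming per-window read counts are already available. The function name and the 0.5 and 1.5 ratio thresholds are illustrative assumptions, not values from the talk.

```python
from statistics import median


def flag_cnv_windows(counts, low=0.5, high=1.5):
    """Label each window by its depth relative to the median window depth."""
    baseline = median(counts)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    labels = []
    for count in counts:
        ratio = count / baseline
        if ratio <= low:
            labels.append("loss")
        elif ratio >= high:
            labels.append("gain")
        else:
            labels.append("normal")
    return labels


# Made-up read counts for nine consecutive windows.
depths = [30, 31, 29, 14, 15, 30, 46, 45, 30]
print(flag_cnv_windows(depths))
```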
![Page 10: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/10.jpg)
Read-pair approach for SVs
No Indel
Deletion
Insertion
Sample
Reference
Sample
Reference
Sample
Reference
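The read-pair idea can be sketched as a comparison between the distance at which the two ends map on the reference and the expected insert size. A pair that maps too far apart suggests a deletion in the sample; a pair that maps too close together suggests an insertion. The function name, the tolerance parameter and the numbers are illustrative assumptions.

```python
def classify_pair(mapped_span, expected_insert, tolerance):
    """Classify one read pair by the distance between its ends on the reference."""
    if mapped_span > expected_insert + tolerance:
        return "deletion"   # sample lacks sequence present in the reference
    if mapped_span < expected_insert - tolerance:
        return "insertion"  # sample carries sequence absent from the reference
    return "no indel"


for span in (200, 650, 90):
    print(span, classify_pair(span, expected_insert=200, tolerance=30))
```

In practice the tolerance would come from the spread of the insert size distribution, which is why small indels are hard to see with this signal alone.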
![Page 11: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/11.jpg)
Mapping paired-end reads
• read-pairs
• read-depth
SNP or small indel
![Page 12: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/12.jpg)
![Page 13: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/13.jpg)
test
ref
1 base - 1 million bases
Pindel: Deletions
![Page 14: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/14.jpg)
22 April 2023 14
Pindel: Deletions
ref
Anchor
![Page 15: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/15.jpg)
ref
Pindel: Deletions
Anchor
2 x average distance
![Page 16: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/16.jpg)
ref
Pindel: Deletions
Anchor
2 x average distance
Expected maximum deletion size + read length (36)
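The two search windows on these slides can be sketched as follows. This is a simplified illustration under stated assumptions, not Pindel's implementation: plain exact substring search stands in for Pindel's own matching, which this transcript does not describe, and the function name, the `min_part` parameter and the toy sequences are all hypothetical. The unmapped mate is split into a left and a right part; the left part is searched within 2 x average distance of the anchor, and the right part within the expected maximum deletion size plus one read length beyond it.

```python
def find_deletion(ref, anchor_end, read, avg_distance, max_deletion, min_part=5):
    """Locate a deletion spanned by `read`, the unmapped mate of an anchored read.

    Returns (deletion_start, deletion_length) in reference coordinates,
    or None if no split of the read fits both search windows.
    """
    read_len = len(read)
    # Window 1: the mate should lie within 2 x average distance of the anchor.
    w1_start = anchor_end
    w1_end = min(len(ref), anchor_end + 2 * avg_distance)
    if ref.find(read, w1_start, w1_end) != -1:
        return None  # the whole read maps, so it spans no deletion here
    # Try every split point, longest left part first.
    for k in range(read_len - min_part, min_part - 1, -1):
        left, right = read[:k], read[k:]
        left_start = ref.find(left, w1_start, w1_end)
        if left_start == -1:
            continue
        left_end = left_start + k
        # Window 2: expected maximum deletion size + read length past the left part.
        w2_end = min(len(ref), left_end + max_deletion + read_len)
        right_start = ref.find(right, left_end + 1, w2_end)
        if right_start != -1:
            return left_end, right_start - left_end
    return None


# Toy reference: two flanks around a 20-base segment missing from the sample.
flank_a = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
deleted = "TTTTTGGGGGCCCCCAAAAA"
flank_b = "GATCCTAGGCATGCAAGTCCGTAAGCTAGC"
ref = flank_a + deleted + flank_b
read = flank_a[-10:] + flank_b[:10]  # sample read that skips the deleted bases
print(find_deletion(ref, anchor_end=5, read=read, avg_distance=20, max_deletion=50))
```

Restricting both searches to windows near the anchor is what keeps the split-read search tractable: the read is compared against a short stretch of the reference rather than the whole genome.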
![Page 17: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/17.jpg)
reference
Pindel: Deletions
sample
![Page 18: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/18.jpg)
African male: NA18507
• Bentley et al., Nature 2008
• 135 Gb of sequence
• ~4 billion paired 35-base reads
• After preprocessing: 56,161,333 pairs of one-end mapped reads
• Pindel
  – 142,908 insertions of 1-16 bp
  – 162,068 deletions of 1 bp-10 kb
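As a quick consistency check on the figures above, the read count and raw depth can be derived from the total sequence. The genome size of 3.1 Gb is an assumption for illustration and does not come from the slide.

```python
total_bases = 135e9   # 135 Gb of sequence (from the slide)
read_length = 35      # bases per read (from the slide)
genome_size = 3.1e9   # assumed haploid human genome size, not from the slide

reads = total_bases / read_length   # number of individual reads
depth = total_bases / genome_size   # raw depth before any filtering
print(f"{reads / 1e9:.2f} billion reads, about {depth:.0f}x raw depth")
```

The derived read count is in line with the "~4 billion" on the slide if that figure counts individual reads rather than pairs.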
![Page 19: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/19.jpg)
Deletion size distribution
![Page 20: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/20.jpg)
Applications
• Cancer genome project
• 1000 genomes project
![Page 21: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/21.jpg)
Cancer genome
• COLO-829 cells
• Normal ~30x paired-end 100bp reads
• Tumor ~40x paired-end 100bp reads
• Search for somatic (tumor specific) indels
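The search for somatic indels can be sketched as a filter that keeps calls seen in the tumor but not in the matched normal. This is a minimal illustration that assumes indel calls from both samples are already available as comparable tuples; the function name, the tuple layout and all values are made up.

```python
def somatic_calls(tumor_calls, normal_calls):
    """Keep tumor indel calls that are absent from the matched normal."""
    normal = set(normal_calls)
    return [call for call in tumor_calls if call not in normal]


# Each call is (chromosome, position, type, length); all values are made up.
tumor = [("chr1", 1000, "DEL", 12), ("chr2", 5000, "INS", 3), ("chr7", 42, "DEL", 1)]
normal = [("chr2", 5000, "INS", 3)]
print(somatic_calls(tumor, normal))
```

Exact matching is the simplest choice; a real comparison would likely need to tolerate small differences in reported breakpoint positions between the two samples.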
![Page 22: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/22.jpg)
![Page 23: Bioinformatics at Molecular Epidemiology - new tools for identifying indels in sequencing data](https://reader036.vdocuments.net/reader036/viewer/2022062520/56815a8a550346895dc8002f/html5/thumbnails/23.jpg)
1000 Genomes Project
• Pilot 1: 180 people from 3 major geographic groups (YRI, CEU, CHB and JPT) at low coverage (~4x)
• Pilot 2: the genomes of two families (CEU and YRI, both parents and an adult child) with deep coverage (20x per genome)
• Pilot 3: sequencing the coding regions (exons) of 1,000 genes in 1,000 people with deep coverage (20x).