TRANSCRIPT
Laboratory for Cytogenetics and Genome Research [email protected]
Long Amplicon Sequencing of Repetitive Regions and Genomic Rearrangements
Matthew Hestand
KU Leuven, Belgium
PacBio Experiments at KU Leuven: Year 1
Targeted (amplicon):
- cDNAs
- specific genes
- breakpoint validation
- phasing
- sequence repeat units
Non-targeted:
- small genomes
- methylomes
Outline
Targeted Breakpoint Validation (in a LINE)
Targeted Breakpoint Validation & Phasing (PAR)
Tandem repeat analysis (FMR1)
Chr3-12 Breakpoint Validation
Similar to:
Targeted Breakpoint Validation: Chr3-12 in LINE
Chr3-12 Breakpoint Identification
9kb LR-PCR through breakpoint (LINE) chr3-12
chr3 | chr12
Illumina:
PacBio:
Robberecht et al. 2013
CHR 12 (L1PA3)
CHR 3 (L1PA2)
Allora Assembly
HGAP Assembly
Outline
Targeted Breakpoint Validation (in a LINE)
Targeted Breakpoint Validation & Phasing (PAR)
Tandem repeat analysis (FMR1)
Parts of this section cannot be shared yet. Please watch for the forthcoming paper:
“Pseudoautosomal Region 1 Length Polymorphism in the Human Population”, Mensah MA et al.
- Paper under review
PacBio sequence
24 barcodes – 1 SMRTcell
Initial analysis attempts:
− RS_Resequencing_GATK_Barcode and BWA-SW (tried several settings; final settings from Carneiro et al. 2012) with GATK or Varscan
  - observed many heterozygous calls in the breakpoint amplicon and the chr.Y-specific amplicon
− Appeared to be a reference bias
Extended PAR region validation and functional proof
Normal Ref
Alternate Ref
Normal Reference Alternative Reference
Carneiro et al. 2012, BMC Genomics 13:375
See Figure 2B in Carneiro et al. 2012
Solution
- Call variants with BWA-SW/Varscan
- Create a new reference from the variants
- Re-call variants with BWA-SW/Varscan
- Average the number of reads supporting the reference/variant in both variant calls

Varscan settings:
- Minimum read quality = 7
- Minimum coverage = 10
- Minimum number of variant reads = 5
- Minimum variant frequency = 0.15

Final frequency cutoffs:
- >70% = Hom. Variant
- <30% = Hom. Reference

Result: no heterozygous chr.Y or breakpoint calls remain, but the duplication still gives heterozygous calls
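The averaging step and the final cutoffs can be sketched roughly as below. This is a minimal illustration, not the script used in the talk: the function name `classify` is hypothetical, the variant frequency is computed over reference plus variant reads only, and calls falling between the two cutoffs are assumed to be heterozygous.

```python
def classify(call1, call2, hom_var=0.70, hom_ref=0.30):
    """Combine two variant calls for one site.

    call1 / call2 are (ref_reads, var_reads) tuples: call1 from alignment to
    the normal reference, call2 from alignment to the variant-derived
    reference. Returns (genotype, averaged variant frequency).
    """
    ref = (call1[0] + call2[0]) / 2  # average reference-supporting reads
    var = (call1[1] + call2[1]) / 2  # average variant-supporting reads
    total = ref + var
    if total == 0:
        return "no call", 0.0
    freq = var / total
    if freq > hom_var:
        return "V", freq  # homozygous variant
    if freq < hom_ref:
        return "R", freq  # homozygous reference
    return "H", freq      # assumption: in-between frequencies are heterozygous


# Made-up read counts: call 1 leans to the reference, call 2 to the variant.
print(classify((40, 10), (20, 30)))  # averaged frequency 0.4
print(classify((10, 40), (2, 48)))   # averaged frequency 0.88
```

Averaging the two runs is what cancels the opposite biases: a site that looks mostly reference in one alignment and mostly variant in the other ends up near the middle rather than being called homozygous either way.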
Example Calls
- H: Heterozygous
- V: Homozygous variant
- R1: called reference in first Varscan run
- If not R1: H/V/R : Var Freq : call 1 (Ref:Var:Total) : call 2 (Ref:Var:Total)
Observation: As expected, call 1 is reference biased and call 2 is variant biased
Duplication has two alleles: can we determine the paternal allele?
- need phasing
With the error rate and the BAM file format, this is impossible to see in IGV
Solution:
− Walk by trios of SNPs and report back the read counts
Step 3
SNP positions in amplicon: 567 - 754 - 1236 - 1296 - 1415

bam1: 567,754,1236 : GGC 1 GGG 15 TAC 15 GAG 6
bam2: 567,754,1236 : TAG 6 GGG 19 TAC 8 TGG 1 GAG 1 TGC 3

bam1: 754,1236,1296 : GGC 6 AGC 3 GCC 1 ACC 19 GGT 6 AGT 4
bam2: 754,1236,1296 : AGC 3 GCC 2 ACC 4 GCT 2 ACT 4 GGT 21 AGT 4

bam1: 1236,1296,1415 : GCG 2 CCA 2 CCG 21 GCA 12 GTA 14
bam2: 1236,1296,1415 : GCG 5 CTA 1 CCG 9 CTG 9 GTA 19 GTG 13

predicted alleles: GGGTA and TACCG

For automation: start walking from the most confident junction:
- two highest alleles same in both BAMs
- then by highest number of total reads
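The trio walk could be automated along these lines. This is only a sketch under stated assumptions: the read counts are copied from the slide, counts from both BAMs are summed before extending, extension is greedy (best-supported trio sharing two bases with the growing allele), and the names `top_two`, `best_extension` and `phase` are hypothetical.

```python
from collections import Counter

# (SNP positions, bam1 counts, bam2 counts) for each overlapping trio.
TRIOS = [
    ((567, 754, 1236),
     Counter(GGC=1, GGG=15, TAC=15, GAG=6),
     Counter(TAG=6, GGG=19, TAC=8, TGG=1, GAG=1, TGC=3)),
    ((754, 1236, 1296),
     Counter(GGC=6, AGC=3, GCC=1, ACC=19, GGT=6, AGT=4),
     Counter(AGC=3, GCC=2, ACC=4, GCT=2, ACT=4, GGT=21, AGT=4)),
    ((1236, 1296, 1415),
     Counter(GCG=2, CCA=2, CCG=21, GCA=12, GTA=14),
     Counter(GCG=5, CTA=1, CCG=9, CTG=9, GTA=19, GTG=13)),
]


def top_two(counts):
    """Set of the two best-supported haplotypes in one BAM."""
    return {hap for hap, _ in counts.most_common(2)}


def best_extension(counts, overlap, side):
    """Best-supported trio that shares two bases with the growing allele."""
    if side == "right":
        cands = {h: n for h, n in counts.items() if h[:2] == overlap}
    else:
        cands = {h: n for h, n in counts.items() if h[1:] == overlap}
    return max(cands, key=cands.get) if cands else None


def phase(trios):
    merged = [b1 + b2 for _, b1, b2 in trios]  # sum counts over both BAMs
    # Seed at the most confident junction: top two haplotypes agree between
    # BAMs first, then highest total read count.
    seed = max(range(len(trios)),
               key=lambda i: (top_two(trios[i][1]) == top_two(trios[i][2]),
                              sum(merged[i].values())))
    alleles = []
    for hap, _ in merged[seed].most_common(2):
        for i in range(seed + 1, len(trios)):   # walk towards the 3' SNPs
            nxt = best_extension(merged[i], hap[-2:], "right")
            if nxt is None:
                break
            hap += nxt[-1]
        for i in range(seed - 1, -1, -1):       # walk towards the 5' SNPs
            prev = best_extension(merged[i], hap[:2], "left")
            if prev is None:
                break
            hap = prev[0] + hap
        alleles.append(hap)
    return alleles


print(phase(TRIOS))
```

On the slide's counts this walk seeds at the first trio and recovers the two predicted alleles. A greedy walk like this does not stop both alleles from converging on the same extension, so low-coverage junctions would still need manual review.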
Example family 1:
} Paternally inherited allele
Future
Phase with assembly to completely avoid reference bias.
− Already incorporated into SMRT Portal v2.1.1 Long Amplicon Analysis
− Seems to work great on a new project: 8 carriers, 2 controls
Outline
Targeted Breakpoint Validation (in a LINE)
Targeted Breakpoint Validation & Phasing (PAR)
Tandem repeat analysis (FMR1)
FMR1 Amplicon Sequencing
Loomis et al. 2013, Genome Res. 23(1):121-8
AGG Sanger confirmed
-can sequence repeats (even 100% GC content)
-can detect nt changes in repeats
Using filtered CCS reads:
CGG repeat sequencing
See Figure 3B in Loomis et al. manuscript
Clinic Workflow
PCR (CGG)n
Report
XX: 2 alleles < 54 units
XY: 1 allele < 54 units
XX: 1 allele or 2 alleles with one > 54
XY: no amplification or >54
TP PCR
Southern Blot
>80
XX: Hom. Normal or 2 alleles w/one <80
XY: 1 allele <80
Example: 19 repeats
Example: 90 repeats
FMR1 Amplicon (Simon Ardui)
19 repeat units (length 295) 90 repeat units (length 507)
FMR1
RS_Long_Amplicon_Analysis.1

19 repeat units (length 295)
Cluster    Phase   Length  Est.Acc.  Subreads
Cluster0   Phase0  295     98.38     300
Cluster1   Phase0  607     99.21     69
Cluster2   Phase0  398     98.34     21
Cluster2   Phase1  368     97.41     15
Cluster3   Phase0  247     97.93     34
Cluster4   Phase0  464     98.97     29
Cluster5   Phase0  352     98.61     28
Cluster6   Phase0  307     98.33     21

90 repeat units (length 507)
Cluster    Phase   Length  Est.Acc.  Subreads
Cluster0   Phase0  516     99.07     300
Cluster1   Phase0  607     99.21     300
Cluster2   Phase0  288     98.34     31
Cluster3   Phase0  468     98.98     29
Cluster4   Phase0  502     99        28
Cluster5   Phase0  372     98.71     27
Cluster6   Phase0  858     99.43     20
Cluster6   Phase1  995     51.94     6
Cluster7   Phase0  565     99.15     26
Cluster8   Phase0  565     97.83     26
Cluster9   Phase0  457     98.92     24
Cluster10  Phase0  518     99.08     24
Cluster11  Phase0  353     98.44     24
Cluster12  Phase0  573     99.17     23
Cluster13  Phase0  387     98.72     23
Cluster14  Phase0  408     98.82     23
Cluster15  Phase0  560     99.14     22
Cluster16  Phase0  494     98.94     22
Cluster17  Phase0  596     99.15     21

BLASTs to target
BLASTs to control seq.
FMR1: 19 repeat best contig

Target  1    CAGGCGCTCAGCTCCGTTTCGGTTTCACTTCCGGTGGAGGGCCGCCTCTGAgcgggcggc  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  1    CAGGCGCTCAGCTCCGTTTCGGTTTCACTTCCGGTGGAGGGCCGCCTCTGAGCGGGCGGC  60

Target  61   gggccgacggcgagcgcgggcggcggcggtgacggaggcgccgctgccagggggcgtgcg  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  61   GGGCCGACGGCGAGCGCGGGCGGCGGCGGTGACGGAGGCGCCGCTGCCAGGGGGCGTGCG  120

Target  121  gcagcgcggcggcggcggcggcggcggcggcggaggcggcggcggcggcggcggcggcgg  180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  121  GCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGAGGCGGCGGCGGCGGCGGCGGCGGCGG  180

Target  181  cggcTGGGCCTCGAGCGCCCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGG  240
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  181  CGGCTGGGCCTCGAGCGCCCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGG  240

Target  241  GCTGAAGAGAAGATGGAGGAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCTT  295
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  241  GCTGAAGAGAAGATGGAGGAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCTT  295
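Counting repeat units and locating AGG interruptions in a contig like this can be scripted. The sketch below is an assumption-laden illustration rather than the analysis used in the talk: the function name `longest_repeat_tract` is hypothetical, the repeat tract is taken to be the longest uninterrupted run of CGG or AGG triplets in any reading frame, and the demo string is synthetic.

```python
import re


def longest_repeat_tract(seq):
    """Find the longest run of CGG/AGG triplets in a sequence.

    Returns the 0-based start of the tract, the number of repeat units,
    and the 1-based unit positions of AGG interruptions.
    """
    seq = seq.upper()
    best, best_start = "", -1
    # The lookahead lets a candidate tract start at every position, so the
    # reading frame of the flanking sequence does not matter.
    for m in re.finditer(r"(?=((?:[CA]GG)+))", seq):
        if len(m.group(1)) > len(best):
            best, best_start = m.group(1), m.start()
    units = [best[i:i + 3] for i in range(0, len(best), 3)]
    agg_units = [i + 1 for i, unit in enumerate(units) if unit == "AGG"]
    return {"start": best_start, "units": len(units), "agg_units": agg_units}


# Synthetic amplicon: short flanks around 9 CGG, one AGG, 9 CGG.
demo = "GCAGCG" + "CGG" * 9 + "AGG" + "CGG" * 9 + "CTGGGCCTCGAGC"
print(longest_repeat_tract(demo))
```

Reporting the AGG positions alongside the unit count matters because, as noted earlier in the talk, nucleotide changes inside the repeat are detectable by sequencing but not by a length-only assay.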
FMR1: 90 repeat best contig

Target  1    CAGGCGCTCAGCTCCGTTTCGGTTTCACTTCCGGTGGAGGGCCGCCTCTGAgcgggcggc  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  2    CAGGCGCTCAGCTCCGTTTCGGTTTCACTTCCGGTGGAGGGCCGCCTCTGAGCGGGCGGC  61

Target  61   gggccgacggcgagcgcgggcggcggcggtgacggaggcgccgctgccagggggcgtgcg  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  62   GGGCCGACGGCGAGCGCGGGCGGCGGCGGTGACGGAGGCGCCGCTGCCAGGGGGCGTGCG  121

Target  121  gcagcgcggcggcggcggcggcggcggcggcggcggaggcggcggcggcgg  171
             ||||||||||||||||||||||||||||||| ||||| ||||||||||||||
PacBio  122  GCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGAGGCGGCGGCGGCGGCGGCGGCGGCGG  181

Target  172  cggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcgg  231
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  182  CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGG  241

Target  232  cggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcgg  291
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  242  CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGG  301

Target  292  cggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcgg  351
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  302  CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGG  361

Target  352  cggcggcggcggcggcggcggcggcggcggcggcggcggcggcggcTGGGCCTCGAGCGC  411
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  362  CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCTGGGCCTCGAGCGC  421

Target  412  CCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGGGCTGAAGAGAAGATGGAG  471
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
PacBio  422  CCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGGGCTGAAGAGAAGATGGAG  481

Target  472  GAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCT  507
             ||||||||||||||||||||||||||||||||||||
PacBio  482  GAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCT  517
Future analysis thoughts:
- filter on primers
- # subreads
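One way the two proposed filters might look is sketched below. Everything here is illustrative: the `Contig` class, the function names, the primer sequences and the `min_subreads=50` threshold are assumptions, not values from the talk or from the Long Amplicon Analysis output format.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Contig:
    name: str
    seq: str
    subreads: int


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_primers(seq, fwd, rev):
    """True if the contig, in either orientation, starts with the forward
    primer and ends with the reverse complement of the reverse primer."""
    seq = seq.upper()
    return any(s.startswith(fwd) and s.endswith(revcomp(rev))
               for s in (seq, revcomp(seq)))


def filter_contigs(contigs, fwd, rev, min_subreads=50):
    """Keep contigs with enough subread support and both primers intact."""
    return [c for c in contigs
            if c.subreads >= min_subreads and has_primers(c.seq, fwd, rev)]


# Hypothetical primers and a toy amplicon built from them.
FWD, REV = "CAGGCG", "AAGCGC"
amplicon = FWD + "CGG" * 5 + revcomp(REV)
contigs = [
    Contig("full_length", amplicon, 300),
    Contig("reverse_strand", revcomp(amplicon), 300),
    Contig("low_support", amplicon, 21),
    Contig("no_primer", "TTTT" + amplicon, 300),
]
print([c.name for c in filter_contigs(contigs, FWD, REV)])
```

Requiring both primers at the contig ends removes truncated or chimeric contigs, while the subread threshold removes poorly supported ones; the two criteria are independent, so either can be relaxed separately.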
Conclusions
Variety of projects so far. Improved analysis methods, but nothing standard yet...

Project                 Type                             Final Analysis
LINE Breakpoint         long reads                       HGAP assembly
PAR                     long reads / multiplexed         custom BWA-SW/Varscan realign & phase
FMR1                    long reads / repetitive/high GC  PB-Long Amplicon assembly
Future Phasing example  long reads                       PB-Long Amplicon assembly, incl. phasing
Thanks
Joris Vermeesch1 Jeroen Van Houdt1
Matthias Declercq1 Martin A. Mensah1,2
Maarten H.D. Larmuseau3,4,5 Simon Ardui1
Greet Peeters1 Herman Van Biesen
Mala Isrie1, Nancy Vanderheyden3, Erika L. Souche1, Radka Stoeva1, Hilde Van Esch1, Koen Devriendt1, Thierry Voet1, Ronny Decorte3,4, Peter Robinson2
1 KU Leuven, Department of Human Genetics
2 Institut für Medizinische Genetik und Humangenetik, Charité-Universitätsmedizin Berlin
3 UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology
4 KU Leuven, Department of Imaging & Pathology, Biomedical Forensic Sciences
5 KU Leuven, Department of Biology, Laboratory of Biodiversity and Evolutionary Genomics