![Page 1: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/1.jpg)
Bayer 4:3 Template 2010 • March 2016 Page 1
Xi Wang
Bioinformatics Scientist
Computational Life Science
17/01/2017
Wheat Genome Structural Annotation Using a Modular and Evidence-combined Annotation Pipeline
![Page 2: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/2.jpg)
Genomics Group Mission Statement
Page 2
![Page 3: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/3.jpg)
Page 3
Genomics Resources consisting of different data types
![Page 4: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/4.jpg)
From candidate genes to open ended gene discovery
Page 4
- Existing pipelines for genome structural annotation
- Not suitable for complex genome, resulting in high false positive prediction rate
- High quality, but intensive, lack of reproducibility and extensive manual work
! Requiring pipeline that is fast and efficient with competitive output
![Page 5: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/5.jpg)
MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline
Page 5
- Leveraging on strength from diverse evidence resources
- Robustness of transcript evidence
- Precision of exon-intron structure using computational tools
- Support from high quality, curated reference data sets
- Confidence level classification
- efficiency
- Sustainable and reproducible
- Modularity
Repeat masking
Transcript evidence - RNA-seq - PacBio IsoSeq
Ab initio prediction - Augustus - FGeneSH
Reference gene models - Rice - Brachypodium
Search in repeat databases De novo repeat identification
Gene discovery
EvidenceModeler combiner
Confidence classification
![Page 6: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/6.jpg)
Wheat annotation using MEGAP – repeat masking
Page 6
CS IWGSC WGA v0.4
- Released on June, 2016
- Based on NRGene
- Anchoring/ordering using POPSEQ + HiC
- 21 pseudochromosomes, 14Gb
RepeatModeler
Build consensus modules of Putative interspersed repeats
Plant-specific RepBase
Assembly
RepeatMasker
Repeat masked genome
Repeat masking
![Page 7: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/7.jpg)
Wheat annotation using MEGAP – repeat masking
Page 7
Average = 84.7%
Repeat content MEGAP
CLARI-TE ! 3B: 85%
3B repeat distribution
![Page 8: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/8.jpg)
Page 8
Datasets & evidence types
- Transcriptome data:
- Public and in-house RNA-seq for CS and internal cultivars
- 92 tissues and developmental stages
- Pooled, normalized and size-selected in-house PacBio IsoSeq Libraries
- Ab initio prediction
- Augustus
- FgeneSH
- Close related species
- Rice
- Brachypodium
Wheat annotation using MEGAP – gene annotation
![Page 9: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/9.jpg)
MEGAP gene annotation – evidence types, confidence levels and evidence combiner
High Low Potential
Middle
EvidenceModeler (EVM) compares and merges evidences - Robustness of transcriptome evidence - Precise prediction in exon-intron boundary, gene start and stop - Evidence weighting score (transcriptome > related species > ab initio)
Page 9
![Page 10: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/10.jpg)
EVM combines evidences into final gene structure
Page 10
Chr3B
PacBio IsoSeq
RNA-seq
Augustus
FGeneSH
Rice
Brachypodium
Final (EVM combiner)
3B reference (TriAnnot)
RIKEN fl-cDNA (http://TriFLDB.psc.riken.jp/)
![Page 11: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/11.jpg)
MEGAP gene annotation – evidence types, confidence levels and evidence combiner
High Low Potential
Middle
EvidenceModeler (EVM) compares and merges evidences
Page 11
Confidence level # Genes
Total supported 164,785
High 120,329
Middle 8,820
Low 35,636
Potential (predicted only) 125,295
![Page 12: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/12.jpg)
Comparison of MEGAP with existing gene data sets
Page 12
CS NRGene CS IWGSC CS TGAC
Assembly NRGene v0.4 Survey
contigs TGAC scaffolds
Annotation
Provider MEGAP IWGSC TGAC
Whole genome
# Gene 129,149 103,274 104,352
mRNA length (bp; mean/
median) 1,271/1,027 1,209/977 3,146/1,912
Coding content (Mb) 164 124 328
# Predicted ORFs 113,573 99,354 103,504
3B chromosome
# Protein coding gene 7,252 7,264 6,361
mRNA length (bp; mean/
median) 1,232/998 1,094/909 3,111/1,872
Coding content (Mb) 9 8 18
Chr3B
![Page 13: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/13.jpg)
MEGAP shows higher sensitivity and accuracy in gene annotation
Page 13
RIKEN fl-cDNA sequence library
(http://TriFLDB.psc.riken.jp/)
Total 2,393
# mapped to IWGSC 1,671
# mapped to TGAC 1,926
# mapped to MEGAP 2,040
Disease resistant gene candidate on 3B
Gene model literature
MEGAP
TGAC
IWGSC
![Page 14: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/14.jpg)
Total Novel
IWGSC 103,274 6,454
MEGAP 129,149 26,455
Novel genes annotated using improved assembly and combination of evidence types
Page 14
Mapping novel genes against IWGSC survey contigs
Mapped Unmapped
13,277 13,178
92 RNA-seq samples
![Page 15: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/15.jpg)
Summary
Page 15
MEGAP
- Sustainable and reproducible
- Modularity – easy to extend and incorporate additional evidence, both for repeat masking and gene discovery
- Fast and efficient with competitive output
- Taking advantage of combination of
- Robustness of transcript evidence
- Precision of exon-intron structure using computational tools
- Support from high quality, curated reference data sets
- Confidence level classification
12d
5d 7d
7d 1d
1d 1d
1d 3d
Total = ~ 24 days
![Page 16: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/16.jpg)
Summary
Page 16
Key metrics wheat genome annotation using MEGAP
- Repeat content 84.7%
- Transcript loci: 164,785 of which 129,149 H/M confidence
- 1,271 bp mean length
![Page 17: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/17.jpg)
Discussion and Perspectives
Page 17
- Alternative genome annotation pipelines:
- MAKER2, FGeneSH++, BRAKER, Augustus
- Additional wheat EST data
- Annotation on wheat CS v1.0 assembly
- Assembly was improved using BAC-related data sponsored by BAYER
- By end of January
- Openness of data sharing with consortium and collaboration
- Other genetic elements:
- Isoforms, non-coding RNA, regulatory elements, promoter and enhancers
- Functional annotation:
- Integrative approach combining homology –based inference and additional information such as QTL, expression data, genotyping-phenotyping association, etc
In collaboration with consortium and academia groups
![Page 18: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/18.jpg)
Acknowledgment
Page 18
- Computational Life Science group
- Evelyne Caestecker
- Ken Heydrickx
- Steven Robbens
- Genomics & Genetics
- Lisbeth Vercruysse
- Frank Coen
- Fred van Ex
- Mark Davey
- HPC
- Filip Nollet
![Page 19: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/19.jpg)
Decoding Wheat
Page 19
- Highlights:
- Wheat PacBio genome assembly
- De novo gene discovery using PacBio IsoSeq data for wheat
- Transcriptome atlas
- Genetic diversity study using exome-capture sequencing
![Page 20: Wheat Genome Structural Annotation Using a Modular and … · 2017-02-10 · MEGAP – A Modular and Evidence-combined Genome Annotation Pipeline Page 5 - Leveraging on strength from](https://reader031.vdocuments.net/reader031/viewer/2022041919/5e6b5b2efd12255b8304c54f/html5/thumbnails/20.jpg)
Thank you!