The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease
Mlp Summer workshop – INRA Nancy, August 20-21 2008. The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease: gene content in the Mlp genome (automated annotation). Duplessis Sébastien (INRA Nancy).
![Page 1: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/1.jpg)
The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease
Gene content in the Mlp genome (automated annotation)
Mlp Summer workshop – INRA Nancy, August 20-21 2008
Duplessis Sébastien (INRA Nancy)
Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM
![Page 2: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/2.jpg)
Annotation of Mlp Genome – Gene prediction (2006-2007)
Intrinsic approaches: coding potential search, SpliceMachine, NetStart
Extrinsic approaches: repeats; Blastn/Blastx against Swissprot and Mlp ESTs; tBlastx against Puccinia, Sporobolomyces and other basidiomycetes
Gene predictors: EuGene, FGeneSH, Genewise
=> Predicted genes (manual curation)
![Page 3: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/3.jpg)
Mlp Genome Project – Summer 2007
Pre-release of Mlp genome assembly (16.4% gaps – Assembled with JAZZ)
Main genome scaffold total: 2,682
ESTs from 50/50 spores and germ tubes of Mlp 98AG31
INRA Nancy => ~4,000 (2004)
JGI => ~60,000 (2007)
=> ~52,000 ESTs
ESTs from spores and germlings of Melampsora spp. [Mlp, Mmd, Mmt, Mo]
CFS Laval => ~3,000 Mlp / ~4,200 Mmd / ~3,000 Mo / ~3,000 Mmt
In planta ESTs from Mlp haustoria => ~1,700 Mlp H3B
=> ~15,000 ESTs
![Page 4: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/4.jpg)
Blast against Mlp scaffolds
Blast against Mlp ESTs
Blast against available basidiomycete genomes
Melampsora IAM website => summer 2007 (B. Hilselberger) updated in 2008 (E. Tisserant)
![Page 5: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/5.jpg)
Files to help in annotation using Artemis
=> fasta of genome scaffolds
=> gff files of ESTs clusters
=> gff files of blastn Hits vs. Puccinia, Sporobolomyces & Ustilago gene models
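The annotation tracks listed above are GFF files. As a minimal sketch, assuming GFF3 (nine tab-separated columns with `key=value` attributes; the slide does not say which GFF version the website serves), the EST-cluster or BLASTN-hit tracks can be loaded in Python for counting or filtering before viewing them in Artemis. The scaffold name, sources and attribute values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class GffFeature:
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, str]

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the GFF3 column 9 'key=value;key=value' string into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs


def parse_gff3(lines: Iterable[str]) -> Iterator[GffFeature]:
    """Yield one GffFeature per data line, skipping blank and '#' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        yield GffFeature(
            seqid=seqid,
            source=source,
            type=ftype,
            start=int(start),
            end=int(end),
            score=None if score == "." else float(score),
            strand=strand,
            phase=None if phase == "." else int(phase),
            attributes=parse_attributes(attrs),
        )


# Hypothetical EST-cluster and BLASTN-hit lines on a hypothetical scaffold.
EXAMPLE_GFF = [
    "##gff-version 3",
    "scaffold_1\test_clusters\texpressed_sequence_match\t1200\t1850\t.\t+\t.\tID=cluster_0001",
    "scaffold_1\tblastn_puccinia\tmatch\t1300\t1790\t250.5\t+\t.\tID=hit_0001;Name=puccinia_model_1",
]

features = list(parse_gff3(EXAMPLE_GFF))
print([(f.type, f.length) for f in features])
```

Filtering then reduces to list comprehensions, e.g. keeping only `match` features on one scaffold.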
![Page 6: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/6.jpg)
Annotation of FL sequences = TRAINING SET for gene predictors (EuGene, fgenesh, )
Gene models annotation based on complete EST support & Homology
Coding for known ubiquitous functions (metabolism, cytoskeleton elements…)
Coding for hypothetical proteins and new genes?
Coding for proteins of various sizes
Manual curation performed with Artemis (Nancy & Québec)
=> 348 GM curated
Editing of annotation cards => Melampsora Genome Consortium website
![Page 7: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/7.jpg)
TRAINING SET for gene prediction (EuGene, fgenesh, )
=> 348 GM curated
=> 52,269 ESTs from Mlp 98AG31
=> raw TE prediction based on Mlp genome pre-release
![Page 8: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/8.jpg)
JGI Gene prediction (Andrea Aerts – Jan-Mar/2008)
• 39 scaffolds (43.9 Mbp)
• 409 repetitive elements provided by collaborator, 87 generated in pipeline
• nr: N. crassa, M. grisea, F. graminearum
• ESTs
– 3941 uniseqs described in 2003 paper
– 6318 uniseqs described in 2008 paper
– 8799 JGI cluster consensi (includes external ESTs)
• 5 C. parasitica CDSs from NCBI
![Page 9: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/9.jpg)
Outputs

| Feature | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1 |
|---|---|---|---|---|---|
| Scaffolds (Mbp) | 101.1 | 21.2 | 64.9 | 35.1 | 90.9 |
| Gaps (Mbp) | 3.4 (3.4%) | 0.33 (1.6%) | 6.2 (9.6%) | N/A | 21.9 (24.1%) |
| Repeats (Mbp) | 49.4 (48.9%) | 0.31 (1.5%) | 14.4 (22.2%) | 0.32 (0.91%) | 4.96 (5.46%) |
| Gene length (Mbp) | 25.0 (24.7%) | 13.2 (62.3%) | 31.6 (48.7%) | 16.8 (47.9%) | 35.6 (39.2%) |
| # genes | 15,410 | 5,536 | 20,614 | 10,048 | 17,173 |
| # genes / Mbp | 152.42 | 261.13 | 317.63 | 286.27 | 188.92 |
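The derived rows of this table can be recomputed from the raw ones. The sketch below is a plain-Python check written for this transcript, not JGI's pipeline: it divides gene counts by scaffold length and expresses repeat and gene spans as a share of total scaffold length. The results reproduce the "# genes / Mbp" row and the percentages in parentheses, which indicates those percentages are relative to total scaffold length, gaps included.

```python
# Values transcribed from the table above (Mbp, except gene counts).
GENOMES = {
    "Mellp1": {"scaffolds_mbp": 101.1, "repeats_mbp": 49.4, "genes_mbp": 25.0, "genes": 15410},
    "Sporo1": {"scaffolds_mbp": 21.2, "repeats_mbp": 0.31, "genes_mbp": 13.2, "genes": 5536},
    "Lacbi1": {"scaffolds_mbp": 64.9, "repeats_mbp": 14.4, "genes_mbp": 31.6, "genes": 20614},
    "Phchr1": {"scaffolds_mbp": 35.1, "repeats_mbp": 0.32, "genes_mbp": 16.8, "genes": 10048},
    "Pospl1": {"scaffolds_mbp": 90.9, "repeats_mbp": 4.96, "genes_mbp": 35.6, "genes": 17173},
}


def genes_per_mbp(genes: int, scaffolds_mbp: float) -> float:
    """Gene density: number of gene models per Mbp of scaffold sequence."""
    return genes / scaffolds_mbp


def percent_of_assembly(part_mbp: float, scaffolds_mbp: float) -> float:
    """Share of the total scaffold length (gaps included) covered by a feature class."""
    return 100.0 * part_mbp / scaffolds_mbp


for name, g in GENOMES.items():
    density = genes_per_mbp(g["genes"], g["scaffolds_mbp"])
    repeats = percent_of_assembly(g["repeats_mbp"], g["scaffolds_mbp"])
    genic = percent_of_assembly(g["genes_mbp"], g["scaffolds_mbp"])
    print(f"{name}: {density:.2f} genes/Mbp, repeats {repeats:.2f}%, genes {genic:.1f}%")
```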
![Page 10: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/10.jpg)
What do the genes look like?

| Mean | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1 |
|---|---|---|---|---|---|
| Gene length (bp) | 1622.89 | 2389.05 | 1533.42 | 1667.04 | 2075.26 |
| Transcript length (bp) | 1241.87 | 1750.21 | 1134.45 | 1365.73 | 1438.85 |
| Protein length (aa) | 383.36 | 564.80 | 367.19 | 455.18 | 458.46 |
| Exon length (bp) | 256.26 | 242.77 | 210.13 | 233.64 | 211.92 |
| Intron length (bp) | 101.07 | 104.88 | 92.70 | 64.18 | 111.92 |
| Exon frequency (exons/gene) | 4.85 | 7.21 | 5.40 | 5.85 | 6.79 |
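Per-gene averages like these can be derived from exon coordinates. The sketch below assumes each gene model is a list of 1-based, inclusive exon coordinates on its scaffold, treats transcript length as the summed exon length (the slides do not say whether UTRs are included), and averages exon and intron lengths over all exons and introns pooled across genes. The two toy models are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive on the scaffold


def model_stats(exons: List[Exon]) -> Dict[str, object]:
    """Structure of one gene model from its exon coordinates."""
    exons = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in exons]
    # Intron = gap between the end of one exon and the start of the next.
    intron_lengths = [nxt[0] - prev[1] - 1 for prev, nxt in zip(exons, exons[1:])]
    return {
        "gene_length": exons[-1][1] - exons[0][0] + 1,
        "transcript_length": sum(exon_lengths),
        "exon_lengths": exon_lengths,
        "intron_lengths": intron_lengths,
    }


def summarize(models: List[List[Exon]]) -> Dict[str, float]:
    """Mean gene/transcript length, pooled mean exon/intron length, exons per gene."""
    stats = [model_stats(m) for m in models]
    exons = [n for s in stats for n in s["exon_lengths"]]
    introns = [n for s in stats for n in s["intron_lengths"]]
    return {
        "gene_length": mean(s["gene_length"] for s in stats),
        "transcript_length": mean(s["transcript_length"] for s in stats),
        "exon_length": mean(exons),
        "intron_length": mean(introns) if introns else 0.0,
        "exons_per_gene": len(exons) / len(stats),
    }


# Two hypothetical gene models: a three-exon gene and a single-exon gene.
toy_models = [[(1, 100), (161, 300), (401, 450)], [(1000, 1299)]]
print(summarize(toy_models))
```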
![Page 11: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/11.jpg)
How were the genes predicted?

| Method | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1 |
|---|---|---|---|---|---|
| KGs and ESTs | 1377 (8.9%) | 54 (1%) | 64 (0.3%) | 12 (0.1%) | 61 (0.4%) |
| Homology | 2653 (17.2%) | 2713 (49%) | 3699 (18%) | 3526 (35.1%) | 7549 (43.9%) |
| EuGene | 5603 (36.4%) | - | 9848 (47.7%) | - | - |
| Ab initio | 5777 (37.5%) | 2769 (50%) | 7003 (34%) | 6510 (64.8%) | 9563 (55.7%) |
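The method rows partition each gene set: for every genome the counts add up to the "# genes" row of the Outputs table. A short consistency check, with counts transcribed from the two tables and empty EuGene cells entered as 0:

```python
# Gene models per prediction method, transcribed from the table above (0 = not used).
METHOD_COUNTS = {
    "Mellp1": {"KGs and ESTs": 1377, "homology": 2653, "EuGene": 5603, "ab initio": 5777},
    "Sporo1": {"KGs and ESTs": 54, "homology": 2713, "EuGene": 0, "ab initio": 2769},
    "Lacbi1": {"KGs and ESTs": 64, "homology": 3699, "EuGene": 9848, "ab initio": 7003},
    "Phchr1": {"KGs and ESTs": 12, "homology": 3526, "EuGene": 0, "ab initio": 6510},
    "Pospl1": {"KGs and ESTs": 61, "homology": 7549, "EuGene": 0, "ab initio": 9563},
}
# '# genes' row of the Outputs table.
TOTAL_GENES = {"Mellp1": 15410, "Sporo1": 5536, "Lacbi1": 20614, "Phchr1": 10048, "Pospl1": 17173}


def method_shares(counts: dict) -> dict:
    """Percentage of gene models contributed by each method, to one decimal."""
    total = sum(counts.values())
    return {method: round(100.0 * n / total, 1) for method, n in counts.items()}


for genome, counts in METHOD_COUNTS.items():
    assert sum(counts.values()) == TOTAL_GENES[genome], genome
    print(genome, method_shares(counts))
```

Recomputed shares match the slide to within 0.1 percentage point.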
![Page 12: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/12.jpg)
How good are the genes?

| Metric | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1 |
|---|---|---|---|---|---|
| start + stop | 14432 (94%) | 3891 (70%) | 18218 (88%) | 8352 (83%) | 14569 (85%) |
| nr | 6664 (43%) | 4446 (80%) | 10925 (53%) | ND | 13374 (78%) |
| Pfam | 4101 (27%) | 3272 (59%) | 7653 (37%) | 4769 (47%) | 7681 (45%) |
| EST | 3230 (21%) | 1759 (32%) | 2468 (12%) | ND | 4038 (23%) |
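The "start + stop" row counts models that are complete at both ends. The slides do not spell out the test used; a minimal sketch, assuming the standard genetic code and each model's CDS given as a nucleotide string, treats a CDS as complete when it begins with ATG, ends with a stop codon, has a length divisible by three and contains no internal in-frame stop. The sequences are hypothetical.

```python
from typing import Iterable

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def cds_is_complete(cds: str) -> bool:
    """True if the CDS starts with ATG, ends with a stop and has no internal stop."""
    seq = cds.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    if not seq.startswith("ATG") or seq[-3:] not in STOP_CODONS:
        return False
    # Walk every codon except the terminal one, looking for premature stops.
    internal_codons = (seq[i:i + 3] for i in range(0, len(seq) - 3, 3))
    return not any(codon in STOP_CODONS for codon in internal_codons)


def percent_complete(cds_list: Iterable[str]) -> float:
    """Share of models (in %) that pass cds_is_complete."""
    flags = [cds_is_complete(c) for c in cds_list]
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


examples = [
    "ATGAAATAA",     # complete
    "ATGTAAAAATAA",  # internal stop at codon 2
    "AAAGGGTAA",     # no start codon
    "ATGAAAGGG",     # no stop codon
]
print([cds_is_complete(c) for c in examples], percent_complete(examples))
```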
![Page 13: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/13.jpg)
KOG assignments

| KOG class | Mellp1 | Sporo1 | Lacbi1 | Phchr1 | Pospl1 |
|---|---|---|---|---|---|
| Cellular Processes & Signaling | 2769 (18%) | 1525 (28%) | 3351 (16%) | 2132 (21%) | 3482 (20%) |
| Information Storage & Processing | 1864 (12%) | 1149 (21%) | 2196 (11%) | 1456 (14%) | 2251 (13%) |
| Metabolism | 2127 (14%) | 1358 (25%) | 2294 (11%) | 2044 (20%) | 3589 (21%) |
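KOG class totals are obtained by collapsing the one-letter KOG functional categories into the broad classes of the KOG/COG scheme. A minimal sketch, assuming each gene model carries a string of category letters and that a model is counted once in every class it touches (the slide does not state its counting rule); gene names are hypothetical. The percentages in the table are relative to each genome's total gene count, e.g. 2,769 / 15,410 ≈ 18% for Mellp1.

```python
from collections import Counter
from typing import Dict

# Standard KOG/COG class membership of the one-letter functional categories.
KOG_CLASSES = {
    "Information Storage & Processing": set("JAKLB"),
    "Cellular Processes & Signaling": set("DYVTMNZWUO"),
    "Metabolism": set("CGEFHIPQ"),
    "Poorly Characterized": set("RS"),
}


def tally_kog_classes(assignments: Dict[str, str]) -> Counter:
    """Count gene models per KOG class; a model is counted once per class it hits."""
    counts: Counter = Counter()
    for letters in assignments.values():
        hit = {cls for cls, codes in KOG_CLASSES.items() if codes & set(letters)}
        counts.update(hit)
    return counts


def percent_of_genome(count: int, total_genes: int) -> float:
    return 100.0 * count / total_genes


# Hypothetical assignments: gene_2 has two Metabolism letters, gene_4 spans two classes.
example = {"gene_1": "J", "gene_2": "CG", "gene_3": "T", "gene_4": "KO", "gene_5": "R"}
print(tally_kog_classes(example))
```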
![Page 14: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/14.jpg)
KEGG assignments
[Figure: KEGG metabolic categories (amino acid metabolism, biodegradation of xenobiotics, biosynthesis of secondary metabolites, carbohydrate metabolism, energy metabolism, lipid metabolism, metabolism of cofactors and vitamins, metabolism of complex carbohydrates, metabolism of complex lipids, metabolism of other amino acids, nucleotide metabolism) for Mellp1, Sporo1, Lacbi1, Phchr1 and Pospl1; y-axis from 0 to 0.09]
![Page 15: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/15.jpg)
Prediction of Gene Models using EuGene (VIB - Ghent)
Annotation performed with Mlp genome pre-release
M-P Oudot-Le Secq – EuGene annotation using Laccaria bicolor annotation parameters => ~17,000 Mlp gene models (<1,500 TEs) => Mlp GM v0.0
Yao-Cheng Lin – EuGene annotation using parameters specifically defined for M. larici-populina => ~9,000 Mlp gene models (> 200 aa)
Annotation performed with Mlp genome assembly release Jan2008
Yao-Cheng Lin - EuGene annotation using specific training for M. larici-populina
=> 12,386 Mlp gene models
4308 hits vs. yeast
4899 hits against Uniprot (7487 no hits - 1/3 ; 2/3)
4708 supported by ESTs
Yao-Cheng Lin – Last EuGene annotation (summer 2008)
including 454 data (~ 5000 contigs) and adjusted parameters for small secreted proteins prediction
=> 17,167 Mlp gene models (6,989 < 300aa)
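Counts such as "6,989 < 300 aa" (about 41% of the 17,167 models) or a "> 200 aa" filter come from a length scan of the predicted proteome. A minimal sketch, assuming a protein FASTA file in which a trailing `*` marks the stop; record names and sequences are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map the first word of each '>' header to its concatenated sequence."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def split_by_length(proteins: Dict[str, str], cutoff: int = 300) -> Tuple[int, int]:
    """Return (number shorter than cutoff aa, number at or above cutoff)."""
    lengths = [len(seq.rstrip("*")) for seq in proteins.values()]
    short = sum(1 for n in lengths if n < cutoff)
    return short, len(lengths) - short


# Hypothetical proteome: a 150-aa and a 400-aa model (stop '*' not counted).
example_fasta = [
    ">model_1 hypothetical protein",
    "M" + "A" * 149,
    ">model_2",
    "M" + "G" * 199,
    "G" * 200 + "*",
]
proteins = read_fasta(example_fasta)
print(split_by_length(proteins))
```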
![Page 16: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/16.jpg)
JGI Gene prediction (Andrea Aerts – 03/28/2008)
• Genewise – 9193 models
• Fgenesh_pm – 3147 models
• estExt_fpm – 2438 models
+ EuGene Prediction
=> Reconciliation and release in April 2008
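The slide does not describe how the JGI sets and the EuGene set were reconciled. Purely to illustrate the idea, the sketch below groups overlapping models on the same scaffold and strand into loci and keeps one model per locus, chosen by a hypothetical per-model evidence score. The actual selection criteria are not reproduced here, and all names and scores are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneModel:
    name: str
    seqid: str
    strand: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    source: str   # e.g. "genewise", "fgenesh", "eugene"
    score: float  # hypothetical evidence score; higher is better


def reconcile(models: List[GeneModel]) -> List[GeneModel]:
    """Keep the best-scoring model in each cluster of overlapping same-strand models."""
    chosen: List[GeneModel] = []
    cluster: List[GeneModel] = []
    cluster_end = 0
    for m in sorted(models, key=lambda g: (g.seqid, g.strand, g.start)):
        same_track = bool(cluster) and (m.seqid, m.strand) == (cluster[0].seqid, cluster[0].strand)
        if same_track and m.start <= cluster_end:
            # Overlaps the current locus: extend it.
            cluster.append(m)
            cluster_end = max(cluster_end, m.end)
        else:
            # Close the previous locus (if any) and start a new one.
            if cluster:
                chosen.append(max(cluster, key=lambda g: g.score))
            cluster, cluster_end = [m], m.end
    if cluster:
        chosen.append(max(cluster, key=lambda g: g.score))
    return chosen


models = [
    GeneModel("gw_1", "scaffold_1", "+", 100, 900, "genewise", 0.9),
    GeneModel("fg_1", "scaffold_1", "+", 150, 950, "fgenesh", 0.6),
    GeneModel("fg_2", "scaffold_1", "+", 2000, 2500, "fgenesh", 0.4),
    GeneModel("eu_1", "scaffold_1", "-", 120, 800, "eugene", 0.5),
]
print([m.name for m in reconcile(models)])
```

Overlap is resolved per strand, so the minus-strand EuGene model survives even though it overlaps the plus-strand locus.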
![Page 17: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/17.jpg)
JGI Gene Models prediction
16694 Gene models
4465 EuGene models (27%)
4810 fgenesh1 (29%) + 5422 fgenesh2 (32%)
=> 65.5% fgenesh models
1997 Genewise/GenewisePlus models (12%)
21% of fgenesh/genewise models were consolidated with EST Extension
Prediction method:
– Ab initio: 51%
– EuGene: 27%
– Homology based: 14%
– EST based: 8%
16,694 gene models predicted by JGI predictions (& EuGene)
Gene Model validation:
– Complete (5'M-3'*): 94%
– Alignment with nr: 43%
– Alignment with pfam: 25%
– EST support: 27%
![Page 18: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/18.jpg)
JGI Gene Models prediction
16,694 gene models predicted by JGI (& EuGene)
Mean exon size: 250 bp (Laccaria: 210 bp)
Mean intron size: 120 bp (Laccaria: 93 bp)
Mean protein size: 378 aa (Laccaria: 367 aa)
Mean gene length: 1,685 bp (Laccaria: 1.5 kb)
Mean transcript length: 1,224 bp (Laccaria: 1.1 kb)
Exons per gene: 4.90 (Laccaria: 5.4)
Proteins < 300 aa: Laccaria 52%, Coprinus 40%, Melampsora 49%, Puccinia 54%
![Page 19: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/19.jpg)
JGI Gene Models prediction – Introns donors and acceptors
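For intron donor and acceptor sites, a minimal sketch of tallying the terminal dinucleotides of introns, assuming plus-strand gene models with 1-based inclusive exon coordinates. GT-AG is the canonical class, with GC-AG and AT-AC as the usual minor classes. Sequences are toy examples.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive


def intron_sequences(scaffold: str, exons: List[Exon]) -> List[str]:
    """Intron sequences between consecutive exons of a plus-strand model."""
    exons = sorted(exons)
    # 1-based intron (end1 + 1 .. start2 - 1) is the 0-based slice [end1 : start2 - 1].
    return [scaffold[end1:start2 - 1] for (_, end1), (start2, _) in zip(exons, exons[1:])]


def splice_site_classes(introns: Iterable[str]) -> Counter:
    """Tally donor-acceptor dinucleotide pairs such as 'GT-AG'."""
    return Counter(
        f"{seq[:2].upper()}-{seq[-2:].upper()}" for seq in introns if len(seq) >= 4
    )


scaffold = "ATGCCC" + "GTAAGTCTTTAG" + "GGGTAA"  # exon 1 + intron + exon 2
introns = intron_sequences(scaffold, [(1, 6), (19, 24)])
print(introns, splice_site_classes(introns))
print(splice_site_classes(["GTAAGTCTTTAG", "GCAAGTTTCAG", "ATATCCTTTAC"]))
```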
![Page 20: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/20.jpg)
Gene model density on the 20 largest scaffolds
Mean gene density of 2.04 genes/10 kb => 1 gene / 4.9 kb (Laccaria: 1 gene / 3.1 kb)
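The density figure converts directly into a mean spacing: 10 kb / 2.04 genes ≈ 4.9 kb per gene. Gene counts per 10-kb window along a scaffold can be sketched as below, assuming 1-based gene start coordinates and counting each gene in the window that contains its start. Coordinates are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def genes_per_window(gene_starts: Iterable[int], scaffold_length: int, window: int = 10_000) -> List[int]:
    """Number of genes whose (1-based) start falls in each consecutive window."""
    n_windows = (scaffold_length + window - 1) // window  # ceiling division
    counts = Counter((start - 1) // window for start in gene_starts)
    return [counts.get(i, 0) for i in range(n_windows)]


def mean_spacing_kb(genes_per_10kb: float) -> float:
    """Average distance between genes implied by a density in genes per 10 kb."""
    return 10.0 / genes_per_10kb


print(genes_per_window([1, 5_000, 9_999, 10_001, 25_000], scaffold_length=30_000))
print(round(mean_spacing_kb(2.04), 1))
```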
![Page 21: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/21.jpg)
JGI Gene Models prediction – The Mlp gene space
28% of the genome is coding sequence
16,694 putative proteins (gene models) = JGI prediction + extra putative proteins identified with EuGene
15,725 proteins > 100 aa (Laccaria > 17,000; Phanerochaete 10,048; Coprinopsis 8,759; Ustilago 6,522)
7,830 with homologs in nr (47%), including 3,893 hypothetical proteins (Puccinia, Laccaria, mostly basidiomycetes)
5,461 with homologs in Swiss-Prot (33%)
6,820 with homologs in Laccaria (41%)
4,507 supported by Mlp ESTs (27%)
A large proportion (30%) of Mlp genes have no homologs in other fungal genomes, including those of the Pucciniales member P. graminis and of Sporobolomyces roseus
![Page 22: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/22.jpg)
Blast vs. other fungal deduced proteomes
[Figure: bar chart of matches (%) for Mlp gene models against ESTs Phakopsora, Puccinia, Sporobolomyces, Ustilago, Phanerochaete, Coprinus, Laccaria and Magnaporthe; y-axis from 0 to 70%]
33% of gene models are specific to Melampsora larici-populina (5,500 models with no homologs, but ~300 with Pfam/IPR hits)
10,344 homologs in P. graminis (62%); ~25% orthologs with P. graminis
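The slides do not say which BLAST thresholds or ortholog definition were used. As a hedged sketch, the code below reads BLAST tabular output (the 12-column `-outfmt 6` layout), keeps the best-scoring hit per query under an assumed e-value cutoff of 1e-5, counts queries with at least one hit (homologs), and calls reciprocal best hits, a common proxy for orthologs. Gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(blast_tab: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Best hit (subject, bitscore) per query from 12-column BLAST tabular output."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(a_vs_b: Dict[str, Tuple[str, float]],
                         b_vs_a: Dict[str, Tuple[str, float]]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {(a, b) for a, (b, _) in a_vs_b.items() if b in b_vs_a and b_vs_a[b][0] == a}


def row(q: str, s: str, evalue: str, bits: str) -> str:
    """Build a toy outfmt-6 line; only columns 1, 2, 11 and 12 matter here."""
    return "\t".join([q, s, "90.0", "100", "10", "0", "1", "100", "1", "100", evalue, bits])


mlp_vs_pgr = [row("Mlp_1", "Pgr_A", "1e-50", "200"), row("Mlp_1", "Pgr_B", "1e-20", "90"),
              row("Mlp_2", "Pgr_B", "1e-30", "120"), row("Mlp_3", "Pgr_C", "0.5", "30")]
pgr_vs_mlp = [row("Pgr_A", "Mlp_1", "1e-50", "200"), row("Pgr_B", "Mlp_2", "1e-30", "120"),
              row("Pgr_B", "Mlp_1", "1e-20", "90")]

a_best, b_best = best_hits(mlp_vs_pgr), best_hits(pgr_vs_mlp)
print(len(a_best), "of 3 Mlp models have a hit;", reciprocal_best_hits(a_best, b_best))
```

Here Mlp_3 has no hit below the cutoff and would count as a lineage-specific model.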
![Page 23: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/23.jpg)
Mlp gene models functional classification
![Page 24: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/24.jpg)
Cellular component
cell
macromolecular complex
organelle
extracellular region
envelope
Molecular function
catalytic activity
binding
transporter activity
enzyme regulator activity
molecular transducer activity
motor activity
transcription regulator activity
structural molecule activity
nutrient reservoir activity
antioxidant activity
Biological process
metabolic process
establishment of localization
cellular process
biological regulation
response to stimulus
reproduction
GO classification: 27.8%
![Page 25: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/25.jpg)
• KEGG pathways: 2758 gene models (16.5%)
[Figure: percentage (0 to 35%) of KEGG-assigned gene models per metabolic category (amino acid metabolism, biodegradation of xenobiotics, biosynthesis of secondary metabolites, carbohydrate metabolism, energy metabolism, lipid metabolism, metabolism of cofactors and vitamins, metabolism of other amino acids, nucleotide metabolism) in Melampsora, Puccinia and Sporobolomyces]
![Page 26: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/26.jpg)
JGI summary – A complete table to help in annotating Mlp gene models
![Page 27: The genome sequence of Melampsora larici- populina the causal agent of the poplar rust disease](https://reader036.vdocuments.net/reader036/viewer/2022062517/56813304550346895d99c0ba/html5/thumbnails/27.jpg)
Emilie Tisserant & Benoît Hilselberger (INRA Nancy) Mlp Bioinfo
Yao-Cheng Lin (VIB, Ghent, BE) EuGene prediction, Mlp gene families
Mlp 98AG31
Marie-Pierre Oudot-Le Secq (INRA Nancy) early EuGene gene prediction
the 'bad guy' genomic team at INRA
UMR 1136 IAM Duplessis Sébastien & Francis Martin