Page 1

The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease

M. larici-populina Transcriptome

Mlp Summer workshop – INRA Nancy, August 20-21 2008

Duplessis Sébastien (INRA Nancy)
Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM

Page 2

Mlp Transcriptome – Goals and Means

Goals

Gene Expression

- Identify genetic determinants involved in Mlp biology
- Identify sets of genes involved in development of infection structures (secretion, effectors, avirulence, ...)
- Identify sets of genes involved in biotrophy (nutrition, transport)
- Identify gene expression profiles during plant-fungal interaction

Gene Models Annotation

- Validation of gene model predictions
- Detection of new gene models

Page 3

Mlp Transcriptome – Goals and Means

Means

EST sequencing

- Sanger ESTs from specific cDNA library (cDNA cloning / 100-1000s ESTs)
- 454-pyrosequencing from specific tissue (no cDNA cloning / 200-400k reads)

454: 80 Mb in 1 run for 10K€ vs. 1000s of Sanger ESTs for much more

=> Genes expressed in a given tissue (specific and ubiquitous)

=> No gene prediction a priori

Array-based expression profiling

- DNA Chips – NimbleGen Systems oligonucleotide arrays

=> Expression of all predicted genes represented on the array

=> Gene prediction a priori or EST sequencing required

Page 4

Mlp Transcriptome – EST sequencing I

cDNA library of Mlp 98AG31 urediniospores and germlings

250 µg of DNase-free RNA was isolated from Mlp 98AG31 urediniospores and germlings (urediniospores grown for less than 12 h on agar) and sent to JGI

Mlp is an obligate biotroph, so spores are the only source of uncontaminated ESTs

cDNA library => 29,081 cDNA clones

5'/3' sequencing => 52,269 ESTs (including ~4,500 ESTs previously obtained at INRA Nancy)

EST assembly => 11,535 consensus sequences (mean size 780 nt; range 100 to 5,052 nt)

— 6,599 singletons
— 4,936 clusters
— 119 consensus sequences contain > 50 ESTs
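As a quick sanity check on the assembly figures above, the short sketch below confirms that singletons plus clusters account for all consensus sequences and derives the singleton share. The variable names are illustrative only; the counts are the ones quoted on the slide.

```python
# Counts quoted on the slide for the Mlp 98AG31 Sanger EST assembly.
singletons = 6_599
clusters = 4_936
consensus_total = 11_535
deep_consensus = 119  # consensus sequences containing > 50 ESTs

# Singletons and clusters together should account for every consensus sequence.
assert singletons + clusters == consensus_total

# Share of consensus sequences that consist of a single EST.
singleton_share = singletons / consensus_total
# Share of consensus sequences built from more than 50 ESTs.
deep_share = deep_consensus / consensus_total

print(f"Singleton share of consensus sequences: {singleton_share:.1%}")
print(f"Consensus sequences with > 50 ESTs: {deep_share:.1%}")
```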

The best BLAST hits of the most abundant ESTs consisted of:

— stress response TF rds1, HSP, glycosidase, ubiquitin, fruiting body protein, cyclin, SOD, Ras, antibiotic resistance, protease, laccase, tubulin
— dehydrogenases and cytP450 from Uromyces fabae
— predicted gene models from P. graminis

Page 5

Mlp Transcriptome – EST Sequencing I

Comparison to released Pucciniales ESTs (E-value < 1e-5)

Phakopsora pachyrhizi (soybean rust) ESTs => Germinated/non-germinated spores, infected tissues

Puccinia graminis f. sp. tritici (wheat stem rust) => Germinated/non-germinated urediniospores and teliospores

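[Figure: two Venn diagrams comparing Mlp ESTs with Pp ESTs and with Pgt ESTs. Values shown: 46,411, 28,536 and 5,858; 45,812, 56,753 and 6,483. Callouts: 5,738 Pp spore ESTs; 4,045 Pgt spore ESTs.]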

Page 6

Mlp Transcriptome – EST Sequencing I

Mlp 98AG31 ESTs for Gene Prediction and Gene model support

ESTs were used in JGI and EuGene predictions

=> 4,507 gene models supported (27% of gene models)

ESTs to support gene curation

=> ESTs and clusters are shown on the JGI Melampsora website

Page 7

Mlp Transcriptome – EST Sequencing II

cDNA libraries from various Melampsora spp. (Feau, Joly, Hamelin, CFS, Canada)

M. medusae f. sp. deltoidae (MMD)
— Multiple isolates, different growth stages (field)

M. larici-populina (MLP and MLP-H)
— Multiple isolates, different growth stages (field)
— Single isolate, haustoria-enriched (in vitro)

M. medusae f. sp. tremuloidae (MMT)
— Single isolate, 13 days growth (in vitro)

M. occidentalis (MO)
— Single isolate, 13 days growth (in vitro)

Page 8

Mlp Transcriptome – EST Sequencing II

cDNA libraries from various Melampsora spp.

Library  Construction kit  # clones sequenced  # readable sequences  # contigs  # singletons
MMD      Stratagene        5,541               3,695                 465        589
MLP      Stratagene        3,008               2,493                 282        564
MLP-H    Clontech          3,708               3,137                 615        1,034
MMT      Clontech          3,008               2,793                 638        999
MO       Clontech          3,008               2,642                 367        1,285

Page 9

Mlp Transcriptome – EST Sequencing II

Annotation of Melampsora spp. ESTs

Figure 1. Gene prediction and classification of the 4867 assembled ESTs from the four Melampsora libraries. ESTs with significant matches (Blastx against the Uniprot database; E values < 1e-20) were classified into categories according to the functional nomenclature presented in Kamoun et al. (1999).

Feau et al. 2007. Can. J. Bot.

Taxonomic origin of best hits:
- Prokaryota 0.18%
- Vertebrates 0.25%
- Invertebrates 0.74%
- Plants 1.4% (Poplars 13%, Others 87%)
- Fungi 26.8%
- No hits 70.6%

Matches:
- Hypothetical proteins 63%
- Known proteins in public databases 37%

Functional categories:
- Avirulence and pathogenicity factors 3% (Haustorially expressed secreted proteins 47%, Planta induced rust proteins 13%, Rust transferred protein precursors 27%, Other 13%)
- Cell defense 6% (Stress response 72%, Detoxification 28%)
- Cell growth/Cell division/DNA synthesis 2%
- Cellular organization 6%
- Energy 5% (Glycolysis 42%, TCA pathway 17%, Respiration 33%, Gluconeogenesis 8%)
- Metabolisms 15% (Carbohydrates, amino-acids, lipids 66%; N, P and S 9%; Nucleotides 7%; Biosynthesis of cofactors and vitamins 18%)
- Transcription 4%
- Protein destination 5%
- Protein synthesis 27% (Ribosomal proteins 77%, Translational factors 22%, tRNA-synthetases 1%)
- Signal transduction mediators 4%
- Intracellular traffic 5%
- Transport facilitator 6%
- Unclassified 12%

Page 10

Mlp Transcriptome – EST Sequencing II

Annotation of Melampsora spp. ESTs

[Figure: bar chart of the number of assembled ESTs representing each protein family (axis: No. of assembled ESTs representing the protein family; ticks 0 to 20, then 196), plotted for M. larici-populina, M. medusae f. sp. deltoidae, M. medusae f. sp. tremuloidae and M. occidentalis. Protein families shown: Protein/domain of unknown function; RNA recognition motif; ATP synthase; Ubiquitin domain; Core histone H2A/H2B/H3/H4; RNA-metabolising metallo-beta-lactamase; Elongation factor; Thioredoxin; 14-3-3 protein; Ras family; Cytochrome b5-like Heme/Steroid binding domain; Helicase conserved C-terminal domain; CFEM domain; Cyclophilin type peptidyl-prolyl cis-trans isomerase; Putative GTPase activating protein for Arf; WD domain, G beta repeat; Zinc finger (C2HC/CCCH/ZPR1); Actin; Cytochrome c oxidase subunit Va/Vb; Enolase; Acyl CoA binding protein; Heat shock protein; EF hand; Mitochondrial carrier protein; Ribosomal proteins.]

Feau et al. 2007. Can. J. Bot.

Page 11

Mlp Transcriptome – EST Sequencing III: 454-pyrosequencing

454-pyrosequencing of poplar leaf infected tissues

Melampsora is an obligate biotroph => specialized infection structures (haustoria) are formed after 16 h post-inoculation (hpi) and uredinia after 7 days post-inoculation (dpi), only in the plant host

Strong Mlp invasion of plant tissues was observed at 4 dpi (Rinaldi et al., 2007)

Pyrosequencing allows the generation of hundreds of thousands of sequences from isolated transcripts

=> 200,000 ESTs from transcripts isolated from infected poplar leaves at 4 and 7 dpi with 454 GS-FLEX (Roche) by Cogenix

— Transcripts expressed during plant infection
— Transcripts involved in infection structure development, maintenance and biotrophy
— Transcripts involved in spore formation and maturation
— Identification of plant infection-specific transcripts by comparison with Sanger ESTs

Page 12

Mlp Transcriptome – 454-pyrosequencing

(From Ellegren, Mol. Ecol. 2008)

Page 13

Mlp Transcriptome – 454-pyrosequencing

454-sequencing at JGI

Page 14

Mlp Transcriptome – 454-pyrosequencing

1. 250 µg of total RNA was isolated from infected poplar leaves ('Beaupré') at 4 dpi and 7 dpi with Mlp 98AG31
2. cDNA synthesis with SMART cDNA synthesis kit from 60 ng purified mRNA
3. 10 µg cDNA recovered and sent to Cogenix for 454-pyrosequencing on GS-FLEX (Roche)

4 dpi: infection hyphae, haustoria
7 dpi: infection hyphae, haustoria, uredinia, spore-forming cells

Pictures by S Hacquard & S Duplessis (2008) by confocal microscopy with PI/Uvitex staining

Page 15

Mlp Transcriptome – 454-pyrosequencing

Cogenix report on 454-sequencing

454-pyrosequencing can generate > 400,000 sequences, or 2 x 200,000 sequences, in 1 run

Poplar infected tissues => 185,663 sequences

454 sequences are short (mean length 203 nt) and require assembly for transcript reconstruction

Assembly by Newbler => 148,688 reads assembled into 10,629 contigs & 36,975 unassembled reads (= singletons?)
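The read accounting in the Cogenix report can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on this slide; the variable names are illustrative, and the base count is an estimate because it multiplies the read count by the mean read length.

```python
# Figures quoted in the Cogenix report for the infected-leaf 454 run.
total_reads = 185_663
assembled_reads = 148_688
unassembled_reads = 36_975
contigs = 10_629
mean_read_length_nt = 203

# Assembled and unassembled reads should add up to the full read set.
assert assembled_reads + unassembled_reads == total_reads

# Approximate sequence yield in megabases (reads x mean length).
total_bases_mb = total_reads * mean_read_length_nt / 1e6
# Fraction of reads placed into contigs by Newbler.
assembled_fraction = assembled_reads / total_reads
# Average depth of a contig, in reads.
reads_per_contig = assembled_reads / contigs

print(f"Approximate yield: {total_bases_mb:.1f} Mb")
print(f"Reads placed in contigs: {assembled_fraction:.1%}")
print(f"Mean reads per contig: {reads_per_contig:.1f}")
```

The estimated yield can be compared with the "80 Mb in 1 run" figure quoted earlier for a full 454 run.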

Page 16

Mlp Transcriptome – 454-pyrosequencing

Newbler assembly vs. MIRA assembly

Newbler is a de novo assembler designed for genomic sequences (not transcripts) that works in flowgram space, not nucleotide space

Newbler tends to discard many reads for no obvious reason (>38,000 reads are lost)
Cogenix recommended the use of another de novo assembler dedicated to transcript assembly

CAP3 is not recommended
MIRA is an EST assembler recently updated for 454 data

=> http://chevreux.org/projects_mira.html

MIRA generates more contigs than Newbler => 17,511 contigs (including 2,600 singletons)
MIRA provides information on the overall quality of sequences (tag 'too short' = low-quality sequences)

GenomeThreader (Gth) maps transcript sequences to a genome sequence
MIRA contigs are mapped to the Mlp and poplar genomes to identify fungal and plant transcripts

Page 17

Mlp Transcriptome – 454-pyrosequencing

Newbler vs. MIRA

[Figure: four histograms comparing MIRA and Newbler contigs, for Mlp sequences and for poplar sequences. Y-axis: number of contigs. X-axis: score bins from 0.5 to 1.00 (two panels) and E-value ranges 10e-5 - 10e-20, 10e-20 - 10e-50, 10e-50 - 10e-100, 10e-100 - 0.0, 0.0 (two panels).]

Singleton reads from Newbler are mostly low-quality sequences

Page 18

Mlp Transcriptome – 454-pyrosequencing

Final MIRA assembly vs. poplar and Mlp genomes

— Contigs that showed a Gth score < 0.9 were dissolved into singletons
— Contigs attributed to both genomes with Gth scores > 0.9 were manually resolved
— Contigs attributed to one genome but containing reads attributed to the other genome were manually inspected with Consed => new contigs/singletons
— Singletons with Gth scores < 0.9 were not retained

5,956 contigs & 9,562 singletons attributed to Mlp

6,414 contigs & 21,400 singletons attributed to Poplar
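The score-based part of the filtering rules above can be written as a small decision function. This is only a sketch under stated assumptions: the function name `assign`, the returned labels, and the choice to treat a score of exactly 0.9 as passing are all hypothetical, and the read-level check done with Consed (contigs mixing reads from both genomes) is not modelled.

```python
from typing import Optional

THRESHOLD = 0.9  # Gth score cut-off quoted on the slide


def assign(is_contig: bool,
           mlp_score: Optional[float],
           poplar_score: Optional[float]) -> str:
    """Return a hypothetical label for one MIRA contig or singleton.

    Scores are the best Gth scores against each genome, or None when the
    sequence did not map to that genome at all.
    """
    # Assumption: a score equal to the threshold counts as passing.
    mlp_ok = mlp_score is not None and mlp_score >= THRESHOLD
    poplar_ok = poplar_score is not None and poplar_score >= THRESHOLD

    if mlp_ok and poplar_ok:
        return "manual"      # maps well to both genomes: resolve by hand
    if mlp_ok:
        return "Mlp"
    if poplar_ok:
        return "Poplar"
    # Nothing passes: contigs are dissolved into singletons,
    # singletons are not retained.
    return "dissolve" if is_contig else "discard"


examples = [
    (True, 0.97, None),    # contig mapping only to Mlp
    (True, 0.95, 0.93),    # contig mapping to both genomes
    (True, 0.70, 0.50),    # contig below threshold on both
    (False, None, 0.85),   # singleton below threshold
    (False, None, 0.99),   # singleton mapping to poplar
]
for is_contig, mlp, poplar in examples:
    print(is_contig, mlp, poplar, "->", assign(is_contig, mlp, poplar))
```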

PASA (Program to Assemble Spliced Alignments)

PASA is a tool designed for curation of gene catalogs using sets of ESTs and full-length cDNAs. It is based on stringent alignment to the genome sequence with GMAP, assembly into clusters based on position on the genome sequence, and comparison to the current catalogue of gene models => curation

PASA was used in several published 454 analyses, and in the Arabidopsis community for gene curation

PASA => Mlp ESTs (Sanger & 454 contigs) vs. Mlp genome/gene models

Page 19

Mlp Transcriptome – 454-pyrosequencing

PASA outputs for Mlp 454 Contigs

PASA was run using all 454 reads against Mlp Genome and a similar number of gene models were supported

Page 20

Mlp Transcriptome – 454-pyrosequencing

PASA outputs for Mlp Sanger contigs

Total of 6294 Mlp Gene Models supported (38%)
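The percentages quoted for gene model support can be reproduced from the counts in this deck. The sketch below assumes that the denominator is the 16,694 JGI gene models mentioned on the NimbleGen array slide; the slides do not state the denominator explicitly, so this is an inference that happens to match the rounded figures.

```python
# Assumption: percentages are relative to the 16,694 JGI gene models
# quoted on the NimbleGen array slide.
jgi_gene_models = 16_694
sanger_supported = 4_507   # gene models supported by Sanger ESTs
all_supported = 6_294      # gene models supported by Sanger + 454 contigs (PASA)


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


sanger_pct = percent(sanger_supported, jgi_gene_models)
all_pct = percent(all_supported, jgi_gene_models)
gained_models = all_supported - sanger_supported

print(f"Sanger ESTs only: {sanger_pct:.1f}%")
print(f"Sanger + 454:     {all_pct:.1f}%")
print(f"Gain: {gained_models} gene models ({all_pct - sanger_pct:.1f} percentage points)")
```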

Page 21

Mlp Transcriptome – 454-pyrosequencing

Examples of gene model curation based on Mlp 454 contigs proposed by PASA

Page 22

Mlp Transcriptome – 454-pyrosequencing

Most abundant transcripts supporting Mlp gene models identified through 454-sequencing

4,010 gene models supported by 454 ESTs

— 935 with no hits in nr/Swiss-Prot; 391 specific to Pucciniales; 519 specific to Mlp
— 265 encode SSPs => 166 with no hits in nr/Swiss-Prot; 34 specific to Pucciniales; 128 specific to Mlp

Page 23

Mlp Transcriptome – NimbleGen Systems oligonucleotide arrays

NimbleGen Systems expression oligonucleotide arrays

~390,000 60-mer oligoprobes evenly distributed on a 2 cm² array
4-plex arrays = 80,000 to 90,000 probes per array (+ controls)

Set of 8 oligoprobes/gene duplicated in Laccaria bicolor

16,694 JGI models + new EuGene models with 454 support
[All 454-supported new CDS?]

17,000 to 20,000 Mlp gene models => 4 probes/gene => no duplicated probes => Populus filtered

10 x 4-plex NimbleGen arrays ordered – Design ASAP
Mlp gene expression during timecourse infection
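The choice of 4 probes per gene follows from the array capacity and the expected number of gene models. The sketch below works through that arithmetic with the ranges quoted on this slide; the helper name `probes_per_gene` is illustrative, and control probes are ignored for simplicity.

```python
def probes_per_gene(probes_per_subarray: int, gene_models: int) -> int:
    """Whole number of probes that fit per gene on one sub-array."""
    return probes_per_subarray // gene_models


# Ranges quoted on the slide for a 4-plex design.
capacities = (80_000, 90_000)      # probes per sub-array
model_counts = (17_000, 20_000)    # expected Mlp gene models

for capacity in capacities:
    for genes in model_counts:
        print(f"{capacity} probes, {genes} gene models -> "
              f"{probes_per_gene(capacity, genes)} probes/gene")

# The smallest capacity with the largest gene catalogue is the limiting case.
worst_case = probes_per_gene(min(capacities), max(model_counts))
print("Probes/gene that fit in every scenario:", worst_case)
```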

Page 24

Mlp Transcriptome – Conclusions

Conclusions

— 52,269 Mlp 98AG31 ESTs support 27% of JGI Mlp gene models

— ESTs from other Melampsora spp. to help in annotation (+ polymorphism study)

— 185,000 454 reads were assembled into 12,370 contigs & 30,962 singletons

5,956 contigs & 9,562 singletons attributed to Mlp by Gth

6,414 contigs & 21,400 singletons attributed to Poplar by Gth

— PASA identified a total of 6294 Mlp Gene Models supported both by 454 and Sanger ESTs contigs = 38% of Mlp Gene Models (11% increase)

— MIRA identified many Gene models that may need annotation

— MIRA also identified more than 2,500 putative new genes (to be verified)

— Among the 4,010 gene models expressed in planta

=> 519 are specific to Mlp and 391 to Pucciniales
=> 265 encode SSPs and 128 SSPs are specific to Mlp

Page 25

Mlp Transcriptome – Conclusions

Ongoing…

— Curation of Gene Models supported by 454 contigs

— Prediction/Curation of putative new genes with 454 contigs support

— Design of NimbleGen Systems Oligoarray Mlp v1.0

To come…

— Alternative splicing

— Presence of SNPs (Transcripts expressed in both nuclei?)

— Profiles of candidate genes during timecourse infection of poplar leaves

Page 26

Mlp 98AG31: the 'bad guy' genomic team at INRA, UMR 1136 IAM

Stéphane Hacquard (INRA Nancy): Mlp effectors
Emilie Tisserant & Benoît Hilselberger (INRA Nancy): Mlp Bioinfo
Yao-Cheng Lin (VIB, Ghent, BE): EuGene prediction, Mlp gene families
Marie-Pierre Oudot-Le Secq (INRA Nancy): EST annotation
Duplessis Sébastien & Francis Martin