Bioinformatics: Computational methods to discover ncRNA in bacteria

Upload: dewei

Post on 11-Jan-2016

Page 1: Bioinformatics Computational methods to discover ncRNA in bacteria

Bioinformatics

Computational methods to discover ncRNA in bacteria

Ulf Schmitz, [email protected]

Bioinformatics and Systems Biology Group

www.sbi.informatik.uni-rostock.de

Page 2: Bioinformatics Computational methods to discover ncRNA in bacteria

Ulf Schmitz, Computational methods to discover ncRNA 2

Outline

1. Problem description

2. Streptococcus pyogenes

3. The RNome, transcriptome

4. Characteristics of bacterial ncRNA

5. Approaches to find fRNA

6. Conclusion / Outlook

Page 3: Bioinformatics Computational methods to discover ncRNA in bacteria

Streptococcus pyogenes

• important human pathogen (group A streptococcus, or GAS)
• causes the following diseases:
  – pyoderma (111 million cases/year)
  – pharyngitis (616 million cases/year and 517,000 deaths/year)
• completely adapted to humans as its only natural host
• causes purulent infections of the skin and mucous membranes and, rarely, life-threatening systemic diseases

[Images: pyoderma (source: DermNet NZ); pharyngitis (source: UCSD)]

Page 4: Bioinformatics Computational methods to discover ncRNA in bacteria

Streptococcus pyogenes

the multiplication rate varies -> associated with the type of infection

to understand this regulation, growth-phase regulatory factors and gene expression in response to specific environmental differences within the host were studied

a novel growth-phase-associated two-component-type regulator was identified: the fasBCA operon, present in all 12 tested M serotypes

it contains two potential HPK genes (fasB, fasC) and one RR gene (fasA)

it shows its maximum expression and activity at the transition phase

and potentially supports the aggressive spreading of the bacterium in its host

HPK = histidine protein kinase

RR = response regulator

Page 5: Bioinformatics Computational methods to discover ncRNA in bacteria

Streptococcus pyogenes

• downstream of the fas operon, a ~300-nucleotide transcript (fasX) was identified
• it does not encode a peptide/protein
  – but is also growth-phase related
  – and is the main effector molecule of the fas regulon
• i.e. an ncRNA or fRNA

Page 6: Bioinformatics Computational methods to discover ncRNA in bacteria

ncRNA

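[Figure: map of the fas locus, scale bar 1 kb. Genes shown: gltX-L, fasB, fasC, fasA, fasX, rnpA-L. Promoters shown: pfas, pfasX, prnpA. Additional label: tttt.]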

Page 7: Bioinformatics Computational methods to discover ncRNA in bacteria

RNome or transcriptome

[Diagram: classification of RNA]

RNA
– mRNA
– ncRNA / fRNA (snmRNA / sRNA)
  – structural RNA: tRNA, rRNA
  – miRNA, siRNA, snRNA, snoRNA, stRNA: putative gene expression regulators (protein-interacting and housekeeping ncRNAs were also found)

Page 8: Bioinformatics Computational methods to discover ncRNA in bacteria

RNome or transcriptome

Types of RNA:

mRNA (messenger RNA): transcript of a protein-coding gene

rRNA (ribosomal RNA): forms large parts of the ribosome, the protein-producing machinery

tRNA (transfer RNA): also involved in protein production, carrying single amino acids to the growing amino acid chain of a protein

ncRNA (non-coding RNA): found in intergenic regions, playing miscellaneous roles

Non-coding RNA (ncRNA) genes produce functional RNA molecules rather than encoding proteins. The nominees are:

fRNA (functional RNA): essentially synonymous with non-coding RNA

miRNA (microRNA): 21-24 nucleotide RNAs, probably acting as translational regulators of mRNA

siRNA (small interfering RNA): active molecules in RNA interference

snRNA (small nuclear RNA): includes spliceosomal RNAs

snmRNA (small non-mRNA): essentially synonymous with small ncRNAs

snoRNA (small nucleolar RNA): most known snoRNAs are involved in rRNA modification

stRNA (small temporal RNA): for example, lin-4 and let-7 in Caenorhabditis elegans

Page 9: Bioinformatics Computational methods to discover ncRNA in bacteria

Functions of ncRNA

…target mRNAs via imperfect sequence complementarity

binding may result in:
• blockage of ribosome entry (translation repression)
• melting of inhibitory secondary structures (translation activation)

[Figure labels: loop-loop kissing complex; dissolving the fold-back structure]
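Imperfect complementarity can be illustrated with a small Python sketch. It is only an assumption-laden toy, not a target-prediction method from the slides: it slides an sRNA along an mRNA without gaps, pairs the two strands in antiparallel orientation, and counts Watson-Crick and G-U wobble pairs. The function names `paired_positions` and `best_binding_site` are hypothetical.

```python
# Allowed RNA base pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(srna, mrna_window):
    """Count base pairs when the sRNA binds the window antiparallel, without gaps."""
    return sum((a, b) in PAIRS for a, b in zip(srna, reversed(mrna_window)))


def best_binding_site(srna, mrna):
    """Return (start, pairs) of the first mRNA window that pairs best with the sRNA.

    Returns (-1, -1) if the mRNA is shorter than the sRNA.
    """
    n = len(srna)
    best = (-1, -1)
    for start in range(len(mrna) - n + 1):
        pairs = paired_positions(srna, mrna[start:start + n])
        if pairs > best[1]:
            best = (start, pairs)
    return best


if __name__ == "__main__":
    # Toy example: an sRNA stretch complementary to a ribosome-binding-site-like motif.
    print(best_binding_site("UCCUCC", "AAAGGAGGAAAUG"))
```

A window with fewer pairs than its length corresponds to the "imperfect" binding named on the slide; real interactions also involve bulges and loops, which this ungapped sketch ignores.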

Page 10: Bioinformatics Computational methods to discover ncRNA in bacteria

Streptococcus pyogenes genomes

Serotype / strain   Length (bp)   Date
M1 GAS              1852441       Sep 19 2001
MGAS10270           1928252       May 4 2006
MGAS10394           1899877       Aug 3 2004
MGAS10750           1937111       May 4 2006
MGAS2096            1860355       May 4 2006
MGAS315             1900521       Jul 18 2002
MGAS5005            1838554       Aug 8 2005
MGAS6180            1897573       Aug 8 2005
MGAS8232            1895017       Jan 31 2002
MGAS9429            1836467       May 4 2006
SSI-1               1836467       May 4 2006

Genome Info & Features:

Genes: 1805
Protein coding: 1697
Structural RNAs: 73
Pseudogenes: 35
Length: 1,852,441 nt
GC content: 38%
Coding: 83%
Topology: circular
Molecule: dsDNA

Page 11: Bioinformatics Computational methods to discover ncRNA in bacteria

Intergenic sequence inspector (ISI)

[Figure: ISI workflow]

Processing steps: IGR extractor -> IGR filtering -> BLAST -> BLAST Analyser -> Genview -> Final results

Inputs: annotated genome; bacterial genomes database

Intermediate data: IGR databank, filtered IGR databank, BLAST results, aligned features, sequence features
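The first step of such a workflow, extracting intergenic regions (IGRs) from an annotated genome, can be sketched in Python as follows. This is a minimal illustration and not the ISI implementation: the function name `extract_igrs`, the coordinate convention (1-based, inclusive) and the default minimum length of 50 nt are assumptions, and the region that wraps around the origin of a circular genome is deliberately ignored.

```python
def extract_igrs(genes, genome_length, min_length=50):
    """Return intergenic regions as (start, end) tuples, 1-based and inclusive.

    genes: iterable of (start, end) gene coordinates on either strand.
    Overlapping genes are handled by tracking the furthest gene end seen so far.
    Simplification: the genome is treated as linear.
    """
    igrs = []
    prev_end = 0  # end coordinate of the furthest-reaching gene so far
    for start, end in sorted(genes):
        gap = start - prev_end - 1
        if gap >= min_length:
            igrs.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)
    # Region between the last gene and the end of the sequence.
    if genome_length - prev_end >= min_length:
        igrs.append((prev_end + 1, genome_length))
    return igrs


if __name__ == "__main__":
    toy_genes = [(100, 200), (150, 300), (400, 500)]
    print(extract_igrs(toy_genes, genome_length=1000))
```

Raising `min_length` is the simplest form of IGR filtering; real pipelines typically also remove regions that contain known RNA genes or other annotated features.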

Page 12: Bioinformatics Computational methods to discover ncRNA in bacteria

Characteristics of bacterial ncRNA

• intergenic sequence/structure conservation between related genomes
• encoded by free-standing genes, oriented in opposite fashion to both flanking genes
• 50 to 400 nt long (average >200 nt)
• higher G+C content than the average intergenic space
• σ70 promoter
• ρ-independent terminator
• imperfect sequence complementarity with the target mRNA
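Two of these characteristics, length and G+C content, are easy to turn into a first-pass filter. The Python sketch below is an assumed illustration: the names `gc_content` and `looks_like_ncrna_candidate` are hypothetical, and the background G+C value has to be supplied by the caller (for example, the mean G+C content of all intergenic regions of the genome).

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def looks_like_ncrna_candidate(seq, background_gc, min_len=50, max_len=400):
    """True if the sequence is 50-400 nt long and richer in G+C than the background."""
    return min_len <= len(seq) <= max_len and gc_content(seq) > background_gc


if __name__ == "__main__":
    # 0.38 is used only as an example background value.
    print(looks_like_ncrna_candidate("GC" * 30, background_gc=0.38))
```

Such a filter only narrows the candidate list; the promoter, terminator and conservation criteria still have to be checked separately.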

Page 13: Bioinformatics Computational methods to discover ncRNA in bacteria

Characteristics of bacterial ncRNA

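Promoter (consensus; numbers give the percentage of promoters carrying that base at that position)

-35 box: T(82) T(84) G(78) A(65) C(54) A(45)

spacer between -35 and -10 box: 16-19 bp

-10 box: T(80) A(95) T(45) A(60) A(50) T(96)

distance from -10 box to start point: 5-9 bp

Start point: C A(90) T

[Figure: intrinsic terminator]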
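A promoter of this shape can be searched for with a simple consensus scan. The Python sketch below is an assumption-based illustration, not a published promoter predictor: it looks for the σ70 consensus boxes TTGACA (-35) and TATAAT (-10) separated by a 16-19 bp spacer and tolerates a configurable total number of mismatches. The function names and the default of two mismatches are hypothetical choices.

```python
MINUS_35 = "TTGACA"  # consensus -35 box
MINUS_10 = "TATAAT"  # consensus -10 box


def mismatches(a, b):
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_promoters(seq, max_mismatches=2, spacers=range(16, 20)):
    """Return (position of -35 box, spacer length, total mismatches) for each hit.

    Positions are 0-based. Only the forward strand is scanned.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        mm35 = mismatches(seq[i:i + 6], MINUS_35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) < 6:
                break  # ran off the end of the sequence
            total = mm35 + mismatches(box10, MINUS_10)
            if total <= max_mismatches:
                hits.append((i, spacer, total))
    return hits


if __name__ == "__main__":
    toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGGGGCAT"
    print(find_sigma70_promoters(toy))
```

To cover both strands, the same scan can be run on the reverse complement of the sequence. A position weight matrix built from the percentages on the slide would be a more sensitive alternative to plain mismatch counting.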

Page 14: Bioinformatics Computational methods to discover ncRNA in bacteria

The structure approach with RNAz

1. multiple sequence alignment

2. measure of thermodynamic stability (z score)

3. measure for RNA secondary structure conservation

The function of many ncRNAs depends on a defined secondary structure

Page 15: Bioinformatics Computational methods to discover ncRNA in bacteria

The structure approach

Thermodynamic stability

• calculation of the MFE (minimum free energy) as a measure of thermodynamic stability
• the MFE depends on the length and the base composition of the sequence
  – and is therefore difficult to interpret in absolute terms
• RNAz calculates a normalized measure of thermodynamic stability by
  – comparing the MFE m of a given (native) sequence
  – with the MFEs of a large number of random sequences of similar length and base composition
• a z-score is calculated as

$$ z = \frac{m - \mu}{\sigma} $$

  where µ and σ are the mean and standard deviation, respectively, of the MFEs of the random samples
• a negative z-score indicates that a sequence is more stable than expected by chance
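The z-score computation can be sketched in Python. This is a minimal illustration under stated assumptions: the MFE values are taken as given (they would come from an RNA folding program, which is not called here), random controls are produced by plain mononucleotide shuffling, and the sample standard deviation is used. The function names `shuffled_controls` and `mfe_z_score` are hypothetical and are not part of RNAz.

```python
import random
from statistics import mean, stdev


def shuffled_controls(seq, n, seed=0):
    """Return n shuffled versions of seq with identical length and base composition."""
    rng = random.Random(seed)
    controls = []
    for _ in range(n):
        bases = list(seq)
        rng.shuffle(bases)
        controls.append("".join(bases))
    return controls


def mfe_z_score(native_mfe, random_mfes):
    """z = (m - mu) / sigma, with mu and sigma taken from the random-sequence MFEs.

    Needs at least two random MFEs that are not all identical.
    """
    mu = mean(random_mfes)
    sigma = stdev(random_mfes)
    return (native_mfe - mu) / sigma


if __name__ == "__main__":
    # Made-up MFE values in kcal/mol, for illustration only.
    print(mfe_z_score(-30.0, [-20.0, -22.0, -18.0]))
```

In the toy call, the native MFE lies well below the mean of the controls, so the z-score is negative, which is the "more stable than expected by chance" case.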

Page 16: Bioinformatics Computational methods to discover ncRNA in bacteria

The structure approach

Structural conservation

• RNAz predicts a consensus secondary structure for an alignment
  – this results in a consensus MFE E_A
• RNAz compares this consensus MFE to the average MFE of the individual sequences, Ē, and calculates a structure conservation index:

$$ \mathrm{SCI} = E_A / \bar{E} $$

• the SCI will be low if no consensus fold can be found.
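As a worked illustration of the index, the following Python sketch divides a consensus MFE by the mean of the single-sequence MFEs. The energies are assumed to be precomputed by a folding program; the function name `structure_conservation_index` and the choice to return 0.0 when the mean MFE is zero are assumptions made for this example.

```python
def structure_conservation_index(consensus_mfe, individual_mfes):
    """SCI = E_A / mean(E_i): consensus MFE over the average single-sequence MFE."""
    mean_mfe = sum(individual_mfes) / len(individual_mfes)
    if mean_mfe == 0:
        return 0.0  # no sequence folds at all; treated as "no conservation" here
    return consensus_mfe / mean_mfe


if __name__ == "__main__":
    # Made-up energies in kcal/mol, for illustration only.
    print(structure_conservation_index(-18.0, [-20.0, -22.0, -18.0]))
```

A consensus energy close to the average single-sequence energy gives an SCI near 1, while a consensus energy near zero (no common fold) gives an SCI near 0.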

Page 17: Bioinformatics Computational methods to discover ncRNA in bacteria

The structure approach

• the z-score and the SCI are used to classify an alignment as “structural RNA” or “other”.

• RNAz uses a support vector machine (SVM) learning algorithm which is trained on a set of known ncRNAs.

Page 18: Bioinformatics Computational methods to discover ncRNA in bacteria

Analysis pipeline of the Freiburg group

1. extraction of intergenic regions (IGRs) ≥50 nt
2. BLASTN: local alignment of IGRs with BLASTN
3. E-value ≤ 10^-8? If no: discard
4. reverse complement of candidate sequences
5. unify overlapping sequences, to reduce redundancy
6. clustering using ClustalW
7. scoring using RNAz
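The bookkeeping steps of such a pipeline (E-value filtering, reverse complementing, and unifying overlapping candidates) can be sketched in Python. This is an assumed illustration, not the Freiburg group's code: hits are represented as hypothetical `(start, end, evalue)` tuples, and running BLASTN, ClustalW and RNAz themselves is left out.

```python
def filter_hits(hits, max_evalue=1e-8):
    """Keep only hits (start, end, evalue) whose E-value is at most max_evalue."""
    return [h for h in hits if h[2] <= max_evalue]


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T; case preserved)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def unify_overlapping(intervals):
    """Merge overlapping (start, end) intervals to reduce redundancy."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    toy_hits = [(10, 80, 1e-12), (60, 120, 1e-9), (300, 350, 1e-3)]
    kept = filter_hits(toy_hits)
    print(unify_overlapping([(s, e) for s, e, _ in kept]))
    print(reverse_complement("ATGC"))
```

In the toy run, the third hit is discarded by the E-value cutoff and the two remaining overlapping hits are merged into a single candidate region.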

Page 19: Bioinformatics Computational methods to discover ncRNA in bacteria

Summary / Conclusion

• there are ‘reliable’ computational methods to find ncRNA-encoding genes in bacteria
• key methods involve:
  – IGR extraction and filtering
  – observing sequence conservation in related genomes (BLAST search, ClustalW alignment)
  – checking for structure conservation and thermodynamic stability
• the next step is to prove their existence experimentally via microarrays or Northern blots

Page 20: Bioinformatics Computational methods to discover ncRNA in bacteria

Outlook

• might it be possible to predict target mRNA?

Thanks for your attention!