lecture04 genomics



14/02/03


LECTURE PRESENTATIONS For BROCK BIOLOGY OF MICROORGANISMS, THIRTEENTH EDITION

Michael T. Madigan, John M. Martinko, David A. Stahl, David P. Clark

Lectures by John Zamora Middle Tennessee State University

© 2012 Pearson Education, Inc.

Microbial Genomes

Chapter 12

12.1 Introduction to Genomics

•  Genome
  –  Entire complement of genetic information
  –  Includes genes, regulatory sequences, and noncoding DNA
•  Genomics
  –  Discipline of mapping, sequencing, analyzing, and comparing genomes


12.1 Introduction to Genomics

•  >2,000 prokaryotic genomes sequenced or in progress
•  RNA virus MS2
  –  First genome sequenced, in 1976
  –  3,569 nucleotides (single-stranded RNA)
•  Haemophilus influenzae
  –  First cellular genome sequenced, in 1995
  –  1,830,137 bp


12.2 Sequencing and Annotating Genomes

•  Sequencing: determining the precise order of nucleotides in a DNA or RNA molecule
•  Sanger dideoxy method
  –  Invented by Nobel Prize winner Fred Sanger
  –  Dideoxy analogs of dNTPs used in conjunction with dNTPs (Figure 12.1)
  –  Analog prevents further extension of DNA chain (Figure 12.2a)
  –  Bases are labeled with radioactivity
  –  Gel electrophoresis is then performed on products (Figure 12.2b)


Figure 12.1: Normal deoxynucleotide compared with its dideoxy analog, which is missing the 3′-OH. When the analog is added to the growing DNA chain (direction of chain growth shown), there is no free 3′-OH and replication stops at that point.

Figure 12.2a: Sanger sequencing reactions.

•  Start with the DNA strand to be sequenced and a radioactive DNA primer
•  Add DNA polymerase and a mixture of all four deoxyribonucleotide triphosphates; separate into four reaction tubes
•  Add a small amount of only one dideoxynucleotide triphosphate (ddGTP, ddATP, ddTTP, or ddCTP) to each tube and allow the reaction to proceed
•  Reaction products: one set of terminated fragments per tube (G, A, T, C)


Figure 12.2b: Reaction products separated by electrophoresis on gel and identified by autoradiography (bands numbered 1 to 7).

Sequence reads from bottom of gel as A G C T A A G. Sequence of unknown is 3′ T C G A T T C 5′.
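The step from the gel read to the unknown strand is base-by-base complementation. A minimal sketch, assuming the read is written 5′ to 3′ and the template is written 3′ to 5′ so the two line up position by position (the function name is illustrative, not from the lecture):

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def template_from_gel_read(read_5_to_3: str) -> str:
    """Return the template strand, written 3'->5', for a read written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in read_5_to_3)

gel_read = "AGCTAAG"  # read from the bottom of the gel upward
template = template_from_gel_read(gel_read)
print(f"3'-{template}-5'")  # 3'-TCGATTC-5'
```

The shortest fragments run farthest, so reading upward from the bottom gives the newly synthesized strand in the 5′ to 3′ direction.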


12.2 Sequencing and Annotating Genomes

•  Large-scale sequencing projects have led to automated DNA sequencing systems
  –  Based on Sanger method
  –  Radioactivity replaced by fluorescent dye (Figure 12.2c)


Figure 12.2c: Automated sequencing readout showing the bases G G A A A C T.

12.2 Sequencing and Annotating Genomes

•  Virtually all genomic sequencing projects use shotgun sequencing
  –  Entire genome is cloned and resultant clones are sequenced
  –  Much of the sequencing is redundant
  –  Generally 7- to 10-fold coverage
•  Computer algorithms used to look for replicate sequences and assemble them
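The idea of assembling redundant reads by their shared sequence can be illustrated with a toy greedy overlap merger. This is only a sketch under strong assumptions: error-free reads, a single strand, no repeats, and no read fully contained in another. Real assemblers are far more involved, and the function names here are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # nothing left to merge
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

def fold_coverage(reads: list[str], genome_length: int) -> float:
    """Total sequenced bases divided by genome length."""
    return sum(len(r) for r in reads) / genome_length

reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]  # made-up reads from a 15-base sequence
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
print(round(fold_coverage(reads, 15), 2))  # 1.67
```

Coverage here is simply total read bases over genome length; at 7- to 10-fold coverage most positions are read several times, which is what makes the overlaps findable.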


12.2 Sequencing and Annotating Genomes

•  Occasionally assembly is not possible
•  Closure can be pursued using PCR to target areas of the genome
•  Closed vs. draft genome
  –  Closed genome relies on manpower
  –  More expensive
  –  More information

12.2 Sequencing and Annotating Genomes

•  Annotation: converting raw sequence data into a list of genes present in the genome
•  Majority of genes encode proteins
•  Functional ORF: an open reading frame that encodes a protein
  –  Computer algorithms used to search for ORFs
     •  Look for start/stop codons and Shine–Dalgarno sequences
•  ORFs can be compared to ORFs in other genomes
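A search for start and stop codons can be sketched as a scan of the three forward reading frames. This is a simplified illustration, not a real gene finder: it assumes ATG is the only start codon, ignores the reverse strand and Shine–Dalgarno sequences, and uses a hypothetical `min_codons` cutoff.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for ATG...stop stretches in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)

print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 'ATGAAATTTGGGTAA')]
```

Coordinates are 0-based with an exclusive end, and the reported ORF includes its stop codon.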


12.2 Sequencing and Annotating Genomes

•  Inaccuracies in some annotations are problematic
  –  As many as 10% of annotated genes are incorrectly annotated

12.3 Bioinformatic Analyses and Gene Distributions

•  Bioinformatics
  –  Science that applies powerful computational tools to DNA and protein sequences
  –  For the purpose of analyzing, storing, and accessing the sequences for comparative purposes


12.3 Bioinformatic Analyses and Gene Distributions

•  Correlation between genome size and ORFs (Figure 12.3)
•  On average a prokaryotic gene is 1,000 bp long
  –  ∼1,000 genes per megabase (1 Mbp = 1,000,000 bp)
  –  As genome size increases, gene content proportionally increases
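The rule of thumb translates directly into arithmetic. A small sketch, using genome sizes quoted elsewhere in this lecture (the function name is illustrative, and the result is only a rough estimate):

```python
GENES_PER_MBP = 1_000  # rule of thumb from the slide: ~1,000 genes per megabase

def estimated_genes(genome_bp: int) -> int:
    """Estimate gene count from genome size using ~1,000 genes per Mbp."""
    return round(genome_bp / 1_000_000 * GENES_PER_MBP)

print(estimated_genes(1_830_137))  # Haemophilus influenzae: 1830
print(estimated_genes(4_411_529))  # Mycobacterium tuberculosis: 4412
```

For M. tuberculosis the estimate lands a little above the roughly 4,000 genes reported for that genome, which is the kind of agreement a rule of thumb can be expected to give.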

Figure 12.3: Total ORFs in genome (y-axis, 0 to 9,000) plotted against genome size in megabases (x-axis, 0 to 10).


12.3 Bioinformatic Analyses and Gene Distributions

•  Prokaryotic genomes range in size from those of large viruses to those of eukaryotic microbes

•  Unlike prokaryotic genomes, eukaryotic genomes contain a large fraction of noncoding DNA

12.3 Bioinformatic Analyses and Gene Distributions

•  Smallest cellular genomes belong to parasitic or endosymbiotic prokaryotes

–  Obligate parasites range from 490 kbp (Nanoarchaeum equitans) to 4,400 kbp (Mycobacterium tuberculosis)

–  Endosymbionts can be smaller (e.g., 160-kbp genome of Carsonella ruddii)

–  Estimates suggest the minimum number of genes for a viable cell is 250–300 genes


12.3 Bioinformatic Analyses and Gene Distributions

•  Largest prokaryotic genomes comparable to those of some eukaryotes
  –  Sorangium cellulosum (Bacteria)
     •  Largest prokaryotic genome to date at 12.3 Mbp
  –  Largest archaeal genomes tend to be smaller (~5 Mbp)

12.3 Bioinformatic Analyses and Gene Distributions

•  Complement of genes in a particular organism defines its biology, but genomes are also molded by an organism's lifestyle


12.3 Bioinformatic Analyses and Gene Distributions

•  Many genes can be identified by sequence similarity to genes found in other organisms (comparative analysis)

•  Comparative analyses allow for predictions of metabolic pathways and transport systems

•  Example: Thermotoga maritima (Figure 12.4)

Figure 12.4: Overview of metabolic pathways and transport systems in Thermotoga maritima. Labeled elements include peptide and sugar ABC transport systems; transporters for amino acids, branched-chain amino acids, polyamines, phosphate, zinc, iron, and cations; glycolysis, the pentose phosphate pathway, and the Entner–Doudoroff pathway; ATP synthase; the flagellum (33 flagellar and motor genes); and chemotaxis components (cheA/B/C/D/R/W/Y, 7 MCPs).


12.3 Bioinformatic Analyses and Gene Distributions

•  Gene Distribution in Prokaryotes
  –  Metabolic genes typically most abundant class
  –  DNA replication and transcription genes make up minor fraction of genome
  –  Nontranslated RNA genes are typically prevalent
     •  Include rRNA, tRNA, small regulatory RNAs

12.3 Bioinformatic Analyses and Gene Distributions

•  Number of genes with a role that can be clearly identified in a given genome is 70% or less of total ORFs detected
•  Hypothetical proteins: uncharacterized ORFs; proteins that likely exist but whose function is presently unknown
  –  Likely correspond to nonessential genes
  –  In E. coli, many predicted to encode regulatory or redundant proteins


12.3 Bioinformatic Analyses and Gene Distributions

•  Percentage of an organism's genes devoted to a specific cell function is to some degree a function of genome size (Figure 12.5)

Figure 12.5: Relative percent of ORFs devoted to DNA replication, translation, transcription, signal transduction, and energy generation, plotted against total ORFs in genome (0 to 10,000).


12.3 Bioinformatic Analyses and Gene Distributions

•  Gene Distribution in Bacteria and Archaea
  –  Archaea typically devote a higher percentage of their genomes to energy and coenzyme production than do Bacteria
  –  Archaea contain fewer genes for carbohydrate metabolism or cytoplasmic membrane functions than do Bacteria

Figure 12.6: Percent of genes (y-axis, 0 to 14) in Bacteria and Archaea by functional category: carbohydrate metabolism, cell membrane, coenzyme metabolism, energy production, unknown function, and general prediction.


12.6 Metagenomics

•  Metagenome
  –  The total gene content of the organisms present in an environment
•  Several environments have been surveyed by large-scale metagenome projects
  –  Examples: acid mine runoff waters, deep-sea sediments, fertile soils

12.10 Gene Families, Duplications, and Deletions

•  Homologous: related in sequence to an extent that implies common genetic ancestry
•  Gene families: groups of gene homologs (Figure 12.15)
•  Paralogs: genes within an organism whose similarity to one or more genes in the same organism is the result of gene duplication
•  Orthologs: genes found in one organism that are similar to those in another organism but differ because of speciation
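The distinction can be made concrete with the four genes named in Figure 12.15 (A1, A2, B1, B2). A toy sketch, assuming the duplication into Gene A and Gene B happened before the two species diverged; the `Gene` type and `relationship` function are illustrative names only:

```python
from itertools import combinations
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    species: int  # 1 or 2
    lineage: str  # "A" or "B": which duplicate the gene descends from

def relationship(g1: Gene, g2: Gene) -> str:
    # Same duplicate lineage in different species: separated by speciation.
    if g1.lineage == g2.lineage and g1.species != g2.species:
        return "orthologs"
    # Otherwise the two genes trace back to the duplication event.
    return "paralogs"

genes = {g.name: g for g in [
    Gene("A1", 1, "A"), Gene("A2", 2, "A"),
    Gene("B1", 1, "B"), Gene("B2", 2, "B"),
]}

for g1, g2 in combinations(genes.values(), 2):
    print(g1.name, g2.name, relationship(g1, g2))
```

Under this assumption A1/A2 and B1/B2 are the two orthologous pairs, and every A/B pairing, within or across species, is paralogous.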


Figure 12.15: Orthologs and paralogs. An ancestral gene in an ancestral species undergoes gene duplication (Gene A and Gene B) and divergence of species, giving A1 and B1 in Species 1 and A2 and B2 in Species 2. Gene pairs are labeled as paralogs or orthologs.

Genomics in TB Research


The outer circle shows the scale in Mb, with 0 representing the origin of replication. The first ring from the exterior denotes the positions of stable RNA genes (tRNAs are blue, others are pink) and the direct repeat region (pink cube); the second ring inwards shows the coding sequence by strand (clockwise, dark green; anticlockwise, light green); the third ring depicts repetitive DNA (insertion sequences, orange; 13E12 REP family, dark pink; prophage, blue); the fourth ring shows the positions of the PPE family members (green); the fifth ring shows the PE family members (purple, excluding PGRS); and the sixth ring shows the positions of the PGRS sequences (dark red). The histogram (centre) represents G + C content, with <65% G + C in yellow, and >65% G + C in red. The figure was generated with software from DNASTAR.

Cole, S. T. et al. (1998). Nature, 393(6685), 537–544. doi:10.1038/31159

M. tuberculosis Genome

•  The genome comprises:
  –  4,411,529 base pairs
  –  4,000 genes
  –  a very high GC content that is reflected in the biased amino-acid content of the proteins
•  M. tuberculosis differs radically from other bacteria in that:
  –  A very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis.
  –  It contains two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.


Biased Amino-acid Content of the Proteins

•  GTG initiation codons (35%) are used more frequently than in Bacillus subtilis (9%) and E. coli (14%), although ATG (61%) is the most common translational start.

•  Statistically significant preference for the amino acids Ala, Gly, Pro, Arg and Trp, which are all encoded by G + C-rich codons

•  A comparative reduction in the use of amino acids encoded by A + T-rich codons such as Asn, Ile, Lys, Phe and Tyr
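The link between genome GC content and amino-acid usage can be checked against the standard genetic code. The sketch below lists the standard codons for the ten amino acids named above and computes the mean G + C fraction per amino acid; the helper name `mean_gc` is illustrative.

```python
# Standard genetic code codons (DNA alphabet) for the amino acids named on the slide.
CODONS = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Trp": ["TGG"],
    "Asn": ["AAT", "AAC"],
    "Ile": ["ATT", "ATC", "ATA"],
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Tyr": ["TAT", "TAC"],
}

def mean_gc(codons: list[str]) -> float:
    """Mean fraction of G or C bases across a set of codons."""
    gc = sum(c.count("G") + c.count("C") for c in codons)
    return gc / (3 * len(codons))

for amino_acid, codons in CODONS.items():
    print(f"{amino_acid}: {mean_gc(codons):.2f}")
```

The preferred amino acids (Ala, Gly, Pro, Arg, Trp) all come out above 0.5 and the avoided ones (Asn, Ile, Lys, Phe, Tyr) all below 0.2, which is consistent with the bias described on the slide.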

Lipogenesis and Lipolysis

•  ~250 distinct enzymes involved in fatty acid metabolism in M. tuberculosis compared with only 50 in E. coli.
•  36 acyl-CoA synthases.
•  Two discrete types of fatty acid biosynthesis system, fatty acid synthase (FAS) I and FAS II.
•  Polyketides
•  Siderophores


Immunological aspects and pathogenicity

•  ~90 lipoproteins
•  Two copies of secA
•  About 10% of the coding capacity of the genome is devoted to two large unrelated families of acidic, glycine-rich proteins, the PE and PPE families, whose genes are clustered
  –  The principal source of antigenic variation in what is otherwise a genetically and antigenically homogeneous bacterium?
  –  These glycine-rich proteins might interfere with immune responses by inhibiting antigen processing?

Potential Antibiotic Resistance Mechanisms

•  Hydrolytic or drug-modifying enzymes such as beta-lactamases and aminoglycoside acetyl transferases.

•  Many potential drug-efflux systems, such as 14 members of the major facilitator family and numerous ABC transporters.


Cole, S. T., et al. (2001). Massive gene decay in the leprosy bacillus. Nature, 409(6823), 1007–1011. doi:10.1038/35059006

What is Leprosy?

•  Chronic human neurological disease
•  Results from infection with the obligate intracellular pathogen Mycobacterium leprae
•  M. leprae is a close relative of the tubercle bacillus.
•  M. leprae has the longest doubling time of all known bacteria (a doubling time of ~14 days)
•  M. leprae has thwarted every effort at culture in the laboratory.


Features of M. leprae Genome

•  The 3.27 Mb genome sequence from an armadillo-derived Indian isolate of the leprosy bacillus.
•  Substantially smaller than that of M. tuberculosis (4.41 Mb).
•  Reveals an extreme case of reductive evolution.
  –  Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound.
  –  Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences.
  –  Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.

Pseudogenes


1, small-molecule catabolism; 2, energy metabolism; 3, central intermediary metabolism; 4, amino-acid biosynthesis; 5, nucleoside and nucleotide biosynthesis and metabolism; 6, biosynthesis of cofactors, prosthetic groups and carriers; 7, lipid biosynthesis; 8, polyketide and non-ribosomal peptide synthesis; 9, proteins performing regulatory functions; and so on.


It is striking that elimination of pseudogenes by deletion lags far behind gene inactivation. But why?

Garnier, T., et al. (2003). Proceedings of the National Academy of Sciences of the United States of America, 100(13), 7877–7882. doi:10.1073/pnas.1130426100


The genome sequence of M. bovis is >99.95% identical to that of M. tuberculosis, but deletion of genetic information has led to a reduced genome size.

Brosch, R., et al. (2002). Proceedings of the National Academy of Sciences of the United States of America, 99(6), 3684–3689. doi:10.1073/pnas.052548299


M. tuberculosis derived from M. bovis

M. bovis

M. tuberculosis

Or was it?

Proposed origin

Cole, Institut Pasteur, Downloaded from www.pasteur.fr.


RD distribution in M. tuberculosis

Figure: regions of difference (RD) compared across M. tuberculosis, M. africanum, M. microti, M. bovis, BCG, and M. canettii. Regions labeled in the figure: RD1, RD2, RD3 (ΦRv1), RD4, RD5, RD5′, RD7, RD8, RD9, RD10, RD11 (ΦRv2), RD12, RD12′, RD13, and TbD1.


Cole, Institut Pasteur, Downloaded from www.pasteur.fr.

Figure: proposed descent from the common ancestor of the M. tuberculosis complex to M. canettii, M. tuberculosis ("ancestral" and "modern" strains), M. africanum, M. microti, M. bovis (seal, oryx, goat, and "classical" isolates), BCG Tokyo, and BCG Pasteur. Branches are marked by deletions (RDcan, RD9, TbD1, RD7, RD8, RD10, RDmic, RDseal, RD12, RD13, RD4, RD1, RD2, RD14) and by sequence polymorphisms (katG 463 CTG→CGG, gyrA 95 AGC→ACC, oxyR 285 G→A, mmpL6 551 AAC→AAG, pncA 57 CAC→GAC); M. canettii is marked with numerous sequence polymorphisms.

Evolutionary scenario

Cole, Institut Pasteur, Downloaded from www.pasteur.fr.

Evolution of the M. tb complex

M. bovis

M. tuberculosis

X


Cole, Institut Pasteur, Downloaded from www.pasteur.fr.

Progenitor bacillus

M. bovis M. tuberculosis

Evolution of the M. tb complex

•  More than 3 billion individuals have been immunized with bacillus Calmette–Guérin (BCG), an attenuated derivative of M. bovis.

•  BCG is part of the WHO’s Expanded Program on Immunization because of its proven efficacy at preventing extrapulmonary tuberculosis in children.

•  In adults, its efficacy against pulmonary disease is variable, possibly as a result of environmental, operational, demographic, and genetic factors.

•  Comparative genome and transcriptome analysis of Mycobacterium bovis BCG Pasteur 1173P2.


Major Findings

•  The 4,374,522-bp genome contains 3,954 protein-coding genes, 58 of which are present in two copies as a result of two independent tandem duplications, DU1 and DU2.
•  DU1 is restricted to BCG Pasteur.
•  DU2-I is confined to early BCG vaccines, like BCG Japan.
•  DU2-III and DU2-IV occur in the late vaccines.
•  The glycerol-3-phosphate dehydrogenase gene, glpD2, is one of only three genes common to all four DU2 variants, implying that BCG requires higher levels of this enzyme to grow on glycerol.
•  Further amplification of the DU2 region is ongoing, even within vaccine preparations used to immunize humans.
•  Furthermore, the combined findings suggest that early BCG vaccines may even be superior to the later ones that are more widely used.


The Beijing family

•  Appears to be more virulent, more transmissible, and associated with multidrug resistance (MDR)

TRENDS in Microbiology, Vol. 10, No. 1, January 2002, 45–52