Center for Biological Sequence Analysis
Department of Systems Biology
Introduction to Microbial Genomics
Sequences as information
Dave Ussery
Comparative Bacterial Genomics Workshop, Centers for Disease Control, Atlanta, Georgia, USA
Monday, 27 August, 2012
Comparative Bacterial Genomics Workshop, Centers for Disease Control, Atlanta, Georgia, USA, 27 August 2012. CBS, Department of Systems Biology
http://www.cbs.dtu.dk/staff/dave/CDC_2012.php
www.cbs.dtu.dk
[Figure: BASE ATLAS of Vibrio cholerae O1 biovar El Tor str. N16961, chromosome I (2,961,149 bp; axis 0 to 2.5 Mbp), Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/. Tracks: G content (0.18 to 0.30), A content (0.20 to 0.32), T content (0.21 to 0.32), C content (0.17 to 0.30), annotations (CDS +, CDS -, rRNA, tRNA), AT skew (-0.04 to 0.04), GC skew (-0.08 to 0.08), percent AT (0.46 to 0.59). Resolution: 1185.]
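The atlas tracks are window-based base-composition statistics. Below is a minimal Python sketch of three of them, assuming the usual definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C) computed over non-overlapping windows. The function name `window_stats`, the window size and the toy sequence are illustrative assumptions, not taken from the genomeAtlas tool, whose exact windowing may differ.

```python
from collections import Counter
from typing import Iterator, Tuple


def window_stats(seq: str, window: int) -> Iterator[Tuple[int, float, float, float]]:
    """Yield (start, fraction_AT, AT_skew, GC_skew) for non-overlapping windows.

    A trailing window shorter than `window` is skipped.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, window):
        counts = Counter(seq[start:start + window])
        a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
        acgt = a + t + g + c  # N and other ambiguity codes are ignored
        frac_at = (a + t) / acgt if acgt else 0.0
        at_skew = (a - t) / (a + t) if a + t else 0.0
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        yield start, frac_at, at_skew, gc_skew


if __name__ == "__main__":
    toy = "GGGCATATGCGCAAAT" * 4  # made-up 64 bp sequence
    for row in window_stats(toy, 16):
        print(row)
```

Plotting each value along the chromosome gives tracks like those in the atlas; in many bacterial genomes the GC skew changes sign near the replication origin and terminus.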
[Workflow labels: genomeStatistics, rnammer]
[Figure: pan- and core-genome plot; x-axis: genomes 1 to 46 in the order listed below; y-axis ticks 0 to 15,000; series: new genes, new gene families, core genome, pan genome.]
1 : Ecoli_042 2 : Ecoli_536 3 : Ecoli_55989 4 : Ecoli_ABU_83972 5 : Ecoli_APEC_O1 6 : Ecoli_ATCC_8739 7 : Ecoli_BL21_DE3_28965 8 : Ecoli_BL21_DE3_30681 9 : Ecoli_BW2952 10 : Ecoli_B_str_REL606 11 : Ecoli_DH1 12 : Ecoli_E24377A 13 : Ecoli_ED1a 14 : Ecoli_ETEC_H10407 15 : Ecoli_HS 16 : Ecoli_IAI1 17 : Ecoli_IAI39 18 : Ecoli_IHE3034 19 : Ecoli_KO11 20 : Ecoli_O103H2_str_12009 21 : Ecoli_O111H_str_11128 22 : Ecoli_O127H6_str_E2348_69 23 : Ecoli_O157H7_str_EDL933 24 : Ecoli_O157H7_str_TW14359 25 : Ecoli_O26H11_str_11368 26 : Ecoli_O55H7_str_CB9615 27 : Ecoli_O83H1_str_NRG_857C 28 : Ecoli_S88 29 : Ecoli_SE11 30 : Ecoli_SE15 31 : Ecoli_SMS35 32 : Ecoli_UM146 33 : Ecoli_UMN026 34 : Ecoli_UTI89 35 : Ecoli_W 36 : Ecoli_str_K12_substr_DH10B 37 : Ecoli_str_K12_substr_MG1655 38 : Ecoli_str_K12_substr_W3110 39 : Vatypica_ACS_049_V_Sch6 40 : Vatypica_ACS_134_V_Col7a 41 : Vdispar_ATCC_17748 42 : Vparvula_ATCC_17745 43 : Vparvula_DSM_2008 44 : Vsp_3_1_44 45 : Vsp_6_1_27 46 : Vsp_str_F0412
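A pan- and core-genome curve is built by adding genomes one at a time: the pan genome is the union of gene families seen so far, the core genome is their intersection, and the new gene families are those contributed by the latest genome. The sketch below assumes each genome has already been reduced to a set of gene-family identifiers (how families are clustered is outside the sketch); the function name `pan_core_curve` and the family IDs are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def pan_core_curve(genomes: Dict[str, Set[str]]) -> List[Tuple[str, int, int, int]]:
    """For genomes added in order, return (name, new_families, core_size, pan_size)."""
    pan: Set[str] = set()
    core: Optional[Set[str]] = None
    rows = []
    for name, families in genomes.items():  # dicts keep insertion order (Python 3.7+)
        new = families - pan                  # families not seen in any earlier genome
        pan |= families                       # pan genome: union of everything so far
        # core genome: families shared by every genome added so far
        core = set(families) if core is None else core & families
        rows.append((name, len(new), len(core), len(pan)))
    return rows


if __name__ == "__main__":
    demo = {  # hypothetical gene-family IDs for three genomes
        "genome_1": {"fam_a", "fam_b", "fam_c", "fam_d"},
        "genome_2": {"fam_a", "fam_b", "fam_c", "fam_e"},
        "genome_3": {"fam_a", "fam_b", "fam_f"},
    }
    for row in pan_core_curve(demo):
        print(row)
```

The curve depends on the order in which genomes are added, which is why the plot is tied to a numbered genome list. Counting new genes, as opposed to new gene families, would additionally need per-gene family assignments.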
[Figure: BLAST matrix comparing the proteomes of the 22 Vibrionaceae genomes listed below. Each cell gives a percent homology together with the underlying counts (e.g. 28.3 %, 1,980 / 6,989); diagonal cells show homology within a proteome, off-diagonal cells homology between two proteomes.]
Aliivibrio salmonicida LFI1238: 3,915 proteins, 3,378 families
Photobacterium profundum SS9: 5,480 proteins, 4,897 families
Vibrio fischeri ES114: 3,818 proteins, 3,691 families
Vibrio fischeri MJ11: 4,039 proteins, 3,894 families
Vibrio splendidus LGP32: 4,431 proteins, 4,277 families
Vibrio species MED222 1099517005441: 4,590 proteins, 4,463 families
Vibrio campbellii AND4 1103602000595: 3,935 proteins, 3,822 families
Vibrio species Ex25: 4,004 proteins, 3,886 families
Vibrio shilonii AK1 1103207002036: 5,360 proteins, 5,078 families
Vibrio vulnificus YJ016: 5,028 proteins, 4,773 families
Vibrio vulnificus CMCP6: 4,538 proteins, 4,337 families
Vibrio harveyi ATCC BAA-1116: 6,064 proteins, 5,116 families
Vibrio parahaemolyticus 16: 3,780 proteins, 3,683 families
Vibrio parahaemolyticus RIMD 2210633: 4,832 proteins, 4,662 families
Vibrio cholerae AM-19226: 3,407 proteins, 3,305 families
Vibrio cholerae 2740-80: 3,771 proteins, 3,567 families
Vibrio cholerae 1587: 3,758 proteins, 3,597 families
Vibrio cholerae MZO-2: 3,425 proteins, 3,311 families
Vibrio cholerae MO10: 3,421 proteins, 3,353 families
Vibrio cholerae O395: 3,875 proteins, 3,665 families
Vibrio cholerae V52: 3,815 proteins, 3,599 families
Vibrio cholerae O1 biovar eltor str. N16961: 3,828 proteins, 3,665 families
BLAST matrix legend: homology within proteomes 1.8 % to 5.0 %; homology between proteomes 25.5 % to 81.6 %.
[Workflow labels: BLAST matrix (makebmdest, blastmatrix); pancoreplot; grep; ls -1; gawk; copy and download GenBank and DNA files; saco_extract; saco_convert; Prodigal]
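Each BLAST-matrix cell pairs a percentage with the counts it was computed from, e.g. 1,980 / 6,989 ≈ 28.3 %. The sketch below reproduces that arithmetic and adds a homology filter under an assumption not stated on the slide: two proteins count as homologous when the alignment covers at least 50 % of the longer protein at 50 % identity or more. The `Hit` record, `is_homolog` and its thresholds are hypothetical, not the blastmatrix tool's interface.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one BLAST hit between two proteins."""
    identity: float   # percent identity over the alignment
    align_len: int    # alignment length in residues
    query_len: int
    subject_len: int


def is_homolog(hit: Hit, min_identity: float = 50.0, min_cover: float = 0.5) -> bool:
    """Assumed rule: >= 50 % identity over >= 50 % of the longer protein."""
    longer = max(hit.query_len, hit.subject_len)
    return hit.identity >= min_identity and hit.align_len >= min_cover * longer


def homology_percent(shared: int, total: int) -> float:
    """Percentage shown in a matrix cell, rounded to one decimal place."""
    return round(100.0 * shared / total, 1)


if __name__ == "__main__":
    print(homology_percent(1980, 6989))  # an off-diagonal cell on the slide
    print(homology_percent(3264, 4000))  # largest between-proteome value
    print(is_homolog(Hit(identity=62.0, align_len=300, query_len=350, subject_len=400)))
```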
[Book excerpt, Chapter 1, Sequences as Biological Information, p. 4]
[…] organisms, the number of species present in the environment, and, despite their small size, the biomass they represent on a worldwide scale. Even inside an animal, microbes are abundant: only one out of every 10 cells in a human body is actually human, whilst the other nine cells are prokaryotic.
From an evolutionary perspective, Bacteria and Archaea have been around for more than 3 billion years; plants and animals are relatively recent ‘newcomers’ on the scene, arriving less than half a billion years ago. Since Bacteria and Archaea can divide rather quickly and have had much more time to evolve, their diversity by far exceeds that of eukaryotes (the members of Eucarya). Our human perception is that plants and animals are completely unlike each other, and so are, say, insects and mammals, as they are strikingly different even at first sight. The diversity of […]
Fig. 1.1 A phylogenetic tree displaying the genetic distances between members of the three super-kingdoms of life: Bacteria, Archaea, and Eucarya. The bacterial genera shown will appear in examples throughout the book. Drawn on the same scale of genetic distance, the distances between bacterial genera are much larger than the distance between plants and animals.
[Tree labels. BACTERIA: Flavobacterium, Chlamydiae, Cyanobacteria, Proteobacteria, Actinobacteria, Chlorobi, Clostridium, Bacillus, Chloroflexi, Acidobacteria, Aquificae, Thermotoga, Thermus, Deinococcus, Firmicutes, Bacteroidetes, Spirochaetes, Planctomycetes. ARCHAEA: Crenarchaeota, Euryarchaeota. EUCARYA: Animals, Plants (macro-organisms); unicellular eukaryotes and protozoans: Giardia, Saccharomyces, Trypanosoma, slime mold, Babesia.]
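Genetic distances like those in Fig. 1.1 are typically estimated from aligned rRNA sequences, and the workflow's clustalw and njplot steps build a tree from such distances. A minimal sketch of the simplest measures, the uncorrected p-distance (fraction of differing aligned sites, gap columns skipped) and its Jukes-Cantor correction, is shown below; it is an illustration, not necessarily how the figure was made, and the sequences are made up.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ, skipping columns with a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; infinite (saturated) when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aln: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise Jukes-Cantor distances for every pair of aligned sequences."""
    return {(x, y): jukes_cantor(p_distance(aln[x], aln[y])) for x, y in combinations(aln, 2)}


if __name__ == "__main__":
    toy_alignment = {  # made-up aligned fragments, not real 16S rRNA
        "seq1": "ACGTACGTAC",
        "seq2": "ACGTTCGTAC",
        "seq3": "ACG-TCGAAC",
    }
    for pair, d in distance_matrix(toy_alignment).items():
        print(pair, round(d, 4))
```

A matrix like this is the input a neighbor-joining program needs to draw the tree.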
[Figure: workshop workflow, Monday to Thursday. Steps: raw DNA sequence; examine GenBank files; published annotated genes/proteins; gene finding, genes/proteins; number of genes/proteins; basic genome statistics; amino acid and codon usage; locate rRNA sequences; 16S rRNA phylogenetic tree; genome atlas; pan- and core-genome plot; subset-specific gene counts. Tools: basicgenomeanalysis, extractseqs, clustalw, njplot, genomeAtlas, genewiz, sed, chmod, mousepad.]
Information table for all genomes: add information to this table as you do the exercises.
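Two of the exercise steps, counting genes/proteins and tabulating codon usage, reduce to simple text processing. The sketch below assumes a plain GenBank flat file whose feature keys start in the standard column (five leading spaces). It is not the workshop's basicgenomeanalysis script, the sample feature table is made up, and real analyses would normally use a dedicated parser such as Biopython.

```python
import re
from collections import Counter

# Feature-table lines start with 5 spaces, then the feature key (e.g. "CDS").
CDS_LINE = re.compile(r"^ {5}CDS {2,}\S", re.MULTILINE)

# Made-up fragment of a GenBank feature table, for illustration only.
SAMPLE = """FEATURES             Location/Qualifiers
     source          1..600
     gene            1..300
     CDS             1..300
                     /product="hypothetical protein"
     CDS             complement(350..600)
"""


def count_cds_features(genbank_text: str) -> int:
    """Count CDS entries in the feature table of a GenBank flat file."""
    return len(CDS_LINE.findall(genbank_text))


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons from the first base; a trailing partial codon is ignored."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


if __name__ == "__main__":
    print(count_cds_features(SAMPLE))
    print(codon_usage("ATGAAAAAATAA").most_common())
```

Summing codon counts over all CDS sequences gives genome-wide codon usage; translating codons with a genetic-code table would give amino acid usage in the same way.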
SOURCE: NCBI; GRAPHICS BY N. SPENCER & W. FERNANDES
At the time of the announcement of the first drafts of the human genome in 2000, there were 8 billion base pairs of sequence in the three main databases for ‘finished’ sequence: GenBank, run by the US National Center for Biotechnology Information; the DNA Data Bank of Japan; and the European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database. The databases share their data regularly as part of the International Nucleotide Sequence Database Collaboration (INSDC). In the subsequent first post-genome decade, they have added another 270 billion bases to the collection of finished sequence, doubling the size of the database roughly every 18 months. But this number is dwarfed by the amount of raw sequence that has been created and stored by researchers around the world in the Trace archive and Sequence Read Archive (SRA). See Editorial, page 649, and human genome special at www.nature.com/humangenome
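The growth figures can be checked with a little arithmetic. The sketch below assumes exponential growth from 8 billion bases in 2000 to 8 + 270 = 278 billion bases ten years later (the article does not give exact endpoint dates), and uses the doubling time T = Δt · ln 2 / ln(N1/N0). The function names are illustrative.

```python
import math


def doubling_time(n0: float, n1: float, years: float) -> float:
    """Doubling time in years implied by exponential growth from n0 to n1."""
    return years * math.log(2) / math.log(n1 / n0)


def projected_size(n0: float, years: float, doubling_years: float) -> float:
    """Size after `years` of growth with a fixed doubling time."""
    return n0 * 2 ** (years / doubling_years)


if __name__ == "__main__":
    # Assumed endpoints: 8e9 bases in 2000, 8e9 + 270e9 bases ten years later.
    t = doubling_time(8e9, 278e9, 10.0)
    print(f"implied doubling time: {t:.2f} years ({12 * t:.1f} months)")
    print(f"18-month doubling over 10 years: {projected_size(8e9, 10.0, 1.5) / 1e9:.0f} billion bases")
```

With these assumed endpoints the implied doubling time is about 23 months, while an exact 18-month doubling over the same decade would have produced roughly 810 billion bases; the article's "roughly every 18 months" is best read as an approximation.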
1. Venter, J. C. et al. Science 291, 1304–1351 (2001).
2. International Human Genome Sequencing Consortium. Nature 409, 860–921 (2001).
3. International Human Genome Sequencing Consortium. Nature 431, 931–945 (2004).
4. Levy, S. et al. PLoS Biol. 5, e254 (2007).
5. Wheeler, D. A. et al. Nature 452, 872–876 (2008).
6. Ley, T. J. et al. Nature 456, 66–72 (2008).
7. Bentley, D. R. et al. Nature 456, 53–59 (2008).
8. Wang, J. et al. Nature 456, 60–65 (2008).
9. Ahn, S.-M. et al. Genome Res. 19, 1622–1629 (2009).
10. Kim, J.-I. et al. Nature 460, 1011–1015 (2009).
11. Pushkarev, D., Neff, N. F. & Quake, S. R. Nature Biotechnol. 27, 847–850 (2009).
12. Mardis, E. R. et al. N. Engl. J. Med. 361, 1058–1066 (2009).
13. Drmanac, R. et al. Science 327, 78–81 (2009).
14. McKernan, K. J. et al. Genome Res. 19, 1527–1541 (2009).
15. Pleasance, E. D. et al. Nature 463, 191–196 (2010).
16. Pleasance, E. D. et al. Nature 463, 184–190 (2010).
17. Clark, M. J. et al. PLoS Genet. 6, e1000832 (2010).
18. Rasmussen, M. et al. Nature 463, 757–762 (2010).
19. Schuster, S. C. et al. Nature 463, 943–947 (2010).
20. Lupski, J. R. et al. N. Engl. J. Med. doi:10.1056/NEJMoa0908094 (2010).
21. Roach, J. C. et al. Science doi:10.1126/science.1186802 (2010).
The graphic shows all published, fully sequenced human genomes since 2000, including nine from the first quarter of 2010. Some are resequencing efforts on the same person and the list does not include unpublished completed genomes.
HOW MANY HUMAN GENOMES?
THE SEQUENCE EXPLOSION
NATURE | Vol 464 | 1 April 2010 | pp. 670–671 | NEWS FEATURE: HUMAN GENOME AT TEN
© 2010 Macmillan Publishers Limited. All rights reserved
From The New York Times, 22 August, 2012
Genome Detectives Solve a Hospital’s Deadly Outbreak
By GINA KOLATA
The ambulance sped up to the red brick federal research hospital on June 13, 2011, and paramedics rushed a gravely ill 43-year-old woman straight to intensive care. She had a rare lung disease and was gasping for breath. And, just hours before, the hospital learned she had been infected with a deadly strain of bacteria resistant to nearly all antibiotics.
The hospital employed the most stringent and severe form of isolation, but soon the bacterium, Klebsiella pneumoniae, was spreading through the hospital. Seventeen patients got it, and six of them died. Had they been infected by the woman? And, if so, how did the bacteria escape strict controls in one of the nation’s most sophisticated hospitals, the Clinical Center of the National Institutes of Health in Bethesda, Md.?
What followed was a medical detective story that involved the rare use of rapid genetic sequencing to map the entire genome of a bacterium as it spread and to use that information to detect its origins and trace its route.
“We had never done this type of research in real time,” said Julie Segre, the researcher who led the effort.
The results, published online Wednesday in the journal Science Translational Medicine, revealed a totally unexpected chain of transmission and an organism that can lurk undetected for much longer than anyone had known. The method used may eventually revolutionize how hospitals deal with hospital-acquired infections, which contribute to more than 99,000 deaths a year.
....
NATURE BIOTECHNOLOGY VOLUME 27 NUMBER 7 JULY 2009 631
[…] primary factor limiting the understanding of our microbial planet is, in fact, the need for even larger quantities of data. In a remarkable achievement, the Sorcerer II Global Ocean Sampling Expedition [34] sequenced over six million microbial genes, almost doubling the size of GenBank at the time. However, viewed from the perspective of the actual extent of microbial diversity [35], such efforts are, and will remain, extremely small scale. The remarkable number of microbes (Table 1), already estimated to be several orders of magnitude greater than the number of stars in the universe, urgently calls for a transition from random, anecdotal and small-scale surveys toward a systematic and comprehensive exploration of our planet.
This cannot be achieved by the efforts of individual researchers but requires the establishment of effective national and international collaborations. For comparison, space and planetary exploration could never have been realized by a single researcher or even a small network. To achieve those goals, the National Aeronautics and Space Administration (NASA; Houston, TX, USA) was formed in the United States, with similar national efforts introduced in several other countries. The success of NASA can serve as a model here.
It is imperative to see the formation of national Microbial Environmental Genomics Administrations (MEGA) launched around the globe. Current ongoing international efforts include the International Census for Marine Microbes (ICoMM) (http://www.coml.org/descrip/icomm.htm) and the International Soil Metagenome Sequencing Project, or so-called ‘Terragenome’ (http://terragenome.org/). National initiatives include the Australian Genome Alliance (http://www.genomealliance.org.au/) and the MikroBioKosmos initiative in Greece (http://www.mikrobiokosmos.org/).
Clearly, efforts of this magnitude require substantial investment. To explore and seek to understand how the Earth breathes, grows, evolves, renews and sustains life, all essentially the work of the microbial world, is the great adventure now beckoning to us. Microbial genomics paves the way forward.
ACKNOWLEDGMENTS
I would like to thank C. Woese, P. Hugenholtz and C. Ouzounis for their critical reading and helpful suggestions, and M. Youle for her excellent editorial assistance. Special thanks to the members of the Genome Biology Program at the Joint Genome Institute for keeping me constantly in a most challenging and stimulating environment.
[…] comparative analysis of microorganisms but now redefined as dynamic communities that may be computationally represented as pangenomes. Looking back at the breakthroughs that have brought genomics to where it stands today, we find that in 1960–1990, the era of ribosomal RNA, we were building the tree of life and establishing the framework for the genomics revolution of 1990–2010, when we were growing the tree of life. The next decade (2010–2020) will be marked as the era of pangenomics, defined as finally understanding the tree of life.
New technologies, new ways forward
The greatest challenge to increasing our genomic coverage of microbial diversity lies in obtaining the DNA to sequence. More than 99% of the currently known microbial diversity resides in unculturable organisms. Of those that can be cultured, many are difficult to grow or grow only very slowly. Some present hindrances to DNA extraction. Growing the organisms for even a hundred sequencing projects consumes huge resources and requires much infrastructure. Most importantly, unlike DNA sequencing and data analysis, provisioning of DNA does not seem to be scaling up to expedite the process.
Community metagenomics cannot fill this gap, as discrete genomes cannot be assembled from the metagenomic data obtained from most environments. Therefore, our best hope for the future may lie in a new direction: single-cell genomics [31]. Already, current technology can provide ~70% coverage of a microbial genome by sequencing the DNA from an individual microbial cell [31]. It has been predicted that coverage will increase to ~95% within the next 3–5 years, owing to intense technology development. Even at the current coverage, this approach constitutes a major breakthrough that has opened a window into vast, previously inaccessible realms of unculturable microbial diversity.
Community metagenomics can be partnered with single-cell genomics, an approach that will likely become common for metagenomic projects. In parallel with sampling and sequencing the metagenome for an environment of medium complexity, single-cell techniques can be used to sequence several of the individual cell types present. Even at the current 70% coverage, this would provide representative reference genomes for that environment and lead to a more holistic understanding of the community and its individual members.
For those culturable organisms for which complete genome sequences can already be obtained, greater insights will emerge from bridging the gap between genotype and phenotype, as expected from integrating transcriptomics and proteomics with genomics. For the most part, genes in sequenced microbial genomes are computationally predicted based on the location of start and stop codons within the sequence. Thus, gene prediction is essentially protein prediction, and little is known about the transcribed but untranslated regions (UTRs) at either end. Coordinating a genome with its companion transcriptome and proteome can provide experimental confirmation of the accuracy of those predictions and can reveal genes missed by computational approaches [32]. Transcriptomes can extend known protein-coding sequences to include the UTRs, thus identifying the locations where transcription starts and stops. Overall, the advent of new sequencing technologies is opening entire new worlds of possibilities in microbial genomics, ranging from the identification of novel small regulatory RNAs [33] to elucidation of the mechanisms underlying the generation of genetic diversity. Indeed, as sequencing technology becomes cheaper, faster and more accurate, resequencing and, by extension, studies on the origins of mutations and population variability are finally within our reach.
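The point that gene prediction from start and stop codons is "essentially protein prediction" can be illustrated with a naive open-reading-frame scan. The sketch below assumes ATG as the only start codon, the standard stop codons TAA, TAG and TGA, the forward strand only, and an arbitrary minimum length; `find_orfs` is a hypothetical helper, and real gene finders such as Prodigal use far richer statistical models.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of forward-strand ORFs, 0-based, end exclusive.

    An ORF runs from an ATG to the first in-frame stop codon (stop included);
    `min_codons` counts codons including the stop. ORFs that run off the end
    of the sequence without a stop are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # made-up sequence with one short ORF
    for s, e in find_orfs(toy, min_codons=3):
        print(s, e, toy[s:e])
```

Because such a scan only sees the coding region between start and stop, it says nothing about UTRs, which is the gap transcriptome data can fill.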
National and international initiatives: a MEGA approach
Although one of the greatest challenges ahead lies in managing the current exponential growth in sequence data, it is ironic that the […]
Table 1 Estimating the magnitude of microbial diversity
Number of bacteriophages on Earth: 10^31
Number of microbes on Earth: 5 × 10^30
Number of stars in the universe: 7 × 10^21
Number of microbes in all humans: 6 × 10^23
Number of humans: 6 × 10^9
Number of microbial cells in one human gut: 10^14
Number of human cells in one human: 10^13
Number of microbial genes in one human gut: 3 × 10^6
Number of genes in the human genome: 2.5 × 10^4
Combined length of all bacteriophages on Earth: 10^8 Ly (light-years)
Diameter of the Milky Way: 10^5 Ly
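The statement that microbes outnumber stars by "several orders of magnitude" follows directly from Table 1. A short sketch of the comparison, using only the table's values (the dictionary keys and the helper name are illustrative):

```python
import math

TABLE_1 = {
    "microbes on Earth": 5e30,
    "stars in the universe": 7e21,
    "microbial cells in one human gut": 1e14,
    "human cells in one human": 1e13,
}


def orders_of_magnitude(a: float, b: float) -> float:
    """log10 of the ratio a / b."""
    return math.log10(a / b)


if __name__ == "__main__":
    microbes, stars = TABLE_1["microbes on Earth"], TABLE_1["stars in the universe"]
    print(round(orders_of_magnitude(microbes, stars), 2))
    print(TABLE_1["microbial cells in one human gut"] / TABLE_1["human cells in one human"])
```

The microbe-to-star ratio is about 7 × 10^8, i.e. close to nine orders of magnitude.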
PERSPECTIVE
"An inordinate fondness of bacteria..."
[Figure: map with repeated region labels "O-chain" and "Transport"]
A Brief History of Biological Information
Based on three excellent books:
“THE LOGIC OF LIFE - A History of Heredity”, by Francois Jacob (Vintage Books, A Division of Random House, New York, 1973, translated by Betty E. Spillman).
“WHO WROTE THE BOOK OF LIFE? - A History of The Genetic Code”, by Lily E. Kay (Stanford University Press, Stanford, California, 2000).
“THE INSIDE STORY - DNA to RNA to Protein”, edited by Jan Witkowski (Cold Spring Harbor Press, New York, 2005).
Aristotle, ~350 B.C.: plants, animals, minerals
www.sciencemag.org SCIENCE VOL 306 1 OCTOBER 2004
CREDIT: E. V. ARMBRUST ET AL.
ScienceScope
Experts Probe Flu Death, Call for Poultry Vaccination
A 26-year-old woman in Thailand who died of avian influenza earlier this month probably contracted the disease from her daughter, researchers said this week. But World Health Organization (WHO) scientists are cautiously optimistic that the development is not the start of a major outbreak. Meanwhile, several global health groups are calling for increased vaccination of Southeast Asia’s poultry flocks in a bid to corral the dangerous H5N1 virus.
Researchers say the woman, who lived in the Bangkok area, had returned to a rural village in northern Thailand to care for her sick daughter, who probably contracted the virus from local chickens. The daughter was cremated before researchers could collect tissue samples that could confirm her illness. But tissue samples from the mother proved positive for H5N1. The woman’s sister has also tested positive for the virus and is in a hospital isolation ward.
Evidence to date suggests a case of “nonsustained, dead-end transmission,” says WHO virologist Klaus Stöhr. Similar cases have been documented in the past. But until the WHO collaborating center in Atlanta, Georgia, analyzes the new samples, experts won’t know definitively whether the virus has mutated to a more dangerous form. So far, says Stöhr, Thai authorities have detected no increase in respiratory disease among villagers or health workers who cared for the patients.
To keep the virus in check, governments should be vaccinating and not just culling poultry flocks, the United Nations Food and Agriculture Organization and the World Organisation for Animal Health said in a 28 September statement. China and Indonesia already have vaccination programs. But Thailand and other nations do not, in part because poultry exporters fear importing countries will ban products from vaccinated birds, which don’t exhibit flu symptoms but can still carry the virus.
–DENNIS NORMILE
Boehlert Has Bypass
Representative Sherwood Boehlert (R–NY) is taking an unexpected break from his duties as chair of the House Science Committee. Boehlert this week underwent triple coronary bypass surgery at the National Naval Medical Center in Bethesda, Maryland, after doctors discovered several blocked arteries. He’s expected to be back to work within weeks.
–DAVID MALAKOFF
Diatoms are an enigma. Neither plant nor animal, they share biochemical features of both. Though simple single-celled algae, they are covered with elegant casings sculpted from silica.
Now a team of 45 biologists has taken a big step toward resolving the paradoxical nature of these odd microbes. They have sequenced the genome of Thalassiosira pseudonana, which lives in salt water and is a lab favorite among diatom experts. The work should prove useful to ecologists, geologists, and even biomedical researchers, says Edward Theriot, a diatom systematist at the University of Texas, Austin: “We’ve just jumped a generation ahead by having this kind of understanding of this genome.”
Diatoms date back 180 million years, and remnants of their silica shells make up porous rock called diatomite that is used in industrial filters. Today diatoms occupy vast swaths of ocean and fresh water, where they play a key role in the global carbon cycle. Diatom photosynthesis yields 19 billion tons of organic carbon, about 40% of the marine carbon produced each year; thus, by processing carbon dioxide into solid matter, they represent a key defense against global warming.
Many marine organisms feast on diatoms. When conditions are ripe, the algae can multiply at astonishing rates, creating ocean “blooms” that are sometimes toxic. These blooms can suffocate nearby marine life or make a toxin that harms people who eat infected shellfish. “This is a group of organisms that has amazing importance in global ecology,” says Deborah Robertson, an algal physiologist at Clark University in Worcester, Massachusetts.
Since 2002, Daniel Rokhsar, a genomicist at the DOE Joint Genome Institute in Walnut Creek, California, and his colleagues have been unraveling the genome of T. pseudonana. They were aided by a technique called optical mapping, in which stretched-out chromosomes are nicked by enzymes and viewed through a light microscope. Those nicked pieces of DNA stay in order and enable the sequencers to assemble almost all the bases in the correct place on the right chromosomes.
The draft genome consists of 34 million bases, Rokhsar, E. Virginia Armbrust, an oceanographer at the University of Washington, Seattle, and their colleagues report on page 79 of this issue. They ultimately found about 11,500 genes along the diatom’s chromosomes and along the DNA in its chloroplast and mitochondria.
Analyses of these genes and the proteins they encode confirm that diatoms have had a complex history. Like other early microbes, they apparently acquired new genes by engulfing microbial neighbors. Perhaps the most significant acquisition was an algal cell that provided the diatom with photosynthetic machinery.
Some biologists hypothesize that diatoms branched off from an ancestral nucleated microbe from which plants and animals later arose, a theory supported by the identification of T. pseudonana genes in some plant and animal genomes. As diatoms, plants, and animals evolved, each must have shed different genes from this common ancestor. As a result, diatoms were left with what looks like a mix of plant and animal DNA, plus other genes that are remnants of the engulfed algae.
The new data support this complex scenario, says Robertson. Some 182 T. pseudonana proteins are related only to red algae proteins; another 865 proteins are found just among plants. About half the proteins encoded by the rest of the diatom’s genes are equally similar to counterparts in plants, animals, and red algae.
The newly analyzed genome has also begun to shed light on how a diatom constructs its intricately patterned glass shell. So far, Rokhsar and his colleagues have uncovered a dozen proteins involved in the deposition of the silicon and expect to find more. Such progress could be a boon to materials scientists. “Being able to understand [silica processing] should have a payoff in nanofabrication,” says Robertson.
Currently, a mere 100 or so researchers call themselves diatom specialists. With the genome in hand, interest in diatoms is going to expand, Theriot predicts: “It will help put diatoms on everyone’s radar.”
–ELIZABETH PENNISI
DNA Reveals Diatom’s Complexity
GENETICS
Aqueous snowflake. The sequence of a diatom should reveal the secrets of its decorative shell.
Published by AAAS
www.sciencemag.org SCIENCE VOL 306 1 OCTOBER 2004
CREDIT: E. V. ARMBRUST ET AL.
ScienceScope
Experts Probe Flu Death, Call for Poultry Vaccination
A 26-year-old woman in Thailand who died of avian influenza earlier this month probably contracted the disease from her daughter, researchers said this week. But World Health Organization (WHO) scientists are cautiously optimistic that the development is not the start of a major outbreak. Meanwhile, several global health groups are calling for increased vaccination of Southeast Asia’s poultry flocks in a bid to corral the dangerous H5N1 virus.
Researchers say the woman, who lived in the Bangkok area, had returned to a rural village in northern Thailand to care for her sick daughter, who probably contracted the virus from local chickens. The daughter was cremated before researchers could collect tissue samples that could confirm her illness. But tissue samples from the mother proved positive for H5N1. The woman’s sister has also tested positive for the virus and is in a hospital isolation ward.
Evidence to date suggests a case of “nonsustained, dead-end transmission,” says WHO virologist Klaus Stöhr. Similar cases have been documented in the past. But until the WHO collaborating center in Atlanta, Georgia, analyzes the new samples, experts won’t know definitively whether the virus has mutated to a more dangerous form. So far, says Stöhr, Thai authorities have detected no increase in respiratory disease among villagers or health workers who cared for the patients.
To keep the virus in check, governments should be vaccinating and not just culling poultry flocks, the United Nations Food and Agriculture Organization and the World Organisation for Animal Health said in a 28 September statement. China and Indonesia already have vaccination programs. But Thailand and other nations do not, in part because poultry exporters fear importing countries will ban products from vaccinated birds, which don’t exhibit flu symptoms but can still carry the virus.
–DENNIS NORMILE
Boehlert Has Bypass
Representative Sherwood Boehlert (R–NY) is taking an unexpected break from his duties as chair of the House Science Committee. Boehlert this week underwent triple coronary bypass surgery at the National Naval Medical Center in Bethesda, Maryland, after doctors discovered several blocked arteries. He’s expected to be back to work within weeks.
–DAVID MALAKOFF
[Figure: universal tree of life with three domains. BACTERIA: Flavobacterium, Chlamydiae, Cyanobacteria, Proteobacteria, Actinobacteria, Chlorobi, Clostridium, Bacillus, Chloroflexi, Acidobacteria, Firmicutes, Bacteroidetes, Spirochaetes, Planctomycetes. ARCHAEA: Crenarchaeota, Euryarchaeota. EUCARYA: unicellular eukaryotes and protozoans (Giardia, Saccharomyces, Trypanosoma, slime mold, Babesia); macro-organisms (animals, plants).]
Gregor Mendel 1866
genes
Albrecht Kossel 1881
T.H. Morgan 1919
Chromosomes
G. Beadle & E. Tatum 1930s
one gene, one enzyme
O. Avery 1941
E. Schrödinger 1943
“We believe a gene - or perhaps the whole chromosome fibre - to be an aperiodic solid.”
“...For an illustration, think of Morse code...”
"What is Life?" by Erwin Schrödinger
(Cambridge University Press, 1944)
Watson & Crick 1953
DNA is like Coca-Cola!
Coke                    DNA
Water                   Water
Sugar (sucrose)         Sugar (deoxyribose)
Phosphoric acid (PO4)   Phosphate (PO4) backbone
Caffeine                Bases
[Figure: a stack of A=T base pairs illustrating the base-step parameters tilt, twist and roll, and propeller twist]
DNA bases will spontaneously stack on top of each other and form a helix!
[Figure: B-DNA helix geometry, showing the minor groove and major groove]
One helical turn (360°) = 10.5 bp per turn
Twist angle (rotation per residue): 34.3°
Axial rise: 3.4 Å
Base pair tilt: −6°
Helix pitch: 35.7 Å
Helix diameter: 20 Å
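The helix numbers above are linked by simple arithmetic: the twist per base pair is one full turn divided by the number of base pairs per turn, and the pitch is the number of base pairs per turn times the axial rise. A minimal Python sketch, taking 10.5 bp per turn and a 3.4 Å rise as inputs (the function names are illustrative and not from any library):

```python
# Sketch: idealized B-DNA helix geometry from two values on the slide.
BP_PER_TURN = 10.5   # base pairs per helical turn
RISE_ANGSTROM = 3.4  # axial rise per base pair, in ångströms


def twist_per_bp(bp_per_turn: float) -> float:
    """Rotation per base pair in degrees: one full turn (360°) / bp per turn."""
    return 360.0 / bp_per_turn


def helix_pitch(bp_per_turn: float, rise: float) -> float:
    """Axial length of one helical turn: bp per turn * rise per bp (units of rise)."""
    return bp_per_turn * rise


def contour_length_um(n_bp: int, rise: float = RISE_ANGSTROM) -> float:
    """Length of n_bp of straight B-DNA in micrometres (1 Å = 1e-4 µm)."""
    return n_bp * rise * 1e-4


if __name__ == "__main__":
    print(f"twist per bp: {twist_per_bp(BP_PER_TURN):.1f} degrees")
    print(f"helix pitch:  {helix_pitch(BP_PER_TURN, RISE_ANGSTROM):.1f} Å")
    print(f"1,000 bp spans about {contour_length_um(1000):.2f} µm")
```

Running it prints a twist of 34.3° and a pitch of 35.7 Å, matching the figure. The contour-length line assumes perfectly straight, idealized B-DNA.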
The “DNA code”
DNA      5’...GATCTAGCGATGCCGATGAAACATGATCATG...3’
         3’...CTAGATCGCTACGGCTACTTTGTACTAGTAC...5’
   ↓ transcription
mRNA     5’...GAUCUAGCGAUGCCGAUGAAACAUGAUCAUG...3’
   ↓ translation
Protein  N met-pro-met-lys-his-asp-his... C
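The two steps in the example can be checked mechanically. The sketch below assumes the standard genetic code (NCBI translation table 1, written as a 64-letter string in U, C, A, G codon order) and, as a simplification, starts translating at the first AUG; the helper names are illustrative.

```python
# Sketch: complement, transcription and translation of the slide's example.
BASES = "UCAG"
AMINO_ACIDS = (          # standard code, one-letter symbols, '*' = stop
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Base-by-base complement: the other strand, read 3'->5' under the first."""
    return dna.upper().translate(DNA_COMPLEMENT)


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding (top) strand's sequence, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate_from_first_aug(mrna: str) -> str:
    """Translate codon by codon from the first AUG until a stop or the end.
    Assumes the mRNA contains only A, C, G and U."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    dna = "GATCTAGCGATGCCGATGAAACATGATCATG"
    mrna = transcribe(dna)
    print(complement(dna))
    print(mrna)
    print(translate_from_first_aug(mrna))
```

The last line prints MPMKHDH, i.e. met-pro-met-lys-his-asp-his: the sixth codon, GAU, encodes aspartate. Real translation start sites depend on more than the first AUG (for example ribosome-binding sites in bacteria).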
The General Idea
1. The Sequence Hypothesis
The amino acid sequence of a protein is specified by the nucleotide sequence of DNA, via RNA.
2. The Central Dogma
Once information has flowed into protein, it cannot flow back to nucleic acid!
Genome -> Transcriptome -> Proteome
Biological Sequences as Information
DNA sequences as information
1. The DNA sequence can code for amino acid sequences (mRNAs)
2. The DNA sequence can code for stable RNA sequences
- rRNA, tRNA
- snRNA, telomerase RNA
3. The DNA sequence can code for protein binding sites
4. The DNA can code for architectural information
- nucleosome positioning
5. The DNA can code for structural / stability information
- transcription initiation
- origins of replication
- intrinsic DNA curvature
- mutational "hot spots"
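Point 3 can be illustrated with a very simple search. Because DNA is double-stranded, a binding site can sit on either strand, so a scan must also look for the motif's reverse complement. The sketch below assumes exact matching; the example motifs are the E. coli −10 promoter consensus TATAAT and the palindromic EcoRI site GAATTC, and the function names are illustrative. Real binding sites are usually degenerate and are better modelled with position weight matrices.

```python
# Sketch: exact-match search for a protein-binding motif on both DNA strands.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """The opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (start, strand) for each exact hit, with 0-based starts in
    plus-strand coordinates. A minus-strand hit is a place where the plus
    strand reads the motif's reverse complement, so palindromic motifs are
    reported on both strands."""
    seq, motif = seq.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_motif_both_strands("GGTATAATCCGATTATAGG", "TATAAT"))
    print(find_motif_both_strands("AAGAATTCAA", "GAATTC"))
```

The first call finds TATAAT at position 2 on the plus strand and at position 11 on the minus strand (where the plus strand reads ATTATA); the palindromic EcoRI site is reported on both strands at position 2.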
Biological Sequences as Information
RNA sequences as information
1. The mRNAs can contain several different levels of information:
- the amino acid sequence of proteins
- localisation signals for WHERE the protein will be made
- stability signals to determine HOW MUCH protein is made
- splice sites
- editing sites
2. The tRNAs implement the genetic code, which is nearly the same in all living organisms
(n.b. there are differences in mitochondria)
3. The rRNAs code for the structures of ribosomes
4. Other RNA/protein complexes
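The note about mitochondria can be made concrete by comparing two codon tables. This sketch assumes the standard code (NCBI translation table 1) and the vertebrate mitochondrial code (NCBI table 2), which reassigns four codons; other variant codes (for other mitochondria and some bacteria) differ in other ways. The helper names are illustrative.

```python
# Sketch: standard genetic code vs. the vertebrate mitochondrial code.
BASES = "UCAG"
STANDARD_AA = (          # NCBI table 1, codons in U, C, A, G order; '*' = stop
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, STANDARD_AA))

# Vertebrate mitochondrial code (NCBI table 2): same as standard except these.
VERT_MITO = {**STANDARD, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def translate(mrna: str, table: dict[str, str]) -> str:
    """Translate every complete codon, keeping '*' for stop codons."""
    return "".join(table[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


def differences(t1: dict[str, str], t2: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Codons whose meaning differs between two tables."""
    return {c: (t1[c], t2[c]) for c in CODONS if t1[c] != t2[c]}


if __name__ == "__main__":
    print(differences(STANDARD, VERT_MITO))
    print(translate("AUGUGAAUA", STANDARD), translate("AUGUGAAUA", VERT_MITO))
```

The same three codons, AUG UGA AUA, read as M*I (a premature stop) in the standard code but as MWM in vertebrate mitochondria.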
Biological Sequences as Information
Protein sequences as information
1. The protein sequence can code for an "active site" for enzymes
2. The protein sequence can code for structural roles:
- microtubules, myosin, collagen, etc.
3. The protein sequence can code for ion channels/pumps
4. The protein sequence can code for localisation information
5. The protein sequence can code for modification sites
[Figure: Malaria; categories shown: IMPs, 14-3-3 proteins, Other Enzymes, Ubiquitin system, Oxidoreductase, GTPase/Regulators, Kinases/Phosphatases]
Summary (so far!)
Sequences (DNA, RNA, Protein) -> Structure -> Function
Questions:
- What information is in the following sequence?
- How can you find out?
- Is the DNA sequence REALLY like a ‘language’?
>Mystery sequence - prize for the first person who tells me what this is!
ATGGGACTACCCTGGTACCGCGTACATACAGTAGTTCTGAACGATCCAGGACGGCTGATTTCTGTACACCTAATGCACACTGCTCTTGTCGCAGGTTGGGCGGGCTCTATGGCCCTGTACGAATTGGCAGTTTTTGACCCATCAGACCCAGTTCTCAATCCCATGTGGCGTCAAGGTATGTTTGTCATGCCTTTTATGGCTCGTTTGGGTGTAACTCAATCCTGGGGTGGCTGGAGTCTAACTGGTGAAGTAGCCGATAATCCCGGAATTTGGTCTTTTGAAGGGGTAGCCGCTACCCATATCATCTTGTCAGGTCTATTATTCCTGGCAGCAGTTTGGCACTGGGTTTACTGGGATCTGGAACTGTTTACCGATCCTCGGACTGGTGAACCAGCCCTAGACCTACCCAAAATGTTCGGAATTCATTTATTCCTATCTGGTTTGCTTTGTTTTGGCTTCGGAGCCTTCCACCTCACGGGACTATTCGGACCGGGAATGTGGGTTTCTGACCCCTATGGATTGACGGGAAGTATACAACCTGTCGCTCCTTCCTGGGGGCCTGAAGGATTTAACCCCTTCAATGCTGGCGGTATTGCGGCTCACCATATTGCGGCCGGAATTGTTGGCATTATTGCCGGACTATTCCACCCGTCCGTCAGACCACCTCAGCGCCTATACAAAGCCCTGCGTATGGGAAATATCGAAACTGTACTATCTAGTAGTATCGCGGCGGTATTCTTTGCGGCTTTTGTGGTAGCTGGAACTATGTGGTATGGTTCGGCTGCAACTCCGATTGAACTGTTTGGACCTACCCGCTATCAGTGGGATCAGGGATATTTCCAACAGGAAATTCAGCGCCGGGTACAAAGCAGTATTGCTCAGGGTGACAGCCCCTCAGAAGCATGGTCTAAGATTCCTGAAAAACTGGCATTTTATGACTATGTTGGTAACAGTCCCGCTAAAGGCGGTTTGTTCCGCGTCGGTCCGATGAACAAGGGCGATGGTATTGCTCAAGGTTGGCTCGGACACCCAGTATTCACTGATGCAGAAGGTCGCGAATTAACTGTTCGTCGTCTTCCTAACTTCTTTGAAACCTTCCCCGTCATTCTGACTGATGCTGATGGCGTAATTCGCGCTGACGTTCCTTTCCGTCGCGCGGAGTCTCGCTACAGCTTTGAGCAAACTGGGGTGACTGTTTCTTTATATGGTGGTGAACTCAATGGTAAAACCTTCACCGATCCCGCCTCTGTGAAGAAATATGCCCGCTTTGCTCAACAGGGTGAACCATTTGCCTTTGACCGGGAAACTCTCGGCTCTGATGGGGTATTTCGTACCAGTACCCGTGGCTGGTTTACTTTCGGTCACGCTTGCTTTGCTCTGCTTTTCTTCTTTGGTCATATTTGGCACGGTTCCCGCACCATCTTCCGAGATGTATTTGCTGGGGTGGAAGCTGACCTAGAAGAACAAGTTGAGTGGGGTAACTTCCAGAAAGTTGGAGACCAAACAACTCGTGTTCAAAAGACCGTCTAA
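One way to start on "How can you find out?" is to ask a few quick questions of the raw sequence: how long is it, what is its GC content, and does it look like a single open reading frame (starts with ATG, length a multiple of three, ends in a stop codon, no stops in between)? The sketch below checks only frame 1 of the given strand, and its helper names are hypothetical.

```python
# Sketch: first-pass checks on an unknown DNA sequence (frame 1, given strand).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def first_pass(raw: str) -> dict:
    """Summarise length, GC content and open-reading-frame clues."""
    seq = "".join(raw.split()).upper()       # drop spaces and line breaks
    usable = len(seq) - len(seq) % 3         # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return {
        "length_bp": len(seq),
        "gc_fraction": round(gc_fraction(seq), 3),
        "starts_with_ATG": seq.startswith("ATG"),
        "length_multiple_of_3": len(seq) % 3 == 0,
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
        "internal_stops": sum(c in STOP_CODONS for c in codons[:-1]),
    }


if __name__ == "__main__":
    # Paste the mystery sequence (without its '>' header line) in place of this toy example.
    print(first_pass("ATGAAAGGCTAA"))
```

If all the reading-frame checks come back positive, the sequence is a good candidate for a single protein-coding gene, and translating it and comparing the protein against a database (for example with BLAST) would be a natural next step.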