
JOURNAL OF BACTERIOLOGY, Aug. 1985, p. 806-811 Vol. 163, No. 2
0021-9193/85/080806-06$02.00/0
Copyright © 1985, American Society for Microbiology

Gene Density over the Chromosome of Escherichia coli: Frequency Distribution, Spatial Clustering, and Symmetry

JERZY JURKA AND MICHAEL A. SAVAGEAU*
Department of Microbiology and Immunology, The University of Michigan, Ann Arbor, Michigan 48109

Received 9 January 1985/Accepted 20 May 1985

Published studies of gene density (the number of genetic loci per unit of length on the linkage map) for Escherichia coli report a nonrandom frequency distribution and indicate notable symmetry in spatial clustering of gene density. We reexamined these results and found that gene density is a random variable with a frequency distribution that is lognormal. That is, the logarithm of gene density is a normally distributed random variable. Furthermore, comparison of the observed E. coli map and computer-generated random maps showed that symmetries in the spatial clustering of gene density are not exceptional; these features arise naturally among genes (or loci) whose density has this frequency distribution. These results are discussed along with other related examples that illustrate the emerging importance of statistical inference in molecular genetics.

The phenomenal rate at which molecular genetic data are being accumulated, particularly with regard to nucleic acid sequences, has been commented on repeatedly (e.g., see reference 4). Several million bases are now cataloged in sequence libraries, and the number of genes characterized by more classical means has increased at a rate that is hardly less dramatic. As these vast amounts of molecular data are being amassed, new questions are being raised about meaningful patterns of information. Some questions relate to local patterns believed to provide specificity for various types of targets: targets for recombination, translational initiation, transcription termination, etc. Other questions are concerned with global patterns in chromosome organization: the distribution of gene density, the locations of functionally related genes, etc.

Because of the large mass of data and the limited number of a priori constraints, it is relatively easy to find apparently meaningful patterns. However, recent experience has shown that not all these patterns are statistically significant. In this paper we examine three features of the genetic map of Escherichia coli: the frequency distribution of gene density, spatial clustering of gene density, and symmetry in gene density along the linkage map. Our results suggest that (i) gene density is a random variable with a skewed frequency distribution, (ii) clustering of gene density occurs naturally as a consequence of this skewed frequency distribution, and (iii) symmetries are also a by-product of the skewed distribution of gene density.

Frequency distributions. Before we address the principal questions, it will be helpful to review a few points about frequency distributions.

In Figure 1, three different frequency distributions of the random variable X are shown. The first is a uniform distribution, which means that X is equally likely to assume any value between a and b (Fig. 1A). The second is a normal distribution, which means that the central value of X is most probable; the probability that X will assume a different value decreases in a well-known fashion as X deviates further from the central value (Fig. 1B). The third, unlike the first two, is a skewed distribution, which implies that the lower values of X are more probable than the higher values (Fig. 1C). In this case, the distribution is the well-known lognormal distribution, so called because the logarithm of the random variable X has the normal distribution described above (Fig. 1B).

* Corresponding author.

Many other distributions of random variables could be examined; for a comprehensive listing see the multivolume collection by Johnson and Kotz (14). Those in Fig. 1 will suffice for our purposes here.

We emphasize three points about such distributions. First, these are all distributions of random variables. Second, such distributions are generated by differences in the underlying stochastic mechanisms, although we might not know specifically what the mechanisms are. Third, the implications of randomness depend on the distribution, and knowledge of the distribution must be taken into account if valid statistical inferences are to be drawn.
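The defining property of the lognormal distribution, and the skew toward low values, can be checked numerically. The following is a minimal sketch using only the Python standard library; the parameters mu = 0 and sigma = 0.5 are arbitrary illustrative choices, not values from this study.

```python
import math
import random
import statistics

def lognormal_sample(mu, sigma, n, seed=1):
    """Draw n lognormal values; ln(X) is Normal(mu, sigma) by construction."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

xs = lognormal_sample(mu=0.0, sigma=0.5, n=100_000)
logs = [math.log(x) for x in xs]

# The logarithms should have mean close to mu and standard deviation close to sigma.
print(round(statistics.mean(logs), 3), round(statistics.stdev(logs), 3))

# The raw values are right-skewed: the mean, exp(mu + sigma**2 / 2),
# exceeds the median, exp(mu), because low values are more probable.
print(round(statistics.mean(xs), 3), round(statistics.median(xs), 3))
```

The same check works in any logarithm base, since changing the base only rescales mu and sigma.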

Frequency distribution of gene density. Early analyses of gene density for E. coli found that density was a nonrandom variable (2). The apparent nonrandom character of these results has stimulated many investigators to seek an explanation in terms of deterministic structural or functional mechanisms. Unfortunately, the frequency distribution of gene density was not characterized adequately in these early analyses. A single type of frequency distribution was assumed, the actual distribution of gene density was compared with the assumed distribution, differences were found, and it was concluded that gene density is not a random variable. The actual frequency distribution was not compared with other types of frequency distributions of random variables (Fig. 1), and no direct characterization of the actual frequency distribution was reported.

We analyzed data from both the 1976 (2) and 1983 (1) linkage maps of E. coli. Similar results were obtained, so we shall henceforth refer only to the more complete 1983 map. The map was divided into 200 intervals (0.5 min each), and the number of loci in each interval (i.e., gene density) was determined. We excluded prophage attachment sites and markers with questionable map locations (those in parentheses on the published map), although our conclusions are not influenced if these are included (data not shown). The observed frequency distribution of the remaining 948 loci is summarized in Table 1. We compared this frequency distribution with a number of well-known frequency distributions and concluded on the basis of the following test that the observed frequency distribution is adequately described by the lognormal distribution discussed above (Fig. 1C). The expected lognormal distribution, modified to account for the truncated, discrete nature of the data, was obtained by the method of moments (34, 35). The calculated lognormal distribution (Table 1) with μ = 0.5989 and σ = 0.3360 gave a chi-squared value of 9.98 for 9 df, indicating reasonably good agreement with the observed distribution (P = 0.3).

TABLE 1. Observed frequency of gene loci on the E. coli map and expected frequency based on the lognormal distribution

No. of loci per interval    Observed frequency    Expected frequency(a)
0                           13                    7.46
1                           24                    30.07
2                           32                    34.17
3                           28                    29.05
4                           21                    22.66
5                           24                    17.21
6                           9                     13.00
7                           13                    9.84
8-9                         12                    13.27
10 and over                 24                    23.26

(a) μ = 0.5989; σ = 0.3360; χ² = 9.98; P = 0.3. Kolmogorov-Smirnov test: D = 5.54/200 = 0.0277; P > 0.5.

[Figure 1 not reproduced; x axis: random variable X.]
FIG. 1. Frequency distributions of the random variable X. (A) Uniform distribution, (B) normal distribution, and (C) lognormal distribution. See the text for discussion.

From these results we conclude that gene density is a random variable with a frequency distribution that is lognormal.

The earlier conclusion, that gene density is not a random variable, can now be seen to have resulted from failure to note the diversity of distributions that a random variable may exhibit. However, one should not draw another extreme conclusion, namely that genes are uniformly dispersed in a random fashion over the chromosome and that the lognormal results in this section are the result of mapping artifacts. This conclusion also would be invalid.
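The expected column of Table 1 can be recomputed from the reported parameters. The sketch below rests on two assumptions of ours that the text does not state: logarithms are base 10, and a count of n loci corresponds to the continuous lognormal variable falling in [n, n + 1). Under those assumptions it agrees with each tabulated expected frequency to within about 0.01 and reproduces the Kolmogorov-Smirnov statistic D = 5.54/200. It does not reproduce the method-of-moments fit itself or the chi-squared value.

```python
import math

MU, SIGMA, N_INTERVALS = 0.5989, 0.3360, 200

# (lowest count, highest count) for each row of Table 1; None marks "10 and over".
CLASSES = [(k, k) for k in range(8)] + [(8, 9), (10, None)]
OBSERVED = [13, 24, 32, 28, 21, 24, 9, 13, 12, 24]

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(x, mu, sigma):
    """P(X < x) when log10(X) ~ Normal(mu, sigma) (assumed base 10)."""
    if x <= 0:
        return 0.0
    return normal_cdf((math.log10(x) - mu) / sigma)

def expected_frequencies(mu, sigma, n_intervals, classes):
    """Assumed discretization: a count of n loci means X falls in [n, n + 1)."""
    freqs = []
    for low, high in classes:
        upper = 1.0 if high is None else lognormal_cdf(high + 1, mu, sigma)
        freqs.append(n_intervals * (upper - lognormal_cdf(low, mu, sigma)))
    return freqs

def ks_statistic(observed, expected, n_intervals):
    """Largest gap between cumulative observed and expected counts, as a fraction."""
    cum_obs = cum_exp = gap = 0.0
    for o, e in zip(observed, expected):
        cum_obs += o
        cum_exp += e
        gap = max(gap, abs(cum_obs - cum_exp))
    return gap / n_intervals

expected = expected_frequencies(MU, SIGMA, N_INTERVALS, CLASSES)
print([round(e, 2) for e in expected])
print(round(ks_statistic(OBSERVED, expected, N_INTERVALS), 4))
```

Swapping `lognormal_cdf` for the CDF of another candidate distribution gives the kind of side-by-side comparison that, according to the text, the earlier analyses omitted.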

Potential sources of artifacts are relatively obvious. (i) Not all the genes have been mapped. (ii) Biases are introduced by the technical requirements of genetic mapping. (iii) There is an arbitrary element in the assignment of ambiguous loci. These factors might well contribute to the observed distribution of gene density, particularly with small sample sizes, but their influence would be expected to disappear as an increasing number of genes are accurately mapped. In fact, the apparent lognormal character of the gene density for the 1976 map (2) and for the 1983 map (1) has remained the same (data not shown) in spite of the addition of many new loci. Furthermore, the distribution of interloci distances (data not shown), and hence that of gene density, is skewed in a fundamental sense because the proteins, and consequently their coding sequences, have a highly skewed size distribution (Fig. 2). The techniques and biases involved in these physical studies are completely different from those involved in classical genetic mapping. Finally, if there were any systematic error caused by the inclusion of markers with questionable locations (e.g., those with an asterisk on the linkage map), it would be in the opposite direction, toward a uniform rather than a skewed distribution, because of the tendency to place a locus whose location between two markers is uncertain toward the middle rather than adjacent to one of the extremes. Thus, although there will undoubtedly be minor changes in the distribution over time, it is highly improbable that the lognormal character of the frequency distribution of gene density will ultimately disappear as more genes are accurately mapped. Gene density does indeed appear to be a random variable, but its frequency distribution is skewed.

[Figure 2 not reproduced; x axis: molecular weight (×10³).]
FIG. 2. Frequency distribution of molecular weights for polypeptide chains of E. coli. Calculated from the data of Neidhardt et al. (20).

The existence of a statistical distribution for gene density
does not mean the absence of structural or functional constraints on the relative location of specific genes. There are several examples of such constraints. The tendency in E. coli for structural genes to be organized in operons (18) is well known. The tendency to cluster structural genes of certain regulatory enzymes in operons, which enhances the coordination of their regulatory functions, has been documented (23, 24). The clustering of genes for the machinery of macromolecular synthesis near the origin of replication can be seen from an examination of the map (1), and the implications of this for increasing gene dosage have often been noted. The correlation between the location of the genes for amino acid biosynthetic enzymes and the rate of synthesis of the corresponding amino acids has been shown previously (30). Such constraints are perhaps first-order rules of chromosome organization on a macroscopic scale. There are undoubtedly many more such rules to be discovered. However, none of these now provides an explanation for the observed frequency distribution of gene density. Although the biological mechanism(s) responsible for this distribution is unknown, the possibilities are significantly restricted. Any proposed mechanism must be able to generate a lognormal distribution, or a skewed distribution that is indistinguishable from a lognormal distribution on the basis of the experimental data.

Gene clustering and symmetry. Bachmann et al. (2) noted that a plot of gene density as a function of map location reveals major peaks of high gene density, or gene clustering, separated by major troughs of low gene density that are statistically significant, and that there is notable symmetry among the peaks and troughs of gene density. These conclusions, which have been referred to by others (6, 21), are invalid because the statistical analysis was performed with their assumed frequency distribution of gene density, which, as we have seen, is inappropriate.

To explore further the question of symmetry and gene
clustering, we examined the distribution of gene density as a function of map location for both the observed E. coli map and computer-generated random maps that have the appropriate lognormal frequency distribution of gene density. Although the biological mechanisms that have generated the lognormal frequency distribution of gene density for E. coli are not known, one can readily simulate such maps by using abstract models such as random breakage (7, 12, 13, 15), which are known to generate lognormal frequency distributions. For our purposes, it is sufficient that random maps generated on the basis of our model show the appropriate lognormal frequency distribution of gene density (Fig. 3).

[Figure 3 not reproduced; x axis of main plot and inset: no. of loci per 0.5 min.]
FIG. 3. Lognormal distribution of gene density determined for simulated maps. Random maps were generated by computer with a random-breakage model that produced lognormal distributions. The continuous lines represent fitted lognormal distributions of gene density. Two different cases are represented. Values in the larger figure correspond to fitted values of μ = 0.173 and σ = 0.686 (χ² = 7.49 with n = 7 and P = 0.3); values in the inset correspond to fitted values of μ = 0.600 and σ = 0.334 (χ² = 9.72 with n = 11 and P = 0.5).

The following terms will be used to characterize the observed and simulated maps of E. coli. The degree of symmetry about any point on a circular map is defined as the correlation coefficient between the gene densities down the right- and left-hand sides of the map from that point. A perfect correlation indicates mirror-image symmetry; no correlation indicates no symmetry. The nonnormal frequency distribution of gene density (Table 1 and Fig. 3) suggests the use of rank methods to determine these correlations. We adopted the common method proposed by Spearman and modified by Lehmann (16). The axis of symmetry is defined by the angle of rotation for the circular map that produces the maximum degree of symmetry. The degree of clustering is defined by the difference between the maximum and minimum degrees of symmetry obtained for a given circular map. By this measure, the degree of clustering is zero when the genes are uniformly distributed over the chromosome.

TABLE 2. Symmetry and clustering of gene density over observed and simulated maps of E. coli

Map             Degree of symmetry(a)       Degree of clustering(b)
                Maximum      Minimum        (maximum - minimum)
Observed        0.329        -0.234         0.563
Simulated(c)
  1             0.520        -0.208         0.728
  2             0.471        -0.408         0.879
  3             0.264        -0.347         0.611
  4             0.354        -0.524         0.878
  5             0.507        -0.344         0.851

(a) The axis of symmetry is identified by determining the angle of rotation that produces the maximum correlation between the gene densities down the right- and left-hand sides of a given circular map. The degree of symmetry is defined by the correlation coefficient. The greater the maximum correlation coefficient, the greater the degree of symmetry.
(b) The degree of clustering of gene density is the difference between the maximum and minimum degrees of symmetry obtained for a given circular map. The greater the difference, the greater the degree of clustering.
(c) Randomly generated maps with the appropriate lognormal distribution of gene density (μ = 0.6; σ = 0.334) to simulate the observed map of E. coli.

Measurements of symmetry and clustering of gene density
are given in Table 2 for the observed map of E. coli. In the systematic search for symmetries in the observed E. coli map, we found that the greatest degree of symmetry is about the 22-min point, not the previously reported 87.5-min point. Similar measurements are given in Table 2 for five randomly generated maps that simulate the observed E. coli map. The simulated maps clearly exhibit symmetries that are as striking as those described for the map of E. coli (2, 6, 21). Furthermore, the simulated maps contain peaks of high gene density (gene clustering) separated by regions of relatively low gene density, which is similar to what has been observed in the genetic map of E. coli (2). We found that these features also arise naturally with other skewed distributions (data not shown).

From these results, we conclude that there is no compelling reason to give a primary biological interpretation to the observed symmetries and clustering of gene density. Instead, it is clear that such symmetries and clustering may be the by-product of the biological process generating the observed lognormal frequency distribution of gene density.

The earlier conclusion, that there is notable symmetry in the spatial clustering of gene density (2), can now be seen to have resulted from the use of an inappropriate frequency distribution for gene density in subsequent statistical analysis. However, one should not assume that such symmetries do not exist. Symmetries in the spatial clustering of genes do indeed exist, and they are unlikely to disappear as an increasing number of genes are accurately mapped. Such symmetries arise naturally when the underlying frequency distribution of gene density is lognormal, but they are not statistically significant. This conclusion is not restricted to the lognormal distribution. We have considered a number of other distributions, but none proved a better fit to the experimental data. Furthermore, our conclusions with regard to symmetries and gene clustering have been substantiated with other skewed distributions. Thus, our conclusions appear to be valid and not unduly dependent on the choice of a particular skewed distribution.

Our results suggest that, contrary to previous reports, gene density on the linkage map of E. coli is a random variable. Its frequency distribution, however, is not uniform or normal; it is highly skewed and appears to be lognormal. This fact has important consequences for the outcome of subsequent statistical analysis. The clustering of loci into high-density regions separated by low-density regions and the symmetry of gene density around the linkage map, which have been noted by others, occur readily in randomly generated maps. Thus, these are likely to be by-products of the process generating the skewed frequency distribution of gene density and not features of the E. coli chromosome that are significantly different from random expectation.

We shall briefly note just two other examples from recent literature that, together with the results reported above, serve to highlight the emerging importance of statistical inference in molecular genetics: the search for patterns that define recombination sites in DNA sequences of eucaryotes and translational initiation sites in RNA sequences of procaryotes.

DNA sequencing of known recombinational junctions and corresponding regions from parental strands did not reveal any obvious consensus sequence of homology (10), as was found for site-specific recombination in procaryotes (19). However, small patches of homology that have no specific sequence, separated by nonhomologous stretches of variable length, were reported between the parental strands near recombinational junctions, and it was suggested that such a mosaic pattern of nonspecific homologies might provide the target for recombination (10).
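"Patchy homology" is easy to score but hard to interpret without a null model. The sketch below uses a hypothetical scoring rule, not the one used in reference 10 or 25: it counts runs of at least `min_patch` identical bases between two sequences at a given offset, keeps the best offset within `max_shift`, and compares that score with scores for base-composition-preserving shuffles of one sequence, similar in spirit to the scrambled-sequence comparison attributed to reference 25. The sequences and all parameter values are illustrative assumptions.

```python
import random

def patch_count(a, b, shift, min_patch=3):
    """Count runs of >= min_patch consecutive matches between a[i] and b[i + shift]."""
    patches = run = 0
    for i in range(len(a)):
        j = i + shift
        if 0 <= j < len(b) and a[i] == b[j]:
            run += 1
            continue
        if run >= min_patch:
            patches += 1
        run = 0
    if run >= min_patch:  # a run that reaches the end of a
        patches += 1
    return patches

def best_patch_count(a, b, max_shift=10, min_patch=3):
    """Best score over all offsets; more offsets mean more chances to look homologous."""
    return max(patch_count(a, b, s, min_patch) for s in range(-max_shift, max_shift + 1))

def shuffle_p_value(a, b, trials=200, seed=0, **kwargs):
    """Fraction of shuffles of b scoring at least as high as b itself."""
    rng = random.Random(seed)
    observed = best_patch_count(a, b, **kwargs)
    hits = 0
    for _ in range(trials):
        letters = list(b)
        rng.shuffle(letters)  # preserves base composition, destroys order
        if best_patch_count(a, "".join(letters), **kwargs) >= observed:
            hits += 1
    return observed, hits / trials

# Two unrelated random sequences standing in for "parental strands".
rng = random.Random(42)
a = "".join(rng.choice("ACGT") for _ in range(200))
b = "".join(rng.choice("ACGT") for _ in range(200))
print(shuffle_p_value(a, b))
```

Unrelated random sequences already contain many short patches at their best offset, which is why a raw patch count says little until it is compared with such a null distribution.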

Subsequent analysis has indicated that such patchy homology is not statistically significant (25). The extent of patchy homology between parental strands at recombinational junctions is no different from that observed (i) between randomly chosen locations within the same genome, (ii) between randomly chosen locations within the same genome that have had their sequences scrambled, and (iii) between computer-generated random sequences with the same frequency distribution of bases as the experimental sequences. As in the case of spatial clustering of gene density, the appropriate frequency distribution must be used if the statistical analysis is to be valid. An examination of other sequence patterns that might define sites of recombination showed that these patterns near known recombinational junctions also were indistinguishable from those near randomly generated sites by the same statistical criteria (25). Bullock et al. (5) came to the same conclusions.

The search for patterns in nucleic acid sequences that define translational initiation sites in procaryotes has led to a different conclusion. Nucleotide sequences around several sites of translation initiation were known, but no pattern was apparent until Shine and Dalgarno (28) reported that the sequence at the 3' end of 16S rRNA contained a block of bases complementary to the region upstream from the initiation codon in each case. There is considerable experimental evidence to support the inclusion of these two elements (Shine and Dalgarno sequence and initiation codon) in the pattern that defines translational initiation sites (8, 9, 31). However, much more is involved; translational initiation sites that lack these elements are known (3, 17, 22, 32, 36), and sites where translational initiation does not occur possess these elements (32).
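Sites that carry both elements without initiating translation are expected on simple counting grounds. The sketch below counts, in a uniformly random DNA sequence, occurrences of a Shine-Dalgarno-like core followed by ATG (AUG in the mRNA) after a spacer of 5 to 10 bases. The core GGAGG, the spacer range, equal base frequencies, and the sequence length are all illustrative assumptions. Under them, the expected number of core/ATG pairs is about 6N/4^8, roughly 92 per million bases.

```python
import random

def random_sequence(n, seed=0):
    """Uniform, independent bases: a deliberately simple null model."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=n))

def count_core_atg_pairs(seq, core="GGAGG", min_gap=5, max_gap=10):
    """Count (core, ATG) pairs separated by min_gap..max_gap bases."""
    count = 0
    i = seq.find(core)
    while i != -1:
        end = i + len(core)
        for gap in range(min_gap, max_gap + 1):
            if seq.startswith("ATG", end + gap):
                count += 1
        i = seq.find(core, i + 1)  # i + 1 so overlapping cores are also found
    return count

n = 1_000_000
seq = random_sequence(n)
observed = count_core_atg_pairs(seq)
expected = n * 6 / 4 ** 8  # 6 allowed gaps; 5 core bases + 3 ATG bases fixed
print(observed, round(expected, 1))
```

Even this crude null model yields dozens of candidate "sites" per million bases, so the mere presence of the two elements cannot by itself define an initiation site.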

Statistical analysis has shown that the nucleotides in the initiation regions occur nonrandomly (26). More importantly, the nucleotides around the Shine and Dalgarno sequence and the AUG codon of initiation sites are nonrandom, whereas those around these same elements in sites that are not involved in translational initiation appear to be random (32). Thus, it appears that the sequence information about translational initiation sites may be sufficient to define these sites, but the definitive pattern remains to be elucidated. Stormo et al. (33) used methods from the field of pattern recognition and revealed a mosaic pattern in which bases that are highly weighted are interspersed with bases that are relatively unweighted in statistical significance in defining translational initiation sites. This algorithm is the most successful to date in discriminating initiation from noninitiation sites. RNA secondary structure, which is known to affect translational initiation (11, 27, 29), has yet to be fully analyzed.
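The text describes Stormo et al. (33) only as using pattern-recognition methods that weight individual bases. One such method is the perceptron, which we assume here purely for illustration; the sketch does not claim to reproduce their procedure. It learns one weight per base per position from labeled windows. The training windows are hypothetical toy data (a fixed GGAGG core and ATG with random bases between, versus random windows), so the learned weights illustrate a weighted mosaic and say nothing about real initiation sites.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Four 0/1 features per position, one for each base."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def score(weights, bias, seq):
    return bias + sum(w * x for w, x in zip(weights, one_hot(seq)))

def train_perceptron(examples, max_epochs=5000):
    """Classic perceptron updates; stops once every example is classified correctly."""
    weights = [0.0] * (4 * len(examples[0][0]))
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for seq, label in examples:  # label is +1 (site) or -1 (non-site)
            if label * score(weights, bias, seq) <= 0:
                weights = [w + label * x for w, x in zip(weights, one_hot(seq))]
                bias += label
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias

# Hypothetical toy data: 15-base windows.
rng = random.Random(7)

def random_bases(k):
    return "".join(rng.choice(BASES) for _ in range(k))

sites = ["GGAGG" + random_bases(7) + "ATG" for _ in range(40)]
non_sites = []
while len(non_sites) < 40:
    s = random_bases(15)
    if not (s.startswith("GGAGG") and s.endswith("ATG")):  # keeps the classes separable
        non_sites.append(s)

examples = [(s, 1) for s in sites] + [(s, -1) for s in non_sites]
weights, bias = train_perceptron(examples)

# Largest absolute weight at each position: high values mark "highly weighted" bases.
per_position = [max(abs(w) for w in weights[4 * p: 4 * p + 4]) for p in range(15)]
print([round(v, 1) for v in per_position])
```

Real data are rarely perfectly separable, so the `max_epochs` cap matters there, and the meaningful test is performance on windows held out from training.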

In the three cases we have discussed, development was similar. First, a relatively obvious pattern was observed (symmetry among the relatively few peaks of high gene density, patchy homology around recombinational junctions, short consensus sequence near translational initiation codons). Second, subsequent analysis showed that the simple patterns were not statistically significant (randomly generated maps naturally exhibit symmetries, randomly generated parental strands exhibit abundant patchy homology, randomly generated sequences possess numerous Shine and Dalgarno sequences followed after suitable spacing by AUG codons). Only in the case of translational initiation has significant progress been made in establishing a well-defined, statistically significant pattern (33).

The reasons for the original misinterpretation in each case were the large number of possibilities and the correspondingly small number of constraints to limit the arbitrary degrees of choice in forming the putative pattern. An axis of genome symmetry can nearly always be found, because a chromosome can be rotated through many positions so as to symmetrically match the small number of peaks of high gene density. Substantial amounts of patchy homology can nearly always be found, because parental sequences can be shifted through many positions to match short random homologies separated by nonhomologous regions of arbitrary length, the only constraint being the location of the junction for recombination. Short canonical sequences can nearly always be found because of the large number of bases in the genome and the small number of bases in fixed relative positions to be matched.
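The symmetry and clustering measures defined for Table 2, and the point that a circular map can be rotated until some axis looks symmetric, can be sketched as follows. Assumptions: 200 bins of 0.5 min; Spearman ties handled with average ranks, which may differ in detail from the Lehmann (16) variant used by the authors; and, as a hypothetical shortcut in place of the random-breakage model, each bin's count drawn as floor(X) with log10 X normal (mu = 0.5989, sigma = 0.3360), a discretization that appears consistent with Table 1.

```python
import math
import random

def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:  # a side with all-equal densities carries no symmetry signal
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def symmetry_profile(density):
    """Degree of symmetry about each bin boundary p of a circular map."""
    n = len(density)
    half = n // 2
    profile = []
    for p in range(n):
        right = [density[(p + k) % n] for k in range(half)]     # clockwise from p
        left = [density[(p - 1 - k) % n] for k in range(half)]  # counterclockwise from p
        profile.append(spearman(right, left))
    return profile

def random_map(mu=0.5989, sigma=0.3360, n_bins=200, seed=0):
    """Hypothetical shortcut: floor(X) loci per bin with log10(X) ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return [int(10 ** rng.gauss(mu, sigma)) for _ in range(n_bins)]

profile = symmetry_profile(random_map())
axis = max(range(len(profile)), key=profile.__getitem__)
print("axis of symmetry at", axis * 0.5, "min; degree of symmetry", round(profile[axis], 3))
print("degree of clustering", round(max(profile) - min(profile), 3))
```

Because every rotation is tried, the maximum correlation is the best of many chances, so some axis of apparent symmetry turns up even on a random map. With all bins equal, the guard in `spearman` returns 0 everywhere and the degree of clustering is zero, matching the definition in the text.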

This work was supported in part by Public Health Service grant GM 30054 to M.A.S. from the National Institutes of Health.

We thank D. H. Irvine and M. Okamoto for their assistance in implementing the computer models.

LITERATURE CITED1. Bachmann, B. J. 1983. Linkage map of Escherichia coli K-12,

edition 7. Microbiol. Rev. 47:180-230.2. Bachmann, B. J., K. B. Low, and A. L. Taylor. 1976. Recali-

brated 'linkage map of Escherichia coli K-12. Bacteriol Rev.40:116-167.

3. Belin, D., J. Hedgpeth, G. B. Selzer, and R. H. Epstein. 1979.Temperature-sensitive mutation in the initiation codon of therIlB gene of bacteriophage T4. Proc. Natl. Acad. Sci. U.S.A.76:700-704.

4. Blattner, F. R. 1983. Biological frontiers. Science 222:719-720.5. Bullock, P., W. Forrester, and M. Botchan. 1984. DNA se-

quence studies of simian virus 40 chromosomal excision andintegration in rat cells. J. Mol. Biol. 174:55-84.

6. DeMartelaere, D. A., and A. P. Van Gool. 1981. The density

distribution of gene loci over the genetic map of Escherichiacoli: its structural, functional and evolutionary implications. J.Mol. Evol. 17:354-360.

7. Epstein, B. 1947. The mathematical description of certain break-age mechanisms leading to the logarithmico-normal distribution.J. Franklin Inst. 244:471-477.

8. Gold, L., D. Pribnow, T. Schneider, S. Shinedling, B. S. Singer,and G. Stormo. 1981. Translational initiation in prokaryotes.Annu. Rev. Microbiol. 35:365-403.

9. Grunberg-Manago, M. 1980. Initiation of protein synthesis asseen in 1979, p. 445-477. In G. Chambliss, G. R. Craven, J.Davies, K. Davis, L. Kahan, and M. Nomura (ed.), Ribosomes.University Park Press, Baltimore.

10. Gutai, M. W., and D. Nathans. 1978. Evolutionary variants ofsimian virus 40: cellular DNA sequences and sequences atrecombinant joints of substituted variants. J. Mol. Biol.126:275-288.

11. Hall, M. N., J. Gabay, M. Debarbouille, and M. Schwartz. 1982. A role for mRNA secondary structure in the control of translation initiation. Nature (London) 295:616-618.

12. Halmos, P. R. 1944. Random alms. Ann. Math. Stat. 15:182-189.

13. Herdan, G. 1960. Small particle statistics. Academic Press, Inc., New York.

14. Johnson, N. L., and S. Kotz. 1970. Distributions in statistics: continuous univariate distributions, vol. 1 and 2. Houghton Mifflin Co., Boston.

15. Kolmogoroff, A. N. 1941. Über das logarithmisch normale Verteilungsgesetz der Dimensionen der Teilchen bei Zerstückelung. C. R. Dokl. Acad. Sci. URSS 31:99-101.

16. Lehmann, E. L. 1975. Nonparametrics, p. 297-303. Holden-Day, Inc., San Francisco.

17. Mackie, G. M. 1981. Nucleotide sequence of the gene for ribosomal protein S20 and its flanking regions. J. Biol. Chem. 256:8177-8182.

18. Miller, J. H., and W. S. Reznikoff (ed.). 1978. The operon. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

19. Nash, H. A. 1981. Integration and excision of bacteriophage λ: the mechanism of conservative site-specific recombination. Annu. Rev. Genet. 15:143-167.

20. Neidhardt, F. C., V. Vaughn, T. A. Phillips, and P. L. Bloch. 1983. Gene-protein index of Escherichia coli K-12. Microbiol. Rev. 47:231-284.

21. Pettijohn, D. E., and J. O. Carlson. 1979. Chemical, physical, and genetic structure of prokaryotic chromosomes, p. 2-57. In D. M. Prescott and L. Goldstein (ed.), Cell biology, vol. 2. Academic Press, Inc., New York.

22. Ptashne, M., K. Beckman, M. Z. Humayun, A. Jeffrey, R. Maurer, B. Meyer, and R. T. Sauer. 1976. Autoregulation and function of a repressor in bacteriophage lambda. Science 194:156-161.

23. Savageau, M. A. 1972. The behavior of intact biochemical control systems. Curr. Top. Cell. Regul. 6:63-130.

24. Savageau, M. A. 1976. Biochemical systems analysis: a study of function and design in molecular biology, p. 213. Addison-Wesley Publishing Co., Inc., Reading, Mass.

25. Savageau, M. A., R. Meter, and W. W. Brockman. 1983. Statistical significance of partial base-pairing potential: implications for recombination of SV40 DNA in eukaryotic cells. Nucleic Acids Res. 11:6559-6570.

26. Scherer, G. F. E., M. D. Walkinshaw, S. Arnott, and D. J. Moore. 1980. The ribosome binding sites recognized by Escherichia coli ribosomes have regions with signal character in both the leader and protein coding segments. Nucleic Acids Res. 8:3895-3907.

27. Schwartz, M., M. Roa, and M. Debarbouille. 1981. Mutations that affect lamB gene expression at a post-transcriptional level. Proc. Natl. Acad. Sci. U.S.A. 78:2937-2941.

28. Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. U.S.A. 71:1342-1346.

29. Singer, B. S., L. Gold, S. T. Shinedling, M. Colkitt, L. R. Hunter, D. Pribnow, and M. A. Nelson. 1981. Analysis in vivo of translational mutants of the rIIB cistron of bacteriophage T4. J. Mol. Biol. 149:405-432.

30. Snellings, K., and C. W. Vermeulen. 1982. Non-random layout of the amino acid loci on the genome of Escherichia coli. J. Mol. Biol. 157:687-688.

31. Steitz, J. A. 1980. RNA-RNA interactions during polypeptide chain initiation, p. 479-495. In G. Chambliss, G. R. Craven, J. Davies, K. Davis, L. Kahan, and M. Nomura (ed.), Ribosomes. University Park Press, Baltimore.

32. Stormo, G., T. D. Schneider, and L. Gold. 1982. Characterization of translational initiation sites in E. coli. Nucleic Acids Res. 10:2971-2996.

33. Stormo, G., T. D. Schneider, L. Gold, and A. Ehrenfeucht. 1982. Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res. 10:2997-3011.

34. Thompson, H. R. 1950. Truncated normal distributions. Nature (London) 165:444-445.

35. Thompson, H. R. 1951. Truncated lognormal distributions. I. Solution by moments. Biometrika 38:414-422.

36. Young, I. G., B. L. Rogers, H. D. Campbell, A. Jaworowski, and D. C. Shaw. 1981. Nucleotide sequence coding for the respiratory NADH dehydrogenase of Escherichia coli: UUG initiation codon. Eur. J. Biochem. 116:165-170.
