Computational Comparative Genomics of Borrelia Species


RESEARCH • GENOMICS

Anjaneyulu K et al. Computational Comparative Genomics of Borrelia Species, Discovery life, 2012, 2(4), 8-17, www.discovery.org.in/dl.htm © 2012 discovery publication. All rights reserved


Anjaneyulu K1, Ashok P Patil2, Desai PV3*

1. Ph.D. Scholar, Dept. of Bio Sciences, South Gujarat University, Post Box No 49, Surat - 395007, Gujarat, INDIA
2. Ph.D. Scholar, Dept. of Bio Sciences, South Gujarat University, Post Box No 49, Surat - 395007, Gujarat, INDIA
3. Professor, Dept. of Bio Sciences, South Gujarat University, Post Box No 49, Surat - 395007, Gujarat, INDIA

*Corresponding author: Panda Ph.d Scholar, Dept. of Bio Sciences., South Gujarat University , Post Box No 49, Surat - 395007, Gujarat, INDIA, Mail:[email protected].

Received 21 August; accepted 18 September; published online 01 October; printed 16 October 2012

ABSTRACT
Comparative genomics of closely related bacterial isolates is a powerful method for uncovering virulence properties, metabolic activities and other important genome elements, along with evolutionary patterns. In microorganisms, the cellular machinery is dominated by the central dogma processes. Gene regulation is simple: groups of genes (regulons) are controlled simultaneously from short regulatory regions, and a majority of the genome is translated. Moreover, the small and compact genomes of prokaryotes have led to more informative results from comparative genome analysis than are obtained with eukaryotes. In the present study we analysed the complete genome sequences of five reported species of Borrelia, including pathogenic and nonpathogenic species. Comparative genome analysis of organisms within the same species shows not only a high degree of similarity in gene content and organization, but also a high degree of sequence heterogeneity. The genome sequence comparisons in this study reveal the most common and the most diverse genes and proteins within Borrelia species. The genomes of these species are closely related and are approximately evolutionarily equidistant. Comparative genome analysis of the five species showed that B. afzelii PKo is more closely related to B. burgdorferi JD1 than to B. garinii BgVir or B. valaisiana VS116.

Keywords: Bacterial genomics; Comparative genomics; Genomes of B. afzelii PKo, B. bissettii DN127, B. burgdorferi JD1, B. garinii BgVir, B. valaisiana VS116; ORF Finder; GeneWiz Browser; ClustalW multiple alignments

Abbreviations: ORF - open reading frame; GW - GeneWiz

1. INTRODUCTION
Comparative genomics is the study of the relationship of genome structure and function across different biological species or strains. Comparisons of genomes at an appropriate evolutionary distance can aid in defining protein-coding genes, in recognizing non-coding genes, and in finding regulatory sequences and other functional elements of a genome. Closely related species reveal species-specific differences and similarities. Comparative genomics exploits both similarities and differences in the proteins, RNA, and regulatory regions of different organisms to infer evolutionary selection pressures on genes. At the same time, comparative sequence analysis provides the means for better annotation. Consequently, computational approaches to genome comparison have recently become a common research topic in computer science. They involve computer programs that line up multiple genomes and look for regions of similarity among them. It has been more than 16 years since the first bacterial genome sequence was published, and hundreds of bacterial genome sequences are now available for comparative genomics [1].

Complete genome sequences have been determined for five different species of Borrelia. The genus Borrelia belongs to the spirochetes, a group of bacteria that have long, helically coiled cells. It comprises 36 species, of which 12 are known to cause Lyme disease or borreliosis, and which are transmitted by ticks and some species of lice. Borreliae live in the gastro-intestinal tract of ticks (Ixodes spec.), and some are able to infect multiple hosts via tick bite. The host range is thought to be defined by gene variations on a group of redundant plasmids. Only three species of this complex cause the multisystem disorder Lyme borreliosis, namely B. burgdorferi, B. garinii and B. afzelii. Furthermore, the disease patterns observed in humans depend on the particular Borrelia species involved. B. garinii is primarily associated with neuroborreliosis and B. afzelii with acrodermatitis chronica atrophicans (a chronic skin disease), whereas B. burgdorferi was found to be prevalent in Lyme arthritis, although this association is still uncertain. The goal of the present comparative study is to summarize the major findings of the comparative analysis at the genome, gene and protein levels for the different species of Borrelia (Tim T et al, 2006).

2. Statement of the Problem
The present study analyses the genome and gene sequences of Borrelia species in order to find the similarity of the genes and proteins among the species.

RESEARCH • GENOMICS Discovery life, Volume 2, Number 4, October 2012

ISSN 2278-5442, EISSN 2278-5434

Comparative genomics is the analysis and comparison of genomes from different species. The purpose is to gain a better understanding of how species have evolved and to determine the function of genes and noncoding regions of the genome. Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse. Genome researchers look at many different features when comparing genomes: sequence similarity, gene location, the length and number of coding regions (called exons) within genes, the amount of noncoding DNA in each genome, and highly conserved regions maintained in organisms as simple as bacteria and as complex as humans.

Global alignment: when two nucleic acid or amino acid sequences are lined up along their entire length. See also local alignment.

Homology: similarity in sequence that is based on descent from a common ancestor.

Local alignment: the alignment of portions (rather than the entire sequence length) of two nucleic acid or amino acid sequences.

Masking: the removal of repeated or low complexity regions from a sequence so that sequences are compared


2.1 Scope of the Study
In this study, our aim is to carry out a comparative analysis of the genomes available within the genus Borrelia, namely Borrelia afzelii PKo, Borrelia bissettii DN127, Borrelia burgdorferi JD1, Borrelia garinii BgVir and Borrelia valaisiana VS116, to provide further insight into species variation and uniqueness. Information on the uniqueness and variation between pathogenic and nonpathogenic species may, importantly, stimulate new studies and approaches to disease diagnosis, prevention and treatment (Read TD et al, 2002).

2.2 Limitations of the Study
The study undertaken was limited to three years.

It therefore includes only the genomes of the Borrelia species that were available during the timeline of this research study.

The analysis is entirely computational.

2.3 Data Collection
2.3.1 Borrelia Spp. Genomes
The complete genome sequences of five different species of Borrelia, together with their genome statistics, were collected from the NCBI Genome database (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome), and a brief description of each is given.

2.3.2 Open Reading Frames
Open reading frames were obtained with ORF Finder (Open Reading Frame Finder), a graphical analysis tool that finds all open reading frames of a selectable minimum size in a user's sequence or in a sequence already in the database (http://www.ncbi.nlm.nih.gov/projects/gorf/). The resulting ORFs were used for comparative analysis.
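To make the idea of an ORF search concrete, the following minimal Python sketch scans all six reading frames of a DNA string for stretches that run from an ATG to the next in-frame stop codon and that meet a minimum length. It is not the NCBI ORF Finder implementation: the function names, the restriction to ATG starts, the stop codons TAA, TAG and TGA, and the `min_len` default are assumptions made for this illustration.

```python
# Illustrative sketch only; names and defaults are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Return (strand, frame, start, end) tuples for ATG...stop ORFs.

    start and end are 0-based, end-exclusive positions on the strand
    that was scanned (for "-" they refer to the reverse complement).
    min_len is the minimum ORF length in nucleotides, stop included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Walk the frame codon by codon.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

Calling `find_orfs(genome_sequence, min_len=300)` on a chromosome string would return candidate ORFs of at least 100 codons; the choice of minimum size strongly affects how many ORFs are reported.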

2.3.3 Gene Finding and Microbial Genome Annotation
To obtain the genome annotation, the protein-coding gene finders GLIMMER (Gene Locator and Interpolated Markov ModelER) and GeneMark.hmm 1.3 were used. GeneMark.hmm 1.3 is the second generation of GeneMark: the DNA sequence is interpreted as a realization of a hidden semi-Markov model with genome-specific parameters, and the maximum likelihood parse of the sequence into protein-coding and non-coding regions is then generated by an optimization algorithm (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/genemark.cgi?GeneMark.hmmPROKARYOTIC) (Version 2.6r). GLIMMER is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. GLIMMER uses interpolated Markov models to identify coding regions (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi?).
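The Markov-model idea behind these gene finders can be sketched in a few lines of Python. The example below is a deliberately simplified stand-in, not GLIMMER or GeneMark.hmm: it uses a single fixed-order Markov chain rather than an interpolated or hidden semi-Markov model, and the function names, the order of 2 and the pseudocount are all assumptions. It scores a sequence by the log-odds of a "coding" model against a "background" model.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from training sequences.

    Returns a function prob(context, base). Pseudocounts over a 4-letter
    alphabet keep unseen transitions from getting zero probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        c = counts.get(context, {})
        total = sum(c.values()) + 4 * pseudocount
        return (c.get(base, 0.0) + pseudocount) / total

    return prob


def log_odds(seq, coding_prob, background_prob, order=2):
    """Sum of log(P_coding / P_background) over every position in seq.

    A positive score means the coding model explains the sequence better.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(order, len(seq)):
        ctx, base = seq[i - order:i], seq[i]
        score += math.log(coding_prob(ctx, base) / background_prob(ctx, base))
    return score
```

In practice the coding model would be trained on known genes of the organism and the background model on intergenic sequence; an interpolated model additionally blends several orders according to how much training data supports each context.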

2.3.4 tRNAscan-SE
tRNAscan-SE identifies transfer RNA genes in genomic DNA or RNA sequences. It combines the specificity of the Cove probabilistic RNA prediction package with the speed and sensitivity of tRNAscan 1.3, plus an implementation of an algorithm that searches for tRNA promoters. tRNAscan and EufindtRNA are used as first-pass prefilters to identify "candidate" tRNA regions of the sequence. These subsequences are then passed to Cove for further analysis, and are output if Cove confirms the initial tRNA prediction (http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::trnascan).

2.3.5 rRNA Prediction
rRNA prediction was completed using the RNAmmer 1.2 server, which predicts 5s/8s, 16s/18s and 23s/28s ribosomal RNA in full genome sequences. This web service predicts the location of ribosomal RNA genes in full genome sequences by using hidden Markov models based on alignments from a highly curated dataset of structurally aligned sequences. The input is one or more genomic sequences, given as one or more contigs. Each contig is submitted as one continuous string of DNA together with the sequence identifier. The kingdom (Bacteria, Archaea, or Eukaryotes) is specified once for each job that is submitted, using the abbreviations 'bac', 'arc', and 'euk' (http://www.cbs.dtu.dk/services/RNAmmer/).

2.3.6 Tandem Repeat Finder
A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeat Finder is a program that locates and displays tandem repeats in DNA sequences (http://tandem.bu.edu/trf/trf.html).
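As a rough illustration of what a tandem repeat is, the Python sketch below reports runs of adjacent, exact copies of a short unit. It is much simpler than the program used in this study, which also detects approximate copies; the function name, the unit-length range and the copy threshold are assumptions chosen for the example.

```python
def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=2):
    """Return (start, unit, copies) for exact tandem repeats in seq.

    start is the 0-based position of the first copy. Only perfect,
    adjacent copies are counted; mismatches and indels are not tolerated.
    The same region may be reported for several unit lengths.
    """
    seq = seq.upper()
    n = len(seq)
    results = []
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len <= n:
            unit = seq[i:i + unit_len]
            copies = 1
            j = i + unit_len
            # Extend while the next block is identical to the unit.
            while seq[j:j + unit_len] == unit:
                copies += 1
                j += unit_len
            if copies >= min_copies:
                results.append((i, unit, copies))
                i = j  # continue after the repeat run
            else:
                i += 1
    return results
```

For example, restricting the search to dinucleotide units with at least three copies picks out the `CA` run in `GGCACACACATT`.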

2.3.7 Comparative Genome Analysis
Genome comparison was carried out using the tools described below.

GeneWiz browser 0.94 server

The GeneWiz browser 0.94 server is an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool allows users to carry out various analyses, such as mapping alignments of homologous genes to other genomes, mapping short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides without changing the size of the plot. Its ability to zoom disproportionally provides optimal readability and increased functionality compared to other browsers. It allows the user to select display settings for various genomic features, such as colors and data ranges. Custom numerical data can be added to the plot, allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online (http://www.cbs.dtu.dk/services/gwBrowser).

2.3.8 ClustalW Software
ClustalW Multiple Alignment Program
ClustalW2 is a general purpose multiple sequence alignment program for DNA or proteins. It attempts to calculate the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen (http://www.ebi.ac.uk/Tools/msa/clustalw2/#).
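Once two sequences have been lined up, their identity can be summarized as a single number. The Python sketch below computes percent identity for one pair of rows taken from an alignment. It does not perform the alignment itself and is not part of ClustalW2; the function name and the convention of ignoring any column that contains a gap are assumptions, and other tools may use a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the gap-free aligned columns.

    Both inputs must be rows of the same alignment (equal length).
    Columns where either row has a gap are skipped entirely.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Applying this to every pair of rows in a multiple alignment gives a pairwise identity matrix, which is one simple way to see which genomes or genes are closest to each other.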

3 RESULTS AND DISCUSSION
3.1 Borrelia Spp. Genome Sequences
The complete genome sequences of five different species of Borrelia have been reported, and a brief description of each is presented below.

3.1.1 Borrelia afzelii PKo (A)
The Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae - Borrelia - Borrelia burgdorferi group - Borrelia afzelii. This strain was isolated from a skin lesion of a Lyme disease patient, and the genome sequence was taken from NCBI.
Borrelia afzelii PKo contains: Base Pairs = 903609 bp, Plasmids = 26, Size of the Genome = 1.74 MB, G+C = 28.31%, Genes = 839, Proteins = 826, rRNA = 6, 5s rRNA = 2, 23s rRNA = 2, 16s rRNA = (+/-) 2, tRNA = 33, Tandem Repeats = 58, ORF = 2280.

Figure 1 shows the genome map of Borrelia afzelii PKo. (Line 1 shows the intrinsic curvature; Lines 2 and 3 display the stacking energy and positional preference, respectively; Lines 4 and 5 depict the global direct repeats and global inverted repeats; Line 6 shows the GC skew ([G-C]/[G+C]); Line 7 plots percent AT; Lines 8, 9, 10 and 11 depict A, T, G and C content, respectively; Lines 12, 13, 14 and 15 show AAAA, TTTT, GGGG and CCCC repeats, respectively; Line 16 displays AT skew and Line 17 shows simple repeats.)

Similarity: how related one nucleotide or protein sequence is to another. The extent of similarity between two sequences is based on the percent of sequence identity and/or conservation.

Genes in lines are color-coded according to the following category:

Wine Red = Genes involved in central metabolism and respiration without orthologues in H. pylori

Cyan = Methyl-accepting chemotaxis proteins (MCPs)

Dark Blue = Type IV secretion system

Sky Blue = Genes involved in acid acclimation

Green = Putative secreted virulence factors

Pale Green = Glycosyltransferase gene cluster specific to H. bizzozeronii

Pale Grey = All other CDSs

Abbreviations: ACC, acetophenone carboxylase; comB, Type IV secretion system; NAP, periplasmic nitrate reductase; AHD, allophanate hydrolase; GT, glycosyltransferase; NRS, nitrite reductase system; SNO, S and N oxidases; FDH, formate reductase system; PL, polysaccharide lyase

Figure 1 Genome map of Borrelia afzelii PKo.


3.1.2 Borrelia bissettii DN127 (B)

Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae - Borrelia - Borrelia burgdorferi group - Borrelia bissettii. This strain was isolated from a western black-legged tick, Ixodes pacificus, in California. It has a pattern of outer surface proteins that differs from that of many other North American strains.

Borrelia bissettii DN127 contains: Base Pairs = 900755 bp, Plasmids = 20, Size of the Genome = 1.53 MB, G+C = 28.70 %, Genes = 836, Proteins = 817, rRNA = 5, 5S rRNA = 2, 23S rRNA = 2, 16S rRNA = (+/-) 1, tRNA = 33, Tandem Repeats = 58, ORF = 2361. Figure 2 shows the genome map of Borrelia bissettii DN127.

3.1.3 Borrelia burgdorferi JD1 (C)

Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae - Borrelia - Borrelia burgdorferi group - Borrelia burgdorferi. Borrelia burgdorferi N40 was isolated from an adult deer tick, Ixodes dammini, in New York.

Borrelia burgdorferi JD1 contains: Base Pairs = 922801 bp, Plasmids = 11, Size of the Genome = 1.27 MB, G+C = 28.55 %, Genes = 856, Proteins = 809, rRNA = 5, 5S rRNA = 2, 23S rRNA = 2, 16S rRNA = (+/-) 1, tRNA = 33, Tandem Repeats = 57, ORF = 2351.

Figure 2 Genome map of Borrelia bissettii DN127.


Figure 3 shows the genome map of Borrelia burgdorferi JD1 (Besemer J et al, 1999).

3.1.4 Borrelia garinii BgVir (D)

Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae - Borrelia - Borrelia burgdorferi group - Borrelia garinii. This organism was originally classified as Borrelia burgdorferi; however, B. garinii was subsequently determined to be a separate species based on genetic analysis. The type strain of B. garinii was isolated from Ixodes ricinus in France. B. garinii is a major causative agent of tick-borne borreliosis in Europe. Neurologic symptoms such as arthritis, meningitis and extreme leg and back pain are characteristic of infection by B. garinii.

Borrelia garinii BgVir contains: Base Pairs = 905534 bp, Plasmids = 26, Size of the Genome = 1.74 MB, G+C = 28.44 %, Genes = 836, Proteins = 832, rRNA = 6, 5S rRNA = 2, 23S rRNA = 2, 16S rRNA = (+/-) 2, tRNA = 33, Tandem Repeats = 62, ORF = 2317. Figure 4 shows the genome map of Borrelia garinii BgVir.

Figure 3 Genome map of Borrelia burgdorferi JD1.

3.1.5 Borrelia valaisiana VS116 (E)

Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae - Borrelia - Borrelia burgdorferi group - Borrelia valaisiana. This species was first isolated from Ixodes ricinus ticks in Valais, Switzerland.

Borrelia valaisiana VS116 contains: Base Pairs = 913294 bp, Plasmids = 16, Size of the Genome = 1.4 MB, G+C = 28.10 %, Genes = 849, Proteins = 832, rRNA = 6, 5S rRNA = 2, 23S rRNA = 2, 16S rRNA = (+/-) 2, tRNA = 33, Tandem Repeats = 44, ORF = 2307. Figure 5 shows the genome map of Borrelia valaisiana VS116.

3.2 Comparative Genomics

The results of the comparative analysis of the genomes of five species of the genus Borrelia are presented in Table 1. The genomes were compared on the following parameters: base pairs, plasmids, size of the genome, G+C %, genes, proteins, rRNA, tRNA, tandem repeats and ORFs. The total number of base pairs of Borrelia burgdorferi JD1 was closest to that of Borrelia valaisiana VS116. Table 1 shows the comparison based on different types of analysis of Borrelia species (Bendtsen JD et al, 2005).

A: Borrelia afzelii PKo, B: Borrelia bissettii DN127, C: Borrelia burgdorferi JD1, D: Borrelia garinii BgVir, E: Borrelia valaisiana VS116

Borrelia afzelii PKo and Borrelia garinii BgVir had exactly the same number of plasmids (26), while the other three species differed (20, 11 and 16). Genome size was also equal in B. afzelii PKo and B. garinii BgVir (1.74 MB), with B. bissettii DN127 (1.53 MB) the next closest to these two. B. bissettii DN127 is close to B. burgdorferi JD1 in G+C % and in number of proteins. The total number of genes was almost the same in all five species. The distribution of rRNA and tRNA was similar, with only minute differences among the species. B. afzelii PKo, B. bissettii DN127 and B. burgdorferi JD1 also have nearly identical numbers of tandem repeats and ORFs.

Hence, from the analysis of these parameters, B. bissettii DN127 showed only minor differences from B. burgdorferi JD1. Overall, all five Borrelia strains tested were very close to each other and showed little variation. (Alm EJ et al, 2005)
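The comparison described above can be reproduced from the per-species values listed in Sections 3.1.1 to 3.1.5. The following is a minimal Python sketch, not the procedure used in the study: the dictionary layout and the helper name `closest_to` are illustrative assumptions, and only the numbers are taken from the text.

```python
# Values transcribed from Sections 3.1.1 to 3.1.5 of this paper.
# The data layout and the function name are illustrative assumptions.
GENOMES = {
    "A": {"name": "B. afzelii PKo", "base_pairs": 903609, "plasmids": 26,
          "gc_percent": 28.31, "genes": 839, "proteins": 826,
          "tandem_repeats": 58, "orfs": 2280},
    "B": {"name": "B. bissettii DN127", "base_pairs": 900755, "plasmids": 20,
          "gc_percent": 28.70, "genes": 836, "proteins": 817,
          "tandem_repeats": 58, "orfs": 2361},
    "C": {"name": "B. burgdorferi JD1", "base_pairs": 922801, "plasmids": 11,
          "gc_percent": 28.55, "genes": 856, "proteins": 809,
          "tandem_repeats": 57, "orfs": 2351},
    "D": {"name": "B. garinii BgVir", "base_pairs": 905534, "plasmids": 26,
          "gc_percent": 28.44, "genes": 836, "proteins": 832,
          "tandem_repeats": 62, "orfs": 2317},
    "E": {"name": "B. valaisiana VS116", "base_pairs": 913294, "plasmids": 16,
          "gc_percent": 28.10, "genes": 849, "proteins": 832,
          "tandem_repeats": 44, "orfs": 2307},
}


def closest_to(label, parameter):
    """Return the label of the other species whose value for
    `parameter` is nearest to that of species `label`."""
    reference = GENOMES[label][parameter]
    others = (key for key in GENOMES if key != label)
    return min(others, key=lambda key: abs(GENOMES[key][parameter] - reference))


if __name__ == "__main__":
    for parameter in ("base_pairs", "gc_percent", "proteins", "orfs"):
        nearest = closest_to("C", parameter)
        print(parameter, "-> nearest to C:", GENOMES[nearest]["name"])
```

A nearest-value lookup of this kind only ranks species one parameter at a time; it does not replace a sequence-based comparison.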

Figure 4 Genome map of Borrelia garinii BgVir.


Figure 5 Genome map of Borrelia valaisiana VS116.

Figure 6 Representative graph of sequences of Borrelia species with aligned score of 100% identity

Table 1 Comparison based on different types of analysis of Borrelia species

Table 2 Genome annotation comparison of five Borrelia species on the basis of DNA properties

3.3 Genome Annotation: Comparative Analysis

A Genome Atlas comparison was performed to extract the various DNA properties, which are presented in Table 2. About 4114 sequences from all five species of Borrelia were studied using the GeneWiz browser 0.94 server. The parameter values obtained helped to understand and predict gene function in each species. Table 2 gives the predicted intrinsic curvature of all five Borrelia species, which ranges from 0.22 to 0.27 curvature units, where the unit refers to the curvature of DNA wrapped around the nucleosome. The stacking energy is in the range of -7.274 to -6.579 kcal/mol, so more energy would be required to destack or melt the helix. The positional preference, calculated from the trinucleotide values, was 0.51 to 0.187 for these species, which indicates that their DNA flexibility is very low. The global direct repeats and global inverted repeats were found to be 5 to 7.5 for the best match within the whole segment, on the same strand in the same direction and on the opposite strand, respectively.

The GC skew, which correlates with the leading and lagging strands of replication, ranged from -0.184 to 0.179. AT-rich regions are often more readily melted and tend to be less pliable and more rigid; the Borrelia species fall within the range of 0.2 to 0.8 on the percent AT scale, up to 80% AT, although such regions can also be readily compacted by chromatin proteins. The fraction of A's and T's in a given sequence is approximately 40% each (0.23 to 0.49), while the fraction of G's and C's is approximately 10% each (0.57 to 0.24). Repeats of AAAA and TTTT were at 0.04 to 0.21, whereas GGGG and CCCC were at 0.008 to 0.016. AT skew was absent in B. garinii BgVir, while in the other species the AT skew between the leading and lagging strands of replication was -0.26 to 0.26. Local direct repeats were 6 to 7.3, except in B. valaisiana VS116, for the best match of a 30 bp piece within a 100 bp sequence window, on the same strand and in the same direction. Perfect matches of tandem repeats within the window that contain a simple oligonucleotide repeat were at 4.5 to 5.26 positions.
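To make the base-composition quantities above concrete, the following minimal Python sketch computes GC skew as (G-C)/(G+C), AT skew as (A-T)/(A+T), and the AT fraction for a sequence or for sliding windows along it. It illustrates the standard definitions only; the function names, window size and step are assumptions and do not reproduce the internal settings of the GeneWiz browser.

```python
from collections import Counter


def base_composition(seq):
    """Return (gc_skew, at_skew, at_fraction) for one DNA sequence.

    gc_skew = (G - C) / (G + C), at_skew = (A - T) / (A + T),
    at_fraction = (A + T) / (A + T + G + C).
    Characters other than A, T, G and C are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    at_skew = (a - t) / (a + t) if a + t else 0.0
    at_fraction = (a + t) / total if total else 0.0
    return gc_skew, at_skew, at_fraction


def sliding_windows(seq, window, step):
    """Yield (start, composition) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, base_composition(seq[start:start + window])


if __name__ == "__main__":
    # Toy sequence for illustration only, not Borrelia data.
    toy = "AATTGGGCATATATGCGCAATT"
    for start, values in sliding_windows(toy, window=8, step=4):
        print(start, values)
```

Plotting the windowed GC skew along a chromosome is the usual way to see the sign change between the leading and lagging strands of replication.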

3.4 Protein Sequence Comparative Analysis

The protein sequences were obtained from the NCBI genome browser for each Borrelia species. Multiple sequence alignment was performed using ClustalW software. Table 3 gives the total number of protein sequences of the five species that have 100% identity. The analysis was performed on the 113 sequences of the five species of Borrelia that matched each other.

Borrelia afzelii PKo has 12 sequences with 100% identity with each of the other species except B. valaisiana VS116, with which it has 9. Borrelia bissettii DN127 and Borrelia burgdorferi JD1 have the maximum number of 100% identity alignments, 17 sequences, while Borrelia bissettii DN127 and Borrelia valaisiana VS116 have the fewest, about 8 sequences. Borrelia garinii BgVir and Borrelia burgdorferi JD1 were found to have 13 completely identical protein sequences. Borrelia valaisiana VS116 showed 8 to 10 sequences aligned with 100% identity with each of the other species of Borrelia. Based on the protein alignments and the resulting 100% identity scores, the representative graph (Fig 6) confirms that species C, Borrelia burgdorferi JD1, shares the maximum identity with the other species, followed by species B, Borrelia bissettii DN127, and then species A, Borrelia afzelii PKo, species D, Borrelia garinii BgVir, and Borrelia valaisiana VS116.
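The counting of fully identical aligned pairs can be sketched as follows. This is a minimal Python illustration under stated assumptions: the input is a list of already aligned sequence pairs (gaps written as "-"), identity is defined here as identical residues over all non gap-only columns, and the function names are hypothetical. The score reported by ClustalW itself may be defined differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical residues.

    Columns where both sequences have a gap are skipped; a column
    with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def count_identical_pairs(aligned_pairs):
    """Count the pairs whose alignment shows 100% identity."""
    return sum(1 for a, b in aligned_pairs
               if percent_identity(a, b) == 100.0)


if __name__ == "__main__":
    # Toy aligned pairs for illustration only, not Borrelia proteins.
    pairs = [("MKVL", "MKVL"), ("MKVL", "MKIL"), ("MK-L", "MKVL")]
    print(count_identical_pairs(pairs))
```

Applying such a count to every pair of species would give a symmetric matrix of the kind summarised in Table 3.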

Table 4 Comparative analysis of the proteins present in Borrelia species: 1: Borrelia afzelii PKo, 2: Borrelia bissettii DN127, 3: Borrelia burgdorferi JD1, 4: Borrelia garinii BgVir, 5: Borrelia valaisiana VS116

Table 3 Total number of Sequences of Borrelia species with aligned score of 100% identity


3.5 Protein Functional Categories

The protein sequences obtained from the genome of each of the Borrelia species in this study were aligned using ClustalW. The aligned sequences were filtered on score to keep those with 100% identity. Out of 4114 proteins analysed with ClustalW, only 113 proteins showed 100% identity. Table 4 lists the categories of these proteins: a total of 23 types exist among the 113 most similar proteins in these species.

Protein sequence based analysis also confirms that the ribosomal proteins S10 and S21 were the most common across all five species of Borrelia. Borrelia burgdorferi JD1 contains most of the ribosomal proteins. Borrelia bissettii DN127 has the maximum number of 50S ribosomal proteins, while Borrelia burgdorferi JD1 has the highest number of 30S ribosomal proteins. Putative membrane proteins were found in all five species except B. garinii BgVir. Apart from ribosomal proteins, hypothetical proteins and response regulatory proteins are other classes of proteins observed in most species of Borrelia. Flagellar proteins were found in Borrelia bissettii DN127, B. burgdorferi JD1 and B. garinii BgVir. Borrelia afzelii PKo has no initiation factor protein matching the rest of the species; similarly, Borrelia valaisiana VS116 lacks matching acyl and phospho carrier proteins. The protein PtsH-2 was found only in B. garinii BgVir. CheC-like and CheY-like proteins were found mostly in B. bissettii DN127 and B. burgdorferi JD1, respectively.

3.6 Conclusions

Comparative genomics is a valuable method for discerning species-specific similarity based on sequences. Our comparative analysis of genome sequences from B. afzelii PKo, B. bissettii DN127, B. burgdorferi JD1, B. garinii and B. valaisiana VS116 reveals additional information about the common genes and proteins involved in regulation, virulence, and physiology in all species of Borrelia. Based on the analysis presented above (Table 1), we may infer that B. bissettii DN127 and B. burgdorferi JD1 closely match on parameters such as number of genes, proteins, rRNA, tRNA, tandem repeats and ORFs. Further annotation analysis indicates that the genome annotations differ only minimally. Protein comparative analysis with ClustalW multiple sequence alignment at 100% identity reveals that B. bissettii DN127 and B. burgdorferi JD1 share the most pairs, 17 sequences of the same protein types, as shown in Table 3. Protein function comparison (Table 4) confirms that B. bissettii DN127 and B. burgdorferi JD1 share most of the analogous proteins, such as 30S and 50S ribosomal proteins, hypothetical proteins, flagellar proteins, initiation factors and ATP-binding proteins.

Comparative genomics analysis between the species revealed that B. burgdorferi was most closely related to B. bissettii, moderately related to B. garinii and B. afzelii, and least related to B. valaisiana. Future detailed analyses will provide information essential for identifying unique targets for diagnosis and drug development, and antigens for a universal vaccine. An overarching goal of such studies is improved diagnosis, prevention and treatment.

SUMMARY OF RESEARCH

The last few decades have brought an enormous and exciting expansion of knowledge about comparative genomics, which is helpful for finding functional gene similarity between species. By comparing the finished reference sequence of the human genome with genomes of other organisms, researchers can identify regions of similarity and difference.

FUTURE ISSUES

Comparison of genes and proteins among infectious non-proteobacteria using alignment tools and software.

DISCLOSURE STATEMENT

There was no financial support for this research work from any funding agency.

ACKNOWLEDGEMENT

We thank our guide for his timely help, outstanding ideas and encouragement to finish this research work successfully.

REFERENCES

1. Alm EJ, Huang KH, Price MN, Koche RP, Keller K, Dubchak IL, Arkin AP. The MicrobesOnline Web site for comparative genomics, Genome Res., 2005, 15(7), 1015–1022
2. Bendtsen JD, Binnewies TT, Hallin PF, Ussery DW (2005b)
3. Besemer J, Borodovsky M. Heuristic approach to deriving models for gene finding, Nucleic Acids Research, 1999, 27, 3911–3920
4. Binnewies TT, Motro Y, Hallin PF, et al. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries, Funct Integr Genomics, 2006, 6, 165–185
5. Read TD, Salzberg SL, Pop M, Shumway M, Umayam L, Jiang L, Holtzapple E, Busch JD, Smith KL, Schupp JM, Solomon D, Keim P, Fraser CM. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis, Science, 2002, 296, 2028–2033

RELATED RESOURCE

1. Abbott JC, Aanensen DM, Rutherford K, Butcher S, Spratt BG. WebACT - an online companion for the Artemis Comparison Tool, Bioinformatics, 2005, 21(18), 3665–3666
2. Bendtsen JD, Binnewies TT, Hallin PF, Sicheritz-Ponten T, Ussery DW. Genome update: prediction of secreted proteins in 225 bacterial proteomes, Microbiology, 2005a, 151(Pt 6), 1725–1727