NCBI GenBank, biological databases, bioinformatics databases


Upload: rajesh-guru

Post on 23-Feb-2015


GenBank:

GenBank (http://www.ncbi.nlm.nih.gov/genbank/) is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. It is built and distributed by the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM), located on the campus of the US National Institutes of Health (NIH) in Bethesda, MD. NCBI builds GenBank primarily from sequence data submitted by authors and from bulk submissions of expressed sequence tag (EST), genome survey sequence (GSS), and other high-throughput data. Each record is distributed as a flat file: a plain ASCII text file readable by both humans and computers. In addition to sequence data, GenBank files contain information such as accession numbers, gene names, phylogenetic classification, and references to published literature.

As of April 2011, there were approximately 126,551,501,141 bases in 135,440,924 sequence records in the traditional GenBank divisions and 191,401,393,188 bases in 62,715,288 sequence records in the WGS (whole genome shotgun) division. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data daily.

Structure of GenBank Entries (GenBank Flat File Format)

LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
            (AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 5028)
  AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  TITLE     Cloning and sequence of REV7, a gene whose function is required for
            DNA damage-induced mutagenesis in Saccharomyces cerevisiae
  JOURNAL   Yeast 10 (11), 1503-1509 (1994)
  PUBMED    7871890
  TITLE     Direct Submission
  JOURNAL   Submitted (22-FEB-1996) Terry Roemer, Biology, Yale University, New
            Haven, CT, USA
FEATURES             Location/Qualifiers
     source          1..5028
                     /organism="Saccharomyces cerevisiae"
                     /db_xref="taxon:4932"
                     /chromosome="IX"
                     /map="9"
     CDS             <1..206
                     /codon_start=3
                     /product="TCP1-beta"
                     /protein_id="AAA98665.1"
                     /db_xref="GI:1293614"
ORIGIN
        1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
//
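Because the flat file is plain text with keywords starting in the first column, a record like the one above can be split into sections with a few lines of code. The following is only a minimal sketch under simplifying assumptions: the function names `split_genbank_record` and `extract_sequence` are hypothetical, the embedded sample is a shortened copy of the record above, and the feature table is kept as raw text rather than interpreted. Established libraries handle the format far more completely.

```python
import re

# Shortened copy of the sample record, used only for illustration.
SAMPLE_RECORD = """\
LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
ORIGIN
        1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
//
"""


def split_genbank_record(text):
    """Split one flat-file record into {TOP_LEVEL_KEYWORD: text}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        # Top-level keywords are upper-case words starting in column 1.
        match = re.match(r"^([A-Z]+)(?:\s+(.*))?$", line)
        if match:
            current = match.group(1)
            sections.setdefault(current, []).append(match.group(2) or "")
        elif current is not None and line.strip():
            # Indented lines (sub-keywords, continuations, sequence rows)
            # are attached to the most recent top-level keyword.
            sections[current].append(line.strip())
    return {key: "\n".join(lines).strip() for key, lines in sections.items()}


def extract_sequence(sections):
    """Return the ORIGIN sequence with position numbers and spaces removed."""
    return re.sub(r"[^A-Za-z]", "", sections.get("ORIGIN", "")).lower()


if __name__ == "__main__":
    record = split_genbank_record(SAMPLE_RECORD)
    print(record["ACCESSION"])
    print(len(extract_sequence(record)), "bases read from ORIGIN")
```

Repeated keywords such as REFERENCE are simply concatenated under one key in this sketch; a fuller parser would keep each occurrence separate.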

LOCUS       Short name of the sequence entry; the line also gives the sequence length, molecule type, GenBank division, and modification date (max. 32 characters)
DEFINITION  Definition of the sequence: source organism, gene or protein name, and a brief description of the sequence's function (max. 80 characters)
ACCESSION   Accession number of the entry, a unique identifier made of letters followed by digits (max. 80 characters)
VERSION     Version number of the sequence; unlike the accession number, this identifier changes when the sequence is updated
DBSOURCE    Source of the data, with its dates of creation and modification
KEYWORDS    Words describing the sequence in the entry (present mainly in older records)
AUTHORS     Authors of the cited work or of the sequence submission (max. 80 characters)
TITLE       Title of the publication
JOURNAL     Journal name and reference
MEDLINE     MEDLINE ID
COMMENT     Lines of comment
SOURCE      Organism from which the sequence was derived
ORGANISM    Full name of the organism (max. 80 characters)
FEATURES    Features of the sequence (information about genes and gene products)
CDS         Coding sequence: the region of nucleotides that corresponds to the sequence of amino acids in a protein
<1..206     Base span of the feature, given as base numbers; for a CDS the span includes the start and stop codons. A "<" before the first number marks a feature that is incomplete at its left end, and a ">" before the second number marks one that is incomplete at its right end
GI          "GenInfo Identifier" sequence identification number
ORIGIN      Beginning of the sequence data
//          End of the sequence data
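Two of the fields described here have a compact internal syntax that is easy to misread: the VERSION value (accession, a dot, then the version number) and the base span of a feature such as `<1..206`. The sketch below shows one way to unpack them. The function names `parse_version` and `parse_simple_location` are hypothetical, and only simple single-span locations are handled; compound locations such as joined or complemented spans are deliberately out of scope.

```python
import re


def parse_version(version_field):
    """Split a VERSION value such as 'U49845.1  GI:1293613'.

    Returns (accession, version_number), e.g. ('U49845', 1).
    Assumes the first token has the form accession.version.
    """
    first_token = version_field.split()[0]
    accession, _, version = first_token.partition(".")
    return accession, int(version)


def parse_simple_location(location):
    """Parse a single base span such as '<1..206' or '1..5028'."""
    match = re.fullmatch(r"(<?)(\d+)\.\.(>?)(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    left_partial, start, right_partial, end = match.groups()
    return {
        "start": int(start),
        "end": int(end),
        "partial_left": left_partial == "<",    # feature continues before 'start'
        "partial_right": right_partial == ">",  # feature continues past 'end'
    }


if __name__ == "__main__":
    print(parse_version("U49845.1  GI:1293613"))
    span = parse_simple_location("<1..206")
    # Spans are 1-based and inclusive, so the length is end - start + 1.
    print(span, span["end"] - span["start"] + 1)
```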

Submissions to GenBank

Many journals require submission of sequence information to a database prior to publication so that an accession number can appear in the paper. Data in the database can be retrieved through the Entrez search system, searched with BLAST, and downloaded programmatically through the NCBI E-utilities. There are several options for submitting data to GenBank:

BankIt, a web-based submission tool for convenient and quick submission of sequence data.

Sequin, NCBI's stand-alone submission software for Mac, PC, and UNIX platforms, is available by FTP. When using Sequin, the output files for direct submission should be sent to GenBank by e-mail.

tbl2asn, a command-line program, automates the creation of sequence records for submission to GenBank using many of the same functions as Sequin. It is used primarily for submission of complete genomes and large batches of sequences.

Barcode Submission Tool, a web-based tool for the submission of GenBank sequences and trace data for Barcode of Life projects. Currently, only mitochondrial cytochrome c oxidase subunit I (COI) genes are accepted with this tool. For submissions of loci other than COI, please use either BankIt or Sequin.

There are also specialized, streamlined procedures for batch submissions of sequences, such as EST, STS, and GSS sequences.

NCBI places no restrictions on the use or distribution of the GenBank data. However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted.
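For the retrieval route mentioned earlier in this section (download through the NCBI E-utilities), a request for a single record in flat-file format can be sketched as follows. The helper names `build_efetch_url` and `fetch_record` are hypothetical, and the parameter values (`db=nuccore`, `rettype=gb`, `retmode=text`) are the ones commonly used with the EFetch endpoint for nucleotide records; check the current E-utilities documentation before relying on them, and keep request rates low as NCBI asks.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the EFetch service in the NCBI E-utilities.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype="gb"):
    """Build an EFetch URL asking for one nucleotide record as plain text."""
    params = {
        "db": "nuccore",       # nucleotide database
        "id": accession,       # e.g. the accession number U49845
        "rettype": rettype,    # 'gb' requests the GenBank flat-file view
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession):
    """Download the record text (needs network access; not called on import)."""
    with urlopen(build_efetch_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_efetch_url("U49845"))
```

Building the URL separately from the download keeps the part that can be checked offline apart from the part that depends on the network.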