The Reference Sequence database
• A non-redundant collection of richly annotated DNA, RNA, and protein sequences from diverse taxa
• The collection includes sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes
• Each RefSeq represents a single, naturally occurring molecule from one organism.
• RefSeq biological sequences (also known as RefSeqs) are derived from GenBank records but differ in that each RefSeq is a synthesis of information, not an archived unit of primary research data
• Similar to a review article in the literature, a RefSeq represents the consolidation of information by a particular group at a particular time.
Accession prefix   Molecule type   Comment
AC_                Genomic         Complete genomic molecule, alternate assembly
NC_                Genomic         Complete genomic molecule, reference assembly
NG_                Genomic         Incomplete genomic region
NT_                Genomic         Contig or scaffold, clone-based or WGS (a)
NW_                Genomic         Contig or scaffold, primarily WGS (a)
NS_                Genomic         Environmental sequence
NZ_ (b)            Genomic         Unfinished WGS
NM_                mRNA
NR_                RNA
XM_ (c)            mRNA            Predicted model
XR_ (c)            RNA             Predicted model
AP_                Protein         Annotated on AC_ alternate assembly
NP_                Protein
YP_ (c)            Protein
XP_ (c)            Protein         Predicted model
ZP_ (c)            Protein         Predicted model, annotated on NZ_ genomic records
(a) Whole Genome Shotgun sequence data.
(b) An ordered collection of WGS for a genome.
(c) Computed.
The RefSeq accession number format and molecule types
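As an illustration of how the prefix table can be used, here is a minimal Python sketch that maps an accession to its molecule type and comment. The dictionary is transcribed from the table above; the function name `classify_accession` and the regular expression describing the accession shape (two letters, an underscore, alphanumerics, optional version suffix) are assumptions made for this example, not an official NCBI specification.

```python
import re

# Prefix -> (molecule type, comment), transcribed from the table above.
REFSEQ_PREFIXES = {
    "AC_": ("Genomic", "Complete genomic molecule, alternate assembly"),
    "NC_": ("Genomic", "Complete genomic molecule, reference assembly"),
    "NG_": ("Genomic", "Incomplete genomic region"),
    "NT_": ("Genomic", "Contig or scaffold, clone-based or WGS"),
    "NW_": ("Genomic", "Contig or scaffold, primarily WGS"),
    "NS_": ("Genomic", "Environmental sequence"),
    "NZ_": ("Genomic", "Unfinished WGS"),
    "NM_": ("mRNA", ""),
    "NR_": ("RNA", ""),
    "XM_": ("mRNA", "Predicted model"),
    "XR_": ("RNA", "Predicted model"),
    "AP_": ("Protein", "Annotated on AC_ alternate assembly"),
    "NP_": ("Protein", ""),
    "YP_": ("Protein", ""),
    "XP_": ("Protein", "Predicted model"),
    "ZP_": ("Protein", "Predicted model, annotated on NZ_ genomic records"),
}

# Assumed accession shape for this sketch: two letters, "_", alphanumerics,
# and an optional ".version" suffix (for example NM_000059.3).
_ACCESSION_RE = re.compile(r"^([A-Z]{2}_)[A-Z0-9]+(\.\d+)?$")


def classify_accession(accession):
    """Return (molecule type, comment) for a RefSeq-style accession, else None."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None  # no two-letter-plus-underscore prefix, so not RefSeq-style
    return REFSEQ_PREFIXES.get(match.group(1))


if __name__ == "__main__":
    for acc in ("NM_000059.3", "XP_123456", "M11313"):
        print(acc, "->", classify_accession(acc))
```

An accession without the underscore-delimited prefix, such as the GenBank accession M11313 used later in the search examples, returns `None`.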
Flat File Format and Annotated Features
RefSeq records appear similar in format to the GenBank records from which they are derived.
Features of a RefSeq record
RefSeq records may also be displayed in a graphical format
Code Description
GENOME ANNOTATION The RefSeq record is provided via automated processing and is not subject to individual review or revision between builds.
INFERRED The RefSeq record has been predicted by genome sequence analysis, but it is not yet supported by experimental evidence. The record may be partially supported by homology data.
PREDICTED The RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted.
PROVISIONAL The RefSeq record has not yet been subject to individual review. The initial sequence-to-gene name associations have been established by outside collaborators or NCBI staff.
REVIEWED The RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information.
VALIDATED The RefSeq record has undergone an initial review to provide the preferred sequence standard. The record has not yet been subject to final review, at which time additional functional information may be provided.
WGS The RefSeq record is provided to represent a collection of whole genome shotgun sequences. These records are not subject to individual review or revisions between genome updates.
RefSeq status codes
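The status codes can also be read programmatically. The sketch below assumes that the status appears in a record's COMMENT text in the form "<STATUS> REFSEQ" (for example "REVIEWED REFSEQ: ..."); that layout, and the helper names `find_status` and `has_individual_review`, are assumptions for illustration and are not stated on the slide.

```python
import re

# Status codes exactly as listed in the table above.
STATUS_CODES = (
    "GENOME ANNOTATION",
    "INFERRED",
    "PREDICTED",
    "PROVISIONAL",
    "REVIEWED",
    "VALIDATED",
    "WGS",
)

# Per the table, only these two describe records that have had some
# individual review (initial review for VALIDATED, full review for REVIEWED).
INDIVIDUALLY_REVIEWED = {"REVIEWED", "VALIDATED"}

# Assumed layout: the status code is immediately followed by the word REFSEQ.
_STATUS_RE = re.compile(
    r"\b(" + "|".join(re.escape(code) for code in STATUS_CODES) + r") REFSEQ\b"
)


def find_status(comment_text):
    """Return the status code found in the comment text, or None."""
    match = _STATUS_RE.search(comment_text)
    return match.group(1) if match else None


def has_individual_review(comment_text):
    """True if the status indicates at least an initial individual review."""
    return find_status(comment_text) in INDIVIDUALLY_REVIEWED


if __name__ == "__main__":
    sample = "COMMENT     REVIEWED REFSEQ: This record has been curated."
    print(find_status(sample), has_individual_review(sample))
```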
Using Entrez Limits to restrict a query to RefSeq
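The same restriction can be written directly into a query string instead of being set through the Limits page. The sketch below assumes the Entrez property filter `srcdb_refseq[PROP]`, which is not shown on the slide; the helper name `restrict_to_refseq` is hypothetical.

```python
# Assumption: Entrez sequence databases accept srcdb_refseq[PROP] as a
# filter that keeps only RefSeq records. It is not taken from the slide.
REFSEQ_FILTER = "srcdb_refseq[PROP]"


def restrict_to_refseq(term):
    """Wrap an Entrez query so that only RefSeq records match."""
    term = term.strip()
    if not term:
        return REFSEQ_FILTER
    # Parentheses keep any OR inside the original query from escaping the AND.
    return f"({term}) AND {REFSEQ_FILTER}"


if __name__ == "__main__":
    print(restrict_to_refseq("BRCA1[gene] AND human[orgn]"))
```

The parentheses matter when the original query contains OR, since the filter would otherwise bind only to the last clause.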
http://www.ncbi.nlm.nih.gov/gene
Gene maintains information about genes from genomes of interest to the RefSeq group
Find genes by...                           Search text
free text                                  human muscular dystrophy
partial name and multiple species          transporter[title] AND ("Drosophila melanogaster"[orgn] OR "Mus musculus"[orgn])
chromosome and symbol                      (II[chr] OR 2[chr]) AND adh*[sym]
associated sequence accession number       M11313[accn]
gene name (symbol)                         BRCA1[sym]
publication (PubMed ID)                    11331580[PMID]
Gene Ontology (GO) terms or identifiers    "cell adhesion"[GO]
                                           10030[GO]
Genes with variants of medical interest    gene_snp_clin[filter]
chromosome and species                     Y[CHR] AND human[ORGN]
Enzyme Commission (EC) numbers             1.9.3.1[EC]
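The search strings in the table can also be submitted programmatically. The following sketch only builds the request URL and does not contact NCBI; it assumes the E-utilities `esearch.fcgi` endpoint with its `db`, `term`, and `retmax` parameters, none of which appear on the slide. The function name `gene_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities ESearch endpoint (not shown on the slide).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(term, retmax=20):
    """Build an ESearch URL that runs `term` against the Gene database."""
    params = {"db": "gene", "term": term, "retmax": retmax}
    # urlencode percent-escapes brackets and quotes and turns spaces into "+".
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Search strings copied from the table above.
    for term in ("BRCA1[sym]", "Y[CHR] AND human[ORGN]", "1.9.3.1[EC]"):
        print(gene_search_url(term))
```

Building the URL separately from fetching it keeps the example testable offline; the resulting string can be handed to any HTTP client.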
Entrez Gene is accessed like any other Entrez database: