nucleic acid databases[1]
TRANSCRIPT
-
8/11/2019 Nucleic Acid Databases[1]
1/37
8/27/2014 5:03 AM
Introduction to Bioinformatics
databases: Nucleic Acid Databases
Dinesh Gupta
ICGEB
-
Biological databases: why?
The need to store and communicate large datasets has grown.
To make biological data available to scientists.
To make biological data available in computer-readable form.
-
Different classifications of databases
Type of data
nucleotide sequences
protein sequences
protein sequence patterns or motifs
macromolecular 3D structure
gene expression data
metabolic pathways
-
Different classifications of databases.
Primary or derived databases
Primary databases: experimental results entered directly into the database
Secondary databases: results of analysis of primary databases
Aggregates of many databases:
Links to other data items
Combination of data
Consolidation of data
-
Different classifications of databases.
Technical design
Flat-files
Relational database (SQL)
Exchange/publication technologies (FTP,
HTML, CORBA, XML,...)
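As a rough illustration of the difference between a flat-file and a relational design, the sketch below parses a small flat file into fields and loads it into an SQL table. The records are made-up toy entries, loosely modelled on flat files that use a two-letter code at the start of each line and "//" to end a record. The identifiers, descriptions, table name and column names are all hypothetical.

```python
import sqlite3

# Toy flat file (hypothetical data). Each line starts with a
# two-letter code; "//" ends the record.
FLAT_FILE = """\
ID   TOY00001
DE   Hypothetical example gene
OS   Example organism
SQ   acgtacgtac
//
ID   TOY00002
DE   Another hypothetical entry
OS   Example organism
SQ   ttgacc
//
"""


def parse_flat_file(text):
    """Split a flat file into records, one dict of code -> value per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):
            if current:
                records.append(current)
            current = {}
        elif line.strip():
            code, value = line[:2], line[2:].strip()
            current[code] = value
    return records


def load_into_sqlite(records):
    """Store parsed records in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry (id TEXT PRIMARY KEY, description TEXT, "
        "organism TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO entry VALUES (?, ?, ?, ?)",
        [(r["ID"], r["DE"], r["OS"], r["SQ"]) for r in records],
    )
    return conn


conn = load_into_sqlite(parse_flat_file(FLAT_FILE))
# Once loaded, the data can be queried by any column with SQL
# instead of scanning the file line by line.
rows = conn.execute(
    "SELECT id, length(sequence) FROM entry WHERE organism = ? ORDER BY id",
    ("Example organism",),
).fetchall()
print(rows)
```

The flat file is easy to exchange and read by eye, while the relational table makes selective queries simple. Real databases often distribute the former and serve queries from the latter.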
-
Different classifications of databases.
Availability
Publicly available, no restrictions
Available, but with copyright
Accessible, but not downloadable
Academic, but not freely available
Proprietary, commercial; possibly free for
academics
-
Where do I find a database of interest?
-
http://www3.oup.co.uk/nar/database/c/
-
Nucleotide sequence databases
EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases
EMBL www.ebi.ac.uk/embl/
GenBank www.ncbi.nlm.nih.gov/Genbank/
DDBJ www.ddbj.nig.ac.jp
-
GenBank
An annotated collection of all publicly available nucleotide and protein sequences
Set up in 1979 at LANL (Los Alamos).
Maintained since 1992 by NCBI (Bethesda).
http://www.ncbi.nlm.nih.gov
-
EMBL Nucleotide Sequence Database
An annotated collection of all publicly available nucleotide and protein sequences
Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.
Maintained since 1994 by EBI (Cambridge).
http://www.ebi.ac.uk/embl.html
-
http://www3.ebi.ac.uk/Services/DBStats/
-
DDBJ: DNA Data Bank of Japan
An annotated collection of all publicly available nucleotide and protein sequences
Started in 1984 at the National Institute of Genetics (NIG) in Mishima.
Still maintained at this institute by a team led by Takashi Gojobori.
http://www.ddbj.nig.ac.jp
-
Other NCBI nucleic acid DBs
EST database: A collection of expressed sequence tags, or short, single-pass sequence reads from mRNA (cDNA).
http://www.ncbi.nlm.nih.gov/dbEST/index.html
GSS database: A database of genome survey sequences, or short, single-pass genomic sequences.
http://www.ncbi.nlm.nih.gov/dbGSS/index.html
HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.
http://www.ncbi.nlm.nih.gov/HomoloGene/
HTG database: A collection of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences.
http://www.ncbi.nlm.nih.gov/HTGS/
SNPs database: A central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.
http://www.ncbi.nlm.nih.gov/SNP/
RefSeq: A database of non-redundant reference sequence standards, including genomic DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within NCBI and with external groups, support data-gathering efforts.
http://www.ncbi.nlm.nih.gov/RefSeq/
STS database: A database of sequence tagged sites, or short sequences that are operationally unique in the genome.
http://www.ncbi.nlm.nih.gov/dbSTS/index.html
UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).
http://www.ncbi.nlm.nih.gov/genome/sts/
UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters, each representing a unique known or putative human gene, annotated with mapping and expression information and cross-references to other sources.
http://www.ncbi.nlm.nih.gov/UniGene/
-
Sequence submission
Data come mainly from direct submissions by the authors.
Submissions through the Internet:
Web forms. Email.
Sequences are shared/exchanged between the 3 centers on a daily basis: the sequence content of the banks is identical.
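One way to spot-check that the sequence content really is identical is to compare a checksum of the same entry as retrieved from each center. The sketch below is only an illustration: the three sequences are hypothetical toy strings, not real records, and the function names are made up. Only the sequence is compared, since the surrounding entry formatting can differ between banks.

```python
import hashlib


def sequence_checksum(sequence):
    """Return a SHA-256 digest of a sequence, ignoring case and whitespace."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def same_sequence(copies):
    """True if every copy in a {source: sequence} mapping has the same checksum."""
    return len({sequence_checksum(seq) for seq in copies.values()}) == 1


# Hypothetical copies of one entry as retrieved from the three centers.
copies = {
    "GenBank": "acgtacgt ttgacc",
    "EMBL": "ACGTACGTTTGACC",
    "DDBJ": "acgtacgtttgacc\n",
}
print(same_sequence(copies))
```

Normalizing case and whitespace before hashing keeps layout differences (line wrapping, spacing in blocks of ten bases) from being reported as sequence differences.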
-
Derived databases
CUTG: Codon usage tabulated from GenBank
http://www.kazusa.or.jp/codon/
Genetic Codes: Deviations from the standard genetic code in various organisms and organelles
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
TIGR Gene Indices: Organism-specific databases of EST and gene sequences
http://www.tigr.org/tdb/tgi.shtml
UniGene: Unified clusters of ESTs and full-length mRNA sequences
http://www.ncbi.nlm.nih.gov/UniGene/
ASAP: Alternatively spliced isoforms
http://www.bioinformatics.ucla.edu/ASAP
Intronerator: Introns and alternative splicing in C. elegans and C. briggsae
http://www.cse.ucsc.edu/~kent/intronerator/
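To make the idea of a derived database concrete, the sketch below shows the kind of analysis behind a codon usage table: counting codons in a coding sequence taken from a primary database. It is a minimal illustration, not the method used by CUTG. The toy sequence and the function names are hypothetical, and the per-thousand scaling is just one common way to report such frequencies.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def per_thousand(counts):
    """Convert raw counts to frequency per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Hypothetical toy coding sequence: ATG GCT GCT GCC TAA
counts = codon_usage("atggctgctgcctaa")
print(counts.most_common())
print(per_thousand(counts)["GCT"])
```

A real table would sum these counts over all annotated coding sequences of an organism, which is why it qualifies as a secondary (derived) resource.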
-
Nucleic acid structure databases
NDB: Nucleic acid-containing structures
http://ndbserver.rutgers.edu/
NTDB: Thermodynamic data for nucleic acids
http://ntdb.chem.cuhk.edu.hk/
RNABase: RNA-containing structures from PDB and NDB
http://www.rnabase.org/
SCOR: Structural classification of RNA: RNA motifs by structure, function and tertiary interactions
http://scor.lbl.gov/
-
Database searching tips
Look for links to Help or Examples
Try Boolean searches
Be careful with UK/US spelling differences:
leukaemia vs leukemia
haemoglobin vs hemoglobin
colour vs color
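The two tips on Boolean searches and spelling differences can be combined: each search term is expanded into an OR group covering both spellings, and the groups are joined with AND. The sketch below is a hypothetical helper, not part of any database's interface. The variant table is an assumption to be extended as needed, and the exact Boolean syntax accepted varies between databases, so check each site's Help page.

```python
# Hypothetical table of UK/US spelling variants; extend as needed.
SPELLING_VARIANTS = {
    "leukaemia": "leukemia",
    "haemoglobin": "hemoglobin",
    "colour": "color",
}


def expand_term(term):
    """Return an OR group covering both spellings of a term, if known."""
    lowered = term.lower()
    for uk, us in SPELLING_VARIANTS.items():
        if lowered in (uk, us):
            return f"({uk} OR {us})"
    return lowered


def build_query(terms):
    """Combine terms with AND, expanding each known spelling variant."""
    return " AND ".join(expand_term(t) for t in terms)


print(build_query(["leukemia", "human"]))
```

Searching with the expanded query avoids missing entries that were annotated with the other spelling.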
-
Exercises
Study the statistics of the three primary nucleic acid databases: do they match?
Look for a gene of your interest in the three primary nucleic acid databases: compare the information given in each one of them.
Read the NAR DB paper and the NAR DB index site: search for different nucleic acid databases based on different search terms.
Self study:
http://www3.oup.co.uk/nar/database/c/
Download the NAR database paper (NARDB2004) from: ftp://cbag.sc.mahidol.ac.th/pub/Course_Materials/dinesh