![Page 1: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/1.jpg)
Informatics for Molecular Biologists
Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Service
Falk Library, Health Sciences Library System
University of Pittsburgh
![Page 2: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/2.jpg)
Molecular Biology Information Service
Falk Library of Health Sciences
Health Sciences Library System
University of Pittsburgh
200 Scaife Hall
Desoto and Terrace Streets
Pittsburgh, PA 15261
![Page 3: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/3.jpg)
Topics
• Searching tools
– Internet
– PubMed
• NCBI-developed bioinformatics tools
– Entrez Gene
• Structure visualization tools
– Cn3D
• Genome browsers
– UCSC Genome Browser
– NCBI Map Viewer
![Page 4: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/4.jpg)
Information search space
• Biomedical literature databases
• Molecular databases
• Organism whole genome sequences
![Page 5: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/5.jpg)
Literature database
• NCBI PubMed
– contains over 15 million citations dating back to the mid-1950s.
Search:
“apoptosis”: 130,476
“breast cancer”: 160,055
“p53”: 42,418
![Page 6: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/6.jpg)
Molecular databases
[Chart: Articles and Databases per year, 1996 to 2004; vertical axis 0 to 600]
![Page 7: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/7.jpg)
![Page 8: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/8.jpg)
Organisms whole genome sequences
http://www.genomesonline.org/
![Page 9: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/9.jpg)
Internet for Biologists
• Google vs Clusty
– Google: a single ranked list of search results
– Clusty: search results categorized into topical clusters
Vivísimo's clustering technology creates topical categories on the fly from the search results, using terms in the title, snippet, and any other available textual description in the search results themselves.
![Page 10: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/10.jpg)
Google Vs Clusty
• Search example: Pittsburgh
– Google
– Clusty
![Page 11: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/11.jpg)
Clusty
Clusters help you see your search results by topic, so you can zero in on exactly what you’re looking for or discover unexpected relationships between items.
![Page 12: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/12.jpg)
Search examples for Clusty
• SNP
• BLAST
• Lupus
![Page 13: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/13.jpg)
Web 2.0
• Website bookmark and tagging tool
– Del.icio.us: a social bookmarking web service for storing, sharing, and discovering web bookmarks.
![Page 15: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/15.jpg)
Medline searching tool
• PubMed vs ClusterMed
Search examples: macular degeneration, cell cycle, p53
![Page 16: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/16.jpg)
Molecular databases
• DNA Sequence Databases and Analysis Tools
• Enzymes and Pathways
• Gene Mutations, Genetic Variations and Diseases
• Genomics Databases and Analysis Tools
• Immunological Databases and Tools
• Microarray, SAGE, and other Gene Expression
• Organelle Databases
• Other Databases and Tools (Literature Mining, Lab Protocols, Medical Topics, and others)
• Plant Databases
• Protein Sequence Databases and Analysis Tools
• Proteomics Resources
• RNA Databases and Analysis Tools
• Structure Databases and Analysis Tools
![Page 17: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/17.jpg)
HSLS OBRC
• http://www.hsls.pitt.edu/guides/genetics/obrc/
![Page 18: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/18.jpg)
Types of databases
– By level of curation:
• Archival
– GenBank, GenPept, ssSNP
• Curated
– RefSeq, SwissProt, RefSNP
![Page 19: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/19.jpg)
Types of databases
– Archival data
• repository of information
• redundant; might have many sequence records for the same gene, each from a different lab
• submitters maintain editorial control over their records: what goes in is what comes out
• no controlled vocabulary
• variation in annotation of biological features
Example: GenBank record
![Page 20: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/20.jpg)
GenBank
• archival database of nucleotide sequences from >130,000 organisms
• records annotated with coding region (CDS) features also include amino acid translations
• each record represents the work of a single lab
• redundant; can have many sequence records for a single gene
![Page 21: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/21.jpg)
International Nucleotide Sequence Database Collaboration
![Page 22: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/22.jpg)
![Page 23: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/23.jpg)
Types of databases
![Page 24: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/24.jpg)
RefSeq
• Curated data
– non-redundant; one record for each gene, or each splice variant
– each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article
– records contain value-added information that has been added by one or more experts
![Page 25: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/25.jpg)
RefSeq
• Database of reference sequences
• Curated
• Non-redundant; one record for each gene, or each splice variant, from each organism represented
• A representative GenBank record is used as the source for a RefSeq record
• Value-added information is added by one or more experts
• Each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article
• Variety of accession number prefixes (NM_ , NP_ , etc.) and status codes (provisional, reviewed, etc.). More about those in later slides.
• The RefSeq database includes genomic DNA, mRNA, and protein sequences, so it organizes information according to the central dogma of biology
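The accession prefixes mentioned above can be read mechanically. The sketch below is illustrative only: the table is a partial list of commonly seen RefSeq prefixes, and the function name `describe_accession` is a hypothetical helper, not part of any NCBI tool.

```python
# Partial, illustrative table of RefSeq accession prefixes.
# RefSeq defines more prefixes than are listed here.
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule (for example, a chromosome)",
    "NG_": "genomic region",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA (model)",
    "XR_": "predicted non-coding RNA (model)",
    "XP_": "predicted protein (model)",
}


def describe_accession(accession: str) -> str:
    """Return the molecule type implied by a RefSeq accession prefix."""
    prefix = accession.strip()[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "not a RefSeq prefix in this table")


if __name__ == "__main__":
    # Both accessions appear elsewhere in these slides.
    for acc in ("NP_247556", "NP_000537.2"):
        print(acc, "->", describe_accession(acc))
```

The genomic, mRNA, and protein prefixes mirror the DNA, RNA, and protein layers that RefSeq uses to organize its records.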
![Page 26: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/26.jpg)
RefSeq
![Page 27: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/27.jpg)
Searching GenBank
• Find the messenger RNA sequence for the human epidermal growth factor (EGF) gene.
![Page 28: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/28.jpg)
Database developers
• NCBI
• EBI
![Page 29: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/29.jpg)
![Page 30: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/30.jpg)
Neighbors and Hard Links
[Diagram: Entrez databases (Genomes, Taxonomy, PubMed abstracts, Nucleotide sequences, Protein sequences, 3-D Structure) connected by hard links and by neighbor relationships computed with word weight, BLAST, VAST, and phylogeny]
Source: NCBI
![Page 31: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/31.jpg)
NCBI Tools
![Page 32: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/32.jpg)
![Page 33: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/33.jpg)
Entrez Gene
NCBI’s database for gene-centric information. It focuses on genomes that:
• are completely sequenced
• have an active research community to contribute gene-specific information
• are scheduled for intense sequence analysis
– Total Taxa: 4246; Total Genes: 2,843,587
• 160,000 organisms in the nucleotide sequence database (GenBank)
![Page 34: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/34.jpg)
Entrez Gene
• each record represents a single gene from a given organism
Gene record includes:
– a unique identifier or GeneID assigned by NCBI
– a preferred symbol
– and any one or more of:
– sequence information
– map information
– official nomenclature from an authority list
– alternate gene symbols
– summary of gene/protein function
– published references that provide additional information on function
– expression
– homology data
– and more
![Page 35: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/35.jpg)
[Diagram: a central Gene / Protein node linked to the kinds of information available for it: SNP, genomic sequence, exon-intron structure, expression profile, interacting partners, 3D structure, mRNA sequence, chromosomal localization, disease, amino acid sequence, homologous sequences]
![Page 36: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/36.jpg)
Searching Entrez Gene
![Page 37: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/37.jpg)
Entrez Gene
Find:
• gene symbols and aliases
• sequences: genomic, mRNA, protein
• intron-exon architecture
• genomic context: neighboring and antisense genes
• interacting partners
• associated Gene Ontology terms: function, cellular component, and biological process
![Page 38: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/38.jpg)
Entrez Gene record
Query: BRCA1
Search Tips:
Query text box: BRCA1
Limits:
• To limit your search to a specific field, select “Gene name” from the drop-down menu
• Limit by taxonomy: select “Homo sapiens”
Name and aliases
Chromosomal location
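The same field-limited query can also be written as a single search string. The sketch below only builds a URL for NCBI's E-utilities `esearch` endpoint and makes no network request. The slides describe the web form, not E-utilities, so this is an assumed programmatic equivalent: the field tags `[Gene Name]` and `[Organism]` are assumptions about how the form's limits translate, and `gene_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol: str, organism: str) -> str:
    """Build an esearch URL that limits a Gene query by field and taxonomy."""
    # Assumed field tags: [Gene Name] for the field limit, [Organism] for taxonomy.
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    return ESEARCH + "?" + urlencode({"db": "gene", "term": term})


if __name__ == "__main__":
    print(gene_search_url("BRCA1", "Homo sapiens"))
```

Pasting only the `term` value into the Entrez Gene query box should be roughly equivalent to setting the two limits by hand.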
![Page 39: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/39.jpg)
Source: NCBI
![Page 40: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/40.jpg)
Entrez Gene: sequences and genomic context
Sequences: mRNA, Genomic, Protein
mRNA Seq
Protein Seq
Genomic Seq
![Page 41: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/41.jpg)
Transcription and alternative splicing
Alternative splicing: http://www.exonhit.com/UserFiles/Image/epissage.swf?PHPSESSID=d9u8tiu2sioqa8u29bkop3l0l2
![Page 42: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/42.jpg)
Entrez Gene: intron-exon architectures
Tips: Change Display to “Gene Table” from “Summary”
![Page 43: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/43.jpg)
Genomic Seq
mRNA Seq
Protein Seq
![Page 44: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/44.jpg)
Gene Ontology
– Controlled vocabulary tagging
• Function
• Biological Processes
• Cellular Component
![Page 45: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/45.jpg)
Entrez Gene : Gene Ontology
![Page 46: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/46.jpg)
Homologous sequences
![Page 47: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/47.jpg)
Entrez Gene: Homologous sequence
Tips: change Display settings from “Summary” to “Alignment score” to “Multiple Alignment”
![Page 48: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/48.jpg)
Single nucleotide polymorphisms
Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA.
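The example substitution can be located by comparing the two sequences position by position. This is a minimal sketch that assumes both sequences have the same length and are already aligned; `find_substitutions` is a hypothetical helper name, and real SNP calling works on aligned reads rather than on two short strings.

```python
def find_substitutions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (position, ref_base, var_base)
        for position, (ref_base, var_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != var_base
    ]


if __name__ == "__main__":
    # The two sequences from the slide differ at position 2 (A to T).
    print(find_substitutions("AAGGCTAA", "ATGGCTAA"))
```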
![Page 49: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/49.jpg)
SNPs
![Page 50: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/50.jpg)
Coding SNPs
![Page 51: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/51.jpg)
Entrez Gene: SNPs
![Page 52: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/52.jpg)
Protein Info: HPRD
![Page 53: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/53.jpg)
Protein Info: HPRD
![Page 54: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/54.jpg)
Entrez Gene: Links
![Page 55: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/55.jpg)
Entrez Gene: Linkout
![Page 56: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/56.jpg)
Seq to Entrez Gene: UCSC BLAT
Query Seq: SGLTPEEFMLVYKFARKHHITLTNLITEE
![Page 57: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/57.jpg)
BLAT to Entrez Gene
![Page 58: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/58.jpg)
Hands-On Exercise Question
Find the chromosomal location of your gene of interest. How many exons have been reported for your gene? What are its neighboring genes?
Query sequence: IHYNYMCNSSCMGGMNRRPILTII
![Page 59: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/59.jpg)
Exercise:
Find the protein sequence for rat leptin. BLAT this sequence against the human genome to find the human homolog. Look for SNPs in the coding region of this gene. Are there any?
![Page 60: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/60.jpg)
Sequence alignment
• Pairwise alignment
• Multiple alignment
![Page 61: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/61.jpg)
Pairwise alignment
• Global
– Needleman-Wunsch (1970)
• Local
– Smith-Waterman (1981)
– Lipman and Pearson / FASTA (1985)
– Basic Local Alignment Search Tool (BLAST: 1990)
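As a rough illustration of global alignment, here is a minimal Needleman-Wunsch sketch. The scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for readability; production tools use substitution matrices such as BLOSUM62 and affine gap penalties. The local (Smith-Waterman) variant differs mainly in flooring cell scores at zero and tracing back from the highest-scoring cell.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Return the optimal global alignment score and one optimal alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the matrix: each cell is the best of diagonal, up, or left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            substitution = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, row_a, row_b = needleman_wunsch("AAGGCTAA", "ATGGCTAA")
    print(total)
    print(row_a)
    print(row_b)
```

The full matrix costs time and memory proportional to the product of the two sequence lengths, which is one reason database searches rely on heuristic methods such as FASTA and BLAST.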
![Page 62: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/62.jpg)
BLAST
To find homologous sequences for a sequence of interest by searching sequence databases:
Nucleotide:
TTGGATTATTTGGGGATAATAATGAAGATAGCAATTATCTCAGGGAAAGGAGGAGTAGGAAAATCTTCTA TTTCAACATCCTTAGCTAAGCTGTTTTCAAAAGAGTTTAATATTGTAGCATTAGATTGTGATGTTGAT
Protein:
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFE NELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLG
![Page 63: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/63.jpg)
BLAST
• Find statistically significant matches, based on sequence similarity, to a protein or nucleotide sequence of interest.
• Obtain information on the inferred function of the gene or protein.
• Find conserved domains in your sequence of interest that are common to many sequences.
• Compare two known sequences for similarity.
![Page 64: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/64.jpg)
What you can do with BLAST
• Find homologous sequences in all combinations (DNA/Protein) of query and database.
– DNA vs DNA
– DNA translation vs Protein
– Protein vs Protein
– Protein vs DNA translation
– DNA translation vs DNA translation
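Each of the five combinations corresponds to a standard BLAST program. The lookup below is a small sketch of that mapping; the program names are the standard ones, while the key strings and the function name `choose_blast_program` are illustrative choices.

```python
# Keys are (query type, database type); values are standard BLAST program names.
BLAST_PROGRAMS = {
    ("dna", "dna"): "blastn",
    ("translated dna", "protein"): "blastx",
    ("protein", "protein"): "blastp",
    ("protein", "translated dna"): "tblastn",
    ("translated dna", "translated dna"): "tblastx",
}


def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the BLAST program for a query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no BLAST program for {query_type} vs {database_type}")
    return BLAST_PROGRAMS[key]


if __name__ == "__main__":
    for query_type, database_type in BLAST_PROGRAMS:
        print(f"{query_type} vs {database_type}: "
              f"{choose_blast_program(query_type, database_type)}")
```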
![Page 65: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/65.jpg)
BLAST exercise
• Find homologous sequences for uncharacterized archaebacterial protein, NP_247556, from Methanococcus jannaschii
![Page 66: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/66.jpg)
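BLAST search
[Annotated screenshot of BLAST results: descriptions of hits sorted by E value (top hit 2 × 10^-65), each with a sequence description and a link to Entrez. The cutoff on the number of descriptions displayed (100) overrides the E value cutoff (10).]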
![Page 67: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/67.jpg)
BLAST search
• Orthologs from closely related species will have the highest scores and lowest E values
– Often E = 10^-30 to 10^-100
• Closely related homologs with highly conserved function and structure will have high scores
– Often E = 10^-15 to 10^-50
• Distantly related homologs may be hard to identify
– Less than E = 10^-4
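The link between high scores and low E values follows from how BLAST statistics relate a bit score to the expected number of chance hits: E is approximately m × n × 2^(-S'), where S' is the bit score, m the query length, and n the database length. The sketch below applies that relation directly. It is a simplification that ignores the effective-length corrections BLAST itself applies, and the example lengths are made-up illustrative values.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical sizes: a 300-residue query against a 1-billion-residue database.
    for bits in (50, 100, 200):
        print(bits, expect_value(bits, 300, 1_000_000_000))
```

Each additional bit of score halves the E value, while doubling the database doubles it, so the same alignment becomes less significant as databases grow.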
![Page 68: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/68.jpg)
Protein domains
• Wikipedia
SH2: Src homology 2 domains. Involved in signal transduction through recognition of phosphorylated tyrosine (pTyr). SH2 domains typically bind pTyr-containing ligands via two surface pockets, a pTyr pocket and a hydrophobic binding pocket, allowing proteins with SH2 domains to localize to tyrosine-phosphorylated sites.
![Page 69: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/69.jpg)
Searching CDD
• CDD SEARCH
Query sequence:
![Page 70: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/70.jpg)
BLink
• BLink displays the graphical output of pre-computed blastp results against the protein non-redundant (nr) database. This graphical output includes:
– Alignment of up to 200 BLAST hits on the query sequence
– Best hits to each organism
– List of known protein domains in the query sequence
– Filter hits by selecting the BLAST cutoff score
– Distribution of hits by taxonomic grouping
– Display of similar sequences with known 3D structure
– Filter hits by database and/or by taxonomic grouping
– Display a taxonomic tree of all organisms with similar sequences
Access: Link out from NCBI protein records
Link to TP53 BLink: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NP_000537.2&dopt=gp
![Page 71: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/71.jpg)
Protein structure
![Page 72: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/72.jpg)
Protein Data Bank (PDB)
• international database of 3-D biological macromolecular structures
• accepts direct submissions of structure data
• maintained by a nonprofit organization, the Research Collaboratory for Structural Bioinformatics (RCSB), associated with Rutgers University, San Diego Supercomputer Center, and the Biotechnology Division of the National Institute of Standards and Technology
• contains molecular structures of proteins and nucleic acids, primarily structures determined experimentally by X-ray crystallography and NMR
• also includes some theoretical models, though they are not encouraged.
![Page 73: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/73.jpg)
3D structure viewing software
• NCBI Cn3D
The Cn3D home page includes a link in the blue sidebar for instructions on installing Cn3D, which is available for PC, Mac, and Unix.
• FirstGlance in Jmol
A simple tool for macromolecular visualization.
![Page 74: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/74.jpg)
Cn3D
• View the 3-dimensional structure for 1TUP and practice using some of the Cn3D features that allow you to:
– spin the structure using your mouse
– use the control+left mouse button combination to zoom in and out of the structure
– use the shift+left mouse button combination to move the structure across the viewing window
– use the Style menu to render the structure in different ways (e.g., worms, space fill, ball and stick, ...)
– use the Style menu to color the structure in different ways (e.g., secondary structure, domain, ...)
– use Style/Edit Global Style to label every 20th amino acid
![Page 75: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/75.jpg)
What is it?
A genome browser is a computer program that displays gene maps, lets you browse chromosomes, and aligns genes or gene models with ESTs, contigs, etc.
![Page 76: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/76.jpg)
Genome Sequence Project Time Line
1976: RNA Bacteriophage MS2
1995: Haemophilus influenzae
2003: Human genome reference sequence
2005: 265 genomes; 21 archaeal, 211 bacterial, 33 eukaryotic
![Page 77: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/77.jpg)
http://www.genomesonline.org/
![Page 78: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/78.jpg)
![Page 79: Informatics for Molecular Biologists Ansuman Chattopadhyay,PhD Head, Molecular Biology Information Service Falk Library, Health Sciences Library System](https://reader035.vdocuments.net/reader035/viewer/2022062423/56649f515503460f94c74039/html5/thumbnails/79.jpg)
Genome Browsers
• NCBI Map Viewer
• EBI Ensembl
• UCSC Genome Browser