A Sequence Retrieving and Manipulation Network

Upload: barbie

Post on 24-Feb-2016

DESCRIPTION

Databases. Software. Retrieval System. Information. Formats. A Sequence Retrieving and Manipulation Network. DNA: NCBI-GenBank, DDBJ, EBI-EMBL. Protein: PIR, SWISS-PROT, ExPASy, PDB. Software: GCG, SeqWEB, Vector NTI, GenoMAX. Retrieval: Entrez, SRS. Formats: GenBank, GCG, FASTA, Staden, Image. Sequence Converter.

TRANSCRIPT

Page 1: A Sequence Retrieving and Manipulation Network
Page 3: A Sequence Retrieving and Manipulation Network

IAM: International Advisory Meeting
ICM: International Collaborative Meeting

GenBank/EMBL/DDBJ: International Nucleotide Sequence Database

EMBL: European Molecular Biology Laboratory
EBI: European Bioinformatics Institute

DDBJ: DNA Data Bank of Japan
CIB: Center for Information Biology and DNA Data Bank of Japan
NIG: National Institute of Genetics

NCBI: National Center for Biotechnology Information
NLM: National Library of Medicine

Page 4: A Sequence Retrieving and Manipulation Network

http://www.ncbi.nlm.nih.gov/genbank/

Page 5: A Sequence Retrieving and Manipulation Network

Secondary Databases

Page 6: A Sequence Retrieving and Manipulation Network

Secondary Databases

Page 8: A Sequence Retrieving and Manipulation Network

Database Retrieving and Manipulation Network

Databases: Literature Database; Sequence Databases (Primary Databases, Secondary Databases)

Software: GCG, Vector NTI, CLC, Open Sources, EndNote, MS Office, Adobe

Retrieval System: Query by 1. Text 2. Sequence

Information: Sequence, Structure, Image, Document

Formats: GenBank, GCG, FASTA, Staden, Image; Sequence Converter

Page 9: A Sequence Retrieving and Manipulation Network

Fuzzy search (approximate string matching)
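One common way to implement approximate string matching is the Levenshtein (edit) distance. The sketch below is only an illustration of that idea, not the method used by any particular database; the function names, the `max_distance` threshold, and the sample terms are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


def fuzzy_search(query, terms, max_distance=2):
    """Return the terms within max_distance edits of query, closest first.
    max_distance is a hypothetical cutoff chosen for illustration."""
    q = query.lower()
    scored = sorted((edit_distance(q, t.lower()), t) for t in terms)
    return [t for d, t in scored if d <= max_distance]


if __name__ == "__main__":
    # A misspelled query still finds the intended term.
    print(fuzzy_search("legumian", ["legumain", "cyclophilin", "legume"]))
```

With the threshold of 2, the misspelling "legumian" still retrieves "legumain", while "legume" (3 edits away) is excluded.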

Page 10: A Sequence Retrieving and Manipulation Network

Literature Databases

Page 11: A Sequence Retrieving and Manipulation Network

Sequence Comparison

Nucleotide sequence alignments

137 AGACCAACCTGGCCAACATGGTGAAATCCCATCTCTAC.AAAAATACAAA 185
    |||||| |||||||||||||||||||  ||||||||||  ||||||||||
  1 AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA 50

match (|), mismatch (blank), gap (.)

Protein sequence alignments

Conserved substitution
Residues with shared chemical properties can substitute for each other
Size, charge, hydrophobicity, polarity
Scored less than a match, but better than a mismatch
Conservative changes scored as better than non-conservative

                   10        20        30        40        50        60
ggamma.pep MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
           |||||||||||||||||:|||::|||||:|||||:|||||||||||||||||||||||||
HGCZG      MGHFTEEDKATITSLWGHVNVDEAGGETIGRLLVLYPWTQRFFDSFGNLSSASAIMGNPK
                   10        20        30        40        50        60
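The match line between two aligned protein sequences can be sketched in a few lines: "|" for identical residues, ":" for a conservative substitution, and a blank otherwise. The residue groups below are a simplified, hypothetical grouping by chemical property chosen for illustration; real programs derive these marks from a scoring matrix.

```python
# Hypothetical groups of residues with shared chemical properties
# (basic, acidic, aliphatic, aromatic, hydroxyl, amide). Illustration only.
GROUPS = ["KRH", "DE", "ILVM", "FWY", "ST", "NQ"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}

TOP = "MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK"
BOTTOM = "MGHFTEEDKATITSLWGHVNVDEAGGETIGRLLVLYPWTQRFFDSFGNLSSASAIMGNPK"


def match_line(top: str, bottom: str) -> str:
    """Build the line drawn between two already-aligned sequences."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for x, y in zip(top, bottom):
        if x == "." or y == ".":
            marks.append(" ")   # gap
        elif x == y:
            marks.append("|")   # identical residues
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            marks.append(":")   # conservative substitution
        else:
            marks.append(" ")   # non-conservative mismatch
    return "".join(marks)


if __name__ == "__main__":
    print(TOP)
    print(match_line(TOP, BOTTOM))
    print(BOTTOM)
```

For the two globin sequences from the slide, this grouping marks K/H, E/D, D/E, L/I and V/L as conservative substitutions.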

Page 12: A Sequence Retrieving and Manipulation Network

Pairwise Comparison

Local Alignment vs Global Alignment

Local Alignment: compares regions within two sequences and can return several matches

Global Alignment: compares entire sequences

BLAST

FASTA
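The difference between global and local alignment can be illustrated with the classic dynamic-programming scores (Needleman-Wunsch for global, Smith-Waterman for local). This is a minimal sketch with an assumed scoring scheme (match +1, mismatch -1, gap -1). BLAST and FASTA are faster heuristic programs and do not work this way internally.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of a and b.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of subregions.
    The scoring values are illustrative assumptions.
    """
    # Row 0: aligning "" against prefixes of b.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best if local else prev[-1]


if __name__ == "__main__":
    x, y = "TTTACGTTTT", "CCACGCC"
    print("global:", align_score(x, y))
    print("local: ", align_score(x, y, local=True))
```

The local score rewards the shared region "ACG" and ignores the unrelated flanks, while the global score is penalised for them.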

Page 13: A Sequence Retrieving and Manipulation Network

Query by sequence

Program / Query / Database

blastp
  Query: amino acid sequence
  Database: protein sequence database

blastn
  Query: nucleotide sequence
  Database: nucleotide sequence database

blastx
  Query: nucleotide sequence translated in all reading frames
  Database: protein sequence database (use this option to find potential translation products of an unknown nucleotide sequence)

tblastn
  Query: amino acid sequence
  Database: nucleotide sequence database translated in all reading frames

tblastx
  Query: six-frame translations of a nucleotide sequence
  Database: six-frame translations of a nucleotide sequence database (the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive)
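The "six-frame translation" used by blastx, tblastn and tblastx can be sketched as follows: three reading frames on the forward strand and three on the reverse complement. The code assumes the standard genetic code and plain A/C/G/T input; the function names are illustrative, and unknown codons are written as "X".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq ('*' marks a stop)."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCC").items():
        print(name, protein)
```

A translated search compares each of these six protein strings against the database, which is why tblastx (six frames on both query and database) is so computationally intensive.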

Page 14: A Sequence Retrieving and Manipulation Network
Page 15: A Sequence Retrieving and Manipulation Network

http://www.ncbi.nlm.nih.gov/About/glance/index.html

Page 16: A Sequence Retrieving and Manipulation Network
Page 17: A Sequence Retrieving and Manipulation Network
Page 18: A Sequence Retrieving and Manipulation Network

http://www.ncbi.nlm.nih.gov/sites/gquery

Page 19: A Sequence Retrieving and Manipulation Network
Page 20: A Sequence Retrieving and Manipulation Network

Literature Databases
http://www.ncbi.nlm.nih.gov/omim

Page 21: A Sequence Retrieving and Manipulation Network
Page 22: A Sequence Retrieving and Manipulation Network
Page 24: A Sequence Retrieving and Manipulation Network
Page 25: A Sequence Retrieving and Manipulation Network

http://www.ebi.ac.uk/

Page 26: A Sequence Retrieving and Manipulation Network
Page 27: A Sequence Retrieving and Manipulation Network

http://www.ebi.ac.uk/

EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry.

Page 28: A Sequence Retrieving and Manipulation Network
Page 29: A Sequence Retrieving and Manipulation Network
Page 31: A Sequence Retrieving and Manipulation Network

http://www.ebi.ac.uk/intact/pages/interactions/interactions.xhtml?query=EBI-1799550&filter=ac

Page 32: A Sequence Retrieving and Manipulation Network

Metabolic & Signalling Pathways

Kyoto Encyclopedia of Genes & Genomes (KEGG)
http://www.genome.ad.jp/kegg/

Page 33: A Sequence Retrieving and Manipulation Network
Page 34: A Sequence Retrieving and Manipulation Network

http://www.genome.jp/kegg-bin/show_pathway?map04115

Page 35: A Sequence Retrieving and Manipulation Network

Metabolic & Signalling Pathways

BioCarta (http://biocarta.com)

Page 36: A Sequence Retrieving and Manipulation Network

http://www.ihop-net.org/UniPub/iHOP/

Page 37: A Sequence Retrieving and Manipulation Network
Page 38: A Sequence Retrieving and Manipulation Network

Minimal information for this gene

Most recent information for this gene

Interaction information for this gene

Defining information for this gene

Page 39: A Sequence Retrieving and Manipulation Network
Page 40: A Sequence Retrieving and Manipulation Network

January each year

Page 41: A Sequence Retrieving and Manipulation Network

Software & Sequence Formats

Program: WWW SeqWEB; GCG; Vector NTI; CLC Genomics

Formats (Default / Accept / Multiple sequence):

text file (paste & copy)
GCG file
FASTA
Multiple sequence file (msf)
GenBank
Rich sequence file (rsf)
EMBL
List files (lst)
Staden
SwissProt

Page 42: A Sequence Retrieving and Manipulation Network

Retrieve Sequences in GCG

Fetch
Copies GCG sequences or data files from the GCG database into your directory or displays them on your terminal screen.
Syntax: % fetch [-Infile=]database:accession number
Example: fetch gb:l10131

SeqEd
An interactive editor for entering and modifying sequences and for assembling parts of existing sequences into new genetic constructs

Page 43: A Sequence Retrieving and Manipulation Network

Importing and Exporting

You need an FTP program to transfer files between your PC and GCG. The sequence file must be in “plain text” format.

chopup: converts a non-GCG format sequence file containing lines longer than 511 characters (and as long as 32,000 characters) into a new file containing lines no longer than 50 characters.

breakup: reads a non-GCG format sequence file containing more than 350,000 sequence characters and writes it as a set of separate, shorter, overlapping sequence files that can be analyzed by GCG.

reformat: rewrites sequence files, scoring matrix files, or enzyme data files so that they can be read by GCG programs.

fromfasta: reformats one or more sequences from FastA format into single sequence files in GCG format.
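The ideas behind chopup (rewrap long lines) and breakup (split a long sequence into overlapping pieces) can be sketched in Python. This is only an illustration of the two operations, not the GCG implementation; the function names and the `size` and `overlap` values are assumptions, since the slides do not say how much overlap breakup uses.

```python
def wrap_lines(text: str, width: int = 50) -> str:
    """Rewrap text so that no line is longer than width characters
    (the operation chopup performs on over-long lines)."""
    out = []
    for line in text.splitlines():
        if not line:
            out.append("")  # keep blank lines
        for start in range(0, len(line), width):
            out.append(line[start:start + width])
    return "\n".join(out)


def overlapping_chunks(seq: str, size: int, overlap: int) -> list:
    """Split seq into pieces of at most size characters, where each piece
    repeats the last overlap characters of the previous one
    (the kind of split breakup performs)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(seq), step):
        chunks.append(seq[start:start + size])
        if start + size >= len(seq):
            break  # the last piece already reaches the end
    return chunks


if __name__ == "__main__":
    print(wrap_lines("ACGT" * 30))                 # 120 characters -> 50, 50, 20
    print(overlapping_chunks("ABCDEFGHIJ", 4, 1))  # hypothetical size and overlap
```

The overlap matters because a feature that spans a cut point would otherwise be split across two files and missed by later analysis.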

Page 44: A Sequence Retrieving and Manipulation Network

Exercise 03-1

(A) Transfer sequence files from your PC to GCG
(B) Chopup the sequence
(C) Reformat the sequence
(D) Edit the sequence

Create a folder “BIO” on your hard disk
Start WsFTP (ftp://bioinfo.nhri.org.tw)
Upload “naq.txt” & “psq.txt” to GCG
Start Netterm
Start GCG
Chopup “naq.txt” & “psq.txt”
Reformat “naq.dat” or “psq.dat”
Cat “naq.txt” or “psq.txt”

Page 45: A Sequence Retrieving and Manipulation Network

Exercise 03-3

Sequence Manipulation in GCG UNIX

Use the database searching techniques you learned today to retrieve the reference sequence of Homo sapiens LEGUMAIN and the amino acid sequences of ALL LEGUMAIN from NCBI and EMBL, and then transfer the sequence(s) to 1. SeqWEB and 2. GCG Unix (in GCG format).

There are many different ways to do it. You can have your lunch now if you can make it.
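One of those ways is to retrieve a record from NCBI programmatically through the Entrez E-utilities `efetch` service instead of the web pages. The sketch below only shows the idea; the helper names are hypothetical, and the accession L10131 is the one from the Fetch example earlier in these slides, not the legumain reference sequence.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an efetch URL for one accession.
    db="nuccore" is the nucleotide database, db="protein" the protein one;
    rettype="fasta" requests FASTA text, rettype="gb" a GenBank flat file."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_record(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Download the record as text (needs network access)."""
    with urlopen(efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print the URL only; call fetch_record("L10131") to download the record.
    print(efetch_url("L10131"))
```

The downloaded FASTA text is plain text, so it can be uploaded to GCG by FTP and converted with fromfasta as described above.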

Page 46: A Sequence Retrieving and Manipulation Network

ASSIGNMENT 1.

Use the Entrez searching techniques you learned today to retrieve the reference sequence and the corresponding amino acid sequences of

All the subclasses of Homo sapiens cyclophilin

Transfer the sequences to GCG Unix and transform the sequences to GCG format.

E-mail
1. The steps (including the URLs of the WWW sites) you used and
2. The sequences in GCG format as an attached file
to [email protected] before next Thursday 12:00

Mail subject: ASS1 bioinfo – (student ID)