bioinformatics2

Upload: ksboopathi

Post on 05-Apr-2018


  • 7/31/2019 Bioinformatics2

    1/16

    Tools to analyze protein characteristics

    Protein sequence

    -Family member

    -Multiple alignments

    Identification of

    conserved regions

    Evolutionary

    relationship (Phylogeny)

    3-D fold model

    Protein sorting and

    sub-cellular localization

    Anchoring into

    the membrane

    Signal sequence

    (tags)

    Some nascent proteins contain a specific signal, or targeting sequence, that directs them to the correct destination (ER, mitochondrion, chloroplast, lysosome, vacuole, Golgi, or cytosol).

    Questions

    Can we train the computers:

    To detect signal sequences and predict protein destination?

    To identify conserved domains (or a pattern) in proteins?

    To predict the membrane-anchoring type of a protein? (Transmembrane domain, GPI anchor)

    To predict the 3D structure of a protein?

    Learning algorithms are good for solving problems in pattern recognition because they can be trained on a sample data set.

    Classes of learning algorithms:

    -Artificial neural networks (ANNs)

    -Hidden Markov Models (HMM)

    Artificial neural networks (ANN)

    Machine learning algorithms that mimic the brain. Real brains, however, are orders of magnitude more complex than any ANN so far considered.

    ANNs, like people, learn by example. ANNs cannot be programmed to perform a specific task.

    An ANN is composed of a large number of highly interconnected processing elements (neurons) working simultaneously to solve specific problems.

    The first artificial neuron was developed in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts.
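To make "learning by example" concrete, here is a minimal sketch of a single artificial neuron (a perceptron) trained on a toy data set. The logical AND task, the learning rate, and all function names are assumptions chosen for illustration; they do not come from the slides, and real predictors use far larger networks.

```python
# Minimal sketch: one artificial neuron trained by example (perceptron rule).
# The AND task and all names below are illustrative assumptions.

def step(x):
    """Threshold activation: the neuron fires (1) when its input is >= 0."""
    return 1 if x >= 0 else 0

def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, epochs=20, lr=1):
    """samples: list of (inputs, target) pairs. Returns (weights, bias)."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Nudge each weight toward the correct answer for this example.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

Nothing in the code states the AND rule explicitly; the weights are adjusted only from the examples, which is the sense in which the network is trained rather than programmed.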

    Hidden Markov Models (HMM)

    An HMM is a probabilistic process over a set of states, in which the states are hidden. Only the outcome is visible to the observer, hence the name Hidden Markov Model.

    HMMs have many uses in genomics:

    Gene prediction (GENSCAN)

    Signal peptide prediction (SignalP)

    Finding periodic patterns

    Used to answer questions like:

    What is the probability of obtaining a particular outcome?

    What is the best model from many combinations?
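The first question above is what the forward algorithm computes, and a closely related question (which hidden state path best explains the outcome) is answered by the Viterbi algorithm. The sketch below uses a made-up two-state model in which a "signal" region emits mostly hydrophobic residues (H) and a "mature" region emits mostly polar ones (P). The states, symbols, and every probability are invented for illustration and are not taken from SignalP or GENSCAN.

```python
from itertools import product

# Toy HMM: all states, symbols and probabilities are illustrative assumptions.
STATES = ("signal", "mature")
START = {"signal": 0.6, "mature": 0.4}
TRANS = {"signal": {"signal": 0.8, "mature": 0.2},
         "mature": {"signal": 0.0, "mature": 1.0}}
EMIT = {"signal": {"H": 0.8, "P": 0.2},   # H = hydrophobic, P = polar
        "mature": {"H": 0.4, "P": 0.6}}

def forward(obs):
    """Probability of the observed string, summed over all hidden paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for symbol in obs[1:]:
        alpha = {s: EMIT[s][symbol] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())

def viterbi(obs):
    """Most probable hidden state path for the observed string."""
    best = {s: (START[s] * EMIT[s][obs[0]], [s]) for s in STATES}
    for symbol in obs[1:]:
        new_best = {}
        for s in STATES:
            # Best predecessor state for s, judged by path probability so far.
            prob, path = max(((best[r][0] * TRANS[r][s], best[r][1]) for r in STATES),
                             key=lambda item: item[0])
            new_best[s] = (prob * EMIT[s][symbol], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Sanity check: probabilities of all length-3 outcomes should add up to 1.
total = sum(forward("".join(obs)) for obs in product("HP", repeat=3))
path = viterbi("HHHPPP")
```

The observer only ever sees the H/P string; the state labels returned by `viterbi` are an inference about the hidden layer.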

    The ExPASy (Expert Protein Analysis System) server (http://au.expasy.org) is dedicated to the analysis of protein sequences and structures.

    Sequence analysis tools include:

    DNA -> Protein [Translate]

    Pattern and profile searches

    Post-translational modification and topology prediction

    Primary structure analysis

    Structure prediction (2D and 3D)

    Alignment
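As an illustration of the first tool in the list above (DNA -> Protein), the following is a minimal sketch of translation with the standard genetic code. It is not the ExPASy Translate implementation; the function name, the `frame` parameter, and the use of "X" for unreadable codons are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna, frame=0):
    """Translate a DNA (or RNA) string in the given forward frame (0, 1 or 2).

    Stop codons are written as '*'; codons with unknown letters become 'X'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)

example = translate("ATGGCCTAA")
```

A full tool would also translate the three frames of the reverse complement; that step is left out here to keep the sketch short.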

    PredictProtein: A service for sequence analysis and structure prediction: http://www.predictprotein.org/newwebsite/submit.html

    TMpred: http://www.ch.embnet.org/software/TMPRED_form.html

    TMHMM: Predicts transmembrane helices in proteins (CBS; Denmark): http://www.cbs.dtu.dk/services/TMHMM-2.0/

    big-PI: Predicts GPI-anchor site: http://mendel.imp.univie.ac.at/sat/gpi/gpi_server.html

    DGPI: Predicts GPI-anchor site: http://129.194.185.165/dgpi/index_en.html

    SignalP: Predicts signal peptide: http://www.cbs.dtu.dk/services/SignalP/

    PSORT: Predicts sub-cellular localization: http://www.psort.org/

    TargetP: Predicts sub-cellular localization: http://www.cbs.dtu.dk/services/TargetP/

    NetNGlyc: Predicts N-glycosylation sites: http://www.cbs.dtu.dk/services/NetNGlyc/

    PTS1: Predicts peroxisomal targeting sequences: http://mendel.imp.univie.ac.at/mendeljsp/sat/pts1/PTS1predictor.jsp

    MITOPROT: Predicts mitochondrial targeting sequences: http://ihg.gsf.de/ihg/mitoprot.html

    Hydrophobicity: http://www.vivo.colostate.edu/molkit/hydropathy/index.html
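The hydrophobicity entry above refers to a hydropathy plot: a sliding-window average of per-residue hydrophobicity along the sequence. The sketch below assumes the Kyte-Doolittle scale and a window of 9 residues; the slide names only the server, so the choice of scale, the window size, and the function name are all assumptions.

```python
# Kyte-Doolittle hydropathy indices (assumed scale; the slide does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Average hydropathy of every full window along the sequence.

    Returns one value per window start, so the list has
    len(sequence) - window + 1 entries (empty if the sequence is too short).
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

profile = hydropathy_profile("MKTLLLAVIVGLLLAASSDEKRRQ")
```

Stretches where the profile stays high are candidate membrane-spanning or signal-sequence segments; in this made-up sequence the hydrophobic core near the start scores well above the charged tail.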

    Multiple alignment

    Used to do phylogenetic analysis:

    Same protein from different species

    Evolutionary relationship: history

    Used to find conserved regions

    Local multiple alignment reveals conserved regions

    Conserved regions are usually key functional regions

    These regions are prime targets for drug development

    Protein domains are often conserved across many species

    Algorithm for search of conserved regions:

    Block maker: http://blocks.fhcrc.org/blocks/make_blocks.html
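To show what a conserved region looks like in a multiple alignment, here is a deliberately simple sketch that reports runs of columns in which every sequence carries the same residue. This is not the method used by Block Maker; the function name, the `min_length` parameter, and the toy alignment are assumptions for illustration.

```python
def conserved_blocks(alignment, min_length=3):
    """Find runs of fully identical, gap-free columns.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    Returns (start, end) column ranges, 0-based with end exclusive.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    blocks = []
    start = None
    for col in range(length + 1):  # one extra step closes a run at the end
        identical = (col < length
                     and alignment[0][col] != "-"
                     and len({seq[col] for seq in alignment}) == 1)
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks

toy_alignment = ["MKTAYIAK",
                 "MRTAYLAK",
                 "MKTAY-AK"]
blocks = conserved_blocks(toy_alignment)
```

Real block finders tolerate conservative substitutions and score columns statistically; strict identity is used here only because it is the easiest criterion to read.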

    Multiple alignment tools

    Free programs:

    Phylip and PAUP: http://evolution.genetics.washington.edu/phylip.html

    Phyml: http://atgc.lirmm.fr/phyml/

    The most used websites:

    http://align.genome.jp/

    http://prodes.toulouse.inra.fr/multalin/multalin.html

    http://www.ch.embnet.org/index.html (T-COFFEE and ClustalW)

    ClustalW:

    Standard popular software

    It aligns two sequences, then keeps adding one new sequence at a time to the alignment

    Problem: it is only a heuristic, so the result is not guaranteed to be the optimal alignment.

    Motif discovery: use your own motif to search databases:

    PatternFind: http://myhits.isb-sib.ch/cgi-bin/pattern_search
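A motif search of this kind can be sketched by converting a PROSITE-style pattern into a regular expression and scanning a sequence with it. The code below is not how PatternFind is implemented; the function names are assumptions, and the converter handles only the common pattern elements (x, [..], {..}, repeat counts, and the < and > anchors). The example uses the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} on a made-up sequence.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regular expression."""
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")                        # element separators
    regex = regex.replace("x", ".")                       # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")    # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")     # x(2,4) = 2 to 4 residues
    regex = regex.replace("<", "^").replace(">", "$")     # N- and C-terminal anchors
    return regex

def find_motif(sequence, pattern):
    """Return (1-based start position, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]

hits = find_motif("MKNGSAANPTQNCTA", "N-{P}-[ST]-{P}")
```

The asparagine at position 8 is skipped because it is followed by proline, which the pattern forbids; the other two asparagines match.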

    Phylogenetic analysis

    Phylogenetic trees

    Describe evolutionary relationships between sequences

    Major modes that drive the evolution:

    Point mutations (modify existing sequences)

    Duplications (re-use existing sequence)

    Rearrangement

    Two most common methods

    Maximum parsimony

    Maximum likelihood

    Parsimony vs Maximum likelihood

    Parsimony is the most popular method in which the simplest answer is always the preferred one.

    It involves evaluating the number of mutations needed to explain the observed data.

    The best tree is the one that requires the fewest evolutionary changes.
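The parsimony criterion can be made concrete with Fitch's algorithm, which counts the minimum number of changes a given tree needs for one alignment column. The sketch below scores two candidate trees for a toy alignment; the species names, sequences, and the nested-tuple tree encoding are assumptions for illustration, and a real program would also search over many tree shapes.

```python
def fitch(tree, leaf_states):
    """Fitch's algorithm for one alignment column.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one change is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Total minimum number of changes over all columns of the alignment."""
    n_columns = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(n_columns))

# Toy data (assumed): four aligned sequences and two candidate trees.
alignment = {"human": "ACGT", "chimp": "ACGT", "mouse": "ATGA", "rat": "ATGA"}
tree_a = (("human", "chimp"), ("mouse", "rat"))
tree_b = (("human", "mouse"), ("chimp", "rat"))
score_a = parsimony_score(tree_a, alignment)
score_b = parsimony_score(tree_b, alignment)
```

Here `tree_a` needs 2 changes and `tree_b` needs 4, so parsimony prefers `tree_a`, the grouping that places the identical sequences together.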

    Likelihood generally performs better than parsimony

    In contrast, maximum likelihood does not look for the fewest changes. It attempts to answer the question: what parameters of evolutionary events were likely to produce the current data set?

    This is computationally difficult to do. This is the slowest of all methods.

    Definitions

    Homologous: Have a common ancestor. Homology cannot be measured; sequences either share an ancestor or they do not.

    Orthologous: The same gene in different species. It is the result of speciation (common ancestor).

    Paralogous: Related genes (already diverged) in the same species. It is the result of genomic rearrangements or duplication.

    Determining protein structure

    Direct measurement of structure

    X-ray crystallography

    NMR spectroscopy

    Site-directed mutagenesis

    Computer modeling

    Prediction of structure

    Comparative protein-structure modeling

    Comparative protein-structure modeling

    Goal: Construct a 3-D model of a protein of unknown structure (target), based on similarity of sequence to proteins of known structure (templates)

    Blue: predicted model by PROSPECT

    Red: NMR structure

    Procedure:

    Template selection

    Template-target alignment

    Model building

    Model evaluation
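The first step of the procedure above, template selection, is often guided by how similar the target sequence is to each candidate. The sketch below ranks candidates by percent identity over already-aligned sequence pairs. It is only one plausible criterion, not the method of any particular modeling server; the template names and sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def select_template(alignments):
    """Pick the template whose alignment to the target has the highest identity.

    alignments: {template_id: (aligned_target, aligned_template)}
    """
    return max(alignments, key=lambda tid: percent_identity(*alignments[tid]))

# Hypothetical target/template alignments (names and sequences are made up).
candidates = {
    "template_1": ("MKTAYIAKQR", "MKTAYLAKQR"),
    "template_2": ("MKTAYIAKQR", "MRSAY-GKHR"),
}
best = select_template(candidates)
```

In practice the choice would also weigh alignment coverage and the quality of the template structure, which this sketch ignores.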

    The Protein 3-D Database

    The Protein Data Bank (PDB) contains 3-D structural data for proteins

    Founded in 1971 with a dozen structures

    As of June 2004, there were 25,760 structures in the database.

    All structures are reviewed for accuracy and data uniformity.

    Structural data from the PDB can be freely accessed at

    http://www.rcsb.org/pdb/

    80% come from X-ray crystallography

    16% come from NMR

    2% come from theoretical modeling

    High-throughput methods

    Most used websites for 3-D structure prediction

    Protein Homology/analogY Recognition Engine (Phyre) at

    http://www.sbg.bio.ic.ac.uk/phyre/html/index.html

    PredictProtein at

    http://www.predictprotein.org/newwebsite/submit.html

    UCLA Fold Recognition at

    http://www.doe-mbi.ucla.edu/Services/FOLD/
