Bioinformatics and Computational Biology
• Bioinformatics: collection and storage of biological information; derives knowledge from computer analysis of biological data
• Computational biology: development of algorithms and statistical models to analyze biological data
Why is bioinformatics critical?
Few people are adequately trained in both biology and computer science.
Genome sequencing, microarrays, etc. generate large amounts of data that must be analyzed.
Analysis of this data leads to important discoveries.
Computational analysis saves time and money.
Why is the relationship between computer science and biology essential?
There are three main reasons:
First, massive amounts of data have to be stored, analyzed and made accessible.
Second, the nature of the data often requires computational and statistical methods. This applies in particular to the information encoded in DNA: the building plans of proteins and the spatial organization of their expression in the cell.
Third, there is a strong analogy between a DNA sequence and a computer program.
Key Areas/Scope of Bioinformatics
1. Organizing biological knowledge in databases
2. Analysing sequence data
3. Structural Bioinformatics
4. Pharmacological relevance (population genetics)
1. Organizing biological knowledge in databases
DNA sequence databases: GenBank (NCBI), EMBL
Protein sequence databanks, with structural and functional characteristics. For example, SWISS-PROT contains verified protein sequences with additional annotations describing the function of each protein.
Literature databases: PubMed, MEDLINE
2. Analysing sequence data
• Establish the correct order of sequence contigs
• Find translation and transcription initiation sites and promoter sites
• Define open reading frames (ORFs)
• Find splice sites, introns and exons
• Translate the DNA sequence into a protein sequence
• Compare the DNA sequence with known protein sequences to verify exons, etc., using homologous sequences
• Build multiple sequence alignments
• Study evolutionary aspects by constructing phylogenetic trees
• Determine active-site residues and residues specific to subfamilies
• Predict protein–protein interactions
• Analyse single nucleotide polymorphisms (SNPs) to hunt for genetic sources of diseases
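Two of the sequence-analysis steps, translating DNA into protein and defining open reading frames, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it uses the standard genetic code (NCBI translation table 1), treats ATG as the only start codon, and scans only the three forward-strand frames. Real gene finders also scan the reverse complement and handle alternative start codons. The names `translate`, `find_orfs` and `min_length` are hypothetical.

```python
# Standard genetic code (NCBI table 1), listed in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an unrecognised codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna: str, min_length: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, protein) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `min_length` counts amino
    acids. ATGs inside an open ORF are ignored, and ORFs that run off the end
    of the sequence without a stop codon are not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(dna[start:pos])  # stop codon excluded
                if len(protein) >= min_length:
                    orfs.append((start, pos + 3, protein))
                start = None
    return orfs


print(find_orfs("CCATGGCCTGAAA"))  # [(2, 11, 'MA')]
```

Keeping the genetic code in a dictionary makes it easy to swap in another translation table, for example for mitochondrial genes.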
3. Structural Bioinformatics
This branch of bioinformatics is concerned with computational approaches to predict and analyse the spatial structure of proteins and nucleic acids.
Using multiple sequence alignments, secondary structure can be predicted with an accuracy above 70%, and 3D structure can also be predicted.
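Accuracy figures for secondary-structure prediction are commonly reported as Q3: the percentage of residues whose three-state assignment (helix H, strand E, coil C) matches the experimentally observed structure. This sketch assumes that definition. The function `q3_accuracy` and the example strings are hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must cover the same residues")
    if not observed:
        raise ValueError("empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example: 8 of 10 states agree.
print(q3_accuracy("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```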
4. Pharmacological relevance
Drug targets in infectious organisms can be revealed by whole-genome comparisons of infectious and non-infectious organisms.
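At its simplest, such a comparison looks for genes present in the infectious organism but absent from a non-infectious relative. The sketch below assumes the genes of both genomes have already been mapped to shared ortholog identifiers, which in practice requires sequence-similarity searches. The identifiers and the function name `pathogen_specific_genes` are hypothetical.

```python
def pathogen_specific_genes(pathogen: set[str], non_pathogen: set[str]) -> set[str]:
    # Genes found in the infectious organism but missing from its non-infectious relative
    # are candidate drug targets (assumes shared ortholog identifiers across genomes).
    return pathogen - non_pathogen


infectious = {"og001", "og002", "og003", "og004"}  # hypothetical ortholog IDs
non_infectious = {"og001", "og002", "og005"}
print(sorted(pathogen_specific_genes(infectious, non_infectious)))  # ['og003', 'og004']
```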
The analysis of single nucleotide polymorphisms reveals genes potentially responsible for genetic diseases.
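The first step of SNP analysis is locating single-base differences between a reference sequence and a sample. This sketch assumes that the two sequences are already aligned to the same length and uses '-' for gaps. Gapped columns are skipped because they represent insertions or deletions rather than SNPs. The name `find_snps` is hypothetical.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each single-base difference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        # Skip gapped columns: those are indels, not single nucleotide polymorphisms.
        if ref != alt and "-" not in (ref, alt):
            snps.append((pos, ref, alt))
    return snps


print(find_snps("ACGTACGT", "ACGAACGC"))  # [(4, 'T', 'A'), (8, 'T', 'C')]
```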
Prediction and analysis of protein 3D structure is used to develop drugs and understand drug resistance.
Patient databases with genetic profiles, e.g. for cardiovascular diseases, diabetes, cancer, etc., may play an important role in future individual health care by integrating a personal genetic profile (population genetics) into diagnosis.
Genomic Browsers
• National Center for Biotechnology Information (NCBI) (http://ncbi.nlm.nih.gov)
• Ensembl Genome Browser (http://www.ensembl.org)
• UCSC Genome Browser (http://genome.ucsc.edu/)
• WormBase (http://www.wormbase.org/)
• AceDB (http://www.acedb.org/)
• FlyBase (http://flybase.bio.indiana.edu/)
Protein databases
• SWISS-PROT/TrEMBL: curated protein sequences http://www.expasy.ch/sprot
• InterPro: protein families and domains http://www.ebi.ac.uk/interpro
• EXProt: proteins with experimentally verified functions http://www.cmbi.nl/exprot
• Protein Information Resource (PIR) http://pir.georgetown.edu/
NCBI
NCBI text search for a protein
Finding abstracts with NCBI
Nucleotide search for a typical gene
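The NCBI searches listed above use the web interface. The same databases can also be queried programmatically through NCBI's E-utilities (ESearch to find record IDs, EFetch to download records). This sketch only builds the query URLs from the public E-utilities base address and does not contact the server. The helper names are hypothetical, and the search term is a placeholder. To run a query, pass the URL to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL; db is an Entrez database such as 'pubmed', 'protein' or 'nucleotide'."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(db: str, ids: list[str]) -> str:
    """Build an EFetch URL that requests the given record IDs as FASTA text."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


# Field-qualified search, e.g. an abstract search restricted to titles.
print(esearch_url("pubmed", "bioinformatics[Title]", retmax=5))
```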
FASTA format
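FASTA format is a text-based format for representing either nucleic acid sequences or protein sequences, in which nucleotides or amino acid residues are represented using single-letter codes. Each record begins with a single description line starting with '>', followed by one or more lines of sequence.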
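Because each record is just a '>' description line followed by sequence lines, a reader needs only a few lines of Python. This is a minimal sketch: it ignores blank lines and any sequence lines that appear before the first header. The name `read_fasta` and the example records are hypothetical.

```python
from io import StringIO
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (description, sequence) pairs; the description excludes the leading '>'."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


example = StringIO(">seq1 hypothetical DNA\nATGGCC\nTGA\n>seq2 hypothetical peptide\nMA\n")
for name, seq in read_fasta(example):
    print(name, seq)
```

Writing the reader as a generator lets it stream large sequence files one record at a time instead of loading them into memory.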