BiDiBlast Tool Presentation

Uploaded by: joao-feio-de-almeida

Posted on 02-Jul-2015

DESCRIPTION

Talk for the benefit of my fellow researchers at CReM (Univ. Nova de Lisboa)

TRANSCRIPT

Page 1: BiDiBlast Tool Presentation

COMPARATIVE GENOMICS

- Tool Development -

Page 2: BiDiBlast Tool Presentation

Driving Cause

•Problem

•How many genes are absent?

Saccharomyces cerevisiae vs. Saccharomyces kudriavzevii

•Synteny analysis?

Page 3: BiDiBlast Tool Presentation

Driving Cause

•Problem

•How many genes are absent?

Saccharomyces cerevisiae vs. Saccharomyces kudriavzevii

•Homolog ORF detection to assess differences.

[Figure: S. kudriavzevii contigs, coverage gap]

Page 4: BiDiBlast Tool Presentation

Homology

•Types of homology

•Origin versus time

Page 5: BiDiBlast Tool Presentation

Homology

•Types of homology

•Orthology versus Paralogy versus Speciation

Page 6: BiDiBlast Tool Presentation

Homology

•Types of homology

•Orthology versus Paralogy versus Speciation

•A complex picture

•Many available detection strategies – none is perfect

1. Merkeev I., Novichkov P., and Mironov A. 2006. PHOG: a database of supergenomes built from proteome complements. BMC Evolutionary Biology 6:52.

[Figure: out-paralogy, ohnology, in-paralogy]

Page 7: BiDiBlast Tool Presentation

Homology

•Detecting Homolog Sequences

•Approach: increasing levels of stringency

•Sequence similarity

•Similarity search

•Best reciprocal hit (BRH)

Bi-Directional BLAST

•Similar product function

•Similar domain architecture

Regular expressions (e.g. PROSITE patterns)

PSSM (e.g. NCBI CDD)

HMM (e.g. Pfam)

•Common (protein) family

•Similar syntenic neighbourhood

•No easy solution
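The best reciprocal hit criterion listed above can be sketched in Java, the language the tool was written in. This is a minimal illustration and not BiDiBlast's own code. It assumes that the best BLAST hit of each query has already been extracted for both search directions. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of best reciprocal hit (BRH) detection. */
final class ReciprocalBestHits {

    private ReciprocalBestHits() { }

    /**
     * @param bestAtoB best hit in set B for each query from set A
     * @param bestBtoA best hit in set A for each query from set B
     * @return pairs (a, b) where b is a's best hit and a is b's best hit
     */
    static Map<String, String> find(Map<String, String> bestAtoB,
                                    Map<String, String> bestBtoA) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : bestAtoB.entrySet()) {
            String a = e.getKey();
            String b = e.getValue();
            // Keep the pair only if the reverse search points back to a.
            if (a.equals(bestBtoA.get(b))) {
                pairs.put(a, b);
            }
        }
        return pairs;
    }
}
```

Queries whose best hit does not point back fail the reciprocal check. These are the uni-directional hits.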

Page 8: BiDiBlast Tool Presentation

Homology

•Detecting Homolog Sequences

•Bi-Directional BLAST

•No Windows tool/server readily available

•Adapt an existing Perl script

Refactor from UNIX to Windows environment

Lack of experience => effort needed?

•Migrate to UNIX environment

Same problems

•Develop a simple Java application

Existing experience => smoother path

New useful tricks to learn

Interfacing with command-line applications

Multithreading = multitasking

In the end, unanticipated problems emerged

Coding problems

Insufficient library documentation

GUI development
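In a pipeline like this one, multithreading mostly means running independent jobs at the same time. Examples include the forward search (A against B) and the reverse search (B against A). The following is a minimal sketch using the standard `java.util.concurrent` API. The `ParallelSearches` helper is hypothetical, and each task stands in for one search job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper that runs independent jobs on a fixed thread pool. */
final class ParallelSearches {

    private ParallelSearches() { }

    /** Runs all tasks concurrently and returns their results in task order. */
    static <T> List<T> runAll(List<Callable<T>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every task has finished (or failed).
            List<Future<T>> futures = pool.invokeAll(tasks);
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get()); // rethrows a task's exception, wrapped
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each `Callable` could wrap an external process such as a BLAST run. The pool size is an assumption and should be matched to the available cores.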

Page 9: BiDiBlast Tool Presentation

Homology

•Detecting Homolog Sequences

•Bi-Directional BLAST

•Implementation as data pipeline

•Several thousand lines of code

•Collection of 15 Java classes in 3 packages

General routines - bidiblastsup

Data structures – bidiblastsup.objects

User interface – bidiblastsup.ui

•Uses 3 third-party libraries

BioJava 1.4 – mainly translation tasks

DB4O 5.0 – data management and …

NeoBio – scoring schemes including ambiguity codes

•Integrates 4 command-line tools

NCBI BLAST (blastall -p blastn)

align0 (FASTA) – ORF alignment

stretcher (EMBOSS) – protein alignment

yn00 (PAML) – dN/dS calculation
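To illustrate how the tool might interface with `blastall`, the sketch below does three things. It builds a command line with the legacy NCBI BLAST flags (`-p` program, `-d` database, `-i` query, `-o` output, `-e` E-value cutoff, `-m 8` tabular output). It runs that command with `ProcessBuilder`. It then reads the first hit of each query from the tabular report.

This is a hypothetical helper, not one of the tool's classes. It assumes that `blastall` is on the `PATH` and that the database was built beforehand (e.g. with `formatdb`). It also assumes that each query's hits are listed best first, which is how BLAST reports them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper around the legacy NCBI blastall executable. */
final class BlastallRunner {

    private BlastallRunner() { }

    /** Builds a blastall command line producing tabular (-m 8) output. */
    static List<String> command(String program, String database, String query,
                                String output, String evalue) {
        List<String> cmd = new ArrayList<>();
        cmd.add("blastall");
        cmd.add("-p"); cmd.add(program);   // e.g. "blastn" or "tblastx"
        cmd.add("-d"); cmd.add(database);  // database built beforehand
        cmd.add("-i"); cmd.add(query);     // FASTA file with the query sequences
        cmd.add("-o"); cmd.add(output);    // report file
        cmd.add("-e"); cmd.add(evalue);    // E-value cutoff, e.g. "1e-5"
        cmd.add("-m"); cmd.add("8");       // tabular output, one line per HSP
        return cmd;
    }

    /** Runs the command, sharing this JVM's console, and returns its exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    /** Keeps the first (best) subject listed for each query in a -m 8 report. */
    static Map<String, String> firstHits(List<String> reportLines) {
        Map<String, String> first = new LinkedHashMap<>();
        for (String line : reportLines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank and comment lines
            }
            String[] cols = line.split("\t");
            // Column 1 is the query id, column 2 the subject id.
            first.putIfAbsent(cols[0], cols[1]);
        }
        return first;
    }
}
```

To use the report, read it back with `java.nio.file.Files.readAllLines(...)` and pass the lines to `firstHits`.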

Page 10: BiDiBlast Tool Presentation

Homology

•Detecting Homolog Sequences

•Bi-Directional BLAST

•Implementation as data pipeline

•Swing graphical user interface (GUI)

Control over the program run

Parameter entry

BLAST database building

Result dumping

Page 11: BiDiBlast Tool Presentation

Homology

•Detecting Homolog Sequences

•Bi-Directional BLAST

•IS NOT an orthologous gene finding tool!

•Performs best reciprocal hit (BRH) detection between pools of DNA sequences

Customised BLAST / TBLASTX parameters

Stores indicator values about the results

•Stores the first (best) BLAST hit of every query

Bi-directional – putative orthologs

Uni-directional – putative paralogs

•Aligns each resulting pair of hit sequences by careful global alignment

Measures the real length of the aligned regions

Proper sequence similarity

•Translates and aligns the ORF products

Global alignment using a given substitution matrix

A codon-wise global alignment of the ORFs as a by-product

Several statistics stored
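Two of the steps on this slide can be sketched on already-aligned rows. The first is measuring the real aligned length and identity of a hit pair. The second is deriving a codon-wise alignment from the protein alignment.

The helpers below are hypothetical and are not taken from the tool:

- Aligned rows are assumed to be equal-length strings with `-` as the gap symbol.
- Identity counts exact matches only, so ambiguity codes are not handled here.
- Codon threading is one plausible way to obtain the by-product alignment, not necessarily the tool's method. Each amino acid consumes the next codon of the ungapped ORF, and each gap becomes `---`.

```java
/**
 * Hypothetical helpers for one hit pair after global alignment.
 * Rows are gapped strings of equal length with '-' as the gap symbol.
 */
final class HitPairAlignment {

    private HitPairAlignment() { }

    /** Returns {gap-free columns, identical gap-free columns}. */
    static int[] alignedAndIdentical(String rowA, String rowB) {
        if (rowA.length() != rowB.length()) {
            throw new IllegalArgumentException("aligned rows differ in length");
        }
        int aligned = 0;
        int identical = 0;
        for (int i = 0; i < rowA.length(); i++) {
            char x = Character.toUpperCase(rowA.charAt(i));
            char y = Character.toUpperCase(rowB.charAt(i));
            if (x == '-' || y == '-') {
                continue; // gap columns do not count towards the aligned length
            }
            aligned++;
            if (x == y) {
                identical++;
            }
        }
        return new int[] {aligned, identical};
    }

    /** Threads an ungapped ORF onto its row of the protein alignment. */
    static String codonRow(String proteinRow, String orf) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < proteinRow.length(); i++) {
            if (proteinRow.charAt(i) == '-') {
                out.append("---"); // a protein gap spans one whole codon
            } else {
                if (pos + 3 > orf.length()) {
                    throw new IllegalArgumentException("ORF shorter than its translation");
                }
                out.append(orf, pos, pos + 3); // next codon of the ORF
                pos += 3;
            }
        }
        return out.toString();
    }
}
```

Identity is then `identical / aligned`. Because gap columns are excluded, this measures similarity over the aligned region only, not over the full sequence lengths.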

Page 12: BiDiBlast Tool Presentation

Homology

•Detecting Homolog Sequences

•Bi-Directional BLAST

•IS NOT an orthologous gene finding tool!

•Calculates evolutionary rates for every hit (pair of sequences)

Based on the codon-wise global alignment of the ORFs

•Dumps the results in delimited text files

Follow-on processing and analysis

Results should be imported into a relational database

Spreadsheets accepted but not favoured
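The rate calculation itself is delegated to `yn00`, which implements the method of Yang and Nielsen. The sketch below is only a much simpler illustration of what a dN/dS ratio is, in the style of Nei and Gojobori, and it is not what `yn00` computes internally.

The calculation runs in three steps:

- Take the proportions of nonsynonymous and synonymous differences, p_N = N_d / N and p_S = S_d / S.
- Correct each with the Jukes-Cantor formula, d = -3/4 ln(1 - 4p/3).
- Divide the corrected values to get ω = d_N / d_S.

Counting sites and differences along the codon alignment is assumed to happen elsewhere. The class name is hypothetical.

```java
/**
 * Hypothetical sketch of a dN/dS ratio from pre-computed site and difference
 * counts, using Jukes-Cantor corrected proportions (Nei-Gojobori style).
 */
final class SimpleDnDs {

    private SimpleDnDs() { }

    /** Jukes-Cantor distance for a proportion p of differing sites. */
    static double jukesCantor(double p) {
        double arg = 1.0 - 4.0 * p / 3.0;
        if (arg <= 0.0) {
            return Double.POSITIVE_INFINITY; // saturated: distance undefined
        }
        return -0.75 * Math.log(arg);
    }

    /**
     * @param nSites nonsynonymous sites (N)
     * @param sSites synonymous sites (S)
     * @param nDiffs nonsynonymous differences (N_d)
     * @param sDiffs synonymous differences (S_d)
     * @return omega = dN / dS (infinite or NaN when dS is 0)
     */
    static double omega(double nSites, double sSites, double nDiffs, double sDiffs) {
        double dN = jukesCantor(nDiffs / nSites);
        double dS = jukesCantor(sDiffs / sSites);
        return dN / dS;
    }
}
```

For example, 300 nonsynonymous sites with 3 differences and 100 synonymous sites with 10 differences give ω of roughly 0.094. A value this far below 1 is the usual signature of purifying selection.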

Page 13: BiDiBlast Tool Presentation

Homology

•Detecting Homolog Sequences

•Bi-Directional BLAST

•IS NOT an orthologous gene finding tool!

•Result filtering is left to the end user

Sequence length mismatches (e.g. length ratio between 80% and 120%)

Similarity threshold

Intervening STOP codons as ORF quality control

•Usage scope

Comparative genomics

Annotation of ORFs from newly sequenced genomes

Estimation of evolutionary rates for sets of sequences

etc…
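Filtering is left to the user, so the three criteria above might be applied to the exported pairs roughly as follows. This is a hypothetical sketch rather than part of the tool. It assumes the following:

- ORFs are in frame from their first base.
- ORFs use the standard genetic code, with stop codons TAA, TAG and TGA.
- The length ratio is taken as the length of B over the length of A.

```java
/**
 * Hypothetical post-hoc filter for one hit pair, mirroring the slide's
 * criteria: length ratio, similarity threshold, no internal STOP codons.
 */
final class HitFilter {

    private HitFilter() { }

    /** True if lenB / lenA lies within [minRatio, maxRatio], e.g. 0.8 to 1.2. */
    static boolean lengthRatioOk(int lenA, int lenB, double minRatio, double maxRatio) {
        double ratio = (double) lenB / lenA;
        return ratio >= minRatio && ratio <= maxRatio;
    }

    /** True if a stop codon occurs in frame anywhere before the last codon. */
    static boolean hasInternalStop(String orf) {
        String s = orf.toUpperCase();
        int codons = s.length() / 3;
        for (int k = 0; k < codons - 1; k++) {
            String codon = s.substring(3 * k, 3 * k + 3);
            if (codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA")) {
                return true;
            }
        }
        return false;
    }

    /** Keeps a pair using the example 80% to 120% length window. */
    static boolean keep(String orfA, String orfB, double identity, double minIdentity) {
        return lengthRatioOk(orfA.length(), orfB.length(), 0.8, 1.2)
                && identity >= minIdentity
                && !hasInternalStop(orfA)
                && !hasInternalStop(orfB);
    }
}
```

The same checks translate directly into `WHERE` clauses once the results are imported into a relational database.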

Page 14: BiDiBlast Tool Presentation

Homology

•Detecting Homolog Sequences

•Bi-Directional BLAST

•Future developments

•Domain architecture detection in products

Integration problems

What kind of formalism?

Downstream or upstream?

•Other assorted improvements

User interface

Result management inside the application

Synteny integration

Matching against whole genome / chromosome / contig