Speeding the Database Searches and Sequence Alignments with Multi-Motif PHI-BLAST

Nitin Bhardwaj, Dept. of Chemical Engineering, IIT Bombay.

Posted on 01-Feb-2016


Page 1: Speeding the Database Searches and Sequence Alignments with Multi-Motif PHI-BLAST

Speeding the Database Searches and Sequence Alignments with Multi-Motif PHI-BLAST

Nitin Bhardwaj,

Dept. of Chemical Engineering,

IIT Bombay.

Page 2

National Centre for Biological Sciences (NCBS), Bangalore

A unit of the Tata Institute of Fundamental Research (TIFR)

Page 3

What is sequence alignment?

The process of lining up two or more sequences to achieve the maximum level of similarity

Why make sequence alignments?

To detect:

Structural and functional relationships

Evolutionary relationships

Page 4

Some Basic Terms

Global Alignment: covers the entire length of the sequences

Local Alignment: restricted to regions of identity and strong similarity

Query Sequence: the sequence of interest

Subject Sequence: the other sequence, against which the query is aligned
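A minimal sketch of the difference between the two alignment types: global alignment (Needleman-Wunsch) must cover both sequences end to end, while local alignment (Smith-Waterman) scores the best-matching pair of substrings. The +1/-1/-2 match, mismatch and gap scores are illustrative assumptions; PHI-BLAST in practice scores residues with a substitution matrix.

```python
# Global vs local alignment scores with a toy scoring scheme.
# The scores below are illustrative assumptions, not PHI-BLAST's defaults.
MATCH, MISMATCH, GAP = 1, -1, -2


def sub(a: str, b: str) -> int:
    """Score one aligned pair of residues."""
    return MATCH if a == b else MISMATCH


def global_score(q: str, s: str) -> int:
    """Needleman-Wunsch: best alignment covering both sequences end to end."""
    n, m = len(q), len(s)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP  # q[:i] against nothing: every residue is a gap
    for j in range(1, m + 1):
        dp[0][j] = j * GAP  # s[:j] against nothing: every residue is a gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sub(q[i - 1], s[j - 1]),
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
    return dp[n][m]


def local_score(q: str, s: str) -> int:
    """Smith-Waterman: best alignment of any pair of substrings."""
    n, m = len(q), len(s)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Flooring at 0 lets an alignment start fresh anywhere.
            dp[i][j] = max(0,
                           dp[i - 1][j - 1] + sub(q[i - 1], s[j - 1]),
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
            best = max(best, dp[i][j])
    return best
```

With these scores, `global_score("GGGGRCACWWWW", "RCAC")` is -12 because eight gaps are unavoidable, whereas `local_score` on the same pair is 4: the local alignment simply picks out the shared RCAC.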

Page 5

And…

Scoring Matrices: used to score each match/mismatch

True Positive: a reported hit that is genuinely related to the query

True Negative: an unrelated sequence that is correctly not reported

False Positive: a reported hit that is not actually related

False Negative: a related sequence that the search fails to report

Motif: a short conserved region of a sequence

Hits: sequences picked up from the database
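The four outcome categories can be tallied once the set of genuinely related sequences is known in advance (for example from a structural classification). This sketch assumes such a reference set exists; the function name and sequence names are hypothetical.

```python
from __future__ import annotations


def confusion_counts(hits: set[str], relatives: set[str],
                     database: set[str]) -> dict[str, int]:
    """Tally search outcomes against a known reference set of relatives."""
    return {
        "TP": len(hits & relatives),             # reported and truly related
        "FP": len(hits - relatives),             # reported but not related
        "FN": len(relatives - hits),             # related but missed
        "TN": len(database - hits - relatives),  # unrelated and not reported
    }


# Hypothetical example: five database entries, three true relatives.
database = {"a", "b", "c", "d", "e"}
relatives = {"a", "b", "c"}
hits = {"a", "b", "d"}
print(confusion_counts(hits, relatives, database))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
```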

Page 6

What happens after alignment?

Calculate the score of the alignment

Sort the aligned sequences in order of decreasing score

Go ahead with the analysis to find the relationships/similarities
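A minimal sketch of the first two steps: score an existing gapped alignment column by column, then sort the subjects by decreasing score. The scoring values and the `-` gap symbol are assumptions for illustration, and the function names are hypothetical.

```python
from __future__ import annotations

# Illustrative scoring scheme (an assumption, not PHI-BLAST's defaults).
MATCH, MISMATCH, GAP = 1, -1, -2


def alignment_score(aligned_q: str, aligned_s: str) -> int:
    """Score an existing gapped alignment column by column ('-' marks a gap)."""
    if len(aligned_q) != len(aligned_s):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(aligned_q, aligned_s):
        if a == "-" or b == "-":
            score += GAP
        elif a == b:
            score += MATCH
        else:
            score += MISMATCH
    return score


def rank_alignments(alignments: dict[str, tuple[str, str]]) -> list[tuple[str, int]]:
    """Return (name, score) pairs sorted by decreasing score."""
    scored = [(name, alignment_score(q, s)) for name, (q, s) in alignments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For instance, `rank_alignments({"seq1": ("RCAC", "RAAC"), "seq2": ("RCAC-W", "RCACWW")})` returns `[("seq2", 3), ("seq1", 2)]`.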

Page 7

Pattern-Hit Initiated Basic Local Alignment Search Tool (PHI-BLAST)

Takes a query sequence, a motif, and a database to search

Aligns the query sequence with all the sequences that contain the motif

Computes a score for each sequence

Reports all the sequences scoring above a particular threshold value, sorted by score
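The four steps can be sketched as a toy pipeline. This is not how PHI-BLAST is implemented (it anchors each alignment at the pattern hit rather than aligning whole sequences, and reports E values); the motif is written as a Python regular expression, and `motif_search`, the scores and the threshold are hypothetical.

```python
from __future__ import annotations

import re

# Illustrative scores (assumptions): +1 match, -1 mismatch, -2 per gap.
MATCH, MISMATCH, GAP = 1, -1, -2


def local_score(q: str, s: str) -> int:
    """Smith-Waterman local alignment score, keeping only two DP rows."""
    prev = [0] * (len(s) + 1)
    best = 0
    for a in q:
        cur = [0]
        for j, b in enumerate(s, start=1):
            diag = prev[j - 1] + (MATCH if a == b else MISMATCH)
            cell = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def motif_search(query: str, motif: str, database: dict[str, str],
                 threshold: int) -> list[tuple[str, int]]:
    """Hypothetical PHI-BLAST-like filter: motif hit, score, threshold, sort."""
    pattern = re.compile(motif)
    if pattern.search(query) is None:
        return []  # the motif must also occur in the query
    results = []
    for name, seq in database.items():
        if pattern.search(seq):              # step 1: subject has the motif
            score = local_score(query, seq)  # steps 2-3: align and score
            if score > threshold:            # step 4: keep scores above threshold
                results.append((name, score))
    return sorted(results, key=lambda r: r[1], reverse=True)  # best first
```

Subjects without the motif are never aligned, which is where the speed-up over aligning against the whole database comes from.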

Page 8

A Typical PHI-BLAST Output

1 occurrence(s) of pattern in query

pattern [RA][C][ACDEFGHIKLMNPQRSTVWY][C]

at position 3 of query sequence

Significant matches for pattern occurrence 1 at position 3

Sequence                                                              Score (bits)   E value
pdb|1ILP|A Chain A, Cxcr-1 N-Terminal Peptide Bound To Interleuk...   128            2e-37
pdb|1QE6|D Chain D, Interleukin-8 With An Added Disulfide Betwee...   121            2e-35
pdb|1ICW|A Chain A, Interleukin-8, Mutant With Glu 38 Replaced B...   121            3e-35
pdb|1ROD|A Chain A, Chimeric Protein Of Interleukin 8 And Human ...   98             2e-28
pdb|1TVX|B Chain B, Neutrophil Activating Peptide-2 Variant Form...   50             6e-14
pdb|1NAP|A Chain A, Mol_id: 1; Molecule: Neutrophil Activating P...   50             6e-14
pdb|1MSG|A Chain A, Human Melanoma Growth Stimulatory Activity (...   48             3e-13
pdb|1MGS|A Chain A, Human Melanoma Growth Stimulating Activity (...   48             3e-13
pdb|1QNK|A Chain A, Truncated Human Grob[5-73], Nmr, 20 Structur...   47             5e-13
pdb|1MI2|A Chain A, Solution Structure Of Murine Macrophage Infl...   46             1e-12
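The first part of this output, locating the pattern in the query, can be reproduced with a regular expression, since the bracketed pattern shown is also valid Python `re` syntax. The query string below is hypothetical (the slide does not show the actual query), and positions are reported 1-based to match the output. The zero-width lookahead makes this sketch report overlapping occurrences as well.

```python
import re

PATTERN = "[RA][C][ACDEFGHIKLMNPQRSTVWY][C]"  # pattern from the output above


def pattern_occurrences(seq: str, pattern: str) -> list:
    """1-based start positions of every (possibly overlapping) match."""
    # A plain finditer would resume after each match and skip overlaps;
    # wrapping the pattern in a lookahead makes every match zero-width.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


query = "GARCQCIKTYSK"  # hypothetical query, not the one used on the slide
print(pattern_occurrences(query, PATTERN))  # [3]
```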

Page 9

Strategy behind PHI-BLAST

Locate the motif in both sequences

(Figure: the motif in the query aligned with the motif in the subject)

Extend the alignment in both directions with local alignment

Calculate the score for the alignment
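A simplified sketch of the extension step: starting from the two motif occurrences, extend to the right and to the left, keeping the best running score and stopping once the score falls more than a fixed amount (an X-drop) below that best. This ungapped extension is a stand-in chosen for brevity, not PHI-BLAST's own gapped extension; the scores, the X-drop value and the function names are assumptions.

```python
# Simplified ungapped X-drop extension from a motif anchor.
# MATCH, MISMATCH and XDROP are illustrative assumptions.
MATCH, MISMATCH, XDROP = 1, -1, 3


def sub(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def extend_one_side(pairs) -> int:
    """Best cumulative score along residue pairs (0 if extending never helps)."""
    best = run = 0
    for a, b in pairs:
        run += sub(a, b)
        best = max(best, run)
        if best - run > XDROP:  # fell too far below the best: stop extending
            break
    return best


def phi_extend(q: str, s: str, qi: int, si: int, length: int) -> int:
    """Motif core score plus the best right and left extensions.

    qi and si are the 0-based starts of the motif occurrence in q and s.
    """
    core = sum(sub(q[qi + k], s[si + k]) for k in range(length))
    right = extend_one_side(zip(q[qi + length:], s[si + length:]))
    # Walk leftwards from the residue just before each motif occurrence.
    left = extend_one_side(zip(reversed(q[:qi]), reversed(s[:si])))
    return core + left + right
```

The X-drop cut-off is what keeps the extension cheap: it stops as soon as the alignment clearly stops paying off, even if a later stretch might match again.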

Page 10

Problems with PHI-BLAST

Only one motif can be given as input, so a separate run is required for each motif, which increases the total time

Consequently, there is no way to attach a weight to any motif

No parallel comparison of several motifs is possible

No control over the specificity of the program

Page 11

The Solution(s)!

MULTI-MOTIF PHI-BLAST (MMPB)

RANKED MOTIF PHI-BLAST (RMPB)

Page 12

Multi-Motif PHI-BLAST

Takes a query sequence, any number of motifs, and a database to search

Aligns the query sequence with all the sequences that contain at least a minimum number of the motifs

Computes a score for each sequence

Reports all the sequences scoring above a particular threshold value, sorted by score
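The filtering step can be sketched as a single pass over the database that counts how many of the motifs each subject contains and keeps those that reach the minimum. Motifs are written as Python regular expressions; the function name and the example data are hypothetical.

```python
from __future__ import annotations

import re


def mmpb_candidates(database: dict[str, str], motifs: list[str],
                    min_motifs: int) -> dict[str, int]:
    """Map each subject to the number of motifs it contains, keeping only
    subjects with at least `min_motifs` of them. One pass covers all motifs."""
    compiled = [re.compile(m) for m in motifs]
    selected = {}
    for name, seq in database.items():
        present = sum(1 for p in compiled if p.search(seq))
        if present >= min_motifs:
            selected[name] = present
    return selected
```

With motifs `["RC.C", "W..W", "KLM"]` and `min_motifs=2`, only subjects containing at least two of the three patterns go on to alignment; raising `min_motifs` trades sensitivity for specificity.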

Page 13

Strategy behind MMPB

Locate the motifs in both sequences

Extend outward in both directions with local alignment, and align the part between the motifs with global alignment

Calculate the score for the alignment

(Figure: Motif 1 and Motif 2 marked in the query and in the subject, with segments labeled Local and Global)
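Under stated assumptions, the MMPB scoring strategy can be sketched as follows. Residues inside each located motif are scored position by position. Each segment between consecutive motifs is aligned globally. The flanks outside the first and last motif are extended outward with an anchored local alignment whose free end may stop anywhere, including immediately. The motif positions are assumed to be known already, and the match/mismatch/linear-gap scores are illustrative, not the author's parameters.

```python
from __future__ import annotations

# Illustrative scores (assumptions, not the MMPB author's parameters).
MATCH, MISMATCH, GAP = 1, -1, -2


def sub(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def global_score(a: str, b: str) -> int:
    """Needleman-Wunsch score: both segments aligned end to end."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP]
        for j in range(1, len(b) + 1):
            cur.append(max(prev[j - 1] + sub(a[i - 1], b[j - 1]),
                           prev[j] + GAP, cur[j - 1] + GAP))
        prev = cur
    return prev[-1]


def extension_score(a: str, b: str) -> int:
    """Best alignment anchored at the start of both strings, free end.

    The empty extension (score 0) is allowed, so the result is never negative.
    """
    prev = [j * GAP for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [i * GAP]
        for j in range(1, len(b) + 1):
            cur.append(max(prev[j - 1] + sub(a[i - 1], b[j - 1]),
                           prev[j] + GAP, cur[j - 1] + GAP))
        best = max(best, max(cur))
        prev = cur
    return best


def mmpb_score(q: str, s: str, anchors: list[tuple[int, int, int]]) -> int:
    """anchors: non-empty, ordered (query_start, subject_start, length), 0-based."""
    # Residues inside each motif occurrence, scored position by position.
    total = sum(sub(q[qs + k], s[ss + k]) for qs, ss, ln in anchors for k in range(ln))
    # Segments between consecutive motifs: global alignment.
    for (qs1, ss1, ln1), (qs2, ss2, _) in zip(anchors, anchors[1:]):
        total += global_score(q[qs1 + ln1:qs2], s[ss1 + ln1:ss2])
    # Left flank, reversed so the extension walks outward from the first motif.
    qs0, ss0, _ = anchors[0]
    total += extension_score(q[:qs0][::-1], s[:ss0][::-1])
    # Right flank, extending outward from the last motif.
    qsn, ssn, lnn = anchors[-1]
    total += extension_score(q[qsn + lnn:], s[ssn + lnn:])
    return total
```

Fixing the motifs as anchors means the expensive dynamic programming only runs over the short stretches between and around them.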

Page 14

Comparison of Results: il8, Macrophage Inflammatory 1beta

(The middle columns correspond to PHI-BLAST (e=1) and the last one corresponds to MMPB.)

(Bar chart: true positives and false positives for PHI-BLAST (e=10), PHI-BLAST (e=1) and MMPB)

Page 15

il8 (1ikl) Interleukin-8

(Bar chart comparing PHI-BLAST (e=10) and MMPB)

Page 16

4helud (1bbh) Cytochrome c′

(Bar chart: true positives and false positives for PHI-BLAST (e=10) and MMPB)

Page 17

4helud (256b) Cytochrome b562

(Bar chart: true positives and false positives for PHI-BLAST (e=100) and MMPB)

Page 18

Flav (1ord) Ornithine Decarboxylase

(Bar chart: true positives and false positives for PHI-BLAST (e=10) and MMPB)

Page 19

Flav (1cus) Cutinase

(Bar chart: true positives and false positives for PHI-BLAST (e=10) and MMPB)

Page 20

Ranked Motif PHI-BLAST

Takes a query sequence, a number of motifs in the order of their ranks, and a database to search

Aligns the query sequence with all the sequences that contain the minimum number of highest-ranked motifs

Computes a score for each sequence

Reports all the sequences scoring above a particular threshold value, sorted by score
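The slide does not spell out exactly how ranks are applied. One plausible reading, assumed in this sketch, is that a subject qualifies only if it contains every one of the `top_k` highest-ranked motifs, so a sequence carrying only lower-ranked motifs is rejected. Motifs are Python regular expressions; the function name and the data are hypothetical.

```python
from __future__ import annotations

import re


def rmpb_candidates(database: dict[str, str], ranked_motifs: list[str],
                    top_k: int) -> list[str]:
    """Names of subjects containing every one of the top_k highest-ranked motifs.

    ranked_motifs is ordered from highest to lowest rank (index 0 = rank 1).
    """
    required = [re.compile(m) for m in ranked_motifs[:top_k]]
    return [name for name, seq in database.items()
            if all(p.search(seq) for p in required)]
```

For example, with ranked motifs `["RC.C", "W..W", "KLM"]` and `top_k=2`, a subject such as `WAAWKLM` contains two motifs but not the top-ranked `RC.C`, so it is rejected here, whereas a count-only filter requiring two motifs would keep it.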

Page 21

Comparison of Results: il8 (1hum), Macrophage Inflammatory 1beta

(The unmarked columns correspond to RMPB with at least 3 and at least 2 motifs.)

(Bar chart: true positives and false positives; labeled columns: MMPB with at least 3 motifs and MMPB with at least 2 motifs)

Page 22

il8 (1ikl) Interleukin-8

(Bar chart: true positives and false positives for MMPB with at least 3 motifs and MMPB with at least 2 motifs)

Page 23

The problems are solved!

Multiple motifs can be given as input

Motifs can be weighted via their ranks

Only one run is required for any number of motifs, so less time is needed

A deeper analysis is possible

Page 24

That’s All, and Thanks to All of You