Home Work I. Running Blast with BioPerl Input: 1) Sequence or Acc.Num. 2) Threshold (E value cutoff)...


Post on 22-Dec-2015



Home Work

I. Running Blast with BioPerl

Input:

1) Sequence or accession number.

2) Threshold (E-value cutoff).

Output:

1) Blast results: sequence names, alignment score, E-value.

2) Near each result, provide a link that redirects to Pairwise Alignment (from the previous exercise). The Pairwise Alignment page should be pre-filled with the two sequences (first: the original sequence; second: the selected sequence from the Blast run).

* You should also submit a data-flow diagram with BioPerl class names.


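Home Work (continued)

• Doc: BioPerl tutorial, section III.4.1, Running BLAST remotely (using RemoteBlast.pm)

• Use the sleep function to pause between attempts to retrieve the remote results.

• Data-flow diagram example for retrieving a sequence:

GenBank --get_Seq_by_acc('AF303112')--> Seq --seq()--> string

use Bio::DB::GenBank;

# Create the GenBank database handle.
$gb = Bio::DB::GenBank->new();

# Fetch the sequence object by accession number.
$seq = $gb->get_Seq_by_acc('AF303112');

# Print the raw sequence string.
print $seq->seq();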


Home Work (continued)

II. Translate PROSITE pattern into Perl regular expression.
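As a starting point for this exercise, here is a minimal sketch of such a translation. It assumes the common PROSITE syntax only: elements separated by '-', x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repetition, '<' and '>' as anchors at the ends of the pattern, and a terminating period. The function name prosite_to_regex is hypothetical, and rarer forms (for example '>' inside square brackets) are not handled.

```perl
use strict;
use warnings;

# Convert one PROSITE pattern into a Perl regular expression string.
# Hypothetical helper; covers only the common PROSITE syntax.
sub prosite_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/\s+//g;    # ignore whitespace
    $pattern =~ s/\.$//;     # drop the terminating period
    my $regex = '';
    for my $element (split /-/, $pattern) {
        my $e = $element;
        # '<' anchors the pattern at the N-terminus, '>' at the C-terminus.
        my $start = ($e =~ s/^<//) ? '^' : '';
        my $end   = ($e =~ s/>$//) ? '$' : '';
        # Repetition: (n) becomes {n}, (n,m) becomes {n,m}.
        my $repeat = '';
        if ($e =~ s/\((\d+)(?:,(\d+))?\)$//) {
            $repeat = defined($2) ? '{' . $1 . ',' . $2 . '}' : '{' . $1 . '}';
        }
        my $core;
        if    ($e =~ /^[xX]$/)         { $core = '.'; }               # any residue
        elsif ($e =~ /^\[([A-Z]+)\]$/) { $core = '[' . $1 . ']'; }    # allowed set
        elsif ($e =~ /^\{([A-Z]+)\}$/) { $core = '[^' . $1 . ']'; }   # forbidden set
        elsif ($e =~ /^[A-Z]$/)        { $core = $e; }                # single residue
        else { die "Unrecognised PROSITE element: $element\n"; }
        $regex .= $start . $core . $repeat . $end;
    }
    return $regex;
}

# C2H2 zinc finger pattern (PROSITE PS00028).
my $zinc_finger = prosite_to_regex('C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.');
print "$zinc_finger\n";    # prints C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

The returned string can be used directly in a match, for example $protein =~ /$zinc_finger/.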


Profile Analysis

M. Gribskov, D. Eisenberg.

Profile Analysis - detection of distantly related proteins by sequence comparison.

The information is expressed in a position-specific scoring table (profile).


Profiles

[Figure: sequences Seq1, Seq2, Seq3 and Seq4]


Profile alignment

•Sequence – Profile Alignment.

•Profile – Profile Alignment.

Both are solved with Dynamic Programming (the same idea as in Pairwise Sequence Alignment).


Reminder: Pairwise Sequence Alignment

Sequence-Profile alignment:

S(x,j) – score for aligning character 'x' with profile column 'j'

S(x,j) = Σ_y σ(x,y) · p(y,j) / p(y)

σ(x,y) – any regular substitution score for Pairwise Alignment (PAM-k, BLOSUM-k, …)

p(y,j) – frequency with which character y appears in column 'j' of the multiple alignment

p(y) – frequency with which character y appears anywhere in the sequences of the multiple alignment

The position-specific gap coefficients penalize gaps in conserved regions more heavily than gaps in more variable regions.
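The score S(x,j) can be computed directly from a multiple alignment. The sketch below is only an illustration under stated assumptions: the alignment is a toy, gap-free DNA example, σ(x,y) is a hypothetical +1/-1 match/mismatch score standing in for PAM or BLOSUM, and the function names are made up for this example. Position-specific gap coefficients are not modelled.

```perl
use strict;
use warnings;

# Toy gap-free multiple alignment (hypothetical data).
my @alignment = qw(ACGT ACGA AGGT);

# Hypothetical pairwise score sigma(x,y): +1 for a match, -1 for a mismatch.
sub sigma {
    my ($x, $y) = @_;
    return $x eq $y ? 1 : -1;
}

# p(y): frequency of character y over all positions of all sequences.
sub background_freqs {
    my @seqs = @_;
    my %count;
    my $total = 0;
    for my $seq (@seqs) {
        for my $ch (split //, $seq) {
            $count{$ch}++;
            $total++;
        }
    }
    my %freq;
    $freq{$_} = $count{$_} / $total for keys %count;
    return \%freq;
}

# p(y,j): frequency of character y in alignment column j (0-based).
sub column_freqs {
    my ($j, @seqs) = @_;
    my %count;
    $count{ substr($_, $j, 1) }++ for @seqs;
    my %freq;
    $freq{$_} = $count{$_} / scalar(@seqs) for keys %count;
    return \%freq;
}

# S(x,j) = sum over y of sigma(x,y) * p(y,j) / p(y).
# Only characters present in column j contribute (p(y,j) > 0).
sub profile_score {
    my ($x, $col, $bg) = @_;
    my $s = 0;
    for my $y (keys %$col) {
        $s += sigma($x, $y) * $col->{$y} / $bg->{$y};
    }
    return $s;
}

my $bg = background_freqs(@alignment);
for my $j (0 .. length($alignment[0]) - 1) {
    my $col = column_freqs($j, @alignment);
    for my $x (qw(A C G T)) {
        printf "S(%s,%d) = %.2f\n", $x, $j, profile_score($x, $col, $bg);
    }
}
```

In the toy data, column 1 contains C twice and G once, while p(C) = 1/6 and p(G) = 1/3, so S(C,1) = 1·(2/3)/(1/6) - 1·(1/3)/(1/3) = 3.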


Profiles in GCG

PileUp creates a multiple sequence alignment from a group of related sequences.

ProfileMake makes a profile from a multiple sequence alignment.

ProfileSearch uses the profile to search a database for sequences with similarity to the group of aligned sequences.

ProfileSegments displays optimal alignments between each sequence in the ProfileSearch output list and the group of aligned sequences (represented by the profile consensus).

ProfileGap makes optimal alignments between one or more sequences and a group of aligned sequences represented as a profile.

ProfileScan uses a database of profiles to find structural and sequence motifs in protein sequences.


Iterative profile pairwise alignment

1. Align some pair.

2. While (not done):

(a) Pick an unaligned string which is "near" some aligned one(s).

(b) Align it with the profile of the previously aligned group.

Resulting new spaces are inserted into all strings in the group.
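Step (b) and the insertion of new spaces can be sketched with a small dynamic-programming routine. This is an illustration under simplifying assumptions, not the method of any particular program: a column is scored as the average of a hypothetical +1/-1 match/mismatch score over the group (a gap in the column scores 0), gaps cost a flat -2, and the name align_to_profile is made up for this example.

```perl
use strict;
use warnings;

my $GAP = -2;    # hypothetical linear gap penalty

# Hypothetical residue score: +1 match, -1 mismatch, 0 against a gap.
sub sigma {
    my ($x, $y) = @_;
    return 0 if $y eq '-';
    return $x eq $y ? 1 : -1;
}

# Average score of residue $x against column $j of the aligned group.
sub column_score {
    my ($x, $j, $group) = @_;
    my $sum = 0;
    $sum += sigma($x, substr($_, $j, 1)) for @$group;
    return $sum / scalar(@$group);
}

# Align $seq to an already aligned group (array ref of equal-length strings).
# Returns the score, the gapped sequence, and the group with new gaps inserted.
sub align_to_profile {
    my ($seq, $group) = @_;
    my ($n, $m) = (length($seq), length($group->[0]));
    my @F;
    $F[$_][0] = $_ * $GAP for 0 .. $n;
    $F[0][$_] = $_ * $GAP for 0 .. $m;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $best = $F[$i-1][$j-1] + column_score(substr($seq, $i-1, 1), $j-1, $group);
            my $up   = $F[$i-1][$j] + $GAP;    # residue against a new gap column
            my $left = $F[$i][$j-1] + $GAP;    # profile column against a gap
            $best = $up   if $up   > $best;
            $best = $left if $left > $best;
            $F[$i][$j] = $best;
        }
    }
    # Traceback from the bottom-right corner.
    my ($i, $j) = ($n, $m);
    my $new_seq = '';
    my @new_group = ('') x @$group;
    while ($i > 0 || $j > 0) {
        if ($i > 0 && $j > 0
            && $F[$i][$j] == $F[$i-1][$j-1] + column_score(substr($seq, $i-1, 1), $j-1, $group)) {
            $new_seq = substr($seq, $i-1, 1) . $new_seq;
            $new_group[$_] = substr($group->[$_], $j-1, 1) . $new_group[$_] for 0 .. $#new_group;
            $i--; $j--;
        }
        elsif ($i > 0 && ($j == 0 || $F[$i][$j] == $F[$i-1][$j] + $GAP)) {
            # A new space is inserted into every string of the group.
            $new_seq = substr($seq, $i-1, 1) . $new_seq;
            $new_group[$_] = '-' . $new_group[$_] for 0 .. $#new_group;
            $i--;
        }
        else {
            $new_seq = '-' . $new_seq;
            $new_group[$_] = substr($group->[$_], $j-1, 1) . $new_group[$_] for 0 .. $#new_group;
            $j--;
        }
    }
    return ($F[$n][$m], $new_seq, \@new_group);
}

my ($score, $aligned, $new_group) = align_to_profile('ACGT', ['ACT', 'ACT']);
print "$_\n" for (@$new_group, $aligned);
```

With these toy inputs the extra G forces a new column, so both group strings become AC-T while the added sequence stays ACGT.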


Progressive Profile Alignment

ClustalW (algorithm of Thompson, Higgins, Gibson 1994)

(the idea is close to Feng-Doolittle 1987, implemented in PileUp, GCG package)

1. Calculate the pairwise alignment scores, and convert them to distances.

2. Use a neighbor-joining algorithm to build a tree from the distances.

3. Align sequence - sequence, sequence - profile, profile - profile in decreasing similarity order.
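Step 1 and the ordering used in step 3 can be illustrated with a small sketch. It makes strong simplifying assumptions: the toy sequences have equal length, and the distance is taken as 1 minus the fraction of identical positions, which stands in for a score from a real pairwise alignment. It does not build a neighbor-joining tree (step 2); it only lists the pairs from most to least similar. All names and data are hypothetical.

```perl
use strict;
use warnings;

# Distance between two equal-length sequences:
# 1 - (fraction of identical positions). A stand-in for step 1.
sub identity_distance {
    my ($s, $t) = @_;
    die "sequences must have equal length\n" unless length($s) == length($t);
    my $same = 0;
    for my $k (0 .. length($s) - 1) {
        $same++ if substr($s, $k, 1) eq substr($t, $k, 1);
    }
    return 1 - $same / length($s);
}

# All pairs as [name1, name2, distance], sorted by increasing distance,
# i.e. in decreasing similarity order.
sub pairs_by_distance {
    my (%seqs) = @_;
    my @names = sort keys %seqs;
    my @pairs;
    for my $i (0 .. $#names - 1) {
        for my $j ($i + 1 .. $#names) {
            my $d = identity_distance($seqs{ $names[$i] }, $seqs{ $names[$j] });
            push @pairs, [ $names[$i], $names[$j], $d ];
        }
    }
    return sort { $a->[2] <=> $b->[2] } @pairs;
}

# Toy data (hypothetical).
my %toy = (
    Seq1 => 'ACGTACGT',
    Seq2 => 'ACGTACGA',
    Seq3 => 'TCGAACGA',
);

for my $pair (pairs_by_distance(%toy)) {
    printf "%s - %s: %.3f\n", @$pair;
}
```

Here Seq1 and Seq2 differ at one position out of eight (distance 0.125), so they would be aligned first; a real implementation would feed the full distance matrix to neighbor joining instead of sorting pairs.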


Alignment tree built by ClustalW