Expect value (E-value)

Expect value (E-value)

• Expected number of hits, of equivalent or better score, found by random chance in a database of the size searched.
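The definition above can be made concrete with the Karlin-Altschul relation that BLAST statistics are based on, E = K·m·n·e^(−λS), where S is the raw alignment score, m the query length and n the database length. The sketch below is only an illustration: the function names are hypothetical, and the values of `lam` and `k` are illustrative placeholders (real values depend on the scoring matrix and gap penalties).

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score.

    Karlin-Altschul form: E = K * m * n * exp(-lambda * S).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative placeholder parameters (assumptions, not taken from the slides).
LAM, K = 0.267, 0.041

small_db = e_value(100, query_len=250, db_len=1_000_000_000, lam=LAM, k=K)
large_db = e_value(100, query_len=250, db_len=2_000_000_000, lam=LAM, k=K)
print(small_db, large_db)  # the same score is less significant in a larger database
```

Because E scales linearly with the size of the search space, the same alignment score yields a larger E-value when a larger database is searched.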

Conserved domains

Domain: a sequence of amino acids that typically folds into a stable tertiary structure. Many proteins are multi-domain.

BLAST to PSI-BLAST

• BLAST uses a scoring matrix derived from a large number of proteins.

• What if you want to find homologs based upon a specific gene product?

• Develop a position-specific scoring matrix (PSSM).

PSSM

     M  F  W  Y  G  A  P  V  I  L  C  R  K  E  N  D  Q  S  T  H
M    5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
G    1  0  0  0  1  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0
A    1  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  2  0
F    0  4  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Determine the frequency of each substitution at every position, then convert the frequencies to log-odds scores.
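The counting and log-odds conversion can be sketched as follows. This is a minimal illustration, not the PSI-BLAST implementation: the five aligned sequences are hypothetical (chosen only so that their column counts reproduce the table on this slide), and the uniform background frequency and the simple pseudocount are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"  # column order used on the slide


def column_counts(alignment):
    """Count how often each residue occurs at each alignment position."""
    length = len(alignment[0])
    return [Counter(seq[i] for seq in alignment) for i in range(length)]


def log_odds_pssm(alignment, background=None, pseudocount=1.0):
    """Turn per-column counts into log2-odds scores.

    Assumption: uniform background frequencies unless supplied; the
    pseudocount keeps unobserved residues from scoring minus infinity.
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    n = len(alignment)
    pssm = []
    for counts in column_counts(alignment):
        row = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount * background[aa]) / (n + pseudocount)
            row[aa] = math.log2(freq / background[aa])
        pssm.append(row)
    return pssm


# Hypothetical aligned sequences whose counts match the slide's table.
alignment = ["MGASF", "MMASF", "MLMSY", "MEATF", "MQATF"]
counts = column_counts(alignment)
pssm = log_odds_pssm(alignment)

for position, row in zip("MGASF", pssm):
    print(position, " ".join(f"{row[aa]:5.1f}" for aa in AMINO_ACIDS))
```

Residues seen more often than the background expects get positive scores; residues never seen in a column get negative scores.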

PSSM

     M  F  W  Y  G  A  P  V  I  L  C  R  K  E  N  D  Q  S  T  H
M    5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
G    1  0  0  0  1  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0
A    1  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0
S    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  2  0
F    0  4  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

A PSSM can also include a score for permitting insertions and deletions (INDELs) at each position. Perhaps this position is at a turn, where INDELs are common.

INDEL

Indel 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

PSSM

• In evaluating (scoring) alignments, PSSM approaches typically:
  – Reward matches to columns that have conserved amino acids
  – Penalize mismatches to columns with conserved amino acids more than mismatches in a variable column
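The effect of column conservation on scoring can be seen in a small sketch. Everything here is illustrative: the aligned sequences are hypothetical, the background is assumed uniform, and the ungapped `score` function is a simplification rather than how any particular tool scores alignments.

```python
import math
from collections import Counter

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def build_pssm(alignment, pseudocount=1.0):
    """Build a log2-odds PSSM (one dict per column) from aligned sequences."""
    n = len(alignment)
    pssm = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        pssm.append({
            aa: math.log2(
                (counts[aa] + pseudocount * BACKGROUND) / (n + pseudocount) / BACKGROUND
            )
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, seq):
    """Ungapped score: sum the column score of each residue in seq."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Hypothetical alignment: column 1 is fully conserved (M), column 2 is variable.
alignment = ["MGASF", "MMASF", "MLMSY", "MEATF", "MQATF"]
pssm = build_pssm(alignment)

query = "MGASF"
conserved_mismatch = "WGASF"  # W replaces the conserved M in column 1
variable_mismatch = "MWASF"   # W replaces G in the variable column 2

drop_conserved = score(pssm, query) - score(pssm, conserved_mismatch)
drop_variable = score(pssm, query) - score(pssm, variable_mismatch)
print(drop_conserved, drop_variable)
```

With these assumptions, the same substitution costs more score in the conserved column than in the variable one, because the conserved column rewards its consensus residue much more strongly.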

PSI-BLAST

• Input a single query sequence.

• Executes a BLAST run.

• The program takes the significant hits and incorporates the matches into a PSSM.

• Sequences that are >98% similar are not included (to avoid biasing the PSSM).

Power of approach:

• PSI-BLAST is iterative.

• Each round takes the best hits and uses them to improve the scoring matrix.
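The iteration can be sketched with a toy, in-memory search. This is not PSI-BLAST itself: the function `toy_psi_blast`, the equal-length ungapped sequences, the uniform background, the score threshold, and the choice to compare new hits against sequences already in the profile are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def build_pssm(seqs, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, pre-aligned sequences."""
    n = len(seqs)
    pssm = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        pssm.append({
            aa: math.log2(
                (counts[aa] + pseudocount * BACKGROUND) / (n + pseudocount) / BACKGROUND
            )
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, seq):
    """Ungapped score of seq against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def toy_psi_blast(query, database, threshold, max_identity=0.98, max_rounds=5):
    """Iteratively grow a profile: search, add hits, rebuild the PSSM, repeat."""
    members = [query]
    for _ in range(max_rounds):
        pssm = build_pssm(members)  # round 1 uses the query alone
        added = False
        for seq in database:
            if seq in members or score(pssm, seq) < threshold:
                continue
            # Skip near-duplicates so they do not bias the PSSM.
            if any(identity(seq, m) > max_identity for m in members):
                continue
            members.append(seq)
            added = True
        if not added:  # converged: no new hits this round
            break
    return members


database = ["MLASF", "MEATF", "MQATY", "WWWWW"]
one_round = toy_psi_blast("MGASF", database, threshold=5.0, max_rounds=1)
iterated = toy_psi_blast("MGASF", database, threshold=5.0)
print(one_round)
print(iterated)
```

In this toy run, the more distant sequence scores below the threshold against the query alone but is picked up once the first-round hits have been folded into the PSSM, which is the point of iterating.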

The original BLAST run had 84 hits.

The PSSM will skew towards this region.