Probability and Markov Models


    Timothy L. Bailey, BIOL3014


    Reading

    Chapter 1 in the book.

    Chapter 3, pages 46-51.


    Definition: Random Process

    A RANDOM PROCESS is something that has a random outcome:

    Roll a die, flip a coin, roll 2 dice

    Observe an orthologous base pair in 2 sequences

    Measure an mRNA level

    Weigh a person


    Definition: Experiment

    In probability theory, an EXPERIMENT is a single observation of a random process.


    Definition: Event

    An EVENT is a set of possible outcomes of an experiment.

    An ELEMENTARY EVENT is whatever you decide it is. For example:

    The outcome of 1 roll of a die
    The outcomes of n rolls of a die
    The residue at position 237 in a protein
    The residues at position 237 in a family of proteins
    The weight of a person

    Elementary events must be non-overlapping!


    Compound Events

    A COMPOUND EVENT is a set of one or more elementary events.

    For example, you might define two compound events in a die-rolling experiment: E = roll less than 3, F = roll greater than or equal to 3.

    Then,

    E = {1, 2} and F = {3, 4, 5, 6}.


    Definition: Sample Space

    The SAMPLE SPACE is the set of all ELEMENTARY EVENTS.

    So the sample space is the universe of all possible outcomes of the experiment. This is written:

    Ω = { Ei }

    For example, for rolls of a die, you might have: Ω = {1, 2, 3, 4, 5, 6}


    Discrete vs. Continuous Events

    The sample space might be INFINITE. For example, the weight of a person can be any real number greater than 0.

    Some events are DISCRETE: countable

    Base pairs, residues, die rolls

    Other events are CONTINUOUS: e.g., real numbers

    Weights, alignment scores, mRNA levels


    The Axioms of Probability

    Let E and F be events. Then the axioms of probability are:

    1. Pr(E) ≥ 0

    2. Pr(Ω) = 1

    3. Pr(E ∪ F) = Pr(E) + Pr(F) if (E ∩ F) = ∅

    4. Pr(E | F) Pr(F) = Pr(E ∩ F)

    [Venn diagrams of E ∪ F and E ∩ F: probability is like area in Venn diagrams.]
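    As a quick numerical check of these axioms (not from the slides), the Python sketch below reuses the die events E = {1, 2} and F = {3, 4, 5, 6} defined earlier; the assumption that each face has probability 1/6 (a fair die) is added here for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a die; each face assumed to have probability 1/6.
omega = {1, 2, 3, 4, 5, 6}
pr_elementary = {outcome: Fraction(1, 6) for outcome in omega}

def prob(event):
    """Probability of an event = sum of the probabilities of its elementary events."""
    return sum(pr_elementary[x] for x in event)

E = {1, 2}          # roll less than 3
F = {3, 4, 5, 6}    # roll greater than or equal to 3

assert prob(E) >= 0                        # Axiom 1: Pr(E) >= 0
assert prob(omega) == 1                    # Axiom 2: Pr(omega) = 1
assert prob(E | F) == prob(E) + prob(F)    # Axiom 3: E and F are disjoint
pr_E_given_F = prob(E & F) / prob(F)       # Axiom 4: Pr(E | F) Pr(F) = Pr(E, F)
assert pr_E_given_F * prob(F) == prob(E & F)

print(prob(E), prob(F), prob(E | F))       # 1/3 2/3 1
```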


    Notation

    Joint Probability: Pr(E,F)

    The probability of E and F

    Conditional probability: Pr(E | F)

    The probability of E given F


    Conditional Probability and Bayes Rule

    Conditional probability can be defined as:

    Pr(E | F) = Pr(E,F) / Pr(F)

    Bayes Rule can be used to reverse the roles of E and F:

    Pr(F | E) = Pr(E | F) Pr(F) / Pr(E)
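    A worked example (not from the slides): the three input probabilities below are made-up values, chosen only to show the arithmetic of the two formulas.

```python
from fractions import Fraction

# Hypothetical probabilities, assumed purely for illustration.
pr_E_given_F = Fraction(9, 10)   # Pr(E | F)
pr_F = Fraction(1, 5)            # Pr(F)
pr_E = Fraction(3, 10)           # Pr(E)

# Definition of conditional probability: Pr(E, F) = Pr(E | F) * Pr(F)
pr_E_and_F = pr_E_given_F * pr_F

# Bayes Rule: Pr(F | E) = Pr(E | F) * Pr(F) / Pr(E)
pr_F_given_E = pr_E_given_F * pr_F / pr_E

print(pr_E_and_F)     # 9/50
print(pr_F_given_E)   # 3/5
```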


    Sequence Models

    Observed biological sequences (DNA, RNA, protein) can be thought of as the outcomes of random processes.

    So, it makes sense to model sequences using probabilistic models.

    You can think of a sequence model as a little machine that randomly generates sequences.


    A Simple Sequence Model

    Imagine a tetrahedral (four-sided) die with the letters A, C, G and T on its sides.

    You roll the die 100 times and write down the letters that come up (down, actually).

    This is a simple random sequence model.
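    A minimal Python sketch of this die-rolling model (not from the slides), assuming a fair die, i.e. probability 1/4 for each of A, C, G and T:

```python
import random

random.seed(0)  # for a reproducible example

# Assumed emission probabilities of the four-sided die (fair by assumption).
emission = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Roll the die 100 times and write down the letters that come up.
letters = list(emission.keys())
weights = list(emission.values())
sequence = "".join(random.choices(letters, weights=weights, k=100))

print(sequence)
```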


    Zero-order Markov Model

    The four-sided die model is called a 0-order Markov model.

    It can be drawn thus:

    [Diagram: 0-order Markov sequence model. A single state M with emission probabilities qA, qC, qG, qT and a self-transition probability p = 1.]


    Complete 0-order Markov Model

    To model the length of the sequences that the model can generate, we need to add start and end states.

    [Diagram: complete 0-order Markov sequence model. A start state S moves to the emitting state M with probability 1; M (emission probabilities qA, qC, qG, qT) returns to itself with probability p and moves to the end state E with probability 1-p.]


    Generating a Sequence

    This Markov model can generate any DNA sequence. Associated with each sequence is a path and a probability.

    1. Start in state S: P = 1
    2. Move to state M: P = 1 · P
    3. Print x: P = qx · P
    4. Move to state M: P = p · P, or to state E: P = (1-p) · P
    5. If in state M, go to 3. If in state E, stop.

    Sequence: GCAGCT
    Path: S, M, M, M, M, M, M, E
    P = 1 · qG · p · qC · p · qA · p · qG · p · qC · p · qT · (1-p)

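    A minimal Python sketch of this procedure (not from the slides); the emission probabilities q and the transition probability p below are assumed values, not anything specified on the slides.

```python
import random

random.seed(1)  # reproducible example

# Assumed parameters, for illustration only.
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}   # emission probabilities qA..qT
p = 0.9                                            # probability of returning to state M

def generate():
    """Run the S -> M -> ... -> E path, returning the sequence and its probability."""
    sequence = []
    P = 1.0                          # 1. Start in state S: P = 1
    P *= 1.0                         # 2. Move to state M: P = 1 * P
    while True:
        x = random.choices(list(q), weights=list(q.values()), k=1)[0]
        sequence.append(x)           # 3. Print x
        P *= q[x]                    #    P = qx * P
        if random.random() < p:      # 4. Move back to state M: P = p * P ...
            P *= p
        else:                        #    ... or to state E: P = (1-p) * P
            P *= 1 - p
            return "".join(sequence), P   # 5. In state E: stop

seq, P = generate()
print(seq, P)
```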


    Using a 0-order Markov Model

    This model can generate any DNA sequence, so it can be used to model DNA.

    We used it as the background model when we created scoring matrices for sequence alignment.

    It's a pretty dumb model, though. DNA is not very well modeled by a 0-order Markov model because the probability of seeing, say, a G following a C is usually different from that of a G following an A (e.g., in CpG islands).

    So we need better models: higher-order Markov models.


    Markov Model Order

    This simple sequence model is called a 0-order Markov model because the probability distribution of the next letter to be generated doesn't depend on any (zero) of the letters preceding it.

    The Markov Property:

    Let X = X1 X2 ... XL be a sequence.

    In an n-order Markov sequence model, the probability distribution of the next letter depends on the previous n letters generated.

    0-order: Pr(Xi | X1 X2 ... Xi-1) = Pr(Xi)

    1-order: Pr(Xi | X1 X2 ... Xi-1) = Pr(Xi | Xi-1)

    n-order: Pr(Xi | X1 X2 ... Xi-1) = Pr(Xi | Xi-1 Xi-2 ... Xi-n)



    A 1-order Markov Sequence Model

    In a first-order Markov sequence model, the probability of the next letter depends on what the previously generated letter was.

    We can model this by making a state for each letter. Each state always emits the letter it is labeled with. (Not all transitions are shown.)

    [Diagram: first-order Markov sequence model with a start state S, an end state E, and one state per letter (A, C, G, T). Transitions between letter states carry conditional probabilities such as Pr(A|A), Pr(T|T), Pr(G|G), Pr(C|C), Pr(T|A), and Pr(C|G).]
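    A minimal Python sketch of a first-order model (not from the slides); the transition table entries and the uniform choice of the first letter are assumed values, and the sequence length is fixed here rather than being ended by an explicit E state.

```python
import random

random.seed(2)

letters = "ACGT"

# Assumed transition probabilities Pr(next | previous); each row sums to 1.
transition = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}

def generate(length):
    """Generate a sequence where each letter depends only on the previous one."""
    seq = [random.choice(letters)]   # first letter uniform (an assumption)
    for _ in range(length - 1):
        probs = transition[seq[-1]]
        nxt = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
        seq.append(nxt)
    return "".join(seq)

def probability(seq):
    """Pr(sequence) = Pr(X1) * product over i of Pr(Xi | Xi-1)."""
    prob = 1.0 / len(letters)        # uniform first-letter probability (assumed)
    for prev, nxt in zip(seq, seq[1:]):
        prob *= transition[prev][nxt]
    return prob

s = generate(10)
print(s, probability(s))
```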


    A 2-order Markov Model

    To make a second-order Markov sequence model, each state is labelled with two letters. It emits the second letter in its label.

    There would have to be sixteen states: AA, AC, AG, AT, CA, CG, CT, etc., plus four states for the first letter in the sequence: A, C, G, T.

    Each state would have transitions only to states whose first letter matches its second letter.
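    A minimal Python sketch (not from the slides) that enumerates this state set and the allowed transitions; it builds only the structure and assigns no probabilities.

```python
from itertools import product

letters = "ACGT"

# Four single-letter states for the first position, plus sixteen two-letter states.
start_states = list(letters)                                    # A, C, G, T
pair_states = ["".join(p) for p in product(letters, repeat=2)]  # AA, AC, ..., TT

def emitted(state):
    """A state emits the last letter of its label."""
    return state[-1]

# A state may move only to pair states whose first letter matches its emitted letter,
# e.g. AA -> AA, AC, AG, AT.
transitions = {
    state: [nxt for nxt in pair_states if nxt[0] == emitted(state)]
    for state in start_states + pair_states
}

print(len(pair_states))      # 16
print(transitions["AA"])     # ['AA', 'AC', 'AG', 'AT']
print(transitions["A"])      # ['AA', 'AC', 'AG', 'AT']  (first-letter state A)
```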


    Part of a 2-order Model

    Each state remembers in its label what the previously emitted letter was.

    [Diagram: part of the 2-order model, showing state AA with transitions to states AA, AT, AG, and AC (labeled Pr(A|AA), Pr(T|AA), Pr(G|AA), and so on) and a transition to the end state E.]