
1

Why CpG islands?

CSE, Marmara University
mimoza.marmara.edu.tr/~m.sakalli/cse546
Including some slides of Papoulis. These notes will be further modified.
Dec/15/09

Notes on probability are from A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes.

2

RNA interference, and DNA methylation (R-COOH → R-COOCH3)

• Methylation is involved in the regulation of gene expression, protein function, and RNA metabolism.

• A cell is a combination of numerous proteins, each determining how the cell functions. Disproportionately expressed proteins can have devastating effects.

• Two possible vulnerabilities:

– one is at the transcriptional level, while DNA is converted to mRNA: an antisense oligonucleotide binds to the unprocessed gene in the DNA, creating a three-strand complex and as a result blocking the transcription process;

– the second vulnerability is at the level of translation. Translation is a ribosome-guided process for manufacturing a protein from mRNA. There, once antisense RNA hybridizes with the mRNA, protein generation is inhibited, since the editing enzymes that splice introns from RNAs are blocked. RNase H recognizes the double-helix complex of the antisense bound to the mRNA, frees the antisense strand, and cleaves the mRNA.

• Antisense therapy: used against HIV and influenza, and in cancer treatment, where replication and transcription are targeted.

3

RNA interference, and DNA methylation (R-COOH → R-COOCH3)

• RNA interference (RNAi) is a system controlling (either increasing or decreasing) the activity of RNAs. MicroRNA (miRNA) and small interfering RNA (siRNA) are direct products of genes and can bind to other specific RNAs. They play roles in defending cells against parasitic genes – viruses and transposons – but also in gene expression in general. The mechanism is universal.

• The methylation process differs in prokaryotic and eukaryotic cells: in the former it occurs at the C5 position of the cytosine pyrimidine ring and at the N6 nitrogen of the adenine purine ring, while in the latter it occurs at the C5 carbon of cytosine.

• In mammals, methylation occurs at the C5 of the CpG dinucleotide. CpG makes up about 1% of the human genome, and most CpG sites are methylated. Unmethylated CpG islands are present at regulatory genes, including promoter regions; methylation there impedes transcription and protein binding (chromatin and histones).

• One abnormality caused by incomplete methylation, for example, is Rett syndrome, an epigenetic abnormality. Methylated histones hold DNA tightly and block transcription.

4

• The occurrence of the CpG sequence is the least frequent in many genomes, rarer than would be expected from the independent probabilities of C and G. This is because the C in CpG has a tendency to methylate and become methyl-C, and the methylation process is suppressed in areas around genes; hence these areas have a relatively higher concentration of CpG, the CpG islands.

• Epigenetic importance: methyl-C has a high chance of mutating to T, and is therefore important in epigenetic inheritance, as well as in controlling gene expression and regulation.

• Questions: How close is a short sequence to being a CpG island? What is the likelihood of a long sequence containing one or more CpG islands? And, more importantly, what relation does this bear: coincidence, or some functional reason?

• Therefore: Markov chains, as sketched below.
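To make the question concrete, here is a minimal sketch of the standard two-model scoring approach: a first-order Markov model for "inside an island" (+) against a background model (-), with a sequence scored by its log-odds ratio. The transition tables follow the style of the well-known textbook example of Durbin et al.; treat all numbers as illustrative placeholders rather than estimates from real data.

```python
import math

# First-order transition probabilities P(next | current) for an
# inside-island model (+) and a background model (-).  Numbers in the
# style of Durbin et al.'s textbook example -- illustrative only.
PLUS = {
    "A": {"A": .180, "C": .274, "G": .426, "T": .120},
    "C": {"A": .171, "C": .368, "G": .274, "T": .188},
    "G": {"A": .161, "C": .339, "G": .375, "T": .125},
    "T": {"A": .079, "C": .355, "G": .384, "T": .182},
}
MINUS = {
    "A": {"A": .300, "C": .205, "G": .285, "T": .210},
    "C": {"A": .322, "C": .298, "G": .078, "T": .302},
    "G": {"A": .248, "C": .246, "G": .298, "T": .208},
    "T": {"A": .177, "C": .239, "G": .292, "T": .292},
}

def log_odds(seq: str) -> float:
    """Log2 likelihood ratio of the + (island) vs - (background) model.

    The per-dinucleotide score log2(p+ / p-) is summed over all
    consecutive pairs; positive totals favour "CpG island".
    """
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        score += math.log2(PLUS[prev][cur] / MINUS[prev][cur])
    return score

print(log_odds("CGCGCGCG"))   # strongly positive: island-like
print(log_odds("ATATATAT"))   # negative: background-like
```

Sliding this score along a long sequence in fixed-size windows is one simple way to flag candidate islands; the HMM formulation later in these notes removes the need to choose a window size.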

5

A Markov chain is a stochastic process, a discrete process $\{X_n\}$ where $n \in \{0, 1, 2, \ldots\}$, with the Markov property: the conditional probability distribution of the future states depends only upon the current state and a fixed number of past states (with $m$ memories). A continuous-time MC has a continuous time index.

$$\Pr\{X_{m+1} = j \mid X_0 = k_0, \ldots, X_{m-1} = k_{m-1}, X_m = i\} = \Pr\{X_{m+1} = j \mid X_m = i\}$$

are the transition probabilities, for every $i, j, k_0, \ldots, k_{m-1}$ and for every $m$. (Compare with a finite state machine and with an i.i.d. sequence.)

Stationary: for all $n$, the transition matrix does not change over time and the future state depends only on the current state $i$ and not on the previous states:

$$\Pr\{X_{n+1} = j \mid X_n = i\} = \Pr\{X_1 = j \mid X_0 = i\}.$$
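As a quick numerical illustration of this time-homogeneity, the sketch below simulates a three-state chain with an arbitrary made-up transition matrix and checks that the empirical one-step frequencies reproduce it:

```python
import numpy as np

rng = np.random.default_rng(0)

# A stationary (time-homogeneous) chain on states {0, 1, 2}:
# an illustrative transition matrix with rows summing to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

def simulate(P, x0, n):
    """Draw X_1..X_n given X_0 = x0; X_{k+1} depends only on X_k."""
    path = [x0]
    for _ in range(n):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path

path = simulate(P, x0=0, n=100_000)

# Empirical one-step transition frequencies should approach P,
# no matter when we look -- that is stationarity.
counts = np.zeros_like(P)
for i, j in zip(path, path[1:]):
    counts[i, j] += 1
print(counts / counts.sum(axis=1, keepdims=True))
```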

6

The one-step transition matrix for a Markov chain with states $S = \{0, 1, 2\}$ is the $3 \times 3$ matrix $P = [p_{ij}]$, where $p_{ij} = \Pr\{X_1 = j \mid X_0 = i\} \ge 0$ and each row of $P$ sums to 1.

Accessibility: a Markov process is ergodic if it is possible to communicate between any two states $i$ and $j$; the chain is irreducible if all states communicate.

Periodic: the chain returns to the same state every $k$ steps ($k$ is the periodicity). Aperiodic: there is no such repeating $k$.

A state that locks the system in is an absorbing state. If there is no absorbing state (and all states communicate), the Markov chain is irreducible.

[Figure: two state-transition diagrams. The first cycles through the three states 0, 1, 2 with all drawn transition probabilities equal to 1; the second also includes a state 4, with edge probabilities 0.5 and 1.]

7

Transition structure examples (an X marks an allowed transition, $p_{ij} > 0$):

State   0  1  2  3
  0     0  X  0  X
  1     X  0  0  0
  2     X  0  0  0
  3     0  0  X  X

State   0  1  2  3  4
  0     X  X  0  0  X
  1     X  X  0  0  0
  2     0  0  X  X  0
  3     0  0  0  X  0
  4     0  0  0  0  0
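Properties such as irreducibility and absorbing states can be read off such X-patterns mechanically. A small sketch (the helper below is my own illustration, not from the notes): it computes which states are reachable from which, then applies the definitions above to the first X-pattern.

```python
import numpy as np

def analyze(adj):
    """adj[i][j] = 1 where the X-pattern marks p_ij > 0."""
    A = np.array(adj)
    n = len(A)
    # Reachability by repeated squaring of (A + I): reach[i, j] != 0
    # iff state j is accessible from state i in some number of steps.
    reach = A + np.eye(n, dtype=int)
    for _ in range(n):
        reach = (reach @ reach > 0).astype(int)
    irreducible = bool(reach.all())          # every pair communicates
    absorbing = [i for i in range(n)         # can only return to itself
                 if A[i][i] == 1 and sum(A[i]) == 1]
    return irreducible, absorbing

# First X-pattern above (states 0..3).
adj = [[0, 1, 0, 1],
       [1, 0, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 1, 1]]
print(analyze(adj))   # (True, []) -> irreducible, no absorbing state
```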

8

Learn these:

• Conditional probability, joint probability.

• Independence of occurrences of events.

• Bayesian process.

• Expressing sequences statistically with their distributions. Discriminating states.

• MLE, EM.

• MCMC, for producing a desired posterior distribution: 1. Metropolis-Hastings, RWMC; 2. Gibbs sampling.

• Markov chains, with the properties to be maintained: stationary, ergodic, irreducible, aperiodic.

• Hidden Markov Models (the goal is to detect the sequence of underlying states that is likely to give rise to an observed sequence).

• This is the Viterbi algorithm, sketched below.
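As a preview, here is a compact sketch of the Viterbi recursion for a toy two-state HMM ('+' = CpG island, '-' = background). All transition and emission probabilities are illustrative placeholders, not trained values:

```python
import math

log = math.log
states = ["+", "-"]
start = {"+": log(0.5), "-": log(0.5)}
trans = {"+": {"+": log(0.8), "-": log(0.2)},
         "-": {"+": log(0.2), "-": log(0.8)}}
emit  = {"+": {"A": log(0.1), "C": log(0.4), "G": log(0.4), "T": log(0.1)},
         "-": {"A": log(0.25), "C": log(0.25), "G": log(0.25), "T": log(0.25)}}

def viterbi(obs):
    """Most probable hidden-state path (dynamic programming in log space)."""
    v = {s: start[s] + emit[s][obs[0]] for s in states}  # best score ending in s
    back = []                                            # back-pointers
    for x in obs[1:]:
        ptr, nv = {}, {}
        for s in states:
            best = max(states, key=lambda r: v[r] + trans[r][s])
            ptr[s], nv[s] = best, v[best] + trans[best][s] + emit[s][x]
        back.append(ptr)
        v = nv
    last = max(states, key=v.get)                        # best final state
    path = [last]
    for ptr in reversed(back):                           # trace back
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

print(viterbi("ATGCGCGCGCATAT"))  # the CG-rich middle should come out '+'
```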

9

Independence: $A$ and $B$ are said to be independent events if

$$P(AB) = P(A)\,P(B). \tag{1-45}$$

Notice that the above definition is a probabilistic statement, not a set-theoretic notion such as mutual exclusiveness.

Suppose $A$ and $B$ are independent; then

$$P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A). \tag{1-46}$$

Thus if $A$ and $B$ are independent, the event that $B$ has occurred does not give any clue on the occurrence of the event $A$. It makes no difference to $A$ whether $B$ has occurred or not.
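A tiny exact check of (1-45) on a fair die (the two events here are my own illustration): A = "even" and B = "at most 4" turn out to be independent, even though they are clearly not mutually exclusive.

```python
from fractions import Fraction

omega = set(range(1, 7))          # a fair die
A = {2, 4, 6}                     # "even"
B = {1, 2, 3, 4}                  # "at most 4"

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & omega), len(omega))

print(prob(A & B))            # 1/3
print(prob(A) * prob(B))      # (1/2)(2/3) = 1/3  -> A, B independent
```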

10

Example 1.2: A box contains 6 white and 4 black balls. Remove two balls at random without replacement. What is the probability that the first one is white and the second one is black?

Let W1 = “first ball removed is white”

B2 = “second ball removed is black”


11

Ex 1.2: A box contains 6 white and 4 black balls. Remove two at random without replacement. What is the probability that the 1st one is white and the 2nd one is black?

Let W1 = "first ball removed is white" and B2 = "second ball removed is black".

We need $P(W_1 \cap B_2)$. Using the conditional probability rule,

$$P(W_1 B_2) = P(B_2 \mid W_1)\,P(W_1).$$

But

$$P(W_1) = \frac{6}{6 + 4} = \frac{6}{10} = \frac{3}{5},$$

and

$$P(B_2 \mid W_1) = \frac{4}{4 + 5} = \frac{4}{9},$$

and hence

$$P(W_1 B_2) = \frac{3}{5} \cdot \frac{4}{9} = \frac{4}{15} \approx 0.27. \tag{1-47}$$

12

Are the events W1 and B2 independent? Our common sense says no. To verify this we need to compute $P(B_2)$. Of course the fate of the second ball very much depends on that of the first ball. The first ball has two options: W1 = "first ball is white" or B1 = "first ball is black". Note that $W_1 \cap B_1 = \emptyset$ and $W_1 \cup B_1 = \Omega$, hence $W_1$ together with $B_1$ form a partition. Thus (see (1-42)-(1-44))

$$P(B_2) = P(B_2 \mid W_1)\,P(W_1) + P(B_2 \mid B_1)\,P(B_1) = \frac{4}{9} \cdot \frac{3}{5} + \frac{3}{9} \cdot \frac{2}{5} = \frac{2}{5},$$

and

$$P(B_2)\,P(W_1) = \frac{2}{5} \cdot \frac{3}{5} = \frac{6}{25} \neq \frac{4}{15} = P(W_1 B_2).$$

As expected, the events W1 and B2 are dependent.
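Both conclusions are easy to confirm by brute force: enumerate all ordered two-ball draws and count. A minimal sketch, exact via fractions:

```python
from fractions import Fraction
from itertools import permutations

balls = ["w"] * 6 + ["b"] * 4                      # 6 white, 4 black
draws = list(permutations(range(10), 2))           # ordered, no replacement

def prob(pred):
    """Exact probability of a predicate over equally likely draws."""
    return Fraction(sum(pred(i, j) for i, j in draws), len(draws))

p_w1b2 = prob(lambda i, j: balls[i] == "w" and balls[j] == "b")
p_w1 = prob(lambda i, j: balls[i] == "w")
p_b2 = prob(lambda i, j: balls[j] == "b")

print(p_w1b2)          # 4/15, matching the computation above
print(p_b2)            # 2/5
print(p_w1 * p_b2)     # 6/25 != 4/15 -> W1 and B2 are dependent
```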

13

From (1-35),

$$P(AB) = P(A \mid B)\,P(B). \tag{1-48}$$

Similarly, from (1-35),

$$P(B \mid A) = \frac{P(BA)}{P(A)} = \frac{P(AB)}{P(A)},$$

or

$$P(AB) = P(B \mid A)\,P(A). \tag{1-49}$$

Equating (1-48) and (1-49),

$$P(A \mid B)\,P(B) = P(B \mid A)\,P(A),$$

or Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)}{P(B)}\,P(A). \tag{1-50}$$

14

Although simple enough, Bayes' theorem has an interesting interpretation:

P(A|B): the a-posteriori probability of A given B.

P(B): the evidence that "B has occurred" (new information).

P(B|A): the likelihood of B given A.

P(A): the a-priori probability of the event A.

We can also view the event B as new knowledge obtained from a fresh experiment. We know something about A as P(A). The new information is available in terms of B, and it should be used to improve our knowledge/understanding of A. Bayes' theorem gives the exact mechanism for incorporating such new information.

15

A more general version of Bayes' theorem involves a partition of $\Omega$. From (1-50),

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{P(B)} = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)}, \tag{1-51}$$

where we have made use of (1-44). In (1-51), $A_i,\ 1 \le i \le n$, represent a set of mutually exclusive events with associated a-priori probabilities $P(A_i),\ 1 \le i \le n$. With the new information "B has occurred", the information about $A_i$ can be updated by the $n$ conditional probabilities $P(B \mid A_i),\ 1 \le i \le n$, using (1-47).
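Equation (1-51) is a one-liner in code. A minimal sketch (the function name and interface are my own, not from the notes):

```python
def posterior(prior, likelihood):
    """Bayes' theorem over a partition, eq. (1-51).

    prior[i]      -- P(A_i) for mutually exclusive, exhaustive events A_i
    likelihood[i] -- P(B | A_i)
    Returns P(A_i | B) for every i.
    """
    evidence = sum(p * l for p, l in zip(prior, likelihood))  # P(B), eq. (1-44)
    return [p * l / evidence for p, l in zip(prior, likelihood)]

# Anticipating Example 1.3 below: two equally likely boxes with
# defective-bulb rates 15/100 and 5/200.
print(posterior([0.5, 0.5], [0.15, 0.025]))   # [~0.857, ~0.143]
```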

16

Example 1.3: Two boxes B1 and B2 contain 100 and 200

light bulbs respectively. The first box (B1) has 15 defective

bulbs and the second 5. Suppose a box is selected at random and one bulb is picked out.

(a) What is the probability that it is defective?

Solution: Note that box B1 has 85 good and 15 defective

bulbs. Similarly box B2 has 195 good and 5 defective

bulbs. Let D = “Defective bulb is picked out”.

Then

$$P(D \mid B_1) = \frac{15}{100} = 0.15, \qquad P(D \mid B_2) = \frac{5}{200} = 0.025.$$

17

Since a box is selected at random, they are equally likely:

$$P(B_1) = P(B_2) = \frac{1}{2}.$$

Thus $B_1$ and $B_2$ form a partition as in (1-43), and using (1-44) we obtain

$$P(D) = P(D \mid B_1)\,P(B_1) + P(D \mid B_2)\,P(B_2) = 0.15 \cdot \frac{1}{2} + 0.025 \cdot \frac{1}{2} = 0.0875.$$

Thus, there is about a 9% probability that a bulb picked at random is defective.

18

(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1, i.e. $P(B_1 \mid D)$?

Notice that initially $P(B_1) = 0.5$; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on the fact that we might have picked up box 1?

From (1-52),

$$P(B_1 \mid D) = \frac{P(D \mid B_1)\,P(B_1)}{P(D)} = \frac{0.15 \cdot 1/2}{0.0875} = 0.857, \tag{1-52}$$

and since $P(B_1 \mid D) = 0.857 > 0.5$, it is indeed more likely at this point that we must have chosen box 1 in favor of box 2. (Recall that box 1's defective rate, 15%, is six times that of box 2.)
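The whole example reduces to a few lines of arithmetic, reproduced below as a self-contained check of parts (a) and (b):

```python
p_b1 = p_b2 = 0.5                     # a box is selected at random
p_d_b1 = 15 / 100                     # P(D | B1) = 0.15
p_d_b2 = 5 / 200                      # P(D | B2) = 0.025

# Part (a): total probability, eq. (1-44).
p_d = p_d_b1 * p_b1 + p_d_b2 * p_b2
print(p_d)                            # 0.0875

# Part (b): Bayes' theorem, eq. (1-52).
print(p_d_b1 * p_b1 / p_d)            # 0.857...
```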

19

14. Stochastic Processes

Introduction

Let $\xi$ denote the random outcome of an experiment. To every such outcome suppose a waveform $X(t, \xi)$ is assigned. The collection of such waveforms forms a stochastic process. The set of $\{\xi_k\}$ and the time index $t$ can be continuous or discrete (countably infinite or finite) as well. For fixed $\xi_i \in S$ (the set of all experimental outcomes), $X(t, \xi_i)$ is a specific time function. For fixed $t_1$, $X_1 = X(t_1, \xi_i)$ is a random variable. The ensemble of all such realizations $X(t, \xi)$ over time represents the stochastic

[Fig. 14.1: an ensemble of realizations $X(t, \xi_1), X(t, \xi_2), \ldots, X(t, \xi_n)$ plotted against $t$, with sampling instants $t_1$ and $t_2$ marked.]

20

process X(t) (see Fig. 14.1). For example,

$$X(t) = a \cos(\omega_0 t + \varphi),$$

where $\varphi$ is a uniformly distributed random variable in $(0, 2\pi)$, represents a stochastic process. Stochastic processes are everywhere: Brownian motion, stock market fluctuations, various queuing systems all represent stochastic phenomena.

If X(t) is a stochastic process, then for fixed t, X(t) represents a random variable. Its distribution function is given by

$$F_X(x, t) = P\{X(t) \le x\}. \tag{14-1}$$

Notice that $F_X(x, t)$ depends on $t$, since for a different $t$ we obtain a different random variable. Further,

$$f_X(x, t) = \frac{\partial F_X(x, t)}{\partial x} \tag{14-2}$$

represents the first-order probability density function of the process X(t).

21

For $t = t_1$ and $t = t_2$, X(t) represents two different random variables $X_1 = X(t_1)$ and $X_2 = X(t_2)$, respectively. Their joint distribution is given by

$$F_X(x_1, x_2, t_1, t_2) = P\{X(t_1) \le x_1,\ X(t_2) \le x_2\}, \tag{14-3}$$

and

$$f_X(x_1, x_2, t_1, t_2) = \frac{\partial^2 F_X(x_1, x_2, t_1, t_2)}{\partial x_1\, \partial x_2} \tag{14-4}$$

represents the second-order density function of the process X(t). Similarly, $f_X(x_1, x_2, \ldots, x_n, t_1, t_2, \ldots, t_n)$ represents the $n$th-order density function of the process X(t). Complete specification of the stochastic process X(t) requires the knowledge of $f_X(x_1, x_2, \ldots, x_n, t_1, t_2, \ldots, t_n)$ for all $t_i,\ i = 1, 2, \ldots, n$, and for all $n$ (an almost impossible task in reality).

22

Mean of a Stochastic Process:

$$\mu_X(t) = E\{X(t)\} = \int x\, f_X(x, t)\, dx \tag{14-5}$$

represents the mean value of a process X(t). In general, the mean of a process can depend on the time index t.

The autocorrelation function of a process X(t) is defined as

$$R_{XX}(t_1, t_2) = E\{X(t_1)\,X^{*}(t_2)\} = \int\!\!\int x_1\, x_2^{*}\, f_X(x_1, x_2, t_1, t_2)\, dx_1\, dx_2, \tag{14-6}$$

and it represents the interrelationship between the random variables $X_1 = X(t_1)$ and $X_2 = X(t_2)$ generated from the process X(t).

Properties:

1. $R_{XX}(t_1, t_2) = R_{XX}^{*}(t_2, t_1) = [E\{X(t_2)\,X^{*}(t_1)\}]^{*}$. (14-7)

2. $R_{XX}(t, t) = E\{|X(t)|^2\} \ge 0$ (the average instantaneous power).
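Both quantities can be estimated from an ensemble of realizations by averaging across the ensemble at fixed times. A minimal sketch with an illustrative process $X(t) = A t$, $A$ random, for which $\mu_X(t) = E\{A\}\,t$ clearly depends on $t$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ensemble: rows are realizations (outcomes xi), columns are time samples.
t = np.linspace(0.0, 1.0, 101)
A = rng.normal(loc=1.0, scale=0.5, size=(20_000, 1))   # random amplitude
X = A * t                                              # X(t, xi) = A(xi) * t

mu_hat = X.mean(axis=0)               # estimate of mu_X(t), eq. (14-5)
R_hat = X.T @ X / X.shape[0]          # estimate of R_XX(t1, t2), eq. (14-6)

print(mu_hat[50])      # ~ E{A} * 0.5        = 0.5
print(R_hat[50, 80])   # ~ E{A^2} * 0.5*0.8  = 1.25 * 0.4 = 0.5
```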

23

3. $R_{XX}(t_1, t_2)$ represents a nonnegative definite function, i.e., for any set of constants $\{a_i\}_{i=1}^{n}$,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i\, a_j^{*}\, R_{XX}(t_i, t_j) \ge 0. \tag{14-8}$$

Eq. (14-8) follows by noticing that $E\{|Y|^2\} \ge 0$ for $Y = \sum_{i=1}^{n} a_i\, X(t_i)$.

The function

$$C_{XX}(t_1, t_2) = R_{XX}(t_1, t_2) - \mu_X(t_1)\,\mu_X^{*}(t_2) \tag{14-9}$$

represents the autocovariance function of the process X(t).

Example 14.1: Let

$$z = \int_{-T}^{T} X(t)\, dt.$$

Then

$$E\{|z|^2\} = \int_{-T}^{T}\!\!\int_{-T}^{T} E\{X(t_1)\,X^{*}(t_2)\}\, dt_1\, dt_2 = \int_{-T}^{T}\!\!\int_{-T}^{T} R_{XX}(t_1, t_2)\, dt_1\, dt_2. \tag{14-10}$$

24

Example 14.2: Let

$$X(t) = a \cos(\omega_0 t + \varphi), \qquad \varphi \sim U(0, 2\pi). \tag{14-11}$$

This gives

$$\mu_X(t) = E\{X(t)\} = a\, E\{\cos(\omega_0 t + \varphi)\} = a \cos(\omega_0 t)\, E\{\cos \varphi\} - a \sin(\omega_0 t)\, E\{\sin \varphi\} = 0, \tag{14-12}$$

since $E\{\cos \varphi\} = \frac{1}{2\pi} \int_0^{2\pi} \cos \varphi\, d\varphi = 0 = E\{\sin \varphi\}$.

Similarly,

$$R_{XX}(t_1, t_2) = a^2\, E\{\cos(\omega_0 t_1 + \varphi)\cos(\omega_0 t_2 + \varphi)\} = \frac{a^2}{2}\, E\{\cos \omega_0 (t_1 - t_2) + \cos(\omega_0 (t_1 + t_2) + 2\varphi)\} = \frac{a^2}{2} \cos \omega_0 (t_1 - t_2). \tag{14-13}$$
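A quick Monte Carlo check of (14-12) and (14-13), with arbitrary values for $a$, $\omega_0$, $t_1$, $t_2$:

```python
import numpy as np

rng = np.random.default_rng(2)

# X(t) = a cos(w0 t + phi), phi ~ Uniform(0, 2*pi), as in (14-11).
a, w0 = 2.0, 3.0
t1, t2 = 0.7, 1.9
phi = rng.uniform(0.0, 2 * np.pi, size=1_000_000)

x1 = a * np.cos(w0 * t1 + phi)
x2 = a * np.cos(w0 * t2 + phi)

print(x1.mean())                          # ~ 0, eq. (14-12)
print((x1 * x2).mean())                   # ~ (a**2 / 2) cos(w0 (t1 - t2))
print(a**2 / 2 * np.cos(w0 * (t1 - t2)))  # eq. (14-13)
```

Note that the mean is constant and the autocorrelation depends only on $t_1 - t_2$: exactly the wide-sense stationarity conditions (14-20)-(14-21) introduced below.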

25

Stationary Stochastic Processes

Stationary processes exhibit statistical properties that are invariant to shifts in the time index. Thus, for example, second-order stationarity implies that the statistical properties of the pairs $\{X(t_1), X(t_2)\}$ and $\{X(t_1 + c), X(t_2 + c)\}$ are the same for any $c$. Similarly, first-order stationarity implies that the statistical properties of $X(t_i)$ and $X(t_i + c)$ are the same for any $c$.

In strict terms, the statistical properties are governed by the joint probability density function. Hence a process is $n$th-order Strict-Sense Stationary (S.S.S.) if

$$f_X(x_1, x_2, \ldots, x_n, t_1, t_2, \ldots, t_n) = f_X(x_1, x_2, \ldots, x_n, t_1 + c, t_2 + c, \ldots, t_n + c) \tag{14-14}$$

for any $c$, where the left side represents the joint density function of the random variables $X_1 = X(t_1), X_2 = X(t_2), \ldots, X_n = X(t_n)$, and the right side corresponds to the joint density function of the random variables $X_1' = X(t_1 + c), X_2' = X(t_2 + c), \ldots, X_n' = X(t_n + c)$. A process X(t) is said to be strict-sense stationary if (14-14) is true for all $t_i,\ i = 1, 2, \ldots, n$, for all $n = 1, 2, \ldots$, and for any $c$.

26

For a first-order strict-sense stationary process, from (14-14) we have

$$f_X(x, t) = f_X(x, t + c) \tag{14-15}$$

for any $c$. In particular, $c = -t$ gives

$$f_X(x, t) = f_X(x), \tag{14-16}$$

i.e., the first-order density of X(t) is independent of t. In that case

$$E[X(t)] = \int x\, f(x)\, dx = \mu, \qquad \text{a constant.} \tag{14-17}$$

Similarly, for a second-order strict-sense stationary process we have from (14-14)

$$f_X(x_1, x_2, t_1, t_2) = f_X(x_1, x_2, t_1 + c, t_2 + c)$$

for any $c$. For $c = -t_2$ we get

$$f_X(x_1, x_2, t_1, t_2) = f_X(x_1, x_2, t_1 - t_2), \tag{14-18}$$

27

i.e., the second-order density function of a strict-sense stationary process depends only on the difference of the time indices, $\tau = t_1 - t_2$. In that case the autocorrelation function is given by

$$R_{XX}(t_1, t_2) = E\{X(t_1)\,X^{*}(t_2)\} = \int\!\!\int x_1\, x_2^{*}\, f_X(x_1, x_2, \tau)\, dx_1\, dx_2 = R_{XX}(t_1 - t_2) \triangleq R_{XX}(\tau), \tag{14-19}$$

i.e., the autocorrelation function of a second-order strict-sense stationary process depends only on the difference of the time indices, $\tau = t_1 - t_2$.

Notice that (14-17) and (14-19) are consequences of the stochastic process being first- and second-order strict-sense stationary. On the other hand, the basic conditions for first- and second-order stationarity, Eqs. (14-16) and (14-18), are usually difficult to verify. In that case, we often resort to a looser definition of stationarity, known as Wide-Sense Stationarity (W.S.S.), by making use of

28

(14-17) and (14-19) as the necessary conditions. Thus, a process X(t) is said to be Wide-Sense Stationary if

(i) $E\{X(t)\} = \mu$, a constant, and (14-20)

(ii) $E\{X(t_1)\,X^{*}(t_2)\} = R_{XX}(t_1 - t_2)$, (14-21)

i.e., for wide-sense stationary processes, the mean is a constant and the autocorrelation function depends only on the difference between the time indices. Notice that (14-20)-(14-21) do not say anything about the nature of the probability density functions; instead they deal with the average behavior of the process. Since (14-20)-(14-21) follow from (14-16) and (14-18), strict-sense stationarity always implies wide-sense stationarity. However, the converse is not true in general, the only exception being the Gaussian process. This follows since, if X(t) is a Gaussian process, then by definition $X_1 = X(t_1), X_2 = X(t_2), \ldots, X_n = X(t_n)$ are jointly Gaussian random variables for any $t_1, t_2, \ldots, t_n$, whose joint characteristic function is given by

29

$$\phi_X(\omega_1, \omega_2, \ldots, \omega_n) = \exp\!\Big(j \sum_{k=1}^{n} \mu(t_k)\,\omega_k - \tfrac{1}{2} \sum_{i,k} C_{XX}(t_i, t_k)\,\omega_i\,\omega_k\Big), \tag{14-22}$$

where $C_{XX}(t_i, t_k)$ is as defined in (14-9). If X(t) is wide-sense stationary, then using (14-20)-(14-21) in (14-22) we get

$$\phi_X(\omega_1, \omega_2, \ldots, \omega_n) = \exp\!\Big(j \mu \sum_{k=1}^{n} \omega_k - \tfrac{1}{2} \sum_{i,k} C_{XX}(t_i - t_k)\,\omega_i\,\omega_k\Big), \tag{14-23}$$

and hence if the set of time indices is shifted by a constant $c$ to generate a new set of jointly Gaussian random variables $X_1' = X(t_1 + c), X_2' = X(t_2 + c), \ldots, X_n' = X(t_n + c)$, then their joint characteristic function is identical to (14-23). Thus the sets of random variables $\{X_i\}_{i=1}^{n}$ and $\{X_i'\}_{i=1}^{n}$ have the same joint probability distribution for all $n$ and all $c$, establishing the strict-sense stationarity of Gaussian processes from their wide-sense stationarity.

To summarize: if X(t) is a Gaussian process, then wide-sense stationarity (w.s.s.) implies strict-sense stationarity (s.s.s.). Notice that this holds since the joint p.d.f. of Gaussian random variables depends only on their second-order statistics, which is also the basis

30

The ergodic hypothesis: an isolated system in thermal equilibrium, evolving in time, will pass through all the accessible microstates at the same recurrence rate, i.e., all accessible microstates are equally probable.

The average over long times will equal the average over the ensemble of all equi-energetic microstates: if we take a snapshot of a system with N microstates, we will find the system in any of these microstates with the same probability.
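For the random-phase cosine of Example 14.2, this equality of time and ensemble averages can be checked directly. A minimal sketch (ergodicity in the mean only; the process and parameters are the earlier illustrative ones):

```python
import numpy as np

rng = np.random.default_rng(3)
a, w0 = 2.0, 3.0

# Time average over one long realization (one fixed outcome xi).
t = np.linspace(0.0, 1000.0, 1_000_001)
phi = rng.uniform(0.0, 2 * np.pi)
time_avg = (a * np.cos(w0 * t + phi)).mean()

# Ensemble average at one fixed time over many outcomes xi.
phis = rng.uniform(0.0, 2 * np.pi, 100_000)
ensemble_avg = (a * np.cos(w0 * t[7] + phis)).mean()

print(time_avg, ensemble_avg)   # both ~ 0 = E{X(t)}
```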