Review of Statistical Modeling and Probability Theory (Alan Moses, ML4bio)


TRANSCRIPT

What is modeling?
Describe some observations in a simpler, more compact way. [Scatter plot of observations X = (X_1, X_2).] For example, the model a = -Gm/r^2 means that instead of all the observations, we only need to remember a constant G and measure some parameters m and r.

What is statistical modeling?
Statistical modeling also deals with the uncertainty in observations: the expectation and the deviation or variance. The term "probabilistic modeling" is also used. The mathematics is more complicated.

What kind of questions will we answer in this course?
- What's the best linear model to explain some data?
- Are there multiple groups? What are they?
- Given new data, which group do we assign it to?
These correspond to the 3 major areas of machine learning that have proven useful in biology: regression, clustering and classification.

Molecular biology example
Each observation pairs an expression level with disease status, X = (L, D), and the expression levels have an expectation and a variance. [Panels: expression level plots for clustering, regression and classification.] Clustering finds two classes with parameters (E_1, V_1) and (E_2, V_2); class 2 is enriched for disease. Regression relates expression level to genotype (AA, Aa, aa). Classification asks: given a new individual, say with genotype Aa and a measured expression level, do we predict disease?

Probability theory
Probability theory quantifies uncertainty using distributions. Distributions are the models, and they depend on constants and parameters. E.g., in one dimension, the Gaussian or Normal distribution depends on two constants, e and \pi, and two parameters that have to be measured, \mu and \sigma:

P(X|\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(X-\mu)^2}{2\sigma^2}}

X are the possible datapoints that could come from the distribution. In statistics jargon, X is called a random variable. (A short code sketch of this density follows below.)

Choosing the distribution or model is the first step in a statistical model. E.g., the data might be mRNA expression levels, counts of sequencing reads, presence or absence of protein domains, or A's, C's, G's and T's. We will use different distributions to describe these different types of data.

Typical data and distributions
- Data is categorical (yes or no; A, C, G, T)
- Data is a fraction (e.g., 13 out of 5212)
- Data is a continuous number (e.g., -6.73)
- Data is a natural number (0, 1, 2, 3, 4, ...)
It's also possible to do regression, clustering and classification without specifying a distribution. In the classification example above, we might try to combine a Bernoulli for the disease data, a Poisson for the genotype and a Gaussian for the expression level; or we might try to classify without specifying distributions. (A sketch matching data types to distributions also follows below.)
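Here is a minimal sketch of the one-dimensional Gaussian density above, assuming Python with NumPy and SciPy (the slides do not specify a language); the evaluation points and parameter values are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    """The 1-D Gaussian density P(x | mu, sigma), written out from the slide."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x = np.linspace(0.0, 13.0, 5)               # some possible datapoints
mu, sigma = 6.5, 1.5                        # hypothetical parameter values

print(gaussian_pdf(x, mu, sigma))           # the formula written out by hand
print(norm.pdf(x, loc=mu, scale=sigma))     # the same values from scipy.stats
```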
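In the same spirit, one might match the data types above to standard distribution families (Bernoulli for categorical yes/no data, binomial for fractions, Gaussian for continuous numbers, Poisson for natural numbers). The pairing of the binomial with fractions and all parameter values here are illustrative assumptions, not from the slides:

```python
from scipy.stats import bernoulli, binom, norm, poisson

# Categorical yes/no data: Bernoulli with success probability p
print(bernoulli.pmf(1, p=0.3))              # P(yes)

# A fraction, e.g. 13 out of 5212: binomial counts of successes in n trials
print(binom.pmf(13, n=5212, p=0.0025))      # P(exactly 13 successes)

# A continuous number, e.g. -6.73: Gaussian
print(norm.pdf(-6.73, loc=0.0, scale=3.0))

# A natural number (0, 1, 2, 3, 4, ...): Poisson with rate mu
print(poisson.pmf(4, mu=2.0))               # P(count = 4)
```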
Molecular biology example: the genomics era
The genomics era means we will almost never have the expression level for just one gene or the genotype at just one locus. Each gene's expression level can be considered another dimension: for two genes, if each point is data for one person, we can make a graph of Gene 1 expression level against Gene 2 expression level; for 1000s of genes we would need 1000s of axes (Gene 3, Gene 4, Gene 5, ...). We'll usually make 2-D plots, but anything we say about 2-D can usually be generalized to n dimensions.

Each observation X contains the expression levels for Gene 1 and Gene 2. Represent this as a vector: X = (X_1, X_2); e.g., X = (1.3, 4.6). This gives a geometric interpretation to multivariate statistics.

Probability theory in two dimensions
E.g., in two dimensions, the Gaussian or Normal distribution depends on two constants, e and \pi, and 5 parameters that have to be measured, \mu and \Sigma:

P(X|\mu,\Sigma) = \frac{1}{2\pi\sqrt{|\Sigma|}} e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)}

X are the possible datapoints that could come from the distribution; as before, X is called a random variable. What does the mean mean in 2 dimensions? What does the standard deviation mean? For the bivariate Gaussian, the mean is also a vector, \mu = (\mu_1, \mu_2), and the variance is a matrix:

\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}

(That is where the 5 parameters come from: two means, two variances and one covariance, since \Sigma_{12} = \Sigma_{21}.) Special cases of the covariance are spherical covariance, \Sigma = \sigma^2 I; axis-aligned, diagonal covariance; and full covariance, which describes correlated data. (A sketch comparing these appears at the end of this part.)

Estimation
Once we choose a distribution, the next step is to choose the parameters. This is called estimation or inference: choose the parameters so the model fits the data. There are many ways to measure how well a model fits the data, and different objective functions will produce different estimators (e.g., MSE, ML, MAP). So, to make a statistical model:
1. Choose a model (or probability distribution), e.g., the Gaussian P(X|\mu,\sigma).
2. Estimate its parameters.
How do we know which parameters fit the data?

Laws of probability
If X_1 ... X_N are a series of random variables (think datapoints):
- P(X_1, X_2) is the joint probability, and it is equal to P(X_1) P(X_2) if X_1 and X_2 are independent.
- P(X_1 | X_2) is the conditional probability of event X_1 given X_2.
- Conditional probabilities are related by Bayes' theorem, which is true for all distributions (a numeric check appears at the end of this part):

P(X_1 | X_2) = \frac{P(X_2 | X_1) P(X_1)}{P(X_2)}

Likelihood and MLEs
Likelihood is the probability of the data (say X) given certain parameters (say \theta). Maximum likelihood estimation says: choose \theta so that the data is most probable. In practice there are many ways to maximize the likelihood.
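Returning to the covariance structures above, here is a sketch comparing spherical, diagonal and full covariance matrices for a bivariate Gaussian, again assuming NumPy and SciPy; the numbers are invented:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.3, 4.6])                   # mean vector (mu_1, mu_2)

spherical = 1.5 ** 2 * np.eye(2)            # Sigma = sigma^2 I: circular contours
diagonal = np.diag([1.0, 4.0])              # axis-aligned: per-gene variances, no covariance
full = np.array([[1.0, 0.8],
                 [0.8, 4.0]])               # off-diagonal terms describe correlated data

x = np.array([2.0, 5.0])                    # one observation X = (X_1, X_2)
for name, cov in [("spherical", spherical), ("diagonal", diagonal), ("full", full)]:
    print(name, multivariate_normal.pdf(x, mean=mu, cov=cov))
```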
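And a quick numeric check of Bayes' theorem, using made-up probabilities for two binary events (any consistent joint distribution would do):

```python
# Hypothetical probabilities for two binary events X1 and X2
p_x1 = 0.2                      # P(X1)
p_x2_given_x1 = 0.9             # P(X2 | X1)
p_x2_given_not_x1 = 0.1         # P(X2 | not X1)

# Law of total probability: P(X2) = P(X2|X1) P(X1) + P(X2|not X1) P(not X1)
p_x2 = p_x2_given_x1 * p_x1 + p_x2_given_not_x1 * (1 - p_x1)

# Bayes' theorem: P(X1 | X2) = P(X2 | X1) P(X1) / P(X2)
p_x1_given_x2 = p_x2_given_x1 * p_x1 / p_x2
print(p_x1_given_x2)            # 0.18 / 0.26 = 0.692...
```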
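Before the worked example that follows, here is the same likelihood computation as a sketch: the five datapoints are invented, and the parameter values mu = 6.5, sigma = 1.5 are the ones used in the example below.

```python
import numpy as np
from scipy.stats import norm

data = np.array([4.8, 6.1, 6.5, 7.2, 8.9])  # five hypothetical datapoints X_1 .. X_5
mu, sigma = 6.5, 1.5

# Likelihood: product of the individual densities (independent datapoints)
L = np.prod(norm.pdf(data, loc=mu, scale=sigma))

# Log likelihood: a sum instead of a product; it becomes a large negative
# number when there is a lot of data, but is numerically much safer
logL = np.sum(norm.logpdf(data, loc=mu, scale=sigma))

print(L, logL, np.log(L))                   # logL and log(L) agree
```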
L = P(X|\theta), and at the maximum \frac{\partial L}{\partial \theta} = 0.

Example of ML estimation
Data: X_1, ..., X_5. Evaluate P(X_i | \mu = 6.5, \sigma = 1.5) for each datapoint; since the datapoints are independent, the likelihood is the product

L = P(X|\theta) = P(X_1 ... X_N | \mu, \sigma) = \prod_{i=1}^{5} P(X_i | \mu = 6.5, \sigma = 1.5) = 6.39 \times ...

[Plot: the likelihood L as a function of the mean \mu, peaking at the maximum likelihood estimate.] In practice, we almost always use the log likelihood, which becomes a very large negative number when there is a lot of data. [Plots: log(L) as a function of the mean \mu, and as a surface over the mean \mu and the standard deviation \sigma.]

ML estimation
In general, the likelihood is a function of multiple variables, so the derivatives with respect to all of these should be zero at a maximum: \nabla L = 0. In the example of the Gaussian, we have two parameters, so that \frac{\partial L}{\partial \mu} = 0 and \frac{\partial L}{\partial \sigma} = 0. In general, finding MLEs means solving a set of coupled equations, which usually have to be solved numerically for complex models.

MLEs for the Gaussian
The Gaussian is the symmetric continuous distribution whose centre is a parameter given by what we consider the average (the expectation). The MLEs are

\mu_{ML} = \frac{1}{N} \sum_X X \qquad V_{ML} = \frac{1}{N} \sum_X (X - \mu_{ML})^2

The MLE for the variance of the Gaussian is like the squared error from the mean, but it is actually a biased (though still consistent!) estimator. (A numerical check of these formulas appears below.)

Other estimators
Instead of the likelihood, L = P(X|\theta), we can choose parameters to maximize the posterior probability, P(\theta|X); or minimize the sum of squared errors, \sum_X (X - \mu_{MSE})^2; or maximize a penalized likelihood, L^* = P(X|\theta) \, e^{-(\text{penalty})}. In each case, estimation involves a mathematical optimization problem that usually has to be solved on a computer. How do we choose? (A sketch contrasting the plain and penalized likelihoods closes this review.)
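As a numerical check of the Gaussian MLE formulas, and of the point that complex models need numerical optimization, here is a sketch (assuming NumPy and SciPy, with simulated data) comparing the closed-form estimates to a generic optimizer applied to the negative log likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=6.5, scale=1.5, size=200)   # simulated data; "true" values are made up

# Closed-form MLEs from the slides
mu_ml = data.mean()
v_ml = np.mean((data - mu_ml) ** 2)               # note 1/N: the biased variance estimator

# Numerical MLE: minimize the negative log likelihood over (mu, sigma)
def neg_log_lik(params):
    mu, sigma = params
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

res = minimize(neg_log_lik, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
mu_hat, sigma_hat = res.x

print(mu_ml, mu_hat)                              # the two mean estimates agree
print(np.sqrt(v_ml), sigma_hat)                   # and so do the standard deviations
```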
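Finally, a sketch of how changing the objective function changes the estimator. The quadratic penalty used here is an assumption for illustration (the slide leaves the penalty unspecified); it pulls the estimate of the mean toward zero relative to the plain MLE:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
data = rng.normal(loc=6.5, scale=1.5, size=10)    # small simulated dataset
sigma = 1.5                                       # treat sigma as known for simplicity
lam = 2.0                                         # penalty strength (hypothetical)

# Penalized objective: -log L*(mu) = -log P(X|mu) + lam * mu^2 (up to a constant)
def neg_pen_log_lik(mu):
    log_lik = -0.5 * np.sum((data - mu) ** 2) / sigma ** 2
    return -(log_lik - lam * mu ** 2)

res = minimize_scalar(neg_pen_log_lik)
print(data.mean(), res.x)                         # plain MLE vs. the shrunken penalized estimate
```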