Machine Learning CMPT 726 Simon Fraser University CHAPTER 1: INTRODUCTION


Page 1

Machine Learning
CMPT 726
Simon Fraser University

CHAPTER 1: INTRODUCTION

Page 2

Outline

• Comments on general approach.
• Probability Theory.
  • Joint, conditional and marginal probabilities.
  • Random Variables.
  • Functions of R.V.s.
• Bernoulli Distribution (Coin Tosses).
  • Maximum Likelihood Estimation.
  • Bayesian Learning With Conjugate Prior.
• The Gaussian Distribution.
  • Maximum Likelihood Estimation.
  • Bayesian Learning With Conjugate Prior.
• More Probability Theory.
  • Entropy.
  • KL Divergence.

Page 3

Our Approach

• The course generally follows statistics, but the field is very interdisciplinary.
• Emphasis on predictive models: guess the value(s) of target variable(s) ("Pattern Recognition").
• Generally a Bayesian approach, as in the text.
• Compared to standard Bayesian statistics:
  • more complex models (neural nets, Bayes nets)
  • more discrete variables
  • more emphasis on algorithms and efficiency

Page 4

Things Not Covered

• Within statistics:
  • Hypothesis testing.
  • Frequentist theory, learning theory.
• Other types of data (not random samples):
  • Relational data.
  • Scientific data (automated scientific discovery).
• Action + learning = reinforcement learning.

Could be optional – what do you think?

Page 5

Probability Theory

Apples and Oranges

Page 6

Probability Theory

Marginal Probability

Conditional Probability
Joint Probability
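The defining equations on this slide did not survive extraction. As a brief recap (standard definitions, not copied from the slide): if n_{ij} of N trials have X = x_i and Y = y_j, then

  p(X = x_i, Y = y_j) = n_{ij} / N                              (joint)
  p(X = x_i) = \sum_j p(X = x_i, Y = y_j)                       (marginal)
  p(Y = y_j | X = x_i) = p(X = x_i, Y = y_j) / p(X = x_i)       (conditional)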

Page 7

Probability Theory

Sum Rule

Product Rule

Page 8

The Rules of Probability

Sum Rule

Product Rule
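The formulas themselves were lost in extraction; the standard statements of the two rules are:

  Sum rule:      p(X) = \sum_Y p(X, Y)
  Product rule:  p(X, Y) = p(Y | X) p(X)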

Page 9

Bayes’ Theorem

posterior ∝ likelihood × prior
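Written out (standard form; the equation itself is not in the extracted text):

  p(Y | X) = p(X | Y) p(Y) / p(X),   where   p(X) = \sum_Y p(X | Y) p(Y),

so the posterior p(Y | X) is proportional to the likelihood p(X | Y) times the prior p(Y).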

Page 10

Bayes’ Theorem: Model Version

• Let M be the model, E the evidence.
• P(M | E) ∝ P(M) × P(E | M)

Intuition:
• prior = how plausible is the model (theory) a priori, before seeing any evidence.
• likelihood = how well does the model explain the data?

Page 11

Probability Densities
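The slide's figure and equations are missing from the extracted text; for reference, the standard definitions: a density p(x) satisfies p(x) >= 0 and \int p(x) dx = 1, and

  p(x \in (a, b)) = \int_a^b p(x) dx,        P(z) = \int_{-\infty}^{z} p(x) dx   (cumulative distribution).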

Page 12

Transformed Densities

(Note from Markus Svensén: this figure was taken from Solution 1.4 in the web-edition of the solutions manual for PRML, available at http://research.microsoft.com/~cmbishop/PRML. A more thorough explanation of what the figure shows is provided in the text of the solution.)
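For reference, the rule the figure illustrates is the standard change of variables for densities (the formula is not in the extracted text): if x = g(y), then

  p_y(y) = p_x(x) |dx/dy| = p_x(g(y)) |g'(y)|.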
Page 13

Expectations

Conditional Expectation (discrete)

Approximate Expectation (discrete and continuous)
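The defining equations were lost in extraction; the standard forms are:

  E[f] = \sum_x p(x) f(x)                 (discrete)
  E[f] = \int p(x) f(x) dx                (continuous)
  E_x[f | y] = \sum_x p(x | y) f(x)       (conditional, discrete)
  E[f] \approx (1/N) \sum_{n=1}^{N} f(x_n),  with x_n drawn from p(x)   (approximate)

A minimal sketch of the approximate (Monte Carlo) expectation in Python, purely illustrative; the choice of E[x^2] under a standard normal is an assumption, not from the slide:

  import numpy as np

  rng = np.random.default_rng(0)
  samples = rng.normal(loc=0.0, scale=1.0, size=100_000)  # x_n ~ p(x) = N(0, 1)
  approx = (samples ** 2).mean()                          # (1/N) sum_n f(x_n) with f(x) = x^2
  print(approx)                                           # should be close to the exact E[x^2] = 1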

Page 14

Expectations are Linear

• Let aX + bY + c be a linear combination of two random variables (itself a random variable).
• Then E[aX + bY + c] = a E[X] + b E[Y] + c.
• This holds whether or not X and Y are independent.
• Good exercise to prove it; a sketch for the discrete case follows below.
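A sketch of the proof for the discrete case (using the sum rule for the marginals; the remaining details are the exercise):

  E[aX + bY + c] = \sum_{x,y} p(x, y) (a x + b y + c)
                 = a \sum_{x,y} p(x, y) x + b \sum_{x,y} p(x, y) y + c \sum_{x,y} p(x, y)
                 = a \sum_x p(x) x + b \sum_y p(y) y + c
                 = a E[X] + b E[Y] + c.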

Page 15

Variances and Covariances

Think about this difference:
1. Everybody gets a B.
2. 10 students get a C, 10 get an A.
The average is the same – how to quantify the difference?

Prove this. Hint: use the linearity of expectation.
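The equation being referred to did not survive extraction; given the hint, it is presumably the standard variance identity (with its covariance analogue):

  var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2
  cov[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X] E[Y].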