TRANSCRIPT
Model-based Machine Learning
Chris Bishop, Microsoft Research, Cambridge
With thanks to John Winn, Tom Minka, and the Infer.NET team.
Intelligent software
Goal: software that can adapt, learn, and reason
Player skill
Game result
Movie preferences
Ratings
Words
Ink
Reasoning backwards
Can be described by a model
Traditional machine learning
logistic regression
neural networks
K-means clustering
Gaussian mixture
factor analysis
principal components
Boltzmann machines
support vector machines
ICA
HMM
Kalman filter
deep networks
decision trees
RVM
Radial basis functions
linear regression
Gaussian process
Markov random field
kernel PCA
random forest
“I am overwhelmed by the choice of machine learning methods and techniques. There’s too much to learn!”
“I don’t know which algorithm to use or why one would be better than another for my problem.”
“My problem doesn’t seem to fit with any standard algorithm.”
Model-based machine learning
Traditional:
“How do I map my problem into standard tools?”
Model-based:
“What is the model that represents my problem?”
Goal:
A single development framework which supports
the creation of a wide range of bespoke models
Potential benefits of MBML
Models optimised for each new application
Transparent functionality
– Models expressed as compact code
– Community of model builders
Segregate model from training/inference code
Newcomers learn one modelling environment
Does the “right thing” automatically
Handling uncertainty
We are uncertain about a player’s skill
Each result provides relevant information
But we are never completely certain
Uncertainty everywhere
Which movie should the user watch next?
Which word did the user write?
What did the user say?
Which web page is the user trying to find?
Which link will the user click on?
What kind of product does the user wish to buy?
Which gesture is the user making?
Many others …
How can we compute with uncertainty in a principled way?
Probability
Limit of infinite number of trials (frequentist)
Quantification of uncertainty (Bayesian)
[Figure: e.g. a 60% / 40% belief split between two outcomes.]
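The Bayesian view can be made concrete with a minimal sketch (the hypotheses and numbers below are illustrative, not from the talk): updating a belief about which of two players is stronger after observing a single game result.

```python
# Minimal Bayesian update over two hypotheses (illustrative numbers).
prior = {"A stronger": 0.5, "B stronger": 0.5}

# Assumed probability of observing "A wins" under each hypothesis.
likelihood = {"A stronger": 0.7, "B stronger": 0.4}

# Bayes' rule: posterior is proportional to prior times likelihood.
unnormalised = {h: prior[h] * likelihood[h] for h in prior}
evidence = sum(unnormalised.values())
posterior = {h: p / evidence for h, p in unnormalised.items()}

print(posterior)  # belief in "A stronger" rises from 0.50 to ~0.64
```

One result shifts the belief but leaves genuine uncertainty, which is exactly the behaviour the skill-rating examples later in the talk rely on.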
Probabilistic Graphical Models
Combine probability theory with graphs
new insights into existing models
framework for designing new models
graph-based algorithms for calculation and computation
efficient software implementation
Algorithms versus models
M. E. Tipping and C. M. Bishop (1997)
[Figure: graphical models for probabilistic PCA: a latent Gaussian variable z generating an observed Gaussian variable x, with
p(z) = N(z | 0, I) and p(x | z) = N(x | Wz + mu, sigma^2 I),
inside a plate over the N data points.]
Principal components analysis (PCA)
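As a sketch of this generative view (using NumPy; the values of W, mu, and sigma below are arbitrary illustrative choices), data can be sampled exactly as the graph describes, first z from p(z), then x from p(x|z):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 20000, 5, 2                      # data points, data dim, latent dim
W = rng.normal(size=(D, K))                # factor loading matrix
mu = rng.normal(size=D)                    # data mean
sigma = 0.1                                # observation noise std dev

Z = rng.normal(size=(N, K))                # z ~ p(z) = N(0, I)
X = Z @ W.T + mu + sigma * rng.normal(size=(N, D))  # x ~ p(x|z) = N(Wz + mu, sigma^2 I)

# The implied marginal p(x) is N(mu, W W^T + sigma^2 I):
empirical_cov = np.cov(X, rowvar=False)
model_cov = W @ W.T + sigma**2 * np.eye(D)
```

The point of the model-based view: PCA is not just an algorithm but the maximum-likelihood solution of this explicit probabilistic model, which is then easy to extend (mixtures, missing data, Bayesian treatment).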
Elo
International standard for chess grading
A single rating value for each player
Limitations:
– not applicable to more than two players
– not applicable to team games
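For reference, the classic Elo update (a standard formula, sketched here independently of the talk) maps the rating difference to an expected score via a logistic curve and moves each rating by a fixed step:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo rating update.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    k (here 32, a common choice) controls the step size.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

print(elo_update(1500, 1500, 1.0))  # equal players, A wins: (1516.0, 1484.0)
```

Each player carries a single point estimate with no uncertainty attached, which is the root of the limitations listed above.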
Stages of MBML
1. Build a model: joint probability distribution of all of the relevant variables (e.g. as a graph)
2. Incorporate the observed data
3. Compute the distributions over the desired variables: inference
Iterate 2 and 3 in real-time applications
Extend model as required
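The three stages can be seen end-to-end on a toy problem (a coin-bias model with grid-based inference; the observed counts are made up for illustration):

```python
import numpy as np

# Stage 1: build a model - a grid prior over the unknown coin bias theta
theta = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(theta) / len(theta)        # uniform prior

# Stage 2: incorporate the observed data (say 7 heads in 10 flips)
heads, flips = 7, 10
likelihood = theta**heads * (1 - theta)**(flips - heads)

# Stage 3: inference - compute the posterior distribution over theta
posterior = prior * likelihood
posterior /= posterior.sum()

print(theta[np.argmax(posterior)])              # posterior mode near 0.7
```

Stages 2 and 3 can then be repeated as each new observation arrives, and stage 1 revisited whenever the model needs extending.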
Convergence
[Figure: skill level (0–40) vs. number of games (0–400) for players char and SQLWildman, comparing Elo against TrueSkill™ estimates.]
[Figure: estimated skill vs. year, 1850–2006, for leading chess players: Adolf Anderssen, Mikhail Botvinnik, Jose Raul Capablanca, Robert James Fischer, Anatoly Karpov, Garry Kasparov, Emanuel Lasker, Paul Morphy, Boris V Spassky, and Wilhelm Steinitz.]
ChessBase Analysis: 1850 - 2006
3.5M game outcomes
20 million variables (200,000 players, skill per year, + latent variables)
40 million factors
A probabilistic program compiler
[Diagram: a probabilistic program (10s of lines of code) is fed to the probabilistic program compiler, which generates inference code (1,000s of lines of code); executing that algorithm on the data yields predictions with uncertainty.]
Probabilistic programming
A representation language for probabilistic models
Take a standard language and add support for:
random variables
constraints on variables
inference
[Factor graph: skills s1 and s2, performances p1 and p2, and game outcome y12.]
// model variables
Variable<double> skill1, skill2;
Variable<double> performance1, performance2;
Gaussian skillPosterior1, skillPosterior2;
// model
skill1 = Variable.GaussianFromMeanAndPrecision(0, 1);
skill2 = Variable.GaussianFromMeanAndPrecision(0, 1);
performance1 = Variable.GaussianFromMeanAndPrecision(skill1, beta);
performance2 = Variable.GaussianFromMeanAndPrecision(skill2, beta);
Variable.ConstrainPositive(performance1 - performance2);
// infer new posterior skills
InferenceEngine engine = new InferenceEngine();
skillPosterior1 = engine.Infer<Gaussian>(skill1);
skillPosterior2 = engine.Infer<Gaussian>(skill2);
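Behind the scenes, inference for this two-player model has a well-known closed form: the TrueSkill update obtained by moment matching on a truncated Gaussian. A sketch in Python, where beta below denotes the performance-noise standard deviation (note the Infer.NET snippet parameterises Gaussians by precision instead):

```python
from math import sqrt
from statistics import NormalDist

def skill_update_win(mu1, sigma1, mu2, sigma2, beta):
    """Posterior skills after 'player 1 beat player 2'.

    Moment-matched Gaussian approximation; beta is the std dev of
    each player's performance around their skill.
    """
    nd = NormalDist()
    c = sqrt(2 * beta**2 + sigma1**2 + sigma2**2)
    t = (mu1 - mu2) / c
    v = nd.pdf(t) / nd.cdf(t)        # mean shift of the truncated Gaussian
    w = v * (v + t)                  # variance reduction factor
    new_mu1 = mu1 + (sigma1**2 / c) * v
    new_mu2 = mu2 - (sigma2**2 / c) * v
    new_sigma1 = sigma1 * sqrt(1.0 - (sigma1**2 / c**2) * w)
    new_sigma2 = sigma2 * sqrt(1.0 - (sigma2**2 / c**2) * w)
    return new_mu1, new_sigma1, new_mu2, new_sigma2

# Two previously unseen players with N(0, 1) skill priors; player 1 wins:
print(skill_update_win(0.0, 1.0, 0.0, 1.0, beta=1.0))
```

The winner's mean rises, the loser's falls, and both standard deviations shrink: the result carries information, but never complete certainty.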
// model variables
Variable<double> skill1, skill2, skill3;
Variable<double> performance1, performance2, performance3;
Gaussian skillPosterior1, skillPosterior2, skillPosterior3;
// model
skill1 = Variable.GaussianFromMeanAndPrecision(0, 1);
skill2 = Variable.GaussianFromMeanAndPrecision(0, 1);
skill3 = Variable.GaussianFromMeanAndPrecision(0, 1);
performance1 = Variable.GaussianFromMeanAndPrecision(skill1, beta);
performance2 = Variable.GaussianFromMeanAndPrecision(skill2, beta);
performance3 = Variable.GaussianFromMeanAndPrecision(skill3, beta);
Variable.ConstrainPositive(performance1 - performance2);
Variable.ConstrainPositive(performance2 - performance3);
// infer new posterior skills
InferenceEngine engine = new InferenceEngine();
skillPosterior1 = engine.Infer<Gaussian>(skill1);
skillPosterior2 = engine.Infer<Gaussian>(skill2);
skillPosterior3 = engine.Infer<Gaussian>(skill3);
Extension to multiple players
// model variables
Variable<double> skill1, skill2, skill3, skill4;
Variable<double> performance1, performance2, performance3, performance4;
Gaussian skillPosterior1, skillPosterior2, skillPosterior3, skillPosterior4;
// model
skill1 = Variable.GaussianFromMeanAndPrecision(0, 1);
skill2 = Variable.GaussianFromMeanAndPrecision(0, 1);
skill3 = Variable.GaussianFromMeanAndPrecision(0, 1);
skill4 = Variable.GaussianFromMeanAndPrecision(0, 1);
performance1 = Variable.GaussianFromMeanAndPrecision(skill1 + skill2, beta);
performance2 = Variable.GaussianFromMeanAndPrecision(skill3 + skill4, beta);
Variable.ConstrainPositive(performance1 - performance2);
// infer new posterior skills
InferenceEngine engine = new InferenceEngine();
skillPosterior1 = engine.Infer<Gaussian>(skill1);
skillPosterior2 = engine.Infer<Gaussian>(skill2);
skillPosterior3 = engine.Infer<Gaussian>(skill3);
skillPosterior4 = engine.Infer<Gaussian>(skill4);
Extension to teams
// model variables
Variable<double> skill1, skill2;
Variable<double> performance1, performance2;
Gaussian skillPosterior1, skillPosterior2;
// model
skill1 = Variable.GaussianFromMeanAndPrecision(oldskill1, alpha);
skill2 = Variable.GaussianFromMeanAndPrecision(oldskill2, alpha);
performance1 = Variable.GaussianFromMeanAndPrecision(skill1, beta);
performance2 = Variable.GaussianFromMeanAndPrecision(skill2, beta);
Variable.ConstrainPositive(performance1 - performance2);
// infer new posterior skills
InferenceEngine engine = new InferenceEngine();
skillPosterior1 = engine.Infer<Gaussian>(skill1);
skillPosterior2 = engine.Infer<Gaussian>(skill2);
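The snippet above adds skill dynamics: each new skill is drawn from a Gaussian centred on the old skill. The effect on the marginal belief is simply additive variance. A sketch, writing tau for the drift standard deviation (with the precision parameterisation above, tau = 1/sqrt(alpha)):

```python
from math import sqrt

def apply_dynamics(mu, sigma, tau):
    """Propagate a Gaussian skill belief one time step:
    skill_t ~ N(skill_{t-1}, tau^2), so the mean is unchanged
    and the variances add."""
    return mu, sqrt(sigma**2 + tau**2)

mu, sigma = 25.0, 2.0                      # illustrative skill belief
mu2, sigma2 = apply_dynamics(mu, sigma, tau=0.5)
print(mu2, sigma2)                         # mean unchanged, sigma grows
```

Between games a player's estimate becomes less certain; each new result then sharpens it again, so the estimate can track a skill that changes over time.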
TrueSkill™ through time
[Factor graph for personalised e-mail classification: shared seed feature weights and per-mailbox personal feature weights (wm and wp, with Gaussian and Gamma priors), inside plates over user mailboxes and e-mail messages; feature values x are combined with weights w through product and sum factors, Gaussian noise is added to give a score t, and the label y is determined by the constraint t > 0.]
[Factor graph for a recommender model: Gaussian user traits, item traits, user bias, and item bias, each derived from user or item feature values via Gaussian feature weights; user and item traits are multiplied and summed into an affinity, Gaussian noise gives a noisy affinity, and the star rating is obtained by comparing the noisy affinity against per-user thresholds; plates over users, items, traits, and instances.]
D. Stern, R. Herbrich, and T. Graepel (2009)
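The rating step of this model can be sketched numerically: the affinity is the inner product of user and item trait vectors plus the two biases, noise is added, and the star rating falls out of comparing the noisy affinity with the user's thresholds. All numbers below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

K = 4                                   # number of latent traits
user_traits = rng.normal(size=K)
item_traits = rng.normal(size=K)
user_bias, item_bias = 0.3, -0.1        # illustrative bias values

affinity = user_traits @ item_traits + user_bias + item_bias
noisy_affinity = affinity + 0.1 * rng.normal()      # affinity noise

# Per-user thresholds carve the affinity axis into star ratings 1..5.
user_thresholds = np.array([-2.0, -0.5, 0.5, 2.0])
star_rating = 1 + int(np.sum(noisy_affinity > user_thresholds))
print(star_rating)
```

In the full model every quantity here is a random variable inferred from data, so predictions come with uncertainty rather than a single score.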