regression machine learning

7/27/2019 Regression machine learning


Regression on Page Relevancy

CSE4/574 Machine Learning

TA: Zhen



Web search ranking

Goal: given a query and a set of documents/URLs, estimate the relevance of each page to the query.

Rank the pages via a relevance function.

[Figure: query + URL pages -> Ranking -> ranking result (1, 2, 4)]



Regression on Page Relevancy

Not Ranking!!

Goal: train a regression model on a dataset of query-URL pairs, then predict the page relevancy labels for new queries.

Binary / multiple levels of relevance (Bad, Fair, Good, Excellent, Perfect, ...)

[Figure: query + URL pages -> Model -> relevance levels (3, 2, 4)]



Datasets

Large-scale, real-world learning to rank (LTR) datasets that have been released:

Dataset            Queries   Doc.   Rel.   Feat.   Year
Letor3.0 Gov           575   568k      2      64   2008
Letor3.0 Ohsumed       106    16k      3      45   2008
Letor4.0              2476    85k      3      46   2009
Yandex               20267   213k      5     245   2009
Yahoo                36251   883k      5     700   2010

(Doc. = documents, Rel. = relevance levels, Feat. = features)



Letor4.0 Dataset

The latest version, 4.0, can be found at http://research.microsoft.com/en-us/um/beijing/projects/letor/letor4dataset.aspx

(It contains 8 datasets for four ranking settings derived from the two query sets and the Gov2 web page collection.)

LETOR is a package of benchmark datasets for research on Learning To Rank, released by Microsoft Research Asia.

For this project, one dataset from MQ2008 is used (supervised ranking):

Querylevelnorm.txt (15211 urls/samples in total)
http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2008.rar



Letor4.0 Dataset

Sample rows from the MQ2008 dataset:

Judgments {0; 1; 2; 3; 4} (Bad, Fair, Good, Excellent, Perfect).



Letor4.0 Dataset

Sample rows from the MQ2008 dataset:

1. The first column is the relevance label of the pair. The larger the relevance label, the more relevant the query-document pair.

2. The second column is the query id.

3. The following 46 columns are features. A query-document pair is represented by a 46-dimensional feature vector of real numbers in the range 0 to 1.

4. The end of the row is a comment about the pair, including the id of the document.

Judgments {0; 1; 2}



Features

Given a query and a document, construct a feature vector (normalized between 0 and 1).



Import Data Set

Matlab functions: fopen, textscan, strfind, etc.

Read by line:

File -> Import Data

>> line_string = importedData{1} % imported data is an n-by-1 cell

or

>> fid = fopen('dataset.txt');

>> data = textscan(fid, '%[^\n]'); % read by lines; data is a 1x1 cell

>> fclose(fid);

>> line_string = data{1}{1}; % first line of the file as a string

Example of a line as a string:



Process Data Set (i)

2 qid:10002 1:0.007477 2:0.000000 3:1.000000 4:0.000000 5:0.007470 ... 46:0.007042 #docid = GX008-86-4444840 inc = 1 prob = 0.086622

LETOR 4.0

Process the original data into a matrix containing relevance labels (the first column) and feature vectors. This input matrix (training data) will be fed into your regression model.
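As a rough illustration of this processing step, here is a minimal Python sketch that turns one LETOR-style line into a matrix row. The function names `parse_letor_line` and `to_row` are hypothetical, and filling missing feature indices with 0.0 is an assumption made only because the sample line on the slide is truncated; the real file lists all 46 features.

```python
def parse_letor_line(line):
    """Split one LETOR-style row into (label, query id, features, comment)."""
    body, _, comment = line.partition("#")     # everything after '#' is the comment
    tokens = body.split()
    label = int(tokens[0])                     # first column: relevance label
    qid = tokens[1].split(":", 1)[1]           # second column: "qid:<id>"
    features = {}
    for token in tokens[2:]:                   # remaining columns: "<index>:<value>"
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features, comment.strip()


def to_row(label, features, num_features=46):
    """Build [label, f_1, ..., f_46]; absent indices become 0.0 (an assumption)."""
    return [float(label)] + [features.get(i, 0.0) for i in range(1, num_features + 1)]


# Sample line as shown on the slide (features 6 to 45 are not shown there).
line = ("2 qid:10002 1:0.007477 2:0.000000 3:1.000000 4:0.000000 "
        "5:0.007470 46:0.007042 #docid = GX008-86-4444840 inc = 1 prob = 0.086622")
label, qid, features, comment = parse_letor_line(line)
row = to_row(label, features)
```

Stacking one such row per line gives the N x M input matrix, with M = 47 for LETOR 4.0 (1 label column plus 46 feature columns).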



Process Data Set (ii)

Dataset: an N x M matrix. The first column (1 dimension) holds the relevancy labels; the remaining M-1 columns hold the feature vectors.

Example row: 2 0.3 0.45 0.12 0.89

For LETOR 4.0, you need to partition the dataset into three subsets: train, validation, and test.
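One possible way to produce the three subsets is sketched below. The helper name `split_rows`, the 80/10/10 proportions, and the fixed random seed are all assumptions for illustration; the slides do not prescribe them. The sketch shuffles individual rows; splitting by query id instead would be an equally reasonable choice.

```python
import random


def split_rows(rows, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the rows and cut them into train / validation / test lists."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]          # whatever remains is held out
    return train, validation, test


rows = [[float(i % 3), i / 100.0] for i in range(100)]   # toy [label, feature] rows
train, validation, test = split_rows(rows)
```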



Train/Validation/Test Sets

The N x M dataset (relevancy labels in 1 column, feature vectors in M-1 columns) is split into train, validation, and test sets.

Leave the test set out as ground truth!



Linear Regression

Problem: we want a general way of obtaining a linear model (a model that is linear in its parameters) fitted to observed data.

General setup:

Given a set of training examples (x_n, t_n), n = 1, ..., N

Goal: learn a function y(x) that minimizes some loss function (error function) E(y, t)

Linear basis function model:

$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$

Typically, $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as a bias parameter.

In the simplest case, we use linear basis functions: $\phi_j(\mathbf{x}) = x_j$.



Linear Regression

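Model in matrix form:

$\mathbf{y}(\mathbf{x}, \mathbf{w}) = \boldsymbol{\Phi}\mathbf{w}$

$\mathbf{x} = (x_1, x_2, \ldots, x_N)^T, \quad \mathbf{t} = (t_1, t_2, \ldots, t_N)^T, \quad \mathbf{w} = (w_0, w_1, \ldots, w_{M-1})^T$

N x M design matrix (each row is a single data point, each column is a basis function):

$\boldsymbol{\Phi} = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_{M-1}(x_2) \\ \vdots & \vdots & & \vdots \\ \phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}$

Estimation:

Minimize error: $\mathbf{w}^* = \arg\min_{\mathbf{w}} E(\mathbf{y}, \mathbf{t})$

Squared error function: $E(\mathbf{y}, \mathbf{t}) = \frac{1}{2}(\boldsymbol{\Phi}\mathbf{w} - \mathbf{t})^T(\boldsymbol{\Phi}\mathbf{w} - \mathbf{t})$

Least squares solution: $\nabla_{\mathbf{w}} E = 0 \;\Rightarrow\; \mathbf{w}^* = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$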
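To make the least squares solution w* = (Phi^T Phi)^(-1) Phi^T t concrete, the sketch below solves the equivalent normal equations (Phi^T Phi) w = Phi^T t in plain Python. The function names `solve` and `least_squares` are hypothetical, and Gaussian elimination is just one assumed way of solving the system; a numerical library would normally be used instead.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]          # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):                      # eliminate below the pivot
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                         # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(Phi, t):
    """Return w* solving (Phi^T Phi) w = Phi^T t for an N x M design matrix Phi."""
    m = len(Phi[0])
    gram = [[sum(row[a] * row[b] for row in Phi) for b in range(m)] for a in range(m)]
    rhs = [sum(row[a] * tn for row, tn in zip(Phi, t)) for a in range(m)]
    return solve(gram, rhs)


# Toy data generated from t = 1 + 2x, with basis functions phi_0 = 1 and phi_1 = x.
Phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
t = [1.0, 3.0, 5.0]
w = least_squares(Phi, t)
```

Solving the linear system directly avoids forming the explicit inverse, which is usually the more stable choice.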



Linear Basis Function Models

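$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$

Polynomial: $\phi_j(x) = x^j$

Gaussian: $\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2 s^2}\right)$

Sigmoid: $\phi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)$, where $\sigma(a) = \frac{1}{1 + \exp(-a)}$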
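The three basis function families named on this slide (polynomial x^j, Gaussian exp(-(x - mu_j)^2 / (2 s^2)), and sigmoid sigma((x - mu_j) / s)) can be written as small scalar functions. The function names below are hypothetical; `mu` and `s` stand for the center mu_j and the scale s.

```python
import math


def polynomial_basis(x, j):
    """phi_j(x) = x^j."""
    return x ** j


def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * s ** 2))


def sigmoid_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s), with sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))
```

A Gaussian basis function peaks at 1 when x equals its center, and a sigmoid basis function equals 0.5 there, which is a quick sanity check when choosing centers for features in the range 0 to 1.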



Linear Regression for Project

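Project goal: to predict the value of one or more continuous target variables t given the value of a D-dimensional vector x of input variables.

One-dimensional case: D = 1 (already encountered).

With n samples (row k holds sample k, column i holds dimension i):

$\mathbf{X} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,D} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,D} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,D} \end{pmatrix}, \quad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}$

Find $\mathbf{w} = (w_0, w_1, \ldots)^T = \; ?$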




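Polynomial Basis Function (not required)

$\phi_j(x) = x^j$

$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} \sum_{i=1}^{D} w_{(j,i)} \, \phi_j(x_i)$

The sum over j covers the different orders of the polynomial; the sum over i covers the D dimensions.

Design matrix, N x ((M-1)xD + 1), where $x_{n,i}$ is dimension i of sample n:

$\boldsymbol{\Phi} = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,D} & x_{1,1}^2 & \cdots & x_{1,D}^2 & \cdots & x_{1,1}^{M-1} & \cdots & x_{1,D}^{M-1} \\ \vdots & & & & & & & & & & \vdots \\ 1 & x_{N,1} & \cdots & x_{N,D} & x_{N,1}^2 & \cdots & x_{N,D}^2 & \cdots & x_{N,1}^{M-1} & \cdots & x_{N,D}^{M-1} \end{pmatrix}$

w: ((M-1)xD + 1)-dimensional weight vector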




Gaussian Basis Function

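$\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2 s^2}\right)$

$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} \sum_{i=1}^{D} w_{(j,i)} \, \phi_j(x_i)$

The sum over j covers the different Gaussian parameter settings; the sum over i covers the D dimensions.

Design matrix, N x ((M-1)xD + 1), where $x_{n,i}$ is dimension i of sample n:

$\boldsymbol{\Phi} = \begin{pmatrix} 1 & \phi_1(x_{1,1}) & \cdots & \phi_1(x_{1,D}) & \phi_2(x_{1,1}) & \cdots & \phi_2(x_{1,D}) & \cdots & \phi_{M-1}(x_{1,1}) & \cdots & \phi_{M-1}(x_{1,D}) \\ \vdots & & & & & & & & & & \vdots \\ 1 & \phi_1(x_{N,1}) & \cdots & \phi_1(x_{N,D}) & \phi_2(x_{N,1}) & \cdots & \phi_2(x_{N,D}) & \cdots & \phi_{M-1}(x_{N,1}) & \cdots & \phi_{M-1}(x_{N,D}) \end{pmatrix}$

w: ((M-1)xD + 1)-dimensional weight vector

Sigmoid basis function: similar to Gaussian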
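A minimal sketch of building the N x ((M-1)xD + 1) Gaussian design matrix follows. It assumes each of the M-1 Gaussian settings is a scalar center mu_j applied to every dimension with one shared width s; the slides do not spell out how centers and widths are chosen, and the name `gaussian_design_matrix` is hypothetical.

```python
import math


def gaussian_design_matrix(X, centers, s):
    """X: N rows of D features; centers: the M-1 values mu_j; s: shared width.

    Returns N rows of length (M-1)*D + 1: a leading 1 for the bias w_0,
    then phi_j(x_i) for every setting j and every dimension i.
    """
    Phi = []
    for x in X:
        row = [1.0]                                    # phi_0 = 1 (bias column)
        for mu in centers:                             # j = 1 .. M-1
            for xi in x:                               # i = 1 .. D
                row.append(math.exp(-((xi - mu) ** 2) / (2 * s ** 2)))
        Phi.append(row)
    return Phi


X = [[0.0, 1.0], [0.5, 0.5]]                           # N = 2 samples, D = 2
Phi = gaussian_design_matrix(X, centers=[0.0, 0.5, 1.0], s=0.5)   # M - 1 = 3
```

With D = 2 and M - 1 = 3, each row has 3 x 2 + 1 = 7 entries, matching the stated matrix width.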



Overfitting Issue

What can we do to curb overfitting?

Use a less complex model

Use more training examples

Regularization



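Regularized Least Squares

Add a regularization term to the error function to control over-fitting:

$E(\mathbf{w}) = E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})$

$E_D$ is the data-dependent term; $\lambda E_W$ is the regularization term, which encourages small weight values!

Squared error function:

$E(\mathbf{w}) = \frac{1}{2}(\boldsymbol{\Phi}\mathbf{w} - \mathbf{t})^T(\boldsymbol{\Phi}\mathbf{w} - \mathbf{t}) + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}$

Minimize error: $\mathbf{w}^* = \arg\min_{\mathbf{w}} E(\mathbf{w})$

Regularized least squares solution:

$\nabla_{\mathbf{w}} E(\mathbf{w}) = 0 \;\Rightarrow\; \mathbf{w}^* = (\lambda\mathbf{I} + \boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$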
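The regularized solution w* = (lambda I + Phi^T Phi)^(-1) Phi^T t can be sketched in plain Python as below. Because lambda I + Phi^T Phi is symmetric positive definite for lambda > 0, this sketch solves the system with a Cholesky factorization; that choice, and the names `cholesky_solve` and `ridge_fit`, are assumptions for illustration only.

```python
import math


def cholesky_solve(A, b):
    """Solve A x = b for a symmetric positive definite matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):                                   # factor A = L L^T
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    y = [0.0] * n
    for i in range(n):                                   # forward solve L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):                         # backward solve L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def ridge_fit(Phi, t, lam):
    """Return w* solving (lam * I + Phi^T Phi) w = Phi^T t."""
    m = len(Phi[0])
    A = [[sum(r[a] * r[b] for r in Phi) + (lam if a == b else 0.0)
          for b in range(m)] for a in range(m)]
    rhs = [sum(r[a] * tn for r, tn in zip(Phi, t)) for a in range(m)]
    return cholesky_solve(A, rhs)


Phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
t = [1.0, 3.0, 5.0]
w_reg = ridge_fit(Phi, t, lam=1.0)    # weights shrink relative to the unregularized [1, 2]
```

Note that, as in the formula, this penalizes every weight including the bias w_0.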



Experimental Phases

Training: determine the format of your model, then train the model you have selected (learn the weights w).

Validation: if the validation error is unacceptable, adjust the following and retrain: number of basis functions, regularization hyperparameter λ, etc.

Test: evaluate the final model (the model with tuned parameters) and report the test error.



Optimal solution? Model complexity?
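The train, validate, adjust cycle can be sketched as a simple search over candidate hyperparameter values. Everything named here is hypothetical: `select_hyperparameter` is a generic helper, and the one-weight model y = w * x with its closed-form regularized fit is only a toy stand-in for the project's real regression model.

```python
def select_hyperparameter(candidates, fit, error, train, validation):
    """Fit on train for each candidate and keep the one with the lowest validation error.

    Returns (best value, its validation error, the fitted model).
    """
    best = None
    for value in candidates:
        model = fit(train, value)                  # training phase
        val_error = error(model, validation)       # validation phase
        if best is None or val_error < best[1]:
            best = (value, val_error, model)
    return best


def fit_ridge_1d(data, lam):
    """Toy model y = w * x: minimizes 1/2 sum (w x - t)^2 + lam/2 w^2."""
    return sum(x * t for x, t in data) / (sum(x * x for x, _ in data) + lam)


def squared_error(w, data):
    """Sum of squared residuals of the toy model on (x, t) pairs."""
    return sum((w * x - t) ** 2 for x, t in data)


train = [(1.0, 3.0), (2.0, 5.0)]
validation = [(3.0, 6.0)]
best_lam, best_err, best_w = select_hyperparameter(
    [0.0, 1.0, 10.0], fit_ridge_1d, squared_error, train, validation)
```

The test set plays no part in this loop; it is touched only once, to report the error of the selected model.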



Evaluation Metrics

Express results as root mean square error, $E_{RMS}$:

$E_{RMS} = \sqrt{2 E_D(\mathbf{w}^*) / N}$

N: number of data points in the dataset

$E_D(\mathbf{w})$: sum-of-squares error function (data-dependent error)
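As a small worked example of this metric, the sketch below computes E_RMS = sqrt(2 E_D(w) / N), taking E_D(w) to be one half of the summed squared residuals so that the factor of 2 cancels. That one-half convention is an assumption consistent with the factor of 2 in the formula, and the function names are hypothetical.

```python
import math


def sum_of_squares_error(w, Phi, t):
    """E_D(w) = 1/2 * sum_n (t_n - w^T phi(x_n))^2."""
    return 0.5 * sum(
        (tn - sum(wj * pj for wj, pj in zip(w, row))) ** 2
        for row, tn in zip(Phi, t)
    )


def rms_error(w, Phi, t):
    """E_RMS = sqrt(2 * E_D(w) / N), comparable across datasets of different size."""
    return math.sqrt(2.0 * sum_of_squares_error(w, Phi, t) / len(t))


Phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
t = [1.0, 3.0, 8.0]
w = [1.0, 2.0]                       # predicts 1, 3, 5, so the residuals are 0, 0, 3
e_rms = rms_error(w, Phi, t)         # E_D = 4.5, so E_RMS = sqrt(9 / 3)
```

Dividing by N lets training, validation, and test errors be compared even though the three sets have different sizes.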



Project Report

Explain the problem and how you chose your model.

Elaborate on your validation process:

- The intuition behind your choice of parameters. There is no limitation on parameter settings, and there could be infinitely many choices. You can define a range or choose some specific values.

- A description of how you went about avoiding overfitting.

Generate graphs showing how the error changes as the parameters are adjusted.

Report the final result and evaluate the model's performance.
