regression machine learning

Upload: munish-mehra

Post on 14-Apr-2018


  • 7/27/2019 Regression machine learning


    Regression on Page Relevancy

    CSE4/574 Machine Learning

    TA: Zhen [email protected]


    Web search ranking

    Goal: given queries and documents/URLs, estimate the relevance of the pages to the queries (the Web search results).

    The pages are ranked via a relevance function.

    [Figure: query + url pages -> Ranking -> ranking result (1, 2, 4)]


    Regression on Page Relevancy

    Not Ranking!!

    Goal: train a regression model on a dataset of query-url pairs, then predict the page relevancy labels for new incoming queries.

    Binary / multiple levels of relevance (Bad, Fair, Good, Excellent, Perfect, ...)

    [Figure: query + url pages -> Model -> relevance levels (3, 2, 4)]


    Datasets

    Large-scale real-world learning to rank (LTR) datasets that have been released:

    Dataset            Queries  Doc.   Rel.  Feat.  Year
    Letor3.0 Gov           575  568k      2     64  2008
    Letor3.0 Ohsumed       106   16k      3     45  2008
    Letor4.0              2476   85k      3     46  2009
    Yandex               20267  213k      5    245  2009
    Yahoo                36251  883k      5    700  2010

    (Doc. = documents, Rel. = relevance levels, Feat. = features)


    Letor4.0 Dataset

    The latest version, 4.0, can be found at http://research.microsoft.com/en-us/um/beijing/projects/letor/letor4dataset.aspx

    (It contains 8 datasets for four ranking settings derived from the two query

    sets and the Gov2 web page collection.)

    LETOR is a package of benchmark data sets for research on Learning To Rank, released by Microsoft Research Asia.

    For this project, one dataset of MQ2008 is used (supervised ranking):

    Querylevelnorm.txt (15211 urls/samples in total)

    MQ2008 download: http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2008.rar

    Letor4.0 Dataset

    Sample rows from the MQ2008 dataset:

    Judgments {0; 1; 2; 3; 4} (Bad, Fair, Good, Excellent, Perfect).


    Letor4.0 Dataset

    Sample rows from the MQ2008 dataset:

    1. The first column is the relevance label of the pair. The larger the relevance label, the more relevant the query-document pair.

    2. The second column is the query id.

    3. The following 46 columns are features. A query-document pair is represented by a 46-dimensional feature vector of real numbers in the range 0 to 1.

    4. The end of the row is a comment about the pair, including the id of the document.

    Judgments {0; 1; 2}


    Features

    Given a query and a document, construct a feature vector (normalized between 0 and 1).


    Import Data Set

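    MATLAB functions: fopen, textscan, strfind, etc.

    Read the file line by line, either through the menu:

    File -> Import Data

    >> line_string = importedData{1} % imported data is an n x 1 cell

    or from code:

    >> fid = fopen('dataset.txt');

    >> data = textscan(fid, '%s', 'Delimiter', '\n'); % read by lines, data is a 1 x 1 cell

    >> fclose(fid);

    >> line_string = data{1}{1}; % first line of the file as a string

    Example of a line as a string: see the sample row under Process Data Set (i).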


    Process Data Set (i)

    2 qid:10002 1:0.007477 2:0.000000 3:1.000000 4:0.000000 5:0.007470 ... 46:0.007042 #docid = GX008-86-4444840 inc = 1 prob = 0.086622

    LETOR 4.0

    Process the original data into a matrix containing the relevance labels (the first column) and the feature vectors. This input matrix (training data) will be fed into your regression model.
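A minimal sketch of this processing step for a single line, using the same MATLAB functions named on the import slide. The sample line is shortened to three features for readability, and the variable names (`tokens`, `features`, `row`) are illustrative assumptions, not part of the project hand-out.

```matlab
% Parse one LETOR-style line into a relevance label and a feature vector.
% The line is a shortened version of the sample row (3 features only).
line_string = ['2 qid:10002 1:0.007477 2:0.000000 3:1.000000 ' ...
               '#docid = GX008-86-4444840 inc = 1 prob = 0.086622'];

% Drop the trailing comment, which starts at the first '#'.
hash_pos = strfind(line_string, '#');
if ~isempty(hash_pos)
    line_string = line_string(1:hash_pos(1) - 1);
end

% Split on whitespace: token 1 is the label, token 2 is 'qid:<id>'.
tokens = strsplit(strtrim(line_string));
label  = str2double(tokens{1});

% The remaining tokens have the form '<index>:<value>'.
features = zeros(1, numel(tokens) - 2);
for k = 3:numel(tokens)
    pair = sscanf(tokens{k}, '%d:%f');   % pair = [index; value]
    features(pair(1)) = pair(2);
end

row = [label, features];   % one row of the N x M input matrix
```

Looping this over every line of the file and stacking the rows would give the full matrix; the query id is ignored here because the regression model does not use it.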


    Process Data Set (ii)

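    [Figure: the dataset as an N x M matrix. The first column holds the relevancy labels (1 dimension) and the remaining columns hold the feature vectors (M-1 dimensions), e.g. the row 2 | 0.3 0.45 0.12 0.89. The rows are divided into train, validation and test subsets.]

    For LETOR 4.0, you need to partition the data set into three subsets.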
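One possible way to do the partition is a random split of the rows. The sketch below assumes an 80/10/10 ratio and uses a random placeholder matrix in place of the parsed LETOR data; neither the ratio nor the variable names come from the slides.

```matlab
% Randomly split an N x M data matrix into train/validation/test subsets.
% The 80/10/10 ratio is an assumption made for illustration.
rng(0);                          % fix the seed so the split is repeatable
N = 20;  M = 5;
data = rand(N, M);               % placeholder for the parsed LETOR matrix

idx     = randperm(N);           % shuffled row indices
n_train = floor(0.8 * N);
n_val   = floor(0.1 * N);

train_set = data(idx(1:n_train), :);
val_set   = data(idx(n_train + 1 : n_train + n_val), :);
test_set  = data(idx(n_train + n_val + 1 : end), :);
```

Every row lands in exactly one subset, so the three sizes add up to N.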


    Train/Validation/Test Sets

    [Figure: the N x M dataset (relevancy labels in 1 column, feature vectors in M-1 columns) divided into train, validation and test rows, with the annotation "Leave out as ground truth!"]


    Linear Regression

    Problem: we want a general way of obtaining a linear model (a model that is linear in the parameters) fitted to observed data.

    General setup:

    Given a set of training examples $(\mathbf{x}_n, t_n)$, $n = 1, \dots, N$

    Goal: learn a function $y(\mathbf{x})$ that minimizes some loss function (error function) $E(y, t)$

    Linear basis function model:

    $$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$$

    Typically, $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as a bias parameter.

    In the simplest case, we use linear basis functions: $\phi_j(\mathbf{x}) = x_j$.
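As a small illustration of the model with linear basis functions, the sketch below evaluates the prediction for one input. The input and weight values are made up.

```matlab
% Evaluate y(x, w) = w' * phi(x) with linear basis functions phi_j(x) = x_j.
x   = [0.2; 0.5; 0.9];        % toy input with 3 features (made-up values)
w   = [0.1; 1.0; -2.0; 0.5];  % [w0; w1; w2; w3], made-up weights
phi = [1; x];                 % phi_0(x) = 1 carries the bias w0
y   = w' * phi;               % 0.1 + 0.2 - 1.0 + 0.45
```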


    Linear Regression

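    $$y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$$

    Inputs, targets and weights:

    $$\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_N \end{pmatrix}, \qquad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{pmatrix}, \qquad \mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_{M-1} \end{pmatrix}$$

    N x M design matrix (each row is a single data point, each column is a basis function):

    $$\boldsymbol{\Phi}(\mathbf{x}) = \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & \vdots & & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix}$$

    Squared error function (the factor 1/2 is a convention and does not change the minimizer):

    $$E(\mathbf{y}, \mathbf{t}) = \tfrac{1}{2} (\boldsymbol{\Phi}\mathbf{w} - \mathbf{t})^T (\boldsymbol{\Phi}\mathbf{w} - \mathbf{t})$$

    Minimize error:

    $$\mathbf{w}^* = \arg\min_{\mathbf{w}} E(\mathbf{y}, \mathbf{t})$$

    Estimation:

    $$\nabla_{\mathbf{w}} E = \boldsymbol{\Phi}^T (\boldsymbol{\Phi}\mathbf{w} - \mathbf{t}) = 0$$

    Least squares solution:

    $$\mathbf{w}^* = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}$$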
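The least squares solution can be sketched in a few lines of MATLAB. The data below is a made-up, noise-free toy problem, so the recovered weights should match the ones used to generate the targets. The backslash operator is used instead of an explicit inverse, which is a design choice of this sketch rather than something the slides prescribe.

```matlab
% Toy data (made up): N = 5 samples, D = 2 features, noise-free linear targets.
X      = [0 0; 1 0; 0 1; 1 1; 0.5 0.5];
w_true = [1; 2; -3];                    % [w0; w1; w2]
Phi    = [ones(size(X, 1), 1), X];      % design matrix with phi_0 = 1
t      = Phi * w_true;

% Solve the normal equations (Phi' * Phi) * w = Phi' * t.
% Backslash avoids forming the explicit inverse.
w_star = (Phi' * Phi) \ (Phi' * t);
```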


    Linear Basis Function Models

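    $$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$$

    Polynomial: $\phi_j(x) = x^j$

    Gaussian: $\phi_j(x) = \exp\left( -\dfrac{(x - \mu_j)^2}{2 s^2} \right)$

    Sigmoid: $\phi_j(x) = \sigma\left( \dfrac{x - \mu_j}{s} \right)$, where $\sigma(a) = \dfrac{1}{1 + \exp(-a)}$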
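The three basis function families can be written as MATLAB anonymous functions. This is a sketch for scalar inputs; the center 0.5 and width 0.1 are arbitrary example values, and the function names are illustrative.

```matlab
% Scalar basis functions phi_j(x); mu (center) and s (width) are parameters.
poly_basis    = @(x, j)     x .^ j;
gauss_basis   = @(x, mu, s) exp(-(x - mu) .^ 2 ./ (2 * s ^ 2));
sigmoid       = @(a)        1 ./ (1 + exp(-a));
sigmoid_basis = @(x, mu, s) sigmoid((x - mu) ./ s);

x = linspace(0, 1, 5);                  % 0, 0.25, 0.5, 0.75, 1
p = poly_basis(x, 2);                   % x squared
g = gauss_basis(x, 0.5, 0.1);           % bump centered at 0.5
h = sigmoid_basis(x, 0.5, 0.1);         % step-like curve centered at 0.5
```

At the center x = 0.5 the Gaussian equals 1 and the sigmoid equals 0.5, which is a quick sanity check on both definitions.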


    Linear Regression for Project

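    Project goal: to predict the value of one or more continuous target variables t given the value of a D-dimensional vector x of input variables.

    One-dimensional case: D = 1 (already encountered)

    D-dimensional case, where $x_n^{(i)}$ is the i-th dimension of the n-th input:

    $$\mathbf{x} = \begin{pmatrix} x_1^{(1)} & x_1^{(2)} & \cdots & x_1^{(D)} \\ x_2^{(1)} & x_2^{(2)} & \cdots & x_2^{(D)} \\ \vdots & \vdots & & \vdots \\ x_n^{(1)} & x_n^{(2)} & \cdots & x_n^{(D)} \end{pmatrix}, \qquad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}$$

    Find $\mathbf{w} = (w_0, w_1, \dots)^T$.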


    Linear Regression for Project

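    Polynomial Basis Function (not required): $\phi_j(x) = x^j$

    $$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} \sum_{i=1}^{D} w_{(j,i)} \, \phi_j\big(x^{(i)}\big)$$

    The sum over j covers the different orders of the polynomial; the sum over i covers the D dimensions. Here $x_n^{(i)}$ is the i-th dimension of the n-th input.

    $$\boldsymbol{\Phi}(\mathbf{x}) = \begin{pmatrix} 1 & x_1^{(1)} & \cdots & x_1^{(D)} & (x_1^{(1)})^2 & \cdots & (x_1^{(D)})^2 & \cdots & (x_1^{(1)})^{M-1} & \cdots & (x_1^{(D)})^{M-1} \\ \vdots & & & & & & & & & & \vdots \\ 1 & x_N^{(1)} & \cdots & x_N^{(D)} & (x_N^{(1)})^2 & \cdots & (x_N^{(D)})^2 & \cdots & (x_N^{(1)})^{M-1} & \cdots & (x_N^{(D)})^{M-1} \end{pmatrix}$$

    N x ((M-1) x D + 1) matrix

    w: ((M-1) x D + 1)-dimensional weight vector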


    Linear Regression for Project

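    Gaussian Basis Function: $\phi_j(x) = \exp\left( -\dfrac{(x - \mu_j)^2}{2 s^2} \right)$

    $$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} \sum_{i=1}^{D} w_{(j,i)} \, \phi_j\big(x^{(i)}\big)$$

    The sum over j covers the different Gaussian parameter settings; the sum over i covers the D dimensions. Here $x_n^{(i)}$ is the i-th dimension of the n-th input.

    $$\boldsymbol{\Phi}(\mathbf{x}) = \begin{pmatrix} 1 & \phi_1(x_1^{(1)}) & \cdots & \phi_1(x_1^{(D)}) & \phi_2(x_1^{(1)}) & \cdots & \phi_2(x_1^{(D)}) & \cdots & \phi_{M-1}(x_1^{(1)}) & \cdots & \phi_{M-1}(x_1^{(D)}) \\ \vdots & & & & & & & & & & \vdots \\ 1 & \phi_1(x_N^{(1)}) & \cdots & \phi_1(x_N^{(D)}) & \phi_2(x_N^{(1)}) & \cdots & \phi_2(x_N^{(D)}) & \cdots & \phi_{M-1}(x_N^{(1)}) & \cdots & \phi_{M-1}(x_N^{(D)}) \end{pmatrix}$$

    N x ((M-1) x D + 1) matrix

    w: ((M-1) x D + 1)-dimensional weight vector

    Sigmoid basis function: similar to Gaussian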
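A sketch of how such a design matrix could be assembled in MATLAB. The input matrix, the two centers in `mu` and the width `s` are made-up values, and using the same centers for every dimension is an assumption of this sketch.

```matlab
% Build an N x ((M-1)*D + 1) design matrix from Gaussian basis functions.
X  = [0.1 0.2; 0.3 0.4; 0.5 0.6];   % toy inputs: N = 3 samples, D = 2
mu = [0.25, 0.75];                  % M-1 = 2 centers (made-up values)
s  = 0.2;                           % shared width (made-up value)

[N, D] = size(X);
K   = numel(mu);                    % K = M-1 basis functions per dimension
Phi = ones(N, K * D + 1);           % first column stays 1 (bias term)
for j = 1:K
    cols = (j - 1) * D + 1 + (1:D); % the D columns belonging to phi_j
    Phi(:, cols) = exp(-(X - mu(j)) .^ 2 ./ (2 * s ^ 2));
end
```

Replacing the `exp(...)` expression with `X .^ j` would give the polynomial version, and a sigmoid expression would give the sigmoid version, with the same column layout.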


    Overfitting Issue

    What can we do to curb overfitting?

    Use a less complex model

    Use more training examples

    Regularization


    Regularized Least Squares

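    Add a regularization term to the error function to control over-fitting:

    $$E(\mathbf{w}) = E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})$$

    where $E_D(\mathbf{w})$ is the data-dependent term and $E_W(\mathbf{w})$ is the regularization term.

    Squared error function:

    $$E(\mathbf{w}) = \tfrac{1}{2} (\boldsymbol{\Phi}\mathbf{w} - \mathbf{t})^T (\boldsymbol{\Phi}\mathbf{w} - \mathbf{t}) + \tfrac{\lambda}{2} \mathbf{w}^T \mathbf{w}$$

    The regularization term encourages small weight values!

    Minimize error:

    $$\mathbf{w}^* = \arg\min_{\mathbf{w}} E(\mathbf{w})$$

    Regularized least squares solution:

    $$\nabla_{\mathbf{w}} E(\mathbf{w}) = \boldsymbol{\Phi}^T (\boldsymbol{\Phi}\mathbf{w} - \mathbf{t}) + \lambda \mathbf{w} = 0 \quad\Rightarrow\quad \mathbf{w}^* = (\lambda \mathbf{I} + \boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}$$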
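The regularized solution differs from plain least squares only by the added lambda * I term. The sketch below compares the two on a made-up 1-D toy problem; the data and the value lambda = 0.1 are arbitrary choices for illustration.

```matlab
% Toy data (made up): a noisy line sampled at 6 points.
x   = (0:5)' / 5;
t   = 1 + 2 * x + [0.05; -0.03; 0.02; -0.04; 0.01; 0.03];
Phi = [ones(6, 1), x];               % design matrix with phi_0 = 1

lambda = 0.1;                        % regularization strength (arbitrary)
M      = size(Phi, 2);

w_ls  = (Phi' * Phi) \ (Phi' * t);                     % no regularization
w_reg = (lambda * eye(M) + Phi' * Phi) \ (Phi' * t);   % regularized
```

With lambda > 0 the weight vector shrinks, so `norm(w_reg)` is smaller than `norm(w_ls)`. Note that this form also penalizes the bias weight w0, as the formula on the slide does.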


    Experimental Phases

    1. Training: determine the format of your model, then train the model you have selected (learn the weights w).

    2. Validation: if the validation error is unacceptable, adjust the following and train again: # of basis functions, regularization hyperparameter $\lambda$, etc.

    3. Test: evaluate the final model (the model with tuned parameters) and report the test error.


    Optimal solution? Model complexity?
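The train/validate/test phases can be sketched as a small loop over candidate regularization values. Everything concrete here is an assumption for illustration: the toy 1-D data, the cubic polynomial basis, the interleaved split and the list of candidate lambda values.

```matlab
% Toy 1-D data (made up): a smooth target plus a small deterministic wiggle.
x   = linspace(0, 1, 30)';
t   = sin(2 * pi * x) + 0.1 * cos(37 * x);
Phi = [ones(30, 1), x, x .^ 2, x .^ 3];   % cubic polynomial basis (assumption)

% Interleaved split so each subset covers the whole input range.
tr = 1:3:30;  va = 2:3:30;  te = 3:3:30;

lambdas = [0, 1e-4, 1e-2, 1];             % candidate values (assumption)
val_err = zeros(size(lambdas));
for k = 1:numel(lambdas)
    A = lambdas(k) * eye(size(Phi, 2)) + Phi(tr, :)' * Phi(tr, :);
    w = A \ (Phi(tr, :)' * t(tr));        % train on the training set only
    val_err(k) = sqrt(mean((Phi(va, :) * w - t(va)) .^ 2));
end

% Keep the lambda with the lowest validation error, then use the test set once.
[~, best] = min(val_err);
A = lambdas(best) * eye(size(Phi, 2)) + Phi(tr, :)' * Phi(tr, :);
w = A \ (Phi(tr, :)' * t(tr));
test_err = sqrt(mean((Phi(te, :) * w - t(te)) .^ 2));
```

The vector `val_err` is also what one would plot against the candidate values to show how the error changes as a parameter is adjusted.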


    Evaluation Metrics

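    Express results as Root Mean Square Error, $E_{RMS}$:

    $$E_{RMS} = \sqrt{\dfrac{2 E_D(\mathbf{w}^*)}{N}}$$

    N: number of data points in the data set

    $E_D(\mathbf{w})$: sum-of-squares error function (data-dependent error). The factor 2 cancels the conventional factor 1/2 in $E_D$, so $E_{RMS}$ is the square root of the mean squared error.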
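A short numeric check of the metric, assuming the sum-of-squares error carries the conventional factor 1/2 so that the factor 2 in the formula cancels it. The targets and predictions are made-up values.

```matlab
% Root mean square error from the sum-of-squares error E_D.
t = [0; 1; 2; 1];                 % made-up targets
y = [0.5; 1; 1.5; 2];             % made-up predictions

E_D   = 0.5 * sum((y - t) .^ 2);  % sum-of-squares error with factor 1/2
N     = numel(t);
E_rms = sqrt(2 * E_D / N);        % same value as sqrt(mean((y - t).^2))
```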


    Project Report

    Explain the problem and how you chose your model.

    Elaborate on your validation process.

    - The intuitive choice of parameters. There is no limitation on setting parameters and there could be infinitely many choices. You can define a range or choose some specific values.

    - A description of how you went about avoiding overfitting.

    Generate graphs showing how the error changes as the parameters are adjusted.

    Report the final result and evaluate the model performance.