
Regression Using Boosting

Vishakh (vv2131@columbia.edu)

Advanced Machine Learning, Fall 2006

Introduction

● Classification with boosting
  – Well-studied
  – Theoretical bounds and guarantees
  – Empirically tested

● Regression with boosting
  – Rarely used
  – Some bounds and guarantees
  – Very little empirical testing

Project Description

● Study existing algorithms & formalisms
  – AdaBoost.R (Freund & Schapire, 1997)
  – SquareLev.R (Duffy & Helmbold, 2002)
  – SquareLev.C (Duffy & Helmbold, 2002)
  – ExpLev (Duffy & Helmbold, 2002)

● Verify effectiveness by testing on an interesting dataset.
  – Football Manager 2006

A Few Notes

● Want PAC-like guarantees
● Can't directly transfer processes from classification
  – Simply re-weighting the distribution over iterations doesn't work.
  – Can modify samples and still remain consistent with the original function class.
● Performing gradient descent on a potential function.
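The gradient-descent view above can be written compactly. This is only a sketch of the general pattern: the symbols F (current master hypothesis), f (base hypothesis), P (potential), r_i (residual), and α (step size) are notation assumed here, not taken from the slides.

```latex
r_i = y_i - F(x_i), \qquad
\tilde{y}_i \;\propto\; -\frac{\partial P}{\partial F(x_i)} = \frac{\partial P}{\partial r_i}, \qquad
F \leftarrow F + \alpha f
```

Here f is the base learner's output when trained on the relabeled sample (x_i, ỹ_i). The inputs x_i stay fixed and only the labels (and, for some variants, the distribution) change.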

SquareLev.R

● Squared error regression.
● Uses a regression algorithm as the base learner.
● Modifies labels, not the distribution.
● Potential function uses the variance of the residuals.
● New label is proportional to the negative gradient of the potential function.
● Each iteration, mean squared error decreases by a multiplicative factor.
● Can get arbitrarily small squared error as long as the correlation between residuals and predictions > threshold.
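The bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the published algorithm verbatim: the 1-D regression stump used as base learner, the exact line-search step, and the names `fit_stump` and `squarelev_r` are all assumptions made for this example.

```python
from statistics import fmean

def fit_stump(xs, ys):
    """Hypothetical base regressor: 1-D stump minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split points
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = fmean(left), fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant
        c = fmean(ys)
        return lambda x: c
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def squarelev_r(xs, ys, rounds=20):
    """Leveraging sketch: relabel with centered residuals, then line-search."""
    terms, resid, history = [], list(ys), []
    for _ in range(rounds):
        rbar = fmean(resid)
        centered = [r - rbar for r in resid]             # new labels
        history.append(fmean([c * c for c in centered]))  # residual variance
        f = fit_stump(xs, centered)
        preds = [f(x) for x in xs]
        pbar = fmean(preds)
        pc = [p - pbar for p in preds]
        denom = sum(p * p for p in pc)
        if denom == 0:  # constant prediction: no correlation to exploit
            break
        # Step size minimizing the variance of the updated residuals.
        alpha = sum(c * p for c, p in zip(centered, pc)) / denom
        terms.append((alpha, f))
        resid = [r - alpha * p for r, p in zip(resid, preds)]
    offset = fmean(resid)  # constant shift so the final residuals have zero mean
    return (lambda x: offset + sum(a * f(x) for a, f in terms)), history
```

With this step size, one round multiplies the residual variance by (1 - ρ²), where ρ is the correlation between the centered residuals and the base predictions, which is one way to read the "multiplicative factor" and "correlation > threshold" bullets.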

SquareLev.C

● Squared error regression
● Uses a base classifier
● Modifies labels and distribution
● Potential function uses residuals
● New label is the sign of the instance's residual
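A rough sketch of the classifier-based variant follows. The slide only states that the label is the sign of the residual and that the distribution changes, so the weighting used here (proportional to the residual magnitude), the sum-of-squared-residuals potential, the line-search step, and the names `fit_weighted_stump` and `squarelev_c` are assumptions for illustration.

```python
def fit_weighted_stump(xs, labels, weights):
    """Hypothetical base classifier: 1-D stump with outputs in {-1, +1}."""
    best = None
    for t in sorted(set(xs)):  # t = max(xs) yields a constant classifier
        for sign in (-1, 1):
            err = sum(w for x, y, w in zip(xs, labels, weights)
                      if (sign if x <= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: sign if x <= t else -sign

def squarelev_c(xs, ys, rounds=50):
    """Leveraging with a classifier: labels are residual signs,
    weights are (assumed) proportional to residual magnitudes."""
    terms, resid, history = [], list(ys), []
    n = len(xs)
    for _ in range(rounds):
        history.append(sum(r * r for r in resid) / n)  # mean squared error
        total = sum(abs(r) for r in resid)
        if total == 0:  # perfect fit
            break
        labels = [1 if r >= 0 else -1 for r in resid]
        weights = [abs(r) / total for r in resid]
        h = fit_weighted_stump(xs, labels, weights)
        preds = [h(x) for x in xs]
        # Line search on squared error; since h(x)^2 = 1 the step is a mean.
        alpha = sum(r * p for r, p in zip(resid, preds)) / n
        terms.append((alpha, h))
        resid = [r - alpha * p for r, p in zip(resid, preds)]
    return (lambda x: sum(a * h(x) for a, h in terms)), history
```

Under these assumptions each round lowers the mean squared error by exactly alpha², and alpha is the classifier's weighted edge scaled by the mean absolute residual, so a base classifier with any edge on the reweighted sample makes progress.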

ExpLev

● Attempts to get small residuals at each point.
● Uses an exponential potential.
● AdaBoost pushes all instances to positive margin.
● ExpLev pushes all instances to have small residuals.
● Uses a base regressor ([-1,+1]) or classifier ({-1,+1}).
● Two-sided potential uses exponents of residuals.
● Base learner must perform well with relabeled instances.
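To make the two-sided potential concrete, here is a small sketch of the potential and the relabeling it induces. The exact form (including the constant offset), the scaling parameter `s`, and the function names are assumptions for this example; the slides do not give them.

```python
import math

def two_sided_potential(residuals, s=1.0):
    """Sum of exp(s*r) + exp(-s*r) - 2: zero at r = 0 and growing
    exponentially for residuals of either sign."""
    return sum(math.exp(s * r) + math.exp(-s * r) - 2.0 for r in residuals)

def explev_relabel(residuals, s=1.0):
    """Derive new labels from the potential's gradient in each residual.

    Returns labels scaled into [-1, +1] for a base regressor, plus
    {-1, +1} labels and a distribution for a base classifier.
    """
    grads = [s * (math.exp(s * r) - math.exp(-s * r)) for r in residuals]
    scale = max(abs(g) for g in grads)
    n = len(residuals)
    if scale == 0:  # every residual is already zero
        return [0.0] * n, [1] * n, [1.0 / n] * n
    reg_labels = [g / scale for g in grads]
    cls_labels = [1 if g >= 0 else -1 for g in grads]
    total = sum(abs(g) for g in grads)
    weights = [abs(g) / total for g in grads]
    return reg_labels, cls_labels, weights
```

Because the gradient grows exponentially with the residual, the points that are currently worst dominate the relabeled sample, which mirrors how AdaBoost concentrates on small-margin instances.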

Naive Approach

● Directly translate AdaBoost to the regression setting.
● Use thresholding of squared error to reweight.
● Used as a baseline to compare against and test the veracity of the other approaches.
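One plausible reading of the thresholded reweighting is sketched below: an example counts as a "mistake" when its squared error exceeds a threshold, and the usual AdaBoost update is applied to that binary outcome. The threshold `tau`, the function name, and the choice of the AdaBoost.M1-style update are assumptions, since the slide does not spell out the rule.

```python
import math

def naive_reweight(weights, preds, ys, tau):
    """One AdaBoost-style reweighting step for regression.

    An example is treated as misclassified when its squared error
    exceeds the (hypothetical) threshold tau.
    Returns the new distribution and the hypothesis weight.
    """
    miss = [(p - y) ** 2 > tau for p, y in zip(preds, ys)]
    eps = sum(w for w, m in zip(weights, miss) if m)  # weighted error
    if eps <= 0 or eps >= 0.5:
        # No usable edge in this sketch: leave the distribution unchanged.
        return list(weights), 0.0
    beta = eps / (1 - eps)
    # Shrink the weight of examples within the threshold, then renormalize.
    new = [w if m else w * beta for w, m in zip(weights, miss)]
    z = sum(new)
    return [w / z for w in new], math.log(1 / beta)
```

For example, with four equally weighted points of which one exceeds the threshold, the update moves half of the total weight onto that point, exactly as AdaBoost does for a misclassified example.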

Dataset

● Data from Football Manager 2006
  – Very popular game
  – Statistically driven
● Features are player attributes.
● Labels are average performance ratings over a season.
● Predict performance levels and use the learned model to guide game strategy.

Work so far

● Conducted survey.
● Studied methods and formal guarantees and bounds.
● Implementation still underway.

Conclusions

● Interesting approaches and analyses of boosting for regression are available.
● Insufficient real-world verification.
● Further work
  – Regressing noisy data
  – Formal results for more relaxed assumptions
