

ECS550NFB Introduction to Numerical Methods using Matlab

Day 4

Lukas Laffers, [email protected]

Department of Mathematics, University of Matej Bel

June 11, 2015


Today

- Basic econometrics
  - Linear regression
  - Instrumental variables
  - Panel Data regression

- Bootstrap
  - Introduction to the Bootstrap
  - Theory of Bootstrap
  - Practical issues

- Selected topics
  - Principal Components Analysis
  - Support Vector Machines
  - Cross validation
  - Non-parametric Estimation


What is available

- MATLAB's Econometrics Toolbox (time series models)
- MATLAB's Statistics and Machine Learning Toolbox (including regression analysis)
- LeSage's Econometrics Toolbox (1999, free)
- Panel Data toolbox (Alvarez, Barbero, Zofio)


Basic econometrics - linear regression

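$$y = X\beta + \varepsilon$$

($X$ contains a column of ones and $k$ further regressors.)

[b,bint,r,rint,stats] = regress(y, X)

- $\hat\beta = (X^T X)^{-1} X^T y$
- $\mathrm{se}(\hat\beta_i) = \sqrt{\left[\frac{n}{n-k-1}\,\hat\sigma^2_\varepsilon\,(X^T X)^{-1}\right]_{ii}}$, where $\hat\sigma^2_\varepsilon = \frac{1}{n}\sum_{j=1}^n (y_j - X_j\hat\beta)^2$
- $t_{\hat\beta_i} = \hat\beta_i / \mathrm{se}(\hat\beta_i)$
- $R^2 = 1 - \frac{\sum_{j=1}^n (y_j - X_j\hat\beta)^2}{\sum_{j=1}^n (y_j - \bar y)^2}$
- $CI_{i,\alpha} = \left[\hat\beta_i - t^{n-k-1}_{1-\alpha/2}\,\mathrm{se}(\hat\beta_i),\ \hat\beta_i + t^{n-k-1}_{1-\alpha/2}\,\mathrm{se}(\hat\beta_i)\right]$, where $t^{n-k-1}_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the $t$ distribution with $n-k-1$ degrees of freedom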
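The formulas above can be checked in base MATLAB without any toolbox. A minimal sketch on deterministic toy data (the data-generating process, with true coefficients 1 and 2, is an illustrative assumption). Note that `regress` does not add an intercept by itself, so X must contain a column of ones, as it does here.

```matlab
% OLS computed directly from the slide's formulas (base MATLAB only).
n = 50;
x = (1:n)' / 10;                    % one regressor, so k = 1
e = 0.3 * sin(7 * (1:n)');          % deterministic stand-in for the error term
y = 1 + 2 * x + e;                  % true beta = [1; 2]
X = [ones(n, 1), x];                % column of ones gives the intercept
k = size(X, 2) - 1;                 % number of regressors excluding the constant

beta  = (X' * X) \ (X' * y);        % (X'X)^{-1} X'y, solved without an explicit inverse
res   = y - X * beta;               % residuals
sig2  = (res' * res) / n;           % sigma^2_eps estimate dividing by n
V     = n / (n - k - 1) * sig2 * inv(X' * X);   % estimated covariance matrix of beta
se    = sqrt(diag(V));              % standard errors
tstat = beta ./ se;                 % t statistics
R2    = 1 - sum(res .^ 2) / sum((y - mean(y)) .^ 2);
disp([beta, se, tstat]);
fprintf('R^2 = %.4f\n', R2);
```

With the Statistics and Machine Learning Toolbox, `[b,bint,r,rint,stats] = regress(y, X)` should return `b` equal to `beta`, with $R^2$ as `stats(1)`.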


Basic econometrics - instrumental variable regression

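$$y = X\beta + \varepsilon$$

- $Z$ is a matrix of instruments (exogenous variables) such that $E(Z^T\varepsilon) = 0$
- $P_Z = Z(Z^T Z)^{-1} Z^T$ is the projection matrix onto the column space of $Z$
- $\hat\beta_{2SLS} = (X^T P_Z X)^{-1} X^T P_Z y$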
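A minimal sketch of the 2SLS formula on deterministic toy data (all numbers are illustrative assumptions): the regressor x shares the component u with the error term, so OLS is biased, while the instrument z moves x but is (approximately) orthogonal to the error.

```matlab
n = 200; t = (1:n)';
z = sin(0.37 * t) + cos(1.3 * t);     % instrument: shifts x, unrelated to the error
u = sin(2.9 * t);                     % shock that enters both x and the error
x = z + u;                            % endogenous regressor
e = u + 0.5 * cos(4.1 * t);           % error term, correlated with x through u
y = 1 + 2 * x + e;                    % true beta = [1; 2]

X = [ones(n, 1), x];
Z = [ones(n, 1), z];                  % the constant serves as its own instrument

b_ols  = (X' * X) \ (X' * y);         % biased here because E(X'e) ~= 0
PZ     = Z * ((Z' * Z) \ Z');         % projection matrix onto the columns of Z
b_2sls = (X' * PZ * X) \ (X' * PZ * y);
disp([b_ols, b_2sls]);                % OLS slope is pushed up, 2SLS slope is close to 2
```

Forming the n-by-n matrix $P_Z$ is fine for a toy example; for large n the same estimate comes from the two-stage form `Xhat = Z * (Z \ X); b_2sls = Xhat \ y`, since $\hat X^T \hat X = X^T P_Z X$ and $\hat X^T y = X^T P_Z y$.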


Basic econometrics - Panel Data Regression

http://www.paneldatatoolbox.com/

$$y_{it} = \alpha + X_{it}\beta + \mu_i + v_{it}$$

Models

- Panel Data Models
- Instrumental Panel Data Models
- Spatial Panel Data Models

What is available

- Pooled OLS
- Fixed Effects (with option Robust)
- Between Effects
- Random Effects (with option Robust)
- Hausman test (Fixed vs Random effects)
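The toolbox implements these estimators for you. To see what the fixed-effects (within) estimator does, here is a minimal base-MATLAB sketch on a hypothetical balanced panel (it does not use the toolbox's API). The unit effect $\mu_i$ is deliberately correlated with $X_{it}$, so pooled OLS is biased.

```matlab
N = 5; T = 10;                          % units and time periods
[t, unit] = ndgrid(1:T, 1:N);           % T-by-N grids: row = period, column = unit
mu = unit;                              % unit effect mu_i, correlated with x below
x  = unit + sin(unit .* t);             % regressor X_it
v  = 0.05 * cos(3 * unit .* t + t);     % idiosyncratic error v_it
y  = 2 + 1.5 * x + mu + v;              % alpha = 2, true beta = 1.5

% Pooled OLS ignores mu_i; because mu_i is correlated with x, the slope is biased.
b_pooled = [ones(N * T, 1), x(:)] \ y(:);

% Within (fixed effects) estimator: subtracting each unit's time average
% removes alpha and mu_i, and OLS on the demeaned data recovers beta.
xd = x - mean(x, 1);
yd = y - mean(y, 1);
b_fe = xd(:) \ yd(:);
fprintf('pooled slope %.3f, within slope %.3f\n', b_pooled(2), b_fe);
```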


Basic econometrics - time series

http://www.mathworks.com/products/econometrics/

What is available

- Time series modelling: ARIMA
- State space modelling: Kalman filter
- Monte Carlo simulation
- Forecasting
- Cointegration modelling: VEC
- Volatility modelling: ARCH, GARCH


Bootstrap - what is it

- a resampling method for estimating the distribution of an estimator or test statistic
- produces an approximation that is at least as accurate as the first-order asymptotic approximation
- may provide the distribution of a test statistic, p-values, or confidence intervals when no asymptotic results are available
- usually used when we have a consistent estimator but do not know how to derive its standard errors
- after all, the data sample is all we have


Bootstrap - how does it work

We will pretend our finite sample is a population and draw random samples with replacement from this population.
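The resampling idea fits in a few lines of base MATLAB. A minimal sketch, assuming a small hypothetical sample and the median as the statistic of interest (its standard error has no simple textbook formula):

```matlab
rng(1);                                      % fix the seed for reproducibility
x = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 6.1 3.9 4.4]';   % hypothetical sample
n = numel(x);
B = 2000;                                    % number of bootstrap replications
med_star = zeros(B, 1);
for b = 1:B
    idx = randi(n, n, 1);                    % n indices drawn with replacement
    med_star(b) = median(x(idx));            % statistic on the resampled data
end
se_boot = std(med_star);                     % bootstrap standard error of the median
fprintf('median %.3f, bootstrap s.e. %.3f\n', median(x), se_boot);
```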


Bootstrap - example

Nice animations: https://www.stat.auckland.ac.nz/~wild/BootAnim/


Bootstrap - notation

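- $\{X_i,\ i = 1, \dots, n\}$: data from $F_0 \in \mathcal{F}$, where $\mathcal{F}$ is a set of distributions
- parametric case: $F_0(x, \theta_0) = P(X \le x)$
- statistic $T_n = T_n(X_1, \dots, X_n)$
- $G_n(\tau, F_0) = P(T_n \le \tau)$ denotes the exact finite-sample CDF of $T_n$
- $T_n$ is pivotal if $G_n(\tau, F)$ does not depend on $F$
- $T_n$ is asymptotically pivotal if $G_\infty(\tau, F)$ does not depend on $F$
- how can we estimate $G_n(\cdot, F_0)$?
  - by $G_\infty$: the asymptotic approximation (we need large $n$)
  - by replacing $F_0$ with a known estimator: the bootstrap
- $F_n$ denotes the estimator of $F_0$:
  - the ECDF, $F_n(x) = \frac{1}{n}\sum_{i=1}^n I(X_i \le x) \xrightarrow{a.s.} F_0(x)$
  - from a parametric family: if $F_0(\cdot) = F(\cdot, \theta_0)$, use $F_n(\cdot) = F(\cdot, \theta_n)$ with $\theta_n$ an estimator of $\theta_0$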
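Both choices of $F_n$ are easy to write down. A minimal sketch, assuming a small hypothetical sample and, for the parametric case, a normal family with $\theta = (\mu, \sigma)$ estimated by the sample mean and standard deviation:

```matlab
x  = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 6.1 3.9 4.4]';   % hypothetical sample
Fn = @(t) mean(x <= t);                % ECDF: share of observations <= t

% Parametric alternative, assuming F(., theta) is normal with theta = (mu, sigma):
mu_n    = mean(x);
sigma_n = std(x);
Fpar = @(t) 0.5 * erfc(-(t - mu_n) / (sigma_n * sqrt(2)));   % normal CDF via erfc

fprintf('Fn(4) = %.2f, F(4, theta_n) = %.2f\n', Fn(4), Fpar(4));
```

Drawing from the ECDF means resampling the observations with replacement; drawing from $F(\cdot, \theta_n)$ means simulating new normal data (the parametric bootstrap).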


Bootstrap - Monte Carlo

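We approximate $G_n(\cdot, F_0)$ by $G_n(\cdot, F_n)$.

Approximation procedure for $G_n(\tau, F_0)$:

Step 1 Generate a random sample of size $n$ from $F_n$: $\{X^*_i : i = 1, \dots, n\}$.
Step 2 Compute $T^*_n = T_n(X^*_1, \dots, X^*_n)$.
Step 3 Repeat Steps 1 and 2 many times to get the empirical probability of $T^*_n \le \tau$, which approximates $G_n(\tau, F_n)$ and hence $G_n(\tau, F_0)$.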
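The three steps map directly onto a loop. A minimal sketch, assuming $F_n$ is the ECDF of a small hypothetical sample and $T_n$ is the studentized mean $n^{1/2}(\bar X_n - \mu)/s_n$. In the bootstrap world the "true" mean is the mean of $F_n$, i.e. the sample mean.

```matlab
rng(2);
x = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 6.1 3.9 4.4]';   % hypothetical sample; F_n = its ECDF
n = numel(x); B = 5000;
Tn = @(s, center) sqrt(numel(s)) * (mean(s) - center) / std(s);   % studentized mean
T_star = zeros(B, 1);
for b = 1:B
    xs = x(randi(n, n, 1));            % Step 1: sample of size n from F_n
    T_star(b) = Tn(xs, mean(x));       % Step 2: centre at the mean of F_n
end
G_hat = @(tau) mean(T_star <= tau);    % Step 3: empirical P*(T*_n <= tau)
fprintf('G_n(0, F_n) ~ %.3f, G_n(1.96, F_n) ~ %.3f\n', G_hat(0), G_hat(1.96));
```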


Bootstrap - "does it work"?

What does it mean for the bootstrap to work? At a minimum, we would expect it to get the approximation right as the sample size grows to infinity.

$G_n(\tau, F_n)$ is consistent if for all $\varepsilon > 0$ and all $F_0 \in \mathcal{F}$

$$\lim_{n\to\infty} P_n\left[\sup_\tau \left|G_n(\tau, F_n) - G_\infty(\tau, F_0)\right| > \varepsilon\right] = 0.$$


Bootstrap - when does it work

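$$G_n(\tau, F_n) \approx G_\infty(\tau, F_n) \approx G_\infty(\tau, F_0) \approx G_n(\tau, F_0)$$

Sufficient conditions for consistency (Beran and Ducharme 1991):

- $F_n \to F_0$
- $G_\infty(\tau, F)$ is a continuous function of $\tau$ for any $F \in \mathcal{F}$
- for any $\tau$ and any sequence $H_n$ with $H_n \to F_0$: $G_n(\tau, H_n) \to G_\infty(\tau, F_0)$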

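(Mammen 1992)

- $\bar g_n = \frac{1}{n}\sum_{i=1}^n g_n(X_i)$, $T_n = (\bar g_n - t_n)/\sigma_n$
- $\bar g^*_n = \frac{1}{n}\sum_{i=1}^n g_n(X^*_i)$, $T^*_n = (\bar g^*_n - \bar g_n)/\sigma_n$
- Then $G^*_n$, the bootstrap distribution of $T^*_n$, consistently estimates $G_n$ if and only if $T_n \xrightarrow{d} N(0, 1)$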


Bootstrap - when does it not work

- Heavy-tailed distributions: $X_i$ a random sample from a Cauchy distribution, $T_n = \bar X_n$
- $X_i$ a random sample from $N(\mu, \sigma^2)$, $T_n = n^{1/2}(\bar X_n^2 - \mu^2)$ if $\mu \ne 0$, otherwise $T_n = n\bar X_n^2$
- Maximum of a sample: $F_0$ has support $[0, \theta_0]$, $\theta_n = \max\{X_1, \dots, X_n\}$, $T_n = n(\theta_n - \theta_0)$, $T^*_n = n(\theta^*_n - \theta_n)$. Then $P^*_n(T^*_n = 0) = 1 - (1 - 1/n)^n \to 1 - e^{-1}$, while $P(T_n = 0) \to 0$.
- Parameter on a boundary: $X_i$ a random sample from $N(\mu, 1)$ with $\mu \in [0, \infty)$ [Andrews (2000)]
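The maximum example is easy to reproduce by simulation. A minimal sketch, assuming $F_0$ is Uniform[0, 1] (so $\theta_0 = 1$) and $n = 100$. The share of bootstrap samples whose maximum equals $\theta_n$ should be close to $1 - (1 - 1/n)^n$, even though $P(T_n = 0) = 0$ for a continuous $F_0$.

```matlab
rng(3);
n = 100;
x = rand(n, 1);                        % F_0 = Uniform[0, 1], so theta_0 = 1
theta_n = max(x);
B = 5000;
hit = false(B, 1);
for b = 1:B
    hit(b) = max(x(randi(n, n, 1))) == theta_n;   % event T*_n = 0
end
p_star  = mean(hit);                   % simulated P*(T*_n = 0)
p_limit = 1 - (1 - 1/n)^n;             % about 0.634 for n = 100, tends to 1 - exp(-1)
fprintf('P*(T*_n = 0) ~ %.3f (formula %.3f)\n', p_star, p_limit);
```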


Bootstrap - even more

- consistency is nice, but the bootstrap also allows us to improve the finite-sample properties of estimators and tests!
  - bias correction
  - bootstrapping critical values
  - parametric vs non-parametric bootstrap: how to choose $F_n$?


Bootstrap - bias correction

We care about the bias $E[\theta_n - \theta_0]$.

Step 1 Compute $\theta_n$.
Step 2 Generate a random sample of size $n$ from $F_n$: $\{X^*_i : i = 1, \dots, n\}$, and calculate $\theta^*_n = g(X^*)$.
Step 3 Repeat Step 2 many times to calculate $E^*\theta^*_n$. The bias estimate is $B^*_n = E^*\theta^*_n - \theta_n$, and the bias-corrected estimator is $\theta_n - B^*_n$.

Do not use it for $\sqrt{n}$-consistent estimators, because the bias-corrected estimator has a higher variance.
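A minimal sketch of the three steps, assuming the estimator is the plug-in variance $\frac{1}{n}\sum_i (X_i - \bar X)^2$ of a small hypothetical sample. Its downward bias is known, so the output can be sanity-checked.

```matlab
rng(4);
x = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 6.1 3.9 4.4]';   % hypothetical sample
n = numel(x);
g = @(s) var(s, 1);                    % plug-in variance (divides by n), biased downwards
theta_n = g(x);                        % Step 1
B = 5000;
theta_star = zeros(B, 1);
for b = 1:B                            % Steps 2-3
    theta_star(b) = g(x(randi(n, n, 1)));
end
bias_star = mean(theta_star) - theta_n;   % B*_n = E* theta*_n - theta_n
theta_bc  = theta_n - bias_star;          % bias-corrected estimator
fprintf('plug-in %.3f, corrected %.3f, unbiased var %.3f\n', theta_n, theta_bc, var(x));
```

For this estimator the bootstrap expectation is exactly $E^*\theta^*_n = \frac{n-1}{n}\theta_n$, so the corrected value should be close to $\theta_n(1 + 1/n)$, near the usual unbiased estimate `var(x)`.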


Bootstrap - hypothesis tests

$$T_n = n^{1/2}\,\frac{\theta_n - \theta_0}{s_n}$$

- With asymptotic refinement: use the bootstrap to get critical values
- Without asymptotic refinement:
  - use the bootstrap to estimate the standard error of an estimator
  - percentile method: use quantiles of the distribution of $\theta^*_n$


Bootstrap - hypothesis tests

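Step 1 Compute $\theta_n$.
Step 2 Generate a random sample of size $n$ from $F_n$: $\{X^*_i : i = 1, \dots, n\}$, and calculate $T^*_n = n^{1/2}(\theta^*_n - \theta_n)/s^*_n$.
Step 3 Repeat Step 2 many times to get the empirical distribution of $|T^*_n|$. Set $z^*_{n,\alpha/2}$ to the $(1 - \alpha)$ quantile of this distribution and reject $H_0: \theta = \theta_0$ at level $\alpha$ if $|T_n| > z^*_{n,\alpha/2}$.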
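A minimal sketch of this symmetric bootstrap-t test for a mean, assuming a small hypothetical sample, $\theta_n$ = sample mean, $s_n$ = sample standard deviation and a hypothetical null value $\theta_0 = 3$. The bootstrap statistic is centred at $\theta_n$ (the "true" value under $F_n$), not at $\theta_0$, and the quantile is read off the sorted draws to avoid a toolbox dependency.

```matlab
rng(5);
x = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 6.1 3.9 4.4]';   % hypothetical sample
theta0 = 3;                            % hypothetical null value, H0: mean = 3
alpha = 0.05; n = numel(x); B = 2000;
theta_n = mean(x); s_n = std(x);
T_n = sqrt(n) * (theta_n - theta0) / s_n;          % test statistic
T_star = zeros(B, 1);
for b = 1:B
    xs = x(randi(n, n, 1));
    T_star(b) = sqrt(n) * (mean(xs) - theta_n) / std(xs);   % centred at theta_n
end
absT   = sort(abs(T_star));
z_crit = absT(ceil((1 - alpha) * B));  % (1 - alpha) quantile of |T*_n|
reject = abs(T_n) > z_crit;
fprintf('|T_n| = %.3f, bootstrap critical value %.3f\n', abs(T_n), z_crit);
```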


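Bootstrap - choice of the number of bootstrap replications

- the larger the number, the better (less simulation noise), at the cost of computing time
- Efron and Tibshirani (1993): 200
- Andrews and Buchinsky (2000): 2000
- the further in the tail the quantile we need (e.g. $\alpha = 0.01$ rather than $0.05$), the more replications are needed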


Implementation in MATLAB

Statistics and Machine Learning Toolbox

Bootstrapping statistics: [bootstat, bootsam] = bootstrp(nboot,bootfun,d1)

Bootstrap confidence intervals: [ci, bootstat] = bootci(nboot,bootfun,...,...,'Options',options)

Interval types, selected with the 'Type' argument of bootci:

- Normal approximation ('norm')
- Percentile method ('per')
- Bias corrected ('cper')
- Bias corrected and accelerated ('bca')
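A minimal sketch of both functions, assuming the Statistics and Machine Learning Toolbox is installed and using a small hypothetical sample:

```matlab
% Requires the Statistics and Machine Learning Toolbox.
rng(6);
x = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 6.1 3.9 4.4]';   % hypothetical sample
nboot = 2000;
bootstat = bootstrp(nboot, @median, x);            % nboot-by-1 bootstrap medians
se_med   = std(bootstat);                          % bootstrap standard error of the median
ci_per = bootci(nboot, {@median, x}, 'Type', 'per');              % percentile interval
ci_bca = bootci(nboot, {@mean, x}, 'Type', 'bca', 'Alpha', 0.05); % BCa interval for the mean
disp([ci_per, ci_bca]);                            % each column is [lower; upper]
```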


When bootstrap fails - Subsampling

An alternative to the bootstrap.

- we draw smaller samples (of size $b < n$) without replacement
- crucial difference: each subsample is a sample from the true data-generating process ($F_0$), not from the estimated model ($F_n$)
- more general than the bootstrap
- less powerful in cases where the bootstrap works
- practical difficulty: how to choose the subsample size?
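Subsampling handles the sample-maximum example where the bootstrap fails. A minimal sketch, assuming $F_0$ is Uniform[0, 1] (so $T_n = n(\theta_n - \theta_0)$ converges to minus an Exp(1) variable), $n = 20000$, and a subsample size $b = 50$ picked by hand rather than by any formal rule:

```matlab
rng(7);
n = 20000; b = 50; S = 2000;           % sample size, subsample size, number of subsamples
x = rand(n, 1);                        % F_0 = Uniform[0, 1], theta_0 = 1
theta_n = max(x);
T_sub = zeros(S, 1);
for s = 1:S
    idx = randperm(n, b);              % b distinct indices: drawn WITHOUT replacement
    T_sub(s) = b * (max(x(idx)) - theta_n);   % subsample analogue of n(theta_n - theta_0)
end
% n(theta_n - theta_0) converges to minus an Exp(1) variable, so
% P(T_n <= -1) tends to exp(-1), about 0.368.
p_sub = mean(T_sub <= -1);
fprintf('subsampling estimate %.3f, limit %.3f\n', p_sub, exp(-1));
```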


Selected topics

- Principal Components Analysis
- Support Vector Machines
- Cross Validation
- Non-parametric estimation


Principal Components Analysis

Suppose we have many variables but there are certain regularities in our data.

Can we encode (almost) the same information using fewer variables (dimension reduction)?

We re-express the space spanned by our data in a new orthogonal basis; the basis vectors are the principal components.

We order the principal components by importance, that is, by the fraction of the variance they explain.

What is it good for?

Dimension reduction.


Principal Components Analysis - example

Nice animation: http://setosa.io/ev/principal-component-analysis/


PCA - how does it work

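- the principal components are the eigenvectors of the covariance matrix of the demeaned data; the corresponding eigenvalues are the variances along them, so each eigenvalue divided by their sum is the fraction of variance explained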
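A minimal base-MATLAB sketch of this recipe, assuming two hypothetical, strongly correlated variables driven by one common factor; the first component should point roughly along $(1, 1)/\sqrt{2}$ and explain almost all of the variance.

```matlab
t  = (1:100)';
a  = 3 * sin(0.3 * t);                  % common factor driving both variables
X  = [a + 0.1 * cos(1.7 * t), a + 0.1 * sin(2.3 * t)];   % two highly correlated columns
Xc = X - mean(X, 1);                    % demean each column
C  = (Xc' * Xc) / (size(X, 1) - 1);     % sample covariance matrix (same as cov(X))
[V, D] = eig(C);                        % columns of V are eigenvectors
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);                        % principal components, most important first
explained = lambda / sum(lambda);       % fraction of variance explained by each
scores  = Xc * V;                       % data expressed in the new orthogonal basis
scores1 = scores(:, 1);                 % keep one dimension instead of two
```

With the Statistics and Machine Learning Toolbox, `[coeff, score, latent] = pca(X)` should return the same components (up to sign), scores and eigenvalues.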


Support Vector Machines

It is a 0-1 (binary) classifier.

Training set → classifier.

This classifier is the solution of an optimization problem (quadratic programming) that tries to separate the 0s from the 1s with the widest possible margin.

The training points closest to the separating boundary, which determine where it lies, are called support vectors.
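A minimal sketch with `fitcsvm`, assuming the Statistics and Machine Learning Toolbox and a hypothetical, linearly separable training set:

```matlab
% Requires the Statistics and Machine Learning Toolbox.
X = [0 0; 1 0; 0 1; 3 3; 4 3; 3 4];     % hypothetical training points
Y = [0; 0; 0; 1; 1; 1];                 % class labels
Mdl = fitcsvm(X, Y);                    % linear kernel by default
sv = Mdl.SupportVectors;                % training points that pin down the boundary
label = predict(Mdl, [0.5 0.5; 3.5 3.5]);   % classify two new points
disp(sv); disp(label);
```

The 'BoxConstraint' argument (default 1) is the penalty on margin violations in the soft-margin problem; a very large value approximates a hard margin.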


Support Vector Machines - Hard vs Soft Margin

Source: www.stackoverflow.com


Support Vector Machine - Non-linear

Separation may not be possible in the original space, so we map the points into a (higher-dimensional) feature space where a linear separator may exist.

Source: www.stackoverflow.com
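A minimal sketch of the non-linear case, assuming the Statistics and Machine Learning Toolbox and hypothetical data: one class on a small circle inside the other class on a large circle, which no straight line can separate.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
ang   = (0:7)' * pi / 4;                     % 8 equally spaced angles
inner = 0.5 * [cos(ang), sin(ang)];          % class 0: small circle
outer = 2.0 * [cos(ang), sin(ang)];          % class 1: large circle around it
X = [inner; outer];
Y = [zeros(8, 1); ones(8, 1)];
% The Gaussian (RBF) kernel implicitly maps the points into a feature
% space where a linear separator exists.
Mdl = fitcsvm(X, Y, 'KernelFunction', 'rbf');
label = predict(Mdl, [0 0; 2 0]);            % centre -> class 0, outer point -> class 1
```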


Cross validation

Our goal is prediction.

We divide our data into three parts:

- Training set: here we train different predictors
- Validation set: we pick as the winner the predictor that does best on the validation set (the one that fits best in the training phase may be overfitting)
- Test set: we check how well the winner does on data it has not seen
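A minimal sketch of the three-way split, assuming hypothetical data and polynomials of degree 1 to 6 (fitted with `polyfit`) as the competing predictors; the 60/20/20 split is an illustrative choice.

```matlab
rng(8);
n = 60;
x = linspace(0, 1, n)';
y = sin(2 * pi * x) + 0.2 * cos(37 * x);   % hypothetical signal plus wiggle
idx = randperm(n);                          % shuffle before splitting
tr = idx(1:36); va = idx(37:48); te = idx(49:60);   % 60/20/20 split

degrees = 1:6;                              % candidate predictors
val_mse = zeros(size(degrees));
for d = degrees
    p = polyfit(x(tr), y(tr), d);           % train on the training set only
    val_mse(d) = mean((polyval(p, x(va)) - y(va)) .^ 2);
end
[~, best_deg] = min(val_mse);               % winner on the validation set
p_best   = polyfit(x(tr), y(tr), best_deg);
test_mse = mean((polyval(p_best, x(te)) - y(te)) .^ 2);   % error on unseen data
fprintf('chosen degree %d, test MSE %.4f\n', best_deg, test_mse);
```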


Non-parametric estimation - Why? (DiNardo and Tobias 2001)

Parametric model:


Why Non-parametric? (DiNardo and Tobias 2001)

Non-parametric model:


Kernel density estimation

[f,xi] = ksdensity(x,pts,Name,Value)

- kernel: 'normal', 'box', 'triangle', 'epanechnikov', or a custom kernel function
- npoints: number of points at which the density is evaluated (length of xi)
- support: 'unbounded', 'positive'
- bandwidth: width of the kernel-smoothing window
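What ksdensity computes can be written out by hand. A minimal base-MATLAB sketch with a normal kernel, assuming a small hypothetical sample and Silverman's rule-of-thumb bandwidth (an assumption here, not necessarily ksdensity's default choice); passing the same bandwidth via `ksdensity(x, xi, 'Bandwidth', h)` should give essentially the same curve.

```matlab
x  = [2.1 3.4 1.9 5.6 4.2 3.3 2.8 6.1 3.9 4.4]';   % hypothetical sample
n  = numel(x);
h  = 1.06 * std(x) * n^(-1/5);            % Silverman's rule-of-thumb bandwidth (assumed)
xi = linspace(0, 8, 200);                 % evaluation points
K  = @(u) exp(-0.5 * u .^ 2) / sqrt(2 * pi);   % normal kernel
f  = mean(K((xi - x) / h), 1) / h;        % f(xi) = 1/(n h) * sum_i K((xi - x_i)/h)
```

A smaller h gives a bumpier estimate and a larger h a smoother one; the bandwidth matters far more than the kernel shape.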


Kernel density estimation

Source: www.mathworks.com


Literature

- LeSage's Econometrics Toolbox: http://www.spatial-econometrics.com/html/mbook.pdf
- Panel Data toolbox (Alvarez, Barbero, Zofio): https://ideas.repec.org/p/uam/wpaper/201305.html, www.paneldatatoolbox.com
- Horowitz, Joel L. "The bootstrap." Handbook of Econometrics 5 (2001): 3159-3228.
- Cameron, A. Colin, and Pravin K. Trivedi. Microeconometrics: Methods and Applications. Cambridge University Press, 2005.
- Efron, Bradley, and Robert J. Tibshirani. An Introduction to the Bootstrap. CRC Press, 1994.
- (mathematical theory) Beran, Rudolf J., and Gilles R. Ducharme. Asymptotic Theory for Bootstrap Methods in Statistics. Centre de Recherches Mathematiques, 1991.
- Kennedy, P. Bootstrapping Student Understanding of What is Going On in Econometrics. http://www.sfu.ca/~kkasa/middle.pdf (very instructive and useful)
- Bootstrap Testing in Econometrics: http://qed.econ.queensu.ca/faculty/mackinnon/papers/bt-cea.pdf
- Politis, D. N., J. P. Romano, and M. Wolf. Subsampling. Springer, 1999.
- Bootstrap vs Subsampling: https://normaldeviate.wordpress.com/2013/01/19/bootstrapping-and-subsampling-part-i/, https://normaldeviate.wordpress.com/2013/01/27/bootstrapping-and-subsampling-part-ii/
- Bootstrap vs Subsampling: http://web.stanford.edu/~doubleh/eco273/subsampling.pdf
- Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2. No. 1. New York: Springer, 2009. http://statweb.stanford.edu/~tibs/ElemStatLearn/
- Varian, H. Big Data: New Tricks for Econometrics. http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf
- Support Vector Machine: http://research.microsoft.com/pubs/67119/svmtutorial.pdf
- Cross Validation MATLAB example: http://white.stanford.edu/~knk/Psych216A/Psych216ALecture5Tutorial.m
- Nonparametric econometrics: http://www.ssc.wisc.edu/~bhansen/718/NonParametrics1.pdf