partial least squares (pls): path modeling

31
Partial Least Squares (PLS): Path Modeling Method Talk Winter Term 2015/16 Pascal Stichler

Upload: others

Post on 22-Dec-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Partial Least Squares (PLS): Path Modeling

Partial Least Squares (PLS):Path Modeling

Method TalkWinter Term 2015/16

Pascal Stichler

Page 2: Partial Least Squares (PLS): Path Modeling

Outline

1 Introduction to PLS

2 Putting PLS in Context

3 Model Definition

4 Solution Algorithm

5 Model Evaluation

6 Wrap-Up

2PLS

Page 3: Partial Least Squares (PLS): Path Modeling

Today’s Lecture

Objectives

1 Evaluate when to use PLS

2 Learn how PLS works and how to use it

3 Investigate how to evaluate a PLS model, interpret the results andadjust the model accordingly

3PLS

Page 4: Partial Least Squares (PLS): Path Modeling

Outline

1 Introduction to PLS

2 Putting PLS in Context

3 Model Definition

4 Solution Algorithm

5 Model Evaluation

6 Wrap-Up

4PLS: Introduction to PLS

Page 5: Partial Least Squares (PLS): Path Modeling

PLS: A silver bullet?

Partial Least Squares Path Modeling is a statistical data analysismethodology that exists at the intersection of Regression Models,Structural Equation Models, and Multiple Table Analysis methods [9]

Goal: Use theoretical knowledge about structure of latent variables topredict indicators based on data

I Doing so with least possible distribution assumptions

I PLS-PM is known under several names: PLS-PM, PLS-SEM,component-based structural equation modeling, projection to latentstructures, soft modeling etc.

I Developed by Herman Wold in the mid 1960s under the term of "softmodeling" [14]

I After initial introduction and discussions it received little attention untilthe late 1990s, however since then sharply rising interest

5PLS: Introduction to PLS

Page 6: Partial Least Squares (PLS): Path Modeling

Why use PLS?

PLS-PM is worth considering when ...Structural model

I . . . you have a theoretical model that involves latent variablesI . . . the phenomenon you investigate is relatively new and measurement

models need to be newly developedI . . . the structural equation model is complex with a large number of latent

variables and indicator variables [12]

Observed variablesI . . . you have small sample sets (e. g. more variables than observations) [7]I . . . you have non-normal distributed dataI . . . you have multicollinearity problemsI . . . you have formative and reflective measures (to be discussed)I . . . you need minimum requirements regarding measurement scales (e. g.

ratio and nominal variables)I . . . you need minimum requirements regarding residuals distribution [1]

6PLS: Introduction to PLS

Page 7: Partial Least Squares (PLS): Path Modeling

Outline

1 Introduction to PLS

2 Putting PLS in Context

3 Model Definition

4 Solution Algorithm

5 Model Evaluation

6 Wrap-Up

7PLS: Putting PLS in Context

Page 8: Partial Least Squares (PLS): Path Modeling

General OverviewTypes of PLS:

I PLS-Path Modeling:Component-based modeling based ontheoretical structure modelMainly used in: social sciences, econometrics,marketing and strategic management

I PLS-Regression:Regression based approach investigating thelinear relationship between multiple independentvariables and dependent variable(s)Mainly used in: chemometrics, bioinformatics,sensometrics, neuroscience and anthropology

I OPLS: Orthognal projection improvesinterpretability

I PLS-DA: Used when Xr is categorial

I CB-SEM: Covariance-based structural equationmodelling

LVp LVr

XrXp

I Predictors Xp ⊂ X

I Responses Xr ⊂ X withXp ∩Xr =∅

I Exogenous latent variablesLVp ⊂ LV

I Endogenous latent variablesLVr ⊂ LV with LVp ∩LVr =∅

8PLS: Putting PLS in Context

Page 9: Partial Least Squares (PLS): Path Modeling

PLS-PM vs. CB-SEM

Both methods differ from statistical point of view. Hence, neither of the techniques isgenerally superior to the other and neither of them is appropriate for all situations. Ingeneral, the strenghts of PLS-SEM are CB-SEM’s weaknesses, and visa versa. [3]

PLS-PM (PLS-SEM)Variance-based

I The goal is prediction and theorydevelopment

I Formatively measured constructs arepart of the structural model

I The structural model is complex

I The sample size is small and/or thedata are non-normally distributed

I The plan is to use latent variable scoresin subsequent analyses

I Available Software: SmartPLS,PLSGraph, R packages (plspm) etc.

CB-SEMCovariance-based

I The goal is theory testing, theoryconfirmation, or the comparison ofalternative theories

I Error terms require additionalspecification, such as the covariation

I The structural model has non-recursiverelationships

I The research requires a globalgoodness-of-fit criterion

I Available Software: LISREL, AMOS,EQS etc.

Based on [8], [4], [11]9PLS: Putting PLS in Context

Page 10: Partial Least Squares (PLS): Path Modeling

Comparison

Prediction

Theo

ry T

estin

g

CB-SEM

PLS-PM

10PLS: Putting PLS in Context

Page 11: Partial Least Squares (PLS): Path Modeling

Outline

1 Introduction to PLS

2 Putting PLS in Context

3 Model Definition

4 Solution Algorithm

5 Model Evaluation

6 Wrap-Up

11PLS: Model Definition

Page 12: Partial Least Squares (PLS): Path Modeling

Exemplary Model

X21

X22

X23

LV2

X11

X12

X13

LV1

X31

X32

X33

LV3

Measurement Model/Outer Model Measurement Model/Outer Model

Structural Model/Inner Model

Formal definition:

I X data set with n observations and m variables

I X can be divided into J exclusive blocks with K variables each X1,1 . . .XJ,K etc.

I Each block Xj associated with LVj ; estimation of variable ("score") denoted by L̂Vj = Yj

I LV1 and LV3: reflective blocks; LV2: formative block [9]12PLS: Model Definition

Page 13: Partial Least Squares (PLS): Path Modeling

Structural Model (Inner Model)

1 Linear RelationshipAll relationships are considered linear relationships and can be notedasLVj = β0 +∑

i→jβjiLVi + εj

The coefficients βji represent the path coefficients

2 Recursive Model mandatoryCausality flow must be unidirectional (no loops)

3 Regression Specification (Predictor Specification)E(LVj |LVi) = β0i +∑

i→jβjiLVi

Specifying that the regression has to be linear under the assumptionthatcov(LVj ,εj) = 0 and εj = 0

13PLS: Model Definition

Page 14: Partial Least Squares (PLS): Path Modeling

Measurement Model (Outer Model)Reflective Indicators

X11

X12

X13

LV1

λ11

λ13

λ12

I Linear relationships:Xjk = λ0jk +λjk LVj +εjk(λjk is called loading)

I RegressionSpecification:E(Xjk |LVj) =λ0jk +λjk LVj

I Characteristics:- Unidimensional- Correlated- Xjk "fully relevant"

Formative Indicators

X21

X22

X23

LV2

λ21

λ23

λ22

LVj = λ0j +λjk Xjk + εj

E(LVj |Xjk ) = λ0j +λjk Xjk

- Multidimensional- Uncorrelated- Xjk "partly relevant"

MIMIC*

X31

X32

X33

LV3

λ31

λ33

λ32

equivalent to reflective andformative (depending on

indicator)

equivalent to reflective andformative (depending on

indicator)

In R package plspm notpossible

*multiple effect indicators for multiple causes

14PLS: Model Definition

Page 15: Partial Least Squares (PLS): Path Modeling

Weight Relations (Scores)

I The latent variables are only virtual entities

I However, as all linear relations depend on the latent variables, theyneed a representation: Weight Relations

Score: L̂Vj = Yj = ∑k

wjk Xjk

I The score, as a representation of the latent variable, is calculated asthe sum of its indicators (similar to the approach in principalcomponent analysis)

I Because of this PLS is called a component-based approach

15PLS: Model Definition

Page 16: Partial Least Squares (PLS): Path Modeling

Outline

1 Introduction to PLS

2 Putting PLS in Context

3 Model Definition

4 Solution Algorithm

5 Model Evaluation

6 Wrap-Up

16PLS: Solution Algorithm

Page 17: Partial Least Squares (PLS): Path Modeling

PLS-PM Algorithm Overview

1 Stage: Get the weights to compute latent variable scores→ Most important and most difficult

2 Stage: Estimate the path coefficients (inner model)→ Usually done via OLS

3 Stage: Obtain the loadings (outer model)→ Calculation of correlations

17PLS: Solution Algorithm

Page 18: Partial Least Squares (PLS): Path Modeling

Stage 1: Latent Variable Scores

Start:initial arbitrary outerweights (e.g. wjk = 1)

Step 1:Compute the external

approximation oflatent variablesYj = wjk Xjk

Step 4:Calculate new

outer weights wjk

Step 2: Obtain inner

weights eij

Check for convergence of outer weights

Mode A:Simple

regressionMode B:Multiple

regression(Mode C:)

Combination

Inner weightingschemes:

• Centroid scheme

• Factor scheme• Path scheme

Step 3:Compute the internal

approximation oflatent variables

Zj = eijYi$⌃i j

k⌃

18PLS: Solution Algorithm

Page 19: Partial Least Squares (PLS): Path Modeling

Stage 2 & 3

2. Stage: Path CoefficientsThe path coefficient estimates β̂ji = Bji are calculated usually using ordinaryleast squares in the multiple regression of Yi on the Yj ’s related with it

Yj = ∑i→j

β̂jiYi

In case high multicollinearity occurs PLS regression can also be applied [11]

3. Stage: LoadingsFor convenience and simplicity reasons, loadings are preferably calculatedas correlations between a latent variable and its indicators:

λ̂jk = cor(Xjk ,Yj)

19PLS: Solution Algorithm

Page 20: Partial Least Squares (PLS): Path Modeling

PLS-PM usage in R (package plspm)

Parameters to define the PLS Path ModelData Data for the modelpath_matrix Definition of inner modelblocks List definitng the blocks of variables of the outer modelscaling List defining the measurment scale of variables for non-metric datamodes Vector defining the measuremnt mode of each block

Parameters related to the PLS-PM algorithmscheme Inner path weighting schemescaled Indicates whether the data should be standardizedtol Tolerance threshold for checking convergence of the iterative stagesmaxiter maximum number of iterationsplscomp Indicates the number of PLS components when handling non-metric data

Additional parametersboot.val Indicates whether bootstrap validation must be performedbr Number of bootstrap resamplesdataset Indicates whether the data matrix should be retrieved

20PLS: Solution Algorithm

Page 21: Partial Least Squares (PLS): Path Modeling

Outline

1 Introduction to PLS

2 Putting PLS in Context

3 Model Definition

4 Solution Algorithm

5 Model Evaluation

6 Wrap-Up

21PLS: Model Evaluation

Page 22: Partial Least Squares (PLS): Path Modeling

Interpreting the Results

In PLS the real challenge is interpreting the results and makingwell-founded adjustments the model [9], p. 54

Steps of Model Assessment:

1 Assessment MeasurementModel (Outer Model)

2 Assessment Structural Model(Inner Model)

(It is important to keep this order due to modeldependencies)

22PLS: Model Evaluation

Page 23: Partial Least Squares (PLS): Path Modeling

1. Measurement Model Assessment (Outer Model)

I Formative Blocks: Evaluation relatively straightforward

I Reflective Blocks: Evaluation rather complex =⇒ Test theory applied

Formative Blocks:Variables are considered as causingthe latent variable

I They do not necessarilymeasure the same underlyingconstruct

I Not supposed to be correlated

I Compare outer weights tocheck which indicatorcontributes most efficiently

I Elimination of variables shouldbe based on multicollinearity

Reflective Blocks:Variables are considered as measuring thesame underlying construct

I Hence they need a strong mutualassociation

I Further they should be strongly related to itslatent variable

1 Unidimensionality of indicators

2 Indicators well explained

3 Constructs differ from each other

23PLS: Model Evaluation

Page 24: Partial Least Squares (PLS): Path Modeling

Deep Dive: Reflective Indicators

1 Unidimensionality of indicators: All for one and one for all(a) Cronbach’s alpha

Measures the average inter-variable correlation(considered good if > 0.7)

(b) Dillon-Goldstein’s rhoFocus on the variance of the sum of variables (considered a betterindicator than Cronbach’s alpha ([1], p.320)(considered good if > 0.7)(see [11], [13] p. 50 for formal definition)

(c) First eigenvalueFirst eigenvalue of correlation matrix should be larger than one andsecond one significantly smaller (preferably smaller than 1)

2 Loadings & Communalities: Indicators well explained

I Loadings are considered for each indicator (considered good if > 0.7)I Communalities (squared loadings): amount of indicator variance

explained by its corresponding LV

24PLS: Model Evaluation

Page 25: Partial Least Squares (PLS): Path Modeling

Deep Dive: Reflective Indicators

3 Cross-loadings: Constructs differ from each othercross-loadings =̂ loadings of an indicator with the rest of the latentvariablesGoal: Ensure that shared variance between construct and itsindicators is higher than for other constructs (no "traitor" indicators)=⇒ Loadings should always be highest for the respective block

[. . . ]$crossloadings

25PLS: Model Evaluation

Page 26: Partial Least Squares (PLS): Path Modeling

2. Structural Model Assessment (Inner Model)Standard OLS regression output:

3 further indicators of model quality:I R2 determination coeffcient: Amount of variance of endogenous LVs explained by

its independent LVs (considered low below 0.3 and high above 0.6)

I Redundancy Index: Amount of variance in the endogenous block that explained byits independent LVs (defined as Rd(LVj ,xjk ) = loading2

jk R2j )

I Goodness-of-Fit (GoF): No single criterion exists for overall quality of a model.GoF as a pseudo criterion:

GoF =

√communality×R2 (considered good if >0.7) [10] [11]

I Validation: Resampling (bootstrapping, jackknifing) possible; more traditionalapproaches are not (as there are no assumptions made on the distribution)

26PLS: Model Evaluation

Page 27: Partial Least Squares (PLS): Path Modeling

Outline

1 Introduction to PLS

2 Putting PLS in Context

3 Model Definition

4 Solution Algorithm

5 Model Evaluation

6 Wrap-Up

27PLS: Wrap-Up

Page 28: Partial Least Squares (PLS): Path Modeling

Summary: PLS

Advisable for the following conditions (based on [8])Focus Prediction and theory developmentDistribution Minimum assumptions made regarding indicator distributionSample size Small sample size possible (however questioned in literature [2], [6], [5])

Model definitionIndicators Define blocks of variables and respective latent variablesMeasurement Model Define relations (formative/reflective)Structural Model Define internal model

Interpreting the resultsMeasurement Model (formative) Eliminate multicollinearityMeasurement Model (reflective) Unidimensionality, loadings & communalities and

cross-loadingsStructural Model Consider R2, redundancy index and GoFValidation Apply resampling (bootstrapping, jackknifing)

28PLS: Wrap-Up

Page 29: Partial Least Squares (PLS): Path Modeling

Bibliography I

W. W. CHIN. The partial least squares approach to structuralequation modeling. In: Modern methods for business research,Vol. 295, No. 2 (1998), pp. 295–336.

D. GOODHUE, W. LEWIS, and R. THOMPSON. PLS, small samplesize, and statistical power in MIS research. In: System Sciences,2006. HICSS’06. Proceedings of the 39th Annual Hawaii InternationalConference on. Vol. 8. IEEE. 2006, 202b–202b.

J. F. HAIR JR et al. A primer on partial least squares structuralequation modeling (PLS-SEM). Sage Publications, 2013.

J. F. HAIR, C. M. RINGLE, and M. SARSTEDT. PLS-SEM: Indeed asilver bullet. In: Journal of Marketing Theory and Practice, Vol. 19,No. 2 (2011), pp. 139–152.

G. A. MARCOULIDES, W. W. CHIN, and C. SAUNDERS. A critical lookat partial least squares modeling. In: Mis Quarterly (2009),pp. 171–175.

29PLS: Wrap-Up

Page 30: Partial Least Squares (PLS): Path Modeling

Bibliography IIG. A. MARCOULIDES and C. SAUNDERS. Editor’s comments: PLS: asilver bullet? In: MIS quarterly, Vol. 30, No. 2 (2006), pp. iii–ix.

B.-H. MEVIK and R. WEHRENS. The pls package: principalcomponent and partial least squares regression in R. In: Journal ofStatistical Software, Vol. 18, No. 2 (2007), pp. 1–24.

W. REINARTZ, M. HAENLEIN, and J. HENSELER. An empiricalcomparison of the efficacy of covariance-based and variance-basedSEM. In: International Journal of research in Marketing, Vol. 26,No. 4 (2009), pp. 332–344.

G. SANCHEZ. PLS path modeling with R. In: Online, January (2013).

M. TENENHAUS, S. AMATO, and V ESPOSITO VINZI. A globalgoodness-of-fit index for PLS structural equation modelling. In:Proceedings of the XLII SIS scientific meeting. Vol. 1. CLEUPPadova. 2004, pp. 739–742.

30PLS: Wrap-Up

Page 31: Partial Least Squares (PLS): Path Modeling

Bibliography IIIM. TENENHAUS et al. PLS path modeling. In: Computationalstatistics & data analysis, Vol. 48, No. 1 (2005), pp. 159–205.

N. URBACH and F. AHLEMANN. Structural equation modeling ininformation systems research using partial least squares. In: Journalof Information Technology Theory and Application, Vol. 11, No. 2(2010), pp. 5–40.

V. E. VINZI, L. TRINCHERA, and S. AMATO. PLS path modeling: fromfoundations to recent developments and open issues for modelassessment and improvement. In: Handbook of partial least squares.Springer, 2010, pp. 47–82.

H. WOLD et al. Estimation of principal components and relatedmodels by iterative least squares. In: Multivariate analysis, Vol. 1(1966), pp. 391–420.

31PLS: Wrap-Up