SUGI 31 - Contributed paper 201-31
Performing Latent Class Analysis Using the CATMOD Procedure
David M. Thompson
Department of Biostatistics and Epidemiology
College of Public Health, OUHSC
Latent class analysis (LCA)
• LCA can validate a classification used in decision-making when no gold standard exists.
• No SAS procedure performs LCA directly.
LCA and Patient Classification
Patient classification is part of many clinical decisions:
• Diagnosis
• Prognosis
Patient classification in the absence of a gold standard
Diagnosis
• Diagnostic categories may be emerging or unclear.
Prognosis
• predicting rehabilitation outcomes
• counseling patients and families regarding expectations
Latent class analysis (LCA)
• LCA parallels factor analysis, but for categorical responses.
• Like factor analysis, LCA addresses the complex pattern of association that appears among observations…
[Figure: observed variables A, B, C, and D, with associations among them]
… and attributes the pattern to a set of latent (underlying, unobserved) factors or classes.
[Figure: observed variables A, B, C, and D linked to latent Class 1 and Class 2]
What if no gold standard existed in cardiology to assess a pattern of “yes/no” signs and symptoms?
Rindskopf, R., & Rindskopf, W. (1986). The value of latent class analysis in medical diagnosis. Statistics in Medicine, 5, 21-27.
[Figure: four “yes/no” indicators: Q-wave in EKG, abnormal LDH pattern, history of angina, elevated CPK]
LCA predicts latent class membership such that, within each latent class, the observed variables are independent (local independence).
[Figure: Q-wave in EKG, history of angina, abnormal LDH pattern, and elevated CPK, each linked only to the latent classes MI and No MI]
LCA estimates
• Latent class prevalences
• Conditional probabilities: probabilities of a specific response, given class membership
[Figure: the four indicators with latent class prevalences P(MI) and P(No MI) and example conditional probabilities P(Q-wave | MI) and P(CPK | No MI)]
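Under local independence, the probability of a whole response profile is a prevalence-weighted sum, over latent classes, of products of conditional probabilities. The following is a minimal Python sketch of that relationship (not SAS, and not the author's DATA step). The function name is hypothetical; the values for indicators A and C echo the expected values used on the simulation slides (0.9/0.2 and 0.1/0.8), while the B and D values are made up for illustration.

```python
from itertools import product


def profile_probability(profile, prevalences, cond_probs):
    """P(A=a, B=b, C=c, D=d) = sum_t P(X=t) * prod_v P(v = response | X=t).

    prevalences[t]:   latent class prevalence P(X=t)
    cond_probs[t][v]: conditional probability P(indicator v = 1 | X=t)
    Responses are coded 0/1; classes are indexed 0, 1 here (1, 2 on the slides).
    """
    total = 0.0
    for t, prev in enumerate(prevalences):
        within_class = prev
        for v, response in enumerate(profile):
            p_yes = cond_probs[t][v]
            within_class *= p_yes if response == 1 else 1.0 - p_yes
        total += within_class
    return total


# Hypothetical parameters for illustration (not estimates from the paper).
prevalences = [0.5, 0.5]
cond_probs = [
    [0.9, 0.8, 0.1, 0.7],  # class 1
    [0.2, 0.3, 0.8, 0.1],  # class 2
]

probs = {p: profile_probability(p, prevalences, cond_probs)
         for p in product((0, 1), repeat=4)}
print(round(sum(probs.values()), 10))  # 1.0
```

The 16 profile probabilities sum to 1 because, within each class, the product of per-indicator probabilities sums to 1 over all profiles.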
Conditional probabilities are analogous to sensitivities and specificities, but are calculated in the absence of a gold standard. For example, P(Q-wave | MI) plays the role of a sensitivity, and P(no Q-wave | No MI) plays the role of a specificity.
LCA works on the unconditional contingency table (no information on latent class membership):
Q-wave   Hx of angina   “flipped” LDH   High CPK   n_ijkl
0        0              0               0          15
0        0              0               1          14
0        0              1               0          11
0        0              1               1          8
0        1              0               0          23
.        .              .               .          .
1        1              1               1          9
LCA’s goal is to produce a complete (conditional) table that assigns counts to each latent class:
Q-wave   Hx of angina   “flipped” LDH   High CPK   Latent class X = t   n_ijklt
0        0              0               0          1                    9
0        0              0               1          2                    6
0        0              1               0          1                    3
0        0              1               1          2                    11
.        .              .               .          .                    .
1        1              1               1          2                    9
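Whatever split of counts the complete table uses, summing it over the latent class X must give back the observed table n_ijkl. A minimal Python sketch of that collapse follows; the function name is hypothetical, and the class splits below are made up so that the totals match the first two rows of the observed table.

```python
from collections import defaultdict


def collapse_over_classes(complete):
    """Sum the complete table n_ijklt over latent class t to get n_ijkl."""
    observed = defaultdict(float)
    for (i, j, k, l, t), count in complete.items():
        observed[(i, j, k, l)] += count
    return dict(observed)


# Hypothetical class split for two profiles whose observed totals are 15 and 14.
complete = {
    (0, 0, 0, 0, 1): 9.0, (0, 0, 0, 0, 2): 6.0,
    (0, 0, 0, 1, 1): 8.0, (0, 0, 0, 1, 2): 6.0,
}
print(collapse_over_classes(complete))
# {(0, 0, 0, 0): 15.0, (0, 0, 0, 1): 14.0}
```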
Estimating LC parameters
• Maximum likelihood approach
• Because LC membership is unobserved, the likelihood of each observed response profile is a sum over the latent classes, so the likelihood function and the likelihood surface are complex.
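To make that complexity concrete, here is a minimal Python sketch (not the author's code) of the observed-data log-likelihood, log L = Σ n_ijkl ln Σ_t P(X=t) Π_v P(v | X=t). The parameter values and function name are hypothetical, and the counts are only the rows visible on the observed-table slide (the elided rows are omitted). Swapping the class labels leaves the likelihood unchanged, one simple reason the surface has more than one maximum.

```python
import math


def observed_log_likelihood(observed, prevalences, cond_probs):
    """log L = sum_ijkl n_ijkl * ln( sum_t P(X=t) * prod_v P(v | X=t) ).

    The sum over the unobserved class t sits inside the logarithm, so the
    log-likelihood does not split into simple pieces per parameter.
    """
    loglik = 0.0
    for profile, count in observed.items():
        p_profile = 0.0
        for t, prev in enumerate(prevalences):
            term = prev
            for v, response in enumerate(profile):
                p_yes = cond_probs[t][v]
                term *= p_yes if response == 1 else 1.0 - p_yes
            p_profile += term
        loglik += count * math.log(p_profile)
    return loglik


# Only the rows visible on the observed-table slide; elided rows are omitted.
observed = {(0, 0, 0, 0): 15, (0, 0, 0, 1): 14, (0, 0, 1, 0): 11,
            (0, 0, 1, 1): 8, (0, 1, 0, 0): 23, (1, 1, 1, 1): 9}

# Hypothetical parameters, and the same parameters with class labels swapped.
prev, cond = [0.4, 0.6], [[0.9, 0.8, 0.1, 0.7], [0.2, 0.3, 0.8, 0.1]]
ll_original = observed_log_likelihood(observed, prev, cond)
ll_swapped = observed_log_likelihood(observed, prev[::-1], cond[::-1])
print(math.isclose(ll_original, ll_swapped))  # True: relabeled classes fit equally well
```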
EM algorithm maximizes L when some data (X) are unobserved
“E” step: uses parameter estimates to update expected values for cell counts n_ijklt in the complete contingency table
“M” step: produces ML estimates from the complete table
EM algorithm requires initial estimates
[Figure: cycle alternating between the “E” step and the “M” step]
1st “E” step: Provide initial estimates to “fill in” missing information on LC membership
EM algorithm in SAS
“E” step: SAS DATA step
“M” step: PROC CATMOD
1st “E” step: SAS DATA step that randomly assigns each response profile to one latent class
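The first “E” step can be sketched in Python as follows (a stand-in for the SAS DATA step, not the author's code). Each profile's whole count goes to one randomly chosen class. The redraw rule that keeps every initial prevalence at or above 0.3 is an assumption, based on the recommendation in the concluding remarks; the function name and parameters are hypothetical.

```python
import random


def random_initial_table(observed, n_classes=2, min_prevalence=0.3,
                         seed=None, max_tries=1000):
    """First "E" step: give each response profile's whole count to one
    randomly chosen latent class, redrawing until every class holds at least
    `min_prevalence` of the sample. Classes are indexed 0..n_classes-1."""
    rng = random.Random(seed)
    total = sum(observed.values())
    for _ in range(max_tries):
        complete = {}
        class_totals = [0.0] * n_classes
        for profile, count in observed.items():
            chosen = rng.randrange(n_classes)
            for t in range(n_classes):
                complete[profile + (t,)] = float(count) if t == chosen else 0.0
            class_totals[chosen] += count
        if min(class_totals) / total >= min_prevalence:
            return complete
    raise RuntimeError("no random assignment met the minimum prevalence")


# Rows visible on the observed-table slide (elided rows omitted).
observed = {(0, 0, 0, 0): 15, (0, 0, 0, 1): 14, (0, 0, 1, 0): 11,
            (0, 0, 1, 1): 8, (0, 1, 0, 0): 23, (1, 1, 1, 1): 9}
start = random_initial_table(observed, seed=2006)
```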
“M” step
ods output estimates=mu;          /* save the CATMOD parameter estimates to data set MU */
proc catmod order=data;           /* keep levels in the order they appear in the data */
   weight count;                  /* COUNT holds the complete-table cell counts */
   /* WLS estimation; ADDCELL=.1 adds 0.1 to every cell (it affects WLS, not ML) */
   model a*b*c*d*x=_response_ / wls addcell=.1;
   /* main effects plus each indicator-by-latent-class association */
   loglin a b c d x a*x b*x c*x d*x;
run;
quit;
ods output close;
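Because the LOGLIN model above contains every indicator-by-X association and no associations among the indicators, its ML estimates in probability form have a closed form: P(X=t) = n_t / N and P(v=1 | X=t) = n_(v=1,t) / n_t. A minimal Python sketch of that closed form follows, offered as a check on the conversion step rather than as the author's method; the function name is hypothetical. Since the CATMOD call requests WLS with ADDCELL=.1, its estimates may differ somewhat from these ML values.

```python
def m_step(complete, n_indicators=4, n_classes=2):
    """ML estimates from a complete table keyed by (i, j, k, l, t), t = 0..n_classes-1.

    For the model {AX, BX, CX, DX} the fitted table reproduces those margins, so
    P(X=t) = n_t / N and P(v = 1 | X=t) = n_(v=1, t) / n_t.
    """
    class_totals = [0.0] * n_classes
    yes_totals = [[0.0] * n_indicators for _ in range(n_classes)]
    for key, count in complete.items():
        *profile, t = key
        class_totals[t] += count
        for v, response in enumerate(profile):
            if response == 1:
                yes_totals[t][v] += count
    n = sum(class_totals)
    prevalences = [c / n for c in class_totals]
    cond_probs = [[yes_totals[t][v] / class_totals[t] for v in range(n_indicators)]
                  for t in range(n_classes)]
    return prevalences, cond_probs


# Small hypothetical complete table.
complete = {(1, 0, 0, 0, 0): 3.0, (0, 0, 0, 0, 0): 1.0,
            (1, 1, 1, 1, 1): 2.0, (0, 1, 1, 0, 1): 2.0}
prevalences, cond_probs = m_step(complete)
print(prevalences)  # [0.5, 0.5]
print(cond_probs)   # [[0.75, 0.0, 0.0, 0.0], [0.5, 1.0, 1.0, 0.5]]
```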
“E” step
• DATA step that uses loglinear ML estimates from CATMOD
• converts loglinear estimates into LC prevalences and conditional probabilities
• calculates joint response probabilities within and summed across latent classes
• calculates “posterior probabilities”, i.e., P(X=1 | abcd)
• constructs a new complete (conditional) contingency table
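The last three bullets can be sketched in Python as follows (a stand-in for the DATA step, not the author's code). The sketch assumes the loglinear estimates have already been converted into prevalences and conditional probabilities; the function name and example values are hypothetical.

```python
def e_step(observed, prevalences, cond_probs):
    """Return posterior P(X=t | ijkl) and the new complete table n_ijklt.

    Each observed count n_ijkl is split across classes in proportion to the
    posterior, so the new table still sums back to the observed counts.
    """
    posteriors, complete = {}, {}
    for profile, count in observed.items():
        joint = []  # P(ijkl, X=t): joint response probability within class t
        for t, prev in enumerate(prevalences):
            term = prev
            for v, response in enumerate(profile):
                p_yes = cond_probs[t][v]
                term *= p_yes if response == 1 else 1.0 - p_yes
            joint.append(term)
        p_profile = sum(joint)  # summed across latent classes
        posteriors[profile] = [j / p_profile for j in joint]
        for t, post in enumerate(posteriors[profile]):
            complete[profile + (t,)] = count * post
    return posteriors, complete


observed = {(1, 0, 0, 0): 10, (0, 1, 1, 0): 5}
prevalences = [0.5, 0.5]
cond_probs = [[0.9, 0.5, 0.5, 0.5], [0.2, 0.5, 0.5, 0.5]]
posteriors, complete = e_step(observed, prevalences, cond_probs)
print(round(posteriors[(1, 0, 0, 0)][0], 4))  # 0.8182, i.e. 9/11
```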
Results of a simulation study
• simulate responses to four binary (yes/no) observed variables with known but unobservable (latent) group membership
• evaluate whether an LCA approach using CATMOD accurately recovers the true parameters
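A single simulated sample of this kind can be sketched in Python: draw each subject's latent class, then four independent binary responses given that class. This is not the author's simulation code. It uses n = 200, a class 1 prevalence of 0.5, and the slides' expected values for indicators A and C; the B and D values, the seed, and the function name are hypothetical, and the sketch holds the parameters fixed rather than reproducing how the true prevalences varied across the 1000 samples.

```python
import random
from collections import Counter


def simulate_sample(n, prevalence, cond_probs, seed=None):
    """Draw n subjects: a latent class first (class 1 with probability
    `prevalence`), then four independent binary responses given that class.
    Returns (observed table, true class index of each subject)."""
    rng = random.Random(seed)
    responses, classes = [], []
    for _ in range(n):
        t = 0 if rng.random() < prevalence else 1  # index 0 = class 1
        responses.append(tuple(1 if rng.random() < p else 0 for p in cond_probs[t]))
        classes.append(t)
    return Counter(responses), classes


# A and C use the slides' expected values; B and D are hypothetical.
cond_probs = [[0.9, 0.7, 0.1, 0.8],   # class 1
              [0.2, 0.3, 0.8, 0.2]]   # class 2
observed, true_classes = simulate_sample(200, 0.5, cond_probs, seed=31)
```

The observed table discards the true classes, which is exactly the information LCA must recover.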
Distribution of true LC prevalences from 1000 simulated samples, where n = 200 and E[P(X=1)] = 0.5
Parameter estimates from 406 successful runs using CATMOD
Distribution of conditional probabilities from 1000 simulated samples: E[P(A=1|X=1)] = 0.9, E[P(A=1|X=2)] = 0.2
Parameter estimates from CATMOD
Distribution of conditional probabilities from 1000 simulated samples: E[P(C=1|X=1)] = 0.1, E[P(C=1|X=2)] = 0.8
Parameter estimates from CATMOD
Concluding remarks
• LCA is a potentially valuable tool in clinical epidemiology for clarifying ill-defined diagnostic and prognostic classifications.
• An approach using CATMOD brings LCA closer to SAS’ analytic framework.
• In any approach to LCA, results can be sensitive to the initial estimates, so caution is required.
• The E-M loop should iterate between 3 and 40 times.
• Initial estimates for LC prevalences should be at least 0.3.
• The approach should employ replicate estimates using different starting values.
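These recommendations can be combined into one loop, sketched here in Python rather than SAS. The iteration bounds (3 and 40) and the 0.3 floor on initial prevalences come from the bullets above; the closed-form M step stands in for PROC CATMOD, and the convergence tolerance, number of starts, function names, and the rule of keeping the replicate with the highest log-likelihood are assumptions. Comparing the replicates side by side is an equally reasonable use of them.

```python
import math
import random


def joint_by_class(profile, prevalences, cond_probs):
    """P(profile, X=t) for each class t under local independence."""
    out = []
    for t, prev in enumerate(prevalences):
        term = prev
        for v, r in enumerate(profile):
            term *= cond_probs[t][v] if r == 1 else 1.0 - cond_probs[t][v]
        out.append(term)
    return out


def log_likelihood(observed, prevalences, cond_probs):
    """Observed-data log L: the sum over classes sits inside the log."""
    return sum(n * math.log(sum(joint_by_class(p, prevalences, cond_probs)))
               for p, n in observed.items())


def m_step(complete, n_classes, n_ind):
    """Closed-form ML estimates (stand-in for PROC CATMOD) from profile + (t,) keys."""
    tot = [0.0] * n_classes
    yes = [[0.0] * n_ind for _ in range(n_classes)]
    for key, n in complete.items():
        *profile, t = key
        tot[t] += n
        for v, r in enumerate(profile):
            yes[t][v] += n * r
    grand = sum(tot)
    return ([c / grand for c in tot],
            [[yes[t][v] / tot[t] for v in range(n_ind)] for t in range(n_classes)])


def fit_lca(observed, n_classes=2, n_starts=10, min_iter=3, max_iter=40,
            min_prevalence=0.3, tol=1e-6, seed=0, max_tries=1000):
    """EM from several random starts; returns (log L, prevalences, cond_probs)
    of the replicate with the highest log-likelihood."""
    observed = {p: n for p, n in observed.items() if n > 0}
    rng = random.Random(seed)
    n_ind, total = len(next(iter(observed))), sum(observed.values())
    best = None
    for start in range(n_starts):
        # 1st "E" step: each profile's whole count goes to one random class.
        for attempt in range(max_tries):
            complete, tot = {}, [0.0] * n_classes
            for p, n in observed.items():
                chosen = rng.randrange(n_classes)
                tot[chosen] += n
                for t in range(n_classes):
                    complete[p + (t,)] = float(n) if t == chosen else 0.0
            if min(tot) / total >= min_prevalence:
                break
        else:
            raise RuntimeError("no random start met the minimum prevalence")
        prev, cond = m_step(complete, n_classes, n_ind)
        ll = log_likelihood(observed, prev, cond)
        for it in range(1, max_iter + 1):
            complete = {}
            for p, n in observed.items():  # "E" step: split n_ijkl by posterior
                joint = joint_by_class(p, prev, cond)
                p_total = sum(joint)
                for t in range(n_classes):
                    complete[p + (t,)] = n * joint[t] / p_total
            prev, cond = m_step(complete, n_classes, n_ind)  # "M" step
            new_ll = log_likelihood(observed, prev, cond)
            converged = it >= min_iter and new_ll - ll < tol
            ll = new_ll
            if converged:
                break
        if best is None or ll > best[0]:
            best = (ll, prev, cond)
    return best
```

Because every start draws from the same seeded generator, running with more starts can only match or improve on the best log-likelihood from fewer starts.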
Parameter estimates from CATMOD: E[P(C=1|X=1)] = 0.1, E[P(C=1|X=2)] = 0.8
Replicated parameter estimates from CATMOD
Acknowledgements
Barbara R. Neas, Ph.D.
Willis Owen, Ph.D.
Dept. of Biostatistics and Epidemiology, OUHSC
Gary Raskob, Ph.D.
Dean, College of Public Health, OUHSC
Assumptions of LCA
• Exhaustiveness: $\pi^{ABCD}_{ijkl} = \sum_{t} \pi^{ABCDX}_{ijklt}$
• Local independence: $\pi^{ABCDX}_{ijklt} = \pi^{X}_{t}\,\pi^{ABCD|X}_{ijklt} = \pi^{A|X}_{it}\,\pi^{B|X}_{jt}\,\pi^{C|X}_{kt}\,\pi^{D|X}_{lt}\,\pi^{X}_{t}$
(Goodman’s probabilistic parameterization of a latent class model with four manifest indicators)
• Local independence (2)
$\pi^{ABCDX}_{ijklt} = \pi^{A|X}_{it}\,\pi^{B|X}_{jt}\,\pi^{C|X}_{kt}\,\pi^{D|X}_{lt}\,\pi^{X}_{t}$
$\ln \pi^{ABCDX}_{ijklt} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l} + \lambda^{X}_{t} + \lambda^{AX}_{it} + \lambda^{BX}_{jt} + \lambda^{CX}_{kt} + \lambda^{DX}_{lt}$
(Haberman’s loglinear parameterization of a latent class model with four manifest indicators)
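The “E” step's DATA step has to turn the loglinear λ estimates back into prevalences and conditional probabilities. One transparent, if brute-force, way is sketched below in Python: exponentiate the linear predictor for every cell of the complete table, normalize, and read off the margins. The array layout and function name are hypothetical, and how the rows of CATMOD's ODS Estimates table map onto these arrays (including its default effect coding, as I understand it, where a binary effect's two levels take values λ and -λ) would need to be checked against actual output.

```python
import math
from itertools import product


def loglinear_to_lc(lam_x, lam_main, lam_ix):
    """Convert loglinear parameters into LC prevalences and conditional probabilities.

    lam_x[t]:        latent class main effect, lambda^X_t
    lam_main[v][r]:  main effect of indicator v at response r = 0/1
    lam_ix[v][r][t]: indicator-by-class association, lambda^{VX}_{rt}
    The overall constant lambda cancels after normalization, so it is omitted.
    """
    n_ind, n_cls = len(lam_main), len(lam_x)
    cells = {}
    for profile in product((0, 1), repeat=n_ind):
        for t in range(n_cls):
            eta = lam_x[t] + sum(lam_main[v][r] + lam_ix[v][r][t]
                                 for v, r in enumerate(profile))
            cells[profile + (t,)] = math.exp(eta)
    total = sum(cells.values())
    prevalences = [sum(m for k, m in cells.items() if k[-1] == t) / total
                   for t in range(n_cls)]
    cond_probs = [[sum(m for k, m in cells.items() if k[-1] == t and k[v] == 1)
                   / total / prevalences[t] for v in range(n_ind)]
                  for t in range(n_cls)]
    return prevalences, cond_probs


# Hypothetical effect-coded values: each effect sums to zero over its levels.
b = 1.0
lam_x = [0.1, -0.1]
lam_main = [[-0.2, 0.2]] * 4
lam_ix = [[[-b, b], [b, -b]]] * 4
prevalences, cond_probs = loglinear_to_lc(lam_x, lam_main, lam_ix)
```

With these values, P(v=1 | X=1) = 1 / (1 + exp(-2(0.2 + b))), which gives a quick hand check on the conversion.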
EM algorithm
• A way around the difficulty of maximizing L directly when some data (X) are unobserved.
• The first “E” (expectation) step requires initial estimates, which essentially “fill in” missing information on LC membership.
• The “M” (maximization) step maximizes the likelihood for the complete but provisional data, then passes the associated parameter estimates to the next “E” step.
• Given the updated parameter estimates, the “E” step revises the expected values for cell counts n_ijklt in the complete contingency table while preserving the observed marginal counts n_ijkl.
• The next “M” step then finds new parameter estimates that maximize L.
Prognostic classification
• Professionals must classify patients even when information is limited and available only in “yes/no” form.
• Example of a challenge to prognostic classification:
   • able to ascend/descend a flight of 3 stairs?
   • positive screening test for depression?
   • spouse living at home?
   • independent in using toilet and bath?
Maximum likelihood approach to estimating LC parameters
• Cell {i,j,k,l,t} of the complete table, with count n_ijklt, contributes the factor $(\pi^{ABCDX}_{ijklt})^{n_{ijklt}}$ to the likelihood.
• The likelihood of obtaining the set of counts for all response profiles and latent classes is, up to a constant,
$L = \prod_{i}\prod_{j}\prod_{k}\prod_{l}\prod_{t} \left(\pi^{ABCDX}_{ijklt}\right)^{n_{ijklt}}$
$\log L = \sum_{i}\sum_{j}\sum_{k}\sum_{l}\sum_{t} n_{ijklt} \ln \pi^{ABCDX}_{ijklt}$
• Because LC membership (X = t) is unobserved, the counts n_ijklt are unknown and the likelihood function is complicated.
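Once a complete table is available, the log-likelihood above has no sum inside the logarithm: it separates into a term for the prevalences and one term per indicator, which is why each “M” step is easy. A minimal Python sketch of the formula follows, with a hypothetical function name and toy values.

```python
import math


def complete_log_likelihood(complete, prevalences, cond_probs):
    """log L = sum_ijklt n_ijklt * ln(pi_ijklt), with
    pi_ijklt = P(X=t) * prod_v P(v = response | X=t) under local independence.
    `complete` is keyed by (i, j, k, l, t), t = 0..n_classes-1."""
    loglik = 0.0
    for key, count in complete.items():
        if count == 0:
            continue  # an empty cell contributes 0 * ln(pi) = 0
        *profile, t = key
        pi = prevalences[t]
        for v, r in enumerate(profile):
            pi *= cond_probs[t][v] if r == 1 else 1.0 - cond_probs[t][v]
        loglik += count * math.log(pi)
    return loglik


complete = {(1, 1, 1, 1, 0): 2.0, (0, 0, 0, 0, 1): 3.0}
prevalences = [0.5, 0.5]
cond_probs = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
print(round(complete_log_likelihood(complete, prevalences, cond_probs), 4))
# -17.3287, i.e. 5 * ln(1/32)
```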