Applying Latent Profile Analysis to Classify Chicago Neighborhoods
Oksana Pugach, PhD
Institute for Health Research and Policy
University of Illinois at Chicago
December, 2017
Cluster Analysis
• Identifying groups of individuals or objects that are similar to each other but different from
those in other groups
• Cluster analysis and discriminant analysis both classify objects into categories
• In a nutshell:
– select cases
– select variables (standardize?)
– select clustering procedure
• hierarchical clustering
• k-means clustering
• two-step clustering
• "Cluster analysis" does not refer to one particular statistical method
Cluster Analysis
• Different clustering methods can give different, even conflicting, solutions. Choosing the final cluster
solution and the number of clusters is informal and subjective
• An alternative approach to clustering postulates a formal statistical model for the
population: the population is assumed to consist of subpopulations ('clusters'), in each of
which the variables have a different multivariate probability density function, resulting in a finite
mixture density for the population as a whole.
• Problem: estimate parameters of the density functions and mixing probabilities
• Calculate: posterior probability of cluster membership
• How to determine number of clusters: model selection by objective procedures
Latent Profile Analysis
• Latent profile models are commonly attributed to Lazarsfeld and Henry (1968).
• Cluster analysis methods based on finite mixture models (FMM) are also known as model-based clustering methods (Banfield & Raftery, 1993).
• FMM can be seen as a form of latent variable analysis (Skrondal & Rabe-Hesketh, 2004) in which the subpopulation is a latent categorical variable; this is also known as latent class cluster analysis.
Source: Oberski, D. (2016). Mixture Models: Latent Profile and Latent Class Analysis. In Modern Statistical Methods for HCI (pp. 275–287). Springer, Cham. https://doi.org/10.1007/978-3-319-26633-6_12
Names of different kinds of latent variable models

| Observed | Models for means: continuous latent | Models for means: discrete latent | Regression models: continuous latent | Regression models: discrete latent |
| --- | --- | --- | --- | --- |
| Continuous | Factor analysis | Latent profile analysis | Random effects | Regression mixture |
| Discrete | Item response theory | Latent class analysis | Logistic ran. eff. | Logistic reg. mix. |
Finite Mixture Densities
• Model:

$$f(\mathbf{x}; \mathbf{p}, \boldsymbol{\theta}) = \sum_{j=1}^{c} p_j \, g_j(\mathbf{x}; \boldsymbol{\theta}_j), \qquad \sum_{j=1}^{c} p_j = 1$$

• x – p-dimensional random vector
• p_j – mixing probabilities
• g_j() – component densities
• c – number of clusters
Assumption for finite mixture as model for cluster analysis: each group of observations in a
dataset comes from a population with a different probability distribution
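As a small illustration of the mixture density, the following base R sketch evaluates a univariate mixture of two normal components and checks numerically that it integrates to one. The function name `mix_density` and the parameter values are made up for illustration; they do not come from the slides.

```r
# Finite mixture density f(x) = sum_j p_j * g_j(x; theta_j),
# here with normal components g_j (illustrative values only).
mix_density <- function(x, p, mu, sigma) {
  stopifnot(abs(sum(p) - 1) < 1e-12)          # mixing probabilities sum to 1
  dens <- sapply(seq_along(p),
                 function(j) p[j] * dnorm(x, mean = mu[j], sd = sigma[j]))
  # one row per x value, one column per component
  rowSums(matrix(dens, ncol = length(p)))
}

p     <- c(0.6, 0.4)   # hypothetical mixing probabilities
mu    <- c(0, 4)       # hypothetical component means
sigma <- c(1, 1.5)     # hypothetical component standard deviations

# A valid density integrates to 1 over the real line
total <- integrate(mix_density, -Inf, Inf, p = p, mu = mu, sigma = sigma)$value
total
```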
Cluster allocation
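Having estimated the parameters of the assumed mixture density, observations can be
assigned to particular clusters on the basis of the maximum value of the estimated posterior
probability

$$\Pr(\text{cluster } j \mid \mathbf{x}_i) = \frac{\hat{p}_j \, g_j(\mathbf{x}_i; \hat{\boldsymbol{\theta}}_j)}{f(\mathbf{x}_i; \hat{\mathbf{p}}, \hat{\boldsymbol{\theta}})}, \qquad j = 1, \dots, c$$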
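The allocation rule can be sketched in a few lines of base R. The example below assumes a univariate mixture of two normals with already-estimated parameters (the values are invented for illustration), computes the posterior probabilities, and assigns each observation to the component with the largest one. The quantity `1 - max posterior` is the same notion of classification uncertainty that mclust reports.

```r
# Posterior probability of cluster membership for each observation:
# Pr(cluster j | x_i) = p_j * g_j(x_i) / f(x_i)
posterior <- function(x, p, mu, sigma) {
  num <- sapply(seq_along(p),
                function(j) p[j] * dnorm(x, mean = mu[j], sd = sigma[j]))
  num <- matrix(num, ncol = length(p))   # rows: observations, columns: clusters
  num / rowSums(num)                     # divide by the mixture density f(x_i)
}

x    <- c(-1, 1.9, 5)                    # hypothetical observations
post <- posterior(x, p = c(0.6, 0.4), mu = c(0, 4), sigma = c(1, 1))

cluster     <- max.col(post, ties.method = "first")  # maximum posterior rule
uncertainty <- 1 - apply(post, 1, max)               # 1 - largest posterior

data.frame(x, cluster, uncertainty = round(uncertainty, 3))
```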
Maximum Likelihood Estimation
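Log-likelihood:

$$l(\mathbf{p}, \boldsymbol{\theta}) = \sum_{i=1}^{n} \ln f(\mathbf{x}_i; \mathbf{p}, \boldsymbol{\theta})$$

Estimation by:
• Expectation-Maximization (EM) algorithm (usually used)
• Bayesian estimation methods using the Gibbs sampler or other MCMC methods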
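To make the EM idea concrete, here is a minimal base R sketch for a univariate mixture of two normals on simulated data. It is an illustration only, not the algorithm as implemented in any package: the starting values, the stopping rule, and the function name `em_two_normals` are all assumptions chosen for brevity.

```r
set.seed(1)
# Simulated data: two normal subpopulations (illustrative values)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1))

em_two_normals <- function(x, n_iter = 200, tol = 1e-8) {
  # Crude starting values: split the sample at its median
  p     <- c(0.5, 0.5)
  mu    <- c(mean(x[x <= median(x)]), mean(x[x > median(x)]))
  sigma <- rep(sd(x), 2)
  loglik <- numeric(0)
  for (iter in seq_len(n_iter)) {
    # E-step: posterior membership probabilities under current parameters
    dens <- cbind(p[1] * dnorm(x, mu[1], sigma[1]),
                  p[2] * dnorm(x, mu[2], sigma[2]))
    f <- rowSums(dens)                   # mixture density at each x_i
    loglik <- c(loglik, sum(log(f)))     # log-likelihood l(p, theta)
    z <- dens / f
    # M-step: weighted updates of mixing probabilities, means, and sds
    nj    <- colSums(z)
    p     <- nj / length(x)
    mu    <- colSums(z * x) / nj
    sigma <- sqrt(colSums(z * outer(x, mu, "-")^2) / nj)
    if (iter > 1 && abs(loglik[iter] - loglik[iter - 1]) < tol) break
  }
  list(p = p, mu = mu, sigma = sigma, loglik = loglik, z = z)
}

fit <- em_two_normals(x)
fit[c("p", "mu", "sigma")]
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a quick sanity check on any EM implementation.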
Maximum Likelihood Estimation for mixtures of multivariate normal
• As the number of clusters increases, the number of model parameters increases rapidly. Restrictions
on the component covariance matrices $\boldsymbol{\Sigma}_j$ can be imposed to obtain more parsimony and stability.
• Banfield & Raftery (1993) proposed reparameterizing the class-specific covariance
matrix through its eigenvalue (principal component) decomposition:

$$\boldsymbol{\Sigma}_j = \lambda_j \mathbf{D}_j \mathbf{A}_j \mathbf{D}_j^{\top}$$

Geometrical interpretation of the decomposition:
volume ($\lambda_j$), orientation ($\mathbf{D}_j$), and shape ($\mathbf{A}_j$) of cluster j
Restrictions applied can be directly interpreted in terms of the geometrical form of a cluster
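The decomposition can be reproduced for a single covariance matrix with base R's `eigen()`. This is a sketch under the usual convention that the shape matrix is scaled to have determinant 1, so that the scalar carries the volume; the example matrix is invented.

```r
# Hypothetical 2 x 2 covariance matrix for one cluster
Sigma <- matrix(c(4.0, 1.5,
                  1.5, 1.0), nrow = 2, byrow = TRUE)
d <- ncol(Sigma)

eig    <- eigen(Sigma, symmetric = TRUE)
lambda <- prod(eig$values)^(1 / d)   # volume: det(Sigma)^(1/d)
A      <- diag(eig$values / lambda)  # shape: diagonal, scaled so det(A) = 1
D      <- eig$vectors                # orientation: orthogonal eigenvectors

# Rebuild Sigma = lambda * D A D^T and compare with the original
Sigma_rebuilt <- lambda * D %*% A %*% t(D)
max(abs(Sigma - Sigma_rebuilt))
```

Constraining `lambda`, `A`, or `D` to be equal across clusters (or `A` to be the identity) gives the restricted models; for example, equal `lambda` with identity `A` corresponds to spherical clusters of equal volume.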
Parameterisations of the within-group covariance matrix for multidimensional data available in the mclust
package, and the corresponding geometric characteristics (Scrucca, Fop, Murphy, & Raftery, 2016)
Example of mixture of two normals
Other finite mixture models
• Mixture of multivariate t-distributions – robust to outliers and skewed distributions
• Mixtures for categorical data – latent class analysis.
• Multivariate Bernoulli densities with assumption that, given class, the categorical
variables are independent of each other.
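For binary items, the conditional independence assumption can be written out as follows. This is a sketch in the notation of the finite mixture slide; K (the number of binary variables) and θ_jk (the probability that variable k equals 1 in class j) are symbols introduced here for illustration.

```latex
\[
f(\mathbf{x}; \mathbf{p}, \boldsymbol{\theta})
  = \sum_{j=1}^{c} p_j \prod_{k=1}^{K}
    \theta_{jk}^{\,x_k} \, (1 - \theta_{jk})^{\,1 - x_k},
\qquad x_k \in \{0, 1\}
\]
```

The product over k is what "independent given class" means: within a class, the joint probability of a response pattern is the product of the per-variable probabilities.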
Model selection and Inference
• Log-likelihood ratio test
• Unfortunately, this does not lead to a suitable statistical test, because the regularity conditions do not hold: the null
hypothesis lies on the edge of the parameter space, and when components coincide their mixing probabilities become
unidentifiable. The test tends to overestimate the number of clusters. The parametric bootstrap is the preferred
alternative. Both are available only for nested models.
• Information theoretic approaches
• Use a measure of the information lost when a particular model is used to approximate the true model: AIC and BIC,
both penalized log-likelihoods. Smaller values are preferred under the usual definitions; mclust reports BIC with the
sign reversed, so there larger values are preferred. Both criteria depend heavily on regularity conditions, which do
not necessarily hold in FMM. Their robustness has not been studied. Using multiple criteria along with
theoretical and practical considerations is recommended.
• Bayes factors
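• The ratio of the marginal likelihoods of two models, which equals the posterior odds of one model against the other
when the prior odds are equal. Estimation requires integration to obtain the marginal likelihoods (a limitation).
• MCMC methods, e.g. reversible jump MCMC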
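For reference, the two information criteria in their usual "smaller is better" form are given below. The symbols are introduced here for illustration: the first term uses the maximized log-likelihood, k is the number of free parameters, and n is the sample size.

```latex
\[
\mathrm{AIC} = -2\, l(\hat{\mathbf{p}}, \hat{\boldsymbol{\theta}}) + 2k,
\qquad
\mathrm{BIC} = -2\, l(\hat{\mathbf{p}}, \hat{\boldsymbol{\theta}}) + k \ln n
\]
```

Because ln n exceeds 2 once n is 8 or more, BIC penalizes extra parameters more heavily than AIC in all but the smallest samples and so tends to select fewer clusters.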
Statistical Software
• R: mclust by Fraley and Raftery
• R: flexmix by Grün and Leisch
• R: CAMAN by Schlattmann
• Latent GOLD (Statistical Innovations): a powerful latent class and finite mixture
program with a very user-friendly point-and-click interface (GUI)
• Mplus by Muthén and Muthén
• gllamm in Stata
• PROC FMM in SAS (experimental)
Application
• Project: Measuring Disparities in the Chain of Survival in Latino Communities
• PI: Marina Del Rios Rivera, MD, MSc
• Funding Agency: American Heart Association (Award No. 16MCPRP30960065)
• Purpose: Explore the relationship between neighborhood-level variables (i.e., language,
educational attainment, and residential instability) and out-of-hospital cardiac arrest
(OHCA) outcomes in Hispanics.
• Data: Surveillance data prospectively submitted to the Cardiac Arrest Registry to Enhance
Survival (CARES) will be geocoded to Census Tracts.
Concentrated disadvantage: composite measure of census-tract-level socioeconomic composition in Chicago

| Characteristic | Sampson et al., 1997 | Cagney and Browning, 2004 | Current analysis, N=797, mean (sd) |
| --- | --- | --- | --- |
| Age Dependency Ratio | | | 52.90 (20.01) |
| % Unemployed | | | 14.52 (10.15) |
| % Female-headed HH | | | 20.12 (14.34) |
| Median Income HH, $1K | | | 49.66 (26.68) |
| % Vacant Housing | | | 13.90 (8.81) |
| % Below Poverty | | | 24.07 (14.65) |
| % on Public Assistance | | | 24.44 (17.66) |
| % Less Than High School | | | 18.39 (12.95) |
| % less than Age 18 | | | |
| % Black | | | |

Census tract characteristics from 2010-2014 5-year ACS estimates
    > library(mclust)
    > # fit Gaussian mixtures over mclust's default grid of component counts and covariance models
    > mod <- Mclust(mydata2[,-1])
    > summary(mod$BIC)
    Best BIC values:
                 VVV,4        VVE,6        VVV,3
    BIC      -45562.89 -45592.95763 -45606.92200
    BIC diff      0.00    -30.06785    -44.03223
    > summary(mod)
    ----------------------------------------------------
    Gaussian finite mixture model fitted by EM algorithm
    ----------------------------------------------------

    Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 4 components:

     log.likelihood   n  df       BIC       ICL
          -22183.51 797 179 -45562.89 -45638.06

    Clustering table:
      1   2   3   4
    260 331 185  21

The smallest component contains only 21 tracts, 2.6% of the 797.
BIC plot
Fitting mixture model with 3 classes

    > mod.3 <- Mclust(mydata2[,-1], G=3)
    > summary(mod.3)
    ----------------------------------------------------
    Gaussian finite mixture model fitted by EM algorithm
    ----------------------------------------------------

    Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 3 components:

     log.likelihood   n  df       BIC       ICL
          -22355.84 797 134 -45606.92 -45689.57

    Clustering table:
      1   2   3
    328 273 196
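The BIC and df values printed by `summary()` can be checked by hand. The sketch below copies the log-likelihoods and sample size from the output on the slides and assumes d = 8 variables in the mixture (the eight tract characteristics reported in the results table). Note that mclust defines BIC as 2 × log-likelihood − df × log(n), so in this output a larger (less negative) BIC indicates the preferred model.

```r
# Values copied from the summary() output shown on the slides
fits <- data.frame(
  model  = c("VVV,4", "VVV,3"),
  G      = c(4, 3),
  loglik = c(-22183.51, -22355.84),
  n      = c(797, 797)
)
d <- 8  # assumed number of variables in the mixture

# Free parameters of an unrestricted (VVV) Gaussian mixture:
# (G - 1) mixing probabilities + G*d means + G*d*(d+1)/2 covariance terms
fits$df <- with(fits, (G - 1) + G * d + G * d * (d + 1) / 2)

# mclust sign convention: larger BIC is better
fits$BIC <- with(fits, 2 * loglik - df * log(n))
fits
```

Under these assumptions the parameter counts come out at 179 and 134, matching the `df` column in the output, and the computed BIC values agree with the printed ones up to rounding of the log-likelihood.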
Mixing probabilities and mean (sd) for Census tract characteristics

| Component | Class 1 | Class 2 | Class 3 |
| --- | --- | --- | --- |
| Mixing Probabilities | 41.3% | 34.3% | 24.4% |
| Age Dependency Ratio | 49.6 (14.7) | 67.6 (15.4) | 37.9 (19.9) |
| % Less Than High School | 24.2 (14.3) | 20.4 (8.21) | 5.6 (4.48) |
| % Unemployed | 11.3 (4.27) | 24.8 (9.8) | 5.45 (2.67) |
| % Female-headed HH | 15.4 (6.71) | 35.8 (11) | 6.03 (4.02) |
| Median Income HH, $1K | 44.7 (12.5) | 29.4 (10.3) | 86.5 (22.9) |
| % Vacant Housing | 10.8 (4.65) | 21 (9.63) | 9.11 (6.44) |
| % Below Poverty | 22.6 (9.58) | 36.4 (13.6) | 9.22 (4.94) |
| % on Public Assistance | 21.1 (9.89) | 42.2 (13.6) | 5.2 (4.23) |
| Labels | poor | distressed | affluent |
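One way a table like this could be assembled from a fitted mclust object is sketched below. The slides do not show how the author computed the sd values, so treating them as square roots of the diagonal of each component covariance matrix is an assumption. Because the census-tract data are not available here, R's built-in `faithful` dataset stands in, and the mclust package is assumed to be installed.

```r
library(mclust)  # assumed to be installed

# Stand-in data: built-in 'faithful' (eruption length, waiting time)
mod <- Mclust(faithful, G = 2, modelNames = "VVV")

pro   <- mod$parameters$pro    # estimated mixing probabilities
means <- mod$parameters$mean   # variables in rows, components in columns

# Component standard deviations from the diagonal of each covariance matrix
sds <- apply(mod$parameters$variance$sigma, 3, function(S) sqrt(diag(S)))
rownames(sds) <- rownames(means)

round(rbind(mixing = pro, means), 2)
round(sds, 2)
```

Note that the mixing probabilities are model parameters and need not equal the proportions in the clustering table exactly, which is consistent with the small differences between 41.3% / 34.3% / 24.4% and the class counts 328 / 273 / 196 out of 797.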
Uncertainty plot
Contour plot of estimated mixture densities on a projection subspace

    > # dimension reduction for visualizing the fitted 3-class model
    > drmod <- MclustDR(mod.3, lambda=1)
    > summary(drmod)
    > plot(drmod, what='contour')
    > # flag tracts whose classification uncertainty exceeds 0.3 and circle them on the plot
    > miscl <- mod.3$uncertainty > 0.3
    > points(drmod$dir[miscl,], pch=1, cex=2)
    > table(miscl)
    miscl
    FALSE  TRUE
      761    36
Chicago Map
Classification by %Race
Concentrated disadvantage as continuous variable
• Calculated as the sum of components with factor loadings above 0.3, each weighted by its loading
• Mean (range) = 210.60 (37.81 – 406.15)
• Density plot
• Summary by class:

| Class | n | mean | sd | min | max |
| --- | --- | --- | --- | --- | --- |
| 1 | 328 | 203.92 | 39.08 | 114.39 | 315.19 |
| 2 | 273 | 290.47 | 50.17 | 155.18 | 406.15 |
| 3 | 196 | 110.53 | 29.66 | 37.81 | 194.17 |
Thank you!
• This work was supported by Award No. 16MCPRP30960065 from the NIH – American Heart Association and by the Methodology Research Core at IHRP, UIC.
References
• Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803–821.
• Browning, C. R., & Cagney, K. A. (2002). Neighborhood structural disadvantage, collective efficacy, and self-rated physical health in an urban setting. Journal of Health and Social Behavior, 43(4), 383–399.
• Cagney, K. A., & Browning, C. R. (2004). Exploring Neighborhood-level Variation in Asthma and other Respiratory Diseases. Journal of General Internal Medicine, 19(3), 229–236. https://doi.org/10.1111/j.1525-1497.2004.30359.x
• Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied Latent Class Analysis. New York: Cambridge University Press. Retrieved from http://ebookcentral.proquest.com/lib/uic/detail.action?docID=217833
• Oberski, D. (2016). Mixture Models: Latent Profile and Latent Class Analysis. In Modern Statistical Methods for HCI (pp. 275–287). Springer, Cham. https://doi.org/10.1007/978-3-319-26633-6_12
• Sampson, R. J., Raudenbush, S. W., & Earls, F. (1997). Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy. Science, 277(5328), 918–924. https://doi.org/10.1126/science.277.5328.918
• Scrucca, L., Fop, M., Murphy, T. B., & Raftery, A. E. (2016). mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1), 289–317.
• Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models (1st ed.). Boca Raton: Chapman and Hall/CRC.
• Everitt, B. S., Landau, S., Leese, M., et al. (n.d.). Cluster Analysis (5th ed.). Wiley. Retrieved November 30, 2017, from http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002266.html