panel data analysis: a survey on model-based clustering of time series - statswork

Panel Data Analysis: A Survey on Model-Based Clustering of Time Series

An academic presentation by

Dr. Nancy Agens, Head, Technical Operations, Statswork Group, www.statswork.com

TODAY'S DISCUSSION

Outline of Topics

In Brief

Longitudinal Data

Model Based Clustering

Example on Model Based Clustering

Dirichlet Prior

MCMC Simulation

Conclusion

In Brief

In statistical analysis, clustering is used to identify subsets (clusters) of similar observations in the data according to a specified distance measure.

We will discuss some of the methods used to model longitudinal or panel data with cluster analysis.

http://www.statswork.com/

http://www.statswork.com/services/quantitative-data-analysis/
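As a point of contrast with the model-based methods discussed later, the distance-based approach described above can be sketched as k-means clustering of equal-length series under the Euclidean distance. This is a minimal sketch under our own assumptions: k-means, the Euclidean distance, the first-k initialisation and the helper names `euclidean` and `kmeans` are illustrative choices, not a method taken from the presentation.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kmeans(series, k, iters=20):
    """Plain k-means on equal-length series (hypothetical helper).
    Centres are initialised with the first k series, for simplicity."""
    centres = [list(s) for s in series[:k]]
    labels = [0] * len(series)
    for _ in range(iters):
        # Assignment step: each series joins its nearest centre.
        labels = [min(range(k), key=lambda j: euclidean(s, centres[j])) for s in series]
        # Update step: each centre becomes the pointwise mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centres[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centres

# Four made-up series of length 4.
data = [[0, 0, 1, 1], [5, 5, 6, 6], [0, 1, 1, 1], [5, 6, 6, 6]]
labels, _ = kmeans(data, k=2)
print(labels)  # [0, 1, 0, 1]
```

Each series is reduced to a point in a T-dimensional space, so this approach needs series of equal length and ignores the ordering in time.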

Longitudinal Data

Longitudinal data consist of a sample of units whose observations are measured repeatedly over time.

Nowadays, longitudinal (repeated-measures) data, also called panel data, arise in many areas of applied statistics, such as finance, psychology, economics and the social sciences.

Most studies of such time series data have to deal with heterogeneity between the units.

The most common way of capturing this heterogeneity is to assume the presence of latent classes and to stratify each class using the covariates.

https://daviddalpiaz.github.io/appliedstats/index.html

http://statswork.com/blog/evaluation-of-autoregressive-time-series-prediction-using-validity-of-cross-validation/
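The latent-class assumption described above can be written as a finite mixture model. The sketch below uses our own notation, which does not appear in the slides: $\mathbf{y}_i$ is the time series of unit $i$, $S_i$ its latent class indicator, $\eta_k$ the class weights, $\boldsymbol{\theta}_k$ the class-specific parameters, $\mathbf{x}_{it}$ the covariates and $\boldsymbol{\vartheta}$ the collection of all weights and parameters. The first-order transition structure in the second line is one illustrative choice of class-specific model, not necessarily the one used in the presentation.

```latex
\begin{aligned}
p(\mathbf{y}_i \mid \boldsymbol{\vartheta})
  &= \sum_{k=1}^{K} \eta_k \, p(\mathbf{y}_i \mid \boldsymbol{\theta}_k),
  \qquad \eta_k = \Pr(S_i = k), \quad \sum_{k=1}^{K} \eta_k = 1, \\
p(\mathbf{y}_i \mid \boldsymbol{\theta}_k)
  &= \prod_{t=2}^{T_i} p(y_{it} \mid y_{i,t-1}, \mathbf{x}_{it}, \boldsymbol{\theta}_k),
  \qquad \mathbf{y}_i = (y_{i1}, \ldots, y_{iT_i}).
\end{aligned}
```

The second line conditions on the first observation $y_{i1}$; clustering then amounts to inferring each latent indicator $S_i$.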

Model Based Clustering

Because measuring the distance between time series is not appropriate, a model-based clustering strategy built on finite mixture models is adopted, with each series assigned to a cluster using Bayes' rule.

Model-based clustering treats each time series as a single unit that belongs to an unknown latent class.

Vermunt (2010) gives an excellent review of finite mixture models for longitudinal data, especially in psychology, biostatistics and other applied areas.

http://statswork.com/blog/model-based-clustering-using-bayesian-approach-for-binary-panel-probit-models/
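Assigning a series to a latent class with Bayes' rule means computing Pr(S_i = k | y_i), proportional to eta_k * p(y_i | theta_k), and normalising over k. The sketch below assumes, for illustration only, that each class is a first-order Markov chain over categorical states with known weights and transition matrices; the function names and all numbers are hypothetical, not estimates from the study.

```python
import math

def log_lik_markov(seq, P):
    """Log-likelihood of a categorical sequence under transition matrix P,
    conditional on the first observation."""
    return sum(math.log(P[a][b]) for a, b in zip(seq, seq[1:]))

def classify(seq, weights, matrices):
    """Posterior class probabilities Pr(S_i = k | y_i), proportional to
    weights[k] * p(y_i | matrices[k]) and normalised over k (Bayes' rule)."""
    logs = [math.log(w) + log_lik_markov(seq, P) for w, P in zip(weights, matrices)]
    m = max(logs)  # subtract the maximum before exponentiating, for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical parameters for two classes over states 0/1/2 (not study estimates).
P_never = [[0.90, 0.08, 0.02], [0.50, 0.40, 0.10], [0.40, 0.30, 0.30]]
P_user = [[0.40, 0.40, 0.20], [0.10, 0.50, 0.40], [0.05, 0.15, 0.80]]
weights = [0.5, 0.5]
p_never = classify([0, 0, 0, 0, 0], weights, [P_never, P_user])
p_user = classify([1, 2, 2, 2, 2], weights, [P_never, P_user])
print([round(p, 3) for p in p_never])  # about 0.96 for the first class
```

Working on the log scale avoids underflow when the series are long, since the likelihood is a product of many probabilities.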

Example on Model Based Clustering

The data consist of 237 teenagers whose marijuana use was recorded over the years 1976 to 1980.

Marijuana use is categorized into three levels: never, no more than once a month, and more than once a month.

The following figure shows the observed responses of a sample of 10 of the 237 teenagers.

The model used to analyze marijuana use is based on a generalized transition model.

http://statswork.com/blog/factor-analysis/

Figure: Model-based clustering
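A transition model describes how the category observed in one year depends on the category in the previous year. As a simplified, hypothetical illustration (a first-order homogeneous Markov chain without covariates, the simplest special case rather than the full generalized transition model), the transition counts and the maximum-likelihood transition matrix can be computed as follows. States are coded 0 = never, 1 = no more than once a month, 2 = more than once a month; the three five-wave sequences are made up and are not from the study.

```python
def transition_counts(seqs, n_states):
    """Count j -> l transitions pooled over all sequences."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def mle_transition_matrix(counts):
    """Row-normalise the counts (maximum-likelihood estimate).
    Rows with no observed transitions are left uniform, an arbitrary choice."""
    n = len(counts)
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total else [1.0 / n] * n)
    return matrix

# Made-up sequences over five annual waves, coded 0/1/2.
seqs = [[0, 0, 1, 2, 2], [0, 0, 0, 0, 0], [1, 1, 2, 2, 1]]
counts = transition_counts(seqs, n_states=3)
print(counts)  # [[5, 1, 0], [0, 1, 2], [0, 1, 2]]
P = mle_transition_matrix(counts)
```

Note that the estimate for the unobserved transition 0 -> 2 is exactly zero, which is one reason to place a prior on the transition probabilities.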

Dirichlet Prior

A Dirichlet prior is chosen here because the observed response variable is categorical, and the Dirichlet distribution is the conjugate prior for the probabilities of a categorical distribution.

Five different kernels (M1 to M5) are considered, each model is evaluated under the Dirichlet prior, and the results are presented in the following table.

Clustering kernels M2 to M5 indicate that there is a common pattern of behaviour in marijuana use.

If the value reported in the table is smaller than one, one may conclude that the model is overfitting; here, the kernel with H = 3 classes appears to overfit.

Table: Dirichlet Prior Distribution
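The practical appeal of the Dirichlet prior for categorical responses is conjugacy: a Dirichlet(alpha_1, ..., alpha_S) prior on the category probabilities combined with observed counts n_1, ..., n_S gives a Dirichlet(alpha_1 + n_1, ..., alpha_S + n_S) posterior. A minimal sketch, assuming a uniform Dirichlet(1, 1, 1) prior and made-up counts for one row of a transition matrix (neither is taken from the study):

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts -> Dirichlet(alpha + counts)."""
    return [a + n for a, n in zip(alpha, counts)]

def dirichlet_mean(params):
    """Mean of a Dirichlet distribution: each parameter divided by their sum."""
    total = sum(params)
    return [p / total for p in params]

alpha = [1, 1, 1]   # uniform prior over the three categories (assumption)
counts = [5, 1, 0]  # made-up transition counts out of one state
post = dirichlet_posterior(alpha, counts)
print(post)                  # [6, 2, 1]
print(dirichlet_mean(post))  # approximately [0.667, 0.222, 0.111]
```

Unlike the maximum-likelihood estimate (5/6, 1/6, 0), the posterior mean keeps a small positive probability for the category that was never observed.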

MCMC Simulation

An MCMC simulation is carried out for kernel M3 with H = 2, and the following figure shows boxplots of the sampled posterior probabilities for the male and female groups.

Comparing the likelihood value in the gender-specific table (598.5) with that in the Dirichlet prior table (596.5) shows that the stratified model-based clustering reduces to standard model-based clustering, which indicates that marijuana use is not associated with gender.

From these results, we conclude that teenagers' marijuana use may be clustered into two groups: a never-use group and a user group.

https://www.sciencedirect.com/topics/medicine-and-dentistry/model-based-clustering

https://advances.sciencemag.org/content/4/7/eaaq1360

Figure: Boxplots for MCMC Simulation

Table: Gender Specific Posterior Inference
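The MCMC step can be illustrated with a Gibbs sampler for a mixture of first-order Markov chains, which alternates between (1) drawing each cluster's transition matrix from its Dirichlet full conditional, (2) drawing the cluster weights, and (3) reallocating every series by Bayes' rule. This is a simplified sketch under our own assumptions (no covariates, no gender stratification, a Dirichlet(a0) prior on each transition row and a Dirichlet(e0) prior on the weights, hypothetical function names and made-up data), not the sampler used for M3 with H = 2. Because cluster labels can switch between draws, the output is summarised by co-clustering probabilities, which do not depend on the labels.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_j, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def gibbs_markov_clustering(seqs, K, S, iters=400, burn_in=100, e0=4.0, a0=1.0, seed=1):
    """Gibbs sampler for a K-cluster mixture of first-order Markov chains with
    S states. Returns the sampled classification vectors after burn-in."""
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in seqs]  # random initial classification
    kept = []
    for it in range(iters):
        # (1) Transition matrices: each row ~ Dirichlet(a0 + transition counts in cluster k).
        P = []
        for k in range(K):
            counts = [[0] * S for _ in range(S)]
            for seq, zi in zip(seqs, z):
                if zi == k:
                    for a, b in zip(seq, seq[1:]):
                        counts[a][b] += 1
            P.append([sample_dirichlet([a0 + c for c in row], rng) for row in counts])
        # (2) Cluster weights: eta ~ Dirichlet(e0 + cluster sizes).
        eta = sample_dirichlet([e0 + z.count(k) for k in range(K)], rng)
        # (3) Classification: weights proportional to eta_k * p(y_i | P_k), then draw S_i.
        for i, seq in enumerate(seqs):
            logs = [math.log(eta[k]) + sum(math.log(P[k][a][b]) for a, b in zip(seq, seq[1:]))
                    for k in range(K)]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            z[i] = rng.choices(range(K), weights=w)[0]
        if it >= burn_in:
            kept.append(list(z))
    return kept

def co_clustering(draws, i, j):
    """Share of retained draws in which series i and j are in the same cluster."""
    return sum(d[i] == d[j] for d in draws) / len(draws)

A = [[0, 0, 0, 0, 0] for _ in range(10)]  # made-up "never" sequences
B = [[0, 1, 2, 2, 2] for _ in range(10)]  # made-up "escalating use" sequences
draws = gibbs_markov_clustering(A + B, K=2, S=3)
print(co_clustering(draws, 0, 1))   # two "never" series
print(co_clustering(draws, 0, 10))  # a "never" and an "escalating" series
```

With this well-separated toy data, the first co-clustering probability should be close to 1 and the second close to 0.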

Conclusion

To sum up, model-based clustering combined with a Bayesian approach yields better results, since it answers some of the most troublesome problems in cluster analysis, such as selecting the number of clusters.

In longitudinal or panel data studies, the Euclidean distance may not be valid; hence a kernel-based clustering approach for time series data is considered, and the best model is selected using different information criteria.

An MCMC simulation is carried out to identify the optimal clustering model.

https://www.semanticscholar.org/paper/Panel-data-analysis%3A-a-survey-on-model-based-of-Fr%C3%BChwirth-Schnatter/c0f6f03b6f92cbd5523915845024241231317110

http://statswork.com/blog/application-of-time-series-analysis-in-financial-economics/
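Selecting among kernels or numbers of classes with information criteria can be sketched with the Bayesian information criterion, BIC = -2 log L + d log n, and the Akaike information criterion, AIC = -2 log L + 2d, where d is the number of free parameters; smaller values are preferred. The log-likelihoods below are made-up numbers for illustration only, the parameter count assumes a K-cluster mixture of 3-state Markov chains, and taking n = 237 series as the sample size is itself an assumption.

```python
import math

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion, -2 log L + d log n (smaller is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def aic(log_lik, n_params):
    """Akaike information criterion, -2 log L + 2d (smaller is better)."""
    return -2.0 * log_lik + 2.0 * n_params

def markov_mixture_params(K, S):
    """Free parameters of a K-cluster mixture of S-state first-order Markov chains:
    S * (S - 1) transition probabilities per cluster plus K - 1 weights."""
    return K * S * (S - 1) + (K - 1)

# Hypothetical maximised log-likelihoods for K = 1, 2, 3 clusters (made up).
log_liks = {1: -900.0, 2: -850.0, 3: -845.0}
n = 237  # number of series, used here as the sample size (an assumption)
scores = {K: bic(ll, markov_mixture_params(K, 3), n) for K, ll in log_liks.items()}
best_K = min(scores, key=scores.get)
print(best_K)  # 2
```

With real data, different criteria can disagree, which is one reason to compare several of them.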

CONTACT US

UNITED KINGDOM: +44-1143520021

INDIA: +91-4448137070
