Panel Data Analysis: A Survey on Model-Based Clustering of Time Series - Statswork

Panel Data Analysis: A Survey on Model-Based Clustering of Time Series. An academic presentation by Dr. Nancy Agens, Head, Technical Operations, Statswork Group. www.statswork.com

Upload: statsworkfb

Posted on 31-Jan-2020


DESCRIPTION

Clustering in statistical analysis is used to identify subsets (clusters) in the data using a specified distance measure. However, this technique cannot be applied easily to longitudinal or time-series data. In this blog, I discuss some of the methods used for modeling longitudinal or panel data with cluster analysis, as explained in Frühwirth-Schnatter (2011). Statswork offers statistical services tailored to customer requirements. Contact: Website: http://www.statswork.com/ | United Kingdom: +44-1143520021 | India: +91-4448137070 | WhatsApp: +91-8754446690

TRANSCRIPT

Page 1: Panel Data Analysis: A Survey On Model-Based Clustering Of Time Series - Statswork



TODAY'S DISCUSSION

Outline of Topics

In Brief

Longitudinal Data

Model-Based Clustering

Example on Model-Based Clustering

Dirichlet Prior

MCMC Simulation

Conclusion


In Brief

Clustering in statistical analysis is used to identify subsets (clusters) in the data using a specified distance measure.

We will discuss some of the methods used for modeling longitudinal or panel data with cluster analysis.


Longitudinal Data

Longitudinal data are a sample of observations measured repeatedly over time.

Nowadays, longitudinal (repeated-measures) or panel data arise in all areas of applied statistics, such as finance, psychology, economics and the social sciences.

Most studies are concerned with whether such time series data are homogeneous.

The most common method of capturing heterogeneity is to assume the presence of latent classes, with the classes stratified using covariates.


Model-Based Clustering

Measuring the distance between time series directly is often not appropriate, so a model-based clustering strategy built on finite mixture models is adopted, with each series assigned to a cluster using Bayes' rule.

Model-based clustering treats each time series as a single unit belonging to an unknown latent class.

An excellent review of finite mixture models for longitudinal data, especially in psychology, biostatistics and other applied areas, is given in Vermunt (2010).
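The classification step described on this slide can be illustrated with a minimal sketch. It assumes the class weights and the log-likelihood of one series under each class are already available; the function name and the numbers are hypothetical and are not taken from the presentation.

```python
import math

def posterior_class_probs(weights, log_liks):
    """Bayes' rule: P(class k | series) is proportional to weight_k * likelihood_k."""
    # Work on the log scale and subtract the maximum for numerical stability.
    log_post = [math.log(w) + ll for w, ll in zip(weights, log_liks)]
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical numbers: two classes with equal weight; the series has
# log-likelihood -10 under class 0 and -12 under class 1.
probs = posterior_class_probs([0.5, 0.5], [-10.0, -12.0])
print(probs)
```

With equal weights the posterior depends only on the likelihood ratio, so the series is assigned to class 0 with the higher probability.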


Example on Model-Based Clustering

The data cover 237 teenagers whose marijuana use was recorded over the years 1976-1980.

Marijuana use is categorized into three levels: never, not more than once a month, and more than once a month.

The following figure shows a sample of 10 observed response sequences of marijuana use among the 237 teenagers.

The model considered for analyzing marijuana use is based on a generalized transition model.
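The presentation does not spell out the exact form of the transition model. As one simple illustration, the sketch below assumes a first-order Markov chain over the three usage categories and estimates its transition matrix by counting year-to-year moves. The sequences are made up for illustration and are not the study data.

```python
def transition_matrix(sequences, n_states=3):
    """Count year-to-year transitions and normalise each row to probabilities."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # A state that is never left gets a row of zeros.
        probs.append([c / total if total else 0.0 for c in row])
    return counts, probs

# Made-up sequences of five yearly responses, coded
# 0 = never, 1 = not more than once a month, 2 = more than once a month.
toy = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 2, 2, 2],
]
counts, probs = transition_matrix(toy)
print(counts)
print(probs)
```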


Figure: Model-Based Clustering


Dirichlet Prior

A Dirichlet prior is chosen in this case because the observed response variable is categorical.

Five different kernel classes are considered, each model is evaluated under the Dirichlet prior distribution, and the results are presented in the following table.

The clustering kernels M2 to M5 show that there exists a common behaviour in marijuana usage.

If the value reported in the table is smaller than one, one may conclude that the model is overfitting; in this case, the H3 class of kernel seems to be overfitting.
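The reason a Dirichlet prior suits categorical responses is conjugacy: adding the observed category counts to the prior parameters gives the posterior parameters. The sketch below shows this for a single row of transition probabilities. The uniform prior, the counts and the helper names are assumptions for illustration, not values from the presentation.

```python
import random

def dirichlet_posterior(prior, counts):
    """Conjugate update: posterior parameters are prior parameters plus counts."""
    return [a + n for a, n in zip(prior, counts)]

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

prior = [1.0, 1.0, 1.0]   # assumed uniform prior over the three categories
row_counts = [6, 2, 0]    # hypothetical transitions out of the "never" state
post = dirichlet_posterior(prior, row_counts)
post_mean = [a / sum(post) for a in post]

rng = random.Random(0)    # fixed seed so the draw is reproducible
draw = sample_dirichlet(post, rng)
print(post, post_mean, draw)
```

The posterior mean shrinks the raw proportions toward the uniform prior, so a category with zero observed transitions still receives positive probability.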


Table: Dirichlet Prior Distribution


MCMC Simulation

An MCMC simulation is carried out for M3 with H2, and the following figure shows sample boxplots of the posterior probabilities for the male and female groups.

Comparing the likelihood values reported in the gender-specific table (598.5) and in the earlier table (596.5), the stratified model-based clustering essentially reduces to standard model-based clustering, which indicates that marijuana use is not associated with gender.

From these results, it is concluded that marijuana use among teenagers may be clustered into two groups: never-users and more frequent users.
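The presentation does not describe its sampler in detail. As a rough sketch of how MCMC can be used for model-based clustering of categorical time series, the code below runs a simple Gibbs sampler for a mixture of first-order Markov chains. It assumes symmetric Dirichlet priors, conditions on the first observation of each series, and uses made-up data; all function names are hypothetical.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def transition_counts(seq, n_states):
    """Matrix of transition counts for one series."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, curr in zip(seq, seq[1:]):
        counts[prev][curr] += 1
    return counts

def gibbs_markov_mixture(seqs, n_classes=2, n_states=3, n_iter=200,
                         prior=1.0, seed=0):
    """Return the final draw of allocations, class weights and transition matrices."""
    rng = random.Random(seed)
    counts = [transition_counts(s, n_states) for s in seqs]
    alloc = [rng.randrange(n_classes) for _ in seqs]  # random starting allocation
    for _ in range(n_iter):
        # Step 1: class weights given the current allocation.
        sizes = [alloc.count(k) for k in range(n_classes)]
        weights = sample_dirichlet([prior + n for n in sizes], rng)
        # Step 2: each row of each class transition matrix given its members.
        trans = []
        for k in range(n_classes):
            members = [c for c, a in zip(counts, alloc) if a == k]
            trans.append([
                sample_dirichlet(
                    [prior + sum(c[j][l] for c in members)
                     for l in range(n_states)], rng)
                for j in range(n_states)
            ])
        # Step 3: reallocate every series using Bayes' rule.
        for i, c in enumerate(counts):
            log_post = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for j in range(n_states):
                    for l in range(n_states):
                        if c[j][l]:
                            lp += c[j][l] * math.log(trans[k][j][l])
                log_post.append(lp)
            top = max(log_post)
            probs = [math.exp(v - top) for v in log_post]
            alloc[i] = rng.choices(range(n_classes), weights=probs)[0]
    return alloc, weights, trans

# Made-up data: three mostly "never" series and three mostly "frequent" series.
toy = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1],
       [2, 2, 2, 2, 2], [1, 2, 2, 2, 2], [2, 2, 2, 2, 2]]
alloc, weights, trans = gibbs_markov_mixture(toy)
print(alloc, weights)
```

A practical analysis would store the draws from every iteration after a burn-in period and would deal with label switching before summarising posterior probabilities; this sketch returns only the last draw to stay short.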


Figure: Boxplots for MCMC Simulation


Table: Gender Specific Posterior Inference


Conclusion

To sum up, model-based clustering combined with a Bayesian approach yields better results, since it offers answers to some of the most troublesome problems in cluster analysis.

In longitudinal or panel data studies, the use of Euclidean distance may not be valid, and hence kernel-based clustering for time series data is considered, with the best model selected using different information criteria.

An MCMC simulation is carried out to find the optimal clustering methodology.
