an overview of most common statistical packages for data analysis

55
Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages An overview of most common Statistical packages for data analysis Antonio Lucadamo Universit` a del Sannio - Italy [email protected] Workshop in Methodology of Teaching Statistics Novi Sad, December, 13 - 2011

Upload: others

Post on 12-Sep-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

An overview of most common Statistical packagesfor data analysis

Antonio Lucadamo

Universita del Sannio - [email protected]

Workshop in Methodology of Teaching Statistics

Novi Sad, December, 13 - 2011

Page 2: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

1 Introduction to Multidimensional Data Analysis

2 Multidimensional techniques

3 Statistical packages

Page 3: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Pearson (1901)

Spearman (1904)

Hotelling (1933-1936)

The absence of adequate tools has been, for many years, an obstacleto the development of these methods. They were studied only in atheoretical context. (Multivariate analysis)

1960-1970: Benzecri - Analyse des donnees (Multidimensional DataAnalysis)

Page 4: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Pearson (1901)

Spearman (1904)

Hotelling (1933-1936)

The absence of adequate tools has been, for many years, an obstacleto the development of these methods. They were studied only in atheoretical context. (Multivariate analysis)

1960-1970: Benzecri - Analyse des donnees (Multidimensional DataAnalysis)

Page 5: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Pearson (1901)

Spearman (1904)

Hotelling (1933-1936)

The absence of adequate tools has been, for many years, an obstacleto the development of these methods. They were studied only in atheoretical context. (Multivariate analysis)

1960-1970: Benzecri - Analyse des donnees (Multidimensional DataAnalysis)

Page 6: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Multivariate Analysis vs. Multidimensional Analysis

Distributional hypotheses vs. Structural hypotheses

‘The model must follow the data and not viceversa’

1980s: trade-off between the two positions.

Multidimensional analysis may be defined as a group of techniquesthat have the aim to visualize, classify and interprete the data. It tryto underline the latent structure of the data, removing the redundantinformation.

Page 7: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Multivariate Analysis vs. Multidimensional Analysis

Distributional hypotheses vs. Structural hypotheses

‘The model must follow the data and not viceversa’

1980s: trade-off between the two positions.

Multidimensional analysis may be defined as a group of techniquesthat have the aim to visualize, classify and interprete the data. It tryto underline the latent structure of the data, removing the redundantinformation.

Page 8: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Multivariate Analysis vs. Multidimensional Analysis

Distributional hypotheses vs. Structural hypotheses

‘The model must follow the data and not viceversa’

1980s: trade-off between the two positions.

Multidimensional analysis may be defined as a group of techniquesthat have the aim to visualize, classify and interprete the data. It tryto underline the latent structure of the data, removing the redundantinformation.

Page 9: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Multivariate Analysis vs. Multidimensional Analysis

Distributional hypotheses vs. Structural hypotheses

‘The model must follow the data and not viceversa’

1980s: trade-off between the two positions.

Multidimensional analysis may be defined as a group of techniquesthat have the aim to visualize, classify and interprete the data. It tryto underline the latent structure of the data, removing the redundantinformation.

Page 10: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Multivariate Analysis vs. Multidimensional Analysis

Distributional hypotheses vs. Structural hypotheses

‘The model must follow the data and not viceversa’

1980s: trade-off between the two positions.

Multidimensional analysis may be defined as a group of techniquesthat have the aim to visualize, classify and interprete the data. It tryto underline the latent structure of the data, removing the redundantinformation.

Page 11: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Principal Component Analysis

Correspondence Analysis

Discriminant Analysis

Canonical Correlation Analysis

Cluster Analysis

Page 12: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Principal Component Analysis

Hypotesis: the new factors are linear combinations of the originalvariables.Now few factors are sufficient to explain the most part of the vari-ability in the data.

Geometric interpretation

Page 13: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Principal Component Analysis

Hypotesis: the new factors are linear combinations of the originalvariables.Now few factors are sufficient to explain the most part of the vari-ability in the data.

Geometric interpretation

Page 14: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

An example

Page 15: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Page 16: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Page 17: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Correspondence Analysis

Simple Correspondence analysis is one of the most known tools forqualitative data.

It studies the relationships between the modalities of two qualitativevariables.

Page 18: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Correspondence Analysis

Simple Correspondence analysis is one of the most known tools forqualitative data.

It studies the relationships between the modalities of two qualitativevariables.

Page 19: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Correspondence Analysis

Page 20: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Multiple Correspondence Analysis

When there are more than 2 qualitative variables the Simple Cor-respondence Analysis is not possible and the relationships betweenthe characters may be studied with the Multiple CorrespondenceAnalysis.

This technique is used in economic sciences, healt science, marketinganalysis.

Page 21: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Multiple Correspondence Analysis

When there are more than 2 qualitative variables the Simple Cor-respondence Analysis is not possible and the relationships betweenthe characters may be studied with the Multiple CorrespondenceAnalysis.

This technique is used in economic sciences, healt science, marketinganalysis.

Page 22: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Multiple Correspondence Analysis

Page 23: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Discriminant analysis

Descriptive aim: verify if the prior classification is confirmed afterusing the explicative variables.

Decision aim: classify a new observation in one of the group.

Page 24: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Discriminant analysis

Descriptive aim: verify if the prior classification is confirmed afterusing the explicative variables.

Decision aim: classify a new observation in one of the group.

Page 25: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Cluster Analysis

Group of techniques that have the aim to classify observations orindividuals in clusters.

The observations in each cluster must be similar and the clustersmust be well separated.Partition and hierarchy.

Page 26: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Cluster Analysis

Page 27: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

List of some softwares

Most used softwares for Multidimensional Data Analysis

Spad

Xl-stat

Spss

S-plus

R

Pspp

Page 28: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPAD

AdvantagesIt can perform many statistical analysis:

Descriptive Statistics

Factorial Analysis

Classification

Segmentation

Textual analysis

It has good graphical tools and it is easy to use.

Page 29: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPAD

AdvantagesIt can perform many statistical analysis:

Descriptive Statistics

Factorial Analysis

Classification

Segmentation

Textual analysis

It has good graphical tools and it is easy to use.

Page 30: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPAD

DisadvantagesData importation is not direct.

It is expensive:

Price for University (single user) 1050 €Price for University (15 users) 3000 €

Price for others 21000 €

Link: http://www.coheris.fr/fr/page/produits/Spad.html

Page 31: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPAD

DisadvantagesData importation is not direct.

It is expensive:

Price for University (single user) 1050 €Price for University (15 users) 3000 €

Price for others 21000 €

Link: http://www.coheris.fr/fr/page/produits/Spad.html

Page 32: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPAD

DisadvantagesData importation is not direct.

It is expensive:

Price for University (single user) 1050 €Price for University (15 users) 3000 €

Price for others 21000 €

Link: http://www.coheris.fr/fr/page/produits/Spad.html

Page 33: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPAD

Page 34: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

AdvantagesVery easy to use, excel interface.

Possibility to download a trial version for 30 days.

It is less expensive than SPAD:

Price for single user 295 €Price for 20 users 2495 €

DisadvantagesAbout Multidimensional Analysis it has less options than SPAD.

Link: http://www.xlstat.com/en/

Page 35: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

AdvantagesVery easy to use, excel interface.

Possibility to download a trial version for 30 days.

It is less expensive than SPAD:

Price for single user 295 €Price for 20 users 2495 €

DisadvantagesAbout Multidimensional Analysis it has less options than SPAD.

Link: http://www.xlstat.com/en/

Page 36: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

AdvantagesVery easy to use, excel interface.

Possibility to download a trial version for 30 days.

It is less expensive than SPAD:

Price for single user 295 €Price for 20 users 2495 €

DisadvantagesAbout Multidimensional Analysis it has less options than SPAD.

Link: http://www.xlstat.com/en/

Page 37: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

AdvantagesVery easy to use, excel interface.

Possibility to download a trial version for 30 days.

It is less expensive than SPAD:

Price for single user 295 €Price for 20 users 2495 €

DisadvantagesAbout Multidimensional Analysis it has less options than SPAD.

Link: http://www.xlstat.com/en/

Page 38: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

AdvantagesVery easy to use, excel interface.

Possibility to download a trial version for 30 days.

It is less expensive than SPAD:

Price for single user 295 €Price for 20 users 2495 €

DisadvantagesAbout Multidimensional Analysis it has less options than SPAD.

Link: http://www.xlstat.com/en/

Page 39: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

AdvantagesVery easy to use, excel interface.

Possibility to download a trial version for 30 days.

It is less expensive than SPAD:

Price for single user 295 €Price for 20 users 2495 €

DisadvantagesAbout Multidimensional Analysis it has less options than SPAD.

Link: http://www.xlstat.com/en/

Page 40: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

Page 41: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

Page 42: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

Page 43: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

XLSTAT

Page 44: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPSS

Advantages

Easy to use, similar to excel.

It is simple to find manual or tutorial on internet that showhow to use it.

It has a large diffusion also if it is not open source.

Disadvantages

For some analyses it has less options than other packages.

Trial version only for 14 days and limited licence. (about 2000 €)

Link:http://www-01.ibm.com/software/analytics/spss/products/statistics/stats-standard/

Page 45: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPSS

Advantages

Easy to use, similar to excel.

It is simple to find manual or tutorial on internet that showhow to use it.

It has a large diffusion also if it is not open source.

Disadvantages

For some analyses it has less options than other packages.

Trial version only for 14 days and limited licence. (about 2000 €)

Link:http://www-01.ibm.com/software/analytics/spss/products/statistics/stats-standard/

Page 46: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPSS

Page 47: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

SPSS

Page 48: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

R

Advantages

Open source.

Many statistical procedures.

Availability on internet of many routines developed by users.

Disadvantages

Not user-friendly Link: http://cran.r-project.org/

Page 49: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

R

Advantages

Open source.

Many statistical procedures.

Availability on internet of many routines developed by users.

Disadvantages

Not user-friendly Link: http://cran.r-project.org/

Page 50: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

PSPP

Advantages

It is a free replacement for the proprietary program SPSS.

The copy of PSPP will not expire and there are no additionalpackages to purchase.

It is designed to perform its analyses as fast as possible,regardless of the size of the input data.

Disadvantages

Sometimes there are problems to find the right mirror for the instal-lation.

Link: http://www.gnu.org/software/pspp/

Page 51: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

PSPP

Advantages

It is a free replacement for the proprietary program SPSS.

The copy of PSPP will not expire and there are no additionalpackages to purchase.

It is designed to perform its analyses as fast as possible,regardless of the size of the input data.

Disadvantages

Sometimes there are problems to find the right mirror for the instal-lation.

Link: http://www.gnu.org/software/pspp/

Page 52: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

PSPP

Page 53: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Other softwares

Not only for multidimensional analysis...

Matlab

Stata

Eviews

Gauss

They are not open source and some of them perform only sometechniques of multidimensional analysis.

Page 54: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Open source softwares

Pr Ano Log Prob Glm Nopar Time PCA CCA CA Disc ClusAde4 * * * * * *

Dataplot * * * * * * *Easyreg * * * * * *

Gretl * * * * * * *Instat + * * * *

Macanova * * * * * * *Matrixer * * * * *Microsiris * * * * *Tanagra * * * * * *

Vista * * * *Winidams * * * * *

Page 55: An overview of most common Statistical packages for data analysis

Outline Introduction to Multidimensional Data Analysis Multidimensional techniques Statistical packages

Open source softwares

Thank you for your attention!