

Session 18

STATISTICAL MODELS

Course: D0174 / System Modeling and Simulation

Year: 2009

Learning Objectives

• Data Analysis
• Case Studies of Statistical Models

General Principles of Data Analysis

Choice of an appropriate statistical technique
• A complex issue
• Somewhat arbitrary
  • Real-life data often contain mixtures of different types of data
  • Two statisticians may select different methods depending upon what assumptions they are willing to take into account
  • Extraneous factors
    • Availability of software and its limitations
    • Availability of time and financial resources

Warnings
• Figures allow us to calculate them
• Applying different techniques and obtaining different results does not mean that something is wrong
• Looking for an answer to the same question by using several methods may lead to a better understanding
• Obtaining negative results may be as informative as getting a positive one
• Obtaining no answer by using one technique does not mean that there is no answer at all
• Etc.

The choice of a statistical technique depends essentially upon
• Characteristics of the analysis question
• Characteristics of the data
• Characteristics of the sampling design

Characteristics of the Analysis Question
• Whether or not there is a distinction between independent and dependent variables
• Whether the nature of the research problem requires
  • Description, exploration, estimation, or
  • Testing of a hypothesis or model
• Whether the focus of research is on 'variables' or 'objects'

Characteristics of the Data

Types of data sets
• Individuals-variables data sets
• Proximities data sets
  • Variable-variable proximities
  • Individual-individual proximities

Types of variables
• Continuous or quantitative variables
• Discrete or qualitative variables

Variable types by measurement level
• Nominal-scale variables
• Ordinal-scale variables
• Interval-scale variables
• Ratio-scale variables

Techniques for problems without distinction between independent and dependent variables

No. of Variables | Measurement Level | Analysis Method
One | Non-metric: nominal | Frequencies, proportions
One | Non-metric: ordinal | Median, mode
One | Non-metric: preferences (rank) | Consensus among evaluators
One | Metric: interval or ratio scale | Mean, median, mode, variance, skewness, kurtosis
Two | Non-metric: dichotomous | Cross-tabulation, chi-square
Two | Non-metric: nominal | Cross-tabulation, chi-square, correspondence analysis
Two | Non-metric: ordinal | Kendall's tau, Spearman's rho, gamma
Two | Metric: interval scale | Scatter plot, Pearson's correlation coefficient
More than two | Metric: interval scale | Principal components analysis, factor analysis, cluster analysis, multidimensional scaling
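To make the single-variable rows of this table concrete, here is a minimal Python sketch that returns only the summary statistics that are meaningful at a given measurement level. The function name `describe` and all sample data are hypothetical illustrations, not part of the original slides, and only a subset of the statistics listed in the table is computed.

```python
from collections import Counter
from statistics import mean, median, mode, pvariance


def describe(values, level):
    """Return summary statistics that make sense at the given measurement level."""
    if level == "nominal":
        # Nominal data: only counts and proportions are meaningful.
        n = len(values)
        return {"proportions": {k: c / n for k, c in Counter(values).items()}}
    if level == "ordinal":
        # Ordinal data: order-based statistics such as median and mode.
        return {"median": median(values), "mode": mode(values)}
    if level in ("interval", "ratio"):
        # Metric data: mean and variance become meaningful as well.
        return {
            "mean": mean(values),
            "median": median(values),
            "variance": pvariance(values),
        }
    raise ValueError(f"unknown measurement level: {level}")


# Hypothetical sample data, invented for illustration only.
colours = ["red", "blue", "red", "red"]   # nominal
grades = [1, 2, 2, 3, 5]                  # ordinal ranks
heights = [150.0, 160.0, 170.0]           # ratio scale

nominal_summary = describe(colours, "nominal")
ordinal_summary = describe(grades, "ordinal")
metric_summary = describe(heights, "ratio")
```

Dispatching on the measurement level, rather than computing everything for every variable, mirrors the point of the table: the scale of a variable limits which statistics can be interpreted.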

Techniques for problems with distinction between independent and dependent variables

No. of Variables (Dependent / Independent) | Measurement Level (Dependent / Independent) | Analysis Method
One / One | Nominal / Nominal | Non-parametric tests, chi-square
One / One | Nominal (dichotomous) / Nominal | Multiple classification analysis
One / One | Nominal / Nominal (dichotomous) | Wilcoxon's two-sample test, chi-square, Kolmogorov-Smirnov test
One / One | Interval-scale / Nominal (dichotomous) | t-test, analysis of variance
One / One | Interval-scale / Interval-scale | Regression analysis
One / One | Interval-scale / Nominal | Analysis of variance
One / More | Nominal / Interval-scale | Discriminant analysis
One / More | Interval-scale / Nominal | Analysis of variance, multiple regression analysis, multiple classification analysis
One / More | Interval-scale / Dummy | Analysis of variance, multiple regression analysis, multiple classification analysis
One / More | Interval-scale / Interval-scale | Multiple regression analysis
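A table like this can be encoded as a simple lookup. The Python sketch below is only an illustration: the dictionary `TECHNIQUES`, the function `suggest_techniques` and the key spelling are hypothetical names introduced here, and only a few rows of the table are included.

```python
# Keys: (number of independent variables, dependent level, independent level).
# Only a few rows of the table are encoded; the key spelling is an assumption.
TECHNIQUES = {
    ("one", "interval", "dichotomous"): ["t-test", "Analysis of Variance"],
    ("one", "interval", "interval"): ["Regression Analysis"],
    ("one", "interval", "nominal"): ["Analysis of Variance"],
    ("more", "nominal", "interval"): ["Discriminant Analysis"],
    ("more", "interval", "interval"): ["Multiple Regression Analysis"],
}


def suggest_techniques(n_independent, dependent_level, independent_level):
    """Return the candidate techniques for one row, or an empty list if not encoded."""
    key = (n_independent, dependent_level, independent_level)
    return TECHNIQUES.get(key, [])


single = suggest_techniques("one", "interval", "interval")
multiple = suggest_techniques("more", "nominal", "interval")
missing = suggest_techniques("more", "ordinal", "ordinal")
```

Returning an empty list for an unknown combination keeps the caller aware that the table gives no recommendation, instead of silently falling back to a default method.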

Usual way of statistical problem solving
• Formulate the question using the terms and logic of the specific field of the problem (science management, pedagogy, economics, etc.)
• Reformulate the question using statistical terms and logic
• Find appropriate statistical model(s) and technique(s)
• Use the selected model(s) and technique(s)
• Give a statistical interpretation to the results obtained
• Reformulate the interpretation in the terms of the original field of application

Scientific products by country

Question in research management

Research groups have multiple outputs, comprising publications, patents, experimental materials, etc. What are the differences, if any, in the performance of the research groups of selected countries?

Statistical question
• Can we construct a reasonable productivity index, using the following measures of scientific output?
  • Articles in country
  • Articles abroad
  • Original research reports
  • Patents
  • Algorithms and designs
  • Experimental material
• Can we find a significant difference by country in the productivity index?

Statistical model and technique
• Partial order scoring for constructing the index of research output
• Analysis of variance for testing the hypothesis concerning the significance of the difference
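The idea of scoring cases by a partial order can be sketched as follows. This is a simplified dominance-count variant written for illustration and is not claimed to be the formula used by the IDAMS POSCOR program. The function names, the 0 to 100 scale, the neutral value 50 for incomparable cases, and the sample data are all assumptions.

```python
def dominates(a, b):
    """True if a is at least as large as b on every measure and larger on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def partial_order_scores(cases):
    """Score each case by the share of comparable cases that it dominates (0 to 100)."""
    scores = []
    for i, a in enumerate(cases):
        others = [b for j, b in enumerate(cases) if j != i]
        below = sum(dominates(a, b) for b in others)   # cases that a dominates
        above = sum(dominates(b, a) for b in others)   # cases that dominate a
        comparable = below + above
        # A case that is comparable with no other case gets the neutral score 50.
        scores.append(100.0 * below / comparable if comparable else 50.0)
    return scores


# Hypothetical research units measured on two output counts.
units = [(0, 0), (1, 1), (2, 2), (0, 3)]
unit_scores = partial_order_scores(units)
```

The appeal of this kind of index is that it needs no weights for the individual output measures: a unit scores high only when it is at least as productive as other units on every measure.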

Use of the selected model and technique

$RUN POSCOR
$FILES
PRINT = POSCOR.LST
DICTIN = R2R3RU.DIC
DATAIN = R2RU.DAT
DICTOUT = POSCOR.DIC
DATAOUT = POSCOR.DAT
$SETUP
POSCOR SCORES OF RU OUTPUTS
BADDATA=MD1 -
   IDVAR=V2 -
   TRANSVARS=(V1)
POSCOR ORDER=DESR -
   ANAME='RU OUTPUT' -
   VARS=(V116,V118,V122,V126,V128,V130)

$RUN ONEWAY
$FILES
PRINT = ONEWAY1.LST
DICTIN = POSCOR.DIC
DATAIN = POSCOR.DAT
$SETUP
ANALYSIS OF VARIANCE OF RU OUTPUT
BADDATA=MD1 -
   PRINT=CDICT DEPVARS=(V8) CONVARS=(R1)
$RECODE
R1=RECODE V15 (40)=1, (360)=2, (410)=3, (638)=4, (844)=5, (868)=6

Use of the selected model and technique (results)

Code/Label | N | Weight-sum | % | Mean | S.D. (estim.) | Sum of X | % | Sum of X-square
1 | 334 | 334 | 22.9 | 37.731 | 35.794 | 1.26E+04 | 16.8 | 9.02E+05
2 | 239 | 239 | 16.4 | 45.213 | 35.778 | 1.08E+04 | 14.4 | 7.93E+05
3 | 200 | 200 | 13.7 | 77.585 | 27.336 | 1.55E+04 | 20.7 | 1.35E+06
4 | 225 | 225 | 15.4 | 52.547 | 35.43 | 1.18E+04 | 15.7 | 9.02E+05
5 | 233 | 233 | 16 | 36.7 | 33.266 | 8.55E+03 | 11.4 | 5.71E+05
6 | 229 | 229 | 15.7 | 69.074 | 36.255 | 1.58E+04 | 21.1 | 1.39E+06

Total sum of squares: 2048467
For 6 groups, Eta: 0.4018943
For 6 groups, Etasq: 0.161519
For 6 groups, Eta(adj): 0.3982909
For 6 groups, Etasq(adj): 0.1586357
Between means sum of squares: 330866.5
Within groups sum of squares: 1717601
F(5, 1454): 56.018
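The summary statistics reported by ONEWAY follow from the sums of squares with the standard one-way analysis of variance formulas. The Python sketch below recomputes them from the reported values; the function name `anova_summary` is hypothetical, and the total of 1460 cases is obtained by adding the six group sizes in the table.

```python
import math


def anova_summary(ss_between, ss_within, n_groups, n_cases):
    """Recompute F, eta squared and adjusted eta squared from the sums of squares."""
    ss_total = ss_between + ss_within
    df_between = n_groups - 1
    df_within = n_cases - n_groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    eta_sq = ss_between / ss_total
    # The adjusted eta squared compares mean squares instead of raw sums of squares.
    eta_sq_adj = 1.0 - ms_within / (ss_total / (n_cases - 1))
    return {
        "F": ms_between / ms_within,
        "df": (df_between, df_within),
        "eta_sq": eta_sq,
        "eta": math.sqrt(eta_sq),
        "eta_sq_adj": eta_sq_adj,
        "eta_adj": math.sqrt(eta_sq_adj),
    }


# Sums of squares as reported; 1460 is the sum of the six group sizes.
result = anova_summary(ss_between=330866.5, ss_within=1717601.0,
                       n_groups=6, n_cases=1460)
```

With these inputs the sketch gives F close to 56.02 on (5, 1454) degrees of freedom and an adjusted Eta close to 0.398, in line with the printed output.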

Statistical interpretation
• The value F(5, 1454) = 56.018 shows that there is a highly significant difference by country in the constructed performance index.
• We also see a medium-strength differentiation between the countries: Eta(adj) = 0.398.
• The mean values show the level of each country.

Interpretation for research management
There are two countries with a low, two with a medium and two with a high productivity index.

Source
P.S. Nagpaul: Guide to Advanced Data Analysis using IDAMS Software

Performance, motivation and creativity of school children

Question in psychology - pedagogy

Intellectual performance, motivation and creativity of school children can be measured by using several indicators. Some of them are produced by the children themselves (e.g. IQ tests); others are based on the evaluation given by their teachers (e.g. average grade). What are the perceivable dimensions, if any, behind these indicators?

Statistical question

In the set of the listed indicators, are there any groups within which statistical inter-correlation, and between which statistical independence, can be detected?

Indicators (C: produced by the children; T: based on teacher evaluation)
• C IQ
• C Creativity test
• C Creative attitude
• C Achievement motivation
• T Average grade
• T Creative behaviour
• T Motivated behaviour
• T Motivation index

Statistical model and technique
• Pearsonian correlation between the measured indicators
• Multidimensional scaling, cluster analysis
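The first step, a matrix of Pearson correlations between the indicators, can be sketched in plain Python. This is an illustration rather than the IDAMS PEARSON program; the function names and the indicator values are hypothetical and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical scores of five children on three indicators (invented values).
indicators = {
    "T Average grade": [3.0, 4.0, 5.0, 2.0, 4.0],
    "T Motivation index": [2.0, 4.0, 5.0, 1.0, 3.0],
    "C IQ": [100.0, 95.0, 110.0, 105.0, 90.0],
}
matrix = correlation_matrix(indicators)
```

A matrix of this kind is the usual input to multidimensional scaling and cluster analysis, where a high correlation is treated as closeness between two indicators.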

Use of the selected model and technique

Executing PEARSON, MDSCAL, CLUSFIND in IDAMS

MDSCAL result
[Figure: MDSCAL configuration of the indicators, with two groups marked: Teachers and Children]

CLUSFIND result
[Figure: CLUSFIND clustering tree. Indicators in the order shown: C IQ, C Creativity test, C Creative attitude, C Achievem. motivation, T Average grade, T Creative behaviour, T Motivated behaviour, T Motivation index. Levels marked on the tree: 0.75, 0.71, 0.40, 0.45, 0.27, 0.13, 0.02]

Statistical interpretation
• Multidimensional scaling shows clear separation of indicators produced by children and teachers
• Cluster analysis supports the finding of the separation of variables coming from teachers and children

Pedagogical/psychological interpretation
Just one aspect: ratings given by teachers to children are nearly the same, independently of the evaluated ability, attitude or behaviour dimension

Source
M. Hunya: Multidimensional statistical techniques in pedagogical studies

Data
A. Deak, B. Kozeki: Study into the effect of motivation and creativity factors on the performance of school children

Prediction of river flow values

Question in hydrology
• We have water level data on four rivers in North Africa (more than 40 years). Can the water flow level be predicted on the basis of data from the past? If so, with what precision?
• What if the average flow level is considered instead of the individual ones?

Statistical question
• Can the river flow values be predicted by using a set of values from the preceding period?
• How does the prediction change if the 6-month average flow is used?

Statistical model and technique
• Autoregression model (with a lag of 12 to 36) applied to the river flow time series
• Transformation of the original data into a time series of moving averages (interval length = 6)
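Both steps can be illustrated with a small Python sketch. Note the simplification: instead of a full autoregression on 12 to 36 lagged values, the sketch regresses each value on a single earlier value and reports the R squared of that fit. The function names and the toy series are hypothetical.

```python
def moving_average(series, window=6):
    """Return the averages of every run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def lagged_r_squared(series, lag=1):
    """R squared of a simple linear regression of x[t] on x[t - lag] (lag >= 1)."""
    x = series[:-lag]   # earlier values, used as the predictor
    y = series[lag:]    # later values, to be predicted
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Toy series, invented for illustration only.
smoothed = moving_average([1, 2, 3, 4, 5, 6, 7], window=6)
doubling_fit = lagged_r_squared([1.0, 2.0, 4.0, 8.0, 16.0], lag=1)
```

Averaging over 6 consecutive values removes much of the month-to-month variation, and consecutive averages share most of their terms, which is one reason why a smoothed series is generally much easier to predict from its own past than the raw values.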

Use of the selected model and technique

Time Series Analysis option from the IDAMS interactive facilities

Lag | Original series | Moving average series
12 months | R**2 = 0.32 | R**2 = 0.92
24 months | R**2 = 0.35 | R**2 = 0.93
36 months | R**2 = 0.36 | not reported

[Figure: original series]

[Figure: moving average series]

Statistical interpretation
• Autoregression shows that individual values can be predicted with moderate or average precision (unbiased R**2 = 0.32 to 0.36 for 12 to 36 months); high peak values are very poorly reproduced.
• In the case of a 6-month moving average, the prediction is nearly perfect (unbiased R**2 = 0.92 for 12 months).

Hydrological interpretation
• Although the pattern of changes can be reproduced fairly well, even three years of past data are not nearly enough to predict the height of peak flows.
• But if we consider 6-month averages, they can be predicted with almost full precision.

Data
UNESCO, Water Science Division

Business

Question concerning company management
• What are the factors that influence the economic performance of a company?
• Economic performance is measured by the return on capital employed.

Statistical question
• Can the return on capital be predicted by using a set of economic and production indicators from those characterizing the company?
• How does the prediction change if we are looking for a subset of best predictors?

Statistical model and technique
• Multiple linear regression
• Stepwise regression

Use of the selected model and technique

Running REGRESSN

Results
• The full regression model explains 70% of the variance of the dependent variable (adjusted). Its standard error is about one half of the mean, and the determinant of the correlation matrix is 0.79478E-05. There are 8 variables (out of 12) with high covariance ratio values.
• The stepwise regression model selects 3 variables that explain 80% of the variance. No multicollinearity (0.77647). The standard error of the estimate of the dependent variable is 0.06135, which is quite low: high reliability of estimation.
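The determinant of the predictors' correlation matrix is a common quick check for multicollinearity: it equals 1 when the predictors are uncorrelated and approaches 0 as they become linearly dependent, which is how a value such as 0.79478E-05 signals strong multicollinearity. The Python sketch below illustrates the check with a plain Gaussian elimination; the function name and the example matrices are hypothetical and are not the correlation matrices of the study.

```python
def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]   # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest absolute entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det               # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


# Hypothetical correlation matrices of two predictors.
moderate = determinant([[1.0, 0.5], [0.5, 1.0]])          # r = 0.5
near_collinear = determinant([[1.0, 0.999], [0.999, 1.0]])  # r = 0.999
```

For two predictors the determinant is simply 1 minus the squared correlation, so the second matrix gives a value near 0.002 and would be flagged as close to collinear.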

Statistical interpretation
• Full regression model: the reliability of prediction is poor. Strong multicollinearity is shown. The variables that contribute to multicollinearity can be identified.
• Stepwise regression model: 3 variables explain 80% of the variance. No multicollinearity. High reliability of estimation.

Interpretation for management
• Although the full indicator set can give a nice prediction, it cannot be recommended for real use because of the poor prediction reliability.
• But if we consider 3 carefully selected indicators, we can get a fair prediction.

Source
P.S. Nagpaul, India

Education

Question concerning measurement of knowledge level

Tests are used very often in education for checking the level of knowledge in one subject or another. Long tests with many questions can meet the reliability requirement relatively easily. The question is whether we can make a short, interactive, adaptive test from a long test while preserving, at least nearly, the original reliability.

Statistical question

Can we give a good estimate of the original test value by using a tree-structure-based prediction?

Statistical model and technique
Regression tree

Use of the selected model and technique

Running SEARCH

Results

Starting from a standardized test (for checking a specific verbal aptitude) containing 20 questions, a regression tree with 3-4 questions was obtained. The regression tree contains 10 final subgroups (leaves) with estimates for the original test value ranging from 6.4 to 59.2. The explained variance is 90.4%.
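The basic step of growing a regression tree is to find the question whose answer splits the children into two groups with the most homogeneous full-test scores. The Python sketch below shows only that single split and is not the IDAMS SEARCH algorithm; the function names, question labels and scores are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(answers, scores):
    """Return (question, explained share of variance) for the best single question.

    answers: one dict per child mapping question name to 1 (right) or 0 (wrong)
    scores:  the full-test score of each child, in the same order
    """
    total = sse(scores)
    best = None
    for question in answers[0]:
        right = [s for a, s in zip(answers, scores) if a[question] == 1]
        wrong = [s for a, s in zip(answers, scores) if a[question] == 0]
        if not right or not wrong:
            continue   # a question that everyone answers alike cannot split
        explained = 1.0 - (sse(right) + sse(wrong)) / total
        if best is None or explained > best[1]:
            best = (question, explained)
    return best


# Hypothetical answers of four children to two questions, with full-test scores.
answers = [
    {"q1": 1, "q2": 1},
    {"q1": 1, "q2": 0},
    {"q1": 0, "q2": 1},
    {"q1": 0, "q2": 0},
]
scores = [50.0, 48.0, 12.0, 10.0]
question, explained = best_split(answers, scores)
```

Applying the same search again inside each subgroup yields a tree in which every child answers only the few questions on one path, which is the idea behind the short adaptive test.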

Statistical interpretation
A very good estimate can be given for the original test value by using the obtained regression tree.

Interpretation for test designers
Using the tree structure, a computer-assisted test can be constructed that is much shorter, without losing the power of the original test.

Source
M. Hunya: Finding optimal interactive test structures (1982)


THANK YOU