quantitative statistical methods

• Faculty of Economics (Gazdaságtudományi Kar) • Institute of Economic Theory and Methodology (Gazdaságelméleti és Módszertani Intézet)

Quantitative Statistical Methods

Required Readings:

• Petra Petrovics: SPSS Tutorial and Exercise Book

• Quantitative Information Forming Methods 08.modul

(TAMOP – 4.1.2-08/1/A-2009-0049 Virtuális vállalatok [Virtual Companies]) http://elearning.infotec.hu/ilias.php?baseClass=ilSAHSPresentationGUI&ref_id=2774

Proposed Readings:

• Chris Brooks: Introductory Econometrics for Finance, Cambridge; Second Edition

• Richard A. DeFusco, CFA – Dennis W. McLeavey, CFA – Jerald E. Pinto, CFA – David E. Runkle, CFA: Quantitative Investment Analysis, CFA Series; Second Edition

Petra Petrovics

Introduction to Statistics

Statistics

Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.

• Practical activity – to analyze data

• Set of data – as a result of statistical activity

• Method

– Analyzing data

– Drawing conclusions

Data Gathering

• Trends and reports overview

• Observations

• Interview

• Focus group

• Survey

• Photo interview

Statistics

Descriptive Statistics

• Study of how data can be summarized effectively to describe the important aspects of large data sets

• It turns data into information

• Data collection & analysis

Statistical Inference

• It is used when tentative conclusions about a population are drawn on the basis of a sample

Statistical Population

• All members of a specified group (N)

• It is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population.

– Discrete population

– Continuous population (interval)

Statistical Variables

= Characteristic of a unit.

(1)

• Quantitative

• Qualitative

• Temporal

• Geographical

(2)

• Common

• Differential

Quantitative vs. Qualitative

• Quantitative data measures either how much or how many of something, i.e. a set of observations where any single observation is a number that represents an amount or a count.

• Qualitative data provide labels, or names, for categories of like items, i.e. a set of observations where any single observation is a word or code that represents a class or category.

~ categorical variable

Types of Quantitative Variables

• Continuous variables are those variables that theoretically have an infinite number of gradations between two measurements. For example, body weight of individuals, milk yield of cows or buffaloes etc. Most of the variables in biology are of continuous type.

• Discrete variables do not have continuous gradations; there is a definite gap between two measurements, i.e. they cannot be measured in fractions. For example, number of eggs laid by hens, number of children in a family etc.

Scales of measurement

from weakest to strongest

- nominal scale

- ordinal scale

- interval scale

- ratio scale

1. Nominal scale

• Numbers are labels of groups or classes

• Simple codes assigned to objects as labels

• For qualitative data, e.g. professional classification, geographic classification

• e.g. - blonde: 1, brown: 2, red: 3, black: 4

(a person with red hair does not possess more "hairness" than a person with blonde hair)

- female: 1, male: 2

2. Ordinal scale

• Data elements may be ordered according to their relative size or quality; the numbers assigned to objects or events represent the rank order (1st, 2nd, 3rd etc.)

• e.g. top lists of companies

3. Interval scale

• Distances between any two observations are meaningful

• The "zero point" is arbitrary

• Negative values can be used

• Ratios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly

• e.g. temperature with the Celsius scale

4. Ratio scale

• Strongest scale of measurement

• Distances between observations and also the ratios of distances have a meaning

• Contains a meaningful zero

• e.g. mass, length, time

a salary of $50,000 is twice as large as a salary of $25,000
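As a quick numeric illustration of the interval/ratio distinction, here is a minimal Python sketch. It assumes Kelvin as the ratio-scale counterpart of Celsius (Kelvin has a true zero); the helper name `celsius_to_kelvin` is illustrative, not taken from any library.

```python
def celsius_to_kelvin(c: float) -> float:
    """Shift a Celsius reading so that zero means 'no thermal energy' (Kelvin)."""
    return c + 273.15

# Ratio scale: zero is meaningful, so "twice as large" makes sense.
salary_ratio = 50_000 / 25_000

# Interval scale: 20 °C / 10 °C = 2, but this ratio only reflects where the
# arbitrary zero of the Celsius scale was placed.
celsius_ratio = 20.0 / 10.0
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Differences survive the change of zero point; ratios do not.
celsius_diff = 20.0 - 10.0
kelvin_diff = celsius_to_kelvin(20.0) - celsius_to_kelvin(10.0)

print(salary_ratio)             # 2.0
print(celsius_ratio)            # 2.0
print(round(kelvin_ratio, 4))   # 1.0353
print(round(kelvin_diff, 6))    # 10.0
```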

SPSS (Statistical Package for the Social Sciences)

• A computer program used for statistical analysis

• 2 files: XY.sav - Data View

XY.spo - Output

[Figure: screenshot of the SPSS Variable View with the following callouts]

• Just with upper case!!!

• It can be a longer name

• Short name; don't use spaces!!

• Number of the characters in the Data View

• Width of a column

Review of Bivariate Correlation and Regression

Types of dependence

• association – between two nominal variables

• mixed – between a nominal and a ratio variable

• correlation – between ratio variables

• X (or X1, X2, … , Xp):

known variable(s) / independent variable(s) / predictor(s)

• Y: unknown variable / dependent variable

• causal relationship: X "causes" Y to change

Correlation vs. Regression

• Correlation describes the strength of a relationship, the degree to which one variable is linearly related to another.

• Regression shows us how to determine the nature of a relationship between two or more variables.

Correlation Measures

1. Covariance

2. Coefficient of correlation

3. Coefficient of determination

4. Coefficient of rank correlation


1. Covariance

The covariance between two variables is a measure of the joint variation of the two variables

– ranges from −∞ to +∞;

– Cov = 0, when X and Y are uncorrelated;

– its sign shows the direction of correlation

– it doesn’t measure the degree of relationship!!!

$$\operatorname{Cov}(x,y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
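The covariance formula can be sketched in plain Python as follows. This is a minimal illustration, assuming the n − 1 divisor used in the worked example of this section; the toy data and the function name `covariance` are hypothetical.

```python
from statistics import mean

def covariance(x: list[float], y: list[float]) -> float:
    """Sample covariance: sum of the products of deviations, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

# Hypothetical toy data (not from this document).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

print(covariance(x, y))                     # positive: y tends to rise with x
print(covariance(x, [-v for v in y]))       # negative: direction reversed
print(covariance(x, [10 * v for v in y]))   # 10x larger, yet the relationship is the same
```

The rescaled case shows why the covariance alone does not measure the strength of a relationship: multiplying y by 10 multiplies the covariance by 10 while the relationship stays the same. On Python 3.10+, `statistics.covariance(x, y)` computes the same n − 1 sample covariance.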

2. Coefficient of correlation (Pearson)

• its sign shows the direction of correlation

• it measures the strength of correlation

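• 0 < |r| < 1 → statistical dependence

r = 0 → X and Y are uncorrelated

r = -1 → perfect negative linear relationship

r = 1 → perfect positive linear relationship

• Use it only in case of a linear relationship!

$$r = \frac{\operatorname{Cov}(x,y)}{s_x\, s_y}, \qquad -1 \le r \le 1$$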
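A minimal sketch of the Pearson coefficient, dividing the covariance by the product of the two standard deviations (all with the n − 1 divisor, an assumption matching the worked example). The function name and toy data are hypothetical; the function raises ZeroDivisionError if either variable is constant.

```python
from math import sqrt
from statistics import mean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation r = Cov(x, y) / (s_x * s_y), all with the n - 1 divisor."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

x = [1, 2, 3, 4, 5]   # hypothetical toy data
y = [2, 4, 5, 4, 6]

print(round(pearson_r(x, y), 4))                    # 0.8528
print(round(pearson_r(x, [10 * v for v in y]), 4))  # 0.8528: unaffected by rescaling
print(round(pearson_r(x, [2 * v for v in x]), 4))   # 1.0: perfect positive linear relationship
```

Unlike the covariance, r does not change when y is rescaled. On Python 3.10+, `statistics.correlation(x, y)` returns the same Pearson coefficient.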

3. Coefficient of determination

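• r²

• The square of the sample correlation coefficient between the outcomes and their predicted values.

• Measures the degree of correlation in percentage (%): the share of the variation of Y explained by the model.

• It provides a measure of how well future outcomes are likely to be predicted by the model.

• Varies from 0 to 1.

$$r^2 = 1 - \frac{S_e}{S_y}$$

where Se is the sum of squares of the errors and Sy is the total sum of squares of Y.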

Example

• A firm administers a test to sales trainees before they go into the field. The management of the firm is interested in determining the relationship between the test scores and the sales made by the trainees at the end of one year in the field. The following data were collected for 45 sales personnel who have been in the field one year.

• Calculate different correlation measures!

X (test score, independent variable) → Y (number of units sold, dependent variable)

Salesperson   Test score (xi)   Units sold (yi)   dx = xi - x̄   dy = yi - ȳ   dxdy = (xi - x̄)(yi - ȳ)
K. A.         25                188               +9            +22           +198
L. Z.         16                157               0             -9            0
B. E.         30                165               +14           -1            -14
G. P.         5                 124               -11           -42           +462
…             …                 …                 …             …             …
S. G.         10                158               -6            -8            +48
J. T.         24                224               +8            +58           +464
V. P.         17                169               +1            +3            +3
T. L.         6                 114               -10           -52           +520
Total         716               7 464             0             0             ∑dxdy = 8 894.5

Number of observed pairs: n = 45

Positive correlation

$$\bar{x} = 16, \quad s_x = 8.26 \qquad \bar{y} = 166, \quad s_y = 30.99$$

$$\operatorname{Cov}(x,y) = \frac{8\,894.5}{45-1} = 202.15$$

$$r = \frac{202.15}{8.26 \cdot 30.99} = 0.7897 \qquad r^2 = 62.36\%$$

There is a strong and positive relation between test scores and number of units sold.

The variation of test scores explains 62.36 percent of the variation of number of units sold.
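The arithmetic of the example can be re-checked with a few lines of Python. Only the summary figures quoted in the example are used, since not all 45 observations are listed; the variable names are illustrative.

```python
# Summary figures quoted in the salesperson example (n = 45 pairs).
n = 45
sum_dxdy = 8894.5   # sum of (x_i - x_bar) * (y_i - y_bar)
s_x = 8.26          # standard deviation of the test scores
s_y = 30.99         # standard deviation of the units sold

cov_xy = sum_dxdy / (n - 1)   # sample covariance
r = cov_xy / (s_x * s_y)      # Pearson correlation
r_squared = r ** 2            # coefficient of determination

print(f"Cov = {cov_xy:.2f}")      # Cov = 202.15
print(f"r   = {r:.4f}")           # r   = 0.7897
print(f"r^2 = {r_squared:.2%}")   # r^2 = 62.36%
```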

4. Coefficient of rank correlation (Spearman)

• Measure of the relationship between two ordinal variables

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}, \qquad -1 \le r_s \le 1$$

• n = number of paired observations, d = difference between the ranks for each pair of observations.

• perfect correlation → rs = 1

perfect inverse correlation → rs = -1

in case of independence → rs = 0

Example

Ten students were ranked by their mathematical and musical ability:

Student         A    B    C    D    E    F    G    H    I    J    Total
Mathematics     1    2    3    4    5    6    7    8    9    10   -
Music           3    4    1    2    5    7    10   6    8    9    -
di = xi - yi    -2   -2   2    2    0    -1   -3   2    1    1    0
di²             4    4    4    4    0    1    9    4    1    1    32

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6 \cdot 32}{10\,(10^2-1)} = 0.806$$

→ strong relationship
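A short sketch of the rank-correlation formula applied to the ranks of the ten students in the example. It assumes there are no tied ranks (with ties, this formula is only an approximation); `spearman_rho` is a hypothetical helper name.

```python
def spearman_rho(rank_x: list[int], rank_y: list[int]) -> float:
    """Spearman rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    if len(rank_x) != len(rank_y):
        raise ValueError("rank lists must have the same length")
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks of the ten students A..J from the example.
mathematics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
music = [3, 4, 1, 2, 5, 7, 10, 6, 8, 9]

rho = spearman_rho(mathematics, music)
print(round(rho, 3))   # 0.806
```

On Python 3.12+, `statistics.correlation(x, y, method="ranked")` computes Spearman's coefficient directly from the raw values.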

Simple Linear Regression Model

• We model the relationship between two variables, X and Y as a straight line.

• The model contains two parameters:

▪ an intercept parameter,

▪ a slope parameter.

Y = β0 + β1x + ε

Y = deterministic component + random error

where: Y – dependent or response variable (the variable we wish to explain or predict)

x – independent or predictor variable

ε – random error component

β0 – y-intercept of the line, i.e. the point at which the line intercepts the y-axis

β1 – slope of the line

[Figure: the regression line, marking β0 = y-intercept, β1 = slope, the deterministic component and the random error]

• y = deterministic component + random error

• We always assume that the mean value of the random error equals 0 → the mean value of y equals the deterministic component.

• It is possible to find many lines for which the sum of the errors is equal to 0, but there is one (and only one) line for which the SSE (sum of squares of the errors) is a minimum:

→ least squares line / regression line.

$$\hat{y}_i = b_0 + b_1 x_i$$

• Under the classical (Gauss–Markov) assumptions, the method of least squares gives us the best linear unbiased estimators (BLUE) of the regression parameters, β0, β1.

• The least-squares estimators:

b0 estimates β0

b1 estimates β1

• The (empirical) regression line:

$$\hat{y} = b_0 + b_1 x$$

ŷ: y caret ("hat")

• Calculation of the estimators:

$$f(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \to \min!$$

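Least Squares Method

• There is an extreme value (minimum) if the partial derivatives are equal to 0:

$$\frac{\partial f}{\partial b_0} = -2\sum (y_i - b_0 - b_1 x_i) = 0 \qquad \frac{\partial f}{\partial b_1} = -2\sum x_i (y_i - b_0 - b_1 x_i) = 0$$

• After transformation…

• The normal equations (with one explanatory variable x):

$$\sum y = n b_0 + b_1 \sum x$$

$$\sum xy = b_0 \sum x + b_1 \sum x^2$$

• The estimated regression line:

$$\hat{y} = b_0 + b_1 x$$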
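A minimal sketch of the least-squares calculation: it solves the two normal equations with Cramer's rule. The toy data and the name `fit_line` are hypothetical; any linear-equation solver would do the same job.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares estimates (b0, b1) from the two normal equations
         sum(y)  = n*b0      + b1*sum(x)
         sum(xy) = b0*sum(x) + b1*sum(x^2)
    solved with Cramer's rule."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    det = n * sum_xx - sum_x ** 2          # zero only if all x values coincide
    if det == 0:
        raise ValueError("x must contain at least two different values")
    b1 = (n * sum_xy - sum_x * sum_y) / det
    b0 = (sum_y - b1 * sum_x) / n           # from the first normal equation
    return b0, b1

x = [1, 2, 3, 4, 5]   # hypothetical toy data
y = [2, 4, 5, 4, 6]
b0, b1 = fit_line(x, y)
print(round(b0, 4), round(b1, 4))   # 1.8 0.8
print(round(b0 + b1 * 6, 4))        # predicted y at x = 6: 6.6
```

With these toy data, b1 = 0.8 means y is expected to rise by 0.8 units on average for every 1-unit increase in x, and b0 = 1.8 is the expected y at x = 0. The residuals of the fitted line sum to zero, which is exactly what the first normal equation requires.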

Interpretation

• b0: when x = 0, ŷ = b0

If X is 0, what is the expected value of Y?

• b1: for every 1-unit increase in x, we expect y to change by b1 units on average.

• If X is higher by 1, what is the difference in Y on average?

No relationship

[Figure: number of births plotted against the number of storks]

Independence

[Figure: scatter plot titled "No correlation"; fitted line Y = -0.074 + 0.208348X, R-Sq = 3.4%]

Positive correlation

[Figure: scatter plot titled "Positive correlation"; fitted line Y = -0.086 + 0.690286X, R-Sq = 62.5%]

Negative correlation

[Figure: scatter plot titled "Negative correlation"; fitted line Y = 0.0507 - 0.647872X, R-Sq = 70.9%]

Curvilinear relation

[Figure: scatter plot titled "Nonlinear correlation"; fitted curve Y = 12.0958 + 6.07684X + 1.16686X², R-Sq = 88.4%]

Scatter diagrams

[Figure: scatter diagrams showing linear and curvilinear patterns of a direct relationship (positive slope) and an inverse relationship (negative slope); axes include production (number of products per day), advertising in $, age of a house (years) and age of a car (years)]

Power regression

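$$Y = a X^b$$

$$\lg Y = \lg a + b \lg X$$

$$V = b_0 + b_1 \cdot \lg X, \qquad V = \lg Y,\; b_1 = b,\; b_0 = \lg a$$

Normal equations for the transformed variables:

$$\sum \lg y = n b_0 + b_1 \sum \lg x$$

$$\sum \lg x \lg y = b_0 \sum \lg x + b_1 \sum (\lg x)^2$$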
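A minimal sketch of the power-model fit: take base-10 logarithms and reuse the linear normal equations on (lg x, lg y). The helper names and the noise-free data are hypothetical, and all values must be positive for the logarithms to exist.

```python
import math

def fit_line(u: list[float], v: list[float]) -> tuple[float, float]:
    """Least-squares (b0, b1) for v = b0 + b1*u, from the normal equations."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(a * a for a in u)
    suv = sum(a * b for a, b in zip(u, v))
    b1 = (n * suv - su * sv) / (n * suu - su ** 2)
    b0 = (sv - b1 * su) / n
    return b0, b1

def fit_power(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit Y = a * X**b by regressing lg Y on lg X (all x and y must be positive)."""
    b0, b1 = fit_line([math.log10(v) for v in x], [math.log10(v) for v in y])
    return 10 ** b0, b1          # a = 10**b0 because b0 = lg a; b = b1

x = [1, 2, 4, 8]
y = [3 * v ** 2 for v in x]      # hypothetical noise-free data from Y = 3 * X**2
a, b = fit_power(x, y)
print(round(a, 4), round(b, 4))  # 3.0 2.0
```

Because the line is fitted on the log scale, the least-squares criterion applies to lg Y, not to Y itself.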

Compound regression

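$$Y = a b^x$$

$$\lg Y = \lg a + (\lg b)\, x$$

$$V = b_0 + b_1 \cdot x, \qquad V = \lg Y,\; b_1 = \lg b,\; b_0 = \lg a$$

Normal equations for the transformed variables:

$$\sum \lg y = n b_0 + b_1 \sum x$$

$$\sum x \lg y = b_0 \sum x + b_1 \sum x^2$$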
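The compound model can be fitted the same way, except that only y is log-transformed. Again a minimal sketch with hypothetical helper names and noise-free data; all y values must be positive.

```python
import math

def fit_line(u: list[float], v: list[float]) -> tuple[float, float]:
    """Least-squares (b0, b1) for v = b0 + b1*u, from the normal equations."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(a * a for a in u)
    suv = sum(a * b for a, b in zip(u, v))
    b1 = (n * suv - su * sv) / (n * suu - su ** 2)
    b0 = (sv - b1 * su) / n
    return b0, b1

def fit_compound(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit Y = a * b**x by regressing lg Y on x (all y must be positive)."""
    b0, b1 = fit_line(x, [math.log10(v) for v in y])
    return 10 ** b0, 10 ** b1    # b0 = lg a, b1 = lg b

x = [0, 1, 2, 3]
y = [5 * 2 ** v for v in x]      # hypothetical noise-free data from Y = 5 * 2**x
a, b = fit_compound(x, y)
print(round(a, 4), round(b, 4))  # 5.0 2.0
```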

Estimation in Regression

• Regression estimation is a technique used to replace missing values in data.

• If we know:

1. The estimated parameter value;

2. The hypothesized value of the parameter;

3. Confidence interval around the estimated parameter.

• The number of degrees of freedom equals the number of observations minus the number of parameters estimated.

• In simple linear regression (two parameters, β0 and β1): df = n - 2

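Parameter, estimated value and standard error:

• β0 is estimated by b0:

$$s(b_0) = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}$$

• β1 is estimated by b1:

$$s(b_1) = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}$$

where $s_e = \sqrt{S_e/(n-2)}$ is the standard error of the regression; a confidence interval for a parameter is $b \pm t_{1-\alpha/2}(n-2)\, s(b)$.

In case of average Y values (the expected value of Y at $x_0$):

$$s(\hat{y}_0) = s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

In case of individual Y values (a single new observation at $x_0$):

$$s(y_0 - \hat{y}_0) = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$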
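A minimal sketch of the usual standard-error formulas for simple linear regression: s(b1) = se / √Σ(xi − x̄)², s(b0) = se·√(1/n + x̄²/Σ(xi − x̄)²), and the standard errors for the mean and for an individual Y at x0. The slope is computed from the deviation form Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)², which is equivalent to solving the normal equations. The data and function name are hypothetical, and the critical value 3.182 (95 %, two-sided, df = 3) is assumed to be read from a t table rather than computed.

```python
import math

def regression_standard_errors(x, y, x0):
    """Least-squares line plus the standard errors of b0, b1,
    of the mean Y at x0 and of an individual Y at x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)                  # sum of (x_i - x_bar)^2
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s_e2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)   # se^2 = Se / (n - 2)
    return {
        "b0": b0,
        "b1": b1,
        "s_b0": math.sqrt(s_e2 * (1 / n + x_bar ** 2 / sxx)),
        "s_b1": math.sqrt(s_e2 / sxx),
        "s_mean": math.sqrt(s_e2 * (1 / n + (x0 - x_bar) ** 2 / sxx)),
        "s_indiv": math.sqrt(s_e2 * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx)),
    }

x = [1, 2, 3, 4, 5]    # hypothetical toy data
y = [2, 4, 5, 4, 6]
res = regression_standard_errors(x, y, x0=3)

t_crit = 3.182         # assumed t-table value: 95 %, two-sided, df = n - 2 = 3
ci_b1 = (res["b1"] - t_crit * res["s_b1"], res["b1"] + t_crit * res["s_b1"])
print(round(res["s_b1"], 4), round(res["s_b0"], 4))       # 0.2828 0.9381
print(round(res["s_mean"], 4), round(res["s_indiv"], 4))  # 0.4 0.9798
print(ci_b1)
```

The individual-value standard error is always larger than the one for the mean, because it adds the scatter of a single observation around the line.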

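Elasticity

$$E(y,x) = b_1 \frac{x}{\hat{y}} = \frac{b_1 x}{b_0 + b_1 x}$$

Elasticity at the mean:

$$E(\bar{y},\bar{x}) = b_1 \frac{\bar{x}}{\bar{y}}$$

E = % change in y / % change in x, i.e. the approximate percentage change in y when x increases by 1 %.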
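A minimal sketch of the elasticity of a fitted linear model, at a chosen point and at the means. The coefficients and means below are hypothetical values chosen for illustration.

```python
def elasticity(b0: float, b1: float, x: float) -> float:
    """Point elasticity of the linear model y-hat = b0 + b1*x: b1 * x / y-hat."""
    y_hat = b0 + b1 * x
    return b1 * x / y_hat

# Hypothetical coefficients and means (illustration only).
b0, b1 = 1.8, 0.8
x_bar, y_bar = 3.0, 4.2         # for a least-squares line, y_bar = b0 + b1 * x_bar

e_point = elasticity(b0, b1, 5.0)
e_mean = b1 * x_bar / y_bar     # elasticity at the mean
print(round(e_point, 4))        # 0.6897
print(round(e_mean, 4))         # 0.5714
```

Here the elasticity at the mean of about 0.57 would mean that a 1 % increase in x around its mean goes with roughly a 0.57 % increase in y.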

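Residual variable

$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$$

Squaring and summing over all observations (for the least-squares line the cross-product term sums to 0):

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

$$S_y = S_{\hat{y}} + S_e$$

Sum of squares of Y = sum of squares explained by regression + sum of squares of the errors

Analysis of Variance in Regression Analysis

Source       Sum of Squares        Df     Mean Sum of Squares     F
Regression   Sŷ = Σ(ŷi - ȳ)²        1      Sŷ / 1                  F = Sŷ / (Se / (n - 2))
Residual     Se = Σ(yi - ŷi)²       n-2    se² = Se / (n - 2)
Total        Sy = Σ(yi - ȳ)²        n-1    Sy / (n - 1)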
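The sum-of-squares decomposition and the F statistic can be sketched in Python as follows. The toy data and function name are hypothetical; the slope uses the deviation form, which gives the same estimate as the normal equations.

```python
def regression_anova(x, y):
    """Sums of squares and the F statistic for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * a for a in x]
    s_y = sum((b - y_bar) ** 2 for b in y)               # total
    s_yhat = sum((h - y_bar) ** 2 for h in y_hat)        # explained by regression
    s_e = sum((b - h) ** 2 for b, h in zip(y, y_hat))    # errors (residuals)
    f_stat = (s_yhat / 1) / (s_e / (n - 2))              # df: 1 and n - 2
    return s_y, s_yhat, s_e, f_stat

x = [1, 2, 3, 4, 5]   # hypothetical toy data
y = [2, 4, 5, 4, 6]
s_y, s_yhat, s_e, f_stat = regression_anova(x, y)
print(round(s_y, 4), round(s_yhat, 4), round(s_e, 4))   # 8.8 6.4 2.4
print(round(f_stat, 4))                                 # 8.0
print(round(s_yhat / s_y, 4))                           # r^2 = 0.7273
```

The computed F is then compared with the critical value F1−α(1; n − 2) from an F table.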

Model testing

H0: β1 = 0

H1: β1 ≠ 0 (linear model)

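Test statistic:

$$F = \frac{S_{\hat{y}} / 1}{S_e / (n-2)}$$

• The F-statistic tests whether all the slope coefficients in a linear regression are equal to 0.

• It measures how well the regression equation explains the variation in the dependent variable.

• Decision: if $F > F_{1-\alpha}(1;\, n-2)$, reject H0 and accept H1 (the linear model is significant); otherwise H0 cannot be rejected.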

Parameter testing

H0: β1 = 0

H1: β1 ≠ 0

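Test statistic:

$$t = \frac{b_1 - 0}{s(b_1)}, \qquad df = n - 2$$

where: b1 is the least squares estimate of the regression slope

s(b1) is the standard error of b1

Decision rules (significance level α):

• H1: β1 < 0 → reject H0 if $t < -t_{1-\alpha}(n-2)$

• H1: β1 ≠ 0 → reject H0 if $|t| > t_{1-\alpha/2}(n-2)$

• H1: β1 > 0 → reject H0 if $t > t_{1-\alpha}(n-2)$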
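A minimal sketch of the slope t-test for H0: β1 = 0 against the two-sided alternative. The toy data and function name are hypothetical, and the critical value 3.182 (α = 5 %, two-sided, df = 3) is assumed to come from a t table.

```python
import math

def slope_t_statistic(x, y):
    """t = (b1 - 0) / s(b1) for H0: beta1 = 0, with n - 2 degrees of freedom."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s_e2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    s_b1 = math.sqrt(s_e2 / sxx)          # standard error of b1
    return b1 / s_b1, n - 2

x = [1, 2, 3, 4, 5]   # hypothetical toy data
y = [2, 4, 5, 4, 6]
t, df = slope_t_statistic(x, y)
t_crit = 3.182        # assumed t-table value: alpha = 5 %, two-sided, df = 3
print(round(t, 4), df)                                          # 2.8284 3
print("reject H0" if abs(t) > t_crit else "do not reject H0")   # do not reject H0
```

In simple linear regression this t statistic and the ANOVA F statistic carry the same information: t² = F.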

Thanks for your attention!

strolsz@uni-miskolc.hu
