summary of introduced statistical terms and concepts


Upload: kenneth-garza

Post on 03-Jan-2016


Page 1: Summary of introduced statistical terms  and concepts

Summary of introduced statistical terms and concepts

Mean: describes/measures average conditions or the center of the sample points

Variance & standard deviation: describes/measures the spread of the sample points; deviations from the center of the sample points

Covariance & correlation: describes/measures co-dependence of variations in samples of two random variables

Slides 6, 17 updated 2014-03-31

Page 2: Summary of introduced statistical terms  and concepts

Summary of introduced statistical terms and concepts

Mean: calculated mean values are unbounded, any real number (check your values: the mean must lie between the minimum and maximum of the sample data)

Variance & standard deviation: variance values are >= 0; standard deviation values are >= 0 (check your values: the standard deviation should not exceed the minimum-maximum sample range |max(x)-min(x)|)

Covariance & correlation: covariance can be any real number; correlation lies between -1 and +1 (check your values: the correlation should never fall outside the range [-1, 1])
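The value checks listed on this slide can be sketched in R. This is a minimal illustration only: the sample vectors x and y are made-up numbers, and the names of the logical flags are assumptions, not part of class14.R.

```r
# Hypothetical sample data, used only for illustration
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)
y <- c(1.0, 2.2, 0.7, 3.9, 3.1)

# The mean must lie between the smallest and largest sample value
mean_ok <- mean(x) >= min(x) && mean(x) <= max(x)

# Variance and standard deviation are never negative
spread_ok <- var(x) >= 0 && sd(x) >= 0

# The standard deviation cannot exceed the sample range |max(x) - min(x)|
range_ok <- sd(x) <= abs(max(x) - min(x))

# Covariance may be any real number; correlation stays within [-1, 1]
cor_ok <- abs(cor(x, y)) <= 1

print(c(mean_ok = mean_ok, spread_ok = spread_ok,
        range_ok = range_ok, cor_ok = cor_ok))
```

If any flag comes out FALSE for your own data, the corresponding statistic was most likely computed incorrectly.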

Page 3: Summary of introduced statistical terms  and concepts

Linear function: y = bx + a

The value of y depends on the value of x

Δy = b*Δx (a change Δx in x changes y by Δy)

Note: I corrected the notation of the equation; please check your notes. b is the slope, a is the constant (the intercept value). R-script class14.R was updated (2014-03-25 4:30pm).

Page 4: Summary of introduced statistical terms  and concepts

Linear function: y = bx + a

The value of y depends on the value of x

Δy = b*Δx

b = Δy/Δx = 2

a = -1
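As a small worked example, the slope and intercept of this slide's line (b = 2, a = -1) can be recovered from two points on it. The x values x1 and x2 and the function name linear are arbitrary choices made for illustration.

```r
# Slope and intercept as given on the slide: b = 2, a = -1
b <- 2
a <- -1

# The linear function y = b*x + a
linear <- function(x) b * x + a

# Two arbitrary x values (hypothetical, for illustration)
x1 <- 1
x2 <- 3

delta_x <- x2 - x1
delta_y <- linear(x2) - linear(x1)

slope <- delta_y / delta_x   # ratio of the changes, recovers b
intercept <- linear(0)       # value of y at x = 0, recovers a

print(c(slope = slope, intercept = intercept))
```

Any other pair of distinct x values gives the same slope, which is what makes the function linear.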

Page 5: Summary of introduced statistical terms  and concepts

Linear function: y = bx + a

The value of y depends on the value of x

b = Δy/Δx = 0, a = 2

y = 0x + a = a

The value of y does not depend on x

Page 6: Summary of introduced statistical terms  and concepts

Note: updated slide to define y with random error

Page 7: Summary of introduced statistical terms  and concepts

?

Page 8: Summary of introduced statistical terms  and concepts

How to estimate the best fitting line?

Page 9: Summary of introduced statistical terms  and concepts

How to estimate the best fitting line?

Mathematically, we formulate this as a minimization problem: minimize the distance of the data points from the linear regression line.

Page 10: Summary of introduced statistical terms  and concepts

How to estimate the best fitting line?

Sum of Squared Errors (SSE)

The deviations from the deterministic model line are interpreted as random errors (following a Gaussian distribution)
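A minimal sketch of the sum of squared errors in R, assuming the deviations are measured vertically (in y) from the line y = b*x + a. The function name sse and the data values are hypothetical.

```r
# Sum of squared errors of the line y = b*x + a for data (x, y)
sse <- function(a, b, x, y) {
  residuals <- y - (b * x + a)   # vertical deviations from the line
  sum(residuals^2)
}

# Hypothetical data scattered around the line y = 2x - 1
x <- c(0, 1, 2, 3, 4)
y <- c(-0.8, 0.9, 3.2, 4.9, 7.1)

sse_good <- sse(a = -1, b = 2, x, y)   # line close to the data
sse_poor <- sse(a = 0, b = 1, x, y)    # line far from the data

print(c(sse_good = sse_good, sse_poor = sse_poor))
```

The better-fitting line has the smaller SSE; the best-fitting line is the one for which no other choice of a and b gives a smaller value.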

Page 11: Summary of introduced statistical terms  and concepts

How to estimate the best fitting line?

Sum of Squared Errors (SSE)

Intercept Slope

Page 12: Summary of introduced statistical terms  and concepts

How to estimate the best fitting line?

Sum of Squared Errors (SSE)

Note: Many textbooks and statisticians would prefer to distinguish the estimated values from the actual true (but unknown) parameter values using a different symbol. Or they use Greek letters for the true values, and Latin letters for the estimates.


Sample mean of x and y
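The slide's equation did not survive in this transcript. As a hedged reconstruction, the standard least-squares result is shown here, with hats marking estimated values as the note describes. The hat and bar symbols are assumed notation, not necessarily what the slide used.

```latex
\[
\mathrm{SSE}(a,b) = \sum_{i=1}^{n} \bigl( y_i - (b\,x_i + a) \bigr)^2
\]
\[
\frac{\partial\,\mathrm{SSE}}{\partial a} = 0
\quad\Longrightarrow\quad
\hat{a} = \bar{y} - \hat{b}\,\bar{x},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
\]
```

In words: once the slope estimate is known, the estimated intercept places the line through the point given by the sample means of x and y.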

Page 13: Summary of introduced statistical terms  and concepts

How to estimate the best fitting line?

Sum of Squared Errors (SSE)

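\[
b = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\mathrm{COV}(x,y)}{\mathrm{VAR}(x)}
\]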

Page 14: Summary of introduced statistical terms  and concepts

How to estimate the best fitting line?

Sum of Squared Errors (SSE)

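\[
b = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}
\]

Slope of the regression line:

b = correlation coefficient * standard deviation (y) / standard deviation (x)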
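The step from covariance over variance to the correlation form of the slope can be written out as follows. The symbols r_{xy}, s_x and s_y (correlation coefficient and standard deviations) are assumed notation introduced for this sketch.

```latex
\[
r_{xy} = \frac{\mathrm{COV}(x,y)}{s_x\, s_y},
\qquad
\mathrm{VAR}(x) = s_x^2
\]
\[
b = \frac{\mathrm{COV}(x,y)}{\mathrm{VAR}(x)}
  = \frac{r_{xy}\, s_x\, s_y}{s_x^2}
  = r_{xy}\,\frac{s_y}{s_x}
\]
```

The slope therefore has the same sign as the correlation coefficient, and it is zero exactly when the correlation is zero.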

Page 15: Summary of introduced statistical terms  and concepts

How to estimate the best fitting line?

Linear relationship with errors: y = bx + a + ε

The value of y depends on the value of x plus a random error

Estimated Regression line

Page 16: Summary of introduced statistical terms  and concepts

Class exercises:

download script class14.R:

source("class14.R")

(1) Change the linear parameters to get steeper or flatter slopes.
(2) Change the slope to negative (the line runs from top left to bottom right).
(3) Observe how the correlation coefficient changes.
(4) Change the error variance and observe how it affects the correlation and the fit of the line.
(5) In each case, watch where the line intersects the y-axis.
(6) Change the intercept parameter (intersection with the y-axis). What is the effect on the correlation?
(7) Find a way to change the sample size of the scatter points and repeat (1)-(6).
(8) Set the slope parameter closer to 0, and eventually to 0, then change the variance of the errors. What happens to the correlation?
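For readers without access to class14.R, the exercises can be approximated with a small simulation. This is a sketch under stated assumptions: the helper simulate_fit, its arguments, the uniform range of x and all parameter values are hypothetical and are not taken from class14.R.

```r
# Hypothetical helper: simulate y = b*x + a + error and report the
# correlation and the fitted slope. Not taken from class14.R.
simulate_fit <- function(b, a, error_sd, n = 200, seed = 1) {
  set.seed(seed)                        # same random numbers in every call
  x <- runif(n, min = 0, max = 10)
  e <- rnorm(n, mean = 0, sd = error_sd)
  yobs <- b * x + a + e
  c(correlation = cor(x, yobs),
    slope = cor(x, yobs) * sd(yobs) / sd(x))
}

steep    <- simulate_fit(b =  2, a = -1, error_sd = 1)    # exercise (1)
negative <- simulate_fit(b = -2, a = -1, error_sd = 1)    # exercise (2)
noisy    <- simulate_fit(b =  2, a = -1, error_sd = 10)   # exercise (4)
shifted  <- simulate_fit(b =  2, a =  5, error_sd = 1)    # exercise (6)
flat     <- simulate_fit(b =  0, a = -1, error_sd = 1)    # exercise (8)

print(rbind(steep, negative, noisy, shifted, flat))
```

Changing n in the call covers exercise (7). Because the seed is fixed, differences between rows come only from the changed parameter, which makes the effect of each exercise easier to see.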

Page 17: Summary of introduced statistical terms  and concepts

04/20/23

Note: In R-scripts the variable names have a slightly different notation:

As you can see in class14.R, we use ‘yobs’ for the variable y containing the random error ‘e’.

The estimator for the slope, ‘bfit’, is calculated using the correlation coefficient ‘cor(x,yobs)’ and the standard deviations ‘sd(yobs)’ and ‘sd(x)’.

The equation on slide 14 includes the error.
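The naming described in this note can be illustrated with a short sketch. Only the names x, yobs, e and bfit come from the note; the parameter values, the names afit and bcov, and the comparison with lm() are assumptions added for illustration, and class14.R itself may differ.

```r
set.seed(42)   # fixed seed so the sketch is repeatable

# Hypothetical parameter values; class14.R may use different ones
n <- 100
b <- 2
a <- -1

x    <- runif(n, min = 0, max = 10)
e    <- rnorm(n, mean = 0, sd = 2)   # random error
yobs <- b * x + a + e                # y including the random error

# Slope estimator as described in the note above
bfit <- cor(x, yobs) * sd(yobs) / sd(x)

# Intercept from the sample means ('afit' is a hypothetical name)
afit <- mean(yobs) - bfit * mean(x)

# Cross-checks: covariance over variance, and R's built-in lm()
bcov <- cov(x, yobs) / var(x)
fit  <- lm(yobs ~ x)

print(c(bfit = bfit, bcov = bcov, lm_slope = unname(coef(fit)["x"])))
print(c(afit = afit, lm_intercept = unname(coef(fit)["(Intercept)"])))
```

All three slope values agree up to floating-point rounding, which is a convenient way to check a hand-written estimator against lm().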