
Post on 19-Dec-2015


Dr. Héctor Allende Review of Probability and Statistics

1

A Review of Probability and Statistics

• Descriptive statistics

• Probability

• Random variables

• Sampling distributions

• Estimation and confidence intervals

• Tests of Hypothesis

– For means, variances, and proportions

– Goodness of fit

2

Key Concepts

• Population -- "parameters"

–Finite

–Infinite

• Sample -- "statistics"

• Random samples - Your MOST important decision!

3

Data

• Deterministic vs. Probabilistic (Stochastic)

• Discrete or Continuous:

– Whether a variable is continuous (measured) or discrete (counted) is a property of the data, not of the measuring device: weight is a continuous variable, even if your scale can only measure values to the pound.

• Data description:

– Category frequency

– Category relative frequency

4

Data Types

• Qualitative (Categorical)

–Nominal -- IE = 1 ; EE = 2 ; CE = 3

–Ordinal -- poor = 1 ; fair = 2 ; good = 3 ; excellent = 4

• Quantitative (Numerical)

–Interval -- temperature, viscosity

–Ratio -- weight, height

• The type of statistics you can calculate depends on the data type. Means and variances make no sense for categorical data (proportions do); a median is meaningful only when the categories are ordered (ordinal data).

5

Data Presentation for Qualitative Data

• Rules:

– Each observation MUST fall in one and only one category.

– All observations must be accounted for.

• Table -- Provides greater detail

• Bar graphs -- Consider Pareto presentation!

• Pie charts (do not need to be round)

6

Data Presentation for Quantitative Data

• Consider a Stem-and-Leaf Display

• Use 5 to 20 classes (intervals, groups).

–Cell width, boundaries, limits, and midpoint

• Histograms

–Discrete

–Continuous (frequency polygon - plot at class mark)

• Cumulative frequency distribution (Ogive - plot at upper boundary)

7

Statistics

• Measures of Central Tendency

– Arithmetic Mean

– Median

– Mode

– Weighted mean

• Measures of Variation

– Range

– Variance

– Standard Deviation

• Coefficient of Variation

• The Empirical Rule

8

Arithmetic Mean and Variance -- Raw Data

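• Mean

$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

• Variance

$S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \frac{n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n - 1)}$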
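The raw-data mean and the two equivalent forms of the sample variance (definition and computational shortcut, both with divisor n - 1) can be checked numerically. The following is a minimal Python sketch; the function names and the data values are made up for illustration.

```python
def sample_mean(ys):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(ys) / len(ys)


def sample_variance_definition(ys):
    """S^2 from the definition: squared deviations from the mean over n - 1."""
    n = len(ys)
    ybar = sample_mean(ys)
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)


def sample_variance_shortcut(ys):
    """S^2 from the computational form [n*sum(y^2) - (sum y)^2] / [n(n - 1)]."""
    n = len(ys)
    return (n * sum(y * y for y in ys) - sum(ys) ** 2) / (n * (n - 1))


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical observations
mean = sample_mean(data)
var_def = sample_variance_definition(data)
var_short = sample_variance_shortcut(data)
print(mean, var_def, var_short)
```

Both variance functions should agree up to floating-point rounding; the shortcut form is convenient by hand but can lose precision when the mean is large relative to the spread.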

9

Arithmetic Mean and Variance -- Grouped Data

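• Mean

$\bar{y} = \frac{\sum_{i} f_i y_i}{n}$

• Variance

$S^2 = \frac{\sum_{i} f_i (y_i - \bar{y})^2}{n - 1} = \frac{n \sum_{i} f_i y_i^2 - \left(\sum_{i} f_i y_i\right)^2}{n(n - 1)}$

where $n = \sum_{i} f_i$ and $y_i$ = class midpoint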
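For grouped data the same calculation uses class midpoints weighted by class frequencies. A small Python sketch under that assumption follows; the midpoints and frequencies are invented for the example.

```python
def grouped_mean_variance(midpoints, freqs):
    """Mean and variance from class midpoints y_i and class frequencies f_i."""
    n = sum(freqs)                                            # n = sum of f_i
    sum_fy = sum(f * y for f, y in zip(freqs, midpoints))     # sum of f_i * y_i
    sum_fy2 = sum(f * y * y for f, y in zip(freqs, midpoints))  # sum of f_i * y_i^2
    mean = sum_fy / n
    variance = (n * sum_fy2 - sum_fy ** 2) / (n * (n - 1))
    return mean, variance


midpoints = [5.0, 15.0, 25.0]  # hypothetical class midpoints
freqs = [2, 5, 3]              # hypothetical class frequencies
g_mean, g_var = grouped_mean_variance(midpoints, freqs)
print(g_mean, g_var)
```

The result is only an approximation to the raw-data statistics, since every observation in a class is treated as if it sat at the class midpoint.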

10

Percentiles and Box-Plots

• 100pth percentile: value such that 100p% of the area under the relative frequency distribution lies below it.

– Q1: lower quartile (25th percentile)

– Q3: upper quartile (75th percentile)

• Box-Plots: the box is bounded by the lower and upper quartiles

– Whiskers mark the lowest and highest values within 1.5*IQR of Q1 or Q3, where IQR = Q3 - Q1

– Outliers: beyond 1.5*IQR from Q1 or Q3 (mark with *)

– z-scores: deviation from the mean in units of standard deviation. Outlier: absolute value of z-score > 3
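The two outlier rules above can be compared on a small data set. This is a sketch using Python's `statistics` module; the data are made up, and `statistics.quantiles` uses one particular quartile convention (its default "exclusive" method), so other software may report slightly different quartiles.

```python
import statistics

data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 40]  # hypothetical sample

# Quartiles and the 1.5*IQR fences used for box-plot whiskers.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
iqr_outliers = [y for y in data if y < lower_fence or y > upper_fence]

# z-scores: deviation from the mean in units of the sample standard deviation.
mean = statistics.mean(data)
sd = statistics.stdev(data)
z_scores = [(y - mean) / sd for y in data]
z_outliers = [y for y, z in zip(data, z_scores) if abs(z) > 3]

print(q1, q3, iqr_outliers, z_outliers)
```

In this example the IQR rule flags the value 40 while the z-score rule does not, because the extreme value itself inflates the standard deviation.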

11

Probability: Basic Concepts

• Experiment: A process of OBSERVATION

• Simple event - An OUTCOME of an experiment that can not be decomposed

– “Mutually exclusive”

– “Equally likely”

• Sample Space - The set of all possible outcomes

• Event “A” - The set of all possible simple events that result in the outcome “A”

12

Probability

• A measure of the uncertainty of an estimate

– The reliability of an inference

• Theoretical approach - “A Priori”

– Pr (Ai) = n/N

• n = number of possible ways “Ai” can be observed

• N = total number of possible outcomes

• Historical (empirical) approach - “A Posteriori”

– Pr (Ai) = n/N

• n = number of times “Ai” was observed

• N = total number of observations

• Subjective approach – An “Expert Opinion”

13

Probability Rules

$0 \le \Pr(A_i) \le 1$

$\sum_{i} \Pr(A_i) = 1$ (summing over all simple events $A_i$ in the sample space)

• Multiplication Rule:

– Number of ways to draw one element from set 1, which contains n1 elements, then an element from set 2, ..., and finally an element from set k (ORDER IS IMPORTANT!):

n1 * n2 * ... * nk

14

Permutations and Combinations

• Permutations:

– Number of ways to draw r out of n elements WHEN ORDER IS IMPORTANT:

$P_r^n = \frac{n!}{(n - r)!}$

• Combinations:

– Number of ways to select r out of n items when order is NOT important:

$C_r^n = \frac{n!}{r!\,(n - r)!}$
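The two counting formulas can be verified against Python's built-in `math.perm` and `math.comb` (available from Python 3.8). The values of n and r below are arbitrary.

```python
import math

n, r = 5, 3  # hypothetical: draw 3 items out of 5

# Direct evaluation of the factorial formulas.
perm_by_formula = math.factorial(n) // math.factorial(n - r)
comb_by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Standard-library equivalents.
perm_builtin = math.perm(n, r)
comb_builtin = math.comb(n, r)

print(perm_by_formula, comb_by_formula)
```

Each combination of r items corresponds to r! orderings, which is why the permutation count equals the combination count multiplied by r!.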

15

Compound Events

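Union

$(A \cup B) = \{x \mid x \in A \text{ or } x \in B \text{ or both}\}$

Intersection

$(A \cap B) = \{x \mid x \in A \text{ and } x \in B\}$

Complement

$(A') = \{x \mid x \notin A\}$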

16

Conditional Probability

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ provided $P(B) > 0$

Multiplicative Rule:

$P(A \cap B) = P(A \mid B)\,P(B)$ provided $P(B) > 0$

17

Other Probability Rules

• Addition Rule:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

• Mutually Exclusive Events: $A \cap B = \emptyset$, so that

$P(A \cap B) = 0$

• Independence:

– A and B are said to be statistically INDEPENDENT if and only if:

$P(A \cap B) = P(A)\,P(B)$

18

Bayes’ Rule

$P(A_i \mid E) = \frac{P(A_i)\,P(E \mid A_i)}{\sum_{j} P(A_j)\,P(E \mid A_j)}$
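A worked instance of Bayes' Rule may help. In this sketch the scenario (two machines producing parts, with E = "part is defective") and all probabilities are hypothetical; the function name is also just an illustration.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(A_i | E) for each i, given P(A_i) and P(E | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i) * P(E | A_i)
    total = sum(joint)                                    # P(E), the denominator
    return [j / total for j in joint]


priors = [0.6, 0.4]         # hypothetical P(A_1), P(A_2): share of output per machine
likelihoods = [0.02, 0.05]  # hypothetical P(E | A_1), P(E | A_2): defect rates
posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors)
```

Here machine 2 produces fewer parts but is more likely to be the source of a defective one, since 0.4 * 0.05 exceeds 0.6 * 0.02.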

19

Random Variables

• Random variable: A function that maps every possible outcome of an experiment into a numerical value.

• Discrete random variable: The function can assume only a finite or countably infinite number of values (e.g., a Poisson count).

• Continuous random variable: The function can assume any value between two limits.

20

Probability Distribution for a Discrete Random Variable

• Function that assigns a probability p(y) to each possible value y of the random variable.

$0 \le p(y) \le 1$

$\sum_{\text{all } y} p(y) = 1$

21

Poisson Process

• Events occur over time (or in a given area, volume, weight, distance, ...)

• Probability of observing an event in a given unit of time is constant

• Able to define a unit of time small enough so that we can’t observe two or more events simultaneously.

• Tables usually give CUMULATIVE values!

22

The Poisson Distribution

$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, x = 0, 1, 2, ...

x is the number of events observed over T

$\lambda$ is the expected number of events over T

e is the base of natural logs (2.71828)

$\mu = \sigma^2 = \lambda$

23

Poisson Approximation to the Binomial

• In a binomial situation where n is very large (n > 25) and p is very small (p < 0.30, and np < 15), we can approximate b(y, n, p) by a Poisson with parameter $\lambda = np$:

$b(y, n, p) = \binom{n}{y} p^y (1 - p)^{n - y} \approx P(y, np) = \frac{e^{-np} (np)^y}{y!}$
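The quality of the approximation can be inspected by computing both probability functions side by side. The sketch below assumes n = 100 and p = 0.02 (so np = 2); these values are chosen only for illustration.

```python
import math


def binomial_pmf(y, n, p):
    """b(y, n, p): probability of y successes in n independent trials."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)


def poisson_pmf(y, lam):
    """P(y, lambda): Poisson probability of observing y events."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


n, p = 100, 0.02  # hypothetical binomial setting with large n and small p
lam = n * p       # Poisson parameter, lambda = np

rows = [(y, binomial_pmf(y, n, p), poisson_pmf(y, lam)) for y in range(6)]
max_diff = max(abs(b - q) for _, b, q in rows)

for y, b, q in rows:
    print(y, round(b, 4), round(q, 4))
```

For this setting the two columns differ only in the third decimal place; the agreement deteriorates as p grows.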

24

Probability Distribution for a Continuous Random Variable

• $F(y_0)$ is a cumulative distribution function that assigns a value to the probability of observing a value less than or equal to $y_0$

$F(y_0) = P(y \le y_0) = \int_{-\infty}^{y_0} f(y)\,dy$

Property: F ( y ) is continuous over y

25

Probability Calculations

$P(a \le y \le b) = \int_a^b f(y)\,dy$

$f(y) = \frac{d[F(y)]}{dy}$

$f(y) \ge 0$ for all y

$\int_{-\infty}^{\infty} f(y)\,dy = 1$

$F(y)$ is continuous

$P(y = a) = 0$ for all continuous r.v. (a is a constant)

where f ( y ) is the density function of y

F ( y ) is the (probability) distribution function of y

26

Expectations

$E(y) = \sum_{\text{all } y} y\,p(y)$ (discrete)

$E(y) = \int y\,f(y)\,dy$ (continuous)

$E[g(y)] = \int g(y)\,f(y)\,dy$

Variance: $\sigma^2 = E[(y - \mu)^2] = E(y^2) - \mu^2$

Standard deviation: $\sigma = \sqrt{\sigma^2}$

Properties of Expectations

$E(c) = c$

$E(cy) = c\,E(y)$

$E[g_1(y) + g_2(y) + \dots + g_k(y)] = E[g_1(y)] + E[g_2(y)] + \dots + E[g_k(y)]$

$\sigma_c^2 = 0$

$\sigma_{cy}^2 = c^2 \sigma_y^2$

27

The Uniform Distribution

$\mu = \frac{a + b}{2}$

$\sigma^2 = \frac{(b - a)^2}{12}$

A frequently used model when no data are available.

28

The Triangular Distribution

A good model to use when no data are available. Just ask an expert to estimate the minimum, maximum, and most likely values.

29

The Normal Distribution

$z = \frac{y - \mu}{\sigma}$, the standard normal variable

Tables provide cumulative values for the Standard Normal Distribution $N(\mu = 0, \sigma = 1)$

30

The Lognormal Distribution

Consider this model when 80 percent of the data values lie in the first 20% of the variable’s range.

31

The Gamma Distribution

Properties: $\mu = \alpha\beta$, $\sigma^2 = \alpha\beta^2$ (shape $\alpha$, scale $\beta$)

32

The Erlang Distribution

A special case of the Gamma Distribution when the shape parameter $\alpha = k$ is an integer.

A Poisson process where we are interested in the time to observe k events.

33

The Exponential Distribution

A special case of the Gamma Distribution when the shape parameter $\alpha = 1$

34

The Weibull Distribution

A good model for failure time distributions of manufactured items. It has a closed expression for F ( y ).

35

The Beta Distribution

A good model for proportions. You can fit almost any data. However, the data set MUST be bounded!

36

Bivariate Data (Pairs of Random Variables)

• Covariance: measures the strength of the linear relationship

$Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)\,E(Y)$

• Correlation: a standardized version of the covariance

$\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$

• Autocorrelation: For a single time series: Relationship between an observation and those immediately preceding it. Does the current value (Xt) relate to itself lagged one period (Xt-1)?

37

Sampling Distributions

The population has PARAMETERS: $\mu, \sigma$

A sample yields STATISTICS: $\bar{X}, S^2$

A statistic is calculated from the values observed in a sample. Those values are random variables. Therefore, a statistic is a RANDOM VARIABLE.

The sampling distribution of a statistic is its probability distribution.

The STANDARD ERROR of a statistic is the standard deviation of its sampling distribution.

See slides 8 and 9 for formulas to calculate sample means and variances (raw data and grouped data, respectively).

38

The Sampling Distribution of the Mean (Central Limit Theorem)

The CENTRAL LIMIT THEOREM: If random samples of size n are taken from a population having ANY distribution with mean $\mu$ and (finite) standard deviation $\sigma$, then, when n is large enough, the sampling distribution of the mean can be approximated by a normal density with mean $\mu_{\bar{Y}} = \mu$ and standard deviation $\sigma_{\bar{Y}} = \sigma / \sqrt{n}$.
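A quick simulation can make the theorem concrete. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1, samples of size n = 40, and 5000 replications; all of these choices are arbitrary.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 40                   # observations per sample
replications = 5000      # number of samples drawn

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
sample_means = []
for _ in range(replications):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)  # should be close to mu = 1
sd_of_means = statistics.stdev(sample_means)    # should be close to sigma / sqrt(n)
predicted_sd = 1.0 / n ** 0.5

print(mean_of_means, sd_of_means, predicted_sd)
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.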

39

The Sampling Distribution of Sums

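Let $L = a_1 y_1 + a_2 y_2 + \dots + a_k y_k$

Assume $E(y_i) = \mu_i$, $Var(y_i) = \sigma_i^2$, $Cov(y_i, y_j) = \sigma_{ij}$

Then, when the $y_i$ are jointly normally distributed, L possesses a normal density with mean and variance:

$E(L) = a_1 \mu_1 + a_2 \mu_2 + \dots + a_k \mu_k$

$Var(L) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \dots + a_k^2 \sigma_k^2 + 2 a_1 a_2 \sigma_{12} + 2 a_1 a_3 \sigma_{13} + \dots + 2 a_{k-1} a_k \sigma_{k-1,k}$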

40

Distributions Related to Variances

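For a random sample from a normal population with sample standard deviation S, the statistic

$\chi^2 = \frac{(n - 1) S^2}{\sigma^2}$

follows a chi-square distribution with $\nu = n - 1$ degrees of freedom.

For two independent samples, the statistic

$F = \frac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2}$

follows an F distribution with parameters $\nu_1$ in the numerator and $\nu_2$ in the denominator.

The sum of two independent chi-squares follows a chi-square distribution with $\nu = \nu_1 + \nu_2$.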

41

The t Distribution

Let z be a standard normal variable and $\chi^2$ be a chi-square random variable with $\nu$ degrees of freedom. If z and $\chi^2$ are independent, then

$t = \frac{z}{\sqrt{\chi^2 / \nu}}$

is said to possess a Student's t distribution ("t-distribution") with $\nu$ df.

COROLLARY: For a random sample taken from a normal population,

$t = \frac{\bar{y} - \mu}{S / \sqrt{n}}$

follows a t distribution with $\nu = n - 1$ df.

42

Estimation

• Point and Interval Estimators

• Properties of Point Estimators

– Unbiased: E (estimator) = estimated parameter, e.g., $E(\bar{Y}) = \mu$

Note: $S^2$ is an unbiased estimator of $\sigma^2$ if it is computed with the divisor n - 1.

– MVUE: Minimum Variance Unbiased Estimators

• Most frequently used method to estimate parameters: MLE - Maximum Likelihood Estimators.

43

Interval Estimators -- Large sample CI for mean

From the Central Limit Theorem:

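$\Pr\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$

After some algebraic manipulation we get:

$\Pr\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

The $(1 - \alpha) * 100\%$ Confidence Interval for $\mu$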
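The large-sample interval can be computed directly once the critical value is known. This sketch uses `statistics.NormalDist` from the Python standard library to obtain the z critical value; the function name and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Large-sample confidence interval for the mean: xbar +/- z * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample: mean 50, standard deviation 8, n = 64.
low, high = z_interval(xbar=50.0, sigma=8.0, n=64)
print(round(low, 3), round(high, 3))
```

In practice sigma is usually unknown and S is substituted for it, which is reasonable only when n is large.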

44

Interval Estimators -- Small sample CI for mean

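For small samples (n < 30):

$\Pr\left(-t_{\alpha/2} \le \frac{\bar{X} - \mu}{S / \sqrt{n}} \le t_{\alpha/2}\right) = 1 - \alpha$

After some algebraic manipulation we get:

$\Pr\left(\bar{X} - t_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \frac{S}{\sqrt{n}}\right) = 1 - \alpha$

where $t_{\alpha/2}$ is based on n - 1 degrees of freedom.

The $(1 - \alpha) * 100\%$ Confidence Interval for $\mu$ (small samples)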
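The small-sample interval has the same shape, with a t critical value in place of z. The Python standard library has no t distribution, so this sketch takes the critical value as an input read from a t table; the sample figures are hypothetical.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Small-sample confidence interval: xbar +/- t_crit * S / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# 2.262 is the tabulated t value for 95% confidence with n - 1 = 9 df.
low, high = t_interval(xbar=50.0, s=4.0, n=10, t_crit=2.262)
print(round(low, 3), round(high, 3))
```

With the z value 1.96 the interval would be narrower; the wider t interval reflects the extra uncertainty from estimating sigma with S.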

45

Sample Size

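Based on CI for the mean:

$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} \approx \frac{z_{\alpha/2}^2\,S^2}{E^2}$

where $E$ is the desired half-width of the interval.

Recommendation:

Sample approximately 30

Estimate $\sigma^2$ using $S^2$

Estimate n

Take more observations as needed.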
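The sample-size recommendation can be turned into a short calculation. In this sketch the half-width argument is the desired margin of error of the interval, and the pilot estimate of the standard deviation is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma_estimate, half_width, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= half_width, rounded up."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil((z * sigma_estimate / half_width) ** 2)


# Hypothetical pilot sample of about 30 observations gave S = 8;
# we want the 95% interval to be no wider than +/- 2.
n_required = sample_size_for_mean(sigma_estimate=8.0, half_width=2.0)
print(n_required)
```

If the pilot sample is smaller than the computed n, more observations are taken until the target is reached.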

46

CI for proportions (large samples)

The distribution of a proportion is fairly normal with mean = p and variance

$\sigma_{\hat{p}}^2 = \frac{p(1 - p)}{n}$

Then, the C. I. for the population proportion is:

$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

where $\hat{p} = y / n$ is the observed proportion of successes

Assumption: The interval does not contain 0 or 1.
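A short numerical example of the interval for a proportion follows. The counts are made up, and the function name is only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(y, n, confidence=0.95):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = y / n                                   # observed proportion of successes
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 40 successes in 100 trials.
low, high = proportion_interval(y=40, n=100)
inside_unit_interval = 0.0 < low and high < 1.0     # the slide's assumption
print(round(low, 4), round(high, 4), inside_unit_interval)
```

When the computed interval reaches 0 or 1, the normal approximation is not trustworthy and the interval should not be reported as is.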

47

Sample Size (proportions)

Based on CI for a proportion:

$n = \frac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{E^2}$

where $E$ is the desired half-width of the interval.

Recommendation:

Sample approximately 30

Estimate $\hat{p}$

Estimate n

Take more observations as needed.

48

CI for the variance

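The statistic:

$\chi^2 = \frac{(n - 1) S^2}{\sigma^2}$ ~ a Chi-Square distr. with $\nu = n - 1$

After some algebraic manipulation:

$\Pr\left(\frac{(n - 1) S^2}{\chi^2_{\alpha/2,\,\nu}} \le \sigma^2 \le \frac{(n - 1) S^2}{\chi^2_{(1 - \alpha/2),\,\nu}}\right) = 1 - \alpha$

Assumption: Population is approximately normal.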

49

CI for the Difference of Two Means -- large samples --

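The difference of two means follows a normal density with:

$E(\bar{Y}_1 - \bar{Y}_2) = \mu_1 - \mu_2$ and $Var(\bar{Y}_1 - \bar{Y}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$

C.I. for $\mu_1 - \mu_2$:

$(\bar{Y}_1 - \bar{Y}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx (\bar{Y}_1 - \bar{Y}_2) \pm z_{\alpha/2} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

Assumptions: Independent samples with more than 30 observations each.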

50

CI for (p1 - p2) --- (large samples)

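For large samples ($n_1 \ge 30$ and $n_2 \ge 30$):

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\,\sigma_{(\hat{p}_1 - \hat{p}_2)} \approx (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$

where $\hat{q}_i = 1 - \hat{p}_i$.

Approximation is good as long as neither interval $\hat{p}_i \pm 2\sqrt{\hat{p}_i \hat{q}_i / n_i}$ ($i = 1, 2$) includes 0 or 1.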

51

CI for the Difference of Two Means -- small samples, same variance --

C.I. for $\mu_1 - \mu_2$:

$(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2,\,n_1 + n_2 - 2}\; S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

where $S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$ ("pooled variance")

Assumptions:

1. Independent samples taken from normal populations.

2. Variances are unknown but equal ($\sigma_1^2 = \sigma_2^2$).

52

CI for the Difference of Two Means -small samples, different variances-

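C.I. for $\mu_1 - \mu_2$:

$(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2,\,\nu} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

$\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(S_2^2 / n_2\right)^2}{n_2 - 1}}$ (round down)

Assumptions: Independent samples taken from normal populations.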

53

CI for the Difference of Two Means -- matched pairs --

We have PAIRS of observations related through some common factor $(Y_{1i}, Y_{2i})$:

Let $d_i = Y_{1i} - Y_{2i}$, the observed difference for pair i

C.I. for $\mu_d$:

$\bar{d} \pm t_{\alpha/2,\,n - 1} \frac{S_d}{\sqrt{n}}$

where $\bar{d}$ and $S_d$ are the mean and the standard deviation of the n sample differences.

Assumptions: Random observations; the population of paired differences is normally distributed.

54

CI for two variances

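Recall:

$F = \frac{\left[(n_1 - 1) S_1^2 / \sigma_1^2\right] / (n_1 - 1)}{\left[(n_2 - 1) S_2^2 / \sigma_2^2\right] / (n_2 - 1)} = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \sim F_{n_1 - 1,\,n_2 - 1}$

After some algebraic manipulation:

$\Pr\left(\frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{\alpha/2,\,n_1 - 1,\,n_2 - 1}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2} \cdot \frac{1}{F_{(1 - \alpha/2),\,n_1 - 1,\,n_2 - 1}}\right) = 1 - \alpha$

where $F_{a,\,\nu_1,\,\nu_2}$ denotes the value that cuts off an upper-tail area $a$.

Assumption: Independent samples from normal populations.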

55

Prediction Intervals

Consider the prediction of the value for the NEXT observation (not the mean value but its actual value), e.g., we want a "confidence interval" for $y_{n+1}$.

Consider the difference between this observation and the sample mean, $y_{n+1} - \bar{y}$:

$E(y_{n+1} - \bar{y}) = E(y_{n+1}) - E(\bar{y}) = \mu - \mu = 0$

$\sigma^2_{(y_{n+1} - \bar{y})} = \sigma^2_{y_{n+1}} + \sigma^2_{\bar{y}} = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2 \left(1 + \frac{1}{n}\right)$

If the distribution of y is approximately normal, this difference will also be normal.

This yields the following "prediction interval" for the next observation, $y_{n+1}$:

$\Pr\left(\bar{y} - t_{\alpha/2,\,n - 1}\, S \sqrt{1 + \frac{1}{n}} \le y_{n+1} \le \bar{y} + t_{\alpha/2,\,n - 1}\, S \sqrt{1 + \frac{1}{n}}\right) = 1 - \alpha$

56

Hypothesis Testing

• Elements of a Statistical Test. Focus on decisions made when comparing the observed sample to a claim (hypotheses). How do we decide whether the sample disagrees with the hypothesis?

• Null Hypothesis, H0. A claim about one or more population parameters. What we want to REJECT.

• Alternative Hypothesis, Ha: What we test against. Provides criteria for rejection of H0.

• Test Statistic: computed from sample data.

• Rejection (Critical) Region, indicates values of the test statistic for which we will reject H0.

57

Errors in Decision Making

                      True State of Nature

Decision              H0 (Dishonest client)     Ha (Honest client)

Do not lend           Correct decision          Type II error

Lend                  Type I error              Correct decision

58

Statistical Errors

Type I error ($\alpha$): Rejecting a true Null Hypothesis (producer's risk)

Type II error ($\beta$): Failing to reject a false Null Hypothesis, i.e., rejecting a true Alternative Hypothesis (consumer's risk)

Power of a statistical test, $(1 - \beta)$, is the probability of rejecting the null hypothesis $H_0$ when, in fact, $H_0$ is false.
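To see how alpha, beta, and power relate numerically, consider an upper-tailed z test of a mean with known sigma. The sketch below is an illustration under that assumption; the function name and all numerical values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed_z(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the test H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu_a."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)   # H0 is rejected when z > z_alpha
    shift = (mu_a - mu0) / (sigma / sqrt(n))    # mean of z when mu = mu_a
    beta = std_normal.cdf(z_alpha - shift)      # P(fail to reject H0 | mu = mu_a)
    return 1.0 - beta


# Hypothetical setting: mu0 = 100, true mean 103, sigma = 10, n = 50.
power = power_upper_tailed_z(mu0=100.0, mu_a=103.0, sigma=10.0, n=50)
beta = 1.0 - power
print(round(power, 3), round(beta, 3))
```

Increasing n or moving mu_a further from mu0 raises the power; when mu_a equals mu0 the rejection probability falls back to alpha.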

59

Statistical Tests

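One-tailed tests:

$H_0: \mu = \mu_0$, $H_a: \mu > \mu_0$ (or $H_a: \mu < \mu_0$)

Rejection region: $z > z_\alpha$ (or $z < -z_\alpha$)

Two-tailed tests:

$H_0: \mu = \mu_0$, $H_a: \mu \ne \mu_0$

Rejection region: $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$

where $z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$ and $P(z > z_\alpha) = \alpha$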

60

The Critical Value

The sample size for specified $\alpha$ and $\beta$ when testing $H_0: \mu = \mu_0$ versus $H_a: \mu = \mu_a$ is given by

$n = \frac{(z_\alpha + z_\beta)^2\,\sigma^2}{(\mu_a - \mu_0)^2}$

Assumption: $\sigma$ is the same under both hypotheses.
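The sample-size formula for specified error probabilities can be evaluated as follows. This is a sketch for a one-tailed test; the function name and the input values are assumptions made for the example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_test(mu0, mu_a, sigma, alpha=0.05, beta=0.10):
    """n = (z_alpha + z_beta)^2 * sigma^2 / (mu_a - mu0)^2, rounded up."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_beta = std_normal.inv_cdf(1.0 - beta)
    return ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu_a - mu0) ** 2)


# Hypothetical setting: detect a shift from 100 to 103 when sigma = 10.
n_needed = sample_size_for_test(mu0=100.0, mu_a=103.0, sigma=10.0)
print(n_needed)
```

Halving the difference between mu_a and mu0 roughly quadruples the required sample size, since n is inversely proportional to the squared difference.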

61

The observed significance level for a test

It is standard in industry to use $\alpha = 0.05$.

Some researchers prefer to report the observed "p-value". This is the probability (under $H_0$) of observing a value of the test statistic at least as extreme as the one actually observed. This allows the reader to make his (her) own decision about accepting or rejecting $H_0$.

Most computer packages report the significance as (for example) Prob > T.

62

Testing proportions (large samples)

$H_0: p = p_0$

test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$

where $\hat{p} = y / n$ is the observed proportion of successes

Rejection region (example): $z > z_\alpha$ ($H_a: p > p_0$)

Assumption: The interval $\hat{p} \pm 2\sqrt{\hat{p}(1 - \hat{p}) / n}$ does not contain 0 or 1.
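A numerical example of the large-sample test for a proportion, including the observed significance level, might look like the sketch below. The counts, the hypothesized value p0, and the function name are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(y, n, p0, alpha=0.05):
    """Upper-tailed test of H0: p = p0 vs Ha: p > p0; returns (z, p_value, reject)."""
    p_hat = y / n
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)  # standard error uses p0 under H0
    std_normal = NormalDist()
    p_value = 1.0 - std_normal.cdf(z)             # P(Z >= z) under H0
    reject = z > std_normal.inv_cdf(1.0 - alpha)  # compare with z_alpha
    return z, p_value, reject


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
z_stat, p_value, reject_h0 = proportion_z_test(y=60, n=100, p0=0.5)
print(round(z_stat, 3), round(p_value, 4), reject_h0)
```

Rejecting when z exceeds the critical value and rejecting when the p-value is below alpha are the same decision expressed two ways.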

63

Testing a Normal Mean

Select $\alpha$. Set your test as one-tailed or two-tailed.

Calculate test statistic: $z = \frac{\bar{y} - \mu_0}{\sigma / \sqrt{n}} \approx \frac{\bar{y} - \mu_0}{S / \sqrt{n}}$

Compare to the critical value (from book's table).

If sample is small (n < 30):

Calculate test statistic: $t = \frac{\bar{y} - \mu_0}{S / \sqrt{n}}$

(assumes an approximately normal population)

64

Testing a variance

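$H_0: \sigma^2 = \sigma_0^2$

test statistic: $\chi^2 = \dfrac{(n - 1) S^2}{\sigma_0^2}$

Rejection region:

$\chi^2 > \chi^2_{\alpha}$ for $H_a: \sigma^2 > \sigma_0^2$

$\chi^2 < \chi^2_{1-\alpha}$ for $H_a: \sigma^2 < \sigma_0^2$

$\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$ for $H_a: \sigma^2 \neq \sigma_0^2$

Assumption: Population is approximately normal.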
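A short sketch of the variance test statistic, with a hypothetical function name `variance_chi2`. It returns the statistic together with the $n - 1$ degrees of freedom needed to look up the chi-square critical value.

```python
import statistics


def variance_chi2(sample, sigma0_sq: float):
    """Return (chi2, df) for H0: sigma^2 = sigma0_sq (hypothetical helper)."""
    n = len(sample)
    s_sq = statistics.variance(sample)  # sample variance S^2 (divides by n - 1)
    chi2 = (n - 1) * s_sq / sigma0_sq
    return chi2, n - 1
```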

Page 65: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Testing Differences of Two Means -- large samples --

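$H_0: (\mu_1 - \mu_2) = D_0$

test statistic: $z = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$

Rejection region:

$z > z_\alpha$ if $H_a: (\mu_1 - \mu_2) > D_0$

$z < -z_\alpha$ if $H_a: (\mu_1 - \mu_2) < D_0$

$z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ if $H_a: (\mu_1 - \mu_2) \neq D_0$

Assumptions: Independent samples with more than 30 observations each.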
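The large-sample statistic can be sketched from summary statistics alone. The function name `two_mean_z` and its argument names are illustrative assumptions.

```python
import math


def two_mean_z(ybar1: float, s1: float, n1: int,
               ybar2: float, s2: float, n2: int, d0: float = 0.0) -> float:
    """z statistic for H0: mu1 - mu2 = d0, large independent samples."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((ybar1 - ybar2) - d0) / standard_error
```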

Page 66: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Testing Differences of Two Means -- small samples, same variance --

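$H_0: (\mu_1 - \mu_2) = D_0$

test statistic: $t = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{\sqrt{S_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

Rejection region (example): $t > t_{\alpha,\, n_1 + n_2 - 2}$ ($H_a: (\mu_1 - \mu_2) > D_0$)

where $S_p^2 = \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$ ("pooled variance")

Assumptions: 1. Indep. samples from normal populations.

2. Variances are unknown but equal ($\sigma_1^2 = \sigma_2^2$).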
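A minimal sketch of the pooled-variance test, working from sample means and variances. The function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(ybar1: float, s1_sq: float, n1: int,
             ybar2: float, s2_sq: float, n2: int, d0: float = 0.0):
    """Return (t, df) for H0: mu1 - mu2 = d0, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t = ((ybar1 - ybar2) - d0) / math.sqrt(sp_sq * (1.0 / n1 + 1.0 / n2))
    return t, df
```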

Page 67: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


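Testing Differences of Two Means -- small samples, different variances --

$H_0: (\mu_1 - \mu_2) = D_0$

test statistic: $t = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$

Rejection region (example): $t > t_{\alpha,\, \nu}$ ($H_a: (\mu_1 - \mu_2) > D_0$)

where $\nu = \dfrac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$ (round down)

Assumptions: Independent samples taken from approximately normal populations.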
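The degrees-of-freedom formula is easy to get wrong by hand, so a small sketch may help. The function name `unequal_variance_t` is an assumption; the rounding follows the "round down" note on the slide.

```python
import math


def unequal_variance_t(ybar1: float, s1_sq: float, n1: int,
                       ybar2: float, s2_sq: float, n2: int, d0: float = 0.0):
    """Return (t, nu) for H0: mu1 - mu2 = d0 without assuming equal variances."""
    a = s1_sq / n1
    b = s2_sq / n2
    t = ((ybar1 - ybar2) - d0) / math.sqrt(a + b)
    # Approximate degrees of freedom, rounded down as the slide instructs.
    nu = math.floor((a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1)))
    return t, nu
```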

Page 68: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Testing Difference of Two Means -- matched pairs --

We have PAIRS of observations related through some common factor $(Y_{1i}, Y_{2i})$:

Let $d_i = Y_{1i} - Y_{2i}$, the observed difference for pair $i$

$H_0: (\mu_1 - \mu_2) = D_0$

test statistic: $t = \dfrac{\bar{d} - D_0}{S_d/\sqrt{n}}$

Rejection region: $t > t_{\alpha,\, n-1}$ for $H_a: (\mu_1 - \mu_2) > D_0$

where $\bar{d}$ and $S_d$ are the mean and the standard deviation of the $n$ sample differences.

Assumptions: Random observations; the population of paired differences is normally distributed.
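A sketch of the matched-pairs statistic, assuming the two measurements are given as equal-length lists in pair order. The function name `paired_t` is hypothetical.

```python
import math
import statistics


def paired_t(y1, y2, d0: float = 0.0):
    """Return (t, df) for H0: mu1 - mu2 = d0 using matched pairs."""
    if len(y1) != len(y2):
        raise ValueError("matched pairs need samples of equal length")
    d = [a - b for a, b in zip(y1, y2)]  # observed difference for each pair
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)
    t = (d_bar - d0) / (s_d / math.sqrt(n))
    return t, n - 1
```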

Page 69: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Testing a ratio of two variances

$H_0: \sigma_1^2/\sigma_2^2 = 1$ (i.e., $\sigma_1^2 = \sigma_2^2$)

test statistic: $F = \dfrac{\text{larger sample variance}}{\text{smaller sample variance}}$

Rejection region:

$F > F_{\alpha/2}$ for $H_a: \sigma_1^2/\sigma_2^2 \neq 1$

$F > F_{\alpha}$ for a one-tailed $H_a$ (e.g., $H_a: \sigma_1^2/\sigma_2^2 > 1$)

Assumption: Independent samples from normal populations.

Note: Make sure the df in the numerator are those of the sample with larger variance!
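The note about numerator degrees of freedom can be made concrete with a small sketch. The function name `variance_ratio_f` is an assumption; it returns the df in the order needed for the F table lookup.

```python
def variance_ratio_f(s1_sq: float, n1: int, s2_sq: float, n2: int):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    # Sample 2 has the larger variance, so its df go in the numerator.
    return s2_sq / s1_sq, n2 - 1, n1 - 1
```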

Page 70: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Testing (p1 - p2) --- (large samples)

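For large samples ($n_1$ and $n_2 \geq 30$):

$H_0: (p_1 - p_2) = D_0$

test statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sigma_{(\hat{p}_1 - \hat{p}_2)}}$

where $\sigma_{(\hat{p}_1 - \hat{p}_2)} \approx \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$ when $D_0 \neq 0$

and $\sigma_{(\hat{p}_1 - \hat{p}_2)} \approx \sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ when $D_0 = 0$, with $\hat{p} = \dfrac{y_1 + y_2}{n_1 + n_2}$ and $\hat{q} = 1 - \hat{p}$

Approximation is good as long as no interval $\hat{p}_i \pm 2\sqrt{\hat{p}_i \hat{q}_i/n_i}$ includes 0 or 1.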
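The two standard-error cases can be sketched in one function. The name `two_proportion_z` and its arguments (`y1`, `y2` successes out of `n1`, `n2` trials) are illustrative assumptions.

```python
import math


def two_proportion_z(y1: int, n1: int, y2: int, n2: int, d0: float = 0.0) -> float:
    """z statistic for H0: p1 - p2 = d0 (large samples)."""
    p1 = y1 / n1
    p2 = y2 / n2
    if d0 == 0.0:
        # Under H0 the proportions are equal, so pool the two samples.
        p = (y1 + y2) / (n1 + n2)
        se = math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))
    else:
        se = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return ((p1 - p2) - d0) / se
```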

Page 71: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Categorical Data

One-way Table: Categories and their frequencies:

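Categ.   1      2      ..   k      Total
Freq.    n_1    n_2    ..   n_k    n

Large sample conf. int. for $p_i$: $\hat{p}_i \pm z_{\alpha/2}\sqrt{\dfrac{1}{n}\hat{p}_i(1 - \hat{p}_i)}$, where $\hat{p}_i = n_i/n$

Example:

EE    ME    Others   Total
17    11    9        37

Then $\hat{p}_{EE} = 17/37$ and the interval (with $z_{\alpha/2} = 1.96$) is

$\dfrac{17}{37} \pm 1.96\sqrt{\dfrac{1}{37}\cdot\dfrac{17}{37}\cdot\dfrac{20}{37}} = 0.46 \pm 0.16$, i.e. $0.30 \leq p_{EE} \leq 0.62$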
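The EE interval can be reproduced with a few lines of Python. The function name `category_proportion_ci` is hypothetical, and the default `z = 1.96` corresponds to the 95% level used in the example.

```python
import math


def category_proportion_ci(n_i: int, n: int, z: float = 1.96):
    """Large-sample confidence interval for one category proportion p_i."""
    p_hat = n_i / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Slide example: 17 EE students out of 37.
low, high = category_proportion_ci(17, 37)
print(f"{low:.2f} <= p_EE <= {high:.2f}")
```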

Page 72: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


One-way Tables (Cont.)

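Large sample $(1 - \alpha)\,100\%$ Conf. Int. for $p_i - p_j$:

$(\hat{p}_i - \hat{p}_j) \pm z_{\alpha/2}\sqrt{\dfrac{1}{n}\left[\hat{p}_i(1 - \hat{p}_i) + \hat{p}_j(1 - \hat{p}_j) + 2\hat{p}_i\hat{p}_j\right]}$

In the example:

$\hat{p}_{EE} - \hat{p}_{ME}$: $\dfrac{17}{37} - \dfrac{11}{37} \pm 1.96\sqrt{\dfrac{1}{37}\left[\dfrac{17}{37}\cdot\dfrac{20}{37} + \dfrac{11}{37}\cdot\dfrac{26}{37} + 2\cdot\dfrac{17}{37}\cdot\dfrac{11}{37}\right]} = 0.162 \pm 0.275$

$-0.113 \leq p_{EE} - p_{ME} \leq 0.437$: difference is NOT significant!

NOTE: $-0.045 \leq p_{EE} - p_{Others} \leq 0.477$, again NOT significant!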
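A sketch that reproduces both intervals from the example counts (17 EE, 11 ME, 9 Others, n = 37). The function name `category_difference_ci` is an assumption. The extra $2\hat{p}_i\hat{p}_j$ term appears because the two category counts come from the same sample and are negatively correlated.

```python
import math


def category_difference_ci(n_i: int, n_j: int, n: int, z: float = 1.96):
    """Large-sample CI for p_i - p_j when both categories come from one sample."""
    p_i = n_i / n
    p_j = n_j / n
    variance = (p_i * (1.0 - p_i) + p_j * (1.0 - p_j) + 2.0 * p_i * p_j) / n
    half_width = z * math.sqrt(variance)
    diff = p_i - p_j
    return diff - half_width, diff + half_width


ee_me = category_difference_ci(17, 11, 37)
ee_others = category_difference_ci(17, 9, 37)
# An interval that contains 0 means the difference is not significant.
print(ee_me, ee_others)
```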

Page 73: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Categorical Data Analysis

General r x c Contingency Table

         1        2        ..   c        Totals
1        n(1,1)   n(1,2)   ..   n(1,c)   r(1)
2        n(2,1)   n(2,2)   ..   n(2,c)   r(2)
..       ..       ..       ..   ..       ..
r        n(r,1)   n(r,2)   ..   n(r,c)   r(r)
Totals   c(1)     c(2)     ..   c(c)     n

Page 74: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Example of a Contingency Table

STA 3032 - Summer 1994

Grade    Q2   Q4   Q6   Total
0-2      13    0    2    15
2.1-4     6    1    1     8
4.1-6     8    5   11    24
6.1-8     4    7    9    20
8.1-10    2   16    6    24
Total    33   29   29    91

Page 75: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


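Testing for Independence

$H_0$: Variables are independent

$H_a$: They are not

Test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})} = n\left[\sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{n_{ij}^2}{r_i c_j} - 1\right]$

where $E(n_{ij}) = \dfrac{r_i c_j}{n}$

Rejection region: $\chi^2 > \chi^2_{0.05,\,(r-1)(c-1)}$

Note: regroup rows (columns) as needed for $E(n_{ij}) \geq 5$ for all $i, j$

In the example: $\chi^2 = 91\left[\dfrac{19^2}{23 \cdot 33} + \dfrac{1^2}{23 \cdot 29} + \dots + \dfrac{6^2}{24 \cdot 29} - 1\right] = 41.33$

(Note regrouping! Rows 0-2 and 2.1-4 are merged into one row with counts 19, 1, 3 and total 23, leaving 4 rows and 3 columns. Compare to $\chi^2_{0.05,\,6} = 12.5916$ from Table)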

Conclusion: Variables are NOT independent.
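The example value can be checked with a short sketch. The function name `chi_square_independence` is hypothetical, the table is the grade table with its first two rows merged, and the critical value 12.5916 is taken from the slide rather than computed.

```python
def chi_square_independence(table):
    """Return (chi2, df, smallest expected count) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E(n_ij) = r_i * c_j / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected


# Grade table after merging rows 0-2 and 2.1-4 (columns Q2, Q4, Q6).
regrouped = [[19, 1, 3], [8, 5, 11], [4, 7, 9], [2, 16, 6]]
chi2, df, min_expected = chi_square_independence(regrouped)
CRITICAL_VALUE = 12.5916  # chi-square table value for alpha = 0.05, df = 6 (from the slide)
print(round(chi2, 2), df, chi2 > CRITICAL_VALUE)
```

Running the same function on the original five-row table would report a smallest expected count below 5, which is why the slide regroups before testing.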

Page 76: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Distributions: Model Fitting Steps

1. Collect data. Make sure you have a random sample. You will need at least 30 valid cases
2. Plot data. Look for familiar patterns
3. Hypothesize several models for distribution
4. Using part of the data, estimate model parameters
5. Using the rest of the data, analyze the model's accuracy
6. Select the "best" model and implement it
7. Keep track of model accuracy over time. If warranted, go back to 6 (or to 3, if data (population?) behavior keeps changing)

Page 77: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Chi-Square Test of Goodness of Fit

$H_0: p_1 = p_{10};\ p_2 = p_{20};\ \dots;\ p_k = p_{k0}$, with $\sum_{i=1}^{k} p_{i0} = 1$

$H_a$: At least one $p_i \neq p_{i0}$

Let $n$ = sample size and $n_i$ the observed frequency in cell $i$

Make sure that $e_i = n p_{i0} \geq 5$ for all $i$ (if not, regroup cells as needed).

Test Statistic: $\chi^2 = \sum_{i=1}^{k} \dfrac{(n_i - e_i)^2}{e_i} = \sum_{i=1}^{k} \dfrac{(n_i - n p_{i0})^2}{n p_{i0}}$

Rejection Region: $\chi^2 > \chi^2_{\alpha,\,\nu}$

where: $\nu = k - r - 1$

$k$ = number of cells after regrouping

$r$ = number of parameters estimated from data to calculate $p_{i0}$
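A minimal sketch of the goodness-of-fit statistic. The function name `chi_square_gof` and the `params_estimated` argument (the slide's $r$) are assumptions, and the function refuses to run when an expected count falls below 5 instead of regrouping automatically.

```python
def chi_square_gof(observed, p0, params_estimated: int = 0):
    """Return (chi2, df) for H0: cell probabilities equal p0."""
    if len(observed) != len(p0):
        raise ValueError("observed counts and p0 must have the same length")
    n = sum(observed)
    expected = [n * p for p in p0]  # e_i = n * p_i0
    if min(expected) < 5:
        raise ValueError("an expected count is below 5; regroup cells first")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - params_estimated - 1  # nu = k - r - 1
    return chi2, df
```

As a purely illustrative use, testing the EE/ME/Others counts (17, 11, 9) against equal probabilities of 1/3 each gives a statistic of about 2.81 on 2 degrees of freedom; that hypothesis is invented here for demonstration and is not in the slides.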

Page 78: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


Kolmogorov-Smirnov Test of Goodness of Fit

Compares the empirical distribution function $F_n(y)$ with a hypothesized theoretical distribution function $F(y)$.

Empirical: $F_n(y)$ = fraction of the sample less or equal to $y$

= $i/n$ for the $i$-th ranked observation $y_{(i)}$

Let $D^+ = \max_i \left[\dfrac{i}{n} - F(y_{(i)})\right]$ and $D^- = \max_i \left[F(y_{(i)}) - \dfrac{i - 1}{n}\right]$

Then $D = \max_y \left|F_n(y) - F(y)\right| = \max(D^+, D^-)$

Critical values given in tables
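The statistic $D$ can be sketched as below for any hypothesized CDF passed in as a function. The name `ks_statistic` and the uniform CDF used in the example are illustrative assumptions; critical values still come from tables.

```python
def ks_statistic(sample, cdf) -> float:
    """Kolmogorov-Smirnov D = max(D+, D-) against a hypothesized CDF."""
    ys = sorted(sample)
    n = len(ys)
    # enumerate() is 0-based, so the i-th ranked observation has rank index + 1.
    d_plus = max((index + 1) / n - cdf(y) for index, y in enumerate(ys))
    d_minus = max(cdf(y) - index / n for index, y in enumerate(ys))
    return max(d_plus, d_minus)


def uniform_cdf(y: float) -> float:
    """CDF of the Uniform(0, 1) distribution, used here only as an example."""
    return min(max(y, 0.0), 1.0)


print(ks_statistic([0.7, 0.1, 0.4], uniform_cdf))
```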

Page 79: Dr. Héctor AllendeReview of Probability and Statistics 1 A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling


A Review of Probability and Statistics

• Descriptive statistics

• Probability

• Random variables

• Sampling distributions

• Estimation and confidence intervals

• Test of Hypothesis

–For mean, variances, and proportions

–Goodness of fit