
Page 1: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Psychology 215: Statistics for Social Science

Kate Bezrukova

bezrukov@camden.rutgers.edu

Introduction

Page 2: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Introduction

• Our Increasingly Quantitative World!!!

• Data are not just numbers, but numbers that carry information

• Purposes:
– producing trustworthy data
– analyzing data so that their meaning is clear, and
– drawing practical conclusions from the data

Page 3: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Basic Idea of Statistics:

To make inferences about a population using data from only a sample

[Diagram: a sample of data is drawn from the population; statistical tools are then used to make an inference about the population.]

Page 4: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Overview of Course

• Methods of data collection
• Summarizing data
• Visualizing data
• Correlation and association
• Standard scores and the normal curve
• Statistical inference
• Statistical tests
• Necessary skills

Page 5: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Variables

Page 6: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Statistics vs. Parameters

• sample – statistic – carries uncertainty
• population – parameter – carries no uncertainty

Is there any interest in a group larger than the one you have?
1. If “yes” – you have a sample
2. If “no” – you have a population

Page 7: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Terms

• Data – numbers collected for a purpose in a particular context
– case/observational unit

• Variability – a fundamental principle

• Variables – any characteristic of a person or thing that can be assigned a number or category
– measurement (continuous) variables – assume a range of numerical values
– categorical variables – simply record a category designation

Page 8: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Types of Variables

• Dependent Variable (DV)

• Independent Variable (IV)

• Extraneous Variable– Confounded Variables

A variable is an aspect of a testing condition that can change or take on different values under different conditions.

Page 9: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Graphs

Page 10: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Distributions

• data distribution – the pattern of variability
– the center of a distribution
– the range
– the shape

• simple frequency distributions

• grouped frequency distributions
– midpoint

Page 11: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Graphic Presentation of Data

the histogram (quantitative data)

the bar graph (qualitative data)

the frequency polygon (quantitative data)

Page 12: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Distributions

• Bell-shaped (also known as “symmetric” or “normal”)

• Skewed:
– positively (skewed to the right) – the distribution tails off toward larger values
– negatively (skewed to the left) – the distribution tails off toward smaller values

Page 13: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Measures of Central Tendency

Page 14: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Mean

\bar{X} = \frac{\sum X}{N}

where \Sigma is an instruction “to add,” \bar{X} is the mean, X is a score (observation), and N is the number of observations.

Characteristics:
1. \sum (X - \bar{X}) = 0
2. \sum (X - \bar{X})^2 is a minimum (“least squares”)

Page 15: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Median & Mode

• MEDIAN divides a distribution of scores (always arrange the scores in order!) into two parts that are equal in size

• median location:
– if N is odd – the actual middle score
– if N is even – the mean of the middle two scores

• MODE is the most common value, i.e., the most frequently occurring score.

Page 16: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Simple Frequency Distributions

Raw-score distribution:

Name       X
Student1   20
Student2   23
Student3   15
Student4   21
Student5   15
Student6   21
Student7   15
Student8   20

Frequency distribution:

f    X
3    15
2    20
2    21
1    23

Mean of a frequency distribution: \bar{X} = \frac{\sum fX}{N}
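A quick Python check that the raw-score mean and the frequency-distribution mean agree, using the eight scores from this slide (a minimal sketch; the variable names are mine):

```python
# Mean from the raw scores vs. from the frequency distribution (slide data).
raw_scores = [20, 23, 15, 21, 15, 21, 15, 20]
freq = {15: 3, 20: 2, 21: 2, 23: 1}          # X -> f

mean_raw = sum(raw_scores) / len(raw_scores)

N = sum(freq.values())
mean_freq = sum(x * f for x, f in freq.items()) / N

print(mean_raw, mean_freq)                   # both 18.75
```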

Page 17: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Measures of Variability

Page 18: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Measures of Spread: Range

The range is simply the numerical distance between the highest score and the lowest score.

– how many hours did you sleep last night?

– how much sleep do you usually get on a typical night?

Page 19: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Interquartile range

tells the range of scores that make up the middle 50 percent of the distribution.

• IQR = 75th percentile – 25th percentile
• a location value (.25 × N) – similar to finding the median

• interpretation: “the middle 50 percent of the scores have values from XXX to YYY”
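A sketch of the computation with numpy (the scores are made up for illustration, and numpy's default percentile interpolation can differ slightly from the location-value method described above):

```python
import numpy as np

# Interquartile range for a small set of illustrative scores.
scores = [3, 7, 9, 10, 11, 12, 15, 20, 21, 23]

q25, q75 = np.percentile(scores, [25, 75])
iqr = q75 - q25
print(f"The middle 50 percent of the scores have values from {q25} to {q75} (IQR = {iqr}).")
```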

Page 20: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Deviation Scores

deviation score = raw score – mean
for samples: X – \bar{X}
for populations: X – \mu

1. if X > \bar{X}, the deviation score is positive
2. if X < \bar{X}, the deviation score is negative
3. if X = \bar{X}, the deviation score is 0

interpretation: the number of points that a particular score deviates from the mean; note that \sum (X - \bar{X}) = 0.

Page 21: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Standard Deviation

• \sigma – used to describe the variability of a population (\sigma is a parameter)
• s – used to estimate \sigma from a sample of the population (s is a statistic)
• S – used to describe the variability of a sample when there is no desire to estimate \sigma (S is a statistic)

– Choosing the correct SD:
1. how the data were gathered – was sampling used?
2. generalization – what is the purpose of the data?

– Formulas:
• deviation-score formula
• raw-score formula

Page 22: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Deviation-Score Method

S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}        \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}

where \sigma = standard deviation of a population, S = standard deviation of a sample, N = number of scores

1. find a deviation score for each raw score
2. square the deviation scores
3. add them up
4. divide this sum by N and take the square root

Page 23: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Raw-Score Method

S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}

where \sum X^2 = sum of the squared scores, (\sum X)^2 = square of the sum of the raw scores, N = number of scores

1. square the sum of the raw scores and divide by N
2. square each score and add the squares up
3. subtract the result of step 1 from the result of step 2
4. divide this difference by N and take the square root
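A small Python check that the deviation-score and raw-score methods give the same answer, reusing the eight scores from the frequency-distribution slide (a sketch, not part of the original slides):

```python
import math
import statistics

scores = [20, 23, 15, 21, 15, 21, 15, 20]   # scores from the frequency-distribution slide
N = len(scores)
mean = sum(scores) / N

# Deviation-score method: S = sqrt(sum((X - mean)^2) / N)
S_dev = math.sqrt(sum((x - mean) ** 2 for x in scores) / N)

# Raw-score method: S = sqrt((sum(X^2) - (sum(X))^2 / N) / N)
S_raw = math.sqrt((sum(x ** 2 for x in scores) - sum(scores) ** 2 / N) / N)

print(S_dev, S_raw)                  # identical, about 3.03
print(statistics.pstdev(scores))     # same N-divisor standard deviation
print(statistics.stdev(scores))      # the N - 1 estimate used on the next slide
```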

Page 24: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Standard Deviation (Sample)

s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}

s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}

s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{N}}{N - 1}}

\sum fX^2 – squaring, then multiplying, then summing
(\sum fX)^2 – multiplying, then summing, then squaring

Page 25: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Variance

\sigma^2 = \frac{\sum (X - \mu)^2}{N}        S^2 = \frac{\sum (X - \bar{X})^2}{N}

• the variance is the value obtained before taking the square root (the squared standard deviation)
• it is the basis of the analysis of variance
• divide by N for a population, by N – 1 for a sample estimate
• \sum (X - \bar{X})^2 is the sum of squares

Page 26: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Other Descriptive Statistics

Page 27: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Combination Statistics: z scores

• standard scores are usually used to compare two scores from different distributions

• positive z scores = raw scores > mean
• negative z scores = raw scores < mean
• the absolute value of the z score, |z score|, tells the number of standard deviations the score is from the mean

• \bar{z} = 0 because \sum (X - \bar{X}) = 0

z = \frac{X - \bar{X}}{S}
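A minimal sketch of the z-score formula, reusing the eight scores from the frequency-distribution slide and the descriptive (N-divisor) standard deviation S:

```python
import statistics

scores = [20, 23, 15, 21, 15, 21, 15, 20]
mean = statistics.mean(scores)
S = statistics.pstdev(scores)                 # descriptive (N-divisor) standard deviation

z_scores = [(x - mean) / S for x in scores]
print([round(z, 2) for z in z_scores])
print(statistics.mean(z_scores))              # the mean of the z scores is 0 (up to rounding)
```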

Page 28: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

How much difference is there? Effect Size Index “d”

• describes the size of the difference between two distributions

d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}     (d is always reported as a positive number!)

small effects: d = .20
medium effects: d = .50
large effects: d = .80

verbal labels such as “huge,” “half the size of small,” “somewhat larger than,” and “intermediate between” can also be used

Page 29: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Descriptive statistics report: Boxplot

A boxplot shows:
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean
- the skew of the distribution:
  positive skew: mean > median & the high-score whisker is longer
  negative skew: mean < median & the low-score whisker is longer

Report:
1) boxplot (draw)
2) effect size index (calculate)
3) story (write): a) central tendency, b) form of the distributions, c) overlap of distributions, d) interpretation of the effect size index

Page 30: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

EXAMPLE: A descriptive statistics report on “Marriage Ages”

1. The graph shows boxplots of marriage ages of women and men.

+++++++++insert your boxplot here+++++++++

2. The mean age of women is 35 years old; the median is 33 years old. The mean age of men is 38.64 years old; the median is 34 years old.

3. The marriage ages for both women and men are positively skewed. More women and men get married at younger ages.

4. Although the two distributions overlap, the middle 50% of the men are somewhat older than the middle 50% of the women.

5. The difference in means produces an effect size index of .32. This value is somewhat larger than a small effect size.

Page 31: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Correlation and Regression

Page 32: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Correlation Coefficient

A correlation coefficient (r) provides a quantitative way to express the degree of linear relationship between two variables.

• Range: r is always between –1 and +1
• Sign of the correlation indicates direction:
– high with high and low with low -> positive
– high with low and low with high -> negative
– no consistent pattern -> near zero
• Magnitude (absolute value) indicates strength (–.9 is just as strong as .9):
.10 to .40   weak
.40 to .80   moderate
.80 to .99   high
1.00         perfect

Page 33: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Pearson product–moment correlation coefficient: Formulas

Invented by Karl Pearson and used to describe the strength of the linear relationship between two variables that are both either ratio or interval variables.

Rule of thumb: a correlation coefficient should be based on at least 30 observations.

Definitional formula:

r = \frac{\sum (z_X z_Y)}{N}

where r = Pearson product–moment correlation coefficient, z_X = a z score for variable X, z_Y = a z score for variable Y, N = number of pairs of X and Y values

Computational formulas: the raw-score formula and the blanched formula (next slide).

Page 34: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Pearson product–moment correlation coefficient: Blanched Formula

r = \frac{\frac{\sum XY}{N} - (\bar{X})(\bar{Y})}{(S_X)(S_Y)}

where
X and Y are paired observations
XY = product of each X value multiplied by its paired Y value
\bar{X} = mean of variable X
\bar{Y} = mean of variable Y
S_X = standard deviation of the X distribution
S_Y = standard deviation of the Y distribution
N = number of paired observations
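A sketch showing that the definitional (z-score) formula and the blanched formula give the same r. The X and Y values are made up for illustration, and the S's are the N-divisor standard deviations defined earlier:

```python
import statistics

# Illustrative paired data (made-up values, not from the slides).
X = [2, 4, 5, 7, 9, 11, 12, 15]
Y = [1, 3, 6, 8, 8, 12, 14, 16]
N = len(X)

mx, my = statistics.mean(X), statistics.mean(Y)
Sx, Sy = statistics.pstdev(X), statistics.pstdev(Y)      # N-divisor SDs, the S in the formula

# Definitional formula: r = sum(zX * zY) / N
r_def = sum(((x - mx) / Sx) * ((y - my) / Sy) for x, y in zip(X, Y)) / N

# Blanched formula: r = (sum(XY)/N - mean(X)*mean(Y)) / (Sx * Sy)
r_blanched = (sum(x * y for x, y in zip(X, Y)) / N - mx * my) / (Sx * Sy)

print(round(r_def, 4), round(r_blanched, 4))             # identical values
```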

Page 35: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Correlation Coefficient: Limitations

1. The correlation coefficient is an appropriate measure of association only when the relationship is linear.

2. The correlation coefficient is an appropriate measure of association only when the ranges of scores in the sample and in the population are equal (beware of a truncated range).

3. Correlation does not imply causality:
– Using U.S. cities as cases, there is a strong positive correlation between the number of churches and the incidence of violent crime.
– Does this mean churches cause violent crime, or that violent crime causes more churches to be built?
– More likely, both are related to the population of the city (a third variable – a lurking or confounding variable).

Page 36: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Coefficient of Determination

r2 tells the proportion of variance that two variables in a bivariate distribution have in common.

Page 37: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Introduction to Linear Regression

Linear regression is the statistical procedure of estimating the linear relationship between Y and X:

Y = a + bX

where X and Y are variables representing scores on the X and Y axes, b = slope of the regression line, a = intercept of the line with the Y axis.

positive slope – the highest point on the line is to the right of the lowest point
negative slope – the highest point on the line is to the left of the lowest point

Page 38: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Regression Coefficients

b = r \frac{S_Y}{S_X},  where

r = correlation coefficient for X and Y
S_Y = standard deviation of the Y variable
S_X = standard deviation of the X variable

a = \bar{Y} - b\bar{X},  where

\bar{Y} = mean of the Y scores
b = regression coefficient
\bar{X} = mean of the X scores
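Continuing the same made-up X and Y values, a sketch of the regression coefficients b and a as defined above:

```python
import statistics

# The illustrative paired data from the correlation sketch above.
X = [2, 4, 5, 7, 9, 11, 12, 15]
Y = [1, 3, 6, 8, 8, 12, 14, 16]
N = len(X)

mx, my = statistics.mean(X), statistics.mean(Y)
Sx, Sy = statistics.pstdev(X), statistics.pstdev(Y)
r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (N * Sx * Sy)

b = r * Sy / Sx          # slope of the regression line
a = my - b * mx          # intercept with the Y axis

print(f"Y = {a:.3f} + {b:.3f} * X")
```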

Page 40: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Theoretical Distributions and Theoretical Probabilities

• probabilities range from .00 to 1.00
• the expression p = .077 means that there are 7.7 chances in 100 that the event will occur

[Figure: a theoretical distribution for an ordinary deck of playing cards – each value from Ace through King occurs 4 times.]

Page 41: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Theoretical and Empirical Distributions

• theoretical distributions (logic, math)

• empirical distributions (observation)

Page 42: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

The Standard Normal Distribution

1. the mean, median, and mode are the same score – the point on the X axis where the curve peaks

2. the area to the left or right of the line = 50% of the total area

3. the tails of the curve are asymptotic to the X axis – they never cross the axis but continue in both directions indefinitely

4. the inflection points are where the curve is the steepest (-1 and +1)

Page 43: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

The Normal Distribution Table (Table C in Appendix C)

Table C can be used to determine areas (proportions) of a normal distribution and to obtain probability figures.

• Column A contains the z score
• Column B – the area between the mean and the z score
• Column C – the area beyond the z score
• all the proportions hold for –z as well, because the curve is symmetrical

Page 44: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

EXAMPLE: The College Frisbee Golf Course

On the average, it takes 27 throws to complete the College Frisbee Golf Course. The standard deviation about this mean is 4.

Questions that we can answer:
1. finding the proportion of a population that has scores of a particular size (e.g., what proportion of the population would be expected to score 22 or less?)
– interpretation: there are XX.XX chances in 100 that a player would need 22 or fewer throws.
2. finding the score that separates the population into two proportions (e.g., if 950 students played and a prize was given for scoring 20 or less, how many would get prizes?)
– interpretation: XX of the 950 people would get prizes.

Page 45: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Example: The College Frisbee Golf Course. Cont’d

3. finding the extreme scores in a population (e.g., suppose an experimenter wanted to find some very good Frisbee players to use in an experiment. She decided to use the top 10 percent from the CFGC. What is the cut-off score?)

• interpretation: in order to be considered as a very good Frisbee player, you need to have a XXX score or higher.

4. finding the proportion of the population between two scores (e.g., what proportion would score between 25 and 30?)

• interpretation: there are XX.XX chances in 100 that players would score between 25 and 30.
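A sketch of the four questions answered with scipy.stats in place of Table C. It assumes that “the top 10 percent” means the lowest 10 percent of throw counts, since fewer throws is better in Frisbee golf:

```python
from scipy import stats

mu, sigma = 27, 4                      # College Frisbee Golf Course: mean 27 throws, SD 4
course = stats.norm(mu, sigma)

# 1. Proportion expected to score 22 or less
print(course.cdf(22))                  # about .11, i.e., 11 chances in 100

# 2. Of 950 players, how many score 20 or less?
print(950 * course.cdf(20))            # about 38 prize winners

# 3. Cut-off for the best 10 percent (assuming low scores are good)
print(course.ppf(0.10))                # roughly 21.9, so about 22 throws or fewer

# 4. Proportion scoring between 25 and 30
print(course.cdf(30) - course.cdf(25)) # about .46
```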

Page 46: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Samples, Sampling Distributions, and Confidence Intervals

Page 47: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Sampling

The essential idea of sampling is to learn about the whole by studying a part.

Two important terms:
1. population – the entire group of people or objects
2. sample – a (typically small) part of the population

• Biased sampling – a tendency to systematically overrepresent / underrepresent certain segments of the population
– convenience samples
– voluntary response samples

Page 48: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Random Samples

• Random sample – any method that gives every possible sample of size N an equal chance to be selected.

• A method of getting a random sample – a table of random numbers (Table B in Appendix C):
– each position is equally likely to be occupied by any one of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
– the occupant of any one position has no impact on the occupant of any other position

Page 49: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Sampling Distributions: Describing the Relationship between \bar{X} and \mu

• the expected value is the mean of a sampling distribution

• the standard error is its standard deviation

• the sampling distribution of the mean:
– every sample is drawn randomly from a specific population
– the sample size (N) is the same for all samples
– the number of samples is very large
– the mean (\bar{X}) is calculated for each sample
– the sample means are arranged into a frequency distribution

Page 50: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Central Limit Theorem

• The sampling distribution of the mean approaches a normal curve as N increases.

• If you know \mu and \sigma for the population:
– the mean of the sampling distribution (expected value) = \mu
– the standard deviation of the sampling distribution (standard error) = \sigma / \sqrt{N}

symbols:
\sigma_{\bar{X}} – the standard error of the mean
E(\bar{X}) – the expected value of the mean
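A small simulation sketch of the theorem (my own illustration, using a deliberately non-normal population):

```python
import numpy as np

rng = np.random.default_rng(0)

# A non-normal (uniform) population with known mu and sigma.
population = rng.uniform(0, 100, size=100_000)
mu, sigma = population.mean(), population.std()

N = 36                                            # sample size
sample_means = [rng.choice(population, N).mean() for _ in range(10_000)]

print(mu, np.mean(sample_means))                  # expected value of X-bar is close to mu
print(sigma / np.sqrt(N), np.std(sample_means))   # standard error is close to sigma / sqrt(N)
```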

Page 51: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Calculating the Standard Error of the Mean
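Stated in the terms used on the surrounding slides (with the sample estimate s replacing \sigma when \sigma is unknown, as in the t-distribution slides below):

\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}        s_{\bar{X}} = \frac{s}{\sqrt{N}}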

Page 52: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Determining Probabilities About Sample Means

A test of computer anxiety has a population mean of 65 and a standard deviation of 12. A random sample of size 36 is drawn from the population with a sample mean of 68. What is the probability of selecting a sample with a smaller mean?

1. Compute the standard error of the mean: 12 / \sqrt{36} = 12/6 = 2
2. Compute a standard score for the mean of 68: z = (68 - 65)/2 = 3/2 = +1.5
3. The area between the mean and a z score of +1.5 is .4332
4. Thus, the probability of a sample mean below 68 is .5000 + .4332 = .9332

z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}
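The same arithmetic, using scipy.stats.norm in place of Table C (a sketch; the numbers come from the example above):

```python
import math
from scipy import stats

mu, sigma, N, sample_mean = 65, 12, 36, 68

se = sigma / math.sqrt(N)                 # standard error of the mean = 2
z = (sample_mean - mu) / se               # +1.5
print(z, stats.norm.cdf(z))               # probability of a smaller mean: about .9332
```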

Page 53: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

The t distribution

• when you do not know \sigma and do not have population data to calculate it – use s as an estimate of \sigma

• the t distribution depends on the sample size, with a different distribution for each N:
– degrees of freedom: df = N – 1, ranging from 1 to ∞
– t values are similar to the z scores used with the normal curve
– as df increases, the t distribution approaches the normal distribution

• Table D in Appendix C:
– the first column shows the degrees of freedom
– the top row is for confidence intervals
– each column is associated with a probability level

Page 54: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Confidence Intervals

• a confidence interval establishes an interval within which a population parameter is expected to lie (with a certain degree of confidence)

• a 95 percent confidence interval for a population mean produces an interval statistic (lower and upper limits) that is destined to contain \mu 95 percent of the time

LL = \bar{X} - t(s_{\bar{X}})
UL = \bar{X} + t(s_{\bar{X}})

where \bar{X} is the mean of the sample from the population, t is a value from the t-distribution table, and s_{\bar{X}} is the standard error of the mean, calculated from a sample

Page 55: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

EXAMPLE: Confidence Intervals

There are several published tests of self-efficacy. Suppose that the population mean for a particular test is 20. A therapy class taught by a graduate student had just finished the term. A psychologist wanted to determine the effect of the course on students’ sense of self-efficacy. The following statistics were produced. Use a 95 percent confidence interval to analyze the data. Write a conclusion about the effect of the class.

• N = 36,  \sum X = 684,  \sum X^2 = 13,136

• Interpretation: The population mean for the test is 20, and the 95 percent confidence interval about the mean of all those taking the class is entirely below 20. You can conclude (with 95 percent confidence) that the class was not sufficient to increase self-efficacy.
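A sketch of the computation from the summary statistics above, using scipy for the critical t value; it produces the interval the interpretation refers to:

```python
import math
from scipy import stats

N, sum_x, sum_x2 = 36, 684, 13_136            # summary statistics from the slide

mean = sum_x / N                               # 19.0
ss = sum_x2 - sum_x ** 2 / N                   # sum of squares
s = math.sqrt(ss / (N - 1))                    # s = 2.0
se = s / math.sqrt(N)                          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=N - 1)          # two-tailed 95% critical value (about 2.03)

print(mean - t_crit * se, mean + t_crit * se)  # roughly 18.3 to 19.7, entirely below 20
```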

Page 56: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Hypothesis Testing: One-Sample Designs

Page 57: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Hypothesis Testing

• the Frito-Lay® company claims that bags of Doritos® tortilla chips contain 269.3 grams. When you buy a bag of chips, how much do you get?

\mu_0 – a parameter that describes the population claimed by the company (the company standard)
\mu_1 – a parameter that describes the population of actual weights

• there are three possible relationships between \mu_1 and \mu_0: \mu_1 < \mu_0, \mu_1 = \mu_0, \mu_1 > \mu_0

Page 58: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Outline

1. gather sample data from the population you are interested in and calculate a statistic
2. recognize two logical possibilities for the population:
– H0: a statement specifying an exact value for the parameter of the population
– H1: a statement specifying all other values for the parameter
3. using a sampling distribution that assumes H0 is correct, find the probability of the statistic you calculated
4. if the probability is very small, reject H0 and accept H1; if the probability is large, retain both H0 and H1, acknowledging that the data do not allow you to reject H0

Page 59: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Establishing a significance level: Setting Alpha (\alpha)

• the significance level is the choice of a probability level
• rejection region
• critical values: for example, t_{.05} (14 df) = 2.145
– the sampling distribution that was used (t)
– the \alpha level (.05)
– the degrees of freedom (14)
– the critical value from the table (2.145, for a two-tailed test)

Page 60: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

The one-sample t test

t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}};   df = N - 1,  where

\bar{X} is the mean of the sample
\mu_0 is the hypothesized mean of the population
s_{\bar{X}} is the standard error of the mean

Page 61: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

EXAMPLE 1: “Doritos bags”, one-sample t-test

1. the population mean is that of all bags; the sample mean (N = 8) is 270.675 grams

2. H0: the mean weight of Doritos bags is 269.3 grams, the weight claimed by the Frito-Lay company
   H1: the mean weight of Doritos bags is not 269.3 grams

3. the t distribution for 7 df (N – 1) (Table D in Appendix C)

4. if the probability is low – reject H0, meaning that the mean weight of Doritos bags is NOT 269.3 grams; if the probability is high – retain H0, meaning that the mean weight of Doritos bags could be 269.3 grams

Page 62: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

EXAMPLE 2: “Misinformation test”one-sample t-test

• A psychologist who taught the introductory psychology course always gave her class a “misinformation test” on the first day of class. Her test contained some commonly held incorrect beliefs about psychology. Over the years the mean number of errors on this test had been 21.50. The data for this year follow. Analyze the data with a t test, calculate the effect size index, and write a conclusion.

\sum X = 198,  \sum X^2 = 4700,  N = 11
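One way to run the calculation from these summary statistics (a sketch; d is computed here with the sample standard deviation in the denominator, which is my assumption, since the effect-size formula on the next slide is written with \sigma):

```python
import math
from scipy import stats

mu_0 = 21.50                            # long-run mean number of errors
N, sum_x, sum_x2 = 11, 198, 4700        # this year's summary statistics

mean = sum_x / N                                     # 18.0
s = math.sqrt((sum_x2 - sum_x ** 2 / N) / (N - 1))   # sample standard deviation
se = s / math.sqrt(N)                                # standard error of the mean

t = (mean - mu_0) / se
p = 2 * stats.t.sf(abs(t), df=N - 1)                 # two-tailed p value
d = abs(mean - mu_0) / s                             # effect size index

print(round(t, 2), round(p, 3), round(d, 2))
```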

Page 63: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Effect size index: How big is a difference?

d = \frac{|\bar{X} - \mu_0|}{\sigma},  where

\bar{X} is the mean of the sample
\mu_0 is the mean specified by the null hypothesis
\sigma is the standard deviation of the null-hypothesis population

• small effect: d = .20
• medium effect: d = .50
• large effect: d = .80

Page 64: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Type I and Type II errors

Four possibilities:

1. The H0 is true but test rejects it ( Type I error)

2. The H0 is false but test accepts it (Type II error)

3. The H0 is true and test accepts it ( correct decision )

4. The H0 is false and test rejects it (correct decision)

• the probability of a Type I error is denoted by \alpha (alpha)
• the probability of a Type II error is denoted by \beta (beta)

Page 65: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

One- and two-tailed tests

1. a two-tailed test of significance:

• the sample is from a population with a mean less than/greater than that of the Ho

2. a one-tailed test of significance:

• the sample is from a population with a mean less than that of the Ho

• the sample is from a population with a mean greater than that of the Ho

Page 66: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

\alpha, Type I Error, and the p-value

\alpha is the probability of a Type I error (e.g., \alpha = .05)

a Type I error occurs when we mistakenly reject H0

the p-value is the probability of obtaining the sample statistic actually obtained (or one more extreme), if H0 is true

Page 67: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Testing the statistical significance of correlation coefficients

• definitional formula:

t = \frac{r - \rho}{s_r},   where  s_r = \sqrt{\frac{1 - r^2}{N - 2}}   (\rho = 0 under H0)

• working formula:

t = r\sqrt{\frac{N - 2}{1 - r^2}};   df = N - 2,  where N = number of pairs

• Two uses of the t distribution:
– to test a sample mean against a hypothesized population mean
– to test a sample correlation coefficient against a hypothesized population correlation coefficient of .00
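A sketch that checks the working formula against scipy.stats.pearsonr, reusing the made-up X and Y pairs from the correlation sketch:

```python
import math
from scipy import stats

# The illustrative paired data used earlier (made-up values).
X = [2, 4, 5, 7, 9, 11, 12, 15]
Y = [1, 3, 6, 8, 8, 12, 14, 16]
N = len(X)

r, p_scipy = stats.pearsonr(X, Y)

# Working formula: t = r * sqrt((N - 2) / (1 - r^2)), df = N - 2
t = r * math.sqrt((N - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t), df=N - 2)

print(round(t, 3), round(p_manual, 4), round(p_scipy, 4))   # the two p values agree
```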

Page 68: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Hypothesis Testing: Two-Sample Designs

Page 69: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Logic

1. begin with two logical possibilities:
• H0: \mu_A = \mu_{no A} – treatment A does not have an effect; the difference between population means is zero
• H1: \mu_A \neq \mu_{no A} – treatment A does have an effect; the difference between population means is not zero
2. assume that Treatment A has no effect (that is, assume H0)
3. decide on an alpha level (e.g., \alpha = .05)
4. choose an appropriate inferential statistical test:
• a test statistic (e.g., a mean)
• a sampling distribution of the test statistic when H0 is true (e.g., the t distribution)
• a critical value for the alpha level
5. calculate the test statistic using the sample data
6. compare the test statistic to the critical value from the sampling distribution
7. write a conclusion

Page 70: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Independent - sample designs

H0: \mu_1 = \mu_2
H1: \mu_1 \neq \mu_2

t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}},  where

s_{\bar{X}_1 - \bar{X}_2} is the standard error of a difference

df = N_1 + N_2 - 2

assumptions:
1. the DV scores are normally distributed and have equal variances
2. the samples are randomly assigned/selected
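A sketch of the independent-samples test with scipy; the scores are made up, and scipy's default ttest_ind pools the variances, matching the equal-variance assumption above:

```python
from scipy import stats

# Illustrative scores for two independent groups (made-up values, not from the slides).
group_1 = [84, 79, 91, 73, 88, 95, 82, 77]
group_2 = [72, 80, 69, 85, 74, 78, 70, 81]

result = stats.ttest_ind(group_1, group_2)   # pooled-variance (equal variances) t test
print(result.statistic, result.pvalue)       # compare |t| with the critical value for df = N1 + N2 - 2
```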

Page 71: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

EXAMPLES: Two-sample t-tests, independent design

Example “Achievement Test Scores” (equal N's design)"Achievement test scores are declining all around us," brooded Professor Probity. "Not in my class," vouchsafed Professor Paragon. "Here are my final-exam scores for last year and this year on the same exam. Run your own t test on them. Calculate the effect size index." What did Probity discover?

Example “Yummi Very Vanilla” (unequal N's design) An experimenter randomly divided a group of volunteers into two groups. One group fasted for 24 hours and the other for 48 hours. One person in the 24-hour group dropped out of the study. Scores below represent the number of ounces of Yummi (Very Vanilla) consumed during the first 10 minutes after the fast. Perform a t test, calculate d, and write a sentence summary of your results.

Page 72: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Paired (correlated) - samples designs

• natural pairs – we do not assign the participants to one group or the other

• matched pairs – we do assign the participants to one group or the other

• repeated measures – more than one measure is taken on each participant (e.g., before-and-after experiment)

t = \frac{\bar{X} - \bar{Y}}{s_{\bar{D}}},  where

s_{\bar{D}} = \sqrt{s_{\bar{X}}^2 + s_{\bar{Y}}^2 - 2 r_{XY} (s_{\bar{X}})(s_{\bar{Y}})}

df = N - 1,  where N = number of pairs
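A sketch of the paired-samples test with scipy, using made-up before-and-after scores (a repeated-measures design):

```python
from scipy import stats

# Illustrative before-and-after scores for the same participants (made-up values).
before = [12, 15, 11, 18, 14, 16, 13, 17]
after  = [10, 14, 12, 15, 11, 14, 12, 15]

result = stats.ttest_rel(before, after)      # paired (correlated) samples t test, df = N - 1
print(result.statistic, result.pvalue)
```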

Page 73: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

EXAMPLE: “Rats Learning a Simple Maze”two-sample t-test, paired design

• A comparative psychologist was interested in the effect of X-irradiation on learning. A group of rats learned a simple maze and were paired on the basis of the number of errors. One group was X-irradiated; then both groups learned a new, more complicated maze, and errors were recorded. Identify the independent and dependent variables. Test for a difference between the two groups. Find the effect size index. Write a conclusion for the study.

Page 74: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Effect size index

• effect size index for independent samples:

d = \frac{|\bar{X}_1 - \bar{X}_2|}{\hat{s}}

when N_1 = N_2:  \hat{s} = \sqrt{N_1}\,(s_{\bar{X}_1 - \bar{X}_2})

when N_1 \neq N_2:  \hat{s} = \sqrt{\frac{\hat{s}_1^2(df_1) + \hat{s}_2^2(df_2)}{df_1 + df_2}}

• effect size index for paired (correlated) samples:

d = \frac{|\bar{X} - \bar{Y}|}{\hat{s}_D},   where  \hat{s}_D = \sqrt{N}\,(s_{\bar{D}})  and N is the number of pairs of participants

• interpretation of d:  d = .20 – small effect;  d = .50 – medium effect;  d = .80 – large effect

Page 75: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Establishing a confidence interval about a mean difference

• confidence interval for independent samples:
– LL = (\bar{X}_1 - \bar{X}_2) - t(s_{\bar{X}_1 - \bar{X}_2})
– UL = (\bar{X}_1 - \bar{X}_2) + t(s_{\bar{X}_1 - \bar{X}_2})
– interpretation: we can expect, with 95 percent confidence, that the true difference between [use the terms of your experiment] is between XX and AA.

• confidence interval for correlated samples:
– LL = (\bar{X} - \bar{Y}) - t(s_{\bar{D}})
– UL = (\bar{X} - \bar{Y}) + t(s_{\bar{D}})

Page 76: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Power

• Power = 1 - \beta
• Factors that affect power:
– effect size
– the standard error of a difference:
  • sample size
  • sample variability
– alpha (\alpha)

To allocate plenty of power, use large N’s!

Page 77: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Analysis of Variance: One-Way Classification

Page 78: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Example: “Social status and attitudes toward fate”, one-way ANOVA

• The following hypothetical data are scores from a test which measures a person's attitudes toward fate. (An example of such a test is Rotter's internal-external scale.) A high score indicates that the person views fate as being out of his or her control. Low scores indicate that he or she feels directly responsible for what happens. Test for differences among the three social classes.

• Fill in:
Independent variable: _________
Dependent variable: _________

Lower Class   Middle Class   Upper Class
12            13             6
10            4              5
11            9              8
9             6              2
7             12             2
3             10             1
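As a sketch, the same comparison can be run in software with the scores from the table above (the hand-calculation route via sums of squares and mean squares follows on the next slides):

```python
from scipy import stats

# Attitude-toward-fate scores by social class (data from the slide).
lower  = [12, 10, 11, 9, 7, 3]
middle = [13, 4, 9, 6, 12, 10]
upper  = [6, 5, 8, 2, 2, 1]

F, p = stats.f_oneway(lower, middle, upper)
print(round(F, 2), round(p, 4))          # compare F with the critical value for df = (2, 15)
```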

Page 79: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Analysis of Variance: One-Way Classification

F = \frac{\text{estimate of } \sigma^2 \text{ based on the variation between treatment means}}{\text{estimate of } \sigma^2 \text{ based on the variation within treatments}}

1. F is a ratio of two estimates of the population variance – a between-treatments variance over an error variance.

2. when H0 is true, both variances are good estimators of the population variance (the ratio is about 1)

3. when H0 is not true, the between-treatments variance overestimates the population variance (the ratio is greater than 1)

Page 80: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

New Terms

• Sum of squares (SS) – the sum of squared deviations, \sum (X - \bar{X})^2
• Mean square (MS) – the ANOVA term for a variance, \hat{s}^2
• Grand mean – the mean of all scores
• tot – the subscript stands for all the numbers in the experiment (X_{tot})
• t – the subscript applies to a treatment group (X_t)
• K – the number of treatments in the experiment (the number of levels of the IV)
• F test – the result of an analysis of variance

Page 81: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Factorial ANOVA

Page 82: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Analysis of Variance: Factorial Design

• Suppose you wanted to see whether the number of bedrooms, a house’s style, and the interaction between style and number of bedrooms have an effect on the house price.

Age of house:
               1 year    30 years   row mean
1 BR           279K      189K       234K
5 BR           500K      250K       375K
column mean    389.5K    219.5K

Style of house:
               traditional   modern   row mean
1 BR           250K          300K     275K
5 BR           500K          550K     525K
column mean    375K          425K

[Two line graphs on the slide plot these cell means (price on the Y axis, 0–600K): one showing 1 BR vs. 5 BR across age of house, the other showing 1 BR vs. 5 BR across style.]
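As a sketch, the first of the two plots can be redrawn from the cell means with matplotlib (my own code, assuming price in $K on the Y axis as in the slide):

```python
import matplotlib.pyplot as plt

# Cell means (in $K) from the age-of-house table above.
age_levels = ["1 year", "30 years"]
price_1br_by_age = [279, 189]
price_5br_by_age = [500, 250]

plt.plot(age_levels, price_1br_by_age, marker="o", label="1 BR")
plt.plot(age_levels, price_5br_by_age, marker="o", label="5 BR")
plt.ylabel("Mean price ($K)")
plt.ylim(0, 600)
plt.legend()
plt.title("Bedrooms x Age of house")
plt.show()   # non-parallel lines suggest an interaction between bedrooms and age
```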

Page 83: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Factorial Design Notation

• shorthand notations for factorial designs: “2 x 2”, “3 x 2”, etc.

                    Factor A
                    A1 (1 year)         A2 (30 years)       row mean
Factor B
B1 (1 BR)           Cell A1B1: 279K     Cell A2B1: 189K     234K
B2 (5 BR)           Cell A1B2: 500K     Cell A2B2: 250K     375K
column mean         389.5K              219.5K

Page 84: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Advantages of two-way ANOVA:

• valuable resources can be spent more efficiently by studying two factors simultaneously rather than separately

• we can investigate interactions between factors

Page 85: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Exercise 1

• Identify the response variable, factors, state the number of levels for each factor, and the total number of observations (N).

(a) A study of the productivity of tomato plants compares four varieties of tomatoes and two types of fertilizer. Five plants of each variety are grown with each type of fertilizer. The yield in pounds of tomatoes is recorded for each plant.

Page 86: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Exercise 2

• Identify the response variable, factors, state the number of levels for each factor, and the total number of observations (N).

(b) A marketing experiment compares six different types of packaging for a laundry detergent. A survey is conducted to determine the attractiveness of the packaging in four different parts of the country. Each type of packaging is shown to 30 different consumers in each part of the country, who rate the attractiveness of the product on a 1 to 10 scale.

Page 87: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Exercise 3

• Identify the response variable, factors, state the number of levels for each factor, and the total number of observations (N).

(c) To compare the effectiveness of three different weight-loss programs, five men and five women are randomly assigned to each program. At the end of the program, the weight loss for each of the participants is recorded.

Page 88: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Exercise 4

• Numbers in the cells are means based on 8 scores per cell.

• The data set is an example of:
a. a simple ANOVA
b. a 2 x 2 factorial ANOVA
c. a 3 x 3 factorial ANOVA

Page 89: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Exercise 5

• The figure that shows an interaction is

Page 90: Psychology 215: Statistics for Social Science Kate Bezrukova bezrukov@camden.rutgers.edu Introduction

Growth and age

• Imagine a two-way factorial design to study the following scientific hypothesis: “Toddlers get taller; adults don’t.” Here’s a quick summary:

Response: Height in inches
Factor 1: Age group – 2-year-olds and adults
Factor 2: Time – at the start of the study, and three years later

Make a two-way table summarizing the results you would expect to get from this study. Then draw and label a graph showing the interaction pattern.