a hypothetical research example


Page 1: A Hypothetical Research Example

A Hypothetical

Research Example

Is Hanx Writer better?

How would you design a research project to

answer this question?

Based on what criteria would you make a claim?

Better, no difference, worse

Page 2: A Hypothetical Research Example

Some Fundamental Issues

How can you make a claim, usually a general

statement of the whole world, based on

partial observation of the world?

Scientific research: to generate new knowledge

Knowledge needs to be as general as possible

What we can see is only part of the world

Key issues

Sample as a representation of population

Criteria for claiming positive, negative results

Page 3: A Hypothetical Research Example

Quantitative Data Analysis

Foundation

Page 4: A Hypothetical Research Example

Statistics,

Frequency Distributions

and Central Tendency

Page 5: A Hypothetical Research Example

Statistics

Page 6: A Hypothetical Research Example

Statistics

Mathematical procedures

Dealing with observed information

Organization, summarization, and

interpretation

Page 7: A Hypothetical Research Example

Populations and Samples

What are they?

Parameters and Statistics

What do they describe?

Descriptive and Inferential Statistical Methods

Different purposes of these two methods

Page 8: A Hypothetical Research Example

The relationship between a population

and a sample.

Parameters

Statistics

Descriptive Methods

Inferential Methods

Page 9: A Hypothetical Research Example

Sampling Error

Page 10: A Hypothetical Research Example

Implications

Sample statistics are

Representative

Not identical to the corresponding population parameters

Two different samples will have different statistics.

Differences can occur just by chance

Sampling error is inevitable!

Page 11: A Hypothetical Research Example

Statistics in the Context of

Research

Page 12: A Hypothetical Research Example
Page 13: A Hypothetical Research Example

Teaching methods and Test Scores

Two samples

Two statistics with difference

What causes the difference?

Sampling error

Teaching methods

Using statistics to infer the characteristics of

the population

What Is the Research Question

Here?

Page 14: A Hypothetical Research Example

Scientific Methods and

Research Design

Correlational Method

Measure two or more variables to see whether they are related

Page 15: A Hypothetical Research Example

Case   Income ($)   Education (years)

#1     125,000      19

#2     100,000      20

#3      40,000      16

#4      35,000      16

#5      41,000      18

#6      29,000      12

#7      35,000      14

#8      24,000      12

#9      50,000      16

#10     60,000      17

Correlation: .79
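The reported correlation can be checked by hand. The sketch below assumes the slide's value is Pearson's r, computed as the sum of products of deviations divided by the square root of the product of the two sums of squares. The function name `pearson_r` is illustrative, not from the slides.

```python
from math import sqrt

# Income (dollars) and years of education for the ten cases in the table.
income = [125_000, 100_000, 40_000, 35_000, 41_000,
          29_000, 35_000, 24_000, 50_000, 60_000]
education = [19, 20, 16, 16, 18, 12, 14, 12, 16, 17]


def pearson_r(xs, ys):
    """Pearson correlation: SP / sqrt(SS_x * SS_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


r_income_education = pearson_r(income, education)
print(round(r_income_education, 2))
```

A positive r means that higher income tends to go with more years of education in this sample. By itself it says nothing about cause and effect.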

Page 16: A Hypothetical Research Example
Page 17: A Hypothetical Research Example

Case   GPA   TV Time (hours/week)

#1     2.2   25

#2     3.5   21

#3     2.0   20

#4     2.9   15

#5     3.1   14

#6     3.2   13

#7     2.4   10

#8     3.4    9

#9     3.8    7

#10    3.7    4

Correlation: -.63

Page 18: A Hypothetical Research Example
Page 19: A Hypothetical Research Example

Scientific Methods and

Research Design

Experimental Method

Produce different sets of variables to see whether they

have a cause-and-effect relationship

Independent and Dependent Variables

Independent

Manipulated by researchers

Different treatments

Dependent: observed

Page 20: A Hypothetical Research Example
Page 21: A Hypothetical Research Example
Page 22: A Hypothetical Research Example
Page 23: A Hypothetical Research Example
Page 24: A Hypothetical Research Example
Page 25: A Hypothetical Research Example

Variables and Measurement

Constructs

Internal attributes or characteristics that cannot be

directly observed

Intelligence, self-esteem, teaching/learning results

Discrete and Continuous Variables

Discrete

No values between two neighboring values

Continuous

A new value can always be added between any two values

Real limits

Boundaries of a real score

Page 26: A Hypothetical Research Example

Scales of Measurement

The Nominal Scale

Different names for categories

No quantitative distinction

The Ordinal Scale

Ordered categories

The Interval Scale

Ordered categories

Intervals between categories are comparable

The zero point is just for convenience

The Ratio Scale

The interval scale with an absolute zero point

Temperature: What scales are they?

Fahrenheit, Celsius, Kelvin

Different statistical methods for different types of variables

Page 27: A Hypothetical Research Example

Exercise

A survey collects the following data

Age

Gender

Self-perception of weight level (overweight, normal,

underweight)

Satisfaction with services (on a scale of 1-5)

Weight loss

Identify their scales of measurement

Discrete or continuous?

Page 28: A Hypothetical Research Example

Statistical Notation

Summation Notation

Observed values or scores

$\Sigma X$: the sum of all X

How about others?

$\Sigma X^2$, $\Sigma (X+1)$, $\Sigma (X+1)^2$
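Order of operations is the usual stumbling block with summation notation. A minimal sketch, using three hypothetical scores that are not from the slides, shows how each expression is evaluated:

```python
scores = [1, 2, 3]  # hypothetical scores, chosen only for illustration

sum_x = sum(scores)                                      # sum of X
sum_x_squared = sum(x ** 2 for x in scores)              # square each X, then sum
square_of_sum = sum(scores) ** 2                         # sum first, then square
sum_x_plus_1 = sum(x + 1 for x in scores)                # add 1 to each X, then sum
sum_x_plus_1_squared = sum((x + 1) ** 2 for x in scores) # add 1, square, then sum

print(sum_x, sum_x_squared, square_of_sum, sum_x_plus_1, sum_x_plus_1_squared)
```

Note that the sum of squared scores and the squared sum are different quantities. The distinction matters later in the computational formula for the sum of squares.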

Page 29: A Hypothetical Research Example

Frequency Distributions

Page 30: A Hypothetical Research Example

Purpose for Frequency

Distributions

To organize research results so that

researchers can see what happened

A frequency distribution does not simply

summarize the scores, but rather shows the

entire set of scores.

Page 31: A Hypothetical Research Example

Frequency Distribution Tables

Two elements

Scores

Frequencies

Obtaining $\Sigma X$ from a Frequency Distribution Table

Score and frequency

Proportions and Percentages

Page 32: A Hypothetical Research Example

Grouped Frequency

Distribution Tables

Why do we need groups?

Age, income, …

It could be quite tricky

in selecting group intervals

Survey design

How many hours do you watch TV per week?

a) 0-10 b) 10-20 c) 20-30 d)30-40 e) above 40

a) 0-5 b) 5-10 c) 10-15 d)15-20 e) above 20

Page 33: A Hypothetical Research Example

Frequency Distribution Graphs

A pictorial presentation of a frequency

distribution

The X-axis (the abscissa): measurement scales

(categories)

Independent variables

The Y-axis (the ordinate): frequency

Dependent variables

Page 34: A Hypothetical Research Example

Graphs for Interval or Ratio

Data

Histograms

The width of the bar extends to the real limits.

Page 35: A Hypothetical Research Example

Graphs for Interval or Ratio

Data (Cont.)

Polygons

A continuous line

Page 36: A Hypothetical Research Example

Graphs for Nominal or Ordinal

Data

Bar Graphs

Similar to histograms but with space between bars

Page 37: A Hypothetical Research Example

Using Excel to Create Graphs

You need to have the Data Analysis package

installed

Histogram in Excel

Just generate the frequency distribution table

Use the Chart Wizard to create Histogram graph

You need to manually remove the space between bars

Page 38: A Hypothetical Research Example

Graphs for Population

Distributions

Relative Frequencies

Cannot get the absolute

frequency distribution

Smooth Curves

For numerical scores

measured by an interval

or ratio scale

Symmetrical vs. skewed

distribution

Page 39: A Hypothetical Research Example

Central Tendency

Page 40: A Hypothetical Research Example

Why do we need to measure

central tendency?

To identify the “average” or “typical” data

Products and services

Different measures in different distributions

and different types of data

Page 41: A Hypothetical Research Example

The Mean

Formulas

Population mean: $\mu = \frac{\Sigma X}{N}$

Sample mean: $M = \frac{\Sigma X}{n}$

The Weighted Mean

Different groups: $M = \frac{\Sigma X_1 + \Sigma X_2}{n_1 + n_2}$

How about more groups? $M = \frac{\Sigma(\Sigma X)}{\Sigma n}$

Different weights (GPA): $M = \frac{\Sigma XC}{\Sigma C}$

Page 42: A Hypothetical Research Example

Computing the Mean from a

Frequency Distribution Table

Distribution with frequency or percentage

Score Frequency

80 2

90 4

95 3

100 1

Score Percentage

80 20%

90 40%

95 30%

100 10%
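Both tables on this slide lead to the same mean. The sketch below weights each score by its frequency, and then by its proportion. The variable names are illustrative.

```python
# Score -> frequency, as in the first table.
freq_table = {80: 2, 90: 4, 95: 3, 100: 1}

n = sum(freq_table.values())                                # number of scores
sum_x = sum(score * f for score, f in freq_table.items())   # sum of all scores
mean_from_freq = sum_x / n

# Score -> proportion, as in the second table (20% written as 0.20).
pct_table = {80: 0.20, 90: 0.40, 95: 0.30, 100: 0.10}

# With proportions, the weights already sum to 1, so no division is needed.
mean_from_pct = sum(score * p for score, p in pct_table.items())

print(n, sum_x, mean_from_freq, round(mean_from_pct, 2))
```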

Page 43: A Hypothetical Research Example

Characteristics of the Mean

What may affect the mean?

Remember the formula to calculate the mean

$M = \frac{\Sigma X}{n}$

Page 44: A Hypothetical Research Example

The Median

What is a median?

Finding a median

An Odd Number of Scores

1, 1, 3, 4, 5, 6, 6, 6, 7

An Even Number of Scores

1, 1, 3, 4, 5, 6, 6, 6
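A minimal sketch of the simple rule for finding a median: take the middle score when the count is odd, and the average of the two middle scores when it is even. This assumes no interpolation within tied scores, and the function name `simple_median` is illustrative.

```python
def simple_median(scores):
    """Middle score (odd n) or mean of the two middle scores (even n)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


median_odd = simple_median([1, 1, 3, 4, 5, 6, 6, 6, 7])
median_even = simple_median([1, 1, 3, 4, 5, 6, 6, 6])
print(median_odd, median_even)
```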

Page 45: A Hypothetical Research Example

The Median, the Mean, and the

Middle

Which to use?

It all depends on what you mean by the middle.

Mean: a weighted middle

Median: an absolute middle

Mean and Median

1,2,3,4,5

1,2,3,4,55
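The two data sets differ only in their last score. A short sketch using Python's standard `statistics` module shows how one extreme score moves the mean but leaves the median unchanged:

```python
from statistics import mean, median

balanced = [1, 2, 3, 4, 5]
with_extreme_score = [1, 2, 3, 4, 55]

mean_balanced = mean(balanced)
median_balanced = median(balanced)
mean_extreme = mean(with_extreme_score)
median_extreme = median(with_extreme_score)

print(mean_balanced, median_balanced)
print(mean_extreme, median_extreme)
```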

Page 46: A Hypothetical Research Example

The Mode

The score with the greatest frequency

A distribution could have multiple modes.

Page 47: A Hypothetical Research Example

Selecting a Measure of Central

Tendency

When to Use the Median

Extreme Scores or skewed distributions

House prices

Undetermined Scores or Incomplete Data

Ordinal Scales

When to Use the Mode

Nominal Scales

You cannot rank scores.

Discrete Variables

The mean and median may be meaningless.

Page 48: A Hypothetical Research Example

Central Tendency and the

Shape of the Distribution

Symmetrical Distributions

The mean and the median are the same

Page 49: A Hypothetical Research Example

Central Tendency and the

Shape of the Distribution

Skewed Distributions

Page 50: A Hypothetical Research Example

Variability and

Probability

Page 51: A Hypothetical Research Example

Selecting An Olympian

An easy case

Page 52: A Hypothetical Research Example

Selecting An Olympian

Page 53: A Hypothetical Research Example

Picking a Stock

Stock A

Average annual return: 5% in the past 8 years

5%, 4%, 3%, 3%, 3%, 6%, 7%, 9%

Stock B

Average annual return: also 5% in the past 8 years

15%, 15%, -10%, 20%, -5%, 5%, -10%, 10%

Which one to choose?

Assuming you are not a risk-taker.
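The two stocks have the same mean return, so the mean alone cannot separate them. One way to compare them is the standard deviation of the yearly returns. The sketch below treats the eight years as the whole population (`pstdev`), which is an assumption made for illustration.

```python
from statistics import mean, pstdev

stock_a = [5, 4, 3, 3, 3, 6, 7, 9]            # annual returns in percent
stock_b = [15, 15, -10, 20, -5, 5, -10, 10]   # annual returns in percent

mean_a, mean_b = mean(stock_a), mean(stock_b)
sd_a, sd_b = pstdev(stock_a), pstdev(stock_b)

print(mean_a, mean_b)                  # same average return
print(round(sd_a, 2), round(sd_b, 2))  # very different spread
```

A risk-averse investor would lean toward the stock with the smaller spread.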

Page 54: A Hypothetical Research Example

Data Collected from the Real

World Is Always Noisy.

How to Decide the Quality of

Data?

How to Tell the Difference

between Different Data Sets?

Page 55: A Hypothetical Research Example

Variability

The spread of data

Purpose for measuring variability

Understanding the distribution of data

Evaluating performances

People or products

Page 56: A Hypothetical Research Example

Simple Measures of Variability

Range and Interquartile Range

Range

Difference between the maximum and the minimum

maximum – minimum + 1

Or maximum – minimum

Interquartile range

Difference between the first and third quartiles

Algorithm

1. Order the scores

2. Split into two equal sets

3. Find the median of each set

4. Get the difference between them

What is the interquartile range of the following data?

1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 70
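The four-step algorithm can be sketched as follows. One detail is not stated on the slide: with an odd number of scores, this sketch leaves the middle score out of both halves so that the two sets are equal in size. Other quartile conventions exist and can give slightly different answers.

```python
from statistics import median


def interquartile_range(scores):
    """IQR via the split-halves method (middle score dropped when n is odd)."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(scores)          # step 1: order the scores
    half = len(ordered) // 2
    lower = ordered[:half]            # step 2: split into two equal sets
    upper = ordered[-half:]
    q1 = median(lower)                # step 3: median of each set
    q3 = median(upper)
    return q3 - q1                    # step 4: difference between them


data = [1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 70]
iqr = interquartile_range(data)
print(iqr)
```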

Page 57: A Hypothetical Research Example

Deviation

The problem of using ranges to measure variability

Totally ignores the data in between

1, 11, 15, 19, 20, 24, 28, 34, 37, 47

1, 24, 24, 24, 24, 24, 24, 24, 24, 47

The problem of using the interquartile range

Totally ignores the data outside the range

1, 11, 15, 19, 20, 24, 28, 34, 37, 47

-100, 11, 15, 19, 20, 24, 28, 34, 37, 147

Deviation

Distance from the mean (the average of all data)

Get the deviation for every data point

Average absolute deviation

1, 3, 5, 7, 9: (4 + 2 + 0 + 2 + 4)/5 = 2.4

3, 3, 5, 7, 7: (2 + 2 + 0 + 2 + 2)/5 = 1.6
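The two worked examples can be reproduced with a short function. The name `average_absolute_deviation` is illustrative.

```python
def average_absolute_deviation(scores):
    """Mean of the absolute distances between each score and the mean."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


aad_wide = average_absolute_deviation([1, 3, 5, 7, 9])
aad_narrow = average_absolute_deviation([3, 3, 5, 7, 7])
print(aad_wide, aad_narrow)
```

Unlike the range, this measure uses every data point, so the second set comes out as less variable even though both sets have the same mean.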

Page 58: A Hypothetical Research Example

Variance

The mean of the squared deviations

$\sigma^2 = \Sigma (X-\mu)^2 / N$

$\mu$: the mean of the population (all existing data points)

N: the number of data points in the population

Average absolute deviation: $\Sigma |X-\mu| / N$

Why do we prefer variance over average absolute deviation?

Standard deviation $\sigma$

The square root of the variance

Page 59: A Hypothetical Research Example

Standard Deviation and

Variance For Population

Calculation steps

Deviations

Squared deviations

Sum of squares: SS

Variance: $\sigma^2$ (SS divided by N)

Standard deviation: $\sigma$
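The calculation steps can be followed one at a time in code. This sketch treats the scores 1, 3, 5, 7, 9 as a complete population, which is an assumption made only to keep the arithmetic small.

```python
from math import sqrt

population = [1, 3, 5, 7, 9]   # treated here as a whole population
N = len(population)
mu = sum(population) / N                        # population mean

deviations = [x - mu for x in population]       # step 1: deviations
squared = [d ** 2 for d in deviations]          # step 2: squared deviations
ss = sum(squared)                               # step 3: sum of squares (SS)
variance = ss / N                               # step 4: variance (divide by N)
std_dev = sqrt(variance)                        # step 5: standard deviation

print(deviations, ss, variance, round(std_dev, 3))
```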

Page 60: A Hypothetical Research Example

Computation Formula for the Sum

of Squared Deviations SS

The definitional formula is difficult to use and

may lead to rounding errors.

$SS = \Sigma (X-\mu)^2$

The computational formula is often used.

$SS = \Sigma X^2 - (\Sigma X)^2 / N$

They are equivalent.
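The equivalence follows from expanding the square and substituting the definition of the population mean (the sum of X divided by N). A short derivation, written with the same symbols as the formulas on this slide:

```latex
\begin{aligned}
SS &= \Sigma (X-\mu)^2 \\
   &= \Sigma X^2 - 2\mu\,\Sigma X + N\mu^2 \\
   &= \Sigma X^2 - \frac{2(\Sigma X)^2}{N} + \frac{(\Sigma X)^2}{N}
      \qquad \text{since } \mu = \frac{\Sigma X}{N} \\
   &= \Sigma X^2 - \frac{(\Sigma X)^2}{N}
\end{aligned}
```

For the scores 1, 3, 5, 7, 9, both forms give 165 - 625/5 = 40.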

Page 61: A Hypothetical Research Example

Standard Deviation and

Variance For Samples

Sample: a small portion of the population

The calculation of standard deviation and variance for samples is very similar to that for a population

The only difference

n-1 is used in calculating variance

The computational formula for the sum of squares still uses n.

Why not use n?

Page 62: A Hypothetical Research Example

Degrees of Freedom

Not all samples are random!

Statistics

Assume observed data are random, but follow certain

rules

Need to make adjustment for nonrandom data

For a sample with n scores, only n-1 scores

are truly independent.

The number of truly independent deviations

All samples are biased.

Underestimate or overestimate parameters.

Page 63: A Hypothetical Research Example

Example of Underestimated

Variance

N = 6 (0,0,3,3,9,9), n = 2
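The underestimate can be demonstrated by listing every possible sample. The sketch below assumes samples of size 2 drawn with replacement, which the slide does not state explicitly. It averages the sample variance over all 36 ordered samples, once dividing by n and once by n - 1.

```python
from itertools import product

population = [0, 0, 3, 3, 9, 9]
N = len(population)
mu = sum(population) / N
pop_variance = sum((x - mu) ** 2 for x in population) / N


def sum_of_squares(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


n = 2
samples = list(product(population, repeat=n))   # all 36 ordered samples

# Average sample variance when dividing by n (biased) and by n - 1 (unbiased).
avg_var_divide_by_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_var_divide_by_n_minus_1 = (
    sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
)

print(pop_variance, avg_var_divide_by_n, avg_var_divide_by_n_minus_1)
```

Dividing by n gives an average that is only half the population variance here, while dividing by n - 1 recovers it on average.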

Page 64: A Hypothetical Research Example

Adjustment (df) Is Necessary

Biased sample vs. unbiased sample

Biased statistics vs. unbiased statistics

Page 65: A Hypothetical Research Example

Standard Deviation in

Descriptive and Inferential

Statistics

Descriptive

What is going on and how spread out the data are

Mean and standard deviation

Inferential

What may come? In particular, how likely will extreme scores be observed?

Probability as a function of the mean and standard deviation

But, be careful! Lessons learned from financial markets

Probability of the market crash in October 1987

Probability of the fall of Long-Term Capital Management in 1998

When Genius Failed: The Rise and Fall of Long-Term Capital Management

Page 66: A Hypothetical Research Example

Population vs. Sample

Notations

N vs. n (size)

$\mu$ vs. M (mean)

$\sigma^2$ vs. $s^2$ (variance)

Page 67: A Hypothetical Research Example

z-Score:

Location of real

scores in a

standardized

distribution

Page 68: A Hypothetical Research Example

Why Do We Need z-Scores?

To help compare scores from different distributions

To help compute probability

Examples

Two students’ grades from two classes 80 vs. 90

80 (M=70, s=10) vs. 90 (M=85, s=5)

How good is a test score? GRE Verbal: 160 (percentile 85)

Standardized scores

Probability
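The two-classes example can be worked out directly. In this sketch each grade is converted to a z-score using its own class mean and standard deviation. The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd


z_class_1 = z_score(80, mean=70, sd=10)   # grade of 80 in the first class
z_class_2 = z_score(90, mean=85, sd=5)    # grade of 90 in the second class
print(z_class_1, z_class_2)
```

Both grades sit one standard deviation above their class mean, so relative to classmates the two students did equally well even though the raw grades differ.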

Page 69: A Hypothetical Research Example

A z-Score

Tells the location of a score in a standardized

distribution

Sign: above or below the average

Number: distance to the mean

Formula

$z = \frac{X - \mu}{\sigma}$ (population), $z = \frac{X - M}{s}$ (sample)

z

How far is a score from the mean, measured by the

standard deviation?

Page 70: A Hypothetical Research Example

Examples

Given $\mu = 500$, $\sigma = 100$ for all SAT scores

The z-score of an SAT score of 620

The score if z = -0.3

In a standardized test ($\sigma = 100$), if X = 720, z = 1.2

Calculate the mean of the test

$z = \frac{X - \mu}{\sigma}$
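Each of the three exercises rearranges the same formula to solve for a different unknown. A minimal sketch, with illustrative function names:

```python
def z_from_score(x, mean, sd):
    """z = (X - mean) / sd"""
    return (x - mean) / sd


def score_from_z(z, mean, sd):
    """X = mean + z * sd"""
    return mean + z * sd


def mean_from_score_and_z(x, z, sd):
    """mean = X - z * sd"""
    return x - z * sd


z_620 = z_from_score(620, mean=500, sd=100)             # SAT score of 620
score_for_z = score_from_z(-0.3, mean=500, sd=100)      # score when z = -0.3
test_mean = mean_from_score_and_z(720, z=1.2, sd=100)   # mean when X = 720, z = 1.2

print(z_620, round(score_for_z, 1), round(test_mean, 1))
```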

Page 71: A Hypothetical Research Example

Using a Distribution Graph

For a distribution with a standard deviation of $\sigma = 4$, a score of X = 52 corresponds to a z-score of -2.0. What is the mean for this distribution?

Page 72: A Hypothetical Research Example

Using z-Scores to Standardize

a Distribution

Questions

Can we compare students’ GRE scores obtained in different years? Say last year vs. two years ago?

Different groups of students in test

Different test questions

Are they really comparable?

Yes!

GRE scores are standardized

Score distribution is standardized each time.

Page 73: A Hypothetical Research Example

Standardizing a Distribution

Relabeling each score

Page 74: A Hypothetical Research Example

How to Standardize a Score?

Convert a score under a distribution, with any

mean and standard deviation, to a z-score

under a standardized distribution, with a

mean of 0 and a standard deviation of 1

Convert the z-score to a score under a

standardized distribution, with a

predetermined mean and standard deviation.
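The two steps can be sketched as one small function. All numbers below are hypothetical and chosen only to make the arithmetic easy to follow: a raw score of 43 on a test with mean 57 and standard deviation 14, re-expressed on a scale with mean 50 and standard deviation 10.

```python
def standardize(x, old_mean, old_sd, new_mean, new_sd):
    """Re-express a score on a new scale with a chosen mean and sd."""
    z = (x - old_mean) / old_sd       # step 1: raw score -> z-score
    return new_mean + z * new_sd      # step 2: z-score -> new standardized score


# Hypothetical example values, not taken from the slides.
new_score = standardize(43, old_mean=57, old_sd=14, new_mean=50, new_sd=10)
print(new_score)
```

The score keeps its relative position (one standard deviation below the mean) on both scales.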

Page 75: A Hypothetical Research Example

Probability

Page 76: A Hypothetical Research Example

Probability

Probability definition

Chance, odds, proportion

What is the probability of drawing a King from a deck of cards?

Page 77: A Hypothetical Research Example

Random Sampling

Each individual of the population has the

same chance of being selected

Constant probability for each and every

selection in case of repetitive samplings

Previously selected samples must be returned to

the population!

Page 78: A Hypothetical Research Example

The General Formula

Example: coin toss

$\text{Probability of } A = \frac{\text{number of outcomes classified as } A}{\text{total number of possible outcomes}}$
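The general formula is a simple ratio when all outcomes are equally likely. The sketch below applies it to a coin toss and to drawing a King, assuming a fair coin and a standard 52-card deck with four Kings. The helper name `probability` is illustrative.

```python
from fractions import Fraction


def probability(outcomes_in_a, total_outcomes):
    """Outcomes classified as A divided by all possible outcomes."""
    return Fraction(outcomes_in_a, total_outcomes)


p_heads = probability(1, 2)    # one of two equally likely sides
p_king = probability(4, 52)    # four Kings among 52 cards

print(p_heads, p_king, float(p_king))
```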

Page 79: A Hypothetical Research Example

Using Frequency

Distributions to Calculate

Probability

Probability = Proportion

Page 80: A Hypothetical Research Example

More Complicated

Distributions?

[Figure: frequency distribution graph; X-axis scores 0 to 5, Y-axis frequency 0 to 600]

Page 81: A Hypothetical Research Example

What If Distributions Are

Smooth Curves?

Same technique: to find

the proportion

But how?

No blocks to count.

Calculus


Page 82: A Hypothetical Research Example

Probability and the Normal

Distribution

The normal distribution

A particular shape

Can describe many phenomena if the sample is big enough

The bell curve

Symmetrical

Single mode in the middle

Simulation (link)

Page 83: A Hypothetical Research Example

Proportion of a z-Score in

the Normal Distribution

Has Been Pre-Calculated

Page 84: A Hypothetical Research Example

The Unit Normal Table

Up to z = 4.00

Page 85: A Hypothetical Research Example

Example

p (z>1.0)

p (z<1.5)

p (z<-0.5)

More

p (1<z<1.5)

p (-0.5<z<1.5)

Different tables for different distributions

Normal distribution is most often seen.
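The same proportions that the unit normal table lists can be computed with `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). This is a sketch for checking table lookups, not a replacement for learning to read the table.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0, sigma=1)

# cdf(z) is the proportion of the distribution below z.
p_above_1 = 1 - unit_normal.cdf(1.0)                        # p(z > 1.0)
p_below_1_5 = unit_normal.cdf(1.5)                          # p(z < 1.5)
p_below_neg_0_5 = unit_normal.cdf(-0.5)                     # p(z < -0.5)
p_1_to_1_5 = unit_normal.cdf(1.5) - unit_normal.cdf(1.0)    # p(1 < z < 1.5)
p_neg_0_5_to_1_5 = unit_normal.cdf(1.5) - unit_normal.cdf(-0.5)  # p(-0.5 < z < 1.5)

for p in (p_above_1, p_below_1_5, p_below_neg_0_5, p_1_to_1_5, p_neg_0_5_to_1_5):
    print(round(p, 4))
```

Results obtained by subtracting rounded table entries can differ from these values in the fourth decimal place.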

Page 86: A Hypothetical Research Example

Finding the z-Score from a

Probability

What if you cannot find the exact number?

Use the closest z-score

Interpolate

Page 87: A Hypothetical Research Example

Probabilities For Scores from A

Normal Distribution

Transform scores to z-scores and then look them up in the unit normal table

p(55<x<65)=?

Page 88: A Hypothetical Research Example

Find a Score for a Particular

Probability

Look up the unit normal table for a z-score,

and then find a score in a distribution which

corresponds to the z-score

What is the minimum score necessary to be in the top 15%?

Page 89: A Hypothetical Research Example

You Can Estimate Where Your

IQ Stands

$\mu_{IQ} = 100$

$\sigma_{IQ} = 15$
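With the mean and standard deviation given, both directions of lookup can be sketched in code: from a score to a proportion, and from a proportion back to a score. The IQ of 130 and the top-15% cutoff are illustrative choices, and the sketch assumes IQ scores are normally distributed.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # assumes a normal distribution of IQ scores

# Score -> proportion: what share of people score below an IQ of 130?
share_below_130 = iq.cdf(130)

# Proportion -> score: minimum IQ needed to be in the top 15%,
# i.e. the score with 85% of the distribution below it.
top_15_cutoff = iq.inv_cdf(0.85)

print(round(share_below_130, 4), round(top_15_cutoff, 1))
```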

Page 90: A Hypothetical Research Example

Importance of Probability to

Research

Compare samples with the population mean

Does a particular sample belong to the population?

How sure is it?

Page 91: A Hypothetical Research Example

Probabilities and

Samples:

Distribution of

Sample Means

Page 92: A Hypothetical Research Example

Sampling Error

Page 93: A Hypothetical Research Example

Samples Always Have Errors!

How Can We Infer Population Parameters based on One or

A Few Samples?

Page 94: A Hypothetical Research Example

Recall Our Examples on GRE

and SAT Scores

We can make predictions

Using statistics

Mean, standard deviation

For any variable, if we know the mean and

standard deviation, we will have a way to deal

with it

Sample statistics can be treated in the same way.

Mean of sample means: the mean of all possible sample means

Variance of sample means: the variance of all possible sample means

Page 95: A Hypothetical Research Example

Distribution of Sample Means

A frequency distribution of sample means

Including all the possible samples with a

particular sample size n

The distribution of statistics

A sampling distribution

With a specific sample size

Page 96: A Hypothetical Research Example

Example

The population: 2,4,6,8

Sample size: n=2

Random sample
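For a population this small, the full distribution of sample means can be listed. The sketch assumes sampling with replacement, so there are 16 ordered samples of size 2.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [2, 4, 6, 8]
n = 2

# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Frequency distribution of the sample means.
mean_counts = Counter(sample_means)

# Mean and standard deviation of the distribution of sample means.
mean_of_means = sum(sample_means) / len(sample_means)
sd_of_means = sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
)

# Population parameters for comparison.
mu = sum(population) / len(population)
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))

for value in sorted(mean_counts):
    print(value, mean_counts[value])
print(mean_of_means, round(sd_of_means, 3), round(sigma / sqrt(n), 3))
```

The sample means pile up around the population mean of 5, and their standard deviation matches the population standard deviation divided by the square root of n.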

Page 97: A Hypothetical Research Example
Page 98: A Hypothetical Research Example
Page 99: A Hypothetical Research Example
Page 100: A Hypothetical Research Example

What Do We Get?

The sample means pile up around the

population mean.

The distribution of sample means is like a

normal distribution.

It is more likely to get a sample mean close to

the population mean.

What is the probability of getting an extreme sample mean?

Page 101: A Hypothetical Research Example

Central Limit Theorem

For a population ($\mu$, $\sigma$), the distribution of sample means for sample size n will have a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$

It is about any population

The shape of distribution does not matter.

The mean and standard deviation do not matter.

Page 102: A Hypothetical Research Example

Shape of Distribution

Would be a normal distribution if

The population is a normal distribution, or

The sample size is large enough, say larger than

30

Page 103: A Hypothetical Research Example

The Mean of Sample Means

The expected value of M

It is equal to the population mean

Page 104: A Hypothetical Research Example

The Standard Deviation of

Sample Means

The standard error of M

Standard distance between an M and $\mu$

Notation: $\sigma_M$

The larger the sample size is, the smaller the

standard error of M is.

The law of large numbers

The larger the sample size is, the more probable the

sample means will be close to the population mean.

The more unlikely a sample mean is very far away

from the population mean.

Page 105: A Hypothetical Research Example

Distributions of Sample Means

for A Normal Distribution

[Figure: distributions of sample means for sample sizes n1, n2, and n3]

Page 106: A Hypothetical Research Example

A Non-Normal Distribution

Page 107: A Hypothetical Research Example

Distributions of Sample Means

[Figure: distributions of sample means for sample sizes n1 and n2]

Page 108: A Hypothetical Research Example

Implication

The probability of sample means can be

estimated by using z-scores and the unit

normal table

Variables:

A sample mean

The population mean

The standard error

The population standard deviation and sample size

Page 109: A Hypothetical Research Example

Example: $\mu = 500$, $\sigma = 100$

Given n = 25, what is the probability of getting a sample mean larger than 540?
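This example can be worked in three steps: compute the standard error, convert the sample mean to a z-score, and find the proportion beyond it. The sketch assumes the distribution of sample means is normal and uses `NormalDist` in place of the unit normal table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25

standard_error = sigma / sqrt(n)           # standard deviation of sample means
z = (540 - mu) / standard_error            # z-score of a sample mean of 540
p_mean_above_540 = 1 - NormalDist().cdf(z) # proportion of sample means above 540

print(standard_error, z, round(p_mean_above_540, 4))
```

A sample mean this far above the population mean would be expected in only about 2 out of 100 samples of this size.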

Page 110: A Hypothetical Research Example

Standard Error

Like standard deviation

Measure the standard distance between a sample mean and the population mean

Provide information about sample error

Very often, we don’t know the population mean

All we have are sample means and standard errors.

How much do we know about the population mean based on the sample means?

Page 111: A Hypothetical Research Example

Example

Comparing a new teaching method with the

traditional method based on testing scores

New Tradition

Page 112: A Hypothetical Research Example
Page 113: A Hypothetical Research Example

Important Concepts

Variability

Variance

Standard deviation

Population and sample

z-Score

Scores, mean and the standard deviation

Probability and frequency distribution

z-Score and probability

Use the unit normal table

Sampling distribution and standard errors

Probability of sample means

Page 114: A Hypothetical Research Example

Go Back to the Hanx Writer

Example

What the research project is about

Assuming two populations

Hanx Writer user population

Normal keyboard user population

Obtaining one sample from each population

Using the means from two samples to estimate the populations

The central question:

How likely is it that the sample from the Hanx Writer population is actually a sample from the normal keyboard population (which would mean no difference)?

If the probability is low, the sample is more likely from a different population.

Otherwise, we cannot rule out the possibility that the two samples are from the same population, the normal keyboard users.

Page 115: A Hypothetical Research Example

Homework

On CANVAS

With red rectangles