a bar chart of a quantitative variable with only a few categories (called a discrete variable)...

Upload: crystal-smallridge

Post on 11-Dec-2015

A bar chart of a quantitative variable with only a few categories (called a discrete variable) communicates the relative number of subjects with each of the possible responses.

However, the bar chart does not graphically distinguish between quantitative and qualitative variables.

Once we looked at the variable label and the values, we would realize that this is a quantitative variable, but it would take that extra work to understand it.

04/18/23 Slide 1

If the quantitative variable has a large number of categories (called a continuous variable), the bar chart provides little information beyond the fact that there are a lot of different values, and some occur more frequently than others.

04/18/23 Slide 2

Histograms are the preferred graph for quantitative variables. While the bars resemble those of a bar chart, histograms are distinguished by the absence of gaps between consecutive bars.

For continuous variables, values are grouped in equally spaced intervals to convey a sense of what the distribution looks like.
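
The grouping into equally spaced intervals can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how SPSS builds its histograms: the function name `histogram_counts`, the choice of four intervals, and the scores themselves are assumptions made up for the example.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall in each of n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes the values are not all identical
    counts = [0] * n_bins
    for v in values:
        # The largest value is placed in the last interval instead of opening a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical scores, invented for illustration only.
scores = [10, 15, 28, 30, 34, 41, 45, 49, 50, 55, 62, 71, 90]
edges, counts = histogram_counts(scores, 4)
print(edges)   # interval boundaries
print(counts)  # number of scores in each interval
```

Each count corresponds to the height of one histogram bar. Changing the number of intervals changes the picture of the distribution, which is why the choice of interval width matters.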

04/18/23 Slide 3

While we used counts and percents to describe the distribution of a qualitative variable, we use statistical measures to describe the center, spread, and shape of a quantitative variable.

Measures of central tendency identify a value in the center of the distribution.

Measures of variability or dispersion summarize how the values for individual cases are spread out around the measure of central tendency.

04/18/23 Slide 4

There are two measures of the shape of the distribution: skewness and kurtosis.

Many of the statistics we will use assume that the distribution of a variable is bell-shaped, i.e., that it follows the normal distribution.

Skewness measures the symmetry of the distribution on both sides of the average score for the distribution.

Having overlaid a blue normal curve on the distribution of this variable, we can see that the bars on either side of the red center line are similar as one moves away from the center.

Kurtosis measures the degree to which the distribution is peaked or flat compared to the normal distribution. In this example, the bars at the center of the distribution are close to what would be expected for a normal distribution and the frequencies decrease as we move away from the center.

04/18/23 Slide 5

Both of these variables have a problem with skewness, caused by atypical scores at one end of the distribution.

Skewness is characterized as negative or positive, depending on which side, or tail, of the distribution has the unusual scores.

This is an example of negative skewness, where a few small scores have elongated the left tail of the distribution. The tail on the right is truncated.

This is an example of positive skewness, where a few large scores have elongated the right tail of the distribution. The tail to the left is truncated.

04/18/23 Slide 6

Both of these variables have a problem with kurtosis, caused by either too few cases in the center of the distribution, or too many cases in the center of the distribution.

This is an example of negative kurtosis, where the scores are uniformly distributed through the range of scores. The kurtosis statistic will have a negative value.

This is an example of positive kurtosis, where the scores are heavily concentrated in the center of the distribution. The kurtosis statistic will have a positive value.
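
The signs of the two shape statistics can be checked with a short Python sketch. It uses the simple moment-based formulas (third and fourth moments divided by powers of the variance) as an assumption. Statistical packages typically apply small-sample adjustments, so their output will not match these values exactly. The function name and the tiny data sets are invented for illustration.

```python
def shape_statistics(values):
    """Return (skewness, excess kurtosis) from simple moment formulas."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (divisor n)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 corresponds to the normal distribution
    return skewness, excess_kurtosis


# Hypothetical data sets, invented for illustration only.
right_tail = [1, 1, 1, 2, 10]                # a few large scores
left_tail = [1, 9, 10, 10, 10]               # a few small scores
flat = [1, 2, 3, 4, 5]                       # scores spread evenly over the range
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]     # scores concentrated in the center

for name, data in [("right_tail", right_tail), ("left_tail", left_tail),
                   ("flat", flat), ("peaked", peaked)]:
    skew, kurt = shape_statistics(data)
    print(f"{name}: skewness={skew:.2f}, kurtosis={kurt:.2f}")
```

The right-tailed data give a positive skewness and the left-tailed data a negative one. The evenly spread data give a negative kurtosis and the concentrated data a positive one, matching the descriptions above.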

04/18/23 Slide 7

When the distribution has minimal skewness and is symmetric, both the red mean line and the green median line fall in the center of the distribution.

There are two measures of central tendency for quantitative variables: the mean and the median.

The mean is the average score.

The median is the middle score, i.e. half of the scores are higher and half are lower.

While both measures reflect the center of the distribution, the mean is the preferred measure because it uses information for all of the cases in the distribution.

For each measure of centrality, there is a corresponding measure of spread. The standard deviation is used with the mean, and the interquartile range is used with the median.

04/18/23 Slide 8

When skewing is present, the red mean line moves away from the center of the distribution (identified by the green median line) in the direction of the skewness.

At some level of skewness, the median becomes more effective at representing the center of the distribution.

The issue is selecting a defensible rule for deciding the dividing line between acceptable skewness and problematic skewness.

The rule of thumb that we will use is that skewness less than -1.0 or greater than +1.0 is problematic and indicates that the median is the preferred measure.
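
Written as code, the rule of thumb is a single comparison. The sketch below is a hedged illustration: the function name `preferred_measures` and the two skewness values passed to it are hypothetical, and a skewness of exactly -1.0 or +1.0 is treated as acceptable because the rule speaks of values less than -1.0 or greater than +1.0.

```python
def preferred_measures(skewness, limit=1.0):
    """Apply the rule of thumb: outside -limit..+limit, prefer the median."""
    if abs(skewness) > limit:
        return ("median", "interquartile range")
    return ("mean", "standard deviation")


# Hypothetical skewness values, chosen only to show both outcomes.
print(preferred_measures(0.35))   # within the acceptable range
print(preferred_measures(-1.40))  # problematic negative skewness
```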

04/18/23 Slide 9

Kurtosis does not affect the location of the measure of central tendency.

Kurtosis indicates that there are either more cases than expected in the middle of the distribution (positive kurtosis), or fewer cases than expected (negative kurtosis).

04/18/23 Slide 10

The bars extending above the normal curve overlay indicate that there is positive kurtosis. A distribution with positive kurtosis is characterized as a “peaked distribution.”

When the bars fall below the center of the normal curve overlay, the distribution has negative kurtosis, and is referred to as a flat distribution.

04/18/23 Slide 11

• The homework problems on central tendency and variability focus on describing the distribution of quantitative variables.

• The counts and percents that we used for qualitative variables are not effective for quantitative variables that can have many different scores in the distribution.

• We describe the distribution of quantitative variables with summary statistics that try to communicate the value on which the distribution is centered, the spread of the values from the center of the distribution, the symmetry of the distribution around the center measure, and the degree to which the distribution is bell-shaped or flat.

04/18/23 Slide 12

• The center, or central tendency, of the distribution is usually represented by the mean (average score) or the median (middle score) of the distribution.

• The standard deviation is used as the measure of spread (variability or dispersion) that is paired with the mean. It summarizes the typical distance between the mean and each of the scores in the distribution (technically, the square root of the average squared deviation from the mean).

• The range and interquartile range are used to measure the spread around the median. The range is the difference between the highest score and lowest score. The interquartile range is the difference between the highest and lowest score when the smallest 25% and the largest 25% of the scores are removed from the distribution.
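
The measures named in these bullets can be computed with Python's standard `statistics` module. This is a sketch on a made-up set of seven scores, not the course data. Note also that software packages use slightly different conventions for locating quartiles, so the interquartile range from another program may differ a little on the same data.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(scores)
std_dev = statistics.stdev(scores)       # sample standard deviation (divisor n - 1)
median = statistics.median(scores)
score_range = max(scores) - min(scores)  # highest score minus lowest score

# quantiles(..., n=4) returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                            # spread of the middle 50% of the scores

print(f"mean={mean}, standard deviation={std_dev:.2f}")
print(f"median={median}, range={score_range}, interquartile range={iqr}")
```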

04/18/23 Slide 13

• Both the mean and the median can be computed for the values in the distribution of any quantitative variable.

• However, the degree to which one or the other is a “good” measure or indicator of the central tendency of a distribution differs with the shape of the distribution, specifically the symmetry of the distribution as measured by skewness.

• If the distribution is symmetric, both the mean and the median fall in the center of the distribution. The mean is the preferred measure because it uses all of the cases in the distribution in its calculation, and because it can be used in a broader range of statistical tests.

• If the distribution is not symmetric, the median stays in the middle of the distribution, but the mean is pulled away from the center toward one of the tails of the distribution.
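
A small numeric sketch shows the mean being pulled toward a tail while the median stays put. Both data sets are hypothetical; the second differs from the first only in that its largest score has been replaced by an extreme value.

```python
import statistics

# Hypothetical scores, invented for illustration only.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # one extreme high score stretches the right tail

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(f"{name}: mean={statistics.mean(data)}, median={statistics.median(data)}")
```

In the symmetric set the mean and median are both 5. In the skewed set the median is still 5, but the mean rises to 14, a value larger than eight of the nine scores.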

04/18/23 Slide 14

• The degree of symmetry of a distribution of scores for a quantitative variable can vary quite widely.

These six histograms show progressively increasing skewness. At what point do we choose the median over the mean?

04/18/23 Slide 15

• There is no universally accepted criterion for the amount of skewness that dictates a preference for the median.

• Most agree that we should be concerned with substantial skewness and ignore minor departures from symmetry, but there is no agreement on what counts as substantial.

• One rule of thumb indicates that a distribution has a substantial skewness problem when the size of the skew statistic is more than twice its standard error (in the SPSS output).

• The rule of thumb that I have used and which will be used for the problems is that skewness is a problem if it is less than -1 for negatively skewed distributions or greater than +1 for positively skewed distributions.

04/18/23 Slide 16

Top row:

The skewness for this histogram is 0.35.

The skewness for this histogram is 0.84.

The skewness for this histogram is 0.94.

Bottom row:

The skewness for this histogram is 1.09.

The skewness for this histogram is 1.33.

The skewness for this histogram is 1.86.

By my rule of thumb, we would use the mean as the measure of central tendency for the top row, and the median for the bottom row. That the rule is arbitrary is shown by the similarity of the last chart on the top row to the first chart on the bottom row.
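
Applying the rule to the six skewness values from the histograms, listed here in increasing order, makes the arbitrary boundary visible. This is a minimal sketch: the variable names are made up, and the cutoff of 1.0 is the rule of thumb stated for the problems.

```python
# Skewness values reported for the six histograms, in increasing order.
skewness_values = [0.35, 0.84, 0.94, 1.09, 1.33, 1.86]

# Rule of thumb: skewness beyond -1.0 or +1.0 means the median is preferred.
choices = ["median" if abs(s) > 1.0 else "mean" for s in skewness_values]

for s, choice in zip(skewness_values, choices):
    print(f"skewness {s:.2f}: report the {choice}")
```

The histograms with skewness 0.94 and 1.09 land on opposite sides of the cutoff even though their values differ by only 0.15.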

The introductory statement in the question indicates:

• The data set to use (GSS200R)

• The statistic to use (central tendency and dispersion)

• The variable to use in the analysis (occupational prestige score [prestg80])

04/18/23 Slide 17

The first statement for us to evaluate concerns the number of valid and missing cases. To answer this question, we produce the descriptive statistics in SPSS.

04/18/23 Slide 18

To compute the measures of central tendency and dispersion in SPSS, select the Descriptive Statistics > Explore command from the Analyze menu.

Measures of central tendency and variability can also be computed with the Frequencies and Descriptives commands.

04/18/23 Slide 19

Move the variable for the analysis, prestg80, to the Dependent List list box.

Click on the Statistics button to select optional statistics.

04/18/23 Slide 20

The check box for Descriptives is already marked by default.

Mark the Percentiles check box. This will provide the upper and lower bounds for the interquartile range.

Click on the Continue button to close the dialog box.

04/18/23 Slide 21

After returning to the Explore dialog box, click on the OK button to produce the output.

04/18/23 Slide 22

The 'Case Processing Summary' in the SPSS output showed the total number of valid cases to be 255 and the number of missing cases to be 15.
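
The valid and missing counts are a simple tally. The sketch below first counts them in a short hypothetical list, where `None` stands in for a missing response, and then works out the totals implied by the counts reported in the output (255 valid, 15 missing). The list and the variable names are assumptions for illustration only.

```python
# Hypothetical responses; None marks a missing value.
responses = [51, None, 36, 44, None, 29, 62]

valid = [r for r in responses if r is not None]
n_valid = len(valid)
n_missing = len(responses) - n_valid
print(f"valid={n_valid}, missing={n_missing}")

# Totals implied by the counts reported in the Case Processing Summary.
reported_valid, reported_missing = 255, 15
total = reported_valid + reported_missing
percent_missing = 100 * reported_missing / total
print(f"total cases={total}, percent missing={percent_missing:.1f}%")
```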

The SPSS output provides us with the answer to the question on sample size.

04/18/23 Slide 23

The 'Case Processing Summary' in the SPSS output showed the total number of valid cases to be 255 and the number of missing cases to be 15.

Click on the check box to mark the statement as correct.

04/18/23 Slide 24

The next pair of statements asks us to identify the correct values for the mean and the standard deviation from the SPSS output.

04/18/23 Slide 25

In the table of descriptive statistics, the Mean row has a value of 44.17 and the Std. Deviation row shows 13.935, which rounds to 13.94.

04/18/23 Slide 26

In the table of descriptive statistics, the mean is 44.17 and the standard deviation is 13.94.

We mark the check box for the statement with the correct values.

04/18/23 Slide 27

The next pair of statements asks us to identify the correct values for the median and the interquartile range from the SPSS output.

04/18/23 Slide 28

In the table of descriptive statistics, the Median row has a value of 43.00 and the Interquartile Range row has a value of 18.

04/18/23 Slide 29

In the table of descriptive statistics, the median is 43 and the interquartile range is 18.

We mark the check box for the statement with the correct values.

04/18/23 Slide 30

The next pair of statements asks us to identify the direction of the skewing in the distribution of the variable.

04/18/23 Slide 31

The skewness for the distribution of "occupational prestige score" [prestg80] is 0.40. Since this is greater than zero, we characterize it as positive skewing or skewing to the right. If it were less than zero, it would be negative skewing or skewing to the left.

04/18/23 Slide 32

The skewness for the distribution of "occupational prestige score" [prestg80] is 0.40. Since this is greater than zero, we characterize it as positive skewing or skewing to the right.

We mark the check box for the statement with the correct response.

04/18/23 Slide 33

The final pair of statements asks us to identify which measure of center and spread should be reported for the variable.

04/18/23 Slide 34

One rule of thumb suggests that when the value of the skewness statistic is more than 2 times the value of the skewness standard error, the median is preferred.

For this variable, the statistic (.401) is more than twice the standard error (.153), so the median would be preferred.
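
The comparison can be reproduced in Python. The standard error formula used here is the common textbook one for the skewness of a sample from a normal population. It is an assumption, since the slides do not say how the program computes the value, but for 255 valid cases it gives about 0.1525, close to the .153 shown in the output. The function name is hypothetical.

```python
import math


def skewness_standard_error(n):
    """Textbook standard error of skewness for a sample of size n (assumed formula)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


n_valid = 255                  # valid cases from the Case Processing Summary
se = skewness_standard_error(n_valid)
print(f"standard error of skewness = {se:.4f}")

# Ratio of the reported skewness statistic to its reported standard error.
ratio = 0.401 / 0.153
print(f"ratio = {ratio:.2f}")
print("median preferred by this rule" if abs(ratio) > 2 else "mean preferred by this rule")
```

Because the standard error shrinks as the sample grows, this rule flags even mild skewness in large samples, which is one reason the two rules of thumb can disagree for the same variable.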

04/18/23 Slide 35

Another rule of thumb uses only the value of the skewness statistic. When the skewness is smaller than -1.0 or larger than +1.0, the distribution is badly skewed and the median is a better measure of central tendency. This is the rule of thumb used in our problems.

The skewness of this distribution (0.40) is in the allowable range, making the mean and standard deviation the preferred measures of center and spread.

04/18/23 Slide 36

Using the rule of thumb that skewness between -1.0 and +1.0 is acceptable, the skewness of this distribution (0.40) is acceptable, making the mean and standard deviation the preferred measures of center and spread.

The check box for the first statement is marked.

04/18/23 Slide 38