
SOCIAL STATISTICS FOR A DIVERSE SOCIETY

a. Is there any difference between the regression equations for married and divorced respondents?

b. What is the predicted number of children for married and divorced respondents with the following number of siblings: one sibling, four siblings, and seven siblings?

c. What differences, if any, do you find? Is the number of siblings a better predictor of number of children for married respondents or for divorced respondents?

5. Use the 2006 GSS file [GSS06PFP-A] to investigate the relationship between the respondent’s education (EDUC) and the education received by his or her father and mother (PAEDUC and MAEDUC, respectively).

a. Use SPSS to find the correlation coefficient, the coefficient of determination, and the regression equation predicting the respondent’s education with father’s education only. Interpret your results.

b. Use SPSS to find the multiple correlation coefficient, the multiple coefficient of determination, and the regression equation predicting the respondent’s education with father’s and mother’s education. Interpret your results.

c. Did taking into account the respondent’s mother’s education improve our prediction? Discuss this on the basis of the results from 5b.

d. Using the regression equation from 5a, calculate the predicted number of years of education for a person with a father with 12 years of education. Then, repeat this procedure, adding in a mother’s 12 years of education and using the regression equation from 5b.

CHAPTER EXERCISES

1. For a variety of reasons, a larger percentage of people are concerned today about the state of the environment than in years past. This has led to the formation of environmental action groups that attempt to alter environmental policies nationally and around the globe. A large number of environmental action groups subsist on the donations of concerned citizens. Based on the following eight countries, examine the data to determine the extent of the relationship between simply being concerned about the environment and actually giving money to environmental groups.

Country         Percent Concerned   Percent Donating Money
Austria         35.5                27.8
Denmark         27.2                22.3
Netherlands     30.1                44.8
Philippines     50.1                6.8
Russia          29.0                1.6
Slovenia        50.3                10.7
Spain           35.9                7.4
United States   33.8                22.8

Source: International Social Survey Programme, 2000.


Regression and Correlation

a. Construct a scatterplot of the two variables, placing percent concerned about the environment on the horizontal or X-axis and the percent donating money to environmental groups on the vertical or Y-axis.

b. Does the relationship between the two variables seem linear? Describe the relationship.

c. Find the value of the Pearson correlation coefficient that measures the association between the two variables, and offer an interpretation.
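For readers who want to check a hand calculation for part (c) outside SPSS, the following is a minimal Python sketch, not part of the original text. It assumes the usual deviation-score formula for Pearson's r; the function name `pearson_r` and the variable names are illustrative choices.

```python
from math import sqrt

# Data transcribed from the table in Exercise 1 (ISSP, 2000).
concerned = [35.5, 27.2, 30.1, 50.1, 29.0, 50.3, 35.9, 33.8]  # X: percent concerned
donating = [27.8, 22.3, 44.8, 6.8, 1.6, 10.7, 7.4, 22.8]      # Y: percent donating money


def pearson_r(x, y):
    """Sum of cross-products of deviations, divided by the square root of
    the product of the two sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / sqrt(ss_x * ss_y)


r = pearson_r(concerned, donating)
print(f"r = {r:.3f}")
```

The sign of the printed value gives the direction of the association and its absolute value gives the strength; the interpretation asked for in part (c) is left to the reader.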

2. There is often thought to be a relationship between a person’s educational attainment and the number of children he or she has. The hypothesis is that as one’s educational level increases, he or she has fewer children. Investigate this conjecture with 25 cases drawn from the 2006 GSS file. The following table displays educational attainment, in years, and the number of children for each respondent.

Education   Children     Education   Children
16          0            12          2
12          1            12          3
12          3            11          1
6           6            12          2
14          2            11          2
14          2            12          0
16          2            12          2
12          2            12          3
17          2            12          4
12          3            12          1
14          4            14          0
13          0            12          3
12          1

a. Calculate the Pearson correlation coefficient for these two variables. Does its value support the hypothesized relationship?

b. Calculate the least-squares regression equation using education as a predictor variable. What is the value of the slope, b? What is the value of the intercept, a?

c. What is the predicted number of children for a person with a college degree (16 years of education)?

d. Does any respondent actually have this number of children? If so, what is his or her level of education? If not, is this a problem or an indication that the regression equation you calculated is incorrect? Why or why not?
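Parts (b) and (c) can be checked with a short Python sketch. It is an illustration under the standard least-squares definitions (slope b as the sum of cross-products over the sum of squared X deviations, intercept a as the mean of Y minus b times the mean of X); the helper name `least_squares` is hypothetical and does not come from the text.

```python
# (education in years, number of children) for the 25 cases in the Exercise 2 table
cases = [
    (16, 0), (12, 1), (12, 3), (6, 6), (14, 2), (14, 2), (16, 2),
    (12, 2), (17, 2), (12, 3), (14, 4), (13, 0), (12, 1),
    (12, 2), (12, 3), (11, 1), (12, 2), (11, 2), (12, 0),
    (12, 2), (12, 3), (12, 4), (12, 1), (14, 0), (12, 3),
]


def least_squares(pairs):
    """Return (a, b) for the line Y-hat = a + b*X fitted by least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    b = cross / ss_x           # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


a, b = least_squares(cases)
predicted_16 = a + b * 16      # part (c): 16 years of education
print(f"Y-hat = {a:.3f} + ({b:.3f})X")
print(f"Predicted number of children at 16 years: {predicted_16:.2f}")
```

The predicted value is generally not a whole number, which bears on part (d): a regression line describes an average tendency, so no respondent needs to have exactly the predicted number of children.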


3. The condition and health of our environment is a growing concern. Let’s examine the relationship between a country’s gross national product (GNP) and the percentage of respondents willing to pay higher prices for goods to protect the environment. The following table displays information for five random countries.

a. Calculate the correlation coefficient between a country’s GNP and the percentage of its residents willing to pay higher prices to protect the environment. What is its value?

b. Provide an interpretation for the coefficient.

Percentage of Residents Willing to Pay Higher Prices to Protect the Environment by Country and GNP per Capita

Country         GNP per Capita   Percentage Willing to Pay
United States   29.24            44.9
Ireland         18.71            53.3
Netherlands     24.78            61.2
Norway          34.31            40.7
Sweden          25.58            32.6

Source: International Social Survey Programme, 2000.

4. In Chapter 5, Exercise 9, we studied the variability of crime rates and police expenditures in the eastern and Midwestern United States. We’ve now been asked to investigate the hypothesis that the number of crimes is related to police expenditures per capita because states with higher crime rates are likely to increase their police force, thereby spending more on the number of officers on the street.

a. Construct a scatter diagram of the number of crimes and police expenditures per capita, with number of crimes as the predictor variable. What can you say about the relationship between these two variables based on the scatterplot?

b. Find the least-squares regression equation that predicts police expenditures per capita from the number of crimes. What is the slope? What is the intercept?

c. Calculate the coefficient of determination (r²), and provide an interpretation.

d. If the number of crimes increased by 100 for a state, by how much would you predict police expenditures per capita to increase?

e. Does it make sense to predict police expenditures per capita when the number of crimes is equal to zero? Why or why not?


State           Number of Crimes per 100,000 Population   Police Protection Expenditures in Millions of Dollars
Maine           2,525                                     207
New Hampshire   1,928                                     239
Vermont         2,401                                     115
Massachusetts   2,821                                     1,480
Rhode Island    2,970                                     284

Source: U.S. Census Bureau, 2008. Statistical Abstract of the United States, 2008, Tables 310 and 429.
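The logic behind parts (b) through (d) of Exercise 4 can be sketched in Python as follows. This is an illustration, not the text's own procedure: it uses the expenditure figures exactly as listed in the table above, and all variable names are assumptions made for the example.

```python
# Values transcribed from the Exercise 4 table.
crimes = [2525, 1928, 2401, 2821, 2970]   # X: crimes per 100,000 population
police = [207, 239, 115, 1480, 284]       # Y: police expenditures as listed in the table

n = len(crimes)
mean_x = sum(crimes) / n
mean_y = sum(police) / n

cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(crimes, police))
ss_x = sum((x - mean_x) ** 2 for x in crimes)
ss_y = sum((y - mean_y) ** 2 for y in police)

b = cross / ss_x                         # slope: change in Y per one-unit change in X
a = mean_y - b * mean_x                  # intercept: predicted Y when X equals zero
r_squared = cross ** 2 / (ss_x * ss_y)   # coefficient of determination

# Part (d): a 100-unit increase in X changes the prediction by 100 times the slope.
change_per_100 = 100 * b

print(f"Y-hat = {a:.2f} + {b:.4f}X")
print(f"r squared = {r_squared:.3f}")
print(f"Predicted change for 100 more crimes: {change_per_100:.2f}")
```

Because the slope is the predicted change in Y for a one-unit change in X, part (d) needs only the slope; the intercept matters for part (e), where X = 0 lies far outside the observed range of the data.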

5. Before calculating a correlation coefficient or a regression equation, it is always important to examine a scatter diagram between two variables to see how well a straight line fits the data. If a straight line does not appear to fit, other curves can be used to describe the relationship (this subject is not discussed in our text).

The SPSS scatterplot in Figure 13.23 and output shown in Figure 13.24 display the relationship between education (measured in years) and television viewing (measured in hours) based on 2006 GSS data. We can hypothesize that as educational attainment increases, hours of television viewing will decrease, indicating a negative relationship between the two variables.

a. Assess the relationship between the two variables based on the scatterplot and output for the b coefficient. Is there a relationship between these two variables as hypothesized? Is it a negative or a positive relationship?

b. Describe the relationship between these two variables using representative values of years of education and hours of television viewing. For example, if an individual has 16 years of education, what are the predicted hours of television viewing? How can you determine this?

c. Does a straight line adequately represent (visually) the relationship between these two variables? Why or why not?
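For part (b), a prediction is obtained by substituting a value of X into the regression equation built from the constant and the b coefficient reported in Figure 13.24. A minimal Python sketch of that substitution follows; the constant names and the function name `predict_tv_hours` are illustrative, while the three numbers are copied from the figure.

```python
# Values as reported in the SPSS output in Figure 13.24.
A_CONSTANT = 4.827     # intercept (Constant)
B_EDUCATION = -0.138   # slope for highest year of school completed
R = 0.195              # R from the Model Summary


def predict_tv_hours(years_of_education):
    """Y-hat = a + b*X, using the coefficients from Figure 13.24."""
    return A_CONSTANT + B_EDUCATION * years_of_education


hours_16 = predict_tv_hours(16)
r_square = R ** 2   # should agree with the reported R Square after rounding

print(f"Predicted hours at 16 years of education: {hours_16:.3f}")
print(f"R squared from R: {r_square:.3f}")
```

With 16 years of education the equation gives 4.827 − .138(16) = 2.619 hours per day. Other representative values, such as 12 or 20 years, can be substituted in the same way.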


[Scatterplot not reproduced. Horizontal axis: Highest year of school completed (0 to 20). Vertical axis: Hours of television viewing per day (0 to 25).]

Figure 13.23 Scatterplot of Education by Hours of Television Viewing per Day

Figure 13.24 Linear Regression Output Specifying the Relationship Between Education and Hours Spent per Day Watching Television

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .195a   .038       .037                2.337

a. Predictors: (Constant), Highest Year of School Completed

Coefficients(a)

                                         Unstandardized Coefficients   Standardized Coefficients
Model                                    B        Std. Error           Beta                        t        Sig.
1   (Constant)                           4.827    .375                                             12.880   .000
    Highest Year of School Completed     −.138    .027                 −.195                       −5.071   .000

a. Dependent Variable: Hours per Day Watching TV

6. Based on the statistical data obtained from the countries in South America, let’s analyze the relationship between GNP and infant mortality rate (IMR).

a. Construct a scatterplot from the following data, predicting IMR from GNP. What is the relationship between GNP and IMR for these 10 countries in South America?

b. Does it appear (visually) that a straight line fits these data? Why or why not?

c. Calculate the correlation coefficient and coefficient of determination. Do these values offer further support for your answer to (b)? How?

Country     GNP per Capita in 1997 (Dollars)   Infant Mortality in 1998 (per 1,000 Births)
Argentina   8,030                              19.0
Bolivia     1,010                              60.0
Brazil      4,630                              33.0
Chile       4,990                              10.0
Colombia    2,740                              23.0
Ecuador     1,520                              32.0
Paraguay    1,760                              24.0
Peru        2,440                              40.0
Uruguay     6,070                              16.0
Venezuela   3,530                              21.0

Source: Population Reference Bureau, 2000, and the World Bank, 1999.

7. Social scientists have long been interested in the aspirations and achievements of people in the United States. Research on social mobility, status, and educational attainment has provided convincing evidence on the relationship between parents’ and children’s socioeconomic achievement. The GSS2006 data set has information on the educational level of respondents and their mothers. Use this information for the following selected nonrandom subsample of respondents to see whether those whose mothers had more education are more likely to have more education themselves.

a. Construct a scatterplot, predicting the highest year of respondent’s schooling with the highest year of the mother’s schooling.

b. Calculate the regression equation with mother’s education as the predictor variable, and draw the regression line on the scatterplot. What is the slope? What is the intercept? Describe how the straight line “fits” the data.

c. What is the error of prediction for the second case (the person with 13 years of education and mother’s education 15 years)? What is the error of prediction for the person with 18 years of education and mother’s education 14 years?

d. What is the predicted years of education for someone whose mother received 4 years of education? How about for someone whose mother received 12 years of education?

e. Calculate the mean number of years of education for respondents and for respondents’ mothers. Plot this point on the scatterplot. Where does it fall? Can you think of a reason why this should be true?
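Parts (c) and (e) of Exercise 7 rest on two ideas: an error of prediction is the observed Y minus the predicted Y, and the least-squares line always passes through the point formed by the two means. The Python sketch below illustrates both with the twelve cases listed for this exercise. It assumes the ordinary least-squares formulas, and the function name `error_of_prediction` is a hypothetical label.

```python
# (mother's highest school year, respondent's highest school year) for the 12 cases
pairs = [(0, 12), (15, 13), (6, 9), (9, 12), (16, 16), (12, 12),
         (6, 16), (18, 14), (12, 13), (14, 12), (14, 18), (7, 12)]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

b = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
     / sum((x - mean_x) ** 2 for x, _ in pairs))   # slope
a = mean_y - b * mean_x                            # intercept


def error_of_prediction(x, y):
    """Observed Y minus predicted Y for a single case."""
    return y - (a + b * x)


e_second_case = error_of_prediction(15, 13)   # mother 15 years, respondent 13 years
e_other_case = error_of_prediction(14, 18)    # mother 14 years, respondent 18 years

# Two checks that hold for any least-squares line:
residual_sum = sum(error_of_prediction(x, y) for x, y in pairs)   # should be about 0
prediction_at_mean_x = a + b * mean_x                             # should equal mean_y

print(f"Error, second case: {e_second_case:.2f}")
print(f"Error, mother 14 / respondent 18: {e_other_case:.2f}")
print(f"Sum of errors: {residual_sum:.10f}")
print(f"Prediction at mean X: {prediction_at_mean_x:.2f}; mean Y: {mean_y:.2f}")
```

A negative error means the equation overpredicts that case and a positive error means it underpredicts. The last line of output shows why the point in part (e) falls where it does.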

Mother’s Highest School Year Completed   Respondent’s Highest School Year Completed
0                                        12
15                                       13
6                                        9
9                                        12
16                                       16
12                                       12
6                                        16
18                                       14
12                                       13
14                                       12
14                                       18
7                                        12

8. In Exercise 6, we investigated the relationship between infant mortality rate and GNP in South America. The birthrates (number of live births per 1,000 inhabitants) in these same countries are shown in the following table:

Country     Birth Rate in 1999
Argentina   19
Bolivia     32
Brazil      20
Chile       18
Colombia    24
Ecuador     24
Paraguay    30
Peru        25
Uruguay     17
Venezuela   25

Source: Data from World Bank, 2000.

a. Construct a scatterplot for GNP and birthrate and one for infant mortality rate and birthrate. Do you think each can be characterized by a linear relationship?

b. Calculate the coefficient of determination and correlation coefficient for each relationship.

c. Use this information to describe the relationship between the variables.

9. In 2004, a U.S. Census Bureau report revealed that approximately 12.5% of all Americans were living below the poverty line in 2003. This figure is higher than in 2002, when the poverty rate was 12.1%. This translates to an increase of 1.3 million Americans living below the poverty line. Individuals and families living below the poverty line face many obstacles, not the least of which is access to health care. In many cases, those living below the poverty line are without any form of health insurance. Using data from the U.S. Census Bureau, analyze the relationship between living below the poverty line and access to health care.

State        % Below Poverty Line (Average from 2001 to 2003)   % Without Health Insurance (Average from 2001 to 2003)
Alabama      15.1                                               13.3
California   12.9                                               18.7
Idaho        11.0                                               17.5

(Continued)

a. Construct a scatterplot, predicting the percentage without health insurance with the percentage living below the poverty level. Does it appear that a straight-line relationship will fit the data?

b. Calculate the regression equation with percentage of the population without health insurance as the dependent variable, and draw the regression line on the scatterplot. What is its slope? What is the intercept? Has your opinion changed about whether a straight line seems to fit the data? Are there any states that fall far from the regression line? Which one(s)?

c. What percentage of the population must be living below the poverty line to obtain a predicted value of 5% without health insurance?

d. Predicting a value that falls beyond the observed range of the two variables in a regression is problematic at best, so your answer in (c) isn’t necessarily statistically believable. What is a nonstatistical, or substantive, reason why making such a prediction might nevertheless be important?
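Part (c) of Exercise 9 runs the regression equation in reverse: given a target value of Y, solve Y-hat = a + bX for X, which gives X = (Y-hat − a) / b. The Python sketch below illustrates this with the twelve states listed for this exercise, including the rows in the continued part of the table. It assumes the ordinary least-squares formulas, and the variable names are illustrative.

```python
# (percent below poverty line, percent without health insurance), 2001 to 2003 averages
states = {
    "Alabama": (15.1, 13.3), "California": (12.9, 18.7), "Idaho": (11.0, 17.5),
    "Louisiana": (16.9, 19.4), "New Jersey": (8.2, 13.4), "New York": (14.2, 15.5),
    "Pennsylvania": (9.9, 10.7), "Rhode Island": (10.7, 9.3),
    "South Carolina": (14.0, 13.1), "Texas": (15.8, 24.6),
    "Washington": (11.4, 14.3), "Wisconsin": (8.8, 9.5),
}

xs = [x for x, _ in states.values()]
ys = [y for _, y in states.values()]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))   # slope
a = mean_y - b * mean_x                      # intercept

target_y = 5.0                   # predicted percent without health insurance
x_needed = (target_y - a) / b    # rearranged from Y-hat = a + b*X

# Flag whether the answer requires extrapolating beyond the observed X values.
outside_observed_range = not (min(xs) <= x_needed <= max(xs))

print(f"Y-hat = {a:.3f} + {b:.3f}X")
print(f"X needed for Y-hat = {target_y}: {x_needed:.2f}")
print(f"Outside the observed range of X: {outside_observed_range}")
```

The final flag connects to part (d): when the required X lies outside the range of the observed data, the prediction is an extrapolation and should be treated with caution.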

(Continued)

State            % Below Poverty Line (Average from 2001 to 2003)   % Without Health Insurance (Average from 2001 to 2003)
Louisiana        16.9                                               19.4
New Jersey       8.2                                                13.4
New York         14.2                                               15.5
Pennsylvania     9.9                                                10.7
Rhode Island     10.7                                               9.3
South Carolina   14.0                                               13.1
Texas            15.8                                               24.6
Washington       11.4                                               14.3
Wisconsin        8.8                                                9.5

Source: U.S. Bureau of the Census. Current Population Reports P60–226, Income, Poverty and Health Insurance Coverage in the United States, 2003.

10. Let’s examine the relationship between GNP per capita and the percentage of respondents willing to pay more in taxes.

a. In this chapter, we used Table 13.4 to illustrate how to calculate the slope and intercept in the regression table. Using Table 13.4 as a model, create a similar table using the data below for GNP per capita and the percent willing to pay higher taxes.

b. From the table that you created in 10a, calculate a and b and write out the regression equation (i.e., prediction equation).

c. Calculate and interpret error type, E2.

d. Using your answer from 10c, calculate the PRE measure, r². Interpret.

e. About what percentage of citizens are willing to pay higher taxes for a country with a GNP per capita of 3.0 (i.e., $3,000)? For a GNP per capita of 30.0 (i.e., $30,000)?
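The steps in Exercise 10 can be sketched in Python as follows. Two assumptions are made that the exercise itself does not spell out: the worksheet columns are taken to be the deviations from each mean, their cross-products, and the squared X deviations; and the error types are taken to be E1 = Σ(Y − mean of Y)² and E2 = Σ(Y − Y-hat)², so that r² = (E1 − E2) / E1. The layout of Table 13.4 in the chapter may differ, and all names in the code are illustrative.

```python
# (GNP per capita in thousands, percent willing to pay higher taxes), from the Exercise 10 data
data = {
    "Canada": (19.71, 24.0), "Chile": (4.99, 29.1), "Finland": (24.28, 12.0),
    "Ireland": (18.71, 34.3), "Japan": (32.35, 37.2), "Latvia": (2.42, 17.3),
    "Mexico": (3.84, 34.7), "Netherlands": (24.78, 51.9), "New Zealand": (14.60, 31.1),
    "Norway": (34.31, 22.8), "Portugal": (10.67, 17.1), "Russia": (2.66, 29.9),
    "Spain": (14.10, 22.2), "Sweden": (25.58, 19.5), "Switzerland": (39.98, 33.5),
    "United States": (29.24, 31.6),
}

n = len(data)
mean_x = sum(x for x, _ in data.values()) / n
mean_y = sum(y for _, y in data.values()) / n

# Part (a): one worksheet row per country.
rows = []
for country, (x, y) in data.items():
    dx, dy = x - mean_x, y - mean_y
    rows.append((country, x, y, dx, dy, dx * dy, dx ** 2))

sum_cross = sum(row[5] for row in rows)
sum_sq_x = sum(row[6] for row in rows)

# Part (b): slope and intercept.
b = sum_cross / sum_sq_x
a = mean_y - b * mean_x

# Parts (c) and (d): error without X (E1), error with X (E2), and the PRE measure.
e1 = sum((y - mean_y) ** 2 for _, y in data.values())
e2 = sum((y - (a + b * x)) ** 2 for x, y in data.values())
r_squared = (e1 - e2) / e1

# Part (e): predictions at GNP per capita of 3.0 and 30.0.
pred_low = a + b * 3.0
pred_high = a + b * 30.0

print(f"Y-hat = {a:.3f} + {b:.3f}X")
print(f"E1 = {e1:.2f}, E2 = {e2:.2f}, r squared = {r_squared:.3f}")
print(f"Predicted at 3.0: {pred_low:.1f}; at 30.0: {pred_high:.1f}")
```

Computing r² from E1 and E2 rather than from the correlation coefficient keeps the PRE logic visible: r² is the proportion by which prediction error shrinks once X is used.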

Country         GNP per Capita   % Willing to Pay Higher Taxes
Canada          19.71            24.0
Chile           4.99             29.1
Finland         24.28            12.0
Ireland         18.71            34.3
Japan           32.35            37.2
Latvia          2.42             17.3
Mexico          3.84             34.7
Netherlands     24.78            51.9
New Zealand     14.60            31.1
Norway          34.31            22.8
Portugal        10.67            17.1
Russia          2.66             29.9
Spain           14.10            22.2
Sweden          25.58            19.5
Switzerland     39.98            33.5
United States   29.24            31.6

Sources: The World Bank Group. Development Education Program Learning Module: Economics, GNP per Capita. 2004. ISSP 2000.

11. In Exercise 5, we examined the relationship between years of education and hours of television watched per day. We saw that as education increases, hours of television viewing decreases. The number of children a family has could also affect how much television is viewed per day. Having children may lead to more shared and supervised viewing and thus increases the number of viewing hours. The SPSS output in Figure 13.25, based on 2006 GSS data, displays the relationship between television viewing (measured in hours per day) and both education (measured in years) and number of children. We hypothesize that whereas more education may lead to less viewing, the number of children has the opposite effect: Having more children will result in more hours of viewing per day.

Figure 13.25 Multiple Regression Output Specifying the Relationship Between Education, Number of Children, and Hours Spent per Day Watching Television

Coefficients(a)

                                         Unstandardized Coefficients   Standardized Coefficients
Model                                    B        Std. Error           Beta                        t        Sig.
1   (Constant)                           4.534    .472                                             10.620   .000
    Highest Year of School Completed     −.128    .028                 −.181                       −4.558   .000
    Number of Children                   .083     .058                 .057                        1.428    .154

a. Dependent Variable: Hours per Day Watching TV

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .203a   .041       .038                2.335

a. Predictors: (Constant), Number of Children, Highest Year of School Completed

a. What is the b coefficient for education? For number of children? Interpret each coefficient. Is the relationship between education and hours of viewing as hypothesized? How about number of children and television viewing?

b. Using the multiple regression equation with both education and number of children as independent variables, calculate the number of hours of television viewing for a person with 16 years of education and two children. Compare this with the predicted value using the equation in Exercise 5.

c. Compare the r² value from Exercise 5 with the R² value from this regression. Does using education and number of children jointly reduce the amount of error involved in predicting hours of television viewed per day?
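Part (b) asks for two substitutions, one into the multiple regression equation from Figure 13.25 and one into the bivariate equation from Figure 13.24. The Python sketch below carries out both; the coefficients are copied from the two figures, and the function names are illustrative.

```python
def predict_simple(education):
    """Bivariate equation from Figure 13.24: Y-hat = 4.827 - .138(education)."""
    return 4.827 + (-0.138) * education


def predict_multiple(education, children):
    """Multiple regression equation from Figure 13.25:
    Y-hat = 4.534 - .128(education) + .083(children)."""
    return 4.534 + (-0.128) * education + 0.083 * children


simple_hours = predict_simple(16)
multiple_hours = predict_multiple(16, 2)

print(f"Bivariate prediction: {simple_hours:.3f} hours")
print(f"Multiple regression prediction: {multiple_hours:.3f} hours")
print(f"Difference: {multiple_hours - simple_hours:.3f} hours")
```

For a person with 16 years of education and two children, the multiple regression equation gives 4.534 − .128(16) + .083(2) = 2.652 hours, compared with 2.619 hours from the bivariate equation.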


12. In 2004, the U.S. Census published a report saying that the number of Americans living below the federal poverty line was at an all-time high. We want to know if the percentage of residents in each state living below the federal poverty line can be predicted by taking into account both states’ racial composition and residents’ educational attainment. Figure 13.26 displays the results of multivariate regression (N = 50 states), predicting the percentage of a state’s residents living below the federal poverty line between 2002 and 2003 using the percentage of black residents in each state in 2002 and percentage of residents in each state with at least a high school diploma in 2002. Use these results to answer the questions below.

a. What is the b coefficient for the percentage of black residents in each state? For the percentage of states’ residents with at least a high school diploma? Interpret each coefficient. Do these results support the idea that poverty can be explained, at least in part, by considering the racial composition and education level of states’ residents? Why or why not? Use the appropriate statistics to make your argument.

b. Use the regression results to predict the percentage of a state’s residents living below the federal poverty line. Use the 2002 mean value of 10.2% for the percentage of black residents in each state and the 2002 mean value of 85.6% for the percentage of states’ residents with at least a high school diploma. Is the predicted value below or above the mean value of 11.7% living below the federal poverty line between 2002 and 2003?

Figure 13.26 Multiple Regression Predicting the Percentage Living Below Poverty by Racial Composition and Educational Attainment

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .779a   .607       .591                1.91239

Coefficients(a)

                         Unstandardized Coefficients   Standardized Coefficients
Model                    B         Std. Error          Beta                        t        Sig.
1   (Constant)           66.477    7.242                                           9.180    .000
    % BLACK RESIDENTS    −.045     .034                −.144                       −1.304   .199
    % W/ HS DIPLOMA      −.635     .082                −.851                       −7.718   .000

c. What is the coefficient of determination? By how much has our prediction of the percentage living below the federal poverty line improved by employing the multiple regression equation?
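The substitution in part (b) of Exercise 12 can be written so that it works for any number of predictors. The Python sketch below stores the coefficients from Figure 13.26 in a dictionary and applies them to the two mean values given in the exercise. The dictionary keys are illustrative labels, not SPSS variable names.

```python
# Unstandardized coefficients (B) as reported in Figure 13.26.
coefficients = {
    "constant": 66.477,
    "pct_black": -0.045,
    "pct_hs_diploma": -0.635,
}

# Values to substitute: the 2002 means given in part (b).
values = {"pct_black": 10.2, "pct_hs_diploma": 85.6}

# Y-hat = a + b1*X1 + b2*X2
predicted_poverty = coefficients["constant"] + sum(
    coefficients[name] * value for name, value in values.items()
)

observed_mean = 11.7   # mean percent below the poverty line, from part (b)

print(f"Predicted percent below poverty line: {predicted_poverty:.3f}")
print(f"Observed mean: {observed_mean}")
```

The arithmetic is 66.477 − .045(10.2) − .635(85.6) = 11.662. Small gaps between a prediction at the means and the observed mean of Y are expected when the coefficients and means have been rounded for display.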


13. On completing this chapter, you should be able to correctly answer the following questions.

a. True or false: It is possible, in fact it often is the case, that your slope, b, will be a positive value and your correlation coefficient, r, will be a negative value.

b. Both a and b refer to changes in which variable, the independent or dependent?

c. The coefficient of determination, r², is a PRE measure. What does this mean?

d. True or false: All regression equations reflect causal relationships expressed as linear functions.

NOTES

1. Refer to Paul Allison’s Multiple Regression: A Primer (Thousand Oaks, CA: Pine Forge Press, 1999) for a complete discussion of multiple regression: statistical methods and techniques that consider the relationship between one dependent variable and one or more independent variables.

2. If you obtain r simply by taking the square root of r², make sure not to lose the sign of r (r² is always positive, but r can also be negative), which can be ascertained by looking at the sign of SYX.

3. Centers for Disease Control, 2000.

4. J. M. Greene and C. L. Ringwalt, “Pregnancy Among Three National Samples of Runaway Homeless Youth,” Journal of Adolescent Health 23 (1998): 370–377.

5. William J. Wilson, The Truly Disadvantaged: The Inner City, the Underclass, and Public Policy (Chicago: University of Chicago Press, 1987).

6. Stephanie Coontz, “The Welfare Discussion We Really Need,” Christian Science Monitor (December 29, 1994): 19.

7. Pregnancy rates were not reported for California, Florida, Iowa, New Hampshire, and Oklahoma. The District of Columbia was removed from the analysis due to its extremely high teen pregnancy rate relative to other states.

8. Analysis is limited to women 40 years and older.
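The caution in Note 2 can be illustrated with a short Python sketch. It assumes only that the sign of r matches the sign of the covariance, and therefore the sign of the slope b, since b is the covariance divided by the variance of X. The function name `r_from_r_squared` is a hypothetical label. In the example, the slope from Figure 13.24 stands in for the covariance because only its sign matters.

```python
import math


def r_from_r_squared(r_squared, covariance):
    """Square root of r squared, carrying the sign of the covariance."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r squared must lie between 0 and 1")
    return math.copysign(math.sqrt(r_squared), covariance)


# Figure 13.24 reports R Square = .038 and a negative slope (-.138).
r_tv = r_from_r_squared(0.038, -0.138)
print(f"r = {r_tv:.3f}")
```

The result is negative and matches, within rounding, the standardized coefficient (Beta) of −.195 shown in Figure 13.24, even though SPSS lists R in the Model Summary as a positive .195.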
