supplement to the basic practice of statistics -- chapter 1


Upload: kumars93

Post on 11-Jan-2016


I. Introduction and Basic Definitions

In Moore: Read beginning of Chapter 1 (pages 3 – 6), and page 202

Statistics is a science that involves the extraction of information from numerical data obtained during an

experiment or from a sample. It involves the design of the experiment or sampling procedure and the collection

of the data, the analysis of the data, and making inferences (statements) about larger groups based on the data

that was collected.

Definitions:

1. Population: the entire group of individuals (subjects) about which the researcher wants information.

1. All U.S. citizens

2. All male students at this university

3. All sections of all courses taught this semester at this university

2. Parameter: some characteristic of the population that the researcher wants to measure.

1. Proportion of U.S. citizens who voted in the last Presidential election

2. Average (mean) height of all male students at this university

3. Proportion of all sections of all courses taught by adjunct (part-time) faculty

Typically parameters are denoted using Greek letters, and this convention will be used in the remainder of this

book. For example, the mean (average) of a population is often denoted with the Greek letter μ (read mu) and the

proportion of a population that are “successes” is often denoted with the Greek letter π (read pi). Other

parameters will be introduced as the book progresses.

Often a population is so large that it is nearly impossible to contact or measure every subject, and hence the true

value of the parameter is usually unknown. So instead we must select a representative sample of the population

and only contact or measure the subjects in the sample. The definition of “representative” will be provided in the

next chapter.

3. Sample: a subset of the population that we examine in order to gather information.

If the population were all male students at this university, then the male students in this

class would be a sample of this population.


4. Statistic: a descriptive measure, usually computed from a sample, which can be expressed or

evaluated numerically.

If the population were all male students at this university, and the male students in this class a

sample of this population, then the average height of the male students in this class would be a

statistic.

Typically statistics are denoted with symbols involving regular English letters, and this convention will be

used in the remainder of this book. For example, the mean (average) of a sample is often denoted by the

symbol X̄ (read X-bar) and the proportion of a sample that are “successes” is often denoted by the symbol

p̂ (read p-hat). Other statistics will be introduced as the book progresses.

5. Inference: a statement about a population based on the data collected in a sample. One type of inference is

using a sample statistic to estimate a population parameter.

The average height of the male students in this class (a statistic, X̄) can be used to estimate

the average height of all male students at this university (a parameter, μ).
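The relationship between a parameter and the statistic that estimates it can be sketched in a few lines of Python. This is only an illustration: the heights are hypothetical values invented for the example, and the names `population_heights`, `sample_heights`, and `proportion_at_least` are not from the text.

```python
from statistics import mean

# Hypothetical heights in inches; none of these values come from the text.
population_heights = [64, 66, 67, 68, 69, 70, 70, 71, 72, 73, 74, 76]
sample_heights = [66, 69, 70, 73]  # a subset of the population (the sample)

mu = mean(population_heights)   # parameter: population mean
x_bar = mean(sample_heights)    # statistic: sample mean


def proportion_at_least(heights, cutoff):
    """Proportion of heights that are at least `cutoff` (a "success")."""
    return sum(h >= cutoff for h in heights) / len(heights)


pi_tall = proportion_at_least(population_heights, 70)   # parameter
p_hat_tall = proportion_at_least(sample_heights, 70)    # statistic

print(f"mu = {mu}, x_bar = {x_bar}")
print(f"pi = {pi_tall:.3f}, p_hat = {p_hat_tall:.3f}")
```

In practice the population list is not available, which is why the sample values are used to estimate the population values.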

Example 1

The 2008 Summer Olympic Games took place in Beijing, China from August 8 until August 24. Of the 195 countries in the world, 192 participated, with only Brunei, Kosovo and Vatican City not participating. Additionally, 12 territories participated (including American Samoa, Aruba, Bermuda, British Virgin Islands, Cayman Islands, Cook Islands, Guam, Hong Kong, Netherlands Antilles, Palestine, Puerto Rico and U.S. Virgin Islands). Hence a total of 204 countries and territories were competing and eligible to win Olympic medals, and it is of interest to gather information about these 204 competing countries and territories. Certain things, such as the numbers of gold, silver and bronze medals won and the number of athletes participating in the Olympic Games, were easy to determine for each country or territory, but other information could not be collected for all countries and territories. The number of gold medals won by each country and territory competing in the 2008 Summer Olympic Games is easy to determine, and a total of 81 different countries and territories had at least one athlete win a gold medal. A random sample of 20 competing countries or territories was selected for further analysis. Of these 20 countries and territories, 13 (65%) had at least one athlete win a gold medal, and 93% of the athletes in these 20 countries and territories answered that they enjoyed the Beijing Olympics. Interestingly, 7 of the 20 countries and territories in the sample were located in South America.

Describe the population of interest in this study: the 204 countries and territories eligible to win a medal at the Olympics

Describe the sample in this study: the 20 countries or territories that were selected for further information


Identify each of the following as being a parameter or a statistic.

(a) 204, the number of competing countries and territories: parameter

(b) 20, the number of competing countries and territories selected for further analysis: statistic

(c) 65%, the percentage of the 20 countries or territories that had at least one athlete win a gold medal: statistic

(d) 93%, the percentage of the athletes in the 20 countries and territories that enjoyed the Beijing Olympics: statistic

(e) 81, the number of all competing countries and territories that had at least one athlete win a gold medal: parameter

(f) 7, the number of the 20 countries and territories that were located in South America: statistic
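The counts in Example 1 can be turned into proportions to see a parameter and a statistic side by side. The sketch below uses only the figures quoted in the example; the variable names are chosen for illustration.

```python
# Figures quoted in Example 1.
population_size = 204      # competing countries and territories
gold_in_population = 81    # of those, how many won at least one gold medal
sample_size = 20           # countries and territories in the random sample
gold_in_sample = 13        # of those, how many won at least one gold medal

pi_gold = gold_in_population / population_size   # parameter (whole population)
p_hat_gold = gold_in_sample / sample_size        # statistic (sample only)

print(f"population proportion (parameter): {pi_gold:.3f}")
print(f"sample proportion (statistic):     {p_hat_gold:.3f}")
```

The two proportions differ noticeably here, which is a reminder that a statistic from one sample is an estimate of the parameter and need not equal it.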

6. Distribution: a listing of all the possible values that a characteristic can take and the number (or percentage)

of times that each value occurs. A major component of statistics involves describing the

distribution of a set of data.

1. Consider the gender of a student. Students are either male or female, and counting the

number of each gives us the distribution.

2. Consider grades on a test worth 100 points. We can count the number of students who

make grades between 0 and 9, between 10 and 19, etc., through grades between 90 and

100. The 10-point intervals would be the values of the characteristic and the number of

students in each interval would complete the distribution.

Two key components of statistics involve learning how to describe a distribution, and also learning properties of

commonly used distributions. There are sections and chapters devoted to each of these things later in the book.
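The test-grade illustration of a distribution can be sketched as a short tally. The grades below are hypothetical, and `grade_interval` is a helper invented for this example; it follows the text in placing a grade of 100 in the interval from 90 to 100.

```python
from collections import Counter


def grade_interval(grade):
    """Return the label of the 10-point interval containing `grade`.

    A grade of 100 is grouped with the 90s, giving the interval "90-100".
    """
    low = min(grade // 10 * 10, 90)
    high = 100 if low == 90 else low + 9
    return f"{low}-{high}"


# Hypothetical test grades; not taken from the text.
grades = [55, 62, 68, 71, 74, 78, 83, 85, 88, 91, 97, 100]

# The distribution: each interval together with how often it occurs.
distribution = Counter(grade_interval(g) for g in grades)

for interval in sorted(distribution, key=lambda label: int(label.split("-")[0])):
    count = distribution[interval]
    print(f"{interval:>6}: {count} students ({count / len(grades):.0%})")
```

Each interval is listed with both a count and a percentage, matching the two ways the definition says a distribution can be reported.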

We generally deal with two branches of statistics:

1. Descriptive statistics is the branch of statistics concerned with numerical and graphical techniques for

describing one or more characteristics of a population and for comparing characteristics among populations.

The first part of this book, beginning with chapter III, will focus on descriptive statistics.


2. Inferential statistics is the branch of statistics in which we use data and statistics computed from a sample to

make inferences (statements) about a population. Often the inference is based on some descriptive statistic

that has already been computed or created. The second part of this book, beginning with chapter VII, will

focus on inferential statistics.

A goal of statistics is to measure some characteristic about a subject or set of subjects. To assure more accurate

results, we usually either measure the characteristic on several subjects (the sample), or if only one subject is

available, we repeat the measurement several times. This is called replication or repetition and it is in repeated

experiments that statistics become important.

Definitions:

1. When the measurements of some characteristic do not change in repeated trials over time, then the

characteristic is called a constant.

1. Number of days in January each year (always 31)

2. Number of minutes in an hour (always 60)

2. When the measurements of some characteristic vary (change) from trial to trial, then the characteristic is

called a variable.

1. Heights of students

2. Grades on a test

In statistics, we are primarily concerned with the observation of variables — if we know beforehand what the

measurement is going to be, as is the case with constants, then there is no reason to make repeated

measurements.

Variables are classified into two categories:

1. A qualitative (or categorical) variable is a variable whose measurements vary in kind or name but not in

degree, meaning that they cannot be arranged in order of magnitude. Hence one level of a qualitative variable

cannot be considered greater or better than another level.

1. Gender — male or female

2. Eye color — you can name the eye color

3. Social security number – a number is used to identify (name) a person

2. A quantitative variable is a variable whose measurements vary in magnitude from trial to trial, meaning

some order or ranking can be applied.

1. Number of students in a particular class


2. Weight of a typical student

3. Grades on a test

Quantitative variables are further classified as being discrete or continuous.

1. A discrete quantitative variable is a variable whose measurements can assume only a countable number of

possible values.

1. Number of students in a particular class

2. Number of cars in a parking deck

3. Grades on a test

2. A continuous quantitative variable is a variable whose measurements can assume any one of the uncountably

many values in an interval of the number line. It is usually either a measurable quantity or something that is calculated,

such as rates, averages, proportions, and percentages.

1. Weight of a typical student

2. Percentage of students who pass a course

Example 2

Identify each of the following characteristics as being a constant, a qualitative variable, a discrete quantitative

variable, or a continuous quantitative variable.

(a). College major: qualitative variable

(b). Number of dependents claimed on a tax form: discrete quantitative variable

(c). Number of people serving as President of the United States at any one time: constant

(d). Average age of students in a class: continuous quantitative variable

(e). Zip code: qualitative variable

(f). Suicide rate: continuous quantitative variable


Additional Reading and Examples

1. A fundamental component of statistics is the ability to understand and recognize the difference and

relationship between a population and a sample. In most problems the data that we have is a sample of the

population, and our goal is to use this data to make statements about the population.

2. Greek letters will be used to denote parameters and symbols involving regular English letters will be used to

denote statistics.

3. The type of analysis to be performed on a set of data is dependent on the type of data that we have collected.

Hence being able to identify whether a variable is qualitative or quantitative, and for quantitative variables

whether it is discrete or continuous, is important to assure that the appropriate analysis is done.

4. Radon is a radioactive gas that is generally present in harmless amounts in nature. However, in certain

dwellings radon gas is known to be present in quantities that may be harmful to humans, particularly in

basements of buildings where air is stagnant. The Environmental Protection Agency (EPA) sets standards for

environmental emissions from hazardous substances. According to the July 1995 issue of Consumer Reports,

the EPA has suggested that radon levels exceeding 4.0 pc/l (picocuries per liter) are associated with an

increased risk of lung cancer. The goal is to estimate the percentage of all dwellings in which the radon

concentration poses an increased risk of lung cancer. Data is available for a sample of 51 residential buildings

owned by a local real estate developer.

The population consists of all dwellings in which humans may enter and which could therefore pose a health

risk. The specific parameter of interest is the percentage of all these dwellings that have a radon

concentration above 4.0 pc/l. To estimate this percentage a sample of 51 residential buildings is selected, and

the percentage of these 51 dwellings that have radon concentrations above 4.0 pc/l can be computed. This

sample percentage is a statistic and can be used to estimate the percentage of all dwellings with radon

concentrations above 4.0 pc/l (the inference).

The owner of the dwelling is a qualitative variable, because we can only name the owner. The actual radon

concentration is measured and hence is a continuous quantitative variable. The number of dwellings with

radon concentrations exceeding 4.0 pc/l is countable and hence is a discrete quantitative variable. However,

the percentage of the dwellings with radon concentrations exceeding 4.0 pc/l can take an uncountable number


of possible values and hence is a continuous quantitative variable. The number of dwellings is countable

and hence is a discrete quantitative variable.

5. On January 28, 1986 the National Aeronautics and Space Administration (NASA) experienced a great

tragedy, as the space shuttle Challenger exploded less than two minutes after takeoff, killing all on board.

Could this tragedy have been avoided? To answer this question the Rogers Commission, headed by former

Secretary of State William Rogers, studied the accident and the events that led to the fatal launch. Their

investigation determined the cause of the accident, and their findings were published in the two-volume

Report of the Presidential Commission on the Space Shuttle Challenger Accident (1986).

Through the use of statistics, the report indicates that the flight should never have taken place and hence the

explosion could have been avoided. To illustrate this, we must first understand some information on how the

space shuttle operates. A space shuttle uses two booster rockets consisting of several pieces whose

connections are sealed with rubber O-rings, with the booster rockets lifting the shuttle into orbit. Each booster

has three primary O-rings, for a total of six on the entire shuttle.

Using data collected from previous flights, NASA had determined that the performance of the O-rings was

quite sensitive to the temperature. Due to their rubber makeup, the O-rings will change shape when a

compression is placed on them. The previous data has revealed that when this compression is removed, a

warm O-ring will recover its shape, while a cold O-ring will not. When an O-ring does not recover its shape

the joints will not be sealed, and hence a gas leak is quite possible.

Prior to the Challenger launch, the coldest launch temperature had been 53 degrees Fahrenheit. The

forecasted temperature for January 28, 1986 was only 31 degrees. Prior to the flight, the NASA engineers

discussed the conditions for the flight and decided to proceed with the launch. Unfortunately a statistician was

not involved in the discussion, because using only the data available at the time of the launch and some very

simple statistical analysis, the failure of the flight likely could have been predicted. The statistical analysis

follows.

Of the previous 23 flights, 16 of them were completed with no O-rings being damaged. The minimum

temperature of these 16 flights was 66 degrees, with an average temperature of 72.5 degrees. On five of the

flights, one O-ring was damaged, and on the other two flights two O-rings were damaged. The temperatures

for these seven flights ranged from 53 to 75 degrees, with an average temperature of 63.7 degrees. The data

clearly indicate that there is a strong relationship between launch temperature and O-ring damage, with colder

temperatures associated with a higher chance of O-ring damage. Using a more advanced statistical technique

referred to as logistic regression, a function could be estimated that would predict the probability of O-ring

damage given the temperature at the time of the launch. Using the data available from the previous 23


launches, the predicted probability of O-ring damage for a launch temperature of 31 degrees is 0.96. Hence

given the data available at the time of the launch, the engineers could have predicted the near-certain O-ring

damage that allowed the gas leak whose combustion resulted in the explosion of the Challenger.
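To give a feel for what a logistic regression function looks like, the sketch below evaluates a logistic curve at a few launch temperatures. The intercept and slope are hypothetical values picked only to show the shape of such a curve; they are not the coefficients estimated from the 23 flights, so the printed probabilities should not be read as the analysis described above.

```python
import math


def damage_probability(temp_f, intercept, slope):
    """Logistic curve: probability of O-ring damage at temperature `temp_f` (deg F)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * temp_f)))


# Hypothetical coefficients, chosen for illustration only.
# A negative slope means colder launches get a higher predicted probability.
INTERCEPT = 15.0
SLOPE = -0.23

for temp in (31, 53, 66, 75):
    prob = damage_probability(temp, INTERCEPT, SLOPE)
    print(f"{temp} degrees F: predicted probability {prob:.2f}")
```

Whatever coefficients are estimated, the output of a logistic function always lies between 0 and 1, which is what makes it suitable for predicting a probability.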

A more exhaustive discussion of this material can be found in the 1989 paper “Risk Analysis of the Space

Shuttle: Pre-Challenger Prediction of Failure,” by S. Dalal, E. Fowlkes, and B. Hoadley, which appeared in

Journal of the American Statistical Association.

TI-83/84 Calculator

In many of the remaining chapters instructions will be given to use a TI-83/84 calculator. In this chapter we

begin with instructions for entering data.

1. Press the STAT button; then under the EDIT on-screen menu select 1:Edit.

2. You should get a window that looks like this:

If your list contains data, you can remove the data in an individual list by moving the cursor up (using the up-

arrow button) to select the name of the list you want to clear and then press the CLEAR button followed by

the ENTER button. Your list should now be clear.

To clear data in all lists, select the memory command 2nd + (two separate keys) then select 4:ClrAllLists

from the on-screen menu. When ClrAllLists appears on the screen, press ENTER. All of your lists should

now be clear.

3. Enter your data set into a list by typing in the first value, press ENTER, then type in the next value, press

ENTER, and continue doing so until all values are in the list. Be sure to press ENTER after every data value

(including the final value). You can use the up-arrow and down-arrow buttons to scroll through your list to be

sure that all of the data values were entered correctly. Replace any incorrect value with the correct value and

press ENTER.

4. If you have grouped data (data values and frequencies), then you would enter the data values into one list and

the frequencies into the next list. For example, if the number 57 occurs 3 times, enter 57 into list L1 and enter

3 into list L2.


5. If you mistakenly delete any one of your six lists (by pressing DEL instead of CLEAR when trying to clear a

list), you may retrieve it by pressing STAT, then select EDIT, then select 5: SetUpEditor. When

SetUpEditor appears on the screen, press ENTER. Now all of your six lists should be available.
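The grouped-data layout from step 4 (data values in one list, frequencies in the next) can be mirrored in Python to check that it carries the same information as the raw data. Only the value 57 with frequency 3 comes from the text; the other values and frequencies are hypothetical, and the variable names are chosen for illustration.

```python
# values plays the role of list L1 and frequencies the role of list L2.
# Only 57 (occurring 3 times) is from the text; the rest is hypothetical.
values = [57, 60, 64]
frequencies = [3, 2, 1]

# Expand the grouped data back into one entry per observation.
expanded = [v for v, f in zip(values, frequencies) for _ in range(f)]

n = sum(frequencies)  # total number of observations
grouped_mean = sum(v * f for v, f in zip(values, frequencies)) / n

print("expanded data:", expanded)
print("n =", n)
print("mean from grouped data:", round(grouped_mean, 2))
```

The mean computed from the two grouped lists agrees with the mean of the expanded list, so entering values and frequencies separately loses nothing.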

Practice Problems:

I.1. A local city council is interested in determining the percentage of people who live in the city that would be

in favor of spending the money necessary to finance the renovation of the city’s sports arena. They randomly

sampled 250 city residents and asked each of them whether they would favor investing their tax dollars for

such a purpose. Identify the population, parameter, sample, and statistic in this experiment, and briefly

explain the inference that is taking place.

I.2. Identify each of the following characteristics as being a constant, a qualitative variable, a discrete

quantitative variable or a continuous quantitative variable. Support your choice.

(a). Type of illness (b). Birth rate (c). Number of pets owned

(d). Daily rainfall (e). Marital status (f). Temperature of classroom (°F)

I.3. A local church congregation consists of 318 members, and the total offering collected during 1998 was

$76,002. This works out to a mean contribution of $239 per member. The church is now faced with the task

of putting a new roof on the church building. A sample of 30 congregation members was selected to form a

“roof committee”, and the mean contribution of these 30 members to the church in 1998 was $275. The

church has received only one bid to repair the roof, and 24 out of the 30 “roof committee” members, or 80%,

approve allowing this company to do the work.

According to the paragraph above, identify each of the following as being a parameter or a statistic.

(a) 30 (b) 80% (c) $239 (d) $275 (e) 318 (f) $76,002

Identify each of the following characteristics as being a constant, a qualitative variable, a discrete

quantitative variable, or a continuous quantitative variable. Support your choice.

(g). Number of congregation members present each Sunday.

(h). Sunday school class to which each member belongs.

(i). Percentage of the congregation members in favor of the repair proposal.

(j). Average age of church attendees each Sunday.

(k). Number of years that a person has been a member of the church.

I.4. As we will learn later the standard deviation is a measure of spread in a data set and being able to correctly

identify whether a value is the population standard deviation (and hence a parameter) or the sample standard


deviation (and hence a statistic) is very important. For each of the following there is a standard deviation

highlighted. Identify whether this is the standard deviation of the population or of the sample.

(a). A national grocery store chain reports that the mean amount spent on groceries per trip by all of its

customers is $75.45 with a standard deviation of $15.00. The chain recently opened stores in the

Richmond, Virginia area and claims that the mean amount spent on groceries per trip by customers in

their stores is less than the national mean. To test this claim a simple random sample of 200 Richmond

area customers is selected and the amount spent on groceries during their last national grocery store

chain shopping trip determined. The mean amount spent by this sample of customers is $71.89.

(b). In an attempt to determine the appropriate methods of publicizing the Annual Giving campaign,

YMCA officials are obtaining demographic information about the potential contributors. One of these

demographic characteristics is the age of the contributor. The mean age of all adults in the surrounding

area is known to be 48.3 years. The YMCA selected a simple random sample of 29 contributors to the

YMCA Annual Giving campaign and determined the age of each. The mean age of this sample of 29

contributors was 45.4 years with a standard deviation of 6.2 years.

(c). According to the Butterball website, Butterball turkeys typically range from 10 to 25 pounds. Of

interest is to estimate the mean weight of all Butterball turkeys that were sold for Thanksgiving 2004.

To estimate this, a simple random sample of 51 Butterball turkeys sold between November 1 and

November 24, 2004 was selected and their weights recorded. The mean weight of this sample of 51

Butterball turkeys was 18.7 pounds, with a standard deviation of 2.4 pounds.

(d). The United States Congress consists of 100 Senators and 435 members of the House of

Representatives. A historian who is writing a book about the 107th Congress is interested in the

intelligence of the members of Congress. One approach to measure the intelligence is to give a

standard intelligence test to the members of Congress. The test that is to be used is scaled such that all

scores are between 0 and 200, and such that the standard deviation of all scores is 20 points.

I.5. On January 27, 2000 Bill Clinton delivered his last State of the Union Address. Among those invited to be in

attendance were the 535 members of Congress, which comprise the population. Of the 535 members of

Congress, 487, or 91%, were in attendance. Political experts were interested in the opinions that members of

Congress have about the speech and about the current status of the nation. To gather this information, they

randomly sampled 20 members of Congress. Of these 20 members, 12 were Democrats and 8 were

Republicans. 14 gave President Clinton favorable marks for his speech, and 19, or 95%, gave the current

status of the nation a positive rating. The average age (in years) of the 20 members in the sample was 73.

Identify each of the following as being a parameter or a statistic.

(a) 20 (b) 535 (c) 91% (d) 95% (e) 12 Democrats (f) 73


Identify each of the following characteristics as being a constant, a qualitative variable, a discrete

quantitative variable, or a continuous quantitative variable. Support your choice.

(g). Percentage of each elected Congress less than 65 years of age

(h). State from which a Congressperson was elected

(i). Number of Democrats attending each Congressional meeting

June 16, 2014

Most Americans Remain Satisfied With Healthcare System

Little change in satisfaction since mid-March

by Frank Newport

http://www.gallup.com/poll/171680/americans-remain-satisfied-healthcare-system.aspx

PRINCETON, NJ -- Sixty-six percent of Americans in the first half of June are satisfied with the way the healthcare

system is working for them, in line with attitudes since mid-March. Gallup's seven-day rolling average on this

measure shows confidence during that time has varied only slightly, increasing to 70% in mid-April, just after the

enrollment period ended for purchasing insurance under the provisions of the Affordable Care Act. That modest

increase was short-lived; satisfaction has averaged about 65% since mid-May.

Gallup began asking this question nightly on March 21 as a continuous measure of the way changes in the nation's

healthcare system are affecting average Americans. There are no comparable data for 2013 or previous years, before

the ACA's individual insurance mandate helped lead to a significant drop in the uninsured population in the U.S.

However, these readings serve as a baseline for assessing the effect the ACA is having on people's healthcare


experiences as they interact with the system going forward. Americans' satisfaction with the healthcare system is

highly related to their health insurance status, although having health insurance does not guarantee satisfaction.

Nearly three in 10 Americans with insurance say they are dissatisfied with the way the healthcare system is working

for them. Among those without insurance -- currently about 13.4% of Americans -- six in 10 are dissatisfied.

Higher percentages of Americans aged 65 and older are satisfied (79%) with how the system is working, with

satisfaction ranging from 61% to 66% among those between the ages of 18 and 49. This elevated level of

satisfaction among older Americans reflects that most in this group are covered by Medicare. Along these same

lines, slightly higher-than-average proportions of individuals who have government-paid insurance -- including not

only Medicare, but Medicaid and military or veterans insurance -- are satisfied with the way the healthcare system is

treating them.


Politics Have an Effect Above and Beyond Insurance Status

This measure of healthcare satisfaction does not mention the ACA or any other details about what the U.S.

healthcare system entails, and is, effectively, politically neutral. Still, politics shade the results, with Democrats and

Democratic-leaning independents somewhat more likely to say they are satisfied than are Republicans and

Republican leaners. While health insurance status is clearly the most significant predictor of satisfaction, partisan

identification still has an effect beyond basic insurance status. There is a 16-point satisfaction gap between

Republicans and Democrats among those with health insurance, and a nine-point gap among those without it.

Implications

About 13% of the U.S. adult population remains uninsured at this point, even after the institution of the ACA's

individual mandate and the availability of insurance through government health exchanges. These uninsured

Americans are half as likely as those with insurance to say they are satisfied with the way the healthcare system is

working for them. Still, one in three Americans without insurance are satisfied, and almost as many of those with

insurance are dissatisfied, indicating that there is more involved in satisfaction than having insurance.

More Americans with government-paid insurance are satisfied with the way the system is working than is true for

those with private plans. The data do not provide a way of separating out those newly insured through the


exchanges. These individuals are just now beginning to see how well these plans work in their personal situations,

and all of them will need to re-enroll when the exchanges open up again in mid-November. This ongoing measure of

Americans' healthcare satisfaction will provide an assessment of any effect of these changes going forward.

Americans' high level of satisfaction with how the healthcare system is treating them suggests that healthcare is not

in a crisis for most Americans. At the same time, that 30% of the adult population -- more than 70 million people --

is not satisfied with the healthcare system underscores the need for improvement.

Some Americans who are dissatisfied with the way the healthcare system is treating them may be expressing their

displeasure with particular doctors, hospitals, billing issues, or medical procedures -- aspects that have little to do

with the broader issue of how healthcare coverage is provided. The differing satisfaction levels between those with

insurance and those without, and between those with federal and those with private plans, however, indicate that the

way this coverage is provided is obviously related to healthcare satisfaction.

Survey Methods

Results for this Gallup poll are based on telephone interviews conducted March 21-June 14, 2014, with a random

sample of 42,566 adults, aged 18 and older, living in all 50 U.S. states and the District of Columbia. For results

based on the total sample of national adults, the margin of sampling error is ±1 percentage point at the 95%

confidence level.
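
As a rough check on the margin quoted above, the following Python sketch computes the textbook margin of error for a proportion from a simple random sample, using the worst case p = 0.5 and z = 1.96 for 95% confidence. The function name and the simple-random-sample formula are assumptions made for illustration. They are not Gallup's actual computation, which also includes design effects for weighting.

```python
import math

def srs_margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion, assuming a simple
    random sample (an illustrative assumption, not Gallup's method)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 42566  # sample size reported in the survey methods
moe_points = 100 * srs_margin_of_error(n)  # convert to percentage points
print(f"Simple-random-sample margin: +/-{moe_points:.1f} percentage points")
```

For n = 42,566 this comes to roughly half a percentage point. The reported margin of ±1 point is larger, as would be expected once design effects and rounding are included.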

Interviews are conducted with respondents on landline telephones and cellular phones, with interviews conducted in

Spanish for respondents who are primarily Spanish-speaking. Each sample of national adults includes a minimum

quota of 50% cellphone respondents and 50% landline respondents, with additional minimum quotas by time zone

within region. Landline and cellular telephone numbers are selected using random-digit-dial methods. Landline

respondents are chosen at random within each household on the basis of which member had the most recent

birthday.

Samples are weighted to correct for unequal selection probability, nonresponse, and double coverage of landline and

cell users in the two sampling frames. They are also weighted to match the national demographics of gender, age,

race, Hispanic ethnicity, education, region, population density, and phone status (cellphone only/landline only/both,

and cellphone mostly). Demographic weighting targets are based on the most recent Current Population Survey

figures for the aged 18 and older U.S. population. Phone status targets are based on the most recent National Health

Interview Survey. Population density targets are based on the most recent U.S. census. All reported margins of

sampling error include the computed design effects for weighting. In addition to sampling error, question wording

and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.
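
The idea of weighting a sample to match known population figures can be sketched in Python for a single grouping variable. This is a simplified illustration only: the group names, counts, and population shares below are hypothetical and are not taken from the Gallup poll, which weights on several variables at once.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

# Hypothetical figures for illustration only; not from the Gallup poll.
sample_counts = {"group_a": 400, "group_b": 600}
population_shares = {"group_a": 0.5, "group_b": 0.5}

weights = poststratification_weights(sample_counts, population_shares)
for group, weight in weights.items():
    print(f"{group}: weight {weight:.3f}")
```

Respondents in the underrepresented group receive a weight above 1 and those in the overrepresented group a weight below 1, so that the weighted sample matches the assumed population shares.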

June 11, 2014

Smaller Majority of Americans View Hillary Clinton Favorably

At 54%, her favorability has slipped since February

by Justin McCarthy

http://www.gallup.com/poll/171290/smaller-majority-americans-view-hillary-clinton-favorably.aspx

WASHINGTON, D.C. -- Hillary Clinton's favorability rating has dropped slightly, although a majority of Americans continue to

view her in a positive light. As Clinton publicizes her new memoir, “Hard Choices,” 54% of Americans view her favorably. This

is down from 59% in February, and significantly less than the ratings she received as secretary of state, which were consistently

above 60%.

The latest findings come from a Gallup poll conducted June 5-8. Though Clinton has said she will not announce whether she'll

run for president until at least later this year, her latest book has been widely framed as a preamble to another presidential bid and

a move typical of White House hopefuls. Clinton already has the support of many elected officials and Democratic Party

representatives if she chooses to run. Americans have named her their Most Admired Woman 18 times. Clinton's current favorability

rating is the lowest it has been since August 2008 (54%), when she was preparing to deliver a speech at the Democratic National

Convention endorsing then-Sen. Barack Obama, who defeated her in a hard-fought primary battle for the party's 2008

presidential nomination.

After recovering from a contentious Democratic primary race that strapped her with campaign debt, Clinton's favorability soared

while she served as secretary of state during Obama's first term. As she continued in her role, as many as two-thirds of Americans

(66%) viewed her favorably, in consecutive polls in 2011 and 2012 -- a rating she surpassed only once before, at 67% in

December 1998, shortly after her husband, President Bill Clinton, was impeached. In her nonpolitical role as secretary of state, Clinton

enjoyed extremely high ratings from her fellow Democrats, but saw her ratings increase among Republicans as well. She peaked

with Republicans during this period in mid-2012, when 41% viewed her favorably. Her favorability fell with the GOP, as it did

with independents, after the September 2012 attacks on the U.S. compound in Benghazi, Libya. As Democratic elected officials

continue to encourage her to run for president, her name has become further politicized, thus making her less favorable to

non-Democrats.

GOP operatives and media pundits have publicly questioned whether her health and age (Clinton is now 66) could hinder her

ability to serve as president. Additionally, the House of Representatives has formed a select committee to investigate the attack in

Benghazi. And as she wades into Obama's controversial decision to trade five Taliban prisoners held at Guantanamo Bay for U.S.

soldier Bowe Bergdahl, Clinton's performance as one of Obama's top cabinet members will likely undergo greater scrutiny.

Bill Clinton, who in the same poll receives a 64% favorable rating, has commanded majority favorability from Americans during

most of his time as president and in post-presidential life. While some may view his high favorability ratings as an advantage for

Hillary if she decides on another presidential run, Bill's favorability did take a hit when he joined her on the 2008 campaign trail.

By January 2008 -- a year after Hillary announced her candidacy -- his favorability hit a five-year low of 50%, barely ahead of

Hillary's 48%. In fact, for much of Hillary's career since Bill's presidency, their favorability ratings have been closely related.

Their latest favorability ratings are separated by 10 percentage points and, with the exception of a 12-point difference in March

2007, are as far apart as they've been since Hillary independently entered the political fray as a candidate for the U.S. Senate from

New York. During Hillary's first term as New York's junior senator, her favorability was closely linked to that of her husband.

But for the first three years of her second term, from 2005-2007, their ratings differed by five to 12 percentage points. Then, in

early 2008, when Bill became a proxy campaigner for Hillary in her bid for the presidency, his favorability fell and their ratings

converged.

Hillary Clinton's era of higher favorability appears to be ending even before she announces whether she will run for president.

Americans typically rate non-political figures higher than political ones on this measure, and her favorable ratings before, during,

and after being secretary of state are consistent with that phenomenon. Though her husband's influence is far from Hillary's

greatest selling point, he may be better positioned to help her on the campaign trail than he was last time, with his favorability up five

points from what it was in mid-2006. But if Hillary does run, the boost she receives from him may be limited if it is similar to

2008, with his past favorability so closely married to her own in the backdrop of a presidential campaign.

Survey Methods

Results for this Gallup poll are based on telephone interviews conducted June 5-8, 2014, with a random sample of 1,027 adults,

aged 18 and older, living in all 50 U.S. states and the District of Columbia. For results based on the total sample of national

adults, the margin of sampling error is ±4 percentage points at the 95% confidence level.
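
To show how the reported margin is read, the short Python sketch below turns an estimate and its margin of sampling error into an interval, using the 54% favorable rating and the ±4 point margin quoted in this article. The function name is a hypothetical helper introduced for illustration.

```python
def interval_from_margin(estimate_pct, margin_pct):
    """Return (low, high) in percentage points for estimate +/- margin."""
    return (estimate_pct - margin_pct, estimate_pct + margin_pct)

# 54% favorable, +/-4 percentage points at the 95% confidence level
low, high = interval_from_margin(54, 4)
print(f"Interval: {low}% to {high}%")
```

The resulting interval runs from 50% to 58%. The February figure of 59% lies just outside it, although a formal comparison of the two polls would also need the February poll's own margin, which is not given here.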

Interviews are conducted with respondents on landline telephones and cellular phones, with interviews conducted in Spanish for

respondents who are primarily Spanish-speaking. Each sample of national adults includes a minimum quota of 50% cellphone

respondents and 50% landline respondents, with additional minimum quotas by time zone within region. Landline and cellular

telephone numbers are selected using random-digit-dial methods. Landline respondents are chosen at random within each household

on the basis of which member had the most recent birthday.

Samples are weighted to correct for unequal selection probability, nonresponse, and double coverage of landline and cell users in

the two sampling frames. They are also weighted to match the national demographics of gender, age, race, Hispanic ethnicity,

education, region, population density, and phone status (cellphone only/landline only/both, and cellphone mostly). Demographic

weighting targets are based on the most recent Current Population Survey figures for the aged 18 and older U.S. population.

Phone status targets are based on the most recent National Health Interview Survey. Population density targets are based on the

most recent U.S. census. All reported margins of sampling error include the computed design effects for weighting. In addition to

sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of

public opinion polls.
