Source: users.stat.ufl.edu/~winner/qmb3250/notes.pdf

1 Lecture 1 – Preliminaries

Textbook Sections: Chapter 1, Sections 2.1, 2.2, 3.1, 3.2
Problems: 1.3, 1.5, 1.7, 2.3, 2.4, 2.13, 3.20, 3.25, 3.45a-d

These notes are intended to simultaneously review and extend the basic concepts of STA 2023 that are used in business applications. In this section, we describe the notions of:

• Populations and Samples

• Descriptive and Inferential Statistics

• Variable Types

• Tabular and Graphical Descriptions

• Numerical Descriptive Measures

1.1 Populations and Samples

Populations are collections of individuals or items of interest to a researcher. We are typically concerned with one or more characteristics of the elements of the population. Examples include:

PO1 All firms listed on the New York Stock Exchange (NYSE) throughout year 2001.

PO2 All living graduates of the University of Florida.

PO3 All pairs of Levi’s 550 jeans produced in January, 2002.

Samples are subsets of their corresponding populations, used to describe or make inferences concerning particular characteristics of the elements of the population. Examples include:

SA1 30 firms sampled at random from all firms listed on NYSE throughout 2001.

SA2 100 UF graduates sampled from alumni records.

SA3 A randomly selected set of 250 pairs of Levi’s 550 jeans produced in January, 2002.

1.2 Descriptive and Inferential Statistics

Descriptive Statistics — Methods used to describe a group of measurements (e.g. mean, median, standard deviation, proportion (percent) with some characteristic). Examples include (Sources: Wall Street Journal Almanac, 1999, US Statistical Abstract, 1992):

• Average daily shares traded on the NYSE (millions):

1980 – 49.8 1985 – 121.26 1990 – 176.0 1995 – 361.9

• Median earnings for year round full–time workers (1990, in $1000):

Males – 27.7 Females – 19.8

• Mean and standard deviation of heights of adults 25–34 years old:

Females – Mean = 63.5”, std. dev. = 2.5”   Males – Mean = 68.5”, std. dev. = 2.7”


• Percent of families in US living below poverty level:

1970 – 12.6 1990 – 13.5

Inferential Statistics — Methods used to reach conclusions (decisions) concerning a population, based on measurements from a sample. Examples include:

• A sample of 2007 American adults were asked if they thought there would be a recession in the next five years. Of those sampled, 66% answered “Yes”. Based on this sample we can be very confident that a majority of American adults feel there will be a recession in the next five years. (Source: WSJ, 6/27/97, p. R1).

• After determining a safe dosing regimen, drug manufacturers must demonstrate efficacy of new drugs by comparing them with a placebo or standard drug in large–scale Phase III trials. In one such trial for the antidepressant Prozac (Eli Lilly & Co), researchers measured the change from baseline in the Hamilton Depression (HAM–D) scale. Based on a sample of 185 patients receiving Prozac, the mean change (improvement) was 11.0 points, and among 169 patients receiving placebo, the mean change was 8.2 points. Based on these samples, we can conclude that the mean change from baseline in all patients receiving Prozac would be higher than the mean change from baseline in all patients receiving placebo, at a very high level of confidence.

1.3 Levels of Data Measurement

Nominal — Variable’s levels have no distinct ordering. Examples:

• Type of business (cyclical, non–cyclical, utility, . . . )

• Sex (female,male)

• Brand of beer purchased (Bud, Miller, Coors, . . . )

Ordinal — Levels can be ordered, but distances between levels are indeterminable. Examples:

• Product quality (poor, fair, good, excellent)

• Standard & Poor’s Corporate bond rating (AAA, AA, A, BBB, BB, B, CCC, CC, C, D)

• Response to test drug (death, extensive deterioration, moderate/slight deterioration, no change, moderate/slight improvement, extensive improvement).

• Placement in sales rankings (1st, 2nd, . . . , last).

Interval — Measurements fall along a numerical scale, such that distances between levels have meaning. Examples:

• Quarterly corporate profits (in dollars)

• Time to assemble an automobile (in minutes)

• Number of items sold by a salesman in one month (units)

• Number of defective computer keyboards produced by a worker on a given day.

Ratio — Same as interval, but also containing an absolute 0, so that ratios, as well as distances, have meaning. All examples above, except profits (which can be negative), are ratio scale.


1.4 Tabular and Graphical Distributions

Frequency distributions are lists of “classes” of levels of a variable, and the number of observed outcomes within each class. Relative frequency distributions can be obtained for data of any level of measurement. They can be depicted in tabular or graphical form.

Example 1.1 – 1994 Florida County Data

Table 1.4 gives the population, total income (in $1000s), per capita income (in $1000s), and retail sales (in $1000s) for Florida’s 67 counties in 1994 (Source: U.S. Census Bureau). The “classes” chosen for the frequency distribution for per capita income (in $1000s) are 5–10, 10–15, etc. If any observation fell right on a “breakpoint” between classes, it was assigned to the upper class. The following computer output gives these distributions for the 67 counties in this dataset:

Frequency — Labelled “Frequency”, this gives the list of the numbers of counties falling in the various categories.

Relative Frequency — Labelled “Percent”, this gives the percentage of the counties falling in each of the categories.

Cumulative Frequency — Labelled “Cumulative Frequency”, this gives the number of counties falling in or below this category.

Relative Cumulative Frequency — Labelled “Cumulative Percent”, this gives the percent of counties falling in or below this category.

                               Cumulative   Cumulative
pci94    Frequency   Percent    Frequency      Percent
-------------------------------------------------------
5-10         1         1.49         1            1.49
10-15       21        31.34        22           32.84
15-20       28        41.79        50           74.63
20-25       10        14.93        60           89.55
25-30        3         4.48        63           94.03
30-35        4         5.97        67          100.00
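The construction of such a frequency distribution is mechanical enough to sketch in code. The following Python fragment is not from the notes (the short data list is invented for illustration); it bins values into width-5 classes, assigning any value that falls exactly on a breakpoint to the upper class, as described above:

```python
def freq_table(values, lower, upper, width):
    """Rows of (class, frequency, percent, cumulative frequency, cumulative percent)."""
    edges = list(range(lower, upper + width, width))
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            # Classes are half-open [lo, hi): a value sitting exactly on a
            # breakpoint is assigned to the upper class, as in the text.
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    n = len(values)
    rows, cum = [], 0
    for i, c in enumerate(counts):
        cum += c
        rows.append((f"{edges[i]}-{edges[i + 1]}", c, 100 * c / n, cum, 100 * cum / n))
    return rows

# Invented per capita incomes (in $1000s); note 15.0 lands in the 15-20 class:
data = [9.4, 12.1, 14.9, 15.0, 17.5, 19.4, 22.0, 31.2]
for row in freq_table(data, 5, 35, 5):
    print(row)
```

Running this on the actual 67 per capita incomes from Table 1.4 would reproduce the computer output above.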

Various graphs are useful in describing bodies of data, and are often given in business reporting.

Histograms – Vertical bar charts that identify categories for categorical variables and ranges of values for interval scale variables, with heights representing frequencies of outcomes for a single variable.

Pie Charts – Circular graphs where the size of each “slice” represents the frequency for a particular category or range of values.

Scatter Plots – Plots of pairs of outcomes on two variables, where each point on the graph represents a single element from a set of data.

Time Series Plots – Plot of a single variable that is measured over a series of points in time.


County         Population   Total Income   Per Capita Income   Retail Sales

Alachua        193353   3747486   19.382   305841
Baker          19786    293855    14.852   14637
Bay            139507   2495859   17.891   243300
Bradford       24004    318011    13.248   17103
Brevard        442637   8677944   19.605   574026
Broward        1386497  34167902  24.643   2493269
Calhoun        11738    146096    12.446   9410
Charlotte      125832   2400459   19.077   156605
Citrus         104035   1672558   16.077   102660
Clay           120257   2238700   18.616   154681
Collier        177778   5452519   30.670   361542
Columbia       47886    719253    15.020   57166
Dade           2012237  40530049  20.142   3251235
De Soto        25074    410138    16.357   21529
Dixie          11706    141523    12.090   8725
Duval          702846   14553773  20.707   1206452
Escambia       272187   4800237   17.636   391410
Flagler        37818    594050    15.708   26894
Franklin       9906     151954    15.340   8930
Gadsden        43021    617896    14.363   30312
Gilchrist      11929    149748    12.553   3492
Glades         7615     111747    14.675   2548
Gulf           13041    187590    14.385   8380
Hamilton       11570    141496    12.230   6139
Hardee         21611    338067    15.643   18125
Hendry         29325    505741    17.246   27234
Hernando       117141   1872699   15.987   106199
Highlands      73685    1296740   17.598   82240
Hillsborough   871046   17631999  20.242   1546211
Holmes         16933    215197    12.709   9701
Indian River   95250    2772529   29.108   148059
Jackson        43787    664992    15.187   45529
Jefferson      12761    185227    14.515   7127
Lafayette      5873     79439     13.526   2237
Lake           173250   3170498   18.300   188032
Lee            367322   8103201   22.060   637949
Leon           211763   4190977   19.791   358281
Levy           28827    396698    13.761   24600
Liberty        6257     89835     14.358   2405
Madison        17197    223594    13.002   12120
Manatee        226289   5194196   22.954   308061
Marion         219358   3655070   16.663   290072
Martin         109027   3491389   32.023   181558
Monroe         81460    2068322   25.391   195625
Nassau         49496    1035360   20.918   52127
Okaloosa       160725   3048783   18.969   250499
Okeechobee     31036    463635    14.939   32529
Orange         740474   15108479  20.404   1598855
Osceola        126386   2049838   16.219   197719
Palm Beach     959721   31994145  33.337   1674647
Pasco          298677   5051203   16.912   282914
Pinellas       865364   21502994  24.848   1499770
Polk           429408   7661229   17.841   746285
Putnam         68598    978635    14.266   55690
St. Johns      98214    2519924   25.657   131375
St. Lucie      169116   2788362   16.488   178863
Santa Rosa     99003    1695027   17.121   68145
Sarasota       291722   8831912   30.275   540520
Seminole       323719   7062419   21.817   507307
Sumter         33367    486950    14.594   26922
Suwannee       29489    436779    14.812   36258
Taylor         17332    268153    15.472   19028
Union          12193    115280    9.455    3937
Volusia        403899   7154872   17.715   561530
Wakulla        16665    258477    15.510   9233
Walton         32677    470799    14.408   32274
Washington     17984    248533    13.820   12653


Data Maps – Plot of a single variable that is measured over a series of points in 2-dimensional space.

A histogram of per capita income is given in Figure 1. We see that most counties are in the range of $10,000 to $25,000 (the second, third, and fourth ranges of values), with one county lower than this range, and the seven most affluent counties being above this range. A pie chart of the same data is given in Figure 2.

A scatter plot of retail sales (on the vertical or up/down axis) versus total income (on the horizontal or left/right axis) is given in Figure 3. A tendency for counties with higher total incomes to have higher retail sales can be seen. This is considered to be a positive association.

A data map of per capita income is given in Figure 4. We can see visually where the most affluent and poorest counties are.

A time series plot of monthly average airfares (per 1000 miles of domestic flights) is given in Figure 5 for the period January 1980 through December 2001 (Source: Air Transport Association). We observe periodic trends (as demand shifts throughout the year) as well as longer term cycles; however, the series shows only a very small long-term increase in trend. These prices are not adjusted for inflation and are called nominal prices (not to be confused with nominal variable types). Figure 6 gives the series adjusted for inflation, showing that real prices have decreased over this period. Figure 7 gives the monthly consumer price index (CPI) over this 22 year (264 month) period (Source: US Department of Commerce).

1.5 Parameters and Statistics

Parameters are numerical descriptive measures corresponding to populations. We will use the general notation θ to represent parameters. Special cases include:

µ Population mean — The average value of all elements in the population. It is also considered the ‘long–run’ average measurement in terms of conceptual populations. It can be thought of as the value each unit would receive if the total of the outcomes had been evenly distributed among the units.

σ2 Population variance — Measure of spread (around µ) of the elements of the population.

P Population proportion — The proportion of all elements of the population that possess a particular characteristic.

µ1 − µ2 The difference between 2 population means.

P1 − P2 The difference between 2 population proportions.

Examples related to previous scenarios, as well as new ones include:

PA1 The proportion of all NYSE listed firms whose stock value increased in 2001 (P).

PA2 The proportion of all living UF graduates who are members of the alumni association (P ).

PA3 The mean number of flaws in all pairs of Levis 550 jeans manufactured in January, 2002 (µ).

PA4 The proportion of all people who have (or will have) a disease that show remission due to drug treatment (P).

PA5 The difference between mean lifetimes of two brands of automobile tires (µ1 − µ2).


Figure 1: Frequency histogram of per capita incomes among Florida counties


Figure 2: Pie chart of per capita incomes among Florida counties


Figure 3: Scatter plot of retail sales versus total income among Florida counties


Figure 4: Data map of per capita incomes among Florida counties


Figure 5: Monthly nominal (unadjusted for inflation) airfares (price per 1000 miles) on domestic flights


Figure 6: Monthly real (adjusted for inflation) airfares (price per 1000 miles) on domestic flights


Figure 7: Monthly Consumer Price Index (CPI-U) for all goods


PA6 The difference in the proportions of all men and women who have made credit card purchases over the internet (P1 − P2).

Statistics are numerical descriptive measures corresponding to samples. We will use the general notation θ̂ (“theta-hat”) to represent statistics. Special cases include:

Mode — The outcome that occurs most often. It is usually reported for nominal or ordinal variables, or simply as a peak of the distribution when the variable is continuous.

Median — Middle value (after the numbers have been sorted from smallest to largest). Can be reported for ordinal or interval scale data. Let X(1) be the smallest, X(n) be the largest, and X(i) be the ith ordered observation in a sample of n items:

n even: Median = M = (X(n/2) + X(n/2+1)) / 2

n odd: Median = M = X((n+1)/2)

X Sample mean — The average value of the elements of the sample:

X = (∑_{i=1}^n Xi) / n

S² Sample variance — Measure of the spread (around X) of the elements of the sample:

S² = [∑_{i=1}^n (Xi − X)²] / (n − 1) = [∑_{i=1}^n Xi² − nX²] / (n − 1)

S Sample standard deviation — Measure of the spread (around X) of the elements of the sample:

S = √( [∑_{i=1}^n (Xi − X)²] / (n − 1) ) = √( [∑_{i=1}^n Xi² − nX²] / (n − 1) )

p Sample proportion — The proportion of elements in the sample that have a particular characteristic:

p = X/n = (# of elements with the characteristic (Successes)) / (# of elements in the sample (trials))

X1 − X2 — The difference between two sample means.

p1 − p2 — The difference between two sample proportions.
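Several of these sample statistics translate directly into code. Below is a short Python sketch (the function names and data values are ours, not the notes’) implementing the order-statistic median rule and the sample proportion:

```python
def median(sample):
    """Middle value via the order rule; x[k-1] plays the role of X(k)."""
    x = sorted(sample)
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]          # n odd:  M = X((n+1)/2)
    return (x[n // 2 - 1] + x[n // 2]) / 2  # n even: M = (X(n/2) + X(n/2+1)) / 2

def proportion(successes, n):
    """p = X / n, the fraction of sampled elements with the characteristic."""
    return successes / n

print(median([3, 1, 2]))      # 2   (odd n)
print(median([4, 1, 3, 2]))   # 2.5 (even n)
print(proportion(18, 50))     # 0.36
```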

Examples related to previous scenarios, as well as new ones include:

ST1 Among a random sample of n = 50 firms listed on the NYSE in 2001, 18 (p = 18/50 = 0.36) had stock prices increase during 2001.


ST2 Among a sample of 200 UF graduates, 44 are paying members of the alumni association (p = 44/200 = 0.22).

ST3 A quality inspector samples 60 pairs of Levis 550 jeans, and finds a total of 66 flaws, yielding an average of X = 66/60 = 1.10 flaws per pair of jeans.

ST4 Of 20 patients selected with a particular disease, 12 (p = 12/20 = .60) show some remission after drug treatment.

ST5 Samples of 20 tires from each of two manufacturers are obtained, and the number of miles run until the tread is worn to the legal limit is measured. Brand A has an average of X1 = 27,459 miles, while Brand B has an average of X2 = 32,671 miles. The difference between the two brands’ sample means is X1 − X2 = 27,459 − 32,671 = −5,212 miles.

ST6 Independent samples of male and female consumers find that among males, p1 = 0.26 have made credit card purchases over the internet. Among females, p2 = 0.44 have made credit card purchases on the internet.

Statistics based on samples will be used to estimate parameters corresponding to populations, as well as to test hypotheses concerning the true values of parameters.

Example 1.2 – Closing Prices for Stocks: 3/5/2002

A sample of n = 5 firms is obtained from the NYSE, and their closing prices are given in Table 1.5. We then compute the sample mean, median, variance, and standard deviation, where Xi is the closing price for firm i.

Firm (i)        Xi       Rank    Xi²       Xi − X               (Xi − X)²
Coca-Cola (1)   47.60     4     2265.76   47.6 − 39.5 = 8.1       65.61
GE (2)          40.50     2     1640.25   40.5 − 39.5 = 1.0        1.00
Pfizer (3)      40.60     3     1648.36   40.6 − 39.5 = 1.1        1.21
Sony (4)        50.30     5     2530.09   50.3 − 39.5 = 10.8     116.64
Toys R Us (5)   18.50     1      342.25   18.5 − 39.5 = −21.0    441.00
Sum             197.50          8426.71                  0.00    625.46

The sample mean, X, is computed as follows:

X = (∑_{i=1}^n Xi) / n = 197.50/5 = 39.50

The median is the ((n + 1)/2)th = ((5 + 1)/2)th = 3rd ordered outcome, which is Pfizer’s closing price (not because i = 3, but because its rank = 3), so M = X(3) = 40.60. The sample variance, S², and sample standard deviation, S, can be computed in two ways: the definitional form and the shortcut form. The definitional form is as follows:

S² = [∑_{i=1}^n (Xi − X)²] / (n − 1) = 625.46/(5 − 1) = 156.37        S = +√S² = +√156.37 = 12.50

The shortcut form is as follows:

S² = [∑_{i=1}^n Xi² − nX²] / (n − 1) = (8426.71 − 5(39.50)²)/(5 − 1) = (8426.71 − 7801.25)/4 = 625.46/4 = 156.37        S = +√156.37 = 12.50
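These hand computations are easy to verify in Python; the sketch below (not part of the notes) recomputes the Example 1.2 statistics using both variance forms (the text’s 156.37 is 625.46/4 = 156.365, rounded):

```python
prices = [47.60, 40.50, 40.60, 50.30, 18.50]  # closing prices from Table 1.5
n = len(prices)

xbar = sum(prices) / n                                   # 197.50 / 5 = 39.50
s2_def = sum((x - xbar) ** 2 for x in prices) / (n - 1)  # definitional form
s2_short = (sum(x * x for x in prices) - n * xbar ** 2) / (n - 1)  # shortcut form
s = s2_def ** 0.5                                        # about 12.50

# Both variance forms give about 156.37, matching the hand computation.
```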


2 Lecture 2 — Probability

Textbook Sections: 4.4–4.8
Problems: 4.25, 4.27, 4.29, 4.31, 4.33, 4.39, 4.43

Probability is used to measure the ‘likelihood’ or ‘chances’ of certain events (prespecified outcomes) of an experiment. Certain rules of probability will be used in this course and are reviewed here. We first define two events A and B, with probabilities P(A) and P(B), respectively. The intersection of events A and B is the event that both A and B occur, with notation AB (sometimes written A ∩ B). The union of events A and B is the event that either A or B occurs, with notation A ∪ B. The complement of event A is the event that A does not occur, with notation Ā. Some useful rules for obtaining these and other probabilities include:

• P(A ∪ B) = P(A) + P(B) − P(AB)

• P(A|B) = P(A occurs given B has occurred) = P(AB)/P(B) (assuming P(B) > 0)

• P(AB) = P(A)P(B|A) = P(B)P(A|B)

• P(Ā) = 1 − P(A)

A special case occurs when events A and B are said to be independent. This is when P(A|B) = P(A), or equivalently P(B|A) = P(B); in this situation, P(AB) = P(A)P(B). We will be using this idea later in this course.

Example 2.1 – Phase III Clinical Trial for Pravachol

Among a population of adult males with high cholesterol, approximately half of the males were assigned to receive Pravachol (Bristol–Myers Squibb), and approximately half received a placebo. The outcome observed was whether or not the patient suffered from a cardiac event within five years of beginning treatment. The counts of patients falling into each combination of treatment and outcome are given in Table 1.

                       Cardiac Event
Treatment        Present (B)   Absent (B̄)   Total
Pravachol (A)        174          3128       3302
Placebo (Ā)          248          3045       3293
Total                422          6173       6595

Table 1: Numbers of patients falling in each treatment/cardiac outcome combination (Source: NEJM, 11/16/95, pp 1301–1307)

If we define the event A to be that the patient received Pravachol, and the event B to be that the patient suffers from a cardiac event over the study period, we can use the table to obtain some pertinent probabilities:

1. P(A) = P(AB) + P(AB̄) = (174/6595) + (3128/6595) = 3302/6595 = .5007

2. P(Ā) = P(ĀB) + P(ĀB̄) = (248/6595) + (3045/6595) = 3293/6595 = .4993


3. P(B) = P(AB) + P(ĀB) = (174/6595) + (248/6595) = 422/6595 = .0640

4. P(B̄) = P(AB̄) + P(ĀB̄) = (3128/6595) + (3045/6595) = 6173/6595 = .9360

5. P(AB) = 174/6595 = .0264

6. P(ĀB) = 248/6595 = .0376

7. P(A ∪ B) = P(A) + P(B) − P(AB) = .5007 + .0640 − .0264 = .5383

8. P(B|A) = P(AB)/P(A) = .0264/.5007 = .0527

9. P(B|Ā) = P(ĀB)/P(Ā) = .0376/.4993 = .0753
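All of these probabilities are simple ratios of the counts in Table 1, so they can be checked mechanically. A Python sketch (variable names are ours, not the notes’):

```python
# Counts from Table 1: A = Pravachol, B = cardiac event present.
n_AB, n_ABc = 174, 3128    # Pravachol row: event present / absent
n_AcB, n_AcBc = 248, 3045  # Placebo row:   event present / absent
N = n_AB + n_ABc + n_AcB + n_AcBc  # 6595 patients in total

P_A = (n_AB + n_ABc) / N                 # about .5007
P_B = (n_AB + n_AcB) / N                 # about .0640
P_AB = n_AB / N                          # about .0264
P_A_or_B = P_A + P_B - P_AB              # about .5383
P_B_given_A = P_AB / P_A                 # about .0527, event rate on Pravachol
P_B_given_Ac = n_AcB / (n_AcB + n_AcBc)  # about .0753, event rate on placebo
```

Note that P(B|Ā) reduces to the row-wise rate 248/3293: each row of the table is exactly the conditional distribution of the outcome given the treatment.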

2.1 Bayes’ Rule

Sometimes we can easily obtain probabilities of the form P(A|B) and P(B), and wish to obtain P(B|A). This is very important in decision theory with respect to updating information. We start with a prior probability, P(B); we then observe an event A and obtain P(A|B). Then we update our probability of B in light of the knowledge that A has occurred.

In the case of B having only two possible outcomes, B and B̄, Bayes’ rule can be stated as follows:

P(B|A) = P(AB)/P(A) = P(AB) / [P(AB) + P(AB̄)] = P(A|B)P(B) / [P(A|B)P(B) + P(A|B̄)P(B̄)]

In general, if B has k possible (mutually exclusive and exhaustive) outcomes B1, . . . , Bk, the rule can be stated as follows:

P(Bj|A) = P(ABj)/P(A) = P(ABj) / ∑_{i=1}^k P(ABi) = P(A|Bj)P(Bj) / ∑_{i=1}^k P(A|Bi)P(Bi)

Example 2.2 – Moral Hazard

A manager cannot observe whether her salesperson works hard. She believes, based on prior experience, that the probability her salesperson works hard (H) is 0.30. She believes that if the salesperson works hard, the probability a sale (S) is made is 0.75. If the salesperson does not work hard, the probability the sale is made is 0.15.

What is the probability that the salesperson worked hard if the sale was made? If not made?

• Pr{Works Hard} = P(H) = 0.30    Pr{Does Not Work Hard} = P(H̄) = 1 − 0.30 = 0.70

• Pr{Makes Sale | Works Hard} = P(S|H) = 0.75

• Pr{Makes Sale | Does Not Work Hard} = P(S|H̄) = 0.15

P(H|S) = P(HS)/P(S) = P(S|H)·P(H) / [P(S|H)·P(H) + P(S|H̄)·P(H̄)] = 0.75(0.30) / [0.75(0.30) + 0.15(0.70)] = 0.225/(0.225 + 0.105) = 0.225/0.330 = 0.68


P(H|S̄) = P(S̄|H)·P(H) / [P(S̄|H)·P(H) + P(S̄|H̄)·P(H̄)] = 0.25(0.30) / [0.25(0.30) + 0.85(0.70)] = 0.075/(0.075 + 0.595) = 0.075/0.670 = 0.11
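The two-outcome form of Bayes’ rule is compact enough to wrap in a small helper and check against the moral hazard numbers (a Python sketch; the function is our own wrapper, not from the notes):

```python
def bayes_two(prior_B, p_A_given_B, p_A_given_notB):
    """P(B|A) when B has exactly two outcomes, B and its complement."""
    num = p_A_given_B * prior_B
    return num / (num + p_A_given_notB * (1 - prior_B))

# Sale made: P(H|S), with P(H)=0.30, P(S|H)=0.75, P(S|not H)=0.15.
p_hard_sale = bayes_two(0.30, 0.75, 0.15)     # 0.225/0.330, about 0.68

# Sale not made: P(not S|H)=0.25 and P(not S|not H)=0.85 are complements.
p_hard_no_sale = bayes_two(0.30, 0.25, 0.85)  # 0.075/0.670, about 0.11
```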

Note the amount of updating of the probability that the salesperson worked hard, depending on whether the sale was made.

This is a simplistic example of a theoretical area in information economics (see, e.g., D.M. Kreps, A Course in Microeconomic Theory, Chapter 16).

Example 2.3 – O.J. Simpson’s DNA

In the O.J. Simpson murder trial, it was stated that 0.43% (proportion = .0043) of blood samples taken from all victims and suspects observed by the LA police department match the blood taken from the murder scene of Nicole Brown Simpson and Ronald Goldman. We will assume that this is representative of the fraction of people in the general population whose blood types match the blood at the crime scene. Define the following events:

A — A randomly selected person’s blood matches that found at the crime scene

B — A person is innocent of the murders

B̄ — A person is guilty of the murders

Assume that a guilty person’s blood will match that at the crime scene with certainty. In terms of diagnostic testing, the sensitivity of this test is 100% and the specificity of the test is 99.57%. That is:

P(A|B̄) = 1        P(A|B) = .0043 = 1 − .9957

Suppose you had a prior (to observing the blood evidence) probability that O.J. was innocent of 0.5 (P(B) = 0.5). You now find out that his blood matches that at the crime scene. What is your updated probability that he is innocent (ignoring the possibility of tampering)?

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B̄)P(B̄)] = 0.5(.0043) / [0.5(.0043) + (1 − 0.5)(1)] = .00215/(.00215 + 0.5) = .00215/.50215 = .0043

Repeat for prior probabilities of 0.9 and 0.1.

Source: Forst B. (1996). “Evidence, Probabilities and Legal Standards for the Determination of Guilt: Beyond the O.J. Trial.” In Representing O.J.: Murder, Criminal Justice, and the Mass Culture, ed. G. Barak, pp 22-28. Guilderland, N.Y.: Harrow and Heston.
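The requested repetition for priors 0.9 and 0.1 can be carried out in a few lines of Python (a sketch; the match probabilities are the ones given in the example, and the function name is ours):

```python
def posterior_innocent(prior, p_match_innocent=0.0043, p_match_guilty=1.0):
    """P(innocent | blood match), by the two-outcome Bayes' rule."""
    num = p_match_innocent * prior
    return num / (num + p_match_guilty * (1 - prior))

for prior in (0.5, 0.9, 0.1):
    print(prior, round(posterior_innocent(prior), 4))
```

Even a prior of 0.9 collapses to a posterior below 0.04 once the match is observed, because the .0043 match rate among the innocent is so small relative to the assumed certainty of a match for the guilty.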

Example 2.4 – Adverse Selection (Job Market Signaling)

Consider a simple model where there are two types of workers – low quality and high quality. Employers are unable to determine a worker’s quality type. The workers choose education levels to signal their quality types to employers. Workers can either obtain a college degree (high education level) or not obtain a college degree (low education level). The effort of obtaining a college degree is lower for high quality workers than for low quality workers. Employers pay higher wages to workers with higher education levels, since this is an (imperfect) signal of their quality types.

Suppose you know that in the population of workers, half are low quality and half are high quality. Thus, prior to observing a potential employee’s education level, the employer thinks the probability the worker will be high quality is 0.5. Among high quality workers, 70% will pursue a college degree (30% do not pursue a degree), and among low quality workers, 20% pursue a college degree (80% do not).

Let Q be the event a worker is high quality, and Q̄ be the event the worker is low quality. Further, let E be the event the worker obtains a college degree, and Ē be the event that the worker does not obtain a college degree. Then we are given the following probabilities from the problem description above:

P (Q) = 0.5    P (Q̄) = 1 − P (Q) = 1 − 0.5 = 0.5    P (E|Q) = 0.70    P (E|Q̄) = 0.20

What is the probability a worker is high quality, given (s)he has a college degree?

P (Q|E) = [P (Q) · P (E|Q)] / [P (Q) · P (E|Q) + P (Q̄) · P (E|Q̄)] = 0.5(0.70) / [0.5(0.70) + 0.5(0.20)] = 0.35 / (0.35 + 0.10) = 0.35/0.45 = 0.78

What is the probability a worker is high quality, given (s)he does not have a college degree?

P (Q|Ē) = [P (Q) · P (Ē|Q)] / [P (Q) · P (Ē|Q) + P (Q̄) · P (Ē|Q̄)] = 0.5(0.30) / [0.5(0.30) + 0.5(0.80)] = 0.15 / (0.15 + 0.40) = 0.15/0.55 = 0.27

This is a simplistic example from a theoretical area in information economics (see e.g. D.M. Kreps, A Course in Microeconomic Theory, Chapter 17).

Example 2.5 – Cholera and London Water Companies

Epidemiologist John Snow conducted a massive survey during a cholera epidemic in London during 1853-1854. He found that water was being provided through the pipes of two companies: Southwark & Vauxhall (S&V) and Lambeth (L). Apparently, the Lambeth company was obtaining their water upstream in the Thames River from the London sewer outflow, while the S&V company got theirs near the sewer outflow.

Table 2 gives the numbers (or counts) of people who died of cholera and who did not, separately for the two companies.

Outcome            Lambeth      S&V     Row Total
Cholera Death          407      3702         4109
No Cholera Death    170956    261211       432167
Column Total        171363    264913       436276

Table 2: John Snow’s London cholera results

a) What is the probability a randomly selected person received water from the Lambeth company? From the S&V company?

b) What is the probability a randomly selected person died of cholera? Did not die of cholera?

c) What proportion of the Lambeth consumers died of cholera? Among the S&V consumers? Is the incidence of cholera death independent of company?

d) What is the probability a person received water from S&V, given (s)he died of cholera?

Source: W.H. Frost (1936). Snow on Cholera, London, Oxford University Press.
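Parts a)–d) reduce to ratios of the counts in Table 2; a sketch of the arithmetic (the dictionary layout is ours):

```python
# counts from Table 2 (John Snow's London cholera data)
deaths = {"Lambeth": 407, "SV": 3702}
no_deaths = {"Lambeth": 170956, "SV": 261211}

totals = {k: deaths[k] + no_deaths[k] for k in deaths}  # column totals
n = sum(totals.values())                                # 436276 people
total_deaths = sum(deaths.values())                     # 4109 deaths

p_lambeth = totals["Lambeth"] / n                    # part (a)
p_death = total_deaths / n                           # part (b)
rate_lambeth = deaths["Lambeth"] / totals["Lambeth"] # part (c)
rate_sv = deaths["SV"] / totals["SV"]
p_sv_given_death = deaths["SV"] / total_deaths       # part (d)

print(round(p_lambeth, 4), round(p_death, 4),
      round(rate_lambeth, 4), round(rate_sv, 4),
      round(p_sv_given_death, 4))
```

The death rate among S&V consumers is roughly six times that among Lambeth consumers, so cholera death is clearly not independent of the water company.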


3 Lecture 3 – Discrete Random Variables and Probability Distributions

Textbook Sections: 5.1, 5.2, Notes (for bivariate r.v.’s)
Problems: 5.1, 5.3, see lecture notes

An experiment is conducted and some measurement is to be made regarding the outcome. This type of measurement can be classified as either discrete or continuous. Discrete random variables can take on only a finite (or countably infinite) set of possible outcomes. Examples include:

RV1 The number of surveyed voters who favor Al Gore in the upcoming election from a survey of 722 registered voters

RV2 The number of military personnel that oppose the military’s ban on homosexuals from a survey of 300 current military personnel

RV4 The number of patients, out of a group of 20 under study, that react positively to a new drug treatment

RV5 The number of successful shuttle launches out of the first 30 shuttle missions

Continuous random variables can take on any value corresponding to points on a line interval. It should be noted that while this type of variable occurs on a continuous scale, it is measured on some sort of discrete scale (a news weatherman reports the temperature as 93◦F, not 92.7756 . . .◦F). Examples include:

RV3 The gas mileage of a Ford Mustang GT convertible when run at 65 miles per hour.

RV7 The number of miles a tire can travel before wearing out.

RV9 The amount of time needed to housetrain a dog.

These are considered random variables because we have randomly selected some subject or object from a population of such subjects (objects). The populations of these subjects (whether existing or conceptual) are said to have probability distributions. These are models of the distribution of the measurements corresponding to the elements of the population.

Discrete probability distributions are a set of outcomes (denoted by x) and their corresponding probabilities. The distribution can be presented in terms of a table, graph, or formula representing each possible outcome of the random variable and its probability of occurring. Defining p(x) as “the probability the random variable takes on the value x”, we have the following simple rules for discrete probability distributions:

1. 0 ≤ p(x) ≤ 1

2. Σx p(x) = 1

Thus, all probabilities must be between 0 and 1, and all probabilities must sum to 1. We consider discrete random variables in this lecture.

Example 3.1 — New Florida Lotto Game


Consider Florida’s newly renovated lotto game. Before the drawing, you buy a card by choosing 6 different numbers between 1 and 53 inclusive, and giving the clerk $1. When the state subsequently chooses its 6 numbers, there will be either 0, 1, 2, 3, 4, 5, or 6 numbers that match yours. This is a discrete random variable. You do not know how many of the state’s numbers will match yours, but you can obtain the probability of each possible outcome. This is a set of probabilities that can be used to set up the corresponding probability distribution. For this case, if we let X be the random variable representing how many of the state’s numbers match yours, it has the probability distribution given in Table 3. The distribution used is the hypergeometric distribution, which is described in many textbooks on mathematical statistics.

x    p(x)
0    .46771566391
1    .40089914050
2    .11654044782
3    .01412611489
4    .00070630574
5    .00001228358
6    .00000004356

Table 3: Probability distribution for number of winning digits on a Florida lotto ticket

Note that all probabilities are between 0 and 1, and that they sum to 1. Of course, your ticket is worthless unless x ≥ 3, so the probability distribution corresponding to your prize amount will be different from this distribution (you will pool p(0), p(1), and p(2) to obtain the probability you win nothing (.98515525223)).
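For this setup, the hypergeometric probabilities are p(x) = C(6, x) · C(47, 6 − x) / C(53, 6), and Table 3 can be reproduced with the standard library’s `math.comb`:

```python
from math import comb

def lotto_pmf(x, picked=6, total=53):
    """Hypergeometric probability of matching x of the state's 6 numbers."""
    return comb(picked, x) * comb(total - picked, picked - x) / comb(total, picked)

probs = [lotto_pmf(x) for x in range(7)]
print([round(p, 11) for p in probs])
print(round(sum(probs), 10))                      # the probabilities sum to 1
print(round(probs[0] + probs[1] + probs[2], 11))  # probability of winning nothing
```

The pooled probability p(0) + p(1) + p(2) matches the .98515525223 quoted above.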

For discrete probability distributions, the mean, µ, is interpreted as the ‘long run average outcome’ if the experiment were conducted many times. The variance, σ2, is a measure of how variable these outcomes are. The variance is the average squared distance between the outcome of the random variable and the mean. The positive square root of the variance is the standard deviation, σ, and is in the units of the original data.

For a discrete random variable:

• µ = E(X) = Σx x · p(x)

• σ2 = V (X) = E[(X − µ)2] = Σx (x − µ)2 · p(x) = Σx x2 · p(x) − µ2

• σ = +√σ2

Example 3.1 – continued

Referring back to the new Florida lotto example, we obtain the mean and variance from the calculations in Table 4. Thus, under the new game, the average number of “winning digits” is µ = 0.6792, with a variance and standard deviation of (using 4 digits in calculations):

σ2 = 1.0058 − (0.6792)2 = 0.5445        σ = 0.7379


x     p(x)            x · p(x)         x2 · p(x)
0     .46771566391    0                0
1     .40089914050    .40089914050     .40089914050
2     .11654044782    .23308089564     .46616179128
3     .01412611489    .04237834466     .12713503399
4     .00070630574    .00282522298     .01130089191
5     .00001228358    .00006141789     .00030708945
6     .00000004356    .00000026135     .00000156812
Sum   1.00            .67924528302     1.00580551525

Table 4: Calculation of the mean and variance of the number of winning digits on a Florida lotto ticket

The variation in the number of correct numbers is relatively small as well, reflecting the fact that people almost always get either 0, 1, or 2 correct numbers.
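The calculations in Table 4 can be checked directly from the definitions µ = Σ x·p(x) and σ2 = Σ x2·p(x) − µ2:

```python
# probabilities from Table 3
pmf = {0: .46771566391, 1: .40089914050, 2: .11654044782,
       3: .01412611489, 4: .00070630574, 5: .00001228358, 6: .00000004356}

mu = sum(x * p for x, p in pmf.items())
var = sum(x**2 * p for x, p in pmf.items()) - mu**2
sd = var ** 0.5

# exact arithmetic gives sigma^2 ~ 0.5444; the text's 0.5445 comes from
# rounding intermediate values to 4 digits before subtracting
print(round(mu, 4), round(var, 4), round(sd, 4))
```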

Example 3.2 – Adverse Selection (Akerlof’s Market for Lemons)

George Akerlof shared the Nobel Prize in Economics in 2001 for an extended version of this model. There are two used car types: peaches and lemons. Sellers know the car type, having been driving it for a period of time. Buyers are unaware of a car’s quality. Buyers value peaches at $3000 and lemons at $2000. Sellers value peaches at $2500 and lemons at $1000. Note that if sellers had higher valuations, no cars would be sold.

Suppose that 1/3 of the cars are peaches and the remaining 2/3 are lemons. What is the expected value to a buyer, if (s)he purchases a car at random? We will let X represent the value to the buyer, which takes on the values 3000 (for peaches) and 2000 (for lemons).

µ = E(X) = Σx x · p(x) = 3000(1/3) + 2000(2/3) = 2333.33

Thus, buyers will not pay over $2333.33 for a used car, and since the value of peaches is $2500 to sellers, only lemons will be sold; buyers will learn that, and pay only $2000. At what fraction of the cars being peaches will both types of cars be sold?

For a theoretical treatment of this problem, see e.g. D.M. Kreps, A Course in Microeconomic Theory, Chapter 17.
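The closing question can be answered by solving 3000f + 2000(1 − f) ≥ 2500 for the fraction f of peaches, which gives f ≥ 1/2. A quick check (the function name is ours):

```python
def buyer_expected_value(frac_peaches):
    """Expected value to the buyer of a randomly purchased car."""
    return 3000 * frac_peaches + 2000 * (1 - frac_peaches)

print(round(buyer_expected_value(1/3), 2))   # the text's case

# smallest fraction of peaches at which the buyers' expected value
# reaches the sellers' $2500 valuation of a peach
threshold = (2500 - 2000) / (3000 - 2000)
print(threshold)
```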

3.1 Bivariate Distributions

Often we are interested in the outcomes of 2 (or more) random variables. Suppose you have the opportunity to purchase shares of two firms. Your (subjective) joint probability distribution (p(x, y)) for the returns on the two stocks is given in Table 5.

                  Stock B
                  6%      10%
Stock A    0%    .10      .40
          16%    .40      .10


Thus, you have reason to believe there is little possibility that both will perform poorly or strongly. For now, denote X as the return for stock A and Y as the return for stock B. We can think of these industries as “substitutes.”

Marginally, what is the probability distribution for stock A (this is called the marginal distribution)? For stock B? These are given in Table 6.

Stock A                  Stock B
x     P (X = x)          y     P (Y = y)
0     .10+.40=.50        6     .10+.40=.50
16    .40+.10=.50        10    .40+.10=.50

Table 6: Marginal probability distributions for stock returns

Hence, we can compute the mean and variance for X and Y:

E(X) = µX = 0(0.5) + 16(0.5) = 8.0        V (X) = σ2X = (0 − 8)2(0.5) + (16 − 8)2(0.5) = 64.0

E(Y ) = µY = 6(0.5) + 10(0.5) = 8.0        V (Y ) = σ2Y = (6 − 8)2(0.5) + (10 − 8)2(0.5) = 4.0

So, both stocks have the same expected return, but stock A is riskier, in the sense that its variance is much larger.

How do X and Y “co-vary” together? For these two firms, we find that the covariance is negative, since high values of X tend to be seen with low values of Y and vice versa. We compute the covariance of their returns, which we denote as COV (X,Y ) = E[(X − µX)(Y − µY )], in Table 7.

COV (X,Y ) = E[(X − µX)(Y − µY )] = σXY = Σx Σy (x − µX)(y − µY )p(x, y) = E(XY ) − µXµY

x     x − µX    y     y − µY    P (X = x, Y = y) = p(x, y)    (x − µX)(y − µY )p(x, y)
0     −8        6     −2        0.10                          (−8)(−2)(.10) = 1.6
0     −8        10    2         0.40                          (−8)(2)(.40) = −6.4
16    8         6     −2        0.40                          (8)(−2)(.40) = −6.4
16    8         10    2         0.10                          (8)(2)(.10) = 1.6
Sum                             1.00                          −9.6

Table 7: Covariance of stock returns
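The marginal means and variances and the covariance can all be checked mechanically from the joint table; a sketch (the dictionary layout is ours):

```python
# joint pmf p(x, y) for the substitutable-industries table (Table 5)
joint = {(0, 6): 0.10, (0, 10): 0.40, (16, 6): 0.40, (16, 10): 0.10}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

# variance of the sum via V(X+Y) = V(X) + V(Y) + 2 COV(X,Y)
var_sum = var_x + var_y + 2 * cov_xy

print(mu_x, mu_y, var_x, var_y, cov_xy, var_sum)
```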

Functions of Random Variables

Suppose we are interested in the sum of X and Y. What will be its probability distribution (specifically its mean and variance)?

E(X + Y ) = E(X) + E(Y ) = µX + µY

V (X + Y ) = V (X) + V (Y ) + 2COV (X,Y ) = σ2X + σ2Y + 2σXY


x     y     p(x, y)    x + y
0     6     .10        6
0     10    .40        10
16    6     .40        22
16    10    .10        26

Table 8: Distribution for the sum of stock returns

To see this, look at the distribution of the random variable X + Y in Table 8.

1) By definition of mean and variance:

E(X + Y ) = 6(.10) + 10(.40) + 22(.40) + 26(.10) = 0.6 + 4.0 + 8.8 + 2.6 = 16

V (X + Y ) = (6 − 16)2(.1) + (10 − 16)2(.4) + (22 − 16)2(.4) + (26 − 16)2(.1) =

100(.1) + 36(.4) + 36(.4) + 100(.1) = 10.0 + 14.4 + 14.4 + 10.0 = 48.8

2) By the formula:

E(X + Y ) = E(X) + E(Y ) = 8 + 8 = 16

V (X + Y ) = V (X) + V (Y ) + 2COV (X,Y ) = 64 + 4 + 2(−9.6) = 48.8

General Case (Linear Function) where a and b are any constants:

E(aX + bY ) = aE(X) + bE(Y ) = aµX + bµY

V (aX + bY ) = a2V (X) + b2V (Y ) + 2abCOV (X,Y ) = a2σ2X + b2σ2Y + 2abσXY

Example 3.3 – Stock Purchase

You can purchase either stock A, stock B, or any combination of A and B. Your two criteria for choosing are 1) highest expected return, and 2) lowest variance of return. You can choose p between 0 and 1 (inclusive), where p is the fraction of A you will purchase and 1 − p is the fraction of B you will purchase. Your (random) return is:

R = pX + (1 − p)Y

1) Compute your expected return:

E(R) =

2) Compute the variance of your return:

V (R) =

3) What value of p should you choose?
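One way to explore parts 1)–3) numerically, using E(R) = pµX + (1 − p)µY and V (R) = p2σ2X + (1 − p)2σ2Y + 2p(1 − p)σXY with the moments found earlier (µX = µY = 8, σ2X = 64, σ2Y = 4, σXY = −9.6). The grid search is our sketch, not the text's method:

```python
mu_x, mu_y = 8.0, 8.0
var_x, var_y, cov_xy = 64.0, 4.0, -9.6

def mean_return(p):
    # E(R) = p*mu_x + (1-p)*mu_y, which equals 8 for every p here
    return p * mu_x + (1 - p) * mu_y

def var_return(p):
    # V(R) from the linear-combination formula with a = p, b = 1-p
    return p**2 * var_x + (1 - p)**2 * var_y + 2 * p * (1 - p) * cov_xy

# since every portfolio has the same expected return, the criterion
# reduces to minimizing variance; search a fine grid of p values
grid = [i / 1000 for i in range(1001)]
best_p = min(grid, key=var_return)
print(mean_return(0.3), best_p, round(var_return(best_p), 3))
```

Calculus gives the same answer: dV/dp = 174.4p − 27.2 = 0, so p = 27.2/174.4 ≈ 0.156.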


                  Stock B
                  6%      10%
Stock A    0%    .40      .10
          16%    .10      .40

Table 9: Joint probability distribution for stock returns – Complementary Industries

Problem 3.1

Conduct the analysis for two complementary industries, whose fortunes tend to be good/bad simultaneously. The joint probability distribution is given in Table 9.

A classic paper on this topic (more mathematically rigorous than this example, where each stock has only two possible outcomes) is: Harry M. Markowitz, “Portfolio Selection,” Journal of Finance, 7 (March 1952), pp 77-91.


4 Lecture 4 – Introduction to Decision Analysis

Textbook Sections: 18.1, 18.2 (1st 2 subsections), 18.3 (1st 3 subsections)
Problems: 18.1a,b, 3, 5, 6, 7, 8, 9

Oftentimes managers must make long-term decisions without knowing what future events will occur that will affect the firm’s financial outcome from their decisions. Decision analysis is a means for managers to consider their choices and help them select an optimal strategy. For instance:

• Financial officers must decide among certain investment strategies without knowing the state of the economy over the investment horizon.

• A buyer must choose a model type for the firm’s fleet of cars, without knowing what gas prices will be in the future.

• A drug company must decide whether to aggressively develop a new drug without knowing whether the drug will be effective in the patient population.

Decision analysis in its simplest form includes the following components:

Decision Alternatives – These are the actions that the decision maker has to choose from.

States of Nature – These are occurrences that are out of the control of the decision maker, andthat occur after the decision has been made.

Payoffs – Benefits (or losses) realized when a particular decision alternative has been selected and a given state of nature has been observed.

Payoff Table – A tabular listing of payoffs for all combinations of decision alternatives and states of nature.

Case 1 - Decision Making Under Certainty

In the extremely unlikely case that the manager knows which state of nature will occur, the manager will simply choose the decision alternative with the highest payoff conditional on that state of nature. Of course, this is a very unlikely situation unless you have a very accurate psychic on the company payroll.

Case 2 - Decision Making Under Uncertainty

When the decision maker does not know which state will occur, or even what probabilities to assign to the states of nature, several options exist. The two simplest criteria are:

Maximax – Look at the maximum payoff for each decision alternative. Choose the alternative with the highest maximum payoff. This is optimistic.

Maximin – Look at the minimum payoff for each decision alternative. Choose the alternative with the highest minimum payoff. This is pessimistic.


Case 3 - Decision Making Under Risk

In this case, the decision maker does not know which state will occur, but does have probabilities to assign to the states. Payoff tables can be written in the form of decision trees. Note that in the diagrams below, squares refer to decision alternatives and circles refer to states of nature.

Expected Monetary Value (EMV) – This is the expected payoff for a given decision alternative. We take each payoff times the probability of that state occurring, and sum across states. There will be one EMV per decision alternative. One criterion commonly used is to select the alternative with the highest EMV.

Expected Value of Perfect Information (EVPI) – This is a measure of how valuable it would be to know what state will occur. First we obtain the expected payoff with perfect information by multiplying the probability of each state of nature by its highest payoff, then summing over states of nature. Then we subtract off the highest EMV to obtain EVPI.

Example 4.1 – Long-term Marketing Plan

A drug manufacturer has two potential drugs for research and development. One drug targets a childhood illness, the other targets an illness among the elderly. The firm expects that both drugs will be effective and will obtain FDA approval, but it will be 10 years before either drug will be brought to market, and each will involve very expensive research and development. They are not sure what the size of each market will be in 10 years.

The firm has four decision alternatives:

• Pursue neither drug

• Pursue only the childhood drug

• Pursue only the elderly drug

• Pursue both drugs

There are six possible states of nature:

• Birth rates decrease and life expectancies stay constant (B − /L0)

• Birth rates stay constant and life expectancies stay constant (B0/L0)

• Birth rates increase and life expectancies stay constant (B + /L0)

• Birth rates decrease and life expectancies increase (B − /L+)

• Birth rates stay constant and life expectancies increase (B0/L+)

• Birth rates increase and life expectancies increase (B + /L+)

The payoffs (in $million) for each combination of decisions and states of nature are given in Table 10.

a) What would be your decision and payoff under each state of nature, if you were certain that state were to occur?

B − /L0 – Decision: Payoff:

B0/L0 – Decision: Payoff:

B + /L0 – Decision: Payoff:


Decision                          State of Nature
Alternative   B − /L0   B0/L0   B + /L0   B − /L+   B0/L+   B + /L+
Neither           0        0        0         0        0        0
Child           -20       10       40       -20       10       40
Elderly         -10      -10      -10        30       30       30
Both            -30        0       30        10       40       70

Table 10: Payoff table for drug development decision

B − /L+ – Decision: Payoff:

B0/L+ – Decision: Payoff:

B + /L+ – Decision: Payoff:

b) Give the maximax and maximin decisions and their corresponding criteria:

Maximax – Decision: Criteria:

Maximin – Decision: Criteria:

c) Suppose we are given the probability distribution for the 6 states of nature in Table 11.

State      Probability
B − /L0    0.05
B0/L0      0.10
B + /L0    0.15
B − /L+    0.15
B0/L+      0.25
B + /L+    0.30

Table 11: Probability distribution for states of nature for drug development decision

To obtain the expected monetary value for each decision alternative, we multiply the payoffs for each state of nature by their corresponding probabilities, summing over states of nature. For the decision to develop only the childhood drug:

EMV (Child) = (−20)(0.05) + 10(0.10) + 40(0.15) + (−20)(0.15) + 10(0.25) + 40(0.30) = −1.0 + 1.0 + 6.0 − 3.0 + 2.5 + 12.0 = 17.5

Neither – EMV (Neither)=

Child – EMV (Child)=

Elderly – EMV (Elderly)=

Both – EMV (Both)=


Based on the EMV criterion, which decision should the firm make?

d) A firm that conducts extensive research on population dynamics can be hired and can be expected to tell your firm exactly which state of nature will occur. Give the expected payoff under perfect information, and how much you would be willing to pay for that information (EVPI).
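The remaining EMVs in part c) and the EVPI in part d) follow the same pattern as the EMV(Child) computation above; a sketch that encodes Tables 10 and 11 (the dictionary layout is ours):

```python
# state probabilities from Table 11, in the order B-/L0, B0/L0, B+/L0, B-/L+, B0/L+, B+/L+
probs = [0.05, 0.10, 0.15, 0.15, 0.25, 0.30]

# payoffs ($million) from Table 10, states in the same order
payoffs = {
    "Neither": [0, 0, 0, 0, 0, 0],
    "Child":   [-20, 10, 40, -20, 10, 40],
    "Elderly": [-10, -10, -10, 30, 30, 30],
    "Both":    [-30, 0, 30, 10, 40, 70],
}

emv = {d: sum(p * x for p, x in zip(probs, row)) for d, row in payoffs.items()}
best = max(emv, key=emv.get)

# expected payoff with perfect information: best payoff within each state
eppi = sum(p * max(row[i] for row in payoffs.values())
           for i, p in enumerate(probs))
evpi = eppi - emv[best]

print(emv, best, eppi, evpi)
```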

Example 4.2 – Merck’s Decision to Build New Factory

Around 1993, Merck had to decide whether to build a new plant to manufacture the AIDS drug Crixivan. The drug had not been tested in clinical trials at the time. The plant would be very specialized, as the process to synthesize the drug was quite different from the process to produce other drugs.

Consider the following facts that were known at the time (I obtained most numbers through newspaper reports and company balance sheets; all numbers are approximate):

• Projected revenues – $500M/Year

• Merck profit margin – 25%

• Probability that drug will prove effective and obtain FDA approval – 0.10

• Cost of building new plants – $300M

• Sunk costs – $400M (Money spent in development prior to this decision)

• Length of time until new generation of drugs – 8 years

Ignoring tremendous social pressure, does Merck build the factory now, or wait two years and observe the results of clinical trials (thus forfeiting market share to Hoffman-Laroche and Abbott, who are in fierce competition with Merck)? Assume for this problem that if Merck builds now, and the drug gets approved, they will make $125M/year (present value) for eight years (note 125 = 500(0.25)). If they wait, and the drug gets approved, they will generate $62.5M/year (present value) for six years. This is a byproduct of losing market share to competitors and 2 years of production. Due to the specificity of the production process, the cost of the plant will be a total loss if the drug does not obtain FDA approval.

a) What are Merck’s decision alternatives?

b) What are the states of nature?

c) Give the payoff table.

d) Give the Expected Monetary Value (EMV) for each decision. Ignoring social pressure, shouldMerck go ahead and build the plant?

e) At what probability of the drug being successful is Merck indifferent between building early and waiting? That is, for what value are the EMVs equal for the decision alternatives?
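Under one reading of the numbers above — build now: $125M/year for eight years less the $300M plant, with the plant a total loss if the drug fails; wait: $62.5M/year for six years, paying for the plant only in the approval branch; sunk costs ignored in both cases — the EMVs and the indifference probability in part e) can be sketched as follows. This payoff construction is our assumption about how the text intends the cash flows, not Merck data:

```python
p_approve = 0.10

# payoffs in $M under our assumed reading of the example
build_approved = 125 * 8 - 300    # revenue stream less plant cost
build_rejected = -300             # specialized plant is a total loss
wait_approved = 62.5 * 6 - 300    # plant built only once approval is known
wait_rejected = 0                 # no plant built; sunk costs ignored

emv_build = p_approve * build_approved + (1 - p_approve) * build_rejected
emv_wait = p_approve * wait_approved + (1 - p_approve) * wait_rejected

# indifference: p*700 + (1-p)*(-300) = p*75, i.e. 1000p - 300 = 75p
p_star = 300 / 925
print(emv_build, emv_wait, round(p_star, 3))
```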


Note: Merck did build the plant early, and the drug did receive FDA approval.


5 Lecture 5 – Normal Distribution and Sampling Distributions

Textbook Sections and pages: 6.2, 7.2, 7.3, pp 336-337, pp 364-365, 9.1
Problems: 6.7, 9, 11, 7.13, 21, 22, 23, 25, 27

Continuous probability distributions are smooth curves that represent the ‘density’ of probability around particular values. This density is not interpreted as a probability at the point (all points will have probability of 0); rather, the probability of an outcome occurring between points a and b is measured as the area under the density function between a and b. The density function is always defined so that the total area under it is 1, and it is never negative. The continuous distribution you have seen most often is the normal distribution, but many others exist, including the t-distribution, which you have also already seen.

5.1 The Normal Distribution

Normal distributions are indexed by 2 parameters, the mean and variance (standard deviation). Figure 8 depicts 3 normal distributions with the same mean (µ = 100) and varying standard deviations (σ = 3, 10, and 25). Figure 9 depicts 3 normal distributions with the same standard deviation (σ = 10) and varying means (µ = 75, 100, and 125).


Figure 8: Normal distributions with common means and varying standard deviations (3, 10, 25)

Standard notation for a random variable X that follows a normal distribution with mean µ and standard deviation σ is X ∼ N(µ, σ). Since there are infinitely many normal distributions (corresponding to any µ and any σ > 0), we must standardize normal random variables to obtain probabilities corresponding to them. If X ∼ N(µ, σ), we define Z = (X − µ)/σ. Z represents the number of standard deviations above (or below, if negative) the mean that X lies. Table A.5 (p. A–14 and last page of text, not including inside back cover) gives the probability that Z lies between 0 and z for values of z between 0 and 3.49. Recall that the total area under the curve is 1, that the probability that Z is larger than 0 is 0.5, and that the curve is symmetric.

Example 5.1



Figure 9: Normal distributions with common standard deviations and varying means (75, 100, 125)

Scores on the Verbal Ability section of the Graduate Record Examination (GRE) between 10/01/92 and 9/30/95 had a mean of 479 and a standard deviation of 116, based on a population of N = 1188386 examinations. Scores can range between 200 and 800. Scores on standardized tests tend to be approximately normally distributed. Let X be a score randomly selected from this population. That is, X ∼ N(479, 116).

What is the probability that a randomly selected student scores above 700? What is the probability the student scores between 400 and 600? Above what score do the top 5% of all students score?

1. P (X ≥ 700) = P ((X − µ)/σ ≥ (700 − 479)/116) = P (Z ≥ 1.91) = P (Z ≥ 0) − P (0 ≤ Z ≤ 1.91) = 0.50 − 0.4719 = .0281

2. P (400 ≤ X ≤ 600) = P ((400 − 479)/116 ≤ (X − µ)/σ ≤ (600 − 479)/116) = P (−.68 ≤ Z ≤ 1.04) = P (−.68 ≤ Z ≤ 0) + P (0 ≤ Z ≤ 1.04) = P (0 ≤ Z ≤ .68) + P (0 ≤ Z ≤ 1.04) = .2517 + .3508 = .6025

3. .05 = .5 − .4500 = .5 − P (0 ≤ Z ≤ 1.645) = P (Z ≥ 1.645) = P ((X − µ)/σ ≥ 1.645) = P (X ≥ µ + 1.645σ) = P (X ≥ 479 + 1.645(116)) = P (X ≥ 670), so the top 5% score above 670.
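The three table lookups can be reproduced with the exact normal CDF, Φ(z) = (1 + erf(z/√2))/2, from Python's standard library; small differences from the text reflect rounding z to two decimals before using the table:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 479, 116

p_above_700 = 1 - phi((700 - mu) / sigma)
p_400_600 = phi((600 - mu) / sigma) - phi((400 - mu) / sigma)
top5_cutoff = mu + 1.645 * sigma

print(round(p_above_700, 4), round(p_400_600, 4), round(top5_cutoff))
```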

Source: “Interpreting Your GRE General Test and Subject Test Scores – 1996-97,” Educational Testing Service.

5.2 Sample Statistics and Sampling Distributions

We have described sample statistics previously, showing how they are calculated once a sample has been taken from a larger population. Since these samples are taken at random, the elements of the sample, and thus the sample statistics themselves, are random variables. One of the most important theorems in statistics is the Central Limit Theorem, which states that when the sample size is large (n ≥ 30), the sample mean is approximately normally distributed with mean µ and variance σ2/n, regardless of the shape of the underlying distribution of measurements. Here, µ and σ2 are the mean and variance of the distribution of the measurements. We can then write X̄ ∼ N(µ, σ/√n).


The other sample statistics p, X̄1 − X̄2, and p1 − p2 are also approximately normal in large samples. The distribution of a sample statistic is called its sampling distribution. The standard deviation of a sample statistic’s sampling distribution is called its standard error. Table 12 gives each of these 4 sample statistics as well as the means and standard errors of their sampling distributions. The row involving d̄ is a special case of the sample mean for differences among matched pairs of observations (see the paired difference experiment).

Estimator (θ̂)   Parameter (θ)   Std. Error (σθ̂)                     Estimated Std. Error (Sθ̂)           Degrees of Freedom (ν)
X̄                µ               σ/√n                                 S/√n                                 n − 1
p                P               √(P (1 − P )/n)                      √(p(1 − p)/n)                        —
X̄1 − X̄2         µ1 − µ2         √(σ21/n1 + σ22/n2)                   √(S21/n1 + S22/n2)                   n1 + n2 − 2 (∗)
d̄                µd              σd/√n                                Sd/√n                                n − 1
p1 − p2          P1 − P2         √(P1(1 − P1)/n1 + P2(1 − P2)/n2)     √(p1(1 − p1)/n1 + p2(1 − p2)/n2)     — (∗∗)

Table 12: Means, standard errors, and estimated standard errors of four sample statistics (estimators)

To obtain probabilities of observing particular values of a sample statistic, we use the fact that the statistic is normally distributed, and work with Z = (θ̂ − θ)/σθ̂.

Example 5.2 – NCAA Basketball Tournament Scores

The NCAA basketball tournament (often referred to as “March Madness”) has been held every spring since 1939. In the 55 years of the tournament (up until 1993), there had been 1583 games played. Among these 1583 games (the population), the mean and standard deviation of the combined scores of the two combatants are µ = 143.40 and σ = 26.07 points, respectively. Suppose each person in this class took samples of size 1, 10, 25, and 50, respectively, from this population of games. Between what two bounds would virtually all students’ sample means fall? We know that the sample mean is approximately normally distributed with mean µ = 143.40 and standard error σ/√n (the underlying distribution is very well approximated by the normal, meaning that we don’t need large sample sizes for the Central Limit Theorem to hold). We also know that for any random variable that is normally distributed, the probability that the random variable falls within two standard deviations (standard errors) of the mean is approximately .95. So, for each sample size, we obtain bounds by computing µ ± 2σ/√n. Table 13 gives these bounds for the sample sizes mentioned above.

n     µ − 2σ/√n                           µ + 2σ/√n
1     143.40 − 2(26.07)/√1 = 91.26        143.40 + 2(26.07)/√1 = 195.54
10    143.40 − 2(26.07)/√10 = 126.91      143.40 + 2(26.07)/√10 = 159.89
25    143.40 − 2(26.07)/√25 = 132.97      143.40 + 2(26.07)/√25 = 153.83
50    143.40 − 2(26.07)/√50 = 136.03      143.40 + 2(26.07)/√50 = 150.77

Table 13: Sample sizes and upper and lower bounds for sample means (95% confidence)
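The entries of Table 13 are just µ ± 2σ/√n; they can be reproduced directly:

```python
from math import sqrt

mu, sigma = 143.40, 26.07  # population mean and SD of combined game scores

bounds = {n: (round(mu - 2 * sigma / sqrt(n), 2),
              round(mu + 2 * sigma / sqrt(n), 2))
          for n in (1, 10, 25, 50)}

for n, (lo, hi) in bounds.items():
    print(n, lo, hi)
```

Notice how the interval narrows as n grows: the half-width shrinks by a factor of √n.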

As the sample size increases, the sample means get closer and closer to the true mean. Thus, if we don’t know the true mean but wish to estimate it, we know that if we take a large sample, the sample mean will be relatively close to the true mean. The quantity that we add to and subtract from the true mean is referred to as the bound on the error of estimation (it is also referred to as the margin of error, particularly when used in the context of a sample proportion).

Similar examples could be worked in terms of the other three estimators (sample statistics) given in Table 12, using the corresponding parameter and standard error of the estimator in place of those used in Example 5.2.

Example 5.3 – Pravachol Clinical Trial

In Example 2.1 we considered the results of the clinical trial for Pravachol (and treated the data as a population of patients). In reality, that was a sample (a very large one at that). If we let P1 be the proportion of all possible Pravachol users to have a heart event within five years, and P2 be the corresponding proportion for patients on a placebo, we are interested in the parameter P1 − P2. The estimator for this parameter is p1 − p2, which, for this sample, takes on the value:

p1 − p2 = X1/n1 − X2/n2 = 174/3302 − 248/3293 = .0527 − .0753 = −.0226

The estimated standard error of p1 − p2 is:

√(p1(1 − p1)/n1 + p2(1 − p2)/n2) = √(.0527(1 − .0527)/3302 + .0753(1 − .0753)/3293) = √(.00001512 + .00002115) = .0060

Thus, we would expect that for approximately 95% of all possible samples, our statistic p1 − p2 will lie within 2 standard errors (2(.0060) = .0120) of the true difference P1 − P2.
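A quick Python check of the arithmetic above (counts taken from the Pravachol and placebo groups):

```python
import math

x1, n1 = 174, 3302   # Pravachol group: heart events, sample size
x2, n2 = 248, 3293   # placebo group
p1, p2 = x1 / n1, x2 / n2

diff = p1 - p2                                       # estimate of P1 - P2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # estimated standard error

print(round(diff, 4), round(se, 4))  # about -0.0226 and 0.0060
```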


6 Lecture 6 – Large–Sample Tests and Confidence Intervals

Textbook Sections: 9.2,9.4,10.1,10.4
Problems: 9.1,5,9,19,23,25 10.1,5,7,9,31,33,35

In this section we begin making statistical inferences, using sample data to comment on what is occurring in a larger population or nature.

6.1 Large–sample Confidence Intervals

By making use of the sampling distributions of sample statistics, we can use sample data to make an inference concerning a population parameter. Since each estimator (θ̂) described in the previous section is normally distributed with a mean equal to the true parameter (θ), and standard error (σθ̂) given in the table, we can obtain a confidence interval for the true parameter.

We first define zα/2 to be the point on the standard normal distribution such that P(Z ≥ zα/2) = α/2. Some that we will see various times are z.05 = 1.645, z.025 = 1.96, and z.005 = 2.58. The main idea behind confidence intervals is the following. Since we know that θ̂ ∼ N(θ, σθ̂), we also know Z = (θ̂ − θ)/σθ̂ ∼ N(0, 1). So, we can write:

P(−zα/2 ≤ (θ̂ − θ)/σθ̂ ≤ zα/2) = 1 − α

A little bit of algebra gives the following:

P(θ̂ − zα/2 σθ̂ ≤ θ ≤ θ̂ + zα/2 σθ̂) = 1 − α

This merely says that “in repeated sampling, our estimator will lie within zα/2 standard errors of the mean a fraction 1 − α of the time.” The resulting formula for a (1 − α)100% confidence interval for θ is

θ̂ ± zα/2 σθ̂.

When the standard error σθ̂ is unknown (almost always), we will replace it with the estimated standard error Sθ̂. Some notes concerning confidence intervals are given below.

• α is the probability (with respect to repeated sampling) that the interval does not contain the true parameter. If we wish to make α smaller, we will increase the width of our interval (for a fixed sample size).

• The width of the interval depends on the sample size through the standard error. As the sample size increases, the width of the interval will decrease (for a fixed α), which is good since we have a more precise estimate.

• If we took many random samples of a fixed size from the population of interest, and calculated the confidence interval based on each sample, approximately (1 − α)100% of these intervals would contain the true parameter. This is where the term confidence arises from; since almost all of these intervals contain θ, we can be very confident that the interval based on our one sample contains θ.
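The repeated-sampling interpretation in the last bullet can be illustrated with a small simulation; the population (µ = 100, σ = 15), the sample size, and the number of replications below are arbitrary choices, not values from the text:

```python
import math
import random

random.seed(1)
mu, sigma, n, reps = 100.0, 15.0, 40, 2000
z = 1.96  # z.025 for a 95% interval
covered = 0

for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    half = z * s / math.sqrt(n)             # z times the estimated standard error
    if xbar - half <= mu <= xbar + half:    # does this interval contain mu?
        covered += 1

print(covered / reps)  # close to 0.95
```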

Example 6.1


Fox News Opinion Poll: “CNN covered Iran–Contra live in 1987, but is not covering Senate hearings of Democratic finance abuses. Do you think the decision was politically motivated?” (Washington Times, National Weekly Edition, 8/10/97). Out of n = 899 American adults sampled, X = 476 agreed with the statement.

1 or 2 Populations? — We are observing a sample from a single population.

Numeric or Presence/Absence Outcome — Each person either agrees or does not agree with the statement, thus it is a Presence/Absence outcome.

Parameter of Interest — P, the proportion of all American adults who feel the decision was politically motivated.

Appropriate Estimator — p = X/n

Estimated Standard Error — √(p(1 − p)/n)

We wish to obtain a 95% confidence interval for the proportion of all U.S. adults who believe the decision was politically motivated. For this sample, p = X/n = 476/899 = .53, and its estimated standard error is:

Sp = √(p(1 − p)/n) = √((.53)(.47)/899) = .0166.

Thus a 95% confidence interval for the true proportion, P, is:

p ± z.025 Sp = .53 ± 1.96(.0166) = .53 ± .0325 = (.4975, .5625)

We are 95% confident that the proportion of all U.S. adults who feel that the decision was politically motivated was between 0.4975 and 0.5625. Note that since values below 0.50 are contained in the interval, we cannot conclude that a majority agree with the statement at this significance level.
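A sketch of the computation for this poll (using the raw counts rather than the rounded p = .53, so the endpoints differ slightly from those above):

```python
import math

x, n = 476, 899                    # agreed, sampled
p = x / n                          # sample proportion
se = math.sqrt(p * (1 - p) / n)    # estimated standard error
z = 1.96                           # z.025 for 95% confidence

lower, upper = p - z * se, p + z * se
print(round(lower, 4), round(upper, 4))
```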

Example 6.2 – Salary Progression Gap Between Dual Earner and Traditional Male Managers

A study compared the salary progressions from 1984 to 1989 among a sample of married male managers of Fortune 500 companies with children at home. For each manager, the 5-year salary progression was obtained as 100*(1989 salary − 1984 salary)/1984 salary. This is a percent increase; if a manager’s salary increased from $100K in 1984 to $200K in 1989, then X = 100*(200K − 100K)/100K = 100(1) = 100%. The researchers were interested in determining whether there is a difference in the mean salary progression between dual earner and traditional managers. Dual earner managers had wives who worked full time; traditional managers’ wives did not work. The authors reported the sample statistics in Table 14.

Statistic   Dual Earner (i = 1)   Traditional (i = 2)
Xi          60.46                 69.24
Si          22.21                 61.27
ni          166                   182

Table 14: Summary statistics for male manager salary progression study


1 or 2 Populations? — We are observing samples from two populations (dual earner male managers and traditional male managers).

Numeric or Presence/Absence Outcome — The outcome measured is the percent change in salary 1984–1989. This is a numeric outcome.

Parameter of Interest — µ1 − µ2, the difference between true mean salary progressions for dual earner and traditional male managers.

Appropriate Estimator — X1 − X2

Estimated Standard Error — √(S1²/n1 + S2²/n2)

We obtain a 95% confidence interval for the true mean difference between these two groups of managers: µ1 − µ2.

X1 − X2 = 60.46 − 69.24 = −8.78

√(S1²/n1 + S2²/n2) = √(22.21²/166 + 61.27²/182) = √(2.97 + 20.63) = √23.60 = 4.86

95% CI for µ1 − µ2: (X1 − X2) ± z.025 √(S1²/n1 + S2²/n2) ≡ −8.78 ± 1.96(4.86) ≡ −8.78 ± 9.52 ≡ (−18.30, 0.74)

We can be 95% confident that the true difference in mean salary progressions between the two groups is between −18.30% and 0.74%. Since 0 is in this range (that is, µ1 = µ2 is plausible), we cannot conclude there is a difference in the true underlying population means, although the sample means differed by 8.78%. This is because of the large amount of variation in the individual salary progressions (see S1 and S2). What would be your conclusion had you constructed a 90% confidence interval for µ1 − µ2?

Source: Stroh, L.K. and J.M. Brett (1996), “The Dual-Earner Dad Penalty in Salary Progression,” Human Resource Management, 35:181-201.
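The interval can be verified in Python (summary statistics from Table 14):

```python
import math

xbar1, s1, n1 = 60.46, 22.21, 166   # dual earner managers
xbar2, s2, n2 = 69.24, 61.27, 182   # traditional managers
z = 1.96                            # z.025 for 95% confidence

diff = xbar1 - xbar2
se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # estimated standard error
lower, upper = diff - z * se, diff + z * se
print(round(diff, 2), round(se, 2), (round(lower, 2), round(upper, 2)))
```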

6.2 Large–Sample Tests of Hypotheses

We also have a procedure to test hypotheses concerning parameter values. Hypothesis testing is a procedure to make a decision concerning the value of an unknown parameter (although the method is also used to test more general characteristics of populations than simply parameter values). The testing procedure involves setting up two contradicting statements concerning the true value of the parameter, known as the null hypothesis and the alternative hypothesis, respectively. We assume the null hypothesis is true, and usually (but not always) wish to show that the alternative is actually true. After collecting sample data, we compute a test statistic which is used as evidence for or against the null hypothesis (which we assume is true when calculating the test statistic). The set of values of the test statistic that we feel provide sufficient evidence to reject the null hypothesis in favor of the alternative is called the rejection region. The probability that we could have obtained as strong or stronger evidence against the null hypothesis, assuming that it is true, than what we observed from our sample data is called the observed significance level or p–value.

An analogy that may help clear up these ideas is as follows. The researcher is like a prosecutor in a jury trial. The prosecutor must work under the assumption that the defendant is innocent (null hypothesis), although he would like to show that the defendant is guilty (alternative hypothesis).


The evidence that the prosecutor brings to the court (test statistic) is weighed by the jury to see if it provides sufficient evidence to rule the defendant guilty (rejection region). The probability that an innocent defendant could have had more damning evidence brought to trial than was brought by the prosecutor (p-value) provides a measure of how strong the prosecutor’s evidence is against the defendant.

Testing hypotheses is ‘clearer’ than the jury trial because the test statistic and rejection region are not subject to human judgement (directly) as the prosecutor’s evidence and jury’s perspective are. Since we do not know the true parameter value and never will, we are making a decision in light of uncertainty. We can break down reality and our decision into Table 15.

                      Decision
                      H0 True            H0 False
Actual   H0 True      Correct Decision   Type I Error
State    H0 False     Type II Error      Correct Decision

Table 15: Possible outcomes of a hypothesis test

We would like to set up the rejection region to keep the probability of a Type I error (α) and the probability of a Type II error (β) as small as possible. Unfortunately, for a fixed sample size, if we try to decrease α, we automatically increase β, and vice versa. We will set up rejection regions to control for α, and will not concern ourselves with β. However, all tests described here have the lowest Type II error rates of any tests for a given sample size. Further, as sample sizes increase, the Type II error rate decreases for a given state (value of θ) in the alternative hypothesis. Here α is the probability we reject the null hypothesis when it is true. (This is like sending an innocent defendant to prison.)

We can write out the general form of a hypothesis test in the following steps.

1. H0: θ = θ0

2. HA: θ ≠ θ0 or HA: θ > θ0 or HA: θ < θ0 (which alternative is appropriate should be clear from the setting).

3. T.S.: zobs = (θ̂ − θ0)/σθ̂ (if the standard error is unknown, it is replaced by the estimated standard error).

4. R.R.: |zobs| > zα/2 or zobs > zα or zobs < −zα (which R.R. depends on which alternative hypothesis you are using).

5. P-value: 2P(Z > |zobs|) or P(Z > zobs) or P(Z < zobs) (again, depending on which alternative you are using).

In all cases, a P-value less than α corresponds to a test statistic being in the rejection region (reject H0), and a P-value larger than α corresponds to a test statistic failing to be in the rejection region (fail to reject H0).
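The five steps can be wrapped in a small helper function; the function name and the use of Python’s statistics.NormalDist for the standard normal CDF are my own choices, not from the text:

```python
from statistics import NormalDist

def z_test(estimate, theta0, std_error, alternative="two-sided"):
    """Large-sample z test of H0: theta = theta0; returns (z_obs, p_value)."""
    z_obs = (estimate - theta0) / std_error
    Z = NormalDist()  # standard normal
    if alternative == "two-sided":
        p = 2 * (1 - Z.cdf(abs(z_obs)))
    elif alternative == "greater":
        p = 1 - Z.cdf(z_obs)
    else:  # "less"
        p = Z.cdf(z_obs)
    return z_obs, p

# Example 6.1 revisited as a two-sided test of H0: P = 0.5
z_obs, p = z_test(0.53, 0.5, 0.0166)
print(round(z_obs, 2), round(p, 3))
```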

Example 6.2 – Treatment for Erectile Dysfunction

The efficacy of intracavernosal alprostadil was studied in men suffering from erectile dysfunction

(impotence). The measure under study (X) was the duration (in minutes) of erection as measured


by the Rigiscan instrument (> 70% rigidity). Patients were assigned at random to receive either a high or a low dose of the drug, and the manufacturer was interested in determining whether increased doses are associated with longer (in time, not length) erections. Consider the following problem components:

1 or 2 Populations? — We are comparing two groups – High vs Low Dose

Numeric or Presence/Absence Outcome — We are measuring the length of time that the erection sustains a specific level, which is numeric

Parameter of Interest — µ1 − µ2, the difference in the true mean lengths of duration

Appropriate Estimator — X1 − X2, the difference in the mean lengths of duration for the samples of subjects in the clinical trial

Estimated Standard Error — √(S1²/n1 + S2²/n2)

Research Hypothesis (HA) — Goal is to show increased dose gives longer durations: HA: µ1 > µ2 or equivalently HA: µ1 − µ2 > 0

Type I Error — This occurs when we conclude that the drug is effective (µ1 > µ2), when in fact it is not.

Type II Error — This occurs when we fail to conclude the drug is effective (fail to conclude µ1 > µ2) when in fact it is.

The sample statistics are reported in Table 16 (times are in minutes).

High Dose   Low Dose
X1 = 44     X2 = 12
S1 = 56     S2 = 28
n1 = 58     n2 = 57

Table 16: Summary statistics for the High and Low Doses (in minutes)

Now we test whether the mean time for the high dose exceeds that for the low dose (setting α = 0.05):

1. H0: µ1 − µ2 = 0    HA: µ1 − µ2 > 0

2. T.S.: zobs = (θ̂ − θ0)/Sθ̂ = ((X1 − X2) − 0)/√(S1²/n1 + S2²/n2) = ((44 − 12) − 0)/√(56²/58 + 28²/57) = 32/8.24 = 3.89

3. R.R.: zobs > zα = z.05 = 1.645

4. p-value: P(Z ≥ zobs) = P(Z ≥ 3.89) = .5 − P(0 ≤ Z ≤ 3.89) < .5 − .4998 = .0002

5. Conclusion: Since the test statistic falls in the rejection region (or, equivalently, the p-value is below α), we reject the null hypothesis and claim that the true mean duration of erection is higher for the high dose than for the low dose (µ1 − µ2 > 0 ⇒ µ1 > µ2).


Compute a 95% confidence interval for the difference in true mean erection times.
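A sketch covering both the test above and the requested confidence interval (summary numbers from Table 16):

```python
import math
from statistics import NormalDist

xbar1, s1, n1 = 44, 56, 58   # high dose
xbar2, s2, n2 = 12, 28, 57   # low dose

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # estimated standard error
z_obs = (xbar1 - xbar2) / se                  # test of H0: mu1 - mu2 = 0
p_value = 1 - NormalDist().cdf(z_obs)         # one-sided (HA: mu1 > mu2)

# 95% confidence interval for mu1 - mu2
half = 1.96 * se
ci = (xbar1 - xbar2 - half, xbar1 - xbar2 + half)
print(round(z_obs, 2), round(se, 2), ci)
```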

Example 6.3 – Gastrointestinal Symptoms from Olestra

Anecdotal reports were spread through the mainstream press that Procter & Gamble’s fat-free substitute Olestra causes gastrointestinal (GI) side effects, even though such effects were not expected based on clinical trials.

A study was conducted to compare the GI effects of Olestra based chips versus traditional chips made with triglyceride (TG). The goal was to determine whether or not the levels of GI side effects differ between consumers of Olestra based chips and traditional chips.

At a Chicago movie theater, 563 subjects were randomized (blindly) to Olestra based chips, and 529 received (blindly) traditional (TG) chips. Of the Olestra group, 89 reported suffering from a gastrointestinal symptom (e.g. gas, diarrhea, abdominal cramping); of the TG group, 93 did. Test whether the two types of chips differ in terms of gastrointestinal effects.

1 or 2 Populations? — We are comparing two groups – Olestra vs TG

Numeric or Presence/Absence Outcome — We are measuring whether a consumer had a gastrointestinal (GI) side effect (Presence/Absence)

Parameter of Interest — P1 − P2, the difference in the true proportions of consumers suffering GI side effects

Appropriate Estimator — p1 − p2, the difference in the proportions of consumers suffering from GI side effects in the two groups

Estimated Standard Error (Under H0: P1 = P2) — √(p(1 − p)(1/n1 + 1/n2)), where p = (X1 + X2)/(n1 + n2)

Research Hypothesis (HA) — Goal is to show differences in proportions of GI side effects: HA: P1 ≠ P2 or equivalently HA: P1 − P2 ≠ 0

Type I Error — This occurs when we conclude that the rates of GI symptoms differ (P1 ≠ P2), when in fact they do not.

Type II Error — This occurs when we fail to conclude the rates of GI symptoms differ (fail to conclude P1 ≠ P2) when in fact they do.

For this experiment:

H0: P1 − P2 = 0 (No Olestra Effect)    HA: P1 − P2 ≠ 0 (Olestra Effect, Good or Bad)    α = 0.05

p1 = X1/n1 = 89/563 = 0.158    p2 = X2/n2 = 93/529 = 0.176    p = (X1 + X2)/(n1 + n2) = (89 + 93)/(563 + 529) = 182/1092 = 0.167

T.S.: Zobs = ((p1 − p2) − 0)/√(p(1 − p)(1/n1 + 1/n2)) = ((0.158 − 0.176) − 0)/√(0.167(1 − 0.167)(1/563 + 1/529)) = −0.80

R.R.: |Zobs| ≥ z.025 = 1.96

Thus, we have no evidence that the rate of GI symptoms differs between the two types of chips (in fact, the sample proportion is smaller for Olestra chips). Can you think of any other outcomes with respect to the chips of interest to Procter & Gamble? Obtain a 95% confidence interval for the difference between the true proportions. (The standard error used above is very similar to the standard error not assuming equal proportions.)
Source: L.L. Cheskin, et al (1998), “Gastrointestinal Symptoms Following Consumption of Olestra or Regular Triglyceride Potato Chips,” JAMA, 279:150-152.
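A check of the pooled two-proportion test (using unrounded proportions, so Zobs comes out near −0.79 rather than the −0.80 obtained from rounded values):

```python
import math
from statistics import NormalDist

x1, n1 = 89, 563   # Olestra group: GI symptoms, sample size
x2, n2 = 93, 529   # TG group
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: P1 = P2

se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_obs = (p1 - p2) / se0
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))  # two-sided test
print(round(z_obs, 2), round(p_value, 2))
```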


7 Lecture 7 — Small–Sample Inference

Textbook Sections and pages: 9.3,10.2,10.3,pp392–393
Problems: 9.11,13,17 10.11,15,17,23,25,29 11.1,3

In the case of small samples from populations with unknown variances, we can make use of the t-distribution to obtain confidence intervals or conduct tests of hypotheses regarding population means. In all cases, we must assume that the underlying distribution is normal (or approximately normal), although this restriction is not necessary for moderate sample sizes. We will consider the case of a single mean, µ, and the difference between two means, µ1 − µ2, separately. First, though, we refer back to the t-distribution in Table 4, page 669. This table gives the values tα such that P(T > tα) = α for values of the degrees of freedom between 1 and 29. The bottom line gives the values zα, which should be used when the degrees of freedom exceed 30. I will also often add a second subscript to tα to represent the appropriate degrees of freedom.

7.1 Inference Concerning µ

The general form for a confidence interval for µ remains the same as the large–sample case, except we replace zα/2 by tα/2,n−1. The general formula is as follows:

X ± tα/2,n−1 (s/√n)

Testing a hypothesis concerning µ is also very similar to the large–sample case, with similar changes as for confidence intervals. The general method is as follows:

1. H0: µ = µ0

2. HA: µ ≠ µ0 or HA: µ > µ0 or HA: µ < µ0 (which alternative is appropriate should be clear from the setting).

3. T.S.: tobs = (X − µ0)/(s/√n)

4. R.R.: |tobs| > tα/2,n−1 or tobs > tα,n−1 or tobs < −tα,n−1 (which R.R. depends on which alternative hypothesis you are using).

5. p-value: 2P(T > |tobs|) or P(T > tobs) or P(T < tobs) (again, depending on which alternative you are using).

In this case, you cannot obtain an exact p-value from the table, but you can obtain bounds for the p-value. Statistical computer packages report exact p-values.
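A sketch of the one-sample t statistic; the data vector and µ0 below are made up for illustration, and the critical value t.025,7 = 2.365 is read from the t table:

```python
import math
import statistics

data = [9.8, 10.2, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1]  # hypothetical measurements
mu0 = 10.0                                            # H0: mu = 10

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)                 # sample standard deviation
t_obs = (xbar - mu0) / (s / math.sqrt(n))

t_crit = 2.365  # t.025 with n - 1 = 7 degrees of freedom, from the t table
reject = abs(t_obs) > t_crit               # two-sided test at alpha = 0.05
print(round(t_obs, 2), reject)
```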

7.2 Inference Concerning µ1 − µ2

Here we consider two ways of comparing the means of two populations or treatments. The first approach is based on independent samples, where the measurements are independent across groups. This occurs when we sample units separately from two populations, or assign experimental units to only one of two treatments being compared. The second approach involves paired samples, where either each experimental unit is assigned to each of the two treatments, or units are paired based on similar traits (e.g. individuals paired on race, gender, age, income, . . . ), and one receives treatment 1, while the matched individual receives treatment 2. Other examples of paired data are given below.


7.2.1 Independent Samples

When the samples are independent, we use methods very similar to those for the large–sample case. Examples of situations with independent samples include the following.

1. The mean lifetimes of two brands of television picture tubes are to be compared. A consumer advocate samples n1 = 10 Sony televisions and n2 = 10 Mitsubishi televisions, measuring the lifetime of all tubes. The samples are independent because there is no connection between the televisions in the two groups.

2. Two methods of teaching children a foreign language are to be compared. A class of 24 children is split (randomly) into 2 groups of size 12. Each group receives one of the teaching methods. The students’ foreign language proficiencies are measured at the end of the courses. These samples are independent because different children received the two teaching methods.

One important difference is that these methods assume the two population variances, although unknown, are equal. We then ‘pool’ the 2 sample variances to get an estimate of the common variance σ² = σ1² = σ2². This estimate, which we will call S²p, is calculated as follows:

S²p = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2).

The corresponding confidence interval can be written:

(X1 − X2) ± tα/2,n1+n2−2 √(S²p(1/n1 + 1/n2)).

Similarly, the test of hypothesis concerning µ1 − µ2 is conducted as follows:

1. H0: µ1 − µ2 = ∆0 (∆0 is usually 0).

2. HA: µ1 − µ2 ≠ ∆0 or HA: µ1 − µ2 > ∆0 or HA: µ1 − µ2 < ∆0 (which alternative is appropriate should be clear from the setting).

3. T.S.: tobs = ((X1 − X2) − ∆0)/√(S²p(1/n1 + 1/n2))

4. R.R.: |tobs| > tα/2,n1+n2−2 or tobs > tα,n1+n2−2 or tobs < −tα,n1+n2−2 (which R.R. depends on which alternative hypothesis you are using).

5. p-value: 2P(T > |tobs|) or P(T > tobs) or P(T < tobs) (again, depending on which alternative you are using).

Example 7.2 – Prozac for Borderline Personality Disorder

The efficacy of fluoxetine (Prozac) on anger in patients with borderline personality disorder was studied in 22 patients with BPD. Among the measurements made by researchers was the Profile of Mood States (POMS) anger scale. Patients received either fluoxetine or placebo for 12 weeks, with measurements being made before and after treatment. Table 17 gives post-treatment summary statistics for the two treatment groups. Low scores are better since the patient displays less anger.

First, we obtain a 95% confidence interval for the difference in true mean scores for the two treatment groups. Then, we conduct a test to determine whether fluoxetine reduces mean anger score (has a lower true mean) as compared to placebo (α = 0.05).


Fluoxetine (i = 1)   Placebo (i = 2)
X1 = 40.3            X2 = 44.9
s1² = 25.7           s2² = 75.2
n1 = 13              n2 = 9

Table 17: Summary statistics for the borderline personality disorder example

a) To set up this confidence interval, we need to obtain the pooled variance (we are assuming these population variances are the same), as well as the value of tα/2,n1+n2−2.

• S²p = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2) = ((13 − 1)25.7 + (9 − 1)75.2)/(13 + 9 − 2) = (308.4 + 601.6)/20 = 45.5

• tα/2,n1+n2−2 = t.025,20 = 2.086

Then we can set up the 95% confidence interval:

(X1 − X2) ± tα/2,n1+n2−2 √(S²p(1/n1 + 1/n2)) = (40.3 − 44.9) ± 2.086 √(45.5(1/13 + 1/9))

= −4.6 ± 2.086(2.92) = −4.6 ± 6.10 = (−10.7, 1.50)

We are 95% confident that the true mean difference in scores between the two treatment groups is between −10.7 and 1.5. Since this interval for µ1 − µ2 contains 0, we cannot conclude that µ1 − µ2 < 0; that is, we cannot conclude that Prozac reduces anger. Two notes: (i) This is equivalent to a 2-sided test (HA: µ1 ≠ µ2), not a 1-sided test (HA: µ1 < µ2). (ii) Note that these are very small samples.

b) To test whether Prozac reduces mean score, we test as follows, making use of calculations made in part a) (α = 0.05):

H0 : µ1 − µ2 = 0 HA : µ1 − µ2 < 0

T.S.: tobs = ((X1 − X2) − ∆0)/√(S²p(1/n1 + 1/n2)) = −4.6/2.92 = −1.58

R.R.: tobs < −tα,n1+n2−2 = −t.05,20 = −1.725

Since our test statistic does not fall in the rejection region, we do not reject the null hypothesis of no treatment effect. The P-value is the area under the t-distribution with 20 degrees of freedom below −1.58, which is between 0.05 and 0.10 (−t.05,20 = −1.725 and −t.10,20 = −1.325).
Source: Salzman, et al (1995), “Effects of Fluoxetine on Anger in Symptomatic Volunteers with Borderline Personality Disorder,” Journal of Clinical Psychopharmacology, 15:23-29.
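A check of parts a) and b) in Python (summary statistics from Table 17; small rounding differences from the hand calculation are expected):

```python
import math

xbar1, var1, n1 = 40.3, 25.7, 13   # fluoxetine group (mean, sample variance, n)
xbar2, var2, n2 = 44.9, 75.2, 9    # placebo group

# pooled variance estimate
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t_obs = (xbar1 - xbar2) / se       # test of H0: mu1 - mu2 = 0
t_crit = 2.086                     # t.025 with 20 degrees of freedom
ci = (xbar1 - xbar2 - t_crit * se, xbar1 - xbar2 + t_crit * se)
print(round(sp2, 1), round(t_obs, 2), ci)
```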

7.2.2 Paired Samples

Samples are said to be paired if the measurements between the two samples are related. This is often the case when we apply two ‘treatments’ to the same subjects or experimental material. The following examples describe situations in which the experiment consists of paired samples.


1. An education researcher would like to compare student scores on two different tests of natural ability. She selects 20 students at random and has each student take each exam (in random order), measuring the students’ scores on each exam. These samples are paired because the two samples are made up of the same students. Some students will do very well on both exams, while others may do poorly on both exams. However, if one exam tends to yield higher scores than the other, it should show up for most or all of the students.

2. A clothing manufacturer wishes to compare the color retention of two types of blue dye. She selects a sample of 10 types of fabric, cutting each piece in half, and applying each type of dye to a half of the piece. Each of the pieces is washed 15 times, and the amount of fading is measured. (NOTE: there are 20 total measurements here since each type of fabric receives both dyes.) These samples are paired because each piece of experimental material receives each dye.

The analysis of paired data involves computing the difference in the two measurements for each subject and then treating these differences as a single sample. For each subject (or experimental unit), we observe two measurements X1i and X2i (the i just represents which subject in the sample the measurement represents). Then, for each subject, we calculate Di = X1i − X2i. Now, testing whether the 2 population means are equal is equivalent to testing whether or not the mean difference is 0. We compute (sums over i = 1, . . . , n):

D = (Σ Di)/n,    Sd² = Σ(Di − D)²/(n − 1).

The (1 − α)100% confidence interval for µ1 − µ2 = µD is:

D ± tα/2,n−1 (Sd/√n)

Note the similarity between this and the single–sample case. To test hypotheses concerning the difference between the two population means, we use the following method.

1. H0: µ1 − µ2 = µD = ∆0

2. HA: µD ≠ ∆0 or HA: µD > ∆0 or HA: µD < ∆0 (which alternative is appropriate should be clear from the setting).

3. T.S.: tobs = (D − ∆0)/(Sd/√n)

4. R.R.: |tobs| > tα/2,n−1 or tobs > tα,n−1 or tobs < −tα,n−1 (which R.R. depends on which alternative hypothesis you are using).

5. p-value: 2P(T > |tobs|) or P(T > tobs) or P(T < tobs) (again, depending on which alternative you are using).

Example 7.3 – Nicotine Delivery Patches

The manufacturers of Nicoderm conducted an experiment to compare the delivery of nicotine of their patch versus that of their competitor Habitrol. They had 24 adult male smokers wear each patch for 5 days (half wore Nicoderm first and Habitrol second, and the other half wore Habitrol first and Nicoderm second). There was a 6-day washout period between wearing the two patches. The outcome measured was the amount of nicotine in the bloodstream over the fifth day. Higher values mean more nicotine has been delivered to the bloodstream from the patch (the measure is called AUC – area under the concentration vs time curve). The same subjects are used for each patch because subjects’ metabolisms differ greatly, and this design removes subject-to-subject variability.

The mean difference (Nicoderm − Habitrol) among the n = 24 subjects was D = 55.0, with a standard deviation among the differences of Sd = 69.8. First, we test whether the true means differ (with α = 0.05); then, we obtain a 95% confidence interval for the difference in true means.

a) The test is done through the following steps.

1. H0: µ1 − µ2 = µD = 0    HA: µ1 − µ2 = µD ≠ 0

2. T.S.: tobs = (D − ∆0)/(Sd/√n) = (55.0 − 0)/(69.8/√24) = 55.0/14.2 = 3.87

3. R.R.: |tobs| > tα/2,n−1 = t.025,23 = 2.069

4. p-value: 2P(T > tobs) = 2P(T > 3.87) < 2P(T > 2.808) = 2(.005) = .01 (since 2.808 is the largest value on the table for 23 d.f.).

We can conclude that the mean amount of nicotine delivered is higher for Nicoderm than for Habitrol (since we reject H0 and D is positive).

b) The 95% confidence interval for the difference in true means is:

D ± tα/2,n−1 (Sd/√n) ≡ 55.0 ± 2.069(69.8/√24) ≡ 55.0 ± 29.4 ≡ (25.6, 84.4)

We can conclude that the true mean for Nicoderm is between 25.6 and 84.4 units higher than the true mean for Habitrol.
Source: S.K. Gupta, et al, (1995), “Comparison of the Pharmacokinetics of Two Nicotine Transdermal Systems: Nicoderm and Habitrol,” Journal of Clinical Pharmacology, 35:493-498.

Example 7.4 – Consumer Response to Introduction of New Coke (1985)

In a well-publicized move, Coca-Cola made a bold business decision to replace the formulation of its flagship soda Coke with a new formulation (New Coke). Preliminary research was conducted and produced the following information:

From 1981 to 1984, Coca-Cola tested the new formula in studies involving more than 190,000 consumers in 25 cities. With the brands not identified, the New Coke flavor was preferred to the original one by 0.55 (55%) of consumers. When the same consumers were told what they were tasting, preference for the New Coke was .61 (61%).

However, there was a major problem: consumers weren’t told that old Coke would be removed from the market.

Coca-Cola introduces New Coke, and consumers revolt. The problem? Referring to the fact that Coca-Cola researchers never made it clear to the consumers whom they tested that original formula Coca-Cola would not be available as a choice, these executives admitted ”[t]hat was a mistake” and ”maybe we goofed”.

A psychological theory of reactance is hypothesized to occur if it is the case that New Coke wins in blind taste-tests and original Coke wins in labeled taste-tests. New Coke will be preferred in both cases if reactance diminishes over time. Subjects tasted both new and old Coke under two conditions (open label and blind), giving a score of 0-100 on taste. Different subjects were in the two groups. That is, you drank the two formulations of Coke either blindly or open label, not both.

a) Labelling µnew as the true mean score for New Coke, and µold as the true mean score for Old Coke, give the alternative (research) hypotheses of reactance for the two conditions (blind and labeled), where in each condition H0: µnew − µold = 0:

Blind: HA: µnew − µold ___ 0    Labeled: HA: µnew − µold ___ 0

b) The following sample statistics were obtained from two samples of consumers. The samples were taken approximately 7 months after Coke was re-released after the New Coke disaster. Within each sample, subjects tasted both Coke and New Coke, rating each brand on a scale of 0-100. The sample means, mean differences, and standard deviation of the differences are given below. Compute the two test statistics for the tests from part a).

Blind     Xnew = 59.5   Xold = 31.3   D = 28.2    Sd = 30.5   n = 25
Labeled   Xnew = 60.7   Xold = 70.8   D = −10.1   Sd = 26.0   n = 24

c) Give the appropriate rejection regions, assuming that preference differences are approximately normally distributed (each test based on α = 0.05):

d) Can we conclude that reactance has been demonstrated by consumers? Note that since we are conducting two independent tests, our overall Type I error rate is approximately 2(0.05) = 0.10 (the probability that we reject at least one null hypothesis when both are true).

Example 7.5 – Battle of the Network Nightly News

Who gets the best ratings, Peter Jennings or Dan Rather? We take a random sample of weekly mean numbers of households for each news program over the course of 1997. For each week, we observe the mean number of viewers as reported by Nielsen (actually, these are estimates, but ignore that for our purposes) for ABC and CBS. We wish to determine whether there are differences in the mean weekly viewership between the two networks.

a) Would these samples be considered independent or paired? Why? (Hint: what might cause variations in weekly news ratings, independent of the actual news programs?)

b) Let x1i denote the ratings for ABC on week i and x2i the ratings for CBS on the same week. What are the null and alternative hypotheses that we wish to test?

c) The data are given in Table 18, and are in millions of viewers per night. Give the mean and standard deviation of the differences.


Week (i)   ABC (x1i)   CBS (x2i)   Di = x1i − x2i   D²i

1          8.2         7.2          1.0             1.00
2          7.2         6.3          0.9             0.81
3          7.1         6.2          0.9             0.81
4          8.7         8.9         −0.2             0.04
5          7.2         7.1          0.1             0.01
6          6.6         6.4          0.2             0.04
7          8.8         7.6          1.2             1.44
8          8.5         8.6         −0.1             0.01
9          9.6         7.8          1.8             3.24
10         9.2         8.5          0.7             0.49

Sum        81.1        74.6         6.5             7.89

Table 18: Sample of 10 weeks of viewership for ABC and CBS news from 1997

d) Test whether the true means differ for the two networks. Clearly state the null and alternative hypotheses, test statistic, and rejection region.

e) What is the appropriate conclusion?

(i) µABC > µCBS    (ii) µABC = µCBS    (iii) µABC < µCBS

f) Based on your conclusion, we are at risk of (but have not necessarily made): (i) A Type I Error  (ii) A Type II Error  (iii) No error  (iv) Either a Type I or Type II Error

Source: Daily Variety (1997 editions)
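The paired-difference computation for Table 18 can be sketched from the raw data; this reproduces the column sums (ΣDi = 6.5, ΣD²i = 7.89) implicitly through D̄ and Sd:

```python
import math

abc = [8.2, 7.2, 7.1, 8.7, 7.2, 6.6, 8.8, 8.5, 9.6, 9.2]
cbs = [7.2, 6.3, 6.2, 8.9, 7.1, 6.4, 7.6, 8.6, 7.8, 8.5]

d = [a - c for a, c in zip(abc, cbs)]   # weekly differences D_i
n = len(d)
d_bar = sum(d) / n                      # mean difference
# standard deviation of the differences
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
t = d_bar / (s_d / math.sqrt(n))        # paired t statistic
```

This gives D̄ = 0.65, Sd ≈ 0.638, and t ≈ 3.22, which would be compared with t.025,9 = 2.262 for a two-sided test at α = 0.05.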

7.3 Statistical Models

We have seen the concepts of random variables, probability distributions, and inferential methods concerning their parameters (confidence intervals and tests of hypotheses). We will now write the random variable in a form that breaks its value down into two components – its mean, and its 'deviation' from the mean. We can write X as follows:

X = µ + (X − µ) = µ + ε,

where ε = X − µ. Note that if X ∼ N(µ, σ), then ε ∼ N(0, σ). Also note that µ is unknown (although we can estimate it), and so ε is unknown as well. We will be fitting different models in this course, estimating the parameters corresponding to the models, and testing hypotheses concerning them.


8 Lecture 8 — Experimental Design and the Analysis of Variance

Textbook Section: Section 11.2
Problems: 11.7, 9, 14, 15, 17

In this section, we will look at the effects of strictly qualitative variable(s) on the mean response of a quantitative outcome variable. There are two distinct methods by which these measurements can be made: controlled experiments and sample surveys. Some situations where analyses of this type are used are given below:

AOV1 A drug manufacturer would like to compare four formulations to decide which is most effective at reducing arthritic pain.

AOV2 A psychologist wishes to find out which of six classroom atmospheres provides the best learning results among young children.

AOV3 A management consultant is interested in deciding if three managerial techniques differ in terms of their corporate efficiencies.

Before we get involved in the mathematics and model formulation, we will describe the two experimental situations and define some useful terms.

In a controlled experiment, the experimenter selects a sample of subjects or objects from a population of such subjects or objects. These are referred to as experimental units; they are what we make our measurements on. After the experimental units are selected, treatments are applied to them. These treatments are made up of one or more factors, or experimental variables. We wish to estimate the effects of these treatments on the units. We refer to the levels as the intensity settings of the factors. Note that in a controlled experiment, we are applying the treatments to the experimental units, and we wish to estimate the effects of the various levels of the factor(s). Generally, we would like to decide whether certain levels provide higher (or lower) mean responses than other levels. Note that we have already done this in the case of one factor possessing two levels in the previous chapter (the two-sample t-test and the paired difference test).

In an observational study, the experimenter selects samples of objects from several populations and wishes to observe whether the population means are the same. The mathematics of the analysis is the same for both of these methods, but the interpretations have subtle differences. In this situation, we are not applying treatments to the elements of the sample, but rather observing some measurement of interest. We still often refer to these different populations as treatments, even though we aren't really applying them to experimental units.

A couple of examples should clear up this difference. First, consider a study of four blood pressure medications. Twenty subjects with relatively comparable levels of high blood pressure are sampled, and each subject is given one of the four medications for a month, with their blood pressure being measured at the end of the study. In this setting, the patients are the experimental units, the four medications are the treatments, and we have randomly assigned one medication (level) to each subject. This is a controlled experiment. Now consider a study to observe whether four brands of television sets have the same mean lifetimes. We sample five of each brand, observing the lifetime of each set. In this case, we consider the brands to be treatments, although we are not applying brands to 'experimental units'. However, in both of these situations, the methods of testing for differences among the effects of the medications, and for differences among the brand means, are identical. Thus, we will not need to distinguish between the situations explicitly, but


it is important to distinguish which one you are in from an interpretation standpoint. We will always refer to these designs from the controlled experiment setting, with the obvious extensions to the observational study being implied.

8.1 Completely Randomized Design (CRD)

In the Completely Randomized Design, we have one factor that we are controlling. This factor has C levels, and we measure nj units on the jth level of the factor. In terms of the sample survey, we are sampling nj items from the jth population, j = 1, . . . , C. We define the observed responses as Xij, representing the measurement on the ith experimental unit receiving the jth treatment. We write this in model form as follows:

Xij = µ + αj + εij = µj + εij.

Here, µ is the overall mean measurement across all treatments, αj is the effect of the jth treatment (µj = µ + αj), and εij is a random error component that is assumed to be normally distributed with mean 0 and standard deviation σ. This εij reflects the fact that there will be variation among the measurements of different experimental units receiving the same treatment in the case of a controlled experiment, or that different elements sampled from the same population will have varying measurements. This means that our model assumes Xij ∼ N(µ + αj = µj, σ). We further assume the measurements are independent of one another. We place a condition on the effects αj, namely that they sum to zero. Of interest to the experimenter is whether or not there is a treatment effect, that is, whether any of the levels of the treatment provide higher (lower) mean response than other levels. This can be hypothesized symbolically as H0 : α1 = α2 = · · · = αC = 0 (no treatment effect) against the alternative HA : Not all αj = 0 (treatment effects exist). Before we set up this testing procedure, we must define a few items:

N = n1 + · · · + nC

X̄j = ( Σ_{i=1}^{nj} Xij ) / nj

S²j = ( Σ_{i=1}^{nj} (Xij − X̄j)² ) / (nj − 1)

Total SS = Σ_{j=1}^{C} Σ_{i=1}^{nj} (Xij − X̄)²

SST = Σ_{j=1}^{C} Σ_{i=1}^{nj} (X̄j − X̄)² = Σ_{j=1}^{C} nj (X̄j − X̄)²

SSE = Σ_{j=1}^{C} Σ_{i=1}^{nj} (Xij − X̄j)² = Σ_{j=1}^{C} (nj − 1) S²j

Total SS represents the total variation of the sample measurements around the overall sample mean. This total variation is partitioned into variation Between treatment means (SST) and variation Within treatments (SSE). Often, we refer to SST as the Model sum of squares and SSE as the Error sum of squares. Note that the model and error sums of squares add up to the total sum of squares. That is:

TotalSS = SST + SSE
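This partition can be checked numerically. A sketch on a tiny made-up data set (the group labels and values are illustrative only, not from the notes):

```python
# Verify Total SS = SST + SSE on a small illustrative data set.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [15.0, 14.0, 16.0, 15.0],
    "C": [9.0, 8.0],
}
all_x = [x for xs in groups.values() for x in xs]
grand = sum(all_x) / len(all_x)                      # overall mean X-bar

total_ss = sum((x - grand) ** 2 for x in all_x)
sst = sum(len(xs) * ((sum(xs) / len(xs)) - grand) ** 2
          for xs in groups.values())                 # between-treatment SS
sse = sum(sum((x - sum(xs) / len(xs)) ** 2 for x in xs)
          for xs in groups.values())                 # within-treatment SS

assert abs(total_ss - (sst + sse)) < 1e-9            # the partition holds
```

The identity holds for any grouping of the data, which is what makes the decomposition in the ANOVA table possible.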


The point of the Analysis of Variance is to detect whether differences exist in the population means of the treatments, and if so, to determine which treatments provide higher (lower) mean responses.

Associated with each source of variation, we have degrees of freedom. The total sum of squares has N − 1 degrees of freedom, since it is made up of N − 1 independent terms (we have estimated the mean from the sample). The model sum of squares measures the variation of the C treatment means around the overall mean, and has dfT = C − 1 degrees of freedom. Finally, the error sum of squares is made up of variation of the individual measurements around the C treatment means, and has dfE = N − C. Note that just as the model and error sums of squares sum to the total sum of squares, the degrees of freedom are also additive. That is:

dfTotal = N − 1 = (C − 1) + (N − C) = dfT + dfE

Also, we can obtain an estimate of the error variance σ² by taking the 'average' squared distance of each observed value to its treatment mean. That is,

s² = ( Σ_{j=1}^{C} (nj − 1) S²j ) / (N − C) = SSE / (N − C) = MSE.

We divide by N − C because we are estimating C parameters (the treatment means). We can set up an Analysis of Variance table representing the decomposition of the total variation into parts due to the model (between treatments) and error (within treatments); this is shown in Table 19.

ANOVA
Source of     Sum of                                          Degrees of   Mean
Variation     Squares                                         Freedom      Square              F

TREATMENTS    SST = Σ_{j=1}^{C} nj(X̄j − X̄)²                  C − 1        MST = SST/(C − 1)   F = MST/MSE
ERROR         SSE = Σ_{j=1}^{C} (nj − 1)S²j                   N − C        MSE = SSE/(N − C)
TOTAL         Total SS = Σ_{j=1}^{C} Σ_{i=1}^{nj} (Xij − X̄)²  N − 1

Table 19: The Analysis of Variance Table for the Completely Randomized Design

Recall the model that we are using to describe the data in this design:

Xij = µ + αj + εij = µj + εij.

The effect of the jth treatment is αj. If there is no treatment effect among any of the levels of the factor under study, that is, if the population means of the C treatments are the same, then each of the parameters αj is 0. This is the hypothesis we would like to test. The alternative hypothesis is that not all treatments have the same mean, or equivalently, that treatment effects exist (not all αj are 0). If the null hypothesis is true (all C population means are equal), then the statistic F = MST/MSE follows the F-distribution with C − 1 numerator and N − C denominator degrees of freedom. Large values of F are evidence against the null hypothesis of no treatment effect (recall what SST and SSE represent).

Upper percentage points of the F-distribution are given in Table A.7 (pp A-16 – A-25) of your textbook. This distribution has two parameters, ν1 and ν2, called the numerator and denominator degrees of freedom, respectively. These tables give the upper tail cutoff for various values of ν1, ν2, and α (the upper tail probability). Under the null hypothesis of no differences among treatment means (α1 = · · · = αC = 0), the test statistic F = MST/MSE has an F-distribution with ν1 = C − 1 and ν2 = N − C degrees of freedom. Large values of F = MST/MSE are evidence against the null hypothesis. We denote by Fα,ν1,ν2 the cutoff value that leaves a probability of α in the upper tail of the F-distribution with ν1 and ν2 degrees of freedom. The testing procedure is as follows:

Page 50: 1 Lecture 1 – Preliminariesusers.stat.ufl.edu/~winner/qmb3250/notes.pdf5-10 1 1.49 1 1.49 10-15 21 31.34 22 32.84 15-20 28 41.79 50 74.63 20-25 10 14.93 60 89.55 25-30 3 4.48 63

1. H0 : α1 = · · · = αC = 0 (µ1 = · · · = µC) (No treatment effect)

2. HA : Not all αj are 0 (Treatment effects exist)

3. T.S.: Fobs = MST/MSE

4. R.R.: Fobs > Fα,C−1,N−C

5. p-value: P (F > Fobs)

Example 8.1 – Sexual Side Effects of 4 Antidepressants

A comparison of C = 4 antidepressants in terms of reported sexual side effects was conducted at the University of Alabama Medical School. Patients currently prescribed antidepressants were contacted and asked a series of questions regarding their treatment and side effects. One response of interest was their perceived change in libido, a continuous response between −2 and +2 based on a mark along a continuous line segment. Note that this is an observational study, as people had already been assigned to treatments and were identified after they had begun treatment. Note that three of the four brands (Prozac, Zoloft, and Paxil) are selective serotonin re-uptake inhibitors (SSRIs), while the fourth brand (Wellbutrin) does not come from that class of drug. Source: Clinical Pharmacology & Therapeutics, 12:254–258.

X — Self-reported change in libido after treatment, on a continuous visual analogue scale ranging from −2 to 2, with 0 representing no change. (X̄ = −0.38)

Summary calculations are given in Table 20 and Analysis of Variance in Table 21.

Drug (j)         nj    X̄j      Sj     nj(X̄j − X̄)²                    (nj − 1)S²j

Wellbutrin (1)   22    0.46    0.80   22(0.46 − (−0.38))² = 15.52    (22 − 1)(0.80)² = 13.44
Prozac (2)       37   −0.49    0.97   37(−0.49 − (−0.38))² = 0.45    (37 − 1)(0.97)² = 33.87
Paxil (3)        21   −0.90    0.73   21(−0.90 − (−0.38))² = 5.68    (21 − 1)(0.73)² = 10.66
Zoloft (4)       27   −0.49    1.25   27(−0.49 − (−0.38))² = 0.33    (27 − 1)(1.25)² = 40.63

N = 107    X̄ = −0.38    —    SST = 21.98    SSE = 98.60

Table 20: Summary statistics and sums of squares calculations for sexual side effects of antidepressant data.

ANOVA
Source of      Sum of     Degrees of   Mean
Variation      Squares    Freedom      Square   F

TREATMENTS     21.98      3            7.33     7.64
ERROR          98.60      103          0.96
TOTAL          120.58     106

Table 21: The Analysis of Variance table for sexual side effects in four antidepressant groups

Are there differences among the effects of the four brands (Test with α = 0.05)?

• H0 : µ1 = µ2 = µ3 = µ4    HA : Not all µj are equal


• T.S.: Fobs = MST/MSE = 7.33/0.96 = 7.64

• RR : Fobs ≥ Fα,C−1,N−C = F0.05,3,103 ≈ 2.68

• p-value: P(F ≥ Fobs) = P(F ≥ 7.64) < P(F ≥ 4.58) ≈ 0.005

We can conclude that the sexual side effects differ among the 4 brands at virtually any level of α, since our P-value is so small. There is virtually no chance we would have observed this large a variation among the four sample means if the true (unknown) population means were the same.

Source: J.G. Modell, et al. (1997), "Comparative Sexual Side Effects of Bupropion, Fluoxetine, Paroxetine, and Sertraline," Clinical Pharmacology & Therapeutics, 61:476-487.
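The quantities in Tables 20 and 21 can be reproduced from the group summaries alone. A sketch (small rounding differences from the tables are expected, since the tables round X̄ to −0.38):

```python
# (n_j, mean, std dev) for Wellbutrin, Prozac, Paxil, Zoloft (Table 20):
groups = [(22, 0.46, 0.80), (37, -0.49, 0.97), (21, -0.90, 0.73), (27, -0.49, 1.25)]

N = sum(n for n, _, _ in groups)
C = len(groups)
grand = sum(n * m for n, m, _ in groups) / N           # overall mean

sst = sum(n * (m - grand) ** 2 for n, m, _ in groups)  # between-treatment SS
sse = sum((n - 1) * s ** 2 for n, _, s in groups)      # within-treatment SS
mst, mse = sst / (C - 1), sse / (N - C)
f = mst / mse
```

This yields SST ≈ 21.98, SSE ≈ 98.60, and F ≈ 7.65, matching the ANOVA table up to rounding.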

Example 8.2 – Corporate Social Responsibility and the Marketplace

A study was conducted to determine whether levels of corporate social responsibility (CSR) vary by industry type. That is, can we explain a reasonable fraction of the overall variation in CSR by taking into account the firm's industry? If there are differences by industry, this might be interpreted as the existence of "industry forces" that affect what a firm's CSR will be. For instance, consumer and service firms may be more aware of social issues and demonstrate higher levels of CSR than companies that deal less with the direct public (more removed from the retail marketplace).

The partial ANOVA table is given in Table 22. Use it to complete the following questions.

ANOVA
Source of          Degrees of   Sum of     Mean
Variation          Freedom      Squares    Square   F

Industry (Trts)    17
Error              162          57.55
Total                           82.71

Table 22: The Analysis of Variance table for Corporate Social Responsibility

a) Complete the ANOVA table.

b) Test whether mean CSR scores differ among the industries (α = 0.05).

c) What can be said of the P -value (give a range)?

d) How many firms were represented in the sample?

e) How many industries were represented?

Source: M.T. Cottrill (1990), "Corporate Social Responsibility and the Marketplace," Journal of Business Ethics, 9:723-729.
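The missing entries of a partial ANOVA table like Table 22 follow from additivity of the sums of squares and degrees of freedom. A sketch of the arithmetic (this fills in the table; treat it as one possible worked check, not the only route):

```python
# Entries given in Table 22:
df_trt, df_err = 17, 162
sse, total_ss = 57.55, 82.71

sst = total_ss - sse         # SST = Total SS - SSE (additivity)
df_total = df_trt + df_err   # N - 1
N = df_total + 1             # number of firms sampled (part d)
C = df_trt + 1               # number of industries (part e)
mst, mse = sst / df_trt, sse / df_err
f = mst / mse                # observed F statistic
```

This gives SST = 25.16, N = 180 firms, C = 18 industries, and F ≈ 4.17, to be compared with F0.05,17,162 from Table A.7.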


Example 8.3 – Salary Progression By Industry

A recent study reported salary progressions during the 1980's among C = 8 industries. Results, including industry means, standard deviations, and sample sizes, are given in Table 23. Also included are columns that produce the treatment (between industry) and error (within industry) sums of squares. The overall mean is X̄ = 65.11.

Industry (j)            nj    X̄j       Sj       nj(X̄j − X̄)²   (nj − 1)S²j

Pharmaceuticals (1)     35    70.69    27.64    1089.8         25975.0
Communications (2)      74    62.23    22.80    613.7          37948.3
Food (3)                49    54.93    16.23    5077.9         12643.8
Financial Srvcs (4)     21    131.16   145.42   91615.0        422939.5
Retail (5)              10    43.54    11.97    4652.6         1289.5
Hotel and Travel (6)    21    65.80    30.17    10.0           18204.6
Chemicals (7)           60    60.04    20.39    1542.2         24529.4
Manufacturing (8)       78    60.43    22.28    1708.3         38222.7
Sum                     348                     106309.5       581752.8

Table 23: Salary Progressions by Industry

a) Give the Analysis of Variance table.

b) Test whether the true mean salary progressions differ among these industries (α = 0.05).

Source: L.K. Stroh and J.M. Brett (1996), "The Dual-Earner Dad Penalty in Salary Progression," Human Resources Management, 35:181-201.
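For this example, the column sums in Table 23 are already SST and SSE, so only the mean squares and F remain. A sketch of that step (one way to check your hand computation):

```python
# Column sums from Table 23, with C = 8 industries and N = 348 firms:
sst, sse = 106309.5, 581752.8
C, N = 8, 348

mst = sst / (C - 1)    # MST = SST / (C - 1)
mse = sse / (N - C)    # MSE = SSE / (N - C)
f = mst / mse          # observed F statistic
```

This gives F ≈ 8.88 with 7 and 340 degrees of freedom, to be compared with F0.05,7,340 from Table A.7.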

Example 8.4 – Professional Women as a Potential Market Segment

A study reported results of a survey of women's shopping habits. Among the variables measured and reported was the amount of time spent shopping in a new grocery store. Women were classified as housewives (H), professional working women (P), or non-professional working women (NP). Group means and sample sizes are given below, and Table 24 gives the basis of the Analysis of Variance.

X̄H = 56.9  nH = 80     X̄P = 53.3  nP = 24     X̄NP = 49.7  nNP = 40

Source    df     SS

Groups           1411.2
Error     141    30244.5
Total

Table 24: ANOVA table for Professional Women as Market Segment study

a) Complete the ANOVA table.


b) Give the estimate of the within group standard deviation in shopping times.

c) Give the null and alternative hypotheses for testing whether true mean shopping times differ among these three potential market segments.

d) Give the test statistic, rejection region, and conclusion for the test in part c) (use 120 denominator df).

e) What can be said of the P -value?

(i) P < .05    (ii) P > .05    (iii) P cannot be determined

Source: M. Joyce and J. Guiltinan (1978), "The Professional Woman: A Potential Market Segment for Retailers," Journal of Retailing, 54:59-70.
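Completing Table 24 is again a matter of additivity: with C = 3 groups and N = 80 + 24 + 40 = 144 subjects, the error df of 141 checks out, and the rest follows. A sketch of the arithmetic (one worked check under those counts):

```python
import math

# Given entries of Table 24, plus the group sizes:
ss_groups, sse = 1411.2, 30244.5
C, N = 3, 80 + 24 + 40   # N = 144, so df_err = N - C = 141 (matches the table)

mst = ss_groups / (C - 1)
mse = sse / (N - C)
f = mst / mse            # observed F statistic
s = math.sqrt(mse)       # within-group std deviation estimate (part b)
```

This gives MSE ≈ 214.5, s ≈ 14.6, and F ≈ 3.29, to be compared with F0.05,2,120 = 3.07 as instructed in part d).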

Example 8.5 – Impact of Attention on Attribute Performance Assessments

A study was conducted to determine whether the amount of attention (measured by the time a subject is exposed to an advertisement) is related to the importance ratings of a product attribute. Subjects were asked to rate on a scale the importance of water resistance in a watch. People were exposed to the ad for either 60, 105, or 150 seconds. The means, standard deviations, and sample sizes for each treatment are given in Table 25. The overall mean is computed as follows:

X̄ = (Total importance score) / (Overall sample size) = [11(4.3) + 10(6.8) + 9(7.1)] / (11 + 10 + 9) = 179.2/30 ≈ 6.0

Statistic          60 seconds (j = 1)   105 seconds (j = 2)   150 seconds (j = 3)

Mean (X̄j)          4.3                  6.8                   7.1
Std Dev (Sj)       1.8                  1.7                   1.5
Sample Size (nj)   11                   10                    9

Table 25: Summary statistics for Attention/Attribute Performance Study

a) Set up the ANOVA table.

b) Test whether differences exist among the mean importance scores for the three exposure times (α = 0.05).

Source: S.B. MacKenzie (1986), "The Role of Attention in Mediating the Effect of Advertising on Attribute Performance," Journal of Consumer Research, 13:174-195.
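The ANOVA for Table 25 can be built from the summary statistics exactly as in Example 8.1; a sketch (note the unrounded overall mean is 179.2/30 ≈ 5.97, not the rounded 6.0):

```python
# (n_j, mean, std dev) per exposure time, from Table 25:
groups = [(11, 4.3, 1.8), (10, 6.8, 1.7), (9, 7.1, 1.5)]

N = sum(n for n, _, _ in groups)
C = len(groups)
grand = sum(n * m for n, m, _ in groups) / N           # 179.2 / 30

sst = sum(n * (m - grand) ** 2 for n, m, _ in groups)  # between-treatment SS
sse = sum((n - 1) * s ** 2 for n, _, s in groups)      # within-treatment SS
f = (sst / (C - 1)) / (sse / (N - C))
```

This gives SST ≈ 49.06, SSE = 76.41, and F ≈ 8.67 with 2 and 27 degrees of freedom, to be compared with F0.05,2,27 from Table A.7.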


9 Lecture 9 — Comparison of Treatment Means

Textbook Section: 11.3
Problems: Apply Tukey's Method to the Problems in Section 11.2

Assuming that we have concluded that treatment means differ, we generally would like to know which means are significantly different. This is done by making either pre-planned or all pairwise comparisons between pairs of treatments. We will look at how to make pre-planned comparisons, and then how to make all comparisons. The two methods are very similar.

9.1 Pre–Planned Comparisons

Suppose we want to compare treatments i and j. That is, we'd like to decide whether or not µi = µj. In previous coursework, we did this when we had two populations to compare (the two-sample t-test). The method for comparing two of the C populations in a CRD is exactly like what we did then, except now S² = MSE (see above) and the degrees of freedom are N − C. It is useful to think of MSE as a pooled estimate of the common variance among the C populations, just as S²p was pooled in the two-sample case. The (1 − α)100% confidence interval for the difference in two means (µi − µj) is:

(X̄i − X̄j) ± tα/2,N−C √( MSE (1/ni + 1/nj) ).

The inference we can make concerning the population means is as follows:

1. If the entire confidence interval for µi − µj is positive, we conclude that treatment i has a higher mean than treatment j.

2. If the entire confidence interval for µi − µj is negative, we conclude that treatment i has a lower mean than treatment j.

3. If the interval contains both positive and negative values, we cannot conclude that the means of treatments i and j are different.
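A sketch of one such pre-planned interval, using the Wellbutrin vs Prozac numbers from Example 8.1 (the critical value t.025,103 ≈ 1.983 is hardcoded from a t table, an approximation on my part):

```python
import math

# Pre-planned 95% CI for mu_1 - mu_2 (Wellbutrin vs Prozac, Table 20 values).
x1, x2 = 0.46, -0.49
n1, n2 = 22, 37
mse = 0.96
t_crit = 1.983                # ~ t_{.025,103}, hardcoded approximation

half_width = t_crit * math.sqrt(mse * (1 / n1 + 1 / n2))
lo, hi = (x1 - x2) - half_width, (x1 - x2) + half_width
# The whole interval (~0.43, ~1.47) is positive, so we conclude mu_1 > mu_2.
```

This matches inference rule 1 above: the interval lies entirely above zero.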

9.2 All Pairwise Comparisons

The method described above works well for pre-planned comparisons, but can lead to misleading results when used for many or all pairwise comparisons among treatments. Various methods have been developed to handle all possible comparisons and keep the overall error rate at α. We will describe one commonly used method, known as Tukey's method of multiple comparisons. Computer packages print these comparisons automatically. Tukey's method involves setting up confidence intervals for all pairs of treatment means simultaneously. If there are C treatments, there will be C(C − 1)/2 such intervals. The general form, allowing for different sample sizes for treatments i and j, is:

(X̄i − X̄j) ± qα,C,N−C √( (MSE/2) (1/ni + 1/nj) ),


where qα,C,N−C is called the studentized range and is given in Table A.10 on page A-29. When the sample sizes are equal (ni = nj), the formula simplifies to:

(X̄i − X̄j) ± qα,C,N−C √( MSE/ni ).

The term

qα,C,N−C √( (MSE/2) (1/ni + 1/nj) )

is referred to as Tukey's "Honest Significant Difference", or HSD.

An alternative approach to forming the confidence interval is to simply compare |X̄i − X̄j| with HSDi,j. If the difference in means exceeds HSD in absolute value, we conclude that the population means differ; otherwise we cannot conclude that they differ. The direction of any significant difference depends on the sign of (X̄i − X̄j).

Example 9.1

We've determined that differences exist among the sexual side effects of the antidepressant brands. Which brands differ? We use Tukey's HSD test and make C(C − 1)/2 = 4(3)/2 = 6 comparisons with a simultaneous Type I error rate of α = 0.05. The critical difference for treatments i and j is:

HSDi,j = qα,C,N−C √( (MSE/2) (1/ni + 1/nj) )

For treatments 1 and 2 (Wellbutrin vs Prozac), we have q.05,4,103 ≈ 3.68, MSE = 0.96, n1 = 22, and n2 = 37, which leads to:

HSD1,2 = 3.68 √( (0.96/2) (1/22 + 1/37) ) = 3.68 √0.0348 = 0.686

Thus, we compare the difference between the means for Wellbutrin and Prozac with this critical difference.

X̄1 − X̄2 = 0.46 − (−0.49) = 0.95 > 0.686    (µ1 > µ2)

Since the means differ by more than 0.686, we conclude that they differ and that Wellbutrin users report higher scores on average than Prozac users. The results for all pairs are given in Table 26, where N.S.D. in the Conclusion column means "Not Significantly Different".

The primary conclusion is that Wellbutrin users have a higher population mean than the other three brands' users. None of the three SSRIs' means can be determined to differ.

Example 9.2 – Salary Progression By Industry

Refer to Example 8.3.

a) Between which two industries will Tukey's HSD be the largest? Compute this value. Do these two industries differ significantly (use α = 0.05)?


Simultaneous 95% CI's
Comparison   X̄i − X̄j                     HSDi,j   Conclude

1 v 2        0.46 − (−0.49) = 0.95       0.686    µ1 > µ2
1 v 3        0.46 − (−0.90) = 1.36       0.778    µ1 > µ3
1 v 4        0.46 − (−0.49) = 0.95       0.732    µ1 > µ4
2 v 3        −0.49 − (−0.90) = 0.41      0.697    N.S.D.
2 v 4        −0.49 − (−0.49) = 0.00      0.645    N.S.D.
3 v 4        −0.90 − (−0.49) = −0.41     0.742    N.S.D.

Table 26: Tukey multiple comparisons for the sexual side effects study patients receiving antidepressants
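The HSD column of Table 26 can be reproduced from the group sizes and MSE; a sketch (the helper `hsd` is mine, and q.05,4,103 ≈ 3.68 is taken from the notes):

```python
import math

q, mse = 3.68, 0.96                 # q_{.05,4,103} and MSE from Example 8.1
n = {1: 22, 2: 37, 3: 21, 4: 27}    # group sizes from Table 20

def hsd(i, j):
    """Tukey critical difference: q * sqrt((MSE/2) * (1/n_i + 1/n_j))."""
    return q * math.sqrt((mse / 2) * (1 / n[i] + 1 / n[j]))

pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
hsds = {p: round(hsd(*p), 3) for p in pairs}
```

The six values come out as 0.686, 0.778, 0.732, 0.697, 0.645, and 0.742, matching the HSDi,j column.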

b) Between which two industries will Tukey's HSD be the smallest? Compute this value. Do these two industries differ significantly (use α = 0.05)?

Example 9.3 – Professional Women as a Potential Market Segment

Refer to Example 8.4. Compute Tukey's HSD for all three pairwise comparisons (use α = 0.05).

Example 9.4 – Impact of Attention on Attribute Performance Assessments

Refer to Example 8.5. Compute Tukey's HSD for all three pairwise comparisons (use α = 0.05).


10 Lecture 10 — Simple Linear Regression I – Least Squares Estimation

Textbook Sections: 12.1–12.4
Problems: 12.1, 3, 4, 5, 8, 13, 16, 23, 26

Previously, we have worked with a random variable X that comes from a population that is normally distributed with mean µ and variance σ². We have seen that we can write X in terms of µ and a random error component ε, that is, X = µ + ε. For the time being, we are going to change our notation for our random variable from X to Y. So, we now write Y = µ + ε. We will now find it useful to call the random variable Y a dependent or response variable. Many times, the response variable of interest may be related to the value(s) of one or more known or controllable independent or predictor variables. Consider the following situations:

LR1 A college recruiter would like to be able to predict a potential incoming student's first-year GPA (Y) based on known information concerning high school GPA (X1) and college entrance examination score (X2). She feels that the student's first-year GPA will be related to the values of these two known variables.

LR2 A marketer is interested in the effect of changing shelf height (X1) and shelf width (X2) on the weekly sales (Y) of her brand of laundry detergent in a grocery store.

LR3 A psychologist is interested in testing whether the amount of time to become proficient in a foreign language (Y) is related to the child's age (X).

In each case, we have at least one variable that is known (in some cases it is controllable), and a response variable that is a random variable. We would like to fit a model that relates the response to the known or controllable variable(s). The main reasons that scientists and social researchers use linear regression are the following:

1. Prediction – To predict a future response based on known values of the predictor variables and past data related to the process.

2. Description – To measure the effect of changing a controllable variable on the mean valueof the response variable.

3. Control – To confirm that a process is providing responses (results) that we 'expect' under the present operating conditions (measured by the level(s) of the predictor variable(s)).

10.1 A Linear Deterministic Model

Suppose you are a vendor who sells a product that is in high demand (e.g. cold beer on the beach, cable television in Gainesville, or life jackets on the Titanic, to name a few). If you begin your day with 100 items, have a profit of $10 per item, and an overhead of $30 per day, you know exactly how much profit you will make that day, namely 100(10) − 30 = $970. Similarly, if you begin the day with 50 items, you can also state your profit with certainty. In fact, for any number of items you begin the day with (X), you can state what the day's profit (Y) will be. That is,

Y = 10 · X − 30.
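The defining feature of a deterministic model is that the response is known exactly from X, with no error term. A one-line sketch (the function name `profit` is mine):

```python
def profit(items):
    """Deterministic daily profit: $10 per item, $30 daily overhead."""
    return 10 * items - 30

# profit(100) gives 970 and profit(50) gives 470: no randomness, so the
# response is fully determined by X.
```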


This is called a deterministic model. In general, we can write the equation for a straight line as

Y = β0 + β1X,

where β0 is called the Y-intercept and β1 is called the slope. β0 is the value of Y when X = 0, and β1 is the change in Y when X increases by 1 unit. In many real-world situations, the response of interest (in this example, profit) cannot be explained perfectly by a deterministic model. In this case, we make an adjustment for random variation in the process.

10.2 A Linear Probabilistic Model

The adjustment people make is to write the mean response as a linear function of the predictor variable. This way, we allow for variation in individual responses (Y), while associating the mean linearly with the predictor X. The model we fit is as follows:

E(Y |X) = β0 + β1X,

and we write the individual responses as

Y = β0 + β1X + ε.

We can think of Y as being broken into a systematic component (β0 + β1X) and a random component (ε):

Y = β0 + β1X + ε,

where X is the level of the predictor variable corresponding to the response, β0 and β1 are unknown parameters, and ε is the random error component, whose distribution we assume is N(0, σ), as before. Further, we assume the error terms are independent of one another; we discuss this in more detail in a later chapter. Note that β0 can be interpreted as the mean response when X = 0, and β1 can be interpreted as the change in the mean response when X is increased by 1 unit. Under this model, we are saying that Y|X ∼ N(β0 + β1X, σ). Consider the following example.

Example 10.1 – Coffee Sales and Shelf Space

A marketer is interested in the relation between the width of the shelf space for her brand of coffee (X) and the weekly sales (Y) of the product in a suburban supermarket (assume the height is always at eye level). Marketers are well aware of the concept of 'compulsive purchases', and know that the more shelf space their product takes up, the higher the frequency of such purchases. She believes that in the range of 3 to 9 feet, the mean weekly sales will be linearly related to the width of the shelf space. Further, among weeks with the same shelf space, she believes that sales will be normally distributed with unknown standard deviation σ (that is, σ measures how variable weekly sales are at a given amount of shelf space). Thus, she would like to fit a model relating weekly sales Y to the amount of shelf space X her product receives that week. That is, she is fitting the model:

Y = β0 + β1X + ε,

so that Y|X ∼ N(β0 + β1X, σ).

One limitation of linear regression is that we must restrict our interpretation of the model to the range of values of the predictor variables that we observe in our data. We cannot assume this linear relation continues outside the range of our sample data.

We often refer to β0 + β1X as the systematic component of Y and ε as the random component.


10.3 Least Squares Estimation of β0 and β1

We now have the problem of using sample data to compute estimates of the parameters β0 and β1. First, we take a sample of n subjects, observing values Y of the response variable and X of the predictor variable. We would like to choose as estimates for β0 and β1 the values b0 and b1 that ‘best fit’ the sample data. Consider the coffee example mentioned earlier. Suppose the marketer conducted the experiment over a twelve week period (4 weeks with 3’ of shelf space, 4 weeks with 6’, and 4 weeks with 9’), and observed the sample data in Table 27.

Shelf Space (x)   Weekly Sales (y)   Shelf Space (x)   Weekly Sales (y)
      6                 526                6                 434
      3                 421                3                 443
      6                 581                9                 590
      9                 630                6                 570
      3                 412                3                 346
      9                 560                9                 672

Table 27: Coffee sales data for n = 12 weeks


Figure 10: Plot of coffee sales vs amount of shelf space

Now, look at Figure 10. Note that while there is some variation among the weekly sales at 3’, 6’, and 9’, respectively, there is a trend for the mean sales to increase as shelf space increases. We define the fitted equation to be:

Ŷ = b0 + b1X,

and we choose the estimates b0 and b1 to be the values that minimize the distances of the data points to the fitted line. For each observed response Yi, with a corresponding predictor variable Xi, we obtain a fitted value Ŷi = b0 + b1Xi. We would like to minimize the sum of the squared distances of each observed response to its fitted value. That is, we want to minimize the error

Page 60: 1 Lecture 1 – Preliminariesusers.stat.ufl.edu/~winner/qmb3250/notes.pdf5-10 1 1.49 1 1.49 10-15 21 31.34 22 32.84 15-20 28 41.79 50 74.63 20-25 10 14.93 60 89.55 25-30 3 4.48 63

sum of squares, SSE, where:

SSE = Σ_{i=1}^n (Yi − Ŷi)² = Σ_{i=1}^n (Yi − (b0 + b1Xi))².

A little bit of calculus can be used to obtain the estimates:

b1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)² = SSXY / SSXX,

and

b0 = Ȳ − b1X̄ = Σ_{i=1}^n Yi / n − b1 (Σ_{i=1}^n Xi / n).

Some shortcut equations, known as the corrected sums of squares and crossproducts, that while not very intuitive are very useful in computing these and other estimates, are:

• SSXX = Σ_{i=1}^n (Xi − X̄)² = Σ_{i=1}^n Xi² − (Σ_{i=1}^n Xi)² / n

• SSXY = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) = Σ_{i=1}^n XiYi − (Σ_{i=1}^n Xi)(Σ_{i=1}^n Yi) / n

• SSYY = Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n Yi² − (Σ_{i=1}^n Yi)² / n
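The equivalence of the definitional and shortcut forms is easy to verify numerically; here is a small Python sketch (the data values are made up purely for illustration):

```python
# Tiny made-up data set, used only to check the identities above
X = [1.0, 2.0, 4.0, 7.0]
Y = [2.0, 3.0, 5.0, 10.0]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

# Definitional forms
ss_xx_def = sum((x - xbar) ** 2 for x in X)
ss_xy_def = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
ss_yy_def = sum((y - ybar) ** 2 for y in Y)

# Shortcut (corrected sums of squares and crossproducts) forms
ss_xx = sum(x * x for x in X) - sum(X) ** 2 / n
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
ss_yy = sum(y * y for y in Y) - sum(Y) ** 2 / n
```

The two forms agree exactly; the shortcut versions only need the running totals ΣX, ΣY, ΣX², ΣY², and ΣXY, which is why they were so convenient for hand computation.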

Example 10.1 Continued – Coffee Sales and Shelf Space

For the coffee data, we observe the following summary statistics in Table 28.

Week   Space (X)   Sales (Y)    X²      XY       Y²
  1        6          526       36     3156    276676
  2        3          421        9     1263    177241
  3        6          581       36     3486    337561
  4        9          630       81     5670    396900
  5        3          412        9     1236    169744
  6        9          560       81     5040    313600
  7        6          434       36     2604    188356
  8        3          443        9     1329    196249
  9        9          590       81     5310    348100
 10        6          570       36     3420    324900
 11        3          346        9     1038    119716
 12        9          672       81     6048    451584
Sum       72         6185      504    39600   3300627

Table 28: Summary Calculations — Coffee sales data

From this, we obtain the following sums of squares and crossproducts.

SSXX = Σ(X − X̄)² = ΣX² − (ΣX)²/n = 504 − (72)²/12 = 72

SSXY = Σ(X − X̄)(Y − Ȳ) = ΣXY − (ΣX)(ΣY)/n = 39600 − (72)(6185)/12 = 2490


SSYY = Σ(Y − Ȳ)² = ΣY² − (ΣY)²/n = 3300627 − (6185)²/12 = 112772.9

From these, we obtain the least squares estimate of the true linear regression relation (β0+β1X).

b1 = SSXY / SSXX = 2490 / 72 = 34.5833

b0 = ΣY/n − b1 (ΣX/n) = 6185/12 − 34.5833(72/12) = 515.4167 − 207.5000 = 307.9167.

Ŷ = b0 + b1X = 307.9167 + 34.5833X

So the fitted equation, estimating the mean weekly sales when the product has X feet of shelf space, is Ŷ = b0 + b1X = 307.9167 + 34.5833X. Our interpretation for b1 is “the estimate for the increase in mean weekly sales due to increasing shelf space by 1 foot is 34.5833 bags of coffee”. Note that this should only be interpreted within the range of X values that we have observed in the “experiment”, namely X = 3 to 9 feet.
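These hand calculations can be reproduced directly from the summary sums in Table 28; a short Python sketch:

```python
# Least squares fit for the coffee data, using the totals from Table 28
n = 12
sum_x, sum_y = 72, 6185
sum_x2, sum_xy = 504, 39600

# Corrected sums of squares and crossproducts (shortcut formulas)
ss_xx = sum_x2 - sum_x ** 2 / n      # 504 - 72^2/12
ss_xy = sum_xy - sum_x * sum_y / n   # 39600 - (72)(6185)/12

# Least squares estimates
b1 = ss_xy / ss_xx                   # slope: change in mean sales per foot
b0 = sum_y / n - b1 * sum_x / n      # intercept: ybar - b1 * xbar
```

Nothing beyond the five running totals is needed to recover the fitted line.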

Example 10.2 – Computation of a Stock Beta

A widely used measure of a company’s performance is its beta. This is a measure of the firm’s stock price volatility relative to the overall market’s volatility. One common use of beta is in the capital asset pricing model (CAPM) in finance, but you will hear them quoted on many business news shows as well. It is computed as (Value Line):

The “beta factor” is derived from a least squares regression analysis between weekly percent changes in the price of a stock and weekly percent changes in the price of all stocks in the survey over a period of five years. In the case of shorter price histories, a smaller period is used, but never less than two years.

In this example, we will compute the stock beta over a 28-week period for Coca-Cola and Anheuser-Busch, using the S&P500 as ‘the market’ for comparison. Note that this period is only about 10% of the period used by Value Line. Note: While there are 28 weeks of data, there are only n = 27 weekly changes.

Table 29 provides the dates, weekly closing prices, and weekly percent changes of: the S&P500, Coca-Cola, and Anheuser-Busch. The following summary calculations are also provided, with X representing the S&P500, YC representing Coca-Cola, and YA representing Anheuser-Busch. All calculations should be based on 4 decimal places. Figure 11 gives the plot and least squares regression line for Anheuser-Busch, and Figure 12 gives the plot and least squares regression line for Coca-Cola.

ΣX = 15.5200    ΣYC = −2.4882    ΣYA = 2.4281

ΣX² = 124.6354    ΣYC² = 461.7296    ΣYA² = 195.4900

ΣXYC = 161.4408    ΣXYA = 84.7527


Date       S&P Price   A-B Price   C-C Price   S&P % Chng   A-B % Chng   C-C % Chng
05/20/97     829.75      43.00       66.88         –            –            –
05/27/97     847.03      42.88       68.13        2.08        -0.28         1.87
06/02/97     848.28      42.88       68.50        0.15         0.00         0.54
06/09/97     858.01      41.50       67.75        1.15        -3.22        -1.09
06/16/97     893.27      43.00       71.88        4.11         3.61         6.10
06/23/97     898.70      43.38       71.38        0.61         0.88        -0.70
06/30/97     887.30      42.44       71.00       -1.27        -2.17        -0.53
07/07/97     916.92      43.69       70.75        3.34         2.95        -0.35
07/14/97     916.68      43.75       69.81       -0.03         0.14        -1.33
07/21/97     915.30      45.50       69.25       -0.15         4.00        -0.80
07/28/97     938.79      43.56       70.13        2.57        -4.26         1.27
08/04/97     947.14      43.19       68.63        0.89        -0.85        -2.14
08/11/97     933.54      43.50       62.69       -1.44         0.72        -8.66
08/18/97     900.81      42.06       58.75       -3.51        -3.31        -6.28
08/25/97     923.55      43.38       60.69        2.52         3.14         3.30
09/01/97     899.47      42.63       57.31       -2.61        -1.73        -5.57
09/08/97     929.05      44.31       59.88        3.29         3.94         4.48
09/15/97     923.91      44.00       57.06       -0.55        -0.70        -4.71
09/22/97     950.51      45.81       59.19        2.88         4.11         3.73
09/29/97     945.22      45.13       61.94       -0.56        -1.48         4.65
10/06/97     965.03      44.75       62.38        2.10        -0.84         0.71
10/13/97     966.98      43.63       61.69        0.20        -2.50        -1.11
10/20/97     944.16      42.25       58.50       -2.36        -3.16        -5.17
10/27/97     941.64      40.69       55.50       -0.27        -3.69        -5.13
11/03/97     914.62      39.94       56.63       -2.87        -1.84         2.04
11/10/97     927.51      40.81       57.00        1.41         2.18         0.65
11/17/97     928.35      42.56       57.56        0.09         4.29         0.98
11/24/97     963.09      43.63       63.75        3.74         2.51        10.75

Table 29: Weekly closing stock prices – S&P 500, Anheuser-Busch, Coca-Cola



Figure 11: Plot of weekly percent stock price changes for Anheuser-Busch versus S&P 500 and least squares regression line


Figure 12: Plot of weekly percent stock price changes for Coca-Cola versus S&P 500 and least squares regression line


a) Compute SSXX, SSXYC, and SSXYA.

b) Compute the stock betas for Coca-Cola and Anheuser-Busch.
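One way to carry out these computations is sketched below in Python, using the summary sums listed above (the same shortcut formulas as in the coffee example); a stock's beta is just the least squares slope of its weekly percent changes against the market's:

```python
# Stock betas from the summary sums above (X = S&P 500 weekly % change)
n = 27
sum_x, sum_x2 = 15.5200, 124.6354
sum_yc, sum_ya = -2.4882, 2.4281
sum_xyc, sum_xya = 161.4408, 84.7527

# Corrected sums of squares and crossproducts
ss_xx = sum_x2 - sum_x ** 2 / n
ss_xyc = sum_xyc - sum_x * sum_yc / n
ss_xya = sum_xya - sum_x * sum_ya / n

# Beta = slope of stock % change regressed on market % change
beta_cc = ss_xyc / ss_xx   # Coca-Cola
beta_ab = ss_xya / ss_xx   # Anheuser-Busch
```

Over this short 28-week window, Coca-Cola's weekly moves were amplified relative to the market while Anheuser-Busch's were damped; with only 27 weekly changes, neither figure should be read as a Value Line-quality beta.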

Example 10.3 – Estimating Cost Functions of a Hosiery Mill

The following (approximate) data were published by Joel Dean, in the 1941 article: “Statistical Cost Functions of a Hosiery Mill” (Studies in Business Administration, vol. 14, no. 3).

Y — Monthly total production cost (in $1000s).

X — Monthly output (in thousands of dozens produced).

A sample of n = 48 months of data were used, with Xi and Yi being measured for each month. The parameter β1 represents the change in mean cost per unit increase in output (unit variable cost), and β0 represents the true mean cost when the output is 0, without shutting down the plant (fixed cost). The data are given in Table 30 (the order is arbitrary, as the data are printed in table form, and were obtained from visual inspection/approximation of a plot).

 i    Xi     Yi      i    Xi     Yi      i    Xi     Yi
 1   46.75  92.64   17   36.54  91.56   33   32.26  66.71
 2   42.18  88.81   18   37.03  84.12   34   30.97  64.37
 3   41.86  86.44   19   36.60  81.22   35   28.20  56.09
 4   43.29  88.80   20   37.58  83.35   36   24.58  50.25
 5   42.12  86.38   21   36.48  82.29   37   20.25  43.65
 6   41.78  89.87   22   38.25  80.92   38   17.09  38.01
 7   41.47  88.53   23   37.26  76.92   39   14.35  31.40
 8   42.21  91.11   24   38.59  78.35   40   13.11  29.45
 9   41.03  81.22   25   40.89  74.57   41    9.50  29.02
10   39.84  83.72   26   37.66  71.60   42    9.74  19.05
11   39.15  84.54   27   38.79  65.64   43    9.34  20.36
12   39.20  85.66   28   38.78  62.09   44    7.51  17.68
13   39.52  85.87   29   36.70  61.66   45    8.35  19.23
14   38.05  85.23   30   35.10  77.14   46    6.25  14.92
15   39.16  87.75   31   33.75  75.47   47    5.45  11.44
16   38.59  92.62   32   34.29  70.37   48    3.79  12.69

Table 30: Production costs and Output – Dean (1941).

This dataset has n = 48 observations with a mean output (in 1000s of dozens) of X̄ = 31.0673, and a mean monthly cost (in $1000s) of Ȳ = 65.4329.

Σ_{i=1}^n Xi = 1491.23    Σ_{i=1}^n Xi² = 54067.42

Σ_{i=1}^n Yi = 3140.78    Σ_{i=1}^n Yi² = 238424.46

Σ_{i=1}^n XiYi = 113095.80

From these quantities, we get:


• SSXX = Σ_{i=1}^n Xi² − (Σ_{i=1}^n Xi)²/n = 54067.42 − (1491.23)²/48 = 54067.42 − 46328.48 = 7738.94

• SSXY = Σ_{i=1}^n XiYi − (Σ_{i=1}^n Xi)(Σ_{i=1}^n Yi)/n = 113095.80 − (1491.23)(3140.78)/48 = 113095.80 − 97575.53 = 15520.27

• SSYY = Σ_{i=1}^n Yi² − (Σ_{i=1}^n Yi)²/n = 238424.46 − (3140.78)²/48 = 238424.46 − 205510.40 = 32914.06

b1 = SSXY / SSXX = 15520.27 / 7738.94 = 2.0055

b0 = Ȳ − b1X̄ = 65.4329 − (2.0055)(31.0673) = 3.1274

Ŷi = b0 + b1Xi = 3.1274 + 2.0055Xi,   i = 1, …, 48

ei = Yi − Ŷi = Yi − (3.1274 + 2.0055Xi),   i = 1, …, 48

Table 31 gives the raw data, their fitted values, and residuals. A plot of the data and regression line are given in Figure 13.


Figure 13: Estimated cost function for hosiery mill (Dean, 1941)


 i    Xi     Yi      Ŷi      ei
 1   46.75  92.64   96.88   -4.24
 2   42.18  88.81   87.72    1.09
 3   41.86  86.44   87.08   -0.64
 4   43.29  88.80   89.95   -1.15
 5   42.12  86.38   87.60   -1.22
 6   41.78  89.87   86.92    2.95
 7   41.47  88.53   86.30    2.23
 8   42.21  91.11   87.78    3.33
 9   41.03  81.22   85.41   -4.19
10   39.84  83.72   83.03    0.69
11   39.15  84.54   81.64    2.90
12   39.20  85.66   81.74    3.92
13   39.52  85.87   82.38    3.49
14   38.05  85.23   79.44    5.79
15   39.16  87.75   81.66    6.09
16   38.59  92.62   80.52   12.10
17   36.54  91.56   76.41   15.15
18   37.03  84.12   77.39    6.73
19   36.60  81.22   76.53    4.69
20   37.58  83.35   78.49    4.86
21   36.48  82.29   76.29    6.00
22   38.25  80.92   79.84    1.08
23   37.26  76.92   77.85   -0.93
24   38.59  78.35   80.52   -2.17
25   40.89  74.57   85.13  -10.56
26   37.66  71.60   78.65   -7.05
27   38.79  65.64   80.92  -15.28
28   38.78  62.09   80.90  -18.81
29   36.70  61.66   76.73  -15.07
30   35.10  77.14   73.52    3.62
31   33.75  75.47   70.81    4.66
32   34.29  70.37   71.90   -1.53
33   32.26  66.71   67.82   -1.11
34   30.97  64.37   65.24   -0.87
35   28.20  56.09   59.68   -3.59
36   24.58  50.25   52.42   -2.17
37   20.25  43.65   43.74   -0.09
38   17.09  38.01   37.40    0.61
39   14.35  31.40   31.91   -0.51
40   13.11  29.45   29.42    0.03
41    9.50  29.02   22.18    6.84
42    9.74  19.05   22.66   -3.61
43    9.34  20.36   21.86   -1.50
44    7.51  17.68   18.19   -0.51
45    8.35  19.23   19.87   -0.64
46    6.25  14.92   15.66   -0.74
47    5.45  11.44   14.06   -2.62
48    3.79  12.69   10.73    1.96

Table 31: Approximated Monthly Outputs, total costs, fitted values and residuals – Dean (1941).


We have now seen how to estimate β0 and β1. Next we can obtain an estimate of the variance of the responses at a given value of X. Recall from your previous statistics course that you estimated the variance by taking the ‘average’ squared deviation of each measurement from the sample (estimated) mean. That is, you calculated S² = Σ_{i=1}^n (Yi − Ȳ)² / (n − 1). Now that we fit the regression model, we no longer use Ȳ to estimate the mean for each Yi, but rather Ŷi = b0 + b1Xi. The estimate we use now looks similar to the previous estimate, except we replace Ȳ with Ŷi and we replace n − 1 with n − 2, since we have estimated 2 parameters, β0 and β1. The new estimate (which we will refer to as the residual variance) is:

Se² = MSE = SSE / (n − 2) = Σ_{i=1}^n (Yi − Ŷi)² / (n − 2) = [SSYY − (SSXY)²/SSXX] / (n − 2).

This estimated variance Se² can be thought of as the ‘average’ squared distance from each observed response to the fitted line. The word average is in quotes since we divide by n − 2 and not n. The closer the observed responses fall to the line, the smaller Se² is and the better our predicted values will be.

Example 10.1 (Continued) – Coffee Sales and Shelf Space

For the coffee data,

Se² = [112772.9 − (2490)²/72] / (12 − 2) = (112772.9 − 86112.5) / 10 = 2666.04,

and the estimated residual standard error (deviation) is Se = √2666.04 = 51.63. We now have estimates for all of the parameters of the regression equation relating the mean weekly sales to the amount of shelf space the coffee gets in the store. Figure 14 shows the 12 observed responses and the estimated (fitted) regression equation.


Figure 14: Plot of coffee data and fitted equation

Example 10.3 (Continued) – Estimating Cost Functions of a Hosiery Mill


For the cost function data:

• SSE = Σ_{i=1}^n (Yi − Ŷi)² = SSYY − (SSXY)²/SSXX = 32914.06 − (15520.27)²/7738.94 = 32914.06 − 31125.55 = 1788.51

• Se² = MSE = SSE/(n − 2) = 1788.51/46 = 38.88

• Se = √38.88 = 6.24
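The same three quantities can be computed directly from the corrected sums of squares; a brief Python sketch:

```python
import math

# Residual variance for the hosiery mill fit, from the corrected sums
n = 48
ss_xx, ss_xy, ss_yy = 7738.94, 15520.27, 32914.06

sse = ss_yy - ss_xy ** 2 / ss_xx   # error sum of squares
mse = sse / (n - 2)                # residual variance S_e^2 (divide by n - 2)
se = math.sqrt(mse)                # residual standard deviation S_e
```

Note the n − 2 divisor: two parameters (β0 and β1) were estimated before the residuals could be formed.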


11 Lecture 11 — Simple Regression II — Inferences Concerning β1

Textbook Sections: 12.5, 12.6
Problems: 12.36, 39, 40, 41; compute 95% CI’s for β1 in these problems.

Recall that in our regression model, we are stating that E(Y|X) = β0 + β1X. In this model, β1 represents the change in the mean of our response variable Y as the predictor variable X increases by 1 unit. Note that if β1 = 0, we have that E(Y|X) = β0 + β1X = β0 + 0X = β0, which implies the mean of our response variable is the same at all values of X. In the context of the coffee sales example, this would imply that mean sales are the same, regardless of the amount of shelf space, so a marketer has no reason to purchase extra shelf space. This is like saying that knowing the level of the predictor variable does not help us predict the response variable.

Under the assumptions stated previously, namely that Y ∼ N(β0 + β1X, σ), our estimator b1 has a sampling distribution that is normal with mean β1 (the true value of the parameter) and standard error σ/√(Σ_{i=1}^n (Xi − X̄)²). That is:

b1 ∼ N(β1, σ/√SSXX)

We can now make inferences concerning β1.

11.1 A Confidence Interval for β1

Recall the general form of a (1 − α)100% confidence interval for a parameter θ. The interval is of the form:

θ̂ ± z_{α/2} σ_θ̂

for large samples, or

θ̂ ± t_{α/2} S_θ̂

for small samples when we must estimate the parameter’s standard error σ_θ̂.

This leads us to the general form of a (1 − α)100% confidence interval for β1. The interval can be written:

b1 ± t_{α/2,n−2} S_b1 ≡ b1 ± t_{α/2,n−2} Se/√SSXX.

Note that Se/√SSXX is the estimated standard error of b1, since we use Se = √MSE to estimate σ. Also, we have n − 2 degrees of freedom instead of n − 1, since the estimate Se² has 2 estimated parameters used in it (refer back to how we calculate it above).

Example 11.1 – Coffee Sales and Shelf Space

For the coffee sales example, we have the following results:

b1 = 34.5833, SSXX = 72, Se = 51.63, n = 12.

So a 95% confidence interval for the parameter β1 is:

34.5833 ± t_{.025,12−2} (51.63/√72) = 34.5833 ± 2.228(6.085) = 34.5833 ± 13.557,


which gives us the range (21.026, 48.140). We are 95% confident that the true mean sales increase by between 21.026 and 48.140 bags of coffee per week for each extra foot of shelf space the brand gets (within the range of 3 to 9 feet). Note that the entire interval is positive (above 0), so we are confident that in fact β1 > 0, and the marketer is justified in pursuing extra shelf space.
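The interval can be reproduced in a few lines (the t critical value 2.228 for 10 degrees of freedom is taken from the text, not computed):

```python
import math

# 95% CI for beta1 in the coffee example
b1, se, ss_xx, t_crit = 34.5833, 51.63, 72, 2.228

sb1 = se / math.sqrt(ss_xx)        # estimated standard error of b1
half_width = t_crit * sb1
ci = (b1 - half_width, b1 + half_width)
```

Because the lower endpoint is well above zero, the data are inconsistent with β1 = 0.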

Example 11.2 – Hosiery Mill Cost Function

b1 = 2.0055, SSXX = 7738.94, Se = 6.24, n = 48.

For the hosiery mill cost function analysis, we obtain a 95% confidence interval for average unit variable costs (β1). Note that t_{.025,48−2} = t_{.025,46} ≈ 2.015, since t_{.025,40} = 2.021 and t_{.025,60} = 2.000 (we could approximate this with z_{.025} = 1.96 as well).

2.0055 ± t_{.025,46} (6.24/√7738.94) = 2.0055 ± 2.015(0.0709) = 2.0055 ± 0.1429 = (1.8626, 2.1484)

We are 95% confident that the true average unit variable costs are between $1.86 and $2.15 (this is the incremental cost of increasing production by one unit, assuming that the production process is in place).

11.2 Hypothesis Tests Concerning β1

Similar to the idea of the confidence interval, we can set up a test of hypothesis concerning β1. Since the confidence interval gives us the range of ‘believable’ values for β1, it is more useful than a test of hypothesis. However, here is the procedure to test whether β1 is equal to some value, say β1⁰.

• H0 : β1 = β1⁰ (β1⁰ specified, usually 0)

• (1) Ha : β1 ≠ β1⁰
  (2) Ha : β1 > β1⁰
  (3) Ha : β1 < β1⁰

• TS: t_obs = (b1 − β1⁰) / (Se/√SSXX)

• (1) RR: |t_obs| ≥ t_{α/2,n−2}
  (2) RR: t_obs ≥ t_{α,n−2}
  (3) RR: t_obs ≤ −t_{α,n−2}

• (1) P–value: 2 · P(t ≥ |t_obs|)
  (2) P–value: P(t ≥ t_obs)
  (3) P–value: P(t ≤ t_obs)

Using tables, we can only place bounds on these p–values.


Example 11.1 (Continued) – Coffee Sales and Shelf Space

Suppose in our coffee example, the marketer gets a set amount of space (say 6′) for free, and she must pay extra for any more space. For the extra space to be profitable (over the long run), the mean weekly sales must increase by more than 20 bags; otherwise the expense outweighs the increase in sales. She wants to test to see if it is worth it to buy more space. She works under the assumption that it is not worth it, and will only purchase more if she can show that it is worth it. She sets α = .05.

1. H0 : β1 = 20 HA : β1 > 20

2. T.S.: t_obs = (34.5833 − 20) / (51.63/√72) = 14.5833/6.085 = 2.397

3. R.R.: tobs > t.05,10 = 1.812

4. p-value: P(T > 2.397) < P(T > 2.228) = .025 and P(T > 2.397) > P(T > 2.764) = .010, so .01 < p-value < .025.

So, she has concluded that β1 > 20, and she will purchase the shelf space. Note also that the entireconfidence interval was over 20, so we already knew this.
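A sketch of the test statistic calculation (the critical value t_{.05,10} = 1.812 is taken from the text):

```python
import math

# One-sided test H0: beta1 = 20 vs HA: beta1 > 20 for the coffee example
b1, se, ss_xx = 34.5833, 51.63, 72
t_crit = 1.812                          # t_{.05, 10} from the text

t_obs = (b1 - 20) / (se / math.sqrt(ss_xx))
reject = t_obs > t_crit                 # True -> purchase the extra shelf space
```

The null value here is 20, not 0: the break-even sales increase, not "no association".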

Example 11.2 (Continued) – Hosiery Mill Cost Function

Suppose we want to test whether average monthly production costs increase with monthly production output. This is testing whether unit variable costs are positive (α = 0.05).

• H0 : β1 = 0 (Mean Monthly production cost is not associated with output)

• HA : β1 > 0 (Mean monthly production cost increases with output)

• TS: t_obs = (2.0055 − 0) / (6.24/√7738.94) = 2.0055/0.0709 = 28.29

• RR : tobs > t0.05,46 ≈ 1.680 (or use z0.05 = 1.645)

• p-value: P (T > 28.29) ≈ 0

We have overwhelming evidence of positive unit variable costs.

11.3 The Analysis of Variance Approach to Regression

Consider the deviations of the individual responses, Yi, from their overall mean Ȳ. We would like to break these deviations into two parts: the deviation of the observed value from its fitted value, Ŷi = b0 + b1Xi, and the deviation of the fitted value from the overall mean. See Figure 15, corresponding to the coffee sales example. That is, we’d like to write:

Yi − Ȳ = (Yi − Ŷi) + (Ŷi − Ȳ).

Note that all we are doing is adding and subtracting the fitted value. It so happens that algebraically we can show the same equality holds once we’ve squared each side of the equation and summed it over the n observed and fitted values. That is,

Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n (Yi − Ŷi)² + Σ_{i=1}^n (Ŷi − Ȳ)².



Figure 15: Plot of coffee data, fitted equation, and the line Ȳ = 515.4167

These three pieces are called the total, error, and model sums of squares, respectively. We denote them as SSYY, SSE, and SSR, respectively. We have already seen that SSYY represents the total variation in the observed responses, and that SSE represents the variation in the observed responses around the fitted regression equation. That leaves SSR as the amount of the total variation that is ‘accounted for’ by taking into account the predictor variable X. We can use this decomposition to test the hypothesis H0 : β1 = 0 vs HA : β1 ≠ 0. We will also find this decomposition useful in subsequent sections when we have more than one predictor variable. We first set up the Analysis of Variance (ANOVA) Table in Table 32. Note that we will have to make minimal calculations to set this up, since we have already computed SSYY and SSE in the regression analysis.

ANOVA
Source of Variation   Sum of Squares                 Degrees of Freedom   Mean Square         F
MODEL                 SSR = Σ_{i=1}^n (Ŷi − Ȳ)²     1                    MSR = SSR/1         F = MSR/MSE
ERROR                 SSE = Σ_{i=1}^n (Yi − Ŷi)²    n − 2                MSE = SSE/(n − 2)
TOTAL                 SSYY = Σ_{i=1}^n (Yi − Ȳ)²    n − 1

Table 32: The Analysis of Variance Table for simple regression

The procedure of testing for a linear association between the response and predictor variables using the analysis of variance involves using the F–distribution, which is given in Table A.7 (pp. A-16–A-25) of your text book. This is the same distribution we used in the previous chapter.

The testing procedure is as follows:

1. H0 : β1 = 0  HA : β1 ≠ 0 (this will always be a 2–sided test)

2. T.S.: F_obs = MSR/MSE

3. R.R.: Fobs > F1,n−2,α

4. p-value: P(F > F_obs) (you can only get bounds on this, but computer outputs report it exactly)


Note that we already have a procedure for testing this hypothesis (see the section on Inferences Concerning β1), but this is an important lead–in to multiple regression.

Example 11.1 (Continued) – Coffee Sales and Shelf Space

Referring back to the coffee sales data, we have already made the following calculations:

SSYY = 112772.9, SSE = 26660.4, n = 12.

We then also have that SSR = SSYY − SSE = 86112.5. Then the Analysis of Variance is given in Table 33.

ANOVA
Source of Variation   Sum of Squares     Degrees of Freedom   Mean Square                    F
MODEL                 SSR = 86112.5      1                    MSR = 86112.5/1 = 86112.5      F = 86112.5/2666.04 = 32.30
ERROR                 SSE = 26660.4      12 − 2 = 10          MSE = 26660.4/10 = 2666.04
TOTAL                 SSYY = 112772.9    12 − 1 = 11

Table 33: The Analysis of Variance Table for the coffee data example

To test the hypothesis of no linear association between amount of shelf space and mean weekly coffee sales, we can use the F-test described above. Note that the null hypothesis is that there is no effect on mean sales from increasing the amount of shelf space. We will use α = .01.

1. H0 : β1 = 0  HA : β1 ≠ 0

2. T.S.: F_obs = MSR/MSE = 86112.5/2666.04 = 32.30

3. R.R.: Fobs > F1,n−2,α = F1,10,.01 = 10.04

4. p-value: P (F > Fobs) = P (F > 32.30) < P (F > 12.83) = .005 (p-value < .005). See p. A-24.

We reject the null hypothesis, and conclude that β1 ≠ 0. There is an effect on mean weekly sales when we increase the shelf space.
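The ANOVA quantities follow mechanically from SSYY and SSE; a brief Python sketch for the coffee example:

```python
# ANOVA F test for the coffee example, built from SSYY and SSE
n = 12
ss_yy, sse = 112772.9, 26660.4

ssr = ss_yy - sse                   # model sum of squares (1 df)
msr = ssr / 1
mse = sse / (n - 2)                 # error mean square (n - 2 df)
f_obs = msr / mse                   # compare to F with (1, n - 2) df
```

For simple regression, F_obs equals the square of the two-sided t statistic for β1, so the two tests agree.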

Example 11.2 (Continued) – Hosiery Mill Cost Function

For the hosiery mill data, the sums of squares for each source of variation in monthly production costs and their corresponding degrees of freedom are (from previous calculations):

Total SS — SSYY = Σ_{i=1}^n (Yi − Ȳ)² = 32914.06    df_Total = n − 1 = 47

Error SS — SSE = Σ_{i=1}^n (Yi − Ŷi)² = 1788.51    df_E = n − 2 = 46

Model SS — SSR = Σ_{i=1}^n (Ŷi − Ȳ)² = SSYY − SSE = 32914.06 − 1788.51 = 31125.55    df_R = 1

The Analysis of Variance is given in Table 34. To test whether there is a linear association between mean monthly costs and monthly production output, we conduct the F-test (α = 0.05).


ANOVA
Source of Variation   Sum of Squares     Degrees of Freedom   Mean Square                      F
MODEL                 SSR = 31125.55     1                    MSR = 31125.55/1 = 31125.55      F = 31125.55/38.88 = 800.55
ERROR                 SSE = 1788.51      48 − 2 = 46          MSE = 1788.51/46 = 38.88
TOTAL                 SSYY = 32914.06    48 − 1 = 47

Table 34: The Analysis of Variance Table for the hosiery mill cost example

1. H0 : β1 = 0  HA : β1 ≠ 0

2. T.S.: F_obs = MSR/MSE = 31125.55/38.88 = 800.55

3. R.R.: Fobs > F1,n−2,α = F1,46,.05 ≈ 4.06

4. p-value: P(F > F_obs) = P(F > 800.55) ≪ P(F > 8.83) = .005 (p-value ≪ .005). See p. A-24 (with 40 denominator df).

We reject the null hypothesis, and conclude that β1 ≠ 0.

11.3.1 Coefficient of Determination

A measure of association that has a clear physical interpretation is r², the coefficient of determination. This measure is always between 0 and 1, so it does not reflect whether Y and X are positively or negatively associated, and it represents the proportion of the total variation in the response variable that is ‘accounted for’ by fitting the regression on X. The formula for r² is:

r² = (r)² = 1 − SSE/SSYY = SSR/SSYY.

Note that SSYY = Σ_{i=1}^n (Yi − Ȳ)² represents the total variation in the response variable, while SSE = Σ_{i=1}^n (Yi − Ŷi)² represents the variation in the observed responses about the fitted equation (after taking into account X). This is why we sometimes say that r² is the “proportion of the variation in Y that is ‘explained’ by X.”

Example 11.1 (Continued) – Coffee Sales and Shelf Space

For the coffee data, we can calculate r² using the values of SSXY, SSXX, SSYY, and SSE we have previously obtained.

r² = 1 − 26660.4/112772.9 = 86112.5/112772.9 = .7636

Thus, over 3/4 of the variation in sales is “explained” by the model using shelf space to predict sales.

Example 11.2 (Continued) – Hosiery Mill Cost Function


For the hosiery mill data, the model (regression) sum of squares is SSR = 31125.55 and the total sum of squares is SSYY = 32914.06. To get the coefficient of determination:

r² = 31125.55/32914.06 = 0.9457

Almost 95% of the variation in monthly production costs is “explained” by the monthly production output.
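Both coefficients of determination can be checked in a line or two from the sums of squares already computed:

```python
# Coefficient of determination for both running examples,
# using r^2 = 1 - SSE/SSYY (equivalently SSR/SSYY)
r2_coffee = 1 - 26660.4 / 112772.9   # coffee sales vs shelf space
r2_mill = 31125.55 / 32914.06        # hosiery mill cost vs output
```

The two forms of the formula give the same number because SSR = SSYY − SSE.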


12 Lecture 12 — Simple Regression III – Estimating the Mean and Prediction at a Particular Level of X, Correlation

Textbook Sections: 12.7, 12.8, 12.9
Problems: 12.42, 43; compute r for problems in Section 12.2

We sometimes are interested in estimating the mean response at a particular level of the predictor variable, say X = X0. That is, we’d like to estimate E(Y|X0) = β0 + β1X0. The actual estimate is just Ŷ0 = b0 + b1X0, which is simply where the fitted line crosses X = X0. Under the previously stated normality assumptions, the estimator Ŷ0 is normally distributed with mean β0 + β1X0 and standard error of estimate σ √(1/n + (X0 − X̄)²/Σ_{i=1}^n (Xi − X̄)²). That is:

Ŷ0 ∼ N(β0 + β1X0, σ √(1/n + (X0 − X̄)²/Σ_{i=1}^n (Xi − X̄)²)).

Note that the standard error of the estimate is smallest at X0 = X̄, that is, at the mean of the sampled levels of the predictor variable. The standard error increases as the value X0 moves away from this mean.

For instance, our marketer may wish to estimate the mean sales when she has 6′ of shelf space, or 7′, or 4′. She may also wish to obtain a confidence interval for the mean at these levels of X.

12.1 A Confidence Interval for E(Y |X0) = β0 + β1X0

Using the ideas described in the previous section, we can write out the general form for a (1 − α)100% confidence interval for the mean response when X = X0.

(b0 + b1X0) ± t_{α/2,n−2} Se √(1/n + (X0 − X̄)²/SSXX)

Example 12.1 – Coffee Sales and Shelf Space

Suppose our marketer wants to compute 90% confidence intervals for the mean weekly sales at X = 4, 6, and 7 feet, respectively (these are not simultaneous confidence intervals, as were computed based on Tukey’s Method previously). Each of these intervals will depend on t_{α/2,n−2} = t_{.05,10} = 1.812 and X̄ = 6. These intervals are:

(307.9167 + 34.5833(4)) ± 1.812(51.63) √(1/12 + (4 − 6)²/72) = 446.250 ± 93.554 √.1389 = 446.250 ± 34.866 ≡ (411.384, 481.116)

(307.9167 + 34.5833(6)) ± 1.812(51.63) √(1/12 + (6 − 6)²/72) = 515.417 ± 93.554 √.0833 = 515.417 ± 27.001 ≡ (488.416, 542.418)

(307.9167 + 34.5833(7)) ± 1.812(51.63) √(1/12 + (7 − 6)²/72) = 550.000 ± 93.554 √.0972


= 550.000 ± 29.171 ≡ (520.829, 579.171)

Notice that the interval is the narrowest at X0 = 6. Figure 16 is a computer-generated plot of the data, the fitted equation, and the confidence limits for the mean weekly coffee sales at each value of X. Note how the limits get wider as X moves away from X̄ = 6. Would these intervals be wider or narrower had they been 95% confidence intervals?


Figure 16: Plot of coffee data, fitted equation, and 90% confidence limits for the mean
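These hand computations are easy to script. Below is a minimal sketch in plain Python (the function name and argument order are our own, not from the notes) that reproduces the three intervals above:

```python
from math import sqrt

def mean_response_ci(x0, b0, b1, t_crit, se, n, xbar, ssxx):
    """(1 - alpha)100% confidence interval for E(Y|X0) = b0 + b1*X0."""
    fit = b0 + b1 * x0
    half = t_crit * se * sqrt(1.0 / n + (x0 - xbar) ** 2 / ssxx)
    return fit - half, fit + half

# Coffee example: b0 = 307.967, b1 = 34.5833, t(.05,10) = 1.812,
# Se = 51.63, n = 12, Xbar = 6, SSxx = 72
for x0 in (4, 6, 7):
    lo, hi = mean_response_ci(x0, 307.967, 34.5833, 1.812, 51.63, 12, 6, 72)
    print(f"X0 = {x0}: ({lo:.3f}, {hi:.3f})")
```

The printed intervals agree with the hand computations to rounding, and the X0 = 6 interval is visibly the narrowest.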

Example 12.2 – Hosiery Mill Cost Function

Suppose the plant manager is interested in mean costs among months where output is 30,000 items produced (X0 = 30). She wants a 95% confidence interval for this true unknown mean. Recall:

b0 = 3.1274   b1 = 2.0055   Se = 6.24   n = 48   X̄ = 31.0673   SSXX = 7738.94   t.025,46 ≈ 2.015

Then the interval is obtained as:

3.1274 + 2.0055(30) ± 2.015(6.24)√(1/48 + (30 − 31.0673)²/7738.94) ≡ 63.29 ± 2.015(6.24)√0.0210 ≡ 63.29 ± 1.82 ≡ (61.47, 65.11)

We can be 95% confident that the mean production cost among months where 30,000 items are produced is between $61,470 and $65,110 (recall that units were thousands of items for X and thousands of dollars for Y). A plot of the data, regression line, and 95% confidence bands for mean costs is given in Figure 17.

12.2 Predicting a Future Response at a Given Level of X

In many situations, a researcher would like to predict the outcome of the response variable at a specific level of the predictor variable. In the previous section we estimated the mean response; in this section we are interested in predicting a single outcome. In the context of the coffee sales example, this would be like trying to predict next week's sales given we know that we will have 6′ of shelf space.



Figure 17: Plot of hosiery mill cost data, fitted equation, and 95% confidence limits for the mean


First, suppose you know the parameters β0 and β1. Then you know that the response variable, for a fixed level of the predictor variable (X = X0), is normally distributed with mean E(Y |X0) = β0 + β1X0 and standard deviation σ. We know from previous work with the normal distribution that approximately 95% of the measurements lie within 2 standard deviations of the mean. So if we knew β0, β1, and σ, we would be very confident that our response would lie between (β0 + β1X0) − 2σ and (β0 + β1X0) + 2σ. Figure 18 represents this idea.


Figure 18: Distribution of response variable with known β0, β1, and σ

We rarely, if ever, know these parameters, and we must estimate them as we have in previous sections. There is uncertainty in what the mean response is at the specified level, X0, of the predictor variable. We do, however, know how to obtain an interval that we are very confident contains the true mean β0 + β1X0. If we apply the method of the previous paragraph to all 'believable' values of this mean, we can obtain a prediction interval that we are very confident will contain our future response. Since σ is being estimated as well, instead of 2 standard deviations we must use tα/2,n−2 estimated standard deviations. Figure 19 portrays this idea.


Figure 19: Distribution of response variable with estimated β0, β1, and σ

Note that all we really need are the two extreme distributions from the confidence interval for the mean response. If we use the method from the last paragraph on each of these two distributions, we can obtain the prediction interval by choosing the left-hand point of the 'lower' distribution and the right-hand point of the 'upper' distribution. This is displayed in Figure 20.


Figure 20: Upper and lower prediction limits when we have estimated the mean

The general formula for a (1 − α)100% prediction interval for a future response is similar to the confidence interval for the mean at X0, except that it is wider to reflect the variation in individual responses. The formula is:

(b0 + b1X0) ± tα/2,n−2 Se √(1 + 1/n + (X0 − X̄)²/SSXX).

Example 12.1 (Continued) – Coffee Sales and Shelf Space

For the coffee example, suppose the marketer wishes to predict next week's sales when the coffee will have 5′ of shelf space. She would like to obtain a 95% prediction interval for the number of bags to be sold. First, we observe that t.025,10 = 2.228; all other relevant numbers can be found in the previous example. The prediction interval is then:

(307.967 + 34.5833(5)) ± 2.228(51.63)√(1 + 1/12 + (5 − 6)²/72) = 480.883 ± 115.032√1.0972 = 480.883 ± 120.492 ≡ (360.391, 601.375).

This interval is relatively wide, reflecting the large variation in weekly sales at each level of X. Note that, just as the width of the confidence interval for the mean response depends on the distance between X0 and X̄, so does the width of the prediction interval. This should come as no surprise, considering the way we set up the prediction interval (see Figure 19 and Figure 20). Figure 21 shows the fitted equation and 95% prediction limits for this example.

It must be noted that a prediction interval for a future response is only valid if conditions when the response occurs are similar to those when the data were collected. For instance, if the store is being boycotted by a group of animal rights activists for selling meat next week, our prediction interval will not be valid.



Figure 21: Plot of coffee data, fitted equation, and 95% prediction limits for a single response

Example 12.2 (Continued) – Hosiery Mill Cost Function

Suppose the plant manager knows, based on purchase orders, that this month her plant will produce 30,000 items (X0 = 30.0). She would like to predict what the plant's production costs will be. She obtains a 95% prediction interval for this month's costs.

3.1274 + 2.0055(30) ± 2.015(6.24)√(1 + 1/48 + (30 − 31.0673)²/7738.94) ≡ 63.29 ± 2.015(6.24)√1.0210 ≡ 63.29 ± 12.70 ≡ (50.59, 75.99)

She predicts that the costs for this month will be between $50,590 and $75,990. This interval is much wider than the interval for the mean, since it includes random variation in monthly costs around the mean. A plot of the 95% prediction bands is given in Figure 22.
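The same arithmetic can be scripted. This is a sketch (the function name is ours) of the prediction-interval formula, checked against the hosiery mill numbers:

```python
from math import sqrt

def prediction_interval(x0, b0, b1, t_crit, se, n, xbar, ssxx):
    """(1 - alpha)100% prediction interval for a single new response at X0."""
    fit = b0 + b1 * x0
    half = t_crit * se * sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / ssxx)
    return fit - half, fit + half

# Hosiery mill example: predict this month's cost at X0 = 30 (thousand items)
lo, hi = prediction_interval(30, 3.1274, 2.0055, 2.015, 6.24,
                             48, 31.0673, 7738.94)
print(f"({lo:.2f}, {hi:.2f})")   # matches (50.59, 75.99) to rounding
```

The only change from the confidence interval for the mean is the extra "1 +" under the square root, which accounts for the variation of a single response around its mean.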

12.3 Coefficient of Correlation

In many situations, we would like to obtain a measure of the strength of the linear association between the variables Y and X. One measure of this association that is reported in research journals from many fields is the Pearson product moment coefficient of correlation. This measure, denoted by r, is a number that can range from −1 to +1. A value of r close to 0 implies that there is very little association between the two variables (Y tends to neither increase nor decrease as X increases). A positive value of r means there is a positive association between Y and X (Y tends to increase as X increases). Similarly, a negative value means there is a negative association (Y tends to decrease as X increases). If r is either +1 or −1, it means the data fall on a straight line (SSE = 0) that has either a positive or negative slope, depending on the sign of r. The formula for calculating r is:

r = SSXY / √(SSXX · SSYY).

Note that the sign of r is always the same as the sign of b1.

Example 12.1 (Continued) – Coffee Sales and Shelf Space



Figure 22: Plot of hosiery mill cost data, fitted equation, and 95% prediction limits for an individualoutcome

Page 83: 1 Lecture 1 – Preliminariesusers.stat.ufl.edu/~winner/qmb3250/notes.pdf5-10 1 1.49 1 1.49 10-15 21 31.34 22 32.84 15-20 28 41.79 50 74.63 20-25 10 14.93 60 89.55 25-30 3 4.48 63

For the coffee data, we can calculate r using the values of SSXY, SSXX, and SSYY we have previously obtained:

r = 2490/√((72)(112772.9)) = 2490/2849.5 = .8738
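As a check, r can also be computed directly from the raw data. The shelf-space levels (3, 6, and 9 feet, four stores each) and weekly sales can be read off the computer output in Section 12.8; the sums of squares here are the plain definitional ones:

```python
from math import sqrt

# Raw coffee data: shelf space (feet) and weekly sales (bags),
# taken from the computer output in Section 12.8
space = [3, 3, 3, 3, 6, 6, 6, 6, 9, 9, 9, 9]
sales = [421, 412, 443, 346, 526, 581, 434, 570, 630, 560, 590, 672]

n = len(space)
xbar = sum(space) / n
ybar = sum(sales) / n
ss_xy = sum((x - xbar) * (y - ybar) for x, y in zip(space, sales))
ss_xx = sum((x - xbar) ** 2 for x in space)
ss_yy = sum((y - ybar) ** 2 for y in sales)

r = ss_xy / sqrt(ss_xx * ss_yy)
print(round(r, 4))   # approximately .8738, matching the hand computation
```

Note that r² = .8738² ≈ .7636, the R-square value reported on the computer output.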

Example 12.2 (Continued) – Hosiery Mill Cost Function

For the hosiery mill cost function data, we have:

r = 15520.27/√((7738.94)(32914.06)) = 15520.27/15959.95 = .9725


Computer Output for Coffee Sales Example (Sec 12.8)

Dependent Variable: SALES

                      Analysis of Variance
                     Sum of          Mean
Source      DF      Squares         Square     F Value   Prob>F
Model        1   86112.50000    86112.50000    32.297    0.0002
Error       10   26662.41667     2666.24167
C Total     11  112774.91667

    Root MSE    51.63566    R-square    0.7636
    Dep Mean   515.41667    Adj R-sq    0.7399

                      Parameter Estimates
               Parameter      Standard     T for H0:
Variable  DF    Estimate       Error      Parameter=0   Prob > |T|
INTERCEP   1  307.916667   39.43738884      7.808         0.0001
SPACE      1   34.583333    6.08532121      5.683         0.0002

      Dep Var  Predict  Std Err  Lower95%  Upper95%  Lower95%  Upper95%
Obs    SALES    Value   Predict      Mean      Mean   Predict   Predict  Residual
  1    421.0    411.7    23.568     359.2     464.2     285.2     538.1    9.3333
  2    412.0    411.7    23.568     359.2     464.2     285.2     538.1    0.3333
  3    443.0    411.7    23.568     359.2     464.2     285.2     538.1   31.3333
  4    346.0    411.7    23.568     359.2     464.2     285.2     538.1  -65.6667
  5    526.0    515.4    14.906     482.2     548.6     395.7     635.2   10.5833
  6    581.0    515.4    14.906     482.2     548.6     395.7     635.2   65.5833
  7    434.0    515.4    14.906     482.2     548.6     395.7     635.2  -81.4167
  8    570.0    515.4    14.906     482.2     548.6     395.7     635.2   54.5833
  9    630.0    619.2    23.568     566.7     671.7     492.7     745.6   10.8333
 10    560.0    619.2    23.568     566.7     671.7     492.7     745.6  -59.1667
 11    590.0    619.2    23.568     566.7     671.7     492.7     745.6  -29.1667
 12    672.0    619.2    23.568     566.7     671.7     492.7     745.6   52.8333


13 Lecture 13 — Multiple Regression I

Textbook Sections: 13.1, 13.2
Problems: 13.3, 5, 7, 8, 9, 13

In most situations, we have more than one independent variable. While the amount of math can become overwhelming and involves matrix algebra, many computer packages exist that will provide the analysis for you. In this chapter, we will analyze the data by interpreting the results of a computer program. It should be noted that simple regression is a special case of multiple regression, so most concepts we have already seen apply here.

13.1 The Multiple Regression Model and Least Squares Estimates

In general, if we have k predictor variables, we can write our response variable as:

Y = β0 + β1X1 + · · · + βkXk + ε.

Again, Y is broken into a systematic and a random component:

Y = β0 + β1X1 + · · · + βkXk (systematic) + ε (random)

We make the same assumptions as before in terms of the ε, specifically that they are independent and normally distributed with mean 0 and standard deviation σ. That is, we are assuming that Y, at a given set of levels of the k independent variables (X1, . . . , Xk), is normal with mean E[Y |X1, . . . , Xk] = β0 + β1X1 + · · · + βkXk and standard deviation σ. Just as before, β0, β1, . . . , βk, and σ are unknown parameters that must be estimated from the sample data. The parameter βi represents the change in the mean response when the ith predictor variable changes by 1 unit and all other predictor variables are held constant.

In this model:

• Y — Random outcome of the dependent variable

• β0 — Regression constant (E(Y |X1 = · · · = Xk = 0) if appropriate)

• βi — Partial regression coefficient for variable Xi (change in E(Y) when Xi increases by 1 unit and all other X's are held constant)

• ε — Random error term, assumed (as before) that ε ∼ N(0, σ)

• k — The number of independent variables

By the method of least squares (choosing the bi values that minimize SSE = ∑(Yi − Ŷi)²), we obtain the fitted equation:

Ŷ = b0 + b1X1 + b2X2 + · · · + bkXk

and our estimate of σ:

Se = √( ∑(Y − Ŷ)² / (n − k − 1) ) = √( SSE / (n − k − 1) )

The Analysis of Variance table will be very similar to what we used previously, with the only adjustments being in the degrees of freedom. Table 35 shows the values for the general case when there are k predictor variables. We will rely on computer outputs to obtain the Analysis of Variance and the estimates b0, b1, . . . , bk.


                               ANOVA
Source of   Sum of               Degrees of   Mean
Variation   Squares              Freedom      Square              F
MODEL       SSR = ∑(Ŷi − Ȳ)²    k            MSR = SSR/k         F = MSR/MSE
ERROR       SSE = ∑(Yi − Ŷi)²   n − k − 1    MSE = SSE/(n−k−1)
TOTAL       SSYY = ∑(Yi − Ȳ)²   n − 1

Table 35: The Analysis of Variance Table for multiple regression
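To make the least squares step concrete, here is a pure-Python sketch that solves the normal equations (X′X)b = X′y by Gaussian elimination. The data below are made up so the true coefficients are known exactly; real software uses more numerically stable methods than this:

```python
def fit_least_squares(xrows, y):
    """Solve the normal equations (X'X)b = X'y by Gaussian elimination.
    xrows: list of predictor tuples (x1, ..., xk); an intercept is added."""
    X = [[1.0, *row] for row in xrows]
    p = len(X[0])
    # Build X'X and X'y
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef

# Hypothetical data generated from Y = 2 + 3*X1 + 0.5*X2 exactly
xrows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (2, 3)]
y = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in xrows]
print([round(c, 6) for c in fit_least_squares(xrows, y)])  # [2.0, 3.0, 0.5]
```

Because the toy data lie exactly on a plane, the fitted coefficients recover b0 = 2, b1 = 3, b2 = 0.5.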

13.2 Testing for Association Between the Response and the Full Set of Predictor Variables

To see if the set of predictor variables is useful in predicting the response variable, we will test H0 : β1 = β2 = · · · = βk = 0. Note that if H0 is true, then the mean response does not depend on the levels of the predictor variables. We interpret this to mean that there is no association between the response variable and the set of predictor variables. To test this hypothesis, we use the following method:

1. H0 : β1 = β2 = · · · = βk = 0

2. HA : Not every βi = 0

3. T.S.: Fobs = MSR/MSE

4. R.R.: Fobs > Fα,k,n−k−1

5. p-value: P(F > Fobs) (You can only get bounds on this from tables, but computer outputs report it exactly)

The computer automatically performs this test and provides you with the p-value of the test, so in practice you really don't need to obtain the rejection region explicitly to make the appropriate conclusion. However, we will do so in this course to help reinforce the relationship between the test's decision rule and the p-value. Recall that we reject the null hypothesis if the p-value is less than α.

13.3 Testing Whether Individual Predictor Variables Help Predict the Response

If we reject the previous null hypothesis and conclude that not all of the βi are zero, we may wish to test whether individual βi are zero. Note that if we fail to reject the null hypothesis that βi is zero, we can drop the predictor Xi from our model, thus simplifying the model. Note that this test is testing whether Xi is useful given that we are already fitting a model containing the remaining k − 1 predictor variables. That is, does this variable contribute anything once we've taken into account the other predictor variables? These tests are t-tests, where we compute t = bi/Sbi, just as we did in the section on making inferences concerning β1 in simple regression. The procedure for testing whether βi = 0 (the ith predictor variable does not contribute to predicting the response given the other k − 1 predictor variables are in the model) is as follows:

• H0 : βi = 0 (Y is not associated with Xi after controlling for all other independent variables)


• (1) HA : βi ≠ 0

(2) HA : βi > 0

(3) HA : βi < 0

• T.S.: tobs = bi/Sbi

• R.R.: (1) |tobs| > tα/2,n−k−1

(2) tobs > tα,n−k−1

(3) tobs < −tα,n−k−1

• (1) p–value: 2P(T > |tobs|)

(2) p–value: P(T > tobs)

(3) p–value: P (T < tobs)

Computer packages print the test statistic and the p-value based on the two-sided test, so conducting this test is simply a matter of interpreting the results of the computer output.
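The mechanics reduce to a single division and a comparison. A sketch, using the latitude coefficient and standard error that appear in the Texas weather example (Example 13.1), restated here purely for illustration:

```python
# Latitude coefficient and its standard error (from Example 13.1)
b_i, se_bi = -1.99323, 0.13639
t_obs = b_i / se_bi            # observed t statistic
t_crit = 2.179                 # t(.025, n-k-1) = t(.025, 12), from a t-table

# Two-sided test at alpha = .05: reject H0 if |t_obs| > t_crit
print(round(t_obs, 3), abs(t_obs) > t_crit)   # -14.614 True
```

A full analysis would also report the two-sided p-value, which requires a t-distribution table or a statistics library rather than base Python.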

13.4 Testing for an Association Between a Subset of Predictor Variables and the Response

We have seen the two extreme cases of testing whether all regression coefficients are simultaneously 0 (the F-test), and the case of testing whether a single regression coefficient is 0, controlling for all other predictors (the t-test). We can also test whether a subset of the k regression coefficients is 0, controlling for all other predictors. Note that the two extreme cases can be tested using this very general procedure.

To make the notation as simple as possible, suppose our model consists of k predictor variables, of which we'd like to test whether q (q ≤ k) are simultaneously not associated with Y, after controlling for the remaining k − q predictor variables. Further assume that the k − q remaining predictors are labelled X1, X2, . . . , Xk−q and that the q predictors of interest are labelled Xk−q+1, Xk−q+2, . . . , Xk.

This test is of the form:

H0 : βk−q+1 = βk−q+2 = · · · = βk = 0   HA : βk−q+1 ≠ 0 and/or βk−q+2 ≠ 0 and/or . . . and/or βk ≠ 0

The procedure for obtaining the numeric elements of the test is as follows:

1. Fit the model under the null hypothesis (βk−q+1 = βk−q+2 = · · · = βk = 0). It will include only the first k − q predictor variables. This is referred to as the Reduced model. Obtain the error sum of squares (SSE(R)) and the error degrees of freedom dfE(R) = n − (k − q) − 1.

2. Fit the model with all k predictors. This is referred to as the Complete or Full model (and was used for the F-test for all regression coefficients). Obtain the error sum of squares (SSE(F)) and the error degrees of freedom (dfE(F) = n − k − 1).

By definition of the least squares criterion, we know that SSE(R) ≥ SSE(F). We now obtain the test statistic:

T.S.: Fobs = [(SSE(R) − SSE(F)) / ((n − (k − q) − 1) − (n − k − 1))] / [SSE(F) / (n − k − 1)] = [(SSE(R) − SSE(F)) / q] / MSE(F)


and our rejection region is values of Fobs ≥ Fα,q,n−k−1.

Example 13.1 – Texas Weather Data

In this example, we will use regression in the context of predicting an outcome. A construction company is making a bid on a project in a remote area of Texas. A certain component of the project will take place in December, and is very sensitive to the daily high temperatures. They would like to estimate what the average high temperature will be at the location in December. They believe that temperature at a location will depend on its latitude (a measure of distance from the equator) and its elevation. That is, they believe that the response variable (mean daily high temperature in December at a particular location) can be written as:

Y = β0 + β1X1 + β2X2 + β3X3 + ε,

where X1 is the latitude of the location, X2 is the longitude, and X3 is its elevation (in feet). As before, we assume that ε ∼ N(0, σ). Note that higher latitudes mean farther north and higher longitudes mean farther west.

To estimate the parameters β0, β1, β2, β3, and σ, they gather data for a sample of n = 16 counties and fit the model described above. The data, including one other variable, are given in Table 36.

COUNTY       LATITUDE  LONGITUDE   ELEV  TEMP  INCOME
HARRIS         29.767     95.367     41    56   24322
DALLAS         32.850     96.850    440    48   21870
KENNEDY        26.933     97.800     25    60   11384
MIDLAND        31.950    102.183   2851    46   24322
DEAF SMITH     34.800    102.467   3840    38   16375
KNOX           33.450     99.633   1461    46   14595
MAVERICK       28.700    100.483    815    53   10623
NOLAN          32.450    100.533   2380    46   16486
ELPASO         31.800    106.40    3918    44   15366
COLLINGTON     34.850    100.217   2040    41   13765
PECOS          30.867    102.900   3000    47   17717
SHERMAN        36.350    102.083   3693    36   19036
TRAVIS         30.300     97.700    597    52   20514
ZAPATA         26.900     99.283    315    60   11523
LASALLE        28.450     99.217    459    56   10563
CAMERON        25.900     97.433     19    62   12931

Table 36: Data corresponding to 16 counties in Texas

The results of the Analysis of Variance are given in Table 37, and the parameter estimates, estimated standard errors, t-statistics, and p-values are given in Table 38. Full computer programs and printouts are given as well.

We see from the Analysis of Variance that at least one of the variables (latitude, longitude, and elevation) is related to the response variable temperature. This can be seen by setting up the test H0 : β1 = β2 = β3 = 0 as described previously. The elements of this test, provided by the computer output, are detailed below, assuming α = .05.

1. H0 : β1 = β2 = β3 = 0


                                  ANOVA
Source of   Sum of           Degrees of         Mean
Variation   Squares          Freedom            Square             F                  p-value
MODEL       SSR = 934.328    k = 3              MSR = 934.328/3    311.443/0.634      .0001
                                                = 311.443          = 491.235
ERROR       SSE = 7.609      n − k − 1 =        MSE = 7.609/12
                             16 − 3 − 1 = 12    = 0.634
TOTAL       SSYY = 941.938   n − 1 = 15

Table 37: The Analysis of Variance Table for Texas data

                                   t FOR H0:             STANDARD ERROR
PARAMETER         ESTIMATE         βi=0        P-VALUE   OF ESTIMATE
INTERCEPT (β0)    b0 = 109.25887    36.68       .0001     2.97857
LATITUDE (β1)     b1 = −1.99323    −14.61       .0001     0.13639
LONGITUDE (β2)    b2 = −0.38471     −1.68       .1182     0.22858
ELEVATION (β3)    b3 = −0.00096     −1.68       .1181     0.00057

Table 38: Parameter estimates and tests of hypotheses for individual parameters

2. HA : Not all βi = 0

3. T.S.: Fobs = MSR/MSE = 311.443/0.634 = 491.235

4. R.R.: Fobs > F.05,3,12 = 3.49 (This is not provided on the output; the p-value takes the place of it).

5. p-value: P(F > 491.235) = .0001 (Actually it is less than .0001, but this is the smallest p-value the computer will print).

We conclude that at least one of these three variables is related to the response variable temperature.

We also see from the individual t-tests that latitude is useful in predicting temperature, even after taking into account the other predictor variables.

The formal test (based on the α = 0.05 significance level) for determining whether temperature is associated with latitude after controlling for longitude and elevation is given here:

• H0 : β1 = 0 (TEMP (Y) is not associated with LAT (X1) after controlling for LONG (X2) and ELEV (X3))

• HA : β1 ≠ 0 (TEMP is associated with LAT after controlling for LONG and ELEV)

• T.S.: tobs = b1/Sb1 = −1.99323/0.13639 = −14.614

• R.R.: |tobs| > tα/2,n−k−1 = t.025,12 = 2.179

• p–value: 2P (T > |tobs|) = 2P (T > 14.614) = .0001

Thus, we can conclude that there is an association between temperature and latitude, controlling for longitude and elevation. Note that the coefficient is negative, so we conclude that temperature decreases as latitude increases (at a given level of longitude and elevation).


Note from Table 38 that neither the coefficient for LONGITUDE (X2) nor that for ELEVATION (X3) is significant at the α = 0.05 significance level (p-values of .1182 and .1181, respectively). Recall that these tests consider whether each term is 0, controlling for LATITUDE and the other term.

Before concluding that neither LONGITUDE (X2) nor ELEVATION (X3) is a useful predictor, controlling for LATITUDE, we will test whether they are both simultaneously 0, that is:

H0 : β2 = β3 = 0 vs HA : β2 ≠ 0 and/or β3 ≠ 0

First, note that we have:

n = 16   k = 3   q = 2   SSE(F) = 7.609   dfE(F) = 16 − 3 − 1 = 12   MSE(F) = 0.634

dfE(R) = 16 − (3 − 2) − 1 = 14   F.05,2,12 = 3.89

Next, we fit the model with only LATITUDE (X1) and obtain the error sum of squares SSE(R) = 60.935, giving the following test statistic:

T.S.: Fobs = [(SSE(R) − SSE(F))/q] / MSE(F) = [(60.935 − 7.609)/2] / 0.634 = 26.663/0.634 = 42.055

Since 42.055 >> 3.89, we reject H0 and conclude that LONGITUDE (X2) and/or ELEVATION (X3) are associated with TEMPERATURE (Y), after controlling for LATITUDE (X1).
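The reduced-versus-full computation can be wrapped in a small helper (a sketch; the function name is ours):

```python
def partial_f(sse_reduced, sse_full, n, k, q):
    """F statistic for H0: the q extra coefficients in the full model are all 0."""
    mse_full = sse_full / (n - k - 1)
    return ((sse_reduced - sse_full) / q) / mse_full

# Texas weather example: test LONGITUDE and ELEVATION (q = 2) given LATITUDE
f_obs = partial_f(sse_reduced=60.935, sse_full=7.609, n=16, k=3, q=2)
print(round(f_obs, 2))   # about 42.05; compare to F(.05, 2, 12) = 3.89
```

The small difference from the 42.055 obtained by hand comes from rounding MSE(F) to 0.634 in the hand computation.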

The reason we failed to reject H0 : β2 = 0 and H0 : β3 = 0 individually based on the t-tests is that ELEVATION and LONGITUDE are highly correlated (elevations rise as you go farther west in the state). So, once you control for LONGITUDE, we observe little ELEVATION effect, and vice versa. We will discuss why this is the case later. In theory, we have little reason to believe that temperatures naturally increase or decrease with LONGITUDE, but we may reasonably expect that as ELEVATION increases, TEMPERATURE decreases.

We re–fit the more parsimonious (simpler) model that uses ELEVATION (X1) and LATITUDE (X2) to predict TEMPERATURE (Y). Note the new symbols for ELEVATION and LATITUDE; that is to show you that they are merely symbols. The results are given in Table 39 and Table 40.

                                  ANOVA
Source of   Sum of           Degrees of         Mean
Variation   Squares          Freedom            Square             F                  p-value
MODEL       SSR = 932.532    k = 2              MSR = 932.532/2    466.266/0.724      .0001
                                                = 466.266          = 644.014
ERROR       SSE = 9.406      n − k − 1 =        MSE = 9.406/13
                             16 − 2 − 1 = 13    = 0.724
TOTAL       SSYY = 941.938   n − 1 = 15

Table 39: The Analysis of Variance Table for Texas data – without LONGITUDE

We see this by observing that the t-statistic for testing H0 : β2 = 0 (no latitude effect on temperature) is −17.65, corresponding to a p-value of .0001, and the t-statistic for testing H0 : β1 = 0 (no elevation effect) is −8.41, also corresponding to a p-value of .0001. Further note that both estimates are negative, reflecting that as elevation and latitude increase, temperature decreases. That should not come as any big surprise.


                                   t FOR H0:             STANDARD ERROR
PARAMETER         ESTIMATE         βi=0        P-VALUE   OF ESTIMATE
INTERCEPT (β0)    b0 = 63.45485     36.68       .0001     0.48750
ELEVATION (β1)    b1 = −0.00185     −8.41       .0001     0.00022
LATITUDE (β2)     b2 = −1.83216    −17.65       .0001     0.10380

Table 40: Parameter estimates and tests of hypotheses for individual parameters – without LONGITUDE

The magnitudes of the estimated coefficients are quite different, which may make you believe that one predictor variable is more important than the other. This is not necessarily true, because the ranges of their levels are quite different (a 1 unit change in latitude represents a change of approximately 69 miles, while a unit change in elevation is 1 foot), and recall that βi represents the change in the mean response when variable Xi is increased by 1 unit.

The data corresponding to the 16 locations in the sample are plotted in Figure 23, and the fitted equation for the model that does not include LONGITUDE is plotted in Figure 24. The fitted equation is a plane in three dimensions.


Figure 23: Plot of temperature data in 3 dimensions

Example 13.2 – Mortgage Financing Cost Variation (By City)

A study in the mid 1960's reported regional differences in mortgage costs for new homes. The sampling units were n = 18 metro areas (SMSA's) in the U.S. The dependent variable (Y) is the average yield (in percent) on a new home mortgage for the SMSA. The independent variables (Xi) are given below.

Source: Schaaf, A.H. (1966), “Regional Differences in Mortgage Financing Costs,” Journal ofFinance, 21:85-94.

X1 – Average Loan Value / Mortgage Value Ratio (Higher X1 means lower down payment and higher risk to lender).



Figure 24: Plot of the fitted equation for temperature data

X2 – Road Distance from Boston (Higher X2 means farther from the Northeast, where most capital was at the time, and higher costs of capital).

X3 – Savings per Annual Dwelling Unit Constructed (Higher X3 means higher relative credit surplus, and lower costs of capital).

X4 – Savings per Capita (does not adjust for new housing demand).

X5 – Percent Increase in Population 1950–1960

X6 – Percent of First Mortgage Debt Controlled by Inter-regional Banks.

The data, fitted values, and residuals are given in Table 41. The Analysis of Variance is given in Table 42. The regression coefficients, test statistics, and p-values are given in Table 43.

Show that the fitted value for Los Angeles is 6.19, based on the fitted equation, and that the residual is −0.02.

Based on the large F-statistic and its small corresponding P-value, we conclude that this set of predictor variables is associated with the mortgage rate. That is, at least one of these independent variables is associated with Y.

Based on the t-tests, while none are strictly significant at the α = 0.05 level, there is some evidence that X1 (Loan Value/Mortgage Value, P = .0515), X3 (Savings per Unit Constructed, P = .0593), and to a lesser extent X4 (Savings per Capita, P = .1002) are helpful in predicting mortgage rates. We can fit a reduced model with just these three predictors, and test whether we can simultaneously drop X2, X5, and X6 from the model. That is:

H0 : β2 = β5 = β6 = 0 vs HA : β2 ≠ 0 and/or β5 ≠ 0 and/or β6 ≠ 0

First, we have the following values:

n = 18   k = 6   q = 3

SSE(F) = 0.10980   dfE(F) = 18 − 6 − 1 = 11   MSE(F) = 0.00998

dfE(R) = 18 − (6 − 3) − 1 = 14   F.05,3,11 = 3.59

Page 93: 1 Lecture 1 – Preliminariesusers.stat.ufl.edu/~winner/qmb3250/notes.pdf5-10 1 1.49 1 1.49 10-15 21 31.34 22 32.84 15-20 28 41.79 50 74.63 20-25 10 14.93 60 89.55 25-30 3 4.48 63

SMSA                      Y     X1     X2     X3      X4     X5    X6     Ŷ    e = Y − Ŷ
Los Angeles-Long Beach   6.17  78.1   3042   91.3  1738.1  45.5  33.1   6.19   -0.02
Denver                   6.06  77.0   1997   84.1  1110.4  51.8  21.9   6.04    0.02
San Francisco-Oakland    6.04  75.7   3162  129.3  1738.1  24.0  46.0   6.05   -0.01
Dallas-Fort Worth        6.04  77.4   1821   41.2   778.4  45.7  51.3   6.05   -0.01
Miami                    6.02  77.4   1542  119.1  1136.7  88.9  18.7   6.04   -0.02
Atlanta                  6.02  73.6   1074   32.3   582.9  39.9  26.6   5.92    0.10
Houston                  5.99  76.3   1856   45.2   778.4  54.1  35.7   6.02   -0.03
Seattle                  5.91  72.5   3024  109.7  1186.0  31.1  17.0   5.91    0.00
New York                 5.89  77.3    216  364.3  2582.4  11.9   7.3   5.82    0.07
Memphis                  5.87  77.4   1350  111.0   613.6  27.4  11.3   5.86    0.01
New Orleans              5.85  72.4   1544   81.0   636.1  27.3   8.1   5.81    0.04
Cleveland                5.75  67.0    631  202.7  1346.0  24.6  10.0   5.64    0.11
Chicago                  5.73  68.9    972  290.1  1626.8  20.1   9.4   5.60    0.13
Detroit                  5.66  70.7    699  223.4  1049.6  24.7  31.7   5.63    0.03
Minneapolis-St Paul      5.66  69.8   1377  138.4  1289.3  28.8  19.7   5.81   -0.15
Baltimore                5.63  72.9    399  125.4   836.3  22.9   8.6   5.77   -0.14
Philadelphia             5.57  68.7    304  259.5  1315.3  18.3  18.7   5.57    0.00
Boston                   5.28  67.8      0  428.2  2081.0   7.5   2.0   5.41   -0.13

Table 41: Data and fitted values for mortgage rate multiple regression example.

                                  ANOVA
Source of   Sum of            Degrees of         Mean
Variation   Squares           Freedom            Square              F                    p-value
MODEL       SSR = 0.73877     k = 6              MSR = 0.73877/6     0.12313/0.00998      .0003
                                                 = 0.12313           = 12.33
ERROR       SSE = 0.10980     n − k − 1 =        MSE = 0.10980/11
                              18 − 6 − 1 = 11    = 0.00998
TOTAL       SSYY = 0.84858    n − 1 = 17

Table 42: The Analysis of Variance Table for Mortgage rate regression analysis

                                  STANDARD
PARAMETER        ESTIMATE         ERROR        t-statistic   P-value
INTERCEPT (β0)   b0 = 4.28524     0.66825         6.41        .0001
X1 (β1)          b1 = 0.02033     0.00931         2.18        .0515
X2 (β2)          b2 = 0.000014    0.000047        0.29        .7775
X3 (β3)          b3 = −0.00158    0.000753       -2.10        .0593
X4 (β4)          b4 = 0.000202    0.000112        1.79        .1002
X5 (β5)          b5 = 0.00128     0.00177         0.73        .4826
X6 (β6)          b6 = 0.000236    0.00230         0.10        .9203

Table 43: Parameter estimates and tests of hypotheses for individual parameters – Mortgage rate regression analysis


ANOVA
Source of   Sum of            Degrees of                        Mean
Variation   Squares           Freedom                           Square                       F       p-value
MODEL       SSR = 0.73265     k − q = 3                         MSR = 0.73265/3 = 0.24422    29.49   .0001
ERROR       SSE = 0.11593     n − (k − q) − 1 = 18 − 3 − 1 = 14 MSE = 0.11593/14 = 0.00828
TOTAL       SSYY = 0.84858    n − 1 = 17

Table 44: The Analysis of Variance Table for Mortgage rate regression analysis (Reduced Model)

                                  STANDARD
PARAMETER        ESTIMATE         ERROR        t-statistic   P-value
INTERCEPT (β0)   b0 = 4.22260     0.58139       7.26         .0001
X1 (β1)          b1 = 0.02229     0.00792       2.81         .0138
X3 (β3)          b3 = −0.00186    0.00041778   −4.46         .0005
X4 (β4)          b4 = 0.000225    0.000074      3.03         .0091

Table 45: Parameter estimates and tests of hypotheses for individual parameters – Mortgage rate regression analysis (Reduced Model)

Next, we fit the reduced model, with β2 = β5 = β6 = 0. We get the Analysis of Variance in Table 44 and parameter estimates in Table 45.

Note first that all three regression coefficients are now significant at the α = 0.05 significance level. Also, our residual standard error, Se = √MSE, has decreased (from 0.09991 to 0.09100). This implies we have lost very little predictive ability by dropping X2, X5, and X6 from the model. Now we formally test whether these three predictor variables' regression coefficients are simultaneously 0 (with α = 0.05):

• H0 : β2 = β5 = β6 = 0

• HA : β2 ≠ 0 and/or β5 ≠ 0 and/or β6 ≠ 0

• TS : Fobs = [(0.11593 − 0.10980)/3] / 0.00998 = 0.00204/0.00998 = 0.205 (the numerator degrees of freedom equal the q = 3 dropped variables)

• RR : Fobs ≥ F0.05,3,11 = 3.59

We fail to reject H0, and conclude that none of X2, X5, or X6 is associated with mortgage rate, after controlling for X1, X3, and X4.
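The arithmetic of this partial F test fits in a few lines of Python. This is only a sketch: the function name `partial_f` is illustrative, and the SSE, MSE, and q values are the ones reported in Tables 42 and 44.

```python
def partial_f(sse_reduced, sse_full, q, mse_full):
    """Partial F statistic for testing that q regression coefficients
    are simultaneously zero: the drop in SSE from full to reduced model,
    divided by q, is compared against the full model's MSE."""
    return ((sse_reduced - sse_full) / q) / mse_full

# Mortgage example: dropping X2, X5, X6 (q = 3)
F_obs = partial_f(0.11593, 0.10980, 3, 0.00998)
print(round(F_obs, 3))  # well below F(0.05, 3, 11) = 3.59, so fail to reject H0
```

Since the observed statistic is far below the critical value 3.59, the code reaches the same "fail to reject" conclusion as the hand calculation.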

Example 13.3 – Store Location Characteristics and Sales

A study proposed using linear regression to describe sales at retail stores based on location characteristics. As a case study, the authors modelled sales at n = 16 liquor stores in Charlotte, N.C. Note that in North Carolina all stores are state run, and do not practice promotion as liquor stores in Florida do. The response was SALES volume (for the individual stores) in the fiscal year 7/1/1979–6/30/1980. The independent variables were: POP (number of people living within 1.5 miles of the store), MHI (mean household income among households within 1.5 miles of the store), DIS (distance to the nearest store), TFL (daily traffic volume on the street on which the store was located), and


EMP (the amount of employment within 1.5 miles of the store). The regression coefficients and standard errors are given in Table 46.

Source: Lord, J.D. and C.D. Lynds (1981), "The Use of Regression Models in Store Location Research: A Review and Case Study," Akron Business and Economic Review, Summer, 13-19.

Variable   Estimate   Std Error
POP         0.09460    0.01819
MHI         0.06129    0.02057
DIS         4.88524    1.72623
TFL        −2.59040    1.22768
EMP        −0.00245    0.00454

Table 46: Regression coefficients and standard errors for liquor store sales study

a) Do any of these variables fail to be associated with store sales after controlling for the others?

b) Consider the signs of the significant regression coefficients. What do they imply?

13.5 R² and Adjusted-R²

As was discussed in the previous chapter, the coefficient of multiple determination represents the proportion of the variation in the dependent variable (Y) that is "explained" by the regression on the collection of independent variables X1, . . . , Xk. We use R² (as opposed to r²) to differentiate the coefficient of multiple determination from the coefficient of simple determination. R² is computed exactly as before:

R² = SSR/SSYY = 1 − SSE/SSYY

One problem with R² is that when we continually add independent variables to a regression model, it continually increases (or at least, never decreases), even when the new variable(s) add little or no predictive power. Since we are trying to fit the simplest (most parsimonious) model that explains the relationship between the set of independent variables and the dependent variable, we need a measure that penalizes models that contain useless or redundant independent variables. This penalization takes into account that by including useless or redundant predictors, we are decreasing error degrees of freedom (dfE = n − k − 1). A second measure, which does not carry the proportion-of-variation-explained interpretation but is useful for comparing models of varying degrees of complexity, is Adjusted-R²:

Adjusted-R² = 1 − [SSE/(n − k − 1)] / [SSYY/(n − 1)] = 1 − [(n − 1)/(n − k − 1)] (SSE/SSYY)
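These two formulas translate directly into code. The sketch below uses illustrative function names and takes only the summary quantities SSE, SSYY, n, and k as inputs; the check uses the Texas weather full model from Example 13.1.

```python
def r_squared(sse, ssyy):
    """Proportion of variation in Y explained by the regression."""
    return 1 - sse / ssyy

def adjusted_r_squared(sse, ssyy, n, k):
    """R-squared penalized for the error degrees of freedom used by
    the k predictors: 1 - [(n-1)/(n-k-1)] * (SSE/SSYY)."""
    return 1 - (n - 1) / (n - k - 1) * (sse / ssyy)

# Texas weather full model (Example 13.1): n = 16, k = 3
print(round(r_squared(7.609, 941.938), 3))                   # 0.992
print(round(adjusted_r_squared(7.609, 941.938, 16, 3), 4))   # 0.9899
```

Note the adjusted value is slightly below plain R², as the penalty term intends; carrying full precision gives 0.9899 rather than the 0.9900 obtained by rounding SSE/SSYY first.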

Example 13.1 (Continued) – Texas Weather Data

Consider the two models we have fit:


Full Model — I.V.’s: LATITUDE, LONGITUDE, ELEVATION

Reduced Model — I.V.’s: LATITUDE, ELEVATION

For the Full Model, we have:

n = 16   k = 3   SSE = 7.609   SSYY = 941.938

and we obtain R²F and Adj-R²F:

R²F = 1 − 7.609/941.938 = 1 − .008 = 0.992     Adj-R²F = 1 − (15/12)(7.609/941.938) = 1 − 1.25(.008) = 0.9900

For the Reduced Model, we have:

n = 16   k = 2   SSE = 9.406   SSYY = 941.938

and we obtain R²R and Adj-R²R:

R²R = 1 − 9.406/941.938 = 1 − .010 = 0.990     Adj-R²R = 1 − (15/13)(9.406/941.938) = 1 − 1.15(.010) = 0.9885

Thus, by both measures the Full Model "wins", but it should be added that both appear to fit the data very well!

Example 13.2 (Continued) – Mortgage Financing Costs

For the mortgage data (with Total Sum of Squares SSYY = 0.84858 and n = 18), when we include all 6 independent variables in the full model, we obtain the following results:

SSR = 0.73877 SSE = 0.10980 k = 6

From this full model, we compute R² and Adj-R²:

R²F = SSRF/SSYY = 0.73877/0.84858 = 0.8706     Adj-R²F = 1 − [(n − 1)/(n − k − 1)](SSEF/SSYY) = 1 − (17/11)(0.10980/0.84858) = 0.8000

Example 13.3 (Continued) – Store Location Characteristics and Sales

In this study, the authors reported that R² = 0.69. Note that although we are not given the Analysis of Variance, we can still conduct the F test for the overall model:

F = MSR/MSE = [SSR/k] / [SSE/(n − k − 1)] = [(SSR/SSYY)/k] / [(SSE/SSYY)/(n − k − 1)] = [R²/k] / [(1 − R²)/(n − k − 1)]

For the liquor store example, there were n = 16 stores and k = 5 variables in the full model. To test:

H0 : β1 = β2 = β3 = β4 = β5 = 0 vs HA : Not all βi = 0

we get the following test statistic and rejection region (α = 0.05):

TS : Fobs = (0.69/5) / [(1 − 0.69)/(16 − 5 − 1)] = 0.138/0.031 = 4.45     RR : Fobs ≥ Fα,k,n−k−1 = F0.05,5,10 = 3.33

Thus, at least one of these variables is associated with store sales.
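The F statistic recovered from R² alone can be sketched as follows; the function name is illustrative, and the inputs are the values reported for the liquor store study.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic recovered from R-squared alone:
    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F_obs = f_from_r2(0.69, n=16, k=5)
print(round(F_obs, 2))  # exceeds F(0.05, 5, 10) = 3.33, so reject H0
```

The computed value matches the hand calculation (4.45) and exceeds the critical value, reproducing the "reject H0" conclusion.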

What is Adjusted-R² for this analysis?


14 Lecture 14 — Special Cases of Multiple Regression

Textbook Sections: 13.3, 13.4, Skim 13.5
Problems: 13.17, 18, 20, 21, 22, 26

In this section, we will look at three special cases that are frequently used methods of multiple regression. The ideas, such as the Analysis of Variance, tests of hypotheses, and parameter estimates, are exactly the same as before, and we will concentrate on their interpretation through specific examples. The three special cases are:

1. Polynomial regression

2. Regression models with dummy variables

3. Regression models containing interaction terms

14.1 Polynomial Regression

While certainly not restricted to this case, it is best to describe polynomial regression in the case of a model with only one predictor variable. In many real-world settings relationships will not be linear, but will demonstrate nonlinear associations. In economics, a widely described phenomenon is "diminishing marginal returns". In this case, Y may increase with X, but the rate of increase decreases over the range of X. By adding quadratic terms, we can test whether this is the case. Other situations may show that the rate of increase in Y is increasing in X.

Example 14.1 – Health Club Demand

Consider the dilemma of an owner of a new health club. She decides that she will charge a very low membership fee and will charge people a fixed amount each time they come (this is to help attract customers who are scared off by those tremendous membership fees that some clubs charge). The premise of her plan is that she needs a fairly steady and heavy daily clientele to keep money flowing sufficiently. A new, aesthetically pleasing, popular machine known as the MEGABODY MAKER 3000 has just been mass produced, and she wants this to be the selling point of her new club. She believes that the more of these machines she has in her club, the higher the daily attendance will be. However, she also knows that this increase in attendance will 'tail off' as the number of machines keeps getting larger. In the world of Economics, this is referred to as "diminishing returns". That is, she may expect mean daily attendance to increase by 300 people if she were to increase from 1 to 2 machines, but possibly only by 20 people if she were to increase from 5 to 6 machines. That is, she would expect attendance to increase as the number of machines increases, but the amount of the increase will diminish, implying that the effect of changing the predictor variable depends on its level. This could be written into a model as follows (letting y be the number of people attending on a given day, and x being the number of machines):

y = β0 + β1X + β2X² + ε.

Again, we assume that ε ∼ N(0, σ). In this model, the number of people attending in a day when there are X machines is normally distributed with mean β0 + β1X + β2X² and standard deviation σ. Note that we are no longer saying that the mean is linearly related to X, but rather that it is approximately quadratically related to X (curved). Suppose she leases varying numbers of machines over a period of n = 12 Wednesdays (always advertising how many machines will be there


on the following Wednesday), and observes the number of people attending the club each day, obtaining the data in Table 47.

Week   # Machines (X)   Attendance (Y)
 1     3                 555
 2     6                 776
 3     1                 267
 4     2                 431
 5     5                 722
 6     4                 635
 7     1                 218
 8     5                 692
 9     3                 534
10     2                 459
11     6                 810
12     4                 671

Table 47: Data for health club example

In this case, we would like to fit the multiple regression model:

y = β0 + β1X + β2X² + ε,

which is just like our previous model except that instead of a second predictor variable X2, we use X², the square of the first. The effect is that the fitted equation Ŷ will be a curve in 2 dimensions, not a plane in 3 dimensions as we saw in the weather example. First we will run the regression on the computer, obtaining the Analysis of Variance and the parameter estimates, then plot the data and fitted equation. Table 48 gives the Analysis of Variance for this example and Table 49 gives the parameter estimates and their standard errors. Note that even though we have only one predictor variable, it is being used twice and could in effect be treated as two different predictor variables, so k = 2.

ANOVA
Source of   Sum of             Degrees of                  Mean
Variation   Squares            Freedom                     Square                            F        p-value
MODEL       SSR = 393933.12    k = 2                       MSR = 393933.12/2 = 196966.56     253.80   .0001
ERROR       SSE = 6984.55      n − k − 1 = 12 − 2 − 1 = 9  MSE = 6984.55/9 = 776.06
TOTAL       SSYY = 400917.67   n − 1 = 11

Table 48: The Analysis of Variance Table for health club data

The first test of hypothesis is whether attendance is associated with the number of machines. This is a test of H0 : β1 = β2 = 0. If the null hypothesis is true, that implies mean daily attendance is unrelated to the number of machines, thus the club owner would purchase very few (if any) of the machines. As before, this test is the F-test from the Analysis of Variance table, which we conduct here at α = .05.

1. H0 : β1 = β2 = 0


                                   t FOR H0:             STANDARD ERROR
PARAMETER          ESTIMATE        βi = 0      P-VALUE    OF ESTIMATE
INTERCEPT (β0)     b0 = 72.0500     2.04       .0712      35.2377
MACHINES (β1)      b1 = 199.7625    8.67       .0001      23.0535
MACHINES SQ (β2)   b2 = −13.6518   −4.23       .0022       3.2239

Table 49: Parameter estimates and tests of hypotheses for individual parameters

2. HA : Not both βi = 0

3. T.S.: Fobs = MSR/MSE = 196966.56/776.06 = 253.80

4. R.R.: Fobs > F2,9,.05 = 4.26 (this is not provided on the output; the p-value takes its place).

5. p-value: P(F > 253.80) = .0001 (actually it is less than .0001, but this is the smallest p-value the computer will print).

Another test with an interesting interpretation is H0 : β2 = 0. This tests the hypothesis that the mean increases linearly with X (since if β2 = 0 this becomes the simple regression model; refer back to the coffee data example). The t-test in Table 49 for this hypothesis has a test statistic tobs = −4.23, which corresponds to a p-value of .0022; since this is below .05, we reject H0 and conclude β2 ≠ 0. Since b2 is negative, we conclude that β2 is negative, which is in agreement with her theory that once you get to a certain number of machines, it does not help to keep adding new machines. This is the idea of 'diminishing returns'. Figure 25 shows the actual data and the fitted equation Ŷ = 72.0500 + 199.7625X − 13.6518X².


Figure 25: Plot of the data and fitted equation for health club example
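The diminishing-returns pattern can be seen numerically by plugging the fitted coefficients into the quadratic. This is only a sketch: the function name is illustrative, and the coefficients come from Table 49.

```python
def attendance(x):
    """Fitted quadratic from Table 49: predicted daily attendance
    when x machines are leased."""
    return 72.0500 + 199.7625 * x - 13.6518 * x ** 2

gain_1_to_2 = attendance(2) - attendance(1)  # effect of adding a 2nd machine
gain_5_to_6 = attendance(6) - attendance(5)  # effect of adding a 6th machine
print(round(gain_1_to_2, 1), round(gain_5_to_6, 1))
# the predicted gain shrinks as machines are added (diminishing returns)
```

The predicted gain from a second machine is roughly three times that from a sixth, which is exactly what the negative β2 encodes.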

14.2 Regression Models With Dummy Variables

All of the predictor variables we have used so far were numeric, or what are often called quantitative variables. Other variables, called qualitative variables, can also be used. Qualitative variables measure characteristics that cannot be described numerically, such as a person's sex, race,


religion, or blood type; a city's region or mayor's political affiliation; the list of possibilities is endless. In this case, we frequently have some numeric predictor variable(s) that we believe is (are) related to the response variable, but we believe this relationship may be different for different levels of some qualitative variable of interest.

If a qualitative variable has m levels, we create m − 1 indicator or dummy variables. Consider an example where we are interested in health care expenditures as related to age for men and women, separately. In this case, the response variable is health care expenditures, one predictor variable is age, and we need to create a variable representing sex. This can be done by creating a variable X2 that takes on the value 1 if a person is female and 0 if the person is male. In this case we can write the mean response as before:

E[Y |X1,X2] = β0 + β1X1 + β2X2.

Note that for women of age X1, the mean expenditure is E[Y |X1, 1] = β0 + β1X1 + β2(1) = (β0 + β2) + β1X1, while for men of age X1, the mean expenditure is E[Y |X1, 0] = β0 + β1X1 + β2(0) = β0 + β1X1. This model allows for different means for men and women, but requires they have the same slope (we will see a more general case in the next section). In this case the interpretation of β2 = 0 is that the means are the same for both sexes; this is a hypothesis a health care professional may wish to test in a study. In this example the variable sex had two levels, so we had to create 2 − 1 = 1 dummy variable. Now consider a second example.

Example 14.2

We would like to see if annual per capita clothing expenditure is related to annual per capita income in cities across the U.S. Further, we would like to see if there are any differences in the means across the 4 regions (Northeast, South, Midwest, and West). Since the variable region has 4 levels, we will create 3 dummy variables X2, X3, and X4 as follows (we leave X1 to represent the predictor variable per capita income):

X2 = 1 if region = South, 0 otherwise

X3 = 1 if region = Midwest, 0 otherwise

X4 = 1 if region = West, 0 otherwise

Note that cities in the Northeast have X2 = X3 = X4 = 0, while cities in other regions will have one of X2, X3, or X4 equal to 1. Northeast cities act like males did in the previous example. The data are given in Table 50.

The Analysis of Variance is given in Table 51, and the parameter estimates and standard errors are given in Table 52.

Note that we would fail to reject H0 : β1 = β2 = β3 = β4 = 0 at the α = .05 significance level if we looked only at the F-statistic and its p-value (Fobs = 2.93, p-value = .0562). This would lead us to conclude that there is no association between the predictor variables (income and region) and the response variable (clothing expenditures). This is where you need to be careful when using multiple regression with many predictor variables. Look at the test of H0 : β1 = 0, based on the t-test in Table 52. Here we observe tobs = 3.11, with a p-value of .0071. We thus conclude β1 ≠ 0, and that clothing expenditure is related to income, as we would expect. However, we do fail to reject H0 : β2 = 0, H0 : β3 = 0, and H0 : β4 = 0, so we fail to observe any differences among the regions in terms of clothing


PER CAPITA INCOME & CLOTHING EXPENDITURES (1990)

                                    Income   Expenditure
Metro Area            Region        X1       Y             X2   X3   X4
New York City         Northeast     25405    2290          0    0    0
Philadelphia          Northeast     21499    2037          0    0    0
Pittsburgh            Northeast     18827    1646          0    0    0
Boston                Northeast     24315    1659          0    0    0
Buffalo               Northeast     17997    1315          0    0    0
Atlanta               South         20263    2108          1    0    0
Miami/Ft Laud         South         19606    1587          1    0    0
Baltimore             South         21461    1978          1    0    0
Houston               South         19028    1589          1    0    0
Dallas/Ft Worth       South         19821    1982          1    0    0
Chicago               Midwest       21982    2108          0    1    0
Detroit               Midwest       20595    1262          0    1    0
Cleveland             Midwest       19640    2043          0    1    0
Minneapolis/St Paul   Midwest       21330    1816          0    1    0
St Louis              Midwest       20200    1340          0    1    0
Seattle               West          21087    1667          0    0    1
Los Angeles           West          20691    2404          0    0    1
Portland              West          18938    1440          0    0    1
San Diego             West          19588    1849          0    0    1
San Fran/Oakland      West          25037    2556          0    0    1

Table 50: Clothes Expenditures and income example

ANOVA
Source of   Sum of      Degrees of   Mean
Variation   Squares     Freedom      Square     F      p-value
MODEL       1116419.0    4           279104.7   2.93   .0562
ERROR       1426640.2   15            95109.3
TOTAL       2543059.2   19

Table 51: The Analysis of Variance Table for clothes expenditure data

                             t FOR H0:             STANDARD ERROR
PARAMETER        ESTIMATE    βi = 0      P-VALUE   OF ESTIMATE
INTERCEPT (β0)   −657.428    −0.82       .4229     797.948
X1 (β1)             0.113     3.11       .0071       0.036
X2 (β2)           237.494     1.17       .2609     203.264
X3 (β3)            21.691     0.11       .9140     197.536
X4 (β4)           254.992     1.30       .2130     196.036

Table 52: Parameter estimates and tests of hypotheses for individual parameters


expenditures after 'adjusting' for the variable income. Figure 26 and Figure 27 show the original data using region as the plotting symbol and the 4 fitted equations corresponding to the 4 regions. Recall that the fitted equation is Ŷ = −657.428 + 0.113X1 + 237.494X2 + 21.691X3 + 254.992X4, and each of the regions has a different set of levels of variables X2, X3, and X4.


Figure 26: Plot of clothing data, with plotting symbol region


Figure 27: Plot of fitted equations for each region
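The four region-specific fitted lines can be generated from the single equation by switching the dummies on and off. This is a sketch using the estimates in Table 52; the dictionary of dummy codes and the function name are illustrative.

```python
b0, b1, b2, b3, b4 = -657.428, 0.113, 237.494, 21.691, 254.992

# Dummy codes (X2, X3, X4) for each region; Northeast is the baseline
REGION_DUMMIES = {
    "Northeast": (0, 0, 0),
    "South":     (1, 0, 0),
    "Midwest":   (0, 1, 0),
    "West":      (0, 0, 1),
}

def predicted_expenditure(region, income):
    """Fitted clothing expenditure for a city with the given region
    and per capita income (X1), from the Table 52 estimates."""
    x2, x3, x4 = REGION_DUMMIES[region]
    return b0 + b1 * income + b2 * x2 + b3 * x3 + b4 * x4

# Every region's line shares the slope b1; only the intercept shifts:
diff = predicted_expenditure("South", 20000) - predicted_expenditure("Northeast", 20000)
print(round(diff, 3))  # equals b2 = 237.494
```

At any income level, the gap between a region and the Northeast baseline is exactly that region's dummy coefficient, which is why the four fitted lines in Figure 27 are parallel.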

14.3 Regression Models With Interactions

In some situations, two or more predictor variables may interact in terms of their effects on the mean response. That is, the effect on the mean response of changing the level of one predictor variable depends on the level of another predictor variable. This idea is easiest understood in the case where one of the variables is qualitative. Consider the following example involving the number of AIDS cases reported over a period of 8 years (treating this as a sample from a conceptual population). We are interested in the number of cases reported (response variable) among men and


women (qualitative predictor variable) over time (quantitative predictor variable). If we fit a model as in the previous section, it would be of the form:

Y = β0 + β1X1 + β2X2 + ε,

where X1 is the year from the beginning of the study, and X2 is a dummy variable corresponding to gender. The model allows for different intercepts, but requires common slopes with respect to changes over time (this can be interpreted as saying that while the number of new cases in a year can be at different levels among men and women, the rates of increase are the same). This would be unlikely since we expect the rate of increase to be much higher for men. We would rather fit a model that allows for different intercepts and slopes (in effect, different regression equations for the two sexes). Note that this is a situation where we would be trying to predict a future outcome based on past data. The model we will fit allows for interaction between time and sex, allowing for different slopes for the two sexes. It can be written as:

Y = β0 + β1X1 + β2X2 + β3X1X2 + ε.

If we define X2 to be 1 for males and 0 for females, we can write the equations for the two sexes as follows:

Males: Y = β0 + β1X1 + β2(1) + β3X1(1) + ε = (β0 + β2) + (β1 + β3)X1 + ε,

and

Females: Y = β0 + β1X1 + β2(0) + β3X1(0) + ε = β0 + β1X1 + ε.

The data for years 1984–1991 are given in Table 53; note that we will use X1 = year − 1983 as our predictor variable representing time, to make calculations neater.

AIDS Cases Reported in U.S.
YEAR   X1 = YEAR − 1983   SEX      X2   CASES (Y)
1984   1                  FEMALE   0      296
1984   1                  MALE     1     4146
1985   2                  FEMALE   0      585
1985   2                  MALE     1     7630
1986   3                  FEMALE   0     1049
1986   3                  MALE     1    12101
1987   4                  FEMALE   0     1833
1987   4                  MALE     1    19276
1988   5                  FEMALE   0     3287
1988   5                  MALE     1    27467
1989   6                  FEMALE   0     3660
1989   6                  MALE     1    29978
1990   7                  FEMALE   0     4880
1990   7                  MALE     1    36736
1991   8                  FEMALE   0     5677
1991   8                  MALE     1    37995

Table 53: AIDS cases reported in U.S. from 1984-1991 by sex

We will not provide the Analysis of Variance table for this example due to the magnitude of the numbers, and the fact that we are certain of sex and year effects after one look at the data.


However, we do note that R² = .991555, showing that our model accounts for much of the variation in reported cases. Table 54 provides the parameter estimates and their standard errors.

                             t FOR H0:             STANDARD ERROR
PARAMETER        ESTIMATE    βi = 0      P-VALUE   OF ESTIMATE
INTERCEPT (β0)   −1007.464   −0.94       .3675     1075.908
X1 (β1)            814.631    3.82       .0024      213.062
X2 (β2)           −877.929   −0.58       .5746     1521.564
X1X2 (β3)         4474.595   14.85       .0001      301.315

Table 54: Parameter estimates and tests of hypotheses for individual parameters – AIDS data

For males the fitted equation is:

Ŷmale = (b0 + b2) + (b1 + b3)X1 = −1885.393 + 5289.226X1,

while for females, the fitted equation is:

Ŷfemale = b0 + b1X1 = −1007.464 + 814.631X1.

Note the difference in these equations, particularly their slopes, which represent the increase in the number of new cases each year. Suppose that a government official would like to predict the number of new cases in 1992 based on this equation (we are assuming that this increasing pattern will continue). In this case X1 = year − 1983 = 1992 − 1983 = 9. The two predictions would be:

Males: Ŷ1992 = −1885.393 + 5289.226(9) = 45717.641,

and

Females: Ŷ1992 = −1007.464 + 814.631(9) = 6324.215.
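The sex-specific equations and the 1992 predictions above can be reproduced from the single interaction model. This is a sketch; the function name is illustrative and the estimates come from the parameter table for the AIDS data.

```python
b0, b1, b2, b3 = -1007.464, 814.631, -877.929, 4474.595

def predicted_cases(x1, male):
    """Interaction model: Y-hat = b0 + b1*X1 + b2*X2 + b3*X1*X2,
    where X2 = 1 for males and 0 for females."""
    x2 = 1 if male else 0
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x1_1992 = 1992 - 1983  # = 9
print(round(predicted_cases(x1_1992, male=True), 3))   # about 45717.641
print(round(predicted_cases(x1_1992, male=False), 3))  # about 6324.215
```

Setting the dummy to 1 shifts both the intercept (by b2) and the slope (by b3), which is exactly how the separate male and female equations arise from one model.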

Note that we are not limited to simple models like this; we could have interaction terms in any of the regression models that we have seen in this chapter. Their interpretations increase in complexity as the models include more variables. Usually we will test whether the coefficient of an interaction term is 0, using the t-test, and remove the interaction term from the model if we fail to reject H0 : βi = 0. Figure 28 shows the data, as well as the two fitted equations for the AIDS data.

15 Lecture 15 — Multicollinearity and Intro to Time Series

Textbook Sections: 13.6, 15.1
Problems: See lecture, 15.1, 15.3

15.1 Multicollinearity

Multicollinearity refers to the situation where independent variables are highly correlated among themselves. This can cause problems mathematically and creates problems in interpreting regression coefficients.

Some of the problems that arise include:



Figure 28: Plot of fitted equations for each sex

• Difficult to interpret regression coefficient estimates

• Inflated std errors of estimates (and thus small t–statistics)

• Signs of coefficients may not be what is expected.

• However, predicted values are not adversely affected

It can be thought that the independent variables are explaining "the same" variation in Y, and it is difficult for the model to attribute the variation explained (recall partial regression coefficients).

Variance Inflation Factors provide a means of detecting whether a given independent variable is causing multicollinearity. They are calculated (for each independent variable) as:

VIFi = 1 / (1 − R²i)

where R²i is the coefficient of multiple determination when Xi is regressed on the k − 1 other independent variables. One rule of thumb suggests that severe multicollinearity is present if VIFi > 10 (R²i > .90).
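The rule of thumb is easy to check in code. This sketch uses an illustrative function name and the R² values reported for the Texas weather data.

```python
def vif(r2_i):
    """Variance inflation factor for a predictor whose regression on
    the other predictors has coefficient of determination r2_i."""
    return 1.0 / (1.0 - r2_i)

# R-squared values from regressing each predictor on the other two
for name, r2 in [("ELEVATION", .9393), ("LATITUDE", .7635), ("LONGITUDE", .8940)]:
    flag = "severe" if vif(r2) > 10 else "ok"
    print(f"{name}: VIF = {vif(r2):.2f} ({flag})")
```

Only ELEVATION crosses the VIF > 10 threshold, matching the discussion of the Texas data below.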

Example 15.1

First, we run a regression with ELEVATION as the dependent variable and LATITUDE and LONGITUDE as the independent variables. We then repeat the process with LATITUDE as the dependent variable, and finally with LONGITUDE as the dependent variable. Table 55 gives R² and VIF for each model.

Note how large the factor is for ELEVATION. Texas elevation increases as you go West and as you go North. The Western rise is the more pronounced of the two (the simple correlation between ELEVATION and LONGITUDE is .89).

Consider the effects on the coefficients in Table 56 and Table 57 (these are subsets of previously shown tables).

Compare the estimate and estimated standard error for the coefficients of ELEVATION and LATITUDE in the two models. In particular, the ELEVATION coefficient doubles in absolute


Variable    R²      VIF
ELEVATION   .9393   16.47
LATITUDE    .7635    4.23
LONGITUDE   .8940    9.43

Table 55: Variance Inflation Factors for Texas weather data

                                     STANDARD ERROR
PARAMETER         ESTIMATE           OF ESTIMATE
INTERCEPT (β0)    b0 = 109.25887     2.97857
LATITUDE (β1)     b1 = −1.99323      0.13639
LONGITUDE (β2)    b2 = −0.38471      0.22858
ELEVATION (β3)    b3 = −0.00096      0.00057

Table 56: Parameter estimates and standard errors for the full model

value and its standard error decreases by a factor of almost 3. The LATITUDE coefficient and standard error do not change very much. We choose to keep ELEVATION, as opposed to LONGITUDE, in the model due to theoretical considerations with respect to weather and climate.

15.2 Forecasting and Time Series

In the remainder of the course, we consider data that are collected over time. Many economic and financial models are based on time series. We will describe some simple methods used to predict future outcomes based on past values and (possibly) other information known at the time of the forecast.

Since there is an unlimited number of ways to forecast future outcomes, we need a means of comparing the various methods. First, we introduce some notation:

• Xt — Actual (random) outcome at time t, unknown prior to t

• Ft — Forecast of Xt, made prior to t

• et — Error of forecast: et = Xt − Ft

Five commonly used measures are given below; think of ways the measures may differ:

Mean Error (ME) — ME = Σei / (number of forecasts)

Mean Absolute Deviation (MAD) — MAD = Σ|ei| / (number of forecasts)

                                    STANDARD ERROR
PARAMETER         ESTIMATE          OF ESTIMATE
INTERCEPT (β0)    b0 = 63.45485     0.48750
ELEVATION (β1)    b1 = −0.00185     0.00022
LATITUDE (β2)     b2 = −1.83216     0.10380

Table 57: Parameter estimates and standard errors for the reduced model


Mean Square Error (MSE) — MSE = Σei² / (number of forecasts)

Mean Percentage Error (MPE) — MPE = Σ(ei/Xi · 100) / (number of forecasts)

Mean Absolute Percentage Error (MAPE) — MAPE = Σ(|ei|/Xi · 100) / (number of forecasts)
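All five measures can be computed together from paired actual and forecast series. This is a sketch with an illustrative function name and a tiny made-up illustration series.

```python
def forecast_measures(actual, forecast):
    """Compute ME, MAD, MSE, MPE, and MAPE for paired actual outcomes
    X_t and forecasts F_t, with errors e_t = X_t - F_t."""
    errors = [x - f for x, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "ME":   sum(errors) / n,
        "MAD":  sum(abs(e) for e in errors) / n,
        "MSE":  sum(e * e for e in errors) / n,
        "MPE":  sum(e / x * 100 for e, x in zip(errors, actual)) / n,
        "MAPE": sum(abs(e) / x * 100 for e, x in zip(errors, actual)) / n,
    }

# Tiny illustration: one under-forecast and one over-forecast
m = forecast_measures(actual=[10.0, 20.0], forecast=[8.0, 25.0])
print(m)  # ME = -1.5, MAD = 3.5, MSE = 14.5, MPE = -2.5, MAPE = 22.5
```

Note how ME and MPE let positive and negative errors cancel, while MAD, MSE, and MAPE do not; that is the main way the measures differ.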

16 Lecture 16 — Simple Time Series Forecasting Techniques

Textbook Sections: 15.4, pp. 651–652
Problems: 15.11, 13, See lecture

In this section, we describe some simple methods of using past data to predict future outcomes. Most forecasts you hear reported are generally complex hybrids of these techniques.

16.1 Moving Averages

Use the mean of the last n observations to forecast the outcome at time t:

Ft = (Xt−1 + Xt−2 + · · · + Xt−n) / n

The term "moving" implies that the n X's are moving through time.

Problem: How to choose n?

Weighted Moving Averages

Put higher (presumably) weights on more recent values in the moving average:

Ft = (w1Xt−1 + w2Xt−2 + · · · + wnXt−n) / Σwi

where presumably w1 ≥ w2 ≥ · · · ≥ wn.

Example 16.1

Table 58 gives average dividend yields for Anheuser–Busch for the years 1952–1995 (Source: Value Line), along with forecasts and errors based on moving averages with lags of 1, 2, and 3 years. Note that we don't have early-year forecasts, and the longer the lag, the longer we must wait until we get our first forecast.

Here we compute the moving averages for year 1963:

1–Year: F1963 = X1962 = 3.2

2–Year: F1963 = (X1962 + X1961)/2 = (3.2 + 2.8)/2 = 3.0

3–Year: F1963 = (X1962 + X1961 + X1960)/3 = (3.2 + 2.8 + 4.4)/3 = 3.47
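The 1963 forecasts above can be reproduced with a small helper. This is a sketch; the function name is illustrative, and the history list holds X1960, X1961, X1962 from Table 58.

```python
def moving_average_forecast(series, n):
    """Forecast the next outcome as the mean of the last n values."""
    return sum(series[-n:]) / n

history = [4.4, 2.8, 3.2]  # X_1960, X_1961, X_1962
for lag in (1, 2, 3):
    print(f"{lag}-Year F_1963 = {moving_average_forecast(history, lag):.2f}")
# 1-Year: 3.20, 2-Year: 3.00, 3-Year: 3.47
```

A longer lag averages over more history, which smooths out year-to-year noise but reacts more slowly to real changes in level.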


 t   Year   Xt     F1,t   e1,t    F2,t   e2,t    F3,t   e3,t
 1   1952   5.30   .       .      .       .      .       .
 2   1953   4.20   5.30   -1.10   .       .      .       .
 3   1954   3.90   4.20   -0.30   4.75   -0.85   .       .
 4   1955   5.20   3.90    1.30   4.05    1.15   4.47    0.73
 5   1956   5.80   5.20    0.60   4.55    1.25   4.43    1.37
 6   1957   6.30   5.80    0.50   5.50    0.80   4.97    1.33
 7   1958   5.60   6.30   -0.70   6.05   -0.45   5.77   -0.17
 8   1959   4.80   5.60   -0.80   5.95   -1.15   5.90   -1.10
 9   1960   4.40   4.80   -0.40   5.20   -0.80   5.57   -1.17
10   1961   2.80   4.40   -1.60   4.60   -1.80   4.93   -2.13
11   1962   3.20   2.80    0.40   3.60   -0.40   4.00   -0.80
12   1963   3.10   3.20   -0.10   3.00    0.10   3.47   -0.37
13   1964   3.10   3.10    0.00   3.15   -0.05   3.03    0.07
14   1965   2.60   3.10   -0.50   3.10   -0.50   3.13   -0.53
15   1966   2.00   2.60   -0.60   2.85   -0.85   2.93   -0.93
16   1967   1.60   2.00   -0.40   2.30   -0.70   2.57   -0.97
17   1968   1.30   1.60   -0.30   1.80   -0.50   2.07   -0.77
18   1969   1.20   1.30   -0.10   1.45   -0.25   1.63   -0.43
19   1970   1.20   1.20    0.00   1.25   -0.05   1.37   -0.17
20   1971   1.10   1.20   -0.10   1.20   -0.10   1.23   -0.13
21   1972   0.90   1.10   -0.20   1.15   -0.25   1.17   -0.27
22   1973   1.40   0.90    0.50   1.00    0.40   1.07    0.33
23   1974   2.00   1.40    0.60   1.15    0.85   1.13    0.87
24   1975   1.90   2.00   -0.10   1.70    0.20   1.43    0.47
25   1976   2.30   1.90    0.40   1.95    0.35   1.77    0.53
26   1977   3.10   2.30    0.80   2.10    1.00   2.07    1.03
27   1978   3.50   3.10    0.40   2.70    0.80   2.43    1.07
28   1979   3.80   3.50    0.30   3.30    0.50   2.97    0.83
29   1980   3.70   3.80   -0.10   3.65    0.05   3.47    0.23
30   1981   3.10   3.70   -0.60   3.75   -0.65   3.67   -0.57
31   1982   2.60   3.10   -0.50   3.40   -0.80   3.53   -0.93
32   1983   2.40   2.60   -0.20   2.85   -0.45   3.13   -0.73
33   1984   3.00   2.40    0.60   2.50    0.50   2.70    0.30
34   1985   2.40   3.00   -0.60   2.70   -0.30   2.67   -0.27
35   1986   1.80   2.40   -0.60   2.70   -0.90   2.60   -0.80
36   1987   1.70   1.80   -0.10   2.10   -0.40   2.40   -0.70
37   1988   2.20   1.70    0.50   1.75    0.45   1.97    0.23
38   1989   2.10   2.20   -0.10   1.95    0.15   1.90    0.20
39   1990   2.40   2.10    0.30   2.15    0.25   2.00    0.40
40   1991   2.10   2.40   -0.30   2.25   -0.15   2.23   -0.13
41   1992   2.20   2.10    0.10   2.25   -0.05   2.20    0.00
42   1993   2.70   2.20    0.50   2.15    0.55   2.23    0.47
43   1994   3.00   2.70    0.30   2.45    0.55   2.33    0.67
44   1995   2.80   3.00   -0.20   2.85   -0.05   2.63    0.17

Table 58: Dividend yields, forecasts, and errors — 1-, 2-, and 3-year moving averages
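The moving-average forecasts in Table 58 are simple to compute: the forecast for each period is the mean of the previous k outcomes. A minimal Python sketch (the function name is ours, not from the notes):

```python
def moving_average_forecasts(x, k):
    """k-period moving average: F_{t+1} is the mean of the k most recent outcomes.

    Returns a list aligned with x; the first k entries are None
    because no forecast is available yet.
    """
    return [None] * k + [sum(x[t - k:t]) / k for t in range(k, len(x))]

# First few dividend yields from Table 58 (1952-1956):
x = [5.30, 4.20, 3.90, 5.20, 5.80]
print(moving_average_forecasts(x, 1))  # MA(1): each forecast is last year's yield
print(moving_average_forecasts(x, 2))  # MA(2): e.g. F_1954 = (5.30 + 4.20)/2 = 4.75
```

The MA(2) forecasts for 1954–1956 reproduce the 4.75, 4.05, 4.55 entries in the F2,t column of Table 58.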


When might a “short” moving average be preferred to a “long” one? When might a “long” moving average be preferred to a “short” one? Figure 29 displays the raw data and moving average forecasts.

[Figure: DIV_YLD (0–7) vs CAL_YEAR (1950–2000); series: Actual, MA(1), MA(2), MA(3)]

Figure 29: Plot of the data moving average forecast for Anheuser–Busch dividend data

Measurements of Forecasting Error

Mean Error: ME = (∑ ei)/(number of forecasts), where ei = Xi − Fi

1–Year: ME = [(−1.1) + (−0.3) + 1.3 + ··· + 0.5 + 0.3 + (−0.2)]/43 = −2.5/43 = −0.058

2–Year: ME = [(−0.85) + 1.15 + 1.25 + ··· + 0.55 + 0.55 + (−0.05)]/42 = −2.6/42 = −0.062

3–Year: ME = [0.73 + 1.37 + 1.33 + ··· + 0.47 + 0.67 + 0.17]/41 = −2.8/41 = −0.068
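The ME computation can be sketched in Python (the helper name is ours; shown here on just the first few MA(1) forecasts rather than the full series):

```python
def mean_error(actual, forecast):
    """ME = sum(e_i)/number of forecasts, with e_i = X_i - F_i.

    Periods without a forecast (None) are skipped.
    """
    errs = [x - f for x, f in zip(actual, forecast) if f is not None]
    return sum(errs) / len(errs)

# Dividend yields and MA(1) forecasts for 1952-1956 (Table 58):
x = [5.30, 4.20, 3.90, 5.20, 5.80]
f = [None, 5.30, 4.20, 3.90, 5.20]
print(round(mean_error(x, f), 3))  # errors -1.10, -0.30, 1.30, 0.60 average to 0.125
```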

Mean Absolute Percentage Error (MAPE) — MAPE = [∑ (|ei|/Xi · 100)]/(number of forecasts)

1–Year: MAPE = [(|−1.1|/4.2 · 100) + (|−0.3|/3.9 · 100) + ··· + (|0.3|/3.0 · 100) + (|−0.2|/2.8 · 100)]/43 = 687.06/43 = 15.98

2–Year: MAPE = [(|−0.85|/3.9 · 100) + (|1.15|/5.2 · 100) + ··· + (|0.55|/3.0 · 100) + (|−0.05|/2.8 · 100)]/42 = 843.31/42 = 20.08
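MAPE follows the same pattern as ME; a sketch on the same short slice of data (helper name ours):

```python
def mape(actual, forecast):
    """MAPE = average of |e_i|/X_i * 100 over periods with a forecast."""
    terms = [abs(x - f) / x * 100.0
             for x, f in zip(actual, forecast) if f is not None]
    return sum(terms) / len(terms)

x = [5.30, 4.20, 3.90, 5.20, 5.80]  # 1952-1956 yields
f = [None, 5.30, 4.20, 3.90, 5.20]  # MA(1) forecasts
print(round(mape(x, f), 2))         # 17.31 for this short slice
```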

16.2 Exponential Smoothing

Exponential smoothing is a method of forecasting that weights data from previous time periods with exponentially decreasing magnitudes. Forecasts can be written as follows, where the forecast for period 2 is traditionally (but not always) simply the outcome from period 1:

Ft+1 = α · Xt + (1 − α) · Ft


where:

• Ft+1 is the forecast for period t + 1

• Xt is the outcome at t

• Ft is the forecast for period t

• α is the smoothing constant (0 ≤ α ≤ 1)

Forecasts are “smoother” than the raw data, and the weights of previous observations decline exponentially with time.

Example 16.1 (Continued) — Three smoothing constants (allowing decreasing amounts of smoothing) are used for illustration:

• α = 0.2 — Ft+1 = 0.2Xt + 0.8Ft

• α = 0.5 — Ft+1 = 0.5Xt + 0.5Ft

• α = 0.8 — Ft+1 = 0.8Xt + 0.2Ft

Year 2 (1953) — set F1953 = X1952, then cycle from there. Table 59 gives average dividend yields for Anheuser–Busch for the years 1952–1995 (Source: Value Line), and forecasts and errors based on exponential smoothing with α = 0.2, 0.5, and 0.8.

Here we obtain forecasts based on exponential smoothing, beginning with year 2 (1953):

1953: Fα=.2,1953 = X1952 = 5.30   Fα=.5,1953 = X1952 = 5.30   Fα=.8,1953 = X1952 = 5.30

1954 (α = 0.2): Fα=.2,1954 = .2X1953 + .8Fα=.2,1953 = .2(4.20) + .8(5.30) = 5.08

1954 (α = 0.5): Fα=.5,1954 = .5X1953 + .5Fα=.5,1953 = .5(4.20) + .5(5.30) = 4.75

1954 (α = 0.8): Fα=.8,1954 = .8X1953 + .2Fα=.8,1953 = .8(4.20) + .2(5.30) = 4.42
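The recursion is easy to code; this sketch (function name ours) reproduces the three 1954 forecasts 5.08, 4.75, and 4.42:

```python
def exp_smooth_forecasts(x, alpha):
    """Exponential smoothing: F_2 = X_1, then F_{t+1} = alpha*X_t + (1-alpha)*F_t."""
    f = [None, x[0]]                      # no forecast for period 1
    for t in range(1, len(x) - 1):
        f.append(alpha * x[t] + (1 - alpha) * f[-1])
    return f

x = [5.30, 4.20, 3.90]                    # 1952-1954 yields
for a in (0.2, 0.5, 0.8):
    print(a, round(exp_smooth_forecasts(x, a)[2], 2))  # 1954 forecast: 5.08, 4.75, 4.42
```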

Which level of α appears to be “discounting” more distant observations at a quicker rate? What would happen if α = 1? If α = 0? Figure 30 gives the raw data and exponential smoothing forecasts.

Table 60 gives measures of forecast errors for the three moving average and three exponential smoothing methods.

16.3 Autoregression

Sometimes regression is run on past or “lagged” values of the dependent variable (and possibly other variables). An autoregressive model with independent variables corresponding to k periods can be written as follows:

Yt = b0 + b1Yt−1 + b2Yt−2 + · · · + bkYt−k

Note that the regression cannot be run for the first k responses in the series.
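The coefficients of such a model can be estimated by ordinary least squares on lagged copies of the series. A sketch using NumPy (function name ours; the notes obtained their fits from packaged software):

```python
import numpy as np

def fit_ar(y, k):
    """Least-squares fit of Y_t = b0 + b1*Y_{t-1} + ... + bk*Y_{t-k}.

    The first k observations serve only as regressors, so the
    regression is run on the remaining n - k responses.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    cols = [np.ones(n - k)] + [y[k - j:n - j] for j in range(1, k + 1)]
    X = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    return b

# Sanity check on a series that follows Y_t = 1 + 0.5*Y_{t-1} exactly:
y = [0.0]
for _ in range(6):
    y.append(1.0 + 0.5 * y[-1])
print(fit_ar(y, 1))  # approximately [1.0, 0.5]
```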

Example 16.1 (Continued)


t Year Xt Fα=.2,t eα=.2,t Fα=.5,t eα=.5,t Fα=.8,t eα=.8,t

1 1952 5.30 . . . . . .
2 1953 4.20 5.30 -1.10 5.30 -1.10 5.30 -1.10
3 1954 3.90 5.08 -1.18 4.75 -0.85 4.42 -0.52
4 1955 5.20 4.84 0.36 4.33 0.88 4.00 1.20
5 1956 5.80 4.92 0.88 4.76 1.04 4.96 0.84
6 1957 6.30 5.09 1.21 5.28 1.02 5.63 0.67
7 1958 5.60 5.33 0.27 5.79 -0.19 6.17 -0.57
8 1959 4.80 5.39 -0.59 5.70 -0.90 5.71 -0.91
9 1960 4.40 5.27 -0.87 5.25 -0.85 4.98 -0.58
10 1961 2.80 5.10 -2.30 4.82 -2.02 4.52 -1.72
11 1962 3.20 4.64 -1.44 3.81 -0.61 3.14 0.06
12 1963 3.10 4.35 -1.25 3.51 -0.41 3.19 -0.09
13 1964 3.10 4.10 -1.00 3.30 -0.20 3.12 -0.02
14 1965 2.60 3.90 -1.30 3.20 -0.60 3.10 -0.50
15 1966 2.00 3.64 -1.64 2.90 -0.90 2.70 -0.70
16 1967 1.60 3.31 -1.71 2.45 -0.85 2.14 -0.54
17 1968 1.30 2.97 -1.67 2.03 -0.73 1.71 -0.41
18 1969 1.20 2.64 -1.44 1.66 -0.46 1.38 -0.18
19 1970 1.20 2.35 -1.15 1.43 -0.23 1.24 -0.04
20 1971 1.10 2.12 -1.02 1.32 -0.22 1.21 -0.11
21 1972 0.90 1.91 -1.01 1.21 -0.31 1.12 -0.22
22 1973 1.40 1.71 -0.31 1.05 0.35 0.94 0.46
23 1974 2.00 1.65 0.35 1.23 0.77 1.31 0.69
24 1975 1.90 1.72 0.18 1.61 0.29 1.86 0.04
25 1976 2.30 1.76 0.54 1.76 0.54 1.89 0.41
26 1977 3.10 1.86 1.24 2.03 1.07 2.22 0.88
27 1978 3.50 2.11 1.39 2.56 0.94 2.92 0.58
28 1979 3.80 2.39 1.41 3.03 0.77 3.38 0.42
29 1980 3.70 2.67 1.03 3.42 0.28 3.72 -0.02
30 1981 3.10 2.88 0.22 3.56 -0.46 3.70 -0.60
31 1982 2.60 2.92 -0.32 3.33 -0.73 3.22 -0.62
32 1983 2.40 2.86 -0.46 2.96 -0.56 2.72 -0.32
33 1984 3.00 2.77 0.23 2.68 0.32 2.46 0.54
34 1985 2.40 2.81 -0.41 2.84 -0.44 2.89 -0.49
35 1986 1.80 2.73 -0.93 2.62 -0.82 2.50 -0.70
36 1987 1.70 2.54 -0.84 2.21 -0.51 1.94 -0.24
37 1988 2.20 2.38 -0.18 1.96 0.24 1.75 0.45
38 1989 2.10 2.34 -0.24 2.08 0.02 2.11 -0.01
39 1990 2.40 2.29 0.11 2.09 0.31 2.10 0.30
40 1991 2.10 2.31 -0.21 2.24 -0.14 2.34 -0.24
41 1992 2.20 2.27 -0.07 2.17 0.03 2.15 0.05
42 1993 2.70 2.26 0.44 2.19 0.51 2.19 0.51
43 1994 3.00 2.35 0.65 2.44 0.56 2.60 0.40
44 1995 2.80 2.48 0.32 2.72 0.08 2.92 -0.12

Table 59: Dividend yields, forecasts, and errors based on exponential smoothing with α = 0.2, 0.5, 0.8


[Figure: DIV_YLD (0–7) vs CAL_YEAR (1950–2000); series: Actual, ES(a=.2), ES(a=.5), ES(a=.8)]

Figure 30: Plot of the data and exponential smoothing forecasts for Anheuser–Busch dividend data

                   Moving Average              Exponential Smoothing
Measure   1–Period  2–Period  3–Period   α = 0.2  α = 0.5  α = 0.8
ME          −0.06     −0.06     −0.07     −0.32    −0.12    −0.07
MAE          0.43      0.53      0.62      0.82     0.58     0.47
MSE          0.30      0.43      0.57      0.97     0.48     0.34
MPE         −3.31     −4.87     −6.62    −22.57    −7.83    −4.36
MAPE        15.98     20.07     24.19     37.01    22.69    17.29

Table 60: Relative performances of 6 forecasting methods — Anheuser–Busch data


From computer software, autoregressions based on lags of 1, 2, and 3 periods are fit:

1–Period: Yt = 0.29 + 0.88Yt−1

2–Period: Yt = 0.29 + 1.18Yt−1 − 0.29Yt−2

3–Period: Yt = 0.28 + 1.21Yt−1 − 0.37Yt−2 + 0.05Yt−3

Table 62 gives the raw data and forecasts based on the three autoregression models. Table 61 gives the forecasting errors. Figure 31 displays the actual outcomes and predictions.

                Autoregression
Measure   1–Period  2–Period  3–Period
ME           0.00      0.00      0.00
MAE          0.41      0.38      0.39
MSE          0.27      0.24      0.24
MPE         −3.47     −3.13     −3.16
MAPE        16.02     15.14     15.45

Table 61: Relative performances of 3 forecasting methods — Anheuser–Busch data

How do these methods of forecasting compare with moving averages and exponential smoothing?

[Figure: DIV_YLD (0–7) vs CAL_YEAR (1950–2000); series: Actual, AR(1), AR(2), AR(3)]

Figure 31: Plot of the data and Autoregressive forecasts for Anheuser–Busch dividend data


t Year Xt FAR(1),t eAR(1),t FAR(2),t eAR(2),t FAR(3),t eAR(3),t

1 1952 5.3 . . . . . .
2 1953 4.2 4.96 -0.76 . . . .
3 1954 3.9 3.99 -0.09 3.72 0.18 . .
4 1955 5.2 3.72 1.48 3.68 1.52 3.72 1.48
5 1956 5.8 4.87 0.93 5.30 0.50 5.35 0.45
6 1957 6.3 5.40 0.90 5.64 0.66 5.58 0.72
7 1958 5.6 5.84 -0.24 6.06 -0.46 6.03 -0.43
8 1959 4.8 5.22 -0.42 5.09 -0.29 5.03 -0.23
9 1960 4.4 4.52 -0.12 4.34 0.06 4.35 0.05
10 1961 2.8 4.16 -1.36 4.10 -1.30 4.12 -1.32
11 1962 3.2 2.75 0.45 2.33 0.87 2.29 0.91
12 1963 3.1 3.11 -0.01 3.26 -0.16 3.35 -0.25
13 1964 3.1 3.02 0.08 3.03 0.07 3.00 0.10
14 1965 2.6 3.02 -0.42 3.06 -0.46 3.05 -0.45
15 1966 2 2.58 -0.58 2.47 -0.47 2.44 -0.44
16 1967 1.6 2.05 -0.45 1.90 -0.30 1.90 -0.30
17 1968 1.3 1.70 -0.40 1.60 -0.30 1.61 -0.31
18 1969 1.2 1.43 -0.23 1.36 -0.16 1.37 -0.17
19 1970 1.2 1.35 -0.15 1.33 -0.13 1.34 -0.14
20 1971 1.1 1.35 -0.25 1.36 -0.26 1.36 -0.26
21 1972 0.9 1.26 -0.36 1.24 -0.34 1.23 -0.33
22 1973 1.4 1.08 0.32 1.03 0.37 1.03 0.37
23 1974 2 1.52 0.48 1.68 0.32 1.70 0.30
24 1975 1.9 2.05 -0.15 2.25 -0.35 2.23 -0.33
25 1976 2.3 1.96 0.34 1.96 0.34 1.92 0.38
26 1977 3.1 2.31 0.79 2.46 0.64 2.47 0.63
27 1978 3.5 3.02 0.48 3.29 0.21 3.28 0.22
28 1979 3.8 3.37 0.43 3.53 0.27 3.49 0.31
29 1980 3.7 3.64 0.06 3.77 -0.07 3.75 -0.05
30 1981 3.1 3.55 -0.45 3.56 -0.46 3.54 -0.44
31 1982 2.6 3.02 -0.42 2.88 -0.28 2.86 -0.26
32 1983 2.4 2.58 -0.18 2.47 -0.07 2.47 -0.07
33 1984 3 2.40 0.60 2.37 0.63 2.39 0.61
34 1985 2.4 2.93 -0.53 3.14 -0.74 3.16 -0.76
35 1986 1.8 2.40 -0.60 2.26 -0.46 2.20 -0.40
36 1987 1.7 1.87 -0.17 1.72 -0.02 1.73 -0.03
37 1988 2.2 1.79 0.41 1.78 0.42 1.80 0.40
38 1989 2.1 2.23 -0.13 2.40 -0.30 2.41 -0.31
39 1990 2.4 2.14 0.26 2.13 0.27 2.10 0.30
40 1991 2.1 2.40 -0.30 2.52 -0.42 2.53 -0.43
41 1992 2.2 2.14 0.06 2.08 0.12 2.05 0.15
42 1993 2.7 2.23 0.47 2.28 0.42 2.29 0.41
43 1994 3 2.67 0.33 2.84 0.16 2.85 0.15
44 1995 2.8 2.93 -0.13 3.05 -0.25 3.03 -0.23

Table 62: Average dividend yields and forecasts/errors based on autoregression with lags of 1, 2, and 3 periods


17 Lecture 17 — Autocorrelation

Textbook Section: 15.5
Problems: See Lecture

Recall a key assumption in regression: error terms are independent. When data are collected over time, the errors are often serially correlated (autocorrelated). Under first-order autocorrelation, consecutive error terms are linearly related:

εt = ρεt−1 + νt

where ρ is the correlation between consecutive error terms, and νt is a normally distributed independent error term. When errors display a positive correlation, ρ > 0 (consecutive error terms are associated). We can test this relation as follows; note that when ρ = 0, error terms are independent (which is the assumption in the derivation of the tests in the chapters on linear regression).

Durbin–Watson Test for Autocorrelation

H0 : ρ = 0 (No autocorrelation)    Ha : ρ > 0 (Positive autocorrelation)

D = ∑t=2,…,n (et − et−1)² / ∑t=1,…,n et²

D ≥ dU =⇒ Don't Reject H0
D ≤ dL =⇒ Reject H0
dL ≤ D ≤ dU =⇒ Withhold judgement

Values of dL and dU (indexed by n and k, the number of predictor variables) are given in Table A.9, p. A–27.
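The D statistic itself is straightforward to compute from the residuals; a sketch (helper name ours):

```python
def durbin_watson(e):
    """D = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2.

    D near 0 suggests positive autocorrelation, D near 2 is
    consistent with independent errors, and D near 4 suggests
    negative autocorrelation.
    """
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0: identical consecutive residuals
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: alternating residuals push D toward 4
```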

“Cures” for Autocorrelation:

• Additional independent variable(s) — A variable may be missing from the model that will eliminate the autocorrelation (see example).

• Transform the variables — Take “first differences” (Xt+1 − Xt) and (Yt+1 − Yt) and run the regression with the transformed Y and X.
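The first-differences transformation in the second bullet is just the series of consecutive changes; a one-line sketch:

```python
def first_differences(z):
    """Return the series of consecutive changes (z_{t+1} - z_t)."""
    return [b - a for a, b in zip(z, z[1:])]

print(first_differences([10, 12, 15, 14]))  # [2, 3, -1]
```

Running the regression on the differenced Y and X (one fewer observation) often removes much of the serial correlation in the errors.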

Example — Autocorrelation — P&G Sales and CPI
Y — Quarterly Sales for Procter & Gamble (1965(q1)–1995(q4))

X — Consumer Price Index for quarter (1982–1984=100)

(Data Sources: Value Line and Economic Indicators Handbook (3rd Ed.))

Simple Regression: Yt = b0 + b1Xt = −1742.62 + 58.353Xt

The raw data are given in Table 63, and plotted (with the fitted equation) in Figure 32. Figure 33 gives a plot of residuals vs time order. Notice the distinct pattern in the residuals and that consecutive residuals are very close to one another.


Compute the first three residuals, and their contributions to the numerator and denominator of the D–W statistic.

For the entire sample, we obtain: n = 124, k = 1, D = 0.092 — Test for autocorrelation.


           1st Qtr        2nd Qtr        3rd Qtr        4th Qtr
Year    Sales   CPI    Sales   CPI    Sales   CPI    Sales   CPI
1965    523.0  31.2    486.9  31.5    527.8  31.6    520.9  31.7
1966    558.5  32.0    531.2  32.3    591.8  32.6    561.7  32.9
1967    643.7  32.9    564.0  33.2    633.5  33.5    597.5  33.8
1968    659.9  34.2    590.1  34.5    668.8  35.0    623.8  35.4
1969    695.3  35.8    648.4  36.4    694.6  37.0    669.3  37.5
1970    747.7  38.0    706.8  38.6    760.0  39.1    764.3  39.6
1971    815.2  39.9    764.0  40.3    799.6  40.8    799.3  41.0
1972    904.6  41.3    816.4  41.6    915.0  42.0    878.4  42.4
1973    975.2  42.9    913.3  43.9   1024.3  44.9    993.9  45.9
1974   1158.6  47.2   1136.1  48.5   1338.9  50.0   1278.7  51.5
1975   1530.6  52.4   1455.6  53.2   1587.8  54.4   1507.7  55.2
1976   1586.7  55.8   1541.2  56.5   1736.8  57.4   1648.0  58.0
1977   1830.0  59.0   1731.0  60.3   1921.0  61.2   1802.0  61.9
1978   1932.0  62.9   1929.0  64.5   2173.0  66.1   2065.0  67.4
1979   2286.0  69.1   2249.0  71.5   2457.0  73.8   2337.0  75.9
1980   2664.0  78.9   2622.0  81.8   2790.0  83.3   2696.0  85.5
1981   2908.0  87.8   2759.0  89.8   2949.0  92.4   2800.0  93.7
1982   3026.0  94.5   2895.0  95.9   3094.0  97.7   2979.0  97.9
1983   3201.0  97.9   3030.0  99.1   3131.0 100.3   3090.0 101.2
1984   3277.0 102.3   3135.0 103.4   3238.0 104.5   3251.0 105.3
1985   3485.0 106.0   3375.0 107.3   3350.0 108.0   3342.0 109.0
1986   3605.0 109.2   3865.0 109.0   4081.0 109.8   3888.0 110.4
1987   4356.0 111.6   4255.0 113.1   4222.0 114.4   4167.0 115.4
1988   4664.0 116.1   4839.0 117.5   4860.0 119.1   4973.0 120.3
1989   5267.0 121.7   5268.0 123.7   5430.0 124.7   5433.0 125.9
1990   5807.0 128.0   6025.0 129.3   6123.0 131.6   6126.0 133.7
1991   6652.0 134.8   6857.0 135.6   6795.0 136.7   6722.0 137.7
1992   7205.0 138.7   7597.0 139.8   7483.0 140.9   7167.0 141.9
1993   7879.0 143.1   7839.0 144.2   7350.0 144.8   7365.0 145.8
1994   7564.0 146.7   7788.0 147.6   7441.0 148.9   7503.0 149.6
1995   8161.0 150.9   8467.0 152.2   8312.0 152.9   8494.0 153.6

Table 63: Quarterly Sales for P&G (Y ) and CPI (X) — 1965–1995

Here, we attempt to cure the autocorrelation. The relationship appears approximately linear with two slopes, with the split at 1985(q4) (CPI = 109.0).

A LEXIS/NEXIS search shows that the company bought some OTC drug companies around this time. Could this have changed the rate of increase? Also, there was an uproar that their corporate logo was Satanic around this time. Maybe a deal with Satan — increased revenues for advertising space?

We fit a piecewise linear regression model with an interesting use of dummy variables and interaction terms:

Yt = b0 + b1X1t + b2(X1t − 109.0)X2t

where: X1t is CPI at time t, X2t is 1 if after 1985(q4), 0 if before. We obtain the following


[Figure: SALES (−1000 to 9000) vs CPI (0 to 200)]

Figure 32: Plot of sales vs CPI and the fitted equation — P&G data (Model 1)

[Figure: Residual (−2000 to 2000) vs TIME (0 to 140)]

Figure 33: Plot of residuals vs time order — P&G data (Model 1)


fitted equation:

Yt = −757.98 + 40.548X1t + 68.580(X1t − 109.0)X2t
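The fitted piecewise model can be evaluated directly; a sketch (helper name ours; the coefficients and the 109.0 knot are from the notes, and the sample CPI value is from Table 63):

```python
def predict_sales(cpi, after_1985q4):
    """Piecewise fit: Y = b0 + b1*X1 + b2*(X1 - 109.0)*X2,
    where X2 = 1 for quarters after 1985(q4), else 0."""
    b0, b1, b2 = -757.98, 40.548, 68.580   # fitted coefficients from the notes
    x2 = 1.0 if after_1985q4 else 0.0
    return b0 + b1 * cpi + b2 * (cpi - 109.0) * x2

# 1995(q1): CPI = 150.9, after the 1985(q4) split
print(round(predict_sales(150.9, True), 1))  # about 8234.2 (actual sales: 8161.0)
```

After the knot the slope with respect to CPI is b1 + b2 = 40.548 + 68.580 ≈ 109.1, versus 40.548 before it, which is how the single interaction term produces the two-slope fit.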

Figure 34 gives a plot of the raw data and fitted equation, and Figure 35 gives a plot of residuals vs time order. Is there a pattern in the residuals? Are consecutive residuals close or far apart? Compute the last three residuals, and their contributions to the numerator and denominator of the D–W statistic. For the full sample, we obtain n = 124, k = 2, D = 0.986 — Test for autocorrelation.

[Figure: SALES (0 to 9000) vs CPI (0 to 200)]

Figure 34: Plot of sales vs CPI and the fitted equation — P&G data (Model 2)

[Figure: Residual (−600 to 600) vs TIME (0 to 140)]

Figure 35: Plot of residuals vs time order — P&G data (Model 2)