
Lecture Notes for

AgEc 2225: Statistical Analysis

Spring 2015

Matt Sveum

University of Missouri

Department of Agricultural and Applied Economics


Contents

A Syllabus

1 Introduction

2 Descriptive Statistics: Tabular and Graphical Presentations

3 Descriptive Statistics: Numerical Measures
3.1 Measures of Location
3.2 Measures of Variability
3.3 Measures of Distribution Shape, Relative Location, and Detection of Outliers
3.4 Measures of Association Between Two Variables
3.5 The Weighted Mean and Working with Grouped Data

4 Introduction to Probability

5 Discrete Probability Distributions
5.1 Binomial Probability Distribution
5.2 Poisson Probability Distribution
5.3 Hypergeometric Probability Distribution

6 Continuous Probability Distributions
6.1 Uniform Probability Distribution
6.2 Normal Probability Distribution
6.3 Normal Approximation of Binomial Probabilities
6.4 Exponential Probability Distribution

7 Sampling and Sampling Distributions

8 Interval Estimation
8.1 Population Mean: σ Known
8.2 Population Mean: σ Unknown
8.3 Determining the Sample Size

9 Hypothesis Tests
9.1 Developing a Null and Alternative Hypothesis
9.2 Type I and Type II Errors
9.3 Population Mean: σ Known
9.4 Population Mean: σ Unknown
9.5 Population Proportion

10 Comparisons Involving Means
10.1 Inferences About the Difference Between Two Population Means: σ1 and σ2 Known
10.2 Inferences About the Difference Between Two Population Means: σ1 and σ2 Unknown
10.3 Inferences About the Difference Between Two Population Means: Matched Samples

11 Simple Linear Regression
11.1 Ordinary Least Squares
11.2 Coefficient of Determination
11.3 The t-test
11.4 A Few Other Topics

12 Multiple Regression
12.1 Testing for Significance
12.2 Categorical Independent Variables
12.3 Nonlinear Models
12.3.1 Quadratic Models
12.3.2 Logarithmic Models

A Syllabus

AgEcon 2225: Statistical Analysis
Spring 2015

Instructor: Matt Sveum

Email: [email protected] (NOTE: This is not an @missouri.edu email address!)

Class Time & Location: 9:30am – 10:45am; Middlebush 210

Office Hours: 11:00–12:00 Tuesday and Thursday; and by appointment

Office Location: 135C Mumford Hall

Course Description

Requirements

Prerequisites

The prerequisites for this course are AgEc 1042 (Microeconomics) and MATH 1100 (College Algebra). I will assume that you have completed these courses, and that you remember the content. If it has been a while since you took these courses, you may want to go back and review the material. You are also welcome to meet with me to discuss material that does not make sense.

Textbook

The main book for this class is Essentials of Statistics for Business and Economics by David Anderson, Dennis Sweeney, and Thomas Williams. This book is available from the Mizzou Store, or other fine book sellers. You may use an older edition if you would like, but you are responsible for chapter and page differences.

You are expected to read the relevant chapters before coming to class. This will help you to better understand the material being taught and to ask better questions. Additionally, clicker quizzes may contain questions on material from the book.

Technology

You are required to have a clicker for this class. You can purchase a clicker from the bookstore. Follow instructions on Blackboard on how to register it for the class. Clickers must be registered by January 29. Clicker quizzes after this date will count toward your grade.

I will post homework and reading assignments, as well as announcements, on Blackboard. You are required to check Blackboard regularly, and all announcements on Blackboard will be assumed to have been read. I will also post grades on Blackboard. You have one week from the date that a grade is posted to alert me of any mistakes I made. After that time, grade changes cannot be guaranteed.

Course Elements

Exams (65%)

There will be two midterm exams and one final. The midterms will be worth 20% of the course grade each and will be on February 19 and April 9 in class. The final exam will be worth 25% and will be on Friday May 15 at 7:30am. By signing up for this class, you agree to take the final exam as scheduled. All exams are closed book. You will be allowed to use a calculator, but you may not use a cell phone or iPod Touch-type device as a calculator. So make sure that you have a calculator by the first exam. Exams have a strict no-cheating policy. If you are caught using a cell phone, notes, etc. during the exams, you will receive a zero for the exam.

Homework (20%)

Homework assignments are worth 10 points each and are graded based on your understanding of the material. Assignments will be due at the beginning of class on the due date. Assignments can be handed in until the end of class for a two-point penalty. After class has ended, assignments will not be accepted. I will drop the lowest two homework grades.

Quizzes and Worksheets (15%)

There will be clicker quizzes randomly throughout the semester. Each one will consist of 2-4 questions and will help you test your understanding of course topics. Each one will be worth a small number of points. They must be taken with a clicker. Worksheets will be done in class, usually on Thursdays, and are graded based on completion. You need to be in class to get full credit. If you miss class on a worksheet day, you can print it off of Blackboard and turn it in at the beginning of the next class for half credit. I will drop your two lowest worksheet grades.

Extra Credit (+5%)

Extra credit will be given before each exam, and potentially at other times during the semester. If you do all extra credit assignments, your grade will go up by five percentage points (half of a letter grade). Doing less than all extra credit will increase your final grade by less than five percentage points.

Classroom Expectations

Economics is a subject that requires you to interact closely with the models used in class. Pencil and paper is best for doing this. For this reason, and because of the distractions that they cause, laptops/iPads/etc. are not acceptable note-taking devices in class. I expect that you keep cell phones put away, as well. I reserve the right to determine attendance points based on how attentive you are in class. Texting is considered inattentive.

Academic Dishonesty

Academic integrity is fundamental to the activities and principles of a university. All members of the academic community must be confident that each person’s work has been responsibly and honorably acquired, developed, and presented. Any effort to gain an advantage not given to all students is dishonest whether or not the effort is successful. The academic community regards breaches of the academic integrity rules as extremely serious matters. Sanctions for such a breach may include academic sanctions from the instructor, including failing the course for any violation, to disciplinary sanctions ranging from probation to expulsion. When in doubt about plagiarism, paraphrasing, quoting, collaboration, or any other form of cheating, consult the course instructor.


Students with Disabilities

If you anticipate barriers related to the format or requirements of this course, if you have emergency medical information to share with me, or if you need to make arrangements in case the building must be evacuated, please let me know as soon as possible.

If disability related accommodations are necessary (for example, a note taker, extended time on exams, captioning), please register with the Office of Disability Services (http://disabilityservices.missouri.edu), S5 Memorial Union, 882-4696, and then notify me of your eligibility for reasonable accommodations. For other MU resources for students with disabilities, click on “Disability Resources” on the MU homepage.

Intellectual Pluralism

The University community welcomes intellectual diversity and respects student rights. Students who have questions or concerns regarding the atmosphere in this class (including respect for diverse opinions) may contact the Departmental Chair or Divisional Director; the Director of the Office of Students Rights and Responsibilities (http://osrr.missouri.edu/); or the MU Equity Office (http://equity.missouri.edu/), or by email at [email protected]. All students will have the opportunity to submit an anonymous evaluation of the instructor(s) at the end of the course.

Recording

University of Missouri System Executive Order No. 38 lays out principles regarding the sanctity of classroom discussions at the university. The policy is described fully in Section 200.015 of the Collected Rules and Regulations. In this class, students may make audio recordings of course activity unless specifically prohibited by the faculty member. However, the redistribution of audio recordings of statements or comments from the course to individuals who are not students in the course is prohibited without the express permission of the faculty member and of any students who are recorded. Students found to have violated this policy are subject to discipline in accordance with provisions of Section 200.020 of the Collected Rules and Regulations of the University of Missouri pertaining to student conduct matters. Video recordings are not allowed.


1 Introduction

Why study statistics?

Fund Name                        Fund Type   Net Asset Value ($)   5 Year Average Return (%)   Expense Ratio (%)   Morningstar Rank
American Century Intl. Disc      IE          14.37                 30.53                       1.41                3-Star
American Century Tax-Free Bond   FI          10.73                 3.34                        0.49                4-Star
American Century Ultra           DE          24.94                 10.88                       0.99                3-Star

Scales of Measurement:


2 Descriptive Statistics: Tabular and Graphical Presentations

Coke Classic    Sprite          Pepsi
Diet Coke       Coke Classic    Coke Classic
Pepsi           Diet Coke       Coke Classic
Diet Coke       Coke Classic    Coke Classic
Coke Classic    Diet Coke       Pepsi
Coke Classic    Coke Classic    Dr. Pepper
Dr. Pepper      Sprite          Coke Classic
Diet Coke       Pepsi           Diet Coke
Pepsi           Coke Classic    Pepsi
Pepsi           Coke Classic    Pepsi
Coke Classic    Coke Classic    Pepsi
Dr. Pepper      Pepsi           Pepsi
Sprite          Coke Classic    Coke Classic
Coke Classic    Sprite          Dr. Pepper
Diet Coke       Dr. Pepper      Pepsi
Coke Classic    Pepsi           Sprite
Coke Classic    Diet Coke

Table 2.1: Data from a sample of 50 soft drink purchases.


Soft Drink      Frequency

Coke Classic
Diet Coke
Dr. Pepper
Pepsi
Sprite

Total           50

Table 2.2: Frequency Distribution.

Soft Drink      Relative Frequency

Coke Classic
Diet Coke
Dr. Pepper
Pepsi
Sprite

Total           1.00

Table 2.3: Relative Frequency Distribution.
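As a way to check the tables once they are filled in by hand, here is a minimal Python sketch that tallies the 50 purchases in Table 2.1. The one-letter codes and variable names are assumptions made for this illustration, not part of the course materials.

```python
from collections import Counter

# Table 2.1 read row by row. The one-letter codes are ours:
# C = Coke Classic, D = Diet Coke, R = Dr. Pepper, P = Pepsi, S = Sprite.
names = {"C": "Coke Classic", "D": "Diet Coke", "R": "Dr. Pepper",
         "P": "Pepsi", "S": "Sprite"}
codes = ("CSP DCC PDC DCC CDP CCR RSC DPD PCP PCP "
         "CCP RPP SCC CSR DRP CPS CD")
purchases = [names[c] for c in codes if c != " "]

counts = Counter(purchases)   # frequency of each soft drink
n = len(purchases)            # total number of purchases

print(f"{'Soft Drink':<14}{'Freq':>5}{'Rel. Freq':>11}")
for drink in sorted(counts):
    # relative frequency = frequency / n
    print(f"{drink:<14}{counts[drink]:>5}{counts[drink] / n:>11.2f}")
```

The relative frequencies are just the frequencies divided by n, so they sum to 1.00.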

Figure 2.1: Pie and Bar Charts.


14  24  18  22  20  22  16  12
19  18  16  22  24  23  19  25
24  17  15  16  20  25  21  19
19  23  24  16  21  25  23  24
16  26  21  16  22  19  20  20

Table 2.4: Example using quantitative data.

Figure 2.2: Frequency histogram and relative frequency histogram.

57  68  75  80
58  70  75  82
64  72  76  83
65  72  78  85

Table 2.5: Stem-and-Leaf Example Data.


Observation  x  y     Observation  x  y     Observation  x  y

 1           A  1     11           A  1     21           C  2
 2           B  1     12           B  1     22           B  1
 3           B  1     13           C  2     23           C  2
 4           C  2     14           C  2     24           A  1
 5           B  1     15           C  2     25           B  1
 6           C  2     16           B  2     26           C  2
 7           B  1     17           C  1     27           C  2
 8           C  2     18           B  1     28           A  1
 9           A  1     19           C  1     29           B  1
10           B  1     20           B  1     30           B  2

Table 2.6: Crosstabulation Example

x              1      2      Grand Total

A
B
C

Grand Total

Table 2.7: Crosstabulation Example

x              1      2      Grand Total

A
B
C

Grand Total

Table 2.8: Crosstabulation Example – Percents
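The crosstabulation can be checked with a short tally of the 30 (x, y) pairs from Table 2.6. This is only a sketch; the string encoding of the data and the names `xs`, `ys`, and `cells` are choices made for this example.

```python
from collections import Counter

# Observations 1-30 from Table 2.6, ten per string, in order.
xs = "ABBCBCBCAB" "ABCCCBCBCB" "CBCABCCABB"
ys = "1112121211" "1122221111" "2121122112"

cells = Counter(zip(xs, ys))   # count of each (x, y) combination

print("x   y=1  y=2  Total")
for x in "ABC":
    row = [cells[(x, y)] for y in "12"]
    print(f"{x}  {row[0]:>4} {row[1]:>4} {sum(row):>6}")
col = [sum(cells[(x, y)] for x in "ABC") for y in "12"]
print(f"Total {col[0]:>2} {col[1]:>4} {sum(col):>6}")
```

Dividing each cell by its row total (or by the grand total) gives a percent version of the same table.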


Observation    x     y      Observation    x     y

 1           -22    22      11           -37    48
 2           -33    49      12            34   -29
 3             2     8      13             9   -18
 4            29   -16      14           -33    31
 5           -13    10      15            20   -16
 6            21   -28      16            -3    14
 7           -13    27      17           -15    18
 8           -23    35      18            12    17
 9            14    -5      19           -20   -11
10             3    -3      20            -7   -22

Table 2.9: Scatter Plot Example

Figure 2.3: Scatter Plot with Trendline


3 Descriptive Statistics: Numerical Measures

Figure 3.1: Population vs. Sample.

3.1 Measures of Location

Sample Mean:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{3.1} \]

Population Mean:

\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \tag{3.2} \]

Example: 10, 20, 12, 17, 16


Median and Mode:

Example: 10, 20, 12, 17, 16

Percentiles:

The first step in calculating the pth percentile is to arrange the data from smallest to largest. Second, compute an index i:

\[ i = \left( \frac{p}{100} \right) n \tag{3.3} \]

where p is the percentile of interest and n is the number of observations.

Example: 27, 25, 20, 15, 30, 34, 28, 25.
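The two steps above can be sketched in Python. The notes stop at the index i; the rule used here for turning i into a value (round up when i is not an integer, average positions i and i + 1 when it is) is the usual textbook convention and is an assumption of this sketch, as is the function name `percentile`.

```python
def percentile(data, p):
    """p-th percentile (integer p, 0 < p < 100) via the index i = (p/100) n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                  # step 1: smallest to largest
    n = len(values)
    whole, remainder = divmod(p * n, 100)  # step 2: i = (p/100) n, kept exact
    if remainder:
        # i is not an integer: round up to position whole + 1 (1-based),
        # which is index `whole` in a 0-based list.
        return values[whole]
    # i is an integer: average the values in positions i and i + 1 (1-based).
    return (values[whole - 1] + values[whole]) / 2


data = [27, 25, 20, 15, 30, 34, 28, 25]
for p in (20, 25, 65, 75):
    print(p, percentile(data, p))
```

Other software may use a different interpolation rule, so results can differ slightly from this convention.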


3.2 Measures of Variability

Range:

\[ \text{Range} = \text{Largest value} - \text{Smallest value} \tag{3.4} \]

Example: 10, 20, 12, 17, 16

Interquartile Range:

\[ \text{Interquartile range} = Q_3 - Q_1 \tag{3.5} \]

Example: 10, 20, 12, 17, 16

Variance:

For the population:

\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \tag{3.6} \]

For the sample:

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} \tag{3.7} \]


Example: 10, 20, 12, 17, 16

Standard Deviation:

For the population:

\[ \sigma = \sqrt{\sigma^2} \tag{3.8} \]

For the sample:

\[ s = \sqrt{s^2} \tag{3.9} \]

Example: 10, 20, 12, 17, 16
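For checking hand calculations on the running example, a minimal sketch using Python's standard `statistics` module is shown below. It treats the five values once as a sample and once as a full population, purely for illustration.

```python
import statistics

data = [10, 20, 12, 17, 16]
n = len(data)

mean = statistics.mean(data)             # (3.1)
median = statistics.median(data)
data_range = max(data) - min(data)       # (3.4)

pop_var = statistics.pvariance(data)     # (3.6): divides by N
samp_var = statistics.variance(data)     # (3.7): divides by n - 1
# Shortcut form on the right-hand side of (3.7).
shortcut = (sum(x**2 for x in data) - n * mean**2) / (n - 1)
samp_sd = statistics.stdev(data)         # (3.9)

print(mean, median, data_range)
print(pop_var, samp_var, shortcut, samp_sd)
```

Both forms of (3.7) give the same sample variance, which is a useful check on the arithmetic.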

Coefficient of Variation:

\[ \left( \frac{\text{Standard deviation}}{\text{Mean}} \right) \times 100 \tag{3.10} \]


3.3 Measures of Distribution Shape, Relative Location, and Detection of Outliers

Skewness:

\[ \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3 \tag{3.11} \]

z-score:

\[ z_i = \frac{x_i - \bar{x}}{s} \tag{3.12} \]

Example: 10, 20, 12, 17, 16
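A short sketch of the z-score and skewness formulas on the same five values follows, with the coefficient of variation (3.10) included since it uses the same mean and standard deviation. Variable names are illustrative.

```python
import statistics

data = [10, 20, 12, 17, 16]
n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)               # sample standard deviation

cv = s / mean * 100                      # (3.10), in percent
z = [(x - mean) / s for x in data]       # (3.12), one z-score per value
skewness = n / ((n - 1) * (n - 2)) * sum(zi**3 for zi in z)   # (3.11)

print(round(cv, 2))
print(z)
print(skewness)
```

A slightly negative skewness here means the data lean a little to the left of the mean.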

Chebyshev’s Theorem


3.4 Measures of Association Between Two Variables

Covariance:

For the sample:

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \tag{3.13} \]

For the population:

\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \tag{3.14} \]

Example:

xi 4 6 11 3 16

yi 50 50 40 60 30

Table 3.1: Covariance Example

Correlation:

For the sample:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \tag{3.15} \]

For the population:

\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{3.16} \]
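To illustrate (3.13) and (3.15), the sketch below works through the data in Table 3.1, treating it as a sample. It is written out step by step rather than using a library routine so that each term matches the formulas.

```python
import math

x = [4, 6, 11, 3, 16]
y = [50, 50, 40, 60, 30]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance (3.13).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient (3.15).
r_xy = s_xy / (s_x * s_y)

print(s_xy, round(r_xy, 4))
```

The covariance is negative and the correlation is close to -1, indicating a strong negative linear relationship in this small data set.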


Figure 3.2: Correlation and Covariance.

3.5 The Weighted Mean and Working with Grouped Data

The weighted mean:

\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \tag{3.17} \]

xi 3.2 2.0 2.5 5.0

wi 6 3 2 8

Table 3.2: Weighted Average
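As a quick check of (3.17) on Table 3.2, a minimal sketch:

```python
x = [3.2, 2.0, 2.5, 5.0]   # values
w = [6, 3, 2, 8]           # weights

# Weighted mean (3.17): sum of w_i * x_i divided by the sum of the weights.
weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
simple_mean = sum(x) / len(x)   # unweighted mean, for comparison

print(round(weighted_mean, 4), round(simple_mean, 4))
```

The weighted mean is pulled toward 5.0 and 3.2 because those values carry the largest weights.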


4 Introduction to Probability

Experiment                   Outcomes

Toss a coin                  Head, tail
Select part for inspection   Pass, fail
Conduct a sales call         Purchase, no purchase
Roll a die                   1, 2, 3, 4, 5, 6
Play a football game         Win, lose, tie

Table 4.1: Experiments and Outcomes

Example: A project has two stages. The first stage can take 2, 3, or 4 months to complete. The second stage can take 6, 7, or 8 months to complete.


Combinations:

\[ C^N_n = \binom{N}{n} = \frac{N!}{n!(N-n)!} \tag{4.1} \]

Example: N = 5, n = 2

Example: We randomly draw 6 lottery balls from a basket of 53:

Example: A pizza restaurant offers ten different toppings for its pizzas. In how many ways can a customer order three toppings for his pizza?

Permutations:

\[ P^N_n = n! \binom{N}{n} = \frac{N!}{(N-n)!} \tag{4.2} \]


Example: Select 2 of 5, and order matters:

Example: A small college has ten math majors and will select three of them to receive scholarships. In how many ways can the selections be made if:

1. The scholarships are each for $500?

2. The scholarships are for $900, $500, and $250, respectively?
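The counting rules (4.1) and (4.2) are available in Python's standard library as `math.comb` and `math.perm` (Python 3.8 or later). The sketch below applies them to the examples above; treating equal scholarships as a combination and unequal ones as a permutation is our reading of the question.

```python
from math import comb, perm

print(comb(5, 2))     # N = 5, n = 2, order does not matter
print(comb(53, 6))    # 6 lottery balls from a basket of 53
print(comb(10, 3))    # three pizza toppings from ten
print(perm(5, 2))     # select 2 of 5 when order matters

# Scholarships: identical $500 awards -> order is irrelevant (combination);
# awards of $900, $500, $250 -> who gets which matters (permutation).
equal_awards = comb(10, 3)
ranked_awards = perm(10, 3)
print(equal_awards, ranked_awards)
```

Note that perm(N, n) equals n! times comb(N, n), as (4.2) states.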

Requirements for assigning probabilities:

1. The probability assigned to each experimental outcome must be between 0 and 1, inclusive. If we let E_i denote the ith experimental outcome and P(E_i) its probability, then this requirement can be written as

\[ 0 \le P(E_i) \le 1 \quad \forall i \tag{4.3} \]

2. The sum of the probabilities for all experimental outcomes must equal 1.0. For n experimental outcomes, this requirement can be written as:

\[ P(E_1) + P(E_2) + \cdots + P(E_n) = 1 \tag{4.4} \]


Stage 1 Design    Stage 2 Construction    Number of Past Occurrences

2                 6                       6
2                 7                       6
2                 8                       2
3                 6                       4
3                 7                       8
3                 8                       2
4                 6                       2
4                 7                       4
4                 8                       6

Table 4.2: Probability of Events.
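One way to turn Table 4.2 into probabilities is the relative frequency approach: divide each count by the total number of past projects. The sketch below does this and checks requirements (4.3) and (4.4). The event "project finished in 10 months or less" is a hypothetical question chosen for illustration.

```python
# (stage 1 months, stage 2 months) -> number of past occurrences (Table 4.2)
past = {(2, 6): 6, (2, 7): 6, (2, 8): 2,
        (3, 6): 4, (3, 7): 8, (3, 8): 2,
        (4, 6): 2, (4, 7): 4, (4, 8): 6}

total = sum(past.values())
prob = {outcome: count / total for outcome, count in past.items()}

# Requirements (4.3) and (4.4): each probability in [0, 1], sum equal to 1.
assert all(0 <= p <= 1 for p in prob.values())
assert abs(sum(prob.values()) - 1) < 1e-9

# Hypothetical event: total completion time of 10 months or less.
p_10_or_less = sum(p for (s1, s2), p in prob.items() if s1 + s2 <= 10)
print(total, round(p_10_or_less, 2))
```

An event's probability is the sum of the probabilities of the sample points it contains.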


The addition law:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{4.5} \]

For mutually exclusive events:

\[ P(A \cup B) = P(A) + P(B) \tag{4.6} \]

Example: Suppose we have a sample space with five equally likely outcomes: E1, E2, E3, E4, and E5. Let:

A = {E1, E2}
B = {E3, E4}
C = {E2, E3, E5}

Example:

M = Event an officer is a man

W = Event an officer is a woman

A = Event an officer is promoted

Ac = Event an officer is not promoted

                Men    Women    Total

Promoted        288     36       324
Not Promoted    672    204       876

Total           960    240      1200

Table 4.3: Conditional Probability


Conditional probabilities can be calculated as:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{4.7} \]

\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \tag{4.8} \]

Two events A and B are independent if

\[ P(A \mid B) = P(A) \tag{4.9} \]

or

\[ P(B \mid A) = P(B) \tag{4.10} \]

The multiplication law tells us that

\[ P(A \cap B) = P(B) P(A \mid B) \tag{4.11} \]

or

\[ P(A \cap B) = P(A) P(B \mid A) \tag{4.12} \]

For independent events, the multiplication law is:

\[ P(A \cap B) = P(A) P(B) \tag{4.13} \]
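The promotion data in Table 4.3 give a concrete case for these formulas. The sketch below computes joint, marginal, and conditional probabilities and then checks the independence condition (4.9); variable names are illustrative.

```python
# Counts from Table 4.3: M = man, W = woman, A = promoted.
promoted = {"M": 288, "W": 36}
not_promoted = {"M": 672, "W": 204}
total = sum(promoted.values()) + sum(not_promoted.values())

p_A = sum(promoted.values()) / total                         # P(A)
p_M = (promoted["M"] + not_promoted["M"]) / total            # P(M)
p_W = (promoted["W"] + not_promoted["W"]) / total            # P(W)

p_A_and_M = promoted["M"] / total                            # P(A ∩ M)
p_A_and_W = promoted["W"] / total                            # P(A ∩ W)

p_A_given_M = p_A_and_M / p_M                                # (4.7)
p_A_given_W = p_A_and_W / p_W

# Independence (4.9) would require P(A | M) = P(A).
independent = abs(p_A_given_M - p_A) < 1e-9

print(round(p_A, 2), round(p_A_given_M, 2), round(p_A_given_W, 2), independent)
```

Because P(A | M) differs from P(A), promotion and gender are not independent events in this table.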

Example: For trucks on a certain stretch of highway, 23% have faulty brakes, 24% have worn tires, and 38% have faulty brakes or worn tires. If a truck is stopped at random, what is the probability that:

1. It will have faulty brakes and worn tires?

2. It will have neither faulty brakes nor worn tires?

3. It will not have faulty brakes and it will not have worn tires?


Example: Of freshmen at a particular college, 25% are taking economics, 82% are taking mathematics, and 14% are taking both. If a randomly selected freshman is found to be taking economics, what is the probability that he or she will also be enrolled in mathematics?

Example: Five percent of a call center’s calls go to wrong numbers. If three calls are randomly selected, what is the probability that all three went to wrong numbers?
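The truck, freshman, and call center examples each use one of the laws above. The following sketch shows one way to set them up; for the call center it assumes the three calls are independent, which the example implies but does not state.

```python
# Trucks: addition law (4.5) rearranged to solve for the intersection.
p_brakes, p_tires, p_either = 0.23, 0.24, 0.38
p_both = p_brakes + p_tires - p_either      # faulty brakes AND worn tires
p_neither = 1 - p_either                    # complement of "brakes OR tires"

# Freshmen: conditional probability (4.7).
p_econ, p_econ_and_math = 0.25, 0.14
p_math_given_econ = p_econ_and_math / p_econ

# Call center: multiplication law for independent events (4.13),
# assuming each call is independent of the others.
p_wrong = 0.05
p_all_three_wrong = p_wrong ** 3

print(round(p_both, 2), round(p_neither, 2))
print(round(p_math_given_econ, 2), round(p_all_three_wrong, 6))
```

Questions 2 and 3 in the truck example describe the same event, so they share one answer.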


5 Discrete Probability Distributions

Experiment                          Random Variable               Possible Values

Contact 5 customers                 # who place order             0, 1, 2, 3, 4, 5
Inspect shipment of 50 radios       Number of defective radios    0, 1, 2, 3, ..., 49, 50
Operate a restaurant for one day    # of customers                0, 1, 2, 3, ...
Sell an automobile                  Gender of the customer        0 = male; 1 = female

Table 5.1: Examples of discrete random variables.

Experiment                     Random Variable                                Possible Values

Operate a bank                 Time between customer arrivals                 x ≥ 0
Fill a soda can                Number of ounces                               0 ≤ x ≤ 12.1
Construct a new library        Percent of project complete within 6 months    0 ≤ x ≤ 100
Test a new chemical process    Temperature when reaction takes place          150 ≤ x ≤ 212

Table 5.2: Examples of continuous random variables.

Are the following discrete or continuous?

• Number of test questions answered correctly.

• Number of cars arriving at a tollbooth.

• Number of pounds.

• Number of returns containing errors.

• Number of nonproductive hours in an eight-hour workday.

Required conditions for a discrete probability function:

\[ f(x) \ge 0 \tag{5.1} \]

\[ \sum f(x) = 1 \tag{5.2} \]


Uniform probability distribution:

\[ f(x) = \frac{1}{n} \tag{5.3} \]

where n is the number of values the random variable may have.

Example:

Job Satisfaction    Senior Executives (%)    Middle Managers (%)

1                    5                        4
2                    9                       10
3                    3                       12
4                   42                       46
5                   41                       28

What is the probability that a senior executive will report a job satisfaction score of 4 or 5?

What is the probability that a middle manager is very satisfied (a score of 5)?

Example:

x          f(x)

100,000    .10
200,000    .20
300,000    .25
400,000    .30
500,000    .10
600,000    .05

Is this a valid probability distribution?

What is the probability that the company will receive more than 400,000 new subscribers? Fewer than 200,000 new subscribers?


The expected value of a discrete random variable is:

\[ E(x) = \mu = \sum x f(x) \tag{5.4} \]

The variance of a discrete random variable is:

\[ Var(x) = \sigma^2 = \sum (x - \mu)^2 f(x) \tag{5.5} \]

Example:

y    f(y)    y f(y)    y − µ    (y − µ)²    (y − µ)² f(y)

2    .20

4    .30

7    .40

8    .10

Compute E(y).

Compute Var(y) and σ.
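The columns of the table correspond directly to formulas (5.4) and (5.5). Below is a minimal sketch that fills them in programmatically, useful for checking the worked table; names such as `dist` are illustrative.

```python
import math

# Probability distribution from the example: value -> probability.
dist = {2: 0.20, 4: 0.30, 7: 0.40, 8: 0.10}

# Conditions (5.1) and (5.2) for a valid discrete distribution.
assert all(f >= 0 for f in dist.values())
assert abs(sum(dist.values()) - 1) < 1e-9

mu = sum(y * f for y, f in dist.items())                  # (5.4)
var = sum((y - mu) ** 2 * f for y, f in dist.items())     # (5.5)
sigma = math.sqrt(var)

print(round(mu, 2), round(var, 2), round(sigma, 4))
```

Each term of the two sums is one row of the table's third and last columns.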

5.1 Binomial Probability Distribution

Properties of a binomial experiment:

1. The experiment consists of a sequence of n identical trials.

2. Two outcomes are possible on each trial. We refer to one outcome as a success and the other outcome as a failure.

3. The probability of a success, denoted by p, does not change from trial to trial. Consequently, the probability of a failure, denoted by 1 − p, does not change from trial to trial.

4. The trials are independent.


The number of experimental outcomes resulting in exactly x successes in n trials can be computed using the following formula:

\[ \binom{n}{x} = \frac{n!}{x!(n-x)!} \tag{5.6} \]

Example: Consider a binomial experiment with ten trials. How many ways can we get 3 successful trials?

The probability of any one particular sequence of trial outcomes with x successes in n trials is:

\[ p^x (1-p)^{(n-x)} \tag{5.7} \]

Binomial probability function:

\[ f(x) = \binom{n}{x} p^x (1-p)^{(n-x)} \tag{5.8} \]

where x = number of successes, p = the probability of a success on one trial, n = the number of trials, and f(x) = the probability of x successes in n trials.

The expected value of the binomial distribution is:

\[ E(x) = \mu = np \tag{5.9} \]

And the variance of the binomial distribution is:

\[ Var(x) = \sigma^2 = np(1-p) \tag{5.10} \]


Example: Consider a binomial experiment with ten trials. Let p = .10.
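A minimal sketch of the binomial probability function for this setting (n = 10, p = .10) follows. The specific probabilities computed are illustrative choices, since the example leaves the questions open, and `binomial_pmf` is a name made up for this sketch.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials, as in (5.8)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.10

ways_3 = comb(10, 3)                    # (5.6): ways to get 3 successes
f0 = binomial_pmf(0, n, p)              # no successes
f2 = binomial_pmf(2, n, p)              # exactly two successes
p_at_most_2 = sum(binomial_pmf(x, n, p) for x in range(3))
p_at_least_1 = 1 - f0                   # complement of "no successes"

mean = n * p                            # (5.9)
variance = n * p * (1 - p)              # (5.10)

print(ways_3, round(f0, 4), round(f2, 4))
print(round(p_at_most_2, 4), round(p_at_least_1, 4), mean, round(variance, 2))
```

Cumulative probabilities are sums of f(x) over the relevant values of x.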

5.2 Poisson Probability Distribution

Properties of a Poisson experiment:

1. The probability of an occurrence is the same for any two intervals of equal length.

2. The occurrence or nonoccurrence in any interval is independent of the occurrence or nonoccurrence in any other interval.

Poisson probability function:

\[ f(x) = \frac{\mu^x e^{-\mu}}{x!} \tag{5.11} \]

where f(x) = the probability of x occurrences in an interval, µ = the expected value (or mean number) of occurrences in an interval, and e ≈ 2.71828.

Example: Consider a Poisson distribution with µ = 3.
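The Poisson function (5.11) is simple to evaluate directly. In the sketch below, the particular values of x are illustrative choices for the µ = 3 example, and `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of x occurrences in an interval, as in (5.11)."""
    return mu**x * exp(-mu) / factorial(x)

mu = 3
f0 = poisson_pmf(0, mu)
f1 = poisson_pmf(1, mu)
f2 = poisson_pmf(2, mu)
p_at_least_2 = 1 - f0 - f1     # complement of "zero or one occurrence"

print(round(f0, 4), round(f1, 4), round(f2, 4), round(p_at_least_2, 4))
```

If the interval length changes, µ scales with it; for example, an interval twice as long would use µ = 6.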


5.3 Hypergeometric Probability Distribution

The hypergeometric probability function is:

\[ f(x) = \frac{\binom{r}{x} \binom{N-r}{n-x}}{\binom{N}{n}} \tag{5.12} \]

where x = the number of successes, n = the number of trials, f(x) = the probability of x successes in n trials, N = the number of elements in the population, and r = the number of elements in the population labeled success.

Example: Suppose N = 10 and r = 3. Compute the hypergeometric probabilities for the following values of n and x.
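Formula (5.12) can be sketched as follows. The notes do not list the values of n and x for this example, so the pairs used here (n = 4, x = 1 and n = 2, x = 2) are assumptions chosen only to show the calculation.

```python
from math import comb

def hypergeometric_pmf(x, n, N, r):
    """Probability of x successes in n draws without replacement, (5.12)."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

N, r = 10, 3

# Illustrative (assumed) values of n and x.
p_a = hypergeometric_pmf(1, 4, N, r)   # n = 4, x = 1
p_b = hypergeometric_pmf(2, 2, N, r)   # n = 2, x = 2

print(round(p_a, 4), round(p_b, 4))
```

Unlike the binomial case, the trials here are not independent because sampling is done without replacement.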

Example:

Increased Lending    Decreased Lending

BB&T                 Bank of America
Sun Trust Banks      Capital One
US Bancorp           Citigroup
                     Fifth Third Bancorp
                     JP Morgan Chase
                     Regions Financial
                     Wells Fargo


Example: The Census Bureau reports that 20% of people older than 25 have completed four years of college. Suppose that you sample 15 people.

Example: An average of 15 aircraft accidents occur each year.


6 Continuous Probability Distributions

6.1 Uniform Probability Distribution

The uniform probability density function is:

\[ f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases} \tag{6.1} \]

Graphically:

Expected value and variance:

\[ E(x) = \frac{a+b}{2} \]

\[ Var(x) = \frac{(b-a)^2}{12} \]

Example: Suppose we have data that show that a flight can arrive anywhere between 120 and 140 minutes after takeoff. If any time in that interval is equally likely, then the probability density function is:


Example: The random variable x is known to be uniformly distributed between 10 and 20.
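For a uniform distribution, probabilities are areas of rectangles: width times the constant height 1/(b − a). The sketch below applies this to the two examples; the specific intervals asked about are hypothetical, since the notes leave the questions blank.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= x <= hi) for x uniformly distributed on [a, b]."""
    lo, hi = max(lo, a), min(hi, b)        # clip the interval to [a, b]
    return max(hi - lo, 0) / (b - a)       # width times height 1/(b - a)

# Flight time example: uniform on [120, 140] minutes.
a, b = 120, 140
height = 1 / (b - a)                       # f(x) on the interval
mean = (a + b) / 2
variance = (b - a) ** 2 / 12
p_130_or_less = uniform_prob(120, 130, a, b)   # hypothetical question

# Second example: uniform on [10, 20]; hypothetical intervals.
p_below_15 = uniform_prob(10, 15, 10, 20)
p_12_to_18 = uniform_prob(12, 18, 10, 20)

print(height, mean, round(variance, 2), p_130_or_less, p_below_15, p_12_to_18)
```

For any continuous distribution the probability of a single exact value is zero, so "less than" and "less than or equal to" give the same answer.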

6.2 Normal Probability Distribution

Normal probability density function:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / 2\sigma^2} \tag{6.2} \]

where µ = mean, σ = standard deviation, π = 3.14159, and e = 2.71828

Graphically: (let m= µ; sigma= σ.)


The standard normal’s density function is:

\[ f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \]

Example: Calculate the probability that z is less than or equal to one.

Example: Calculate the probability that z is between −0.5 and 1.25.

Example: Calculate the probability of z being larger than 1.58.

To convert a normal distribution to a standard normal distribution we use:

\[ z = \frac{x - \mu}{\sigma} \tag{6.3} \]

Example: Suppose µ = 10 and σ = 2. What is the probability that x is between 10 and 14?

Example: The average stock price for companies making up the S&P 500 is $30, and the standard deviation is $8.20. Assume the stock prices are normally distributed.

What is the probability that a company will have a stock price of at least $40?


What is the probability that a company will have a stock price no higher than $20?
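In class these probabilities come from the standard normal table; as a cross-check, Python's standard library provides `statistics.NormalDist`, whose `cdf` method gives the area to the left of a value. The sketch below covers the normal examples above. Table answers may differ in the last digit because z is rounded to two decimals when using the table.

```python
from statistics import NormalDist

Z = NormalDist()                          # standard normal: mean 0, sd 1

p_z_le_1 = Z.cdf(1)                       # P(z <= 1)
p_between = Z.cdf(1.25) - Z.cdf(-0.5)     # P(-0.5 <= z <= 1.25)
p_z_gt_158 = 1 - Z.cdf(1.58)              # P(z > 1.58)

X = NormalDist(mu=10, sigma=2)
p_10_to_14 = X.cdf(14) - X.cdf(10)        # same as P(0 <= z <= 2) via (6.3)

stock = NormalDist(mu=30, sigma=8.20)
p_at_least_40 = 1 - stock.cdf(40)         # upper tail
p_at_most_20 = stock.cdf(20)              # lower tail

print(round(p_z_le_1, 4), round(p_between, 4), round(p_z_gt_158, 4))
print(round(p_10_to_14, 4), round(p_at_least_40, 4), round(p_at_most_20, 4))
```

The two stock-price answers match because $40 and $20 are the same distance from the mean and the normal curve is symmetric.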

6.3 Normal Approximation of Binomial Probabilities

When np ≥ 5 and n(1 − p) ≥ 5, to get the normal approximation we set

\[ \mu = np \]

and

\[ \sigma = \sqrt{np(1-p)} \]

Example: Suppose p = .2 and n = 100:

What are the mean and standard deviation?

What is the probability of 24 successes?

What is the probability of fewer than 15 successes?

Example: Although studies continue to show smoking leads to significant health problems, 20% of adults in the US smoke. Consider a group of 250 adults.

What is the expected number of adults who smoke?

What is the probability that fewer than 40 smoke?


What is the probability that from 55 to 60 smoke?

What is the probability that 70 or more smoke?
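A sketch of the normal approximation for both examples is given below. It uses a continuity correction of 0.5 (so "exactly 24" becomes the interval 23.5 to 24.5); that correction is the standard textbook device but is an assumption here, since the notes do not spell it out. The helper name `approx` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx(n, p):
    """Normal distribution approximating a binomial with parameters n, p."""
    assert n * p >= 5 and n * (1 - p) >= 5    # rule of thumb from the notes
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Example 1: p = .2, n = 100  ->  mu = 20, sigma = 4.
d1 = approx(100, 0.2)
p_exactly_24 = d1.cdf(24.5) - d1.cdf(23.5)
p_fewer_than_15 = d1.cdf(14.5)

# Example 2: 20% of 250 adults smoke  ->  mu = 50.
d2 = approx(250, 0.2)
p_fewer_than_40 = d2.cdf(39.5)
p_55_to_60 = d2.cdf(60.5) - d2.cdf(54.5)
p_70_or_more = 1 - d2.cdf(69.5)

print(d1.mean, d1.stdev, round(p_exactly_24, 4), round(p_fewer_than_15, 4))
print(d2.mean, round(p_fewer_than_40, 4), round(p_55_to_60, 4),
      round(p_70_or_more, 4))
```

Without the correction the answers shift slightly; the adjustment matters most when n is small or the interval is narrow.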

6.4 Exponential Probability Distribution

The exponential probability density function is:

\[ f(x) = \frac{1}{\mu} e^{-x/\mu} \quad \text{for } x \ge 0 \tag{6.4} \]

The exponential distribution: cumulative probabilities:

\[ P(x \le x_0) = 1 - e^{-x_0/\mu} \tag{6.5} \]

Graphically:

Example: Given

\[ f(x) = \frac{1}{8} e^{-x/8} \]

Find P (x ≤ 6).

Find P (4 ≤ x ≤ 6).


Example: The time between arrivals of vehicles at a particular intersection follows an exponential probability distribution with a mean of 12 seconds.

Sketch this exponential probability distribution.

What is the probability that the arrival time between vehicles is 12 seconds or less?

What is the probability that the arrival time between vehicles is 30 or more seconds?
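The cumulative formula (6.5) answers all of the exponential questions above. A minimal sketch, with `exp_cdf` as an illustrative helper name:

```python
from math import exp

def exp_cdf(x0, mu):
    """P(x <= x0) for an exponential distribution with mean mu, (6.5)."""
    return 1 - exp(-x0 / mu)

# f(x) = (1/8) e^(-x/8), so mu = 8.
p_le_6 = exp_cdf(6, 8)
p_4_to_6 = exp_cdf(6, 8) - exp_cdf(4, 8)

# Vehicle arrivals with a mean of 12 seconds between vehicles.
p_12_or_less = exp_cdf(12, 12)
p_30_or_more = 1 - exp_cdf(30, 12)

print(round(p_le_6, 4), round(p_4_to_6, 4))
print(round(p_12_or_less, 4), round(p_30_or_more, 4))
```

Note that P(x ≤ µ) is about .63 rather than .50: the exponential distribution is skewed to the right, so the mean lies above the median.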


7 Sampling and Sampling Distributions

Figure 7.1: Sampling distributions with 5 samples (upper left), 50 samples (upper right), and 500 samples (bottom).


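Expected value of x̄:

\[ E(\bar{x}) = \mu \tag{7.1} \]

where E(x̄) is the expected value of x̄ and µ is the population mean.

The standard deviation of x̄:

\[ \sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right) \quad \text{Finite Population} \tag{7.2} \]

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{Infinite Population} \]

The expected value of p̄ is

\[ E(\bar{p}) = p \tag{7.3} \]

The standard deviation of p̄ is

\[ \sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}} \quad \text{Finite Population} \tag{7.4} \]

\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \quad \text{Infinite Population} \]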
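The finite and infinite population formulas differ only by the finite population correction factor. The sketch below computes both standard errors; all of the numbers (σ = 4000, p = .60, n = 30, N = 2500) are hypothetical values chosen for illustration, not data from the notes.

```python
from math import sqrt

def se_mean(sigma, n, N=None):
    """Standard deviation of x-bar; pass N to apply the finite correction."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))      # finite population correction
    return se

def se_proportion(p, n, N=None):
    """Standard deviation of p-bar; pass N to apply the finite correction."""
    se = sqrt(p * (1 - p) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

# Hypothetical inputs for illustration only.
se_inf = se_mean(4000, 30)
se_fin = se_mean(4000, 30, N=2500)
sp_inf = se_proportion(0.60, 30)
sp_fin = se_proportion(0.60, 30, N=2500)

print(round(se_inf, 2), round(se_fin, 2), round(sp_inf, 4), round(sp_fin, 4))
```

When the sample is a small fraction of the population, the correction factor is close to 1 and the two versions nearly agree.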


8 Interval Estimation

8.1 Population Mean: σ Known

The interval estimate of a population mean when σ is known is:

\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \tag{8.1} \]

where (1 − α) is the confidence coefficient and z_{α/2} is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution.

Confidence Level    α      α/2     z_{α/2}

90%                 .10    .05     1.645
95%                 .05    .025    1.960
99%                 .01    .005    2.576

Example: Suppose we have a random sample of 50 items with a population σ = 6 and a sample mean x̄ = 32.

Example: A survey of 10 restaurants in the Fast Food/Pizza industry showed a sample mean customer satisfaction index of 71. Past data indicate that the population standard deviation of the index has been relatively stable with σ = 5.

What assumption should the researcher be willing to make if a margin of error is desired?

Using 95% confidence, what is the margin of error?

What is the margin of error if 99% confidence is desired?
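The interval estimate for the σ-known case can be sketched with `statistics.NormalDist.inv_cdf`, which returns the z value for a given area to its left. The 95% level for the first example is an assumed choice, since that example does not state one, and `margin_of_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """z_(alpha/2) * sigma / sqrt(n) for the sigma-known case."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper-tail area of alpha/2
    return z * sigma / sqrt(n)

# Example 1: n = 50, sigma = 6, x-bar = 32 (95% level assumed).
x_bar = 32
m1 = margin_of_error(6, 50, 0.95)
interval = (x_bar - m1, x_bar + m1)

# Example 2: restaurants, n = 10, sigma = 5.
m95 = margin_of_error(5, 10, 0.95)
m99 = margin_of_error(5, 10, 0.99)

print(round(m1, 2), tuple(round(v, 2) for v in interval))
print(round(m95, 2), round(m99, 2))
```

Raising the confidence level widens the interval: the 99% margin of error is larger than the 95% one for the same data.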


8.2 Population Mean: σ Unknown

The interval estimate of a population mean when σ is unknown:

\[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \tag{8.2} \]

where \( s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \) is the sample standard deviation, (1 − α) is the confidence coefficient, and t_{α/2} is the t value providing an area of α/2 in the upper tail of the t distribution with n − 1 degrees of freedom.

Example: A simple random sample with n = 54 provided a sample mean of 22.5 and a sample standard deviation of 4.4.


Example: The average cost per night of a hotel room in NYC is $273. Assume this estimate is based on a sample of 45 hotels and that the sample standard deviation is $65.

With 95% confidence, what is the margin of error?

Two years ago the average cost of a hotel room in NYC was $229. Discuss the change in cost over the two-year period.

8.3 Determining the Sample Size

Sample size for an interval estimate of a population mean:

\[ n = \frac{z_{\alpha/2}^2 \sigma^2}{E^2} \tag{8.3} \]

Example: How large a sample should be selected to provide a 95% confidence interval with a margin of error of 10? Assume that the population standard deviation is 40.
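Formula (8.3) rarely gives a whole number, so the result is rounded up to make sure the margin of error is no larger than desired. A minimal sketch for the example above (the function name is illustrative):

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, E, confidence):
    """Smallest n whose margin of error is at most E, from (8.3)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * sigma**2 / E**2)    # always round up

n_needed = sample_size_mean(sigma=40, E=10, confidence=0.95)
print(n_needed)
```

Halving the desired margin of error roughly quadruples the required sample size, because E is squared in the denominator.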


The mean of the sampling distribution of p̄ is the population proportion p, and the standard error is:

\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \tag{8.4} \]

The margin of error for the population proportion is:

\[ \text{Margin of Error} = z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \tag{8.5} \]

The interval estimate of a population proportion is:

\[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \tag{8.6} \]

Required sample size based on our desired margin of error:

\[ n = \frac{(z_{\alpha/2})^2 \, p^*(1-p^*)}{E^2} \tag{8.7} \]

where p* is the planning value of p.

Example: A simple random sample of 400 individuals provides 100 Yes responses.

1. What is the point estimate of the proportion of the population that would provide Yes responses?

2. What is our estimate of the standard error of the proportion, σ_p̄?

3. Compute the 95% confidence interval for the population proportion.

Example: In a survey, the planning value for the population proportion is p* = .35. How large a sample should be taken to provide a 95% confidence interval with a margin of error of .05?
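The three parts of this example follow the same pattern as the interval for a mean. A minimal sketch, with the sample proportion standing in for the unknown p in the standard error:

```python
from math import sqrt
from statistics import NormalDist

n, yes = 400, 100

p_bar = yes / n                       # point estimate of the proportion
se = sqrt(p_bar * (1 - p_bar) / n)    # estimated standard error
z = NormalDist().inv_cdf(0.975)       # z value for 95% confidence
margin = z * se
lower, upper = p_bar - margin, p_bar + margin

print(f"point estimate = {p_bar:.2f}")
print(f"standard error = {se:.4f}")
print(f"95% interval   = ({lower:.4f}, {upper:.4f})")
```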
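The sample-size formula for a proportion can be sketched the same way. The helper name is hypothetical, and the result is rounded up to the next whole observation.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p_star, E, confidence=0.95):
    """Sample size for an interval estimate of a proportion, equation (8.7)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p_star * (1 - p_star) / E**2)

# planning value p* = .35, margin of error .05, 95% confidence
n_required = sample_size_proportion(0.35, 0.05)
print(n_required)
```

When no planning value is available, p* = .50 gives the largest (most conservative) sample size, because p*(1 − p*) is maximized there.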


9 Hypothesis Tests

9.1 Developing a Null and Alternative Hypothesis

• A new teaching method is developed that is believed to be better than the current method.

• A new sales force bonus plan is developed in an attempt to increase sales.

• A new drug is developed with the goal of lowering blood pressure more than an existing drug.

Example: The manager of an automobile dealership is considering a new bonus plan designed to increase sales volume. Currently, the mean sales volume is 14 cars per month. The manager wants to conduct a research study to see whether the new bonus plan increases sales volume. To collect data on the plan, a sample of sales personnel will be allowed to sell under the new bonus plan for a one-month period.

1. Develop the null and alternative hypotheses most appropriate for this situation.


2. Comment on the conclusion when H0 cannot be rejected.

3. Comment on the conclusion when H0 can be rejected.

Example: A production line operation is designed to fill cartons with laundry detergent to a mean weight of 32 ounces. A sample of cartons is periodically selected and weighed to determine whether underfilling or overfilling is occurring. If the sample data lead to a conclusion of underfilling or overfilling, the production line will be shut down and adjusted to obtain proper filling.

1. Formulate the null and alternative hypotheses that will help in deciding whether to shut down the production line.

2. Comment on the conclusion and the decision when H0 cannot be rejected.

3. Comment on the conclusion and the decision when H0 can be rejected.

4. What would a regulator's null and alternative hypotheses be?

9.2 Type I and Type II Errors

                          Population Condition
Conclusion        H0 True               Ha True
Accept H0         Correct Conclusion    Type II Error
Reject H0         Type I Error          Correct Conclusion


Example: Carpetland salespersons average $8,000 per week in sales. Steve, the firm's vice president, proposes a compensation plan with new selling incentives. Steve hopes that the results of a trial selling period will enable him to conclude that the compensation plan increases the average sales per salesperson.

1. Develop the appropriate null and alternative hypotheses.

2. What is the Type I error in this situation? What are the consequences of making this error?

3. What is the Type II error in this situation? What are the consequences of making this error?

9.3 Population Mean: σ Known

Example of a lower tail test: Hilltop Coffee puts 3 pounds of coffee in each can.

• If regulators want to check that claim, what would their hypothesis be? What does the outcome tell us?

In our Hilltop example, we know that the population standard deviation is σ = .18. Suppose we pick a sample of 36 cans. That gives us a standard error of the sampling distribution of:


Test statistic for hypothesis tests about a population mean when σ is known:

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \tag{9.1} \]

Back to Hilltop Coffee: Suppose we get our sample of 36 cans and get x̄ = 2.92. Is 2.92 small enough to reject H0?
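The Hilltop numbers can be pushed through equation (9.1) as a quick check. The sketch assumes the lower tail hypotheses H0: µ ≥ 3 and Ha: µ < 3, which the notes imply but do not write out, and it reports a p-value without fixing a level of significance.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 3.0, 0.18, 36, 2.92

se = sigma / sqrt(n)                # standard error of the sample mean
z = (xbar - mu0) / se               # test statistic, equation (9.1)
p_value = NormalDist().cdf(z)       # lower tail area

print(f"standard error = {se:.2f}")
print(f"z = {z:.2f}")
print(f"p-value = {p_value:.4f}")
```

The p-value is then compared with whatever α the regulator chooses: a p-value at or below α leads to rejecting H0.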

Summary:

1. Compute the value of the test statistic using equation (9.1).

2. Lower tail test: Using the standard normal distribution, compute the probability that z is less than or equal to the value of the test statistic (area in the lower tail).

3. Upper tail test: Using the standard normal distribution, compute the probability that z is greater than or equal to the value of the test statistic (area in the upper tail).


Example: Consider the following hypothesis test:

H0 : µ ≤ 25

Ha : µ > 25

A sample of 40 provided a sample mean of 26.4. The population standard deviation is 6.

1. Compute the value of the test statistic.

2. What is the p-value?

3. At α = .01, what is your conclusion?

4. What is the rejection rule using the critical value? What is your conclusion?

Computation of p-values for two-tailed tests:

1. Compute the value of the test statistic using equation (9.1).

2. If the value of the test statistic is in the upper tail, compute the probability that z is greater than or equal to the value of the test statistic (upper tail area). If the value of the test statistic is in the lower tail, compute the probability that z is less than or equal to the value of the test statistic (the lower tail area).

3. Double the probability (or tail area) from step 2 to obtain the p-value.


Example: Consider the following hypothesis test:

H0 : µ = 15

Ha : µ ≠ 15

A sample of 50 provided a sample mean of 14.15. The population standard deviation is 3.

1. Compute the value of the test statistic.

2. What is the p-value?

3. At α = .05, what is your conclusion?

4. What is the rejection rule using the critical value? What is your conclusion?
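The two-tailed procedure (compute the tail area on the side where the statistic falls, then double it) can be sketched for this example as follows. Variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 15.0, 3.0, 50, 14.15, 0.05
std_normal = NormalDist()

z = (xbar - mu0) / (sigma / sqrt(n))      # test statistic, equation (9.1)

# Tail area on the side where z falls, then doubled for the two-tailed p-value
tail_area = std_normal.cdf(z) if z < 0 else 1 - std_normal.cdf(z)
p_value = 2 * tail_area

# Critical value approach: reject H0 if z <= -z_crit or z >= z_crit
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_by_p = p_value <= alpha
reject_by_critical = abs(z) >= z_crit

print(f"z = {z:.3f}, p-value = {p_value:.4f}, critical value = {z_crit:.3f}")
print("reject H0" if reject_by_p else "do not reject H0")
```

Both approaches give the same decision, as they always do for the same α.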

9.4 Population Mean: σ Unknown

The test statistic for hypothesis tests about a population mean when σ is unknown is:

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \tag{9.2} \]

with n− 1 degrees of freedom.

Example: Consider the following hypothesis test:

H0 : µ = 18

Ha : µ ≠ 18

A sample of 48 provided a sample mean of x̄ = 17 and a sample standard deviation of s = 4.5.


1. Compute the value of the test statistic.

2. Use the t distribution table to compute a range for the p-value.

3. At α = .05, what is your conclusion?

4. What is the rejection rule using the critical value? What is your conclusion?

9.5 Population Proportion

The test statistic for hypothesis tests about a population proportion:

\[ z = \frac{\bar{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \tag{9.3} \]

Example: Consider the following hypothesis test:

H0 : p ≥ .75

Ha : p < .75

A sample of 300 items was selected. Compute the p-value and state your conclusions for each of the following sample results. Use α = .05.

1. p̄ = .68

2. p̄ = .72
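A short sketch of equation (9.3) applied to both sample results. Note that the standard error uses the hypothesized value p0, not the sample proportion. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p0, n, alpha = 0.75, 300, 0.05
se = sqrt(p0 * (1 - p0) / n)          # standard error under H0

def lower_tail_test(p_bar):
    """Return (z, p-value, reject H0?) for Ha: p < p0."""
    z = (p_bar - p0) / se
    p_value = NormalDist().cdf(z)     # lower tail area
    return z, p_value, p_value <= alpha

results = {p_bar: lower_tail_test(p_bar) for p_bar in (0.68, 0.72)}
for p_bar, (z, p_value, reject) in results.items():
    decision = "reject H0" if reject else "do not reject H0"
    print(f"p-bar = {p_bar}: z = {z:.2f}, p-value = {p_value:.4f}, {decision}")
```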


10 Comparisons Involving Means

10.1 Inferences About the Difference Between Two Population Means: σ1 and σ2 Known

The point estimator of the difference between two population means:

\[ \bar{x}_1 - \bar{x}_2 \tag{10.1} \]

Standard error of x̄1 − x̄2:

\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.2} \]

The interval estimate’s margin of error is:

\[ \text{Margin of error} = z_{\alpha/2}\, \sigma_{\bar{x}_1 - \bar{x}_2} = z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.3} \]

That gives us an interval estimate:

\[ \bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.4} \]

where 1− α is the confidence coefficient.

Example: Conde Nast Traveler conducts an annual survey in which readers rate their favorite cruise ship. All ships are rated on a 100-point scale with higher values indicating better service. A sample of 37 ships that carry fewer than 500 passengers resulted in an average rating of 85.36, and a sample of 44 ships that carry 500 or more passengers resulted in an average rating of 81.40. Assume that the population standard deviation is 4.55 for ships that carry fewer than 500 passengers and 3.97 for ships that carry 500 or more passengers.

a. What is the point estimate of the difference between the population mean rating for the two types of ships?

b. At 95% confidence, what is the margin of error?

c. What is a 95% confidence interval estimate of the difference between the population mean ratings for the two sizes of ships?
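Parts a through c map directly onto equations (10.1) through (10.4). A minimal sketch, treating the smaller ships as population 1:

```python
from math import sqrt
from statistics import NormalDist

# Population 1: ships with fewer than 500 passengers; population 2: 500 or more
n1, xbar1, sigma1 = 37, 85.36, 4.55
n2, xbar2, sigma2 = 44, 81.40, 3.97

point_estimate = xbar1 - xbar2                        # (10.1)
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)            # (10.2)
margin = NormalDist().inv_cdf(0.975) * se             # (10.3), 95% confidence
lower, upper = point_estimate - margin, point_estimate + margin   # (10.4)

print(f"point estimate = {point_estimate:.2f}")
print(f"margin of error = {margin:.2f}")
print(f"95% interval = ({lower:.2f}, {upper:.2f})")
```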


Test statistic for hypothesis tests about µ1 − µ2: σ1 and σ2 known:

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \tag{10.5} \]

Example: During the 2003 season, MLB took steps to speed up the play of baseball games in order to maintain fan interest. The following results come from a sample of 60 games played during the summer of 2002 and a sample of 50 games played during the summer of 2003. The sample mean shows the mean duration of the games included in each sample.

2002 Season                    2003 Season
n1 = 60                        n2 = 50
x̄1 = 2 hours, 52 minutes       x̄2 = 2 hours, 46 minutes

a. A research hypothesis was that the steps taken during the 2003 season would reduce the population mean duration of baseball games. Formulate the null and alternative hypotheses.

b. What is the point estimate of the reduction in the mean duration of games during the 2003 season?

c. Historical data indicate a population standard deviation of 12 minutes is a reasonable assumption for both years. Conduct the hypothesis test and report the p-value. At a .05 level of significance, what is your conclusion?

d. Provide a 95% confidence interval estimate of the reduction in the mean duration of games during the 2003 season.
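A sketch of parts b through d, working in minutes. It assumes the upper tail formulation H0: µ1 − µ2 ≤ 0 versus Ha: µ1 − µ2 > 0, with 2002 as population 1, so that a reduction in 2003 shows up as a positive difference.

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 60, 50
xbar1, xbar2 = 172.0, 166.0     # minutes: 2 h 52 min (2002) and 2 h 46 min (2003)
sigma = 12.0                    # assumed population standard deviation, both years
D0, alpha = 0.0, 0.05

point_estimate = xbar1 - xbar2                    # estimated reduction in minutes
se = sqrt(sigma**2 / n1 + sigma**2 / n2)
z = (point_estimate - D0) / se                    # equation (10.5)
p_value = 1 - NormalDist().cdf(z)                 # upper tail area
reject = p_value <= alpha

margin = NormalDist().inv_cdf(0.975) * se         # 95% interval for the reduction
lower, upper = point_estimate - margin, point_estimate + margin

print(f"reduction = {point_estimate:.0f} min, z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% interval = ({lower:.2f}, {upper:.2f}) minutes")
```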


10.2 Inferences About the Difference Between Two Population Means: σ1 and σ2 Unknown

Interval estimate of the difference between two population means: σ1 and σ2 unknown:

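\[ \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{10.6} \]

where 1 − α is the confidence coefficient.

Degrees of freedom: t distribution with two independent random samples:

\[ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2} \tag{10.7} \]

When this results in a decimal, we round down to the nearest whole number.

Test statistic for hypothesis tests about µ1 − µ2: σ1 and σ2 unknown:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \tag{10.8} \]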

Example: Salary data show staff nurses in Tampa earn less than staff nurses in Dallas. Do staff nurses in Tampa really earn less?

Tampa              Dallas
n1 = 40            n2 = 50
x̄1 = $56,100       x̄2 = $59,400
s1 = $6,000        s2 = $7,000

a. Formulate a hypothesis so that, if the null hypothesis is rejected, we can conclude that salaries for staff nurses in Tampa are significantly lower than those in Dallas. Use α = .05.

b. What is the value of the test statistic?

c. What is the p-value?


d. What is your conclusion?
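The test statistic (10.8) and the degrees of freedom (10.7) for the nurse salary data can be sketched as below, assuming D0 = 0 and Tampa as population 1. The p-value itself still has to come from a t table or statistical software, since the Python standard library has no t distribution.

```python
from math import floor, sqrt

# Tampa is population 1, Dallas is population 2
n1, xbar1, s1 = 40, 56100.0, 6000.0
n2, xbar2, s2 = 50, 59400.0, 7000.0
D0 = 0.0

v1, v2 = s1**2 / n1, s2**2 / n2          # the two variance terms s^2/n

t_stat = (xbar1 - xbar2 - D0) / sqrt(v1 + v2)                        # (10.8)
df_exact = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))    # (10.7)
df = floor(df_exact)                     # round down to a whole number

print(f"t = {t_stat:.3f}")
print(f"df = {df_exact:.2f}, rounded down to {df}")
```

With the lower tail alternative µ1 − µ2 < 0, the p-value is the area to the left of this t value in a t distribution with the rounded-down degrees of freedom.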

10.3 Inferences About the Difference Between Two Population Means: Matched Samples

Test statistic for hypothesis tests involving matched samples:

\[ t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \tag{10.9} \]

where d̄ is the mean of the differences between the two matched observations being compared, µ_d is the hypothesized mean of the population of differences, and s_d is the standard deviation of the differences.

Example: A market research firm used a sample of individuals to rate the purchase potential of a particular product before and after the individuals saw a new TV commercial about the product. The purchase potential ratings were based on a 0 to 10 scale, with higher values indicating a higher purchase potential.

              Purchase Rating                  Purchase Rating
Individual    After    Before    Individual    After    Before
1             6        5         5             3        5
2             6        4         6             9        8
3             7        7         7             7        5
4             4        3         8             6        6

a. Construct a hypothesis test to determine if individuals are more apt to buy the product after seeing the commercial.

b. Use α = .05 and the hypothesis test to determine the value of the commercial.
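A sketch of the matched-sample calculation, taking each difference as after minus before and assuming H0: µ_d ≤ 0 versus Ha: µ_d > 0. The critical value 1.895 is read from a standard t table (upper-tail area .05, 7 degrees of freedom) and is hard-coded, not computed.

```python
from math import sqrt
from statistics import mean, stdev

# Ratings for individuals 1 through 8
after  = [6, 6, 7, 4, 3, 9, 7, 6]
before = [5, 4, 7, 3, 5, 8, 5, 6]

d = [a - b for a, b in zip(after, before)]   # differences, after minus before
n = len(d)
d_bar = mean(d)                              # mean difference
s_d = stdev(d)                               # sample standard deviation (n - 1 divisor)

t_stat = (d_bar - 0) / (s_d / sqrt(n))       # equation (10.9) with mu_d = 0

t_crit = 1.895   # table value: upper-tail area .05, n - 1 = 7 degrees of freedom
reject = t_stat >= t_crit

print(f"d-bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}")
print("reject H0" if reject else "do not reject H0")
```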


11 Simple Linear Regression

A simple linear regression model takes the form:

y = β0 + β1x+ ε (11.1)

where β0 and β1 are parameters and ε (epsilon) is a random variable called an error term.

The estimated regression equation is:

\[ \hat{y} = b_0 + b_1 x \tag{11.2} \]


11.1 Ordinary Least Squares

Data on the number of fatal accidents per 1000 licensed drivers and the percentage of drivers under 21 from 42 cities (first 9 observations).

Percent Under 21    Fatal Accidents per 1000
13                  2.962
12                  0.708
8                   0.885
12                  1.652
11                  2.091
17                  2.627
18                  3.83
8                   0.368
13                  1.142

The simple linear regression for this relationship is:

\[ \text{accidents}_i = b_0 + b_1(\text{p21}_i) \tag{11.3} \]

OLS criterion:

\[ \min \sum (y_i - \hat{y}_i)^2 \tag{11.4} \]

where y_i is the observed value of the dependent variable for the ith observation and ŷ_i is the estimated value of the dependent variable for the ith observation.

The slope and y intercept for the estimated regression equation:

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \tag{11.5} \]

\[ b_0 = \bar{y} - b_1 \bar{x} \tag{11.6} \]


Example:

xi 1 2 3 4 5

yi 3 7 5 11 14

1. Develop a scatter diagram for these data.

2. What does the scatter diagram developed in part 1 indicate about the relationship between the two variables?

3. Develop the estimated regression equation by computing the values of b0 and b1.


4. Use the estimated regression equation to predict the value of y when x = 4.
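Parts 3 and 4 can be checked with a few lines that apply equations (11.5) and (11.6) to the five data points. This is a minimal sketch using plain lists; the variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope (11.5) and intercept (11.6)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Predicted value of y when x = 4
y_hat_at_4 = b0 + b1 * 4

print(f"estimated regression equation: y-hat = {b0:.1f} + {b1:.1f}x")
print(f"prediction at x = 4: {y_hat_at_4:.1f}")
```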

11.2 Coefficient of Determination

Sum of squares due to error, or SSE:

\[ SSE = \sum (y_i - \hat{y}_i)^2 \tag{11.7} \]

Total sum of squares, or SST:

\[ SST = \sum (y_i - \bar{y})^2 \tag{11.8} \]

The sum of squares due to regression, or SSR, is:

\[ SSR = \sum (\hat{y}_i - \bar{y})^2 \tag{11.9} \]

Intuitively, all variation (SST) must be explained either by the regression (SSR) or the error (SSE). Thus:

\[ SST = SSR + SSE \tag{11.10} \]


Example: Back to the question above:

xi    yi    ŷi      yi − ŷi    (yi − ŷi)²    yi − ȳ    (yi − ȳ)²
1     3     2.8
2     7     5.4
3     5     8
4     11    10.6
5     14    13.2

The coefficient of determination, denoted R², is:

\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \tag{11.11} \]
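The remaining columns of the table, the three sums of squares, and R² can be computed for the five-point example as a check. The sketch refits the line so that it stands on its own.

```python
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit, equations (11.5) and (11.6)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # (11.7)
sst = sum((yi - y_bar) ** 2 for yi in y)                # (11.8)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # (11.9)
r2 = ssr / sst                                          # (11.11)

print(f"SSE = {sse:.1f}, SST = {sst:.1f}, SSR = {ssr:.1f}")
print(f"R^2 = {r2:.3f}")
```

The printed values also confirm the identity SST = SSR + SSE for this data set.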


There are a number of assumptions that need to be made in order for OLS to work correctly.

1. The error term, ε, is a random variable with a mean of zero.

2. The variance of ε, denoted by σ², is the same for all values of x.

3. The values of ε are independent.

4. The error term ε is a normally distributed random variable.

Mean square error:

\[ s^2 = MSE = \frac{SSE}{n-2} \tag{11.12} \]

The square root of the MSE is called the standard error of the estimate, and is defined as:

\[ s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} \tag{11.13} \]

11.3 The t-test

The standard error for b1 is:

\[ s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} \tag{11.14} \]

where s is from equation 11.13.


The test statistic used to determine the significance of β1 is:

\[ t = \frac{b_1}{s_{b_1}} \tag{11.15} \]

Example from above:

xi 1 2 3 4 5

yi 3 7 5 11 14

1. Compute the mean square error.

2. Compute the standard error of the estimate.

3. Compute the estimated standard error of b1.

4. Use a t test to test for statistical significance.
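The four parts chain equations (11.12) through (11.15). In the sketch below the critical value 3.182 is a standard t table entry (upper-tail area .025, n − 2 = 3 degrees of freedom), and the α = .05 two-tailed test of H0: β1 = 0 is an assumption, since the example does not state a level of significance.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)              # mean square error (11.12)
s = sqrt(mse)                    # standard error of the estimate (11.13)
s_b1 = s / sqrt(sxx)             # estimated standard error of b1 (11.14)
t_stat = b1 / s_b1               # test statistic (11.15)

t_crit = 3.182   # table value: upper-tail area .025, 3 degrees of freedom
reject = abs(t_stat) >= t_crit

print(f"MSE = {mse:.4f}, s = {s:.4f}, s_b1 = {s_b1:.4f}, t = {t_stat:.3f}")
print("reject H0: beta1 = 0" if reject else "do not reject H0: beta1 = 0")
```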


11.4 A Few Other Topics

The residual for observation i is:

\[ y_i - \hat{y}_i \tag{11.16} \]

[Figure: residual plot against x on the left, and residual plot against ŷ on the right.]


12 Multiple Regression

The multiple regression model takes the form:

y = β0 + β1x1 + β2x2 + · · ·+ βpxp + ε (12.1)

The estimated multiple regression equation is:

\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p \tag{12.2} \]

Example: A survey of 50 households asked for income, the number of people in the household, and the amount charged on credit cards.

charges = b0 + b1(members)

where charges is the amount of credit card charges made by the household, and members is the number of members of the household.

Running the regression in Excel, we get the following output:

Linear Regression

Regression Statistics
R Square                        0.56677
Adjusted R Square               0.55775
Total number of observations    50

charges = 2581.9410 + 404.1284 * members

             Coefficients    Standard Error    t Stat      p-level
Intercept    2,581.94102     195.26258         13.22292    0.E+0
Members      404.12836       50.99787          7.92442     0.


charges = b0 + b1(members) + b2(income)

where charges is the amount of credit card charges made by the household, members is the number of members of the household, and income is the combined income of the household (in thousands).

Running the model we get the following Excel output:

Linear Regression

Regression Statistics
R Square                        0.82556
Adjusted R Square               0.81814      F-stat     111.21765
Total number of observations    50           p-level    0.E+0

charges = 1304.9048 + 33.1330 * Income($1000s) + 356.2959 * members

                  Coefficients    Standard Error    t Stat      p-level
Intercept         1,304.90478     197.65484         6.60194     0.
Income($1000s)    33.13301        3.96791           8.35025     7.68207E-11
Members           356.2959        33.20089          10.73152    3.11973E-14

SST = SSR+ SSE (12.3)

\[ R^2 = \frac{SSR}{SST} \tag{12.4} \]

Adjusted R²:

\[ R_a^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1} \tag{12.5} \]
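Equation (12.5) can be checked against the two Excel outputs above. The helper name is hypothetical; n = 50 in both regressions, with p = 1 independent variable in the first and p = 2 in the second.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 from equation (12.5): n observations, p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

one_variable = adjusted_r2(0.56677, n=50, p=1)    # charges on members
two_variables = adjusted_r2(0.82556, n=50, p=2)   # charges on income and members

print(f"{one_variable:.4f}")
print(f"{two_variables:.4f}")
```

Both results agree with the reported Adjusted R Square values to about four decimal places; small differences in the last digit come from R Square itself being rounded in the output.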


There are several assumptions that we need to make about the error term in order for OLS to give correct coefficients and t tests.

1. The error term ε is a random variable with mean zero. That is, E(ε) = 0.

2. The variance of ε is denoted by σ² and is the same for all values of the independent variables x1, x2, · · · , xp.

3. The values of ε are independent.

4. The error term is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + · · · + βpxp.

12.1 Testing for Significance

A mean square is the sum of squares divided by the degrees of freedom:

\[ MSR = \frac{SSR}{p} \tag{12.6} \]

\[ MSE = \frac{SSE}{n-p-1} \tag{12.7} \]

The F statistic is:

\[ F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n-p-1)} \tag{12.8} \]
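The regression outputs in these notes report R² but not SSR and SSE. Since SSR = R² × SST and SSE = (1 − R²) × SST, the SST terms cancel in (12.8) and F can be written in terms of R² alone. The sketch below uses that rearrangement, which is a derived form and not an equation from the notes, to check the reported F-stat for the two-variable credit card regression.

```python
def f_from_r2(r2, n, p):
    """F statistic (12.8) rewritten using SSR = R^2 * SST and SSE = (1 - R^2) * SST."""
    msr_over_sst = r2 / p                    # MSR divided by SST
    mse_over_sst = (1 - r2) / (n - p - 1)    # MSE divided by SST
    return msr_over_sst / mse_over_sst

# charges on income and members: R^2 = 0.82556, n = 50, p = 2
f_stat = f_from_r2(0.82556, n=50, p=2)
print(f"F = {f_stat:.2f}")
```

The result matches the reported F-stat of 111.21765 up to the rounding of R².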


12.2 Categorical Independent Variables

Suppose we have:

salary = b0 + b1(experience) + b2(gender) (12.9)

where gender is a dummy variable (gender = 1 for male, 0 otherwise).

Example: We have data on 24 treadmills, including the price, the quality rating, and the overall score.

Linear Regression

Regression Statistics
R Square                        0.61659
Adjusted R Square               0.55908      F-stat     10.72132
Total number of observations    24           p-level    0.0002

Score = 65.6597 + 0.0023 * Price + 10.2097 * Excellent + 5.9246 * Very Good

             Coefficients    Standard Error    t Stat     p-level
Intercept    65.65972        2.50685           26.1921    1.11022E-16
Price        0.00232         0.00123           1.88452    0.07411
Excellent    10.20966        3.43795           2.96969    0.00757
Very Good    5.92464         2.75857           2.14772    0.04417


12.3 Nonlinear Models

12.3.1 Quadratic Models


12.3.2 Logarithmic Models

Case    Regression Specification    Interpretation of b1
I       yi = b0 + b1 ln(xi)         A 1% change in x is associated with a change in y of 0.01 × b1.
II      ln(yi) = b0 + b1 xi         A change in x by one unit is associated with a 100 × b1% change in y.
III     ln(yi) = b0 + b1 ln(xi)     A 1% change in x is associated with a b1% change in y.
