
Page 1: introduction to linear regression

living with the lab

© 2011 David Hall and the LWTL faculty team
The Living with the Lab label, the Louisiana Tech Logo, and this copyright notice should not be removed when any part of this work is used by others. This work may not be used for commercial purposes. Inquiries should be addressed to [email protected]. This presentation on linear regression is based partially on class notes created by Dr. Mark Barker at Louisiana Tech University.

linear regression provides a predictable way to quantify the relationship between two variables, even when significant uncertainty and measurement error exist

[images: environmental data, medical data, process parameters; sources: http://mrg.bz/sWnKWI http://mrg.bz/jR4UEX http://mrg.bz/sMmqHk]

Page 2: DISCLAIMER

The content of this presentation is for informational purposes only and is intended only for students attending Louisiana Tech University.

The author of this information does not make any claims as to the validity or accuracy of the information or methods presented.

The procedures demonstrated here are potentially dangerous and could result in injury or damage.

Louisiana Tech University and the State of Louisiana, their officers, employees, agents or volunteers, are not liable or responsible for any injuries, illness, damage or losses which may result from your using the materials or ideas, or from your performing the experiments or procedures depicted in this presentation.

If you do not agree, then do not view this content.

Page 3: collect some data to see how linear regression works

• we know that our heart rate increases as we begin to exercise

• heart rate is usually expressed in beats per minute (bpm)

• we can record our pulse over a short period of time to estimate heart rate . . . we’ll collect over a 10 second period

• the variation of heart rate during exercise is complex and depends on many factors (fitness, the level of exertion, the duration of exercise, what you’ve been eating/drinking, . . .)

• we will assume that heart rate is initially linear with the duration of exercise just to collect some data . . . this could serve as a starting point for a systematic study of heart rate during exercise

Page 4: collect pulse after doing jumping jacks

1. measure pulse for 10 seconds (have a partner write down the number of beats)
2. do jumping jacks for 10 seconds (10 seconds of total exercise)
3. measure pulse for 10 seconds
4. do jumping jacks for 10 seconds (20 seconds of total exercise)
5. measure pulse for 10 seconds
6. do jumping jacks for 10 seconds (30 seconds of total exercise)
7. measure pulse for 10 seconds
8. do jumping jacks for 10 seconds (40 seconds of total exercise)
9. measure pulse for 10 seconds

collect heart rate five times

[figure: timeline of total time from 0 to 90 s, alternating 10 s STOP intervals (measure pulse) with 10 s jump intervals; jumping time runs from 0 to 40 s]
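The nine steps above can be laid out as a timetable. The following Python sketch is only an illustration of the schedule; the names `build_schedule`, `INTERVAL_S` and `N_MEASUREMENTS` are made up here and do not come from the slides.

```python
# Sketch of the timeline: alternate 10 s pulse counts with 10 s of jumping jacks.
INTERVAL_S = 10      # every measuring or jumping interval lasts 10 s
N_MEASUREMENTS = 5   # "collect heart rate five times"


def build_schedule(interval_s=INTERVAL_S, n_measurements=N_MEASUREMENTS):
    """Return (start_s, end_s, activity, cumulative_exercise_s) tuples in time order."""
    schedule = []
    clock = 0      # total elapsed time in seconds
    exercise = 0   # cumulative jumping time in seconds
    for k in range(n_measurements):
        schedule.append((clock, clock + interval_s, "measure pulse", exercise))
        clock += interval_s
        if k < n_measurements - 1:   # no jumping after the final measurement
            exercise += interval_s
            schedule.append((clock, clock + interval_s, "jump", exercise))
            clock += interval_s
    return schedule


schedule = build_schedule()
for start, end, activity, exercise in schedule:
    print(f"{start:2d}-{end:2d} s  {activity:13s}  exercise so far: {exercise} s")
```

The cumulative exercise time recorded with each pulse measurement (0, 10, 20, 30, 40 s) is the x value used in the plots and tables that follow.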

Page 5: logistics

• choose one or two people per table to do jumping jacks; this is voluntary . . . don’t do the jumping jacks if there is any reason why this activity could be harmful to you

• the people who are jumping should get away from tripping hazards and other people (clear a space around your table and keep yourself under control while exercising)

• your instructor will keep track of time and tell you when to jump and when to collect heart rate; a cell phone, watch or online stopwatch can be used

• we need about 7 to 10 sets of data from the entire class . . . not everybody will get to exercise

• we’ll analyze and plot this data using Excel

• the heart rate collected will include some error
  o collect pulse as soon as you stop jumping
  o after 10 seconds, call out the number of pulses collected over 10 seconds to your partner(s) and start jumping again

• just be as accurate as possible

www.onlinestopwatch.com

Page 6: enter heart rate data into Excel

time (s)   student1 (bpm)   student2 (bpm)   student3 (bpm)   student4 (bpm)   student5 (bpm)   student6 (bpm)   student7 (bpm)   student8 (bpm)
0
10
20
30
40

• please multiply the number of pulses collected over 10 seconds by 6 to get beats per minute (bpm)

• report bpm to your instructor

• build a spreadsheet on your computer along with the instructor
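The conversion from a 10 second pulse count to beats per minute can be sketched in Python as follows. The function name `counts_to_bpm` and the pulse counts are hypothetical values chosen for illustration; they are not class data.

```python
SECONDS_PER_MINUTE = 60


def counts_to_bpm(beats, window_s=10):
    """Scale a pulse count taken over window_s seconds to beats per minute."""
    return beats * SECONDS_PER_MINUTE / window_s


# hypothetical 10 s pulse counts, not measured data
example_counts = [11, 14, 15, 16, 20]
example_bpm = [counts_to_bpm(c) for c in example_counts]
print(example_bpm)  # [66.0, 84.0, 90.0, 96.0, 120.0]
```

Because each count is multiplied by 6, miscounting a single beat changes the reported heart rate by 6 bpm.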

Page 7: plot data for the entire class in Excel

• make a scatter plot using symbols only – no lines

• time is the independent variable and is plotted as the x-axis

• heart rate is the dependent variable and is plotted as the y-axis

• the title of the plot is always listed as “y versus x” . . . which is “heart rate versus exercise time” for this problem

[figure: scatter plot titled “heart rate versus exercise time”; x-axis: cumulative exercise time (s), 0 to 45; y-axis: heart rate (bpm), 50 to 150]

Page 8: make a hand plot for one data set

• your instructor will select one student’s data that is typical of the data for the entire class; we will analyze this data
• make a hand plot using your own paper as shown below (use proper format!!)
• draw a “best fit” line through the data; just use your judgment

cumulative exercise time (s)   heart rate (bpm)
 0                              67
10                              82
20                              86
30                              96
40                             120

use data from class . . . not this data

[figure: hand plot titled “heart rate versus exercise time” with a “best fit line” drawn through the data]

Page 9: find an equation to fit the data

• assume the data is linear
• pick two points from your data (or make up two points by picking from the line)
• compute the slope
• write equation using point-slope form

cumulative exercise time (s)   heart rate (bpm)
 0                              67
10                              82
20                              86
30                              96
40                             120

example (use data from your class)

find the slope:

$$m = \frac{\Delta y}{\Delta x} = 1.27$$

find the y-intercept by plugging in one of the data points:

$$b = y - m \cdot x = 120 - 1.27 \cdot 40 = 69.3$$

write the equation:

$$\text{heart rate} = 1.27 \cdot \text{time} + 69.3$$

. . . where heart rate is in bpm and time is in seconds.
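The two-point procedure can be sketched in Python. The function name `line_through` and the two pairs of points below are assumptions made for illustration; they are not the points used for the hand-drawn line in the example.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x values")
    m = (y2 - y1) / (x2 - x1)   # slope: delta y over delta x
    b = y1 - m * x1             # y-intercept: b = y - m*x at one of the points
    return m, b


# two illustrative choices of points from the example table (assumed choices)
m_a, b_a = line_through((0, 67), (40, 120))
m_b, b_b = line_through((10, 82), (30, 96))
print(f"first and last points:    heart rate = {m_a:.3f} * time + {b_a:.1f}")
print(f"second and fourth points: heart rate = {m_b:.3f} * time + {b_b:.1f}")
```

The two choices give noticeably different slopes and intercepts, so the result of this method depends on which points are picked.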

Page 10: analysis of our equations

• compare your answer with others in the class
• if you chose the same two points to define your “best fit” line, then your equations should be the same
• choosing different points causes us to get different equations

• linear regression, which can be derived using calculus, gives us the same equation every time

• linear regression takes the guess work out of finding best fit lines

http://earthobservatory.nasa.gov/IOTD/view.php?id=46145

Page 11: understanding linear regression

[figure: x-y plot showing a data point $(x_i, y_i)$ and the best fit line $y^{fit} = m \cdot x + b$; the value of the line at $x_i$ is $y_i^{fit} = m \cdot x_i + b$, and the vertical distance $y_i - y_i^{fit}$ is the error for that point]

• linear regression generates the best line by minimizing the sum of the squares of the errors
• minimize $\sum \left( y_i - y_i^{fit} \right)^2$ for all data points to find optimum values of m and b
• we call this least squares linear regression
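An earlier slide notes that linear regression can be derived using calculus. A sketch of that derivation is given below; the symbol $S$ for the sum of squared errors is introduced here and is not used on the slides. Setting both partial derivatives of $S$ to zero gives two equations that can be solved for m and b.

```latex
\begin{align*}
S(m, b) &= \sum_{i=1}^{n} \left( y_i - y_i^{fit} \right)^2
         = \sum_{i=1}^{n} \left( y_i - m \cdot x_i - b \right)^2 \\
\frac{\partial S}{\partial m} &= -2 \sum x_i \left( y_i - m \cdot x_i - b \right) = 0
  \;\Rightarrow\; m \sum x_i^2 + b \sum x_i = \sum x_i y_i \\
\frac{\partial S}{\partial b} &= -2 \sum \left( y_i - m \cdot x_i - b \right) = 0
  \;\Rightarrow\; m \sum x_i + n \, b = \sum y_i \\
b &= \frac{\sum y_i - m \sum x_i}{n}, \qquad
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}
\end{align*}
```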

Page 12: finding m and b

$$m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} \qquad b = \frac{\sum y_i - m \sum x_i}{n}$$

x = cumulative exercise time (s), y = heart rate (bpm)

       x      y    x · y     x²
       0     67        0      0
      10     82      820    100
      20     86     1720    400
      30     96     2880    900
      40    120     4800   1600
Σ    100    451    10220   3000

$$m = \frac{5 \cdot 10220 - 100 \cdot 451}{5 \cdot 3000 - (100)^2} = 1.2$$

$$b = \frac{451 - 1.2 \cdot 100}{5} = 66.2$$

$$y = m \cdot x + b \quad\Rightarrow\quad \text{heart rate} = 1.2 \cdot \text{time} + 66.2$$

Repeat the above procedure for the data set selected in your class. Compare the m and b that you get with your classmates. Doing this by hand is good practice for the exam.
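The hand calculation can be checked with a short script. This is a minimal Python sketch of the summation formulas; the function names `least_squares_line` and `sum_squared_error` are illustrative and not part of the slides. It also compares the sum of squared errors for the regression line with that of the hand-fit line from page 9 (heart rate = 1.27 · time + 69.3).

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b from the least squares summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


def sum_squared_error(xs, ys, m, b):
    """Sum of (y_i - y_i_fit)^2 for the line y_fit = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


time_s = [0, 10, 20, 30, 40]
heart_rate_bpm = [67, 82, 86, 96, 120]

m, b = least_squares_line(time_s, heart_rate_bpm)
print(f"heart rate = {m:.1f} * time + {b:.1f}")   # heart rate = 1.2 * time + 66.2

sse_regression = sum_squared_error(time_s, heart_rate_bpm, m, b)
sse_hand = sum_squared_error(time_s, heart_rate_bpm, 1.27, 69.3)
print(f"regression line: {sse_regression:.2f}   hand-fit line: {sse_hand:.2f}")
```

For this data set the regression line has the smaller sum of squared errors, which is what least squares is designed to guarantee.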

Page 13: repeat for all of the class data

x = cumulative exercise time (s), y = heart rate (bpm)

               x      y    x · y     x²
student 1      0
              10
              20
              30
              40
student 2      0
              10
              20
              30
              40
student 3      0
              10
              20
              30
              40
student 4      0
              10
              20
              30
              40
Σ            400

• reformat your spreadsheet to have single x and y columns as shown (5 lines for each student’s heart rate data)

• find the sums and plug them into the equations for m and b to find the best fit line; try to do these calculations in Excel . . . it’s tricky due to fixed cell references and the placement of parentheses

• create a plot of all data in Excel

• plot the best fit line without any symbols over the data points

• see the next page for an example
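Stacking the per-student columns into single x and y columns can be sketched in Python as below. The heart rate values are the four data sets from the example spreadsheet on the next page; the names `bpm_by_student` and `stack_columns` are made up for this illustration.

```python
# wide layout: one list of bpm readings per student, all sharing the same times
times_s = [0, 10, 20, 30, 40]
bpm_by_student = {
    "student 1": [67, 82, 86, 96, 120],
    "student 2": [50, 60, 70, 80, 90],
    "student 3": [82, 92, 105, 110, 115],
    "student 4": [80, 91, 105, 118, 118],
}


def stack_columns(times, wide):
    """Return single x and y lists with one entry per (student, time) reading."""
    xs, ys = [], []
    for name, readings in wide.items():
        if len(readings) != len(times):
            raise ValueError(f"{name} needs one reading per time value")
        xs.extend(times)      # repeat the time column for each student
        ys.extend(readings)
    return xs, ys


x, y = stack_columns(times_s, bpm_by_student)
print(len(x), sum(x), sum(y))  # 20 400 1817
```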

Page 14: details of solving previous problem in Excel

x = cumulative exercise time (s), y = heart rate (bpm)

row   A            B      C       D        E       F
                   x      y       x · y    x²      yfit
 5    student 1     0     67         0        0     70.5
 6                 10     82       820      100     80.7
 7                 20     86      1720      400     90.9
 8                 30     96      2880      900    101.0
 9                 40    120      4800     1600    111.2
10    student 2     0     50         0        0
11                 10     60       600      100
12                 20     70      1400      400
13                 30     80      2400      900
14                 40     90      3600     1600
15    student 3     0     82         0        0
16                 10     92       920      100
17                 20    105      2100      400
18                 30    110      3300      900
19                 40    115      4600     1600
20    student 4     0     80         0        0
21                 10     91       910      100
22                 20    105      2100      400
23                 30    118      3540      900
24                 40    118      4720     1600
26    Σ           400   1817     40410    12000
28                m =   1.0175
29                b =   70.5

use these data points (the yfit column) to plot the best-fit line

don’t look at these tips unless you get stuck!!

yfit (cell F5): =C$28*B5+C$29

m (cell C28): =(COUNT(B5:B24)*D26-B26*C26)/(COUNT(B5:B24)*E26-B26^2)

[figure: scatter plot titled “heart rate versus exercise time” with the best fit line drawn over the data points; x-axis: cumulative exercise time (s), 0 to 45; y-axis: heart rate (bpm), 50 to 150]
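The spreadsheet calculation can be cross-checked outside Excel. The Python sketch below mirrors the spreadsheet columns for the four example data sets; the variable names are illustrative, and the mapping of columns to spreadsheet letters in the comments is inferred from the Excel formulas shown in the tips.

```python
x = [0, 10, 20, 30, 40] * 4                 # column B: time for each of 4 students
y = [67, 82, 86, 96, 120,                   # column C: student 1
     50, 60, 70, 80, 90,                    #           student 2
     82, 92, 105, 110, 115,                 #           student 3
     80, 91, 105, 118, 118]                 #           student 4

xy = [xi * yi for xi, yi in zip(x, y)]      # column D: x * y
x2 = [xi ** 2 for xi in x]                  # column E: x squared

n = len(x)                                  # same role as COUNT(B5:B24)
sum_x, sum_y, sum_xy, sum_x2 = sum(x), sum(y), sum(xy), sum(x2)

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n

# fitted values at the five distinct times, used to draw the best-fit line
y_fit = [m * xi + b for xi in x[:5]]

print(sum_x, sum_y, sum_xy, sum_x2)         # 400 1817 40410 12000
print(f"m = {m:.4f}, b = {b:.1f}")          # m = 1.0175, b = 70.5
```

The sums, slope and intercept agree with the spreadsheet values, and the fitted values in `y_fit` match the yfit column to within rounding.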