Data Analysis
A Few Necessary Terms
Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool)
Continuous Variable: Measurements along a continuum, such as Flow Velocity
What type of variable would “Mottled Sculpin/m²” be?
What type of variable is “Substrate Type”?
What type of variable is “% of bank that is undercut”?
A Few Necessary Terms
Explanatory Variable: Independent variable. On x-axis. The variable you use as a predictor.
Response Variable: Dependent variable. On y-axis. The variable that is hypothesized to depend on/be predicted by the explanatory variable.
Statistical Tests: Appropriate Use
For our data, the response variable will always be continuous.
T-test: A categorical explanatory variable with 2 options.
ANOVA: A categorical explanatory variable with >2 options.
Regression: A continuous explanatory variable
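The rule of thumb above can be written as a small lookup. This is only an illustrative Python sketch: the function name `choose_test` and its argument labels are hypothetical, and it assumes, as stated above, that the response variable is continuous.

```python
def choose_test(explanatory_type, n_groups=None):
    """Suggest a test for a continuous response variable.

    explanatory_type: "categorical" or "continuous" (hypothetical labels).
    n_groups: number of options of a categorical explanatory variable.
    """
    if explanatory_type == "continuous":
        return "Regression"
    if explanatory_type == "categorical":
        if n_groups is None or n_groups < 2:
            raise ValueError("a categorical explanatory variable needs at least 2 options")
        return "T-test" if n_groups == 2 else "ANOVA"
    raise ValueError("explanatory_type must be 'categorical' or 'continuous'")


# Type of Reach (Riffle, Run, Pool) has 3 options; Flow Velocity is continuous.
print(choose_test("categorical", 3))
print(choose_test("continuous"))
```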
Statistical Tests
Hypothesis Testing: In statistics, we always test a Null Hypothesis (H0) against an Alternative Hypothesis (Ha).
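Test Statistic: A single number calculated from the sample data (for example, t for a T-test or F for an ANOVA) that is used to find the p-value.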
p-value: The probability of observing our data or more extreme data assuming the null hypothesis is correct
Statistical Significance: We reject the null hypothesis if the p-value is below a set value, usually 0.05.
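One way to see what "our data or more extreme data assuming the null hypothesis is correct" means is an exact permutation test. This is a different technique from the T-test, ANOVA, and regression covered in these slides, and is used here only as an illustration. If group membership does not matter (the null hypothesis), every way of splitting the pooled observations into two groups is equally likely, so the p-value is the share of splits with a difference in means at least as large as the one observed. The function name and the numbers are made up for the sketch.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(indices_a):
        sum_a = sum(pooled[i] for i in indices_a)
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(range(n_a))
    splits = 0
    as_extreme = 0
    # Enumerate every possible assignment of observations to group A.
    for indices_a in combinations(range(len(pooled)), n_a):
        splits += 1
        # Small tolerance so floating-point ties count as "as extreme".
        if abs_mean_diff(indices_a) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / splits


# Made-up values, not real survey data.
p = permutation_p_value([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
reject_null = p < 0.05  # the usual cutoff mentioned above
print(p, reject_null)
```

Enumerating every split is only practical for small samples; the number of splits grows very quickly with sample size.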
Student’s T-Test
Tests the statistical significance of the difference between means from two independent samples
Compares the mean of a continuous response variable between the 2 groups of a categorical explanatory variable
[Figure: Mottled Sculpin/m² at Cross Plains and Salmo Pond]
Precautions and Limitations
• Meet Assumptions
• Observations from data with a normal distribution (histogram)
• Samples are independent
• Assumed equal variance (boxplot)
• No other sample biases
• Interpreting the p-value
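The test statistic behind a Student's T-test with assumed equal variance can be sketched in a few lines of Python. This is a minimal illustration, not a full test: it returns t and the degrees of freedom, and the p-value would still have to come from a t-table or a statistics package. The function name and the sample values are made up.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's t statistic for two independent samples, assuming equal variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df


# Made-up values for two sites, not real survey data.
site_a = [1, 2, 3, 4, 5]
site_b = [6, 7, 8, 9, 10]
t, df = pooled_t_statistic(site_a, site_b)
print(t, df)
```

The sign of t only reflects which sample is listed first; a two-sided test uses its absolute value.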
Analysis of Variance (ANOVA)
Tests the statistical significance of the difference between means from two or more independent samples
[Figure: Mottled Sculpin/m² by reach type (Riffle, Pool, Run), with the Grand Mean marked]
Precautions and Limitations
• Meet Assumptions
• Observations from data with a normal distribution
• Samples are independent
• Assumed equal variance
• No other sample biases
• Interpreting the p-value
• Pairwise T-tests to follow
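ANOVA compares how far the group means sit from the grand mean with how much the observations vary inside each group. The sketch below computes the resulting F statistic in Python under the equal-variance assumption listed above. The function name and the numbers are made up, and the p-value would still need an F-table or a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up values for three reach types, not real survey data.
riffle, pool, run = [1, 2, 3], [2, 3, 4], [3, 4, 5]
print(one_way_anova_f([riffle, pool, run]))
```

A significant F only says that at least one group mean differs; the pairwise T-tests mentioned above are what identify which ones.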
Simple Linear Regression
• What is it? Least squares line
• When is it appropriate to use?
• Assumptions?
• What does the p-value mean? The R-value?
• How to do it in Excel
Simple Linear Regression
[Figure: scatter plot of Brown Trout/Meter^2 (y-axis) against Mottled Sculpin/Meter^2 (x-axis) with fitted line, R2 = 0.6955]
Tests the statistical significance of a relationship between two continuous variables, Explanatory and Response
Precautions and Limitations
• Meet Assumptions
• Observations from data with a normal distribution
• Samples are independent
• Assumed equal variance
• Relationship is linear
• No other sample biases
• Interpret the p-value and R-squared value.
Residual Plots
Residuals are the vertical distances from the observed points to the best-fit line (observed value minus fitted value)
Residuals from the least squares line always sum to zero
Regression chooses the best-fit line to minimize the sum of squared residuals. It is called the Least Squares Line.
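The least squares line, its residuals, and R-squared can be computed directly from their definitions. The Python sketch below is illustrative only: the function name and the data are made up, and it fits a line with an intercept, which is the case in which the residuals sum to zero.

```python
from statistics import mean


def least_squares_fit(x, y):
    """Return slope, intercept, residuals and R-squared of the least squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    # Residual = observed value minus fitted (model) value.
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    ss_residual = sum(r ** 2 for r in residuals)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, residuals, r_squared


# Made-up values, not real survey data.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
slope, intercept, residuals, r_squared = least_squares_fit(x, y)
print(slope, intercept, r_squared)
print(sum(residuals))  # zero up to floating-point rounding
```

Plotting `residuals` against the fitted values gives the residual vs. fitted value plot used to check the assumptions.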
[Figure: scatter plot of Brown Trout/Meter^2 (y-axis) against Mottled Sculpin/Meter^2 (x-axis), R2 = 0.6955. Labels: Model Values (Line), Observed Values (Points), Residuals]
Residual vs. Fitted Value Plots
[Figure: Residuals (y-axis) against Fitted Values (MS_CPUA) (x-axis)]
Residual Plots Can Help Test Assumptions
[Figure: three example residual plots: “Normal” Scatter; Fan Shape: Unequal Variance; Curve (linearity)]
[Figure: Residuals (y-axis) against Fitted Values (MS_CPUA) (x-axis), shown with the scatter plot of Brown Trout/Meter^2 against Mottled Sculpin/Meter^2, R2 = 0.6955]
Have we violated any assumptions?
R-Squared and P-value
High R-Squared
Low p-value (significant relationship)
R-Squared and P-value
Low R-Squared
Low p-value (significant relationship)
R-Squared and P-value
High R-Squared
High p-value (NO significant relationship)
R-Squared and P-value
Low R-Squared
High p-value (No significant relationship)
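P-value indicates whether the relationship between the two variables is statistically significant: how likely a relationship at least this strong would be if there were really no relationship.
R-Squared indicates how much of the variance in the response variable is explained by the explanatory variable.
You can think of R-Squared as a measure of predictability.
If R-Squared is low, other variables likely play a role. If R-Squared is high, it DOES NOT INDICATE A SIGNIFICANT RELATIONSHIP! Check the p-value.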
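Why R-squared and the p-value can disagree is easier to see from the test statistic for the slope in simple linear regression, t = sqrt(R² (n - 2) / (1 - R²)) with n - 2 degrees of freedom: significance depends on the sample size as well as on R-squared. The Python sketch below is illustrative, with a made-up function name and made-up inputs. The critical values are the standard two-sided 0.05 values from a t-table (12.706 for 1 degree of freedom, 2.048 for 28).

```python
from math import sqrt


def slope_t_statistic(r_squared, n):
    """Absolute t statistic for the slope, with n - 2 degrees of freedom."""
    if n < 3 or not 0 <= r_squared < 1:
        raise ValueError("need n >= 3 and 0 <= r_squared < 1")
    return sqrt(r_squared * (n - 2) / (1 - r_squared))


# High R-squared, only 3 points: t is far below the critical value 12.706.
t_small_sample = slope_t_statistic(0.8, 3)
print(t_small_sample, t_small_sample > 12.706)

# Low R-squared, 30 points: t exceeds the critical value 2.048.
t_large_sample = slope_t_statistic(0.2, 30)
print(t_large_sample, t_large_sample > 2.048)
```

So a high R-squared from very few points can still fail to be significant, while a low R-squared from many points can be significant.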