4-step process confidence intervalsteachers.dadeschools.net/khankollari/ap stat notes... ·...

120
4-Step Process Confidence Intervals 1a

Upload: others

Post on 29-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • 4-Step Process Confidence Intervals

    1a

  • 1b

  • 4-Step Process Significance Tests

    2a

  • 2b

  • Advantage of using a StratifiedRandom Sample

    3a

  • Stratified random samplingguarantees that each of the strata willbe represented. When strata arechosen properly, a stratified randomsample will produce better (lessvariable/more precise) informationthan an SRS of the same size.

    3b

  • Bias

    4a

  • The systematic favoring of certainoutcomes due to flawed sampleselection, poor question wording,under coverage, nonresponse, etc. Bias deals with the center of asampling distribution being "off"!

    4b

  • Binomial Distribution (CalculatorUsage)

    5a

  • 5b

  • Binomial Distribution (conditions)

    6a

  • 6b

  • Can we generalize the results to thepopulation of interest?

    7a

  • 7b

  • Carrying out a Two-sided test from aconfidence interval

    8a

  • 8b

  • Chi-Square Tests df and ExpectedCounts

    9a

  • 9b

  • Complementary Events

    10a

  • Two mutually exclusive events who'sunion is the sample space

    Ex. Rain / Not Rain

    10b

  • Describe the Distribution orCompare the Distributions

    11a

  • SOCS! Shape, Outliers, Center, Spread. Only discussoutliers if there are obviously outliers present.Be sure to address SCS in context! If it says "compare" YOU MUST USE comparison phrases like "isgreater than" or "is less than" for Center &Spread.

    11b

  • Does ____ CAUSE ____?

    12a

  • Association is NOT Causation! An observed association, no matterhow strong, is not evidence ofcausation. Only a well-designed,controlled experiment can lead toconclusions of cause and effect.

    12b

  • Experimental Designs

    13a

  • CRD (Completely Randomized Design) - All experimentalunits are allocated at random among all treatments RBD (Random Block Design) - Experimented units are put intohomogeneous blocks. The random assignment of the units tothe treatment is carried out separately within each block. Matched Pairs - A form of blocking in which each subjectreceives both treatments in random order or the subjects arematched in pairs as closely as possible and one subject ineach pair receive each treatment determined at random.

    13b

  • Experiment or Observational Study?

    14a

  • A study is an experiment ONLY ifresearchers IMPOSE a treatment

    upon the experimental units. In an observational study researchersmake no attempt to influence theresults.

    14b

  • Explain a P- value

    15a

  • 15b

  • Extrapolation

    16a

  • Using a LSRL to predict outside thedomain of the explanatory variable (Can lead to ridiculous conclusions ifthe current linear trend does notcontinue)

    16b

  • Factors the Affect Power

    17a

  • 17b

  • Finding the Sample Size (for a givenmargin of error)

    18a

  • 18b

  • Geometric Distribution (conditions)

    19a

  • 19b

  • Goal of Blocking | Benefit of Blocking

    20a

  • The goal of blocking is to creategroups of homogenous experimentalunits The benefit of blocked is thereduction of the effect of variationwithin the experimental units.

    20b

  • Inference for Counts (Chi-SquaredTest) (Conditions)

    21a

  • 21b

  • Inference for Means (conditions)

    22a

  • 22b

  • Inference for Proportions(conditions)

    23a

  • 23b

  • Inference for Regression (conditions)

    24a

  • 24b

  • Interpret a Residual Plot

    25a

  • 1. Is there a curved patter? If so, a linear

    model may not be appropriate.

    2. Are the residuals small in size? if so,

    predictions using the linear model will be

    fairly precise.

    3. Is there increase or decreasing

    spread? If so, predictions for larger

    (smaller) value of x will be more variable.25b

  • Interpreting a Confidence Interval(Not a confidence level)

    26a

  • 26b

  • Interpreting a Confidence Level (themeaning of 95% confident)

    27a

  • 27b

  • Interpreting Expected Value/ Mean

    28a

  • The mean/expected value of arandom variable is the long runaverage outcome of a randomphenomenon carried out a very largenumber of times.

    28b

  • Interpreting Probability

    29a

  • The Probability of any outcome of arandom phenomenon is theproportion of times the outcomewould occur in a very long series ofrepetitions. Probability is a long termrelative frequency.

    29b

  • Interpret LSRL "s"

    30a

  • s = _____ is the standard deviation ofthe residuals It Measures the typical distancebetween the actual y-values(context) and their predicted y-values (context)

    30b

  • Interpret LSRL "SE-b"

    31a

  • SE-b measures the standard deviation of

    the estimated slope for predicting the y

    variable (context) from the x variable

    (context).

    SE-b measures how far the estimated

    slope will be from the true slope, on

    average31b

  • Interpret LSRL Slop "b"

    32a

  • For every one unit change in the inthe x variable (context) the y variable(context) is predicted toincrease/decrease by ____ units(context)

    32b

  • Interpret LSRL "y"

    33a

  • ŷ is the "estimated" value or"predicted" y- value (context) for a

    given x-value (context)

    33b

  • Interpret LSRL y-intercept "a"

    34a

  • When the x variable (context) is zero,the y variable (context) estimated to

    be `put value here`

    34b

  • interpret r

    35a

  • Correlation measures the strength and direction

    of the linear relationship between x and y

    - r is always between -1 and 1

    - close to zero = very weak

    -close to 1 or -1 = stronger

    -Exactly 1 or -1 = perfectly straight line

    -positive r = positive correlation

    -negative r = negative correlation35b

  • interpret r ^ 2 (r squared)

    36a

  • ___% of the variation in y (context) is accountedfor by the LSRL of y (context) on x (context) or __$ of the variation in y (context) is accountedfor by using the linear regression model with X(context) as the explanatory variable

    36b

  • Interpret Standard Deviation

    37a

  • Standard Deviation measures spreadby giving the " typical" or "average"distance that the observations(context) are away from their(context) mean

    37b

  • Interpret Z-Score

    38a

  • z = value - mean / standard deviation A z-score describes how many standarddeviations a value or statistic (x, x̄,p̂, etc.) fallsaway from the mean of the distribution and inwhat direction. The further the z-score is awayfrom zero the more "surprising" the value ofthe statistic is.

    38b

  • Linear Transformation

    39a

  • Adding "a" to every member of a data set adds "a"

    to the measures of a position, but does not

    change the measures of spread or shape

    Multiplying every member of data set by "b"

    multiplies the measures of position by "b" and

    multiplies most measures of spread by |b| but

    does not change the shape

    39b

  • Mean & Standard Deviation of aBinomial Random Variable

    40a

  • 40b

  • Mean & Standard Deviation of aDifference of Two Random Variables

    41a

  • 41b

  • Mean & Standard Deviation of aDiscrete Random Variable

    42a

  • 42b

  • Mean & Standard Deviation of a Sumof a Two Random Variables

    43a

  • 43b

  • Outlier Rule

    44a

  • Upper Bound = Q3 + 1.5(IQR) Lower Bound = Q1 - 1.5(IQR)

    IQR = Q3 - Q1

    44b

  • Paired t-test Phrasing Hints, H0 andHa Conclusion

    45a

  • 45b

  • P(at least one)

    46a

  • P(at least one) = 1 - P(none)

    Ex. P(at least one 6 in three rolls) =_____

    P(Get at lest one six) = 1 - P(No Sixes)

    46b

  • The Sampling distribution of thesample mean (central limit theorem)

    47a

  • 47b

  • Sampling Techniques

    48a

  • SRS - Number the entire population, draw numbers from a hat ( every setof n individuals has equal chance of selection) Stratified - Split the population into homogeneous groups, selection anSRS from each group. Cluster - Split the population into heterogeneous groups called clusters,and randomly select whole clusters for the sample. Ex. choosing a cartonof eggs actually chooses a cluster (group) of 12 eggs Census - An attempt to reach the entire population Convenience - Selects individuals easiest to reach Voluntary Response - People choose themselves by responding to ageneral appeal

    48b

  • SOCS

    49a

  • SHAPE: skewed left (Mean < Median) skewed right (Mean > Median) Fairly Symmetrical ( Mean = Median) Outliers: Discuss them if there are obvious ones Center: Mean or Median Spread: Range, IQR, or Standard Deviation Note ~ Also be on the lookout for gaps, clusters, or other unusual features of theddata set.

    49b

  • SRS

    50a

  • A simple random sample is a sampletaken in such a way that everypossible set of n individuals has anequal chance of selection.

    50b

  • Two Events are independent if...

    51a

  • P(B) = P(B|A^c) P(B) = P( B | A ) The knowledge of one event doestnot change the probability of theother event occurring

    51b

  • Two Sample t-test Phrasing Hints, H0and Ha, Conclusion

    52a

  • 52b

  • Type I error, Type II Error, & Power

    53a

  • 53b

  • Types of Chi-Square Tests

    54a

  • Goodness of Fit - use to test the distribution of one group orsample as compared to a hypothesized distribution Homogeneity: Use when you have a sample from 2 or moreindependent populations of 2 or more groups in anexperiment. Each individual must be classified based upon asingle categorical variable. Association/Independence: Use when you have a singlesample from a single population. Individuals in the sample areclassified by two categorical variables

    54b

  • Unbiased Estimator

    55a

  • 55b

  • Using NormalCDF and InvNorm (Tibrand calculator tips)

    56a

  • Normal CDF (min, max, mean,standard deviation)

    InvNorm (area to the left as a

    decimal, mean, standard deviation)

    56b

  • What is an outlier?

    57a

  • When given 1 variable data: An outlier is any value that falls morethan 1.5(IQR) above Q3 or below Q1. Regression Outlier: Any value that falls outside thepattern of the rest of the data.

    57b

  • What is a residual?

    58a

  • Residual = y - ŷ

    A residual measures the difference

    between the actual (observed) y - value

    in a scatterplot and the y-value that is

    predicted by the LSRL using its

    corresponding x value.

    In the calculator: L3 = L2 - Y1 (L1)58b

  • Why large Samples give moretrustworthy Results...(When collected

    appropriately)

    59a

  • 59b

  • Why use a control group?

    60a

  • A control group gives theresearchers a comparison group tobe used to evaluate the effectivenessof the treatment(s). (context) (gauge

    the effect of the treatment comparedto no treatment at all)

    60b