me confidence intervals: bootstrap 2015-02-21آ  bootstrap sample statistic bootstrap sample...

Download ME Confidence Intervals: Bootstrap 2015-02-21آ  Bootstrap Sample Statistic Bootstrap Sample Statistic

Post on 13-Jul-2020

2 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • 2/20/2015

    1

    Statistics: Unlocking the Power of Data Lock5

    STAT 250 Dr. Kari Lock Morgan

    Confidence Intervals: Bootstrap Distribution

    SECTIONS 3.3, 3.4

    • Bootstrap distribution (3.3)

    • 95% CI using standard error (3.3)

    • Percentile method (3.4)

    Statistics: Unlocking the Power of Data Lock5

    Confidence Intervals

    Population Sample

    Sample

    Sample

    Sample Sample Sample

    . . .

    Calculate statistic for each sample

    Sampling Distribution

    Standard Error (SE): standard deviation of sampling distribution

    Margin of Error (ME) (95% CI: ME = 2×SE)

    statistic ± ME

    Statistics: Unlocking the Power of Data Lock5

    Reality

    One small problem…

    … WE ONLY HAVE ONE SAMPLE!!!!

    • How do we know how much sample statistics vary, if we only have one sample?!?

    BOOTSTRAP! Statistics: Unlocking the Power of Data Lock5

    Your best guess for the population

    is lots of copies of the sample!

    Sample repeatedly from this “population”

    Statistics: Unlocking the Power of Data Lock5

    • It’s impossible to sample repeatedly from the population…

    • But we can sample repeatedly from the sample!

    • To get statistics that vary, sample with replacement (each unit can be selected more than once)

    Sampling with Replacement

    Statistics: Unlocking the Power of Data Lock5

    Suppose we have a random sample of 6 people:

  • 2/20/2015

    2

    Statistics: Unlocking the Power of Data Lock5

    Original Sample

    A simulated “population” to sample from

    Statistics: Unlocking the Power of Data Lock5

    Bootstrap Sample: Sample with replacement from the original sample, using the same sample size.

    Original Sample

    Bootstrap Sample

    Remember: sample

    size matters!

    Statistics: Unlocking the Power of Data Lock5

    • How would you take a bootstrap sample from your sample of Reese’s Pieces?

    Reese’s Pieces

    Statistics: Unlocking the Power of Data Lock5

    Your original sample has data values

    18, 19, 19, 20, 21 Is the following a possible bootstrap sample?

    18, 19, 20, 21, 22

    a) Yes b) No

    Bootstrap Sample

    Statistics: Unlocking the Power of Data Lock5

    Your original sample has data values

    18, 19, 19, 20, 21 Is the following a possible bootstrap sample?

    18, 19, 20, 21

    a) Yes b) No

    Bootstrap Sample

    Statistics: Unlocking the Power of Data Lock5

    Your original sample has data values

    18, 19, 19, 20, 21 Is the following a possible bootstrap sample?

    18, 18, 19, 20, 21

    a) Yes b) No

    Bootstrap Sample

  • 2/20/2015

    3

    Statistics: Unlocking the Power of Data Lock5

    Original Sample

    Bootstrap Sample

    Bootstrap Sample

    Bootstrap Sample

    .

    .

    .

    Bootstrap Statistic

    Sample Statistic

    Bootstrap Statistic

    Bootstrap Statistic

    .

    .

    .

    Bootstrap Distribution

    Statistics: Unlocking the Power of Data Lock5

    • What is the average mercury level of fish (large mouth bass) in Florida lakes?

    • Sample of size n = 53, with 𝑥 = 0.527 ppm. (FDA action level in the USA is 1 ppm, in Canada the limit is 0.5 ppm)

    •Key Question: How much can statistics vary from sample to sample?

    Mercury Levels in Fish

    Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4), 466-471.

    Statistics: Unlocking the Power of Data Lock5

    Mercury Levels in Fish

    Statistics: Unlocking the Power of Data Lock5

    You have a sample of size n = 53. You sample with replacement 1000 times to get 1000 bootstrap samples. What is the sample size of each bootstrap sample? (a) 53 (b) 1000

    Bootstrap Sample

    Statistics: Unlocking the Power of Data Lock5

    You have a sample of size n = 53. You sample with replacement 1000 times to get 1000 bootstrap samples. How many bootstrap statistics will you have? (a) 53 (b) 1000

    Bootstrap Distribution

    Statistics: Unlocking the Power of Data Lock5

    “Pull yourself up by your bootstraps”

    Why “bootstrap”?

    • Lift yourself in the air simply by pulling up on the laces of your boots

    • Metaphor for accomplishing an “impossible” task without any outside help

  • 2/20/2015

    4

    Statistics: Unlocking the Power of Data Lock5

    Sampling Distribution

    Population

    µ

    BUT, in practice we don’t see the “tree” or all of the “seeds” – we only have ONE seed

    Statistics: Unlocking the Power of Data Lock5

    Bootstrap Distribution

    Bootstrap “Population”

    What can we do with just one seed?

    𝑥

    Estimate the distribution and variability (SE) of 𝑥 ’s from the bootstraps

    µ

    Statistics: Unlocking the Power of Data Lock5

    Bootstrap statistics are to the original sample statistic

    as

    the original sample statistic is to the population parameter

    Golden Rule of Bootstrapping

    Statistics: Unlocking the Power of Data Lock5

    Center

    •The sampling distribution is centered around the population parameter

    • The bootstrap distribution is centered around the

    •Luckily, we don’t care about the center… we care about the variability!

    a) population parameter b) sample statistic c) bootstrap statistic d) bootstrap parameter

    Statistics: Unlocking the Power of Data Lock5

    Standard Error

    • The variability of the bootstrap statistics is similar to the variability of the sample statistics

    • The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!

    Statistics: Unlocking the Power of Data Lock5

    Confidence Intervals

    Sample Bootstrap

    Sample

    . . .

    Calculate statistic for each bootstrap sample

    Bootstrap Distribution

    Standard Error (SE): standard deviation of bootstrap distribution

    Margin of Error (ME) (95% CI: ME = 2×SE)

    statistic ± ME

    Bootstrap Sample

    Bootstrap Sample

    Bootstrap Sample

    Bootstrap Sample

  • 2/20/2015

    5

    Statistics: Unlocking the Power of Data Lock5

    • What is the average mercury level of fish (large mouth bass) in Florida lakes?

    • Sample of size n = 53, with 𝑥 = 0.527 ppm. (FDA action level in the USA is 1 ppm, in Canada the limit is 0.5 ppm)

    •Give a confidence interval for the true average.

    Mercury Levels in Fish

    Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4), 466-471.

    Statistics: Unlocking the Power of Data Lock5

    SE = 0.047

    0.527  2  0.047 (0.433, 0.621)

    Mercury Levels in Fish

    We are 95% confident that average mercury level in fish in Florida lakes is between 0.433 and 0.621 ppm.

    Statistics: Unlocking the Power of Data Lock5

    Same process for every parameter! Estimate the standard error and/or a confidence interval for...

    • proportion (𝑝)

    • difference in means (µ1 − µ2 )

    • difference in proportions (𝑝1 − 𝑝2 )

    • standard deviation (𝜎)

    • correlation (𝜌)

    • ... Generate samples with replacement Calculate sample statistic Repeat...

    Statistics: Unlocking the Power of Data Lock5

    Hitchhiker Snails

     A type of small snail is very widespread in Japan, and colonies of the snails that are genetically very similar have been found very far apart.

     How could the snails travel such long distances?

     Biologist Shinichiro Wada fed 174 live snails to birds, and found that 26 were excreted live out the other end. (The snails are apparently able to seal their shells shut to keep the digestive fluids from getting in).

     What proportion of these snails ingested by birds survive?

    Yong, E. “The Scatological Hitchhiker Snail,” Discover, October 2011, 13.

    Statistics: Unlocking the Power of Data Lock5

    Hitchhiker Snails Give a 95% confidence interval for the proportion of snails ingested by birds that survive.

    a) (0.1, 0.2) b) (0.05, 0.25) c) (0.12, 0.18) d) (0.07, 0.18)

    Statistics: Unlocking the Power of Da

Recommended

View more >