A Brief Tour of Statistics - Noisebridge

Post on 17-Jul-2020

1/61

A Brief Tour of Statistics

This work by Thomas Lotze is licensed under a Creative Commons Attribution 3.0 United States License. You’re free to share or change it, so long as you provide attribution to Thomas Lotze.

http://creativecommons.org/licenses/by/3.0/us

2/61

• What are Distributions?
• Models
  – Binomial
  – Poisson
  – Uniform
  – Gaussian (normal)
• Using Distributions to Answer Questions
• Distribution Parameters
• Estimating Parameters
• Confidence Intervals and P-values
• Why Gaussian?
• Regression
• Logistic Regression
• Why Not Gaussian?
• Bootstrapping
• Multiple Testing
• Useful Tools

3/61 Creative Commons licensed, from gr0don on flickr

4/61 Creative Commons licensed, from WxMom on flickr

5/61 Creative Commons licensed, from stringberd on flickr

6/61 Creative Commons licensed, from benchilada on flickr

7/61 Creative Commons licensed, from benchilada on flickr

9/61 Creative Commons licensed, from Jason Tromm on flickr

10/61

11/61

12/61 Creative Commons licensed, from Rickydavid on flickr

13/61

14/61 Creative Commons licensed, from Antoine Taveneaux on Wikimedia Commons

16/61

17/61

18/61

19/61

21/61 Creative Commons licensed, from gr0don on flickr

23/61

21/38

24/61

25/61

27/61

28/61

29/61

30/61

31/61

Actual Field Goal Rate    Probability of 21 or more successes
0.44                      0.1
0.41                      0.05
0.34                      0.01
0.30                      0.001
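
The probabilities in the table above can be checked with a binomial tail sum. The following is a minimal sketch, assuming that the "21/38" shown on an earlier slide means 21 made field goals in 38 attempts, and that attempts are independent with a fixed success rate. The function name `binomial_tail` is illustrative and does not come from the slides.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: 38 attempts, taken from the "21/38" shown on an earlier slide.
ATTEMPTS = 38
MADE = 21

for rate in (0.44, 0.41, 0.34, 0.30):
    tail = binomial_tail(MADE, ATTEMPTS, rate)
    print(f"actual rate {rate:.2f}: P({MADE} or more of {ATTEMPTS}) = {tail:.4f}")
```

Read this way, the table works like a confidence interval turned around: the lower the assumed true rate, the less plausible it is to see 21 or more successes by chance.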

33/61

34/61

35/61

37/61

38/61

39/61

40/61

41/61

42/61

44/61

45/61

47/61 Creative Commons licensed, from Alan Vernon on flickr

48/61

49/61

50/61

51/61

52/61

55/61 Creative Commons licensed, from CharlotteKinzie on flickr

56/61

57/61

Probability of False Detection (per test)    Number of Tests    Probability of False Detection (in at least 1 test)
0.01                                         1                  0.01
0.01                                         10                 0.10
0.01                                         100                0.63
0.05                                         1                  0.05
0.05                                         10                 0.40
0.05                                         100                0.99
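
The numbers in this table are consistent with the formula 1 - (1 - a)^n, where a is the per-test false detection probability and n is the number of tests. The sketch below assumes the tests are independent, which the slides do not state explicitly. The function name `prob_any_false_detection` is illustrative.

```python
def prob_any_false_detection(per_test_rate, n_tests):
    """Probability of at least one false detection in n_tests tests.

    Assumes the tests are independent and each one has the same
    false detection probability per_test_rate.
    """
    return 1 - (1 - per_test_rate) ** n_tests


for per_test_rate in (0.01, 0.05):
    for n_tests in (1, 10, 100):
        p_any = prob_any_false_detection(per_test_rate, n_tests)
        print(f"per test {per_test_rate:.2f}, {n_tests:>3} tests: {p_any:.2f}")
```

With 100 tests at 0.05 each, a false detection somewhere is close to certain, which is the reason for being skeptical of multiple testing.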

59/61

• R: http://cran.r-project.org/

• Casella & Berger, Statistical Inference

• Wikipedia: http://en.wikipedia.org/wiki/List_of_probability_distributions

• Salsburg, The Lady Tasting Tea

• thomas.lotze@thomaslotze.com

60/61

1. Everything is a Distribution

2. Many Kinds of “Random” (many Distributions)

3. Estimated Parameters are Random: They have Distributions!

4. Statistical Decisions come from Distribution Estimates

5. Be Skeptical of Normality: Mean and Variance are not sufficient!

6. Be Skeptical of Multiple Testing

61/61

• Practical Take-home
  – Normality test
  – T-test
  – Wilcoxon rank-sum test
• Other Distributions
  – Student’s T
  – F
  – Lognormal
  – Geometric
  – Levy
  – Weibull
  – Benford
• Goodness of fit
  – Chi-squared test
  – Q-Q plots
• Distribution Connections
• Multivariate Distributions
• Bias/Variance Tradeoff
• Nonparametric Distributions
• Model Comparison: Parameters and Fit (AIC, BIC)
• Bayesian Statistics
• Bayes’ Law
• Cognitive Biases
• Time series
• Counterintuitive Probability
• Monty Hall
• Two Aces
• Poisson Waiting Times
• Markov Chain Monte Carlo
• Extreme Value Statistics
