![Page 1: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/1.jpg)
Introduction to Descriptive Statistics
17.871
Spring 2015
![Page 2: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/2.jpg)
Reasons for paying attention to data description
• Double-check data acquisition
• Data exploration
• Data explanation
![Page 3: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/3.jpg)
Key measures Describing data
Moment
Non-moment
based location
parameters
Center Mean Mode, median
Spread Variance (standard deviation)
Range,
Interquartile range
Skew Skewness --
Peaked Kurtosis --
![Page 4: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/4.jpg)
Key distinction Population vs. Sample Notation
Population vs. Sample
Greeks Romans
μ, σ, β s, b
![Page 5: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/5.jpg)
Mean
Xn
xn
i
i
1
![Page 6: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/6.jpg)
Guess the Mean
Source: CCES
0.1
.2.3
.4
Den
sity
0strongly approve
somewhat approvesomewhat disapprove
strongly disapprove
institution approval - supreme court
![Page 7: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/7.jpg)
Guess the Mean
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
![Page 8: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/8.jpg)
Guess the Mean
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
2.8
![Page 9: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/9.jpg)
Guess the Mean
0
.2
.4
.6
.8
Fra
ctio
n
0 10 20 30Number of medals
![Page 10: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/10.jpg)
Guess the Mean
0
.2
.4
.6
.8
Fra
ctio
n
0 10 20 30Number of medals
3.3
![Page 11: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/11.jpg)
Variance, Standard Deviation of a Population
n
i
i
n
i
i
n
x
n
x
1
2
2
1
2
)(
,)(
![Page 12: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/12.jpg)
Variance, S.D. of a Sample
sn
x
sn
x
n
i
i
n
i
i
1
2
2
1
2
1
)(
,1
)(
Degrees of freedom
![Page 13: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/13.jpg)
Guess
What was the mean and standard deviation of the age of the MIT undergraduate
population on Registration Day, Fall 2014?
18 19 20 21 22
![Page 14: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/14.jpg)
Guess
What was the mean and standard deviation of the MIT undergraduate
population on Registration Day, Fall 2014?
My guess:
Mean probably ~ 19.5 (if everyone is 18, 19, 20, or 21, and they are
evenly distributed.
s.d. probably ~ 1
18 19 20 21 22
![Page 15: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/15.jpg)
Guess the Standard Deviation
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
![Page 16: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/16.jpg)
Guess the Standard Deviation
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
σ = 0.89
![Page 17: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/17.jpg)
Guess the Standard Deviation
0
.2
.4
.6
.8
Fra
ctio
n
0 10 20 30Number of medals
3.3
![Page 18: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/18.jpg)
Guess the Standard Deviation
0
.2
.4
.6
.8
Fra
ctio
n
0 10 20 30Number of medals
3.3 σ = 7.2
![Page 19: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/19.jpg)
Binary data
)1()1(
1 timeof proportion1)(
2 xxsxxs
xXprobX
xx
![Page 20: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/20.jpg)
Example of this, using the most recent Gallup approval rating of Pres. Obama
• gen o_approve = 1 if
gallup==“Approve”
• replace o_approve = 0
if gallup==“Disapprove”
• the command summ o_approve produces
• Mean = 0.51
• Var = 0.51(1-0.51)=.2499
• S.d. = .49989999
![Page 21: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/21.jpg)
Therefore, reporting the standard deviation (or variance) of a binary
variable is redundant information. Don’t do it for papers written for 17.871.
![Page 22: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/22.jpg)
Non-moment base measures of center or spread
• Central tendency
– Mode
– Median
• Spread
– Range
– Interquartile range
![Page 23: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/23.jpg)
Mode
• The most common value
![Page 24: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/24.jpg)
Guess the Mode
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
2.8
![Page 25: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/25.jpg)
Guess the Mode
0
.2
.4
.6
.8
Fra
ctio
n
0 10 20 30Number of medals
3.3
![Page 26: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/26.jpg)
Guess the Mode
Number of years the respondent has lived in his/her current home
0
.02
.04
.06
.08
Fra
ctio
n
0 20 40 60 80 100How long lived in current residence - Years
![Page 27: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/27.jpg)
Guess the Mode
pew religion | Freq. Percent Cum.
--------------------------+-----------------------------------
protestant | 26,241 47.40 47.40
roman catholic | 12,348 22.30 69.70
mormon | 931 1.68 71.38
eastern or greek orthodox | 275 0.50 71.88
jewish | 1,678 3.03 74.91
muslim | 164 0.30 75.21
buddhist | 445 0.80 76.01
hindu | 89 0.16 76.17
agnostic | 2,885 5.21 81.38
nothing in particular | 7,641 13.80 95.18
something else | 2,667 4.82 100.00
--------------------------+-----------------------------------
Total | 55,364 100.00
![Page 28: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/28.jpg)
The mode is rarely an informative statistic about the central tendency of the data.
It’s most useful in describing the “typical” observation of a categorical variable
![Page 29: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/29.jpg)
Median
• The numerical value separating the upper half of a distribution from the lower half of the distribution
– If N is odd, there is a unique median
– If N is even, there is no unique median --- the convention is to average the two middle values
![Page 30: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/30.jpg)
Guess the Median
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
2.8 2.0
![Page 31: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/31.jpg)
Guess the Median
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
2.8 2.0
3.0
![Page 32: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/32.jpg)
Guess the Median
0
.2
.4
.6
.8
Fra
ctio
n
0 10 20 30Number of medals
3.3
![Page 33: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/33.jpg)
Guess the Median
0
.2
.4
.6
.8
Fra
ctio
n
0 10 20 30Number of medals
3.3 0
![Page 34: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/34.jpg)
Guess the Median
Number of years the respondent has lived in his/her current home
0
.02
.04
.06
.08
Fra
ctio
n
0 20 40 60 80 100How long lived in current residence - Years
Mode = 0
Mean = 11.8
![Page 35: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/35.jpg)
Guess the Median
Number of years the respondent has lived in his/her current home
0
.02
.04
.06
.08
Fra
ctio
n
0 20 40 60 80 100How long lived in current residence - Years
Mode = 0
Mean = 11.8
Median = 8
![Page 36: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/36.jpg)
Guess the Median
Number of years the respondent has lived in his/her current home
0
.02
.04
.06
.08
Fra
ctio
n
0 20 40 60 80 100How long lived in current residence - Years
Mode = 0
Mean = 11.8
Median = 8
Note with right-skewed data:
Mode<median<mean
![Page 37: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/37.jpg)
Median frequently preferred for income data
![Page 38: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/38.jpg)
The (uninformative) graph
Mean = 68,735
Median = 35,000
Mode = 0 (probably)
0
1.0e-05
2.0e-05
3.0e-05
4.0e-05
5.0e-05
Den
sity
0 1.00e+07 2.00e+07 3.00e+07Income
![Page 39: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/39.jpg)
Spread
• Range
– Max(x) – Min(x)
• Interquartile range (IQR)
– Q3(x) – Q1(x)
Q1 = CDF-1(.25)
Q3 = CDF-1(.75)
![Page 40: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/40.jpg)
Guess the IQR
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
σ = 0.89
![Page 41: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/41.jpg)
Guess the IQR
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
σ = 0.89
IQR = 2
![Page 42: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/42.jpg)
Guess the IQR
Number of years the respondent has lived in his/her current home
0
.02
.04
.06
.08
Fra
ctio
n
0 20 40 60 80 100How long lived in current residence - Years
Mode = 0
Mean = 11.8
Median = 8
σ = 11.7
![Page 43: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/43.jpg)
Guess the IQR
Number of years the respondent has lived in his/her current home
0
.02
.04
.06
.08
Fra
ctio
n
0 20 40 60 80 100How long lived in current residence - Years
Mode = 0
Mean = 11.8
Median = 8
σ = 11.7
IQR = 14 (17-3)
![Page 44: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/44.jpg)
Don’t guess the IQR
0
.05
.1.1
5
Fra
ctio
n
0 10000000 20000000 30000000income
σ = 371,799
IQR = 50,000 (62,500-12,500)
Mean = 68,735
Median = 35,000
Mode = 0 (probably)
![Page 45: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/45.jpg)
Lopsidedness and peakedness
![Page 46: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/46.jpg)
Normal distribution example
• IQ
• SAT
• Height
• Symmetrical
• Mean = median = mode
Value
Frequency
22/)(
2
1)(
xexf
![Page 47: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/47.jpg)
Skewness Asymmetrical distribution
• Income
• Contribution to candidates
• Populations of countries
• Age of MIT undergraduates
• “Positive skew”
• “Right skew” Value
Frequency
![Page 48: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/48.jpg)
0.5
11
.5
Den
sity
0 5 10 15dividends_pc
Hyde County, SD
Distribution of the average $$ of dividends/tax return (in K’s)
0
.02
.04
.06
.08
.1
Den
sity
0 50 100 150var1
Fuel economy of cars for sale in
the US
Mitsubishi i-MiEV
(which is supposed to be all electric)
![Page 49: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/49.jpg)
Skewness Asymmetrical distribution
• GPA of MIT students
• Age of MIT faculty
• “Negative skew”
• “Left skew”
Value
Frequency
![Page 50: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/50.jpg)
Placement of Republican Party on 100-point scale
0
.02
.04
.06
.08
Den
sity
0 20 40 60 80 100place on ideological scale - republican party
![Page 51: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/51.jpg)
Skewness
Value
Frequency
![Page 52: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/52.jpg)
Guess the sign of the skew
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
![Page 53: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/53.jpg)
Guess the sign of the skew
Source: CCES
0.1
.2.3
.4
Den
sity
0 1 2 3 4institution approval - supreme court
γ = 0.89
![Page 54: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/54.jpg)
Guess the sign of the skew
Number of years the respondent has lived in his/her current home
0
.02
.04
.06
.08
Fra
ctio
n
0 20 40 60 80 100How long lived in current residence - Years
![Page 55: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/55.jpg)
Guess the sign of the skew
Number of years the respondent has lived in his/her current home
0
.02
.04
.06
.08
Fra
ctio
n
0 20 40 60 80 100How long lived in current residence - Years
γ = 1.5
![Page 56: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/56.jpg)
Note: It is really rare to find a naturally occurring variable with a
negative skew
0
.05
.1
.15
Fra
ctio
n
40 50 60 70 80Life expectancy
Mean = 68.3
s.d. = 8.7
Skew: -0.80
![Page 57: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/57.jpg)
Kurtosis
Value
Frequency
k > 3
k = 3
k < 3
leptokurtic
platykurtic
mesokurtic
![Page 58: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/58.jpg)
Mean s.d. Skew. Kurt.
Self-placement
4.5 1.9 -0.28 1.9
Dem. pty 2.2 1.4 1.1 3.9
Rep. pty 5.6 1.4 -0.98 3.7
Tea party 6.1 1.3 -2.1 7.5
Source: CCES, 2012
0
.05
.1.1
5.2
.25
Den
sity
0 2 4 6 8ideology - yourself
0.1
.2.3
.4
Den
sity
0 2 4 6 8ideology - dem party
0.1
.2.3
Den
sity
0 2 4 6 8ideology - rep party
0.2
.4.6
Den
sity
0 2 4 6 8ideology - tea party movement
![Page 59: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/59.jpg)
Normal distribution
• Skewness = 0
• Kurtosis = 3
22/)(
2
1)(
xexf
![Page 60: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/60.jpg)
Commands in STATA for univariate statistics (K&K pp. 176-186)
• summarize varlist
• summarize varlist, detail
• histogram varname, bin() start()
width() density/fraction/frequency
normal discrete
• table varname,contents(clist)
• tabstat
varlist,statistics(statname…)
• tabulate
![Page 61: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/61.jpg)
. summ time_1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
time_1 | 10153 11.78371 11.70837 0 89
. summ time_1,det
How long lived in current residence - Years
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 1 0 Obs 10153
25% 3 0 Sum of Wgt. 10153
50% 8 Mean 11.78371
Largest Std. Dev. 11.70837
75% 17 69
90% 29 75 Variance 137.086
95% 36 76 Skewness 1.470977
99% 50 89 Kurtosis 5.21861
Data from 2012 Survey of the Performance of American Elections
![Page 62: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/62.jpg)
. summ time_1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
time_1 | 10153 11.78371 11.70837 0 89
. summ time_1,det
How long lived in current residence - Years
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 1 0 Obs 10153
25% 3 0 Sum of Wgt. 10153
50% 8 Mean 11.78371
Largest Std. Dev. 11.70837
75% 17 69
90% 29 75 Variance 137.086
95% 36 76 Skewness 1.470977
99% 50 89 Kurtosis 5.21861
Data from 2012 Survey of the Performance of American Elections
![Page 63: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/63.jpg)
0
.02
.04
.06
.08
Den
sity
0 20 40 60 80 100How long lived in current residence - Years
. hist time_1,discrete
(start=0, width=1)
![Page 64: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/64.jpg)
. table pid3
------------------------
3 point |
party ID | Freq.
------------+-----------
Democrat | 3,808
Republican | 3,036
Independent | 2,825
Other | 234
Not sure | 297
------------------------
![Page 65: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/65.jpg)
. tabstat time_1 age
stats | time_1 age
---------+--------------------
mean | 11.78371 49.33363
------------------------------
. tabstat time_1 age,stats(mean sd skew kurt)
stats | time_1 age
---------+--------------------
mean | 11.78371 49.33363
sd | 11.70837 15.89716
skewness | 1.470977 -.0152461
kurtosis | 5.21861 2.177523
------------------------------
![Page 66: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/66.jpg)
. tabstat time_1 age,by(pid3) s(mean sd)
Summary statistics: mean, sd
by categories of: pid3 (3 point party ID)
pid3 | time_1 age
------------+--------------------
Democrat | 11.28602 47.72348
| 11.7268 15.88458
------------+--------------------
Republican | 13.1379 52.27569
| 12.17941 15.69504
------------+--------------------
Independent | 11.66335 50.07646
| 11.35228 15.51778
------------+--------------------
Other | 8.457265 42.52991
| 9.328546 13.84282
------------+--------------------
Not sure | 8.084459 38.19865
| 9.559606 14.14754
------------+--------------------
Total | 11.78371 49.33363
| 11.70837 15.89716
---------------------------------
![Page 67: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/67.jpg)
. table pid3,c(mean time_1 sd time_1)
----------------------------------------
3 point |
party ID | mean(time_1) sd(time_1)
------------+---------------------------
Democrat | 11.286 11.7268
Republican | 13.1379 12.17941
Independent | 11.6634 11.35228
Other | 8.45726 9.328546
Not sure | 8.08446 9.559606
----------------------------------------
![Page 68: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/68.jpg)
Univariate graphs
![Page 69: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/69.jpg)
Commands in STATA for univariate statistics
• histogram varname, bin()
start() width()
density/fraction/frequency
normal
• graph box varnames
• graph dot varnames
![Page 70: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/70.jpg)
Example of Florida voters
• Question: does the age of voters vary by race?
• Combine Florida voter extract files, 2010
• gen new_birth_date=date(birth_date,"MDY")
• gen birth_year=year(new_b)
• gen age= 2010-birth_year
![Page 71: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/71.jpg)
Look at distribution of birth year
0
.00
5.0
1.0
15
.02
.02
5
Den
sity
1850 1900 1950 2000birth_year
![Page 72: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/72.jpg)
Explore age by race
. table race if birth_year>1900,c(mean age)
----------------------
race | mean(age)
----------+-----------
1 | 45.61229
2 | 42.89916
3 | 42.6952
4 | 45.09718
5 | 52.08628
6 | 44.77392
9 | 40.86704
---------------------- 3 = Black
4 = Hispanic
5 = White
![Page 73: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/73.jpg)
Graph birth year
. hist age if birth_year>1900
(bin=71, start=9, width=1.3802817)
0
.01
.02
.03
Den
sity
0 20 40 60 80 100age
![Page 74: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/74.jpg)
Graph birth year
. hist age if birth_year>1900
(bin=71, start=9, width=1.3802817)
0
.01
.02
.03
Den
sity
0 20 40 60 80 100age
![Page 75: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/75.jpg)
Divide into “bins” so that each bar represents 1 year
. hist age if birth_year>1900,width(1)
0
.00
5.0
1.0
15
.02
Den
sity
0 20 40 60 80 100age
![Page 76: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/76.jpg)
Add ticks at 10-year intervals
0
.00
5.0
1.0
15
.02
Den
sity
20 30 40 50 60 70 80 90 100age
. hist age if birth_year>1900,width(1) xlabel(20 (10) 100)
![Page 77: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/77.jpg)
Superimpose the normal curve (with the same mean and s.d. as the empirical distribution)
hist age if birth_year>1900,wid(1) xlabel(20 (10) 100)
normal 0
.00
5.0
1.0
15
.02
Den
sity
20 30 40 50 60 70 80 90 100age
![Page 78: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/78.jpg)
. summ age if birth_year>1900,det age ------------------------------------------------------------- Percentiles Smallest 1% 18 9 5% 21 16 10% 24 16 Obs 12612114 25% 34 16 Sum of Wgt. 12612114 50% 48 Mean 49.47549 Largest Std. Dev. 19.01049 75% 63 107 90% 77 107 Variance 361.3986 95% 83 107 Skewness .2629496 99% 91 107 Kurtosis 2.222442
![Page 79: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/79.jpg)
Histograms by race
hist age if birth_year>1900&race>=3&race<=5,wid(1)
xlabel(20 (10) 100) normal by(race) 0
.01
.02
.03
0
.01
.02
.03
20 30 40 50 60 70 80 90 100
20 30 40 50 60 70 80 90 100
3 4
5
Density
normal age
Den
sity
age
Graphs by race
3 = Black
4 = Hispanic
5 = White
![Page 80: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/80.jpg)
05
01
00
age
Draw the previous graph with a box plot
graph box age if birth_year>1900
Upper quartile
Median
Lower quartile } Inter-quartile
range
} 1.5 x IQR
![Page 81: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/81.jpg)
Draw the box plots for the different races
graph box age if birth_year>1900&race>=3&race<=5,by(race) 0
50
100
05
01
00
3 4
5age
Graphs by race
3 = Black
4 = Hispanic
5 = White
![Page 82: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/82.jpg)
Draw the box plots for the different races using “over” option
graph box age if birth_year>1900&race>=3&race<=5,over(race) 0
50
100
age
3 4 5
3 = Black
4 = Hispanic
5 = White
![Page 83: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/83.jpg)
Main issues with histograms
• Proper level of aggregation
• Non-regular data categories
![Page 84: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/84.jpg)
Months Months Months
A
B
C
Stop and think:
What should the distribution of length-of-current residency
look like?
(Hint: the median is around 4 years)
![Page 85: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/85.jpg)
A note about histograms with unnatural categories
From the Current Population Survey (2000), Voter and Registration Survey How long (have you/has name) lived at this address? -9 No Response -3 Refused -2 Don't know -1 Not in universe 1 Less than 1 month 2 1-6 months 3 7-11 months 4 1-2 years 5 3-4 years 6 5 years or longer
![Page 86: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/86.jpg)
Solution, Step 1 Map artificial category onto “natural”
midpoint
-9 No Response missing
-3 Refused missing
-2 Don't know missing
-1 Not in universe missing
1 Less than 1 month 1/24 = 0.042
2 1-6 months 3.5/12 = 0.29
3 7-11 months 9/12 = 0.75
4 1-2 years 1.5
5 3-4 years 3.5
6 5 years or longer 10 (arbitrary)
recode live_length (min/-1 =.)(1=.042)(2=.29)(3=.75)(4=1.5) ///
(5=3.5)(6=10)
![Page 87: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/87.jpg)
Graph of recoded data F
ract
ion
longevity0 1 2 3 4 5 6 7 8 9 10
0
.557134
histogram longevity, fraction
![Page 88: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/88.jpg)
Graph of recoded data
Fra
ctio
n
longevity0 1 2 3 4 5 6 7 8 9 10
0
.557134
Why doesn’t…
…look like …
?
![Page 89: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/89.jpg)
longevity
0 1 2 3 4 5 6 7 8 9 10
0
15
Density plot of data
Total area of last bar = .557
Width of bar = 11 (arbitrary)
Solve for: a = w h (or)
.557 = 11h => h = .051
![Page 90: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/90.jpg)
Density plot template
Category Fraction X-min X-max X-length
Height
(density)
< 1 mo. .0156 0 1/12 .082 .19*
1-6 mo. .0909 1/12 ½ .417 .22
7-11 mo. .0430 ½ 1 .500 .09
1-2 yr. .1529 1 2 1 .15
3-4 yr. .1404 2 4 2 .07
5+ yr. .5571 4 15 11 .05
* = .0156/.082
![Page 91: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/91.jpg)
Three words about pie charts: don’t use them
![Page 92: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/92.jpg)
So, what’s wrong with them
• For non-time series data, hard to get a comparison among groups; the eye is very bad in judging relative size of circle slices
• For time series, data, hard to grasp cross-time comparisons
![Page 93: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/93.jpg)
Some words about graphical presentation
• Aspects of graphical integrity (following Edward Tufte, Visual Display of Quantitative Information)
– Main point should be readily apparent
– Show as much data as possible
– Write clear labels on the graph
– Show data variation, not design variation
![Page 94: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/94.jpg)
Some bad graphs
![Page 95: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/95.jpg)
Some good graphs
![Page 96: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/96.jpg)
Download and use the “Tufte” scheme
• ssc install scheme_tufte
![Page 97: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/97.jpg)
.2.3
.4.5
.6.7
de
mp
resp
ct2
00
0
.3 .4 .5 .6 .7demprespct1964
scatter demprespct2000 demprespct1964 [aw=total], xsize(6.5)
ysize(6.5) xscale(range(.2 .8)) yscale(range(.2 .8))
![Page 98: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/98.jpg)
.2
.3
.4
.5
.6
.7
de
mp
resp
ct2
00
0
.3 .4 .5 .6 .7demprespct1964
scatter demprespct2000 demprespct1964 [aw=total], xsize(6.5)
ysize(6.5) xscale(range(.2 .8)) yscale(range(.2 .8)) scheme(tufte)
![Page 99: Introduction to Descriptive Statisticsweb.mit.edu/17.871/www/2015/02descriptive_stats_2015.pdf · Introduction to Descriptive Statistics 17.871 Spring 2015 . Reasons for paying attention](https://reader030.vdocuments.net/reader030/viewer/2022040212/5e8a5d2a07ab0f3e9431223e/html5/thumbnails/99.jpg)
There is a difference between graphs in research and publication
0
.01
.02
.03
0
.01
.02
.03
20 30 40 50 60 70 80 90 100
20 30 40 50 60 70 80 90 100
3 4
5
Density
normal age
Den
sity
age
Graphs by race
0
.01
.02
.03
0
.01
.02
.03
20 30 40 50 60 70 80 90 100 20 30 40 50 60 70 80 90 100
Black Hispanic
White Total
Den
sity
AgeGraphs by race
Do not publish
Publish OK