![Page 1: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/1.jpg)
Graphics (and numerics) for univariate distributions
Nicholas J. Cox Department of Geography Durham University, UK
![Page 2: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/2.jpg)
2
Klein and mine
Felix Klein (1848–1925) wrote a classic: 1908, 1925, 1928. Elementarmathematik vom höheren Standpunkte aus. Leipzig: Teubner; Berlin: Springer.
In this talk I look at elementary statistical graphics from an intermediate standpoint.
![Page 3: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/3.jpg)
3
Why is Stata graphics so complicated? It offers
canned convenience commands for common tasks (e.g. histograms, survival functions)
a framework for creating new kinds of graphs, vital for programmers
cosmetic control of small details such as colours, text and symbols
![Page 4: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/4.jpg)
4
How to learn about Stata graphics?The radical solution:
Read the documentation.
The friendly solution: Read Michael Mitchell’s books.
Another solution: Follow Statalist and the Stata Journal.
![Page 5: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/5.jpg)
5
This talk
I will give a rag-bag of tips and tricks, including
some examples for official Stata commands
some examples of my own commands, from the Stata Journal or SSC (use net or ssc to install)
![Page 6: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/6.jpg)
6
Distributions
Most examples will show (fairly) raw data, but there is plenty of scope to show distributions of residuals, estimates, figures of merit, P-values, and so forth.
Categorical variables will get short shrift, but my best single tip is to check out catplot and tabplot from SSC.
![Page 7: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/7.jpg)
7
Small distributions with namesBar charts need no introduction here.
graph hbar is a basic graph for showing distributions with informative names attached.
hbar allows names to be written left to right.
20 or so values can be so shown fairly well, more if the medium allows
(e.g. whole-page figure, poster).
![Page 8: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/8.jpg)
8
0 0.5 m 1 m 1.5 m 2 mPopulation 2012
GotlandJämtlandBlekinge
KronobergKalmar
VästernorrlandNorrbotten
VästmanlandVästerbotten
VärmlandSödermanland
DalarnaGävleborg
ÖrebroHalland
JönköpingUppsala
ÖstergötlandSkåne
Västra GötalandStockholm
![Page 9: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/9.jpg)
9
0 50 100 150 200 250 300Population density per sq. km 2012
JämtlandNorrbotten
VästerbottenDalarna
VästernorrlandGävleborgVärmland
GotlandKalmar
KronobergJönköping
ÖrebroÖstergötland
UppsalaSödermanland
VästmanlandBlekingeHalland
Västra GötalandSkåne
Stockholm
![Page 10: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/10.jpg)
10
Small distributions with namesLess well known, graph dot is also a basic
graph for showing distributions with informative names attached.
graph dot also allows names to be written left to right.
Unlike bar graphs, graph dot also extends naturally to cases in which logarithmic scales are desired.
![Page 11: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/11.jpg)
11
2 5 10 20 50 100 200Population density per sq. km 2012
JämtlandNorrbotten
VästerbottenDalarna
VästernorrlandGävleborgVärmland
GotlandKalmar
KronobergJönköping
ÖrebroÖstergötland
UppsalaSödermanland
VästmanlandBlekingeHalland
Västra GötalandSkåne
Stockholm
![Page 12: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/12.jpg)
12
graph dot
This kind of graph is often called a dot chart or dot plot.
There is scope for confusion, as the same name has been applied to a different plot, on which more later.
It is often named for William S. Cleveland, who promoted it in various articles and books, as a Cleveland dot chart.
Whatever the name, dot charts also work well for comparison of distributions.
![Page 13: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/13.jpg)
13
0 10 20 30 40 50
SwedenFinlandIrelandIceland
PortugalUnited Kingdom
SlovakiaBelgiumNorway
MaltaGermany
Czech RepublicFrance
NetherlandsItaly
RomaniaSwitzerland
SpainDenmarkHungaryCyprusAustriaGreecePoland
BulgariaLithuaniaSloveniaEstonia
Latvia
% daily smokers ~2002
females males
![Page 14: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/14.jpg)
14
graph dot small tips
Guide lines are best kept thin and a light colour.
MS Word users beware: dotted lines don’t transfer well.
There is an undocumented vertical option, not often needed but there if you really want it.
![Page 15: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/15.jpg)
15
Larger distributions: histogramsAt some point with larger distributions we
have to abandon naming every observation on a plot, even if the names are known and informative.
Histograms remain very popular, despite the possibility of graphic artefacts arising from choices of bin width and bin origin.
Note that histogram and twoway histogram are related but distinct commands.
![Page 16: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/16.jpg)
16
0
50
100
150
200F
req
ue
ncy
10000 100000 1 m 10 mPopulation estimate for 1 July 2011
Core based statistical areas, USA
![Page 17: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/17.jpg)
17
Transformations and histogramsA twist in this example is use of a logarithmic
scale.
Transform the variable first, here with log10().
Draw the histogram on a transformed scale.
Fix labels, e.g. 4 "10000", in xlabel().
Note that xsc(log) won’t do this for you.
![Page 18: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/18.jpg)
18
Dividing histograms
Frequencies can be added.
So, for two subsets: Lay down the frequency histogram for all. Put the frequency histogram for a subset on top. The difference is the other subset. Use different colours, but the same bins. Use the same (light) colour for blcolor().
This can be extended to three or more subsets.
![Page 19: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/19.jpg)
19
0
50
100
150
200F
req
ue
ncy
10000 100000 1 m 10 mPopulation estimate for 1 July 2011
MicropolitanMetropolitan
Core based statistical areas, USA
![Page 20: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/20.jpg)
20
Densities
If you really want to plot densities, kdensity is the natural place to start.
Note that kdensity and twoway kdensity are related but distinct commands.
![Page 21: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/21.jpg)
21
Density estimation on transformed scalesA longstanding but under-used idea is to
estimate densities on a transformed scale.
This will ensure that estimates are positive only within the natural support and should help stabilise estimates where data are thin on the ground.
See Stata Journal 4: 66–88 (2004) for some references.
![Page 22: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/22.jpg)
22
Density estimation on transformed scalesFor density functions f of a variable x and a
monotone transform t(x),
estimate for f(x) = estimate for f(t(x)) · | dt/dx |.
For example, estimate f(x) by f(ln x) · (1/x).
![Page 23: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/23.jpg)
23
Example and code
Example data are lengths and widths of 158 glacial cirques in the English Lake District. See more at Earth Surface Processes and Landforms 32: 1902–1912 (2007).
tkdensity (SSC) is a convenience command that does the estimation and graphing in one.
A paper with some photos of Romanian cirques
![Page 24: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/24.jpg)
24
kdensity tkdensity with ln0
.00
05
.00
1.0
01
5.0
02
De
nsi
ty
0 500 1000 1500 2000length (m)
kernel = epanechnikov, bandwidth = 83.6216
0.0
00
5.0
01
.00
15
.00
2D
en
sity
0 1000 2000 3000length (m)
biweight on ln scale .4
0.0
00
5.0
01
.00
15
De
nsi
ty
0 500 1000 1500 2000width (m)
kernel = epanechnikov, bandwidth = 96.1975
0.0
00
5.0
01
.00
15
De
nsi
ty
0 500 1000 1500 2000 2500width (m)
biweight on ln scale .4
![Page 25: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/25.jpg)
25
Dot plots or strip plots
The main idea is to show each data point by one marker symbol on a magnitude scale.
Usually, although not necessarily, there is binning too and tied values are jittered or stacked to show relative frequency.
In official Stata the command is dotplot. stripplot from SSC is much more versatile. First we look at some examples using the
default horizontal alignment.
![Page 26: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/26.jpg)
26
length
width
0 500 1000 1500 2000size (m)
boxes show 5 25 50 75 95 percentiles
50 m bins
![Page 27: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/27.jpg)
27
Marginal box plots
stripplot (for that matter dotplot too) can add box plots.
That way box plots do what they arguably do best: summarize.
The fine structure of the data remains visible.
stripplot allows box plots with whiskers drawn to specified percentiles, as well as those following the Tukey rule that whiskers span data points within 1.5 IQR of the nearer quartile.
![Page 28: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/28.jpg)
28
An aside on box plots
If you like box plots, you will know that graph box and graph hbox get you there…
… except in so far as they don’t.
Suppose you want to do something a bit different, such as add points for means, or join medians.
See Stata Journal 9: 478-496 (2009) for details on how to do box plots from first principles.
![Page 29: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/29.jpg)
29
UT AK WY ID
LA
MS
NM
SC
TX
CO
GA
HI
ND
AZ
IN
KY
MI
MN
MT
SD
AL
NE
NC
VT
WI
CA
DE
IL
IA
KS
NV
NH
OH
OK
OR
TN
VA
WA
AR
ME
MD
WV
MA
MO
CT
NJ
NY
PA
RI
FL
24 26 28 30 32 34Median age 1980
0.5 year bins
![Page 30: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/30.jpg)
30
How was that done?
This plot of median age in 1980 for US states also used stripplot.
The main trick is very simple: make marker symbols big enough and marker labels small enough so they jointly act as small text boxes.
OH yes, I agree: 50 US states with two-letter abbreviations AR an easy case, but WY not?
![Page 31: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/31.jpg)
31
Panel or longitudinal data
Let’s change tack for a different kind of example, with panel data.
The dataset is small: OECD data on percent regular cigarette smoking at age 15 for 24 countries, 4 time periods and 2 genders.
Panel data can be seen as a series of distributions.
The distribution can serve as context for any interesting case, just as a test score is reported as a percentile rank.
![Page 32: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/32.jpg)
32
0
10
20
30
40
1993-94 1997-98 2001-02 2005-06 1993-94 1997-98 2001-02 2005-06
boys girls
Re
gu
lar
cig
are
tte
sm
oki
ng
in 1
5 y
ea
r o
lds,
%
![Page 33: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/33.jpg)
33
Spaghetti plot, or a graphical pasticheThe usual kind of multiple time series plot
(here using line) is the usual kind of mess. I suppressed the legend naming the countries.
There are ways of improving it as a time series plot, such as using a by() option or some other device for splitting out subsets.
But we will stick with the distribution theme. First, look at a stripplot in which the USA
is highlighted.
![Page 34: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/34.jpg)
34
0
10
20
30
40
1993-94 1997-98 2001-02 2005-06 1993-94 1997-98 2001-02 2005-06
boys girlsp
erc
en
t
USA highlighted
Regular cigarette smoking in 15 year olds
![Page 35: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/35.jpg)
35
stripplot for panel data
In principle, we lose some information on individual trajectories.
In practice, a multiple time series plot is likely to be too unattractive to invite detailed scrutiny.
As before, we could add box plots, or bars with means and confidence intervals.
![Page 36: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/36.jpg)
36
0
10
20
30
40R
eg
ula
r ci
ga
rett
e s
mo
kin
g in
15
ye
ar
old
s, %
boys girls
1993-94 1997-98 2001-02 2005-06 1993-94 1997-98 2001-02 2005-06
USA highlighted
![Page 37: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/37.jpg)
37
devnplot
The previous graph was from devnplot (SSC). devn here stands for deviation.
The values for each group are plotted as quantiles or order statistics.
A subset of cases may be highlighted (here just one panel).
A backdrop shows values as deviations from group means.
![Page 38: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/38.jpg)
38
devnplot
The choices shown are the defaults. Other plotting orders are possible. The backdrop can be removed, or tuned to show deviations from any specified set of levels.
devnplot was first written with the aim of showing data and summaries in anova style, but I mostly use it to show sets of quantiles.
![Page 39: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/39.jpg)
39
devnplot
A small but sometimes useful detail is that devnplot adjusts the width for each group according to the number of its values.
This can help if groups are of very different sizes.
![Page 40: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/40.jpg)
40
Quantile plots
Quantile functions are also known as inverse (cumulative) distribution functions.
They are the order statistics, given as a function of cumulative probability or fraction of the data.
For ranks i = 1, …, n, use a plotting position such as (i − 0.5)/n as abscissa.
In official Stata the main command is quantile. qplot from SJ is much more versatile.
![Page 41: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/41.jpg)
41
0
10
20
30
40
0 .5 1 0 .5 1
boys girls
1993-94 1997-98 2001-02 2005-06
Re
gu
lar
cig
are
tte
sm
oki
ng
in 1
5 y
ea
r o
lds,
%
fraction of the data
![Page 42: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/42.jpg)
42
qplot options What about smoothing?
qplot supports over() and by() options, to plot quantiles by distinct groups within and between graph panels.
In this example, some of the irregularity stems from reporting values as integers. None of the irregularity is easy to interpret.
So why not smooth the quantile functions?
![Page 43: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/43.jpg)
43
Quantile smoothing
Quantile smoothing is less well known than kernel density estimation.
The method of Harrell, F.E. and C.E. Davis. 1982. A new distribution-free quantile estimator. Biometrika 69: 635–640 turns out to be an exact bootstrap estimator of the corresponding population quantile.
hdquantile (SSC) offers an implementation.
![Page 44: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/44.jpg)
44
10
20
30
40
0 .5 1 0 .5 1
boys girls
1993-94 1997-98 2001-02 2005-06
Re
gu
lar
cig
are
tte
sm
oki
ng
in 1
5 y
ea
r o
lds,
%
fraction of the data
![Page 45: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/45.jpg)
45
How much difference does quantile smoothing make?
Quantile smoothing is conservative. Here the difference between smoothed and
observed quantiles is ~1%.
So, smoothing mostly takes out noise, which is its job.
![Page 46: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/46.jpg)
46
-2
-1
0
1
2
3
smo
oth
ed
qu
an
tile
- o
bse
rve
d q
ua
ntil
e,
%
0 10 20 30 40Regular cigarette smoking in 15 year olds, %
![Page 47: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/47.jpg)
47
Lord Rayleigh discovering argon
John William Strutt, Lord Rayleigh (1842–1919) compared the mass of nitrogen obtained by different methods from a given container.
The marked difference led to the discovery of argon with Sir William Ramsay and the award to Rayleigh of the Nobel Prize for Physics in 1904.
The Rayleigh distribution is named for the same Rayleigh.
![Page 48: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/48.jpg)
48
2.29500
2.30000
2.30500
2.31000m
ass
(g
)
air chemicalmethod
ferroushydrate
hotcopper
hotiron
ammoniumnitrite
nitricoxide
nitrousoxide
![Page 49: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/49.jpg)
49
Which plot?
devnplot works well here to show fine structure in the data.
stripplot doesn’t work so well and a boxplot just suppresses detail unnecessarily.
(Rayleigh was reporting extremely careful experimental results to a resolution of 10 μg.)
![Page 50: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/50.jpg)
50
Multiple quantile plots
For exploring a bundle of numeric variables, likely to have very different ranges and units, multqplot is offered. See also Stata Journal 12: 549-561 (2012).
The recipe is just to produce a qplot for each variable and then use graph combine.
A graph for each variable puts a premium on space. The variable’s details go on top. Values of selected quantiles are shown (by default 0(25)100, giving a box plot flavour).
![Page 51: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/51.jpg)
51
3291419550796303
15906
0 .25 .5 .75 1
Price
12
1820
24
41
0 .25 .5 .75 1
Mileage (mpg)
1
33
4
5
0 .25 .5 .75 1
Repair Record 1978
1.5
2.5
3
3.5
5
0 .25 .5 .75 1
Headroom (in.)
5
11
1517
23
0 .25 .5 .75 1
Trunk space (cu. ft.)
1760
2240
3200
3670
4840
0 .25 .5 .75 1
Weight (lbs.)
142
170
193
204
233
0 .25 .5 .75 1
Length (in.)
31
36
40
43
51
0 .25 .5 .75 1
Turn Circle (ft.)
79119
196
250
425
0 .25 .5 .75 1
Displacement (cu. in.)
2.19
2.732.93
3.3
3.89
0 .25 .5 .75 1
Gear Ratio
000
11
0 .25 .5 .75 1
Car type
![Page 52: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/52.jpg)
52
Features of quantile plotsShow well any outliers, gaps or granularity.
Scale well over a large range of possible sample sizes.
Entail a minimum of arbitrary choices.
Signal behaviour that might be awkward in modelling.
Behave reasonably with ordinal or binary variables.
For more propaganda: Stata Journal 5: 442–460 (2005).
![Page 53: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/53.jpg)
53
Distribution and survival plotsThose who prefer distribution plots with
axes interchanged will find a command in distplot (SJ).
The convention is to plot cumulative probability against magnitude.
Those who plot survival functions are likely to be working already with sts graph.
![Page 54: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/54.jpg)
54
When to write a new graphics command? Sometimes you want a graph that is new to you. After checking that no command exists, most
often you will plan to construct a graph using twoway commands.
Less often, it will be an application for graph dot, bar, hbar, box or hbox.
But play with do-files first. Most advice is to plan program writing, but for
small projects it makes as much sense to see what grows easily and naturally out of play.
![Page 55: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/55.jpg)
55
Principles of laziness
Let official Stata do as much as possible.
Let other programs do as much as possible.
Don’t generalise programs or add features too readily.
Don’t trust what you didn’t create.
Don’t plan too much; play and see what works.
![Page 56: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/56.jpg)
56
Assessing normal probability plotsSuppose you are assessing fit to a normal
or Gaussian distribution. qnorm is a dedicated official command for
normal probability plots (which are in fact quantile-quantile plots).
How much departure from a straight line is acceptable?
(If you really want a formal test, there are plenty on offer.)
![Page 57: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/57.jpg)
57
A plot is a sample statistic
Even in exploration, the attitude that a plot from a sample is a sample statistic, just like a sample mean or a slope estimate, is always salutary.
So we should be worrying about how the plot that we do have — from our one sample — lies within a sampling distribution of possible plots for different samples.
The auto dataset gives a sample of 74 car weights.
![Page 58: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/58.jpg)
58
1,0
00
2,0
00
3,0
00
4,0
00
5,0
00
We
igh
t (lb
s.)
1,000 2,000 3,000 4,000 5,000Inverse Normal
![Page 59: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/59.jpg)
59
Envelope curves
One recipe suggested is to get envelopes by
simulating several samples of the same size from a Gaussian with the same mean and SD
sorting each sample from smallest to largest
reporting results for each order statistic as an interval (e.g. spanning 95% of results)
![Page 60: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/60.jpg)
60
One solution (mine)
Write a helper program, qenvnormal (qenv package on SSC), that calculates the envelopes. Mata is the work-horse.
qplot is already general enough to plot the envelopes too, so there is no need for an extra graphics program.
Stifle the urge to extend the first program to include all my favourite distributions (gamma, lognormal, etc.)
![Page 61: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/61.jpg)
61
0
2000
4000
6000W
eig
ht,
lb
0 .2 .4 .6 .8 1fraction of the data
95% reference intervals
![Page 62: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/62.jpg)
62
1000
2000
3000
4000
5000W
eig
ht,
lb
1000 2000 3000 4000 5000Normal quantiles
95% reference intervals
![Page 63: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/63.jpg)
63
qplot solutions
qplot is able to plot observed quantiles and the envelopes, against both fraction of the data and normal quantiles.
By the way, qenvnormal warns if there is a reversal of order in the generated envelopes, best taken to mean that the number of replications is too small.
![Page 64: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/64.jpg)
64
qenv on SSC
Maarten Buis (how to say his name) has extended the package beyond its original example programs for normal and gamma distributions.
Type ssc desc qenv in Stata for an up-to-date list.
![Page 65: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/65.jpg)
65
Going gray
A detail in several graphs worth flagging is the usefulness of gray colours for less important elements such as grid lines.
For more, see Stata Journal 9: 499–503 and 9: 648–651 (2009).
![Page 66: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/66.jpg)
66
The aim
… delight lies somewhere between boredom and confusion.
Sir Ernst Gombrich (1909–2001)
1984. The sense of order: A study in the psychology of decorative art. Oxford: Phaidon, p.9.
![Page 67: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/67.jpg)
67
![Page 68: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/68.jpg)
68
Bits and pieces follow
The ratings to follow are subjective ratings based on how often I see such graphs compared with how often they seem about the best possible.
A really good graph used when appropriate would come in the middle on such a rating.
![Page 69: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/69.jpg)
69
Graphs for measured variablesunderrated quantile plotsstrip plots distribution plotssurvival plotsdensity plots histogramsbox plotsoverrated
![Page 70: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/70.jpg)
70
Graphs for categorical variablesunderrated dot charts multi-way bar charts side-by-side bar charts stacked bar charts spineplots mosaic plots generally pie charts overrated
![Page 71: Graphics (and numerics) for univariate distributions](https://reader035.vdocuments.net/reader035/viewer/2022062500/56815570550346895dc33d44/html5/thumbnails/71.jpg)
71
Presentation notes for font freaks and similar strange people
The main font is Georgia.
Stata syntax is in Lucida Console.
The graphs use Arial. I nearly used Gill Sans MT.
The Stata graph scheme is s1color.