Chapter 1-6. Basic Graphics

If you plan to do a lot of Stata graphics, a book you might want to buy is

Mitchell MN, A Visual Guide to Stata Graphics, 3rd revised ed., College Station, TX, Stata Press, 2012.

This book is available on the www.stata.com website. The book contains about 500 pages of example graphs and the commands used to generate them.

Using Stata 7 Graphs

Graphics changed a great deal after Stata version 7, and Stata 7 graph commands no longer work in Stata 8 or later. If someone gives you the commands used to create a graph in Stata 7, you can still run them by temporarily reverting to the old graphics with:

graph7 y x

If you used the following, you would revert back to version 7 for graphs and all the rest of the Stata commands.

version 7

Redisplaying a Graph

To display the last graph, still in memory, after closing the graphics window, use:

graph display

You can get the same result by clicking Window on the menu bar and then clicking Graph.
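When several graphs are held in memory, it can help to name them so that a particular one can be redisplayed. The following is a minimal sketch, assuming Stata's shipped auto dataset and a hypothetical graph name, mpg_wt; neither comes from this chapter.

```stata
* Sketch only: the auto dataset and the name "mpg_wt" are illustrative assumptions
sysuse auto, clear
scatter mpg weight, name(mpg_wt, replace)   // keep the graph in memory under a name
graph dir                                   // list the graphs Stata currently knows about
graph display mpg_wt                        // redisplay the named graph
```

Without a name, “graph display” redisplays the graph called Graph, which is the name Stata gives a graph by default.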

_____________________

Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual. Salt Lake City, UT: University of Utah School of Medicine. Chapter 1-6. (Accessed September 2, 2012, at http://www.ccts.utah.edu/biostats/?pageId=5385).

Chapter 1-6 (revision 2 Sep 2012) p. 1


Using the Graphics Menu

Open the births dataset,

use births, clear

We will create a scatterplot with birthweight on the y-axis and gestational age on the x-axis, first using the Graphics button on the Stata toolbar.

Graphics
  Twoway graph (scatter, line, etc.)
    Plots tab: click on Create… button
    Plot: Basic plots: (select type): Scatter
    Plot: Plot type: (scatterplot): Y variable: bweight
    Plot: Plot type: (scatterplot): X variable: gestwks
    Accept
  OK

[Figure: scatterplot of birth weight in grams (y-axis) against gestation period (x-axis)]

Using the Graphics Editor

There are a lot of options just using the menus. The graphics editor can also be used to modify the graph, such as adding text.

In the Graph Window, click on the graphics editor icon (looks like a colored bar graph with a pencil, 6th icon from the left).

To add text, single right click on the “T” on the left menu. Then single right click inside the graph region. This brings up a “Textbox properties” menu. In the Text: box, put “Males” and hit OK. Now position over the text “Males”, and a crossbar will appear. This can be used to drag the text to the position you want it.

Click on the graphics editor icon once again to exit the editor. When the “Save changes to (Graph)?” dialog box comes up, just say “No”. If you said yes, you could save the graph to a file at this point.

Now, on the Command line, enter

graph describe

Graph stored in memory

       name:  Graph
     format:  live
    created:  5 Jul 2010 15:43:21
     scheme:  s2color
       size:  4 x 5.5
   dta file:  births.dta dated 20 Mar 2002 10:07
    command:  twoway (scatter bweight gestwks)

Notice this gives you the syntax used for the graph, the “command:” line. You could then copy this to the do-file editor so you have a permanent record of the graph, for future re-creation and modification. Unfortunately, “graph describe” does not show text added interactively.
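If you do want to keep the graph itself rather than only its command, it can also be saved from the Command line. This is a minimal sketch; the file names are hypothetical and are written to the current working directory.

```stata
* Sketch only: the file names below are illustrative assumptions
graph save "bweight_gestwks.gph", replace     // Stata's own graph format
graph use "bweight_gestwks.gph"               // reload and display the saved graph later
graph export "bweight_gestwks.png", replace   // image file for documents or slides
```

Keeping the command in a do-file remains the more flexible record, since a saved graph file cannot be rerun on updated data.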


Using the do-file editor

We will practice creating graphs in the do-file editor. With a little practice, this becomes the fastest and most versatile approach to generating graphs. Having created a graph for one project, you can cut-and-paste it for a new project, and then make a few modifications to customize it.

To get a scatterplot, we use the following command syntax:

graph twoway scatter y-variable x-variable

This requests a scatterplot of the y-variable and x-variable ordered pairs, where

“twoway” is the family of plots which use numeric y and x scales (e.g., scatterplot, histogram), which require 2 variables to define points, heights, etc.

“scatter” requests the plottype called scatterplot

graph twoway scatter bweight gestwks

[Figure: scatterplot of birth weight in grams (y-axis) against gestation period (x-axis)]

Although the command begins with “graph”, for a scatterplot it can be abbreviated by dropping “graph”,

twoway scatter bweight gestwks // “graph” is optional for twoway graph

Even “twoway” can be dropped,

scatter bweight gestwks // “twoway” is optional for scatterplot

To get a separate graph for males and females, we might try

bysort sex: twoway scatter bweight gestwks

It does not work, however, because the “by” and “bysort” prefixes are not available for twoway. Instead, we have to use,

twoway scatter bweight gestwks if sexalph=="female"
twoway scatter bweight gestwks if sexalph=="male"
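With more than two groups, typing one command per group becomes tedious. A loop over the levels of the grouping variable is one way around it. This is a minimal sketch, assuming the births dataset is in memory; the graph names scatter_female and scatter_male are generated by the loop and are not part of the original commands.

```stata
* Sketch only: draws one scatterplot per level of sexalph
levelsof sexalph, local(groups)          // collect the distinct values of sexalph
foreach g of local groups {
    twoway scatter bweight gestwks if sexalph == "`g'", ///
        title("`g'") name(scatter_`g', replace)
}
```

Each graph is kept in memory under its own name, so any of them can be brought back with “graph display”.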

To get a side-by-side graph, we use the “by” option,

twoway scatter bweight gestwks ,by(sexalph)

Not only is the “by” option available, but all graphs in the twoway family can also be overlaid.

[Figure: side-by-side scatterplots of bweight against gestwks, one panel for females and one for males; graphs by sexalph]

To overlay the scatterplot with a linear regression line, we would use:

twoway scatter bweight gestwks || lfit bweight gestwks

where
“lfit” requests a linear regression (linear fit) of bweight on gestwks, and
“||” requests overlaying one plottype on another.

This overlays the two graphs.

Another notation for overlaying plottypes is the ( )-binding notation,

twoway (scatter bweight gestwks) (lfit bweight gestwks)

where the ( ) go around everything that would follow the beginning word “twoway” if the two graphs were drawn separately.

Adding the “by” option,

twoway (scatter bweight gestwks) (lfit bweight gestwks) ///
    , by(sexalph)

[Figure: scatterplot of bweight against gestwks overlaid with the linear fit; legend: bweight, Fitted values]

Frequently, researchers like to report profile plots, which are line plots of means. For example, we might be interested in showing the profile plot of mean birth weight at integer values of gestational age, using a separate line for males and females.

First we must round to the nearest integer, to provide a more discrete x-axis variable.

gen gestwks2 = round(gestwks)

Next, we sort on sex and the integer gestational age. This lets us use “by” in the next command, and it also makes it easy to check with the list command that the next command worked as anticipated.

sort sexalph gestwks2

Now, we create a variable that contains the mean birthweight for each combination of sex and gestational age,

by sexalph gestwks2: egen avgbweight = mean(bweight)

Checking our work,

list sexalph gestwks gestwks2 bweight avgbweight in 1/10 ///
    , abbrev(15)

[Figure: output of the earlier twoway command with by(sexalph): scatterplots of bweight against gestwks with fitted values, one panel for females and one for males; graphs by sexalph]

     +-----------------------------------------------------+
     | sexalph   gestwks   gestwks2   bweight   avgbweight |
     |-----------------------------------------------------|
  1. |  female     24.69         25       864          864 |
  2. |  female     27.33         27       708          668 |
  3. |  female     26.95         27       628          668 |
  4. |  female     30.65         31      1764     1337.333 |
  5. |  female     30.85         31       924     1337.333 |
     |-----------------------------------------------------|
  6. |  female     31.29         31      1324     1337.333 |
  7. |  female     32.47         32      1402     1590.333 |
  8. |  female     32.41         32      1431     1590.333 |
  9. |  female     31.71         32      1938     1590.333 |
 10. |  female     32.53         33      1541     1633.333 |
     +-----------------------------------------------------+
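The egen approach keeps one row per birth and repeats the group mean on every row. If only one row per sex and gestational week is wanted, collapsing the data is an alternative. This is a minimal sketch of that alternative, not the method used in this chapter; the variable names meanbw and nbirths are hypothetical, and gestwks2 is assumed to exist already.

```stata
* Sketch only: one row per combination of sexalph and gestwks2
preserve                                   // keep the full dataset safe
collapse (mean) meanbw=bweight (count) nbirths=bweight, by(sexalph gestwks2)
list in 1/5, abbrev(15)                    // group mean and number of births per row
restore                                    // bring back the original observations
```

The count column makes it easy to spot means that rest on only one or two births.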

To extend the command across several rows in do-file editor, we can use “#delimit ;” to change the end of command delimiter to a semi-colon, then use the semi-colon to end the command, and then change the end of command delimiter back to carriage return.

#delimit ;
twoway (line avgbweight gestwks2 if sexalph=="female")
       (line avgbweight gestwks2 if sexalph=="male")
       ;
#delimit cr

[Figure: line plot of avgbweight against gestwks2, one line for females and one for males]
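The “///” continuation used earlier in this chapter is another way to spread a command across several rows, without changing the delimiter. A minimal sketch of the same line plot written that way:

```stata
* Same command as above, using /// line continuation instead of #delimit ;
twoway (line avgbweight gestwks2 if sexalph=="female") ///
       (line avgbweight gestwks2 if sexalph=="male")
```

Which style to use is a matter of taste. “#delimit ;” saves typing on long commands, while “///” avoids forgetting to switch the delimiter back.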

For line graphs, the x-axis variable must be sorted. Let’s see what happens if we had the data sorted on id, rather than gestwks2.

sort id
#delimit ;
twoway (line avgbweight gestwks2 if sexalph=="female")
       (line avgbweight gestwks2 if sexalph=="male")
       ;
#delimit cr

[Figure: line plot of avgbweight against gestwks2 drawn from data sorted on id]

In the future, if you ever get a “chaos” graph like this, you will know that you forgot to sort on the x-axis variable.
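As a safeguard against the sorting problem, the line plottype also accepts a “sort” option, which orders the points by the x-axis variable before connecting them. A minimal sketch, assuming the data are still sorted on id:

```stata
* The sort option orders points by gestwks2 before the lines are drawn
twoway (line avgbweight gestwks2 if sexalph=="female", sort) ///
       (line avgbweight gestwks2 if sexalph=="male", sort)
```

This leaves the order of the data in memory unchanged, which is convenient when another part of the do-file depends on that order.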


Of course, it is not biologically plausible that fetuses shrink at weeks 27 and 35, so these strange drops must be due to estimates based on small sample sizes. The following command verifies the small sample sizes.

table sexalph gestwks2 , c(mean bweight n bweight) format(%4.0f)

---------------------------------------------------------------------------------------
         | Gestational Age (weeks)
sexalph  | 25 27 28 31 32 33 34 35 36 37 38 39 40 41 42 43
---------+-----------------------------------------------------------------------------
female   | 864 668 1337 1590 1633 2014 1325 2656 2835 2870 3263 3291 3353 3443 3591
         | 1 2 3 3 3 5 1 10 14 44 48 58 31 9 2
         |
male     | 1000 1462 1931 2497 2395 2693 2841 3196 3308 3447 3589 4003
         | 2 3 2 1 8 11 20 40 59 70 35 5
---------------------------------------------------------------------------------------

If you get wrap-around from a small output window, you can switch the rows and columns,

table gestwks2 sexalph , c(mean bweight n bweight) format(%4.0f)

As you might guess, growth curves (such as the CDC pediatric growth charts) use large sample sizes and smoothing methods. One approach we might take to get smoothing is to use fractional-polynomial prediction plots. The method gets its name from fitting a polynomial in which the powers of the x variable may be fractional or negative (such as 0.5 or -1) rather than only whole numbers, which yields a flexible, smooth-looking curve.

sort gestwks2
#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female")
       (fpfit bweight gestwks2 if sexalph=="male")
       ;
#delimit cr

This growth chart seems more biologically plausible.

[Figure: fractional-polynomial fits of predicted bweight against gestwks2, one line for females and one for males]

Aside: Another advantage of this type of smoothing is that you can control for covariates (to do so you would use the “fracpoly” command, which allows covariates, followed by the “predict” command, and then plot the predicted values).

Adding a title, subtitle, axis labels, and a footnote,

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female")
       (fpfit bweight gestwks2 if sexalph=="male")
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         note("Note: shown are fractional-polynomial regression lines")
       ;
#delimit cr

[Figure: the fractional-polynomial plot with title, subtitle, axis titles “Birth Weight (grams)” and “Gestational Age (weeks)”, and footnote]

Adding more tick marks and tick mark labels to both the y and x axes, converting the y tick mark labels to horizontal alignment, and adding better legends,

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female")
       (fpfit bweight gestwks2 if sexalph=="male")
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         legend(label(1 "Females") label(2 "Males"))
       ;
#delimit cr

In the ylabel option, we used “0(500)4000” to denote the minimum value, the increment, and the maximum value for the tick marks and labels. We also used “angle(horizontal)” to position the labels horizontally, rather than vertically, which is the default.

[Figure: the fractional-polynomial plot with y-axis labels every 500 grams drawn horizontally, x-axis labels at every week from 25 to 45, and legend entries “Females” and “Males”]
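The same min(increment)max notation works for any axis and any plottype. As a minimal sketch, assuming the births dataset is in memory, a plain scatterplot with labels every 1000 grams and every 5 weeks would be:

```stata
* Sketch only: min(increment)max notation for axis labels
twoway scatter bweight gestwks, ///
    ylabel(1000(1000)5000, angle(horizontal)) ///
    xlabel(25(5)45)
```

Choosing a wider increment is one way to keep the labels from crowding each other.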

For a PowerPoint presentation, thicker lines might be preferred.

The choices for line widths are displayed using

graph query linewidthstyle

linewidthstyle may be

medium medthin thick vthick vvthick vvvthick medthick none thin vthin vvthin vvvthin

For information on linewidthstyle and how to use it, see help linewidthstyle.

In the connecting lines we have used so far, we specify the linewidth with the clwidth option.

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(thick))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(thick))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         legend(label(1 "Females") label(2 "Males"))
       ;
#delimit cr

[Figure: the same plot drawn with thick lines]

In Stata graphics, we can also use the “multiplier” feature, to change the default size by that multiple. This can be applied to lines, symbols, and text.

Adjusting the connect line width by making it 3 times larger than the default, we get the same effect as clwidth(thick).

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(*3))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(*3))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         legend(label(1 "Females") label(2 "Males"))
       ;
#delimit cr

[Figure: the same plot drawn with lines 3 times the default width]
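The multiplier works the same way for symbols and text. As a minimal sketch, assuming the births dataset is in memory, the following halves the marker size and shrinks the title to 80% of its default size; the title wording is only illustrative.

```stata
* Sketch only: multipliers applied to marker size and title text size
twoway scatter bweight gestwks, msize(*0.5) ///
    title("Birth Weight by Weeks Gestation", size(*0.8))
```

Multipliers below 1 shrink the element and multipliers above 1 enlarge it.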

So far, we have only used the default scheme (blue background border).

The choices for schemes are displayed using the following command. Note that, unlike “graph query linewidthstyle”, a comma is required preceding “schemes”.

graph query , schemes

Available schemes are

s2color     see help scheme_s2color    <- version 11 default
s2mono      see help scheme_s2mono
s2manual    see help scheme_s2manual
s2gmanual
s1color     see help scheme_s1color
s1mono      see help scheme_s1mono
s1rcolor    see help scheme_s1rcolor
s1manual    see help scheme_s1manual
sj          see help scheme_sj
economist   see help scheme_economist
s2color8    <- version 10 default
lean1       see help scheme_lean1
lean2       see help scheme_lean2

You cannot see the help by entering “scheme_economist” in the menu driven search box. You have to enter “help scheme_economist” in the Stata Command window.

Do not do it, but if you wanted to change the default scheme for the remainder of your Stata session, you would enter

set scheme economist

Do not do it, but if you wanted to change the default scheme permanently for all sessions, until set again, you would enter

set scheme economist , permanently

To use a different scheme for a specific graph, use

scatter yvar xvar , scheme(economist)


Let’s try out the economist scheme.

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(*3))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(*3))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         legend(label(1 "Females") label(2 "Males"))
         scheme(economist)
       ;
#delimit cr

It looks kind of neat, but “thinning out” of titles and labels would be required to make it work.

[Figure: the same plot drawn with the economist scheme; the x-axis tick labels run together]

If we use “s1color” and switch to a thinner connecting line,

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(*2))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(*2))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         legend(label(1 "Females") label(2 "Males"))
         scheme(s1color)
       ;
#delimit cr

[Figure: the same plot drawn with the s1color scheme and lines 2 times the default width]

we get a color graph that would be appropriate for a PowerPoint presentation.


For publishing in journals, a black and white graph without a background shaded border is appropriate. For that, use the “s1mono” scheme.

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(*2))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(*2))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         legend(label(1 "Females") label(2 "Males"))
         scheme(s1mono)
       ;
#delimit cr

[Figure: the same plot drawn in black and white with the s1mono scheme]

We will now develop this into a publishable graph.


Biomedical and scientific journals do not like the border all the way around the graph because it has a “business” look to it, rather than a scientific look. With the border removed, it looks more like the Cartesian coordinate system, thus making it appear more scientific.

To eliminate the border around the plot region (but still showing a left y-axis and bottom x-axis), we use the “plotregion(style(none))” option.

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(*2))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(*2))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         legend(label(1 "Females") label(2 "Males"))
         scheme(s1mono)
         plotregion(style(none))
       ;
#delimit cr

[Figure: the s1mono plot with the border around the plot region removed, leaving only the left y-axis and bottom x-axis]

Suppose we want to put the text at the end of the two lines. After a couple of trial-and-error attempts, we might come up with y=3900, x=44 for males and y=3600, x=44.4 for females.

We add a text(y-axis value x-axis value "text") option for each of the two gender groups, drop the legend labels option, and add a legend(off) option.

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(*2))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(*2))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         text(3900 44 "Males")
         text(3600 44.4 "Females")
         scheme(s1mono)
         plotregion(style(none))
         legend(off)
       ;
#delimit cr

[Figure: the same plot with the legend removed and the text “Males” and “Females” placed at the right-hand ends of the two lines]

If you look closely at the graph, you will see that the y-x coordinates are the center of the text string (both vertically and horizontally). This is the default placement. The options we used:

text(3900 44 "Males")
text(3600 44.4 "Females")

are by default equivalent to

text(3900 44 "Males", placement(c))
text(3600 44.4 "Females", placement(c))

Other placement choices are:

placement( )   location of text (orientation of location)
c     centered on the point, vertically and horizontally
n     above the point, centered
ne    upper-right corner on the point
e     right of the point, vertically centered
se    lower-right corner on the point
s     below point, centered
sw    lower-left corner on the point
w     left of the point, vertically centered
nw    upper-left corner on the point

Let’s use the “east” placement, just to see how it works.

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(*2))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(*2))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)")
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         text(3900 43.1 "Males" , placement(e))
         text(3600 43.1 "Females" , placement(e))
         scheme(s1mono)
         plotregion(style(none))
         legend(off)
       ;
#delimit cr

[Figure: the same plot with “Males” and “Females” placed using the east placement, starting at x=43.1]

Notice that the x-axis tick mark labels are crowded against the x-axis title. To add some space, add the following option to the x-axis title

,height(5)

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female",clwidth(*2))
       (fpfit bweight gestwks2 if sexalph=="male",clwidth(*2))
       , title("Birth Weight by Weeks Gestation by Gender")
         subtitle("Singleton Births in London Hospital")
         note("Note: shown are fractional-polynomial regression lines")
         ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)",height(5))
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         text(3900 43.1 "Males" , placement(e))
         text(3600 43.1 "Females" , placement(e))
         scheme(s1mono)
         plotregion(style(none))
         legend(off)
       ;
#delimit cr

[Figure: the same plot with extra space between the x-axis tick mark labels and the x-axis title]

For this graph to be publication quality, we have to remove the title and footnote. Also, some journals insist that all the lines be drawn with dark black, rather than using lighter lines (shading), because the lighter lines do not reproduce well.

Dropping the title, footnote, and making the two lines dark black,

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female"
            ,clwidth(*2) clcolor(black))
       (fpfit bweight gestwks2 if sexalph=="male"
            ,clwidth(*2) clcolor(black))
       , ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)",height(5))
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         text(3900 43.1 "Males" , placement(e))
         text(3600 43.1 "Females" , placement(e))
         scheme(s1mono)
         plotregion(style(none))
         legend(off)
       ;
#delimit cr

[Figure: the plot without title or footnote, both lines drawn in black and labeled “Males” and “Females”]

This graph looks fine, because the lines do not cross. Just for practice, let us make one of the lines a different pattern, such as a dash.

To see the available line pattern styles, use,

graph query linepatternstyle

linepatternstyle may be

blank dot longdash_shortdash tight_dot dash longdash shortdash vshortdash dash_3dot longdash_3dot shortdash_dot dash_dot longdash_dot shortdash_dot_dot dash_dot_dot longdash_dot_dot solid


Making one of the lines a dashed line,

#delimit ;
twoway (fpfit bweight gestwks2 if sexalph=="female"
            ,clwidth(*2) clcolor(black) clpattern(solid))
       (fpfit bweight gestwks2 if sexalph=="male"
            ,clwidth(*2) clcolor(black) clpattern(dash))
       , ytitle("Birth Weight (grams)")
         xtitle("Gestational Age (weeks)",height(5))
         ylabel(0(500)4000,angle(horizontal))
         xlabel(25(1)45)
         text(3900 43.1 "Males" , placement(e))
         text(3600 43.1 "Females" , placement(e))
         scheme(s1mono)
         plotregion(style(none))
         legend(off)
       ;
#delimit cr

[Figure: the same plot with a solid line for females and a dashed line for males]

Scatterplot with Error Bars

It is sometimes helpful to display a graph of odds ratios (or relative risks, hazard ratios, etc.) with their 95% confidence intervals. These can be graphed in Stata using a range plot with capped spikes, specified by

twoway rcap

Charlesworth et al (2003) in their Table 1 present the following odds ratios for four levels (quartiles) of BMI:

Body Mass Index   % of       Stroke   Odds    Confidence       p Value
(kg/m2)           Subjects   (%)      Ratio   Interval (95%)
<25.0             24.94      1.81     1.00    Reference
25.0-27.7         24.88      1.59     0.87    0.69-1.11
27.8-31.1         25.00      1.69     0.93    0.74-1.17
>31.2             25.17      1.35     0.74    0.58-0.95        ptrend = 0.040

To graphically depict this dose-response (linear trend) relationship between BMI and Stroke, we might choose to graph the odds ratios, with a reference line at 1.00.

Step 1. Read in data.

We need the odds ratios and confidence interval limits. One approach is to use the Stata data editor and enter the following data:

Sequence   OR     lowerCI   upperCI
1          1.00   .         .
2          0.87   0.69      1.11
3          0.93   0.74      1.17
4          0.74   0.58      0.95

where “Sequence” is included to use for the x-axis value as a way to space out the odds ratios. Then save these data using

save filename

The disadvantage of this approach is that we have a tiny data file sitting around on our hard drive that we have to keep track of.


A better approach is to include the data directly into the do-file where we keep the commands used to create the graph.

clear
input sequence OR lowerCI upperCI
1 1.00 . .
2 0.87 0.69 1.11
3 0.93 0.74 1.17
4 0.74 0.58 0.95
end
list

     +------------------------------------+
     | sequence    OR   lowerCI   upperCI |
     |------------------------------------|
  1. |        1     1         .         . |
  2. |        2   .87       .69      1.11 |
  3. |        3   .93       .74      1.17 |
  4. |        4   .74       .58       .95 |
     +------------------------------------+

Next we will graph the range plot with capped spikes using the twoway rcap graph style.

This graph has the syntax:

twoway rcap y1var y2var xvar

It does not matter which order you specify the bounds, so we could use either of the following commands and get the same result

twoway (rcap upperCI lowerCI sequence)
twoway (rcap lowerCI upperCI sequence) // same result

[Figure: range plot with capped spikes of lowerCI/upperCI against sequence]

Next, overlay a scatterplot of the odds ratios to make the odds ratios appear inside the confidence limits.

twoway (rcap upperCI lowerCI sequence)(scatter OR sequence)

[Figure: capped spikes for upperCI/lowerCI with the odds ratios (OR) overlaid as points, plotted against sequence; legend: upperCI/lowerCI, OR]

Now, drop the legend, add more descriptive axis titles, and add a footnote

#delimit ;
twoway (rcap upperCI lowerCI sequence)(scatter OR sequence)
       , legend(off)
         ytitle(Odds Ratio for Stroke)
         xtitle("Body Mass Index (kg/m2)")
         note("Note: odds ratios are shown with 95% confidence intervals")
       ;
#delimit cr

[Figure: odds ratios with 95% confidence intervals; y-axis “Odds Ratio for Stroke”, x-axis “Body Mass Index (kg/m2)” labeled 1 to 4, with footnote and no legend]

Next, add space on the left and right sides of the graph by making the x-axis range wider.

#delimit ;
twoway (rcap upperCI lowerCI sequence)
       (scatter OR sequence)
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)")
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 1 2 3 4 4.5)
       ;
#delimit cr

[Graph: same plot with a wider x-axis; y-axis "Odds Ratio for Stroke" (ticks .6, .8, 1, 1.2), x-axis "Body Mass Index (kg/m2)" (ticks .5, 1, 2, 3, 4, 4.5); footnote "Note: odds ratios are shown with 95% confidence intervals"]
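The x-axis can also be widened without adding tick labels at the ends. The following is a minimal sketch of that alternative, not taken from the original chapter: it uses the xscale(range()) option to extend the axis to 0.5 and 4.5 while labeling only 1 through 4. The data are re-entered so the block runs on its own.

```stata
* Sketch: widen the x-axis with xscale(range()) instead of extra xlabel values
clear
input sequence OR lowerCI upperCI
1 1.00 . .
2 0.87 0.69 1.11
3 0.93 0.74 1.17
4 0.74 0.58 0.95
end

twoway (rcap upperCI lowerCI sequence) (scatter OR sequence), ///
    legend(off) ///
    xscale(range(0.5 4.5)) /// extend the axis beyond the data
    xlabel(1(1)4)          // label only the four categories
```

With this form there is no need to supply blank labels for 0.5 and 4.5 later on, although the chapter's xlabel approach gives the same spacing.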

Next, replace x-axis tick mark labels with descriptive labels

#delimit ;
twoway (rcap upperCI lowerCI sequence)
       (scatter OR sequence)
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)")
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " ")
       ;
#delimit cr

[Graph: x-axis tick labels replaced with "<25.0", "25.0-27.7", "27.8-31.1", ">31.2"; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Next, drop x-axis tick marks

#delimit ;
twoway (rcap upperCI lowerCI sequence)
       (scatter OR sequence)
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)")
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       ;
#delimit cr

[Graph: same plot without x-axis tick marks; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)" with labels "<25.0", "25.0-27.7", "27.8-31.1", ">31.2"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Next, add a reference line at y=1.

#delimit ;
twoway (rcap upperCI lowerCI sequence)
       (scatter OR sequence)
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)")
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       yline(1, lstyle(solid))
       ;
#delimit cr

[Graph: same plot with a horizontal reference line at y = 1; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Note: to get a vertical reference line, you would use “xline” in place of “yline”.
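As a minimal sketch of the vertical case (not from the original chapter), the following draws a dashed vertical line with xline. The position 2.5 and the dashed pattern are arbitrary choices made only for illustration; the data are re-entered so the block runs on its own.

```stata
* Sketch: vertical reference line using xline in place of yline
clear
input sequence OR lowerCI upperCI
1 1.00 . .
2 0.87 0.69 1.11
3 0.93 0.74 1.17
4 0.74 0.58 0.95
end

* x = 2.5 is a hypothetical position chosen only to show the option
twoway (rcap upperCI lowerCI sequence) (scatter OR sequence), ///
    legend(off) xline(2.5, lpattern(dash))
```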


To get a list of choices for colors, use

help colorstyle

To get a list of choices for symbols, use

help symbolstyle

Now, change odds ratio symbol to a blue square.

#delimit ;
twoway (rcap upperCI lowerCI sequence)
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)")
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       yline(1, lstyle(solid))
       ;
#delimit cr

[Graph: odds ratios shown as blue squares; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Now, make the caps (horizontal lines) on the error bars larger (2 × the default size),

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)")
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       yline(1, lstyle(solid))
       ;
#delimit cr

[Graph: same plot with wider caps on the error bars; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Make the symbols (squares) larger to match the new size of the error bar caps (2 × the default size),

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)")
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       yline(1, lstyle(solid))
       ;
#delimit cr

[Graph: same plot with larger square symbols; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Make the error bar line widths thicker (2 × the default size),

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)")
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       yline(1, lstyle(solid))
       ;
#delimit cr

[Graph: same plot with thicker error bar lines; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Adding space between the x-axis tick mark labels and x-axis title,

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       yline(1, lstyle(solid))
       ;
#delimit cr

[Graph: same plot with extra space between the x-axis labels and the x-axis title; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Switch to horizontal placement of the y-axis labels with more white space above and below error bars,

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       ylabels(.4(.1)1.4,angle(horizontal))
       yline(1, lstyle(solid))
       ;
#delimit cr

[Graph: y-axis labels horizontal, from .4 to 1.4 in steps of .1; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Notice how the error bars suddenly look narrower, compared to the graph on the previous page, implying greater precision of the odds ratio estimates (narrower 95% confidence intervals). Changing the range of the y-axis to alter the visual effect of a graph can be deceptive if you do not stop to consider the impact of the change. In this case, adding the white space is actually better reporting of the results. The graph on the previous page makes the precision look worse than it is.


Let us see what happens if we take it to more of an extreme, making the range of the y-axis wider than we need to,

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       ylabels(0(.2)2,angle(horizontal))
       yline(1, lstyle(solid))
       ;
#delimit cr

[Graph: y-axis from 0 to 2 in steps of .2; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

The precision looks much tighter, but both the reader and the manuscript reviewer will recognize that the y-axis range gives more white space than is needed. You might actually lose some credibility over this graph, because it might appear you are trying to graphically deceive the reader into thinking the precision is better than it actually is.


Logarithmic Scale for Relative Measures of Effect

If you look carefully at the above graph, you will notice the 95% CI is wider above the odds ratio than below the odds ratio. Relative measures of effect (e.g., odds ratios, risk ratios) have this property of a wider CI limit above than below the relative effect measure. It is because a protective effect (odds ratio < 1) is constrained between 0 and 1, whereas a deleterious effect (odds ratio > 1) is not constrained, and so ranges from 1 to +∞. Similarly, odds ratios of 1/3 = 0.33 and 3/1 = 3.00 are of the same magnitude, but numerically look very different, so a protective effect appears smaller than a deleterious effect. To visually correct for these oddities, it is widely accepted that a logarithmic scale should be used for graphs displaying relative measures of effect. In the Instructions for Authors for The American Journal of Epidemiology (http://www.oxfordjournals.org/our_journals/aje/for_authors/general.html) it states,

“When plotting relative measures of effect (e.g., relative risks, relative odds), a logarithmic scale should be used unless there is a compelling reason to use an arithmetic scale. If bars are used to plot the relative measures, they should start at the baseline level of 1.0 rather than at zero.”

Egger et al (1997) describe this practice,

“There are several reasons that ratio measures are best plotted on logarithmic scales [25]. Most importantly, the value of an odds ratio and its reciprocal - for example, 0.5 and 2 - which represent odds ratios of the same magnitude but opposite directions, will be equidistant from 1.0. Studies with odds ratios below and above 1.0 will take up equal space on the graph and thus look equally important. Also, confidence intervals will be symmetrical around the point estimate.

------
[25] Galbraith R. A note on graphical presentation of estimated odds ratios from several clinical trials. Stat Med 1988;7:889-94.”
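The two properties described in the quotation can be checked numerically. The following is a minimal sketch, not part of the original chapter: it first confirms that 0.5 and 2 are the same distance from 1 on the log scale, and then, for the three non-reference categories used in this chapter, compares the distance from each odds ratio to its lower and upper limits on the raw scale and on the log scale. The variable names below_raw, above_raw, below_log, and above_log are made up for this illustration.

```stata
* reciprocal odds ratios sit at equal distances from 1 on the log scale
display ln(0.5)
display ln(2)
assert abs(ln(0.5) + ln(2)) < 1e-12

* distance from the estimate to each limit, on both scales
clear
input OR lowerCI upperCI
0.87 0.69 1.11
0.93 0.74 1.17
0.74 0.58 0.95
end
generate below_raw = OR - lowerCI
generate above_raw = upperCI - OR
generate below_log = ln(OR) - ln(lowerCI)
generate above_log = ln(upperCI) - ln(OR)
list OR below_raw above_raw below_log above_log

* the two log-scale distances are closer to each other than the raw ones
assert abs(above_log - below_log) < abs(above_raw - below_raw)
```

The log-scale distances are not exactly equal here, since the published limits are rounded to two decimals, but they are much closer to each other than the raw-scale distances.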


Switching to a logarithmic scale for the y-axis, and returning the y-axis range to 0.4 to 1.4 to avoid the gee whiz effect,

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       ylabels(.4(.1)1.4,angle(horizontal))
       yline(1, lstyle(solid))
       yscale(log)
       ;
#delimit cr

[Graph: logarithmic y-axis labeled from .4 to 1.4 in steps of .1; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

We see that the error bars look symmetric around the odds ratio symbol (blue square), which was one of the reasons for using a log scale.


The y range is not what we want though. Notice the 0.4 is way below the graph. To get it to come out right, we have to define the range using another yscale option.

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       ylabels(.4(.1)1.4,angle(horizontal))
       yline(1, lstyle(solid))
       yscale(log)
       yscale(range(.4 1.4))
       ;
#delimit cr

[Graph: logarithmic y-axis with the range set to .4 to 1.4, so the labels above 1 are crowded together at the top; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]
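The two yscale() options can also be written as one. The following is a minimal sketch of that shorter form, not taken from the original chapter; it keeps only the options needed to show the axis and re-enters the data so the block runs on its own.

```stata
* Sketch: log scale and range given together in a single yscale() option
clear
input sequence OR lowerCI upperCI
1 1.00 . .
2 0.87 0.69 1.11
3 0.93 0.74 1.17
4 0.74 0.58 0.95
end

twoway (rcap upperCI lowerCI sequence) (scatter OR sequence), ///
    legend(off) yline(1) ///
    ylabel(.4(.1)1.4, angle(horizontal)) ///
    yscale(log range(.4 1.4))
```

Whether to write one yscale() or two is a matter of taste; the chapter's two-option form makes it easier to see what each step added.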

Reducing the white space at the bottom of the graph,

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       ylabels(.5(.1)1.4,angle(horizontal))
       yline(1, lstyle(solid))
       yscale(log)
       yscale(range(.5 1.4))
       ;
#delimit cr

[Graph: logarithmic y-axis labeled from .5 to 1.4 in steps of .1; y-axis "Odds Ratio for Stroke", x-axis "Body Mass Index (kg/m2)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

Adding a Special Character and a Superscript in Stata 11

How to do this is explained in the Stata graphics manual, which is available from Stata’s help menu, in the PDF documentation drop down menu option. Click on “[G] Graphics” on the left-side menu. Then click on “Edit” on the menu bar, and “Find” from the drop-down menu, and enter “Text in graphs” in the search box. This will take you to the Table of Contents line that looks like

text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Text in graphs 636

if you have Stata version 12 (page 608 in version 11). Click on the text hyperlink and it will take you to that section of the manual.

Adding a greater-than-or-equal sign and a superscript to change

“>31.2” to “≥31.2” and

“(kg/m2)” to “(kg/m²)”

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m{superscript:2})" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 "{&ge}31.2" 4.5 " " , noticks)
       ylabels(.5(.1)1.4,angle(horizontal))
       yline(1, lstyle(solid))
       yscale(log)
       yscale(range(.5 1.4))
       ;
#delimit cr

[Graph: x-axis labels "<25.0", "25.0-27.7", "27.8-31.1", "≥31.2"; x-axis title "Body Mass Index (kg/m²)" with a superscript 2; y-axis "Odds Ratio for Stroke" (log scale, .5 to 1.4); footnote "Note: odds ratios are shown with 95% confidence intervals"]

The superscript on kg/m does not show up very well. Let us convert the superscript “2” to bold face.

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m{superscript:{bf:2}})" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 "{&ge}31.2" 4.5 " " , noticks)
       ylabels(.5(.1)1.4,angle(horizontal))
       yline(1, lstyle(solid))
       yscale(log)
       yscale(range(.5 1.4))
       ;
#delimit cr

[Graph: same plot with the superscript 2 in bold face; y-axis "Odds Ratio for Stroke" (log scale, .5 to 1.4), x-axis "Body Mass Index (kg/m²)"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

That helped a tiny bit. When we shrink this graph to the printed size in the article, however, it is going to be impossible to see that superscript. Our only choice is to increase the size of the x-title.


For practice, let us increase the size of the labels, the axis titles, and the footnote.

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle("Odds Ratio for Stroke" , size(*2))
       xtitle("Body Mass Index (kg/m{superscript:{bf:2}})" , height(10) size(*2))
       note("Note: odds ratios are shown with 95% confidence intervals" , size(*1.5))
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 "{&ge}31.2" 4.5 " " , noticks labsize(*1.5))
       ylabels(.5(.2)1.4,angle(horizontal) labsize(*1.5))
       yline(1, lstyle(solid))
       yscale(log)
       yscale(range(.5 1.4))
       ;
#delimit cr

[Graph: same plot with larger labels, axis titles, and footnote; y-axis "Odds Ratio for Stroke" (log scale, labels .5, .7, .9, 1.1, 1.3), x-axis "Body Mass Index (kg/m²)" with labels "<25.0", "25.0-27.7", "27.8-31.1", "≥31.2"; footnote "Note: odds ratios are shown with 95% confidence intervals"]

If you are using Stata 10, the multiplier for text sizes (the *2 or *1.5 used above in ytitle, xtitle, xlabel, and ylabel) is not yet available; it was added in version 11. To see the available text sizes,

graph query textsize

textsizestyle may be

    default     large       quarter       third_tiny   zero
    full        medium      quarter_tiny  tiny
    half        medlarge    small         vhuge
    half_tiny   medsmall    tenth         vlarge
    huge        minuscule   third         vsmall

So, instead of

ytitle("Odds Ratio for Stroke" , size(*2))

you would use

ytitle("Odds Ratio for Stroke" , size(vlarge))
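Applied to the whole graph, the substitution might look like the following minimal sketch, which is not from the original chapter. The particular named sizes (vlarge, large, medium) are assumptions chosen for illustration; they are not exact equivalents of *2 and *1.5, so adjust them by eye. The data are re-entered so the block runs on its own.

```stata
* Sketch: named text sizes in place of the *2 and *1.5 multipliers
clear
input sequence OR lowerCI upperCI
1 1.00 . .
2 0.87 0.69 1.11
3 0.93 0.74 1.17
4 0.74 0.58 0.95
end

twoway (rcap upperCI lowerCI sequence) (scatter OR sequence), ///
    legend(off) ///
    ytitle("Odds Ratio for Stroke", size(vlarge)) ///
    xtitle("Body Mass Index (kg/m2)", size(vlarge)) ///
    xlabel(1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2", labsize(large)) ///
    ylabel(.5(.2)1.4, angle(horizontal) labsize(large)) ///
    note("Note: odds ratios are shown with 95% confidence intervals", size(medium))
```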

Adding Special Character and Superscript in Stata 10 and Earlier Versions (By Using PowerPoint)

The following task of making these changes,

“>31.2” to “≥31.2”

and

“(kg/m2)” to “(kg/m²)”

cannot be done in Stata 10 and earlier versions. For earlier versions, we can save the graph in enhanced metafile format (a graph that can be modified in other Windows programs) and then modify it using PowerPoint.

We add a command to do this after our graph commands:

graph export filename.emf , replace


Doing so, and returning the Stata code to its form without the superscript or the greater-than-or-equal sign, so that we can add these in PowerPoint,

#delimit ;
twoway (rcap upperCI lowerCI sequence , msize(*2) lwidth(*2))
       (scatter OR sequence , msymbol(square) mlcolor(blue) mfcolor(blue) msize(*2))
       , legend(off)
       ytitle(Odds Ratio for Stroke)
       xtitle("Body Mass Index (kg/m2)" , height(5))
       note("Note: odds ratios are shown with 95% confidence intervals")
       xlabel(0.5 " " 1 "<25.0" 2 "25.0-27.7" 3 "27.8-31.1" 4 ">31.2" 4.5 " " , noticks)
       ylabels(.5(.1)1.4,angle(horizontal))
       yline(1, lstyle(solid))
       yscale(log)
       yscale(range(.5 1.4))
       ;
#delimit cr
*
graph export CharlesworthOddsRatios.emf , replace

Note for Macintosh Users: save the file as a Macintosh PICT, rather than an .emf file


How to modify graph in PowerPoint

1) start with blank slide in PowerPoint
2) from toolbar, Insert Picture From file … open graph file CharlesworthOddsRatios.emf
3) drag corners of graph to resize
4) with mouse pointer on graph, right click mouse, and select edit picture
5) when get prompt “This is an imported picture, not a group. Do you want to convert it to a Microsoft Office drawing object?” select Yes
6) Now clicking or double clicking on any part of the graph, such as text or a symbol, you can modify.

For example, you can change the numbers to subscripts by clicking once to select a symbol label, highlighting the number that you want to change to a subscript, choosing Format from toolbar, then Font, then Subscript.

How to move graph from PowerPoint to Word

1) with cursor positioned on graph shown on a PowerPoint slide, right click and save picture as an *.emf formatted file

2) inside Word, on the toolbar click on Insert Picture From File

… shown below is graph modified in PowerPoint and brought back to Word


Submitting Figures to Journals

Journals do not want to find your graphs inside your manuscript, but rather submitted as separate graphics files, usually either a Tagged Image File Format (TIFF) or an Encapsulated PostScript (EPS) file.

The eps file is the better format, which will be shown below. For completeness, we will begin with tiff files.

TIFF files

To generate such a graphics file, you simply save it as an appropriate file type.

From inside the Stata graphics window (with your graph showing),

File
  Save As
    < select a directory and file name >
    Save as type: TIFF (*.tif)
    Save

The following command will work as well, saving it into your current (working) directory, which is shown at the bottom left corner of your Stata Window.

graph export CharlesworthOddsRatios.tif , replace

This generally works, but some journals require an additional step. The default TIFF file graph does not have adequate resolution. Also, the journal will likely require that the graph is exactly the size needed for the publication.

Also, different journals require different color values, which even affects the color black. These are specified as either RGB (Stata’s default) or CMYK (must be requested as an option in Stata). The Stata documentation describes this (Stata Graphics Reference Manual, Release 11, 2009, p.543),

“RGB values represent a mixing of red, green, and blue light, whereas CMYK values represent a mixing of pigments—cyan, magenta, yellow, and black. Thus, as the numbers get bigger, RGB colors go from dark to bright, whereas the CMYK colors go from light to dark.”

To illustrate, the journal Muscle and Nerve, states the following in its instructions to authors: http://www3.interscience.wiley.com/journal/32891/home/ForAuthors.html

“Figures. Images submitted be in Tagged Image File Format (TIFF) or Encapsulated PostScript (EPS)….All images must be saved and submitted in final size. The final figure sizes are: 1 column = 3-in. (8.25-cm) wide, 1.5 column = 5-in. (13-cm) wide, 2 columns = 6-in. (17.15-cm) wide…Resolutions of scanned images are as follows: Line art is to be scanned at 1200 dots per inch (dpi). Halftones are to be scanned at 300 dpi.”…Digital Figures. To ensure that your digital graphics are suitable for print purposes, please go to RapidInspector™ at http://rapidinspector.cadmus.com/wi/index.jsp . This free, stand-alone software application will help you to inspect and verify illustrations right on your computer.”

A figure saved as a TIFF file exported directly from Stata is a digital figure.

In these same instructions Muscle and Nerve states that figures prepared in Excel and PowerPoint are not acceptable, so being able to prepare your figures in Stata is especially helpful.

Consistent with these instructions, if your figure contains just lines, with no shaded bars and no shaded symbols, it could be considered “line art.” For those, you could make them 1200 dpi, but anything larger than 300 dpi is acceptable. A figure saved at 1200 dpi takes up an incredible amount of disk space, though, so you should just go with 300 dpi; you would not notice any esthetic difference between the two resolutions anyway. For figures with shaded bars or shaded (filled in) symbols, definitely make them 300 dpi, since they are similar to a figure scanned as a halftone.

To get the figures to come out just the right size for publication, you use the xsize(#) option on the graph, along with the ysize(#), to get the right shape of the rectangle containing the figure, along with the width(#) option on the export command. These commands specify:

ysize(#)   height of available graph area (in inches)
xsize(#)   width of available graph area (in inches)
width(#)   total dots in graph width

If you wanted a square graph, you would use either the option

aspectratio(1)

or equivalently use the same number for ysize and xsize, such as

ysize(3) xsize(3)

The default Stata graph has an aspect ratio (height/width) of 4/5.5 = 0.73, which is approximately 0.75 or 3/4 and gives a visually appealing rectangle. To get your graph to have this aspect ratio, do nothing and you will get this by default. If you want to be sure it has specifically a 4 inch width, while closely approximating this aspect ratio, 3/4 = 0.75, you would use the graph options:

ysize(3) xsize(4)

These options will be added to the graph below.


For this to have 300 dpi, you would then use

graph export mygraph.tif, width(1200)

since 1200/4 = 300 dpi.
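The arithmetic generalizes to any figure width: the pixel count for width() is the width in inches times the target dpi. The following is a minimal sketch, not part of the original chapter, that computes the value in local macros and passes it to graph export. The macro names inches, dpi, and pixels are made up for this illustration, and TIFF export may not be available in every Stata installation (for example, console versions).

```stata
* pixels needed = width in inches x target dots per inch
local inches = 4
local dpi    = 300
local pixels = `inches' * `dpi'
display "width(`pixels') gives `dpi' dpi at `inches' inches wide"

* draw a 4-inch-wide graph and export it with the computed pixel width
twoway (function y=sin(x), range(0 6.28)), ysize(3) xsize(`inches')
graph export mygraph.tif, width(`pixels') replace
```

Changing inches to 7, for a journal that asks for a 7-inch-wide figure, would give width(2100) by the same calculation.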

To illustrate, we will graph the sine curve, using the s1mono graph scheme. Executed from inside the Stata do-file editor, since it contains the line continuation character “///”,

twoway (function y=sin(x), clwidth(*1.5) range(0 6.28)) , ///
    scheme(s1mono)

[Graph: sine curve; y-axis "y" (-1 to 1), x-axis "x" (0 to 6)]

We see that the line has some “fuzziness”, which is due to using too few dpi to fill in the niches. This fuzziness is what journals want to avoid.

Saving as a TIFF file,

graph export junk.tif , replace

and inserting back into this chapter using the Microsoft Word menu steps from menu toolbar,

click on Insert Picture From File


We see it is just as fuzzy and maybe even more fuzzy.

We can get the needed 300 dpi using the ysize, xsize, and width options,

twoway (function y=sin(x), clwidth(*1.5) range(0 6.28)) , ///
    scheme(s1mono) ysize(3) xsize(4)
graph export junk.tif , width(1200) replace

[Graph: sine curve drawn 4 inches wide; y-axis "y" (-1 to 1), x-axis "x" (0 to 6)]

When we copy and paste the graph from graph window, we see it is now 4 inches wide. It is still fuzzy looking, though, since the width(1200) only applies to the exported graph in junk.tif.

Inserting the junk.tif file into this chapter using Microsoft Word “insert picture from file”,


We see the fuzziness was resolved, but the graph is not the required width of 4 inches. It may actually be 4 inches wide in the TIFF file, but is stretched by Microsoft Word when we inserted it. (I do not know any way to check this.) We can see what it would look like as the right size by resizing in Word.

Clicking on the graph and dragging the lower right corner, until the 4 inch mark is reached on the ruler at the top of the page in Word,


Another Example of Submitting a Figure to a Journal

As another example, the Journal of Bone and Joint Surgery (JBJS) states in its Instructions for Authors,

http://www2.ejbjs.org/misc/instrux.dtl

“Figures must be submitted electronically…Refer to the section entitled Illustrations for figure format requirements…Illustrations accompanying your manuscript must be submitted electronically and be in TIFF or EPS format…Color images must be RGB (not CMYK)….When using a digital camera to create your images, if possible, set the camera to save in TIFF format (not JPEG), set the resolution to a minimum of 300 ppi (pixels per inch), and set the size of the image to 5 × 7 in (127 × 178 mm). The resolution of your electronic images is critical and is directly linked to how well they will appear when printed. Color and grayscale images, such as radiographs, must have a minimum resolution of 300 ppi, and line-art drawings must have a minimum resolution of 1200 ppi. An original image size of 5 × 7 in (127 × 178 mm) is preferred.”

Similar to Muscle and Nerve, JBJS wants a TIFF file. However, it wants a figure size of 5 × 7 in (127 × 178 mm), which is not an acceptable size for Muscle and Nerve. Also, it appears to want RGB, rather than CMYK, opposite of Muscle and Nerve.

The journal, JBJS, is not one of the journals supported by rapidInspector. However, the following should work, since RGB is Stata’s default,

twoway (function y=sin(x), clwidth(*1.5) range(0 6.28)) , ///
    scheme(s1mono) ysize(5) xsize(7)
graph export junk.tif , width(2100) replace


EPS Files (This is the Format You Should Always Use for Submitting to Journals)

This is how it was explained to me (personal communication, Derek Wagner, Stata Technical Support, March 2012) when I asked how to get 300 dpi and CMYK in the same graph,

“There isn't a way to set the dpi from within Stata. The options with -graph export- can set the width and height in pixels, but the dpi is a bit different. If a publisher is looking for a certain dpi, then you/they should really be looking at eps. If you save your graph in eps format, you should be able to turn it in to your publisher and they can change the size/resolution to whatever they need.

The eps format is generally the preferred way to move high-resolution graphs across platforms. You want to avoid bitmap formats such as .tif and .png since they are only at screen resolution. Vector formats such as .eps are ideal.

Note that eps files optionally include a tif preview which is low-resolution. However, this tif preview only shows on the screen. The eps file is what will print (provided you have a PostScript-compatible printer) and will print at a very high resolution.

You can turn the preview off with the following option in graph export: preview(off)

For a good discussion on the difference between vector graphics and bit-mapped graphics see the link: http://www.logoants.com/vector-bitmap.php”
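The preview(off) option mentioned in the reply can be tried as follows. This is a minimal sketch, not part of the original chapter; it reuses the sine curve and the file name junk.eps from the surrounding examples and then confirms the file was written.

```stata
* Sketch: export an EPS file without the low-resolution preview image
twoway (function y=sin(x), range(0 6.28)), scheme(s1mono)
graph export junk.eps, preview(off) replace

* confirm the file now exists in the working directory
confirm file junk.eps
```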

The *.eps files hardly take any disk space on your hard drive. They look awful when you open them, but this is not a problem. The publisher has software, such as CorelDraw, which converts them to very high resolution before publishing them.

To save our graph with an *.eps file extension (file type) with CMYK, we use,

twoway (function y=sin(x), clwidth(*1.5) range(0 6.28)) , ///
    scheme(s1mono)
graph export junk.eps , cmyk(on) replace

Inside the Stata graphics window, the graph looks like,

[Graph: sine curve; y-axis "y" (-1 to 1), x-axis "x" (0 to 6)]

To see what it looks like on the hard drive, we bring it into Microsoft Word using the following menu selections from inside Microsoft Word,

Insert
  Picture
    Find the Stata working directory where the file was saved to
    File name: junk.eps
    Insert

Notice the resolution of the line is not improved and the titles and tick mark labels look awful.


This is not a problem for submitting to a journal, since the journal publisher can improve the resolution of the graph and make the fonts look sharper.

If you want to make the graph look nice for your own Microsoft Word documents, or perhaps for a poster presentation you are working on, how to do this is explained next.

Making EPS Files Look Nicer Using Microsoft PowerPoint

The *.eps file extension does not need or allow the “width” option, which was shown above for the *.tif file, so do not worry about that. The steps are:

1) Save the graph as an EPS file,

twoway (function y=sin(x), clwidth(*1.5) range(0 6.28)) , ///
    scheme(s1mono)
graph export junk.eps , replace

2) Open Microsoft PowerPoint. On the menu bar, select

Home
  New Slide
    Layout
      Office theme: blank

which will give you a blank slide to work with.

3) Inside PowerPoint, on the menu bar, select

Insert
  Picture
    Find the directory where you saved the eps file
    File name: junk.eps

which will insert the graph onto the blank slide.

4) Click on the graph to get the “resize” borderlines around the graph. Drag the upper left corner and the lower right corner of the border to make the graph take up a larger portion of the slide.

5) Position the mouse pointer inside the graph and right click. Select “Edit Picture” and answer “yes” to dialog box that asks, “Do you want to convert to Microsoft Office drawing object?” This conversion makes the fonts, number and letters, look very crisp.


If you want to now move the graph to Microsoft Word,

while still inside PowerPoint, right click on the graph, and then

Save as Picture
  File name: junk
  Save as type: jpg
  Save

Other file formats work as well here, such as png.

Open Microsoft Word, and on the menu bar,

Insert
  Picture
    Find the directory where you saved the jpg file
    File name: junk.jpg
    Insert

One Last Thing to Check When Submitting a Figure to a Journal

It is always a good idea to print your graph, after resizing it to what it will be in the journal. You might discover that you need to increase the size of the tick labels and the x-axis titles, which otherwise become too small to be readable if you do not.

References

Charlesworth DC, Likosky DS, Marrin CAS, et al. (2003). Development and validation of a prediction model for strokes after coronary artery bypass grafting. Ann Thorac Surg 76:436-43.

Egger M, Smith GD, Phillips AN. (1997). Meta-analysis: principles and procedures. BMJ 315(7121):1533-7.

Mitchell MN (2008). A Visual Guide to Stata Graphics, 2nd ed., College Station, TX, Stata Press.
