usingtherpackage rankingproject tomake ... · se la l la ok l ar l ky l nc l sc l ms l al l tn l l...

44
Using the R Package RankingProject to Make Simple Visualizations for Comparing Populations Demographic Edition Jerzy Wieczorek Carnegie Mellon University / Colby College 5/29/18 U.S. Census Bureau SUMMER AT CENSUS 2018 1 / 44

Upload: others

Post on 22-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Using the R Package RankingProject to MakeSimple Visualizations for Comparing Populations

Demographic Edition

Jerzy WieczorekCarnegie Mellon University / Colby College

5/29/18U.S. Census Bureau

SUMMER AT CENSUS 2018

1 / 44

Alexandra Barker
Page 2: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Outline

I MotivationI VisualizationsI Using the R package RankingProject

“Mean travel time to work”will be short for“Mean travel time to work of workers 16 years and overwho did not work at home (minutes)”

2 / 44

Page 3: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Motivation

3 / 44

Page 4: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Motivation

Problem: although ranking tables (estimates with MOEs) arenaturally visualized as marginal CIs. . .

I Overlapping marginal CIs do not always imply non-significantdifferences

I Under multiple-comparisons corrections, non-overlappingmarginal CIs do not always imply significant differences either

We will see several other ways to show directly the precision ofestimates and/or significance of comparisons at the same time

4 / 44

Page 5: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Ranking table for mean travel time to work, with statisticalsignificance (2011 ACS 1-year ests)

5 / 44

Page 6: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Census Regions

PA

NJ

NY

ME

NHVT

MA

CTRI

ND

SD

NE

KS

MN

IA

MO

IL

WI

INOH

MI

Census Regions and Divisions of the United States

PACIFIC

AK

0 200 400 Miles

TX

OK

AR

LA

MSAL

GA

FL

SC

TNNC

KY

WV

VA

MD

DC

DE

EASTNORTH

CENTRAL

MIDDLE

ATLANTIC

SOUTHATLANTICEAST

SOUTHCENTRAL

WESTSOUTH

CENTRAL

WESTNORTH

CENTRAL

0 200 400 Miles

HI

PACIFIC

0 100 200 Miles

WA

MT

WY

ID

OR

CA

NV

UT

CO

NM

AZ

MOUNTAIN

PACIFIC

WEST MIDWEST NORTHEAST

SOUTH

U.S. Department of Commerce Economics and Statistics Administration U.S. Census Bureau Prepared by the Geography Division

LEGEND

NEWENGLAND

REGIONDIVISIONSTATE

6 / 44

Page 7: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Mean travel time to work in Southern states

●LA LA

●OK●AR

●KY●NC

●SC●MS●AL

●TN●

● TX● DE

● WV● FL

● GA● VA

● DC● MD

20 22 24 26 28 30 32 34θk

90% confidence intervals

Mean travel time to work (min)in Southern states

123456789

1011121314151617

Est

imat

ed r

ank

7 / 44

Page 8: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Mean travel time to work in 3 neighboring states

●LA LA

●MS

● TX

23.3 23.5 23.7 23.9 24.1 24.3 24.5 24.7 24.9 25.1θk

90% confidence intervals

Mean travel time to work (min)in Texas, Louisiana, and Mississippi

1

2

3

Est

imat

ed r

ank

8 / 44

Page 9: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Marginal CI overlap vs. significance of differences

Abbrev. State Estimate SE

MS Mississippi 23.86 0.24LA Louisiana 24.54 0.15TX Texas 24.82 0.07

For α = 0.1, z α2= 1.64

90% CI for Mississippi minus Louisiana:(23.86− 24.54)± 1.64 ·

√0.152 + 0.242 = (−1.14,−0.22)

90% CI for Texas minus Louisiana:(24.82− 24.54)± 1.64 ·

√0.152 + 0.072 = (0.01, 0.55)

Despite overlap of marginal CIs,this difference is statistically significant too

9 / 44

Page 10: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Differences in mean travel time to work in pairs of states

●LA LA

●MS

−1.3 −1.2 −1.1 −1.0 −0.9 −0.8 −0.7 −0.6 −0.5 −0.4 −0.3 −0.2 −0.1 0.0 0.1θk − θk*

Difference (Mississippi − Louisiana)in mean travel time to work (min)

1

2

Est

imat

ed r

ank

●LA LA

● TX

−0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60θk − θk*

Difference (Texas − Louisiana)in mean travel time to work (min)

1

2

Est

imat

ed r

ank

10 / 44

Page 11: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Marginal CI overlap vs. multiple-comparisons-correctedsignificance of differences

Abbrev. State Estimate SE

MS Mississippi 23.86 0.24LA Louisiana 24.54 0.15. . .

For α = 0.1, z α2·16

= 2.73

90% CI for Mississippi minus Louisiana:(23.86− 24.54)± 2.73 ·

√0.152 + 0.242 = (−1.45, 0.09)

Despite non-overlap of marginal CIs,this difference is not statistically significantafter Bonferroni correction for 16 comparisons(Louisiana vs. each other Southern state)

11 / 44

Page 12: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Differences in mean travel time to work in Southern states

●LA LA

●OK●AR

●KY●NC

●SC●MS●AL

●TN

● TX● DE

● WV● FL

● GA● VA

● DC● MD

−6 −4 −2 0 2 4 6 8 10θk − θk*

Compared to LA

Significantly DifferentNot Significantly Different

Bonferroni−corrected diffs. for Louisiana vs. Southern statesin mean travel time to work (min)

123456789

1011121314151617

Est

imat

ed r

ank

12 / 44

Page 13: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Motivation (repeated)

Problem: although ranking tables (estimates with MOEs) arenaturally visualized as marginal CIs. . .

I Overlapping marginal CIs do not always imply non-significantdifferences

I Under multiple-comparisons corrections, non-overlappingmarginal CIs do not always imply significant differences either

We will see several other ways to show directly the precision ofestimates and/or significance of comparisons at the same time

13 / 44

Page 14: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Visualizations

14 / 44

Page 15: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Ranking table showed significance of differences

15 / 44

Page 16: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

“Shaded columns plot” for significance of differences

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEk

OK

AR

KY

NC

SC

MS AL

TN LA TX

DE

WV FL

GA

VA

DC

MD

16 / 44

Page 17: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

CIs for differences

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEk

●LA LA

●OK●AR

●KY●NC●SC●MS●AL

●TN

● TX● DE● WV● FL

● GA● VA

● DC● MD

−6 −4 −2 0 2 4 6 8 10θk − θk*

Compared to LA

Significantly DifferentNot Significantly Different

17 / 44

Page 18: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

“Comparison intervals” example

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEkLA

Distance from the strip

●LA LA

●OK●AR

●KY●NC

●SC●MS●AL

●TN

● TX● DE

● WV● FL

● GA● VA

● DC● MD

20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

4 3 2 1 0 0 1 2 3 4 5 6 7 8 9

θkθ̂k*

Compared to LA

Significantly DifferentNot Significantly Different

18 / 44

Page 19: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

“Comparison intervals” explained

θ̂i − wi θ̂i θ̂i + wi θ̂∗ − zα/2SE∗ θ̂∗ θ̂∗ + zα/2SE∗ Y

( (| |) )

di(low)

di(high)

19 / 44

Page 20: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Goldstein-Healy concept

For two populations k, k ′, we want to choose αGH such thatthe two 100(1− αGH)% CIs for θk and θk′ overlap iffthe 100(1− α)% CI for (θk − θk′) contains 0.

The test procedure “Check for overlap of 100(1− αGH)% CIs” shallhave a Type I error rate of α for H0 : θk = θk′ vs. HA : θk 6= θk′ .

For K > 2 populations, choose αGH such that this test procedurehas Type I error rate of α on average across all

(K2)pairs.

Then for any one pair k, k ′, we can report that “θk , θk′ do (not)differ for an average significance level of α.”

20 / 44

Page 21: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Goldstein-Healy example

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEk

●LA LA

●OK●AR

●KY●NC●SC

●MS●AL

●TN●

● TX● DE

● WV● FL

● GA● VA

● DC● MD

20 22 24 26 28 30 32 34θk

Goldstein−Healyadjusted intervals

21 / 44

Page 22: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Two-tiered CIs (no Bonferroni)

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEk

●LA LA

●OK●AR

●KY●NC●SC

●MS●AL

●TN●

● TX● DE

● WV● FL

● GA● VA

● DC● MD

20 22 24 26 28 30 32 34θk

Inner tier: Goldstein−Healyadjusted intervals Outer tier: individual 90% CIs

22 / 44

Page 23: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Two-tiered CIs (with demi-Bonferroni)

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEk

●LA LA

●OK●AR

●KY●NC●SC

●MS●AL

●TN●

● TX● DE

● WV● FL

● GA● VA

● DC● MD

20 22 24 26 28 30 32 34θk

Inner tier: individual 90% CIs Outer tier: Goldstein−Healyand demi−Bonferroniadjusted intervals

23 / 44

Page 24: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

R package RankingProject

24 / 44

Page 25: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Dataset structure and formattingSouthData[11, ]

## Rank State Estimate.2dec SE.2dec Abbreviation## 11 11 Delaware 25.3 0.37 DE

SouthData = within(SouthData, {Estimate.Print = formatC(Estimate.2dec, 2, format = "f")SE.Print = substring(formatC(SE.2dec, 2, format = "f"),

first = 2)})SouthData[9:11, c(1,2,7,6)]

## Rank State Estimate.Print SE.Print## 9 9 Louisiana 24.54 .15## 10 10 Texas 24.82 .07## 11 11 Delaware 25.30 .37

25 / 44

Page 26: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Setting up a table and plot of CIs for differences

tableParList = with(SouthData,list(ranks = Rank, names = State,

est = Estimate.Print, se = SE.Print))

plotParList = with(SouthData,list(est = Estimate.2dec, se = SE.2dec,

names = Abbreviation, refName = "LA",confLevel = 0.9, plotType = "difference"))

RankPlotWithTable(tableParList, plotParList)

26 / 44

Page 27: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Resulting plot of CIs for differences

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEk

●LA LA

●OK●AR

●KY●NC●SC●MS●AL

●TN

● TX● DE● WV● FL

● GA● VA

● DC● MD

−6 −4 −2 0 2 4 6 8 10θk − θk*

Compared to LA

Significantly DifferentNot Significantly Different

27 / 44

Page 28: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Cleaning up the plot

tableParList$col2 = 0.65

plotParList$legendX = "bottomright"plotParList$thetaLine = 1.5

RankPlotWithTable(tableParList, plotParList)

28 / 44

Page 29: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Modified plot of CIs for differences

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEk

●LA LA

●OK●AR

●KY●NC●SC●MS●AL

●TN

● TX● DE● WV● FL

● GA● VA

● DC● MD

−6 −4 −2 0 2 4 6 8 10

θk − θk*

Compared to LA

Significantly DifferentNot Significantly Different

29 / 44

Page 30: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Changing the plot type

plotParList$plotType = "comparison"plotParList$rangeFactor = 0.8

RankPlotWithTable(tableParList, plotParList)

30 / 44

Page 31: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Modified plot of comparison intervals

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

r̂k

Oklahoma

Arkansas

Kentucky

North Carolina

South Carolina

Mississippi

Alabama

Tennessee

Louisiana

Texas

Delaware

West Virginia

Florida

Georgia

Virginia

District of Columbia

Maryland

State (k)

21.13

21.31

22.86

23.37

23.61

23.86

23.94

24.23

24.54

24.82

25.30

25.58

25.76

27.11

27.74

30.10

32.21

θ̂k

.15

.23

.15

.12

.16

.24

.14

.14

.15

.07

.37

.31

.11

.17

.13

.32

.15

SEkLA

Distance from the strip

●LA LA

●OK●AR

●KY●NC

●SC●MS●AL

●TN

● TX● DE

● WV● FL

● GA● VA

● DC● MD

20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

4 3 2 1 0 0 1 2 3 4 5 6 7 8 9

θkθ̂k*

Compared to LA

Significantly DifferentNot Significantly Different

31 / 44

Page 32: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Wrangling .CSV data into the right formatFrom American Fact Finder, I downloaded a .CSV file of ACS 20161-year Ranking Table R2512, Percent of Occupied Housing Unitsthat are Owner-Occupied.

Housing=read.csv("ACS_16_1YR_R2512.US01PRF_with_ann.csv",skip = 1, stringsAsFactors = FALSE)[, c(5,7:9)]

names(Housing)=c("FIPS", "State", "Est.1dec", "MOE.1dec")

Compute approximate SEs from the MOEs. Merge in state regionsand abbreviations, using FIPS codes from another dataset. Subsetto Northeastern states.

Housing$SE.1dec = round(Housing$MOE.1dec / qnorm(.95), 1)FipsData=TravelTime2011[c("Abbreviation","Region","FIPS")]Housing = merge(Housing, FipsData, by = "FIPS")NeData = subset(Housing, Region == "NORTHEAST")

32 / 44

Page 33: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Wrangling .CSV data into the right formatSort the rows and create new columns as needed.

NeData = NeData[order(NeData$Est.1dec), ]rownames(NeData) = NULLNeData = within(NeData, {

Est.Print = formatC(Est.1dec, 1, format = "f")SE.Print = substring(formatC(SE.1dec, 1, format = "f"),

first = 2)Rank = 1:nrow(NeData)

})NeData[1:3, c(2,6,10,9)]

## State Abbreviation Est.Print SE.Print## 1 New York NY 53.3 .1## 2 Rhode Island RI 58.0 .7## 3 Massachusetts MA 62.0 .2

33 / 44

Page 34: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Shaded columns plot

tableParList = with(NeData,list(ranks = Rank, names = State,

est = Est.Print, se = SE.Print))

plotParList = with(NeData,list(est = Est.1dec, se = SE.1dec,

names = Abbreviation, refName = "PA",confLevel = 0.9, plotType = "columns"))

RankPlotWithTable(tableParList, plotParList)

34 / 44

Page 35: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Shaded columns plot

1

2

3

4

5

6

7

8

9

r̂k

New York

Rhode Island

Massachusetts

New Jersey

Connecticut

Pennsylvania

Vermont

New Hampshire

Maine

State (k)

53.3

58.0

62.0

63.2

64.8

68.5

69.8

70.1

71.9

θ̂k

.1

.7

.2

.2

.4

.2

.8

.5

.5

SEk

NY RI

MA NJ

CT

PA VT

NH

ME

35 / 44

Page 36: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Wrangling API data into the right formatUse Census API to get SAIPE 2016 poverty rates and MOEs foreach county in Maine: percent of related children age 5-17 infamilies in poverty.

library(RJSONIO)SaipeList = fromJSON("https://api.census.gov/data/timeseries/poverty/saipe?get=NAME,SAEPOVRT5_17R_PT,SAEPOVRT5_17R_MOE&for=county:*&in=state:23&time=2016")SaipeData = as.data.frame(do.call("rbind", SaipeList[-1]),

stringsAsFactors = FALSE)[, 1:3]for(ii in 2:3) class(SaipeData[[ii]]) = "numeric"names(SaipeData) = c("County", "Est.1dec", "SE.1dec")SaipeData[1:2, ]

## County Est.1dec SE.1dec## 1 Androscoggin County 18.3 4.3## 2 Aroostook County 19.0 4.8

36 / 44

Page 37: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Wrangling API data into the right formatSort the rows and create new columns as needed.

SaipeData = SaipeData[order(SaipeData$Est.1dec), ]SaipeData = within(SaipeData, {

Abbr = substr(County, 0, 3)Est.Print = formatC(Est.1dec, 1, format = "f")Rank = 1:nrow(SaipeData)

})rownames(SaipeData) = NULLSaipeData[1:2, -5]

## County Est.1dec SE.1dec Rank Abbr## 1 York County 10.4 2.7 1 Yor## 2 Cumberland County 10.5 2.2 2 Cum

37 / 44

Page 38: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Goldstein-Healy plot

tableParList = with(SaipeData,list(ranks = Rank, names = County,

est = Est.Print, se = SE.1dec,placeType = "County", col2 = .68))

plotParList = with(SaipeData,list(est = Est.1dec, se = SE.1dec,

names = Abbr, refName = "Ken",confLevel = 0.9, plotType = "individual",xlim = c(0, 50), legendX = "bottomright",GH = TRUE))

RankPlotWithTable(tableParList, plotParList)

38 / 44

Page 39: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Goldstein-Healy plot

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

r̂k

York County

Cumberland County

Sagadahoc County

Hancock County

Knox County

Penobscot County

Kennebec County

Lincoln County

Waldo County

Androscoggin County

Oxford County

Aroostook County

Franklin County

Piscataquis County

Washington County

Somerset County

County (k)

10.4

10.5

13.7

14.1

15.0

16.5

16.7

17.4

18.0

18.3

18.8

19.0

19.3

25.4

25.6

28.5

θ̂k

2.7

2.2

3.8

3.7

4.2

3.5

3.7

4.5

5

4.3

5.1

4.8

5.4

6.8

6.6

4.9

SEk

●Ken Ken

●Yor●Cum

●Sag●Han

●Kno●Pen●

● Lin● Wal● And● Oxf● Aro● Fra

● Pis● Was

● Som

0 4 8 12 16 20 24 28 32 36 40 44 48θk

Goldstein−Healyadjusted intervals

39 / 44

Page 40: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Two-tiered CIs (no Bonferroni)

plotParList$tiers = 2plotParList$textPad = 2

RankPlotWithTable(tableParList, plotParList)

40 / 44

Page 41: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Two-tiered CIs (no Bonferroni)

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

r̂k

York County

Cumberland County

Sagadahoc County

Hancock County

Knox County

Penobscot County

Kennebec County

Lincoln County

Waldo County

Androscoggin County

Oxford County

Aroostook County

Franklin County

Piscataquis County

Washington County

Somerset County

County (k)

10.4

10.5

13.7

14.1

15.0

16.5

16.7

17.4

18.0

18.3

18.8

19.0

19.3

25.4

25.6

28.5

θ̂k

2.7

2.2

3.8

3.7

4.2

3.5

3.7

4.5

5

4.3

5.1

4.8

5.4

6.8

6.6

4.9

SEk

●Ken Ken

●Yor●Cum

●Sag●Han

●Kno●Pen●

● Lin● Wal● And● Oxf● Aro● Fra

● Pis● Was

● Som

0 4 8 12 16 20 24 28 32 36 40 44 48θk

Inner tier: Goldstein−Healyadjusted intervals Outer tier: individual 90% CIs

41 / 44

Page 42: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Two-tiered CIs (with demi-Bonferroni)

plotParList$Bonferroni = "demi"plotParList$textPad = 3.5plotParList$xlim = c(-2, 50)

RankPlotWithTable(tableParList, plotParList)

42 / 44

Page 43: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Two-tiered CIs (with demi-Bonferroni)

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

r̂k

York County

Cumberland County

Sagadahoc County

Hancock County

Knox County

Penobscot County

Kennebec County

Lincoln County

Waldo County

Androscoggin County

Oxford County

Aroostook County

Franklin County

Piscataquis County

Washington County

Somerset County

County (k)

10.4

10.5

13.7

14.1

15.0

16.5

16.7

17.4

18.0

18.3

18.8

19.0

19.3

25.4

25.6

28.5

θ̂k

2.7

2.2

3.8

3.7

4.2

3.5

3.7

4.5

5

4.3

5.1

4.8

5.4

6.8

6.6

4.9

SEk

●Ken Ken

●Yor●Cum

●Sag●Han●Kno

●Pen●

● Lin● Wal● And● Oxf● Aro● Fra

● Pis● Was

● Som

−2 2 6 10 14 18 22 26 30 34 38 42 46 50θk

Inner tier: individual 90% CIs Outer tier: Goldstein−Healyand demi−Bonferroniadjusted intervals

43 / 44

Page 44: UsingtheRPackage RankingProject toMake ... · SE LA l LA OK l AR l KY l NC l SC l MS l AL l TN l l TX l DE l WV l FL l GA l VA l DC l MD 20 22 24 26 28 30 32 34 q k Inner tier: individual

Thank you!RankingProject is on CRAN:https://cran.r-project.org/package=RankingProjectOne of RStudio’s “Top 40” new packages in March 2017, withnearly 1800 downloads since.

install.packages("RankingProject")vignette("intro", package = "RankingProject")vignette("primer", package = "RankingProject")

Development version on GitHub:https://github.com/civilstat/RankingProject

See also Research Report and TAS article (in press):https://www.census.gov/srd/papers/pdf/rrs2014-12.pdfhttps://doi.org/10.1080/00031305.2017.1392359

Please contact me with any questions:[email protected]://civilstat.com

44 / 44