2014 planning database (pdb)

What is the Planning Database (PDB), and How Can I Use It? April 7, 2016. Nancy Bates, Kathleen Kephart, Suzanne McArdle, Center for Survey Measurement.


Page 1: 2014 Planning Database (PDB)

What is the Planning Database (PDB), and How Can I Use It?

April 7, 2016

Nancy Bates, Kathleen Kephart, Suzanne McArdle

Center for Survey Measurement

Page 2: 2014 Planning Database (PDB)

Acknowledgements

Thank you to:
• Travis Pape and Julia Coombs for creating the code to generate the PDB
• Luke Larsen and Alina Kline for their work on the upcoming 2016 PDB
• Nancy Bates and Barb O’Hare for their time and effort to bring the PDB back
• Suzanne McArdle for her work on PDB data visualizations

Page 3: 2014 Planning Database (PDB)

Overview

• A “greatest hits” of ACS 5-year estimates and 2010 Census variables
• Pulls together publicly available estimates in one convenient file
• Available at two levels of geography: Tract and Block Group
• Publicly available in CSV and now API format

Page 4: 2014 Planning Database (PDB)

Background

• First PDB developed for 2000 Census planning
  • Selected 1990 Census tract data in easy-to-use format
  • Hard-to-Count Score
• ACS annual 5-year estimates for block groups resulted in revised PDB in 2012
• 2015 PDB
  • Latest 5-year ACS estimates
  • Health Insurance Coverage Estimates
  • An API version of the data for developers

Page 5: 2014 Planning Database (PDB)

Contents of the 2015 PDB

• Both 2009-2013 5-year ACS estimates and 2010 Census data
• Types of variables
  • Population: gender, age, education, poverty
  • Household: language, relationship, income
  • Housing unit: tenure, number of units
  • Census operational: mailout/mailback, bilingual

Page 6: 2014 Planning Database (PDB)

A Broad Scope of Uses

Useful for:
• Identifying areas with likely low survey response rates
• Stratifying small areas
• Creating thematic maps
• Enhancing reports with population metrics
• Creating applications

Page 7: 2014 Planning Database (PDB)

Access

Available on the Census Bureau’s Research @ Census page

• Link to the PDB CSV format: http://www.census.gov/research/data/planning_database/
• API format: www.census.gov/developers
• Documentation describing the files in PDF format

Page 8: 2014 Planning Database (PDB)

Navigation to the PDB CSV Format

From the Census Bureau internet site (http://www.census.gov):

1. Select “Our Research” from under the “About the Bureau” menu at the top of the page
2. Select the “Data” tab
3. Select the “Research Data Products” link
4. Select “Planning Database” under the “Demographic – People and Households” heading
5. Select the appropriate year under “Data and Documentation”

Page 9: 2014 Planning Database (PDB)

Navigation to the PDB API Format

From the Census Bureau internet site (http://www.census.gov):

1. Select “Data”
2. Select “Developers”
3. Select “Available APIs” from the sidebar
4. Scroll down and select “The 2015 Planning Database”

Page 10: 2014 Planning Database (PDB)

Managing the PDB

Page 11: 2014 Planning Database (PDB)

It’s a BIG dataset

• Block Group Level: 220,354 block groups × 344 variables = ~75.8 million cells
• Tract Level: 74,021 tracts × 566 variables = ~41.9 million cells
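As a quick check of the sizes quoted on this slide, the cell counts can be reproduced with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative and are not part of the PDB.

```python
# Dimensions quoted on the slide: rows (areas) and columns (variables).
block_groups, bg_vars = 220_354, 344
tracts, tract_vars = 74_021, 566

# Total cells = rows x columns for each file.
bg_cells = block_groups * bg_vars
tract_cells = tracts * tract_vars

print(f"Block group file: {bg_cells:,} cells (~{bg_cells / 1e6:.1f} million)")
print(f"Tract file:       {tract_cells:,} cells (~{tract_cells / 1e6:.1f} million)")
```

At this size, loading only the columns needed for a given analysis is usually more comfortable than opening the whole file in a spreadsheet.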

Page 12: 2014 Planning Database (PDB)

The Structure

Geography Identifiers
• GIDBG (12 chars) = State (2 chars) + County (3 chars) + Tract (6 chars) + Block Group (1 char)
• GIDTR (11 chars) = State (2 chars) + County (3 chars) + Tract (6 chars)

Demographic, Socioeconomic, and Housing data
• Order of variables is consistent: Census data first, followed by ACS estimates and ACS MOEs
• For example: Males_CEN_2010, Males_ACS_09_13, Males_ACSMOE_09_13

Census Operational data, including Mail Return Rate and Low Response Score

Percentages and MOE Percentages, listed in the same order as their respective estimates
• Variables identified with ‘pct_’ added to their variable name
• For example: pct_Males_CEN_2010, pct_Males_ACS_09_13, pct_Males_ACSMOE_09_13
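The identifier layout and naming pattern described above can be sketched in Python. This is a minimal illustration, not Census Bureau code: the helper names `split_gidbg` and `variable_family` are hypothetical, the sample ID is made up to match the 2 + 3 + 6 + 1 layout, and the zero-padding step assumes the IDs may have been read as numbers and lost a leading zero.

```python
def split_gidbg(gidbg) -> dict:
    """Split a 12-character block-group ID into the parts listed on the slide."""
    # Assumption: IDs read as numbers may have lost a leading zero, so pad to 12.
    gid = str(gidbg).zfill(12)
    if len(gid) != 12:
        raise ValueError(f"expected 12 characters, got {len(gid)}: {gid!r}")
    return {
        "state": gid[:2],         # 2 chars
        "county": gid[2:5],       # 3 chars
        "tract": gid[5:11],       # 6 chars
        "block_group": gid[11],   # 1 char
        "GIDTR": gid[:11],        # tract ID = state + county + tract
    }


def variable_family(stem: str, acs_span: str = "09_13") -> list:
    """Build the Census, ACS, and ACS MOE names for a stem, then the pct_ versions."""
    base = [f"{stem}_CEN_2010", f"{stem}_ACS_{acs_span}", f"{stem}_ACSMOE_{acs_span}"]
    return base + [f"pct_{name}" for name in base]


# Made-up ID for illustration only.
print(split_gidbg("110010001001"))
print(variable_family("Males"))
```

Keeping the identifiers as strings rather than integers preserves leading zeros and makes it easy to join block groups to their tracts through the first 11 characters.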

Page 13: 2014 Planning Database (PDB)

Low Response Score (Erdman and Bates slides)


Page 14: 2014 Planning Database (PDB)

Low Response Score for Use in Survey and Census Planning and Analysis

Chandra Erdman and Nancy Bates U.S. Census Bureau

Disclaimer: The views expressed on statistical issues are those of the authors only.

Page 15: 2014 Planning Database (PDB)

Overview

1. The original Hard-to-Count (HTC) Score
2. The Census Kaggle Challenge
3. The Low Response Score (LRS)

Erdman & Bates (2014) Low Response Score (LRS)

Page 16: 2014 Planning Database (PDB)

The Original HTC Score

Bruce et al. (2001); Bruce and Robinson (2003)

1. Renter occupied units
2. Unmarried
3. Vacant units
4. Multi-unit structures
5. Below poverty
6. Not high school graduate
7. Different housing unit 1 year ago
8. Public assistance
9. Unemployed
10. Crowded units
11. Linguistically isolated households
12. No phone service

Page 17: 2014 Planning Database (PDB)

The Census Kaggle Challenge - 2012

“All you need is data and a question. Our data scientists will provide the answer.” – Kaggle.com

• Data: 2012 Block-Group-Level Planning Database (PDB)
• Question: Which statistical model best predicts 2010 Census mail return rates?
• Product: Updated model-based “Hard-to-Count” Score

Page 18: 2014 Planning Database (PDB)

The Census Kaggle Challenge (Cont.)

• 2009 America COMPETES Act
• Contest ran August 31 - November 1, 2012
• 244 teams and individual competitors
• Software developer from MD won top prize

Page 19: 2014 Planning Database (PDB)

Winning Model Predictors

When ranked by relative influence, 24/25 top predictors from PDB

[Figure: relative influence of the winning model's predictors, by rank. Top-ranked predictors: (1) Renter; (2) Ages 18−24; (3) Female head of household, no husband]

Page 20: 2014 Planning Database (PDB)

Low Response Model (Block-Group)

Variable                    Coef    Sig    Variable                   Coef    Sig
(Intercept)                 10.29   ***    Renter occupied units       1.08   ***
Ages 18-24                   0.64   ***    Female head, no husband     0.58   ***
Non-Hispanic White          -0.77   ***    Ages 65+                   -1.21   ***
Related child <6             0.46   ***    Males                       0.09   ***
Married family households   -0.12   ***    Ages 25-44                 -0.06
Vacant units                 1.08   ***    College graduates          -0.32   ***
Median household income      0.24   ***    Ages 45-64                 -0.08   *
Persons per household        3.44   ***    Moved in 2005-2009          0.09   ***
Hispanic                     0.41   ***    Single unit structures     -0.52   ***
Population Density          -0.40   ***    Below poverty               0.11   ***
Different HU 1 year ago     -0.12   ***    Ages 5-17                   0.17   ***
Black                       -0.04   **     Single person households   -0.24   ***
Not high school grad        -0.06   ***    Median house value          0.71   ***

Sig: *** p < .001; ** .001 ≤ p < .01; * .01 ≤ p < .05

R-squared: 56.10%, n = 217,417

Page 21: 2014 Planning Database (PDB)

Distribution of the LRS

[Figure: histogram of the Low Response Score. Axes: Low Response Score (0 to 50) and Number of Block Groups (0 to 25,000)]

Rule of thumb…areas with LRS = >29 are hardest to count?
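The rule of thumb can be applied as a simple filter. The sketch below is illustrative only: `hardest_to_count` is a hypothetical helper, the threshold is a parameter because the slide does not say whether a score of exactly 29 counts, and the sample uses the three DC scores quoted later in this deck plus one made-up low score.

```python
def hardest_to_count(scores: dict, threshold: float = 29.0) -> list:
    """Return area names whose LRS is above the threshold, highest score first."""
    # Strictly greater than the threshold; change to >= if 29 itself should count.
    flagged = [(area, lrs) for area, lrs in scores.items() if lrs > threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [area for area, _ in flagged]


# LRS values for the three DC examples in this deck, plus one made-up area.
sample = {
    "Columbia Heights": 32,
    "Anacostia": 38,
    "Trinidad": 37,
    "Hypothetical area": 18,  # invented for contrast
}
print(hardest_to_count(sample))
```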

Page 22: 2014 Planning Database (PDB)
Page 23: 2014 Planning Database (PDB)


Page 24: 2014 Planning Database (PDB)

LRS/PDB Example: Three HTC Blocks in DC

• Columbia Heights: 43% Hispanic; 36% Other Language; 92% 10+ multi-units; 64% non-family hhds; 85% renters; 60% moved 5 years; LRS=32
• Anacostia: 98% Black; 46% below poverty; 89% single unit homes; 15% non-family hhds; 21% moved 5 years; 93% renters; LRS=38
• Trinidad: 37% Ages 18-24; 59% Moved 5 years; 33% Below poverty; 28% Vacant; 55% Black; 31% white; 87% renters; LRS=37

Page 25: 2014 Planning Database (PDB)

Considerations

• Dependent variable is mail response; the 2020 Census will have an Internet response option
  • “Single Unattached Mobiles” (Bates and Mulry, 2011)
  • 64.7 percent of American Community Survey self-response by Internet (Baumgardner, 2013)
• In January 2013, the ACS began asking about Internet connectivity

Page 26: 2014 Planning Database (PDB)

Summary

• New “hard to count” metric for tracts and block groups
• Winning model was complex, but predictors in rank order of influence proved useful
• Accurate predictions with relatively few predictors
• Useful for planning and targeted advertising
• LRS updated yearly to reflect changes
• Develop mapping app populated with PDB and LRS?

Page 27: 2014 Planning Database (PDB)

Examples Using the PDB


Page 28: 2014 Planning Database (PDB)

Area Demographics

619,371 people live in 179 tracts in DC

Measure                                       DC*      United States*
Male to female ratio                          0.90     0.97
Population under 5 years old                  5.9%     6.4%
Population that identifies as Hispanic        9.6%     16.6%
Population that moved within the past year    19.4%    15.1%
Population that was not born in the US        13.8%    12.9%

*ACS 5 year 2009-2013

Page 29: 2014 Planning Database (PDB)

Using Excel to Analyze Demographics


I used the Excel function SUM() on all DC tracts to find the total Census population
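The same total can be computed outside Excel. The sketch below uses only the Python standard library and runs on a tiny made-up sample rather than the real file. It assumes the tract file has a GIDTR column (described earlier in this deck) and a 2010 Census population column; the name `Tot_Population_CEN_2010` is an assumption to check against the PDB documentation, and `total_population` is a hypothetical helper. DC tracts are selected by state FIPS code 11.

```python
import csv
import io


def total_population(csv_file, state_fips: str,
                     pop_column: str = "Tot_Population_CEN_2010") -> int:
    """Sum a population column over all tracts whose GIDTR starts with state_fips."""
    total = 0
    for row in csv.DictReader(csv_file):
        gidtr = row["GIDTR"].zfill(11)  # restore a leading zero if it was dropped
        if gidtr.startswith(state_fips) and row[pop_column]:
            total += int(row[pop_column])
    return total


# Made-up rows for illustration; the real tract file has one row per tract.
sample = io.StringIO(
    "GIDTR,Tot_Population_CEN_2010\n"
    "11001000100,4890\n"
    "11001000201,3916\n"
    "24031700101,5120\n"
)
print(total_population(sample, state_fips="11"))  # sums only the two DC rows
```

To run it on the real data, pass an open file handle for the downloaded tract CSV in place of `sample`.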

Page 30: 2014 Planning Database (PDB)

2016 Census Test: Harris County, Texas Demographics

484,358 people live in 292 block groups in the test site

Measure                                                       Houston*   United States*
Households where no one over 14 speaks English “very well”    14.8%      4.6%
Population 18-24 years old                                    9.4%       10.0%
Renter Occupied Units                                         60.9%      35.1%
Population 25 and over, with less than a HS diploma           19.1%      13.9%

*ACS 5 year 2009-2013 Estimate

Page 31: 2014 Planning Database (PDB)


Page 32: 2014 Planning Database (PDB)

Linguistic Isolation

• What if you want to identify areas that may need support for a language other than English?
• Find block groups in the area that have a high percentage of housing units where no one over the age of 14 speaks English “very well”
• What language is spoken in these tracts?
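The ranking step described above can be sketched as a sort on the relevant percentage column. This is a hedged illustration: `most_isolated` is a hypothetical helper, the column name `pct_ENG_VW_ACS_09_13` is an assumption that should be confirmed in the PDB documentation, and the rows are made up.

```python
def most_isolated(rows, pct_column: str = "pct_ENG_VW_ACS_09_13", n: int = 5) -> list:
    """Return the n rows with the highest value in pct_column, skipping blanks."""
    usable = [row for row in rows if row.get(pct_column) not in (None, "")]
    # Values may arrive as strings from a CSV reader, so convert before sorting.
    return sorted(usable, key=lambda row: float(row[pct_column]), reverse=True)[:n]


# Made-up block groups for illustration only.
rows = [
    {"GIDBG": "BG-A", "pct_ENG_VW_ACS_09_13": "12.0"},
    {"GIDBG": "BG-B", "pct_ENG_VW_ACS_09_13": "81.4"},
    {"GIDBG": "BG-C", "pct_ENG_VW_ACS_09_13": ""},      # missing estimate
    {"GIDBG": "BG-D", "pct_ENG_VW_ACS_09_13": "45.5"},
]
for row in most_isolated(rows, n=2):
    print(row["GIDBG"], row["pct_ENG_VW_ACS_09_13"])
```

Because block-group ACS estimates can carry large margins of error, it is worth reading the matching MOE column alongside any ranking produced this way.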


Page 33: 2014 Planning Database (PDB)

Linguistically Isolated BGs in 2016 Census Harris TX Test Site

Rank   BG        No one speaks English “very well”   Spanish         Asian/Pacific Islander   Other
1      4327012   81.4% (14.3)                        81.4% (14.0)    0% (2.1)                 0% (2.1)
2      4330012   77.2% (13.4)                        73.4% (13.5)    3.8% (4.1)               0% (2.3)
3      4327011   72.5% (11.1)                        72.5% (10.9)    0% (1.6)                 0% (1.6)
4      4335012   69.3% (10.9)                        66.1% (10.7)    0% (1.7)                 3.2% (4.8)
5      5214001   69.3% (21.1)                        69.3% (20.6)    0% (3.7)                 0% (3.7)

Page 34: 2014 Planning Database (PDB)

JSM Govt Section Data Challenge

Tailoring Outreach to Boost Mail Self-Response in Geographic Areas with Similar Low Response Scores — Darryl Creel

Exploring the Census Bureau's 2014 Planning Database Using Topological Data Analysis — Robert Baskin

Informing Natural Disaster Response with Census Data — Jonathan Auerbach ; Christopher Eshleman, New York City Council

Optimizing Survey Cost-Error Tradeoffs: A Multiple Imputation Strategy Using the Census Planning Database — Shin-Jung Lee, University of Michigan


Page 35: 2014 Planning Database (PDB)

Important Note

• Why are there duplicate tracts and BGs in the PDB?
• Short answer: Changes in geography since 2010

Page 36: 2014 Planning Database (PDB)

Questions?

[email protected]
