Analysing Households with the SARs

Jo Wathan

SARs support team

University of Manchester

In this session

• What are the SARs?
• What would you use them for?
• How do you work with them?
  – For household level analysis
  – For hierarchical analysis
• Hands-on session

Census Microdata background

• Census outputs have historically been aggregate tables
  – Safe but inflexible
  – Well suited to analyses at small geographical detail
• Microdata permits more flexibility
  – Longitudinal Study links data from 1971; good for process but has to be secure: http://www.celsius.lshtm.ac.uk/
  – Demand for a cross-sectional dataset that can be used on own desktop
• Samples of anonymised records first available from 1991 Census
  – 2% individual file (SAR areas)
  – 1% household file (Region)

General Features of the SARs

• Microdata
  – Can produce your own tables, recode and group data
  – Can use models
  – Full individual information for all census topics
  – Need to be analysed using a statistics package
• Very large samples
  – Good for looking at small subpopulations
• Can be used alongside other census data
• 2 time points

The SARs family 2001

| File | Sample type | Geography | Availability |
|---|---|---|---|
| Individual licensed | 3% sample of individuals | UK: GOR (+ Wales, Scot, NI, Inner/Outer London) | EUL, CCSR |
| Small area microdata | 5% sample of individuals | UK: LA (or constituency in NI) | EUL, CCSR |
| Household licensed | 1% hierarchical file | None: England & Wales only | Special licence, UKDA |
| Individual CAMS | Same sample as Individual licensed SAR | LA (GB) or Constituency (NI); IMD info for SOA | In house at ONS |
| Household CAMS | 1% hierarchical file | All of UK | In house at ONS |

Individual licensed file

Geography: GOR (also Inner/Outer London, Scotland, Wales and NI)
Age: Grouped: 8 bands for ages 16-74
Ethnicity: All 16 categories in v2 (England & Wales), 16 cats of COB
Employment: SOC minor, 40 cats NSSEC, 17 cats of Industry
Notes: Slight variation in the sampling fraction for each country: 3.125% in England and Wales; 3.246% in Scotland; 3.139% in Northern Ireland
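As a rough illustration of what these sampling fractions imply, the reciprocal of each fraction gives an approximate grossing factor, i.e. the number of people in the population that one sample record stands for. This is a back-of-envelope sketch that assumes the quoted figures are percentages and ignores any other adjustment; it is not taken from the SARs documentation.

```latex
w = \frac{100}{f\,(\%)}: \qquad
w_{\text{E\&W}} = \frac{100}{3.125} = 32, \qquad
w_{\text{Scotland}} = \frac{100}{3.246} \approx 30.8, \qquad
w_{\text{NI}} = \frac{100}{3.139} \approx 31.9
```

The differences are small, so they matter mainly when sample counts from different countries are pooled and grossed up together.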

Small area microdata file

Geography: LA (Parliamentary Constituency in NI); 3 LAs merged due to size
Age: All ages banded: 11 bands
Ethnicity: 13 cats (England & Wales), 5 cats of COB
Employment: NSSEC 8 cats
Notes: Most recent file, published 2006.

The Licence

• All users need to be licensed
• Academics complete licence as part of the Census Registration System process
• Non-academic users sign licence as part of the data registration process
• Cannot pass the data to an unlicensed user
• Cannot attempt to identify an individual

Access Arrangements

• Data distributed by CCSR
• Academics, no charge
  – Register for the data under Census Registration System
  – Access the data online from CCSR website
• Non-academics
  – Not for profit £500 per file
  – Business users £1000 per file
  – 10 users per application, incl. software
  – Download End User Licence from web

Special licence – Household SAR

Geography: None – England & Wales only
Age: 2 year bands (e.g. 0-1, 2-3...)
Ethnicity: 16 cats
Employment: SOC minor, 96 cats (ISCO), 17 cats (SIC92), 40 cats NSSEC
Notes: Download access provided through UKDA & UKDA charges apply (free for not for profit). Requires a full paper application. Data supported by SARs team at CCSR. Users must agree to a much higher level of data stewardship than for EUL files.

What are the CAMS?

• Contain data which was seen as too disclosive to release outside ONS
• Use limited to research questions which cannot be satisfied with another data source
• ONS vet applications
• Data accessed at a Virtual Microdata Laboratory at ONS – data cannot be removed
• Results vetted by ONS prior to release
• Users must get OK from ONS before publishing/presenting results
• Further information and appropriate forms at http://www.statistics.gov.uk/census2001/sar_cams.asp
• Contact [email protected] for more details

Content of CAMS files

• Files contain much more detail; e.g.
  – Individual year of age (topcoded at 95)
  – Full coding on country of birth
  – SOC Unit Group
  – Local authority geography
  – Index of Deprivation for SOAs
  – Index of Deprivation for migrants' last address
  – ‘Full’ household matrix

CAMS Good practice

• Use the licensed SARs...
  – to exhaust the potential of other datasets
  – to write your syntax files
• Check the disclosure guidelines before writing your application
• Avoid complex tables
  – small cell counts aren’t reliable
  – unique cells will usually be suppressed
• Do use models

Using SARs to understand households

| File | Household level analysis | Can create new household variables? | Look at intra-household characteristics |
|---|---|---|---|
| Individual licensed | Yes – select HRP | V. limited | No |
| Small area microdata | Yes – select HRP | V. limited | No |
| Household licensed | Yes – select any representative or change to hhd file | Yes | Yes |
| Individual CAMS | Yes – select HRP | V. limited | No |
| Household CAMS | Yes – select any representative or change to hhd file | Yes | Yes |

Using the SARs

• 1991-2001 Changes
  – Principles
  – Defining a population base
  – Ethnicity
• Coverage
  – ONC & Imputation
  – Difference between 1991/2001
• Good practice issues
  – Documentation
  – Data stewardship
  – Dealing with sample data
  – Reporting

Comparisons between 1991 and 2001

• Population base changed
  – Imputation (no imputed values in 1991 SARs)
  – Students – enumerated at term-time address
  – Residents only (choice in 1991)
• Variable continuity
  – Variable names have been changed where the variable is not exactly the same
  – Some variables (e.g. age, LLI) are easy to compare by grouping 1991 values
  – Some variables are harder to compare as the question has changed (e.g. qualifications)

Ethnicity 91/01

• Different questions asked in 1991 and 2001
• No agreed and perfect correspondence
• Simpson and Akinwale use LS to show how 1991 maps on to 2001: www.statistics.gov.uk/events/ls_census2001/agenda.asp

Define your population base

• You need to define the population base
  – In 1991 we had an issue with visitors being double counted (filter using residsta)
  – In 2001 students who are living away from home are double counted (filter using stulawy in the Individual licensed file or popbase in other files)
  – 2001 Household file contains ‘dummy form’ households with no usual residents, e.g. holiday homes (filter using popbase)
  – Note popbase categories vary across files
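A minimal SPSS sketch of the filtering step described above. The variable name popbase comes from the slide, but the category codes and labels used here are invented for illustration; the real codes differ between files and should be checked in the codebook before anything is dropped.

```spss
* Toy person-level data with invented popbase codes.
DATA LIST FREE / pnum popbase.
BEGIN DATA
1 1
2 1
3 2
4 1
END DATA.
VALUE LABELS popbase 1 'Usual resident' 2 'Student living away from home'.

* Check the categories before filtering because they vary across files.
FREQUENCIES VARIABLES=popbase.

* Keep only the assumed usual resident category.
SELECT IF (popbase = 1).
EXECUTE.
```

SELECT IF drops the excluded cases permanently from the active file, so it is safest to run this on a working copy rather than the original download.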

Census coverage

• Major effort to improve coverage in 2001
• One Number Census
• Use of large Census Coverage Survey to correct census results, 300K households
  – Design independent of census
  – Used matched census and CCS data to estimate total population in each area
  – Adjusted all results for census non-response using imputation of households and individuals
  – Results in final database for UK adjusted for non-response

Census coverage

• Coverage before imputation:
  – 94% households returned forms, with another 4% estimated to be in households identified by enumerators.
• Response rate lowest for
  – Young people in their early 20s (men aged 20-24 resp. rate of 87%)
  – Inner London (resp. rate of 78%)
• Once imputed cases are included estimated to be 100% coverage

Non-response

• 1991 SARs selected from 10% sample
  – Did not include imputed households
  – 96% coverage
• 2001 SARs selected from 100% ONC database
  – Imputed individuals/hholds are identified using oncperim variable
  – Imputed items are flagged using z variables (zvar=1 if imputed) – available in the larger *impflag* version of the data

Percentage ONC imputed, 2001 SARs

| Ethnic group | Not ONC imputed | ONC imputed |
|---|---|---|
| White | 94.8 | 5.2 |
| Mixed | 91.5 | 8.5 |
| Asian | 84.6 | 15.4 |
| Black | 76.5 | 13.5 |
| Chinese/Other | 85.6 | 14.4 |
| All | 93.8 | 6.2 |

Percentage with ethnicity variable imputed, 2001 SARs

| Ethnic group | Not imputed (zeth*=0) | Imputed (zeth*=1) |
|---|---|---|
| White | 97.5 | 2.5 |
| Mixed | 88.3 | 11.7 |
| Asian | 94.8 | 5.2 |
| Black | 92.6 | 7.4 |
| Chinese/Other | 89.0 | 11.0 |
| All | 97.1 | 2.9 |

PRAMMing

• PRAMMing is perturbation designed to deal with very unusual cases, e.g. widowed 16-year olds
• Avoids additional broad-banding
• Perturbation is constrained to
  – preserve univariate distributions
  – preserve multivariate distributions on control variables
  – prevent strange results (like 5 year old widows)
• Affects 15 variables
  – Primary economic activity – 1% cases

General advice

• PRAMMed cases are flagged as imputed in z var
• Imputation is better than not imputing unless you have evidence to the contrary
  – Known exception is ethnicity (Simpson and Akinwale)
• If unsure about impact of PRAMMing and imputation
  – Do a sensitivity test
  – Use the z var to exclude cases with imputed variables and then repeat your analysis
  – Use ONCPERIM to exclude imputed individuals and repeat your analysis
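The sensitivity test above could be sketched in SPSS as follows. The data are made up, ethgrp and zeth are hypothetical variable names standing in for an analysis variable and its z flag, and the coding oncperim = 1 for an imputed person is an assumption to be checked against the documentation.

```spss
* Toy data with hypothetical names ethgrp and zeth and assumed oncperim coding.
DATA LIST FREE / pnum ethgrp zeth oncperim.
BEGIN DATA
1 1 0 0
2 1 0 0
3 2 1 0
4 2 0 1
5 1 0 0
6 2 0 0
END DATA.

* Baseline analysis on all cases.
FREQUENCIES VARIABLES=ethgrp.

* Repeat without cases whose ethnicity item was imputed or PRAMMed.
TEMPORARY.
SELECT IF (zeth = 0).
FREQUENCIES VARIABLES=ethgrp.

* Repeat without ONC imputed individuals.
TEMPORARY.
SELECT IF (oncperim = 0).
FREQUENCIES VARIABLES=ethgrp.
```

TEMPORARY limits each SELECT IF to the next procedure only, so the full file is still there for the following run and the three sets of results can be compared side by side.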

Get to know the data

• Use the documentation
• SARs User Guide
  – Use Census schedules to check questions
  – Check univariate frequencies
  – Do exploratory analyses
  – Contact [email protected] if you can’t find the information you need in the online documentation
• Contact [email protected] if you think there is a problem with the data

SARs as a LARGE dataset

• A few million cases can cause trouble!
• Use Nesstar to do initial data exploration
• Extract a subset using Nesstar or take a subset from the downloaded file
• For serious analysis, using a syntax (or .do) file to record syntax makes re-running easier
  – Create a single syntax file which starts with the original data
  – Use file naming conventions that will enable you to trace versions
  – Keep a record of work done

SARs as sample data

Geographically stratified sample
  – approximates to simple random sample
  – no clustering in Individual file
  – Household file – clustering within households
  – Although large sample you may have small sample sizes when using sub-groups
  – use standard errors and confidence intervals
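A worked sketch of the last point, assuming the simple random sample approximation holds. That is reasonable for the Individual file; person-level estimates from the Household file are clustered within households, so this formula may understate their uncertainty. The subgroup size n = 400 and proportion p = 0.10 are invented for illustration.

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  = \sqrt{\frac{0.10 \times 0.90}{400}} = 0.015
\qquad
95\%\ \text{CI}: \ \hat{p} \pm 1.96\,\mathrm{SE}(\hat{p})
  = 0.10 \pm 0.029 \;\Rightarrow\; (0.071,\ 0.129)
```

Even in a file with millions of records, a subgroup of a few hundred cases gives an interval several percentage points wide.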

Reporting

• Census data is Crown copyright
• Data should be cited (reference on web site)
• Let us know when you publish
• Contact ONS before presenting or publishing results based on the CAMS

User support

• www.ccsr.ac.uk/sars
  – Resources and links added as we go
• Seminar invitations welcome!
• Regional workshop invites welcome!
• SARs Helpdesk
  – [email protected]
  – (0161) 275 4735
• Join email and newsletter lists

Questions

…before we talk about using the SARs for hierarchical analysis?

Using hierarchical microdata

• Units of analysis
• Flat files vs. hierarchical files
• Using household hierarchy
  – Different aims
  – Examples
  – How to achieve

Types of unit of analysis

• Individual
• Family: A group of people consisting of a married or cohabiting couple with or without child(ren), or a lone parent with child(ren). It also includes a married or cohabiting couple with their grandchild(ren) or a lone grandparent with his or her grandchild(ren) where there are no children in the intervening generation in the household.
• Household: A household is defined as one person living alone, or a group of people (not necessarily related) living at the same address with common housekeeping - that is, sharing either a living room or sitting room or at least one meal a day.
• Local authority district (SAM/CAMS)
• Others?

Definitions from 2001 Definitions Volume, National Statistics (2004)

Choice of unit matters

• HOUSEHOLD LEVEL: 1 observation per household
  – What proportion of households contain only 1 person? 29.2%
  – What is the mean household size? 2.34
• INDIVIDUAL LEVEL: 1 observation per person
  – What proportion of individuals live alone? 12.5%
  – What is the average household size for individuals in the sample? 3.05

Source: QLFS 2005 Spring Quarter
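The contrast between the two units can be reproduced on a toy file. This is a sketch with made-up data: hnum is the household identifier used elsewhere in these slides, while pnum (person number within household) is a hypothetical name.

```spss
* Toy person-level file with three households of sizes 1 and 2 and 3.
DATA LIST FREE / hnum pnum.
BEGIN DATA
1 1
2 1
2 2
3 1
3 2
3 3
END DATA.

* Attach household size to every person in the household.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=hnum /hsize = N.
COMPUTE alone = (hsize = 1).

* INDIVIDUAL LEVEL with one observation per person.
DESCRIPTIVES VARIABLES=alone hsize /STATISTICS=MEAN.

* HOUSEHOLD LEVEL keeping one representative per household.
TEMPORARY.
SELECT IF (pnum = 1).
DESCRIPTIVES VARIABLES=alone hsize /STATISTICS=MEAN.
```

At the individual level one person in six lives alone and the mean household size is 14/6, about 2.33; at the household level one household in three is a single person and the mean size is 2. Large households contribute more people, which is why the individual-level mean is the higher of the two.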

Non-hierarchical files

• Individual SAR/CAMS and Small Area Microdata, 1991 Individual SAR
• Can be used to analyse household characteristics if and only if those characteristics
  – can be represented by those of HRP, or…
  – are already stored in the data
• Need also to select only HRP to avoid large households being over-represented

Example: The relationship between occupancy and social grade using the SAM

• The SAM contains 2 occupancy derived variables as well as HRP’s social grade
• Limit analyses to the Household Reference Person to avoid over-representation of large households (select if reltohr=1)
• Tabulate the already present variables against each other
• Easier access, UK wide with geography (without CAM) and larger n
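The steps above might look like this in SPSS. The filter reltohr = 1 is taken from the slide; the data are made up, and occrate and socgrade are hypothetical names for the occupancy rating and HRP social grade variables.

```spss
* Toy data using hypothetical names occrate and socgrade.
DATA LIST FREE / hnum reltohr occrate socgrade.
BEGIN DATA
1 1 2 1
1 2 2 1
1 3 2 1
2 1 0 3
3 1 1 2
3 2 1 2
END DATA.

* Restrict to the Household Reference Person so each household counts once.
TEMPORARY.
SELECT IF (reltohr = 1).
CROSSTABS /TABLES=occrate BY socgrade /CELLS=COUNT COLUMN.
```

Without the filter, household 1 would contribute three rows to the table and its occupancy rating would be counted three times.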

Results

2001 Small Area Microdata: Occupancy Rating of Hhd by Social Grade of Hhd Reference Person (column %)

| Occupancy rating | No emp record | A&B | C1 | C2 | D | E | Total |
|---|---|---|---|---|---|---|---|
| 2+ rms > req'd | 46.3 | 63.1 | 50.3 | 45.3 | 37 | 39.3 | 48.2 |
| 1 rm > req'd | 28.2 | 20.2 | 24.8 | 28.1 | 28.8 | 27.7 | 25.7 |
| n(rms) = req'd | 20 | 11.9 | 17.8 | 19.2 | 24.1 | 22.8 | 18.7 |
| n(rms) < req'd | 5.5 | 4.8 | 7.1 | 7.3 | 10.2 | 10.3 | 7.4 |
| Total | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| N= | 140554 | 251341 | 298250 | 177626 | 217245 | 136800 | 1221816 |

Filter: ( Relationship to HRP = Household reference person )

[Chart: Occupational Rating by Soc Grd of HRP. Stacked bars, 0% to 100%, by HRP Social Grade (No emp record, A&B, C1, C2, D, E, Total); series: n(rms) < req'd, n(rms) = req'd, 1 rm > req'd, 2+ rms > req'd]

But more flexible than tables…

• Can limit to owner occupiers in England and Wales…

What sort of household variables are on the individual files?

e.g. EUL Individual file

• Region
• Household Resources
  – Accommodation type, tenure, lowest floor of accommodation, furnished, no. rooms
  – Sole use of bath/shower/toilet, full/part central heating, self contained
  – Cars
• Household membership
  – No. of residents; number who are carers, 65+, employed adults, LT ill, poor health
  – No. families
  – Students living away
• Household indicators
  – Education, employment, health/disability, housing
  – Social grade of HRP
  – Multiple ethnicity in hhd
• Density
  – No. residents per room, occupancy rating

c.f. Hierarchical files

• Household SL file, Household CAM, 1991 Household file
• Contains individuals within households, so considerably more flexible
• Can be used to create new household variables based on information about the household and/or information about all the individuals within the household
• Can be used to describe intra-household relationships

The hierarchy of the household SAR

Household 1: North West, Social rented
  Person 1: HRP, Family 1, Female, 28, No quals, No LTILL
  Person 2: Son of HRP, Family 1, Male, 12, N/A, No LTILL

Household 2: Wales, Owner occupier
  Person 1: HRP, Family 1, Male, 34, Degree, No LTILL
  Person 2: Spouse of HRP, Family 1, Female, 30, Degree, P/T Employee, No LTILL
  Person 3: Parent of HRP, Family 2, Female, 72, No quals, Econ Inactive, LTILL

• Individuals grouped into household groups
• Family units identified within households

What does it look like?

Looking at the data

For the 20 cases in the previous screenshot:

• How many households?
• How many individuals in the largest household?
• What kind of family lives in hnum 41?
• Thinking of the census definition of family unit, did any household have more than one family unit?

What sort of analysis?

• Describing the household better
• Describing an individual in relation to other members of the household
• Describing partnerships

Household composition & position

| No. generations in hhd | Position within generations | 2001 W | 2001 BC | 2001 In | 2001 Pa | 1991 W | 1991 BC | 1991 I | 1991 Pa |
|---|---|---|---|---|---|---|---|---|---|
| 1 gen | snk <36 | 3.1 | 7.6 | 2.8 | 1.7 | 2.4 | 6.2 | 1.3 | 0.6 |
| | snk 36+ | 6.3 | 11.7 | 2.7 | 1.3 | 4.1 | 6.2 | 1.1 | 0.8 |
| | cpnok <36 | 8.8 | 2.9 | 7.0 | 6.8 | 9.5 | 5.0 | 4.3 | 3.5 |
| | cpnok 36+ | 17.2 | 5.7 | 6.2 | 3.4 | 14.3 | 6.5 | 4.4 | 1.5 |
| 2 gen | upper 2g | 52.0 | 58.1 | 60.0 | 63.3 | 55.2 | 59.2 | 58.8 | 65.2 |
| | lower 2g | 8.4 | 8.7 | 11.4 | 12.6 | 11.6 | 12.1 | 13.7 | 14.1 |
| 3 gen | upper 3g | 0.6 | 1.0 | 1.9 | 2.3 | 0.7 | 2.3 | 3.8 | 3.9 |
| | mid 3g | 1.1 | 1.8 | 6.2 | 7.0 | 2.0 | 2.3 | 12.1 | 10.2 |
| | lower 3g | 0.1 | 0.1 | 0.6 | 0.9 | 0.2 | 0.1 | 0.4 | 0.2 |
| unrel | | 2.4 | 2.2 | 1.3 | 0.7 | 0.1 | 0.1 | 0.0 | 0.0 |
| Total (100%) | | 126,086 | 1,734 | 2,745 | 1,638 | 119,319 | 1,452 | 2,118 | 955 |

Household SAR 91/01: Female residents 16-59. Excludes F/T students.

Mixed couples – SL-HSAR

[Bar chart: Proportion of Couples of Mixed Ethnicity - by Male Partner's Ethnic Group. Mixed sex couples, England and Wales. Panel: Elsewhere. Axis: mean of mixedpart, 0 to 1. Bars: White British, White Irish, Other White, Mixed White and Black Caribbean, Mixed White and Black African, Mixed White and Asian, Other Mixed, Indian, Pakistani, Bangladeshi, Other Asian, Black Caribbean, Black African, Other Black, Chinese, Other ethnic group]

Source: Special Licence Household SAR 2001

...and UK born

[Bar chart: Proportion of Couples of Mixed Ethnicity - by Male Partner's Ethnic Group. Mixed sex couples, England and Wales. Panel: UK/Ireland. Axis: mean of mixedpart, 0 to 1. Bars: White British, White Irish, Other White, Mixed White and Black Caribbean, Mixed White and Black African, Mixed White and Asian, Other Mixed, Indian, Pakistani, Bangladeshi, Other Asian, Black Caribbean, Black African, Other Black, Chinese, Other ethnic group]

Source: Special Licence Household SAR 2001

Principles of working with hierarchical data

• Can create variables which represent a summary across a household
  – Min, max, average, sum, count
• May need to prepare the data first
• Can also work within families within households
• Need a unique identifier(s) to work this way

... in SPSS

• Aggregate will create a new file at household (or family...) level

• Match will allow you to link household (or family...) level and individual files

• Aggregate addvar subcommand allows you to do it all in one
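A sketch of the two-step route (Aggregate, then Match) on a toy file. The data are made up, and the dataset names persons and hhds and the new variables hsize and meanage are hypothetical; only hnum and ageh are names used in these slides.

```spss
* Toy person-level file where hnum identifies the household.
DATA LIST FREE / hnum pnum ageh.
BEGIN DATA
1 1 28
1 2 12
2 1 34
2 2 30
2 3 72
END DATA.
DATASET NAME persons.
SORT CASES BY hnum.

* Step 1 writes a household-level dataset with one row per hnum.
DATASET DECLARE hhds.
AGGREGATE /OUTFILE=hhds /BREAK=hnum /hsize = N /meanage = MEAN(ageh).

* Step 2 spreads the household values back to each person.
MATCH FILES /FILE=* /TABLE=hhds /BY hnum.
EXECUTE.
LIST.
```

The household-level dataset is useful in its own right for household-level tables; when only the person-level file with the added variables is wanted, the single-step MODE=ADDVARIABLES form is shorter.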

Example 1: Add ‘oldest person in hhd’ var to all individuals in the household

Only possible in recent versions of SPSS

Within each household (indicated by hnum), compute maximum value of age:

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=hnum
  /AGEH_max = MAX(AGEH).

• /OUTFILE=* MODE=ADDVARIABLES: add new variable to current person-level file
• /BREAK=hnum: break by household ID variable
• /AGEH_max = MAX(AGEH): defines new variable

Aggregate command produces summary variables across higher level units – MUST SORT BY UNIT FIRST

Which gets us...

For each value of hnum, take the max value of ageh to create new variable

Example 2: Oldest male in the household

Same principle as before but ensure that female ages are excluded (set them to system missing first)

* Copy age into mageh for males only.
DO IF (sex = 1).
RECODE AGEH (ELSE=Copy) INTO mageh.
END IF.
VARIABLE LABELS mageh 'male age'.
EXECUTE.

* Spread the maximum male age across each household.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=hnum
  /maxmage 'maximum male age in hhd' = MAX(mageh).

• mageh = ageh for males only (otherwise system missing)
• For each value of hnum:
  – Take max value of mageh
  – Distribute this value to all with that value of hnum

Which gets us...

First compute age for males only

Aggregate command takes maximum value of mageh within each value of hnum and distributes across whole household

Can extend this principle...

• To create a variable showing characteristics of HRP/Household Head
  – Create a new variable for HoH/HRP which copies the relevant characteristic
  – Take maximum value of new variable across household
• To create a variable showing characteristics of Family head
  – Create a variable for the family head/FRP containing the value
  – Aggregate over household number AND family unit
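Both extensions can be sketched on a toy file. The data are made up; hnum, reltohr and ageh are names from these slides, while fnum (family number within household) and frp (1 for the family reference person) are hypothetical names, as are the new variables.

```spss
* Toy data where reltohr = 1 marks the HRP and frp = 1 marks the family head.
DATA LIST FREE / hnum fnum pnum reltohr frp ageh.
BEGIN DATA
1 1 1 1 1 28
1 1 2 3 0 12
2 1 1 1 1 34
2 1 2 2 0 30
2 2 3 5 1 72
END DATA.

* Copy the age of the HRP and of each family head into new variables.
IF (reltohr = 1) hrpage = ageh.
IF (frp = 1) frpage = ageh.

* Only one person per household has a value so MAX simply spreads it.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=hnum /hrpage_h = MAX(hrpage).

* Family-level values break on household AND family number.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=hnum fnum /frpage_f = MAX(frpage).
LIST.
```

Breaking on hnum alone in the second step would mix the two families in household 2, which is why the family identifier has to be included alongside the household number.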