Analysing Households with the SARs
Jo Wathan
SARs support team
University of Manchester
In this session
• What are the SARs?
• What would you use them for?
• How do you work with them?
– For household level analysis
– For hierarchical analysis
• Hands-on session
Census Microdata background
• Census outputs have historically been aggregate tables
– Safe but inflexible
– Well suited to analyses at small geographical detail
• Microdata permits more flexibility
– Longitudinal Study links data from 1971: good for analysing process, but has to be held securely (http://www.celsius.lshtm.ac.uk/)
– Demand for a cross-sectional dataset that can be used on own desktop
• Samples of Anonymised Records (SARs) first available from 1991 Census
– 2% individual file (SAR areas)
– 1% household file (Region)
General Features of the SARs
• Microdata
– Can produce your own tables, recode and group data
– Can use models
– Full individual information for all census topics
– Need to be analysed using a statistics package
• Very large samples
– Good for looking at small subpopulations
• Can be used alongside other census data
• 2 time points
The SARs family 2001
File | Sample type | Geography | Availability
Individual licenced | 3% sample of individuals | UK: GOR (+ Wales, Scot, NI, Inner/Outer London) | EUL, CCSR
Small area microdata | 5% sample of individuals | UK: LA (or constituency in NI) | EUL, CCSR
Household licensed | 1% hierarchical file | None: England & Wales only | Special licence, UKDA
Individual CAMS | Same sample as Individual licenced SAR | LA (GB) or Constituency (NI); IMD info for SOA | In house at ONS
Household CAMS | 1% hierarchical file | All of UK | In house at ONS
Individual licenced file
Geography: GOR (also Inner/Outer London, Scotland, Wales and NI)
Age: Grouped, 8 bands for ages 16-74
Ethnicity: All 16 categories in v2 (England & Wales), 16 cats of COB
Employment: SOCminor, 40 cats NSSEC, 17 cats of Industry
Notes: Slight variation in the sampling fraction for each country: 3.125% in England and Wales; 3.246% in Scotland; 3.139% in Northern Ireland
Small area microdata file
Geography: LA (Parliamentary Constituency in NI) – 3 LAs merged due to size
Age: All ages banded, 11 bands
Ethnicity: 13 cats (England & Wales), 5 cats of COB
Employment: NSSEC 8 cats
Notes: Most recent file – published 2006.
The Licence
• All users need to be licensed
• Academics complete the licence as part of the Census Registration System process
• Non-academic users sign the licence as part of the data registration process
• Cannot pass the data to an unlicensed user
• Cannot attempt to identify an individual
Access Arrangements
• Data distributed by CCSR
• Academics, no charge
– Register for the data under Census Registration System
– Access the data online from CCSR website
• Non-academics
– Not for profit £500 per file
– Business users £1000 per file
– 10 users per application, incl. software
– Download End User Licence from web
Special licence – Household SAR
Geography: None – England & Wales only
Age: 2 year bands (e.g. 0-1, 2-3...)
Ethnicity: 16 cats
Employment: SOCMinor, 96 cats (ISCO), 17 cats (SIC92), 40 cats NSSEC
Notes: Download access provided through UKDA & UKDA charges apply (free for not for profit). Requires a full paper application. Data supported by SARs team at CCSR. Users must agree to a much higher level of data stewardship than for EUL files.
What are the CAMS?
• Contain data which was seen as too disclosive to release outside ONS
• Use limited to research questions which cannot be satisfied with another data source
• ONS vet applications
• Data accessed at a Virtual Microdata Laboratory at ONS – data cannot be removed
• Results vetted by ONS prior to release
• Users must get OK from ONS before publishing/presenting results
• Further information and appropriate forms at http://www.statistics.gov.uk/census2001/sar_cams.asp
• Contact [email protected] for more details
Content of CAMS files
• Files contain much more detail, e.g.
– Individual year of age (topcoded at 95)
– Full coding on country of birth
– SOC Unit Group
– Local authority geography
– Index of Deprivation for SOAs
– Index of Deprivation for migrants' last address
– ‘Full’ household matrix
CAMS Good practice
• Use the licensed SARs...
– to exhaust the potential of other datasets
– to write your syntax files
• Check the disclosure guidelines before writing your application
• Avoid complex tables
– small cell counts aren’t reliable
– unique cells will usually be suppressed
• Do use models
Using SARs to understand households
File | Household level analysis | Can create new household variables? | Look at intra-household characteristics
Individual licenced | Yes – select HRP | V. limited | No
Small area microdata | Yes – select HRP | V. limited | No
Household licensed | Yes – select any representative or change to hhd file | Yes | Yes
Individual CAMS | Yes – select HRP | V. limited | No
Household CAMS | Yes – select any representative or change to hhd file | Yes | Yes
Using the SARs
• 1991-2001 Changes
– Principles
– Defining a population base
– Ethnicity
• Coverage
– ONC & Imputation
– Difference between 1991/2001
• Good practice issues
– Documentation
– Data stewardship
– Dealing with sample data
– Reporting
Comparisons between 1991 and 2001
• Population base changed
– Imputation (no imputed values in 1991 SARs)
– Students – enumerated at term-time address
– Residents only (choice in 1991)
• Variable continuity
– Variable names have been changed where the variable is not exactly the same
– Some variables (e.g. age, LLI) are easy to compare by grouping 1991 values
– Some variables are harder to compare as the question has changed (e.g. qualifications)
Ethnicity 91/01
• Different questions asked in 1991 and 2001
• No agreed and perfect correspondence
• Simpson and Akinwale use LS to show how 1991 maps on to 2001 (www.statistics.gov.uk/events/ls_census2001/agenda.asp)
Define your population base
• You need to define the population base
– In 1991 we had an issue with visitors being double counted (filter using residsta)
– In 2001 students who are living away from home are double counted (filter using stulawy in Ind licenced or popbase in other files)
– 2001 Household file contains ‘dummy form’ households with no usual residents, e.g. holiday homes (filter using popbase)
– Note popbase categories vary across files
Census coverage
• Major effort to improve coverage in 2001
• One Number Census
• Use of large Census Coverage Survey to correct census results, 300K households
– Design independent of census
– Used matched census and CCS data to estimate total population in each area
– Adjusted all results for census non-response using imputation of households and individuals
– Results in final database for UK adjusted for non-response
Census coverage
• Coverage before imputation:
– 94% households returned forms, with another 4% estimated to be in households identified by enumerators.
• Response rate lowest for
– Young people in their early 20s (men aged 20-24 resp. rate of 87%)
– Inner London (resp. rate of 78%)
• Once imputed cases are included, coverage is estimated to be 100%
Non-response
• 1991 SARs selected from 10% sample
– Did not include imputed households
– 96% coverage
• 2001 SARs selected from 100% ONC database
– Imputed individuals/hholds are identified using oncperim variable
– Imputed items are flagged using z variables (zvar=1 if imputed) – available in the larger *impflag* version of the data
Percentage ONC imputed, 2001 SARs
Ethnic group | Not ONC imputed | ONC imputed
White | 94.8 | 5.2
Mixed | 91.5 | 8.5
Asian | 84.6 | 15.4
Black | 76.5 | 13.5
Chinese/Other | 85.6 | 14.4
All | 93.8 | 6.2
Percentage with ethnicity variable imputed, 2001 SARs
Ethnic group | Not imputed (zeth*=0) | Imputed (zeth*=1)
White | 97.5 | 2.5
Mixed | 88.3 | 11.7
Asian | 94.8 | 5.2
Black | 92.6 | 7.4
Chinese/Other | 89.0 | 11.0
All | 97.1 | 2.9
PRAMMing
• PRAMMing is perturbation designed to deal with very unusual cases, e.g. widowed 16-year-olds
• Avoids additional broad-banding
• Perturbation is constrained to
– preserve univariate distributions
– preserve multivariate distributions on control variables
– prevent strange results (like 5-year-old widows)
• Affects 15 variables
– Primary economic activity – 1% cases
General advice
• PRAMMed cases are flagged as imputed in z var
• Imputation is better than not imputing unless you have evidence to the contrary
– Known exception is ethnicity (Simpson and Akinwale)
• If unsure about impact of PRAMMing and imputation
– Do a sensitivity test
– Use the z var to exclude cases with imputed variables and then repeat your analysis
– Use ONCPERIM to exclude imputed individuals and repeat your analysis
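A sensitivity test of this kind can be sketched in a few lines. The example below is in Python rather than SPSS, purely for illustration. The toy records, the variable names lli and zlli, and the coding oncperim = 1 for an imputed person are assumptions made for the sketch; only the convention that a z variable equals 1 when the item is imputed comes from the slides. Check the SARs documentation for the real names and codes.

```python
def proportion(records, predicate):
    """Share of records satisfying predicate (None if there are no records)."""
    if not records:
        return None
    return sum(1 for r in records if predicate(r)) / len(records)


def sensitivity(records, predicate, zvar):
    """Repeat one estimate on three populations: all cases, cases whose
    item was not imputed, and individuals who were not ONC imputed."""
    return {
        "all cases": proportion(records, predicate),
        "item not imputed": proportion(
            [r for r in records if r[zvar] != 1], predicate),
        "person not ONC imputed": proportion(
            [r for r in records if r["oncperim"] != 1], predicate),
    }


# Toy person records (made up for illustration, not SARs data).
# Assumed coding: zlli == 1 means the lli item was imputed,
# oncperim == 1 means the whole person was imputed.
people = [
    {"lli": 1, "zlli": 0, "oncperim": 0},
    {"lli": 0, "zlli": 0, "oncperim": 0},
    {"lli": 1, "zlli": 1, "oncperim": 0},
    {"lli": 0, "zlli": 0, "oncperim": 1},
    {"lli": 0, "zlli": 0, "oncperim": 0},
]

results = sensitivity(people, lambda r: r["lli"] == 1, "zlli")
for label, value in results.items():
    print(f"{label}: {value:.3f}")
```

If the three estimates are close, imputation is probably not driving the result; if they diverge, it is worth reporting the difference.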
Get to know the data
• Use the documentation
• SARs User Guide
– Use Census schedules to check questions
– Check univariate frequencies
– Do exploratory analyses
– Contact [email protected] if you can’t find the information you need in the online documentation
• Contact [email protected] if you think there is a problem with the data
SARs as a LARGE dataset
• A few million cases can cause trouble!
• Use Nesstar to do initial data exploration
• Extract a subset using Nesstar or take a subset from the downloaded file
• For serious analysis, using a syntax (or .do) file to record syntax makes re-running easier
– Create a single syntax file which starts with the original data
– Use file naming conventions that will enable you to trace versions
– Keep a record of work done
SARs as sample data
Geographically stratified sample
– Approximates to simple random sample
– No clustering in Individual file
– Household file – clustering within households
– Although large sample, you may have small sample sizes when using sub-groups
– Use standard errors and confidence intervals
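As a rough illustration of the last point, the sketch below computes a proportion, its standard error and a normal-approximation 95% confidence interval, treating the sample as a simple random sample. The counts are made up, and the function name proportion_ci is hypothetical. It is written in Python for illustration only.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Proportion, standard error and normal-approximation interval,
    treating the data as a simple random sample of size n."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (max(0.0, p - z * se), min(1.0, p + z * se))


# Made-up counts: 30 of 200 sampled people in a small sub-group
# have the characteristic of interest.
p, se, (low, high) = proportion_ci(30, 200)
print(f"p = {p:.3f}, SE = {se:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

This simple formula fits the Individual file reasonably well. For person-level analysis of the Household file, clustering within households means it may understate the standard error, so a method that allows for clustering would be safer there.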
Reporting
• Census data is Crown copyright
• Data should be cited (reference on web site)
• Let us know when you publish
• Contact ONS before presenting or publishing results based on the CAMS
User support
• www.ccsr.ac.uk/sars
– Resources and links added as we go
• Seminar invitations welcome!
• Regional workshop invites welcome!
• SARs Helpdesk
– [email protected]
– (0161) 275 4735
• Join email and newsletter lists
Questions
…before we talk about using the SARs for hierarchical analysis?
Using hierarchical microdata
• Units of analysis
• Flat files vs. hierarchical files
• Using household hierarchy
– Different aims
– Examples
– How to achieve
Types of units of analysis
• Individual
• Family
A group of people consisting of a married or cohabiting couple with or without child(ren), or a lone parent with child(ren). It also includes a married or cohabiting couple with their grandchild(ren) or a lone grandparent with his or her grandchild(ren) where there are no children in the intervening generation in the household.
• Household
A household is defined as one person living alone, or a group of people (not necessarily related) living at the same address with common housekeeping - that is, sharing either a living room or sitting room or at least one meal a day.
• Local authority district (SAM/CAMS)
• Others?
Definitions from 2001 Definitions Volume, National Stats (2004)
Choice of unit matters
• HOUSEHOLD LEVEL: 1 observation per household
– What proportion of households contain only 1 person? 29.2%
– What is the mean household size? 2.34
• INDIVIDUAL LEVEL: 1 observation per person
– What proportion of individuals live alone? 12.5%
– What is the average household size for individuals in the sample? 3.05
Source: QLFS 2005 Spring Quarter
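The gap between the household-level and individual-level answers can be reproduced with a toy example. The sketch below uses made-up household IDs (not QLFS or SARs data) and is in Python for illustration only. It computes the same four quantities from a single person-level list.

```python
from collections import Counter

# Toy person-level file: one household ID per person (made-up values).
hnum_per_person = [1, 2, 2, 3, 3, 3, 3, 4]

household_sizes = Counter(hnum_per_person)  # hnum -> number of residents
n_households = len(household_sizes)
n_people = len(hnum_per_person)

# Household level: one observation per household.
share_one_person_hhds = (
    sum(1 for size in household_sizes.values() if size == 1) / n_households
)
mean_hhd_size = n_people / n_households

# Individual level: one observation per person.
share_people_alone = (
    sum(1 for h in hnum_per_person if household_sizes[h] == 1) / n_people
)
mean_size_for_people = (
    sum(household_sizes[h] for h in hnum_per_person) / n_people
)

print(share_one_person_hhds, mean_hhd_size)      # household-level answers
print(share_people_alone, mean_size_for_people)  # individual-level answers
```

Large households contribute more person records, so the individual-level view shows fewer people living alone and a larger average household size than the household-level view.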
Non-hierarchical files
• Individual SAR/CAMS and Small Area Microdata, 1991 Individual SAR
• Can be used to analyse household characteristics if and only if those characteristics
– can be represented by those of HRP, or…
– are already stored in the data
• Need also to select only HRP to avoid large households being over-represented
Example: The relationship between occupancy and social grade using the SAM
• The SAM contains 2 derived occupancy variables as well as HRP’s social grade
• Limit analyses to the Household Reference Person to avoid over-representation of large households (select if reltohr=1)
• Tabulate the already present variables against each other
• Easier access, UK wide with geography (without CAMS) and larger n
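The effect of the HRP filter can be seen on a toy file. In the sketch below (Python, for illustration only) every person record carries their household's variables, so a four-person household is counted four times unless the analysis is limited to reltohr = 1. The records are made up. The names grade and occupancy, their codes, and the reltohr codes other than 1 are hypothetical.

```python
from collections import Counter, defaultdict

# Toy individual-level records (made up, not SARs data).
# reltohr == 1 marks the HRP, as in the filter quoted above.
people = [
    {"reltohr": 1, "grade": "D", "occupancy": "under"},   # 4-person household
    {"reltohr": 2, "grade": "D", "occupancy": "under"},
    {"reltohr": 3, "grade": "D", "occupancy": "under"},
    {"reltohr": 3, "grade": "D", "occupancy": "under"},
    {"reltohr": 1, "grade": "D", "occupancy": "over"},    # 1-person household
    {"reltohr": 1, "grade": "AB", "occupancy": "over"},   # 2-person household
    {"reltohr": 2, "grade": "AB", "occupancy": "over"},
]


def column_percentages(records, row_var, col_var):
    """Cross-tabulate row_var by col_var as percentages within each column."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r[col_var]][r[row_var]] += 1
    return {
        col: {row: 100 * n / sum(rows.values()) for row, n in rows.items()}
        for col, rows in counts.items()
    }


hrps = [p for p in people if p["reltohr"] == 1]
hrp_table = column_percentages(hrps, "occupancy", "grade")
all_table = column_percentages(people, "occupancy", "grade")

print("HRPs only: ", hrp_table["D"])
print("All people:", all_table["D"])
```

With one record per household, half of the grade D households are under-occupied in this toy data. Without the filter the large household dominates and the figure rises to 80%.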
Results
2001 Small Area Microdata
Occupancy Rating of Hhd by Social Grade of Hhd Reference Person (column %)
Occupancy rating | No emp record | A&B | C1 | C2 | D | E | Total
2+ rms > req'd | 46.3 | 63.1 | 50.3 | 45.3 | 37 | 39.3 | 48.2
1 rm > req'd | 28.2 | 20.2 | 24.8 | 28.1 | 28.8 | 27.7 | 25.7
n(rms) = req'd | 20 | 11.9 | 17.8 | 19.2 | 24.1 | 22.8 | 18.7
n(rms) < req'd | 5.5 | 4.8 | 7.1 | 7.3 | 10.2 | 10.3 | 7.4
Total | 100 | 100 | 100 | 100 | 100 | 100 | 100
N= | 140554 | 251341 | 298250 | 177626 | 217245 | 136800 | 1221816
Filter: ( Relationship to HRP = Household reference person )
[Chart: Occupancy Rating by Social Grade of HRP. Stacked bars (0% to 100%) for each HRP social grade (No emp record, A&B, C1, C2, D, E, Total), split into n(rms) < req'd, n(rms) = req'd, 1 rm > req'd and 2+ rms > req'd.]
But more flexible than tables…
• Can limit to owner occupiers in England and Wales…
What sort of household variables are on the individual files?
e.g. EUL Individual file
• Region
• Household Resources
– Accommodation type, tenure, lowest floor of accommodation, furnished, no. rooms
– Sole use of bath/shower/toilet, full/part central heating, self contained
– Cars
• Household membership
– No. of residents, number who are: carers, 65+, employed adults, LT ill, poor health
– No. families
– Students living away
• Household indicators
– Education, employment, health/disability, housing
– Social grade of HRP
– Multiple ethnicity in hhd
• Density
– No. residents per room, occupancy rating
c.f. Hierarchical files
• Household SL file, Household CAM, 1991 Household file
• Contain individuals within households, so considerably more flexible
• Can be used to create new household variables based on information about the household and/or information about all the individuals within the household
• Can be used to describe intra-household relationships
The hierarchy of the household SAR
Household 1: North West, Social rented
– Person 1: HRP, Family 1, Female, 28, No quals, No LTILL
– Person 2: Son of HRP, Family 1, Male, 12, N/A, No LTILL
Household 2: Wales, Owner occupier
– Person 1: HRP, Family 1, Male, 34, Degree, No LTILL
– Person 2: Spouse of HRP, Family 1, Female, 30, Degree, P/T Employee, No LTILL
– Person 3: Parent of HRP, Family 2, Female, 72, No quals, Econ Inactive, LTILL
• Individuals grouped into household groups
• Family units identified within households
What does it look like?
Looking at the data
For the 20 cases in the previous screenshot:
• How many households?
• How many individuals in the largest household?
• What kind of family lives in hnum 41?
• Thinking of the census definition of family unit, did any household have more than one family unit?
What sort of analysis?
• Describing the household better
• Describing an individual in relation to other members of the household
• Describing partnerships
Household composition & position
Household SAR 91/01: Female residents 16-59. Excludes F/T students.
No. generations in hhd | Position within generations | 2001: W | BC | In | Pa | 1991: W | BC | I | Pa
1 gen | snk <36 | 3.1 | 7.6 | 2.8 | 1.7 | 2.4 | 6.2 | 1.3 | 0.6
1 gen | snk 36+ | 6.3 | 11.7 | 2.7 | 1.3 | 4.1 | 6.2 | 1.1 | 0.8
1 gen | cpnok <36 | 8.8 | 2.9 | 7.0 | 6.8 | 9.5 | 5.0 | 4.3 | 3.5
1 gen | cpnok 36+ | 17.2 | 5.7 | 6.2 | 3.4 | 14.3 | 6.5 | 4.4 | 1.5
2 gen | upper 2g | 52.0 | 58.1 | 60.0 | 63.3 | 55.2 | 59.2 | 58.8 | 65.2
2 gen | lower 2g | 8.4 | 8.7 | 11.4 | 12.6 | 11.6 | 12.1 | 13.7 | 14.1
3 gen | upper 3g | 0.6 | 1.0 | 1.9 | 2.3 | 0.7 | 2.3 | 3.8 | 3.9
3 gen | mid 3g | 1.1 | 1.8 | 6.2 | 7.0 | 2.0 | 2.3 | 12.1 | 10.2
3 gen | lower 3g | 0.1 | 0.1 | 0.6 | 0.9 | 0.2 | 0.1 | 0.4 | 0.2
unrel | | 2.4 | 2.2 | 1.3 | 0.7 | 0.1 | 0.1 | 0.0 | 0.0
Total (100%) | | 126,086 | 1,734 | 2,745 | 1,638 | 119,319 | 1,452 | 2,118 | 955
Mixed couples – SL-HSAR
[Chart: Proportion of Couples of Mixed Ethnicity, by Male Partner's Ethnic Group. Panel: Elsewhere. Horizontal bars showing mean of mixedpart (0 to 1) for each group: White British, White Irish, Other White, Mixed White and Black Caribbean, Mixed White and Black African, Mixed White and Asian, Other Mixed, Indian, Pakistani, Bangladeshi, Other Asian, Black Caribbean, Black African, Other Black, Chinese, Other ethnic group.]
Mixed sex couples, England and Wales
Source: Special Licence Household SAR 2001
...and UK born
[Chart: Proportion of Couples of Mixed Ethnicity, by Male Partner's Ethnic Group. Panel: UK/Ireland. Horizontal bars showing mean of mixedpart (0 to 1) for each group: White British, White Irish, Other White, Mixed White and Black Caribbean, Mixed White and Black African, Mixed White and Asian, Other Mixed, Indian, Pakistani, Bangladeshi, Other Asian, Black Caribbean, Black African, Other Black, Chinese, Other ethnic group.]
Mixed sex couples, England and Wales
Source: Special Licence Household SAR 2001
Principles of working with hierarchical data
• Can create variables which represent a summary across a household
– Min, max, average, sum, count
• May need to prepare the data first
• Can also work within families within households
• Need unique identifier(s) to work this way
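These principles can be sketched outside any particular statistics package. The Python example below (illustrative only) groups toy person records by the household identifier hnum, computes count, min, max, sum and mean of age within each household, and attaches the results to every person in that household. The records and the names pnum, hsize, age_min, age_max, age_sum and age_mean are hypothetical.

```python
from collections import defaultdict

# Toy hierarchical file: one record per person, hnum identifies the household.
people = [
    {"hnum": 1, "pnum": 1, "age": 28},
    {"hnum": 1, "pnum": 2, "age": 12},
    {"hnum": 2, "pnum": 1, "age": 34},
    {"hnum": 2, "pnum": 2, "age": 30},
    {"hnum": 2, "pnum": 3, "age": 72},
]

# Collect the ages in each household.
ages_by_hhd = defaultdict(list)
for person in people:
    ages_by_hhd[person["hnum"]].append(person["age"])

# One summary record per household.
summaries = {
    hnum: {
        "hsize": len(ages),
        "age_min": min(ages),
        "age_max": max(ages),
        "age_sum": sum(ages),
        "age_mean": sum(ages) / len(ages),
    }
    for hnum, ages in ages_by_hhd.items()
}

# Attach the household summaries to every person record.
for person in people:
    person.update(summaries[person["hnum"]])

for person in people:
    print(person["hnum"], person["pnum"], person["hsize"], person["age_max"])
```

The summaries dictionary is the household-level file, and the final loop corresponds to matching it back onto the individual file using the unique household identifier.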
... in SPSS
• Aggregate will create a new file at household (or family...) level
• Match will allow you to link household (or family...) level and individual files
• Aggregate with MODE=ADDVARIABLES allows you to do it all in one step
Example 1: Add ‘oldest person in hhd’ var to all individuals in the household
Only possible in recent versions of SPSS
Within each household (indicated by hnum), compute the maximum value of age:

* Sort by the break variable first.
SORT CASES BY hnum.
* Add the household maximum of AGEH to every person record.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=hnum
  /AGEH_max = MAX(AGEH).

– /OUTFILE=* MODE=ADDVARIABLES: add new variable to current person-level file
– /BREAK=hnum: break by household ID variable
– /AGEH_max = MAX(AGEH): defines new variable
Aggregate command produces summary variables across higher level units – MUST SORT BY UNIT FIRST
Which gets us...
For each value of hnum, take the max value of ageh to create the new variable
Example 2: Oldest male in the household
Same principle as before, but ensure that female ages are excluded (set them to system missing first)

* Compute age for males only so that mageh is system missing for everyone else.
DO IF (sex = 1).
RECODE AGEH (ELSE=Copy) INTO mageh.
END IF.
VARIABLE LABELS mageh 'male age'.
EXECUTE.

* Take the maximum male age within each household and add it to every person record.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=hnum
  /maxmage 'maximum male age in hhd' = MAX(mageh).

– mageh = ageh for males only (otherwise system missing)
– For each value of hnum: take max value of mageh, then distribute this value to all with that value of hnum
Which gets us...
– First compute age for males only
– Aggregate command takes maximum value of mageh within each value of hnum and distributes it across the whole household
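For readers who want to check the logic outside SPSS, the same three steps can be sketched in Python (illustrative only). None plays the role of system missing, and sex = 1 is taken to mean male, following the syntax above. The toy records are made up.

```python
# Toy person records (made up). sex == 1 is male, as in the syntax above.
people = [
    {"hnum": 1, "sex": 2, "ageh": 28},
    {"hnum": 1, "sex": 1, "ageh": 12},
    {"hnum": 2, "sex": 1, "ageh": 34},
    {"hnum": 2, "sex": 2, "ageh": 30},
    {"hnum": 2, "sex": 1, "ageh": 5},
    {"hnum": 3, "sex": 2, "ageh": 45},  # household with no males
]

# Step 1: age for males only; None stands in for system missing.
for person in people:
    person["mageh"] = person["ageh"] if person["sex"] == 1 else None

# Step 2: maximum of the non-missing values within each household.
maxmage = {}
for person in people:
    if person["mageh"] is not None:
        hnum = person["hnum"]
        maxmage[hnum] = max(maxmage.get(hnum, person["mageh"]), person["mageh"])

# Step 3: distribute the household value to every member.
for person in people:
    person["maxmage"] = maxmage.get(person["hnum"])  # None if no males

for person in people:
    print(person["hnum"], person["sex"], person["ageh"], person["maxmage"])
```

Households with no male residents end up with a missing value, which is usually what is wanted. It is worth checking how many such households there are before using the new variable.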
Can extend this principle...
• To create a variable showing characteristics of HRP/Household Head
– Create a new variable for HoH/HRP which copies the relevant characteristic
– Take maximum value of new variable across household
• To create a variable showing characteristics of Family head
– Create a variable for the family head/FRP containing the value
– Aggregate over household number AND family unit
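Both extensions follow the same pattern: hold the value only on the reference person's record, take the maximum within the group, and spread it to every member. A minimal Python sketch (illustrative only) is given below. The records, the flag variables is_hrp and is_frp, the family identifier fnum and the helper spread are all hypothetical names, not SARs variables.

```python
def spread(records, keys, flag, source, target):
    """Copy `source` from the record where `flag` == 1 to every record
    sharing the same values of `keys` (the break variables)."""
    group_value = {}
    for r in records:
        if r[flag] == 1:  # value is held only for the reference person
            k = tuple(r[key] for key in keys)
            group_value[k] = max(group_value.get(k, r[source]), r[source])
    for r in records:
        # None if the group has no flagged person
        r[target] = group_value.get(tuple(r[key] for key in keys))


# Toy records (made up): household 2 contains two family units.
people = [
    {"hnum": 1, "fnum": 1, "is_hrp": 1, "is_frp": 1, "age": 28},
    {"hnum": 1, "fnum": 1, "is_hrp": 0, "is_frp": 0, "age": 12},
    {"hnum": 2, "fnum": 1, "is_hrp": 1, "is_frp": 1, "age": 34},
    {"hnum": 2, "fnum": 1, "is_hrp": 0, "is_frp": 0, "age": 30},
    {"hnum": 2, "fnum": 2, "is_hrp": 0, "is_frp": 1, "age": 72},
]

# Age of HRP: break on household number only.
spread(people, ["hnum"], "is_hrp", "age", "hrp_age")
# Age of family reference person: break on household number AND family unit.
spread(people, ["hnum", "fnum"], "is_frp", "age", "frp_age")

for person in people:
    print(person["hnum"], person["fnum"], person["age"],
          person["hrp_age"], person["frp_age"])
```

Breaking on the family number alone would not work, because family numbers restart within each household. The pair of household number and family unit is what uniquely identifies a family.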