NHANES 1999-2004 Analytic Strategies Deanna Kruszon-Moran, MS Centers for Disease Control and Prevention National Center for Health Statistics

Post on 29-Jan-2016

Page 1: NHANES 1999-2004  Analytic Strategies

NHANES 1999-2004 Analytic Strategies

Deanna Kruszon-Moran, MS

Centers for Disease Control and Prevention

National Center for Health Statistics

Page 2: NHANES 1999-2004  Analytic Strategies

Analyzing Data NHANES 1999-2004

Preparing your data files

Downloading demographic, questionnaire, exam and lab files.

• Files are no longer available as self-extracting zip files.

• Documentation and procedure files are now in Adobe PDF format and can be viewed or accessed directly via the web link.

• Clicking on the data link will allow you to store the data file or open it directly with SAS.

• Data files are in SAS transport (.xpt) format.

Page 3: NHANES 1999-2004  Analytic Strategies

Know your data

• Read the documentation!!

• Read the documentation!!

• Read the documentation!!

• Read the documentation!!

Page 4: NHANES 1999-2004  Analytic Strategies

Preparing your data files

Merging:

• Merge all files by sequence number to the demographic file.

• Verify the numbers of records merged and the final sample number against the published frequencies on the web.

• Be sure they are what you expected and all merges worked correctly.
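The merge-and-verify step can be sketched as follows. This is an illustration in Python rather than SAS, using made-up records. It assumes the sequence number is stored in a field called SEQN; `merge_to_demo` and `LABVALUE` are hypothetical names introduced only for this example.

```python
def merge_to_demo(demo, other):
    """Merge `other` onto the demographic records by SEQN, keeping every
    demographic record, and return counts to check against published totals."""
    other_by_seqn = {row["SEQN"]: row for row in other}
    demo_seqns = {row["SEQN"] for row in demo}
    merged, matched = [], 0
    for row in demo:
        extra = other_by_seqn.get(row["SEQN"])
        if extra is not None:
            matched += 1
        merged.append({**row, **(extra or {})})
    counts = {
        "demo": len(demo),
        "matched": matched,
        # Records in `other` with no demographic record signal a problem.
        "unmatched_other": len(set(other_by_seqn) - demo_seqns),
    }
    return merged, counts


# Hypothetical toy records, not real NHANES data.
demo = [
    {"SEQN": 1, "RIDAGEYR": 34},
    {"SEQN": 2, "RIDAGEYR": 61},
    {"SEQN": 3, "RIDAGEYR": 15},
]
lab = [{"SEQN": 1, "LABVALUE": 190}, {"SEQN": 3, "LABVALUE": 160}]

merged, counts = merge_to_demo(demo, lab)
print(counts)
```

The counts are what you would compare with the frequencies published on the web before going further.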

Page 5: NHANES 1999-2004  Analytic Strategies

Know your data

• Run basic frequencies.

• Know your target population.

• Understand how each item was measured (how the item is defined, topcoded, recoded).

• Recode variables as necessary (example: age groups, positive/negative lab tests, high/low BP, high/low cholesterol, etc.).

• Recode unknown/refused responses as missing data (for example, recode 77 and 99 to missing).

• Check your coding – run frequencies in SAS.
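The recoding steps above can be sketched as follows, in Python rather than SAS. The missing codes (77, 99) follow the slide, but the codes actually used differ from item to item, so they should be confirmed in the documentation for each variable. The age cut points and the function names are assumptions made for this example.

```python
from collections import Counter

# Assumed refusal/unknown codes; confirm per item in the documentation.
MISSING_CODES = (77, 99)


def recode_missing(value, missing_codes=MISSING_CODES):
    """Return None for refusal/unknown codes, otherwise the value unchanged."""
    return None if value in missing_codes else value


def age_group(age):
    """Assign an illustrative age group (cut points are assumptions)."""
    if age is None:
        return None
    if age < 20:
        return "<20"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    return "60+"


# Hypothetical questionnaire responses and ages.
responses = [3, 77, 5, 99, 2]
cleaned = [recode_missing(v) for v in responses]

ages = [15, 34, 59, 60, 85]
groups = Counter(age_group(a) for a in ages)

# Run frequencies on the recoded variables to check the coding.
print(cleaned)
print(groups)
```

Only apply a missing-code recode to items where those codes really mean "refused" or "don't know"; for a variable such as age, 77 is a legitimate value.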

Page 6: NHANES 1999-2004  Analytic Strategies

Know your data

Continuous Outcome Data:

• Look for outliers in your measure. Run Proc Univariate.

• Look for outliers among the weights. Use Proc Univariate on the weight variable.

Outlying values, especially those with large weights, can strongly influence your estimates.

• Look at normality. Consider transformations: log, square root, power.
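As a rough illustration of these checks outside SAS, the Python sketch below flags unusually large weights and applies a log transformation. The rule used here (a weight more than five times the median) is an arbitrary assumption for the example, not an NCHS recommendation, and the numbers are invented.

```python
import math
import statistics


def flag_large_weights(weights, ratio=5.0):
    """Return the positions of weights more than `ratio` times the median."""
    median_weight = statistics.median(weights)
    return [i for i, w in enumerate(weights) if w > ratio * median_weight]


# Hypothetical sample weights and a skewed continuous measure.
weights = [1200.0, 950.0, 1100.0, 15000.0, 1020.0]
values = [1.2, 0.9, 1.1, 40.0, 1.0]

flagged = flag_large_weights(weights)
log_values = [math.log(v) for v in values]  # requires strictly positive values

print(flagged)
print([round(v, 3) for v in log_values])
```

A record that is extreme on both the measure and the weight, like the fourth one here, is the kind worth inspecting before estimation.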

Page 7: NHANES 1999-2004  Analytic Strategies

NHANES Sample Design

NHANES uses a complex, multistage probability cluster design to sample the civilian, noninstitutionalized US population.

Page 8: NHANES 1999-2004  Analytic Strategies

Sample Weights

To analyze NHANES data you must use the sample weights to account for:

Page 9: NHANES 1999-2004  Analytic Strategies

1. The base probability of selection

[Figure: sampling stages. Stage 1: Counties. Stage 2: Segments. Stage 3: Households. Stage 4: Individuals.]

Page 10: NHANES 1999-2004  Analytic Strategies

2. Oversampling

NHANES 1999-2004 oversampled:

• African Americans

• Mexican Americans

• Persons with low income

• Adolescents aged 12-19

• Persons aged 60+

Page 11: NHANES 1999-2004  Analytic Strategies

3. Non-response to the interview & exam

Sample persons age 20+

• Screening interview: N=13312

• Household interview: N=10291 (78%). Interview non-response: 22%

• MEC exam: N=9471 (71%). Exam non-response: 7%

Page 12: NHANES 1999-2004  Analytic Strategies

Non-response issues for NHANES

Non-response:

• Most components have some level of individual item or component non-response.

• ONLY non-response to the interview and exam has already been accounted for in the weights.

• All additional non-response to the outcome measure of interest should be examined against all possible predictors.

• Potential biases should be discussed.

• If non-response is “high”, re-weighting should be considered.

Page 13: NHANES 1999-2004  Analytic Strategies

Why weight?

Sample subdomain       % US population    % sample unweighted    % sample weighted

Non-Hispanic Blacks    13%                25%                    12%

Mexican Americans      9%                 28%                    9%

12-19 year olds        12%                24%                    12%
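The effect shown in the table can be reproduced with a toy calculation. The Python sketch below is illustrative only: the eight records and their weights are invented so that an oversampled group makes up half of the sample but a quarter of the weighted population.

```python
def proportion(flags, weights=None):
    """Share of records with flag=True, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(flags)
    total = sum(weights)
    return sum(w for f, w in zip(flags, weights) if f) / total


# Toy sample: 4 records from an oversampled group, 4 from everyone else.
in_group = [True] * 4 + [False] * 4
# Oversampled records represent fewer people each, so their weights are smaller.
weights = [1.0] * 4 + [3.0] * 4

unweighted = proportion(in_group)
weighted = proportion(in_group, weights)
print(unweighted)  # 0.5
print(weighted)    # 0.25
```

Without the weights the group looks twice as common as it is in the population, which is the same pattern as the unweighted column in the table.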

Page 14: NHANES 1999-2004  Analytic Strategies

Sample weights – Which weights?

Weight variables to use:

Years of data                                            Household Interview Data ONLY    ANY Data from Exam/Lab/MEC Interview

Any 2 yrs of data (1999-2000 or 2001-2002 or 2003-2004)  WTINT2YR                         WTMEC2YR

4 yrs of data (1999-2002) *                              WTINT4YR                         WTMEC4YR

4 yrs (2001-2004) or 6 yrs (1999-2004) of data           Combine appropriate 2 or 4 year weights as follows:

Page 15: NHANES 1999-2004  Analytic Strategies

Two, Four, Six, Eight - How can we estimate?

For 4 years of data from 2001-2004 -

MEC4YR = 1/2 * WTMEC2YR ;

For 6 years of data from 1999-2004 –

if sddsrvyr=1 or sddsrvyr=2 then MEC6YR = 2/3 * WTMEC4YR ; /* for 1999-2002 */

if sddsrvyr=3 then MEC6YR = 1/3 * WTMEC2YR ; /* for 2003-2004 */

* For the years 1999-2002 only: do not combine the 2 year weights; use the 4 year weights provided.

Page 16: NHANES 1999-2004  Analytic Strategies

Two, Four, Six, Eight - How can we estimate?

Future years of data will be combined similarly:

For 6 years of data from 2001-2006 -

if sddsrvyr in (2,3,4) then

MEC6YR = 1/3 * WTMEC2YR ;

For 8 years of data from 1999-2006 –

if sddsrvyr=1 or sddsrvyr=2 then

MEC8YR = 1/2 * WTMEC4YR ; /* for 1999-2002 */

if sddsrvyr=3 or sddsrvyr=4 then

MEC8YR = 1/4 * WTMEC2YR ; /* for 2003-2006 */

and so on for later cycles.
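The pattern in these rules can be written as one function. The sketch below is in Python rather than SAS and is only an illustration of the rules as stated on the slides: cycles 1 and 2 (1999-2002) take the provided 4 year weight when both are analyzed, and every other cycle takes its 2 year weight, each scaled by its share of the years combined. The function name and the `cycles` argument are assumptions made for this example.

```python
def combined_mec_weight(sddsrvyr, wt2yr, wt4yr, cycles):
    """Combined MEC weight for one record.

    cycles: the SDDSRVYR values included in the analysis, e.g. (1, 2, 3)
    for 1999-2004 (1 = 1999-2000, 2 = 2001-2002, 3 = 2003-2004, ...).
    """
    if sddsrvyr not in cycles:
        raise ValueError("record is outside the cycles being analyzed")
    n_cycles = len(cycles)
    if n_cycles == 1:
        return wt2yr
    uses_4yr = 1 in cycles and 2 in cycles and sddsrvyr in (1, 2)
    if uses_4yr:
        # The 4 year weight already covers two cycles.
        return (2 / n_cycles) * wt4yr
    return (1 / n_cycles) * wt2yr


# Hypothetical weights for one record.
WT2, WT4 = 3000.0, 1500.0
six_year_early = combined_mec_weight(1, WT2, WT4, (1, 2, 3))    # 2/3 * WTMEC4YR
six_year_late = combined_mec_weight(3, WT2, WT4, (1, 2, 3))     # 1/3 * WTMEC2YR
four_year = combined_mec_weight(2, WT2, WT4, (2, 3))            # 1/2 * WTMEC2YR
eight_year_early = combined_mec_weight(2, WT2, WT4, (1, 2, 3, 4))  # 1/2 * WTMEC4YR
print(six_year_early, six_year_late, four_year, eight_year_early)
```

For 1999-2002 alone the function returns the 4 year weight unchanged, which matches the footnote about not combining 2 year weights for those years.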

Page 17: NHANES 1999-2004  Analytic Strategies

Sample Weights - Subsamples

Subsamples and appropriate weights:

• Look at your primary variable of interest and the corresponding weight.

• Look at all other variables you want to combine with it.

• Are all from the interview? Exam? Subsample (i.e. fasting, audiometry, dioxin, VOC’s …) ?

• Use the weight from the smallest subsample for your analysis.

• Be consistent!

Page 18: NHANES 1999-2004  Analytic Strategies

Sample Weights - Subsamples

Subsamples and appropriate weights:

• Be careful about combining subsamples beyond MEC + VOC’s, Interview + Dioxin etc.

• Combining subsamples such as Environmental + AM fasting could be problematic.

• Some subsamples are mutually exclusive.

• Weights were not designed for combining subsamples and may not produce good estimates.

Page 19: NHANES 1999-2004  Analytic Strategies

Preparing for Analyses

Subsetting the data for SUDAAN:

• If using MEC exam weights - SUBSET the data on those MEC EXAMINED in SAS before using SUDAAN.

• If using other subsample weights – subset the data on those in the subsample corresponding to the weights you are using.

• Then use the SUBPOPN statement in the SUDAAN procedure to further subset your data by age, gender etc. to reflect the target population you are interested in analyzing.

Page 20: NHANES 1999-2004  Analytic Strategies

Sample Weights

Example:

You are interested in examining the association among high triglycerides, blood pressure, and body mass index (BMI), controlling for race/ethnicity, in females aged 20-59, using the 6 years of data from 1999-2004.

Page 21: NHANES 1999-2004  Analytic Strategies

Sample Weights

Step 1 – Determine the smallest sample population for the analysis to determine the correct weight to use.

• Race/ethnicity, gender and age are in the interview.

• Blood pressure and body weight come from the MEC exam, which covers a subset of those interviewed.

• Triglycerides were measured on a subsample of those MEC examined who fasted for 8 hours and came to the AM MEC exam.

• Therefore, the fasting subsample is the smallest subsample in the analysis and you would use the AM fasting weights (WTSAF2YR and WTSAF4YR).
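The reasoning in Step 1 can be sketched as a lookup. This Python example is a simplification that only covers the nested case in this example (interview, then MEC exam, then AM fasting subsample); it does not apply to subsamples that are not nested in each other. The ordering list and the function name are assumptions for illustration, while the weight prefixes WTINT, WTMEC and WTSAF come from the slides.

```python
# Assumed nesting order for this example, from the largest sample to the smallest.
SAMPLE_ORDER = ["interview", "mec", "am_fasting"]
WEIGHT_PREFIX = {"interview": "WTINT", "mec": "WTMEC", "am_fasting": "WTSAF"}


def weight_prefix(sources):
    """Return the weight prefix for the smallest (most nested) sample used."""
    smallest = max(sources, key=SAMPLE_ORDER.index)
    return WEIGHT_PREFIX[smallest]


# Where each analysis variable was collected, following the example on the slide.
variable_source = {
    "race/ethnicity": "interview",
    "gender": "interview",
    "age": "interview",
    "blood pressure": "mec",
    "body weight": "mec",
    "triglycerides": "am_fasting",
}

chosen = weight_prefix(variable_source.values())
print(chosen)  # WTSAF
```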

Page 22: NHANES 1999-2004  Analytic Strategies

Sample Weights

Step 2 – Combine weights in SAS prior to the SUDAAN procedure for the 6 years from 1999-2004:

If sddsrvyr in (1,2) then WEIGHT6 =2/3*WTSAF4YR ; /* 1999-2002 */

If sddsrvyr=3 then WEIGHT6= 1/3*WTSAF2YR ; /* 2003-2004*/

Page 23: NHANES 1999-2004  Analytic Strategies

Sample Weights

Step 3 – Subset your data set in SAS to reflect the weight being used (AM fasting weights WTSAF2YR or WTSAF4YR):

SAS Code:

IF WTSAF2YR ne . or WTSAF4YR ne . ;

Page 24: NHANES 1999-2004  Analytic Strategies

Sample Weights

Step 4 – Last, specify the correct weight with the WEIGHT statement in SUDAAN, and subset your data to the subpopulation of interest (females age 20-59) with the SUBPOPN statement in SUDAAN:

WEIGHT WEIGHT6 ;

SUBPOPN riagendr=2 and ridageyr > 19 and ridageyr < 60 ;
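The difference between Step 3 and Step 4 can be illustrated outside SAS and SUDAAN. In the Python sketch below, records without a fasting weight are dropped from the analysis file, while the subpopulation is only flagged, so that the remaining records stay available to the variance calculation. The records are invented, `None` stands in for a SAS missing value, and the function names are hypothetical.

```python
def has_fasting_weight(rec):
    """Step 3: keep records that belong to the AM fasting subsample."""
    return rec.get("WTSAF2YR") is not None or rec.get("WTSAF4YR") is not None


def in_subpopulation(rec):
    """Step 4: females aged 20-59, the same condition as the SUBPOPN statement."""
    return rec["RIAGENDR"] == 2 and 19 < rec["RIDAGEYR"] < 60


# Hypothetical records, not real NHANES data.
records = [
    {"SEQN": 1, "RIAGENDR": 2, "RIDAGEYR": 35, "WTSAF2YR": None, "WTSAF4YR": 4100.0},
    {"SEQN": 2, "RIAGENDR": 1, "RIDAGEYR": 42, "WTSAF2YR": 5200.0, "WTSAF4YR": None},
    {"SEQN": 3, "RIAGENDR": 2, "RIDAGEYR": 63, "WTSAF2YR": 3900.0, "WTSAF4YR": None},
    {"SEQN": 4, "RIAGENDR": 2, "RIDAGEYR": 28, "WTSAF2YR": None, "WTSAF4YR": None},
]

analysis_file = [r for r in records if has_fasting_weight(r)]
subpop_flags = [in_subpopulation(r) for r in analysis_file]

print([r["SEQN"] for r in analysis_file])
print(subpop_flags)
```

Record 4 is removed because it has no fasting weight; records 2 and 3 stay in the file but are flagged as outside the subpopulation.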

Page 25: NHANES 1999-2004  Analytic Strategies

NHANES 1999-2000 Variance Estimation

Why must you use the sample design to estimate the variance?

• NHANES is a cluster design.

• Individuals within a cluster are more similar to each other than to those in other clusters.

• This homogeneity, or clustering, reduces our effective sample size because we choose individuals within clusters rather than randomly throughout the population.
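A common textbook approximation, which is not taken from these slides, makes the loss of effective sample size concrete: with clusters of equal size m and intra-cluster correlation rho, the design effect is roughly 1 + (m - 1) * rho, and the effective sample size is n divided by that design effect. The Python sketch below uses invented values for n, m and rho.

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Sample size of a simple random sample with the same precision."""
    return n / design_effect(cluster_size, icc)


# Hypothetical values: 3000 people in clusters of 100, modest within-cluster similarity.
deff = design_effect(100, 0.02)
n_eff = effective_sample_size(3000, 100, 0.02)
print(round(deff, 2), round(n_eff))
```

Even a small intra-cluster correlation shrinks the effective sample size considerably when clusters are large, which is why the design cannot be ignored when estimating variance.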

Page 26: NHANES 1999-2004  Analytic Strategies

NHANES 1999-2004 Variance Estimation

Why must you use the sample design to estimate the variance?

• Variance estimates that do not account for this intra-cluster correlation are biased: they are too low.

• Survey software such as SUDAAN or the SAS Survey procedures must be used to account for the complex design and produce unbiased variance estimates.

• These procedures require information on the sample design (i.e. identification of the PSU and strata) for each sample person.

Page 27: NHANES 1999-2004  Analytic Strategies

NHANES 1999-2000 Variance Estimation

For the initial 1999-2000 data release we recommended:

• Using the JK-1 (jackknife, “leave-one-out”) procedure.

• Required 52 replicate weights for each of 52 groups created. Only provided for 1999-2000.

• Can still be used if you have software that can produce the replicate weights.

• Replicate weights for this procedure will no longer be created on the data set.

• Too cumbersome

Page 28: NHANES 1999-2004  Analytic Strategies

NHANES 1999-2004 Variance Estimation

We now recommend:

Using the Taylor series (linearization) method

• Same as that used in NHANES III.

• We now provide “Masked Variance Units” (MVU’s) in place of primary sampling units (PSU’s) to maintain confidentiality.

• The design variables are called SDMVSTRA and SDMVPSU.

Page 29: NHANES 1999-2004  Analytic Strategies

Design Variables

SDMVSTRA and SDMVPSU

• Found in the demographic file.

• Found in all two year data sets and can be combined for 4 or 6 or … year data sets.

• Can be used the same as the actual stratum and PSU variables.

• Produce variance estimates close to those using the “true” design.

• Data MUST be sorted by SDMVSTRA and SDMVPSU first, before using SUDAAN.

Page 30: NHANES 1999-2004  Analytic Strategies

Sample SUDAAN Code

In SAS:

IF WTMEC2YR NE . ; /* include only those with weights */

PROC SORT OUT=Datasort ;

BY SDMVSTRA SDMVPSU ; /* sort on design variables */

SUDAAN code:

PROC Descript DATA=Datasort DESIGN=WR ;

NEST SDMVSTRA SDMVPSU ;

WEIGHT WTMEC2YR ;

SUBPOPN RIDAGEYR > 11 AND RIDAGEYR < 50 AND TOXTEST=1 ;
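To show what a linearization estimate with strata and PSUs involves, the Python sketch below computes a weighted mean and its Taylor-series standard error under a with-replacement design. It is a minimal illustration of the general estimator, not SUDAAN's implementation, and it does not handle subpopulations. The field names `y` and `w` and the toy records are assumptions; SDMVSTRA and SDMVPSU are the design variables named on the slides.

```python
from collections import defaultdict


def linearized_mean(records, y="y", w="w", stratum="SDMVSTRA", psu="SDMVPSU"):
    """Weighted mean and its linearized standard error (with-replacement design)."""
    total_w = sum(r[w] for r in records)
    mean = sum(r[w] * r[y] for r in records) / total_w

    # Linearized score of each record, totalled within each PSU of each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        psu_totals[r[stratum]][r[psu]] += r[w] * (r[y] - mean) / total_w

    variance = 0.0
    for totals in psu_totals.values():
        n_psu = len(totals)
        if n_psu < 2:
            raise ValueError("each stratum needs at least two PSUs")
        center = sum(totals.values()) / n_psu
        # Between-PSU variation within the stratum.
        variance += n_psu / (n_psu - 1) * sum((z - center) ** 2 for z in totals.values())
    return mean, variance ** 0.5


# Toy data: two strata, two PSUs each, all weights equal to 1.
records = [
    {"SDMVSTRA": 1, "SDMVPSU": 1, "w": 1.0, "y": 1.0},
    {"SDMVSTRA": 1, "SDMVPSU": 1, "w": 1.0, "y": 3.0},
    {"SDMVSTRA": 1, "SDMVPSU": 2, "w": 1.0, "y": 2.0},
    {"SDMVSTRA": 1, "SDMVPSU": 2, "w": 1.0, "y": 2.0},
    {"SDMVSTRA": 2, "SDMVPSU": 1, "w": 1.0, "y": 4.0},
    {"SDMVSTRA": 2, "SDMVPSU": 2, "w": 1.0, "y": 0.0},
]

mean, se = linearized_mean(records)
print(mean, se)
```

In this toy data all of the variance comes from stratum 2, where the two PSUs differ; the PSUs in stratum 1 have the same total, so they contribute nothing.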

Page 31: NHANES 1999-2004  Analytic Strategies

Preparing for Analysis

Setting up the procedure in SAS Surveymeans

SAS code :

PROC Surveymeans data=data ;

Strata sdmvstra;

Cluster sdmvpsu;

Weight WTMEC2YR ;

Where RIDAGEYR > 11 AND RIDAGEYR < 50 AND TOXTEST=1 ;

Page 32: NHANES 1999-2004  Analytic Strategies

Other data analysis issues from NHANES

Calculating Population Totals

• Estimates of the number of persons in the U.S. population with a particular condition must be done carefully.

• Recommended procedure is to:

• First, estimate the proportion with the condition for each subdomain of interest.

• Multiply that by the population control totals for that subdomain.

• Tables are available on the NCHS web site with the current March 2001 CPS control totals as part of the analytic guidelines.
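The recommended procedure amounts to a short calculation, sketched here in Python. The proportions and control totals below are invented for illustration; they are not CPS figures, and the subdomain labels are assumptions.

```python
def estimated_total(proportions, control_totals):
    """Sum over subdomains of (estimated proportion x population control total)."""
    return sum(proportions[d] * control_totals[d] for d in proportions)


# Hypothetical estimated proportions with the condition, by subdomain.
proportions = {"males 20-59": 0.10, "females 20-59": 0.05}
# Hypothetical population control totals for the same subdomains.
control_totals = {"males 20-59": 1_000_000, "females 20-59": 1_200_000}

total = estimated_total(proportions, control_totals)
print(round(total))
```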

Page 33: NHANES 1999-2004  Analytic Strategies

Other data analysis issues from NHANES

Calculating Population Totals

• Estimates of number of persons with a condition can be obtained by summing the weights of those positive.

• These estimates will be less reliable due to item non-response and sampling error.

• Not the recommended method.

Page 34: NHANES 1999-2004  Analytic Strategies

Analyzing within NHANES 1999-2004

Things to consider:

• Data released in two year cycles.

• We STRONGLY RECOMMEND using two or more cycles (4 or more years) to produce reliable estimates.

• Verify data items collected were comparable in wording and methods.

• When combining years remember to use correct combined weights.

Page 35: NHANES 1999-2004  Analytic Strategies

Analyzing trends with NHANES

NHANES III to NHANES 1999-2004

Things to consider:

• What is your sample from each survey (for example, age)?

• How differently was the question worded, and how different were the interview methods?

• How different were the lab or exam methodologies? Cutoffs used? Definitions?

• For current NHANES 1999-2004, sample sizes may be smaller depending on the number of years measured, especially in subdomains.

• Larger sampling variation.

• May need to limit comparisons.

Page 36: NHANES 1999-2004  Analytic Strategies

Race/Ethnicity NHANES 1999-2004

Two variables available:

RIDRETH1 & RIDRETH2

Page 37: NHANES 1999-2004  Analytic Strategies

Race/Ethnicity NHANES 1999-2004

RIDRETH1 - Use for analyses of 1999-2004 data alone.

1=Mexican American

2=other Hispanic

3=non-Hispanic white

4=non-Hispanic black

5=other races including multiracial

• For 2 and 4 years of data we know there is insufficient sample size to analyze “other Hispanics” (group 2) alone or to analyze “all Hispanics”.

• Analyses to evaluate whether 6 years of data (1999-2004) are sufficient to analyze these Hispanic groups are ongoing.

• Groups 2 and 5 can AND should continue to be combined to represent all other races.
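Combining groups 2 and 5 is a simple recode, sketched here in Python rather than SAS. The codes are the RIDRETH1 values listed above; the function name and the label text are assumptions for this example.

```python
def race_eth_group(ridreth1):
    """Collapse RIDRETH1 into four analysis groups, combining codes 2 and 5."""
    labels = {
        1: "Mexican American",
        3: "Non-Hispanic white",
        4: "Non-Hispanic black",
    }
    if ridreth1 in (2, 5):
        return "Other"
    return labels.get(ridreth1)  # None for missing or unexpected codes


recoded = [race_eth_group(code) for code in (1, 2, 3, 4, 5)]
print(recoded)
```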

Page 38: NHANES 1999-2004  Analytic Strategies

Race/Ethnicity NHANES 1999-2004

Ridreth2

Use for analyzing trends from NHANES III to NHANES 1999-2004.

Most comparable to race/ethnicity variable collected in NHANES III.

Coded as:

1=non-Hispanic white

2=non-Hispanic black

3=Mexican American

4=other – including Multi-Racial

5=other Hispanic

Page 39: NHANES 1999-2004  Analytic Strategies

Analyzing data from NHANES 1999-2004

Crude versus Age Standardized Estimates:

• Age distributions within survey samples vary by racial/ethnic group.

• Age distributions also vary by survey – NHANES III vs. NHANES 1999-2004.

• When comparing estimates across racial/ethnic groups or between surveys you may need to age standardize.

• Also present all age specific estimates!

Page 40: NHANES 1999-2004  Analytic Strategies

Analyzing data from NHANES 1999-2004

When Age Standardizing:

• For consistency, use the 2000 U.S. Census population as the standard for both NHANES III and all NHANES cycles from 1999-2000 onward.

• For guidelines and population proportions see the website below for the Klein and Schoenborn HP2010 Statistical Notes on “Age Adjustment using the 2000 Projected U.S. Population”.

http://www.cdc.gov/nchs/data/statnt/statnt20.pdf

Page 41: NHANES 1999-2004  Analytic Strategies

Analyzing data from NHANES 1999-2004

When Age Standardizing:

• In SUDAAN, use the STDVAR and STDWGT statements.

• STDVAR – variable name for the age groups.

• STDWGT – corresponding proportion of the 2000 U.S. Census population for that age subgroup.
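The calculation behind direct age standardization is a weighted average of the age-specific estimates, with the standard population proportions as weights. The Python sketch below illustrates the arithmetic only. The age groups, rates and proportions are invented; they are NOT the 2000 U.S. standard population values, which should be taken from the Klein and Schoenborn statistical note.

```python
def age_standardize(rates, std_proportions):
    """Directly standardized estimate: sum of age-specific rate x standard proportion."""
    if set(rates) != set(std_proportions):
        raise ValueError("age groups must match the standard population groups")
    total = sum(std_proportions.values())
    return sum(rates[g] * std_proportions[g] for g in rates) / total


# Hypothetical age-specific prevalences (%) for one racial/ethnic group.
rates = {"20-39": 2.0, "40-59": 4.0, "60+": 8.0}
# Hypothetical standard proportions (placeholders, not the 2000 Census values).
std_proportions = {"20-39": 0.4, "40-59": 0.4, "60+": 0.2}

standardized = age_standardize(rates, std_proportions)
print(round(standardized, 2))
```

Applying the same standard proportions to every group and survey is what makes the standardized estimates comparable.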

Page 42: NHANES 1999-2004  Analytic Strategies

Age standardization for NHANES

Crude vs. Age Standardized Estimates Example: Hepatitis B, NHANES III

                    Non-Hispanic White    Non-Hispanic Black    Mexican American

Crude Prevalence    3.1 (2.6-3.6)         11.9 (10.6-13.2)      3.6 (2.8-4.6)

Age Standardized    2.6 (2.2-3.1)         11.9 (10.7-13.3)      4.4 (3.4-5.6)

Page 43: NHANES 1999-2004  Analytic Strategies

Analyzing Data from NHANES 1999-2004

Analytic Guidelines:

• Detailed guidelines for working with NHANES data can be found at:

http://www.cdc.gov/nchs/nhanes.htm

• This document contains everything discussed today and will continue to grow to include guidelines for statistical tests, multivariate analyses, modeling and more!

• A web-based tutorial is also currently being developed.

• Target date for release is Dec 31st 2006.