NPSAS DAS Training
December 2006
Shefali V. Mehta
Minnesota Office of Higher Education
NPSAS Background
NPSAS 2004 Data
Based on training by:
Lutz Berkner of MPR Associates, Inc.
Tracy Hunt-White and James Griffith of the National Center for Education Statistics
November 2006 Minnesota Office of Higher Education 3
What is the National Postsecondary Student Aid Study?
The National Postsecondary Student Aid Study, or NPSAS, is a nationally representative stratified random sample of undergraduate, graduate, and first-professional students attending postsecondary institutions.
Today’s presentation will focus on the undergraduate sample data: how it was collected, what it contains, and how to access and use it.
NCES’ Recent Surveys: Higher Education Longitudinal and Cross-Sectional Studies
NPSAS (National Postsecondary Student Aid Study) years and their longitudinal follow-ups:
- 1986-87
- 1989-90: BPS follow-ups in 1992 and 1994
- 1992-93: B&B follow-ups in 1994, 1997, and 2003
- 1995-96: BPS follow-ups in 1998 and 2001
- 1999-2000: B&B follow-up in 2001
- 2003-2004: BPS follow-ups in 2006 and 2009
NSOPF: NSOPF:88, NSOPF:93, NSOPF:99, NSOPF:04
Data Sources for the NPSAS 2004
- Central Processing System (CPS) Match
- Institutional Records (CADE)
- Student Interviews
- NSLDS Loan Match
- NSLDS Pell Grant File Match
- ETS File Match
- ACT File Match
NPSAS 2004 Data Collection Timeline
[Timeline figure. Stages, in order: Sample Institutions; Obtain Cooperation; Obtain Lists/Select Student Sample; CPS Matching; Preload; CADE; Student Interviews; Data File Preparation. Dates shown, in order: Aug 2002; Jan – Oct 2003; Jan – July 2004; Mar – Sep 2004; Mar – Dec 2004.]
Products related to NPSAS
- Public Use Data Systems (DASs)
- Methodology Reports describing study design, procedures, and outcomes
- Restricted use research files
- ED Tabs and Descriptive Reports based on analyses of merged data
Using the DAS online
Accessing the NPSAS 2004 Data
What is the DAS?
The Data Analysis System, or DAS, is a software application that produces tables and correlation matrices for NCES datasets.
The DAS, which is available for each NCES dataset, includes:
- Over 1,000 variables with full descriptions
- Statistical information, such as standard errors and the distribution of the data
It is available online through the NCES website:
http://nces.ed.gov/dasol/
DAS Home Page: http://nces.ed.gov/das/
DAS Online: http://nces.ed.gov/dasol/
DAS Online: Select a dataset
DAS Online
NCES Data Usage Agreement
Select “I agree..” to continue to the DAS for NPSAS 2004.
Note: To use DAS Online, you need to enable pop-up windows from this website. The application relies heavily on pop-up windows, such as this usage agreement.
DAS Online Window
Toolbar
DAS Online Window
Subject Category
Topic
Subtopic
DAS Online Window
Variable list
Blue = continuous variable
Green = categorical variable
Red = weight
Available variables
Click on the “view/download list of variables” link to see all available variables.
Locating variables in the NPSAS
There are two ways to select variables. The first is through the drop-down menus available on the main page. The menus are organized in the following categories:
- Frequently Used: Variables
- Aid: Application, Federal, Grants, Institutional, Net Price, Outside, Package, Ratio, State, Total
- Background: Demographics, Family, Residence
- Education: Attendance, Program
- Employment: Description, Employer, Future, Licensure, Status, While Enrolled
- Finances: Income
- Institution: Other, Price, Type
- Parent: Education, Family
- Public Service: Participation
- Survey: Sample, Weights
Locating variables in the NPSAS
The second way to locate a variable is by clicking on the “Search for variable” link on the toolbar. A pop-up search window will appear.
Using the DAS online
Using the Variable Tags
What kind of estimates can the DAS produce?
- Means (including observations = 0)
- Averages (of observations > 0)
- Percent distributions
- Percent positive (or greater than a selected value)
- Percentiles (10th, 25th, 50th, 75th, and 90th), with or without observations = 0
- Medians (the 50th centile)
or
- Correlation matrices
Variable Description Window
Each variable window contains the following:
• a description of the variable
• the sources for the variable
Variable Description Window
And the distribution of the variable.
In this case, 63.2 percent of cases have a value for the total amount received.
The range for this variable is $50-$56,740.
Remember: this information is at the national level; each state has its own distribution.
Select a Tag for the Variable
Click on the “Select a tag” tab to show the tag options available for the variable.
These “tags” tell you the various ways this variable can be represented in your table.
Using the DAS online
Practice exercises to illustrate the tags
NPSAS - Exercise 1
What is the percent distribution of full-time, full-year undergraduates according to degree program and gender, by dependency status, institution sector, aid status, and age?
Find the percentage of full-time, full-year independent male students who attended a public 4-year institution.
Exercise 1 – Breakdown
Run 1 – What is the percent distribution of undergraduates according to degree program, by dependency status and institution sector?
Run 2 – What is the percent distribution of undergraduates according to degree program, by dependency status, institution sector, aid status, and age?
Run 3 – What is the percent distribution of full-time, full-year undergraduates according to degree program and gender, by dependency status, institution sector, aid status, and age?
Tags: Column_Cat
Creates percentages for each category of a variable
Missing values and legitimate skips are not included in any of the categories
Responses coded as “0” are not included
Pertains to categorical variables only
Also applies to: Row_Cat, Span_Cat, By_Cat
Tags: Row_Cat
Similar to Column_Cat
Creates a row of estimates for each category
Responses coded as “0” are not included
Pertains to categorical variables only
Also applies to: Column_Cat, Span_Cat, By_Cat
Tags: Row_Lump
Creates customized categories by grouping existing variable categories
Responses coded as “0” can be included
Legitimate skips can be excluded or included in the new categorization
Allows reordering of existing categories
Pertains to categorical variables only
Also applies to: Column_Lump, Span_Lump, By_Lump
Tags: Row_Cut
Divides a continuous variable into categories by specifying ranges
Creates a row of estimates for each category
Specify beginning cut-point value in each range
Cut-point must be a number with a decimal (e.g., 10.5)
Also applies to: Column_Cut, Span_Cut, By_Cut
Tags: Row_Cut
Range
1: (>= 0.5 and < 18.5)
2: (>= 18.5 and < 23.5)
3: (>= 23.5 and < 29.5)
4: (>= 29.5 up to infinity/max value)
Range
1: (>= -0.5 and < 0.5) includes 0
2: (>= 0.5 up to infinity) at least $1 in aid
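The two sets of ranges above can be mimicked outside the DAS with a small lookup. The following is only a sketch of the idea, not the DAS implementation: the function name `row_cut_category` and the variable names are hypothetical, and it assumes each range includes its beginning cut point and excludes the next one, as the ranges on the slide indicate.

```python
from bisect import bisect_right

def row_cut_category(value, cut_points):
    """Return the 1-based range number that value falls into.

    Range k covers values >= cut_points[k - 1] and < cut_points[k];
    the last range is open-ended. Values below the first cut point
    belong to no range, so None is returned for them.
    """
    index = bisect_right(cut_points, value)  # number of cut points <= value
    return index if index > 0 else None

# Beginning cut points copied from the two examples on the slide.
first_example_cuts = [0.5, 18.5, 23.5, 29.5]
aid_cuts = [-0.5, 0.5]

print(row_cut_category(18, first_example_cuts))  # 1
print(row_cut_category(19, first_example_cuts))  # 2
print(row_cut_category(45, first_example_cuts))  # 4
print(row_cut_category(0, aid_cuts))             # 1 (the range that includes 0)
print(row_cut_category(250, aid_cuts))           # 2 (at least $1 in aid)
```

Using cut points with a decimal, such as 18.5, keeps whole-number values from landing exactly on a boundary.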
Tags: Filter
And_Filter
- Subsets (focuses on) the population of interest
- All conditions have to be met (filters selected) in order for a case to be included
Or_Filter
- Subsets (focuses on) the population of interest
- If any condition is met (filter is selected), the case will be included
Tags: Filter
Integer filter: Limit population to the categories selected.
Cut-point filter: Limit population to those with values greater than or less than a specific point or between two points.
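The difference between the two filter tags comes down to "all conditions" versus "any condition". The sketch below illustrates that logic only; the case records and field names (`attendance`, `total_aid`) are made up for illustration and are not DAS variable names.

```python
def and_filter(case, conditions):
    """Keep the case only if every condition is met."""
    return all(condition(case) for condition in conditions)

def or_filter(case, conditions):
    """Keep the case if at least one condition is met."""
    return any(condition(case) for condition in conditions)

# Hypothetical cases; the field names are illustrative only.
cases = [
    {"id": 1, "attendance": "full-time", "total_aid": 0},
    {"id": 2, "attendance": "full-time", "total_aid": 3200},
    {"id": 3, "attendance": "part-time", "total_aid": 1500},
]

conditions = [
    lambda c: c["attendance"] == "full-time",  # category-style (integer) filter
    lambda c: c["total_aid"] >= 0.5,           # cut-point filter
]

and_ids = [c["id"] for c in cases if and_filter(c, conditions)]
or_ids = [c["id"] for c in cases if or_filter(c, conditions)]
print(and_ids)  # [2]
print(or_ids)   # [1, 2, 3]
```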
Tags: Span_Cat
Uses all of a variable’s categories to group sets of rows in the table
Creates a subtable of estimates for each variable category
Does not provide an overall summary table
Warning: Drastically increases the number of estimates in the table
See also: Span_Cut, Span_Lump
NPSAS - Exercise 2
What percentage of full-time, full-year undergraduates received financial aid by dependency status, institution sector, and age? What was the average amount they received?
Steps:
- Import exercise 1
- Delete Column_Cat and Span_Cat tags
- Delete Row_Cut tag for Total Aid
- Add Percent and Average tags
Tags: Percent>
Defines a column of percentages based on values greater than a specified cut point
Can be used with the Mean and Average>0 tags
Tags: Mean versus Average
Mean will include zeros in the denominator
Average will not include zeros in the denominator
Mean vs. Average
Mean: all respondents, including those with no aid
Average: only respondents who have aid
Tags: By_Cat
Creates a column of Average, Mean, or Percent> estimates for each category of a variable
Can be used with only ONE Mean, Average>0, or Percent> variable
Provides an overall summary column
Will increase the size of your table
See also: By_Cut, By_Lump
Example of By_Cat with Percent>
Percent> yields the percent of FT, full-year UG with aid.
By_Cat generates the percent of FT, full-year UG with aid by degree program.
Ex: 77.4% of FT, full-year UG in a certificate degree program received aid.
Representative Sample States
NPSAS:04 is not designed to be representative at the state level, except for undergraduates attending public 2-year, public 4-year, and private not-for-profit 4-year institutions in 12 specific states.
Use these variables to analyze the representative sample states:
- INSTSAST (NPSAS institution representative sample states)
- INSTSTSE (NPSAS institution representative state sample by sector)
Do not use: INSTSTAT (NPSAS institution state)
Tags: Centile vs. Centile>0
Generates percentile columns from continuous variables
Produces the cut points for the following percentiles: 10th, 25th, 50th, 75th, 90th
Median = the 50th centile -- the value above and below which half of the observations lie
Centile includes zero values
Centile>0 excludes zero values
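To see why the two tags can give different answers, the sketch below computes a percentile with and without zero values. It is a simplified illustration: it ignores case weights and uses the nearest-rank method, which is an assumption and not necessarily how the DAS computes centiles. The aid amounts are made up.

```python
def centile(values, pct, exclude_zeros=False):
    """Nearest-rank percentile of unweighted values (an assumed method)."""
    data = sorted(v for v in values if not (exclude_zeros and v == 0))
    if not data:
        raise ValueError("no observations left after excluding zeros")
    rank = max(1, -(-pct * len(data) // 100))  # ceiling of pct * n / 100
    return data[rank - 1]

aid = [0, 0, 500, 1000, 3000]  # hypothetical aid amounts

print(centile(aid, 50))                      # 500  (like Centile: zeros included)
print(centile(aid, 50, exclude_zeros=True))  # 1000 (like Centile>0: zeros excluded)
```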
Example of Centile>0
Note: Last column shows the percentage of FT, FY undergraduates who received no aid.
Using the DAS online
Saving, modifying and loading files: .tpf files
Saving tables you created
You can save the parameter file for re-use and modification.
Files containing the specifications for tables are called .tpf files, or table parameter files.
After creating a table in the DAS window, click on Save in the toolbar.
The .tpf file will be saved to the location you specify.
Uploading tables to the DAS application
Click on Import in the toolbar.
Locate the .tpf file to be uploaded and upload it.
Note: for the DAS online application to read the file, it must be saved with the extension .tpf.
Once the file is uploaded, it can be altered and run as usual.
Reproducing or modifying tables created by others
You can download and use any parameter file used to create a report or ED Tab from the NCES web site: http://nces.ed.gov/das
.tpf files can be edited in a text editor (such as Notepad or Wordpad), but they must be saved with the .tpf extension (not the default .txt extension).
Using the batch processor
The batch processor allows you to run several .tpf files at once.
You must create an account and log in by clicking on “Batch processor” on the left-hand side of http://nces.ed.gov/dasol/
The files must be added to a .zip file and then uploaded. After uploading the file, copy down your batch number to retrieve your files.
Using the batch processor: rules for naming files
There is one catch with the batch processor: it will not run files unless they have specific names (the DAS itself has no such rules).
All file names (.ZIP/.TPF/.CPF) must fulfill the following requirements:
- Begin with a letter (for example, A, B, C, ... X, Y, Z)
- Contain at least 2 but no more than 8 characters
- Not contain spaces between characters
- Not include symbols or special characters (underscore is allowed)
These guidelines are available on the DAS website:
http://nces.ed.gov/das/das_windows/run_1.asp
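Before zipping a set of files, the naming rules above can be checked with a short script. This is a sketch under two assumptions that the slide does not state: that the 2 to 8 character limit applies to the name without its extension, and that digits are allowed after the first letter. The function name `is_valid_batch_name` is hypothetical.

```python
import os
import re

ALLOWED_EXTENSIONS = {".zip", ".tpf", ".cpf"}

# A letter first, then 1 to 7 more letters, digits, or underscores
# (2 to 8 characters in total, no spaces or other symbols).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{1,7}")

def is_valid_batch_name(filename):
    """Check a file name against the batch processor naming rules."""
    stem, extension = os.path.splitext(filename)
    if extension.lower() not in ALLOWED_EXTENSIONS:
        return False
    return NAME_PATTERN.fullmatch(stem) is not None

for name in ["TABLE1.tpf", "ex_1.ZIP", "1table.tpf", "my tab.tpf", "A.zip"]:
    print(name, is_valid_batch_name(name))
```

The first two names pass; the other three fail because they start with a digit, contain a space, or have only one character.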
Using the DAS online
Sampling and Data Issues
Data sources by percentage
Which sources did NCES use to collect the student data?
Primary sources
- Institution records (CADE): 95%
- Student interviews (CATI): 70%
- Federal aid applications (CPS): 60%
Combinations of primary sources
- All three sources: 40%
- Two sources: 50%
- One source: 10%
Additional sources
- Federal loans and Pell Grants (NSLDS): 50%
Data issues: data collection problems
Data collection problems arose, such as:
- Missing data
  - No source or incomplete sources
  - Data did not exist (EFC, student budgets)
- Discrepancies among sources
  - Timing issues
  - Reporting or data entry errors
  - Students make guesses during interview
- Mismatches
  - Student social security numbers
  - Institution identification numbers
Data issues: addressing the collection problems
Imputation was used to complete missing data or to check inconsistencies. NCES used two types of statistical imputation methods:
- Logical
- Stochastic (hot deck)
Perturbation was used to protect the privacy of individuals. Social security numbers were switched around for individuals.
Reconciliation was used to confirm that the data were okay after imputation and perturbation.
Sample size and weights
15 million undergraduates enrolled in Fall 2003
19 million undergraduates enrolled anytime during the 2003-04 academic year
80,000 undergraduate cases in NPSAS sample: Represent about 1 out of 240 undergraduates
Therefore, average weight for each respondent = about 240
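The "about 240" figure follows directly from the two counts on this slide. A quick check, using the rounded numbers as given:

```python
population = 19_000_000  # undergraduates enrolled at any time during 2003-04
sample_cases = 80_000    # undergraduate cases in the NPSAS sample

average_weight = population / sample_cases
print(average_weight)  # 237.5, which the slide rounds to "about 240"
```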
Sample size and weights (cont.)
Each NPSAS sample case has one record containing about 600 derived variables
Each case has been assigned a weight
The average weight for each case is 240, but there is a wide range of weight values
There is only one weight for each case
In general, the weights are lower for cases in the 12 representative sample states
Why Do The Weights Vary?
Initial sampling rates differ (for the type of institution, type of student, 12 states, etc.)
Non-response weight adjustments: weights need to be adjusted for those who did not respond to certain questions
Poststratification to known totals: the sample weights are adjusted to match known population totals
Smaller sample sizes result in larger weights
Lower institutional/student response rates result in larger weights
Larger weights mean less precision in estimates
An example to illustrate weights
Example       Case weight   Case grant average ($)   Weighted total grant
Case 1                100                    2,000               $200,000
Case 2                200                      500                100,000
Case 3                300                      500                150,000
Case 4                400                        0                      0
Case 5                500                      300                150,000
Case 6                500                    2,000              1,000,000
Total               2,000                                       1,600,000
With grants         1,600

% with grants: 80% (1,600/2,000)
Average grant: $1,000 ($1.6 million/1,600)
Mean grant: $800 ($1.6 million/2,000)
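The three summary figures in this example can be reproduced from the six case weights and grant amounts. The sketch below uses the slide's numbers; the variable names are illustrative.

```python
weights = [100, 200, 300, 400, 500, 500]  # case weights for cases 1 to 6
grants = [2000, 500, 500, 0, 300, 2000]   # grant amount for each case

total_weight = sum(weights)
weight_with_grants = sum(w for w, g in zip(weights, grants) if g > 0)
weighted_total_grant = sum(w * g for w, g in zip(weights, grants))

# Percent with grants: weighted cases with a grant over all weighted cases.
percent_with_grants = 100 * weight_with_grants / total_weight
# Average: zeros are left out of the denominator.
average_grant = weighted_total_grant / weight_with_grants
# Mean: zeros are included in the denominator.
mean_grant = weighted_total_grant / total_weight

print(percent_with_grants, average_grant, mean_grant)  # 80.0 1000.0 800.0
```

Note that the mean equals the average multiplied by the share with grants (0.80 x $1,000 = $800).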
DAS Output Example: Total Grant Amount (TOTGRT)
The weighted N shown in cells is the denominator.
Function               Weighted N in cells (denominator)
Percentage>0 TOTGRT    Total students
Average>0 TOTGRT       Students with grants
Mean TOTGRT            Total students
Small Sample Sizes: Low N in DAS Output
The DAS will produce “low N” instead of an estimate when the sample is too small.
When does this occur? If the denominator has fewer than 30 cases (meaning the sample size is less than 30), the result is suppressed and shown as “low N”.
The rule of thumb in statistics: if the sample size is less than 30, you cannot produce meaningful estimates of the population.
Percentages: The row (denominator) must have 30+ cases
Average>0: The number in the cell (denominator) must have 30+ cases
Small Sample Size Example
                                   Dependents   Independents
% Grants                              20%           80%
  Weighted N shown                    5,000         5,000
  Cases in denom. [not shown]         [100]         [50]
Average grant                         Low N         $400
  Weighted N shown                    Low N         4,000
  # of cases [not shown]              [20]          [40]
Mean grant                            $80           $320
  Weighted N shown                    5,000         5,000
  Cases in denom. [not shown]         [100]         [50]

Note: The weighted N’s do not give an indication of the size of the samples. The number of cases in each category is not shown in the DAS output. Only those with access to the raw data know this information.
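The suppression rule behind this example can be sketched in a few lines. This is an illustration of the rule as described on the slides, not DAS code; the function name `report_estimate` is hypothetical, and the $400 average for dependents is not shown on the slide but is implied by their $80 mean and 20% grant rate.

```python
MIN_CASES = 30  # minimum unweighted cases required in the denominator

def report_estimate(estimate, denominator_cases):
    """Return the estimate, or "Low N" if too few cases support it."""
    return estimate if denominator_cases >= MIN_CASES else "Low N"

# Average grant: the denominator is the number of cases with grants.
print(report_estimate(400, 20))  # Low N (dependents: only 20 cases with grants)
print(report_estimate(400, 40))  # 400   (independents: 40 cases with grants)

# Mean grant: the denominator is all cases in the row.
print(report_estimate(80, 100))  # 80
print(report_estimate(320, 50))  # 320
```

Both groups have the same weighted N of 5,000, so the weighted N alone cannot tell you whether an estimate will be suppressed.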
Poststratification to known totals
Primary weights were adjusted in computer models using 75 control totals to reflect:
National enrollment totals for sectors (9 totals)
National total Pell Grant dollars by sector (9 totals)
National total Stafford loan dollars by sector (9 totals)
12 state Pell dollars by sector (36 totals)
12 state Stafford loan dollar totals (12 totals)
Statistical Analysis
Standard errors and analyzing estimates
Reliability of NPSAS data
Representative data
At the national level
For the three major sectors at the state level
Unlike the Census, NPSAS does not provide data for the whole population, only for a sample of institutions and students.
When analyzing data, the uncertainty and errors related to sample data must be kept in mind.
Standard errors
Standard errors accompany certain statistical estimates, such as percents, averages, and means.
They specify the expected uncertainty in study results.
They reflect the extent to which a study result represents the “true” value in the population.
They are calculated from two general sources of error.
Errors in data
Sampling error occurs due to:
- Random-chance selection of too many of a particular type of student or institution
Measurement error occurs due to:
- Refusal of some students or institutions to participate
- Not all students and institutions providing data for each item
- Respondents answering items differently
- Mistakes in recording and coding responses
Analyzing estimates by assessing their errors
All estimates have some measure of error accompanying them.
There are 2 ways of analyzing the errors in NPSAS data:
- One-sample case: For any given statistic, how representative is the statistic of the population (parameter)?
- Two-sample case (comparing 2 statistics): Do the sample statistics differ enough to conclude that the populations actually differ on the measured characteristic (or parameter)?
One-sample case: confidence intervals
A confidence interval provides a range around the estimate; at a chosen confidence level, this range is expected to contain the population’s true value (parameter).
The wider the confidence interval, the less precise the estimate and the wider the range of plausible population values.
This will be easier to illustrate with an example.
One-sample case: confidence intervals (CI) (cont)
Constructing a CI for the percent of all dependent students in Minnesota who applied for federal aid:

NPSAS institution representative sample states = Minnesota
Dependency status = Dependent

                                    Applied for any aid   Applied for federal aid
                                    (%>0.5)               (%>0.5)
Estimates
  Total                             88.1                  77.6
  Race-ethnicity (with multiple)
    White                           88.4                  77.6
    Minority/non-white              85.5                  77.5
Standard Errors
  Total                             1.20                  1.27
  Race-ethnicity (with multiple)
    White                           1.33                  1.53
    Minority/non-white              3.84                  2.97

To construct a confidence interval with a 95 percent confidence level (meaning that intervals built this way contain the true population value about 95 percent of the time), find the estimate and its standard error.
Multiply the standard error by 1.96:
1.96 * 1.27 = 2.489
Subtract this number from the estimate, and add it to the estimate:
77.6 -/+ 2.489 = (75.111, 80.089)
This is the 95 percent CI for this estimate: if the sample were drawn repeatedly, about 95 percent of the intervals built this way would contain the actual percentage of dependent students in MN who applied for federal aid. Here the interval runs from about 75% to 80%.
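The steps above translate into a few lines of code. This is a minimal sketch using the slide's estimate and standard error; the function name `confidence_interval` is illustrative.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) bounds; z = 1.96 gives a 95 percent interval."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

# Dependent students in Minnesota who applied for federal aid.
low, high = confidence_interval(77.6, 1.27)
print(round(low, 3), round(high, 3))  # 75.111 80.089
```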
One-sample case: confidence intervals (CI) (cont)
The CI for the percent of all dependent students in Minnesota who applied for federal aid:
This interval gives the lower and upper values between which, with 95 percent confidence, we expect the true population characteristic (or parameter) to lie, i.e., the actual percent of dependent students in MN who applied for federal aid is between about 75% and 80%.
[Figure: number line marking the lower bound (75.1%), the estimate (77.6%), and the upper bound (80.1%); axis label: % receiving aid]
Two-sample case: comparing two estimates
Construct CIs to compare the percent of white and minority/non-white dependent students in MN who applied for any aid:

NPSAS institution representative sample states = Minnesota
Dependency status = Dependent

                                    Applied for any aid   Applied for federal aid
                                    (%>0.5)               (%>0.5)
Estimates
  Total                             88.1                  77.6
  Race-ethnicity (with multiple)
    White                           88.4                  77.6
    Minority/non-white              85.5                  77.5
Standard Errors
  Total                             1.20                  1.27
  Race-ethnicity (with multiple)
    White                           1.33                  1.53
    Minority/non-white              3.84                  2.97

Construct a CI with a 95 percent confidence level for each estimate:
The CI for the % of white students who applied for any aid:
88.4 -/+ (1.33*1.96) = (85.8, 91.0)
The CI for the % of minority/non-white students who applied for any aid:
85.5 -/+ (3.84*1.96) = (78.0, 93.0)
Now compare these two CIs: do they overlap? In this case they do, which means we cannot conclude that the difference is statistically significant. When the CIs do not overlap, the difference between the two estimates is statistically significant.
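The overlap comparison can be sketched as follows, using the two estimates and standard errors from the calculation above. The function names are illustrative.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) bounds; z = 1.96 gives a 95 percent interval."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

def intervals_overlap(first, second):
    """True if two (lower, upper) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]

white = confidence_interval(88.4, 1.33)     # about (85.8, 91.0)
minority = confidence_interval(85.5, 3.84)  # about (78.0, 93.0)

print(intervals_overlap(white, minority))  # True
```

Because the intervals overlap, this check does not show a statistically significant difference between the two groups.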
Two-sample case: comparing two estimates
Not only is the difference between these estimates not statistically significant, but we can learn something else from this sample. The large standard error for the minority/non-white estimate indicates that this estimate is less precise. In this case, the sample is small, which reflects the fact that this population in Minnesota is small (thus a larger standard error is to be expected).
[Figure: the two confidence intervals plotted together. White students: 85.8% to 91% (estimate 88.4%). Minority/non-white students: 78% to 93% (estimate 85.5%).]
Two-sample case: another approach for comparing two estimates
Besides constructing CIs, you can use the two-sample t-test. You can do this either by hand using the equation below or by going to the DAS help center and selecting T-tests.
The two-sample t-test uses the estimates and the standard errors:
$$ t = \frac{\text{Estimate}_1 - \text{Estimate}_2}{\sqrt{(\text{Std Error}_1)^2 + (\text{Std Error}_2)^2}} $$
The result of this calculation is compared to 1.96; if its absolute value is larger than 1.96, then the difference between the estimates is statistically significant. In this case,
$$ t = \frac{45.82 - 13.47}{\sqrt{(3.2)^2 + (1.42)^2}} = \frac{32.35}{\sqrt{12.2564}} \approx 9.24 $$
Since 9.24 is larger than 1.96, the difference between these two estimates is statistically significant.

NPSAS institution representative sample states = Minnesota

                                           State grants total (%>0.5)
Estimates
  Total                                    18.71
  Income of dependent student's parents
    < $40,000                              45.82
    $40,000 +                              13.47
Standard Errors
  Total                                    1.08
  Income of dependent student's parents
    < $40,000                              3.2
    $40,000 +                              1.42
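The hand calculation for the two income groups can be checked with a short script. This is a sketch of the test statistic described on this slide (the difference divided by the square root of the sum of squared standard errors); the function name `two_sample_t` is illustrative.

```python
import math

def two_sample_t(estimate1, std_error1, estimate2, std_error2):
    """Difference between two estimates divided by its combined standard error."""
    return (estimate1 - estimate2) / math.sqrt(std_error1 ** 2 + std_error2 ** 2)

# Percent receiving state grants: parents' income < $40,000 vs. $40,000 +.
t = two_sample_t(45.82, 3.2, 13.47, 1.42)
print(round(t, 2))    # 9.24
print(abs(t) > 1.96)  # True, so the difference is statistically significant
```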
Two-sample case: another approach for comparing two estimates
The two sample tests (both the CI comparisons and the two-sample t-test) are meant for comparing two distinct populations (i.e. no overlap).
If the populations overlap, such as if one is a subset of the other (like Minnesota and the U.S.), then the two-sample t-test has a correction factor and the following test statistic is used:
$$ t = \frac{\text{Estimate}_a - \text{Estimate}_b}{\sqrt{SE_a^2 + SE_b^2 - 2\, r_{ab}\, SE_a\, SE_b}} $$
Since the correlation $r_{ab}$ in the last term is not available, we can set this up without that term. Then it looks like the regular two-sample t-test. Note that this test statistic is more conservative than it would be if we had used the correct formulation (when $r_{ab}$ is positive, dropping the last term makes the denominator larger and the test statistic smaller).
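The effect of leaving out the correlation term can be seen numerically. The sketch below uses made-up estimates, standard errors, and a made-up positive correlation, since the slides give no numbers for this case; the function name `t_overlapping` is hypothetical.

```python
import math

def t_overlapping(est_a, se_a, est_b, se_b, r_ab=0.0):
    """Test statistic for two estimates; r_ab = 0 gives the regular formula."""
    variance = se_a ** 2 + se_b ** 2 - 2 * r_ab * se_a * se_b
    return (est_a - est_b) / math.sqrt(variance)

# Hypothetical values: a state estimate compared with a national estimate.
t_without_r = t_overlapping(20.0, 1.5, 17.0, 0.5)         # about 1.90
t_with_r = t_overlapping(20.0, 1.5, 17.0, 0.5, r_ab=0.3)  # about 2.10

print(t_without_r < t_with_r)  # True: dropping the term gives a smaller statistic
```

In this made-up case the statistic without the correlation term falls below 1.96 while the corrected one exceeds it, which is what "more conservative" means in practice.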
The End – Thank you!
For more information, contact:
Tricia Grimes [email protected]
Shefali Mehta [email protected]
Technical support: Aurora D'Amico (NCES) [email protected]
Questions about the NPSAS 2004: Tracy Hunt-White (NCES) [email protected]