28 October 2014
NEC METHODS: MATCHING, DEDUPLICATION, ANALYSIS & RESPONSE RATES 1
Matching & Deduplication 2
Purpose of the Merged Analytic Cross-Region Datasets 3
PIF-ER Merged Dataset
Analyses of the types of trainees who attended particular events
PIF-ER-ACRE Merged Dataset
Analyses of outcomes of AETC training programs related to self-assessed changes in provider behavior and clinical practice
Analytic Dataset Creation Overview 4
1. Collect regional process and evaluation data
2. Convert data from the submitted format (Excel, CSV, SPSS) to SAS
3. Reformat regional datasets to match expected data file specifications (e.g., character/numeric type)
   Process data: HRSA data manual
   Evaluation data: ACRE implementation manual
4. Create all-region ER, PIF, ACRE IP, ACRE FUP, and FTCC PIF datasets by concatenating/appending regional files of the same type
5. Create analytic PIF-ER merged dataset
6. Create analytic PIF-ER-ACRE datasets
Cross-Region Analytic Data 5
Steps 1, 2, 3, 4: Collect, convert, reformat data. Create all-region ER, PIF, ACRE IP and FUP datasets.
Step 5: Create analytic ER-PIF dataset
Step 6: Create analytic ER-PIF-ACRE dataset
Creating the Analytic PIF-ER Merged Dataset 6
Check to see which regions have repeats on PROG_ID by LPS
Merge PIF and ER
For 1-2 regions with repeated PROG_ID, sort and merge the PIF and ER by AETC, LPS, and PROG_ID
For all other regions that have distinct PROG_ID, sort and merge the PIF and ER by AETC and PROG_ID only
Bottom of PIF: PROG_ID, AETC, LPS
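The region-dependent merge keys described on this slide can be sketched in Python (the NEC workflow itself runs in SAS, so this is only an illustration). The function names, the `has_pif` flag, and the sample rows are hypothetical, and the sketch assumes the merge is driven by the ER so that ERs without PIFs are kept.

```python
from collections import defaultdict

def regions_with_repeated_prog_id(er_rows):
    """Return AETC codes where one PROG_ID appears under more than one LPS."""
    lps_seen = defaultdict(set)
    for er in er_rows:
        lps_seen[(er["AETC"], er["PROG_ID"])].add(er["LPS"])
    return {aetc for (aetc, _prog), lps in lps_seen.items() if len(lps) > 1}

def merge_key(row, repeat_regions):
    """AETC + LPS + PROG_ID for regions with repeats, else AETC + PROG_ID only."""
    lps = row["LPS"] if row["AETC"] in repeat_regions else None
    return (row["AETC"], lps, row["PROG_ID"])

def merge_pif_er(pif_rows, er_rows, repeat_regions):
    """Attach each PIF to its ER; ERs with no PIF are kept with has_pif=False."""
    pifs_by_key = defaultdict(list)
    for pif in pif_rows:
        pifs_by_key[merge_key(pif, repeat_regions)].append(pif)
    merged = []
    for er in er_rows:
        matches = pifs_by_key.get(merge_key(er, repeat_regions), [])
        if not matches:
            merged.append({**er, "has_pif": False})
        for pif in matches:
            merged.append({**er, **pif, "has_pif": True})
    return merged

# Made-up rows: region 13 reuses PROG_ID 100 across two LPS, region 4 does not.
er = [
    {"AETC": 13, "LPS": "A", "PROG_ID": 100, "topic": "x"},
    {"AETC": 13, "LPS": "B", "PROG_ID": 100, "topic": "y"},
    {"AETC": 4, "LPS": "C", "PROG_ID": 100, "topic": "z"},
    {"AETC": 4, "LPS": "C", "PROG_ID": 200, "topic": "w"},
]
pif = [
    {"AETC": 13, "LPS": "B", "PROG_ID": 100, "PIF_ID": 4041234},
    {"AETC": 4, "LPS": "C", "PROG_ID": 100, "PIF_ID": 1151234},
]
repeats = regions_with_repeated_prog_id(er)
merged = merge_pif_er(pif, er, repeats)
```

Without LPS in the key, the region 13 PIF would match both events that share PROG_ID 100.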
Creating the Analytic PIF-ER-ACRE Merged Dataset (1) 7
Select eligible ACRE IP data
Check to see which regions have repeats on PROG_ID by LPS
Exclude records where all 4 IP questions are missing/blank
Exclude records where the PIF_ID is . [missing], 0, or 99999999
De-duplicate IP records by AETC, LPS (if applicable), PROG_ID, PIF_ID, AIP1, AIP2
Select eligible records from the previously created ER-PIF merged dataset
Include only records with at least 1 associated PIF record (some ERs have no PIFs)
Exclude records where the PIF_ID is . [missing], 0, or 99999999
Cont’d
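A minimal sketch of the ACRE IP selection rules above, in Python for illustration. The variable names AIP3 and AIP4 are assumptions (the slide names only AIP1 and AIP2), and the sketch keeps the first record of each duplicate group, which the slide does not specify.

```python
INVALID_PIF_CODES = {None, 0, 99999999}          # . [missing], 0, 99999999
IP_QUESTIONS = ("AIP1", "AIP2", "AIP3", "AIP4")  # AIP3/AIP4 are assumed names

def select_eligible_ip(ip_rows, use_lps=False):
    """Drop blank or invalid-ID records, then de-duplicate on the key variables."""
    seen, kept = set(), []
    for row in ip_rows:
        # Exclude records where all 4 IP questions are missing/blank.
        if all(row.get(q) in (None, "") for q in IP_QUESTIONS):
            continue
        # Exclude records where the PIF_ID is missing, 0, or 99999999.
        if row.get("PIF_ID") in INVALID_PIF_CODES:
            continue
        key = (row["AETC"], row["LPS"] if use_lps else None, row["PROG_ID"],
               row["PIF_ID"], row.get("AIP1"), row.get("AIP2"))
        if key in seen:  # duplicate on AETC, LPS, PROG_ID, PIF_ID, AIP1, AIP2
            continue
        seen.add(key)
        kept.append(row)
    return kept

base = {"AETC": 4, "LPS": "C", "PROG_ID": 100, "AIP3": None, "AIP4": None}
ip = [
    {**base, "PIF_ID": 4041234, "AIP1": 4, "AIP2": 5},
    {**base, "PIF_ID": 4041234, "AIP1": 4, "AIP2": 5},        # exact duplicate
    {**base, "PIF_ID": 99999999, "AIP1": 3, "AIP2": 3},       # invalid code
    {**base, "PIF_ID": 1151234, "AIP1": None, "AIP2": None},  # all blank
]
eligible_ip = select_eligible_ip(ip)
```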
Creating the Analytic PIF-ER-ACRE Merged Dataset (2) 8
Sort the ER-PIF and the ACRE IP data by AETC, LPS (if applicable), PROG_ID, PIF_ID. The ER-PIF dataset is further sorted by PIFDATE
Merge the ER-PIF and ACRE IP data by AETC, LPS, PROG_ID, PIF_ID to create the ER-PIF-IP dataset
De-duplicate the data based on the key variables AETC, LPS (if applicable), PROG_ID, PIF_ID [Note: this deletes <200 records]
Sort the all-region ACRE FUP by AETC, LPS (if applicable), PROG_ID, PIF_ID
Sort the previously created ER-PIF-IP dataset by AETC, LPS (if applicable), PROG_ID, PIF_ID
Merge the ER-PIF-IP with the ACRE FUP by these key variables
Restrict the analytic dataset to records with a valid, non-missing PIF_ID with a PIF available [Note: approx. 20K records removed]
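The sort, merge, de-duplicate, and restrict sequence can be sketched as follows. This is an illustration only: the `has_pif`, `has_ip`, and `has_fup` flags are hypothetical, the sketch keeps the earliest PIFDATE within a duplicate key (the slide does not say which record survives), and it applies the final restriction up front, which gives the same result for the retained records.

```python
INVALID_PIF_CODES = {None, 0, 99999999}

def link_key(row, use_lps=False):
    """Key variables: AETC, LPS (if applicable), PROG_ID, PIF_ID."""
    return (row["AETC"], row["LPS"] if use_lps else None,
            row["PROG_ID"], row["PIF_ID"])

def build_er_pif_acre(er_pif, ip, fup, use_lps=False):
    ip_by_key = {link_key(r, use_lps): r for r in ip}
    fup_by_key = {link_key(r, use_lps): r for r in fup}
    # Restrict to records with a PIF available and a non-missing, non-placeholder ID.
    eligible = [r for r in er_pif
                if r.get("has_pif") and r.get("PIF_ID") not in INVALID_PIF_CODES]
    # Sort by the key variables, then by PIFDATE (ISO date strings assumed).
    eligible.sort(key=lambda r: (link_key(r, use_lps), r["PIFDATE"]))
    merged, seen = [], set()
    for row in eligible:
        key = link_key(row, use_lps)
        if key in seen:  # de-duplicate on the key variables
            continue
        seen.add(key)
        merged.append({**row, **ip_by_key.get(key, {}), **fup_by_key.get(key, {}),
                       "has_ip": key in ip_by_key, "has_fup": key in fup_by_key})
    return merged

er_pif = [
    {"AETC": 4, "PROG_ID": 100, "PIF_ID": 4041234, "PIFDATE": "2012-03-01", "has_pif": True},
    {"AETC": 4, "PROG_ID": 100, "PIF_ID": 4041234, "PIFDATE": "2012-03-02", "has_pif": True},
    {"AETC": 4, "PROG_ID": 100, "PIF_ID": 0, "PIFDATE": "2012-03-01", "has_pif": True},
    {"AETC": 4, "PROG_ID": 200, "PIF_ID": None, "PIFDATE": None, "has_pif": False},
]
ip = [{"AETC": 4, "PROG_ID": 100, "PIF_ID": 4041234, "AIP1": 4}]
analytic = build_er_pif_acre(er_pif, ip, fup=[])
```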
PIF ID 9
PIF ID is available on the PIF, ACRE IP, and ACRE FUP data
Though not on the ER form, the Program ID on the PIF and ER allows PIF IDs to be associated with events
PIF ID used for matching
Across training events (repeat trainees)
Across evaluation forms (ACRE IP and FUP)
PIF_ID = month of birth + day of birth + last 4 digits of SSN
NEC valid PIF ID algorithm 10
Valid PIF ID contains:
Valid month of birth (1-12)
Valid day of birth (1-31)
Valid last 4 digits of SSN (≥1 and not 9999)
Valid PIF ID is a numeric value <99999999
Examples of invalid PIF IDs:
99999999
0
. [missing]
12345678
04049999
1122420932
Records with invalid PIF IDs are excluded from regression analyses
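The validity rules can be sketched as a small Python check. It assumes the ID is stored as a number laid out as MMDDSSSS, so a leading zero in the month is dropped (April 4 with SSN digits 1234 is stored as 4041234). That layout is inferred from the composition of the PIF ID and the invalid examples above, not confirmed by the slides.

```python
def is_valid_pif_id(pif_id):
    """Return True if a numeric PIF ID passes the month/day/SSN checks."""
    if pif_id is None or isinstance(pif_id, bool) or not isinstance(pif_id, int):
        return False
    if pif_id <= 0 or pif_id >= 99999999:
        return False
    month, rest = divmod(pif_id, 1_000_000)  # leading digits: month of birth
    day, ssn4 = divmod(rest, 10_000)         # next two: day; last four: SSN
    return 1 <= month <= 12 and 1 <= day <= 31 and 1 <= ssn4 <= 9998

# The invalid examples listed on the slide, plus one made-up valid ID.
invalid_examples = [99999999, 0, None, 12345678, 4049999, 1122420932]
results = [is_valid_pif_id(x) for x in invalid_examples]
```

Here 12345678 fails because the day would be 34, and 04049999 fails because the SSN digits are 9999.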
De-Duplication Examples 11
For overall ACRE regression analyses:
ER-PIF-ACRE dataset restricted to records with a valid PIF ID and with a linked PIF
Restricted dataset sorted by combined AETC region, PIF ID, eligibility for ACRE IP, having associated IP record, and PIF date
Last record is outputted
For MAI ACRE regression analyses, similar:
ER-PIF-ACRE dataset restricted to records with a valid PIF ID and with a linked PIF
Restricted ER-PIF-ACRE dataset sorted by combined AETC region, PIF ID, having an MAI training record, eligibility for ACRE IP, having associated IP record, and PIF date
Last record is outputted
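A sketch of the sort-and-keep-last step for the overall ACRE analysis. It assumes "last record" means the last record within each combined region and PIF ID after sorting. The field names `region`, `ip_eligible`, and `has_ip` are hypothetical.

```python
def last_record_per_trainee(rows):
    """Sort by region, PIF ID, IP eligibility, IP link, PIF date; keep the last."""
    ordered = sorted(rows, key=lambda r: (r["region"], r["PIF_ID"],
                                          r["ip_eligible"], r["has_ip"],
                                          r["PIFDATE"]))
    last = {}
    for row in ordered:
        last[(row["region"], row["PIF_ID"])] = row  # later rows overwrite earlier
    return list(last.values())

rows = [
    {"region": "PAMA", "PIF_ID": 4041234, "ip_eligible": 1, "has_ip": 1, "PIFDATE": "2012-01-10"},
    {"region": "PAMA", "PIF_ID": 4041234, "ip_eligible": 0, "has_ip": 0, "PIFDATE": "2012-06-30"},
    {"region": "Delta", "PIF_ID": 1151234, "ip_eligible": 1, "has_ip": 0, "PIFDATE": "2012-02-01"},
]
kept = last_record_per_trainee(rows)
```

Because the eligibility and IP flags sort ahead of the PIF date, an eligible record with a linked IP is retained even when the trainee later attended a non-eligible event.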
Recoding & Analysis 12
Eligible Records for ACRE Regression Analyses 13
Last eligible record among repeat trainees is used
“Eligible” means the PIF_ID is not an invalid code according to the NEC algorithm and there is truly an associated PIF in the linked dataset
Analytic population includes:
For IP: targeted IP trainee (i.e., attended Level 1, 2, or 3 training), who has an associated PIF and IP record, and is a direct HIV provider (PIF13=1)
For FUP: targeted FUP trainee (i.e., attended Level 2 training and topic included clinical management [ER4_1-16] or prevention and behavior change [ER4_29-31] topics), who has an associated PIF and FUP record, and is a direct HIV provider (PIF13=1)
ACRE IP Eligible Trainings 14
ACRE immediate post questions asked immediately after training event
Event Record form: ER9_1>0 -OR- ER9_2>0 -OR- ER9_3>0
ACRE FUP Eligible Trainings 15
ACRE follow-up asked 6 weeks after training through a web-based survey
Event Record form: ER9_2>0 -AND- ANY of (ER4_1=1 or ER4_2=1 or etc. … or ER4_31=1)
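The IP and FUP targeting rules can be sketched as two flags on an event record. The FUP topic list uses the ranges from the analytic population definition (ER4_1-16 and ER4_29-31), which is an assumption about what the elided list on the form diagram covers.

```python
# Assumed FUP topic variables: clinical management (ER4_1-16) and
# prevention and behavior change (ER4_29-31).
FUP_TOPICS = [f"ER4_{i}" for i in list(range(1, 17)) + [29, 30, 31]]

def ip_targeted(er):
    """Level 1, 2, or 3 training: ER9_1>0 or ER9_2>0 or ER9_3>0."""
    return any((er.get(f"ER9_{level}") or 0) > 0 for level in (1, 2, 3))

def fup_targeted(er):
    """Level 2 training and at least one targeted topic."""
    return (er.get("ER9_2") or 0) > 0 and any(er.get(t) == 1 for t in FUP_TOPICS)

level2_clinical = {"ER9_2": 1.5, "ER4_3": 1}
level1_only = {"ER9_1": 2, "ER4_3": 1}
level2_other_topic = {"ER9_2": 1, "ER4_20": 1}
```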
FY 11/12 AETC Cross-Region Trainees in IP Analyses 16
Data source: cross-region ER-PIF and ACRE IP FY11-12.
N = 72,642: ACRE IP records received by NEC
N = 108,687: FY 11-12 trainees (based on linked AETC PIF and ER)
n = 45,452: linked ER-PIF-ACRE IP
n = 42,465: linked records and a targeted IP training
n = 2,987: linked records and NOT a targeted IP training
n = 30,331: linked records, IP targeted, and trainee’s last record in FY 11-12
N = 108,687 excludes n = 2,459 event records without an associated PIF and n = 5,736 records with an invalid PIF ID. This number includes repeat trainees.
Though n = 93,756 records fulfilled the IP target criteria, only n = 42,465 (45.3%) were ER-PIF-IP records that both linked and fulfilled the target.
Of the n = 30,331 last records per trainee, n = 15,979 (52.7%) indicated they were direct HIV providers on the PIF.
FY 11/12 AETC Cross-Region Trainees in FUP Analyses 17
Data source: cross-region ER-PIF and ACRE FUP FY11-12.
N = 3,847: ACRE FUP records received by NEC
N = 108,687: FY 11-12 trainees (based on linked AETC PIF and ER)
n = 2,620: linked ER-PIF-ACRE FUP
n = 2,018: linked records and a targeted FUP training
n = 602: linked records and NOT a targeted FUP training
n = 1,707: linked records, FUP targeted, and trainee’s last record in FY 11-12
N = 108,687 excludes n = 2,459 event records without an associated PIF and n = 5,736 records with an invalid PIF ID. This number includes repeat trainees.
Though n = 61,647 records fulfilled the FUP target criteria, only n = 2,018 (3.3%) were ER-PIF-FUP records that both linked and fulfilled the target.
Of the n = 1,707 last records per trainee, n = 1,014 (59.4%) indicated they were direct HIV providers on the PIF and FUP survey.
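The counts and percentages on the IP and FUP flow slides can be cross-checked with a few lines of arithmetic. The direct-provider percentages match only when the denominator is the last-record count (30,331 for IP, 1,707 for FUP), which is an inference from the numbers rather than something the slides state.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported on the slides."""
    return round(100 * numerator / denominator, 1)

ip_checks = {
    "linked = targeted + not targeted": 42465 + 2987 == 45452,
    "linked and targeted / all targeted": pct(42465, 93756),  # reported 45.3
    "direct providers / last records": pct(15979, 30331),     # reported 52.7
}
fup_checks = {
    "linked = targeted + not targeted": 2018 + 602 == 2620,
    "linked and targeted / all targeted": pct(2018, 61647),   # reported 3.3
    "direct providers / last records": pct(1014, 1707),       # reported 59.4
}
```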
Analytic Variables 18
Regression models have included the following predictors:
Big 6
Worked in Ryan White funded setting
Minority provider
Minority serving
Provider experience
HIV+ clients per month
Repeat trainee
All of the above predictors come directly from the PIF except for Repeat trainee status, which is based on the linked PIF-ER
Regression models are restricted to direct providers to HIV+ clients
ACRE FUP web survey is targeted to direct providers
Analytic Variable: Clinical Providers “BIG 6” 19
Comes from PIF question 3
Clinical providers encompass 7 professional categories, though we often refer to them as “big 6”
All other non-missing responses are coded as non-clinical providers
Participant Information Form: PIF3 (mutually exclusive)
Analytic Variable: Ryan White-Funded 20
From the RWFUND administrative variable on the bottom of the PIF
Exceptions apply: some regions have advised the NEC to use PIF8A for this information
Participant Information Form
RWFUND
=1 =0
=1 =0 =9
PIF8A
Analytic Variable: Minority Provider 21
A minority provider is Hispanic, multiracial, AI/AN, Asian, Native Hawaiian or Pacific Islander, or Black
A non-minority provider is a non-Hispanic White provider with only a single race indicated
Those without any race indicated are left as missing
Participant Information Form: PIF9 (=0, =1; mutually exclusive); PIF10_1, PIF10_2, PIF10_3, PIF10_4, PIF10_5 (not mutually exclusive)
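A sketch of the minority provider recode. The slides do not say which PIF10 item is White, so `white_var` is a hypothetical parameter with an arbitrary default, and PIF9=1 is assumed to mean Hispanic. Where no race is indicated, the result is left missing regardless of PIF9, following the rule as written.

```python
RACE_VARS = ("PIF10_1", "PIF10_2", "PIF10_3", "PIF10_4", "PIF10_5")

def minority_provider(pif, white_var="PIF10_5"):
    """Return 1 (minority), 0 (non-minority), or None (missing)."""
    races = [v for v in RACE_VARS if pif.get(v) == 1]
    if not races:
        return None  # no race indicated: left as missing
    hispanic = pif.get("PIF9")
    if hispanic == 1 or len(races) > 1 or races != [white_var]:
        return 1     # Hispanic, multiracial, or a single non-White race
    if hispanic == 0:
        return 0     # non-Hispanic, White only
    return None      # White only, but ethnicity missing (assumed missing)

coded = [
    minority_provider({"PIF9": 0, "PIF10_5": 1}),
    minority_provider({"PIF9": 1, "PIF10_5": 1}),
    minority_provider({"PIF9": 0, "PIF10_1": 1, "PIF10_5": 1}),
    minority_provider({"PIF9": 0, "PIF10_2": 1}),
    minority_provider({"PIF9": 1}),
]
```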
Analytic Variable: Minority Serving 22
Among providers with direct service experience to HIV-infected clients (PIF12_1=1 and PIF13=1):
“Minority serving” (i.e., serves greater than half minorities): PIF12B = 3 or 4
Not minority serving (i.e., serves fewer than half minorities): PIF12B = 0, 1, or 2
Participant Information Form: PIF12_2 (=0, =1, =2, =3, =4)
Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1
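The minority serving recode, sketched with the skip pattern applied first. Returning None for providers outside the skip pattern is an assumption about how out-of-scope records are handled.

```python
def minority_serving(pif):
    """Return 1 if PIF12B is 3 or 4, 0 if it is 0, 1, or 2, else None."""
    if pif.get("PIF12_1") != 1 or pif.get("PIF13") != 1:
        return None  # question does not apply (skip pattern)
    answer = pif.get("PIF12B")
    if answer in (3, 4):
        return 1
    if answer in (0, 1, 2):
        return 0
    return None      # missing or out-of-range response

direct = {"PIF12_1": 1, "PIF13": 1}
flags = [minority_serving({**direct, "PIF12B": v}) for v in (0, 1, 2, 3, 4)]
```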
Analytic Variable: Provider Experience 23
Among providers with direct service experience to
HIV-infected clients (PIF12_1=1 and PIF13=1):
Novice: 0 to <2 years of experience
New: 2 to <3 years of experience
Experienced: 3 or more years of experience
Participant Information Form: PIF14 = continuous numeric variable
Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1
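The experience cut points can be sketched as a recode of PIF14. Treating negative or missing values as missing is an assumption.

```python
def provider_experience(pif):
    """Categorize years of experience (PIF14) for direct HIV providers."""
    if pif.get("PIF12_1") != 1 or pif.get("PIF13") != 1:
        return None  # skip pattern: question does not apply
    years = pif.get("PIF14")
    if years is None or years < 0:
        return None
    if years < 2:
        return "Novice"       # 0 to <2 years
    if years < 3:
        return "New"          # 2 to <3 years
    return "Experienced"      # 3 or more years

direct = {"PIF12_1": 1, "PIF13": 1}
levels = [provider_experience({**direct, "PIF14": y}) for y in (0, 1.9, 2, 2.5, 3, 20)]
```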
Analytic Variable: HIV+ Clients per Month 24
Categories for HIV+ clients per month:
0/month: PIF13 = 0 (No direct HIV+ services provided) or PIF15 = 0
1-19/month: PIF15 = 1 or 2
20+/month: PIF15 = 3 or 4
Participant Information Form: PIF15 (=0, =1, =2, =3, =4)
Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1
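The three-level clients-per-month variable, sketched from the categories above. Note that PIF13 = 0 maps to the lowest category even though PIF15 is skipped for those providers.

```python
def hiv_clients_per_month(pif):
    """Collapse PIF13 and PIF15 into 0, 1-19, or 20+ HIV+ clients per month."""
    if pif.get("PIF13") == 0:
        return "0/month"      # no direct HIV+ services provided
    answer = pif.get("PIF15")
    if answer == 0:
        return "0/month"
    if answer in (1, 2):
        return "1-19/month"
    if answer in (3, 4):
        return "20+/month"
    return None               # missing or out-of-range response

categories = [hiv_clients_per_month({"PIF13": 1, "PIF15": v}) for v in range(5)]
```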
Special Initiatives 25
Repeat Trainees 26
Repeat trainee status is relative to the last eligible record during the analysis period
An individual who attended multiple AETC trainings with only 1 MAI training would not be categorized as a repeat trainee in an MAI analysis, since the last eligible MAI training record is the first and only MAI training
However, this same individual would be considered a repeat trainee for a cross-region analysis during this time period
A trainee is considered non-unique if s/he has the same PIF ID within a combined AETC region (e.g., AETC 13, 39, and 51 are considered the combined PAMA region)
Assumption: An individual took trainings within one region only. For example, a trainee who moved from CA to NY with training records in both regions would be counted as two separate individuals in the cross-site data.
Repeat Trainees – Combined AETC Codes 27
Regional AETC Name; Combined AETC Codes
Delta; 1, 30
Florida/Caribbean; 2, 31, 57, 61
Midwest; 4, 32
Mountain Plains; 5, 33, 56
New England; 8, 35
NY/NJ; 10, 36
Northwest; 11, 37, 52
Pacific; 12, 38, 50, 68
PAMA; 13, 39, 51
Southeast; 15, 40, 58
TX/OK; 16, 41
Repeat Trainees- Example 28
If PIF_ID 12345678 were truly a valid ID and the records below are all event data for this trainee in the fiscal year, sorted by event date:
PIF_ID; AETC; Funding Type; Training event (any type) during analytic period
12345678; 13; MAI; 1
12345678; 13; Base; 2
12345678; 39; CDC testing; 3
12345678; 13; Base; 4
In an MAI analysis, the latest MAI record would be retained. This trainee is not a repeat trainee for the MAI analysis.
In an overall analysis, this trainee is a repeat trainee. The fourth training record is retained for the analysis.
Notes: AETC=39 is grouped with AETC=13 for the region PA/MA. Repeat trainee analyses are coupled with the de-duplication process.
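The example can be reproduced with a short sketch that maps AETC codes to combined regions and then groups by region and PIF ID. The function name and record layout are hypothetical. The region map is taken from the combined AETC codes table.

```python
REGIONS = {
    "Delta": (1, 30), "Florida/Caribbean": (2, 31, 57, 61), "Midwest": (4, 32),
    "Mountain Plains": (5, 33, 56), "New England": (8, 35), "NY/NJ": (10, 36),
    "Northwest": (11, 37, 52), "Pacific": (12, 38, 50, 68), "PAMA": (13, 39, 51),
    "Southeast": (15, 40, 58), "TX/OK": (16, 41),
}
CODE_TO_REGION = {code: name for name, codes in REGIONS.items() for code in codes}

def summarize_trainees(records, funding=None):
    """Per region and PIF ID: which event is retained, and repeat trainee status."""
    groups = {}
    for rec in sorted(records, key=lambda r: r["event"]):
        if funding is not None and rec["funding"] != funding:
            continue  # e.g. an MAI analysis only counts MAI records
        key = (CODE_TO_REGION[rec["AETC"]], rec["PIF_ID"])
        groups.setdefault(key, []).append(rec)
    return {key: {"retained_event": recs[-1]["event"],
                  "repeat_trainee": len(recs) > 1}
            for key, recs in groups.items()}

records = [
    {"PIF_ID": 12345678, "AETC": 13, "funding": "MAI", "event": 1},
    {"PIF_ID": 12345678, "AETC": 13, "funding": "Base", "event": 2},
    {"PIF_ID": 12345678, "AETC": 39, "funding": "CDC testing", "event": 3},
    {"PIF_ID": 12345678, "AETC": 13, "funding": "Base", "event": 4},
]
overall = summarize_trainees(records)
mai_only = summarize_trainees(records, funding="MAI")
```

In the overall analysis the four records collapse to one PAMA trainee flagged as a repeat trainee, with event 4 retained. In the MAI analysis only event 1 remains, so the trainee is not a repeat trainee.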
MAI Initiative Events 29
We identified data to include by limiting records to those identified as MAI on the ER:
Event Record form: ER5_3=1 (not mutually exclusive)
HIV Testing Events 30
We identified data to include by limiting records to those identified as “HIV Testing” on the ER and through the code associated with CDC funding used by AETC Regions:
Event Record form: ER4_7=1 -OR- ER4_31=1 -OR- AETC = 30-41 (CDC testing code)
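The two special-initiative filters can be sketched as flags on an event record. The function names are hypothetical. Because funding types on the ER are not mutually exclusive, an event can satisfy both.

```python
def is_mai_event(er):
    """MAI events: identified as MAI on the ER (ER5_3=1)."""
    return er.get("ER5_3") == 1

def is_hiv_testing_event(er):
    """HIV testing events: ER4_7=1, or ER4_31=1, or a CDC testing AETC code (30-41)."""
    aetc = er.get("AETC")
    cdc_code = isinstance(aetc, int) and 30 <= aetc <= 41
    return er.get("ER4_7") == 1 or er.get("ER4_31") == 1 or cdc_code

events = [
    {"AETC": 13, "ER5_3": 1, "ER4_7": 1},  # MAI and HIV testing topic
    {"AETC": 39, "ER5_3": 0},              # CDC testing code only
    {"AETC": 13, "ER5_3": 0, "ER4_2": 1},  # neither
]
mai_flags = [is_mai_event(e) for e in events]
testing_flags = [is_hiv_testing_event(e) for e in events]
```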
ACRE Rescaled Outcomes 31
For ease of interpretation, all outcome responses were rescaled from 1-5 to 0-100 so that the results could be interpreted as percent change:
Original Scale; IP and FUP Meanings; New Scale
1; “Novice”, “Poor”, “Disagree Strongly”, “Strongly Disagree”; 0
2; “Disagree”; 25
3; “Neither Agree or Disagree”; 50
4; “Agree”; 75
5; “Expert”, “Excellent”, “Agree Strongly”, “Strongly Agree”; 100
Original scale values of 0 or >5 are recoded to missing. Decimal values between 1 and 5 are rounded down to a whole number.
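The rescaling amounts to new = (original - 1) × 25 after rounding down. A minimal sketch follows. Treating values between 0 and 1 as missing is an assumption, since the slide only mentions 0 and values above 5.

```python
import math

def rescale_outcome(value):
    """Map a 1-5 response to 0-100; out-of-range values become missing (None)."""
    if value is None or value < 1 or value > 5:
        return None
    return (math.floor(value) - 1) * 25  # decimals are rounded down first

rescaled = [rescale_outcome(v) for v in (1, 2, 3, 4, 5, 4.7, 0, 6, None)]
```

A one-step change on the original scale therefore corresponds to 25 points on the new scale.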
Response Rates (ACRE-FUP) 32
Response Rates - Background 33
Over a wide range of disciplines, email response rates average 20-30%
Factors hypothesized to influence response rates
Number of questions
Pre-notification
Follow-up
Salience
2013 Response Rates* 34
2013 response rates: 5% - 64%, avg: 30%
Top responders:
University of Hawaii (Pacific) – 64.1%
YVFWC (Northwest) – 47.7%
UNC Chapel Hill (SEATEC) – 42.5%
AARTH (Northwest) – 42.1%
Indiana (MATEC) – 38.5%
*Response rates by LPS for VF users with a minimum of 20 total participants
2014 Response Rates*,** 35
2014 response rates: 11% - 61%, avg: 27%
Top responders:
UK (SEATEC) – 60.5%
USC (SEATEC) – 55.8%
AZ AIDS ETC (Pacific) – 47.4%
SPIPA (Northwest) – 44.0%
Pittsburgh (PA/MA) – 41.0%
*Response rates by LPS for VF users with a minimum of 20 total participants
**Response rates through October 1, 2014
Response Rates 36
LPS with >1 event per year have higher response rates: 31% vs 23%
Average response rate for LPS with 10+ events: 35%
Average response rate for LPS with 50+ attendees/event: 28%
Average response rate for LPS with <20 attendees/event: 35%
Response Rates 37
Email comments from top responders:
Online registration (UK & USC)
Participant buy-in (UK, USC, SPIPA)
Cultural awareness of participants (SPIPA)
Monthly audits from central office (UK & USC)
Response Rates 38
Additional comments?
Questions/concerns?