
Page 1: dj.vanderlaan@cbs - European Commission · dj.vanderlaan@cbs.nl Indicator for the Representativeness of Linked Sources . Representativeness 2 . Representativeness 3 . Representativeness

D. Jan van der Laan and Bart F.M. Bakker
dj.vanderlaan@cbs.nl

Indicator for the Representativeness of Linked Sources


Representativeness

[Three figure slides, each titled "Representativeness"; the only labels that survive, on the third, are "representativeness" and "sensitivity".]


Representativeness indicator

Representativeness indicator for linkage:

$$\ell(X) = \frac{S(\rho_X)}{\bar{\rho}_X}$$

When all records in the population have an equal probability of being linked (i.e. $S(\rho_X) = 0$), the linked data set is representative of the population.

Based on: B. Schouten, F. Cobben and J. Bethlehem (2009). "Indicators for the representativeness of survey response", Survey Methodology, 35(1), 101-113.
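As a rough illustration, the indicator can be computed from linkage counts per covariate cell. The sketch below is not the authors' implementation. It assumes that the linkage probability of a record is estimated by the linked fraction in its cell of X, that S is the standard deviation over all records, and that the denominator is the mean linkage probability. The function name and the example counts are hypothetical.

```python
from math import sqrt


def representativeness_indicator(cells):
    """Return S(rho_X) / mean(rho_X) for per-cell linkage counts.

    cells maps a covariate cell to (n_linked, n_total). Every record in a
    cell is assigned the cell's linked fraction as its estimated linkage
    probability (an assumption of this sketch). Cells must be non-empty.
    """
    n_total = sum(total for _, total in cells.values())
    n_linked = sum(linked for linked, _ in cells.values())
    mean_rho = n_linked / n_total  # overall linkage rate
    # Record-weighted variance of the cell-level linkage probabilities.
    variance = sum(
        total * (linked / total - mean_rho) ** 2
        for linked, total in cells.values()
    ) / n_total
    return sqrt(variance) / mean_rho


# Hypothetical counts: (linked, total) per cell.
equal = {"A": (80, 100), "B": (160, 200)}
unequal = {"A": (90, 100), "B": (50, 100)}
print(representativeness_indicator(equal))
print(representativeness_indicator(unequal))
```

With equal linkage rates in every cell the value is zero; the larger the spread of linkage rates relative to the overall rate, the larger the value.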


Partial representativeness indicator

Measure

- contribution of single variable

- contribution of category of single variable

Two variants:

- unconditional partial indicator

- conditional partial indicator
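The slides give no formulas for the partial indicators. The sketch below uses one formulation found in the R-indicator literature, stated here as an assumption: the unconditional indicator is the between-category standard deviation of the linkage probabilities for one variable, with a signed contribution per category, and the conditional indicator is the variation that remains within cells formed by all other variables. All function names and the example records are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def _group_means(records, key):
    """Mean linkage probability and size per group.

    records is a list of (covariates_dict, rho) pairs.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for covariates, rho in records:
        group = key(covariates)
        sums[group] += rho
        counts[group] += 1
    return {g: (sums[g] / counts[g], counts[g]) for g in sums}


def unconditional_partial(records, variable):
    """Between-category variation of rho for a single variable."""
    n = len(records)
    overall = sum(rho for _, rho in records) / n
    groups = _group_means(records, lambda c: c[variable])
    # Signed category-level contributions; their squares sum to the
    # squared variable-level indicator.
    per_category = {
        cat: sqrt(size / n) * (mean - overall)
        for cat, (mean, size) in groups.items()
    }
    total = sqrt(sum(v ** 2 for v in per_category.values()))
    return total, per_category


def conditional_partial(records, variable):
    """Variation of rho left within cells of all other variables."""
    n = len(records)

    def others(covariates):
        return tuple(sorted((k, v) for k, v in covariates.items() if k != variable))

    cells = _group_means(records, others)
    within = sum((rho - cells[others(c)][0]) ** 2 for c, rho in records)
    return sqrt(within / n)


# Hypothetical records: rho depends on sex only.
records = [
    ({"sex": "m", "age": "young"}, 0.9),
    ({"sex": "m", "age": "old"}, 0.9),
    ({"sex": "f", "age": "young"}, 0.7),
    ({"sex": "f", "age": "old"}, 0.7),
]
total_sex, by_category = unconditional_partial(records, "sex")
print(total_sex, by_category)
print(conditional_partial(records, "sex"))
print(unconditional_partial(records, "age")[0], conditional_partial(records, "age"))
```

In this example all variation is attributed to sex, and age contributes nothing under either variant.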


Example I: employment register

- Target population: employed foreign residents, with exception of residents with a Belgian or German address.

- Add variables to ER by linking to Population Register

[Figure: records in the Population Register and the Employment Register. Surviving labels: Person, Job 1, Job 2, Address A, Address B, Address C, Address D.]


Example I: linkage results

Data sources and deterministic linkage

                            Population register    Employment register
Number of records                    14,336,000             12,859,000
Deterministically linked                                    12,302,000
Foreign address                                                361,000
To probabilistic linkage                                       196,000
Probabilistically linked                                         4,000

Representativeness indicator:

Deterministic linkage: 𝓁 = 0.295

Probabilistic linkage: 𝓁 = 0.294
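The counts on the linkage-results slide can be checked for consistency with a few lines of arithmetic. The sketch below assumes that the deterministically linked, foreign-address and to-probabilistic-linkage counts all refer to Employment Register records and partition that register; the slide does not say so explicitly. Treating records without a foreign address as the in-scope set is also an assumption, and the variable names are hypothetical.

```python
# Counts taken from the linkage-results slide.
employment_register = 12_859_000
deterministically_linked = 12_302_000
foreign_address = 361_000
to_probabilistic = 196_000
probabilistically_linked = 4_000

# Assumption: the three categories partition the Employment Register.
unaccounted = employment_register - (
    deterministically_linked + foreign_address + to_probabilistic
)

# Assumption: records without a foreign address form the in-scope set.
in_scope = employment_register - foreign_address
rate_deterministic = deterministically_linked / in_scope
rate_after_probabilistic = (
    deterministically_linked + probabilistically_linked
) / in_scope

print(unaccounted)
print(f"{rate_deterministic:.4f} {rate_after_probabilistic:.4f}")
```

Under these assumptions the three categories add up to the register total, and probabilistic linkage adds only a small number of links, in line with the nearly unchanged indicator values.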


Representativeness indicator [figure slide; only the title survives]


Example II: Twin Register

National Twin Register (NTR): panel of twins

Health Insurance Database (HID): health insurance claims of one company (covers ca. 25% of Dutch population)

Population

-> NTR0, complete NTR (the NTR is a panel in which twins voluntarily participate)

-> NTR1, used in linkage (records lost because no permission was given for linkage)

-> NTR2, linked to HID (records lost because they could not be linked to the HID)


NTR used in linkage (NTR1) compared to population: 𝓁 = 0.61

Linkage result (NTR2) compared to population: 𝓁 = 0.67


NTR used in linkage (NTR1) compared to population

[Figure panels: Unconditional indicator; Conditional indicator]


Conclusion

Representativeness indicator for linkage

- Under certain conditions, an upper bound on the relative bias

- Results depend on set of covariates used
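The dependence on the covariate set can be seen in a small constructed example. The sketch below assumes, as a simplification, that linkage probabilities are estimated by the linked fraction within each cell of the chosen covariates and that the indicator is the standard deviation of those probabilities divided by their mean. The data and the helper names `indicator` and `make` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def indicator(records, variables):
    """S(rho_X) / mean(rho_X), with rho_X estimated per cell of `variables`.

    records is a list of (covariates_dict, is_linked) pairs.
    """
    linked, total = defaultdict(int), defaultdict(int)
    for covariates, is_linked in records:
        cell = tuple(covariates[v] for v in variables)
        total[cell] += 1
        linked[cell] += is_linked  # True counts as 1
    n = len(records)
    mean_rho = sum(linked.values()) / n
    variance = sum(
        total[c] * (linked[c] / total[c] - mean_rho) ** 2 for c in total
    ) / n
    return sqrt(variance) / mean_rho


def make(sex, age, n_linked, n_total):
    """Build n_total hypothetical records, the first n_linked of them linked."""
    return [({"sex": sex, "age": age}, i < n_linked) for i in range(n_total)]


records = (
    make("m", "young", 9, 10) + make("m", "old", 5, 10)
    + make("f", "young", 5, 10) + make("f", "old", 9, 10)
)
print(indicator(records, ["sex"]))
print(indicator(records, ["sex", "age"]))
```

Here sex alone shows no differences in linkage rates, while sex combined with age does, so the same linked file receives a different value depending on which covariates are used.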

Applications

- Insight into which subpopulations are underrepresented

- Direct further efforts in linkage

- Comparison of linkage algorithms

- Monitoring
