identity in the census finding people in more than one

18
Identity in the Census Finding people in more than one

Upload: sabrina-gibbs

Post on 29-Dec-2015

215 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Identity in the Census Finding people in more than one

Identity in the Census

Finding people in more than one

Page 2: Identity in the Census Finding people in more than one

What is Identity?

A unique set of identifiers

Page 3: Identity in the Census Finding people in more than one

What is an Identifier?

Any measurable attribute In Census - Name, Age, Sex, Birth

State AND Household characteristics

Page 4: Identity in the Census Finding people in more than one

Basic Record Linking

Generalize identifiers to block Compare within a block more

specifically to matchWhy? GEDCOM was about exchange – we’ve

abandoned that in favor of linkage Local conclusions, remote evidence

Page 5: Identity in the Census Finding people in more than one

Household Characteristics?

Oldest male Oldest female Oldest boy Oldest girl

Page 6: Identity in the Census Finding people in more than one

Types of Identifiers

Cultural Biological

Page 7: Identity in the Census Finding people in more than one

Cultural Identifiers

Surname Given Name Family Role

Page 8: Identity in the Census Finding people in more than one

Biological Identifiers

Sex Age Parent / Child roles

Page 9: Identity in the Census Finding people in more than one

Coding Identifiers

Soundex Initials Birth year

Page 10: Identity in the Census Finding people in more than one

Why code identifiers?

Because matching doesn’t work Expressions of identifiers in records

vary – granularity etc. To speed up comparisons by allowing

blocking on a matched code

Page 11: Identity in the Census Finding people in more than one

Examples: Carroll Co AR

1860 and 1870 Surnames beginning with K and L

Page 12: Identity in the Census Finding people in more than one

What kind of keys did you use?

qry1860OldWoman Sex (f) Initial of Surname Initial of coded first name Estimated birth year / 5 Example: 1860 Mary Keelan age 13 fKM369

Page 13: Identity in the Census Finding people in more than one

1860 Family John KEYES 30 Hannah KEYES 27 Housekey = mKJ366fKH366 Less granular key = mKJfKH

Page 14: Identity in the Census Finding people in more than one

Easy Match

Surname Soundex First initial Birth Year

Easy List

Page 15: Identity in the Census Finding people in more than one

Other Matches

Universe 778 records Key 2 – mKJ - 449 matches EasyList – 78 matches Key 3 – mKJ382 – 38 matches Mom and oldest boy – key3 – 8

matches – 1 right Housekey - mLA368fLE368 – 4

matches – 3 right

Page 16: Identity in the Census Finding people in more than one

Work to do

Measure the effectiveness of different sets of identifiers

Scale the algorithms to larger data sets

Abandon linking for a Cartesian Event Space

Page 17: Identity in the Census Finding people in more than one

Example of LeMaster NumbersTim e

S pace

N am e

B irth 1

M arriage 1

D eath 1

B irth 2

Page 18: Identity in the Census Finding people in more than one

Questions