fbi biographicresolution
TRANSCRIPT
-
8/18/2019 FBI BiographicResolution
1/20
UNCLASSIFIED
UNCLASSIFIED
UNCLASSIFIED
UNCLASSIFIED
Biographic
Entity ResolutionChallenges of managing and sharing
John N. Dvorak
Senior Level IT Architect
September 16, 2014
Global Identity Summit
-
8/18/2019 FBI BiographicResolution
2/20
UNCLASSIFIED
UNCLASSIFIED
Discussion Topics
Challenges of data
Resolving entities
Managing and sharing resolved entitiesFuture (a.k.a. the world John would like to see)
2
-
8/18/2019 FBI BiographicResolution
3/20
UNCLASSIFIED
UNCLASSIFIED
Brief (very!) Introduction
Entity Resolution: The process of determining whether two or more
references to real-world objects such as people (individuals), places, or
things are referring to the same object or to different objects. This
concept is sometimes referred to as Entity Correlation, Entity
Disambiguation, or Record Linkage, and includes related concepts such
as Identity Resolution. (from draft DARA)
Entity Map: Complete enriched entity data that includes the linkage of
relationships between people, places, things, and characteristics of
data resulting from an entity resolution process.
For a deeper discussion about biographic identity challenges and
technologies, attend Dr. Keith Miller’s Biographic Identity Technologies
session, Wednesday, 9 AM (Track C)
3
-
8/18/2019 FBI BiographicResolution
4/20
UNCLASSIFIED
UNCLASSIFIED
Relationships in Identification
4
Skin Luminesence
Biometric Biographic
Behavioral
Fingerprint
Palm Print
Foot Print
Face
Iris
DNA
Voice
ScentEEG/fMRI
Gait
Handwriting
Keystroke Dynamics
Tattoo
Ear Shape
Skin Pattern
EOGECG
EMG
Height
Hair Color
Eye Color
Address
Age
Weight
Driver License
School Records
Financial RecordsOnline Transactions
Cell Phone Records Passport
Travel Records Medical Records
Tax Records
Social Networking
Affiliations
Personal Network
Name
SSN
-
8/18/2019 FBI BiographicResolution
5/20
-
8/18/2019 FBI BiographicResolution
6/20
UNCLASSIFIED
UNCLASSIFIED
Layers of Use of Biographic Identity
6
Additionally – apply at all levels
• Authentication
• Credentialing
• Access Control
• Privacy
• (includes Anonymization)
INDIVIDUAL ELEMENT MATCHING:
Name, DOB, Email, Address, Fingerprint, Iris, Behavior, etc..
(includes: confidence, uncertainty, lifetime, provenance, etc.)
IDENTITY MATCHINGMatch two identities with some subset of individual elements
(propagates: confidence, uncertainty, lifetime, provenance, etc.)
IDENTITY RESOLUTION
Resolving multiple (e.g., millions) identities(propagates: confidence, uncertainty, lifetime, provenance, etc.)
IDENTITY QUERY / RETRIEVAL(propagates: confidence, uncertainty,
lifetime, provenance, etc.)
IDENTITY ANALYSIS
E.g., Link Analysis, Data Mining(propagates: confidence, uncertainty,
lifetime, provenance, etc.)
Source: MITRE Corporation 2014
-
8/18/2019 FBI BiographicResolution
7/20
UNCLASSIFIED
UNCLASSIFIED
The Judgment Problem
(or We can’t automate everything)
Automation provides candidate matches, based
on carefully constructed rules
Judging which records should be linked is an
analytical task
Manual review is usually necessary and manual
“override” needed to correct errors
7
-
8/18/2019 FBI BiographicResolution
8/20
UNCLASSIFIED
UNCLASSIFIED
Match criteria must be balanced
Beware Over Resolving!
Making match criteria more
tolerant to small errors gets better
results, but can lead to over-resolving data
8
-
8/18/2019 FBI BiographicResolution
9/20
-
8/18/2019 FBI BiographicResolution
10/20
UNCLASSIFIED
UNCLASSIFIED
Leading us to…
Entity Management
10
-
8/18/2019 FBI BiographicResolution
11/20
-
8/18/2019 FBI BiographicResolution
12/20
UNCLASSIFIED
UNCLASSIFIED
Tracking pedigree: An example
Entities are merged and branched over time, all of
which needs to be tracked and made visible
(provenance/pedigree):
12
Entity AEntity AABC 27
GHI 12
NOP 15
LMN 64
DEF 34
13
15
19
13
15
19
65
66
Entity A
Entity B
UNCLASSIFIED
-
8/18/2019 FBI BiographicResolution
13/20
UNCLASSIFIED
UNCLASSIFIED
And now…
Assuming you have tackled the small problems of entityextraction, resolution and local management, and also
assuming you don’t live on an isolated island of analyticwonderment…
You need to share what you know
And process what others know
(restricted by laws & policies, not technology)
13
UNCLASSIFIED
-
8/18/2019 FBI BiographicResolution
14/20
UNCLASSIFIED
UNCLASSIFIED
The need to share
Sharing Resolved Entities
14
-
8/18/2019 FBI BiographicResolution
15/20
UNCLASSIFIED
-
8/18/2019 FBI BiographicResolution
16/20
UNCLASSIFIED
UNCLASSIFIED
More complex data, more complex challenges
Today:
Technical: proprietary matching algorithms,proprietary scoring mechanisms, non-standard exportformats, isolated innovation
Business: policy variation, inconsistent processes,often separately managed modalities
A hopeful future:
Open standards for sharing correlated data/entities,compatible (or at least “understandable”) policies,sharable algorithms/analytics, broad innovationwithin and between many modalities
16
UNCLASSIFIED
-
8/18/2019 FBI BiographicResolution
17/20
UNCLASSIFIED
UNCLASSIFIED
A (shameless) plug for the DARA
SA IPC’s Data Aggregation Working Group (DAWG) under PM-ISE isdeveloping the Data Aggregation Reference Architecture (DARA).
Response to NSISS Priority Objective 10: “Develop a referencearchitecture to support a consistent approach to data discovery andentity resolution and data correlation across disparate datasets”
Among its objectives: “Define a reference architecture that enablesentity resolution and data correlation, and disambiguation acrossmultiple data aggregation investments.”
Data Aggregation Reference Architecture public statement:
http://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-help
Associated RFI:
https://www.fbo.gov/?s=opportunity&mode=form&id=e5ce39a84c1e09afab844a9487866a68&tab=core&_cview=0
SA IPC: Information Sharing and Access Interagency Policy CommitteeNSSIS: National Strategy for Information Sharing and Safeguarding
17
UNCLASSIFIED
http://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttps://www.fbo.gov/?s=opportunity&mode=form&id=e5ce39a84c1e09afab844a9487866a68&tab=core&_cview=0https://www.fbo.gov/?s=opportunity&mode=form&id=e5ce39a84c1e09afab844a9487866a68&tab=core&_cview=0https://www.fbo.gov/?s=opportunity&mode=form&id=e5ce39a84c1e09afab844a9487866a68&tab=core&_cview=0https://www.fbo.gov/?s=opportunity&mode=form&id=e5ce39a84c1e09afab844a9487866a68&tab=core&_cview=0https://www.fbo.gov/?s=opportunity&mode=form&id=e5ce39a84c1e09afab844a9487866a68&tab=core&_cview=0http://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-helphttp://www.ise.gov/blog/ise-bloggers/building-data-aggregation-reference-architecture-%E2%80%93-industry%E2%80%99s-help
-
8/18/2019 FBI BiographicResolution
18/20
UNCLASSIFIED
UNCLASSIFIED
Other Places of Interest
NIEMhttps://www.niem.gov
Global Federated Identity and Privilege Management(GFIPM)http://www.gfipm.net
DOJ’s Global Reference Architecture (GRA)https://it.ojp.gov/GRA
Open source:
SEARCH’s (search.org) Entity Resolution Toolkit (ERS),sponsored by DOJ’s National Institute of Justice https://github.com/entityresolution
18
UNCLASSIFIED
https://www.niem.gov/http://www.gfipm.net/https://it.ojp.gov/GRAhttps://github.com/entityresolutionhttps://github.com/entityresolutionhttps://github.com/entityresolutionhttps://github.com/entityresolutionhttps://it.ojp.gov/GRAhttps://it.ojp.gov/GRAhttp://www.gfipm.net/http://www.gfipm.net/http://www.gfipm.net/https://www.niem.gov/https://www.niem.gov/https://www.niem.gov/
-
8/18/2019 FBI BiographicResolution
19/20
UNCLASSIFIED
UNCLASSIFIED
We advance and evolve
UNCLASSIFIED
UNCLASSIFIED
-
8/18/2019 FBI BiographicResolution
20/20
UNCLASSIFIED
UNCLASSIFIED
UNCLASSIFIED
UNCLASSIFIED
U.S. Department of Justice
Federal Bureau of InvestigationWashington, D.C. 20535
John N. Dvorak
Senior Level IT Architect