introducing the tier id match component - internet2...2018/10/29 · introducing the tier id match...
TRANSCRIPT
Introducing theTIER ID Match Component
Benjamin OshrinOctober 2018 • Orlando
Introducing the TIER ID Match ComponentI2TX • Orlando
2
The Basic Problem
● Need a single identity to properly manage access to resources
● Who can we give email accounts to?
● Who can access licensed library content?
● How can we control access to computer labs?
● How can we provide electronic paystubs?
● Who is eligible for an ID badge?
Introducing the TIER ID Match ComponentI2TX • Orlando
3
The Basic Problem
● Typical HE scenario: Multiple Systems of Record (SORs)
● Students apply via commercial application
● Students enroll via Registrar
● Students become alumni and tracked via Alumni Relations
● Employees are hired through HR– But only after some early onboarding process
● Guests and affiliates come from everywhere
● And what about the Hospital?
Introducing the TIER ID Match ComponentI2TX • Orlando
4
The Basic Problem
● Even smaller institutions are not exempt
● Banner may be the main SoR, but where are your guests, visitors, contractors, etc?
Introducing the TIER ID Match ComponentI2TX • Orlando
5
Multiple Approaches, Common Themes
● Variation 1: Match at Registry
● Absent other considerations, probably the recommended approach
● Variation 2: Match at SOR ("Standalone")
● Variation 3: Match before SOR
● Enrollee obtains unique ID before approaching SOR– Regardless of variation, similar considerations
● Quality of inbound attributes● Handling ambiguous ("fuzzy", "potential") matches
Introducing the TIER ID Match ComponentI2TX • Orlando
7
Solutions (?)
● Lots of custom (legacy) code● Some commercial products
– Expensive
– Some better than others
● Not much in the way of Open Source– Maybe you can hack something together from MDM solutions
– Until now (well, soon) ...
Introducing the TIER ID Match ComponentI2TX • Orlando
8
Solution Part 1: ID Match API
● Defines how match requests and responses are exchanged● Match Request
● Match Response (exact, potential)
● Pending Matches (for review and resolution)
● Update Match Attributes
● Design preference: JSON + REST
● Goal: Assign a "Reference Identifier"● Unique identifier for a person, as defined by the Match Engine
● (More later...)
Introducing the TIER ID Match ComponentI2TX • Orlando
9
Solution Part 2: Match Engine
● Implements rules for performing searches● Define attribute characteristics
● Canonical" vs "Potential" rules– Canonical: Can uniquely identify a person, processing stops if exact match found
– Potential: Can suggest ambiguous or fuzzy matches, processing does not stop even if a single match is found
● Maintains match state● Implies a need for attribute updates (such as name changes) to be reported
to the Match Engine
Introducing the TIER ID Match ComponentI2TX • Orlando
10
Matching: A Community Timeline
● 2011: Initial ID Match Strawman drafted
● 2012: "CIFER" Strawman API drafted
● 2013: UCB develops Java-based in memory solution
● 2014: UCB develops Postgres-based solution (PoC)
● 2015: UCB internal implementation
● Nov 2017: TIER project funding allocated
● 2018: Initial TIER component releases● Code being developed under the COmanage Project
● API being formalized by TIER API/Reg Working Group
Introducing the TIER ID Match ComponentI2TX • Orlando
11
The ID Match API
● RESTful design
● Goal: Obtain a Reference Identifier
● Can operate synchronously or asynchronously● ie: interactive fuzzy resolution, or queue for an admin
● Can be placed behind a Registry or as a standalone service
● Can be used to transition legacy systems
Introducing the TIER ID Match ComponentI2TX • Orlando
12
ID Match API Status
● Strawman stable for quite some time
● Effort starting to "formalize" specification● eduPerson style, not as an RFC/etc
● Long term home TBD
● Attribute names may change slightly from the examples
Introducing the TIER ID Match ComponentI2TX • Orlando
13
Match Component
● Basically a configuration based API to SQL translator, using Postgres● Databases are designed to search lots of records given a variety of
attributes and patterns
● POC implementation performed quite well, but used config files, had no UI, and was not multi-tenant
● Why Postgres?● Postgres is Open Source and extremely full featured and reliable
● fuzzystrmatch Extension– Levenshtein distance matching and other capabilities
● MySQL might also work with some modifications, but not tested
Introducing the TIER ID Match ComponentI2TX • Orlando
14
(But I Don’t Currently Run Postgres?)
● TIER Packaging coming soon
● Can use (eg) RDS (though some performance considerations)
● Local deployment works to some scale with no optimizations
Introducing the TIER ID Match ComponentI2TX • Orlando
15
Search Types
● Exact
● Smith = Smith● Distance
● Smith = Simth● 1983-03-02 =
1983-02-03● Substring
● Anna = Anna Marie● Ann = Anna
● Dictionary
● John = Jonathan● Soundex
● Kathy = Cathy
Introducing the TIER ID Match ComponentI2TX • Orlando
16
Search Modifiers
● Alphanumeric
● 1983-03-18 = 19830318● Case Sensitivity
● SMITH = Smith● Null Equivalents
● 000-00-0000 = Ø● “ “ = Ø
● Crosscheck● Allows a single search to apply
to multiple attributes (eg: official and preferred name columns)
● Invalidates● Converts canonical rule to
potential if attribute fails to match
● Tokenize
● Anne Marie vs Anne
Match Component Scope and StatusInterface v1 Scope? v1 Status?
Postgres Match Backend √ √
MySQL / Other Database Support X X
NoSQL / ElasticSearch / etc X X
Alternate Backends (eg: local databases) X X
Pull Model X X
Web Driven Configuration √ √
ID Match REST API √ √
Match Component Scope and StatusInterface v1 Scope? v1 Status?
Web Driven Manual Match Resolution √ √
API Driven Match Resolution √ √
COmanage Registry Integration √ O
Externalized Authentication (User Login) √ √
Basic Authentication (API Access) √ √
OAuth Authentication (API Access) X X
User / Group Based Authorization √ √
Match Component Scope and StatusInterface v1 Scope? v1 Status?
Logging (matching logic, operations) √ O
Bulk Initial Load √ O
Notification on Fuzzy Match √ O
Multi-tenancy √ √
Match Component Scope and StatusMatching Capabilities v1 Scope? v1 Status?
Flexible Attribute Selection √ √
Flexible Attribute Configuration √ √
Confidence Specification √ O
Attribute "Cross Check" √ O
Attribute "Grouping" √ √
Match "Invalidation" √ O
Levenshtein Distance Matching √ √
Match Component Scope and StatusMatching Capabilities v1 Scope? v1 Status?
Exact Matching √ √
Substring Matching √ √
Dictionary Matching X X
Soundex Matching X X
Token Matching X X
Match Rule "Profiles" X X
Reference Identifier Type Selection √ O
Match Component Scope and StatusMatching Capabilities v1 Scope? v1 Status?
Identifier Assignment X X
Manual Split / Join √ O
Notification on Split / Join X X
Track History of matchgrid √ O
(((demo)))
Introducing the TIER ID Match ComponentI2TX • Orlando
46
Next Steps
● Finish code stabilization, testing
● Finish documentation● Match product
● ID Match API
● Packaging
● Scope v1.1.0 features
Introducing the TIER ID Match ComponentI2TX • Orlando
47
More Info
● https://github.internet2.edu/COmanage/match (develop branch)
● https://spaces.at.internet2.edu/display/COmanage/COmanage+Match+Technical+Manual
● https://spaces.internet2.edu/display/cifer/SOR-Registry+Strawman+ID+Match+API