Introducing the TIER ID Match Component

Benjamin Oshrin
I2TX • October 2018 • Orlando

The Basic Problem

● Need a single identity to properly manage access to resources

● Who can we give email accounts to?

● Who can access licensed library content?

● How can we control access to computer labs?

● How can we provide electronic paystubs?

● Who is eligible for an ID badge?

The Basic Problem

● Typical HE scenario: Multiple Systems of Record (SORs)

● Students apply via commercial application

● Students enroll via Registrar

● Students become alumni and are tracked via Alumni Relations

● Employees are hired through HR

– But only after some early onboarding process

● Guests and affiliates come from everywhere

● And what about the Hospital?

The Basic Problem

● Even smaller institutions are not exempt

● Banner may be the main SOR, but where are your guests, visitors, contractors, etc.?

Multiple Approaches, Common Themes

● Variation 1: Match at Registry

● Absent other considerations, probably the recommended approach

● Variation 2: Match at SOR ("Standalone")

● Variation 3: Match before SOR

● Enrollee obtains unique ID before approaching SOR

– Regardless of variation, similar considerations

● Quality of inbound attributes

● Handling ambiguous ("fuzzy", "potential") matches

Solutions (?)

● Lots of custom (legacy) code

● Some commercial products

– Expensive

– Some better than others

● Not much in the way of Open Source

– Maybe you can hack something together from MDM solutions

– Until now (well, soon) ...

Solution Part 1: ID Match API

● Defines how match requests and responses are exchanged

● Match Request

● Match Response (exact, potential)

● Pending Matches (for review and resolution)

● Update Match Attributes

● Design preference: JSON + REST

● Goal: Assign a "Reference Identifier"

● Unique identifier for a person, as defined by the Match Engine

● (More later...)
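
As a rough illustration of the request and response exchange listed above, the sketch below builds a JSON match request and classifies a decoded response as exact or potential. The path shape, the field names (`sorAttributes`, `referenceId`, `candidates`), and the response layout are assumptions made for this example only; they are not taken from the ID Match API specification.

```python
import json


def build_match_request(sor, sor_id, attributes):
    """Return (path, body) for a hypothetical match request.

    The path layout and the "sorAttributes" key are illustrative
    assumptions, not the documented API.
    """
    path = f"/people/{sor}/{sor_id}"
    body = json.dumps({"sorAttributes": attributes}, sort_keys=True)
    return path, body


def classify_response(response):
    """Classify a decoded (hypothetical) match response.

    - a single "referenceId" is treated as an exact match
    - a non-empty "candidates" list is treated as a potential match,
      to be held as a pending match for review and resolution
    """
    if "referenceId" in response:
        return "exact", response["referenceId"]
    candidates = response.get("candidates", [])
    if candidates:
        return "potential", [c["referenceId"] for c in candidates]
    return "unresolved", None


path, body = build_match_request("hr", "12345", {"surname": "Smith"})
print(path, body)
print(classify_response({"candidates": [{"referenceId": "R1"}]}))
```

Keeping the request builder separate from the response classifier means the same code could sit behind a Registry or in a standalone client; only the transport would differ.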

Solution Part 2: Match Engine

● Implements rules for performing searches

● Define attribute characteristics

● "Canonical" vs "Potential" rules

– Canonical: Can uniquely identify a person, processing stops if exact match found

– Potential: Can suggest ambiguous or fuzzy matches, processing does not stop even if a single match is found

● Maintains match state

● Implies a need for attribute updates (such as name changes) to be reported to the Match Engine
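
The canonical versus potential distinction can be sketched as a small rule loop. This is a minimal illustration, not the Match Engine's actual logic: the `Rule` class, the `run_rules` function, the attribute names, and the choice to treat several hits on a canonical rule as potential candidates are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


@dataclass
class Rule:
    name: str
    canonical: bool  # True: may uniquely identify a person
    matches: Callable[[Record, Record], bool]


def run_rules(request: Record, records: Dict[str, Record],
              rules: List[Rule]) -> Tuple[str, List[str]]:
    """Apply rules in order; stop only on a single canonical hit."""
    candidates: List[str] = []
    for rule in rules:
        hits = [ref for ref, rec in records.items()
                if rule.matches(request, rec)]
        if rule.canonical and len(hits) == 1:
            return "exact", hits  # canonical: processing stops here
        # Potential rules never stop processing, even on a single hit.
        for ref in hits:
            if ref not in candidates:
                candidates.append(ref)
    return ("potential", candidates) if candidates else ("none", [])


def same(attr: str) -> Callable[[Record, Record], bool]:
    """Case-insensitive equality on one attribute; a missing value never matches."""
    def check(req: Record, rec: Record) -> bool:
        value = req.get(attr)
        return value is not None and value.casefold() == rec.get(attr, "").casefold()
    return check


# Hypothetical stored records keyed by reference identifier.
RECORDS = {
    "R1": {"national_id": "123", "surname": "Smith"},
    "R2": {"national_id": "456", "surname": "Smith"},
}
RULES = [
    Rule("national_id", canonical=True, matches=same("national_id")),
    Rule("surname", canonical=False, matches=same("surname")),
]

print(run_rules({"national_id": "123", "surname": "Smith"}, RECORDS, RULES))
print(run_rules({"surname": "smith"}, RECORDS, RULES))
```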

Matching: A Community Timeline

● 2011: Initial ID Match Strawman drafted

● 2012: "CIFER" Strawman API drafted

● 2013: UCB develops Java-based in-memory solution

● 2014: UCB develops Postgres-based solution (PoC)

● 2015: UCB internal implementation

● Nov 2017: TIER project funding allocated

● 2018: Initial TIER component releases

● Code being developed under the COmanage Project

● API being formalized by TIER API/Reg Working Group

The ID Match API

● RESTful design

● Goal: Obtain a Reference Identifier

● Can operate synchronously or asynchronously

● ie: interactive fuzzy resolution, or queue for an admin

● Can be placed behind a Registry or as a standalone service

● Can be used to transition legacy systems

ID Match API Status

● Strawman stable for quite some time

● Effort starting to "formalize" specification

● eduPerson style, not as an RFC/etc

● Long term home TBD

● Attribute names may change slightly from the examples

Match Component

● Basically a configuration-based API to SQL translator, using Postgres

● Databases are designed to search lots of records given a variety of attributes and patterns

● POC implementation performed quite well, but used config files, had no UI, and was not multi-tenant

● Why Postgres?

● Postgres is Open Source and extremely full featured and reliable

● fuzzystrmatch Extension

– Levenshtein distance matching and other capabilities

● MySQL might also work with some modifications, but not tested
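
To make the "API to SQL translator" idea concrete, the sketch below turns a small attribute configuration into a parameterized query that uses the `levenshtein()` function from the Postgres fuzzystrmatch extension. The table name, the `referenceid` column, and the `build_distance_query` helper are hypothetical; the real component's schema and generated SQL may look quite different.

```python
def build_distance_query(table, column_distances):
    """Build a parameterized SELECT using fuzzystrmatch's levenshtein().

    column_distances maps a column name to (search value, max distance).
    Returns (sql, params) using DB-API "%s" placeholders. Requires
    CREATE EXTENSION fuzzystrmatch; on the Postgres side.
    Table and column names are illustrative assumptions.
    """
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    clauses = []
    params = []
    for column, (value, max_distance) in column_distances.items():
        # Identifiers cannot be bound as parameters, so validate them.
        if not column.isidentifier():
            raise ValueError(f"unsafe column name: {column!r}")
        clauses.append(f"levenshtein(lower({column}), lower(%s)) <= %s")
        params.extend([value, max_distance])
    sql = f"SELECT referenceid FROM {table} WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_distance_query("matchgrid", {"surname": ("Simth", 2)})
print(sql)
print(params)
```

Search values travel as bound parameters while identifiers are validated, which keeps a configuration-driven generator from becoming an injection vector.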

(But I Don’t Currently Run Postgres?)

● TIER Packaging coming soon

● Can use (eg) RDS (though some performance considerations)

● Local deployment works to some scale with no optimizations

Search Types

● Exact

– Smith = Smith

● Distance

– Smith = Simth

– 1983-03-02 = 1983-02-03

● Substring

– Anna = Anna Marie

– Ann = Anna

● Dictionary

– John = Jonathan

● Soundex

– Kathy = Cathy
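
Four of the search types above can be sketched in a few lines of plain Python. This is an in-memory illustration only (the component itself delegates searching to Postgres); the function names, the distance threshold of 2, and the tiny nickname table are assumptions, and Soundex is left out of the sketch.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def exact_match(a, b):
    return a == b


def distance_match(a, b, max_distance=2):
    # Plain Levenshtein counts a transposition as two edits.
    return levenshtein(a, b) <= max_distance


def substring_match(a, b):
    return a in b or b in a


# Hypothetical equivalence groups; a real deployment would load a dictionary.
NAME_GROUPS = [{"john", "jonathan"}]


def dictionary_match(a, b):
    a, b = a.lower(), b.lower()
    return any(a in group and b in group for group in NAME_GROUPS)


print(levenshtein("Smith", "Simth"))
print(substring_match("Anna", "Anna Marie"))
print(dictionary_match("John", "Jonathan"))
```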

Search Modifiers

● Alphanumeric

– 1983-03-18 = 19830318

● Case Sensitivity

– SMITH = Smith

● Null Equivalents

– 000-00-0000 = Ø

– “ ” = Ø

● Crosscheck

– Allows a single search to apply to multiple attributes (eg: official and preferred name columns)

● Invalidates

– Converts canonical rule to potential if attribute fails to match

● Tokenize

– Anne Marie vs Anne
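
Most of the modifiers above amount to normalizing values before comparison. The sketch below shows one plausible reading of each; the function names, the set of null-equivalent values, and the column names are assumptions, and the "Invalidates" modifier is omitted because it acts on rules rather than on values.

```python
import re

# Hypothetical placeholder values that should be treated as "no value".
NULL_EQUIVALENTS = {"", "000-00-0000"}


def alphanumeric(value):
    """Drop everything except letters and digits."""
    return re.sub(r"[^0-9A-Za-z]", "", value)


def fold_case(value):
    """Normalize case so comparisons become case-insensitive."""
    return value.casefold()


def null_if_equivalent(value):
    """Return None for blank or placeholder values, else the value unchanged."""
    return None if value.strip() in NULL_EQUIVALENTS else value


def crosscheck(value, record, columns):
    """Apply one search value to several attributes of a stored record."""
    return any(record.get(column) == value for column in columns)


def tokens_overlap(a, b):
    """Split on whitespace and report whether any token is shared."""
    return bool(set(a.split()) & set(b.split()))


print(alphanumeric("1983-03-18"))
print(null_if_equivalent("000-00-0000"))
print(tokens_overlap("Anne Marie", "Anne"))
```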

Match Component Scope and Status

Interface                                   v1 Scope?   v1 Status?
Postgres Match Backend                      √           √
MySQL / Other Database Support              X           X
NoSQL / ElasticSearch / etc                 X           X
Alternate Backends (eg: local databases)    X           X
Pull Model                                  X           X
Web Driven Configuration                    √           √
ID Match REST API                           √           √
Web Driven Manual Match Resolution          √           √
API Driven Match Resolution                 √           √
COmanage Registry Integration               √           O
Externalized Authentication (User Login)    √           √
Basic Authentication (API Access)           √           √
OAuth Authentication (API Access)           X           X
User / Group Based Authorization            √           √
Logging (matching logic, operations)        √           O
Bulk Initial Load                           √           O
Notification on Fuzzy Match                 √           O
Multi-tenancy                               √           √

Matching Capabilities                       v1 Scope?   v1 Status?
Flexible Attribute Selection                √           √
Flexible Attribute Configuration            √           √
Confidence Specification                    √           O
Attribute "Cross Check"                     √           O
Attribute "Grouping"                        √           √
Match "Invalidation"                        √           O
Levenshtein Distance Matching               √           √
Exact Matching                              √           √
Substring Matching                          √           √
Dictionary Matching                         X           X
Soundex Matching                            X           X
Token Matching                              X           X
Match Rule "Profiles"                       X           X
Reference Identifier Type Selection         √           O
Identifier Assignment                       X           X
Manual Split / Join                         √           O
Notification on Split / Join                X           X
Track History of matchgrid                  √           O

(((demo)))

Next Steps

● Finish code stabilization, testing

● Finish documentation

● Match product

● ID Match API

● Packaging

● Scope v1.1.0 features

More Info

● https://github.internet2.edu/COmanage/match (develop branch)

● https://spaces.at.internet2.edu/display/COmanage/COmanage+Match+Technical+Manual

● https://spaces.internet2.edu/display/cifer/SOR-Registry+Strawman+ID+Match+API