Information Management Software
Catching the bad guys (and seeing the good guys)
Entity/Relationship Analytics and how to Understand/Recognize Global Names
Entity Analytic Challenges: Ability to Overcome Multiple Levels of Identity Ambiguation
Naturally occurring phenomena such as data quality and cultural variants, as well as deliberate acts of identity misrepresentation, compounded by the need to protect privacy
LEVEL 1) Dirty Disparate Data
LEVEL 2) Cultural Obstacles
LEVEL 3) Identity Ambiguation
LEVEL 4) Network Ambiguation
LEVEL 5) Privacy & Security
What is Needed To Address The Challenge?
Step One – Address Naturally Occurring Data Quality Issues
The first step is to gather the information assets necessary to accomplish the mission, and perform consistent quality, standardization, formatting, and enhancement.
• Perform Data Quality (Naturally Occurring), Complexity Level HIGH: Transposition Errors, Multiple Formats, Data Drift, Dirty Data
Architecturally Thinking
• We can handle dirty data…
• But WHAT is dirty data?
• Do we always want to cleanse data?
• What is the value of dirty data?
• Consider the sources of data
• Consider the flow of data
The Problem: Ambiguous, Misrepresented, Blurry Identity
For a variety of reasons, companies don't have a clear picture of the individuals and organizations with whom they do business.
Depending on the nature of the organization's mission, the impact can lead to problems including missing threats to public safety, duplication of benefit payments, accepting
business from known criminals, etc.
IBM InfoSphere Relationship Resolution
Who is who?
• Establish Unique Identity
• Integrates data silos
• Full attribution of entities
Who knows who?
• Obvious & non-obvious
• Links people & groups
• Role alerts
Who does what?
• Events & Transactions
• Business Rule Monitoring
• Criteria based alerting
What is Needed To Address The Challenge?
Step Two – Manage Cultural Identity Ambiguation
The second step is managing the cultural variations of identity data, such as name variants, spellings, cultures, and genders, and applying culture-specific analytics to the recognition process.
• Perform Data Quality (Naturally Occurring), Complexity Level HIGH: Transposition Errors, Multiple Formats, Data Drift, Dirty Data
• Cultural Ambiguities (Naturally Occurring), Complexity Level HIGH: Name Cultures, Name Genders, Name Order
What is Needed To Address The Challenge?
Step Three – Address Intentional Identity
Resolve “Who Is Who” – Resolve and identify persons/organizations deliberately trying to hide or misrepresent who they actually are
• Perform Data Quality (Naturally Occurring), Complexity Level HIGH: Transposition Errors, Multiple Formats, Data Drift, Dirty Data
• Cultural Ambiguities (Naturally Occurring), Complexity Level HIGH: Name Cultures, Name Genders, Name Order
• Resolve Identity (Intentional Act), Complexity Level VERY HIGH: Identity Masking, False Identifiers, Stolen Identifiers
Entity Analytic Solutions – “Who is who?”
Sources: A – Credit Card, B – Mortgage, C – DDA

Observations
Record #70001: Marc R Smith, 123 Main St, (713) 730 5769, 537-27-6402, DL: 0001133107
Record #9103: Randal M Smith, DOB: 06/17/1934, (713) 731 5577
Record #6251: Mark Randy Smith, 456 First Street, (713) 731 5577, DL: 1133107

EAS Entity #9451
Attribute  Value             Source
Name       Marc R Smith      A-70001
Address    123 Main St       A-70001
Phone      (713) 730 5769    A-70001
Tax ID     537-27-6402       A-70001
License    0001133107        A-70001

EAS Entity #9452
Attribute  Value             Source
Name       Randal M Smith    B-9103
DOB        06/17/1934        B-9103
Phone      (713) 731 5577    B-9103

EAS Entity #9453
Attribute  Value             Source
Name       Marc R Smith      A-70001
Name       Mark Randy Smith  C-6251
Address    123 Main St       A-70001
Address    456 First St      C-6251
Phone      (713) 730 5769    A-70001
Phone      (713) 731 5577    C-6251
Tax ID     537-27-6402       A-70001
License    0001133107        A-70001
License    1133107           C-6251

EAS Entity #9453
Attribute  Value             Source
Name       Marc R Smith      A-70001
Name       Randal M Smith    B-9103
Name       Mark Randy Smith  C-6251
Address    123 Main St       A-70001
Address    456 First St      C-6251
Phone      (713) 730 5769    A-70001
Phone      (713) 731 5577    B-9103
Phone      (713) 731 5577    C-6251
Tax ID     537-27-6402       A-70001
License    0001133107        A-70001
License    1133107           C-6251
DOB        06/17/1934        B-9103

• Sequence Neutral Identity Resolution
• Self-Correcting
• 20 Attributes Out of the Box
• Predefined Rules/Sensitivities
• Interactions, Entity Context
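A minimal sketch of what "sequence neutral" resolution could look like, using the three observations on this slide. The matching rules here (a shared driver's license after stripping leading zeros, or a shared surname plus phone number) and every function name are assumptions made for illustration; they are not the product's actual rules or API. A union-find structure merges records that share a key, so the final grouping does not depend on arrival order.

```python
from collections import defaultdict
from itertools import permutations

# Observations from the slide (only the fields this sketch uses).
RECORDS = {
    "A-70001": {"name": "Marc R Smith", "phone": "(713) 730 5769", "license": "0001133107"},
    "B-9103": {"name": "Randal M Smith", "phone": "(713) 731 5577"},
    "C-6251": {"name": "Mark Randy Smith", "phone": "(713) 731 5577", "license": "1133107"},
}


def match_keys(record):
    """Hypothetical rules: records sharing any returned key are one entity."""
    keys = set()
    surname = record["name"].split()[-1].lower()
    if "license" in record:
        # Normalize formatting drift: 0001133107 and 1133107 become equal.
        keys.add(("license", record["license"].lstrip("0")))
    if "phone" in record:
        digits = "".join(ch for ch in record["phone"] if ch.isdigit())
        keys.add(("surname+phone", surname, digits))
    return keys


def resolve(record_ids):
    """Group record ids into entities with union-find; order-independent."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}
    for rid in record_ids:
        for key in match_keys(RECORDS[rid]):
            if key in first_seen:
                parent[find(rid)] = find(first_seen[key])
            else:
                first_seen[key] = rid

    groups = defaultdict(set)
    for rid in record_ids:
        groups[find(rid)].add(rid)
    return {frozenset(group) for group in groups.values()}


# Every arrival order yields the same single entity.
for order in permutations(RECORDS):
    print(order, "->", resolve(list(order)))
```

With only records A and B loaded, the sketch keeps two entities; record C then links both (license to A, surname plus phone to B), which mirrors the consolidation into one entity shown in the tables.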
EAS Entity Resolution – The Basis for Assessment
Mr. Joseph Carbella, 55 Church Street, New York, NY 10007, Tel#: 212-693-5312, DOB: 07/08/66, SID#: 068588345, DL#: 544 210 836
ACCT # 2310322
COSIGNER
Mr. Joe Jones, APT 4909, Bethesda, MD 20814, Tel#: 978-365-6631, DOB: 09/07/66
AUTO LOAN
Mr. Joe Carbello, 1 Bourne St, Clinton MA 01510, TEL#: 978-365-6631, DL#: 544 210 836, DOB: 07/09/66
ACCT #3292322
HOME LOAN
Mr. Joey Carbello, 555 Church Ave, New York, NY 10070, Tel#: 212-693-5312, DL#: 544 210 836, PPN#: 086588345
ACCT #494202
OBLIGOR
Legend: Close match, Exact match
Allows Investigators to Establish True Identity When Suspects Attempt To Hide Or Blur Who They Are and Their Characteristics
EAS Entity Resolution – Risk View
Entity #144465: Marc R Smith, Randal Smith, Mark Randy Smith, 123 Main St

Attribute  Value             Source     Date
Names      Marc R Smith      A-#70001   05/01/05
           Randal Smith      B-#009102  05/10/06
           Mark Randy Smith  C-#6251    07/12/05
Address    123 Main St.      A-#70001   05/01/05
           456 First Street  C-#6251    07/12/05
Phones     (713) 730-5769    A-#70001   05/01/05
           (713) 731-5577    B-#009102  05/10/06
SSN        537-27-6402       A-#70001   05/01/05
DL         1133107           A-#70001   05/01/05
           1133107           C-#6251    07/12/05
DOB        06/17/1934        B-#009103  05/10/06
COUNTRY    Pakistan          B-#009103  05/10/06
ACCTYPE    Wire              A-#70001   05/01/05
OFAC       Match             C-#6251    07/12/06
PEP        No Match          A-#70001   05/01/06
NEGNEWS    Criminal          B-#009103  05/10/06
SICCODE    1023113           A-#70001   05/01/06
HIFCA      Los Angeles       C-#6251    07/12/06
HIDTA      San Diego         B-#009103  05/10/06
EAS Identity Repository – Identity Folder
Identity folder #144465
• Marc R Smith
• Bob Smith
• Mark Robert Smith
• Mark Smith
Architecturally Thinking
• How does this compare to federating data?
• IS this MDM?
• If yes, why?
• If no, why?
• Remember the Filing Cabinet
• Remember the dirty data?
• Consider your understanding of Single View
• Remember that you now have a database footprint
What is Needed To Address The Challenge?
Step Four – Address Network Ambiguation
Uncover “Who Knows Who” – Spot linkages or Non-Obvious Relationships between identities to reveal criminal networks, syndicates, and terrorist cells
• Perform Data Quality (Naturally Occurring), Complexity Level HIGH: Transposition Errors, Multiple Formats, Data Drift, Dirty Data
• Cultural Ambiguities (Naturally Occurring), Complexity Level HIGH: Name Cultures, Name Genders, Name Order
• Resolve Identity (Intentional Act), Complexity Level VERY HIGH: Identity Masking, False Identifiers, Stolen Identifiers
• Relate Identity (Intentional Act), Complexity Level EXTREME: Nominees, Non Obvious Relations, Hidden networks
Entity Analytic Solutions – “Who knows who?”
Sources: A – Credit Card, B – Mortgage, C – DDA, D – Wires, E – Additional internal/external
What relationships does Marc Smith hold with entities across the enterprise?
• Marc is related to Bob from B by a disclosed relationship.
• Marc is related to Joan from B by home address.
• Related to Sue from C by a Tax ID at one degree of separation.
• Related to Alice (through Sue) from D by a phone number at two degrees of separation.
• Related to John (through Alice) from B by a business address at three degrees of separation.
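The degrees of separation listed on this slide can be reproduced with a breadth-first search over the relationship links. This is only a sketch: the link list is transcribed from the slide, while the function name and the `max_degrees` cap are illustrative assumptions rather than anything the product exposes.

```python
from collections import deque

# (entity, entity, linking attribute), transcribed from the slide.
LINKS = [
    ("Marc", "Bob", "disclosed relationship"),
    ("Marc", "Joan", "home address"),
    ("Marc", "Sue", "Tax ID"),
    ("Sue", "Alice", "phone number"),
    ("Alice", "John", "business address"),
]


def degrees_of_separation(start, links, max_degrees=30):
    """Breadth-first search: fewest links from `start` to each reachable entity."""
    neighbours = {}
    for a, b, _attribute in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    degrees = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if degrees[current] == max_degrees:
            continue  # do not expand beyond the configured cap
        for other in neighbours.get(current, ()):
            if other not in degrees:
                degrees[other] = degrees[current] + 1
                queue.append(other)

    del degrees[start]
    return degrees


result = degrees_of_separation("Marc", LINKS)
for name in sorted(result, key=lambda n: (result[n], n)):
    print(f"{name}: {result[name]}")
```

Breadth-first search is used because it visits entities in order of distance, so the first time an entity is reached is also its smallest degree of separation.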
EAS Relationship Resolution – Degrees of Separation (across any attribute(s))
A: Mark Smith, Phone: (713) 730 5769
B: Kate Green, Phone: (713) 730 5796, Addr: 123 Main St
C: Tom Sinclair, Addr: 123 Main St *** OFAC LIST ***
(Transitive Property: If A = B and B = C, then A = C)
EAS Supports 30 Degrees of Separation!
Mark is related to Tom by Two Degrees of Separation.
Identity Folders – Complete Relationship Resolution
[Figure: relationship graph linking the identity folders listed below. Node IDs shown: 144225, 144465, 142365, 143211, 149965, 123101, 143265, 148965, 144215, 142145.]
• Mark Smith, 123 High St, Telford
• Kate Green, 431 Rebus Avenue, Harlow
• Tom Sinclair, 23 Lansbury Ave, Stratford
• Raj Jones, 65 Kenyan Way
• Jim Roberts, 30130 Elm, Boston, MA USA
• Ming Chan, 495 Randal St, Liverpool
• Harold Burr, 402 West St, Bristol
• Luci Tamoia, 13 Galliard House, Leeds
• Gwen Roberts, 95 Arvale Road, London
• Juergen Lit, 921 Rue de Lyon, Paris
Architecturally Thinking
• Degrees of separation…from?
• Other people (identities)
• Other “things” (entities)
• EAS is a fraud detection system
• If yes, why?
• If no, why?
• Remember the Filing Cabinet
• Remember the dirty data?
• The power of understanding a network of identities
• EAS is rarely (never) “rip and replace”
• If not, then where does it fit?
What is Needed To Address The Challenge?
Accommodate Privacy & Security Considerations If Required
Anonymization – For situations where privacy or security concerns make recognition high risk, or sensitive data needs to be de-identified to facilitate cross-agency/country sharing and analytics
• Perform Data Quality (Naturally Occurring), Complexity Level HIGH: Transposition Errors, Multiple Formats, Data Drift, Dirty Data
• Cultural Ambiguities (Naturally Occurring), Complexity Level HIGH: Name Cultures, Name Genders, Name Order
• Resolve Identity (Intentional Act), Complexity Level VERY HIGH: Identity Masking, False Identifiers, Stolen Identifiers
• Relate Identity (Intentional Act), Complexity Level EXTREME: Nominees, Non Obvious Relations, Hidden networks
• Privacy & Security (Reactive Action), Complexity Level EXTREME: Privacy Compliance, Security Drivers
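One common way to de-identify values while still allowing them to be matched is keyed hashing. The sketch below is an assumption about how such anonymization might be approached, not a description of the product's method; the function name and the shared key are hypothetical. Two parties holding the same secret key produce the same token for the same normalized identifier, so records can be compared without exchanging the identifier itself.

```python
import hashlib
import hmac


def anonymize(value: str, shared_key: bytes) -> str:
    """Return a keyed one-way token for an identifier (illustrative only)."""
    # Normalize first so formatting differences do not break matching.
    normalized = "".join(ch for ch in value if ch.isalnum()).upper()
    return hmac.new(shared_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key agreed between the sharing parties.
key = b"hypothetical-shared-secret"

agency_a_token = anonymize("537-27-6402", key)
agency_b_token = anonymize("537 27 6402", key)

print(agency_a_token == agency_b_token)  # same identifier, same token
```

A keyed hash (HMAC) is used instead of a plain hash so that someone without the key cannot rebuild the mapping by hashing every possible identifier. Exact-match tokens like these do not support fuzzy matching, so any normalization has to happen before hashing.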
Entity Analytic Solutions – “Who does what?”
Identity-based Aggregate
Observed Activities
EVENTS
• 01/25/08 10:39 Account Applicant
• 01/25/08 10:55 Account Applicant
• 01/25/08 11:05 Account Applicant
TRANSACTIONS
• Acct #120-555: Withdraw $9,900
• Acct #456-983: Withdraw $9,800
• Acct #942-525: Withdraw $9,800
Customers
• Cust #C-6251, Mark Randy Smith, Acct #120-555
• Cust #A-70001, Marc R Smith, Acct #456-983
• Cust #B-9103, Randal M Smith, Acct #942-525
Business Rules & Thresholds
Sample Rules
• Transaction Amt > $
• Average Transaction Amt
• Number of Transactions > X
• Between Date A and Date B
• Within Geospatial Range
• Combinations of the Above
• User Defined
Streaming Real-Time Monitoring & Alerting
(User) Define New Rules via GUI
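To make the identity-based aggregate concrete, here is a small sketch that applies one threshold rule twice: once per account and once per resolved identity. The transactions and account numbers come from this slide, and the mapping of all three accounts to entity #9453 follows the earlier “Who is who?” example. The $10,000 threshold and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# All three customers were resolved to one entity in the earlier example.
ACCOUNT_TO_ENTITY = {"120-555": 9453, "456-983": 9453, "942-525": 9453}

# (account, withdrawal amount in dollars) from the slide.
TRANSACTIONS = [("120-555", 9900), ("456-983", 9800), ("942-525", 9800)]

THRESHOLD = 10_000  # assumed rule: alert when the total exceeds this amount


def alerts(transactions, key_of, threshold):
    """Sum amounts per key and return the keys whose total exceeds the threshold."""
    totals = defaultdict(int)
    for account, amount in transactions:
        totals[key_of(account)] += amount
    return {key: total for key, total in totals.items() if total > threshold}


by_account = alerts(TRANSACTIONS, lambda account: account, THRESHOLD)
by_identity = alerts(TRANSACTIONS, ACCOUNT_TO_ENTITY.get, THRESHOLD)

print("account-based alerts:", by_account)
print("identity-based alerts:", by_identity)
```

Each withdrawal stays under the threshold when judged by account, so the account-based view raises nothing. Summed over the resolved identity, the same activity totals $29,500 and triggers the rule.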
Traditional Anti Money Laundering – Account Orientation
Fraudsters know how to defeat account based detection systems
To defeat SAR systems, criminals will “structure” activity across multiple accounts, each attached to an identity packet, and across multiple geographies, so the suspicious pattern is watered down and overlooked.
Account-Number-Based Analysis Solutions Possess Blind-Spots
$8,000 Cash Deposit
ACCT# 987-442-004
ACCT# 321-462-567
$8,000 Wire Transfer
$9,500 Cash Deposit
ACCT# 675-466-099
$9,900 Wire Transfer
ACCT# 990-432-000
Entity Analytic Solutions – Identity Orientation
EAS (identity based) Catches The Fraudsters at THEIR Game!
$35,000 ALERT!
“Structuring remains one of the most commonly reported suspected crimes on Suspicious Activity Reports (SARs).” – BSA AML Examination Manual
Architecturally Thinking
• Real time “action resolution” through business rules
• What are the business rules?
• And who knows them?
• Always, always, always consider data overload
• A new term: False positive/False negative
• They can be:
• A great ROI tool when you reduce them
• A REAL PROBLEM when you increase them
• Think: Feedback loops
• Think: Synergy
Entity Analytic Solutions – “How do I find what I should know”
The “Enterprise Amnesia” Model
[Diagram: DATA sources, a DW, and several MARTs.]
• Merge/Purge introduces data loss when picking the “best version”
• Data segregated to “support” dept initiatives
• Latency: Days or Weeks; Days, Weeks or Months
• “Have we seen this applicant before?”
• Are all the details still present?
• New data? Must ask again every day
• What is the right question to ask?
• Will I remember how these facts relate?
Entity Analytic Solutions – “How do I find what I should know”
The “Enterprise Awareness” Model
[Diagram: DATA sources, a DW, and several MARTs.]
• “Have we seen this applicant before?”
• Each New Key Data Value Introduced is Evaluated Against All Prior Key Data Values
• Seconds – Streaming Real-Time
• Alerts pushed to analyst upon suspicious activity
• Process New Key Info First Like a Query
• Queries & Data Flow Through The Same Channel
Catch The Bad Guys! Nominal Latency & Real-Time Context to PRE-EMPT and PREVENT
Example alerts:
• “The address and phone for account 59412 have changed 5 times in just 3 weeks. Alert! Potential Identity Theft.”
• “A bank employee changed their payroll address to the address of an ex-employee jailed for embezzlement three years ago.”
• “This person has applied 10 times before and shares an address and SSN with a bank employee/teller in the same city.”
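The idea of processing new key information "like a query" can be sketched with an inverted index: each incoming record is first looked up against every key value seen so far, and only then stored. The class, method names, and sample records below are hypothetical and exist only to illustrate the pattern.

```python
from collections import defaultdict


class AwarenessIndex:
    """Toy index: every new key value is checked against all prior key values."""

    def __init__(self):
        # (attribute, normalized value) -> ids of records already seen
        self._seen = defaultdict(set)

    def ingest(self, record_id, attributes):
        """Query with the new record first, then store it. Returns prior matches."""
        hits = defaultdict(set)
        for attribute, value in attributes.items():
            key = (attribute, value.strip().lower())
            for earlier_id in self._seen[key]:
                hits[earlier_id].add(attribute)
            self._seen[key].add(record_id)
        return dict(hits)


index = AwarenessIndex()

# Hypothetical records: an employee loaded earlier, then a new applicant.
first = index.ingest("employee-17", {"address": "12 Oak Lane", "ssn": "000-00-0000"})
second = index.ingest("applicant-88", {"address": "12 oak lane ", "ssn": "000-00-0000"})

print(first)   # nothing seen before
print(second)  # the applicant shares attributes with the employee
```

Because the lookup happens at ingestion time, nobody has to think of the right question later: the overlap between the applicant and the employee surfaces the moment the second record arrives.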
Entity Analytic Solutions Architecture
Client
• Visualizer: Search, Graph, Research
• Console: Configure, Secure, Manage
• Roles: User, Admin
Application Server
• Service
• Core: Client * Alert * Entity
• Process: Search * Load * Score
Engine
• Recognize, Resolve, Relate
• Persistent Search & Alerts
Database
• Entity Repository
• Full Attribution
• Fully Auditable
Enterprise
Ability to aggregate by consolidated identities and their full attributes
Relationships available as another dimension
Data Store
EAS Architecture in the Enterprise
Data Source
Metadata
Warehouse Administration
Extraction, Transformation, Validation; Federated Data Access
ETL Data Mart
Fraud
Detection
Analysis /
Mining
CRM
Financial
Extranet Portal
Apps, e.g.
Visualization
Access
Query/Reporting
Intranet Portal
External Audience
Internal Audience
Data Warehouse
M&A Data
Best Customers
Internal
Watch Lists
Transformation & Calculation
Review & Act (manually/auto) on Conflicts in Identities and Relationships per Business Rules
REAL-TIME (or batched) IDENTITY RESOLUTION
Population in the context of the resolved identities
Identity and Relationship Repository
EAS: Always Up-to-date (No Reload)
Employees
Vendors
Online Customers
Additional Data On An Identity
External Lists Of Bad Guys
External
Data Service(s)
Name Matching & Name Management Challenges: Significant Business Issues
Basic Question Number 1: How do you handle the management of your name data?
• Difficult to accurately search for and match customers across family or cultural variants of first and last names
• Can validate addresses & telephone numbers, but how do we know if a name is accurate?
• We have invested millions in cleaning up our customer data file, yet problems remain
• Current solutions are based on very old technology, generate too many false positives & negatives, and are too time-consuming
• This is a pervasive problem in many industries
Callouts:
• “OFAC List Check Was Missed!!”
• “There’s how many variants of that name?”
• “How do you parse ‘Maria Luz Rodriguez v. de Luna’?”
Who cares about names?
[Chart: Criticality of Name Data Management. Axes: size of name data set (Small, Large); risk posed by false negatives / requirement to handle names precisely (Low, High).]
What’s In a Name?
• Names remain the single most important means for identifying persona non grata
• Biometrics are only useful the second time you meet someone
• People everywhere in the world are learning how easily our name search systems can be confounded and circumvented
Why Are Multi-Cultural Names So Hard?
How Do You Verify A Name?
• Nicknames: Drew, Manny, Cat
• Shortened names: Andy, Eman
• Prefixes: Abdul, Fitz, O', De La
• Name Order: Hussein, Mohammed Abu Ali
• Titles: Dr., Rev, Haj, Sri., Col
• Phonetics: Worchester, Wooster, “Worcester”
Variant examples:
• Andras, André, Andre, Drue, Ohndrae, Ohndre
• Eman, Emanual, Imanuel, Immanuele, Manny, Manual
• Mohaammad, Mohammed, Imhemmed, Mohammd, Mohamod, Mohamud
• Cait, Caitey, Katalin, Katchen, Kate, Katerinka
Typically CIFs are focused on storing customer information, demographics, account information etc. They aren’t equipped to deal with the unique demands of classifying, matching and processing global & cultural name variations.
What’s Needed to Address The Challenge?
Successful Name Searching: User, Algorithm, Database
The sole purpose of a Search Engine is to mediate between a User and a Data Base
What We Found: The First Problem
Database Problems
• MARIA ELENA LOPEZ GARCIA
• MARIA ELENA LOPEZ GARCI
• MARIA ELENA LOPEZGARCIA
What We Found: The Second Problem
Ineffective Search Technologies (in addition to Database Problems)
• Exact Match
• Soundex (1918)
• NYSIIS (1963)
• “Home-Grown”
What We Found: The Third Problem
Limited User Support
• Search: Exact Match, Soundex (1918), NYSIIS (1963), “Home-Grown”
• Database name cultures: Chinese, Arabic, Thai, Hispanic, Russian, Korean, Yoruban, Indonesian
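To illustrate why a 1918-era phonetic key struggles with these names, here is a compact sketch of the standard American Soundex encoding, applied to a few names that appear elsewhere in this deck. The implementation follows the commonly published rules (keep the first letter, encode consonants as digits, collapse repeats, ignore H and W between equal codes) and is written for this example only; it is not taken from any product.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex key for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and Y) map to "" and reset `previous`
        if code and code != previous:
            key.append(code)
        previous = code
    return ("".join(key) + "000")[:4]


for pair in [("Worcester", "Wooster"), ("Chang", "Zhang"), ("Tsai", "Cai")]:
    print(pair, [soundex(name) for name in pair])
```

Soundex always keeps the first letter and was designed around English spelling, so variants that sound alike but start with different letters, or that hide consonants the way "Worcester" does, end up with different keys.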
Simple Name Recognition is Particularly Hard
• Cheung Yau So
• Chiusu Sae Chang
• Zhang Qiusu
• Chang Ch’iu-Su
There are hundreds of name variants. There are multiple ways that these names can be spelled.
You can verify an address or a telephone number, but how do you verify a name?
[Map labels: Taiwan, Philippines, Indonesia, Thailand, Cambodia, Myanmar (Burma), Laos, Vietnam, Hong Kong, Macau, Malaysia, China, Singapore]
Solution
[Diagram labels:]
• Name-centric data warehouses
• Anti-Money Laundering Systems
• Customer Information Systems
• ERP Systems (HR, Contracts, etc.)
• Search Lists
• Watch lists (OFAC, PEP, Interpol)
• 3rd Party data sets
• Name data files
Global Name Recognition
• Global Name Recognition consists of a set of tools that complement existing IT investments for organizations looking to analyze, search and process names
• Domain expertise in multi-cultural names in the areas of:
• Name Analysis
• Name Enrichment
• Name Matching
• Adds value to name matching and analysis based on statistical and linguistic analysis of almost a billion names and 18 cultural families
• Reduces false positive results so that the information returned is reliable and relevant
What is GNR?
• A series of Service-Oriented Architecture (SOA) enabled libraries and interfaces that address the linguistic and cultural complexities of names (personal, organizational, …) from around the world
• Used to enhance name processing (analysis, matching, understanding) in a wide variety of systems and applications
• Based on 20+ years of intensive research and data collection on names, drawing on a repository of approximately 1 billion names
Andreas, Andrei, Andrej, Andres, Andresj, Andrewes, Andrews, Andrey, Andrezj, Andrian, Andriel, Andries, Andrij, Andrija, Andrius, Andro, Andros, Andru, Andruw, Andrzej, Andy, Antero, Dandie, Dandy, Drew, Dru, Drud, Drue, Drugi, Mandrew, Ohndrae, Ohndre, Ondre, Ondrei, Ondrej, Ohnrey Ondrey, Eric, Erich, Erick, Erico, Erik, Eryk, Federico, Federigo, Fred, Fredd, Freddie, Freddy, Fredek, Frederic, Frederich, Frederico, Frederik, Fredi, Fredric, Fredrick, Fredrik, Frido, Friedel, Friedrich, Friedrick, Fridrich, Fridrick, Fritz, Fritzchen, Fritzi, Fritzl, Fryderky, Ric, Fredro, Rich, Rick, Ricky, Rik. Rikki. Cait, Caitie, Cate, Catee, Catey, Catie, Kaethe, Kait, Kaite, Kaitlin, Katee, Katey, Kathe, Kati, Katie, Bel, Belia, Belicia, Belita, Bell, Bella, Belle, Bellita, Ib, Ibbie, Isa, Isabeau, Isabela, Isabele, Isabelita, Isabell, Isabella, Isabelle, Ishbel, Isobel, Isobell, Isobella, Isobelle, Issie, Issy, Izabel, Izabella, Izabelle, Izzie, Izzy, Sabella, Sabelle, Ysabeau, Ysabel, Ysabella, Ysobel, Ainslaeigh, Ashalee, Ashalei, Ashelei, Asheleigh, Asheley, Ashely, Ashla, Ashlan, Ashlay, Ashle, Ashlea, Ashleah, Ashlee, Ashlei, Ashleigh, Ashlen, Ashli, Ashlie, Ashly, Boutros, Par, Peder, Pedro, Pekka, Per, Petar, Pete, Peterson, Petr, Petre, Petros, Petrov, Pierce, Piero, Pierre, Piet, Pieter, Pietro, Piotr, Pyotr, Hamid, Hammad, Mahmood, Mahmoud, Mahmud, Mahomet, Mehmet, Mehmood, Mehmoud, Mehmud, Mihammad, Mohamad, Mohamed, Mohamet, Mohammad, Mohammed, Muhamet, Muhammed. Achmad, Achmed, Ahmaad, Ahmad, Ahmet, Ahmod, Amad, Amadi, Amahd, Amed. Amad, Amed, Amahd, Amadi, Ahmad, Amado, Amid, Umed. Iman, Imre, Imani, Imri, Imray, Ismat, Itai, Mead. Ad, Adamo, Adams, Adan, Adao, Addam, Addams, Addem, Addie, Addis, Addison, Addy, Ade, Adem, Adham, Adhamh, Adim, Adnet, Adnon, Adnot, Adom, Atim, Atkins Edom, Adem, Aindrea, Aindreas, Analu,
GNR has implemented a knowledge based approach for coping with the wide array of multi-cultural name forms found in databases.
The Knowledge Base Process
Maria del Carmen Bustamante de la Fuente Hisham Abu Ali Quereshi Noor Eldin Chang Wen Ying Nadezhda Ivanovna Ovtsyuk William Martin Smith-Bagby Jr. Ohndre Van Der Merve
GNR Knowledge Base
• Over 20 years in development
• Information based, not rule based
• Over 200 countries studied and growing
• Close to a billion names & linguistics, and growing
• We have it, no one else does
Example: Ohndre (Male 90%; German; 50 Variants; Parsing; On-Dray; Search)
1. Names are first submitted to an automatic analysis process, which determines the most likely cultural/linguistic origin of the name.
2. Based on this determination, an appropriate algorithm or set of rules is applied to the matching process.
Classification: Ohndre 89% German Van Der Merve 90% Dutch
Parsing: “Van Der Merve, Ohndre”
Gender: Ohndre – 90% Male
Variants: 65 Variations
Phonetics: “On-Dray - Ohndre”
Noise: Ohndre, Ondre, Omdre,
Nicknames: Andy, Drew, Drus
Salutations: Mr, Mrs, Doctor, Haj,
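The two-step process described above (classify the name's likely culture, then apply culture-appropriate rules) can be sketched as a simple dispatch. Everything in this block is a toy assumption: the classifier is a hard-coded lookup rather than a statistical model, and the parsing rules, particle list, and function names are invented for illustration. It is not how GNR is implemented.

```python
SURNAME_PARTICLES = {"van", "der", "de", "la", "von"}


def classify(tokens):
    """Step 1 (toy stand-in): guess a culture from hard-coded hints."""
    lowered = [token.lower() for token in tokens]
    if any(token in {"van", "der"} for token in lowered):
        return "dutch"
    if lowered[0] in {"chang", "zhang", "cheung"}:
        return "chinese"
    return "generic"


def parse_surname_last(tokens):
    """Generic rule: the final token is the surname."""
    return {"surname": tokens[-1], "given": " ".join(tokens[:-1])}


def parse_surname_first(tokens):
    """Assumed rule for surname-first order: the first token is the surname."""
    return {"surname": tokens[0], "given": " ".join(tokens[1:])}


def parse_particle_surname(tokens):
    """Assumed rule: the surname starts at the first particle after the given name."""
    for position, token in enumerate(tokens[1:], start=1):
        if token.lower() in SURNAME_PARTICLES:
            return {"surname": " ".join(tokens[position:]),
                    "given": " ".join(tokens[:position])}
    return parse_surname_last(tokens)


# Step 2: the culture chosen in step 1 selects the parsing rule.
PARSERS = {
    "dutch": parse_particle_surname,
    "chinese": parse_surname_first,
    "generic": parse_surname_last,
}


def analyze(full_name: str):
    tokens = full_name.split()
    if not tokens:
        raise ValueError("empty name")
    culture = classify(tokens)
    return {"culture": culture, **PARSERS[culture](tokens)}


print(analyze("Ohndre Van Der Merve"))
print(analyze("Chang Wen Ying"))
```

The point of the dispatch is that one fixed rule cannot serve every culture: treating the last token as the surname would split "Van Der Merve" and would mis-order a surname-first name.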
Global Name Recognition
Fully automated, high-performance multi-cultural name recognition and analysis
Global Name Management is made up of 2 products:
• Global Name Analytics
• Global Name Scoring
Portfolio:
• IBM Global Name Management (complete portfolio package, minus the Encyclopedia): IBM Global Name Analytics, IBM Global Name Scoring
• Global Name Reference Encyclopedia
• Transliteration: Cyrillic, Latin-2, Greek, Arabic
Global Name Analytics
IBM Global Name Analytics
• Identifies and classifies cultural background of a name
• Determines country of association for a given name
• Recognizes whether a name is predominantly male or female and provides relevant frequency statistics
• Returns name variants and scores in order of their frequency of occurrence
• Determines which name, of a given name combination, is likely to be the given name or surname
Global Name Scoring (aka Name Matching)
IBM Global Name Scoring
Input: Li-Hsiang Tsai
Search Results:
Tsai, Li Hsiang    1.0   116313
Tsai, Lishiang     0.99  102059
Cai, Li-Xiang      0.98  131620
Tasi, Li Hsiang    0.83  158987
Key capability: perform name matching against lists or other data sources
Improved accuracy of name searching, transliteration, and the quality of identity verification initiatives
Tuning capability for more than 40 parameters, allowing for highly tuned and application-specific results
Provides ranked search results based on similarity of pronunciation
Capability to accept names in native script for Arabic, Cyrillic and Greek languages and return results
KEY POINTS: Fast, accurate, scalable name matching
Standalone user-based web application
Lightweight, no need for integration with systems
Comprehensive, interactive reference tool for understanding names, their origins and history
Includes culture-specific information about names, their use, their meanings, and their patterns of spelling variations
Global Name Reference Encyclopedia