© 2006 Identity Systems, a division of Nokia
Identity Resolution – Keys to Successful CDI/MDM
Suresh Menon
Where We’ve Been
• A pioneer in name search and matching
  • First COTS delivered in 1986
• A leader in the global market for 20 years
  • Over 600 clients worldwide
  • Over 60 languages and countries supported
• Leadership in the Identity Resolution space
  • SSA-NAME3
  • Identity Search Server (ISS)
  • Address Standardization Module
  • Data Clustering Engine
• Continued pace-setter in the industry
  • 2005 DM Review World Class Solutions Award
  • Uniting Hurricane Katrina victims and families
  • KM World’s top “100 Companies that matter in 2006”
Proven Solutions
Hewlett Packard
Kaiser Permanente
AT&T
Reader's Digest
Nationwide
Sprint
FedEx
GE Capital
Citigroup
American Express
Visa
Goldman Sachs
Equifax
Experian
FBI
IRS
US Postal Inspector
Dept Homeland Security
Canada Border Services
Australian Customs
DHL
Florida Dept Law Enforcement
Wells Fargo
Partners
CDI and MDM
“CDI is the combination of technology, processes & services needed to create and maintain an accurate, timely and complete view of the customer across multiple channels, business lines and potentially enterprises, where there are multiple sources of customer data in multiple systems and databases.”
- John Radcliffe, Gartner
MDM encompasses specialized applications, techniques and technologies for aggregating the data from source systems, matching, merging, reconciling and standardizing the data, and ensuring that the target applications and systems have access to the master record on a timely basis.
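The merge step in the MDM definition above can be illustrated with a minimal sketch. It assumes the records have already been matched to one entity, and it uses a simple "most recent non-empty value wins" survivorship rule. The field names, the rule itself, and the sample values are assumptions made for illustration, not a description of any particular product.

```python
from datetime import date

def build_master_record(records):
    """Merge records already matched to one entity into a single master record.

    Survivorship rule (an assumption for illustration): for each field,
    keep the non-empty value from the most recently updated record.
    """
    master = {}
    # Process oldest first, so newer non-empty values overwrite older ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if field in ("source", "updated"):
                continue  # bookkeeping fields are not part of the master view
            if value not in (None, ""):
                master[field] = value
    return master

# Hypothetical source records for the same customer.
matched = [
    {"source": "CRM", "updated": date(2005, 3, 1),
     "name": "William Kwok", "phone": "555-0100", "email": ""},
    {"source": "Call Center", "updated": date(2006, 1, 15),
     "name": "Billy H Kwok", "phone": "", "email": "bkwok@example.com"},
]
master = build_master_record(matched)
print(master)
```

A real hub would likely apply different survivorship rules per field (for example, trusting one source for addresses and another for names); the single rule here only shows the shape of the step.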
Not just customers
Concepts apply equally well to:
• Single “employee” view
• Single “supplier” view
• Single “product” view
• Single “taxpayer” view
How do you provide a consistent view across data sources?
• Accounts Receivable
• Accounts Payable
• Technical Services
• Call Center
• Marketing
Identity Resolution and CDI/MDM
• Identity Resolution uses sophisticated algorithms to find and match records about a particular customer from multiple sources despite structural anomalies and quality problems
• The accuracy of the master record hinges on how well ALL records about a particular entity are found and matched, so Identity Resolution is a critical success factor.
• Without a robust Identity Resolution infrastructure, the anticipated ROI of any MDM/CDI implementation is at risk.
What’s Wrong with Identity Data?
• Data entry error
• System inadequacies
• Natural influences
• Fraud and money laundering
• Program or data conversion error
• Inherited error
20 Common Errors & Variations
Variation or Error Example
Sequence errors • Mark Douglas or Douglas Mark
Involuntary corrections • Browne – Brown
Concatenated names • Mary Anne, Maryanne
Nicknames and aliases • Chris – Christine, Christopher, Tina
Noise • Full stops, dashes, slashes, titles, apostrophes
Abbreviations • Wlm/William, Mfg/Manufacturing
Truncations • Credit Suisse First Bost
Prefix/suffix errors • MacDonald/McDonald/Donald
Spelling errors • P0rter
Typing errors • Beht
Transcription mistakes • Hannah, Hamah
Missing tokens • George W Smith
Extra tokens • George Smith, Smith
Foreign sourced data • Khader AL Ghamdi, Khadir A. AlGamdey
Unpredictable use of initials • John Alan Smith, J A Smith
Transposed characters • Johnson, Jhonson
Localization • Stanislav Milosovich – Stan Milo
Inaccurate dates • 12/10/1915, 21/10/1951, 10121951, 00001951
Transliteration differences • Gang, Kang, Kwang
Phonetic errors • Graeme – Graham
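A few of the error classes in the table above can be absorbed with very little code. The sketch below is illustrative only: it handles noise (punctuation and titles), sequence errors (token order), and small typing errors such as a transposed pair of characters, using Python's standard `difflib`. The `TITLES` set and the function names are assumptions, and this is not how SSA-NAME3 or ISS work internally. Classes such as nicknames, concatenated names, or transliteration differences are not covered by it.

```python
import re
from difflib import SequenceMatcher

# Illustrative list of titles to treat as noise; not exhaustive.
TITLES = {"MR", "MRS", "MS", "DR"}

def normalize(name):
    """Strip noise and make token order irrelevant."""
    # Replace punctuation (full stops, dashes, slashes, apostrophes) with spaces.
    cleaned = re.sub(r"[^A-Za-z0-9 ]", " ", name).upper()
    tokens = [t for t in cleaned.split() if t not in TITLES]
    # Sorting the tokens makes "Mark Douglas" and "Douglas Mark" identical.
    return " ".join(sorted(tokens))

def similarity(a, b):
    """Score from 0.0 to 1.0 between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

print(similarity("Mark Douglas", "Douglas, Mark"))    # sequence error plus noise
print(similarity("Mr. George Smith", "George Smith")) # title as noise
print(similarity("Johnson", "Jhonson"))               # transposed characters
```

Sorting tokens is a deliberate simplification: it fixes sequence errors but makes concatenated names such as "Mary Anne" and "Maryanne" score poorly, which is one reason a single technique is rarely enough.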
Worldwide Variations Challenge Existing Systems
[Figure: the same identities as they appear across Record 1, Record 2 and Record 3]
• ABDULLAH AL MUSA / A.ALLAH ALMOUSA / عبداالله الموس
• GEORGE PAPADOPLOUS / ΓΕΩΡΓ ΠΑΠΑΔΟΠΟΥΛΟΣ / ΠΑΠΑΔΟΠΟΥΛΟΣ ΙΩΑΝΝΗΣ
• KIYO R SAHATO / SAHATO-ROH, KIYO
• MARGARET MACCLARY / PEG MC CARY / GRIETJE MCCLLARY
• W. KWOK KI HOH / WILLIAM KWOK / MR. BILLY H KWOK
Enabling the 360 View of Entities
How well do you know your customers?
[Figure: Identity Search Server (ISS) serving searches across enterprise-wide data sources]
• Enterprise-wide data sources: CRM, CIF, HR, Contracts, Partners, Vendors
• Search entry points: XML/SOA search, HTML client search, batch relate search, API search
• Example searches: enterprise search for 擺禮; 314A search for Jack Abramoff; additional OFAC entry Abu Musab al-Zarqawi; new customer Jonathan Smitthers
Real World Example
• Hewlett Packard
  • Hundreds of millions of customers
  • Acquisition of Compaq
  • Single standardized view of all B2B customers
    • Across enterprise
    • Across geography
  • ID duplicates
  • Enrich with additional intelligence
  • Unicode
  • Simplify integration
  • Real-time or batch
• 250,000 transactions each day
• Saved 100,000 man hours
[Figure: CDI Hub architecture]
• External applications: CRM, ERP, Legacy, Partners
• Customer Analytics, DW, MDM
• Consoles: Designer, Admin Console, Data Steward Console
• Identity Search Server with ISS Index: Unified Customer Index (ISS)
• External data sources and external reference databases
• Correlation
• CDI Hub
Optimum Identity Resolution Framework
• “Smart” indexing & key building
• Flexible search strategies
• Matching algorithms
• Speed and scale
• Simple to use and deploy
• XML/SOA integration
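To make "indexing & key building" concrete, here is a minimal sketch of one common idea: derive a fuzzy key from each name token, index records under every key, and use the keys to retrieve a small candidate set that a matching algorithm can then score. The key recipe (first letter plus consonant skeleton, four characters at most) and the `NameIndex` class are assumptions chosen for brevity; they do not describe how ISS builds its keys.

```python
from collections import defaultdict

VOWELS = set("AEIOUY")

def token_key(token):
    """Crude fuzzy key: first letter plus the consonant skeleton, max 4 chars."""
    token = "".join(ch for ch in token.upper() if ch.isalpha())
    if not token:
        return ""
    key = token[0]
    for ch in token[1:]:
        # Skip vowels and collapse immediately repeated letters.
        if ch not in VOWELS and ch != key[-1]:
            key += ch
    return key[:4]

def build_keys(name):
    """One key per token, so word order and missing tokens matter less."""
    return {token_key(t) for t in name.split()} - {""}

class NameIndex:
    """Inverted index from fuzzy keys to record ids."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, record_id, name):
        for key in build_keys(name):
            self._index[key].add(record_id)

    def candidates(self, name):
        """Record ids sharing at least one key with the search name."""
        found = set()
        for key in build_keys(name):
            found |= self._index.get(key, set())
        return found

index = NameIndex()
index.add(1, "Mark Douglas")
index.add(2, "Ann Browne")
index.add(3, "Ian McDonald")

print(index.candidates("Douglas Mark"))  # sequence error
print(index.candidates("Brown"))         # involuntary correction
print(index.candidates("MacDonald"))     # prefix variation
```

Key-based retrieval trades precision for speed: it keeps searches fast at scale, but any variation the key recipe does not anticipate is never offered to the matcher, which is why key design and search strategy are treated as separate concerns.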
Matching Algorithms
• Hundreds of algorithms exist to solve this problem.
• The following 5 are the most commonly used within the DQ/IRT industry.
• Probabilistic
  • Based on statistical frequency analysis; derives key values that are used to perform matching.
  • Unable to catch very common errors such as character transpositions.
• Heuristic
  • Operates on a “rule of thumb” derived from experience that may be true.
  • Cannot deal with data anomalies for which a rule is not present.
• Deterministic
  • A deterministic algorithm is one where the behavior can be predicted from the input.
  • Unable to consider such common data anomalies as blank fields, transposition of characters, or abbreviations.
• Empirical
  • Data driven, based on experience or observation; can also reflect distinct cultural standards.
  • Dependent on dictionaries/rulesets, so it cannot compensate for any “new” errors or variations.
• Language recognition
  • Based on identifying, from a dictionary of names, what cultural background a given name comes from.
  • Unable to compensate for new names or hybrids.
  • Valid variations could mean that incorrect rules are applied, leading to missed matches.
• Best Solution: Hybrid
  • “Which algorithm is the best in solving my searching and matching needs?”
  • The answer is “No single algorithm is capable of compensating for all the classes of error and variation present in identity data.”
  • In order to achieve a consolidated view of your identity data, you will need a combination of these algorithms, and more, each one addressing a particular class of problem.
  • IDS technology uses a variety of techniques, including the five mentioned here and many more, to address different classes of error and variation in identities.
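The hybrid idea can be sketched in a few lines: try several techniques in turn, each covering a class of variation the others miss. The example below combines a deterministic exact comparison, an empirical dictionary lookup, and a character-similarity check from Python's `difflib`. The similarity check is a swapped-in stand-in and is not one of the five techniques named above; the tiny `NICKNAMES` table, the threshold, and all function names are assumptions for illustration rather than the IDS implementation.

```python
from difflib import SequenceMatcher

# Empirical component: a tiny, illustrative nickname table (not a real ruleset).
NICKNAMES = {"BILLY": "WILLIAM", "BILL": "WILLIAM", "PEG": "MARGARET"}

def deterministic(a, b):
    """Exact comparison after case folding."""
    return a.upper() == b.upper()

def empirical(a, b):
    """Dictionary lookup: do both names map to the same canonical form?"""
    canonical_a = NICKNAMES.get(a.upper(), a.upper())
    canonical_b = NICKNAMES.get(b.upper(), b.upper())
    return canonical_a == canonical_b

def approximate(a, b, threshold=0.8):
    """Character-level similarity, tolerant of transpositions and typos."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio() >= threshold

def hybrid_match(a, b):
    """Return the label of the first technique that accepts the pair, else None."""
    techniques = (
        ("deterministic", deterministic),
        ("empirical", empirical),
        ("approximate", approximate),
    )
    for label, technique in techniques:
        if technique(a, b):
            return label
    return None

for pair in [("Smith", "SMITH"), ("Billy", "William"),
             ("Johnson", "Jhonson"), ("Smith", "Kwok")]:
    print(pair, hybrid_match(*pair))
```

Each technique alone fails at least one of the four pairs: the exact comparison misses the nickname and the transposition, the dictionary misses the transposition, and the similarity check misses "Billy" versus "William". Ordering the techniques from cheapest to most permissive is one possible design choice, not the only one.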
What’s Next: ISS+UDM
• The Unstructured Data Module extends Identity Systems' identity searching and matching technology to unstructured data.
• Identity data goes far beyond simple demographic data and could be product information, transactions or orders, emails and telephone call logs.
• Gives an organization the ability to find and access all of the identities stored in all structured and unstructured data repositories, despite the variations, errors and formatting differences
• A cohesive view of “all that we know”
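As a rough illustration of identity matching over unstructured text, the sketch below slides a window of words across free text (an email body, for instance) and compares each window to a list of known names with an approximate string comparison. It is a toy under stated assumptions: `find_identities`, the threshold, and the sample email are hypothetical, and the Unstructured Data Module itself is not described in enough detail here to be reproduced.

```python
import re
from difflib import SequenceMatcher

def find_identities(text, known_names, threshold=0.85):
    """Return (known_name, matched_text) pairs found in free text."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for name in known_names:
        size = len(name.split())
        # Slide a window with as many words as the known name.
        for i in range(len(words) - size + 1):
            window = " ".join(words[i:i + size])
            score = SequenceMatcher(None, name.upper(), window.upper()).ratio()
            if score >= threshold:
                hits.append((name, window))
    return hits

# Hypothetical email text containing a misspelled customer name.
email = "Per our call, please ship the order to Jonathon Smithers by Friday."
print(find_identities(email, ["Jonathan Smitthers"]))
```

Comparing every window against every known name does not scale; in practice the known names would be indexed first and only candidate windows would be scored.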
Overcoming the Limits of Identity Data
• Maximize data quality at point of capture
• Use specialized search and matching technology
• Use search technology that doesn’t assume data has been corrected
• Ensure the same high-quality search covers all sources of customer data
• When searching against a fraud list, use a search strategy commensurate with the risk
• Identity data goes far beyond simple demographic data and could be product information, transactions or orders, emails and telephone call logs.
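The advice to use a search strategy commensurate with the risk can be read as: the more costly a missed match, the wider the net. A minimal sketch, assuming a single similarity score and a per-purpose threshold, is shown below. The purposes, the threshold values, and the `search` function are all hypothetical; real strategies would usually vary more than a single cutoff.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds: the higher the cost of a missed match,
# the lower the score needed to raise a candidate for review.
THRESHOLDS = {
    "marketing": 0.90,
    "customer_service": 0.80,
    "fraud_watchlist": 0.60,
}

def search(query, names, purpose):
    """Return names scoring at or above the threshold for this purpose, best first."""
    threshold = THRESHOLDS[purpose]
    scored = (
        (SequenceMatcher(None, query.upper(), name.upper()).ratio(), name)
        for name in names
    )
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]

watchlist = ["Khader AL Ghamdi", "Abdullah Al Musa"]
query = "Khadir A. AlGamdey"

print(search(query, watchlist, "customer_service"))  # stricter: fewer candidates
print(search(query, watchlist, "fraud_watchlist"))   # looser: more candidates to review
```

The looser setting finds the foreign-sourced variation that the stricter one misses, at the price of more candidates for a person to review. That trade-off is the point of tying the strategy to the risk.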