10-Step Methodology to Building a Single View
Mat Keep, Director of Product & Market Analysis. [email protected] @matkeep
Jon Rangel, Director of Professional Services, EMEA. [email protected]
What You Will Learn
1. Single View: Opportunities & Challenges
2. Repeatable 10-Step Methodology
3. Required Technical Capabilities
Single View Defined
• What
  – Single, real-time representation of a business entity or domain
  – Customer, product, supply chain, financial asset class, & more
• How
  – Gathers and organizes data from multiple, disconnected sources
  – Aggregates information into a standardized format and joint information model
• Why
  – Improves business visibility
  – Serves operational applications
  – Foundation for analytics
Single View Use Cases
• Finance
  – Comparative view of traders or products
  – Firm-wide view of asset exposure
  – Aggregated transactions for fraud models
• Retail
  – Omni-channel view of customers for personalized marketing
  – Inventory control & management
  – Single view of product across channels & demographics
• Healthcare
  – Management of patient medical records for treatment plans
  – Macro-analysis view for public health
  – Medical history to identify insurance risk
Challenges
• Current State
  – Data dispersed across a multitude of systems
  – Different structures, different attributes
  – Apps built to meet specific business requirements, not integrated
  – New data sources from new apps, M&A
• Governance Processes
  – How to deliver & maintain the single view in the face of constant business change
• Technology Limitations
  – Traditional databases are not well suited to the capabilities a single view requires
High Level Architecture
[Diagram: source systems (Web, Mobile, CRM, Mainframe) load data through ETL or a message queue into the single view; consuming systems (Call Center, Analytics, Technical Support, Billing) read from the single view.]
10-Step Methodology
• Discover
  – Step 1: Define Scope
  – Step 2: Identify Data Consumers
  – Step 3: Identify Data Producers
• Develop
  – Step 4: Appoint Data Stewards
  – Step 5: Develop Data Model
  – Step 6: Load & Standardize
  – Step 7: Merge, Test & Reconcile
• Deploy
  – Step 8: Infrastructure Design
  – Step 9: Modify Consuming Systems
  – Step 10: Maintenance Processes
Step 1: Define Scope & Sponsorship (Discover)
• Scope needs to be realistic and defined by a specific success metric
  – Long term: aggregate all customer data into a single view, serving all business functions
  – Initial phase: collect all customer interactions on digital channels over the past 3 months to improve call center MTTR
• Appoint executive sponsors
  – Senior: able to allocate resources and command credibility
  – Combination of senior titles from the business and from the technology group
Steps 2 & 3: Identify Data Consumers & Producers (Discover)
• Step 2: Data Consumers. Single view consumers define
  – Typical queries and SLAs
  – Required data attributes
  – Current data sources
• Step 3: Data Producers. Identify apps generating the source data
  – Identify application owners + associated databases
  – Profile apps: operational, analytical
Step 4: Appoint Data Stewards (Develop)
• Data steward appointed for each data source
• Deep knowledge of:
  – Source system schema
  – Which tables store the required attributes, and in what format
  – Clients and apps that generate & consume the source data
• Advise on data loading strategies
Step 5: Develop Single View Data Model (Develop)
• Key inputs
  – Required data attributes
  – Query patterns
• Define common fields & data types
  – Create rules to validate common data
• Define primary & secondary indexes
• Identify dynamic fields
  – No need to pre-declare when using a document database
• Localize data into a single document (where appropriate)

{
  _id: "[email protected]",
  first_name: "Mark",
  last_name: "Smith",
  city: "San Francisco",
  phones: [
    { number: "1-212-777-1212", dnc: true, type: "home" },
    { number: "1-212-777-1213", type: "cell" }
  ]
}
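The split between validated common fields and undeclared dynamic fields can be sketched in a few lines of Python. This is only an illustration: the `COMMON_FIELDS` table, the `validate` helper and the `example.com` address are assumptions, not part of the methodology.

```python
# Common fields every single view document must carry, with expected types.
# The field list is an assumption modelled on the sample customer document.
COMMON_FIELDS = {
    "_id": str,
    "first_name": str,
    "last_name": str,
}


def validate(doc: dict) -> list:
    """Return a list of rule violations; an empty list means the document passes."""
    errors = []
    for field, expected in COMMON_FIELDS.items():
        if field not in doc:
            errors.append(f"missing common field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    # Dynamic fields (anything not listed above) are accepted without declaration.
    return errors


customer = {
    "_id": "mark.smith@example.com",  # hypothetical stand-in value
    "first_name": "Mark",
    "last_name": "Smith",
    "city": "San Francisco",          # dynamic field
    "phones": [                       # dynamic field, embedded in the same document
        {"number": "1-212-777-1212", "dnc": True, "type": "home"},
        {"number": "1-212-777-1213", "type": "cell"},
    ],
}

print(validate(customer))
print(validate({"_id": 42, "first_name": "Mark"}))
```

Only the common fields are checked, so a source system can start sending a new attribute without any change to the rules.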
Step 6: Load (Develop)
• 2 phases: Initial Load & Delta Load
• Emit JSON to preserve data types; use Extended JSON
• Initial Load
  – ETL tools
  – Custom loaders
• Delta Load
  – Batch loads: use the tools above
  – Real-time loads: message queue
[Diagram: data is loaded through ETL or a message queue into the single view.]
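Plain JSON has no date type, which is why the load step recommends Extended JSON. A minimal sketch of a custom loader's output stage follows, using only the standard library and the canonical `$date` / `$numberLong` form. The `to_extended_json` helper is hypothetical, and in practice a driver utility such as PyMongo's `bson.json_util` would normally do this work.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_extended_json(value):
    """Recursively wrap datetimes in the canonical Extended JSON date form."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption for this sketch: naive datetimes are already UTC.
            value = value.replace(tzinfo=timezone.utc)
        millis = (value - EPOCH) // timedelta(milliseconds=1)
        return {"$date": {"$numberLong": str(millis)}}
    if isinstance(value, dict):
        return {key: to_extended_json(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_extended_json(item) for item in value]
    return value


record = {"source_id": "A_14", "dob": datetime(1968, 7, 14, tzinfo=timezone.utc)}
# One JSON document per line is a convenient shape for batch and delta loads.
print(json.dumps(to_extended_json(record)))
```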
Step 6 (cont’d): Standardize (Develop)

Data Source A              Data Source B              Data Source C
cust_id: 14                fno: 77                    xc_id: 26
f_name: James              first: Jim                 name: James Bind
l_name: Bond               last: Bond
dob: 07/14/1968            born: 1968-07-14           bdate: July 14, 68
eMail: [email protected]     email: [email protected]     Email: [email protected]
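The standardization of the three sample records can be sketched as a mapping of source field names onto common names plus date normalization. The field maps are read off the sample records, while the `standardize` and `parse_dob` helpers and the `example.com` addresses are assumptions for illustration.

```python
from datetime import date, datetime

# Source field names mapped onto the common names (sources A and B only;
# source C keeps the full name in one field and is handled separately).
FIELD_MAPS = {
    "A": {"id": "cust_id", "first": "f_name", "last": "l_name", "dob": "dob", "email": "eMail"},
    "B": {"id": "fno", "first": "first", "last": "last", "dob": "born", "email": "email"},
}
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %y")


def parse_dob(text: str) -> str:
    """Try each known source format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt).date()
        except ValueError:
            continue
        # strptime maps two-digit years 00-68 to 20xx; a birth date cannot be
        # in the future, so roll such values back a century.
        if parsed > date.today():
            parsed = parsed.replace(year=parsed.year - 100)
        return parsed.isoformat()
    raise ValueError(f"unrecognized date: {text!r}")


def standardize(source: str, rec: dict) -> dict:
    """Rename fields to the common model and normalize data types."""
    if source == "C":
        first, _, last = rec["name"].partition(" ")
        rec_id, dob, email = rec["xc_id"], rec["bdate"], rec["Email"]
    else:
        names = FIELD_MAPS[source]
        first, last = rec[names["first"]], rec[names["last"]]
        rec_id, dob, email = rec[names["id"]], rec[names["dob"]], rec[names["email"]]
    return {
        "source_id": f"{source}_{rec_id}",
        "first_name": first,
        "last_name": last,
        "dob": parse_dob(dob),
        "eMail": email.lower(),
    }


# The e-mail addresses below are hypothetical stand-ins.
print(standardize("A", {"cust_id": 14, "f_name": "James", "l_name": "Bond",
                        "dob": "07/14/1968", "eMail": "JB@example.com"}))
print(standardize("C", {"xc_id": 26, "name": "James Bind",
                        "bdate": "July 14, 68", "Email": "jb@example.com"}))
```

The two-digit year in source C is the kind of detail a data steward would be expected to flag: without the century correction, the parsed date lands in the future.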
Step 7: Match, Merge & Reconcile (Develop)

Source Data
  A: cust_id: 14 f_name: James l_name: Bond dob: 07/14/1968 eMail: [email protected]
  B: fno: 77 first: Jim last: Bond born: 1968-07-14 email: [email protected]
  C: xc_id: 26 name: James Bind bdate: July 14, 68 Email: [email protected]

Standardized Data (field names & data types)
  source_id: A_14 first_name: James last_name: Bond dob: 1968-07-14 eMail: [email protected]
  source_id: B_77 first_name: Jim last_name: Bond dob: 1968-07-14 eMail: [email protected]
  source_id: C_26 first_name: James last_name: Bind dob: 1968-07-14 eMail: [email protected]

Single View (data merged, tested & reconciled)
  _id: [email protected] first_name: James last_name: Bond dob: 1968-07-14
Step 7 (cont’d): Match, Merge & Reconcile (Develop)
• Use iterative grouping functions to cluster records with similar attributes
  1. Match against unique, authoritative attributes (email address, credit card #)
  2. Match by combining attributes (last name, DoB, zip code)
  3. Use fuzzy matching to catch errors in source data (e.g. different spellings of a customer name)
• Apply a confidence factor to dictate merging
  – Automatically merge records with 95%+ confidence
  – Manually inspect records with lower confidence
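The three matching passes and the confidence rule can be sketched as a scoring function. Only the 95% threshold comes from the text. The 0.97 composite score, the 0.80 review floor, the helper names and the use of `difflib.SequenceMatcher` for the fuzzy pass are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

AUTO_MERGE = 0.95  # threshold taken from the 95%+ rule above
REVIEW = 0.80      # assumed lower bound for manual inspection


def full_name(rec: dict) -> str:
    return f"{rec.get('first_name', '')} {rec.get('last_name', '')}".strip().lower()


def match_confidence(a: dict, b: dict) -> float:
    """Score how likely two standardized records describe the same entity."""
    # Pass 1: a unique, authoritative attribute matches exactly.
    if a.get("eMail") and a.get("eMail") == b.get("eMail"):
        return 1.0
    # Pass 2: a combination of attributes matches exactly.
    combined = ("last_name", "dob", "zip")
    if all(a.get(k) is not None and a.get(k) == b.get(k) for k in combined):
        return 0.97  # assumed score for a composite match
    # Pass 3: fuzzy name similarity, considered only when the DoB agrees.
    if a.get("dob") is not None and a.get("dob") == b.get("dob"):
        return SequenceMatcher(None, full_name(a), full_name(b)).ratio()
    return 0.0


def decide(a: dict, b: dict) -> str:
    score = match_confidence(a, b)
    if score >= AUTO_MERGE:
        return "merge"
    if score >= REVIEW:
        return "review"
    return "distinct"


# Hypothetical records; the e-mail addresses are stand-ins.
rec_a = {"first_name": "James", "last_name": "Bond", "dob": "1968-07-14", "eMail": "jb@example.com"}
rec_b = {"first_name": "Jim", "last_name": "Bond", "dob": "1968-07-14", "eMail": "jb@example.com"}
rec_c = {"first_name": "James", "last_name": "Bind", "dob": "1968-07-14", "eMail": "bond.j@example.com"}

print(decide(rec_a, rec_b))  # same e-mail address
print(decide(rec_a, rec_c))  # misspelled surname, same date of birth
```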
Step 7 (cont’d): MongoDB Tools (Develop)
• Workers framework to parallelize document comparisons
• Grouping tool to cluster documents based on attribute similarity
  – Levenshtein to calculate distances, single-linkage clustering for matching
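The two techniques named above can be illustrated with a short, self-contained sketch. It is not the MongoDB grouping tool itself: the function names and the `max_distance` parameter are assumptions, and the pairwise loop is the simplest possible form rather than a parallelized one.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete from s
                            curr[j - 1] + 1,      # insert into s
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def single_linkage(names: list, max_distance: int) -> list:
    """Cluster names so that any pair within max_distance ends up together.

    Single linkage merges two clusters when their closest members are near
    enough, which equals the connected components of the "near" graph.
    """
    parent = list(range(len(names)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if levenshtein(names[i], names[j]) <= max_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


print(single_linkage(["James Bond", "James Bind", "Jim Bond", "Mark Smith"], max_distance=1))
```

Single linkage chains matches together, so a loose `max_distance` can pull unrelated records into one cluster; that is one reason to keep manual review for lower-confidence groups.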
Step 8: Architecture Design (Deploy)
• Deployment infrastructure
• MongoDB Production Readiness Consulting Package provides recommendations:
  – Hardware sizing
  – HA/DR strategies
  – Scaling
  – Security for corporate and regulatory compliance
• Follow-on services for implementation
Step 9: Modify Consuming Systems (Deploy)
• Modify the apps that consume the single view
  – Create an API that exposes the single view (e.g. a RESTful web service)
  – Re-point apps to the web service (reads initially)
• Modify one consuming application at a time
[Diagram: consuming systems (Call Center, Analytics, Technical Support, Billing) read from the single view.]
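A read-only web service in front of the single view might look like the following WSGI sketch. Everything in it is an assumption for illustration: the `/customers/<id>` route, the in-memory `SINGLE_VIEW` dictionary standing in for the database, and the `call` helper that exercises the app without a network socket.

```python
import json

# In-memory stand-in for the single view collection (assumption for this sketch).
SINGLE_VIEW = {
    "A_14": {"first_name": "James", "last_name": "Bond", "dob": "1968-07-14"},
}


def app(environ, start_response):
    """Minimal read-only WSGI app: GET /customers/<id> returns one document."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if environ.get("REQUEST_METHOD") != "GET":
        status, body = "405 Method Not Allowed", {"error": "read-only API"}
    elif len(parts) == 2 and parts[0] == "customers" and parts[1] in SINGLE_VIEW:
        status, body = "200 OK", SINGLE_VIEW[parts[1]]
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(method: str, path: str):
    """Invoke the app directly, the way a WSGI server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response))
    return captured["status"], json.loads(body)


print(call("GET", "/customers/A_14"))
print(call("POST", "/customers/A_14"))
```

The standard library's `wsgiref.simple_server.make_server` can serve `app` for local experiments. Rejecting non-GET methods mirrors the "reads initially" guidance: consuming apps are re-pointed one at a time before any write path is opened.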
Step 10: Implement Maintenance Processes (Deploy)
• Frequency of application launch & evolution is accelerating
• Impacts to the single view
  – Adding new attributes from source systems
  – Onboarding new data sources or digital channels
  – Creating new apps that consume the single view
• Single view team needs to institutionalize governance around ongoing maintenance
  – Repeat the 10-step process
  – Dynamic schema is a huge advantage here
Single View Maturity Model
Transforming the role of the single view
[Diagram: maturity stages plotted by scope against business benefits.]
• Read Centric
  – Single View Application: records are copied via ETL or message queue mechanisms from the source systems into the single view, serving read queries. The single view serves one specific application.
  – Single View Platform: the single view data model is enriched with additional sources to serve more applications, including real-time analytics. The single view becomes a platform serving multiple applications.
• Reads & Writes
  – Dual Writes: writes are performed concurrently to the source systems as well as the single view.
  – Single View First: transactions are written first to the single view, which propagates the data back to the source system of record.
Architecture for Writes to the Single View
• Advantages of writing to the single view
  – Fresher data
  – Reduced app complexity
  – Improved application agility
[Diagram: source systems (Web, Mobile, CRM, Mainframe) load data through ETL or a message queue into the single view; consuming systems (Call Center, Analytics, Technical Support, Billing) read from and write to the single view; writes flow back to the source systems through an update queue.]
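The write path in this architecture can be sketched as a write to the single view followed by an entry on an update queue that is later drained into the owning source system. The in-memory dictionaries, the `write` and `drain` helpers, and the choice of `queue.Queue` as the update queue are assumptions for illustration only.

```python
import queue

single_view = {}                               # stand-in for the single view store
update_queue = queue.Queue()                   # stand-in for the update queue
source_systems = {"CRM": {}, "Mainframe": {}}  # stand-ins for systems of record


def write(doc_id: str, changes: dict, owner: str) -> None:
    """Apply a write to the single view first, then queue it for the source system."""
    single_view.setdefault(doc_id, {}).update(changes)
    update_queue.put((owner, doc_id, changes))


def drain() -> int:
    """Propagate queued updates back to the owning source systems."""
    applied = 0
    while not update_queue.empty():
        owner, doc_id, changes = update_queue.get()
        source_systems[owner].setdefault(doc_id, {}).update(changes)
        update_queue.task_done()
        applied += 1
    return applied


write("A_14", {"city": "London"}, owner="CRM")
print(single_view["A_14"])    # readers see the change immediately
print(source_systems["CRM"])  # the source system has not been updated yet
drain()
print(source_systems["CRM"])  # the update has been propagated
```

Readers of the single view see the change before the source system does, which is the "fresher data" advantage. A production design would also need durable queuing and conflict handling, which this sketch leaves out.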
Required Database Capabilities
• Data model flexibility with a dynamic schema
• Real-time analytics
• Performance, scale & always-on
• Enterprise deployment model
Enterprise Deployment Model
• MongoDB Compass: Schema Visualization, Data Exploration, Ad-Hoc Queries
• MongoDB Connector for BI: Visualization, Analysis, Reporting
• MongoDB Enterprise Server: Authorization, Auditing, Encryption (In Flight & at Rest), Authentication
• MongoDB Ops Manager: Monitoring & Alerting, Query Optimization, Backup & Recovery, Automation & Configuration
• REST API
• 24 x 7 Support (1 hour SLA)
• Emergency Patches
• Customer Success Program
• On-Demand Online Training
• Commercial License (No AGPL Copyleft Restrictions)
• Warranty
• Limitation of Liability
• Indemnification
• Platform Certifications
Single View of Customer
Insurance leader generates coveted single view of customers in 90 days – “The Wall”
• Problem
  – No single view of customer, leading to poor customer experience and churn
  – 145 years of policy data, 70+ systems, 24 toll-free (800) numbers, 15+ front-end apps that are not integrated
  – Spent 2 years and $25M trying to build a single view with an RDBMS; the project failed
• Solution / Why MongoDB
  – Built “The Wall,” pulling in disparate data and serving a single view to customer service reps in real time
  – Flexible data model to aggregate disparate data into a single data store
  – Expressive query language and secondary indexes to serve any field in real time
• Results
  – Prototyped in 2 weeks
  – Deployed to production in 90 days
  – Decreased churn and improved ability to upsell/cross-sell
Single View of LHC Analytics
Data aggregation system to accelerate scientific research & discovery
• Problem
  – Raw data from the LHC & experiments distributed across a multitude of source systems
  – Scientists don’t know the location of source data, or how to extract it
  – Rigid data model of relational databases prevented aggregation of data from different sources
• Solution / Why MongoDB
  – Data Aggregation System built on MongoDB, consolidating analytics into a single view
  – Dynamic schema represents data of any structure
  – MongoDB query language supports everything from simple lookups to complex search, traversals & analytics
• Results
  – A single query to MongoDB can return 10,000 documents from different data sources for real-time analytics
  – Accelerates scientific time to insight
  – Accessed by 3,000 physicists from 200 research institutions across the globe
Where to Go from Here?
• Single view projects are challenging
  – Partner with a vendor offering proven methodology, tools & technologies
• Learn More
  – Download the whitepaper: 10-Step Methodology to Building a Single View
• Engage
  – MongoDB Global Consulting Services can help you scope the project and get started
  – Book a workshop
Single View of the Customer (Global Airline)
360° view of the customer increases customer satisfaction, cross-sell & up-sell with MongoDB, Spark, & Hadoop
• Problem
  – Customer data scattered across 100+ different systems
  – Poor customer experience: no personalization, no consistent experience across brands or devices
  – No way to analyze customer behavior to deliver targeted offers
• Solution / Why MongoDB
  – Single view application on MongoDB: flexible data model, expressive query language, secondary indexes, & horizontal scalability
  – Data from old relational systems fed into Spark for analysis and then stored in MongoDB to support real-time CRM
  – Customer data synced from MongoDB to Hadoop for nightly batch jobs, then fed back to MongoDB for personalized recommendations
• Results
  – Single view serves customers from any channel
  – Stores 10s of TBs of customer data across multiple data centers
  – Increased revenues from improved customer intimacy, driving cross-sell and upsell
Data Model Flexibility
[Diagram: feeds from Mobile App, Web, Call Centre, CRM, Social Feed and other sources populate the single view.]
• Common fields: CustomerID, eMail
• Dynamic fields: can vary from record to record (location, action)
Real-Time Analytics
[Diagram: the Customer Service Application sends queries & updates to the MongoDB primary replica of the single view. MongoDB secondary replicas of the single view serve BI & Reporting (visualisations), REST Data Services (real-time data services for regulators & partners), and the Data Analytics Pipeline (aggregates, predictive analytics).]