Semi-automated mapping of industry standards
A modern approach
Köln, 18.09.2019, eCl@ss Congress
Slide 2 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Value chain management at the leading edge – from strategy to solutions
Intelligent Data Management Camelot Approach
CAMELOT Management Consultants: Thought Leader in Value Chain Management for more than 20 years
Global industry specialist in Life Sciences, Chemicals, Consumer Packaged Goods and Industrial Manufacturing
Specialized expertise for superior project quality and results
Camelot Innovative Technologies Lab is our incubator for Digital Innovations and business applications
Co-Innovation Partnerships with the leading software providers such as SAP SE & IBM
Data-centric and digital business solutions
Sourcing & Network Collaboration
Supply Chain Operations & Manufacturing
Distribution & Logistics
Sales & Customer Centricity
Strategy & Business Model Innovation
Organization & Business Transformation
Finance & Performance Management
Data & Analytics
Slide 3 | © CAMELOT 2019 | Semi-automated mapping of industry standards
AI@CAMELOT
Our partnerships bridge science and business:
DFKI
Student networks at the University of Mannheim
Start-up companies
Regular presentations at renowned conferences
HOLISTIC APPROACH
We combine data, people, technology and processes to deliver superior quality solutions that build competitive advantage at any stage of the value chain.
TEAM OF EXPERTS
Our team of highly qualified data scientists and business experts develops solutions for your individual needs and helps to generate value starting with the first AI PoC.
Follow us, share with us!
EXPERIENCE
Proven track record of over 15 years in data and information management
10 creative workshops conducted for AI MDM community members
8 successfully delivered PoCs (SAP MDG Assistant chatbot, Rule mining, Web crawling, Data Plausibility Check, and others)
COLLABORATION PLATFORM FOR AI FORERUNNERS
Quarterly newsletter and AI MDM webinar series for more than 100 members.
ai-mdm.com
Slide 4 | © CAMELOT 2019 | Semi-automated mapping of industry standards
eCl@ss & Camelot for automated mapping
March 2018: eCl@ss Workshop I
April 2018: eCl@ss Workshop II
January 2019: eCl@ss Cross Company Call
July 2019: AI MDM Community Workshop
Slide 5 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Business Problem
The mapping of common standards and the individual mapping to ERP data create a high entry barrier for companies.
Labor-intensive and time-consuming
ERP classifications are unharmonized and have grown historically; several standards exist
Product/engineering know-how and standards knowledge required
We envision a generic tool to ease the mapping and encourage companies to join eCl@ss.
Slide 6 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Business Value
HIGHER eCl@ss ACCEPTANCE
Companies will have an additional incentive to join eCl@ss.
ENABLER FOR INDUSTRY 4.0 IoT
A greater number of members and an easier technical compliance mechanism enlarge the potential of becoming the Industry 4.0 standard.
BETTER AND FASTER
Less time required, higher mapping quality through elimination of the human factor.
Reduced entry barriers for new companies
Scalability of mapping automation
Minimal human interaction required
Automated interfaces between manufacturer and seller
[Figure: options for the required mappings between manufacturer and seller, each side described by classes, characteristics and values]
Slide 7 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Business case: mapping between standards
Calculation based on the mapping between eCl@ss and APPLiA – Pi Standard
36 man-days
Assumptions*:
9 classes
700 features
2000 values
Efforts:
Team: 5 persons
Meetings: 3 meetings of 2 days each
Preparation: 10%
Consolidation: 10%
➔ 5 persons x 3 meetings x 2 days = 30 man-days; 30 man-days + (2 x 10% of 30) = 30 + (2 x 3) = 30 + 6 = 36 man-days
*based on the real mapping run
Efficiency gain through automated mapping of 4 man-days per class
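The effort calculation on this slide can be retraced in a few lines. This is only a sketch of the arithmetic; the variable names are illustrative, and reading the 4 man-days per class as 36 man-days divided by 9 classes is an assumption that happens to match the figures shown.

```python
# Figures taken from the slide; variable names are illustrative only.
team_size = 5            # persons
meetings = 3             # number of meetings
days_per_meeting = 2     # days per meeting
overhead_percent = 10    # preparation and consolidation, 10% each
classes = 9              # classes covered by the real mapping run

# Effort spent in meetings: 5 x 3 x 2 = 30 man-days
meeting_effort = team_size * meetings * days_per_meeting

# Preparation plus consolidation: 2 x 10% of 30 = 6 man-days
overhead = 2 * meeting_effort * overhead_percent / 100

total_effort = meeting_effort + overhead          # 36 man-days

# Assumed reading of "4 man-days per class": total effort spread over 9 classes
effort_per_class = total_effort / classes
```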
Slide 8 | © CAMELOT 2019 | Semi-automated mapping of industry standards
ML/DS task for business problem
1. Mapping of common standards based on:
Synonyms (A) – open source,
Characteristic definition, continuous text (B),
Characteristic/value metadata (C),
Patterns of characteristic assignments to classes (D)
2. Mapping of company-specific standards to common standards based on:
Similarity of the characteristic/value name and historical mappings done by other companies (A),
Characteristic/value metadata (B),
Patterns of characteristic assignments to classes (C)
Slide 9 | © CAMELOT 2019 | Semi-automated mapping of industry standards
High-level solution envisioning: the problem can be solved by transferring well-established algorithms from the area of data integration
Mapping of common standards
ALGORITHM: SIMILARITY FLOODING
Developed by database researchers from Stanford University and the University of Leipzig
Well-established in the data science community
Widely used for data integration topics, e.g. mergers & acquisitions
Similarity between individual attributes is computed based on semantic similarities of name, description, etc. via vectorization through thought vectors.
Local similarities are distributed along the structure of the standard. Two attributes are similar if their semantics and context are similar.
Each pair of attributes gets a similarity score. The best matches are taken as mapping candidates.
Standard mapped
Similarity based on semantics
Context-sensitive semantic similarity
Slide 10 | © CAMELOT 2019 | Semi-automated mapping of industry standards
The algorithm starts with a pairwise comparison of the entities in the ontologies
Mapping on Schema Level
Initial similarity comparison via:
Property names
Property values
Data Types
…
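A minimal sketch of such an initial pairwise comparison is shown below. It is not the method used in the project: the string measure (`difflib.SequenceMatcher`), the weights, and the example properties are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a: str, b: str) -> float:
    """Character-based similarity of two property names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def initial_similarity(p: dict, q: dict,
                       name_weight: float = 0.7,
                       type_weight: float = 0.3) -> float:
    """Combine name similarity and data type agreement (weights are assumed)."""
    type_score = 1.0 if p["datatype"] == q["datatype"] else 0.0
    return name_weight * name_similarity(p["name"], q["name"]) + type_weight * type_score


# Hypothetical properties of two standards, invented for this example.
standard_a = [
    {"name": "Rated voltage", "datatype": "REAL"},
    {"name": "Colour", "datatype": "STRING"},
]
standard_b = [
    {"name": "Voltage rating", "datatype": "REAL"},
    {"name": "Color", "datatype": "STRING"},
]

# Score every property of standard A against every property of standard B.
scores = {
    (p["name"], q["name"]): initial_similarity(p, q)
    for p, q in product(standard_a, standard_b)
}
```

These local scores would then serve as the starting point for the propagation step described on the next slide.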
Slide 11 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Local similarities are distributed along the hierarchy
Mapping on Schema Level
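[Figure: hierarchy "Computer" (children: Laptop, Desktop PC) compared with hierarchy "Electronics" (children: Notebook, TV). Current similarity of Computer vs Electronics: 0.5. Child pairs compared: Laptop vs Notebook, Desktop PC vs Notebook, Laptop vs TV, Desktop PC vs TV. Scores shown for the child pairs: 0.3, 0.2, 0.7, 0.9.]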
Compute a new similarity score for each pair:
Consider similarities of neighboring nodes
Combine to a new score via predefined rules
Iterate until the system converges
Saves up to 50% of the manual effort
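The propagation idea can be sketched on the toy hierarchies from this slide. The code below is a simplified variant of similarity flooding, not the implementation used in the project: the assignment of scores to child pairs, the even splitting of scores among neighbors, and the update rule (initial score plus current score plus incoming scores, normalized by the maximum) are assumptions made for illustration.

```python
from itertools import product

# Toy hierarchies from the slide: parent -> children.
tree_a = {"Computer": ["Laptop", "Desktop PC"]}
tree_b = {"Electronics": ["Notebook", "TV"]}

# Initial (local) similarities. Which score belongs to which child pair is assumed.
sigma0 = {
    ("Computer", "Electronics"): 0.5,
    ("Laptop", "Notebook"): 0.9,
    ("Desktop PC", "Notebook"): 0.7,
    ("Laptop", "TV"): 0.3,
    ("Desktop PC", "TV"): 0.2,
}


def build_propagation_graph(tree_a, tree_b):
    """Link pair (a, b) with pair (a', b') when a' is a child of a and b' is a child of b."""
    neighbors = {}
    for (pa, kids_a), (pb, kids_b) in product(tree_a.items(), tree_b.items()):
        for ca, cb in product(kids_a, kids_b):
            neighbors.setdefault((pa, pb), []).append((ca, cb))
            neighbors.setdefault((ca, cb), []).append((pa, pb))
    return neighbors


def similarity_flooding(sigma0, neighbors, max_iter=100, tol=1e-6):
    """Iterate the update rule until the scores stop changing."""
    sigma = dict(sigma0)
    for _ in range(max_iter):
        new = {}
        for pair in sigma:
            # Each neighbor splits its current score evenly among its own neighbors.
            incoming = sum(sigma[n] / len(neighbors[n]) for n in neighbors.get(pair, []))
            new[pair] = sigma0[pair] + sigma[pair] + incoming
        top = max(new.values())
        new = {p: s / top for p, s in new.items()}   # normalize to a maximum of 1
        delta = max(abs(new[p] - sigma[p]) for p in new)
        sigma = new
        if delta < tol:
            break
    return sigma


result = similarity_flooding(sigma0, build_propagation_graph(tree_a, tree_b))
```

After convergence, the highest-scoring pair for each node would be offered as a mapping candidate.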
Slide 12 | © CAMELOT 2019 | Semi-automated mapping of industry standards
We want to leverage machine learning technologies to minimize the manual effort.
Mapping on Schema Level
Learn to compare
Use of current NLP Technology to improve initial matches
Natural language understanding for descriptions
Unsupervised learning of semantic descriptions
Topic modelling via Latent Dirichlet Allocation
Encoding of semantics via Word Vectors
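To illustrate how descriptions can be compared once they are turned into vectors, here is a small sketch. It uses plain term-count vectors and cosine similarity as a stand-in for learned word vectors, which the slide does not specify further; the three descriptions are invented for the example.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Turn a description into a term-count vector (a stand-in for learned word vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical characteristic descriptions, invented for this example.
desc_a = "nominal voltage of the device in volts"
desc_b = "voltage at which the device is rated, in volts"
desc_c = "colour of the housing"

sim_ab = cosine(vectorize(desc_a), vectorize(desc_b))
sim_ac = cosine(vectorize(desc_a), vectorize(desc_c))
```

With learned embeddings, the same cosine comparison would also capture synonyms that share no words, which term counts cannot do.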
Learn to combine
The message passing in the similarity flooding algorithm is structurally similar to the information flow in a recurrent neural network
Given the right training data, we can refine the update rules to achieve higher accuracy
Slide 13 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Even when corresponding classes and properties have been identified, transformations still have to be applied to complete the mapping at instance level.
Mapping on Instance Level
[Figure: instances X1, X2, X3 and the corresponding mapped pairs (X1, Y1), (X2, Y2), (X3, Y3)]
Active Learning of instance mappings
The machine decides which instances to translate for training.
Uses the most informative examples
Requires only very few examples
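One common way to pick "the most informative examples" is uncertainty sampling: ask the user about the instances the model is least sure of. The slide does not name the selection strategy, so this choice is an assumption, as are the probabilities and the extra instance X4 in the sketch below.

```python
def select_queries(probabilities: dict, budget: int = 2) -> list:
    """Return the `budget` instances whose predicted probability is closest to 0.5."""
    ranked = sorted(probabilities, key=lambda name: abs(probabilities[name] - 0.5))
    return ranked[:budget]


# Hypothetical model confidence that the current transformation of each instance is correct.
predicted = {"X1": 0.95, "X2": 0.52, "X3": 0.10, "X4": 0.40}

# Instances the user would be asked to translate by hand in this round.
queries = select_queries(predicted)
```

The translated pairs would be added to the training data, the model retrained, and the selection repeated until the remaining instances are mapped with sufficient confidence.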
Slide 14 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Every AI application stands or falls with its user interface.
User Interface
The system needs to be designed to make the task of schema mapping as efficient and as comfortable as possible. Key ingredients are:
A comprehensive visualization of the mapping
Intuitive correction functionalities
Simple integration into enterprise processes
Additionally, user feedback can be continuously fed back into the system to improve the machine learning components:
Improve the matching quality in general
Refine your own (e.g. company-specific) matcher
Slide 15 | © CAMELOT 2019 | Semi-automated mapping of industry standards
System Architecture
Architecture Overview
Classification A Classification B
Matching Algorithm
Probabilistic Mapping
User Interface
Matching algorithm
Input: Classifications A, B
Output: Probabilistic mapping ρ: A → B
Algorithm capable of mapping arbitrary industry standards against each other with minimal need for customization such as additional training (if any)
User interface
Finalization of mapping by user
Graphical presentation for quick and convenient validation, e.g. visualization of hierarchy
Model refinement through user feedback
Active Learning
ML Model
AI Core system
User Interaction
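The hand-over between the matching algorithm and the user interface can be sketched as follows. The representation of the probabilistic mapping as nested dictionaries, the function name `split_for_review`, the threshold of 0.8, and the example probabilities are all assumptions for illustration; the slides do not specify them.

```python
from typing import Dict, List, Tuple

# Assumed representation: for each class of A, a probability for each candidate class of B.
ProbabilisticMapping = Dict[str, Dict[str, float]]


def split_for_review(rho: ProbabilisticMapping,
                     threshold: float = 0.8) -> Tuple[Dict[str, str], List[str]]:
    """Accept confident best matches automatically; route the rest to the user."""
    accepted: Dict[str, str] = {}
    review: List[str] = []
    for source, candidates in rho.items():
        target, probability = max(candidates.items(), key=lambda item: item[1])
        if probability >= threshold:
            accepted[source] = target
        else:
            review.append(source)
    return accepted, review


# Hypothetical output of the matching algorithm.
rho = {
    "Laptop": {"Notebook": 0.9, "TV": 0.1},
    "Desktop PC": {"Notebook": 0.55, "TV": 0.45},
}

accepted, review = split_for_review(rho)
```

In this sketch only the uncertain classes reach the user, and their corrections could be fed back as training data for model refinement.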
Slide 16 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Envisioning workshop leading to the creation of a company-specific AI innovation roadmap
REASONING
Your specific needs for the application of AI in information management
Kickstart for newcomers in the AI MDM Community
Enrichment of a team event, innovation day or a training
DELIVERABLES
2-day workshop moderated by Camelot
Extensive data intelligence experience and data science basics training
AI innovation roadmap
Management presentation
CONTENT
Design thinking methodology
Rapid Prototyping
Insights from Camelot data science and domain experts
Lessons learned from other customers and the AI MDM Community
INVESTMENT
Regular price: € 12.000
For AI MDM Community members: at their location € 8.000, in one of Camelot's offices € 6.000
Possible cost sharing with other members of the AI MDM Community
Slide 17 | © CAMELOT 2019 | Semi-automated mapping of industry standards
Gathered expertise and validated data science models enable accelerated delivery of intelligent data management projects
Area
Data understanding: Analysis and processing leading to the identification of patterns, records or classifications
Data maintenance: Entering, changing, approving data records in a system
Data extraction: Identifying relevant information in heterogeneous sources including unstructured documents and web
Approaches
Optical Character Recognition
Table Extraction
Web crawling
Rule mining
Natural language understanding
Data classification
Fuzzy matching
Vectorization
Highlighting of found attributes on the data sheet
Editing of extracted values
Robotic process automation
Realized projects
Extraction of classification attributes from data sheets
Automated validation in the vendor data management in a shared service center
Extraction of relevant information and consistency check
Identification of supply chain scenarios and scenario-based rules
Contact
Aleksandra Baumann
Senior Consultant – AI for IM
Camelot MC AG
Theodor-Heuss-Anlage 12
68165 Mannheim, Germany
Tel: +49 621 86298 154
Mob: +49 1732338419
Thomas Geyer
Principal – EDM
Camelot MC AG
Theodor-Heuss-Anlage 12
68165 Mannheim, Germany
Tel: +49 621 86298 372
Mob: +49 172 7412248
Dr. Faried Abu Zaid
Senior Consultant – AI for IM
Camelot MC AG
Radlkoferstr. 2
81373 Munich, Germany
Tel: +49 621 86298 431
Mob: +49 1724966404