Knowledge Discovery in Production André Karpištšenko

Post on 07-Jan-2017


Page 1: Knowledge Discovery in Production

Knowledge Discovery in Production

André Karpištšenko

Page 2: Knowledge Discovery in Production
Page 3: Knowledge Discovery in Production

Knowledge Discovery Requires Automation

Growth of information and devices per knowledge worker

1. Digital universe x3.8 in size in 2020. Focus on the highest-value subset.*

2. 26.3B devices in 2020, up 61% from 2015, with a 2.7x increase in IP traffic.**

3. 700M knowledge workers***, automation worth $5.2T to $6.7T****

* IDC, Apr 2014
** Cisco, Jun 2016
*** Teleport.org, Jun 2016
**** McKinsey, Jun 2016

Page 4: Knowledge Discovery in Production

Core Dataflow

Model Engine

Preprocessing Dataflow

System Composition: Networked Intelligence

Mature

Nascent

Emerging

networked.ai

Infrastructure, Data & IoT Platforms, Advanced Analytics Platforms

Input Data

Info Merger

Data Curator Preparer & Explorer

Base Library

Selector Executor

Self-improvement Interpreter

Output Interfaces Core Human Interfaces

Knowledge Manager

Knowledge Manager

Page 5: Knowledge Discovery in Production

Predictive Modeling Flow Example

DashOpt

Feature Engineering

Raw Data

Raw Features

Labels

Feature Integration

Features with Labels

Data Partitioning

Training Data

Validation Data

Testing Data

Model Training

Evaluate for model selection

Compute offline evaluation metrics

Best model

Offline scoring and indexing

Online/offline systems

Online A/B test

Label preparation Log data

Scoring features

Raw features

Feature integration Model

Performance

Test Results
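
The offline part of the flow above (data partitioning, model training, evaluation for model selection, offline metrics on the best model) can be sketched roughly as follows. This is a minimal illustration, not the presenter's pipeline: the synthetic data, the one-feature threshold "models", and the names `partition`, `fit_threshold_model`, and `accuracy` are all assumptions made for the example.

```python
import random

def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows and split them into training, validation and testing data."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def fit_threshold_model(rows, feature):
    """Toy model: threshold one feature at the midpoint of the two class means."""
    pos = [x[feature] for x, y in rows if y == 1]
    neg = [x[feature] for x, y in rows if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    sign = 1 if mean_pos >= mean_neg else -1
    return feature, (mean_pos + mean_neg) / 2, sign

def predict(model, x):
    feature, threshold, sign = model
    return 1 if sign * (x[feature] - threshold) > 0 else 0

def accuracy(model, rows):
    """Offline evaluation metric: fraction of correctly predicted labels."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

# Hypothetical features with labels: feature 0 is informative, feature 1 is noise.
rng = random.Random(42)
rows = []
for _ in range(300):
    y = rng.randint(0, 1)
    rows.append(((2.0 * y + rng.gauss(0, 0.5), rng.gauss(0, 1.0)), y))

train_rows, val_rows, test_rows = partition(rows)

# Model training: one candidate model per feature, fitted on training data only.
models = [fit_threshold_model(train_rows, f) for f in (0, 1)]

# Evaluate for model selection on validation data, then score the winner on test data.
best = max(models, key=lambda m: accuracy(m, val_rows))
test_accuracy = accuracy(best, test_rows)
```

Keeping the testing data out of model selection is what makes `test_accuracy` usable as an estimate before the online A/B test step.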

Page 6: Knowledge Discovery in Production

Applications in Production

Electronics Manufacturing

Biotechnology

Process time reduction

Predictive maintenance Quality improvement

Yield increase

Page 7: Knowledge Discovery in Production

Product Preview

Page 8: Knowledge Discovery in Production

The problem

Preprocessing data for manufacturing analytics is complex and time-consuming.

How do people solve it today

Custom-built preprocessing solutions are used to gather data in electronics manufacturing.

Page 9: Knowledge Discovery in Production

Product Scope

Data-driven electronics manufacturing enabling understanding and prediction

• Heavy machinery

• Automotive

• Consumer Devices & Networks

• Drives

• PLC

Page 10: Knowledge Discovery in Production

Product for Pilot Factories

Page 11: Knowledge Discovery in Production

Product Solution

• Hybrid SaaS factory subscriptions and applications via open marketplace

• Real-time data streams from the field and factories for R&D and production

Electronics Factories

End Products

IoT Platforms Cloud Services

Page 12: Knowledge Discovery in Production

Delivering Business Value

Enabled metrics data: increased engagement 2x

Enhanced usability of MES: increased productivity

Test time reduction: 270k-290k EUR/plant

Reducing risk through higher quality data and improving business with data preprocessing

Page 13: Knowledge Discovery in Production

Industrial Analytics Example: Bosch Competition, I

4 product lines, 52 stations. Every feature has a timestamp.

Data rows: parts of mechanical components
  Training data: 1 183 747
  Test data: 1 183 748

Data columns: anonymized features of stations
  Numeric: 970
  Categorical: 2 141

Bosch has to ensure that the recipes for the production of its advanced mechanical components are of the highest quality and safety standards. Part of doing so is closely monitoring its parts as they progress through the manufacturing processes.

https://www.kaggle.com/

Page 14: Knowledge Discovery in Production

Distinct patterns of missing values of all stations

Utilization of stations

Industrial Analytics Example: Bosch Competition, II

Product Families
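
One way to compute the two views named on this slide is to group rows by which columns are missing and to count the share of non-missing values per column. The sketch below uses a made-up four-row table with three columns; it assumes that a missing value (`None`) means the part did not pass that station, and the names `missing_pattern`, `pattern_counts`, and `station_utilization` are illustrative.

```python
from collections import Counter

def missing_pattern(row):
    """Return a tuple of booleans: True where the measurement is missing."""
    return tuple(value is None for value in row)

# Hypothetical rows: one part per row, one station measurement per column.
rows = [
    (0.1, None, 0.3),
    (0.2, None, 0.1),
    (None, 0.5, None),
    (0.4, 0.2, 0.9),
]

# Distinct patterns of missing values and how many parts share each pattern.
pattern_counts = Counter(missing_pattern(row) for row in rows)

# Utilization per column: share of parts that have a measurement there.
station_utilization = [
    sum(value is not None for value in column) / len(rows)
    for column in zip(*rows)
]
```

Parts that share a missing-value pattern followed the same path through the stations, which is one plausible way to arrive at product families.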

Page 15: Knowledge Discovery in Production

https://sites.google.com/site/iotminingtutorial/

IoT Data Streams Mining

• Continuous data, dynamic models, distributed, few seconds
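
For continuous data, statistics have to be updated one reading at a time instead of being recomputed over stored history. A minimal sketch of such an incremental update, using Welford's one-pass algorithm for mean and variance, is shown below; the class name `RunningStats` and the sample readings are assumptions for illustration.

```python
class RunningStats:
    """One-pass mean and population variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold a single new reading into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0

# Hypothetical sensor readings arriving one by one.
stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(reading)
```

Because each update touches only three numbers, the same object can run on a device or in a distributed worker without keeping the stream in memory.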

Page 16: Knowledge Discovery in Production

Streams Mining: Actors Model

Data processing pipeline Distributed processing

Kappa Architecture https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102

Page 17: Knowledge Discovery in Production

DashOpt: Data Science Intelligence

Page 18: Knowledge Discovery in Production

Real-Time Predictive Flow

ML & Simulation Platforms

IoT Platforms

Preprocessed Data

IoT Data

Earth Data

Manufacturing

Data

Predictive Models

Decision Tree SVM

Neural Network Random Forest

Data Science

Intelligence

Page 19: Knowledge Discovery in Production

Outlier Detection

• Single point anomaly detection: likelihood over distribution

• Finding anomalous groups: divergence estimation

• Methods: percentage change, T-test, Chi-square test, Generalized ESD (Extreme Studentized Deviate) test, Seasonal Hybrid ESD, etc.

• Goal: move from detection to automated response
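
Single point anomaly detection by likelihood over a distribution can be sketched as follows: fit a distribution to baseline data, then flag new points whose log-likelihood falls below a cutoff. The Gaussian assumption, the three-sigma cutoff, and the sample values are choices made for this example only; the tests listed above (Generalized ESD, Seasonal Hybrid ESD) are more robust alternatives and are not implemented here.

```python
import math
from statistics import NormalDist

# Hypothetical baseline measurements assumed to be free of anomalies.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
dist = NormalDist.from_samples(baseline)

def log_likelihood(x, dist):
    """Log density of x under the fitted Gaussian, computed without underflow."""
    z = (x - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev * math.sqrt(2 * math.pi))

# Cutoff: the log-likelihood of a point three standard deviations from the mean.
cutoff = log_likelihood(dist.mean + 3 * dist.stdev, dist)

new_points = [10.1, 9.6, 11.5]
anomalies = [x for x in new_points if log_likelihood(x, dist) < cutoff]
```

An automated response would hook in where `anomalies` is produced, for example by raising an alert or pausing a process step.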

Page 20: Knowledge Discovery in Production

Outlier Detection in Practice

• Too many detections of too little value

• Use methods for thresholds

• Breakout detection and Concept Drift

• For changing distributions move baselines over time

• Risk of overfitting to known anomalies, not finding unknown anomalies
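
Moving the baseline over time can be sketched with a sliding window: each point is scored against the mean and standard deviation of the most recent points only, so the detector adapts after a level shift. The window size, threshold, function name `moving_baseline_outliers`, and the series are assumptions for illustration.

```python
from collections import deque
from statistics import mean, stdev

def moving_baseline_outliers(series, window=5, threshold=3.0):
    """Flag indices whose z-score against the previous `window` points exceeds threshold."""
    recent = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(series):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                flagged.append(i)
        # Every point enters the baseline, so the baseline follows the data.
        recent.append(x)
    return flagged

# Hypothetical series with a breakout from a level near 10 to a level near 20.
series = [10, 10.1, 9.9, 10, 10.1, 9.9, 20, 20.1, 19.9, 20, 20.1, 19.9, 20]
flagged = moving_baseline_outliers(series)
```

Only the breakout point itself is flagged; later points at the new level are treated as normal once the window has moved. This reduces repeated low-value detections, at the cost of absorbing a persistent fault into the baseline.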

Page 21: Knowledge Discovery in Production

Bayesian aka Active Optimization

• Examples: Design of Experiments, hyper-parameters of supervised learning, algorithms tested with simulations

f is an unknown, expensive black-box function; the goal is to approximately optimize f with as few experiments as possible

• No free lunch theorem

• Other bio-inspired algorithms that balance exploitation and exploration in optimization: neural networks, genetic algorithms, swarm intelligence, ant colony optimisation, etc.
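
The loop behind Bayesian optimization can be sketched in one dimension: fit a Gaussian-process surrogate to the experiments run so far, pick the next experiment by maximizing expected improvement, run it, and repeat. Everything below is an assumption made for the example: the stand-in function `f`, the RBF kernel with a fixed length scale, the jitter value, the candidate grid, and the pure-Python linear solver. It is not the method of any product named in these slides.

```python
import math
from statistics import NormalDist

def rbf(a, b, length=0.3):
    """Squared-exponential kernel with a fixed, hand-picked length scale."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - y_mean for y in ys])
    v = solve(K, k)
    mu = y_mean + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mu - best) / sigma
    std_normal = NormalDist()
    return (mu - best) * std_normal.cdf(z) + sigma * std_normal.pdf(z)

def f(x):
    """Hypothetical stand-in for the expensive experiment; maximum at x = 0.7."""
    return -(x - 0.7) ** 2

xs = [0.0, 0.5, 1.0]                    # initial experiments
ys = [f(x) for x in xs]
grid = [i / 100 for i in range(101)]    # candidate settings

for _ in range(5):                      # budget of five further experiments
    best = max(ys)
    candidates = [x for x in grid if x not in xs]
    x_next = max(candidates,
                 key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best))
    xs.append(x_next)
    ys.append(f(x_next))

best_x = xs[ys.index(max(ys))]
```

Expected improvement is large where the surrogate predicts a high value or is very uncertain, which is how the loop trades exploitation against exploration.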

Page 22: Knowledge Discovery in Production

Bayesian Optimization in Practice

• SigOpt experience: 20 dimensions, above human capacity.

• Uber ATC experience: scaling active optimization to high dimensions is difficult; the default setup works reliably for 5-7 dimensions.

• Variables are added during optimization.

• Choose fidelity using heuristics.

Page 23: Knowledge Discovery in Production

DashOpt: Data Science Intelligence

US Patent pending

Page 24: Knowledge Discovery in Production

Current State in Biotech

Already available: Extensive databases of DNA sequences, metabolism of cells and components (enzymes etc.), high-throughput experimental omics methods

Future state: Software environment for in silico ab initio design of cells, and in silico testing (predictive modeling) of the cell designs in manufacturing processes

Page 25: Knowledge Discovery in Production

Thinking about Value from Data Science