Page 1:

IN3020/4020 – Database Systems Spring 2020, Week 16.2, Part 1

Data Science

Dr. M. Naci Akkøk, Chief Architect, Oracle Nordics; Co-founder & CEO, In-Virtualis

Also based upon slides from Renée Wikestad, Senior Solution Engineer, Oracle

Page 2:

Major driver of DBMS development/usage

o If you haven't noticed yet, one of the major users of (and hence drivers of research into) databases and database management systems is data-intensive or data-driven applications:
   o Analytics
   o Data Science (or applied statistics)
   o Artificial Intelligence (AI) and Machine Learning (ML)
   o Also including pattern recognition, text and speech analysis/synthesis, etc.

Page 3:

So what is … ?

Artificial Intelligence

Machine Learning

Deep Learning

Page 4:

Data Science

Artificial Intelligence

Machine Learning

Deep Learning

Page 5:

Data Scientist: Skills

[Diagram labels: Math and Statistics; Business/Domain Knowledge; Computer Science; Machine Learning; Diagnostic Analysis; Software Development]

Page 6:

What is Machine Learning?

ML is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead.

Page 10:

What is Machine Learning?

There are hybrid models, but such applications are always data-driven and data-intensive.

Page 11:

Example: Address the right audience for my product

The old way

Human-written rules guided by experience

Problem: It is difficult to determine the right rules, the rules may change, and it is impossible to evaluate the fit for every individual

With Machine Learning

Machine finds patterns in historical data (model)

Uses this knowledge to predict the best audience

Page 12:

Example: Medical diagnosis

[Figure: an image to be labelled "pneumonia" or "healthy"]

The old way

Human work by highly skilled doctors.

With Machine Learning

Machine finds patterns in labelled historical images.

Uses this knowledge to diagnose new images.

Page 13:

Example: Fraud Detection

The old way

Rule-based systems.

High number of false positives.

With Machine Learning

Combination of techniques, including Anomaly Detection.

Higher accuracy.

Page 14:

A model is a function

f(x) = Target

x = Input Feature

[Plot: SalePrice (vertical axis, 100K to 500K) against Size (m2) (horizontal axis, 100 to 300)]

f(x) = 150 + 1.25 x
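A minimal sketch of the idea that a model is just a function. It assumes that f(x) gives the sale price in thousands for a size x in square metres, which the plot suggests but the slide does not state. The name `predict_sale_price` is illustrative only.

```python
def predict_sale_price(size_m2: float) -> float:
    """Evaluate the slide's example model f(x) = 150 + 1.25 x.

    Assumption: the result is in thousands, matching the K labels on the plot.
    """
    return 150 + 1.25 * size_m2


if __name__ == "__main__":
    # Apply the model to a few input feature values (sizes in m2).
    for size in (100, 200, 300):
        print(f"size={size} m2 -> predicted price={predict_sale_price(size)}K")
```

The input feature goes in and the target comes out; finding good values for the two constants is the job of the learning algorithm.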

Page 15:

How to find a model with Linear regression?

Hypothesis:

Parameters:

Cost Function:

Target:

• We use a training set of historical data to find parameters that provide the function that best fits the training data.

• We then use a test data set of historical data to check how well the model fits this test data.

• If the model is good enough, we can use it to predict future values.
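The three steps above can be sketched as follows. The formulas for the hypothesis and the cost function did not survive in this transcript, so the sketch assumes the common textbook choice: a straight line (intercept plus slope times x) fitted by least squares, with mean squared error as the measure of fit. All data values and function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mean_squared_error(intercept, slope, xs, ys):
    """Average squared gap between predictions and observed targets."""
    return mean((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical historical data: size in m2 and sale price in thousands.
train_x, train_y = [100, 150, 200, 250], [275, 340, 400, 460]
test_x, test_y = [120, 300], [300, 525]

# Step 1: find the parameters that best fit the training set.
intercept, slope = fit_line(train_x, train_y)

# Step 2: check how well the model fits data it has not seen.
test_error = mean_squared_error(intercept, slope, test_x, test_y)

# Step 3: if the error is acceptable, predict a new value.
prediction = intercept + slope * 180

if __name__ == "__main__":
    print(intercept, slope, test_error, prediction)
```

Keeping the test set separate from the training set matters: an error measured on the training data alone says little about how the model behaves on new data.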

Page 18:

Machine Learning Types

Supervised: CLASSIFICATION, REGRESSION

Unsupervised: CLUSTERING

Reinforcement: [Diagram labels: Agent, Environment, action, state, reward]

Page 19:

Machine Learning Models

Machine learning algorithms can be divided into categories according to their purpose.

o Supervised Learning: Uses historical data to find a model (a mathematical function) or a classification/pattern that best fits the data, in order to predict values for new data.

o Unsupervised Learning: Finds trends and patterns in data without previous training on historical data. Here you try to find patterns/classifications in the data with little or no idea what the result should look like.

o Reinforcement Learning: Uses observations to learn. Every action has an effect on the environment, and the feedback received from those actions guides and improves the learning.
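To make the unsupervised case concrete, here is a small sketch of clustering, assuming one-dimensional data and a basic k-means loop. The data values, the starting centres and the function name `k_means_1d` are hypothetical. Note that no labels are supplied: the groups emerge from the data alone.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: alternate between assignment and centre update."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centre.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its cluster.
        # An empty cluster keeps its previous centre.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabelled observations with two visible groups.
observations = [1, 2, 3, 10, 11, 12]
centers, clusters = k_means_1d(observations, centers=[1, 12])

if __name__ == "__main__":
    print(centers, clusters)
```

A supervised method would need a target value for every observation; here only the number of clusters and the starting centres are chosen in advance.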

Page 20:

Top languages for Data Scientists

Python and SQL are leading, with R in a good third place.

Page 21:

Challenges of Data Science

Access to Data

• On-prem/cloud/hybrid data management

• Finding data

• Combining data

Complex lifecycle to build models

• Multiple roles involved

• Reuse of past work

• Access to tools and infrastructure

• Access to flexible compute power

Operationalizing ML

• Packaging models for consumption

• Monitoring models

• Updating models

Page 22:

The Data Science Process

Problem Definition

Dataset Discovery and Access

Data Exploration + Preparation

Model Building

Model Evaluation + Selection

Model Deployment

Model Evaluation

Page 23:

The Machine Learning Process

Business Understanding: Objectives, Success Criteria

Data Understanding: Capture Data, Describe Data, Explore Data

Data Preparation: Select Data, Clean Data, Construct Data, Combine Data

Modeling: Select Modeling Technique, Build Model, Assess Model

Evaluation: Evaluate Result, Decide Next Step

Deployment

Page 24:

Analytics Development

[Chart: Value of Analytics (vertical axis) against Analytical Maturity (horizontal axis)]

Stages: Data, Information, Predictions & Insights, Appls with ML

Reports & Dashboards: What Happened?

Diagnostic Analysis & Reports: Why it Happened?

Predictive / Machine Learning: What WILL happen?

“ML Enabled” Applications: Automated ML Appls

Machine Learning Differentiators

Page 25:

Business Understanding

Start with a Well-Defined Business Problem Statement

• Predict employees that voluntarily leave
• Predict customers that are likely to churn
• Target “best” customers
• Find items that will help me sell more of the most profitable items
• What is a specific customer most likely to purchase next?
• How can I combat fraud?
• ...and so on

Page 26:

Business Understanding

Start with a Well-Defined Business Problem Statement

“If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” (Albert Einstein)

Page 27:

Move the Algorithms, Not the Data!

Machine Learning Algorithms Require Data

[Figure labels: X1, X2; A1 A2 A3 A4 A5 A6 A7]

Page 28:

In-database ML

Algorithms automatically sift through large amounts of data to discover hidden patterns and new insights, and to make predictions.

• Identify most important factor (Attribute Importance)
• Predict customer behavior (Classification)
• Find profiles of targeted people or items (Classification)
• Predict or estimate a value (Regression)
• Segment a population (Clustering)
• Find fraudulent or “rare events” (Anomaly Detection)
• Determine co-occurring items in a “basket” (Associations)
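As an illustration of the last task in the list, the sketch below counts how often pairs of items occur together in a basket. This is the counting step that association-rule methods build on, not the in-database algorithm itself. The baskets and the name `pair_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]

counts = pair_counts(baskets)

# Support: the fraction of all baskets that contain the pair.
support = {pair: n / len(baskets) for pair, n in counts.items()}

if __name__ == "__main__":
    for pair, n in counts.most_common():
        print(pair, n, support[pair])
```

Pairs with high support are candidates for rules such as "customers who buy bread also tend to buy milk".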

Page 29:

Built-in ML library. Try these in always free Oracle Autonomous Data Warehouse or Transaction Processing databases: https://www.oracle.com/cloud/free/?source=:ow:o:p:nav:0916BCButton&intcmp=:ow:o:p:nav:0916BCButton

Classification
• Naïve Bayes
• Logistic Regression (GLM)
• Decision Tree
• Random Forest
• Neural Network
• Support Vector Machine
• Explicit Semantic Analysis

Clustering
• Hierarchical K-Means
• Hierarchical O-Cluster
• Expectation Maximization (EM)

Anomaly Detection
• One-Class SVM

Time Series
• Forecasting: Exponential Smoothing
• Includes popular models, e.g. Holt-Winters with trends, seasonality, irregularity, missing data

Regression
• Linear Model
• Generalized Linear Model
• Support Vector Machine (SVM)
• Stepwise Linear Regression
• Neural Network
• LASSO

Attribute Importance
• Minimum Description Length
• Principal Component Analysis (PCA)
• Unsupervised Pair-wise KL Divergence
• CUR decomposition for row & AI

Association Rules
• Apriori / market basket

Predictive Queries
• Predict, cluster, detect, features

SQL Analytics
• SQL Windows
• SQL Patterns
• SQL Aggregates

Feature Extraction
• Principal Component Analysis (PCA)
• Non-negative Matrix Factorization
• Singular Value Decomposition (SVD)
• Explicit Semantic Analysis (ESA)

Text Mining Support
• Algorithms support text
• Tokenization and theme extraction
• Explicit Semantic Analysis (ESA) for document similarity

Statistical Functions
• Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.

Model Deployment
• SQL: 1st Class Objects
• Oracle RESTful API (ORDS)
• OML Microservices (for Apps)

Page 30:

Statistical Functions and Analytical SQL

o STATISTICAL FUNCTIONS
   o Descriptive statistics (e.g. median, stdev, mode, sum, etc.)
   o Hypothesis testing (t-test, F-test, Kolmogorov-Smirnov test, Mann Whitney test, Wilcoxon Signed Ranks test)
   o Correlation analysis (parametric and nonparametric, e.g. Pearson’s test for correlation, Spearman's rho coefficient, Kendall's tau-b correlation coefficient)
   o Ranking functions
   o Cross Tabulations with Chi-square statistics
   o Linear regression
   o ANOVA (Analysis of variance)
   o Test Distribution fit (e.g. Normal distribution test, Binomial test, Weibull test, Uniform test, Exponential test, Poisson test)
   o Statistical Aggregates (min, max, mean, median, stdev, mode, quantiles, plus x sigma, minus x sigma, top n outliers, bottom n outliers)

o ANALYTICAL SQL
   o SQL Windows
   o SQL Aggregate functions
   o LAG/LEAD functions
   o SQL for Pattern Matching
   o Additional approximate query processing: APPROX_COUNT, APPROX_SUM, APPROX_RANK
   o Regular Expressions

Page 31:

In-database ML: Solution to one of the biggest problems

One of the major challenges with ALL data-driven, data-intensive applications is getting the data into the database!
• Time consuming
• Security / quality risk
• Optimization & performance challenges
• … and others

Oracle natively includes ML algorithms in the database
• Perform parallel ML directly in the Data Warehouse or OLTP database for fast model building and real-time scoring of new data
• Keep data secure by avoiding copy contagion: copies of data in analytics systems are a common source of data breaches

Obstacles to adoption of ML in the database have been removed
• Ability to call out to the latest R and Python algorithms
• Accessible via, for example, SQL, REST API, Oracle R Enterprise and Python

Page 32:

Traditional approaches to building models are cumbersome

Model building

Scoring

Data extraction

Data exploration and transformation

Model evaluation

Model deployment

Page 33:

o There is a better way

§ Optimized algorithms
§ Models produced faster and with less effort
§ Better data protection and security
§ Faster access to data

Page 34:

In-database machine learning is better

Model building

Model deployment

Data extraction

Data exploration and transformation

Model evaluation

Scoring

Page 35:

In-database machine learning

• Parallel, distributed, in-DB execution
• Improved memory utilization for greater scalability
• Wide range of algorithms
• Flexible model building: SQL notebook, R, Data Miner (no coding), Python
• Data stays secured in database
• Deployment as REST API available

Page 36:

Approaches to building ML models

Machine Learning on Data Platforms: Build ML on database

OCI Data Science Service: Manage projects and collaboratively build, deploy, manage models

Bring Your Own Analytics: Architect your own platform with open source

Business user friendly approach to ML

• Large amounts of data in database?

• Governance / security?

• Skillset?

Page 37:

Build models in a notebook

Oracle Machine Learning notebooks

o Included in Autonomous Data Warehouse

o Familiar notebook for data scientists and collaborators

o Easy sharing of notebooks and templates with permissions, versioning, and execution scheduling

o In-database machine learning algorithms and analytics functions via SQL, and later Python and R

o Secure access to data in the database

Page 38:

Machine learning is better in the cloud

This is what Oracle Cloud offers:

High performance compute
§ Flexible shapes from VMs to Bare Metal
§ 40% better performance than other clouds
§ 30% cheaper GPUs

Scalable, independent storage
§ NVMe SSD, Object, Block and Archive Storage
§ 50% faster HDFS performance
§ 90% lower Data Lake costs with Data Tiering

Guaranteed network performance
§ Non-oversubscribed, flat 25 Gbps bandwidth between nodes
§ Only cloud with network performance SLA
§ No egress charges

Page 39:

The Evolution of the Database Developer/DBA Role

o Data Engineer: architecture, “data wrangler”
o Machine Learning: solving data-driven problems, discovering insights, making predictions
o Data Security: data classification, data life-cycle mgmt
o Application Tuning: SQL tuning, connection mgmt

Page 40:

Savings with in-database Machine Learning

Don’t Move the Data!

o Data remains in Database
   • Model building and scoring occur in-database
   • Leverage investment in Oracle IT
o Eliminate data duplication
   • Eliminate separate analytical servers
o Deliver enterprise-wide “predictive” applications

Traditional ML (Hours, Days or Weeks): Data Extraction, Data Prep & Transformation, Model Building, Model “Scoring”, Data Prep. & Transformation, Data Import

Oracle Machine Learning (Seconds, Minutes or Hours): Data Preparation, Model Building, Model “Scoring”, Embedded Data Prep

[Diagram: the time difference between the two approaches is labelled "Savings"]

Page 41:

SQL Statistical Functions

Simple SQL Syntax: Statistical Comparisons (t-tests)

Compare AVG Purchase Amounts Men vs. Women Grouped_By INCOME_LEVEL

SELECT SUBSTR(cust_income_level, 1, 22) income_level,
       AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men,
       AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
       STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value
FROM customers c, sales s
WHERE c.cust_id = s.cust_id
GROUP BY ROLLUP(cust_income_level)
ORDER BY income_level, sold_to_men, sold_to_women, t_observed;

STATS_T_TEST_INDEPU (SQL) example: p-values < 0.05 indicate statistically significant differences between the amounts purchased by men and by women.
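To show what such a comparison computes, here is a sketch of the t statistic for two independent groups with unpooled variances (the Welch form), which is understood to be the variant behind STATS_T_TEST_INDEPU. The sample values and the function name `welch_t` are hypothetical. The p-value is left out because it needs the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Uses unpooled sample variances, so the groups may differ in spread.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Hypothetical purchase amounts for two groups of customers.
sold_to_men = [10, 12, 14, 16, 18]
sold_to_women = [20, 22, 24, 26, 28]

t_observed, degrees_of_freedom = welch_t(sold_to_men, sold_to_women)

if __name__ == "__main__":
    print(t_observed, degrees_of_freedom)
```

A large absolute t value relative to the degrees of freedom corresponds to a small p-value, which is what the SQL query reports in its last column.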

Page 42:

OML for SQL Model Build & Apply

Simple SQL Syntax: Classification Model

ML Model Build (PL/SQL)

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_INSUR1',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'CUST_INSUR_LTV_SET');
END;
/

Model Apply (SQL query)

SELECT prediction_probability(BUY_INSUR1, 'Yes'
         USING 3500 AS bank_funds, 825 AS checking_amount, 400 AS credit_balance,
               22 AS age, 'Married' AS marital_status,
               93 AS MONEY_MONTLY_OVERDRAWN, 1 AS house_ownership)
FROM dual;

Page 43:

OML for SQL Model Build

Simple SQL Syntax: Attribute Importance

ML Model Build (PL/SQL)

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'BUY_INSURANCE_AI',
    mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
    data_table_name     => 'CUST_INSUR_LTV',
    case_id_column_name => 'cust_id',
    target_column_name  => 'BUY_INSURANCE',
    settings_table_name => 'Att_Import_Mode_Settings');
END;
/

Model Results (SQL query)

SELECT attribute_name, explanatory_value, rank
FROM BUY_INSURANCE_AI
ORDER BY rank, attribute_name;

ATTRIBUTE_NAME             RANK  ATTRIBUTE_VALUE
BANK_FUNDS                 1     0.2161
MONEY_MONTLY_OVERDRAWN     2     0.1489
N_TRANS_ATM                3     0.1463
N_TRANS_TELLER             4     0.1156
T_AMOUNT_AUTOM_PAYMENTS    5     0.1095

Page 44:

Oracle Machine Learning Notebooks

Autonomous Database + Machine Learning Platform

o Collaborative UI
o Based on Apache Zeppelin
o Supports data scientists, data analysts, application developers, DBAs
o Easy sharing of notebooks and templates
o Permissions, versioning, and execution scheduling
o Included with Autonomous Database
o Automatically provisioned, managed, backed up
o In-database SQL algorithms and analytics functions
o Supports deployment of ML analytics via ORDS (Oracle REST Data Services)
o Soon to be augmented with Python and R

Page 45:

Demo ML in Database

o Using Oracle Autonomous Database