Introducing Oracle Machine Learning for Python · 2020. 11. 23.

Mark Hornick - Senior Director, Data Science and Machine Learning at Oracle [email protected], www.twitter.com/MarkHornick




Introducing Oracle Machine Learning for Python


Future and past TechCasts:

Analytics & Data Oracle User Community
Same great technical content…new name!

www.andouc.org

Submit a topic to share at https://analyticsanddatasummit.org/techcasts/


Save the Date
TechCast Days - Winter Session

January 26-28, 2021

Watch our website & social media channels for more details

Share your knowledge, expertise and ideas!

Submit your presentation by going to our website

and clicking on “TechCasts”


Oracle Machine Learning for Python
Introduction

Mark Hornick

Senior Director, Data Science and Machine Learning

November 19, 2020

Copyright © 2020 Oracle and/or its affiliates.


Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.


What is Python?

An interpreted, object-oriented, high-level, general-purpose programming language

Designed for rapid application development and scripting to connect existing components

Created in the late 1980s and first released in 1991

Open source: https://www.python.org

World-wide usage
• Widely taught in universities
• Many data scientists know and use Python

Thousands of open source packages to enhance productivity


Traditional Python and Data Source Interaction

• Access latency
• Paradigm shift: Python → Data Access Language → Python
• Memory limitation – data size, in-memory processing
• Single threaded
• Issues for backup, recovery, security
• Ad hoc production deployment

Diagram labels:
• Data Source → Flat Files: extract / export, then read
• Back to the Data Source: export, load
• Data source connectivity packages, e.g., cx_Oracle
• Read/Write files using built-in tool capabilities
• Deployment: ad hoc, cron job


Oracle Machine Learning

• OML4SQL: SQL API
• OML4Py*: Python API
• OML4R: R API
• OML4Spark: R API on Big Data
• OML Notebooks: with Apache Zeppelin on Autonomous Database
• Oracle Data Miner: Oracle SQL Developer extension
• OML Services*: Model Deployment and Management, Cognitive Text
• OML AutoML UI*: Code-free AutoML interface on Autonomous Database

* Coming soon


Autonomous Database as a Data Science Platform
Oracle Machine Learning Notebooks

Collaborative UI
• Based on Apache Zeppelin
• Supports data scientists, data analysts, application developers, DBAs with SQL and Python
• Easy notebook sharing
• Permissions, versioning, and scheduling of notebooks

Included with Autonomous Database
• Automatically provisioned and managed
• In-database algorithms and analytics functions
• Explore and prepare, build and evaluate models, score data, deploy solutions


Oracle Machine Learning for Python

Use Oracle Database as HPC environment
• Explore, transform, and analyze data faster and at scale

Use in-database parallelized and distributed ML algorithms
• Build more models on more data, and score large volume data – faster
• Use in-database algorithms from OML4SQL via natural Python API
• Increased productivity from automatic data preparation, partitioned models, and integrated text mining capabilities

Execute Python scripts and manage Python objects in-database
• Collaborate: hand-off data science products from data scientist to developers easily
• Run user-defined functions in data-parallel, task-parallel, and non-parallel fashion
• Return structured and image results in Python and REST API

New automatic machine learning (AutoML) and model explainability (MLX)
• Enhance data scientist productivity and enable non-experts to use and benefit from machine learning
• Algorithm selection, feature selection, hyperparameter tuning, model selection
• Model-agnostic identification of important features that impact model predictions

Supported in Oracle Autonomous Database with OML Notebooks

Diagram labels: OML Notebooks, REST Interface, OML4Py


Transparency Layer
In-database performance – indexes, query optimization, parallelism, partitioning

Leverages proxy objects for database data: oml.DataFrame

# Create table from Pandas DataFrame data
DATA = oml.create(data, table = 'BOSTON')

# Get proxy object to DB table BOSTON
DATA = oml.sync(table = 'BOSTON')

Uses familiar Python syntax to manipulate database data

Overloads Python functions translating functionality to SQL

DATA.shape
DATA.head()
DATA.describe()
DATA.std()
DATA.skew()

TRAIN, TEST = DATA.split()
TRAIN.shape
TEST.shape
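The proxy-object idea can be illustrated with a toy sketch. This is not how OML4Py is implemented: it uses Python's built-in sqlite3 as a stand-in database, and the class name TableProxy, its methods, and the sample rows are all hypothetical. It only shows the general pattern of a Python object that holds a table name and turns attribute access into SQL, so that the rows stay in the database.

```python
import sqlite3


class TableProxy:
    """Toy stand-in for a database-backed frame: holds a table name, not the rows."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def _columns(self):
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        return [row[1] for row in self.conn.execute(f"PRAGMA table_info({self.table})")]

    @property
    def shape(self):
        # The row count is computed by the database, not by fetching rows.
        (n_rows,) = self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()
        return (n_rows, len(self._columns()))

    def head(self, n=5):
        # Only n rows cross the connection.
        return self.conn.execute(f"SELECT * FROM {self.table} LIMIT ?", (n,)).fetchall()

    def mean(self, column):
        # The aggregate is evaluated in SQL; a single number comes back.
        (value,) = self.conn.execute(f"SELECT AVG({column}) FROM {self.table}").fetchone()
        return value


# Made-up sample data in an in-memory database (column names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOSTON (RM REAL, MEDV REAL)")
conn.executemany(
    "INSERT INTO BOSTON VALUES (?, ?)",
    [(6.5, 24.0), (6.4, 21.6), (7.1, 34.7), (6.9, 33.4)],
)

DATA = TableProxy(conn, "BOSTON")
print(DATA.shape)
print(DATA.head(2))
print(DATA.mean("RM"))
```

The table and column names are interpolated directly into the SQL for brevity, so the sketch is only suitable for trusted identifiers.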


Example using the crosstab function
In-database scalable aggregation

ONTIME_S = oml.sync(table="ONTIME_S")
res = ONTIME_S.crosstab('DEST')
type(res)
res.head()

Source data is a DataFrame, ONTIME_S, which is an Oracle Database table

crosstab() function overloaded to accept OML DataFrame objects and transparently generates SQL for scalable processing in Oracle Database

Returns an ‘oml.core.frame.DataFrame’ object

In-db stats, computed in Oracle Autonomous Database on user tables:

select DEST, count(*)
from ONTIME_S
group by DEST
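To make the benefit of pushing the aggregation into the database concrete, here is a small sketch that runs the same group-by query against an in-memory sqlite3 database and compares it with counting on the client. sqlite3 is only a stand-in for Oracle Database, and the sample rows and the ARRDELAY column are made up for illustration.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ONTIME_S (DEST TEXT, ARRDELAY INTEGER)")
conn.executemany(
    "INSERT INTO ONTIME_S VALUES (?, ?)",
    [("SFO", 5), ("BOS", 12), ("SFO", -3), ("JFK", 40), ("BOS", 0), ("SFO", 7)],
)

# Client-side: every row is fetched, then counted in Python memory.
client_counts = Counter(dest for (dest,) in conn.execute("SELECT DEST FROM ONTIME_S"))

# In-database: only one row per distinct DEST comes back.
db_counts = dict(conn.execute("SELECT DEST, COUNT(*) FROM ONTIME_S GROUP BY DEST"))

print(db_counts)
```

Both approaches give the same counts; the difference is how much data has to leave the database, which is what matters once the table no longer fits in client memory.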


Machine Learning in-database algorithms
OML4Py 1.0

Classification
• Decision Tree
• Naïve Bayes
• Generalized Linear Model
• Support Vector Machine
• Random Forest
• Neural Network

Regression
• Generalized Linear Model
• Neural Network
• Support Vector Machine

Attribute Importance
• Minimum Description Length

Clustering
• Expectation Maximization
• Hierarchical k-Means

Feature Extraction
• Singular Value Decomposition
• Explicit Semantic Analysis
• Principal Component Analysis via SVD

Association Rules
• Apriori – Association Rules

Anomaly Detection
• 1 Class Support Vector Machine

Supports automatic data preparation, partitioned model ensembles, integrated text mining


Example using Support Vector Machine
Scalable in-database algorithms

import oml
from oml import svm

# create proxy object
ONTIME_S = oml.sync(table='ONTIME_S')

# define model object
settings = {'svms_outlier_rate' : 0.01}
svm_mod = svm('anomaly_detection',
              svms_kernel_function = 'dbms_data_mining.svms_linear',
              **settings)

# build anomaly detection model
svm_mod = svm_mod.fit(x=ONTIME_S, y=None)

# view model object
svm_mod


Example using OML Notebooks with in-database clustering model build and score
Use matplotlib visualization with in-database model results

Notebook steps:
• Drop existing model
• Build k-Means model
• Score using model


Example of parallel partitioned data flow using third party package
Embedded Python Execution

# IRIS is assumed to be an existing oml.DataFrame proxy,
# e.g., IRIS = oml.sync(table='IRIS')
import oml

# user-defined function using sklearn
def build_lm(dat):
    from sklearn import linear_model
    lm = linear_model.LinearRegression()
    X = dat[['PETAL_WIDTH']]
    y = dat[['PETAL_LENGTH']]
    lm.fit(X, y)
    return lm

# select column(s) for partitioning data
index = oml.DataFrame(IRIS['SPECIES'])

# invoke function in parallel on IRIS table
mods = oml.group_apply(IRIS, index,
                       func=build_lm,
                       parallel=2)
mods.pull().items()
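The group-apply pattern (partition the data by a key, run a user-defined function on each partition, collect one result per key) can be sketched in plain Python. This is only an illustration of the semantics, not OML4Py's implementation: the helper names group_apply_local and fit_line are hypothetical, the data is made up, a hand-written least-squares fit replaces sklearn so the sketch has no dependencies, and local threads stand in for the database-spawned Python engines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fit_line(rows):
    """Ordinary least squares for y = a + b*x on (x, y) pairs; returns (a, b)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return (mean_y - slope * mean_x, slope)


def group_apply_local(records, key, func, parallel=1):
    """Partition records by key, run func on each partition, return {key: result}."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append((rec["x"], rec["y"]))
    # Threads stand in for the separate engines; map() keeps results in group order.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = pool.map(func, groups.values())
        return dict(zip(groups.keys(), results))


# Made-up data: group "a" follows y = 1 + 2x, group "b" follows y = 2 - x.
records = [
    {"species": "a", "x": 1.0, "y": 3.0},
    {"species": "a", "x": 2.0, "y": 5.0},
    {"species": "a", "x": 3.0, "y": 7.0},
    {"species": "b", "x": 1.0, "y": 1.0},
    {"species": "b", "x": 2.0, "y": 0.0},
    {"species": "b", "x": 3.0, "y": -1.0},
]
models = group_apply_local(records, "species", fit_line, parallel=2)
print(models)
```

Each partition is fitted independently, which is why the work can be spread across workers without coordination.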

Diagram labels: OML Notebooks, REST Interface, OML4Py, Python Engine (two instances), Oracle Autonomous Database, User tables


<oml-cloud-service-url>/oml/tenants/<tenant_name>/databases/<pdb_name>/api/py-scripts/v1/<operation>/<script_name>/

py_scripts for executing user-defined functions (Python “scripts”)
REST Interface for Embedded Python Execution

Components of the URL template:
• oml-cloud-service-url: Cloud service URL
• tenant_name: Customer tenant name
• pdb_name: Name of pluggable database within ADB
• operation: do-eval, table-apply, group-apply, index-apply, row-apply
• script_name: Name of script in repository

Example synchronous invocation from cURL

$ curl -X POST --header "Authorization: Bearer ${token}" --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{"graphicsFlag":true, "service":"MEDIUM"}' "<oml-cloud-service-url>/oml/tenants/MYTENANT/databases/MYADW/api/py-scripts/v1/do-eval/RandomRedDots"

Asynchronous invocation also available
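The same request can be assembled from Python. The sketch below only builds the request object with the standard library and does not send it; the function name build_request, the placeholder host, and the placeholder token are assumptions for illustration. The path follows the URL template on this slide (operation, then script name).

```python
import json
import urllib.request

OPERATIONS = {"do-eval", "table-apply", "group-apply", "index-apply", "row-apply"}


def build_request(base_url, tenant, pdb, operation, script_name, token, payload):
    """Assemble (but do not send) a POST request for a py-scripts endpoint."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    url = (
        f"{base_url}/oml/tenants/{tenant}/databases/{pdb}"
        f"/api/py-scripts/v1/{operation}/{script_name}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_request(
    "https://oml.example.invalid",  # placeholder host, not a real service URL
    "MYTENANT", "MYADW", "do-eval", "RandomRedDots",
    token="<token>",  # placeholder; a real bearer token is required
    payload={"graphicsFlag": True, "service": "MEDIUM"},
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would perform the synchronous call, given a real service URL and token.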


Increase data scientist productivity – reduce overall compute time
AutoML – new with OML4Py

Pipeline: Data Table → Auto Algorithm Selection → Auto Feature Selection → Auto Model Tuning → ML Model
• Auto Algorithm Selection: much faster than exhaustive search
• Auto Feature Selection: de-noise data and reduce # of features
• Auto Model Tuning: significant accuracy improvement

Auto Algorithm Selection
– Identify in-database algorithm that achieves highest model quality
– Find best algorithm faster than with exhaustive search

Auto Feature Selection
– Reduce # of features by identifying most predictive
– Improve performance and accuracy

Auto Model Tuning
– Automatic tuning of algorithm hyperparameters
– Avoid manual or exhaustive search techniques

Enables non-expert users to leverage Machine Learning
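As a rough illustration of what "reduce # of features by identifying most predictive" means, here is a simple filter-style feature selection sketch that ranks columns by the absolute Pearson correlation with the target. This is a generic textbook technique swapped in for illustration, not the method used by OML4Py AutoML; the function names and the data are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def top_features(features, target, k):
    """Rank feature columns by |correlation with target| and keep the best k."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Made-up columns: two track the target closely, one is unrelated noise.
features = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0],
    "inverse": [5.0, 4.0, 3.0, 2.5, 1.0],
    "noise": [2.0, 9.0, 1.0, 8.0, 3.0],
}
target = [1.1, 2.0, 2.9, 4.2, 5.0]
selected = top_features(features, target, k=2)
print(selected)
```

A filter like this scores each feature in isolation, so it misses interactions between features; it is meant only to show the shape of the problem.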


Demo


Summary – OML4Py

Python access to Oracle Machine Learning in Autonomous Database
• Scalable data exploration, preparation, and analysis
• Scalable in-database machine learning
• Automation for greater data scientist productivity and non-expert use

Extends Python for enterprise use
• In-database performance and scalability
• Platform for application integration
• Simplified production deployment of data science solutions


Helpful Links

ORACLE AUTONOMOUS CLOUD – ALWAYS FREE TIER
https://cloud.oracle.com/tryit

ORACLE MACHINE LEARNING ON OTN
https://www.oracle.com/machine-learning

OML TUTORIALS
Interactive tour: https://docs.oracle.com/en/cloud/paas/autonomous-database/oml-tour
Basic getting started: https://docs.oracle.com/en/cloud/paas/autonomous-data-warehouse-cloud/omlug/get-started-oracle-machine-learning.html

OML OFFICE HOURS
https://asktom.oracle.com/pls/apex/asktom.search?office=6801#sessionss

ORACLE ANALYTICS CLOUD
https://www.oracle.com/solutions/business-analytics/data-visualization/examples.html


Thank You

Mark Hornick

[email protected]
@MarkHornick