Mark Hornick - Senior Director, Data Science and Machine Learning, www.twitter.com/MarkHornick
Introducing Oracle Machine Learning for Python
Future and past TechCasts:
Analytics & Data Oracle User Community
Same great technical content… new name!
www.andouc.org
Submit a topic to share at https://analyticsanddatasummit.org/techcasts/
Save the Date
TechCast Days: Winter Session
January 26-28, 2021
Watch our website & social media channels for more details
Share your knowledge, expertise and ideas!
Submit your presentation by going to our website
and clicking on “TechCasts”
Oracle Machine Learning for Python
Introduction
Mark Hornick
Senior Director, Data Science and Machine Learning
November 19, 2020
Copyright © 2020 Oracle and/or its affiliates.
Safe Harbor
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
What is Python?
An interpreted, object-oriented, high-level, general-purpose programming language
Designed for rapid application development and for scripting to connect existing components
Created in the late 1980s and first released in 1991
Open source: https://www.python.org
World-wide usage
• Widely taught in universities
• Many data scientists know and use Python
Thousands of open source packages to enhance productivity
Traditional Python and Data Source Interaction
• Access latency
• Paradigm shift: Python → data access language → Python
• Memory limitation: data size, in-memory processing
• Single threaded
• Issues for backup, recovery, security
• Ad hoc production deployment
[Diagram: data flows from the data source to Python through flat files (extract/export, then read) and back (export, then load); deployment is ad hoc, e.g., via a cron job]
Data source connectivity packages, e.g., cx_Oracle
Read/Write files using built-in tool capabilities
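The traditional pattern pulls rows out of the database into client memory before any analysis happens. A minimal sketch of that pattern, assuming Python's built-in sqlite3 module as a stand-in for a connectivity package such as cx_Oracle (both follow the DB-API 2.0 cursor interface); the FLIGHTS table and its rows are hypothetical.

```python
import sqlite3
from collections import Counter

def load_all_rows(conn, table):
    """Fetch every row of `table` into client memory (the traditional pattern)."""
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table}")  # table name is trusted here, not user input
    return cur.fetchall()  # the entire result set now lives in Python memory

# Hypothetical in-memory database standing in for a production data source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FLIGHTS (ORIGIN TEXT, DEST TEXT)")
conn.executemany(
    "INSERT INTO FLIGHTS VALUES (?, ?)",
    [("SFO", "JFK"), ("SFO", "ORD"), ("LAX", "JFK")],
)

rows = load_all_rows(conn, "FLIGHTS")
# Analysis runs single-threaded in the client, only after the full transfer
dest_counts = Counter(dest for _, dest in rows)
print(dest_counts)
```

Against a real database, `fetchall()` moves the whole table across the network and into one Python process, which is where the access latency and memory limitations come from.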
Oracle Machine Learning
• OML Services*: model deployment and management, cognitive text
• OML4SQL: SQL API
• OML4Py*: Python API
• OML4R: R API
• OML Notebooks: with Apache Zeppelin on Autonomous Database
• OML4Spark: R API on Big Data
• Oracle Data Miner: Oracle SQL Developer extension
• OML AutoML UI*: code-free AutoML interface on Autonomous Database
* Coming soon
Oracle Machine Learning Notebooks
Autonomous Database as a Data Science Platform
Collaborative UI
• Based on Apache Zeppelin
• Supports data scientists, data analysts, application developers, and DBAs with SQL and Python
• Easy notebook sharing
• Permissions, versioning, and scheduling of notebooks
Included with Autonomous Database
• Automatically provisioned and managed
• In-database algorithms and analytics functions
• Explore and prepare, build and evaluate models, score data, deploy solutions
Oracle Machine Learning for Python
Use Oracle Database as an HPC environment
• Explore, transform, and analyze data faster and at scale
Use in-database parallelized and distributed ML algorithms
• Build more models on more data, and score large volumes of data faster
• Use in-database algorithms from OML4SQL via a natural Python API
• Increase productivity with automatic data preparation, partitioned models, and integrated text mining capabilities
Execute Python scripts and manage Python objects in-database
• Collaborate: easily hand off data science products from data scientists to developers
• Run user-defined functions in data-parallel, task-parallel, and non-parallel fashion
• Return structured and image results in Python and via a REST API
New automatic machine learning (AutoML) and model explainability (MLX)
• Enhance data scientist productivity and enable non-experts to use and benefit from machine learning
• Algorithm selection, feature selection, hyperparameter tuning, model selection
• Model-agnostic identification of important features that impact model predictions
Supported in Oracle Autonomous Database with OML Notebooks
[Diagram: OML4Py, accessed from OML Notebooks and through a REST interface]
Transparency Layer
In-database performance: indexes, query optimization, parallelism, partitioning
Leverages proxy objects for database data: oml.DataFrame
import oml
# Create table BOSTON from pandas DataFrame data
DATA = oml.create(data, table='BOSTON')
# Get a proxy object for the existing database table BOSTON
DATA = oml.sync(table='BOSTON')
Uses familiar Python syntax to manipulate database data
Overloads Python functions, translating their functionality to SQL
DATA.shape
DATA.head()
DATA.describe()
DATA.std()
DATA.skew()
TRAIN, TEST = DATA.split()
TRAIN.shape
TEST.shape
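To illustrate the proxy-object idea, here is a toy sketch: a hypothetical ProxyFrame class that holds only a connection and a table name, and answers `shape`, `head()`, and `mean()` by issuing SQL so the data stays in the database. It uses sqlite3 so it runs locally; it is not how OML4Py is implemented, and the class, method names, and sample rows are illustrative only.

```python
import sqlite3

class ProxyFrame:
    """Hypothetical proxy: stores a table reference, not the data itself."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table  # assumed to be a trusted identifier

    def _columns(self):
        cur = self.conn.execute(f"SELECT * FROM {self.table} LIMIT 0")
        return [d[0] for d in cur.description]

    @property
    def shape(self):
        # Row count is computed in the database; only one number comes back
        (nrows,) = self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()
        return (nrows, len(self._columns()))

    def head(self, n=5):
        # Only the first n rows are transferred to the client
        return self.conn.execute(f"SELECT * FROM {self.table} LIMIT ?", (n,)).fetchall()

    def mean(self, column):
        # Aggregation is pushed down to the database as SQL
        (value,) = self.conn.execute(f"SELECT AVG({column}) FROM {self.table}").fetchone()
        return value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOSTON (RM REAL, MEDV REAL)")
conn.executemany("INSERT INTO BOSTON VALUES (?, ?)", [(6.0, 24.0), (7.0, 36.0), (5.0, 18.0)])

DATA = ProxyFrame(conn, "BOSTON")
print(DATA.shape)       # (3, 2)
print(DATA.head(2))
print(DATA.mean("RM"))  # 6.0
```

The design point is that every method returns a small result (a count, a few rows, one aggregate) while the full table never leaves the database.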
Example using the crosstab function
In-database scalable aggregation
[Diagram: OML4Py in OML Notebooks, working with user tables in Oracle Autonomous Database]
import oml
ONTIME_S = oml.sync(table="ONTIME_S")
res = ONTIME_S.crosstab('DEST')
type(res)
res.head()
Source data is an OML DataFrame, ONTIME_S, which is a proxy for an Oracle Database table.
The crosstab() function is overloaded to accept OML DataFrame objects and transparently generates SQL for scalable processing in Oracle Database.
It returns an ‘oml.core.frame.DataFrame’ object; the statistics are computed in-database.
Equivalent SQL:
select DEST, count(*)
from ONTIME_S
group by DEST
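The SQL above returns one row per destination, so only the aggregated counts leave the database. A minimal sketch that checks this equivalence locally, assuming sqlite3 in place of Oracle Database and a tiny hypothetical ONTIME_S sample: the GROUP BY result matches a client-side count over all rows, but transfers far fewer rows.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ONTIME_S (DEST TEXT, ARRDELAY INTEGER)")
conn.executemany(
    "INSERT INTO ONTIME_S VALUES (?, ?)",
    [("ORD", 5), ("JFK", -3), ("ORD", 12), ("SFO", 0), ("ORD", 7)],
)

# In-database aggregation: only one row per DEST is returned
pushed_down = dict(
    conn.execute("SELECT DEST, COUNT(*) FROM ONTIME_S GROUP BY DEST").fetchall()
)

# Client-side alternative: pull every row, then count in Python
all_rows = conn.execute("SELECT DEST FROM ONTIME_S").fetchall()
client_side = Counter(dest for (dest,) in all_rows)

print(pushed_down)
print(len(all_rows), "rows transferred vs", len(pushed_down))
```

On a table with millions of rows and a few hundred destinations, the difference in transferred rows is what makes the overloaded crosstab() scale.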
Machine Learning In-Database Algorithms (OML4Py 1.0)
Classification
• Decision Tree • Naïve Bayes • Generalized Linear Model • Support Vector Machine • Random Forest • Neural Network
Regression
• Generalized Linear Model • Neural Network • Support Vector Machine
Attribute Importance
• Minimum Description Length
Clustering
• Expectation Maximization • Hierarchical k-Means
Feature Extraction
• Singular Value Decomposition • Explicit Semantic Analysis • Principal Component Analysis via SVD
Association Rules
• Apriori
Anomaly Detection
• One-Class Support Vector Machine
Supports automatic data preparation, partitioned model ensembles, integrated text mining
Example using Support Vector Machine
Scalable in-database algorithms
import oml
from oml import svm
# create proxy object for the ONTIME_S table
ONTIME_S = oml.sync(table='ONTIME_S')
# define model object
settings = {'svms_outlier_rate' : 0.01}
svm_mod = svm('anomaly_detection',
              svms_kernel_function='dbms_data_mining.svms_linear',
              **settings)
# build anomaly detection model (unsupervised, so no target: y=None)
svm_mod = svm_mod.fit(x=ONTIME_S, y=None)
# view model object
svm_mod
[Diagram: OML4Py in OML Notebooks, working with user tables in Oracle Autonomous Database]
Example using OML Notebooks with in-database clustering model build and score
Use matplotlib visualization with in-database model results
[Screenshot: notebook paragraphs that drop an existing model, build a k-Means model, and score using the model]
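The notebook builds a k-Means model in the database and then scores data with it. As a generic illustration of those two steps only, here is plain Lloyd's k-means in NumPy: "build" estimates centroids, "score" assigns each row to its nearest centroid. This is not Oracle's in-database hierarchical k-Means implementation; the function names and the two-blob sample data are hypothetical.

```python
import numpy as np

def score_kmeans(centroids, X):
    """Score step: index of the nearest centroid for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def build_kmeans(X, k, n_iter=20, seed=0):
    """Build step: estimate k centroids with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = score_kmeans(centroids, X)
        # Move each centroid to the mean of its assigned points (keep it if empty)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids

# Two well-separated hypothetical clusters of 50 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])

centroids = build_kmeans(X, k=2)
labels = score_kmeans(centroids, X)
print(centroids)
```

Separating build from score mirrors the notebook flow: the model is created once, then applied to any data set that has the same columns.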
Embedded Python Execution
Example of parallel partitioned data flow using a third-party package
import oml
# user-defined function using scikit-learn; runs once per data partition
def build_lm(dat):
    from sklearn import linear_model
    lm = linear_model.LinearRegression()
    X = dat[['PETAL_WIDTH']]
    y = dat[['PETAL_LENGTH']]
    lm.fit(X, y)
    return lm
# get a proxy object for the IRIS table
IRIS = oml.sync(table='IRIS')
# select column(s) for partitioning data
index = oml.DataFrame(IRIS['SPECIES'])
# invoke function in parallel on IRIS table, one call per species
mods = oml.group_apply(IRIS, index,
                       func=build_lm,
                       parallel=2)
mods.pull().items()
[Diagram: OML4Py, used from OML Notebooks or through the REST interface, spawns multiple Python engines in Oracle Autonomous Database, each processing a partition of the user table data]
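group_apply partitions the table by the index column and runs the user-defined function once per partition, in Python engines spawned by the database. The following local emulation of those semantics assumes a hypothetical helper `local_group_apply`, a thread pool in place of database-spawned engines, and `numpy.polyfit` in place of scikit-learn's LinearRegression so it runs without other packages; it does not call OML4Py, and the IRIS rows are a tiny hypothetical sample.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def local_group_apply(rows, key, func, parallel=2):
    """Hypothetical local analogue of group_apply: one func call per partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {k: pool.submit(func, part) for k, part in partitions.items()}
        return {k: fut.result() for k, fut in futures.items()}

def build_lm(dat):
    # Fit PETAL_LENGTH = slope * PETAL_WIDTH + intercept for one species
    x = np.array([r["PETAL_WIDTH"] for r in dat])
    y = np.array([r["PETAL_LENGTH"] for r in dat])
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# Hypothetical sample with an exact linear relationship per species
IRIS = (
    [{"SPECIES": "setosa", "PETAL_WIDTH": w, "PETAL_LENGTH": 2 * w + 1} for w in (0.1, 0.2, 0.3)]
    + [{"SPECIES": "virginica", "PETAL_WIDTH": w, "PETAL_LENGTH": 3 * w} for w in (1.8, 2.0, 2.2)]
)

mods = local_group_apply(IRIS, "SPECIES", build_lm, parallel=2)
for species, (slope, intercept) in sorted(mods.items()):
    print(species, round(slope, 2), round(intercept, 2))
```

As with group_apply, the result is keyed by partition value, so each species gets its own independently fitted model.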
REST Interface for Embedded Python Execution
py-scripts for executing user-defined functions (Python “scripts”)
<oml-cloud-service-url>/oml/tenants/<tenant_name>/databases/<pdb_name>/api/py-scripts/v1/<operation>/<script_name>/
• <oml-cloud-service-url>: cloud service URL
• <tenant_name>: customer tenant name
• <pdb_name>: name of the pluggable database within ADB
• <operation>: one of do-eval, table-apply, group-apply, index-apply, row-apply
• <script_name>: name of the script in the repository
Example synchronous invocation from cURL:
$ curl -X POST --header "Authorization: Bearer ${token}" --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{"graphicsFlag":true, "service":"MEDIUM"}' "<oml-cloud-service-url>/oml/tenants/MYTENANT/databases/MYADW/api/py-scripts/v1/do-eval/RandomRedDots"
Asynchronous invocation is also available.
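The same request can be assembled from Python. A sketch using only the standard library, assuming a placeholder base URL (`https://oml.example.com`), the tenant MYTENANT and database MYADW from the cURL example, and a token obtained separately (token retrieval is not shown); the helper `py_scripts_url` is hypothetical. It builds the request but does not send it.

```python
import json
import urllib.request

OPERATIONS = {"do-eval", "table-apply", "group-apply", "index-apply", "row-apply"}

def py_scripts_url(base_url, tenant, pdb, operation, script_name):
    """Hypothetical helper: fill in the py-scripts endpoint template."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return (f"{base_url}/oml/tenants/{tenant}/databases/{pdb}"
            f"/api/py-scripts/v1/{operation}/{script_name}")

url = py_scripts_url("https://oml.example.com", "MYTENANT", "MYADW",
                     "do-eval", "RandomRedDots")
body = json.dumps({"graphicsFlag": True, "service": "MEDIUM"}).encode("utf-8")
token = "<access-token>"  # placeholder; a real token comes from the OML service

req = urllib.request.Request(
    url,
    data=body,
    method="POST",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
)
# urllib.request.urlopen(req) would send the request; omitted in this sketch
print(req.full_url)
```

Validating the operation name up front catches typos locally instead of surfacing them as an HTTP error.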
AutoML – new with OML4Py
Increase data scientist productivity and reduce overall compute time
Auto Algorithm Selection
– Identify the in-database algorithm that achieves the highest model quality
– Find the best algorithm faster than with exhaustive search
Auto Feature Selection
– Reduce the number of features by identifying the most predictive ones
– Improve performance and accuracy
Auto Model Tuning
– Automatic tuning of algorithm hyperparameters
– Avoid manual or exhaustive search techniques
Enables non-expert users to leverage machine learning
[Diagram: Data Table → Auto Algorithm Selection (much faster than exhaustive search) → Auto Feature Selection (de-noise data and reduce the number of features) → Auto Model Tuning (significant accuracy improvement) → ML Model]
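The core idea behind automatic algorithm and model selection is to score candidate models on held-out data and keep the best one. A deliberately simplified sketch of that idea in NumPy, assuming three hypothetical candidates (constant, linear, and quadratic fits) ranked by validation mean squared error on synthetic data; OML4Py's AutoML works with its own in-database algorithms and its own strategy for avoiding exhaustive search, which this sketch does not reproduce.

```python
import numpy as np

def poly_candidate(degree):
    """Return (fit, predict) functions for a polynomial model of the given degree."""
    def fit(x, y):
        return np.polyfit(x, y, degree)

    def predict(coefs, x):
        return np.polyval(coefs, x)

    return fit, predict

def select_model(candidates, x_train, y_train, x_val, y_val):
    """Fit each candidate, score it on validation data, return the best name and all scores."""
    scores = {}
    for name, (fit, predict) in candidates.items():
        coefs = fit(x_train, y_train)
        scores[name] = float(np.mean((predict(coefs, x_val) - y_val) ** 2))
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data with a quadratic ground truth plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = 1.5 * x**2 - x + rng.normal(0, 0.5, 200)
x_train, x_val, y_train, y_val = x[:150], x[150:], y[:150], y[150:]

candidates = {
    "constant": poly_candidate(0),
    "linear": poly_candidate(1),
    "quadratic": poly_candidate(2),
}
best, scores = select_model(candidates, x_train, y_train, x_val, y_val)
print(best, scores)
```

Scoring on data the models did not see during fitting is what keeps the selection honest; the same loop structure extends to feature subsets or hyperparameter values.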
Demo
Summary – OML4Py
Python access to Oracle Machine Learning in Autonomous Database
• Scalable data exploration, preparation, and analysis
• Scalable in-database machine learning
• Automation for greater data scientist productivity and non-expert use
Extends Python for enterprise use
• In-database performance and scalability
• Platform for application integration
• Simplified production deployment of data science solutions
Helpful Links
ORACLE AUTONOMOUS CLOUD – ALWAYS FREE TIER
https://cloud.oracle.com/tryit
ORACLE MACHINE LEARNING ON OTN
https://www.oracle.com/machine-learning
OML TUTORIALS
Interactive tour: https://docs.oracle.com/en/cloud/paas/autonomous-database/oml-tour
Basic getting started: https://docs.oracle.com/en/cloud/paas/autonomous-data-warehouse-cloud/omlug/get-started-oracle-machine-learning.html
OML OFFICE HOURS
https://asktom.oracle.com/pls/apex/asktom.search?office=6801#sessionss
ORACLE ANALYTICS CLOUD
https://www.oracle.com/solutions/business-analytics/data-visualization/examples.html