Open Data Science in the AI Era: Breaking Data Science Open


Upload: continuum-analytics

Post on 19-Mar-2017


Category:

Technology



TRANSCRIPT

Page 1: Open Data Science in the AI Era: Breaking Data Science Open

© 2017 Continuum Analytics - Confidential & Proprietary 1

Open Data Science in the AI Era
Breaking Data Science Open

Page 2: Open Data Science in the AI Era: Breaking Data Science Open

About Us

Michele Chambers (@mcAnalytics)
• EVP Product & CMO, Continuum Analytics
• M.B.A., Duke University; B.S., Computer Engineering
• Author:
  • Breaking Data Science Open (O’Reilly)
  • Modern Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
  • Advanced Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
  • Big Data, Big Analytics (Wiley)

Page 3: Open Data Science in the AI Era: Breaking Data Science Open

About Us

Ian Stokes-Rees (@ijstokes)
• Product Marketing Manager
• PhD in Particle Physics from Oxford University
• Passionate advocate of Open Data Science
• Educator and evangelist for use of Python and Anaconda
• Author: Breaking Data Science Open (O’Reilly)
• 20+ years of experience engineering large-scale systems for numerical computing

Page 4: Open Data Science in the AI Era: Breaking Data Science Open

Business Intelligence & Predictive Analytics
Using Data for Insight & Human-in-the-Loop Actions

Page 5: Open Data Science in the AI Era: Breaking Data Science Open

Cognitive Intelligence
Using Data & Deep Learning to Make Recommendations

Page 6: Open Data Science in the AI Era: Breaking Data Science Open


Page 7: Open Data Science in the AI Era: Breaking Data Science Open


Page 8: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science
Connecting Data, Analytics & Computation

Page 9: Open Data Science in the AI Era: Breaking Data Science Open

AI is…

“The ability of machines and software to think, work and react like humans.”

Page 10: Open Data Science in the AI Era: Breaking Data Science Open


Page 11: Open Data Science in the AI Era: Breaking Data Science Open

Data science has expanded to include AI

• Distributed Systems: Hadoop, Spark
• Business Intelligence: Data warehouse, querying, reporting
• AI: Deep learning, natural language processing, neural networks, regression, statistics
• Web: Web crawling, scraping, 3rd party data & API providers, predictive services & APIs
• Scientific Computing / HPC: GPUs, multi-cores

Page 12: Open Data Science in the AI Era: Breaking Data Science Open

Open Source Communities Create Powerful Technology for Data Science

• Distributed Systems
• Business Intelligence
• Web
• Scientific Computing / HPC
• AI

Projects labeled on the slide: Numba, xlwings, Airflow, Blaze

Page 13: Open Data Science in the AI Era: Breaking Data Science Open

Diverse Languages Poised to Embrace AI

• Distributed Systems
• Business Intelligence
• AI
• Web
• Scientific Computing / HPC

Languages labeled on the slide: SQL

Page 14: Open Data Science in the AI Era: Breaking Data Science Open

Anaconda is the Open Data Science Platform bringing technology together

• Distributed Systems
• Business Intelligence
• Web
• Scientific Computing / HPC
• AI

Projects labeled on the slide: Numba, xlwings, Airflow, Blaze

Page 15: Open Data Science in the AI Era: Breaking Data Science Open


Page 16: Open Data Science in the AI Era: Breaking Data Science Open


Page 17: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science Technologies

• AI
• Notebooks
• Visualization
• Big Data
• Languages

Page 18: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science Landscape

[Quadrant chart: Leaders, Challengers, Niche, Visionaries. Axes: Adoption (Niche to Wide); Open Core to Open Source.]

Page 19: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science Landscape for AI

[Quadrant chart: Leaders, Challengers, Niche, Visionaries. Axes: Adoption; Open Core to Open Source.]

Plotted: MS Azure ML, AWS Machine Learning, Google Cloud Platform, DataRobot, H2O, Anaconda Enterprise, deepsense, NLTK, Theano, TensorFlow, Caffe, Keras, MXNet, Spark MLlib, spaCy, Anaconda, NumPy, SciPy, Pandas, Scikit-Learn, dplyr, caret

Page 20: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science Landscape for Notebooks

[Quadrant chart: Leaders, Challengers, Niche, Visionaries. Axes: Adoption; Open Core to Open Source.]

Plotted: Anaconda Enterprise, RStudio, Databricks, Jupyter, Zeppelin, JupyterHub, Rodeo, JupyterLab, Beaker, nteract, binder, nbviewer, Hue

Page 21: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science Landscape for Visualization

[Quadrant chart: Leaders, Challengers, Niche, Visionaries. Axes: Adoption; Open Core to Open Source.]

Plotted: Plotly Server, Anaconda Enterprise, HoloViews, Datashader, Shiny Server Pro, Bokeh, Shiny, Shiny Server, ggplot, d3, matplotlib, seaborn, Plotly

Page 22: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science Landscape for Big Data

[Quadrant chart: Leaders, Challengers, Niche, Visionaries. Axes: Adoption; Open Core to Open Source.]

Plotted: Hortonworks, Cloudera, MapR, Anaconda Enterprise, Databricks, Pivotal, Dask, Ibis, Numba, Kudu, Spark, Hadoop, Flink, Spark Streaming, Kafka, Impala, Storm, Elasticsearch, Solr, Hive

Page 23: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science Landscape for Languages

[Quadrant chart: Leaders, Challengers, Niche, Visionaries. Axes: Adoption; Open Core to Open Source.]

Plotted: RStudio, Anaconda Enterprise, R, Anaconda, Python, PyData, Microsoft R Open, Scala, Microsoft R Client, Julia, DataRobot, MS Azure ML, IBM DSX, Microsoft R Server, yhat, dataiku, Domino Data Lab, DataScience.com

Page 24: Open Data Science in the AI Era: Breaking Data Science Open

Landscape for Open Data Science Platforms

[Quadrant chart: Leaders, Challengers, Niche, Visionaries. Axes: Adoption; Open Core to Open Source.]

Plotted: Everyone Else, Cloudera, Microsoft, Anaconda Enterprise, RStudio, R, Anaconda, Jupyter

Page 25: Open Data Science in the AI Era: Breaking Data Science Open

Anaconda now includes AI

Page 26: Open Data Science in the AI Era: Breaking Data Science Open

Art of the Possible
Open Data Science Applications

• Deep Learning means creating models that attempt to match the structure of the human brain through artificial neural networks
• TensorFlow, from Google, has grown enormously in popularity and provides a framework for creating deep learning systems
• Commonly applied to text, speech, and image processing to “learn” patterns, then identify them and score or classify new data streams

https://www.tensorflow.org/

Page 27: Open Data Science in the AI Era: Breaking Data Science Open

Art of the Possible
Open Data Science Applications

• Artificial General Intelligence refers to systems that can understand and respond to a wide range of inputs
• Universe, from OpenAI, provides a framework for testing AGI systems on video games
• Open source, Python-based

https://universe.openai.com/

Page 28: Open Data Science in the AI Era: Breaking Data Science Open

Art of the Possible
Open Data Science Applications

• Production-ready open source AI toolkit from Microsoft
• Used in Bing, Skype, Cortana
• Designed to work with Anaconda
• GPU and multi-node ready

https://aka.ms/cognitivetoolkit

Page 29: Open Data Science in the AI Era: Breaking Data Science Open

…is the leading Open Data Science platform, powered by Python, the fastest-growing data science language

• Accelerate Time-to-Value
• Connect Data, Analytics & Compute
• Empower Data Science Teams

Page 30: Open Data Science in the AI Era: Breaking Data Science Open

ACCELERATE Time-to-Value
• INNOVATE faster through managed agile experimentation
• MOVE from analysis to deployment immediately
• DELIVER powerful results backed by a high performance open data science platform

CONNECT Data, Analytics & Compute
• LEVERAGE innovative open source analytics to extract value from data
• MAXIMIZE your computational power to easily analyze all data
• CONNECT and integrate all your data sources for predictive models

EMPOWER Data Science Teams
• ITERATE quickly to create powerful analysis and predictive models
• COLLABORATE and share with your data science team
• PUBLISH interactive results to the business

Page 31: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science Platform
ACCELERATE. CONNECT. EMPOWER.

Page 32: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science in Enterprises

• Open Data Science
• Data Science Collaboration
• Data Science Governance
• Dashboards & Apps
• Self-Service Analytics
• Data Science Operations
• Data Science for Big Data
• AI

Page 33: Open Data Science in the AI Era: Breaking Data Science Open

Anaconda Powers

OPEN DATA SCIENCE

Anaconda
• High performance Python & R
• 720+ data science packages
• Cross-platform package, dependency & environments
• Community driven package repository collaboration

Anaconda Navigator
• Desktop Portal & Installer

DATA SCIENCE GOVERNANCE

Anaconda Repository
• Storage & sharing of packages, environments, notebooks
• On-premise governance
• Enterprise authentication

DATA SCIENCE COLLABORATION

Anaconda Enterprise Notebooks
• Collaborative project based workflows for Python & R
• Enterprise authentication & permissioning
• Notebook sharing, versioning, search, differencing

DATA SCIENCE FOR BIG DATA

Anaconda Scale
• Hadoop & Spark integration
• Scalable distributed processing framework
• Integration with resource management & data stores
• Self-service cluster launching
• Distributed package, dependency & environments

Anaconda Fusion
• Big Data querying & transformations

Page 34: Open Data Science in the AI Era: Breaking Data Science Open

Anaconda Powers

AI

Anaconda
• 720+ data science packages
• Deep Learning: Theano, TensorFlow, Caffe, Keras, Neon, Lasagne
• Natural Language Processing: NLTK, spaCy
• Machine Learning: Scikit-learn
• GPU enablement

DASHBOARDS & APPS

Anaconda
• Interactive browser based dashboards & visualizations with Bokeh
• Bokeh apps using Python, R, Scala
• Big Data visualizations with Datashader

DATA SCIENCE OPERATIONS

Anaconda Adam
• Server & Cluster Installer

Anaconda Accelerate
• Python compilation for multi-core & GPUs
• Code, data, in-notebook profilers
• Pre-optimized numerical libraries

SELF-SERVICE ANALYTICS

Anaconda Fusion
• Integration of Open Data Science with Microsoft Excel®
• Interactive exploration & visualization
• Predictive modeling
• Big Data querying & transformations

Page 35: Open Data Science in the AI Era: Breaking Data Science Open

Growing Anaconda Partner Ecosystem

Page 36: Open Data Science in the AI Era: Breaking Data Science Open


Anaconda Gives Superpowers To People Who Change The World

Page 37: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science
Vibrant and Growing Community

• Python Community: 30M+
• Packages in Anaconda: 720+
• R Community: 16M+
• Spark Python Usage: 50%+
• Anaconda Downloads: 11M+

Page 38: Open Data Science in the AI Era: Breaking Data Science Open

Anaconda… is Trusted by Industry Leaders

Financial Services
• Risk management, quant modeling, data exploration and processing, algorithmic trading, compliance reporting

Government
• Fraud detection, data crawling, web & cyber data analytics, statistical modeling

Healthcare & Life Sciences
• Genomics data processing, cancer research, natural language processing for health data science

High Tech
• Customer behavior, recommendations, ad bidding, retargeting, social media analytics

Retail & CPG
• Engineering simulation, supply chain modeling, scientific analysis

Oil & Gas
• Pipeline monitoring, noise logging, seismic data processing, geophysics

Page 39: Open Data Science in the AI Era: Breaking Data Science Open

Open Data Science in Action

Page 40: Open Data Science in the AI Era: Breaking Data Science Open

Dask
Parallel processing with Dask makes large data sets accessible

Deep Learning with Anaconda & TensorFlow
Classic digit recognition leveraging Anaconda + GPUs

Page 41: Open Data Science in the AI Era: Breaking Data Science Open

Dask: Parallel Data Processing

• Parallel and Distributed Pandas and NumPy
• Low latency workflow manager
• Graphical tools
• Simple APIs
• Extensible and generalizable to other data structures
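
As a rough illustration of the idea behind "Parallel and Distributed Pandas and NumPy", the sketch below splits a dataset into chunks, reduces each chunk independently, and then combines the partial results. It uses only the Python standard library and is not Dask's API; the names `chunked`, `partial_sum`, and `parallel_mean`, and the chunk size, are assumptions made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum(chunk):
    """Reduce one chunk to a (sum, count) pair that can be combined later."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, max_workers=4):
    """Compute the mean by reducing chunks independently, then combining."""
    if not values:
        raise ValueError("values must not be empty")
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each chunk is an independent task; results come back in chunk order.
        partials = list(pool.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = list(range(10_000))
print(parallel_mean(data, chunk_size=2_500))
```

Threads are used here only to show the split, reduce, and combine structure; in CPython they will not speed up a CPU-bound sum. A library such as Dask schedules the same kind of chunked work across cores or machines.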

Page 43: Open Data Science in the AI Era: Breaking Data Science Open

Dask
Parallel processing with Dask makes large data sets accessible

Deep Learning with Anaconda & TensorFlow
Classic digit recognition leveraging Anaconda + GPUs

Page 44: Open Data Science in the AI Era: Breaking Data Science Open

Deep Learning == Neural Nets

[Diagram: neural network layers with ReLU activations]
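
A neural net built from ReLU units can be sketched in a few lines of plain Python. The example below is a minimal forward pass through one hidden layer; the weights are hand-picked, hypothetical values (not taken from the slides), chosen so that the network computes the absolute difference of its two inputs. The helper names `relu`, `dense`, and `forward` are likewise assumptions for illustration.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weighted sum (plus bias) per output unit."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


# Hypothetical hand-picked weights: two hidden ReLU units compute
# max(0, a - b) and max(0, b - a); the output unit adds them up.
HIDDEN_W = [[1.0, -1.0], [-1.0, 1.0]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[1.0, 1.0]]
OUTPUT_B = [0.0]


def forward(x):
    """Run the input through the hidden ReLU layer and the linear output layer."""
    hidden = dense(x, HIDDEN_W, HIDDEN_B, activation=relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]


print(forward([3.0, 1.0]))
print(forward([1.0, 4.0]))
```

Without the ReLU step the two layers would collapse into a single linear map, which cannot represent an absolute value. In practice the weights are learned from data rather than chosen by hand.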

Page 45: Open Data Science in the AI Era: Breaking Data Science Open

Deep Learning Software Stack

• Neural networks: Keras, TFLearn, Caffe, ...and many others
• Tensor math: TensorFlow, Theano, PyTorch
• Primitives: Intel MKL 2017, cuDNN, MIOpen
• Hardware: multi-core CPU, GPU, many-core CPU (Xeon Phi)

Page 46: Open Data Science in the AI Era: Breaking Data Science Open

Example: Digit recognition

Trained to 98% accuracy in 4 minutes using a single NVIDIA GTX 1080 GPU

• 98% success
• 2% failure
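
The success and failure figures above are classification rates: the share of test digits the model labels correctly and incorrectly. A minimal sketch of that calculation follows; the label lists are made-up placeholders, not the results from the slide, and the function name `success_and_failure_rates` is hypothetical.

```python
def success_and_failure_rates(predicted, actual):
    """Return (success_rate, failure_rate) for two equal-length label lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    total = len(actual)
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / total, (total - correct) / total


# Made-up example labels: ten digits, one of which is misclassified.
actual_digits = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9]
predicted_digits = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]

success, failure = success_and_failure_rates(predicted_digits, actual_digits)
print(f"{success:.0%} success, {failure:.0%} failure")
```

The rates are usually measured on held-out test images that the model did not see during training.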

Page 47: Open Data Science in the AI Era: Breaking Data Science Open

Anaconda Enterprise
A collaborative environment for data science teams

Page 48: Open Data Science in the AI Era: Breaking Data Science Open


Anaconda Foundation

http://continuum.io/downloads

Page 49: Open Data Science in the AI Era: Breaking Data Science Open

Accelerating adoption of Open Data Science

• Easy to install
• Agile data exploration
• Powerful data analysis
• Simple to collaborate
• Accessible to everyone

PYTHON & R OPEN SOURCE ANALYTICS

NumPy, SciPy, Pandas, Scikit-learn, Jupyter/IPython, Numba, Matplotlib, Spyder, TensorFlow, Cython, Theano, Scikit-image, NLTK, Dask, Caffe, dplyr, shiny, ggplot2, tidyr, caret, PySpark & 720+ packages

Page 50: Open Data Science in the AI Era: Breaking Data Science Open


Page 51: Open Data Science in the AI Era: Breaking Data Science Open


Page 52: Open Data Science in the AI Era: Breaking Data Science Open


Page 53: Open Data Science in the AI Era: Breaking Data Science Open


Third Party Asset Management

http://anaconda.org

Page 54: Open Data Science in the AI Era: Breaking Data Science Open


Step 2: Anaconda Cloud

Page 55: Open Data Science in the AI Era: Breaking Data Science Open


Personal Assets Management

http://anaconda.org/ijstokes

Page 56: Open Data Science in the AI Era: Breaking Data Science Open


Publishing Data Science

Page 57: Open Data Science in the AI Era: Breaking Data Science Open


Provenance and Reproducibility

Page 58: Open Data Science in the AI Era: Breaking Data Science Open

Enterprise Collaboration

• Apps
• Search
• Tags
• Team
• Assets

Page 59: Open Data Science in the AI Era: Breaking Data Science Open


Data Science App Creation & Deployment

Page 60: Open Data Science in the AI Era: Breaking Data Science Open

Excel Integration

Anaconda Fusion brings Open Data Science to Microsoft Excel

• BRING interactive visualizations, machine learning and ETL to Excel
• BRIDGE Excel data to Python & R through notebooks
• ACCESS all the power of Python and Big Data, natively embedded inside Excel

Page 61: Open Data Science in the AI Era: Breaking Data Science Open

Enterprise-Ready Anaconda

• Open Data Science
• Data Science Collaboration
• Data Science Governance
• Dashboards & Apps
• Self-Service Analytics
• Data Science Operations
• Data Science for Big Data
• AI

Page 62: Open Data Science in the AI Era: Breaking Data Science Open


Executive Sponsorship

Page 63: Open Data Science in the AI Era: Breaking Data Science Open


Static Investments

Page 64: Open Data Science in the AI Era: Breaking Data Science Open


Dynamic Investments

Page 65: Open Data Science in the AI Era: Breaking Data Science Open


Dynamic Investments

Data Lab Data Science Team CDO Production Operations

Page 66: Open Data Science in the AI Era: Breaking Data Science Open

Executive Sponsorship Responsibilities

• Governance
• Provenance
• Reproducibility

Page 67: Open Data Science in the AI Era: Breaking Data Science Open

Data Science is a Team Sport

• Traditional Roles: Individual | Silo
• Modern Roles: Team | Collaborative

Page 68: Open Data Science in the AI Era: Breaking Data Science Open


Empowering the Data Science Team

Page 69: Open Data Science in the AI Era: Breaking Data Science Open

We can help you get started…

Assessments
• Journey to Open Data Science

Best Practices
• SAS to Python Workflow Best Practices
• Open Data Science Workflow Best Practices

Data Science
• Deployed Data Science Model
• Interactive Visualization Data Science

Migrations
• SAS to Python Migration
• Matlab to Python Migration
• SAS to R Migration

Tuning
• Python Tuning
• Model Tuning
• Infrastructure Tuning

Development
• Open Source Development
• Package Building

Page 70: Open Data Science in the AI Era: Breaking Data Science Open

Next Steps

• DOWNLOAD the Breaking Data Science Open eBook: Go.continuum.io/download-ebook-breaking-data-science-open/
• CHECK OUT Anaconda Enterprise: continuum.io/anaconda-overview
• EXPERIENCE Anaconda Enterprise on your own: http://know.continuum.io/Anaconda-Enterprise-Test-Drive.html

Page 71: Open Data Science in the AI Era: Breaking Data Science Open

Thank You

Michele Chambers
Twitter: @mcAnalytics

Ian Stokes-Rees
Twitter: @ijstokes

[email protected]
@ContinuumIO

Page 72: Open Data Science in the AI Era: Breaking Data Science Open

Continuum Analytics

We empower data science teams to make the world a better place

We Empower Data Science Teams to Make the World Better