journey to open data science


Upload: continuum-analytics

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Journey to Open Data Science

© 2016 Continuum Analytics - Proprietary

JOURNEY TO OPEN DATA SCIENCE

Michele Chambers, CMO & VP Products

Christine Doig, Sr. Data Scientist & Product Marketing Manager

Continuum Analytics

Page 2: Journey to Open Data Science

About Us

• Michele Chambers (@mcAnalytics)
CMO & VP Products, Continuum Analytics
M.B.A., Duke University; B.S., Computer Engineering

Author
– Big Data, Big Analytics (Wiley)
– Modern Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
– Advanced Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)

Page 3: Journey to Open Data Science

About Us

• Christine Doig (@ch_doig)
Senior Data Scientist & Product Marketing Manager, Continuum Analytics
M.S. in Industrial Engineering, Polytechnic University of Catalonia

Open Source advocate and speaker: PyData, EuroPython, SciPy, PyCon

5+ years in advanced analytics, operations research, machine learning in energy, manufacturing & banking

Page 4: Journey to Open Data Science

WHAT’S OPEN DATA SCIENCE?


Page 5: Journey to Open Data Science

Data Science is …

“An interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms”

Wikipedia

Page 6: Journey to Open Data Science

Data Science is not just Machine Learning…

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 7: Journey to Open Data Science

Data Science is Interdisciplinary…

• Distributed Systems: Hadoop, Spark
• Scientific Computing / HPC: GPUs, multi-cores
• Machine Learning / Statistics: Classification, deep learning, regression, PCA
• Web: Web crawling, scraping, 3rd party data & API providers, predictive services & APIs
• Business Intelligence: Data warehouse, querying, reporting

Page 8: Journey to Open Data Science

Open Data Science is …

an inclusive movement that makes the open source tools of data science (data, analytics, and computation) work together easily as a connected ecosystem

Page 9: Journey to Open Data Science

Open Data Science Means Open…

• Availability
• Innovation
• Interoperability
• Transparency

For everyone in the data science team

OPEN DATA SCIENCE IS THE FOUNDATION TO MODERNIZATION

Page 10: Journey to Open Data Science

Why are major corporations moving to Modern Analytics & Open Data Science?

• Large Investment Banks: “How can I create and deploy timely risk models?”
• Global CPG Manufacturers: “How can I possibly identify the root causes of my complex problem and remediate early enough to create revenue assurance?”
• Major Upstream Oil & Gas: “How can I take advantage of all this new sensor information now?”

Page 11: Journey to Open Data Science

Industry Leaders Trusting Open Data Science

Open Data Science Community

• Python Community: 30M+
• R Community: 16M+
• Spark Python Usage: 50%
• Anaconda Downloads: 3M+

Page 12: Journey to Open Data Science

Open Source Communities Create Powerful Technology for Data Science

Numba

dask

xlwings

Airflow

Blaze

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 13: Journey to Open Data Science

Python is the Common Language

Numba

dask

xlwings

Airflow

Blaze

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 14: Journey to Open Data Science


Python Trusted by Industry Leaders

Page 15: Journey to Open Data Science

Python is Everywhere

“Everyone at JPMorgan now needs to know Python and there are around 5,000 developers using it at Bank of America. There are close to 10 million lines of Python code in Quartz and we got close to 3,000 commits a day. It’s a good scripting language and easily integrated into both the front and back ends, which was one of the reasons we chose it in the first place.”

Kirat Singh, Former Global Head of Risk Systems, Bank of America Merrill Lynch

Page 16: Journey to Open Data Science


Journey to Open Data Science

Page 17: Journey to Open Data Science

Why Companies are Migrating to ODS…

Large Investment Bank

Before

• Proprietary Technology
– Variety of DBs & DWs
– Excel, SQL, Custom Code, SAS
• Problem
– Hard to find people to create proprietary risk assessment models
– Takes months and years to deploy

After

• Open Data Science Technology
– Python & Anaconda
– NumPy, SciPy, PyData stack
• Results
– Create and deploy risk models in days not years
– Easier to find and hire data scientists

Page 18: Journey to Open Data Science

Why Companies are Migrating to ODS…

Global CPG Manufacturer

Before

• Proprietary Technology
– Matlab, Custom Fortran
– Perl, SQL
• Problem
– Complex model and simulation required with disparate internal and external data

After

• Open Data Science Technology
– Anaconda – Repository, PyData, Fortran
• Results
– Integrated multiple data feeds
– Created full lifecycle predictive model and simulation for revenue assurance

Page 19: Journey to Open Data Science

Why Companies are Migrating to ODS…

Major Upstream Oil & Gas

Before

• Proprietary Technology
– Industry specific visualization
• Problem
– Unable to ingest Big Data from sensors to proactively monitor oil well holes

After

• Open Data Science Technology
– Streaming visualization with Bokeh
• Results
– Created novel visualizations and predictive models using sensor data
– Gained insights into oil hole issues in weeks not years to detect issues earlier and increase profitability

Page 20: Journey to Open Data Science

Python’s Not the Only One…

SQL

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 21: Journey to Open Data Science

But it’s also a Great Glue Language

SQL

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC
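
To make the "glue language" idea concrete, here is a minimal sketch that is not taken from the slides: Python loads rows into SQL, lets SQL do the filtering, and then hands the result to a statistics routine. It assumes only the Python standard library (sqlite3 and statistics); the table name, column names, function name, and sample rows are all hypothetical.

```python
import sqlite3
import statistics


def summarize_readings(rows):
    """Load (sensor, value) rows into an in-memory SQL table and
    return the mean value per sensor.

    Hypothetical helper for illustration; not part of Anaconda.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        # SQL handles the filtering and ordering.
        cursor = conn.execute(
            "SELECT sensor, value FROM readings "
            "WHERE value IS NOT NULL ORDER BY sensor"
        )
        grouped = {}
        for sensor, value in cursor:
            grouped.setdefault(sensor, []).append(value)
    finally:
        conn.close()
    # Python handles the statistics on the SQL result.
    return {sensor: statistics.mean(values) for sensor, values in grouped.items()}


# Hypothetical sample data: one missing reading for sensor "b".
SAMPLE_ROWS = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", None)]
summary = summarize_readings(SAMPLE_ROWS)
print(summary)
```

The same pattern extends to the other boxes on the slide: the SQL step could be swapped for a Spark job or a web API call while the surrounding Python stays the same.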

Page 22: Journey to Open Data Science

Anaconda is the Open Data Science Platform Bringing Technology Together…

Numba

dask

Airflow

SQL

xlwings

Blaze

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Page 23: Journey to Open Data Science

But Most Importantly Empowering Everyone on the Data Science Team

• Data Scientist
• Biz Analyst
• Data Engineer
• Developer
• DevOps

Deploy & Operate

Explore & Analyze

Collaborate & Publish

Page 24: Journey to Open Data Science

How are Modern Roles Different from Traditional Roles?

• Traditional Roles: Individual | Silo
• Modern Roles: Team | Collaborative

Page 25: Journey to Open Data Science

Modern Data Science Teams use…

Data Scientist
• Hadoop / Spark
• Programming Languages
• Analytic Libraries
• IDE
• Notebooks
• Visualization

Biz Analyst
• Spreadsheets
• Visualization
• Notebooks
• Analytic Development Environment

Data Engineer
• Database / Data Warehouse
• ETL

Developer
• Programming Languages
• Analytic Libraries
• IDE
• Notebooks
• Visualization

DevOps
• Database / Data Warehouse
• Middleware
• Programming Languages

RIGHT TECHNOLOGY FOR THE PROBLEM

Page 26: Journey to Open Data Science

Modern Data Science Teams Want

Collaboration
• Iterate on analysis
• Share discoveries with team
• Interact with teams across the globe

Interactivity
• Interact with data
• Build high performance models
• Visualize results in context

Integration
• Work with open source and legacy data systems
• Leverage data science languages: Python, R, Matlab, SAS, SPSS, Excel, Java, C/C++, C#, .NET, FORTRAN and more

Predict

Share

Deploy

with Open Data Science

Page 27: Journey to Open Data Science

Anaconda is… the leading Open Data Science platform powered by Python, the fastest growing open data science language

• Accelerate Time-to-Value

• Connect Data, Analytics & Compute

• Empower Data Science Teams

Page 28: Journey to Open Data Science

ACCELERATE Time-to-Value
• INNOVATE faster through managed agile experimentation
• MOVE from analysis to deployment immediately
• DELIVER high performance analytics processing

CONNECT Data, Analytics & Compute
• LEVERAGE innovative open source analytics to extract value from data
• MAXIMIZE your computational power to easily analyze all your data
• CONNECT and integrate all your data sources for predictive models

EMPOWER Data Science Teams
• ITERATE quickly to create powerful analysis and predictive models
• COLLABORATE and share with your data science team
• PUBLISH interactive results to the business

Page 29: Journey to Open Data Science

Introducing Anaconda

The Open Data Science Platform Powered by Python

Enterprise Ready Platform
– Simplify administration
– Use open data science
– Collaborate with entire team
– Leverage modern architectures
– Integrate data sources
– Accelerate performance

Platform diagram labels: OPERATIONS, DATA SCIENCE LANGUAGES, APPLICATIONS, DATA, HARDWARE, ANALYTICS, Model Building, Analytics Development, Data Exploration, SOFTWARE DEVELOPMENT, HIGH PERFORMANCE, Cloud, On-premises

Data Science Team: Business Analyst, Data Scientist, Developer, Data Engineer, DevOps

Page 30: Journey to Open Data Science

• BSD Licensed
• Support
• Indemnification
• Training
• Consulting

Page 31: Journey to Open Data Science

DEMOS


Page 32: Journey to Open Data Science

• Anaconda Enterprise Notebooks: A collaborative environment for Data Science teams

• Anaconda for Excel: Bringing Advanced Analytics and Interactive Visualizations to MS Excel

Page 33: Journey to Open Data Science

ANACONDA ENTERPRISE NOTEBOOKS

A COLLABORATIVE ENVIRONMENT FOR DATA SCIENCE TEAMS

Page 34: Journey to Open Data Science

Search projects by tag and collaborator

Manage contributors

Manage collaborative projects

Page 35: Journey to Open Data Science


Organize notebooks, scripts and other files in projects

Manage teams’ collaborators

Save favorite projects

Page 36: Journey to Open Data Science


Data lineage

Access to collaborative executable notebooks

Interactive Visualizations

Advanced notebook extensions

Page 37: Journey to Open Data Science

Use advanced notebook extensions for enhanced collaboration

• Publishing to Anaconda Repository integration
• Revision control, commit and notebook diff comparison
• Collaborative locking
• Advanced interactive presentations editor

Page 38: Journey to Open Data Science


Easily publish and share your results with Business Leaders and Analysts

Page 39: Journey to Open Data Science

Leverage revision control, commit and diff comparison in notebooks

• Notebook version tracking
• Notebook change diff comparison

Commit your work so you can return to earlier revisions and compare changes between them
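
The commit-and-diff idea can be illustrated with a small sketch. This is not how Anaconda Enterprise Notebooks implements it (the slides do not say). It assumes notebooks stored as nbformat-4-style JSON, where each cell carries a "source" field, and it uses the standard library's difflib. The function names and the two sample revisions are hypothetical.

```python
import difflib
import json


def notebook_source_lines(notebook_json):
    """Flatten the cell sources of an nbformat-4-style notebook into lines."""
    notebook = json.loads(notebook_json)
    lines = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # A marker line per cell keeps cell boundaries visible in the diff.
        lines.append(f"# cell {index} ({cell.get('cell_type', 'unknown')})")
        source = cell.get("source", [])
        # The "source" field may be one string or a list of strings.
        if isinstance(source, str):
            source = source.splitlines()
        lines.extend(line.rstrip("\n") for line in source)
    return lines


def diff_notebooks(old_json, new_json):
    """Return a unified diff between two notebook revisions."""
    return list(
        difflib.unified_diff(
            notebook_source_lines(old_json),
            notebook_source_lines(new_json),
            fromfile="revision-1",
            tofile="revision-2",
            lineterm="",
        )
    )


# Two hypothetical revisions of a one-cell notebook.
REVISION_1 = json.dumps(
    {"cells": [{"cell_type": "code", "source": ["x = 1\n", "print(x)\n"]}]}
)
REVISION_2 = json.dumps(
    {"cells": [{"cell_type": "code", "source": ["x = 2\n", "print(x)\n"]}]}
)
notebook_diff = diff_notebooks(REVISION_1, REVISION_2)
print("\n".join(notebook_diff))
```

Diffing only the cell sources, rather than the raw JSON, keeps outputs and metadata from cluttering the comparison; a real tool would likely need to handle those as well.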

Page 40: Journey to Open Data Science

Collaborate with notebook locking features

Page 41: Journey to Open Data Science

Transform a notebook into an Interactive Presentation with an advanced editor

Edit slide layout and content

Edit slide theme

Present your slides with embedded interactive visualizations

Page 42: Journey to Open Data Science

ANACONDA FOR EXCEL

BRINGING ADVANCED ANALYTICS AND INTERACTIVE VISUALIZATIONS TO MS EXCEL

Page 43: Journey to Open Data Science

Create browser-based Interactive Visualizations directly from your spreadsheet

Write your visualization directly into the formula

Access a powerful interactive toolbox

Enhance exploration with a customizable hover tool

Page 44: Journey to Open Data Science

Interactively explore your spreadsheet data with the crossfilter app

Select the variables to plot and the color, palette, and size of the points

Immediately view your updates in the visualization

Page 45: Journey to Open Data Science

Access advanced Machine Learning models to cluster your data

Simple formulas for advanced modeling applications

Easily input variables into algorithms with interactive widgets

Access a wide range of modeling algorithms
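
The slides do not name the clustering algorithm behind the spreadsheet formulas. As an illustration only, the sketch below uses k-means, a common choice, written with the Python standard library. The function name, the deterministic initialization from the first k points, and the sample points are assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points with a basic k-means loop.

    Illustrative sketch: centroids start at the first k points so the
    result is deterministic. Requires Python 3.8+ for math.dist.
    """
    centroids = [tuple(point) for point in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, assignments


# Hypothetical rows, as they might be read from two spreadsheet columns.
SAMPLE_POINTS = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0)]
centroids, assignments = kmeans(SAMPLE_POINTS, k=2)
print(centroids, assignments)
```

In a spreadsheet setting, the number of clusters would be the kind of variable the slide describes being set through an interactive widget.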

Page 46: Journey to Open Data Science

Enterprise Ready Open Data Science platform
– Interactive Visualization for Fast Exploration
– Data Science Team Collaboration
– Publishing & Sharing of Data Science Results
– Scale Up & Scale Out Advanced Analytics
– Governance, Provenance, & Security

Without the proprietary vendor cost & lock-in

Page 47: Journey to Open Data Science

ANACONDA GIVES SUPERPOWERS TO PEOPLE WHO CHANGE THE WORLD


Page 48: Journey to Open Data Science

Modern Data Science Teams Love Anaconda

Data Scientist
• Hadoop / Spark
• Programming Languages
• Analytic Libraries
• Notebooks
• Visualization
• IDE

Biz Analyst
• Spreadsheets
• Visualization
• Notebooks
• Analytic Development Environment

Data Engineer
• Database / Data Warehouse
• ETL

Developer
• Programming Languages
• Analytic Libraries
• IDE
• Notebooks
• Visualization

DevOps
• Database / Data Warehouse
• Middleware
• Programming Languages

Page 49: Journey to Open Data Science

Anaconda Trusted by Industry Leaders

• Financial Services: Risk Mgmt, Quant modeling, Data exploration and processing, algorithmic trading, compliance reporting
• Government: Fraud detection, data crawling, web & cyber data analytics, statistical modeling
• Healthcare & Life Sciences: Genomics data processing, cancer research, natural language processing for health data science
• High Tech: Customer behavior, recommendations, ad bidding, retargeting, social media analytics
• Retail & CPG: Engineering simulation, supply chain modeling, scientific analysis
• Oil & Gas: Pipeline monitoring, noise logging, seismic data processing, geophysics

Page 50: Journey to Open Data Science

Anaconda Subscriptions

Page 51: Journey to Open Data Science


Next Steps

• Open Data Science Journey Assessment [email protected] to schedule assessment

• Download Anaconda https://www.continuum.io/downloads

• Migrate your first model to ODS [email protected] to schedule a POC


Page 52: Journey to Open Data Science

Thank You

Michele Chambers
Twitter: @mcAnalytics

Christine Doig
Twitter: @ch_doig

Email: [email protected]
Twitter: @ContinuumIO

Page 53: Journey to Open Data Science

221 W. 6th Street
Suite #1550
Austin, TX 78701
+1 512.222.5440

[email protected]

@ContinuumIO

CONTINUUM ANALYTICS

We Empower Data Science Teams to Change the World