VISUALIZE. EXPLORE. TRANSFORM. ANACONDA MOSAIC

Lance Ransom, Anaconda Mosaic Product Manager
Christine Doig, Senior Data Scientist

© 2015 Continuum Analytics - Confidential & Proprietary

Uploaded by continuum-analytics on 10-Feb-2017



Agenda

• Anaconda Overview
• What is Anaconda Mosaic?
• Flat File Repositories
• Heterogeneous Data
• Interactive Data Visualization
• Summary
• Q&A

ANACONDA OVERVIEW


Anaconda is the leading Open Data Science platform, powered by Python, the fastest-growing data science language.

• Accelerate Time-to-Value
• Connect Data, Analytics & Compute
• Empower Data Science Teams


ACCELERATE Time-to-Value

• INNOVATE faster through managed agile experimentation
• MOVE from analysis to deployment immediately
• DELIVER high-performance analytics processing

CONNECT Data, Analytics & Compute

• LEVERAGE innovative open-source analytics to extract value from data
• MAXIMIZE your computational power to easily analyze all your data
• CONNECT and integrate all your data sources for predictive models

EMPOWER Data Science Teams

• ITERATE quickly to create powerful analysis and predictive models
• COLLABORATE and share with your data science team
• PUBLISH interactive results to the business


Data Science Team

Data Scientist • Biz Analyst • Data Engineer • Developer • DevOps

Deploy & Operate • Explore & Analyze • Collaborate & Publish

WHAT IS ANACONDA MOSAIC?


Anaconda Mosaic

• Create PORTABLE transformations
• Interactively EXPLORE heterogeneous data
• Easily ANALYZE large flat file repositories
• ELIMINATE data movement and redundant storage
• CATALOG datasets and transformations
• ESTABLISH data lineage


Built on Anaconda Open Source Technology

ANALYTICS: Bokeh, Blaze

DATA: DBs, Excel, flat file repositories


Loss of Data Visibility

Poor data visibility inhibits valuable insights

[Figure: data scattered across many SQL, CSV, REST and JSON sources]


Heterogeneous Environments

[Figure: Storage, ETL and Analysis layers spanning Oracle, MySQL, MSSQL, KDB, ZIP and CSV stores, REST services, and SQL, Python, DSL, R, Excel, C++ and Java tooling]


Ad-hoc Workflow

Find Data → Ad-hoc ETL → Copy/Store → Analysis → Report


Compounding Data Challenges

• Accumulates quickly
• Disparate storage, different vendors
• Format changes
• Ad-hoc usage
• Urgent!

FLAT FILE REPOSITORIES


Demo 1: Flat file repositories

• COMBINE individual CSV files
• ELIMINATE data movement and redundant storage
• INCORPORATE file names into the dataset
• OPTIMIZE computations


Flat file repositories

source: "lux://global-equities/data/daily/us/nasdaq stocks"
extractor: "{}/{Symbol}.{Region}.txt"

Date,Open,High,Low,Close,Volume,OpenInt,Symbol,Region
20151111,18.5,25.9,18,24.5,1584600,0,aaap,us
20151112,24.25,27.12,22.5,25,83000,0,aaap,us
20151113,25.47,26.2,24.55,25.26,67300,0,aaap,us
…
20160322,11.56,11.98,10.8894,11.09,517604,0,zyne,us
20160323,11.3,11.72,9.5,9.75,489743,0,zyne,us
20160324,9.5,10.24,9.22,9.64,188512,0,zyne,us

One dataset with ~5.5 million rows
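The extractor pattern turns parts of each file path into columns. Reading `{}` as a path segment that is matched and discarded, and `{Symbol}` and `{Region}` as fields kept as the `Symbol` and `Region` columns, a minimal standard-library Python sketch of that idea looks like this. It is not Mosaic's implementation or API: `extractor_to_regex`, `combine_files`, the in-memory `files` dictionary and the `1/` and `3/` subdirectory names are hypothetical, and the two data rows are taken from the sample above.

```python
import csv
import io
import re


def extractor_to_regex(extractor: str) -> re.Pattern:
    """Compile an extractor such as "{}/{Symbol}.{Region}.txt" into a regex.

    Assumed semantics for this sketch: "{}" matches one path segment and is
    discarded; "{Name}" becomes a named group kept as a column.
    """
    pattern = ""
    for part in re.split(r"(\{[^}]*\})", extractor):
        if part == "{}":
            pattern += r"[^/]+"
        elif part.startswith("{") and part.endswith("}"):
            pattern += f"(?P<{part[1:-1]}>[^/]+?)"
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


def combine_files(files: dict, extractor: str) -> list:
    """Concatenate CSV texts into one list of rows, adding the path fields as columns."""
    regex = extractor_to_regex(extractor)
    rows = []
    for path in sorted(files):
        match = regex.fullmatch(path)
        if match is None:
            continue  # skip files that do not fit the pattern
        for row in csv.DictReader(io.StringIO(files[path])):
            row.update(match.groupdict())
            rows.append(row)
    return rows


# Hypothetical in-memory stand-ins for two files under the source directory.
header = "Date,Open,High,Low,Close,Volume,OpenInt\n"
files = {
    "1/aaap.us.txt": header + "20151111,18.5,25.9,18,24.5,1584600,0\n",
    "3/zyne.us.txt": header + "20160324,9.5,10.24,9.22,9.64,188512,0\n",
}
combined = combine_files(files, "{}/{Symbol}.{Region}.txt")
print([(r["Date"], r["Symbol"], r["Region"]) for r in combined])
# [('20151111', 'aaap', 'us'), ('20160324', 'zyne', 'us')]
```

Because the path fields are appended after the CSV columns, each combined row ends with `Symbol` and `Region`, matching the column order shown in the sample.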


[Diagram: CSV files → Combine → Transform → Present results]

HETEROGENEOUS DATA


Demo 2: Heterogeneous Data

• COMPOSE expressions independent of the storage system
• PUSH transformations to the data
• Lazily EVALUATE
• FAMILIAR pandas-like API
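To make "compose, push to the data, evaluate lazily" concrete, here is a minimal Python sketch of a storage-independent expression. Building a `Query` only records the steps; nothing runs until a backend is chosen. The same expression is then either compiled to SQL, so the database does the filtering, or evaluated over in-memory rows. This is a toy illustration, not Blaze's or Mosaic's API: `Query`, `where`, `select`, `to_sql`, `run_sql` and `run_python` are hypothetical names, SQLite stands in for a real database, and the prices come from the flat file sample.

```python
import sqlite3
from dataclasses import dataclass

# Comparison operators this sketch supports, shared by both backends.
OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b, "=": lambda a, b: a == b}


@dataclass(frozen=True)
class Query:
    """A lazy expression: it only records steps and never touches data."""
    table: str
    filters: tuple = ()  # (column, op, value) triples, combined with AND
    columns: tuple = ()  # empty tuple means "all columns"

    def where(self, column, op, value):
        return Query(self.table, self.filters + ((column, op, value),), self.columns)

    def select(self, *columns):
        return Query(self.table, self.filters, columns)


def to_sql(q: Query) -> tuple:
    """Compile the expression to SQL so the database does the filtering."""
    sql = f"SELECT {', '.join(q.columns) or '*'} FROM {q.table}"
    clauses, params = [], []
    for column, op, value in q.filters:
        if op not in OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def run_sql(q: Query, conn: sqlite3.Connection) -> list:
    sql, params = to_sql(q)
    return conn.execute(sql, params).fetchall()


def run_python(q: Query, rows: list) -> list:
    """Evaluate the same expression over in-memory dict rows."""
    result = []
    for row in rows:
        if all(OPS[op](row[column], value) for column, op, value in q.filters):
            result.append(tuple(row[c] for c in (q.columns or tuple(row))))
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (Symbol TEXT, Close REAL)")
data = [("aaap", 24.5), ("aaap", 25.0), ("aaap", 25.26),
        ("zyne", 11.09), ("zyne", 9.75), ("zyne", 9.64)]
conn.executemany("INSERT INTO prices VALUES (?, ?)", data)
rows = [{"Symbol": s, "Close": c} for s, c in data]

expr = Query("prices").where("Close", ">", 20).select("Symbol", "Close")
print(to_sql(expr))  # ('SELECT Symbol, Close FROM prices WHERE Close > ?', [20])
print(sorted(run_sql(expr, conn)) == sorted(run_python(expr, rows)))  # True
```

Only values are passed as SQL parameters; column names and operators are assumed to come from trusted code, which is why `to_sql` rejects any operator outside `OPS`.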


Mosaic Ecosystem

Expression • Compute • Data

[Figure: the heterogeneous Storage, ETL and Analysis environment (Oracle, MySQL, MSSQL, KDB, ZIP, CSV, REST; SQL, Python, DSL, R, Excel, C++, Java), shown across several build steps]


[Diagram: SQL → Combine → Transform → Present results]


[Diagram: Excel → Combine → Transform → Present results]


INTERACTIVE DATA VISUALIZATION


Demo 3: Interactive Data Visualization

• EXPLORE datasets visually
• Easily CHANGE plot types
• MOVE around and zoom in or out
• PLOT large datasets with Datashader

Learn more about datashader: http://go.continuum.io/datashader/
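Datashader copes with large datasets by aggregating points into a fixed-size grid, one cell per output pixel, before anything is drawn, so what gets rendered depends on the image size rather than on the number of points. A minimal pure-Python sketch of that aggregate-then-shade idea follows; `aggregate_counts` and `shade` are hypothetical helpers, not datashader's API, and the log scaling is just one possible way to map counts to intensities.

```python
import math


def aggregate_counts(xs, ys, width, height, x_range, y_range):
    """Bin points into a height x width grid of counts (one cell per pixel)."""
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # points outside the current view are skipped
        # min() keeps points lying exactly on the upper edge in the last cell
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


def shade(grid):
    """Map counts to 0-255 intensities on a log scale so sparse cells stay visible."""
    peak = max(max(r) for r in grid)
    if peak == 0:
        return [[0] * len(r) for r in grid]
    return [[round(255 * math.log1p(c) / math.log1p(peak)) for c in r] for r in grid]


xs = [0.1, 0.2, 0.9, 0.95, 0.96]
ys = [0.1, 0.15, 0.9, 0.92, 0.97]
grid = aggregate_counts(xs, ys, width=2, height=2, x_range=(0, 1), y_range=(0, 1))
print(grid)         # [[2, 0], [0, 3]]
print(shade(grid))  # [[202, 0], [0, 255]]
```

Zooming or panning amounts to calling `aggregate_counts` again with a new `x_range` and `y_range`; the output grid stays the same size however many points fall inside the view.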

SUMMARY


Why Anaconda Mosaic?

EMPOWER Data Science Teams

ACCELERATE Time-to-Value

CONNECT Data, Analytics & Compute


Anaconda Mosaic is available with an Anaconda Enterprise subscription: https://www.continuum.io/anaconda-subscriptions


Next steps

• Demo: Request a private demo for your team ([email protected])
• Documentation: Review the Anaconda Mosaic documentation (docs.continuum.io/anaconda/mosaic/)
• Learn more: Learn more about all the features in Anaconda Enterprise subscriptions (www.continuum.io/anaconda-subscriptions)

Q&A