Journey to SAS Analytics Grid with SAS, R, Python

Benjamin Zenick, Chief Operating Officer - Zencos

Sumit Sarkar, Chief Data Evangelist - Progress DataDirect

Post on 22-Jan-2018

TRANSCRIPT

Page 1: Journey to SAS Analytics Grid with SAS, R, Python

Journey to SAS Analytics Grid with SAS, R, Python

Benjamin Zenick, Chief Operating Officer - Zencos

Sumit Sarkar, Chief Data Evangelist - Progress DataDirect

Page 2: Journey to SAS Analytics Grid with SAS, R, Python

© 2016 Progress Software Corporation and/or its subsidiaries or affiliates. All rights reserved.

Audio Bridge Options & Question Submission

Page 4: Journey to SAS Analytics Grid with SAS, R, Python

Agenda

Differences between traditional and Grid deployments for SAS

Best practices and lessons learned in deploying an Analytics Grid

How to deliver an open analytics strategy for SAS, R, Python and others

Popular data sources for advanced analytics

Page 5: Journey to SAS Analytics Grid with SAS, R, Python

POLL

WHERE ARE YOU IN YOUR ANALYTICS JOURNEY?

DESKTOP ANALYTICS

CLIENT/SERVER ANALYTICS

GRID ANALYTICS

CLOUD ANALYTICS

OTHER

Page 6: Journey to SAS Analytics Grid with SAS, R, Python

Differences between traditional and Grid deployments for SAS

Page 7: Journey to SAS Analytics Grid with SAS, R, Python

The Evolution of Analytics

Businesses started with large and expensive central mainframes

– Mainframes were limited by early storage and processing technology

– Connectivity and user interfaces to data were limited by “dumb” terminals

– Expansion was limited by proprietary chassis design

– Connecting multiple mainframes was expensive, challenging, or impossible

Page 8: Journey to SAS Analytics Grid with SAS, R, Python

Analytics Today

• Modernization moved away from Mainframes

• Moved toward server / client solutions, workstations, storage appliances, and networking

• Shortcoming of centralized datacenters: Administrative and Performance Bottlenecks

Page 9: Journey to SAS Analytics Grid with SAS, R, Python

Example of Traditional Deployment

Page 10: Journey to SAS Analytics Grid with SAS, R, Python

What benefits do grid deployments provide?

• Standardization supporting multiple ecosystems

• Streamline Administrative support

• Better tools for analytics and administration

• Centralizing and improving management

• Size & Scalability

Page 11: Journey to SAS Analytics Grid with SAS, R, Python

Example of Grid Deployment

Page 12: Journey to SAS Analytics Grid with SAS, R, Python

Signs your organization is ready to consider an HPC or Grid solution…

• Decrease in cost benefits

• Current model doesn’t scale well

• Massively Parallelized Processing

• Administrative needs continue to grow and grow

• High(er) Availability is possible

• Faster (Disaster) Recovery

Page 13: Journey to SAS Analytics Grid with SAS, R, Python

Top Considerations for “Modernization”

• Why?

• Who?

• What?

• Where?

• When?

Page 14: Journey to SAS Analytics Grid with SAS, R, Python

Best practices and lessons learned in deploying an Analytics Grid

Page 15: Journey to SAS Analytics Grid with SAS, R, Python

Best Practices

• Preparation

• Technologies

• Plan

• Time

• Expectations

• Team

• Transition

• Users

• Support

• Goal Alignment

Page 16: Journey to SAS Analytics Grid with SAS, R, Python

Lessons Learned

• Invest in a meaningful assessment

• Plan to purchase and build Test and Disaster Recovery environments

• Understand the applications and use cases

• Outline support model for legacy projects

• Consider your post-implementation needs

• Expect the unexpected

Page 17: Journey to SAS Analytics Grid with SAS, R, Python

How to deliver an open analytics strategy for SAS, R, Python and others

Page 18: Journey to SAS Analytics Grid with SAS, R, Python

POLL

WHICH LANGUAGE(S) ARE COMMONLY USED IN YOUR ORGANIZATION?

SAS

Python

R

SPSS

OTHER

Page 19: Journey to SAS Analytics Grid with SAS, R, Python

SAS and Open Analytics across …

SAS Viya

SAS Grid Manager

SAS (open data access and grid management for native language support)

Page 20: Journey to SAS Analytics Grid with SAS, R, Python

SAS Grid Manager

Image from SAS webinar: https://www.evensi.us/webinar-taking-r-and-python-from-good-to-great-with-sas-/204358443

Page 21: Journey to SAS Analytics Grid with SAS, R, Python

SAS with Open Data Access (ODBC)

Access external data in supported access modes through data-source-specific SAS/Access interfaces.

Leverage the generic SAS/Access interface to ODBC with an open ODBC driver for direct access from Python and R.

Page 22: Journey to SAS Analytics Grid with SAS, R, Python

SAS and Open Analytics | SAS Grid (Open Data Access via ODBC)

[Diagram labels: Analytics Grid, Open Grid Manager, Open Data Access, Controller, Workers, ODBC, data sources (RDBMS, Big Data, NoSQL, Cloud)]

Access data sources over TCP or HTTPS
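
The Controller and Workers labels in the diagram suggest a dispatch pattern that can be sketched in a few lines. This is only an analogy under stated assumptions: a local thread pool stands in for grid nodes, and run_on_grid and make_job are hypothetical names. It is not how SAS Grid Manager schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


def make_job(country):
    """Build a hypothetical unit of work; a real job would query and score data."""
    def job():
        return f"scored {country}"
    return job


def run_on_grid(jobs, worker_count=4):
    """Controller role: hand each job to a free worker, return results in job order."""
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        return list(pool.map(lambda job: job(), jobs))


if __name__ == "__main__":
    jobs = [make_job(c) for c in ("China", "Japan", "Italy")]
    print(run_on_grid(jobs, worker_count=2))
```

Because each job is independent, adding workers raises throughput without changing the job code, which is the scaling property the grid slides describe.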

Page 23: Journey to SAS Analytics Grid with SAS, R, Python

R ODBC Example

library(RODBC)

# Make a connection using your DSN name
conn <- odbcConnect("Spark Next")

# Execute a SQL Tables call
sqlTables(conn)

# Execute a SQL columns call on the table with our energy data
sqlColumns(conn, "energyconsumption")

# Bind the results of a SQL query for plotting
data <- sqlQuery(conn, "SELECT * FROM energyconsumption WHERE country IN ('China', 'United States', 'Canada', 'France', 'Germany', 'Italy', 'Japan')")

# Attach the data for plotting access
attach(data)

Page 24: Journey to SAS Analytics Grid with SAS, R, Python

Python ODBC Example

import pyodbc
import getpass
import sys

def show_odbc():
    # List the ODBC data sources (DSNs) configured on this machine
    sources = pyodbc.dataSources()
    dsns = list(sources.keys())
    sl = []
    for i, dsn in enumerate(dsns, start=1):
        sl.append('%d. %s' % (i, dsn))
    print('\n'.join(sl))
    return dsns

def listTables(cursor):
    # Print the name of every table visible through the connection
    for row in cursor.tables():
        print(row.table_name)

def executeSelectQuery(cursor, cnxn):
    # Prompt for a SELECT statement and run it
    query = input('Enter the SELECT Query:')
    cursor.execute(query)
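
When the filter values come from a user, a parameterized query is usually safer than pasting text into the SQL. The sketch below rebuilds the energyconsumption country filter from the R example with one "?" marker per value, which is the marker style pyodbc accepts. It is a sketch under assumptions: sqlite3 stands in for a pyodbc connection so the example runs without a driver, and the year and consumption columns and all sample rows are made up.

```python
import sqlite3


def select_by_country(cursor, countries):
    """Run the country filter using one '?' marker per requested country."""
    countries = list(countries)
    if not countries:
        return []  # "IN ()" is not valid SQL, so skip the query entirely
    markers = ", ".join("?" for _ in countries)
    sql = f"SELECT * FROM energyconsumption WHERE country IN ({markers})"
    cursor.execute(sql, countries)
    return cursor.fetchall()


def demo():
    """Build a throwaway in-memory table and query it (stand-in for a real DSN)."""
    cnxn = sqlite3.connect(":memory:")
    cursor = cnxn.cursor()
    # Hypothetical schema and rows, for illustration only
    cursor.execute(
        "CREATE TABLE energyconsumption (country TEXT, year INTEGER, consumption REAL)"
    )
    cursor.executemany(
        "INSERT INTO energyconsumption VALUES (?, ?, ?)",
        [("China", 2010, 1.0), ("Brazil", 2010, 2.0), ("Japan", 2010, 3.0)],
    )
    rows = select_by_country(cursor, ["China", "Japan"])
    cnxn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

With a real driver, the cursor would come from pyodbc.connect(...).cursor() instead of sqlite3, and select_by_country would be called unchanged.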

Page 25: Journey to SAS Analytics Grid with SAS, R, Python

DataDirect ODBC is engineered for GRID and Cloud

Deliver advanced functionality over OSS to become SAS OEM Partner

Run 85+ million QA tests on our suite of connectors

Performance labs measure throughput and resource utilization (CPU and memory)

Focus on security features for customers to achieve regulatory compliance

Page 26: Journey to SAS Analytics Grid with SAS, R, Python

Popular data sources for advanced analytics

Page 27: Journey to SAS Analytics Grid with SAS, R, Python

Popular Relational/Analytics Data Sources

SQL Server 18.70%

Oracle 12.89%

MySQL 12.77%

Progress OpenEdge 7.93%

PostgreSQL 5.65%

Microsoft SQL Azure 5.27%

IBM DB2 4.76%

SQLite 3.68%

Teradata 2.61%

SAP HANA 2.30%

MariaDB 2.25%

Sybase ASE 1.92%

Amazon Redshift 1.79%

Informix 1.64%

Sybase IQ 1.30%

Netezza 1.25%

Other (please specify): 1.13%

Amazon Aurora 1.00%

Not sure 0.97%

Pivotal Greenplum 0.87%

Google BigQuery 0.77%

Vertica 0.61%

Page 28: Journey to SAS Analytics Grid with SAS, R, Python

Popular Big Data Sources

Hadoop Hive 18.53%

Spark SQL 8.17%

Hortonworks 7.97%

Cloudera CDH 7.87%

Cloudera Impala 7.47%

Apache Solr 7.37%

Oracle BDA 6.67%

Amazon EMR 5.98%

Apache Sqoop 5.48%

MapR 5.38%

IBM BigInsights 4.68%

Apache Storm 4.08%

Apache Drill 2.39%

Apache Phoenix 2.39%

SAP Altiscale 2.19%

Pivotal HD 1.89%

Presto 0.80%

GemFireXD 0.70%

Page 29: Journey to SAS Analytics Grid with SAS, R, Python

Popular NoSQL Sources

MongoDB 35.60%

Cassandra 14.57%

HBase 10.34%

Oracle NoSQL 9.01%

Redis 8.45%

Other (please specify): 6.01%

Couchbase 5.78%

DynamoDB 2.78%

DataStax Enterprise 2.22%

SimpleDB 2.22%

MarkLogic 1.67%

Aerospike 0.78%

Riak 0.56%

Page 30: Journey to SAS Analytics Grid with SAS, R, Python

What about SaaS?

Data Source | API

Eloqua | Web Services API (REST/SOAP); Bulk and non-Bulk APIs; no query language

Oracle Service Cloud | Web Services APIs (REST/SOAP); ROQL

Google Analytics | Hypercube (query limits of 10 metrics grouped by max of 7 dimensions)

Veeva CRM | SOAP, BULK, Metadata APIs; SOQL
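
Limits like the Google Analytics one above are easy to trip over when queries are generated by a tool. A minimal sketch of a pre-flight check follows, using only the two limits quoted on this slide (10 metrics, 7 dimensions). The function name check_hypercube_query and the metric and dimension names are hypothetical.

```python
MAX_METRICS = 10     # limit quoted on the slide
MAX_DIMENSIONS = 7   # limit quoted on the slide


def check_hypercube_query(metrics, dimensions):
    """Return a list of problems; an empty list means the query fits the limits."""
    problems = []
    if len(metrics) > MAX_METRICS:
        problems.append(
            f"{len(metrics)} metrics requested, limit is {MAX_METRICS}"
        )
    if len(dimensions) > MAX_DIMENSIONS:
        problems.append(
            f"{len(dimensions)} dimensions requested, limit is {MAX_DIMENSIONS}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical metric and dimension names
    print(check_hypercube_query(["sessions", "users"], ["country", "date"]))
```

A query that exceeds a limit would need to be split into several smaller requests and joined afterwards, which is one reason a SQL layer over such APIs can help.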

Page 31: Journey to SAS Analytics Grid with SAS, R, Python

Supported ODBC Data Sources for SAS/Access

Apache Hadoop Hive 0.8.0 and higher

Amazon EMR 2.1.4 and higher

Amazon Redshift

Apache Spark SQL 1.2, 1.3, 1.4, 1.5

Cloudera CDH update 4 and higher

Cloudera Impala 1.0, 1.1, 1.2, 1.3, 1.4

Cloudera Impala 2.0, 2.1, 2.2

Hortonworks 1.3 and higher

IBM BigInsights 3.0 and higher

MapR 1.2 and higher

Pivotal HD 2.0.1 and higher

DB2 V9.1, V9.5, V9.7, 9.8 for Linux, UNIX, Windows

DB2 V8.x for LUW

DB2 11 for z/OS*

DB2 V10 for z/OS

DB2 V9.1 for z/OS

DB2 UDB V8.1 for z/OS

DB2 I 7.1, 7.2* (DB2 UDB V7R1, V7R2 for iSeries)

DB2 I 6.1 (DB2 UDB V6R1 for iSeries)

DB2 for I 5/OS (DB2 UDB V5R4 for iSeries)

Eloqua (Oracle Marketing Cloud)

Financial Force

Google Analytics

Greenplum 4, 4.1, 4.2, 4.3

Greenplum 3.3

Hubspot

Informix Dynamic Server 12.1*

Informix Dynamic Server 11.0, 11.5, 11.7

Informix Dynamic Server 10.0

Informix Dynamic Server 9.2, 9.3, 9.4

Marketo

Microsoft Dynamics CRM 2011 Rollup 16, 2013, 2015

Microsoft SQL Server 2014*

Microsoft SQL Server 2012

Microsoft SQL Server 2008 R1, R2

Microsoft SQL Server 2005

Microsoft SQL Server 2000 Desktop Engine (MSDE 2000)

Microsoft SQL Server 2000

Microsoft SQL Azure*

MongoDB 3.0

MongoDB 2.2, 2.4, 2.6

MySQL Enterprise Edition 5.0, 5.1, 5.5, 5.6*

Oracle 12c R1 (12.1)*

Oracle 11g R1, R2 (11.1, 11.2)

Oracle 10g R1, R2 (10.1, 10.2)

Oracle 9i R1, R2 (9.0.1, 9.2)

Oracle 8i R3 (8.1.7)

Oracle Service Cloud

Oracle Sales Cloud

Pivotal HAWQ 1.1*, 1.2*

PostgreSQL 9.0, 9.1, 9.2, 9.3, 9.4*

PostgreSQL 8.2, 8.3, 8.4

Progress OpenEdge 11.0, 11.1*, 11.2*, 11.3*, 11.4*

Progress OpenEdge 10.1.x, 10.2.x

Progress Rollbase 2.0 and higher*

REST API (via OpenAccess)

SAP Adaptive Server Enterprise 16.0*

ServiceMax

SugarCRM 7.1.6 and higher*

Sybase Adaptive Server Enterprise 15.0, 15.5, 15.7

Sybase Adaptive Server Enterprise 12.0, 12.5, 12.5.x

Sybase Adaptive Server Enterprise 11.9

Sybase IQ 16.0*

Sybase IQ 15.0, 15.1, 15.2, 15.3, 15.4

Veeva CRM

Blue text indicates cloud hosted

Blue text* indicates cloud hosted with on-premises option

Page 32: Journey to SAS Analytics Grid with SAS, R, Python

NEW cross data center access for SAS/Access interface to ODBC (over https)

SAS/Access interface to ODBC

Page 33: Journey to SAS Analytics Grid with SAS, R, Python

Learn More about Data Access for SAS Analytics

What DataDirect Does for SAS Shops

“Taking R and Python from good to great with SAS” [Webinar hosted by SAS in April 17]

Zencos Consulting Blog

Tech Articles on configuring SAS with ODBC:

• SAS/Access 9.4 interface to ODBC Tutorial across popular data sources such as SQL Server, Salesforce and Amazon Redshift

• SAS/Access 9.4 interface to ODBC Tutorial across cloud data sources such as Marketo and Eloqua

Page 34: Journey to SAS Analytics Grid with SAS, R, Python

Wrap Up with Q&A

Slides and recording will be made available to each attendee

Visit www.datadirect.com to learn more about ODBC drivers engineered for analytics

Please enter your questions in the chat...
