making r enterprise gradefiles.meetup.com/13745382/session 2 - r server in the enterprise.pdf ·...

41
Making R Enterprise Grade with SQL Server 2016

Upload: others

Post on 29-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Making R Enterprise Gradewith SQL Server 2016

Page 2: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

What is R?

Language

Platform

Community

Ecosystem

• A programming language for statistics, analytics, and data science

• A data visualization framework

• Provided as Open Source

• Used by 3M+ data scientists, statisticians and analysts

• Taught in most university statistics programs

• Active and thriving user groups across the world

• CRAN: 8000+ freely available algorithms, test data and evaluation

• Many of these are applicable to big data if scaled

• New and recent graduates prefer it

Page 3: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

History of R

20152009200420032000199719951993

Research Project in

New Zealand

Open Source Project

R-Core Group

R-1.0.0 released

R Foundation

First user

group

New York Times article

R-3.2.0 and R Consortium (founded by Microsoft)

Page 4: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

R’s popularity continues to outpace alternatives

IEEE Spectrum

Page 5: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

A Vast Community of R Users Share Rich Repositories of Pre-Built Solutions

Standing on the Shoulders of Giants

Page 6: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

R is a great tool, but the key question for the enterprise is:

how do we operationalize R?

Page 7: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

?

Some key operationalisation questions

How can we

embed R-based

analytics into

business

applications like

CRM?

How can we

build R-based

analytics into BI

dashboards and

deploy quickly?

How can we

maintain data

security for R

users?

How do we

allow data

scientists to

share code and

version control?

Page 8: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

SQL Server 2016: Everything to operationalize R

The above graphics were published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft. Gartner does not endorse any vendor, product or service depicted in its research

publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties,

expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Consistent experience from on-premises to cloud

In-memory across all workloads

built-in built-inbuilt-in built-inbuilt-in

TPC-H non-clustered results as of 04/06/15, 5/04/15, 4/15/14 and 11/25/13, respectively. http://www.tpc.org/tpch/results/tpch_perf_results.asp?resulttype=noncluster

At massive scale

0 14

0 03

34

29

22

15

5

22

6

43

20

69

18

49

3

0

10

20

30

40

50

60

70

80

2010 2011 2012 2013 2014 2015

SQL Server Oracle MySQL SAP HANA TPC-H non-clustered 10TB

Oracle is #5#2

SQL Server

#1

SQL Server

#3

SQL Server

National Institute of Standards and Technology Comprehensive Vulnerability Database update 2/2016

Page 9: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

In-database Advanced AnalyticsBuild intelligent applications with SQL Server R Services

Open Source R with in-memory & massive scale - multi-threading and massive parallel processing

Ad

vance

d A

nalytics

NEW

NEW

R built-in to SQL Server

Mission critical OLTP

R built-in to your T-SQL

Real-time operational analyticswithout moving the data

NEW

Page 10: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Most secure database

Least vulnerable 6 yearsLayers of protection

0 2

118

0 14

0 03

69

53 53 55

3429

22

15

5

22

5

2825

29

21

6

13

30

6

16 16

9 8 6

43

20

69

18

49

3

0

10

20

30

40

50

60

70

80

2006 2007 2008 2009 2010 2011 2012 2013 2014 2015

SQL Server Oracle DB2 MySQL SAP HANA

* National Institute of Standards and Technology Comprehensive Vulnerability Database update 10/2015

in a row & most utilized

Protect data

• Always Encrypted

• Transparent data encryption

Monitor activity

• Advanced Threat Analytics

• SQL Server auditing

Control access

• Windows Authentication

• Row-level security

• Dynamic data masking

NEW

NEW

NEW

NEW

Mo

st secu

re d

ata

base

Page 11: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Dramatically simplify HA & DRoperational efficiencies with hybrid cloud

Enhanced AlwaysOn

Missio

n critica

l OLT

P

NEW

Page 12: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Highest performing DW

#1 TPC-H10 TB TPC-H world record*

@ ½ hardware costsUnparalleled scalability

with Windows Server 2016

12TB of memory WS 2016 max cores#1 SAP performance

2-tier 16-processor SAP benchmark

on Superdome X + SQL Server

Hig

hest p

erfo

rmin

g D

W

* As of 4/6/2015 (http://www.tpc.org/3312)

24X TPS boostIn throughput on the

same hardware

Page 13: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Lightning fast queries & reports

End-to-end mobile BI on any device

Reports in minutes not days

PB scale DW in SQL Server

On all mobile devices

built-in built-in

Windows iOS Android HTML5

End

-to-e

nd

mo

bile

BI NEW

NEW

NEW

Page 14: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

The Data Science Process withSQL Server 2016

Page 15: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Data scientists transform data into intelligent actionwith SQL Server 2016

Deployment

Dashboards &

Visualizations

Information

Management

Data Stores Machine Learning

and Analytics

Hadoop

HDFS

Data Intelligence Action

People

Automated Systems

Apps

Web

Mobile

Bots

SQL R Stored

ProcedureSQL Server

2016

Data Catalog

SQL Server

Integration

Services

Microsoft R

Client

Power BI

Data

Sources

Apps

Sensors

and

devices

Data

Azure Blob

Store

PolyBase

R Server for

Hadoop

Excel

Reporting

Services

SQL Server

Data Mining

Tools

DeployR

(web-services)

SQL Server

Data Tools

Page 16: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

1. Business and

Technology

Planning

2. Plan and Prepare

Data

3. Ingest Data into

Data Platform

4. Explore and

Visualize Data

5. Generate and

Select Features

6. Train/Retrain

Models

7. Deploy

models/analytics

8. Build dashboards

and/or Apps for

consumption

Data Science Process with SQL Server

20163. Ingest Data into

Data Platform

4. Explore and Visualize Data

5. Generate and Select Features

6. Train Models/optimize

analytics

7. Deploy models/analytics

8. Build dashboards and/or Apps for

consumption

• SQL Integration Services• R Stored Procedure• Data Tools• Import/Export wizards

for ad-hoc data• Directly via R• Polybase to Hadoop

• Microsoft R Client• RTVS or RStudio• Remote from any IDE onto Server• SQL Server Data Mining tools for

non-programmers

• DeployR (one-click)

• Stored Procedures (one-click)

• Azure Machine Learning (Cloud)

• Reporting

Services

• Power BI

• Excel

• Bots

• PowerApps

• Azure

LogicApps

Page 17: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Productivity Tools: Visual Studionow including R

• One-click to deploy an R-based SQL stored procedure in RTVS

• For RStudio, we have a package to allow the deployment in a line of R code

• R with Team Services for source code control management

Page 18: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Productivity Tools: Data Ingestfrom ad-hoc to production

ad-hoc production

Page 19: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Productivity Tools: Microsoft R Client

• Works with RTVS or RStudio

• Speeds up computation locally thanks to MKL

• ScaleR package built-in

• Reproducible R Toolkit built-in

• Work with files on your local file system

• Work with ODBC data from SQL

• Comes with DeployR remote package:

• Remote into SQL Server and do R on the Server

• Switch between server and local environments

• Send R jobs to a server (asynchronously)

Page 20: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Productivity Tools: DashboardingPowerBI

• Create Dashboards in PBI Desktop (free)

• Deploy in the Cloud

• Deploy on-premvia SSRS

• Consume R-based SQL stored procedures

• Caveat: As yet, no pagination

Page 21: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Productivity Tools: DashboardingSQL Server Reporting Services

• On-Premise

• Option to deploy in the cloud via PBI

• Paginated reports consuming R-based SQL Stored Procedures

• Custom branding

• Easy drag-and-drop report authoring via Visual Studio or Report Builder

• Auto-parameter generation

Page 22: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Productivity Tools: Excel Integration

• Allow Excel users to consume your R-based Stored Procedures

• Excel users can consume DeployR Web Services

• No R programming knowledge required for the end user

• Analytics self-service across the organisation

Page 23: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Productivity Tools: Build Intelligent AppsMicrosoft PowerApps & SQL Server R Stored Procedures/DeployR Web-Services

Page 24: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Hadoop with SQL Server and R

Page 25: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Hadoop Data Pipeline

ORC, Parquet, Avro, JSON, Text

Model

Prepare

Store

Deploy

Big Data (TB)

Small Data (MB)

Large Data (GB)

PolyBase

sparklyr

sparklyr

SQL Server 2016

Consume

Page 26: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

R in Hadoop: Options

Edge Node

HeadNode

Head Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Edge Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data Node

Data NodeDeployR

Remote

Data Science Workstation

sparklyr

Page 27: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Provides a scalable, T-SQL-compatible query processing framework for combining data from both universes

Access any data

R in Hadoop: Polybase built-in

Page 28: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

PolyBase View

Execute T-SQL queries against

relational data in SQL Server and

semi-structured data in Hadoop or

Azure Blob Storage

Leverage existing T-SQL skills and BI

tools to gain insights from different

data stores

Access any data

PolyBase View in SQL Server 2016

Page 29: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

R View

An R Stored procedure has an input SQL query that executes T-SQL queries against relational data in SQL Server and semi-structured data in Hadoop or Azure Blob Storage

The result set of that query is then processed by R to produce an analytics result set, model and/or R graphics – this can be done at Scale with ScaleR

The stored procedure can be set to run on a schedule and/or event trigger.

Another mechanism to operationalize R+Hadoop

Access any data

PolyBase and R

Page 30: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Access any data

PolyBase use cases

Page 31: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

High Level Architecture

Data Scientists Workstations

Analytics Environment

PolyBase

Hadoop Cluster

From RStudio/RTVS remote to SQL and/or Hadoop Edge using DeployR R package

Continue to use local (workstation) for ad-hoc work

Once process has been tested in analytics environment, we promote to operational environment.

Operational

Page 32: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Summary

• SQL Server 2016 contains all the components to operationalize data science workloads

• With SQL Stored procedures we can leverage other Microsoft products e.g. PowerBI, PowerApps, etc

• Visual Studio now has R built-in

• SQL Server comes with many tools to aid productivity – e.g. import wizards, SSIS, SSRS, etc.

• Flexibility of compute:

• SQL Server

• Hadoop

• Workstation

Page 33: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Appendix

Page 34: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Revolution R Enterprise and SQL

Big data analytics platform Based on open source R

High-performance, scalable, full-featuredStatistical and machine-learning algorithms are performant, scalable, and distributable

Write once, deploy anywhereScripts and models can be executed on a variety of platforms, including non-Microsoft (Hadoop,

Teradata in-DB)

Integration with the R EcosystemAnalytic algorithms accessed via R function with similar syntax for R users (with arbitrary R functions/packages)

Advanced analytics

Data sourceintegration

Parallel external-memoryalgorithm library

Data

Compute context integration

Reso

urce

s

RequestsR scripts +

CRANalgorithms

Page 35: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

SQL Server 2016 R integration scenario

ExplorationUse Revolution R Enterprise (RRE) from R integrated development environment (IDE) to analyze large data sets and build predictive and embedded models with compute on SQL Server machine (SQL Server compute context)

OperationalizationDeveloper can operationalize R script/model over SQL Server data by using T-SQL constructs

DBA can manage resources, plus secure and govern R runtime execution in SQL Server

Advanced analytics

Page 36: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Setup and configuration

SQL Server 2016 setup

Install Advanced Analytics with R feature in setup

Install RRO on SQL Server 2016

machine

Install RRE on SQL Server 2016

machine

Optional: Install R packages on SQL

Server 2016 machine

Database configuration

Enable R language extension in

database

Configure path for RRO runtime in

database

Grant EXECUTE EXTERNAL SCRIPT

permission to users

CREATE EXTERNAL EXTENSION [R]

USING SYSTEM LAUNCHER

WITH (RUNTIME_PATH =

'c:\revolution\bin‘)

GRANT EXECUTE SCRIPT ON

EXTERNAL EXTENSION::R

TO DataScientistsRole;

/* User-defined role /

users */

Advanced analytics

Page 37: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Management and monitoring

R runtime usage

Resource governance via resource pool

Monitoring via DMVs

Troubleshooting via XEvents/

DMVs

CREATE RESOURCE POOL R_runtime

FOR EXTERNAL EXTENSION

WITH MAX_CPU_PERCENT = 20,

MAX_MEMORY_PERCENT = 10;

select * from

sys.dm_resource_governor_resouce_po

ols

where name = 'R_runtime';

Advanced analytics

Page 38: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

R script usage from SQL Server

Original R script:

IrisPredict <- function(data, model){library(e1071)predicted_species <- predict(model, data)return(predicted_species)

}

library(RODBC)conn <- odbcConnect("MySqlAzure", uid = myUser, pwd = myPassword);Iris_data <-sqlFetch(conn, "Iris_Data");Iris_model <-sqlQuery(conn, "select model from my_iris_model");IrisPredict (Iris_data, model);

Calling R script from SQL Server:

/* Input table schema */create table Iris_Data (name varchar(100), length int, width int);/* Model table schema */create table my_iris_model (model varbinary(max));

declare @iris_model varbinary(max) = (select model from my_iris_model);exec sp_execute_external_script

@language = 'R', @script = 'IrisPredict <- function(data, model){library(e1071)predicted_species <- predict(model, data)return(predicted_species)

}IrisPredict(input_data_1, model);', @parallel = default, @input_data_1 = N'select * from Iris_Data', @params = N'@model varbinary(max)', @model = @iris_modelwith result sets ((name varchar(100), length int, width int, species varchar(30)));

Values highlighted in yellow are SQL queries embedded in the original R script

Values highlighted in aqua are R variables that bind to SQL variables by name

Advanced analytics

Page 39: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

CapabilityExtensible in-database analytics, integrated with R,

exposed through T-SQL

Centralized enterprise library for analytic models

Benefits

SQL Server

Analytical engines

Integrate with R

Become fully extensible

Data management layer

Relational data

Use T-SQL interface

Stream data in-memory

Analytics library

Share and collaborate

Manage and deploy

R +

Data Scientists

Business

Analysts

Publish algorithms, interact

directly with data

Analyze through T-SQL,

tools, and vetted algorithms

DBAsManage storage and

analytics together

Summary: R integration and advanced analytics

Advanced analytics

Page 40: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Summary: Analyze your data

Real-time operational analyticsRun analytical workloads concurrently with OLTP workloads through columnstore

indexes

Analysis Services enhancementsMany-to-many support with bi-directional cross filtering, improved performance, new

DAX functions and tooling

Advanced Analytics with RIntegrated R Services, predictive analytics, big data analytics

Page 41: Making R Enterprise Gradefiles.meetup.com/13745382/Session 2 - R Server in the Enterprise.pdf · SQL Server 2016: Everything to operationalize R The above graphics were published

Summary: Deeper insights across data

Store your dataPolyBase • Traditional and Modern Data Warehouse • JSON, FILESTREAM, RBS

Make it accessibleEnhanced SSIS • Enhanced MDS

Analyze dataReal-time operational analytics • Enhanced SSAS • Advanced Analytics with R

Share insightsMobile Report Publisher • Modern Browser Support (SSRS) • Power BI Connector