Multi User Data Science with Zeppelin

Vinay Shukla, Twitter: @neomythos, Feb 17th, 2016

Upload: vinay-shukla

Post on 16-Apr-2017

TRANSCRIPT

Page 1: Multi User Data science with Zeppelin

Vinay Shukla
Twitter: @neomythos
Feb 17th, 2016

Multi User Data Science with Zeppelin

Page 2: Multi User Data science with Zeppelin

Page 2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.

Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Page 3: Multi User Data science with Zeppelin

Introducing Apache Zeppelin

Web-based Notebook for interactive analytics

Features
• Ad-hoc experimentation
– Spark, Hive, Shell, Flink, Tajo, Ignite, Lens, etc.
• Deeply integrated with Spark + Hadoop
– Can be managed via Ambari Stacks
• Supports multiple language backends
– Pluggable “Interpreters”
• Incubating at Apache
– 100% open source and open community

Use Case
• Data exploration and discovery
• Visualization
– tables, graphs and charts
• Interactive snippet-at-a-time experience
• Collaboration and publishing
• “Modern Data Science Studio”

Page 4: Multi User Data science with Zeppelin

Apache Zeppelin

Page 5: Multi User Data science with Zeppelin

PySpark / Spark SQL

Page 6: Multi User Data science with Zeppelin

Spark & Zeppelin Pace of Innovation

[Timeline figure. Labels recovered from the slide:]

HDP GA releases
• HDP 2.2.4 – Spark 1.2.1
• HDP 2.3.0 – Spark 1.3.1
• HDP 2.3.2 – Spark 1.4.1
• HDP 2.3.4 – Spark 1.5.2*
• HDP 2.4.0 – Spark 1.6
• HDP 2.5.x – Spark 1.6.1*

Spark technical previews (TP)
• Spark 1.3.1 TP – 5/2015
• Spark 1.4.1 TP – 8/2015
• Spark 1.5.1 TP – Nov/2015
• Spark 1.6 TP – Jan/2015

Apache Zeppelin
• Zeppelin TP – Oct/2015
• Zeppelin TP Refresh
• Zeppelin GA

Other dates marked on the timeline: Dec 2015, March 1st 2016 (twice), Q1, 2016 (twice), “Now”

Page 7: Multi User Data science with Zeppelin

What’s New in HDP 2.4.0?

• Spark 1.6 GA
– GA of Dynamic Resource Allocation*

• Zeppelin TP#2
– Notebook import/export features
– LDAP Authentication*

Marketing announcement coming March 1st

Page 8: Multi User Data science with Zeppelin

Requirements for Zeppelin in a Multi-Tenant (M/T) Environment
• Support multiple users
• Security – Provide security sandbox by default
• Authentication – LDAP – Integrate with Corporate Identity Store
• Authorization – Access Control for both Data & Notebooks
• Encryption – Work with both wire encryption & encrypted data
• Audit – Keep track of who did what, when & with what results, with non-repudiation
• Manageability
• Sharing/Collaboration of both data & notebooks
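As a rough illustration of how the authorization and audit requirements might fit together, here is a minimal Python sketch. It is not Zeppelin's implementation: the `Notebook` record, the `is_allowed` check, the `AuditLog` class and the `access` helper are all hypothetical names invented for this example. The hash-chained log only makes tampering detectable; true non-repudiation would additionally need something like per-user signatures, which the sketch does not attempt.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Notebook:
    """Hypothetical notebook record with an owner plus reader/writer sets."""
    note_id: str
    owner: str
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)


def is_allowed(user: str, note: Notebook, action: str) -> bool:
    """Return True if `user` may perform `action` ("read" or "write") on `note`."""
    if user == note.owner:
        return True
    if action == "read":
        return user in note.readers or user in note.writers
    if action == "write":
        return user in note.writers
    return False  # unknown actions are denied by default


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of an audit entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log; each entry embeds the hash of the previous entry."""

    def __init__(self):
        self.entries = []

    def record(self, user, action, note_id, allowed, when):
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        body = {"user": user, "action": action, "note": note_id,
                "allowed": allowed, "when": when, "prev": prev_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


def access(user, note, action, log, when):
    """Check permission and record the attempt, whether or not it is allowed."""
    allowed = is_allowed(user, note, action)
    log.record(user, action, note.note_id, allowed, when)
    return allowed
```

In this sketch every access attempt, including denied ones, is logged, so the trail answers "who did what, when" independently of the outcome. A real deployment would resolve `user` from the LDAP-authenticated session rather than take it as a plain string.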

Page 9: Multi User Data science with Zeppelin

Zeppelin GA – Features

• Ambari Managed Install/Configuration

• Runs in a Kerberos Cluster

• LDAP Authentication

• SSL

• Notebook Import/Export

Coming April, 2016

Page 10: Multi User Data science with Zeppelin

Zeppelin Missing Features

• R Interpreter

• Better Visualizations
– GGPlot, Shiny equivalent visualizations

• Access Control on Notebooks

• Library Management

Page 11: Multi User Data science with Zeppelin

What is coming later? – H2, 2016

• Zeppelin Improvements
– Zeppelin Access Control
– Ambari managed LDAP Configuration
– Pluggable Visualization
– R Interpreter

Page 12: Multi User Data science with Zeppelin

Various Apache Zeppelin JIRA/Pull Requests

–Identity Propagation: https://issues.apache.org/jira/browse/ZEPPELIN-645

–LDAP Authentication: https://github.com/apache/incubator-zeppelin/pull/625

–Notebook Access Control: https://github.com/apache/incubator-zeppelin/pull/681

–Notebook Import/Export: https://issues.apache.org/jira/browse/ZEPPELIN-372

–R Interpreter: https://issues.apache.org/jira/browse/ZEPPELIN-156

Page 13: Multi User Data science with Zeppelin

Thank You

Twitter: @neomythos