Apache Zeppelin, the missing component for the big data ecosystem

Apache Zeppelin, the missing GUI for your BigData eco-system

DuyHai DOAN, Technical Advocate (@doanduyhai)

Upload: duyhai-doan

Post on 09-Jan-2017

Who Am I?

Duy Hai DOAN, Cassandra technical advocate

•  talks, meetups, confs

•  open-source devs (Achilles, …)

•  OSS Cassandra point of contact

☞ @doanduyhai

DataStax

•  Founded in April 2010

•  We contribute a lot to Apache Cassandra™

•  400+ customers (25 of the Fortune 100), 400+ employees

•  Headquarters in the San Francisco Bay Area

•  EU headquarters in London, offices in France and Germany

•  DataStax Enterprise = OSS Cassandra + extra features

What is Apache Zeppelin?

•  Presentation

•  Architecture

Zeppelin Presentation

Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Zeppelin Architecture

[Diagram: the Zeppelin Server (REST and WebSocket endpoints, Zeppelin Engine, Zeppelin Interpreter Factory) runs in one JVM and is connected to the Spark Interpreter Group (Spark, SparkSQL) and to the Tajo, Flink and Cassandra interpreters, each in a separate JVM.]

What does Zeppelin provide?

•  Front-end & display system for free

•  Generic back-end with REST APIs & WebSocket

•  Pluggable interpreter system

•  Task scheduler (à la cron)

Zeppelin UI Layout

•  Notebook

•  Paragraph

•  UI elements

Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Zeppelin Display System

•  Raw, Table, HTML

•  Available graphs

•  View modes

•  Dynamic form

•  Iframe export
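
As a rough illustration of the Table and HTML display types: Zeppelin's display system renders paragraph output that starts with `%table` as a table (tab-separated columns, newline-separated rows) and output that starts with `%html` as HTML. The helper below is only a sketch; the names `DisplayOutputSketch`, `toTable` and `toHtml` are hypothetical and not part of Zeppelin.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Zeppelin: builds output strings
// that follow the display system's %table and %html conventions.
final class DisplayOutputSketch {

    // Header and rows become one line each, with tab-separated cells.
    static String toTable(List<String> header, List<List<String>> rows) {
        StringBuilder out = new StringBuilder("%table ");
        out.append(String.join("\t", header)).append('\n');
        for (List<String> row : rows) {
            out.append(String.join("\t", row)).append('\n');
        }
        return out.toString();
    }

    // Prefixes an HTML fragment so that it is rendered as HTML.
    static String toHtml(String fragment) {
        return "%html " + fragment;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("spark", "core"),
            Arrays.asList("cassandra", "third-party"));
        System.out.print(toTable(Arrays.asList("interpreter", "kind"), rows));
        System.out.println(toHtml("<b>done</b>"));
    }
}
```

Any interpreter can return strings of this shape, which is why the front-end and display system come "for free" once a back-end is plugged in.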

Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Interpreter system

•  Core interpreters

•  Third-party interpreters

•  Interpreter conf & usage

Interpreter processing lifecycle

①  Receive input commands/data

•  as raw text

•  from form data

②  Process the input commands/data with the external back-end

③  Format the response using the Zeppelin display system

④  Send the response back to the Zeppelin engine
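
The four steps can be sketched as a single method. This is a simplified stand-in, not the Zeppelin `Interpreter` API: the class `LifecycleSketch`, its `interpret` method and the `backEnd` function are hypothetical, and only the raw-text input path is covered (form data is left out).

```java
import java.util.function.Function;

// Hypothetical stand-in for an interpreter, not the Zeppelin API.
final class LifecycleSketch {

    // Stands in for the external back-end (Spark, Cassandra, ...).
    private final Function<String, String> backEnd;

    LifecycleSketch(Function<String, String> backEnd) {
        this.backEnd = backEnd;
    }

    // Runs steps 1 to 4 of the lifecycle for one paragraph.
    String interpret(String rawText) {
        // 1. Receive the input as raw text.
        String input = rawText.trim();
        // 2. Let the external back-end process it.
        String result = backEnd.apply(input);
        // 3. Format the response for the display system (%html output).
        String formatted = "%html <pre>" + escape(result) + "</pre>";
        // 4. Hand the response back to the caller (the engine).
        return formatted;
    }

    // Minimal HTML escaping so that the result is shown literally.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        LifecycleSketch toy = new LifecycleSketch(String::toUpperCase);
        System.out.println(toy.interpret("  select 1 < 2  "));
    }
}
```

Here the "back-end" only upper-cases its input; a real interpreter would forward the text to an external system at step 2.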

Core interpreters

•  Spark (Spark core, SparkSQL/DataFrame, PySpark)

   •  Spark core = default (or %spark)

   •  SparkSQL = %sql

•  Shell (%sh)

•  Markdown (%md)

•  AngularJS (%angular)
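
To make the `%name` convention concrete, the sketch below splits a paragraph into an interpreter name and a body, falling back to Spark when no prefix is given. It is an illustration only: `ParagraphRouterSketch` and `route` are hypothetical names, and Zeppelin's own parsing may differ.

```java
// Hypothetical sketch, not Zeppelin's parser: splits a paragraph into
// the interpreter name taken from its leading %name and the remaining body.
final class ParagraphRouterSketch {

    // Per the slide, Spark core is the default interpreter.
    static final String DEFAULT_INTERPRETER = "spark";

    // Returns a two-element array: { interpreterName, body }.
    static String[] route(String paragraph) {
        String text = paragraph.trim();
        if (!text.startsWith("%")) {
            return new String[] { DEFAULT_INTERPRETER, text };
        }
        // The name runs from after '%' to the first whitespace character.
        int end = 1;
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
            end++;
        }
        String name = text.substring(1, end);
        String body = text.substring(end).trim();
        return new String[] { name, body };
    }

    public static void main(String[] args) {
        String[] routed = route("%md # Hello Zeppelin");
        System.out.println(routed[0] + " -> " + routed[1]);
    }
}
```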

Third-party interpreters

•  Hive

•  Phoenix

•  Tajo

•  Flink

•  Ignite

•  Lens

•  Cassandra

•  Geode

•  PostgreSQL

•  Kylin

Interpreter conf & usage https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Writing an Interpreter

•  How to

•  Simple interpreter example (AsciiDoc)

•  Complex interpreter example (Cassandra)

Steps to write your own interpreter

•  Create a class that extends the Interpreter base class

•  Register it in a static block

•  Optionally define default config params

    // Registration with name and class only
    static {
        Interpreter.register("MyInterpreterName", MyClassName.class.getName());
    }

    // Registration with default config params
    static {
        Interpreter.register("MyInterpreterName", MyClassName.class.getName(),
            new InterpreterPropertyBuilder()
                .add("property1", "default value", "Description of property1")
                .build());
    }

To register your interpreter as default

•  Edit the enum ZeppelinConfiguration.ConfVars

•  Add your interpreter's fully qualified class name (FQCN) to the property ZEPPELIN_INTERPRETERS

To register your interpreter in config files

•  Create conf/zeppelin-site.xml from conf/zeppelin-site.xml.template

•  Add your interpreter's FQCN to the property zeppelin.interpreters

<property>
<name>zeppelin.interpreters</name>
<value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter</value>
</property>
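
The property value is a comma-separated list of class names. As a sketch of how such a value could be split into individual names, the hypothetical helper below (`InterpreterListSketch.parse`, not Zeppelin's own configuration code) trims whitespace and skips empty entries. Whether Zeppelin itself tolerates whitespace or line breaks inside the value is not stated in the slides, so keeping the value on one line is the cautious choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Zeppelin's own configuration code.
final class InterpreterListSketch {

    // Splits the comma-separated value into class names, ignoring
    // surrounding whitespace and empty entries.
    static List<String> parse(String propertyValue) {
        List<String> classNames = new ArrayList<>();
        for (String entry : propertyValue.split(",")) {
            String name = entry.trim();
            if (!name.isEmpty()) {
                classNames.add(name);
            }
        }
        return classNames;
    }

    public static void main(String[] args) {
        String value = "org.apache.zeppelin.markdown.Markdown,\n"
            + "  org.apache.zeppelin.shell.ShellInterpreter,com.me.MyNewInterpreter\n";
        System.out.println(parse(value));
    }
}
```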

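Simple AsciiDoc Interpreter

[Diagram: the Zeppelin Server (Zeppelin Engine) and the AsciiDoc Interpreter run in separate JVMs. In steps ① to ④, a raw text block is passed from the server to the interpreter, converted to HTML, and returned as HTML output.]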

Simple interpreter (AsciiDoc) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

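Cassandra Interpreter Architecture

[Diagram: the Zeppelin Server and the Cassandra Interpreter run in separate JVMs. A raw text block is passed from the server to the interpreter (① ②), which sends async CQL statements to Cassandra through the Cassandra Java Driver and displays the results as HTML; the HTML is then rendered (④).]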

Cassandra Interpreter Commands

•  Native CQL statements: SELECT * FROM …; INSERT INTO …; …

•  Schema commands: DESCRIBE TABLE …; DESCRIBE KEYSPACE …; …

•  Prepared statements commands: @prepare …; @bind …; @remove_prepared …;

•  Help command: HELP;

•  Options commands: @consistency …; @retryPolicy …; @fetchSize …;
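
One way to picture the five command families is a classifier that looks at the start of each statement. The sketch below is purely illustrative: `CassandraCommandSketch`, `Kind` and `classify` are hypothetical names, the matching is a naive prefix check, and the real Cassandra interpreter parses its input differently.

```java
import java.util.Locale;

// Hypothetical classifier, not the Cassandra interpreter's real parser.
final class CassandraCommandSketch {

    enum Kind { NATIVE_CQL, SCHEMA, PREPARED_STATEMENT, HELP, OPTION }

    // Decides the command family from the beginning of the statement.
    static Kind classify(String statement) {
        String lower = statement.trim().toLowerCase(Locale.ROOT);
        if (lower.startsWith("@prepare") || lower.startsWith("@bind")
                || lower.startsWith("@remove_prepared")) {
            return Kind.PREPARED_STATEMENT;
        }
        if (lower.startsWith("@consistency") || lower.startsWith("@retrypolicy")
                || lower.startsWith("@fetchsize")) {
            return Kind.OPTION;
        }
        if (lower.startsWith("describe ")) {
            return Kind.SCHEMA;
        }
        if (lower.equals("help;") || lower.equals("help")) {
            return Kind.HELP;
        }
        // Everything else is treated as a native CQL statement.
        return Kind.NATIVE_CQL;
    }

    public static void main(String[] args) {
        // "my_table" is a made-up table name used only for this demo.
        System.out.println(classify("HELP;"));
        System.out.println(classify("SELECT * FROM my_table;"));
    }
}
```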

Complex interpreter (Cassandra) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Cassandra Online Interpreter Docs

•  http://zeppelin.incubator.apache.org/docs/interpreter/cassandra.html

Cassandra Interactive Help

•  Type HELP; in the interpreter

Zeppelin future

Roadmap

Roadmap & future

•  More graph options (Map viz, ZEPPELIN-157)

•  Helium project: packaging Zeppelin view, logic (code) & resource into Applications

•  Redesign of interpreter packaging

   •  ship & compile core interpreters only

   •  third-party interpreters can be pulled from a repository

   •  which interpreters are core? Who will maintain them? Community…

•  Integrate security (Apache Shiro, pull request #53 by Hayssam Saleh)

Roadmap & future

•  Graduate from incubation to become a first-class Apache project

Q & A

Thank You @doanduyhai

http://zeppelin.incubator.apache.org/