Apache Zeppelin, the missing component for the Big Data ecosystem
TRANSCRIPT
@doanduyhai
Apache Zeppelin, the missing GUI for your BigData eco-system
DuyHai DOAN, Technical Advocate
Who Am I?
Duy Hai DOAN, Cassandra technical advocate
• talks, meetups, confs
• open-source devs (Achilles, …)
• OSS Cassandra point of contact
☞ [email protected] ☞ @doanduyhai
Datastax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 400+ employees
• Headquarters in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
Zeppelin Architecture
[Diagram. Labels: Zeppelin Server, Zeppelin Engine, REST, WebSocket, Zeppelin Interpreter Factory, Spark Interpreter Group (Spark, SparkSQL), Tajo Interpreter, Flink Interpreter, Cassandra Interpreter, and five boxes labeled JVM.]
What does Zeppelin provide?
• Front-end & display system for free
• Generic back-end with REST APIs & WebSocket
• Pluggable interpreter system
• Task scheduler (à la CRON)
Interpreter processing lifecycle
① Receive input commands/data
  • as raw text
  • from form data
② Have the external back-end process the input commands/data
③ Format the response using the Zeppelin display system
④ Send the response back to the Zeppelin engine
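The four steps above can be sketched in plain Java. This is only an illustration: LifecycleSketch, run and escapeHtml are hypothetical names, not Zeppelin classes, and the back-end is replaced by a simple function. The "%html" output prefix is assumed here as the way to ask the display system for HTML rendering.

```java
import java.util.function.Function;

/** Hypothetical stand-in (not a Zeppelin class) that walks through the four lifecycle steps. */
final class LifecycleSketch {

    private LifecycleSketch() {}

    /**
     * @param rawInput paragraph text as received from the engine
     * @param backend  stand-in for the call to the external back-end
     * @return the formatted text handed back to the engine
     */
    static String run(String rawInput, Function<String, String> backend) {
        // Step 1: receive the input as raw text.
        String command = rawInput.trim();
        // Step 2: let the (stand-in) external back-end process it.
        String result = backend.apply(command);
        // Step 3: format the response; the "%html" prefix is an assumption of this sketch.
        String formatted = "%html <pre>" + escapeHtml(result) + "</pre>";
        // Step 4: return the formatted response to the caller (the engine).
        return formatted;
    }

    /** Escapes the three characters that would otherwise be read as markup. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling LifecycleSketch.run with a paragraph text and any String-to-String function exercises all four steps without a running Zeppelin server.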
Core interpreters
• Spark (Spark core, SparkSQL/DataFrame, PySpark)
  • Spark core = default (or %spark)
  • SparkSQL = %sql
• Shell (%sh)
• Markdown (%md)
• AngularJS (%angular)
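A rough sketch of how a leading %name could select an interpreter, with Spark core as the default when no prefix is given. ParagraphPrefixSketch and its methods are hypothetical and do not reproduce Zeppelin's own parsing; the assumption is that the prefix ends at the first whitespace character.

```java
/** Hypothetical helper (not Zeppelin code): splits a paragraph into interpreter name and body. */
final class ParagraphPrefixSketch {

    /** Per the list above, Spark core is used when no prefix is present. */
    static final String DEFAULT_INTERPRETER = "spark";

    private ParagraphPrefixSketch() {}

    /** Returns the name after a leading '%', or the default when there is none. */
    static String interpreterName(String paragraph) {
        String text = paragraph.trim();
        if (!text.startsWith("%")) {
            return DEFAULT_INTERPRETER;
        }
        String name = text.substring(1, prefixEnd(text));
        return name.isEmpty() ? DEFAULT_INTERPRETER : name;
    }

    /** Returns the paragraph text without its %name prefix. */
    static String body(String paragraph) {
        String text = paragraph.trim();
        if (!text.startsWith("%")) {
            return text;
        }
        return text.substring(prefixEnd(text)).trim();
    }

    /** Index of the first whitespace after the '%', or the text length if there is none. */
    private static int prefixEnd(String text) {
        int end = 1;
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
            end++;
        }
        return end;
    }
}
```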
Third-party interpreters
• Hive
• Phoenix
• Tajo
• Flink
• Ignite
• Lens
• Cassandra
• Geode
• PostgreSQL
• Kylin
Interpreter conf & usage https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Writing An Interpreter
How To
Simple interpreter example (AsciiDoc)
Complex interpreter example (Cassandra)
Steps to write your own interpreter
• Create a class that extends the Interpreter base class
• Register it in a static block
• Optionally define default config params
static {
  Interpreter.register("MyInterpreterName", MyClassName.class.getName());
}
// Variant that also declares default config params
static {
  Interpreter.register("MyInterpreterName", MyClassName.class.getName(),
      new InterpreterPropertyBuilder()
          .add("property1", "default value", "Description of property1").build());
}
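One detail of the static-block approach is that the block only runs when the JVM initializes the class. The sketch below illustrates this with stand-in classes: RegistrySketch and EchoInterpreterSketch are hypothetical and only mimic the register-by-name pattern, they are not the Zeppelin Interpreter API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a name-to-class registry (not the Zeppelin Interpreter class). */
final class RegistrySketch {

    private static final Map<String, String> REGISTERED = new ConcurrentHashMap<>();

    private RegistrySketch() {}

    static void register(String name, String className) {
        REGISTERED.put(name, className);
    }

    /** Returns the registered class name, or null if nothing was registered under that name. */
    static String lookup(String name) {
        return REGISTERED.get(name);
    }
}

/** Hypothetical interpreter class that registers itself when the JVM initializes it. */
final class EchoInterpreterSketch {

    static {
        // Runs once, the first time the class is initialized.
        RegistrySketch.register("echo", EchoInterpreterSketch.class.getName());
    }

    private EchoInterpreterSketch() {}

    /** Invoking any static method forces class initialization, and so the static block. */
    static void ensureLoaded() {
        // Intentionally empty.
    }
}
```

After EchoInterpreterSketch.ensureLoaded() has been called, RegistrySketch.lookup("echo") returns the class name. This suggests why the class name also has to be listed somewhere the server can find and load it, as the configuration steps that follow describe.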
To register your interpreter as default
• Edit the enum ZeppelinConfiguration.ConfVars
• Add your interpreter FQCN in the property ZEPPELIN_INTERPRETERS
To register your interpreter in config files
• Create conf/zeppelin-site.xml from conf/zeppelin-site.xml.template
• Add your interpreter FQCN in the property zeppelin.interpreters
<property>
  <name>zeppelin.interpreters</name>
  <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter</value>
</property>
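The property value is a comma-separated list of fully qualified class names. A small helper for reading and extending such a list might look like the sketch below. InterpreterListSketch is a hypothetical name, and the whitespace handling is a choice made for this sketch; the slides do not say how Zeppelin itself treats whitespace in the value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not Zeppelin code) for a comma-separated list of class names. */
final class InterpreterListSketch {

    private InterpreterListSketch() {}

    /** Splits on commas, trims surrounding whitespace and skips empty entries. */
    static List<String> parseClassNames(String value) {
        List<String> names = new ArrayList<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns the list with fqcn appended at the end, unless it is already present. */
    static String append(String value, String fqcn) {
        List<String> names = parseClassNames(value);
        if (!names.contains(fqcn)) {
            names.add(fqcn);
        }
        return String.join(",", names);
    }
}
```

For example, append(existingValue, "com.me.MyNewInterpreter") yields a single-line value with the new class at the end, matching the layout of the property shown above.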
Simple AsciiDoc Interpreter
[Diagram. Labels: Zeppelin Server (JVM, Zeppelin Engine), AsciiDoc Interpreter (JVM), Raw Text Block, Raw Text Block, Converted To HTML, HTML Output, steps ① to ④.]
Simple interpreter (AsciiDoc) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Cassandra Interpreter Architecture
[Diagram. Labels: Zeppelin Server (JVM), Cassandra Interpreter (JVM), Java Driver, Cassandra, Raw Text Block, Raw Text Block, Async CQL statements, Render HTML, Display Results as HTML, steps ① to ⑥.]
Cassandra Interpreter Commands
• Native CQL statements: SELECT * FROM …; INSERT INTO …; …
• Schema commands: DESCRIBE TABLE …; DESCRIBE KEYSPACE …; …
• Prepared statements commands: @prepare …; @bind …; @remove_prepared …;
• Help command: HELP;
• Options commands: @consistency …; @retryPolicy …; @fetchSize …;
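The command families above can be told apart by their leading token. The sketch below is a simplified, hypothetical classifier, not the interpreter's actual parser. It assumes the @ commands are matched with the exact spelling shown in the slides, and it matches DESCRIBE and HELP case-insensitively.

```java
import java.util.Locale;

/** Hypothetical classifier (not the real Cassandra interpreter parser). */
final class CassandraCommandSketch {

    enum Kind { NATIVE_CQL, SCHEMA, PREPARED, HELP, OPTION }

    private CassandraCommandSketch() {}

    /** Classifies one statement by its leading token. */
    static Kind classify(String statement) {
        String s = statement.trim();
        if (s.startsWith("@")) {
            if (s.startsWith("@prepare") || s.startsWith("@bind") || s.startsWith("@remove_prepared")) {
                return Kind.PREPARED;
            }
            if (s.startsWith("@consistency") || s.startsWith("@retryPolicy") || s.startsWith("@fetchSize")) {
                return Kind.OPTION;
            }
            // Any other @ command is not part of the list in the slides.
            throw new IllegalArgumentException("Unknown command: " + s);
        }
        String lower = s.toLowerCase(Locale.ROOT);
        if (lower.startsWith("describe")) {
            return Kind.SCHEMA;
        }
        if (lower.equals("help") || lower.equals("help;")) {
            return Kind.HELP;
        }
        // Everything else is passed through as a native CQL statement.
        return Kind.NATIVE_CQL;
    }
}
```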
Complex interpreter (Cassandra) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation
Cassandra Online Interpreter Docs
• http://zeppelin.incubator.apache.org/docs/interpreter/cassandra.html
Roadmap & future
• More graph options (Map viz ZEPPELIN-157)
• Helium project, packaging Zeppelin view, logic (code) & resource into Applications
• Interpreters packaging re-design
  • ship & compile core interpreters only
  • third-party interpreters can be pulled from repository
  • which interpreter is core? Who will maintain? Community…
• Integrate security (Apache Shiro, pull request #53 by Hayssam Saleh)