Toulouse Data Science meetup - Apache Zeppelin

Post on 16-Jan-2017


Apache Zeppelin

The (very) short field trip

by G.Alléon & G.Dupont

TDS meetup - 2016.06.30

Who are we?

Guillaume Alléon - AIRBUS Group Innovation (corporate research center)

Research leader for more than 30 people from the UK to China, tackling problems in massive data processing and information extraction.

Was already in “big data” when it was still called HPC…

Gerard Dupont - AIRBUS Defence & Space (space systems)

Technical coordinator for R&T studies on distributed processing systems.

Spent way too much time processing web data for intelligence, now looking to the sky (satellite data ;-)

Zeppelin motto

“A web-based notebook that enables interactive data analytics.”

Origins & history

Missing piece in the HADOOP landscape: a modern analytic playground.

2012.12 - Data analytics solution (NFLabs)

2013.10 - Opensourced

2014.12 - ASF incubation

2015 - 3 stable releases

2016.05 - Maturing to Apache top level project

3000 feet view

What’s cool about Zeppelin

⊕ interactive

⊕ out-of-the-box spark integration

⊕ out-of-the-box visualization options

⊕ direct access to DOM for customized visualization

⊕ nice UI (bootstrap & angular)

⊕ notebook run scheduler

⊕ easy to configure

⊕ extensibility, extensibility and extensibility...
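To make the "out-of-the-box visualization" point concrete: Zeppelin's display system renders any paragraph output that starts with `%table` as an interactive table, with tab-separated columns and one row per line. A minimal sketch of producing such output follows; the helper name `to_zeppelin_table` and the sample data are hypothetical, only the `%table` convention comes from Zeppelin.

```python
def to_zeppelin_table(header, rows):
    """Format rows as a Zeppelin %table payload.

    Hypothetical helper: columns are joined with tabs and rows with
    newlines, which is the layout the %table display system expects.
    """
    lines = ["\t".join(str(cell) for cell in header)]
    lines += ["\t".join(str(cell) for cell in row) for row in rows]
    return "%table " + "\n".join(lines)


if __name__ == "__main__":
    # Printed from a notebook paragraph, this should show up as a
    # table that can be switched to the built-in charts.
    print(to_zeppelin_table(["name", "size"], [["sun", 100], ["moon", 10]]))
```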

… the dark side

⊝ hard to install

⊝ need to build from source (for a customized version)

⊝ not (yet) multi-user

⊝ still “young”

⊝ resource-greedy

Overview/look & feel

Interpreter text (aka your code)

Interpreter config

Interactive results
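The "interpreter text" part deserves one illustration: a paragraph usually starts with a `%name` directive (for instance `%md`, `%sql` or `%pyspark`) that picks the interpreter, and the rest is the code handed to it. The sketch below only mimics that convention; `split_paragraph` is a hypothetical helper, not Zeppelin's actual parser, and the `spark` default is an assumption.

```python
import re

# Matches a leading "%name" directive such as "%md" or "%spark.pyspark".
_DIRECTIVE = re.compile(r"^\s*%([\w.]+)\s*")


def split_paragraph(text, default="spark"):
    """Return (interpreter_name, code) for one notebook paragraph.

    Hypothetical helper: when no directive is present, the paragraph
    is assumed to go to the default interpreter.
    """
    match = _DIRECTIVE.match(text)
    if match is None:
        return default, text.strip()
    return match.group(1), text[match.end():].strip()


if __name__ == "__main__":
    print(split_paragraph("%md\n# Hello TDS"))
    print(split_paragraph("val x = 1"))
```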

DEMO time

credits: https://www.weasyl.com/~uszatyarbuz

Under the hood

○Interpreter isolation, each in its own JVM

○Dynamic dependency loading

○REST & websocket on the front end

○Thrift on the back end (or whatever you add)

○Process scheduler (cron-like)
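As a rough idea of what the REST front looks like from a client, here is a sketch that only builds the HTTP requests, without sending them. The notebook endpoints (`/api/notebook` to list notes, `/api/notebook/job/<noteId>` to run one) follow the Zeppelin REST API documentation; the base URL, the function names and the note id are assumptions, and the paths should be checked against the installed version.

```python
from urllib.request import Request

# Assumption: a local Zeppelin server on its default port.
DEFAULT_BASE_URL = "http://localhost:8080"


def list_notes_request(base_url=DEFAULT_BASE_URL):
    """Build (but do not send) the request that lists all notes."""
    return Request(base_url.rstrip("/") + "/api/notebook", method="GET")


def run_note_request(note_id, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) the request that runs every paragraph of a note."""
    url = base_url.rstrip("/") + "/api/notebook/job/" + note_id
    return Request(url, method="POST")


if __name__ == "__main__":
    # "NOTE_ID" is a placeholder, not a real note identifier.
    req = run_note_request("NOTE_ID")
    print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would perform the actual call against a running server.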

Roadmap

Enterprise Ready

○Multi-tenancy

○Job scheduler

○HA

Usability Improvement

○UX improvement

○Table data support

○Dynamic interpreter integration

○Reusable analytic application catalog

Thx

Official website: https://zeppelin.apache.org/

Notebook sample: https://www.zeppelinhub.com/viewer

Source code: https://github.com/apache/incubator-zeppelin

Mailing lists: http://zeppelin.apache.org/community.html

This TDS notebook: http://tinyurl.com/zeppelin-tds

Sources for this presentation:

○ http://www.slideshare.net/FlinkForward/moon-soo-lee-data-science-lifecycle-with-apache-flink-and-apache-zeppelin/23

○ http://www.slideshare.net/HadoopSummit/apache-zeppelin-helium-and-beyond

○ http://www.slideshare.net/felixcss/interactive-data-science-from-scratch-with-apache-zeppelin-and-apache-spark

○ http://www.slideshare.net/BrunoBonnin/explorez-vos-donnes-avec-apache-zeppelin

credits: https://www.weasyl.com/~uszatyarbuz

BACKUP

Origins & history

Active core teams

Decent number of external contributors

Plenty of interpreters (official and external)

0.6.0-SNAPSHOT (pending stabilization)

3000 feet view
