toulouse data science meetup - apache zeppelin

Apache Zeppelin The (very) short field trip by G.Alléon & G.Dupont TDS meetup - 2016.06.30

Upload: dupont-gerard

Post on 16-Jan-2017


TRANSCRIPT

Page 1: Toulouse Data Science meetup - Apache zeppelin

Apache Zeppelin

The (very) short field trip

by G. Alléon & G. Dupont

TDS meetup - 2016.06.30

Page 2: Toulouse Data Science meetup - Apache zeppelin

Who are we?

Guillaume Alleon - AIRBUS Group Innovation (corporate research center)

Research leader for more than 30 people from the UK to China, tackling problems in massive data processing and information extraction.

Was already in “big data” when it was still called HPC…

Gerard Dupont - AIRBUS Defence & Space (space systems)

Technical coordinator for R&T studies on distributed processing systems.

Spent way too much time processing web data for intelligence, now looking to the sky (satellite data ;-)

Page 3: Toulouse Data Science meetup - Apache zeppelin

Zeppelin motto

“A web-based notebook that enables interactive data analytics.”

Page 4: Toulouse Data Science meetup - Apache zeppelin

Origins & history

Missing piece in the Hadoop landscape: a modern analytic playground.

2012.12 - Data analytics solution (NFLabs)

2013.10 - Opensourced

2014.12 - ASF incubation

2015 - 3 stable releases

2016.05 - Maturing to Apache top level project

Page 5: Toulouse Data Science meetup - Apache zeppelin

3000 feet view

Page 6: Toulouse Data Science meetup - Apache zeppelin

What’s cool about Zeppelin

⊕ interactive

⊕ out-of-the-box spark integration

⊕ out-of-the-box visualization options

⊕ direct access to DOM for customized visualization

⊕ nice UI (bootstrap & angular)

⊕ notebook run scheduler

⊕ easy to configure

⊕ extensibility, extensibility and extensibility...
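The out-of-the-box visualization relies on Zeppelin's display system: when a paragraph's output starts with `%table`, followed by tab-separated columns and newline-separated rows, the notebook renders it as a table with chart options. A minimal Python sketch of producing such output follows. The helper name `to_zeppelin_table` and the sample values are illustrative assumptions, not part of the talk.

```python
def to_zeppelin_table(header, rows):
    """Format rows as text that Zeppelin's display system can render as a table.

    Hypothetical helper for illustration: columns are joined with tabs,
    rows with newlines, and the whole output is prefixed with "%table ".
    """
    lines = ["\t".join(str(col) for col in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Illustrative values only; printing this from a Python paragraph
# should make Zeppelin show a two-column table.
print(to_zeppelin_table(["name", "size"], [["sun", 100], ["moon", 10]]))
```

The same prefix convention applies to `%html`, which is how a paragraph gets direct access to the page for custom visualization.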

Page 7: Toulouse Data Science meetup - Apache zeppelin

… the dark side

⊝ hard to install

⊝ need to build from source (for a customized version)

⊝ not (yet) multi-user

⊝ still “young”

⊝ resource-greedy

Page 8: Toulouse Data Science meetup - Apache zeppelin

Overview/look & feel

Interpreter text (aka your code)

Interpreter config

Interactive results
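In a Zeppelin paragraph, the interpreter is chosen by a leading `%name` directive (for example `%pyspark`, `%sql` or `%md`), and the rest of the paragraph is the code handed to that interpreter. The sketch below only illustrates that convention. `split_paragraph` is a hypothetical helper, not Zeppelin's actual parser, and the `spark` default is an assumption about how the notebook is configured.

```python
def split_paragraph(text, default_interpreter="spark"):
    """Split paragraph text into (interpreter name, code).

    Illustrative only: a leading "%name" token selects the interpreter,
    otherwise the assumed default interpreter is used.
    """
    body = text.lstrip()
    if not body.startswith("%"):
        return default_interpreter, body
    # First whitespace-delimited token after "%" is the interpreter name.
    parts = body[1:].split(None, 1)
    name = parts[0] if parts else default_interpreter
    code = parts[1] if len(parts) > 1 else ""
    return name, code


print(split_paragraph("%pyspark\nprint(1)"))
print(split_paragraph("%md hello world"))
```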

Page 9: Toulouse Data Science meetup - Apache zeppelin

DEMO time

credits: https://www.weasyl.com/~uszatyarbuz

Page 10: Toulouse Data Science meetup - Apache zeppelin

Under the hood

○ Interpreter isolation: each interpreter runs in its own JVM

○ Dynamic dependency loading

○ REST & WebSocket on the front end

○ Thrift on the back end (or whatever you add)

○ Process scheduler (cron-like)
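The REST front end can be explored from any HTTP client. The sketch below lists the notes of a running server. It assumes a local instance on the default port 8080, the `GET /api/notebook` endpoint, and a JSON envelope with `status` and `body` fields. Check these against the REST documentation of the installed version, since the API was still evolving at the time of the talk. The function names and the sample payload are illustrative.

```python
import json
from urllib.request import urlopen


def list_note_names(response_text):
    """Map note id -> note name from a notebook-list JSON response.

    Assumes an envelope of the form {"status": "OK", "body": [...]}.
    """
    payload = json.loads(response_text)
    if payload.get("status") != "OK":
        raise ValueError("unexpected status: %r" % payload.get("status"))
    return {note["id"]: note.get("name", "") for note in payload.get("body", [])}


def fetch_notes(base_url="http://localhost:8080"):
    """Query a running Zeppelin server (URL is an assumption) for its notes."""
    with urlopen(base_url + "/api/notebook") as resp:
        return list_note_names(resp.read().decode("utf-8"))


# Offline illustration with a made-up payload; call fetch_notes() against a live server.
sample = '{"status": "OK", "message": "", "body": [{"id": "note-1", "name": "demo"}]}'
print(list_note_names(sample))
```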

Page 11: Toulouse Data Science meetup - Apache zeppelin

Roadmap

Enterprise Ready

○ Multi-tenancy

○ Job scheduler

○ HA

Usability Improvement

○ UX improvement

○ Table data support

○ Dynamic interpreter integration

○ Reusable analytic application catalog

Page 12: Toulouse Data Science meetup - Apache zeppelin

Thx

Official website: https://zeppelin.apache.org/

Notebook sample: https://www.zeppelinhub.com/viewer

Source code: https://github.com/apache/incubator-zeppelin

Mailing lists: http://zeppelin.apache.org/community.html

This TDS notebook: http://tinyurl.com/zeppelin-tds

Sources for this presentation:

○ http://www.slideshare.net/FlinkForward/moon-soo-lee-data-science-lifecycle-with-apache-flink-and-apache-zeppelin/23

○ http://www.slideshare.net/HadoopSummit/apache-zeppelin-helium-and-beyond

○ http://www.slideshare.net/felixcss/interactive-data-science-from-scratch-with-apache-zeppelin-and-apache-spark

○ http://www.slideshare.net/BrunoBonnin/explorez-vos-donnes-avec-apache-zeppelin

credits: https://www.weasyl.com/~uszatyarbuz

Page 13: Toulouse Data Science meetup - Apache zeppelin

BACKUP

Page 14: Toulouse Data Science meetup - Apache zeppelin

Origins & history

Active core team

Decent number of external contributors

Plenty of interpreters (official and external)

0.6.0-SNAPSHOT (pending stabilization)
