Data science notebook webinar, 2017-11-16
Data Science Notebook Guidelines
ODPi BI & Data Science SIG: Cupid Chan, Moon Soo Lee, Frank McQuillan
• Bridging the gap so that BI tools can sit harmoniously on top of both Hadoop and RDBMS, while providing the same, or even more, business insight to BI users who also have Hadoop in the backend.
• Provide an objective guideline for evaluating the effectiveness of a BI solution and/or other related middleware technologies.
BI & Data Science Special Interest Group (SIG)
Target user persona
• Jupyter: Data science user with programming experience in one of the supported kernels.
• Zeppelin: Data engineers, data scientists, and business users in the same data processing pipeline who need to collaborate.
Installation
• Jupyter: Easy installation with Anaconda or pip. Standalone, or Hadoop and Spark (via YARN) clusters supported.
• Zeppelin: Download the binary package and start the daemon script. Included in HDP.
Configuration
• Jupyter: Edit config files or use the command line tool for notebook settings. Community-maintained language kernels have various configuration workflows.
• Zeppelin: Edit config files. Interpreters can be configured through the GUI.
User Interface
• Jupyter: Functional notebook user interface that can be used to create readable analyses combining code, images, comments, formulae, and plots.
• Zeppelin: Notebook interface in which users can write documentation, run code, and visualize outputs, with flexible layouts and multiple looks and feels.
Supported languages
• Jupyter: Python, R, Julia, and dozens of community-maintained kernels.
• Zeppelin: Support for various languages is included in the binary package, including Spark, Python, and JDBC. Third-party interpreters are available through an online registry.
Multi-user support
• Jupyter: Native Jupyter does not support multi-user. However, JupyterHub can be used to serve notebooks to users working in separate sessions.
• Zeppelin: Multiple users can collaborate in real time on a notebook. Multiple users can work with multiple languages in the same notebook.
Support and community
• Jupyter: Mature project with an active community and good support. The Jupyter project was born in 2014 but has roots going back to 2001.
• Zeppelin: Apache Zeppelin is one of the most active projects in the Apache Software Foundation. The project was born in 2013 and became a top-level project of the ASF in 2015.
Architecture
• Jupyter: The notebook server sends code to language kernels, renders in a browser, and stores code/output/Markdown in JSON files.
• Zeppelin: The Zeppelin server daemon manages multiple interpreters (backend integrations). The web application communicates with the server over a websocket for real-time communication.
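To make the Jupyter storage model concrete, here is a minimal sketch of what "code/output/Markdown in JSON files" can look like. It assumes the nbformat 4 layout of an .ipynb file; the cell contents are made up for illustration, and only the Python standard library is used.

```python
import json

# A minimal notebook document, assuming the nbformat 4 layout.
# The cell contents below are invented for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Quick exploration\nSum two numbers.",
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "print(1 + 1)",
            # Output captured from the kernel is stored next to the code.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "2\n"}
            ],
        },
    ],
}

# Serialize to JSON text (what an .ipynb file holds), then read it back.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)


def summarize(nb):
    """Return (cell_type, first line of source) for every cell."""
    rows = []
    for cell in nb["cells"]:
        source = cell["source"]
        # The format allows source as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        rows.append((cell["cell_type"], source.splitlines()[0]))
    return rows


summary = summarize(loaded)
for cell_type, first_line in summary:
    print(f"{cell_type}: {first_line}")
```

Because the whole notebook is one JSON document, code, rendered output, and Markdown commentary travel together in a single file.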
Big data ecosystem
• Jupyter: Can be connected to a variety of big data execution engines and frameworks: Spark, massively parallel processing (MPP) databases, Hadoop, etc.
• Zeppelin: Tightly integrated with Apache Spark and other big data processing engines.
Security
• Jupyter: Code executed in the notebook is trusted, like any other Python program. Token-based authentication on by default. Root use disabled by default.
• Zeppelin: User authentication (LDAP, AD integration). Notebook ACL. Interpreter ACL. SSL connection.
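As an illustration of Jupyter's token-based authentication, the sketch below builds (but does not send) an HTTP request that carries the token in an Authorization header. The server address and token value are hypothetical placeholders, and the /api/contents path is used only as an example endpoint; a real token is printed by the notebook server at startup.

```python
import urllib.request

# Hypothetical values for illustration only.
BASE_URL = "http://localhost:8888"
TOKEN = "0123abcd"


def build_request(base_url, token, path="/api/contents"):
    """Build a request that presents the token in the Authorization header."""
    req = urllib.request.Request(base_url + path)
    req.add_header("Authorization", f"token {token}")
    return req


request = build_request(BASE_URL, TOKEN)
print(request.full_url)
print(request.get_header("Authorization"))
```

The request is only constructed here; passing it to urllib.request.urlopen would require a running notebook server at that address.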
Data science readiness
• Jupyter: Widely used by data scientists for a variety of tasks including quick exploration, documentation of findings, reproducibility, teaching, and presentations.
• Zeppelin: Data scientists can collaborate with each other. Business users can also log in and collaborate with data scientists directly on notebooks.
Jupyter: Frank McQuillan
Agenda
• What is a Jupyter notebook?
• Lightning tutorial - my first Jupyter notebook
• Data science examples
  – Python
  – SQL
• Key strengths and potential areas of improvement
What is a Jupyter Notebook?
• Tell a story with your data
• Program in a web browser
• “Multimodal”
• Favorite tool of data scientists and researchers
Support and Community
• 2001 - IPython notebook project (Fernando Perez)
• 2014 - Jupyter notebook launched
• Open source (modified BSD license)
• Steering council of ~15 members from academia and commercial companies
• Mature product with active community: https://stackoverflow.com/search?q=jupyter returns ~10,500 results
Architecture
● IPython
● IRkernel
● IJulia
● Dozens of community-maintained kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
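Each kernel in the list above is registered with Jupyter through a small JSON description (a kernel.json file) that tells the notebook server how to start it. The sketch below assumes the usual spec for the IPython kernel; the connection file path and the launch_command helper are hypothetical and only illustrate how the placeholder in argv gets filled in.

```python
import json

# Kernel spec as typically found in a kernel.json file for the IPython kernel.
kernel_spec = {
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3",
    "language": "python",
}


def launch_command(spec, connection_file):
    """Hypothetical helper: substitute the connection file into argv."""
    return [arg.replace("{connection_file}", connection_file)
            for arg in spec["argv"]]


# Hypothetical connection file path, for illustration only.
cmd = launch_command(kernel_spec, "/tmp/kernel-1234.json")
print(json.dumps(kernel_spec, indent=1))
print(cmd)
```

Other kernels such as IRkernel or IJulia follow the same pattern with a different argv and language entry, which is what lets one notebook front end drive many languages.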
Demo
Summary
• Key strengths
  – Data science friendly
  – Mature project
  – Widely used
  – Intuitive UI
  – Nice presentation of code, images, comments, formulae
  – Lots of available kernels
• Some potential improvements
  – Multi-user support
  – Cell drag and drop
  – Hiding code/output
  – IDE-type operations like syntax checking, version control, running code one line at a time
Zeppelin: Moon Soo Lee
Slide & demo notebook - https://s.apache.org/ZPLN