3rd Hivemall meetup
TRANSCRIPT
![Page 1: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/1.jpg)
Recent progress and future roadmap of Hivemall
Research Engineer Makoto YUI @myui
#hivemallmtup
2016/09/08 3rd Hivemall meetup
![Page 2: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/2.jpg)
Agenda
1. Short Introduction to Hivemall
   ✓ Hivemall use-cases
2. Recent Updates
3. Roadmap of Hivemall
   ✓ coming new features
![Page 3: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/3.jpg)
What is Hivemall
Scalable machine learning library built as a collection of Hive UDFs, licensed under the Apache License v2
https://github.com/myui/hivemall
Thanks to everyone who contributed to the project!
![Page 4: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/4.jpg)
What is Hivemall
(Figure: where Hivemall sits in the Hadoop ecosystem stack, top layer first)
• Machine Learning: Hivemall, MLlib
• Query Processing: Hive, Pig, SparkSQL
• Parallel Data Processing Framework: MapReduce (MRv1), Apache Tez (DAG processing), Apache Spark
• Resource Management: Apache YARN, MESOS
• Distributed File System / Cloud Storage: Hadoop HDFS, Amazon S3
![Page 5: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/5.jpg)
Hivemall’s Vision: ML on SQL
(Figure: Classification with Mahout)
CREATE TABLE lr_model AS
SELECT
  feature, -- reducers perform model averaging in parallel
  avg(weight) as weight
FROM (
  SELECT logress(features, label, ..) as (feature, weight)
  FROM train
) t -- map-only task
GROUP BY feature; -- shuffled to reducers
✓ Machine Learning made easy for SQL developers (ML for the rest of us)
✓ Interactive and Stable APIs w/ SQL abstraction
This SQL query automatically runs in parallel on Hadoop
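
The query above has map-only tasks emit (feature, weight) pairs and reducers average them per feature. A minimal Python sketch of that train-then-average pattern follows. It assumes a plain SGD logistic-regression loop as a stand-in for what each map task does; `train_partition`, `average_models`, and the toy partitions are hypothetical names and data, not Hivemall APIs.

```python
import math
from collections import defaultdict


def train_partition(rows, eta=0.1, epochs=5):
    """SGD logistic regression over one partition; returns {feature: weight}."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            z = sum(weights[f] * v for f, v in features.items())
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() stable
            pred = 1.0 / (1.0 + math.exp(-z))
            grad = pred - label
            for f, v in features.items():
                weights[f] -= eta * grad * v
    return dict(weights)


def average_models(models):
    """Average weights per feature, like avg(weight) ... GROUP BY feature."""
    sums, counts = defaultdict(float), defaultdict(int)
    for model in models:
        for feature, weight in model.items():
            sums[feature] += weight
            counts[feature] += 1
    return {f: sums[f] / counts[f] for f in sums}


# Two hypothetical partitions of a toy training table.
partition_a = [({"x": 1.0, "bias": 1.0}, 1), ({"x": -1.0, "bias": 1.0}, 0)]
partition_b = [({"x": 2.0, "bias": 1.0}, 1), ({"x": -2.0, "bias": 1.0}, 0)]

model = average_models([train_partition(partition_a), train_partition(partition_b)])
```

Each feature is averaged only over the partitions in which it appears, which is what a per-feature `avg` with `GROUP BY` computes.
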
![Page 6: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/6.jpg)
Industry use cases of Hivemall
➢ CTR prediction of Ad click logs
• Freakout Inc., Fan Communications, and more
• Replaced Spark MLlib w/ Hivemall at company X
http://www.slideshare.net/masakazusano75/sano-hmm-20150512
![Page 7: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/7.jpg)
Industry use cases of Hivemall
➢ Gender prediction of Ad click logs
• Scaleout Inc. and Fan Communications
http://eventdots.jp/eventreport/458208
![Page 8: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/8.jpg)
Industry use cases of Hivemall
➢ Value prediction of Real estates
• Livesense
http://www.slideshare.net/y-ken/real-estate-tech-with-hivemall
![Page 9: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/9.jpg)
Industry use cases of Hivemall
➢ Churn Detection
• OISIX
http://www.slideshare.net/TaisukeFukawa/hivemall-meetup-vol2-oisix
![Page 10: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/10.jpg)
Agenda
1. Short Introduction to Hivemall
   ✓ Hivemall use-cases
2. Recent Updates
3. Roadmap of Hivemall
   ✓ coming new features
![Page 11: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/11.jpg)
Recent Releases
v0.4.2-rc.2
➢ Released on 2016/06/28
➢ minor hotfixes
➢ The latest release
![Page 12: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/12.jpg)
Recent Releases
v0.4.2-rc.1
➢ Released on 2016/06/07
➢ Hivemall on Spark v1.6
➢ Kudos to @maropu
➢ BPR-MF (Matrix Factorization for Implicit Feedback)
![Page 13: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/13.jpg)
Hivemall on Apache Spark
Installation is very easy, as follows:
$ spark-shell --packages maropu:hivemall-spark:0.0.6
![Page 14: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/14.jpg)
Feature Hashing
Frequently used technique to deal with high-dimensional data
(Figure labels: high-dimensional, low-dimensional)
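
As an illustration of the idea, the sketch below maps arbitrary feature names into a fixed number of buckets. It is a minimal sketch: CRC32 is used only because it ships with Python and is an assumption here, not necessarily the hash Hivemall uses, and `hash_features` and the sample row are hypothetical.

```python
import zlib


def hash_features(features, num_buckets=2 ** 10):
    """Map {name: value} to {bucket_index: value} using a stable hash.

    Features that collide are summed into the same bucket.
    """
    hashed = {}
    for name, value in features.items():
        index = zlib.crc32(name.encode("utf-8")) % num_buckets
        hashed[index] = hashed.get(index, 0.0) + value
    return hashed


# A hypothetical sparse row with high-cardinality feature names.
sparse_row = {"user_id#1234": 1.0, "country#JP": 1.0, "age": 0.3}
hashed_row = hash_features(sparse_row, num_buckets=16)
```

The model then needs one weight per bucket instead of one per distinct feature name, at the cost of occasional collisions.
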
![Page 15: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/15.jpg)
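Kernel trick
Mapping to a higher dimension
(Figure labels: Input Feature Space, Mapped Feature Space)
Draw a hyperplane in the higher-dimensional space
Non-linear separation is achieved in the lower-dimensional space
For two-dimensional features [a, b], the degree-2 polynomial features are [(1,) a, b, a^2, ab, b^2].
(Figure labels: high-dimensional, low-dimensional)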
![Page 16: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/16.jpg)
Polynomial Expansion
![Page 17: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/17.jpg)
Polynomial Expansion
b^b:1.0 and b^b^b:1.0 are omitted w/ truncate option
a^a:0.25 and c^c:0.09 are omitted w/ interaction-only option
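
The expansion described on these slides can be sketched in Python as follows. This is a minimal sketch, not Hivemall's implementation: `expand_polynomial` is a hypothetical helper, the `^` naming follows the slide's notation, the input values a=0.5, b=1.0, c=0.3 are inferred from the squares quoted above, and the truncate option is not modeled.

```python
from itertools import combinations_with_replacement


def expand_polynomial(features, degree=2, interaction_only=False):
    """Expand {name: value} with products of up to `degree` features.

    Expanded names join the factors with '^', e.g. 'a^b' or 'a^a'.
    With interaction_only=True, terms that repeat a feature are skipped.
    """
    names = sorted(features)
    expanded = dict(features)
    for d in range(2, degree + 1):
        for combo in combinations_with_replacement(names, d):
            if interaction_only and len(set(combo)) < d:
                continue
            value = 1.0
            for name in combo:
                value *= features[name]
            expanded["^".join(combo)] = value
    return expanded


row = {"a": 0.5, "b": 1.0, "c": 0.3}  # values inferred from the slide
full = expand_polynomial(row, degree=2)
interactions = expand_polynomial(row, degree=2, interaction_only=True)
```

With `interaction_only=True` the squared terms such as `a^a` and `c^c` drop out and only cross terms such as `a^c` remain.
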
![Page 18: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/18.jpg)
Feature Vector formatter Functions
Quantitative variables are formatted as "column_name:value" and categorical variables as "column_name#value"
Note that no feature is generated for a null value or a weight of 0.0
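
The formatting rule above can be sketched as follows. `format_quantitative` and `format_categorical` are hypothetical Python helpers written for illustration, not the Hivemall functions themselves, and the column names are made up.

```python
def format_quantitative(names, values):
    """Format numeric columns as 'name:value', skipping nulls and 0.0."""
    return [
        f"{name}:{value}"
        for name, value in zip(names, values)
        if value is not None and float(value) != 0.0
    ]


def format_categorical(names, values):
    """Format categorical columns as 'name#value', skipping nulls."""
    return [
        f"{name}#{value}"
        for name, value in zip(names, values)
        if value is not None
    ]


# One hypothetical row: height is null and score is 0.0, so both are dropped.
feature_vector = (
    format_quantitative(["age", "height", "score"], [25, None, 0.0])
    + format_categorical(["gender", "country"], ["male", None])
)
```
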
![Page 19: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/19.jpg)
Mini-batch Gradient Descent
Caution: Mini-batch generally requires more iterations than SGD
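
One way to see the caution is to count weight updates: a mini-batch pass makes one update per batch, while SGD makes one per row. The sketch below fits a toy linear model under that assumption; `minibatch_gd`, `mse`, and the data are hypothetical and not taken from Hivemall.

```python
def minibatch_gd(rows, batch_size, eta=0.1, epochs=50):
    """Fit y = w * x + b by averaging gradients over each mini-batch.

    batch_size=1 reduces to plain SGD (one update per row).
    Returns (w, b, number_of_updates).
    """
    w, b = 0.0, 0.0
    updates = 0
    for _ in range(epochs):
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            w -= eta * grad_w
            b -= eta * grad_b
            updates += 1
    return w, b, updates


def mse(rows, w, b):
    """Mean squared error of the fitted line on `rows`."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


# Toy data generated from y = 2x + 1.
rows = [(i / 8.0, 2.0 * (i / 8.0) + 1.0) for i in range(8)]
sgd_w, sgd_b, sgd_updates = minibatch_gd(rows, batch_size=1)
mb_w, mb_b, mb_updates = minibatch_gd(rows, batch_size=4)
```

For the same number of passes over the data, the mini-batch run performs a quarter of the updates here, so it typically needs more passes to reach a comparable error.
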
![Page 20: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/20.jpg)
Japanese Tokenizer using Kuromoji
This feature was requested by a Treasure Data customer
Thanks for providing a reference implementation to us (company R)
![Page 21: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/21.jpg)
Agenda
1. Short Introduction to Hivemall
   ✓ Hivemall use-cases
2. Recent Updates
3. Roadmap of Hivemall
   ✓ coming new features
![Page 22: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/22.jpg)
Important Announcement
Hivemall will become Apache Hivemall (?)
The vote is still in progress, though.
![Page 23: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/23.jpg)
Apache Incubation status
![Page 24: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/24.jpg)
Initial committers
• Makoto Yui <Treasure Data>
• Takeshi Yamamuro <NTT>
  ➢ Hivemall on Apache Spark
• Daniel Dai <Hortonworks>
  ➢ Hivemall on Apache Pig
  ➢ Apache Pig PMC member
• Tsuyoshi Ozawa <NTT>
  ➢ Apache Hadoop PMC member
• Kai Sasaki <Treasure Data>
![Page 25: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/25.jpg)
Project mentors
Champion
Nominated Mentors
• Reynold Xin <Databricks, ASF member>, Apache Spark PMC member
• Markus Weimer <Microsoft, ASF member>, Apache REEF PMC member
• Xiangrui Meng <Databricks, ASF member>, Apache Spark PMC member
• Roman Shaposhnik <Pivotal, ASF member>, Apache Bigtop/Incubator PMC member
![Page 26: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/26.jpg)
Roadmap
• Possibly enter Apache Incubator in Sept, 2016
• IP clearance and project/repository site setup
• Contribution guideline
• Create a "who uses Hivemall" list
• More documentation!
(Sept to Nov)
• Initial Apache Release: Dec (or late Nov?)
• v0.5
• Non-Apache release of v0.5-beta.xx will be released on GitHub in Oct
![Page 27: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/27.jpg)
Coming New Features - already merged in Master
✓ Hivemall on Spark 2.0 w/ DataFrame support
  • Kudos to @maropu
✓ ChangeFinder
  • Change Point and Anomaly Detection
  • Kudos to @L3sota @takuti
  • PR#333
✓ XGBoost support
  • Kudos to @maropu
![Page 28: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/28.jpg)
Coming New Features - already merged in Master
✓ ChangeFinder
cf_detect(array<double> x [, const string options])
J. Takeuchi and K. Yamanishi, “A Unifying Framework for Detecting Outliers and Change Points from Time Series,” IEEE Transactions on Knowledge and Data Engineering, pp. 482-492, 2006.
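
The two-stage structure of ChangeFinder (score each point, smooth the scores, then score the smoothed series again) can be sketched as below. This is a deliberately simplified stand-in: it replaces the autoregressive model of the cited paper with a discounted Gaussian estimate, and `DiscountedGaussian`, `detect`, and every parameter are hypothetical, not the `cf_detect` implementation.

```python
import math


class DiscountedGaussian:
    """Online mean/variance with exponential forgetting (rate r)."""

    def __init__(self, r=0.1, min_var=1e-3):
        self.r, self.min_var = r, min_var
        self.mean, self.var = 0.0, 1.0

    def score_then_update(self, x):
        # Score: negative log-likelihood of x under the current estimate.
        var = max(self.var, self.min_var)
        score = 0.5 * math.log(2 * math.pi * var) + (x - self.mean) ** 2 / (2 * var)
        # Update: discount old statistics and blend in the new observation.
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.var = (1 - self.r) * self.var + self.r * (x - self.mean) ** 2
        return score


def moving_average(values, window):
    """Trailing moving average; early points use a shorter window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def detect(series, r=0.1, window=5):
    """Return (outlier_scores, change_scores) for a 1-D series."""
    stage1 = DiscountedGaussian(r)
    outlier = [stage1.score_then_update(x) for x in series]
    smoothed = moving_average(outlier, window)
    stage2 = DiscountedGaussian(r)
    change = moving_average([stage2.score_then_update(y) for y in smoothed], window)
    return outlier, change


# Toy series: small oscillation around 0, then a level shift to about 5.
series = [0.1 if i % 2 else -0.1 for i in range(50)]
series += [5.1 if i % 2 else 4.9 for i in range(50)]
outlier, change = detect(series)
```

On this toy series the outlier score peaks at the first shifted point, and the smoothed change score peaks within one window after it.
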
![Page 29: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/29.jpg)
Coming New Features - already merged in Master
✓ ChangeFinder
cf_detect(array<double> x [, const string options])
![Page 30: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/30.jpg)
Coming New Features - already merged in Master
✓ Various Evaluation Metrics
  • Kudos to @takuti, also R2 by Fan-cs, logloss by sakai-san
  • PR#326
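
For reference, the two metrics named on this slide follow their standard textbook definitions, sketched below. The function names `logloss` and `r2` are chosen here for illustration and are not necessarily the names or signatures used in Hivemall.

```python
import math


def logloss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood for binary labels in {0, 1}."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect regression gives an R2 of 1.0, and predicting the mean for every row gives 0.0; a classifier that always outputs 0.5 has a logloss of ln 2.
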
![Page 31: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/31.jpg)
Coming New Features - already merged in Master
✓ Feature Binning
  • Kudos to @amaya382 on PR#382
  • Maps quantitative variables to bins
Age (quantitative variable) is mapped into a meaningful bin (categorical variable) based on quantiles
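
Quantile-based binning of an age column can be sketched as follows. This is a minimal sketch assuming a simple equal-frequency rule for the cut points; `quantile_boundaries`, `assign_bin`, and the sample ages are hypothetical and do not reflect the API added in the PR.

```python
import bisect


def quantile_boundaries(values, num_bins):
    """Return num_bins - 1 cut points that split the sorted values evenly."""
    ordered = sorted(values)
    return [ordered[k * len(ordered) // num_bins] for k in range(1, num_bins)]


def assign_bin(value, cuts):
    """Return the 0-based index of the bin that `value` falls into."""
    return bisect.bisect_right(cuts, value)


ages = [18, 22, 25, 31, 38, 44, 52, 67]  # hypothetical sample
cuts = quantile_boundaries(ages, num_bins=4)
age_bins = [assign_bin(age, cuts) for age in ages]
```

The resulting bin index can then be treated as a categorical feature in place of the raw age.
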
![Page 32: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/32.jpg)
Other ongoing new features
• v0.5-beta{1,2} release (Oct-Nov)
  ✓ System test framework
    ✓ Kudos to @amaya382
  ✓ one-hot encoding
    ✓ Kudos to @kai
  ✓ Field-aware Factorization Machines
  ✓ Kernelized Passive Aggressive
    ✓ Kudos to @L3sota
  ✓ Generalized Linear Model
    ✓ Optimizer framework including ADAM
    ✓ L1/L2 regularization
    ✓ Kudos to @maropu
  ✓ Disk-based iteration support
    ✓ To avoid an overly large amplify factor
  ✓ Gradient Tree Boosting
  ✓ Online LDA
![Page 33: 3rd Hivemall meetup](https://reader031.vdocuments.net/reader031/viewer/2022021919/586fdeb61a28ab18428b6d0b/html5/thumbnails/33.jpg)
We support machine learning in Cloud
Any feature request? Or, questions?
bit.ly/td-wants-you