Implementing a GPU-based Machine Learning Library
on Apache Spark
James Jia, Pradeep Kalipatnapu, Richard Chiou, Yiheng Yang, John F. Canny, Ed.
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2016-51
http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-51.html
May 11, 2016
Copyright © 2016, by the author(s). All rights reserved.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Table of Contents

1.1 Problem Definition
1.2 Task Breakdown
1.3 Setting up EC2
1.4 Supporting Reading and Writing from HDFS
1.5 Parallelizing BIDData
1.5.1 BIDData Architecture Overview
1.5.2 Extending the Learner
1.5.3 API for Distributed Machine Learning
1.5.4 Implementing Distributed KMeans
1.5.5 KMeans Benchmarks
2.1 Introduction
2.2 Trends and Market
2.3 Industry Analysis
2.3.1 Value Chain Analysis
2.3.2 Porter's Five Forces Analysis
2.3.3 Threat of Substitutes
2.3.4 Bargaining Power of Suppliers
2.3.5 Bargaining Power of Consumers
2.3.6 Threat of New Entrants
2.3.7 Competitive Rivalry
2.4 Go-to-Market Strategy
Bibliography
1.1 Problem Definition

Our capstone project is to port BIDData onto Apache Spark, an open-source engine used for large-scale data processing, so real-world developers at companies such as Yahoo, Twitter, and Facebook can leverage the 10x performance and cost benefits of GPU-accelerated machines at a large scale for machine learning tasks. BIDData is a GPU-optimized machine learning library that runs on a single machine, yet outperforms setups that run on hundreds of multi-core CPU machines by exploiting the parallel structure of many machine learning algorithms. In fact, it holds the benchmark for a plethora of common algorithms such as k-Nearest Neighbors, Random Forests, and Latent Dirichlet Allocation (Canny 2015).
System    nodes/cores   nclust   Time     Cost    Energy (KJ)
Spark     32/128        256      180 s    $0.45   1150
BIDMach   1             256      320 s    $0.06   90
Spark     96/384        4096     1100 s   $9.00   22000
BIDMach   1             4096     735 s    $0.12   140

Running KMeans on the MNIST-8M (25 GB) dataset (Canny 2015)
However, BIDData running on a single machine can only process data on the scale of hundreds of gigabytes. For comparison, many companies, like Yahoo, process petabytes of data every day (Feng 2013). Developers who want to scale up BIDData to match their data processing needs have to handle all of the problems that arise when creating a scalable, distributed platform, such as incompatible library dependencies and data storage and synchronization issues with their database files. By integrating BIDData into Apache Spark, we can eliminate this overhead so developers can focus on performing actual analytics.
1.2 Task Breakdown

We split our project into two major components. Pradeep and Yiheng focused on the first component: creating an automated build management system for running BIDData on multiple platforms. Currently, developers who want to use BIDData to run machine learning algorithms on the GPU need to build the libraries themselves. This represents a huge barrier to entry, as many alternative machine learning libraries have released pre-built versions that simplify the deployment process. However, as Pradeep and Yiheng will discuss in greater detail in their reports, establishing an automated build management system for BIDData is complicated by the fact that we are supporting multiple different host architectures, and each of these architectures has distinct native libraries upon which BIDData depends. Meanwhile, Richard and I worked on the second component: porting BIDData on top of Apache Spark and the Hadoop Distributed File System (HDFS). Since BIDData will now be reading from and writing to HDFS instead of simply writing to disk, we needed to extend BIDData's existing API to support this new requirement. Furthermore, we also needed to implement several parallel machine learning algorithms in order to document the performance of running BIDData on top of Spark and compare it against other alternatives. Richard worked on implementing distributed logistic regression, while I worked on extending support for reading and writing from HDFS as well as implementing the distributed K-Means algorithm.
1.3 Setting up EC2

My first priority was to set up a sample distributed system as a mock environment that we could use for future development. Without such an environment, we would not be able to benchmark the parallel variants of common machine learning algorithms against their serial versions. We chose Amazon EC2, as it is the predominant cloud computing platform used in both academia and industry to power distributed computation. This entailed setting up Apache Spark on the master and slave nodes with the appropriate configuration and hooking Spark up to the Hadoop Distributed File System (HDFS) in which we house our data.

The BIDData libraries and Apache Spark both depend on native libraries that are specific to the machine on which one is running. This meant that running the BIDData libraries on top of Spark requires additional configuration steps beyond downloading and building Spark and Hadoop on the EC2 instances: we needed a mechanism to specify all the library modules that BIDData requires to be distributed across all the nodes in our cluster. Traditionally, users who want to run BIDData on a single machine run a shell script that downloads all the library dependencies. The Spark guide on configuration and deployment recommended adding the appropriate library paths to the default configuration file located at SPARK_HOME/conf/spark-defaults.conf, so I created a shell script that deploys the libraries from the master node to the slave nodes and included the file path in the configuration file on all nodes. Although this solution is not robust to changes in library dependencies from version upgrades in Spark or BIDData, it removed the bottleneck that prevented us from parallelizing our work distribution. Since we also want to allow developers to build Spark with BIDData for themselves, I documented the deployment process as well as the command line invocations to import the BIDData libraries when running Spark as a standalone cluster.
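As an illustration, entries of the following form in spark-defaults.conf point the driver and executor JVMs at native libraries and jars on each node. The property keys are standard Spark configuration options; the paths shown are hypothetical, not the ones our deployment script actually installs.

```properties
# Hypothetical locations -- substitute the paths your deployment
# script copies to every node in the cluster.
spark.driver.extraLibraryPath    /opt/biddata/lib
spark.executor.extraLibraryPath  /opt/biddata/lib
spark.driver.extraClassPath      /opt/biddata/BIDMach.jar
spark.executor.extraClassPath    /opt/biddata/BIDMach.jar
```

Because these entries must be present on every node, the deployment script has to push both the libraries and the configuration file out to the slaves, which is exactly what the shell script described above does.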
1.4 Supporting Reading and Writing from HDFS

BIDData has its own specialized matrix data types that characterize the different primitives that are stored within the matrix as well as the representation of the matrix itself (sparse or dense). For brevity, I will detail only the process for saving a matrix to disk, as the abstractions for saving and loading matrices are the same. Currently, to save a matrix to disk, the BIDMach API exposes an overloaded saveMat function that, at runtime, calls the appropriate save function based on the matrix type, writing it to an OutputStream. In order to support reading and writing from the Hadoop Distributed File System, we needed to revamp the matrix I/O routines to be based on the DataInputStream and DataOutputStream classes rather than their current generic variants, and to implement a wrapper over these custom matrix data types in order to allow Hadoop to serialize the data for transmission across the network.

To update the matrix I/O routines, Professor Canny rewrote the layer that writes the matrix data to disk to operate on DataOutputStreams instead of OutputStreams. This ensures that the underlying data is formatted in a platform-independent way and adheres to HDFS's abstraction for I/O. I then added a middle layer that checks the matrix's file path. If the path is addressed to the HDFS, I invoke the new HDFS function to wrap the matrix in a serializable format for Hadoop.
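A minimal sketch of this layering, using hypothetical names (writeFMat, saveMat) rather than BIDData's actual saveMat internals: the matrix is serialized through a DataOutputStream so the byte layout is platform-independent, and a middle layer inspects the path, routing hdfs:// paths toward Hadoop's FSDataOutputStream (which is itself a DataOutputStream) while local paths fall back to disk.

```scala
import java.io.{BufferedOutputStream, DataInputStream, DataOutputStream, FileOutputStream}

// Hypothetical sketch of the I/O layering described above -- not BIDData's
// actual code. A dense Float matrix is written as (nrows, ncols, values),
// entirely through DataOutputStream calls.
object MatIOSketch {
  def writeFMat(out: DataOutputStream, nrows: Int, ncols: Int, data: Array[Float]): Unit = {
    out.writeInt(nrows)
    out.writeInt(ncols)
    data.foreach(v => out.writeFloat(v))
    out.flush()
  }

  def readFMat(in: DataInputStream): (Int, Int, Array[Float]) = {
    val nrows = in.readInt()
    val ncols = in.readInt()
    (nrows, ncols, Array.fill(nrows * ncols)(in.readFloat()))
  }

  // Middle layer: dispatch on the file path. An hdfs:// path would be
  // opened through Hadoop's FileSystem API, whose FSDataOutputStream is
  // a DataOutputStream subclass; anything else goes to local disk.
  def saveMat(path: String, nrows: Int, ncols: Int, data: Array[Float]): Unit =
    if (path.startsWith("hdfs://")) {
      sys.error("HDFS branch elided in this sketch")
    } else {
      val out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path)))
      try writeFMat(out, nrows, ncols, data) finally out.close()
    }
}
```

Because both branches ultimately hand a DataOutputStream to the same write routine, the serialization logic is written once and the storage backend is chosen purely by path.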
1.5 Parallelizing BIDData

1.5.1 BIDData Architecture Overview

Within the BIDData architecture, the Model class is an abstraction that implements the specific machine learning algorithm. For example, there is a KMeans model that implements the KMeans model update and prediction methods. The Datasource and Datasink classes encapsulate the logic for reading from and writing to the various data formats that we support. Professor Canny created a datasource called IteratorSource designed to work with the Iterator class provided by Spark. Finally, the Learner class orchestrates the flow of model training and prediction: it iteratively updates the model and outputs the predictions to the datasink. The learner also has a reference to an options class, which, as the name implies, stores the configurations that the developer provides. The developer creates an instance of a learner, passing in the datasource, the datasink, and the model that is appropriate for their machine learning task, trains the learner using a subset of the data, and makes predictions with the remaining data (Canny 2015).
1.5.2 Extending the Learner

The Learner class previously assumed that the model is run on a single machine, and thus runs through the iterations of training all at once upon invocation.

val (mm, opts) = KMeans.learner(data_path)
mm.train
val (pp, popts) = KMeans.predictor(mm.model, test_data_path)
pp.predict

Sample code for running KMeans on a single machine (Canny 2015)

However, when running BIDData on a distributed platform, we need to synchronize the models within the learners that are running on the separate executors after each pass through the dataset.
I abstracted away the logic that handles one pass through the dataset into two functions, firstPass and nextPass, and instead have train call the appropriate pass function based on the current iteration index. This finer-grained control over the learner logic lets the distributed learner perform model synchronization after each pass through the dataset, while still allowing the single-machine variant to iterate through all passes at once. The distinction between firstPass and nextPass stems from the fact that, in certain algorithms, the first pass through the dataset is fundamentally distinct from the remaining passes. As an example, the K-Means algorithm instantiates the centroid clusters in the first pass, whereas the remaining passes improve the centroids.
Comparison between iterative training and distributed training
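The split described above can be sketched as follows; the class and member names here are illustrative, not BIDData's actual implementation.

```scala
// Hedged sketch, not BIDData's code: train() is factored so a caller can
// drive one pass at a time, which is what lets a distributed driver
// synchronize models between passes.
abstract class LearnerSketch {
  var ipass = 0                 // index of the current pass over the dataset

  def firstPass(): Unit         // e.g. KMeans: sample the initial centroids
  def nextPass(): Unit          // e.g. KMeans: refine the centroids

  // Run exactly one pass; a distributed driver calls this once per epoch
  // and performs model synchronization between calls.
  def runPass(): Unit = {
    if (ipass == 0) firstPass() else nextPass()
    ipass += 1
  }

  // The single-machine variant still iterates through all passes at once.
  def train(npasses: Int): Unit = (0 until npasses).foreach(_ => runPass())
}
```

With this factoring, the single-machine train is just a loop over runPass, so both variants share the same pass logic.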
1.5.3 API for Distributed Machine Learning

We distribute our model and our data across the executors. One benefit of this approach is that we can train models that do not fit into the memory of a single GPU. However, it also brings the additional challenge of needing to synchronize the model after each pass through the dataset. At a high level, we want to create a learner for each executor, iterate through the data that exists locally on that executor on each pass, and synchronize the learners' models between passes. Although we later want to utilize Kylix, a butterfly all-reduce communication network, to synchronize the model for optimal network performance, I currently use Spark's implementation of treeReduce as a baseline for comparison.
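The shape of that reduction can be simulated without Spark. The sketch below (plain Scala, hypothetical names) reduces per-executor models pairwise, level by level, which is the same tree-shaped computation treeReduce performs; a weighted average stands in for a model-specific combine step.

```scala
// Illustrative only -- neither Kylix nor Spark internals. A "model" is a
// flat parameter array paired with the number of data points it has seen,
// so the combine step can weight each side appropriately.
object TreeReduceSketch {
  type Model = (Array[Float], Long) // (parameters, points processed)

  def combine(a: Model, b: Model): Model = {
    val (pa, na) = a
    val (pb, nb) = b
    val n = na + nb
    // Weighted average of parameters, weight = points processed.
    val p = pa.zip(pb).map { case (x, y) => (x * na + y * nb) / n }
    (p, n)
  }

  // Reduce models pairwise, level by level: log2(k) rounds for k models,
  // the same round structure a tree reduce over k executors would have.
  def treeReduce(models: Seq[Model]): Model =
    if (models.size == 1) models.head
    else treeReduce(models.grouped(2).map {
      case Seq(a, b) => combine(a, b)
      case Seq(a)    => a
    }.toSeq)
}
```

A butterfly all-reduce like Kylix improves on this by keeping every node busy in every round and leaving the result on all nodes, rather than funneling traffic toward a single root.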
I created an application programming interface (API) that abstracts away the implementation details for running BIDData on top of Spark. The API takes in several parameters: the Spark context of class SparkContext, the learner of class Learner, the data of class RDD[(SerText, MatIO)] on which to run the learner, and the number of executors in the cluster. The SparkContext variable is required to operate on the RDD abstraction, the learner and data are necessary to run the machine learning algorithm on the dataset, and the number of executors is passed in so the program can load balance across the executors.
1.5.4 Implementing Distributed KMeans

To test the framework for running machine learning algorithms on Spark using BIDData, I implemented a distributed variant of the existing KMeans algorithm in the BIDData machine learning library. This required adding a general reduction operator, combineModels, to the Model class, which the KMeans model overrides with its model-specific reduction operation. Since the first iteration of a model update may be different from the other iterations, the function also takes in the iteration index.

In the first iteration of distributed KMeans, we want to randomly sample n clusters from our dataset with equal probability. I do so by counting the number of data points that have been processed by each model and sampling n clusters from the two models' clusters with probability proportional to the number of processed data points. In the other iterations of model reduction, I add up all the centers' features and average them post-reduction.
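A sketch of that two-case reduction, with hypothetical names (the real combineModels lives on BIDData's Model class): each side contributes its candidate centroids plus the number of data points it has processed.

```scala
import scala.util.Random

// Illustrative sketch of the combineModels reduction described above.
object KMeansCombineSketch {
  type Centroids = Array[Array[Float]]

  // First pass: keep n centroids, drawing each slot from side A with
  // probability proportional to the points A processed, so every data
  // point had an equal chance of seeding a centroid.
  def firstPassCombine(a: Centroids, na: Long,
                       b: Centroids, nb: Long,
                       n: Int, rng: Random): Centroids = {
    val pA = na.toDouble / (na + nb)
    Array.fill(n) {
      if (rng.nextDouble() < pA) a(rng.nextInt(a.length))
      else b(rng.nextInt(b.length))
    }
  }

  // Later passes: element-wise sum of the centers' features; the caller
  // divides by the number of models once the reduce is complete.
  def laterPassCombine(a: Centroids, b: Centroids): Centroids =
    a.zip(b).map { case (ca, cb) =>
      ca.zip(cb).map { case (x, y) => x + y }
    }
}
```

Summing during the reduce and averaging once at the end keeps the combine operator associative, which is what a tree-shaped reduction requires.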
1.5.5 KMeans Benchmarks

256 clusters, batch size of 10,000, 10 epochs

                                        Total Time       Per Epoch
Sequential BIDMach                      349 seconds      37 seconds
Distributed BIDMach with 16 clusters    45.6 seconds     1.7 seconds
Spark with 16 clusters                  263.1 seconds    26 seconds

5000 clusters, batch size of 10,000, 10 epochs

                                        Total Time       Per Epoch
Sequential BIDMach                      1605 seconds     169.2 seconds
Distributed BIDMach with 16 clusters    235 seconds      14.4 seconds
Spark with 16 clusters                  5779 seconds     576 seconds
These benchmarks were obtained from runs on AWS g2.2xlarge instances, each with one NVIDIA GPU with 1,536 CUDA cores and eight Intel Xeon processors (Amazon 2016). The benchmarks show that running BIDMach on Spark vastly outperforms both BIDMach running on a single machine and Spark running on the same EC2 setup. As expected, there is some overhead for reducing the model at the end of each epoch and redistributing the updated model back to the workers at the beginning of the following epoch.
Chapter 2: Engineering Leadership
James Jia
Pradeep Kalipatnapu
Richard Chiou
Yiheng Yang
2.1 Introduction

As data storage becomes increasingly commoditized, companies are collecting transactional records on the order of several petabytes, beyond the ability of typical database software tools to store and analyze. Analysis of this "big data" can yield business insights such as customer preferences and market trends, which can bring companies benefits such as new revenue opportunities and improved operational efficiency (Manyika et al. 2011). To analyze this big data, companies typically build or use third-party machine learning (ML) tools.
Professor John Canny of UC Berkeley's EECS department has developed BIDData, a set of GPU-based ML libraries that is capable of completing common ML tasks an order of magnitude faster than rival technologies when run on a single machine. However, most "big data" tasks require greater computational power and storage space than that offered by a single machine.

Our capstone project integrates BIDData with Apache Spark, a fast, open-source big data processing framework with rapid adoption by the software industry. Integration of BIDData with Spark will enable more complex big data analysis and generate time, energy, and cost savings.
2.2 Trends and Market

The market opportunity for big data is astronomical. Due to the convenience of consumer electronics, more and more daily services such as banking and shopping are being conducted online. These transactions generate a gargantuan amount of user and market data that companies can benefit from. As an example, the social networking and search engine industries often use machine learning to increase their profits on selling advertising space to various companies, optimizing the pricing model for each ad spot as well as determining the best advertisement placement to maximize the likelihood of user clicks (Kahn 2014: 7). Determining the optimal pricing and placement strategy is especially pivotal to Google and Yahoo's operations, since most of these ad spaces are paid for through a pay-per-click model and constituted 98.8% of the industry's total revenue of $11.2 billion in 2014 (Kahn 2014: 14).
Processing gigantic amounts of information within sub-second time intervals is computationally demanding. Modern central processing units (CPUs) can process data at 1 billion floating point operations per second, but typical big data sets are on the order of quadrillions of bytes (Intel 2016). Traditional CPUs would take hours to process a single big data set, and will become increasingly inefficient as data set sizes continue to grow. Thus, there exists a great opportunity for companies that focus on cost-efficient data analytics tools which provide easy integration and process data quickly. For instance, the McKinsey Global Institute estimates that retailers that use efficient big data tools could increase their operating margins by more than 60 percent (Manyika et al. 2011).
Recent technology trends show that, in order to accelerate the speed of big data ML algorithms, CPU technology is being replaced by the graphics processing unit (GPU). Because GPUs are optimized for mathematical operations, they are orders of magnitude faster than CPUs on tasks related to big data analysis, and both industry and academia are moving towards using GPUs (Lopes and Ribeiro 2010). As a GPU-based ML library, BIDData also offers numerous improvements compared to rival technologies. In terms of single-machine processing speed and electricity expenditure, BIDData already beats the performance of other competitors, including distributed ML libraries, by an order of magnitude, assuming the dataset can be housed on a single machine (Canny 2015).
However, single-instance ML libraries such as BIDData are currently not widely used in industry, as they lack the scalability to handle the increasing sizes and complexities of big data sets. Instead, most companies are turning towards efficient distributed computing platforms to process the data (Low et al. 2012). Thus, our capstone project aims to integrate BIDData with the distributed framework of Spark, a leading machine learning platform in the market today. Moreover, the project also incorporates Amazon Web Services for big data storage, another platform which many companies are already leveraging today (Amazon 2016). By using this underlying infrastructure, BIDData is more likely to be viewed as a highly desirable big data analysis tool by the market.
2.3 Industry Analysis

2.3.1 Value Chain Analysis

Figure 1: The above value chain covers how users and companies alike benefit from big data. As consumers use products, technology companies are able to collect data on their habits and preferences. Analytics firms and third-party tools process this data and sell their findings. Ultimately, big data analytics can lead to improved products and better user experiences.
In the big data industry, firms process user data to gain insights into customer habits and preferences. The statistics and trends that they discover can be used to refine existing products and services or be sold to advertisers. For example, users either buy a product (e.g. FitBit) or sign up to use a free ad-supported product (e.g. Gmail) from various tech companies. These companies collect data from their users based on their usage patterns: FitBit provides anonymized, aggregated data for research purposes, and Gmail provides relevant information on users to the Google Ads team. This data is passed on to organizations that specialize in big data analysis. After obtaining insights on the data, these organizations then sell their findings back to the tech companies or to firms such as advertisers. Ultimately, big data analytics can be used to improve products and user experiences for consumers.
While these big data analysis companies have access to lots of data, they may not have the understanding or the resources to create every appropriate tool for analyzing all of their big data, which can come in various formats. Subsequently, these big data analysis organizations must turn to third-party tools to process some of their data. Because its machine learning libraries record the best possible benchmarks amongst their peers, BIDData has a strong case for becoming one of these leading third-party tools. Subsequently, a startup with expertise in BIDData on Spark could act as both a supplier and a consultant for these big data organizations.
2.3.2 Porter's Five Forces Analysis

According to Michael Porter, all companies face five forces of competition within their industry. Despite its benchmark-leading performance, BIDData faces potential competition from existing firms and future entrants in the big data industry.
2.3.3 Threat of Substitutes

Current machine learning libraries mostly run on CPUs, but as discussed in the Trends and Markets section, the industry has shifted away from them. As evidenced by Netflix's recent switch to using GPUs, the industry has noticed that GPU-based software solutions record higher benchmarks than traditional CPU-based tools (Morgan 2014). Thus, in the long term, the threat of substitutes is relatively weak, as CPU-based software is gradually replaced.
2.3.4 Bargaining Power of Suppliers

Typically, programmers are the main individuals involved in the creation of software libraries. Currently, BIDData is open source, allowing any interested independent programmers to collaborate and make unpaid contributions to the project. Thus, the bargaining power of suppliers with regard to wages is extremely weak. As an additional benefit, the open-source model can potentially lead to high-quality products at a fraction of the cost (Weber 2004).
2.3.5 Bargaining Power of Consumers

On the other hand, the bargaining power of consumers in this space is strong. Although lots of research on GPU-based ML techniques has been conducted in recent years, most computers used by consumers and companies alike still only use CPUs. Subsequently, customers are more likely to choose from a wide variety of CPU-based ML tools. Moreover, the biggest and most profitable customers (e.g. Google) have the resources to create their own data analysis tools for internal use, which can further depress demand for third-party machine learning tools in this space. While newer computers come with GPUs, it may take some time before users and companies fully invest in and transition to GPU-based ML tools.
2.3.6 Threat of New Entrants

In general, the software industry has an extremely low barrier to entry, as it only takes a single programmer with one computer to create fully functional software. Additionally, software undergoes lots of iterations quickly. Although BIDData's underlying GPU technology is still relatively new, startups and existing firms alike are growing more interested in GPU-based solutions. Subsequently, the industry faces a very strong threat from new entrants.
2.3.7 Competitive Rivalry

Because big data comes from a variety of sources and in a multitude of formats, consumers require many different ML techniques for analysis. If BIDData does not contain an implementation of a specific ML algorithm, consumers could simply use another tool that offers it. As a result, there exists an intense feature-based rivalry in the third-party tools space, in which several firms offer a multitude of services. For instance, one firm may specialize in data classification, while another firm may provide a product optimized for data regression. Nonetheless, this feature-based rivalry could allow more players to co-exist in this space.
2.4 Go-to-Market Strategy

Although GPU acceleration has been identified as a promising development, it is largely still in its infancy and has not seen widespread adoption across multiple sectors. Rather than focusing on profitability as in traditional models, we will focus on a strategy that helps us gain market share. To do so, we are going to target a particular subset of companies that have big data problems: specifically, companies that have the ability to collect large amounts of data, but not necessarily access to the computational power or resources to obtain business intelligence from it. As we have discussed earlier, Fitbit is a prime example: they have a great capacity to gather information from their devices as an auxiliary effect of their product, but it would require an extraordinary amount of technical expertise as well as infrastructure to sift through the vast sea of data.

With this in mind, our go-to-market strategy has three main emphases. First, we are utilizing a "plug and play" model that emphasizes fluid software integration to encourage early adoption. Second, since we are also going to remain open source, this will inspire developers to contribute back to our code base; the stronger advocates could also serve to evangelize within their companies, giving us stronger leverage over our competitors. Lastly, the nature of our product is inherently scalable: once we have written code ready for production, the cost to have an additional developer use our code base is negligible.
While customers may cite a lack of technical support as an obstacle to adoption, there exists an opportunity for startups like Databricks to gain customers through their consulting services. Subsequently, we could proactively establish partnerships with companies like Databricks that operate on a consulting model, positioning our solution as the leading GPU-accelerated machine learning library.
Bibliography
Amazon. (n.d.). All AWS Case Studies. Retrieved March 5, 2016, from https://aws.amazon.com/solutions/case-studies/all/
Amazon. (n.d.). EC2 Instance Types. Retrieved May 7, 2016, from https://aws.amazon.com/ec2/instance-types/
Apache. (n.d.). Spark Programming Guide. Retrieved April 14, 2016, from http://spark.apache.org/docs/latest/programming-guide.html
Apache. (n.d.). Clustering - spark.mllib. Retrieved February 9, 2016, from http://spark.apache.org/docs/latest/mllib-clustering.html
Canny, J. F., & Zhao, H. (2013). BIDMach: Large-scale Learning with Zero Memory Allocation. BigLearning. Retrieved from http://biglearn.org/2013/files/papers/biglearning2013_submission_23.pdf
Canny, J. F. (2015, August 19). Benchmarks. Retrieved February 10, 2016, from https://github.com/BIDData/BIDMach/wiki/Benchmarks
Canny, J. F. (2015, September 27). BIDMach Tutorials. Retrieved February 14, 2016, from https://github.com/BIDData/BIDMach/wiki/BIDMach-Tutorials
Intel. (n.d.). 6th Generation Intel® Core™ i7 Processors. Retrieved March 5, 2016, from https://www-ssl.intel.com/content/www/us/en/processors/core/core-i7-processor.html
Kahn, S. (2014). Search Engines in the US. IBISWorld Industry Report 51913a. Retrieved from IBISWorld database.
Lopes, N., Ribeiro, B., & Quintas, R. (2010). GPUMLib: A new Library to combine Machine Learning algorithms with Graphics Processing Units. 2010 10th International Conference on Hybrid Intelligent Systems. doi:10.1109/his.2010.5600028
Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., & Hellerstein, J. M. (2012). Distributed GraphLab. Proceedings of the VLDB Endowment, 5(8), 716-727. doi:10.14778/2212351.2212354
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011, May). Big data: The next frontier for innovation, competition, and productivity (Rep.). Retrieved from http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation
Morgan, T. P. (2014, February 11). Netflix Speeds Machine Learning With Amazon GPUs. Retrieved from http://www.enterprisetech.com/2014/02/11/netflix-speeds-machine-learning-amazon-gpus/
Porter, M. E. (2008). The Five Competitive Forces That Shape Strategy. Harvard Business Review.
Weber, S. (2004). The success of open source. Cambridge, MA: Harvard University Press.