Founding a Hadoop Lab
EVERYTHING YOU ALWAYS WANTED TO KNOW, BUT WERE AFRAID TO ASK, ABOUT FINDING SUCCESS WITH HADOOP IN YOUR ORGANIZATION
© UTILIS TECHNOLOGY LIMITED 2017

A Short Introduction to Your Speaker
My Adventures in Hadoop
  ◦ Led Hadoop adoption at three Canadian banks
  ◦ Established a successful Hadoop COE
  ◦ Advisory roles on Hadoop in finance
My Career in Finance
  ◦ Four banks, one stock exchange, one pension fund
  ◦ Capital markets, retail banking, enterprise risk roles
  ◦ Founder of two IT departments
  ◦ Technology leader in Risk Systems for 15 years:
    ◦ Architect, Enterprise Risk Systems
    ◦ Architect, Front Office Risk Systems
    ◦ Program Manager, Portfolio Management Systems
    ◦ Head of Risk Systems
    ◦ Head of Hadoop COE

Agenda
What role will your Hadoop Lab play?
  ◦ Defining objectives, building a team and forming partnerships
  ◦ Foundational work to set a path to success
What is a reasonable budget?
  ◦ Calculating your “room” based on industry benchmarks
  ◦ Capacity planning, charge-out, and the central capital account
Real-life Lessons Learned
  ◦ Setting up infrastructure to take advantage of Hadoop’s unique properties
  ◦ Creating a practice that fits your users’ work styles
Projects that Succeed
  ◦ Ideas for a quick win to keep everyone motivated
  ◦ Medium-risk projects aligned to current business problems

What role will your Hadoop Lab play?
“YOU CAN’T SHRINK YOUR WAY TO GREATNESS” - TOM PETERS

What role will your Hadoop Lab play?
Will your organization’s Hadoop Lab be a control function, or a thought leader?
Control functions
  ◦ Operational controls, compliance and auditing
  ◦ Budgeting
  ◦ Architecture gating
  ◦ Data governance
Thought leadership
  ◦ Design patterns and solution architecture
  ◦ Demonstration projects and proofs-of-concept
  ◦ Filling up the talent pool using training, workshops and user groups
  ◦ Educating on best practices and success stories to motivate adoption

Foundational Work
Invest in user-friendly operational management
  ◦ Design a simple multi-tenancy plan based on group membership
    ◦ Include share of execution queues, directory structures and cascading permissions
  ◦ Set up self-serve user on-boarding through your organization’s Help Desk
  ◦ Implement single sign-on for Kerberos-secured clusters
Manage expectations by monitoring performance
  ◦ Set service level objectives for both interactive and application uses
  ◦ Use “showback” reporting to monitor performance against objectives
Implement access control governance as a basic service
  ◦ Generate access control matrix audits centrally for all grid users
    ◦ Reporting from Ranger’s database works well and is easy to build
  ◦ Set policy and prepare reports for periodic attestation / user account reviews
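
As an illustration of the access control matrix audit mentioned above, here is a minimal Python sketch. It assumes the grants have already been exported as plain `(user, resource, permission)` tuples. That input shape, the function names and the sample users and paths are all hypothetical; the sketch does not reflect Ranger's actual database schema or API.

```python
from collections import defaultdict


def access_matrix(grants):
    """Pivot (user, resource, permission) tuples into user -> resource -> permissions."""
    matrix = defaultdict(lambda: defaultdict(set))
    for user, resource, permission in grants:
        matrix[user][resource].add(permission)
    return matrix


def render(matrix):
    """Render the matrix as CSV text: one row per user, one column per resource."""
    resources = sorted({res for row in matrix.values() for res in row})
    lines = ["user," + ",".join(resources)]
    for user in sorted(matrix):
        cells = [
            "/".join(sorted(matrix[user].get(res, ()))) or "-"
            for res in resources
        ]
        lines.append(user + "," + ",".join(cells))
    return "\n".join(lines)


# Hypothetical export; in practice this would come from your policy store.
grants = [
    ("alice", "/data/risk", "read"),
    ("alice", "/data/risk", "write"),
    ("bob", "/data/marketing", "read"),
]
print(render(access_matrix(grants)))
```

A report in this shape can be handed to data owners for the periodic attestation: each row is one account to review, and a `-` cell means no access.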

Maximizing Exposure to Change
Hadoop is an exceptionally fast-moving technology, and so needs a different approach
  ◦ Maximize your ability to deploy the changes in the Hadoop platform
    ◦ Invest in continuous integration and automated regression testing for your development teams
    ◦ Establish a better-than-quarterly release cycle
    ◦ Publish a checklist of acceptable open source licenses (or blacklist of prohibited ones)
  ◦ Encourage use of Hadoop as an application container
  ◦ Set up lab environments
Discourage practices that prevent your organization from keeping pace
  ◦ Avoid encapsulating Hadoop with frameworks or wrapping Hadoop inside applications
  ◦ Avoid proprietary add-ons – they don’t get as much collaboration in the open source community
  ◦ Prohibit equipment “carve outs” from your shared grid
    ◦ Include the cost of additional equipment in the business case, co-locate, and charge out accordingly

Building a Team
Data Engineers are the key to the successful adoption of a data lake
  ◦ Data engineers are a hybrid of intermediate developer and junior data scientist
  ◦ Good data engineering accelerates data science, and the ability to deploy data science to production
Other roles to consider
  ◦ A few versatile senior developers to give you the ability to execute POCs
  ◦ Data Librarian to manage the metadata catalogue and documentation
  ◦ Data Steward to manage the data governance process
Keep a few consultants on speed dial
  ◦ Hadoop security experts – preferably from an audit-capable firm
  ◦ Compliance and fair usage experts – particularly for external data from the web and social media
Fund the Hadoop and Linux administrators, but leave them in the infrastructure team
  ◦ They need the administrative access that these teams are allowed

Your New Best Friends
Give all of your stakeholders a chance to participate, by forming a working group
  ◦ Exposure to business stakeholders is particularly valuable for technology teams
Enlist the Capital Markets infrastructure team to build and manage the Hadoop grid
  ◦ It is worth solving the accounting problems to get their expertise
Co-opt your existing data hub’s team to operate your new Data Lake’s processes
  ◦ BCBS-239 projects have provided an excellent opportunity to do this
Adopting a secondary SQL-on-Hadoop solution helps to transfer skills as well as code
  ◦ IBM DB2 is available for Hadoop – a great way to move a bank’s data warehouse over to the Lab
  ◦ Other ANSI-compliant solutions include HAWQ, Vertica, Polybase*

What is a reasonable budget?
“PRICE IS WHAT YOU PAY. VALUE IS WHAT YOU GET.” - WARREN BUFFETT

Understanding the Customers
Before setting a budget, decide who you’re going to charge for your Hadoop Lab
  ◦ Data producers will see Hadoop as a cost-reduction opportunity
    ◦ Most front-end systems have dozens of outbound feeds that they have to support and maintain – offer them the chance to drop off a single comprehensive feed to Hadoop so that consumers can build and manage their own outbound feeds
    ◦ Consuming systems also have support teams managing inbound feeds, so they won’t see a significant change in support costs
  ◦ Data consumers will see Hadoop as improving their capabilities
    ◦ Traditional data supply chain is very long: source system feeds an EDW, which feeds a data mart accessed by data scientists
    ◦ Asking for “one more field” requires source to send it, EDW to model and document it, data mart to provision it, and then finally a data scientist gets to consume it
    ◦ Giving data scientists access to the raw data makes them more efficient – even though less effort goes into providing the data!
Align the funding model to the benefits realized by the participants:
  ◦ One-time costs to on-board new data should come from the producer of the data
  ◦ On-going operating costs for the Hadoop grid should be shared by the consumers of grid services

Setting a Budget for a Hadoop Lab
Annual cost of Hadoop is widely quoted as US$1,000/TB
  ◦ This compares favorably to US$5K for a SAN, and US$12K for a traditional database
  ◦ Cost based on “balanced” reference configurations – “compute” is more, “storage” is less
Use this well-known industry benchmark to set your budget
  ◦ Fully loaded costs for a bank-sized Hadoop grid in a bank data centre are around US$550/TB per year
    ◦ Capital charges for infrastructure costs, including servers and dedicated network switching, are amortized over three years
    ◦ Premises costs for data centre include bare racks, power and network backbone
    ◦ On-going support subscriptions for operating systems and Hadoop, and next-day hardware replacement included
  ◦ This creates around US$450/TB per year of budget room for your Hadoop Lab to claim
  ◦ A typical bank-sized Hadoop grid is 2-4 PB, which yields a Lab budget of US$1MM-$2MM per year
    ◦ This budget funds a staff of 10-20 based on typical budgeting numbers of US$100K/FTE per year
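
The arithmetic behind these figures can be checked with a short Python sketch. The dollar amounts are the ones quoted on the slide. The function name and the choice of 1 PB = 1,000 TB are assumptions made for illustration.

```python
# Figures quoted on the slide (US$ per TB per year).
BENCHMARK_COST_PER_TB = 1_000   # widely quoted annual cost of Hadoop
LOADED_COST_PER_TB = 550        # fully loaded grid cost in a bank data centre
COST_PER_FTE = 100_000          # typical budgeting number per FTE per year
TB_PER_PB = 1_000               # assumption: decimal petabytes


def lab_budget(grid_size_pb):
    """Return the Lab's annual budget room and the headcount it would fund."""
    room_per_tb = BENCHMARK_COST_PER_TB - LOADED_COST_PER_TB
    budget = room_per_tb * grid_size_pb * TB_PER_PB
    return {
        "room_per_tb": room_per_tb,
        "annual_budget": budget,
        "fte_funded": budget / COST_PER_FTE,
    }


for size_pb in (2, 4):
    print(size_pb, "PB:", lab_budget(size_pb))
```

Under these assumptions a 2 PB grid gives US$900K and 9 FTEs, and a 4 PB grid gives US$1.8MM and 18 FTEs, which the slide rounds to US$1MM-$2MM and a staff of 10-20.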

Financing Shared Hadoop Grids
Establish a usage-driven charge-out model for consumers of the service
  ◦ Charging based on a blend of CPU and storage consumption will balance compute and data uses
  ◦ Consider charging consumers by service quality if your service agreements permit
    ◦ Service quality can be designed into your multi-tenancy solution
Create a central capital account managed by the Hadoop Lab
  ◦ Pre-authorize incremental expansion of the data lake to stay within service objectives
  ◦ Amortization of capital account will smooth out charges to avoid penalizing early adopters
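
One way to read "a blend of CPU and storage consumption" is a weighted average of each tenant's share of CPU hours and share of stored terabytes. The Python sketch below works under that assumption. The `cpu_weight` parameter, the tenant names and the usage numbers are hypothetical, and the slide does not prescribe a particular weighting.

```python
def charge_out(usage, total_cost, cpu_weight=0.5):
    """Split total_cost across tenants by a blend of CPU share and storage share.

    usage maps tenant -> (cpu_hours, tb_stored). cpu_weight is a hypothetical
    policy knob between 0 (storage only) and 1 (CPU only).
    Assumes total CPU hours and total TB are both non-zero.
    """
    total_cpu = sum(cpu for cpu, _ in usage.values())
    total_tb = sum(tb for _, tb in usage.values())
    charges = {}
    for tenant, (cpu, tb) in usage.items():
        share = cpu_weight * cpu / total_cpu + (1 - cpu_weight) * tb / total_tb
        charges[tenant] = round(total_cost * share, 2)
    return charges


# Hypothetical annual usage: (CPU hours, TB stored).
usage = {
    "risk": (6_000, 100),       # compute-heavy tenant
    "marketing": (2_000, 300),  # storage-heavy tenant
    "finance": (2_000, 100),
}
charges = charge_out(usage, total_cost=1_000_000)
print(charges)
```

With an even weighting, the compute-heavy and storage-heavy tenants in this example end up paying the same amount, which is the balancing effect the slide describes.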

Creative Project Financing
Management loves to approve “self-funding projects”
  ◦ Use the cost differential of storage on Hadoop to fund intra-year work
    ◦ Migrate historical content from operating databases to Hadoop to save on database “tier one” SAN costs
    ◦ Capture grid compute outputs to Hadoop instead of NAS devices
    ◦ Storing database back-ups on Hadoop can be cheaper than tapes
Establish an internal “venture capital” fund in your Hadoop Lab
  ◦ Budget “seed money” to spend with the application maintenance teams
    ◦ Most applications have “lights on” funding insufficient to support the POCs needed to explore Hadoop adoption
    ◦ Set aside funding to pay for cross-team charges for participation in a POC
    ◦ Use the POCs to support project proposals based on cost reduction
  ◦ Staffing the Hadoop Lab with a small team of versatile developers completes this capability
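
The "self-funding" argument can be sized with a few lines of Python. This sketch reuses the per-TB benchmarks quoted earlier in the deck (US$5K for SAN, US$1K for Hadoop). The function names, the 50 TB volume and the US$100K project cost are hypothetical.

```python
# US$ per TB per year, from the budget benchmarks quoted earlier in the deck.
SAN_COST_PER_TB = 5_000
HADOOP_COST_PER_TB = 1_000


def annual_saving(tb_migrated):
    """Yearly storage saving from moving tb_migrated of history off SAN."""
    return tb_migrated * (SAN_COST_PER_TB - HADOOP_COST_PER_TB)


def months_to_fund(project_cost, tb_migrated):
    """Months of storage savings needed to pay back a one-time project cost."""
    return project_cost / (annual_saving(tb_migrated) / 12)


# Hypothetical case: migrate 50 TB of history with a US$100K project.
print(annual_saving(50))
print(round(months_to_fund(100_000, 50), 1))
```

A payback period under twelve months is what lets the work be pitched as funded within the year.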

Real-Life Lessons Learned
“NOTHING IS LESS PRODUCTIVE THAN TO MAKE MORE EFFICIENT WHAT SHOULD NOT BE DONE AT ALL” - PETER DRUCKER

Save Money by Letting it Break
It’s OK if a node breaks – in fact, it is better to have a dead Hadoop node than a wounded one
Educate your infrastructure team to prevent them from over-engineering your Hadoop grids
  ◦ HDFS implements a RAID strategy in software – use local disks instead of SAN for data nodes
  ◦ YARN is clever about parallelizing work – don’t use high-speed drives when cheap ones will do
  ◦ Don’t pay for “critical care” hardware support when next-day will be fine
Appliances and virtualization break the economics of Hadoop
  ◦ Equipment failure in an appliance is all-or-nothing
    ◦ Centralizing the Hadoop grid into one appliance increases the need for expensive fault tolerance
    ◦ Unit prices increase as a result – annual costs on appliances barely stay under the $1K/TB benchmark
  ◦ Your virtualization farm duplicates all of the fault-tolerance in Hadoop – and slows Hadoop down
    ◦ Vendor benchmarks show that virtualization is now almost as performant as bare-metal Hadoop grids
    ◦ Virtual servers are smaller and so you end up with more node-count-driven Hadoop costs

Networks Really Matter
The quality of the network is more important than the quality of the machines
  ◦ MapReduce “brings compute to the data,” but Hadoop still generates lots of internal network traffic
  ◦ Data hub and ETL offload patterns will generate a lot of traffic into and out of the grid
  ◦ Legacy tools – most notably SAS – will try to pull large data sets out of Hadoop across the network
Invest in top-of-rack switching or converged infrastructure
  ◦ Most data centres have 1Gb backbones connecting higher speed sub-networks
  ◦ Bonded 40Gb uplinks within the Hadoop grid and across racks are well worth the added cost
Spend the money and time to co-locate the consuming systems within the Hadoop sub-network
  ◦ This will mean a “re-racking” exercise for some appliances and existing servers
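
To put the 1Gb versus 40Gb comparison in perspective, the following Python sketch estimates how long a bulk extract would take at each speed. It assumes ideal line rate with no protocol overhead or contention, and decimal units. The 10 TB data set size and the function name are hypothetical.

```python
def transfer_hours(terabytes, link_gbps):
    """Idealized hours to move a data set over a link at full line rate."""
    bits = terabytes * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)   # gigabits per second to bits per second
    return seconds / 3600


# Hypothetical 10 TB extract pulled out of the grid by a legacy tool.
for gbps in (1, 40):
    print(f"{gbps} Gb/s: {transfer_hours(10, gbps):.1f} hours")
```

Even in this best case the extract ties up a 1Gb backbone for most of a day, against roughly half an hour on a 40Gb link, which is the case for co-locating heavy consumers inside the Hadoop sub-network.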

Differing Appetites for Change
Everyone’s first idea is to have one great, shared, co-operative data lake – and it doesn’t work!
  ◦ The more successful you are in on-boarding data producers, the greater the difficulty of updating the Data Lake’s Hadoop distribution – the incentive to “stand pat” grows
    ◦ Even worse if you’re using third-party tools for ingestion – it creates an external stakeholder which can block change!
  ◦ The more successful you are in on-boarding data consumers, the greater the demand to update the Data Lake’s Hadoop distribution – data scientists always want the most current (or next) version of everything
Separate the interactive users from the applications with a federated deployment model
  ◦ Put all of the applications onto a Hadoop grid which is updated very infrequently
    ◦ Static workloads also allow tight management of performance against service agreements
  ◦ Put all of the data scientists onto their own grid that updates with the Hadoop distribution
    ◦ Self-serve data provisioning to small grids in a cloud also works really well from the consumer’s view
  ◦ Make sure you have a great network so that moving data between the grids is painless

Hadoop is Not a Database
Projects that attempt to replace a database server with Hadoop usually fail
  ◦ Avoid transactional applications
  ◦ Do not replace the database tier in an N-tier application with Hadoop
    ◦ Think of Hadoop as a container instead, and re-architect the application to run inside Hadoop
  ◦ Do not use Hadoop to host highly normalized data warehouse models
    ◦ De-normalized data models are much more efficient on Hadoop
  ◦ Do not create abstraction layers using layered Hive views
The best design patterns for Hadoop are often misused
  ◦ “ETL Off-Load” often turns into Hadoop as an FTP drop zone
  ◦ “Bring Compute to Data” doesn’t mean using a data node to host an application server
  ◦ Map/Reduce should be run with MapReduce – not using Hive to call UDFs

Internal Data is More Difficult to Access
Think of your 360° view of a customer as being 180° of transactions and 180° of interactions
Data governance, compliance, and security will inhibit the use of the transactional data
  ◦ Internal data sources are also usually high-cost data sources to access
Interaction data – particularly web and social media – is surprisingly easy to access
  ◦ Social media data is actually considered “public,” and so is entirely ungoverned
    ◦ There is a wealth of open source social media ingestion and analysis tools available
  ◦ IVR systems are linked to customers and capture a significant amount of customer interaction
    ◦ Major IVR systems discard their operating data after 3-4 months rather than warehousing it
  ◦ Call Centre recordings are a wealth of internal sentiment data
    ◦ Open source speech-to-text and natural language processing tools are available in Python
  ◦ Website clicks and usage can be analyzed for price optimization and used for push marketing
    ◦ Most website usage is analyzed through vendors – but setting up an inbound feed is easy

Data Science is Unstructured Work
Data scientists don’t work the way IT expects them to
  ◦ Traditional data warehousing patterns are the data science anti-pattern
    ◦ Data scientists don’t know what their requirements are until they’ve done their work – their job is to experiment
    ◦ Data scientists hate prepared views because they don’t know what logic creates them
  ◦ Don’t waste (too much) time on central data quality – they’re just going to re-do it anyway
    ◦ “Correct” data is subjective by study, so there isn’t an answer to implement centrally
    ◦ Preparing a time series includes data quality suitable to data science – regardless of how good the starting data is
  ◦ Data scientists probably know the data better than the data modelers

Data Science Labs
Data scientists want to develop analytics using production data – which breaks lots of policies
Support the creation of a Data Science Lab environment
  ◦ Lead a “once and forever” platform security review that all Hadoop users can reference
  ◦ Implement data governance that facilitates “window shopping” for content – even when governance will initially prohibit using the content
Invest in advanced data masking
  ◦ Invest in advanced data masking to prepare production data for the data science lab
  ◦ Advanced data masking retains the statistical properties of the underlying data
Buy a self-serve data provisioning tool
  ◦ Data scientists love to “shop” for data and love to “engineer” data using query-by-example tools
    ◦ The good tools turn the “shopping trip” into deployable code that you can package for deployment or automation easily
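
To make "masking that retains the statistical properties" concrete, here is a deliberately simple Python sketch. It is a stand-in for illustration, not a description of any commercial masking product. Identifiers are replaced with a truncated salted hash, and selected columns are shuffled across rows, so each column keeps exactly the same set of values while the link to an individual is broken. The function, field names and sample records are hypothetical. Independent shuffling does not preserve correlations between columns, and the hashing shown is not a vetted pseudonymization scheme.

```python
import hashlib
import random


def mask_records(records, id_field, shuffle_fields, salt, seed=None):
    """Return masked copies of records; the input list is left unchanged."""
    rng = random.Random(seed)
    masked = [dict(r) for r in records]
    # Replace each identifier with a truncated salted SHA-256 digest.
    for row in masked:
        digest = hashlib.sha256((salt + str(row[id_field])).encode()).hexdigest()
        row[id_field] = digest[:12]
    # Shuffle each sensitive column across rows: the column's distribution is
    # unchanged, but values no longer belong to their original customer.
    for field in shuffle_fields:
        values = [r[field] for r in masked]
        rng.shuffle(values)
        for row, value in zip(masked, values):
            row[field] = value
    return masked


# Hypothetical production extract.
records = [
    {"customer_id": "C001", "balance": 1200},
    {"customer_id": "C002", "balance": 560},
    {"customer_id": "C003", "balance": 98000},
    {"customer_id": "C004", "balance": 15},
]
masked = mask_records(records, "customer_id", ["balance"], salt="lab-salt", seed=7)
print(masked)
```

Because the masked balances are a permutation of the originals, the column's mean, variance and percentiles match production exactly, which is the property the slide is asking for.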

Projects that Succeed
“RISK COMES FROM NOT KNOWING WHAT YOU’RE DOING” - WARREN BUFFETT

Quick Wins
Finding a quick win or two will keep your organization motivated to adopt Hadoop
Massively parallel back-testing of StreamBase algorithms
  ◦ StreamBase is a real-time workflow platform widely used in program trading
  ◦ MapReduce can encapsulate StreamBase in order to run hundreds of copies in parallel
Targeting ads on social media
  ◦ Both Twitter and Facebook have very good APIs that you can quickly use to build a feed
  ◦ Python-based tools can be paired with some basic data science to find “life events”
Trend Analysis on Risk Data
  ◦ Simulation outputs from CVA, VAR, CCR, LRM are often discarded after one day due to their size
  ◦ Archiving on HDFS permits trend analysis at the trade level for diagnostics and capital planning

Mid-Sized Projects
Many current focus areas in finance lend themselves to achievable Hadoop projects
Volcker Rule
  ◦ Volcker Rule metrics require an enormous amount of data, which is expensive to store
  ◦ Retention is required for five years of calendar days
  ◦ Computations can be implemented in SQL and will run well in Hive
Customer 360
  ◦ Hadoop is a natural platform to consolidate interaction records with transactional data
Daily Liquidity Management
  ◦ Running the calculations before pooling facilitates drill-down and analysis
  ◦ Tableau on Hadoop works very well for daily dashboards

Thank You for Your Time