Table of Contents
Introduction
Overview of the Book and Technology
How This Book Is Organized
Who Should Read This Book
Tools You Will Need
What's on the Website
What This Means for You
Part I: Business Potential of Big Data
Chapter 1: The Big Data Business Mandate
Big Data MBA Introduction
Focus Big Data on Driving Competitive Differentiation
Critical Importance of “Thinking Differently”
Summary
Homework Assignment
Notes
Chapter 2: Big Data Business Model Maturity Index
Introducing the Big Data Business Model Maturity Index
Big Data Business Model Maturity Index Lessons Learned
Summary
Homework Assignment
Chapter 3: The Big Data Strategy Document
Establishing Common Business Terminology
Introducing the Big Data Strategy Document
Introducing the Prioritization Matrix
Using the Big Data Strategy Document to Win the World Series
Summary
Homework Assignment
Notes
Chapter 4: The Importance of the User Experience
The Unintelligent User Experience
Consumer Case Study: Improve Customer Engagement
Business Case Study: Enable Frontline Employees
B2B Case Study: Make the Channel More Effective
Summary
Homework Assignment
Part II: Data Science
Chapter 5: Differences Between Business Intelligence and Data Science
What Is Data Science?
The Analyst Characteristics Are Different
The Analytic Approaches Are Different
The Data Models Are Different
The View of the Business Is Different
Summary
Homework Assignment
Notes
Chapter 6: Data Science 101
Data Science Case Study Setup
Fundamental Exploratory Analytics
Analytic Algorithms and Models
Summary
Homework Assignment
Notes
Chapter 7: The Data Lake
Introduction to the Data Lake
Characteristics of a Business-Ready Data Lake
Using the Data Lake to Cross the Analytics Chasm
Modernize Your Data and Analytics Environment
Analytics Hub and Spoke Analytics Architecture
Early Learnings
What Does the Future Hold?
Summary
Homework Assignment
Notes
Part III: Data Science for Business Stakeholders
Chapter 8: Thinking Like a Data Scientist
The Process of Thinking Like a Data Scientist
Summary
Homework Assignment
Notes
Chapter 9: “By” Analysis Technique
“By” Analysis Introduction
“By” Analysis Exercise
Foot Locker Use Case “By” Analysis
Summary
Homework Assignment
Notes
Chapter 10: Score Development Technique
Definition of a Score
FICO Score Example
Other Industry Score Examples
LeBron James Exercise Continued
Foot Locker Example Continued
Summary
Homework Assignment
Notes
Chapter 11: Monetization Exercise
Fitness Tracker Monetization Example
Summary
Homework Assignment
Notes
Chapter 12: Metamorphosis Exercise
Business Metamorphosis Review
Business Metamorphosis Exercise
Business Metamorphosis in Health Care
Summary
Homework Assignment
Notes
Part IV: Building Cross-Organizational Support
Chapter 13: Power of Envisioning
Envisioning: Fueling Creative Thinking
The Prioritization Matrix
Summary
Homework Assignment
Notes
Chapter 14: Organizational Ramifications
Chief Data Monetization Officer
Privacy, Trust, and Decision Governance
Unleashing Organizational Creativity
Summary
Homework Assignment
Notes
Chapter 15: Stories
Customer and Employee Analytics
Product and Device Analytics
Network and Operational Analytics
Characteristics of a Good Business Story
Summary
Homework Assignment
Notes
End User License Agreement
List of Illustrations
Chapter 1: The Big Data Business Mandate
Figure 1.1 Big Data Business Model Maturity Index
Figure 1.2 Modern data/analytics environment
Chapter 2: Big Data Business Model Maturity Index
Figure 2.1 Big Data Business Model Maturity Index
Figure 2.2 Crossing the analytics chasm
Figure 2.3 Packaging and selling audience insights
Figure 2.4 Optimize internal processes
Figure 2.5 Create new monetization opportunities
Chapter 3: The Big Data Strategy Document
Figure 3.1 Big data strategy decomposition process
Figure 3.2 Big data strategy document
Figure 3.3 Chipotle's 2012 letter to the shareholders
Figure 3.4 Chipotle's “increase same store sales” business initiative
Figure 3.5 Chipotle key business entities and decisions
Figure 3.6 Completed Chipotle big data strategy document
Figure 3.7 Business value of potential Chipotle data sources
Figure 3.8 Implementation feasibility of potential Chipotle data sources
Figure 3.9 Chipotle prioritization of use cases
Figure 3.10 San Francisco Giants big data strategy document
Figure 3.11 Chipotle's same store sales results
Chapter 4: The Importance of the User Experience
Figure 4.1 Original subscriber e-mail
Figure 4.2 Improved subscriber e-mail
Figure 4.3 Actionable subscriber e-mail
Figure 4.4 App recommendations
Figure 4.5 Traditional Business Intelligence dashboard
Figure 4.6 Actionable store manager dashboard
Figure 4.7 Store manager accept/reject recommendations
Figure 4.8 Competitive analysis use case
Figure 4.9 Local events use case
Figure 4.10 Local weather use case
Figure 4.11 Financial advisor dashboard
Figure 4.12 Client personal information
Figure 4.13 Client financial information
Figure 4.14 Client financial goals
Figure 4.15 Financial contributions recommendations
Figure 4.16 Spend analysis and recommendations
Figure 4.17 Asset allocation recommendations
Figure 4.18 Other investment recommendations
Chapter 5: Differences Between Business Intelligence and Data Science
Figure 5.1 Schmarzo TDWI keynote, August 2008
Figure 5.2 Oakland A's versus New York Yankees cost per win
Figure 5.3 Business Intelligence versus data science
Figure 5.4 CRISP: Cross Industry Standard Process for Data Mining
Figure 5.5 Business Intelligence engagement process
Figure 5.6 Typical BI tool graphic options
Figure 5.7 Data scientist engagement process
Figure 5.8 Measuring goodness of fit
Figure 5.9 Dimensional model (star schema)
Figure 5.10 Using flat files to eliminate or reduce joins on Hadoop
Figure 5.11 Sample customer analytic profile
Figure 5.12 Improve customer retention example
Chapter 6: Data Science 101
Figure 6.1 Basic trend analysis
Figure 6.2 Compound trend analysis
Figure 6.3 Trend line analysis
Figure 6.4 Boxplot analysis
Figure 6.5 Geographical (spatial) trend analysis
Figure 6.6 Pairs plot analysis
Figure 6.7 Time series decomposition analysis
Figure 6.8 Cluster analysis
Figure 6.9 Normal curve equivalent analysis
Figure 6.10 Normal curve equivalent seller pricing analysis example
Figure 6.11 Association analysis
Figure 6.12 Converting association rules into segments
Figure 6.13 Graph analysis
Figure 6.14 Text mining analysis
Figure 6.15 Sentiment analysis
Figure 6.16 Traverse pattern analysis
Figure 6.17 Decision tree classifier analysis
Figure 6.18 Cohorts analysis
Chapter 7: The Data Lake
Figure 7.1 Characteristics of a data lake
Figure 7.2 The analytics dilemma
Figure 7.3 The data lake line of demarcation
Figure 7.4 Create a Hadoop-based data lake
Figure 7.5 Create an analytic sandbox
Figure 7.6 Move ETL to the data lake
Figure 7.7 Hub and Spoke analytics architecture
Figure 7.8 Data science engagement process
Figure 7.9 What does the future hold?
Figure 7.10 EMC Federation Business Data Lake
Chapter 8: Thinking Like a Data Scientist
Figure 8.1 Foot Locker's key business initiatives
Figure 8.2 Examples of Foot Locker's in-store merchandising
Figure 8.3 Foot Locker's store manager persona
Figure 8.4 Foot Locker's strategic nouns or key business entities
Figure 8.5 Thinking like a data scientist decomposition process
Figure 8.6 Recommendations worksheet template
Figure 8.7 Foot Locker's recommendations worksheet
Figure 8.8 Foot Locker's store manager actionable dashboard
Figure 8.9 Thinking like a data scientist decomposition process
Chapter 9: “By” Analysis Technique
Figure 9.1 Identifying metrics that may be better predictors of performance
Figure 9.2 NBA shooting effectiveness
Figure 9.3 LeBron James's shooting effectiveness
Chapter 10: Score Development Technique
Figure 10.1 FICO score considerations
Figure 10.2 FICO score decision range
Figure 10.3 Recommendations worksheet
Figure 10.4 Updated recommendations worksheet
Figure 10.5 Completed recommendations worksheet
Figure 10.6 Potential Foot Locker customer scores
Figure 10.7 Foot Locker recommendations worksheet
Figure 10.8 CLTV based on sales
Figure 10.9 More predictive CLTV score
Chapter 11: Monetization Exercise
Figure 11.1 “A day in the life” customer persona
Figure 11.2 Fitness tracker prioritization
Figure 11.3 Monetization roadmap
Chapter 12: Metamorphosis Exercise
Figure 12.1 Big Data Business Model Maturity Index
Figure 12.2 Patient actionable analytic profile
Chapter 13: Power of Envisioning
Figure 13.1 Big Data Vision Workshop process and timeline
Figure 13.2 Big Data Vision Workshop illustrative analytics
Figure 13.3 Big Data Vision Workshop user experience mock-up
Figure 13.4 Prioritize Healthcare Systems's use cases
Figure 13.5 Prioritization matrix template
Figure 13.6 Prioritization matrix process
Chapter 14: Organizational Ramifications
Figure 14.1 CDMO organizational structure
List of Tables
Chapter 1: The Big Data Business Mandate
Table 1.1 Exploiting Technology Innovation to Create Economic-Driven Business Opportunities
Table 1.2 Evolution of the Business Questions
Chapter 2: Big Data Business Model Maturity Index
Table 2.1 Big Data Business Model Maturity Index Summary
Chapter 3: The Big Data Strategy Document
Table 3.1 Mapping Chipotle Use Cases to Analytic Models
Chapter 5: Differences Between Business Intelligence and Data Science
Table 5.1 BI Analyst Versus Data Scientist Characteristics
Chapter 6: Data Science 101
Table 6.1 2014–2015 Top NBA RPM Rankings
Table 6.2 Case Study Summary
Chapter 7: The Data Lake
Table 7.1 Data Lake Data Types
Chapter 8: Thinking Like a Data Scientist
Table 8.1 Evolution of Foot Locker's Business Questions
Chapter 9: “By” Analysis Technique
Table 9.1 LeBron James's Shooting Percentages
Chapter 10: Score Development Technique
Table 10.1 Potential Scores for Other Industries
Chapter 11: Monetization Exercise
Table 11.1 Potential Fitness Tracker Recommendations
Table 11.2 Recommendation Data Requirements
Table 11.3 Recommendations Value Versus Feasibility Assessment
Chapter 12: Metamorphosis Exercise
Table 12.1 Decisions to Analytics Mapping
Table 12.2 Data-to-Analytics Mapping
Introduction
I never planned on writing a second book. Heck, I thought writing one book was enough to check this item off my bucket list. But so much has changed since I wrote my first book that I felt compelled to continue to explore this once-in-a-lifetime opportunity for organizations to leverage data and analytics to transform their business models. And I'm not just talking about the “make me more money” part of businesses. Big data can drive significant “improve the quality of life” value in areas such as education, poverty, parole rehabilitation, healthcare, safety, and crime reduction.
My first book targeted the Information Technology (IT) audience. However, I soon realized that the biggest winner in this big data land grab was the business. So this book targets the business audience and is based on a few key premises:
Organizations do not need a big data strategy as much as they need a business strategy that incorporates big data.
The days when business leaders could turn analytics over to IT are over; tomorrow's business leaders must embrace analytics as a business discipline in the same vein as accounting, finance, management science, and marketing.
The key to data monetization and business transformation lies in unleashing the organization's creative thinking; we have got to get the business users to “think like a data scientist.”
Finally, the business potential of big data is only limited by the creative thinking of the business users.
I've also had the opportunity to teach “Big Data MBA” at the University of San Francisco (USF) School of Management since I wrote the first book. I did well enough that USF made me its first School of Management Fellow. What I experienced while working with these outstanding and creative students and Professor Mouwafac Sidaoui compelled me to undertake the challenge of writing this second book, targeting those students and tomorrow's business leaders.
One of the topics that I hope jumps out in the book is the power of data science. There have been many books written about data science with the goal of helping people to become data scientists. But I felt that something was missing: instead of trying to create a world of data scientists, we needed to help tomorrow's business leaders think like data scientists.
So that's the focus of this book: to help tomorrow's business leaders integrate data and analytics into their business models and to lead the cultural transformation, unleashing the organization's creative juices by helping the business to “think like a data scientist.”
Overview of the Book and Technology
The days when business stakeholders could relinquish control of data and analytics to IT are over. The business stakeholders must be front and center in championing and monetizing the organization's data collection and analysis efforts. Business leaders need to understand where and how to leverage big data, exploiting the collision of new sources of customer, product, and operational data coupled with data science to optimize key business processes, uncover new monetization opportunities, and create new sources of competitive differentiation. And while it's not realistic to convert your business users into data scientists, it's critical that we teach the business users to think like data scientists so they can collaborate with IT and the data scientists on use case identification, requirements definition, business valuation, and ultimately analytics operationalization.
This book provides a business-hardened framework with supporting methodology and hands-on exercises that will not only help business users identify where and how to leverage big data for business advantage but also provide guidelines for operationalizing the analytics, setting up the right organizational structure, and driving the analytic insights throughout the organization's user experience to both customers and frontline employees.
How This Book Is Organized
The book is organized into four sections:
Part I: Business Potential of Big Data. Part I includes Chapters 1 through 4 and sets the business-centric foundation for the book. Here is where I introduce the Big Data Business Model Maturity Index and frame the big data discussion around the perspective that “organizations do not need a big data strategy as much as they need a business strategy that incorporates big data.”
Part II: Data Science. Part II includes Chapters 5 through 7 and covers the principles behind data science. These chapters introduce some data science basics and explore how Business Intelligence and data science are both complementary and different in the problems that they address.
Part III: Data Science for Business Stakeholders. Part III includes Chapters 8 through 12 and seeks to teach the business users and business leaders to “think like a data scientist.” This part introduces a methodology and several exercises to reinforce the data science thinking and approach. It has a lot of hands-on work.
Part IV: Building Cross-Organizational Support. Part IV includes Chapters 13 through 15 and discusses organizational challenges. This part covers envisioning, which may very well be the most important topic in the book, as the business potential of big data is only limited by the creative thinking of the business users.
Here are some more details on each of the chapters in the book:
Chapter 1: The Big Data Business Mandate. This chapter frames the big data discussion around how big data is more about business transformation and the economics of big data than it is about technology.
Chapter 2: Big Data Business Model Maturity Index. This chapter covers the Big Data Business Model Maturity Index (BDBM), which is the foundation for the entire book. Take the time to understand each of the five stages of the BDBM and how the BDBM provides a roadmap for measuring how effective your organization is at integrating data and analytics into your business models.
Chapter 3: The Big Data Strategy Document. This chapter introduces a CXO-level document and process for helping organizations identify where and how to start their big data journeys from a business perspective.
Chapter 4: The Importance of the User Experience. This is one of my favorite topics. This chapter challenges traditional Business Intelligence reporting and dashboard concepts by introducing a simpler, more direct approach for delivering actionable insights to your key business stakeholders: frontline employees, channel partners, and end customers.
Chapter 5: Differences Between Business Intelligence and Data Science. This chapter explores the different worlds of Business Intelligence and data science and highlights both the differences and the complementary nature of each.
Chapter 6: Data Science 101. This chapter (my favorite) reviews 14 different analytic techniques that my data science teams commonly use and in what business situations you should contemplate using them. It is accompanied by a marvelous fictitious case study using Fairy-Tale Theme Parks (thanks, Jen!).
Chapter 7: The Data Lake. This chapter introduces the concept of a data lake, explaining how the data lake frees up expensive data warehouse resources and unleashes the creative, fail-fast nature of the data science teams.
Chapter 8: Thinking Like a Data Scientist. The heart of this book, this chapter covers the eight-step “thinking like a data scientist” process. This chapter is pretty deep, so plan on having a pen and paper (and probably an eraser as well) with you as you read it.
Chapter 9: “By” Analysis Technique. This chapter does a deep dive into one of the important concepts in “thinking like a data scientist”: the “By” analysis technique.
Chapter 10: Score Development Technique. This chapter shows how scores can drive collaboration between the business users and data scientists to create actionable scores that guide the organization's key business decisions.
Chapter 11: Monetization Exercise. This chapter provides a technique for organizations that have a substantial amount of customer, product, and operational data but do not know how to monetize that data. This chapter can be very eye-opening!
Chapter 12: Metamorphosis Exercise. This chapter is a fun, out-of-the-box exercise that explores the potential data and analytic impacts for an organization as it contemplates the Business Metamorphosis phase of the Big Data Business Model Maturity Index.
Chapter 13: Power of Envisioning. This chapter starts to address some of the organizational and cultural challenges you may face. In particular, Chapter 13 introduces some envisioning techniques to help unleash your organization's creative thinking.
Chapter 14: Organizational Ramifications. This chapter goes into more detail about the organizational ramifications of big data, especially the role of the Chief Data (Monetization) Officer.
Chapter 15: Stories. The book wraps up with some case studies, but not your traditional case studies. Instead, Chapter 15 presents a technique for creating “stories” that are relevant to your organization. Anyone can find case studies, but not just anyone can create a story.
Who Should Read This Book
This book is targeted toward business users and business management. I wrote this book so that I could use it in teaching my Big Data MBA class, so I included all of the hands-on exercises and templates that my students would need to successfully earn their Big Data MBA graduation certificate.
I think folks would benefit from also reading my first book, Big Data: Understanding How Data Powers Big Business, which is targeted toward the IT audience. There is some overlap between the two books (10 to 15 percent), but the first book sets the stage and introduces concepts that are explored in more detail in this book.
Tools You Will Need
No special tools are required other than a pencil, an eraser, several sheets of paper, and your creativity. Grab a chai tea latte, some Chipotle, and enjoy!
What's on the Website
You can download the “Thinking Like a Data Scientist” workbook from the book's website at www.wiley.com/go/bigdatamba. And oh, there might be another surprise there as well! Hehehe!
What This Means for You
As students from my class at USF have told me, this material allows them to take a problem or challenge and use a well-thought-out process to drive cross-organizational collaboration to come up with ideas they can turn into actions using data and analytics. What employer wouldn't want a future leader who knows how to do that?
Part I: Business Potential of Big Data
Chapters 1 through 4 set the foundation for driving business strategies with data science. In particular, the Big Data Business Model Maturity Index highlights the realm of what's possible from a business potential perspective by providing a roadmap that measures how effectively your organization leverages data and analytics to power your business models.
In This Part
Chapter 1: The Big Data Business Mandate
Chapter 2: Big Data Business Model Maturity Index
Chapter 3: The Big Data Strategy Document
Chapter 4: The Importance of the User Experience
Chapter 1: The Big Data Business Mandate
Having trouble getting your senior management team to understand the business potential of big data? Can't get your management leadership to consider big data to be something other than an IT science experiment? Are your line-of-business leaders unwilling to commit themselves to understanding how data and analytics can power their top initiatives?
If so, then this “Big Data Senior Executive Care Package” is for you!
And for a limited time, you get an unlimited license to share this care package with as many senior executives as you desire. But you must act NOW! Become the life of the company parties with your extensive knowledge of how new customer, product, and operational insights can guide your organization's value creation processes. And maybe, just maybe, get a promotion in the process!!
NOTE
All company material referenced in this book comes from public sources and is referenced accordingly.
Big Data MBA Introduction
The days when business users and business management could relinquish control of data and analytics to IT are over, at least for organizations that want to survive beyond the immediate term. The big data discussion now needs to focus on how organizations can couple new sources of customer, product, and operational data with advanced analytics (data science) to power their key business processes and elevate their business models. Organizations need to understand that they do not need a big data strategy as much as they need a business strategy that incorporates big data.
The Big Data MBA challenges the thinking that data and analytics are ancillary or a “bolt-on” to the business, that data and analytics are someone else's problem. In a growing number of leading organizations, data and analytics are critical to business success and long-term survival. Business leaders and business users reading this book will learn why they must take responsibility for identifying where and how they can apply data and analytics to their businesses; otherwise, they put their businesses at risk of being made obsolete by more nimble, data-driven competitors.
The Big Data MBA introduces and describes concepts, techniques, and methodologies to guide you as you seek to address the big data business mandate. The book provides hands-on exercises and homework assignments to make these concepts and techniques come to life for your organization. It provides recommendations and actions that enable your organization to start today. And in the process, Big Data MBA teaches you to “think like a data scientist.”
The Forrester study “Reset on Big Data” (Hopkins et al., 2014)¹ highlights the critical role of a business-centric focus in the big data discussion. The study argues that technology-focused executives within a business will think of big data as a technology and fail to convey its importance to the boardroom.
Businesses of all sizes must reframe the big data conversation with the business leaders in the boardroom. The critical and difficult big data question that business leaders must address is:
How effective is our organization at integrating data and analytics into our business models?
Before business leaders can begin these discussions, organizations must understand their current level of big data maturity. Chapter 2 discusses in detail the “Big Data Business Model Maturity Index” (see Figure 1.1). The Big Data Business Model Maturity Index is a measure of how effective an organization is at integrating data and analytics to power its business model.
Figure 1.1 Big Data Business Model Maturity Index
The Big Data Business Model Maturity Index provides a roadmap for how organizations can integrate data and analytics into their business models. The Big Data Business Model Maturity Index is composed of the following five phases:
Phase 1: Business Monitoring. In the Business Monitoring phase, organizations leverage data warehousing and Business Intelligence to monitor their performance.
Phase 2: Business Insights. The Business Insights phase is about leveraging predictive analytics to uncover customer, product, and operational insights buried in the growing wealth of internal and external data sources. In this phase, organizations aggressively expand their data acquisition efforts by coupling all of their detailed transactional and operational data with other internal data such as consumer comments, e-mail conversations, and technician notes, as well as external and publicly available data such as social media, weather, traffic, economic, demographics, home values, and local events data.
Phase 3: Business Optimization. In the Business Optimization phase, organizations apply prescriptive analytics to the customer, product, and operational insights uncovered in the Business Insights phase to deliver actionable insights or recommendations to frontline employees, business managers, and channel partners, as well as customers. The goal of the Business Optimization phase is to enable employees, partners, and customers to optimize their key decisions.
Phase 4: Data Monetization. In the Data Monetization phase, organizations leverage the customer, product, and operational insights to create new sources of revenue. This could include selling data, or insights, into new markets (a cellular phone provider selling customer behavioral data to advertisers), integrating analytics into products and services to create “smart” products, or re-packaging customer, product, and operational insights to create new products and services, to enter new markets, and/or to reach new audiences.
Phase 5: Business Metamorphosis. The holy grail of the Big Data Business Model Maturity Index is when an organization transitions its business model from selling products to selling “business-as-a-service.” Think GE selling “thrust” instead of jet engines. Think John Deere selling “farming optimization” instead of farming equipment. Think Boeing selling “air miles” instead of airplanes. And in the process, these organizations will create a platform enabling third-party developers to build and market solutions on top of the organization's business-as-a-service business model.
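Because the five phases form a strictly ordered scale, readers who think in code can picture the index as an ordered enumeration. The sketch below is only an illustration under that assumption; the names MaturityPhase and next_phase are hypothetical and are not part of the book's workbook or templates.

```python
from enum import IntEnum
from typing import Optional


class MaturityPhase(IntEnum):
    """Illustrative model of the five phases, numbered in maturity order."""
    BUSINESS_MONITORING = 1     # data warehousing and BI monitor performance
    BUSINESS_INSIGHTS = 2       # predictive analytics uncover insights
    BUSINESS_OPTIMIZATION = 3   # prescriptive analytics drive recommendations
    DATA_MONETIZATION = 4       # insights create new sources of revenue
    BUSINESS_METAMORPHOSIS = 5  # products become business-as-a-service


def next_phase(current: MaturityPhase) -> Optional[MaturityPhase]:
    """Return the next phase up the index, or None if already at the top."""
    if current is MaturityPhase.BUSINESS_METAMORPHOSIS:
        return None
    return MaturityPhase(current + 1)


# IntEnum members compare as integers, so phases can be ranked directly.
print(next_phase(MaturityPhase.BUSINESS_INSIGHTS).name)  # BUSINESS_OPTIMIZATION
```

Using IntEnum rather than a plain Enum is a deliberate choice here: it keeps the "higher number means more mature" ordering that the roadmap implies, so comparisons such as "is this organization past Business Insights?" are simple integer comparisons.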
Ultimately, big data only matters if it helps organizations make more money and improve operational effectiveness. Examples include increasing customer acquisition, reducing customer churn, reducing operational and maintenance costs, optimizing prices and yield, reducing risks and errors, improving compliance, improving the customer experience, and more.
Regardless of their size, organizations don't need a big data strategy as much as they need a business strategy that incorporates big data.
Focus Big Data on Driving Competitive Differentiation
I'm always puzzled by how often organizations struggle to distinguish between technology investments that drive competitive parity and those that create unique and compelling competitive differentiation. Let's explore this difference in a bit more detail.
Competitive parity is achieving similar or the same operational capabilities as those of your competitors. It involves leveraging industry best practices and pre-packaged software to create a baseline that, at worst, is equal to the operational capabilities across your industry. Organizations end up achieving competitive parity when they buy foundational and undifferentiated capabilities from enterprise software packages such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and Sales Force Automation (SFA).
Competitive differentiation is achieved when an organization leverages people, processes, and technology to create applications, programs, processes, etc., that differentiate its products and services from those of its competitors in ways that add unique value for the end customer and create competitive differentiation in the marketplace.
Leading organizations should seek to “buy” foundational and undifferentiated capabilities but “build” what is differentiated and value-added for their customers. But sometimes organizations get confused between the two. Let's call this the ERP effect. ERP software packages were sold as a software solution that would make everyone more profitable by delivering operational excellence. But when everyone is running the same application, what's the source of the competitive differentiation?
Analytics, on the other hand, enables organizations to uniquely optimize their key business processes, drive a more engaging customer experience, and uncover new monetization opportunities with unique insights that they gather about their customers, products, and operations.
Leveraging Technology to Power Competitive Differentiation
While most organizations have invested heavily in ERP-type operational systems, far fewer have been successful in leveraging data and analytics to build strategic applications that provide unique value to their customers and create competitive differentiation in the marketplace. Here are some examples of organizations that have invested in building differentiated capabilities by leveraging new sources of data and analytics:
Google: PageRank and Ad Serving
Yahoo: Behavioral Targeting and Retargeting
Facebook: Ad Serving and News Feed
Apple: iTunes
Netflix: Movie Recommendations
Amazon: “Customers Who Bought This Item,” 1-Click ordering, and Supply Chain & Logistics
Walmart: Demand Forecasting, Supply Chain Logistics, and Retail Link
Procter & Gamble: Brand and Category Management
Federal Express: Critical Inventory Logistics
American Express and Visa: Fraud Detection
GE: Asset Optimization and Operations Optimization (Predix)
None of these organizations bought these strategic, business-differentiating applications off the shelf. They understood that it was necessary to provide differentiated value to their internal and external customers, and they leveraged data and analytics to build applications that delivered competitive differentiation.
History Lesson on Economic-Driven Business Transformation
More than anything else, the driving force behind big data is the economics of big data: it's 20 to 50 times cheaper to store, manage, and analyze data with big data technologies than it is with traditional data warehousing technologies. This 20 to 50 times economic impact is courtesy of commodity hardware, open source software, an explosion of new open source tools coming out of academia, and ready access to free online training on topics such as big data architectures and data science. A client of mine in the insurance industry calculated a 50X economic impact. Another client in the healthcare industry calculated a 49X economic impact (they need to look harder to find that missing 1X).
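To make the 20X to 50X figure concrete, here is a small arithmetic sketch. The only numbers taken from the text are the 20X and 50X factors; the $1,000,000 warehouse workload is a hypothetical input, and the helper names big_data_cost_range and savings_percent are illustrative, not from the book. It also assumes cost simply divides by the factor, which real cost models rarely do so neatly.

```python
from typing import Tuple


def big_data_cost_range(warehouse_cost: float,
                        low_factor: float = 20.0,
                        high_factor: float = 50.0) -> Tuple[float, float]:
    """Estimate (best case, worst case) cost of the same workload on big data
    technologies, assuming cost divides by the 20X to 50X factor in the text."""
    if warehouse_cost < 0 or low_factor <= 0 or high_factor < low_factor:
        raise ValueError("need warehouse_cost >= 0 and 0 < low_factor <= high_factor")
    return warehouse_cost / high_factor, warehouse_cost / low_factor


def savings_percent(factor: float) -> float:
    """Share of the original cost eliminated by an 'N times cheaper' factor."""
    return (1 - 1 / factor) * 100


# Hypothetical $1,000,000 warehouse workload; not a figure from the book.
print(big_data_cost_range(1_000_000))  # (20000.0, 50000.0)
print(round(savings_percent(20)), round(savings_percent(50)))  # 95 98
```

Framed as savings, a 20X factor eliminates 95 percent of the cost and a 50X factor eliminates 98 percent, which is why the gap between a 49X and a 50X estimate barely matters in practice.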
History has shown that the most significant technology innovations are ones that drive economic change. From the printing press to interchangeable parts to the microprocessor, these technology innovations have provided an unprecedented opportunity for the more agile and more nimble organizations to disrupt existing markets and establish new value creation processes.
Big data possesses that same economic potential, whether it be to create smart cities, improve the quality of medical care, improve educational effectiveness, reduce poverty, improve safety, reduce risks, or even cure cancer. And for many organizations, the first question that needs to be asked about big data is:
How effective is my organization at leveraging new sources of data and advanced analytics to uncover new customer, product, and operational insights that can be used to differentiate our customer engagement, optimize key business processes, and uncover new monetization opportunities?
Big data is nothing new, especially if you view it from the proper perspective. While the popular big data discussions are around “disruptive” technology innovations like Hadoop and Spark, the real discussion should be about the economic impact of big data. New technologies don't disrupt business models; it's what organizations do with these new technologies that disrupts business models and enables new ones. Let's review an example of one such economic-driven business transformation: the steam engine.
The steam engine enabled urbanization, industrialization, and the conquering of new territories. It literally shrank distance and time by reducing the time required to move people and goods from one side of a continent to the other. The steam engine enabled people to leave low-paying agricultural jobs and move into cities for higher-paying manufacturing and clerical jobs that led to a higher standard of living.
For example, cities such as London shot up in terms of population. In 1801, before the advent of George Stephenson's Rocket steam engine, London had 1.1 million residents. After the invention, the population of London more than doubled to 2.7 million residents by 1851. London's social fabric shifted from small, tight-knit communities where textile production and agriculture were prevalent to a big city with a variety of jobs. The steam locomotive provided quicker transportation and more jobs, which in turn brought more people into the cities and drastically changed the job market. By 1861, only 2.4 percent of London's population was employed in agriculture, while 49.4 percent were in the manufacturing or transportation business. The steam locomotive was a major turning point in history as it transformed society from largely rural and agricultural into urban and industrial.²
Table 1.1 shows other historical lessons that demonstrate how technology innovation created economic-driven business opportunities.
Table 1.1 Exploiting Technology Innovation to Create Economic-Driven Business Opportunities

| Technology Innovation | Economic Impact |
|---|---|
| Printing Press | Expanded literacy (simplified knowledge capture and enabled knowledge dissemination and the education of the masses) |
| Interchangeable Parts | Drove the standardization of manufacturing parts and fueled the industrial revolution |
| Steam Engine (Railroads and Steamboats) | Sparked urbanization (drove transition from agricultural to manufacturing-centric society) |
| Internal Combustion Engine | Triggered suburbanization (enabled personal mobility, both geographically and socially) |
| Interstate Highway System | Foundation for interstate commerce (enabled regional specialization and wealth creation) |
| Telephone | Democratized communications (by eliminating distance and delays as communications issues) |
| Computers | Automated common processes (thereby freeing humans for more creative engagement) |
| Internet | Gutted cost of commerce and knowledge sharing (enabled remote workforce and international competition) |
This brings us back to big data. All of these innovations share the same lesson: it wasn't the technology that was disruptive; it was how organizations leveraged the technology to disrupt existing business models and enable new ones.
Critical Importance of “Thinking Differently”
Organizations have been taught by technology vendors, press, and analysts to think faster, cheaper, and smaller, but they have not been taught to “think differently.” The inability to think differently is causing organizational alignment and business adoption problems with respect to the big data opportunity. Organizations must throw out much of their conventional data, analytics, and organizational thinking in order to get the maximum value out of big data. Let's introduce some key areas for thinking differently that will be covered throughout this book.
Don't Think Big Data Technology, Think Business Transformation
Many organizations are infatuated with the technical innovations surrounding big data and the three Vs of data: volume, variety, and velocity. But starting with a technology focus can quickly turn your big data initiative into a science experiment. You don't want to be a solution in search of a problem.
Instead, focus on the four Ms of big data: Make Me More Money (or if you are a non-profit organization, maybe that's Make Me More Efficient). Start your big data initiative with a business-first approach. Identify and focus on addressing the organization's key business initiatives, that is, what the organization is trying to accomplish from a business perspective over the next 9 to 12 months (e.g., reduce supply chain costs, improve supplier quality and reliability, reduce hospital-acquired infections, improve student performance). Break down or decompose this business initiative into the supporting decisions, questions, metrics, data, analytics, and technology necessary to support the targeted business initiative.
CROSS-REFERENCE
This book begins by covering the Big Data Business Model Maturity Index in Chapter 2. The Big Data Business Model Maturity Index helps organizations address the key question:
How effective is our organization at leveraging data and analytics to power our key business processes and uncover new monetization opportunities?
The maturity index provides a guide or roadmap with specific recommendations to help organizations advance up the maturity index. Chapter 3 introduces the big data strategy document. The big data strategy document provides a framework for helping organizations identify where and how to start their big data journey from a business perspective.
Don't Think Business Intelligence, Think Data Science
Data science is different from Business Intelligence (BI). Resist the advice to try to make these two different disciplines the same. For example:
Business Intelligence focuses on reporting what happened (descriptive analytics). Data science focuses on predicting what is likely to happen (predictive analytics) and then recommending what actions to take (prescriptive analytics).
Business Intelligence operates with schema on load, in which you have to pre-build the data schema before you can load the data to generate your BI queries and reports. Data science deals with schema on query, in which the data scientists custom design the data schema based on the hypothesis they want to test or the prediction that they want to make.
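To make the schema on load versus schema on query distinction concrete, here is a minimal Python sketch of the schema on query idea (more commonly called schema-on-read). It assumes raw point-of-sale events land as JSON lines with hypothetical field names; nothing is validated or restructured at load time, and each analysis imposes only the structure its own hypothesis needs at the moment it reads the data. It illustrates the concept only and does not describe any particular product.

```python
import json
from typing import Iterable

# Raw events are kept exactly as they arrived; no schema is enforced on load.
# The field names (customer_id, sku, price, weather, note) are hypothetical.
RAW_EVENTS = [
    '{"customer_id": "c1", "sku": "A100", "price": 4.99, "weather": "rain"}',
    '{"customer_id": "c2", "sku": "B200", "price": 2.49}',
    '{"customer_id": "c1", "sku": "B200", "price": 2.49, "note": "used coupon"}',
]

def query(raw_lines: Iterable[str], fields: list[str]) -> list[tuple]:
    """Apply a schema at query time: parse each raw line and project only
    the fields this analysis needs (a missing field becomes None)."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append(tuple(record.get(field) for field in fields))
    return rows

# Two analysts impose two different "schemas" on the same raw data.
spend_by_customer = query(RAW_EVENTS, ["customer_id", "price"])
weather_effect = query(RAW_EVENTS, ["sku", "weather"])
print(spend_by_customer)
print(weather_effect)
```

The design choice is the point: the second analysis could use a weather field that nobody modeled in advance, which is exactly what a pre-built schema on load would have made difficult.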
Organizations that try to “extend” their Business Intelligence capabilities to encompass big data will fail. That's like stating that you're going to the moon, then climbing a tree and declaring that you are closer. Unfortunately, you can't get to the moon from the top of a tree. Data science is a new discipline that offers compelling, business-differentiating capabilities, especially when coupled with Business Intelligence.
CROSS-REFERENCE
Chapter 5 (“Differences Between Business Intelligence and Data Science”) discusses the differences between Business Intelligence and data science and how data science can complement your Business Intelligence organization. Chapter 6 (“Data Science 101”) reviews several different analytic algorithms that your data science team might use and discusses the business situations in which the different algorithms might be most appropriate.
Don't Think Data Warehouse, Think Data Lake
In the world of big data, Hadoop and HDFS are a game changer; they are fundamentally changing the way organizations think about storing, managing, and analyzing data. And I don't mean Hadoop as yet another data source for your data warehouse. I'm talking about Hadoop and HDFS as the foundation for your data and analytics environments, to take advantage of the massively parallel processing, cheap scale-out data architecture that can run hundreds, thousands, or even tens of thousands of Hadoop nodes.
We are witnessing the dawn of the age of the data lake. The data lake enables organizations to gather, manage, enrich, and analyze many new sources of data, whether structured or unstructured. The data lake enables organizations to treat data as an organizational asset to be gathered and nurtured versus a cost to be minimized.
Organizations need to treat their reporting environments (traditional BI and data warehousing) and analytics (data science) environments differently. These two environments have very different characteristics and serve different purposes. The data lake can make both the BI and data science environments more agile and more productive (Figure 1.2).
Figure 1.2 Modern data/analytics environment
CROSS-REFERENCE
Chapter 7 (“The Data Lake”) introduces the concept of a data lake and the role the data lake plays in supporting your existing data warehouse and Business Intelligence investments while providing the foundation for your data science environment. Chapter 7 discusses how the data lake can un-cuff your data scientists from the data warehouse to uncover those variables and metrics that might be better predictors of business performance. It also discusses how the data lake can free up expensive data warehouse resources, especially those resources associated with Extract, Transform, and Load (ETL) data processes.
Don't Think “What Happened,” Think “What Will Happen”
Business users have been trained to contemplate business questions that monitor the current state of the business and to focus on retrospective reporting on what happened. Business users have become conditioned by their BI and data warehouse environments to only consider questions that report on current business performance, such as “How many widgets did I sell last month?” and “What were my gross sales last quarter?”
Unfortunately, this retrospective view of the business doesn't help when trying to make decisions and take action about future situations. We need to get business users to “think differently” about the types of questions they can ask. We need to move the business investigation process beyond the performance monitoring questions to the predictive (e.g., What will likely happen?) and prescriptive (e.g., What should I do?) questions that organizations need to address in order to optimize key business processes and uncover new monetization opportunities (see Table 1.2).
Table 1.2 Evolution of the Business Questions
What Happened? (Descriptive/BI) | What Will Happen? (Predictive Analytics) | What Should I Do? (Prescriptive Analytics)
--- | --- | ---
How many widgets did I sell last month? | How many widgets will I sell next month? | Order [5,000] units of Component Z to support widget sales for next month
What were sales by zip code for Christmas last year? | What will be sales by zip code over this Christmas season? | Hire [Y] new sales reps by these zip codes to handle projected Christmas sales
How many of Product X were returned last month? | How many of Product X will be returned next month? | Set aside [$125K] in financial reserve to cover Product X returns
What were company revenues and profits for the past quarter? | What are projected company revenues and profits for next quarter? | Sell the following product mix to achieve quarterly revenue and margin goals
How many employees did I hire last year? | How many employees will I need to hire next year? | Increase hiring pipeline by 35 percent to achieve hiring goals
CROSS-REFERENCE
Chapter 8 (“Thinking Like a Data Scientist”) differentiates between descriptive analytics, predictive analytics, and prescriptive analytics. Chapters 9, 10, and 11 then introduce several techniques to help your business users identify the predictive (“What will happen?”) and prescriptive (“What should I do?”) questions that they need to more effectively drive the business. Yeah, this will mean lots of Post-it notes and whiteboards, my favorite tools.
Don't Think HIPPO, Think Collaboration
Unfortunately, today it is still the HIPPO (the Highest Paid Person's Opinion) that determines most of the business decisions. Reasons such as “We've always done things that way” or “My years of experience tell me…” or “This is what the CEO wants…” are still given as reasons for why the HIPPO needs to drive the important business decisions.
Unfortunately, that type of thinking has led to siloed data fiefdoms, siloed decisions, and an un-empowered and frustrated business team. Organizations need to think differently about how they empower all of their employees. Organizations need to find a way to promote and nurture creative thinking and groundbreaking ideas across all levels of the organization. There is no edict that states that the best ideas only come from senior management.
The key to big data success is empowering cross-functional collaboration and exploratory thinking to challenge long-held organizational rules of thumb, heuristics, and “gut” decision making. The business needs an approach that is inclusive of all the key stakeholders: IT, business users, business management, channel partners, and ultimately customers. The business potential of big data is only limited by the creative thinking of the organization.
CROSS-REFERENCE
Chapter 13 (“Power of Envisioning”) discusses how the BI and data science teams can collaborate to brainstorm, test, and refine new variables that might be better predictors of business performance. We will introduce several techniques and concepts that can be used to drive collaboration between the business and IT stakeholders and ultimately help your data science team uncover new customer, product, and operational insights that lead to better business performance. Chapter 14 (“Organizational Ramifications”) introduces organizational ramifications, especially the role of Chief Data Monetization Officer (CDMO).
Summary
Big data is interesting from a technology perspective, but the real story for big data is how organizations of different sizes are leveraging data and analytics to power their business models. Big data has the potential to uncover new customer, product, and operational insights that organizations can use to optimize key business processes, improve customer engagement, uncover new monetization opportunities, and re-wire the organization's value creation processes.
As discussed in this chapter, organizations need to understand that big data is about business transformation and business model disruption. There will be winners and there will be losers, and business leaders who sit back and wait for IT to solve their big data problems for them will quickly find out which group their organization falls into. Senior business leadership needs to determine where and how to leverage data and analytics to power your business models before a more nimble competitor or a hungrier competitor disintermediates your business.
To realize the financial potential of big data, business leadership must make big data a top business priority, not just a top IT priority. Business leadership must actively participate in determining where and how big data can deliver business value, and the business leaders must be front and center in leading the integration of the resulting analytic insights into the organization's value creation processes.
For leading organizations, big data provides a once-in-a-lifetime business opportunity to build key capabilities, skills, and applications that optimize key business processes, drive a more compelling customer experience, uncover new monetization opportunities, and drive competitive differentiation. Remember: buy for parity, but build for competitive differentiation.
At its core, big data is about economic transformation. Big data should not be treated like just another technology science experiment. History is full of lessons of how organizations have been able to capitalize on economics-driven business transformations. Big data provides one of those economic “Forrest Gump” moments where organizations are fortunate to be at the right place at the right time. Don't miss this opportunity.
Finally, organizations have been taught to think cheaper, smaller, and faster, but they have not been taught to think differently, and that's exactly what's required if you want to exploit the big data opportunity. Many of the data and analytics best practices that have been taught over the past several decades no longer hold true. Understand what has changed and learn to think differently about how your organization leverages data and analytics to deliver compelling business value.
In summary, business leadership needs to lead the big data initiative, to step up and make big data a top business mandate. If your business leaders don't take the lead in identifying where and how to integrate big data into your business models, then you risk being disintermediated in a marketplace where more agile, hungrier
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Identify a key business initiative for your organization, something the business is trying to accomplish over the next 9 to 12 months. It might be something like improve customer retention, optimize customer acquisition, reduce customer churn, optimize predictive maintenance, reduce revenue theft, and so on.
Exercise #2: Brainstorm and write down what (1) customer, (2) product, and (3) operational insights your organization would like to uncover in order to support the targeted business initiative. Start by capturing the different types of descriptive, predictive, and prescriptive questions you'd like to answer about the targeted business initiative. Tip: Don't worry about whether or not you have the data sources you need to derive the insights you want (yet).
Exercise #3: Brainstorm and write down data sources that might be useful in uncovering those key insights. Look both internally and externally for interesting data sources that might be useful. Tip: Think outside the box and imagine that you could access any data source in the world.
Notes
1. Hopkins, Brian, Fatemeh Khatibloo with Kyle McNabb, James Staten, Andras Cser, Holger Kisker, Ph.D., Leslie Owens, Jennifer Belissent, Ph.D., Abigail Komlenic, “Reset On Big Data: Embrace Big Data to Engage Customers at Scale,” Forrester Research, 2014.
2. http://railroadandsteamengine.weebly.com/impact.html
Chapter 2: Big Data Business Model Maturity Index
Organizations do not understand how far big data can take them from a business transformation perspective. Organizations don't have a way of understanding what the ultimate big data end state would or could look like or answering questions such as:
Where and how should I start my big data journey?
How can I create new revenue or monetization opportunities?
How do I compare to others with respect to my organization's adoption of big data as a business enabler?
How far can I push big data to power, even transform, my business models?
To help address these types of questions, I've created the Big Data Business Model Maturity Index. Not only can organizations use this index to understand where they sit with respect to other organizations in exploiting big data and advanced analytics to power their business models, but the index also provides a roadmap to help organizations accelerate the integration of data and analytics into their business models.
The Big Data Business Model Maturity Index is a critical foundational concept supporting the Big Data MBA and will be referenced regularly throughout the book. It's important to lay a strong foundation in how organizations can use the Big Data Business Model Maturity Index to answer this fundamental big data business question: “How effective is my organization at integrating data and analytics into our business models?”
Chapter 2 Objectives
Introduce the Big Data Business Model Maturity Index as a framework for organizations to measure how effective they are at leveraging data and analytics to power their business models
Discuss the objectives and characteristics of each of the five phases of the Big Data Business Model Maturity Index: Business Monitoring, Business Insights, Business Optimization, Data Monetization, and Business Metamorphosis
Discuss how the economics of big data and the four big data value drivers can enable organizations to cross the analytics chasm and advance past the Business Monitoring phase into the Business Insights and Business Optimization phases
Review lessons learned that help organizations advance through the phases of the Big Data Business Model Maturity Index
Introducing the Big Data Business Model Maturity Index
Organizations are moving at different paces with respect to where and how they are adopting big data and advanced analytics to create business value. Some organizations are moving very cautiously, as they are unclear as to where and how to start and which of the bevy of new technology innovations they need to deploy in order to start their big data journeys. Others are moving at a more aggressive pace by acquiring and assembling a big data technology foundation built on many new big data technologies such as Hadoop, Spark, MapReduce, YARN, Mahout, Hive, HBase, and more.
However, a select few are looking beyond just the technology to identify where and how they should be integrating big data into their existing business processes. These organizations are aggressively looking to identify and exploit opportunities to optimize key business processes. And these organizations are seeking new monetization opportunities; that is, seeking out business opportunities where they can:
Package and sell their analytic insights to others
Integrate advanced analytics into their products and services to create “intelligent” products
Create entirely new products and services that help them enter new markets and target new customers
These are the folks who realize that they don't need a big data strategy as much as they need a business strategy that incorporates big data. And when organizations “flip that byte” on the focus of their big data initiatives, the business potential is almost boundless.
Organizations can use the Big Data Business Model Maturity Index as a framework against which they can measure where they sit today with respect to their adoption of big data. The Big Data Business Model Maturity Index provides a roadmap for helping organizations to identify where and how they can leverage data and analytics to power their business models (see Figure 2.1).
Figure 2.1 Big Data Business Model Maturity Index
Organizations tend to find themselves in one of five phases on the Big Data Business Model Maturity Index:
Phase 1: Business Monitoring. In the Business Monitoring phase, organizations are applying data warehousing and Business Intelligence techniques and tools to monitor the organization's business performance (also called Business Performance Management).
Phase 2: Business Insights. In the Business Insights phase, organizations aggressively expand their data assets by amassing all of their detailed transactional and operational data and coupling that transactional and operational data with new internal data sources (e.g., consumer comments, e-mail conversations, technician notes) and external data sources (e.g., social media, weather, traffic, economic, data.gov). Organizations in the Business Insights phase then use predictive analytics to uncover customer, product, and operational insights buried in and across these data sources.
Phase 3: Business Optimization. In the Business Optimization phase, organizations build on the customer, product, and operational insights uncovered in the Business Insights phase by applying prescriptive analytics to optimize key business processes. Organizations in the Business Optimization phase push the analytic results (e.g., recommendations, scores, rules) to frontline employees and business managers to help them optimize the targeted business process through improved decision making. The Business Optimization phase also provides opportunities for organizations to push analytic insights to their customers in order to influence customer behaviors. An example of the Business Optimization phase is a retailer that delivers analytic-based merchandising recommendations to the store managers to optimize merchandise markdowns based on purchase patterns, inventory, weather conditions, holidays, consumer comments, and social media postings.
Phase 4: Data Monetization. The Data Monetization phase is where organizations seek to create new sources of revenue. This could include selling data, or insights, into new markets (e.g., a cellular phone provider selling customer behavioral data to advertisers), integrating analytical insights into products and services to create “smart” products and services, and/or re-packaging customer, product, and operational insights to create entirely new products and services that help them enter new markets and target new customers or audiences.
Phase 5: Business Metamorphosis. The holy grail of the Big Data Business Model Maturity Index is when an organization leverages data, analytics, and insights to metamorphose its business. This metamorphosis necessitates a major shift in the organization's core business model (e.g., processes, people, products and services, partnerships, target markets, management, promotions, rewards and incentives) driven by the insights gathered as the organization traversed the Big Data Business Model Maturity Index. One example is organizations that metamorphose from selling products to selling “business-as-a-service.” Think GE selling “thrust” instead of selling jet engines. Think John Deere selling “farming optimization” instead of selling farming equipment. Think Boeing selling “air miles” instead of airplanes. Another example is an organization creating a data and analytics platform that enables the growing body of third-party developers to build and market value-added applications on the organization's business-as-a-service platform.
Let's explore each of these phases in more detail.
Phase 1: Business Monitoring
The Business Monitoring phase is the phase where organizations are deploying Business Intelligence (BI) and data warehousing solutions to monitor ongoing business performance. In this phase, sometimes called Business Performance Management, organizations create reports and dashboards that monitor the current state of the business, flag under- and/or over-performing areas of the business, and alert key business stakeholders with pertinent information whenever special “out of bound” performance situations occur.
The Business Monitoring phase is a great starting point for most big data journeys. As part of their Business Intelligence and data warehousing efforts, organizations have invested significant time, money, and effort to identify and document their key business processes; that is, those business processes that make their organizations unique and successful. They have assembled, cleansed, normalized, enriched, and integrated the key operational data sources; have painstakingly constructed a supporting data model and data architecture; and have built countless reports, dashboards, and alerts around the key activities and metrics that support those business processes. Lots of great assets have already been created, and these assets provide the launching pad for starting our big data journey.
Unfortunately, moving beyond the Business Monitoring phase is a significant challenge for many organizations. The inertia established from years and decades of BI and data warehouse efforts works against the “think differently” approach that is necessary to fully exploit big data for business value. Plus the big financial payoff isn't typically realized until the organization pushes through the Business Insights phase into the Business Optimization phase. So let's discuss how organizations can leverage the economics of big data to cross the analytics chasm.
Phase 2: Business Insights
The Business Insights phase couples the organization's growing wealth of internal and external structured and unstructured data with predictive analytics to uncover customer, product, and operational insights buried in the data. This means uncovering occurrences in the data that are unusual (or outside normal behaviors, trends, and patterns) and worthy of business investigation.
This is the phase of the Big Data Business Model Maturity Index where organizations need to exploit the economics of big data; that is, big data technologies are 20 to 50 times cheaper than traditional data warehouses in storing, managing, and analyzing data. The economics of big data enable organizations to think differently about how they gather, integrate, manage, analyze, and act upon data and provide the foundation for how organizations can advance beyond the Business Monitoring phase and cross the analytics chasm. The economics of big data enable four new capabilities that will help the organization cross the analytics chasm and move beyond the Business Monitoring phase into the Business Insights phase. These four big data value drivers are:
1. Access to All of the Organization's Transactional and Operational Data. In big data, we need to move beyond the summarized and aggregated data that is housed in the data warehouse and be prepared to store and analyze the organization's complete history of detailed transactional and operational data. Think 25 years of detailed point of sale (POS) transactional data, not just the 13 to 25 months of aggregated POS data stored in the data warehouse.
Imagine the business potential of being able to analyze each POS transaction at the individual customer level (courtesy of loyalty programs) for the past 15 to 25 years. For example, grocers could see when individual customers start to struggle financially, because those customers are likely to change their purchase behaviors and product preferences (e.g., buying lower-quality products, replacing branded products with private label products, increasing the use of discounts and coupons). You can't see those individual customer behaviors and purchase tendencies in the aggregated data stored in the data warehouse. With big data, organizations have the ability to collect, analyze, and act on the entire history of every purchase occasion by Bill Schmarzo: what products he bought in what combinations, what prices he paid, what coupons he used, what and when he bought on discount, which stores he frequented at what time of day and day of the week, what the outside weather conditions were during those purchase occasions, what the local economic conditions were, etc.
When you can analyze transactional and operational data at the individual customer (or patient, student, technician, teacher, wind turbine, ATM, truck, jet engine, etc.) level, you can uncover insights about individual customer or product behaviors, tendencies, propensities, preferences, and usage patterns. It is on these individual customer or product insights that organizations can take action. It's very difficult to create actionable insights at the aggregated level of store, zip code, or customer behavioral categories.
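The grocer example can be sketched in a few lines of Python. The sketch assumes each loyalty-card line item records a customer ID, a year, and whether the item was private label (all hypothetical fields and made-up data), and it flags customers whose private-label share rises by more than an assumed threshold between two years. The made-up data is arranged so that one customer trades down while another trades up, which shows how an aggregated report can look perfectly flat while an individual customer's behavior changes sharply.

```python
from collections import defaultdict

# Hypothetical loyalty-card line items: (customer_id, year, is_private_label)
transactions = [
    ("bill", 2013, False), ("bill", 2013, False), ("bill", 2013, False), ("bill", 2013, True),
    ("bill", 2014, True), ("bill", 2014, True), ("bill", 2014, True), ("bill", 2014, False),
    ("ann", 2013, True), ("ann", 2013, True), ("ann", 2013, True), ("ann", 2013, False),
    ("ann", 2014, False), ("ann", 2014, False), ("ann", 2014, False), ("ann", 2014, True),
]

def private_label_share(rows):
    """Return {(customer, year): fraction of that customer's items that were private label}."""
    counts = defaultdict(lambda: [0, 0])  # [private_label_items, total_items]
    for customer, year, is_private in rows:
        counts[(customer, year)][0] += int(is_private)
        counts[(customer, year)][1] += 1
    return {key: private / total for key, (private, total) in counts.items()}

def flag_shifts(rows, before, after, threshold=0.25):
    """Customers whose private-label share rose by more than `threshold`
    (an assumed cutoff) between the `before` and `after` years."""
    share = private_label_share(rows)
    customers = {customer for customer, _, _ in rows}
    return sorted(
        c for c in customers
        if (c, before) in share and (c, after) in share
        and share[(c, after)] - share[(c, before)] > threshold
    )

def store_level_share(rows, year):
    """Private-label share for all customers combined, as an aggregate report would show it."""
    items = [is_private for _, yr, is_private in rows if yr == year]
    return sum(items) / len(items)

print(flag_shifts(transactions, 2013, 2014))  # ['bill']
print(store_level_share(transactions, 2013), store_level_share(transactions, 2014))  # 0.5 0.5
```

In practice the threshold, the time windows, and the trading-down signals (coupon use, lower-quality products) would be chosen and validated with the business users; the sketch only shows why the analysis has to happen at the individual level.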
2. Access to Internal and External Unstructured Data. Data warehouses don't like unstructured data. Data warehouses want structured data. Since data warehouses have been built on relational database management systems (RDBMS), the data warehouse wants its data in rows and columns. As a consequence, organizations and their business users have been taught that they really don't need access to unstructured data.
But big data challenges this assumption by giving all organizations a cost-effective way to ingest, store, manage, and analyze vast varieties of unstructured data. And the integration of the organization's unstructured data with the organization's detailed structured data provides the opportunity to uncover new customer, product, and operational insights.
While most of the excitement about unstructured data seems to be about the potential of external unstructured data (e.g., social, blogs, news feeds, annual reports, mobile, third-party, publicly available), the gold for many organizations lies in their internal unstructured data (e.g., consumer comments, e-mail conversations, doctor/teacher/technician notes, work orders, service requests). For example, in a project to improve the predictive maintenance of wind turbines, it was discovered that when a technician scales a wind turbine to replace a ball bearing, he or she makes other observations while at the top of the turbine, observations such as “It smells weird in here” or “It's warmer than normal” or “There are dust particles in the air.” Each of these types of unstructured comments could provide invaluable insights into the predictive maintenance of the wind turbine, especially when coupled with the operational sensor readings, error codes, and vibrations that are coming off that particular wind turbine.
3. Exploiting Real-Time Analytics. New big data technologies provide organizations the technical capabilities to flag and act on special or unusual situations in real time. Data warehouses have traditionally been batch environments and struggled to uncover and support the real-time opportunities in the data. For example, “trickle feeding” data into the data warehouse has been a long-time data warehouse challenge because the minute new data enters the data warehouse, all the supporting indices, aggregate tables, and materialized views need to be updated with the new data. That's hardly conducive to real-time analysis.
Most organizations do not have a long list of use cases that require a real-time analytics environment (e.g., real-time bidding, fraud detection, digital ad placement, pricing, yield optimization). However, there are many use cases for “right-time” analytics, where the window of opportunity is measured not in seconds but in minutes or hours or even days. For example, nurses and admissions personnel in a hospital likely have 4 to 5 minutes to score the likelihood of a patient catching a hospital-acquired infection (such as a staph infection) during the patient's admission process. Another example is location-based services that target shoppers who meet certain demographic and/or behavioral characteristics as they walk by a store.
The best approach for uncovering these right-time analytic opportunities is to break the targeted key business initiative into the data events that compose that business initiative. Then identify those data events where knowing about that event sooner (minutes sooner, hours sooner, maybe even a day sooner) could provide a monetization opportunity.
4. Integrating Predictive Analytics. Finally, we can use predictive analytics to mine the wealth of structured and unstructured data to identify areas of “unusualness” in the data; that is, use predictive analytics to uncover occurrences in the data that are outside normal behaviors or engagement patterns. Organizations can apply predictive analytics and data mining techniques to uncover customer, product, and operational insights or areas of “unusualness” buried in the massive volumes of detailed structured and unstructured data.
These insights uncovered during the Business Insights phase need to be reviewed by the business users (the subject matter experts) to determine if these insights pass the S.A.M. test; that is, the insights are:
Strategic: the insight is important or strategic to what the business is trying to accomplish with respect to the targeted business initiative.
Actionable: the insight is something that the organization can act on when engaging with its key business entities.
Material: the value or benefit of acting on the insight is greater than the costs associated with acting on that insight (e.g., cost to gather and integrate the data, cost to build and validate the analytic model, cost to integrate the analytic results into the operational systems).
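The S.A.M. test is ultimately a business judgment, but its structure can be captured as a simple screen. In this hypothetical Python sketch, the strategic and actionable checks are yes/no answers supplied by the subject matter experts, and the material check compares an estimated benefit against the sum of the three cost categories listed above. The class name, field names, and every dollar figure are made-up placeholders for illustration.

```python
from dataclasses import dataclass

@dataclass
class Insight:
    description: str
    strategic: bool          # important to the targeted business initiative? (SME judgment)
    actionable: bool         # can the organization act on it? (SME judgment)
    expected_benefit: float  # estimated value of acting on the insight
    data_cost: float         # cost to gather and integrate the data
    model_cost: float        # cost to build and validate the analytic model
    integration_cost: float  # cost to integrate results into operational systems

    def is_material(self) -> bool:
        """Material: the benefit of acting exceeds the total cost of acting."""
        total_cost = self.data_cost + self.model_cost + self.integration_cost
        return self.expected_benefit > total_cost

    def passes_sam(self) -> bool:
        """An insight must be strategic, actionable, and material."""
        return self.strategic and self.actionable and self.is_material()

# Placeholder figures for illustration only.
candidates = [
    Insight("Campaign lift in certain markets", True, True, 500_000, 50_000, 80_000, 120_000),
    Insight("Interesting, but no lever to pull", True, False, 900_000, 10_000, 10_000, 10_000),
    Insight("Strategic and actionable, but too costly", True, True, 100_000, 60_000, 40_000, 30_000),
]

for insight in candidates:
    print(f"{insight.description}: {'pass' if insight.passes_sam() else 'fail'}")
```

The value of writing the test down this way is less the code than the discipline: it forces the team to put an explicit estimate on each cost before an insight moves on to the Business Optimization phase.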
For example, organizations could apply basic statistics, data mining, and predictive analytics to their growing wealth of structured and unstructured data to identify insights such as:
Marketing campaigns that are performing two to three times better than the average campaign performance in certain markets on certain days of the week
Customers that are reacting two to three standard deviations outside the norm in their purchase patterns for certain product categories in certain weather conditions
Suppliers whose components are operating outside the upper or lower limits of a control chart in extreme cold weather situations
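The standard-deviation and control-chart examples rest on the same basic statistics. The Python sketch below, over hypothetical data, computes a baseline mean and sample standard deviation, flags new observations whose z-score is 2 or more, and flags readings outside mean plus or minus 3 standard deviations (the conventional control limits). Production control charts usually estimate sigma from subgroup ranges rather than the raw sample standard deviation; using the sample standard deviation here is a simplifying assumption.

```python
from statistics import mean, stdev

def z_scores(baseline, observations):
    """How many sample standard deviations each observation lies from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [(x - mu) / sigma for x in observations]

def control_limits(baseline, k=3.0):
    """Lower and upper control limits at mean +/- k standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return mu - k * sigma, mu + k * sigma

def out_of_control(baseline, observations, k=3.0):
    """Observations below the lower or above the upper control limit."""
    lcl, ucl = control_limits(baseline, k)
    return [x for x in observations if x < lcl or x > ucl]

# Hypothetical weekly units of one product category bought by one customer
weekly_units = [10, 12, 11, 9, 10, 11, 12, 9, 10, 11]
this_week = [11, 13, 15]

# Customer behavior: flag anything two or more standard deviations from the norm
unusual = [x for x, z in zip(this_week, z_scores(weekly_units, this_week)) if abs(z) >= 2]
print(unusual)  # [13, 15]

# Hypothetical supplier component readings during extreme cold weather
component_readings = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0]
new_readings = [5.05, 5.6]
print(out_of_control(component_readings, new_readings))  # [5.6]
```

Note that 13 units is "unusual" at two standard deviations but still inside the three-sigma control limits; which cutoff matters is a business decision, not a statistical one.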
CROSS-REFERENCE
For the predictive analytics to be effective, organizations need to build detailed analytic profiles for each individual business entity: customers, patients, students, wind turbines, jet engines, ATMs, etc. The creation and role of analytic profiles is a topic covered in Chapter 5, “Differences Between Business Intelligence and Data Science.”
Business Insights Phase Challenge
The Business Insights phase is the most difficult stage of the Big Data Business Model Maturity Index because it requires organizations to “think differently” about how they approach data and analytics. The rules, techniques, and approaches that worked in the Business Intelligence and data warehouse worlds do not necessarily apply to the world of big data. This is truly the “crossing the analytics chasm” moment (see Figure 2.2).
Figure 2.2 Crossing the analytics chasm
For example, Business Intelligence analysts were taught to “slice and dice” the data to uncover insights buried in the data. This approach worked fine when dealing with gigabytes of data, 5 to 9 dimensions, and 15 to 25 metrics. However, the “slice and dice” technique does not work well when dealing with petabytes of data, 40 to 60 dimensions, and hundreds of metrics.
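A quick calculation shows why slicing and dicing breaks down at this scale. If an analyst can group the data by any subset of the available dimensions, the number of possible groupings is 2 raised to the number of dimensions. This Python sketch assumes, for simplicity, that every subset is a candidate slice and ignores the metrics entirely, so it understates the real search space.

```python
def possible_slices(num_dimensions: int) -> int:
    """Number of distinct subsets of dimensions an analyst could group by."""
    return 2 ** num_dimensions

for dims in (5, 9, 40, 60):
    print(f"{dims:>2} dimensions -> {possible_slices(dims):,} possible groupings")
```

At 9 dimensions there are 512 possible groupings, which a diligent analyst can still explore; at 60 dimensions there are 1,152,921,504,606,846,976, which no one can inspect by hand. That gap is why the Business Insights phase leans on data mining and predictive analytics to surface the unusual slices automatically.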
Also, much of the big data financial payback or Return on Investment (ROI) is not realized until the organization reaches the Business Optimization phase. This is why it is important to focus your big data journey on a key business initiative: something that the business is trying to achieve over the next 9 to 12 months. The focus on a business initiative can provide the necessary financial and organizational motivation to push through the Business Insights phase and to realize the financial return and payback created in the Business Optimization phase.
Phase 3: Business Optimization
The Business Optimization phase is the stage of the Big Data Business Model Maturity Index where organizations develop the predictive analytics (predicts what is likely to happen) and the prescriptive analytics (recommends actions that should be taken) necessary to optimize the targeted key business process. This phase builds on the analytic insights uncovered during the Business Insights phase and constructs predictive and prescriptive analytic models around those insights that pass the S.A.M. criteria. One client called this the “Tell me what I need to do” phase.
While many believe that this is the part of the maturity index where organizations turn the optimization process over to the machines, in reality it is more likely that the Business Optimization phase delivers actionable insights (e.g., recommendations, scores, rules) to frontline employees and managers to help them make better decisions supporting the targeted business process. Examples include:
Delivering resource scheduling recommendations to store managers based on purchase history, buying behaviors, seasonality, and local weather and events
Delivering distribution and inventory recommendations to logistics managers given current and predicted buying patterns, coupled with local traffic, demographic, weather, and events data
Delivering product pricing recommendations to product managers based on current buying patterns, inventory levels, competitive prices, and product interest insights gleaned from social media data
Delivering financial investment recommendations to financial planners and agents based on a client's financial goals, current financial asset mix, risk tolerance, market and economic conditions, and savings objectives (e.g., house, college, retirement)
Delivering maintenance, scheduling, and inventory recommendations to wind turbine technicians based on error codes, sensor readings, vibration readings, and recent comments captured by the technician during previous maintenance activities
The Business Optimization phase also seeks to influence customer purchase and engagement behaviors by analyzing the customer's past purchase patterns, behaviors, and tendencies in order to deliver relevant and actionable recommendations. Common examples include Amazon's “Customers Who Bought This Item Also Bought” recommendations, Netflix's movie recommendations, and Pandora's music recommendations. The key to the effectiveness of these recommendations is capturing and analyzing an individual customer's purchase, usage, and engagement activities to build analytic profiles that codify that customer's preferences, behaviors, tendencies, propensities, patterns, trends, interests, passions, affiliations, and associations.
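The “customers who bought this item also bought” idea can be illustrated with a simple item co-occurrence count. This is a minimal Python sketch over hypothetical shopping baskets, not Amazon's actual algorithm (which is far more sophisticated): for a given item, it counts how often every other item appears in the same basket and returns the most frequent companions, breaking ties alphabetically so the output is deterministic.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets, one list of distinct items per order
baskets = [
    ["running shoes", "socks", "water bottle"],
    ["running shoes", "socks"],
    ["running shoes", "headphones"],
    ["socks", "water bottle"],
]

def build_co_occurrence(baskets):
    """Map each item to a Counter of the items bought in the same basket."""
    co = {}
    for basket in baskets:
        for item, other in permutations(basket, 2):
            co.setdefault(item, Counter())[other] += 1
    return co

def also_bought(co, item, top_n=2):
    """Top companions of `item`, ranked by co-purchase count, ties broken alphabetically."""
    counts = co.get(item, Counter())
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:top_n]]

co = build_co_occurrence(baskets)
print(also_bought(co, "running shoes"))  # ['socks', 'headphones']
```

A raw co-occurrence count favors popular items; real recommenders typically normalize for item popularity and blend in the individual customer's analytic profile, which is the point the paragraph above makes about profiles being the key to effective recommendations.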
Finally, the Business Optimization phase needs to integrate the customer, product, and operational prescriptive analytics or recommendations back into the operational systems (e.g., call center, sales force automation, direct marketing, procurement, logistics, inventory) and management applications (e.g., reports, dashboards). For example, think of an “intelligent” store manager's dashboard, where instead of just presenting tables and charts of data, the intelligent dashboard goes one step further to actually deliver recommendations to the store manager to improve store operations.
CROSS-REFERENCE
The potential user experience ramifications of pushing prescriptive analytics to both customers and frontline employees are discussed in Chapter 4, “The Importance of the User Experience.”
Phase 4: Data Monetization
The Data Monetization phase is the phase of the Big Data Business Model Maturity Index in which organizations leverage the insights gathered from the Business Insights and Business Optimization phases to create new revenue opportunities. New monetization opportunities could include:
Packaging data (with analytic insights) for sale to other organizations. In one example, a smartphone vendor could capture and package insights about customer behaviors, product performance, and market trends to sell to advertisers, marketers, and manufacturers. In another example, MapMyRun (which was purchased by Under Armour for $150M) could package the customer usage data from its smartphone application to create audience and product insights that it could sell to a variety of companies, including sports apparel manufacturers, sporting goods retailers, insurance companies, and healthcare providers (see Figure 2.3).
Figure 2.3 Packaging and selling audience insights
Integrating analytic insights directly into an organization's products and services to create “intelligent” products or services, such as:
Cars that learn a customer's driving patterns and behaviors and adjust driver controls, seats, mirrors, brake pedals, suspension, steering, dashboard displays, etc. to match the customer's driving style
Televisions and DVRs that learn what types of shows and movies a customer likes and search across the different cable and Internet channels to find and automatically record similar shows for that customer
Ovens that learn how a customer likes certain foods prepared, cook them in that manner automatically, and also include recommendations for other foods and recipes that “others like you” enjoy
Jet engines that can ingest weather, elevation, wind speed, and other environmental data to make adjustments to blade angles, tilt, yaw, and rotation speeds to minimize fuel consumption during flight
Repackaging insights to create entirely new products and services that help organizations to enter new markets and target new customers or audiences. For example, organizations can capture, analyze, and package customer, product, and operational insights across the overall market in order to help channel partners to more effectively market and sell to their customers, such as:
Online digital marketplaces (Yahoo, Google, eBay, Facebook) could leverage general market trends and other merchant performance data to provide recommendations to small merchants on inventory, ordering, merchandising, marketing, and pricing.
Financial services organizations could create a financial advisor dashboard for their agents and brokers that captures clients' investment goals, current income levels, and current financial portfolio and creates investment, risk, and asset allocation recommendations that help the brokers and agents more effectively service their customers.
Retail organizations could mine customer loyalty transactions and engagements to uncover customer and product insights that enable the organization to move into new product categories or new geographies.
While the Data Monetization phase is clearly the phase of the Big Data Business Model Maturity Index that catches everyone's attention, it is important that the organization goes through the Business Insights and Business Optimization phases in order to capture the customer, product, operational, and market insights that form the basis for these new monetization opportunities.
Phase 5: Business Metamorphosis
The Business Metamorphosis phase of the Big Data Business Model Maturity Index should be the ultimate goal for organizations. This is the phase of the maturity index in which organizations seek to leverage the data, analytics, and analytic insights to metamorphose, or transform, the organization's business model (e.g., processes, people, products and services, partnerships, target markets, management, promotions, rewards and incentives).
The Business Metamorphosis phase is where organizations integrate the insights that they captured about their customers' usage patterns, product performance behaviors, and overall market trends to transform their business models. This business model metamorphosis allows organizations to provide new services and capabilities to their customers in a way that is easier for the customers to consume. It also enables the organization to engage in higher-value and more strategic services.
For example, contemplate the data, analytics, and analytic insights that Boeing would need to transform its business from selling airplanes to selling air miles. Think of the data, analytics, and insights that Boeing would need to uncover about passengers, airlines, airports, routes, holidays, economic conditions, etc. in order to optimize its business models, processes, people, etc. to successfully execute this business change. Think of the business requirements necessary to encourage third-party developers to build and market value-add services and products on Boeing's new business model. This topic and example are considered in more detail in Chapter 12, “Metamorphosis Exercise.”
Other Business Metamorphosis phase examples could include:
Energy companies moving into the “Home Energy Optimization” business by recommending when to replace appliances (based on predictive maintenance) and even recommending which appliance brands and models to buy based on the performance of different appliances, taking into consideration your usage patterns, local weather, local water quality, and local environmental conditions such as local water conservation efforts and energy costs
Retailers moving into the “Shopping Optimization” business by recommending specific products given customers' current buying patterns as compared with others like them, including recommendations for products that they may not even sell (think “Miracle on 34th Street”)
Airlines moving into the “Travel Delight” business of not only offering discounts on air travel based on customers' travel behaviors and preferences but also proactively finding and recommending deals on hotels, rental cars, limos, sporting or musical events, and local sights, shows, restaurants, and shopping in the areas based on their areas of interest and preferences
While it is a significant challenge for organizations to ever reach the Business Metamorphosis phase, having that as the goal can both motivate the organization and act as a catalyst to move more aggressively along the maturity index.
Big Data Business Model Maturity Index Lessons Learned
There are some interesting lessons that organizations will discover as they progress through the phases of the Big Data Business Model Maturity Index. Understanding these lessons ahead of time should help prepare organizations for their big data journey.
Lesson 1: Focus Initial Big Data Efforts Internally
The first three phases of the Big Data Business Model Maturity Index seek to extract more financial or business value out of the organization's internal processes or business initiatives. The first three phases drive business value and a Return on Investment (ROI) by integrating new sources of customer, product, operational, and market data with advanced analytics to improve the decisions that are made as part of the organization's key internal processes and business initiatives (see Figure 2.4).
Figure 2.4 Optimize internal processes
The internal process optimization efforts start by seeking to leverage the organization's Business Intelligence and data warehouse assets. This includes building on the data warehouse's data sources, data extraction and enrichment algorithms, dimensions, metrics, key performance indicators, reports, and dashboards. The maturity process then applies the four big data value drivers to cross the analytics chasm from the Business Monitoring phase into the Business Insights and ultimately the Business Optimization phases.
The Four Big Data Value Drivers
1. Access to all the organization's detailed transactional and operational data at the lowest level of granularity (at the individual customer, machine, or device level).
2. Integration of unstructured data from both internal (consumer comments, e-mail threads, technician notes) and external sources (social media, mobile, publicly available) with the detailed transactional and operational data to provide new metrics and new dimensions against which to optimize key business processes.
3. Leverage real-time (or right-time) data analysis to accelerate the organization's ability to identify and act on customer, product, and market opportunities in a timelier manner.
4. Apply predictive analytics and data mining to uncover customer, product, and operational insights or areas of “unusualness” buried in the massive volumes of detailed structured and unstructured data that are worthy of further business investigation.
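Value driver 4 mentions uncovering areas of “unusualness.” One of the simplest ways to flag unusual values is a z-score test, sketched below in Python. The technique, the 2.0 threshold, the function name flag_unusual, and the wind turbine vibration readings are illustrative assumptions, not a method prescribed by the maturity index. Real deployments would typically use richer predictive models.

```python
from statistics import mean, stdev


def flag_unusual(readings, threshold=2.0):
    """Return indices of readings whose z-score magnitude exceeds `threshold`.

    The 2.0 default is an assumed cutoff, not a standard.
    """
    mu = mean(readings)
    sigma = stdev(readings)  # sample standard deviation; needs 2+ readings
    if sigma == 0:
        return []  # all readings identical: nothing stands out
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]


# Hypothetical daily vibration readings from one wind turbine.
vibration = [0.51, 0.49, 0.50, 0.52, 0.48, 0.50, 0.95, 0.51]
print(flag_unusual(vibration))
```

A single outlier inflates the standard deviation it is measured against, so with small samples, robust alternatives such as median-based scores are often preferred.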
Organizations must leverage these four big data value drivers to cross the analytics chasm by uncovering new customer, product, and operational insights that can be used to optimize key business processes, whether by delivering actionable recommendations to frontline employees and business managers or by delivering “Next Best Offer” recommendations to delight customers and business partners.
Lesson 2: Leverage Insights to Create New Monetization Opportunities
The last two phases of the Big Data Business Model Maturity Index focus on external market opportunities: opportunities to create new monetization or revenue streams based on the customer, product, and operational insights gleaned from the first three phases of the maturity index (see Figure 2.5).
Figure 2.5 Create new monetization opportunities
This is the part of the big data journey that catches most organizations' attention: the opportunity to leverage the insights gathered through the optimization of their key business processes to create new revenue or monetization opportunities. Organizations are eager to leverage new corporate assets (data, analytics, and business insights) in order to create new sources of revenue. This is the “4Ms” phase of the Big Data Business Model Maturity Index, where organizations focus on leveraging data and analytics to create new opportunities to “Make Me More Money!”
Lesson 3: Preparing for Organizational Transformation
To fully exploit the big data opportunity, subtle organizational and cultural changes will be necessary for the organization to advance along the maturity index. If organizations are serious about integrating data and analytics into their business models, then three organizational or cultural transformations will need to take place:
1. Treat Data as an Asset. Organizations must start to treat data as an asset to be nurtured and grown, not a cost to be minimized. Organizations must develop an insatiable appetite for more and more data, even if they are unclear as to how they will use that data. This is a significant cultural change from the data warehouse days, when we treated data as a cost to be minimized.
2. Legally Protect Your Analytics Intellectual Property. Organizations must put into place formal processes and procedures to capture, track, refine, and even legally protect their analytic assets (e.g., analytic models, data enrichment algorithms, and analytic results such as scores, recommendations, and association rules) as key organizational intellectual property. While the underlying technologies may change over time, the resulting data and analytic assets will survive those changes if the organization can institute a well-managed and enforced process to capture, store, share, and protect those analytic assets.
3. Get Comfortable Using Data to Guide Decisions. Business management and business users must gain confidence in using data and analytics to guide their decision making. Organizations must get comfortable with making business decisions based on what the data and the analytics tell them versus defaulting to the “Highest Paid Person's Opinion” (HIPPO). The organization's investments in data, analytics, people, processes, and technology will be for naught if the organization isn't prepared to make decisions based on what the data and the analytics tell them. With that said, it's important that the analytic insights are positioned as “recommendations” that business users and business management can accept, reject, or modify. In that way, organizations can leverage analytics to establish organizational accountability.
Summary
Businesses of all sizes must reframe the big data conversation with business leaders. The Big Data Business Model Maturity Index provides a framework that enables business and IT leaders to discuss and debate the question “How effective is our organization at integrating data and analytics into our business model?”
The business possibilities seem almost endless with respect to where and how organizations can leverage big data and advanced analytics to drive their business model. The Big Data Business Model Maturity Index provides a roadmap (a how-to guide) to direct the business and IT stakeholders from the Business Monitoring phase through the Business Insights and Business Optimization phases to the ultimate goals in the Data Monetization and Business Metamorphosis phases, where they create new business models (see Table 2.1).
Table 2.1 Big Data Business Model Maturity Index Summary
Business Monitoring: Monitor key business processes and report on business performance using data warehousing and Business Intelligence techniques
Business Insights: Pool all detailed operational and transactional data with internal unstructured data and external (third-party, publicly available) data; integrate with advanced analytics to uncover customer, product, and operational insights buried in the data
Business Optimization: Deliver actionable recommendations and scores to frontline employees to optimize customer engagement; deliver actionable recommendations to end customers based on their product and usage preferences, propensities, and tendencies
Data Monetization: Monetize the customer, product, and operational insights coming out of the optimization process to create new services and products, capture new markets and audiences, and create “smart” products and services
Business Metamorphosis: Reconstitute customer, product, and operational insights to metamorphose the very fabric of an organization's business model, including processes, people, compensation, promotions, products/services, target markets, and partnerships
Ultimately, big data only matters if it can help organizations generate more money through improved decision making (or improved operational effectiveness for non-profit organizations). Big data holds the potential both to optimize key business processes and to create new monetization or revenue opportunities.
In summary:
The Big Data Business Model Maturity Index provides a framework for organizations to measure how effective they are at leveraging data and analytics to power their business models.
The five phases of the Big Data Business Model Maturity Index are Business Monitoring, Business Insights, Business Optimization, Data Monetization, and Business Metamorphosis.
The economics of big data and the four big data value drivers can enable organizations to cross the analytics chasm.
The Big Data Business Model Maturity Index provides a roadmap for being successful with big data by beginning with the end in mind. Otherwise, “if you don't know where you are going, you might end up someplace else” (to quote Yogi Berra).
Homework Assignment
Use the following exercises to apply and reinforce the information presented in this chapter:
Exercise #1: List two or three of your organization's key business processes. That is, write down two or three business processes that uniquely differentiate your organization from your competition.
Exercise #2: List the four big data value drivers that are enabled by the economics of big data and describe how each might impact one of your organization's key business processes identified in Exercise #1.
Exercise #3: For the key business processes identified in Exercise #1, describe how each might be improved as it transitions along the five phases of the Big Data Business Model Maturity Index. Identify the customer, product, and operational ramifications that each of the five phases might have on the selected key business process.
Exercise #4: List the cultural changes that your organization must address if it hopes to leverage big data to its fullest business potential. Flag the top two or three cultural challenges that might be the most difficult for your organization and list what you think the organization needs to do in order to address those challenges.
Chapter 3
The Big Data Strategy Document
One of the biggest challenges organizations face with respect to big data is identifying where and how to start. The big data strategy document, detailed in this chapter, provides a framework for linking an organization's business strategy and supporting business initiatives to the organization's big data efforts. The big data strategy document guides the organization through the process of breaking down its business strategy and business initiatives into potential big data business use cases and the supporting data and analytic requirements.
NOTE
The big data strategy document first appeared in my book Big Data: Understanding How Data Powers Big Business. Since then, and courtesy of several client engagements, significant improvements have been made to help users uncover big data use cases. In particular, the process has been enhanced to clarify the business value and implementation feasibility assessments of the different data sources and the prioritization of use cases (see Figure 3.1).
Figure 3.1 Big data strategy decomposition process
Chapter 3 Objectives
Establish common terminology for big data.
Examine the concept of a business initiative and provide some examples of where to find these business initiatives.
Introduce the big data strategy document as a framework for helping organizations to identify the use cases that guide where and how they can start their big data journeys.
Provide a hands-on example of the big data strategy document in action using Chipotle, a chain of organic Mexican food restaurants (and one of my favorite places to eat!).
Introduce worksheets to help organizations determine the business value and implementation feasibility of the data sources that come out of the big data strategy document process.
Introduce the prioritization matrix as a tool that can drive business and IT alignment around prioritizing the use cases based on business value and implementation feasibility over a 9- to 12-month window.
Finally, have some fun by applying the big data strategy document to the world of professional baseball and demonstrating how it could help a professional baseball organization win the World Series.
Establishing Common Business Terminology
Before we launch into the big data strategy document discussion, we need to define a few critical terms to ensure that we are using consistent terminology throughout the chapter and the book:
Corporate Mission. Why the organization exists; defines what an organization is and the organization's reason for being. For example, The Walt Disney Company's corporate mission is “to be one of the world's leading producers and providers of entertainment and information.”1
Business Strategy. How the organization is going to achieve its mission over the next two to three years.
Strategic Business Initiatives. What the organization plans to do to achieve its business strategy over the next 9 to 12 months; usually includes business objectives, financial targets, metrics, and time frames.
Business Entities. The physical objects or entities (e.g., customers, patients, students, doctors, wind turbines, trucks) whose behaviors and performance the business initiative will try to understand, predict, and influence (sometimes referred to as the strategic nouns of the business).
Business Stakeholders. Those business functions (sales, marketing, finance, store operations, logistics, and so on) that impact or are impacted by the strategic business initiative.
Business Decisions. The decisions that the business stakeholders need to make in support of the strategic business initiative.
Big Data Use Cases. The analytic use cases (decisions and corresponding actions) that support the strategic business initiative.
Data. The structured and unstructured data sources, both internal and external to the organization, that will be identified throughout the big data strategy document process.
Introducing the Big Data Strategy Document
The big data strategy document helps organizations address the challenge of identifying where and how to start their big data journeys. It uses a single-page format, usable by any organization (profit or non-profit), that links an organization's big data efforts to its business strategy and key business initiatives. The big data strategy document is effective for the following reasons:
It's concise. It fits on a single page so that anyone in the organization can quickly review it to ensure he or she is working on the top priority items.
It's clear. It clearly defines what the organization needs to do in order to achieve its key business initiatives.
It's business relevant. It starts by focusing on the business strategy and supporting initiatives before it dives into the data and technology requirements.
The big data strategy document is composed of the following sections (see Figure 3.2):
Business strategy
Key business initiatives
Key business entities
Key decisions
Financial drivers (use cases)
Figure 3.2 Big data strategy document
The rest of the chapter will detail each of these sections and provide guidelines for how the organization can triage its business strategy into the financial drivers (or use cases) on which it can focus its big data efforts. We will use a case study around Chipotle Mexican Grill to reinforce the triage and analysis process.
Identifying the Organization's Key Business Initiatives
The starting point for the big data strategy document process is to identify the organization's business initiatives over the next 9 to 12 months. That is, what is the business trying to accomplish over the next 9 to 12 months? This 9- to 12-month time frame is critical, as it
Focuses the organization's big data efforts on something that is of immediate value and relevance to the business
Creates a sense of urgency for the organization to move quickly and diligently
Gives the big data project a more realistic chance of delivering a positive Return on Investment (ROI) and a financial payback in 12 months or less
A business initiative supports the business strategy and has the following characteristics:
Critical to immediate-term business and/or financial performance (usually a 9- to 12-month time frame)
Communicated (either internally or publicly)
Cross-functional (involves more than one business function)
Owned or championed by a senior business executive
Has a measurable financial goal
Has a well-defined delivery time frame
Delivers compelling financial or competitive advantage
For example, a wireless provider might have a key business initiative to reduce the attrition rate among its most profitable customers by 20 percent over the next 12 months. Or a public utility might have a key business initiative to improve customer satisfaction by a certain number of basis points while reducing water consumption by 20 percent.
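The characteristics above work well as a checklist when vetting candidate initiatives. The Python sketch below encodes them as yes/no checks and reports which ones a candidate fails. The function name missing_characteristics and the assessment of the wireless provider example (assumed here to lack a named executive owner) are hypothetical.

```python
# The seven business initiative characteristics listed above, as yes/no checks.
CHARACTERISTICS = [
    "critical to immediate-term business and/or financial performance",
    "communicated internally or publicly",
    "cross-functional",
    "owned or championed by a senior business executive",
    "has a measurable financial goal",
    "has a well-defined delivery time frame",
    "delivers compelling financial or competitive advantage",
]


def missing_characteristics(answers):
    """Return the characteristics a candidate initiative does not satisfy.

    `answers` maps characteristic -> bool; any characteristic that is not
    answered is treated as unmet.
    """
    return [c for c in CHARACTERISTICS if not answers.get(c, False)]


# Hypothetical assessment of the wireless provider example (reduce attrition among
# the most profitable customers by 20 percent over 12 months). Assumption: no senior
# executive has been named as the owner yet.
attrition_initiative = {c: True for c in CHARACTERISTICS}
attrition_initiative["owned or championed by a senior business executive"] = False

gaps = missing_characteristics(attrition_initiative)
print(gaps)
```

Treating unanswered items as unmet is a deliberately conservative choice. A candidate qualifies only when every characteristic has been explicitly confirmed.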
There are many places to uncover an organization's key business initiatives. If the company is public, then the organization's financial statements are a great starting point. For public, private, and non-profit organizations alike, there is a bevy of publicly available sources for identifying an organization's key business initiatives, including the following (the 10-K and 10-Q filings apply only to public companies):
Annual reports
10-K (filed annually)
10-Q (filed quarterly)
Quarterly analyst calls
Executive presentations and conferences
Executive blogs
News releases
Social media sites
SeekingAlpha.com
Web searches using Google, Yahoo, and Bing
The best way to grasp the big data strategy document process is with a hands-on example. And what better place to test the big data strategy document than with one of my favorite restaurants, Chipotle!
What's Important to Chipotle?
Let's start the business strategy analysis process by reviewing Chipotle's annual report to determine what's important to the company from a business strategy perspective. Figure 3.3 shows an abbreviated version of the Chipotle President's Letter to Shareholders from the 2012 annual report.
Figure 3.3 Chipotle's 2012 letter to the shareholders
From the President's Letter, we can identify at least four key business initiatives for the coming year:
Improve employee (talent) acquisition, maturation, and retention (which is especially important for an organization where 90 percent of its management has come up through the ranks of the store).
Continue double-digit revenue growth (up 20.3 percent in 2012) by opening new stores (opened 183 new stores in 2012).
Increase same store sales growth (7.1 percent growth in 2012).
Improve marketing effectiveness in building the Chipotle brand and engaging with customers in ways that create stronger, deeper bonds.
While any of these four business initiatives is ripe for the big data strategy document, for the remainder of this exercise we'll focus on the “increase same store sales” business initiative, because increasing the sales of a business entity or outlet is relevant across a number of different industries (e.g., hospitality, gaming, banking, insurance, retail, higher education, healthcare providers).
Identify Key Business Entities and Key Decisions
After identifying our targeted business initiative, the next step is to identify the key business entities that are important to the targeted business initiative (“increase same store sales”). Business entities are the strategic nouns around which the targeted business initiative must focus. You probably won't have more than three to five business entities, or strategic nouns, for any single business initiative.
NOTE
It is around these business entities that we are going to want to capture the behaviors, tendencies, patterns, trends, preferences, etc. at the individual entity level. For example, a credit card company would want to capture Bill Schmarzo's specific travel and buying patterns and tendencies in order to better detect fraud and improve merchant marketing offers.
Figure 3.4 shows the template that we are going to use to support the big data strategy document process. We have already captured our targeted “increase same store sales” business initiative.
Figure 3.4 Chipotle's “increase same store sales” business initiative
Take a moment to write down what you think might be the key business entities or strategic nouns for the “increase same store sales” business initiative:
Here are three business entities that I came up with:
Stores
Local events (sporting, entertainment, social)
Local competitors
NOTE
I did not include “customers” as one of my business entities for Chipotle. Customers are typically a leading business entity candidate; however, as of the writing of this book, Chipotle did not have a customer loyalty program. The lack of a customer loyalty program would make it difficult for Chipotle to identify individual customer behaviors, tendencies, patterns, trends, and product preferences.
Next, for each of the identified business entities, brainstorm the analytic insights that you might want to capture for each of the individual business entities. That is, what is it that you'd like to know about each individual business entity that could support the strategic business initiative? For each of the key business entities listed below, jot down some analytic insights that you would like to know about each:
Stores
Local events
Local competitors
Here are some potential analytic insights that I would like to capture about each of the Chipotle business entities:
Stores: For each individual store, understand in-store traffic patterns, nearby customer demographics, most popular market basket combinations, customer product preferences (by time of day, day of week, seasonality, and holidays), weather conditions, outside traffic conditions, local economic situation, local home values, nearby schools and colleges, Yelp rating, social media sentiment, and so forth.
Local Events: For each individual local event, understand the type of event, the frequency of the event, when the event occurs (time of day, day of week, time of year), event start and stop times, the number of attendees, the demographics of participants and attendees (age, gender), event administrator/coordinator, event sponsors, and so on.
Local Competitors: For each individual local competitor, understand type of competitor, size of competitor, chain or mom-and-pop competitor, distance from competitor, type of food served, type of service, price points, Yelp ratings, social media sentiment, length of time in service, customer demographics, etc.
Next, identify the key business decisions that need to be made about the key business entities in support of the targeted business initiative. That is, what decisions do Chipotle store and corporate management need to make about the business entities to support the “increase same store sales” business initiative? This is a great opportunity to brainstorm the decisions with your key business stakeholders, those people or workers who impact or are impacted by the key business initiative.
Here are examples of some key business decisions by business entity:
Business entity: Stores
How much staffing, inventory, and ingredients do I need for the upcoming weekend given the local events?
How much staffing, inventory, and ingredients do I need given upcoming holidays and seasonal events?
How much staffing, inventory, and ingredients do I need for Friday local business catering?
How much staffing, inventory, and ingredients do I need for local high school demand (school in session versus school out of session)?
What are the ideal hours of operation for the upcoming high school football season?
Business entity: Local events
How much additional staffing will I need for which local events?
How much additional inventory and ingredients will I need for which local events?
Which local events do I want to sponsor and at what cost?
What promotions do I want to offer in support of the local events?
Business entity: Local competitors
What are the most effective offers or promotions to counter competitors that are taking customers away from me?
What are my competitors' most effective promotions?
What pricing and production changes do I need to make in light of key competitor activities?
To which local competitors' promotions do I need to respond?
What's the most effective response or promotion given competitors' promotional activities?
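Several of the decisions above (staffing, inventory, and ingredients for local events) come down to estimating how much a local event lifts demand. The Python sketch below estimates that lift as the ratio of average transactions on event days to average transactions on non-event days, then converts a baseline forecast into a staffing number. The history, the function names, and the 100-transactions-per-staff-member ratio are all hypothetical. A production model would also control for weather, seasonality, and day of week.

```python
from math import ceil
from statistics import mean


def event_uplift(history):
    """Ratio of mean daily transactions on local-event days to non-event days.

    `history` is a list of (transactions, had_local_event) tuples.
    """
    event_days = [t for t, had_event in history if had_event]
    normal_days = [t for t, had_event in history if not had_event]
    return mean(event_days) / mean(normal_days)


def staff_needed(baseline_transactions, uplift, transactions_per_staff):
    """Staff needed to cover the uplifted forecast, rounded up to whole people."""
    forecast = baseline_transactions * uplift
    return ceil(forecast / transactions_per_staff)


# Hypothetical history for one store: (transactions that day, local event nearby?)
history = [(800, False), (820, False), (780, False), (1000, True), (1040, True)]
uplift = event_uplift(history)  # 1020 / 800 = 1.275

# Assumed planning ratio: one staff member per 100 transactions per day.
print(staff_needed(baseline_transactions=800, uplift=uplift, transactions_per_staff=100))
```

The same uplift factor could scale inventory and ingredient orders, which is why one analytic model can serve several of the decisions listed for stores and local events.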
Take a shot below at brainstorming some additional business decisions that the store manager or corporate management might have to make about the key business entities to support the “increase same store sales” business initiative.
Business entity: Stores
Decision:
Decision:
Business entity: Local events
Decision:
Decision:
Business entity: Local competitors
Decision:
Decision:
NOTE
Some of the decisions will be very similar. That's good, because it allows the organization to approach the decisions from multiple perspectives.
Figure 3.5 shows what the Chipotle big data strategy document looks like at this point in the exercise, with the addition of some of the business decisions.
Figure 3.5 Chipotle key business entities and decisions
Identify Financial Drivers (Use Cases)
Next you want to group the decisions into common use cases or “common themes.” That is, identify and cluster those decisions that seem similar in their business and financial objectives. The resulting use cases are the financial drivers, or the “how do we make more money” opportunities, for our targeted business initiative.
CROSS-REFERENCE
While it is hard to actually do this grouping process in a book, the use of facilitation techniques to help brainstorm and group these decisions will be covered in the Facilitation Techniques section of Chapter 13, “Power of Envisioning.”
For Chipotle's “increase same store sales” business initiative, the following are likely financial drivers or use cases:
Increase store traffic (acquire new customers, increase frequency of repeat customers)
Increase shopping bag revenue and margins (cross-sell complementary products, up-sell)
Increase number of corporate events (catering, repeat catering events)
Improve promotional effectiveness (Halloween Boo-ritto, Christmas gift cards, graduation, holiday, and special event gift cards)
Improve new product introduction effectiveness (seasonal, holiday)
The entire big data strategy document process has been designed to uncover these use cases, that is, to identify the financial drivers that support our targeted business initiative. The use cases and financial drivers are the point in the big data strategy document where we focus the organization on the “Make Me More Money” big data opportunities.
In addition, these use cases are key to guiding the data science team in its data acquisition, data cleansing, data enrichment, metric discovery, score creation, and analytic model development processes. For example, the Chipotle “increase same store sales” use cases may translate into the analytics shown in Table 3.1:
Table 3.1 Mapping Chipotle Use Cases to Analytic Models
Chipotle Use Case: Potential Analytic Models
Increase Store Traffic: Store Marketing Effectiveness; Store Layout Flow Analysis; Store Remodeling Lift Analysis; Store Customer Targeting
Increase Shopping Bag Revenue and Margin: In-store Merchandising Effectiveness; Pricing Optimization; Up-sell/Cross-sell Effectiveness; Market Basket Analysis
Increase Number of Corporate Events: Campaign Effectiveness; Pipeline and Sales Effectiveness; Pricing Optimization; Customer Lifetime Value Score; Likelihood to Recommend Score
Improve Promotional Effectiveness: Promotional Effectiveness; Pricing Optimization; Market Basket Analysis; Up-sell/Cross-sell Effectiveness
Improve New Product Introductions: Pricing Optimization; New Product Introductions Effectiveness; Up-sell/Cross-sell Effectiveness
NOTE
You will discover that many of the analytic models developed for one use case will support other use cases. As organizations build out their analytic assets, you will find opportunities to leverage these analytic assets (data, data enrichment techniques, analytic models) to accelerate addressing additional big data use cases.
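The reuse described in the note can be read directly off Table 3.1. The Python sketch below transcribes the table and lists each analytic model that serves more than one use case. The dictionary and function names are hypothetical, but the use case and model names come from the table.

```python
from collections import defaultdict

# Use cases and candidate analytic models, transcribed from Table 3.1.
USE_CASE_MODELS = {
    "Increase Store Traffic": [
        "Store Marketing Effectiveness", "Store Layout Flow Analysis",
        "Store Remodeling Lift Analysis", "Store Customer Targeting"],
    "Increase Shopping Bag Revenue and Margin": [
        "In-store Merchandising Effectiveness", "Pricing Optimization",
        "Up-sell/Cross-sell Effectiveness", "Market Basket Analysis"],
    "Increase Number of Corporate Events": [
        "Campaign Effectiveness", "Pipeline and Sales Effectiveness",
        "Pricing Optimization", "Customer Lifetime Value Score",
        "Likelihood to Recommend Score"],
    "Improve Promotional Effectiveness": [
        "Promotional Effectiveness", "Pricing Optimization",
        "Market Basket Analysis", "Up-sell/Cross-sell Effectiveness"],
    "Improve New Product Introductions": [
        "Pricing Optimization", "New Product Introductions Effectiveness",
        "Up-sell/Cross-sell Effectiveness"],
}


def shared_models(mapping):
    """Map each analytic model used by two or more use cases to those use cases."""
    users = defaultdict(list)
    for use_case, models in mapping.items():
        for model in models:
            users[model].append(use_case)
    return {model: ucs for model, ucs in users.items() if len(ucs) > 1}


# Print the reusable models, most widely shared first.
for model, use_cases in sorted(shared_models(USE_CASE_MODELS).items(),
                               key=lambda kv: -len(kv[1])):
    print(f"{model}: {len(use_cases)} use cases")
```

With the table as transcribed, Pricing Optimization serves four use cases, Up-sell/Cross-sell Effectiveness serves three, and Market Basket Analysis serves two. This suggests where an early model investment could pay off more than once.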
Figure 3.6 shows what the final Chipotle big data strategy document looks like at this point in the exercise.
Figure 3.6 Completed Chipotle big data strategy document
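Because the strategy document decomposes a business initiative into business entities, decisions, and use cases, it maps naturally onto a small data structure. The Python sketch below shows one hypothetical way to capture it, populated with a subset of the Chipotle entries from this chapter. The class and field names are assumptions. The business strategy field is left empty because the walkthrough starts from the initiative rather than from a stated Chipotle strategy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BusinessEntity:
    """A strategic noun plus the decisions stakeholders must make about it."""
    name: str
    decisions: list[str] = field(default_factory=list)


@dataclass
class StrategyDocument:
    """One-page big data strategy document (hypothetical structure)."""
    business_initiative: str
    business_strategy: str = ""  # left empty: the walkthrough starts at the initiative
    entities: list[BusinessEntity] = field(default_factory=list)
    use_cases: list[str] = field(default_factory=list)  # the financial drivers

    def decision_count(self) -> int:
        return sum(len(entity.decisions) for entity in self.entities)


chipotle = StrategyDocument(
    business_initiative="Increase same store sales",
    entities=[
        BusinessEntity("Stores", [
            "How much staffing, inventory, and ingredients do I need given upcoming holidays and seasonal events?",
            "What are the ideal hours of operation for the upcoming high school football season?",
        ]),
        BusinessEntity("Local events", [
            "Which local events do I want to sponsor and at what cost?",
            "What promotions do I want to offer in support of the local events?",
        ]),
        BusinessEntity("Local competitors", [
            "To which local competitors' promotions do I need to respond?",
        ]),
    ],
    use_cases=[
        "Increase store traffic",
        "Increase shopping bag revenue and margins",
        "Increase number of corporate events",
        "Improve promotional effectiveness",
        "Improve new product introduction effectiveness",
    ],
)

print(f"{len(chipotle.entities)} entities, {chipotle.decision_count()} decisions, "
      f"{len(chipotle.use_cases)} use cases")
```

Keeping use cases at the document level rather than under each entity reflects the grouping step. A use case clusters decisions from several entities, so it does not belong to any single one.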
Identify and Prioritize Data Sources
With the use cases and financial drivers identified, we are now ready to move into the data and metrics envisioning process. We want to brainstorm data sources (regardless of whether or not you currently have access to these data sources) that might yield new insights that support the targeted business initiative. We want to unleash the business and IT teams' creative thinking to brainstorm data sources that might yield new customer, product, store, campaign, and operational insights that could improve the effectiveness of the different use cases.
For example, Chipotle data sources that were identified as part of the envisioning exercises could include:
Point of Sales Transactions
Market Baskets
Product Master
Store Demographics
Competitive Stores Sales
Store Manager Notes
Employee Demographics
Store Manager Demographics
Consumer Comments
Weather
Traffic Patterns
Yelp
Zillow/Realtor.com
Twitter/Facebook/Instagram
Twellow/Twellowhood
Zip Code Demographics
EventBrite
MaxPreps
Mobile App
Butnotalldatasourcesareofequalbusinessvalueorhaveequalimplementationfeasibility.Thedatasourcesneedtobeevaluatedinlightof
Thebusinessvaluethatdatasourcecouldprovideinsupportoftheindividualusecase
Thefeasibility(orease)ofacquiring,cleaning,aligning,normalizing,enriching,andanalyzingthosedatasources
Sowewanttoaddtwoprocesses(worksheets)tothebigdatastrategydocumentprocessthatwillevaluatethebusinessvalueandimplementationfeasibilityofeachofthepotentialdatasources.
Building on our Chipotle case study, let's first assess the potential business value of the different data sources vis-à-vis the identified use cases (see Figure 3.7).
Figure 3.7 Business value of potential Chipotle data sources
You'd want to go through a group brainstorming process with the business stakeholders to assess the relative value of each data source with respect to each use case. The business users own the business value determination because they are best positioned to understand and quantify the business value that each data source could provide to the use cases.
NOTE
I like using Harvey Balls (http://en.wikipedia.org/wiki/Harvey_Balls) in both the data value and the feasibility assessment charts. The Harvey Balls quickly and easily communicate the relative value of each data source with respect to each use case.
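As a rough illustration of how the data value worksheet could be captured outside a slide, here is a minimal Python sketch. It assumes each data source gets a 0 to 4 score per use case (0 = empty Harvey Ball, 4 = full); the scores, the "high value" threshold, and the names `VALUE_SCORES`, `render_row`, and `count_high_value` are hypothetical placeholders, not the values behind Figure 3.7.

```python
# Minimal sketch of a data value worksheet (hypothetical scores, not Figure 3.7).
# Each data source is scored 0-4 per use case; 0 = empty ball, 4 = full ball.

HARVEY_BALLS = ["○", "◔", "◑", "◕", "●"]  # glyph for each score 0..4

USE_CASES = [
    "increase store traffic",
    "improve promotional effectiveness",
    "increase shopping bag revenue",
    "increase number of corporate events",
    "improve new product introduction effectiveness",
]

# Hypothetical business value scores, one per use case in USE_CASES order
VALUE_SCORES = {
    "Point of Sales Transactions": [4, 4, 4, 3, 4],
    "Store Demographics": [4, 3, 3, 0, 3],
    "Local Events": [4, 3, 1, 0, 0],
}


def render_row(source: str, scores: list[int]) -> str:
    """Render one worksheet row: the source name followed by Harvey Ball glyphs."""
    return f"{source:<28}" + " ".join(HARVEY_BALLS[s] for s in scores)


def count_high_value(scores: list[int], threshold: int = 3) -> int:
    """Count the use cases for which a source scores at or above the threshold."""
    return sum(1 for s in scores if s >= threshold)


if __name__ == "__main__":
    for source, scores in VALUE_SCORES.items():
        high = count_high_value(scores)
        print(render_row(source, scores), f"(high value for {high} of {len(USE_CASES)} use cases)")
```

The same structure could hold the feasibility worksheet, with IT supplying feasibility scores in place of the business value scores.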
Reviewing the data value assessment chart in Figure 3.7, you can quickly uncover some key observations, such as the following:
Detailed point-of-sale data is important to all of the use cases.
Insights from the Store Demographics data are important to four of the five use cases.
Mining Consumer Comments has a surprisingly strong impact across four of the five use cases.
Local events data is important to the “increase store traffic” and “improve promotional effectiveness” use cases but has little impact on the “increase shopping bag revenue,” “increase number of corporate events,” or “improve new product introduction effectiveness” use cases.
Next, you want to understand the implementation feasibility for each of the potential data sources. This part of the exercise is primarily driven by the IT organization since it is best positioned to understand the implementation challenges and risks associated with each of the data sources, such as ease of data acquisition, cleanliness of the data, data accuracy, data granularity, cost of acquiring the data, organizational skill sets, tool proficiencies, and other risk factors. The implementation feasibility assessment chart for Chipotle's “increase same store sales” business initiative looks like Figure 3.8.
Figure 3.8 Implementation feasibility of potential Chipotle data sources
From the Chipotle implementation feasibility assessment chart in Figure 3.8, we can quickly make the following observations:
Point of Sales, Market Baskets, and Store Manager Demographics data is readily available and easy to integrate (likely due to the master data management and data governance efforts necessary to load this data into a data warehouse).
Consumer Comments data, which was very valuable in the business value assessment, has several implementation risks. Lack of organizational experience in dealing with unstructured data is likely the source of these risks, which manifest themselves in the areas of data acquisition, standardization, integration, cleanliness, accuracy, granularity, skill sets, and tool proficiencies.
Social Media data, which was rated about mid-value in the value assessment exercise, also looks to be a real challenge. Many of the same cleanliness, accuracy, and granularity issues exist, with the added issue that this is data that will need to be “acquired” through some means. Probably not the first data source you want to deal with in this use case.
Local Events data, which was very important in the “increase store traffic” use case, also poses many challenges. Pulling data from sources such as EventBrite and MaxPreps may require screen scraping in order to get the level of detail needed about a particular local event. While these sites may provide APIs to ease the data acquisition process, many times the APIs don't provide the complete detail that the data science team may want. And screen scraping, while a very useful data scientist tool, poses all sorts of challenges in cleaning the data after scraping.
Introducing the Prioritization Matrix
The final step in the big data strategy document process is to take the business and IT stakeholders through a use case prioritization process. While we will cover the prioritization matrix in detail in Chapter 13, I want to introduce the concept here as the natural conclusion of the big data strategy document process.
As part of the big data strategy document, we have now identified the use cases that support the organization's key business initiative, brainstormed additional data sources, and assessed those data sources for business value and implementation feasibility. We are now ready to prioritize the use cases based on their relative business value and implementation feasibility over the next 9 to 12 months (see Figure 3.9).
Figure 3.9 Chipotle prioritization of use cases
WARNING
It is critical to remember to use the next 9- to 12-month time frame as the basis for the prioritization process. The 9- to 12-month time frame ensures that the big data project is delivering immediate-term business value and business relevance with a sense of urgency, and it keeps the big data project from wandering into a “boil the ocean” type of project that is doomed to failure.
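To make the idea of the prioritization matrix concrete, here is a minimal Python sketch that places use cases into business value and implementation feasibility quadrants and ranks them. The 1 to 10 scores, the quadrant midpoint, the quadrant labels, and the ranking rule (combined score, ties broken by business value) are all hypothetical assumptions for illustration; Chapter 13 describes the actual prioritization process.

```python
# Minimal sketch of a prioritization matrix. The 1-10 scores, midpoint, quadrant
# labels, and ranking rule are hypothetical; they are not taken from Figure 3.9.
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    value: int        # business value score, owned by business stakeholders
    feasibility: int  # implementation feasibility score, owned by IT


def quadrant(uc: UseCase, midpoint: float = 5.5) -> str:
    """Place a use case in one of four business value / feasibility quadrants."""
    v = "high value" if uc.value >= midpoint else "low value"
    f = "high feasibility" if uc.feasibility >= midpoint else "low feasibility"
    return f"{v} / {f}"


def prioritize(use_cases: list[UseCase]) -> list[UseCase]:
    """Rank by combined score, breaking ties in favor of higher business value."""
    return sorted(use_cases, key=lambda uc: (uc.value + uc.feasibility, uc.value), reverse=True)


# Use case names from the Chipotle example; the scores are made up
CHIPOTLE_USE_CASES = [
    UseCase("increase store traffic", 8, 5),
    UseCase("improve promotional effectiveness", 7, 8),
    UseCase("increase shopping bag revenue", 6, 9),
    UseCase("increase number of corporate events", 4, 6),
    UseCase("improve new product introduction effectiveness", 5, 3),
]

if __name__ == "__main__":
    for uc in prioritize(CHIPOTLE_USE_CASES):
        print(f"{uc.name:<48} {quadrant(uc)}")
```

In a real workshop the scores would come out of the business value and feasibility worksheets, and the ranking rule itself is something the business and IT stakeholders would agree on.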
Using the Big Data Strategy Document to Win the World Series
To test our competency with the big data strategy document, let's examine a fun case study. Let's say that you are the general manager of a professional baseball team. The corporate mission of the organization is to “Win the World Series.” (On a personal note, I am convinced that there are teams where the goal is not to “Win the World Series” but instead to just make profits without regard to the quality of play on the field, but that's the cynical Chicago Cubs fan in me coming out.)
As in any commercial business, there are multiple business strategies that a professional baseball organization could pursue in order to achieve the “Win the World Series” mission, including:
Spend huge amounts of money for veteran, proven, top-performing players (New York Yankees, Boston Red Sox, Los Angeles Dodgers);
Spend huge amounts of money for over-the-hill, inconsistently performing players (Chicago Cubs are moving away from this strategy, though the New York Mets seem to be trying to perfect this approach);
Spend top money to have outstanding starting and relief pitching, and scrounge together enough timely hitting to win games (San Francisco Giants);
Spend top money to have outstanding hitting and hope that you can piece together enough pitching to win games (Texas Rangers, Los Angeles Angels);
Spend miserly amounts of money and rely on your minor league system and sabermetrics to draft and develop high-quality, low-paid rookie players (Oakland A's, Minnesota Twins, Kansas City Royals, Tampa Bay Rays).
So using the big data strategy document, let's play General Manager of the San Francisco Giants to see what the Giants would need to do to achieve their goal to “Win the World Series.”
The first step is to clearly articulate our business strategy. In the case of the San Francisco Giants, I'd say that the business strategy that would support the team's “Win the World Series” corporate mission would be “Acquire and retain high-performing, sustainable, starting pitching coupled with small ball hitting to compete annually for the World Series.”
Let's remember that a business strategy is typically two to three years or more on the horizon. If you change your business strategy annually, then that's not a strategy (sounds more like a fad). But companies do and should change their business strategies based on changing economic conditions, market forces, customer demographic trends, technology changes, and even new insights from big data analytics (which might reveal that strong pitching tends, from a statistical perspective, to beat strong hitting in the postseason). This is exactly what the San Francisco Giants seem to have done as the team has moved away from a “long ball” baseball strategy in trying to reach the World Series (by surrounding Barry Bonds with other strong batters) to its current “superior starting pitching” business strategy.
So let's use the big data strategy document to see what we (the San Francisco Giants) need to do to execute against the “superior starting pitching to win the World Series” business strategy.
First, we want to decompose the business strategy into the supporting business initiatives. Remember, business initiatives are cross-functional plans, typically 9 to 12 months in length, with clearly defined financial or business metrics. For our baseball exercise, I'm only going to list two business initiatives (though I can think of two or three more that also need to be addressed in the case of the San Francisco Giants):
Acquire and maintain top-tier starting pitching
Perfect small ball offensive strategy
Next, I want to identify the key business entities, or strategic nouns, around which I need to capture analytic insights to support the targeted business initiatives. For our case study, this could include:
Pitchers. Develop detailed knowledge and predictive insights into individual starting pitchers' in-game and situational pitching tendencies and performance as measured by quality starts (at least six innings pitched while allowing no more than three earned runs), Earned Run Average (ERA), Walks and Hits per Inning Pitched (WHIP), strikeout-to-walk ratio, and number of home runs per nine innings by competitors, by specific batters, by ballpark, by weather conditions, by days of rest, by number of games into the season, etc.
Batters. Develop detailed knowledge and predictive insights into batter tendencies and behaviors as measured by On Base Percentage (OBP), batting average with runners in scoring position, stealing percentage, hit-and-run execution, and sacrifice hitting effectiveness by count, by number of outs, by competitive pitcher, by who's on base, by day versus night, etc.
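The pitching metrics listed above have standard formulas: ERA = 9 × earned runs / innings pitched, WHIP = (walks + hits) / innings pitched, strikeout-to-walk ratio = strikeouts / walks, and HR/9 = 9 × home runs / innings pitched. The minimal Python sketch below computes them from per-start lines and then repeats the calculation "by ballpark" as one example of the "by" breakdowns. The `Start` record, its field names, and the sample starts are hypothetical; innings are stored as outs recorded to avoid the 6.1/6.2 notation used in box scores.

```python
# Minimal sketch of starting pitcher metrics from per-start lines.
# The Start record and the sample data are hypothetical.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Start:
    ballpark: str
    outs: int          # outs recorded; innings pitched = outs / 3
    hits: int
    walks: int
    strikeouts: int
    earned_runs: int
    home_runs: int


def is_quality_start(s: Start) -> bool:
    """At least six innings (18 outs) with no more than three earned runs."""
    return s.outs >= 18 and s.earned_runs <= 3


def pitching_line(starts: list[Start]) -> dict[str, float]:
    """Aggregate ERA, WHIP, K/BB, HR/9, and quality start rate over the starts."""
    ip = sum(s.outs for s in starts) / 3
    if ip == 0:
        raise ValueError("no innings pitched")
    er = sum(s.earned_runs for s in starts)
    hits = sum(s.hits for s in starts)
    walks = sum(s.walks for s in starts)
    ks = sum(s.strikeouts for s in starts)
    hr = sum(s.home_runs for s in starts)
    return {
        "ERA": 9 * er / ip,
        "WHIP": (walks + hits) / ip,
        "K/BB": ks / walks if walks else float("inf"),
        "HR/9": 9 * hr / ip,
        "QS%": sum(is_quality_start(s) for s in starts) / len(starts),
    }


def pitching_by_ballpark(starts: list[Start]) -> dict[str, dict[str, float]]:
    """A "by" analysis: the same metrics split out per ballpark."""
    groups: dict[str, list[Start]] = defaultdict(list)
    for s in starts:
        groups[s.ballpark].append(s)
    return {park: pitching_line(group) for park, group in groups.items()}


starts = [
    Start("home", outs=21, hits=5, walks=1, strikeouts=8, earned_runs=2, home_runs=1),
    Start("away", outs=15, hits=7, walks=3, strikeouts=4, earned_runs=4, home_runs=1),
]

if __name__ == "__main__":
    print(pitching_line(starts))
    print(pitching_by_ballpark(starts))
```

Swapping the grouping key (opponent, days of rest, weather bucket) gives the other "by" views named in the text.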
We could also develop profiles for Coaches, Competitors, and maybe even Stadiums, but for reasons of time and simplicity, we'll just stick to the Pitchers and Batters business entities for this exercise.
Next, let's brainstorm the decisions and questions that we need to address about our key business entities to support our targeted business initiatives:
Acquire and Maintain Top-Tier Starting Pitching
Who are my most effective starting pitchers?
Which starting pitchers do I re-sign and for how much money and length of contract?
Which free agent starting pitchers do I want to sign and for how much money and length of contract?
Which starting pitchers do I want to trade, and what is my expectation of the value that they will bring in the market?
Which competitors' starting pitchers do I want to try to acquire via trades and for how much?
Which of my minor league pitchers are projected to be my most effective big league starting pitchers over the next two to three years?
What is my starting pitching rotation?
Which starting pitchers are currently struggling, and what are the likely reasons for this struggling?
Which starting pitchers should I rest by having them miss a start?
Perfect Small Ball Offensive Strategy
Which batters are most effective in getting on base?
Which batters are most effective in advancing runners?
Which batters are most effective in driving in runners from third base?
Which players are my best base stealers?
Who are my most timely hitters in late-in-the-game pressure situations?
Which minor league batters are most effective in getting on base?
Which minor league batters are most effective in advancing runners?
Which minor league batters are most effective in driving in runners from third base?
Which minor league players excel at base stealing?
Which free agent batters are most effective in getting on base?
Which free agent batters are most effective in advancing runners?
Which free agent batters are most effective in driving in runners from third base?
Which free agent players excel at base stealing?
If you are a baseball junkie like I am, take a moment to list some additional decisions and questions that you'd like to address for the two targeted business initiatives: top-tier starting pitching and small ball offense.
Now we can group the decisions (and questions in this exercise) into common use cases that could include:
Improve starting pitching proficiency by optimizing trades, free agent signings, minor league promotions, and contract extensions (cost versus starting pitching performance effectiveness)
Preserve starting pitching effectiveness throughout the regular season and playoffs by optimizing pitch counts, pitcher rotations, pitcher rests, etc.
Improve batting and slugging proficiency by optimizing trades, free agent signings, minor league promotions, and contract extensions
Increase in-game “small ball” runs scored effectiveness through the optimal combination of batters, hitting, stealing, base running, and sacrifice hitting strategies
Accelerate minor league player development through player strength and conditioning training, game situations, and minor league assignments
Optimize in-game pitch selection decisions through improved understanding of batter and pitcher matchups
Figure 3.10 shows the resulting big data strategy document.
Figure 3.10 San Francisco Giants big data strategy document
Next, we would brainstorm the potential data sources to support the use cases, including:
Personnel Player Health. This should include personal health history (weight, health, BMI, injuries, therapy, medications), physical performance metrics (60-foot dash time, long toss distances, fastball velocity), and workout history (bench press, deadlift, crunches and push-ups in 60 seconds, frequency and recency of workouts).
Starting Pitcher Performance. This should include a detailed pitching history including number of pitches thrown, strike-to-ball ratio, strikeouts-to-walk ratio, walks and hits per inning pitched, ERA, first pitch strikes, batting average against, and slugging percentage per time of year, per opponent, and per game.
Batter Performance. This should include a detailed batting history including batting average, walks, on-base percentage, slugging percentage, strikeouts, hitting into double plays, bunting success percentage, on-base plus slugging (OPS), wins above replacement, and hitting with runners on base per time of year, per opponent, and per game.
Competitors' Pitching Performance. This should include recent history of competitors' pitchers' performance including number of pitches thrown, strike-to-ball ratio, strikeouts-to-walk ratio, walks and hits per inning pitched, ERA, first pitch strikes, batting average against, and slugging percentage against per time of year, per opponent, and per game.
Competitors' Hitting Performance. This should include recent history of competitors' batters' hitting performance including batting average, walks, on-base percentage, slugging percentage, strikeouts, hitting into double plays, bunting success percentage, on-base plus slugging (OPS), wins above replacement, and hitting with runners on base per time of year, per opponent, and per game.
Stadium Information. This should include length down the lines, length to deep center, average temperatures by day of year, average humidity by day of year (very important for knuckleballers), elevation, etc.
There are other data sources that could also be considered, such as weather conditions at game time, performance numbers of the game's top historical pitchers (for benchmarking purposes), performance numbers for the game's top historical batters (again, for benchmarking purposes), and economic costs (salary, bonuses, etc.).
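The batter performance data source could be turned into "per opponent" metrics along the lines of this minimal Python sketch. It uses the standard definitions OBP = (H + BB + HBP) / (AB + BB + HBP + SF), SLG = total bases / AB, and OPS = OBP + SLG. The `GameLine` record, its fields, and the sample games against two opponents are hypothetical.

```python
# Minimal sketch of batter metrics "per opponent". GameLine and the sample
# games are hypothetical.
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GameLine:
    opponent: str
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sac_flies: int


def batting_line(games: list[GameLine]) -> dict[str, float]:
    """Aggregate AVG, OBP, SLG, and OPS over a set of game lines."""
    ab = sum(g.at_bats for g in games)
    hits = sum(g.singles + g.doubles + g.triples + g.home_runs for g in games)
    total_bases = sum(g.singles + 2 * g.doubles + 3 * g.triples + 4 * g.home_runs for g in games)
    bb = sum(g.walks for g in games)
    hbp = sum(g.hit_by_pitch for g in games)
    sf = sum(g.sac_flies for g in games)
    obp_denominator = ab + bb + hbp + sf
    obp = (hits + bb + hbp) / obp_denominator if obp_denominator else 0.0
    slg = total_bases / ab if ab else 0.0
    avg = hits / ab if ab else 0.0
    return {"AVG": avg, "OBP": obp, "SLG": slg, "OPS": obp + slg}


def batting_by_opponent(games: list[GameLine]) -> dict[str, dict[str, float]]:
    """Split the same metrics out per opponent."""
    groups: dict[str, list[GameLine]] = defaultdict(list)
    for g in games:
        groups[g.opponent].append(g)
    return {opp: batting_line(group) for opp, group in groups.items()}


games = [
    GameLine("Dodgers", 4, singles=1, doubles=1, triples=0, home_runs=0, walks=1, hit_by_pitch=0, sac_flies=0),
    GameLine("Dodgers", 3, singles=0, doubles=0, triples=0, home_runs=1, walks=0, hit_by_pitch=0, sac_flies=1),
    GameLine("Padres", 4, singles=0, doubles=0, triples=0, home_runs=0, walks=0, hit_by_pitch=0, sac_flies=0),
]

if __name__ == "__main__":
    for opponent, line in batting_by_opponent(games).items():
        print(opponent, {k: round(v, 3) for k, v in line.items()})
```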
In a real big data strategy document exercise, we would continue to evaluate each of the different data sources from a business value and implementation feasibility perspective vis-à-vis each of the identified use cases. Then we would go through a prioritization matrix process to ensure that both the business users (coaches, front office management) and IT agree on which use cases to start with.
So playing the San Francisco Giants' General Manager was a fun exercise that provided another perspective on how to use the big data strategy document not only to break down your organization's business strategy and key business initiatives into the key business entities and key decisions but to ultimately uncover the supporting data and analytic requirements.
Summary
This chapter focused on the big data strategy document and key related topics including:
Introduced the concept of a business initiative and provided some examples of where to find these business initiatives
Introduced the big data strategy document as a framework for helping organizations to identify the use cases that guide where and how they can start their big data journeys
Provided a hands-on example of the big data strategy document in action using Chipotle, a chain of organic Mexican food restaurants
Introduced worksheets to help organizations to determine the business value and implementation feasibility of the data sources that come out of the big data strategy document process
Introduced the prioritization matrix as a tool to help drive business and IT alignment around the top priority use cases over a 9- to 12-month window
Had some fun by applying the big data strategy document to the world of professional baseball
This chapter outlined the big data strategy document as a framework to help an organization identify where and how to start its big data journey in support of the organization's 9- to 12-month key business initiatives. The big data strategy document is a tool to ensure that your big data journey is valuable and relevant from a business perspective.
To swing back around to the Chipotle case study, Figure 3.11 shows some initial results of the company's success with its “increase same store revenues” business initiative. (For more information, see the article at www.trefis.com/stock/cmg/articles/210221/chipotles-sales-surge-on-traffic-gains-high-food-costs-dent-margins/2013-10-21.)
Figure 3.11 Chipotle's same store sales results
It's nice to see that our Chipotle use case actually has a real business story behind it. But then again, every big data initiative should have a real business story behind it. Remember, organizations don't need a big data strategy as much as they need a business strategy that incorporates big data.
Homework Assignment
Use the following exercises to apply the big data strategy document to your organization (or one of your favorite organizations).
Exercise #1: Start by identifying your organization's key business initiatives over the next 9 to 12 months.
Exercise #2: Select one of your business initiatives, and then brainstorm the key business entities or strategic nouns that impact that selected business initiative. As a reminder, it is around the individual business entities that we want to capture the behaviors, tendencies, patterns, trends, preferences, etc. at the individual business entity level.
Exercise #3: Next, brainstorm the key decisions that need to be made about each key business entity with respect to the targeted business initiative.
Exercise #4: Next we want to group the decisions into common use cases; that is, cluster those decisions that seem similar in their business or financial objectives.
Exercise #5: Then brainstorm the different data sources that you might need to support those use cases:
Identify potential internal structured (transactional data sources, operational data sources) and unstructured (consumer comments, notes, work orders, purchase requests) data sources
Identify potential external data sources (social media, blogs, publicly available, data.gov, websites, mobile apps) that you also might want to consider
Exercise #6: Use the data assessment worksheets to determine the relative business value and implementation feasibility of each of the identified data sources with respect to the different use cases.
Exercise #7: Finally, use the prioritization matrix to rank each of the use cases vis-à-vis business value and implementation feasibility over the next 9 to 12 months.
Chapter 4
The Importance of the User Experience
The user experience is one of the secrets to big data success, and one of my favorite topics. If organizations cannot deliver insights to their employees, managers, partners, and customers in a way that is actionable, then why even bother? One of the keys to success in the Big Data MBA is to “begin with an end in mind” with respect to understanding how the analytic results are going to be delivered to frontline employees, business managers, channel partners, and customers in a way that is actionable. The Big Data MBA seeks to “close the analytics loop” with respect to delivering insights to the key business stakeholders via an actionable user experience (UEX).
Chapter 4 Objectives
Review an example of an “unintelligent” user experience.
Highlight the importance of “thinking differently” with respect to creating an actionable dashboard versus building a traditional Business Intelligence dashboard.
Review a sample actionable dashboard targeting frontline store managers.
Review another sample actionable dashboard (financial advisor dashboard) targeting business-to-business channel partners.
This chapter will challenge the traditional Business Intelligence approaches to building dashboards by seeking to leverage analytic insights (e.g., recommendations, scores, rules) to create actionable dashboards that empower frontline employees, guide channel partners, and influence customer behaviors.
The Unintelligent User Experience
One of my favorite subjects against which I love to rail is the “unintelligent” user experience. This is a problem caused by, in my humble opinion, the lack of effort by organizations to understand their key business stakeholders well enough to be able to deliver actionable insights in support of the organizations' key business initiatives. And this user experience problem is often only exacerbated by big data.
Here is a real-world example of how NOT to leverage actionable analytics in your organization's engagements with your customers. The names have been changed to protect the guilty.
My daughter Amelia got the e-mail (see Figure 4.1) from our cell phone provider warning her that she was about to exceed her monthly data usage limit of 2 GB. She was very upset that she was about to go over her limit, and it would start costing her (actually, me) an additional $10.00 per GB over the limit. (Note: The “Monday, August 13, 2012” date in the figure will play an important role in this story.)
Figure 4.1 Original subscriber e-mail
I asked Amelia what information she thought she needed in order to make a decision about altering her Facebook, Pandora, Vine, Snapchat, and Instagram usage (since those are the main data hog culprits in her case) so that she would not exceed her data plan limits. She thought for a while and then said that she thought she needed the following information:
How much of her data plan does she have left in the current month?
When does her new month or billing period start?
At her current usage rate, when will she run over for this month?
Capture the Key Decisions
This use case provides a good example of the process that organizations can employ to identify the key business decisions that the organization's key business stakeholders need to address in order to support the organization's key business initiatives. Here is an abbreviated process (similar to the process we just learned with the big data strategy document in Chapter 3):
Step 1: Understand your organization's key business initiatives or business challenge (in this example the initiative is “Don't exceed your monthly data usage plan”).
Step 2: Identify your key business stakeholders (Amelia and me in this example).
Step 3: Capture the decisions that the key business stakeholders need to make in order to support the organization's key business initiatives (e.g., alter Facebook, Pandora, Vine, Snapchat, and Instagram usage).
Step 4: Brainstorm the questions that the key stakeholders need to answer to facilitate making the decisions (How much of my data plan do I have left? When does my new month start? When will I run over for my current period given my current usage?).
Support the User Decisions
Understanding the relationship between your business initiative and the supporting decisions and questions that need to be addressed is key to creating a user experience that provides the right information (or actionable insights) to the right user to make the right decisions at the right time.
So to continue the cellular provider story, I went online to research Amelia's key questions:
Question | Answer
How much of my data plan do I have left? | Current usage as of August 13 is 65 percent
When does my new month start? | On August 14, which is 1 day from today
When am I likely to run over my data plan limit? | The probability of you overrunning your data plan is 0.00001 percent…or NEVER!!
So given the results of my analysis, Amelia had nothing to worry about: to exceed her limit, she would have had to consume more than 35 percent of her plan in her final 24 hours, more than half of what she had consumed over the previous 30 days. The probability of that happening: near zero (or about the same probability of me beating Usain Bolt in the 100-meter dash). The bottom line is that the e-mail should never have been sent. There was nothing for Amelia to worry about, and it only caused unnecessary angst. Not the sort of user experience that organizations should be targeting.
Our cellular provider could have provided a user experience that highlighted the information and insights necessary to help Amelia make a decision about data usage. The user experience could have looked something like the e-mail message shown in Figure 4.2.
Figure 4.2 Improved subscriber e-mail
This sample e-mail has all the information that Amelia needs to make a decision about usage behaviors, including:
Actual usage to date (65 percent)
A forecast of usage by the end of the period (67 percent)
The date when the data plan will reset (in 1 day on August 14)
With this information, Amelia is now in a position to make the “right” decision about her data plan usage.
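The 67 percent forecast in the improved e-mail is consistent with a simple linear projection. The minimal Python sketch below reproduces it under two assumptions that are not stated in the e-mail itself: 30 of 31 billing days have elapsed (matching the story's "previous 30 days" and a reset "1 day from today"), and usage continues at the average daily rate so far. The function names are hypothetical.

```python
# Minimal sketch of a linear data usage forecast.
# Assumptions: 30 of 31 billing days have elapsed, and usage continues at the
# average daily rate observed so far.


def forecast_usage_pct(used_pct: float, days_elapsed: int, days_in_period: int) -> float:
    """Project end-of-period usage by extending the average daily rate so far."""
    daily_rate = used_pct / days_elapsed
    return daily_rate * days_in_period


def overrun_multiple(used_pct: float, days_elapsed: int, days_in_period: int) -> float:
    """How many times her average daily usage she would need per remaining day to exceed 100%."""
    days_left = days_in_period - days_elapsed
    if days_left <= 0:
        raise ValueError("the billing period is already over")
    daily_rate = used_pct / days_elapsed
    needed_per_day = (100 - used_pct) / days_left
    return needed_per_day / daily_rate


if __name__ == "__main__":
    print("projected usage (percent):", round(forecast_usage_pct(65, 30, 31)))
    print("required multiple of daily average:", round(overrun_multiple(65, 30, 31), 1))
```

The second function quantifies why the warning was unnecessary: overrunning the plan would require roughly 16 times her average daily usage on the final day.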
Consumer Case Study: Improve Customer Engagement
But let's take this case study one step further. Let's say that there actually was going to be a problem with Amelia's usage and her data plan. What if 82 percent of data usage had been consumed with 50 percent of the usage period remaining? How do we make the user experience and the customer engagement useful, relevant, and actionable?
The mock-up shown in Figure 4.3 offers one potential approach based on the same principles discussed earlier: provide enough information to help Amelia change her usage behaviors. However, FutureTelco could also take the user experience and customer engagement one step further and offer her some recommendations to avoid the data plan overage.
For example, FutureTelco could offer prescriptive advice about how to reduce data consumption, such as:
Transitioning to apps that are more data-usage efficient (e.g., transitioning from Pandora to Rdio or iHeartRadio for streaming radio, assuming that Rdio and iHeartRadio are more efficient in their usage of data bandwidth)
Turning off apps in the background that are unnecessarily consuming data, such as mapping apps (like Apple Maps or Waze) or apps that are using GPS tracking
FutureTelco could even offer Amelia options to avoid paying an overage penalty (see Figure 4.3), such as:
Purchase a 1-month data usage upgrade for $2.00 (which is cheaper than the $10 overage penalty)
Upgrade existing contract (covering 6 months) for $10.00
Figure 4.3 Actionable subscriber e-mail
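The "82 percent consumed with 50 percent of the period remaining" scenario can be priced with a few lines of arithmetic. This minimal Python sketch projects end-of-period usage linearly, prices the overage using the 2 GB plan and $10 per GB fee from the story, and compares it with the $2.00 one-month upgrade. Two assumptions are hypothetical and labeled in the code: overage is billed per started GB, and the upgrade fully covers the projected overage.

```python
# Minimal sketch of pricing a projected overage against an upgrade offer.
import math

PLAN_GB = 2.0           # monthly data limit from the story
OVERAGE_PER_GB = 10.00  # overage fee per GB from the story
UPGRADE_PRICE = 2.00    # 1-month upgrade offer from Figure 4.3


def projected_overage_fee(used_pct: float, elapsed_fraction: float) -> float:
    """Project end-of-period usage linearly and price the overage.

    Assumption (hypothetical): overage is billed per started GB.
    """
    projected_gb = PLAN_GB * (used_pct / 100) / elapsed_fraction
    overage_gb = round(max(0.0, projected_gb - PLAN_GB), 6)  # drop float noise
    return math.ceil(overage_gb) * OVERAGE_PER_GB


def recommend(used_pct: float, elapsed_fraction: float) -> str:
    """Pick the cheaper option, assuming (hypothetically) the upgrade covers the overage."""
    fee = projected_overage_fee(used_pct, elapsed_fraction)
    if fee == 0:
        return "no action needed"
    if UPGRADE_PRICE < fee:
        return "purchase 1-month upgrade"
    return "reduce data usage"


if __name__ == "__main__":
    print(projected_overage_fee(82, 0.5), recommend(82, 0.5))
```

At 82 percent with half the period elapsed, the projection is 164 percent of the plan (3.28 GB), which is why an actionable e-mail would lead with the cheaper upgrade offer.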
But wait, there is even more that FutureTelco could do to improve the customer experience. FutureTelco could analyze Amelia's app usage tendencies and recommend new apps based on other apps that users like Amelia use, similar to what Amazon and Netflix do (see Figure 4.4).
Figure 4.4 App recommendations
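"Recommend new apps based on other apps that users like Amelia use" describes a form of collaborative filtering. The minimal Python sketch below uses a simple user-based variant: every other subscriber is weighted by how many apps they share with the target subscriber, and their other apps are scored by that weight. This is an illustrative technique, not FutureTelco's (or Amazon's or Netflix's) actual method; the app names and subscriber data are hypothetical.

```python
# Minimal sketch of user-based collaborative filtering for app recommendations.
# App names and subscriber data are hypothetical.
from collections import Counter


def recommend_apps(target_apps: set[str], other_users: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Score apps the target lacks, weighting each other user by apps shared with the target."""
    scores: Counter = Counter()
    for apps in other_users.values():
        overlap = len(target_apps & apps)  # similarity = number of apps in common
        if overlap == 0:
            continue  # users with nothing in common contribute nothing
        for app in apps - target_apps:
            scores[app] += overlap
    # Highest score first; alphabetical order breaks ties so results are stable
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [app for app, _ in ranked[:top_n]]


OTHER_USERS = {
    "user_1": {"photo_share", "messaging", "video_loops"},
    "user_2": {"photo_share", "messaging", "streaming_radio"},
    "user_3": {"photo_share", "video_loops"},
    "user_4": {"streaming_radio", "traffic_maps"},
    "user_5": {"podcasts"},
}

if __name__ == "__main__":
    print(recommend_apps({"photo_share", "streaming_radio"}, OTHER_USERS))
```

A production recommender would normalize for popularity and usage intensity, but the core idea of "people who use what you use also use this" is the same.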
This level of customer intimacy can open up all sorts of new monetization opportunities, such as:
Leverage your customers' usage patterns and behaviors to recommend apps that move the user into a more profitable, high-retention user category
Help app developers to be more successful while collecting referral fees, co-marketing fees, and other monetization ideas that align with the app developers' business objectives
Cellular providers are not alone in missing opportunities to leverage customer insights in order to provide a more relevant, more meaningful customer experience. Many organizations are sitting on gold mines of insights about their customers' buying and usage patterns, tendencies, propensities, and areas of interest, but little of that information is being packaged and delivered in a manner that improves the user experience. Big data often only exacerbates this problem. Organizations will either learn to leverage big data as an opportunity to improve their user experience, or they will get buried by the data and continue to provide irrelevant and even misleading customer experiences.
Business Case Study: Enable Frontline Employees
I had the opportunity to run a vision workshop for a grocery retailer. The goal of the session was to identify how the grocery chain could leverage big data and advanced analytics to deliver actionable insights (or recommendations) to store managers in order to help them improve store performance.
Big data can transform the business by enabling a completely new user experience (UEX) built around insights and recommendations versus just traditional Business Intelligence charts and tables. Retailers, like most organizations, can leverage detailed, historical transactional data, coupled with new sources of “right-time” data like local competitors' promotions (e.g., “best food days,” which is the day when grocery stores post their weekly promotions), weather, and events, to uncover new insights about their customers, products, merchandising, competitors, and operations. Big data provides organizations the ability to (1) rapidly ingest these new sources of customer, product, and operational data and then (2) leverage data science to yield real-time, actionable insights.
Let's walk through an example of integrating big data with a traditional BI dashboard to create a more actionable user experience that empowers frontline employees and managers.
Store Manager Dashboard
We start with a traditional Business Intelligence dashboard. This dashboard provides the key performance indicators (KPIs) and metrics against which the store manager measures the performance of the store. The dashboard can also present sales and margin trends and previous period comparisons for those KPIs. This is pretty standard Business Intelligence work (see Figure 4.5).
Figure 4.5 Traditional Business Intelligence dashboard
The challenge with these traditional BI dashboards is that unless you are an analyst, it's not clear what action the user is supposed to take. Arrows up, sideways, and down… I can see my performance, but the dashboard doesn't provide any insights to tell the store manager what actions to take.
The other challenge is that the store manager (like most frontline employees and managers) likely does not have a BI or analytics background (he likely worked his way up through the ranks in the grocery store). As a result, the UEX and the actionable insights and recommendations are critical because the store manager does not know how to drill into the BI reports and dashboards to uncover insights based on the raw data.
We can build on this traditional BI dashboard by including more predictive and prescriptive analytics. In Figure 4.6, the top part of the new actionable dashboard (Sections A and B) leverages predictive analytics and prescriptive analytics to provide recommendations that can help the store manager make more profitable business decisions.
Figure 4.6 Actionable store manager dashboard
In Figure 4.6, Section A shows specific product, promotion, placement, and pricing recommendations based on the layout of a specific store. Section B provides specific recommendations concerning pricing, merchandising, inventory, staffing, promotions, etc. for the store manager.
Each recommendation in Section B is presented with Accept [+] or Reject [-] options. If the store manager accepts the recommendation by selecting [+], that recommendation is executed (e.g., raise prices, add promotion, add inventory, etc.). However, if the store manager rejects the recommendation, then the actionable dashboard captures the reason for the rejection so that the supporting analytic models can be constantly fine-tuned (see Figure 4.7).
Finally, the store manager can select the More option in Section B and modify the recommendation based on his own experience. Allowing the store manager to modify the recommendations based on his personal experiences allows the underlying analytic models to constantly learn what works and what doesn't work and build on the best practices and learnings from the organization's most effective and top-performing store managers.
Figure 4.7 Store manager accept/reject recommendations
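The accept/reject loop only helps the analytic models if the feedback is captured in a usable form. Here is a minimal Python sketch of that capture step: each [+] or [-] click becomes a `Feedback` record, and the log is summarized into acceptance rates per recommendation type and the most common rejection reasons. The record layout, recommendation types, and reasons are hypothetical, and the sketch only summarizes feedback; retraining the underlying models is a separate step not shown here.

```python
# Minimal sketch of capturing accept/reject feedback from the dashboard.
# Record layout, recommendation types, and reasons are hypothetical.
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    rec_type: str                 # recommendation category, e.g. "pricing"
    accepted: bool                # True for Accept [+], False for Reject [-]
    reason: Optional[str] = None  # rejection reason captured by the dashboard


def acceptance_rates(log: list[Feedback]) -> dict[str, float]:
    """Share of recommendations accepted, per recommendation type."""
    totals: dict[str, int] = defaultdict(int)
    accepted: dict[str, int] = defaultdict(int)
    for fb in log:
        totals[fb.rec_type] += 1
        accepted[fb.rec_type] += fb.accepted  # bool counts as 0 or 1
    return {t: accepted[t] / totals[t] for t in totals}


def rejection_reasons(log: list[Feedback], rec_type: str) -> Counter:
    """Tally the reasons store managers gave for rejecting one recommendation type."""
    return Counter(fb.reason for fb in log if fb.rec_type == rec_type and not fb.accepted and fb.reason)


log = [
    Feedback("pricing", True),
    Feedback("pricing", False, "competitor price match"),
    Feedback("pricing", False, "competitor price match"),
    Feedback("inventory", True),
    Feedback("inventory", True),
    Feedback("promotion", False, "vendor funding unavailable"),
]

if __name__ == "__main__":
    print(acceptance_rates(log))
    print(rejection_reasons(log, "pricing").most_common(1))
```

A low acceptance rate paired with a dominant rejection reason is exactly the kind of signal that tells the data science team which model, or which missing input, to revisit first.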
Sample Use Case: Competitive Analysis
One use case for the store manager dashboard enables the store manager to monitor local competitive activity and promotions. The grocery industry is very locally competitive. Competitors, for the most part, are within just a few miles or even blocks of each other. In this competitive analysis use case, the dashboard provides a map of the local grocery and beverage competitors (see Section C of Figure 4.7). Hovering over any particular competitor on the map immediately brings up its current marketing flyer. The store manager (or his business analyst) can browse through each of the competitors' flyers and make custom store recommendations around pricing, promotion, merchandising, inventory, and staffing based on the competitor's plans (see Figure 4.8).
Figure 4.8 Competitive analysis use case
Like the other recommendations, the store manager's custom recommendations will be monitored for effectiveness so that the analytic models can be constantly updated and refined.
Additional Use Cases
Additional use cases can easily be added to the store manager dashboard. We can add a use case for integrating the local events calendar into the dashboard with associated store manager pricing, product, promotions, staffing, merchandising, and inventory recommendations. The store manager can analyze the local events calendar to flag events that may have a positive or negative impact on store sales (see Figure 4.9). In this example, the local events calendar highlights two events: (1) Stanford college football game (which should increase the sale of beer, chips, burgers, and other tailgating materials) and (2) farmers market (which should decrease the sale of fresh produce and fruits and other organic items). The analytics supporting the dashboard could automatically analyze the results of previous local events and leverage predictive analytics to predict how those events might impact store traffic and the sales of specific product categories.
Figure 4.9 Local events use case
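One simple way to "analyze the results of previous local events" is to measure the historical sales lift for each event type and product category and apply the average lift to a baseline forecast. The minimal Python sketch below does exactly that. It assumes a baseline (for example, typical sales for the same weekday without an event) is already available; the event types, categories, and sales figures are hypothetical, and a real model would control for seasonality and overlapping events.

```python
# Minimal sketch of an event-lift forecast. All data below is hypothetical.
from collections import defaultdict
from statistics import mean

# (event_type, product_category, sales_on_event_day, baseline_sales)
EVENT_HISTORY = [
    ("college_football", "beer", 150.0, 100.0),
    ("college_football", "beer", 130.0, 100.0),
    ("farmers_market", "produce", 80.0, 100.0),
    ("farmers_market", "produce", 90.0, 100.0),
]


def average_lift(history: list[tuple[str, str, float, float]]) -> dict[tuple[str, str], float]:
    """Average fractional lift over baseline for each (event type, category) pair."""
    lifts: dict[tuple[str, str], list[float]] = defaultdict(list)
    for event, category, sales, baseline in history:
        lifts[(event, category)].append(sales / baseline - 1)
    return {key: mean(values) for key, values in lifts.items()}


def forecast_sales(baseline: float, event: str, category: str,
                   lifts: dict[tuple[str, str], float]) -> float:
    """Apply the historical lift; events with no history leave the baseline unchanged."""
    return baseline * (1 + lifts.get((event, category), 0.0))


if __name__ == "__main__":
    lifts = average_lift(EVENT_HISTORY)
    print(lifts)
    print(forecast_sales(200.0, "college_football", "beer", lifts))
```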
Another use case is to integrate the local weather forecast into the store manager dashboard. The store manager can analyze the local weather forecasts and make adjustments for inventory, merchandising, and promotions based on whether the weather will be warmer or colder than expected (see Figure 4.10). The dashboard can automatically analyze similar weather conditions and predict the impact on store traffic and product category sales and deliver relevant recommendations to the store manager.
Figure 4.10 Local weather use case
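"Analyze similar weather conditions" can be read as an analog (nearest-neighbor) forecast: find past days whose weather most resembles the forecast and average what sold on those days. The minimal Python sketch below uses a single feature, the daily high temperature, and a hypothetical history of ice cream unit sales for one store. The choice of feature, the value of `k`, and the data are all assumptions; a fuller version might also compare humidity, precipitation, and day of week.

```python
# Minimal sketch of an analog (k-nearest-neighbor) weather forecast.
# The history below is hypothetical: (high_temperature_f, ice_cream_units_sold).

WEATHER_HISTORY = [(70, 100), (85, 160), (92, 210), (98, 240), (60, 80)]


def analog_forecast(history: list[tuple[float, float]], forecast_temp: float, k: int = 3) -> float:
    """Average the sales of the k past days whose high temperature is closest to the forecast."""
    if not history:
        raise ValueError("no historical days to compare against")
    nearest = sorted(history, key=lambda day: abs(day[0] - forecast_temp))[:k]
    return sum(units for _, units in nearest) / len(nearest)


if __name__ == "__main__":
    print(analog_forecast(WEATHER_HISTORY, forecast_temp=95, k=2))
```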
The dashboard could even couple the competitive activities, local events, and weather data to predict what sort of impact the combination of these might have on store traffic and product category demand. These insights could yield new recommendations that drive the store manager's decisions about pricing, promotions, merchandising, staffing, and/or inventory.
B2B Case Study: Make the Channel More Effective
Over the course of my travels, I have met with several organizations that work through partners, brokers, agents, and advisors to get their products and services into the hands of the end consumer. These business-to-business (B2B) organizations face unique challenges:
They have to work extra hard and be very creative in gathering data about how end consumers are buying and using their products.
They need to find a way to mine the customer, product, and operational data to uncover insights and make recommendations that make their partners, brokers, and agents more effective.
This can be frustrating, especially in light of all the success stories from business-to-consumer (B2C) organizations such as retailers, mobile phone providers, credit card companies, travel, entertainment, and hospitality companies, and other organizations that have direct engagement with the end consumer.
But never fear: there are things that these B2B organizations can do to encourage their partners, brokers, agents, and advisors to share more of that valuable consumer data. There are also unique insights that these B2B organizations can provide to their partners and channels to make them more effective (and, hopefully, even more willing to share the end consumer's purchase and engagement data).
For purposes of this case study, I have created a fictitious financial services company called FSI. We'll assume that FSI sells its products and services via independent financial advisors. I hope that you can see the applicability of this use case to any industry that must work through partners, brokers, agents, and advisors to reach its end consumers.
The Advisors Are Your Partners: Make Them Successful
Many financial services advisors are small, specialized firms with 1 to 10 employees that provide financial advice to a small group of customers. Many lack the technical and analytic capabilities to analyze large amounts of data and develop predictive and prescriptive models based on their clients' financial goals, current financial situation, ongoing financial conversations, and deep history of financial transactions.
This provides a business opportunity for FSI to market customer, product, and market insights to these independent financial advisors. These insights could include:
Benchmarks: What range of returns should a client expect from a certain type of portfolio? How does the client's portfolio performance compare to that of similar clients with similar financial objectives? What's the typical financial situation and asset base for other clients in similar financial conditions?
Portfolio Mix: What's the ideal portfolio mix given my client's age and specific financial goals? How does my client's percentage of financial and investment contributions compare to that of others like him? How does my client's portfolio and financial asset mix compare to that of others like him?
Best Practices: What are the best-performing portfolios for someone with the same financial goals given his or her age and employment time frame? What are the best-performing investment instruments for clients given the same retirement horizon as my client?
Industry Trends: What current financial instruments provide the best return-to-risk ratio? How are these financial instruments projected to perform over the next 1, 5, and 10 years? What are the most specific contribution and investment risks for which my client needs to plan?
Financial Advisor Case Study
Okay, let's get to the fun stuff! Let's see how FSI could leverage the insights mentioned above to (1) make the independent financial advisors more effective in supporting their clients and (2) create a stronger, more profitable relationship between FSI and the financial advisors. Let's explore how FSI could deliver client-specific recommendations (prescriptive analytics) in a way that is actionable for both the advisor and the client. Let's also explore how FSI could create an actionable dashboard to drive further client engagement that could gather even more client financial data (financial goals, employment plans, spending patterns) that FSI could use to improve its predictive and prescriptive modeling.
The financial advisor dashboard should address the following functionality (see Figure 4.11):
Report on the client's current financial status and recent financial performance
Assess the client's financial status and progress against personal financial goals such as:
Buying a car
Buying a home
College education
Starting a business
Career or life change
Retirement
Deliver recommendations to the financial advisors to improve the client's financial performance such as:
Modify financial contributions
Adjust investment strategies (short-term, long-term)
Reallocate financial portfolio
Change investment vehicles (stocks, bonds, mutual funds, etc.)
Figure 4.11 Financial advisor dashboard
The goal of the financial advisor dashboard is to uncover insights about the client's investment performance and provide client-specific recommendations that help these clients reach their financial goals. To generate actionable, accurate recommendations, we're going to need to know as much about the client as possible, including:
Current and historical personal background information (e.g., marital status, spouse's financial and employment situation, number and age of children, outstanding mortgage on home(s) and any secondary real estate investments)
Current financial investments and other assets (e.g., stocks, bonds, mutual funds, IRAs, 401(k)s, REITs)
Current and historical income (and expenditures, if possible)
Financial goals with specific timelines
We need to ensure that the financial advisor dashboard provides enough value to both the financial advisor and the advisor's clients in order to incent the clients to share as much of this data as possible.
Informational Sections of Financial Advisor Dashboard
Let's examine in more detail the key informational sections of the financial advisor dashboard. These sections form the foundation for much of the analytics that will be developed to support the client's financial goals.
Client Personal Information: The first part of the dashboard presents relevant client personal and financial information. FSI wants to gather as much relevant personal information as possible when the client first opens his accounts. But after the client opens his account, there needs to be a concerted effort to keep the data updated and capture new lifestyle, life stage, employment, and family information. Much of that client data can be captured via discussions and interactions that the financial advisor is having with the client (e.g., informational calls, e-mail dialogues, office visits, annual reviews). While this information is gold to FSI, much of this data never gets past the financial advisors' personal contact management and e-mail systems. FSI must provide compelling reasons to persuade the financial advisors and clients to share more of this data with FSI (see Figure 4.12).
Some leading-edge organizations are providing incentives (e.g., discounts, promotions, contests, rewards) for clients to share their social media interactions. Obviously, access to the client's current situation and plans as posted on social media sites is gold when it can be mined to uncover activities that might affect his financial needs (e.g., vacations, buying a new car, upcoming wedding plans, promotions, job changes, children changing schools).
Client Financial Status: The next section of the dashboard provides an overview of the client's current financial status. Again, the more data that can be gathered about the client's financial situation (e.g., investments, home, spending, debt), the more accurate and prescriptive the analytic models will be (see Figure 4.13).
Figure 4.12 Client personal information
Figure 4.13 Client financial information
In this example, we have details on all the client's financial investments with FSI. However, the client might (and likely does) have financial investments with other firms courtesy of his employer's 401(k) programs, whole life insurance policies, and other stocks, bonds, and funds. And that doesn't even consider substantial investments in nonfinancial instruments like his primary residence, vacation home, antiques, and collectibles.
Incenting clients to share their entire financial portfolio is complicated by how hard it is for a client to pull all that information together in one place. However, Mint.com has figured out how to aggregate financial spending from credit cards and bank checks. The inclusion of the client's expenditure data could be invaluable in building a client profile and developing specific, actionable financial recommendations.
Client Financial Goals: The final informational section of the dashboard contains the client's financial goals. There are likely only a small number of goals, and they probably don't change that often. However, it is difficult to develop meaningful client financial recommendations without up-to-date client financial goals. From a data collection perspective, this is probably the easiest data to capture, provided that you have adequately addressed the client's privacy and security concerns (see Figure 4.14).
Figure 4.14 Client financial goals
However, let's say that the client either won't share his financial goals or hasn't even thought through what his financial goals need to be. This is common when dealing with retirement planning, since many clients aren't clear or realistic about their retirement goals. In these situations, FSI could leverage the information that it has about “similar” clients to make retirement goal recommendations. If FSI has the client's current financial investments and current salary, FSI could make a pretty intelligent guess as to the client's retirement goals.
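The paragraph above suggests estimating a retirement goal from “similar” clients. The source does not say how FSI would define similarity; the sketch below assumes a simple k-nearest-neighbors average over age, salary, and invested assets. The `Client` fields, the scaling constants, and all figures are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Client:
    age: int
    salary: float                            # annual salary
    invested: float                          # assets held with the firm
    retirement_goal: Optional[float] = None  # stated target, if known


def suggest_retirement_goal(target: Client, peers: List[Client], k: int = 3,
                            scale=(10.0, 50_000.0, 100_000.0)) -> float:
    """Average the stated retirement goals of the k peers most similar to target.

    Similarity is Euclidean distance over age, salary, and invested assets,
    each divided by an assumed scale so that no single attribute dominates.
    """
    known = [p for p in peers if p.retirement_goal is not None]
    if not known:
        raise ValueError("no peers with a stated retirement goal")
    age_s, salary_s, invested_s = scale

    def distance(p: Client) -> float:
        return math.sqrt(((p.age - target.age) / age_s) ** 2
                         + ((p.salary - target.salary) / salary_s) ** 2
                         + ((p.invested - target.invested) / invested_s) ** 2)

    nearest = sorted(known, key=distance)[:k]
    return sum(p.retirement_goal for p in nearest) / len(nearest)


# Hypothetical peer clients who have shared their retirement goals.
peers = [
    Client(45, 120_000, 300_000, 1_500_000),
    Client(47, 110_000, 280_000, 1_400_000),
    Client(44, 130_000, 350_000, 1_600_000),
    Client(30, 60_000, 40_000, 800_000),
    Client(62, 200_000, 900_000, 3_000_000),
]
new_client = Client(46, 118_000, 310_000)  # no stated goal
print(suggest_retirement_goal(new_client, peers))
```

The suggestion is only a starting point for the advisor's conversation with the client; the choice of attributes and `k` would need to be validated against real client outcomes.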
Recommendations Section of Financial Advisor Dashboard
Now let's get into the meat of the financial advisor dashboard. The client information sections of the dashboard were meant to provide an easy and efficient way to capture the client's key lifestyle, demographic, and financial data, as well as his financial goals. Now we can create predictive models to predict the likely results of different financial options and actions, and then create prescriptive models in order to deliver client-specific recommendations that help the client to reach his financial goals. This financial advisor dashboard covers four different areas for delivering client-specific financial recommendations:
Financial contributions
Spending analysis
Asset allocation
Other financial investments
Financial Contributions Recommendations
The first set of recommendations is focused on helping the client optimize financial contributions (see Figure 4.15). The types of client decisions that could be modeled include:
Monthly investments and periodic increases and adjustments
Life insurance coverage adjustments
One-time payments to jump-start lagging financial goals
Reallocate monthly or periodic payments against different financial goals
Change retirement, new car, and new home target dates
Figure 4.15 Financial contributions recommendations
We could employ data science to analyze the client's detailed financial data, compare that data with benchmarks across similar clients, and develop client-specific analytic profiles. The financial advisor dashboard could provide a “what if” capability that allows the financial advisor to work with the client to test out different scenarios (e.g., changes to investment amounts, changes to financial goal target dates).
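A “what if” scenario of the kind described above can be sketched as a future-value projection. This sketch assumes a constant annual return, monthly compounding, and contributions made at the end of each month; the function names and client figures are hypothetical, and a real model would likely need to account for fees, taxes, and return uncertainty.

```python
def project_balance(balance: float, monthly_contribution: float,
                    annual_return: float, years: int) -> float:
    """Future value with monthly compounding and end-of-month contributions."""
    r = annual_return / 12          # assumed constant monthly rate
    n = years * 12
    if r == 0:
        return balance + monthly_contribution * n
    growth = (1 + r) ** n
    return balance * growth + monthly_contribution * (growth - 1) / r


def what_if(balance, goal, years, annual_return, contributions):
    """Compare contribution scenarios against a savings goal."""
    results = []
    for c in contributions:
        projected = project_balance(balance, c, annual_return, years)
        results.append((c, round(projected, 2), projected >= goal))
    return results


# Hypothetical client: $50,000 saved toward a $250,000 goal in 15 years,
# with an assumed 5% average annual return.
for contribution, projected, on_track in what_if(50_000, 250_000, 15, 0.05,
                                                 [400, 600, 800]):
    print(f"${contribution}/month -> ${projected:,.2f}  on track: {on_track}")
```

Running several scenarios side by side shows the advisor and client which contribution level first puts the goal within reach under these assumptions; the same function can be rerun with a different `years` value to test a new target date.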
Spending Analysis Recommendations
The second set of recommendations is focused on helping the client optimize spending habits. This is where access to the client's credit card and banking statements (maybe via Mint.com and/or his checking accounts) could yield valuable insights to help the client minimize cash outflow and increase financial investments (see Figure 4.16). The types of spending decisions that would need to be modeled include:
Consolidating expenditures of similar products and services
Flagging expenditures that are abnormally high given the client's family situation, home location, etc.
Integrating customer loyalty program information to find retailers who can provide the best prices on food and household staples
Increasing insurance deductibles to lower premiums
Finding more cost-effective home, property, and auto insurance
Figure 4.16 Spend analysis and recommendations
There are lots of opportunities to leverage external data sources and best practices across the FSI client base to find better deals in an attempt to reduce the client's discretionary spending. There are several retail, insurance, travel, hospitality, entertainment, cell phone, and other websites from which data could be gathered. This data could be used to create recommendations to reduce the client's spending and optimize the client's monthly budget, with the savings being used to increase financial contributions against the client's financial goals.
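The idea of flagging expenditures that are abnormally high for the client's situation can be illustrated with a z-score against comparable households. This is one simple technique chosen for illustration, not necessarily what FSI would use; the spending categories, peer figures, and the two-standard-deviation threshold are assumptions.

```python
import statistics


def flag_high_spending(client_spend, peer_spend, threshold=2.0):
    """Return categories where the client's monthly spend is more than
    `threshold` standard deviations above the mean of comparable households."""
    flagged = []
    for category, amount in client_spend.items():
        peers = peer_spend.get(category, [])
        if len(peers) < 2:
            continue                      # too few peers to estimate spread
        mean = statistics.mean(peers)
        spread = statistics.stdev(peers)  # sample standard deviation
        if spread > 0 and (amount - mean) / spread > threshold:
            flagged.append(category)
    return flagged


# Hypothetical monthly spend ($) for households with a similar family
# situation and home location.
peer_spend = {
    "groceries": [600, 650, 700, 620, 680],
    "auto_insurance": [110, 120, 115, 125, 118],
    "dining": [200, 250, 300, 220, 280],
}
client_spend = {"groceries": 690, "auto_insurance": 210, "dining": 260}
print(flag_high_spending(client_spend, peer_spend))
```

A flagged category (here the hypothetical auto insurance premium) is a prompt for a recommendation such as shopping for a cheaper policy, not proof of waste.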
Asset Allocation Recommendations
The third set of recommendations is focused on helping clients optimize their asset allocation in light of their financial goals. By leveraging best practices across other clients, portfolios, and investment instruments, prescriptive analytics can be developed to make specific asset allocation recommendations that support decisions such as the following (see Figure 4.17):
Which stocks and bonds to sell or buy against specific financial goal portfolios
Portfolio allocation decisions that properly balance the risk-return ratio of the client's portfolio in light of risk tolerance and financial goals
Other financial instruments that can accelerate the client's progress against financial goals or reduce risk for those short-term financial goals
Figure 4.17 Asset allocation recommendations
There are many opportunities to leverage analytic best practices across FSI's client base to make investment recommendations that can improve performance given a client's desired risk level. To further protect the client's investment assets, an aggregated view of the marketplace could yield more timely insights into stocks and bonds that are suddenly hot or cold. This is also an area where real-time analytics can be leveraged to ensure that no sudden market movements expose the client to unnecessary asset allocation risks. The dashboard could also support an interactive “what if” collaboration directly with the client to glean even more data and insights about the client's investment preferences and tolerance for risk.
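To make the risk-return trade-off in the asset allocation recommendations concrete, here is a two-asset mean-variance sketch. Mean-variance analysis is a standard technique swapped in for illustration; the expected returns, volatilities, correlation, and the 5 percent grid step are all hypothetical inputs rather than FSI figures.

```python
import math


def portfolio_stats(w_stock, mu=(0.07, 0.03), sigma=(0.15, 0.05), rho=0.2):
    """Expected annual return and volatility of a stock/bond mix.

    mu, sigma: assumed expected returns and volatilities of (stocks, bonds);
    rho: assumed correlation between them. All values are illustrative.
    """
    w_bond = 1 - w_stock
    exp_return = w_stock * mu[0] + w_bond * mu[1]
    variance = ((w_stock * sigma[0]) ** 2 + (w_bond * sigma[1]) ** 2
                + 2 * w_stock * w_bond * rho * sigma[0] * sigma[1])
    return exp_return, math.sqrt(variance)


def best_allocation(max_volatility, step=0.05):
    """Highest-return stock weight whose volatility fits the client's risk tolerance.

    Returns (stock weight, expected return, volatility), or None if no mix fits.
    """
    best = None
    for i in range(int(round(1 / step)) + 1):
        w = i * step
        ret, vol = portfolio_stats(w)
        if vol <= max_volatility and (best is None or ret > best[1]):
            best = (w, ret, vol)
    return best


# A hypothetical client who tolerates roughly 10% annual volatility:
w, ret, vol = best_allocation(0.10)
print(f"stocks {w:.0%}, bonds {1 - w:.0%}, "
      f"expected return {ret:.2%}, volatility {vol:.2%}")
```

In a “what if” session, the advisor could vary `max_volatility` with the client to show how the recommended mix shifts as risk tolerance changes; a realistic version would cover more asset classes and constraints.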
Other Investment Recommendations
The fourth set of recommendations is focused on other assets that clients need to consider as part of their overall financial strategy. Real estate (the client's home and any vacation homes) is probably the most obvious. This is an area where recommendations about other investment options can be delivered to help support client decisions regarding the following (see Figure 4.18):
Identifying the ideal amount of insurance needed given home valuation changes
Home improvement projects that yield the best ROI for particular house types, budgets, and locations over time
Identifying the right time to buy or sell a home, and even making recommendations as to what price to bid for homes in select areas
Best areas to look for secondary and/or vacation home investments
Most cost-effective locations to live in after retirement
Figure 4.18 Other investment recommendations
There is a bevy of external data sources that can be leveraged to help facilitate analytics in this area. For example, Zillow and Realtor.com provide real estate valuations and monthly changes in real estate valuations that could be incorporated into the financial advisor dashboard. Cost-of-living metrics, which can be used to identify ideal retirement areas, can be found on many financial websites, including data.gov.
Summary
Big data can power a more relevant and more actionable user experience. Instead of overwhelming business users with an endless array of charts, reports, and dashboards and forcing users to “slice and dice” their way to insights, we can instead leverage the wealth of available structured and unstructured data sources, in real time, coupled with data science to uncover customer, product, and operational insights buried in the data. We can leverage those insights to create frontline employee, manager, and customer recommendations and then measure the effectiveness of those recommendations so that we are continuously refining our analytic models.
Big data can also have serious implications for B2B organizations that rely on brokers, agents, and advisors to reach their ultimate end consumer. While it may frustrate many B2B organizations that they lack that direct engagement with consumers, there are ways that B2B organizations can leverage new sources of data and analytics capabilities to not only improve the effectiveness of their brokers, agents, and advisors but also provide compelling reasons why the brokers, agents, advisors, and end consumers should directly share more data with the B2B organization to create a win-win-win for clients, advisors, and the B2B organization.
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Select one of your organization's outward-facing dashboards, websites, or mobile apps. If not something from your organization, then select a website or dashboard that you use regularly. That might include something from your bank, credit card provider, cellular provider, or utility company. Grab a few screen captures of the dashboard or website.
Exercise #2: Think through how you as the user use this dashboard, website, or mobile app to make decisions. Write down those decisions that you try to make from the website. For example, from your utility, you might want to make decisions about energy and water consumption, your waste/garbage plan, and maybe even which of the different appliance rebates you might want to consider.
Exercise #3: Next, add a recommendations panel that has suggestions for each of the decisions that you captured in Exercise #2. For our utility example, one recommendation might be “Only water 3 days a week from 6:00 a.m. to 7:00 a.m. to save approximately $12.50 per month on your monthly water bill.” Or another recommendation might be “Replace your existing dryer with a more efficient model like the Samsung DV457 to save $21.75 on your monthly energy bill.”
Exercise #4: Finally, identify potential external data sources that might provide some interesting perspectives that could be used to guide your key decisions. For our utility example, you might want to consider integrating local solar energy costs (to determine if solar energy is a feasible energy option) or weather forecasts (to see if you can reduce lawn watering).
Part II: Data Science
These three chapters introduce data science as a key business discipline that helps organizations “cross the analytics chasm” from the Business Monitoring phase to the Business Insights and Business Optimization phases. These chapters will introduce the concept of data science and then broaden the discussion to cover which data science techniques to use in which business scenarios.
In This Part
Chapter 5: Differences Between Business Intelligence and Data Science
Chapter 6: Data Science 101
Chapter 7: The Data Lake
Chapter 5: Differences Between Business Intelligence and Data Science
I was hired by a large Internet portal company in 2007 to head up efforts to develop its advertiser analytics. The objective of the advertiser analytics project was to help the Internet portal company's advertisers and agencies optimize their advertising spend across the Internet portal's ad network. The internal code name for the project was “Looking Glass” because we wanted to take the advertisers and agencies through an “Alice in Wonderland” type of experience in how we delivered actionable insights to help our key business stakeholders (Media Planners & Buyers and Campaign Managers) successfully optimize their advertising spend on the Internet portal's ad network. But in many ways, it was me who went through the looking glass.
Several months later (August 2008), I had the opportunity to keynote at The Data Warehouse Institute (TDWI) conference in San Diego. I taught a class at TDWI on how to build analytic applications, so I was both familiar with and a big fan of the TDWI conferences (and still am). However, in my keynote, I told the audience that everything that I had taught them about how to build analytic applications was wrong (see Figure 5.1).
Figure 5.1 Schmarzo TDWI keynote, August 2008
As with my own experience, many organizations and individuals are confused by the differences introduced by big data, especially the differences between Business Intelligence (BI) and data science. Big data is not big BI. Big data is a key enabler of a new discipline called data science that seeks to leverage new sources of structured and unstructured data, coupled with predictive and prescriptive analytics, to uncover new variables and metrics that are better predictors of performance. And while BI and data science share many of the same objectives (getting value out of data, dealing with dirty data, transforming and aligning data, helping support improved decision making), the questions, characteristics, processes, tools, and models couldn't be more different.
This chapter discusses the differences between BI and data science:
The questions are different.
The analytic characteristics are different.
The analytic engagement processes are different.
The data models are different.
The business view is different.
So let's start your journey through the “looking glass.” I promise that the journey will be enlightening (but no hookah smoking)!
What Is Data Science?
Data science is a complicated new discipline that requires advanced skills and competencies in areas such as statistics, computer science, data mining, mathematics, and computer programming. As has been stated countless times, data scientists are the business “rock stars” of the 21st century.
Although what data scientists do can be quite complex, what they are trying to achieve is not. In fact, I find that the very best introductory book to data science is Moneyball: The Art of Winning an Unfair Game by Michael Lewis (W. W. Norton & Company, 2004). The book is about Oakland A's General Manager Billy Beane's use of sabermetrics to help the small-market Oakland A's professional baseball team outperform competitors with significantly larger bankrolls. The book yields the most accurate description of data science:
Data science is about finding new variables and metrics that are better predictors of performance.
That's it, nothing more. And yes, data science is that simple. But the power of that simple statement is game changing, as can be seen in Figure 5.2 and in the success that Billy Beane and the Oakland A's have achieved by making player acquisitions and in-game decisions based on a different, more predictive set of metrics.
Figure 5.2 Oakland A's versus New York Yankees cost per win
The book also has another valuable lesson: good ideas can be copied. So organizations have to constantly be on the search for those new variables and metrics that are better predictors of performance, looking for that next, more predictive “on-base percentage” metric.
BI Versus Data Science: The Questions Are Different
When clients ask me to explain the difference between a Business Intelligence analyst and a data scientist, I start by explaining that the two disciplines have different objectives and seek to answer different types of questions (see Figure 5.3).
Figure 5.3 Business Intelligence versus data science
BI Questions
BI focuses on descriptive analytics: that is, the “What happened?” types of questions. Examples include:
How many widgets did I sell last month?
What were sales by zip code for Christmas last year?
How many units of Product X were returned last month?
What were company revenues and profits for the past quarter?
How many employees did I hire last year?
BI focuses on reporting on the current state of the business, or what is now commonly called Business Performance Management (BPM). BI provides retrospective reports to help business users monitor the current state of the business and answer questions about historical business performance. These reports and questions are critical to the business and are sometimes required for regulatory and compliance reasons.
BI can apply some rudimentary analytics (time series analysis, previous period comparisons, indices, shares, and benchmarks) to help business users flag areas of under- and over-performance. But even these analytics are focused on monitoring what happened to the business.
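The rudimentary BI analytics mentioned above, such as previous period comparisons and shares, are simple aggregations over historical data. A minimal sketch, assuming hypothetical monthly sales records; every number it produces describes what already happened.

```python
from collections import defaultdict

# Hypothetical sales records: (month, product, units sold)
sales = [
    ("2014-11", "Widget", 120), ("2014-11", "Gadget", 80),
    ("2014-12", "Widget", 170), ("2014-12", "Gadget", 50),
]


def units_by_month(records):
    """What happened? Total units sold per month."""
    totals = defaultdict(int)
    for month, _product, units in records:
        totals[month] += units
    return dict(totals)


def previous_period_change(totals, month, prior_month):
    """Percent change versus the previous period."""
    return 100.0 * (totals[month] - totals[prior_month]) / totals[prior_month]


def product_share(records, month):
    """Each product's share (%) of the month's units."""
    by_product = defaultdict(int)
    for m, product, units in records:
        if m == month:
            by_product[product] += units
    total = sum(by_product.values())
    return {product: 100.0 * units / total for product, units in by_product.items()}


totals = units_by_month(sales)
print(totals)
print(previous_period_change(totals, "2014-12", "2014-11"))
print(product_share(sales, "2014-12"))
```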
Data Science Questions
On the other hand, data scientists are in search of variables and metrics that are better predictors of business performance. Consequently, data scientists focus on predictive analytics (“What is likely to happen?”) and prescriptive analytics (“What should I do?”) types of questions. For example:
Predictive Questions (What is likely to happen?)
How many widgets will I sell next month?
What will sales by zip code be over this Christmas season?
How many units of Product X will be returned next month?
What are projected company revenues and profits for next quarter?
How many employees will I need to hire next year?
Prescriptive Questions (What should I do?)
Order [5,000] Component Z to support widget sales for next month.
Hire [Y] new sales reps by these zip codes to handle projected Christmas sales.
Set aside [$125K] in financial reserve to cover Product X returns.
Sell the following product mix to achieve quarterly revenue and margin goals.
Increase hiring pipeline by 35 percent to achieve hiring goals.
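The step from a predictive question to a prescriptive answer can be sketched in a few lines: fit a trend to past widget sales to forecast next month, then convert that forecast into a component order quantity. The least-squares trend line, the parts-per-widget ratio, the safety margin, and the sales history are illustrative assumptions; data scientists would typically test richer models than a straight line.

```python
import math


def linear_forecast(history):
    """Predictive: fit y = a + b*t by ordinary least squares over t = 0..n-1
    and extrapolate one period ahead (t = n)."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two periods of history")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


def component_order(history, parts_per_widget, safety_margin, on_hand):
    """Prescriptive: turn the widget forecast into a component order quantity."""
    widgets = linear_forecast(history)
    needed = widgets * parts_per_widget * (1 + safety_margin)
    return max(0, math.ceil(needed - on_hand))


monthly_widgets = [100, 110, 120, 130]    # hypothetical unit sales
print(linear_forecast(monthly_widgets))   # what is likely to happen?
print(component_order(monthly_widgets, parts_per_widget=2,
                      safety_margin=0.25, on_hand=50))  # what should I do?
```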
To answer these predictive and prescriptive questions, data scientists build analytic models in an attempt to quantify cause and effect. Chapter 6 covers some of the analytic algorithms and techniques that data scientists might use to help them quantify cause and effect.
The Analyst Characteristics Are Different
Another area of difference between BI and data science is in the attitudinal characteristics and work approach of the people who fill those roles (see Table 5.1).
Table 5.1 BI Analyst Versus Data Scientist Characteristics
Area          BI Analyst                  Data Scientist
Focus         Reports, KPIs, trends       Patterns, correlations, models
Process       Static, comparative         Exploratory, experimentation, visual
Data sources  Pre-planned, added slowly   On the fly, as needed
Transform     Upfront, carefully planned  In-database, on demand, enrichment
Data quality  Single version of truth     “Good enough,” probabilities
Data model    Schema on load              Schema on query
Analysis      Retrospective, descriptive  Predictive, prescriptive
Courtesy: EMC
The differences that jumped out most to me from Table 5.1 were the different perspectives on “data quality.” For the BI analyst who is dealing with historical data, the data needs to be 100 percent accurate. BI and data warehouse organizations have invested heavily in data governance and master data management to ensure that the data in the data warehouse is 100 percent accurate.
On the other hand, the data scientist is trying to predict what is likely to happen in the future and, as a result, is dealing with probabilities, confidence levels, F-distributions, t-tests, and p-values. The future is never 100 percent accurate, so data scientists develop a sense of what is “good enough” in trying to predict what is likely to happen and recommend what actions to take. As Yogi Berra, the well-known New York Yankees catcher, famously said, “It's tough to make predictions, especially about the future.”
It takes a different attitude to be a data scientist, an attitude that accepts failure as a tool for learning. Data scientists learn to embrace failure as part of their agile, fail-fast approach in the search to uncover new metrics and variables that are better predictors of performance. A common approach that data scientists embrace is modeled after the Cross Industry Standard Process for Data Mining (CRISP) model (see Figure 5.4).
Figure 5.4 CRISP: Cross Industry Standard Process for Data Mining
Data science takes a very similar approach: establish a business hypothesis or question; explore different combinations of data and analytics to build, test, and refine the analytic model; and wash, rinse, and repeat until the model proves that it can provide the required “analytic lift” while reaching a satisfactory goodness of fit. Finally, the analytics are deployed or operationalized, which can include rewriting the analytics in a different language to speed the model execution (e.g., in-database analytics) and integrating the analytic models and results into the organization's operational and management systems.
The Analytic Approaches Are Different
Unfortunately, these explanations are insufficient to satisfactorily answer the question of what's different between Business Intelligence and data science. So let's examine closely the different engagement approaches (including goals, tools, and techniques) that the BI analyst and the data scientist use to do their jobs.
Business Intelligence Analyst Engagement Process
The BI analyst engagement process is a discipline that has been documented, taught, and refined over three decades of building data warehouses and BI environments. Figure 5.5 provides a high-level view of the process that a typical BI analyst uses when engaging with the business users to build out the BI and supporting data warehouse environments.
Figure 5.5 Business Intelligence engagement process
Step 1: Pre-build Data Model. The process starts by building the foundational data model. Whether you use a data warehouse, data mart, or hub-and-spoke approach, and whether you use a star, snowflake, normalized, or dimensional schema, the BI analyst must go through a formal requirements gathering process with the business users to identify all (or at least the vast majority of) the questions that the business users want to answer. In this requirements gathering process, the BI analyst must identify the first- and second-level questions the business users want to address in order to build a robust and extensible data model. For example:
First-level question: How many patients did we treat last month?
Second-level question: How did that compare to the previous month?
Second-level question: What were the major DRG types treated?
First-level question: How many patients came through ER last night?
Second-level question: How did that compare to the previous night?
Second-level question: What were the top admission reasons?
First-level question: What percentage of beds was used at Hospital X last week?
Second-level question: What is the trend of bed utilization over the past year?
Second-level question: What departments had the largest increase in bed utilization?
The BI analyst then works closely with the data warehouse team to define and build the underlying data models that support these types of questions.
NOTE
The data warehouse uses a “schema on load” approach because the data schema must be defined and built prior to loading data into the data warehouse. Without an underlying data model or schema, the BI tools will not work.
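A small illustration of schema on load, using Python's built-in `sqlite3` module as a stand-in for a data warehouse: the table must exist before any row is loaded, rows that violate it are rejected, and fields the schema does not define never get stored. The table and column names are hypothetical, and SQLite enforces column types far more loosely than a typical warehouse does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema on load: the structure is defined before any data arrives.
conn.execute("""
    CREATE TABLE er_visits (
        visit_date TEXT NOT NULL,
        department TEXT NOT NULL,
        drg_code   TEXT
    )
""")


def load(rows):
    """Load rows that fit the predefined schema; reject the ones that do not."""
    loaded, rejected = 0, []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO er_visits (visit_date, department, drg_code) "
                "VALUES (:visit_date, :department, :drg_code)", row)
            loaded += 1
        except (sqlite3.IntegrityError, sqlite3.ProgrammingError) as exc:
            rejected.append((row, str(exc)))
    conn.commit()
    return loaded, rejected


loaded, rejected = load([
    {"visit_date": "2015-01-05", "department": "ER", "drg_code": "291"},
    {"visit_date": "2015-01-05", "department": None, "drg_code": "392"},  # NOT NULL violated
    {"visit_date": "2015-01-06", "department": "ER"},  # no drg_code supplied
    {"visit_date": "2015-01-06", "department": "ER", "drg_code": "193",
     "tweet": "long wait tonight"},  # extra field: not part of the schema, never stored
])
print(loaded, [reason for _, reason in rejected])
```

Adding the `tweet` field for analysis would first require changing the schema and the load process, which is the kind of change the chapter describes as slow and costly.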
Step 2: Define the Report (Query). Once the analytic requirements have been transcribed into a data model, step 2 of the process is where the BI analyst uses a BI tool (SAP BusinessObjects, MicroStrategy, Cognos, QlikView, Pentaho, etc.) to create the SQL-based query to build the report and/or answer the business questions. The BI analyst will use the BI tool's graphical user interface (GUI) to generate the SQL query by selecting the measures and dimensions; selecting page, column, and row descriptors; specifying constraints, subtotals, and totals; creating special calculations (mean, moving average, rank, share of); and selecting sort criteria. The BI tool GUI hides much of the complexity of creating the SQL.
Step 3: Generate SQL Commands. Once the BI analyst or the business user has defined the desired report or query request, the BI tool automatically creates the necessary SQL commands (SQL statements). In some cases, the BI analyst might modify the SQL commands generated by the BI tool to include unique SQL commands that may not be supported by the BI tool.
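The SQL generation in steps 2 and 3 can be sketched as a function that assembles a statement from the selections a user makes in a BI tool's GUI. `build_query` is a hypothetical helper, not the API of any BI product named above; the table, columns, and data are illustrative, and the in-memory SQLite database only shows that the generated SQL runs.

```python
import sqlite3


def build_query(fact_table, measures, dimensions, filters=None, order_by=None):
    """Hypothetical sketch of how a BI tool turns GUI selections into SQL.

    measures:   {alias: (aggregate function, column)}
    dimensions: columns to group by
    filters:    {column: value} equality constraints, passed as bound parameters
    Table and column names are assumed to come from a trusted semantic layer,
    not from free-text user input.
    """
    select = list(dimensions) + [f"{agg}({col}) AS {alias}"
                                 for alias, (agg, col) in measures.items()]
    sql = f"SELECT {', '.join(select)} FROM {fact_table}"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
        params.extend(filters.values())
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    if order_by:
        sql += " ORDER BY " + order_by
    return sql, params


# "What were sales by zip code for Christmas last year?" as GUI selections:
sql, params = build_query("sales", {"total_units": ("SUM", "units")}, ["zip"],
                          filters={"month": "2014-12"}, order_by="total_units DESC")
print(sql)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (zip TEXT, month TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("94301", "2014-12", 10), ("94301", "2014-12", 5),
    ("10001", "2014-12", 7), ("10001", "2014-11", 3),
])
print(conn.execute(sql, params).fetchall())
```

The generated string is what step 3 says an analyst might later hand-edit when the tool cannot express a particular SQL construct.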
Step 4: Create Report. In step 4, the BI tool issues the SQL commands against the data warehouse and creates the corresponding report or dashboard widget. This is a highly iterative process, where the BI analyst will tweak the SQL (either using the GUI or hand-coding the SQL statement) to fine-tune the SQL request. The BI analysts can also specify graphical rendering options (bar charts, line charts, pie charts) until they get the exact report and/or graphic that they want (see Figure 5.6).
Figure 5.6 Typical BI tool graphic options
The BI tools are very powerful and relatively easy to use if the data model is configured properly. By the way, this is a good example of the power of schema on load. This traditional schema on load approach removes much of the underlying data complexity from the business users, who can then use the BI tool's graphical user interface to more easily query and explore the data (think self-service BI).
In summary, the BI approach relies on a pre-built data model (schema on load), which enables users to quickly and easily query the data, as long as the data that they want to query is already defined and loaded into the data warehouse. If the data is not in the data warehouse, then adding data to an existing warehouse can take months to make happen. Not only does modifying the data warehouse to include a new data source require a significant amount of time, but the process can be very costly, as data schemas have to be updated to include the new data source, new ETL processes have to be constructed to transform and normalize the data to fit into the updated data schemas, and existing reports and dashboards may have to be updated to include the new data.
The Data Scientist Engagement Process
The data science process is significantly different. In fact, there is very little from the BI analyst engagement process that can be reused in the data science engagement process (see Figure 5.7).
Figure 5.7 Data scientist engagement process
Step 1: Define Hypothesis to Test. Step 1 of the data science engagement process starts with the data scientists identifying the prediction they want to make or the hypothesis that they want to test. This is a result of collaborating with the business subject matter expert to understand the key sources of business differentiation (e.g., how the organization delivers value) and then constructing the associated hypotheses or predictions.
Step 2: Gather Data… and More Data. In step 2 of the data science engagement process, the data scientist gathers relevant or potentially interesting data from a multitude of sources (both internal and external to the organization) and pushes that data into the data lake or analytic sandbox. The data lake is a great foundational capability for this process, as the data scientists can acquire and ingest any data they want (as-is), test the data for its value given the hypothesis or prediction, and then decide whether to include that data in the analytic model. This is where an envisioning exercise can add considerable value in facilitating the collaboration between the business users and the data scientists to identify data sources that may help improve predictive results.
Step 3: Build Data Model. Step 3 is where the data scientists define and build the schema necessary to address the hypothesis being tested. The data scientists can't define the schema until they know the hypothesis that they are testing and understand what data sources they are going to use to build their analytic models.
NOTE
This schema on query process is notably different from the traditional data warehouse schema on load process. The data scientist doesn't spend months integrating all the different data sources together into a formal data model first. Instead, the data scientist will define the schema as needed based on the data that is being used in the analysis and the requirements of the analytic tool and/or algorithm. The data scientist will likely iterate through several different versions of the schema until finding a schema that supports the analytic model with a sufficient goodness of fit to accept or reject the hypothesis being tested.
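Schema on query can be illustrated with raw JSON records that are stored exactly as they arrived; a schema is applied only when a specific hypothesis is analyzed. The records, field names, and both hypotheses below are hypothetical.

```python
import json

# Raw records land in the data lake as-is; nothing is enforced at ingest time.
raw_lines = [
    '{"customer": "c1", "store": "s9", "amount": "42.50", "weather": "rain"}',
    '{"customer": "c2", "amount": 18, "coupon": true}',
    '{"customer": "c1", "store": "s9", "amount": "7.25"}',
]


def read_with_schema(lines, schema):
    """Schema on query: project and cast only the fields the current
    hypothesis needs; fields absent from a record come back as None."""
    for line in lines:
        record = json.loads(line)
        yield {field: cast(record[field]) if field in record else None
               for field, cast in schema.items()}


# Hypothesis A (hypothetical): does weather affect spend?
weather_view = list(read_with_schema(
    raw_lines, {"customer": str, "amount": float, "weather": str}))

# Hypothesis B (hypothetical): do coupons drive purchases? Same raw data, new schema.
# bool() is safe here only because "coupon" arrives as a JSON boolean.
coupon_view = list(read_with_schema(raw_lines, {"customer": str, "coupon": bool}))

print(weather_view)
print(coupon_view)
```

Iterating on the schema means editing a dictionary and rereading the raw data, rather than rebuilding tables and reloading them.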
Step 4: Visualize the Data. Step 4 of the data science process leverages many of the outstanding data visualization tools available today to uncover relationships, correlations, and outliers in the data. The data scientists will use the data visualization tools to jump-start their analytic process by trying to identify correlations in the data worthy of investigation and outliers in the data that may need special treatment (e.g., log transformations). Data visualization tools like Tableau, Spotfire, DataRPM, and ggplot2 are great tools for exploring the data and identifying variables that the data scientists might want to test.
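Visualization tools show this graphically, but the effect of a log transformation on a skewed variable can also be checked numerically. A minimal sketch with hypothetical data that grows roughly exponentially: the Pearson correlation on the raw values understates the relationship, while the correlation after taking logarithms is close to 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical exponential-looking relationship: weeks since launch vs. site visits.
weeks = [1, 2, 3, 4, 5, 6]
visits = [30, 80, 220, 600, 1650, 4500]

raw_r = pearson(weeks, visits)
log_r = pearson(weeks, [math.log(v) for v in visits])
print(f"raw r = {raw_r:.3f}, after log transform r = {log_r:.3f}")
```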
Step 5: Build Analytic Models. Step 5 is where the real data science work begins: the data scientists use advanced analytic tools like SAS, SAS Miner, R, Mahout, MADlib, Alpine Miner, H2O, etc. to correlate different variables in an attempt to build more accurate analytic models. The data scientists will explore different analytic techniques and algorithms to try to create the most predictive models. Again, think probabilities, confidence levels, F-distributions, t-tests, and p-values. Chapter 6 will cover some of the different analytic algorithms that the data scientists might use and in what context.
Step 6: Evaluate Model Goodness of Fit. In step 6, the data scientists ascertain the model's goodness of fit. The goodness of fit of a statistical model describes how well the model fits a set of observations (F-test, p-value, and t-statistic). A number of different analytic techniques will be used to determine the goodness of fit, including the Kolmogorov–Smirnov test, Pearson's chi-squared test, analysis of variance (ANOVA), and the confusion (or error) matrix (see Figure 5.8).
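The confusion matrix and Pearson's chi-squared test mentioned above can be computed directly for a binary model. A sketch under stated assumptions: the churn labels and predictions are hypothetical holdout data, and 3.841 is the standard chi-squared critical value for one degree of freedom at the 5 percent significance level.

```python
def confusion_matrix(actual, predicted):
    """Counts of (true positives, false positives, false negatives, true negatives)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    return tp, fp, fn, tn


def chi_squared_2x2(tp, fp, fn, tn):
    """Pearson's chi-squared statistic testing whether predictions are
    independent of actual outcomes (rows: actual yes/no; cols: predicted yes/no)."""
    table = [[tp, fn], [fp, tn]]
    total = tp + fp + fn + tn
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical holdout results for a churn model (True = customer churned).
actual = [True] * 50 + [False] * 50
predicted = [True] * 40 + [False] * 10 + [True] * 10 + [False] * 40

tp, fp, fn, tn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
chi2 = chi_squared_2x2(tp, fp, fn, tn)
CRITICAL_1DF_5PCT = 3.841  # chi-squared critical value, 1 degree of freedom, alpha = 0.05
print(tp, fp, fn, tn, accuracy, precision, recall, chi2, chi2 > CRITICAL_1DF_5PCT)
```

The chi-squared function assumes no row or column of the table is empty, since an empty one would make an expected count zero. A statistic well above the critical value suggests the predictions are not independent of the outcomes; whether the model gives enough lift for the business is a separate judgment.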
The Data Models Are Different
The data models that are used in the data warehouse to support an organization's BI efforts are significantly different from the data models the data scientists prefer to use.
Data Modeling for BI
The world of BI (aka query, reporting, dashboards) requires a data modeling technique that allows business users to create their own reporting and queries. To support this need, Ralph Kimball pioneered dimensional modeling, or star schemas, while at Metaphor Computers back in the 1980s (see Figure 5.9).
Figure 5.9 Dimensional model (star schema)
The dimensional model was designed to accommodate the analysis needs of the business users, with two important design concepts:
Fact tables (populated with metrics or measures) correspond to transactional systems such as orders, shipments, sales, returns, premiums, claims, accounts receivable, and accounts payable. Facts are typically numeric values that can be aggregated (e.g., averaged, counted, or summed).
Dimension tables (populated with attributes about that dimension) represent the “nouns” of that particular transactional system such as products, markets, stores, employees, customers, and different variations of time. Dimensions are groups of hierarchies and descriptors that describe the facts. It is these dimensional attributes that enable analytic exploration, attributes such as size, weight, location (street, city, state, zip), age, gender, tenure, etc.
Dimensional modeling is ideal for business users because it supports their natural question-and-answer exploration processes. Dimensional modeling supports BI concepts such as drill across (navigating across dimensions) and drill up/drill down (navigating up and down the dimensional hierarchies such as the product dimension hierarchy of product ⇨ brand ⇨ category). Today, all BI tools use dimensional modeling as the standard way for interacting with the underlying data warehouse.
Data Modeling for Data Science
In the world of data science, Hadoop provides an opportunity to think differently about how we do data modeling. Hadoop was originally developed at Yahoo to deal with very long, flat web logs. Hadoop was designed with very large data blocks (the Hadoop default block size is 64 MB to 128 MB, versus relational database block sizes that are typically 32 KB or less). To optimize this block size advantage, the data science team wants very long, flat records and long, flat data models.1
For example, some data scientists prefer to “flatten” a star schema by collapsing or integrating the dimensional tables that surround the fact table into a single, flat record in order to construct and execute more complex data queries without having to use joins (see Figure 5.10).
Figure 5.10 Using flat files to eliminate or reduce joins on Hadoop
As an example in Figure 5.10, instead of three different star schemas with conformed or shared dimensions to link the different star schemas, the data science team wants three long, flat files with the following customer data:
Customer demographics (age, gender, current and previous home addresses, value of current and previous home, history of marital status, kids and their ages and genders, current and previous income, etc.)
Customer purchase history (annual purchases including items purchased, returns, prices paid, discounts, coupons, location, day of week, time of day, weather condition, temperatures)
Customer social activities (entire history of social media posts, likes, shares, tweets, favorites, retweets, etc.)
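The following minimal Python sketch illustrates the flattening idea on a toy star schema. The table names, columns, and rows are hypothetical, and plain dictionaries stand in for what would be Hadoop files in practice: each fact row is copied together with its customer and product attributes into one wide record, so a later query is a single scan with no join.

```python
# Hypothetical dimension tables keyed by surrogate key.
customers = {101: {"customer_name": "Ann", "customer_city": "Austin"},
             102: {"customer_name": "Raj", "customer_city": "Boston"}}
products = {7: {"product": "Latte", "brand": "House", "category": "Coffee"},
            8: {"product": "Scone", "brand": "Bakery", "category": "Food"}}

# Hypothetical fact table: one row per sale, holding only keys and measures.
sales_facts = [
    {"customer_id": 101, "product_id": 7, "quantity": 2, "amount": 9.00},
    {"customer_id": 102, "product_id": 8, "quantity": 1, "amount": 3.25},
]

def flatten(facts, customer_dim, product_dim):
    """Denormalize: copy dimension attributes onto each fact row so that
    downstream queries need no joins."""
    flat_rows = []
    for fact in facts:
        row = dict(fact)                               # keep keys and measures
        row.update(customer_dim[fact["customer_id"]])  # pull in customer attributes
        row.update(product_dim[fact["product_id"]])    # pull in product attributes
        flat_rows.append(row)
    return flat_rows

flat_sales = flatten(sales_facts, customers, products)

# A "query" on the flat records is one pass over the rows, with no join step.
coffee_revenue = sum(r["amount"] for r in flat_sales if r["category"] == "Coffee")
print(coffee_revenue)
```

The trade-off is storage and redundancy: every sale now repeats its customer and product attributes, which is acceptable when large sequential reads are cheap and joins are expensive.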
The View of the Business Is Different
Instead of trying to build the “single version of the truth” or create a “360-degree view of the customer,” the data science team will build analytic profiles on each of the organization's key business entities or strategic nouns at the individual entity level.
One of the most powerful data science concepts is the analytic profile. The data science team builds detailed analytic profiles that capture the behaviors, propensities, preferences, and tendencies of individual business entities (e.g., customers, merchants, students, patients, doctors, wind turbines, jet engines, ATMs).
An analytic profile is a combination of metrics, key performance indicators, scores, association rules, and analytic insights combined with the tendencies, behaviors, propensities, associations, affiliations, interests, and passions for an individual entity (customer, device, partner, machine).
For example, the analytic profile for Bill Schmarzo for Starbucks might include the following:
Demographic Information. This is the basic information about me such as name, home address, work address, age, gender, marital status, length of time as gold card loyalty member, income level, value of home, length of time at current home, education level, number of dependents, age and make of car, age and gender of children, etc.
Transactional Metrics. This is information about my transactions with Starbucks such as number of purchases, purchase amounts, products purchased and in what combinations, frequency of visits, recency of visits, most common time of day for visits, stores visited most frequently, etc.
Social Media Metrics. This is information gathered about any social media comments that Bill Schmarzo might have made across different social media sites about Starbucks, including posts, likes, tweets, retweets, social media conversations, Yelp ratings, blogs, e-mail conversations, consumer comments, mobile usage, web clicks, etc. Starbucks could mine the social media data to understand my network of personal relationships (number, strength, direction, sequencing, and clustering of relationships) and capture my interests, passions, associations, and affiliations.
Behavioral Groupings. Now we're starting to get interesting, as we want to create behavioral insights that are relevant for the business initiatives that Starbucks is trying to support. Depending on the targeted business initiative (customer retention, customer up-sell, customer advocacy, new store locations, channel sales, etc.), here is some behavioral information that Starbucks might want to capture about me: favorite drinks in rank order, favorite stores in rank order, most frequent time of day to visit a store, most frequent day of week to visit a store, recency of store visit, frequency of store visits in past week/month/quarter, how long I stay at which stores (“pass thru” or “linger”), etc.
Classifications. Now we want to create some “classifications” about Bill Schmarzo's life that might have an impact on Starbucks's key business initiatives, such as life stage classification (long marriage, kid in college, kid at home, weight/diet conscious, etc.), lifestyle classification (heavy traveler, heavy chai tea drinker, light exerciser, and so on), or product classification (morning coffee/oatmeal consumer, afternoon frap/cookie consumer, etc.).
Association Rules. We might also want to capture some propensities about Bill's usage patterns that we can use to support Starbucks's key business initiatives, including propensity to buy oatmeal when he buys his venti chai latte when traveling in the morning, propensity to buy a cookie/pastry when traveling in the afternoon, propensity to buy product in the channel, etc.
Scores. We also may want to create scores to support decision-making and process optimization. Scores that we might want to create (again, depending on Starbucks's key business initiatives) could include an advocacy score (which measures my likelihood to recommend Starbucks and make positive comments about Starbucks on social media), a loyalty score (which measures my likelihood to continue to visit Starbucks stores and buy Starbucks products versus competitors), a product usage score (which measures how much Starbucks product I consume, and how much revenue I generate, when I visit a Starbucks store), etc.
A profile could be made up of hundreds of metrics and scores that, when used in combination against a specific business initiative like customer retention, customer up-sell, new product introductions, or customer advocacy, can improve the predictive capabilities of the model (see Figure 5.11).
Figure 5.11 Sample customer analytic profile
Some metrics and scores are more important than others, depending on the business initiative being addressed. For example, for a financial services firm focused on customer acquisition, disposable income, retirement readiness, life stage, age, education level, and number of family members may be the most important predictive metrics. However, for that same financial services firm focused on customer retention, metrics such as advocacy, customer satisfaction, attrition risk, social network associations, and select social media relationships may be the most important predictive metrics.
Against a customer retention business initiative, for instance, an organization could compare a customer's most recent activities (e.g., purchases, mobile app usage, website visits, consumer comments, social posts) to the historical data, metrics, and scores that compose that customer's analytic profile in order to determine (score) his or her likelihood to attrite. If the customer's “Attrition Score” is above a certain level, then the organization could deliver a personalized “next best offer” in order to preempt customer attrition. The analysis process for the “Improve Customer Retention” business initiative is laid out in Figure 5.12.
Figure 5.12 Improve customer retention example
The analysis process works like this:
Step 1: Establish a hypothesis that you want to test. In our customer retention example, our test hypothesis is that “Premium gold card members with greater than five days without a purchase or mobile app engagement have 25 to 30 percent higher probability of churn than similar customers.”
Step 2: Identify and quantify the most important metrics or scores to predict a certain business outcome. In our example, the metrics and scores that we're going to use to test our customer attrition hypothesis include Customer Tenure (in months), Customer Satisfaction Score, Average Monthly Purchases, and Customer Loyalty Score. Notice that the metrics do not have the same weight (or confidence level). Some metrics and scores are more important than others in predicting performance given the test hypothesis.
Step 3: Employ the predictive metrics to build detailed profiles for each individual customer with respect to the hypothesis to be tested.
Step 4: Compare an individual's recent activities and current state with his or her profile in order to flag unusual behaviors and actions that may be indicative of a customer retention problem. In our customer retention example, we might want to create a “Customer Attrition” score that quantifies the likelihood that a particular customer is going to leave, and then create specific recommendations as to what actions or “next best offers” can be delivered to retain that customer.
Step 5: Continue to seek out new data sources and new metrics that may be better predictors of attrition. This is also the part of the data science process where the team continuously tries to improve the accuracy and confidence levels of the metrics and scores using sensitivity analysis and simulations like Monte Carlo experiments.
Step 6: Integrate the analytic insights, scores, and recommendations into the key operational systems (likely CRM, direct marketing, point of sale, and call center for the customer retention business initiative) in order to ensure that the insights uncovered by the analysis are actionable by frontline or customer-engaging employees.
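The six steps can be sketched in Python as a simple weighted attrition score. Everything here is an assumption for illustration: the scales, the weights, the intercept, and the 0.5 threshold are hypothetical, and a logistic function is used only as one common way to turn a weighted sum into a 0-to-1 likelihood. The unequal weights mirror Step 2, the profile fields mirror Step 3, and the threshold check mirrors Step 4.

```python
import math
from dataclasses import dataclass

@dataclass
class CustomerProfile:
    customer_id: str
    tenure_months: float
    satisfaction_score: float       # assumed 0-10 scale
    avg_monthly_purchases: float
    loyalty_score: float            # assumed 0-100 scale
    days_since_last_activity: int   # days since last purchase or mobile app engagement

# Hypothetical, unequal weights (Step 2). In practice these would be estimated
# from historical churn outcomes, e.g. with logistic regression.
WEIGHTS = {
    "tenure_months": -0.02,
    "satisfaction_score": -0.30,
    "avg_monthly_purchases": -0.10,
    "loyalty_score": -0.02,
    "inactive_over_5_days": 1.50,   # the test hypothesis from Step 1
}
INTERCEPT = 2.0
OFFER_THRESHOLD = 0.5  # hypothetical cut-off for triggering a "next best offer"

def attrition_score(p: CustomerProfile) -> float:
    """Combine the weighted metrics and squash them into a 0-1 likelihood."""
    inactive = 1.0 if p.days_since_last_activity > 5 else 0.0
    z = (INTERCEPT
         + WEIGHTS["tenure_months"] * p.tenure_months
         + WEIGHTS["satisfaction_score"] * p.satisfaction_score
         + WEIGHTS["avg_monthly_purchases"] * p.avg_monthly_purchases
         + WEIGHTS["loyalty_score"] * p.loyalty_score
         + WEIGHTS["inactive_over_5_days"] * inactive)
    return 1.0 / (1.0 + math.exp(-z))

def needs_next_best_offer(p: CustomerProfile) -> bool:
    """Step 4: flag customers whose current score crosses the threshold."""
    return attrition_score(p) >= OFFER_THRESHOLD

loyal = CustomerProfile("C-001", 48, 9, 12, 85, 1)
at_risk = CustomerProfile("C-002", 6, 4, 2, 30, 9)
for c in (loyal, at_risk):
    print(c.customer_id, round(attrition_score(c), 3), needs_next_best_offer(c))
```

In a real deployment the weights would be revisited as new data sources are added (Step 5), and the flag would be pushed into systems such as CRM or the call center (Step 6).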
Summary
Organizations are realizing that data science is very different from BI and that one does not replace the other. Both combine to provide the “dynamic duo” of analytics: one focused on monitoring the current state of the business and the other trying to predict what is likely to happen and then prescribe what actions to take.
Big data is a key enabler of a new discipline called data science. Data science seeks to leverage new sources of structured and unstructured data, coupled with advanced predictive and prescriptive analytics, to uncover new variables and metrics that are better predictors of performance.
As discussed in this chapter, BI is different from data science in the following ways:
The questions are different.
The analytic characteristics are different.
The analytic engagement processes are different.
The data models are different.
The business view is different.
This chapter also introduced the very important data science concept called analytic profiles. Organizations are learning that, rather than trying to create a 360-degree profile of the customer, it is more important to identify and quantify the fewer but more important metrics that are better predictors of business or customer performance, in support of goals such as optimizing key business processes, influencing customer behaviors, and uncovering new monetization opportunities.
Hopefully your journey through the “looking glass” was as enlightening to you as it was to me!
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Describe the key differences between BI and data science and what those differences mean to your organization.
Exercise #2: List sample descriptive (What happened?), predictive (What is likely to happen?), and prescriptive (What actions should I take?) questions that are relevant to the targeted business initiative that you identified in Chapter 2.
Exercise #3: For the targeted business initiative identified in Chapter 2, list some of the key metrics and variables that you might want to capture in order to support the predictive and prescriptive questions listed in Exercise #2.
Notes
1 Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines or racks of machines) are commonplace and thus should be automatically handled in software by the framework. (Source: Wikipedia)
Chapter 6: Data Science 101
There are many excellent books and courses focused on teaching people how to become a data scientist. Those books and courses provide detailed material and exercises that teach the key capabilities of data science such as statistical analysis, data mining, text mining, SQL programming, and other computing, mathematical, and analytic techniques. That is not the purpose of this chapter.
The purpose of Chapter 6 is to introduce some different analytic algorithms that business users should be aware of and to discuss when it might be most appropriate to use which types of algorithms. You do not need to be a data scientist to understand when and why to apply these analytic algorithms. A more detailed understanding of these different analytic algorithms will help the business users to collaborate with the data science team to uncover those variables and metrics that may be better predictors of business performance.
Data Science Case Study Setup
Data science is a complicated topic that certainly cannot be done justice in a single chapter. So to help you grasp some of the data science concepts that are covered in Chapter 6, you are going to create a fictitious company against which you can apply the different analytic algorithms. Hopefully this will make the different data science concepts “come to life.”
Our fictitious company, Fairy-Tale Theme Parks (“The Parks”), has multiple amusement parks across North America and wants to employ big data and data science in order to:
Deliver a more positive and compelling guest experience in an increasingly competitive entertainment marketplace;
Determine maximum potential guest lifetime value (MPGTV) to use as the basis for determining guest promotional spend and discounts and prioritizing Priority Access passes and in-park hotel rooms;
Promote new technology-heavy 3D attractions (Terror Airline and Zombie Apocalypse) to ensure the successful adoption and long-term viability of those new rides that appeal to new guest segments;
Ensure the success of new movie and TV characters in order to increase associated licensing revenues and ensure long-term character viability for new movie and TV sequels.
The Parks is deploying a mobile app called Fairy-Tale Chaperon that engages guests as they move through the park and helps guests enjoy the different attractions, entertainment, retail outlets, and restaurants. Fairy-Tale Chaperon will:
Deliver Priority Access passes to different attractions and reward their most important guests with digital coupons, discounts, and “Fairy Dust” (money equivalent that can be spent only in the park).
Promote social media posts to drive gamification and rewards around contests such as most social posts, most popular social posts, and most popular photos and videos.
Track guest flow and in-park traffic patterns, tendencies, and propensities in order to determine which attractions to promote (to increase attraction traffic) and which attractions guests should avoid because of long wait times.
Deliver real-time guest dining and entertainment recommendations based on guests' areas of interest and seat/table availability for select restaurants and entertainment.
Reward guests who share their social media information, which can be used to monitor guests' real-time satisfaction and enjoyment via Facebook, Twitter, and Instagram. It also provides an opportunity to promote select photos in order to start viral marketing campaigns.
This chapter reviews a number of different analytic techniques. You are not expected to become an expert in these different analytic algorithms. However, the more you understand what these analytic algorithms can do, the better position you are in to collaborate with your data science team and suggest the art of the possible to your business leadership team.
Fundamental exploratory analytic algorithms that are covered in Chapter 6 are:
Trend analysis
Boxplots
Geographical (spatial) analysis
Pairs plot
Time series decomposition
More advanced analytic algorithms that are covered in this chapter are:
Cluster analysis
Normal curve equivalent (NCE) analysis
Association analysis
Graph analysis
Text mining
Sentiment analysis
Traverse pattern analysis
Decision tree classifier analysis
Cohort analysis
Throughout the chapter, you will contemplate how The Parks could leverage each of these different analytic techniques.
NOTE
Throughout this chapter I provide links to sites that can help you get comfortable with these different analytic algorithms. Many of the sites have exercises that use R.1 I strongly recommend downloading R and RStudio now!
NOTE
I commonly use Wikipedia to refine the definitions of many of these different analytic algorithms. Wikipedia is a great source for more details on each of these analytic algorithms.
Fundamental Exploratory Analytics
Let's start by covering some basic statistical analysis that was likely covered in your first statistics course (yes, I realize that you probably sold your stats book the minute the stats class was over). Trend analysis, boxplots, geographical analysis, pairs plot, and time series decomposition are examples of exploratory analytic algorithms that the data scientists use to get a “feel for the data.” These exploratory analytic algorithms help the data science team to better understand the data content and gain a high-level understanding of relationships and patterns in the data.
Trend Analysis
Trend analysis is a fundamental visualization technique to spot patterns, trends, relationships, and outliers across a time series of data. One of the most basic yet very powerful exploratory analytics, trend analysis (applying different plotting techniques and graphic visualizations) can quickly uncover customer, operational, or product trends and events that tend to happen together or happen at some period of regularity (see Figure 6.1).
Figure 6.1 Basic trend analysis
In Figure 6.1, the data scientist manually tested a number of different trending options in order to identify the “best fit” trend line (in this example, using Microsoft Excel). Once the data scientist identifies the best trending option, the data scientist can automate the generation of the trend lines using R.
Next, the data scientist might want to dissect the trend line across a number of different business dimensions (e.g., products, geographies, sales territories, markets) in order to uncover patterns and trends at the next level of granularity. The data scientist can then write a program to juxtapose the detailed trend lines into the same chart so that it is easier to spot trends, patterns, relationships, and outliers buried in the granular data (see Figure 6.2).
Figure 6.2 Compound trend analysis
Finally, trend analysis yields mathematical models for each of the trend lines. These mathematical models can be used to quantify recurring patterns or behaviors in the data. The most interesting insights from the trend lines can then be flagged for further investigation by the data science team (see Figure 6.3).
Figure 6.3 Trend line analysis
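As a rough illustration of fitting and comparing trend lines programmatically (the book does this in Excel and R), here is a Python sketch that fits an ordinary least squares line to each series and reports R² as one measure of fit. The park names and visit counts are hypothetical, and only the linear option is shown; exponential or polynomial trend options could be compared the same way using R².

```python
def fit_linear_trend(y):
    """Ordinary least squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean

def r_squared(y, slope, intercept):
    """Share of variance explained by the line; one way to compare trend options."""
    y_mean = sum(y) / len(y)
    ss_res = sum((v - (intercept + slope * t)) ** 2 for t, v in enumerate(y))
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return 1 - ss_res / ss_tot

# Hypothetical monthly guest visits (thousands), broken out by park so that
# each dimension value gets its own trend line, as in a compound trend chart.
visits_by_park = {
    "Park A": [100, 104, 109, 111, 118, 121],
    "Park B": [90, 88, 87, 85, 82, 80],
}
for park, series in visits_by_park.items():
    slope, intercept = fit_linear_trend(series)
    print(park, round(slope, 2), round(r_squared(series, slope, intercept), 3))
```

Each fitted slope and intercept is the small mathematical model behind one trend line; comparing slopes across parks is a quick way to flag the series worth investigating.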
WARNING
Some business users try to use trend analysis to predict future events through simple time series extrapolations. Extrapolating a time series trend to predict future behaviors and events is common but highly risky unless you operate within a 100 percent stable environment.
The Parks Ramifications
The Parks could use trend analysis to identify the variables (e.g., wait times, social media posts, consumer comments) that are highly correlated to the increase or decrease in guest satisfaction for each attraction, restaurant, retail outlet, and entertainment. The Parks could leverage the results from the trend analysis to
1. Flag problem areas and take corrective actions such as opening more lines, promoting less busy attractions, moving kiosks that are blocking traffic flow, and resituating characters at different points in the parks;
2. Identify the location and types of future attractions, restaurants, retail outlets, and entertainment.
For more information about how to make simple plots and graphs (line charts, bar charts, histograms, dot charts) in R, check out http://www.harding.edu/fmccown/r/.
Boxplots
Boxplots are one of the more interesting and visually creative exploratory analytic algorithms. Boxplots quickly visualize variations in the base data and can be used to identify outliers in the data worthy of further investigation. A boxplot is a convenient way of graphically depicting groups of numerical data through their quartiles. Boxplots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram (see Figure 6.4).
Figure 6.4 Boxplot analysis
One can quickly see the distribution of key data elements from the boxplot in Figure 6.4. When you change the dimensions against which you are doing the boxplots, underlying patterns and relationships in the data start to surface.
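A minimal Python sketch of the numbers behind one box, assuming the widely used Tukey convention in which whiskers extend to the most extreme points within 1.5 × IQR of the box and anything beyond is flagged as an outlier. The attraction name and visit counts are hypothetical. R's boxplot() uses a similar 1.5 × IQR whisker rule by default, though it computes the box edges (hinges) slightly differently, so exact values can differ.

```python
import statistics

def box_plot_stats(values, whisker_coef=1.5):
    """Quartiles, whisker ends, and outliers for one box (Tukey convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker_coef * iqr, q3 + whisker_coef * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical visits per guest for one attraction; one guest is unusually loyal.
visits = {"Monster Mansion": [12, 15, 14, 18, 16, 13, 17, 55]}
for attraction, values in visits.items():
    print(attraction, box_plot_stats(values))
```

Running the same function per attraction (or per any other dimension) is the numeric equivalent of changing the dimension the boxplots are drawn against.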
The Parks Ramifications
The Parks can employ boxplots to determine its most loyal guests for each of the park's attractions (e.g., Canyon Copter Ride, Monster Mansion, Space Adventure, Ghoulish Gulch). The Parks can use the results of the boxplot analysis to create current and maximum lifetime value scores for its guests, against which to prioritize which guests to reward with Priority Access passes and other coupons and discounts.
For more information about creating boxplots in R, check out http://www.r-bloggers.com/box-plot-with-r-tutorial/.
Geographical (Spatial) Analysis
Geographical or spatial analysis includes techniques for analyzing geographical activities and conditions using a business entity's topological, geometric, or geographic properties. For example, geographical analysis supports the integration of zip code and data.gov economic data with a client's internal data to provide insights about the success of the organization's geographical reach and market penetration (see Figure 6.5).
Figure 6.5 Geographical (spatial) trend analysis
In the example in Figure 6.5, geographical analysis is combined with trend analysis in order to identify changes in market patterns across the organization's key markets. Geographical analysis is especially useful for organizations looking to determine the success of their sales and marketing efforts.
The Parks Ramifications
The Parks can conduct geographical trend analysis to spot any changes (at both the zip+4 and household levels) in the geo-demographics of guests over time and by seasonality and holidays. The Parks can use the results of this geographical plus seasonality analysis to create geo-specific campaigns and promotions with the objective of increasing attendance from under-penetrated geographical areas by day of week, holidays, and seasonality.
Pairs Plot
Pairs plot analysis may be my favorite analytics algorithm. Pairs plot analysis allows the data scientist to spot potential correlations using pairwise comparisons across multiple variables. Pairs plot analysis provides a deep view into the different variables that may be correlated and can form the basis for guiding the data science team in the identification of key variables or metrics to include in the development of predictive models (see Figure 6.6).
Figure 6.6 Pairs plot analysis
Pairs plot analysis does lots of the grunt work of quickly pairing up different variables and dimensions so that one can quickly spot potential relationships in the data worthy of more detailed analysis (see the boxes in Figure 6.6).
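The numeric core of a pairs plot is a table of pairwise relationships. This Python sketch computes the Pearson correlation for every pair of variables and keeps the pairs whose absolute correlation meets a threshold, which is one way to shortlist the panels worth a closer look. The variables, values, and the 0.7 threshold are hypothetical; a pairs plot can also reveal nonlinear relationships that a single correlation coefficient misses.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def strong_pairs(columns, threshold=0.7):
    """All variable pairs with |r| >= threshold, strongest first."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda p: -abs(p[2]))

# Hypothetical daily park metrics over six days.
daily = {
    "temperature": [60, 65, 70, 75, 80, 85],
    "water_ride_visits": [200, 240, 290, 330, 370, 420],
    "coat_check_usage": [90, 80, 66, 55, 41, 30],
    "souvenir_sales": [500, 480, 510, 490, 505, 495],
}
for a, b, r in strong_pairs(daily):
    print(f"{a} vs {b}: r = {r:.3f}")
```

A strong correlation only nominates a variable for the predictive model; whether it actually drives guest behavior still needs the more detailed analysis described above.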
The Parks Ramifications
The Parks can leverage pairs plot analysis to compare a multitude of variables to identify those variables that drive guests to particular attractions, entertainment, retail outlets, and restaurants. The Parks can use the results of the analysis to drive in-park promotional decisions and offers that direct guests to under-utilized attractions, entertainment, retail outlets, and restaurants.
Additional paired plot options in R (e.g., pairs, splom, plotmatrix, ggcorplot, panelcor) can be found at http://www.r-bloggers.com/five-ways-to-visualize-your-pairwise-comparisons/.
Time Series Decomposition
Time series decomposition expands on the basic trend analysis by decomposing a time series into its trend plus three other underlying components that can provide valuable customer, product, or operational performance insights. These components are
Cyclical component that describes repeated but non-periodic fluctuations,
Seasonal component that reflects seasonality (seasonal variation),
Irregular component (or “noise”) that describes random, irregular influences and represents the residuals of the time series after the other components have been removed.
From the time series decomposition analysis, a business user can spot particular areas of interest in the decomposed trend data that may be worthy of further analysis (see Figure 6.7).
Figure 6.7 Time series decomposition analysis
For example, in examining Figure 6.7, one can spot unusual occurrences in the areas of Seasonality and Trend (highlighted in the boxes) that may suggest the inclusion of additional data sources (such as weather or major sporting and entertainment events data) in an attempt to explain those unusual occurrences.
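A minimal Python sketch of classical additive decomposition (series = trend + seasonal + irregular), assuming an odd seasonal period so that a simple centered moving average can estimate the trend; the cyclical component is not separated here and stays folded into the trend. The daily attendance data are synthetic, with a weekly pattern and one injected event spike, to show how an unusual occurrence surfaces in the irregular component. R's stats::decompose() performs a similar moving-average decomposition.

```python
def decompose_additive(y, period):
    """Classical additive decomposition: y = trend + seasonal + irregular.

    Sketch assumption: an odd period, so a plain centered moving average can
    estimate the trend. Trend and irregular are None at the edges, where the
    moving-average window does not fit.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half, n = period // 2, len(y)
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(y[i - half:i + half + 1]) / period

    # Average the detrended values at each position in the cycle...
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(y[i] - t)
    raw = [sum(b) / len(b) for b in buckets]
    # ...then center them so the seasonal effects sum to zero over one cycle.
    offset = sum(raw) / period
    pattern = [s - offset for s in raw]
    seasonal = [pattern[i % period] for i in range(n)]

    irregular = [None if t is None else y[i] - t - seasonal[i]
                 for i, t in enumerate(trend)]
    return trend, seasonal, irregular

# Hypothetical daily attendance: upward trend + weekly pattern + one event spike.
weekly = [-20, -25, -15, -10, 5, 30, 35]   # sums to zero over the week
attendance = [1000 + 2 * day + weekly[day % 7] for day in range(21)]
attendance[10] += 50                        # e.g. a nearby concert

trend, seasonal, irregular = decompose_additive(attendance, period=7)
spike_day = max((i for i, r in enumerate(irregular) if r is not None),
                key=lambda i: abs(irregular[i]))
print("Largest irregular value on day", spike_day)
```

The spike also leaks a little into the trend and seasonal estimates, which is one reason analysts bring in outside data (such as event calendars) to explain unusual points rather than relying on the decomposition alone.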
The Parks Ramifications
The Parks can deploy time series decomposition analysis to identify and quantify the impact that seasonality and specific events are having on guest visits and associated spend. The Parks can use the results of the analysis to
1. Create season-specific marketing campaigns and promotions to increase guest visits and associated spend,
2. Determine which events outside of the theme parks (concerts, professional sporting events, BCS football games) are worthy of promotional and sponsorship spend.
For more information about time series decomposition in R, check out http://www.r-bloggers.com/time-series-decomposition/.
Analytic Algorithms and Models
The following analytic algorithms start to move the data scientist beyond the data exploration stage into the more predictive stages of the analysis process. These analytic algorithms by their nature are more actionable, allowing the data scientist to quantify cause and effect and provide the foundation to predict what is likely to happen and recommend specific actions to take.
Cluster Analysis
Cluster analysis is used to uncover insights about how customers and/or products cluster into natural groupings in order to drive specific actions or recommendations (e.g., personalized messaging, target marketing, maintenance scheduling). Cluster analysis or clustering is the exercise of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (clusters).
Clustering analysis can uncover potential actionable insights across massive data volumes of customer and product transactions and events. Cluster analysis can uncover groups of customers and products that share common behavioral tendencies and, consequently, can be targeted with the same marketing treatments (see Figure 6.8).
Figure 6.8 Cluster analysis
NOTE
I was able to use a pie chart in Figure 6.8 because I was dealing with only a small number of clusters. Generally speaking, pie charts are not good for conveying information because a large number of pie segments obscures the data and makes it hard to uncover any underlying trends or relationships buried in the data.
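To show what a clustering algorithm actually does, here is a minimal Python sketch of k-means (Lloyd's algorithm), one common clustering technique among many; the book does not specify which method produced Figure 6.8. The guest data (visits per year, spend per visit) and the starting centroids are hypothetical. In practice the features would usually be standardized first so that the dollar-valued column does not dominate the distance.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical guests as (visits per year, spend per visit in dollars).
guests = [(1, 80), (2, 95), (1, 70),
          (4, 150), (5, 160), (4, 140),
          (10, 300), (12, 320), (11, 310)]
centroids, clusters = kmeans(guests, initial_centroids=[(1, 80), (4, 150), (10, 300)])
for c, members in zip(centroids, clusters):
    print([round(x, 1) for x in c], members)
```

The resulting centroids act as compact profiles of each guest segment, which is the form in which cluster results usually feed targeting decisions.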
The Parks Ramifications
The Parks can leverage cluster analysis to create more actionable profiles of the park's most profitable guest clusters and highest potential guest clusters. The Parks can use the results of the analysis to quantify, prioritize, and focus guest acquisition and guest activation marketing efforts.
For more information about cluster analysis in R, check out http://www.statmethods.net/advstats/cluster.html.
Normal Curve Equivalent (NCE) Analysis
A technique first used in evaluating students' testing performance, normal curve equivalent (NCE) is a data transformation technique that approximately fits a normal distribution between 0 and 100 by normalizing a data set in preparation for percentile rank analysis. For example, an NCE data transformation is a way of standardizing scores received on a test into a 0–100 scale similar to a percentile rank but preserving the valuable equal-interval properties of a z-score (see Figure 6.9).
Figure 6.9 Normal curve equivalent analysis
What I find most useful about the NCE data transformation is taking the NCE results and binning the results to look for natural groupings in the data. For example, in Figure 6.10, you build on the NCE analysis to uncover price points (bins) across a wide range of high-margin, mid-margin, and low-margin product categories that might indicate the opportunity for pricing and/or promotional activities.
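A minimal Python sketch of the NCE transformation and binning, using the standard definition NCE = 50 + 21.06 × z, where z is the normal deviate that corresponds to each value's percentile rank. The mid-rank percentile formula, the bin edges at 25, 50, and 75, and the price data are assumptions for illustration. For approximately normal data, substituting the ordinary z-score (x − mean) / sd gives similar results.

```python
from statistics import NormalDist

NCE_SCALE = 21.06  # makes NCE 1, 50 and 99 line up with percentile ranks 1, 50 and 99

def percentile_ranks(values):
    """Mid-rank percentile rank in (0, 1); tied values share the same rank."""
    n = len(values)
    return [(sum(x < v for x in values) + 0.5 * sum(x == v for x in values)) / n
            for v in values]

def nce_scores(values):
    """Normal curve equivalent: 50 + 21.06 * z, with z taken from the percentile rank."""
    z_of = NormalDist().inv_cdf
    return [50 + NCE_SCALE * z_of(p) for p in percentile_ranks(values)]

def bin_scores(scores, edges=(25, 50, 75)):
    """Bin index 0..len(edges) for each NCE score, to look for natural groupings."""
    return [sum(score >= edge for edge in edges) for score in scores]

# Hypothetical package prices (dollars) across attraction/restaurant bundles.
prices = [5, 7, 8, 9, 12, 15, 22, 40, 41]
nce = nce_scores(prices)
print([round(s, 1) for s in nce])
print(bin_scores(nce))
```

Because NCE is built from ranks, large price gaps (such as 22 to 40 here) do not stretch the scale; binning the scores then groups items by relative position rather than by raw dollar amount.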
The Parks Ramifications
The Parks can employ the NCE technique to understand price inflection points for packages of attractions and restaurants. The Parks can leverage the price inflection points to optimize pricing (e.g., create a package of attractions and restaurants by seasonality, holiday, day of week, etc.) and create new Priority Access packages.
For more information about how to use z-scores to normalize data using R, check out http://www.r-bloggers.com/r-tutorial-series-centering-variables-and-generating-z-scores-with-the-scale-function/. For more insights into the NCE data transformation technique, see https://en.wikipedia.org/wiki/Normal_curve_equivalent.
Association Analysis
Association analysis is a popular algorithm for discovering and quantifying relationships between variables in large databases. Association analysis shows customer or product events or activities that tend to happen together, which makes this type of analysis very actionable. For example, the association rule {buns, ketchup} → {burger} found in the point-of-sales data of a supermarket would indicate that if a customer buys buns and ketchup together, she is likely to also buy hamburger meat. Such information can be used as the basis for making pricing, product placement, promotion, and other marketing decisions.
Association analysis is the basis for market basket analysis (identifying products and/or services that sell in combination or sell with a predictable time lag) that is used in many industries including retail, telecommunications, insurance, digital marketing, credit cards, banking, hospitality, and gaming.
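The strength of a rule such as {buns, ketchup} → {burger} is usually measured with support, confidence, and lift. This Python sketch computes those three measures for that single rule over a handful of hypothetical baskets; tools such as the R arules package instead search all candidate rules (for example, with the Apriori algorithm) and keep those above chosen support and confidence thresholds.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)   # > 1 means a positive association
    return {"support": both, "confidence": confidence, "lift": lift}

# Hypothetical point-of-sale baskets.
baskets = [
    {"buns", "ketchup", "burger"},
    {"buns", "ketchup", "burger", "soda"},
    {"buns", "ketchup"},
    {"buns", "ketchup", "burger", "chips"},
    {"soda", "chips"},
    {"milk", "bread"},
    {"burger", "soda"},
    {"bread", "ketchup"},
]
print(rule_metrics(baskets, {"buns", "ketchup"}, {"burger"}))
```

Here three of the four baskets with buns and ketchup also contain burger (confidence 0.75), and burger appears in half of all baskets, so the rule's lift of 1.5 indicates a real association rather than burger simply being popular.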
In Figure 6.11, the data science team examined the credit card transactions for one individual and uncovered several purchase occurrences that tended to happen together. For example, you can see a very strong relationship between Chipotle and Starbucks in the second line of Figure 6.11, as well as a number of purchase occurrences (e.g., Foot Locker + Best Buy) that tend to happen in combination.
Figure 6.11 Association analysis
One very actionable data science technique is to cluster the resulting association rules into common groups or segments. For example, in Figure 6.12, the data science team clustered the resulting association rules across tens of millions of customers in order to create more accurate, relevant customer segments. In this process, the data science team
Runs the association analysis across the tens of millions of customers to identify association rules with a high degree of confidence,
Clusters the customers and their resulting association rules into common groupings or segments (e.g., Chipotle + Starbucks, Virgin America + Marriott),
Uses these new segments as the basis for personalized messaging and direct marketing.
Figure 6.12 Converting association rules into segments
One of the interesting consequences of this association rule clustering technique is that a customer may appear in multiple segments. Artificially force-fitting a customer into a single segment obscures the fine nuances about each particular customer's buying behaviors, tendencies, and propensities.
The Parks Ramifications
The Parks can leverage market basket analysis to identify the most popular and least popular “packages of attractions.” The Parks can use this “packages of attractions” data to
1. Create new pricing and Priority Access packages for the most popular packages in order to optimize in-park traffic flow and reduce attraction wait times,
2. Create new pricing and Priority Access packages for the least popular “packages” in order to drive traffic to under-utilized attractions.
For more information about association analysis in R, check out http://www.rdatamining.com/examples/association-rules.
Graph Analysis
Graph analysis is one of the more powerful analysis techniques made popular by social media analysis. Graph analysis can quickly highlight customer or machine (think Internet of Things) relationships obscured across millions if not billions of social and machine interactions.
Graph analysis uses mathematical structures to model pairwise relations between objects. A “graph” in this context is made up of “vertices” or “nodes” and lines called edges that connect them. Social network analysis (SNA) is an example of graph analysis. SNA is used to investigate social structures and relationships across social networks. SNA characterizes networked structures in terms of nodes (people or things within the network) and the ties or edges (relationships or interactions) that connect them (see Figure 6.13).
Figure 6.13 Graph analysis
While graph analysis is most commonly used to identify clusters of “friends,” uncover group influencers or advocates, and make friend recommendations on social media networks, graph analysis can also look at clustering and strength of relationships across diverse networks such as ATMs, routers, retail outlets, smart devices, websites, and product suppliers.
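A minimal Python sketch of these graph ideas, applied to the guest-group scenario: guests are nodes, shared visits are edges, groups are connected components found by breadth-first search, and a "leader" is taken to be the member with the most ties (degree centrality). The guest names, edges, and the leader rule are hypothetical; packages such as R's igraph provide richer measures of influence.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency sets from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connected_groups(graph):
    """Breadth-first search to find each group of directly or indirectly linked guests."""
    seen, groups = set(), []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(members))
    return groups

def group_leader(graph, members):
    """Hypothetical leader rule: most ties wins (degree centrality); ties broken by name."""
    return max(members, key=lambda m: (len(graph[m]), m))

# Hypothetical "visited together" links between guests.
edges = [("Ana", "Ben"), ("Ana", "Cal"), ("Ana", "Dee"), ("Ben", "Cal"),
         ("Eve", "Fay"), ("Fay", "Gus")]
graph = build_graph(edges)
groups = connected_groups(graph)
leaders = [group_leader(graph, g) for g in groups]
print(groups, leaders)
```

Degree is the simplest centrality measure; betweenness or eigenvector centrality may identify influencers better in larger, denser networks.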
The Parks Ramifications
The Parks can employ graph analysis to uncover strength of relationships among groups of guests (leaders, followers, influencers, cohorts). The Parks can leverage the graph analysis results to direct promotions (discounts, restaurant vouchers, travel vouchers) to group leaders in order to encourage these leaders to bring groups back to the parks more frequently.
For more information about social network analysis in R, check out http://www.r-bloggers.com/an-example-of-social-network-analysis-with-r-using-package-igraph/.
Text Mining
Text mining refers to the process of deriving usable information (metadata) from text files such as consumer comments, e-mail conversations, physician or technician notes, work orders, etc. Basically, text mining creates structured data out of unstructured data.
Text mining is a very powerful technique to show during an envisioning process, as many business stakeholders have struggled to understand how they can gain insights from the wealth of internal customer, product, and operational data. Text mining is not something that the data warehouse can do, so many business stakeholders have stopped thinking about how they can derive actionable insights from text data. Consequently, it is important to leverage envisioning exercises to help the business stakeholders imagine the realm of what is possible with text data, especially when that text data is combined with the organization's operational and transactional data.
For example, in Figure 6.14, the text mining tool has mined a history of news feeds about a particular product to uncover patterns and combinations of words that may indicate product performance and maintenance problems.
Figure 6.14 Text mining analysis
Typical text mining techniques and algorithms include text categorization, text clustering, concept/entity extraction, production of taxonomies, sentiment analysis, document summarization, and entity relation modeling.
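The core move of text mining, turning unstructured text into structured data, can be sketched in a few lines of Python: tokenize each comment, drop stop words, and store term counts per document, which can then be aggregated or used to flag comments that mention concern terms. The comments, stop-word list, and concern terms are hypothetical, and production text mining adds steps such as stemming and entity extraction.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real projects use a much fuller list.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "to", "of", "for",
              "at", "on", "in", "too", "way", "so", "very"}

def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def document_term_rows(docs):
    """Structured output: one term-count row per free-text document."""
    return [Counter(tokenize(doc)) for doc in docs]

def flag_documents(docs, concern_terms):
    """Indexes of documents that mention any concern term."""
    return [i for i, row in enumerate(document_term_rows(docs))
            if any(term in row for term in concern_terms)]

# Hypothetical guest comments.
comments = [
    "The wait for Monster Mansion was way too long",
    "Long line at Monster Mansion and the kiosk blocked the path",
    "Loved Space Adventure, short wait",
    "Kiosk near Ghoulish Gulch blocked traffic",
]
rows = document_term_rows(comments)
totals = sum(rows, Counter())          # aggregate term counts across all comments
print(totals.most_common(3))
print(flag_documents(comments, {"blocked", "long"}))
```

Once comments are rows of term counts, they can be joined with operational data (wait times, locations) just like any other structured table.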
TheParksRamifications
TheParkscanmineguestcomments,socialmediaposts,ande-mailstoflagandrankareasofconcernandproblemsituations.TheParkscanleveragethetextminingresultstolocateunsatisfiedguestsinordertodrivepersonal(face-to-face)guestinterventionefforts.
FormoreinformationabouttextmininganalysisusingR,checkouthttp://www.r-bloggers.com/text-mining-in-r-automatic-categorization-of-
wikipedia-articles/.
SentimentAnalysisSentimentanalysiscanprovideabroadandgeneraloverviewofyourcustomers'sentimenttowardyourcompanyandbrands.Sentimentanalysiscanbeapowerfulwaytogleaninsightsaboutthecustomers'feelingsaboutyourcompany,products,andservicesoutoftheever-growingbodyofsocialmediasites(Facebook,LinkedIn,Twitter,Instagram,Yelp,Snapchat,Vine,etc.)(seeFigure6.15).
Figure 6.15 Sentiment analysis
In Figure 6.15, the data science team conducted competitive sentiment analysis by classifying the emotions (e.g., anger, disgust, fear, joy, sadness, surprise) of Twitter tweets about our client and its key competitors. Sentiment analysis can provide an early warning alert about potential customer or competitive problems (e.g., where your organization's performance and quality of service are considered lacking compared to key competitors) and business opportunities (e.g., where a key competitor's perceived performance and quality of service are suffering).
Unfortunately, it is sometimes difficult to get the social media data at the level of the individual, which is required to create more actionable insights and recommendations at the individual customer level. However, leading organizations are trying to incent their customers to “like” their social media sites or share their social media names in order to improve the collection of customer-identifiable data.
The Parks Ramifications
The Parks can establish a sentiment score for each attraction and character and monitor social media sentiment for the attractions and characters in real time. The Parks can leverage the real-time sentiment scores to take corrective actions (placate unhappy guests, open additional lines, open additional attractions, remove kiosks, move characters).
For more information about sentiment analysis using R, check out https://sites.google.com/site/miningtwitter/questions/sentiment/sentiment.
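A per-attraction sentiment score can be sketched in a few lines. Note that this swaps in the simplest approach, lexicon-based polarity (each positive word counts +1, each negative word -1), rather than the emotion classification shown in Figure 6.15. The `LEXICON`, the posts, and the below-zero alert threshold are all hypothetical; production systems typically rely on curated lexicons or trained models.

```python
import re
from collections import defaultdict

# Hypothetical polarity lexicon: +1 for positive words, -1 for negative words
LEXICON = {"love": 1, "fun": 1, "great": 1, "awesome": 1,
           "hate": -1, "boring": -1, "long": -1, "crowded": -1}

# Hypothetical social media posts, already tagged with the attraction they mention
posts = [
    ("Space Adventure", "Space Adventure was awesome, love it"),
    ("Space Adventure", "Line was long, crowded and boring"),
    ("Monster Mansion", "Monster Mansion is great fun"),
]


def post_score(text):
    """Sum the lexicon polarity of every word in the post (unknown words score 0)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def attraction_sentiment(posts):
    """Average post score per attraction: a simple sentiment score to monitor."""
    totals, counts = defaultdict(int), defaultdict(int)
    for attraction, text in posts:
        totals[attraction] += post_score(text)
        counts[attraction] += 1
    return {name: totals[name] / counts[name] for name in totals}


def needs_attention(scores, threshold=0.0):
    """Attractions whose average sentiment falls below the (hypothetical) threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


scores = attraction_sentiment(posts)
print(scores)                   # {'Space Adventure': -0.5, 'Monster Mansion': 2.0}
print(needs_attention(scores))  # ['Space Adventure']
```

Recomputing the scores as new posts stream in is what would turn this into the real-time monitoring described for The Parks.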
Traverse Pattern Analysis
Traverse pattern analysis is an example of combining a couple of analytic algorithms to better understand customer, product, or operational usage patterns. Traverse analysis links customer or product usage patterns and association rules to a geographical or facility map in order to identify potential purchase, traffic flow, fraud, theft, and other usage patterns and relationships.
The process starts by creating association rules from the customer's or product's usage data, and then maps those association rules to a geographical map (store, hospital, school, campus, sports arena, casino, airport) to identify potential performance, usage, staffing, inventory, logistics, traffic flow, etc. problems.
In Figure 6.16, the data science team created a series of association rules about slot and table play in a casino, and then used those association rules to identify potential foot flow problems and game location optimization opportunities. The data science team
Created player performance association rules about what games the players tend to play in combination,
Linked the game playing association rules to location ID, and then
Mapped rules and game performance data to a layout of the casino.
Figure 6.16 Traverse pattern analysis
The results of this analysis highlight areas of the casino that are sub-optimized when certain types of game players are in the casino and can lead to recommendations about the layout of the casino and the types of incentives to give players to change their game playing behaviors and tendencies.
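The three casino steps above can be sketched in Python under stated assumptions: rules are limited to one game implying another (A leads to B), scored with the standard association-rule measures of support and confidence, and then joined to a hypothetical floor plan to flag games that players play together but that sit far apart. The sessions, `layout` coordinates, and all thresholds are invented for illustration.

```python
from itertools import permutations
from math import dist

# Hypothetical play sessions: the set of games each player played during one visit
sessions = [
    {"blackjack", "slots_a"},
    {"blackjack", "slots_a", "roulette"},
    {"blackjack", "slots_a"},
    {"roulette", "craps"},
]

# Hypothetical floor-plan coordinates (in meters) for each game's location ID
layout = {"blackjack": (0, 0), "slots_a": (40, 30), "roulette": (5, 5), "craps": (8, 2)}


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Step 1: mine one-to-one association rules (A -> B) from the play sessions."""
    n = len(sessions)
    item_count = {}
    for session in sessions:
        for game in session:
            item_count[game] = item_count.get(game, 0) + 1
    rules = []
    for a, b in permutations(sorted(item_count), 2):
        both = sum(1 for s in sessions if a in s and b in s)
        support = both / n                  # share of sessions containing A and B
        confidence = both / item_count[a]   # share of A-sessions that also contain B
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def long_walks(rules, layout, max_distance=20.0):
    """Steps 2 and 3: link rules to locations, flag games played together but far apart."""
    flagged = []
    for a, b, _, _ in rules:
        d = dist(layout[a], layout[b])
        if d > max_distance:
            flagged.append((a, b, round(d, 1)))
    return flagged


rules = pair_rules(sessions)
print(rules)                      # [('blackjack', 'slots_a', 0.75, 1.0), ('slots_a', 'blackjack', 0.75, 1.0)]
print(long_walks(rules, layout))  # [('blackjack', 'slots_a', 50.0), ('slots_a', 'blackjack', 50.0)]
```

A flagged pair like this is a candidate for a layout change (moving the games closer) or for incentives that shift where players go next.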
The Parks Ramifications
The Parks can employ traverse pattern analysis to understand park and guest flows with respect to attractions, entertainment, retail outlets, restaurants, characters, etc. The Parks can use the traverse pattern analysis results to
1. Identify where to place characters and situate portable kiosks in order to increase revenues,
2. Determine what promotions to offer in order to drive traffic to idle attractions and restaurants.
Decision Tree Classifier Analysis
Decision tree classifier analysis uses decision trees to identify groupings and clusters buried in the usage and performance data. Decision tree classifier analysis uses a decision tree as a predictive model that maps observations about an item to conclusions about the item's target value.
In Figure 6.17, the data science team used the decision tree classifier analysis technique to identify and group performance and usage variables into similar clusters. The data science team uncovered product performance clusters that, when occurring in certain combinations, were indicative of potential product performance or maintenance problems.
Figure 6.17 Decision tree classifier analysis
The Parks Ramifications
The Parks can use decision tree classifier analysis to quantify the variables that drive guest satisfaction and increase spend by guest clusters. The Parks can leverage the decision tree classifier analysis results to determine which variables to manipulate in order to drive guest satisfaction and associated guest spend.
For more information about building decision trees using R, check out “Tree-Based Models” at http://www.statmethods.net/advstats/cart.html.
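The core step of building a decision tree, choosing the variable and cut point that best separate satisfied from unsatisfied guests, can be sketched as a one-level tree (a "decision stump"). This sketch assumes Gini impurity as the split criterion, which is the criterion CART uses; a full tree would repeat the same search recursively on each side of the split. The guest records and feature names are hypothetical.

```python
# Hypothetical guest records: (wait_minutes, attractions_visited, satisfied 1/0)
guests = [
    (10, 5, 1), (15, 6, 1), (20, 4, 1), (45, 5, 0),
    (60, 6, 0), (50, 3, 0), (12, 2, 1), (55, 7, 0),
]
FEATURES = ["wait_minutes", "attractions_visited"]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p ** 2 - (1 - p) ** 2


def best_split(rows):
    """Find the (feature, threshold) whose split has the lowest weighted Gini impurity."""
    n = len(rows)
    best = None
    for f, name in enumerate(FEATURES):
        for threshold in sorted({row[f] for row in rows}):
            left = [row[-1] for row in rows if row[f] <= threshold]
            right = [row[-1] for row in rows if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best


print(best_split(guests))  # ('wait_minutes', 20, 0.0)
```

In this toy data, a wait of 20 minutes or less cleanly separates satisfied guests, which is the kind of "variable to manipulate" The Parks would look for.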
Cohorts Analysis
Cohorts analysis is used to identify and quantify the impact that individuals or machines have on the larger group.
Cohorts analysis is commonly used by sports teams to ascertain the relative value of a player with respect to his or her influence on the success of the overall team. In the National Basketball Association (NBA), the real plus-minus (RPM) metric measures a player's impact on the game, represented by the difference between the team's total scoring and its opponent's. Table 6.1 shows the top RPM players from the 2014–2015 NBA season.
Table 6.1 2014–2015 Top NBA RPM Rankings
Rank	Player	Team	MPG (minutes per game)	RPM
1	Stephen Curry, PG	GS	32.7	9.34
2	LeBron James, SF	CLE	36.1	8.78
3	James Harden, SG	HOU	36.8	8.50
4	Anthony Davis, PF	NO	36.1	8.18
5	Kawhi Leonard, SF	SA	31.8	7.57
6	Russell Westbrook, PG	OKC	34.4	7.08
7	Chris Paul, PG	LAC	34.8	6.92
8	Draymond Green, SF	GS	31.5	6.80
9	DeMarcus Cousins, C	SAC	34.1	6.12
10	Khris Middleton, SG	MIL	30.1	6.06
Source: http://espn.go.com/nba/statistics/rpm/_/sort/RPM
This powerful technique (with slight variations due to the different nature of the variables and relationships) can be used to quantify the impact that a particular individual (student, teacher, player, nurse, athlete, technician) has on the larger group (see Figure 6.18).
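The basic idea behind plus-minus style metrics can be sketched as a raw "on/off" differential: the team's net points per 100 possessions while an individual is on the floor, minus the same figure while that individual is off. This is a simplified stand-in, not the RPM calculation itself, which involves additional statistical adjustments that this sketch does not attempt. The stints and player labels are hypothetical, and the same structure would apply to nurses on shifts or technicians on crews.

```python
# Hypothetical stints: (players on the floor, team points, opponent points, possessions)
stints = [
    ({"A", "B"}, 30, 20, 25),
    ({"A", "C"}, 22, 18, 25),
    ({"B", "C"}, 15, 25, 25),
    ({"B", "D"}, 20, 22, 25),
]


def net_per_100(rows):
    """Team point differential per 100 possessions over the given stints."""
    points = sum(r[1] for r in rows)
    allowed = sum(r[2] for r in rows)
    possessions = sum(r[3] for r in rows)
    return 100 * (points - allowed) / possessions


def on_off(player, stints):
    """Raw on/off plus-minus: net rating with the player on minus with the player off.

    Assumes every player is both on and off the floor for at least one stint.
    """
    on = [s for s in stints if player in s[0]]
    off = [s for s in stints if player not in s[0]]
    return net_per_100(on) - net_per_100(off)


def rank_players(stints):
    """All players who appear in any stint, ordered from highest to lowest on/off value."""
    players = sorted(set().union(*(s[0] for s in stints)))
    return sorted(players, key=lambda p: on_off(p, stints), reverse=True)


print(on_off("A", stints))   # 52.0
print(rank_players(stints))  # ['A', 'D', 'B', 'C']
```

Raw on/off numbers are noisy because they ignore who else shares the floor; adjusting for teammates and opponents is what metrics such as RPM add on top.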
The Parks Ramifications
The Parks can employ cohorts analysis to identify specific employees and characters that increase the overall park, attractions, characters, customer, and household satisfaction and spend levels. The Parks can use the results of the cohorts analysis to
1. Decide how many and where to situate specific, popular characters;
2. Reward park associates that drive higher customer satisfaction scores.
For more information about cohorts analysis in R, check out the article “Cohort Analysis with R – Retention Charts” at http://analyzecore.com/2014/07/03/cohort-analysis-in-r-retention-charts/.
Summary
The objective of Chapter 6 is to give you a taste of the different types of analytic algorithms a data science team can bring to bear on the business problems or opportunities that the organization is trying to address. This chapter acquainted you with the different algorithms that the data science team can use to accelerate the collaboration process between business users and the data science team. While it is not the expectation of this book or chapter to turn business users into data scientists, it is my hope that Chapter 6 will set the foundation that helps business users and business leaders to “think like a data scientist.”
This chapter introduced a wide variety of analytic algorithms that the data science team might use, depending on the problem being addressed and the types and varieties of data available. It also introduced a fictitious company (Fairy-Tale Theme Parks) against which you applied the different analytic techniques to see the potential business actions (see Table 6.2).
Table 6.2 Case Study Summary
Analytics	Fairy-Tale Parks Use Cases	Potential Business Actions
Trend analysis	Perform trend analysis to identify the variables (e.g., wait times, social media posts, consumer comments) that are highly correlated to the increase or decrease in guest satisfaction for each attraction, restaurant, retail outlet, and entertainment	Flag problem areas and take immediate corrective actions (e.g., open more lines, promote less busy attractions, move kiosks, resituate characters); identify the location for future attractions, restaurants, and entertainment
Box plots	Leverage box plots to determine the most loyal guests for each of the park's attractions (e.g., Canyon Copter Ride, Monster Mansion, Space Adventure, Ghoulish Gulch)	Create current and maximum guest loyalty scores and use those scores to prioritize whom to reward with Priority Access passes and other coupons and discounts
Geography (spatial) analysis	Conduct geographical trend analysis to spot any changes (zip+4 and household levels) in the geo-demographics of visitors over time and by seasonality and holidays	Create geo-specific marketing campaigns and promotions to increase attendance from under-penetrated geographical areas based on day of week, holidays, seasonality, and events (on-park and off-park events)
Pairs plot	Compare multiple variables to identify those variables that drive guests to which attractions, entertainment, and restaurants	Make in-park promotional decisions and offers that move guests to under-utilized attractions, entertainment, retail outlets, and restaurants
Time series decomposition	Leverage time series decomposition analysis to quantify the impact that seasonality and events (in-park and off-park) have on guest visits and associated spend	Create season-specific marketing campaigns and promotions to increase the number of guest visits and associated spend; determine which local events outside of the parks (concerts, professional sporting events, BCS football games) are worthy of promotional and sponsorship spend
Cluster analysis	Cluster guests to create more actionable profiles of The Parks' most profitable and highest potential guest clusters	Leverage cluster results to prioritize and focus guest acquisition, guest activation, cross-sell, and up-sell marketing efforts
Normal curve equivalent (NCE) analysis	Leverage NCE analysis to understand the price inflection points for different packages of attractions and restaurants	Leverage the price inflection points to create packages of attractions and restaurants to optimize pricing (by season, day of week, etc.) and create new Priority Access packages
Association analysis	Leverage market basket analysis to identify the most popular and least popular “baskets” of attractions	Leverage the most common “baskets” to create new pricing and Priority Access packages in order to better control traffic and wait times; leverage the least common “baskets” to create new pricing and Priority Access packages in order to drive traffic to under-utilized attractions
Graph analysis	Leverage graph analysis to uncover the direction and strength of relationships among groups of guests (leaders, followers, influencers, cohorts)	Send promotions (discounts, restaurant vouchers, travel vouchers) to group leaders in order to encourage leaders to bring their groups back to the parks more often
Text mining	Mine guest comments, social media posts, and e-mail threads to flag areas of concern and problem situations	Identify and locate unsatisfied guests in order to prioritize and focus personal (face-to-face) guest intervention efforts
Sentiment analysis	Establish a sentiment score for each attraction and character and monitor social media sentiment for the attractions and characters in real time	Leverage real-time sentiment scores to take corrective actions (placate unhappy guests, open additional lines, open additional attractions, remove kiosks, move characters)
Traverse pattern analysis	Leverage traverse pattern analysis to understand park and guest flows with respect to attractions, entertainment, restaurants, retail outlets, characters, etc.	Identify where to place characters and situate portable kiosks in order to drive increased revenue; determine what promotions to offer in order to drive traffic to under-utilized attractions and restaurants
Decision tree classifier analysis	Use decision tree classifier analysis to quantify the variables that drive guest satisfaction	Leverage decision tree classifier analysis to determine which variables to manipulate in order to drive guest satisfaction and increase associated guest spend
Cohorts analysis	Identify specific employees and characters that tend to increase the overall park, attractions, characters, guest satisfaction, and spend levels	Leverage cohorts analysis to decide how many characters to situate and where; identify and reward park employees that drive higher guest satisfaction scores
I strongly recommend that you stay current with the different analytic techniques that your data science team is using. Take the time to better understand when to use which analytic techniques. Buy your data science team lots of Starbucks, Chipotle, and whiskey, and your team will continue to open your eyes to the business potential of data science.
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Review each of the analytic algorithms covered in this chapter and write down one or two use cases where that particular analytic algorithm might be useful given your business situations.
Exercise #2: Revisit the key business initiative that you identified in Chapter 2. Write down two or three of the analytic algorithms covered in this chapter that you think might be appropriate to the decisions that you are trying to make in support of that key business initiative.
Exercise #3: Write down two or three bullet points about why you think those selected analytic algorithms might be most appropriate for your targeted business initiative.
Notes
1 R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. R's popularity has increased substantially in recent years. (Source: Wikipedia)
Chapter 7: The Data Lake
There is a major industry change happening with respect to how organizations store, manage, and analyze data. Not since the introduction of the data warehouse in the late 1980s have we seen something with the potential to transform how organizations leverage data and analytics to power their key business initiatives and rewire their value creation processes. This new data and analytics architecture is called the data lake, and it has the potential to be even more impactful than the data warehouse in transforming the way organizations integrate data and analytics into their business models. But as in all things related to big data, organizations must “think differently” with respect to how they design, deploy, and manage their data architecture.
Today's data warehouses are extremely expensive. As a result, most organizations limit how much data they store in their data warehouse, opting for 13 to 25 months of summarized data versus 15 to 25 years of detailed transactional and operational data. Unfortunately for data warehouses, it is in that detailed transactional, operational, sensor, wearable, and social data, and in the growing body of internal and external unstructured data, that actionable insights about your customers, products, campaigns, partners, and operations can be found.
For example, over the past 15 years, the US economy has gone through two full economic cycles where the economy was flying high, collapsed, and then recovered. By looking at each of their customers' product purchase patterns over those two economic cycles, organizations can predict how a customer is personally impacted by an economic downturn. A grocery chain, for instance, could monitor individual customers' shopping baskets and purchase patterns to uncover changes that may indicate changes in that customer's personal economic situation. From the detailed point-of-sale transactions, the grocer could see a change in an individual's purchase behaviors (e.g., moving from expensive to lower-cost products, using more coupons and discounts, increasing purchases of private label products, and so on), which might indicate a change in his or her financial situation. These insights could provide new monetization opportunities and better ways to serve that customer given his or her financial situation, such as a personalized promotion highlighting more economical items for that particular customer.
Introduction to the Data Lake
The data lake was born out of the “economics of big data” that allows organizations to store, manage, and analyze massive amounts of data at a cost that can be 20 to 50 times cheaper than traditional data warehouse technologies. Because of the agile underlying Hadoop/HDFS architecture that typically supports the data lake, organizations can store structured data (relational tables, csv files), semi-structured data (web logs, sensor logs, beacon feeds), and unstructured data (text files, social media posts, photos, images, video) as-is, without the time-consuming and agility-limiting need to predefine a data schema prior to data load.
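The difference between predefining a schema before load and storing data as-is is often described as "schema-on-write" versus "schema-on-read." Here is a minimal Python sketch of the schema-on-read side, assuming JSON event records: a plain list stands in for files on HDFS, records are appended untouched, and each analysis applies only the fields it needs at read time (engines such as Spark or Hive work in a similar spirit on real lakes). The records and the `ingest`/`read_with_schema` helpers are hypothetical.

```python
import json

# Raw, heterogeneous event records, landed as-is (no schema is declared at load time)
raw_lines = [
    '{"guest_id": 1, "event": "ride", "attraction": "Space Adventure"}',
    '{"guest_id": 2, "event": "purchase", "amount": 12.5}',
    '{"guest_id": 1, "event": "purchase", "amount": 7.25, "store": "Gift Shop"}',
]


def ingest(lines, lake):
    """Load as-is: append the raw text without parsing, validating, or transforming it."""
    lake.extend(lines)


def read_with_schema(lake, fields):
    """Schema-on-read: parse each record and project only the fields this analysis needs."""
    for line in lake:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}  # missing fields become None


lake = []  # stands in for a directory of files on HDFS
ingest(raw_lines, lake)

purchases = [r for r in read_with_schema(lake, ["guest_id", "event", "amount"])
             if r["event"] == "purchase"]
print(sum(r["amount"] for r in purchases))  # 19.75
```

Note that the third record's extra `store` field neither breaks the load nor the query; a later analysis can simply ask for it.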
However, the real power of the data lake is to enable advanced analytics or data science on the detailed and complete history of data in an attempt to uncover new variables and metrics that are better predictors of business performance. The data lake can do the following:
Eliminate data silos. Rather than having dozens of independently managed collections of data (e.g., data warehouses, data marts, spreadmarts), you can combine these sources into a single data lake for indexing, cataloging, and analytics. Consolidation of the data into a single data repository results in increased data use and sharing, while cutting costs through server and license reduction.
Store, manage, protect, and analyze data by consolidating inefficient storage silos across the organization.
Provide a simple, scalable, flexible, and efficient solution that works across block, file, or object workloads (i.e., a shared storage platform that natively supports both traditional and next-generation workloads).
Reduce the costs of IT infrastructure.
Speed up time to insights.
Improve operational flexibility.
Enable robust data protection and security capabilities.
Reduce data warehouse workloads by off-loading the analytics-intensive queries that would be best done in a special-purpose analytics sandbox environment.
Free up data warehousing resources by off-loading Extract, Transform, and Load (ETL) processes from the data warehouse to the more cost-efficient, more powerful Hadoop-based data lake.
NOTE
It is typical for 40 to 60 percent of the data warehouse processing load to be consumed by ETL work. Off-loading some of the ETL processes to the data lake can free up considerable data warehouse resources.
Unhandcuff the BI analysts and data science team from being reliant on the summarized and aggregated data in the data warehouse as the single source of data for their data analytics (and mitigate the unmanageable proliferation of “spreadmarts”1 that are being used by business analysts to work around the analytic limitations of the data warehouse).
The data lake solves a great many problems. However, it can also raise a lot of questions. In a paper titled “Beware the Data Lake Fallacy” (http://www.gartner.com/newsroom/id/2809117), Gartner raised cautions about the data lake, specifically around the assumption that all enterprise audiences are highly skilled at data manipulation and analysis. Gartner's point was that if a data lake focuses only on storing disparate data and ignores how or why data is used, governed, defined, and secured, or how descriptive metadata is captured and maintained, the data lake risks turning into a data swamp. And without an adequate metadata strategy, every subsequent use of data means the analysts must start from scratch.
The ability of an organization to realize business value from big data relies on the organization's ability to easily and quickly:
Identify the “right and/or best data”
Define the analytics required to extract the value
Bring the data into an analytics environment (sandbox) suited for advanced analytics or data science work
Curate the data to a point where it is “suited” for analysis
Stand up the required infrastructure to support the analytics in accordance with the desired performance and throughput requirements
Execute the analytic models against the curated data to derive business value
Deploy the analytics into the production infrastructure
Deliver the analytic results in an actionable manner to the business
NOTE
Stating that the data lake is the “single repository for ALL your organization's data” does not mean that there are no other repositories of data across your organization. Your operational systems (such as SAP, Oracle Financials, PeopleSoft, and Siebel) will continue to store data for their own operational reporting needs, but the data from those data sources should eventually be loaded into the data lake. And your data warehouses, data marts, and Online Analytic Processing (OLAP) cubes will continue to store data for their own unique reporting and analysis needs, but the data for those repositories should be sourced from your data lake. In the end, the data lake is the “central” repository for ALL your organization's data.
Characteristics of a Business-Ready Data Lake
The data lake is not an incremental enhancement to the data warehouse, and it is NOT data warehouse 2.0. The data lake enables entirely new capabilities that allow your organization to address data and analytic challenges that the data warehouse could not address.
There are five characteristics that differentiate a business-ready data lake from the data warehouse (see Figure 7.1):
Figure 7.1 Characteristics of a data lake
Ingest. Ability to rapidly ingest data from a wide range of internal and external data sources, including structured and unstructured data sources. The data lake can accomplish rapid data ingestion because it can load the data as-is; that is, the data lake does not require any data transformations or a prebuilt data schema before loading the data.
Store. A single or central repository for amassing ALL the organization's data, including data from potentially interesting external sources. The data lake can store data even if the organization has not yet decided how it might use the data. As the Director of Analytics and Business Intelligence at Starbucks was quoted: “A full quarter of Starbucks transactions are made via its popular loyalty cards, and that results in ‘huge amounts’ of data, but the company isn't sure what to do with [all that data] yet.” The same goes for social media data, as Starbucks has a team that analyzes social data, but “We haven't figured out what exactly to do with it yet.”2
Analyze. Provides the foundation for the analytics environment (or analytics sandbox) where the data science team is free to explore and evaluate different internal and external data sources with the goal of uncovering new customer, product, and operational insights that can be used to optimize key business processes and fuel new monetization opportunities.
Surface. Supports analytic model development and the extraction of the analytic results (e.g., scores, recommendations, next best offer, business rules) that are used to empower frontline employees' and business managers' decision making and influence customer behaviors and actions.
Act. Enables the integration of the analytic results back into the organization's operational systems (call center, direct marketing, procurement, store operations, logistics) and management applications (reports, dashboards), which “closes the loop” with respect to optimizing data and analytics-based decision making across the organization.
Using the Data Lake to Cross the Analytics Chasm
I come from the data warehouse world, having gotten started in 1984 with Metaphor Computers. In fact, that was so long ago that we didn't even call it data warehousing, but instead called it “Decision Support” (which I'd argue is still a better name for what we are trying to do).
For most organizations, the data warehouses, and the Business Intelligence (BI) tools that run on top of the data warehouses, operate like a production environment with the following characteristics:
Ability to support the creation and delivery of operational and management reports and dashboards on a regularly scheduled basis (e.g., reports delivered end of day, end of week, end of quarter; dashboards updated every morning)
Predictable compute and processing load to run the ETL routines, generate the management and operational reports, and update the management dashboards
SLA-constrained in that there are not many extra processing cycles to get the ETL and report and dashboard generating jobs done within the 24-hour daily window
Heavily governed (data governance, auditability, traceability, data lineage, metadata management) to ensure that the historical data being reported is 100 percent accurate
Standardization of tools in order to better control BI and ETL tool acquisition, maintenance, training, and support costs
On the other hand, the analytics environment is dramatically different from the BI and data warehouse environment in its objectives, purpose, and operating characteristics (i.e., how it is used). The analytics environment is characterized as:
An exploratory environment where the data analysts want to quickly ingest and analyze lots of data
Unpredictable system load that is highly dependent on the analysts' daily work objectives, exploration needs, and ad hoc analytical requests
Heavily experimentation-oriented to give the data analysts the freedom to test new data sources, new algorithms, new data enrichment techniques, and new tools
Loosely governed in that the data need not be managed under some governance umbrella until the data analysts first prove that there is some value in the data
“Best tool for the job,” with the data analysts using whatever data visualization, data exploration, and analytic modeling tools they feel most comfortable with
As a data warehouse manager, I hated the analytics team. Why? Because whenever its members needed data, they always came to my data warehouse for the data because they were told that the data warehouse was the “single version of the truth.” And the analytic team's data and query requests usually screwed up my production SLAs in the process (see Figure 7.2).
Figure 7.2 The analytics dilemma
The solution: put a Hadoop-based data store (data lake) in front of both the data warehouse and the analytics environments (see Figure 7.3).
Figure 7.3 The data lake line of demarcation
The data lake provides a “line of demarcation” between the production requirements of the data warehouse and the ad hoc, exploratory nature of the analytics environment. In addition, the data lake provides other benefits that we will cover later in this chapter.
Modernize Your Data and Analytics Environment
There are several actions that organizations can take today to exploit the value of the data lake to modernize their existing data warehouse and analytics environments.
Action #1: Create a Hadoop-Based Data Lake
The Hadoop Distributed File System (HDFS) provides a powerful yet inexpensive foundation for your data lake. HDFS is a cost-effective, large-scale storage system, and the Hadoop components that run on top of it (e.g., MapReduce, YARN, Spark) add low-cost, scalable computing and analytical capabilities. Built on commodity hardware clusters, HDFS simplifies the acquisition and storage of diverse data sources (see Table 7.1).
Table 7.1 Data Lake Data Types
Data Type	Example
Structured data	Relational databases, data tables, csv files
Semi-structured data	Web logs, sensor feeds, XML, JSON
Unstructured data	Social media posts, text notes, consumer comments, images, videos, audio
Once in the Hadoop/HDFS system, MapReduce, YARN, Spark, and other Hadoop-based tools are available to prepare the data for loading into your data warehouse and analytic environments (see Figure 7.4).
Figure 7.4 Create a Hadoop-based data lake
The advantages of the data lake include:
Rapid data ingestion as-is. Organizations do not need to predefine the schema or transform the data prior to loading the data into the data lake, which simplifies and speeds the process of amassing data from a variety of internal and external data sources.
Low-cost data and analytics environment built on commodity hardware servers and open source software that can be 20 to 50 times cheaper to store, manage, and analyze data than traditional data warehouse technologies.
100 percent linear compute scalability. When you need to double compute capacity, just double the number of nodes.
Action #2: Introduce the Analytics Sandbox
A data lake strategy supports the introduction of a separate analytics environment that off-loads the analytics being done today on your overly expensive data warehouse. This separate analytics environment provides the data science team an on-demand, fail-fast environment for quickly ingesting and analyzing a wide variety of data sources in an attempt to address immediate business opportunities independent of the data warehouse's production schedule and service level agreement (SLA) rules (see Figure 7.5).
Figure 7.5 Create an analytics sandbox
The analytics environment in Figure 7.5 couldn't be more different from your data warehouse environment. Your data warehouse environment is a production environment that needs to support the regular (daily, weekly, monthly, quarterly, annual) production of operational and management reports and dashboards that are used to run the business. To do that, data warehouse environments have strict service level agreements, are heavily governed, and limit the number of BI and ETL tools in order to control tool acquisition, maintenance, and training costs.
An analytics environment, on the other hand, is much more ad hoc and on-demand driven. The analytics environment must support the continuous exploration and evaluation of new sources of internal and external data in an attempt to uncover actionable insights about customers, products, and operations. The analytics environment must support the data science team's need to test new analytic tools and algorithms and develop new data transformation and enrichment techniques in search of those variables and metrics that are better predictors of business performance.
Action #3: Off-Load ETL Processes from Data Warehouses
Doing the ETL processing within your existing data warehouse is a common practice today. However, if your data warehouse is already overloaded and it is very expensive to add more processing capacity, why do that batch-centric, data-transformation heavy lifting in the expensive data warehouse environment? That's like having a high-powered, ultra-cool Tesla haul turnips around the farm.
Free up data warehouse resources and improve your ETL process effectiveness by off-loading the ETL processes from your expensive data warehouse platform. Instead, perform the ETL work in the data lake. This allows organizations to leverage the natively parallel Hadoop environment to bring the appropriate amount of compute capacity to bear at the appropriate times to get the job done more quickly and more cost-effectively (see Figure 7.6).
Figure 7.6 Move ETL to the data lake
As we've discussed before, not only does using Hadoop for your ETL work make sense from a cost and processing effectiveness perspective, but it also gives organizations the capability to create new metrics that are often difficult to create using traditional ETL tools. For example, using Hadoop makes it much easier to create advanced customer purchase and product performance metrics around the frequency (how often), recency (how recently), and sequencing (in what order) of activities that could yield new insights that might be better predictors of customer behaviors and product performance.
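The frequency, recency, and sequencing metrics described above boil down to "group every transaction by customer, then summarize each group," which is exactly the map-then-reduce shape that Hadoop parallelizes across nodes. The sketch below shows that logic in plain single-machine Python; the point-of-sale records and the `as_of` reporting date are hypothetical, and on a real cluster the grouping would be done by a MapReduce or Spark job rather than a dictionary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical point-of-sale records: (customer_id, purchase_date, product)
transactions = [
    ("c1", date(2015, 3, 1), "coffee"),
    ("c1", date(2015, 1, 5), "coffee"),
    ("c1", date(2015, 2, 10), "pastry"),
    ("c2", date(2015, 1, 20), "tea"),
]


def customer_metrics(txns, as_of):
    """Per-customer frequency, recency, and purchase sequence as of a given date."""
    by_customer = defaultdict(list)
    for customer, day, product in txns:           # "map": key each record by customer
        by_customer[customer].append((day, product))
    metrics = {}
    for customer, events in by_customer.items():  # "reduce": summarize each customer
        events.sort()                             # chronological order
        metrics[customer] = {
            "frequency": len(events),
            "recency_days": (as_of - events[-1][0]).days,
            "sequence": [product for _, product in events],
        }
    return metrics


m = customer_metrics(transactions, as_of=date(2015, 3, 31))
print(m["c1"])  # {'frequency': 3, 'recency_days': 30, 'sequence': ['coffee', 'pastry', 'coffee']}
```

Because each customer's summary depends only on that customer's records, the work splits cleanly across nodes, which is why this kind of metric is a natural fit for the data lake rather than the data warehouse.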
Analytics Hub and Spoke Analytics Architecture
We have spent a considerable amount of this chapter describing the data lake; now let's discuss why your organization needs a data lake. The value and power of a data lake are often not fully realized until we get into our second or third analytic use case. That is because it is at that point that the organization needs the ability to self-provision an analytics environment (compute nodes, data, analytic tools, permissions, data masking) and share data across traditional line-of-business silos (one singular or centralized location for all the organization's data) in order to support the rapid exploration and discovery processes that the data science team uses to uncover variables and metrics that might be better predictors of business performance.
This is a “Hub and Spoke” analytics environment where the data lake is the “hub” that enables the data science teams to self-provision their own analytic sandboxes and facilitates the sharing of data, analytic tools, and analytic best practices across the different parts of the organization (see Figure 7.7).
Figure 7.7 Hub and Spoke analytics architecture
The hub of the “Hub and Spoke” architecture is the data lake that provides:
Centralized, singular, schema-less data store with raw (as-is) data and massaged data
Mechanism for rapid ingestion of data with appropriate latency
Ability to map data across sources and provide visibility and security to users
Catalog to find and retrieve data
Costing model for the centralized service
Ability to manage security, permissions, and data masking
Support for self-provisioning of compute nodes, data, and analytic tools without IT intervention
The spokes of the “Hub and Spoke” analytics architecture are the analytic use cases or applications that help the organization to optimize key business processes, deliver a more compelling customer experience, and uncover new monetization opportunities. The “spokes” have the following characteristics:
Ability to perform analytics (data science)
Analytic sandbox (HDFS, Hadoop, Spark, Hive, HBase)
Data engineering tools (Elasticsearch, MapReduce, YARN, Pivotal HAWQ, SQL)
Analytical tools (SAS, R, Mahout, MADlib, H2O)
Visualization tools (Tableau, DataRPM, ggplot2)
Ability to exploit analytics (application development)
Third platform applications (mobile app development, website app development)
Analytics exposed as services to applications (APIs)
Integration of in-memory and/or in-database scoring and recommendations into business processes and operational systems
The “Hub and Spoke” analytics architecture enables the data science team to develop the predictive and prescriptive analytics that are necessary to optimize key business processes, provide a differentiated customer engagement, and uncover new monetization opportunities.
Early Learnings
There is much we can learn from three decades of dealing with the limitations of the data warehouse. There are several lessons that we can take away from our data warehousing experiences and apply today to ensure that we do not make the same mistakes in deploying a data lake strategy.
Lesson #1: The Name Is Not Important
Several decades ago, a battle raged between data warehouse advocates (associated with Bill Inmon and the Corporate Information Factory) and data mart advocates (associated with Ralph Kimball and star schemas) regarding nomenclature and terminology. Countless years were wasted at trade shows, at seminars, and in conference rooms across the world debating which approach was the “right” approach. As a reminder:
Data warehouse or enterprise data warehouse (EDW) is a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. The enterprise data warehouse approach is often characterized as a top-down approach, more in alignment with the Online Transaction Processing (OLTP) systems from which the data was sourced. The data warehouse typically has an enterprise-wide perspective.
Data mart is typically oriented to a specific business function, department, or line of business. This enables each department or line of business to use, manipulate, and develop the data any way it sees fit, without altering information inside other data marts or the enterprise data warehouse. Data marts use the concept of “conformed dimensions” to integrate data across business functions, replicating in many ways the same data that is captured in the enterprise data warehouse.
Interestingly, history has shown that both approaches worked! There were certainly terminology, architectural, and deployment differences between the two approaches, but the bottom line is that they both required the same key capabilities, such as:
Capture of large amounts of historical data that could be used to analyze the performance of the key business entities (dimensions) and identify trends and patterns in the data
Data governance procedures and policies to ensure that the data stored in the data warehouse and data marts was 100 percent accurate
Master data management to ensure common definitions, terminology, and nomenclature across the lines of business
Ability to join or integrate data from different data sources coming from different business functions
End-user query construction (using SQL and BI tools) that supported the generation of daily, weekly, monthly, and quarterly reports and dashboards and also supported the ad hoc slicing and dicing of the data (drill up, drill down, and drill across different data sources) to identify areas of over- and under-performance
The debate about whether it is a data lake or a data reservoir or an operational data store is neither useful nor constructive. Let's just pick a name and make it work: data lake it is!
Lesson #2: It's Data Lake, Not Data Lakes
Having multiple data lakes replicates the same problems that were created with multiple data warehouses: disparate data silos and data fiefdoms that don't facilitate sharing of the corporate data assets across the organization. Organizations need to have a single data lake from which they can source the data for their BI/data warehousing and analytic needs. The data lake may never become the “single version of the truth” for the organization, but then again, neither will the data warehouse. Instead, the data lake becomes the “single or central repository for all the organization's data” from which all the organization's reporting and analytic needs are sourced.
Unfortunately, some organizations are replicating the bad data warehouse practice by creating special-purpose data lakes (data lakes built to address a specific business need). Resist that urge! Instead, source the data that is needed for that specific business need into an “analytic sandbox” where the data scientists and the business users can collaborate to find those data variables and analytic models that are better predictors of business performance. Within the “analytic sandbox,” the organization can bring together (ingest and integrate) the data that it wants to test, build the analytic models, test the models' goodness of fit, acquire new data, refine the analytic models, and retest the goodness of fit. Yep, the analytic sandbox perfectly supports the data science engagement process covered in Chapter 5, “Differences Between Business Intelligence and Data Science” (see Figure 7.8).
Figure 7.8 Data science engagement process
If an organization tries to maintain multiple data lakes, it risks the same results and management distrust that still exist today with many data warehouse implementations: executives arguing about which numbers are correct because the data in the reports and dashboards is being sourced from different data warehouses and data marts. Let's nip this problem in the bud now! It's a single data lake.
Lesson #3: Data Governance Is a Life Cycle, Not a Project
I love (hate) the industry pundits who quickly jump on the “What about data governance?” issue when we talk about big data and the data lake. Well, what about it? Of course it is important, and of course smart organizations never forgot about it. As the volume of data in the data lake grows, governance becomes even more critical for answering the questions of what data exists, where it resides, and who has access to it.
However, the data governance discussion takes on a new wrinkle when you contemplate data in the data warehouse versus data in the data lake. While the data warehouse strives for 100 percent governance of its data, organizations are going to realize that there need to be different “degrees” or levels of data governance in the data lake depending on how the data is being used, such as:
Highly Governed Data. Data that will be sourced out of the data lake into the data warehouse needs to be highly governed. This includes operational and performance data, as well as data such as medical and financial records, personally identifiable information, credit card info, account numbers, passwords, etc. Since this is the data that appears in management, compliance, and regulatory reporting, it needs to be 100 percent governed.
Moderately Governed Data. Data that is going to be used by the data science team to create predictive and prescriptive models in an attempt to predict performance needs to be moderately governed. The level of data governance will ultimately be determined by the cost of the analytic models being wrong (think Type I “false positive” and Type II “false negative” modeling errors and the potential business impacts of those types of errors).
Ungoverned Data. Data that is just being held in the data lake and to which no value has yet been attributed would be ungoverned. The data science team is free to acquire and experiment with this ungoverned data. However, once some level of value has been established for the data (e.g., the data is used to power a financial client “Retirement Readiness” score), the data needs to move into the moderately governed data classification.
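The three levels above amount to a classification rule keyed on how a data source is used. The following Python sketch encodes that rule as one possible illustration; the `DataSource` fields and the `governance_level` function are hypothetical names, not part of any product, and a real classification would also weigh the cost of Type I and Type II model errors.

```python
from dataclasses import dataclass
from enum import Enum


class Governance(Enum):
    HIGH = "highly governed"
    MODERATE = "moderately governed"
    UNGOVERNED = "ungoverned"


@dataclass
class DataSource:
    # Hypothetical attributes describing how a data lake source is used.
    name: str
    feeds_warehouse: bool     # sourced into the data warehouse / regulatory reporting
    sensitive: bool           # medical, financial, PII, card or account numbers, passwords
    used_in_models: bool      # powers predictive or prescriptive models
    value_established: bool   # some business value has been attributed to it


def governance_level(src: DataSource) -> Governance:
    """Assign a governance tier following the three-level scheme described above."""
    if src.feeds_warehouse or src.sensitive:
        return Governance.HIGH
    if src.used_in_models or src.value_established:
        return Governance.MODERATE
    return Governance.UNGOVERNED


# A raw feed nobody has used yet versus the same feed once it powers a score.
print(governance_level(DataSource("clickstream", False, False, False, False)).value)
print(governance_level(DataSource("clickstream", False, False, True, True)).value)
```

The ordering of the checks matters: a source that feeds the warehouse stays highly governed even if it is also used in models.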
In the big data world, the goal for the smart organization should be “just-enough data governance.” Why waste management cycles governing data when that data might not even be useful to the organization? But once the value of that data has been ascertained (that is, how the data is going to be used to optimize key business processes and uncover new monetization opportunities), then the appropriate level of data governance (highly governed, moderately governed, ungoverned) can be applied to that specific data source.
Lesson #4: Data Lake Sits Before Your Data Warehouse, Not After It
Several traditional data warehouse vendors are trying to convince their customers that the data lake should sit after the data warehouse; that is, the data lake should be populated from the data warehouse rather than the data warehouse being populated from the data lake. Sorry, but that's a self-serving proposition from vendors who are already seeing the economic impact on their revenues and profits of the data lake's power to reshape how organizations store, manage, analyze, and value data.
The problem with this “Data Warehouse First” argument is that many of the data lake benefits (rapid data ingest, capturing data as-is with no need to prebuild a data schema, support for unstructured data sources, no loss of data fidelity due to data transformations, a single repository for all the organization's internal and external data) are lost if the data first needs to go through the data warehouse.
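To make “capturing data as-is with no need to prebuild a data schema” concrete, here is a minimal schema-on-read sketch in Python. The sample records, field names, and the `read_with_schema` helper are hypothetical; the point is that raw records are stored untouched and a schema is applied only when a particular consumer reads them, so extra fields such as `channel` or `note` stay available for later uses.

```python
import json

# Raw events land in the "lake" exactly as they arrived: no upfront schema,
# no transformation (hypothetical sample records).
raw_lake = [
    '{"store": "12234", "sku": "hyperdunk", "qty": "2", "channel": "pos"}',
    '{"store": "12234", "sku": "aj-future", "qty": "1", "note": "gift"}',
    '{"store": "4410", "sku": "hyperdunk", "qty": "3"}',
]


def read_with_schema(lines, schema):
    """Apply a schema only at read time: keep the requested fields and cast their types."""
    for line in lines:
        record = json.loads(line)
        yield {field: cast(record[field]) for field, cast in schema.items() if field in record}


# One consumer's view of the same raw data; another consumer could pick other fields.
sales_schema = {"store": str, "sku": str, "qty": int}
rows = list(read_with_schema(raw_lake, sales_schema))
print(sum(r["qty"] for r in rows))  # 6
```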
I am sure that the “Data Warehouse First” message initially resonates with organizations that have spent years building out their data warehouse capabilities. But data warehouse teams are beginning to understand the benefits of loading the data first into the data lake, including freeing up data warehouse resources from doing the ETL work and supporting the advanced analytic modeling that cannot easily be done within the data warehouse.
What Does the Future Hold?
The cost, processing, and agility advantages of Hadoop/HDFS would make it appear that it is only a matter of time before Hadoop/HDFS replaces Relational Database Management Systems (RDBMS) as the data warehouse platform of choice. The Hadoop/HDFS cost, processing, and agility advantages over the expensive commercial and proprietary RDBMS products will soon become too much for organizations to ignore.
Today there is much inertia keeping organizations from moving off the RDBMS data warehouse platform. Organizations not only have invested years and even decades of effort to build their data warehouse environments on these RDBMS platforms but also have created a multitude of BI reports and dashboards on these RDBMS-based data warehouses that act as a giant anchor, dissuading organizations from contemplating a transition to a Hadoop/HDFS data warehouse platform.
But the times they are a-changin'. The development and rapid adoption of open source “SQL on Hadoop” products like Pivotal HAWQ (now part of the Open Data Platform initiative), Cloudera Impala, and Hortonworks Stinger are enabling the legions of SQL-trained developers to develop SQL-based reports and dashboards on Hadoop.
Plus the development of new software products like AtScale (which acts as a layer between Hadoop/HDFS and an organization's existing BI tools) and Xplain.io³ (which automates the rewriting of RDBMS SQL to work on Hadoop) will accelerate the inevitable transition of the data warehouse platform to Hadoop/HDFS (see Figure 7.9).
Figure 7.9 What does the future hold?
For many organizations, data warehouse decisions are fraught with personal biases. And years and decades of data warehouse and BI development, personnel training, and tool acquisition will make any transition off the RDBMS data warehouse platform to a Hadoop data warehouse platform more of a religious debate than a financial or technology decision. But soon, the economics of big data, plus the continued development of new tools to support the data warehouse on Hadoop, will be too compelling to ignore. And I want to be there when that day comes!
Summary
These are certainly marvelous times to be in the data business. The data lake leverages new big data technology innovations to enable organizations to extend and enhance their existing data warehouse and ETL investments while empowering business analysts and data scientists to explore new data sources and data enrichment techniques to tease out new actionable insights about their customers, products, and operations.
Figure 7.10 shows EMC's pre-engineered Federation Business Data Lake. It is one of the industry's most complete, well-thought-out data lake architectures, as it lays out the key components and services necessary (including data governance, cataloging, data ingest, indexing, and searching) as organizations move to an enterprise-ready data lake, or as I like to call it, data lake 2.0.
Figure 7.10 EMC Federation Business Data Lake
The industry is only at the early stages of the data lake era. There is much still to be written about how the data lake will dramatically change the ways that organizations store, manage, analyze, and value data. Heck, maybe I will need to write a third book after all. Watch this space!
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: List the benefits that the data lake could bring to your organization's existing data warehousing environment.
Exercise #2: List the benefits that the data lake brings to your organization's business analysts and data scientists.
Exercise #3: List the issues that are preventing your organization from moving its data warehouse environment from an RDBMS-based platform to a Hadoop-based platform.
Exercise #4: For each of the issues listed in Exercise #3, capture what your organization would need to see happen (e.g., tools, training, references, management support) in order to address that issue.
Notes
1 “Spreadmarts” are spreadsheets or desktop database management applications (e.g., Microsoft Access) that are created and maintained by business analysts outside the purview, support, and maintenance of the centralized information technology organization. Spreadmarts typically contain data that may have originated from the data warehouse but has been transformed and integrated with other data to support the business analysts' specific analysis needs.
2 http://adage.com/article/datadriven-marketing/starbucks-data-pours/240502/
3 Acquired by Cloudera.
Part III: Data Science for Business Stakeholders
In This Part
Chapter 8: Thinking Like a Data Scientist
Chapter 9: “By” Analysis Technique
Chapter 10: Score Development Technique
Chapter 11: Monetization Exercise
Chapter 12: Metamorphosis Exercise
Chapter 8: Thinking Like a Data Scientist
One of the most frequent questions I get is: “How do I become a data scientist?” Wow, tough question. There are many outstanding books and university courses that outline the different skills, capabilities, and technologies that a data scientist is going to need to learn and eventually master. I've read several of these books and am impressed with the depth of the content.
Most of these books spend the vast majority of their time covering topics such as statistics, data mining, text mining, and data visualization techniques. Yes, these are very important data science skills, but they are not nearly sufficient to make our data science teams effective. And it is not practical, or even beneficial, to try to turn your entire workforce into data scientists. Nevertheless, it is realistic to teach the business stakeholders to “think like a data scientist” so that they understand the types of business opportunities that can be driven by applying predictive and prescriptive analytics to new sources of customer, product, and operational data, and so that they can help the data science team uncover those variables and metrics that are better predictors of performance.
The potential of thinking like a data scientist first hit me when I was the Vice President of Advertiser Analytics at a large Internet portal company. I was chartered with building the analytics to help our advertisers and agencies improve the performance of their marketing spend across the Internet portal's ad network. When I joined the company, I knew very little about the digital marketing world. So I spent the first three months on the road shadowing the company's top advertisers and their respective advertising agencies to better understand their analytic expectations and requirements.
After about three weeks on the road, I thought the project was going to be a disaster. Every one of the analytic teams at the advertisers and ad agencies with whom we met just wanted more data in a timelier manner. We were already giving these analyst teams most of the data that we had, and yet they did not seem to be able to leverage this data to improve their performance across our ad network.
That is when one of my team members had one of those light bulb moments: we were talking to the wrong people. It wasn't the analysts within these advertisers and ad agencies who were making the digital marketing execution decisions; it was the media planners and buyers (who were deciding which ad networks should receive the marketing or campaign spend prior to campaign launch) and the campaign managers (who were trying to make in-flight campaign adjustments using retrospective, after-the-fact, descriptive reporting).
So we switched the entire focus of our product development efforts to these key business stakeholders and to capturing the decisions that they had to make and the questions that they had to answer in support of their digital marketing campaigns. And that's when the fundamentals behind the “thinking like a data scientist” process were born.
This chapter will introduce a framework, techniques, and hands-on exercises to help business stakeholders “think like a data scientist.” The “thinking like a data scientist” framework will help the business stakeholders collaborate with data scientists to uncover those variables and metrics that can improve business performance and drive business and financial value.
Data science teams need help from the business users, or subject matter experts (SMEs), to understand the decisions the business is trying to make, the hypotheses that the business wants to test, and the predictions that the business needs to make. The eight-step “thinking like a data scientist” framework covers:
Step 1: Identify Key Business Initiative
Step 2: Develop Business Stakeholder Personas
Step 3: Identify Strategic Nouns
Step 4: Capture Business Decisions
Step 5: Brainstorm Business Questions
Step 6: Leverage “By” Analysis
Step 7: Create Actionable Scores
Step 8: Putting Analytics into Action
In essence, to improve the overall effectiveness of our data science teams, we need to teach the business users to think like a data scientist. As an outcome of this eight-step process, the business stakeholders and the data scientists should be better prepared to uncover those variables and metrics that are better predictors of performance.
The Process of Thinking Like a Data Scientist
The basic goal of data science is to uncover new variables or metrics that are better predictors of performance. But “performance” of what? That is, on what should the data science team focus its analytic exploration and model development efforts? It should be no surprise that our “thinking like a data scientist” process starts by understanding the organization's key business initiatives.
Step 1: Identify Key Business Initiative
Would you expect me to start anywhere other than with what's important to the business? So, how can you spot a key business initiative? As was covered in Chapter 3, “The Big Data Strategy Document,” a key business initiative is characterized as:
Critical to the immediate-term performance of the organization
Documented (communicated either internally or publicly)
Cross-functional (involves more than one business function)
Owned and championed by a senior business executive
Has a measurable financial goal
Has a well-defined delivery time frame (9 to 12 months)
Undertaken to deliver significant, compelling, and/or differentiated financial or competitive advantage
It is critical to the success of your big data efforts to target business initiatives that are focused on the next 9 to 12 months. Any business initiative longer than 12 months lacks the sense of urgency to motivate the organization and risks becoming a “science experiment” project with all sorts of new and sometimes random requirements being thrown into the mix.
CROSS-REFERENCE
See Chapter 3, “The Big Data Strategy Document,” to review ideas on how to leverage publicly available information (e.g., annual reports, analyst calls, executive speeches, company blogs, SeekingAlpha.com) in order to uncover an organization's key business initiatives.
For purposes of this exercise, we are going to pretend that our client is Foot Locker and that our target business initiative is “improve merchandising effectiveness,” as highlighted in Foot Locker's 2010 annual report (see Figure 8.1).
Figure 8.1 Foot Locker's key business initiatives
Merchandising is defined as the planning and promotion of sales by presenting a product to the right market at the proper time, by carrying out organized, skillful advertising, using attractive displays, etc.¹ Figure 8.2 shows some examples of different merchandising approaches at a Foot Locker retail store.
Figure 8.2 Examples of Foot Locker's in-store merchandising
Step 2: Develop Business Stakeholder Personas
The next step in the “thinking like a data scientist” process is to identify the key business stakeholders who either impact or are impacted by the targeted business initiative (e.g., sales, marketing, finance, store operations, logistics, inventory, manufacturing). There are typically three to five different business stakeholders who are impacted by a given business initiative. We want to develop a persona for each of these business stakeholders to better understand their work environment and job characteristics. Understanding the work environment and job characteristics of the business stakeholders helps to start identifying the decisions and questions that these stakeholders must address with respect to the targeted business initiative.
A persona is a one- to two-page “day in the life” description that makes the key business stakeholders “come to life” for the data science and user experience (UEX) development teams. Personas are useful in understanding the goals, tasks, key decisions, key questions, and pain points of the key business stakeholders. The persona helps the data science team start to identify the most appropriate data sources and analytic techniques to support the decisions that the business users are trying to make and the questions that they are trying to answer. Personas should be created for each type of business stakeholder affected by the targeted business initiative.
For the Foot Locker “improve merchandising effectiveness” business initiative, the business stakeholders for whom we would want to build personas could include:
Customers, who contemplate visiting a store, visit the store, and make purchase decisions while they are in the store. Customers can come to the store under many scenarios (e.g., buy something for themselves, buy something for someone else like a son or daughter, browse to see what might be interesting, or browse for products that they then buy online). In each of these scenarios, the customer considers many factors (function, price, value, urgency, aesthetics, social perceptions, etc.) before making a purchase decision.
Store managers, who are in charge of the general operations of a store. Store managers are responsible for meeting the store's sales and budget goals. To accomplish those goals, store managers make decisions to create schedules, ensure the store is stocked, create and maintain budgets, and coordinate in-store merchandising and marketing programs.
Merchandise managers, who oversee the selection, acquisition, promotion, and sale of products in a retail setting. Merchandise managers typically sit at the corporate headquarters and study market trends and customer demographics in order to make decisions about how best to price, stock, display, and promote products.
Buyers, who are responsible for sourcing new products and analyzing existing product sales. Buyers, who also typically sit at corporate, need to research, plan, analyze, and choose the types, quality, and prices of the products that they need to source. These buying decisions will be based on consumer demands, industry trends, budget, and the company's overall business strategy.
A persona for the store manager could look like Figure 8.3.
Figure 8.3 Foot Locker's store manager persona
Icon source: theamm.org
In Figure 8.3, I highlighted some of the key decisions that the store manager needs to make in support of the “improve merchandising effectiveness” business initiative.
Step 3: Identify Strategic Nouns
Strategic nouns are the key business entities around which the targeted business initiative is focused. For these key business entities, we are trying to understand and quantify behaviors, tendencies, propensities, patterns, interests, passions, affiliations, and associations in order to predict likely actions and prescribe actionable recommendations. These strategic nouns are critical to our data scientist thinking process because these are the entities around which we will ultimately build individual analytic profiles and the supporting predictive and prescriptive models. Examples of strategic nouns include customers, patients, students, employees, stores, products, medications, trucks, wind turbines, etc.
For the Foot Locker “improve merchandising effectiveness” business initiative, the strategic nouns or key business entities on which we will focus are (see Figure 8.4):
Customers
Products
Marketing campaigns
Stores
Figure 8.4 Foot Locker's strategic nouns or key business entities
Step 4: Capture Business Decisions
The next part of the “thinking like a data scientist” process is to capture the decisions that the business stakeholders need to make about the strategic nouns in support of the targeted business initiative. We started to capture some of the key decisions as we built out the business stakeholder personas. However, we want to expand the effort by brainstorming with each of the different stakeholders the decisions they need to make about each strategic noun in support of the targeted business initiative.
Capturing and validating these decisions is critical to the “thinking like a data scientist” process. Leading organizations like Uber and Netflix are disruptive because they build business models that seek to simplify their targeted personas' key decisions. For Uber, one of the decisions that it addresses is “How do I easily get from where I am to where I want to be?” For Netflix, one of the decisions that it addresses is “What content [movie, TV show] can I easily watch tonight?”
For our Foot Locker example, here are some customer promotional decisions that the business stakeholders need to make in support of the “improve merchandising effectiveness” business initiative:
Select customers to whom to send promotional offers.
Determine which types of promotional offers to send to those targeted customers.
Determine when to send those promotional offers.
Here are some product promotional decisions that the business stakeholders need to make in support of the “improve merchandising effectiveness” business initiative:
Decide which products or combinations of products to promote.
Decide which types of in-store merchandising to employ.
Choose the best places within the stores to display promoted products.
Choose the best in-store promotions for reducing outdated inventory.
The capture and validation of the key business decisions are critical because these decisions:
Drive the development of the analytic (predictive and prescriptive) models by the data science team to support these decisions
Support the determination of the user experience/user presentation requirements, that is, where and how the analytic insights (recommendations, scores, rules) get presented to the business stakeholders in a way that is actionable
Step 5: Brainstorm Business Questions
Probably the hardest part of the “thinking like a data scientist” exercise is to brainstorm the questions the business stakeholders need to answer to support the decisions that support the targeted business initiative. As you can see from Figure 8.5, the questions form the foundation for the entire “thinking like a data scientist” process.
Figure 8.5 Thinking like a data scientist decomposition process
I am confident when I say that I have never met business users who did not already know the questions that they are trying to answer. However, the biggest challenge is not to capture the questions that they are trying to answer but to get the business users to expand their line of thinking to contemplate the questions that they have given up trying to answer.
The reason why this may be the hardest part of the process is that it requires the business stakeholders to think differently about the types of questions that they can ask. We want the business stakeholders to expand their thinking about the business questions to include:
Predictive analytics: Predicting what is likely to happen
Prescriptive analytics: Recommending what to do next
A key part of the “thinking like a data scientist” process is getting the business stakeholders to transition from descriptive analytics (using Business Intelligence tools to report on what happened) to predictive analytics (to predict what is likely to happen) to prescriptive analytics (to recommend what to do).
As discussed in Chapter 5, “Differences Between Business Intelligence and Data Science,” we need the business stakeholders to transition their thinking to contemplate these predictive questions and prescriptive statements. See Table 8.1 for an example of the evolution from descriptive to predictive to prescriptive analytics.
Table 8.1 Evolution of Foot Locker's Business Questions
What Happened? (Descriptive Analytics) | What Will Happen? (Predictive Analytics) | What Should I Do? (Prescriptive Analytics)
How many Nike Hyperdunks did I sell last month? | How many Nike Hyperdunks will I sell next month? | Order [50] Nike Hyperdunks to support next month's sales projections.
What were apparel sales by zip code for Christmas last year? | What will apparel sales by zip code be over this Christmas season? | Hire [3] temporary reps for Store 12234 to handle projected Christmas sales.
How many Jordan AJ Futures were returned last month? | How many Jordan AJ Futures will be returned next month? | Set aside [$125K] in financial reserve to cover Jordan AJ Futures returns.
What were company revenues and profits for the past quarter? | What are projected company revenues and profits for next quarter? | Mark down [LeBron Foundation apparel] by 20 percent to reduce inventory before new product releases.
How many employees did I hire last year? | How many employees will I need to hire next year? | Increase hiring pipeline by 35 percent to achieve hiring goals.
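The three columns of Table 8.1 can be illustrated for the first row with a few lines of Python. This is a minimal sketch under stated assumptions: the monthly sales figures, on-hand inventory, and safety stock are invented, and the three-month moving average is a deliberately simple stand-in for whatever forecasting model a data science team would actually build.

```python
import math

# Hypothetical monthly unit sales of one shoe model at one store (oldest first).
monthly_sales = [38, 44, 41, 47, 52, 49]
on_hand = 12        # hypothetical current inventory
safety_stock = 5    # hypothetical buffer against forecast error

# Descriptive: what happened?
last_month = monthly_sales[-1]

# Predictive: what will happen? (3-month moving average as a simple stand-in)
forecast = sum(monthly_sales[-3:]) / 3

# Prescriptive: what should I do? Order enough to cover the forecast plus the buffer.
order_qty = max(0, math.ceil(forecast + safety_stock - on_hand))

print(last_month, round(forecast, 1), order_qty)  # 49 49.3 43
```

The same report, predict, prescribe pattern applies to the other rows of the table; only the data and the model change.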
Continuing our Foot Locker “improve merchandising effectiveness” example, we want the business stakeholders to brainstorm the questions that support the customer promotional decisions from the perspectives of descriptive, predictive, and prescriptive analytics. Here are some examples of these different types of customer promotional questions:
Descriptive Analytics (Understanding what happened)
Which customers are most receptive to which types of merchandising campaigns?
What are the characteristics of customers (e.g., age, gender, customer tenure, life stage, favorite sports) who are most responsive to merchandising offers?
Are there certain times of year when certain customers are more responsive to merchandising offers?
Predictive Analytics (Predicting what will happen)
Which customers are most likely to visit the store for a back-to-school promotion?
Which customers are most likely to respond to the new Michael Jordan basketball shoe?
Which customers are most likely to respond to a 50 percent off in-store markdown on Nike apparel?
Which customers are likely to respond to an offer of a free pair of Jordan Elite socks when they buy new shoes?
Prescriptive Analytics (Recommending what to do next)
E-mail Bill Schmarzo a 50 percent discount coupon for two pairs of Nike Elite socks when he buys his new pair of Air Jordans.
Text Max Schmarzo that he will receive a triple-point bonus when he buys Nike apparel this coming weekend.
Mail Alec Schmarzo a $20 cash coupon good only if he visits the store within the next 14 days.
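Predictive questions such as “Which customers are most likely to respond...?” are commonly answered with a propensity score per customer, which then ranks the targets of a prescriptive action. The sketch below assumes a logistic model with hand-picked weights; the customer IDs, features, weights, and intercept are all hypothetical, since in practice the data science team would fit them on historical campaign responses.

```python
import math

# Hypothetical customers and features: past offer redemptions, store visits in
# the last 90 days, and whether the customer plays basketball (1/0).
customers = {
    "C001": {"past_redemptions": 3, "visits_90d": 5, "plays_basketball": 1},
    "C002": {"past_redemptions": 0, "visits_90d": 1, "plays_basketball": 1},
    "C003": {"past_redemptions": 1, "visits_90d": 2, "plays_basketball": 0},
}

# Hypothetical model coefficients; a real model would be fit on campaign history.
weights = {"past_redemptions": 0.8, "visits_90d": 0.3, "plays_basketball": 1.2}
intercept = -3.0


def response_probability(features: dict) -> float:
    """Logistic model: estimated probability that a customer responds to the offer."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Rank customers from most to least likely to respond.
ranked = sorted(customers, key=lambda cid: response_probability(customers[cid]), reverse=True)
print(ranked)  # ['C001', 'C002', 'C003']
```

The ranked list is what turns a predictive question into a prescriptive action, for example sending the offer only to customers above a chosen probability cutoff.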
For our Foot Locker “improve merchandising effectiveness” example, we also want to brainstorm the questions that support our product promotional decisions. Here are some examples of the different types of product promotional questions that support the “improve merchandising effectiveness” business initiative:
Descriptive Analytics (Understanding what happened)
What are the top-selling products and product categories?
Which products are most responsive to in-store merchandising campaigns?
How many basketball shoes did I sell during last year's high school and youth basketball seasons?
Which products are hot movers that I might want to feature at the front of the store?
Which products are slow movers that I might need an in-store merchandising campaign to move?
Which products sell best at which times of the year/sports seasons?
Predictive Analytics (Predicting what will happen)
Which shoes and apparel are most likely to sell with a back-to-school promotional event?
Which basketball shoes, and in which sizes, am I likely to need to stock given the upcoming high school and youth basketball seasons?
What is the likely market basket revenue and margin from a Buy One Get One Free (BOGOF) event?
Prescriptive Analytics (Recommending what to do next)
With the upcoming high school basketball season, promote Air Jordans and Nike Elite socks in the same display at the front of the store.
Given the end of the football season, provide an in-store BOGOF promotion of football apparel.
Reduce prices 50 percent on the inventory of baseball cleats in anticipation of incoming new baseball equipment.
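The BOGOF question above is, at heart, market basket arithmetic. The following sketch estimates event revenue and gross margin; every input (price, unit cost, expected redemptions, attach rate, and add-on item economics) is a hypothetical placeholder that would come from descriptive and predictive analytics in practice.

```python
# Hypothetical inputs for a Buy One Get One Free (BOGOF) event on one apparel item.
price = 30.00              # regular unit price
unit_cost = 12.00          # cost per unit
redemptions = 200          # expected BOGOF transactions (2 units each, 1 paid)
attach_rate = 0.40         # share of BOGOF baskets that also buy another item
attach_revenue = 25.00     # average revenue of that additional item
attach_margin_pct = 0.45   # gross margin percentage on the additional item

bogof_revenue = redemptions * price            # only one unit of each pair is paid for
bogof_cost = redemptions * 2 * unit_cost       # but two units leave the store
attach_sales = redemptions * attach_rate * attach_revenue

basket_revenue = bogof_revenue + attach_sales
basket_margin = (bogof_revenue - bogof_cost) + attach_sales * attach_margin_pct

print(basket_revenue, basket_margin)
```

With these assumptions the event yields 8,000 in basket revenue and 2,100 in margin, of which the BOGOF units themselves contribute only 1,200, because two units leave the store for every one paid for.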
CROSS-REFERENCE
Because of the depth of the topics, step 6, “leverage ‘By’ analysis to uncover new metrics and variables that might be better predictors of performance,” and step 7, “create actionable scores that the business stakeholders can use to support the targeted business initiative,” are covered in Chapters 9 and 10, respectively.
Step 8: Putting Analytics into Action
This is the part of the “thinking like a data scientist” process where the highly specialized data science work happens (using some of the analytic techniques covered in Chapter 6). The data science team will test, refine, and validate that we have identified the right metrics, variables, and scores. The data science team can then recommend to the business stakeholders how the analytics will support the decisions that support the targeted business initiative.
For example, it is not sufficient to know that there is an increase in head injuries, lacerations, and broken bones at hospitals near a National Football League (NFL) stadium after an NFL game. One has to know (from the data science work) that there is a 27 percent increase in order to make prescriptive recommendations about additional emergency room nurses, physicians, and supplies.
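The NFL example shows why a prescriptive recommendation needs a quantified prediction behind it. The sketch below turns the 27 percent increase from the text into a staffing number; the baseline visit count and the patients-per-nurse ratio are hypothetical assumptions, and integer percent arithmetic is used to avoid floating-point rounding surprises.

```python
import math

baseline_visits = 200      # hypothetical ER visits in the hours after a typical game
uplift_pct = 27            # the 27 percent increase cited in the text
patients_per_nurse = 8     # hypothetical staffing ratio

# Predicted visits after a game, rounded up to whole patients.
expected_visits = math.ceil(baseline_visits * (100 + uplift_pct) / 100)

# Prescriptive output: nurses needed, and how many to add over normal staffing.
nurses_needed = math.ceil(expected_visits / patients_per_nurse)
baseline_nurses = math.ceil(baseline_visits / patients_per_nurse)
extra_nurses = nurses_needed - baseline_nurses

print(expected_visits, nurses_needed, extra_nurses)  # 254 32 7
```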
After the data science team has done its magic to validate the metrics, variables, and scores that are better predictors of business performance, the next step in the “thinking like a data scientist” process is “putting the analytics into action,” that is, deciding which analytics-driven scores or recommendations to deliver to the business stakeholders. We spent a considerable amount of time in Chapter 4, “The Importance of the User Experience,” detailing how critical the user experience is to the ultimate success of the organization's big data initiatives. Remember: If you can't present the analytic results in a way that is actionable, then why even bother?
You can facilitate the development of a compelling and actionable user experience by starting with a simple “recommendations worksheet.” The recommendations worksheet links the decisions that our business stakeholders need to make to the predictive and prescriptive analytics that the data science team is going to build. The recommendations worksheet starts with the decisions captured in step 4 and then identifies the potential recommendations that could be delivered to the business stakeholders in support of those decisions. Finally, the worksheet captures the potential scores (and the supporting variables and metrics) that can be used to power the recommendations.
In summary:
Decisions → Recommendations → Scores (Supporting Metrics)
See Figure 8.6 for a simple template that we can use to guide the recommendations process.
Figure 8.6 Recommendations worksheet template
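The Decisions → Recommendations → Scores chain can also be captured as a small data structure, which makes the worksheet easy to review and extend. This Python sketch is one possible shape, not the layout of Figure 8.6; the class names are hypothetical, and the example row pairs a step 4 decision from the text with an invented recommendation and score.

```python
from dataclasses import dataclass, field


@dataclass
class Score:
    name: str
    supporting_metrics: list[str]


@dataclass
class WorksheetRow:
    decision: str                                        # captured in step 4
    recommendations: list[str]                           # what could be delivered to the stakeholder
    scores: list[Score] = field(default_factory=list)    # what powers the recommendations


def to_lines(rows: list[WorksheetRow]) -> list[str]:
    """Flatten the worksheet into 'decision -> recommendation -> score (metrics)' lines."""
    lines = []
    for row in rows:
        for rec in row.recommendations:
            for score in row.scores:
                metrics = ", ".join(score.supporting_metrics)
                lines.append(f"{row.decision} -> {rec} -> {score.name} ({metrics})")
    return lines


worksheet = [
    WorksheetRow(
        decision="Select customers to whom to send promotional offers",
        recommendations=["Send back-to-school offer to top-ranked customers"],  # hypothetical
        scores=[Score("Customer response score (hypothetical)",
                      ["past offer redemptions", "visit frequency", "favorite sports"])],
    )
]

for line in to_lines(worksheet):
    print(line)
```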
Let's see the recommendations worksheet in action. For our Foot Locker “improve merchandising effectiveness” business initiative, the resulting recommendations worksheet could look like Figure 8.7.
Figure 8.7 Foot Locker's recommendations worksheet
The last step (and possibly the most fun step) is the creation of the user experience mock-up that validates that we are building the right analytics and have a thorough understanding of where and how to deliver the analytic results, scores, and recommendations (e.g., management dashboards, reports, call center, procurement, sales, marketing, finance). See Figure 8.8 for an example of the store manager actionable dashboard.
Figure 8.8 Foot Locker's store manager actionable dashboard
During the envisioning and requirements gathering and validation processes, do not worry about the quality of the mock-up. Using PowerPoint and a few standard dashboard images can go a long way in fueling the creative thinking of the
Summary
Data scientists are critical to the ability to integrate data and analytics into the organization's business models. But an important challenge is to get your business users to “think like a data scientist” when contemplating data sources and metrics that might be better predictors of business performance. Having a business organization that can “think like a data scientist” will drive better collaboration with your data science team and ultimately lead to better predictive and prescriptive results and increased value to the business.
We introduced the “thinking like a data scientist” eight-step process that includes:
Step 1: Identify Key Business Initiative
Step 2: Develop Business Stakeholder Personas
Step 3: Identify Strategic Nouns
Step 4: Capture Business Decisions
Step 5: Brainstorm Business Questions
Step 6: Leverage “By” Analysis
Step 7: Create Actionable Scores
Step 8: Putting Analytics into Action
We used a Foot Locker example to help drive home the concepts and techniques in the eight-step “thinking like a data scientist” process. As a result, not only do the business stakeholders better understand how the data science process works, but they also understand what they can do to help the data science process deliver new value to the organization by helping to uncover new data sources, metrics, variables, and scores.
CROSS-REFERENCE
As noted earlier in this chapter, steps 6 and 7 are covered in Chapters 9 and 10, respectively.
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Start with the key business initiative that you identified in Chapter 2. Write down the key business stakeholders who either impact or are impacted by the targeted business initiative. Capture organizational roles rather than individual names at this point.
Exercise #2: Develop a one-page persona for one of the key business stakeholders identified in Exercise #1. Use the persona template that we discussed in this chapter.
Exercise #3: Write down the key business entities (or strategic nouns) for the targeted business initiative. These can be both humans (e.g., customers, students, patients, technicians, engineers) and things (e.g., jet engines, trucks, ATMs, test suites, curriculums, stores, competitors).
Exercise #4: Brainstorm the business decisions that the business stakeholders need to make about the business entities in support of the targeted business initiative.
Exercise #5: Brainstorm the business questions that the business stakeholders might want to ask and answer with respect to each of the decisions listed in Exercise #4. Be sure to contemplate (1) descriptive questions, (2) predictive questions, and (3) prescriptive statements. I repeated the decomposition process slide in Figure 8.9 for reference.
Figure 8.9 Thinking like a data scientist decomposition process
Notes
1 http://dictionary.reference.com/browse/merchandising
Chapter 9 “By” Analysis Technique
Chapter 8, “Thinking Like a Data Scientist,” briefly introduced the “By” analysis as a technique around which the business subject matter experts (SMEs) and the data science team could collaborate to uncover new variables and metrics that might be better predictors of business performance. “By” analysis is a technique that was historically used during the data warehouse requirements gathering process to ensure that the data warehouse schema was robust enough to support the full range of Business Intelligence queries and reports that business users might request. Data science builds on the “By” analysis to create a collaborative technique that drives alignment between the business users and the data scientists to identify and brainstorm variables and metrics that might be better predictors of business performance. The “By” analysis technique reinforces the importance of the “thinking like a data scientist” process.
Remember the data science definition from Moneyball: The Art of Winning an Unfair Game covered in Chapter 5:
Data science is about finding new variables and metrics that are better predictors of performance.
The “By” analysis technique supports this data science objective by powering the partnership between the business users and the data scientists to leverage new sources of customer, product, operational, market, and competitive data, coupled with advanced analytics, to uncover metrics and variables that may be better predictors of business performance.
Continuing with the baseball analogy, Major League Baseball (MLB) teams such as the Boston Red Sox are continually exploring and testing new sources of data and new analytics in hopes of uncovering new variables and metrics that are better predictors of player performance; that is, they are trying to find that next more predictive “on-base percentage” (see Figure 9.1).
Figure 9.1 Identifying metrics that may be better predictors of performance
Ultimately, these new variables and metrics will be used to determine a player's financial value in achieving the team's business objectives. Identifying these metrics and variables is no guarantee that they will be better predictors of performance (as seen from the Red Sox's most recent performance), but it gives the data science teams a starting point for their data science exploration and “fail-fast” analytic processes.
“By” Analysis Introduction
The “By” analysis technique exploits a business user's natural “question and answer” inquiry process to identify new data sources, dimensional characteristics, variables, and metrics that the data science team could leverage in building the predictive and prescriptive analytic models that help predict business performance. The “By” analysis leverages a business stakeholder's natural curiosity to brainstorm new:
Metrics, measures, and key performance indicators
Dimensions (e.g., strategic nouns) and the attributes and characteristics associated with those dimensions or strategic nouns
Areas for potential analytics exploration
The “By” analysis technique leverages the normal business stakeholder question and query exploration process; it fuels the naturally inquisitive human desire to seek out new variables and metrics that may be better predictors of business performance.
The “By” analysis uses a simple “I want to [verb] [metric] by [dimensional attribute]” format to capture the business stakeholder brainstorming process and reveal new data and analytic areas of exploration. The “By” analysis format looks like this:
“I want to”
[Verb] such as see, know, report, compare, trend, plot, predict, score, etc.
[Metric] such as sales, margin, profits, social media posts, comments, physician notes, vibration levels, sensor codes, etc.
“By”
[Dimension or dimensional attribute] such as city, state, zip code, date, time, seasonality, product category, remodel date, store manager demographics, etc.
Here is an example of a “By” analysis statement:
I want to [report] [online sales and product margin] by… [product category, website, keyword search term, referring website, display ad, daypart, day of week, customer behavioral category, customer re-targeting category].
The above “By” analysis sentence breaks down as follows:
The verb is [report],
The metric is [online sales and product margin], and
The dimensional attributes and characteristics are [product category, website, keyword search term, referring website, display ad, daypart, day of week, customer behavioral category, customer re-targeting category].
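Because the technique grew out of data warehouse requirements gathering, each “By” statement maps naturally onto an aggregation: the metric is summed and grouped by each dimensional attribute in turn, much like one SQL GROUP BY per attribute. The following Python sketch illustrates that mapping. The ByStatement class, the run_by_analysis helper, and the sample sales records are hypothetical, invented for illustration; a real implementation would run against the organization's own data warehouse or data lake.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ByStatement:
    """Hypothetical container for 'I want to [verb] [metric] by [dimensions]'."""
    verb: str
    metric: str
    dimensions: list[str] = field(default_factory=list)

    def sentence(self) -> str:
        return f"I want to [{self.verb}] [{self.metric}] by [{', '.join(self.dimensions)}]"


def run_by_analysis(stmt: ByStatement, records: list[dict]) -> dict[str, dict]:
    """Sum the statement's metric, grouped by each dimension separately
    (the equivalent of one SQL GROUP BY per dimensional attribute)."""
    results = {}
    for dim in stmt.dimensions:
        totals = defaultdict(float)
        for row in records:
            totals[row[dim]] += row[stmt.metric]
        results[dim] = dict(totals)
    return results


# Hypothetical sample records (not real retail data).
sales = [
    {"online_sales": 120.0, "product_category": "shoes", "day_of_week": "Sat"},
    {"online_sales": 80.0, "product_category": "apparel", "day_of_week": "Sat"},
    {"online_sales": 50.0, "product_category": "shoes", "day_of_week": "Mon"},
]

stmt = ByStatement("report", "online_sales", ["product_category", "day_of_week"])
print(stmt.sentence())
for dim, totals in run_by_analysis(stmt, sales).items():
    print(dim, totals)
```

Each new dimension that comes out of the brainstorming is just another grouping key, which is one reason the brainstorming stage can afford to be generous: the data science team can later test which groupings actually matter.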
The data science team is responsible for quantifying which variables or combinations of variables are better predictors of performance. Consequently, you want to give the data science team as many variables as is practical to consider. For example, in one project the business stakeholders (teachers, in this case) wanted to know the impact that a change in the value of a house might have on a student's classroom performance. So the data science team grabbed some Zillow data to see if there was any correlation (there wasn't).
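A first-pass check like the Zillow test often starts with a simple correlation coefficient between the candidate variable and the outcome. The sketch below computes a Pearson correlation in plain Python. The home-value changes and test-score changes are made-up numbers, not the data from that project, and a real analysis would also consider sample size, statistical significance, and confounding factors.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: percent change in home value vs. change in test score.
home_value_change = [5.0, -2.0, 3.0, 0.0, 8.0, -4.0]
score_change = [0.5, 1.0, -1.0, 1.0, 0.5, -0.5]

r = pearson(home_value_change, score_change)
print(f"r = {r:.2f}")  # prints "r = 0.05": almost no linear relationship
```

A coefficient this close to zero is the kind of result that lets the team discard a candidate variable quickly and move on, in keeping with the “fail-fast” approach.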
Here are some additional “By” analysis statements:
I want to [trend] [hospital admissions] by… [disease category, zip code, patient demographics, hospital size, area demographics, and day of week].
I want to [compare] [current versus previous maintenance issues] by… [turbine, turbine manufacturer, date installed, last maintenance date, maintenance person, and weather conditions].
I want to [predict] [student performance] by… [age, gender, family size, child number within family, family income, previous test scores, current homework scores, and parent's education level].
I hope that you can see that these types of sentences are very easy to create. Also, the “By” analysis technique is perfect for a facilitated brainstorming session where the goal is to fuel the group's innovative thinking process to identify additional variables and metrics that might be better predictors of business performance. And remember, as you go through any brainstorming process, all ideas are worthy of consideration; the brainstorming process should not filter the suggestions and thereby inadvertently throttle the creative thinking process.
The dimensional attributes and characteristics that follow the “by” phrase are the gold in the data science exploration process. The wide and diverse variety of dimensions and dimensional attributes uncovered by the “By” analysis is critical to guiding the data science team's analytic exploration and modeling process. The “By” analysis can suggest additional variables and metrics that the data science team may want to explore in creating the prescriptive actions, scores, and recommendations that are used to support the targeted business initiative.
“By” Analysis Exercise
Continuing with the sports theme, let's introduce an exercise that allows you to put the “By” analysis to work. Pretend that you are the head coach for the National Basketball Association's (NBA's) Golden State Warriors and have to play the Cleveland Cavaliers in the 2015 NBA Championship Finals. Your job as the head coach of the Golden State Warriors is to craft a defensive plan and game strategy that maximizes your chances (or probability) of winning the game by minimizing the shooting and offensive effectiveness of Cleveland's superstar, LeBron James.
Let's start the analysis process by gaining some fundamental insights regarding the “hot spots” for shooters across the NBA; that is, the locations or “spots” on the court where shooters are most efficient as measured by “points per shot” (see Figure 9.2).
NOTE
The “points per shot” metric takes into account (normalizes) the value of a two-point shot versus the value of a three-point shot. The chart in Figure 9.2 shows that, generally speaking, three-point shooting is more effective than two-point shooting except near the rim (dunks result in a pretty high shooting percentage).
Figure 9.2 NBA shooting effectiveness
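Points per shot is the value of a shot multiplied by the probability of making it, which is why a lower three-point percentage can still beat a higher mid-range percentage. The short Python sketch below illustrates the arithmetic. The shooting percentages are hypothetical round numbers, not the values plotted in Figure 9.2, and the sketch ignores free throws from fouled shots.

```python
def points_per_shot(fg_pct: float, shot_value: int) -> float:
    """Expected points per attempt: shot value times make probability.
    Ignores free throws from fouled shots (a simplifying assumption)."""
    return shot_value * fg_pct


# Hypothetical zone percentages, chosen only to illustrate the normalization.
zones = {
    "at the rim (2 pts)": (0.60, 2),
    "mid-range (2 pts)": (0.40, 2),
    "corner three (3 pts)": (0.38, 3),
}

for zone, (pct, value) in zones.items():
    print(f"{zone}: {points_per_shot(pct, value):.2f} points per shot")
```

With these illustrative numbers, a 38 percent corner three (1.14 points per shot) is worth more than a 40 percent mid-range jumper (0.80), while shots at the rim (1.20) remain the most efficient, matching the pattern described in the note.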
Let's drill down into our analysis process by understanding LeBron James's specific shooting tendencies and performance. The shooting hot spot chart in Figure 9.3 provides a good starting point in the development of our defensive strategy against LeBron James. Figure 9.3 shows LeBron James's shooting percentages from different spots on the court. This chart helps us start to contemplate the key decision: “Where do we want to direct or force LeBron James to go while on the court in order to mitigate his offensive prowess?”
Figure 9.3 LeBron James's shooting effectiveness
While the chart in Figure 9.3 is interesting in highlighting areas in general where LeBron's shooting percentages are better or worse, to be actionable you need more detailed insights. To create more actionable insights, we need to answer the question, “What detailed data or insights do I need in order to create specific, actionable recommendations to mitigate LeBron James's shooting effectiveness?” This is the perfect time to employ the “By” analysis technique.
Let's put the “By” analysis technique to work by applying the technique to the following statement:
I want to [know] [LeBron James's shooting percentage] by…
Take a moment to jot down some variables or metrics that come to mind following the “by” phrase (e.g., “I want to know LeBron James's shooting percentage by… opponent”).
________________________
________________________
________________________
________________________
Below are some variables and metrics that I came up with using the “By” analysis technique:
At home versus on the road
Number of days of rest
Shot area
Opposing team
Defender
Game location
Game location elevation
Game time weather
Game time temperature
Game time humidity
Time (hours) since last game
Average time of ball possession
Time left in game
Total minutes played in game
Number of shots attempted
Number of shots made
Location of shots attempted
Location of shots made
Volume of boos
Number of fouls
Number of assists
Playing a former team
Time of day
Record of opponent
Feelings toward opponent
Performance in last game
Number of negative Twitter comments
Stadium temperature
Stadium humidity
Number of fans in attendance
Number of LeBron jerseys in attendance
Here's the interesting point: people who have never been an NBA basketball coach, and even people who may never have played basketball, can come up with some of the more interesting dimensions and dimensional attributes when trying to identify variables and metrics that may be better predictors of LeBron James's shooting tendencies and performance.
Building on some of the suggestions that came out of the LeBron James “By” analysis, let's triage LeBron James's shooting percentages for the 2014–2015 regular season by a couple of dimensions identified in the brainstorming session: [Home versus Road] and [Number of Days Rest]. Table 9.1 shows LeBron James's shooting percentages.
Table 9.1 LeBron James's Shooting Percentages
2014–2015         Overall Shooting Percentage   Overall Shooting Index   3-Point Shooting Percentage   3-Point Shooting Index
Regular season    48.8                          100.0                    35.4                          100.0
Home              47.3                          96.9                     35.6                          100.6
Road              50.2                          102.9                    35.3                          99.7
0 days rest       49.8                          102.0                    38.0                          107.3
1 day rest        46.3                          94.9                     32.3                          91.2
2 days rest       51.3                          105.1                    37.3                          105.4
3 days rest       52.7                          108.0                    42.9                          121.2
4 days rest       57.1                          117.0                    60.0                          169.5
6+ days rest      48.5                          99.4                     30.8                          87.0
Source: http://stats.nba.com/player/#!/2544/stats/
You're now starting to get some interesting insights. Remember, insights are only observations buried in the data that look unusual when compared to an individual's standard performance. Just from the simple analysis in Table 9.1, you can start uncovering some insights about LeBron's shooting tendencies that the data science team might want to explore further. For example:
LeBron shoots significantly worse when he's had just one day of rest (8.8 percent worse from three-point range).
If you give LeBron four days of rest, watch out! His shooting percentages improve overall and improve dramatically for three-point shooting (69.5 percent better three-point shooting with four days of rest).
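The index columns in Table 9.1 are each split's percentage divided by the regular-season percentage and multiplied by 100, so an index of 91.2 means 8.8 percent below LeBron's norm. The Python sketch below recomputes the three-point index from the table's values and flags splits that deviate from 100 by more than a chosen threshold. The 5-point threshold is an arbitrary assumption for illustration, not a statistical test; splits built on only a few games can swing widely and deserve extra skepticism.

```python
# Three-point shooting percentages from Table 9.1 (2014-2015 regular season).
BASELINE_3PT = 35.4
splits_3pt = {
    "Home": 35.6,
    "Road": 35.3,
    "0 days rest": 38.0,
    "1 day rest": 32.3,
    "2 days rest": 37.3,
    "3 days rest": 42.9,
    "4 days rest": 60.0,
    "6+ days rest": 30.8,
}


def shooting_index(pct: float, baseline: float) -> float:
    """Index where the baseline (regular season) equals 100."""
    return round(pct / baseline * 100, 1)


def flag_insights(splits: dict[str, float], baseline: float,
                  threshold: float = 5.0) -> dict[str, float]:
    """Return splits whose index differs from 100 by more than `threshold`
    points (the threshold is an illustrative assumption, not a significance test)."""
    indexed = {name: shooting_index(pct, baseline) for name, pct in splits.items()}
    return {name: idx for name, idx in indexed.items() if abs(idx - 100) > threshold}


for name, idx in flag_insights(splits_3pt, BASELINE_3PT).items():
    print(f"{name}: index {idx} ({idx - 100:+.1f} percent relative to the season)")
```

Home and road splits stay within a point of 100 and are not flagged, which is consistent with the text's focus on days of rest rather than game location.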
As discussed in Chapter 2, “Big Data Business Model Maturity Index,” once you start uncovering insights buried across the wide variety and depth of data, you need the business subject matter experts to assess the value of these insights against the S.A.M. criteria:
Is the insight of Strategic value to what you are trying to accomplish?
Is the insight Actionable (i.e., is the insight at a level upon which I can act)?
Is the insight of Material value (i.e., is the value of the insight greater than the cost to act on that insight)?
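The S.A.M. criteria act as a simple gate: an insight moves on to modeling only if all three tests pass. A minimal Python sketch of that gate follows. The Insight fields, the dollar figures, and the yes/no judgments are hypothetical inputs that the business subject matter experts, not the code, would supply.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    """Hypothetical record of an insight plus the SMEs' S.A.M. assessment."""
    description: str
    strategic: bool       # S: does it matter to the targeted initiative?
    actionable: bool      # A: can someone act on it at this level of detail?
    value: float          # M: estimated value of acting on it
    cost_to_act: float    # M: estimated cost of acting on it

    def passes_sam(self) -> bool:
        material = self.value > self.cost_to_act
        return self.strategic and self.actionable and material


# Illustrative judgments only; the numbers are invented.
candidates = [
    Insight("Worse 3-pt shooting on one day of rest", True, True, 10.0, 2.0),
    Insight("More LeBron jerseys in the stands", True, False, 3.0, 1.0),
    Insight("Stadium humidity effect", False, True, 1.0, 5.0),
]

for insight in candidates:
    status = "pass" if insight.passes_sam() else "fail"
    print(f"{status}: {insight.description}")
```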
Once an “insight” has passed the S.A.M. criteria, you want the data science team to build the analytic models that quantify cause and effect, assess goodness of fit, and create the prescriptive actions or recommendations that provide guidance to the frontline employees (LeBron James's defenders in this example) and managers (the Golden State Warriors coaching staff) in achieving their business initiative of minimizing LeBron James's shooting and offensive effectiveness.
Foot Locker Use Case “By” Analysis
Continuing the Foot Locker use case that was started in Chapter 8, “Thinking Like a Data Scientist,” we want to apply the “By” analysis to uncover new variables and metrics that might be better predictors of performance for the “improve merchandising effectiveness” business initiative.
Chapter 8 captured the descriptive, predictive, and prescriptive questions that supported the Foot Locker “improve merchandising effectiveness” business initiative. As a reminder, below are some of the customer promotional questions that were captured:
Descriptive Analytics (Understanding what happened)
What customers are most receptive to what types of merchandising campaigns?
Are there certain times of year when certain customers are more responsive to merchandising offers?
Predictive Analytics (Predicting what will happen)
Which customers are most likely to visit the store for a back-to-school promotion?
Which customers are most likely to respond to the new Michael Jordan basketball shoe?
Prescriptive Analytics (Recommending what to do next)
E-mail Bill Schmarzo a 50 percent discount coupon on two pairs of Nike Elite socks when he buys his new pair of Air Jordans.
Text Max Schmarzo a triple-point bonus when he buys Nike apparel this coming weekend.
Mail Alec Schmarzo a $20 cash coupon good only if he visits the store within the next 14 days.
Let's put the “By” analysis technique to work against the following question:
“What customers are most receptive to Foot Locker's merchandising campaigns by…?”
Again, take a moment to jot down some variables or metrics that come to mind following the “by” phrase. I'll wait for you to jot down your ideas (again, one variable or metric per line).
_________________
_________________
_________________
_________________
The following is a list of some of the variables and metrics that I came up with when I applied the “By” analysis technique to Foot Locker's customer question: “What customers are most receptive to Foot Locker's merchandising campaigns by…?”
Age
Gender
Marital status
Number of children
Length of marriage
Income level
Education level
VIP loyalty card member
VIP member length of time
VIP rewards expired (%)
VIP rewards expired ($)
Own or rent residence
Tenure in current home
Value of current home
Favorite sports
Favorite sports teams
High school sports interest
College sports interest
Active athlete
Type of athletic activity
Exercise minutes per week
Number of days per week exercised
…
For purposes of completeness, you would want to perform the “By” analysis exercise for a couple of additional customer questions in order to capture a robust set of variables and metrics that could be used to predict the performance of the “improve merchandising effectiveness” business initiative.
Continuing the Foot Locker example that started in Chapter 8, below are the product promotional questions that were captured:
Descriptive Analytics (Understanding what happened)
What products are most successful with what merchandising campaigns?
How many basketball shoes did I sell during last year's high school and youth basketball seasons?
Which products are slow movers that I might need an in-store merchandising campaign to move?
Predictive Analytics (Predicting what will happen)
Which shoes and apparel are most likely to sell with a back-to-school promotional event?
What is the likely market basket revenue and margin from a Buy One Get One Free (BOGOF) event?
Prescriptive Analytics (Recommending what to do next)
With the upcoming high school basketball season, promote Air Jordans and Nike Elite socks in the same display at the front of the store.
Given the end of the football season, provide an in-store BOGOF promotion of football apparel.
Reduce prices 50 percent on the inventory of baseball cleats in anticipation of incoming new baseball equipment.
In the following list, the “By” analysis technique is applied to Foot Locker's product question: “What products are most successful with what merchandising campaigns by…?”
Product category
Product size
Product style
Product color
Product form
Product type
Brand
Primary sport
Retail price
Product release date
Product discontinue date
Brand age
Athlete endorser
Athlete endorser Q score
Athlete endorser sentiment
Last TV advertisement date
Brand social sentiment
Product Yelp rating
…
A very important note about the “By” analysis technique: the variables and metrics uncovered by the “By” analysis technique are limited only by the creative thinking of the business users; that is, the people who live these decisions and questions daily.1
Hopefully you can see that the number and variety of variables and metrics uncovered using the “By” analysis technique can be quite bountiful, and the more variables and metrics, the better from a data science perspective.
Summary
The “By” analysis technique is a powerful tool not only for understanding the key metrics and dimensions of the business but also for yielding insights into areas of the business ripe for data science analysis. The “By” analysis technique fuels the creative discovery of new variables and metrics by leveraging the natural question and answer exploration of the business users. The “By” analysis technique uses a simple sentence format:
“I want to”
[Verb] such as see, know, report, compare, trend, plot, predict, score, etc.
[Metric] such as sales, margin, profits, social media posts, comments, physician notes, vibration levels, sensor codes, etc.
“By”
[Dimension or dimensional attribute] such as city, state, zip code, date, time, seasonality, product category, remodel date, store manager demographics, etc.
Finally, and maybe most important, the “By” analysis is a technique that can drive the collaboration between the business users and the data scientists to uncover new variables and metrics that can guide the data scientists' analytics exploration and model development process. The “By” analysis technique reinforces the importance of the “thinking like a data scientist” process.
In Chapter 10, we will cover how to combine these variables and metrics to develop actionable scores that can be used to address the business decisions that support the “improve merchandising effectiveness” business initiative.
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Pick one of the questions for one of your key business entities or strategic nouns that you came up with in Chapter 8 and apply the “By” analysis technique. If possible, get a small group of co-workers together and brainstorm the “By” analysis as a group to uncover even more potential variables and metrics. Fuel the creative process with coffee and donuts (lots of donuts) if necessary.
Exercise #2: Pick a different question for the same key business entity and apply the “By” analysis technique to see what additional variables and metrics you uncover. Again, do not worry at this point about whether the data is available. Now is not the time to filter the creative thinking outcomes. You will have time to evaluate the value and implementation feasibility of each of the potential variables and data sources later in the process.
Chapter 10 Score Development Technique
In New Zealand, people are taking a “thinking like a data scientist” approach to optimizing social worker spending and casework prioritization. A related BusinessWeek article titled “A Moneyball Approach to Helping Troubled Kids” (May 11, 2015) highlights the role that scores play in identifying and prioritizing problem areas and deciding what corrective actions to take. Here are a couple of excerpts from the article:
Using data from welfare, education, employment, and the housing agencies and the courts, the government identified the most expensive welfare beneficiaries–kids who have at least one close adult relative who's previously been reported to child safety authorities, been to prison, and spent substantial time on welfare. “There are million-dollar [cost] kids in those families,” Minister of Finance Bill English says. “By the time they are 10, their likelihood of incarceration is 70 percent. You've got to do something about that.”
…one idea is to rate families, giving them a number [score] that could be used to identify who's most at risk in the same way that lenders rely on credit scores to determine creditworthiness. “The way we may use it, it's going to be like it's a FICO score,” says Jennie Feria, Head of Los Angeles' Department of Children and Family Service. The information, she says, could be used both to prioritize cases and to figure out who needs extra services.
In wrapping up the “thinking like a data scientist” process that began in Chapter 8 and continued in Chapter 9, this chapter focuses on the role of scores in supporting an organization's key business decisions. As exhibited in the preceding New Zealand welfare example, scores are a very effective data science concept for aggregating a wide variety of variables and metrics in order to come up with a yardstick or guide that can be used to support key business and operational decisions.
Scores are very important concepts in the world of data science. Many times, the results of the data science efforts will be presented as scores that can help to guide frontline employees' and managers' decision making in support of the targeted business initiative.
The power of a score is that it is relatively easy to understand from a business stakeholder perspective. It focuses the data science efforts on identifying and exploring new metrics and variables to include in a score that might be a better predictor of business performance or an individual's behaviors.
The purpose of the score technique is to look for groupings of metrics and variables that can be combined to create an actionable score that you can use to support your key business decisions. These scores are critical components of the “thinking like a data scientist” process because they can guide the decisions your frontline employees are trying to make and/or predict the likelihood of a customer's actions, outcomes, or behaviors.
Definition of a Score
Let's start by defining score:
A score is a dynamic rating or grade standardized to aid in comparisons, performance tracking, and decision making.
A score can help predict the likelihood of certain actions or outcomes.
A score is an actionable, analytic-based measure that supports the decisions your organization is trying to make and guides the outcomes the organization is trying to predict.
A common example of a score is the intelligence quotient, or IQ. An IQ is derived from several standardized tests in order to create a single number that assesses an individual's intelligence. The IQ is standardized at 100 with a standard deviation of 15, which means that about 68 percent of the population is within 1 standard deviation of the 100 standard (between 85 and 115). This standardization makes it easier to use IQ to compare different students, candidates, or applicants and to support key hiring, promotion, and college application decisions.
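The standardization step is what turns raw test results into a comparable score. The Python sketch below converts a hypothetical raw score to the IQ scale by computing its z-score against an assumed reference population and then rescaling to a mean of 100 and a standard deviation of 15; it also uses the normal distribution from Python's standard library to confirm the roughly 68 percent figure. The raw-score mean and standard deviation are invented for illustration; real IQ tests are normed on large reference samples.

```python
from statistics import NormalDist


def to_iq_scale(raw: float, raw_mean: float, raw_sd: float) -> float:
    """Standardize a raw test score, then rescale to mean 100, SD 15."""
    z = (raw - raw_mean) / raw_sd
    return 100 + 15 * z


# Hypothetical norms for a raw test: mean 40 points, SD 8 points.
print(to_iq_scale(48, raw_mean=40, raw_sd=8))  # one SD above the mean -> 115.0

iq = NormalDist(mu=100, sigma=15)
share_within_one_sd = iq.cdf(115) - iq.cdf(85)
print(f"{share_within_one_sd:.1%} of a normal population falls between 85 and 115")
```

The same two-step pattern (standardize, then rescale to a familiar range) applies to any business score that needs to be compared across people, products, or locations.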
The true beauty of a score is its ability to convert a wide range of variables and metrics (all weighted, valued, and correlated differently depending on what is being measured) into a single number that can be used to guide decision making. And the true power of the score is the ability to start simple, and then constantly fine-tune and expand the score with new metrics, variables, and relationships that might yield better predictors of business performance or an individual's behaviors.
FICO Score Example
Many organizations have built their business models on the development of scores that help organizations make better decisions. For example, Traackr and Apinions are companies that assign scores to influencers on social media to help identify whom organizations should target from a media perspective. FICO may be the best example of an organization that has built its business around the development of a score.1 The FICO score is used to predict the likelihood that a borrower will repay a loan. Fair, Isaac, and Company first introduced the FICO score in 1989. Most readers are probably familiar with the FICO score (and you have probably seen your own FICO score several times), which combines multiple variables and metrics about a loan applicant's financial, credit, and payment history to create a single score that lenders use to predict a borrower's ability to repay a loan (see Figure 10.1).
Figure 10.1 FICO score considerations
An individual's FICO score can range between 300 and 850. A FICO score above 650 indicates that the individual has a very good credit history, while people with scores below 620 will often find it substantially more difficult to obtain financing at a favorable rate (see Figure 10.2).2
Figure 10.2 FICO score decision range
The FICO score amalgamates a wide range of consumer financial, credit, and payment metrics in order to generate the single score for a specific individual. The powerful concept behind the FICO score is that it combines this wide range of consumer financial, credit, and payment metrics into a single score that predicts an individual's likelihood to repay a loan.
To dive deeper into the FICO score example, the data elements used in the calculation of an individual's FICO score include payment history, credit utilization, length of credit history, new credit applications, and credit mix.3
Payment History. Thirty-five percent of the FICO credit score is based on a borrower's payment history, making the repayment of past debt the most important factor in calculating credit scores. According to FICO, past long-term behavior is used to forecast future long-term behavior. This is a measure of how you handle credit; think credit “behavioral analytics.” This particular category encompasses the following metrics and variables:
Payment information on various types of accounts, including credit cards, retail accounts, installment loans, and mortgages
The appearance of any adverse public records, such as bankruptcies, judgments, suits, and liens, as well as collection notices and delinquencies
Length of time for any delinquent payments
Amount of money still owed on delinquent accounts or collection items
Length of time since any delinquencies, adverse public records, or collection notices
Number of past-due items listed on a credit report
Number of accounts being paid as agreed
Credit Utilization. Thirty percent of the FICO credit score is based on a borrower's credit utilization; that is, the percentage of available credit that has been borrowed by that individual. The credit utilization calculation is composed of six variables:
The amount of debt still owed to lenders
The number of accounts with debt outstanding
The amount of debt owed on individual accounts
The types of loans
The percentage of credit lines in use on revolving accounts, like credit cards
The percentage of debt still owed on installment loans, like mortgages
Length of Credit History. Fifteen percent of the FICO credit score is based on the length of time each account has been open and the length of time since the account's most recent activity. FICO breaks down “length of credit history” into three variables:
Length of time the accounts have been open
Length of time specific account types have been open
Length of time since those accounts were used
New Credit Applications. Ten percent of the FICO credit score is based on borrowers' new credit applications. Within the new credit application category, FICO considers the following variables:
Number of accounts that have been opened in the past 6 to 12 months, as well as the proportion of accounts that are new, by account type
Number of recent credit inquiries
Length of time since the opening of any new accounts, by account type
Length of time since any credit inquiries
The re-appearance on a credit report of positive credit information for an account that had earlier payment problems
Credit Mix. Ten percent of the FICO credit score is based on repaying a variety of debt, which is a measure of the borrower's ability to handle a wide range of credit, including:
Installment loans, including auto loans, student loans, and furniture purchases
Mortgage loans
Bank credit cards
Retail credit cards
Gas station credit cards
Unpaid loans taken on by collection agencies or debt buyers
Rental data
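The category weights above (35, 30, 15, 10, and 10 percent) suggest how a composite score can be assembled: rate each category, weight it, and map the weighted total onto a fixed range. The Python sketch below does exactly that with a linear mapping onto 300 to 850 and labels the result using the 650 and 620 thresholds from the text. This is emphatically not FICO's formula, which is proprietary; the 0-to-1 category ratings and the linear mapping are assumptions made purely to illustrate the weighted-score concept.

```python
# Category weights as described in the text; everything else is hypothetical.
WEIGHTS = {
    "payment_history": 0.35,
    "credit_utilization": 0.30,
    "length_of_history": 0.15,
    "new_credit": 0.10,
    "credit_mix": 0.10,
}
SCORE_MIN, SCORE_MAX = 300, 850


def composite_score(ratings: dict[str, float]) -> int:
    """Weighted average of 0-1 category ratings, mapped linearly onto 300-850.
    Illustrative only: the real FICO model is proprietary and not this formula."""
    for name, value in ratings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} rating must be between 0 and 1")
    weighted = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    return round(SCORE_MIN + weighted * (SCORE_MAX - SCORE_MIN))


def describe(score: int) -> str:
    """Rough bands based on the thresholds mentioned in the text."""
    if score > 650:
        return "very good credit history"
    if score < 620:
        return "may find favorable financing difficult"
    return "in between"


applicant = {  # hypothetical 0-1 ratings for one loan applicant
    "payment_history": 0.9,
    "credit_utilization": 0.7,
    "length_of_history": 0.5,
    "new_credit": 0.8,
    "credit_mix": 0.6,
}
score = composite_score(applicant)
print(score, describe(score))  # prints "707 very good credit history"
```

Because payment history carries the largest weight, improving that one rating moves the composite more than any other single category, which mirrors the text's point that it is the most important factor.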
The point of showing all the details behind the FICO score calculation is to reinforce the basic concept (and power) of a score: a score can take into consideration a wide range of descriptive variables and metrics to create a single predictive number that can be used to support an organization's key decisions or, in the case of the FICO score, used by lenders to predict a loan applicant's likelihood to repay a loan. That's a very powerful concept. Scores are a critical concept in getting your business stakeholders to contemplate how they might want to integrate different variables and measures to create actionable, predictive scores to support their key business decisions.
Other Industry Score Examples
Different types of scores can be created to support decision making across a wide variety of industries. In fact, the ability to create actionable scores is limited only by the creative thinking of the business stakeholders; hence the importance of getting business stakeholders to “think like a data scientist.”
For example, here are some scores to consider for the financial services industry:
Retirement Readiness Score. This would be a score that measures how ready each client is for retirement. This score could include variables such as age, current annual income, current annual expenses, net worth, value of primary home, value of secondary homes, desired retirement age, desired retirement location (Iowa is a lot cheaper than Palo Alto!), number of dependent children, number of dependent parents, desired retirement lifestyle, and so forth.
Job Security Score. This score would measure the security of each individual's job. This score could include variables such as industry, job type, employer(s), job level/title, job experience, age, education level, skill sets, industry publications and presentations, Klout scores, and so on.
Home Value Stability Score. This score would measure the stability of the value of a particular house. This score could consider variables such as current value, supply/demand ratio of the area, house sales history, value of the house compared to comparable houses, tax assessment compared to comparable houses, whether it's a primary residence or rental residence, local price-to-rent ratio, local housing value trends (maybe pulled from Zillow), distance from a high school or junior high school, quality rating of that high school or junior high school, distance from shopping, and others.
Interestingly, combining the home value stability score with the FICO score could have provided a more holistic assessment of banks' housing market exposure prior to the 2007 financial market meltdown. The FICO score alone was insufficient for determining the level of housing market risk as financial organizations were writing mortgage loans. Coupling the FICO score with a home value stability score could have provided invaluable insights as banks decided to whom to make home mortgage loans and in which housing markets (e.g., deciding which housing markets were “over-valued”).
The key point in this mortgage market collapse example is that it is important to consider how multiple scores can provide different perspectives on the decision that is being evaluated. Using different scores can provide a more holistic assessment of the true conditions around which to make these key business decisions.
Table 10.1 shows additional scores from a variety of industries.
Table 10.1 Potential Scores for Other Industries
Industry              Potential Scores
Financial Services    FICO, Retirement Readiness, Investment Risk
Credit Cards          Attrition Risk, Fraud Risk, Product Preferences
Manufacturing         Equipment Maintenance, Supplier Reliability, Supplier Quality
Gaming/Hospitality    Player/Customer Lifetime Value, Gaming Preferences
Education             Graduation Readiness, Cohorts Influence
Healthcare            Wellness Condition, Stress Risk
Utilities             Energy Efficiency, Conservation Effectiveness
Pro Sports            Fatigue Factor, Motivation Factor
The purpose of the score technique is to look for groupings of common or similar variables and metrics that can be meshed together to create a score that can guide your decision making. These scores are a critical component of the “thinking like a data scientist” process. Scores can provide invaluable support for the decisions that you are trying to make, or for the actions or outcomes you are trying to predict, with respect to your targeted business initiative.
LeBron James Exercise Continued
Let's continue the LeBron James example that you started in Chapter 9. The exercise asked you to play the role of head coach for the National Basketball Association's (NBA's) Golden State Warriors in preparing to play the Cleveland Cavaliers in the 2015 NBA Championship Finals. Your job as the head coach of the Golden State Warriors is to craft a defensive plan and game strategy that maximizes your chances (or probability) of winning the series by minimizing the shooting and offensive effectiveness of Cleveland's superstar, LeBron James.
We used the “By” analysis technique in Chapter 9 to tease out a variety of variables and metrics that might be predictors of LeBron James's shooting prowess. Below is the list of the variables that came out of that “By” analysis process.
I want to [know] [LeBron James's shooting percentage] by…
At home versus on the road
Number of days rest
Shot area
Opposing team
Defender
Game location
Game location elevation
Game time weather
Game time temperature
Game time humidity
Time (hours) since last game
Average time of ball possession
Time left in game
Total minutes played in game
Number of shots attempted
Number of shots made
Location of shots attempted
Location of shots made
Volume of boos
Number of fouls
Number of assists
Playing a former team
Time of day
Record of opponent
Feelings toward opponent
Performance in last game
Number of negative Twitter comments
Stadium temperature
Stadium humidity
Number of fans in attendance
Number of LeBron jerseys in attendance
Next we want to understand the decisions that the Golden State Warriors coaching staff needs to make in crafting a defensive strategy against LeBron James. Chapter 8 introduced the recommendations worksheet as a tool to link the key business decisions to the recommendations and the supporting scores (see Figure 10.3).
Figure 10.3 Recommendations worksheet
In the “mitigate LeBron James's offensive effectiveness” business initiative, some of the key decisions that the Golden State Warriors coaching staff needs to make are:
Who is going to guard LeBron?
What is the best individual defensive strategy against LeBron?
What is the best team defensive strategy against LeBron?
Next, you want to identify the recommendations you could deliver in support of those key decisions. For example, for the “Who is going to guard LeBron?” decision, you might want to make the following recommendations:
Which defender?
Which defender at which times of the game?
Which defender in which game situations?
Figure 10.4 shows the updated recommendations worksheet.
Figure 10.4 Updated recommendations worksheet
Now you want to review the variables and metrics that came out of the “By” analysis and look for common groupings. For example, the following variables and metrics that came out of the “By” analysis relate to how “fatigued” LeBron might be at any point in the game:
Hours since last game
How many games played in the season
Average number of minutes played per game
Minutes played in the current game
Minutes handling the ball in the current game
Number of shots taken in the current game
Time remaining in the current game
Away or home game
These variables and metrics could be combined into a fatigue score that measures how tired or exhausted LeBron is at any point in the game. The fatigue score is created from a combination of historical metrics (number of games played in the season so far, average number of minutes played) and real-time, in-game metrics (minutes played in the game, number of shots taken in the game, minutes handling the ball in the game). Updating LeBron's fatigue score throughout the game (since many of the supporting metrics change during the game) can lead to in-game recommendations about defenders, individual defensive strategy, and team defensive strategy.
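One simple way to build such a score is to rescale each metric onto a common 0-to-1 range and take a weighted average, recomputing it whenever the in-game metrics change. The Python sketch below follows that approach. The chosen metrics (only five of those listed, for brevity), their assumed minimum and maximum values, the weights, and the game-state numbers are all hypothetical placeholders that a data science team would replace with values validated against actual performance data.

```python
# Hypothetical (min, max, weight) per metric; not validated NBA figures.
FATIGUE_METRICS = {
    "games_played_in_season": (0, 82, 0.15),
    "avg_minutes_per_game": (0, 48, 0.15),
    "minutes_played_tonight": (0, 48, 0.30),
    "minutes_handling_ball_tonight": (0, 24, 0.20),
    "shots_taken_tonight": (0, 35, 0.20),
}


def normalize(value: float, low: float, high: float) -> float:
    """Min-max scale to 0-1, clamping values outside the assumed range."""
    return min(max((value - low) / (high - low), 0.0), 1.0)


def fatigue_score(metrics: dict[str, float]) -> float:
    """Weighted average of normalized metrics, reported on a 0-100 scale."""
    total_weight = sum(w for _, _, w in FATIGUE_METRICS.values())
    weighted = sum(
        w * normalize(metrics[name], low, high)
        for name, (low, high, w) in FATIGUE_METRICS.items()
    )
    return round(100 * weighted / total_weight, 1)


# Historical metrics stay fixed during a game; in-game metrics are updated.
state = {"games_played_in_season": 70, "avg_minutes_per_game": 36,
         "minutes_played_tonight": 12, "minutes_handling_ball_tonight": 5,
         "shots_taken_tonight": 8}
q1_score = fatigue_score(state)
print("end of Q1:", q1_score)

state.update(minutes_played_tonight=40, minutes_handling_ball_tonight=18,
             shots_taken_tonight=30)
q4_score = fatigue_score(state)
print("late Q4:", q4_score)
```

Because the historical inputs are fixed before tip-off, only the in-game terms move the score during the game, which is what allows the recommendations to change as the game unfolds.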
A “Motivation” score could be created out of the following variables and metrics:
In-game performance
Record of opponent
Defender guarding him
Volume of boos
Playing against a former team
Number of LeBron jerseys in the stands
The motivation score would be a measure of how “motivated” LeBron is for this particular game, and how hard he is willing to push himself to get the win when he gets tired. The motivation score, when combined with the fatigue score, can lead to in-game recommendations about defenders, individual defensive strategy, and team defensive strategy. Figure 10.5 shows the final version of the recommendations worksheet.
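One simple way to combine the two scores into an in-game recommendation is a lookup over four quadrants (tired or fresh, motivated or not). The sketch below assumes both scores are on a 0 to 100 scale; the 50-point cutoffs, the function name `defensive_recommendation`, and the recommendation wording are hypothetical illustrations, not coaching advice from the book.

```python
def defensive_recommendation(fatigue: float, motivation: float,
                             cutoff: float = 50.0) -> str:
    """Map fatigue and motivation scores (0-100) to a coarse recommendation.

    The cutoff and the returned wording are hypothetical placeholders.
    """
    tired = fatigue >= cutoff
    driven = motivation >= cutoff
    if tired and not driven:
        return "standard matchup; no extra help"
    if tired and driven:
        return "physical primary defender; help on drives"
    if driven:  # fresh and motivated
        return "best defender plus double teams"
    return "standard team defense"  # fresh and not especially motivated
```

A lookup like this is easy for coaches to read; a data science team might later replace the fixed cutoffs with thresholds learned from game data.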
Figure 10.5 Completed recommendations worksheet
It is interesting how the combination of multiple minor metrics has the potential to yield a much more actionable and predictive score. This process of uncovering and grouping metrics and variables into higher-level scores is highly iterative, with lots of trial and error as the data science team tries to validate which combinations of metrics and variables are actually better predictors of performance.
Foot Locker Example Continued
Throughout Chapters 8 and 9, you applied “thinking like a data scientist” techniques and concepts in an exercise based on Foot Locker. You will now complete the Foot Locker exercise by pulling everything together to identify and create actionable scores that help Foot Locker “improve merchandising effectiveness.”
In Chapter 9 we conducted the “By” analysis for Foot Locker's “improve merchandising effectiveness” initiative. The results of the “By” analysis for one customer question are shown in the following list:
Age
Gender
Marital status
Number of children
Length of marriage
Income level
Education level
VIP loyalty card member
VIP member length of time
VIP rewards expired (%)
VIP rewards expired ($)
Own or rent residence
Tenure in current home
Value of current home
Favorite sports
Favorite sports teams
High school sports interest
College sports interest
Active athlete
Type of athletic activity
Exercise minutes per week
Number of days per week exercised
Glancing over the different metrics and variables that came out of that “By” analysis, you want to look for common groupings. For example:
You could group metrics and variables such as “VIP member,” “Length of time (tenure) as a VIP member,” “Frequency of use of VIP card,” “Frequency of redeeming reward points,” and “Percentage of expired rewards” into a “Customer Loyalty” score.
You could group metrics and variables such as “Favorite sports,” “Favorite sports teams,” “High school sports team supporter,” “College sports team supporter,” and “Amount of team-branded apparel purchased” into a “Sports Passion” score.
Finally, you could group metrics and variables such as “Active athlete,” “Type of athletic activity,” “Frequency of athletic activity,” “Average weekly amount of athletic activity,” and “Wears health monitor” into an “Athletic Activity” score.
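The groupings above can be captured as a simple mapping from score name to member variables. The following minimal sketch assumes each variable has already been scaled to the 0 to 1 range by the data science team (categorical items such as “Type of athletic activity” would need an encoding first) and uses an unweighted average; the snake_case variable keys and the `group_scores` function are hypothetical.

```python
from statistics import mean

# Groupings from the text; variable keys are hypothetical snake_case names.
SCORE_GROUPS = {
    "Customer Loyalty": [
        "vip_member", "vip_tenure", "vip_card_use_frequency",
        "reward_redemption_frequency", "pct_expired_rewards",
    ],
    "Sports Passion": [
        "favorite_sports", "favorite_sports_teams", "high_school_supporter",
        "college_supporter", "team_apparel_purchased",
    ],
    "Athletic Activity": [
        "active_athlete", "athletic_activity_type", "activity_frequency",
        "weekly_activity_amount", "wears_health_monitor",
    ],
}

# Variables where a higher value should lower the score (assumption).
INVERTED = {"pct_expired_rewards"}


def group_scores(customer: dict[str, float]) -> dict[str, float]:
    """Average each group's 0-1 variables into a 0-100 score.

    Variables missing for this customer are skipped; a group with no
    observed variables is left out of the result.
    """
    scores = {}
    for score_name, variables in SCORE_GROUPS.items():
        values = [
            1.0 - customer[v] if v in INVERTED else customer[v]
            for v in variables
            if v in customer
        ]
        if values:
            scores[score_name] = round(100 * mean(values), 1)
    return scores
```

For example, a customer with `{"vip_member": 1.0, "vip_tenure": 0.5, "pct_expired_rewards": 0.2, "active_athlete": 1.0, "weekly_activity_amount": 0.6}` gets a Customer Loyalty score of 76.7 and an Athletic Activity score of 80.0, with no Sports Passion score until that data is collected.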
Figure 10.6 shows the results of the grouping of metrics and variables into actionable scores about Foot Locker's customers. You would want to do a similar exercise for Foot Locker's other key business entities such as products and stores.
Figure 10.6 Potential Foot Locker customer scores
Finally, let's pull everything together into a recommendations worksheet that highlights how you might use the Foot Locker customer scores to help guide your merchandising decisions (see Figure 10.7).
Figure 10.7 Foot Locker recommendations worksheet
The brainstorming of the different metrics and variables using the “By” analysis technique and the subsequent grouping of the resulting metrics and variables into common scores is probably the most enjoyable part of the “thinking like a data scientist” process. You are free to apply your creative juices to brainstorm data sources and metrics that might be used as part of your score. Again, no idea is a bad idea. Let the data science team decide via its analytic modeling which data sources and metrics are the best predictors of business performance.
But how do you put these scores or analytics into action? How does an organization like Foot Locker leverage these scores to improve its customer engagement and merchandising decisions?
One example might be how the Foot Locker marketing stakeholders use the scores to prioritize their customer offers and promotions. For example, today most organizations determine the customer lifetime value (CLTV) based on the previous 12 to 18 months of sales (see Figure 10.8).
Figure 10.8 CLTV based on sales
The goal of the CLTV score is to help marketing and store personnel determine the “value” of a customer, which can subsequently be used to determine who gets what sorts of offers. Unfortunately, since the sales numbers are a historical perspective on spend and value, most organizations just create a rule of thumb that all customers get a flat rebate (5 percent) on whatever they spend. Boring.
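The sales-based CLTV and the flat rebate described above take only a few lines to compute, which is part of why they are so common. This minimal sketch assumes a purchase history of `(date, amount)` pairs and approximates a month as 30 days; the function names `sales_based_cltv` and `flat_rebate` are hypothetical.

```python
from datetime import date, timedelta


def sales_based_cltv(purchases: list[tuple[date, float]], as_of: date,
                     window_months: int = 12) -> float:
    """Sum purchases in the trailing window (a month is approximated as 30 days)."""
    start = as_of - timedelta(days=30 * window_months)
    return sum(amount for day, amount in purchases if start < day <= as_of)


def flat_rebate(spend: float, rate: float = 0.05) -> float:
    """The one-size-fits-all rule of thumb: the same rebate rate for everyone."""
    return round(spend * rate, 2)
```

Note that both functions look only backwards at what the customer already spent; nothing in them estimates what the customer could spend, which is the gap the next idea addresses.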
However, what if you leveraged the Customer Loyalty and the Athletic Activity scores to create a maximum customer lifetime value (MCLTV) score that predicts which customers might have untapped sales potential and to which types of promotions or offers they might be most responsive (see Figure 10.9)?
Figure 10.9 More predictive CLTV score
You could use this MCLTV score to guide key business decisions such as:
Which customers get what sorts of promotions (in order to capture more of each customer's untapped potential value)
What sorts of special events to offer to which customers (in order to drive loyalty and increase each customer's MCLTV)
Which stores get a higher allocation of popular products based on the store maximum lifetime value score (where the “store maximum lifetime value” score is the sum of the “MCLTV” scores for the customers who come to that store on a regular basis)
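The store maximum lifetime value score is defined above as a sum, so it translates directly into code. This minimal sketch takes each customer's MCLTV as given, treats a customer as a “regular” at a store after a hypothetical minimum of four visits, and then splits a popular product's units across stores in proportion to their scores. The function names and the visit cutoff are assumptions for illustration.

```python
from collections import defaultdict


def store_max_lifetime_value(mcltv: dict[str, float],
                             visits: dict[tuple[str, str], int],
                             min_visits: int = 4) -> dict[str, float]:
    """Sum MCLTV over each store's regular customers.

    visits maps (customer_id, store_id) -> visit count; "regular" means at
    least min_visits visits (a hypothetical cutoff).
    """
    totals: dict[str, float] = defaultdict(float)
    for (customer, store), count in visits.items():
        if count >= min_visits and customer in mcltv:
            totals[store] += mcltv[customer]
    return dict(totals)


def allocate_units(store_scores: dict[str, float], units: int) -> dict[str, int]:
    """Split units in proportion to store scores; leftovers go to top stores."""
    total = sum(store_scores.values())
    if total == 0:
        return {store: 0 for store in store_scores}
    alloc = {s: int(units * v / total) for s, v in store_scores.items()}
    leftover = units - sum(alloc.values())  # fewer than len(store_scores)
    for store in sorted(store_scores, key=store_scores.get, reverse=True)[:leftover]:
        alloc[store] += 1
    return alloc
```

Proportional allocation is only one possible policy; a merchandising team could just as easily rank stores by score and fill the top ones first.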
Hopefully this is a simple but powerful example of how to leverage scores to create higher-level maximum value scores that can be used to drive the analytics into action (via decisions) across the organization.
Summary
As you complete the “thinking like a data scientist” process, you can see how scores are a very important and actionable concept for business stakeholders who are trying to envision where and how data science can improve their decision making in support of their key business initiatives. As you saw from the FICO score example, scores aid in decision making by predicting the likelihood of certain actions or outcomes (e.g., likelihood to repay a loan, in the case of the FICO score).
The beauty of a score is its ability to integrate a wide range of variables and metrics into a single number. The power of the score is the ability to start small and then constantly search for new metrics and variables that might yield better predictors of performance.
Simple but powerful, exactly what big data and data science should strive to be.
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Take the results from the “By” analysis conducted in Chapter 9 for your selected business initiative and look for common groupings or potential scores. It may be easier to write each of the metrics and variables onto a separate Post-it note and place them on a flip chart or whiteboard. That will make it easier to move the metrics and variables around as you look for common groupings or potential scores.
Exercise #2: Complete the recommendations worksheet for your selected business initiative. Validate that the scores uncovered in Exercise #1 support the decisions and recommendations that your selected business initiative requires.
Exercise #3: Contemplate how you might create a maximum lifetime value score that could be used to support the key decisions that you are trying to make about your targeted business initiative. I think you will find that the maximum lifetime value score can be used to prioritize spend and focus in business initiatives as diverse as marketing effectiveness, patient care, teacher retention, predictive maintenance, revenue protection, and network optimization.
Notes
1 FICO is a software company based in San Jose, California, and founded by Bill Fair and Earl Isaac in 1956. Its FICO score, a measure of consumer credit risk, has become the standard for measuring a consumer's ability to repay a loan in the United States.
2 http://tightwadtravelers.com/check-fico-credit-score-free
3 FICO's five factors: The components of a FICO credit score (http://www.creditcards.com/credit-card-news/help/5-parts-components-fico-credit-score-6000.php).
Chapter 11 Monetization Exercise
Sometimes it is useful to work backwards in the “thinking like a data scientist” process. You can do this by first identifying the potential recommendations that the organization could deliver to its customers and frontline employees, and then working backwards to identify the supporting data and analytic requirements.
This chapter introduces a technique called the monetization exercise that seeks to understand how the organization's products or services are used by its customers, and then to identify how the customer and product usage data can be used to create new monetization opportunities. The process works backwards to uncover the metrics, variables, data, and analytic techniques that you might need to support the new monetization opportunities.
The monetization exercise provides an opportunity to uncover new product and/or service opportunities through the identification and delivery of new customer and frontline employee recommendations. The monetization exercise works by first understanding the product usage patterns and customer usage behaviors associated with a particular product or service. The process then seeks to identify complementary or secondary recommendations that can be packaged and delivered along with that product or service (think the Data Monetization phase of the Big Data Business Model Maturity Index). Following is the monetization exercise process:
Step 1: Understand product usage characteristics and behaviors
Step 2: Develop personas for each customer type (including key decisions and pain points)
Step 3: Brainstorm potential customer recommendations
Step 4: Identify supporting data sources
Step 5: Prioritize monetization opportunities (revenue)
Step 6: Develop monetization plan
To get comfortable with this technique, you're going to use the monetization exercise to uncover new monetization opportunities for my new fitness tracker, a wearable device that monitors and provides feedback on my running and walking activities. The goal of this particular monetization exercise is to identify complementary or new monetization opportunities, including:
New products and/or services that can be sold to existing customers
New products and/or services that can be used to acquire new customers
New revenue opportunities for the fitness tracker manufacturer's current channel partners (e.g., Sports Authority, Dick's Sporting Goods, Foot Locker)
New markets associated with fitness, exercise, and even potentially wellness
New audiences who might find the new fitness and wellness services compelling
New channels through which to sell the fitness tracker and the associated new services
Let's see the monetization exercise in action!
Fitness Tracker Monetization Example
In trying to stay true to my annual New Year's resolution to live a healthier and more athletically fit life, I was thinking about upgrading my current fitness tracker. The most important requirements for my ideal fitness tracker are the ability to add GPS tracking and new performance metrics to my workouts. In thinking about the fitness tracker marketplace, I saw that there seem to be lots of opportunities for fitness tracker manufacturers to provide additional products and services that would make their fitness trackers more valuable to the consumer, as well as provide dramatic business benefits to the fitness tracker manufacturer and its channel partners.
Let's walk through an example to see how the fitness tracker manufacturer could leverage the monetization exercise to create new products and services and uncover new monetization opportunities.
Step 1: Understand Product Usage
The first step in the monetization exercise is for the fitness tracker manufacturer to understand the key features and capabilities of the product or service being analyzed. For example, my ideal fitness tracker would have the following functionality:
Provides a complete history of my workouts including my start and finish times, time elapsed, distance, pace, and calories burned
Measures my current speed, distance, time elapsed, pace, and calories burned
Has a built-in GPS that delivers accurate speed and distance data readings and maps my workout
Monitors my heart rate
Records up to 50 runs and my personal bests
Enables me to easily review and analyze my workout history
Allows me to download my workout data for more detailed analysis (yeah, I know, I'm a nerd)
Delivers recognition alerts when I beat a personal record
Integrates performance results easily into my different social media networks (and supports gamification so I can rank my performance results versus those of my friends)
What is critically important to the fitness tracker manufacturer is how the fitness tracker is used and the decisions I am trying to make associated with those usage behaviors. From my own personal experience, the fitness tracker encourages different usage behaviors such as:
Encourages me to take more walks, including a lot more with the dog (poor Puffer)
Encourages me to take the stairs instead of the escalator
Encourages me to walk around the airport terminal as I wait for my delayed flight to finally depart
Encourages me to park farther away from stores or restaurants so that I have longer to walk, or to walk to the farthest bathroom in the mall just to build up my steps
Encourages me to ride my bike instead of driving the car for short trips
These behavioral changes, and the decisions associated with those behaviors (e.g., what shoes to wear, what running routes to take, how long to run, with whom to run), provide new monetization opportunities, which means that organizations need to go the extra mile to truly understand not only how their product is used but also the personal behaviors that are associated with their product usage.
Step 2: Develop Stakeholder Personas
Step 2 is for the fitness tracker manufacturer to identify and understand its different customer types. Identifying and understanding the organization's different customer types is a process that was covered in Chapter 8. For each of the customer types, the manufacturer would want to create a separate persona that captures the customers' tasks, decisions, and associated pain points with respect to their usage of the fitness tracker.
Figure 11.1 shows a persona for a key customer type that I have labeled the “Spirited Runner” (or at least that's how I would classify myself).
Figure 11.1 “A day in the life” customer persona
As you have seen in the use of personas in the previous chapters, the “day in the life” persona seeks to provide a baseline understanding of the tasks, decisions, and pain points associated with the usage of the product or service. For example, in Figure 11.1, the “Spirited Runner” persona has the following decisions to contemplate for the “early morning run around the neighborhood to jump-start the day” task:
What running shoes do I wear?
What gear do I wear?
How long do I run?
What route do I run?
Do I run alone or with a friend?
As you know from the previous exercises in this book, understanding the decisions that the key users or business stakeholders are trying to make is critical to uncovering new monetization opportunities.
There are likely other decisions that could be captured for this persona, so it is worth the extra effort to put yourself in the person's shoes to better understand the decisions he or she is trying to make and the associated pain points.
Personas could also be developed for additional types of runners, such as:
Extreme runner (runs marathons, Ironman contests, and adventure races)
Occasional runner (runs a couple of times a week but is not very serious about running)
Reluctant runner (runs only at the beginning of each new year as part of his or her New Year's resolutions)
However, there are some other important business stakeholders for whom the fitness tracker manufacturer would want to create additional personas. Those additional business stakeholder personas include:
Fitness tracker manufacturer product development (which could also include product management and product marketing for completeness)
Fitness tracker manufacturer sales and marketing
Fitness tracker manufacturer channel partners (Foot Locker, Sports Authority, Big Five, Dick's)
As a homework exercise, you will be asked to create personas for one of these additional business stakeholders.
Step 3: Brainstorm Potential Recommendations
Step 3 is to brainstorm potential recommendations that could be delivered to each business stakeholder. That is, what recommendations could the organization deliver to the different stakeholders that benefit or support the stakeholders' decisions? There are two angles that you can leverage to help uncover potential recommendations:
Understand the decisions the different stakeholders need to make and the associated pain points, and contemplate recommendations that might support the decisions and/or help to address the associated pain points
Leverage your observations about the personal behavioral changes induced by the fitness tracker to identify other potential recommendations
You could use an old-fashioned facilitated brainstorming session (complete with lots of Post-it notes) to brainstorm potential recommendations for each of the key business stakeholders from the perspectives of the decisions that they are trying to make and the associated behavioral changes.
Table 11.1 shows some potential recommendations that the fitness tracker manufacturer could deliver to the customer persona based on the decisions that the customer is trying to make and the desired behavioral changes.
Table 11.1 Potential Fitness Tracker Recommendations
Decision: What running shoes do I wear?
Potential recommendations:
Optimal running shoes given the consumer's running and walking behaviors, patterns, tendencies, routes, and physical attributes
When to replace running shoes given how much the consumer has run on those particular shoes, how frequently the consumer runs, the type of terrain on which the consumer runs, and current “wear and tear” of the shoes
Running accessories or apparel such as special running socks, thermal tights, stocking caps, and gloves for the cold weather when I travel to Iowa to visit my son
Decision: How long do I run?
Potential recommendations:
New performance metrics such as elevation covered, workout effort level, circuit training metrics, calories burned, etc.
Decision: What route do I run?
Potential recommendations:
New local running routes near the runner's home or favorite running routes
New traveling running routes when the consumer is traveling to other areas of the country
Decision: Do I run alone or with a friend? (see note 1)
Potential recommendations:
Potential running partners based on social media contacts, running tendencies, and running locations
Step 4: Identify Supporting Data Sources
Step 4 is to brainstorm the different data sources that one might need in order to create the recommendations.
NOTE
I use the term “might” frequently to convey that an important part of the exercise is to not pass judgment on the value or viability of the brainstormed data sources. You want to collect any and all ideas regarding potential data sources. All ideas are worthy of consideration. Determining the value or viability of the data source during the brainstorming process only inhibits the creative thinking process. We will determine the value and viability of the data sources later.
Table 11.2 provides an example of some of the data sources that you might want to consider to support the development of the recommendations.
Table 11.2 Recommendation Data Requirements
Key Stakeholder: End Consumer
Potential recommendation: Optimal running shoes
Potential data sources:
Exercise data: performance data about my exercises including length of time, effort level, calories burned, distance covered, points earned, etc.
Workout GPS data: data about my workout route including a map of the route, route terrain, elevation, time of day, etc.
Weather data: data about the weather conditions during my workout including temperature, precipitation, humidity, etc.
Runner data: weight, height, age, gender, body mass index, shoe size, width of foot, high/low arch, preferred terrain type, etc.
Potential recommendation: When to replace running shoes
Potential data sources:
Shoe data: detailed data about my shoes including manufacturer, brand, type of shoe, size of shoe, when shoe was bought, where shoe was bought, where shoe was made, user reviews, etc. Note: the fitness tracker could provide an option that allows the runner to take a photo of the shoes and the app automatically provides data about the condition of the shoe.
Workout GPS data: data about my workout route including a map of the route, route terrain, elevation, time of day, etc.
Shoe wear data: ask consumer to take periodic photos of the soles in order for the manufacturer to track shoe wear and tear
Potential recommendation: Running accessories
Potential data sources:
Inventory of my running accessories: brand, type, size, where I bought it, when I bought it, what I bought it with
Running accessories usage data: what I wear in what conditions, what I wear in combination with other workout items
Potential recommendation: New performance metrics
Potential data sources:
Allow users to create and share their own calculations and performance metrics (Schmarzo Performance Index = INTEGER(Steps/1000) + INTEGER(FuelPoints/1000))
Allow users to download the data to create and share new reports and analytics
Integrate fitness tracker data with other exercise apps like MapMyFitness or MyFitnessPal
Potential recommendation: New local running routes
Potential data sources:
Analyze GPS and exercise data across all fitness tracker users in order to identify new routes in which I might be interested
Integrate third-party apps like MapMyFitness and MyFitnessPal for capturing additional route, exercise, and workout data
Potential recommendation: New running routes while traveling
Potential data sources:
Collect all running and walking routes across all fitness tracker customers by location and exercise type (light walking, heavy running, etc.)
Match my running and walking tendencies to the collection of running and walking routes in order to make new route recommendations
Potential recommendation: Potential running partners
Potential data sources:
Social media contacts from Facebook, Twitter, Instagram, etc.
Relevant social media posts from my social media friends about their running behaviors and patterns and exercise habits
Current location of my social media contacts (in order to make real-time running partner recommendations)
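The “Schmarzo Performance Index” in Table 11.2 is an example of the kind of user-defined metric the fitness tracker could let users create and share. Here is a minimal sketch, assuming a workout is a dictionary of raw counts and reading `INTEGER()` as truncation (for non-negative counts that is the same as Python's floor division `//`). The `SHARED_METRICS` registry and the dictionary keys are hypothetical, not a real fitness tracker API.

```python
from typing import Callable

Workout = dict[str, float]


def schmarzo_performance_index(w: Workout) -> int:
    """INTEGER(Steps/1000) + INTEGER(FuelPoints/1000) for non-negative counts."""
    return int(w["steps"]) // 1000 + int(w["fuel_points"]) // 1000


# Hypothetical registry of user-created, shareable metrics.
SHARED_METRICS: dict[str, Callable[[Workout], float]] = {
    "Schmarzo Performance Index": schmarzo_performance_index,
}


def evaluate_all(w: Workout) -> dict[str, float]:
    """Compute every registered metric for one workout."""
    return {name: fn(w) for name, fn in SHARED_METRICS.items()}
```

For a day with 12,450 steps and 3,200 fuel points, the index is 12 + 3 = 15. Storing metrics as named functions is what would make them easy to share across the fitness tracker community.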
Step 5: Prioritize Monetization Opportunities
Step 5 is focused on prioritizing the recommendations from the perspectives of business value and implementation feasibility. For this exercise, you will use the prioritization matrix (which is covered in detail in Chapter 13), but with three dimensions:
Value of the recommendation to the consumer
Value of the recommendation to the fitness tracker manufacturer
Implementation feasibility of the recommendation over the next 9 to 12 months (based on the availability of the supporting data)
Walking through a facilitation process to explore and triage these three dimensions is hard to do in a book; however, you can leverage brainstorming and polling techniques to get a high-level ranking or rating for the answers to these three dimensions, as seen in Table 11.3.
Table 11.3 Recommendations Value Versus Feasibility Assessment
Recommendation                          Consumer Value   Manufacturer Value   Feasibility
A. Optimal new running shoes (note 2)   Medium           High                 High
B. When to replace running shoes        High             High                 Low
C. New local routes                     High             Low                  Medium
D. Running partners                     Medium           Low                  Low
E. Running apparel                      Medium           High                 High
F. Routes when traveling                Low              Low                  Medium
G. New running metrics                  High             Low                  Medium
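Before plotting, it can help to turn the High/Medium/Low ratings in Table 11.3 into numbers. The sketch below transcribes the table, maps Low/Medium/High to 1/2/3, and ranks recommendations by the unweighted sum of the three dimensions; it also pulls out “quick wins” that are high in manufacturer value and feasibility. The numeric mapping, the equal weighting, and the function names are assumptions for illustration, not part of the prioritization matrix itself.

```python
LEVEL = {"Low": 1, "Medium": 2, "High": 3}

# (consumer value, manufacturer value, feasibility), transcribed from Table 11.3
ASSESSMENT = {
    "A": ("Medium", "High", "High"),
    "B": ("High", "High", "Low"),
    "C": ("High", "Low", "Medium"),
    "D": ("Medium", "Low", "Low"),
    "E": ("Medium", "High", "High"),
    "F": ("Low", "Low", "Medium"),
    "G": ("High", "Low", "Medium"),
}


def priority(ratings: tuple[str, str, str]) -> int:
    """Unweighted sum of the three ratings (an assumed scoring rule)."""
    return sum(LEVEL[r] for r in ratings)


def ranked() -> list[str]:
    """Highest total first; Python's sort keeps table order for ties."""
    return sorted(ASSESSMENT, key=lambda rec: priority(ASSESSMENT[rec]), reverse=True)


def quick_wins() -> list[str]:
    """High manufacturer value that is also highly feasible today."""
    return [rec for rec, (_, mfr, feas) in ASSESSMENT.items()
            if mfr == "High" and feas == "High"]
```

This naive sum ranks A and E first, which only partly matches Phase 1 of the roadmap later in the chapter; that roadmap also weighs sequencing and capability building, which a single additive score does not capture.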
Since three dimensions don't work very well on a two-dimensional sheet of paper, you will leverage a visualization technique (shade of the dots) that allows you to mimic three dimensions in a two-dimensional environment. Figure 11.2 shows what the prioritization process might yield.
Figure 11.2 Fitness tracker prioritization
Step 6: Develop Monetization Plan
As you can see from Figure 11.2, deciding on the “right” monetization opportunity is not always straightforward. The consumers prefer recommendations B (when to replace running shoes), C (recommending new routes), and G (creating new metrics), but only recommendation B is of high value to the fitness tracker manufacturer (since it leads to more direct sales). And unfortunately, recommendation B isn't easy from an implementation feasibility perspective, since it requires significant consumer-provided data.
Figure 11.3 Monetization roadmap
Oh, what is one to do?
Maybe, as in a chess game, the answer lies a couple of moves beyond the obvious. Maybe the fitness tracker manufacturer would be best served to think about a roadmap that looks like Figure 11.3.
The monetization roadmap would look like this:
Phase 1 would focus on recommendations A, C, and E in order to build consumer interest in the fitness tracker products and start to collect more data about runners and their running behaviors (using consumer running behavioral and next best offer analytics).
Phase 2 would then deliver recommendation D, which allows the fitness tracker manufacturer to build up its expertise in social media analysis in identifying and recommending potential running partners (using cohorts analysis).
Phase 3 would then focus on recommendation B, which has the highest value to the fitness tracker manufacturer and builds on the analytic expertise that it developed in phase 1 to move into the area of predictive maintenance and product replacement analytics.
Finally, phase 4 would deliver recommendation G, which fosters community by building and sharing new performance calculations, metrics, analytics, and reports among fitness tracker community members.
This monetization roadmap has three big benefits for the fitness tracker manufacturer:
Captures more and more data about runners' usage behaviors, patterns, and tendencies
Captures more data about product usage and wear
Gradually builds up the organization's data science capabilities in areas such as consumer behavioral analytics, next best offer, cohorts analysis, predictive maintenance, and product replacement analytics
Summary
This chapter introduced the monetization exercise as complementary to the “thinking like a data scientist” process to help organizations uncover new product and/or service opportunities through the identification of new customer and employee recommendations. The monetization exercise is a non-technology, business-centric, organizational-alignment technique that uses the following process to uncover new monetization opportunities (phase 4 of the Big Data Business Model Maturity Index):
Identify and understand how customers use your products and/or services
Identify and understand key business stakeholders (customers, frontline employees, partners) including their key tasks, decisions, and associated pain points
Brainstorm the types of recommendations that you could deliver to the stakeholders based on their usage of the product or service
Identify the different data sources that might help support the recommendations
Go through a valuation process where you contemplate three key variables for each recommendation: value of the recommendation to the customer, value of the recommendation to the manufacturer, and implementation feasibility
Look for opportunities to cluster recommendations into similar groups in order to create a monetization roadmap
Homework Assignment
Use the following exercises to apply what you learned in this chapter:
Exercise #1: Develop a persona for one of your organization's key customers (stakeholders). Be sure to carefully contemplate that customer's key tasks, decisions, and associated pain points. I strongly recommend using the same template used in Figure 11.1.
Exercise #2: Brainstorm the recommendations your organization could deliver to that customer based on the customer's key decisions. Be sure to take into consideration the pain points as you brainstorm the recommendations. Use the template used in Table 11.1.
Exercise #3: Brainstorm the potential data sources for each of the identified recommendations. Again, all data source ideas are worthy of consideration, and you'll determine the value and feasibility of the different data sources later. Use the format in Table 11.2 to capture the data sources.
Exercise #4: Prioritize the recommendations from the perspectives of the value of the recommendation to the customer, the value of the recommendation to your organization, and the implementation feasibility over the next 9 to 12 months.
Exercise #5: Cluster the recommendations into similar or logical groups to create a monetization plan.
Notes
1 How about buying a fitness tracker for your dog's collar with an app that can tell you whether or not your dog needs exercise, what type, how much, etc.? That would be another product and service that, when coupled with data about your dog's breed, age, health, etc., could yield a more “fit” dog. You could call the product and service “FitBark” (hehehe).
2 Providing recommendations on optimal running shoes and running apparel creates new monetization opportunities from co-marketing with sporting shoe and apparel manufacturers.
Chapter 12 Metamorphosis Exercise
Reaching the Business Metamorphosis phase of the Big Data Business Model Maturity Index is a significant accomplishment for most organizations. Even just contemplating what this end point might look like can be quite beneficial in the development of an organization's big data initiative. Beginning with the end in mind, to quote Stephen Covey, can not only help the organization's leaders envision the potential of big data from a business transformation perspective but also pragmatically help the organization identify where and how to start its big data journey.
In working with organizations to measure how effectively they leverage data and analytics within their key business processes using the Big Data Business Model Maturity Index (see Figure 12.1), I created an exercise to help organizations envision what the business metamorphosis might look like. While it's not possible to start your big data journey at this phase, the exercise has helped my clients identify, prioritize, and develop their big data use cases.
Figure 12.1 Big Data Business Model Maturity Index
Business Metamorphosis Review
As a refresher, the Business Metamorphosis phase is where organizations seek to leverage data, analytics, and the resulting analytic insights to transform the organization's business models. This includes areas such as business processes, organizational structures, products and services, partnerships, target markets, management, promotions, rewards and incentives, and others. The Business Metamorphosis phase is where organizations integrate the insights that they captured about their customers' usage patterns, product performance behaviors, and overall market trends to transform their business models. This business model metamorphosis might enable the organization to provide new services and capabilities to its customers in a way that is easier for them to consume. Perhaps it could enable third-party developers to proliferate on the organization's foundational platform, or facilitate the organization's engagement in higher-value and more strategic services.
The Business Metamorphosis phase necessitates a major shift in the organization's core business model, driven by the analytic insights gathered as the organization traverses the Big Data Business Model Maturity Index.
Here are some examples of what organizations could do to leverage data, analytics, and the resulting analytic insights to metamorphose their business models:
Jet engine manufacturer transforming from selling jet engines to selling “thrust” and related high-value services to the airlines around service level agreements (on-time departures, on-time arrivals), product maintenance (minimizing aircraft downtime), insurance, warranties, and upgrading product performance over time (improving fuel efficiency).
Farm equipment manufacturer transforming from selling farm equipment to selling “farming yield optimization” to farmers by leveraging superior insights into seeds, soil conditions, weather, fertilizers, pesticides, irrigation techniques, and projected crop prices.
Energy companies moving into the “Home Energy Optimization” business by recommending when to replace appliances (based on predictive maintenance) and even recommending which appliance brands and models to buy based on the performance of different appliances, taking into consideration your usage patterns, local weather, local water quality and local water conservation efforts, and energy costs.
Airlines moving into the “Travel Delight” business of not only offering discounts and upgrades on air travel based on customers' travel behaviors and preferences but also proactively recommending deals on hotels, rental cars, limos, sporting or musical events, and local sights, shows, restaurants, and shopping in the destination areas based on your areas of interest and preferences.
Continuing with the Foot Locker example from previous chapters, business metamorphosis for Foot Locker could mean shifting away from selling sporting shoes and apparel to providing “workouts as a service.” Foot Locker could monitor all of your workouts and walking activities and automatically recommend the most appropriate shoes, workout apparel, workout routines, gym memberships, and exercise tips based on your unique workout habits, patterns, tendencies, and propensities. Foot Locker could even expand into “health and wellness services” to provide tips and recommendations about your diet, exercise, stress, cholesterol, and so forth, focused on improving your overall health and wellness (and maybe even helping to reduce your health insurance costs).
In all of these examples, these organizations couple new sources of customer, product, and operational data with data science to uncover new actionable insights that form the basis for metamorphosing their business models.
Let's introduce an exercise that can help to strengthen your “thinking like a data scientist” methodology. The exercise begins with the Business Metamorphosis stage and works backwards to identify potential big data use cases and the supporting data and analytics.
Business Metamorphosis Exercise
I asked students in one of my MBA classes to pretend that they were management consultants who had been asked by a large airplane manufacturer to contemplate how big data could metamorphose the organization's future business model. In essence, the large airplane manufacturer wanted to metamorphose the business by transitioning from selling airplanes to selling “air miles.”
The students, acting as management consultants, needed a process to uncover the analytic insights about passengers, airplanes, airlines, airports, and routes (the strategic nouns of this exercise) necessary to support the business metamorphosis, that is, to transform business processes, people, organizational structures, products and services, partnerships, markets, organization, promotions, rewards, incentives, and so on. The management consultants would also need to identify the data, analytic, and business requirements necessary to encourage third-party developers to create value-added services and products based on the airplane manufacturer's new business platform.
Articulate the Business Metamorphosis Vision
The first step in the metamorphosis exercise is to articulate and understand the business ramifications of the airplane manufacturer's new business model vision. Use the following vision statement as your starting point:
Large airplane manufacturer wants to metamorphose its business model by transitioning from selling airplanes to selling air miles (transporting 250 customers 2,600 air miles from SFO to JFK on Sunday mornings at 9:00 am) in order to create new high-value services for airlines (e.g., United, American, Delta, Southwest) and enable third-party developers to extend the airplane manufacturer's business model to airlines and potentially other customers, partners, and markets.
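The unit being sold in this vision is a contracted trip rather than an airplane. A quick worked calculation: 250 passengers × 2,600 miles = 650,000 passenger-miles per Sunday departure. The minimal sketch below models such a contract; the class name `AirMilesContract`, its fields, and the 52-departure schedule in the example are hypothetical. Billing only for departures actually flown reflects the point below that the manufacturer would not be paid when its planes are not flying.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AirMilesContract:
    origin: str
    destination: str
    passengers: int            # passengers carried per departure
    route_miles: int           # air miles per departure
    departures_scheduled: int  # e.g., one per Sunday over a year

    def passenger_miles_per_departure(self) -> int:
        return self.passengers * self.route_miles

    def billable_passenger_miles(self, departures_flown: int) -> int:
        # Only departures actually flown are billable, capped at the schedule.
        flown = min(departures_flown, self.departures_scheduled)
        return flown * self.passenger_miles_per_departure()
```

For example, `AirMilesContract("SFO", "JFK", 250, 2600, 52)` yields 650,000 passenger-miles per departure; if only 50 of the 52 Sunday departures fly, the billable total is 32,500,000 passenger-miles, so every cancellation shows up directly in the manufacturer's revenue.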
As a starting point, this vision statement could have the following ramifications for the airplane manufacturer's business model:
The airplane manufacturer would enjoy a dramatic competitive advantage over other airplane manufacturers by providing new business benefits to the airlines, including significantly improved cash flow and financials (reduced capital expenditures), elimination of maintenance costs, elimination of parts inventory costs, and mitigation of flight delay risks.
The airplane manufacturer would be responsible for owning and managing the fleets of airplanes (likely under its own brand), and the airlines would contract with the airplane manufacturer to acquire (provision?) the air miles necessary to transport a specified number of the airline's passengers from one location to another at a specified time and date.
The airplane manufacturer would assume all responsibilities for ensuring that planes are up and running (e.g., maintenance scheduling, maintenance technician training and management, maintenance and replacement parts inventory, and component and software upgrades). If the planes were not flying, then the airplane manufacturer would not be getting paid.
Understand Your Customers
The second step in the metamorphosis exercise is to identify and understand the airplane manufacturer's customers. Its current customers (airlines) would clearly continue to be its future customers. However, this opens up opportunities to acquire new types of customers: airline passengers, for example.
The airplane manufacturer is now in a unique position to know details about airline passengers who fly across different airlines, and it can offer new services to those passengers that could be more compelling than anything a single airline could offer on its own. For instance, it could create a new type of frequent flyer program that offers rewards, gifts, upgrades, recognition, and special privileges (airport club access, priority TSA PreCheck) to passengers who fly on any of the airplane manufacturer's planes, regardless of the airline.
There may be other opportunities to leverage this new business model to address other customers, such as:
Travel agents, by virtue of having a more complete understanding of passenger demand and flight and seat availability
Hotel operators, who could work with the airplane manufacturer to direct customers to available rooms
Ground transportation companies (car rental companies, Uber, Lyft, taxis, airport shuttles), by sharing passenger forecasts into specific airports
Sports, casino, and entertainment companies, by directing passengers to sporting events and entertainment that may match the passengers' areas of interest
And there are surely others. The business potential to reach new customers and new markets with new services is limited only by the creative thinking of the organization.
Articulate Value Propositions
The next step is to brainstorm what this business metamorphosis would mean to the airplane manufacturer's customers (United, American, Delta, Southwest, Virgin Atlantic, etc.). Let's contemplate the value propositions that the airplane manufacturer's new business model might provide to these airline customers. These value propositions to the airlines could include:
Significantly improved airline cash flow, achieved by converting the fixed monthly airplane lease payments to a variable cost based on the number of passengers and air miles. This gives the airlines significant flexibility in defining, scheduling, and managing passengers, routes, and crews.
Dramatic reduction in maintenance costs, including spare and maintenance parts inventory and maintenance personnel (including the hiring, training, and managing of maintenance personnel).
Reduction in unplanned and overtime costs associated with flight delays due to mechanical issues, as these issues would now become the airplane manufacturer's responsibility.
Airlines could then focus on differentiating themselves in areas other than airplane configuration (because the same models of airplanes would likely be used to serve multiple airlines), including on-plane customer service and amenities, onboard meals (yeah, right), gate area customer service and amenities (lounge chairs instead of today's stadium-reject seats), frequent flyer reward programs (with miles that you can actually use), club locations and amenities, ticket pricing, travel convenience, trip duration (e.g., fewer connections), and so on.
While this could be a scary proposition for some airlines, for others it provides an opportunity to offer new high-value services to high-value customers in order to build loyalty in new and creative ways beyond just flight schedules and seat availability.
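The first value proposition replaces a fixed lease with a charge that varies with passengers and air miles. The arithmetic is simple enough to sketch. The example below uses the vision statement's flight (250 passengers flown 2,600 air miles from SFO to JFK). The per-passenger-mile rate and the monthly lease amount are hypothetical placeholders chosen for illustration, not figures from the book.

```python
# Minimal sketch: fixed monthly lease vs. a usage-based "air miles" charge.
# RATE_PER_PASSENGER_MILE and MONTHLY_LEASE are hypothetical assumptions,
# not figures from the book.

RATE_PER_PASSENGER_MILE = 0.05   # dollars per passenger air mile (assumed)
MONTHLY_LEASE = 400_000.0        # fixed monthly lease payment in dollars (assumed)


def passenger_air_miles(passengers: int, miles: int) -> int:
    """Passenger air miles for one flight: passengers carried times miles flown."""
    return passengers * miles


def monthly_variable_cost(flights: list[tuple[int, int]]) -> float:
    """Usage-based cost for a month of (passengers, miles) flights."""
    total = sum(passenger_air_miles(p, m) for p, m in flights)
    return total * RATE_PER_PASSENGER_MILE


# The vision statement's example: 250 passengers flown 2,600 air miles, SFO to JFK.
sfo_to_jfk = (250, 2600)
print(passenger_air_miles(*sfo_to_jfk))  # 650000

# Under a fixed lease the airline pays MONTHLY_LEASE whether it flies 8 or 16
# of these flights; under the variable model its cost tracks actual usage.
for flights_per_month in (8, 16):
    variable = monthly_variable_cost([sfo_to_jfk] * flights_per_month)
    print(flights_per_month, MONTHLY_LEASE, round(variable, 2))
```

The specific numbers matter less than the shape of the cost curve. The lease stays flat no matter how many flights the airline operates, while the usage-based charge rises and falls with demand.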
Define Data and Analytic Requirements
The final step in the metamorphosis exercise is to brainstorm the airplane manufacturer's data and analytic requirements. You will brainstorm these requirements via a three-step process: (1) identify key business and operational decisions, (2) identify the analytics to support the decisions, and (3) identify data to support the analytics.
Step 1: Identify Business and Operational Decisions
The first step is to identify the key operational and business decisions that the airplane manufacturer needs to make in order to support the new business model. Thoroughly understanding the business and operational decisions that the key business stakeholders are responsible for making is critical to the success of your big data initiative. Decisions (and some supporting questions) for the airplane manufacturer could include the following:
Decisions about pricing and their supporting questions:
How do I price considering surge demand driven by special events (bowl games, Final Four tournament, holidays)?
How directly does my pricing impact the airlines' pricing and their ability to be profitable?
Can I support surge pricing?
Can I provide pricing discounts for packages of air miles?
Decisions about sales and marketing and their supporting questions:
How can I leverage a loyalty program to drive usage and capture more passenger data?
What promotional packages are most effective at driving passenger demand?
Can I leverage social media and influencers to drive families and groups to fly?
Which types of promotions work best in which markets and on which routes?
Decisions about in-flight airplane performance and their supporting questions:
Which airplane configurations yield the best fuel efficiency?
Which pilots are the most fuel efficient?
What are the optimal crew configurations?
How do I best distribute baggage and cargo to optimize fuel efficiency?
Which in-flight MVP passengers are unhappy, and what should I do about that?
Decisions about passenger and baggage management and their supporting questions:
How can I speed the loading and unloading of passengers and baggage (in order to speed airplane turns)?
What airplane configurations are most effective at getting passengers and baggage on and off the planes faster?
How do I incent more passengers to check bags so that less time is spent boarding planes (again, to speed airplane turns)?
Should I create a ramp management service where I take responsibility for loading and unloading the airplane baggage?
Decisions about airplane maintenance and their supporting questions:
How do I select which airplanes and/or jet engines to replace with more efficient models?
How do I balance jet engine fuel efficiency with jet engine maintenance costs?
Which jet engines are most cost-effective from a fuel efficiency and maintenance perspective?
Decisions about parts and logistics management and their supporting questions:
How can I reduce spare parts and maintenance costs?
What is the optimal number and type of airplane configurations in order to reduce spare parts and inventory costs?
Can I design planes with more interchangeable parts to reduce parts inventory costs?
How can I leverage low-cost, centralized parts depots to support the maintenance and inventory needs of high-volume airports (e.g., Cedar Rapids, IA, servicing ORD, MSP, MCI, and STL)?
Decisions about airplane design and their supporting questions:
How can I design/build/configure airplanes to get passengers on and off the plane more quickly?
How can I design/build/configure airplanes that reduce parts maintenance costs?
How can I design/build/configure airplanes that reduce operational costs (gate agents, baggage workers, flight attendants, pilots)?
How can I design/build/configure airplanes that reduce parts inventory costs?
Step 2: Identify Analytic Requirements
Next, you need to identify the analytics that the airplane manufacturer would need to support the operational and business decisions. In this step you want to work backward from the decisions and supporting questions to identify the potential analytics necessary to support the decisions. Table 12.1 contains a starter set of these analytics.
Table 12.1 Decisions to Analytics Mapping
Decisions | Potential Analytics
Pricing | Passenger (demand) forecast; Fuel costs forecast; Maintenance costs forecast; Pilot/flight attendant performance optimization; Pilot/flight attendant retention effectiveness
Sales and marketing | Passenger lifetime value score; Passenger loyalty score; Passenger net promoter score; Passenger acquisition effectiveness; Passenger retention effectiveness; Marketing campaign effectiveness; Personalized promotions effectiveness
In-flight performance | Airplane fuel optimization; Crew scheduling optimization; Cargo distribution optimization; Baggage distribution optimization; Passenger distribution optimization
Passenger and baggage management | Baggage handler/agent scheduling optimization; Baggage handler/agent cost optimization; Baggage handler/agent performance monitoring; Baggage handler/agent retention; Flight turnaround effectiveness
Airplane maintenance | Airplane and parts predictive maintenance; Weather forecasts; Airplane/component upgrades; Optimize inventory costs; Optimize logistics costs; Maintenance worker effectiveness
Parts and logistics management | Airplane and parts predictive maintenance; Maintenance scheduling optimization; Crew scheduling optimization; Parts demand forecast; Parts inventory optimization; Parts logistics optimization
Airplane design | Long-term fuel costs forecast; Airplane design fuel efficiency; Passenger board/de-board optimization; Baggage load/unload optimization
Step 3: Identify Data Requirements
In step 3 you identify the data that the airplane manufacturer might need to support the pricing, sales, marketing, maintenance, logistics, and other analytics. You want to brainstorm the different data sources that might be useful in helping you develop the analytics to support your key decisions. Let's expand Table 12.1 to include the different data sources you might need to support the analytics (see Table 12.2):
Table 12.2 Data-to-Analytics Mapping
Decisions | Potential Analytics | Potential Data Sources
Pricing decisions | Passenger (demand) forecast; Fuel costs forecast; Maintenance costs forecast; Pilot/flight attendant performance optimization; Pilot/flight attendant retention | Passenger flight history; Airplane flight history (routes, airports, miles flown, fuel consumed, passengers carried, % empty seats); Airplane flight sensor data; Airplane physical data (age, last upgrade date, configuration, weight, fuel consumption, capacity, max airspeed); Airplane maintenance history; Pilot/flight attendant demographics; Pilot/flight attendant flight history; Pilot/flight attendant notes and comments; Airport physical data (number of runways, age of runways, operation hours); Airport weather; Economic data; History of fuel costs
Sales and marketing programs decisions | Passenger lifetime value score; Passenger loyalty score; Passenger net promoter score; Passenger acquisition; Passenger retention; Marketing campaign effectiveness; Personalized promotions effectiveness | Passenger demographics (age, height, weight, family members, job type); Passenger flight history; Passenger social media data (posts, likes, tweets, shares); Passenger comments; Passenger social media sentiment
In-flight airplane performance decisions | Airplane fuel optimization; Crew scheduling optimization; Cargo distribution optimization; Baggage distribution optimization; Passenger distribution optimization | Route data (departure, destination, distance, wind patterns); Weather conditions; Airport data (number of runways, landing traffic patterns and demand); Weight of baggage; Weight of passengers (ouch!); Weight of cargo; Airplane fuel consumption history
Passenger and baggage management decisions | Baggage handler/agent scheduling optimization; Baggage handler/agent cost optimization; Baggage handler/agent performance monitoring; Baggage handler/agent retention; Flight turnaround effectiveness | Baggage loading and unloading performance data (flight, airplane configuration, airport, size of crew, experience of crew); Baggage handler/agent demographics (age, experience, training, recognitions); Baggage handler/agent work history; Baggage handler/agent notes and comments; Flight data (departure time, actual departure time, departure airport, destination airport, air miles, etc.)
Airplane maintenance decisions | Airplane and parts predictive maintenance; Airplane/component upgrades; Optimize inventory costs; Optimize logistics costs; Maintenance worker effectiveness | Airplane physical data (age, last upgrade date, configuration, weight, fuel consumption, capacity); Airplane flight history of number of passengers flown by route, day of week, holiday, and seasonality; Airplane maintenance history (date, work done, parts replaced, technician, costs); Maintenance worker data (age, experience, areas of expertise, certifications); Maintenance worker comments and notes; Average mean-time-to-failure (air miles) by maintenance type; Average maintenance parts and personnel costs by maintenance type
Parts and logistics management decisions | Airplane and parts predictive maintenance; Maintenance scheduling optimization; Crew scheduling optimization; Parts demand forecast; Parts inventory optimization; Parts logistics optimization | Replacement parts data (costs, manufacturer, associated parts, special certification); Maintenance parts (costs, manufacturer); Logistics center data (location, costs, capacity, access points); Inventory levels
Airplane design decisions | Long-term fuel costs forecast; Airplane design fuel efficiency; Passenger board/de-board optimization; Baggage load/unload optimization | Forecast fuel costs/fuel price index; Average weight and age by passenger; Optimal airplane flow (load and unload) by airplane configuration
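The three-step process (decisions, then analytics, then data) is essentially a nested mapping. Writing it down that way makes it easy to spot analytics that serve more than one decision area. The sketch below encodes a subset of the rows in Table 12.2. The dictionary layout and the helper names `data_sources_for` and `shared_analytics` are illustrative assumptions, not a format the book prescribes.

```python
# Minimal sketch of the decisions -> analytics -> data sources chain, using a
# subset of the rows in Table 12.2. The layout and helper names are
# hypothetical illustrations.
from collections import defaultdict

DECISIONS = {
    "In-flight airplane performance": {
        "analytics": ["Airplane fuel optimization", "Crew scheduling optimization",
                      "Cargo distribution optimization"],
        "data": ["Route data", "Weather conditions", "Weight of cargo",
                 "Airplane fuel consumption history"],
    },
    "Airplane maintenance": {
        "analytics": ["Airplane and parts predictive maintenance",
                      "Optimize inventory costs", "Maintenance worker effectiveness"],
        "data": ["Airplane physical data", "Airplane maintenance history",
                 "Maintenance worker data"],
    },
    "Parts and logistics management": {
        "analytics": ["Airplane and parts predictive maintenance",
                      "Crew scheduling optimization", "Parts demand forecast"],
        "data": ["Replacement parts data", "Logistics center data", "Inventory levels"],
    },
}


def data_sources_for(decision: str) -> list[str]:
    """Work backward from a decision area to the data that supports it."""
    return DECISIONS[decision]["data"]


def shared_analytics() -> dict[str, list[str]]:
    """Analytics that support more than one decision area (candidates for reuse)."""
    used_by = defaultdict(list)
    for decision, row in DECISIONS.items():
        for analytic in row["analytics"]:
            used_by[analytic].append(decision)
    return {a: ds for a, ds in used_by.items() if len(ds) > 1}


for analytic, decisions in sorted(shared_analytics().items()):
    print(f"{analytic}: {', '.join(decisions)}")
```

On these three rows, the check flags crew scheduling optimization and airplane and parts predictive maintenance as analytics shared across decision areas. That makes them natural candidates to build once and reuse.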
Using this approach, my MBA students were able to quickly determine the insights, analytics, and potential data sources necessary to support the airplane manufacturer's business metamorphosis without having any working experience with either the airplane manufacturer or any airline company. I think they impressed themselves!
Business Metamorphosis in Health Care
I'm struck by what's happening in the United States healthcare industry and by the power struggle between healthcare providers and healthcare payers. The healthcare industry is ripe for a metamorphosis into something much more efficient, effective, and customer (patient) centric. This healthcare business metamorphosis could create new power brokers: healthcare players who will leverage new sources of patient, physician, clinical, medication, wellness, and care data to improve the quality of care and outcomes, more effectively manage costs, dramatically reduce or eliminate inefficient and unnecessary processes and procedures, and provide a much more compelling patient and physician experience. Think about it as the Uber-ification of the healthcare industry: simplifying the overall healthcare process in order to reduce costs, improve patient care, and improve overall population wellness.1
Today there is friction between healthcare providers (doctors, hospitals, clinics) and healthcare payers (insurance companies, government agencies). The healthcare payers want to cap the cost of medical services by dictating how much they are willing to reimburse for particular types of care under particular conditions. However, the healthcare providers are starting to capture and analyze a wider variety of patient, care, and treatment data. This includes structured data from operational systems (Epic, Cerner, Lawson, Kronos), unstructured data (nurse and physician notes, patient comments, e-mail conversations), and external data sources (WebMD, Fitbit, MyFitnessPal, Yelp, Lumosity, and a growing variety of other healthcare-related websites and mobile apps). Leading healthcare providers are integrating these data sources to create actionable scores about their patients' overall wellness (diet, exercise, stress), as well as scores about the patients' likelihood of strokes, heart attacks, diabetes, and other maladies (see Figure 12.2).
Figure 12.2 Patient actionable analytic profile
Healthcare providers are in a position of strength with respect to their ability to leverage superior insights about patients, physicians, medications, procedures, treatments, diseases, therapy, maladies, etc. in order to exert significant pressure on the insurance companies with respect to which procedures should be reimbursed and for how much. The healthcare providers will know which procedures and medications work best for which patients in what situations and can leverage those insights to exert more influence on the healthcare industry.
Healthcare providers need to contemplate the business metamorphosis potential and how they will transition from just providing patient care to becoming the maintainers of the population's overall wellness. Preventive care opportunities fueled by superior patient, medication, exercise, diet, and stress insights could ultimately be the most important (and profitable) part of the healthcare value chain!
Let's drill into this potential healthcare industry metamorphosis in more detail. You'll use the same approach discussed in the airplane manufacturer example: first understand the key decisions that need to be made to support the healthcare industry metamorphosis, and then identify the analytics (or insights) and data necessary to support those decisions.
Business decisions → Supporting analytics → Potential data sources
First, you want to capture the decisions that the healthcare providers need to make about patients, quality of care, cost of care, procedures, medications, etc. Those decisions could include:
Decisions about which medical procedures and medications to use with which patients in what medical situations
Decisions about the appropriate level of medical care versus costs, given the patient's situation and prognosis
Decisions (recommendations) for patients regarding diet, sleep, stress level, exercise, etc. in order to reduce the risk of diabetes, strokes, heart attacks, etc.
Decisions about what combinations of doctors, nurses, and technicians are most cost-effective in different surgical situations
Decisions about what medications and treatments are most cost-effective in treating different patient healthcare situations
Decisions about the optimal combinations of rehab, exercise, sleep, medication, therapy, and diet that can accelerate a patient's recovery
After you have identified the business and operational decisions, you want to capture the analytics necessary to support those decisions. Some of those analytics could include:
Patient wellness score
Patient exercise score
Patient stress score
Patient diet score
Medication, procedures, and treatment effectiveness
Hospice versus hospital cost and care effectiveness
Physician and nurse effectiveness
Emergency room demand forecasting
Population health forecasting
Physician and nurse retention
Hospital-acquired infection reduction
Unplanned readmission reduction
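The book describes integrating these data sources into actionable scores about patients' overall wellness (diet, exercise, stress) but does not give a formula. A weighted average is one simple way to illustrate how sub-scores might roll up into a patient wellness score. The 0 to 100 scale, the weights, and the assumption that the stress score is already inverted (higher means less stress) are all hypothetical.

```python
# Minimal sketch: rolling the patient exercise, diet, and stress scores up into
# a single patient wellness score. The 0-100 scale and the WEIGHTS below are
# hypothetical assumptions; the book does not specify a scoring formula.

WEIGHTS = {"exercise": 0.4, "diet": 0.35, "stress": 0.25}  # assumed, sum to 1.0


def wellness_score(sub_scores: dict[str, float]) -> float:
    """Weighted average of 0-100 sub-scores, where higher means healthier.

    The stress sub-score is assumed to be inverted already (100 = low stress).
    """
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= sub_scores[name] <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical patient sub-scores.
patient = {"exercise": 60, "diet": 80, "stress": 40}
print(round(wellness_score(patient), 1))
```

Keeping the sub-scores separate, rather than storing only the composite, lets a physician see which factor is pulling a patient's score down and tailor the diet, exercise, or stress recommendation accordingly.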
Finally, you want to brainstorm data sources (patients, physicians, outcomes, cost of care, procedures, treatments, medications, etc.) that could support your analytics. Following is a list of potential data sources that might be of value in developing your analytics:
Hospital care data (Epic, Cerner)
MapMyRun
Financials (Lawson, Oracle)
MyFitnessPal
Hours worked (Kronos)
Strava
Physician notes
Smart toilets
Nurse and technician notes
Smart blood pressure monitors
Pharmacy and prescriptions
Smart glucose monitors
Medication usage
Apple Health
Patient comments
Indeed.com
HCAHPS and surveys
CDC
Social media comments
Healthcare.gov
Yelp ratings
Google Trends
WebMD
Traffic patterns
Lumosity
Weather forecasts
Nike FuelBand, Fitbit, and Garmin
Holiday schedules
Apple Watch
Special events schedules
At the end of this metamorphosis exercise, healthcare providers will have identified the decisions, analytics, and data necessary to claim a bigger portion of the healthcare value chain, including answers to questions such as:
What are the optimal treatments and medications given a patient's conditions and history, and how much should the payers reimburse?
What is the value of preventive care (diet, exercise, sleep, medication, therapy), and how much should healthcare payers cover to incent healthier and more profitable patient behaviors?
Summary
Industries as diverse as professional sports, manufacturing, consumer packaged goods, retail, education, social services, and healthcare are going through business model metamorphoses by leveraging the wealth of rich data sources about their customers, products, and operations. And leading organizations are learning to leverage the resulting analytic insights to change the balance of power within their industries.
In the healthcare industry, healthcare providers that know the most about their patients' and physicians' behaviors, tendencies, and usage patterns are in the best position to correct the fuzzy math that healthcare payers have been using to set their reimbursement rates.
No matter what your organization's ultimate business vision, going through the business metamorphosis exercise can uncover big data requirements around decisions, analytics, and data sources that can be leveraged to transform or metamorphose your organization's business model. And it is an easier exercise than one might think, as the students in my MBA class discovered.
The bottom line across all industries is this: the organizations that know the most about their products, operations, and customers' behaviors, tendencies, and usage patterns are in the best position to monetize those insights and exert control over the organizations within their value chains that lack those customer, product, and operational insights. In the end, that's the ultimate goal of the Business Metamorphosis phase of the Big Data Business Model Maturity Index.
Homework Assignment
Use the following exercises to apply what you learned in this chapter:
Exercise #1: Build on the airplane manufacturer example by applying the metamorphosis exercise techniques to another business stakeholder, such as travel agents, hotel operators, or ground transportation companies.
Exercise #2: Pick an organization (preferably your own organization) and apply the metamorphosis exercise to brainstorm the decisions, analytics, and data necessary to support your organization's business metamorphosis. As always, it is more productive and more fun to do this exercise with a small group. Maybe fly someplace cool (like Las Vegas, Austin, Charles City, or Nashville) to put everyone in the right frame of mind!
Notes
1 I use the term “Uber-ification” to describe the metamorphosis of traditional industries by new business models that simplify the consumer's decision process. The company Uber is threatening the traditional taxi, limousine, and transportation industries with a smartphone app that greatly simplifies the user's transportation decisions. Uber has created a new marketplace that matches riders with drivers and has turned every driver into a potential limo or taxi driver.
Part IV
Building Cross-Organizational Support
Chapters 13 through 15 lay the organizational and cultural foundation for metamorphosing the organization. These chapters cover many of the people, processes, roles, and responsibilities that need to be addressed as organizations look to integrate data and analytics into their business models and complete the big data journey to the Business Metamorphosis phase of the Big Data Business Model Maturity Index.
In This Part
Chapter 13: Power of Envisioning
Chapter 14: Organizational Ramifications
Chapter 15: Stories
Chapter 13
Power of Envisioning
The business potential of big data is limited only by the creative thinking of your business stakeholders. So in a sense, this may be the most important chapter supporting the “thinking like a data scientist” process and the most fundamentally critical guidance within the book.
Opportunities abound for organizations to analyze the “dark” data that is buried within their operational systems and data warehouses, and to identify other internal and external data sources that they could leverage to optimize key business processes, differentiate their customer engagement, and uncover new monetization opportunities. However, getting the business stakeholders to envision what might be possible with their currently under-utilized internal data and the wealth of external data sources is a significant challenge. Sounds like the perfect time for an envisioning engagement such as EMC's Big Data Vision Workshop.1
NOTE
I am the creator of EMC's Big Data Vision Workshop methodology. I have personally experienced the powerful business ideas that the Big Data Vision Workshop can unleash from participants when the proper creative environment and processes are put into place. Consequently, I am very bullish on the Big Data Vision Workshop and the game-changing power of envisioning.
In this chapter, I am going to discuss EMC's Big Data Vision Workshop as an example of an envisioning engagement that can fuel the organization's creative thinking about where and how to leverage big data to power the organization's business models. The Big Data Vision Workshop leverages the “thinking like a data scientist” techniques to help the business stakeholders understand how big data can optimize their key business processes and uncover new monetization opportunities.
Envisioning: Fueling Creative Thinking
The Big Data Vision Workshop is an envisioning engagement designed to drive organizational alignment and fuel creative thinking about where and how an organization can leverage data and analytics to power its business models. The Big Data Vision Workshop helps organizations that don't know how to analyze the data they already collect or how to identify additional data worth collecting. Specifically, the Big Data Vision Workshop:
Provides a formal process for identifying where data and analytics can drive material business impact on the organization's key business initiatives over the next 9 to 12 months.
Ensures business relevance by focusing on the organization's most impactful business opportunities.
Facilitates group exercises to encourage business and IT stakeholders to envision the “realm of what's possible” from the organization's internal data, as well as explore the potential of external data.
Drives business and IT alignment around those “best” analytic opportunities with a clear roadmap of what needs to be done over the next 9 to 12 months.
The Big Data Vision Workshop process is ideal for organizations that:
Have a desire to leverage big data to transform their business but do not know where and how to start.
Have a wealth of data that they do not know how to monetize.
Organizations of all sizes have successfully leveraged the Big Data Vision Workshop to identify where and how to leverage data and analytics to power their business models. No organization is too small, and yes, your data is “big” enough.
Big Data Vision Workshop Process
The Big Data Vision Workshop typically spans two to three weeks. It concludes with a half-day, facilitated, on-site interactive workshop that prioritizes the high-value business use cases and identifies the supporting data and advanced analytic recommendations. However, a substantial amount of work needs to be done prior to the workshop to drive the cross-organizational collaboration and fuel the creative thinking processes. Figure 13.1 outlines the Big Data Vision Workshop process and timeline.
Figure 13.1 Big Data Vision Workshop process and timeline
Let's examine the stages of the Big Data Vision Workshop process to help you understand how to stimulate creative thinking and generate the actionable analytic recommendations that best support the advancement of your organization's key business initiatives.
To make this envisioning process more real, I am going to walk through the process using a healthcare organization (a group of hospitals) as an example. I will refer to the organization as Healthcare Systems.
Pre-engagement Research
For the engagement to be successful, several key activities need to happen prior to the envisioning engagement to ensure that it is impactful to the organization. Following are the key steps in the pre-engagement phase of the Big Data Vision Workshop:
Identify the organization's business initiative or business challenge on which to focus the engagement.
Identify the business stakeholders who impact or are impacted by the targeted business initiative. Typically three to five different business functions are engaged in the envisioning process. A wide variety of business stakeholders ensures a comprehensive collection of decisions, questions, metrics, and data sources that support the targeted business initiative.
Gather information about the sample data sets, including file formats, data location, data dictionary, and a small sample of the data (5 to 6 gigabytes). Ultimately, the data scientists will use the small data sets to create illustrative analytics.
The key business initiative for Healthcare Systems is to explore how to leverage data and analytics to improve the quality of patient care while controlling costs, or to “improve cost/quality of patient care.”
Business Stakeholder Interviews
The Big Data Vision Workshop engagement starts by interviewing the key business stakeholders. The interview process focuses on (1) capturing the decisions that the business stakeholders need to make to support the targeted business initiative and (2) capturing a wide range of questions that support those decisions. Key steps in the interview phase of the Big Data Vision Workshop are:
Conduct interviews with business and IT stakeholders to capture key business objectives, the decisions that they are trying to make, and the types of questions that they need to answer in support of those decisions.
Collect supporting materials such as sample reports and dashboards. Also collect any examples of the business stakeholders downloading data into spreadsheets. Those spreadsheets can be gold in understanding the decisions that the business stakeholders are trying to make.
Identify or brainstorm other potential data sources (internal and external) that might be of value in supporting the key decisions.
It is always best to create and share an interview questionnaire with interviewees prior to the interviews. The interview questionnaire should address the following:
What are their key objectives and responsibilities?
What decisions must the interviewees make with respect to the targeted business initiative?
What questions do they need to answer in support of those decisions?
What are the metrics or key performance indicators against which success will be measured?
What are the organization's value drivers (e.g., the key activities that help the organization make money related to the targeted business initiative)?
The Healthcare Systems key business stakeholders for the “improve cost/quality of patient care” initiative are the following:
Physicians and nurses
Clinical
Operations
Finance
Human resources
Population health
Explore with Data Science
A very powerful part of the Big Data Vision Workshop engagement is the data science work to create illustrative analytics on the sample data sets. This part of the envisioning engagement might not be possible if your organization does not have access to a data science team. But if it does, the data science team should explore different analytic techniques (like the ones covered in Chapter 6) to help the business stakeholders envision the realm of what is possible using data science. Key tasks in the data science explore phase are:
Prepare, transform, and enrich the data.
Explore the data using different data visualization techniques.
Explore opportunities to integrate external data sources such as social media (Twitter, Facebook, LinkedIn), app-generated data (Zillow, Eventbrite), and public domain data (data.gov).
Build illustrative analytics using different analytic techniques to determine which analytic techniques yield the most relevant insights.
Package data visualizations and analytic models for consumption by the business and IT stakeholders.
Develop simple user experience mock-ups to validate how the analytics will support the business stakeholders' key decisions.
At Healthcare Systems, a small sample set of data from Epic (hospital operations), Kronos (time and attendance), and Lawson (finance and costs) was pulled together, and illustrative analytics were created around the following business areas (see Figure 13.2):
Emergency room volume variances
Operating room patient volume forecasting
Diagnostic code relationships
Knee replacement cost clusters
Figure 13.2 Big Data Vision Workshop illustrative analytics
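One of the illustrative analytics above is knee replacement cost clusters. The book does not say which technique produced them; k-means is one common choice for this kind of grouping. The sketch below is a plain-Python, one-dimensional k-means (Lloyd's algorithm) run on hypothetical per-procedure costs. It uses deterministic starting centroids so the result is repeatable.

```python
# Minimal sketch: grouping knee replacement procedure costs into clusters with a
# simple one-dimensional k-means. The cost values are hypothetical, and k-means
# is an assumed technique; the book does not say how its clusters were built.

def kmeans_1d(values: list[float], k: int, iterations: int = 100) -> list[float]:
    """Return k cluster centroids for 1-D data using Lloyd's algorithm."""
    data = sorted(values)
    if k < 1 or k > len(data):
        raise ValueError("k must be between 1 and the number of values")
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [data[i * (len(data) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def assign(value: float, centroids: list[float]) -> int:
    """Index of the centroid closest to a single procedure cost."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


# Hypothetical per-procedure costs in dollars.
costs = [21_000, 22_500, 23_000, 24_000, 31_000, 32_500, 33_000, 48_000, 51_000]
centroids = kmeans_1d(costs, k=3)
print([round(c) for c in centroids])  # [22625, 32167, 49500]
print(assign(50_000, centroids))      # 2
```

For real multi-variable data, a library implementation such as scikit-learn's `KMeans` would be the practical choice. The one-dimensional version only makes the mechanics visible.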
At Healthcare Systems, simple mock-ups were also developed so that the workshop participants could envision how the analytic results could be presented to frontline workers (physicians, nurses, admissions) and patients (see Figure 13.3).
Figure 13.3 Big Data Vision Workshop user experience mock-up
Workshop
Once the above activities have been completed (which typically takes about two to three weeks of work), you are ready for the half-day workshop. The goal of the facilitated, on-site, interactive workshop is to help the participants:
Go beyond descriptive reporting to brainstorm the applicability of predictive (what is likely to happen) and prescriptive (what should I do) analytics.
Brainstorm, identify, and prioritize additional data sources (both internal and external) that may be worth collecting for the targeted business initiative.
Use a prioritization process to identify the best analytic opportunities based on business value and implementation feasibility over the next 9 to 12 months.
Following are some specific tasks that should be accomplished during the workshop.
Fuel the Creative Thinking Process
You want to stimulate creative, “out of the box” thinking during the workshop. To fuel the creative thinking process, do the following:
Share the illustrative analytics that the data science team created from the client's data to stimulate creative thinking about how advanced analytics could energize the business.
Review examples from other industries of advanced analytics applied to different business scenarios.
Share the mock-ups in order to stimulate creative thinking (PowerPoint works great as your mock-up and user experience development tool).
Brainstorm Business Decisions and Questions
After walking through the illustrative analytics and mock-ups, lead the workshop participants through a series of facilitated brainstorming scenarios, including:
Scenario 1. Brainstorm the insights that you want to uncover about your targeted business initiative if you could get access to ALL the organization's operational and transactional data. For Healthcare Systems, what insights would you want about your key business initiative if you had 10 to 20 years of patient care data, hospital operations data, time and attendance data, and finance data? Heck, I'm starting to sound like the NSA!
Scenario 2. Brainstorm the insights that you would want to uncover about your key business initiative if you had access to all of the organization's internal unstructured data (physician or nurse notes, patient comments, e-mail threads) and external unstructured data (social media, mobile, blogs, news feeds, weather, traffic, economic, population health, Centers for Disease Control).
Scenario 3. Decompose the key business initiative into the different events that compose that initiative, and brainstorm what insights you would want to capture if you had access to that data in real time. For Healthcare Systems, are there opportunities to “catch the patient at the time of need” while interacting with your organization in order to provide preventive care recommendations?
Scenario 4. Brainstorm how you would leverage predictive analytics and prescriptive analytics to uncover new actionable insights about your targeted business initiative. For Healthcare Systems, build on the learnings from the stakeholder interviews to create questions about the “improve cost/quality of patient care” business initiative that start with verbs such as predict, forecast, recommend, score, or correlate.
Be sure to capture the decisions and questions on separate sticky notes and place the sticky notes on flip charts.
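Scenario 4 asks participants to phrase questions that start with verbs such as predict, forecast, recommend, score, or correlate. Once the sticky notes are transcribed, a quick pass like the one sketched below can tag which questions lean predictive or prescriptive. The verb-to-category mapping and the sample notes are illustrative assumptions. The book lists the verbs but does not assign them to categories.

```python
# Minimal sketch: tagging transcribed sticky-note questions by their leading verb.
# The verb-to-category mapping and the sample notes are hypothetical.

VERB_CATEGORIES = {
    "predict": "predictive",
    "forecast": "predictive",
    "score": "predictive",
    "correlate": "exploratory",
    "recommend": "prescriptive",
}


def categorize(question: str) -> str:
    """Return the analytic category implied by a question's first word."""
    words = question.strip().split()
    first = words[0].lower().strip(",.:?") if words else ""
    return VERB_CATEGORIES.get(first, "descriptive/other")


# Hypothetical sticky notes from a Healthcare Systems session.
notes = [
    "Predict which patients are at risk of an unplanned readmission",
    "Forecast emergency room volume for the holiday weekend",
    "Recommend a staffing mix for knee replacement surgeries",
    "How many patients did we admit last month?",
]
for note in notes:
    print(f"{categorize(note):>18}  {note}")
```

Questions that fall into "descriptive/other" are not wrong. They are simply a prompt for the facilitator to push the group toward the predictive and prescriptive framing the scenario asks for.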
GroupDecisionsandQuestionsintoCommonThemesNext,havetheworkshopparticipantsgroupthedecisionsandquestionsintousecasesthatsharecommonbusinessand/orfinancialobjectives.Haveparticipantsgatheraroundtheflipchartsandgroupthestickynotesintousecasesontheflipchartsheets.Oncethestickynotesaregroupedintocommonusecases,useamarkertodrawacirclearoundeachofthegroupingsandgiveeachgroupingadescriptiveshortname.ForHealthcareSystems,brainstormingthe“improvecost/qualitypatientcare”businessinitiativecouldyieldusecasessuchasunplannedreadmissionsanalysis,hospitalacquiredinfections,servicevarianceanalysis,staffing/cost/outcomesanalysis,staffretention,procedurescostanalysis,volumeforecasting,andpopulationhealth.
Prioritize the Groupings
Next, have the workshop participants prioritize the use cases using the prioritization matrix (see Figure 13.4). The use of the prioritization matrix is covered in depth later in this chapter.
Figure 13.4 Prioritize Healthcare Systems' use cases
Summarize Workshop Results
Finally, summarize the results of the workshop, including:
Review of the prioritized list of potential “Analytics Opportunities.” Verify that everyone buys off on the end result.
Review of “Parking Lot” items and discussion of any potential follow-up steps.
Discussion of next steps.
The Big Data Vision Workshop deliverables include:
Prioritization matrix with the prioritization of the use cases
The sticky note content for each use case
Interview takeaways
Data scientist illustrative analytics
User experience mock-ups
Documentation of the Parking Lot items (for potential follow-up)
Data assessment worksheets that assess the business value and implementation feasibility of each data source
Setting Up the Workshop
There are many “little” things that need to be done prior to the workshop. And while you may be tempted to skip over these seemingly superficial tasks, they are critical to the workshop's success because they set the proper stage for the desired “out of the box” thinking... yes, thinking differently!
Pick a creative, out-of-the-box location. I have done workshops in the middle of an Iowa cornfield (for a wind turbine energy company), in a grade school classroom complete with those little chairs and tables (for a charter school), in a comedy club (for a gaming establishment), and in a technology museum (for a high-tech manufacturer).
Set up the room for facilitated conversations, which can include the following:
Arrange chairs in a horseshoe shape
Create a “Parking Lot” flip chart and tape it to the wall
Create a “Ground Rules” flip chart and tape it to the wall
Create a prioritization matrix chart and tape it to the wall
Tape five to six blank flip chart sheets to the walls for brainstorming
Have plenty of 3 × 5 sticky notes and markers available for impromptu capturing of ideas and thoughts
Confirm the meeting time and duration. You do not want people walking out of the workshop halfway through because they thought the session was only two hours. Have participants block out four to five hours, and if you get done sooner, give them the time back.
Kick off the meeting by:
Explaining why the participants are there and the objectives of the workshop.
Sharing the roles of the workshop team (facilitator, data scientist, subject matter expert, and scribe).
Having everyone share their name, their responsibilities, and their expectations for the workshop.
Establish the workshop ground rules, including:
Only one conversation at any given moment.
No hierarchy in the room; everybody and their ideas are equal.
Turn off cell phones, tablets, and computers (or at least put them into buzz or stun mode).
Share any and all ideas (the only bad idea is the one that isn't shared).
Breaks are planned throughout the workshop, so please stay with the group.
Use icebreakers to kick off the workshop and get everyone participating. There are several different types of icebreakers. Be creative and relevant to the client's environment. For example:
Have everyone share with the group something about themselves that they don't think anyone else knows.
With a movie chain client, we asked each participant to identify a movie character that they are most like and why.
Have participants pick their favorite superhero and explain why that superhero is their favorite.
Use a Parking Lot flip chart to control the workshop. Explain the purpose of the “Parking Lot” (i.e., it captures topics that are outside the scope of the workshop and keeps the workshop moving in the right direction).
During the workshop, use the following techniques to help fuel the participants' creative thinking process:
Have the workshop participants capture one idea or thought per sticky note throughout the scenarios.
Have the facilitators place the sticky notes on the flip charts as the ideas or thoughts come up.
Have the facilitators read aloud the idea or thought as they are posting it to the wall; this helps to fuel the creative thinking process.
Ensure that participants brainstorm individually. If you brainstorm in groups, good ideas can get lost when there are overpowering personalities in the groups.
The Prioritization Matrix
One big obstacle to a successful big data journey is gaining consensus and alignment between the business and IT stakeholders in identifying the big data use cases that deliver sufficient financial value to the business while possessing a high probability of implementation success over the next 9 to 12 months. You can identify multiple use cases where big data and advanced analytics can deliver compelling business value. However, many of these use cases have a low probability of implementation success over the next 9 to 12 months because of:
Lack of availability of timely, accurate data.
Inexperience with new data sources such as social media, mobile, unstructured, and sensor data.
Limited data or analytic people resources.
Lack of experience with new technologies like Hadoop, MapReduce, Spark, Mahout, MADlib, text mining, etc.
Weak business and IT collaborative relationship.
Lack of management fortitude to stick with the engagement.
One of my favorite organizational alignment tools for addressing this issue is the prioritization matrix. The prioritization matrix is a marvelous tool for:
Identifying the “right” use case to pursue with big data based on a balance of business value and implementation feasibility.
Ensuring that both IT and business stakeholders have a voice in discussing the relative value and implementation challenges for each use case.
Capturing the business drivers and implementation risks for each of the use cases.
Catalyzing the decision on the “right” use cases so that everyone (business and IT) can agree on a path forward.
The prioritization matrix is the capstone of the Big Data Vision Workshop process. The prioritization matrix facilitates the discussion (and debate) between the business and IT stakeholders in determining the “right” use case on which to focus the big data initiative. The “right” use case has both meaningful business value (from the business stakeholders' perspectives) and reasonable feasibility of successful implementation (from the IT stakeholders' perspectives) over the next 9 to 12 months.
Focusing the prioritization matrix process on a key business initiative (such as reducing churn, increasing same-store sales, minimizing financial risk, optimizing marketing spend, or reducing hospital readmissions) is critical, as it provides the foundation and the guardrails for meaningful business value and implementation feasibility discussions.
The prioritization matrix process starts by placing each identified use case (identified in the Big Data Vision Workshop) on a sticky note. The workshop participants then decide the placement of each use case on the prioritization matrix (weighing business value and implementation feasibility) vis-à-vis the relative placement of the other use cases (see Figure 13.5).
Figure 13.5 Prioritization matrix template
The business stakeholders are responsible for the relative positioning of each use case on the business value axis. The IT stakeholders are responsible for the relative positioning of each use case on the implementation feasibility axis (considering data, technology, skills, and organizational readiness).
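To make the two axes concrete, here is a minimal Python sketch of how the group's relative placements might be recorded and sorted into quadrants. It assumes each use case receives a score from 1 to 10 on each axis, that a midpoint of 5 separates high from low, and that use cases are ranked by the product of the two scores; the quadrant labels and ranking rule are illustrative choices, not part of the prioritization matrix template. The use case names come from the Healthcare Systems example, but the scores are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical scale: 1 (low) to 10 (high); the midpoint is an illustrative choice.
MIDPOINT = 5.0


@dataclass
class UseCase:
    name: str
    business_value: float  # positioned by the business stakeholders
    feasibility: float     # positioned by IT (data, technology, skills, readiness)


def quadrant(uc: UseCase) -> str:
    """Return an illustrative quadrant label for a use case's placement."""
    high_value = uc.business_value >= MIDPOINT
    high_feasibility = uc.feasibility >= MIDPOINT
    if high_value and high_feasibility:
        return "pursue first"
    if high_value:
        return "high value, reduce implementation risk"
    if high_feasibility:
        return "feasible, low value"
    return "deprioritize"


def rank(use_cases: list[UseCase]) -> list[UseCase]:
    """Order use cases by value x feasibility (a simple ranking rule, an assumption)."""
    return sorted(use_cases, key=lambda uc: uc.business_value * uc.feasibility, reverse=True)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    cases = [
        UseCase("Unplanned readmissions analysis", 9, 7),
        UseCase("Population health", 8, 3),
        UseCase("Volume forecasting", 4, 8),
        UseCase("Staff retention", 3, 2),
    ]
    for uc in rank(cases):
        print(f"{uc.name}: {quadrant(uc)}")
```

The scores only capture where the group placed each sticky note; the reasons behind each placement, captured during the debate, remain the real output of the process.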
The heart of the prioritization process is the discussion that ensues about the relative placement of each of the use cases. Issues discussed could include (see Figure 13.6):
Figure 13.6 Prioritization matrix process
Why is use case [B] more or less valuable than use case [A]? What are the specific business drivers or variables that make use case [B] more or less valuable than use case [A]?
Why is use case [B] less or more feasible from an implementation perspective than use case [A]? What are the specific implementation risks that make use case [B] less or more feasible than use case [A]?
It is critical to the organizational alignment process to capture the reasons for the relative positioning of each use case during the prioritization process. These discussions provide the financial guidelines necessary to achieve the use case business value and flag potential implementation risks that need to be addressed during the project.
Summary
The Big Data Vision Workshop and the prioritization matrix are marvelous tools for driving organizational alignment between the business and IT stakeholders about where and how to start the organization's big data journey. These tools provide a framework for identifying the relative business value of each use case vis-à-vis its implementation risks over the next 9 to 12 months. As a result of the prioritization process, both the business and IT stakeholders know what use cases they are targeting, understand the potential business value of each use case, and have their eyes wide open to the implementation risks against which the project needs to manage.
The bottom line is that the Big Data Vision Workshop and prioritization matrix ensure that the full force of the organization can be brought to bear in capturing the business potential of the organization's big data initiative.
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Grab some coworkers and block off 30 to 45 minutes to test out the prioritization matrix process. As a group, identify some initiatives or projects the organization is contemplating over the next 9 to 12 months. Then use the prioritization matrix to debate, argue, and arm wrestle about where to position each of these projects vis-à-vis each other on the prioritization matrix.
Exercise #2: Have some fun with the prioritization matrix! Grab some guys and gals and identify the current top 10 to 12 NBA and WNBA basketball players (that alone may be a difficult challenge). Use the prioritization matrix process to decide the value of the players based on their:
Personal performance: their personal performance numbers like points scored, number of rebounds, and number of assists
Importance to the team: their ability to make their teammates better (I wonder how one would compare Stephen Curry to Elena Delle Donne. Man, that should be a fun discussion!)
Notes
1 EMC Corporation is the world's leading developer and provider of information infrastructure technology and solutions that enable organizations of all sizes to transform the way they compete and create value from their information.
Chapter 14 Organizational Ramifications
Now comes the hard part. No, it's not the technology and knowing which technologies to back and which ones might fade. No, it's not the lack of data science talent. And no, it's not even gaining the buy-in of the business stakeholders, though that can be a huge issue, as we have discussed throughout this book.
The biggest threat to the success of any organization's big data initiative is organizational impediments. More accurately put, it is overcoming the organizational inertia and implementing the organizational and cultural changes necessary to advance from business monitoring to business optimization, monetization, and ultimately metamorphosis. It's tough to get the organization to “think differently.” As Pogo famously said, “We have met the enemy and he is us.”
In this chapter, you will explore the role of the Chief Data Officer, which I prefer to call the Chief Data Monetization Officer. You are going to consider the trio of privacy, trust, and decision (not data) governance. And finally, the chapter concludes with guidance for liberating the organization and unleashing the only thing standing between big data mediocrity and big data metamorphosis: creative thinking.
Chief Data Monetization Officer
There's a new sheriff in the big data world and that's the Chief Data Officer (CDO). A more accurate title for this role is Chief Data Monetization Officer (CDMO), as this person should focus on driving and deriving value from the organization's data and analytic assets. The CDMO should own the organization's investment decisions with respect to data and analytics and own the charter for identifying and managing the organization's data and analytics monetization initiatives.
An ideal CDMO candidate should have a background in economics. The CDMO doesn't need an information technology background (that's the CIO's job). I recommend an economics education because economists have been trained to assign value to abstract concepts and assets. An economist is an expert who studies the relationship between an organization's resources and its production or value. And in today's world, assigning value to data can be extremely abstract.
CDMO Responsibilities
The CDMO owns quantifying the value of data and championing the organization's efforts to monetize the organization's data (by applying analytics in order to optimize key business processes and uncover new revenue opportunities). The CDMO must collaborate with business management to determine the costs, benefits, and Return on Investment (ROI) for data and data-related business initiatives.
The CDMO should sit between the business leaders (who have profit or margin responsibilities) and the CIO (who owns the technology decisions) in order to drive the identification, valuation, and prioritization of data acquisition and data monetization projects. The CDMO should report to the COO or CEO because the CDMO should also have revenue and margin responsibilities. Reporting to the COO or CEO also ensures the CDMO has the organizational clout to drive collaboration between business management and the CIO and to lead the organization's data monetization efforts.
CDMO Organization
As organizations build out their data science teams, the data science teams should fall under the purview of the CDMO. The data science team needs a senior management champion, and the CDMO is the best choice. Heck, I'd even put the Business Intelligence (BI) teams under the purview of the CDMO in order to drive closer collaboration and share learnings between the BI and data science teams. (See Chapter 5 for a review of the differences between BI and data science.)
I would have the data scientists (and BI teams) hard line to the business functions and dotted line into the CDMO (see Figure 14.1).
Figure 14.1 CDMO organizational structure
To drive data monetization success, the BI and data science teams must thoroughly understand the organization's key business initiatives, the decisions that the business needs to make, and the questions that the business needs to answer to support those business initiatives. The BI and data science teams need to be accountable to the line of business because that's where value (revenue, profit, margin) is being created.
By the way, the CDMO should NOT own the data lake, the data warehouse, or any of the underlying data architecture or technologies. These data architectures and technologies need to be owned by the CIO. Consequently, the CDMO must collaborate with the CIO to create a data architecture and technology roadmap that supports the CDMO's monetization efforts.
Analytics Center of Excellence
The analytics Center of Excellence (COE) is critical to the success of the CDMO's data monetization charter and needs to be the responsibility of the CDMO. Key CDMO tasks with respect to the COE include:
Hiring, development, promotion, retention, and talent management of the data science and Business Intelligence teams (even if they do sit within the business units)
Continuous training program and certification on new technologies and analytic algorithms
Active industry and university monitoring to stay on top of the most current data and data science trends
Business Intelligence, data visualization, statistical, predictive analytics, machine learning, and data mining tool evaluations and recommendations
Capturing, sharing, and management (i.e., library function) of the Business Intelligence, data warehousing, and data science best practices across the organization
Identifying analytic processes worthy of legal or patent protection
The analytics COE becomes the sun around which the data science and Business Intelligence personnel “orbit” from a skills and career development perspective.
CDMO Leadership
The CDMO needs to work closely with the Finance department in order to develop data acquisition and data monetization ROI estimates. Finance will keep the CDMO honest with respect to creating value, but expect that relationship to be “challenging” because Finance will likely struggle with putting value on intangible assets like data and analytic insight. That's where the CDMO's economics background will help.
Also, the CDMO will need to become a master facilitator (by the way, this is a good skill for anyone who is trying to bridge the gap between IT and the business). The CDMO is going to need to leverage teamwork and collaboration to be successful in the job. He or she also must be at the forefront of these data and analytic enablement discussions. The CDMO must take the initiative to lead the cultural change necessary to get the organization to more readily embrace data and analytics in the operations of the business.
Introducing a CDMO role can be a significant challenge. Not only is the role new, so there are no predefined best practices to leverage, but determining the value of data and analytics is also something few organizations have mastered. The CDMO will have to continually prove himself or herself to the rest of the organization. The CDMO will have to evangelize the role that data and analytics can play in improving the entire organization's decision-making capabilities and empowering frontline employees and customers.
Privacy, Trust, and Decision Governance
By now, we've all heard the story. A retailer, by virtue of its advanced analysis of website activities, determined with some level of confidence that a particular website visitor was a pregnant woman. On the basis of this insight, the retailer started mailing baby-related coupons (prenatal care, baby room furniture, nursing products, etc.) to the woman, who was actually a 16-year-old girl. The girl's father became outraged when he saw the coupons addressed to his daughter. He complained to the local store manager, only to learn two weeks later that his daughter was indeed pregnant.
Many in the data science community might perceive this as a huge success: the merchandiser's superior data science skills were able to determine that a female customer was pregnant even before her father knew! However, the retailer created a public relations fiasco, because just as the retailer knew with some level of confidence that the customer was pregnant, the retailer also likely knew with some level of confidence that she was underage.
There are numerous other examples where an organization may uncover (with some level of confidence) insights about its customers but should not act on those insights. Examples include:
Customer is researching cancer or some other serious ailment
Customer is researching a new job (if he or she has an existing job)
Customer is researching dating sites (if he or she is married)
Customer is researching divorce lawyers (if he or she got busted visiting dating sites)
All of these situations can probably be ascertained (with some level of confidence) by mining a customer's keyword searches, social media postings and exchanges, e-mail communications, and website and blog visits (e.g., time on a site, frequency of visits, recency of visits, etc.). However, acting on these suspected situations could be catastrophic from an organizational goodwill and public relations perspective. Which brings us to what I believe should be the “golden rule” for big data and data science:
Just because you know or suspect something about a customer does NOT necessarily mean that you should act on that knowledge.
Failure to adhere to this rule can be catastrophic for an organization, leading to privacy issues, fines, and potentially even lawsuits.
Privacy Issues = Trust Issues
Customer loyalty programs thrive because organizations give their customers something in return for purchase information and information about the customer. I'm a member of numerous loyalty programs, and these loyalty programs reward my loyalty with discounts, free coffee and pastries, free airline trips and hotel stays, and cash. I give them information about my shopping and travel activities, and in return they pay me back in rewards and discounts.
However, I'm hesitant to share any additional personal information because (1) these organizations have not given me a compelling reason to share more personal information, and (2) I do not trust them to use that data in my best interests. Let me walk you through an example.
Let's say that you are a grocery chain and you would love to know the following information as a customer walks into your store:
What's on her shopping list?
What's her budget?
Is there any particular event (birthday, barbeque, party) for which she is planning?
With that information, the grocery chain could create a set of recommendations that would allow the customer to optimize her budget, as well as recommend items that might be useful for the upcoming event. That would be a real win for both the customer and the grocer. In fact, I would be willing to share that information with my grocer as long as I could be confident that the grocer was making recommendations that were in my best interest.
However, the minute the retailer recommends something that is not of value to me but is of value to it (i.e., recommends one of the retailer's more profitable private label products as a replacement for the branded product that I have used for years), then it will have violated my trust that it would only use my data in my best interests.
Trust is the heart of the privacy issue from a customer's perspective:
Customers don't trust the organization to have the guidelines and governance in place to know when it should act, and when it should NOT act, on insights that it has gleaned about them.
Customers don't trust the organization to focus on the customer's best interests instead of the organization's best interests.
Customers don't trust the organization to refrain from selling their personal data to others for its own gain.
This privacy issue is only going to become bigger and bigger, especially as organizations become more proficient at mining big data and uncovering new insights about their customers' interests, passions, affiliations, and associations.
One simple way to test whether or not you should act on the insights that you have gained about your customers is the “Mom” test. That is, what would your mom think of your decision about how you use that information about a customer? In most cases, the Mom test would quickly identify those things that are just not the right thing to do.
However, organizations can't rely on the Mom test, so they need a more formal decision governance organization.
Decision Governance
Organizations need a formal decision governance organization and processes that clearly articulate the rules, policies, and regulations with respect to how organizations will and will not use information about their customers. Decision governance is different from data governance in the following ways:
Data governance provides policies, procedures, and rules that manage the availability, usability, integrity, security, and accessibility of an organization's data.
Decision governance provides policies, procedures, and rules that manage the capture, privacy, and use of the insights to drive interactions or decisions that might impact a particular customer.
Most organizations already have a data governance organization, so they likely already have the experience, policies, procedures, and people on which they could build their decision governance organization.
The decision governance team must work with the business stakeholders to decide what information they are seeking on their customers and clearly define when and where they will use that information. And if there ever is a situation that is not covered by the decision governance policies, then no action should be taken until the decision governance organization has decided what the proper action should be.
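The rule that no action is taken until the decision governance organization rules on an uncovered situation behaves like a default-deny check. The following minimal Python sketch assumes a hypothetical policy table keyed by (insight, proposed action); the insight and action names and the three rulings are illustrative assumptions, not a real policy or product.

```python
from enum import Enum


class Ruling(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # not covered by policy: take no action, refer to governance


# Hypothetical policy table maintained by the decision governance organization.
# Keys are (insight, proposed action); anything missing counts as uncovered.
POLICIES: dict[tuple[str, str], Ruling] = {
    ("frequent_coffee_buyer", "send_coffee_coupon"): Ruling.ALLOW,
    ("suspected_pregnancy", "send_baby_coupons"): Ruling.DENY,
    ("researching_serious_illness", "send_targeted_offer"): Ruling.DENY,
}


def review(insight: str, action: str) -> Ruling:
    """Look up a proposed action; uncovered combinations escalate instead of proceeding."""
    return POLICIES.get((insight, action), Ruling.ESCALATE)


def may_act(insight: str, action: str) -> bool:
    """Only an explicit ALLOW permits action; DENY and ESCALATE both mean no action."""
    return review(insight, action) is Ruling.ALLOW
```

Because anything missing from the table returns ESCALATE, a new kind of insight cannot be acted on until someone in decision governance adds an explicit ruling for it.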
Decision governance has become a priority for organizations because the advent of big data is enabling organizations to gather detailed insights about their customers' behaviors, tendencies, propensities, interests, passions, affiliations, and associations that can easily be used for both appropriate and inappropriate decisions and actions. Lack of decision governance is a clear and present danger to organizations that are trying to mine actionable insights out of their bounty of consumer data. Organizations need to act to ensure the proper and ethical use of their customers' data and the resulting analytic insights; otherwise, they risk opening themselves to significant privacy issues and lawsuits.
Unleashing Organizational Creativity
Ah, the anguish of not knowing the “right” answers. Organizations struggle with the process of determining the “right” answers, resulting in wasted debates and divisive arguments regarding whose answers are more right. They even have a name for this debilitating process, analysis paralysis, where different sides of the argument bring forth their own factoids and anecdotal observations to support the justification of their “right” answer. However, the concepts of experimentation and instrumentation can actually liberate organizations from this “analysis paralysis” by providing a way out: a way forward that leads to action versus just more debates, more frustrations, and more analysis paralysis.
For many organizations, the concepts of experimentation and instrumentation are a bit foreign. Internet companies and direct marketing organizations have ingrained these two concepts into their analytics and customer engagement processes through concepts like A/B testing.1 They have leveraged the concepts of experimentation and instrumentation to free up the organizational thinking (to freely explore new ideas and test “hunches”), but in a scientific manner that results in solid evidence and new organizational learning.
Let's examine how your organization can embrace these same concepts as part of your big data business strategy. Let's start by defining two key terms:
Experimentation is the act, process, practice, or instance of making experiments, where an experiment is a test, trial, or tentative procedure; an act or operation for the purpose of discovering something unknown or of testing a principle, supposition, or hypothesis.
Instrumentation is the process of measuring the experimentation results within a production or operational environment.
Taken together, these two concepts can liberate organizations that are suffering from analysis paralysis and are struggling when they are not certain what decision to make. The concepts of experimentation and instrumentation can empower the creative thinking that is necessary as organizations look to identify how to integrate data and analytics into their business models. This “empowerment” cycle empowers organizations to freely consider different ideas without worrying about whether the ideas are correct ahead of time. Organizations can let the tests tell them which ideas are “right” and not let the most persuasive debater or most senior person make that determination. It empowers the organization to challenge conventional thinking and unleashes creative thinking that can surface new monetization ideas. No longer do you have to spend time debating whose idea is right. Instead, put the ideas to the test and let the data tell you!
Let's walk through an example of how the empowerment cycle works (see Figure 14.2):
Figure 14.2 Empowerment cycle
Step 1: Develop a hypothesis or hunch that you want to test. For example, I believe that my target audience will respond more favorably to a “Buy One Get One Free” (BOGOF) offer, while my colleague believes that a “50% off” offer is more attractive to our target audience.
Step 2: Develop the different test cases that can prove or disprove the hypothesis. You want to be clear as to the metrics you would use to measure the test results (e.g., click-through rate, store traffic, sales, market sentiment). In this example, we would create three test cases: “BOGOF” offer, “50% off” offer, and a control group. We would randomly select our test and control audiences and ensure that other variables are being held constant during the test (e.g., same time of day, same audience characteristics, same channel, same time frame, etc.).
Step 3: Measure the results of the test cases in order to determine the effectiveness of the test cases. In this example, we'd want to ensure that each of the three test cases was appropriately instrumented or “tagged” and that we were capturing all the relevant data to determine who responded to which offers, who didn't respond, and the ultimate outcomes of their responses.
Step 4: Execute the tests. We would now start the tests, capture the data, end the tests, and quantify the test results.
Step 5: Learn and move on. We'd look at the test results, examine who responded to what offers, determine the final results, and declare a winner. We would then package or share the learnings with other parts of the organization and then move on to the next test.
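Steps 2 through 5 can be sketched in a few lines of Python. This is a minimal sketch, assuming customers are identified by IDs, that responders are known once the tagged outcomes come in, and that response rate is the chosen metric. It swaps in a standard two-sided two-proportion z-test with a 0.05 threshold to decide whether each offer beat the control group; the function names and the seed are hypothetical.

```python
import math
import random

# The three test cases from Step 2.
ARMS = ("BOGOF", "50% off", "control")


def assign(customer_ids, seed=42):
    """Step 2: randomly assign each customer to one test case (arm)."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced and audited
    return {cid: rng.choice(ARMS) for cid in customer_ids}


def response_rates(assignments, responders):
    """Steps 3-4: tally [responses, audience size] per arm from the tagged outcomes."""
    counts = {arm: [0, 0] for arm in ARMS}
    for cid, arm in assignments.items():
        counts[arm][1] += 1
        if cid in responders:
            counts[arm][0] += 1
    return counts


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # nobody (or everybody) responded in both arms: no evidence either way
        return 0.0, 1.0
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def compare_to_control(counts, alpha=0.05):
    """Step 5: compare each offer's response rate against the control group."""
    xc, nc = counts["control"]
    results = {}
    for arm in ARMS:
        if arm == "control":
            continue
        x, n = counts[arm]
        z, p = two_proportion_z(x, n, xc, nc)
        results[arm] = {"rate": x / n, "z": z, "p_value": p, "significant": p < alpha}
    return results
```

The significance threshold and the audience sizes belong in Step 2, agreed on before the test runs, so that nobody moves the goalposts after seeing the results.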
The empowerment cycle leverages experimentation and instrumentation to empower organizations to freely explore and test new ideas, and it empowers organizations to get moving and not get bogged down in analysis paralysis. Experimentation and instrumentation are the anti-analysis paralysis ointment, because they provide organizations with the tools and concepts to test ideas, learn from those tests, and move on.
Summary
There are several organizational issues that need to be addressed in order to help organizations integrate data and analytics into their business models. This chapter addressed some concepts to help the organization more effectively adopt data and analytics:
The role of the Chief Data Monetization Officer to lead the organization's data and analytics investment and monetization efforts
Addressing the issues of privacy and trust through a formalized decision governance organization
How to unleash the organization's creative thinking, which is the only thing standing between big data mediocrity and big data metamorphosis
Finally, don't forget this critical customer golden rule:
Just because you know or suspect something about a customer does NOT necessarily mean that you should act on that knowledge.
Make your mom proud.
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Document the biggest organizational challenges that a CDMO would face within your organization. For each challenge, brainstorm some ideas as to what the CDMO could do to address those issues.
Exercise #2: Identify some business partners with whom you could discuss and ultimately test the empowerment cycle. Identify some hypotheses or rules of thumb that your business partners would like to challenge. Brainstorm the decisions, analytics, and data requirements necessary to challenge that conventional thinking.
Notes
1 A/B testing (also known as split testing) is a method of comparing two versions of a web page against each other to determine which one performs better. By creating an A and B version of your page, you can validate new design changes, test hypotheses, and improve your website's conversion rate. Source: https://www.optimizely.com/ab-testing/
Chapter 15 Stories
Everyone loves stories to which they can relate, which probably makes stories the ideal way to conclude this book. While stories can be fun and funny, the most valuable stories are those that motivate us to think differently and take action, where the story is so compelling that the reader can't wait to put the ideas into action!
The goals of this chapter are to share some big data stories and to help you, the reader, develop inspiring stories that are relevant to your organization and motivate the organization into action.
Instead of providing a long list of the different analytics that are occurring within different industries, I'm offering a “think differently” approach for how you find and construct big data stories that are the most relevant to your organization. Instead of looking at the big data stories from the traditional industry perspective, let's look at stories from the perspective of the organization's strategic nouns, or key business entities. I find that most big data and data science stories fall into three categories of business entity analytics (regardless of industry):
Customer and employee analytics
Product and device analytics
Network and operational analytics
The advantage of looking for stories across these three categories is that it prevents organizations from artificially limiting themselves in searching for relevant big data stories. Many organizations are only interested in hearing about big data stories that are happening within their industry. That's the “safe” way to go. But sometimes the most powerful opportunities are realized from stories from other industries. Having a broader view of these big data stories can open the eyes of the business executives as to the potential of big data within their organizations.
For example, digital media organizations use “attribution analytics” to quantify the impact of different digital media treatments (messaging, websites, impressions, display ad type, display ad page location, keyword searches, social media posts, dayparting, etc.) on a conversion or sales event. Think about how many different websites, display ads, and keyword searches you interact with as you decide to do something (e.g., buy a product, request some collateral, download an article, play a game, research an event, etc.). Attribution analysis looks at “baskets” of digital media treatments and activities that lead to particular conversion events across a large number of visitors and creates complex data enrichment calculations (frequency, recency, and sequencing of marketing treatments) in order to attribute sales credit to these different digital media treatments. Think “hockey assist,” as in trying to measure the impact that a wide variety of digital media treatments had over a period of time to drive a conversion or sales event.1 Following is an example of how organizations use attribution analysis to maximize campaign return on marketing investment (ROMI):
Digital media attribution analysis
1. Track Activities Leading to Conversion Events. Create market baskets of keyword searches, site visits, display impressions, display clicks, and other media treatments associated with each conversion event.
2. Enrich Data to Create New Metrics to Understand Drivers of Visitor Behaviors. Create metrics around frequencies, ordering, sequencing, and latencies.
3. Analyze Metrics to Quantify Cause and Effect. Identify commonalities in baskets, calculate correlations and strength of correlations, and build “conversion path” models.
4. Operationalize Actionable Insights. Operationalize insights into media planning and buying systems, and guide in-flight campaign execution.
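Steps 2 and 3 can be made concrete with a short sketch. This minimal Python example assumes each conversion path is a list of (hour, treatment) touches recorded before the conversion; the treatment names are hypothetical, and linear attribution (each touch on a path gets an equal share of that conversion's credit) is swapped in as one simple crediting rule, not the correlation-based conversion-path models described in step 3.

```python
from collections import Counter, defaultdict


def path_metrics(touches, conversion_time):
    """Step 2: enrich one conversion path with frequency, recency, and sequence metrics.

    touches: list of (hour, treatment) pairs recorded before the conversion.
    """
    ordered = sorted(touches)  # chronological order (ties broken by treatment name)
    frequency = Counter(treatment for _, treatment in ordered)
    last_seen = {}
    for hour, treatment in ordered:
        last_seen[treatment] = hour  # later touches overwrite earlier ones
    recency = {t: conversion_time - hour for t, hour in last_seen.items()}
    sequence = [treatment for _, treatment in ordered]
    return {"frequency": dict(frequency), "recency": recency, "sequence": sequence}


def linear_attribution(paths):
    """Split each conversion's credit equally across all touches on its path."""
    credit = defaultdict(float)
    for touches in paths:
        if not touches:
            continue  # a conversion with no recorded touches gets no media credit
        share = 1.0 / len(touches)
        for _, treatment in touches:
            credit[treatment] += share
    return dict(credit)
```

Because each path's shares add up to 1, the total credit equals the number of conversions, which makes the output easy to sanity-check; a position-based or model-driven rule would replace only the share calculation.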
That same attribution analytics would work perfectly in the area of healthcare, where physicians, nurses, and other caregivers are trying to determine or attribute the impact on a patient's wellness across a wide variety of healthcare “treatments” including medications, surgery, supplements, therapy, diet, exercise, sleep, stress, religion, consoling, and many other health-impacting variables. Using the digital media attribution analytics, healthcare organizations could determine which combinations, frequency, recency, and sequencing of healthcare treatments are most effective for which types of patients in what types of wellness situations. But if healthcare organizations only look within their own industry, they are likely to miss opportunities to learn from other industries' analytics stories and miss the opportunity to apply those stories to optimize their own key business processes, uncover new monetization opportunities, and gain a competitive edge within their industry.
These three business entity analytics buckets will help you see that the use case type is more relevant than the industry from which it came, and they provide a “think differently” moment to borrow analytic best practices from other industries. Let's discuss each of these three categories in more detail to see what stories you might uncover that could be meaningful to your organization:
Customerandemployeeanalytics
Productanddeviceanalytics
Networkandoperationalanalytics
CustomerandEmployeeAnalyticsFororganizationsinbusiness-to-consumer(B2C)industries,understandingandtakingcareofcustomersisjob#1.Understandingindetailthepropensities,tendencies,patterns,interests,passions,affiliations,andassociationsofeachofyourindividualcustomersiskeytoincreasingrevenue,reducingcosts,mitigatingrisks,andimprovingmarginsandprofits.
Customerscantakemanyformsincludingvisitors,passengers,travelers,guests,lodgers,patients,students,clients,residents,citizens,constituents,prisoners,players,andmore.ManyB2Cindustriescanbenefitdirectlyfromdataandanalyticsthatyieldsuperiorinsightsintothebehaviorsoftheircustomersincluding:
Retail
Restaurants
Travel and hospitality
Airlines
Automotive
Gaming
Entertainment
Banking
Credit cards
Financial services
Healthcare
Insurance
Media
Telecommunications
Consumer electronics (e.g., computers, tablets, digital cameras, digital media players, GPS devices)
Primary and higher education
Utilities
Oil and gas
Public service agencies
Government agencies
The foundation of customer analytics is identifying, quantifying, and predicting the individual customer's behavioral characteristics (propensities, tendencies, patterns, trends, interests, passions, associations, and affiliations) to identify opportunities to engage the customer and influence his or her behaviors. Some call this “catching the customer in the act.” The more timely the identification of these customer interactions, the better the chances of uncovering new revenue or monetization opportunities. Customer analytics include the following:²
Customer acquisition measures the effectiveness of different sales and marketing techniques to get customers to sample or trial your product or service.
Customer activation measures the effectiveness of different sales and marketing techniques to get customers to regularly use and/or pay for your product or service.
Customer cross-sell and up-sell measures the effectiveness of different sales, marketing, and merchandising techniques to get customers to upgrade the products and services that they already use or buy and/or get customers to use or buy complementary products and services.
Customer retention measures the effectiveness of sales, marketing, and customer service treatments to identify customers likely to attrite and the subsequent efforts to retain those customers.
Customer sentiment monitors the sentiment of customers across multiple social media sites, blogs, consumer comments, and e-mail conversations to flag product, service, or operational problem areas and recommend corrective action.
Customer advocacy measures how effective particular customers are at influencing other customers' actions or behaviors.
NOTE
Some industry estimates show that just 3 percent of the participants in an online conversation yield over 90 percent of the results (such as likes, views, retweets, and linkbacks) within a particular subject area.
Customer lifetime value determines the current (and future or maximum) value of a particular customer.
Customer fraud monitors and flags potentially fraudulent activities in real time in order to recommend timely corrective or preventive action.
Cohort analysis determines the impact that one particular customer has on other customers in driving particular customer and/or group behaviors.
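Customer lifetime value is the item in this list that reduces most directly to a formula. As a minimal sketch, assuming a constant per-period margin m, a constant retention probability r, and a discount rate d (all illustrative assumptions, not the scoring approach the book describes), CLV is the discounted sum of expected margins, m * r^t / (1 + d)^t over periods t = 1 to T, which approaches m * r / (1 + d - r) as the horizon grows. Real CLV scores estimate margin and retention per customer from behavioral data.

```python
def customer_lifetime_value(margin: float, retention: float,
                            discount_rate: float, periods: int) -> float:
    """Discounted expected margin from one customer over a finite horizon.

    Illustrative assumptions: constant margin per period, constant probability
    `retention` that the customer stays each period, and margin received at
    the end of each period in which the customer is still active.
    """
    return sum(
        margin * (retention / (1 + discount_rate)) ** t
        for t in range(1, periods + 1)
    )


def customer_lifetime_value_limit(margin: float, retention: float,
                                  discount_rate: float) -> float:
    """Closed form of the sum above as periods -> infinity: m * r / (1 + d - r)."""
    return margin * retention / (1 + discount_rate - retention)


if __name__ == "__main__":
    # Hypothetical customer: $100 margin per year, 80% retention, 10% discount rate.
    print(round(customer_lifetime_value(100, 0.8, 0.10, periods=5), 2))
    print(round(customer_lifetime_value_limit(100, 0.8, 0.10), 2))
```

With these hypothetical inputs the infinite-horizon value is 80 / 0.3, about 266.67; the finite-horizon version is useful when a contract or product life caps how long a customer can stay.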
There is also a set of customer analytics around marketing. These marketing analytics include:
Targeting effectiveness measures the effectiveness of marketing's targeting efforts to reach the “right” or highest qualified prospects.
Re-targeting effectiveness measures the effectiveness of re-targeting efforts to re-target prospects that have shown an interest in a particular product or service.
Segmentation effectiveness measures the effectiveness of segmentation efforts to identify high-value prospect clusters.
Campaign marketing effectiveness measures the effectiveness of general marketing campaigns at driving customer or prospect actions.
Direct marketing effectiveness measures the effectiveness of direct-to-consumer marketing campaigns to get customers to respond to marketing requests or buy particular products or services.
Promotional effectiveness measures the effectiveness of channel or partner promotional activities, events, packages, and offers.
A/B testing tests the effectiveness of two different marketing treatments (messaging, ad types, websites, keywords, daypart, and page location) to determine which marketing treatment is most effective in driving the desired customer action or behavior.
Market basket analysis determines the propensity of products or services to sell in combination with other products and services (within the same basket or shopping cart). Market basket analysis also can identify time lags between purchase events (buy a boat and then, two weeks later, buy water skis).
Attribution analysis quantifies the contribution of different digital marketing or media treatments in driving a customer event or activity (e.g., buy a product, download an app, play a game, request collateral, research an event).
Omni-channel marketing analysis quantifies the inter-play of marketing effectiveness across multiple retail or business channels (e.g., physical store, catalog, call center, website, social media) in driving sales results.
Trade promotion effectiveness measures the effectiveness of channel or partner promotions to drive end consumer sales.
Pricing and yield optimization determines both the timing and the “optimal” prices in order to maximize revenue and profitability for perishable products or services (vegetables, meat, airline seats, hotel rooms, sporting events, concerts).
Markdown management optimization determines the timing and amount of price reductions and promotions to reduce obsolete and excess inventory while balancing revenue, margin, and cost variables.
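Market basket analysis is commonly quantified with three measures for a rule A → B: support (the share of baskets containing both items), confidence (the share of baskets with A that also contain B), and lift (confidence divided by B's overall frequency; a lift above 1 means the items sell together more often than chance). The sketch below computes them for item pairs in plain Python. The baskets are hypothetical, and the sketch covers only same-basket co-occurrence, not the time-lagged purchase sequences (boat, then water skis two weeks later) the text also mentions.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets):
    """Support, confidence, and lift for every ordered item pair A -> B.

    baskets: list of iterables of item names (one per transaction).
    Each item is counted at most once per basket.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]      # P(y | x)
            lift = confidence / (item_counts[y] / n)    # P(y | x) / P(y)
            rules[(x, y)] = {"support": support,
                             "confidence": confidence,
                             "lift": lift}
    return rules


if __name__ == "__main__":
    # Hypothetical transactions from a marine supply store.
    baskets = [
        ["boat", "water skis", "life jacket"],
        ["boat", "life jacket"],
        ["water skis"],
        ["boat", "water skis"],
    ]
    for (x, y), m in sorted(pair_rules(baskets).items()):
        print(f"{x} -> {y}: support={m['support']:.2f} "
              f"confidence={m['confidence']:.2f} lift={m['lift']:.2f}")
```

In this toy data every basket with a life jacket also contains a boat (confidence 1.00, lift 1.33), while boats and water skis co-occur slightly less often than chance (lift 0.89). Production tools prune rare pairs with a minimum-support threshold before scoring.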
By the way, many of these customer analytics have a corollary for employee analytics (teachers, police officers, parole officers, caseworkers, physicians, nurses, technicians, mechanics, pilots, drivers, entertainers, etc.). These analytics include:
Employee acquisition (hiring) measures the effectiveness of different hiring practices and recruiting personnel to identify and hire the most productive and successful employees.
Employee activation (productivity or performance) measures the effectiveness of training programs and managers to engage employees and drive more productive and effective performance.
Employee development (promotions, firing) measures the effectiveness of reviews, promotions, training, coaching, interventions, and management to identify and promote high-potential employees and release low-productivity employees at the lowest cost and lowest risk.
Employee retention measures the effectiveness of promotions, raises, awards, stock options, etc. to retain the organization's most valuable and productive employees.
Employee advocacy (hiring referrals) measures the effectiveness of advocacy and referral programs to acquire high-potential job candidates.
Employee lifetime value determines or scores the current (and future or maximum) value of employees to the organization.
Employee sentiment (employee satisfaction, “best places to work” surveys, etc.) identifies, measures, and recommends corrective action on the drivers of employee and departmental dissatisfaction.
Employee fraud (shrinkage) monitors and flags shrinkage problems and triages those situations to identify root causes of fraud and shrinkage.
It can be useful to look at what other organizations in other industries are doing to better understand their customers and employees. For example, your organization could identify which organizations are best at leveraging customer loyalty programs to drive customer acquisition, maturation, retention, and advocacy. Then identify what data they are capturing about their customers and what analytics they are leveraging to improve the customer experience. There are many examples of organizations that understand how to optimize their loyalty programs. Just go grab a venti non-fat, no water chai latte at a certain coffee chain to experience that for yourself.
Product and Device Analytics
The second area of business entity analytics focuses on physical items: products and machines. Many of the same behavioral analytic basics that are used in customer analytics are applicable to products and machines. Like humans, products and machines exhibit different behavioral tendencies, especially over time. Two wind turbines built by the same manufacturer, installed at the same time, and located in the same cornfield could develop very different behaviors and tendencies over time due to usage, maintenance, upgrades, and general product wear and tear.
Analytics about products and machines (airplanes, jet engines, cars, delivery trucks, locomotives, ATMs, washing machines, routers, traffic lights, wind turbines, power plants, etc.) could include any of the following:
Predictive maintenance predicts when certain products or devices are in need of maintenance, what sort of maintenance, the likely maintenance and replacement materials, and technician skill sets.
Maintenance scheduling optimization optimizes the scheduling of resources (technicians with the right skill sets, replacement parts, maintenance equipment, etc.) in order to optimize the replacement and/or upgrading of failing or under-performing parts or products.
Maintenance, repair, and operations (MRO) inventory optimization balances MRO inventory with predicted maintenance needs in order to reduce inventory costs and minimize obsolete and excessive inventory.
Product performance optimization optimizes product performance and mean time between maintenance (MTBM) by understanding the product's or device's optimal operating performance ranges, tolerances, and variances.
Manufacturing effectiveness reduces manufacturing costs while maintaining product quality levels and production schedules through the optimal mix of supplies, suppliers, and in-house and contract manufacturing capabilities.
Supplier performance analytics quantify supplier product quality and delivery reliability in order to minimize manufacturing line downtime.
Supplier decommits/recommits analytics understand the optimal production capacities of suppliers and contract manufacturers in order to properly rebalance manufacturing needs caused by supply chain disruptions (strikes, storms, wars, raw material shortages).
Supplier network analytics triage product and supplier problems more quickly by understanding the dynamics of the underlying supplier and contract manufacturer relationships and inter-dependencies.
Product testing and QA effectiveness accelerates product quality assurance testing by optimizing the tests and/or combinations of tests that cause products, components, suppliers, and contract manufacturers to fail more quickly.
Supply chain optimization optimizes supply chain delivery and inventory levels while minimizing supply chain costs and the risks associated with obsolete and excess inventory.
MRO parts inventory optimization determines the appropriate level of MRO parts inventory based on predicted maintenance needs.
New product introductions optimize the product and marketing mix to increase the probability of success when launching new products, product extensions, and/or new product versions.
Product rationalization/retirement determines which products to divest or retire, and when, based on each product's impact on customer value and on the inter-related profitability of other products (market basket analysis).
Brand and category management analysis determines the optimal pricing, packaging, placement, and promotional variables of individual brands, and of products within brands, to drive overall brand and category revenues, profitability, and market share.
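Product performance optimization and predictive maintenance both start from knowing each device's normal operating range. A minimal sketch, assuming a simple statistical rule rather than a trained predictive model: learn a band of mean plus or minus k standard deviations from a device's own baseline readings, then flag new readings that fall outside it. Because two identical wind turbines can drift apart over time, the band is learned per device. The sensor values, device names, and the choice k = 3 are hypothetical.

```python
from statistics import mean, stdev


def learn_operating_range(baseline, k=3.0):
    """Estimate a normal operating band as mean +/- k sample standard deviations.

    Illustrative rule only; k=3.0 is an assumed default, not a recommendation.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # requires at least two baseline readings
    return mu - k * sigma, mu + k * sigma


def flag_out_of_range(readings, band):
    """Return (index, value) pairs for readings that fall outside the band."""
    low, high = band
    return [(i, v) for i, v in enumerate(readings) if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical vibration baselines for two turbines in the same cornfield.
    baselines = {
        "turbine-A": [1.0, 1.1, 0.9, 1.0, 1.05, 0.95],
        "turbine-B": [1.3, 1.4, 1.35, 1.25, 1.3, 1.4],
    }
    latest = {
        "turbine-A": [1.02, 0.98, 1.4, 1.1],
        "turbine-B": [1.32, 1.38, 1.4, 1.33],
    }
    for device, history in baselines.items():
        band = learn_operating_range(history)
        print(device, flag_out_of_range(latest[device], band))
```

The same 1.4 reading is flagged for turbine-A but falls inside turbine-B's learned band, which is the per-device "behavioral tendency" point from the text in miniature.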
Product-centric industries most impacted by product and device analytics include:
Consumer packaged goods
High-tech manufacturing
Appliance and electronics manufacturing
Sporting goods manufacturing
Food and beverage
Automotive
Agriculture
Farm machinery manufacturing
Heavy equipment manufacturing
Pharmaceuticals
Financial services
Banking
Credit cards
Insurance
Network and Operational Analytics
The third area of business entity analytics focuses on network and operational analytics. The “internet of things” (IoT) and wearable computing (Fitbit, Jawbone, Garmin) have increased the level of interest in (and the volume and variety of data about) what is happening across vast and complex human and machine/device networks. More than ever, we are an interconnected world where the actions of one person or device in a social or physical network can have a “butterfly effect” on all of the people and devices across that network.³
Networks can take many different shapes and forms, including ATM networks, retail branches, supplier networks, device sensors, in-store beacons, mobile devices, cellular towers, traffic lights, slot machines, and communication networks.
Analytics about networks and operations could include any of the following:
Demand forecasting forecasts network demand (average demand, surge demand, minimal viable demand) based on predicted network usage behaviors, patterns, and trends.
Capacity planning predicts network capacity requirements in all potential (what-if) working situations.
Unplanned downtime reduction identifies, monitors, and pre-emptively predicts the failure of the drivers of unplanned network downtime.
Network performance optimization predicts and optimizes network performance in real time across multiple usage scenarios (network traffic, weather, seasonality, holidays, special events).
Network layout optimization optimizes network layout in order to minimize traffic bottlenecks and optimize network bandwidth and throughput.
Network traffic reduction triages network traffic bottlenecks and provides real-time incentives and/or governors to reduce or re-route traffic during overload situations.
Load balancing identifies and rebalances network traffic based on current and forecasted traffic needs and current network capacity.
Theft and revenue protection identifies, understands, and recommends the most appropriate revenue protection actions based on theft situations across the network.
Predictive maintenance predicts when network nodes are in need of maintenance, what sort of maintenance, the likely maintenance and replacement materials, and technician skill sets.
Network security identifies, understands, and recommends the most appropriate actions based on unauthorized network or device/node entry or usage situations across the network.
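Demand forecasting and capacity planning can be illustrated with one of the simplest forecasting methods, simple exponential smoothing, where each new observation nudges a running estimate by a factor alpha. This is a swapped-in textbook technique for illustration, not a method the book prescribes; the demand series, the alpha value, and the surge multiplier used for the capacity check are all hypothetical planning assumptions.

```python
def exponential_smoothing_forecast(demand, alpha=0.3):
    """One-step-ahead forecast via simple exponential smoothing.

    level_t = alpha * demand_t + (1 - alpha) * level_{t-1}, seeded with the
    first observation. The final level is the forecast for the next period.
    """
    if not demand:
        raise ValueError("need at least one demand observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = demand[0]
    for observed in demand[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


def capacity_gap(forecast, capacity, surge_factor=1.5):
    """Shortfall (positive) or headroom (negative) if demand surges.

    surge_factor is a hypothetical planning multiple applied to the average
    forecast to approximate surge demand.
    """
    return forecast * surge_factor - capacity


if __name__ == "__main__":
    hourly_demand = [100, 120, 110, 130]  # hypothetical network load units
    forecast = exponential_smoothing_forecast(hourly_demand, alpha=0.5)
    print(forecast, capacity_gap(forecast, capacity=150))
```

With alpha = 0.5 the forecast is 120.0, and under the assumed 1.5x surge a capacity of 150 falls 30 units short. A higher alpha reacts faster to recent spikes; a lower alpha smooths out noise.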
Industries most impacted by network and operational analytics tend to be industries that run or manage complex projects or systems. These industries have to coordinate multiple vendors and suppliers across multiple sub-assemblies or sub-projects in order to deliver the end product or project on time and within budget. Some of these industries include:
Large-scale construction (skyscrapers, malls, stadiums, airports, dams, bridges, tunnels, etc.)
Airplane manufacturing
Shipbuilding
Defense contractors
Systems integrators
Telecommunication networks
Railroad networks
Transportation networks
There are many, many more examples of customer, product, and network analytics. The list above is a good starting point. And while investigating analytic use cases within your own industry is “safe,” better and potentially more impactful analytic use cases can likely be found by looking for customer, product, and network analytic success stories in other industries. Bucketing the analytic use cases into these three categories helps you contemplate a wider variety of analytic opportunities and best practices across different industries.
Think differently when you are in search of the analytics that may be most impactful to your organization. Don't assume that your industry has all the answers.
Characteristics of a Good Business Story
The final step in the book is to pull together the “thinking like a data scientist” results and the sample analytics to create a story that is interesting and relevant to your organization. While it can be useful to hear about what other organizations are doing with big data and data science, the most compelling stories will be those about your organization that motivate your senior leadership to take action.
You know from reading books and watching movies that the best stories have interesting characters who have been put into a difficult situation. Heck, that sounds like data science already. To create a compelling story that is interesting, relevant, and unique to your organization, you are going to need the following components (think about the process in relation to your favorite science fiction adventure movie):
Key business initiative (survival of the human race)
Strategic nouns or key business entities (pilots, scientists, aliens)
Current challenging situation (aliens are going to conquer Earth and exterminate the human race)
Creative solution (infect the alien ships with a computer virus that shuts down their defensive shields)
Desired glorious end state (aliens get their butts kicked, and the whole world becomes one united brotherhood)
Let's see this process in action:
Let's say that your organization has a key business initiative to “reduce customer churn by 10 percent over the next 12 months.”
Your strategic noun is “customer.”
The current challenging situation is “too many of our most valuable customers are leaving the company and going to competitors.”
The creative solution is “developing analytics that flag customers who have a high propensity to leave the company, creating a customer lifetime value score for each customer (so that your organization is not wasting valuable sales and marketing resources saving the ‘wrong’ customers), and delivering messages to frontline employees (call center reps, sales teams, partners) with recommended offers for any valuable customer whose score shows an ‘at risk’ propensity to leave.”
The glorious end state is a “dramatic increase in the retention of the organization's most valuable customers that leads to an increase in corporate profits, an increase in customer satisfaction, and generous raises for all!”
This is an easy process if you understand your organization's key business initiatives.
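The churn story's creative solution combines two scores. As a minimal sketch, assuming a churn propensity score (0 to 1) and a customer lifetime value score already exist for each customer, the filter below keeps only customers who are both likely to leave and worth saving, then ranks them by expected value at risk (propensity times lifetime value) so frontline employees see the most important saves first. The thresholds, the ranking rule, and the customer records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_propensity: float  # 0..1, assumed output of an existing churn model
    lifetime_value: float    # assumed output of an existing CLV model


def at_risk_worth_saving(customers, churn_threshold=0.6, min_clv=1000.0):
    """Keep valuable customers likely to leave, ranked by expected value at risk.

    Expected value at risk = churn_propensity * lifetime_value. Both thresholds
    are hypothetical defaults for illustration.
    """
    flagged = [
        c for c in customers
        if c.churn_propensity >= churn_threshold and c.lifetime_value >= min_clv
    ]
    return sorted(flagged,
                  key=lambda c: c.churn_propensity * c.lifetime_value,
                  reverse=True)


if __name__ == "__main__":
    customers = [
        Customer("A", 0.9, 5000.0),
        Customer("B", 0.7, 8000.0),
        Customer("C", 0.8, 200.0),   # likely to leave, but low value
        Customer("D", 0.2, 9000.0),  # valuable, but unlikely to leave
    ]
    for c in at_risk_worth_saving(customers):
        print(c.customer_id, round(c.churn_propensity * c.lifetime_value))
```

Here customer B (expected value at risk 5,600) is routed ahead of A (4,500), while C and D are filtered out, which is exactly the "don't waste resources saving the wrong customers" rule expressed as code.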
Summary
Broaden your horizons with respect to looking for analytic use cases. Instead of just looking within your own industry, look across different industries for analytic use cases around:
Customer and employee analytics
Product and device analytics
Network and operational analytics
Since this is the last chapter of the book, put a cherry on the top of your Big Data MBA by developing a compelling and relevant story that you can share within your organization to motivate senior leadership to action. Make the story compelling by tying one of the above analytic use cases to your organization's key business initiatives, and make the story relevant by leveraging your “thinking like a data scientist” training. That way you ensure that all the work you have put into reading this book and doing the homework can lead to something of compelling and differentiated value to the organization. And heck, maybe you will get a promotion out of it!
Congratulations! For a special surprise, go to this URL: www.wiley.com/go/bigdatamba. And don't share this URL with anyone else. Make other folks read the entire book to find this “Easter egg” surprise.
Now you have earned your Big Data MBA! Go get 'em!
Homework Assignment
Use the following exercises to apply what you learned in this chapter.
Exercise #1: Identify one of your organization's key business initiatives.
Exercise #2: Apply the “thinking like a data scientist” approach to identify the relevant business stakeholders, key business entities or strategic nouns, key decisions, potential recommendations, and supporting scores.
Exercise #3: Now create a story that weaves together all of these items with a relevant analytics example that can help senior leadership understand the business potential and motivate them to action. Use your strategic nouns to help you find some relevant analytic use cases outlined in this chapter.
Notes
1 In hockey, a “hockey assist” or credit is given to the player who gives an assist to the player who gets the ultimate assist that leads directly to another player scoring a goal. Think of this as an “assist to an assist” statistic.
2 This is not intended to be a comprehensive list of customer analytics; instead, it represents a sample of the types of customer analytics of which organizations in business-to-consumer industries should be aware.
3 In chaos theory, the “butterfly effect” is the sensitive dependence on initial conditions in which a small change in one state of a deterministic nonlinear system can result in large differences in a later state.
Big Data MBA: Driving Business Strategies with Data Science
Published by
John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright © 2016 by Bill Schmarzo
Published by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-1-119-18111-8
ISBN: 978-1-119-23884-3 (ebk)
ISBN: 978-1-119-18138-5 (ebk)
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2015955444
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
About the Author
Bill Schmarzo is the Chief Technology Officer (CTO) of the Big Data Practice of EMC Global Services. As CTO, Bill is responsible for setting the strategy and defining the big data service offerings and capabilities for EMC Global Services. He also works directly with organizations to help them identify where and how to start their big data journeys. Bill is the author of Big Data: Understanding How Data Powers Big Business, writes white papers, is an avid blogger, and is a frequent speaker on the use of big data and data science to power an organization's key business initiatives. He is a University of San Francisco School of Management (SOM) Fellow, where he teaches the “Big Data MBA” course.
Bill has over three decades of experience in data warehousing, business intelligence, and analytics. He authored EMC's Vision Workshop methodology and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute's faculty as the head of the analytic applications curriculum. Previously, he was the Vice President of Analytics at Yahoo! and oversaw the analytic applications business unit at Business Objects, including the development, marketing, and sales of their industry-defining analytic applications.
Bill holds a master's degree in Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science, and Business Administration from Coe College. Bill's recent blogs can be found at http://infocus.emc.com/author/william_schmarzo/.
About the Technical Editor
Jeffrey Abbott leads the EMC Global Services marketing practice around big data, helping customers understand how to identify and take advantage of opportunities to leverage data for strategic business initiatives, while driving awareness for a portfolio of services offerings that accelerate customer time-to-value. As a content developer and program lead, Jeff emphasizes clear and concise messaging on persona-based campaigns. Prior to EMC, Jeff helped build and promote a cloud-based ecosystem for CA Technologies that combined an online social community, a cloud development platform, and an e-commerce site for cloud services. Jeff also spent several years within CA's Thought Leadership group, creating and promoting executive-level messaging and social-media programs around major disruptive trends in IT. Jeff has held various other product marketing roles at firms such as EMC, Citrix, and Ardence and spent a decade running client accounts at numerous boutique marketing firms. Jeff studied small business management at the University of Vermont and resides in Sudbury, MA, with his wife, two boys, and dog. Jeff enjoys skiing, backpacking, photography, and classic cars.
Credits
Project Editor
Adaobi Obi Tulton and Chris Haviland
Technical Editor
Jeffrey Abbott
Production Editor
Barath Kumar Rajasekaran
Copy Editor
Chris Haviland
Manager of Content Development & Assembly
Mary Beth Wakefield
Production Manager
Kathleen Wisor
Marketing Director
David Mayhew
Marketing Manager
Carrie Sherrill
Professional Technology & Strategy Director
Barry Pruett
Business Manager
Amy Knies
Associate Publisher
Jim Minatel
Project Coordinator, Cover
Brent Savage
Proofreader
Nicole Hirschman
Indexer
Nancy Guenther
Cover Designer
Wiley
Acknowledgments
Acknowledgments are dangerous. Not dangerous like wrestling an alligator or an unhappy Chicago Cubs fan, but dangerous in the sense that there are so many people to thank. How do I prevent the Acknowledgments section from becoming longer than my book? This book represents the sum of many, many discussions, debates, presentations, engagements, and late night beers and pizza that I have had with so many colleagues and customers. Thanks to everyone who has been on this journey with me.
So realizing that I will miss many folks in this acknowledgment, here I go…
I can't say enough about the contributions of Jeff Abbott. Not only was Jeff my EMC technical editor for this book, but he also has the unrewarding task of editing all of my blogs. Jeff has the patience to put up with my writing style and the smarts to know how to spin my material so that it is understandable and readable. I can't thank Jeff enough for his patience, guidance, and friendship.
Jen Sorenson's role in the book was only supposed to be EMC Public Relations editor, but Jen did so much more. There are many chapters in this book where Jen's suggestions (such as using the Fairy-Tale Theme Parks example in Chapter 6) made the chapters more interesting. In fact, Chapter 6 is probably my favorite chapter because I was so over my skis on the data science algorithms material. But Jen did a marvelous job of taking a difficult topic (data science algorithms) and making it come to life.
Speaking of data science, Pedro DeSouza and Wei Lin are the two best data scientists I have ever met, and I am even more grateful that I get to call them friends. They have been patient in helping me to learn the world of data science over the past several years, which is reflected in many chapters in the book (most notably Chapters 5 and 6). But more than anything else, they taught me a very valuable life lesson: being humble is the best way to learn. I can't even express in words my admiration for them and how they approach their profession.
Joe Dossantos and Josh Siegel may be surprised to find their names in the acknowledgments, but they shouldn't be. Both Joe and Josh have been with me on many steps in this big data journey, and both have contributed tremendously to my understanding of how big data can impact the business world. Their fingerprints are all over this book.
Adaobi Obi Tulton and Chris Haviland are my two Wiley editors, and they are absolutely marvelous! They have gone out of their way to make the editing process as painless as possible, and they understand my voice so well that I accepted over 99 percent of all of their suggestions. Both Adaobi and Chris were my editors on my first book, so I guess they forgot how much of a PITA (pain in the a**) I can be when they agreed to be the editors on my second book. Though I have never met them face-to-face, I feel a strong kinship with both Adaobi and Chris. Thanks for all of your patience and guidance and your wonderful senses of humor!
A very special thank you to Professor Mouwafac Sidaoui, with whom I co-teach the Big Data MBA at the University of San Francisco School of Management (USF SOM). I could not pick a better partner in crime: he is smart, humble, demanding, fun, engaging, worldly, and everything that one could want in a friend. I am a Fellow at the USF SOM because of Mouwafac's efforts, and he has set me up for my next career: teaching.
I also want to thank Dean Elizabeth Davis and the USF MBA students who were willing to be guinea pigs for testing many of the concepts and techniques captured in this book. They helped me to determine which ideas worked and how to fix the ones that did not work.
Another special thank you to EMC, which supported me as I worked at the leading edge of the business transformational potential of big data. EMC has afforded me the latitude to pursue new ideas, concepts, and offerings and in many situations has allowed me to be the tip of the big data arrow. I could not ask for a better employer and partner.
The thank-you list should include the excellent and creative people at EMC with whom I interact on a regular basis, but since that list is too long, I'll just mention Ed, Jeff, Jason, Paul, Dan, Josh, Matt, Joe, Scott, Brandon, Aidan, Neville, Bart, Billy, Mike, Clark, Jeeva, Sean, Shriya, Srini, Ken, Mitch, Cindy, Charles, Chuck, Peter, Aaron, Bethany, Susan, Barb, Jen, Rick, Steve, David, and many, many more.
I want to thank my family, who has put up with me during the book writing process. My wife Carolyn was great about grabbing Chipotle for me when I had a tough deadline, and my sons Alec and Max and my daughter Amelia were supportive throughout the book writing process. I've been blessed with a marvelous family (just stop stealing my Chipotle in the refrigerator!).
My mom and dad both passed away, but I can imagine their look of surprise and pride in the fact that I have written two books and am teaching at the University of San Francisco in my spare time. We will get the chance to talk about that in my next life.
But most important, I want to thank the EMC customers with whom I have had the good fortune to work. Customers are at the front line of the big data transformation, and where better to be situated to learn about what's working and what's not working than arm-in-arm with EMC's most excellent customers at those front lines? Truly the best part of my job is the chance to work with our customers. Heck, I'm willing to put up with the airline travel to do that!
WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley's ebook EULA.