Table of Contents

Introduction

Overview of the Book and Technology

How This Book Is Organized

Who Should Read This Book

Tools You Will Need

What's on the Website

What This Means for You

Part I: Business Potential of Big Data

Chapter 1: The Big Data Business Mandate

Big Data MBA Introduction

Focus Big Data on Driving Competitive Differentiation

Critical Importance of “Thinking Differently”

Summary

Homework Assignment

Notes

Chapter 2: Big Data Business Model Maturity Index

Introducing the Big Data Business Model Maturity Index

Big Data Business Model Maturity Index Lessons Learned

Summary

Homework Assignment

Chapter 3: The Big Data Strategy Document

Establishing Common Business Terminology

Introducing the Big Data Strategy Document

Introducing the Prioritization Matrix

Using the Big Data Strategy Document to Win the World Series

Summary

Homework Assignment

Notes

Chapter 4: The Importance of the User Experience

The Unintelligent User Experience

Consumer Case Study: Improve Customer Engagement

Business Case Study: Enable Frontline Employees

B2B Case Study: Make the Channel More Effective

Summary

Homework Assignment

Part II: Data Science

Chapter 5: Differences Between Business Intelligence and Data Science

What Is Data Science?

The Analyst Characteristics Are Different

The Analytic Approaches Are Different

The Data Models Are Different

The View of the Business Is Different

Summary

Homework Assignment

Notes

Chapter 6: Data Science 101

Data Science Case Study Setup

Fundamental Exploratory Analytics

Analytic Algorithms and Models

Summary

Homework Assignment

Notes

Chapter 7: The Data Lake

Introduction to the Data Lake

Characteristics of a Business-Ready Data Lake

Using the Data Lake to Cross the Analytics Chasm

Modernize Your Data and Analytics Environment

Analytics Hub and Spoke Analytics Architecture

Early Learnings

What Does the Future Hold?

Summary

Homework Assignment

Notes

Part III: Data Science for Business Stakeholders

Chapter 8: Thinking Like a Data Scientist

The Process of Thinking Like a Data Scientist

Summary

Homework Assignment

Notes

Chapter 9: “By” Analysis Technique

“By” Analysis Introduction

“By” Analysis Exercise

Foot Locker Use Case “By” Analysis

Summary

Homework Assignment

Notes

Chapter 10: Score Development Technique

Definition of a Score

FICO Score Example

Other Industry Score Examples

LeBron James Exercise Continued

Foot Locker Example Continued

Summary

Homework Assignment

Notes

Chapter 11: Monetization Exercise

Fitness Tracker Monetization Example

Summary

Homework Assignment

Notes

Chapter 12: Metamorphosis Exercise

Business Metamorphosis Review

Business Metamorphosis Exercise

Business Metamorphosis in Health Care

Summary

Homework Assignment

Notes

Part IV: Building Cross-Organizational Support

Chapter 13: Power of Envisioning

Envisioning: Fueling Creative Thinking

The Prioritization Matrix

Summary

Homework Assignment

Notes

Chapter 14: Organizational Ramifications

Chief Data Monetization Officer

Privacy, Trust, and Decision Governance

Unleashing Organizational Creativity

Summary

Homework Assignment

Notes

Chapter 15: Stories

Customer and Employee Analytics

Product and Device Analytics

Network and Operational Analytics

Characteristics of a Good Business Story

Summary

Homework Assignment

Notes

End User License Agreement

List of Illustrations

Chapter 1: The Big Data Business Mandate

Figure 1.1 Big Data Business Model Maturity Index

Figure 1.2 Modern data/analytics environment

Chapter 2: Big Data Business Model Maturity Index

Figure 2.1 Big Data Business Model Maturity Index

Figure 2.2 Crossing the analytics chasm

Figure 2.3 Packaging and selling audience insights

Figure 2.4 Optimize internal processes

Figure 2.5 Create new monetization opportunities

Chapter 3: The Big Data Strategy Document

Figure 3.1 Big data strategy decomposition process

Figure 3.2 Big data strategy document

Figure 3.3 Chipotle's 2012 letter to the shareholders

Figure 3.4 Chipotle's “increase same store sales” business initiative

Figure 3.5 Chipotle key business entities and decisions

Figure 3.6 Completed Chipotle big data strategy document

Figure 3.7 Business value of potential Chipotle data sources

Figure 3.8 Implementation feasibility of potential Chipotle data sources

Figure 3.9 Chipotle prioritization of use cases

Figure 3.10 San Francisco Giants big data strategy document

Figure 3.11 Chipotle's same store sales results

Chapter 4: The Importance of the User Experience

Figure 4.1 Original subscriber e-mail

Figure 4.2 Improved subscriber e-mail

Figure 4.3 Actionable subscriber e-mail

Figure 4.4 App recommendations

Figure 4.5 Traditional Business Intelligence dashboard

Figure 4.6 Actionable store manager dashboard

Figure 4.7 Store manager accept/reject recommendations

Figure 4.8 Competitive analysis use case

Figure 4.9 Local events use case

Figure 4.10 Local weather use case

Figure 4.11 Financial advisor dashboard

Figure 4.12 Client personal information

Figure 4.13 Client financial information

Figure 4.14 Client financial goals

Figure 4.15 Financial contributions recommendations

Figure 4.16 Spend analysis and recommendations

Figure 4.17 Asset allocation recommendations

Figure 4.18 Other investment recommendations

Chapter 5: Differences Between Business Intelligence and Data Science

Figure 5.1 Schmarzo TDWI keynote, August 2008

Figure 5.2 Oakland A's versus New York Yankees cost per win

Figure 5.3 Business Intelligence versus data science

Figure 5.4 CRISP: Cross Industry Standard Process for Data Mining

Figure 5.5 Business Intelligence engagement process

Figure 5.6 Typical BI tool graphic options

Figure 5.7 Data scientist engagement process

Figure 5.8 Measuring goodness of fit

Figure 5.9 Dimensional model (star schema)

Figure 5.10 Using flat files to eliminate or reduce joins on Hadoop

Figure 5.11 Sample customer analytic profile

Figure 5.12 Improve customer retention example

Chapter 6: Data Science 101

Figure 6.1 Basic trend analysis

Figure 6.2 Compound trend analysis

Figure 6.3 Trend line analysis

Figure 6.4 Boxplot analysis

Figure 6.5 Geographical (spatial) trend analysis

Figure 6.6 Pairs plot analysis

Figure 6.7 Time series decomposition analysis

Figure 6.8 Cluster analysis

Figure 6.9 Normal curve equivalent analysis

Figure 6.10 Normal curve equivalent seller pricing analysis example

Figure 6.11 Association analysis

Figure 6.12 Converting association rules into segments

Figure 6.13 Graph analysis

Figure 6.14 Text mining analysis

Figure 6.15 Sentiment analysis

Figure 6.16 Traverse pattern analysis

Figure 6.17 Decision tree classifier analysis

Figure 6.18 Cohorts analysis

Chapter 7: The Data Lake

Figure 7.1 Characteristics of a data lake

Figure 7.2 The analytics dilemma

Figure 7.3 The data lake line of demarcation

Figure 7.4 Create a Hadoop-based data lake

Figure 7.5 Create an analytic sandbox

Figure 7.6 Move ETL to the data lake

Figure 7.7 Hub and Spoke analytics architecture

Figure 7.8 Data science engagement process

Figure 7.9 What does the future hold?

Figure 7.10 EMC Federation Business Data Lake

Chapter 8: Thinking Like a Data Scientist

Figure 8.1 Foot Locker's key business initiatives

Figure 8.2 Examples of Foot Locker's in-store merchandising

Figure 8.3 Foot Locker's store manager persona

Figure 8.4 Foot Locker's strategic nouns or key business entities

Figure 8.5 Thinking like a data scientist decomposition process

Figure 8.6 Recommendations worksheet template

Figure 8.7 Foot Locker's recommendations worksheet

Figure 8.8 Foot Locker's store manager actionable dashboard

Figure 8.9 Thinking like a data scientist decomposition process

Chapter 9: “By” Analysis Technique

Figure 9.1 Identifying metrics that may be better predictors of performance

Figure 9.2 NBA shooting effectiveness

Figure 9.3 LeBron James's shooting effectiveness

Chapter 10: Score Development Technique

Figure 10.1 FICO score considerations

Figure 10.2 FICO score decision range

Figure 10.3 Recommendations worksheet

Figure 10.4 Updated recommendations worksheet

Figure 10.5 Completed recommendations worksheet

Figure 10.6 Potential Foot Locker customer scores

Figure 10.7 Foot Locker recommendations worksheet

Figure 10.8 CLTV based on sales

Figure 10.9 More predictive CLTV score

Chapter 11: Monetization Exercise

Figure 11.1 “A day in the life” customer persona

Figure 11.2 Fitness tracker prioritization

Figure 11.3 Monetization roadmap

Chapter 12: Metamorphosis Exercise

Figure 12.1 Big Data Business Model Maturity Index

Figure 12.2 Patient actionable analytic profile

Chapter 13: Power of Envisioning

Figure 13.1 Big Data Vision Workshop process and timeline

Figure 13.2 Big Data Vision Workshop illustrative analytics

Figure 13.3 Big Data Vision Workshop user experience mock-up

Figure 13.4 Prioritize Healthcare Systems's use cases

Figure 13.5 Prioritization matrix template

Figure 13.6 Prioritization matrix process

Chapter 14: Organizational Ramifications

Figure 14.1 CDMO organizational structure

Figure 14.2 Empowerment cycle

List of Tables

Chapter 1: The Big Data Business Mandate

Table 1.1 Exploiting Technology Innovation to Create Economic-Driven Business Opportunities

Table 1.2 Evolution of the Business Questions

Chapter 2: Big Data Business Model Maturity Index

Table 2.1 Big Data Business Model Maturity Index Summary

Chapter 3: The Big Data Strategy Document

Table 3.1 Mapping Chipotle Use Cases to Analytic Models

Chapter 5: Differences Between Business Intelligence and Data Science

Table 5.1 BI Analyst Versus Data Scientist Characteristics

Chapter 6: Data Science 101

Table 6.1 2014–2015 Top NBA RPM Rankings

Table 6.2 Case Study Summary

Chapter 7: The Data Lake

Table 7.1 Data Lake Data Types

Chapter 8: Thinking Like a Data Scientist

Table 8.1 Evolution of Foot Locker's Business Questions

Chapter 9: “By” Analysis Technique

Table 9.1 LeBron James's Shooting Percentages

Chapter 10: Score Development Technique

Table 10.1 Potential Scores for Other Industries

Chapter 11: Monetization Exercise

Table 11.1 Potential Fitness Tracker Recommendations

Table 11.2 Recommendation Data Requirements

Table 11.3 Recommendations Value Versus Feasibility Assessment

Chapter 12: Metamorphosis Exercise

Table 12.1 Decisions to Analytics Mapping

Table 12.2 Data-to-Analytics Mapping

Introduction

I never planned on writing a second book. Heck, I thought writing one book was enough to check this item off my bucket list. But so much has changed since I wrote my first book that I felt compelled to continue to explore this once-in-a-lifetime opportunity for organizations to leverage data and analytics to transform their business models. And I'm not just talking the “make me more money” part of businesses. Big data can drive significant “improve the quality of life” value in areas such as education, poverty, parole rehabilitation, health care, safety, and crime reduction.

My first book targeted the Information Technology (IT) audience. However, I soon realized that the biggest winner in this big data land grab was the business. So this book targets the business audience and is based on a few key premises:

Organizations do not need a big data strategy as much as they need a business strategy that incorporates big data.

The days when business leaders could turn analytics over to IT are over; tomorrow's business leaders must embrace analytics as a business discipline in the same vein as accounting, finance, management science, and marketing.

The key to data monetization and business transformation lies in unleashing the organization's creative thinking; we have got to get the business users to “think like a data scientist.”

Finally, the business potential of big data is only limited by the creative thinking of the business users.

I've also had the opportunity to teach “Big Data MBA” at the University of San Francisco (USF) School of Management since I wrote the first book. I did well enough that USF made me its first School of Management Fellow. What I experienced while working with these outstanding and creative students and Professor Mouwafac Sidaoui compelled me to undertake the challenge of writing this second book, targeting those students and tomorrow's business leaders.

One of the topics that I hope jumps out in the book is the power of data science. There have been many books written about data science with the goal of helping people to become data scientists. But I felt that something was missing: instead of trying to create a world of data scientists, we needed to help tomorrow's business leaders think like data scientists.

So that's the focus of this book: to help tomorrow's business leaders integrate data and analytics into their business models and to lead the cultural transformation by unleashing the organization's creative juices by helping the business to “think like a data scientist.”

Overview of the Book and Technology

The days when business stakeholders could relinquish control of data and analytics to IT are over. The business stakeholders must be front and center in championing and monetizing the organization's data collection and analysis efforts. Business leaders need to understand where and how to leverage big data, exploiting the collision of new sources of customer, product, and operational data coupled with data science to optimize key business processes, uncover new monetization opportunities, and create new sources of competitive differentiation. And while it's not realistic to convert your business users into data scientists, it's critical that we teach the business users to think like data scientists so they can collaborate with IT and the data scientists on use case identification, requirements definition, business valuation, and ultimately analytics operationalization.

This book provides a business-hardened framework with supporting methodology and hands-on exercises that not only will help business users to identify where and how to leverage big data for business advantage but will also provide guidelines for operationalizing the analytics, setting up the right organizational structure, and driving the analytic insights throughout the organization's user experience to both customers and frontline employees.

How This Book Is Organized

The book is organized into four sections:

Part I: Business Potential of Big Data. Part I includes Chapters 1 through 4 and sets the business-centric foundation for the book. Here is where I introduce the Big Data Business Model Maturity Index and frame the big data discussion around the perspective that “organizations do not need a big data strategy as much as they need a business strategy that incorporates big data.”

Part II: Data Science. Part II includes Chapters 5 through 7 and covers the principles behind data science. These chapters introduce some data science basics and explore how Business Intelligence and data science complement each other while differing in the problems that they address.

Part III: Data Science for Business Stakeholders. Part III includes Chapters 8 through 12 and seeks to teach the business users and business leaders to “think like a data scientist.” This part introduces a methodology and several exercises to reinforce the data science thinking and approach. It has a lot of hands-on work.

Part IV: Building Cross-Organizational Support. Part IV includes Chapters 13 through 15 and discusses organizational challenges. This part covers envisioning, which may very well be the most important topic in the book as the business potential of big data is only limited by the creative thinking of the business users.

Here are some more details on each of the chapters in the book:

Chapter 1: The Big Data Business Mandate. This chapter frames the big data discussion on how big data is more about business transformation and the economics of big data than it is about technology.

Chapter 2: Big Data Business Model Maturity Index. This chapter covers the Big Data Business Model Maturity Index (BDBM), which is the foundation for the entire book. Take the time to understand each of the five stages of the BDBM and how the BDBM provides a roadmap for measuring how effective your organization is at integrating data and analytics into your business models.

Chapter 3: The Big Data Strategy Document. This chapter introduces a CXO-level document and process for helping organizations identify where and how to start their big data journeys from a business perspective.

Chapter 4: The Importance of the User Experience. This is one of my favorite topics. This chapter challenges traditional Business Intelligence reporting and dashboard concepts by introducing a simpler but more direct approach for delivering actionable insights to your key business stakeholders: frontline employees, channel partners, and end customers.

Chapter 5: Differences Between Business Intelligence and Data Science. This chapter explores the different worlds of Business Intelligence and data science and highlights both the differences and the complementary nature of each.

Chapter 6: Data Science 101. This chapter (my favorite) reviews 14 different analytic techniques that my data science teams commonly use and in what business situations you should contemplate using them. It is accompanied by a marvelous fictitious case study using Fairy-Tale Theme Parks (thanks Jen!).

Chapter 7: The Data Lake. This chapter introduces the concept of a data lake, explaining how the data lake frees up expensive data warehouse resources and unleashes the creative, fail-fast nature of the data science teams.

Chapter 8: Thinking Like a Data Scientist. The heart of this book, this chapter covers the eight-step “thinking like a data scientist” process. This chapter is pretty deep, so plan on having a pen and paper (and probably an eraser as well) with you as you read this chapter.

Chapter 9: “By” Analysis Technique. This chapter does a deep dive into one of the important concepts in “thinking like a data scientist”: the “By” analysis technique.

Chapter 10: Score Development Technique. This chapter introduces how scores can drive collaboration between the business users and data scientists to create actionable scores that guide the organization's key business decisions.

Chapter 11: Monetization Exercise. This chapter provides a technique for organizations that have a substantial amount of customer, product, and operational data but do not know how to monetize that data. This chapter can be very eye-opening!

Chapter 12: Metamorphosis Exercise. This chapter is a fun, out-of-the-box exercise that explores the potential data and analytic impacts for an organization as it contemplates the Business Metamorphosis phase of the Big Data Business Model Maturity Index.

Chapter 13: Power of Envisioning. This chapter starts to address some of the organizational and cultural challenges you may face. In particular, Chapter 13 introduces some envisioning techniques to help unleash your organization's creative thinking.

Chapter 14: Organizational Ramifications. This chapter goes into more detail about the organizational ramifications of big data, especially the role of the Chief Data (Monetization) Officer.

Chapter 15: Stories. The book wraps up with some case studies, but not your traditional case studies. Instead, Chapter 15 presents a technique for creating “stories” that are relevant to your organization. Anyone can find case studies, but not just anyone can create a story.

Who Should Read This Book

This book is targeted toward business users and business management. I wrote this book so that I could use it in teaching my Big Data MBA class, so I included all of the hands-on exercises and templates that my students would need to successfully earn their Big Data MBA graduation certificate.

I think folks would benefit by also reading my first book, Big Data: Understanding How Data Powers Big Business, which is targeted toward the IT audience. There is some overlap between the two books (10 to 15 percent), but the first book sets the stage and introduces concepts that are explored in more detail in this book.

Tools You Will Need

No special tools are required other than a pencil, an eraser, several sheets of paper, and your creativity. Grab a chai tea latte, some Chipotle, and enjoy!

What's on the Website

You can download the “Thinking Like a Data Scientist” workbook from the book's website at www.wiley.com/go/bigdatamba. And oh, there might be another surprise there as well! Hehehe!

What This Means for You

As students from my class at USF have told me, this material allows them to take a problem or challenge and use a well-thought-out process to drive cross-organizational collaboration to come up with ideas they can turn into actions using data and analytics. What employer wouldn't want a future leader who knows how to do that?

Part I: Business Potential of Big Data

Chapters 1 through 4 set the foundation for driving business strategies with data science. In particular, the Big Data Business Model Maturity Index highlights the realm of what's possible from a business potential perspective by providing a roadmap that measures how effectively your organization leverages data and analytics to power your business models.

In This Part

Chapter 1: The Big Data Business Mandate

Chapter 2: Big Data Business Model Maturity Index

Chapter 3: The Big Data Strategy Document

Chapter 4: The Importance of the User Experience

Chapter 1: The Big Data Business Mandate

Having trouble getting your senior management team to understand the business potential of big data? Can't get your management leadership to consider big data to be something other than an IT science experiment? Are your line-of-business leaders unwilling to commit themselves to understanding how data and analytics can power their top initiatives?

If so, then this “Big Data Senior Executive Care Package” is for you!

And for a limited time, you get an unlimited license to share this care package with as many senior executives as you desire. But you must act NOW! Become the life of the company parties with your extensive knowledge of how new customer, product, and operational insights can guide your organization's value creation processes. And maybe, just maybe, get a promotion in the process!!

NOTE

All company material referenced in this book comes from public sources and is referenced accordingly.

Big Data MBA Introduction

The days when business users and business management can relinquish control of data and analytics to IT are over, at least for organizations that want to survive beyond the immediate term. The big data discussion now needs to focus on how organizations can couple new sources of customer, product, and operational data with advanced analytics (data science) to power their key business processes and elevate their business models. Organizations need to understand that they do not need a big data strategy as much as they need a business strategy that incorporates big data.

The Big Data MBA challenges the thinking that data and analytics are ancillary or a “bolt on” to the business; that data and analytics are someone else's problem. In a growing number of leading organizations, data and analytics are critical to business success and long-term survival. Business leaders and business users reading this book will learn why they must take responsibility for identifying where and how they can apply data and analytics to their businesses; otherwise they put their businesses at risk of being made obsolete by more nimble, data-driven competitors.

The Big Data MBA introduces and describes concepts, techniques, methodologies, and hands-on exercises to guide you as you seek to address the big data business mandate. The book provides hands-on exercises and homework assignments to make these concepts and techniques come to life for your organization. It provides recommendations and actions that enable your organization to start today. And in the process, Big Data MBA teaches you to “think like a data scientist.”

The Forrester study “Reset on Big Data” (Hopkins et al., 2014)1 highlights the critical role of a business-centric focus in the big data discussion. The study argues that technology-focused executives within a business will think of big data as a technology and fail to convey its importance to the boardroom.

Businesses of all sizes must reframe the big data conversation with the business leaders in the boardroom. The critical and difficult big data question that business leaders must address is:

How effective is our organization at integrating data and analytics into our business models?

Before business leaders can begin these discussions, organizations must understand their current level of big data maturity. Chapter 2 discusses in detail the “Big Data Business Model Maturity Index” (see Figure 1.1). The Big Data Business Model Maturity Index is a measure of how effective an organization is at integrating data and analytics to power its business model.

Figure 1.1 Big Data Business Model Maturity Index

The Big Data Business Model Maturity Index provides a roadmap for how organizations can integrate data and analytics into their business models. The Big Data Business Model Maturity Index is composed of the following five phases:

Phase 1: Business Monitoring. In the Business Monitoring phase, organizations are leveraging data warehousing and Business Intelligence to monitor the organization's performance.

Phase 2: Business Insights. The Business Insights phase is about leveraging predictive analytics to uncover customer, product, and operational insights buried in the growing wealth of internal and external data sources. In this phase, organizations aggressively expand their data acquisition efforts by coupling all of their detailed transactional and operational data with internal data such as consumer comments, e-mail conversations, and technician notes, as well as external and publicly available data such as social media, weather, traffic, economic, demographics, home values, and local events data.

Phase 3: Business Optimization. In the Business Optimization phase, organizations apply prescriptive analytics to the customer, product, and operational insights uncovered in the Business Insights phase to deliver actionable insights or recommendations to frontline employees, business managers, and channel partners, as well as customers. The goal of the Business Optimization phase is to enable employees, partners, and customers to optimize their key decisions.

Phase 4: Data Monetization. In the Data Monetization phase, organizations leverage the customer, product, and operational insights to create new sources of revenue. This could include selling data (or insights) into new markets (for example, a cellular phone provider selling customer behavioral data to advertisers), integrating analytics into products and services to create “smart” products, or re-packaging customer, product, and operational insights to create new products and services, to enter new markets, and/or to reach new audiences.

Phase 5: Business Metamorphosis. The holy grail of the Big Data Business Model Maturity Index is when an organization transitions its business model from selling products to selling “business-as-a-service.” Think GE selling “thrust” instead of jet engines. Think John Deere selling “farming optimization” instead of farming equipment. Think Boeing selling “air miles” instead of airplanes. And in the process, these organizations will create a platform enabling third-party developers to build and market solutions on top of the organization's business-as-a-service business model.

Ultimately, big data only matters if it helps organizations make more money and improve operational effectiveness. Examples include increasing customer acquisition, reducing customer churn, reducing operational and maintenance costs, optimizing prices and yield, reducing risks and errors, improving compliance, improving the customer experience, and more.

No matter the size of the organization, organizations don't need a big data strategy as much as they need a business strategy that incorporates big data.

Focus Big Data on Driving Competitive Differentiation

I'm always confused about how organizations struggle to differentiate between technology investments that drive competitive parity and those technology investments that create unique and compelling competitive differentiation. Let's explore this difference in a bit more detail.

Competitive parity is achieving the same or similar operational capabilities as those of your competitors. It involves leveraging industry best practices and pre-packaged software to create a baseline that, at worst, is equal to the operational capabilities across your industry. Organizations end up achieving competitive parity when they buy foundational and undifferentiated capabilities from enterprise software packages such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and Sales Force Automation (SFA).

Competitive differentiation is achieved when an organization leverages people, processes, and technology to create applications, programs, processes, etc., that differentiate its products and services from those of its competitors in ways that add unique value for the end customer and create competitive differentiation in the marketplace.

Leading organizations should seek to “buy” foundational and undifferentiated capabilities but “build” what is differentiated and value-added for their customers. But sometimes organizations get confused between the two. Let's call this the ERP effect. ERP software packages were sold as a software solution that would make everyone more profitable by delivering operational excellence. But when everyone is running the same application, what's the source of the competitive differentiation?

Analytics, on the other hand, enables organizations to uniquely optimize their key business processes, drive a more engaging customer experience, and uncover new monetization opportunities with unique insights that they gather about their customers, products, and operations.

Leveraging Technology to Power Competitive Differentiation

While most organizations have invested heavily in ERP-type operational systems, far fewer have been successful in leveraging data and analytics to build strategic applications that provide unique value to their customers and create competitive differentiation in the marketplace. Here are some examples of organizations that have invested in building differentiated capabilities by leveraging new sources of data and analytics:

Google: PageRank and Ad Serving

Yahoo: Behavioral Targeting and Retargeting

Facebook: Ad Serving and News Feed

Apple: iTunes

Netflix: Movie Recommendations

Amazon: “Customers Who Bought This Item,” 1-Click ordering, and Supply Chain & Logistics

Walmart: Demand Forecasting, Supply Chain Logistics, and Retail Link

Procter & Gamble: Brand and Category Management

Federal Express: Critical Inventory Logistics

American Express and Visa: Fraud Detection

GE: Asset Optimization and Operations Optimization (Predix)

None of these organizations bought these strategic, business-differentiating applications off the shelf. They understood that it was necessary to provide differentiated value to their internal and external customers, and they leveraged data and analytics to build applications that delivered competitive differentiation.

History Lesson on Economic-Driven Business Transformation

More than anything else, the driving force behind big data is the economics of big data: it's 20 to 50 times cheaper to store, manage, and analyze data with big data technologies than with traditional data warehousing technologies. This 20 to 50 times economic impact is courtesy of commodity hardware, open source software, an explosion of new open source tools coming out of academia, and ready access to free online training on topics such as big data architectures and data science. A client of mine in the insurance industry calculated a 50X economic impact. Another client in the health care industry calculated a 49X economic impact (they need to look harder to find that missing 1X).

History has shown that the most significant technology innovations are ones that drive economic change. From the printing press to interchangeable parts to the microprocessor, these technology innovations have provided an unprecedented opportunity for the more agile and more nimble organizations to disrupt existing markets and establish new value creation processes.

Big data possesses that same economic potential whether it be to create smart cities, improve the quality of medical care, improve educational effectiveness, reduce poverty, improve safety, reduce risks, or even cure cancer. And for many organizations, the first question that needs to be asked about big data is:

How effective is my organization at leveraging new sources of data and advanced analytics to uncover new customer, product, and operational insights that can be used to differentiate our customer engagement, optimize key business processes, and uncover new monetization opportunities?

Big data is nothing new, especially if you view it from the proper perspective. While the popular big data discussions are around “disruptive” technology innovations like Hadoop and Spark, the real discussion should be about the economic impact of big data. New technologies don't disrupt business models; it's what organizations do with these new technologies that disrupts business models and enables new ones. Let's review an example of one such economic-driven business transformation: the steam engine.

The steam engine enabled urbanization, industrialization, and the conquering of new territories. It literally shrank distance and time by reducing the time required to move people and goods from one side of a continent to the other. The steam engine enabled people to leave low-paying agricultural jobs and move into cities for higher-paying manufacturing and clerical jobs that led to a higher standard of living.

For example, cities such as London shot up in terms of population. In 1801, before the advent of George Stephenson's Rocket steam engine, London had 1.1 million residents. After the invention, the population of London more than doubled to 2.7 million residents by 1851. London transformed the nucleus of society from small tight-knit communities where textile production and agriculture were prevalent into big cities with a variety of jobs. The steam locomotive provided quicker transportation and more jobs, which in turn brought more people into the cities and drastically changed the job market. By 1861, only 2.4 percent of London's population was employed in agriculture, while 49.4 percent were in the manufacturing or transportation business. The steam locomotive was a major turning point in history as it transformed society from largely rural and agricultural into urban and industrial.2

Table 1.1 shows other historical lessons that demonstrate how technology innovation created economic-driven business opportunities.

Table 1.1 Exploiting Technology Innovation to Create Economic-Driven Business Opportunities

Technology Innovation | Economic Impact
Printing Press | Expanded literacy (simplified knowledge capture and enabled knowledge dissemination and the education of the masses)
Interchangeable Parts | Drove the standardization of manufacturing parts and fueled the industrial revolution
Steam Engine (Railroads and Steamboats) | Sparked urbanization (drove transition from agricultural to manufacturing-centric society)
Internal Combustion Engine | Triggered suburbanization (enabled personal mobility, both geographically and socially)
Interstate Highway System | Foundation for interstate commerce (enabled regional specialization and wealth creation)
Telephone | Democratized communications (by eliminating distance and delays as communications issues)
Computers | Automated common processes (thereby freeing humans for more creative engagement)
Internet | Gutted cost of commerce and knowledge sharing (enabled remote workforce and international competition)

Thisbringsusbacktobigdata.Alloftheseinnovationssharethesamelesson:itwasn'tthetechnologythatwasdisruptive;itwashoworganizationsleveragedthetechnologytodisruptexistingbusinessmodelsandenablednewones.

CriticalImportanceof“ThinkingDifferently”Organizationshavebeentaughtbytechnologyvendors,press,andanalyststothinkfaster,cheaper,andsmaller,buttheyhavenotbeentaughtto“thinkdifferently.”Theinabilitytothinkdifferentlyiscausingorganizationalalignmentandbusinessadoptionproblemswithrespecttothebigdataopportunity.Organizationsmustthrowoutmuchoftheirconventionaldata,analytics,andorganizationalthinkinginordertogetthemaximumvalueoutofbigdata.Let'sintroducesomekeyareasforthinkingdifferentlythatwillbecoveredthroughoutthisbook.

Don't Think Big Data Technology, Think Business Transformation

Many organizations are infatuated with the technical innovations surrounding big data and the three Vs of data: volume, variety, and velocity. But starting with a technology focus can quickly turn your big data initiative into a science experiment. You don't want to be a solution in search of a problem.

Instead, focus on the four Ms of big data: Make Me More Money (or if you are a non-profit organization, maybe that's Make Me More Efficient). Start your big data initiative with a business-first approach. Identify and focus on addressing the organization's key business initiatives, that is, what the organization is trying to accomplish from a business perspective over the next 9 to 12 months (e.g., reduce supply chain costs, improve supplier quality and reliability, reduce hospital-acquired infections, improve student performance). Break down or decompose this business initiative into the supporting decisions, questions, metrics, data, analytics, and technology necessary to support the targeted business initiative.

CROSS-REFERENCE

ThisbookbeginsbycoveringtheBigDataBusinessModelMaturityIndexinChapter2.TheBigDataBusinessModelMaturityIndexhelpsorganizationsaddressthekeyquestion:

Howeffectiveisourorganizationatleveragingdataandanalyticstopowerourkeybusinessprocessesanduncovernewmonetizationopportunities?

Thematurityindexprovidesaguideorroadmapwithspecificrecommendationstohelporganizationsadvanceupthematurityindex.Chapter3introducesthebigdatastrategydocument.Thebigdatastrategydocumentprovidesaframeworkforhelpingorganizationsidentifywhereandhowtostarttheirbigdatajourneyfromabusinessperspective.

Don'tThinkBusinessIntelligence,ThinkDataScience

DatascienceisdifferentfromBusinessIntelligence(BI).Resisttheadvicetotrytomakethesetwodifferentdisciplinesthesame.Forexample:

BusinessIntelligencefocusesonreportingwhathappened(descriptiveanalytics).Datasciencefocusesonpredictingwhatislikelytohappen(predictiveanalytics)andthenrecommendingwhatactionstotake(prescriptiveanalytics).

BusinessIntelligenceoperateswithschemaonloadinwhichyouhavetopre-buildthedataschemabeforeyoucanloadthedatatogenerateyourBIqueriesandreports.Datasciencedealswithschemaonqueryinwhichthedatascientistscustomdesignthedataschemabasedonthehypothesistheywanttotestorthepredictionthattheywanttomake.
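The schema-on-query idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the sample records, the field names, and the helper `query_with_schema` are assumptions made for illustration, not part of any product discussed in this book. The raw records are kept exactly as they arrived, and the schema (which fields matter) is chosen only when a specific hypothesis is being tested.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"customer": "c1", "type": "purchase", "amount": 40.0, "coupon": true}',
    '{"customer": "c2", "type": "purchase", "amount": 25.5}',
    '{"customer": "c1", "type": "comment", "text": "Great service"}',
]


def query_with_schema(raw_lines, fields, record_type):
    """Apply a schema at read time: keep only the fields this hypothesis needs."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("type") != record_type:
            continue
        # Fields missing from a record become None instead of blocking the load.
        rows.append({field: record.get(field) for field in fields})
    return rows


# Hypothesis to test: coupon use is related to purchase size.
purchases = query_with_schema(RAW_EVENTS, ["customer", "amount", "coupon"], "purchase")
print(purchases)
```

A different hypothesis (for example, one about customer comments) would reuse the same raw records with a different field list, which is the practical difference from pre-building one schema before loading.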

Organizationsthattryto“extend”theirBusinessIntelligencecapabilitiestoencompassbigdatawillfail.That'slikestatingthatyou'regoingtothemoon,thenclimbingatreeanddeclaringthatyouarecloser.Unfortunately,youcan'tgettothemoonfromthetopofatree.Datascienceisanewdisciplinethatofferscompelling,business-differentiatingcapabilities,especiallywhencoupledwithBusinessIntelligence.

CROSS-REFERENCE

Chapter5(“DifferencesBetweenBusinessIntelligenceandDataScience”)discussesthedifferencesbetweenBusinessIntelligenceanddatascienceandhowdatasciencecancomplementyourBusinessIntelligenceorganization.Chapter6(“DataScience101”)reviewsseveraldifferentanalyticalgorithmsthatyourdatascienceteammightuseanddiscussesthebusinesssituationsinwhichthedifferentalgorithmsmightbemostappropriate.

Don't Think Data Warehouse, Think Data Lake

In the world of big data, Hadoop and HDFS are a game changer; they are fundamentally changing the way organizations think about storing, managing, and analyzing data. And I don't mean Hadoop as yet another data source for your data warehouse. I'm talking about Hadoop and HDFS as the foundation for your data and analytics environments, to take advantage of the massively parallel processing, cheap scale-out data architecture that can run hundreds, thousands, or even tens of thousands of Hadoop nodes.

Wearewitnessingthedawnoftheageofthedatalake.Thedatalakeenablesorganizationstogather,manage,enrich,andanalyzemanynewsourcesofdata,whetherstructuredorunstructured.Thedatalakeenablesorganizationstotreatdataasanorganizationalassettobegatheredandnurturedversusacosttobeminimized.

Organizations need to treat their reporting environments (traditional BI and data warehousing) and analytics (data science) environments differently. These two environments have very different characteristics and serve different purposes. The data lake can make both of the BI and data science environments more agile and more productive (Figure 1.2).

Figure1.2Moderndata/analyticsenvironment

CROSS-REFERENCE

Chapter 7 (“The Data Lake”) introduces the concept of a data lake and the role the data lake plays in supporting your existing data warehouse and Business Intelligence investments while providing the foundation for your data science environment. Chapter 7 discusses how the data lake can un-cuff your data scientists from the data warehouse to uncover those variables and metrics that might be better predictors of business performance. It also discusses how the data lake can free up expensive data warehouse resources, especially those resources associated with Extract, Transform, and Load (ETL) data processes.

Don't Think “What Happened,” Think “What Will Happen”

Business users have been trained to contemplate business questions that monitor the current state of the business and to focus on retrospective reporting on what happened. Business users have become conditioned by their BI and data warehouse environments to only consider questions that report on current business performance, such as “How many widgets did I sell last month?” and “What were my gross sales last quarter?”

Unfortunately, this retrospective view of the business doesn't help when trying to make decisions and take action about future situations. We need to get business users to “think differently” about the types of questions they can ask. We need to move the business investigation process beyond the performance monitoring questions to the predictive (e.g., What will likely happen?) and prescriptive (e.g., What should I do?) questions that organizations need to address in order to optimize key business processes and uncover new monetization opportunities (see Table 1.2).

Table 1.2 Evolution of the Business Questions

What Happened? (Descriptive/BI) | What Will Happen? (Predictive Analytics) | What Should I Do? (Prescriptive Analytics)
How many widgets did I sell last month? | How many widgets will I sell next month? | Order [5,0000] units of Component Z to support widget sales for next month
What were sales by zip code for Christmas last year? | What will be sales by zip code over this Christmas season? | Hire [Y] new sales reps by these zip codes to handle projected Christmas sales
How many of Product X were returned last month? | How many of Product X will be returned next month? | Set aside [$125K] in financial reserve to cover Product X returns
What were company revenues and profits for the past quarter? | What are projected company revenues and profits for next quarter? | Sell the following product mix to achieve quarterly revenue and margin goals
How many employees did I hire last year? | How many employees will I need to hire next year? | Increase hiring pipeline by 35 percent to achieve hiring goals

CROSS-REFERENCE

Chapter8(“ThinkingLikeaDataScientist”)differentiatesbetweendescriptiveanalytics,predictiveanalytics,andprescriptiveanalytics.Chapters9,10,and11thenintroduceseveraltechniquestohelpyourbusinessusersidentifythepredictive(“Whatwillhappen?”)andprescriptive(“WhatshouldIdo?”)questionsthattheyneedtomoreeffectivelydrivethebusiness.Yeah,thiswillmeanlotsofPost-itnotesandwhiteboards,myfavoritetools.

Don't Think HIPPO, Think Collaboration

Unfortunately, today it is still the HIPPO (the Highest Paid Person's Opinion) that determines most of the business decisions. Justifications such as “We've always done things that way” or “My years of experience tell me…” or “This is what the CEO wants…” are still given for why the HIPPO needs to drive the important business decisions.

Unfortunately,thattypeofthinkinghasledtosiloeddatafiefdoms,siloeddecisions,andanun-empoweredandfrustratedbusinessteam.Organizationsneedtothinkdifferentlyabouthowtheyempoweralloftheiremployees.Organizationsneedtofindawaytopromoteandnurturecreativethinkingandgroundbreakingideasacrossalllevelsoftheorganization.Thereisnoedictthatstatesthatthebestideasonlycomefromseniormanagement.

Thekeytobigdatasuccessisempoweringcross-functionalcollaborationandexploratorythinkingtochallengelong-heldorganizationalrulesofthumb,heuristics,and“gut”decisionmaking.Thebusinessneedsanapproachthatisinclusiveofallthekeystakeholders—IT,businessusers,businessmanagement,channelpartners,andultimatelycustomers.Thebusinesspotentialofbigdataisonlylimitedbythecreativethinkingoftheorganization.

CROSS-REFERENCE

Chapter13(“PowerofEnvisioning”)discusseshowtheBIanddatascienceteamscancollaboratetobrainstorm,test,andrefinenewvariablesthatmightbebetterpredictorsofbusinessperformance.WewillintroduceseveraltechniquesandconceptsthatcanbeusedtodrivecollaborationbetweenthebusinessandITstakeholdersandultimatelyhelpyourdatascienceteamuncovernewcustomer,product,andoperationalinsightsthatleadtobetterbusinessperformance.Chapter14(“OrganizationalRamifications”)introducesorganizationalramifications,especiallytheroleofChiefDataMonetizationOfficer(CDMO).

Summary

Big data is interesting from a technology perspective, but the real story for big data is how organizations of different sizes are leveraging data and analytics to power their business models. Big data has the potential to uncover new customer, product, and operational insights that organizations can use to optimize key business processes, improve customer engagement, uncover new monetization opportunities, and re-wire the organization's value creation processes.

As discussed in this chapter, organizations need to understand that big data is about business transformation and business model disruption. There will be winners and there will be losers, and having business leadership sit back and wait for IT to solve the big data problems for them quickly determines into which group your organization will likely fall. Senior business leadership needs to determine where and how to leverage data and analytics to power your business models before a more nimble or hungrier competitor disintermediates your business.

Torealizethefinancialpotentialofbigdata,businessleadershipmustmakebigdataatopbusinesspriority,notjustatopITpriority.Businessleadershipmustactivelyparticipateindeterminingwhereandhowbigdatacandeliverbusinessvalue,andthebusinessleadersmustbefrontandcenterinleadingtheintegrationoftheresultinganalyticinsightsintotheorganization'svaluecreationprocesses.

Forleadingorganizations,bigdataprovidesaonce-in-a-lifetimebusinessopportunitytobuildkeycapabilities,skills,andapplicationsthatoptimizekeybusinessprocesses,driveamorecompellingcustomerexperience,uncovernewmonetizationopportunities,anddrivecompetitivedifferentiation.Remember:buyforparity,butbuildforcompetitivedifferentiation.

Atitscore,bigdataisabouteconomictransformation.Bigdatashouldnotbetreatedlikejustanothertechnologyscienceexperiment.Historyisfulloflessonsofhoworganizationshavebeenabletocapitalizeoneconomics-drivenbusinesstransformations.Bigdataprovidesoneofthoseeconomic“ForrestGump”momentswhereorganizationsarefortunatetobeattherightplaceattherighttime.Don'tmissthisopportunity.

Finally,organizationshavebeentaughttothinkcheaper,smaller,andfaster,buttheyhavenotbeentaughttothinkdifferently,andthat'sexactlywhat'srequiredifyouwanttoexploitthebigdataopportunity.Manyofthedataandanalyticsbestpracticesthathavebeentaughtoverthepastseveraldecadesnolongerholdtrue.Understandwhathaschangedandlearntothinkdifferentlyabouthowyourorganizationleveragesdataandanalyticstodelivercompellingbusinessvalue.

In summary, business leadership needs to lead the big data initiative, to step up and make big data a top business mandate. If your business leaders don't take the lead in identifying where and how to integrate big data into your business models, then you risk being disintermediated in a marketplace where more agile, hungrier competitors are learning that data and analytics can yield compelling competitive differentiation.

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise#1:Identifyakeybusinessinitiativeforyourorganization,somethingthebusinessistryingtoaccomplishoverthenext9to12months.Itmightbesomethinglikeimprovecustomerretention,optimizecustomeracquisition,reducecustomerchurn,optimizepredictivemaintenance,reducerevenuetheft,andsoon.

Exercise#2:Brainstormandwritedownwhat(1)customer,(2)product,and(3)operationalinsightsyourorganizationwouldliketouncoverinordertosupportthetargetedbusinessinitiative.Startbycapturingthedifferenttypesofdescriptive,predictive,andprescriptivequestionsyou'dliketoansweraboutthetargetedbusinessinitiative.Tip:Don'tworryaboutwhetherornotyouhavethedatasourcesyouneedtoderivetheinsightsyouwant(yet).

Exercise#3:Brainstormandwritedowndatasourcesthatmightbeusefulinuncoveringthosekeyinsights.Lookbothinternallyandexternallyforinterestingdatasourcesthatmightbeuseful.Tip:Thinkoutsidetheboxandimaginethatyoucouldaccessanydatasourceintheworld.

Notes

1 Hopkins, Brian, Fatemeh Khatibloo with Kyle McNabb, James Staten, Andras Cser, Holger Kisker, Ph.D., Leslie Owens, Jennifer Belissent, Ph.D., Abigail Komlenic, “Reset On Big Data: Embrace Big Data to Engage Customers at Scale,” Forrester Research, 2014.

2 http://railroadandsteamengine.weebly.com/impact.html

Chapter 2 Big Data Business Model Maturity Index

Organizations do not understand how far big data can take them from a business transformation perspective. Organizations don't have a way of understanding what the ultimate big data end state would or could look like or answering questions such as:

WhereandhowshouldIstartmybigdatajourney?

HowcanIcreatenewrevenueormonetizationopportunities?

HowdoIcomparetootherswithrespecttomyorganization'sadoptionofbigdataasabusinessenabler?

HowfarcanIpushbigdatatopower—eventransform—mybusinessmodels?

To help address these types of questions, I've created the Big Data Business Model Maturity Index. Not only can organizations use this index to understand where they sit with respect to other organizations in exploiting big data and advanced analytics to power their business models, but the index also provides a road map to help organizations accelerate the integration of data and analytics into their business models.

TheBigDataBusinessModelMaturityIndexisacriticalfoundationalconceptsupportingtheBigDataMBAandwillbereferencedregularlythroughoutthebook.It'simportanttolayastrongbasefoundationinhoworganizationscanusetheBigDataBusinessModelMaturityIndextoanswerthisfundamentalbigdatabusinessquestion:“Howeffectiveismyorganizationatintegratingdataandanalyticsintoourbusinessmodels?”

Chapter2Objectives

IntroducetheBigDataBusinessModelMaturityIndexasaframeworkfororganizationstomeasurehoweffectivetheyareatleveragingdataandanalyticstopowertheirbusinessmodels

DiscusstheobjectivesandcharacteristicsofeachofthefivephasesoftheBigDataBusinessModelMaturityIndex:BusinessMonitoring,BusinessInsights,BusinessOptimization,DataMonetization,andBusinessMetamorphosis

DiscusshowtheeconomicsofbigdataandthefourbigdatavaluedriverscanenableorganizationstocrosstheanalyticschasmandadvancepasttheBusinessMonitoringphaseintotheBusinessInsightsandBusinessOptimizationphases

ReviewlessonslearnedthathelporganizationsadvancethroughthephasesoftheBigDataBusinessModelMaturityIndex

Introducing the Big Data Business Model Maturity Index

Organizations are moving at different paces with respect to where and how they are adopting big data and advanced analytics to create business value. Some organizations are moving very cautiously, as they are unclear as to where and how to start and which of the bevy of new technology innovations they need to deploy in order to start their big data journeys. Others are moving at a more aggressive pace by acquiring and assembling a big data technology foundation built on many new big data technologies such as Hadoop, Spark, MapReduce, YARN, Mahout, Hive, HBase, and more.

However,aselectfewarelookingbeyondjustthetechnologytoidentifywhereandhowtheyshouldbeintegratingbigdataintotheirexistingbusinessprocesses.Theseorganizationsareaggressivelylookingtoidentifyandexploitopportunitiestooptimizekeybusinessprocesses.Andtheseorganizationsareseekingnewmonetizationopportunities;thatis,seekingoutbusinessopportunitieswheretheycan

Packageandselltheiranalyticinsightstoothers

Integrateadvancedanalyticsintotheirproductsandservicestocreate“intelligent”products

Createentirelynewproductsandservicesthathelpthementernewmarketsandtargetnewcustomers

Thesearethefolkswhorealizethattheydon'tneedabigdatastrategyasmuchastheyneedabusinessstrategythatincorporatesbigdata.Andwhenorganizations“flipthatbyte”onthefocusoftheirbigdatainitiatives,thebusinesspotentialisalmostboundless.

OrganizationscanusetheBigDataBusinessModelMaturityIndexasaframeworkagainstwhichtheycanmeasurewheretheysittodaywithrespecttotheiradoptionofbigdata.TheBigDataBusinessModelMaturityIndexprovidesaroadmapforhelpingorganizationstoidentifywhereandhowtheycanleveragedataandanalyticstopowertheirbusinessmodels(seeFigure2.1).

Figure2.1BigDataBusinessModelMaturityIndex

OrganizationstendtofindthemselvesinoneoffivephasesontheBigDataBusinessModelMaturityIndex:

Phase1:BusinessMonitoring.IntheBusinessMonitoringphase,organizationsareapplyingdatawarehousingandBusinessIntelligencetechniquesandtoolstomonitortheorganization'sbusinessperformance(alsocalledBusinessPerformanceManagement).

Phase2:BusinessInsights.IntheBusinessInsightsphase,organizationsaggressivelyexpandtheirdataassetsbyamassingalloftheirdetailedtransactionalandoperationaldataandcouplingthattransactionalandoperationaldatawithnewsourcesofinternaldata(e.g.,consumercomments,e-mailconversations,techniciannotes)andexternaldata(e.g.,socialmedia,weather,traffic,economic,data.gov)sources.OrganizationsintheBusinessInsightsphasethenusepredictiveanalyticstouncovercustomer,product,andoperationalinsightsburiedinandacrossthesedatasources.

Phase3:BusinessOptimization.IntheBusinessOptimizationphase,organizationsbuildonthecustomer,product,andoperationalinsightsuncoveredintheBusinessInsightsphasebyapplyingprescriptiveanalyticstooptimizekeybusinessprocesses.OrganizationsintheBusinessOptimizationphasepushtheanalyticresults(e.g.,recommendations,scores,rules)tofrontlineemployeesandbusinessmanagerstohelpthemoptimizethetargetedbusinessprocessthroughimproveddecisionmaking.TheBusinessOptimizationphasealsoprovidesopportunitiesfororganizationstopushanalyticinsightstotheircustomersinordertoinfluencecustomerbehaviors.AnexampleoftheBusinessOptimizationphaseisaretailerthatdeliversanalytic-basedmerchandisingrecommendationstothestoremanagerstooptimizemerchandisemarkdownsbasedonpurchasepatterns,inventory,weatherconditions,holidays,consumercomments,andsocialmediapostings.

Phase 4: Data Monetization. The Data Monetization phase is where organizations seek to create new sources of revenue. This could include selling data (or insights) into new markets (for example, a cellular phone provider selling customer behavioral data to advertisers), integrating analytical insights into products and services to create “smart” products and services, and/or re-packaging customer, product, and operational insights to create entirely new products and services that help them enter new markets and target new customers or audiences.

Phase5:BusinessMetamorphosis.TheholygrailoftheBigDataBusinessModelMaturityIndexiswhenanorganizationleveragesdata,analytics,andinsightstometamorphoseitsbusiness.Thismetamorphosisnecessitatesamajorshiftintheorganization'scorebusinessmodel(e.g.,processes,people,productsandservices,partnerships,targetmarkets,management,promotions,rewardsandincentives)drivenbytheinsightsgatheredastheorganizationtraversedtheBigDataBusinessModelMaturityIndex.Oneexampleisorganizationsthatmetamorphosefromsellingproductstoselling“business-as-a-service.”ThinkGEselling“thrust”insteadofsellingjetengines.ThinkJohnDeereselling“farmingoptimization”insteadofsellingfarmingequipment.ThinkBoeingselling“airmiles”insteadofairplanes.Anotherexampleisanorganizationcreatingadataandanalyticsplatformthatenablesthegrowingbodyofthird-partydeveloperstobuildandmarketvalue-addedapplicationsontheorganization'sbusiness-as-a-serviceplatform.

Let'sexploreeachofthesephasesinmoredetail.

Phase 1: Business Monitoring

The Business Monitoring phase is the phase where organizations are deploying Business Intelligence (BI) and data warehousing solutions to monitor ongoing business performance. In this phase, sometimes called Business Performance Management, organizations create reports and dashboards that monitor the current state of the business, flag under- and/or over-performing areas of the business, and alert key business stakeholders with pertinent information whenever special “out of bounds” performance situations occur.

TheBusinessMonitoringphaseisagreatstartingpointformostbigdatajourneys.AspartoftheirBusinessIntelligenceanddatawarehousingefforts,organizationshaveinvestedsignificanttime,money,andefforttoidentifyanddocumenttheirkeybusinessprocesses;thatis,thosebusinessprocessesthatmaketheirorganizationsuniqueandsuccessful.Theyhaveassembled,cleansed,normalized,enriched,andintegratedthekeyoperationaldatasources;havepainstakinglyconstructedasupportingdatamodelanddataarchitecture;andhavebuiltcountlessreports,dashboards,andalertsaroundthekeyactivitiesandmetricsthatsupportthatbusinessprocess.Lotsofgreatassetshavealreadybeencreated,andtheseassetsprovidethelaunchingpadforstartingourbigdatajourney.

Unfortunately, moving beyond the Business Monitoring phase is a significant challenge for many organizations. The inertia established from years and decades of BI and data warehouse efforts works against the “think differently” approach that is necessary to fully exploit big data for business value. Plus, the big financial payoff isn't typically realized until the organization pushes through the Business Insights phase into the Business Optimization phase. So let's discuss how organizations can leverage the economics of big data to cross the analytics chasm.

Phase 2: Business Insights

The Business Insights phase couples the organization's growing wealth of internal and external structured and unstructured data with predictive analytics to uncover customer, product, and operational insights buried in the data. This means uncovering occurrences in the data that are unusual (or outside normal behaviors, trends, and patterns) and worthy of business investigation.

ThisisthephaseoftheBigDataBusinessModelMaturityIndexwhereorganizationsneedtoexploittheeconomicsofbigdata;thatis,bigdatatechnologiesare20to50timescheaperthantraditionaldatawarehousesinstoring,managing,andanalyzingdata.Theeconomicsofbigdataenableorganizationstothinkdifferentlyabouthowtheygather,integrate,manage,analyze,andactupondataandprovidethefoundationforhoworganizationscanadvancebeyondtheBusinessMonitoringphaseandcrosstheanalyticschasm.TheeconomicsofbigdataenablefournewcapabilitiesthatwillhelptheorganizationcrosstheanalyticschasmandmovebeyondtheBusinessMonitoringphaseintotheBusinessInsightsphase.Thesefourbigdatavaluedriversare:

1. AccesstoAlloftheOrganization'sTransactionalandOperationalData.Inbigdata,weneedtomovebeyondthesummarizedandaggregateddatathatishousedinthedatawarehouseandbepreparedtostoreandanalyzetheorganization'scompletehistoryofdetailedtransactionalandoperationaldata.Think25yearsofdetailedpointofsale(POS)transactionaldata,notjustthe13to25monthsofaggregatedPOSdatastoredinthedatawarehouse.

Imagine the business potential of being able to analyze each POS transaction at the individual customer level (courtesy of loyalty programs) for the past 15 to 25 years. For example, grocers could see when individual customers start to struggle financially because they are likely to change their purchase behaviors and product preferences (i.e., buying lower-quality products, replacing branded products with private label products, increasing the use of discounts and coupons). You can't see those individual customer behaviors and purchase tendencies in the aggregated data stored in the data warehouse. With big data, organizations have the ability to collect, analyze, and act on the entire history of every purchase occasion by Bill Schmarzo: what products he bought in what combinations, what prices he paid, what coupons he used, what and when he bought on discount, which stores he frequented on what time of day and day of the week, what were the outside weather conditions during those purchase occasions, what were the local economic conditions, etc.

Whenyoucananalyzetransactionalandoperationaldataattheindividualcustomer(orpatient,student,technician,teacher,windturbine,ATM,truck,jetengine,etc.)level,youcanuncoverinsightsaboutindividualcustomerorproductbehaviors,tendencies,propensities,preferences,andusagepatterns.Itisontheseindividualcustomerorproductinsightsthatorganizationscantakeaction.It'sverydifficulttocreateactionableinsightsattheaggregatedlevelofstore,zipcode,orcustomerbehavioralcategories.

2. Access to Internal and External Unstructured Data. Data warehouses don't like unstructured data. Data warehouses want structured data. Since data warehouses have been built on relational database management systems (RDBMS), the data warehouse wants its data in rows and columns. As a consequence, organizations and their business users have been taught that they really don't need access to unstructured data.

Butbigdatachallengesthisissuebygivingallorganizationsacost-effectivewaytoingest,store,manage,andanalyzevastvarietiesofunstructureddata.Andtheintegrationoftheorganization'sunstructureddatawiththeorganization'sdetailedstructureddataprovidestheopportunitytouncovernewcustomer,product,andoperationalinsights.

Whilemostoftheexcitementaboutunstructureddataseemstobeaboutthepotentialofexternalunstructureddata(e.g.,social,blogs,newsfeeds,annualreports,mobile,third-party,publiclyavailable),thegoldformanyorganizationsliesintheirinternalunstructureddata(e.g.,consumercomments,e-mailconversations,doctor/teacher/techniciannotes,workorders,servicerequests).Forexample,inaprojecttoimprovethepredictivemaintenanceofwindturbines,itwasdiscoveredthatwhenatechnicianscalesawindturbinetoreplaceaballbearing,heorshemakesotherobservationswhileatthetopoftheturbine,observationssuchas“Itsmellsweirdinhere”or“It'swarmerthannormal”or“Therearedustparticlesintheair.”Eachofthesetypesofunstructuredcommentscouldprovideinvaluableinsightsintothepredictivemaintenanceofthewindturbine,especiallywhencoupledwiththeoperationalsensorreadings,errorcodes,andvibrationsthatarecomingoffthatparticularwindturbine.

3. Exploiting Real-Time Analytics. New big data technologies provide organizations the technical capabilities to flag and act on special or unusual situations in real-time. Data warehouses have traditionally been batch environments and struggled to uncover and support the real-time opportunities in the data. For example, “trickle feeding” data into the data warehouse has been a long-time data warehouse challenge because the minute new data enters the data warehouse, all the supporting indices, aggregate tables, and materialized views need to be updated with the new data. That's hardly conducive to real-time analysis.

Mostorganizationsdonothavealonglistofusecasesthatrequireareal-timeanalyticsenvironment(e.g.,real-timebidding,frauddetection,digitaladplacement,pricing,yieldoptimization).However,therearemanyusecasesfor“right-time”analytics,wheretheopportunitytimeismeasurednotinsecondsbutinminutesorhoursorevendays.Forexample,nursesandadmissionspersonnelinahospitallikelyhave4to5minutestoscorethelikelihoodofapatientcatchingahospital-acquiredinfection(staphinfection)duringthepatient'sadmissionprocess.Anotherexampleislocation-basedservicesthattargetshoppersthatmeetcertaindemographicand/orbehavioralcharacteristicsastheywalkbyastore.

Thebestapproachforuncoveringtheseright-timeanalyticopportunitiesistobreakthetargetedkeybusinessinitiativeintothedataeventsthatcomposethatbusinessinitiative.Thenidentifythosedataeventswhereknowingaboutthateventsooner(minutessooner,hourssooner,maybeevenadaysooner)couldprovideamonetizationopportunity.

4. IntegratingPredictiveAnalytics.Finally,wecanusepredictiveanalyticstominethewealthofstructuredandunstructureddatatoidentifyareasof“unusualness”inthedata;thatis,usepredictiveanalyticstouncoveroccurrencesinthedatathatareoutsidenormalbehaviorsorengagementpatterns.Organizationscanapplypredictiveanalyticsanddataminingtechniquestouncovercustomer,product,andoperationalinsightsorareasof“unusualness”buriedinthemassivevolumesofdetailedstructuredandunstructureddata.

TheseinsightsuncoveredduringtheBusinessInsightsphaseneedtobereviewedbythebusinessusers(thesubjectmatterexperts)todetermineiftheseinsightspasstheS.A.M.test;thatis,theinsightsare:

Strategic—theinsightisimportantorstrategictowhatthebusinessistryingtoaccomplishwithrespecttothetargetedbusinessinitiative.

Actionable—theinsightissomethingthattheorganizationcanactonwhenengagingwithitskeybusinessentities.

Material—thevalueorbenefitofactingontheinsightisgreaterthanthecostsassociatedwithactingonthatinsight(e.g.,costtogatherandintegratethedata,costtobuildandvalidatetheanalyticmodel,costtointegratetheanalyticresultsintotheoperationalsystems).
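The S.A.M. test can be expressed as a simple checklist. The following Python sketch is only an illustration under stated assumptions: the `Insight` class, its field names, and the dollar figures are hypothetical, and the strategic and actionable flags stand in for judgments that the business users (the subject matter experts) would supply. Only the Material criterion is computed, by comparing the expected benefit with the three example cost categories listed above.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    """Hypothetical record of one insight under S.A.M. review."""
    description: str
    strategic: bool          # judged by the business users
    actionable: bool         # judged by the business users
    expected_benefit: float  # estimated value of acting on the insight
    data_cost: float         # cost to gather and integrate the data
    model_cost: float        # cost to build and validate the analytic model
    integration_cost: float  # cost to integrate results into operational systems

    def is_material(self) -> bool:
        # Material: the benefit must exceed the total cost of acting on the insight.
        total_cost = self.data_cost + self.model_cost + self.integration_cost
        return self.expected_benefit > total_cost

    def passes_sam(self) -> bool:
        return self.strategic and self.actionable and self.is_material()


# Illustrative figures only; they do not come from any real project.
churn_signal = Insight("Coupon use spikes before churn", True, True,
                       250_000.0, 40_000.0, 60_000.0, 50_000.0)
weather_quirk = Insight("Sales dip on foggy days", False, True,
                        20_000.0, 15_000.0, 10_000.0, 5_000.0)

print(churn_signal.passes_sam())   # True: strategic, actionable, 250,000 > 150,000
print(weather_quirk.passes_sam())  # False: not strategic, and 20,000 < 30,000
```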

For example, organizations could apply basic statistics, data mining, and predictive analytics to their growing wealth of structured and unstructured data to identify insights such as:

Marketing campaigns that are performing two to three times better than the average campaign performance in certain markets on certain days of the week

Customersthatarereactingtwotothreestandarddeviationsoutsidethenormintheirpurchasepatternsforcertainproductcategoriesincertainweatherconditions

Supplierswhosecomponentsareoperatingoutsidetheupperorlowerlimitsofacontrolchartinextremecoldweathersituations
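One simple way to flag this kind of “unusualness” is to measure how many standard deviations an observation sits from the mean. The sketch below is a minimal illustration, not the method used in any engagement described in this book; the function name `flag_unusual`, the threshold of 2.0, and the weekly spend figures are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def flag_unusual(values, threshold=2.0):
    """Return (index, value) pairs at least `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # every value is identical, so nothing is unusual
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma >= threshold]


# Hypothetical weekly spend for one customer in one product category.
weekly_spend = [52, 48, 50, 51, 49, 47, 53, 50, 95, 49]
print(flag_unusual(weekly_spend))  # [(8, 95)]
```

A flagged point is only a candidate for business investigation; it still needs review by the subject matter experts before anyone acts on it. Note that one extreme value inflates the standard deviation itself, so more robust measures may be preferable on real data.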

CROSS-REFERENCE

Forthepredictiveanalyticstobeeffective,organizationsneedtobuilddetailedanalyticprofilesforeachindividualbusinessentity—customers,patients,students,windturbines,jetengines,ATMs,etc.ThecreationandroleofanalyticprofilesisatopiccoveredinChapter5,“DifferencesBetweenBusinessIntelligenceandDataScience.”

Business Insights Phase Challenge

The Business Insights phase is the most difficult stage of the Big Data Business Model Maturity Index because it requires organizations to “think differently” about how they approach data and analytics. The rules, techniques, and approaches that worked in the Business Intelligence and data warehouse worlds do not necessarily apply to the world of big data. This is truly the “crossing the analytics chasm” moment (see Figure 2.2).

Figure2.2Crossingtheanalyticschasm

Forexample,BusinessIntelligenceanalystsweretaughtto“sliceanddice”thedatatouncoverinsightsburiedinthedata.Thisapproachworkedfinewhendealingwithgigabytesofdata,5to9dimensions,and15to25metrics.However,the“sliceanddice”techniquedoesnotworkwellwhendealingwithpetabytesofdata,40to60dimensions,andhundredsofmetrics.

Also,muchofthebigdatafinancialpaybackorReturnonInvestment(ROI)isnotrealizeduntiltheorganizationreachestheBusinessOptimizationphase.Thisiswhyitisimportanttofocusyourbigdatajourneyonakeybusinessinitiative;somethingthatthebusinessistryingtoachieveoverthenext9to12months.ThefocusonabusinessinitiativecanprovidethenecessaryfinancialandorganizationalmotivationtopushthroughtheBusinessInsightsphaseandtorealizethefinancialreturnandpaybackcreatedintheBusinessOptimizationphase.

Phase 3: Business Optimization

The Business Optimization phase is the stage of the Big Data Business Model Maturity Index where organizations develop the predictive analytics (which predict what is likely to happen) and the prescriptive analytics (which recommend actions that should be taken) necessary to optimize the targeted key business process. This phase builds on the analytic insights uncovered during the Business Insights phase and constructs predictive and prescriptive analytic models around those insights that pass the S.A.M. criteria. One client called this the “Tell me what I need to do” phase.

Whilemanybelievethatthisisthepartofthematurityindexwhereorganizationsturntheoptimizationprocessovertothemachines,inrealityitismorelikelythattheBusinessOptimizationphasedeliversactionableinsights(e.g.,recommendations,scores,rules)tofrontlineemployeesandmanagerstohelpthemmakebetterdecisionssupportingthetargetedbusinessprocess.Examplesinclude:

Deliveringresourceschedulingrecommendationstostoremanagersbasedonpurchasehistory,buyingbehaviors,seasonality,andlocalweatherandevents

Deliveringdistributionandinventoryrecommendationstologisticmanagersgivencurrentandpredictedbuyingpatterns,coupledwithlocaltraffic,demographic,weather,andeventsdata

Deliveringproductpricingrecommendationstoproductmanagersbasedoncurrentbuyingpatterns,inventorylevels,competitiveprices,andproductinterestinsightsgleanedfromsocialmediadata

Deliveringfinancialinvestmentrecommendationstofinancialplannersandagentsbasedonaclient'sfinancialgoals,currentfinancialassetmix,risktolerance,marketandeconomicconditions,andsavingsobjectives(e.g.,house,college,retirement)

Deliveringmaintenance,scheduling,andinventoryrecommendationstowindturbinetechniciansbasedonerrorcodes,sensorreadings,vibrationreadings,andrecentcommentscapturedbythetechnicianduringpreviousmaintenanceactivities

The Business Optimization phase also seeks to influence customer purchase and engagement behaviors by analyzing the customer's past purchase patterns, behaviors, and tendencies in order to deliver relevant and actionable recommendations. Common examples include Amazon's “Customers Who Bought This Item Also Bought” recommendations, Netflix's movie recommendations, and Pandora's music recommendations. The key to the effectiveness of these recommendations is capturing and analyzing an individual customer's purchase, usage, and engagement activities to build analytic profiles that codify that customer's preferences, behaviors, tendencies, propensities, patterns, trends, interests, passions, affiliations, and associations.

Finally,theBusinessOptimizationphaseneedstointegratethecustomer,product,andoperationalprescriptiveanalyticsorrecommendationsbackintotheoperationalsystems(e.g.,callcenter,salesforceautomation,directmarketing,procurement,logistics,inventory)andmanagementapplications(reports,dashboards)systems.Forexample,thinkofan“intelligent”storemanager'sdashboard,whereinsteadofjustpresentingtablesandchartsofdata,theintelligentdashboardgoesonestepfurthertoactuallydeliverrecommendationstothestoremanagertoimprovestoreoperations.

CROSS-REFERENCE

The potential user experience ramifications of pushing prescriptive analytics to both customers and frontline employees are discussed in Chapter 4, “The Importance of the User Experience.”

Phase 4: Data Monetization

The Data Monetization phase is the phase of the Big Data Business Model Maturity Index where organizations leverage the insights gathered from the Business Insights and Business Optimization phases to create new revenue opportunities. New monetization opportunities could include:

Packaging data (with analytic insights) for sale to other organizations. In one example, a smartphone vendor could capture and package insights about customer behaviors, product performance, and market trends to sell to advertisers, marketers, and manufacturers. In another example, MapMyRun (which was purchased by Under Armour for $150M) could package the customer usage data from its smartphone application to create audience and product insights that it could sell to a variety of companies, including sports apparel manufacturers, sporting goods retailers, insurance companies, and healthcare providers (see Figure 2.3).

Figure 2.3 Packaging and selling audience insights

Integrating analytic insights directly into an organization's products and services to create “intelligent” products or services, such as:

Cars that learn a customer's driving patterns and behaviors and adjust driver controls, seats, mirrors, brake pedals, suspension, steering, dashboard displays, etc. to match the customer's driving style

Televisions and DVRs that learn what types of shows and movies a customer likes and search across the different cable and Internet channels to find and automatically record similar shows for that customer

Ovens that learn how a customer likes certain foods prepared and cook them in that manner automatically and also include recommendations for other foods and recipes that “others like you” enjoy

Jet engines that can ingest weather, elevation, wind speed, and other environmental data to make adjustments to blade angles, tilt, yaw, and rotation speeds to minimize fuel consumption during flight

Repackaging insights to create entirely new products and services that help organizations to enter new markets and target new customers or audiences. For example, organizations can capture, analyze, and package customer, product, and operational insights across the overall market in order to help channel partners to more effectively market and sell to their customers, such as:

Online digital marketplaces (Yahoo, Google, eBay, Facebook) could leverage general market trends and other merchant performance data to provide recommendations to small merchants on inventory, ordering, merchandising, marketing, and pricing.

Financial services organizations could create a financial advisor dashboard for their agents and brokers that captures clients' investment goals, current income levels, and current financial portfolio and creates investment, risk, and asset allocation recommendations that help the brokers and agents more effectively service their customers.

Retail organizations could mine customer loyalty transactions and engagements to uncover customer and product insights that enable the organization to move into new product categories or new geographies.

While the Data Monetization phase is clearly the phase of the Big Data Business Model Maturity Index that catches everyone's attention, it is important that the organization goes through the Business Insights and Business Optimization phases in order to capture the customer, product, operational, and market insights that form the basis for these new monetization opportunities.

Phase 5: Business Metamorphosis

The Business Metamorphosis phase of the Big Data Business Model Maturity Index should be the ultimate goal for organizations. This is the phase of the maturity index where organizations seek to leverage the data, analytics, and analytic insights to metamorphose, or transform, the organization's business model (e.g., processes, people, products and services, partnerships, target markets, management, promotions, rewards and incentives).

The Business Metamorphosis phase is where organizations integrate the insights that they captured about their customers' usage patterns, product performance behaviors, and overall market trends to transform their business models. This business model metamorphosis allows organizations to provide new services and capabilities to their customers in a way that is easier for the customers to consume and that lets the organization engage in higher-value and more strategic services.

For example, contemplate the data, analytics, and analytic insights that Boeing would need to transform its business from selling airplanes to selling air miles. Think of the data, analytics, and insights that Boeing would need to uncover about passengers, airlines, airports, routes, holidays, economic conditions, etc. in order to optimize its business models, processes, people, etc. to successfully execute this business change. Think of the business requirements necessary to encourage third-party developers to build and market value-add services and products on Boeing's new business model. This example is considered in more detail in Chapter 12, “Metamorphosis Exercise.”

Other Business Metamorphosis phase examples could include:

Energy companies moving into the “Home Energy Optimization” business by recommending when to replace appliances (based on predictive maintenance) and even recommending which appliance brands and models to buy based on the performance of different appliances, taking into consideration your usage patterns, local weather, local water quality, and local environmental conditions such as local water conservation efforts and energy costs

Retailers moving into the “Shopping Optimization” business by recommending specific products given customers' current buying patterns as compared with others like them, including recommendations for products that they may not even sell (think “Miracle on 34th Street”)

Airlines moving into the “Travel Delight” business of not only offering discounts on air travel based on customers' travel behaviors and preferences but also proactively finding and recommending deals on hotels, rental cars, limos, sporting or musical events, and local sites, shows, restaurants, and shopping in the areas based on their areas of interest and preferences

While it is a significant challenge for organizations to ever reach the Business Metamorphosis phase, having that as the goal can both be motivating and provide an organizational catalyst to move more aggressively along the maturity index.

Big Data Business Model Maturity Index Lessons Learned

There are some interesting lessons that organizations will discover as they progress through the phases of the Big Data Business Model Maturity Index. Understanding these lessons ahead of time should help prepare organizations for their big data journey.

Lesson 1: Focus Initial Big Data Efforts Internally

The first three phases of the Big Data Business Model Maturity Index seek to extract more financial or business value out of the organization's internal processes or business initiatives. The first three phases drive business value and a Return on Investment (ROI) by seeking to integrate new sources of customer, product, operational, and market data with advanced analytics to improve the decisions that are made as part of the organization's key internal processes and business initiatives (see Figure 2.4).

Figure 2.4 Optimize internal processes

The internal process optimization efforts start by seeking to leverage the organization's Business Intelligence and data warehouse assets. This includes building on the data warehouse's data sources, data extraction and enrichment algorithms, dimensions, metrics, key performance indicators, reports, and dashboards. The maturity process then applies the four big data value drivers to cross the analytics chasm from the Business Monitoring phase into the Business Insights and ultimately the Business Optimization phases.

The Four Big Data Value Drivers

1. Access to all the organization's detailed transactional and operational data at the lowest level of granularity (at the individual customer, machine, or device level).

2. Integration of unstructured data from both internal (consumer comments, e-mail threads, technician notes) and external sources (social media, mobile, publicly available) with the detailed transactional and operational data to provide new metrics and new dimensions against which to optimize key business processes.

3. Leverage real-time (or right-time) data analysis to accelerate the organization's ability to identify and act on customer, product, and market opportunities in a timelier manner.

4. Apply predictive analytics and data mining to uncover customer, product, and operational insights or areas of “unusualness” buried in the massive volumes of detailed structured and unstructured data that are worthy of further business investigation.

Organizations must leverage these four big data value drivers to cross the analytics chasm by uncovering new customer, product, and operational insights that can be used to optimize key business processes, whether delivering actionable recommendations to frontline employees and business managers or delivering “Next Best Offer” or recommendations to delight customers and business partners.

Lesson 2: Leverage Insights to Create New Monetization Opportunities

The last two phases of the Big Data Business Model Maturity Index are focused on external market opportunities, that is, opportunities to create new monetization or revenue streams based on the customer, product, and operational insights gleaned from the first three phases of the maturity index (see Figure 2.5).

Figure 2.5 Create new monetization opportunities

This is the part of the big data journey that catches most organizations' attention: the opportunity to leverage the insights gathered through the optimization of their key business processes to create new revenue or monetization opportunities. Organizations are eager to leverage new corporate assets (data, analytics, and business insights) in order to create new sources of revenue. This is the “4 Ms” phase of the Big Data Business Model Maturity Index where organizations focus on leveraging data and analytics to create new opportunities to “Make Me More Money!”
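The internal versus external split described in Lessons 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the `MaturityPhase` enum and the `focus` helper are hypothetical names chosen for this sketch, not part of the maturity index itself.

```python
from enum import IntEnum


class MaturityPhase(IntEnum):
    """The five phases of the maturity index, in order (names are illustrative)."""
    BUSINESS_MONITORING = 1
    BUSINESS_INSIGHTS = 2
    BUSINESS_OPTIMIZATION = 3
    DATA_MONETIZATION = 4
    BUSINESS_METAMORPHOSIS = 5


def focus(phase: MaturityPhase) -> str:
    """Return 'internal' for the first three phases and 'external' for the last two."""
    if phase <= MaturityPhase.BUSINESS_OPTIMIZATION:
        return "internal"
    return "external"


for phase in MaturityPhase:
    print(phase.value, phase.name, focus(phase))
```

Using an ordered enum keeps the "build on the earlier phases first" idea explicit: a later phase always compares greater than the phases whose insights it depends on.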

Lesson 3: Preparing for Organizational Transformation

To fully exploit the big data opportunity, subtle organizational and cultural changes will be necessary for the organization to advance along the maturity index. If organizations are serious about integrating data and analytics into their business models, then three organizational or cultural transformations will need to take place:

1. Treat Data as an Asset. Organizations must start to treat data as an asset to be nurtured and grown, not a cost to be minimized. Organizations must develop an insatiable appetite for more and more data, even if they are unclear as to how they will use that data. This is a significant cultural change from the data warehouse days where we treated data as a cost to be minimized.

2. Legally Protect Your Analytics Intellectual Property. Organizations must put into place formal processes and procedures to capture, track, refine, and even legally protect their analytic assets (e.g., analytic models, data enrichment algorithms, and analytic results such as scores, recommendations, and association rules) as key organizational intellectual property. While the underlying technologies may change over time, the resulting data and analytic assets will survive those changes if the organizations can institute a well-managed and enforced process to capture, store, share, and protect those analytic assets.

3. Get Comfortable Using Data to Guide Decisions. Business management and business users must gain confidence in using data and analytics to guide their decision making. Organizations must get comfortable with making business decisions based on what the data and the analytics tell them versus defaulting to the “Highest Paid Person's Opinion” (HIPPO). The organization's investments in data, analytics, people, processes, and technology will be for naught if the organization isn't prepared to make decisions based on what the data and the analytics tell them. With that said, it's important that the analytic insights are positioned as “recommendations” that business users and business management can accept, reject, or modify. In that way, organizations can leverage analytics to establish organizational accountability.

Summary

Businesses of all sizes must reframe the big data conversation with business leaders. The Big Data Business Model Maturity Index provides a framework that enables business and IT leaders to discuss and debate the question “How effective is our organization at integrating data and analytics into our business model?”

The business possibilities seem almost endless with respect to where and how organizations can leverage big data and advanced analytics to drive their business model. The Big Data Business Model Maturity Index provides a road map (a how-to guide) to direct the business and IT stakeholders from the Business Monitoring phase through the Business Insights and Business Optimization phases, to the ultimate goals in the Data Monetization and Business Metamorphosis phases to create new business models (see Table 2.1).

Table 2.1 Big Data Business Model Maturity Index Summary

Business Monitoring: Monitor key business processes and report on business performance using data warehousing and Business Intelligence techniques

Business Insights: Pool all detailed operational and transactional data with internal unstructured data and external (third-party, publicly available) data; integrate with advanced analytics to uncover customer, product, and operational insights buried in the data

Business Optimization: Deliver actionable recommendations and scores to front-line employees to optimize customer engagement; deliver actionable recommendations to end customers based on their product and usage preferences, propensities, and tendencies

Data Monetization: Monetize the customer, product, and operational insights coming out of the optimization process to create new services and products, capture new markets and audiences, and create “smart” products and services

Business Metamorphosis: Reconstitute customer, product, and operational insights to metamorphose the very fabric of an organization's business model, including processes, people, compensation, promotions, products/services, target markets, and partnerships

Ultimately, big data only matters if it can help organizations generate more money through improved decision making (or improved operational effectiveness for non-profit organizations). Big data holds the potential to both optimize key business processes and create new monetization or revenue opportunities.

In summary:

The Big Data Business Model Maturity Index provides a framework for organizations to measure how effective they are at leveraging data and analytics to power their business models.

The five phases of the Big Data Business Model Maturity Index are Business Monitoring, Business Insights, Business Optimization, Data Monetization, and Business Metamorphosis.

The economics of big data and the four big data value drivers can enable organizations to cross the analytics chasm.

The Big Data Business Model Maturity Index provides a road map for being successful with big data by beginning with an end in mind. Otherwise, “if you don't know where you are going, you might end up someplace else” (to quote Yogi Berra).

Homework Assignment

Use the following exercises to apply and reinforce the information presented in this chapter:

Exercise #1: List two or three of your organization's key business processes. That is, write down two or three business processes that uniquely differentiate your organization from your competition.

Exercise #2: List the four big data value drivers that are enabled by the economics of big data and describe how each might impact one of your organization's key business processes identified in Exercise #1.

Exercise #3: For the selected key business processes identified in Exercise #1, describe how each key business process might be improved as it transitions along the five phases of the Big Data Business Model Maturity Index. Identify the customer, product, and operational ramifications that each of the five phases might have on the selected key business process.

Exercise #4: List the cultural changes that your organization must address if it hopes to leverage big data to its fullest business potential. Flag the top two or three cultural challenges that might be the most difficult for your organization and list what you think the organization needs to do in order to address those challenges.

Chapter 3: The Big Data Strategy Document

One of the biggest challenges organizations face with respect to big data is identifying where and how to start. The big data strategy document, detailed in this chapter, provides a framework for linking an organization's business strategy and supporting business initiatives to the organization's big data efforts. The big data strategy document guides the organization through the process of breaking down its business strategy and business initiatives into potential big data business use cases and the supporting data and analytic requirements.

NOTE

The big data strategy document first appeared in my book Big Data: Understanding How Data Powers Big Business. Since then and courtesy of several client engagements, significant improvements have been made to help users to uncover big data use cases. In particular, the process has been enhanced to clarify the business value and implementation feasibility assessments of the different data sources and use case prioritization (see Figure 3.1).

Figure 3.1 Big data strategy decomposition process

Chapter 3 Objectives

Establish common terminology for big data.

Examine the concept of a business initiative and provide some examples of where to find these business initiatives.

Introduce the big data strategy document as a framework for helping organizations to identify the use cases that guide where and how they can start their big data journeys.

Provide a hands-on example of the big data strategy document in action using Chipotle, a chain of organic Mexican food restaurants (and one of my favorite places to eat!).

Introduce worksheets to help organizations to determine the business value and implementation feasibility of the data sources that come out of the big data strategy document process.

Introduce the prioritization matrix as a tool that can drive business and IT alignment around prioritizing the use cases based on business value and implementation feasibility over a 9- to 12-month window.

Finally, we will have some fun by applying the big data strategy document to the world of professional baseball and demonstrate how the big data strategy document could help a professional baseball organization win the World Series.

Establishing Common Business Terminology

Before we launch into the big data strategy document discussion, we need to define a few critical terms to ensure that we are using consistent terminology throughout the chapter and the book:

Corporate Mission. Why the organization exists; defines what an organization is and the organization's reason for being. For example, The Walt Disney Company's corporate mission is “to be one of the world's leading producers and providers of entertainment and information.”1

Business Strategy. How the organization is going to achieve its mission over the next two to three years.

Strategic Business Initiatives. What the organization plans to do to achieve its business strategy over the next 9 to 12 months; usually includes business objectives, financial targets, metrics, and time frames.

Business Entities. The physical objects or entities (e.g., customers, patients, students, doctors, wind turbines, trucks) around which the business initiative will try to understand, predict, and influence behaviors and performance (sometimes referred to as the strategic nouns of the business).

Business Stakeholders. Those business functions (sales, marketing, finance, store operations, logistics, and so on) that impact or are impacted by the strategic business initiative.

Business Decisions. The decisions that the business stakeholders need to make in support of the strategic business initiative.

Big Data Use Cases. The analytic use cases (decisions and corresponding actions) that support the strategic business initiative.

Data. The structured and unstructured data sources, both internal and external to the organization, that will be identified throughout the big data strategy document process.

Introducing the Big Data Strategy Document

The big data strategy document helps organizations address the challenge of identifying where and how to start their big data journeys. It uses a single-page format, suitable for any organization (profit or non-profit), that links the organization's big data efforts to its business strategy and key business initiatives. The big data strategy document is effective for the following reasons:

It's concise. It fits on a single page so that anyone in the organization can quickly review it to ensure he or she is working on the top priority items.

It's clear. It clearly defines what the organization needs to do in order to achieve the organization's key business initiatives.

It's business relevant. It starts by focusing on the business strategy and supporting initiatives before it dives into the data and technology requirements.

The big data strategy document is composed of the following sections (see Figure 3.2):

Business strategy

Key business initiatives

Key business entities

Key decisions

Financial drivers (use cases)

Figure 3.2 Big data strategy document

The rest of the chapter will detail each of these sections and provide guidelines for how the organization can triage its business strategy into the financial drivers (or use cases) on which it can focus its big data efforts. We will use a case study around Chipotle Mexican Grill to reinforce the triage and analysis process.

Identifying the Organization's Key Business Initiatives

The starting point for the big data strategy document process is to identify the organization's business initiatives over the next 9 to 12 months. That is, what is the business trying to accomplish over the next 9 to 12 months? This 9- to 12-month time frame is critical, as it:

Focuses the organization's big data efforts on something that is of immediate value and relevance to the business

Creates a sense of urgency for the organization to move quickly and diligently

Gives the big data project a more realistic chance of delivering a positive Return on Investment (ROI) and a financial payback in 12 months or less

A business initiative supports the business strategy and has the following characteristics:

Critical to immediate-term business and/or financial performance (usually 9- to 12-month time frame)

Communicated (either internally or publicly)

Cross-functional (involves more than one business function)

Owned or championed by a senior business executive

Has a measurable financial goal

Has a well-defined delivery time frame

Delivers compelling financial or competitive advantage

For example, a wireless provider might have a key business initiative to reduce the attrition rate among its most profitable customers by 20 percent over the next 12 months. Or a public utility might have a key business initiative to improve customer satisfaction by a certain number of basis points while reducing water consumption by 20 percent.
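The characteristics above can be turned into a simple screening checklist. The following Python sketch is one possible way to do that; the `BusinessInitiative` class, its field names, and the example owner and business functions are all hypothetical, and the more subjective characteristics (criticality, compelling advantage) are deliberately left to human judgment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessInitiative:
    """Hypothetical record for screening a candidate business initiative."""
    description: str
    time_frame_months: int = 0          # delivery time frame, 0 if undefined
    communicated: bool = False          # communicated internally or publicly
    business_functions: List[str] = field(default_factory=list)
    executive_owner: str = ""           # senior business executive champion
    measurable_goal: str = ""           # measurable financial goal

    def unmet_characteristics(self) -> List[str]:
        """List the checkable characteristics this initiative does not yet meet."""
        gaps = []
        if not 0 < self.time_frame_months <= 12:
            gaps.append("well-defined delivery time frame of 12 months or less")
        if not self.communicated:
            gaps.append("communicated internally or publicly")
        if len(self.business_functions) < 2:
            gaps.append("cross-functional")
        if not self.executive_owner:
            gaps.append("owned by a senior business executive")
        if not self.measurable_goal:
            gaps.append("measurable financial goal")
        return gaps


# The wireless provider example from the text; owner and functions are made up.
attrition = BusinessInitiative(
    description="Reduce attrition among the most profitable customers",
    time_frame_months=12,
    communicated=True,
    business_functions=["marketing", "customer service"],  # hypothetical
    executive_owner="Chief Marketing Officer",              # hypothetical
    measurable_goal="20 percent reduction in attrition rate",
)
draft = BusinessInitiative(description="Improve customer satisfaction")

print(attrition.unmet_characteristics())
print(draft.unmet_characteristics())
```

An initiative that returns an empty list is a reasonable candidate for the strategy document; one that returns several gaps probably needs more definition before any big data work is attached to it.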

There are many places to uncover an organization's key business initiatives. If the company is public, then the organization's financial statements are a great starting point. For both private and non-profit organizations, there is a bevy of publicly available sources for identifying an organization's key business initiatives, including:

Annual reports

10-K (filed annually)

10-Q (filed quarterly)

Quarterly analyst calls

Executive presentations and conferences

Executive blogs

News releases

Social media sites

SeekingAlpha.com

Web searches using Google, Yahoo, and Bing

The best way to grasp the big data strategy document process is with a hands-on example. And what better place to test the big data strategy document than with one of my favorite restaurants, Chipotle!

What's Important to Chipotle?

Let's start the business strategy analysis process by reviewing Chipotle's annual report to determine what's important to the company from a business strategy perspective. Figure 3.3 shows an abbreviated version of the Chipotle President's Letter to Shareholders from the 2012 annual report.

Figure 3.3 Chipotle's 2012 letter to the shareholders

From the President's Letter, we can identify at least four key business initiatives for the coming year:

Improve employee (talent) acquisition, maturation, and retention (which is especially important for an organization where 90 percent of its management has come up through the ranks of the store).

Continue double-digit revenue growth (up 20.3 percent in 2012) by opening new stores (opened 183 over 100 in 2012).

Increase same store sales growth (7.1 percent growth in 2012).

Improve marketing effectiveness on building the Chipotle brand and engaging with customers in ways that create stronger, deeper bonds.

While any of these four business initiatives is ripe for the big data strategy document, for the remainder of this exercise we'll focus on the “increase same store sales” business initiative because increasing sales of a business entity or outlet is relevant across a number of different industries (e.g., hospitality, gaming, banking, insurance, retail, higher education, healthcare providers).

Identify Key Business Entities and Key Decisions

After identifying our targeted business initiative, the next step is to identify the key business entities that are important to the targeted business initiative (“increase same store sales”). Business entities are the strategic nouns around which the targeted business initiative must focus. You probably won't have more than three to five business entities, or strategic nouns, for any single business initiative.

NOTE

It is around these business entities that we are going to want to capture the behaviors, tendencies, patterns, trends, preferences, etc. at the individual entity level. For example, a credit card company would want to capture Bill Schmarzo's specific travel and buying patterns and tendencies in order to better detect fraud and improve merchant marketing offers.

Figure 3.4 shows the template that we are going to use to support the big data strategy document process. We have already captured our targeted “increase same store sales” business initiative.

Figure 3.4 Chipotle's “increase same store sales” business initiative

Take a moment to write down what you think might be the key business entities or strategic nouns for the “increase same store sales” business initiative:

Here are three business entities that I came up with:

Stores

Local events (sporting, entertainment, social)

Local competitors

NOTE

I did not include “customers” as one of my business entities for Chipotle. Customers are typically a leading business entity candidate; however, as of the writing of this book, Chipotle did not have a customer loyalty program. The lack of a customer loyalty program would make it difficult for Chipotle to identify individual customer behaviors, tendencies, patterns, trends, and product preferences.

Next, for each of the identified business entities, brainstorm the analytic insights that you might want to capture. That is, what is it that you'd like to know about each individual business entity that could support the strategic business initiative? For each of the key business entities listed below, jot down some analytic insights that you would like to know about each:

Stores

Local events

Local competitors

Here are some potential analytic insights that I would like to capture about each of the Chipotle business entities:

Stores: For each individual store, understand in-store traffic patterns, nearby customer demographics, most popular market basket combinations, customer product preferences (by time of day, day of week, seasonality, and holidays), weather conditions, outside traffic conditions, local economic situation, local home values, nearby schools and colleges, Yelp rating, social media sentiment, and so forth.

Local Events: For each individual local event, understand the type of event, the frequency of event, when the event occurs (time of day, day of week, time of year), event start and stop times, the number of attendees, the demographics of participants and attendees (age, gender), event administrator/coordinator, event sponsors, and so on.

Local Competitors: For each individual local competitor, understand type of competitor, size of competitor, chain or mom-and-pop competitor, distance from competitor, type of food served, type of service, price points, Yelp ratings, social media sentiment, length of time in service, customer demographics, etc.

Next, identify the key business decisions that need to be made about the key business entities in support of the targeted business initiative. That is, what decisions do Chipotle store and corporate management need to make about the business entities to support the “increase same store sales” business initiative? This is a great opportunity to brainstorm the decisions with your key business stakeholders, those people or workers who impact or are impacted by the key business initiative.

Here are examples of some key business decisions by business entity:

Business entity: Stores

How much staffing, inventory, and ingredients do I need for the upcoming weekend given the local events?

How much staffing, inventory, and ingredients do I need given upcoming holidays and seasonal events?

How much staffing, inventory, and ingredients do I need for Friday local business catering?

How much staffing, inventory, and ingredients do I need for local high school demand (school in-session versus school out-of-session)?

What are the ideal hours of operations for the upcoming high school football season?

Business entity: Local events

How much additional staffing will I need for which local events?

How much additional inventory and ingredients will I need for which local events?

Which local events do I want to sponsor and at what cost?

What promotions do I want to offer in support of the local events?

Business entity: Local competitors

What are the most effective offers or promotions to counter competitors that are taking customers away from me?

What are my competitors' most effective promotions?

What pricing and production changes do I need to make in light of key competitor activities?

To which local competitors' promotions do I need to respond?

What's the most effective response or promotion given competitors' promotional activities?

Take a shot below at brainstorming some additional business decisions that the store manager or corporate management might have to make about the key business entities to support the “increase same store sales” business initiative.

Business entity: Stores

Decision:

Decision:

Business entity: Local events

Decision:

Decision:

Business entity: Local competitors

Decision:

Decision:

NOTE

Some of the decisions will be very similar. That's good because it allows the organization to approach the decisions from multiple perspectives.

Figure 3.5 shows what the Chipotle big data strategy document looks like at this point in the exercise with the addition of some of the business decisions.

Figure 3.5 Chipotle key business entities and decisions

Identify Financial Drivers (Use Cases)

Next you want to group the decisions into common use cases or “common themes.” That is, identify and cluster those decisions that seem similar in their business and financial objectives. The resulting use cases are the financial drivers or the “how do we make more money” opportunities for our targeted business initiative.

CROSS-REFERENCE

While it is hard to actually do this grouping process in a book, the use of facilitation techniques to help brainstorm and group these decisions will be covered in the Facilitation Techniques section of Chapter 13, “Power of Envisioning.”

For Chipotle's “increase same store sales” business initiative, the following are likely financial drivers or use cases:

Increase store traffic (acquire new customers, increase frequency of repeat customers)

Increase shopping bag revenue and margins (cross-sell complementary products, up-sell)

Increase number of corporate events (catering, repeat catering events)

Improve promotional effectiveness (Halloween Boo-ritto, Christmas gift cards, graduation, holiday, and special event gift cards)

Improve new product introduction effectiveness (seasonal, holiday)

The entire big data strategy document process has been designed to uncover these use cases, that is, to identify those financial drivers that support our targeted business initiative. The use cases and financial drivers are the point of the big data strategy document where we focus the organization on the “Make Me More Money” big data opportunities.
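To make the structure of the strategy document concrete, the Chipotle example so far can be captured as a small data structure. This is a minimal Python sketch, assuming a hypothetical `BigDataStrategyDocument` class whose fields mirror the sections of the document; the business strategy section is omitted because the text does not spell it out for Chipotle, and only a few of the decisions listed earlier are included.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BigDataStrategyDocument:
    """Hypothetical container mirroring the sections of the strategy document."""
    business_initiative: str
    decisions_by_entity: Dict[str, List[str]]  # key business entity -> key decisions
    use_cases: List[str]                       # financial drivers

    def entities(self) -> List[str]:
        return list(self.decisions_by_entity)

    def summary(self) -> str:
        lines = [f"Initiative: {self.business_initiative}"]
        for entity, decisions in self.decisions_by_entity.items():
            lines.append(f"  {entity}: {len(decisions)} decision(s)")
        lines.append(f"  Use cases: {len(self.use_cases)}")
        return "\n".join(lines)


chipotle = BigDataStrategyDocument(
    business_initiative="Increase same store sales",
    decisions_by_entity={
        "Stores": [
            "How much staffing, inventory, and ingredients do I need "
            "for the upcoming weekend given the local events?",
            "What are the ideal hours of operations for the upcoming "
            "high school football season?",
        ],
        "Local events": [
            "Which local events do I want to sponsor and at what cost?",
            "What promotions do I want to offer in support of the local events?",
        ],
        "Local competitors": [
            "What are my competitors' most effective promotions?",
            "To which local competitors' promotions do I need to respond?",
        ],
    },
    use_cases=[
        "Increase store traffic",
        "Increase shopping bag revenue and margins",
        "Increase number of corporate events",
        "Improve promotional effectiveness",
        "Improve new product introduction effectiveness",
    ],
)

print(chipotle.summary())
```

Keeping decisions keyed by business entity preserves the order in which the document is built: initiative first, then entities, then decisions, and only then the use cases that the decisions cluster into.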

In addition, these use cases are key to guiding the data science team in its data acquisition, data cleansing, data enrichment, metric discovery, score creation, and analytic model development processes. For example, the Chipotle “increase same store sales” use cases may translate into the analytics shown in Table 3.1:

Table 3.1 Mapping Chipotle Use Cases to Analytic Models

Chipotle Use Cases | Potential Analytic Models

Increase Store Traffic | Store Marketing Effectiveness; Store Layout Flow Analysis; Store Remodeling Lift Analysis; Store Customer Targeting

Increase Shopping Bag Revenue and Margin | In-store Merchandising Effectiveness; Pricing Optimization; Up-sell/Cross-sell Effectiveness; Market Basket Analysis

Increase Number of Corporate Events | Campaign Effectiveness; Pipeline and Sales Effectiveness; Pricing Optimization; Customer Lifetime Value Score; Likelihood to Recommend Score

Improve Promotional Effectiveness | Promotional Effectiveness; Pricing Optimization; Market Basket Analysis; Up-sell/Cross-sell Effectiveness

Improve New Product Introductions | Pricing Optimization; New Product Introductions Effectiveness; Up-sell/Cross-sell Effectiveness

NOTE

You will discover that many of the analytic models developed for one use case will support other use cases. As organizations build out their analytic assets, you will find opportunities to leverage these analytic assets (data, data enrichment techniques, analytic models) to accelerate addressing additional big data use cases.
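The reuse of analytic models across use cases is easy to see by inverting Table 3.1. The short Python sketch below does exactly that; the `USE_CASE_MODELS` dictionary transcribes the table, while the function name `shared_models` is just an illustrative choice.

```python
from collections import defaultdict
from typing import Dict, List

# Transcription of Table 3.1: use case -> potential analytic models.
USE_CASE_MODELS: Dict[str, List[str]] = {
    "Increase Store Traffic": [
        "Store Marketing Effectiveness",
        "Store Layout Flow Analysis",
        "Store Remodeling Lift Analysis",
        "Store Customer Targeting",
    ],
    "Increase Shopping Bag Revenue and Margin": [
        "In-store Merchandising Effectiveness",
        "Pricing Optimization",
        "Up-sell/Cross-sell Effectiveness",
        "Market Basket Analysis",
    ],
    "Increase Number of Corporate Events": [
        "Campaign Effectiveness",
        "Pipeline and Sales Effectiveness",
        "Pricing Optimization",
        "Customer Lifetime Value Score",
        "Likelihood to Recommend Score",
    ],
    "Improve Promotional Effectiveness": [
        "Promotional Effectiveness",
        "Pricing Optimization",
        "Market Basket Analysis",
        "Up-sell/Cross-sell Effectiveness",
    ],
    "Improve New Product Introductions": [
        "Pricing Optimization",
        "New Product Introductions Effectiveness",
        "Up-sell/Cross-sell Effectiveness",
    ],
}


def shared_models(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return models that support more than one use case, most reused first."""
    use_cases_for_model = defaultdict(list)
    for use_case, models in mapping.items():
        for model in models:
            use_cases_for_model[model].append(use_case)
    shared = {m: u for m, u in use_cases_for_model.items() if len(u) > 1}
    # Sort by reuse count (descending), then by model name for stable output.
    return dict(sorted(shared.items(), key=lambda item: (-len(item[1]), item[0])))


for model, use_cases in shared_models(USE_CASE_MODELS).items():
    print(f"{model}: supports {len(use_cases)} use cases")
```

A listing like this is one way to decide which analytic models to build first: a model that serves several use cases pays back its development effort more than once.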

Figure 3.6 shows what the final Chipotle big data strategy document looks like at this point in the exercise.

Figure 3.6 Completed Chipotle big data strategy document

Identify and Prioritize Data Sources

With the use cases and financial drivers identified, we are now ready to move into the data and metrics envisioning process. We want to brainstorm data sources (regardless of whether or not you currently have access to these data sources) that might yield new insights that support the targeted business initiative. We want to unleash the business and IT teams' creative thinking to brainstorm data sources that might yield new customer, product, store, campaign, and operational insights that could improve the effectiveness of the different use cases.

For example, Chipotle data sources that were identified as part of the envisioning exercises could include:

Point of Sales Transactions

Market Baskets

Product Master

Store Demographics

Competitive Stores Sales

Store Manager Notes

Employee Demographics

Store Manager Demographics

Consumer Comments

Weather

Traffic Patterns

Yelp

Zillow/Realtor.com

Twitter/Facebook/Instagram

Twellow/Twellowhood

Zip Code Demographics

EventBrite

MaxPreps

Mobile App

But not all data sources are of equal business value or have equal implementation feasibility. The data sources need to be evaluated in light of:

The business value that each data source could provide in support of each individual use case

The feasibility (or ease) of acquiring, cleaning, aligning, normalizing, enriching, and analyzing each data source

So we want to add two processes (worksheets) to the big data strategy document process that will evaluate the business value and implementation feasibility of each of the potential data sources.

Building on our Chipotle case study, let's first assess the potential business value of the different data sources vis-à-vis the identified use cases (see Figure 3.7).

Figure 3.7 Business value of potential Chipotle data sources

You'd want to go through a group brainstorming process with the business stakeholders to assess the relative value of each data source with respect to each use case. The business users own the business value determination because they are best positioned to understand and quantify the business value that each data source could provide to the use cases.

NOTE

I like using Harvey Balls (http://en.wikipedia.org/wiki/Harvey_Balls) in both the data value and the feasibility assessment charts. The Harvey Balls quickly and easily communicate the relative value of each data source with respect to each use case.

Reviewing the data value assessment chart in Figure 3.7, you can quickly uncover some key observations, such as the following:

Detailed point-of-sale data is important to all of the use cases.

Insights from the Store Demographics data are important to four of the five use cases.

Mining Consumer Comments has a surprisingly strong impact across four of the five use cases.

Local events data is important to the “increase store traffic” and “improve promotional effectiveness” use cases but has little impact on the “increase shopping bag revenue,” “increase number of corporate events,” or “improve new product introduction effectiveness” use cases.
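The value assessment worksheet can be sketched as a small scoring table. The following is a minimal sketch, assuming each Harvey Ball is encoded as a score from 0 (empty) to 4 (full). The scores, the use case short names, and the function names are hypothetical placeholders for illustration; they are not the values shown in Figure 3.7.

```python
# Hypothetical Harvey Ball scores: 0 = empty ball, 4 = full ball.
# These numbers are illustrative placeholders, NOT the values in Figure 3.7.
USE_CASES = ["traffic", "basket", "corporate_events", "promotions", "new_products"]

value_scores = {
    "Point of Sales Transactions": [4, 4, 4, 4, 4],
    "Consumer Comments": [3, 3, 1, 3, 3],
    "Local Events": [4, 1, 1, 4, 1],
}


def important_use_cases(scores, threshold=3):
    """Return the use cases where a data source scores at or above threshold."""
    return [uc for uc, score in zip(USE_CASES, scores) if score >= threshold]


def rank_sources(scores_by_source):
    """Rank data sources by total score across all use cases, highest first."""
    return sorted(
        scores_by_source,
        key=lambda name: sum(scores_by_source[name]),
        reverse=True,
    )


if __name__ == "__main__":
    for source in rank_sources(value_scores):
        print(source, important_use_cases(value_scores[source]))
```

Summing the scores is only one possible way to rank the sources; the business stakeholders may prefer to weight some use cases more heavily than others.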

Next, you want to understand the implementation feasibility for each of the potential data sources. This part of the exercise is primarily driven by the IT organization since it is best positioned to understand the implementation challenges and risks associated with each of the data sources, such as ease of data acquisition, cleanliness of the data, data accuracy, data granularity, cost of acquiring the data, organizational skill sets, tool proficiencies, and other risk factors. The implementation feasibility assessment chart for Chipotle's “increase same store sales” business initiative looks like Figure 3.8.

Figure 3.8 Implementation feasibility of potential Chipotle data sources

From the Chipotle implementation feasibility assessment chart in Figure 3.8, we can quickly make the following observations:

Point of Sales, Market Baskets, and Store Manager Demographics data is readily available and easy to integrate (likely due to the master data management and data governance efforts necessary to load this data into a data warehouse).

Consumer Comments data, which was very valuable in the business value assessment, has several implementation risks. Lack of organizational experience in dealing with unstructured data is likely the source of these risks, which manifest themselves in the areas of data acquisition, standardization, integration, cleanliness, accuracy, granularity, skill sets, and tool proficiencies.

Social Media data, which was rated about mid-value in the value assessment exercise, also looks to be a real challenge. Many of the same cleanliness, accuracy, and granularity issues exist, with the added issue that this is data that will need to be “acquired” through some means. It is probably not the first data source you want to deal with in this use case.

Local Events data, which was very important in the “increase store traffic” use case, also poses many challenges. Pulling data from sources such as EventBrite and MaxPreps may require screen scraping in order to get the level of detail needed about a particular local event. While these sites may provide APIs to ease the data acquisition process, many times the APIs don't provide the complete detail that the data science team may want. And screen scraping, while a very useful data scientist tool, poses all sorts of challenges in cleaning the data after scraping.

Introducing the Prioritization Matrix

The final step in the big data strategy document process is to take the business and IT stakeholders through a use case prioritization process. While we will cover the prioritization matrix in detail in Chapter 13, I want to introduce the concept here as the natural point of concluding the big data strategy document process.

As part of the big data strategy document, we have now done the work to identify the use cases that support the organization's key business initiative, brainstormed additional data sources, and determined the applicability of those data sources from a business value and implementation feasibility assessment. We are now ready to prioritize the use cases based on their relative business value and implementation feasibility over the next 9 to 12 months (see Figure 3.9).

Figure 3.9 Chipotle prioritization of use cases
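The idea behind the prioritization matrix can be sketched in a few lines of code. This is a minimal sketch, assuming each use case has been given a business value score and an implementation feasibility score on a 0-to-10 scale by the stakeholders. The class name, the scale, the quadrant labels, and the sort rule are assumptions made for illustration, not the method detailed in Chapter 13.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: float  # 0-10, assumed to be agreed by business stakeholders
    feasibility: float     # 0-10, assumed to be agreed by IT, for the next 9 to 12 months


def quadrant(use_case, cutoff=5.0):
    """Place a use case in one of four quadrants of a value/feasibility grid."""
    high_value = use_case.business_value >= cutoff
    high_feasibility = use_case.feasibility >= cutoff
    if high_value and high_feasibility:
        return "start here"
    if high_value:
        return "valuable but hard"
    if high_feasibility:
        return "easy but low value"
    return "defer"


def prioritize(use_cases):
    """Order use cases by combined score, highest first (an assumed rule)."""
    return sorted(
        use_cases,
        key=lambda uc: uc.business_value + uc.feasibility,
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical scores for illustration only; not taken from Figure 3.9.
    candidates = [
        UseCase("Increase store traffic", 8, 7),
        UseCase("Improve promotional effectiveness", 9, 3),
        UseCase("Improve new product introductions", 3, 8),
    ]
    for uc in prioritize(candidates):
        print(uc.name, "->", quadrant(uc))
```

In practice the placement of each use case comes out of a facilitated discussion between business and IT; the code only records and orders the outcome.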

WARNING

It is critical to remember to use the next 9- to 12-month time frame as the basis for the prioritization process. The 9- to 12-month time frame ensures that the big data project is delivering immediate-term business value and business relevance with a sense of urgency, and it keeps the big data project from wandering into a “boil the ocean” type of project that is doomed to failure.

Using the Big Data Strategy Document to Win the World Series

To test our competency with the big data strategy document, let's examine a fun case study. Let's say that you are the general manager of a professional baseball team. The corporate mission of the organization is to “Win the World Series.” (On a personal note, I am convinced that there are teams where the goal is not to “Win the World Series” but instead to just make profits without regard to the quality of play on the field, but that's the cynical Chicago Cubs fan in me coming out.)

As in any commercial business, there are multiple business strategies that a professional baseball organization could pursue in order to achieve the “Win the World Series” mission, including:

Spend huge amounts of money for veteran, proven, top-performing players (New York Yankees, Boston Red Sox, Los Angeles Dodgers);

Spend huge amounts of money for over-the-hill, inconsistently performing players (Chicago Cubs are moving away from this strategy, though the New York Mets seem to be trying to perfect this approach);

Spend top money to have outstanding starting and relief pitching, and scrounge together enough timely hitting to win games (San Francisco Giants);

Spend top money to have outstanding hitting and hope that you can piece together enough pitching to win games (Texas Rangers, Los Angeles Angels);

Spend miserly amounts of money and rely on your minor league system and sabermetrics to draft and develop high-quality, low-paid rookie players (Oakland A's, Minnesota Twins, Kansas City Royals, Tampa Bay Rays).

So using the big data strategy document, let's play General Manager of the San Francisco Giants to see what the Giants would need to do to achieve their goal to “Win the World Series.”

The first step is to clearly articulate our business strategy. In the case of the San Francisco Giants, I'd say that the business strategy that would support its “Win the World Series” corporate mission would be “Acquire and retain high-performing, sustainable, starting pitching coupled with small ball hitting to compete annually for the World Series.”

Let's remember that a business strategy is typically two to three years or more on the horizon. If you change your business strategy annually, then that's not a strategy (sounds more like a fad). But companies do and should change their business strategies based on changing economic conditions, market forces, customer demographic trends, technology changes, and even new insights from big data analytics (which might reveal that strong pitching tends, from a statistical perspective, to beat strong hitting in the postseason). This is exactly what the San Francisco Giants seem to have done as the team has moved away from a “long ball” baseball strategy in trying to reach the World Series (by surrounding Barry Bonds with other strong batters) to its current “superior starting pitching” business strategy.

So let's use the big data strategy document to see what we (the San Francisco Giants) need to do to execute against the “superior starting pitching to win the World Series” business strategy.

First, we want to decompose the business strategy into the supporting business initiatives. Remember, business initiatives are cross-functional plans, typically 9 to 12 months in length, with clearly defined financial or business metrics. For our baseball exercise, I'm only going to list two business initiatives (though I can think of two or three more that also need to be addressed in the case of the San Francisco Giants):

Acquire and maintain top-tier starting pitching

Perfect small ball offensive strategy

Next, I want to identify the key business entities, or strategic nouns, around which I need to capture analytic insights to support the targeted business initiatives. For our case study, this could include:

Pitchers. Develop detailed knowledge and predictive insights into individual starting pitchers' in-game and situational pitching tendencies and performance as measured by quality starts (pitching at least six innings in a start while allowing three or fewer earned runs), Earned Run Average (ERA), Walks and Hits per Inning Pitched (WHIP), strikeout-to-walk ratio, and number of home runs per nine innings, by competitor, by specific batter, by ballpark, by weather conditions, by days of rest, by number of games into the season, etc.

Batters. Develop detailed knowledge and predictive insights into batter tendencies and behaviors as measured by On Base Percentage (OBP), batting average with runners in scoring position, stealing percentage, hit-and-run execution, and sacrifice hitting effectiveness by count, by number of outs, by competitive pitcher, by who's on base, by day versus night, etc.
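Several of the pitcher metrics named above are simple ratios, so they are easy to compute once the raw counts are captured. The following is a minimal sketch using the standard definitions of ERA, WHIP, strikeout-to-walk ratio, and home runs per nine innings. The class and field names are hypothetical, and innings are stored as outs recorded (an assumption made here to avoid the ambiguous "6.1 innings" notation).

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    outs_recorded: int  # innings pitched x 3 (hypothetical storage choice)
    earned_runs: int
    hits: int
    walks: int
    strikeouts: int
    home_runs: int

    @property
    def innings(self) -> float:
        return self.outs_recorded / 3

    def era(self) -> float:
        """Earned Run Average: earned runs allowed per nine innings."""
        return 9 * self.earned_runs / self.innings

    def whip(self) -> float:
        """Walks and Hits per Inning Pitched."""
        return (self.walks + self.hits) / self.innings

    def strikeout_to_walk(self) -> float:
        return self.strikeouts / self.walks

    def home_runs_per_nine(self) -> float:
        return 9 * self.home_runs / self.innings


if __name__ == "__main__":
    # Made-up season totals for illustration only.
    season = PitchingLine(outs_recorded=600, earned_runs=60, hits=150,
                          walks=50, strikeouts=200, home_runs=20)
    print(round(season.era(), 2), round(season.whip(), 2))
```

To get the situational breakdowns described above (by ballpark, by days of rest, and so on), the same calculations would be run on the counts filtered to each situation.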

We could also develop profiles for Coaches, Competitors, and maybe even Stadiums, but for reasons of time and simplicity, we'll just stick to the Pitchers and Batters business entities for this exercise.

Next, let's brainstorm the decisions and questions that we need to address about our key business entities to support our targeted business initiatives:

Acquire and Maintain Top-Tier Starting Pitching

Who are my most effective starting pitchers?

Which starting pitchers do I re-sign and for how much money and length of contract?

Which free agent starting pitchers do I want to sign and for how much money and length of contract?

Which starting pitchers do I want to trade, and what is my expectation of the value that they will bring in the market?

Which competitors' starting pitchers do I want to try to acquire via trades and for how much?

Which of my minor league pitchers are projected to be my most effective big league starting pitchers over the next two to three years?

What is my starting pitching rotation?

Which starting pitchers are currently struggling, and what are the likely reasons for this struggling?

Which starting pitchers should I rest by having them miss a start?

Perfect Small Ball Offensive Strategy

Which batters are most effective in getting on base?

Which batters are most effective in advancing runners?

Which batters are most effective in driving in runners from third base?

Which players are my best base stealers?

Who are my most timely hitters in late-in-the-game pressure situations?

Which minor league batters are most effective in getting on base?

Which minor league batters are most effective in advancing runners?

Which minor league batters are most effective in driving in runners from third base?

Which minor league players excel at base stealing?

Which free agent batters are most effective in getting on base?

Which free agent batters are most effective in advancing runners?

Which free agent batters are most effective in driving in runners from third base?

Which free agent players excel at base stealing?

If you are a baseball junkie like I am, take a moment to list some additional decisions and questions that you'd like to address for the two targeted business initiatives: top-tier starting pitching and small ball offense.

Now we can group the decisions (and questions in this exercise) into common use cases that could include:

Improve starting pitching proficiency by optimizing trades, free agent signing, minor league promotions, and contract extensions (cost versus starting pitching performance effectiveness)

Preserve starting pitching effectiveness throughout the regular season and playoffs by optimizing pitch counts, pitcher rotations, pitcher rests, etc.

Improve batting and slugging proficiency by optimizing trades, free agent signings, minor league promotions, and contract extensions

Increase in-game “small ball” runs scored effectiveness through the optimal combination of batters, hitting, stealing, base running, and sacrifice hitting strategies

Accelerate minor league player development through player strength and conditioning training, game situations, and minor league assignments

Optimize in-game pitch selection decisions through improved understanding of batter and pitcher matchups

Figure 3.10 shows the resulting big data strategy document.

Figure 3.10 San Francisco Giants big data strategy document

Next, we would brainstorm the potential data sources to support the use cases, including:

Personnel Player Health. This should include personal health history (weight, health, BMI, injuries, therapy, medications), physical performance metrics (60-foot dash time, long toss distances, fastball velocity), and workout history (bench press, dead lift, crunches and pushups in 60 seconds, frequency and recency of workouts).

Starting Pitcher Performance. This should include a detailed pitching history including number of pitches thrown, strike-to-ball ratio, strikeouts-to-walk ratio, walks and hits per inning pitched, ERA, first pitch strikes, batting average against, and slugging percentage per time of year, per opponent, and per game.

Batter Performance. This should include a detailed batting history including batting average, walks, on-base percentage, slugging percentage, strikeouts, hitting into double plays, bunting success percentage, on-base slugging percentage, wins above replacement, and hitting with runners on base per time of year, per opponent, and per game.

Competitors' Pitching Performance. This should include recent history of competitors' pitchers' performance including number of pitches thrown, strike-to-ball ratio, strikeouts-to-walk ratio, walks and hits per inning pitched, ERA, first pitch strikes, batting average against, and slugging percentage against per time of year, per opponent, and per game.

Competitors' Hitting Performance. This should include recent history of competitors' batters' hitting performance including batting average, walks, on-base percentage, slugging percentage, strikeouts, hitting into double plays, bunting success percentage, on-base slugging percentage, wins above replacement, and hitting with runners on base per time of year, per opponent, and per game.

Stadium Information. This should include length down the lines, length to deep center, average temperatures by day of year, average humidity by day of year (very important for knuckleballers), elevation, etc.

There are other data sources that could also be considered, such as weather conditions at game time, performance numbers of the game's top historical pitchers (for benchmarking purposes), performance numbers for the game's top historical batters (again, for benchmarking purposes), and economic costs (salary, bonuses, etc.).

In a real big data strategy document exercise, we would continue to evaluate each of the different data sources from a business value and implementation feasibility perspective vis-à-vis each of the identified use cases. Then we would go through a prioritization matrix process to ensure that both the business users (coaches, front office management) and IT agree on which use cases to start with.

So playing the San Francisco Giants' General Manager was a fun exercise that provided another perspective on how to use the big data strategy document not only to break down your organization's business strategy and key business initiatives into the key business entities and key decisions but to ultimately uncover the supporting data and analytic requirements.

Summary

This chapter focused on the big data strategy document and key related topics. In this chapter, we:

Introduced the concept of a business initiative and provided some examples of where to find these business initiatives

Introduced the big data strategy document as a framework for helping organizations to identify the use cases that guide where and how they can start their big data journeys

Provided a hands-on example of the big data strategy document in action using Chipotle, a chain of organic Mexican food restaurants

Introduced worksheets to help organizations to determine the business value and implementation feasibility of the data sources that come out of the big data strategy document process

Introduced the prioritization matrix as a tool to help drive business and IT alignment around the top priority use cases over a 9- to 12-month window

Had some fun by applying the big data strategy document to the world of professional baseball

This chapter outlined the big data strategy document as a framework to help an organization identify where and how to start its big data journey in support of the organization's 9- to 12-month key business initiatives. The big data strategy document is a tool to ensure that your big data journey is valuable and relevant from a business perspective.

To swing back around to the Chipotle case study, Figure 3.11 shows some initial results of the company's success with its “increase same store revenues” business initiative. (For more information, see the article at www.trefis.com/stock/cmg/articles/210221/chipotles-sales-surge-on-traffic-gains-high-food-costs-dent-margins/2013-10-21.)

Figure 3.11 Chipotle's same store sales results

It's nice to see that our Chipotle use case actually has a real business story behind it. But then again, every big data initiative should have a real business story behind it. Remember, organizations don't need a big data strategy as much as they need a business strategy that incorporates big data.

Homework Assignment

Use the following exercises to apply the big data strategy document to your organization (or one of your favorite organizations).

Exercise #1: Start by identifying your organization's key business initiatives over the next 9 to 12 months.

Exercise #2: Select one of your business initiatives, and then brainstorm the key business entities or strategic nouns that impact that selected business initiative. As a reminder, it is at the level of the individual business entity that we want to capture behaviors, tendencies, patterns, trends, preferences, etc.

Exercise #3: Next, brainstorm the key decisions that need to be made about each key business entity with respect to the targeted business initiative.

Exercise #4: Next, group the decisions into common use cases; that is, cluster those decisions that seem similar in their business or financial objectives.

Exercise #5: Then brainstorm the different data sources that you might need to support those use cases:

Identify potential internal structured (transactional data sources, operational data sources) and unstructured (consumer comments, notes, work orders, purchase requests) data sources

Identify potential external data sources (social media, blogs, publicly available, data.gov, websites, mobile apps) that you also might want to consider

Exercise #6: Use the data assessment worksheets to determine the relative business value and implementation feasibility of each of the identified data sources with respect to the different use cases.

Exercise #7: Finally, use the prioritization matrix to rank each of the use cases vis-à-vis business value and implementation feasibility over the next 9 to 12 months.

Notes

1. https://thewaltdisneycompany.com/investors

Chapter 4: The Importance of the User Experience

The user experience is one of the secrets to big data success, and one of my favorite topics. If organizations cannot deliver insights to their employees, managers, partners, and customers in a way that is actionable, then why even bother? One of the keys to success in the Big Data MBA is to “begin with an end in mind” with respect to understanding how the analytic results are going to be delivered to frontline employees, business managers, channel partners, and customers in a way that is actionable. The Big Data MBA seeks to “close the analytics loop” with respect to delivering insights to the key business stakeholders via an actionable user experience (UEX).

Chapter 4 Objectives

Review an example of an “unintelligent” user experience.

Highlight the importance of “thinking differently” with respect to creating an actionable dashboard versus building a traditional Business Intelligence dashboard.

Review a sample actionable dashboard targeting frontline store managers.

Review another sample actionable dashboard (financial advisor dashboard) targeting business-to-business channel partners.

This chapter will challenge the traditional Business Intelligence approaches to building dashboards by seeking to leverage analytic insights (e.g., recommendations, scores, rules) to create actionable dashboards that empower frontline employees, guide channel partners, and influence customer behaviors.

The Unintelligent User Experience

One of my favorite subjects against which I love to rail is the “unintelligent” user experience. This is a problem caused by, in my humble opinion, the lack of effort by organizations to understand their key business stakeholders well enough to be able to deliver actionable insights in support of the organizations' key business initiatives. And this user experience problem is often only exacerbated by big data.

Here is a real-world example of how NOT to leverage actionable analytics in your organization's engagements with your customers. The names have been changed to protect the guilty.

My daughter Amelia got an e-mail (see Figure 4.1) from our cell phone provider warning her that she was about to exceed her monthly data usage limit of 2 GB. She was very upset that she was about to go over her limit, and it would start costing her (actually, me) an additional $10.00 per GB over the limit. (Note: The “Monday, August 13, 2012” date in the figure will play an important role in this story.)

Figure 4.1 Original subscriber e-mail

I asked Amelia what information she thought she needed in order to make a decision about altering her Facebook, Pandora, Vine, Snapchat, and Instagram usage (since those are the main data hog culprits in her case) so that she would not exceed her data plan limits. She thought for a while and then said that she thought she needed the following information:

How much of her data plan does she have left in the current month?

When does her new month or billing period start?

At her current usage rate, when will she run over for this month?

Capture the Key Decisions

This use case provides a good example of the process that organizations can employ to identify the key business decisions that the organization's key business stakeholders need to address in order to support the organization's key business initiatives. Here is an abbreviated process (similar to the one we just covered with the big data strategy document in Chapter 3):

Step 1: Understand your organization's key business initiatives or business challenge (in this example the initiative is “Don't exceed your monthly data usage plan”).

Step 2: Identify your key business stakeholders (Amelia and me in this example).

Step 3: Capture the decisions that the key business stakeholders need to make in order to support the organization's key business initiatives (e.g., alter Facebook, Pandora, Vine, Snapchat, and Instagram usage).

Step 4: Brainstorm the questions that the key stakeholders need to answer to facilitate making the decisions (How much of my data plan do I have left? When does my new month start? When will I run over for my current period given my current usage?).

Support the User Decisions

Understanding the relationship between your business initiative and the supporting decisions and questions that need to be addressed is key to creating a user experience that provides the right information (or actionable insights) to the right user to make the right decisions at the right time.

So to continue the cellular provider story, I went online to research Amelia's key questions:

Question: How much of my data plan do I have left?
Answer: Current usage as of August 13 is 65 percent

Question: When does my new month start?
Answer: On August 14, which is 1 day from today

Question: When am I likely to run over my data plan limit?
Answer: The probability of you overrunning your data plan is 0.00001 percent… or NEVER!!

So given the results of my analysis, Amelia had nothing to worry about: to exceed her limit, she would have to consume the remaining 35 percent of her plan in her final 24 hours, which is more than half of what she had consumed in the previous 30 days. The probability of that happening: near zero (or about the same probability of me beating Usain Bolt in the 100-meter dash). The bottom line is that the e-mail should have never been sent. There was nothing for Amelia to worry about, and it only caused unnecessary angst. This is not the sort of user experience that organizations should be targeting.

Our cellular provider could have provided a user experience that highlighted the information and insights necessary to help Amelia make a decision about data usage. The user experience could have looked something like the e-mail message shown in Figure 4.2.

Figure 4.2 Improved subscriber e-mail

This sample e-mail has all the information that Amelia needs to make a decision about usage behaviors including:

Actual usage to date (65 percent)

A forecast of usage by the end of the period (67 percent)

The date when the data plan will reset (in 1 day on August 14)

With this information, Amelia is now in a position to make the “right” decision about her data plan usage.
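The end-of-period forecast in the improved e-mail can be approximated with very simple arithmetic. This is a minimal sketch, assuming a straight-line projection of usage over a 31-day billing period with 30 days elapsed; the provider's actual forecasting method is not described in the text, and the function names are hypothetical.

```python
def project_usage(used_pct, days_elapsed, days_in_period):
    """Project end-of-period usage (percent of plan) at the current daily rate."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return used_pct * days_in_period / days_elapsed


def should_warn(used_pct, days_elapsed, days_in_period, limit_pct=100.0):
    """Send a warning only if the projection actually exceeds the plan limit."""
    return project_usage(used_pct, days_elapsed, days_in_period) > limit_pct


if __name__ == "__main__":
    # Assumed inputs: 65 percent used after 30 days of a 31-day period.
    forecast = project_usage(65, 30, 31)
    print(f"Forecast: {forecast:.0f} percent of plan")
    print("Send warning e-mail?", should_warn(65, 30, 31))
```

Under these assumptions the projection comes out at roughly 67 percent, which is why a warning rule based on the forecast, rather than on usage to date alone, would not have sent the original e-mail.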

Consumer Case Study: Improve Customer Engagement

But let's take this case study one step further. Let's say that there actually was going to be a problem with Amelia's usage and her data plan. What if 82 percent of data usage had been consumed with 50 percent of the usage period remaining? How do we make the user experience and the customer engagement useful, relevant, and actionable?

The mock-up shown in Figure 4.3 offers one potential approach based on the same principles discussed earlier: provide enough information to help Amelia change her usage behaviors. However, FutureTelco could also take the user experience and customer engagement one step further and offer her some recommendations to avoid the data plan overage.

For example, FutureTelco could offer prescriptive advice about how to reduce data consumption such as:

Transitioning to apps that are more data usage efficient (e.g., transitioning from Pandora to Rdio or iHeartRadio for streaming radio, assuming that Rdio and iHeartRadio are more efficient in their usage of the data bandwidth)

Turning off apps in the background that are unnecessarily consuming data such as mapping apps (like Apple Maps or Waze) or apps that are using GPS tracking

FutureTelco could even offer Amelia options to avoid paying an overage penalty (see Figure 4.3) such as:

Purchase a 1-month data usage upgrade for $2.00 (which is cheaper than the $10 overage penalty)

Upgrade existing contract (covering 6 months) for $10.00

Figure 4.3 Actionable subscriber e-mail

But wait, there is even more that FutureTelco could do to improve the customer experience. FutureTelco could analyze Amelia's app usage tendencies and recommend new apps based on other apps that users like Amelia use, similar to what Amazon and Netflix do (see Figure 4.4).

Figure 4.4 App recommendations
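One simple way to recommend apps "that users like Amelia use" is to count which apps show up most often among subscribers whose installed apps overlap with hers. The following is a minimal sketch of that co-occurrence idea; it is an illustration only, not a description of how Amazon, Netflix, or any carrier actually builds recommendations. The function name and the sample users are hypothetical.

```python
from collections import Counter


def recommend_apps(target_apps, other_users_apps, top_n=3):
    """Suggest apps the target lacks, weighted by overlap with each other user."""
    target = set(target_apps)
    scores = Counter()
    for apps in other_users_apps:
        apps = set(apps)
        overlap = len(target & apps)
        if overlap == 0:
            continue  # this user shares nothing with the target, so skip them
        for app in apps - target:
            scores[app] += overlap
    # Sort by score (highest first), then by name so ties are deterministic.
    ranked = sorted(scores, key=lambda app: (-scores[app], app))
    return ranked[:top_n]


if __name__ == "__main__":
    # Made-up usage data for illustration only.
    amelia = ["Pandora", "Instagram"]
    others = [
        ["Pandora", "Instagram", "Rdio"],
        ["Pandora", "Rdio", "Waze"],
        ["Vine", "Waze"],
    ]
    print(recommend_apps(amelia, others))
```

A production recommender would also weigh how heavily each app is used and how profitable or sticky it is for the provider, which ties into the monetization ideas that follow.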

This level of customer intimacy can open up all sorts of new monetization opportunities such as:

Leverage your customer's usage patterns and behaviors to recommend apps that move the user into a more profitable, high-retention user category

Help app developers to be more successful while collecting referral fees, co-marketing fees, and other monetization ideas that align with the app developers' business objectives

Cellular providers are not alone in missing opportunities to leverage customer insights in order to provide a more relevant, more meaningful customer experience. Many organizations are sitting on gold mines of insights about their customers' buying and usage patterns, tendencies, propensities, and areas of interest, but little of that information is being packaged and delivered in a manner that improves the user experience. Big data often only exacerbates this problem. Organizations will either learn to leverage big data as an opportunity to improve their user experience, or they will get buried by the data and continue to provide irrelevant and even misleading customer experiences.

Business Case Study: Enable Frontline Employees

I had the opportunity to run a vision workshop for a grocery retailer. The goal of the session was to identify how the grocery chain could leverage big data and advanced analytics to deliver actionable insights (or recommendations) to store managers in order to help them improve store performance.

Big data can transform the business by enabling a completely new user experience (UEX) built around insight and recommendations versus just traditional Business Intelligence charts and tables. Retailers, like most organizations, can leverage detailed, historical transactional data, coupled with new sources of “right-time” data like local competitors' promotions (e.g., “best food days,” which is the day when grocery stores post their weekly promotions), weather, and events, to uncover new insight about their customers, products, merchandising, competitors, and operations. Big data provides organizations the ability to (1) rapidly ingest these new sources of customer, product, and operational data and then (2) leverage data science to yield real-time, actionable insights.

Let's walk through an example of integrating big data with a traditional BI dashboard to create a more actionable user experience that empowers frontline employees and managers.

Store Manager Dashboard

We start with a traditional Business Intelligence dashboard. This dashboard provides the key performance indicators (KPIs) and metrics against which the store manager measures the performance of the store. The dashboard can also present sales and margin trends and previous period comparisons for those KPIs. This is pretty standard Business Intelligence work (see Figure 4.5).

Figure 4.5 Traditional Business Intelligence dashboard

The challenge with these traditional BI dashboards is that unless you are an analyst, it's not clear what action the user is supposed to take. The arrows point up, sideways, and down, so the store manager can see the store's performance, but the dashboard doesn't provide any insights about what actions to take.

The other challenge is that the store manager (like most frontline employees and managers) likely does not have a BI or an analytics background (he likely worked his way up the ranks in the grocery store). As a result, the UEX and the actionable insights and recommendations are critical because the store manager does not know how to drill into the BI reports and dashboards to uncover insights based on the raw data.

We can build on this traditional BI dashboard by including more predictive and prescriptive analytics. In Figure 4.6, the top part of the new actionable dashboard (Sections A and B) leverages predictive analytics and prescriptive analytics to provide recommendations that can help the store manager make more profitable business decisions.

Figure 4.6 Actionable store manager dashboard

In Figure 4.6, Section A shows specific product, promotion, placement, and pricing recommendations based on the layout of a specific store. Section B provides specific recommendations concerning pricing, merchandising, inventory, staffing, promotions, etc. for the store manager.

Each recommendation in Section B is presented with Accept [+] or Reject [-] options. If the store manager accepts the recommendation by selecting [+], that recommendation is executed (e.g., raise prices, add promotion, add inventory, etc.). However, if the store manager rejects the recommendation, then the actionable dashboard captures the reason for the rejection so that the supporting analytic models can be constantly fine-tuned (see Figure 4.7).

Finally, the store manager can select the More option in Section B and modify the recommendation based on his own experience. Allowing the store manager to modify the recommendations based on his personal experiences allows the underlying analytic models to constantly learn what works and what doesn't work and build on the best practices and learnings from the organization's most effective and top-performing store managers.
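The accept, reject, and modify loop described above implies that the dashboard must record each decision in a form the analytic models can learn from. The following is a minimal sketch of such a feedback log. The class names, field names, and the rule that a rejection must carry a reason are assumptions made for illustration; the text does not specify how the dashboard stores this data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

VALID_ACTIONS = {"accept", "reject", "modify"}


@dataclass
class Decision:
    recommendation_id: str
    action: str                   # "accept", "reject", or "modify"
    reason: Optional[str] = None  # why the manager rejected or changed it


@dataclass
class FeedbackLog:
    decisions: List[Decision] = field(default_factory=list)

    def record(self, recommendation_id, action, reason=None):
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if action == "reject" and not reason:
            # Capturing the reason is what lets the models be fine-tuned.
            raise ValueError("a rejection must include a reason")
        self.decisions.append(Decision(recommendation_id, action, reason))

    def acceptance_rate(self):
        if not self.decisions:
            return 0.0
        accepted = sum(1 for d in self.decisions if d.action == "accept")
        return accepted / len(self.decisions)

    def rejection_reasons(self):
        """Count rejection reasons so the most common ones can be reviewed."""
        return Counter(d.reason for d in self.decisions if d.action == "reject")


if __name__ == "__main__":
    log = FeedbackLog()
    log.record("raise-price-soda", "accept")
    log.record("add-staff-saturday", "reject", reason="no budget")
    print(log.acceptance_rate(), log.rejection_reasons())
```

Aggregating these records across stores is one way to surface which recommendations the top-performing managers consistently accept or override.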

Figure 4.7 Store manager accept/reject recommendations

Sample Use Case: Competitive Analysis

One use case for the store manager dashboard enables the store manager to monitor local competitive activity and promotions. The grocery industry is very locally competitive. Competitors, for the most part, are within just a few miles or even blocks of each other. In this competitive analysis use case, the dashboard provides a map of the local grocery and beverage competitors (see Section C of Figure 4.7). Hovering over any particular competitor on the map immediately brings up its current marketing flyer. The store manager (or his business analyst) can browse through each of the competitors' flyers and make custom store recommendations around pricing, promotion, merchandising, inventory, and staffing based on the competitor's plans (see Figure 4.8).

Figure 4.8 Competitive analysis use case

Like the other recommendations, the store manager's custom recommendations will be monitored for effectiveness so that the analytic models can be constantly updated and refined.

Additional Use Cases

Additional use cases can easily be added to the store manager dashboard. We can add a use case for integrating the local events calendar into the dashboard with associated store manager pricing, product, promotions, staffing, merchandising, and inventory recommendations. The store manager can analyze the local events calendar to flag events that may have a positive or negative impact on store sales (see Figure 4.9). In this example, the local events calendar highlights two events: (1) Stanford college football game (which should increase the sale of beer, chips, burgers, and other tailgating materials) and (2) farmers market (which should decrease the sale of fresh produce and fruits and other organic items). The analytics supporting the dashboard could automatically analyze the results of previous local events and leverage predictive analytics to predict how those events might impact store traffic and the sales of specific product categories.
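As a rough illustration of "analyzing the results of previous local events," the dashboard could compute the average sales lift that each type of event produced for each product category and apply it to an upcoming event. This is a minimal sketch under that assumption; a real predictive model would control for weather, promotions, seasonality, and more. The function names and the sample history are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_lift(history):
    """Average ratio of actual to baseline sales per (event type, category).

    history: iterable of (event_type, category, baseline_sales, actual_sales).
    """
    ratios = defaultdict(list)
    for event_type, category, baseline, actual in history:
        ratios[(event_type, category)].append(actual / baseline)
    return {key: mean(values) for key, values in ratios.items()}


def forecast_sales(lifts, event_type, category, baseline):
    """Scale baseline sales by the historical lift; assume no change if unseen."""
    return baseline * lifts.get((event_type, category), 1.0)


if __name__ == "__main__":
    # Made-up past results for illustration only.
    past = [
        ("football_game", "beer", 1000, 1500),
        ("football_game", "beer", 1000, 1300),
        ("farmers_market", "produce", 2000, 1600),
    ]
    lifts = average_lift(past)
    print(forecast_sales(lifts, "football_game", "beer", 1200))
    print(forecast_sales(lifts, "farmers_market", "produce", 2000))
```

A lift above 1.0 suggests adding inventory and staff for that category on the event day, while a lift below 1.0 suggests trimming orders or running an offsetting promotion.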

Figure 4.9 Local events use case

Another use case is to integrate the local weather forecast into the store manager dashboard. The store manager can analyze the local weather forecasts and make adjustments for inventory, merchandising, and promotions based on whether the weather will be warmer or colder than expected (see Figure 4.10). The dashboard can automatically analyze similar weather conditions and predict the impact on store traffic and product category sales and deliver relevant recommendations to the store manager.

Figure 4.10 Local weather use case

The dashboard could even couple the competitive activities, local events, and weather data to predict what sort of impact the combination of these might have on store traffic and product category demand. These insights could yield new recommendations that drive the store manager's decisions about pricing, promotions, merchandising, staffing, and/or inventory.

B2B Case Study: Make the Channel More Effective

Over the course of my travels, I have met with several organizations that work through partners, brokers, agents, and advisors to get their products and services into the hands of the end consumer. These business-to-business (B2B) organizations face unique challenges:

They have to work extra hard and be very creative in gathering data about how end consumers are buying and using their products.

They need to find a way to mine the customer, product, and operational data to uncover insights and make recommendations that make their partners, brokers, and agents more effective.

This can be frustrating, especially in light of all the success stories from business-to-consumer (B2C) organizations such as retailers, mobile phone providers, credit card companies, travel, entertainment and hospitality companies, and other organizations that have direct engagement with the end consumer.

But not to fear, there are things that these B2B organizations can do to encourage their partners, brokers, agents, and advisors to share more of that valuable consumer data. There are also unique insights that these B2B organizations can provide to their partners and channels to make them more effective (and, hopefully, even more willing to share the end consumer's purchase and engagement data).

For purposes of this case study, I have created a fictitious financial services company called FSI. We'll assume that FSI sells its products and services via independent financial advisors. I hope that you can see the applicability of this use case to any industry that must work through partners, brokers, agents, and advisors to reach their end consumers.

The Advisors Are Your Partners: Make Them Successful

Many financial services advisors are small, specialized firms with 1 to 10 employees that provide financial advice to a small group of customers. Many lack the technical and analytic capabilities to analyze large amounts of data and develop predictive and prescriptive models based on their clients' financial goals, current financial situation, ongoing financial conversations, and deep history of financial transactions.

This provides a business opportunity for FSI to market customer, product, and market insights to these independent financial advisors. These insights could include:

Benchmarks: What range of returns should a client expect from a certain type of portfolio? How does the client's portfolio performance compare to that of similar clients with similar financial objectives? What's the typical financial situation and asset base for other clients in similar financial conditions?

What's the ideal portfolio mix given my client's age and specific financial goals?

Portfolio Mix: How does my client's percentage of financial and investment contributions compare to that of others like him? How does my client's portfolio and financial asset mix compare to that of others like him?

Best Practices: What are the best performing portfolios for someone with the same financial goals given his or her age and employment time frame? What are the best performing investment instruments for clients given the same retirement horizon as my client?

Industry Trends: What current financial instruments provide the best return-to-risk ratio? How are these financial instruments projected to perform over the next 1, 5, and 10 years? What are the most specific contribution and investment risks for which my client needs to plan?
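The benchmark questions above come down to positioning one client against a peer group. A minimal sketch of that comparison follows; the function name, the peer returns, and the choice of a simple percentile rank are all illustrative assumptions, not something FSI or the book prescribes.

```python
def percentile_rank(value, peer_values):
    """Share of peers (0-100) whose value is strictly below the client's value."""
    if not peer_values:
        raise ValueError("peer_values must not be empty")
    below = sum(1 for v in peer_values if v < value)
    return 100.0 * below / len(peer_values)


# Hypothetical annual returns (percent) for clients with similar goals.
peer_returns = [2.0, 4.0, 5.0, 7.0, 9.0]
client_return = 6.0

client_rank = percentile_rank(client_return, peer_returns)
print(f"Client outperforms {client_rank:.0f}% of similar clients")
```

An advisor-facing dashboard would likely segment the peer group first (by age, goals, and risk tolerance) before ranking, since the benchmark is only meaningful among genuinely similar clients.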

Financial Advisor Case Study

Okay, let's get to the fun stuff! Let's see how FSI could leverage the insights mentioned above to (1) make the independent financial advisors more effective in supporting their clients and (2) create a stronger, more profitable relationship between FSI and the financial advisors. Let's explore how FSI could deliver client-specific recommendations (prescriptive analytics) in a way that is actionable for both the advisor and the client. Let's also explore how FSI could create an actionable dashboard to drive further client engagement that could gather even more client financial data (financial goals, employment plans, spending patterns) that FSI could use to improve its predictive and prescriptive modeling.

The financial advisor dashboard should address the following functionality (see Figure 4.11):

Report on client's current financial status and recent financial performance

Assess client's financial status and progress against personal financial goals such as:

Buying a car

Buying a home

College education

Starting a business

Career or life change

Retirement

Deliver recommendations to the financial advisors to improve the client's financial performance such as:

Modify financial contributions

Adjust investment strategies (short-term, long-term)

Reallocate financial portfolio

Change investment vehicles (stocks, bonds, mutual funds, etc.)

Figure 4.11 Financial advisor dashboard

The goal of the financial advisor dashboard is to uncover insights about the client's investment performance and provide client-specific recommendations that help these clients reach their financial goals. To generate actionable, accurate recommendations, we're going to need to know as much about the client as possible, including:

Current and historical personal background information (e.g., marital status, spouse's financial and employment situation, number and age of children, outstanding mortgage on home(s) and any secondary real estate investments)

Current financial investments and other assets (e.g., stocks, bonds, mutual funds, IRAs, 401(k)s, REITs)

Current and historical income (and expenditures, if possible)

Financial goals with specific timelines

We need to ensure that the financial advisor dashboard provides enough value to both the financial advisor and the advisor's clients to give the clients an incentive to share as much of this data as possible.

Informational Sections of Financial Advisor Dashboard

Let's examine in more detail the key informational sections of the financial advisor dashboard. These sections form the foundation for much of the analytics that will be developed to support the client's financial goals.

Client Personal Information: The first part of the dashboard presents relevant client personal and financial information. FSI wants to gather as much personal information as is relevant when the client first opens his account. But after the client opens his account, there needs to be a concerted effort to keep the data updated and capture new lifestyle, life stage, employment, and family information. Much of that client data can be captured via discussions and interactions that the financial advisor is having with the client (e.g., informational calls, e-mail dialogues, office visits, annual reviews). While this information is gold to FSI, much of this data never gets past the financial advisors' personal contact management and e-mail systems. FSI must provide compelling reasons to persuade the financial advisors and clients to share more of this data with FSI (see Figure 4.12).

Some leading-edge organizations are providing incentives (e.g., discounts, promotions, contests, rewards) for clients to share their social media interactions. Obviously, access to the client's current situation and plans as posted on social media sites is gold when it can be mined to uncover activities that might affect his financial needs (e.g., vacations, buying a new car, upcoming wedding plans, promotions, job changes, children changing schools).

Client Financial Status: The next section of the dashboard provides an overview of the client's current financial status. Again, the more data that can be gathered about the client's financial situation (e.g., investments, home, spending, debt), the more accurate and prescriptive the analytic models will be (see Figure 4.13).

Figure 4.12 Client personal information

Figure 4.13 Client financial information

In this example, we have details on all the client's financial investments with FSI. However, the client might (and likely does) have financial investments with other firms courtesy of his employer's 401(k) programs, whole life insurance policies, and other stocks, bonds, and funds. And that doesn't even consider substantial investments in nonfinancial instruments like his primary residence, vacation home, antiques, and collectibles.

Giving clients an incentive to share their entire financial portfolio is complicated by how hard it is for a client to pull all that information together in one place. However, Mint.com has figured out how to aggregate financial spending from credit cards and bank checks. The inclusion of the client's expenditure data could be invaluable in building a client profile and developing specific, actionable financial recommendations.

Client Financial Goals: The final informational section of the dashboard contains the client's financial goals. There are likely only a small number of goals, and they probably don't change that often. However, it is difficult to develop meaningful client financial recommendations without up-to-date client financial goals. From a data collection perspective, this is probably the easiest data to capture, provided that you have adequately addressed the client's privacy and security concerns (see Figure 4.14).

Figure 4.14 Client financial goals

However, let's say that the client either won't share his financial goals or hasn't even thought through what his financial goals need to be. This is common when dealing with retirement planning, since many clients aren't clear or realistic about their retirement goals. In these situations, FSI could leverage the information that it has about “similar” clients to make retirement goal recommendations. If FSI has the client's current financial investments and current salary, FSI could make a pretty intelligent guess as to the client's retirement goals.

Recommendations Section of Financial Advisor Dashboard

Now let's get into the meat of the financial advisor dashboard. The client information sections of the dashboard were meant to provide an easy and efficient way to capture the client's key lifestyle, demographic, and financial data, as well as his financial goals. Now we can create predictive models to predict the likely results of different financial options and actions, and then create prescriptive models in order to deliver client-specific recommendations that help the client to reach his financial goals. This financial advisor dashboard covers four different areas for delivering client-specific financial recommendations:

Financial contributions

Spending analysis

Asset allocation

Other financial investments

Financial Contributions Recommendations

The first set of recommendations is focused on helping the client optimize financial contributions (see Figure 4.15). The types of client decisions that could be modeled include:

Monthly investments and periodic increases and adjustments

Life insurance coverage adjustments

One-time payments to jump-start lagging financial goals

Reallocate monthly or periodic payments against different financial goals

Change retirement, new car, and new home target dates

Figure 4.15 Financial contributions recommendations

We could employ data science to analyze the client's detailed financial data, compare that data with benchmarks across similar clients, and develop client-specific analytic profiles. The financial advisor dashboard could provide a “what if” capability that allows the financial advisor to work with the client to test out different scenarios (e.g., changes to investment amounts, changes to financial goal target dates).
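A “what if” capability of this kind can be approximated with a simple compounding projection. The sketch below is only an illustration: the fixed annual return, monthly compounding, and the function and variable names are assumptions made for the example, and a real advisor tool would model return uncertainty rather than a single rate.

```python
def future_value(current_balance, monthly_contribution, annual_return, years):
    """Project a balance, assuming a fixed annual return compounded monthly
    and a contribution deposited at the end of each month."""
    monthly_rate = annual_return / 12
    balance = current_balance
    for _ in range(years * 12):
        balance = balance * (1 + monthly_rate) + monthly_contribution
    return balance


# Hypothetical scenarios an advisor and client might compare side by side.
baseline = future_value(50_000, 500, 0.05, 20)
higher_contribution = future_value(50_000, 650, 0.05, 20)
later_target_date = future_value(50_000, 500, 0.05, 23)

for label, value in [
    ("Baseline", baseline),
    ("Contribute $150 more per month", higher_contribution),
    ("Push target date out 3 years", later_target_date),
]:
    print(f"{label}: ${value:,.0f}")
```

Comparing the scenario outputs against the goal amount shows the client which lever (contribution size or target date) closes the gap.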

Spending Analysis Recommendations

The second set of recommendations is focused on helping the client optimize spending habits. This is where access to the client's credit card and banking statements (maybe via Mint.com and/or his checking accounts) could yield valuable insights to help the client minimize cash outflow and increase financial investments (see Figure 4.16). The types of spending decisions that would need to be modeled include:

Consolidating expenditures of similar products and services

Flagging expenditures that are abnormally high given the client's family situation, home location, etc.

Integrating customer loyalty program information to find retailers who can provide best prices on food and household staples

Increasing insurance deductibles to lower premiums

Finding more cost-effective home, property, and auto insurance

Figure 4.16 Spend analysis and recommendations

There are lots of opportunities to leverage external data sources and best practices across the FSI client base to find better deals in an attempt to reduce the client's discretionary spending. There are several retail, insurance, travel, hospitality, entertainment, cell phone, and other websites from which data could be gathered. This data could be used to create recommendations to reduce the client's spending and optimize the client's monthly budget, with the savings being used to increase financial contributions against the client's financial goals.
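Flagging “abnormally high” expenditures requires some definition of normal. One simple option, sketched here, is to compare each spending category against clients in a similar situation and flag anything more than a chosen number of standard deviations above the peer average. The categories, amounts, and the threshold of 2.0 are illustrative assumptions.

```python
from statistics import mean, stdev


def flag_high_spending(client_spend, peer_spend, threshold=2.0):
    """Return categories where the client's monthly spend exceeds the peer
    mean by more than `threshold` sample standard deviations."""
    flagged = []
    for category, amount in client_spend.items():
        peers = peer_spend.get(category, [])
        if len(peers) < 2:
            continue  # not enough peer data to judge this category
        mu, sigma = mean(peers), stdev(peers)
        if sigma > 0 and (amount - mu) / sigma > threshold:
            flagged.append(category)
    return flagged


# Hypothetical monthly spend for peers with a similar family size and location.
peer_spend = {
    "groceries": [600, 650, 700, 620, 680],
    "auto_insurance": [120, 130, 125, 135, 128],
}
client_spend = {"groceries": 900, "auto_insurance": 130}

flagged_categories = flag_high_spending(client_spend, peer_spend)
print(flagged_categories)
```

The quality of the flag depends on how the peer group is chosen, which is why the text stresses family situation and home location.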

Asset Allocation Recommendations

The third set of recommendations is focused on helping clients optimize their asset allocation in light of their financial goals. By leveraging best practices across other clients, portfolios, and investment instruments, prescriptive analytics can be developed to make specific asset allocation recommendations that support asset allocation decisions such as (see Figure 4.17):

Which stocks and bonds to sell or buy against specific financial goal portfolios

Portfolio allocation decisions that properly balance the risk-return ratio of the client's portfolio in light of risk tolerance and financial goals

Other financial instruments that can accelerate the client's progress against financial goals or reduce risk for those short-term financial goals

Figure 4.17 Asset allocation recommendations

There are many opportunities to leverage analytic best practices across FSI's client base to make investment recommendations that can improve performance given a client's desired risk level. To further protect the client's investment assets, an aggregated view of the marketplace could yield more timely insights into stocks and bonds that are suddenly hot or cold. This is also an area where real-time analytics can be leveraged to ensure that no sudden market movements expose the client to unnecessary asset allocation risks. The dashboard could also support an interactive “what if” collaboration directly with the client to glean even more data and insights about the client's investment preferences and tolerance for risk.

Other Investment Recommendations

The fourth set of recommendations is focused on other assets that clients need to consider as part of their overall financial strategy. Real estate (the client's home and any vacation homes) is probably the most obvious. This is an area where recommendations about other investment options can be delivered to help support client decisions regarding (see Figure 4.18):

Identifying the ideal amount of insurance needed given home valuation changes

Home improvement projects that yield the best ROI for particular house types, budgets, and locations over time

Identifying the right time to buy or sell a home, and even making recommendations as to what price to bid for homes in select areas

Best areas to look for secondary and/or vacation home investments

Most cost-effective locations to live in after retirement

Figure 4.18 Other investment recommendations

There is a bevy of external data sources that can be leveraged to help facilitate analytics in this area. For example, Zillow and Realtor.com provide real estate valuations and monthly changes in real estate valuations that could be incorporated into the financial advisor dashboard. Cost of living metrics, which can be used to identify ideal retirement areas, can be found on many financial websites including data.gov.

Summary

Big data can power a more relevant and more actionable user experience. Instead of overwhelming business users with an endless array of charts, reports, and dashboards and forcing users to “slice and dice” their way to insights, we can instead leverage the wealth of available structured and unstructured data sources, in real-time, coupled with data science to uncover customer, product, and operational insights buried in the data. We can leverage those insights to create frontline employee, manager, and customer recommendations and then measure the effectiveness of those recommendations so that we are continuously refining our analytic models.

Big data can also have serious implications for B2B organizations that rely on brokers, agents, and advisors to reach their ultimate end consumer. While it may frustrate many B2B organizations that they lack that direct engagement with consumers, there are ways that B2B organizations can leverage new sources of data and analytics capabilities to not only improve the effectiveness of their brokers, agents, and advisors but also provide compelling reasons why the brokers, agents, advisors, and end consumers should directly share more data with the B2B organization to create a win-win-win for clients, advisors, and the B2B organization.

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Select one of your organization's outward-facing dashboards, websites, or mobile apps. If not something from your organization, then select a website or dashboard that you use regularly. That might include something from your bank, credit card provider, cellular provider, or utility company. Grab a few screen captures of the dashboard or website.

Exercise #2: Think through how you as the user use this dashboard, website, or mobile app to make decisions. Write down those decisions that you try to make from the website. For example, from your utility, you might want to make decisions about energy and water consumption, your waste/garbage plan, and maybe even which of the different appliance rebates you might want to consider.

Exercise #3: Next, add a recommendations panel that has suggestions for each of the decisions that you captured in Exercise #2. For our utility example, one recommendation might be “Only water 3 days a week from 6:00 a.m. to 7:00 a.m. to save approximately $12.50 per month on your monthly water bill.” Or another recommendation might be “Replace your existing dryer with a more efficient model like the Samsung DV457 to save $21.75 on your monthly energy bill.”

Exercise #4: Finally, identify potential external data sources that might provide some interesting perspectives that could be used to guide your key decisions. For our utility example, you might want to consider integrating local solar energy costs (to determine if solar energy is a feasible energy option) or weather forecasts (to see if you can reduce lawn watering).

Part II: Data Science

These three chapters introduce data science as a key business discipline that helps organizations “cross the analytics chasm” from the Business Monitoring to Business Insights and Business Optimization phases. These chapters will introduce the concept of data science and then broaden the discussion to cover what data science techniques to use in which business scenarios.

In This Part

Chapter 5: Differences Between Business Intelligence and Data Science

Chapter 6: Data Science 101

Chapter 7: The Data Lake

Chapter 5: Differences Between Business Intelligence and Data Science

I was hired by a large Internet portal company in 2007 to head up efforts to develop its advertiser analytics. The objective of the advertiser analytics project was to help the Internet portal company's advertisers and agencies optimize their advertising spend across the Internet portal's ad network. The internal code name for the project was “Looking Glass” because we wanted to take the advertisers and agencies through an “Alice in Wonderland” type of experience in how we delivered actionable insights to help our key business stakeholders (Media Planners & Buyers and Campaign Managers) successfully optimize their advertising spend on the Internet portal's ad network. But in many ways, it was me that went through the looking glass.

Several months later (August 2008), I had the opportunity to keynote at The Data Warehouse Institute (TDWI) conference in San Diego. I taught a class at TDWI on how to build analytic applications, so I was both familiar with and a big fan of the TDWI conferences (and still am). However, in my keynote, I told the audience that everything that I had taught them about how to build analytic applications was wrong (see Figure 5.1).

Figure 5.1 Schmarzo TDWI keynote, August 2008

Like with my own personal experience, many organizations and individuals are confused by the differences introduced by big data, especially the differences between Business Intelligence (BI) and data science. Big data is not big BI. Big data is a key enabler of a new discipline called data science that seeks to leverage new sources of structured and unstructured data, coupled with predictive and prescriptive analytics, to uncover new variables and metrics that are better predictors of performance. And while BI and data science share many of the same objectives (getting value out of data, dealing with dirty data, transforming and aligning data, helping support improved decision making), the questions, characteristics, processes, tools, and models couldn't be more different.

This chapter discusses the differences between BI and data science:

The questions are different.

The analytic characteristics are different.

The analytic engagement processes are different.

The data models are different.

The business view is different.

So let's start your journey through the “looking glass.” I promise that the journey will be enlightening (but no hookah smoking)!

What Is Data Science?

Data science is a complicated new discipline that requires advanced skills and competencies in areas such as statistics, computer science, data mining, mathematics, and computer programming. As has been stated countless times, data scientists are the business “rock stars” of the 21st century.

Although what data scientists do can be quite complex, what they are trying to achieve is not. In fact, I find that the very best introductory book to data science is Moneyball: The Art of Winning an Unfair Game by Michael Lewis (W. W. Norton & Company, 2004). The book is about the Oakland A's General Manager Billy Beane's use of sabermetrics to help the small-market Oakland A's professional baseball team outperform competitors with significantly larger bankrolls. The book yields the most accurate description of data science:

Data science is about finding new variables and metrics that are better predictors of performance.

That's it, nothing more, and yes, data science is that simple. But the power of that simple statement is game changing, as can be seen in Figure 5.2 and the success that Billy Beane and the Oakland A's have achieved by making player acquisitions and in-game decisions based on a different, more predictive set of metrics.

Figure 5.2 Oakland A's versus New York Yankees cost per win

The book also has another valuable lesson: good ideas can be copied. So organizations have to constantly be on the search for those new variables and metrics that are better predictors of performance, to find that next, more predictive “on-base percentage” metric.

BI Versus Data Science: The Questions Are Different

When clients ask me to explain the difference between a Business Intelligence analyst and a data scientist, I start by explaining that the two disciplines have different objectives and seek to answer different types of questions (see Figure 5.3).

Figure 5.3 Business Intelligence versus data science

BI Questions

BI focuses on descriptive analytics: that is, the “What happened?” types of questions. Examples include:

How many widgets did I sell last month?

What were sales by zip code for Christmas last year?

How many units of Product X were returned last month?

What were company revenues and profits for the past quarter?

How many employees did I hire last year?

BI focuses on reporting on the current state of the business, or, as it is now commonly called, Business Performance Management (BPM). BI provides retrospective reports to help business users monitor the current state of the business and answer questions about historical business performance. These reports and questions are critical to the business and are sometimes required for regulatory and compliance reasons.

BI can apply some rudimentary analytics (time series analysis, previous period comparisons, indices, shares, and benchmarks) to help business users flag areas of under- and over-performance. But even these analytics are focused on monitoring what happened to the business.

Data Science Questions

On the other hand, data scientists are in search of variables and metrics that are better predictors of business performance. Consequently, data scientists focus on predictive analytics (“What is likely to happen?”) and prescriptive analytics (“What should I do?”) types of questions. For example:

Predictive Questions (What is likely to happen?)

How many widgets will I sell next month?

What will sales by zip code be over this Christmas season?

How many units of Product X will be returned next month?

What are projected company revenues and profits for next quarter?

How many employees will I need to hire next year?

Prescriptive Questions (What should I do?)

Order [5,000] Component Z to support widget sales for next month.

Hire [Y] new sales reps by these zip codes to handle projected Christmas sales.

Set aside [$125K] in financial reserve to cover Product X returns.

Sell the following product mix to achieve quarterly revenue and margin goals.

Increase hiring pipeline by 35 percent to achieve hiring goals.

To answer these predictive and prescriptive questions, data scientists build analytic models in an attempt to quantify cause and effect. Chapter 6 covers some of the analytic algorithms and techniques that data scientists might use to help them quantify cause and effect.
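The three question types can be lined up on a single toy data set. In the sketch below, the monthly widget sales, the two-components-per-widget ratio, and the inventory on hand are made-up assumptions, and a straight-line trend stands in for whatever predictive model a data scientist would actually choose.

```python
import math


def fit_trend(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return slope, y_mean - slope * x_mean


monthly_widget_sales = [100, 110, 120, 130, 140, 150]  # hypothetical history

# Descriptive (BI): what happened?
sold_last_month = monthly_widget_sales[-1]

# Predictive: what is likely to happen next month?
slope, intercept = fit_trend(monthly_widget_sales)
forecast_next_month = slope * len(monthly_widget_sales) + intercept

# Prescriptive: what should I do? Order enough Component Z to cover the forecast.
components_per_widget = 2   # assumption for illustration
components_on_hand = 50     # assumption for illustration
order_quantity = max(
    0, math.ceil(forecast_next_month * components_per_widget) - components_on_hand
)

print(sold_last_month, forecast_next_month, order_quantity)
```

The descriptive answer is a lookup, the predictive answer needs a model, and the prescriptive answer combines the model's output with business constraints.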

The Analyst Characteristics Are Different

Another area of difference between BI and data science is in the attitudinal characteristics and work approach of the people who fill those roles (see Table 5.1).

Table 5.1 BI Analyst Versus Data Scientist Characteristics

| Area | BI Analyst | Data Scientist |
| --- | --- | --- |
| Focus | Reports, KPIs, trends | Patterns, correlations, models |
| Process | Static, comparative | Exploratory, experimentation, visual |
| Data sources | Pre-planned, added slowly | On the fly, as needed |
| Transform | Up front, carefully planned | In-database, on demand, enrichment |
| Data quality | Single version of truth | “Good enough,” probabilities |
| Data model | Schema on load | Schema on query |
| Analysis | Retrospective, descriptive | Predictive, prescriptive |

Courtesy: EMC

The difference that jumped out most to me from Table 5.1 was the different perspectives on “data quality.” For the BI analyst who is dealing with historical data, the data needs to be 100 percent accurate. BI and data warehouse organizations have invested heavily in data governance and master data management to ensure that the data in the data warehouse are 100 percent accurate.

On the other hand, the data scientist is trying to predict what is likely to happen in the future and, as a result, is dealing with probabilities, confidence levels, F-distributions, t-tests, and p-values. Predictions about the future are never 100 percent accurate, so data scientists develop a sense of what is “good enough” in trying to predict what is likely to happen and recommend what actions to take. As Yogi Berra, the well-known New York Yankee catcher, was famously quoted, “It's tough to make predictions, especially about the future.”

It takes a different attitude to be a data scientist, an attitude that accepts failure as a tool for learning. Data scientists learn to embrace failure as part of their agile, fail-fast approach in the search to uncover new metrics and variables that are better predictors of performance. A common approach that data scientists embrace is modeled after the Cross Industry Standard Process for Data Mining (CRISP-DM) model (see Figure 5.4).

Figure 5.4 CRISP: Cross Industry Standard Process for Data Mining

Data science takes a very similar approach: establish a business hypothesis or question; explore different combinations of data and analytics to build, test, and refine the analytic model; and wash, rinse, and repeat until the model proves that it can provide the required “analytic lift” while reaching a satisfactory goodness of fit. Finally, the analytics are deployed or operationalized, which may include rewriting the analytics in a different language to speed the model execution (i.e., in-database analytics) and integrating the analytic models and results into the organization's operational and management systems.

The Analytic Approaches Are Different

Unfortunately, these explanations are insufficient to answer satisfactorily the question of what's different between Business Intelligence and data science. So let's examine closely the different engagement approaches (including goals, tools, and techniques) that the BI analyst and the data scientist use to do their jobs.

Business Intelligence Analyst Engagement Process

The BI analyst engagement process is a discipline that has been documented, taught, and refined over three decades of building data warehouses and BI environments. Figure 5.5 provides a high-level view of the process that a typical BI analyst uses when engaging with the business users to build out the BI and supporting data warehouse environments.

Figure 5.5 Business Intelligence engagement process

Step 1: Pre-build Data Model. The process starts by building the foundational data model. Whether you use a data warehouse or data mart or hub-and-spoke approach, whether you use a star, snowflake, normalized, or dimensional schema, the BI analyst must go through a formal requirements gathering process with the business users to identify all (or at least the vast majority of) the questions that the business users want to answer. In this requirements gathering process, the BI analyst must identify the first- and second-level questions the business users want to address in order to build a robust and extensible data model. For example:

First-level question: How many patients did we treat last month?

Second-level question: How did that compare to the previous month?

Second-level question: What were the major DRG types treated?

First-level question: How many patients came through ER last night?

Second-level question: How did that compare to the previous night?

Second-level question: What were the top admission reasons?

First-level question: What percentage of beds was used at Hospital X last week?

Second-level question: What is the trend of bed utilization over the past year?

Second-level question: What departments had the largest increase in bed utilization?

The BI analyst then works closely with the data warehouse team to define and build the underlying data models that support these types of questions.

NOTE

The data warehouse uses a “schema on load” approach because the data schema must be defined and built prior to loading data into the data warehouse. Without an underlying data model or schema, the BI tools will not work.
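To make “schema on load” concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for a data warehouse. The table and column names are hypothetical. The point is the ordering: the schema is declared first, and data that does not fit it is rejected at load time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. The schema is defined up front, before any data arrives.
conn.execute(
    """
    CREATE TABLE patient_visits (
        visit_id    INTEGER PRIMARY KEY,
        visit_month TEXT NOT NULL,
        drg_type    TEXT NOT NULL
    )
    """
)

# 2. Data is loaded only after it has been shaped to fit that schema.
conn.executemany(
    "INSERT INTO patient_visits VALUES (?, ?, ?)",
    [(1, "2015-01", "Cardiac"), (2, "2015-01", "Orthopedic"), (3, "2015-02", "Cardiac")],
)

# 3. A record that violates the schema is rejected when it is loaded.
try:
    conn.execute("INSERT INTO patient_visits VALUES (?, ?, ?)", (4, None, "Cardiac"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# First-level question: how many patients did we treat in a given month?
treated_in_january = conn.execute(
    "SELECT COUNT(*) FROM patient_visits WHERE visit_month = '2015-01'"
).fetchone()[0]
print(treated_in_january, rejected)
```

Because the structure is guaranteed, the query in step 3 can be simple, which is exactly what lets BI tools sit on top of the warehouse.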

Step 2: Define the Report (Query). Once the analytic requirements have been transcribed into a data model, step 2 of the process is where the BI analyst uses a BI tool (SAP Business Objects, MicroStrategy, Cognos, Qlikview, Pentaho, etc.) to create the SQL-based query to build the report and/or answer the business questions. The BI analyst will use the BI tool's graphical user interface (GUI) to generate the SQL query by selecting the measures and dimensions; selecting row, column, and page descriptors; specifying constraints, subtotals, and totals; creating special calculations (mean, moving average, rank, share of); and selecting sort criteria. The BI tool GUI hides much of the complexity of creating the SQL.

Step 3: Generate SQL Commands. Once the BI analyst or the business user has defined the desired report or query request, the BI tool automatically creates the necessary SQL commands (SQL statements). In some cases, the BI analyst might modify the SQL commands generated by the BI tool to include unique SQL commands that may not be supported by the BI tool.
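Steps 2 and 3 can be pictured as a function that turns the analyst's selections into SQL text. The sketch below is a deliberately simplified, hypothetical generator; it does not reproduce how any of the BI tools named above work internally, and the table and column names are invented for the example.

```python
def build_report_sql(table, dimensions, measures, filters=None, order_by=None):
    """Assemble a SELECT statement from report selections.

    dimensions: column names to group by.
    measures:   (aggregate, column) pairs, e.g. ("SUM", "amount").
    filters:    SQL predicates, combined with AND.
    Identifiers are assumed to come from a trusted metadata layer, not user input.
    """
    select_parts = list(dimensions) + [
        f"{agg}({col}) AS {agg.lower()}_{col}" for agg, col in measures
    ]
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    if order_by:
        sql += " ORDER BY " + ", ".join(order_by)
    return sql


# "What were sales by zip code for Christmas last year?"
report_sql = build_report_sql(
    table="sales",
    dimensions=["zip_code"],
    measures=[("SUM", "amount")],
    filters=["sale_month = '2014-12'"],
)
print(report_sql)
```

The analyst picks measures, dimensions, and constraints; the generated statement is what actually runs against the warehouse, and it can be hand-edited when the tool cannot express something.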

Step 4: Create Report. In step 4, the BI tool issues the SQL commands against the data warehouse and creates the corresponding report or dashboard widget. This is a highly iterative process, where the BI analyst will tweak the SQL (either using the GUI or hand-coding the SQL statement) to fine-tune the SQL request. The BI analysts can also specify graphical rendering options (bar charts, line charts, pie charts) until they get the exact report and/or graphic that they want (see Figure 5.6).

Figure 5.6 Typical BI tool graphic options

The BI tools are very powerful and relatively easy to use if the data model is configured properly. By the way, this is a good example of the power of schema on load. This traditional schema on load approach removes much of the underlying data complexity from the business users, who can then use the BI tool's graphical user interface to more easily query and explore the data (think self-service BI).

In summary, the BI approach relies on a pre-built data model (schema on load), which enables users to quickly and easily query the data, as long as the data that they want to query is already defined and loaded into the data warehouse. If the data is not in the data warehouse, then adding it to an existing warehouse can take months. Not only does modifying the data warehouse to include a new data source require a significant amount of time, but the process can be very costly, as data schemas have to be updated to include the new data source, new ETL processes have to be constructed to transform and normalize the data to fit into the updated data schemas, and existing reports and dashboards may have to be updated to include the new data.

The Data Scientist Engagement Process

The data science process is significantly different. In fact, there is very little from the BI analyst engagement process that can be reused in the data science engagement process (see Figure 5.7).

Figure 5.7 Data scientist engagement process

Step 1: Define Hypothesis to Test. Step 1 of the data science engagement process starts with the data scientists identifying the prediction they want to make or hypothesis that they want to test. This is a result of collaborating with the business subject matter expert to understand the key sources of business differentiation (e.g., how the organization delivers value) and then construct the associated hypotheses or predictions.

Step 2: Gather Data…and More Data. In step 2 of the data science engagement process, the data scientist gathers relevant or potentially interesting data from a multitude of sources (both internal and external to the organization) and pushes that data into the data lake or analytic sandbox. The data lake is a great foundational capability for this process, as the data scientists can acquire and ingest any data they want (as-is), test the data for its value given the hypothesis or prediction, and then decide whether to include that data in the analytic model. This is where an envisioning exercise can add considerable value in facilitating the collaboration between the business users and the data scientists to identify data sources that may help improve predictive results.

Step 3: Build Data Model. Step 3 is where the data scientists define and build the schema necessary to address the hypothesis being tested. The data scientists can't define the schema until they know the hypothesis that they are testing and understand what data sources they are going to use to build their analytic models.

NOTE

This schema on query process is notably different from the traditional data warehouse schema on load process. The data scientist doesn't spend months integrating all the different data sources together into a formal data model first. Instead, the data scientist will define the schema as needed based on the data that is being used in the analysis and the requirements of the analytic tool and/or algorithm. The data scientist will likely iterate through several different versions of the schema until finding a schema that supports the analytic model with a sufficient goodness of fit that accepts or rejects the hypothesis being tested.
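By contrast with schema on load, a schema on query workflow keeps the raw records untouched and applies a schema only when a particular analysis needs one. The sketch below illustrates the idea with JSON lines held in memory; the event fields and the helper function are hypothetical, and a real data lake would read files from a distributed store rather than a Python list.

```python
import json

# Raw events are ingested as-is: inconsistent types and missing fields are kept.
raw_events = [
    '{"user": "a1", "page": "/home", "ms": 120}',
    '{"user": "b2", "page": "/cart", "ms": "340", "coupon": "SPRING"}',
    '{"user": "a1", "page": "/cart"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time. schema maps field -> (cast, default)."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            row[field] = cast(value) if value is not None else None
        yield row


# Schema for one hypothesis: "slow cart pages hurt conversion."
latency_schema = {"page": (str, None), "ms": (float, None)}
latency_rows = list(read_with_schema(raw_events, latency_schema))
cart_latencies = [
    r["ms"] for r in latency_rows if r["page"] == "/cart" and r["ms"] is not None
]

# A different hypothesis gets a different schema over the same raw data.
coupon_schema = {"user": (str, None), "coupon": (str, "NONE")}
coupon_rows = list(read_with_schema(raw_events, coupon_schema))

print(cart_latencies, coupon_rows[0])
```

Nothing about the stored data changes between the two analyses; only the schema applied at query time does, which is what allows the rapid iteration described in the note.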

Step4:VisualizetheData.Step4ofthedatascienceprocessleveragesmanyoftheoutstandingdatavisualizationtoolsavailabletodaytouncoverrelationships,correlations,andoutliersinthedata.Thedatascientistswillusethedatavisualizationtoolstojump-starttheiranalyticprocessbytryingtoidentifycorrelationsinthedataworthyofinvestigationandoutliersinthedatathatmayneedspecialtreatment(e.g.,logtransformations).DatavisualizationtoolslikeTableau,Spotfire,DataRPM,andggplot2aregreatdatavisualizationtoolsforexploringthedataandidentifyingvariablesthatthedatascientistsmightwanttotest.

Step5:BuildAnalyticModels.Step5iswheretherealdatascienceworkbegins—wherethedatascientistsuseadvancedanalytictoolslikeSAS,SASMiner,R,Mahout,MADlib,AlpineMiner,H2O,etc.tocorrelatedifferentvariablesinanattempttobuildamoreaccurateanalyticmodels.Thedatascientistswillexploredifferentanalytictechniquesandalgorithmstotrytocreatethemostpredictivemodels.Again,thinkprobabilities,confidencelevels,F-distributions,t-tests,andp-values.Chapter7willcoversomeofthedifferentanalyticalgorithmsthatthedatascientistsmightuseandinwhatcontext.

Step6:EvaluateModelGoodnessofFit.Instep6,thedatascientistsascertainthemodel'sgoodnessoffit.Thegoodnessoffitofastatisticalmodeldescribeshowwellthemodelfitsasetofobservations(F-test,p-value,andt-statistic).AnumberofdifferentanalytictechniqueswillbeusedtodeterminethegoodnessoffitincludingKolmogorov–Smirnovtest,Pearson'schi-squaredtest,analysisofvariance(ANOVA),andconfusion(orerror)matrix(seeFigure5.8).

Figure5.8Measuringgoodnessoffit
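As a minimal sketch of one of these techniques, the confusion matrix, the Python below tallies correct and incorrect predictions for a two-class model and derives accuracy, precision, and recall from the tallies. The labels ("churn", "stay") and the sample predictions are illustrative assumptions, not data from this chapter.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive="churn"):
    """Tally true/false positives and negatives for a two-class model."""
    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(cm):
    """Share of all predictions that were correct."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


def precision(cm):
    """Share of predicted positives that really were positive."""
    flagged = cm["tp"] + cm["fp"]
    return cm["tp"] / flagged if flagged else 0.0


def recall(cm):
    """Share of real positives that the model caught."""
    positives = cm["tp"] + cm["fn"]
    return cm["tp"] / positives if positives else 0.0


# Hypothetical labels: what customers did vs. what the model predicted.
actual = ["churn", "stay", "churn", "stay", "stay"]
predicted = ["churn", "churn", "stay", "stay", "stay"]
cm = confusion_matrix(actual, predicted)
print(dict(cm), accuracy(cm), precision(cm), recall(cm))
```

Which of the three ratios matters most depends on the business initiative: for retention, missing a true churner (low recall) is usually costlier than a wasted offer.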

The Data Models Are Different

The data models that are used in the data warehouse to support an organization's BI efforts are significantly different from the data models the data scientists prefer to use.

Data Modeling for BI

The world of BI (aka query, reporting, dashboards) requires a data modeling technique that allows business users to create their own reporting and queries. To support this need, Ralph Kimball pioneered dimensional modeling, or star schemas, while at Metaphor Computers back in the 1980s (see Figure 5.9).

Figure 5.9 Dimensional model (star schema)

The dimensional model was designed to accommodate the analysis needs of the business users, with two important design concepts:

Fact tables (populated with metrics or measures) correspond to transactional systems such as orders, shipments, sales, returns, premiums, claims, accounts receivable, and accounts payable. Facts are typically numeric values that can be aggregated (e.g., averaged, counted, or summed).

Dimension tables (populated with attributes about that dimension) represent the “nouns” of that particular transactional system such as products, markets, stores, employees, customers, and different variations of time. Dimensions are groups of hierarchies and descriptors that describe the facts. It is these dimensional attributes that enable analytic exploration, attributes such as size, weight, location (street, city, state, zip), age, gender, tenure, etc.

Dimensional modeling is ideal for business users because it supports their natural question-and-answer exploration processes. Dimensional modeling supports BI concepts such as drill across (navigating across dimensions) and drill up/drill down (navigating up and down the dimensional hierarchies such as the product dimension hierarchy of product ⇨ brand ⇨ category). Today, all BI tools use dimensional modeling as the standard way for interacting with the underlying data warehouse.

Data Modeling for Data Science

In the world of data science, Hadoop provides an opportunity to think differently about how we do data modeling. Hadoop was originally designed by Yahoo to deal with very long, flat web logs. Hadoop was designed with very large data blocks (Hadoop default block size is 64 MB to 128 MB versus relational database block sizes that are typically 32 KB or less). To optimize this block size advantage, the data science team wants very long, flat records and long, flat data models.1

For example, some data scientists prefer to “flatten” a star schema by collapsing or integrating the dimensional tables that surround the fact table into a single, flat record in order to construct and execute more complex data queries without having to use joins (see Figure 5.10).

Figure 5.10 Using flat files to eliminate or reduce joins on Hadoop

As an example in Figure 5.10, instead of three different star schemas with conformed or shared dimensions to link the different star schemas, the data science team wants three long, flat files with the following customer data:

Customer demographics (age, gender, current and previous home addresses, value of current and previous home, history of marital status, kids and their ages and genders, current and previous income, etc.)

Customer purchase history (annual purchases including items purchased, returns, prices paid, discounts, coupons, location, day of week, time of day, weather condition, temperatures)

Customer social activities (entire history of social media posts, likes, shares, tweets, favorites, retweets, etc.)
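The flattening idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table names, column names, and sample rows are hypothetical, and in-memory dictionaries stand in for what would be files on Hadoop.

```python
def flatten(fact_rows, dimensions):
    """Copy each fact row and fold its dimension attributes into it.

    `dimensions` maps a foreign-key column in the fact row to a
    (prefix, lookup_table) pair; the prefix keeps attribute names unique.
    """
    flat = []
    for fact in fact_rows:
        record = dict(fact)
        for key_column, (prefix, table) in dimensions.items():
            attributes = table.get(fact[key_column], {})
            for name, value in attributes.items():
                record[f"{prefix}_{name}"] = value
        flat.append(record)
    return flat


# Hypothetical star schema: one fact table and two dimension tables.
customers = {101: {"age": 42, "gender": "F"}}
stores = {7: {"city": "Palo Alto", "state": "CA"}}
sales = [{"customer_id": 101, "store_id": 7, "amount": 4.95}]

flat_sales = flatten(
    sales,
    {"customer_id": ("customer", customers), "store_id": ("store", stores)},
)
print(flat_sales[0])
```

The join cost is paid once, when the flat record is built, so later analytic passes over the data can read each record without any lookups.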

The View of the Business Is Different

Instead of trying to build the “single version of the truth” or create a “360-degree view of the customer,” the data science team will build analytic profiles on each of the organization's key business entities or strategic nouns at the individual entity level.

One of the most powerful data science concepts is the analytic profile. The data science team builds detailed analytic profiles that capture the behaviors, propensities, preferences, and tendencies of individual business entities (e.g., customers, merchants, students, patients, doctors, wind turbines, jet engines, ATMs).

An analytic profile is a combination of metrics, key performance indicators, scores, association rules, and analytic insights combined with the tendencies, behaviors, propensities, associations, affiliations, interests, and passions for an individual entity (customer, device, partner, machine).

For example, the analytic profile for Bill Schmarzo for Starbucks might include the following:

Demographic Information. This is the basic information about me such as name, home address, work address, age, gender, marital status, length of time as gold card loyalty member, income level, value of home, length of time at current home, education level, number of dependents, age and make of car, age and gender of children, etc.

Transactional Metrics. This is information about my transactions with Starbucks such as number of purchases, purchase amounts, product purchased and in what combinations, frequency of visits, recency of visits, most common time of day for visits, stores visited most frequently, etc.

Social Media Metrics. This is information gathered about any social media comments that Bill Schmarzo might have made across different social media sites about Starbucks including posts, likes, tweets, retweets, social media conversations, Yelp ratings, blogs, e-mail conversations, consumer comments, mobile usage, web clicks, etc. Starbucks could mine the social media data to understand my network of personal relationships (number, strength, direction, sequencing, and clustering of relationships) and capture my interests, passions, associations, and affiliations.

Behavioral Groupings. Now we're starting to get interesting, as we want to create behavioral insights that are relevant for the business initiatives that Starbucks is trying to support. Depending on the targeted business initiative (customer retention, customer up-sell, customer advocacy, new store locations, channel sales, etc.), here is some behavioral information that Starbucks might want to capture about me: favorite drinks in rank order, favorite stores in rank order, most frequent time of day to visit a store, most frequent day of week to visit a store, recency of store visit, frequency of store visits in past week/month/quarter, how long do I stay at which stores (“pass thru” or “linger”), etc.

Classifications. Now we want to create some “classifications” about Bill Schmarzo's life that might have impact on Starbucks's key business initiatives such as life stage classification (long marriage, kid in college, kid at home, weight/diet conscious, etc.), lifestyle classification (heavy traveler, heavy chai tea drinker, light exerciser, and so on), or product classification (morning coffee/oatmeal consumer, afternoon frap/cookie consumer, etc.).

Association Rules. We might also want to capture some propensities about Bill's usage patterns that we can use to support Starbucks's key business initiatives, including propensity to buy oatmeal when he buys his venti chai latte when traveling in the morning, propensity to buy a cookie/pastry when traveling in the afternoon, propensity to buy product in the channel, etc.

Scores. We also may want to create scores to support decision-making and process optimization. Scores that we might want to create (again, depending on Starbucks's key business initiatives) could include advocacy score (which measures my likelihood to recommend Starbucks and make positive comments for Starbucks on social media), loyalty score (which measures my likelihood to continue to visit Starbucks stores and buy Starbucks products versus competitors), product usage score (which is a measure of how much Starbucks product I consume, and revenue I generate, when I visit a Starbucks store), etc.

A profile could be made up of hundreds of metrics and scores that, when used in combination against a specific business initiative like customer retention, customer up-sell, new product introductions, or customer advocacy, can improve the predictive capabilities of the model (see Figure 5.11).

Figure 5.11 Sample customer analytic profile

Some metrics and scores are more important than others, depending on the business initiative being addressed. For example, for a financial services firm focused on customer acquisition, disposable income, retirement readiness, life stage, age, education level, and number of family members may be the most important predictive metrics. However, for that same financial services firm focused on customer retention, metrics such as advocacy, customer satisfaction, attrition risk, social network associations, and select social media relationships may be the most important predictive metrics.

For example, against a customer retention business initiative, an organization could compare a customer's most recent activities (e.g., purchases, mobile app usage, website visits, consumer comments, social posts) to the historical data, metrics, and scores that compose that customer's analytic profile in order to determine (score) his or her likelihood to attrite. If the customer's “Attrition Score” is above a certain level, then the organization could deliver a personalized “next best offer” in order to preempt customer attrition. The analysis process for the “Improve Customer Retention” business initiative is laid out in Figure 5.12.

Figure 5.12 Improve customer retention example

The analysis process works like this:

Step 1: Establish a hypothesis that you want to test. In our customer retention example, our test hypothesis is that “Premium gold card members with greater than five days without a purchase or mobile app engagement have a 25 to 30 percent higher probability of churn than similar customers.”

Step 2: Identify and quantify the most important metrics or scores to predict a certain business outcome. In our example, the metrics and scores that we're going to use to test our customer attrition hypothesis include Customer Tenure (in months), Customer Satisfaction Score, Average Monthly Purchases, and Customer Loyalty Score. Notice that the metrics do not have the same weight (or confidence level). Some metrics and scores are more important than others in predicting performance given the test hypothesis.

Step 3: Employ the predictive metrics to build detailed profiles for each individual customer with respect to the hypothesis to be tested.

Step 4: Compare an individual's recent activities and current state with his or her profile in order to flag unusual behaviors and actions that may be indicative of a customer retention problem. In our customer retention example, we might want to create a “Customer Attrition” score that quantifies the likelihood that a particular customer is going to leave, and then create specific recommendations as to what actions or “next best offers” can be delivered to retain that customer.

Step 5: Continue to seek out new data sources and new metrics that may be better predictors of attrition. This is also the part of the data science process where you continuously try to improve the accuracy and confidence levels of the metrics and scores using sensitivity analysis and simulations such as Monte Carlo experiments.

Step 6: Integrate the analytic insights, scores, and recommendations into the key operational systems (likely CRM, direct marketing, point of sales, and call center for the customer retention business initiative) in order to ensure that the insights uncovered by the analysis are actionable by frontline or customer-engaging employees.
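Steps 2 through 4 can be sketched as a weighted score compared against a threshold. The Python below is a hedged illustration only: the weights, the 0.6 threshold, the metric names, and the assumption that each metric has already been rescaled to a 0-to-1 risk value are all hypothetical and would in practice come out of the model-building work.

```python
# Hypothetical weights: how much each (0..1) risk metric counts toward attrition.
WEIGHTS = {
    "tenure_risk": 0.15,        # short tenure -> higher risk
    "satisfaction_risk": 0.35,  # low satisfaction score -> higher risk
    "purchase_drop": 0.30,      # fall in average monthly purchases
    "loyalty_risk": 0.20,       # low loyalty score -> higher risk
}
THRESHOLD = 0.6  # assumed cut-off above which an offer is triggered


def attrition_score(risks, weights=WEIGHTS):
    """Weighted average of the risk metrics; the result stays within 0..1."""
    total_weight = sum(weights.values())
    return sum(weights[name] * risks[name] for name in weights) / total_weight


def next_best_action(risks, threshold=THRESHOLD):
    """Return (action, score) for one customer profile."""
    score = attrition_score(risks)
    action = "send retention offer" if score > threshold else "no action"
    return action, score


# Hypothetical customer: unhappy and buying less, but long-tenured.
customer = {
    "tenure_risk": 0.1,
    "satisfaction_risk": 0.9,
    "purchase_drop": 0.8,
    "loyalty_risk": 0.7,
}
print(next_best_action(customer))
```

Unequal weights are the point of Step 2: here satisfaction and purchase behavior dominate the score, so a long-tenured but unhappy customer is still flagged.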

Summary

Organizations are realizing that data science is very different from BI and that one does not replace the other. Both combine to provide the “dynamic duo” of analytics: one focused on monitoring the current state of the business and the other trying to predict what is likely to happen and then prescribe what actions to take.

Big data is a key enabler of a new discipline called data science. Data science seeks to leverage new sources of structured and unstructured data, coupled with advanced predictive and prescriptive analytics, to uncover new variables and metrics that are better predictors of performance.

As discussed in this chapter, BI is different from data science in the following ways:

The questions are different.

The analytic characteristics are different.

The analytic engagement processes are different.

The data models are different.

The business view is different.

This chapter also introduced the very important data science concept called analytic profiles. Organizations are learning that more important than trying to create a 360-degree profile of the customer is identifying and quantifying those fewer but more important metrics that are better predictors of business or customer performance such as optimizing key business processes, influencing customer behaviors, and uncovering new monetization opportunities.

Hopefully your journey through the “looking glass” was as enlightening to you as it was to me!

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Describe the key differences between BI and data science and what those differences mean to your organization.

Exercise #2: List sample descriptive (What happened?), predictive (What is likely to happen?), and prescriptive (What actions should I take?) questions that are relevant to the targeted business initiative that you identified in Chapter 2.

Exercise #3: For the targeted business initiative identified in Chapter 2, list some of the key metrics and variables that you might want to capture in order to support the predictive and prescriptive questions listed in Exercise #2.

Notes

1 Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines or racks of machines) are commonplace and thus should be automatically handled in software by the framework. (Source: Wikipedia)

Chapter 6 Data Science 101

There are many excellent books and courses focused on teaching people how to become a data scientist. Those books and courses provide detailed material and exercises that teach the key capabilities of data science such as statistical analysis, data mining, text mining, SQL programming, and other computing, mathematical, and analytic techniques. That is not the purpose of this chapter.

The purpose of Chapter 6 is to introduce some different analytic algorithms that business users should be aware of and to discuss when it might be most appropriate to use which types of algorithms. You do not need to be a data scientist to understand when and why to apply these analytic algorithms. A more detailed understanding of these different analytic algorithms will help the business users to collaborate with the data science team to uncover those variables and metrics that may be better predictors of business performance.

Data Science Case Study Setup

Data science is a complicated topic that certainly cannot be given justice in a single chapter. So to help grasp some of the data science concepts that are covered in Chapter 6, you are going to create a fictitious company against which you can apply the different analytic algorithms. Hopefully this will make the different data science concepts “come to life.”

Our fictitious company, Fairy-Tale Theme Parks (“The Parks”), has multiple amusement parks across North America and wants to employ big data and data science in order to:

Deliver a more positive and compelling guest experience in an increasingly competitive entertainment marketplace;

Determine maximum potential guest lifetime value (MPGTV) to use as the basis for determining guest promotional spend and discounts and prioritizing Priority Access passes and in-park hotel rooms;

Promote new technology-heavy 3D attractions (Terror Airline and Zombie Apocalypse) to ensure the successful adoption and long-term viability of those new rides that appeal to new guest segments;

Ensure the success of new movie and TV characters in order to increase associated licensing revenues and ensure long-term character viability for new movie and TV sequels.

The Parks is deploying a mobile app called Fairy-Tale Chaperon that engages guests as they move through the park and helps guests enjoy the different attractions, entertainment, retail outlets, and restaurants. Fairy-Tale Chaperon will:

Deliver Priority Access passes to different attractions and reward their most important guests with digital coupons, discounts, and “Fairy Dust” (money equivalent that can be spent only in the park).

Promote social media posts to drive gamification and rewards around contests such as most social posts, most popular social posts, and most popular photos and videos.

Track guest flow and in-park traffic patterns, tendencies, and propensities in order to determine which attractions to promote (to increase attraction traffic) and which attractions guests should avoid because of long wait times.

Deliver real-time guest dining and entertainment recommendations based on guests' areas of interest and seat/table availability for select restaurants and entertainment.

Reward guests who share their social media information that can be used to monitor guest real-time satisfaction and enjoyment via Facebook, Twitter, and Instagram. It also provides an opportunity to promote select photos in order to start viral marketing campaigns.

This chapter reviews a number of different analytic techniques. You are not expected to become an expert in these different analytic algorithms. However, the more you understand what these analytic algorithms can do, the better positioned you are to collaborate with your data science team and suggest the art of the possible to your business leadership team.

Fundamental exploratory analytic algorithms that are covered in Chapter 6 are:

Trend analysis

Boxplots

Geography (spatial) analysis

Pairs plot

Time series decomposition

More advanced analytic algorithms that are covered in this chapter are:

Cluster analysis

Normal curve equivalent (NCE) analysis

Association analysis

Graph analysis

Text mining

Sentiment analysis

Traverse pattern analysis

Decision tree classifier analysis

Cohorts analysis

Throughout the chapter, you will contemplate how The Parks could leverage each of these different analytic techniques.

NOTE

Throughout this chapter I provide links to sites that can help you get comfortable with these different analytic algorithms. Many of the sites have exercises that use R.1 I strongly recommend downloading R and RStudio now!

NOTE

I commonly use Wikipedia to refine the definitions of many of these different analytic algorithms. Wikipedia is a great source for more details on each of these analytic algorithms.

Fundamental Exploratory Analytics

Let's start by covering some basic statistical analysis that was likely covered in your first statistics course (yes, I realize that you probably sold your stats book the minute the stats class was over). Trend analysis, boxplots, geographical analysis, pairs plot, and time series decomposition are examples of exploratory analytic algorithms that the data scientists use to get a “feel for the data.” These exploratory analytic algorithms help the data science team to better understand the data content and gain a high-level understanding of relationships and patterns in the data.

Trend Analysis

Trend analysis is a fundamental visualization technique to spot patterns, trends, relationships, and outliers across a time series of data. One of the most basic yet very powerful exploratory analytics, trend analysis (applying different plotting techniques and graphic visualizations) can quickly uncover customer, operational, or product trends and events that tend to happen together or happen at some period of regularity (see Figure 6.1).

Figure 6.1 Basic trend analysis

In Figure 6.1, the data scientist manually tested a number of different trending options in order to identify the “best fit” trend line (in this example, using Microsoft Excel). Once the data scientist identifies the best trending option, the data scientist can automate the generation of the trend lines using R.

Next, the data scientist might want to dissect the trend line across a number of different business dimensions (e.g., products, geographies, sales territories, markets) in order to uncover patterns and trends at the next level of granularity. The data scientist can then write a program to juxtapose the detailed trend lines into the same chart so that it is easier to spot trends, patterns, relationships, and outliers buried in the granular data (see Figure 6.2).

Figure 6.2 Compound trend analysis

Finally, trend analysis yields mathematical models for each of the trend lines. These mathematical models can be used to quantify reoccurring patterns or behaviors in the data. The most interesting insights from the trend lines can then be flagged for further investigation by the data science team (see Figure 6.3).

Figure 6.3 Trend line analysis
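The simplest such mathematical model is a straight line fitted by ordinary least squares. The sketch below, in Python rather than the Excel or R mentioned above, fits that line to equally spaced observations and reports how much of the variation it explains; the function names and the sample series are illustrative assumptions.

```python
def fit_trend_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def r_squared(values, slope, intercept):
    """Share of the variance in `values` explained by the fitted line."""
    y_mean = sum(values) / len(values)
    ss_total = sum((y - y_mean) ** 2 for y in values)
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in enumerate(values)
    )
    return 1.0 if ss_total == 0 else 1 - ss_residual / ss_total


# Hypothetical weekly attendance (thousands of guests).
attendance = [1, 3, 5, 7, 9]
slope, intercept = fit_trend_line(attendance)
print(slope, intercept, r_squared(attendance, slope, intercept))
```

Comparing the R-squared of a linear fit with that of other candidate shapes (logarithmic, polynomial) is one way to automate the "best fit" choice made by hand in Figure 6.1.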

WARNING

Some business users try to use trend analysis to predict future events through simple time series extrapolations. Extrapolating a time series trend to predict future behaviors and events is common but highly risky unless you operate within a 100 percent stable environment.

The Parks Ramifications

The Parks could use trend analysis to identify the variables (e.g., wait times, social media posts, consumer comments) that are highly correlated to the increase or decrease in guest satisfaction for each attraction, restaurant, retail outlet, and entertainment. The Parks could leverage the results from the trend analysis to

1. Flag problem areas and take corrective actions such as opening more lines, promoting less busy attractions, moving kiosks that are blocking traffic flow, and resituating characters at different points in the parks;

2. Identify the location and types of future attractions, restaurants, retail outlets, and entertainment.

For more information about how to make simple plots and graphs (line charts, bar charts, histograms, dot charts) in R, check out http://www.harding.edu/fmccown/r/.

Boxplots

Boxplots are one of the more interesting and visually creative exploratory analytic algorithms. Boxplots quickly visualize variations in the base data and can be used to identify outliers in the data worthy of further investigation. A boxplot is a convenient way of graphically depicting groups of numerical data through their quartiles. Boxplots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram (see Figure 6.4).

Figure 6.4 Boxplot analysis

One can quickly see the distribution of key data elements from the boxplot in Figure 6.4. When you change the dimensions against which you are doing the boxplots, underlying patterns and relationships in the data start to surface.
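The numbers behind a boxplot are easy to compute directly. The following Python sketch derives the quartiles and flags outliers using the common 1.5 × IQR whisker rule; that rule, the function name, and the sample wait times are assumptions made for illustration, since the chapter does not say how the whiskers in Figure 6.4 were drawn.

```python
import statistics


def box_summary(values):
    """Quartiles, whisker fences, and outliers for one group of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # points below this are treated as outliers
    high_fence = q3 + 1.5 * iqr  # points above this are treated as outliers
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical wait times (minutes) for one attraction.
wait_minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(box_summary(wait_minutes))
```

Running the same summary per attraction, per day of week, or per guest segment is the code equivalent of changing the dimension against which the boxplots are drawn.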

The Parks Ramifications

The Parks can employ boxplots to determine its most loyal guests for each of the park's attractions (e.g., Canyon Copter Ride, Monster Mansion, Space Adventure, Ghoulish Gulch). The Parks can use the results of the boxplot analysis to create guest current and maximum lifetime value scores against which to prioritize whom to reward with Priority Access passes and other coupons and discounts.

For more information about creating boxplots in R, check out http://www.r-bloggers.com/box-plot-with-r-tutorial/.

Geographical (Spatial) Analysis

Geographical or spatial analysis includes techniques for analyzing geographical activities and conditions using a business entity's topological, geometric, or geographic properties. For example, geographical analysis supports the integration of zip code and data.gov economic data with a client's internal data to provide insights about the success of the organization's geographical reach and market penetration (see Figure 6.5).

Figure 6.5 Geographical (spatial) trend analysis

In the example in Figure 6.5, geographical analysis is combined with trend analysis in order to identify changes in market patterns across the organization's key markets. Geographical analysis is especially useful for organizations looking to determine the success of their sales and marketing efforts.

The Parks Ramifications

The Parks can conduct geographical trend analysis to spot any changes (at both the zip+4 and household levels) in the geo-demographics of guests over time and by seasonality and holidays. The Parks can use the results of this geographical plus seasonality analysis to create geo-specific campaigns and promotions with the objective of increasing attendance from under-penetrated geographical areas by day of week, holidays, and seasonality.

Pairs Plot

Pairs plot analysis may be my favorite analytics algorithm. Pairs plot analysis allows the data scientist to spot potential correlations using pairwise comparisons across multiple variables. Pairs plot analysis provides a deep view into the different variables that may be correlated and can form the basis for guiding the data science team in the identification of key variables or metrics to include in the development of predictive models (see Figure 6.6).

Figure 6.6 Pairs plot analysis

Pairs plot analysis does lots of the grunt work of quickly pairing up different variables and dimensions so that one can quickly spot potential relationships in the data worthy of more detailed analysis (see the boxes in Figure 6.6).
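A numeric companion to the pairs plot is the table of pairwise correlations. The Python sketch below pairs up every combination of columns and computes the Pearson correlation for each; Pearson correlation is a choice made here for illustration (a pairs plot itself only shows scatter plots), and the column names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_var = sum((x - x_mean) ** 2 for x in xs)
    y_var = sum((y - y_mean) ** 2 for y in ys)
    if x_var == 0 or y_var == 0:
        return 0.0  # a constant column has no defined correlation; report 0
    return covariance / sqrt(x_var * y_var)


def pairwise_correlations(columns):
    """Correlation for every unordered pair of named columns."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical per-attraction measurements.
columns = {
    "wait": [10, 20, 30, 40],
    "satisfaction": [9, 7, 5, 3],
    "spend": [5, 9, 4, 8],
}
for pair, r in pairwise_correlations(columns).items():
    print(pair, round(r, 2))
```

Pairs with a correlation close to +1 or -1 are the ones worth a closer look, in the same way as the boxed panels in Figure 6.6.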

The Parks Ramifications

The Parks can leverage pairs plot analysis to compare a multitude of variables to identify those variables that drive guests to particular attractions, entertainment, retail outlets, and restaurants. The Parks can use the results of the analysis to drive in-park promotional decisions and offers that direct guests to under-utilized attractions, entertainment, retail outlets, and restaurants.

Additional paired plot options in R (e.g., pairs, splom, plotmatrix, ggcorplot, panelcor) can be found at http://www.r-bloggers.com/five-ways-to-visualize-your-pairwise-comparisons/.

Time Series Decomposition

Time series decomposition expands on the basic trend analysis by decomposing the traditional trend analysis into three underlying components that can provide valuable customer, product, or operational performance insights. These trend analysis components are:

Cyclical component that describes repeated but non-periodic fluctuations,

Seasonal component that reflects seasonality (seasonal variation),

Irregular component (or “noise”) that describes random, irregular influences and represents the residuals of the time series after the other components have been removed.

From the time series decomposition analysis, a business user can spot particular areas of interest in the decomposed trend data that may be worthy of further analysis (see Figure 6.7).

Figure 6.7 Time series decomposition analysis

For example, in examining Figure 6.7, one can spot unusual occurrences in the areas of Seasonality and Trend (highlighted in the boxes) that may suggest the inclusion of additional data sources (such as weather or major sporting and entertainment events data) in an attempt to explain those unusual occurrences.
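A bare-bones additive decomposition can be sketched as follows. This is a simplification under stated assumptions, not the procedure behind Figure 6.7: it uses a centered moving average with an odd seasonal period, it treats the series as trend + seasonal + irregular, and any cyclical movement simply ends up inside the smoothed trend. The function name and sample series are hypothetical.

```python
def decompose(series, period):
    """Additive split of `series` into (trend, seasonal, irregular).

    `trend` and `irregular` hold None at the edges where the centered
    moving average is undefined; `seasonal` has one value per position
    in the seasonal cycle.
    """
    if period % 2 == 0 or period < 3:
        raise ValueError("this sketch assumes an odd period of at least 3")
    half, n = period // 2, len(series)
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period

    # Average the detrended values at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(series[i] - t)
    if any(not b for b in buckets):
        raise ValueError("series too short for this period")
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period  # center the seasonal effects on zero
    seasonal = [r - offset for r in raw]

    irregular = [
        None if t is None else series[i] - t - seasonal[i % period]
        for i, t in enumerate(trend)
    ]
    return trend, seasonal, irregular


# Hypothetical series: steady growth plus a repeating 3-step pattern.
pattern = [1, 0, -1]
visits = [i + pattern[i % 3] for i in range(9)]
trend, seasonal, irregular = decompose(visits, period=3)
print(trend, seasonal, irregular)
```

Large values left in the irregular component are the "unusual occurrences" worth explaining with extra data such as weather or event calendars.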

The Parks Ramifications

The Parks can deploy time series decomposition analysis to identify and quantify the impact that seasonality and specific events are having on guest visits and associated spend. The Parks can use the results of the analysis to

1. Create season-specific marketing campaigns and promotions to increase guest visits and associated spend,

2. Determine which events outside of the theme parks (concerts, professional sporting events, BCS football games) are worthy of promotional and sponsorship spend.

For more information about time series decomposition in R, check out http://www.r-bloggers.com/time-series-decomposition/.

Analytic Algorithms and Models

The following analytic algorithms start to move the data scientist beyond the data exploration stage into the more predictive stages of the analysis process. These analytic algorithms by their nature are more actionable, allowing the data scientist to quantify cause and effect and provide the foundation to predict what is likely to happen and recommend specific actions to take.

Cluster Analysis

Cluster analysis is used to uncover insights about how customers and/or products cluster into natural groupings in order to drive specific actions or recommendations (e.g., personalized messaging, target marketing, maintenance scheduling). Cluster analysis or clustering is the exercise of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (clusters).

Clustering analysis can uncover potential actionable insights across massive data volumes of customer and product transactions and events. Cluster analysis can uncover groups of customers and products that share common behavioral tendencies and, consequently, can be targeted with the same marketing treatments (see Figure 6.8).

Figure 6.8 Cluster analysis
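One widely used clustering algorithm is k-means, sketched below in Python. The chapter does not say which algorithm produced Figure 6.8, so k-means is an assumption here, as are the starting centroids (passed in explicitly to keep the example deterministic) and the sample guest data.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical guests: (visits per year, average spend per visit in $10s).
guests = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(guests, centroids=[(0, 0), (10, 10)])
print(centroids, labels)
```

The resulting labels split the guests into an occasional, low-spend group and a frequent, high-spend group, each of which could receive its own marketing treatment.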

NOTE

I could use a pie chart in Figure 6.8 because I was dealing with only a small number of clusters. Generally speaking, pie charts are not good for conveying information because a large number of pie segments obscures the data and makes it hard to uncover any underlying trends or relationships buried in the data.

The Parks Ramifications

The Parks can leverage cluster analysis to create more actionable profiles of the park's most profitable guest clusters and highest potential guest clusters. The Parks can use the results of the analysis to quantify, prioritize, and focus guest acquisition and guest activation marketing efforts.

For more information about cluster analysis in R, check out http://www.statmethods.net/advstats/cluster.html.

Normal Curve Equivalent (NCE) Analysis

A technique first used in evaluating students' testing performance, normal curve equivalent (NCE) is a data transformation technique that approximately fits a normal distribution between 0 and 100 by normalizing a data set in preparation for percentile rank analysis. For example, an NCE data transformation is a way of standardizing scores received on a test into a 0–100 scale similar to a percentile rank but preserving the valuable equal-interval properties of a z-score (see Figure 6.9).

Figure 6.9 Normal curve equivalent analysis
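The standard NCE formula is NCE = 50 + 21.06 × z, where the scale factor is chosen so that percentile ranks 1, 50, and 99 map to NCE values 1, 50, and 99. The Python sketch below applies it to a percentile rank and then bins the result; the function names and the bin width of 10 are illustrative assumptions.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1
# Scale so that the 99th percentile lands on NCE 99 (about 21.06).
NCE_SCALE = 49 / _STANDARD_NORMAL.inv_cdf(0.99)


def nce_from_percentile(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE score."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be between 0 and 100, exclusive")
    z = _STANDARD_NORMAL.inv_cdf(percentile_rank / 100)
    return 50 + NCE_SCALE * z


def nce_bin(nce, width=10):
    """Lower edge of the equal-width bin that an NCE score falls into."""
    return int(nce // width) * width


for rank in (1, 25, 50, 75, 99):
    score = nce_from_percentile(rank)
    print(rank, round(score, 1), nce_bin(score))
```

Because NCE scores are equally spaced in z-score terms, equal-width bins of NCE values are comparable with one another, which is not true of bins of raw percentile ranks.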

What I find most useful about the NCE data transformation is taking the NCE results and binning the results to look for natural groupings in the data. For example, in Figure 6.10, you build on the NCE analysis to uncover price points (bins) across a wide range of high-margin, mid-margin, and low-margin product categories that might indicate the opportunity for pricing and/or promotional activities.

Figure 6.10 Normal curve equivalent seller pricing analysis example

The Parks Ramifications

The Parks can employ the NCE technique to understand price inflection points for packages of attractions and restaurants. The Parks can leverage the price inflection points to optimize pricing (e.g., create a package of attractions and restaurants by seasonality, holiday, day of week, etc.) and create new Priority Access packages.

For more information about how to use z-scores to normalize data using R, check out http://www.r-bloggers.com/r-tutorial-series-centering-variables-and-generating-z-scores-with-the-scale-function/. For more insights into the NCE data transformation technique, see https://en.wikipedia.org/wiki/Normal_curve_equivalent.

Association Analysis

Association analysis is a popular algorithm for discovering and quantifying relationships between variables in large databases. Association analysis shows customer or product events or activities that tend to happen together, which makes this type of analysis very actionable. For example, the association rule {buns, ketchup} → {burger} found in the point-of-sales data of a supermarket would indicate that if a customer buys buns and ketchup together, she is likely to also buy hamburger meat. Such information can be used as the basis for making pricing, product placement, promotion, and other marketing decisions.
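The strength of a rule such as {buns, ketchup} → {burger} is usually quantified with three measures: support, confidence, and lift. The Python sketch below computes them for a single rule; the baskets are made-up sample data, and a full association analysis would also need a search step (for example, the Apriori algorithm) to propose candidate rules.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    `transactions` is a non-empty list of sets of items.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_consequent = sum(1 for t in transactions if consequent <= t)
    with_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = with_both / n  # how often the whole rule appears
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    # Lift > 1 means the consequent is more likely given the antecedent.
    lift = confidence / (with_consequent / n) if with_consequent else 0.0
    return support, confidence, lift


# Hypothetical point-of-sale baskets.
baskets = [
    {"buns", "ketchup", "burger"},
    {"buns", "ketchup", "burger"},
    {"buns", "ketchup"},
    {"burger"},
    {"milk"},
]
print(rule_metrics(baskets, {"buns", "ketchup"}, {"burger"}))
```

In this sample the rule appears in 40 percent of baskets, holds two times out of three when buns and ketchup are bought, and has a lift just above 1, so the association is real but weak.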

Association analysis is the basis for market basket analysis (identifying products and/or services that sell in combination or sell with a predictable time lag) that is used in many industries including retail, telecommunications, insurance, digital marketing, credit cards, banking, hospitality, and gaming.

In Figure 6.11, the data science team examined the credit card transactions for one individual and uncovered several purchase occurrences that tended to happen together. For example, you can see a very strong relationship between Chipotle and Starbucks in the second line of Figure 6.11, as well as a number of purchase occurrences (e.g., Foot Locker + Best Buy) that tend to happen in combination.

Figure 6.11 Association analysis

One very actionable data science technique is to cluster the resulting association rules into common groups or segments. For example, in Figure 6.12, the data science team clustered the resulting association rules across tens of millions of customers in order to create more accurate, relevant customer segments. In this process, the data science team

Runs the association analysis across the tens of millions of customers to identify association rules with a high degree of confidence,

Clusters the customers and their resulting association rules into common groupings or segments (e.g., Chipotle + Starbucks, Virgin America + Marriott),

Uses these new segments as the basis for personalized messaging and direct marketing.

Figure 6.12 Converting association rules into segments

One of the interesting consequences of this association rule clustering technique is that a customer may appear in multiple segments. Artificially force-fitting a customer into a single segment obscures the fine nuances about each particular customer's buying behaviors, tendencies, and propensities.

The Parks Ramifications

The Parks can leverage market basket analysis to identify the most popular and least popular “packages of attractions.” The Parks can use this “packages of attractions” data to

1. Create new pricing and Priority Access packages for the most popular packages in order to optimize in-park traffic flow and reduce attraction wait times,

2. Create new pricing and Priority Access packages for the least popular “packages” in order to drive traffic to under-utilized attractions.

For more information about association analysis in R, check out http://www.rdatamining.com/examples/association-rules.

Graph Analysis

Graph analysis is one of the more powerful analysis techniques made popular by social media analysis. Graph analysis can quickly highlight customer or machine (think Internet of Things) relationships obscured across millions if not billions of social and machine interactions.

Graph analysis uses mathematical structures to model pairwise relations between objects. A “graph” in this context is made up of “vertices” or “nodes” and lines called edges that connect them. Social network analysis (SNA) is an example of graph analysis. SNA is used to investigate social structures and relationships across social networks. SNA characterizes networked structures in terms of nodes (people or things within the network) and the ties or edges (relationships or interactions) that connect them (see Figure 6.13).

Figure 6.13 Graph analysis
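Nodes and edges translate directly into a simple data structure. The Python sketch below builds an undirected graph from a list of edges, ranks nodes by degree centrality (the share of other nodes each one is directly tied to), and finds the connected groups. The names and ties are hypothetical, and degree centrality is only one of several influence measures used in SNA.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected graph as a mapping of node -> set of neighboring nodes."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def connected_components(graph):
    """Groups of nodes that can reach one another through some path."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        components.append(group)
    return components


# Hypothetical guest relationships (who visited the park together).
ties = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("eve", "fay")]
graph = build_graph(ties)
centrality = degree_centrality(graph)
print(max(centrality, key=centrality.get), connected_components(graph))
```

The most central node in each connected group is a natural candidate for the "group leader" that a promotion might target.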

Whilegraphanalysisismostcommonlyusedtoidentifyclustersof“friends,”

uncovergroupinfluencersoradvocates,andmakefriendrecommendationsonsocialmedianetworks,graphanalysiscanalsolookatclusteringandstrengthofrelationshipsacrossdiversenetworkssuchasATMs,routers,retailoutlets,smartdevices,websites,andproductsuppliers.

TheParksRamifications

TheParkscanemploygraphanalysistouncoverstrengthofrelationshipsamonggroupsofguests(leaders,followers,influencers,cohorts).TheParkscanleveragethegraphanalysisresultstodirectpromotions(discounts,restaurantvouchers,travelvouchers)togroupleadersinordertoencouragetheseleaderstobringgroupsbacktotheparksmorefrequently.

FormoreinformationaboutsocialnetworkanalysisinR,checkouthttp://www.r-bloggers.com/an-example-of-social-network-analysis-with-

r-using-package-igraph/.

Text Mining

Text mining refers to the process of deriving usable information (metadata) from text files such as consumer comments, e-mail conversations, physician or technician notes, work orders, etc. Basically, text mining creates structured data out of unstructured data.

Text mining is a very powerful technique to show during an envisioning process, as many business stakeholders have struggled to understand how they can gain insights from the wealth of internal customer, product, and operational data. Text mining is not something that the data warehouse can do, so many business stakeholders have stopped thinking about how they can derive actionable insights from text data. Consequently, it is important to leverage envisioning exercises to help the business stakeholders imagine the realm of what is possible with text data, especially when that text data is combined with the organization's operational and transactional data.

For example, in Figure 6.14, the text mining tool has mined a history of news feeds about a particular product to uncover patterns and combinations of words that may indicate product performance and maintenance problems.

Figure 6.14 Text mining analysis

Typical text mining techniques and algorithms include text categorization, text clustering, concept/entity extraction, taxonomies production, sentiment analysis, document summarization, and entity relation modeling.
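As a rough illustration of what "creating structured data out of unstructured data" means, the following Python sketch turns free-text comments into one term-count record per comment and then totals the terms across all comments. The sample comments and the small `STOP_WORDS` set are invented for this example. Real text mining tools add stemming, phrase detection, and entity extraction on top of this basic step.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "to", "of",
              "for", "in", "at", "it", "we", "too"}


def tokenize(text):
    """Lowercase the text and keep word tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]


def term_counts(comments):
    """Build one structured record (term -> count) per free-text comment."""
    return [Counter(tokenize(c)) for c in comments]


# Hypothetical guest comments.
comments = [
    "The wait for Space Adventure was too long",
    "Long wait and the ride broke down",
    "Loved the parade and the food",
]

records = term_counts(comments)
totals = sum(records, Counter())   # term frequencies across all comments
top_terms = totals.most_common(2)  # the two most frequent terms
```

Once the comments are in this form, they can be joined to operational data (attraction, date, guest) like any other table.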

The Parks Ramifications

The Parks can mine guest comments, social media posts, and e-mails to flag and rank areas of concern and problem situations. The Parks can leverage the text mining results to locate unsatisfied guests in order to drive personal (face-to-face) guest intervention efforts.

For more information about text mining analysis using R, check out http://www.r-bloggers.com/text-mining-in-r-automatic-categorization-of-wikipedia-articles/.

Sentiment Analysis

Sentiment analysis can provide a broad and general overview of your customers' sentiment toward your company and brands. Sentiment analysis can be a powerful way to glean insights about the customers' feelings about your company, products, and services out of the ever-growing body of social media sites (Facebook, LinkedIn, Twitter, Instagram, Yelp, Snapchat, Vine, etc.) (see Figure 6.15).

Figure 6.15 Sentiment analysis

In Figure 6.15, the data science team conducted competitive sentiment analysis by classifying the emotions (e.g., anger, disgust, fear, joy, sadness, surprise) of Twitter tweets about our client and its key competitors. Sentiment analysis can provide an early warning alert about potential customer or competitive problems (e.g., where your organization's performance and quality of service are considered lacking as compared to key competitors) and business opportunities (e.g., where a key competitor's perceived performance and quality of service are suffering).

Unfortunately, it is sometimes difficult to get the social media data at the level of the individual, which is required to create more actionable insights and recommendations at the individual customer level. However, leading organizations are trying to incent their customers to “like” their social media sites or share their social media names in order to improve the collection of customer-identifiable data.

The Parks Ramifications

The Parks can establish a sentiment score for each attraction and character and monitor social media sentiment for the attractions and characters in real-time. The Parks can leverage the real-time sentiment scores to take corrective actions (placate unhappy guests, open additional lines, open additional attractions, remove kiosks, move characters).
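A sentiment score per attraction can be sketched with a simple word-list approach. The Python example below is a simplification under stated assumptions: the `POSITIVE` and `NEGATIVE` word lists and the sample posts are made up, and the score is plain polarity between -1 and 1 rather than the emotion classification shown in Figure 6.15.

```python
# Hypothetical sentiment lexicons; production lexicons are far larger.
POSITIVE = {"love", "great", "fun", "amazing"}
NEGATIVE = {"broken", "slow", "rude", "dirty"}


def sentiment_score(text):
    """Return (positive - negative) / (positive + negative), or 0.0 if no hits."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def score_by_attraction(posts):
    """Average the post-level scores for each attraction."""
    grouped = {}
    for attraction, text in posts:
        grouped.setdefault(attraction, []).append(sentiment_score(text))
    return {a: sum(s) / len(s) for a, s in grouped.items()}


# Hypothetical social media posts tagged with an attraction.
posts = [
    ("Monster Mansion", "Great ride, so much fun!"),
    ("Monster Mansion", "Line was slow but the ride was amazing"),
    ("Canyon Copter Ride", "Broken again and the staff was rude."),
]

scores = score_by_attraction(posts)
# Attractions with a negative average are candidates for corrective action.
needs_attention = [a for a, s in scores.items() if s < 0]
```

Recomputing these averages over a sliding time window is one simple way to approximate the real-time monitoring described above.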

For more information about sentiment analysis using R, check out https://sites.google.com/site/miningtwitter/questions/sentiment/sentiment.

Traverse Pattern Analysis

Traverse pattern analysis is an example of combining a couple of analytic algorithms to better understand customer, product, or operational usage patterns. Traverse analysis links customer or product usage patterns and association rules to a geographical or facility map in order to identify potential purchase, traffic, flow, fraud, theft, and other usage patterns and relationships.

The process starts by creating association rules from the customer's or product's usage data, and then maps those association rules to a geographical map (store, hospital, school, campus, sports arena, casino, airport) to identify potential performance, usage, staffing, inventory, logistics, traffic flow, and similar problems.

In Figure 6.16, the data science team created a series of association rules about slot and table play in a casino, and then used those association rules to identify potential foot flow problems and game location optimization opportunities. The data science team

1. Created player performance association rules about what games the players tend to play in combination,

2. Linked the game playing association rules to location ID, and then

3. Mapped rules and game performance data to a layout of the casino.

Figure 6.16 Traverse pattern analysis

The results of this analysis highlight areas of the casino that are sub-optimized when certain types of game players are in the casino and can lead to recommendations about the layout of the casino and the types of incentives to give players to change their game playing behaviors and tendencies.
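The three steps in the casino example can be sketched in a few lines of Python. This is only an illustration under assumptions: the play sessions, the `location` map, and the 0.5 support threshold are invented, and the "rules" are limited to pairs of games.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5):
    """Return {(a, b): (support, confidence of a -> b)} for frequent pairs."""
    n = len(sessions)
    item_counts = Counter(item for s in sessions for item in set(s))
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(set(s)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of sessions containing both games
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
    return rules


# Step 1: hypothetical play sessions (games one player played in one visit).
sessions = [
    ["blackjack", "slots"],
    ["blackjack", "slots", "roulette"],
    ["slots", "roulette"],
    ["blackjack", "slots"],
]
rules = pair_rules(sessions)

# Step 2: link each game to a (hypothetical) location ID.
location = {"blackjack": "floor-A", "slots": "floor-C", "roulette": "floor-C"}

# Step 3: flag frequently combined games that sit in different locations.
cross_floor = {
    pair: stats for pair, stats in rules.items()
    if location[pair[0]] != location[pair[1]]
}
```

Pairs that are played together often but sit far apart are the ones that suggest foot flow problems or a layout change.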

The Parks Ramifications

The Parks can employ traverse pattern analysis to understand park and guest flows with respect to attractions, entertainment, retail outlets, restaurants, characters, etc. The Parks can use the traverse pattern analysis results to

1. Identify where to place characters and situate portable kiosks in order to increase revenues,

2. Determine what promotions to offer in order to drive traffic to idle attractions and restaurants.

Decision Tree Classifier Analysis

Decision tree classifier analysis uses decision trees to identify groupings and clusters buried in the usage and performance data. Decision tree classifier analysis uses a decision tree as a predictive model that maps observations about an item to conclusions about the item's target value.

In Figure 6.17, the data science team used the decision tree classifier analysis technique to identify and group performance and usage variables into similar clusters. The data science team uncovered product performance clusters that, when occurring in certain combinations, were indicative of potential product performance or maintenance problems.
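A decision tree is built by repeatedly choosing the variable and cut-off value that best separate the outcomes. The Python sketch below shows only that first choice (a one-level tree, sometimes called a decision stump), using Gini impurity as the split criterion. The guest data and the choice of Gini are assumptions made for illustration; the source does not say which criterion the data science team used.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best single split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical guest visits: (wait time in minutes, rides taken) -> satisfaction.
rows = [(10, 5), (15, 3), (20, 6), (45, 4), (50, 5), (60, 2)]
labels = ["happy", "happy", "happy", "unhappy", "unhappy", "unhappy"]

feature, threshold, impurity = best_split(rows, labels)
```

In this toy data the split falls on wait time, which is the kind of result that tells a business user which variable to manipulate.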

Figure 6.17 Decision tree classifier analysis

The Parks Ramifications

The Parks can use decision tree classifier analysis to quantify the variables that drive guest satisfaction and increase spend by guest clusters. The Parks can leverage the decision tree classifier analysis results to determine which variables to manipulate in order to drive guest satisfaction and associated guest spend.

For more information about building decision trees using R, check out “Tree-Based Models” at http://www.statmethods.net/advstats/cart.html.

Cohorts Analysis

Cohorts analysis is used to identify and quantify the impact that an individual or machine has on the larger group.

Cohorts analysis is commonly used by sports teams to ascertain the relative value of a player with respect to his or her influence on the success of the overall team. The National Basketball Association uses a real plus-minus (RPM) metric to measure a player's impact on the game, represented by the difference between the team's total scoring and its opponent's while that player is on the court. Table 6.1 shows top RPM players from the 2014–2015 NBA season.

Table 6.1 2014–2015 Top NBA RPM Rankings

Rank | Player | Team | MPG | RPM
1 | Stephen Curry, PG | GS | 32.7 | 9.34
2 | LeBron James, SF | CLE | 36.1 | 8.78
3 | James Harden, SG | HOU | 36.8 | 8.50
4 | Anthony Davis, PF | NO | 36.1 | 8.18
5 | Kawhi Leonard, SF | SA | 31.8 | 7.57
6 | Russell Westbrook, PG | OKC | 34.4 | 7.08
7 | Chris Paul, PG | LAC | 34.8 | 6.92
8 | Draymond Green, SF | GS | 31.5 | 6.80
9 | DeMarcus Cousins, C | SAC | 34.1 | 6.12
10 | Khris Middleton, SG | MIL | 30.1 | 6.06

Source: http://espn.go.com/nba/statistics/rpm/_/sort/RPM

This powerful technique (with slight variations due to the different nature of the variables and relationships) can be used to quantify the impact that a particular individual (student, teacher, player, nurse, athlete, technician) has on the larger group (see Figure 6.18).
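The basic plus-minus idea behind this technique can be sketched as follows. The Python example sums, for each individual, the group's result during the periods when that individual was present. The player names and scores are hypothetical, and this is the raw plus-minus only; RPM further adjusts for teammates and opponents, which this sketch does not attempt.

```python
from collections import defaultdict


def plus_minus(stints):
    """Sum the team's point differential over the stints each player was part of."""
    totals = defaultdict(int)
    for players, points_for, points_against in stints:
        for p in players:
            totals[p] += points_for - points_against
    return dict(totals)


# Hypothetical stints: (players on the court, team points, opponent points).
stints = [
    ({"avery", "blake"}, 12, 8),
    ({"avery", "casey"}, 10, 5),
    ({"blake", "casey"}, 6, 11),
]

impact = plus_minus(stints)
```

The same pattern applies outside sports: replace stints with shifts, classes, or service calls, and replace points with satisfaction or spend.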

Figure 6.18 Cohorts analysis

The Parks Ramifications

The Parks can employ cohorts analysis to identify specific employees and characters that increase the overall park, attractions, characters, customer, and household satisfaction and spend levels. The Parks can use the results of the cohorts analysis to

1. Decide how many and where to situate specific, popular characters;

2. Reward park associates that drive higher customer satisfaction scores.

For more information about cohorts analysis in R, check out the article “Cohort Analysis with R – Retention Charts” at http://analyzecore.com/2014/07/03/cohort-analysis-in-r-retention-charts/.

Summary

The objective of Chapter 6 is to give you a taste for the different types of analytic algorithms a data science team can bring to bear on the business problems or opportunities that the organization is trying to address. This chapter better acquainted you with the different algorithms that the data science team can use to accelerate the business user and data science team collaboration process. While it is not the expectation of this book or chapter to turn business users into data scientists, it is my hope that Chapter 6 will set the foundation that helps business users and business leaders to “think like a data scientist.”

This chapter introduced a wide variety of analytic algorithms that the data science team might use, depending on the problem being addressed and the types and varieties of data available. It also introduced a fictitious company (Fairy-Tale Theme Parks) against which you applied the different analytic techniques to see the potential business actions (see Table 6.2).

Table 6.2 Case Study Summary

Analytics | Fairy-Tale Parks Use Cases | Potential Business Actions

Trend analysis | Perform trend analysis to identify the variables (e.g., wait times, social media posts, consumer comments) that are highly correlated to the increase or decrease in guest satisfaction for each attraction, restaurant, retail outlet, and entertainment | Flag problem areas and take immediate corrective actions (e.g., open more lines, promote less busy attractions, move kiosks, resituate characters). Identify the location for future attractions, restaurants, and entertainment

Boxplots | Leverage boxplots to determine most loyal guests for each of the park's attractions (e.g., Canyon Copter Ride, Monster Mansion, Space Adventure, Ghoulish Gulch) | Create guest current and maximum loyalty scores and use those scores to prioritize whom to reward with Priority Access passes and other coupons and discounts

Geography (spatial) analysis | Conduct geographical trend analysis to spot any changes (zip+4 and household levels) in the geo-demographics of visitors over time and by seasonality and holidays | Create geo-specific marketing campaigns and promotions to increase attendance from under-penetrated geographical areas based on day of week, holidays, seasonality, and events (on-park and off-park events)

Pairs plot | Compare multiple variables to identify those variables that drive guests to which attractions, entertainment, and restaurants | Make in-park promotional decisions and offers that move guests to under-utilized attractions, entertainment, retail outlets, and restaurants

Time series decomposition | Leverage time series decomposition analysis to quantify the impact that seasonality and events (in-park and off-park) have on guest visits and associated spend | Create season-specific marketing campaigns and promotions to increase number of guest visits and associated spend. Determine which local events outside of the parks (concerts, professional sporting events, BCS football games) are worthy of promotional and sponsorship spend

Cluster analysis | Cluster guests to create more actionable profiles of The Parks' most profitable and highest potential guest clusters | Leverage cluster results to prioritize and focus guest acquisition and guest activation, cross-sell and up-sell marketing efforts

Normal curve equivalent (NCE) analysis | Leverage NCE analysis to understand the price inflection points for different packages of attractions and restaurants | Leverage the price inflection points to create packages of attractions and restaurants to optimize pricing (by season, day of week, etc.) and create new Priority Access packages

Association analysis | Leverage market basket analysis to identify most popular and least popular “baskets” of attractions | Leverage most common “baskets” to create new pricing and Priority Access packages in order to better control traffic and wait times. Leverage least common “baskets” to create new pricing and Priority Access packages in order to drive traffic to under-utilized attractions

Graph analysis | Leverage graph analysis to uncover direction and strength of relationships among groups of guests (leaders, followers, influencers, cohorts) | Send promotions (discounts, restaurant vouchers, travel vouchers) to group leaders in order to encourage leaders to bring their groups back to the parks more often

Text mining | Mine guest comments, social media posts, and e-mail threads to flag areas of concern and problem situations | Identify and locate unsatisfied guests in order to prioritize and focus personal (face-to-face) guest intervention efforts

Sentiment analysis | Establish a sentiment score for each attraction and character and monitor social media sentiment for the attractions and characters in real-time | Leverage real-time sentiment scores to take corrective actions (placate unhappy guests, open additional lines, open additional attractions, remove kiosks, move characters)

Traverse pattern analysis | Leverage traverse pattern analysis to understand park and guest flows with respect to attractions, entertainment, restaurants, retail outlets, characters, etc. | Identify where to place characters and situate portable kiosks in order to drive increased revenue. Determine what promotions to offer in order to drive traffic to under-utilized attractions and restaurants

Decision tree classifier analysis | Use decision tree classifier analysis to quantify the variables that drive guest satisfaction | Leverage decision tree classifier analysis to determine which variables to manipulate in order to drive guest satisfaction and increase guest associated spend

Cohorts analysis | Identify specific employees and characters that tend to increase the overall park, attractions, characters, guest satisfaction, and spend levels | Leverage cohorts analysis to decide how many and where to situate characters. Identify and reward park employees that drive higher guest satisfaction scores

I strongly recommend that you stay current with the different analytic techniques that your data science team is using. Take the time to better understand when to use which analytic techniques. Buy your data science team lots of Starbucks, Chipotle, and whiskey, and your team will continue to open your eyes to the business potential of data science.

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Review each of the analytic algorithms covered in this chapter and write down one or two use cases where that particular analytic algorithm might be useful given your business situations.

Exercise #2: Revisit the key business initiative that you identified in Chapter 2. Write down two or three of the analytic algorithms covered in this chapter that you think might be appropriate to the decisions that you are trying to make in support of that key business initiative.

Exercise #3: Write down two or three bullet points about why you think those selected analytic algorithms might be most appropriate for your targeted business initiative.

Notes

1 R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. R's popularity has increased substantially in recent years. (Source: Wikipedia)

Chapter 7 The Data Lake

There is a major industry change happening with respect to how organizations store, manage, and analyze data. Not since the introduction of the data warehouse in the late 1980s have we seen something with the potential to transform how organizations leverage data and analytics to power their key business initiatives and rewire their value creation processes. This new data and analytics architecture is called the data lake, and it has the potential to be even more impactful than the data warehouse in transforming the way organizations integrate data and analytics into their business models. But as in all things related to big data, organizations must “think differently” with respect to how they design, deploy, and manage their data architecture.

Today's data warehouses are extremely expensive. As a result, most organizations limit how much data they store in their data warehouse, opting for 13 to 25 months of summarized data versus 15 to 25 years of detailed transactional and operational data. Unfortunately for data warehouses, it is in that detailed transactional, operational, sensor, wearable, social data and the growing body of internal and external unstructured data that actionable insights about your customers, products, campaigns, partners, and operations can be found.

For example, over the past 15 years, the US economy has gone through two full economic cycles where the economy was flying high, collapsed, and then recovered. By looking at each of their customers' product purchase patterns over those two economic cycles, organizations can predict how a customer is personally impacted by an economic downturn. A grocery chain, for instance, could monitor an individual customer's shopping baskets and purchase patterns to uncover changes that may indicate changes in that customer's personal economic situation. From the detailed point-of-sales transactions, the grocer could see a change in an individual's purchase behaviors (i.e., moving from expensive to lower-cost products, using more coupons and discounts, increasing purchases of private label products, and so on), which might indicate a change in his or her financial situation. These insights could provide new monetization opportunities and better ways to serve that customer given his or her financial situation, such as a personalized promotion highlighting more economical items for that particular customer.
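The grocery example depends on having detailed basket-level history. As a rough sketch of one such signal, the Python code below compares the share of private label items in a customer's baskets between an earlier and a later period. The basket contents are hypothetical, and private label share is only one of the behaviors the example mentions.

```python
def private_label_share(baskets):
    """Share of purchased items that are private label, across all baskets."""
    items = [item for basket in baskets for item in basket]
    if not items:
        return 0.0
    return sum(1 for _, is_private in items if is_private) / len(items)


def share_shift(earlier, later):
    """Change in private label share from the earlier to the later period."""
    return private_label_share(later) - private_label_share(earlier)


# Hypothetical baskets for one customer: (product, is_private_label).
earlier = [
    [("cereal", False), ("milk", False)],
    [("coffee", False), ("bread", True)],
]
later = [
    [("cereal", True), ("milk", True)],
    [("coffee", False), ("bread", True)],
]

shift = share_shift(earlier, later)
```

A large positive shift is a hint, not proof, of a change in the customer's financial situation, and it is only computable if the detailed transactions were kept.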

Introduction to the Data Lake

The data lake was born out of the “economics of big data” that allows organizations to store, manage, and analyze massive amounts of data at a cost that can be 20 to 50 times cheaper than traditional data warehouse technologies. Because of the agile underlying Hadoop/HDFS architecture that typically supports the data lake, organizations can store structured data (relational tables, csv files), semi-structured data (web logs, sensor logs, beacon feeds), and unstructured data (text files, social media posts, photos, images, video) as-is, without the time-consuming and agility-limiting need to predefine a data schema prior to data load.
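Storing data as-is and deciding on a schema only when the data is read is often called schema-on-read. The Python sketch below illustrates the idea with an in-memory dictionary standing in for the data lake. The `ingest` and `read_with_schema` functions and the sample records are hypothetical and do not correspond to any Hadoop API.

```python
import json


def ingest(lake, source, raw_record):
    """Append the record exactly as received; no parsing and no schema check."""
    lake.setdefault(source, []).append(raw_record)


def read_with_schema(lake, source, schema):
    """Parse raw JSON lines at read time, keeping and casting the fields in `schema`."""
    rows = []
    for raw in lake.get(source, []):
        record = json.loads(raw)
        rows.append({field: cast(record[field])
                     for field, cast in schema.items() if field in record})
    return rows


lake = {}  # stand-in for the data lake's raw storage
# Two point-of-sale records with different fields load without any changes.
ingest(lake, "pos", '{"customer": "c1", "amount": "19.99", "coupon": true}')
ingest(lake, "pos", '{"customer": "c2", "amount": "5.00", "store": 7}')

# The analyst picks the schema needed for this particular analysis.
spend = read_with_schema(lake, "pos", {"customer": str, "amount": float})
```

A different analysis could read the same raw records with a different schema, for example one that keeps the coupon flag, without reloading anything.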

However, the real power of the data lake is to enable advanced analytics or data science on the detailed and complete history of data in an attempt to uncover new variables and metrics that are better predictors of business performance. The data lake can do the following:

Eliminate data silos. Rather than having dozens of independently managed collections of data (e.g., data warehouses, data marts, spreadmarts), you can combine these sources into a single data lake for indexing, cataloging, and analytics. Consolidation of the data into a single data repository results in increased data use and sharing, while cutting costs through server and license reduction.

Store, manage, protect, and analyze data by consolidating inefficient storage silos across the organization.

Provide a simple, scalable, flexible, and efficient solution that works across block, file, or object workloads (i.e., a shared storage platform that natively supports both traditional and next generation workloads).

Reduce the costs of IT infrastructure.

Speed up time to insights.

Improve operational flexibility.

Enable robust data protection and security capabilities.

Reduce data warehouse workloads by reducing the burden of analytics-intensive queries that would be best done in a special-purpose analytics sandbox environment.

Free up data warehousing resources by off-loading Extract, Transform, and Load (ETL) processes from the data warehouse to the more cost-efficient, more powerful Hadoop-based data lake.

NOTE

It is typical that 40 to 60 percent of the data warehouse processing load is performing ETL work. Off-loading some of the ETL processes to the data lake can free up considerable data warehouse resources.

Unhandcuff the BI analysts and data science team from being reliant on the summarized and aggregated data in the data warehouse as the single source of data for their data analytics (and mitigate the unmanageable proliferation of “spreadmarts”1 that are being used by business analysts to work around the analytic limitations of the data warehouse).

The data lake solves a great many problems. However, it can also raise a lot of questions. In a paper titled “Beware the Data Lake Fallacy” (http://www.gartner.com/newsroom/id/2809117), Gartner raised cautions about the data lake, specifically around the assumption that all enterprise audiences are highly skilled at data manipulation and analysis. Gartner's point was that if a data lake focuses only on storing disparate data and ignores how or why data is used, governed, defined, and secured or how descriptive metadata is captured and maintained, the data lake risks turning into a data swamp. And without an adequate metadata strategy, every subsequent use of data means the analysts must start from scratch.

The ability of an organization to realize business value from big data relies on the organization's ability to easily and quickly:

Identify the “right and/or best data”

Define the analytics required to extract the value

Bring the data into an analytics environment (sandbox) suited for advanced analytics or data science work

Curate the data to a point where it is “suited” for analysis

Stand up the required infrastructure to support the analytics in accordance with the desired performance and throughput requirements

Execute the analytic models against the curated data to derive business value

Deploy the analytics into the production infrastructure

Deliver the analytic results in an actionable manner to the business

NOTE

Stating that the data lake is the “single repository for ALL your organization's data” does not mean that there are no other repositories of data across your organization. Your operational systems (such as SAP, Oracle Financials, PeopleSoft, and Siebel) will continue to store data for their own operational reporting needs, but the data from those data sources should eventually be loaded into the data lake. And your data warehouses, data marts, and Online Analytic Processing (OLAP) cubes will continue to store data for their own unique reporting and analysis needs, but the data for those repositories should be sourced from your data lake. In the end, the data lake is the “central” repository for ALL your organization's data.

Characteristics of a Business-Ready Data Lake

The data lake is not an incremental enhancement to the data warehouse, and it is NOT data warehouse 2.0. The data lake enables entirely new capabilities that allow your organization to address data and analytic challenges that the data warehouse could not address.

There are five characteristics that differentiate a business-ready data lake from the data warehouse (see Figure 7.1):

Figure 7.1 Characteristics of a data lake

Ingest. Ability to rapidly ingest data from a wide range of internal and external data sources, including structured and unstructured data sources. The data lake can accomplish rapid data ingestion because it can load the data as-is; that is, the data lake does not require any data transformations or pre-building a data schema before loading the data.

Store. A single or central repository for amassing ALL the organization's data including data from potentially interesting external sources. The data lake can store data even if the organization has not yet decided how it might use the data. As the Director of Analytics and Business Intelligence at Starbucks was quoted: “A full quarter of Starbucks transactions are made via its popular loyalty cards, and that results in “huge amounts” of data, but the company isn't sure what to do with [all that data] yet.” The same goes for social media data, as Starbucks has a team who analyzes social data, but “We haven't figured out what exactly to do with it yet.”2

Analyze. Provides the foundation for the analytics environment (or analytics sandbox) where the data science team is free to explore and evaluate different internal and external data sources with the goal of uncovering new customer, product, and operational insights that can be used to optimize key business processes and fuel new monetization opportunities.

Surface. Supports the analytic model development and the extracting of the analytic results (e.g., scores, recommendations, next best offer, business rules) that are used to empower frontline employees' and business managers' decision making and influence customer behaviors and actions.

Act. Enables the integration of the analytic results back into the organization's operational systems (call center, direct marketing, procurement, store operations, logistics) and management applications (reports, dashboards) that “closes the loop” with respect to optimizing data and analytics-based decision making across the organization.

Using the Data Lake to Cross the Analytics Chasm

I come from the data warehouse world, having gotten started in 1984 with Metaphor Computers. In fact, that was so long ago that we didn't even call it data warehousing, but instead called it “Decision Support” (which I'd argue is still a better name for what we are trying to do).

For most organizations, the data warehouses, and the Business Intelligence (BI) tools that run on top of the data warehouses, operate like a production environment with the following characteristics:

Ability to support the creation and delivery of operational and management reports and dashboards on a regularly scheduled basis (e.g., reports delivered end of day, end of week, end of quarter; dashboards updated every morning)

Predictable compute and processing load to run the ETL routines, generate the management and operational reports, and update the management dashboards

SLA-constrained in that there are not many extra processing cycles to get the ETL and report and dashboard generating jobs done within the 24-hour daily window

Heavily governed (data governance, auditability, traceability, data lineage, metadata management) to ensure that the historical data being reported is 100 percent accurate

Standardization of tools in order to better control BI and ETL tool acquisition, maintenance, training, and support costs

On the other hand, the analytics environment is dramatically different from the BI and data warehouse environment in its objectives, purpose, and operating characteristics (i.e., how it is used). The analytics environment is characterized as:

An exploratory environment where the data analysts want to quickly ingest and analyze lots of data

Unpredictable system load that is highly dependent on the analysts' daily work objectives, exploration needs, and ad hoc analytical requests

Heavily experimentation oriented to give the data analysts the freedom to test new data sources, new algorithms, new data enrichment techniques, and new tools

Loosely governed in that the data need not be managed under some governance umbrella until the data analysts first prove that there is some value in the data

“Best tool for the job” with the data analysts using whatever data visualization, data exploration, and analytic modeling tools with which they feel most comfortable

As a data warehouse manager, I hated the analytics team. Why? Because whenever its members needed data, they always came to my data warehouse for the data because they were told that the data warehouse was the “single version of the truth.” And the analytic team's data and query requests usually screwed up my production SLAs in the process (see Figure 7.2).

Figure 7.2 The analytics dilemma

The solution: put a Hadoop-based data store (data lake) in front of both the data warehouse and the analytics environments (see Figure 7.3).

Figure 7.3 The data lake line of demarcation

The data lake provides a “line of demarcation” between the production requirements of the data warehouse and the ad hoc, exploratory nature of the analytics environment. In addition, the data lake provides other benefits that we will cover later in this chapter.

Modernize Your Data and Analytics Environment

There are several actions that organizations can take today to exploit the value of the data lake to modernize their existing data warehouse and analytics environments.

Action #1: Create a Hadoop-Based Data Lake

The Hadoop Distributed File System (HDFS) provides a powerful yet inexpensive foundation for your data lake. HDFS is a cost-effective large storage system, and the Hadoop ecosystem around it adds low-cost, scalable computing and analytical capabilities (e.g., MapReduce, YARN, Spark). Built on commodity hardware clusters, HDFS simplifies the acquisition and storage of diverse data sources (see Table 7.1).

Table 7.1 Data Lake Data Types

Data Type | Example
Structured data | Relational databases, data tables, csv files
Semi-structured data | Web logs, sensor feeds, XML, JSON
Unstructured data | Social media posts, text notes, consumer comments, images, videos, audio

Once the data is in the Hadoop/HDFS system, MapReduce, YARN, Spark, and other Hadoop-based tools are available to prepare the data for loading into your data warehouse and analytic environments (see Figure 7.4).

Figure 7.4 Create a Hadoop-based data lake

The advantages of the data lake include:

Rapid data ingestion as-is. Organizations do not need to pre-define the schema or transform the data prior to loading the data into the data lake, which simplifies and speeds the process of amassing data from a variety of internal and external data sources.

Low-cost data and analytics environment built on commodity hardware servers and open source software that can be 20 to 50 times cheaper to store, manage, and analyze data than traditional data warehouse technologies.

100 percent linear compute scalability. When you need to double compute capacity, just double the number of nodes.

Action #2: Introduce the Analytics Sandbox

A data lake strategy supports the introduction of a separate analytics environment that off-loads the analytics being done today on your overly expensive data warehouse. This separate analytics environment provides the data science team an on-demand, fail-fast environment for quickly ingesting and analyzing a wide variety of data sources in an attempt to address immediate business opportunities independent of the data warehouse's production schedule and service level agreement (SLA) rules (see Figure 7.5).

Figure 7.5 Create an analytic sandbox

The analytics environment in Figure 7.5 couldn't be more different from your data warehouse environment. Your data warehouse environment is a production environment that needs to support the regular (daily, weekly, monthly, quarterly, annual) production of operational and management reports and dashboards that are used to run the business. To do that, data warehouse environments have strict service level agreements, are heavily governed, and limit the number of BI and ETL tools in order to control tool acquisition, maintenance, and training costs.

An analytics environment, on the other hand, is much more ad hoc and on-demand driven. The analytics environment must support the continuous exploration and evaluation of new sources of internal and external data in an attempt to uncover actionable insights about customers, products, and operations. The analytics environment must support the data science team's need to test new analytic tools and algorithms and develop new data transformation and enrichment techniques in search of those variables and metrics that are better predictors of business performance.

Action #3: Off-Load ETL Processes from Data Warehouses

Doing the ETL processing within your existing data warehouse is a common practice today. However, if your data warehouse is already overloaded and it is very expensive to add more processing capacity, why do that batch-centric, data transformation heavy lifting in the expensive data warehouse environment? That's like having a high-powered, ultra-cool Tesla haul turnips around the farm.

Free up data warehouse resources and improve your ETL process effectiveness by off-loading the ETL processes off your expensive data warehouse platform. Instead, perform the ETL work in the data lake. This allows organizations to leverage the natively parallel Hadoop environment to bring to bear the appropriate number of compute capabilities at the appropriate times to get the job done more quickly and more cost-effectively (see Figure 7.6).

Figure 7.6 Move ETL to the data lake

As we've discussed before, not only does using Hadoop for your ETL work make sense from a cost and processing effectiveness perspective, but it also gives organizations the capability to create new metrics that are often difficult to create using traditional ETL tools. For example, using Hadoop makes it much easier to create advanced customer purchase and product performance metrics around frequency (how often), recency (how recently), and sequencing (in what order) activities that could yield new insights that might be better predictors of customer behaviors and product performance.
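To show what frequency, recency, and sequencing metrics look like once they are derived from detailed transactions, here is a small Python sketch. It runs in plain Python on a handful of invented transactions, so it illustrates only the logic; in a data lake the same grouping and ordering would be expressed in a Hadoop-based tool and run in parallel across the full history.

```python
from datetime import date


def purchase_metrics(transactions, as_of):
    """Return {customer: (frequency, recency_days, product_sequence)}."""
    by_customer = {}
    # Sort by customer, then by purchase date, so each sequence is in order.
    for customer, day, product in sorted(transactions, key=lambda t: (t[0], t[1])):
        by_customer.setdefault(customer, []).append((day, product))
    metrics = {}
    for customer, events in by_customer.items():
        last_day = events[-1][0]
        metrics[customer] = (
            len(events),                          # frequency: how often
            (as_of - last_day).days,              # recency: how recently
            [product for _, product in events],   # sequencing: in what order
        )
    return metrics


# Hypothetical transactions: (customer, purchase date, product).
transactions = [
    ("c1", date(2015, 3, 1), "tent"),
    ("c2", date(2015, 3, 5), "boots"),
    ("c1", date(2015, 3, 20), "stove"),
]

metrics = purchase_metrics(transactions, as_of=date(2015, 3, 31))
```

Each of the three values can then be fed to the data science team as a candidate predictor of customer behavior.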

Analytics Hub and Spoke Analytics Architecture

We have spent a considerable amount of this chapter describing the data lake; now let's discuss why your organization needs a data lake. The value and power of a data lake are often not fully realized until we get into our second or third analytic use case. That is because it is at that point where the organization needs the ability to self-provision an analytics environment (compute nodes, data, analytic tools, permissions, data masking) and share data across traditional line-of-business silos (one singular or centralized location for all the organization's data) in order to support the rapid exploration and discovery processes that the data science team uses to uncover variables and metrics that might be better predictors of business performance.

This is a “Hub and Spoke” analytics environment where the data lake is the “hub” that enables the data science teams to self-provision their own analytic sandboxes and facilitates the sharing of data, analytic tools, and analytic best practices across the different parts of the organization (see Figure 7.7).

Figure 7.7 Hub and Spoke analytics architecture

The hub of the “Hub and Spoke” architecture is the data lake that provides:

Centralized, singular, schema-less data store with raw (as-is) data and massaged data

Mechanism for rapid ingestion of data with appropriate latency

Ability to map data across sources and provide visibility and security to users

Catalog to find and retrieve data

Costing model of centralized service

Ability to manage security, permissions, and data masking

Supports self-provisioning of compute nodes, data, and analytic tools without IT intervention

The spokes of the “Hub and Spoke” analytics architecture are the analytic use cases or applications that help the organization to optimize key business processes, deliver a more compelling customer experience, and uncover new monetization opportunities. The “spokes” have the following characteristics:

Ability to perform analytics (data science):

    Analytics sandbox (HDFS, Hadoop, Spark, Hive, HBase)

    Data engineering tools (Elasticsearch, MapReduce, YARN, Pivotal HAWQ, SQL)

    Analytical tools (SAS, R, Mahout, MADlib, H2O)

    Visualization tools (Tableau, DataRPM, ggplot2)

Ability to exploit analytics (application development):

    Third platform application (mobile app development, website app development)

    Analytics exposed as services to applications (APIs)

    Integrate in-memory and/or in-database scoring and recommendations into business process and operational systems

The “Hub and Spoke” analytics architecture enables the data science team to develop the predictive and prescriptive analytics that are necessary to optimize key business processes, provide a differentiated customer engagement, and uncover new monetization opportunities.

EarlyLearningsThereismuchwecanlearnfromthreedecadesdealingwiththelimitationsofthedatawarehouse.Thereareseverallessonsthatwecantakeawayfromourdatawarehousingexperiencesthatwecanapplytodaytoensurethatwedonotmakethesamemistakesindeployingadatalakestrategy.

Lesson #1: The Name Is Not Important

Several decades ago, a battle raged between data warehouse advocates (associated with Bill Inmon and the Corporate Information Factory) and data mart advocates (associated with Ralph Kimball and star schemas) regarding nomenclature and terminology. Countless years were wasted at trade shows, at seminars, and in conference rooms across the world debating which approach was the “right” approach. As a reminder:

Data warehouse or enterprise data warehouse (EDW) is a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. The enterprise data warehouse approach is often characterized as a top-down approach, more in alignment with the Online Transaction Processing (OLTP) systems from which the data was sourced. The data warehouse typically has an enterprise-wide perspective.

Data mart is typically oriented to a specific business function, department, or line of business. This enables each department or line of business to use, manipulate, and develop the data any way it sees fit, without altering information inside other data marts or the enterprise data warehouse. Data marts use the concept of “conformed dimensions” to integrate data across business functions, replicating in many ways the same data that is captured in the enterprise data warehouse.

Interestingly, history has shown that both approaches worked! There were certainly terminology, architectural, and deployment differences between the two approaches, but the bottom line is that they both required the same key capabilities, such as:

Capture of large amounts of historical data that could be used to analyze the performance of the key business entities (dimensions) and identify trends and patterns in the data

Data governance procedures and policies to ensure that the data stored in the data warehouse and data marts was 100 percent accurate

Master data management to ensure common definitions, terminology, and nomenclature across the lines of business

Ability to join or integrate data from different data sources coming from different business functions

End user query construction (using SQL and BI tools) that supported the generation of daily, weekly, monthly, and quarterly reports and dashboards and also supported the ad hoc slicing and dicing of the data (drill up, drill down, and drill across different data sources) to identify areas of over- and under-performance

The debate about whether it is a data lake or a data reservoir or an operational data store is neither useful nor constructive. Let's just pick a name and make it work, and data lake it is!

Lesson #2: It's Data Lake, Not Data Lakes

Having multiple data lakes replicates the same problems that were created with multiple data warehouses: disparate data silos and data fiefdoms that don't facilitate sharing of the corporate data assets across the organization. Organizations need to have a single data lake from which they can source the data for their BI/data warehousing and analytic needs. The data lake may never become the “single version of the truth” for the organization, but then again, neither will the data warehouse. Instead, the data lake becomes the “single or central repository for all the organization's data” from which all the organization's reporting and analytic needs are sourced.

Unfortunately, some organizations are replicating the bad data warehouse practice by creating special-purpose data lakes, that is, data lakes to address a specific business need. Resist that urge! Instead, source the data that is needed for that specific business need into an “analytic sandbox” where the data scientists and the business users can collaborate to find those data variables and analytic models that are better predictors of the business performance. Within the “analytic sandbox,” the organization can bring together (ingest and integrate) the data that it wants to test, build the analytic models, test the model's goodness of fit, acquire new data, refine the analytic models, and retest the goodness of fit. Yep, the analytic sandbox perfectly supports the data science engagement process covered in Chapter 5, “Differences Between Business Intelligence and Data Science” (see Figure 7.8).

Figure 7.8 Data science engagement process

If an organization tries to maintain multiple data lakes, it risks the same results and management distrust that still exist today with many data warehouse implementations: executives arguing about which numbers are correct because the data in the reports and dashboards is being sourced from different data warehouses and data marts. Let's nip this problem in the bud now! It's a single data lake.

Lesson #3: Data Governance Is a Life Cycle, Not a Project

I love (hate) the industry pundits who quickly jump on the “What about data governance?” issue when we talk about big data and the data lake. Well, what about it? Of course it is important, and of course smart organizations never forgot about it. As the volume of data grows in the data lake, governance becomes even more critical for answering the what, where, and who-has-access-to-data questions.

However, the data governance discussion takes on a new wrinkle when you contemplate data in the data warehouse versus data in the data lake. While data in the data warehouse strives for 100 percent governance, organizations are going to realize that there need to be different “degrees” or levels of data governance in the data lake depending on how the data is being used, such as:

Highly Governed Data. Data that will be sourced out of the data lake into the data warehouse needs to be highly governed. This includes operational and performance data, as well as data such as medical, financial, personally identifiable information, credit card info, account numbers, passwords, etc. Since this is the data that appears in management, compliance, and regulatory reporting, this data needs to be 100 percent governed.

Moderately Governed Data. Data that is going to be used by the data science team to create predictive and prescriptive models in an attempt to predict performance needs to be moderately governed. The level of data governance will ultimately be determined by the cost of the analytic models being wrong (think Type I “false positive” and Type II “false negative” modeling errors and the potential business impacts of those types of errors).

Ungoverned Data. Data that is just being held in the data lake and for which no value has yet been attributed would be ungoverned. The data science team is free to acquire and experiment with this ungoverned data. However, once there is some level of value established for the data (e.g., the data is used to power a financial client “Retirement Readiness” score), then the data needs to move into the moderately governed data classification.

In the big data world, the goal for the smart organization should be “just-enough data governance.” Why waste management cycles governing data when that data might not even be useful to the organization? But once the value of that data has been ascertained in terms of how it is going to be used to optimize key business processes and uncover new monetization opportunities, then the appropriate level of data governance (highly governed, moderately governed, ungoverned) can be applied to that specific data source.
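To make the “just-enough data governance” idea concrete, here is a minimal Python sketch of how the three governance levels might be assigned from the way a data source is used. The function names, the three boolean flags, and the error rates and costs are all hypothetical illustrations, not part of any product or of the data lake itself; the second function simply weighs Type I and Type II error rates by their assumed business costs.

```python
from enum import Enum


class Governance(Enum):
    UNGOVERNED = "ungoverned"
    MODERATE = "moderately governed"
    HIGH = "highly governed"


def governance_level(feeds_warehouse: bool, is_sensitive: bool,
                     powers_a_model: bool) -> Governance:
    """Map how a data source is used to one of the three governance levels."""
    # Data sourced into the data warehouse, or sensitive data such as
    # financial or personally identifiable information, is highly governed.
    if feeds_warehouse or is_sensitive:
        return Governance.HIGH
    # Data with established value (it powers a model or score) is
    # moderately governed.
    if powers_a_model:
        return Governance.MODERATE
    # Data that is only being held, with no value yet attributed, is ungoverned.
    return Governance.UNGOVERNED


def expected_error_cost(false_positive_rate: float, false_positive_cost: float,
                        false_negative_rate: float, false_negative_cost: float) -> float:
    """Expected cost per prediction of Type I and Type II errors (assumed inputs)."""
    return (false_positive_rate * false_positive_cost
            + false_negative_rate * false_negative_cost)


if __name__ == "__main__":
    # A data source that powers a score but does not feed the warehouse.
    print(governance_level(False, False, True).value)
    # Hypothetical error rates and per-error costs.
    print(expected_error_cost(0.05, 10.0, 0.02, 200.0))
```

A higher expected error cost would argue for moving a data source toward the more heavily governed end of the scale; where exactly to draw that line is a business judgment, not something the sketch decides.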

Lesson #4: Data Lake Sits Before Your Data Warehouse, Not After It

Several traditional data warehouse vendors are trying to convince their customers that the data lake should sit after the data warehouse; that is, the data lake should be populated from the data warehouse versus populating the data warehouse from the data lake. Sorry, but that's a self-serving proposition from vendors who are already seeing the economic impact on their revenues and profits with respect to the power of the data lake to reshape how organizations store, manage, analyze, and value data.

The problem with this “Data Warehouse First” argument is that many of the data lake benefits (rapid data ingest, capturing data as-is with no need to prebuild a data schema, support for unstructured data sources, no loss of data fidelity due to data transformations, single repository for all the organization's internal and external data) are lost if the data first needs to go through the data warehouse.

I am sure that the “Data Warehouse First” message initially resonates with organizations that have spent years building out their data warehouse capabilities. But data warehouse teams are beginning to understand the benefits of loading the data first into the data lake, including freeing up data warehouse resources from doing the ETL work and supporting the advanced analytic modeling that cannot easily be done within the data warehouse.

What Does the Future Hold?

The cost, processing, and agility advantages of Hadoop/HDFS would make it appear that it is only a matter of time before Hadoop/HDFS replaces the Relational Database Management Systems (RDBMS) as the data warehouse platform of choice. The Hadoop/HDFS cost, processing, and agility advantages over the expensive commercial and proprietary RDBMS products will soon become too much for organizations to ignore.

Today there is much inertia that keeps organizations from moving off the RDBMS data warehouse platform. Organizations not only have invested years and even decades of effort to build their data warehouse environment on these RDBMS platforms but also have created a multitude of BI reports and dashboards on these RDBMS-based data warehouses that act as a giant anchor in dissuading organizations from contemplating transitioning to a Hadoop/HDFS data warehouse platform.

But the times they are a-changin'. The development and rapid adoption of open source “SQL on Hadoop” products like Pivotal HAWQ (now part of the Open Data Platform initiative), Cloudera Impala, and Hortonworks Stinger are enabling the legions of SQL-trained developers to develop SQL-based reports and dashboards on Hadoop.

Plus, the development of new software products like AtScale (which acts as a layer between Hadoop/HDFS and an organization's existing BI tools) and Xplain.io³ (which automates the rewriting of RDBMS SQL to work on Hadoop) will accelerate the transition of the data warehouse platform to Hadoop/HDFS (see Figure 7.9).

Figure 7.9 What does the future hold?

For many organizations, data warehouse decisions are fraught with personal biases. And years and decades of data warehouse and BI development, personnel training, and tool acquisition will make any transition off the RDBMS data warehouse platform to a Hadoop data warehouse platform more of a religious debate than a financial or technology decision. But soon, the economics of big data, plus the continued development of new tools to support the data warehouse on Hadoop, will be too compelling to ignore. And I want to be there when that day comes!

Summary

These are certainly marvelous times to be in the data business. The data lake leverages new big data technology innovations to enable organizations to extend and enhance their existing data warehouse and ETL investments while empowering business analysts and data scientists to explore new data sources and data enrichment techniques to tease out new actionable insights about their customers, products, and operations.

Figure 7.10 shows EMC's pre-engineered Federation Business Data Lake. It is one of the industry's most complete, well-thought-out data lake architectures, as it lays out the key components and services necessary (including data governance, cataloging, data ingest, indexing, and searching) as organizations move to an enterprise-ready data lake, or as I like to call it, data lake 2.0.

Figure 7.10 EMC Federation Business Data Lake

The industry is only at the early stages of the data lake era. There is much still to be written about how the data lake will dramatically change the ways that organizations store, manage, analyze, and value data. Heck, maybe I will need to write a third book after all. Watch this space!

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: List the benefits that the data lake could bring to your organization's existing data warehousing environment.

Exercise #2: List the benefits that the data lake brings to your organization's business analysts and data scientists.

Exercise #3: List the issues that are preventing your organization from moving its data warehouse environment from an RDBMS-based platform to a Hadoop-based platform.

Exercise #4: For each of the issues listed in Exercise #3, capture what your organization would need to see happen (e.g., tools, training, references, management support) in order to address that issue.

Notes

1. “Spreadmarts” are spreadsheets or desktop database management applications (Microsoft Access) that are created and maintained by business analysts outside the purview, support, and maintenance of the centralized information technology organization. Spreadmarts typically contain data that may have originated from the data warehouse but has been transformed and integrated with other data to support the business analysts' specific analysis needs.

2. http://adage.com/article/datadriven-marketing/starbucks-data-pours/240502/

3. Acquired by Cloudera.

Part III: Data Science for Business Stakeholders

In This Part

Chapter 8: Thinking Like a Data Scientist

Chapter 9: “By” Analysis Technique

Chapter 10: Score Development Technique

Chapter 11: Monetization Exercise

Chapter 12: Metamorphosis Exercise

Chapter 8: Thinking Like a Data Scientist

One of the most frequent questions I get is: “How do I become a data scientist?” Wow, tough question. There are many outstanding books and university courses that outline the different skills, capabilities, and technologies that a data scientist is going to need to learn and eventually master. I've read several of these books and am impressed with the depth of the content.

Most of these books spend the vast majority of their time covering topics such as statistics, data mining, text mining, and data visualization techniques. Yes, these are very important data science skills, but they are not nearly sufficient to make our data science teams effective. And it is not practical, or even beneficial, to try to turn your entire workforce into data scientists. Nevertheless, it is realistic to teach the business stakeholders to “think like a data scientist” so that they understand the types of business opportunities that can be driven by applying predictive and prescriptive analytics to new sources of customer, product, and operational data, and so that they can help the data science team uncover those variables and metrics that are better predictors of performance.

The potential of thinking like a data scientist first hit me when I was the Vice President of Advertiser Analytics at a large Internet portal company. I was chartered with building the analytics to help our advertisers and agencies improve the performance of their marketing spend across the Internet portal's ad network. When I joined the company, I knew very little about the digital marketing world. So I spent the first three months on the road shadowing the company's top advertisers and their respective advertising agencies to better understand their analytic expectations and requirements.

After about three weeks on the road, I thought the project was going to be a disaster. Every one of the analytic teams at the advertisers and ad agencies with whom we met just wanted more data in a timelier manner. We were already giving these analyst teams most of the data that we had, yet they did not seem to be able to leverage this data to improve their performance across our ad network.

That is when one of my team members had one of those light bulb moments: we were talking to the wrong people. It wasn't the analysts within these advertisers and ad agencies who were making the digital marketing execution decisions; it was the media planners and buyers (who were deciding to which ad networks to allocate the marketing or campaign spend prior to campaign launch) and the campaign managers (who were trying to make in-flight campaign adjustments using retrospective, after-the-fact, descriptive reporting).

So we switched the entire focus of our product development efforts to these key business stakeholders and to capturing the decisions that they had to make and the questions that they had to answer in support of their digital marketing campaigns. And that's when the fundamentals behind the “thinking like a data scientist” process were born.

This chapter will introduce a framework, techniques, and hands-on exercises to help business stakeholders “think like a data scientist.” The “thinking like a data scientist” framework will help the business stakeholders to collaborate with data scientists to uncover those variables and metrics that can improve business performance and drive business and financial value.

Data science teams need help from the business users, or subject matter experts (SMEs), to understand the decisions the business is trying to make, the hypotheses that the business wants to test, and the predictions that the business needs to make. The eight-step “thinking like a data scientist” framework covers:

Step 1: Identify Key Business Initiative

Step 2: Develop Business Stakeholder Personas

Step 3: Identify Strategic Nouns

Step 4: Capture Business Decisions

Step 5: Brainstorm Business Questions

Step 6: Leverage “By” Analysis

Step 7: Create Actionable Scores

Step 8: Putting Analytics into Action

In essence, to improve the overall effectiveness of our data science teams, we need to teach the business users to think like a data scientist. As an outcome of this eight-step process, the business stakeholders and the data scientists should be better prepared to uncover those variables and metrics that are better predictors of performance.

The Process of Thinking Like a Data Scientist

The basic goal of data science is to uncover new variables or metrics that are better predictors of performance. But “performance” of what? That is, upon what should the data science team focus its analytic exploration and modeling development efforts? It should be no surprise that the starting point for our “thinking like a data scientist” process is understanding the organization's key business initiatives.

Step 1: Identify Key Business Initiative

Would you expect anything different from me than starting with what's important to the business? So, how can you spot a key business initiative? As was covered in Chapter 3, “The Big Data Strategy Document,” a key business initiative is characterized as:

Critical to the immediate-term performance of the organization

Documented (communicated either internally or publicly)

Cross-functional (involves more than one business function)

Owned and championed by a senior business executive

Has a measurable financial goal

Has a well-defined delivery timeframe (9 to 12 months)

Undertaken to deliver significant, compelling, and/or differentiated financial or competitive advantage

It is critical to the success of your big data efforts to target business initiatives that are focused on the next 9 to 12 months. Business initiatives that run longer than 12 months lack the sense of urgency needed to motivate the organization and risk becoming a “science experiment” project with all sorts of new and sometimes random requirements being thrown into the mix.

CROSS-REFERENCE

See Chapter 3, “The Big Data Strategy Document,” to review ideas on how to leverage publicly available information (e.g., annual reports, analyst calls, executive speeches, company blogs, SeekingAlpha.com) in order to uncover an organization's key business initiatives.

For purposes of this exercise, we are going to pretend that our client is Foot Locker and that our target business initiative is “improve merchandising effectiveness” as highlighted in Foot Locker's 2010 annual report (see Figure 8.1).

Figure 8.1 Foot Locker's key business initiatives

Merchandising is defined as the planning and promotion of sales by presenting a product to the right market at the proper time, by carrying out organized, skillful advertising, using attractive displays, etc.¹ Figure 8.2 shows some examples of different merchandising approaches at a Foot Locker retail store.

Figure 8.2 Examples of Foot Locker's in-store merchandising

Step 2: Develop Business Stakeholder Personas

The next step in the “thinking like a data scientist” process is to identify the key business stakeholders who either impact or are impacted by the targeted business initiative (e.g., sales, marketing, finance, store operations, logistics, inventory, manufacturing). There are typically three to five different business stakeholders who are impacted by a given business initiative. We want to develop a persona for each of these business stakeholders to understand better their work environment and job characteristics. Understanding the work environment and job characteristics of the business stakeholders helps to start identifying the decisions and questions that these stakeholders must address with respect to the targeted business initiative.

A persona is a one- to two-page “day in the life” description that makes the key business stakeholders “come to life” for the data science and user experience (UEX) development teams. Personas are useful in understanding the goals, tasks, key decisions, key questions, and pain points of the key business stakeholders. The persona helps the data science team to start to identify the most appropriate data sources and analytic techniques to support the decisions that the business users are trying to make and the questions that they are trying to answer. Personas should be created for each type of business stakeholder affected by the targeted business initiative.

For the Foot Locker “improve merchandising effectiveness” business initiative, the business stakeholders for whom we would want to build personas could include:

Customers, who contemplate visiting a store, visit the store, and make purchase decisions while they are in the store. Customers can come to the store under many scenarios (e.g., buy something for themselves, buy something for someone else like a son or daughter, browse to see what might be interesting, or browse for products that they then buy online). In each of these scenarios, the customer considers many factors (function, price, value, urgency, aesthetics, social perceptions, etc.) before making a purchase decision.

Store managers, who are in charge of the general operations of a store. Store managers are responsible for meeting the store's sales and budget goals. To accomplish that goal, store managers make decisions to create schedules, ensure the store is stocked, create and maintain budgets, and coordinate in-store merchandising and marketing programs.

Merchandise managers, who oversee the selection, acquisition, promotion, and sale of products in a retail setting. Merchandise managers typically sit at the corporate headquarters and study market trends and customer demographics in order to make decisions about how to best price, stock, display, and promote products.

Buyers, who are responsible for sourcing new products and analyzing existing product sales. Buyers, who also typically sit at corporate, need to research, plan, analyze, and choose the types, quality, and prices of the products that they need to source. These buying decisions will be based on consumer demands, industry trends, budget, and the company's overall business strategy.

A persona for the store manager could look like Figure 8.3.

Figure 8.3 Foot Locker's store manager persona

Icon source: theamm.org

I highlighted in Figure 8.3 some of the key decisions that the store manager needs to make in support of the “improve merchandising effectiveness” business initiative.

Step 3: Identify Strategic Nouns

Strategic nouns are the key business entities around which the targeted business initiative is focused. For these key business entities, we are trying to understand and quantify behaviors, tendencies, propensities, patterns, interests, passions, affiliations, and associations in order to predict likely actions and prescribe actionable recommendations. These strategic nouns are critical to our data scientist thinking process because these are the entities around which we will ultimately build individual analytic profiles and the supporting predictive and prescriptive models. Examples of strategic nouns include customers, patients, students, employees, stores, products, medications, trucks, wind turbines, etc.

For the Foot Locker “improve merchandising effectiveness” business initiative, the strategic nouns or key business entities on which we will focus are (see Figure 8.4):

Customers

Products

Marketing campaigns

Stores

Figure 8.4 Foot Locker's strategic nouns or key business entities

Step 4: Capture Business Decisions

The next part of the “thinking like a data scientist” process is to capture the decisions that the business stakeholders need to make about the strategic nouns in support of the targeted business initiative. We started to capture some of the key decisions as we built out the business stakeholder personas. However, we want to expand the efforts by brainstorming with each of the different stakeholders the decisions they need to make about each strategic noun in support of the targeted business initiative.

Capturing and validating these decisions is critical to the “thinking like a data scientist” process. Leading organizations like Uber and Netflix are disruptive because they build a business model that seeks to simplify their targeted personas' key decisions. For Uber, one of the decisions that it addresses is “How do I easily get from where I am to where I want to be?” For Netflix, one of the decisions that it addresses is “What content [movie, TV show] can I easily watch tonight?”

For our Foot Locker example, here are some customer promotional decisions that the business stakeholders need to make in support of the “improve merchandising effectiveness” business initiative:

Select customers to whom to send promotional offers.

Determine which types of promotional offers to send to those targeted customers.

Determine when to send those promotional offers.

Here are some product promotional decisions that the business stakeholders need to make in support of the “improve merchandising effectiveness” business initiative:

Decide which products or combinations of products to promote.

Decide which types of in-store merchandising to employ.

Choose the best places within the stores to display promoted products.

Choose the best in-store promotions for reducing outdated inventory.

The capture and validation of the key business decisions are critical because these decisions:

Drive the development of the analytic (predictive and prescriptive) models by the data science team to support these decisions

Support the determination of the user experience/user presentation requirements, that is, where and how the analytic insights (recommendations, scores, rules) get presented to the business stakeholders in a way that is actionable

Step 5: Brainstorm Business Questions

Probably the hardest part of the “thinking like a data scientist” exercise is to brainstorm the questions the business stakeholders need to answer to support the decisions that support the targeted business initiative. As you can see from Figure 8.5, the questions form the foundation for the entire “thinking like a data scientist” process.

Figure 8.5 Thinking like a data scientist decomposition process

I am confident when I say that I have never met business users who did not already know the questions that they are trying to answer. However, the biggest challenge is not to capture the questions that they are trying to answer but to get the business users to expand their line of thinking to contemplate the questions that they have given up trying to answer.

The reason why this may be the hardest part of the process is that it requires the business stakeholders to think differently about the types of questions that they can ask. We want the business stakeholders to expand their thinking about the business questions to include:

Predictive analytics: Predicting what is likely to happen

Prescriptive analytics: Recommending what to do next

A key part of the “thinking like a data scientist” process is getting the business stakeholders to transition from descriptive analytics (using Business Intelligence tools to report on what happened) to predictive analytics (to predict what is likely to happen) to prescriptive analytics (to recommend what to do).

As discussed in Chapter 5, “Differences Between Business Intelligence and Data Science,” we need the business stakeholders to transition their thinking to contemplate these predictive questions and prescriptive statements. See Table 8.1 for an example of the evolution from descriptive to predictive to prescriptive analytics.

Table 8.1 Evolution of Foot Locker's Business Questions

What Happened? (Descriptive Analytics) | What Will Happen? (Predictive Analytics) | What Should I Do? (Prescriptive Analytics)

How many Nike Hyperdunks did I sell last month? | How many Nike Hyperdunks will I sell next month? | Order [50] Nike Hyperdunks to support next month's sales projections.

What were apparel sales by zip code for Christmas last year? | What will be apparel sales by zip code over this Christmas season? | Hire [3] temporary reps for Store 12234 to handle projected Christmas sales.

How many of Jordan AJ Futures were returned last month? | How many of Jordan AJ Futures will be returned next month? | Set aside [$125K] in financial reserve to cover Jordan AJ Futures returns.

What were company revenues and profits for the past quarter? | What are projected company revenues and profits for next quarter? | Mark down [LeBron Foundation apparel] by 20 percent to reduce inventory before new product releases.

How many employees did I hire last year? | How many employees will I need to hire next year? | Increase hiring pipeline by 35 percent to achieve hiring goals.
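The progression in the first row of Table 8.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales history is made-up data, the forecast is a naive moving average standing in for whatever predictive model the data science team would actually build, and the function names and the safety-stock parameter are hypothetical.

```python
import math
from statistics import mean

# Hypothetical monthly unit sales for one product, oldest month first.
monthly_units_sold = [42, 47, 45, 52, 50, 54]


def units_sold_last_month(history):
    """Descriptive: what happened?"""
    return history[-1]


def forecast_next_month(history, window=3):
    """Predictive: naive forecast as the average of the last `window` months."""
    return mean(history[-window:])


def recommended_order(forecast, on_hand, safety_stock=0):
    """Prescriptive: units to order so stock covers the forecast plus a buffer."""
    return max(0, math.ceil(forecast + safety_stock - on_hand))


if __name__ == "__main__":
    forecast = forecast_next_month(monthly_units_sold)
    print("Sold last month:", units_sold_last_month(monthly_units_sold))
    print("Forecast for next month:", forecast)
    print("Recommended order:", recommended_order(forecast, on_hand=10, safety_stock=5))
```

The point of the sketch is the shape of the hand-off: the descriptive number feeds the prediction, and the prediction feeds a specific, actionable statement of the kind shown in the third column.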

Continuing our Foot Locker “improve merchandising effectiveness” example, we want the business stakeholders to brainstorm the questions that support the customer promotional decisions from the perspectives of descriptive, predictive, and prescriptive analytics. Here are some examples of these different types of customer promotional questions:

Descriptive Analytics (Understanding what happened)

Which customers are most receptive to which types of merchandising campaigns?

What are the characteristics of customers (e.g., age, gender, customer tenure, life stage, favorite sports) who are most responsive to merchandising offers?

Are there certain times of year when certain customers are more responsive to merchandising offers?

Predictive Analytics (Predicting what will happen)

Which customers are most likely to visit the store for a back-to-school promotion?

Which customers are most likely to respond to the new Michael Jordan basketball shoe?

Which customers are most likely to respond to a 50 percent off in-store markdown on Nike apparel?

Which customers are likely to respond to an offer of a free pair of Jordan Elite socks when they buy new shoes?

Prescriptive Analytics (Recommending what to do next)

E-mail Bill Schmarzo a 50 percent discount coupon for two pairs of Nike Elite socks when he buys his new pair of Air Jordans.

Text Max Schmarzo that he will receive a triple-point bonus when he buys Nike apparel this coming weekend.

Mail Alec Schmarzo a $20 cash coupon good only if he visits the store within the next 14 days.

For our Foot Locker “improve merchandising effectiveness” example, we want to brainstorm the questions that support our product promotional decisions. Here are some examples of the different types of product promotional questions that support the “improve merchandising effectiveness” business initiative:

Descriptive Analytics (Understanding what happened)

What are the top selling products and product categories?

What products are most responsive to in-store merchandising campaigns?

How many basketball shoes did I sell during last year's high school and youth basketball seasons?

Which products are hot movers that I might want to feature at the front of the store?

Which products are slow movers that I might need an in-store merchandising campaign to move?

Which products sell best at which times of the year/sports season?

Predictive Analytics (Predicting what will happen)

Which shoes and apparel are most likely to sell with a back-to-school promotional event?

Which basketball shoes and what sizes am I likely to need to stock given the upcoming high school and youth basketball seasons?

What is the likely market basket revenue and margin from a Buy One Get One Free (BOGOF) event?

Prescriptive Analytics (Recommending what to do next)

With the upcoming high school basketball season, promote Air Jordans and Nike Elite socks in the same display at the front of the store.

Given the end of the football season, provide in-store BOGOF promotion of football apparel.

Reduce prices 50 percent on the inventory of baseball cleats in anticipation of incoming new baseball equipment.

CROSS-REFERENCE

Because of the depth of the topics, step 6, “leverage ‘By’ analysis to uncover new metrics and variables that might be better predictors of performance,” and step 7, “create actionable scores that the business stakeholders can use to support the targeted business initiative,” are covered in Chapters 9 and 10, respectively.

Step 8: Putting Analytics into Action

This is the part of the “thinking like a data scientist” process when the highly specialized data science work happens (using some of the analytic techniques covered in Chapter 6). The data science team will test, refine, and validate that we have identified the right metrics, variables, and scores. The data science team can then recommend to the business stakeholders how the analytics will support the decisions that support the targeted business initiative.

For example, it is not sufficient to know that there is an increase in head injuries, lacerations, and broken bones for hospitals near a National Football League (NFL) football stadium after an NFL game. One has to know (from the data science work) that there is a 27 percent increase if one is to make prescriptive recommendations about additional emergency room nurses, physicians, and supplies.
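Here is a small worked example, in Python, of why the specific number matters. Only the 27 percent increase comes from the text; the baseline case count and the cases-per-nurse ratio are made-up assumptions, and the function name is hypothetical.

```python
import math


def additional_staff(baseline_cases, pct_increase, cases_per_staff):
    """Extra staff needed to absorb a percentage increase in cases."""
    projected_cases = baseline_cases * (1 + pct_increase / 100)
    baseline_staff = math.ceil(baseline_cases / cases_per_staff)
    projected_staff = math.ceil(projected_cases / cases_per_staff)
    return projected_staff - baseline_staff


if __name__ == "__main__":
    # Assumed: 40 cases on a normal shift, one nurse per 5 cases.
    print(additional_staff(baseline_cases=40, pct_increase=27, cases_per_staff=5))
```

Knowing only that “cases go up” gives the scheduler nothing to act on; plugging in the quantified increase turns the insight into a staffing number.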

After the data science team has done its magic to validate the metrics, variables, and scores that are better predictors of business performance, then the next step in the “thinking like a data scientist” process is “putting the analytics into action” with respect to what analytics-driven scores or recommendations to deliver to the business stakeholders. We spent a considerable amount of time in Chapter 4, “The Importance of the User Experience,” detailing how critical the user experience is to the ultimate success of the organization's big data initiatives. Remember: If you can't present the analytic results in a way that is actionable, then why even bother?

You can facilitate the development of a compelling and actionable user experience by starting with a simple “recommendations worksheet.” The recommendations worksheet links the decisions that our business stakeholders need to make to the predictive and prescriptive analytics that the data science team is going to build. The recommendations worksheet starts with the decisions captured in step 4, and then identifies the potential recommendations that could be delivered to the business stakeholders in support of those decisions. Finally, the worksheet captures the potential scores (and the supporting variables and metrics) that can be used to power the recommendations.

In summary,

Decisions → Recommendations → Scores (Supporting Metrics)

See Figure 8.6 for a simple template that we can use to guide the recommendations process.

Figure 8.6 Recommendations worksheet template
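The Decisions → Recommendations → Scores chain also maps naturally onto a simple data structure. The Python sketch below is one possible way to capture a worksheet row, not the layout of the figure: the class name, field names, and the example score name are hypothetical, while the decision and the recommendation are taken from the Foot Locker examples earlier in the chapter.

```python
from dataclasses import dataclass, field


@dataclass
class WorksheetRow:
    """One row of a recommendations worksheet."""
    decision: str
    recommendations: list[str] = field(default_factory=list)
    scores: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the row as decision -> recommendations -> scores."""
        return (f"{self.decision} -> {'; '.join(self.recommendations)}"
                f" -> {', '.join(self.scores)}")


if __name__ == "__main__":
    row = WorksheetRow(
        decision="Decide which products or combinations of products to promote",
        recommendations=[
            "Promote Air Jordans and Nike Elite socks in the same display"
        ],
        # Hypothetical score name, for illustration only.
        scores=["Product affinity score"],
    )
    print(row.summary())
```

Keeping each row explicit like this makes it easy to spot a decision that has no recommendation yet, or a recommendation that has no score to power it.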

Let's see the recommendations worksheet in action. For our Foot Locker “improve merchandising effectiveness” business initiative, the resulting recommendations worksheet could look like Figure 8.7.

Figure 8.7 Foot Locker's recommendations worksheet

The last step (and possibly the most fun step) is the creation of the user experience mock-up that validates that we are building the right analytics and have a thorough understanding of where and how to deliver the analytic results, scores, and recommendations (e.g., management dashboards, reports, call center, procurement, sales, marketing, finance). See Figure 8.8 for an example of the store manager actionable dashboard.

Figure 8.8 Foot Locker's store manager actionable dashboard

During the envisioning and requirements gathering and validation processes, do not worry about the quality of the mock-up. Using PowerPoint and a few standard dashboard images can go a long way in fueling the creative thinking of the targeted business stakeholders. Heck, most of my mock-ups look like they were drawn with a crayon!

Summary

Data scientists are critical to the ability to integrate data and analytics into the organization's business models. But an important challenge is to get your business users to “think like a data scientist” when contemplating data sources and metrics that might be better predictors of business performance. Having a business organization that can “think like a data scientist” will drive better collaboration with your data science team and ultimately lead to better predictive and prescriptive results and increased value to the business.

We introduced the “thinking like a data scientist” eight-step process that includes:

Step 1: Identify Key Business Initiative

Step 2: Develop Business Stakeholder Personas

Step 3: Identify Strategic Nouns

Step 4: Capture Business Decisions

Step 5: Brainstorm Business Questions

Step 6: Leverage “By” Analysis

Step 7: Create Actionable Scores

Step 8: Putting Analytics into Action

We used a Foot Locker example to help drive home the concepts and techniques in the eight-step “thinking like a data scientist” process. As a result, not only do the business stakeholders better understand how the data science process works, but they also understand what they can do to help the data science process deliver new value to the organization by helping to uncover new data sources, metrics, variables, and scores.

CROSS-REFERENCE

As noted earlier in this chapter, steps 6 and 7 are covered in Chapters 9 and 10, respectively.

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Start with the key business initiative that you identified in Chapter 2. Write down the key business stakeholders who either impact or are impacted by the targeted business initiative. Capture organizational roles versus individual names at this point.

Exercise #2: Develop a one-page persona for one of the key business stakeholders identified in Exercise #1. Use the persona template that we discussed in this chapter.

Exercise #3: Write down the key business entities (or strategic nouns) for the targeted business initiative. These can be both humans (e.g., customers, students, patients, technicians, engineers, etc.) and things (e.g., jet engines, trucks, ATMs, test suites, curriculums, stores, competitors).

Exercise #4: Brainstorm the business decisions that the business stakeholders need to make about the business entities in support of the targeted business initiative.

Exercise #5: Brainstorm the business questions that the business stakeholders might want to ask and answer with respect to each of the decisions listed in Exercise #4. Be sure to contemplate (1) descriptive questions, (2) predictive questions, and (3) prescriptive statements. I repeated the decomposition process slide in Figure 8.9 for reference.

Figure 8.9 Thinking like a data scientist decomposition process

Notes

1 http://dictionary.reference.com/browse/merchandising

Chapter 9 “By” Analysis Technique

Chapter 8, “Thinking Like a Data Scientist,” briefly introduced the “By” analysis as a technique around which the business subject matter experts (SMEs) and the data science team could collaborate to uncover new variables and metrics that might be better predictors of business performance. “By” analysis is a technique that was historically used during the data warehouse requirements gathering processes to ensure that the data warehouse schema was robust enough to support the full range of Business Intelligence queries and reports that business users might request. Data science builds on the “By” analysis to create a collaborative technique to drive alignment between the business users and the data scientists to identify and brainstorm variables and metrics that might be better predictors of business performance. The “By” analysis technique reinforces the importance of the “thinking like a data scientist” process.

Remember the data science definition from Moneyball: The Art of Winning an Unfair Game covered in Chapter 5:

Data science is about finding new variables and metrics that are better predictors of performance.

The “By” analysis technique supports this data science objective by powering the partnership between the business users and the data scientists to leverage new sources of customer, product, operational, market, and competitive data, coupled with advanced analytics, to uncover metrics and variables that may be better predictors of business performance.

Continuing with the baseball analogy, Major League Baseball (MLB) teams such as the Boston Red Sox are continually exploring and testing new sources of data and new analytics in hopes of uncovering new variables and metrics that are better predictors of player performance; that is, they are trying to find that next more predictive “on-base percentage” (see Figure 9.1).

Figure 9.1 Identifying metrics that may be better predictors of performance

Ultimately, these new variables and metrics will be used to determine a player's financial value in achieving the team's business objectives. Identifying these metrics and variables is no guarantee that they will be better predictors of performance (as seen from the Red Sox's most recent performance), but it gives the data science teams a starting point for their data science exploration and “fail-fast” analytic processes.

“By” Analysis Introduction

The “By” analysis technique exploits a business user's natural “question and answer” enquiry process to identify new data sources, dimensional characteristics, variables, and metrics that could be leveraged by the data science team in building the predictive and prescriptive analytic models to help predict business performance. The “By” analysis leverages a business stakeholder's natural curiosity to brainstorm new:

Metrics, measures, and key performance indicators

Dimensions (e.g., strategic nouns) and the attributes and characteristics associated with those dimensions or strategic nouns

Areas for potential analytics exploration

The “By” analysis technique leverages the normal business stakeholder question and query exploration process; it taps the natural human inclination to seek out new variables and metrics that may be better predictors of business performance.

The “By” analysis uses a simple “I want to [verb] [metric] by [dimensional attribute]” format to capture the business stakeholder brainstorming process and reveal new data and analytic areas of exploration. The “By” analysis format looks like this:

“I want to”

[Verb] such as see, know, report, compare, trend, plot, predict, score, etc.

[Metric] such as sales, margin, profits, social media posts, comments, physician notes, vibration levels, sensor codes, etc.

“By”

[Dimension or dimensional attribute] such as city, state, zip code, date, time, seasonality, product category, remodel date, store manager demographics, etc.

Here is an example of a “By” analysis statement:

I want to [report] [online sales and product margin] by… [product category, website, keyword search term, referring website, display ad, day part, day of week, customer behavioral category, customer re-targeting category].

The above “By” analysis sentence breaks down as follows:

The verb is [report],

The metric is [online sales and product margin], and

The dimensional attributes and characteristics are [product category, website, keyword search term, referring website, display ad, day part, day of week, customer behavioral category, customer re-targeting category].
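The verb, metric, and dimension breakdown above can be captured as a simple record so that brainstormed statements are easy to collect and hand to the data science team. This is a minimal sketch only; the ByStatement class and its method names are hypothetical and are not part of the technique as described.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ByStatement:
    """One 'I want to [verb] [metric] by [dimensions]' statement."""
    verb: str
    metric: str
    dimensions: List[str]

    def sentence(self) -> str:
        # Rebuild the full brainstorming sentence in the bracketed format.
        dims = ", ".join(self.dimensions)
        return f"I want to [{self.verb}] [{self.metric}] by... [{dims}]."

    def expand(self) -> List[str]:
        # One candidate analysis per dimension, for later evaluation.
        return [f"{self.verb} {self.metric} by {d}" for d in self.dimensions]


statement = ByStatement(
    verb="report",
    metric="online sales and product margin",
    dimensions=[
        "product category", "website", "keyword search term",
        "referring website", "display ad", "day part", "day of week",
        "customer behavioral category", "customer re-targeting category",
    ],
)

candidates = statement.expand()
```

Expanding the statement yields one candidate line of analysis per dimension, which is the list the data science team would then test for predictive value.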

The data science team is responsible for quantifying which variables or combinations of variables are better predictors of performance. Consequently, you want to give the data science team as many variables as is practical to consider. For example, in one project the business stakeholders (teachers, in this case) wanted to know the impact that a change in the value of a house might have on a student's classroom performance. So the data science team grabbed some Zillow data to see if there was any correlation (there wasn't).

Here are some additional “By” analysis statements:

I want to [trend] [hospital admissions] by… [disease category, zip code, patient demographics, hospital size, area demographics, and day of week].

I want to [compare] [current versus previous maintenance issues] by… [turbine, turbine manufacturer, date installed, last maintenance date, maintenance person, and weather conditions].

I want to [predict] [student performance] by… [age, gender, family size, child number within family, family income, previous test scores, current homework scores, and parent's education level].

I hope that you can see that these types of sentences are very easy to create. Also, the “By” analysis technique is perfect for a facilitated brainstorming session where the goal is to fuel the group innovative thinking process to identify additional variables and metrics that might be better predictors of business performance. And remember, as you go through any brainstorming process, all ideas are worthy of consideration, and the brainstorming process should not filter the suggestions and thereby inadvertently throttle the creative thinking process.

The dimensional attributes and characteristics that follow the “by” phrase are the gold in the data science exploration process. The wide and diverse variety of dimensions and dimensional attributes uncovered by the “By” analysis is critical to guiding the data science team's analytic exploration and modeling process. The “By” analysis can suggest additional variables and metrics that the data science team may want to explore in creating the prescriptive actions, scores, and recommendations that are used to support the targeted business initiative.

“By” Analysis Exercise

Continuing with the sports theme, let's introduce an exercise that allows you to put the “By” analysis to work. Pretend that you are the head coach for the National Basketball Association's (NBA's) Golden State Warriors and have to play the Cleveland Cavaliers in the 2015 NBA Championship Finals. Your job as the head coach of the Golden State Warriors is to craft a defensive plan and game strategy that maximizes your chances (or probability) of winning the game by minimizing the shooting and offensive effectiveness of Cleveland's superstar, LeBron James.

Let's start the analysis process by gaining some fundamental insights regarding the “hot spots” for shooters across the NBA; that is, the locations or “spots” on the court where shooters are most efficient as measured by “points per shot” (see Figure 9.2).

NOTE

The “points per shot” metric takes into account (normalizes) the value of a two-point shot versus the value of a three-point shot. The chart in Figure 9.2 shows that, generally speaking, three-point shooting is more effective than two-point shooting except near the rim (dunks result in a pretty high shooting percentage).
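To see why normalizing by shot value matters, points per shot can be computed as points scored divided by shots attempted. The sketch below uses made-up shooting figures chosen only to illustrate the arithmetic; they are not read from Figure 9.2, and the function name is an assumption.

```python
def points_per_shot(shots_made: int, shots_attempted: int, shot_value: int) -> float:
    """Average points produced per attempt from one spot on the court."""
    if shots_attempted <= 0:
        raise ValueError("shots_attempted must be positive")
    return shots_made * shot_value / shots_attempted


# Hypothetical figures: a 50 percent two-point spot and a 35 percent three-point spot.
two_point_spot = points_per_shot(shots_made=50, shots_attempted=100, shot_value=2)
three_point_spot = points_per_shot(shots_made=35, shots_attempted=100, shot_value=3)
```

With these illustrative numbers the three-point spot has the lower shooting percentage but the higher points per shot, which is the effect the metric is designed to expose.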

Figure 9.2 NBA shooting effectiveness

Let's drill down into our analysis process by understanding LeBron James's specific shooting tendencies and performance. The shooting hot spot chart in Figure 9.3 provides a good starting point in the development of our defensive strategy against LeBron James. Figure 9.3 shows LeBron James's shooting percentages from different spots on the court. This chart helps us to start to contemplate the key decision: “Where do we want to direct or force LeBron James to go while on the court in order to mitigate his offensive prowess?”

Figure 9.3 LeBron James's shooting effectiveness

While the chart in Figure 9.3 is interesting in highlighting areas in general where LeBron's shooting percentages are better or worse, to be actionable you need to get more detailed insights. To create more actionable insights, we need to understand “what detailed data or insights do I need in order to create specific, actionable recommendations to mitigate LeBron James's shooting effectiveness?” This is the perfect time to employ the “By” analysis technique.

Let's put the “By” analysis technique to work by applying the technique to the following statement:

I want to [know] [LeBron James's shooting percentage] by…

Take a moment to jot down some variables or metrics that come to mind following the “by” phrase (e.g., “I want to know LeBron James's shooting percentage by… opponent”).

________________________

________________________

________________________

________________________

Below are some variables and metrics that I came up with using the “By” analysis technique:

At home versus on the road

Number of days of rest

Shot area

Opposing team

Defender

Game location

Game location elevation

Game time weather

Game time temperature

Game time humidity

Time (hours) since last game

Average time of ball possession

Time left in game

Total minutes played in game

Number of shots attempted

Number of shots made

Location of shots attempted

Location of shots made

Volume of boos

Number of fouls

Number of assists

Playing a former team

Time of day

Record of opponent

Feelings toward opponent

Performance in last game

Number of negative Twitter comments

Stadium temperature

Stadium humidity

Number of fans in attendance

Number of LeBron jerseys in attendance

Here's the interesting point: people who have never been an NBA basketball coach, and even people who may have never played basketball, can come up with some of the more interesting dimensions and dimensional attributes in trying to identify variables and metrics that may be better predictors of LeBron James's shooting tendencies and performance.

Building on some of the suggestions that came out of the LeBron James “By” analysis technique, let's triage LeBron James's shooting percentages for the 2014–2015 regular season by a couple of dimensions identified in the brainstorming session: [Home versus Road] and [Number of Days Rest]. Table 9.1 shows LeBron James's shooting percentages.

Table 9.1 LeBron James's Shooting Percentages

2014–2015        Overall Shooting   Overall Shooting   3-point Shooting   3-point Shooting
                 Percentage         Index              Percentage         Index

Regular season   48.8               100.0              35.4               100.0
Home             47.3                96.9              35.6               100.6
Road             50.2               102.9              35.3                99.7
0 days rest      49.8               102.0              38.0               107.3
1 day rest       46.3                94.9              32.3                91.2
2 days rest      51.3               105.1              37.3               105.4
3 days rest      52.7               108.0              42.9               121.2
4 days rest      57.1               117.0              60.0               169.5
6+ days rest     48.5                99.4              30.8                87.0

Source: http://stats.nba.com/player/#!/2544/stats/

You're now starting to get some interesting insights. Remember, insights are only observations buried in the data that look unusual when compared to an individual's standard performance. Just from the simple analysis in Table 9.1, you can start uncovering some insights about LeBron's shooting tendencies that the data science team might want to explore further. For example:

LeBron shoots significantly worse when he's had just one day of rest (8.8 percent worse from three-point range, relative to his regular season average).

If you give LeBron four days of rest, watch out! His shooting percentages improve overall and improve dramatically for three-point shooting (69.5 percent better three-point shooting with four days of rest, relative to his regular season average).
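The index columns in Table 9.1 appear to be each split's shooting percentage divided by the regular season percentage, times 100; recomputing them this way reproduces the published values. The sketch below takes the percentages from Table 9.1, while the function and variable names are assumptions made for illustration.

```python
# Regular season baselines from Table 9.1 (percent).
BASELINE = {"overall": 48.8, "three_point": 35.4}

# Shooting percentages by split, copied from Table 9.1.
SPLITS = {
    "Home": {"overall": 47.3, "three_point": 35.6},
    "Road": {"overall": 50.2, "three_point": 35.3},
    "0 days rest": {"overall": 49.8, "three_point": 38.0},
    "1 day rest": {"overall": 46.3, "three_point": 32.3},
    "2 days rest": {"overall": 51.3, "three_point": 37.3},
    "3 days rest": {"overall": 52.7, "three_point": 42.9},
    "4 days rest": {"overall": 57.1, "three_point": 60.0},
    "6+ days rest": {"overall": 48.5, "three_point": 30.8},
}


def shooting_index(split_pct: float, baseline_pct: float) -> float:
    """Split percentage relative to the baseline, where 100 means 'same as baseline'."""
    return round(100.0 * split_pct / baseline_pct, 1)


index = {
    split: {kind: shooting_index(pct, BASELINE[kind]) for kind, pct in values.items()}
    for split, values in SPLITS.items()
}

# Percent better or worse than the regular season average.
one_day_rest_change = index["1 day rest"]["three_point"] - 100.0
four_days_rest_change = index["4 days rest"]["three_point"] - 100.0
```

Subtracting 100 from an index gives the percent-better or percent-worse figure quoted in the insights above.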

As discussed in Chapter 2, “Big Data Business Model Maturity Index,” once you start uncovering insights buried across the wide variety and depth of data, you need the business subject matter experts to assess the value of these insights against the S.A.M. criteria:

Is the insight of Strategic value to what you are trying to accomplish?

Is the insight Actionable (i.e., is the insight at a level upon which I can act on that insight)?

Is the insight of Material value (i.e., is the value of the insight greater than the cost to act on that insight)?
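The three S.A.M. questions can be treated as a simple checklist applied to each candidate insight. This is a hedged sketch only: the Insight class, its fields, and the sample values are hypothetical, and in practice the strategic and actionable judgments come from the business subject matter experts rather than from code.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    description: str
    strategic: bool          # judged by the business subject matter experts
    actionable: bool         # judged by the business subject matter experts
    estimated_value: float   # hypothetical value of acting on the insight
    cost_to_act: float       # hypothetical cost of acting on the insight

    def is_material(self) -> bool:
        # Material: the value of the insight exceeds the cost to act on it.
        return self.estimated_value > self.cost_to_act

    def passes_sam(self) -> bool:
        return self.strategic and self.actionable and self.is_material()


# Hypothetical assessments, invented for illustration.
insights = [
    Insight("Shoots worse on one day of rest", True, True, 10.0, 2.0),
    Insight("Shoots better in low humidity", True, False, 5.0, 1.0),
]

shortlist = [i.description for i in insights if i.passes_sam()]
```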

Once an “insight” has passed the S.A.M. criteria, you want the data science team to build the analytic models that quantify cause and effect, assess goodness of fit, and create the prescriptive actions or recommendations that provide guidance to the frontline employees (LeBron James's defenders in this example) and managers (Golden State Warriors coaching staff) in the achievement of their business initiative of minimizing LeBron James's shooting and offensive performance effectiveness.

Foot Locker Use Case “By” Analysis

Continuing the Foot Locker use case that was started in Chapter 8, “Thinking Like a Data Scientist,” we want to apply the “By” analysis to uncover new variables and metrics that might be better predictors of performance for the “improve merchandising effectiveness” business initiative.

Chapter 8 captured the descriptive, predictive, and prescriptive questions that supported the Foot Locker “improve merchandising effectiveness” business initiative. As a reminder, below are some of the customer promotional questions that were captured:

Descriptive Analytics (Understanding what happened)

What customers are most receptive to what types of merchandising campaigns?

Are there certain times of year where certain customers are more responsive to merchandising offers?

Predictive Analytics (Predicting what will happen)

Which customers are most likely to visit the store for a back-to-school promotion?

Which customers are most likely to respond to the new Michael Jordan basketball shoe?

Prescriptive Analytics (Recommending what to do next)

E-mail Bill Schmarzo a 50 percent discount coupon for two pairs of Nike Elite socks when he buys his new pair of Air Jordans.

Text Max Schmarzo a triple-point bonus when he buys Nike apparel this coming weekend.

Mail Alec Schmarzo a $20 cash coupon good only if he visits the store within the next 14 days.

Let's put the “By” analysis technique to work against the following question:

“What customers are most receptive to Foot Locker's merchandising campaigns by…?”

Again, take a moment to jot down some variables or metrics that come to mind following the “by” phrase. I'll wait for you to jot down your ideas (again, one variable or metric per line).

_________________

_________________

_________________

_________________

The following is a list of some of the variables and metrics that I came up with when I applied the “By” analysis technique to the Foot Locker customer question: “What customers are most receptive to Foot Locker's merchandising campaigns by…?”

Age

Gender

Marital status

Number of children

Length of marriage

Income level

Education level

VIP loyalty card member

VIP member length of time

VIP rewards expired (%)

VIP rewards expired ($)

Own or rent residence

Tenure in current home

Value of current home

Favorite sports

Favorite sports teams

High school sports interest

College sports interest

Active athlete

Type of athletic activity

Exercise minutes per week

Number of days per week exercised

For purposes of completeness, you would want to perform the “By” analysis exercise for a couple of additional customer questions in order to capture a robust set of variables and metrics that could be used to predict the performance of the “improve merchandising effectiveness” business initiative.

Continuing the Foot Locker example that started in Chapter 8, below are the product promotional questions that were captured:

Descriptive Analytics (Understanding what happened)

What products are most successful with what merchandising campaigns?

How many basketball shoes did I sell during last year's high school and youth basketball seasons?

Which products are slow movers that I might need an in-store merchandising campaign to move?

Predictive Analytics (Predicting what will happen)

Which shoes and apparel are most likely to sell with a back-to-school promotional event?

What is the likely market basket revenue and margin from a Buy One Get One Free (BOGOF) event?

Prescriptive Analytics (Recommending what to do next)

With the upcoming high school basketball season, promote Air Jordans and Nike Elite socks in the same display at the front of the store.

Given the end of the football season, provide in-store BOGOF promotion of football apparel.

Reduce prices 50 percent on the inventory of baseball cleats in anticipation of incoming new baseball equipment.

In the following list, the “By” analysis technique is applied to the Foot Locker product question: “What products are most successful with what merchandising campaigns by…?”

Product category

Product size

Product style

Product color

Product form

Product type

Brand

Primary sport

Retail price

Product release date

Product discontinue date

Brand age

Athlete endorser

Athlete endorser Q score

Athlete endorser sentiment

Last TV advertisement date

Brand social sentiment

Product Yelp rating

Very important note about the “By” analysis technique: the variables and metrics uncovered from the “By” analysis technique are only limited by the creative thinking of the business users; that is, the people who live these decisions and questions daily.1

Hopefully you can see that the number and variety of variables and metrics uncovered using the “By” analysis technique can be quite bountiful, and the more variables and metrics, the better from a data science perspective.

Summary

The “By” analysis technique is a powerful tool in not only helping to understand the key metrics and dimensions of the business but also yielding insights into areas of the business ripe for data science analysis. The “By” analysis technique fuels the creative discovery of new variables and metrics by leveraging the natural question and answer exploration of the business users. The “By” analysis technique uses a simple sentence format:

“I want to”

[Verb] such as see, know, report, compare, trend, plot, predict, score, etc.

[Metric] such as sales, margin, profits, social media posts, comments, physician notes, vibration levels, sensor codes, etc.

“By”

[Dimension or dimensional attribute] such as city, state, zip code, date, time, seasonality, product category, remodel date, store manager demographics, etc.

Finally, and maybe most important, the “By” analysis is a technique that can drive the collaboration between the business users and the data scientists to uncover new variables and metrics that can guide the data scientists' analytics exploration and model development process. The “By” analysis technique reinforces the importance of the “thinking like a data scientist” process.

In Chapter 10, we will cover how to combine these variables and metrics to develop actionable scores that can be used to address the business decisions that support the “improve merchandising effectiveness” business initiative.

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Pick one of the questions for one of your key business entities or strategic nouns that you came up with in Chapter 8 and apply the “By” analysis technique. If possible, get a small group of co-workers together and brainstorm the “By” analysis as a group to uncover even more potential variables and metrics. Fuel the creative process with coffee and donuts (lots of donuts) if necessary.

Exercise #2: Pick a different question for the same key business entity and apply the “By” analysis technique to see what additional variables and metrics you uncover. Again, do not worry at this point about whether the data is available. Now is not the time to filter the creative thinking outcomes. You will have time to evaluate the value and implementation feasibility of each of the potential variables and data sources later in the process.

Notes

1 Q score is a measure of the familiarity and appeal of a brand, celebrity, company, or entertainment product.

Chapter 10 Score Development Technique

In New Zealand, people are taking a “thinking like a data scientist” approach to optimizing social worker spending and casework prioritization. A related BusinessWeek article titled “A Moneyball Approach to Helping Troubled Kids” (May 11, 2015) highlights the role that scores play in identifying and prioritizing problem areas and deciding what corrective actions to take. Here are a couple of excerpts from the article:

Using data from welfare, education, employment, and the housing agencies and the courts, the government identified the most expensive welfare beneficiaries–kids who have at least one close adult relative who's previously been reported to child safety authorities, been to prison, and spent substantial time on welfare. “There are million-dollar [cost] kids in those families,” Minister of Finance Bill English says. “By the time they are 10, their likelihood of incarceration is 70 percent. You've got to do something about that.”

…one idea is to rate families, giving them a number [score] that could be used to identify who's most at risk in the same way that lenders rely on credit scores to determine creditworthiness. “The way we may use it, it's going to be like it's a FICO score,” says Jennie Feria, Head of Los Angeles' Department of Children and Family Service. The information, she says, could be used both to prioritize cases and to figure out who needs extra services.

In wrapping up the “thinking like a data scientist” process that began in Chapter 8 and continued in Chapter 9, this chapter focuses on the role of scores in supporting an organization's key business decisions. As exhibited in the preceding New Zealand welfare example, scores are a very effective data science concept in aggregating a wide variety of variables and metrics in order to come up with a yardstick or guide that can be used to support key business and operational decisions.

Scores are very important concepts in the world of data science. Many times, the results of the data science efforts will be presented as scores that can help to guide frontline employees' and managers' decision making in support of the targeted business initiative.

The power of a score is that it is relatively easy to understand from a business stakeholder perspective. It focuses the data science efforts on identifying and exploring new metrics and variables to include in a score that might be a better predictor of business performance or an individual's behaviors.

The purpose of the score technique is to look for groupings of metrics and variables that can be combined to create an actionable score that you can use to support your key business decisions. These scores are critical components of the “thinking like a data scientist” process because they can guide the decisions your frontline employees are trying to make and/or predict the likelihood of a customer's actions, outcomes, or behaviors.

Definition of a Score

Let's start by defining score:

A score is a dynamic rating or grade standardized to aid in comparisons, performance tracking, and decision making.

A score can help predict the likelihood of certain actions or outcomes.

A score is an actionable, analytic-based measure that supports the decisions your organization is trying to make and guides the outcomes the organization is trying to predict.

A common example of a score is the intelligence quotient or IQ. An IQ is derived from several standardized tests in order to create a single number that assesses an individual's intelligence. The IQ is standardized at 100 with a standard deviation of 15, which means that roughly 68 percent of the population is within 1 standard deviation of the 100 standard (between 85 and 115). This standardization makes it easier to use the IQ to compare different students, candidates, or applicants and to support key hiring, promotion, and college application decisions.
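The standardization described above can be sketched in a few lines: a raw result is converted to a z-score and then rescaled to a mean of 100 and a standard deviation of 15. The sketch assumes that results are normally distributed (which is where the 68 percent figure comes from); the function names and the raw test figures are hypothetical.

```python
import math


def standardize(raw: float, population_mean: float, population_sd: float,
                scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Rescale a raw result so the population has the given mean and SD."""
    z = (raw - population_mean) / population_sd
    return scale_mean + scale_sd * z


def share_within(k_sd: float) -> float:
    """Share of a normal population within k standard deviations of the mean."""
    return math.erf(k_sd / math.sqrt(2.0))


# Hypothetical raw test: mean 50, standard deviation 10.
# A raw result of 60 sits one standard deviation above the mean.
standardized = standardize(raw=60.0, population_mean=50.0, population_sd=10.0)
within_one_sd = share_within(1.0)
```

Under the normality assumption, the share within one standard deviation works out to about 0.683, which matches the 68 percent quoted for the 85 to 115 band.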

The true beauty of a score is its ability to convert a wide range of variables and metrics (all weighted, valued, and correlated differently depending on what is being measured) into a single number that can be used to guide decision making. And the true power of the score is the ability to start simple, and then constantly fine-tune and expand the score with new metrics, variables, and the relationships that might yield better predictors of business performance or an individual's behaviors.

FICO Score Example

Many organizations have built their business models on the development of scores that help organizations to make better decisions. For example, Traackr and Apinions are companies that assign scores to influencers on social media to help identify who organizations should target from a media perspective. FICO may be the best example of an organization that has built its business around the development of a score.1 The FICO score is used to predict the likelihood of a borrower to repay a loan. Fair, Isaac, and Company first introduced the FICO score in 1989. Most readers are probably familiar with the FICO score (and you have probably seen your own FICO score several times), which combines multiple variables and metrics about a loan applicant's financial, credit, and payment history to create a singular score that lenders use to predict a borrower's ability to repay a loan (see Figure 10.1).

Figure 10.1 FICO score considerations

An individual's FICO score can range between 300 and 850. A FICO score above 650 indicates that the individual has a very good credit history, while people with scores below 620 will often find it substantially more difficult to obtain financing at a favorable rate (see Figure 10.2).2

Figure 10.2 FICO score decision range

The FICO score amalgamates a wide range of consumer financial, credit, and payment metrics in order to generate the single score for a specific individual. The powerful concept behind the FICO score is that it combines this wide range of consumer financial, credit, and payment metrics into a single score that predicts an individual's likelihood to repay a loan.

To dive deeper into the FICO score example, we see that the data elements used in the calculation of an individual's FICO score include payment history, credit utilization, length of credit history, new credit applications, and credit mix.3

Payment History. Thirty-five percent of the FICO credit score is based on a borrower's payment history, making the repayment of past debt the most important factor in calculating credit scores. According to FICO, past long-term behavior is used to forecast future long-term behavior. This is a measure of how you handle credit; think credit “behavioral analytics.” This particular category encompasses the following metrics and variables:

Payment information on various types of accounts, including credit cards, retail accounts, installment loans, and mortgages

The appearance of any adverse public records, such as bankruptcies, judgments, suits, and liens, as well as collection notices and delinquencies

Length of time for any delinquent payments

Amount of money still owed on delinquent accounts or collection items

Length of time since any delinquencies, adverse public records, or collection notices

Number of past-due items listed on a credit report

Number of accounts being paid as agreed

Credit Utilization. Thirty percent of the FICO credit score is based on a borrower's credit utilization; that is, the percentage of available credit that has been borrowed by that individual. The credit utilization calculation is composed of six variables:

The amount of debt still owed to lenders

The number of accounts with debt outstanding

The amount of debt owed on individual accounts

The types of loans

The percentage of credit lines in use on revolving accounts, like credit cards

The percentage of debt still owed on installment loans, like mortgages

Length of Credit History. Fifteen percent of the FICO credit score is based on the length of time each account has been open and the length of time since the account's most recent activity. FICO breaks down “length of credit history” into three variables:

Length of time the accounts have been open

Length of time specific account types have been open

Length of time since those accounts were used

New Credit Applications. Ten percent of the FICO credit score is based on borrowers' new credit applications. Within the new credit application category, FICO considers the following variables:

Number of accounts that have been opened in the past 6 to 12 months, as well as the proportion of accounts that are new, by account type

Number of recent credit inquiries

Length of time since the opening of any new accounts, by account type

Length of time since any credit inquiries

The re-appearance on a credit report of positive credit information for an account that had earlier payment problems

Credit Mix. Ten percent of the FICO credit score is based on the variety of debt being repaid, which is a measure of the borrower's ability to handle a wide range of credit including:

Installment loans including auto loans, student loans, and furniture purchases

Mortgage loans

Bank credit cards

Retail credit cards

Gas station credit cards

Unpaid loans taken on by collection agencies or debt buyers

Rental data

The point of showing all the details behind the FICO score calculation is to reinforce the basic concept (and power) of a score: that a score can take into consideration a wide range of descriptive variables and metrics to create a single predictive number that can be used to support an organization's key decisions or, in the case of the FICO score, used by lenders to predict a loan applicant's likelihood to repay a loan. That's a very powerful concept. Scores are a critical concept in getting your business stakeholders to contemplate how they might want to integrate different variables and measures to create actionable, predictive scores to support their key business decisions.
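To illustrate how category weights can roll a set of inputs up into one number, the sketch below blends five category ratings using the 35/30/15/10/10 percentages listed above and maps the result onto the 300 to 850 range. This is emphatically not the actual FICO calculation, which is proprietary: the 0-to-1 category ratings, the linear blend, and all names in the code are assumptions made purely for illustration.

```python
# Category weights as described in the text (35/30/15/10/10 percent).
WEIGHTS = {
    "payment_history": 0.35,
    "credit_utilization": 0.30,
    "length_of_credit_history": 0.15,
    "new_credit_applications": 0.10,
    "credit_mix": 0.10,
}

SCORE_MIN, SCORE_MAX = 300, 850  # published FICO score range


def composite_score(ratings: dict) -> int:
    """Blend hypothetical 0-to-1 category ratings into a 300-850 style score.

    Illustrative weighted average only; it does not reproduce FICO's method.
    """
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the weighted categories")
    for name, value in ratings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} rating must be between 0 and 1")
    blended = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    return round(SCORE_MIN + blended * (SCORE_MAX - SCORE_MIN))


# Hypothetical applicant: strong payment history, heavy credit utilization.
applicant = {
    "payment_history": 0.9,
    "credit_utilization": 0.4,
    "length_of_credit_history": 0.7,
    "new_credit_applications": 0.8,
    "credit_mix": 0.6,
}
applicant_score = composite_score(applicant)
```

The design point is that the weights make each category's influence explicit, so business stakeholders can debate and tune them as new variables are added to a score.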

Other Industry Score Examples

Different types of scores can be created to support decision making across a wide variety of industries. In fact, the ability to create actionable scores is only limited by the creative thinking of the business stakeholders; hence, the importance of getting business stakeholders to “think like a data scientist.”

For example, here are some scores to consider for the financial services industry:

Retirement Readiness Score. This would be a score that measures how ready each client is for retirement. This score could include variables such as age, current annual income, current annual expenses, net worth, value of primary home, value of secondary homes, desired retirement age, desired retirement location (Iowa is a lot cheaper than Palo Alto!), number of dependent children, number of dependent parents, desired retirement lifestyle, and so forth.

Job Security Score. This score would measure the security of each individual's job. This score could include variables such as industry, job type, employer(s), job level/title, job experience, age, education level, skill sets, industry publications and presentations, Klout scores, and so on.

Home Value Stability Score. This score would measure the stability of the value of a particular house. This score could consider variables such as current value, supply/demand ratio of area, house sales history, value of house compared to comparable houses, tax assessment compared to comparable houses, whether it's a primary residence or rental residence, local price-to-rent ratio, local housing value trends (maybe pulled from Zillow), distance from a high school or junior high school, quality rating of that high school or junior high school, distance from shopping, and others.

Interestingly, combining the home value stability score with the FICO score would have provided a more holistic assessment of banks' housing market exposure prior to the 2007 financial market meltdown. The FICO score was insufficient when trying to determine the level of housing market risk as financial organizations were writing mortgage loans. Coupling the FICO score with a home value stability score could have provided invaluable insights as banks decided to whom to make home mortgage loans and in which housing markets (e.g., deciding which housing markets were “over-valued”).

The key point in this mortgage market collapse example is that it is important to consider how multiple scores can provide different perspectives on the decision that is being evaluated. Using different scores can provide a more holistic assessment of the true conditions around which to make these key business decisions.

Table 10.1 shows additional scores from a variety of industries.

Table 10.1 Potential Scores for Other Industries

Financial Services      Credit Cards          Manufacturing            Gaming/Hospitality

FICO                    Attrition Risk        Equipment Maintenance    Player/Customer Lifetime Value
Retirement Readiness    Fraud Risk            Supplier Reliability     Gaming Preferences
Investment Risk         Product Preferences   Supplier Quality

Education               Healthcare            Utilities                    Pro Sports

Graduation Readiness    Wellness Condition    Energy Efficiency            Fatigue Factor
Cohorts Influence       Stress Risk           Conservation Effectiveness   Motivation Factor

Thepurposeofthescoretechniqueistolookforgroupingsofcommonorsimilarvariablesandmetricsthatcanbemeshedtogethertocreateascorethatcanguideyourdecisionmaking.Thesescoresareacriticalcomponentofthe“thinkinglikeadatascientist”process.Scorescanprovideinvaluablesupportforthedecisionsthatyouaretryingtomakeorwhatactionsoroutcomesyouaretryingtopredictwithrespecttoyourtargetedbusinessinitiative.

LeBronJamesExerciseContinuedLet'scontinuetheLeBronJamesexamplethatyoustartedinChapter9.TheexerciseaskedyoutoplaytheroleastheheadcoachfortheNationalBasketballAssociation's(NBA's)GoldenStateWarriorsinpreparingtoplaytheClevelandCavaliersinthe2015NBAChampionshipFinals.YourjobastheheadcoachoftheGoldenStateWarriorsistocraftadefensiveplanandgamestrategythatmaximizesyourchances(orprobability)ofwinningtheseriesbyminimizingtheshootingandoffensiveeffectivenessofCleveland'ssuperstar,LeBronJames.

We used the “By” analysis technique in Chapter 9 to tease out a variety of variables and metrics that might be predictors of LeBron James's shooting prowess. Below is the list of the variables that came out of that “By” analysis process.

I want to [know] [LeBron James's shooting percentage] by…

At home versus on the road

Number of days rest

Shot area

Opposing team

Defender

Game location

Game location elevation

Game time weather

Game time temperature

Game time humidity

Time (hours) since last game

Average time of ball possession

Time left in game

Total minutes played in game

Number of shots attempted

Number of shots made

Location of shots attempted

Location of shots made

Volume of boos

Number of fouls

Number of assists

Playing a former team

Time of day

Record of opponent

Feelings toward opponent

Performance in last game

Number of negative Twitter comments

Stadium temperature

Stadium humidity

Number of fans in attendance

Number of LeBron jerseys in attendance
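To make the “By” analysis a little more concrete, here is a minimal sketch of how a data science team might compute a shooting percentage “by” any one of the variables in the list. The shot log, its field names, and its values are hypothetical and exist only for illustration; real data would come from whatever game-tracking source the team has.

```python
from collections import defaultdict

# Hypothetical shot log; the field names and values are illustrative only.
SHOTS = [
    {"home": True, "days_rest": 1, "shot_area": "paint", "made": True},
    {"home": True, "days_rest": 1, "shot_area": "three", "made": False},
    {"home": True, "days_rest": 2, "shot_area": "paint", "made": True},
    {"home": False, "days_rest": 1, "shot_area": "three", "made": False},
    {"home": False, "days_rest": 2, "shot_area": "paint", "made": True},
    {"home": False, "days_rest": 2, "shot_area": "three", "made": False},
]


def shooting_pct_by(shots, variable):
    """Return {value of the "By" variable: shooting percentage as a 0-1 fraction}."""
    made = defaultdict(int)
    attempted = defaultdict(int)
    for shot in shots:
        key = shot[variable]
        attempted[key] += 1
        made[key] += int(shot["made"])
    return {key: made[key] / attempted[key] for key in attempted}
```

Calling `shooting_pct_by(SHOTS, "shot_area")` groups the same shots by a different variable, which is the whole point of the technique: one metric, many candidate “By” dimensions.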

Next we want to understand the decisions that the Golden State Warriors coaching staff needs to make in crafting a defensive strategy against LeBron James. Chapter 8 introduced the recommendations worksheet as a tool to link the key business decisions to the recommendations and the supporting scores (see Figure 10.3).

Figure 10.3 Recommendations worksheet

In the “mitigate LeBron James's offensive effectiveness” business initiative, some of the key decisions that the Golden State Warriors coaching staff needs to make are:

Who is going to guard LeBron?

What is the best individual defensive strategy against LeBron?

What is the best team defensive strategy against LeBron?

Next, you want to identify the recommendations you could deliver in support of those key decisions. For example, for the “Who is going to guard LeBron James?” decision, you might want to make the following recommendations:

Which defender?

Which defender at which times of the game?

Which defender in which game situations?

Figure 10.4 shows the updated recommendations worksheet.

Figure 10.4 Updated recommendations worksheet

Now you want to review the variables and metrics that came out of the “By” analysis and look for common groupings. For example, the following variables and metrics that came out of the “By” analysis relate to how “Fatigued” LeBron might be at any point in the game:

Hours since last game

How many games played in the season

Average number of minutes played per game

Minutes played in the current game

Minutes handling the ball in the current game

Number of shots taken in the current game

Time remaining in the current game

Away or home game

This fatigue score could be used to measure how tired or exhausted LeBron is at any point in the game. The fatigue score is created from a combination of historical metrics (number of games played in the season so far, average number of minutes played) combined with real-time, in-game metrics (minutes played in the game, number of shots taken in the game, minutes handling the ball in the game). Updating LeBron's fatigue score throughout the game (since many of the supporting metrics change during the game) can lead to in-game recommendations about defenders, individual defensive strategy, and team defensive strategy.
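As a rough illustration of how historical and in-game metrics might be blended into a single fatigue score, consider the sketch below. Everything numeric in it is an assumption made for the example: the weights, the caps used for normalization (such as 72 hours of rest or 30 shots), and the 0 to 100 scale are placeholders, not values from any real model. In practice the data science team would fit and validate these against actual performance data.

```python
from dataclasses import dataclass


@dataclass
class FatigueInputs:
    games_played_in_season: int
    avg_minutes_per_game: float
    hours_since_last_game: float
    minutes_played_in_game: float
    minutes_handling_ball_in_game: float
    shots_taken_in_game: int
    away_game: bool


def _clamp01(value):
    """Clip a ratio into the 0-1 range."""
    return max(0.0, min(1.0, value))


def fatigue_score(m):
    """Return a 0-100 score; higher means more fatigued (placeholder weights)."""
    # Historical component: season workload and how little rest preceded the game.
    historical = (
        0.5 * _clamp01(m.games_played_in_season / 82)    # 82-game regular season
        + 0.3 * _clamp01(m.avg_minutes_per_game / 48)    # 48-minute regulation game
        + 0.2 * _clamp01(1 - m.hours_since_last_game / 72)
    )
    # In-game component: changes as the game goes on, so the score can be refreshed.
    in_game = (
        0.5 * _clamp01(m.minutes_played_in_game / 48)
        + 0.3 * _clamp01(m.minutes_handling_ball_in_game / 24)
        + 0.2 * _clamp01(m.shots_taken_in_game / 30)
    )
    travel = 1.0 if m.away_game else 0.0
    return round(100 * (0.4 * historical + 0.5 * in_game + 0.1 * travel), 1)
```

Recomputing `fatigue_score` at each timeout with updated in-game values is one simple way to get the "updated throughout the game" behavior described above.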

A “Motivation” score could be created out of the following variables and metrics:

In-game performance

Record of opponent

Defender guarding him

Volume of boos

Playing against a former team

Number of LeBron jerseys in the stands

The motivation score would be a measure of how “motivated” LeBron is for this particular game, and how hard he is willing to push himself when he gets tired to get the win. The motivation score, when combined with the fatigue score, can lead to in-game recommendations about defenders, individual defensive strategy, and team defensive strategy. Figure 10.5 shows the final version of the recommendation worksheet.

Figure 10.5 Completed recommendations worksheet

It is interesting how the combination of multiple minor metrics has the potential to yield a much more actionable and predictive score. This process of uncovering and grouping metrics and variables into higher-level scores is highly iterative with lots of trial and error as the data science team tries to validate which combinations of metrics and variables are actually better predictors of performance.

Foot Locker Example Continued

Throughout Chapters 8 and 9, you applied “thinking like a data scientist” techniques and concepts in an exercise based on Foot Locker. You will now complete the Foot Locker exercise by pulling everything together to identify and create actionable scores that help Foot Locker “improve merchandising effectiveness.”

In Chapter 9 we conducted the “By” analysis for Foot Locker's “improve merchandising effectiveness” initiative. The results of the “By” analysis for one customer question are shown in the following list:

Age

Gender

Marital status

Number of children

Length of marriage

Income level

Education level

VIP loyalty card member

VIP member length of time

VIP rewards expired (%)

VIP rewards expired ($)

Own or rent residence

Tenure in current home

Value of current home

Favorite sports

Favorite sports teams

High school sports interest

College sports interest

Active athlete

Type of athletic activity

Exercise minutes per week

Number of days per week exercised

Glancing over the different metrics and variables that came out of that “By” analysis, you want to look for common groupings. For example:

You could group metrics and variables such as “VIP member,” “Length of time (tenure) as a VIP member,” “Frequency of use of VIP card,” “Frequency of redeeming reward points,” and “Percentage of expired rewards” into a “Customer Loyalty” score.

You could group metrics and variables such as “Favorite sports,” “Favorite sports teams,” “High school sports team supporter,” “College sports team supporter,” and “Amount of team branded apparel purchased” into a “Sports Passion” score.

Finally, you could group metrics and variables such as “Active athlete,” “Type of athletic activity,” “Frequency of athletic activity,” “Average weekly amount of athletic activity,” and “Wears health monitor” into an “Athletic Activity” score.
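The three groupings above can be sketched as a simple mapping from metrics to scores. This is only a starting-point illustration: the metric keys are hypothetical names, each metric is assumed to be pre-normalized to a 0 to 1 range, and a plain average stands in for whatever weighting the data science team would eventually validate. The one inverted metric (expired rewards) reflects the assumption that a higher share of expired rewards signals lower loyalty.

```python
# Hypothetical metric keys, each assumed to be pre-normalized to the 0-1 range.
SCORE_GROUPS = {
    "Customer Loyalty": [
        "vip_member",
        "vip_tenure",
        "vip_card_use_frequency",
        "reward_redemption_frequency",
        "pct_rewards_expired",
    ],
    "Sports Passion": [
        "favorite_sports",
        "favorite_sports_teams",
        "high_school_team_supporter",
        "college_team_supporter",
        "team_apparel_purchased",
    ],
    "Athletic Activity": [
        "active_athlete",
        "athletic_activity_type",
        "athletic_activity_frequency",
        "weekly_activity_amount",
        "wears_health_monitor",
    ],
}

# Assumption: a higher value of these metrics should lower the score.
INVERTED = {"pct_rewards_expired"}


def build_scores(customer):
    """Average the available metrics in each group into a 0-100 score."""
    scores = {}
    for score_name, metrics in SCORE_GROUPS.items():
        values = []
        for metric in metrics:
            if metric not in customer:
                continue  # skip metrics that were not collected for this customer
            value = customer[metric]
            values.append(1.0 - value if metric in INVERTED else value)
        if values:
            scores[score_name] = round(100 * sum(values) / len(values), 1)
    return scores
```

Because a group is scored from whichever of its metrics are present, you can start with a handful of metrics and add more as new data sources become available.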

Figure 10.6 shows the results of the grouping of metrics and variables into actionable scores about Foot Locker's customers. You would want to do a similar exercise for Foot Locker's other key business entities such as products and stores.

Figure 10.6 Potential Foot Locker customer scores

Finally, let's pull everything together into a recommendations worksheet that highlights how you might use the Foot Locker customer scores to help guide your merchandising decisions (see Figure 10.7).

Figure 10.7 Foot Locker recommendations worksheet

The brainstorming of the different metrics and variables using the “By” analysis technique and the subsequent grouping of the resulting metrics and variables into common scores is probably the most enjoyable part of the “thinking like a data scientist” process. You are free to apply your creative juices to brainstorm data sources and metrics that might be used as part of your score. Again, no idea is a bad idea. Let the data science team decide via its analytic modeling which data sources and metrics are the best predictors of business performance.

But how do you put these scores or analytics into action? How does an organization like Foot Locker leverage these scores to improve its customer engagement and merchandising decisions?

One example might be how the Foot Locker marketing stakeholders use the scores to prioritize their customer offers and promotions. For example, today most organizations determine the customer lifetime value (CLTV) based on the previous 12 to 18 months of sales (see Figure 10.8).

Figure 10.8 CLTV based on sales

The goal of the CLTV score is to help marketing and store personnel to determine the “value” of a customer that can subsequently be used to determine who gets what sorts of offers. Unfortunately, since the sales numbers are a historical perspective on spend and value, most organizations just create a rule of thumb that all customers get a flat rebate (5 percent) on whatever they spend. Boring.

However, what if you leveraged the Customer Loyalty and the Athletic Activity scores to create a maximum customer lifetime value (MCLTV) to predict which customers might have untapped sales potential and to which types of promotions or offers they might be most responsive (see Figure 10.9)?

Figure 10.9 More predictive CLTV score

You could use this MCLTV score to guide key business decisions such as:

Which customers get what sorts of promotions (in order to capture more of each customer's untapped potential value)

What sorts of special events to offer to which customers (in order to drive loyalty and increase each customer's MCLTV)

Which stores get a higher allocation of popular products based on the store maximum lifetime value score (where the “store maximum lifetime value” score is the sum of the “MCLTV” scores for the customers who come to that store on a regular basis)
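Here is one hedged way to picture the MCLTV idea in code. The uplift formula is purely an assumption for illustration (historical CLTV scaled up by the average of the two 0 to 100 scores); the chapter does not specify how MCLTV is calculated. The store-level roll-up, by contrast, follows the definition given above: the sum of the MCLTV scores of the store's regular customers. The customer records and field names are hypothetical.

```python
def mcltv(cltv_sales, loyalty_score, activity_score, max_uplift=1.0):
    """Estimate maximum CLTV from historical CLTV and two 0-100 scores.

    Assumed formula: the two scores are averaged into a 0-1 "potential"
    and used to scale CLTV up by at most `max_uplift` (1.0 = up to double).
    """
    potential = (loyalty_score + activity_score) / 200
    return cltv_sales * (1 + max_uplift * potential)


def store_max_lifetime_value(customers, store_id):
    """Sum the MCLTV of customers who regularly shop at `store_id`."""
    return sum(
        c["mcltv"] for c in customers if store_id in c["regular_stores"]
    )


# Hypothetical customers; "regular_stores" lists stores visited on a regular basis.
CUSTOMERS = [
    {"name": "A", "mcltv": mcltv(1000, 50, 50), "regular_stores": {"store_1"}},
    {"name": "B", "mcltv": mcltv(400, 100, 100), "regular_stores": {"store_1", "store_2"}},
    {"name": "C", "mcltv": mcltv(2000, 0, 0), "regular_stores": {"store_2"}},
]
```

Note how customer B has the lowest historical CLTV but the largest relative upside, which is exactly the kind of customer a flat 5 percent rebate would overlook.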

Hopefully this is a simple but powerful example of how to leverage scores to create higher-level maximum value scores that can be used to drive the analytics into action (via decisions) across the organization.

Summary

As you complete the “thinking like a data scientist” process, you can see how scores are a very important and actionable concept for business stakeholders who are trying to envision where and how data science can improve their decision making in support of their key business initiatives. As you saw from the FICO score example, scores aid in decision making by predicting the likelihood of certain actions or outcomes (e.g., likelihood to repay a loan, in the case of the FICO score).

The beauty of a score is its ability to integrate a wide range of variables and metrics into a single number. The power of the score is the ability to start small and then constantly search for new metrics and variables that might yield better predictors of performance.

Simple but powerful, exactly what big data and data science should strive to be.

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Take the results from the “By” analysis conducted in Chapter 9 for your selected business initiative and look for common groupings or potential scores. It may be easier to write each of the metrics and variables onto a separate Post-it note and place them on a flip chart or whiteboard. That will make it easier to move the metrics and variables around as you look for common groupings or potential scores.

Exercise #2: Complete the recommendations worksheet for your selected business initiative. Validate that the scores uncovered in Exercise #1 support the decisions and recommendations that you need to support your selected business initiative.

Exercise #3: Contemplate how you might create a maximum lifetime value score that could be used to support the key decisions that you are trying to make about your targeted business initiative. I think you will find that the maximum lifetime value score can be used to prioritize spend and focus in business initiatives as diverse as marketing effectiveness, patient care, teacher retention, predictive maintenance, revenue protection, and network optimization.

Notes

1 FICO is a software company based in San Jose, California, and founded by Bill Fair and Earl Isaac in 1956. Its FICO score, a measure of consumer credit risk, has become the standard for measuring a consumer's ability to repay a loan in the United States.

2 http://tightwadtravelers.com/check-fico-credit-score-free

3 FICO's five factors: The components of a FICO credit score (http://www.creditcards.com/credit-card-news/help/5-parts-components-fico-credit-score-6000.php).

Chapter 11 Monetization Exercise

Sometimes it is useful to work backwards in the “thinking like a data scientist” process. You can do this by first identifying the potential recommendations that the organization could deliver to its customers and frontline employees, and then working backwards to identify the supporting data and analytic requirements.

This chapter introduces a technique called the monetization exercise that seeks to understand how the organization's products or services are used by its customers, and then identify how the customer and product usage data can be used to create new monetization opportunities. The process works backwards to uncover the metrics, variables, data, and analytic techniques that you might need to support the new monetization opportunities.

The monetization exercise provides an opportunity to uncover new product and/or service opportunities through the identification and delivery of new customer and frontline employee recommendations. The monetization exercise works by first understanding the product usage patterns and customer usage behaviors associated with a particular product or service. The process then seeks to identify complementary or secondary recommendations that can be packaged and delivered along with that product or service (think the Data Monetization phase of the Big Data Business Model Maturity Index). Following is the monetization exercise process:

Step 1: Understand product usage characteristics and behaviors

Step 2: Develop personas for each customer type (including key decisions and pain points)

Step 3: Brainstorm potential customer recommendations

Step 4: Identify supporting data sources

Step 5: Prioritize monetization opportunities (revenue)

Step 6: Develop monetization plan

To get comfortable with this technique, you're going to use the monetization exercise to uncover new monetization opportunities for my new fitness tracker, a wearable device that monitors and provides feedback on my running and walking activities. The goal of this particular monetization exercise is to identify complementary or new monetization opportunities including:

New products and/or services that can be sold to existing customers

New products and/or services that can be used to acquire new customers

New revenue opportunities for the fitness tracker manufacturer's current channel partners (e.g., Sports Authority, Dick's Sporting Goods, Foot Locker)

New markets associated with fitness, exercise, and even potentially wellness

New audiences who might find the new fitness and wellness services compelling

New channels through which to sell the fitness tracker and the associated new services

Let's see the monetization exercise in action!

Fitness Tracker Monetization Example

In trying to stay true to my annual New Year's resolution to live a healthier and more athletically fit life, I was thinking about upgrading my current fitness tracker. The most important requirements for my ideal fitness tracker are the ability to add GPS tracking and new performance metrics to my workouts. In thinking about the fitness tracker marketplace, I saw that there seem to be lots of opportunities for fitness tracker manufacturers to provide additional products and services that would make their fitness trackers more valuable to the consumer, as well as provide dramatic business benefits to the fitness tracker manufacturer and its channel partners.

Let's walk through an example to see how the fitness tracker manufacturer could leverage the monetization exercise to create new products and services and uncover new monetization opportunities.

Step 1: Understand Product Usage

The first step in the monetization exercise is for the fitness tracker manufacturer to understand the key features and capabilities of the product or service being analyzed. For example, my ideal fitness tracker would have the following functionality:

Provides a complete history of my workouts including my start and finish times, time elapsed, distance, pace, and calories burned

Measures my current speed, distance, time elapsed, pace, and calories burned

Has a built-in GPS that delivers accurate speed and distance data readings and maps my workout

Monitors my heart rate

Records up to 50 runs and my personal bests

Enables me to easily review and analyze my workout history

Allows me to download my workout data for more detailed analysis (yeah, I know, I'm a nerd)

Delivers recognition alerts when I beat a personal record

Integrates performance results easily into my different social media networks (and supports gamification so I can rank my performance results versus those of my friends)

What is critically important to the fitness tracker manufacturer is how the fitness tracker is used and the decisions I am trying to make associated with those usage behaviors. From my own personal experience, the fitness tracker encourages different usage behaviors such as:

Encourages me to take more walks including a lot more with the dog (poor Puffer)

Encourages me to take the stairs instead of the escalator

Encourages me to walk around the airport terminal as I wait for my delayed flight to finally depart

Encourages me to park farther away from stores or restaurants so that I have longer to walk, or to walk to the furthest bathroom in the mall just to build up my steps

Encourages me to ride my bike instead of drive the car for short trips

These behavioral changes, and the decisions associated with those behaviors (e.g., what shoes to wear, what running routes to take, how long to run, with whom to run), provide new monetization opportunities, which means that organizations need to go the extra mile to truly understand not only how their product is used but also the personal behaviors that are associated with their product usage.

Step 2: Develop Stakeholder Personas

Step 2 is for the fitness tracker manufacturer to identify and understand its different customer types. Identifying and understanding the organization's different customer types is a process that was covered in Chapter 8. For each of the customer types, the manufacturer would want to create a separate persona that captures the customers' tasks, decisions, and associated pain points with respect to their usage of the fitness tracker.

Figure 11.1 shows a persona for a key customer type that I have labeled the “Spirited Runner” (or at least that's how I would classify myself).

Figure 11.1 “A day in the life” customer persona

As you have seen in the use of personas in the previous chapters, the “day in the life” persona seeks to provide a baseline understanding of the tasks, decisions, and pain points associated with the usage of the product or service. For example, in Figure 11.1, the “spirited runner” persona has the following decisions to contemplate for the “early morning run around the neighborhood to jump-start the day” task:

What running shoes do I wear?

What gear do I wear?

How long do I run?

What route do I run?

Do I run alone or with a friend?

As you know from the previous exercises in this book, understanding the decisions that the key users or business stakeholders are trying to make is critical to uncovering new monetization opportunities.

There are likely other decisions that could be captured for this persona, so it is worth the extra effort to put yourself in the person's shoes to better understand the decisions he or she is trying to make and the associated pain points.

Personas could also be developed for additional runners such as:

Extreme runner (runs marathons, Ironman contests, and adventure races)

Occasional runner (runs a couple of times a week but is not very serious about running)

Reluctant runner (runs only at the beginning of each new year as part of his or her New Year's resolutions)

However, there are some other important business stakeholders for which the fitness tracker manufacturer would want to create additional personas. Those additional business stakeholder personas include:

Fitness tracker manufacturer product development (which could also include product management and product marketing for completeness)

Fitness tracker manufacturer sales and marketing

Fitness tracker manufacturer channel partners (Foot Locker, Sports Authority, Big Five, Dick's)

As a homework exercise, you will be asked to create personas for one of these additional business stakeholders.

Step 3: Brainstorm Potential Recommendations

Step 3 is to brainstorm potential recommendations that could be delivered to each business stakeholder. That is, what recommendations could the organization deliver to the different stakeholders that benefit or support the stakeholder's decisions? There are two angles that you can leverage to help uncover potential recommendations:

Understand the decisions the different stakeholders need to make and the associated pain points, and contemplate recommendations that might support the decisions and/or help to address the associated pain points

Leverage your observations about the personal behavioral changes induced by the fitness tracker to identify other potential recommendations

You could use an old-fashioned facilitated brainstorming session (complete with lots of Post-it notes) to brainstorm potential recommendations for each of the key business stakeholders from the perspectives of the decisions that they are trying to make and the associated behavioral changes.

Table 11.1 shows some potential recommendations that the fitness tracker manufacturer could deliver to the customer persona based on the decisions that the customer is trying to make and the desired behavioral changes.

Table 11.1 Potential Fitness Tracker Recommendations

Decision: What running shoes do I wear?

Potential recommendations:

Optimal running shoes given the consumer's running and walking behaviors, patterns, tendencies, routes, and physical attributes

When to replace running shoes given how much the consumer has run on those particular shoes, how frequently the consumer runs, the type of terrain on which the consumer runs, and current “wear and tear” of the shoes

Running accessories or apparel such as special running socks, thermal tights, stocking caps, and gloves for the cold weather when I travel to Iowa to visit my son

Decision: How long do I run?

Potential recommendations:

New performance metrics such as elevation covered, workout effort level, circuit training metrics, calories burned, etc.

Decision: What route do I run?

Potential recommendations:

New local running routes near the runner's home or favorite running routes

New traveling running routes when the consumer is traveling to other areas of the country

Decision: Do I run alone or with a friend?¹

Potential recommendations:

Potential running partners based on social media contacts, running tendencies, and running locations

Step 4: Identify Supporting Data Sources

Step 4 is to brainstorm the different data sources that one might need in order to create the recommendations.

NOTE

I use the term might frequently to convey that an important part of the exercise is to not pass judgment on the value or viability of the brainstormed data sources. You want to collect any and all ideas regarding potential data sources. All ideas are worthy of consideration. Determining the value or viability of the data source during the brainstorming process only inhibits the creative thinking process. We will determine the value and viability of the data sources later.

Table 11.2 provides an example of some of the data sources that you might want to consider to support the development of the recommendations.

Table 11.2 Recommendation Data Requirements

Key Stakeholder: End Consumer

Potential recommendation: Optimal running shoes

Potential data sources:

Exercise data: performance data about my exercises including length of time, effort level, calories burned, distance covered, points earned, etc.

Workout GPS data: data about my workout route including a map of the route, route terrain, elevation, time of day, etc.

Weather data: data about the weather conditions during my workout including temperature, precipitation, humidity, etc.

Runner data: weight, height, age, gender, body mass index, shoe size, width of foot, high/low arch, preferred terrain type, etc.

Potential recommendation: When to replace running shoes

Potential data sources:

Shoe data: detailed data about my shoes including manufacturer, brand, type of shoe, size of shoe, when shoe was bought, where shoe was bought, where shoe was made, user reviews, etc. Note: the fitness tracker could provide an option that allows the runner to take a photo of the shoes and the app automatically provides data about the condition of the shoe.

Workout GPS data: data about my workout route including a map of the route, route terrain, elevation, time of day, etc.

Shoe wear data: ask consumer to take periodic photos of the soles in order for the manufacturer to track shoe wear and tear

Potential recommendation: Running accessories

Potential data sources:

Inventory of my running accessories: brand, type, size, where I bought it, when I bought it, what I bought it with

Running accessories usage data: what I wear in what conditions, what I wear in combination with other workout items

Potential recommendation: New performance metrics

Potential data sources:

Allow users to create and share their own calculations and performance metrics (Schmarzo Performance Index = INTEGER(Steps/1000) + INTEGER(FuelPoints/1000))

Allow users to download the data to create and share new reports and analytics

Integrate fitness tracker data with other exercise apps like MapMyFitness or MyFitnessPal

Potential recommendation: New local running routes

Potential data sources:

Analyze GPS and exercise data across all fitness tracker users in order to identify new routes in which I might be interested

Integrate third-party apps like MapMyFitness and MyFitnessPal for capturing additional route, exercise, and workout data

Potential recommendation: New running routes while traveling

Potential data sources:

Collect all running and walking routes across all fitness tracker customers by location and exercise type (light walking, heavy running, etc.)

Match my running and walking tendencies to the collection of running and walking routes in order to make new route recommendations

Potential recommendation: Potential running partners

Potential data sources:

Social media contacts from Facebook, Twitter, Instagram, etc.

Relevant social media posts from my social media friends about their running behaviors and patterns and exercise habits

Current location of my social media contacts (in order to make real-time running partner recommendations)
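The user-defined metric mentioned under “New performance metrics” in Table 11.2 is simple enough to write down directly. The sketch below assumes that INTEGER means dropping the fractional part and that both inputs are non-negative counts, so integer floor division gives the same result; the function and parameter names are hypothetical.

```python
def schmarzo_performance_index(steps, fuel_points):
    """Schmarzo Performance Index = INTEGER(Steps/1000) + INTEGER(FuelPoints/1000).

    Assumes non-negative integer inputs, for which floor division matches
    truncating the fractional part.
    """
    if steps < 0 or fuel_points < 0:
        raise ValueError("steps and fuel_points must be non-negative")
    return steps // 1000 + fuel_points // 1000
```

For example, 12,500 steps and 3,400 fuel points give 12 + 3 = 15. Letting users define and share small formulas like this one is what turns the raw tracker data into a community feature.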

Step 5: Prioritize Monetization Opportunities

Step 5 is focused on prioritizing the recommendations from the perspectives of business value and implementation feasibility. For this exercise, you will use the prioritization matrix (which is covered in detail in Chapter 13), but with three dimensions:

Value of the recommendation to the consumer

Value of the recommendation to the fitness tracker manufacturer

Implementation feasibility of the recommendation over the next 9 to 12 months (based on the availability of the supporting data)

Walking through a facilitation process to explore and triage these three dimensions is hard to do in a book; however, you can leverage brainstorming and polling techniques to get a high-level ranking or rating for the answers to these three dimensions as seen in Table 11.3.

Table 11.3 Recommendations Value Versus Feasibility Assessment

Recommendation                    Consumer Value   Manufacturer Value   Feasibility
A. Optimal new running shoes²     Medium           High                 High
B. When to replace running shoes  High             High                 Low
C. New local routes               High             Low                  Medium
D. Running partners               Medium           Low                  Low
E. Running apparel                Medium           High                 High
F. Routes when traveling          Low              Low                  Medium
G. New running metrics            High             Low                  Medium

Since three dimensions don't work very well on a two-dimensional sheet of paper, you will leverage a visualization technique (shade of the dots) that allows you to mimic three dimensions in a two-dimensional environment. Figure 11.2 shows what the final results of the prioritization process might yield.

Figure 11.2 Fitness tracker prioritization
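If you want a quick first-pass ranking before the facilitated discussion, the ratings in Table 11.3 can be turned into numbers and sorted. The sketch below is an assumption-laden convenience, not the prioritization matrix itself: mapping Low/Medium/High to 1/2/3 and weighting the three dimensions equally are both arbitrary choices, and the `weights` parameter is there so you can try others.

```python
RATING = {"Low": 1, "Medium": 2, "High": 3}

# (consumer value, manufacturer value, feasibility), as rated in Table 11.3.
RECOMMENDATIONS = {
    "A": ("Medium", "High", "High"),    # Optimal new running shoes
    "B": ("High", "High", "Low"),       # When to replace running shoes
    "C": ("High", "Low", "Medium"),     # New local routes
    "D": ("Medium", "Low", "Low"),      # Running partners
    "E": ("Medium", "High", "High"),    # Running apparel
    "F": ("Low", "Low", "Medium"),      # Routes when traveling
    "G": ("High", "Low", "Medium"),     # New running metrics
}


def rank_recommendations(recs, weights=(1.0, 1.0, 1.0)):
    """Return recommendation letters, highest weighted total first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    def total(ratings):
        return sum(w * RATING[r] for w, r in zip(weights, ratings))

    return sorted(recs, key=lambda name: (-total(recs[name]), name))
```

With equal weights, A and E come out on top because they score well on both manufacturer value and feasibility. A ranking like this is only an input to the conversation; sequencing decisions such as building data and analytic capabilities over time still require judgment.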

Step 6: Develop Monetization Plan

As you can see from Figure 11.2, deciding on the “right” monetization opportunity is not always straightforward. The consumers prefer recommendations B (when to replace running shoes), C (recommending new routes), and G (creating new metrics), but of those three only recommendation B is of high value to the fitness tracker manufacturer (since it leads to more direct sales). And unfortunately recommendation B isn't easy from an implementation feasibility perspective since it requires significant consumer-provided data.

Figure 11.3 Monetization roadmap

Oh, what is one to do?

Maybe like a chess game, the answer lies a couple of moves beyond the obvious. Maybe the fitness tracker manufacturer would be best served to think about a roadmap that looks like Figure 11.3.

The monetization roadmap would look as follows:

Phase 1 would focus on recommendations A, C, and E in order to build consumer interest in the fitness tracker products and start to collect more data about runners and their running behaviors (using consumer running behavioral and next best offering analytics).

Phase 2 would then deliver recommendation D, which allows the fitness tracker manufacturer to build up its expertise in social media analysis in identifying and recommending potential running partners (using cohorts analysis).

Phase 3 would then focus on recommendation B, which has the highest value to the fitness tracker manufacturer and builds on the analytic expertise that it developed in phase 1 to move into the area of predictive maintenance and product replacement analytics.

Finally, phase 4 would then deliver on recommendation G, which fosters community by building and sharing new performance calculations, metrics, analytics, and reports between fitness tracker community members.

This monetization roadmap has three big benefits for the fitness tracker manufacturer:

Captures more and more data about runners' usage behaviors, patterns, and tendencies

Captures more data about product usage and wear

Gradually builds up the organization's data science capabilities in areas such as consumer behavioral analytics, next best offer, cohorts analysis, predictive maintenance, and product replacement analytics

Summary

This chapter introduced the monetization exercise as complementary to the “thinking like a data scientist” process to help organizations to uncover new product and/or service opportunities through the identification of new customer and employee recommendations. The monetization exercise is a non-technology, business-centric, organizational-alignment technique that uses the following process to uncover new monetization opportunities (phase 4 of the Big Data Business Model Maturity Index):

Identify and understand how customers use your products and/or services

Identify and understand key business stakeholders (customers, frontline employees, partners) including their key tasks, decisions, and associated pain points

Brainstorm the types of recommendations that you could deliver to the stakeholders based on their usage of the product or service

Identify the different data sources that might help support the recommendations

Go through a valuation process where you contemplate three key variables for each recommendation: value of the recommendation to the customer, value of the recommendation to the manufacturer, and implementation feasibility

Look for opportunities to cluster recommendations into similar groups in order to create a monetization roadmap

Homework Assignment

Use the following exercises to apply what you learned in this chapter:

Exercise #1: Develop a persona for one of your organization's key customers (stakeholders). Be sure to carefully contemplate that customer's key tasks, decisions, and associated pain points. I strongly recommend using the same template used in Figure 11.1.

Exercise #2: Brainstorm the recommendations your organization could deliver to that customer based on the customer's key decisions. Be sure to take into consideration the pain points as you brainstorm the recommendations. Use the template used in Table 11.1.

Exercise #3: Brainstorm the potential data sources for each of the identified recommendations. Again, all data source ideas are worthy of consideration, and you'll determine the value and feasibility of the different data sources later. Use the format in Table 11.2 to capture the data sources.

Exercise #4: Prioritize the recommendations from the perspectives of the value of the recommendation to the customer, the value of the recommendation to your organization, and the implementation feasibility over the next 9 to 12 months.

Exercise #5: Cluster the recommendations into similar or logical groups to create a monetization plan.

Notes

1 How about buying a fitness tracker for your dog's collar with an app that can tell you whether or not your dog needs exercise, what type, how much, etc.? That would be another product and service that, when coupled with data about your dog's breed, age, health, etc., could yield a more “fit” dog. You could call the product and service “FitBark” (he he he).

2 Providing recommendations on optimal running shoes and running apparel creates new monetization opportunities from co-marketing with sporting shoe and apparel manufacturers.

Chapter 12 Metamorphosis Exercise

Reaching the Business Metamorphosis phase of the Big Data Business Model Maturity Index is a significant accomplishment for most organizations. Even just contemplating what this end point might look like can be quite beneficial in the development of an organization's big data initiative. Beginning with an end in mind, to quote Stephen Covey, can not only help the organization's leaders to envision the potential of big data from a business transformation perspective but also, pragmatically, help the organization to identify where and how to start its big data journey.

In working with organizations to measure how effectively they leverage data and analytics within their key business processes using the Big Data Business Model Maturity Index (see Figure 12.1), I created an exercise to help organizations to envision what the business metamorphosis might look like. While it's not possible to start your big data journey at this phase, the exercise has helped my clients identify, prioritize, and develop their big data use cases.

Figure 12.1 Big Data Business Model Maturity Index

Business Metamorphosis Review

As a refresher, the Business Metamorphosis phase is where organizations seek to leverage data, analytics, and the resulting analytic insights to transform the organization's business models. This includes areas such as business processes, organizational structures, products and services, partnerships, target markets, management, promotions, rewards and incentives, and others. The Business Metamorphosis phase is where organizations integrate the insights that they captured about their customers' usage patterns, product performance behaviors, and overall market trends to transform their business models. This business model metamorphosis might enable the organization to provide new services and capabilities to its customers in a way that is easier for them to consume. Perhaps it could enable third-party developers to proliferate on the organization's foundational platform, or facilitate the organization engaging in higher-value and more strategic services.

The Business Metamorphosis phase necessitates a major shift in the organization's core business model driven by the analytic insights gathered as the organization traverses the Big Data Business Model Maturity Index.

Here are some examples of what organizations could do to leverage data, analytics, and the resulting analytic insights to metamorphose their business models:

Jet engine manufacturer transforming from selling jet engines to selling “thrust” and related high-value services to the airlines around service level agreements (on-time departures, on-time arrivals), product maintenance (minimizing aircraft downtime), insurance, warranties, and upgrading product performance over time (improving fuel efficiency).

Farm equipment manufacturer transforming from selling farm equipment to selling “farming yield optimization” to farmers by leveraging superior insights into seeds, soil conditions, weather, fertilizers, pesticides, irrigation techniques, and projected crop prices.

Energy companies moving into the “Home Energy Optimization” business by recommending when to replace appliances (based on predictive maintenance) and even recommending which appliance brands and models to buy based on the performance of different appliances taking into consideration your usage patterns, local weather, local water quality and local water conservation efforts, and energy costs.

Airlines moving into the “Travel Delight” business of not only offering discounts and upgrades on air travel based on customers' travel behaviors and preferences but also proactively recommending deals on hotels, rental cars, limos, sporting or musical events, and local sites, shows, restaurants, and shopping in the destination areas based on your areas of interest and preferences.

Continuing with the Foot Locker example from previous chapters, business metamorphosis for Foot Locker could mean shifting away from selling sporting shoes and apparel to providing “workouts as a service.” Foot Locker could monitor all of your workouts and walking activities and automatically recommend the most appropriate shoes, workout apparel, workout routines, gym memberships, and exercise tips based on your unique workout habits, patterns, tendencies, and propensities. Foot Locker could even expand into “health and wellness services” to provide tips and recommendations about your diet, exercise, stress, cholesterol, and so forth, focused on improving your overall health and wellness (and maybe even helping to reduce your health insurance costs).

In all of these examples, these organizations couple new sources of customer, product, and operational data with data science to uncover new actionable insights that form the basis to metamorphose their business models.

Let's introduce an exercise that can help to strengthen your “thinking like a data scientist” methodology. The exercise begins with the Business Metamorphosis stage and works backwards to identify potential big data use cases and the supporting data and analytics.

Business Metamorphosis Exercise

I asked students in one of my MBA classes to pretend that they were management consultants that had been asked by a large airplane manufacturer to contemplate how big data could metamorphose the organization's future business model. In essence, the large airplane manufacturer wanted to metamorphose the business by transitioning from selling airplanes to selling “air miles.”

The students, acting as management consultants, needed a process to uncover the analytic insights about passengers, airplanes, airlines, airports, and routes (the strategic nouns of this exercise) necessary to support the business metamorphosis: to transform business processes, people, organizational structures, products and services, partnerships, markets, organization, promotions, rewards, incentives, and so on. The management consultants would also need to identify the data, analytic, and business requirements necessary to encourage third-party developers to create value-added services and products based on the airplane manufacturer's new business platform.

Articulate the Business Metamorphosis Vision

The first step in the metamorphosis exercise is to articulate and understand the business ramifications of the airplane manufacturer's new business model vision. Use the following vision statement as your starting point:

Large airplane manufacturer wants to metamorphose its business model by transitioning from selling airplanes to selling air miles (transporting 250 customers 2,600 air miles from SFO to JFK on Sunday mornings at 9:00 am) in order to create new high-value services for airlines (e.g., United, American, Delta, Southwest) and enable third-party developers to extend the airplane manufacturer's business model to airlines and potentially other customers, partners, and markets.

As a starting point, this vision statement could have the following ramifications to the airplane manufacturer's business model:

The airplane manufacturer would enjoy a dramatic competitive advantage over other airplane manufacturers by providing new business benefits to the airlines including significantly improved cash flow and financials (reduced capital expenditures), elimination of maintenance costs, elimination of parts inventory costs, and mitigation of flight delay risks.

The airplane manufacturer would be responsible for owning and managing the fleets of airplanes (likely under their brand), and the airlines would contract with the airplane manufacturer to acquire (provision?) the air miles necessary to transport a specified number of the airline's passengers from one location to another at a specified time and date.

The airplane manufacturer would assume all responsibilities for ensuring that planes are up and running (e.g., maintenance scheduling, maintenance technician training and management, maintenance and replacement parts inventory, component and software upgrades). If the planes were not flying, then the airplane manufacturer would not be getting paid.

Understand Your Customers

The second step in the metamorphosis exercise is to identify and understand the airplane manufacturer's customers. It is clear that its current customers (airlines) would continue to be the future customers. However, this opens up opportunities to acquire new types of customers (airline passengers, for example).

The airplane manufacturer is now in a unique position to know details about airline passengers who fly across different airlines and can now offer new services to those airline passengers that could be more compelling than any single airline could offer on its own. It could, for instance, create a new type of frequent flyer program that offers rewards, gifts, upgrades, recognition, and special privileges (airport club access, priority TSA pre-check) to passengers who fly on any of the airplane manufacturer's planes, regardless of the airline.

There may be other opportunities to leverage this new business model to address other customers, such as:

Travel agents by virtue of having a more complete understanding of passenger demand and flight and seat availability

Hotel operators who could work with the airplane manufacturer to direct customers to available rooms

Ground transportation companies (car rental companies, Uber, Lyft, taxis, airport shuttles) by sharing passenger forecasts into specific airports

Sporting, casinos, and entertainment companies by directing passengers to sporting events and entertainment that may match the passengers' areas of interest

And there are surely others. The business potential to reach new customers and new markets with new services is only limited by the creative thinking of the organization.

Articulate Value Propositions

The next step is to brainstorm what this business metamorphosis would mean to the airplane manufacturer's customers (United, American, Delta, Southwest, Virgin Atlantic, etc.). Let's contemplate the value propositions that the airplane manufacturer's new business model might provide to these airline customers. These value propositions to the airlines could include:

Significantly improve airline cash flow by converting the fixed monthly airplane lease payments to a variable cost based on the number of passengers and air miles. This gives the airlines significant flexibility in defining, scheduling, and managing passengers, routes, and crews.

Dramatic reduction in maintenance costs including spare and maintenance parts inventory and maintenance personnel (including hiring, training, and managing of maintenance personnel).

Reduction in unplanned and overtime costs associated with flight delays due to mechanical issues, as these issues would now become the airplane manufacturer's responsibility.

Airlines could then focus on differentiating themselves in areas other than airplane configuration (because the same models of airplanes would likely be used to serve multiple airlines) including: on-plane customer service and amenities, onboard meals (yeah, right), gate area customer service and amenities (lounge chairs instead of today's stadium rejected seats), frequent flyer reward programs (with miles that you can actually use), club locations and amenities, ticket pricing, travel convenience, trip duration times (e.g., reduce number of connections), and so on.

While this could be a scary proposition for some airlines, for other airlines it provides an opportunity to provide new high-value services to high-value customers in order to build loyalty in new and creative ways outside of just flight schedules and seat availability.

Define Data and Analytic Requirements

The final step in the metamorphosis exercise is to brainstorm the airplane manufacturer's data and analytic requirements. You will brainstorm these requirements via a three-step process: (1) identify key business and operational decisions, (2) identify the analytics to support the decisions, and (3) identify data to support the analytics.

Step 1: Identify Business and Operational Decisions

The first step is to identify the key operational and business decisions that the airplane manufacturer needs to make in order to support the new business model. It is critical to the success of your big data initiative to thoroughly understand the business and operational decisions that the key business stakeholders are responsible for making. Decisions (and some supporting questions) for the airplane manufacturer could include the following:

Decisions about pricing and their supporting questions:

How do I price considering surge demand driven by special events (bowl games, Final Four tournament, holidays)?

How directly does my pricing impact the airlines' pricing and their ability to be profitable?

Can I support surge pricing?

Can I provide pricing discounts for packages of air miles?

Decisions about sales and marketing and their supporting questions:

How can I leverage a loyalty program to drive usage and capture more passenger data?

What promotional packages are most effective at driving passenger demand?

Can I leverage social media and influencers to drive families and groups to fly?

In what markets and routes do what types of promotions work best?

Decisions about in-flight airplane performance and their supporting questions:

Which airplane configurations yield the best fuel efficiencies?

What pilots are most fuel efficient?

What are the optimal crew configurations?

How do I best distribute baggage and cargo to optimize fuel efficiencies?

Which in-flight MVP passengers are unhappy, and what should I do about that?

Decisions about passenger and baggage management and their supporting questions:

How can I speed loading and unloading passengers and baggage (in order to speed airport turns)?

What airplane configurations are most effective at getting passengers and baggage on and off the planes faster?

How do I incent more passengers to check bags so that less time is spent in boarding planes (again, to speed airplane turns)?

Should I create a ramp management service where I take responsibility for loading and unloading the airplane baggage?

Decisions about airplane maintenance and their supporting questions:

How do I select which airplanes and/or jet engines to replace with more efficient models?

How do I balance the jet engine fuel efficiency with jet engine maintenance costs?

Which jet engines are most cost-effective from a fuel efficiency and maintenance perspective?

Decisions about parts and logistics management and their supporting questions:

How can I reduce spare parts and maintenance costs?

What is the optimal number and type of airplane configuration in order to reduce spare parts and inventory costs?

Can I design planes with more interchangeable parts to reduce parts inventory costs?

How can I leverage low-cost, centralized parts depots to support the maintenance and inventory needs of the high-volume airports (e.g., Cedar Rapids, IA servicing ORD, MSP, MCI, and STL)?

Decisions about airplane design and their supporting questions:

How can I design/build/configure airplanes to get passengers on and off the plane more quickly?

How can I design/build/configure airplanes that reduce parts maintenance costs?

How can I design/build/configure airplanes that reduce operational costs (gate agents, baggage workers, flight attendants, pilots)?

How can I design/build/configure airplanes that reduce parts inventory costs?

Step 2: Identify Analytic Requirements

Next you need to identify the analytics that the airplane manufacturer would need to support the operational and business decisions. In this step you want to work backwards from the decisions and supporting questions to identify the potential analytics necessary to support the decisions. Table 12.1 contains a starter set of these analytics.

Table 12.1 Decisions to Analytics Mapping

Decisions: Potential Analytics

Pricing: Passenger (demand) forecast; Fuel costs forecast; Maintenance costs forecast; Pilot/flight attendant performance optimization; Pilot/flight attendant retention effectiveness

Sales and marketing: Passenger lifetime value score; Passenger loyalty score; Passenger net promoter score; Passenger acquisition effectiveness; Passenger retention effectiveness; Marketing campaign effectiveness; Personalized promotions effectiveness

In-flight performance: Airplane fuel optimization; Crew scheduling optimization; Cargo distribution optimization; Baggage distribution optimization; Passenger distribution optimization

Passenger and baggage management: Baggage handler/agent scheduling optimization; Baggage handler/agent cost optimization; Baggage handler/agent performance monitoring; Baggage handler/agent retention; Flight turnaround effectiveness

Airplane maintenance: Airplane and parts predictive maintenance; Weather forecasts; Airplane/component upgrades; Optimize inventory costs; Optimize logistics costs; Maintenance worker effectiveness

Parts and logistics management: Airplane and parts predictive maintenance; Maintenance scheduling optimization; Crew scheduling optimization; Parts demand forecast; Parts inventory optimization; Parts logistics optimization

Airplane design: Long-term fuel costs forecast; Airplane design fuel efficiency; Passenger board/de-board optimization; Baggage load/unload optimization
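Working backwards from decisions to analytics also shows where one analytic serves several decision areas. The following is a minimal Python sketch of that idea, not part of the original exercise. It holds a small subset of Table 12.1 in a dictionary and inverts it. The names `DECISION_TO_ANALYTICS`, `analytics_to_decisions`, and `shared_analytics` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# A subset of Table 12.1, transcribed from the text (not the full table).
DECISION_TO_ANALYTICS = {
    "In-flight performance": [
        "Airplane fuel optimization",
        "Crew scheduling optimization",
    ],
    "Airplane maintenance": [
        "Airplane and parts predictive maintenance",
        "Maintenance worker effectiveness",
    ],
    "Parts and logistics management": [
        "Airplane and parts predictive maintenance",
        "Crew scheduling optimization",
        "Parts demand forecast",
    ],
}


def analytics_to_decisions(mapping):
    """Invert decision -> analytics into analytic -> sorted list of decisions."""
    inverted = defaultdict(set)
    for decision, analytics in mapping.items():
        for analytic in analytics:
            inverted[analytic].add(decision)
    return {analytic: sorted(decisions) for analytic, decisions in inverted.items()}


def shared_analytics(mapping):
    """Return only the analytics that support more than one decision area."""
    inverted = analytics_to_decisions(mapping)
    return {a: d for a, d in inverted.items() if len(d) > 1}


if __name__ == "__main__":
    for analytic, decisions in sorted(shared_analytics(DECISION_TO_ANALYTICS).items()):
        print(f"{analytic}: {', '.join(decisions)}")
```

An analytic that shows up under several decision areas is a reasonable candidate to build early, because the same model can be reused across stakeholders.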

Step 3: Identify Data Requirements

In step 3 you identify the data that the airplane manufacturer might need to support the pricing, sales, marketing, maintenance, logistics, and other analytics. You want to brainstorm the different data sources that might be useful in helping you develop the analytics to support your key decisions. Let's expand Table 12.1 to include the different data sources you might need to support the analytics (see Table 12.2):

Table 12.2 Data-to-Analytics Mapping

Decisions / Potential Analytics / Potential Data Sources

Pricing decisions
Potential analytics: Passenger (demand) forecast; Fuel costs forecast; Maintenance costs forecast; Pilot/flight attendant performance optimization; Pilot/flight attendant retention
Potential data sources: Passenger flight history; Airplane flight history (routes, airports, miles flown, fuel consumed, passengers carried, % empty seats); Airplane flight sensor data; Airplane physical data (age, last upgrade date, configuration, weight, fuel consumption, capacity, max air speed); Airplane maintenance history; Pilot/flight attendant demographics; Pilot/flight attendant flight history; Pilot/flight attendant notes and comments; Airport physical data (number of runways, age of runways, operation hours); Airport weather; Economic data; History of fuel costs

Sales and marketing programs decisions
Potential analytics: Passenger lifetime value score; Passenger loyalty score; Passenger net promoter score; Passenger acquisition; Passenger retention; Marketing campaign effectiveness; Personalized promotions effectiveness
Potential data sources: Passenger demographics (age, height, weight, family members, job type); Passenger flight history; Passenger social media data (posts, likes, tweets, shares); Passenger comments; Passenger social media sentiment

In-flight airplane performance decisions
Potential analytics: Airplane fuel optimization; Crew scheduling optimization; Cargo distribution optimization; Baggage distribution optimization; Passenger distribution optimization
Potential data sources: Route data (departure, destination, distance, wind patterns); Weather conditions; Airport data (number of runways, landing traffic patterns and demand); Weight of baggage; Weight of passengers (ouch!); Weight of cargo; Airplane fuel consumption history

Passenger and baggage management decisions
Potential analytics: Baggage handler/agent scheduling optimization; Baggage handler/agent cost optimization; Baggage handler/agent performance monitoring; Baggage handler/agent retention; Flight turnaround effectiveness
Potential data sources: Baggage loading and unloading performance data (flight, airplane configuration, airport, size of crew, experience of crew); Baggage handler/agent demographics (age, experience, training, recognitions); Baggage handler/agent work history; Baggage handler/agent notes and comments; Flight data (departure time, actual departure time, departure airport, destination airport, air miles, etc.)

Airplane maintenance decisions
Potential analytics: Airplane and parts predictive maintenance; Airplane/component upgrades; Optimize inventory costs; Optimize logistics costs; Maintenance worker effectiveness
Potential data sources: Airplane physical data (age, last upgrade date, configuration, weight, fuel consumption, capacity); Airplane flight history of number of passengers flown by route, day of week, holiday and seasonality; Airplane maintenance history (date, work done, parts replaced, technician, costs); Maintenance worker data (age, experience, areas of expertise, certifications); Maintenance worker comments and notes; Average mean-time-to-failure (air miles) by maintenance types; Average maintenance parts and personnel costs by maintenance types

Parts and logistics management decisions
Potential analytics: Airplane and parts predictive maintenance; Maintenance scheduling optimization; Crew scheduling optimization; Parts demand forecast; Parts inventory optimization; Parts logistics optimization
Potential data sources: Replacement parts data (costs, manufacturer, associated parts, special certification); Maintenance parts (costs, manufacturer); Logistics center data (location, costs, capacity, access points); Inventory levels

Airplane design decisions
Potential analytics: Long-term fuel costs forecast; Airplane design fuel efficiency; Passenger board/de-board optimization; Baggage load/unload optimization
Potential data sources: Forecast fuel costs/fuel price index; Average weight and age by passenger; Optimal airplane flow (load and unload) by airplane configuration
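Once the three columns are filled in, one practical question is which data sources to acquire first. The sketch below is one possible approach, under the assumption that a data source feeding more decision areas deserves earlier attention. The text does not prescribe this ranking. The rows are a small subset of Table 12.2, and `Row`, `data_source_demand`, and `rank_sources` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of the decisions / analytics / data sources table."""
    decision: str
    analytics: tuple
    data_sources: tuple


# A subset of Table 12.2, transcribed from the text (not the full table).
ROWS = (
    Row(
        "Pricing decisions",
        ("Passenger (demand) forecast", "Fuel costs forecast"),
        ("Passenger flight history", "Airplane physical data",
         "Airplane maintenance history", "History of fuel costs"),
    ),
    Row(
        "Sales and marketing programs decisions",
        ("Passenger lifetime value score", "Passenger loyalty score"),
        ("Passenger demographics", "Passenger flight history"),
    ),
    Row(
        "Airplane maintenance decisions",
        ("Airplane and parts predictive maintenance",),
        ("Airplane physical data", "Airplane maintenance history"),
    ),
)


def data_source_demand(rows):
    """Map each data source to the list of decision areas that rely on it."""
    demand = {}
    for row in rows:
        for source in row.data_sources:
            demand.setdefault(source, []).append(row.decision)
    return demand


def rank_sources(rows):
    """Order data sources by how many decision areas they feed (ties by name)."""
    demand = data_source_demand(rows)
    return sorted(demand.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    for source, decisions in rank_sources(ROWS):
        print(f"{len(decisions)} decision area(s): {source}")
```

A count like this is only a starting point for discussion. Data that is expensive or slow to obtain may still rank lower once feasibility is considered.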

Using this approach, my MBA students were able to quickly determine the insights, analytics, and potential data sources necessary to support the airplane manufacturer's business metamorphosis without having any working experience with either the airplane manufacturer or any airline company. I think they impressed themselves!

Business Metamorphosis in Health Care

I'm struck by what's happening with the United States healthcare industry and the power struggle between healthcare providers and healthcare payers. The healthcare industry is ripe for a metamorphosis into something much more efficient, effective, and customer (patient) centric. This healthcare business metamorphosis could create new power brokers: healthcare players who will leverage new sources of patient, physician, clinical, medication, wellness, and care data to improve the quality of care and outcomes, more effectively manage costs, dramatically reduce or eliminate inefficient and unnecessary processes and procedures, and provide a much more compelling patient and physician experience. Think about it as the Uber-ification of the healthcare industry by simplifying the overall healthcare process in order to reduce costs, improve patient care, and improve overall population wellness.1

Today there is friction between healthcare providers (doctors, hospitals, clinics) and the healthcare payers (insurance companies, government agencies). The healthcare payers want to cap the cost of medical services by dictating how much they are willing to reimburse for particular types of care under particular conditions. However, the healthcare providers are starting to capture and analyze a wider variety of patient, care, and treatment data. This includes structured data from operational systems (Epic, Cerner, Lawson, Kronos), unstructured data (nurse and physician notes, patient comments, e-mail conversations), and external data sources (WebMD, Fitbit, MyFitnessPal, Yelp, Lumosity, and a growing variety of other healthcare-related websites and mobile apps). Leading healthcare providers are integrating these data sources to create actionable scores about their patients' overall wellness (diet, exercise, stress), as well as scores about the patients' likelihood for strokes, heart attacks, diabetes, and other maladies (see Figure 12.2).

Figure 12.2 Patient actionable analytic profile
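To make the idea of an actionable score concrete, here is a minimal Python sketch that rolls diet, exercise, and stress sub-scores into one overall wellness score. The text does not say how such scores are actually calculated, so everything here is an assumption: the 0 to 100 scale, the weighted average, the weights in `DEFAULT_WEIGHTS`, and the names `PatientScores` and `wellness_score` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical relative weights; a real provider would derive these from data.
DEFAULT_WEIGHTS = {"diet": 2, "exercise": 2, "stress": 1}


@dataclass
class PatientScores:
    """Sub-scores on an assumed 0-100 scale, where higher is always better."""
    diet: float
    exercise: float
    stress: float  # higher means better-managed stress


def wellness_score(scores, weights=None):
    """Weighted average of the sub-scores, on the same 0-100 scale."""
    weights = weights or DEFAULT_WEIGHTS
    for name in weights:
        value = getattr(scores, name)
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside 0-100")
    total_weight = sum(weights.values())
    weighted_sum = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    patient = PatientScores(diet=80, exercise=60, stress=50)
    print(f"Overall wellness score: {wellness_score(patient):.1f}")
```

A simple weighted average keeps the score easy to explain to physicians and patients. A production score would more likely come from a model trained on outcomes data.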

Healthcare providers are in a position of strength with respect to their ability to leverage superior insights about patients, physicians, medications, procedures, treatments, diseases, therapy, maladies, etc. in order to exert significant pressure on the insurance companies with respect to what procedures should be reimbursed and for how much. The healthcare providers will know which procedures and medications work best for which patients in what situations and can leverage those insights to exert more influence on the healthcare industry.

Healthcare providers need to contemplate the business metamorphosis potential, and how they will transition from providing just patient care to becoming the maintainer of the population's overall wellness. Preventive care opportunities fueled by superior patient, medication, exercise, diet, and stress insights could ultimately be the most important (and profitable) part of the healthcare value chain!

Let's drill into this potential healthcare industry metamorphosis in more detail. You'll use the same approach discussed with the airplane manufacturer example by first understanding the key decisions that need to be made to support the healthcare industry metamorphosis, and then identifying the analytics (or insights) and data necessary to support the decisions.

Business decisions → Supporting analytics → Potential data sources

First, you want to capture the decisions that the healthcare providers need to make about patients, quality of care, cost of care, procedures, medications, etc. Those decisions could include:

Decisions about which medical procedures and medications to use with which patients in what medical situations

Decisions about the appropriate level of medical care versus costs given the patient situation and prognosis

Decisions (recommendations) for patients regarding diet, sleep, stress level, exercise, etc. in order to reduce the risk of diabetes, strokes, heart attacks, etc.

Decisions about what combinations of doctors, nurses, and technicians are most cost-effective in different surgical situations

Decisions about what medications and treatments are most cost-effective in treating different patient healthcare situations

Decisions about the optimal combinations of rehab, exercise, sleep, medication, therapy, and diet that can accelerate a patient's recovery

After you have identified the business and operational decisions, then you want to capture the analytics necessary to support the decisions. Some of those analytics could include:

Patient wellness score

Patient exercise score

Patient stress score

Patient diet score

Medication, procedures, and treatment effectiveness

Hospice versus hospital cost and care effectiveness

Physician and nurse effectiveness

Emergency room demand forecasting

Population health forecasting

Physician and nurse retention

Hospital acquired infections reductions

Unplanned readmissions reductions

Finally, you want to brainstorm data sources (patients, physicians, outcomes, cost of care, procedures, treatments, medications, etc.) that could support your analytics. Following is a list of potential data sources that might be of value in developing your analytics:

Hospital care data (Epic, Cerner)

MapMyRun

Financials (Lawson, Oracle)

MyFitnessPal

Hours worked (Kronos)

Strava

Physician notes

Smart toilets

Nurse and technician notes

Smart blood pressure monitors

Pharmacy and prescriptions

Smart glucose monitors

Medication usage

Apple Health

Patient comments

Indeed.com

HCAHPS and surveys

CDC

Social media comments

Healthcare.gov

Yelp ratings

Google Trends

WebMD

Traffic patterns

Lumosity

Weather forecasts

Nike Fuelband, Fitbit, and Garmin

Holiday schedules

Apple Watch

Special events schedules

At the end of this metamorphosis exercise process, healthcare providers will be in a better position to have identified the decisions, analytics, and data necessary to claim a bigger portion of the healthcare value chain, including:

What are the optimal treatments and medications given a patient's conditions and history, and how much should the payer reimburse?

What is the value of preventive care (diet, exercise, sleep, medication, therapy), and how much should healthcare payers cover to incent more healthy and more profitable patient behaviors?

Summary

Industries as diverse as professional sports, manufacturing, consumer package goods, retail, education, social services, and healthcare are going through business model metamorphoses by leveraging the wealth of rich data sources about their customers, products, and operations. And leading organizations are learning to leverage the resulting analytic insights to change the balance of power within their industry.

In the healthcare industry, healthcare providers that know the most about their patients' and physicians' behaviors, tendencies, and usage patterns are in the best position to correct the fuzzy math that healthcare payers have been using to set their reimbursement rates.

No matter what your organization's ultimate business vision, going through the business metamorphosis exercise can uncover big data requirements around decisions, analytics, and data sources that can be leveraged to transform or metamorphose your organization's business model. And it is an easier exercise to do than one might think, as the students in my MBA class discovered.

The bottom line across all industries is this: the organizations that know the most about their products, operations, and customers' behaviors, tendencies, and usage patterns are in the best position to monetize those insights and exert control over those organizations within their value chains that lack those customer, product, and operational insights. In the end, that's the ultimate goal of the Business Metamorphosis phase of the Big Data Business Model Maturity Index.

Homework Assignment

Use the following exercises to apply what you learned in this chapter:

Exercise #1: Build on the airplane manufacturer example by applying the metamorphosis exercise techniques to another business stakeholder such as travel agents, hotel operators, or ground transportation companies.

Exercise #2: Pick an organization (preferably your own organization) and apply the metamorphosis exercise to brainstorm the decisions, analytics, and data necessary to support your organization's business metamorphosis. As always, it is more productive and more fun to do this exercise with a small group. Maybe fly someplace cool (like Las Vegas, Austin, Charles City, or Nashville) to put everyone in the right frame of mind!

Notes

1 I use the term “Uber-ification” to describe the metamorphosis of traditional industries by new business models that simplify the consumer's decision process. The company Uber is threatening the traditional taxi, limousine, and transportation industries with a smartphone app that greatly simplifies the user's transportation decisions. Uber has created a new marketplace that matches riders with drivers and has turned every driver into a potential limo or taxi driver.

Part IV Building Cross-Organizational Support

Chapters 13 through 15 lay the organizational and cultural foundation for metamorphosing the organization. These chapters cover many of the people, processes, roles, and responsibilities that need to be addressed as organizations look to integrate data and analytics into their business models and complete the big data journey to the Business Metamorphosis phase of the Big Data Business Model Maturity Index.

In This Part

Chapter 13: Power of Envisioning

Chapter 14: Organizational Ramifications

Chapter 15: Stories

Chapter 13 Power of Envisioning

The business potential of big data is only limited by the creative thinking of your business stakeholders. So in a sense, this may be the most important chapter supporting the “thinking like a data scientist” process and the most fundamentally critical guidance within the book.

Opportunities abound for organizations to analyze the “dark” data that is buried within their operational systems and data warehouses and identify other internal and external data sources that they could leverage to optimize key business processes, differentiate their customer engagement, and uncover new monetization opportunities. However, getting the business stakeholders to envision what might be possible with respect to their currently under-utilized internal data and the wealth of external data sources is a significant challenge. Sounds like the perfect time for an envisioning engagement such as EMC's Big Data Vision Workshop.1

NOTE

I am the creator of EMC's Big Data Vision Workshop methodology. I have personally experienced the powerful business ideas that the Big Data Vision Workshop can unleash from participants when the proper creative environment and processes are put into place. Consequently, I am very bullish on the Big Data Vision Workshop and the game-changing power of envisioning.

In this chapter, I am going to discuss EMC's Big Data Vision Workshop as an example of an envisioning engagement that can fuel the organization's creative thinking for identifying where and how to leverage big data to power the organization's business models. The Big Data Vision Workshop leverages the “thinking like a data scientist” techniques to help the business stakeholders understand how big data can optimize their key business processes and uncover new monetization opportunities.

Envisioning: Fueling Creative Thinking

The Big Data Vision Workshop is an envisioning engagement designed to drive organizational alignment and fuel creative thinking about where and how an organization can leverage data and analytics to power its business models. The Big Data Vision Workshop helps organizations that don't know how to analyze the data they already collect or how to identify additional data worth collecting. Specifically, the Big Data Vision Workshop:

Provides a formal process for identifying where data and analytics can drive material business impact that affects the organization's key business initiatives over the next 9 to 12 months.

Ensures business relevance by focusing on the organization's most impactful business opportunities.

Facilitates group exercises to encourage business and IT stakeholders to envision the “realm of what's possible” from the organization's internal data, as well as explore the potential of external data.

Drives business and IT alignment around those “best” analytic opportunities with a clear roadmap of what needs to be done over the next 9 to 12 months.

The Big Data Vision Workshop process is ideal for organizations who:

Have a desire to leverage big data to transform their business but do not know where and how to start.

Have a wealth of data that they do not know how to monetize.

Organizations of all sizes have successfully leveraged the Big Data Vision Workshop to identify where and how to leverage data and analytics to power their business models. No organization is too small, and yes, your data is “big” enough.

Big Data Vision Workshop Process

The Big Data Vision Workshop typically spans two to three weeks. It concludes with a half-day, facilitated, on-site interactive workshop that prioritizes the high-value business use cases and identifies the supporting data and advanced analytic recommendations. However, a substantial amount of work needs to be done prior to the workshop to drive the cross-organizational collaboration and fuel the creative thinking processes. Figure 13.1 outlines the Big Data Vision Workshop process and timeline.

Figure 13.1 Big Data Vision Workshop process and timeline

Let's examine the stages of the Big Data Vision Workshop process to help you understand how to stimulate creative thinking and generate the actionable analytic recommendations that best support the advancement of your organization's key business initiatives.

To make this envisioning process more real, I am going to walk through the process using a healthcare organization (group of hospitals) as an example. I will refer to the organization as Healthcare Systems.

Pre-engagement Research

For the engagement to be successful, there are several key activities that need to happen prior to the envisioning engagement to ensure that it is impactful to the organization. Following are the key steps in the pre-engagement phase of the Big Data Vision Workshop:

Identify the organization's business initiative or business challenge on which to focus the engagement.

Identify the business stakeholders who impact or are impacted by the targeted business initiative. There are typically three to five different business functions engaged in the envisioning process. A wide variety of business stakeholders ensures a comprehensive collection of decisions, questions, metrics, and data sources that support the targeted business initiative.

Gather information about the sample data sets including file formats, data location, data dictionary, and a small sample of the data (5 to 6 gigabytes). Ultimately, the data scientists will use the small data sets to create illustrative analytics.

The key business initiative for Healthcare Systems is to explore how to leverage data and analytics to improve the quality of patient care while controlling costs, or to “improve cost/quality of patient care.”

Business Stakeholder Interviews

The Big Data Vision Workshop engagement starts by interviewing the key business stakeholders. The interview process focuses on (1) capturing the decisions that the business stakeholders need to make to support the targeted business initiative and (2) capturing a wide range of questions that support those decisions. Key steps in the interview phase of the Big Data Vision Workshop are:

Conduct interviews with business and IT stakeholders to capture key business objectives, the decisions that they are trying to make, and the types of questions that they need to answer in support of those decisions.

Collect supporting materials such as sample reports and dashboards. Also collect any examples of the business stakeholders downloading data into spreadsheets. Those spreadsheets can be gold in understanding the decisions that the business stakeholders are trying to make.

Identify or brainstorm other potential data sources (internal and external) that might be of value in supporting the key decisions.

It is always best to create and share an interview questionnaire with interviewees prior to the interviews. The interview questionnaire should address the following:

What are their key objectives and responsibilities?

What decisions must the interviewees make with respect to the targeted business initiative?

What questions do they need to answer in support of those decisions?

What are the metrics or key performance indicators against which success will be measured?

What are the organization's value drivers (e.g., the key activities that help the organization make money related to the targeted business initiative)?

The Healthcare Systems key business stakeholders for the “improve cost/quality of patient care” initiative are the following:

Physicians and nurses

Clinical

Operations

Finance

Human resources

Population health

Explore with Data Science

A very powerful part of the Big Data Vision Workshop engagement is the data science work to create illustrative analytics on the sample data sets. This part of the envisioning engagement might not be possible if your organization does not have access to a data science team. But if you do, the data science team should explore different analytic techniques (like the ones covered in Chapter 6) to help the business stakeholders to envision the realm of what is possible using data science. Key tasks in the data science explore phase are:

Prepare, transform, and enrich the data.

Explore the data using different data visualization techniques.

Explore opportunities to integrate external data sources such as social media (Twitter, Facebook, LinkedIn), app-generated (Zillow, Eventbrite), and public domain data (data.gov).

Build illustrative analytics using different analytic techniques to determine which analytic techniques yield the most relevant insights.

Package data visualizations and analytic models for consumption by the business and IT stakeholders.

Develop simple user experience mock-ups to validate how the analytics will support the business stakeholders' key decisions.

At Healthcare Systems, a small sample set of data from Epic (hospital operations), Kronos (time and attendance), and Lawson (finance and costs) was pulled together, and illustrative analytics were created around the following business areas (see Figure 13.2):

Emergency room volume variances

Operating room patient volume forecasting

Diagnostic code relationships

Knee replacement cost clusters

Figure 13.2 Big Data Vision Workshop illustrative analytics

At Healthcare Systems, simple mock-ups were also developed so that the workshop participants could envision how the analytic results could be presented to frontline workers (physicians, nurses, admissions) and patients (see Figure 13.3).

Figure 13.3 Big Data Vision Workshop user experience mock-up

Workshop

Once the above activities have been completed (which typically takes about two to three weeks of work), you are now ready for the half-day workshop. The goal of the facilitated, on-site, interactive workshop is to help the participants:

Go beyond descriptive reporting to brainstorm the applicability of predictive (what is likely to happen) analytics and prescriptive (what should I do) analytics.

Brainstorm, identify, and prioritize additional data sources (both internal and external data sources) that may be worthy of collecting for the targeted business initiative.

Use a prioritization process to identify the best analytic opportunities based on business value and implementation feasibility over the next 9 to 12 months.

Followingaresomespecifictasksthatshouldbeaccomplishedduringtheworkshop.

Fuel the Creative Thinking Process

You want to stimulate creative, “out of the box” thinking during the workshop. To fuel the creative thinking process, do the following:

Share the illustrative analytics that the data science team created from the client's data to stimulate creative thinking regarding how advanced analytics could energize the business.

Review examples from other industries of advanced analytics applied to different business scenarios.

Share the mock-ups in order to stimulate creative thinking (PowerPoint works great as your mock-up and user experience development tool).

Brainstorm Business Decisions and Questions

After walking through the illustrative analytics and mock-ups, lead the workshop participants through a series of facilitated brainstorming scenarios including:

Scenario 1. Brainstorm the insights that you want to uncover about your targeted business initiative if you could get access to ALL the organization's operational and transactional data. For Healthcare Systems, what insights would you want about your key business initiative if you had 10 to 20 years of patient care data, hospital operations data, time and attendance data, and finance data? Heck, I'm starting to sound like the NSA!

Scenario 2. Brainstorm the insights that you would want to uncover about your key business initiative if you had access to all of the organization's internal unstructured data (physician or nurse notes, patient comments, e-mail threads) and external unstructured data (social media, mobile, blogs, newsfeeds, weather, traffic, economic, population health, Centers for Disease Control).

Scenario 3. Decompose the key business initiative into the different events that compose that initiative, and brainstorm what insights you would want to capture if you had access to that data in real-time. For Healthcare Systems, are there opportunities to “catch the patient at the time of need” interacting with your organization in order to provide preventive care recommendations?

Scenario 4. Brainstorm how you would leverage predictive analytics and prescriptive analytics to uncover new actionable insights about your targeted business initiative. For Healthcare Systems, build on the learnings from the stakeholder interviews to create questions about the “improve cost/quality of patient care” business initiative that start with verbs such as predict, forecast, recommend, score, or correlate.

Be sure to capture the decisions and questions on separate sticky notes and place the sticky notes on flip charts.

Group Decisions and Questions into Common Themes

Next, have the workshop participants group the decisions and questions into use cases that share common business and/or financial objectives. Have participants gather around the flip charts and group the sticky notes into use cases on the flip chart sheets. Once the sticky notes are grouped into common use cases, use a marker to draw a circle around each of the groupings and give each grouping a descriptive short name. For Healthcare Systems, brainstorming the “improve cost/quality of patient care” business initiative could yield use cases such as unplanned readmissions analysis, hospital acquired infections, service variance analysis, staffing/cost/outcomes analysis, staff retention, procedures cost analysis, volume forecasting, and population health.

Prioritize the Groupings

Next, have the workshop participants prioritize the use cases using the prioritization matrix (see Figure 13.4). The use of the prioritization matrix is covered in depth later in this chapter.

Figure 13.4 Prioritize Healthcare Systems' use cases

Summarize Workshop Results

Finally, summarize the results of the workshop including:

Review of the prioritized list of potential “Analytics Opportunities.” Verify that everyone buys off on the end result.

Review of “Parking Lot” items and discussion of any potential follow-up steps.

Discussion of next steps.

The Big Data Vision Workshop deliverables include:

Prioritization matrix with the prioritization of the use cases

The sticky note content for each use case

Interview takeaways

Data scientist illustrative analytics

User experience mock-ups

Documentation of the Parking Lot items (for potential follow-up)

Data assessment worksheets that assess the business value and implementation feasibility of each data source

Setting Up the Workshop

There are many “little” things that need to be done prior to the workshop. And while you may be tempted to skip over these seemingly superficial tasks, they are critical to the workshop's success because they set the proper stage for the desired “out of the box” thinking. Yes, thinking differently!

Pick a creative, out-of-the-box location. I have done workshops in the middle of an Iowa cornfield (for a wind turbine energy company), in a grade school classroom complete with those little chairs and tables (for a charter school), in a comedy club (for a gaming establishment), and in a technology museum (for a high-tech manufacturer).

Set up the room for facilitated conversations, which can include the following:

Arrange chairs in a horseshoe shape

Create a “Parking Lot” flip chart and tape it to the wall

Create a “Ground Rules” flip chart and tape it to the wall

Create a prioritization matrix chart and tape it to the wall

Tape five to six blank flip chart sheets to the walls for brainstorming

Have plenty of 3×5 sticky notes and markers available for impromptu capturing of ideas and thoughts

Confirm the meeting time and duration. You do not want people walking out of the workshop halfway through because they thought the session was only two hours. Have participants block out four to five hours, and if you get done sooner, give them the time back.

Kick off the meeting by:

Explaining why the participants are there and the objectives of the workshop.

Sharing the roles of the workshop team (facilitator, data scientist, subject matter expert, and scribe).

Having everyone share their name, their responsibilities, and their expectations for the workshop.

Establish the workshop ground rules including:

Only one conversation at any given moment.

No hierarchy in the room; everybody and their ideas are equal.

Turn off cell phones, tablets, and computers (or at least put them into buzz or stun mode).

Share any and all ideas (the only bad idea is the one that isn't shared).

Breaks are planned throughout the workshop, so please stay with the group.

Use icebreakers to kick off the workshop to get everyone participating. There are several different types of icebreakers. Be creative and relevant to the client's environment. For example:

Have everyone share with the group something about themselves that they don't think anyone else knows.

With a movie chain client, we asked each participant to identify a movie character that they are most like and why.

Have participants pick their favorite superhero and explain why that superhero is their favorite.

Use a Parking Lot flip chart to control the workshop. Explain the purpose of the “Parking Lot” (i.e., it captures topics that are outside the scope of the workshop and keeps the workshop moving in the right direction).

During the workshop, use the following techniques to help fuel the participants' creative thinking process:

Have the workshop participants capture one idea or thought per sticky note throughout the scenarios.

Have the facilitators place the sticky notes on the flip charts as the ideas or thoughts come up.

Have the facilitators read aloud the idea or thought as they are posting it to the wall; this helps to fuel the creative thinking process.

Ensure that participants brainstorm individually. If you brainstorm in groups, good ideas can get lost when there are overpowering personalities in the groups.

The Prioritization Matrix

One big obstacle to a successful big data journey is gaining consensus and alignment between the business and IT stakeholders in identifying the big data use cases that deliver sufficient financial value to the business while possessing a high probability of implementation success over the next 9 to 12 months. You can identify multiple use cases where big data and advanced analytics can deliver compelling business value. However, many of these use cases have a low probability of implementation success over the next 9 to 12 months because of:

Lack of availability of timely, accurate data.

Inexperience with new data sources such as social media, mobile, unstructured, and sensor data.

Limited data or analytic people resources.

Lack of experience with new technologies like Hadoop, MapReduce, Spark, Mahout, MADlib, text mining, etc.

Weak business and IT collaborative relationship.

Lack of management fortitude to stick with the engagement.

One of my favorite organizational alignment tools for addressing this issue is the prioritization matrix. The prioritization matrix is a marvelous tool for:

Identifying the “right” use case to pursue with big data based on a balance of business value and implementation feasibility.

Ensuring that both IT and business stakeholders have a voice in discussing the relative value and implementation challenges for each use case.

Capturing the business drivers and implementation risks for each of the use cases.

Catalyzing the decision on the “right” use cases so that everyone (business and IT) can agree on a path forward.

The prioritization matrix is the capstone of the Big Data Vision Workshop process. The prioritization matrix facilitates the discussion (and debate) between the business and IT stakeholders in determining the “right” use case on which to focus the big data initiative. The “right” use case has both meaningful business value (from the business stakeholders' perspectives) and reasonable feasibility of successful implementation (from the IT stakeholders' perspectives) over the next 9 to 12 months.

Focusing the prioritization matrix process on a key business initiative (such as reducing churn, increasing same-store sales, minimizing financial risk, optimizing market spend, or reducing hospital readmissions) is critical, as it provides the foundation and the guardrails for meaningful business value and implementation feasibility discussions.

The prioritization matrix process starts by placing each identified use case (identified in the Big Data Vision Workshop) on a sticky note. The workshop participants then decide the placement of each use case on the prioritization matrix (weighing business value and implementation feasibility) vis-à-vis the relative placement of the other use cases (see Figure 13.5).

Figure 13.5 Prioritization matrix template

The business stakeholders are responsible for the relative positioning of each use case on the business value axis. The IT stakeholders are responsible for the relative positioning of each use case on the implementation feasibility axis (considering data, technology, skills, and organizational readiness).

The heart of the prioritization process is the discussion that ensues about the relative placement of each of the use cases. Issues discussed could include (see Figure 13.6):

Figure 13.6 Prioritization matrix process

Why is use case [B] more or less valuable than use case [A]? What are the specific business drivers or variables that make use case [B] more or less valuable than use case [A]?

Why is use case [B] less or more feasible from an implementation perspective than use case [A]? What are the specific implementation risks that make use case [B] less or more feasible than use case [A]?

It is critical to the organizational alignment process to capture the reasons for the relative positioning of each use case during the prioritization process. These discussions provide the financial guidelines necessary to achieve the use case business value and flag potential implementation risks that need to be addressed during the project.
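If you want to carry the sticky-note placements out of the room and into something you can re-sort later, a small script is enough. The following is only a sketch under my own assumptions: the 0-to-10 scale, the midpoint, the quadrant labels, and the rule of ranking by the weaker of the two scores are all hypothetical choices for illustration, not part of the workshop method itself.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: float        # relative position agreed by business stakeholders (assumed 0-10 scale)
    feasibility: float           # relative position agreed by IT stakeholders (assumed 0-10 scale)
    value_drivers: str = ""      # why the group placed it there on the value axis
    implementation_risks: str = ""  # risks flagged during the placement discussion


def quadrant(use_case, midpoint=5.0):
    """Label the matrix quadrant a use case falls into (labels are illustrative)."""
    high_value = use_case.business_value >= midpoint
    high_feasibility = use_case.feasibility >= midpoint
    if high_value and high_feasibility:
        return "pursue first"
    if high_value:
        return "high value, risky"
    if high_feasibility:
        return "feasible, low value"
    return "defer"


def prioritize(use_cases):
    """Rank use cases so that a use case must be strong on BOTH axes to rise.

    Sorting on the weaker of the two scores is one simple way to express
    'balance of business value and implementation feasibility'.
    """
    return sorted(
        use_cases,
        key=lambda uc: (min(uc.business_value, uc.feasibility), uc.business_value),
        reverse=True,
    )


if __name__ == "__main__":
    # Scores below are made up purely to show the mechanics.
    cases = [
        UseCase("Unplanned readmissions analysis", 9, 7),
        UseCase("Population health", 8, 3),
        UseCase("Volume forecasting", 6, 8),
        UseCase("Staff retention", 4, 4),
    ]
    for uc in prioritize(cases):
        print(f"{uc.name}: {quadrant(uc)}")
```

The `value_drivers` and `implementation_risks` fields matter more than the scores: they are where you record the reasons behind each placement, which is the real output of the discussion.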

Summary

The Big Data Vision Workshop and the prioritization matrix are marvelous tools for driving organizational alignment between the business and IT stakeholders about where and how to start the organization's big data journey. These tools provide a framework for identifying the relative business value of each use case vis-à-vis its implementation risks over the next 9 to 12 months. As a result of the prioritization process, both the business and IT stakeholders know what use cases they are targeting, understand the potential business value of each use case, and have their eyes wide open to the implementation risks that the project needs to manage.

The bottom line is that the Big Data Vision Workshop and prioritization matrix ensure that the full force of the organization can be brought to bear in capturing the business potential of the organization's big data initiative.

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Grab some coworkers and block off 30 to 45 minutes to test out the prioritization matrix process. As a group, identify some initiatives or projects the organization is contemplating over the next 9 to 12 months. Then use the prioritization matrix to debate, argue, and arm wrestle about where to position each of these projects vis-à-vis each other on the prioritization matrix.

Exercise #2: Have some fun with the prioritization matrix! Grab some guys and gals and identify the current top 10 to 12 NBA and WNBA basketball players (that alone may be a difficult challenge). Use the prioritization matrix process to decide the value of the players based on their:

Personal performance: their personal performance numbers like points scored, number of rebounds, and number of assists

Importance to the team: their ability to make their teammates better (I wonder how one would compare Stephen Curry to Elena Delle Donne. Man, that should be a fun discussion!)

Notes

1. EMC Corporation is the world's leading developer and provider of information infrastructure technology and solutions that enable organizations of all sizes to transform the way they compete and create value from their information.

Chapter 14 Organizational Ramifications

Now comes the hard part. No, it's not the technology and knowing what technologies to back and which ones might fade. No, it's not the lack of data science talent. And no, it's not even gaining the buy-in of the business stakeholders, though that can be a huge issue, as we have discussed throughout this book.

The biggest threat to the success of any organization's big data initiative is organizational impediments. More accurately put, the challenge is overcoming the organizational inertia and implementing the organizational and cultural changes necessary to advance from business monitoring to business optimization, monetization, and ultimately metamorphosis. It's tough to get the organization to “think differently.” As Pogo famously said, “We have met the enemy and he is us.”

In this chapter, you will explore the role of the Chief Data Officer, which I prefer to call the Chief Data Monetization Officer. You are going to consider the trio of privacy, trust, and decision (not data) governance. And finally, the chapter concludes with guidance for liberating the organization and unleashing the only thing standing between big data mediocrity and big data metamorphosis: creative thinking.

Chief Data Monetization Officer

There's a new sheriff in the big data world and that's the Chief Data Officer (CDO). A more accurate title for this role is Chief Data Monetization Officer (CDMO), as this person should focus on driving and deriving value from the organization's data and analytic assets. The CDMO should own the organization's investment decisions with respect to data and analytics and own the charter for identifying and managing the organization's data and analytics monetization initiatives.

An ideal CDMO candidate should have a background in economics. The CDMO doesn't need an information technology background (that's the CIO's job). I recommend an economics education because economists have been trained to assign value to abstract concepts and assets. An economist is an expert who studies the relationship between an organization's resources and its production or value. And in today's world, assigning value to data can be extremely abstract.

CDMO Responsibilities

The CDMO owns quantifying the value of data and championing the organization's efforts to monetize the organization's data (by applying analytics in order to optimize key business processes and uncover new revenue opportunities). The CDMO must collaborate with business management to determine the costs, benefits, and Return on Investment (ROI) for data and data-related business initiatives.

The CDMO should sit between the business leaders (who have profit or margin responsibilities) and the CIO (who owns the technology decisions) in order to drive the identification, valuation, and prioritization of data acquisition and data monetization projects. The CDMO should report to the COO or CEO because the CDMO should also have revenue and margin responsibilities. Reporting to the COO or CEO also ensures the CDMO has the organizational clout to drive collaboration between business management and the CIO and to lead the organization's data monetization efforts.

CDMO Organization

As organizations build out their data science teams, the data science teams should fall under the purview of the CDMO. The data science team needs a senior management champion, and the CDMO is the best choice. Heck, I'd even put the Business Intelligence (BI) teams under the purview of the CDMO in order to drive closer collaboration and share learnings between the BI and data science teams. (See Chapter 5 for a review of the differences between BI and data science.)

I would have the data scientists (and BI teams) report hard line to the business functions and dotted line into the CDMO (see Figure 14.1).

Figure 14.1 CDMO organizational structure

To drive data monetization success, the BI and data science teams must thoroughly understand the organization's key business initiatives, the decisions that the business needs to make, and the questions that the business needs to answer to support those business initiatives. The BI and data science teams need to be accountable to the line of business because that's where value (revenue, profit, margin) is being created.

By the way, the CDMO should NOT own the data lake, the data warehouse, or any of the underlying data architecture or technologies. These data architectures and technologies need to be owned by the CIO. Consequently, the CDMO must collaborate with the CIO to create a data architecture and technology roadmap that supports the CDMO's monetization efforts.

Analytics Center of Excellence

The analytics Center of Excellence (COE) is critical to the success of the CDMO's data monetization charter and needs to be the responsibility of the CDMO. Key CDMO tasks with respect to the COE include:

Hiring, development, promotion, retention, and talent management of the data science and Business Intelligence teams (even if they do sit within the business units)

Continuous training program and certification on new technologies and analytic algorithms

Active industry and university monitoring to stay on top of most current data and data science trends

Business Intelligence, data visualization, statistical, predictive analytics, machine learning, and data mining tool evaluations and recommendations

Capturing, sharing, and management (i.e., library function) of the Business Intelligence, data warehousing, and data science best practices across the organization

Identifying analytic processes worthy of legal or patent protection

The analytics COE becomes the sun around which the data science and Business Intelligence personnel “orbit” from a skills and career development perspective.

CDMO Leadership

The CDMO needs to work closely with the Finance department in order to develop data acquisition and data monetization ROI estimates. Finance will keep the CDMO honest with respect to creating value, but expect that relationship to be “challenging” because Finance will likely struggle with putting value on intangible assets like data and analytic insight. That's where the CDMO's economics background will help.

Also, the CDMO will need to become a master facilitator (by the way, this is a good skill for anyone who is trying to bridge the gap between IT and the business). The CDMO is going to need to leverage teamwork and collaboration to be successful in the job. He or she also must be at the front of these data and analytic enablement discussions. The CDMO must take the initiative to lead the cultural change necessary to get the organization to more readily embrace data and analytics in the operations of the business.

Introducing a CDMO role can be a significant challenge. Not only is the role new, so there are no predefined best practices to leverage, but determining the value of data and analytics is also something that few organizations have mastered. The CDMO will have to continually prove him- or herself to the rest of the organization. The CDMO will have to evangelize the role that data and analytics can play in improving the entire organization's decision-making capabilities and empowering frontline employees and customers.

Privacy, Trust, and Decision Governance

By now, we've all heard the story. A retailer, by virtue of its advanced analysis of website activities, determined with some level of confidence that a particular website visitor was a pregnant woman. On the basis of this insight, the retailer started mailing baby-related coupons (prenatal care, baby room furniture, nursing products, etc.) to the woman, who was actually a 16-year-old girl. The girl's father became outraged when he saw the coupons addressed to his daughter. He complained to the local store manager, only to learn two weeks later that his daughter was indeed pregnant.

Many in the data science community might perceive this as a huge success: the merchandiser's superior data science skills were able to determine that a female customer was pregnant even before her father knew! However, the retailer created a public relations fiasco, because just as the retailer knew with some level of confidence that the customer was pregnant, the retailer also likely knew with some level of confidence that she was underage.

There are numerous other examples where an organization may uncover (with some level of confidence) insights about its customers but should not act on those insights. Examples include:

Customer is researching cancer or some other serious ailment

Customer is researching a new job (if he or she has an existing job)

Customer is researching dating sites (if he or she is married)

Customer is researching divorce lawyers (if he or she got busted visiting dating sites)

All of these situations can probably be ascertained (with some level of confidence) by mining a customer's keyword searches, social media postings and exchanges, e-mail communications, and website and blog visits (e.g., time on a site, frequency of visits, recency of visits). However, acting on these suspected situations could be catastrophic from an organizational goodwill and public relations perspective. Which brings us to what I believe should be the “golden rule” for big data and data science:

Just because you know or suspect something about a customer does NOT necessarily mean that you should act on that knowledge.

Failure to adhere to this rule can be catastrophic for an organization, leading to privacy issues, fines, and potentially even lawsuits.

Privacy Issues = Trust Issues

Customer loyalty programs thrive because organizations give their customers something in return for purchase information and information about the customer. I'm a member of numerous loyalty programs, and these loyalty programs reward my loyalty with discounts, free coffee and pastries, free airline trips and hotel stays, and cash. I give them information about my shopping and travel activities, and in return they pay me back in rewards and discounts.

However, I'm hesitant to share any additional personal information because (1) these organizations have not given me a compelling reason to share more personal information, and (2) I do not trust them to use that data in my best interests. Let me walk you through an example.

Let's say that you are a grocery chain and you would love to know the following information as a customer walks into your store:

What's on her shopping list?

What's her budget?

Is there any particular event (birthday, barbeque, party) for which she is planning?

With that information, the grocery chain could create a set of recommendations that would allow the customer to optimize her budget, as well as recommend items that might be useful for the upcoming event. That would be a real win for both the customer and the grocer. In fact, I would be willing to share that information with my grocer as long as I could be confident that the grocer was making recommendations that were in my best interest.

However, the minute the retailer recommends something that is not of value to me but is of value to it (e.g., recommends one of the retailer's more profitable private label products as a replacement for the branded product that I have used for years), it will have violated my trust that it would only use my data in my best interests.

Trust is the heart of the privacy issue from a customer's perspective:

Customers don't trust the organization to have the guidelines and governance in place to know when it should act, and when it should NOT act, on insights that it has gleaned about them.

Customers don't trust the organization to focus on the customer's best interests instead of the organization's best interests.

Customers don't trust the organization to refrain from selling their personal data to others for its own gain.

This privacy issue is only going to become bigger and bigger, especially as organizations become more proficient at mining big data and uncovering new insights about their customers' interests, passions, affiliations, and associations.

One simple way to test whether or not you should act on the insights that you have gained about your customers is the “Mom” test. That is, what would your mom think of your decision about how you use that information about a customer? In most cases, the Mom test would quickly identify those things that are just not the right thing to do.

However, organizations can't rely on the Mom test, so they need a more formal decision governance organization.

Decision Governance

Organizations need a formal decision governance organization and processes that clearly articulate the rules, policies, and regulations with respect to how organizations will and will not use information about their customers. Decision governance is different from data governance in the following ways:

Data governance provides policies, procedures, and rules that manage the availability, usability, integrity, security, and accessibility of an organization's data.

Decision governance provides policies, procedures, and rules that manage the capture, privacy, and use of the insights to drive interactions or decisions that might impact a particular customer.

Most organizations already have a data governance organization, so they likely already have the experience, policies, procedures, and people on which they could build their decision governance organization.

The decision governance team must work with the business stakeholders to decide what information they are seeking on their customers and clearly define when and where they will use that information. And if there ever is a situation that is not covered by the decision governance policies, then no action should be taken until the decision governance organization has decided what the proper action should be.
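The “no action until the governance organization decides” rule is what software people would call default-deny. As a rough sketch of how that rule might be encoded, consider the following. Everything here is hypothetical: the insight and action names, the three-way outcome, and the lookup table are invented for illustration, and a real policy catalog would be far richer.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"        # policy explicitly permits acting on the insight
    DENY = "deny"          # policy explicitly forbids acting on the insight
    ESCALATE = "escalate"  # not covered: take no action, refer to governance


# Hypothetical policy catalog keyed by (insight type, proposed action).
POLICIES = {
    ("purchase_history", "product_recommendation"): Decision.ALLOW,
    ("suspected_pregnancy", "direct_mail_offer"): Decision.DENY,
    ("health_research", "targeted_advertising"): Decision.DENY,
}


def may_act(insight_type, proposed_action, policies=POLICIES):
    """Return the governance decision for acting on an insight.

    Any combination that the policies do not cover falls through to
    ESCALATE, meaning no action is taken until the decision governance
    organization has ruled on it.
    """
    return policies.get((insight_type, proposed_action), Decision.ESCALATE)


if __name__ == "__main__":
    print(may_act("purchase_history", "product_recommendation"))
    print(may_act("job_search", "retention_offer"))  # not in the catalog
```

The design point is the fall-through: an uncovered situation never defaults to acting.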

Decision governance has become a priority for organizations because the advent of big data is enabling organizations to gather detailed insights about their customers' behaviors, tendencies, propensities, interests, passions, affiliations, and associations that can easily be used for both appropriate and inappropriate decisions and actions. Lack of decision governance is a clear and present danger to organizations that are trying to mine actionable insights out of their bounty of consumer data. Organizations need to act to ensure the proper and ethical use of their customers' data and the resulting analytic insights; otherwise they risk opening themselves to significant privacy issues and lawsuits.

Unleashing Organizational Creativity

Ah, the anguish of not knowing the “right” answers. Organizations struggle with the process of determining the “right” answers, resulting in wasted debates and divisive arguments regarding whose answers are more right. They even have a name for this debilitating process, analysis paralysis, where different sides of the argument bring forth their own factoids and anecdotal observations to support the justification of their “right” answer. However, the concepts of experimentation and instrumentation can actually liberate organizations from this “analysis paralysis” by providing a way out: a way forward that leads to action versus just more debates, more frustrations, and more analysis paralysis.

For many organizations, the concepts of experimentation and instrumentation are a bit foreign. Internet companies and direct marketing organizations have ingrained these two concepts into their analytics and customer engagement processes through concepts like A/B testing.[1] They have leveraged the concepts of experimentation and instrumentation to free up the organizational thinking (to freely explore new ideas and test “hunches”) but in a scientific manner that results in solid evidence and new organizational learning.

Let's examine how your organization can embrace these same concepts as part of your big data business strategy. Let's start by defining two key terms:

Experimentation is the act, process, practice, or instance of making experiments, where an experiment is a test, trial, or tentative procedure; an act or operation for the purpose of discovering something unknown or of testing a principle, supposition, or hypothesis.

Instrumentation is the process of measuring the experimentation results within a production or operational environment.

Taken together, these two concepts can liberate organizations that are suffering from analysis paralysis and are struggling when they are not certain what decision to make. The concepts of experimentation and instrumentation can empower the creative thinking that is necessary as organizations look to identify how to integrate data and analytics into their business models. This “empowerment” cycle empowers organizations to freely consider different ideas without worrying about whether the ideas are correct ahead of time. Organizations can let the tests tell them which ideas are “right” and not let the most persuasive debater or most senior person make that determination. It empowers the organization to challenge conventional thinking and unleashes creative thinking that can surface new monetization ideas. No longer do you have to spend time debating whose idea is right. Instead, put the ideas to the test and let the data tell you!

Let's walk through an example of how the empowerment cycle works (see Figure 14.2):

Figure 14.2 Empowerment cycle

Step 1: Develop a hypothesis or hunch that you want to test. For example, I believe that my target audience will respond more favorably to a “Buy One Get One Free” (BOGOF) offer, while my colleague believes that a “50% off” offer is more attractive to our target audience.

Step 2: Develop the different test cases that can prove or disprove the hypothesis. You want to be clear as to the metrics you would use to measure the test results (e.g., click-through rate, store traffic, sales, market sentiment). In this example, we would create three test cases: “BOGOF” offer, “50% off” offer, and a control group. We would randomly select our test and control audiences and ensure that other variables are being held constant during the test (e.g., same time of day, same audience characteristics, same channel, same time frame, etc.).

Step 3: Measure the results of the test cases in order to determine the effectiveness of the test cases. In this example, we'd want to ensure that each of the three test cases was appropriately instrumented or “tagged” and that we were capturing all the relevant data to determine who responded to which offers, who didn't respond, and the ultimate outcomes of their responses.

Step 4: Execute the tests. We would now start the tests, capture the data, end the test, and quantify the test results.

Step 5: Learn and move on. We'd look at the test results, examine who responded to what offers, determine the final results, and declare a winner. We would then package or share the learnings with other parts of the organization and then move on to the next test.
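To make “quantify the test results” and “declare a winner” a little more concrete, here is a minimal sketch of how the BOGOF versus “50% off” comparison might be scored. The audience sizes and response counts are invented, and the two-proportion z-test is my choice of a common, simple significance check; the steps above do not prescribe a particular statistical method.

```python
import math


def response_rate(responses, audience_size):
    """Share of the audience that responded to an offer."""
    return responses / audience_size


def two_proportion_z_test(resp_a, n_a, resp_b, n_b):
    """Compare two response rates; return (z statistic, two-sided p-value).

    Uses the pooled-proportion standard error and a normal approximation,
    which is reasonable for large audiences. Assumes the pooled rate is
    strictly between 0 and 1 (otherwise the standard error is zero).
    """
    rate_a = resp_a / n_a
    rate_b = resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


if __name__ == "__main__":
    # Hypothetical instrumented results: (responses, audience size) per test case.
    results = {
        "BOGOF": (260, 5000),
        "50% off": (215, 5000),
        "control": (150, 5000),
    }
    for offer, (responses, size) in results.items():
        print(f"{offer}: {response_rate(responses, size):.2%}")

    z, p = two_proportion_z_test(*results["BOGOF"], *results["50% off"])
    print(f"BOGOF vs 50% off: z={z:.2f}, p={p:.3f}")
```

A small p-value suggests the difference between the two offers is unlikely to be noise; comparing each offer against the control group in the same way tells you whether either offer beat doing nothing.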

The empowerment cycle leverages experimentation and instrumentation to empower organizations to freely explore and test new ideas, and it empowers organizations to get moving and not get bogged down in analysis paralysis. Experimentation and instrumentation are the anti-analysis paralysis ointment, because they provide organizations with the tools and concepts to test ideas, learn from those tests, and move on.

Summary

There are several organizational issues that need to be addressed in order to help organizations integrate data and analytics into their business models. This chapter addressed some concepts to help the organization more effectively adopt data and analytics:

The role of the Chief Data Monetization Officer to lead the organization's data and analytics investment and monetization efforts

Addressing the issues of privacy and trust through a formalized decision governance organization

How to unleash the organization's creative thinking, which is the only thing standing between big data mediocrity and big data metamorphosis

Finally, don't forget this critical customer golden rule:

Just because you know or suspect something about a customer does NOT necessarily mean that you should act on that knowledge.

Make your mom proud.

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Document the biggest organizational challenges that a CDMO would face within your organization. For each challenge, brainstorm some ideas as to what the CDMO could do to address those issues.

Exercise #2: Identify some business partners with whom you could discuss and ultimately test the empowerment cycle. Identify some hypotheses or rules of thumb that your business partners would like to challenge. Brainstorm the decisions, analytics, and data requirements necessary to challenge that conventional thinking.

Notes

1. A/B testing (also known as split testing) is a method of comparing two versions of a web page against each other to determine which one performs better. By creating an A and B version of your page, you can validate new design changes, test hypotheses, and improve your website's conversion rate. Source: https://www.optimizely.com/ab-testing/

Chapter 15 Stories

Everyone loves stories to which they can relate, which probably makes them the ideal way to conclude this book. While stories can be fun and funny, the most valuable stories are those that motivate us to think differently and take action, where the story is so compelling that the reader can't wait to put the ideas into action!

The goals of this chapter are to share some big data stories and to help you, the reader, develop inspiring stories that are relevant to your organization and motivate the organization into action.

Instead of providing a long list of the different analytics that are occurring within different industries, I'm offering a “think differently” approach for how you find and construct big data stories that are the most relevant to your organization. Instead of looking at the big data stories from the traditional industries perspective, let's look at stories from the perspective of the organization's strategic nouns, or key business entities. I find that most big data and data science stories fall into three categories of business entity analytics (regardless of industry):

Customer and employee analytics

Product and device analytics

Network and operational analytics

The advantage of looking for stories across these three categories is that it prevents organizations from artificially limiting themselves in searching for relevant big data stories. Many organizations are only interested in hearing about big data stories that are happening within their industry. That's the “safe” way to go. But sometimes the most powerful opportunities are realized from stories from other industries. Having a broader view of these big data stories can open the eyes of the business executives as to the potential of big data within their organizations.

For example, digital media organizations use “attribution analytics” to quantify the impact of different digital media treatments (messaging, websites, impressions, display ad type, display ad page location, keyword searches, social media posts, dayparting, etc.) on a conversion or sales event. Think about how many different websites, display ads, and keyword searches you interact with as you decide to do something (e.g., buy a product, request some collateral, download an article, play a game, research an event, etc.). Attribution analysis looks at “baskets” of digital media treatments and activities that lead to particular conversion events across a large number of visitors and creates complex data enrichment calculations (frequency, recency, and sequencing of marketing treatments) in order to attribute sales credit to these different digital media treatments. Think “hockey assist” as in trying to measure the impact that a wide variety of digital media treatments had over a period of time to drive a conversion or sales event.¹

Following is an example of how organizations use attribution analysis to maximize campaign return on marketing investment (ROMI):

Digital media attribution analysis

1. Track Activities Leading to Conversion Events. Create market baskets of keyword searches, site visits, display impressions, display clicks, and other media treatments associated with each conversion event.

2. Enrich Data to Create New Metrics to Understand Drivers of Visitor Behaviors. Create metrics around frequencies, ordering, sequencing, and latencies.

3. Analyze Metrics to Quantify Cause and Effect. Identify commonalities in baskets, calculate correlations and strength of correlations, and build “conversion path” models.

4. Operationalize Actionable Insights. Operationalize insights into media planning and buying systems, and guide in-flight campaign execution.
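The first three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method any particular organization uses: the sample baskets, the treatment names, and the `linear_attribution` helper are all hypothetical, and splitting credit equally across touches is just one simple attribution rule. A fuller model would also weigh the frequency, recency, and sequencing metrics described in step 2.

```python
from collections import defaultdict

# Hypothetical baskets: the media treatments a visitor encountered before a
# conversion event, paired with the value of that conversion.
baskets = [
    (["keyword_search", "display_ad", "site_visit"], 90.0),
    (["display_ad", "site_visit"], 40.0),
    (["keyword_search", "site_visit"], 60.0),
]

def linear_attribution(baskets):
    """Split each conversion's value equally across the touches in its basket."""
    credit = defaultdict(float)
    for treatments, value in baskets:
        share = value / len(treatments)
        for treatment in treatments:
            credit[treatment] += share
    return dict(credit)

credit = linear_attribution(baskets)

# Rank treatments by the sales credit attributed to them.
for treatment, amount in sorted(credit.items(), key=lambda kv: -kv[1]):
    print(f"{treatment}: {amount:.2f}")
```

Because every conversion's value is fully distributed, the credited amounts always sum to total conversion value, which makes the result easy to reconcile against campaign revenue.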

That same attribution analytics would work perfectly in the area of healthcare, where physicians, nurses, and other caregivers are trying to determine or attribute the impact on a patient's wellness across a wide variety of healthcare “treatments” including medications, surgery, supplements, therapy, diet, exercise, sleep, stress, religion, counseling, and many other health-impacting variables. Using the digital media attribution analytics, healthcare organizations could determine which combinations, frequency, recency, and sequencing of healthcare treatments are most effective for which types of patients in what types of wellness situations. But if healthcare organizations only look within their own industry, they are likely to miss opportunities to learn from other industries' analytic stories and miss the opportunity to apply those stories to optimize their own key business processes, uncover new monetization opportunities, and gain a competitive edge within their industry.

These three business entity analytics buckets will help you see that the use case type is more relevant than the industry from which it came, and they provide a “think differently” moment to borrow analytic best practices from other industries. Let's discuss each of these three categories in more detail to see what stories you might uncover that could be meaningful to your organization:

Customer and employee analytics

Product and device analytics

Network and operational analytics

Customer and Employee Analytics

For organizations in business-to-consumer (B2C) industries, understanding and taking care of customers is job #1. Understanding in detail the propensities, tendencies, patterns, interests, passions, affiliations, and associations of each of your individual customers is key to increasing revenue, reducing costs, mitigating risks, and improving margins and profits.

Customers can take many forms including visitors, passengers, travelers, guests, lodgers, patients, students, clients, residents, citizens, constituents, prisoners, players, and more. Many B2C industries can benefit directly from data and analytics that yield superior insights into the behaviors of their customers including:

Retail

Restaurants

Travel and hospitality

Airlines

Automotive

Gaming

Entertainment

Banking

Credit cards

Financial services

Healthcare

Insurance

Media

Telecommunications

Consumer electronics (e.g., computers, tablets, digital cameras, digital media players, GPS devices)

Primary and higher education

Utilities

Oil and gas

Public service agencies

Government agencies

The foundation of customer analytics is identifying, quantifying, and predicting the individual customer's behavioral characteristics (propensities, tendencies, patterns, trends, interests, passions, associations, and affiliations) to identify opportunities to engage the customer to influence his or her behaviors. Some call this “catching the customer in the act.” The more timely the identification of these customer interactions, the better the chances of uncovering new revenue or monetization opportunities. Customer analytics include the following:²

Customer acquisition measures the effectiveness of different sales and marketing techniques to get customers to sample or trial your product or service.

Customer activation measures the effectiveness of different sales and marketing techniques to get customers to regularly use and/or pay for your product or service.

Customer cross-sell and up-sell measures the effectiveness of different sales, marketing, and merchandising techniques to get customers to upgrade the products and services that they already use or buy and/or get customers to use or buy complementary products and services.

Customer retention measures the effectiveness of sales, marketing, and customer service treatments to identify customers likely to attrite and the subsequent efforts to retain those customers.

Customer sentiment monitors the sentiment of customers across multiple social media sites, blogs, consumer comments, and e-mail conversations to flag product, service, or operational problem areas and recommend corrective action.

Customer advocacy measures how effective particular customers are at influencing other customers' actions or behaviors.

NOTE

Some industry estimates show that just 3 percent of the participants in an online conversation yield over 90 percent of the results (such as likes, views, retweets, and linkbacks) within a particular subject area.

Customer lifetime value determines the current (and future or maximum) value of a particular customer.

Customer fraud monitors and flags potential fraudulent activities in real-time in order to recommend timely corrective or preventive action.

Cohort analysis determines the impact that one particular customer has on other customers in driving particular customer and/or group behaviors.
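Of the customer analytics listed above, customer lifetime value lends itself to a short worked sketch. The version below is only one common simplification, assumed here for illustration: a constant annual margin, a constant retention rate, and a constant discount rate over a fixed horizon. The function name `customer_lifetime_value` and all of the sample numbers are hypothetical.

```python
def customer_lifetime_value(annual_margin, retention_rate, discount_rate, years):
    """Sum the discounted margin expected from one customer over `years` years."""
    total = 0.0
    for t in range(years):
        survival = retention_rate ** t       # chance the customer is still active in year t
        discount = (1 + discount_rate) ** t  # time value of money for year t
        total += annual_margin * survival / discount
    return total

# Hypothetical customer: $100 annual margin, 80% retention, 10% discount rate.
clv = customer_lifetime_value(
    annual_margin=100.0, retention_rate=0.8, discount_rate=0.1, years=3
)
print(round(clv, 2))
```

A score like this is what lets an organization decide how much a particular customer is worth saving before spending sales and marketing resources on retention.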

There is also a set of customer analytics around marketing. These marketing analytics include:

Targeting effectiveness measures the effectiveness of marketing's targeting efforts to reach the “right” or highest qualified prospects.

Re-targeting effectiveness measures the effectiveness of re-targeting efforts to re-target prospects that have shown an interest in a particular product or service.

Segmentation effectiveness measures the effectiveness of segmentation efforts to identify high-value prospect clusters.

Campaign marketing effectiveness measures the effectiveness of general marketing campaigns at driving customer or prospect actions.

Direct marketing effectiveness measures the effectiveness of direct-to-consumer marketing campaigns to get customers to respond to marketing requests or buy particular products or services.

Promotional effectiveness measures the effectiveness of channel or partner promotional activities, events, packages, and offers.

A/B testing tests the effectiveness of two different marketing treatments (messaging, ad types, websites, keywords, daypart, and page location) to determine which marketing treatment is most effective in driving the desired customer action or behavior.

Market basket analysis determines the propensity of products or services to sell in combination with other products and services (within same basket or shopping cart). Market basket analysis also can identify time lags between purchase events (buy a boat and then two weeks later, buy water skis).

Attribution analysis quantifies the contribution of different digital marketing or media treatments in driving a customer event or activity (e.g., buy a product, download an app, play a game, request collateral, research an event).

Omni-channel marketing analysis quantifies the inter-play of marketing effectiveness across multiple retail or business channels (e.g., physical store, catalog, call center, website, social media) in driving sales results.

Trade promotion effectiveness measures the effectiveness of channel or partner promotions to drive end consumer sales.

Pricing and yield optimization determines both the timing and the “optimal” prices in order to maximize revenue and profitability for perishable products or services (vegetables, meat, airline seats, hotel rooms, sporting events, concerts).

Markdown management optimization determines the timing and amount of price reduction and promotions to reduce obsolete and excess inventory while balancing revenue, margin, and cost variables.
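Market basket analysis, described in the list above, is usually quantified with a measure such as lift: how much more often two products sell together than would be expected if they sold independently. The sketch below is a minimal illustration with made-up baskets and a hypothetical `pair_lift` helper. It looks only at items within the same basket and ignores the time-lag case (boat now, water skis two weeks later).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"boat", "life_jacket"},
    {"boat", "water_skis", "life_jacket"},
    {"water_skis", "sunscreen"},
    {"boat", "water_skis"},
]

def pair_lift(baskets):
    """Return lift for every pair of items that appears together at least once."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    lift = {}
    for (a, b), count in pair_counts.items():
        support_pair = count / n
        # Lift > 1 means the pair sells together more often than chance.
        lift[(a, b)] = support_pair / ((item_counts[a] / n) * (item_counts[b] / n))
    return lift

lift = pair_lift(baskets)
for pair, value in sorted(lift.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

With only four baskets the numbers are not meaningful; the point is the shape of the calculation, which stays the same at millions of baskets.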

By the way, many of these customer analytics have a corollary for employee analytics (teachers, police officers, parole officers, caseworkers, physicians, nurses, technicians, mechanics, pilots, drivers, entertainers, etc.). These analytics include:

Employee acquisition (hiring) measures the effectiveness of different hiring practices and recruiting personnel to identify and hire the most productive and successful employees.

Employee activation (productivity or performance) measures the effectiveness of training programs and managers to engage employees and drive more productive and effective performance.

Employee development (promotions, firing) measures the effectiveness of reviews, promotions, training, coaching, interventions, and management to identify and promote high potential employees and release low productivity employees at the lowest cost and lowest risk.

Employee retention measures the effectiveness of promotions, raises, awards, stock options, etc. to retain the organization's most valuable and productive employees.

Employee advocacy (hiring referrals) measures the effectiveness of advocacy and referral programs to acquire high potential job candidates.

Employee lifetime value determines or scores the current (and future or maximum) value of employees to the organization.

Employee sentiment (employee satisfaction, “best places to work” surveys, etc.) identifies, measures, and recommends corrective action on the drivers of employee and departmental dissatisfaction.

Employee fraud (shrinkage) monitors and flags shrinkage problems and triages those situations to identify root causes of fraud and shrinkage.

It can be useful to look at what other organizations in other industries are doing to better understand their customers and employees. For example, your organization could identify which organizations are best at leveraging customer loyalty programs to drive customer acquisition, maturation, retention, and advocacy. Then identify what data they are capturing about their customers and what analytics they are leveraging to improve the customer experience. There are many examples of organizations that understand how to optimize their loyalty programs. Just go grab a venti non-fat, no water chai latte at a certain coffee chain to experience that for yourself.

Product and Device Analytics

The second area of business entity analytics focuses on physical items: products and machines. Many of the same behavioral analytic basics that are used in customer analytics are applicable for products and machines. Like humans, products and machines exhibit different behavioral tendencies, especially over time. Two wind turbines manufactured by the same manufacturer, installed at the same time, and located in the same cornfield could develop very different behaviors and tendencies over time due to usage, maintenance, upgrades, and general product wear and tear.

Analytics about products and machines (airplanes, jet engines, cars, delivery trucks, locomotives, ATMs, washing machines, routers, traffic lights, wind turbines, power plants, etc.) could include any of the following:

Predictive maintenance predicts when certain products or devices are in need of maintenance, what sort of maintenance, the likely maintenance and replacement materials, and technician skill sets.

Maintenance scheduling optimization optimizes the scheduling of resources (technicians with the right skill sets, replacement parts, maintenance equipment, etc.) in order to optimize the replacement and/or upgrading of failing or under-performing parts or products.

Maintenance, repair, and operations (MRO) inventory optimization balances MRO inventory with predicted maintenance needs in order to reduce inventory costs and minimize obsolete and excessive inventory.

Product performance optimization optimizes product performance and mean time between maintenance (MTBM) by understanding the product's or device's optimal operation performance ranges, tolerances, and variances.

Manufacturing effectiveness reduces manufacturing costs while maintaining product quality levels and production schedules through the optimal mix of supplies, suppliers, and in-house and contract manufacturing capabilities.

Supplier performance analytics quantify supplier product quality and delivery reliability in order to minimize manufacturing line downtime.

Supplier decommits/recommits analytics understand optimal production capacities of suppliers and contract manufacturers in order to properly rebalance manufacturing needs caused by supply chain disruptions (strikes, storms, wars, raw material shortages).

Supplier network analytics triage product and supplier problems more quickly by understanding the dynamics of the underlying supplier and contract manufacturer relationships and inter-dependencies.

Product testing and QA effectiveness accelerates product quality assurance testing by optimizing the tests and/or combinations of tests that cause products, components, suppliers, and contract manufacturers to fail more quickly.

Supply chain optimization optimizes supply chain delivery and inventory levels while minimizing supply chain costs and risks associated with obsolete and excess inventory.

Optimize MRO parts inventory to determine the appropriate level of MRO parts inventory based on predicted maintenance needs.

New product introductions optimize product and marketing mix to increase the probability of success when launching new products, product extensions, and/or new product versions.

Product rationalization/retirement determines which products to divest or retire, and when, based on that product's impact on customer value and inter-related profitability of other products (market basket analysis).

Brand and category management analysis determines optimal pricing, packaging, placement, and promotional variables of individual brands and products within brands to drive overall brand and category revenues, profitability, and market share.
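Several of the analytics above (predictive maintenance and product performance optimization in particular) rest on knowing a device's normal operating range and noticing when it drifts outside it. The sketch below shows the simplest version of that idea. It assumes a hypothetical vibration sensor on a single wind turbine and an arbitrary rule of flagging anything more than three standard deviations from that turbine's own baseline; the readings, the `out_of_tolerance` name, and the threshold are illustrative assumptions, not a recommended maintenance policy.

```python
from statistics import mean, stdev

def out_of_tolerance(baseline, recent, k=3.0):
    """Return the recent readings more than k standard deviations from the baseline mean."""
    center = mean(baseline)
    spread = stdev(baseline)
    return [reading for reading in recent if abs(reading - center) > k * spread]

# Hypothetical vibration readings for one turbine: a healthy baseline period,
# followed by the most recent readings to be checked.
baseline_vibration = [0.50, 0.52, 0.49, 0.51, 0.48, 0.50]
recent_vibration = [0.51, 0.49, 0.62]

flagged = out_of_tolerance(baseline_vibration, recent_vibration)
print(flagged)
```

Computing the baseline per device, rather than per model, reflects the point made earlier in this section: two identical turbines in the same field can develop different normal behaviors over time.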

Product-centric industries most impacted by product and device analytics include:

Consumer packaged goods

High-tech manufacturing

Appliance and electronics manufacturing

Sporting goods manufacturing

Food and beverage

Automotive

Agriculture

Farm machinery manufacturing

Heavy equipment manufacturing

Pharmaceuticals

Financial services

Banking

Credit cards

Insurance

Network and Operational Analytics

The third area of business entity analytics focuses on network and operational analytics. The “internet of things” (IoT) and wearable computing (Fitbit, Jawbone, Garmin) have increased the level of interest (and the volume and variety of data) about what is happening across vast and complex human and machine/device networks. More than ever, we are an interconnected world where the actions of one person or device in a social or physical network can have a “butterfly effect” on all of the people and devices across that network.³

Networks can take many different shapes and forms including ATM networks, retail branches, supplier networks, device sensors, in-store beacons, mobile devices, cellular towers, traffic lights, slot machines, and communication networks.

Analytics about networks and operations could include any of the following:

Demand forecasting forecasts network demand (average demand, surge demand, minimal viable demand) based on predicted network usage behaviors, patterns, and trends.

Capacity planning predicts network capacity requirements in all potential (what if) working situations.

Reduce unplanned downtime to identify, monitor, and pre-emptively predict the failure of the drivers of unplanned network downtime.

Network performance optimization predicts and optimizes network performance across multiple usage scenarios (network traffic, weather, seasonality, holidays, special events) in real-time.

Network layout optimization optimizes network layout in order to minimize traffic bottlenecks and optimize network bandwidth and throughput.

Reduce network traffic to triage network traffic bottlenecks and provide real-time incentives and/or governors to reduce or re-route traffic during overload situations.

Load balancing identifies and rebalances network traffic based on current and forecasted traffic needs and current network capacity.

Theft and revenue protection identifies, understands, and recommends the most appropriate revenue protection actions based on theft situations across the network.

Predictive maintenance predicts when network nodes are in need of maintenance, what sort of maintenance, the likely maintenance and replacement materials, and technician skill sets.

Network security identifies, understands, and recommends the most appropriate actions based on unauthorized network or device/node entry or usage situations across the network.

Industries most impacted by network and operational analytics tend to be industries that run or manage complex projects or systems. These industries have to coordinate multiple vendors and suppliers across multiple sub-assemblies or sub-projects in order to deliver the end product or project on time and within budget. Some of these industries include:

Large-scale construction (skyscrapers, malls, stadiums, airports, dams, bridges, tunnels, etc.)

Airplane manufacturing

Ship building

Defense contractors

Systems integrators

Telecommunication networks

Railroad networks

Transportation networks

There are many, many more examples of customer, product, and network analytics. The list above is a good starting point. And while investigating analytic use cases within your own industry is “safe,” better and potentially more impactful analytic use cases can likely be found by looking for customer, product, and network analytic success stories in other industries. Bucketing the analytic use cases into those three categories helps the reader to contemplate a wider variety of analytic opportunities and best practices across different industries.

Think differently when you are in search of the analytics that may be most impactful to your organization. Don't assume that your industry has all the answers.

Characteristics of a Good Business Story

The final step in the book is to pull together the “thinking like a data scientist” results and the sample analytics to create a story that is interesting and relevant to your organization. While it can be useful to hear about what other organizations are doing with big data and data science, the most compelling stories will be those stories about your organization that motivate your senior leadership to take action.

You know from reading books and watching movies that the best stories have interesting characters that have been put into a difficult situation. Heck, that sounds like data science already. A compelling story that is interesting, relevant, and unique to your organization needs the following components (think about the process in relationship to your favorite science fiction adventure movie):

Key business initiative (survival of the human race)

Strategic nouns or key business entities (pilots, scientists, aliens)

Current challenging situation (aliens are going to conquer Earth and exterminate the human race)

Creative solution (infect the alien ships with a computer virus that shuts down their defensive shields)

Desired glorious end state (aliens get their butts kicked, and the whole world becomes one united brotherhood)

Let's see this process in action:

Let's say that your organization has as a key business initiative to “reduce customer churn by 10 percent over the next 12 months.”

Your strategic noun is “customer.”

The current challenging situation is “too many of our most valuable customers are leaving the company and going to competitors.”

The creative solution is “developing analytics that flag customers who have a high propensity to leave the company, create a customer lifetime value score for each customer (so that your organization is not wasting valuable sales and marketing resources saving the ‘wrong’ customers), and deliver messages to frontline employees (call center reps, sales teams, partners) with recommended offers to deliver to the customer if a valuable customer has a score with an ‘at risk’ propensity to leave.”

The glorious end state is “dramatic increase in the retention of the organization's most valuable customers that leads to an increase in corporate profits, an increase in customer satisfaction, and generous raises for all!”
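The creative solution in this story combines two scores, a propensity to leave and a customer lifetime value, to decide which customers frontline employees should try to save. A minimal sketch of that triage logic follows. It assumes both scores already exist (produced by separate models not shown here), and the `Customer` class, the `customers_to_save` function, the thresholds, and the sample customers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    churn_propensity: float  # 0.0 to 1.0, assumed to come from a separate model
    lifetime_value: float    # assumed to come from a separate CLV score

def customers_to_save(customers, churn_threshold=0.7, value_threshold=1000.0):
    """Return valuable, at-risk customers, most valuable first."""
    at_risk = [
        c for c in customers
        if c.churn_propensity >= churn_threshold
        and c.lifetime_value >= value_threshold
    ]
    return sorted(at_risk, key=lambda c: c.lifetime_value, reverse=True)

customers = [
    Customer("Avery", 0.85, 5200.0),
    Customer("Blake", 0.90, 300.0),   # likely to leave, but low value
    Customer("Casey", 0.20, 8000.0),  # valuable, but not at risk
    Customer("Devon", 0.75, 9100.0),
]

# The resulting list is what would be pushed to call center reps or sales teams.
for c in customers_to_save(customers):
    print(f"Recommend a retention offer for {c.name}")
```

Filtering on both scores is what keeps the organization from spending retention budget on the "wrong" customers, such as Blake in this example.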

This is an easy process if you understand your organization's key business initiatives or what's important to the organization's business leadership.

Summary

Broaden your horizons with respect to looking for analytic use cases. Instead of just looking within your own industry, look across different industries for analytic use cases around:

Customer and employee analytics

Product and device analytics

Network and operational analytics

Since this is the last chapter of the book, put a cherry on the top of your Big Data MBA by developing a compelling and relevant story that you can share within your organization to motivate senior leadership to action. Make the story compelling by tying one of the above analytic use cases to your organization's key business initiatives, and make the story relevant by leveraging your “thinking like a data scientist” training. That way you ensure that all the work you have put into reading this book and doing the homework can lead to something of compelling and differentiated value to the organization. And heck, maybe you will get a promotion out of it!

Congratulations! For a special surprise, go to this URL: www.wiley.com/go/bigdatamba. And don't share this URL with anyone else. Make other folks read the entire book to find this “Easter egg” surprise.

Now you have earned your Big Data MBA! Go get 'em!

Homework Assignment

Use the following exercises to apply what you learned in this chapter.

Exercise #1: Identify one of your organization's key business initiatives.

Exercise #2: Apply the “thinking like a data scientist” approach to identify the relevant business stakeholders, key business entities or strategic nouns, key decisions, potential recommendations, and supporting scores.

Exercise #3: Now create a story that weaves together all of these items with a relevant analytics example that can help senior leadership to understand the business potential and motivate them into action. Use your strategic nouns to help you find some relevant analytic use cases outlined in this chapter.

Notes

1 In hockey, a “hockey assist” or credit is given to the player who gives an assist to the player who gets the ultimate assist that leads directly to another player scoring a goal. Think of this as an “assist to an assist” statistic.

2 This is not intended to be a comprehensive list of customer analytics, but it instead represents a sample of the types of customer analytics of which organizations in business-to-consumer industries should be aware.

3 In chaos theory, the “butterfly effect” is the sensitive dependence on initial conditions in which a small change in one state of a deterministic nonlinear system can result in large differences in a later state.

Big Data MBA

Driving Business Strategies with Data Science

Bill Schmarzo

Big Data MBA: Driving Business Strategies with Data Science

Published by

John Wiley & Sons, Inc.

10475 Crosspoint Boulevard

Indianapolis, IN 46256

www.wiley.com

Copyright © 2016 by Bill Schmarzo

Published by John Wiley & Sons, Inc., Indianapolis, Indiana

Published simultaneously in Canada

ISBN: 978-1-119-18111-8

ISBN: 978-1-119-23884-3 (ebk)

ISBN: 978-1-119-18138-5 (ebk)

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2015955444

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

About the Author

Bill Schmarzo is the Chief Technology Officer (CTO) of the Big Data Practice of EMC Global Services. As CTO, Bill is responsible for setting the strategy and defining the big data service offerings and capabilities for EMC Global Services. He also works directly with organizations to help them identify where and how to start their big data journeys. Bill is the author of Big Data: Understanding How Data Powers Big Business, writes white papers, is an avid blogger, and is a frequent speaker on the use of big data and data science to power an organization's key business initiatives. He is a University of San Francisco School of Management (SOM) Fellow, where he teaches the “Big Data MBA” course.

Bill has over three decades of experience in data warehousing, business intelligence, and analytics. He authored EMC's Vision Workshop methodology and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute's faculty as the head of the analytic applications curriculum. Previously, he was the Vice President of Analytics at Yahoo! and oversaw the analytic applications business unit at Business Objects, including the development, marketing, and sales of their industry-defining analytic applications.

Bill holds a master's degree in Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science, and Business Administration from Coe College. Bill's recent blogs can be found at http://infocus.emc.com/author/william_schmarzo/[email protected]/in/schmarzo.

About the Technical Editor

Jeffrey Abbott leads the EMC Global Services marketing practice around big data, helping customers understand how to identify and take advantage of opportunities to leverage data for strategic business initiatives, while driving awareness for a portfolio of services offerings that accelerate customer time-to-value. As a content developer and program lead, Jeff emphasizes clear and concise messaging on persona-based campaigns. Prior to EMC, Jeff helped build and promote a cloud-based ecosystem for CA Technologies that combined an online social community, a cloud development platform, and an e-commerce site for cloud services. Jeff also spent several years within CA's Thought Leadership group, creating and promoting executive-level messaging and social-media programs around major disruptive trends in IT. Jeff has held various other product marketing roles at firms such as EMC, Citrix, and Ardence and spent a decade running client accounts at numerous boutique marketing firms. Jeff studied small business management at the University of Vermont and resides in Sudbury, MA, with his wife, two boys, and dog. Jeff enjoys skiing, backpacking, photography, and classic cars.

Credits

Project Editor

Adaobi Obi Tulton and Chris Haviland

Technical Editor

Jeffrey Abbott

Production Editor

Barath Kumar Rajasekaran

Copy Editor

Chris Haviland

Manager of Content Development & Assembly

Mary Beth Wakefield

Production Manager

Kathleen Wisor

Marketing Director

David Mayhew

Marketing Manager

Carrie Sherrill

Professional Technology & Strategy Director

Barry Pruett

Business Manager

Amy Knies

Associate Publisher

Jim Minatel

Project Coordinator, Cover

Brent Savage

Proofreader

Nicole Hirschman

Indexer

Nancy Guenther

Cover Designer

Wiley

Cover Image

© STILLFX/iStockphoto

Acknowledgments

Acknowledgments are dangerous. Not dangerous like wrestling an alligator or an unhappy Chicago Cubs fan, but dangerous in the sense that there are so many people to thank. How do I prevent the Acknowledgments section from becoming longer than my book? This book represents the sum of many, many discussions, debates, presentations, engagements, and late night beers and pizza that I have had with so many colleagues and customers. Thanks to everyone who has been on this journey with me.

So realizing that I will miss many folks in this acknowledgment, here I go…

I can't say enough about the contributions of Jeff Abbott. Not only was Jeff my EMC technical editor for this book, but he also has the unrewarding task of editing all of my blogs. Jeff has the patience to put up with my writing style and the smarts to know how to spin my material so that it is understandable and readable. I can't thank Jeff enough for his patience, guidance, and friendship.

Jen Sorenson's role in the book was only supposed to be EMC Public Relations editor, but Jen did so much more. There are many chapters in this book where Jen's suggestions (using the Fairy-Tale Theme Parks example in Chapter 6) made the chapters more interesting. In fact, Chapter 6 is probably my favorite chapter because I was so over my skis on the data science algorithms material. But Jen did a marvelous job of taking a difficult topic (data science algorithms) and making it come to life.

Speaking of data science, Pedro DeSouza and Wei Lin are the two best data scientists I have ever met, and I am even more grateful that I get to call them friends. They have been patient in helping me to learn the world of data science over the past several years, which is reflected in many chapters in the book (most notably Chapters 5 and 6). But more than anything else, they taught me a very valuable life lesson: being humble is the best way to learn. I can't even express in words my admiration for them and how they approach their profession.

Joe Dossantos and Josh Siegel may be surprised to find their names in the acknowledgments, but they shouldn't be. Both Joe and Josh have been with me on many steps in this big data journey, and both have contributed tremendously to my understanding of how big data can impact the business world. Their fingerprints are all over this book.

Adaobi Obi Tulton and Chris Haviland are my two Wiley editors, and they are absolutely marvelous! They have gone out of their way to make the editing process as painless as possible, and they understand my voice so well that I accepted over 99 percent of all of their suggestions. Both Adaobi and Chris were my editors on my first book, so I guess they forgot how much of a PITA (pain in the a**) I can be when they agreed to be the editors on my second book. Though I have never met them face-to-face, I feel a strong kinship with both Adaobi and Chris. Thanks for all of your patience and guidance and your wonderful senses of humor!

A very special thank you to Professor Mouwafac Sidaoui, with whom I co-teach the Big Data MBA at the University of San Francisco School of Management (USF SOM). I could not pick a better partner in crime; he is smart, humble, demanding, fun, engaging, worldly, and everything that one could want in a friend. I am a Fellow at the USF SOM because of Mouwafac's efforts, and he has set me up for my next career: teaching.

I also want to thank Dean Elizabeth Davis and the USF MBA students who were willing to be guinea pigs for testing many of the concepts and techniques captured in this book. They helped me to determine which ideas worked and how to fix the ones that did not work.

Another special thank you to EMC, who supported me as I worked at the leading edge of the business transformational potential of big data. EMC has afforded me the latitude to pursue new ideas, concepts, and offerings and in many situations has allowed me to be the tip of the big data arrow. I could not ask for a better employer and partner.

The thank you list should include the excellent and creative people at EMC with whom I interact on a regular basis, but since that list is too long, I'll just mention Ed, Jeff, Jason, Paul, Dan, Josh, Matt, Joe, Scott, Brandon, Aidan, Neville, Bart, Billy, Mike, Clark, Jeeva, Sean, Shriya, Srini, Ken, Mitch, Cindy, Charles, Chuck, Peter, Aaron, Bethany, Susan, Barb, Jen, Rick, Steve, David, and many, many more.

I want to thank my family, who has put up with me during the book writing process. My wife Carolyn was great about grabbing Chipotle for me when I had a tough deadline, and my sons Alec and Max and my daughter Amelia were supportive throughout the book writing process. I've been blessed with a marvelous family (just stop stealing my Chipotle in the refrigerator!).

My mom and dad both passed away, but I can imagine their look of surprise and pride in the fact that I have written two books and am teaching at the University of San Francisco in my spare time. We will get the chance to talk about that in my next life.

But most important, I want to thank the EMC customers with whom I have had the good fortune to work. Customers are at the frontline of the big data transformation, and where better to be situated to learn about what's working and what's not working than arm-in-arm with EMC's most excellent customers at those frontlines. Truly the best part of my job is the chance to work with our customers. Heck, I'm willing to put up with the airline travel to do that!

WILEY END USER LICENSE AGREEMENT

Go to www.wiley.com/go/eula to access Wiley's ebook EULA.