TechConnex Big Data Series - Big Data in Banking

Post on 26-Jan-2017


Big Data in Banking: Risk Systems Perspective

Andre Langevin
langevin@utilis.ca

www.swi.com

Agenda
- Big Data at the Big 6
- RDARR Data Hubs
- Lessons Learned (so far)
- Technology Themes in 2016

An important note about this presentation: in order to respect the commercial interests and privacy of my clients, I have refrained from using specific company names, unless information is publicly available.

Big Data at the Big 6

RDARR Drives Big 6 Adoption
- RDARR (Risk Data Aggregation and Risk Reporting) is a mandatory regulatory project:
    - Regulatory response to the 2008 credit crisis
    - Requires a re-build of data gathering and regulatory reporting to implement measurable data quality, operational metadata and auditable data lineage
    - Regulatory enforcement starts in 2017
- Big 6 IT spend of ~$800MM over three years on RDARR:
    - Combined Big 6 IT spend on all Risk Systems projects is ~$400MM per year
    - RDARR spend has largely been incremental: other regulatory initiatives have continued to drive project spend separate from RDARR
- A Hadoop data hub is a typical RDARR solution element

"The investment spend by G-SIBs on RDARR is very significant, averaging US$230MM per bank. These investment costs are likely to increase."
Oliver Wyman, "BCBS 239: Learning from the Prime Movers"

All of Canada's Big 6 banks were designated as Domestic Systemically Important Banks (D-SIBs) by OSFI, meaning they must fully comply with BCBS 239.

Big 6 Hadoop Risk Applications
- Many projects are underway, but relatively few are in production:
    - Plans for enhanced model building and analytics for retail banking following the 2016 RDARR deadline
    - Capital Markets has been the leading driver of Hadoop adoption for compute applications
- Risk Systems teams have started building Hadoop-based applications:
    - Volcker Rule Compliance Metrics (e.g. RENTD)
    - Portfolio Stress Testing
    - Market Risk VaR History
    - On-Demand Risk
- Trading Floor Risk Managers have installed stand-alone Hadoop instances:
    - Often cloud-based, used in specialized analysis of derivative sensitivities or historical market data

Importing US Risk Applications
- Expect to see more risk applications pioneered by leading US banks:
    - Trading Strategy Back Testing
    - Granular Capital, CVA and Market Risk Trending
    - Capital Markets Dealer Compliance
    - Credit Adjudication Models
    - Behavioral Models (often for Collections)
    - Fast-time Transactional Fraud Detection
    - AML
    - Commercial Credit Network Analysis

Big 6 Vendor Alignments
- Banks have each chosen a strategic Hadoop vendor:
    - TD, CIBC and NB use Cloudera
    - RBC and BNS use Hortonworks
    - BMO uses Pivotal (Hortonworks)
- "Land grab" among vendors:
    - Multi-year subscription deals at large discounts to lock in customers
- IBM struggling for share despite entrenched starting position:
    - Lack of SAS support was a showstopper

Forrester Wave, Q1 2014

Deployment Patterns
- Mix of virtual and physical server deployments:
    - Cisco UCS and VMware vSphere are leading infrastructure choices
- Many banks report using multiple grids aligned to business units*:
    - Tools to manage multi-tenancy on Hadoop are still nascent
    - Organizational issues (cost allocation, support team alignments) inhibit shared deployments
- Vendor community has invested heavily in cloud deployment tools:
    - One-click deployments of all major Hadoop distributions are available on public clouds
- Banks looking at "hub and sandbox" deployments on private clouds:
    - Popular pattern in established US deployments
    - The Big 6 have all built an internal private cloud or have access to one through a major infrastructure provider
    - Notable S3/AWS deployment by US regulator FINRA sets the standard

* Hortonworks CAB

RDARR Data Hubs

Typical RDARR Data Hub
- RDARR focus drives Data Hub solution characteristics:
    - RDARR objective is auditable batch reporting, tied into central lineage and metadata solutions
    - Little consideration of unstructured or real-time data sources
    - Often characterized as a raw-data landing zone for otherwise inaccessible mainframe data
    - Resistance to fully adopting Hadoop as a data hub; it is often paired with legacy database hubs
- Retail data focus drives emphasis on security:
    - PIPEDA/GLBA compliance deemed critical despite little to no use of PII/PCI data in reports
    - SOX compliance mandatory
- The architecture team's view dominates data hub projects:
    - Business sponsor is often a newly established Data Management Office
    - Focus on cost and process optimization of data flows to downstream reporting solutions
- Internal build: low to no adoption of commercial hub solutions

RDARR Data Hub Challenges
- Hadoop Data Governance is early stage and poorly integrated:
    - No good Hadoop solution to data governance (yet)
    - Data lineage is at the file level in Hadoop, which is not suitable for RDARR critical data element traceability
    - Policy-based data access solutions are still in development (e.g. Navigator, Atlas)
- Enterprise ETL tools are not Hadoop enabled:
    - Many tools are unable to push transformation work to Hadoop (or only as rudimentary Hive SQL)
    - Performance of established ETL tools is often poor on Hadoop
- Early mover penalty: Hadoop 2.x included solutions to many early security and operational problems "in the box":
    - Projects with 2013 start dates were based on Hadoop 1.x, and so are usually Cloudera-based
    - Established US banking shops are usually on Cloudera or MapR implementations for the same reason

Leaving Business Value on the Table
- Rudimentary governance and security tools produce a bias against self-serve access to data:
    - Transfers modelling and analytic users' frustrations with existing data warehouse solutions to a new platform
    - PII/PCI data control solutions can prevent deployment of analytical tools
- Design for static regulatory reporting objectives ignores high-value interactive exploration and discovery uses:
    - Standardized reporting schemas (such as IBM BDW) have limited value to risk modelers and analysts
- Focus on meeting operational SLAs over sharing of grids

"Banks are struggling to understand the concrete business impact associated with BCBS 239; nearly 70 percent of domestic systemically important banks (D-SIBs) and half of G-SIBs have not quantified the benefits."
Oliver Wyman, "BCBS 239: Learning from the Prime Movers"

Lessons Learned (so far)

Choosing a Hadoop Distribution
- Maximize your exposure to change:
    - Hadoop moves at a very fast pace: expect to deploy a meaningful update every 3-6 months
    - Avoid designs and products that try to encapsulate Hadoop: they fall behind faster than you can recover your investment
- Legacy tool compatibility is important:
    - SAS compatibility is critical (even though SAS doesn't integrate well with Hadoop)
    - Does your organization have DB2 or PL/SQL skills to preserve?
- It's not as easy to switch distributions as you think
- Wait for the features you like to become free:
    - Strong history of the open-source distribution incorporating features that were previously proprietary; newer vendors attack incumbents by producing open-source replacements for proprietary extensions

Data Engineering
- Risk modelling is often very inefficient:
    - A quantitative modeler typically spends 80% of their time gathering and preparing data
    - Specialized data preparation is often difficult to repeat in production environments
- Data Engineering accelerates quantitative modelling:
    - Advanced research labs hire data engineers to support their quantitative modelers
    - Data Engineers are a hybrid of computer programmer and mathematician: they use IT-friendly tools to source and package data into forms that are tailored to the modeler's toolset (e.g. building or smoothing a time series)
    - Marketing teams use a 1:5 ratio of modelers to data engineers, but 10:1 is common on the "buy side" and so is a better staffing target for a bank
- Data hubs should target data engineers as users:
    - Build sophisticated tools for expert consumers, rather than rudimentary tools for casual users
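
As an illustration of the packaging work described above, the sketch below smooths a time series with a simple moving average. The function name `moving_average`, the window size and the sample observations are assumptions made for this example; the presentation does not name a specific smoothing method.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points.

    Illustrative helper (name and behaviour are assumptions): the output
    has len(values) - window + 1 points, one per full window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if window > len(values):
        return []
    running_total = sum(values[:window])
    smoothed = [running_total / window]
    # Slide the window forward one observation at a time.
    for i in range(window, len(values)):
        running_total += values[i] - values[i - window]
        smoothed.append(running_total / window)
    return smoothed


# Made-up daily observations, for illustration only.
daily_rates = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed_rates = moving_average(daily_rates, window=3)
```

A data engineer would typically run a step like this once, in a repeatable job, so that each modeler receives the series already prepared.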

Developer Lessons Learned
- Productivity and performance improve with native Hadoop tools:
    - The "Hadoop edition" of most legacy ETL packages performs slowly and is poorly integrated with Hadoop: you are usually just buying an HDFS adapter
- Learn the native tools; it's easier than you think:
    - A Java programmer can learn Map/Reduce in a week
    - Most end-users already know how to use SQL and Python
- Use Pig to tune your SQL queries:
    - The best optimization for Hive SQL is often to structure data on ingestion in a Hadoop-friendly way
- You will find lots of small bugs in Hadoop:
    - Your Hadoop vendor's support team is a critical resource for your success
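
One common reading of "structure data on ingestion in a Hadoop-friendly way" is to land files in partition directories keyed by the columns that queries filter on, so that Hive can skip irrelevant data. The sketch below builds a Hive-style `key=value` partition path. The base directory, table name and partition columns are hypothetical, and the presentation does not say which layout the author had in mind.

```python
import posixpath
from datetime import date


def partition_path(base_dir, table, business_date, desk):
    """Build a Hive-style partition directory for one ingested file.

    All argument names and the choice of partition columns are
    illustrative assumptions, not taken from the presentation.
    """
    return posixpath.join(
        base_dir,
        table,
        f"business_date={business_date.isoformat()}",
        f"desk={desk}",
    )


# Hypothetical landing location for one day's trades from one desk.
target = partition_path("/data/hub", "trades", date(2016, 1, 29), "rates")
```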

Risk Architecture Insights
- Hadoop is a compute grid:
    - YARN is functionally equivalent to DataSynapse or Platform Symphony
- You can wrap most computations using map/reduce:
    - Writing a map/reduce wrapper to feed data to your C#, Java, C++, or Python applications is surprisingly easy; a hundred lines of code usually does it
- Use Hadoop to bring the computation to the data:
    - Re-process your data files into computationally efficient HDFS blocks
    - Eliminating movement of data in a compute-centric risk application improves performance dramatically
    - Still need caching of intermediate valuation products (e.g. zero curves)
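
To give a sense of how small such a wrapper can be, here is a minimal sketch in the style of a Hadoop Streaming job, written as plain Python functions so it can be run locally. Everything named here is an assumption for illustration: the `portfolio,notional,price` input format, the `value_position` stand-in for an existing risk library, and the `run_local` helper.

```python
from itertools import groupby


def value_position(notional, price):
    # Hypothetical stand-in for a call into an existing risk library.
    return notional * price


def mapper(lines):
    """Emit (portfolio, value) pairs from raw input lines.

    The "portfolio,notional,price" record layout is an assumption.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        portfolio, notional, price = line.split(",")
        yield portfolio, value_position(float(notional), float(price))


def reducer(pairs):
    """Sum values per portfolio.

    Hadoop sorts mapper output by key before the reduce phase;
    sorted() plays that role when running locally.
    """
    for portfolio, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield portfolio, sum(value for _, value in group)


def run_local(lines):
    # Chain the two phases in-process, without a cluster.
    return dict(reducer(mapper(lines)))


# Made-up positions, for illustration only.
totals = run_local(["A,100,1.5", "B,10,2.0", "A,50,2.0", ""])
```

Under Hadoop Streaming the mapper and reducer would usually be separate scripts that read standard input and write tab-separated key/value lines; the functions above keep the same shape but take any iterable, which makes them easy to test before deploying.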

Infrastructure Lessons Learned
- Pay attention to the network:
    - Hadoop needs a fast network backbone between nodes
    - Applications and databases that draw data from Hadoop (e.g. Tableau) should be co-located
- Hadoop grids should cost less than $1,000/TB:
    - Including hardware and support subscription for a major Hadoop distribution
    - Hadoop reference configurations are based on mid-price commodity hardware, so use that
    - Virtualization will provide cheaper infrastructure, but higher node counts offset savings by driving up support subscription costs

Storage Costs (per TB)

Hadoop      $1,000
SAN         $5,000
Database    $12,000

Source: InformationWeek, 07/27/2012
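
The per-terabyte figures above can be turned into a quick comparison for a given hub size. This is a back-of-envelope sketch only: the 200 TB capacity is a hypothetical figure, and the function name `storage_cost` is made up for this example.

```python
# Per-terabyte storage costs as quoted on the slide (InformationWeek, 2012).
COST_PER_TB = {"Hadoop": 1_000, "SAN": 5_000, "Database": 12_000}


def storage_cost(terabytes):
    """Return the cost in dollars of storing `terabytes` on each platform."""
    return {platform: rate * terabytes for platform, rate in COST_PER_TB.items()}


# Hypothetical 200 TB data hub, chosen only to make the gap concrete.
costs = storage_cost(200)
ratio_database_to_hadoop = costs["Database"] / costs["Hadoop"]
```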

Infrastructure Lessons Learned
- Don't try to prevent infrastructure failure:
    - Hadoop is very fault tolerant: it is designed to handle an annual equipment failure rate of 8%
    - Do not use fault tolerant hardware; use JBOD instead of RAID arrays
    - A well-designed Hadoop grid will keep running for the 24 hours it takes your hardware vendor to replace a broken machine under a normal support contract
- The best back-up for Hadoop is Hadoop:
    - Hadoop is the cheapest form of on-line storage available, and is cost-competitive with and more reliable than tape
    - Replicate your Hadoop grid to a second grid at a different site for a high-grade disaster recovery solution
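
To put the 8% failure rate and the 24-hour replacement window in perspective, the sketch below estimates how often a grid should expect a failed node, and how likely it is that a second node fails while the first is being replaced. It assumes the 8% applies per node, that failures are independent and occur at a constant rate, and it uses a hypothetical 100-node grid; none of these assumptions come from the presentation.

```python
ANNUAL_FAILURE_RATE = 0.08  # assumed to be a per-node rate
HOURS_PER_YEAR = 365 * 24


def expected_failures_per_year(nodes):
    """Expected number of node failures per year across the grid."""
    return nodes * ANNUAL_FAILURE_RATE


def prob_further_failure(nodes, window_hours=24):
    """Probability that at least one other node fails during the window.

    Assumes independent failures at a constant rate (a simplification).
    """
    survive_window = (1 - ANNUAL_FAILURE_RATE) ** (window_hours / HOURS_PER_YEAR)
    return 1 - survive_window ** (nodes - 1)


# Hypothetical 100-node grid.
failures = expected_failures_per_year(100)
overlap_risk = prob_further_failure(100)
```

On these assumptions a 100-node grid sees about eight node failures a year, which is why the design goal is to tolerate failures rather than prevent them.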

Technology Themes in 2016

Technology Themes for 2016
- Mix-and-match SQL engines:
    - Native Hadoop SQL engines lack many advanced features in database SQL engines
    - Oracle and IBM are unbundling their Hadoop implementations of PL/SQL and DB2
    - Oracle's PL/SQL engine for Hadoop runs on Cloudera and could be available on Hortonworks
    - IBM is releasing BigSQL (DB2) for ODP, meaning it won't be available on Cloudera
- Open Data Platform: FUD or fantastic?
    - Pivotal has used ODP to partner with Hortonworks and focus on their other tools
    - IBM has promised to release all of their data science tools for ODP, but has been slow to deliver
- IBM "all in" on Spark:
    - IBM's data science tools (e.g. BigR) complement typical Spark use cases (e.g. clustering)
- Tableau displacing Cognos & BOBJ

Data Governance Themes for 2016
- Native Hadoop Data Governance:
    - Hortonworks has partnered with JPMorgan, Merck and Aetna to build an advanced Hadoop data governance solution in the Apache Atlas project
    - Atlas is intended to govern Hadoop data in a federated governance model; partner adoption will drive success
- Federated Data Governance:
    - The Big 6 have all adopted IBM IGC as their enterprise RDARR lineage and metadata solution
    - IBM provides REST APIs to integrate IGC with non-IBM products
    - Will ODP partners Hortonworks and IBM manage to establish Atlas on IGC as the definitive Hadoop solution in a distributed governance model?

Risk Technology Themes for 2016
- Model development on Hadoop:
    - As RDARR data hubs hit critical mass, risk model development will gravitate to Hadoop-based tools
- Notebook workspaces:
    - Increased use of Hadoop modelling environments will drive demand for Notebook environments based on Jupyter and Apache Zeppelin (e.g. IBM Knowledge Anyhow)
- On-Demand Risk on Hadoop:
    - Next generation on-demand risk applications will converge stand-alone compute grid and data cache and persistence onto the Hadoop stack to eliminate data movement: better performance and lower costs

Questions?
