TechConnex Big Data Series - Big Data in Banking

Big Data in Banking Risk Systems Perspective Andre Langevin [email protected] www.swi.com

Upload: andre-langevin

Post on 26-Jan-2017


TRANSCRIPT

Page 1: TechConnex Big Data Series - Big Data in Banking

Big Data in Banking: Risk Systems Perspective

[email protected]

www.swi.com

Page 2: TechConnex Big Data Series - Big Data in Banking

Agenda
- Big Data at the Big 6
- RDARR Data Hubs
- Lessons Learned (so far)
- Technology Themes in 2016

An important note about this presentation: in order to respect the commercial interests and privacy of my clients, I have refrained from using specific company names, unless information is publicly available.

Page 3: TechConnex Big Data Series - Big Data in Banking

Big Data at the Big 6

Page 4: TechConnex Big Data Series - Big Data in Banking

RDARR Drives Big 6 Adoption
- RDARR is a mandatory regulatory project:
  - Regulatory response to 2008 credit crisis
  - Requires re-build of data gathering and regulatory reporting to implement measurable data quality, operational metadata and auditable data lineage
  - Regulatory enforcement starts in 2017
- Big 6 IT spend of ~$800MM over three years on RDARR
  - Combined Big 6 IT spend on all Risk Systems projects is ~$400MM per year
  - RDARR spend has largely been incremental; other regulatory initiatives have continued to drive project spend separate from RDARR
- Hadoop data hub is a typical RDARR solution element

“The investment spend by G-SIBs on RDARR is very significant, averaging US$230MM per bank. These investment costs are likely to increase.”
Oliver Wyman, “BCBS 239: Learning from the Prime Movers”

All of Canada’s Big 6 banks were designated as Domestic Systemically Important Banks (D-SIBs) by OSFI, meaning they must fully comply with BCBS 239.

Page 5: TechConnex Big Data Series - Big Data in Banking

Big 6 Hadoop Risk Applications
- Many projects are underway, but relatively few are in production:
  - Plans for enhanced model building and analytics for retail banking following 2016 RDARR deadline
  - Capital Markets has been the leading driver of Hadoop adoption for compute applications
- Risk Systems teams have started building Hadoop-based applications:
  - Volcker Rule Compliance Metrics (e.g. RENTD)
  - Portfolio Stress Testing
  - Market Risk VaR History
  - On-Demand Risk
- Trading Floor Risk Managers have installed stand-alone Hadoop instances:
  - Often cloud-based, used in specialized analysis of derivative sensitivities or historical market data

Page 6: TechConnex Big Data Series - Big Data in Banking

Importing US Risk Applications
- Expect to see more risk applications pioneered by leading US banks:
  - Trading Strategy Back Testing
  - Granular Capital, CVA and Market Risk Trending
  - Capital Markets Dealer Compliance
  - Credit Adjudication Models
  - Behavioral Models (Often for Collections)
  - Fast-time Transactional Fraud Detection
  - AML
  - Commercial Credit Network Analysis

Page 7: TechConnex Big Data Series - Big Data in Banking

Big 6 Vendor Alignments
- Banks have each chosen a strategic Hadoop vendor:
  - TD, CIBC and NB use Cloudera
  - RBC and BNS use Hortonworks
  - BMO uses Pivotal (Hortonworks)
- “Land grab” among vendors:
  - Multi-year subscription deals at large discounts to lock in customers
- IBM struggling for share despite entrenched starting position:
  - Lack of SAS support was a showstopper

Forrester Wave, Q1 2014

Page 8: TechConnex Big Data Series - Big Data in Banking

Deployment Patterns
- Mix of virtual and physical server deployments:
  - Cisco UCS and VMware vSphere are leading infrastructure choices
- Many banks report using multiple grids aligned to business units*:
  - Tools to manage multi-tenancy on Hadoop are still nascent
  - Organizational issues (cost allocation, support team alignments) inhibit shared deployments
- Vendor community has invested heavily in cloud deployment tools:
  - One-click deployments of all major Hadoop distributions are available on public clouds
- Banks looking at “hub and sandbox” deployments on private clouds:
  - Popular pattern in established US deployments
  - Big 6 have all built an internal private cloud or have access to one through a major infrastructure provider
  - Notable S3/AWS deployment by US regulator FINRA sets the standard

* Hortonworks CAB

Page 9: TechConnex Big Data Series - Big Data in Banking

RDARR Data Hubs

Page 10: TechConnex Big Data Series - Big Data in Banking

Typical RDARR Data Hub
- RDARR focus drives Data Hub solution characteristics:
  - RDARR objective is auditable batch reporting, tied into central lineage and metadata solutions
  - Little consideration of unstructured or real-time data sources
  - Often characterized as a raw-data landing zone for otherwise inaccessible mainframe data
  - Resistance to fully adopting Hadoop as a data hub; often paired with legacy database hubs
- Retail data focus drives emphasis on security:
  - PIPEDA/GLB compliance deemed critical despite little to no use of PII/PCI data in reports
  - SOX compliance mandatory
- Architecture teams are the dominant voice in data hub projects:
  - Business sponsor is often a newly established Data Management Office
  - Focus on cost and process optimization of data flows to downstream reporting solutions
- Internal build: low to no adoption of commercial hub solutions

Page 11: TechConnex Big Data Series - Big Data in Banking

RDARR Data Hub Challenges
- Hadoop Data Governance is early stage and poorly integrated:
  - No good Hadoop solution to data governance (yet)
  - Data lineage is at the file level in Hadoop, which is not suitable for RDARR critical data element traceability
  - Policy-based data access solutions still in development (e.g. Navigator, Atlas)
- Enterprise ETL tools not Hadoop enabled:
  - Many tools unable to push transformation work to Hadoop (or only as rudimentary Hive SQL)
  - Performance of established ETL tools often poor on Hadoop
- Early mover penalty: Hadoop 2.x included solutions to many early security and operational problems “in the box”:
  - Projects with 2013 start dates were based on Hadoop 1.x, and so are usually Cloudera-based
  - Established US banking shops are usually on Cloudera or MapR implementations for the same reason

Page 12: TechConnex Big Data Series - Big Data in Banking

Leaving Business Value on the Table
- Rudimentary governance and security tools produce a bias against self-serve access to data:
  - Transfers modelling and analytic users’ frustrations with existing data warehouse solutions to a new platform
  - PII/PCI data control solutions can prevent deployment of analytical tools
- Design for static regulatory reporting objectives ignores high-value interactive exploration and discovery uses:
  - Standardized reporting schemas (such as IBM BDW) have limited value to risk modelers and analysts
- Focus on meeting operational SLAs over sharing of grids

“Banks are struggling to understand the concrete business impact associated with BCBS 239; nearly 70 percent of domestic systemically important banks (D-SIBs) and half of G-SIBs have not quantified the benefits.”
Oliver Wyman, “BCBS 239: Learning from the Prime Movers”

Page 13: TechConnex Big Data Series - Big Data in Banking

Lessons Learned (so far)

Page 14: TechConnex Big Data Series - Big Data in Banking

Choosing a Hadoop Distribution
- Maximize your exposure to change:
  - Hadoop moves at a very fast pace: expect to deploy a meaningful update every 3-6 months
  - Avoid designs and products that try to encapsulate Hadoop; they fall behind faster than you can recover your investment
- Legacy tool compatibility is important:
  - SAS compatibility is critical (even though SAS doesn’t integrate well with Hadoop)
  - Does your organization have DB2 or PL/SQL skills to preserve?
- It’s not as easy to switch distributions as you think
- Wait for the features you like to become free:
  - Strong history of the open-source distribution incorporating features that were previously proprietary; newer vendors attack incumbents by producing open-source replacements for proprietary extensions

Page 15: TechConnex Big Data Series - Big Data in Banking

Data Engineering
- Risk modelling is often very inefficient:
  - A quantitative modeler typically spends 80% of their time gathering and preparing data
  - Specialized data preparation is often difficult to repeat in production environments
- Data Engineering accelerates quantitative modelling:
  - Advanced research labs hire data engineers to support their quantitative modelers
  - Data Engineers are a hybrid of computer programmer and mathematician: they use IT-friendly tools to source and package data into forms that are tailored to the modeler’s toolset (e.g. smoothing a time series)
  - Marketing teams use a 1:5 ratio of modelers to data engineers, but 10:1 is common on the “buy side” and so is a better staffing target for a bank
- Data hubs should target data engineers as users:
  - Build sophisticated tools for expert consumers, rather than rudimentary tools for casual users
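
The time-series smoothing mentioned on this slide is a typical piece of data preparation that a data engineer would package for a modeler. A minimal sketch in Python, assuming a trailing simple moving average is an acceptable smoother (the slide does not name a method); the function name and window size are hypothetical.

```python
from collections import deque
from typing import Iterable, List


def trailing_moving_average(values: Iterable[float], window: int) -> List[float]:
    """Smooth a series with a trailing simple moving average.

    Early points, where fewer than `window` observations exist, are
    averaged over the observations available so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds the last `window` observations
    running_sum = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # oldest value is about to be evicted
        recent.append(value)
        running_sum += value
        smoothed.append(running_sum / len(recent))
    return smoothed
```

Packaging the step as a plain function is what makes it repeatable in production: the same code can run in the modeler's notebook and in a scheduled job.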

Page 16: TechConnex Big Data Series - Big Data in Banking

Developer Lessons Learned
- Productivity and performance improve with native Hadoop tools:
  - The “Hadoop edition” of most legacy ETL packages performs slowly and is poorly integrated with Hadoop; you are usually just buying an HDFS adapter
- Learn the native tools; it’s easier than you think:
  - A Java programmer can learn Map/Reduce in a week
  - Most end-users already know how to use SQL and Python
- Use Pig to tune your SQL queries:
  - The best optimization for Hive SQL is often to structure data on ingestion in a Hadoop-friendly way
- You will find lots of small bugs in Hadoop:
  - Your Hadoop vendor’s support team is a critical resource for your success
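
The slide does not say what a “Hadoop-friendly” structure looks like. One common reading, assumed here, is to partition records on ingestion by a column that queries usually filter on, using Hive's `key=value` directory convention. The sketch below writes to a local directory for illustration; the function name, file name and sample columns are hypothetical.

```python
import csv
import os
from collections import defaultdict


def write_partitioned(rows, base_dir, partition_key):
    """Write rows into Hive-style partition directories.

    Each distinct value of `partition_key` gets its own directory named
    `<partition_key>=<value>`. The partition column is dropped from the
    file because the directory name already carries it.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    paths = {}
    for value, members in groups.items():
        part_dir = os.path.join(base_dir, f"{partition_key}={value}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-00000.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            for row in members:
                writer.writerow([v for k, v in row.items() if k != partition_key])
        paths[value] = path
    return paths
```

With this layout a query that filters on the partition column only needs to read the matching directories, instead of scanning every file.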

Page 17: TechConnex Big Data Series - Big Data in Banking

Risk Architecture Insights
- Hadoop is a compute grid:
  - YARN is functionally equivalent to DataSynapse or Platform Symphony
- You can wrap most computations using map/reduce:
  - Writing a map/reduce wrapper to feed data to your C#, Java, C++, or Python applications is surprisingly easy; a hundred lines of code usually does it
- Use Hadoop to bring the computation to the data:
  - Re-process your data files into computationally efficient HDFS blocks
  - Eliminating movement of data in a compute-centric risk application improves performance dramatically
  - Still need caching of intermediate valuation products (e.g. zero curves)
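
To give a feel for how small such a map/reduce wrapper can be, here is a minimal sketch in Python. It follows the Hadoop Streaming convention of tab-separated key/value lines. The record layout (`portfolio,notional,rate`) and the `price_trade` function are hypothetical stand-ins for an existing pricing application, and `run_locally` only simulates the shuffle step in one process.

```python
from itertools import groupby


def price_trade(notional: float, rate: float) -> float:
    """Hypothetical stand-in for a call into the existing pricing application."""
    return notional * rate


def mapper(lines):
    """Turn 'portfolio,notional,rate' records into 'portfolio<TAB>value' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        portfolio, notional, rate = line.split(",")
        yield f"{portfolio}\t{price_trade(float(notional), float(rate))}"


def reducer(sorted_lines):
    """Sum values per key. Input must be sorted by key, as it is after the shuffle."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines if line.strip())
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(float(value) for _, value in group)
        yield f"{key}\t{total}"


def run_locally(lines):
    """Simulate map, sort and reduce in a single process for testing."""
    return list(reducer(sorted(mapper(lines))))
```

On a grid, the mapper and reducer would each read standard input and print their output lines, with Hadoop handling the sort between the two stages.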

Page 18: TechConnex Big Data Series - Big Data in Banking

Infrastructure Lessons Learned
- Pay attention to the network:
  - Hadoop needs a fast network backbone between nodes
  - Applications and databases that draw data from Hadoop (e.g. Tableau) should be co-located
- Hadoop grids should cost less than $1,000/TB:
  - Including hardware and support subscription for a major Hadoop distribution
  - Hadoop reference configurations are based on mid-price commodity hardware, so use that
  - Virtualization will provide cheaper infrastructure, but higher node counts offset savings by driving up support subscription costs

Storage Costs (per TB)

Platform    Cost
Hadoop      $1,000
SAN         $5,000
Database    $12,000

InformationWeek, 07/27/2012
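
The per-terabyte figures in the table translate directly into a budget comparison. A minimal sketch in Python, using only the three unit costs quoted on the slide; the function names and the capacity in the usage note are hypothetical.

```python
# Unit costs in USD per terabyte, as quoted on the slide.
COST_PER_TB_USD = {"Hadoop": 1_000, "SAN": 5_000, "Database": 12_000}


def storage_cost(terabytes: float) -> dict:
    """Return the cost of storing `terabytes` of data on each platform."""
    return {platform: terabytes * unit for platform, unit in COST_PER_TB_USD.items()}


def cost_multiple(platform: str, baseline: str = "Hadoop") -> float:
    """Return how many times more expensive `platform` is than `baseline` per TB."""
    return COST_PER_TB_USD[platform] / COST_PER_TB_USD[baseline]
```

For an illustrative 500 TB hub, `storage_cost(500)` gives $500,000 on Hadoop against $2.5MM on SAN and $6MM in a database, which is the 5x and 12x gap implied by the table.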

Page 19: TechConnex Big Data Series - Big Data in Banking

Infrastructure Lessons Learned
- Don’t try to prevent infrastructure failure:
  - Hadoop is very fault tolerant; it is designed to handle an annual equipment failure rate of 8%
  - Do not use fault tolerant hardware; use JBOD instead of RAID arrays
  - A well-designed Hadoop grid will keep running for the 24 hours it takes your hardware vendor to replace a broken machine under a normal support contract
- The best back-up for Hadoop is Hadoop:
  - Hadoop is the cheapest form of on-line storage available, and is cost-competitive with and more reliable than tape.
  - Replicate your Hadoop grid to a second grid at a different site for a high-grade disaster recovery solution.
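
To make the 8% figure concrete, the sketch below estimates how often a grid should expect to lose a machine. It assumes the 8% annual rate applies to each node independently and is constant through the year; both are simplifying assumptions, not statements from the slide, and the function names are hypothetical.

```python
def expected_annual_failures(nodes: int, annual_failure_rate: float = 0.08) -> float:
    """Expected number of node failures per year across the grid."""
    return nodes * annual_failure_rate


def prob_any_failure(nodes: int, days: float, annual_failure_rate: float = 0.08) -> float:
    """Probability that at least one of `nodes` machines fails within `days`.

    Assumes independent failures with a constant hazard, so a node survives
    `days` days with probability (1 - annual_failure_rate) ** (days / 365).
    """
    survival_per_node = (1 - annual_failure_rate) ** (days / 365)
    return 1 - survival_per_node ** nodes
```

Under these assumptions a 100-node grid expects about eight failed machines a year, so a failure is a routine operational event rather than an emergency. `prob_any_failure(99, 1)` estimates the chance that a second machine fails during the 24-hour replacement window for the first.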

Page 20: TechConnex Big Data Series - Big Data in Banking

Technology Themes in 2016

Page 21: TechConnex Big Data Series - Big Data in Banking

Technology Themes for 2016
- Mix-and-match SQL engines:
  - Native Hadoop SQL engines lack many advanced features in database SQL engines
  - Oracle and IBM are unbundling their Hadoop implementations of PL/SQL and DB2
  - Oracle’s PL/SQL engine for Hadoop runs on Cloudera and could be available on Hortonworks
  - IBM is releasing BigSQL (DB2) for ODP, meaning it won’t be available on Cloudera
- Open Data Platform: FUD or fantastic?
  - Pivotal has used ODP to partner with Hortonworks and focus on their other tools
  - IBM has promised to release all of their data science tools for ODP, but has been slow to deliver
- IBM “all in” on Spark:
  - IBM’s data science tools (e.g. BigR) complement typical Spark use cases (e.g. clustering)
- Tableau displacing Cognos & BOBJ

Page 22: TechConnex Big Data Series - Big Data in Banking

Data Governance Themes for 2016
- Native Hadoop Data Governance:
  - Hortonworks has partnered with JPMorgan, Merck and Aetna to build an advanced Hadoop data governance solution in the Apache Atlas project
  - Atlas is intended to govern Hadoop data in a federated governance model; partner adoption will drive success
- Federated Data Governance:
  - The Big 6 have all adopted IBM IGC as their enterprise RDARR lineage and metadata solution.
  - IBM provides REST APIs to integrate IGC with non-IBM products.
  - Will ODP partners Hortonworks and IBM manage to establish Atlas on IGC as the definitive Hadoop solution in a distributed governance model?

Page 23: TechConnex Big Data Series - Big Data in Banking

Risk Technology Themes for 2016
- Model development on Hadoop:
  - As RDARR data hubs hit critical mass, risk model development will gravitate to Hadoop-based tools
- Notebook workspaces:
  - Increased use of Hadoop modelling environments will drive demand for Notebook environments based on Jupyter and Apache Zeppelin (e.g. IBM Knowledge Anyhow)
- On-Demand Risk on Hadoop:
  - Next generation on-demand risk applications will converge the stand-alone compute grid, data cache and persistence onto the Hadoop stack to eliminate data movement, giving better performance and lower costs

Page 24: TechConnex Big Data Series - Big Data in Banking

Questions?