zero-revelaon regtech: detec+ng risk through corporate...
TRANSCRIPT
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Zero-Revela+onRegTech:
Detec+ngRiskthroughCorporateEmails
SeoyoungKimSantaClaraUniversity
@R/FinanceConference
Chicago,ILMay19,2017
Jointworkwith:
SanjivDas(SCU)andBhushanKothari(GoogleInc.)
BigPicture
§ FinancialsareoIendelayedindicatorsofcorporatequality
§ Internaldiscussion(e.g.,emails)maybeusedasanearlywarningsystem
§ An automated plaRorm that parses emails and produces summarysta+s+cswouldbehighlyvaluable,since…
– It can analyze vast quan++es of textual not amenable to humanprocessing
– Itdoesnotrequirerevela+onof individualemailcontentexplicitlytomonitors/regulators
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
OurPurpose
§ Ourpurpose is toexplore thepredic+vepowerof informa+onconveyedbyemployeeemails
§ Specifically,weareinterestedin:
§ Thesen+mentconveyedbyemailcontent
§ Theinforma+onconveyedbystructuralcharacteris+cs,suchasemailvolumeorlength
§ Othernon-verbalindicatorsofpoten+altrouble(e.g.,shiIingemailnetworkpaYerns)
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
PreviewofResults
§ We find that the net sen+ment conveyed by Enron employee emailcontentisasignificantpredictorofstock-returnperformance
§ Interes+ngly, email lengthwas a stronger predictor of subsequent pricedeclinesthanthenetsen+mentconveyedbythemessagebodyitself.
§ Wealsoiden+fyotherpoten+alindicators/predictorsofescala+ngriskormalfeasance.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Data
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
§ Ini+alSample:– Approximately500,000emails
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
§ Ini+alSample:– Approximately500,000emails
– January2000throughDecember2001
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
§ Ini+alSample:– Approximately500,000emails
– January2000throughDecember2001
– First made publicly available by the Federal Energy Regulatory Commission (FERC)duringitsinves+ga+onofEnron
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
§ Ini+alSample:– Approximately500,000emails
– January2000throughDecember2001
– First made publicly available by the Federal Energy Regulatory Commission (FERC)duringitsinves+ga+onofEnron
– SubsequentlyculledanddistributedbytheCarnegieMellonCALOproject
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
§ Ini+alSample:– Approximately500,000emails
– January2000throughDecember2001
– First made publicly available by the Federal Energy Regulatory Commission (FERC)duringitsinves+ga+onofEnron
– SubsequentlyculledanddistributedbytheCarnegieMellonCALOproject
§ Caveats/Redac+ons– TheEnroncorpushasbeenscrubbedover+meforlegalreasonsandtohonorrequests
fromaffectedemployees.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
§ Ini+alSample:– Approximately500,000emails
– January2000throughDecember2001
– First made publicly available by the Federal Energy Regulatory Commission (FERC)duringitsinves+ga+onofEnron
– SubsequentlyculledanddistributedbytheCarnegieMellonCALOproject
§ Caveats/Redac+ons– TheEnroncorpushasbeenscrubbedover+meforlegalreasonsandtohonorrequests
fromaffectedemployees.
– Ex(1):user“fastow-a”isnotablymissing
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
§ Ini+alSample:– Approximately500,000emails
– January2000throughDecember2001
– First made publicly available by the Federal Energy Regulatory Commission (FERC)duringitsinves+ga+onofEnron
– SubsequentlyculledanddistributedbytheCarnegieMellonCALOproject
§ Caveats/Redac+ons– TheEnroncorpushasbeenscrubbedover+meforlegalreasonsandtohonorrequests
fromaffectedemployees.
– Ex(1):user“fastow-a”isnotablymissing
– Ex(2): Email chaYer surrounding Mr. Skilling’s sudden resigna+on on 8/14/2001 hasbeenexpunged.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TheEnronEmailCorpus
§ Ini+alSample:– Approximately500,000emails
– January2000throughDecember2001
– First made publicly available by the Federal Energy Regulatory Commission (FERC)duringitsinves+ga+onofEnron
– SubsequentlyculledanddistributedbytheCarnegieMellonCALOproject
§ Caveats/Redac+ons– TheEnroncorpushasbeenscrubbedover+meforlegalreasonsandtohonorrequests
fromaffectedemployees.
– Ex(1):user“fastow-a”isnotablymissing
– Ex(2): Email chaYer surrounding Mr. Skilling’s sudden resigna+on on 8/14/2001 hasbeenexpunged.
– Overall,detailsregardingexclusioncriteriahavenotbeenmadepublic,andouranalysesshouldbeviewedasexploratoryandprescrip+ve
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
CuringtheData
§ Wefocuson“sent”emails(ratherthanallemails)inorderto…– AnalyzecontentspecificallywriYenbyEnronemployees
– Avoidprocessingthesamecontentmorethanonce
– i.e.,ifuser“lay-k”sendsanemailto“skilling-j”
§ Otherfiltersappliedtoremovenoisy(junk)mail:– Emailsgreaterthan3,000charactersinlength
– Emailssenttomorethan20recipients
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
OurFinalSample
§ Overall,weobtain…– TheEnronemailcorpusfromtheCarnegieMellonCSsite
– Stockpriceandstockreturninforma+onfromCRSP
– Newsar+clesfromFac+vaPRNewswire
– Sen+mentworddic+onariesfromtheHarvardInquirerandtheLoughranandMcDonaldsen+mentwordlists
§ FinalSample:– 144dis+nctemployees
– 113,266sentemails
– January2000throughDecember2001
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
EnronCodePipeline
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
S1.Rmd(readinallemails,preprocess)
S2.Rmd(processandparseallemails)
Mails2.RDataDataframewithheader,messagebody,date,from,to,cc,nchar,year,month,week.
Mails1.RData(1.3GB)>½millionemails
S3.RmdResolvemul+pleuseraccounts;summarystats;de-duplica+on;plotsover+me;moodscoring(weekxuser);calculatereturnsfromstockdata;regressionsofsen+mentandreturns;wordclouds;emailnetworks.
ENE_daily.csv(stockdata)
MoodScoredDF.RData(weeklysen+mentscoreandreturns)
Mails3.RData(allemailsdataandreturns)
S3_2.Rmd(networkanalysis)
Networkoutput(adjacencymatrices,stats,degreedistribu+on)
WeeklyReturn.csv(returns)
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
S4.Rmd(listof1000mostusedwordsover+me)
Mails3.RData(allemailsdataandreturns)
TDM.RData(termdocumentmatrix:76518wordsxweeks;1353negwords,288poswords,206uncwords,105weeks.)
WeeklyReturn.csv(returns)
MoodScoredDF.RData(weeklysen+mentscoreandreturns)
ui.R,server.R(Shinyapptoshowwordplayover+me)
S5.Rmd(topicanalysis)
FacEva_Extract.Rmd(Extractnewsar+clesfromFac+va) FacEva
FacEva_Analysis.Rmd(Processdatatodataframeanddoanalysis)
MoodScoredNews.RData(weeklysen+mentscore;POStagging)
TDMNews.RData(Weeklynews)
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Analyses
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table1.SummarySta+s+csofSentMail
Theaverageemailis362charactersinlength,withamedianof163characters…
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table1.SummarySta+s+csofSentMail
…withanaverageof1.77recipientspersentmail.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table1.SummarySta+s+csofSentMail
Manyemails(closeto11%)aresimplyforwardedwithoutaddedtext.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure1.AverageEmailLengthoverTime
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure1.AverageEmailLengthoverTime
Year2000:
Year2001:
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure1.AverageEmailLengthoverTime
Year2000:
Year2001:
Ini+ally,averageemaillengthisfairlystable,straddlingroughly400charactersperemail.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure1.AverageEmailLengthoverTime
Year2000:
Year2001:
Ini+ally,averageemaillengthisfairlystable,straddlingroughly400charactersperemail.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure1.AverageEmailLengthoverTime
Year2000:
Year2001:
MarkeddeclineinaveragelengthasweapproachEnron’sdemise(approx.50%).
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure2.EmailSen+mentandDisagreementoverTime
NetSenEment:
Disagreement:
𝑃𝑜𝑠−𝑁𝑒𝑔/𝑃𝑜𝑠+𝑁𝑒𝑔
1− |𝑃𝑜𝑠−𝑁𝑒𝑔|/𝑃𝑜𝑠+𝑁𝑒𝑔
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure2.EmailSen+mentandDisagreementoverTime
NetSenEment:
Disagreement:
𝑃𝑜𝑠−𝑁𝑒𝑔/𝑃𝑜𝑠+𝑁𝑒𝑔
1− |𝑃𝑜𝑠−𝑁𝑒𝑔|/𝑃𝑜𝑠+𝑁𝑒𝑔
Basedoncontext-dependentsen+mentdic+onariesforwordclassifica+on
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure3.Fac+vaNewsCoverageoverTime
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure3.Fac+vaNewsCoverageoverTime
Spikeinnumberofar+clesasweapproachEnron’sdemise.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure4.Fac+vaNewsSen+mentoverTime
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure4.Fac+vaNewsSen+mentoverTime
Netsen+mentfrombody
Netsen+mentfromheader
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure5.StockReturnsandNetSen+mentoverTime
Netsen+mentacrossEnronemployeeemailsover+me
MovingaveragestockreturnsforEnronover+me
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure6.StockPricesandNetSen+mentoverTime
Netsen+mentacrossEnronemployeeemailsover+me
MovingaveragestockpriceforEnronover+me
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure7.StockReturnsandEmailLengthoverTime
MovingaveragestockreturnsforEnronover+me
MovingaverageemaillengthacrossEnronemployeeemailsover+me
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure8.StockPricesandEmailLengthoverTime
MovingaveragestockpriceforEnronover+me
MovingaverageemaillengthacrossEnronemployeeemailsover+me
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure8.StockPricesandEmailLengthoverTime
MovingaveragestockpriceforEnronover+me
MovingaverageemaillengthacrossEnronemployeeemailsover+me
Perhaps,asrisk/malfeasanceescalates,emailsbecomeshorter,asemployeesarelesslikelytoincludedetailsinmessagesentviathecorporateserver
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table2.EmailContentandStockReturns
DependentVariable=StockReturnst
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table2.EmailContentandStockReturns
DependentVariable=StockReturnst
Onestdev(i.e.,0.019)decreaseinNetSen+mentisassociatedwitha4.5%declineinstockreturns…
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table2.EmailContentandStockReturns
DependentVariable=StockReturnst …butnolongersignificantwhenwecontrolforemaillength.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table2.EmailContentandStockReturnsoverTime
DependentVariable=StockReturnst
Overall,20-characterdeclineinmovingaverageemaillengthisassociatedwitha1.17%declineinstockreturns.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table3.EmailContentversusFac+vaNewsContent
DependentVariable=StockReturnst
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table3.EmailContentversusFac+vaNewsContent
DependentVariable=StockReturnst
Emailcontentcontainsmoreinforma+onthannews-headercontent….
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table3.EmailContentversusFac+vaNewsContent
DependentVariable=StockReturnst
….Butneitherissignificantwhenaccoun+ngforemaillength.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table3.EmailContentversusFac+vaNewsContent
DependentVariable=StockReturnst
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table3.EmailContentversusFac+vaNewsContent
DependentVariable=StockReturnst
Ontheotherhand,emailcontentcontainslessinforma+onthancontentfromthenewsbody…(couldthisbeduetoredac+onsontheEnronemailcorpus?)
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Table3.EmailContentversusFac+vaNewsContent
DependentVariable=StockReturnst
….But,again,neitherissignificantwhenaccoun+ngforemaillength.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
SummaryandImplica+ons
§ Thus far,wehave shown that thenet sen+ment conveyedbyemployeesentmailsisasignificantpredictorofstock-returnperformance
§ Interes+ngly, email lengthwas a stronger predictor of subsequent pricedeclinesthanthenetsen+mentconveyedbythemessagebodyitself.
§ Overall,emailcontentmaybecontrolledormanipulated
– Thus, we are also (and perhaps evenmore!) interested in the non-verbal,interac+on-ornetwork-basedindicatorsofpoten+altrouble.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Addi+onalExplora+ons
Otherdimensionsripeforinves+ga+on….
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure11.EmailNetworks
Year2000,Q4:
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure11.EmailNetworks
Year2001,Q4:
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure13.VocabularyTrends
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure13.VocabularyTrends
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure13.VocabularyTrends
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure14.TopicAnalysisoverTime
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
Figure14.TopicAnalysisoverTime
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses
TopicsassignedvialatentDirichletalloca+on(LDA)
ConcludingRemarks
§ We introduce an automatedplaRorm to parse corporate email content,andwefindthatthenetsen+mentconveyedbyemployeesentmailsisa+melyindicatorofstock-returnperformance.
§ Non-verbal indicators, such as email length and network structure, arepar+cularlypromisingavenuestoexplore.
§ Overall, we suggest the promise of a regulatory technology (RegTech)approach by which to systema+cally parse email content and networkstructure to detect indicators of risk ormalfeasance on an ongoing andmore+melybasis.
Thankyou.
S. Kim / SCU 2017 Data Concluding Remarks Motivation Analyses