data science at insidesales.com

Applying data science to sales pipelines – for fun and profit Andy Twigg Chief Scientist

Upload: andy-twigg

Post on 16-Apr-2017


TRANSCRIPT

Applying data science to sales pipelines – for fun and profit

Andy Twigg

Chief Scientist

WHY APPLY DATA SCIENCE TO SALES?

Problem: sales teams are biased
•  Unrealistic targets – “you must have 3x coverage”
•  Happy ears – “they said they’ll definitely buy it”
•  Sandbagging – reps want to look like heroes, so don’t report deals until late in the quarter

We should be able to remove these biases

•  Stat: since 1995, CRM data has increased ~150x, but forecast accuracy has reduced by 10%

→ data is available, but not helping

PROBLEMS

Opportunity Scoring
•  Pr(win)?
•  Pr(win in quarter)?
•  How does this compare to sales team commits?
•  Which deals can we influence most?

Forecasting
•  How much will be won this quarter?

SALES OPPORTUNITIES

•  Opportunities are temporal, either open or closed. Once closed, either won/lost
•  Usually proceed through stages, except:
    •  Stages are a partial order - can skip / revisit
    •  An opportunity can be entered as closed (no open observations)
•  As the opportunity evolves, we get more and more data about the opportunity
•  Sales teams mark an opportunity ‘committed’ – they predict win within the quarter
•  A pipeline is a set of open opportunities
•  We want to estimate Pr(final outcome = won), Pr(closed before time t), …

[Figure: example opportunity timeline. Events in order: Lead created, Stage: Qualifying, Email sent, Email opened, Amount = $1000, Call, Stage: Validate, Meeting, Demo, Close date changed, Stage: Negotiation, Outcome: Closed/won. Labels on the timeline: open, committed, closed.]

•  sales team: good precision (~70-80%) but poor recall (~10-40%)
•  model won precision ~ sales team won precision
•  model won recall ~ 3 x sales team won recall

              First observation            Last observation
              precision  recall  F1        precision  recall  F1
model         0.65       0.86    0.74      0.75       0.93    0.83
sales team    0.70       0.07    0.13      0.87       0.45    0.59
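The F1 columns follow from the precision and recall columns as their harmonic mean. A minimal sketch that recomputes them from the figures in the table (the function name `f1_score` is chosen here for illustration):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
table = {
    ("model", "first observation"): (0.65, 0.86),
    ("sales team", "first observation"): (0.70, 0.07),
    ("model", "last observation"): (0.75, 0.93),
    ("sales team", "last observation"): (0.87, 0.45),
}

for (who, when), (p, r) in table.items():
    print(f"{who:10s} {when:18s} F1 = {f1_score(p, r):.2f}")
```

The low sales-team F1 at first observation is driven almost entirely by recall: the harmonic mean is pulled towards the smaller of the two values.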

ANATOMY OF AN OPPTY

[Figure: timeline of a single opportunity whose close date is pushed out and pulled back in, with final outcome: won. Annotations: "Committed here (by the sales rep)", "Predicted won from the start", "Predicted won in the correct quarter".]

SALES OPPORTUNITIES

[Figure: the example opportunity timeline (Lead created through Outcome: Closed/won), with the observations mapped to states x0, …, xt and the final outcome labelled y=1.]


•  Sequence of observations x0, x1, …
•  Each sequence is associated with a fixed target y ∈ {0,1}
•  Consider the states as an MDP: state xt encodes temporal features about previous states (cf. RFM features)
    •  # times this stage was previously visited, time between successive visits, time in current stage, direction of amount change, …
•  States also contain
    •  Sales-specific features, e.g. momentum
    •  External data, e.g. firmographic
    •  Global features, e.g. avg_sales_cycle(target)
•  Gives examples {(x0,y),(x1,y),…} for each opportunity
•  Shuffle to break correlations between successive examples
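One way to read the construction above is sketched below: each observation of an opportunity becomes one training example whose features summarise the history so far, and every example carries the opportunity's final outcome as its label. The event format, the feature names, and the sample data are assumptions made for illustration, and only two of the temporal features listed above are computed.

```python
import random


def build_examples(events, outcome):
    """Turn one opportunity's history into (features, label) pairs.

    events:  list of (day, stage) tuples in time order (hypothetical format)
    outcome: 1 if the opportunity was finally won, else 0
    """
    examples = []
    visits = {}            # stage -> number of times it has been entered so far
    current_stage = None
    stage_entered = None   # day on which the current stage was entered
    for day, stage in events:
        if stage != current_stage:
            visits[stage] = visits.get(stage, 0) + 1
            current_stage, stage_entered = stage, day
        features = {
            "stage": stage,
            "prior_visits_to_stage": visits[stage] - 1,
            "days_in_current_stage": day - stage_entered,
        }
        # Every observation x_t is paired with the same fixed target y.
        examples.append((features, outcome))
    return examples


# Hypothetical opportunities: (event history, final outcome).
opportunities = [
    ([(0, "qualifying"), (3, "qualifying"), (5, "validate"), (9, "qualifying")], 1),
    ([(0, "qualifying"), (20, "negotiation")], 0),
]

training_set = [ex for events, y in opportunities for ex in build_examples(events, y)]
# Shuffle so that successive examples from one opportunity are not adjacent.
random.Random(0).shuffle(training_set)
```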


DURATION MODEL

•  Win/loss model
    •  Pr(win)
    •  independent of time horizon
    •  RF/GBDT

•  Duration model
    •  Pr(win within quarter)
    •  Poisson regression: assume that in current state xt, fixed probability of closing each day
    •  Train a model to predict expected duration d, conditioned on outcome=win
    •  Integrating corresponding exponential distribution (interarrival times) gives Pr(close < t)
    •  Pr(win < t) = Pr(win) Pr(close < t | win)
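The last two bullets can be made concrete. If the time to close is exponential with mean d, integrating its density from 0 to t gives Pr(close < t | win) = 1 - exp(-t/d). A minimal sketch, assuming Pr(win) and d come from the two models above; the function names and the numbers in the usage lines are hypothetical:

```python
import math


def pr_close_within(days, expected_duration):
    """CDF of an exponential distribution with mean `expected_duration` days."""
    rate = 1.0 / expected_duration
    return 1.0 - math.exp(-rate * days)


def pr_win_within(days, pr_win, expected_duration):
    """Pr(win < t) = Pr(win) * Pr(close < t | win)."""
    return pr_win * pr_close_within(days, expected_duration)


# Hypothetical deal: 80% win probability, expected 30 days to close if won,
# 45 days left in the quarter.
print(round(pr_win_within(45, pr_win=0.8, expected_duration=30), 3))
```

Splitting the score this way lets the win/loss model stay independent of the time horizon, while the horizon enters only through the duration term.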

FORECASTING: BOTTOM-UP

Bottom-up: Predict current quarter based on currently open pipeline

+  Considers quality of deals in pipeline
-  Ignores trends, deals not in pipeline

Example pipeline:

Amount      Score
$157,000    77%
$200,000    37%
$82,000     86%

Expected amount: $265,410

Obvious solution: expected amount in pipeline wrt Pr(win in quarter) scores
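The expected amount is the sum of each deal's amount weighted by its Pr(win in quarter) score. A minimal sketch using the three deals from the slide, reading each percentage as that score (an assumption); the function name is chosen for illustration:

```python
def expected_pipeline_amount(deals):
    """Sum of amount * Pr(win in quarter) over the open opportunities."""
    return sum(amount * score for amount, score in deals)


# (amount in dollars, Pr(win in quarter)) for the three deals shown above.
pipeline = [(157_000, 0.77), (200_000, 0.37), (82_000, 0.86)]

forecast = expected_pipeline_amount(pipeline)
print(f"${forecast:,.0f}")  # $265,410
```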

FORECASTING: TOP-DOWN

Top-down: Predict current quarter based on previous quarters

+  Accounts for seasonality and trending
-  Ignores state of current pipeline

[Figure: Decomposition of additive time series. Panels: observed, trend, seasonal, random. x-axis: Time, 2013.0 to 2014.4.]

Typical decomposition of revenue time series into 3 components:
•  Trend component
•  Seasonal component
•  Random component

Idea: try to reduce the random component by taking into account current pipeline
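For readers who want to see what such a decomposition does, here is a sketch of the classical additive method (centred moving average for the trend, per-position means for the seasonal part, remainder as the random part). The talk does not say which tool produced the figure, so this is an illustration rather than the author's procedure; the series below is synthetic.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + random."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined near both ends of the series
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: centred moving average with half weight on the end points.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Seasonal effect: mean detrended value at each position in the cycle,
    # re-centred so the effects sum to zero over one period.
    by_position = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_position[t % period].append(series[t] - trend[t])
    means = [sum(values) / len(values) for values in by_position]
    centre = sum(means) / period
    seasonal = [means[t % period] - centre for t in range(n)]

    random_part = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, random_part


# Synthetic quarterly series: linear trend plus a repeating pattern, no noise.
pattern = [5.0, -3.0, -4.0, 2.0]
series = [100.0 + 2.0 * t + pattern[t % 4] for t in range(12)]
trend, seasonal, random_part = decompose_additive(series, period=4)
```

On this noise-free series the random part is zero wherever the trend is defined; on real revenue data it is the part the hybrid idea tries to shrink.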

‘HYBRID’ FORECASTING
top down + bottom up

•  Idea: augment ARIMA model with side information from bottom-up model
•  Allows model to adjust coefficients in response to bottom-up features (representing current pipeline) while retaining ARIMA features
    •  Amount predicted to close in current quarter
    •  Average score of currently open opportunities
    •  Average predicted days to close
    •  Historic adjusted coverage ratios
•  Sometimes known as ARIMAX [1]

[1] robjhyndman.com/hyndsight/arimax
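A full ARIMAX fit needs a time-series library. As a much simpler stand-in, the sketch below fits an AR(1) model with one exogenous regressor by ordinary least squares with NumPy, which shows how a bottom-up feature can sit alongside the autoregressive term. It is not the model from the talk; the function names, the single pipeline feature, and the data are all hypothetical.

```python
import numpy as np


def fit_ar1_with_exog(revenue, pipeline_feature):
    """Least-squares fit of y[t] = c + phi * y[t-1] + beta * x[t]."""
    y = np.asarray(revenue, dtype=float)
    x = np.asarray(pipeline_feature, dtype=float)
    # Columns: intercept, previous period's revenue, current pipeline feature.
    design = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    coef, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
    return coef  # (c, phi, beta)


def forecast_next(coef, last_revenue, next_pipeline_feature):
    """One-step-ahead forecast from the fitted coefficients."""
    c, phi, beta = coef
    return c + phi * last_revenue + beta * next_pipeline_feature


# Synthetic data generated exactly from y[t] = 10 + 0.5 * y[t-1] + 2 * x[t],
# so the fit should recover (10, 0.5, 2).
pipeline = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
revenue = [100.0]
for x_t in pipeline[1:]:
    revenue.append(10.0 + 0.5 * revenue[-1] + 2.0 * x_t)

coef = fit_ar1_with_exog(revenue, pipeline)
next_quarter = forecast_next(coef, revenue[-1], next_pipeline_feature=4.0)
```

In practice the exogenous column would hold one of the bottom-up features listed above, such as the amount predicted to close in the current quarter.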

WORD VECTORS

•  Train word2vec model on text fields on opportunities
    •  description, status, risks, …
    •  “deal pushed out because no budget this quarter”
•  ~200M words
•  Gives 300-dimensional ‘neural’ word embeddings
•  Compare to GoogleNews model
•  Learned some sales-specific concepts

In [23]: model.most_similar('lost')
Out[23]:
[('disqualified', 0.7105633020401001),
 ('killed', 0.6871206164360046),
 ('won', 0.6662579774856567),
 ('abandoned', 0.6619119048118591),
 ('closing', 0.6464139223098755),
 ('moved', 0.6406350135803223),
 ('reopened', 0.6268107891082764),
 ('closed_lost', 0.6187739968299866),
 ('low_probability', 0.6092942953109741),
 ('closed', 0.6073518395423889)]

In [24]: gn_model.most_similar('lost')
Out[24]:
[(u'losing', 0.7544215321540833),
 (u'lose', 0.7136349081993103),
 (u'regained', 0.618366003036499),
 (u'loses', 0.6115548610687256),
 (u'loosing', 0.576453447341919),
 (u'gained', 0.5561528205871582),
 (u'dropped', 0.5492223501205444),
 (u'loss', 0.5399519205093384),
 (u'won', 0.5263957977294922),
 (u'regain', 0.5241336822509766)]

WORD VECTORS

In [8]: model.most_similar('pushed')
Out[8]:
[('moved', 0.8117796778678894),
 ('pushing', 0.72132408618927),
 ('delayed', 0.7004601955413818),
 ('stalled', 0.6817235946655273),
 ('indefinitely', 0.6797506809234619),
 ('until', 0.6696473360061646),
 ('shelved', 0.6633578538894653),
 ('slowed_down', 0.6619900465011597),
 ('might_slip', 0.6591036915779114),
 ('gone', 0.6582096815109253)]

In [9]: gn_model.most_similar('pushed')
Out[9]:
[(u'pushing', 0.762706458568573),
 (u'push', 0.695708692073822),
 (u'nudged', 0.6802582144737244),
 (u'shoved', 0.6162334084510803),
 (u'bumped', 0.6148176789283752),
 (u'pushes', 0.610393762588501),
 (u'dragged', 0.5916476845741272),
 (u'pulled', 0.5719939470291138),
 (u'moved', 0.5660783052444458),
 (u'inched', 0.5563575029373169)]

In [49]: model.most_similar('sdr')
Out[49]:
[('mktg', 0.6193182468414307),
 ('lead_gen', 0.5637482404708862),
 ('ppl', 0.5618690252304077),
 ('lss', 0.5492127537727356),
 ('reps', 0.5445878505706787),
 ('cold_calling', 0.5426461696624756),
 ('mkt', 0.5422939658164978),
 ('marketo', 0.5341131687164307),
 ('team', 0.532421886920929),
 ('guru', 0.5259524583816528)]

In [50]: gn_model.most_similar('sdr')

KeyError: "word 'sdr' not in vocabulary"
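The scores in these listings are cosine similarities between word vectors. The sketch below reimplements that ranking in plain Python over a tiny made-up vocabulary, including the KeyError for a word that is missing from the model. The 3-dimensional vectors are invented for illustration and are unrelated to the 300-dimensional embeddings in the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(vectors, word, topn=10):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    if word not in vectors:
        raise KeyError(f"word '{word}' not in vocabulary")
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical toy embeddings, not taken from any trained model.
toy_vectors = {
    "pushed": [1.0, 0.2, 0.0],
    "moved": [0.9, 0.3, 0.1],
    "delayed": [0.7, 0.6, 0.0],
    "won": [0.0, 0.1, 1.0],
}

for word, score in most_similar(toy_vectors, "pushed"):
    print(f"{word:8s} {score:.3f}")
```

A word such as 'sdr' gets neighbours only if it occurred in the training text, which is why the sales-specific model answers where the general news model raises a KeyError.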

We’re hiring!

data {scientists, engineers}

[email protected]