Challenges in Business Analytics: An Industrial Research Perspective, 30 October 2008
Eleni Pratsini, Manager, Mathematical and Computational Sciences, IBM Zurich Research Laboratory. Alexis Tsoukiàs. Fred Roberts. Overview: business challenges, analytics opportunities.
Zurich Research Lab
© Copyright IBM Corporation 2005
Challenges in Business Analytics: An industrial research perspective
30 October 2008
Eleni Pratsini, Manager, Mathematical and Computational Sciences, IBM Zurich Research Laboratory
Fred Roberts, Alexis Tsoukiàs
2 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Overview
Business challenges, analytics opportunities
- Need for advanced analytics in a resource constrained world
Technical challenges in a world of increasing complexity (see slide 8 for information)
- Data availability – a wonder and a curse
- Optimization under data uncertainty
Parallel computing and optimization
3 Challenges in Business Analytics; An industrial research perspective 30 October 2008
A couple of slides here describing the business environment that leads to the need for advanced analytics - charts from some of our studies showing the trends
4 Challenges in Business Analytics; An industrial research perspective
Analytics Landscape
Figure: analytics capabilities plotted with axes Degree of Complexity and Competitive Advantage, grouped into Reporting and Analytics. Each capability answers a question:
- Standard Reporting: What happened?
- Ad hoc reporting: How many, how often, where?
- Query/drill down: What exactly is the problem?
- Alerts: What actions are needed?
- Forecasting: What if these trends continue?
- Simulation: What will happen if … ?
- Predictive modeling: What will happen next?
- Optimization: How can we achieve the best outcome?
- Stochastic Optimization: How can we achieve the best outcome including the effects of variability?
Based on: Competing on Analytics, Davenport and Harris, 2007
5 Challenges in Business Analytics; An industrial research perspective
Analytics Landscape
Figure: the same capability chart (Standard Reporting through Stochastic Optimization, with axes Degree of Complexity and Competitive Advantage), now grouped into Descriptive, Predictive, and Prescriptive analytics.
Based on: Competing on Analytics, Davenport and Harris, 2007
6 Challenges in Business Analytics; An industrial research perspective 30 October 2008
And we have lots of data …
7 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Overview
Business challenges, analytics opportunities
- Need for advanced analytics in a resource constrained world
Technical challenges in a world of increasing complexity (see slide 8 for information)
- Data availability – a wonder and a curse
- Optimization under data uncertainty
Parallel computing and optimization
8 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Outline of slides that follow
- Intro on the increasing complexity of the problems we want to address (slides 9-10)
- Use a project in pharmaceutical manufacturing to introduce some of the technical challenges we face:
  - Introduce / motivate the problem (slides 11-12)
  - Structure / architecture of the analysis (slides 13-14; here I can highlight the need to work with other functional groups)
  - Data integration (I will talk about it on slide 14; I am looking for some chaotic spreadsheets we had to use to extract the information, but they are confidential so I need to check on this)
  - Effect of uncontrolled data collection (slide 15), leading into:
    - Use of techniques from Decision Theory / AI (slide 16 is taken from an article by Fred; this can come in the discussion by Fred & Alexis)
    - Dealing with missing, inconsistent, incomplete data etc. (slides 17-19; more slides if needed): use expert elicitation techniques and a method of aggregation
  - Optimization model & optimization under uncertainty (slides 20-37; condense if needed)
  - Was the client (or the situation?) ready for the analysis (slides 38-39)? At the end they simply simulated predefined scenarios!
- Slide 40: some of the not-so-technical challenges we have in “selling” our work. This will basically be a discussion; I want to find a nice cartoon on this.
- Slides 41-42: a description of one of our most successful projects and some discussion of why it was successful
- Finally, if appropriate, I can present some exploratory research on using parallel processing in optimization (currently we have better results for MINLP than MIP)
9 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Challenges in a world of increasing complexity
Advances in computer power bring capabilities (and expectations) for solving larger problem sizes
- Problem decomposition not always obvious
- Sequential optimization
- Exact vs approximate solution or mixed
- Hybrid methods
Information Explosion
- There is a lot of data (too much!) but how about data quality?
  - Not enough of the “right” data
  - Missing values
  - Outdated information
- Data in multiple formats
Can our optimization models handle the information?
10 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Where do we start?
11 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Example: Risk Based Approach to Manufacturing
Figure: Post-Marketing Adverse Event Reports, number per calendar year (vertical axis 0 to 300,000)
- 1995: 156,477
- 1996: 191,865
- 1997: 212,843
- 1998: 247,604
- 1999: 278,143
- 2000: 266,991
- 2001: 286,755
‘FDA’s GMPs for the 21st Century – A Risk-Based Approach’
- Manage risk to patient safety – “Critical to Quality”
- Apply science in development and manufacturing
- Adopt systems thinking – build quality in
*Horowitz D.J. Risk-based approaches for GMP programs, PQRI Risk management Workshop, February 2005
12 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Risk Management and Optimization
Phase 1: Identify Risk Sources. Using the company’s knowledge, industry expertise, and industry guidelines & regulations, we identify the sources of risk that need to be considered in the analysis.
Phase 2: Quantify Risk Values. Based on historical data, industry expertise, and other sources of causal relationships, we set the causal link strengths (conditional probabilities) and develop a model for risk quantification.
Phase 3: Optimal Transformation. Given the risk values and the financial and physical considerations, we determine the best sequence of corrective actions for minimizing exposure to risk and optimizing a given performance metric.
Figure: the three phases draw on client data, industry knowledge, and experts. One panel is a bar chart of Revenue at Risk as a percentage of Revenue per work center (CAPE BLST 4, CAPE CRD 1, CAPE PCH 1, CAPE PTCH 1, CAPE BOT 1, CAPE BOT 2, CAPE BOT 3, CAPE VIAL 1, CAPE VIAL 2, CAPE VIAL 3), showing the values 1.6%, 7%, 9.7%, 6.4%, 11%, 10.6%, 5.3%, 1%, 1%, 1.3%. Another panel shows the MIP formulation that appears on the Formulation slide (slide 21).
13 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Iterative process
Figure: iterative loop between the Refactoring Model (MIP formulation) and the Solver (optimisation engine)
- Sets: risk parameters, cost parameters, revenue parameters
- Input: current state
- Decision variables: risks, status
- Constraints: hazards, mitigation actions
- Objective function: Total Rev – Mitigation_Costs – Rev@Risk
- Output: optimal route map, new state
Business Case
14 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Overview of Approach
Envisioning supported by Re-factoring model
Figure: process diagram with three streams (Refactoring Model, Data Collection, Routemap), inputs on Products, Systems, and Technology, and six numbered steps. The boxes in the diagram read:
- Define scenarios: product/asset configuration, implementation strategy
- Review cost base of sites; collect key data; establish risk profile; assess readiness for change
- Calculate Revenue profile
- Test scenarios: prioritise on impact on revenue, risk, and profitability
- Adapt scenarios; optimise route-map for implementation based on impact on: Revenue @ Risk profile, investment strategy, impact of change
- Review route maps; select final scenario options
- Select scenario; define implementation strategy
15 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Effect of uncontrolled data collection
A pharmaceutical firm is assessing the risk of two of its sites, in Country A and Country X. Engineers at both sites are asked to collect data on non-compliant and defective batches from the last two years. Analysis of the data indicates that site A has 50% more defective / non-compliant batches than site X. The firm deduces that it should move its manufacturing processes to X or drastically change its operations at A.
Just before launching an audit of site A, the firm receives a letter from the FDA mentioning many recalls of batches, all coming from site X.
Further investigation indicates that site X does not report the many defects it notices every week, and that it is in fact the riskier site.
- Lack of observations does not mean no events; it means no reporting of events
- Extract knowledge from non-statistical sources, e.g. experts
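The site A versus site X story can be illustrated with a small simulation. This is only a sketch under invented assumptions: the defect rates, the reporting rates, and the function name `observed_defects` are hypothetical and do not come from the project data. It shows how a site with a higher true defect rate can look safer when it reports only a fraction of its defects.

```python
import random


def observed_defects(n_batches, true_defect_rate, reporting_rate, rng):
    """Count defects that are both present and actually reported."""
    reported = 0
    for _ in range(n_batches):
        defective = rng.random() < true_defect_rate
        if defective and rng.random() < reporting_rate:
            reported += 1
    return reported


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical numbers: site X is truly worse but reports only 20% of defects.
site_a = observed_defects(n, true_defect_rate=0.03, reporting_rate=0.95, rng=rng)
site_x = observed_defects(n, true_defect_rate=0.06, reporting_rate=0.20, rng=rng)

print("reported defects, site A:", site_a)
print("reported defects, site X:", site_x)
```

With these made-up rates, site A reports more defects than site X although its true defect rate is half as large, which is the trap described on the slide.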
16 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Introduce Decision Theoretic / AI approaches to tackle Business Optimization problems*
Decision-theoretic methods of consensus
- Combining partial information to reach decisions
Algorithmic decision theory
- Sequential decision making models and algorithms
- Graphical models for decision making
- AI and computer-based reasoning systems
- Information management: representation and elicitation
Aggregation
- Decision making under uncertainty or in the presence of partial information
*Computer Science and Decision Theory, Fred Roberts, DIMACS Center, Rutgers University
17 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Technology, product and system failure are all linked via common root causes. The risk model starts from building a dependency graph of all these root causes.
Inputs to the cause-effect analysis: industry expertise, client constraints, past experiences
Different sub-models created:
- Dosage form variability
- Employee execution
- Process control
- Technology level
18 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Elicitation / Aggregation / Estimation
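One common way to aggregate elicited expert estimates is a weighted average of the experts' probabilities (a linear opinion pool). The slides do not say which aggregation method the project used, so the sketch below is an assumption for illustration only; the function name, the estimates, and the weights are hypothetical.

```python
def linear_opinion_pool(estimates, weights):
    """Weighted average of expert probability estimates.

    estimates: probabilities elicited from each expert
    weights:   non-negative credibility weights, one per expert
    """
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * p for w, p in zip(weights, estimates)) / total


# Hypothetical elicitation: three experts estimate a failure probability.
expert_estimates = [0.02, 0.03, 0.01]
expert_weights = [2.0, 1.0, 1.0]  # the first expert is trusted twice as much
print(linear_opinion_pool(expert_estimates, expert_weights))
```

Other pooling rules (for example logarithmic pools) weight disagreement differently; the linear pool is shown here only because it is the simplest to explain to a client.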
19 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Based on location, technologies or any previously identified components involved in the drug production process, the network model computes an estimate of the quality risk. Changing the way a drug is produced will (a priori) induce a change in the quality risk estimate.
Figure: network model output for two production alternatives
- Site A: quality failure 2.4%
- Site B: quality failure 1.9%
The estimates feed the optimization model.
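A minimal sketch of how a network of root causes can turn link strengths into a quality-risk estimate. The actual network model is not described in the slides; the noisy-OR combination rule used here is a swapped-in simplification, and the cause names and link strengths are hypothetical.

```python
def noisy_or(active_causes, link_strength, leak=0.0):
    """Probability of a quality failure under a noisy-OR assumption.

    Each active root cause independently triggers a failure with
    probability link_strength[cause]; `leak` covers unmodelled causes.
    """
    p_no_failure = 1.0 - leak
    for cause in active_causes:
        p_no_failure *= 1.0 - link_strength[cause]
    return 1.0 - p_no_failure


# Hypothetical link strengths (conditional probabilities of failure).
link_strength = {
    "manual_process_control": 0.012,
    "old_technology": 0.008,
    "new_technology": 0.003,
    "high_dosage_variability": 0.004,
}

# Two hypothetical ways of producing the same drug.
current_setup = ["manual_process_control", "old_technology", "high_dosage_variability"]
modified_setup = ["manual_process_control", "new_technology", "high_dosage_variability"]

print(round(noisy_or(current_setup, link_strength), 4))
print(round(noisy_or(modified_setup, link_strength), 4))
```

Swapping one component of the production process changes the estimate, which is the behaviour the slide describes: changing the way a drug is produced induces a change in the quality risk estimate.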
20 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Risk Exposure
Performance measure for exposure to risk (insurance premium):
$\text{Revenue@risk}_{pe} = \text{Revenue}_{pe} \sum_{j} Risk_{pje}$
Risk value:
$Risk_{pje} = \sum_{k} Haz_{pjke} \, Y_{jke} \, X_{pje}$
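A worked example of the two quantities on this slide, read as Risk = sum over hazards k of Haz * Y * X and Revenue@risk = Revenue * sum over sites j of Risk, for a single period. The data are hypothetical, and the interpretation of X as "product p is made at site j" and Y as "hazard k is present at site j" is an assumption based on the slide labels.

```python
def risk(haz, y, x, p, j):
    """Risk of product p at site j: sum over hazards k of Haz * Y * X."""
    return x[p][j] * sum(haz[p][j][k] * y[j][k] for k in range(len(y[j])))


def revenue_at_risk(rev, haz, y, x, p):
    """Revenue of product p times its total risk over all sites j."""
    return rev[p] * sum(risk(haz, y, x, p, j) for j in range(len(x[p])))


# Hypothetical data: one product, two sites, two hazards per site.
rev = [1000.0]                            # revenue of product 0
haz = [[[0.01, 0.02], [0.005, 0.01]]]     # haz[p][j][k]
y = [[1, 1], [1, 0]]                      # y[j][k] = 1 if hazard k is present at site j

at_site_0 = revenue_at_risk(rev, haz, y, [[1, 0]], 0)
at_site_1 = revenue_at_risk(rev, haz, y, [[0, 1]], 0)
print(at_site_0, at_site_1)
```

Producing at the second site, where one hazard has already been removed, lowers the revenue at risk for the same revenue.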
21 Challenges in Business Analytics; An industrial research perspective 30 October 2008
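Formulation
MAX
$\sum_{p,e} \Big[ \sum_{j} Rev_{pe} X_{pje} - \sum_{j} C_{pje} X_{pje} - \sum_{j} CT_{pje} T_{pje} - \sum_{j} Rev_{pe} Risk_{pje} \Big] - \sum_{j,k,e} CR_{jke} R_{jke}$
ST
- State, Level: $\sum_{j} X_{pje} = 1 \;\; \forall p,e$, together with constraints on $\sum_{k} Y_{jke}$ that link the levels $Y_{jke}$ to the assignments $X_{pje}$ for all $j,e$
- Capacity: $\sum_{p} a_{pj} X_{pje} \le CAP_{je} \;\; \forall j,e$
- Risk: $Risk_{pje} = \sum_{k} Haz_{pjke} \, Y_{jke} \, X_{pje} \;\; \forall p,j,e$
- Transfer: constraints linking $T_{pje}$ to $X_{pje}$ and $X_{pj,e-1}$ for all $p,j,e$, with $\sum_{j} T_{pje}$ bounded by 1 for all $p,e$
- Remediation: constraints linking $R_{jke}$ to $Y_{jke}$ and $Y_{jk,e-1}$ for all $j,k,e$
- Domains: $Risk \ge 0$; $X, Y, T, R \in \{0,1\}$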
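To make the structure of the formulation concrete, here is a deliberately tiny, single-period sketch that keeps only the assignment, capacity, and risk terms (no transfer or remediation decisions) and solves it by brute-force enumeration instead of a MIP solver. All data and the function name `best_assignment` are hypothetical.

```python
from itertools import product


def best_assignment(rev, cost, risk, absorb, cap):
    """Assign each product to exactly one site, respecting site capacity,
    to maximise revenue - production cost - revenue at risk."""
    n_products, n_sites = len(rev), len(cap)
    best_value, best_plan = float("-inf"), None
    for plan in product(range(n_sites), repeat=n_products):
        # Capacity: total absorption on each site must not exceed its capacity.
        load = [0.0] * n_sites
        for p, j in enumerate(plan):
            load[j] += absorb[p][j]
        if any(load[j] > cap[j] for j in range(n_sites)):
            continue
        value = sum(
            rev[p] - cost[p][plan[p]] - rev[p] * risk[p][plan[p]]
            for p in range(n_products)
        )
        if value > best_value:
            best_value, best_plan = value, plan
    return best_plan, best_value


# Hypothetical data: two products, two sites, each site fits one product.
rev = [100.0, 80.0]
cost = [[10.0, 12.0], [8.0, 9.0]]      # cost[p][j]
risk = [[0.10, 0.02], [0.05, 0.01]]    # risk[p][j], already quantified
absorb = [[1.0, 1.0], [1.0, 1.0]]      # capacity absorption a[p][j]
cap = [1.0, 1.0]

plan, value = best_assignment(rev, cost, risk, absorb, cap)
print(plan, value)
```

Enumeration only works for toy sizes; with 17 product families, 14 technology platforms, and 10 periods the search space is far too large, which is why the project relied on a MIP formulation and an optimisation engine.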
22 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Problem characteristics
- 17 product families
- 3 systems plus one for subcontracting
- 14 technology platforms – most available in all systems
- Planning horizon: 5 years split into 10 periods of 6 months
- Restriction on rate of change
- Statistical analysis gave risk values based on historical data
- Revenue projections available
- Aggregated costs per scenario – later disaggregated
- Simulation of given scenarios (ending result); optimization
23 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Example Analysis
Optimal sequence of actions for minimizing risk exposure
Figure: schedule chart for 2005 to 2009 listing the work centers (Tech 1 to Tech 14), their current site, and the site assigned under Scenario X (Sites A, B, C, and D).
Calculate Business Exposure
$E(\text{exposure}) = \sum_{i} p_i \, rev_i$
$\mathrm{Var}(\text{exposure}) = \sum_{i} rev_i^2 \, p_i (1 - p_i)$
Figure: exposure distribution (values in thousands; mean = 7208; 90% of the mass between 5.7013 and 8.625, with 5% in each tail).
Scenario Comparison
Figure: Revenue@risk distributions (density versus Revenue@risk, 0 to 20000) for Scenario 1, Scenario 2, As Is, and Fully Opt.
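A small sketch of the business exposure calculation, assuming each product i fails independently with probability p_i and puts its revenue rev_i at risk. Under that assumption the expected exposure is the sum of p_i * rev_i and the variance is the sum of rev_i^2 * p_i * (1 - p_i). The numbers are hypothetical.

```python
import math


def exposure_moments(revenues, probs):
    """Mean and variance of total exposure for independent failure events."""
    if len(revenues) != len(probs):
        raise ValueError("need one probability per revenue")
    mean = sum(p * r for r, p in zip(revenues, probs))
    var = sum(r * r * p * (1.0 - p) for r, p in zip(revenues, probs))
    return mean, var


# Hypothetical portfolio: two products with their failure probabilities.
revenues = [1000.0, 2000.0]
probs = [0.10, 0.05]

mean, var = exposure_moments(revenues, probs)
print("expected exposure:", mean)
print("standard deviation:", math.sqrt(var))
```

Two scenarios with the same expected exposure can have very different variances, which is why the slide compares whole Revenue@risk distributions rather than single numbers.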
24 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Data Variability
The consideration of the frequency distribution of each solution is essential when there is uncertainty in the input parameters
Robust Optimization techniques to determine the solution that is optimal under most realizations of the uncertain parameters
Figure (shown in three build steps): Which solution / plan is better? Frequency distributions of Rev@Risk (0 to 16000) for Plan A and Plan B, annotated with the values 4300 and 3900 for the two plans, a threshold at 6200, and tail areas of 5% and 20%.
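The question on the slide can be made concrete by comparing two plans on both their average Rev@Risk and the share of outcomes above a threshold. The sample values below are invented for illustration (they are not the distributions in the figure); only the threshold of 6200 is taken from the slide.

```python
def summarize(samples, threshold):
    """Return (mean, fraction of samples strictly above the threshold)."""
    mean = sum(samples) / len(samples)
    tail = sum(1 for s in samples if s > threshold) / len(samples)
    return mean, tail


# Hypothetical simulated Rev@Risk outcomes for two plans.
plan_a = [3500, 3900, 4100, 4300, 4300, 4500, 4700, 4900, 5300, 5500]
plan_b = [1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6500, 7000]

mean_a, tail_a = summarize(plan_a, threshold=6200)
mean_b, tail_b = summarize(plan_b, threshold=6200)
print("Plan A: mean", mean_a, "share above 6200:", tail_a)
print("Plan B: mean", mean_b, "share above 6200:", tail_b)
```

In this made-up data the plan with the lower mean has the heavier tail, so ranking plans on the mean alone would hide the risk the client actually cares about.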
25 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Possible Approaches
Stochastic Programming
- Average value certainty equivalent
- Chance constraints (known CDF – continuous)
Robust Optimization
- Soyster (1973)
- Bertsimas & Sim (2004)
CVaR
- Rockafellar & Uryasev (2000)
26 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Modeling with the approach of Bertsimas and Sim
- Formulation adjusted to incorporate a constraint that could be violated with a certain probability:
$\max_{x \in X} \; E(\text{SecuredProfit}(x))$
$\text{s.t.} \;\; \text{Rev@Risk}(x) \le \text{critical value}$
$\sum_{s,t} x_{p,s,t,e} = 1 \;\; \forall p,e$
$x_{p,s,t,e} \in \{0,1\} \;\; \forall p,s,t,e$
- Uncertain parameters have unknown but symmetric distributions
- Uncertain parameters take values in bounded intervals:
$Haz_{s,t,e} \in [\, H_{s,t,e} - \hat{H}_{s,t,e} \,;\; H_{s,t,e} + \hat{H}_{s,t,e} \,]$
- Uncertain parameters are independent
27 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Bertsimas & Sim cont’d
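Provided guarantee:
$\Pr(\text{Rev@Risk}(x) > \text{critical value}) \le \frac{1}{2^n} \Big[ (1-\mu) \binom{n}{\lfloor \nu \rfloor} + \sum_{l=\lfloor \nu \rfloor + 1}^{n} \binom{n}{l} \Big]$
where $n$ is the total number of uncertain parameters, $\nu = (\Gamma + n)/2$, and $\mu = \nu - \lfloor \nu \rfloor$.
Example Values of Γ: (table on the slide)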
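The probability bound of Bertsimas and Sim can be evaluated directly. The sketch below computes B(n, Γ) = 2^(-n) [ (1 - μ) C(n, ⌊ν⌋) + Σ_{l=⌊ν⌋+1..n} C(n, l) ] with ν = (Γ + n)/2 and μ = ν - ⌊ν⌋, and searches for the smallest integer Γ that meets a target violation probability. The function names and the example target of 5% are illustrative choices, not values from the slides.

```python
from math import comb, floor


def bs_bound(n, gamma):
    """Bertsimas-Sim upper bound on the constraint violation probability."""
    if not 0 <= gamma <= n:
        raise ValueError("gamma must lie between 0 and n")
    nu = (gamma + n) / 2
    fl = floor(nu)
    mu = nu - fl
    tail = sum(comb(n, l) for l in range(fl + 1, n + 1))
    return ((1 - mu) * comb(n, fl) + tail) / 2**n


def smallest_gamma(n, eps):
    """Smallest integer gamma with bs_bound(n, gamma) <= eps, or None."""
    for gamma in range(n + 1):
        if bs_bound(n, gamma) <= eps:
            return gamma
    return None


for n in (10, 100, 300):
    print(n, smallest_gamma(n, 0.05))
```

Because the bound depends on n, the value of Γ needed for a given guarantee grows with the number of uncertain parameters counted, which matters for how conservative the resulting plan is.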
28 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Application to pharma example
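Robust Formulation:
$\max_{x \in X} \; E(\text{SecuredProfit}(x))$
$\text{s.t.} \;\; \sum_{p,s,t,e} Rev_{p,e} \, H_{s,t,e} \, X_{p,s,t,e} + \Gamma Z + \sum_{s,t,e} V_{s,t,e} \le \text{critical value}$
$Z + V_{s,t,e} \ge \hat{H}_{s,t,e} \, Y_{s,t,e} \;\; \forall s,t,e$
$-Y_{s,t,e} \le \sum_{p} Rev_{p,e} \, X_{p,s,t,e} \le Y_{s,t,e} \;\; \forall s,t,e$
$\sum_{s,t} X_{p,s,t,e} = 1 \;\; \forall p,e$
$X_{p,s,t,e} \in \{0,1\} \;\; \forall p,s,t,e$
$V_{s,t,e} \ge 0 \;\; \forall s,t,e, \quad Y_{s,t,e} \ge 0 \;\; \forall s,t,e, \quad Z \ge 0$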
29 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Results!
30 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Observations
- Provides solutions that are much more conservative than expected
- Plans satisfying the risk requirement are discarded by the optimization, giving suboptimal solutions
- Adjustments to the calculation of parameters to improve results
  - Extreme distribution
  - Consider the number of expected basic variables and not the total number of variables affected by uncertain coefficients, e.g. production of 100 products, 3 possible sites, uncertainty in the capacity absorption coefficient:
    - 300 possible variables (value of n)
    - each product can only be produced on one site, i.e. only 100 variables > 0 (use this value for the calculation of Γ)
- Quantity being bounded: $\Pr(\text{Rev@Risk}(x) > \text{critical value})$
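The effect of counting 100 instead of 300 uncertain variables can be estimated with the normal approximation that Bertsimas and Sim give for their bound, B(n, Γ) ≈ 1 - Φ((Γ - 1)/√n), which yields Γ ≈ 1 + Φ⁻¹(1 - ε)√n for a target violation probability ε. This is a sketch; the target of 5% is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def gamma_for(n, eps):
    """Approximate protection level gamma for n uncertain coefficients
    and target violation probability eps (normal approximation)."""
    return 1 + NormalDist().inv_cdf(1 - eps) * sqrt(n)


eps = 0.05  # hypothetical target
for n in (300, 100):
    print(n, round(gamma_for(n, eps), 2))
```

Using n = 100 gives a noticeably smaller Γ than n = 300 for the same target, so the robust solution protects against fewer simultaneous deviations and is less conservative.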
31 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Bertsimas & Sim approach
Production – Distribution: continuous variables
Pharma Regulatory Risk: binary variables
32 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Similar work
Robust Optimization Models for Network-based Resource Allocation Problems, Prof. Cynthia Barnhart, MIT
- Robust Aircraft Maintenance Routing (RAMR): minimizing expected propagated delay to get the schedule back on track (taken from Lan, Clarke and Barnhart, 2005)
  - Among multiple optimal solutions, no incentive to pick the solution with the “highest” slack
  - Relationship of Γ (per constraint) to overall solution robustness?
- Comparison with Chance Constrained Programming
  - Explicitly differentiates solutions with different levels of slack
- Extended Chance Constrained Programming
  - Introduce a “budget” constraint setting an upper bound on acceptable delay
  - Improved results
33 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Value at Risk & Conditional Value at Risk
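VaR
- Maximum loss with a specified confidence level
- Non sub-additive
- Non-convex
CVaR
- With $\alpha_\beta(x)$ the $\beta$-quantile of $f(x)$ ($\beta$ = 95%): $CVaR(x) = E[\, f(x) \mid f(x) > \alpha_\beta(x) \,]$
- Advantages: sub-additive, convex
Figure: density of f(x), marking the β-quantile α_β(x), the tail of probability 1−β, and CVaR(x).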
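Both measures are easy to estimate from a sample of losses. The sketch below takes VaR as the empirical β-quantile and CVaR as the average of the losses strictly above it, matching the definition E[f(x) | f(x) > α_β(x)] on the slide. The loss values are hypothetical.

```python
import math


def var_cvar(losses, beta):
    """Empirical Value at Risk and Conditional Value at Risk."""
    if not losses or not 0 < beta < 1:
        raise ValueError("need losses and a confidence level in (0, 1)")
    ordered = sorted(losses)
    k = math.ceil(beta * len(ordered))   # position of the beta-quantile
    var = ordered[k - 1]
    tail = [x for x in ordered if x > var]
    cvar = sum(tail) / len(tail) if tail else var
    return var, cvar


# Hypothetical losses: ten equally likely outcomes.
losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(var_cvar(losses, 0.8))
```

VaR only says where the tail starts; CVaR averages what happens inside the tail, which is why it distinguishes plans that VaR would rate as equal.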
34 Challenges in Business Analytics; An industrial research perspective 30 October 2008
CVaR in Math Programming: Linear Program with Random Coefficients
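Consider an LP where $\xi$, a coefficient vector, is random:
$\min \; c^T x \quad \text{s.t.} \quad \xi^T x \le b, \;\; Dx \ge d, \;\; x \ge 0$
CVaR can be used to ensure 'robustness' in the stochastic constraint:
$\min \; c^T x \quad \text{s.t.} \quad CVaR_\beta(\xi^T x) \le b, \;\; Dx \ge d, \;\; x \ge 0$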
35 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Reformulation of Rockafellar & Uryasev (2000)
Each CVaR expression, say a constraint:
$CVaR_\beta(\xi^T x) \le b$
is reformulated as:
$t + \frac{1}{N(1-\beta)} \sum_{i=1}^{N} [\, \xi_i^T x - t \,]^+ \le b$
where $\xi_i$ is the ith sample of the random vector (out of N total samples).
Linear reformulation:
$t + \frac{1}{N(1-\beta)} \sum_{i=1}^{N} z_i \le b, \quad z_i \ge \xi_i^T x - t, \quad z_i \ge 0$
- Good news: fully linear reformulation, so presentable to any LP solver
- Bad news: new variables and new constraints = O(number of samples)
  - Reformulated LP has grown in both dimensions!
  - Large number of CVaR constraints & large number of samples => very large LP
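The Rockafellar and Uryasev representation says that CVaR is the minimum over t of t + (1 / (N (1 - β))) Σ_i max(L_i - t, 0), where L_i are the sampled losses. The sketch below checks this numerically for a fixed decision, so no LP solver is needed; the auxiliary variables z_i of the linear reformulation correspond to the max(L_i - t, 0) terms. The sample values are hypothetical.

```python
def ru_objective(t, losses, beta):
    """Rockafellar-Uryasev function: t + mean excess over t, scaled by 1/(1-beta)."""
    n = len(losses)
    return t + sum(max(loss - t, 0.0) for loss in losses) / ((1 - beta) * n)


def cvar_ru(losses, beta):
    """Minimise the piecewise-linear convex RU function over t.

    The minimum is attained at one of the sample values (a breakpoint),
    so it is enough to evaluate the function there.
    """
    return min(ru_objective(t, losses, beta) for t in losses)


# Hypothetical losses of one fixed plan under eight equally likely samples.
losses = [1, 2, 3, 4, 5, 6, 7, 8]
beta = 0.75

print(cvar_ru(losses, beta))     # RU minimisation
print(sum([7, 8]) / 2)           # average of the worst 25% of samples
```

Each sample contributes one max term, which is why the linear reformulation needs one new variable and one new constraint per sample and grows with the number of scenarios.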
36 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Observations
Number of scenarios necessary for a good estimation of CVaR(α, x) and thus meaningful results?
>20,000 scenarios necessary for stable results (Künzi-Bay A. and J. Mayer. “Computational aspects of minimizing conditional value-at-risk”, NCCR FINRISK Working Paper No. 211, 2005).
Can handle up to 300 scenarios – computational complexity
37 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Recent advances in CVaR
- Künzi-Bay & Mayer (2006) presented a 2-stage interpretation if CVaR appears only in the objective function; their algorithm CVaRMin led to an order of magnitude speed-up
Interesting work on Robust Optimization techniques & CVaR
- David B. Brown, Duke University:
  - Risk and Robust Optimization (presentation)
  - Constructing uncertainty sets for robust linear optimization (with Bertsimas)
  - Theory and Applications of Robust Optimization (with Bertsimas and Caramanis)
- P. Huang and D. Subramanian. “Iterative Estimation Maximization for Stochastic Programs with Conditional-Value-at-Risk Constraints”. IBM Technical Report, RC24535, 2008
  - They exploit a different linear reformulation, motivated by order statistics, and this leads to a new and efficient algorithm.
  - The resulting algorithm addresses CVaR in both the objective function and the constraints.
38 Challenges in Business Analytics; An industrial research perspective 30 October 2008
The fastest traveling salesman solution*
*http://xkcd.com/
39 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Practical Analysis ….
Figure: Scenario Comparison, Revenue@risk distributions (density versus Revenue@risk, 0 to 20000) for Scenario 1, Scenario 2, As Is, and Fully Opt.
At the end the client wanted to simulate his various scenarios!
40 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Challenges in selling our work
- Need user friendly interfaces
- Difficulty in communicating with the client (we speak different languages)
- We often build the world’s most expensive calculators (i.e. our sophisticated tools are not used to their fullest capabilities)
- IBM specific: we have better acceptability internally
41 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Inventory Optimization Tool
Successful example of a tool developed over the course of 10 years
- Strong software architecture
- GUI
- Interactive optimization / simulation algorithms
- Adaptive modeling
- Continuously being evolved based on new client projects
42 Challenges in Business Analytics; An industrial research perspective 30 October 2008
43 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Figure: quantity (Qty, 0 to 800) per day from 02 Jan 1998 to 12 May 2000, with the forecast overlaid.
Moving Average of period 4, forecast accuracy: 0.456
Continuous forecasting & adaptive time bucketing
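A moving average of period 4, as named in the chart caption, forecasts each value as the mean of the four preceding observations. The sketch below shows only that forecasting step; the demand series is hypothetical, and the accuracy measure behind the 0.456 figure is not specified in the slides, so it is not reproduced here.

```python
def moving_average_forecast(series, period=4):
    """One-step-ahead forecasts: each forecast is the mean of the
    previous `period` observations."""
    if period <= 0:
        raise ValueError("period must be positive")
    forecasts = []
    for i in range(period, len(series)):
        window = series[i - period:i]
        forecasts.append(sum(window) / period)
    return forecasts


# Hypothetical demand quantities per time bucket.
demand = [10, 20, 30, 40, 50, 60]
print(moving_average_forecast(demand, period=4))
```

The first forecast is only available once `period` observations exist, so the output is shorter than the input by `period` entries.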
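Adaptive model: Inventory Optimization (DIOS)
Model adaptation
Figure: constraints over the variables $X_{ij}$ with coefficients $a_{ij}$ and right-hand side $d$ are adapted by adding terms in new variables $Y$ and a feasible set $F$ for $X$.
Change in parameters, franco limit, logistics requirements, …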
44 Challenges in Business Analytics; An industrial research perspective 30 October 2008
Example of MINLP and the use of parallel processing
Present a few slides showing some exploratory results in parallel processing of MINLP (with a bit of explanation of why the linear case does not work as well under parallel processing, while the nonlinear case works better!)
45 Challenges in Business Analytics; An industrial research perspective 30 October 2008