Simulation in Excel: Tricks, Trials & Trends
May 2012 (working draft)
Dennis Sweitzer, Ph.D.
www.Dennis-Sweitzer.com

Presented to:
- Delaware Chapter of the ASA
- American College of Radiology
- AstraZeneca Biostatistics Department

Abstract

Excel is a general-purpose spreadsheet that is widely used and understood, but rarely used by itself for simulations. However, the Data Table function in MS Excel can execute substantial simulations without cumbersome programming "tricks" or VBA coding. The result is an arbitrarily large results table in which each row is one iteration of the simulation and each column is a random variable generated in the simulation. A small number of additional probability functions are easily programmed in VBA to make Excel a general-purpose simulation package. Because VBA is interpreted, use of VBA functions can greatly limit the speed of a simulation. However, for simulations of small size and complexity, the ease and familiarity of working in Excel outweigh the disadvantage of speed. Examples from clinical trials will be used.

Finally, I discuss new methods to move simulations out of the black boxes and into the enterprise, based on work by Sam Savage. Simulation results (a SIP, or Stochastic Information Packet) from multiple platforms can be stored as XML strings (using the DIST standard) in a SLURP (Stochastic Library Unit with Relationships Preserved), and from there used for reports, planning, etc., or incorporated into other simulations.

Outline
- How to do Simulation in Excel
- Simulation Sample Spreadsheet
- Notes on using Inverse Probability Functions
- Validation, Verification, Sensitivity
- Probability Management in SIPS, SLURPS, & DIST
- Some Macros and VBA functions
- Clinical Trial Examples

Background
- Occasional need for simulations
- Excel is convenient, but does not explicitly support simulations
- Simulation usually requires VBA programming (so why not use R or SAS instead?)
  - Or commercial add-in programs (e.g., @Risk)
  - Or some academic add-ins
- Excel does have iterative calculations and Solver. Why not simulation?

Simulate what?
- Stochastic models
- Unknown parameters? -> Guesstimate a distribution
- Optimizing choices? -> Test each with simulations
- Sensitivity analysis: variations in inputs -> variations in outputs
  - 2 parameters: use a table
  - >2 parameters: simulate & compare variation

Excel: Pros
- Common language / common tools
  - Most people understand Excel
  - Many tools available in Excel
- Transparency: modeling assumptions can be specified, graphed, and debated
- What you see is what you get!
More hands on deck, more eyes on the prize:

  Statistician        | Team Member
  Initial model       | Explores & breaks model
  Repair & enhance    | Repeat until satisfied

MEGO

Excel: Cons
- Slower than SAS, S+, R, etc.
- Lacks some statistical/probability functions
  - Latest versions are a little better
  - Still need to add some VBA code
- Known bugs in statistical routines (often fixed)
- Tradeoff: quicker modifications vs. slower execution

Simple Solution: Data Tables

Excel Data Tables
- Creates a table of values of a function
- Each column is a random variable
- Leftmost column is used as an argument (unneeded for simulation)
- Data Table repeats the calculations for each row
- Each row is a simulation iteration

1. Create Simulation
- Create random variables using the inverse probability method:
  - For a random variable X with distribution function F(x), $F: \mathbb{R} \to [0,1]$
  - If U is random uniform on [0,1], then $X = F^{-1}(U)$ has distribution F
  - In Excel: U = Rand()

2. Align Random Variables
- Calculations can be anywhere in the spreadsheet
- Reference the variables in a row
- It is best to label the variables in the same way

3. Select Data Table
- Select the table region
  - 1st row is the random variables
  - 1st column is not used (can label iterations)
- From the toolbar: Data > Data Table (in Excel 2007 and later, under What-If Analysis)

4. Create Simulation Table
- Column input cell = upper left-hand corner of the table
- Row input cell = ignore
- OK -> populates the table (may have to manually recalculate)

5. Execute Simulation
- Iterative development
  - Simulation can be changed
  - Add reporting variables
- Recalculate to rerun (no need to use Data Table again, unless expanding)
- Hint: debug with a short table, expand for the final run

The End (of the key concepts)

Spreadsheet limitations
- Only simple data structures are available
  - Rows & columns; no lists & trees
- Discrete event simulations and complex algorithms: difficult
  - E.g., While or For loops
  - Can improvise (cumbersome, slow, buggy)
- Speed: slow
- Data storage: what-you-see-is-all-you-get

Tools: Excel Simulation Template
- Adds some missing random functions
- Adds some set-up macros
- Excel template & examples at: www.Dennis-Sweitzer.com

Macro SimulationSampler
To start a new simulation when you don't remember the names & parameters of the common random variables used in simulation:
- Run the macro SimulationSampler
- Copy, delete, and edit as needed
- Make sure all random values are referenced in the first row of the data table at the bottom

What the macro does:
- Creates a simulation with each of the common simulation functions
- Sets up a header row for the data table
- Sets up a place for statistics

Macro Simulate
- Highlight the row of random variables (1st row of the simulation table)
- Run the macro "Simulate", which prompts for the number of simulation iterations
  - The default number of iterations is 100
- Debug & develop (manually recalculate)
- Final run with >1000 iterations
- Visual Basic code is computationally intensive

Excel Random Variables
- Rand(): random uniform on [0,1]
- NormSInv(): inverse standard normal distribution
- CriticalBinomial(): inverse binomial distribution (the worksheet function is CRITBINOM, or BINOM.INV in Excel 2010 and later)
- LogNormInv(): inverse log-normal distribution (the worksheet function is LOGINV, or LOGNORM.INV in Excel 2010 and later)
  - Caveat: the parameters are the mean and SD after the log transformation

Erlang Distribution
- How long do you wait until you get a predetermined number of arrivals?
- Interarrival times are distributed IID exponential
- Erlang is Gamma with an integer shape parameter

Beta Distribution
Can use as:
- Distribution of a binomial probability
  - Range = [0,1]
- Generic bounded hump (vs. Normal as the generic unbounded hump)
  - Better behaved than a triangular distribution

Example #2: Problem
- Client: "Here's our plan."
- Simple spreadsheet calculation
- But it gives only the expected value, not the variability

Example #2: Simulation
- Time to 100th patient
- Patients arrive with IID exponential interarrival times
- Summary statistics of simulated values (below)
- Interpretation: under the assumptions, 90% of simulations required more than 4.4 months

Added VBA Functions
- Inverse functions needed for simulation
  - Poisson, Negative Binomial
- Interpolation from a table
  - Interpolate: 1- or 2-dimensional interpolation
- Convenience
  - Beta with mean, SD as parameters
  - Beta with high, low, and mode as parameters
  - Log Normal with mean, SD as parameters

Missing Statistical Functions: Inverse Distributions
- InvPoisson: Poisson
- InvPascal: Negative Binomial (how many failures before k successes)
- The Negative Binomial allows a continuous-valued (non-integer) parameter k; the integer-parameter version is often called the Pascal distribution

Example #3: Patients to Screen
- Expected enrollment rate = 75% ± 5% ~ Beta distribution
- # screen failures ~ Negative Binomial (Pascal)
  - Depends on the enrollment rate

Beta Distribution (2)
For convenience:
- Beta distribution given mean, SD
- Beta distribution given mean, SD, upper and lower bounds
- Beta distribution given mode, upper and lower bounds

Simulation from a Table
- Find the value in the 1st vector -> return the interpolated value from the 2nd vector
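The table lookup described above can be sketched outside Excel. The following is a minimal Python illustration, not the author's VBA Interpolate function: the names `interpolate` and `sample_from_table` and the table values are hypothetical, chosen only to show the idea. It locates a value in the first vector, linearly interpolates the matching value from the second, and uses that lookup for inverse-probability sampling, with a uniform draw standing in for Excel's Rand().

```python
import bisect
import random


def interpolate(x, xs, ys):
    """Find x in the sorted vector xs and return the linearly
    interpolated value from the matching positions of ys.
    Values outside the range of xs are clamped to the end points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)  # now xs[i-1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def sample_from_table(probs, quantiles, rng=random):
    """Inverse-probability sampling from a tabulated distribution:
    draw U uniform on [0,1) and interpolate the quantile at U."""
    u = rng.random()
    return interpolate(u, probs, quantiles)


# Hypothetical table (an assumption for illustration only):
# cumulative probabilities and the corresponding quantiles.
probs = [0.0, 0.25, 0.5, 0.75, 1.0]
quantiles = [1.0, 2.0, 2.5, 3.5, 6.0]

rng = random.Random(2012)  # fixed seed so reruns repeat the same draws
draws = [sample_from_table(probs, quantiles, rng) for _ in range(1000)]
print(min(draws), max(draws))
```

Each element of `draws` plays the role of one row of the simulation table. Clamping outside the table range is a design choice of this sketch; a spreadsheet implementation might return an error instead.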

Simulate an arbitrary distribution:
- Top row: values in [0,1]
- Bottom row: quantiles
- Result: interpolated value of U from the table

Or a function, y = f(x):
- x is found in the top row; y is interpolated from the bottom row

Table Simulation Uses
- Polygonal distributions (like triangular)
- Survival curve (for time to event)
  - Estimate the Kaplan-Meier (K-M) curve from data, simulate the rest of the trial
- Arbitrary empirical distributions
  - Distribution from observations
- Table of power calculations, e.g., assurance calculations:
  - If # patients is random, so is the effective power of the study
  - If the true effect size is random, so is Pr{success}

Simulation from a 2-dimensional table
Here:
- Rows are quartiles of a random function
- Left column is the value of a parameter
- A family of distributions which vary with the parameter
- Parameter y = 75% (can be random)
- Generate random numbers from the interpolated distribution

Example #4: Interim Review
- After 2 months, review randomization rates
- Continue to randomize to 100 patients
- How long?

Example #4: Interim Review (Simulation)
- Y = # patients at 2 months ~ Poisson
- Time to randomize (100 - Y) additional patients ~ Erlang (Gamma)
- 80% CI: (2.5, 3.7) months

Clinical Trials Applications
- Simulations for planning
- Prototyping larger simulation
- Checking assumptions/validation

Planning
- Expected trial performance
  - Usually not of interest (already done without simulation)
  - But should be
- Variability of trial performance
  - Important for risk management: What's the earliest, the latest, the most, the least, etc.
  - 80% CIs
- Structural problems
  - Interactions of parameters may doom the trial before it even starts!
  - E.g., mean(max{X, Y}) vs. max{mean(X), mean(Y)}: the Flaw of Averages!

Prototyping
- Prototyping: toy simulation with hands-on teamwork
- Development model
  - Get team buy-in on assumptions
  - Processing speed is not important
  - Rapid modifications are important
- Ideal?
  - Develop a prototype in a 1-hour meeting
  - Check for errors later
  - Run large simulations later for precise estimates

Checking planning assumptions
- H0 = simulation assumptions
- Observed: a value X
- {x_i} = corresponding values in the simulation
- Rank of X in {x_i} -> p-value
  - Stored values: use the Percent Rank function (PERCENTRANK)
  - Descriptive statistics: use a frequency count
- Use to: test assumptions, validate the model, +??
- If an observed value of X is rare in the simulation, question the assumptions!

Checking Assumptions
Example: A trial is designed based on a non-trivial simulation.
- The model predicts a completion rate of 65% with 95% C.I. = (55%, 75%)
- 4 months into the trial, a 50% completion rate is observed. How significant is this discrepancy?
- Resimulate: {x_i} = simulated completion rates (1 per iteration)
- Percentile rank of the observed 50% in the simulated {x_i} -> p-value
- How likely is the observation, under the modeled assumptions?

Checking Assumptions in Literature
- Idea: Are the assumptions consistent with observations from other sources?
- Build the simulation to also simulate studies from the literature
  - Common assumptions & code
  - Code & assumptions for the study
  - More code & assumptions to match the literature
- Compare vs. literature
- Compare vs. internal assumptions

Sensitivity Analysis
- What-ifs
- Interactions between parameters -> identify key control points
- Vary parameters between simulations
- Compare simulation results
  - E.g., average, worst-case scenarios
- Correlations between simulated parameters and outcomes

Macro Management
- VBA Editor: Alt-F11 (or find the menu)
- Copy a module between sheets
  - Copy code from the .xls sheet & insert into the VBA editor
  - Open & save as a new sheet

Macro Management (newer)
In Visual Basic, from the toolbar:
- File > Export File
  - Exports VBA code (module: SweitzerSimulationCoreCode)
- File > Import File
  - Imports VBA code (into a module)

Further resources
Commercial and free software packages provide:
- More rigorous algorithms
- More functions
- Resampling, multivariate, etc.
- More support

Commercial Add-Ins
- @RISK: www.palisade.com
- Crystal Ball: www.decisioneering.com

Free Add-Ins
- PopTools (Windows only): www.cse.csiro.au/poptools
- SimTools.xla (Macintosh & Windows): http://home.uchicago.edu/~rmyerson/addins.htm
- Caveat: licensing
  - Free for non-commercial use (e.g., education)
  - Not clear for other uses
  - (NB: VBA code from my website is free for all use, but not as useful)

Semi-Commercial
- Low-cost Excel simulation add-in: RiskSim by Michael Middleton, www.treeplan.com/
- Also: decision trees, sensitivity analysis, on-line textbook: http://www.treeplan.com/chapters.htm

Additional Reading
- Introduction to Modeling and Generating Probabilistic Input Processes for Simulation: www.informs-sim.org/wsc07papers/008.pdf
- Spreadsheet Simulation (Seila, 2006): www.informs-sim.org/wsc06papers/002.pdf
- Work Smarter, Not Harder: Guidelines for Designing Simulation Experiments: www.informs-sim.org/wsc06papers/005.pdf
- Tips for the Successful Practice of Simulation: www.informs-sim.org/wsc06papers/007.pdf

The End (actual, not simulated)
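The percentile-rank check from the Checking Assumptions example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trial size of 100 patients and the simple binomial completion model are hypothetical stand-ins for the "non-trivial simulation" in the example, and `percent_rank` here is a plain empirical CDF rather than Excel's exact PERCENTRANK formula.

```python
import random


def percent_rank(simulated, observed):
    """Share of simulated values at or below the observed value
    (an empirical CDF, used here as a one-sided p-value)."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    return sum(1 for x in simulated if x <= observed) / len(simulated)


def simulate_completion_rate(n_patients, p_complete, rng):
    """One simulation iteration: the fraction of n_patients who complete,
    each completing independently with probability p_complete."""
    completed = sum(1 for _ in range(n_patients) if rng.random() < p_complete)
    return completed / n_patients


# Assumed inputs (hypothetical): 100 patients, modeled completion rate 65%.
rng = random.Random(2012)  # fixed seed so reruns repeat the same draws
simulated_rates = [simulate_completion_rate(100, 0.65, rng) for _ in range(2000)]

observed_rate = 0.50  # the completion rate observed 4 months into the trial
p_value = percent_rank(simulated_rates, observed_rate)
print(f"Share of simulations at or below {observed_rate:.0%}: {p_value:.4f}")
```

A small share means the observation is rare under the modeled assumptions, which is the cue to question those assumptions.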