faster computation for hierarchical bayesian models with ... · 23/09/2019  · rcpp packages i...

26
Faster Computation for Hierarchical Bayesian Models with Rcpp Packages Lu Chen *,# , Balgobin Nandram + , Nathan B. Cruze # * National Institute of Statistical Sciences (NISS) + Worcester Polytechnic Institute # United States Department of Agriculture National Agricultural Statistics Service (NASS) GASP II Washington, DC September 23, 2019 ... providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1

Upload: others

Post on 07-Aug-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Faster Computation for Hierarchical BayesianModels with Rcpp Packages

Lu Chen∗,#, Balgobin Nandram+, Nathan B. Cruze#

∗National Institute of Statistical Sciences (NISS)

+Worcester Polytechnic Institute

#United States Department of AgricultureNational Agricultural Statistics Service (NASS)

GASP IIWashington, DC

September 23, 2019

“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1

Page 2: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Disclaimer and Acknowledgment

The findings and conclusions in this presentation are thoseof the authors and should not be construed to represent anyofficial USDA or US Government determination or policy.

I This research was supported in part by the intramural researchprogram of the U.S. Department of Agriculture, NationalAgriculture Statistics Service.

GASP–Faster Computation with Rcpp Packages 2

Page 3: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Outline

Motivation

Introduction of Rcpp Packages

Examples

Conclusion

GASP–Faster Computation with Rcpp Packages 3

Page 4: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Motivation

”Sometimes R code just isn’t fast enough.“

– Hadley Wickham

I Problem: How to combine survey and auxiliary data toimprove the county-level estimates for crops?

I Application: Bayesian small area models

I Potential bottlenecks of R code: subsequent iterations,repeatedly calling functions, loops in Markov chain MonteCarlo (MCMC) algorithms

I One solution: rewriting key functions in C++ through Rcpppackages

GASP–Faster Computation with Rcpp Packages 4

Page 5: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Rcpp Packages

I Rcpp is a R package to extend R with C++ codes developedby Dirk Eddelbuettel and Romain Francois (2013) .

I SpeedI New Things

I RcppArmadillo is a Rcpp extension package that provides allthe functionality of Armadillo, focusing on matrix math.

I Easy-to-useI Further speedup

GASP–Faster Computation with Rcpp Packages 5

Page 6: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Using sourceCpp() in R

I The Rcpp::sourceCpp function parses the C++ file (.cpp) andmakes C++ functions available as R functions.

Example: Calculate mean of xi , i = 1, . . . , 105, where xi ∼ U(0, 1).

Page 7: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Using sourceCpp() in R

I The Rcpp::sourceCpp function parses the C++ file (.cpp) andmakes C++ functions available as R functions.

Example: Calculate mean of xi , i = 1, . . . , 105, where xi ∼ U(0, 1).

Page 8: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Comparison

name min mean median max times

Rcpp 105.432 151.448 109.209 228.249 100

R 210.845 237.125 235.366 396.785 100

Page 9: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

MCMC in Bayesian Computation

I MCMC is a sampling method to draw random samples fromdistributions.

I Each random sample is used as a stepping stone to generatethe next one (chains).

I Gibbs sampler, Metropolis-Hastings sampler and many othersare widely used in Bayesian inference.

I Involve loops and calling functions repeatedly within loops.

Rcpp (C++) and RcppArmadillo are useful tools for efficientMCMC computation.

GASP–Faster Computation with Rcpp Packages 9

Page 10: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Simulated Data

Data: simulated planted acres data inIllinois (Nandram et al., 2019 andBattese et al., 1988)

I Survey estimates θi , i = 1, . . . , 102

I Survey standard errorsσi , i = 1, . . . , 102

I Covariates: corn and soybeanplanted acres from landobservatory satellites (LANDSAT)

GASP–Faster Computation with Rcpp Packages 10

Page 11: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Fay-Herriot ModelFay-Herriot Model (1979) in small area estimation:

θi |θiind∼ N(θi , σ

2i ),

θi |β, δ2ind∼ N(x′iβ, δ

2), i = 1, . . . , n,

Priors for the parameters: π(β) ∝ 1;π(δ2) ∝ 1δ2.

The full conditional distributions for Gibbs sampling are:

1. β|δ2 ∼ MVN(

(∑n

i=1 xix′i )−1(∑n

i=1 x′iβ), δ2(

∑ni=1 xix

′i )−1),

2. θi |β, δ2ind∼ N(λi θi + (1− λi )x′iβ, (1− λi )δ2), λi = δ2

δ2+σ2i,

3. δ2|θ,β ∼ IG(n−12 , 12

∑ni=1(θi − x′iβ)2

).

GASP–Faster Computation with Rcpp Packages 11

Page 12: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Comparison

12,000 iterations with 2,000 burn-in and pick every 10th sample

name min mean median max times

Rcpp 94.863 97.285 96.621 106.362 100

R 4707.729 4856.196 4829.291 4912.025 100

Page 13: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Comparison - R density plots

GASP–Faster Computation with Rcpp Packages 13

Page 14: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Comparison - Rcpp density plots

GASP–Faster Computation with Rcpp Packages 14

Page 15: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Fay-Herriot Model with Benchmarking Constraints

In some applications, we need to incorporate benchmarkingconstraints into the model. For example, the county-levelestimates should be summed to state target and they need tocover certain values. The model with inequality constraints:

θi |θiind∼ N(θi , σ

2i ), i = 1, . . . , n,

θi |β, δ2ind∼ N(x′iβ, δ

2), θi ≥ ci ,n∑

i=1

θi ≤ a,

where C = (c1, . . . , cn)′ are known and fixed and a is state target.The priors are π(β) ∝ 1; δ2 ∝ 1

(1+δ2)2.

GASP–Faster Computation with Rcpp Packages 15

Page 16: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Joint Posterior Distribution

The posterior density is

π(θ,β, δ2|θ, σ2) =

∏ni=1 φ((θi − X′β)/δ)φ((θi − θi )/σi )∫θ∈V

∏ni=1 φ((θi − X′β)/δ)dθ

, θ ∈ V ,

where φ(·) is the standard normal density and the support of θ is

V ={ci ≤ θi ,

n∑i=1

θi ≤ a, i = 1, . . . , n}.

Awkward joint posterior distribution and intractable.

GASP–Faster Computation with Rcpp Packages 16

Page 17: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Computation

Our strategy:

π(θ,β, δ2|θ, σ2) = π(β, δ2|θ, σ2) × π(θ|β, δ2, θ, σ2)

I Metropolis-Hastings Sampler

I Gibbs Sampler

GASP–Faster Computation with Rcpp Packages 17

Page 18: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Computation

Our strategy:

π(θ,β, δ2|θ, σ2) = π(β, δ2|θ, σ2) × π(θ|β, δ2, θ, σ2)

I Metropolis-Hastings Sampler

I Gibbs Sampler

GASP–Faster Computation with Rcpp Packages 17

Page 19: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Metropolis-Hastings SamplerWe will draw (β, δ2) samples from π(β, δ2|θ, σ2). The proposaldensity is

(β, log(δ2)) ∼ MVN(βp, σ2Σp)

ν/σ2 ∼ Γ(ν/2, 1/2)

Bottleneck:For each iteration h:

I Generate: Generate a candidate (βc , log(δ2)c) from proposaldensity;

I Calculate: Calculate the acceptance ratioα = π(βc , log(δ2)c)/π(β(h), log(δ2)(h));

I Accept or Reject candidate based on the comparisonbetween α and u ∼ U(0, 1).

GASP–Faster Computation with Rcpp Packages 18

Page 20: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Comparison

10,000 iterations with 2,000 burn-in and pick every 8th sample

name min mean median max times

Rcpp 69.527 70.554 70.711 71.073 10

R 2451.688 2540.393 2519.215 2741.561 10

Page 21: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Gibbs Sampler for θ

The conditional posterior density of θi is

θi |θ(i),β, δ2, θ, σ2 ∼ N(µi , φi

), θi ∈ Vi ,

where µi and φi related to β and δ2 and

Vi ={ci ≤ θi ≤ a−

n−1∑j=1,j 6=i

θj

}, i = 1, . . . , n.

Bottleneck:Each θi related to other θs based on the Vi .For one iteration, we need to loop n times.

GASP–Faster Computation with Rcpp Packages 20

Page 22: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Comparison

name min mean median max times

Rcpp 2.130 2.162 2.153 2.213 10

R 55.934 56.615 56.696 57.134 10

Page 23: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Convergence Diagnostics

I M-H: 1,000 samples of (β(h), δ2(h)), h = 1, . . . , 1000.

I Gibbs: for each (β(h), δ2(h)), we run 100 times Gibbs samplerand pick the last set of θ.

pm psd lb ub gewe.pval ess

β0 116.22 2.01 112.43 120.06 0.45 909β1 16.74 4.58 7.75 25.29 0.40 870β2 -4.81 4.23 -13.14 3.35 0.24 968δ2 325.96 60.78 206.98 469.46 0.64 827

I Rcpp vs R code: 72s vs 2576s for 102 samples size in theconstraint Bayesian model.

GASP–Faster Computation with Rcpp Packages 22

Page 24: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Conclusion

I Rcpp functions can reduce the running time by a significantfactor and reasonable in further production for county-levelestimates in NASS.

I Large data set or complicated hierarchical Bayesian models:Rcpp packages

I Pros: incorporating C++ code into R workflow easily;substantially speed up MCMC computation in R

I Cons: learning curve and long coding time

I Small data set or simple, classic Bayesian models: such asRJags and RStan

I Pros: Easy-to-use; less coding timeI Cons: Black-box sampler; not for non-standard problems

GASP–Faster Computation with Rcpp Packages 23

Page 25: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Reference[1] H. Wickham, Advanced r. Chapman and Hall/CRC, 2014.

[2] D. Eddelbuettel and R. Francois, “Rcpp: Seamless R and C++ integration,”Journal of Statistical Software, vol. 40, no. 8, pp. 1–18, 2011.

[3] D. Eddelbuettel and C. Sanderson, “Rcpparmadillo: Accelerating r withhigh-performance c++ linear algebra,” Computational Statistics and DataAnalysis, vol. 71, pp. 1054–1063, March 2014.

[4] B. Nandram, A. L. Erciulescu, and N. B. Cruze, “Bayesian benchmarking of thefay-herriot model using random deletion,” Survey Methodology 45-2, vol. 45,p. 365, 2019.

[5] G. E. Battese, R. M. Harter, and W. A. Fuller, “An error-components model forprediction of county crop areas using survey and satellite data,” Journal of theAmerican Statistical Association, vol. 83, no. 401, pp. 28–36, 1988.

[6] R. E. Fay III and R. A. Herriot, “Estimates of income for small places: anapplication of james-stein procedures to census data,” Journal of the AmericanStatistical Association, vol. 74, no. 366a, pp. 269–277, 1979.

GASP–Faster Computation with Rcpp Packages 24

Page 26: Faster Computation for Hierarchical Bayesian Models with ... · 23/09/2019  · Rcpp Packages I Rcpp is a R package to extend R with C++ codes developed by Dirk Eddelbuettel and Romain

Thank [email protected]

GASP–Faster Computation with Rcpp Packages 25