Control Functionals for Monte Carlo Integration

In a recent paper, Chris Oates, Mark Girolami and Nicolas Chopin introduced a new class of estimators for Monte Carlo integration that leverage gradient information on the sampling distribution in order to improve performance. The proposed estimators, called ''control functionals'', achieve super-root- $n$ convergence and often require orders of magnitude fewer simulations, compared with existing approaches, in order to achieve a fixed level of precision.

Summary of the Paper

Problem

Monte Carlo integation attempts to estimate an expectation $\mu := \mathbb{E}[f(X)] = \int f(x) \pi(x) dx$

of a function $f:\mathcal{X}\rightarrow\mathbb{R}$ of a random variable $X\in\mathbb{R}^d$ with density $\pi$, based on a collection of independent realisations $\mathcal{D} = \{x_i\}_{i=1}^n$ of $X$. The most basic solution to this problem is to construct an ''arithmetic mean'' estimator $\overline{\mu}_n(\mathcal{D}) := \frac{1}{n} \sum_{i=1}^n f(x_i).$

Under the central limit theorem, the arithmetic mean $\overline{\mu}_n(\mathcal{D})$ converges to its expectation $\mu$ at a rate $O_P(n^{-1/2})$. However in many modern applications, e.g. involving complex computer models, root- $n$ convergence is simply too slow.

Solution

Our methodology, called ''control functionals'', proceeds on the premise that the score function $u(x) := \nabla_{x} \log \pi(x) \in \mathbb{R}^d$

exists and can, in principle, be evaluated at any given point $\bm{x} \in \mathcal{X}$. We will leverage the score to construct more efficient estimators for $\mu$.

Step 1: Begin by splitting samples $\mathcal{D} = \{\bm{x}_i\}_{i=1}^n$ into two disjoint subsets $\mathcal{D} = \mathcal{D}_0 \cup \mathcal{D}_1 = \{\bm{x}_i\}_{i=1}^m \cup \{\bm{x}_i\}_{i=m+1}^n$

where the size of both subsets is assumed to increase linearly as $n \rightarrow \infty$.

Step 2: The first subset $\mathcal{D}_0$ is used to estimate a surrogate function $\tilde{f}_{\mathcal{D}_0} : \mathcal{X} \rightarrow \mathbb{R}$, based on the gradient information contained in $\bm{u}$, such that $\tilde{f}_{\mathcal{D}_0}(\bm{X})$ shares the same expectation as $f(\bm{X})$ but has a variance that vanishes as $m \rightarrow \infty$. This step is discussed in detail in the paper and can be easily facilitated using techniques from non-parametric regression.

Step 3: Then the second subset $\mathcal{D}_1$ is used to evaluate an arithmetic mean $\hat{\mu}_{\mathcal{D}_0}(\mathcal{D}_1) := \frac{1}{n-m}\sum_{i=m+1}^n \tilde{f}_{\mathcal{D}_0}(\bm{x}_i)$

based on the values $\tilde{f}_{\mathcal{D}_0}(\bm{x}_i)$ where $i = m+1,\dots,n$.

Step 4: Any dependence on the split of the samples is then removed by averaging over many possible splits.

It is proven in the paper that the estimator that results from applying Steps 1-4 is (under weak regularity conditions) unbiased and achieves super-root- $n$ convergence.

Results

Consider the toy example where $f(x) = \sin(\pi x)$ and $X \sim N(0,1)$. Here we know from symmetry that $\mu = 0$. Applying the usual arithmetic mean estimator $\overline{\mu}_n(\mathcal{D})$ we obtain root- $n$ convergence: Now contrast with the control functional estimator. This particular implementation estimates the surrogate function $\tilde{f}_{\mathcal{D}_0}$ using techniques from Gaussian process (GP) regression: The performance of control functionals is so strong that we need to use a different scale on the y-axis in order to compare the estimators. Here, on the y-axis we plot the (estimated) estimator standard deviation, multiplied by $\sqrt{n}$, so that the familiar root- $n$ convergence is represented by a horizontal line: You can see here that control functionals achieve super-root- $n$ convergence. Here we also comare against "Riemann Sums" (DOI 10.1023/A:1008926514119) and "ZV Control Variates" (DOI 10.1007/s11222-012-9344-6). The difference between GP Control Functionals and GP Control Functionals (Simplified) is explained in the main text of the paper.

For further details, please refer to the full paper (link above), where you can also find applications of the control functional methodology to Bayes-Hermite quadrature, marginalisation in hierarchical models, and computation of normalising constants for non-linear differential equation models.

Acknowledgements

CJO was supported by EPSRC [EP/D002060/1]. MG was supported by EPSRC [EP/J016934/1], EU [EU/259348] and a Royal Society Wolfson Research Merit Award. NC was supported by the ANR (Agence Nationale de la Recherche) grant Labex ECODEC ANR [11-LABEX-0047]. 