Abstracts

Talk Abstracts

Nicolas Chopin (ENSAE) - "SMC: Motivation"
- Abstract: The first lecture of the SMC course will introduce:
  - state-space models and their applications
  - the sequential analysis of state-space models: filtering, prediction, smoothing, likelihood evaluation.
  - non-sequential problems that have a similar structure: data tempering, thermodynamic integration, ABC
Nicolas Chopin (ENSAE) - "SMC: Basics"
- Abstract: The second lecture of the SMC course will start with a friendly introduction to the Feynman-Kac formalism, and then a quick recap on importance sampling (see Art's talks). It will then introduce particle filters, in particular particle filters derived from a given state-space model (bootstrap, guided, auxiliary particle filters). If time permits, we will have a quick look at particle smoothing.
Nicolas Chopin (ENSAE) - "SMC: Estimation of state-space models, PMMC algorithms"
- Abstract: The third lecture of the SMC course will cover parameter estimation in state-space models:
  - MLE estimation
  - Bayesian estimation
- For Bayesian estimation, the focus will be on PMCMC (particle MCMC) algorithms.
Nicolas Chopin (ENSAE) - "SMC: Advanced topics"
- Abstract: The fourth lecture of the SMC course will cover more advanced topics such as:
  - SMC samplers: IBIS, tempering, ABC (SMC samplers are SMC algorithms that may be used to deal with non-sequential problems, or problems not directly related to state-space models)
  - SMC^2 (a pseudo-marginal SMC algorithm that makes it possible to perform sequential inference in state-space models)
  - SQMC: a QMC (quasi-Monte Carlo, see Art Owen's course) version of SQMC
Mike Giles (Oxford) - "Multi-Level Monte Carlo Methods"
- Abstract: In recent years there has been very substantial growth in stochastic modelling in many application areas, and this has led to much greater use of Monte Carlo methods to estimate expected values of output
  quantities from stochastic simulation. However, such calculations can be expensive when the cost of individual stochastic simulations is very high.
- Multilevel Monte Carlo greatly reduces the computational cost by performing most simulations with low accuracy at a correspondingly low cost, with relatively few being performed at high accuracy and a high cost. This talk reviews the key ideas behind the multilevel Monte Carlo method, with some simple applications being discussed to illustrate the flexibility and generality of the approach, and the challenges in its numerical analysis.
Mark Huber (Claremont-McKenna) -"Perfect Simulation: What, Why, and How"
- Abstract: This lecture will covers the basics of perfect simulation: what it is, why we might need it, and how to get started with it. We will begin with the first perfect simulation protocol, acceptance/rejection, to get better insight into the theory and then expand applications to nontrivial examples. This extension will be partially fueled by the Fundamental Theorem of Simulation. Also, we will look at robust integration methods for inference, beginning with a user-specified relative error algorithm for estimating the mean of a Bernoulli random variable.
Mark Huber (Claremont-McKenna) - "Perfect Simulation: Coupling from the past and the Fundamental Theorem"
- Abstract: Acceptance rejection applies in many situations, but can be frustratingly slow. Enter Coupling from the past, a protocol that turns any stationary update function into a method for perfectly sampling from the stationary distribution. Most commonly used with Markov chain, this was the method that first showed that it was possible to sample exactly from high dimensional distributions where the variables weakly interact. With both AR and CFTP under our belts, we can begin to say what makes a perfect simulation perfect. This idea is encapsulated in the Fundamental Theorem of Perfect Simulation.
Mark Huber (Claremont-McKenna) - "Perfect Simulation: Slicing and dominating our way to better CFTP"
- Abstract: For many statistical applications, CFTP cannot be used directly because the state space is continuous and unbounded. Perpetuities provide a nice example of these problems, but also can be used to illustrate the solutions. We will take a page from AR in dealing with the continuity problem, and the perfect slice sampler of Mira, M{\o}ller, and Roberts accomplishes the same function for much more general problems. By using a dominating chain, we can take care of the maximum element missing problem. After that we will discuss Markov chains and perfect simulators for point processes that have a density with respect to a Poisson point process.
Mark Huber (Claremont-McKenna) - "Perfect Integration: More accurate integration with a random number of samples"
- Abstract: The goal of most Monte Carlo algorithms, whether for statistical inference or approximation algorithms, is to find some sort of integral, usually in high dimensions. One of the strengths of Monte Carlo is that it is often possible to develop methods with provable bounds on the accuracy of the results. Any deterministic integrator can be tricked, a random algorithm cannot. In this last lecture of the perfect simulation series, we will look at several of these types of algorithms, The Gamma Bernoulli and Gamma Poisson approximation schemes, the Tootsie Pop Algorithm, and importance sampling with a well balanced cooling schedule.
Art Owen (Stanford) - "Foundations 1: Monte Carlo, what, why and how"
- Abstract: Monte Carlo sampling is based on simulations using random number generators. We use it to answer questions about systems that may or may not involve true randomness. We use it the most when the problem is so hard or has so much detail that more classical mathematical or even numerical approaches fail us. The problem usually comes down to estimating an integral or expectation. Monte Carlo excels when a problem has high dimensionality and little smoothness.
- This talk includes basic usages (seeds and streams) of random number generators (which we assume are good). It also includes strategies to turn uniform random numbers into desired non-uniform ones. Although most of the named distributions (Gaussian, binomial, Poisson et cetera) already have good samplers, the principles we use underly more advanced methods. Those principles are inversion, other transformations, acceptance rejection sampling and mixing.
Art Owen (Stanford) - "Foundations 2: Random vectors"
- Abstract: While almost any univariate random variable with a name can be sampled by tools built in to our computing environment (R, Matlab, python, et cetera), random vectors remain a challenge. The core of the challenge is that dependence is intrinsically multivariate and much harder to wrangle than are marginal distributions. Indeed, this challenge is the reason why we need Markov chain Monte Carlo and sequential Monte Carlo. The multivariate context brings us one new principled approach: copula-marginal sampling.
- There are a few very good tools to generate random vectors and we use and combine them whenever we can. We look at methods for Gaussian random variables, multinomial random variables, Dirichlet random variables, random permutations and related quantities We also look at a few random processes which are essentially infinite dimensional vectors: Poisson processes, random lines, stochastic differential equations and the Chinese restaurant process. It is often easy to induce a desired positive correlation structure. Negative dependences can be harder to do.
Art Owen (Stanford) - "Foundations 3: Variance reduction"
- Abstract: In probability, we study random variables defined in terms of a point in a sample space. In Monte Carlo sampling that point and space are under our control when we write our software. We can use some simple mathematics to replace our problem by a different one with the same answer, and sometimes with greatly reduced Monte Carlo variance.
- In antithetic sampling, we balance out samples that we expect to give high numbers with others that we expect to give low ones. In stratified sampling, we strive to get a more balanced representation of the sample space than we would get in purely random sampling. Importance sampling takes much the same approach but it is complicated enough to warrant its own lecture. Control variates let us make use of known integrals of some functions to better estimate an expectation we seek. We can also use conditional expectations to reduce variance by integrating out some part of the randomness. This talk also explores the method of common variates to synchronize computations and get more accurate comparisons.
Art Owen (Stanford) - "Foundations 4: Importance sampling"
- Abstract: Importance sampling (IS) is by far the most complicated of the variance reduction methods used in plain Monte Carlo sampling. It is aimed at problems defined by rare events or singular integrands, where special attention must be paid to points from an important region of low probability. We take extra data from the important region and then adjust away the bias. Importance sampling is sometimes also used just to mimic samples from an unnormalized distribution.
- Done well, IS can turn a problem from intractable to simple. Done poorly, IS can do the reverse, even producing infinite variance unnecessarily. The main ways to do IS are sampling from exponentially tilted distributions, sampling from Gaussian distributions tuned to a problem’s Hessian, and mixtures of such strategies. The talk will show a valuable connection between importance sampling and control variates. Adaptive importance sampling will be discussed, but only surveyed.
Art Owen (Stanford) - "Foundations 5: Introduction to MCMC"
- Abstract: Sometimes there is simply no known way to get a useful independent sample for our Monte Carlo. This problem commonly arises in the physical sciences (e.g., the hard disk problem) and also in Bayesian statistics. The next best thing is to run a Markov chain that converges to a stationary distribution matching our problem’s distribution. We get two new problems: the chain might not have gotten as close as we’d like to our target distribution, and the sampled values might be very highly correlated or completely miss parts of the space. Sometimes we can’t effectively do MCMC and then we fall back on various approximate MCMC methods.
- This talk briefly reviews some basics about Markov chains including the special property called detailed balance. Using detailed balance and acceptance-rejection sampling we derive the Metropolis-Hastings update. We look at random walk Metropolis and Gibbs sampling, as well as data analysis techniques, burn-in, thinning, batching and diagnostics. There is a brief mention of approximate Bayesian computation, Hamiltonian MCMC and variational Bayes. The latter two are featured in probabilistic programming languages Stan and Edward.
Art Owen (Stanford) - "QMC 1: Survey of QMC"
- Abstract: Monte Carlo sampling starts with points that (simulate) uniform draws from $(0,1)$ or $(0,1)^d$ . Those points will clump up in some places and leave holes in others. Quasi-Monte Carlo points are designed to be as uniform as possible, minimizing holes and clumps. Making the points more uniform can reduce the integration error. The usually attained rate is $\mathcal{O}(n^{-1+\epsilon})$ for any $\epsilon > 0$ compared to $\mathcal{O}_p(n^{-1/2})$ in MC. QMC does this by exploiting smoothness in the integrand, when it exists.
- This talk will present the basic theory of QMC: discrepancy, the Koksma-Hlawka bound, and some results on tractability. It covers the basic digital constructions of van der Corput, Halton, Sobol’, Faure and Niederreiter. Lattice rules will be covered but only briefly. QMC estimates are deterministic and their usual bounds are worst case, which can be very conservative. Randomizing the QMC points can give us operational error estimates while preserving their accuracy. Sometimes RQMC reduces the error rate to $\mathcal{O}_p(n^{-3/2+\epsilon})$ . The greatest gains from (R)QMC come for high dimensional integrands with a certain kind of low effective dimension.
Art Owen (Stanford) - "QMC 2: QMC for MCMC"
- Abstract: MCMC can vastly widen the set of problems doable by Monte Carlo. QMC sam- pling can greatly increase the quality of our Monte Carlo answers. It is natu- ral to then wonder whether and how to have both at once. To put QMC into MCMC we ideally need a single point uniformly distributed in $(0,1)^\infty$ . We ap- proximate this ideal by replacing a random number generator’s $u_1,u_2,\ldots{}$ by a completely uniformly distributed sequence. In such a sequence consecutive k-tuples $(u_i,u_{i+1},\ldots{},u_{i+k-1})\in(0,1)^k$ have low discrepancy, simultaneously for all k. Op- erationally, we find a smallish random number generator, which tries to get such balance for all small k, and then use all of it. The MCQMC approach ends up giving a non-Markov simulation of a Markov chain, which complicates analysis, much as adaptive MCMC complicates MCMC. The talk will give conditions under which MCQMC has been shown to at least be consistent. Empirically, MCQMC can show a better rate of convergence than MCMC, especially for the Gibbs sampler. Some theoretical support for a better rate exists in the thesis of Su Chen, but under very strong assumptions. This talk is based on joint work with Seth Tribble, Su Chen, Josef Dick, Makoto Matsumoto, and Takuji Nishimura. 3
Art Owen (Stanford) - "QMC 3: QMC outside the unit cube"
- Abstract: QMC is usually studied for n points nearly uniformly distributed in $(0,1)^d$ Lately there is increased interest in approximating more general distributions. This talk focusses on uniformly sampling the triangle, sphere, ball, simplex, hypersphere and Cartesian products of such sets. In Monte Carlo sampling we can apply transfor- mations to map the uniform distribution on $(0,1)^d$ to the uniform distribution on one of these sets. We can do that in QMC too, but the mappings may introduce Jacobians that are awkward for the smoothness that QMC uses. This talk will present the triangle discrepancy of Brandolini et al. (2013) and some recursive partitioning constructions that extend digital nets from the interval to the triangle or disk and to Cartesian products of spaces. The construction supports RQMC approaches. There is also a construction specific to the triangle that attains the known optimal rate for discrepancy there. Some standard volume preserving mappings used in MC are smooth enough to preserve the superior rate from QMC but not smooth enough for the superior accuracy rate that can come from RQMC. This talk is based on joint work with Kinjal Basu.
Art Owen (Stanford) - "QMC 4: Stolarsky invariance and gene set testing"
- Abstract: It is common for genomic data analysis to use p-values from a large number of random permutations. To adjust for multiple tests we might need to use a tiny p-value, and then we face a rare event problem. The permuted levels of a binary quantity, such as diseased versus healthy patients, have a regular lattice structure in the unit sphere. A permutation p-value is the fraction of those points inside a given set, here a high dimensional spherical cap. We can get a first approximation using the volume of that spherical cap. Doing so is the opposite of QMC: the exact answer is a discrete average that we approximate by a closed form integral. This talk will introduce a gene set testing motivated by Parkinson’s disease. It will show the formulation as an inverse QMC problem and then present Stolarsky’s invariance principle, a remarkably interesting geometric identity. Using Stolarsky and a further refinement, leads to a non-sampling approximation to the permuta- tion p-value with accuracy properties similar to the saddlepoint approximation. In simulations, the saddlepoint approximation looks a bit more accurate but the new method does better on some real Parkinson’s data gene sets. This talk is based on joint work with Hera He, Kinjal Basu, and Qingyuan Zhao.
Gareth Roberts (Warwick) - "Relaxing reversibility, the Zig-Zag algorithm, and MCMC for big data settings”
- Abstract: Traditional MCMC methods are based on simulating dynamics which obey detailed balance, thus creating reversible Markov chains. While this is mathematically and computationally convenient, non-reversible alternatives often have better convergence properties. The talk will discuss a new family of Monte Carlo methods based upon a multi-dimensional continuous-time Markov process known as the Zig-Zag process. The Zig-Zag and its close cousin the Bouncy Particle Sampler, are examples of piecewise deterministic Markov processes which can be readily simulated, and which offer a flexible non-reversible alternative to reversible MCMC methods. The dynamics of the Zig-Zag process correspond to a constant speed model, with the velocity of the process switching at events from a point process. The rate of this point process can be related to the invariant distribution of the process. If we wish to target a given posterior distribution, then rates need to be set equal to the gradient of the log of the posterior. The talk will discuss two generalisations which have good scaling properties for big data: firstly a sub-sampling version of the Zig-Zag process that is an example of an exact approximate scheme; and secondly a control-variate variant of the subsampling idea to reduce the variance of our unbiased estimator. Ergodic theory for the Zig-Zag will also be described.
Jeff Rosenthal (Toronto) - "Optimising and Adapting Metropolis Algorithms" short course (3 lectures)
- Abstract: We will discuss various aspects of the theoretical analysis of Markov chain Monte Carlo (MCMC) algorithms. Specific topics to be covered include quantitative convergence bounds, optimal proposal distributions, high-dimensional diffusion limits, computational complexity orders, and asymptotic validity of adaptive MCMC algorithms.