Skip to main content Skip to navigation

Optimal Transport and Machine Learning

Date: 23 January 2023

Venue: Room B3.02 (Zeeman Building)

Organisers: Florian Theil and Matthew Thorpe

Registration: LinkLink opens in a new window


1:00 - 1:45 Lisa Kreusser (Bath): Wasserstein GANs Work Because They Fail (to Approximate the Wasserstein Distance)

1:45 - 2:30 Matthew Thorpe (Manchester): Linearised Optimal Transport Distances

2:30 - 3:15 Yunan Yang (ETH):Efficient natural gradient method for large-scale optimization problems

3:15 - 4:00 Tea in the Mathematics Common Room

4:00 - 4:45 Marie-Therese Wolfram (Warwick): Inverse Optimal Transport

4:45 - 5:30 David Bourne (Edinburgh): Optimal transport and geometric modelling of polycrystalline materials

5:30 - 6:30 Drinks and nibbles in the Mathematics Common Room



We present a fast algorithm for generating Laguerre diagrams with cells of given volumes, which can be used for creating RVEs of polycrystalline materials for computational homogenisation, or for fitting Laguerre diagrams to EBSD or XRD measurements of metals. Given a list of desired cell volumes, we solve a convex optimisation problem to find a Laguerre diagram with cells of these volumes, up to any prescribed tolerance. The algorithm is built on tools from computational geometry and optimal transport theory which, as far as we are aware, have not been applied to microstructure modelling before. We illustrate the speed and accuracy of the algorithm by generating RVEs with user-defined volume distributions with up to 20,000 grains in 3D. We can achieve volume percentage errors of less than 1% in the order of minutes on a standard desktop PC. We also give examples of polydisperse microstructures with bands, clusters and size gradients, and of fitting a Laguerre diagram to 3D EBSD measurements of an IF steel.


Wasserstein GANs (WGANs) were originally motivated by the idea of minimising the Wasserstein distance between a real and a generated distribution. In this talk, we investigate differences between the theoretical setup and the reality of training WGANs. We point out that there are two notions of the Wasserstein distance implicitly used in the WGAN literature: the distributional and the batch Wasserstein distance. We gather theoretical and empirical evidence that the WGAN loss is not a meaningful approximation of neither the distributional nor the batch Wasserstein distance. Further, we argue that the batch Wasserstein distance is not even a desirable loss function for deep generative models. We conclude that the success of WGANs can be attributed to the failure to approximate the batch Wasserstein distance.


Optimal transport is a powerful tool for measuring the distances between signals and images. A common choice is to use the Wasserstein distance where one is required to treat the signal as a probability measure. This places restrictive conditions on the signals and although ad-hoc renormalisation can be applied to sets of unnormalised measures this can often dampen features of the signal. The second disadvantage is that despite recent advances, computing optimal transport distances for large sets is still difficult. In this talk I will extend the linearisation of optimal transport distances to the Hellinger--Kantorovich distance, which can be applied between any pair of non-negative measures, and the TLp distance, a version of optimal transport applicable to functions. Linearisation provides an embedding into a Euclidean space where the Euclidean distance in the embedded space is approximately the optimal transport distance in the original space. This method, in particular, allows for the application of off-the-shelf data analysis tools such as principal component analysis as well as reducing the number of optimal transport calculations from O(n^2) to O(n) in a data set of size n. I will touch on a range of applications such as data generation, classification of particle decays and colour transfer.


Discrete optimal transportation problems arise in various contexts in engineering, the sciences and the social sciences. Often the underlying cost criterion is unknown, or only partly known, and the observed optimal solutions are corrupted by noise. In this talk we propose a systematic approach to infer unknown costs from noisy observations of optimal transportation plans. The algorithm requires only the ability to solve the forward optimal transport problem, which is a linear program, and to generate random numbers. It has a Bayesian interpretation, and may also be viewed as a form of stochastic optimization.

We illustrate the developed methodologies using the example of international migration flows. Reported migration flow data captures (noisily) the number of individuals moving from one country to another in a given period of time. It can be interpreted as a noisy observation of an optimal transportation map, with costs related to the geographical position of countries. We use a graph-based formulation of the problem, with countries at the nodes of graphs and non-zero weighted adjacencies only on edges between countries which share a border. We use the proposed algorithm to estimate the weights, which represent cost of transition, and to quantify uncertainty in these weights.


First-order methods are workhorses for large-scale optimization problems, but they are often agnostic to the structural properties of the problem under consideration and suffer from slow convergence, being trapped in bad local minima, etc. Natural gradient descent is an acceleration technique in optimization that takes advantage of the problem's geometric structure and preconditions the objective function's gradient by a suitable "natural" metric. Despite its success in machine learning, the natural gradient descent method is far from a mainstream computational technique due to the computational complexity of calculating and inverting the preconditioning matrix. This work aims at a unified computational framework and streamlining the computation of a general natural gradient flow via efficient tools from numerical linear algebra. We obtain robust numerical methods for natural gradient flows without directly calculating, storing, or inverting the dense preconditioning matrix. We treat various natural gradients in a unified framework for any loss function.