# Machine Learning Research Group

This page is out of date. The Machine Learning Research Group is now run by Ayman Boustati and Jev Gamper, and you can visit the new web page here.

We meet informally every week to discuss Machine Learning topics over biscuits and tea. We alternate between **Speaker Series** and **CoreML** sessions. Participants come from a variety of disciplines so please feel free to come along!

The format of the **Speaker series** is one presenter per meeting who chooses the topic. The talk could be about anything Machine Learning related - your own research, an interesting ML paper, or some new exciting method.

In the weeks when the Speaker Series is not running, you can get your Machine Learning fix at the CoreML group, where we discuss the latest ideas within the Machine Learning community.

#### Previous Meetings:

#### Term 1

- 25/09/2017, Alejandra Avalos Pacheco

**Batch effect correction using Bayesian factor analysis and latent factor regression with sparse prior specification (local and non-local)**

Experimental variation, such as "batch effects", is present in many large datasets and adds difficulty to the task of integrating and summarising these data for exploratory analyses. In order to keep up with the large influx of biological data available thanks to high-throughput technologies, new dimensionality reduction techniques are needed for effective understanding of these vast amounts of information. There is a need for innovative methods that can integrate data from different batches while preventing these technical biases from dominating the results.

We provide a model based on factor analysis and latent factor regression, which incorporates a novel adjustment for the variance batch effects often observed in bioinformatics data. This model is extended by using different sparse priors (local and non-local). Finally, a toy example and a motivating case study based on ovarian cancer datasets are presented and discussed.

- 09/10/2017, Laura Guzman Rincon

**Shortening the number of questions in long psychological questionnaires**

For many companies, it is relevant to extract personal data from clients and transform it into valuable information for the company. In some cases, the data are collected through qualitative questionnaires answered directly by users and recorded in discrete or categorical form. However, it is expensive for clients to answer long questionnaires. In this context, the following question emerges: is it possible to ask only a subset of questions and infer the answers to the remaining ones? If N features have been collected from M users, the problem can be treated as an N-dimensional problem with M points. For our particular dataset, clustering methods have been applied, as well as random forests, SVMs and some simple neural networks, but the nature of the data limits the performance of those approaches. In this talk, the problem will be explained in detail, along with its main drawbacks. Which methods can be used to reduce the number of questions in this particular case?

- 23/10/2017, Bhavan Chahal

**Using Deep Learning To Infer House Prices From Google Street View Images**

House prices have been on the rise in the UK for a number of years and many researchers have attempted to predict them. With major improvements in computing power and technology, a rise in deep learning methods and their application to big data has been observed. Can the two be combined in a novel way? We first explore whether house prices in London in the years 2015 and 2016 can be estimated using Google Street View images and deep learning tools. We find that house prices can be inferred from Google Street View images, although predictions for areas with very high prices are less reliable. We also explore whether we can automatically identify areas which have experienced the highest relative increases in house price over the past ten years. Here we find that predictions are reasonable, but the model appears to mostly label areas which resemble the centre of London as areas in which house prices have risen the most, rather than capturing more nuanced patterns across the city. Analysis of changes in visual features across time may lead to better performance on this task. Our findings demonstrate the potential for large-scale image data combined with deep learning techniques to inform our understanding of the current economic state of local areas. Such novel, low-cost metrics may benefit a range of policy stakeholders, including national statistics agencies such as the Office for National Statistics.

- 06/11/2017, Elena Kochkina

**Breaking the rumour mill: Resolving rumour stance and veracity in social media conversations**

False information circulating on social media presents many risks, as social media is used as a source of news by many users. Detecting rumourous content is important to prevent the spread of false information, which can affect important decisions and stock markets. Rumour stance classification is considered to be an important step towards rumour verification, as claims that attract a lot of scepticism among users are more likely to be proven false later. Therefore, performing stance classification well is expected to be useful in debunking false rumours. In our work we classify a set of Twitter posts from conversations discussing rumours as either supporting, denying, questioning or commenting on the underlying rumours. We propose an LSTM-based sequential model that achieves state-of-the-art results on this task by modelling the conversational structure of tweets.

- 20/11/2017, Michael Pearce

**NP-hardness of Super Mario Bros**

I will first give a 5 minute presentation on my PhD work, briefly summarizing three problems we have considered in the field of black box optimisation.

Secondly, I will give an overview of my experience with the internship application process with DeepMind - dates, interviews, emails, conversations and the offer email.

Finally, for the main event, I will describe how to show that Super Mario Brothers, the classic 1985 Nintendo game, is NP-hard. It, along with many other games including Minesweeper and 'The Legend of Zelda', has the representational power to encode Boolean satisfiability problems; thus any algorithm that can solve an arbitrary level of Mario, Zelda or Minesweeper must also be able to solve Boolean satisfiability problems, which are NP-complete!

- 04/12/2017, Matt Neal

**Big (Digital) Brother: Why you should be more worried about Machine Learning than Killer Robots**

The prospect of true artificial intelligence in the near future has fired up the imagination of headline writers around the world. But beyond headlines like "Elon Musk’s Billion-Dollar Crusade to Stop the A.I. Apocalypse", a more prosaic but no less pressing problem is already affecting the world we live in. From enhanced surveillance techniques, to advert profiling, to re-identification of anonymised data, machine learning is racing ahead of legislation. All too often we only ask "should we do this?" once we have answered "can we do this?".

In this talk we will discuss the ways that machine learning has changed and is changing what we, our governments, corporations, and criminals are capable of, and what the ethical implications of these new capabilities are.

#### Term 2

- 12/01/2018, Henry Jia

**Generative Adversarial Networks**

Generative Adversarial Networks are currently the most powerful class of generative model for natural images. The model is formed of a generator and a discriminator which play a minimax game. The discriminator seeks to discriminate between real and fake images, whilst the generator seeks to fool the discriminator into believing its images are real. Both the generator and the discriminator are parameterised via convolutional neural networks and are trained via stochastic gradient descent. However, GANs are notoriously difficult to train, as the adversarial optimisation does not always converge. I will also discuss various methodologies people in the field use to attempt to stabilise GAN training.

I will base my talk on the original papers which proposed the GAN method:

https://papers.nips.cc/paper/5423-generative-adversarial-nets

https://arxiv.org/abs/1511.06434

If time permits, I will also talk briefly about the application of GANs to super-resolution problems.
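
To make the minimax game concrete, here is a minimal 1-D sketch in NumPy (a toy illustration, not code from the talk or the papers above): the "generator" is an affine map of Gaussian noise, the "discriminator" is a logistic classifier on scalars, and the two are updated with alternating manual gradient steps on the standard non-saturating losses.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Real data: N(4, 0.5). Generator g(z) = a*z + b with z ~ N(0, 1).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator D(x) = sigmoid(w*x + c)
lr, n = 0.05, 64

for step in range(2000):
    x = rng.normal(4.0, 0.5, n)           # real batch
    z = rng.normal(0.0, 1.0, n)
    g = a * z + b                         # fake batch

    # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
    grad_w = np.mean(-(1 - d_real) * x) + np.mean(d_fake * g)
    grad_c = np.mean(-(1 - d_real)) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step (non-saturating loss): minimise -log D(fake).
    d_fake = sigmoid(w * g + c)
    upstream = -(1 - d_fake) * w          # dL/dg via the chain rule
    a -= lr * np.mean(upstream * z)
    b -= lr * np.mean(upstream)

# After training, generated samples should drift towards the real mean of 4.
fake_mean = np.mean(a * rng.normal(0.0, 1.0, 1000) + b)
print(f"generated mean: {fake_mean:.2f} (real mean is 4.0)")
```

Real GANs parameterise both players with deep convolutional networks and train with optimisers such as Adam; this sketch only shows the structure of the alternating objectives.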

- 29/01/2018, François-Xavier Briol

**The Information Geometry of Maximum Mean Discrepancy Estimator**

Likelihood-based inference methods have several limitations: most notably they cannot handle generative models. In this work, we study alternative estimators based on minimising the square of the maximum mean discrepancy in some reproducing kernel Hilbert space. We make use of information geometry to provide theory on the asymptotic properties and robustness of these estimators, then propose a novel natural gradient algorithm for efficient implementation.
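
For reference, the squared maximum mean discrepancy between two samples can be estimated in a few lines; the sketch below is the generic biased (V-statistic) estimator with an RBF kernel, not code from the work presented.

```python
import numpy as np

def rbf_mmd2(x, y, lengthscale=1.0):
    """Biased squared-MMD estimate between 1-D samples x and y,
    using the RBF kernel k(a, b) = exp(-(a - b)^2 / (2 * lengthscale^2))."""
    def gram(a, b):
        d2 = (a[:, None] - b[None, :]) ** 2
        return np.exp(-d2 / (2 * lengthscale ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

rng = np.random.default_rng(0)
same = rbf_mmd2(rng.normal(0, 1, 200), rng.normal(0, 1, 200))
diff = rbf_mmd2(rng.normal(0, 1, 200), rng.normal(3, 1, 200))
print(same, diff)  # 'diff' should be much larger than 'same'
```

Minimum-MMD estimation fits a generative model by choosing parameters that make samples from the model minimise this quantity against the observed data.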

- 12/02/2018, Zhangdaihong Liu

**Machine Learning in Population Neuroimaging and Behaviour**

In this talk, I will describe applications of machine learning and statistical techniques, such as PCA and CCA, to finding links between brain connectivity and behavioural data. I will introduce a refined version of PCA, called Supervised Dimension Reduction (SDR), that can automatically estimate the dimension of the low-rank representation of the data. Another focus of the talk will be how these techniques can be applied to improve the interpretability of the results. I will finish the talk by addressing the challenges we face when working with real-world big health data.

- 26/02/2018, Xiaoyue Xi

**Extending Bayesian quadrature to the case of multiple integrals**

Bayesian quadrature was developed to address the uncertainty due to incomplete information about mathematical integration problems. In this talk, I will briefly review the Bayesian quadrature rule and discuss its extension to the evaluation of multiple related integrals. The efficiency of this extension will then be demonstrated through applications to multi-fidelity modelling and global illumination in computer graphics.
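
As background, here is a sketch of the basic single-integral Bayesian quadrature rule for an RBF kernel and a Gaussian integration measure, for which the kernel mean has a closed form; the node placement, lengthscale, and jitter below are illustrative choices, not details from the talk.

```python
import numpy as np

# Assumptions: RBF kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)) and
# integration measure pi = N(mu, s^2). The kernel mean
# z_i = integral of k(x, x_i) dpi(x) is then available in closed form.
l, mu, s = 1.0, 0.0, 1.0
x = np.linspace(-3.0, 3.0, 12)             # quadrature nodes
f = x ** 2                                  # integrand evaluated at the nodes

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * l ** 2))
z = np.sqrt(l ** 2 / (l ** 2 + s ** 2)) * np.exp(
    -(x - mu) ** 2 / (2 * (l ** 2 + s ** 2))
)

# BQ posterior mean of the integral: z^T K^{-1} f (jitter for stability).
weights = np.linalg.solve(K + 1e-6 * np.eye(len(x)), z)
estimate = weights @ f
print(estimate)  # true value of E[x^2] under N(0, 1) is 1
```

The multi-integral extension discussed in the talk shares information across several related integrands rather than treating each integral independently.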

- 12/03/2018, Bernardo Pérez Orozco

**Memory-based ordinal regression with recurrent neural networks**

Time series forecasting is a ubiquitous task. From energy forecasting to preventive medicine, decision makers often face the prospect of predicting a quantity ahead of time. To be of use, such predictions must be accompanied by accurate and realistic uncertainty bounds. Failure to do so could lead decision-makers to make the wrong move, risking anything from big sums of money to human life.

In this talk, we will introduce a recurrent neural network-based framework to perform autoregressive time series forecasting in an ordinal fashion. Allowing for ordinal autoregression enables our model to learn a nonparametric emission distribution, so it can approximate both Gaussian and non-Gaussian behaviour without any prior assumptions. Additionally, encoding observations in an ordinal manner allows noisy observations to be propagated naturally.

We show empirically that our model can match or improve on the state-of-the-art performance attained by models such as Gaussian process autoregression on tasks such as long-term forecasting and event occurrence timing.

#### Term 3

- 14/05/2018, Jev Gamper (University of Warwick)

**On importance of explicit assumptions, and benefits of generative thinking in medical image analysis - a case of digital histopathology**

In this talk I will outline some of the common problems tackled in digital pathology. In particular, using a generative point of view we will attempt to obtain insights about the nature of the problems at hand, and will speculate on the validity of the solutions and modelling approaches currently used in the field. An objective treatment of the assumptions behind the current state-of-the-art solutions and their implications is of crucial importance, as these methods are starting to be widely adopted in clinical applications.

- 21/05/2018, Jim Skinner (University of Warwick)

**The Zoo of Latent Variable Models**

There are very many unsupervised dimension reduction techniques (PCA, ICA, SPCA, etc.), and learning about them all is exhausting. Luckily they tend to fall under the category of "Latent Variable Model" (LVM), meaning there is a coherent framework that lets us think about all of these techniques at once, and the important ways in which they differ. I will be chatting about a few different unsupervised learning techniques and linking them together.

- 04/06/2018, Alessandra Tosi (Mind Foundry)

**Machine learning models for industrial internet-of-things data**

Our research goal is to fully automate the data science pipeline, in a principled way, integrating probabilistic reasoning and uncertainty estimation into the process. We will present the challenges of building a platform which enhances data, both structured and unstructured, with innovative machine learning algorithms.

We will give examples of commercial application domains and specific case studies where the full value of machine learning can be exploited to address operational challenges and service growth opportunities.

**About Mind Foundry:** Mind Foundry is a technology spin-out from the University of Oxford’s Machine Learning Research Group (MLRG), within Information Engineering in the Department of Engineering Science. The group pioneers probabilistic reasoning applied to problems in science, industry and commerce. Mind Foundry refines a foundation of principled academic research, developing new ML capabilities to solve real-world problems at scale and speed.

**Webpage**: https://mindfoundry.ai/

- 11/06/2018, Justina Zurauskiene (Centre for Computational Biology, University of Birmingham)

**Segmental Length Constraint Based Learning For Hidden Markov Model**

Hidden Markov models (HMMs) are widely used in the analysis of sequential data. At its simplest, an HMM can be understood as an extension of the standard mixture model in which the discrete latent variables are related via a Markov process. For a given sequence of observations it is usually of interest to learn the corresponding HMM and estimate the sequence of latent variables, which may have some practical interpretation. Statistical inference for an HMM is usually achieved by maximising the joint probability of parameters and data, while the reconstruction of latent states is obtained using the Viterbi algorithm.

In many HMM application areas, the estimates of the transition matrix probabilities can be affected by the distance between emission distributions. Component densities that are close to each other in emission space tend to produce reconstructed sequences of latent variables with transition errors between the corresponding states. In this work we present an approach for dealing with such transition errors. We propose to adjust the transition matrix probabilities by taking into account the distance between emission densities, measured using the Kullback–Leibler divergence, and demonstrate that sequences of a desired minimum length can be retained in the latent variables via the Viterbi algorithm. This constraint-based learning for latent variables may be of interest, for example, in genomic data analysis when modelling DNA copy number alterations, where specific segment-length constraints must be satisfied. We test this approach on simulated data examples.
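
For readers unfamiliar with the decoding step mentioned above, here is a generic Viterbi implementation in NumPy, together with the closed-form KL divergence between two univariate Gaussians (the kind of distance used to compare emission densities). This is standard textbook material, not the authors' constrained variant.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely state path; log_B[t, k] = log p(obs_t | state k)."""
    T, K = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: move i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def kl_gauss(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) in closed form."""
    return np.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

# Two states with well-separated Gaussian emissions N(0,1) and N(5,1).
obs = np.array([0.1, -0.2, 0.3, 5.1, 4.8, 5.3])
means = np.array([0.0, 5.0])
log_B = -0.5 * (obs[:, None] - means[None, :]) ** 2   # log-density up to a constant
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
print(viterbi(log_pi, log_A, log_B))  # expected: [0, 0, 0, 1, 1, 1]
```

When the emission means are close (small KL divergence), the decoded path starts to flip spuriously between states, which is the failure mode the constrained approach above is designed to suppress.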