Haoran Ni

My research interests lie at the intersection of the mathematics of information, machine learning, deep learning, and numerical analysis. Specifically, my work focuses on areas such as the numerical perspective of information measures, dimensionality reduction, optimal transport, machine learning algorithms, generative models, and other advanced deep neural network architectures.
Office: C0.06, Zeeman
Email: Forename(dot)Surname(at)Warwick(dot)ac(dot)uk
Current CAMaCS Projects:
- Agent-based Modelling on Simulating Human Mobility.
- PREEMPTR: Predictive Recommendation Engine for Early Event Monitoring, Proactive Testing and Result Interpretation.
Previous CAMaCS Projects:
- Performance-Efficiency Trade-off for Fashion Image Retrieval.
The fashion industry has been identified as a major contributor to waste and emissions, leading to an increased interest in promoting the second-hand market. Modern machine learning methods play an important role in facilitating the creation and expansion of second-hand marketplaces by enabling the large-scale valuation of used garments. We contribute to this line of work by addressing the scalability of second-hand image retrieval from databases. By introducing a selective representation framework, we can shrink databases by 10% of their original size without sacrificing retrieval accuracy. We first explore clustering and coreset selection methods to identify representative samples that capture the key features of each garment and its internal variability. Then, we introduce an efficient outlier removal method, based on a neighbour-homogeneity consistency score, that filters out uncharacteristic samples prior to selection. We evaluate our approach on three public datasets: DeepFashion Attribute, DeepFashion Con2Shop, and DeepFashion2. The results demonstrate a clear performance-efficiency trade-off obtained by strategically pruning and selecting representative image vectors. The retrieval system maintains near-optimal accuracy while greatly reducing computational costs, because fewer images are added to the vector database. Furthermore, applying our outlier removal method before clustering yields even higher retrieval performance, since non-discriminative samples are removed before the selection.
This project is supported by a £1 million UKRI grant and carried out in collaboration with Dr. Julio Hurtado and Truss. (Oct. - May 2025)
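To give a flavour of the outlier-removal idea, here is a minimal sketch of one possible neighbour-homogeneity score. It assumes the score of a sample is the fraction of its k nearest neighbours (in Euclidean distance) that share its label; this definition, the function names `homogeneity_scores` and `filter_outliers`, and the threshold are illustrative assumptions, not the exact measure used in the project.

```python
import math


def homogeneity_scores(points, labels, k=3):
    """Fraction of each sample's k nearest neighbours sharing its label.

    Hypothetical score definition, chosen for illustration only.
    """
    scores = []
    for i, p in enumerate(points):
        # Sort all other samples by Euclidean distance to p.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours = [j for _, j in others[:k]]
        same = sum(labels[j] == labels[i] for j in neighbours)
        scores.append(same / k)
    return scores


def filter_outliers(points, labels, k=3, threshold=0.5):
    """Return the indices of samples whose score reaches the threshold."""
    scores = homogeneity_scores(points, labels, k)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data: two well-separated classes plus one sample labelled "a"
# that sits in the middle of class "b".
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (10.5, 10.5)]
labels = ["a"] * 4 + ["b"] * 4 + ["a"]
kept = filter_outliers(points, labels)
```

In this toy example the mislabelled sample is the only one dropped; in practice the points would be image embedding vectors and a nearest-neighbour index would replace the quadratic search.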
Education:
- PhD in Mathematics for Real-world Systems at the MathSys CDT, University of Warwick, UK. (2020 - 2024)
Supervisor: Dr. Martin Lotz. Thesis: Numerical Estimation of Information Measures and Learning Generative Models.
- MSc in Mathematics for Real-world Systems at the MathSys CDT, University of Warwick, UK. (2019 - 2020)
- MSc in Modern Application of Mathematics at the Department of Mathematical Science, University of Bath, UK. (2017 - 2018)
- BSc in Mathematics and Applied Mathematics (Financial Mathematics) at the Department of Statistics and Mathematics, Central University of Finance and Economics, Beijing, China. (2013 - 2017)
Previous Other Projects:
- Graph representation learning at scale.
Graph representation learning aims to map complex graph (network) data into low-dimensional vector spaces while preserving the inherent structure of the graph, enabling machine learning tasks such as node classification, link prediction, and anomaly detection. The scalability of graph embeddings remains a challenge because of the dependencies between nodes and the corresponding feature vectors. One approach to address this issue is the Local2Global algorithm, in which the graph is split into overlapping patches for which local representations are learned independently. In a second step, the local representations are combined into a globally consistent representation by estimating, via group synchronization, the affine transformations that best align them. In this project, we develop and implement a new algorithm that simultaneously learns the patch embeddings, using Variational Graph Autoencoders (VGAE), and the affine transformations between overlapping patches. This new method is applied to anomaly detection in temporal Autonomous System graphs and fraud detection in cryptocurrency transaction networks, which are particularly challenging due to their size (for example, the Elliptic2 dataset for subgraph learning, with 50 million nodes and 200 million edges).
This project is in collaboration with Dr. Martin Lotz and Dr. Marco LaVecchia, as part of a project with the Alan Turing Institute, GCHQ and Oxford. (Mar. 2024 - Mar. 2025)
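The alignment step can be illustrated with a minimal sketch for a single pair of patches. It assumes the two patches share a set of nodes and that their embeddings differ only by an orthogonal transformation and a translation, which is then recovered by orthogonal Procrustes. The function name `align_patch` is hypothetical, and the sketch covers neither group synchronization over many patches nor the joint VGAE training described above.

```python
import numpy as np


def align_patch(source, target):
    """Find orthogonal R and translation t minimising ||source @ R + t - target||_F.

    source, target: (n_overlap, dim) embeddings of the same nodes in two patches.
    """
    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    # Orthogonal Procrustes: R = U V^T from the SVD of the cross-covariance.
    u, _, vt = np.linalg.svd((source - mu_s).T @ (target - mu_t))
    rotation = u @ vt
    translation = mu_t - mu_s @ rotation
    return rotation, translation


# Synthetic check: embed 20 overlap nodes in 4 dimensions, then apply a
# random orthogonal map and shift to mimic a second patch's coordinates.
rng = np.random.default_rng(0)
patch_a = rng.normal(size=(20, 4))
true_q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
true_shift = rng.normal(size=4)
patch_b = patch_a @ true_q + true_shift

rotation, translation = align_patch(patch_a, patch_b)
aligned = patch_a @ rotation + translation
```

With noiseless data the recovered transformation maps `patch_a` exactly onto `patch_b`; with learned embeddings the fit is only approximate, which is why a global synchronization step is needed to reconcile the pairwise estimates.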
- Stochastic Parareal: an application of probabilistic methods to time-parallelisation.
The project focuses on improving the rate of convergence (equivalently, the computational efficiency) of Parareal, a time-parallel algorithm that provides speed-up for a broad variety of initial value problems (IVPs), by applying stochastic methods. Certain classes of problems, such as the Brusselator equations and the Lorenz system, were investigated.
The idea of the stochastic approach is to generate, instead of a single deterministic solution at each time interval, M solutions from a probability distribution (referred to as the 'sampling rule'), and to piece together a continuous trajectory that minimises the errors at interval boundaries.
Our experiments showed that, as the number of samples M and the variance of the sampling rule increase, the proposed methods tend to beat deterministic Parareal with high probability. In chaotic systems such as the Lorenz system, our methods also showed the potential to reveal multiple numerical solutions caused by small perturbations.
This project is supervised by Dr. Massimiliano Tamborrino, Dr. Debasmita Samaddar and Dr. Lynton Appel, and supported by UKAEA. (Mar. - Jun. 2020)
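For context, the deterministic Parareal iteration that the stochastic variant builds on can be sketched as follows. This is a minimal illustration only: the explicit Euler coarse and fine solvers, the scalar test problem du/dt = -u, and the function names are assumptions made for the example, and the sampling step of Stochastic Parareal is not included.

```python
def euler(f, u, t0, t1, steps):
    """Integrate du/dt = f(t, u) from t0 to t1 with explicit Euler."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        u = u + h * f(t, u)
        t += h
    return u


def parareal(f, u0, t_end, n_intervals, n_iters, fine_steps=100):
    """Deterministic Parareal on [0, t_end] split into n_intervals slices."""
    ts = [t_end * n / n_intervals for n in range(n_intervals + 1)]

    def coarse(u, n):  # cheap solver: one Euler step per time slice
        return euler(f, u, ts[n], ts[n + 1], 1)

    def fine(u, n):  # accurate solver: many Euler steps per time slice
        return euler(f, u, ts[n], ts[n + 1], fine_steps)

    # Initial guess: a serial sweep with the coarse solver.
    sol = [u0]
    for n in range(n_intervals):
        sol.append(coarse(sol[n], n))

    for _ in range(n_iters):
        # The fine solves are independent, so they could run in parallel.
        fine_old = [fine(sol[n], n) for n in range(n_intervals)]
        coarse_old = [coarse(sol[n], n) for n in range(n_intervals)]
        new = [u0]
        for n in range(n_intervals):
            # Predictor-corrector update at each interval boundary.
            new.append(coarse(new[n], n) + fine_old[n] - coarse_old[n])
        sol = new
    return sol


decay = lambda t, u: -u
boundary_values = parareal(decay, 1.0, 2.0, n_intervals=4, n_iters=4)
```

After as many iterations as there are time slices, the Parareal solution coincides with the serial fine solution up to rounding; speed-up comes from converging in fewer iterations, which is the quantity the stochastic variant aims to reduce.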
- Research paper classification using neural networks.
The project focuses on classifying research papers by discipline using NLP techniques such as word embedding algorithms (word2vec & GloVe), convolutional neural networks and recurrent neural networks. Automatic hyperparameter optimisation algorithms, such as Bayesian optimization and the Tree-structured Parzen estimator, were implemented in the paper.
Although the training datasets are extremely small, owing to multiple difficulties in labelling research papers, the final model successfully classified more than 70000 research papers published by the Chinese Academy of Sciences (CAS). The average classification accuracy on the test datasets is over 90%.
This project is supported by Computer Network Information Center, CAS. (Jun. - Sep. 2019)
The project focuses on numerically estimating entropy and mutual information using k-th nearest neighbour estimators, and on their applications in related areas. Entropy and mutual information are defined as follows:
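In the usual notation (assumed here), for continuous random variables $X$ and $Y$ with joint density $p(x, y)$ and marginal densities $p(x)$ and $p(y)$, the differential entropy and the mutual information are

$$
H(X) = -\int p(x) \log p(x) \, dx,
$$

$$
I(X; Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy = H(X) + H(Y) - H(X, Y).
$$

For discrete random variables, the integrals are replaced by sums over the probability mass functions.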
For the continuous case, the KSG, BI-KSG and G-knn estimators were reproduced. For the discrete case, Gao’s estimator and the Multi-KL estimator were reproduced. We also reduced the bias of the G-knn method (although it has not yet vanished) and proposed an approximate k-NN method which slightly outperforms the state-of-the-art KSG method in the paper.
Applications of these methods, such as MIMO channel systems, quadrature amplitude modulation and feature selection, were also discussed.
This project is supervised by Dr. Keith Briggs and supported by BT Wireless Research. (Jun. - Oct. 2018)
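As a simple illustration of nearest-neighbour estimation, the sketch below implements the classical Kozachenko-Leonenko k-NN entropy estimator in one dimension, on which the KSG mutual information estimator builds. It is not one of the estimators reproduced or proposed in the project; the function names are hypothetical, and the samples are assumed to be distinct.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy_1d(samples, k=3):
    """Kozachenko-Leonenko estimate (in nats) of differential entropy in 1D.

    H ~ psi(N) - psi(k) + log(2) + mean(log(rho_i)), where rho_i is the
    distance from sample i to its k-th nearest neighbour and 2 is the
    length of the unit ball [-1, 1]. Requires more than k distinct samples.
    """
    xs = sorted(samples)
    n = len(xs)
    log_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted order the k nearest neighbours lie within k positions.
        window = xs[max(0, i - k):i] + xs[i + 1:i + k + 1]
        dists = sorted(abs(x - y) for y in window)
        log_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_sum / n


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
estimate = kl_entropy_1d(data, k=3)
exact = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of N(0, 1)
```

For a standard normal sample of this size, the estimate should land close to the exact value of roughly 1.419 nats; the small residual error is the kind of bias that the refined estimators aim to reduce.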
Teaching:
- Lecturer for MA930: Data Analysis and Machine Learning. (Oct. 2023 - Jun. 2024)
- Teaching Assistant for MA258: Mathematical Analysis III, MA4M9: Mathematics of Neuronal Networks and MA3K1: Mathematics of Machine Learning. Invited for a one-day workshop on practical AI for MA930: Data Analysis and Machine Learning. (Oct. 2022 - Jun. 2023)
- Teaching Assistant for MA124: Mathematics by Computer, MA3J4: Mathematical Modelling with PDE, and MA3K1: Mathematics of Machine Learning. (Oct. 2021 - Jun. 2022)
- Teaching Assistant for MA124: Mathematics by Computer and MA3K1: Mathematics of Machine Learning. (Oct. 2020 - Jun. 2021)
Outreach:
- President of the Warwick SIAM-IMA Student Chapter. (Oct. 2022 - Jun. 2023)
- Vice-President of the Warwick SIAM-IMA Student Chapter. (Oct. 2021 - Jun. 2022)
- Seminar Organiser of the Warwick SIAM-IMA Student Chapter. (Oct. 2020 - Jun. 2021)
- Chairman of Qianfan Maths Association, Central University of Finance and Economics, China. (Sep. 2014 - Sep. 2015)
Skills / Languages:
- Proficient in Python, and machine learning libraries including PyTorch, Scikit-learn and NLTK. Proficient in MATLAB, Julia, Fortran and R.
- Native in Mandarin Chinese.
- Fluent in English.
- Eager to learn Spanish!