Yiming Ma

Hi, my name is Yiming Ma.

I currently work at King's College London (School of Biomedical Engineering & Imaging Sciences) as a postdoctoral researcher. I recently completed my PhD in the MathSys CDT at the University of Warwick, where I was part of the Signal and Information Processing (SIP) Lab. My PhD research focused on machine learning and computer vision, especially crowd counting / density estimation and multimodal representation learning.

Google Scholar: gC0aZsAAAAAJ
LinkedIn: yiming-ma-5b401a201
Personal E-mail: yiming.ma.cv@gmail.com
GitHub: Yiming-M

⚠️ Notice: This page is archived and no longer actively maintained. For the latest information, please visit yiming-m.github.io.

Preprints

2025 – arXiv: ZIP: Scalable Crowd Counting via Zero-Inflated Poisson Modeling
- Motivation: Ground-truth blockwise density maps are highly sparse (>95% zero blocks); MSE poorly models blockwise counts.
- Method: Proposes ZIP that models blockwise counts with a Zero-Inflated Poisson.
- Results: SOTA results on ShanghaiTech A & B, UCF-QNRF and NWPU with models ranging from <1M to ~100M parameters.
2022 – arXiv: Real-Time Driver Monitoring Systems through Modality and View Analysis
- Motivation: 3D CNNs are costly while adjacent frames in in-cabin video are highly similar.
- Method: Image-level DMS with 2D encoders and (feature/decision) fusion across views/modalities; evidence for dropping explicit temporal modeling; open-set handling.
- Results: AUC-ROC 95.6% / accuracy 92.4% on DAD at 243 MFLOPs to 1.01 GFLOPs.

Publications

2025 – ICME: CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification
- Motivation: CLIP in regression-style dense prediction is under-explored; existing blockwise counting models rely on Gaussian-smoothed labels, causing ambiguity.
- Method: Introduces EBC based on integer-valued bins; proposes CLIP-EBC, the first fully CLIP-based model that achieves accurate crowd counting results.
- Results: Large gains over prior blockwise methods; competitive results on NWPU-Test (MAE 58.2).
2025 – ICME (co-author): Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions
- Motivation: Assistive/AR systems need early forecasts of social intent and actions from egocentric videos.
- Method: A joint forecasting framework that shares an egocentric encoder and learns multi-task heads for intent, attitude, and social actions with temporal modelling.
- Results: Benchmarked on standard egocentric datasets, with quantitative and qualitative analyses showing the benefit of joint modelling for consistent forecasts.
2023 – CVPRW: Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention
- Motivation: Real-world DMS must be robust to view/modality collapse and occlusions.
- Method: Masked multi-head self-attention to fuse Top/Front × IR/Depth streams; supervised contrastive learning and robustness regularization.
- Results: On DAD, four-stream fusion reaches AUC-ROC 97.0% / mAP 97.8%, outperforming decision-level fusion and alternative feature-level baselines.

2022 – ICIP: FusionCount: Efficient Crowd Counting via Multiscale Feature Fusion
- Motivation: Encoder-decoder counters underuse low-level features and add heavy multiscale modules.
- Method: Contrast-aware group-wise fusion of encoder features plus a dual-branch channel-reduction decoder (1×1 + dilated conv).
- Results: ShanghaiTech-B MAE 6.9 / RMSE 11.8 with ~815 GFLOPs, surpassing or matching VGG-based peers (CSRNet, CAN, BL, DM-Count) at lower compute.

Experience

Research Associate, King's College London; London, UK — 2025–Now

Multimodal patient fingerprinting: Built a Multi-Modal Fingerprint (MMF) by integrating imaging, demographic, clinicopathological variables, and radiology reports to support patient-level risk stratification and personalised surveillance planning.
Longitudinal AI-driven clinical decision support: Developed and benchmarked deep learning models for active surveillance (AS) enrolment/risk profiling, automated prostate/lesion assessment on bp-MRI, and longitudinal progression modelling; prioritised robustness to scanner/site shift and incomplete follow-up data.
Clinical translation & reporting: Prototyped a web-based standardised report aligned with PRECISE-style longitudinal assessment to streamline clinical review and improve consistency of follow-up decisions.

Research Assistant, University of Warwick; Coventry, UK — 2022–2023

Data curation: Refined and extended DAD annotations, adding 9 non-driving-related activities; prepared data for robust benchmarking.
Multiview multimodal fusion: Designed a multiview multimodal driver monitoring system based on masked multi-head self-attention; improved AUC-ROC from 88% to 97% on DAD and increased robustness to view/modality collapse.

Teaching Assistant, University of Warwick; Coventry, UK — 2023

Lab sessions: Assisted delivery of an undergraduate Python & Introductory ML module; led labs and tutorials guiding students to implement regression, classification, and neural networks in Python.
Tutoring: Provided one-to-one and small-group academic support, clarifying core programming/ML concepts and troubleshooting code and experiment design.

Education Background

2021~2025 (Doctor of Philosophy): University of Warwick, Coventry, UK 🇬🇧.

Programme: Mathematics of Systems.
Supervisors: Prof Victor Sanchez & Dr Tanaya Guha.
Research Interests: crowd counting & driver distraction detection.

2020~2021 (Master of Science): University of Warwick, Coventry, UK 🇬🇧.

Programme: Mathematics of Systems.
Group Project: Prediction of Oestrus Intervals for Guide Dogs.
- Supervisor: Prof. Colm Connaughton.
- External Partner: Guide Dogs UK Charity For The Blind And Partially Sighted.
- Teammates: Callum Ilkiw & Satoshi Komuro.
Individual Project: Inception-Based Crowd Counting.
- Supervisors: Dr. Victor Sanchez, Dr. Tanaya Guha & Prof. Theo Damoulas.
- External Partner: Transport for London.

2016~2020 (Bachelor of Science): Southern University of Science and Technology, Shenzhen, China 🇨🇳.

Programme: Mathematics and Applied Mathematics.
Graduation Research Project: Contraction Methods for Composite Convex Optimisation.
- Supervisor: Prof. Bingsheng He.

📰 Recent News

2026-01-05: Joined King's College London and started working with Dr Michela Antonelli as a Research Associate.

2025-07-31: Implemented a HuggingFace Space for ZIP.

2025-07-31: Released the code of ZIP on GitHub.

2025-07-31: Released a new paper ZIP: Scalable Crowd Counting via Zero-Inflated Poisson Modeling on arXiv.

2025-07-04: Attended IEEE ICME 2025 @ Nantes, France.

2025-03-20: CLIP-EBC and Interact with me were accepted by IEEE ICME 2025.

2025-02-03: Released the code of Interact with me on GitHub.

2024-12-21: Released a new paper Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions (co-author) on arXiv.

2024-07-17: Released the code of CLIP-EBC on GitHub.

2024-03-14: Released a new paper CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification on arXiv.

2023-06-18: Attended CVPR 2023 online.

2023-04-13: Released the code and dataset of MHSA.

2023-04-13: Uploaded the paper of MHSA on arXiv.

2023-03-21: The paper Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention (MHSA) was accepted by the MULA Workshop at CVPR 2023.

2022-10-19: Attended IEEE ICIP 2022 online.

2022-10-17: Released a new paper Real-Time Driver Monitoring Systems through Modality and View Analysis on arXiv.

2022-06-20: FusionCount was accepted by ICIP 2022.

2022-04-05: Released the code of FusionCount on GitHub.

2022-02-27: Released a new paper FusionCount: Efficient Crowd Counting via Multiscale Feature Fusion on arXiv.

2021-10-04: Started my PhD in the MathSys CDT at the University of Warwick.

⚙️ Services

Reviewer for TNNLS, TMM, SPL, ECCV, ACM MM, CVPRW, ICME, WACV, BMVC, ICIP.

🧰 Skills

Deep Learning Concepts: Attention Mechanism, ViT, Prompt Tuning, CLIP, Contrastive Learning, Multimodal Alignment, Multimodal Fusion.

PyTorch: TIMM, OpenCLIP, Transformers, TensorBoard, Optuna.

Python: NumPy, SciPy, Scikit-learn, Pandas, OpenCV, Matplotlib / Seaborn.

Maths & Stats: Probability Theory, Statistical Inference, Optimization, Stochastic Processes, Time Series Modeling, Survival Analysis, Computational Statistics, Real / Complex / Functional / Fourier Analysis, Measure Theory.

Development & Tools: SSH, Git, Linux, LaTeX, Markdown, MS Word.

Languages: Mandarin Chinese (native), English (IELTS: 8.0/9.0).