APTS module: Statistical Machine Learning
Module leader: L Aslett
Please see the full Module Specifications for background information relating to all of the APTS modules, including how to interpret the information below.
Aims: This module introduces students to modern supervised machine learning methodology and practice, with an emphasis on statistical and probabilistic approaches in the field. The course seeks to balance theory, methods and application, providing an introduction with firm foundations that is accessible to those working on applications and seeking to employ best practice. It will explore some of the key software tools that have facilitated the growth in the use of these methods across a broad spectrum of applications, with an emphasis on how to carefully assess machine learning models.
Learning Outcomes: Students following this module will gain a broad view of the supervised statistical machine learning landscape, including some of the main theoretical and methodological foundations. They will be able to appraise different machine learning methods and reason about their use. In particular, students completing the course will gain an understanding of how to practically apply these techniques, with an ability to critically evaluate the performance of their models. Students will also gain insight into the extensive software libraries available today and how they can be used to construct a full machine learning pipeline.
Prerequisites
To undertake this module, students should have:
- at least one undergraduate-level course in probability and in statistics;
- standard undergraduate-level knowledge of linear algebra and calculus;
- a solid grasp of statistical computing in R;
- knowledge of statistical modelling, including regression modelling (e.g. the APTS Statistical Modelling module);
- a basic understanding of optimisation methods is beneficial, but not essential.
As preparatory reading, the enthusiastic student may wish to browse An Introduction to Statistical Learning (James et al., 2013), which is freely and legally available online and covers some of the topics of the course at a more elementary, descriptive level.
Textbooks at roughly the level of the course include:
- The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman)
- Pattern Recognition and Machine Learning (Bishop)
- Machine Learning: A Probabilistic Perspective (Murphy)
- Deep Learning (Goodfellow, Bengio and Courville)
Topics
- Formulation of supervised learning for regression and classification (scoring/probabilistic, decision boundaries, generative/discriminative), loss functions and basic decision theory;
- Theory of model capacity, complexity and the bias-variance decomposition;
- Curse of dimensionality;
- Overview of some key modelling methodologies (e.g. logistic regression, local methods, kernel smoothing, trees, boosting, bagging, forests);
- Model selection, ensembles, tuning and super-learning;
- Evaluation of model performance, validation and calibration, and their reporting in applications;
- Reproducibility;
- Coverage of some key software frameworks for applying machine learning in the real world.
Assessment: An exercise set by the module leader involving practical use of some of the machine learning methods covered and critical evaluation of their performance.
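For orientation only, and not part of the official module materials, the following minimal R sketch illustrates the kind of workflow the topics and assessment describe: fitting a probabilistic classifier, evaluating it on held-out data, and performing a crude calibration check. The data are simulated and all variable names are hypothetical.

```r
## Simulated binary classification data (entirely hypothetical)
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(1.5 * x1 - x2))
dat <- data.frame(y, x1, x2)

## Hold out 30% of the data for testing
train <- sample(n, 0.7 * n)
fit   <- glm(y ~ x1 + x2, family = binomial, data = dat[train, ])

## Probabilistic predictions on the held-out set
phat <- predict(fit, newdata = dat[-train, ], type = "response")

## Misclassification rate under a 0.5 threshold
mean((phat > 0.5) != dat$y[-train])

## Crude calibration check: mean predicted vs observed rate by decile of phat
bins <- cut(phat, quantile(phat, 0:10 / 10), include.lowest = TRUE)
aggregate(cbind(predicted = phat, observed = dat$y[-train]),
          by = list(decile = bins), FUN = mean)
```

Base R is used here only to keep the sketch self-contained; the software frameworks covered in the module go beyond this.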