# MA3K1 Mathematics of Machine Learning

Lecturer: Martin Lotz

Term(s): Term 2

Status for Mathematics students:

Commitment: 10 x 3-hour lectures with support classes

Assessment: 85% 2-hour examination, 15% assignments

Formal registration prerequisites: None

Assumed knowledge: The module assumes a good working knowledge of MA106 Linear Algebra, MA259 Multivariable Calculus and ST111 Probability Part A, as provided by the compulsory modules in the first two years of Maths programmes. In addition, it assumes material from MA260 Norms, Metrics and Topologies or MA222 Metric Spaces: the notions of norm, metric, topology and convergence, open and closed sets, and compactness.

Useful background: The Term 1 modules MA359 Measure Theory and MA3K0 High Dimensional Probability will provide useful background. Programming skills and knowledge of Numerical Analysis (as covered in MA261 Differential Equations: Modelling and Numerics) are beneficial but not required.

Synergies: The module combines well with MA3K0 High Dimensional Probability.

Content:

Fundamentals of statistical learning theory:

• Regression and classification
• Empirical risk minimization and regularization
• VC theory
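
For orientation, regularized empirical risk minimization is commonly written as below. This is a generic textbook form rather than the module's own notation: the hypothesis class $\mathcal{H}$, loss $L$, regularizer $\Omega$, weight $\lambda$ and the sample $(x_i, y_i)$ are all symbols assumed here for illustration.

```latex
% Regularized empirical risk minimization over a hypothesis class H,
% given n training pairs (x_i, y_i). All symbols are illustrative assumptions.
\[
  \hat{f} \in \operatorname*{arg\,min}_{f \in \mathcal{H}}
  \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(f(x_i), y_i\bigr) + \lambda\, \Omega(f),
  \qquad \lambda \ge 0 .
\]
```

Setting $\lambda = 0$ recovers plain empirical risk minimization; with squared loss and $\Omega(f) = \|w\|_2^2$ for a linear model $f(x) = w^\top x$, this becomes ridge regression.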

Optimization:

• Basic algorithms (gradient descent, Newton’s method)
• Convexity, Lagrange duality and KKT theory
• Quadratic optimization and support vector machines
• Accelerated and stochastic algorithms
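
As a rough illustration of the first item in this list, the following is a minimal sketch of gradient descent with a fixed step size. The function names, the test objective and the parameter values are hypothetical choices made for this example and are not taken from the module materials.

```python
def gradient_descent(grad, x0, step_size=0.1, num_steps=200):
    """Iterate x_{k+1} = x_k - step_size * grad(x_k) for a fixed number of steps."""
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        # Move each coordinate against the corresponding gradient component.
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical convex test objective: f(x, y) = (x - 1)^2 + 2 * (y + 3)^2,
# whose unique minimizer is (1, -3).
def grad_f(p):
    x, y = p
    return [2.0 * (x - 1.0), 4.0 * (y + 3.0)]


minimizer = gradient_descent(grad_f, [0.0, 0.0])
print(minimizer)
```

For this quadratic the step size 0.1 is small enough that each coordinate's error shrinks by a constant factor per step, so the iterates approach (1, -3). A step size that is too large for the curvature of the objective would make the iteration diverge instead.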

Machine learning:

• Neural networks and deep learning
• Kernel methods and Gaussian processes
• Recurrent neural networks
• Applications (pattern recognition, time series prediction)

Aims:

The aim of this course is to introduce Machine Learning from the point of view of modern optimization and approximation theory.

Objectives:

By the end of the module the student should be able to:

• Describe the problem of supervised learning from the point of view of function approximation, optimization, and statistics
• Identify the most suitable optimization and modelling approach for a given machine learning problem
• Analyse the performance of various optimization algorithms from the point of view of computational complexity (both space and time) and statistical accuracy
• Implement a simple neural network architecture and apply it to a pattern recognition task

Books:

1. Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Springer series in statistics, 2001.
2. Beck, Amir. First-Order Methods in Optimization. Vol. 25. SIAM, 2017.
3. Vapnik, Vladimir. The nature of statistical learning theory. Springer, 2013.
4. Cucker, Felipe, and Ding Xuan Zhou. Learning theory: an approximation theory viewpoint. Vol. 24. Cambridge University Press, 2007.

5. Higham, Catherine F., and Desmond J. Higham. Deep Learning: An Introduction for Applied Mathematicians. arXiv preprint arXiv:1801.05894, 2018.