# MA3K0 High Dimensional Probability

Term(s): Term 2

Status for Mathematics students:

Commitment: 10 x 3 hour lectures + 9 x 1 hour support classes

Assessment: Assessed homework sheets (15%) and Summer exam (85%)

Formal registration prerequisites: None

Assumed knowledge: Basic probability theory: random variables, law of large numbers, Chebyshev inequality, distribution functions, expectation and variance, Bernoulli distribution, normal distribution, Poisson distribution, exponential distribution, de Moivre-Laplace theorem, e.g. ST111 Probability A & ST112 Probability B.

Some basic skills in analysis: MA258 Mathematical Analysis III or MA259 Multivariate Calculus or ST208 Mathematical Methods or MA244 Analysis III. The module works in the Euclidean vector space $\mathbb{R}^n$, so norms, basic inequalities, scalar products, linear mappings and matrix algebra (eigenvalues, eigenvectors, singular values, etc.) are relevant.

Useful background: Know what a probability measure/distribution is. Earlier probability modules are of some use but not necessary. The framework is mild probability theory (e.g. ST202 Stochastic Processes). Know what the Central Limit Theorem is (the de Moivre-Laplace theorem for general random variables).

Synergies: In general, the module provides a mathematical basis for machine learning, data science and random matrix theory. The following modules provide some synergies and connections:

There are also strong links to, and hence suitable combinations with, the following modules:

Leads to: The following modules have this module listed as assumed knowledge or useful background:

Content:

• Preliminaries on Random Variables (limit theorems, classical inequalities, Gaussian models, Monte Carlo)
• Basic Information Theory (entropy; Kullback-Leibler information divergence)
• Concentration of Sums of Independent Random Variables
• Random Vectors in High Dimensions
• Random Matrices
• Concentration with Dependency Structures
• Deviations of Random Matrices and Geometric Consequences
• Graphical Models and Deep Learning
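To give a flavour of the "Concentration of Sums of Independent Random Variables" and "Monte Carlo" topics listed above, here is a minimal Python sketch (an illustration only, not taken from the lecture notes). It compares a Monte Carlo estimate of $P(|(X_1 + \dots + X_n)/n| \ge t)$ for i.i.d. symmetric Bernoulli (Rademacher) variables $X_i \in \{-1, 1\}$ with the two-sided Hoeffding bound $2\exp(-n t^2/2)$. The function names `empirical_tail` and `hoeffding_bound` are hypothetical.

```python
import math
import random


def empirical_tail(n: int, t: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|(X_1 + ... + X_n) / n| >= t) for i.i.d. Rademacher X_i."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))  # one sample of the sum
        if abs(s) / n >= t:
            hits += 1
    return hits / trials


def hoeffding_bound(n: int, t: float) -> float:
    """Two-sided Hoeffding bound 2 * exp(-n * t^2 / 2) for the Rademacher sample mean."""
    return 2.0 * math.exp(-n * t * t / 2.0)


if __name__ == "__main__":
    t = 0.2
    for n in (10, 100, 1000):
        est = empirical_tail(n, t, trials=1000)
        print(f"n={n:5d}  empirical={est:.4f}  Hoeffding bound={hoeffding_bound(n, t):.4f}")
```

The true tail probability never exceeds the bound, so the Monte Carlo estimate should sit below it up to sampling noise. For small $n$ the bound is larger than 1 and hence vacuous, while for large $n$ it decays exponentially in $n$.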

Aims:

• Concentration of measure problem in high dimensions
• Three basic concentration inequalities
• Application of basic variational principles
• Concentration of the norm
• Dependency structures
• Introduction to random matrices
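The "Concentration of the norm" aim can be illustrated numerically. The following minimal Python sketch (an illustration, not course material) samples standard Gaussian vectors $g \in \mathbb{R}^n$ and reports the mean and standard deviation of $\|g\|_2$. In line with the concentration of the norm results discussed in [1], $\|g\|_2$ is close to $\sqrt{n}$ with fluctuations of order one, so the relative deviation shrinks like $1/\sqrt{n}$. The helper name `gaussian_norms` is hypothetical.

```python
import math
import random
import statistics
from typing import List


def gaussian_norms(n: int, trials: int, seed: int = 0) -> List[float]:
    """Sample the Euclidean norm ||g||_2 of a standard Gaussian vector g in R^n, `trials` times."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
        for _ in range(trials)
    ]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        norms = gaussian_norms(n, trials=500)
        mean = statistics.mean(norms)
        sd = statistics.stdev(norms)
        # mean / sqrt(n) approaches 1, while the standard deviation stays of order one
        print(f"n={n:5d}  mean/sqrt(n)={mean / math.sqrt(n):.4f}  std={sd:.3f}")
```

For a standard Gaussian vector the standard deviation of $\|g\|_2$ tends to $1/\sqrt{2} \approx 0.71$ as $n$ grows, so the printed standard deviations should stay near that value in every dimension, even though $\|g\|_2$ itself grows like $\sqrt{n}$.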

Objectives:

By the end of the module the student should be able to:

• Understand the concentration of measure problem in high dimensions
• Distinguish three basic concentration inequalities
• Distinguish between concentration for independent families and for various dependency structures
• Understand basic concentration properties of the norm
• Be familiar with the main properties of random matrices
• Understand basic variational problems
• Be familiar with some applications of graphical models

Books:

We won't follow a particular book; lecture notes will be provided. The course is based on the following sources, with the majority of the material taken from [1]:

[1] Roman Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press (2018).

[2] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press (2012).

[3] Simon Rogers and Mark Girolami, A First Course in Machine Learning, CRC Press (2017).

[4] Alex Kulesza and Ben Taskar, Determinantal Point Processes for Machine Learning, Lecture Notes (2013).