Important: If you take ST323, you cannot then take ST412. Bear this in mind when planning your module selection. Recall that an integrated Masters student must take at least 120 CATS of level 4+ modules over their third and fourth years.
Prerequisites:
- Statistics students: ST115 Introduction to Probability, ST218 Mathematical Statistics A, ST219 Mathematical Statistics B.
- Non-Statistics students: ST111/112 Probability A&B, ST220 Introduction to Mathematical Statistics.
The coursework uses the statistical software package R, so basic knowledge of R, such as that covered in ST104 Statistical Laboratory I or ST952 Introduction to Statistical Practices, is helpful.
Commitment: 30 lectures. This module runs in Term 1.
Aims: Multivariate data arises whenever several interdependent variables are measured simultaneously. Such high-dimensional data is becoming the rule rather than the exception in many areas, including medicine, the social and environmental sciences, and economics. Analysing such multidimensional data often presents an exciting challenge that requires new statistical techniques, which are usually implemented in computer packages. This module aims to give you a good understanding of the geometric and algebraic ideas on which these techniques are based, before giving you a chance to try them out on some real data sets.
Objectives: By the end of the module, students will be able to:
- Construct and interpret graphical representations of multivariate data
- Carry out principal components analysis and canonical correlation analysis to summarise high-dimensional data
- Perform cluster analysis to discover and characterise subgroups in the population
- Use classification and discrimination methods to assign individuals to groups
- Assess multivariate normality and carry out multivariate tests for comparing means across groups
- Understand any additional topics covered in the lectures. Time permitting, the lectures will cover one or two additional topics, such as factor analysis, multidimensional scaling, random forests, bagging, sparse multivariate methods, Gaussian graphical models, multiple testing, functional data analysis and spatial statistics.
Books:
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). New York: Springer.
- Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical analysis. Upper Saddle River, NJ: Pearson Prentice Hall.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). New York: Springer.
- Efron, B., & Hastie, T. (2016). Computer age statistical inference (Vol. 5). Cambridge: Cambridge University Press.
- Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical learning with sparsity: The lasso and generalizations. Boca Raton, FL: CRC Press.
Assessment: Two assignments worth 10% each; 80% by a 2-hour examination in June.
Deadlines: Assignment 1: Week 5; Assignment 2: Week 4 (Term 2).
Examination Period: Summer
Feedback: Feedback on both assignments will be returned two weeks after submission.