ST221 Linear Statistical Modelling

Lecturer(s): Prof Martyn Plummer

Students wishing to pursue the integrated Master's MMORSE are expected to take ST221 in Year 2. The module is also strongly recommended for Data Science students intending to do substantial data analysis in their third-year modules (including their third-year Data Science Project). ST221 may form part of the criteria for determining places on ST modules with capped numbers.

Pre-requisites:
Statistics students:
ST115 Introduction to Probability, ST218 Mathematical Statistics A and ST219 Mathematical Statistics B (taken concurrently).
Non-Statistics students:
ST111/ST112 Probability A & B and ST220 Introduction to Mathematical Statistics.

Leads to: ST340 Programming for Data Science, ST344 Professional Practice of Data Analysis, ST404 Applied Statistical Modelling. Students who have taken ST221 will be given priority if student numbers are over the limit for ST340 from 2019/20 and ST344 from 2020/21 onwards.

Commitment: 3 lectures per week, plus 4 hours of computer practicals taking place in weeks 7 and 9 of Term 2 and weeks 2 and 4 of Term 3. This module runs across Terms 2 and 3.

Aims:
To introduce the ideas and methods of statistical modelling and statistical model exploration. To introduce students to the application of R software and its use as a tool for statistical modelling, specifically for working with linear models in a variety of different scenarios.

Content:

1. Introduction to the R software. Some useful methods of examining large data sets. The use of this package to obtain important summary features in different data structures.

2. A review of simple linear regression. Distributions of estimators and residuals.

3. An introduction to multiple regression. Estimators for these models. How the study of residuals can inform and refine model choice. How to use R to check the plausibility of such a statistical model and how to use diagnostic plots in combination with the theory of model refinement.

4. An introduction to polynomial regression and various ANOVA models. The coding and interpretation of these models using R.

5. An introduction to linear models for time series and generalized linear models for frequency data.

Books:
Data Analysis and Graphics using R, Maindonald and Braun, Cambridge Series in Statistical and Probabilistic Mathematics.

Assessment: 30% coursework and 70% examination.

Deadlines: Assignment 1: Week 10 (Term 2) and Assignment 2: Week 3 (Term 3).

Examination Period: Summer

Feedback: Feedback will be returned 2 weeks after submission for Assignment 1 and 3 weeks after submission for Assignment 2.