ST117 Introduction to Statistical Modelling
ST117-15 Introduction to Statistical Modelling
Introductory description
This module is an introduction to statistical thinking, modelling, and inference. At its core are the concepts of a statistical model and the associated likelihood function, as well as their manipulation to obtain rigorous inferences.
This module is core for students with their home department in Statistics and available to students from other departments for whom it is a listed option. It will be useful for all subsequent modules on statistics.
This module is NOT available as an unusual option. Students from outside the Statistics department who are interested in a first-year statistics module should consider taking ST121 Statistical Laboratory.
Prerequisites:
Statistics students: ST118 Probability 1
Non-Statistics students: ST120 Introduction to Probability
Module aims
To introduce the students to statistical thinking, formal reasoning under uncertainty, and the specification of a statistical model.
To build a foundation for likelihood-based statistical inference.
To connect mathematical models and inferences to real-world results, as well as provide practice in communicating them effectively.
To introduce computational tools and concepts necessary for modern data science.
To consider how collection, choice, or preprocessing of data sets influences the results of statistical analyses.
To convey basic ethical concepts arising with the generation, interpretation, and dissemination of data and information.
Outline syllabus
This is an indicative module outline only, showing the sort of topics that may be covered. Actual sessions held may differ.
This module introduces the inherently interdisciplinary field of modern statistics. It covers the specification of appropriate statistical models for a variety of data sets, the mathematical underpinnings of model-based statistical inference, and the interpretation and communication of inference outcomes. The R computational language is introduced and used as a toolkit for examples throughout.
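As a rough illustration of what "model-based statistical inference" refers to, the standard textbook definitions of the likelihood and the maximum likelihood estimate are sketched below. The notation ($x_i$, $f$, $\theta$, $\ell$) is an assumption chosen for this sketch and is not taken from the module materials; the observations are assumed to be independent and identically distributed.

```latex
% Assumed setting: independent observations x_1, ..., x_n,
% each with density (or mass function) f(x; \theta).
\[
  L(\theta; x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta),
  \qquad
  \ell(\theta) = \log L(\theta; x_1, \dots, x_n) = \sum_{i=1}^{n} \log f(x_i; \theta),
\]
\[
  \hat{\theta} = \operatorname*{arg\,max}_{\theta} \, \ell(\theta).
\]
```

Working with the log-likelihood $\ell$ rather than $L$ turns the product into a sum, which is usually easier to differentiate and more stable to compute.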
Learning outcomes
By the end of the module, students should be able to:
 Describe an appropriate probabilistic model and associated likelihood function for a simple data set.
 Calculate estimates of unknown parameters and their associated uncertainty based on a simple model and observed data.
 Compare model predictions and observed data graphically.
 Describe the modelling assumptions underlying a simple statistical model.
 Interpret model output to inform decisions or further experiments.
 Know the R programming environment well enough to write simple scripts to accomplish computational or data visualisation tasks.
 Discuss ethical aspects of data collection and selection, of statistical analysis, and of the interpretation and communication of results.
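The first two outcomes above can be illustrated with a minimal sketch, assuming a Bernoulli model for a short sequence of 0/1 observations. The module itself uses R; Python is used here purely for illustration. The data, the function names `bernoulli_log_likelihood` and `bernoulli_mle`, and the grid check are all hypothetical and are not part of the module materials.

```python
import math


def bernoulli_log_likelihood(p, data):
    """Log-likelihood of success probability p (0 < p < 1) for 0/1 observations."""
    successes = sum(data)
    failures = len(data) - successes
    return successes * math.log(p) + failures * math.log(1 - p)


def bernoulli_mle(data):
    """Return the maximum likelihood estimate of p and its approximate standard error."""
    n = len(data)
    p_hat = sum(data) / n  # closed-form maximiser of the log-likelihood
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)  # large-sample approximation
    return p_hat, std_error


# Hypothetical data: ten coin flips, 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p_hat, std_error = bernoulli_mle(flips)

# Crude numerical check: maximise the log-likelihood over a grid of p values.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, flips))

print(f"MLE: {p_hat:.3f}, approx. standard error: {std_error:.3f}, grid maximiser: {p_grid:.3f}")
```

The closed-form estimate is simply the sample proportion; the grid search is included only to show that it agrees with direct maximisation of the log-likelihood. The standard error relies on a large-sample approximation, so with only ten observations it should be read as a rough guide.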
Indicative reading list
See the reading list on Talis Aspire.
Subject specific skills
 Select and apply appropriate mathematical and/or statistical techniques.
 Create structured and coherent arguments, communicating them in written form.
 Construct and develop logical mathematical arguments with clear identification of assumptions and conclusions.
 Communicate subject-specific information effectively and coherently.
 Analyse problems, abstracting their essential information and formulating them using appropriate mathematical language to facilitate their solution.
 Select and apply an appropriate statistical programming language (for example, R) for exploratory data analysis.
 Understand major aspects of data collection, generation, and quality, and how this influences analyses and conclusions.
Transferable skills
 Critical thinking: extracting patterns from incomplete data and using them to form evidence-based conclusions.
 Problem solving: use of logical reasoning to build arguments grounded in evidence and with explicit underlying assumptions.
 Self-awareness: monitoring of your own learning and seeking feedback.
 Communication: verbal discussion of ideas in seminars and among peers; written communication in assignments and the final project.
 Teamwork: collaboration with peers in seminars and during self-study.
 Information literacy: evaluation of data and uncertainty in a model-based way.
 Digital literacy: use of computational tools to understand and visualise data, and to produce reports.
 Professionalism: self-motivation, taking charge of your own learning, and prioritising effectively.
 Ethics: reflect on professional responsibilities as a statistician in conjunction with the generation and dissemination of information.
Study time
Type | Required | Optional
Lectures | 30 sessions of 1 hour (20%) | 2 sessions of 1 hour
Seminars | 10 sessions of 1 hour (7%) |
Private study | 22 hours (15%) |
Assessment | 88 hours (59%) |
Total | 150 hours |
Private study description
Weekly revision of lecture notes, work on problem sheets, study for quizzes, participation in activities, and preparation of the final project.
Costs
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Assessment group A
Weighting  Study time  

Exercise sheet 1  10%  6 hours 
One of three exercise sheets supported by seminars, including both analytical and computational tasks. The problem sheets will contain a number of questions for which solutions and/or written responses will be required. The preparation and completion time noted for this assessment refers to the number of hours that a well-prepared student, who has attended lectures and carried out an appropriate amount of independent study on the material, could expect to spend on it.

Exercise sheet 2  12%  6 hours 
One of three exercise sheets supported by seminars, including both analytical and computational tasks. The problem sheets will contain a number of questions for which solutions and/or written responses will be required. The preparation and completion time noted for this assessment refers to the number of hours that a well-prepared student, who has attended lectures and carried out an appropriate amount of independent study on the material, could expect to spend on it.

Exercise sheet 3  13%  6 hours 
One of three exercise sheets supported by seminars, including both analytical and computational tasks. The problem sheets will contain a number of questions for which solutions and/or written responses will be required. The preparation and completion time noted for this assessment refers to the number of hours that a well-prepared student, who has attended lectures and carried out an appropriate amount of independent study on the material, could expect to spend on it.

Multiple Choice Quiz 1  11%  5 hours 
A multiple choice quiz which will take place during the term that the module is delivered. 

Multiple Choice Quiz 2  14%  7 hours 
A multiple choice quiz which will take place during the term that the module is delivered. 

Final project  36%  50 hours 
A written report on a project completed over an extended period, based on a project outline provided to the student. The scope of the project spans the whole module syllabus and may include: specification of an appropriate likelihood from a description of an experiment, justified choice of estimators or statistical methods to answer a research question, execution of a statistical analysis in R, production of appropriate visualisations to assess inference results and model fit, and communication of the context and results of the analysis to a non-specialist audience. The report should not exceed 10 pages in length, including legible figures, displayed equations, and appropriate code snippets; the word limit is indicative of these expectations. Further details on the structure (e.g. word counts in sections) will be specified in written instructions.

Activity 1  2%  4 hours 
Short example of data visualisation or aspects of data analysis to share in a very short oral presentation with other students. 

Activity 2  2%  4 hours 
Short example of data visualisation or aspects of data analysis to share in a very short oral presentation (a few minutes) with other students. 
Assessment group R
Weighting  Study time  

Reassessment as an individual project  100%  
This is an individual project replacing any parts of the module that need to be reassessed. 
Feedback on assessment
Individual feedback will be provided on problem sheets by class tutors, and on the final project by the lecturer. A cohort-level summary will also be available for the project. Students are actively encouraged to make use of office hours to build up their understanding, and to view all their interactions with lecturers and class tutors as feedback.
Courses
This module is Core for:
 Year 1 of USTAG302 Undergraduate Data Science
 Year 1 of USTAG304 Undergraduate Data Science (MSci)
 Year 1 of USTAG300 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics
 Year 1 of USTAG1G3 Undergraduate Mathematics and Statistics (BSc MMathStat)
 Year 1 of USTAGG14 Undergraduate Mathematics and Statistics (BSc)
 Year 1 of USTAY602 Undergraduate Mathematics, Operational Research, Statistics and Economics
This module is Option list B for:
 Year 1 of UECAGL12 Undergraduate Mathematics and Economics (with Intercalated Year)
This module is Option list C for:

UMAAGV17 Undergraduate Mathematics and Philosophy
 Year 1 of GV17 Mathematics and Philosophy
Assessment dates for Statistics modules, including coursework and examinations, can be found in the Statistics Assessment Handbook.