Computational Statistics with Python

Pre-requisites

An understanding of statistics and probability to the level of an undergraduate statistics class.

Introduction

This module introduces students to many of the advanced statistical techniques made possible by innovations in computing and modern processing power. These include Markov chain Monte Carlo approaches, probabilistic methods, Bayesian statistics, dimension reduction and high-performance computing.


Objectives

Upon successful completion participants will be able to:

  1. Write original, non-trivial Python applications and algorithms.
  2. Demonstrate a sound understanding of modern computational statistical approaches and their application to a variety of datasets.
  3. Explain current applications of Bayesian statistics and their impact on computational statistics.
  4. Automate dimension reduction techniques on a variety of complex datasets and critically evaluate the results.
  5. Explain the core concepts of probabilistic programming and critically evaluate its potential use cases.
  6. Evaluate and optimise algorithms for better computational performance.


Syllabus

  • Programming in Python
    - Python structures and syntax
    - Statistical computing
    - NumPy, SciPy and pandas
  • Dimension Reduction
    - Theoretical background
    - Principal Component Analysis
    - Dimension reduction for feature engineering
  • Probabilistic Programming
    - Bayesian methods for computational statistics
    - PyMC
  • Sampling / MCMC Methods
    - Monte Carlo Methods
    - Markov Chains
    - Bootstrapping
  • Big Data and High-Performance Computing
    - Parallelisation
    - PySpark and MapReduce
    - Analysis of Algorithms
    - CPython, Cython and Numba


Assessment

A 3,500-word post-module assessment (50 hours, 70% weighting) and a 3.5-hour in-class test (30% weighting).


Duration

1 week (delivered over 2 weeks), including lectures, seminars, workshops, drop-in sessions and other activities, for a total of approximately 33 contact hours.