CS909 Data Mining


No Warwick module is required as a prerequisite. However, familiarity with basic probability and statistics (for example: discrete and continuous random variables, densities and distributions, common distributions including the Bernoulli, binomial, uniform and normal distributions, and expectations) will be needed.

Academic Aims

  • Understanding of the value of data mining in solving real-world problems.
  • Understanding of foundational concepts underlying data mining.
  • Understanding of algorithms commonly used in data mining tools.
  • Ability to apply data mining tools to real-world problems.

Learning Outcomes

By the end of the module, the student should:

  • Display a comprehensive understanding of different data mining tasks and the algorithms most appropriate for addressing them.
  • Evaluate models/algorithms with respect to their accuracy.
  • Demonstrate capacity to perform a self-directed piece of practical work that requires the application of data mining techniques.
  • Critique the results of a data mining exercise.
  • Develop hypotheses based on the analysis of the results obtained and test them.
  • Conceptualise a data mining solution to a practical problem.


  • Introduction to machine learning, basic concepts and motivation.
  • Data pre-processing and basic data transformations.
  • Regression models (linear regression, logistic regression).
  • Classification: decision trees, probabilistic generative models.
  • Model evaluation, bias-variance trade-off.
  • Ensemble methods: boosting, bagging & random forests.
  • Dimensionality reduction: Principal Component Analysis (PCA), t-distributed Stochastic Neighbour Embedding (t-SNE).
  • Introduction to deep learning, backpropagation, gradient descent.
  • Convolutional neural networks.
  • Word embeddings.
  • Sequence-to-sequence models.
  • Attention mechanisms and memory networks.
  • Unsupervised deep learning and generative models.
  • Transfer learning.


Term 2

Yulan He


Online material