# CS909 Data Mining

*CS909 15 CATS Term 2*

## Availability

Core - MSc Data Analytics, MSc Behavioural and Data Science.

Option - MSc Computer Science, Year 4 MEng CS and DM.

## Pre-requisites

No Warwick module is required as a pre-requisite. However, familiarity with basic probability and statistics will be needed (for example: discrete and continuous random variables, densities and distributions, expectations, and common distributions including the Bernoulli, binomial, uniform and normal distributions).

## Academic Aims

- Understanding of the value of data mining in solving real-world problems.
- Understanding of foundational concepts underlying data mining.
- Understanding of algorithms commonly used in data mining tools.
- Ability to apply data mining tools to real-world problems.

## Learning Outcomes

By the end of the module, the student should be able to:

- Display a comprehensive understanding of different data mining tasks and the algorithms most appropriate for addressing them.
- Evaluate models/algorithms with respect to their accuracy.
- Demonstrate the capacity to perform a self-directed piece of practical work that requires the application of data mining techniques.
- Critique the results of a data mining exercise.
- Develop hypotheses based on the analysis of the results obtained and test them.
- Conceptualise a data mining solution to a practical problem.

## Content

- Introduction to machine learning, basic concepts and motivation.
- Data pre-processing and basic data transformations.
- Regression models (linear regression, logistic regression).
- Classification: decision trees, probabilistic generative models.
- Model evaluation, bias-variance trade-off.
- Ensemble methods: boosting, bagging & random forests.
- Dimensionality reduction: Principal Component Analysis (PCA), t-distributed Stochastic Neighbour Embedding (t-SNE).
- Introduction to deep learning, backpropagation, gradient descent.
- Convolutional neural networks.
- Word embeddings.
- Sequence-to-sequence models.
- Attention mechanisms and memory networks.
- Unsupervised deep learning and generative models.
- Transfer learning.

## Books

- Bishop, C. (2008). Pattern Recognition and Machine Learning. Springer.
- Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press.
- Leskovec, J., Rajaraman, A. and Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press.
- Tan, P., Steinbach, M., Karpatne, A. and Kumar, V. (2019). Introduction to Data Mining, 2nd Edition.
- Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.

## Assessment

Two-hour examination (50%), coursework (50%) - MEng students

Two-hour examination (40%), coursework (60%) - MSc students

## Teaching

20 one-hour lectures

8 one-hour labs

5 one-hour seminars