# CS909 Data Mining

## Pre-requisites

No Warwick module is required as a pre-requisite. However, familiarity with basic probability and statistics (for example, discrete and continuous random variables, densities and distributions, common distributions including the Bernoulli, binomial, uniform and normal distributions, and expectations) will be needed.

## Academic Aims

- Understanding of the value of data mining in solving real-world problems.
- Understanding of foundational concepts underlying data mining.
- Understanding of algorithms commonly used in data mining tools.
- Ability to apply data mining tools to real-world problems.

## Learning Outcomes

By the end of the module, the student should be able to:

- Display a comprehensive understanding of different data mining tasks and the algorithms most appropriate for addressing them.
- Evaluate models/algorithms with respect to their accuracy.
- Demonstrate the capacity to perform a self-directed piece of practical work that requires the application of data mining techniques.
- Critique the results of a data mining exercise.
- Develop hypotheses based on the analysis of the results obtained and test them.
- Conceptualise a data mining solution to a practical problem.

## Content

- Introduction to machine learning, basic concepts and motivation.
- Data pre-processing and basic data transformations.
- Regression models (linear regression, logistic regression).
- Classification: decision trees, probabilistic generative models.
- Model evaluation, bias-variance trade-off.
- Ensemble methods: boosting, bagging & random forests.
- Dimensionality reduction: Principal Component Analysis (PCA), t-distributed Stochastic Neighbour Embedding (t-SNE).
- Introduction to deep learning, backpropagation, gradient descent.
- Convolutional neural networks.
- Word embeddings.
- Sequence-to-sequence models.
- Attention mechanisms and memory networks.
- Unsupervised deep learning and generative models.
- Transfer learning.