CS909 Data Mining
CS909 15 CATS Term 2
Availability
Core - MSc Data Analytics, MSc Behavioural and Data Science.
Option - MSc Computer Science, Year 4 MEng CS and DM
Pre-requisites
No Warwick module is required as a pre-requisite. However, familiarity with basic probability and statistics (for example: discrete and continuous random variables, densities and distributions, common distributions including the Bernoulli, binomial, uniform and normal distributions, and expectations) will be needed.
Academic Aims
• Understanding of the value of data mining in solving real-world problems.
• Understanding of foundational concepts underlying data mining.
• Understanding of algorithms commonly used in data mining tools.
• Ability to apply data mining tools to real-world problems.
Learning Outcomes
By the end of the module, the student should be able to:
- Display a comprehensive understanding of different data mining tasks and the algorithms most appropriate for addressing them.
- Evaluate models/algorithms with respect to their accuracy.
- Demonstrate the capacity to perform a self-directed piece of practical work that requires the application of data mining techniques.
- Critique the results of a data mining exercise.
- Develop hypotheses based on the analysis of the results obtained and test them.
- Conceptualise a data mining solution to a practical problem.
Content
- Introduction to machine learning, basic concepts and motivation.
- Data pre-processing and basic data transformations.
- Regression models (linear regression, logistic regression).
- Classification: decision trees, probabilistic generative models
- Model evaluation, bias-variance trade-off
- Ensemble methods: boosting, bagging & random forests.
- Dimensionality reduction: Principal Component Analysis (PCA), t-distributed Stochastic Neighbour Embedding (t-SNE).
- Introduction to deep learning, backpropagation, gradient descent
- Convolutional neural networks
- Word embeddings
- Sequence-to-sequence models
- Attention mechanisms and memory networks
- Unsupervised deep learning and generative models
- Transfer learning
Books
- Bishop, C (2008) Pattern Recognition and Machine Learning, Springer
- Goodfellow, I, Bengio, Y and Courville, A (2016). Deep Learning, MIT Press
- Leskovec, J, Rajaraman, A & Ullman, J.D. (2014). Mining of Massive Datasets. Cambridge University Press.
- Tan, P, Steinbach, M, Karpatne, A, Kumar, V. (2019), Introduction to Data Mining, 2nd Edition
- Murphy, K. P. Machine Learning: A Probabilistic Perspective. The MIT Press.
Assessment
Two-hour examination (50%), coursework (50%) - MEng students
Two-hour examination (40%), coursework (60%) - MSc students
Teaching
20 one-hour lectures
8 one-hour labs
5 one-hour seminars