ST340 Programming for Data Science
Introductory description
This module runs in Term 2. It is available to students on a course where it is a listed option, and as an Unusual Option to students who have completed the prerequisite module ST221 Linear Statistical Modelling.
Module aims
To introduce students to algorithms suitable for the analysis of large datasets. In the modern world it is very easy to generate very large amounts of data. Capturing and exploiting the important information contained within such datasets poses a number of statistical challenges. It may not even be clear how much useful information the data contains. The module will cover a variety of algorithms developed to tackle some of these challenges.
Outline syllabus
This is an indicative module outline only, showing the sort of topics that may be covered. Actual sessions held may differ.
 Computational Complexity.
 Principal components analysis and singular value decomposition.
 Markov chains and PageRank.
 Clustering, EM algorithm.
 Bandit problems.
 Supervised learning, k-nearest neighbours.
 Supervised and unsupervised learning. Penalised regression.
 Support vector machines.
 Artificial neural networks.
 Gaussian processes.
 Parallel and distributed algorithms.
Learning outcomes
By the end of the module, students should be able to:
 Understand how to use a variety of practical algorithms when dealing with data analysis problems.
 Use R to implement data analysis algorithms.
 Interpret the output of various algorithms when applied to data sets.
Indicative reading list
Subject specific skills
Transferable skills
Study time
Type               Required                       Optional
Lectures           20 sessions of 1 hour (13%)    2 sessions of 1 hour
Practical classes  10 sessions of 1 hour (7%)
Private study      46 hours (31%)
Assessment         74 hours (49%)
Total              150 hours
Private study description
Weekly revision of lecture notes and materials, wider reading, practice exercises and preparing for examination.
Costs
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Assessment group C2
Weighting  Study time  

Assignment 3  17%  25 hours 
Due in Term 2 Week 10. 

Assignment 1  16%  24 hours 
Due in Term 2 Week 5. 

Assignment 2  17%  25 hours 
Due in Term 2 Week 8. 

On-campus Examination  50%  
The examination paper will contain four questions, of which the best marks of THREE questions will be used to calculate your grade.
Platforms: Moodle

Assessment group R1
Weighting  Study time  

Assignment  50%  
You will be asked to complete this assignment if you failed the module and you failed the coursework component of the original assessment. The reassessment will be similar in nature to the original assignments. 500 words is equivalent to one page of text, diagrams, formulae or equations; your assignment should not exceed 25 pages in length. 

Online Examination  50%  
The examination paper will contain four questions, of which the best marks of THREE questions will be used to calculate your grade.
Platforms: Moodle

Feedback on assessment
Marked assignments will be available for viewing at the support office within 20 working days of the submission deadline. Cohort-level feedback will be provided, and students will be given the opportunity to receive feedback via face-to-face meetings.
Cohort-level feedback will be provided for the examination.
