ST340 Programming for Data Science
Please note that all lectures for Statistics modules taught in the 2022-23 academic year will be delivered on campus, and that the information below relates only to the hybrid teaching methods used in 2021-22 in response to Coronavirus. We will update the Additional Information (linked on the right-hand side of this page) before the start of the 2022-23 academic year.
Throughout the 2021-22 academic year, we will be adapting the way we teach and assess your modules in line with government guidance on social distancing and other protective measures in response to Coronavirus. Teaching will vary between online and on-campus delivery through the year, and you should read the additional information linked on the right-hand side of this page for details of how this will work for this module. The additional information supersedes the contact hours shown in the module information below. You can find out more about the University’s overall response to Coronavirus at: https://warwick.ac.uk/coronavirus.
All dates for assessments for Statistics modules, including coursework and examinations, can be found in the Statistics Assessment Handbook at http://go.warwick.ac.uk/STassessmenthandbook
You must register for this module using the pre-registration form linked on the right-hand side.
ST340-15 Programming for Data Science
Introductory description
This module runs in Term 2. It is available to students on courses where it is a listed option, and as an Unusual Option to students who have completed the prerequisite module ST221 Linear Statistical Modelling.
There is a cap on student numbers for this module and pre-registration is essential. Information about prioritisation and the pre-registration form can be found at http://go.warwick.ac.uk/ST340
Module aims
To introduce students to algorithms suitable for the analysis of large datasets. In the modern world it is very easy to generate very large amounts of data. Capturing and exploiting the important information contained within such datasets poses a number of statistical challenges. It may not even be clear how much useful information the data contain. The module will cover a variety of algorithms developed to tackle some of these challenges.
Outline syllabus
This outline is indicative only and shows the sort of topics that may be covered. Actual sessions may differ.
- Computational Complexity.
- Principal components analysis and singular value decomposition.
- Markov chains and PageRank.
- Clustering, EM algorithm.
- Bandit problems.
- Supervised learning, k-Nearest neighbours.
- Supervised and unsupervised learning. Penalised regression.
- Support vector machines.
- Artificial neural networks.
- Gaussian processes.
- Parallel and distributed algorithms.
Learning outcomes
By the end of the module, students should be able to:
- Understand how to use a variety of practical algorithms when dealing with data analysis problems.
- Use R to implement data analysis algorithms.
- Interpret the output of various algorithms when applied to data sets.
Indicative reading list
View reading list on Talis Aspire
Subject specific skills
TBC
Transferable skills
TBC
Study time
Type | Required | Optional |
---|---|---|
Lectures | 20 sessions of 1 hour (13%) | 2 sessions of 1 hour |
Practical classes | 10 sessions of 1 hour (7%) | |
Private study | 46 hours (31%) | |
Assessment | 74 hours (49%) | |
Total | 150 hours | |
Private study description
Weekly revision of lecture notes and materials, wider reading, practice exercises and preparing for examination.
Costs
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Assessment group C2
Component | Weighting | Study time | Details |
---|---|---|---|
Assignment 1 | 16% | 24 hours | Due in Term 2 Week 5. |
Assignment 2 | 17% | 25 hours | Due in Term 2 Week 8. |
Assignment 3 | 17% | 25 hours | Due in Term 2 Week 10. |
On-campus Examination | 50% | | The examination paper will contain four questions, of which the best marks of THREE questions will be used to calculate your grade. Platforms: Moodle. |
Assessment group R1
Component | Weighting | Study time | Details |
---|---|---|---|
Assignment | 50% | | You will be asked to complete this assignment if you failed the module and you failed the coursework component of the original assessment. The reassessment will be similar in nature to the original assignments. 500 words is equivalent to one page of text, diagrams, formulae or equations; your assignment should not exceed 25 pages in length. |
In-person Examination - Resit | 50% | | The examination paper will contain four questions, of which the best marks of THREE questions will be used to calculate your grade. Platforms: Moodle. |
Feedback on assessment
Marked assignments will be available for viewing at the support office within 20 working days of the submission deadline. Cohort level feedback will be provided, and students will be given the opportunity to receive feedback via face-to-face meetings.
Cohort level feedback will be provided for the examination.
Courses
This module is Optional for:
- UCSA-G4G1 Undergraduate Discrete Mathematics
  - Year 3 of G4G1 Discrete Mathematics
- Year 3 of UCSA-G4G3 Undergraduate Discrete Mathematics
- Year 4 of UCSA-G4G2 Undergraduate Discrete Mathematics with Intercalated Year
- USTA-G300 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics
  - Year 3 of G300 Mathematics, Operational Research, Statistics and Economics
  - Year 4 of G300 Mathematics, Operational Research, Statistics and Economics
This module is Option list A for:
- Year 3 of USTA-G304 Undergraduate Data Science (MSci)
- Year 4 of USTA-G303 Undergraduate Data Science (with Intercalated Year)
- USTA-G1G3 Undergraduate Mathematics and Statistics (BSc MMathStat)
  - Year 3 of G1G3 Mathematics and Statistics (BSc MMathStat)
  - Year 4 of G1G3 Mathematics and Statistics (BSc MMathStat)
- USTA-G1G4 Undergraduate Mathematics and Statistics (BSc MMathStat) (with Intercalated Year)
  - Year 4 of G1G4 Mathematics and Statistics (BSc MMathStat) (with Intercalated Year)
  - Year 5 of G1G4 Mathematics and Statistics (BSc MMathStat) (with Intercalated Year)
- USTA-GG14 Undergraduate Mathematics and Statistics (BSc)
  - Year 3 of GG14 Mathematics and Statistics
- Year 4 of USTA-GG17 Undergraduate Mathematics and Statistics (with Intercalated Year)
- USTA-Y602 Undergraduate Mathematics,Operational Research,Statistics and Economics
  - Year 3 of Y602 Mathematics,Operational Research,Stats,Economics
- Year 4 of USTA-Y603 Undergraduate Mathematics,Operational Research,Statistics,Economics (with Intercalated Year)
This module is Option list B for:
- USTA-G302 Undergraduate Data Science
  - Year 3 of G302 Data Science
This module is Option list D for:
- USTA-G300 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics
  - Year 4 of G30C Master of Maths, Op.Res, Stats & Economics (Operational Research and Statistics Stream)
- Year 5 of USTA-G301 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics (with Intercalated
This module is Option list E for:
- Year 4 of USTA-G300 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics
- Year 5 of USTA-G301 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics (with Intercalated
This module is Option list F for:
- Year 3 of USTA-G300 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics
- USTA-G301 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics (with Intercalated
  - Year 3 of G30H Master of Maths, Op.Res, Stats & Economics (Statistics with Mathematics Stream)
  - Year 4 of G30H Master of Maths, Op.Res, Stats & Economics (Statistics with Mathematics Stream)