Data Science and Machine Learning: The Fundamentals | Short course at the Summer School at Warwick University
The most important aspect of computer science is problem solving, an essential skill for life.
Data Science is concerned with how to gain knowledge from the vast volumes of data generated daily in modern life, from social networks to scientific research and finance, and proposes sophisticated computing techniques for processing this deluge of information. In parallel, Machine Learning is concerned with the development of analytical models and algorithms to learn from data and make accurate predictions.
This course addresses fundamental aspects of Data Science and Machine Learning, e.g., analytical models to represent and understand the data, efficient algorithms to manipulate and extract relevant knowledge, and corresponding models to understand their overall performance and limitations.
In particular, students study the design, development and analysis of software and hardware used to solve problems in a variety of business, scientific and social contexts. During this course, students will study techniques for going from raw data to a deeper understanding of the patterns and structures within the data, in order to support predictions and decision making. Students are expected to have some basic knowledge of linear algebra and calculus.
Key Information
Level: Introductory to intermediate
Fees: Please see Fees page
Teaching: 60 hours
Expected independent study: 90 hours
Optional assessment: Dependent on course
Typical credit: 3-4 credits (US), 7.5 ECTS points (EU); please check with your home institution
Data Analytics involves being able to go from raw data to a deeper understanding of the patterns and structures within the data, to support making predictions and decision making. The course will cover a number of topics, including:
Introduction to Data Science and Machine Learning: motivating successful analytic examples (Walmart, Google, and Twitter), introducing Supervised, Unsupervised, and Reinforcement Learning, measuring performance / regret, fundamental limits (the No Free Lunch theorem and bias);
Probability recap, e.g., sample spaces, random variables, distributions, heavy-tails, quantiles, Q-Q plots, Bayes, correlation;
Statistics recap, e.g., hypothesis testing, chi-square distributions, density estimation (MoM and MLEs), confidence intervals and application to voting;
Stochastic bandits as a fundamental example of Reinforcement Learning: naïve Explore-then-Exploit strategy and UCB bounds;
Regression: linear regression, least squares, logistic regression - Predicting new data values via regression models. Simple linear regression over low dimensional data, regression for higher dimensional data via least squares optimization, logistic regression for categoric data;
Matrices: Linear Algebra, SVD, PCA - Matrices to represent relations between data, and necessary linear algebraic operations on matrices. Approximately representing matrices by decompositions (Singular Value Decomposition and Principal Components Analysis). Application to the Netflix prize;
Classification: Trees, NB, Support Vector Machines, Kernel Trick - Building models to classify new data instances. Decision tree approaches and Naive Bayes classifiers. The Support Vector Machines model and use of Kernels to produce separable data and non-linear classification boundaries. The Weka toolkit;
Clustering: hierarchical, k-means, k-center - Finding clusters in data via different approaches. Choosing distance metrics. Different clustering approaches: hierarchical agglomerative clustering, k-means (Lloyd's algorithm), k-center approximations. Relative merits of each method;
Basic tools: command line tools, plotting tools, programming tools - The wide variety of tools available to work with data, including unix/linux command line tools for data manipulation (sorting, counting, reformatting, aggregating, joining); tools such as gnuplot for displaying and visualizing data;
A number of hands-on exercises involving real data and to be solved in either the Weka toolkit, Python, or R.
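To give a flavour of one of the listed topics, k-means clustering via Lloyd's algorithm can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from the course materials: the function name `lloyd_kmeans`, its parameters and the toy data are assumptions made for this example.

```python
import random


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Sketch of Lloyd's algorithm for k-means on a list of 2-D points."""
    rng = random.Random(seed)
    # Initialise the centres at k randomly chosen data points.
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centre
        # (squared Euclidean distance; ties go to the lower index).
        clusters = [[] for _ in range(k)]
        for px, py in points:
            nearest = min(
                range(k),
                key=lambda j: (px - centers[j][0]) ** 2 + (py - centers[j][1]) ** 2,
            )
            clusters[nearest].append((px, py))
        # Update step: move each centre to the mean of its cluster.
        new_centers = []
        for j, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centers.append((mean_x, mean_y))
            else:
                # Keep the old centre if no point was assigned to it.
                new_centers.append(centers[j])
        # Stop once the centres no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers = lloyd_kmeans(points, k=2)
print(sorted(centers))
```

On the toy data the two centres settle at the middle of each group. On real data the result depends on the initial centres, which is one reason the course compares k-means with hierarchical and k-center approaches.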
Course Aims
To understand the foundational skills in data analytics, including preparing and working with data; abstracting and modelling an analytic question; and using tools from statistics, learning and mining to address the question.
Learning Outcomes
By the end of the module, the student should be able to:
Understand the basic mathematical models for large data sets.
Understand the principles and purposes of data analytics, and articulate the different dimensions of the area.
Work with and manipulate a data set to extract statistics and features, coping with missing and dirty data.
Apply basic data mining and machine learning techniques to build a classifier or regression model, and predict values for new examples.
Course Structure
For this course, there will be 4 hours of teaching on most weekdays, comprising lectures and small group teaching. The structure will be:
3 hours of lectures.
A 1 hour seminar in small groups.
Students will also be given time each day for independent study. Towards the end of the third week, students will also be provided with time for revision.
Due to the intensive nature of the Warwick Summer School, students are expected to attend all timetabled teaching activity, to engage with the teaching material and to spend an average of 2-3 hours per day in self-guided study.
Students registered for Warwick Summer School must attain a minimum of 75% authorised attendance of the timetabled teaching activity. If students fall short of this threshold, they will not be awarded a certificate of completion and they will not be able to sit the exam/submit final assessment.
Authorised attendance includes any sickness absence during the course of the programme that has been properly notified and recorded. An attendance register will be taken at the beginning of each teaching session, so students need to ensure they arrive on time in order to be marked as attended.
Course Assessment
The module will be assessed via a 2-hour examination. The exam is not compulsory: everyone who completes the course, whether or not they sit the exam, will receive a certificate of attendance.
Students who choose to sit the final exam will receive a transcript of results. The transcript of results will state the name of the student, the course studied, the exam mark and the grade. Transcripts are automatically sent to students via email by the middle of September. You should keep this safe in case you need it for credit transfer approval or for future reference. If you would like us to email a copy of your transcript directly to someone at your university who deals with credit transfer, please let us know.
The Summer School does not offer re-takes of examinations, whatever your result in the original examination.
Subject to changes, this is an overview of the timetable for Warwick Summer School students.
Week 1
Monday: 2 hour introductory lecture
Tuesday: 4 hours of teaching, 1 hour study group
Wednesday: 4 hours of teaching
Thursday: 4 hours of teaching, 1 hour study group
Friday: 4 hours of teaching
Week 2
Monday: 4 hours of teaching
Tuesday: 4 hours of teaching, 1 hour study group
Wednesday: No timetabled teaching activity
Thursday: 4 hours of teaching, 1 hour study group
Friday: Sports activities
Week 3
Monday: 4 hours of teaching
Tuesday: 4 hours of teaching, 1 hour study group
Wednesday: 4 hours of teaching
Thursday: 2 hour final assessment preparation, 1 hour study group
Friday: Final Assessment
In addition to the above schedule, students are expected to attend the Guest Lecture Series on Tuesday afternoons and dedicate approximately 88 hours to self-study throughout the programme. Social activities are also organised throughout the week, with optional trips running on Saturdays and the Wednesday in Week 2; for details, please visit our Social and Cultural page.
This course is open to students studying any discipline at University level provided they have basic knowledge of linear algebra and calculus. We welcome individuals from all backgrounds, including students who are currently studying another subject but who want to broaden their knowledge in another discipline.
Students must be aged 18 or over by the time the Summer School commences and have a good understanding of the English language.
"As I transitioned from the Warwick Summer School back to my studies and work, the skills and knowledge I gained during the three weeks proved invaluable. The Data Science course, particularly the SVM model and programming skills, has enabled me to analyse different sets of big data such as survey data during my studies and work.
Additionally, my improved English skills have significantly enhanced my communication abilities. My critical thinking, decision making and problem solving skills have been improved during the data science course as it involves solving complex problems and the application of theoretical knowledge to practical scenarios."