HPC for Data Science video lecture series
Warwick Research Software Engineering and The Alan Turing Institute are pleased to present the HPC for Data Science video lecture series. The series covers many aspects of using HPC for data science: how to live in the world of HPC clusters, how to move code between different machines using Singularity containers, and how to write GPU-accelerated code that can help your code keep up with fast libraries.
This course takes the form of roughly 2 hours of Core video lectures and 4 hours of additional Skills videos. The Core section plus exercises constitutes a 1-day workshop. Your selection of 2 or 3 of the Skills videos plus exercises constitutes a second day. An optional 3rd day can be used to cover the remaining Skills. We suggest watching the videos at your own pace - pausing where necessary to look things up, practice, or get feedback. We offer some support via email or forum and will have a live support session in December 2020 - see Support below for details.
At the end of the workshop, there is a quiz and some further wrap-up exercises to take you forward with your new skills.
Terminology
A few quick terminology refreshers before we kick off!
HPC - High Performance Computing - Computing that's too big, too hard, or too specialized for your laptop or desktop computer.
Cluster - A set of computers, usually with a fast network linking them.
Command line, terminal, console - Lets you run commands on a computer by typing them. Usually looks a bit like this:
UserName@example$>
SSH - A secure way to access a remote machine over the internet via the command line. If this is new to you, we have some videos here to get you started!
GPU - Graphics Processing Unit or graphics card. Usually helps your computer display images on screen, but can also do general computations, especially on large arrays. Some GPUs are now built only for calculation, such as the NVIDIA Tesla/Ampere series; these may be called GPGPUs (General Purpose GPUs).
Scheduler - The program on an HPC cluster which controls the use of processors, memory and other hardware. This is in charge of giving everybody fair access to resources without overloading the machines.
Container - A system to combine code and the libraries it depends on into a single thing that can be moved to different computers.
Video Tutorials
The links below take you to either a YouTube playlist or to the individual videos, with brief descriptions.
Core Videos
Introduction - what these videos are for and how to use them.
From Notebook to Script - Turning code in a Notebook into a script suitable for HPC facilities
Clusters, Queues and Modules - Understanding how to run programs on a cluster and how to access libraries and programs you might need
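As a rough illustration of the "From Notebook to Script" idea above, the sketch below shows one common shape for a script that started life as notebook cells. It is not taken from the videos: the function names (sum_of_squares, parse_args, main) and the --n option are hypothetical, chosen only for this example. The calculation lives in a plain function, the inputs come from the command line rather than from values edited in a cell, and a main guard lets the file be both run and imported.

```python
import argparse


def sum_of_squares(n):
    """The calculation itself: a pure function with no hidden notebook state."""
    return sum(i * i for i in range(n))


def parse_args(argv=None):
    """Read inputs from the command line instead of editing a notebook cell."""
    parser = argparse.ArgumentParser(description="Sum the squares of 0..n-1.")
    parser.add_argument("--n", type=int, default=10,
                        help="how many integers to include (hypothetical option)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    result = sum_of_squares(args.n)
    # Print rather than rely on a cell's displayed value, so the result
    # ends up in the job's output file when run non-interactively.
    print(f"n={args.n} sum_of_squares={result}")
    return result


if __name__ == "__main__":
    main()
```

Saved under a name of your choice, a script like this can be run without a browser or an interactive session, which is what a cluster job needs.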
Skills Videos
Checkpoints and Batches - Writing code that can be stopped and restarted, and dealing with limited runtimes on clusters
Containers - Using containers to portably package software
Numba - Using the Numba Python library to speed up your Python code
High Performance Libraries - An introduction to using external libraries to speed up parts of a code
GPUs - Using GPUs in Python to offload some of your program's work
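To give a flavour of the "Checkpoints and Batches" topic above, here is a minimal sketch of code that can be stopped and restarted. It is an illustration under our own assumptions, not the approach used in the videos: the helper names (load_checkpoint, save_checkpoint, run), the JSON file format and the stop_after argument (which simulates a job hitting its runtime limit) are all hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the old checkpoint,
    so a job killed mid-write does not leave a half-written file behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_steps, stop_after=None):
    """Sum the integers 0..n_steps-1, checkpointing after every step.

    stop_after (hypothetical) simulates the job being stopped part way
    through; in that case the function returns None.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None
        state["total"] += state["next_step"]
        state["next_step"] += 1
        save_checkpoint(path, state)
        done_this_run += 1
    return state["total"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    print(run(ckpt, 10, stop_after=4))  # stopped early: prints None
    print(run(ckpt, 10))                # resumes from step 4: prints 45
```

Checkpointing after every step keeps the example short; a real job would more likely save every few minutes or every N steps, since writing to disk has a cost.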
Exercises and Assessment
We've provided some example code and some suggestions for how to get started with using the skills you've learned. You can get the examples from our GitHub page here, and the instructions are linked below as PDFs.
Getting Started
A walkthrough of how to get started, plus some basic background knowledge -
Core
Clusters, Queues and Modules -
Skills
Numba -
GPUs - For this section we use Jupyter notebooks, which are available at the GitHub link above.
Final
Consolidation Questions document
ANSWERS to the Questions
Support
For support or questions on the contents of the videos, you can contact us at rse{@}warwick.ac.uk or post on our forum.
If you have questions about the material, or want feedback or more discussion, we will be running a remote support session on Monday December 14th, where we can offer guidance or discuss any questions from the course or beyond. This session will give priority to attendees from The Alan Turing Institute. Researchers or students at Warwick can sign up for the support session via SkillsForge (for PhD students, course code RSE9) or here.
A feedback form is provided at the top of this page (or at this link) and we appreciate any comments you want to submit so that we can make this course better!