Lecturer: Nick Hine
Weighting: 7.5 CATS
The aim of this module is to complete your training in the use of computers by exploring the use of super-computers to solve super problems. The module teaches how to write scalable, portable programs for parallel computer systems and explores how large-scale physics problems are tackled. This module is 100% continuously assessed (there is no examination).
To explain the methods used in computer simulations and data analysis on high performance computers, for research in all fields of computational physics and other sciences.
At the end of this module you should be able to:
- Identify and correct common inefficiencies in serial scientific computer codes
- Choose an appropriate programming paradigm for a particular problem or hardware architecture
- Write a parallel program using shared-memory or message passing constructs in a physics context
- Write a simple GPU-accelerated program
- Identify sources of performance bottlenecks in parallel computer programs and understand how these relate to the computer architecture
- Use batch systems to access parallel computing hardware and to validate the correctness of a parallel computer program vs equivalent serial software
Programming for efficiency. Modern cache architectures and CPU pipelining. Avoiding expensive and repeated operations. Compiler optimisation flags. Profiling with gprof.
Introduction to parallel computing. Modern HPC hardware and parallelisation strategies. Applications in physics: super problems need super-computers.
Shared memory programming. The OpenMP standard. Parallelisation using compiler directives. Threading and variable types. Loop and sections constructs. Program correctness and reproducibility. Scheduling and false sharing as factors influencing performance.
Distributed memory programming. The MPI standard for message passing. Point-to-point and collective communication. Synchronous vs asynchronous communication. MPI communicators and topologies.
GPU programming. CUDA vs OpenCL. Kernels and host-device communication. Shared and constant memory, synchronicity and performance. GPU coding restrictions.
Limitations to parallel performance. Strong vs weak scaling. Amdahl’s law. Network contention in modern many-core architectures. Mixed mode OpenMP+MPI programming.
Commitment: 15 Lectures + 5 Laboratory Sessions
Assessment: Assignments (100%)
Recommended Texts: R Chandra et al., Parallel Programming in OpenMP, Morgan Kaufmann
P Pacheco, Parallel Programming with MPI, Morgan Kaufmann
M Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill
D Kirk and W Hwu, Programming Massively Parallel Processors, Elsevier
This module has its own website where lecture notes and other resources are available.
Leads from: PX390 Scientific Computing. A good working knowledge of a scientific programming language, preferably C, is essential.