CY901 High Performance Scientific Computing

Current Lecturers:

Nicholas Hine
David Quigley

This module shares lectures and assignments with PX425 - High Performance Computing in Physics.

CY901 students are set an additional assignment in term 2 by the MSc Course Director.


This module addresses the rapid increase in recent years in the use of high performance computers for simulation and data analysis, in research across all fields of scientific computing.

Learning Outcomes:

  • Ability to identify and correct common inefficiencies in both serial and parallel scientific computer codes.

  • Understanding of the concepts of shared and distributed memory programming. Ability to choose an appropriate programming paradigm for a particular problem or hardware architecture.
  • Ability to write a parallel program using shared memory or message passing constructs in a Physics context. Ability to write a simple GPU accelerated program.

  • Understanding of the sources of performance bottlenecks in parallel computer programs and how these relate to the basics of computing architecture.

  • Ability to use batch systems to access parallel computing hardware. Ability to validate the correctness of a parallel computer program against equivalent serial software.


  1. Programming for efficiency (1 lecture). Modern cache architectures and CPU pipelining. Avoiding expensive and repeated operations. Compiler optimisation flags. Profiling with gprof.
  2. Introduction to parallel computing (1 lecture). Modern HPC hardware and parallelisation strategies. Applications in Physics: super problems need super-computers.
  3. Shared memory programming (5 lectures). The OpenMP standard. Parallelisation using compiler directives. Threading and variable types. Loop and sections constructs. Program correctness and reproducibility. Scheduling and false sharing as factors influencing performance.
  4. Distributed memory programming (5 lectures). The MPI standard for message passing. Point-to-point and collective communication. Synchronous vs asynchronous communication. MPI communicators and topologies.
  5. GPU programming (1 lecture). CUDA vs OpenCL. Kernels and host-device communication. Shared and constant memory, synchronicity and performance. GPU coding restrictions.
  6. Limitations to parallel performance (2 lectures). Strong vs weak scaling. Amdahl’s law. Network contention in modern many-core architectures. Mixed mode OpenMP+MPI programming.


A good working knowledge of a scientific programming language (either Fortran 95/2003 or C), as taught, for example, in PX250 Fortran Programming for Scientists, is a prerequisite.