
CY901 High Performance Scientific Computing

Current Lecturers:

P. Mark Rodger
David Quigley

Current Course Homepage:

Academic Year 2014-15


This module responds to the rapid increase in recent years in the use of computer simulation and data analysis on high performance computers for research in all fields of scientific computing.

Learning Outcomes:

  • Ability to identify and correct common inefficiencies in both serial and parallel scientific computer codes.

  • Understanding of the concepts of shared and distributed memory programming. Ability to choose an appropriate programming paradigm for a particular problem or hardware architecture.
  • Ability to write a parallel program using shared memory or message passing constructs in a Physics context. Ability to write a simple GPU accelerated program.

  • Understanding of the sources of performance bottlenecks in parallel computer programs and how these relate to the basics of computing architecture.

  • Ability to use batch systems to access parallel computing hardware, and to validate the correctness of a parallel computer program against equivalent serial software.


Support for this course is provided by a web-based bulletin board. Please do not email the lecturer directly with questions or requests for help; submit them to this forum instead, so that everyone can see (and answer) them and so that we have an archive of the questions raised. All submissions to the bulletin board are also emailed to the lecturer, so she/he will see them all.


  1. Programming for efficiency (1 lecture). Modern cache architectures and CPU pipelining. Avoiding expensive and repeated operations. Compiler optimisation flags. Profiling with gprof.
  2. Introduction to parallel computing (1 lecture). Modern HPC hardware and parallelisation strategies. Applications in Physics: super problems need super-computers.
  3. Shared memory programming (5 lectures). The OpenMP standard. Parallelisation using compiler directives. Threading and variable types. Loop and sections constructs. Program correctness and reproducibility. Scheduling and false sharing as factors influencing performance.
  4. Distributed memory programming (5 lectures). The MPI standard for message passing. Point-to-point and collective communication. Synchronous vs asynchronous communication. MPI communicators and topologies.
  5. GPU programming (1 lecture). CUDA vs OpenCL. Kernels and host-device communication. Shared and constant memory, synchronicity and performance. GPU coding restrictions.
  6. Limitations to parallel performance (2 lectures). Strong vs weak scaling. Amdahl’s law. Network contention in modern many-core architectures. Mixed mode OpenMP+MPI programming.


A good working knowledge of a scientific programming language (either Fortran 95/2003 or C), as taught, for example, in PX250 Fortran Programming for Scientists, is a prerequisite.