...

To achieve high performance on an HPC system, a computing task needs to be broken down into smaller pieces. These pieces can then be distributed across multiple computing nodes (or cores/threads/devices...), which scales up the speed of the computation, IF everything is configured and programmed correctly, and IF the problem you are trying to solve scales well. This concept of dividing a computing task into smaller pieces that are then distributed to a number of devices (nodes, CPUs, cores, threads, NUMA domains, GPUs, ...) is known as parallelization. To use the full potential of any HPC system, you need to parallelize your code in one way or another.

One simple example of parallelization is a loop in which each iteration is independent of the others. An easy way to parallelize a simple script with such a loop is to break the loop into smaller subloops (even single iterations). You can then submit that script with different inputs, triggering the different subloops, using SLURM Job Arrays, as sketched below. Here is a very simple example of how to do this for a Python script (https://rcpedia.stanford.edu/topicGuides/jobArrayPythonExample.html), but there are many more examples online if you search for "SLURM Job Arrays".
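
As a minimal sketch (the script name my_script.py and its --chunk argument are placeholders, and the array size is arbitrary), a job array submission script could look like this. Each array task runs the same script and uses the SLURM_ARRAY_TASK_ID environment variable to select its own subloop:

    #!/bin/bash
    #SBATCH --job-name=array_example
    #SBATCH --array=0-9        # launch 10 independent tasks, IDs 0..9
    #SBATCH --ntasks=1         # each array task is a single-task job
    #SBATCH --time=00:10:00

    # Each task receives its own ID and processes only its chunk of the loop
    python my_script.py --chunk "${SLURM_ARRAY_TASK_ID}"

Submitted once with sbatch, this launches 10 jobs that the scheduler can run concurrently on whatever resources are free.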

Another way of parallelizing code (for example FORTRAN, C, C++, ...) is to use parallelization standards such as MPI, OpenMP, OpenACC, ...
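
For instance, a loop with independent iterations can be spread across the cores of a single node with one OpenMP directive. The following is a minimal sketch in C (the array size and the computation inside the loop are arbitrary), not a full program from this documentation:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];

        /* Each thread executes a disjoint subset of the iterations;
           this is safe because the iterations are independent. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = 2.0 * i;
        }

        printf("a[N-1] = %f (computed with up to %d threads)\n",
               a[N - 1], omp_get_max_threads());
        return 0;
    }

Compiled with OpenMP support (e.g. gcc -fopenmp), the loop's iterations are divided among the threads OpenMP starts, typically one per core.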

...