
Note

Here we don't provide a complete list of the available software. To check what is currently available, please also consult the output of

Code Block
languagebash
module avail


...

Name                   | Version  | module                          | Notes
gcc                    | 8.5.0    | -                               | system default
gcc                    | 12.1.0   | gcc/12.1.0                      | activated support for offloading on Nvidia GPUs (nvptx)
intel-oneapi-compilers | 2022.1.0 | intel-oneapi-compilers/2022.1.0 |
nvhpc                  | 22.3     | nvhpc/22.3                      | NVIDIA HPC Software Development Kit (SDK)
aocc                   | 3.2.0    | aocc/3.2.0                      | AMD Optimizing C/C++ and Fortran Compilers ("AOCC")
openmpi                | 4.1.3    | openmpi/4.1.3                   |
intel-oneapi-mpi       | 2021.6.0 | intel-oneapi-mpi/2021.6.0       |

Compiler options for optimization

...

  1. Do not use -xHost, because Intel does not "recognize" AMD (officially for security reasons). Use -xcore-avx2 instead.
  2. These compiler options were used by NEC during the FESOM2 benchmark:

    Code Block
    OPT = -O3 -qopt-report5 -no-prec-div -ip -fp-model=fast=2 -implicitnone -march=core-avx2 -fPIC -qopenmp -qopt-malloc-options=2 -qopt-prefetch=5 -unroll-aggressive
    These are (at least partially) quite important for good performance. However, we do not have the experience to say which ones are more or less critical. Be careful: some options might kill reproducibility (e.g., -fp-model=fast=2).
  3. Natalja is now responsible for this: https://docs.dkrz.de/doc/levante/running-jobs/runtime-settings.html#open-mpi-4-0-0-and-lat let's still benefit from her knowledge.
  4. Independent of the MPI used, please try the runtime setting
    UCX_TLS=knem,dc_x,self
    for your jobs (see the sketch after this list). According to NEC, for smaller jobs it might be beneficial to replace "dc_x" with "rc_x".
  5. When using OpenMPI, parallel IO (using OMPIO) might be slow. Try using

    export OMPI_MCA_io="romio321"
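
As a sketch, these runtime settings can be exported in a job script before launching the application (the srun call and application name below are placeholders; adjust them to your job and MPI):

Code Block
languagebash
# sketch: apply the suggested runtime settings in a job script
export UCX_TLS=knem,dc_x,self      # for smaller jobs, try rc_x instead of dc_x
export OMPI_MCA_io="romio321"      # only relevant when using OpenMPI parallel IO
srun ./my_application              # my_application is a placeholder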

Spack

On albedo we mainly use spack to install software and provide module files.

...

We provide Python modules, with only basic Python installed, for the currently supported Python versions 3.7 through 3.10. Further packages you might need on top of this can be installed via standard methods, e.g. pip.
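
For example, a minimal sketch (the module name is an assumption, check module avail for the exact name; xarray is just an example package):

Code Block
languagebash
module load python                    # assumed module name, check `module avail`
python -m pip install --user xarray   # example package, installed under your home directory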

Additionally, we provide a toolbox of common data analysis tools in both Python and R. This module is available under analysis_toolbox and is updated every 3 months. This module is actually a conda environment which simply sets the correct shell variables (e.g. $PATH, $PYTHONPATH, and similar) for you. You can also use it as a starting location to create your own environments. A list of currently installed Python and R tools in this module may be found under /albedo/soft/install_templates/conda-workspace/plotting/env-ymls/analysis-toolbox-03.2023.yml.
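
For example, a sketch of how that definition file could serve as a starting point for an environment of your own (my_toolbox is a hypothetical name):

Code Block
languagebash
module load conda
# create a personal environment from the toolbox definition file
conda env create --name my_toolbox -f /albedo/soft/install_templates/conda-workspace/plotting/env-ymls/analysis-toolbox-03.2023.yml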

If you would like additional libraries in this analysis toolbox, just ask! They will be added whenever the next version is released (January, March, June, and September).

R

Similar to Python, the basic R modules are included in the analysis_toolbox. Currently, there is no R-Studio available, but we are planning to install it in the September-2023 update of the analysis_toolbox. Be aware that we still recommend against using graphical interfaces, since that is not what an HPC system is designed for. Our recommended workflow for using R is either an interactive session with the R command line interface, or Rscript with an sbatch script:

Interactive session with R command line interface

From Albedo's login node:

Code Block
languagebash
salloc --account=<your_account> --time=<HH:MM:SS> --qos=<QOS> --nodes=<#Nodes> <other_options...>

To understand which options you need to specify please refer to the Jobs section on Albedo-Slurm and the SLURM user guide.

Code Block
languagebash
module load analysis-toolbox
R

The R command line will open and you can start using R from there.

Rscript and sbatch script

From Albedo's login node:

Code Block
languagebash
module load analysis-toolbox

The command Rscript allows you to run an R script from the shell (outside of an R IDE or R-Studio). This means that you can also write an sbatch script (see Albedo-Slurm > Jobs) that runs your R scripts via Rscript and submit it to the slurm queue using the sbatch command. You can even use the slurm array feature to launch the same R script multiple times in parallel, if what your script does can be broken into multiple computing chunks. Depending on your R script this might need more or fewer changes, but it is probably worth spending the time on changing it, to benefit from this first-order parallelization (see the sketch below). For a simple example of slurm array + Rscript, the following tutorial covers most of it: https://rcpedia.stanford.edu/topicGuides/jobArrayRExample.html
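
A minimal sketch of such an sbatch script, assuming a hypothetical my_script.R that selects its computing chunk from its first argument (account, time, and array range are placeholders):

Code Block
languagebash
#!/bin/bash
#SBATCH --account=<your_account>   # placeholder
#SBATCH --time=00:30:00            # placeholder
#SBATCH --array=1-10               # one task per computing chunk

module load analysis-toolbox
# my_script.R is hypothetical; it should pick its chunk based on the task id
Rscript my_script.R "$SLURM_ARRAY_TASK_ID"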

Conda

Conda is a package manager for Python, R, and Julia software. You can use it on our HPC system by:

...

Jupyter is an interactive computing environment which lets you execute notebooks that can mix code, text, graphics, and LaTeX in a single document. To start a local Jupyter server, ensure that you have Jupyter installed in your currently activated conda environment, and then run: jupyter lab. The different ways to use Jupyter from Albedo are listed below.
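
As a minimal sketch, assuming a conda environment of your own (jupyterlab is the package that provides the jupyter lab command):

Code Block
languagebash
conda activate <your_conda_environment>
conda install jupyterlab   # only needed if Jupyter is not yet installed
jupyter lab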

JupyterHub

See Jupyterhub on Albedo.

JupyterLab from a login node

Please read this section to the end; the last step is really important for things to work. Load the analysis-toolbox:

Code Block
[mandresm@albedo1:~]$ module load conda
[mandresm@albedo1:~]$ module load analysis-toolbox
[mandresm@albedo1:~]$ jupyter notebook --no-browser --ip=0.0.0.0

...

The printed output will direct you to a website where you can then open up the Jupyter interface. 

...


...
[I 15:26:36.310 NotebookApp] Jupyter Notebook 6.5.3 is running at:
[I 15:26:36.310 NotebookApp] http://albedo1:8891/?token=asdasdads
[I 15:26:36.310 NotebookApp]  or http://127.0.0.1:8891/?token=asdasdasd
[I 15:26:36.310 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 15:26:36.313 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///albedo/home/mandresm/.local/share/jupyter/runtime/nbserver-3890270-open.html
    Or copy and paste one of these URLs:
        http://albedo1:8891/?token=asdasdaas
     or http://127.0.0.1:8891/?token=asdasda

On your local machine, paste the URL containing albedo0 or albedo1 into your browser, replacing albedo0 or albedo1 with albedo0.dmawi.de or albedo1.dmawi.de, respectively. For the example above, the address for the browser would be http://albedo1.dmawi.de:8891/?token=asdasdaas

JupyterLab from a COMPUTE or a GPU node

This example covers how to request a GPU node; doing the same with a COMPUTE node is almost identical, you will just need to remove the GPU parts from the salloc call.

Background: By default, Jupyter uses /run/user/<uid> as the default directory for small files like notebook_cookie_secret. If you log in via ssh, /run/user/<uid> is created, and it is removed when you close your last login session on the computer. However, if you enter a node via Slurm sbatch, salloc, or srun, /run/user/<uid> is not available. Setting XDG_RUNTIME_DIR points Jupyter to a different path.

Code Block
mandresm@albedo1:~$ salloc --partition=gpu --gpus=1 -A computing.computing --time=00:30:00
salloc: Pending job allocation 6526219
salloc: job 6526219 queued and waiting for resources
salloc: job 6526219 has been allocated resources
salloc: Granted job allocation 6526219
salloc: Waiting for resource configuration
salloc: Nodes gpu-001 are ready for job


mandresm@gpu-001:~$ export XDG_RUNTIME_DIR="/tmp/tmp_$SLURM_JOBID"
mandresm@gpu-001:~$ module load conda
mandresm@gpu-001:~$ module load analysis-toolbox
mandresm@gpu-001:~$ jupyter notebook --no-browser --ip=0.0.0.0
...
[I 15:37:11.953 NotebookApp] Serving notebooks from local directory: /albedo/home/mandresm
[I 15:37:11.953 NotebookApp] Jupyter Notebook 6.5.3 is running at:
[I 15:37:11.953 NotebookApp] http://gpu-001:8888/?token=123
[I 15:37:11.953 NotebookApp]  or http://127.0.0.1:8888/?token=123
[I 15:37:11.953 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 15:37:11.958 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///albedo/home/mandresm/.local/share/jupyter/runtime/nbserver-698858-open.html
    Or copy and paste one of these URLs:
        http://gpu-001:8888/?token=123
     or http://127.0.0.1:8888/?token=123

Now we have to establish an SSH tunnel from your PC to the compute node, in this case gpu-001, to forward the Jupyter notebook. Check the port number: usually 8888 for jupyter notebook, but it might differ. Open a new local terminal and execute:

Code Block
mandresm@blik0256:~$ ssh -NL localhost:8888:gpu-001:8888 mandresm@albedo1.dmawi.de

If you have ssh configured to connect to Albedo automatically, the process will be idling at this point. If you are asked for your password, enter it; again, there will be no output. You don't need to do anything here anymore, just leave this local terminal open.

Now copy the address that looks like http://gpu-001:8888/?token=123, paste it into your browser, and substitute gpu-001 with 0.0.0.0. Jupyter Lab should open now!

How to use your own conda environment in JupyterHub/Lab? → Add a Jupyter kernel

Python Notebook

To use your own conda environment in JupyterHub/Lab and access it via a kernel there, make sure your conda environment already has ipykernel:

Code Block
languagebash
conda activate <your_conda_environment>
conda install ipykernel

Then install your environment as an ipykernel:

Code Block
languagebash
conda activate <your_conda_environment>
python -m ipykernel install --user --name=<environment_name>

Refresh Jupyter in your browser; you should now have your environment available as a kernel.

R Notebook

To create an R kernel for your own instance of Jupyter, load the R module:

Code Block
languagebash
module load r

Then install the kernel with IRkernel, making sure you name it something other than simply "R" (e.g. "my_R_kernel"):

Code Block
languagebash
Rscript -e 'IRkernel::installspec(name="<your_R_kernel>", displayname="<your_R_kernel>")'

Refresh Jupyter in your browser; you should now have your environment available as a kernel. Whatever you had installed in your instance of R should be available via that kernel's notebooks.

Singularity

Warning
Still under construction

Singularity support is still under construction!

...

Important note about building containers from scratch: Building requires root privileges! However, the generated container files are portable and can be copied from (e.g.) your personal laptop to the HPC system. Alternatively, you can consider using the "remote builder" hosted at Sylabs.io: https://cloud.sylabs.io/builder
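
As a sketch, both variants look like this (my_container.def and my_container.sif are hypothetical file names):

Code Block
languagebash
sudo singularity build my_container.sif my_container.def       # locally, requires root
singularity build --remote my_container.sif my_container.def   # via the Sylabs remote builder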

Matlab

Currently the most recent version (R2022b) of Matlab is available on albedo.

...

Please use Matlab on the compute nodes fat-003 and fat-004, which are accessed via the slurm partition matlab.

In addition, the Live Editor features are not yet working properly; for the time being, please use an external editor to create or modify MATLAB scripts.

Note

Currently the nodes fat-00[3,4] are reserved for Matlab users to activate their personal licenses. These nodes can be accessed via the slurm partition matlab.

Please note that this might change again in the future. We will keep you informed!

The necessary resources may be allocated by the slurm command initializing your Matlab session. Please adjust the example

Code Block
salloc -p matlab --x11 --mem=8GB --time=2:00:00 --qos=12h

to your needs (and append your account if necessary). Since the nodes are shared among all Matlab users, there are limits in place with respect to CPU and memory usage.

Please activate your personal license if you have one. This is done just like on any other platform by running

...

(after loading the module). A GUI will guide you through the necessary steps for activating your personal license. Please note that you need to activate a license on each node you would like to use Matlab on (by using --nodelist=fat-003 or fat-004).


OpenCV


If you use OpenCV, you might need the following before running your script:

Code Block
$ module load mesa



IDL/ENVI

We have received reports that idlde, the graphical interface for IDL, might crash because it may require more virtual memory than we allow. The IDL support provided the following solution:

I have shared your case with the team and would like to share the additional information:
1) First of all, when using older IDL version such as IDL 8.6, you can try to start IDLDE with the below steps to try to workaround the issue:
To be sure everything works fine, please first delete your ".idl" folder which can be found in the user home directory: "/home/user/"
Then run your IDLDE session with the following command:   " idlde -outofprocess "
(This will separate the java process which is running in the background.)

2) The virtual memory is somewhat alarming, but the good thing is that we are sure it’s not actually using that much memory.
However, that can still cause problems, like you are currently experiencing.

First of all, can you please confirm that you're not using an IDL startup script that is pre-allocating a bunch of array space? 

In addition, we noticed that you are using a non-standard system and it might be due to the used kernel and glibc version.
Theoretically, given the right kernel and glibc version, IDL should be able to run.
But you might need to do some tweaking (like that environment variable below) to get IDL to run properly on this specific OS.
We have found a thread on the web, which says that it isn’t Eclipse but could be related to glibc: https://www.eclipse.org/forums/index.php/t/1082034/

They recommend setting some flag, MALLOC_ARENA_MAX=4.
We have never heard of that but it could be worth trying on this specific system.
Here is another thread that also mentions that same environment variable: https://stackoverflow.com/questions/561245/virtual-memory-usage-from-java-under-linux-too-much-memory-used

Panoply

Panoply plots geo-referenced and other arrays from netCDF, HDF, GRIB, and other datasets. To use its graphical interface, make sure you log in to albedo via ssh with X forwarding (ssh -X ...). Then run the following commands:

Code Block
module load panoply
panoply.sh

uftp-client

uftp is a parallel data transfer tool that uses multiple streams to transfer large amounts of data efficiently between different systems. The tool has two main components: a server and a client. Albedo does not have a uftp server, but other HPC centers do (e.g. DKRZ and Jülich), which means you can efficiently transfer data between Albedo and these other HPC centers using Albedo's uftp-client. To do so, follow these steps:

  1. Follow the steps to register an ssh-key in the target uftp-server, and read the full uftp-server documentation:
    1. For data transfers from/to Levante: https://docs.dkrz.de/doc/levante/data-transfer/uftp.html
    2. For data transfers from/to Jülich: https://apps.fz-juelich.de/jsc/hps/judac/uftp.html
  2. Load the module on Albedo:
    Code Block
    module load uftp-client
  3. Use the commands referred to in the documentation above (and the uftp documentation) to transfer your data; a rough sketch follows below.
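
As a rough sketch only (the authentication URL, username, key, and paths are all placeholders; take the real values from the uftp documentation of the target center linked above):

Code Block
languagebash
module load uftp-client
# copy a remote file to the current directory; all values in <...> are placeholders
uftp cp -u <remote_user> -i ~/.ssh/<registered_key> \
  https://<auth-server>/rest/auth/<SERVER>:/path/to/remote/file .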

OpenFOAM

OpenFOAM is installed on Albedo and available as a module:

Code Block
$ module avail
$ module load openfoam/<version>


To install an external OpenFOAM library/solver you'd need to load openfoam from spack instead, so that the environment and the dependencies are set up for you automatically:

Code Block
languagebash
$ module purge
$ module load spack
$ spack load openfoam@<version>

Then follow the installation instructions of the library you are trying to install. These can involve running a makefile. In that case, make sure you set up the necessary environment variables so that building happens in a directory where you have write access; otherwise, you will end up with an error similar to:

Code Block
languagebash
mkdir: cannot create directory ‘’: No such file or directory                                                                                                                                                                    
make: *** [/albedo/soft/sw/spack-sw/openfoam/2112-u257d6d/wmake/makefiles/general:182: /libhydrology.so] Error 1

This is an example of how to solve this problem for the hydrology library:

Code Block
$ export FOAM_USER_LIBBIN=/path/with/write/permissions/lib # I found these variables in the Allwmake make file of the hydrology package
$ export FOAM_USER_APPBIN=/path/with/write/permissions/bin
$ ./Allwmake