Login
- You have to be a member of HPC_user (can be applied for on id.awi.de: Start a new request > IT Services > select High-Performance-Computing (HPC) > Add to cart). See HPC account for more info.
- The login nodes can be accessed via
ssh albedo0.
...
...
...
- for a basic introduction.
- Please do not use these login nodes for computing; use the compute nodes instead (
...
- and take a look at our hardware and slurm documentation)
- For security reasons, HPC resources are not accessible remotely (VPN is possible
...
- ).
- By using albedo you accept our HPC data policy
...
Support
You can open a support ticket on helpdesk.awi.de or by writing an email to hpc@awi.de. Please do not send a personal email to an admin!
Storage
Local user storage
- The local storage is a GxFS Storage Appliance from NEC, based on GPFS (https://en.wikipedia.org/wiki/GPFS).
- All nodes are connected via a 100 Gb OPA/Mellanox/InfiniBand network.
...
- You can ssh to a node where a job of yours is running, if (and only if) you have a valid ssh-key pair. (e.g. on a login node: ssh-keygen -t ed25519; ssh-copy-id albedo1)
Make sure your key is secured with a password!
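For example, a minimal sketch of this workflow (the node name is illustrative; take the real one from squeue's NODELIST column):
```
# On a login node: create a key pair (set a passphrase!) and install it for albedo
ssh-keygen -t ed25519
ssh-copy-id albedo1
# Find out where your job runs, then ssh to that node
squeue --me        # the NODELIST column shows the node(s) of your running jobs
ssh prod-001       # illustrative node name, use the one reported by squeue
```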
Software
- Albedo is running the operating system Rocky Linux release 8.6 (Green Obsidian).
- Slurm 22.05 is used as the job scheduling system. Important details on its configuration on Albedo are given here: Slurm.
- Details on the user software can be found here: Software.
Environment modules
On albedo we use environment modules to load/unload specific versions of software. Loading a module modifies environment variables so that the shell knows, for example, where to look for binaries.
You get an overview of all installed software by typing
```
module avail
```
To load and unload a module use
```
# load
module load <module>
# unload
module unload <loaded module>
```
Sometimes it might be useful to unload all loaded modules at once. This is done with
```
module purge
```
It is also possible to use the module command from some scripting languages. For example, in Python you can do:
```
$ python
Python 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 16:01:55)
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> module_python_init = "/usr/share/Modules/init/python.py"
>>> exec(open(module_python_init).read())
>>> result = module("list")
Currently Loaded Modulefiles:
 1) git/2.35.2   2) conda/22.9.0-2
>>> result is True
True
```
Usage of a node's internal NVMe storage
All compute (including fat and gpu) nodes have a local NVMe disk mounted as /tmp. The GPU nodes have an additional /scratch storage. See System overview for the exact sizes. We strongly encourage you to use this node-internal storage, which is faster than the global /albedo storage, if your job does lots of reading/writing. In particular, it might be beneficial to write your job output to the local disk and copy it to /albedo after your job has finished.
```
# Copy input data to the node where your main MPI (rank 0) task runs
rsync -ur $INPUT_DATA /tmp/
# If you need the input data on every node, you have to add `srun` in front of the copy command
srun
```
...
variable
30 €/TB/yr
...
variable
10 €/TB/yr
...
2x soft quota for 90 days
...
--
...
low latency, huge bandwidth
System storage
System-wide software is installed and maintained in /albedo/home/soft/:
- ./AWIsoft → binaries
- ./AWIbuild → sources
- ./AWImodules → additional/customized modules
If you need space here, please contact hpc@awi.de.
Remote user storage
- You can access your online space on the Isilon in Bremerhaven (see https://spaces.awi.de/x/a13-Eg for more information) via the mountpoints
/isibhv/projects
/isibhv/projects-noreplica
/isibhv/netscratch
/isibhv/platforms
/isibhv/home
- albedo is connected to the AWI backbone (including the Isilon) via four 100 Gb interfaces. Each individual albedo node has a 10 Gb interface.
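For example, data can be copied between the Isilon mounts and /albedo with standard tools (the project name and paths below are purely illustrative):
```
# Illustrative paths; replace <myproject> with your actual project directory
rsync -av /isibhv/projects/<myproject>/input/ /albedo/scratch/<myproject>/input/
```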
Compute nodes & Slurm
...
A submitted job needs the following information/resources (a complete example script is sketched further below):
...
-J or --job-name=
...
-N or --nodes=
...
-n or --ntasks=
--ntasks-per-node=
...
-c or --cpus-per-task=
...
Needed for OpenMP
If -n N and -c C are given, you get N x C cores.
...
Memory/RAM per CPU
...
--mem=
...
Maximum walltime
...
-t or --time=
...
-p or --partition=
...
Partitions
...
QOS
...
```
rsync -ur $INPUT_DATA /tmp/
# do the main calculation
srun $MY_GREAT_PROGRAM
# Copy your results from the node where the main MPI (rank 0) task runs to global storage
# If data is written on all nodes, start rsync using srun, as above
rsync -r /tmp/output/* /albedo/scratch/$MYPROJECT/output/
```
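Putting the options above together with the node-local /tmp staging, a batch script might look like the following sketch. Partition, QOS, module, and program names are placeholders, not actual albedo settings; check the Slurm page for the real values.
```
#!/bin/bash
#SBATCH --job-name=example          # -J
#SBATCH --nodes=1                   # -N
#SBATCH --ntasks=32                 # -n, number of MPI tasks
#SBATCH --cpus-per-task=1           # -c, cores per task; raise this for OpenMP/hybrid jobs
#SBATCH --mem=16G                   # memory
#SBATCH --time=01:00:00             # -t, maximum walltime
#SBATCH --partition=<partition>     # -p, see the Slurm page for available partitions
#SBATCH --qos=<qos>                 # see the Slurm page for available QOS

module load <module>                # load whatever your program needs

# For OpenMP/hybrid jobs: export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Stage input on the node-local NVMe, run, and copy results back (see above)
rsync -ur $INPUT_DATA /tmp/
srun $MY_GREAT_PROGRAM
rsync -r /tmp/output/* /albedo/scratch/$MYPROJECT/output/
```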
CPU, Memory, and Process Time Restrictions on a Login Node
On the login nodes albedo0 and albedo1, there are limits on what a process is allowed to do. Please note that the login nodes are not intended for computing and should be used for simple shell usage only! You get a total of 2048 processes (PIDs) and 9 logins.
Have a look at /etc/security/limits.conf for further details.
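A quick way to inspect the limits that apply to your shell (standard Linux commands, nothing albedo-specific):
```
ulimit -a        # all per-process limits of the current shell
ulimit -u        # maximum number of processes (PIDs)
cat /etc/security/limits.conf
```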
Monitoring
Files
- info.sh -f <file> shows if a file is on NVMe or HDD
Node usage monitoring
- Try info.sh -l to get the output of cat /proc/loadavg and vmstat -t -a -w -S M for all nodes where your jobs are running. Use info.sh -L to add the output of top -b -n1 -u$USER.
- ssh to a node (e.g. ssh prod-xyz) where a job of yours is running and try something like htop, top, or vmstat -t -a -w -S M 1
- Use info.sh -S to see running jobs and the resources used by finished Slurm jobs.
GPU monitoring
When using the GPUs you can monitor their usage with
```
ssh gpu-00[1-5]   # login
module load gpustat
gpustat -i1 --show-user --show-cmd -a
```
Alternatively, watch the output of nvidia-smi:
```
ssh gpu-00[1-5]   # login
watch -d -n 1 nvidia-smi   # -d shows differences
```
Useful SLURM commands
...