Login
- The login nodes can be accessed via ssh albedo0.dmawi.de and ssh albedo1.dmawi.de (see the example below).
If you are not familiar with ssh and/or bash, start with the basic introduction at https://spaces.awi.de/x/-q1-Fw.
- Please do not use these nodes for computing; use the compute nodes instead (see section Compute nodes & Slurm below).
- For security reasons, HPC resources are not reachable directly from outside the AWI network (VPN is possible, https://spaces.awi.de/x/Smr-Eg).
- By using albedo you accept our HPC data policy https://spaces.awi.de/x/GgrXFw
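For example (a minimal sketch; replace <username> with your AWI account name, both login nodes behave the same):

```
# connect to one of the two albedo login nodes
ssh <username>@albedo0.dmawi.de
# or
ssh <username>@albedo1.dmawi.de
```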
Support
You can open a support ticket at helpdesk.awi.de or by sending an email to hpc@awi.de. Please do not send personal emails to individual admins!
Storage
Local user storage
- The local storage is a parallel GxFS Storage Appliance from NEC based on https://en.wikipedia.org/wiki/GPFS.
- All nodes are connected via a 100 Gb Mellanox InfiniBand network.
 | Personal directories | Personal directories | Personal directories | Project directories | Project directories | |
---|---|---|---|---|---|---|
Mountpoint | /albedo/home/$USER | /albedo/work/user/$USER | /albedo/scratch/user/$USER | /albedo/work/projects/$PROJECT | /albedo/scratch/projects/$PROJECT | /albedo/burst |
Quota (soft) | -- | 3 TB | 50 TB | variable | variable | -- |
Quota (hard) | 100 GB | 15 TB for 90 days | -- | 2x soft quota for 90 days | -- | -- |
Delete | 90 days after user account expired | 90 days after user account expired | all data after 90 days | 90 days after project expired | all data after 90 days | after 10 days |
Security | Snapshots for 6 months | Snapshots for 6 months | -- | Snapshots for 6 months | -- | -- |
Owner | $USER:hpc_user | $USER:hpc_user | $USER:hpc_user | $OWNER:$PROJECT | $OWNER:$PROJECT | root:root |
Permission | 2700 → drwx--S--- | 2700 → drwx--S--- | 2700 → drwx--S--- | 2770 → drwxrws--- | 2770 → drwxrws--- | 1777 → drwxrwxrwt |
Focus | many small files | large files, large bandwidth | large files, large bandwidth | large files, large bandwidth | large files, large bandwidth | low latency, huge bandwidth |
Remote user storage (/isibhv)
- You can access your online space on the Isilon in Bremerhaven (see https://spaces.awi.de/x/a13-Eg for more information) via the NFS mountpoints
/isibhv/projects
/isibhv/projects-noreplica
/isibhv/netscratch
/isibhv/platforms
/isibhv/home
- albedo is connected to the AWI backbone (including the Isilon and the HSM) via four 100 Gb Ethernet interfaces. Each albedo node has a 10 Gb interface.
Compute nodes & Slurm
- To work interactively on a compute node use salloc (see the example after this list). You can use all options (more CPUs, RAM, time, partition, qos, ...) described below.
- To submit a job from the login nodes to the compute nodes you need Slurm (the job scheduler, batch queueing system, and workload manager).
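For example, a minimal interactive session could look like this (a sketch; the account is a placeholder and ./my_program stands for your own executable, the options are explained in the table below):

```
# request one core on the smp partition for one hour, interactively
salloc --account=<section>.<project> --partition=smp --ntasks=1 --time=01:00:00
# once the allocation is granted, start your program on the allocated node
srun ./my_program
```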
A submitted job needs the following information/resources:

What | Use | Default | Comment |
---|---|---|---|
Name | -J or --job-name= | | |
Account | -A or --account= | | Primary section, e.g. clidyn.clidyn or computing.computing. New on albedo (was not necessary on ollie). You can add a project (defined in eResources; you must be a member of the project) with -A <section>.<project>, which is helpful for reporting, e.g. clidyn.clidyn, computing.tsunami, clidyn.fesom |
Number of nodes | -N or --nodes= | 1 | |
Number of (MPI-)tasks (per node) | -n or --ntasks= / --ntasks-per-node= | 1 | Needed for MPI |
Number of cores/threads per task | -c or --cpus-per-task= | 1 | Needed for OpenMP. If -n N and -c C are given, you get N x C cores |
Memory (RAM) | --mem= | ntasks x nthreads x 1.6 GB | Only needed for smp jobs; mpp jobs get whole nodes (cores and memory) |
Maximum walltime | -t or --time= | 01:00 | |
Partition | -p or --partition= | smp | |
QOS (quality of service) | -q or --qos= | normal | |

- Please take a look at our example scripts (from ollie): SLURM Example Scripts
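As a sketch (job name, account, and resource values are placeholders), a minimal MPI batch script could look like the following; please refer to the SLURM Example Scripts page for complete, maintained examples:

```
#!/bin/bash
#SBATCH --job-name=my_mpi_job               # placeholder job name
#SBATCH --account=<section>.<project>       # e.g. clidyn.fesom
#SBATCH --partition=mpp                     # MPI-parallel jobs get nodes exclusively
#SBATCH --nodes=2                           # placeholder value
#SBATCH --ntasks-per-node=36                # placeholder; adjust to the node's core count
#SBATCH --time=01:00:00
#SBATCH --qos=normal

srun ./my_mpi_program                       # placeholder executable
```

Submit it with sbatch <script> (see Useful SLURM commands below).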
Partitions
Partition | Nodes | Purpose |
---|---|---|
smp | 240 | For OpenMP or serial jobs with up to 36 (OpenMP) cores. Jobs may share nodes (until all cores are occupied). |
mpp | 240 | For MPI-parallel jobs, typically >=36 cores. Jobs get nodes exclusively |
fat | 4 | For OpenMP or serial jobs that need >256 GiB RAM |
gpu | 1 | For jobs that can take advantage of an Nvidia A100/80 GPU |
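For instance, an OpenMP job on the smp partition might be requested as follows (a sketch; thread count, memory, and names are placeholder values, with --mem derived from the default of 1.6 GB per core):

```
#!/bin/bash
#SBATCH --job-name=my_omp_job               # placeholder job name
#SBATCH --account=<section>.<project>       # placeholder account
#SBATCH --partition=smp                     # shared nodes for OpenMP/serial jobs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8                   # placeholder thread count
#SBATCH --mem=12800M                        # 1 task x 8 threads x 1.6 GB
#SBATCH --time=02:00:00

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_openmp_program                         # placeholder executable
```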
QOS
QOS | max. time | max nodes job/user/total | Fairshare usage factor | Comment |
---|---|---|---|---|
short | 00:30:00 | 128 / 312 / 312 | 1 | High priority, jobs in this qos run first |
normal | 12:00:00 | 128 / 312 / 312 | 1 | Default |
large | 96:00:00 | 8 / 16 / 64 | 2 | |
xlarge | 400:00:00 | 1 / 2 / 8 | 10 | Only on request. Use at your own risk; consider short jobs with restarts if possible |
knurd | - | - | 1 | For admins only |
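For example, a long-running job could be submitted to the large QOS directly on the command line (script name and values are placeholders); options given on the command line override the #SBATCH directives in the script:

```
# submit an existing batch script with a 96-hour walltime in the large qos
sbatch --partition=mpp --nodes=8 --time=96:00:00 --qos=large my_job_script.sh
```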
Useful SLURM commands
- sinfo shows existing queues
- scontrol show job <JobID> shows information about specific job
- sstat <JobID> shows resources used by a specific job
- squeue shows information about queues and used nodes
- smap curses-graphic of queues and nodes
- sbatch <script> submits a batch job
- salloc <resources> requests access to compute nodes for interactive use
- scancel <JobID> cancels a batch job
- srun <resources> <executable> starts a (parallel) code
- sshare and sprio give information on fair share value and job priority
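A typical workflow with the commands above might look like this (the script name and <JobID> are placeholders):

```
sbatch my_job_script.sh          # submit the job; Slurm prints the JobID
squeue -u $USER                  # check the state of your queued/running jobs
scontrol show job <JobID>        # inspect the details of a specific job
sstat -j <JobID>                 # resources used while the job is running
scancel <JobID>                  # cancel the job if necessary
sshare                           # check your fair share value
```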