Login
The login nodes can be accessed via `ssh albedo0.awi.de` and `ssh albedo1.awi.de`.
Please do not use these nodes for computing; use the compute nodes instead (see the section below).
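For convenience you can define a host alias on your local machine; this is a generic OpenSSH sketch (the alias and username placeholder are assumptions, not albedo-specific requirements):

```
# ~/.ssh/config on your local machine
Host albedo
    HostName albedo0.awi.de
    User <your-awi-username>
```

Afterwards `ssh albedo` is enough to log in.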
Storage
User storage
Personal directories: /albedo/home/$USER, /albedo/work/user/$USER and /albedo/scratch/user/$USER. Project directories: /albedo/work/projects/$PROJECT and /albedo/scratch/projects/$PROJECT.
Mountpoint | /albedo/home/$USER | /albedo/work/user/$USER | /albedo/scratch/user/$USER | /albedo/work/projects/$PROJECT | /albedo/scratch/projects/$PROJECT | /albedo/burst |
---|---|---|---|---|---|---|
Quota (soft) | 100 GB | 3 TB | 50 TB | variable | variable | -- |
Quota (hard) | 100 GB | 15 TB for 90 days | -- | 2x soft quota for 90 days | -- | -- |
Delete | 90 days after user account expired | 90 days after user account expired | all data after 90 days | 90 days after project expired | all data after 90 days | after 10 days |
Security | Snapshots for 6 months | Snapshots for 6 months | -- | Snapshots for 6 months | -- | -- |
Owner | $USER:hpc_user | $USER:hpc_user | $USER:hpc_user | $OWNER:$PROJECT | $OWNER:$PROJECT | root:root |
Permission | 2700 → drwx--S--- | 2700 → drwx--S--- | 2700 → drwx--S--- | 2770 → drwxrws--- | 2770 → drwxrws--- | 1777 → drwxrwxrwt |
Focus | many small files | large files, large bandwidth | large files, large bandwidth | large files, large bandwidth | large files, large bandwidth | low latency, huge bandwidth |
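A small shell sketch of how these areas are typically used (the subdirectory names below are placeholders; only the mountpoints come from the table above):

```bash
# Small, permanent files (source code, scripts, configuration) belong in home
cd /albedo/home/$USER

# Larger data sets and results go to work (personal or project)
mkdir -p /albedo/work/user/$USER/experiment01

# Huge temporary files that may be deleted after 90 days go to scratch
mkdir -p /albedo/scratch/user/$USER/tmp_run
```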
System storage
System-wide software is installed and maintained in /albedo/home/soft/:
- ./AWIsoft → binaries
- ./AWIbuild → sources
- ./AWImodules → additional modules
If you need space here, please contact hpc@awi.de.
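If the installed software is provided as environment modules (a common HPC setup; the module name below is purely a placeholder), it is typically used like this:

```bash
# List the modules available on albedo
module avail

# Load a module and check which binary it put on the PATH (placeholder names)
module load <some-module>
which <some-binary>
```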
Compute nodes & Slurm
- To submit a job from the login nodes to the compute nodes you need Slurm (job scheduler, batch queueing system and workload manager).
A submitted job needs the following information/resources:
What | Use | Default | Comment |
---|---|---|---|
A name | --job-name= | | |
Number of nodes | -N or --nodes= | 1 | |
Number of (MPI-)tasks (per node) | -n or --ntasks= / --ntasks-per-node= | 1 | Needed for MPI |
Number of cores/threads per task | -c or --cpus-per-task= | 1 | Needed for OpenMP. If -n N and -c C are given, you get NxC cores. |
Memory/RAM per CPU | --mem= | ntasks x nthreads x 1.6 GB | Only needed for smp jobs; mpp jobs get whole nodes (cores and memory) |
A maximum walltime | --time= | 01:00 | |
A partition | -p | smp | |
A QOS (quality of service) | --qos= | normal | |
- Please take a look at our example scripts (from ollie): SLURM Example Scripts. A minimal sketch is shown below.
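A minimal batch script sketch that puts these options together (job name, module and executable are placeholders, and the resource numbers are only an example):

```bash
#!/bin/bash
#SBATCH --job-name=my_mpi_job        # placeholder job name
#SBATCH --partition=mpp              # see "Partitions" below
#SBATCH --qos=normal                 # see "QOS" below
#SBATCH --nodes=2                    # example values, adjust to your job
#SBATCH --ntasks-per-node=36
#SBATCH --time=12:00:00

# load your environment here, e.g. via modules (name is a placeholder)
# module load <your-module>

srun ./my_program                    # placeholder executable
```

Submit it with `sbatch myjob.slurm` (the file name is arbitrary).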
Partitions
Partition | Nodes | Purpose |
---|---|---|
smp | | For jobs with up to 36 cores. Jobs may share nodes (until all cores are occupied). |
mpp | | For parallel jobs, typically >=36 cores. Jobs get nodes exclusively. |
gpu | | |
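To try things out interactively on a compute node before submitting a batch job, an allocation can be requested from a login node; a short sketch (resource values are only an example):

```bash
# Request one task on the smp partition for 30 minutes, interactively
salloc --partition=smp --ntasks=1 --time=00:30:00

# Once the allocation is granted, run commands on the allocated node
srun hostname
```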
QOS
QOS | max. time | max nodes job/user/total | Fairshare usage factor | Comment |
---|---|---|---|---|
short | 00:30:00 | 128 / 312 / 312 | 1 | High priority, jobs in this qos run first |
normal | 12:00:00 | 128 / 312 / 312 | 1 | Default |
large | 96:00:00 | 8 / 16 / 64 | 2 | |
xlarge | 400:00:00 | 1 / 2 / 8 | 10 | Only on request. Use at your own risk; consider short jobs with restarts if possible |
knurd | - | - | 1 | For admins only |
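The QOS (and the time limit) can also be set or overridden at submission time without editing the script; a brief sketch (the script name is a placeholder):

```bash
# Submit an existing script under the "large" QOS with a 4-day walltime
sbatch --qos=large --time=96:00:00 myjob.slurm
```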
SLURM commands
- sinfo shows existing queues
- scontrol show job <JobID> shows information about specific job
- sstat <JobID> shows resources used by a specific job
- squeue shows information about queues and used nodes
- smap shows a curses graphic of queues and nodes
- sbatch <script> submits a batch job
- salloc <resources> requests access to compute nodes for interactive use
- scancel <JobID> cancels a batch job
- srun <resources> <executable> starts a (parallel) program
- sshare and sprio give information on fair share value and job priority
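A typical submit-and-monitor workflow with these commands might look like this (the script name and the job ID are placeholders):

```bash
# Submit the batch script; sbatch prints the job ID
sbatch myjob.slurm

# Check your own jobs in the queue
squeue -u $USER

# Inspect or cancel a specific job (12345 is a placeholder job ID)
scontrol show job 12345
scancel 12345
```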