Storage
Users storage
Personal directories are /albedo/home/$USER, /albedo/work/user/$USER and /albedo/scratch/user/$USER; project directories are /albedo/work/projects/$PROJECT and /albedo/scratch/projects/$PROJECT; /albedo/burst is shared by all users.

Mountpoint | /albedo/home/$USER | /albedo/work/user/$USER | /albedo/scratch/user/$USER | /albedo/work/projects/$PROJECT | /albedo/scratch/projects/$PROJECT | /albedo/burst |
---|---|---|---|---|---|---|
Quota (soft) | 100 GB | 3 TB | 50 TB | variable | variable | -- |
Quota (hard) | 100 GB | 15 TB for 90 days | -- | 2x soft quota for 90 days | -- | -- |
Delete | 90 days after user account expired | 90 days after user account expired | all data after 90 days | 90 days after project expired | all data after 90 days | after 10 days |
Security | Snapshots for 6 months | Snapshots for 6 months | -- | Snapshots for 6 months | -- | -- |
Owner | $USER:hpc_user | $USER:hpc_user | $USER:hpc_user | $OWNER:$PROJECT | $OWNER:$PROJECT | root:root |
Permission | 2700 → drwx--S--- | 2700 → drwx--S--- | 2700 → drwx--S--- | 2770 → drwxrws--- | 2770 → drwxrws--- | 1777 → drwxrwxrwt |
Focus | many small files | large files, large bandwidth | large files, large bandwidth | large files, large bandwidth | large files, large bandwidth | low latency, huge bandwidth |
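To see how much space you are actually using against these quotas, plain POSIX tools are enough (a sketch; Albedo may also provide a dedicated quota command, which is not covered here):

```bash
# Summarize the size of your personal directories
# (may take a while on large directory trees).
du -sh /albedo/home/$USER
du -sh /albedo/work/user/$USER
du -sh /albedo/scratch/user/$USER
```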
System storage
System-wide software is installed and maintained in /albedo/home/soft/:
...
If you need space here, please contact hpc@awi.de.
Compute nodes & Slurm
- To submit a job from the login nodes to the compute nodes you need Slurm (the job scheduler, batch queueing system and workload manager).
A submitted job has/needs the following information/resources:

What | Use | Default | Comment |
---|---|---|---|
A name | --job-name= | | |
Number of nodes | -N or --nodes= | 1 | |
Number of (MPI-)tasks (per node) | -n or --ntasks= / --ntasks-per-node= | 1 | Needed for MPI |
Number of cores/threads per task | -c or --cpus-per-task= | 1 | Needed for OpenMP. If -n N and -c C are given, you get N x C cores. |
Memory/RAM per CPU | --mem= | ntasks x nthreads x 1.6 GB | Only needed for smp jobs; mpp jobs get whole nodes (cores and memory) |
A maximum walltime | --time= | 01:00 | |
A partition | -p | smp | |
A qos (quality of service) | --qos= | normal | |

Please take a look at our example scripts (from ollie): SLURM Example Scripts
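For illustration, a minimal batch script tying these options together might look as follows (a sketch; the job name and the program ./my_program are placeholders, and the values are examples, not site defaults):

```bash
#!/bin/bash
#SBATCH --job-name=mpi_test       # a name
#SBATCH --nodes=1                 # -N: number of nodes
#SBATCH --ntasks=4                # -n: number of MPI tasks
#SBATCH --cpus-per-task=1         # -c: cores/threads per task (4 x 1 = 4 cores)
#SBATCH --time=00:30:00           # maximum walltime
#SBATCH --partition=smp           # -p: partition
#SBATCH --qos=normal              # quality of service

# srun starts the (parallel) executable on the allocated resources.
srun ./my_program
```

Submit it with `sbatch <script>` and monitor it with `squeue`.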
Partitions
Partition | Nodes | Purpose |
---|---|---|
smp | | For jobs with up to 36 cores. Jobs may share nodes (until all cores are occupied). |
mpp | | For parallel jobs, typically >=36 cores. Jobs get nodes exclusively. |
gpu | | For jobs that need GPUs. |
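As a sketch of how the partitions are selected at submit time (job.sh is a placeholder; the GPU request syntax is an assumption based on generic Slurm, so check the local docs for the exact form):

```bash
# smp: may share a node; here 1 task with 8 cores
sbatch -p smp -n 1 -c 8 job.sh

# mpp: exclusive nodes; here 72 MPI tasks on 2 nodes
sbatch -p mpp -N 2 --ntasks-per-node=36 job.sh

# gpu: request one GPU (generic Slurm syntax, site-specific GRES may differ)
sbatch -p gpu --gpus=1 job.sh
```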
QOS
QOS | max. time | max. nodes (per job / per user / total) | Fairshare usage factor | Comment |
---|---|---|---|---|
short | 00:30:00 | 128 / 312 / 312 | 1 | High priority, jobs in this qos run first |
normal | 12:00:00 | 128 / 312 / 312 | 1 | Default |
large | 96:00:00 | 8 / 16 / 64 | 2 | |
xlarge | 400:00:00 | 1 / 2 / 8 | 10 | Only on request. Use at own risk, consider short jobs with restarts if possible |
knurd | - | - | 1 | For admins only |
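The qos is combined with the other options at submit time; for example, a short high-priority test could be submitted like this (a sketch; test.sh is a placeholder):

```bash
# --time must stay within the qos limit (00:30:00 for short)
sbatch --qos=short --time=00:20:00 -p smp -n 1 test.sh
```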
SLURM commands
- sinfo shows existing queues
- scontrol show job <JobID> shows information about specific job
- sstat <JobID> shows resources used by a specific job
- squeue shows information about queues and used nodes
- smap shows a curses-based graphic of queues and nodes
- sbatch <script> submits a batch job
- salloc <resources> requests access to compute nodes for interactive use
- scancel <JobID> cancels a batch job
- srun <resources> <executable> starts a (parallel) code
- sshare and sprio give information on fair share value and job priority
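A few typical invocations of these commands (the job ID and script name are placeholders):

```bash
sbatch job.sh                     # submit a batch job
squeue -u $USER                   # show only your own jobs
scontrol show job 123456          # details of one specific job
sstat -j 123456                   # resource usage of a running job
scancel 123456                    # cancel the job
salloc -p smp -n 1 -c 4 --time=01:00:00   # interactive session on a compute node
```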