...
*) For filename patterns see: https://slurm.schedmd.com/sbatch.html#SECTION_%3CB%3EfilenameFILENAME-pattern%3C/B%3EPATTERN
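As a short illustration of those filename patterns, here is a minimal sketch of a batch script header. %x (job name) and %j (job ID) are standard sbatch patterns. The job name "myjob" is hypothetical. The last lines show that the same values are available inside the job as the environment variables SLURM_JOB_NAME and SLURM_JOB_ID. The fallbacks only make the script runnable with plain bash for a syntax check.

```bash
#!/bin/bash
# Hypothetical job name, reused in the output file names via %x.
#SBATCH --job-name=myjob
# %x expands to the job name and %j to the job ID, e.g. myjob-123456.out
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Inside the job, Slurm provides the same values as environment variables;
# the fallbacks only apply when the script is run outside Slurm.
logfile="${SLURM_JOB_NAME:-myjob}-${SLURM_JOB_ID:-0}.out"
echo "stdout of this job goes to: ${logfile}"
```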
Job enforcements
We have implemented some enforcement rules to improve albedo's overall performance.
...
Partition | Nodes | Description
---|---|---
smp | prod-[001-200] |
mpp | prod-[001-200] |
fat | fat-00[1-2] |
matlab | fat-00[3-4] | Currently reserved for MATLAB users (as personal MATLAB licenses are node-bound). This might change later.
gpu | gpu-00[1-25] |
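A minimal sketch of how one of the partitions above might be selected in a batch script. The walltime and job name are placeholders, not albedo defaults. The same option can also be given on the command line, e.g. sbatch --partition=fat job.sh (job.sh being a hypothetical script). SLURM_JOB_PARTITION is set by Slurm inside the job.

```bash
#!/bin/bash
# Request one node in the mpp partition; swap in smp, fat or gpu as needed.
#SBATCH --partition=mpp
#SBATCH --nodes=1
# Assumed walltime; choose one that fits your QOS.
#SBATCH --time=00:10:00
# Hypothetical job name.
#SBATCH --job-name=partition-demo

# Slurm sets SLURM_JOB_PARTITION inside the job; the fallback lets you
# run the script with plain bash for a quick syntax check.
partition="${SLURM_JOB_PARTITION:-not-in-slurm}"
echo "Running on $(hostname) in partition: ${partition}"
```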
Quality of service (--qos)
...
To facilitate development and testing, we have reserved 20 nodes during working hours exclusively for jobs with QOS=30min.
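A sketch of a short development job that uses this QOS. It assumes that jobs in the 30min QOS must finish within 30 minutes, so the time limit is set accordingly; the job name is hypothetical. SLURM_JOB_QOS is set by Slurm inside the job.

```bash
#!/bin/bash
# Short development/test job using the 30min QOS.
#SBATCH --qos=30min
# Assumption: jobs in this QOS must finish within 30 minutes.
#SBATCH --time=00:30:00
#SBATCH --ntasks=1
# Hypothetical job name.
#SBATCH --job-name=dev-test

# SLURM_JOB_QOS is set by Slurm inside the job; the fallback keeps the
# script runnable outside Slurm.
qos="${SLURM_JOB_QOS:-not-in-slurm}"
echo "Job running with QOS: ${qos}"
```

For interactive testing, the same flag works with salloc, e.g. salloc --qos=30min --time=00:30:00 --ntasks=1.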
...
- sinfo shows the existing partitions (queues)
  For example, to check how many nodes are available in a given partition (mpp, fat, gpu, ...):
  ```bash
  sinfo -p <partition_name>
  ```
- scontrol show job <JobID> shows detailed information about a specific job
- sstat -j <JobID> shows the resources used by a running job
- squeue shows the jobs in the queue and the nodes they are running on
- sbatch <script> submits a batch job
- salloc <resources> requests access to compute nodes for interactive use
- scancel <JobID> cancels a batch job
- srun <resources> <executable> starts a (parallel) code
- sshare and sprio give information on fair-share values and job priorities
- sreport -t Percent cluster UserUtilizationByAccount Start=$(date +%FT%T -d "1 month ago") Format=used,login,account | head -20 lists the top users by usage during the last month
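The commands above are typically combined into a submit-and-check cycle. The following is a minimal sketch; the function names parse_jobid and submit_and_inspect are hypothetical helpers, not albedo tools. It relies on sbatch --parsable, which prints only the job ID, followed by ";cluster" if a cluster name is present.

```bash
#!/bin/bash
# `sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the ID.
parse_jobid() {
  local out="$1"
  echo "${out%%;*}"
}

# Submit a batch script and show the resulting job from a few angles.
# Requires the Slurm client tools, i.e. run it on a login node.
submit_and_inspect() {
  local script="$1" raw jobid
  raw=$(sbatch --parsable "$script") || return 1
  jobid=$(parse_jobid "$raw")
  echo "Submitted job ${jobid}"

  # Queue state (PENDING, RUNNING, ...) of this job only.
  squeue -j "$jobid"
  # Full job record: partition, QOS, time limit, node list, ...
  scontrol show job "$jobid"
  # sstat only reports on running jobs; while the job is still pending
  # it has nothing to show, so its exit status is ignored here.
  sstat -j "${jobid}.batch" --format=JobID,AveCPU,MaxRSS || true
}
```

Usage: submit_and_inspect job.sh (job.sh being your batch script); if the job turns out to be misconfigured, cancel it with scancel <JobID>.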
Do's & Don'ts
- Do not use srun for simple non-parallel commands like cp, ln, rm, cat, g[un]zip
- Do not write loops in your Slurm script to start several instances of similar jobs → see Job arrays below
- Make use of the parallel [un]pigz (e.g. via srun) instead of g[un]zip if you have already allocated more than one CPU
- Do not allocate costly resources (like fat or gpu nodes) if you do not need them. Check the CPU and memory efficiency of your jobs with info.sh -S
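A sketch of how the pigz advice might look in a job script, assuming one task with 8 CPUs (an illustrative choice). The helper names compress and decompress are hypothetical. They read the allocated core count from SLURM_CPUS_PER_TASK and fall back to gzip/gunzip where pigz is not installed. pigz writes standard gzip files, so both tools can read each other's output.

```bash
#!/bin/bash
# Assumption: 8 cores for compression work in a single task.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Compress files with all allocated cores (1 outside Slurm).
# pigz is multithreaded itself; fall back to gzip if it is missing.
compress() {
  local threads="${SLURM_CPUS_PER_TASK:-1}"
  if command -v pigz >/dev/null 2>&1; then
    pigz -p "$threads" "$@"
  else
    gzip "$@"
  fi
}

# Decompress .gz files. unpigz cannot parallelise the inflate step itself,
# but uses extra threads for reading, writing and checksumming.
decompress() {
  local threads="${SLURM_CPUS_PER_TASK:-1}"
  if command -v unpigz >/dev/null 2>&1; then
    unpigz -p "$threads" "$@"
  else
    gunzip "$@"
  fi
}
```

Usage: compress big_output.dat (a hypothetical file) replaces it with big_output.dat.gz, just as gzip would.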
...