...
- sinfo shows existing queues
- scontrol show job <JobID> shows information about a specific job
- sstat <JobID> shows resources used by a specific job
- squeue shows information about queues and used nodes
- smap shows a curses-based graphical overview of queues and nodes
- sbatch <script> submits a batch job
- salloc <resources> requests access to compute nodes for interactive use
- scancel <JobID> cancels a batch job
- srun <resources> <executable> starts a (parallel) program
- sshare and sprio give information on fair share value and job priority
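As a quick illustration of how these commands fit together, consider the sketch below (the job ID 123456 and the script name job.sh are placeholders):

```bash
sbatch job.sh              # submit the batch script; Slurm prints the assigned JobID
squeue -u $USER            # list your own pending and running jobs
scontrol show job 123456   # detailed information about one specific job
sstat -j 123456            # resource usage of a running job
scancel 123456             # cancel the job if it is no longer needed
```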
Usage of local storage on compute nodes
The compute nodes each have local storage mounted at /tmp. Additionally, the GPU nodes have storage mounted at /scratch. See System overview for the exact sizes.
Access to this local storage is much faster than to $WORK, $HOME or $SCRATCH, but each node only "sees" its own storage (and not the storage of the job's other nodes).
If your job does a lot of reading/writing, using the local disk can be beneficial. To do so, you have to copy data to/from the compute nodes yourself. The /tmp folders are deleted once a day.
```bash
# Copy data to the node where your main MPI (rank 0) task runs
cp $MYDATA /tmp/myfolder
# If you need this data on every node, add `srun` in front of the copy command
srun cp $MYDATA /tmp/myfolder
```
The same applies in the opposite direction (copying results back to the global filesystem).
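A minimal sketch of this copy-back step at the end of a job script (the path /tmp/myfolder/results and the $WORK destination are placeholders; tagging per-node output with the hostname is one way to avoid collisions, not a site-prescribed convention):

```bash
# Copy results from the local disk of the rank-0 node back to $WORK
cp -r /tmp/myfolder/results $WORK/
# If every node produced output, run the copy on each node; appending the
# node's hostname keeps the nodes from overwriting each other's files
srun bash -c 'cp -r /tmp/myfolder/results $WORK/results.$(hostname)'
```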
If your job produces many small files, please consider packing those files into an archive (e.g. `tar -czvf file.tar.gz <input data>`) before moving them to $WORK or $SCRATCH.
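For example, at the end of a job script one might pack and move the results like this (the file and directory names are placeholders):

```bash
# Pack the many small output files into one archive on the local disk,
# then move only the single archive to the global filesystem
tar -czvf /tmp/results.tar.gz -C /tmp/myfolder results
mv /tmp/results.tar.gz $WORK/
```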
Example Scripts
Job arrays
...