On albedo we use environment modules to load/unload specific versions of software. Loading a module modifies environment variables so that the shell knows, for example, where to look for binaries.
You get an overview of all software installed by typing
```bash
module avail
```
To load and unload a module use
```bash
# load
module load <module>

# unload
module unload <loaded module>
```
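As mentioned above, loading a module only adjusts environment variables such as PATH. You can inspect what a module would change before loading it; a short sketch (the module name `git/2.35.2` is just an example taken from the listing further below):

```bash
# Show what the modulefile would set/prepend, without loading it
module show git/2.35.2

# Compare the environment before and after loading
echo $PATH
module load git/2.35.2
echo $PATH      # now contains the module's bin directory
module list     # confirm what is currently loaded
```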
Sometimes it might be useful to unload all loaded modules at once. This is done with
```bash
module purge
```
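A common pattern in batch scripts is to purge first and then load exactly the modules the job needs, so the job does not depend on whatever happens to be loaded in your interactive shell. A minimal sketch (the module versions are only examples taken from the output below):

```bash
# Reproducible environment at the top of a job script
module purge
module load git/2.35.2
module load conda/22.9.0-2
```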
It is also possible to use the module command from some scripting languages. For example, in Python you can do:
```
$ python
Python 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 16:01:55) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> module_python_init = "/usr/share/Modules/init/python.py"
>>> exec(open(module_python_init).read())
>>> result = module("list")
Currently Loaded Modulefiles:
 1) git/2.35.2   2) conda/22.9.0-2
>>> result is True
True
```
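Besides `list`, the `module` function provided by the init file accepts the usual sub-commands (`load`, `unload`, `purge`, ...), so you can also manage modules from a Python script. A minimal sketch, assuming the init file path shown above:

```python
# Initialize the module command in this Python process
module_python_init = "/usr/share/Modules/init/python.py"
exec(open(module_python_init).read())

module("purge")           # start from a clean environment
module("load", "git")     # load a module from Python
module("list")            # print the currently loaded modules
```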
All compute (including fat and gpu) nodes have a local NVMe disk mounted as /tmp. The GPU nodes have an additional storage /scratch. See System overview for the exact sizes. We strongly encourage you to use this node-internal storage, which is faster than the global /albedo storage, if your job does lots of reading/writing. In particular, it might be beneficial to write your job output to the local disk and copy it to /albedo after your job has finished.
```bash
# Copy input data to the node where your main MPI (rank 0) task runs
rsync -ur $INPUT_DATA /tmp/

# If you need the input data on every node, you have to add `srun` in front of the copy command
srun --ntasks-per-node=1 rsync -ur $INPUT_DATA /tmp/

# do the main calculation
srun $MY_GREAT_PROGRAM

# Copy your results from the node where the main MPI (rank 0) task runs to global storage
# If data is written on all nodes, start rsync using srun, as above
rsync -r /tmp/output/* /albedo/scratch/$MYPROJECT/output/
```
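Put together, a job script using the local disk could look like the following sketch. This is only an outline: the partition, task counts, and program/data names are placeholders you have to adapt to your own setup.

```bash
#!/bin/bash
#SBATCH --job-name=local-disk-example
#SBATCH --partition=<partition>   # placeholder: the partition you normally use
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16      # placeholder: adapt to your node type
#SBATCH --time=01:00:00

# Stage the input data onto the node-local /tmp of every node
srun --ntasks-per-node=1 rsync -ur $INPUT_DATA /tmp/

# Run the actual computation, writing its output to the local disk
srun $MY_GREAT_PROGRAM

# Copy the results back to the global /albedo storage
# (use srun as above if every node wrote its own output)
rsync -r /tmp/output/* /albedo/scratch/$MYPROJECT/output/
```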
On the login nodes albedo0 and albedo1, there are limits on what a process is allowed to do. Please note that the login nodes are not available for computing and should be used for simple shell usage only! You get a total of 2048 processes (PIDs) and 9 logins.
Have a look at /etc/security/limits.conf for further details.
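You can also check which limits currently apply to your login shell with the standard `ulimit` builtin:

```bash
ulimit -a    # show all limits for the current shell
ulimit -u    # maximum number of user processes (PIDs)
```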
When using the GPUs, you can monitor their usage with:
```bash
ssh gpu-00[1-5]   # login
module load gpustat
gpustat -i1 --show-user --show-cmd -a
```
or, alternatively, with `nvidia-smi`:

```bash
ssh gpu-00[1-5]   # login
watch -d -n 1 nvidia-smi   # -d shows differences
```
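If you prefer not to ssh to the node directly, recent Slurm versions also let you attach a monitoring command to one of your own running jobs. A hedged sketch (the job ID is a placeholder; `--overlap` requires a sufficiently recent Slurm release):

```bash
# Attach to your running GPU job and inspect utilization on its node
srun --jobid=<jobid> --overlap --pty nvidia-smi
```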