Login
- You have to be a member of HPC_user (membership can be applied for on id.awi.de)
- The login nodes can be accessed via `ssh albedo0.dmawi.de` and `ssh albedo1.dmawi.de` (a small ssh configuration sketch follows this list).
→ If you are not familiar with ssh and/or bash, you should start here for a basic introduction.
- Please do not use the login nodes for computing; use the compute nodes instead (and take a look at our hardware and slurm documentation).
- For security reasons, HPC resources cannot be accessed from remote (using a VPN is possible).
- By using albedo you accept our HPC data policy
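If you log in frequently, a host entry in the ssh configuration on your local machine saves typing. This is only a convenience sketch; the alias `albedo` and the user name are placeholders you have to adapt:

```bash
# On your local machine: append a host alias to ~/.ssh/config
cat >> ~/.ssh/config <<'EOF'
Host albedo
    HostName albedo0.dmawi.de
    User your_awi_username
EOF

# Afterwards a plain `ssh albedo` is enough
ssh albedo
```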
Copy data from ollie
- Log in to ollie.
- On ollie, run:

```bash
rsync -Pauv --no-g /work/ollie/$USER/your-data $USER@albedo0.dmawi.de:/albedo/work/projects/$YOURPROJECT/
```
Note: The other way round (running rsync on albedo instead of on ollie) does not work because of a specific route set on ollie.
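Before moving large amounts of data it can be worth doing a dry run first. This is only a sketch using the same placeholder paths as above; `-n` (`--dry-run`) lists what would be transferred without copying anything:

```bash
# On ollie: dry run, nothing is copied yet
rsync -Pauvn --no-g /work/ollie/$USER/your-data \
    $USER@albedo0.dmawi.de:/albedo/work/projects/$YOURPROJECT/

# If the list looks right, re-run without -n to transfer for real.
# Because -P implies --partial, an interrupted transfer can be resumed
# by simply running the same command again.
```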
Software
- Albedo runs the operating system Rocky Linux release 8.6 (Green Obsidian).
- Slurm 22.05 is used as the job scheduling system. Important details on its configuration on Albedo are given here: Slurm. A minimal job script sketch follows this list.
- Details on the user software can be found here: Software.
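Jobs are submitted to Slurm with `sbatch`. The following is only a minimal sketch; the partition name, resources and program are placeholders, so check the Slurm page for the partitions and limits that actually apply on albedo:

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=<partition>   # placeholder: see the Slurm documentation for albedo's partitions
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Start from a clean environment and load what the job needs
module purge
module load <module>

srun ./my_program
```

Submit the script with `sbatch jobscript.sh` and check its state with `squeue -u $USER`.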
Environment modules
On albedo we use environment modules to load/unload specific versions of software. Loading a module modifies environment variables so that the shell knows, for example, where to look for binaries.
You get an overview of all installed software by typing
```bash
module avail
```
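If you already know (part of) a module's name, or want to see what is currently loaded and what a module would change, the usual module sub-commands help; `netcdf` below is just an example name:

```bash
module avail netcdf    # list only modules matching "netcdf" (example)
module list            # show currently loaded modules
module show <module>   # show which environment variables a module would set
```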
To load and unload a module use
```bash
# load
module load <module>
# unload
module unload <loaded module>
```
Sometimes it might be useful to unload all loaded modules at once. This is done with
```bash
module purge
```
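In job scripts it is good practice to combine `module purge` with explicit `module load` commands, so that the job always runs with a well-defined software environment. A small sketch; the module names and versions are only examples, check `module avail` for what is actually installed:

```bash
# Start from a clean environment for reproducibility
module purge

# Load explicit versions instead of relying on defaults (example names/versions)
module load gcc/12.1.0
module load openmpi/4.1.4

# Record the loaded modules in the job log
module list
```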
Usage of the nodes' internal storage
All compute nodes (including the fat and GPU nodes) have a local NVMe disk mounted as /tmp. The GPU nodes have an additional storage mounted as /scratch. See System overview for the exact sizes. We strongly encourage you to use this node-internal storage, which is faster than the global /albedo storage, if your job does lots of reading/writing. In particular, it might be beneficial to write your job output to the local disk and copy it to /albedo after your job has finished.
```bash
# Copy input data to the node where your main MPI (rank 0) task runs
rsync -ur $INPUT_DATA /tmp/

# If you need the input data on every node, you have to add `srun` in front of the copy command
srun rsync -ur $INPUT_DATA /tmp/

# Do the main calculation
srun $MY_GREAT_PROGRAM

# Copy your results from the node where the main MPI (rank 0) task runs to the global storage
rsync -r /tmp/output/* /albedo/scratch/$MYPROJECT/output/
```
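Put together, a batch script that stages data on the node-local disk could look roughly like the sketch below. The partition, project name and program are placeholders, and whether you need the `srun` variants of the copy commands depends on whether your job uses more than one node.

```bash
#!/bin/bash
#SBATCH --job-name=local-disk-example
#SBATCH --partition=<partition>   # placeholder
#SBATCH --ntasks=128
#SBATCH --time=02:00:00

# Stage input data on the fast node-local disk
rsync -ur $INPUT_DATA /tmp/

# Run the computation; configure it to write its output to /tmp/output
srun $MY_GREAT_PROGRAM

# Copy the results back to the global /albedo storage before the job ends;
# /tmp is node-local, so data left there is not visible on the login nodes
mkdir -p /albedo/scratch/$MYPROJECT/output
rsync -r /tmp/output/* /albedo/scratch/$MYPROJECT/output/
```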
GPU monitoring
When using the GPUs, you can monitor their usage with
```bash
ssh gpu-00[12]   # login
module load gpustat
gpustat -i1 --show-user --show-cmd -a
```
or, alternatively, with

```bash
ssh gpu-00[12]   # login
watch -d -n 0.5 nvidia-smi   # -d highlights differences between updates
```
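If you want a record of the GPU utilization over the course of a batch job instead of watching it interactively, nvidia-smi can also write periodic CSV samples. A small sketch; the output file name, sampling interval and `$MY_GPU_PROGRAM` are placeholders:

```bash
# Inside the job script: sample GPU utilization and memory once per second
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used \
           --format=csv -l 1 > gpu_usage_${SLURM_JOB_ID}.csv &
MONITOR_PID=$!

srun $MY_GPU_PROGRAM

# Stop the monitoring when the computation is done
kill $MONITOR_PID
```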