Account

To attribute the usage of computing resources to the groups and projects of AWI, which is needed for reporting, it is necessary to specify an account on Albedo.
This is done by setting

Code Block
languagebash
-A, --account=<account>
Warning

This is new on Albedo!

Possible Slurm accounts are listed after login. To enforce that an account is set explicitly, no (valid) default account is configured.
Users are, however, able to change this setting on their own:

Code Block
languagebash
sacctmgr modify user <user> set DefaultAccount=<account>
Note
The account specified is only used for reporting purposes. No account gets privileged access to compute resources compared to others!
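
For example (a sketch; <account> stands for one of the accounts listed at login, and jobscript.sh is a placeholder name), the account can be given in the job script header or on the sbatch command line:

Code Block
languagebash
# inside a job script
#SBATCH --account=<account>

# or directly when submitting
sbatch --account=<account> jobscript.sh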

Partitions

Albedo’s compute nodes are divided into the partitions shown in the table below.

The smp partition is the default and is intended for jobs that require at most 128 cores. By default, each core is attributed 256 GB/128 cores = 2 GB/core. Jobs in this partition might share a node.

Nodes in the mpp partition are exclusively reserved, and hence the entire memory is available to the job. This partition is used when one or more nodes are needed.

The fat nodes can be selected via the fat partition. This partition resembles the smp partition but each node has much more memory.

Similarly, the GPU nodes can be accessed via the gpu partition. Note that the type and number of GPUs need to be specified.

More information about the hardware specification of each node can be found in the System Overview (TODO: Link).

Partition smp (nodes prod-[001-240]):

  • default partition
  • MaxNodes=1 → MaxCores=128
  • Jobs can share a node

Partition mpp (nodes prod-[001-240]):

  • exclusive access to nodes
  • MaxNodes=240

Partition fat (nodes fat-00[1-4]):

  • MaxNodes=1
  • Jobs can share a node

Partition gpu (nodes gpu-00[1-2]):

  • MaxNodes=1
  • Jobs can share a node
  • Note: You have to specify the type and number of desired GPUs via --gpus=<GpuType>:<GpuNumber>.
    The two GPU nodes each contain a different number and type of GPU:

    • gpu-001: 2x a40

    • gpu-002: 4x a100
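
As a sketch (node and task counts are example values), a job that needs several complete nodes would request the mpp partition; sinfo shows the partitions and their current state:

Code Block
languagebash
# list the partitions and their state
sinfo -s

# request 4 complete nodes in the mpp partition (example values)
#SBATCH --partition=mpp
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=128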

Quality of service (QOS)

By default, the QOS 30min is used. It has a maximum walltime of 30 minutes; jobs with this QOS get a higher priority and have access to a special SLURM reservation during working time (TODO: add details when set up) to facilitate development and testing. For longer runs, another QOS (and walltime) has to be specified; see the table below. Note: long-running jobs (longer than 12 hours, up to 48 hours) “cost” more in terms of fairshare.

QOS      max. walltime   UsageFactor   Priority QOS_factor
30min    0:30:00         1             50
12h      12:00:00        1             0
48h      48:00:00        2             0
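
For example, a job expected to run for a few hours could request the 12h QOS together with a matching walltime (the values below are examples; jobscript.sh is a placeholder name):

Code Block
languagebash
# in a job script
#SBATCH --qos=12h
#SBATCH --time=06:00:00

# or on the command line
sbatch --qos=12h --time=06:00:00 jobscript.sh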

A short note on the definitions:

UsageFactor: A float that is factored into a job’s TRES usage (e.g. RawUsage, …)

...

is Albedo's job scheduling system. It is used to submit jobs from the login nodes to the compute nodes.

Jobs

Submitting jobs

  • To work interactively on a compute node use salloc.
    You can use all options (more CPU, RAM, time, partition, qos, ...) described in the next section.
    To enable working with graphical interfaces (X forwarding) add the option --x11 .
  • Job scripts are submitted via sbatch (see the minimal examples below this list).
  • You can ssh to a node where one of your jobs is running, if (and only if) you have a valid ssh-key pair (e.g. on a login node: ssh-keygen -t ed25519; ssh-copy-id albedo1).
    Make sure your key is secured with a password!
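
As referenced above, a minimal interactive session and a minimal job script could look like the following (account, partition, resources, and the program name are placeholders):

Code Block
languagebash
# interactive session on a compute node (add --x11 for X forwarding)
salloc --account=<account> --partition=smp --ntasks=1 --time=00:30:00
Code Block
languagebash
#!/bin/bash
# minimal job script, submitted with: sbatch jobscript.sh
#SBATCH --account=<account>
#SBATCH --partition=smp
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

srun ./my_program   # placeholder executable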

Specifying job resources

...

 *) For filename patterns see: https://slurm.schedmd.com/sbatch.html#SECTION_%3CB%3Efilename-pattern%3C/B%3E
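
For example, filename patterns can be used to put the job name and job ID into the output and error file names (a sketch; the job name is a placeholder):

Code Block
languagebash
# %x is replaced by the job name, %j by the job ID
#SBATCH --job-name=myjob
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err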

Details about specific parameters

Account (-A)

Compute resources are attributed to (primary) sections and projects at AWI. Therefore it is mandatory to specify an account.

Warning

This is new on Albedo, compared to ollie

The Slurm accounts you may use are listed after login. You can change the default account yourself:

Code Block
languagebash
sacctmgr modify user <user> set DefaultAccount=<account>
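
To check which accounts are associated with your user and which one is the default, something like the following should work (the format fields chosen here are an assumption about what is most useful):

Code Block
languagebash
# show your user settings, including the default account
sacctmgr show user $USER
# list all accounts your user is associated with
sacctmgr show associations user=$USER format=Account,User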

Partitions (-p)

Identical compute nodes are combined in partitions. More information about the hardware specification of each node can be found in the System Overview.

Partition smp (nodes prod-[001-240]):

  • default partition
  • MaxNodes=1 → MaxCores=128
  • default RAM: 1900 MB/core
  • Jobs can share a node

Partition mpp (nodes prod-[001-240]):

  • exclusive access to nodes
  • MaxNodes=240

Partition fat (nodes fat-00[1-4]):

  • like smp, but for jobs with extensive need of RAM
  • default RAM: 30000 MB

Partition gpu (nodes gpu-00[1-2]):

  • like smp, but you have to specify the type and number of desired GPUs via --gpus=<GpuType>:<GpuNumber>.
    The two GPU nodes each contain a different number and type of GPU:

    • gpu-001: 2x a40

    • gpu-002: 4x a100
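
A sketch of the resource request for a GPU job, here asking for one a100 (which implies gpu-002); the GPU count and the remaining values are examples:

Code Block
languagebash
#SBATCH --partition=gpu
#SBATCH --gpus=a100:1
#SBATCH --ntasks=1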

Quality of service (QOS)

A higher priority means your job is scheduled before other jobs. In addition, during working hours 10 nodes are reserved exclusively for jobs using qos=30min (to facilitate development and testing). For longer runs, another QOS (and walltime) has to be specified. Note: long-running jobs (longer than 12 hours, up to 48 hours) “cost” more in terms of fairshare (meaning your priority will decrease for further jobs).

QOS      max. walltime   UsageFactor   Priority QOS factor   Notes
30min    00:30:00        1             50                    default
12h      12:00:00        1             0
48h      48:00:00        2             0
Job Scheduling

Priority

For job scheduling, Slurm assigns each job a priority, which is calculated based on several factors (Multifactor Priority Plugin). Jobs with higher priority run first. (In principle: the backfill scheduling plugin helps make the best use of available resources by filling up resources that are reserved, and thus idle, for large higher-priority jobs with small, lower-priority jobs.)
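
To see how these factors add up for pending jobs, sprio can be used (shown here as a sketch of typical invocations):

Code Block
languagebash
# show the priority factors of all pending jobs (long format)
sprio -l
# restrict the output to your own jobs
sprio -u $USER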

...