Slurm is albedo's job scheduling system. It is used to submit jobs from the login nodes to the compute nodes.

...

  • To work interactively on a compute node use salloc.
    You can use all options (more CPUs, RAM, time, partition, QOS, ...) described in the next section.
    To enable working with graphical interfaces (X forwarding) add the option --x11 (see the example after this list).
  • Job scripts are submitted via sbatch
  • You can ssh to a node where a job of yours is running, if (and only if) you have a valid ssh-key pair. (e.g. on a login node: ssh-keygen -t ed25519;  ssh-copy-id albedo1)
    Make sure your key is secured with a password!
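
For example, an interactive session with X forwarding could be requested like this (partition, CPU count and time limit are just placeholders; adjust them to your needs):

Code Block
languagebash
salloc --partition=smp --ntasks=1 --cpus-per-task=4 --time=01:00:00 --x11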

Specifying job resources

Job resources are defined in the header of your job script (or as command-line arguments for sbatch or salloc). For a full list see https://slurm.schedmd.com/sbatch.html#SECTION_OPTIONS. Here is a list of the most common ones:

Code Block
languagebash
#SBATCH --account=<account>          # Your account
#SBATCH --partition=<partition>      # Slurm Partition; Default: smp
#SBATCH --time=<time>                # time limit for job; Default: 0:30:00
#SBATCH --qos=<QOS>                  # Slurm QOS; Default: 30min
#SBATCH --nodes=<#Nodes>             # Number of nodes
#SBATCH --ntasks=<#Tasks>            # Number of (MPI) tasks to be launched
#SBATCH --mem=<memory>               # If more than the default memory is needed;
                                     # Default: <#Cores> * <mem per node>/<cores per node>
#SBATCH --ntasks-per-node=<ntasks>   # Number of tasks per node
#SBATCH --mail-user=<email address>  # Your mail address if you want to get notifications
#SBATCH --mail-type=<email type>     # Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL
#SBATCH --job-name=<jobname>         # Job name
#SBATCH --output=<filename_pattern>  # File where the standard output is written to (*)
#SBATCH --error=<filename_pattern>   # File where the error messages are written to (*)

 *) For filename patterns see: https://slurm.schedmd.com/sbatch.html#SECTION_FILENAME-PATTERN
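
The same options can also be given on the command line when submitting; command-line options override the #SBATCH lines in the script. A hypothetical example (script name and values are placeholders):

Code Block
languagebash
sbatch --qos=12h --time=08:00:00 --job-name=test_run my_jobscript.slurm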


Job enforcements

We implemented some enforcements to improve albedo's overall performance.

  • Jobs requesting --partition=fat but only little memory are rejected.
  • Jobs requesting less than 40 nodes are forced to use only nodes connected to the very same Infiniband switch (if this is feasible within 15 minutes; see the sketch below).
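
For reference only: a comparable constraint can be requested explicitly per job with sbatch's --switches option; the line below mirrors the enforcement described above (at most 1 switch, wait up to 15 minutes) and is normally not something you need to set yourself.

Code Block
languagebash
#SBATCH --switches=1@15:00     # at most 1 Infiniband switch; stop waiting for it after 15 minutes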

Details about specific parameters

Account (-A)

Compute resources are attributed to (primary) sections and projects at AWI. Therefore it is mandatory to specify an account.

...

The slurm accounts you may use are listed after login (or with info.sh -s). They can also be shown via

Code Block
languagetext
sacctmgr -s show user name=$USER format=user,account%-30

Note: The account noaccount is just a dummy account that can not be used for computing.

You can change the default setting on your own:

Code Block
languagetext
sacctmgr modify user $USER set DefaultAccount=<account>

Partitions (-p)

Identical compute nodes are combined in partitions. More information about the hardware specification of each node can be found in the System Overview.
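
To get a compact overview of the partitions and their node counts directly from Slurm, you can use the standard command sinfo -s:

Code Block
languagebash
sinfo -s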

Partition: smp
Nodes: prod-[001-240]

  • default partition
  • MaxNodes=1 → MaxCores=128
  • default RAM: 1900 MB/core
  • Jobs can share a node

Partition: mpp
Nodes: prod-[001-240]

  • exclusive access to nodes
  • MaxNodes=240

Partition: fat
Nodes: fat-00[1-2]

  • like smp but for jobs with extensive need of RAM
  • default RAM: 30000 MB/core

Partition: matlab
Nodes: fat-00[3-4]

  • fat nodes currently reserved for matlab users (as personal matlab licenses are node-bound). This might change later.
  • Note: To prohibit single users from allocating too many resources on these dedicated nodes, we limit the resources per user in this partition to 32 CPUs and 1TB RAM. Please get in touch with us if these limitations conflict with your use case!

Partition: gpu
Nodes: gpu-00[1-5]

  • like smp but...
  • ... the 5 gpu nodes each contain a different number and type of GPU:
      • gpu-001: 2x a40
      • gpu-00[2-5]: 4x a100
  • ... you have to specify the type and number of desired GPUs via --gpus=<GpuType>:<GpuQuantity> (otherwise no GPU will be allocated for you).
    Example for requesting 2 a40 GPUs with salloc:

Code Block
languagebash
salloc --partition=gpu --gpus=a40:2
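
As an illustration (account and values are placeholders), the header of a large-memory job on the fat partition could look like this; remember that fat jobs requesting only little memory are rejected:

Code Block
languagebash
#SBATCH --account=<account>
#SBATCH --partition=fat
#SBATCH --ntasks=1
#SBATCH --mem=400G          # well above the smp default, otherwise use smp
#SBATCH --time=2:00:00
#SBATCH --qos=12h           # walltime > 30min needs a matching QOS (see below)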


    Quality of service (--qos)

    Slurm's QOS is a way for us to influence a job's priority (priority QOS_factor) and "cost" (UsageFactor) based on the job's size (we only take walltime into account here!). We therefore created the different QOS, which are listed below.

    The default QOS is 30min; for a job with a walltime >30min you have to select and set an appropriate QOS in addition to your walltime!

    To facilitate development and testing, we have reserved 20 nodes during working hours exclusively for jobs with QOS=30min.

    QOS     max. walltime       max. Nodes/User   UsageFactor   Priority QOS_factor   Notes
    30min   00:30:00            -                 1             1                     default QOS
    12h     12:00:00            120               1             0
    48h     48:00:00            80                2             0
    1wk     7-00:00:00 (168h)   1                 10            0                     only available for users upon request (see note)

    Note on 1wk: Whenever possible try to adapt your workflow to allow for shorter walltime! In case of urgent system maintenance we might cancel long jobs using this QOS without further warning!
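
    For example (values are placeholders), a job with a 36-hour walltime has to select the 48h QOS in addition to its walltime:

    Code Block
    languagebash
    sbatch --qos=48h --time=36:00:00 my_jobscript.slurm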


    Job Scheduling

    Priority

    Jobs on albedo are scheduled based on a priority that is computed by Slurm depending on multiple factors (https://slurm.schedmd.com/priority_multifactor.html).
    The higher the priority, the sooner your job begins. (In principle – the backfill scheduling plugin helps to make the best use of available resources by filling up resources that are reserved (and thus idle) for large higher-priority jobs with small, lower-priority jobs.)
    At AWI, only a few of the possible factors are taken into account:

    Code Block
    languagetext
    Job_priority =   (PriorityWeightAge) * (age_factor)
                     + (PriorityWeightFairshare) * (fair-share_factor)
                     + (PriorityWeightQOS) * (QOS_factor)
                     - nice_factor


                     
    The weights in this formula are chosen to balance the different factors and may be tuned in the future.
    The current values can be inspected by running

    Code Block
    languagebash
    $ scontrol show config | grep -i PriorityWeight
    PriorityWeightAge       = 3500
    PriorityWeightAssoc     = 0
    PriorityWeightFairShare = 10000
    PriorityWeightJobSize   = 0
    PriorityWeightPartition = 0
    PriorityWeightQOS       = 5000
    PriorityWeightTRES      = (null)
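
    To see how these weights and factors combine for your own pending jobs, you can list the individual priority components with sprio (a standard Slurm command, shown here for the current user):

    Code Block
    languagebash
    sprio -l -u $USER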

    The factors are numbers in the range from 0 to 1, except for the nice_factor (default: zero), which can be set by the user to lower a job's priority via --nice=...
    They are briefly explained in the following.

    You can check the recent usage of albedo with this command: 

    Code Block
    languagebash
    sreport -t Percent cluster UserUtilizationByAccount  Start=$(date +%FT%T -d "1 week ago")  Format=used,login,account

    FairShare

    The fairshare factor is the most important factor here, but also the most difficult one to understand. It is calculated using Slurm's "classic" fairshare algorithm (https://slurm.schedmd.com/classic_fair_share.html), which computes the fairshare for each user based on the recent usage of the system.
    Note that the usage of your associated account is *not* taken into account here, as was the case on ollie!
    Usage is basically "CPU seconds", but weighted with the UsageFactor of the QOS used (see section QOS). Furthermore, the usage taken into account here decays with time (with a half-life of 7 days).
    The fairshare factor is then calculated as

    Code Block
    languagetext
    FS = 2^(- (U_N / S_N) / D),


    where the normalized usage U_N is the user's own usage relative to the total usage of albedo, the normalized share S_N is the share of a user on the entire system (1/(number of albedo users)) and D is a dampening factor. The formula assigns a fairshare factor > 0.5 to users who under-use their share and < 0.5 to users who over-use it. This is shown in the following figure, where the dots are taken from historic data from ollie. D has to be adjusted to account for the many users who have an HPC account but don't use it. This might also need tuning.

    (Figure: fairshare factor FS as a function of relative usage, with dots from historic ollie data)
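
    As a purely illustrative calculation (the numbers, including D, are invented and are not albedo's actual settings): a user whose normalized usage is twice their normalized share gets, with D=1, a fairshare factor of 2^(-2) = 0.25:

    Code Block
    languagebash
    # hypothetical values: U_N/S_N = 2, D = 1  =>  FS = 2^(-2) = 0.25
    awk 'BEGIN { U_N = 0.02; S_N = 0.01; D = 1; print 2^(-(U_N / S_N) / D) }'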

    Fairshare values can be shown with the command

    Code Block
    languagebash
    sshare


    QOS

    To reward the use of the short 30min QOS, whose jobs are easier to schedule, their priority is increased.
    See the section about QOS above.

    Age

    A job's priority slowly increases with its waiting time in the queue. With the current setting the priority increases by 500 for each day of waiting; the factor saturates after 7 days.
    Note: Jobs waiting for a dependency to finish do not age.
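
    To see when Slurm currently estimates your queued jobs will start (the estimate changes as the queue evolves), you can use the standard command:

    Code Block
    languagebash
    squeue -u $USER --start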

    Useful Slurm commands

    • sinfo shows existing queues
      For example to check how many nodes are available in a given partition (mpp, fat, gpu...)
      Code Block
      languagebash
      sinfo -p<partition_name>
    • scontrol show job <JobID> shows information about specific job
    • sstat <JobID> shows resources used by a specific job
    • squeue shows information about queues and used nodes
    • sbatch <script> submits a batch job
    • salloc <resources> requests access to compute nodes for interactive use
    • scancel <JobID> cancels a batch job
    • srun <resources> <executable> starts a (parallel) code
    • sshare and sprio give information on fair share value and job priority
    • sreport -t Percent cluster UserUtilizationByAccount  Start=$(date +%FT%T -d "1 month ago")  Format=used,login,account | head -20 shows the top users during the last month

    Do's & Don'ts

    • Do not use srun for simple non-parallel jobs like cp, ln, rm, cat, g[un]zip
    • Do not write loops in your slurm script to start several instances of similar jobs → see Job arrays below
    • Make use of the parallel pigz / unpigz instead of g[un]zip if you have allocated more than one CPU anyway (see the sketch after this list)
    • Do not allocate costly resources (like fat/gpu nodes) if you do not need them. Check the CPU/memory efficiency of your jobs with info.sh -S
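
    A minimal sketch of the pigz hint above (assuming the job already has several CPUs allocated; the file name is a placeholder):

    Code Block
    languagebash
    # compress with as many threads as there are CPUs allocated to this task
    pigz -p "${SLURM_CPUS_PER_TASK:-1}" my_large_file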

    Example Scripts

    Job arrays

    Job arrays in Slurm are an easy way to submit multiple similar jobs (e.g. executing the same script with multiple input data). See here for further details.

    Code Block
    languagebash
    #!/bin/bash
    
    #SBATCH --account=<account>          # Your account
    #SBATCH --partition=smp
    #SBATCH --time=0:10:00
    #SBATCH --ntasks=1
    
    # run 100 tasks, but only run 10 at a time
    #SBATCH --array=1-100%10
    #SBATCH --output=result_%A_%a.out    # gives result_<jobID>_<taskID>.out
    
    echo "SLURM_JOBID:         $SLURM_JOBID"
    echo "SLURM_ARRAY_TASK_ID: $SLURM_ARRAY_TASK_ID"
    echo "SLURM_ARRAY_JOB_ID:  $SLURM_ARRAY_JOB_ID"
    
    # Here we "translate" the $SLURM_ARRAY_TASK_ID (which takes values from 1-100)
    # into an input file, that we want to analyze.
    # Suppose 'input_files.txt' is a text file that has 100 lines, each containing
    # the respective input file.
    
    INPUT_LIST=input_files.txt
    
    # Read the (SLURM_ARRAY_TASK_ID)th input file
    INPUT_FILE=`sed -n "${SLURM_ARRAY_TASK_ID}p" < ${INPUT_LIST}`
    
    srun my_executable $INPUT_FILE


    Info


    How you “translate” your task ID into the srun command line is up to you. You could, for example, also have different scripts that you select in some way and execute.


    MPI

    Code Block
    languagebash
    titlefull node
    #!/bin/bash
    
    #SBATCH --account=<account>          # Your account 
    #SBATCH --time=0:10:00
    #SBATCH -p mpp
    #SBATCH -N 2
    #SBATCH --tasks-per-node=128
    #SBATCH --cpus-per-task=1
    #SBATCH --hint=nomultithread
    #SBATCH --job-name=mpi
    #SBATCH --output=out_%x.%j
    
    # disable hyperthreading
    #SBATCH --hint=nomultithread
    
    module purge
    module load    xthi/1.0-intel-oneapi-mpi2021.6.0-oneapi2022.1.0    intel-oneapi-mpi
    # module load    xthi/1.0-openmpi4.1.3-gcc8.5.0   openmpi/4.1.3
    
    ## Uncomment the following line to enlarge the stacksize if needed,
    ##  e.g., if your code crashes with a spurious segmentation fault.
    # ulimit -s unlimited
    
    # To be on the safe side, we emphasize that it is pure MPI, no OpenMP threads
    export OMP_NUM_THREADS=1
    
    srun  xthi | sort -g -k 4
    


    Code Block
    languagebash
    titlepartially filled node
    #!/bin/bash
    
    #SBATCH --account=<account>          # Your account 
    #SBATCH --time=0:10:00
    #SBATCH -p mpp
    #SBATCH -N 2
    #SBATCH --tasks-per-node=31
    #SBATCH --hint=nomultithread
    #SBATCH --job-name=mpi_partial_node
    #SBATCH --output=out_%x.%j
    
    # disable hyperthreading
    #SBATCH --hint=nomultithread
    
    module purge
    module load    xthi/1.0-intel-oneapi-mpi2021.6.0-oneapi2022.1.0   intel-oneapi-mpi
    # module load    xthi/1.0-openmpi4.1.3-gcc8.5.0   openmpi/4.1.3
    
    ## Uncomment the following line to enlarge the stacksize if needed,
    ##  e.g., if your code crashes with a spurious segmentation fault.
    # ulimit -s unlimited
    
    # To be on the safe side, we emphasize that it is pure MPI, no OpenMP threads
    export OMP_NUM_THREADS=1
    
    # The --cpu-bind=rank_ldom option distributes the tasks over the node's cores
    # while respecting the node's NUMA domains
    srun --cpu-bind=rank_ldom xthi | sort -g -k 4

    OpenMP

    Code Block
    languagebash
    #!/bin/bash
    
    #SBATCH --account=<account>          # Your account 
    #SBATCH --time=0:10:00
    #SBATCH -p smp
    #SBATCH --tasks-per-node=1
    #SBATCH --cpus-per-task=64
    #SBATCH --job-name=openMP
    #SBATCH --output=out_%x.%j
    
    # disable hyperthreading
    #SBATCH --hint=nomultithread
    
    module purge
    module load    xthi/1.0-intel-oneapi-mpi2021.6.0-oneapi2022.1.0   intel-oneapi-mpi
    # module load    xthi/1.0-openmpi4.1.3-gcc8.5.0   openmpi/4.1.3
    
    ## Uncomment the following line to enlarge the stacksize if needed,
    ##  e.g., if your code crashes with a spurious segmentation fault.
    # ulimit -s unlimited
    # export OMP_STACKSIZE=128M
    
    # This binds each thread to one core
    export OMP_PROC_BIND=TRUE
    
    # OpenMP and srun, both need to know the number of CPUs per task
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
    
    srun xthi | sort -g -k 4

    Hybrid (MPI+OpenMP)

    Code Block
    languagebash
    #!/bin/bash
    
    #SBATCH --account=<account>          # Your account 
    #SBATCH --time=0:10:00
    #SBATCH -p mpp
    #SBATCH -N 2
    #SBATCH --tasks-per-node=8
    #SBATCH --cpus-per-task=16
    #SBATCH --job-name=hybrid
    #SBATCH --output=out_%x.%j
    
    # disable hyperthreading
    #SBATCH --hint=nomultithread
    
    module purge
    module load    xthi/1.0-intel-oneapi-mpi2021.6.0-oneapi2022.1.0   intel-oneapi-mpi
    # module load    xthi/1.0-openmpi4.1.3-gcc8.5.0   openmpi/4.1.3
    
    ## Uncomment the following line to enlarge the stacksize if needed,
    ##  e.g., if your code crashes with a spurious segmentation fault.
    # ulimit -s unlimited
    # export OMP_STACKSIZE=128M
    
    # This binds each thread to one core
    export OMP_PROC_BIND=TRUE
    
    # OpenMP and srun, both need to know the number of CPUs per task
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
    
    srun xthi | sort -g -k 4


    Usage of GPU

    Code Block
    languagebash
    #!/bin/bash
    
    #SBATCH --account=<account>          # Your account 
    #SBATCH --time=0:10:00
    #SBATCH -p gpu
    #SBATCH --ntasks=1
    #SBATCH --gpus=a100:2                 # allocate 2 (out of 4) A100 GPUs; to get 2 (out of 2) A40 GPUs use --gpus=a40:2
    #SBATCH --hint=nomultithread
    #SBATCH --job-name=gpu
    #SBATCH --output=out_%x.%j
    
    # disable hyperthreading
    #SBATCH --hint=nomultithread
    
    srun your_code_that_runs_on_GPUs