Storage

User storage


Personal directories

/albedo/home/$USER
  • Quota (soft): 100 GB
  • Quota (hard): 100 GB
  • Delete: 90 days after the user account expired
  • Security: snapshots for 6 months
  • Owner: $USER:hpc_user
  • Permission: 2700 → drwx--S---
  • Focus: many small files

/albedo/work/user/$USER
  • Quota (soft): 3 TB
  • Quota (hard): 15 TB for 90 days
  • Delete: 90 days after the user account expired
  • Security: snapshots for 6 months
  • Owner: $USER:hpc_user
  • Permission: 2700 → drwx--S---
  • Focus: large files, large bandwidth

/albedo/scratch/user/$USER
  • Quota (soft): 50 TB
  • Quota (hard): --
  • Delete: all data after 90 days
  • Security: --
  • Owner: $USER:hpc_user
  • Permission: 2700 → drwx--S---
  • Focus: large files, large bandwidth

Project directories

/albedo/work/projects/$PROJECT
  • Quota (soft): variable, 30 €/TB/yr
  • Quota (hard): 2x soft quota for 90 days
  • Delete: 90 days after the project expired
  • Security: snapshots for 6 months
  • Owner: $OWNER:$PROJECT
  • Permission: 2770 → drwxrws---
  • Focus: large files, large bandwidth

/albedo/scratch/projects/$PROJECT
  • Quota (soft): variable, 10 €/TB/yr
  • Quota (hard): --
  • Delete: all data after 90 days
  • Security: --
  • Owner: $OWNER:$PROJECT
  • Permission: 2770 → drwxrws---
  • Focus: large files, large bandwidth

/albedo/burst
  • Quota (soft): --
  • Quota (hard): --
  • Delete: after 10 days
  • Security: --
  • Owner: root:root
  • Permission: 1777 → drwxrwxrwt
  • Focus: low latency, huge bandwidth
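
The Owner and Permission values above can be checked directly on the system, for example with ls (the paths are the mountpoints from this list; $PROJECT stands for one of your projects):

    # Personal directories: the output should show e.g. drwx--S--- and $USER:hpc_user
    ls -ld /albedo/home/$USER /albedo/work/user/$USER /albedo/scratch/user/$USER

    # Project directories and /albedo/burst
    ls -ld /albedo/work/projects/$PROJECT /albedo/scratch/projects/$PROJECT /albedo/burst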

System storage

System-wide software is installed and maintained in /albedo/home/soft/:

...

If you need space here, please contact hpc@awi.de

Compute nodes & Slurm

  • To submit a job from the login nodes to the compute nodes you need Slurm, the job scheduler, batch queueing system and workload manager.
  • A submitted job needs the following information/resources (see the example batch script after this list):

    • A name: --job-name=
    • Number of nodes: -N or --nodes= (default: 1)
    • Number of (MPI) tasks (per node): -n or --ntasks=, or --ntasks-per-node= (default: 1; needed for MPI)
    • Number of cores/threads per task: -c or --cpus-per-task= (default: 1; needed for OpenMP; if -n N and -c C are given, you get N×C cores)
    • Memory/RAM per CPU: --mem= (default: ntasks × nthreads × 1.6 GB; only needed for smp jobs, mpp jobs get whole nodes with all cores and memory)
    • A maximum walltime: --time= (default: 01:00)
    • A partition: -p (default: smp)
    • A QOS (quality of service): --qos= (default: normal)


  • Please take a look at our example scripts (from Ollie): SLURM Example Scripts
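
For illustration, here is a minimal batch script that sets the options from the list above; the job name, the node and task counts, and the executable ./my_program are placeholders, not recommendations (see the SLURM Example Scripts page for maintained examples):

    #!/bin/bash
    #SBATCH --job-name=my_job        # a name
    #SBATCH --nodes=2                # number of nodes
    #SBATCH --ntasks-per-node=36     # (MPI) tasks per node
    #SBATCH --cpus-per-task=1        # cores/threads per task
    #SBATCH --time=01:00:00          # maximum walltime
    #SBATCH --partition=mpp          # partition, see below
    #SBATCH --qos=normal             # quality of service, see below

    # mpp jobs get whole nodes, so --mem is not set here;
    # an smp job would add e.g. "#SBATCH --mem=16G"
    srun ./my_program

Submit the script with sbatch <script> and monitor it with squeue (see SLURM commands below).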

Partitions

Partition | Nodes | Purpose
smp       |       | For jobs with up to 36 cores. Jobs may share nodes (until all cores are occupied).
mpp       |       | For parallel jobs, typically >=36 cores. Jobs get nodes exclusively.
gpu       |       |
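
As an illustration of the difference, two possible submissions (the script names are placeholders):

    # smp: a job that uses only part of a node (8 cores) and may share it
    sbatch -p smp -n 1 -c 8 --time=02:00:00 my_smp_job.sh

    # mpp: an exclusive multi-node MPI job, e.g. 4 nodes with 36 tasks each
    sbatch -p mpp -N 4 --ntasks-per-node=36 --time=02:00:00 my_mpi_job.sh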





QOS

QOS    | max. time | max. nodes (job/user/total) | Fairshare usage factor | Comment
short  | 00:30:00  | 128 / 312 / 312             | 1  | High priority, jobs in this QOS run first
normal | 12:00:00  | 128 / 312 / 312             | 1  | Default
large  | 96:00:00  | 8 / 16 / 64                 | 2  |
xlarge | 400:00:00 | 1 / 2 / 8                   | 10 | Only on request. Use at your own risk; consider short jobs with restarts if possible.
knurd  | --        |                             | 1  | For admins only
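
For example, a quick test job in the short QOS and a longer run in the large QOS (script names are placeholders; the walltimes must stay within the limits above):

    # high priority, but limited to 00:30:00 walltime and 128 nodes per job
    sbatch --qos=short --time=00:25:00 -p smp -n 1 test_job.sh

    # up to 96 h and 8 nodes per job; fairshare usage is charged with factor 2
    sbatch --qos=large --time=48:00:00 -p mpp -N 8 production_job.sh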


SLURM commands

  • sinfo shows the existing partitions (queues)
  • scontrol show job <JobID> shows information about a specific job
  • sstat <JobID> shows the resources used by a specific (running) job
  • squeue shows information about queued jobs and the nodes they use
  • smap shows a curses-based graphic of queues and nodes
  • sbatch <script> submits a batch job
  • salloc <resources> requests access to compute nodes for interactive use
  • scancel <JobID> cancels a batch job
  • srun <resources> <executable> starts a (parallel) code
  • sshare and sprio give information on fair-share values and job priority
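
A typical interactive session with these commands could look as follows (the requested resources and ./my_program are only placeholders):

    # request an interactive allocation: 1 task with 4 cores for 30 minutes on smp
    salloc -p smp -n 1 -c 4 --time=00:30:00

    # inside the allocation, run the (parallel) program on the allocated node(s)
    srun ./my_program

    # leave the salloc shell to release the allocation
    exit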