Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Quantity

Name

Partition

Specification

Notes

240x

prod-[001-240]

smp, smpht
mpp, mppht

  • 2x AMD Rome Epyc 7702 (64 cores each) → 128 cores

  • 256 GB RAM

  • internal storage: /tmp: 314 GB NVMe

For our test phase, we have split the compute nodes in two sets, as announced by Malte in the HPC meeting:
-  smp,mpp: hyperthreading disabled (one thread per core)
-  smpht, mppht: hyperthreading enabled (2 threads per core).

More information can be found on the slurm documentation.

4x

fat-00[1-4]

fat, fatht

  • like prod, but with
  • 4 TB RAM

  • internal storage:/tmp: 6.5 TB NVMe


1x

gpu-001

gpu

  • like prod, but with
  • 1 TB RAM
  • internal storage:

    • /tmp: 3 TB NVMe

    • /scratch: 6.3 TB
  • 2x Nvidia A40 GPU (48GB)

A comparison of the two different GPUs can be found here:
https://askgeek.io/en/gpus/vs/NVIDIA_A40-vs-NVIDIA_A100-SXM4-80-GB
The saying is: 

  • How big are your models? Very, very big ⟹ A100

  • Do you mainly work with mixed precision training (TensorFloat-32)? ⟹ A100

  • Is FP32 more important? ⟹ A40

  • Is FP64 more important? ⟹ A100


1x

gpu-002

gpu

  • like prod, but with
  • 1TB RAM
  • internal storage:

    • /tmp: 3 TB NVMe
    • /scratch: 6.3 TB
  • 4x Nvidia A100 GPU (80GB)

...