...

Quantity

Name

Partition

Specification

Notes

240x

prod-[001-240]

smp, smpht
mpp, mppht

  • 2x AMD Rome Epyc 7702 (64 cores each) → 128 cores

  • 256 GB RAM

  • internal storage: /tmp: 314 GB NVMe

For our test phase, we have split the compute nodes into two sets:
- prod-[001-200]: smp, mpp: hyperthreading disabled (one thread per core)
- prod-[201-240]: smpht, mppht: hyperthreading enabled (two threads per core)

More information can be found in the Slurm documentation.

4x

fat-00[1-4]

fat, matlab

  • like prod, but with
  • 4 TB RAM

  • internal storage: /tmp: 6.5 TB NVMe

fat-00[3,4] are currently reserved for MATLAB users; this might change later.

1x

gpu-001

gpu

  • like prod, but with
  • 1 TB RAM
  • internal storage:

    • /tmp: 3 TB NVMe

    • /scratch: 6.3 TB
  • 2x Nvidia A40 GPU (48 GB)

A comparison of the two GPU types (A40 and A100) can be found here:
https://askgeek.io/en/gpus/vs/NVIDIA_A40-vs-NVIDIA_A100-SXM4-80-GB
As a rule of thumb:

  • How big are your models? Very, very big ⟹ A100

  • Do you mainly work with mixed precision training (TensorFloat-32)? ⟹ A100

  • Is FP32 more important? ⟹ A40

  • Is FP64 more important? ⟹ A100


4x

gpu-00[2-5]

gpu

  • like prod, but with
  • 1 TB RAM
  • internal storage:

    • /tmp: 3 TB NVMe
    • /scratch: 6.3 TB
  • 4x Nvidia A100 GPU (80 GB)
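
The test-phase split of the prod nodes described in the notes above can be expressed as a small lookup. The following is only an illustrative sketch: the function name `prod_node_info` is hypothetical, and the thread counts are derived from the 128 cores per node and the one or two threads per core stated above.

```python
import re

CORES_PER_NODE = 128  # 2x 64-core CPUs, as listed in the specification


def prod_node_info(node: str) -> dict:
    """Return partitions and hardware threads for a node name such as 'prod-042'."""
    match = re.fullmatch(r"prod-(\d{3})", node)
    if match is None:
        raise ValueError(f"not a prod node name: {node!r}")
    index = int(match.group(1))
    if 1 <= index <= 200:
        # hyperthreading disabled: one thread per core
        partitions, threads_per_core = ("smp", "mpp"), 1
    elif 201 <= index <= 240:
        # hyperthreading enabled: two threads per core
        partitions, threads_per_core = ("smpht", "mppht"), 2
    else:
        raise ValueError(f"node index out of range: {node!r}")
    return {
        "partitions": partitions,
        "threads_per_core": threads_per_core,
        "threads": CORES_PER_NODE * threads_per_core,
    }
```

For example, `prod_node_info("prod-210")` reports the `smpht` and `mppht` partitions with 256 hardware threads.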

...


Personal directories | Project directories
Mountpoint | /albedo/home/$USER | /albedo/work/user/$USER | /albedo/scratch/user/$USER | /albedo/work/projects/$PROJECT | /albedo/scratch/projects/$PROJECT | /albedo/burst
Comes with | HPC_user account (https://id.awi.de: Start a new request/Bestellung, then IT Service → HPC → Add to cart/In den Einkaufswagen) | Apply for quota here: https://cloud.awi.de/#/projects | --
Block Quota | 100 GB (fixed) | 3 TB (fixed) | 50 TB (fixed) | 30 €/TB/yr (variable) | 10 €/TB/yr (variable)
File Quota | 1e6 (fixed) | 3e6 (fixed) | max(1,log(1.5*BlockQuota)) * 1e6 | 3*max(1,log(1.5*BlockQuota)) * 1e6
Delete | 90 days after user account expired | all data older than 90 days | 90 days after project expired | all data older than 90 days | after 10 days
Security | Snapshots for 100 days | -- | Snapshots for 100 days | -- | --
Snapshots | /albedo/home/.snapshots/ | /albedo/work/user/.snapshots/ | -- | /albedo/work/projects/.snapshots/ | --
Owner:Group | $USER:hpc_user | $OWNER:$PROJECT | root:root
Permissions | 2700 → drwx--S--- | 2770 → drwxrws--- | 1777 → drwxrwxrwt
Focus | many small files | large files, large bandwidth | large files, large bandwidth | low latency, huge bandwidth
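
The File Quota formula and the Permissions row can be checked with a short Python sketch. It rests on two assumptions not stated in the table: that `BlockQuota` is given in TB and that `log` is the natural logarithm. Both are exposed as parameters so they can be changed. The function names `file_quota` and `symbolic_mode` are hypothetical.

```python
import math
import stat


def file_quota(block_quota_tb: float, factor: int = 1, log=math.log) -> int:
    """Evaluate factor * max(1, log(1.5 * BlockQuota)) * 1e6.

    Assumptions: BlockQuota is in TB and log is the natural logarithm;
    pass log=math.log10 to try the other reading of the formula.
    """
    return int(factor * max(1.0, log(1.5 * block_quota_tb)) * 1e6)


def symbolic_mode(octal_mode: int) -> str:
    """Render a directory mode such as 0o2770 the way `ls -l` prints it."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


print(file_quota(10))            # files allowed for a 10 TB quota (factor 1)
print(file_quota(10, factor=3))  # same quota with the factor-3 variant
print(symbolic_mode(0o2700))     # drwx--S---
print(symbolic_mode(0o2770))     # drwxrws---
print(symbolic_mode(0o1777))     # drwxrwxrwt
```

Under either reading of `log`, small block quotas hit the `max(1, ...)` floor, so the file quota never drops below 1e6 (or 3e6 with the factor of 3).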

...

Data Pools

...

A data pool is a centralized storage space where large datasets are stored and managed for easy, shared access by users. Data stored in pools is typically input data that is common to many users and is not expected to change. For instance, input data, boundary conditions and meshes for numerical models are typically stored in data pools, as are reanalysis data and historical observations that will be accessed and processed by many users. Having a pool common to all users is very advantageous, since it avoids having identical copies of large datasets spread throughout the file system.

On Albedo, data pools are located in `/albedo/pool/<pool_name>`. That directory contains soft links to folders in **project directories** (`/albedo/work/projects/p_<project_name>`). If you would like to turn your project data into a pool, please contact us (hpc@awi.de). To learn more about project directories and how to create one, please read https://spaces.awi.de/display/HELP/HPC+Data+Policy.
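
Because pool entries are soft links, it can be useful to see which project directory a link actually points to. The sketch below is a minimal illustration using only the Python standard library; the function name `pool_targets` is hypothetical, and the directory you pass in (for example a pool directory under `/albedo/pool`) is your choice.

```python
from pathlib import Path


def pool_targets(pool_dir: str) -> dict[str, Path]:
    """Map each soft link found directly under `pool_dir` to its resolved target."""
    root = Path(pool_dir)
    return {
        entry.name: entry.resolve()  # follow the link to the real directory
        for entry in sorted(root.iterdir())
        if entry.is_symlink()        # skip regular files and directories
    }
```

Calling `pool_targets` on a pool directory returns a dictionary of link names and the paths they resolve to, which should lie under `/albedo/work/projects/` if the pool follows the layout described above.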

Remote user storage (/isibhv)

...