Login nodes

The two login nodes are the only nodes accessible from the AWI intranet.

Quantity: 2x
Name: albedo[0|1]
Specification:
  • 2x AMD Rome Epyc 7702 (64 cores each) → 128 cores
  • 512 GB RAM
  • internal storage: 1.7 TB SSD
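
You reach the login nodes with any SSH client from within the AWI intranet. A minimal sketch, assuming the nodes are reachable under the host names listed above; your user name (and, if required at your site, a fully qualified domain name) are placeholders:

    # log in to the first login node with your HPC user account
    ssh <username>@albedo0
    # the second login node works the same way
    ssh <username>@albedo1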

Compute nodes

Quantity: 240x
Name: prod-[001-240]
Partitions: smp, smpht, mpp, mppht
Specification:
  • 2x AMD Rome Epyc 7702 (64 cores each) → 128 cores
  • 256 GB RAM
  • internal storage: /tmp: 314 GB NVMe
Notes: For our test phase, we have split the compute nodes into two sets, as announced by Malte in the HPC meeting:
  • smp, mpp: hyperthreading disabled (one thread per core)
  • smpht, mppht: hyperthreading enabled (two threads per core)
More information can be found in the Slurm documentation; a batch-script sketch follows after the node overview below.

Quantity: 4x
Name: fat-00[1-4]
Partitions: fat, fatht
Specification: like prod, but with
  • 4 TB RAM
  • internal storage: /tmp: 6.5 TB NVMe


Quantity: 1x
Name: gpu-001
Partition: gpu
Specification: like prod, but with
  • 1 TB RAM
  • internal storage:
    • /tmp: 3 TB NVMe
    • /scratch: 6.3 TB
  • 2x Nvidia A40 GPU (48 GB)
Notes: A comparison of the two different GPUs can be found here:
https://askgeek.io/en/gpus/vs/NVIDIA_A40-vs-NVIDIA_A100-SXM4-80-GB
The rule of thumb is:
  • How big are your models? Very, very big ⟹ A100
  • Do you mainly work with mixed-precision training (TensorFloat-32)? ⟹ A100
  • Is FP32 more important? ⟹ A40
  • Is FP64 more important? ⟹ A100
A GPU job sketch follows after the node overview below.

Quantity: 1x
Name: gpu-002
Partition: gpu
Specification: like prod, but with
  • 1 TB RAM
  • internal storage:
    • /tmp: 3 TB NVMe
    • /scratch: 6.3 TB
  • 4x Nvidia A100 GPU (80 GB)
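
As referenced above, here is a minimal batch-script sketch for choosing between the hyperthreading and non-hyperthreading partitions; the time limit and the executable are placeholders, not taken from this page. Switching to smpht/mppht only changes the --partition line.

    #!/bin/bash
    #SBATCH --partition=mpp            # hyperthreading disabled; use mppht for 2 threads per core
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=128      # one task per physical core of a prod node
    #SBATCH --time=00:30:00            # placeholder time limit

    srun ./my_mpi_program              # placeholder executable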
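
For the gpu partition, a hedged sketch of a single-GPU job; whether typed GRES names (a40/a100) are configured on albedo is an assumption, so the untyped request is used and the typed variants are only mentioned in the comment.

    #!/bin/bash
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1               # one GPU of any type; gpu:a40:1 or gpu:a100:1 if GRES types are configured
    #SBATCH --time=01:00:00            # placeholder time limit

    nvidia-smi                         # shows which GPU(s) were assigned to the job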


Filesystem

Local user storage

  • Tier 1: ~171 TiB NVMe as fast cache and/or burst buffer
  • Tier 2: ~5030 TiB NL-SAS HDD (NetApp EF300)
  • All nodes are connected via a 100 Gb Mellanox InfiniBand network.

Personal directories

/albedo/home/$USER
  • Comes with: HPC_user account: https://id.awi.de → Start a new request/Bestellung → IT Service → HPC → Add to cart/In den Einkaufswagen
  • Quota: 100 GB (fixed)
  • Deletion: 90 days after the user account expired
  • Security: snapshots for 180 days, /albedo/home/.snapshots/
  • Owner:Group: $USER:hpc_user
  • Permissions: 2700 → drwx--S---
  • Focus: many small files

/albedo/work/user/$USER
  • Comes with: HPC_user account (see /albedo/home/$USER above)
  • Quota: 3 TB (fixed)
  • Deletion: 90 days after the user account expired
  • Security: snapshots for 180 days, /albedo/work/user/.snapshots/
  • Owner:Group: $USER:hpc_user
  • Permissions: 2700 → drwx--S---
  • Focus: large files, large bandwidth

/albedo/scratch/user/$USER
  • Comes with: HPC_user account (see /albedo/home/$USER above)
  • Quota: 50 TB (fixed)
  • Deletion: all data older than 90 days
  • Security: no snapshots
  • Owner:Group: $USER:hpc_user
  • Permissions: 2700 → drwx--S---
  • Focus: large files, large bandwidth

Project directories

/albedo/work/projects/$PROJECT
  • Comes with: apply for quota here: https://cloud.awi.de/#/projects
  • Quota: 30 €/TB/yr (variable)
  • Deletion: 90 days after the project expired
  • Security: snapshots for 180 days, /albedo/work/projects/.snapshots/
  • Owner:Group: $OWNER:$PROJECT
  • Permissions: 2770 → drwxrws---
  • Focus: large files, large bandwidth

/albedo/scratch/projects/$PROJECT
  • Comes with: apply for quota here: https://cloud.awi.de/#/projects
  • Quota: 10 €/TB/yr (variable)
  • Deletion: all data older than 90 days
  • Security: no snapshots
  • Owner:Group: $OWNER:$PROJECT
  • Permissions: 2770 → drwxrws---
  • Focus: large files, large bandwidth

Burst buffer

/albedo/burst
  • Comes with: --
  • Quota: --
  • Deletion: after 10 days
  • Security: no snapshots
  • Owner:Group: root:root
  • Permissions: 1777 → drwxrwxrwt
  • Focus: low latency, huge bandwidth
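
As a sketch of how the snapshots can be used to recover an accidentally deleted file: the snapshot names inside the .snapshots directories and the path layout below them are assumptions here, so list them first.

    # list the available snapshots of the home filesystem
    ls /albedo/home/.snapshots/
    # copy a lost file back from one of the snapshots (snapshot name and file are hypothetical)
    cp /albedo/home/.snapshots/<snapshot>/$USER/lost_file.txt /albedo/home/$USER/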

Storage Pools

If you want to share data with other users, use *Project directories*. For convenience, project administrators may request a link from /albedo/pool/<something> → /albedo/work/projects/$PROJECT/<somewhere> via hpc@awi.de.
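
As a small illustration with the same placeholders as above, such a pool entry is presumably just a link into the project's work directory and can be inspected like this:

    # inspect an existing pool link requested via hpc@awi.de (placeholder names)
    ls -ld /albedo/pool/<something>
    # expected to point to /albedo/work/projects/$PROJECT/<somewhere>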

Remote user storage (/isibhv)

  • You can access your online space on the Isilon in Bremerhaven (see https://spaces.awi.de/x/a13-Eg for more information) via the NFS mount points
    /isibhv/projects
    /isibhv/projects-noreplica
    /isibhv/netscratch
    /isibhv/platforms
    /isibhv/home
  • Tape storage (HSM) is not mounted. However, you could archive your results with something like
    rsync -Pauv /albedo/work/projects/$PROJECT/my_valuable_results/* hssrv1:/hs/projects/$PROJECT/my_valuable_results_from_albedo/
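
    Copying archived data back to albedo would work the same way with source and destination swapped (same placeholder paths as above); note that recalls from tape can take a while:
    rsync -Pauv hssrv1:/hs/projects/$PROJECT/my_valuable_results_from_albedo/ /albedo/work/projects/$PROJECT/my_valuable_results/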

Network

Fast interconnect (between albedo's nodes):

  • HDR Infiniband

Ethernet:

  • albedo is connected to the AWI backbone (including the Isilon and the HSM) via four 100 Gb Ethernet interfaces.
    Each individual albedo node has a 10 Gb Ethernet interface.