Login nodes

The two login nodes are the only nodes that are accessible from the AWI intranet.

2x albedo[0|1]
  • 2x AMD Epyc 7702 (Rome), 64 cores each → 128 cores
  • 512 GB RAM
  • internal storage: 1.7 TB SSD
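
From inside the AWI network you would typically connect to one of them with ssh, roughly as sketched below; the short hostname follows the node names above, while the fully qualified domain name is not listed on this page and is therefore an assumption:

    # log in to the first login node (replace the placeholder with your AWI username)
    ssh <your-awi-username>@albedo0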

Compute nodes

240x prod-[001-240]
Partitions: smp, smpht, mpp, mppht
  • 2x AMD Epyc 7702 (Rome), 64 cores each → 128 cores
  • 256 GB RAM
  • internal storage: /tmp: 314 GB NVMe

For the test phase, the compute nodes are split into two sets:
  • prod-[001-200] (smp, mpp): hyperthreading disabled (one thread per core)
  • prod-[201-240] (smpht, mppht): hyperthreading enabled (two threads per core)
More information can be found in the Slurm documentation.
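
As an illustration of how these partitions are addressed, here is a minimal Slurm batch-script sketch; the partition names come from the list above, while the account, walltime, node count and program are placeholders rather than albedo-specific settings:

    #!/bin/bash
    #SBATCH --partition=mpp          # or smp / mppht / smpht, see the split above
    #SBATCH --nodes=2                # number of prod nodes
    #SBATCH --ntasks-per-node=128    # 128 physical cores per node (hyperthreading off on smp/mpp)
    #SBATCH --time=01:00:00          # placeholder walltime
    #SBATCH --account=<your-project> # placeholder account

    srun ./my_mpi_program            # placeholder executable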

4x fat-00[1-4]
Partitions: fat, matlab
  • like prod, but with
  • 4 TB RAM
  • internal storage: /tmp: 6.5 TB NVMe

fat-00[3-4] are currently reserved for MATLAB users; this might change later.

1x gpu-001
Partition: gpu
  • like prod, but with
  • 1 TB RAM
  • internal storage:
    • /tmp: 3 TB NVMe
    • /scratch: 6.3 TB
  • 2x Nvidia A40 GPU (48 GB)

A comparison of the two GPU models can be found here:
https://askgeek.io/en/gpus/vs/NVIDIA_A40-vs-NVIDIA_A100-SXM4-80-GB
As a rule of thumb:
  • How big are your models? Very, very big ⟹ A100
  • Do you mainly work with mixed-precision training (TensorFloat-32)? ⟹ A100
  • Is FP32 more important? ⟹ A40
  • Is FP64 more important? ⟹ A100

1x gpu-002
Partition: gpu
  • like prod, but with
  • 1 TB RAM
  • internal storage:
    • /tmp: 3 TB NVMe
    • /scratch: 6.3 TB
  • 4x Nvidia A100 GPU (80 GB)
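
A hedged sketch of a GPU batch job on these nodes follows; the partition name is taken from the table above, while the --gpus request assumes that the GPUs are configured as consumable resources in Slurm on albedo, and all other names are placeholders:

    #!/bin/bash
    #SBATCH --partition=gpu          # gpu-001 (2x A40) or gpu-002 (4x A100)
    #SBATCH --gpus=1                 # assumes GPUs are schedulable via Slurm's GPU/GRES support
    #SBATCH --time=02:00:00          # placeholder walltime
    #SBATCH --account=<your-project> # placeholder account

    srun ./my_gpu_program            # placeholder executable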


Filesystem

Local user storage

  • Tier 1 "system": ~171 TiB NVMe as fast cache and/or burst buffer
  • Tier 2 "data": ~5030 TiB NL-SAS HDD (NetApp EF300)
  • You can check in which storage pool your data resides with mmlsattr -L <file>
  • All nodes are connected via a 100 Gb Mellanox Infiniband network.
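
For example (the path is only a placeholder):

    # show the file's attributes, including the storage pool it currently resides in
    mmlsattr -L /albedo/work/user/$USER/some_large_file.nc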

Personal directories

/albedo/home/$USER
  • Comes with: HPC_user account (https://id.awi.de)
  • Block quota: 100 GB (fixed)
  • File quota: 1e6
  • Delete: 90 days after the user account has expired
  • Security: snapshots for 100 days, under /albedo/home/.snapshots/
  • Owner:Group: $USER:hpc_user
  • Permissions: 2700 → drwx--S---
  • Focus: many small files

/albedo/work/user/$USER
  • Comes with: HPC_user account (https://id.awi.de)
  • Block quota: 3 TB (fixed)
  • File quota: 1e6
  • Delete: 90 days after the user account has expired
  • Security: snapshots for 100 days, under /albedo/work/user/.snapshots/
  • Owner:Group: $USER:hpc_user
  • Permissions: 2700 → drwx--S---
  • Focus: large files, large bandwidth

/albedo/scratch/user/$USER
  • Comes with: HPC_user account (https://id.awi.de)
  • Block quota: 50 TB (fixed)
  • Delete: all data older than 90 days
  • Security: no snapshots
  • Owner:Group: $USER:hpc_user
  • Permissions: 2700 → drwx--S---
  • Focus: large files, large bandwidth

Project directories

/albedo/work/projects/$PROJECT
  • Comes with: a new request (Bestellung): IT Service → HPC → Add to cart (In den Einkaufswagen); apply for quota here: https://cloud.awi.de/#/projects
  • Block quota: 30 €/TB/yr (variable)
  • File quota: max(1, log(1.5*BlockQuota)) * 1e6
  • Delete: 90 days after the project has expired
  • Security: snapshots for 100 days, under /albedo/work/projects/.snapshots/
  • Owner:Group: $OWNER:$PROJECT
  • Permissions: 2770 → rwxrws---
  • Focus: large files, large bandwidth

/albedo/scratch/projects/$PROJECT
  • Comes with: the same request as for /albedo/work/projects/$PROJECT
  • Block quota: 10 €/TB/yr (variable)
  • Delete: all data older than 90 days
  • Security: no snapshots
  • Owner:Group: $OWNER:$PROJECT
  • Permissions: 2770 → rwxrws---
  • Focus: large files, large bandwidth

/albedo/burst
  • Delete: after 10 days
  • Security: no snapshots
  • Owner:Group: root:root
  • Permissions: 1777 → rwxrwxrwt
  • Focus: low latency, huge bandwidth

Storage Pools

If you want to share data with other users, use the project directories. For convenience, a project administrator may request a link from /albedo/work/projects/$PROJECT/<somewhere> to /albedo/pool/<something> via hpc@awi.de.
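
Building on the Owner:Group and Permissions rows above, here is a hedged sketch of how data inside a project directory can be opened up to fellow project members; the directory name and the project group are placeholders:

    # hand the files to the project group (new files usually inherit it via the setgid bit)
    chgrp -R <PROJECT> /albedo/work/projects/<PROJECT>/shared_results
    # grant the group read access, plus traversal of directories via the capital X
    chmod -R g+rX /albedo/work/projects/<PROJECT>/shared_results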

Remote user storage (/isibhv)

  • You can access your online space on the Isilon in Bremerhaven (see https://spaces.awi.de/x/a13-Eg for more information) via the NFS mount points
    /isibhv/projects
    /isibhv/projects-noreplica
    /isibhv/netscratch
    /isibhv/platforms
    /isibhv/home
  • Tape storage (HSM) is not mounted. However, you can archive your results with something like
    rsync -Pauv /albedo/work/projects/$PROJECT/my_valuable_results/* hssrv1:/hs/projects/$PROJECT/my_valuable_results_from_albedo/

Considerations: Where should I store my data?

Subject       | /work                                        | /isibhv
Accessibility | albedo-internal only, but 100 Gb Infiniband  | available from everywhere (inside AWI), but only 10 Gb Ethernet
Latency       | low                                          | higher
Cost          | about 10-30 €/TB/yr                          | about 100-125 €/TB/yr
Security      | snapshots                                    | snapshots, automatic tape backup available (+25 €/TB/yr)

Network

Fast interconnect (between albedo's nodes):

  • HDR Infiniband

Ethernet:

  • albedo is connected to the AWI backbone (including the Isilon and the HSM) via four 100 Gb Ethernet interfaces.
    Each individual albedo node has a 10 Gb interface.