
Albedo is the tier-3 High Performance Computing (HPC) platform hosted and supported at AWI. This documentation covers the basics of how to operate it. Please be aware that basic knowledge of Linux and HPC systems is expected of Albedo's users; this documentation does not cover everything there is to know about HPCs, Linux, user permissions, data management, and so on.

That being said, the Albedo documentation is a living document, and we will do our best to improve it and answer the most common user questions, either by expanding the documentation or by pointing to external sources.

Questions/Suggestions

hpc@awi.de

Hardware

...

  • 64-bit processor, AMD Rome (Epyc 7702)

  • 2 sockets per node
  • 64 cores per socket, cTDP reduced from 200 W to 165 W
  • → 128 cores/node
  • 2 GHz (3.3 GHz boost)
  • 512 GiB RAM (8 DIMMs per socket with 32 GiB DDR4-3200)
  • 2x SSD 480 GB SATA
  • 1x SSD 1.92 TB
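The per-node core count above is simply the product of sockets and cores per socket. A small sketch (assuming a standard Linux shell; `lscpu` from util-linux is usually available) lets you compare the advertised layout against a real node:

```shell
# Sanity check of the standard-node layout described above.
SOCKETS=2
CORES_PER_SOCKET=64
CORES_PER_NODE=$((SOCKETS * CORES_PER_SOCKET))
echo "expected: ${CORES_PER_NODE} cores/node"

# Compare against the actual hardware (run this on an Albedo node):
command -v lscpu >/dev/null && lscpu | grep -E '^(Socket|Core|Model name)' || true
```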

...

  • 2x NVIDIA A40
  • 2x AMD Rome (Epyc 7702) 
  • 512 GiB RAM (8 DIMMs per socket with 32GiB DDR4-3200)
  • for testing, Jupyter notebooks, MATLAB, ...

...

240x standard compute nodes (four per NEC HPC2824Ri-2 chassis)

  • 256 GiB RAM (8 DIMMs per socket with 16 GiB DDR4-3200)
  • 512 GiB NVMe per node

...

  • 4 TiB RAM (16 DIMMs per socket with 128 GiB DDR4-3200)
  • 512 GiB NVMe
  • 7.68 TiB NVMe

...

  • 4x Nvidia A100/80
  • 2x AMD Rome (Epyc 7702) 
  • 512/1024 GiB RAM
  • 3x3.84 TiB = 11.52 TiB NVMe
  • More GPU nodes will follow later, once we have gained first experience with what you really need, and to offer the most recent hardware

...

  • IBM Spectrum Scale (GPFS)
  • Tier 1: 220 TiB NVMe as fast cache and/or burst buffer
  • Tier 2: ~5.38 PiB NL-SAS HDD (NetApp EF300)
  • future extension of both tiers (capacity, bandwidth) is possible
  • Fast interconnect: HDR InfiniBand (100 Gb/s)
  • All nodes connected to /isibhv (NFS, 10 Gb Ethernet)
  • Alma Linux ("free RedHat", version 8.x)

In the FESOM2 benchmark used for the procurement, 240 Albedo nodes deliver performance comparable to 800 Ollie nodes.

Preliminary schedule for the transition Ollie → Albedo

Disclaimer: We cannot guarantee the following time frame! It is an optimistic view, and any step can be delayed by weeks or months! The exception is the first "Now" step, which is in your hands ;-)

...

  • Delete data you do not need to keep,
  • copy tarballs of valuable data to the tape archive (you should do this anyhow; /work has no backup),
  • place a copy of data that you need continuous, fast access to on /isibhv.
  • We will not transfer data automatically from Ollie to Albedo! This is a chance to clean up ;-)
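For the second step, a minimal sketch of packing a directory into a tarball and verifying it before it goes to the tape archive. The directory used here is a temporary stand-in (substitute your own path under /work); the tape-archive destination is site-specific and not shown:

```shell
set -e
# Stand-in for a real project directory somewhere under /work (adjust!):
SRC=$(mktemp -d)
echo "demo" > "$SRC/data.txt"

# Pack the directory into a compressed tarball.
TARBALL="$SRC.tar.gz"
tar -czf "$TARBALL" -C "$(dirname "$SRC")" "$(basename "$SRC")"

# Always verify the archive is readable BEFORE deleting the originals
# or shipping the tarball off to the tape archive.
tar -tzf "$TARBALL" > /dev/null && echo "archive verified"
```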

...