Table of Contents

...

Questions/Suggestions

hpc@awi.de

Hardware

  • 2x interactive login nodes
    • 64-bit processors, AMD Rome (Epyc 7702) series
    • 2 sockets per node
    • 64 cores per socket, cTDP reduced from 200 W to 165 W
    • → 128 cores/node
    • 2 GHz (3.3 GHz boost)
    • 512 GiB RAM (8 DIMMs per socket with 32 GiB DDR4-3200)
    • 2x SSD 480 GB SATA
    • 1x SSD 1.92 TB
  • 1x interactive GPU node
    • 2x NVIDIA A40
    • for testing, Jupyter notebooks, Matlab, ...
  • 240x standard compute nodes (four each in one NEC HPC2824Ri-2)

    • 64-bit processors, AMD Rome (Epyc 7702) series
    • 2 sockets per node
    • 64 cores per socket, cTDP reduced from 200 W to 165 W
    • 2 GHz (3.3 GHz boost)
    • 256 GiB RAM (8 DIMMs per socket with 16 GiB DDR4-3200) → 128 cores and 256 GiB RAM per standard node
    • 512 GiB NVMe per node

  • 4x fat nodes (each in a NEC HPC2104Ri-1)
    • 4 TiB RAM (16 DIMMs per socket with 128 GiB DDR4-3200)
    • 512 GiB NVMe
    • + 7.68 TiB NVMe per node
  • 4x GPU nodes (NEC HPC22G8Ri)
    • 4x NVIDIA A100/80
    • 512/1024 GiB RAM
    • 3x 3.84 TiB = 11.52 TiB NVMe
    • More GPU nodes will follow later, after we have gained first experience of what you really need, and to offer the most recent hardware
  • 3x Management nodes
    • one AMD 7302P (NEC HPC2104Ri-1) each
    • 1 socket
    • 16 cores
    • 128 GiB RAM (8 DIMMs per socket with 16 GiB DDR4-3200)
    • 2x SATA 960 GiB SSD
    • 1x 1.92 TiB NVMe
  • File Storage:
    • IBM Spectrum Scale (GPFS)
    • Tier 1: 220 TiB NVMe as fast cache and/or burst buffer
    • Tier 2: ~5.38 PiB NL-SAS HDD (NetApp EF300)
    • future extension of both tiers (capacity, bandwidth) is possible
  • Fast interconnect: HDR Infiniband (100 Gb/s)
  • All nodes connected to /isibhv (NFS, 10 Gb Ethernet)
  • Alma Linux ("free RedHat", version 8.x)
  • Our small test node with NEC's new vector engine "SX-Aurora TSUBASA" can be integrated

For the FESOM2 benchmark we used in the procurement, 240 Albedo nodes deliver performance comparable to 800 Ollie nodes, i.e. roughly a factor of 3.3 fewer nodes for the same throughput.

...

  • Please start now to prepare your directories on /work/ollie for the transfer to Albedo!
    • Delete data you do not need to keep,
    • copy tarballs of valuable data to the tape archive (you should do this anyhow, /work has no backup; a minimal scripting sketch follows after this list),
    • place a copy of data which you need continuously with fast access on /isibhv.
    • We will not transfer data automatically from Ollie to Albedo! This is a chance to clean up ;-)
  • Until April
    NEC investigates whether OmniPath is an alternative to Mellanox Infiniband as the fast network in Albedo, because Infiniband would be delivered with a large delay
  • May
    Albedo installed at AWI with the "slow" 10 Gb/s Ethernet network and, if recommended by NEC, fast OmniPath (100 Gb/s)
  • June
    Albedo open for power users
    Start to copy data from Ollie to Albedo (note: this must be done by you, there is no automatic data migration!)
    The more Albedo nodes are powered on, the more Ollie nodes have to be switched off. We have no fixed schedule for this, but will decide depending on how useful Albedo already is and for how many users. Of course, Albedo has a far better ratio of computing performance to electrical power.
  • July
    Albedo open for all users
  • 30 June
    Ollie hardware support ends. As a prolongation would cost approx. 200,000 €, we will keep Ollie running without support, but with the option to have basic components (file system, network) repaired in case of an emergency (maintenance on demand). So far, we have had very good experience with the service of MegWare and are confident that we have found a solution that is cost-efficient and sustainable.
  • (August: downtime to add Infiniband, if we decide against OmniPath)
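
A minimal sketch of the tar-and-copy step mentioned in the checklist above, assuming Python is available on the login nodes. All paths (the /work/ollie project directory, the target directory on /isibhv or the tape archive mount point) are hypothetical placeholders, not the actual mount points; adapt them to your account, or use plain tar/rsync if you prefer.

#!/usr/bin/env python3
"""Sketch: pack a /work/ollie project directory into a tarball and copy it
to a second location (e.g. the tape archive or /isibhv).
All paths below are hypothetical placeholders -- adapt them to your account."""

import shutil
import tarfile
from pathlib import Path

# Hypothetical example paths -- replace with your own directories.
SOURCE_DIR = Path("/work/ollie/myuser/myproject")      # data worth keeping
TARBALL = Path("/work/ollie/myuser/myproject.tar.gz")  # tarball to create
ARCHIVE_DIR = Path("/isibhv/projects/myproject")       # or the tape archive mount


def pack_and_copy(source: Path, tarball: Path, destination: Path) -> None:
    """Create a gzip-compressed tarball of `source` and copy it to `destination`."""
    with tarfile.open(tarball, "w:gz") as tar:
        # Store the directory under its own name inside the archive.
        tar.add(source, arcname=source.name)
    destination.mkdir(parents=True, exist_ok=True)
    shutil.copy2(tarball, destination / tarball.name)
    print(f"packed {source} -> {tarball}, copied to {destination}")


if __name__ == "__main__":
    pack_and_copy(SOURCE_DIR, TARBALL, ARCHIVE_DIR)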

...