
Model | User | Pro SMT (smile) | Contra SMT (sad)
idle | admin | - | ∑ Esocket[0-7] according to lm_sensors: nodes need 30% more power (3500 kJ) and get warmer, compared to ~2500 kJ without SMT
stress-ng stream | admin | - | ~13% slower
FESOM | NEC | using 128 threads per node: 3% faster (probably because the (buggy) GXFS daemon can use a virtual core) | using 256 threads per node: 10% slower
Python AI | vhelm | no impact/difference | no impact/difference
matlab (#SBATCH --cpus-per-task=16) | vhelm | | runtime 1440 s instead of 1366 s → ~5% slower
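
The idle-row energy numbers sum the per-socket energy counters over the measurement interval; a hypothetical way to sample them (only the Esocket label names are taken from the cell above, everything else is an assumption and driver/label names vary between systems):

# Hypothetical sampling of the summed socket energy counters, ∑ Esocket[0-7].
# Assumes `sensors -u` prints a label line "Esocket0:".."Esocket7:" followed by
# a line whose second field is the energy value in Joules; treat as a sketch.
read_kj() { sensors -u | awk '/^Esocket[0-7]:/ {grab=1; next} grab {sum+=$2; grab=0} END {printf "%d\n", sum/1000}'; }
E0=$(read_kj); sleep 3600; E1=$(read_kj)   # integrate over one idle hour
echo "idle energy: $(( E1 - E0 )) kJ"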
gunzip, decompressing 262 files of ~50 MB each in parallel | lkalesch, mthoma

S=$(date +%s); parallel -P$P gunzip -c > /dev/null ::: /tmp/input/*  ; echo "$(( $(date +%s) - $S )) sec"

Allocations used for the two partitions:
salloc -psmp   --qos=12h --time=12:00:00 --ntasks-per-node=128
salloc -psmpht --qos=12h --time=12:00:00 --ntasks-per-node=256 --mem=249G
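
For reproducibility, a minimal driver sketch around the one-liner above; the $P values mirror the tables below and the input path comes from the one-liner, everything else is an assumption:

#!/bin/bash
# Hypothetical sweep over the parallelism values benchmarked below.
# Assumes the 262 gzipped input files are already staged in /tmp/input.
for P in 1 2 6 12 36 128 256 0; do        # -P0 lets GNU parallel use all cores
  S=$(date +%s)
  parallel -P"$P" 'gunzip -c {} > /dev/null' ::: /tmp/input/*
  echo "P=$P: $(( $(date +%s) - S )) sec"
done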

If a user only requests --ntasks-per-node=1 (the default) and uses P=0 (use all available cores), the result is:

$P | smp (sec) | smpht (sec)
0 | 175 | 113

But please note: this is improper use of Slurm/HPC (the job uses far more cores than it requested), so this "advantage" does not justify SMT.

If Slurm is used properly (cores requested to match $P), there is no advantage; a matching request is sketched after the table below.

$P | smp (sec) | smpht (sec) | Note
1 | 175 | 175 | without parallelization
2 | 88 | |
6 | 33 | 33 |
12 | 20 | 20 |
36 | 12 | 12 |
128 | 10 | 10 |
256 | 9 | 9 |
0 | 8 | 8 | use all cores
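
A request where the allocated cores match $P might look like this (partition and QOS names are copied from the salloc lines above; the value 36 is just an example):

# Hypothetical matching request: ask Slurm for exactly the cores that
# GNU parallel will use, instead of oversubscribing a one-task job.
salloc -psmp --qos=12h --time=12:00:00 --ntasks-per-node=36
parallel -P36 'gunzip -c {} > /dev/null' ::: /tmp/input/*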



GPU nodes (A40 vs. A100)

Model | User | A40 vs. A100
tensorflow-gpu AI application | vhelm | no difference
python3, matrix operations with numpy (fat) vs. cupy (gpu) | sviquera |






...



Application | User | albedo: node internal /tmp (NVMe) | albedo: 100 Gb Infiniband /albedo (GPFS) | albedo: 10 Gb Ethernet /isibhv (NVMe) | ollie: node internal /tmp (SSD) | ollie: 100 Gb Omnipath /work (BeeGFS) | ollie: 10 Gb Ethernet /isibhv (NVMe)

idl: reading 244 data files | vhelm | ~9 sec | 10~13 sec | 8~11 sec (spikes up to 181 sec) | 27~29 sec | 27~37 sec | 29~60 sec (spikes up to 98 sec)
ls -f (directory with 30000 entries) | | 0.08 sec | 0.04 sec | 0.03 sec | 0.1 sec | 0.2 sec | 0.08 sec
ls, default with stat/color (same directory) | | 0.19 sec | 6~15 sec / 0.3 sec | 0.2 sec | 0.4 sec | 1.6 sec | 0.3~0.7 sec
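
The listing times above can be reproduced with something like the following; the directory path is a placeholder, not one of the measured file systems:

# Hypothetical reproduction of the listing benchmark: the default-style ls
# stats every entry (for color/metadata), while ls -f prints raw directory
# order without sorting or per-entry stat calls, which is why it is faster.
cd /path/to/dir/with/30000/entries        # placeholder path
time ls --color=always > /dev/null        # default-style listing with stat/color
time ls -f > /dev/null                    # no sort, no per-entry stat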








  • ...

Runtime compared with ollie


Model | User | albedo GPFS | albedo local NVMe | ollie BeeGFS
idl | vhelm | per-metric breakdown below (columns: albedo GPFS | albedo local NVMe | ollie BeeGFS)

Cumulative time for loop and if conditions: 0.32 s (3.11 %) | 0.13 s (1.40 %) | 3.48 s (12.73 %)
Cumulative time file open: 0.03 s (0.26 %) | 0.01 s (0.06 %) | 0.05 s (0.19 %)
Cumulative time data read and file close: 9.95 s (96.62 %) | 8.94 s (98.54 %) | 23.77 s (87.07 %)
Total cumulative time: 10.30 s | 9.07 s | 27.30 s
Total amount of imported data (MB): 34269 (3442 MB/s) | 34269 (3832 MB/s) | 34269 (1441 MB/s)
tensorflow, AI inference with a trained model | vhelm | 4 times slower (~10 minutes)