
| Model | User | Pro SMT | Contra SMT |
| --- | --- | --- | --- |
| idle | admin | - | ∑ E_socket[0-7] according to lm_sensors: nodes need ~30% more energy (3500 kJ instead of ~2500 kJ without SMT) and get warmer |
| stress-ng stream | admin | - | ~13% slower |
| FESOM | NEC | Using 128 threads per node: 3% faster (probably because the (buggy) GPFS daemon can use a virtual core) | Using 256 threads per node: 10% slower |
| Python AI | vhelm | no impact/difference | no impact/difference |
| matlab (#SBATCH --cpus-per-task=16) | vhelm | - | Runtime: 1440 s instead of 1366 s → ~5% slower |
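The matlab row was run as a batch job. A minimal sketch of such a job script (partition, walltime, module, and script names are assumptions; only --cpus-per-task=16 is taken from the row above):

```
#!/bin/bash
#SBATCH --partition=smp
#SBATCH --cpus-per-task=16
#SBATCH --time=01:00:00
# Hypothetical reproduction of the matlab run; the module and script
# names are placeholders, not taken from this page.
module load matlab
matlab -batch "my_benchmark"
```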
unzip 262 ~50 MB files in parallel | lkalesch, mthoma

```
# Time decompression of 262 ~50 MB gzip files, $P parallel jobs at a time:
S=$(date +%s); parallel -P$P gunzip -c > /dev/null ::: /tmp/lkaleschke-huinput/* ; echo "$(( $(date +%s) - $S )) sec"
```

Allocations used on the two partitions:

```
salloc -psmp   --qos=12h --time=12:00:00 --ntasks-per-node=128
salloc -psmpht --qos=12h --time=12:00:00 --ntasks-per-node=256 --mem=249G
```
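The timings below were collected for several values of $P. A minimal sketch of such a sweep, reusing the command above (the P values are taken from the result tables; P=0 lets GNU parallel start as many jobs as there are visible CPU threads):

```
#!/bin/bash
# Time the parallel gunzip benchmark for increasing job counts.
for P in 1 2 6 12 36 128 256 0; do
    S=$(date +%s)
    parallel -P"$P" gunzip -c > /dev/null ::: /tmp/lkaleschke-huinput/*
    echo "P=$P: $(( $(date +%s) - S )) sec"
done
```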

Pro SMT: If a user only requests --ntasks-per-node=1 (the default) and uses P=0 (use all available cores), the result is:

| $P | smp (sec) | smpht (sec) |
| --- | --- | --- |
| 0 | 175 | 113 |

But please note: this is an improper use of Slurm/HPC, so this "advantage" does not justify SMT.
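The "advantage" only appears because GNU parallel counts all visible hardware threads, not the CPUs Slurm granted. A quick check inside such an allocation (a sketch; the exact numbers depend on the site's cgroup configuration):

```
# Compare what the shell sees with what Slurm actually allocated:
nproc                        # hardware threads visible to the process
echo "$SLURM_CPUS_ON_NODE"   # CPUs granted by Slurm on this node
```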

Contra SMT: no advantage (if Slurm is used properly):

| $P | smp (sec) | smpht (sec) | Note |
| --- | --- | --- | --- |
| 1 | 177 | 175 | without parallelisation |
| 2 | 88 | | |
| 6 | 33 | 33 | |
| 12 | 20 | 20 | |
| 36 | 12 | 12 | |
| 128 | 10 | 10 | |
| 256 | 9 | 9 | |
| 0 | 8 | 8 | use all cores |



GPU nodes (A40 vs. A100)

| Model | User | A40 vs. A100 |
| --- | --- | --- |
| tensorflow-gpu AI application | vhelm | no difference |
| python3, matrix operations with numpy (fat node, CPU) vs. cupy (GPU) | sviquera | |
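To reproduce the comparison, each GPU type has to be requested explicitly. A sketch of the two allocations (the partition name and GRES labels are assumptions; check sinfo for the real ones):

```
# Hypothetical allocations for the two GPU flavours; partition and
# GRES names are assumptions, not taken from this page.
salloc -p gpu --gres=gpu:a40:1  --time=01:00:00
salloc -p gpu --gres=gpu:a100:1 --time=01:00:00
nvidia-smi   # confirm which GPU type was granted
```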







Runtime compared with ollie

idl (user: vhelm), cumulative profiling times:

| | albedo GPFS | albedo local NVMe | ollie BeeGFS |
| --- | --- | --- | --- |
| Loop and if conditions | 0.32 s (3.11 %) | 0.13 s (1.40 %) | 3.48 s (12.73 %) |
| File open | 0.03 s (0.26 %) | 0.01 s (0.06 %) | 0.05 s (0.19 %) |
| Data read and file close | 9.95 s (96.62 %) | 8.94 s (98.54 %) | 23.77 s (87.07 %) |
| Total cumulative time | 10.30 s | 9.07 s | 27.30 s |
| Total imported data | 34269 MB (3442 MB/s) | 34269 MB (3832 MB/s) | 34269 MB (1441 MB/s) |
tensorflow (user: vhelm), AI inference with a trained model: 4 times slower (~10 minutes).
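The MB/s figures can be sanity-checked with a plain sequential-read test on each filesystem (a sketch; the test-file path is a placeholder, and the page cache will inflate repeated reads of the same file):

```
# Read one large file from the filesystem under test and report throughput.
F=/path/to/testfile    # place one copy on GPFS, local NVMe, etc.
dd if="$F" of=/dev/null bs=4M status=progress
```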