
| Model | User | Pro SMT | Contra SMT |
| --- | --- | --- | --- |
| idle | admin | - | ∑ E_socket[0-7] according to lm_sensors: nodes need ~30% more energy (3500 kJ instead of ~2500 kJ without SMT) and get warmer |
| stress-ng stream | admin | - | ~13% slower |
| FESOM | NEC | Using 128 threads per node: 3% faster (probably because the (buggy) GPFS daemon can use a virtual core) | Using 256 threads per node: 10% slower |
| Python AI | vhelm | no impact/difference | no impact/difference |
| matlab (#SBATCH --cpus-per-task=16) | vhelm | - | Runtime: 1440 s instead of 1366 s → ~5% slower |
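The matlab row was run as a batch job. A minimal sketch of such a job script (partition, walltime, module, and script names are assumptions; only --cpus-per-task=16 is taken from the row above):

```
#!/bin/bash
#SBATCH --partition=smp
#SBATCH --cpus-per-task=16
#SBATCH --time=01:00:00
# Hypothetical reproduction of the matlab run; the module and script
# names are placeholders, not taken from this page.
module load matlab
matlab -batch "my_benchmark"
```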
unzip 262 ~50 MB files in parallel | lkalesch, mthoma

```
# Time decompression of 262 ~50 MB gzip files, $P parallel jobs at a time:
S=$(date +%s); parallel -P$P gunzip -c > /dev/null ::: /tmp/lkaleschke-huinput/* ; echo "$(( $(date +%s) - $S )) sec"
```

Allocations used on the two partitions:

```
salloc -psmp   --qos=12h --time=12:00:00 --ntasks-per-node=128
salloc -psmpht --qos=12h --time=12:00:00 --ntasks-per-node=256 --mem=249G
```
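The timings below were collected for several values of $P. A minimal sketch of such a sweep, reusing the command above (the P values are taken from the result tables; P=0 lets GNU parallel start as many jobs as there are visible CPU threads):

```
#!/bin/bash
# Time the parallel gunzip benchmark for increasing job counts.
for P in 1 2 6 12 36 128 256 0; do
    S=$(date +%s)
    parallel -P"$P" gunzip -c > /dev/null ::: /tmp/lkaleschke-huinput/*
    echo "P=$P: $(( $(date +%s) - S )) sec"
done
```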

Pro SMT: If a user only requests --ntasks-per-node=1 (the default) and uses P=0 (use all available cores), the result is:

| $P | smp (sec) | smpht (sec) |
| --- | --- | --- |
| 0 | 175 | 113 |

But please note: this is an improper use of Slurm/HPC, so this "advantage" does not justify SMT.
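The "advantage" only appears because GNU parallel counts all visible hardware threads, not the CPUs Slurm granted. A quick check inside such an allocation (a sketch; the exact numbers depend on the site's cgroup configuration):

```
# Compare what the shell sees with what Slurm actually allocated:
nproc                        # hardware threads visible to the process
echo "$SLURM_CPUS_ON_NODE"   # CPUs granted by Slurm on this node
```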

Contra SMT: no advantage (if Slurm is used properly):

| $P | smp (sec) | smpht (sec) | Note |
| --- | --- | --- | --- |
| 1 | 177 | 175 | without parallelisation |
| 2 | 88 | | |
| 6 | 33 | 33 | |
| 12 | 20 | 20 | |
| 36 | 12 | 12 | |
| 128 | 10 | 10 | |
| 256 | 9 | 9 | |
| 0 | 8 | 8 | use all cores |



GPU nodes (A40 vs. A100)

| Model | User | A40 vs. A100 |
| --- | --- | --- |
| tensorflow-gpu AI application | vhelm | no difference |
| python3, matrix operations with numpy (fat node, CPU) vs. cupy (GPU) | sviquera | |
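To reproduce the comparison, each GPU type has to be requested explicitly. A sketch of the two allocations (the partition name and GRES labels are assumptions; check sinfo for the real ones):

```
# Hypothetical allocations for the two GPU flavours; partition and
# GRES names are assumptions, not taken from this page.
salloc -p gpu --gres=gpu:a40:1  --time=01:00:00
salloc -p gpu --gres=gpu:a100:1 --time=01:00:00
nvidia-smi   # confirm which GPU type was granted
```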







Runtime compared with ollie

idl (user: vhelm), cumulative profiling times:

| | albedo GPFS | albedo local NVMe | ollie BeeGFS |
| --- | --- | --- | --- |
| Loop and if conditions | 0.32 s (3.11 %) | 0.13 s (1.40 %) | 3.48 s (12.73 %) |
| File open | 0.03 s (0.26 %) | 0.01 s (0.06 %) | 0.05 s (0.19 %) |
| Data read and file close | 9.95 s (96.62 %) | 8.94 s (98.54 %) | 23.77 s (87.07 %) |
| Total cumulative time | 10.30 s | 9.07 s | 27.30 s |
| Total imported data | 34269 MB (3442 MB/s) | 34269 MB (3832 MB/s) | 34269 MB (1441 MB/s) |
tensorflow (user: vhelm), AI inference with a trained model: 4 times slower (~10 minutes).
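The MB/s figures can be sanity-checked with a plain sequential-read test on each filesystem (a sketch; the test-file path is a placeholder, and the page cache will inflate repeated reads of the same file):

```
# Read one large file from the filesystem under test and report throughput.
F=/path/to/testfile    # place one copy on GPFS, local NVMe, etc.
dd if="$F" of=/dev/null bs=4M status=progress
```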